nyc-taxi-green-dec-2016

From MaRDI portal
Dataset:6036121



OpenML: 42729, MaRDI QID: Q6036121

OpenML dataset with id 42729

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22044763/nyc-taxi-green-dec-2016.arff

Upload date: 18 November 2020



Dataset Characteristics

Number of classes: 0
Number of features: 19 (numeric: 10, symbolic: 9, and in total binary: 3)
Number of instances: 581,835
Number of instances with missing values: 0
Number of missing values: 0

String datetime information has been extracted to numeric columns. Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [1]. The dataset includes TLC green taxi trips from December 2016. Data was downloaded on 03.11.2018. For a description of all variables in the dataset, check out the TLC homepage [2].

The variable 'tip_amount' was chosen as the target variable. The variable 'total_amount' is ignored by default, because otherwise the target could be predicted deterministically. The date variables 'lpep_pickup_datetime' and 'lpep_dropoff_datetime' (ignored by default) could be used to compute additional time features. In this version, we chose only trips with 'payment_type' == 1 (credit card), as tips are not included for most other payment types. We also removed the variables 'trip_distance' and 'fare_amount' to increase the importance of the categorical features 'PULocationID' and 'DOLocationID'.
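The preprocessing described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the two sample rows are invented, the timestamp format "%Y-%m-%d %H:%M:%S" is assumed, and the helper names (time_features, prepare) and the derived column names are hypothetical. Only the variable names quoted in the description are taken from the source.

```python
from datetime import datetime

TARGET = "tip_amount"
# Variables the description says are ignored or removed.
DROPPED = ("total_amount", "trip_distance", "fare_amount")
DATETIME_COLUMNS = ("lpep_pickup_datetime", "lpep_dropoff_datetime")
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption about the raw string format


def time_features(record):
    """Turn the two datetime strings into numeric day/weekday/hour columns."""
    derived = {}
    for column in DATETIME_COLUMNS:
        stamp = datetime.strptime(record[column], TIMESTAMP_FORMAT)
        prefix = column.replace("_datetime", "")  # hypothetical naming scheme
        derived[prefix + "_day"] = stamp.day
        derived[prefix + "_weekday"] = stamp.weekday()  # Monday == 0
        derived[prefix + "_hour"] = stamp.hour
    return derived


def prepare(records):
    """Keep credit-card trips and split them into feature rows and targets."""
    features, targets = [], []
    for record in records:
        if int(record["payment_type"]) != 1:  # 1 == credit card
            continue
        row = {
            key: value
            for key, value in record.items()
            if key != TARGET and key not in DROPPED and key not in DATETIME_COLUMNS
        }
        row.update(time_features(record))
        features.append(row)
        targets.append(float(record[TARGET]))
    return features, targets


# Two invented rows, used only to exercise the sketch.
sample = [
    {
        "lpep_pickup_datetime": "2016-12-01 00:15:00",
        "lpep_dropoff_datetime": "2016-12-01 00:32:10",
        "PULocationID": "74", "DOLocationID": "166", "payment_type": "1",
        "trip_distance": "3.1", "fare_amount": "12.5",
        "tip_amount": "2.00", "total_amount": "15.80",
    },
    {
        "lpep_pickup_datetime": "2016-12-01 08:05:00",
        "lpep_dropoff_datetime": "2016-12-01 08:20:00",
        "PULocationID": "41", "DOLocationID": "42", "payment_type": "2",
        "trip_distance": "1.2", "fare_amount": "7.0",
        "tip_amount": "0.00", "total_amount": "8.30",
    },
]

features, targets = prepare(sample)
```

Only the first sample row survives the payment-type filter. Its feature row keeps 'PULocationID', 'DOLocationID', and 'payment_type', gains the derived time columns, and no longer contains 'total_amount', 'trip_distance', or 'fare_amount'.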