We must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
- datetime - hourly date + timestamp
- season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
- holiday - whether the day is considered a holiday
- workingday - whether the day is neither a weekend nor holiday
- weather - weather category:
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp - temperature in Celsius
- atemp - "feels like" temperature in Celsius
- humidity - relative humidity
- windspeed - wind speed
- casual - number of non-registered user rentals initiated
- registered - number of registered user rentals initiated
- count - number of total rentals
Use all the information stored in the datetime field. Features must include:
- hour
- month
- day_of_week (1-7)
- year
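
The date features listed above can be derived with pandas roughly as follows. This is a minimal sketch, not the original pipeline: the helper name `add_date_features` and the sample rows are hypothetical, and it assumes that 1 means Monday in the 1-7 day numbering, which the notes do not specify.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hour, month, day_of_week and year columns.

    Hypothetical helper; expects a 'datetime' column with hourly timestamps.
    """
    out = df.copy()
    ts = pd.to_datetime(out["datetime"])
    out["hour"] = ts.dt.hour
    out["month"] = ts.dt.month
    # pandas numbers weekdays 0-6 starting on Monday; shift to 1-7.
    # Assumption: 1 = Monday (the source only says "1-7").
    out["day_of_week"] = ts.dt.dayofweek + 1
    out["year"] = ts.dt.year
    return out


# Made-up sample rows, only to show the call.
sample = pd.DataFrame({"datetime": ["2011-01-01 05:00:00", "2012-12-19 23:00:00"]})
features = add_date_features(sample)
print(features)
```

The helper returns a copy, so the same function can be applied to the train and test frames without modifying either in place.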
It is a good idea to apply a log transform to the 'count' column: `np.log1p` before training and `np.expm1` to map predictions back. The counts are right-skewed (a long tail of high values), so the log transform makes the distribution closer to normal, which improved the overall score from 0.5 to 0.42 (the most valuable improvement).
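
As a rough illustration of the transform round trip, the sketch below applies `np.log1p` to the target and `np.expm1` to the predictions. The count values are made up, and the "model" is a stand-in that simply echoes the transformed target; a real regressor would be fitted on `y_log` instead.

```python
import numpy as np

# Made-up hourly rental counts; note that zero is a valid value,
# which is why log1p (log(1 + x)) is used rather than a plain log.
counts = np.array([0.0, 1.0, 16.0, 40.0, 977.0])

# Train on the log-transformed target.
y_log = np.log1p(counts)

# Stand-in for model output in log space (assumption: a fitted
# regressor would produce these values via its predict method).
pred_log = y_log.copy()

# Map predictions back to the original count scale.
pred = np.expm1(pred_log)

print(np.round(y_log, 3))
print(np.round(pred, 3))
```

Because `np.expm1` is the exact inverse of `np.log1p`, predictions made in log space come back on the original count scale without any extra correction.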
- tuned LGB (0.4)
- tuned XGB (0.41)