BIKE SHARING DATA ANALYSIS
• After analyzing the data, the main challenge appears to be the uncertainty in demand from hour to hour and from day to day.
• Two discrete (factor) variables and four continuous variables that can affect demand, together with their relationships to demand, are depicted in the following graphs and table (blue boxes):
Bike demand per hour can be segregated as:
- High: 7-9 and 17-19 hours
- Average: 10-16 hours
- Low: 0-6 and 20-23 hours
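The hour bands above can be expressed as a small lookup, for example to attach a derived "demand band" feature to each hourly record. This is a sketch assuming hours run from 0 to 23, as in the hr column; the function name and the string labels are illustrative, not part of the original analysis.

```python
def demand_band(hr: int) -> str:
    """Map an hour of the day (0-23) to the demand band described above.

    Hypothetical helper: the band boundaries are taken from the slide.
    """
    if not 0 <= hr <= 23:
        raise ValueError(f"hour must be in 0..23, got {hr}")
    if 7 <= hr <= 9 or 17 <= hr <= 19:
        return "High"      # morning and evening peaks
    if 10 <= hr <= 16:
        return "Average"   # daytime hours between the peaks
    return "Low"           # night hours: 0-6 and 20-23
```

For instance, `[demand_band(h) for h in range(24)]` labels every hour of the day.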
The plot shows the impact of rain on demand: demand falls drastically when it rains.
Demand is highest during clear weather, and the numbers are also steady.
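The weather comparison above can be summarised per weathersit code: the mean shows the level of demand, and the standard deviation shows how steady it is. A minimal standard-library sketch, assuming hourly records are dicts with integer weathersit and cnt values; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def demand_by_weather(records):
    """Return {weathersit: (mean cnt, population std of cnt, n)}.

    Assumption: `records` is an iterable of dicts with integer
    'weathersit' and 'cnt' keys, e.g. rows parsed from the hourly file.
    """
    groups = defaultdict(list)
    for row in records:
        groups[row["weathersit"]].append(row["cnt"])
    return {
        code: (mean(counts), pstdev(counts), len(counts))
        for code, counts in sorted(groups.items())
    }
```

A lower standard deviation relative to the mean indicates steadier demand for that weather type.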
Plot: Total users in a day
Two data files were available; the analysis of the file with hourly data is presented here.
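The hourly records can be rolled up into per-day totals such as "total users in a day". A sketch assuming each record is a dict with a dteday date string and an integer cnt, using the column names from the variable list; the helper name is hypothetical.

```python
def daily_totals(records):
    """Sum hourly 'cnt' values per 'dteday' to get total users per day.

    Assumption: each record is a dict with a 'dteday' string and an int 'cnt'.
    """
    totals = {}
    for row in records:
        day = row["dteday"]
        totals[day] = totals.get(day, 0) + row["cnt"]
    return totals
```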
The independent variables identified are:
- instant: record index
- dteday: date
- season: season (1: spring, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23)
- holiday: whether the day is a holiday or not
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, else 0
- weathersit: weather (1: Clear, 2: Mist, 3: Light Snow or Light Rain, 4: Heavy Rain + Ice Pellets)
- temp: normalized temperature in Celsius
- atemp: normalized "feels-like" temperature in Celsius
- hum: normalized humidity
- windspeed: normalized wind speed
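Because season, yr and weathersit are stored as integer codes, decoding them into labels can make plots and tables easier to read. In this sketch the mappings are copied from the variable list above, while the constant and function names are hypothetical.

```python
# Code-to-label mappings taken from the variable list (names are hypothetical).
SEASON = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
YEAR = {0: 2011, 1: 2012}
WEATHERSIT = {
    1: "Clear",
    2: "Mist",
    3: "Light Snow or Light Rain",
    4: "Heavy Rain + Ice Pellets",
}


def decode(row):
    """Return a copy of an hourly record with coded columns replaced by labels."""
    out = dict(row)  # leave the original record untouched
    out["season"] = SEASON[row["season"]]
    out["yr"] = YEAR[row["yr"]]
    out["weathersit"] = WEATHERSIT[row["weathersit"]]
    return out
```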
Dependent variables are:
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes, including both casual and registered users
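Since cnt is defined as casual plus registered, a quick consistency check after loading is worthwhile. This sketch assumes the hourly file is a CSV whose header uses the column names listed above; the file path and helper names are hypothetical.

```python
import csv

# Assumption: the CSV header uses the column names from the variable list.
INT_COLUMNS = {"instant", "season", "yr", "mnth", "hr", "holiday", "weekday",
               "workingday", "weathersit", "casual", "registered", "cnt"}
FLOAT_COLUMNS = {"temp", "atemp", "hum", "windspeed"}


def parse_row(raw):
    """Convert one csv.DictReader row (all strings) into typed values."""
    row = {}
    for key, value in raw.items():
        if key in INT_COLUMNS:
            row[key] = int(value)
        elif key in FLOAT_COLUMNS:
            row[key] = float(value)
        else:
            row[key] = value  # e.g. dteday stays a date string
    return row


def load_hourly(path):
    """Read the hourly CSV at `path` (hypothetical location) into typed dicts."""
    with open(path, newline="") as f:
        return [parse_row(raw) for raw in csv.DictReader(f)]


def count_mismatches(records):
    """Count rows where cnt != casual + registered (expected to be zero)."""
    return sum(1 for r in records if r["cnt"] != r["casual"] + r["registered"])
```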
Correlation coefficients of the continuous variables with all three dependent variables
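Correlations like those in the table can be recomputed with a plain Pearson correlation. This sketch assumes the four continuous variables are temp, atemp, hum and windspeed (the continuous columns in the variable list) and that records are dicts keyed by column name. It is not a reproduction of the table's values.

```python
from math import sqrt

# Assumption: these are the four continuous predictors referred to on the slide.
CONTINUOUS = ("temp", "atemp", "hum", "windspeed")
TARGETS = ("casual", "registered", "cnt")


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_table(records):
    """Return {(predictor, target): r} for every continuous predictor/target pair."""
    return {
        (p, t): pearson([r[p] for r in records], [r[t] for r in records])
        for p in CONTINUOUS
        for t in TARGETS
    }
```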
The variables above and their individual effects have been presented as examples.
However, it would be more informative to measure their combined impact and build a model that predicts the hourly demand.
A sample model was fitted using the numeric variables, after converting the discrete variables to factors.
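Treating a discrete variable such as season, hr or weathersit as a factor means the model estimates a separate effect for each level, rather than a single slope over the numeric codes. The sketch below shows the usual treatment (dummy) coding, in which one reference level is dropped to avoid collinearity with the intercept; this mirrors R's default for factor predictors in `lm()`, which may or may not be how the sample model was built. Records are assumed to be dicts keyed by column name, and the helper name is hypothetical.

```python
def dummy_columns(records, column):
    """Build treatment-coded 0/1 indicator columns for one discrete variable.

    Returns {f"{column}_{level}": [0/1 per record]}. The smallest level is the
    reference level and gets no column of its own.
    """
    levels = sorted({r[column] for r in records})
    return {
        f"{column}_{level}": [1 if r[column] == level else 0 for r in records]
        for level in levels[1:]  # skip the reference level
    }
```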
The correlation coefficient between the fitted and observed values of “cnt” is 0.7921; the plot is shown on the right.
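If the sample model is an ordinary least-squares regression with an intercept (an assumption, since the model type is not stated), the squared correlation between fitted and observed values equals the coefficient of determination. The reported figure then corresponds to roughly 63% of the variance in hourly cnt being explained, with about 37% left unexplained:

```latex
R^2 = \operatorname{corr}(\hat{y}, y)^2 = 0.7921^2 \approx 0.627
```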
It therefore remains to be determined whether this model is acceptable or needs to be improved for greater accuracy.
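One common way to decide whether the model is acceptable is to fit it on earlier hours and measure its error on later, held-out hours, then compare alternative models on the same split. This sketch covers only the split and the error metric (RMSE) and leaves out the model fitting. It assumes records are dicts with the instant record index, and the helper names and the 80/20 default are illustrative choices.

```python
from math import sqrt


def chronological_split(records, train_fraction=0.8):
    """Sort records by 'instant' and split them into earlier (train) and later (test) parts."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda r: r["instant"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

Splitting by time rather than at random avoids training on hours that come after the hours being predicted.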