Business Analytics Project - Group 06


Business Analytics Report (2019-2020)

Post Graduate Diploma in Management

Analysis on Bicycle Rent Data

Submitted By:

Group: 06

Name Roll No
Priyanka Sahu B032
Rahul Vaidyanathan B033
Raghav Rathi B034
Raunak Roy B035
Rohan Chopra B036
Sagnik Bhattacharya B037

TABLE OF CONTENTS

1. Letter of Transmittal
2. Introduction / Executive Summary
3. Data Description
4. Objective
5. Results and Discussion
6. Graphs and Comments
7. Conclusion

Letter of Transmittal:

March 07, 2020

Prof. Puneet Kumar.


Professor, Business Analytics
Department of Management Studies
NMIMS, Bengaluru

Subject: Letter of Transmittal

Respected Sir,

With due respect, we, the undersigned students of PGDM 10 at NMIMS, Bengaluru, have prepared a report on a
dataset about bicycle rental systems.

Though we are still on the learning curve, this report has given us insight into how to present a
business report using analytical tools and into the basic elements of structuring such a report.

Lastly, we thank you for your valuable guidance.

Yours sincerely,
Priyanka Sahu B032
Raghav Rathi B033
Rahul Vaidyanathan B034
Raunak Roy B035
Rohan Chopra B036
Sagnik Bhattacharya B037

Executive Summary / Introduction:

Bike sharing systems are a means of renting bicycles in which the process of obtaining membership, rental,
and bike return is automated via a network of kiosk locations throughout a city. Using these systems,
people are able to rent a bike from one location and return it to a different place on an as-needed basis.
Currently, there are over 500 bike-sharing programs around the world.

The data generated by these systems makes them attractive to researchers because the duration of travel,
departure location, arrival location, and time elapsed are explicitly recorded. Bike sharing systems therefore
function as a sensor network, which can be used for studying mobility in a city.

The typical bike share has several defining characteristics and features, including station-based bikes
and payment systems, membership and pass fees, and per-hour usage fees. Programs are generally
intuitive enough to facilitate a manageable learning curve for novice users. And, despite
some variation, the differences are usually small enough to prevent confusion when a regular user of
one city’s bike share uses another city’s program for the first time.

Data Description:
• instant: Record index
• dteday: Date
• season: Season (1: spring, 2: summer, 3: fall, 4: winter)
• yr: Year (Past: 2011, Present: 2012)
• mnth: Month (1 to 12)
• hr: Hour (0 to 23)
• holiday: Whether the day is a holiday or not (extracted from Holiday Schedule)
• weekday: Day of the week
• workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
• weathersit: Weather situation (extracted from Freemeteo)
    1: Clear, Few clouds, Partly cloudy
    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
• temp: Normalized temperature in Celsius.
    The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
• atemp: Normalized feeling temperature in Celsius.
    The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
• hum: Normalized humidity. The values are divided by 100 (max)
• windspeed: Normalized wind speed. The values are divided by 67 (max)
• registered: Count of registered users
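
The normalizations described above can be inverted to recover values in their original units. The following is a minimal sketch, assuming only the constants stated in the variable descriptions; the function names and the sample record are hypothetical and are not part of the data set.

```python
# Constants taken from the variable descriptions above.
TEMP_MIN, TEMP_MAX = -8.0, 39.0
ATEMP_MIN, ATEMP_MAX = -16.0, 50.0
HUM_MAX = 100.0
WINDSPEED_MAX = 67.0


def denormalize_min_max(value, lo, hi):
    """Invert (t - t_min) / (t_max - t_min) to recover the original value."""
    return value * (hi - lo) + lo


def to_original_units(temp, atemp, hum, windspeed):
    """Convert one normalized record back to original units (hypothetical helper)."""
    return {
        "temp_c": denormalize_min_max(temp, TEMP_MIN, TEMP_MAX),
        "atemp_c": denormalize_min_max(atemp, ATEMP_MIN, ATEMP_MAX),
        "hum_pct": hum * HUM_MAX,
        "windspeed": windspeed * WINDSPEED_MAX,
    }


# Made-up sample record, for illustration only.
record = to_original_units(temp=0.5, atemp=0.5, hum=0.8, windspeed=0.2)
print(record)
```

A normalized temp of 0 therefore corresponds to -8 °C and a normalized temp of 1 to 39 °C, which matters when the regression coefficients are interpreted later.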

The data that was used includes the following variables:

Feature details:
- temp: Normalized temperature in Celsius (x)
- atemp: Normalized feeling temperature in Celsius
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
- registered: Count of rentals by registered users, recorded hourly (y)
Type: Numerical (all features are quantitative)
Qualitative features are excluded because they did not add any value to the model.

Objective:

The objective of this report is to analyse the bike sharing data and generate insight into the factors
affecting bike rentals.
We perform regression analysis to predict the hourly bike rental count from the environmental and
seasonal settings. Initially, we found that bike rentals by registered users (registered) are more strongly
correlated with temperature (temp, r = 0.335) than with any other candidate variable, which is why we
decided to perform Simple Linear Regression on these two parameters.

             temp       atemp      hum        windspeed  registered
temp         1
atemp        0.987672   1
hum          -0.06988   -0.05192   1
windspeed    -0.02313   -0.06234   -0.2901    1
registered   0.335361   0.332559   -0.27393   0.082321   1
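
Each entry in the matrix above is a Pearson correlation coefficient between two columns. The following is a minimal sketch of how such a matrix can be computed; the helper names and the toy values are assumptions made for illustration and are not taken from the report's data or tooling.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either column is constant.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative values, NOT taken from the report's data set.
toy = {"temp": [0.2, 0.4, 0.6, 0.8], "registered": [40, 90, 160, 210]}
matrix = correlation_matrix(toy)
print(matrix["temp"]["registered"])
```

The matrix is symmetric with ones on the diagonal, which is why only the lower triangle is shown in the table.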

Results and Discussions:

First, we performed descriptive analysis to get an idea of the distribution and correlation
of the following variables:
registered – rented bike count (response variable)
temp – temperature (explanatory variable)
atemp – feeling temperature (explanatory variable)
hum – humidity (explanatory variable)
windspeed – speed of wind (explanatory variable)

We performed correlation analysis to identify the features that are most strongly correlated with our outcome variable.

The variable atemp is ignored because it is highly correlated with temp (r = 0.988), which means it would
explain largely the same variance in the outcome variable as temp does (multicollinearity).
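
One common way to quantify this kind of multicollinearity is the variance inflation factor (VIF). The sketch below is not part of the original analysis; it assumes a model containing only the two predictors temp and atemp, in which case the VIF reduces to 1 / (1 - r^2), and it uses the correlation 0.987672 from the matrix above.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a model with exactly two predictors
    whose mutual Pearson correlation is r."""
    r_squared = r * r
    if r_squared >= 1.0:
        raise ValueError("perfectly collinear predictors: VIF is undefined")
    return 1.0 / (1.0 - r_squared)


# Correlation between temp and atemp, taken from the correlation matrix.
R_TEMP_ATEMP = 0.987672

vif = vif_two_predictors(R_TEMP_ATEMP)
print(round(vif, 1))
```

The result is roughly 41, far above the value of 10 that is often used as a rule-of-thumb warning threshold, which supports dropping one of the two temperature variables.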

We check for a linear relationship between the outcome and each explanatory variable with the help of scatter plots.

Temp vs registered

[Figure: scatter plot of registered (y-axis) against temp (x-axis)]

Comments: Positive relationship between temperature and registered users is seen in the above graph

Atemp vs registered

[Figure: scatter plot of registered (y-axis) against atemp (x-axis)]

Comments: For both measures of temperature, the temperature in Celsius and the feels-like temperature in
Celsius, the relationship with the number of bike rentals looks similar. A linear model is used to present
the trend, although the correlation is small in both cases. The obvious question is whether both variables
should be used, since they are correlated with each other. Given the observed linear trend (even if it is
fairly weak), we can assume that temperature is a reasonable predictor to include in the linear model.

Humidity vs registered

[Figure: scatter plot of registered (y-axis) against hum (x-axis)]

Comments: The relationship between air humidity and bike rentals is not linear.

Instead, both low and high humidity are associated with fewer bike rentals, with the peak of rentals
somewhere in between. Since the relationship is not linear, we can expect this variable to perform
poorly in a linear model.

Windspeed vs registered

[Figure: scatter plot of registered (y-axis) against windspeed (x-axis)]

Comments: For the most part, the higher the wind speed, the fewer the bike rentals. As with temperature,
we can expect that this variable can be used during the modelling stage.

Hour of the day vs Registered

[Figure: scatter plot of registered (y-axis) against hour of the day (x-axis)]

Comments: The hour of the day shows the times at which people are most likely to rent. These are the office
commuting hours, or the morning and evening hours used for leisure rides.

Temp vs atemp

[Figure: scatter plot of atemp (y-axis) against temp (x-axis)]

[Figure: Windspeed]

[Figure: Humidity]

Log (Windspeed)

[Figure: Log(Windspeed) on the y-axis (0 to -1.2), x-axis from 0 to 20000]

Log (Hum)

[Figure: Log(Hum) on the y-axis (0 to -1.2), x-axis from 0 to 20000]

Feature Extraction/Engineering: From the correlation analysis and the linear patterns between the outcome
and explanatory variables, we can conclude the following:

- The atemp predictor is highly correlated with temp, so we can drop it (multicollinearity).

- The humidity predictor has a weak correlation with the outcome variable registered (-0.27) and does not
show a linear relationship with it, so we can drop it too.

- The windspeed predictor also has a very low correlation (0.08), which means it explains very little
variance in the outcome variable (registered), so we can drop it too.

We are finally left with temperature (temp) as the explanatory variable and registered as the outcome
variable for our regression analysis.

We will also include humidity and windspeed later in our model to check whether it improves. Although
these variables have a low correlation with the outcome variable, in combination with temperature they
might achieve better accuracy, because each explanatory variable can explain part of the variance in the
outcome variable (bike count).

Since we have just one explanatory variable, we perform Simple Linear Regression, as in our Initial
Report. The estimated regression equation is:

Registered = 22.7771 + 263.61 × temp

Regression Statistics
Multiple R          0.335361
R Square            0.112467
Adjusted R Square   0.112416
Standard Error      142.5963
Observations        17379

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   22.77714       2.994092         7.607364   2.94E-14   16.90842    28.64586
temp        263.6079       5.617601         46.92535   0          252.5968    274.6189
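
The intercept and slope in a simple linear regression follow from the ordinary least squares formulas, and several figures in the output above can be cross-checked against each other. The sketch below is illustrative only: the function name and the toy data are assumptions, while the constants in the cross-checks are copied from the tables above.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up, exactly linear toy data (NOT the report's data set).
intercept, slope = fit_simple_ols([0.0, 0.5, 1.0], [20.0, 150.0, 280.0])
print(intercept, slope)

# Cross-checks using numbers reported in the tables above.
MULTIPLE_R = 0.335361      # equals |corr(temp, registered)| for one predictor
r_squared_check = MULTIPLE_R ** 2          # should match R Square (0.112467)
t_stat_check = 263.6079 / 5.617601         # coefficient / standard error
print(round(r_squared_check, 6), round(t_stat_check, 3))
```

With a single predictor, Multiple R is the absolute value of the correlation between temp and registered, so R Square is simply its square, and the t Stat column is the coefficient divided by its standard error.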

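Because temp is normalized, a 1-unit increase in temp corresponds to the full range from -8 °C to 39 °C and
is associated with about 263.61 more registered rentals, or roughly 5.61 more per degree Celsius (263.61 / 47).

The intercept tells us that the model predicts about 22.78 registered rentals when temp is 0, that is, at
the minimum temperature of -8 °C.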
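
To use the fitted equation with a temperature in degrees Celsius, the temperature first has to be normalized in the same way as the temp variable. The following is a minimal sketch, assuming the coefficients from the table above and the normalization constants from the data description; the function name is hypothetical.

```python
# Coefficients from the simple regression output above.
INTERCEPT = 22.77714
SLOPE = 263.6079

# Normalization constants for temp from the data description.
T_MIN, T_MAX = -8.0, 39.0


def predict_registered(celsius):
    """Predicted registered rentals for a temperature given in degrees Celsius."""
    temp = (celsius - T_MIN) / (T_MAX - T_MIN)  # same min-max scaling as the data
    return INTERCEPT + SLOPE * temp


for celsius in (-8.0, 15.5, 39.0):
    print(celsius, round(predict_registered(celsius), 2))
```

Given that R Square is only about 0.11, such point predictions carry a large error (the standard error of the regression is about 142.6), so they are best read as average tendencies rather than precise forecasts.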

If we perform a multiple regression of registered on temp, atemp, windspeed and humidity, the quality of
the regression model improves, as R Square increases to 17.75%.

Regression Statistics
Multiple R          0.421355
R Square            0.17754
Adjusted R Square   0.177351
Standard Error      137.2811
Observations        17379
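
The Adjusted R Square values in both regression outputs can be reproduced from R Square, the number of observations and the number of predictors. This is a small sketch of the standard formula; the function name is an assumption, and the inputs are copied from the regression statistics in this report.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 17379

# Simple regression: one predictor (temp).
adj_simple = adjusted_r_squared(0.112467, N_OBS, 1)

# Multiple regression: four predictors (temp, atemp, hum, windspeed).
adj_multiple = adjusted_r_squared(0.17754, N_OBS, 4)

print(round(adj_simple, 6), round(adj_multiple, 6))
```

Because the sample is large relative to the number of predictors, the adjustment is tiny in both cases, so R Square and Adjusted R Square are nearly identical.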

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   135.5679       5.697697         23.79346   3.6E-123   124.3999    146.736
temp        38.18567       35.69021         1.06992    0.28467    -31.7707    108.1421
atemp       240.8654       40.03678         6.0161     1.82E-09   162.3893    319.3415
hum         -194.611       5.66267          -34.3674   1.8E-250   -205.711    -183.512
windspeed   35.29329       9.15325          3.855821   0.000116   17.352      53.23458

Registered = 135.57 + 38.19 × temp + 240.87 × atemp - 194.61 × hum + 35.29 × windspeed
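
The coefficient table above also shows which predictors are statistically distinguishable from zero. The sketch below recomputes the t statistics and approximate 95% confidence intervals from the reported coefficients and standard errors. It assumes a normal critical value of 1.96 in place of the exact t value (a reasonable approximation for 17379 observations), so the interval bounds differ slightly from the table.

```python
Z_95 = 1.96  # assumption: normal approximation to the t critical value

# (coefficient, standard error) pairs copied from the table above.
COEFFICIENTS = {
    "Intercept": (135.5679, 5.697697),
    "temp": (38.18567, 35.69021),
    "atemp": (240.8654, 40.03678),
    "hum": (-194.611, 5.66267),
    "windspeed": (35.29329, 9.15325),
}


def summarize(coef, std_err):
    """Return (t statistic, lower bound, upper bound, interval excludes zero)."""
    t_stat = coef / std_err
    lower = coef - Z_95 * std_err
    upper = coef + Z_95 * std_err
    excludes_zero = lower > 0 or upper < 0
    return t_stat, lower, upper, excludes_zero


summary = {name: summarize(*values) for name, values in COEFFICIENTS.items()}
for name, (t_stat, lower, upper, significant) in summary.items():
    print(f"{name:10s} t={t_stat:9.4f}  CI=({lower:9.3f}, {upper:9.3f})  significant={significant}")
```

Only the temp interval contains zero, which matches its p-value of 0.28: once atemp is in the model, temp adds little separate information, consistent with the multicollinearity between the two variables noted earlier.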

Conclusion:

From Simple Linear Regression we found a coefficient of determination (R-squared) of 0.1125, so we can say
that 11.25% of the variance in bike registrations is explained by variation in temperature. From the
multiple regression we found that 17.7% (Adjusted R-squared) of the variance in bike registrations is
explained by variation in temperature, feeling temperature, humidity and windspeed. This is still not a
perfect model, but the error was reduced slightly (the standard error fell from 142.60 to 137.28). We also
tried different combinations of features to build the model, but the best one used Temperature (temp),
Humidity (hum) and Windspeed with bike registrations (registered) as the outcome variable. Our suggestion
is that an increase in sample size will help the analysis, since we will get better insight into the data
and eventually mitigate any overfitting issue in the regression analysis.

Collecting more data should help us improve our model and reduce the residual errors. We also tried a
non-linear (logarithmic) model for windspeed, but it did not achieve the desired result; a polynomial model
might work better. The finding that the bike rental count is correlated with temperature makes sense,
because people tend to rent bikes more in pleasant weather; this is why temperature has the highest
correlation with registered rentals among the variables considered (0.335). This analysis will be useful
to bike rental companies in matching the supply of bikes to demand. They can adjust their stock to meet
demand during different seasons.
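
As a sketch of the polynomial idea mentioned above, a quadratic term lets a model capture the kind of rise-then-fall pattern observed for humidity. The code below fits y = a + b*x + c*x^2 by least squares using the normal equations. It was not part of the original analysis, the function name is hypothetical, and the toy data is made up to illustrate the shape only.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    # Power sums S_k = sum(x**k) and moments T_k = sum(y * x**k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented system [[S0 S1 S2 | T0], [S1 S2 S3 | T1], [S2 S3 S4 | T2]].
    rows = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        if abs(rows[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


# Made-up inverted-U toy data (NOT the report's data set).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [100.0, 250.0, 300.0, 250.0, 100.0]
a, b, c = fit_quadratic(xs, ys)
peak_x = -b / (2 * c)  # location of the maximum when c < 0
print(a, b, c, peak_x)
```

A negative coefficient on the squared term indicates a peak at an intermediate value, which a straight-line fit cannot represent.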

