Project - Cold Storage Study

Problem 1
1. Project Objective:
The objective of this report is to explore the Cold Storage Case Study in R and generate
insights about the data set. The exploration consists of the following:

>Importing the dataset in R
>Understanding the structure of the dataset
>Running an analysis on the data and checking distribution patterns
>Graphical exploration
>Descriptive statistics

2 Understanding Problem 1 data

A Cold Storage was started in 2016 which stores different types of milk products. The Storage
needs to maintain a strict temperature range of 2 - 4 C. If this temperature range is not
maintained by the maintenance firm, the AMC company has to pay a penalty.

The penalty was set at 10% of the AMC (annual maintenance cost) fee if the probability of the
temperature going outside the 2 - 4 C range was above 2.5% and less than 5%. If it exceeds 5%,
the penalty would be 25% of the AMC fee.

3 Data Analysis – A step by step data exploration consists of the following steps:

1. Environment Set up and Data Import
2. Variable Identification
3. Segregate Data
4. Graphic Analysis
5. Find mean cold storage temperature for Summer, Winter and Rainy Season
6. Find overall mean for the full year
7. Find Standard Deviation for the full year
8. Assume Normal distribution, what is the probability of temperature having fallen below 2 C?
9. Assume Normal distribution, what is the probability of temperature having gone above 4 C?
10. What will be the penalty for the AMC Company?

Feature Exploration
3.1 Environment Set up and Data Import
3.1.1 ## Set working directory
> setwd("C:/Users/satyam.sharma/Desktop/R programming")

3.1.2 ### install package for reading CSV files
> install.packages("readr")

3.1.3 ### import data set
### read.csv() is part of base R, so it works even before readr is loaded
> Cold_storage = read.csv("Cold_Storage_Temp_Data.csv")

3.1.4 ### open library for reading the csv files
### only needed for readr functions such as read_csv()
> library(readr)
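
The str() output in this report lists Season and Month as factors. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, so reproducing that structure on a current R version needs stringsAsFactors = TRUE. A minimal sketch, assuming a small stand-in CSV written to a temporary file (the rows and the names demo_path and demo_data are illustrative only, not the real Cold_Storage_Temp_Data.csv):

```r
# Illustrative stand-in for the real CSV file; the rows are for the demo only.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Season,Month,Date,Temperature",
  "Winter,Jan,1,2.4",
  "Winter,Jan,2,2.3",
  "Summer,Feb,11,4.0",
  "Rainy,Jun,1,3.1"
), demo_path)

# stringsAsFactors = TRUE converts text columns to factors,
# which matches the "Factor w/ 3 levels" lines in the str() output.
demo_data <- read.csv(demo_path, stringsAsFactors = TRUE)
str(demo_data)
```

With the real file, the same call would simply use "Cold_Storage_Temp_Data.csv" in place of demo_path.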

3.2. Variable identification

### check structure of the data
> str(Cold_storage)
'data.frame': 365 obs. of 4 variables:
 $ Season     : Factor w/ 3 levels "Rainy","Summer",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Month      : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Date       : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...

### Know class of the data
> class(Cold_storage)
[1] "data.frame"

### Know dimension of the data
> dim(Cold_storage)
[1] 365 4

### Show the first 6 rows of the data
### Note: the output below comes from the Problem2 data frame (the 35-day sample used in Problem 2);
### use head(Cold_storage) for the full-year data

> head(Problem2)
  Season Month Date Temperature
1 Summer   Feb   11         4.0
2 Summer   Feb   12         3.9
3 Summer   Feb   13         3.9
4 Summer   Feb   14         4.0
5 Summer   Feb   15         3.8
6 Summer   Feb   16         4.0

### Show the last 6 rows of the data

> tail(Problem2)
   Season Month Date Temperature
30 Summer   Mar   12         3.8
31 Summer   Mar   13         4.2
32 Summer   Mar   14         4.2
33 Summer   Mar   15         3.8
34 Summer   Mar   16         3.9
35 Summer   Mar   17         3.9

### To understand the data better we ran a summary of the data
For this 35-day sample the Temperature mean is 3.974 and the median is 3.9

> summary(Problem2)
    Season    Month        Date        Temperature
 Summer:35   Feb:18   Min.   : 1.0   Min.   :3.800
             Mar:17   1st Qu.: 9.5   1st Qu.:3.900
                      Median :14.0   Median :3.900
                      Mean   :14.4   Mean   :3.974
                      3rd Qu.:19.5   3rd Qu.:4.100
                      Max.   :28.0   Max.   :4.600

3.3: Segregate Data: To solve our problem we can segregate the Season and Temperature
columns from the main data frame as “Seasons”

> Seasons = Cold_storage[, c("Season", "Temperature")]

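4. # Graphic pattern recognition

### hist() and boxplot() are base R graphics; ggplot2 is loaded here but is not required for them
> install.packages("ggplot2")
> library(ggplot2)

> hist(Seasons$Temperature, col = "Red")
> boxplot(Seasons$Temperature, horizontal = TRUE, col = "Green", main = "Boxplot of Temperature")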
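
Because the next step compares seasons, a per-season boxplot may be more informative than a single box. A minimal sketch using base R's formula interface, assuming a small made-up data frame named demo (the values are illustrative only, not the report's data):

```r
# Made-up temperatures for illustration only; not the report's data.
demo <- data.frame(
  Season = factor(c("Summer", "Summer", "Winter", "Winter", "Rainy", "Rainy")),
  Temperature = c(3.2, 3.4, 2.6, 2.8, 3.0, 3.1)
)

# The formula Temperature ~ Season draws one box per season on a shared axis.
box_stats <- boxplot(Temperature ~ Season, data = demo,
                     horizontal = TRUE, col = "green",
                     main = "Boxplot of Temperature by Season")

# boxplot() invisibly returns the statistics it plotted, including group names.
box_stats$names
```

With the report's data, the same call would take data = Seasons.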

5. Find mean cold storage temperature for Summer, Winter and Rainy Season

We can find the mean of each season with the by() function. by() is part of base R, so the
dplyr package is not needed for this step.

> by(Cold_storage, INDICES = Cold_storage$Season, FUN = summary)

Mean
Summer = 3.153
Winter = 2.701
Rainy Season = 3.039
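
If only the per-season means are needed, tapply() or aggregate() return them directly instead of a full summary per group. A minimal sketch, assuming a small made-up data frame named demo (illustrative values, not the report's data):

```r
# Made-up temperatures for illustration only; not the report's data.
demo <- data.frame(
  Season = factor(c("Summer", "Summer", "Winter", "Winter", "Rainy", "Rainy")),
  Temperature = c(3.2, 3.4, 2.6, 2.8, 3.0, 3.2)
)

# tapply() splits Temperature by Season and applies mean() to each group,
# returning a named vector with one mean per season.
season_means <- tapply(demo$Temperature, demo$Season, mean)
season_means

# aggregate() gives the same result as a data frame.
aggregate(Temperature ~ Season, data = demo, FUN = mean)
```

With the report's data, the equivalent call would be tapply(Cold_storage$Temperature, Cold_storage$Season, mean).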
6. Find overall mean for the full year:

> mean(Seasons$Temperature)
[1] 2.96274

7. Find Standard Deviation for the full year:

> sd(Seasons$Temperature)
[1] 0.508589

8. Assume Normal distribution, what is the probability of temperature having fallen below
2 C?

Ans: 2.92% probability of temperature having fallen below 2 C

> pnorm(2, mean = 2.96274, sd = 0.508589)
[1] 0.02918142

9. Assume Normal distribution, what is the probability of temperature having gone above
4 C?

Ans: 2.07% probability of temperature having gone above 4 C

> 1 - pnorm(4, mean = 2.96274, sd = 0.508589)
[1] 0.02070079
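
The two tail probabilities can also be read as standard normal probabilities after converting each limit to a z-score. A short sketch using the mean and standard deviation reported above; the variable names are illustrative:

```r
mu    <- 2.96274   # overall mean reported above
sigma <- 0.508589  # overall standard deviation reported above

# Lower tail: P(Temperature < 2)
p_below <- pnorm(2, mean = mu, sd = sigma)

# Same probability via the z-score of the lower limit
z_low     <- (2 - mu) / sigma
p_below_z <- pnorm(z_low)

# Upper tail: P(Temperature > 4); lower.tail = FALSE avoids computing 1 - pnorm()
p_above <- pnorm(4, mean = mu, sd = sigma, lower.tail = FALSE)

c(below_2 = p_below, above_4 = p_above)
```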

10. What will be the penalty for the AMC Company?

The combined probability of the temperature being below 2 C or above 4 C is 2.92% + 2.07% = 4.99%.
This lies in the penalty range of 2.5% to 5%, so the penalty would be 10% of the AMC fee.
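
The penalty rule can be written as a small function. This is a sketch only: penalty_rate is a hypothetical helper, and the handling of the boundary cases (no penalty at or below 2.5%, and exactly 5% treated as the 10% band) is an assumption, since the problem statement does not define them.

```r
# Hypothetical helper: maps the probability of leaving the 2 - 4 C range
# to the penalty as a fraction of the AMC fee.
# Assumption: <= 2.5% means no penalty; exactly 5% falls in the 10% band.
penalty_rate <- function(p_outside) {
  if (p_outside > 0.05) {
    0.25
  } else if (p_outside > 0.025) {
    0.10
  } else {
    0
  }
}

# Tail probabilities reported in steps 8 and 9
p_total <- 0.02918142 + 0.02070079
p_total
penalty_rate(p_total)
```

Because the combined probability sits very close to the 5% boundary, a small change in the estimated mean or standard deviation could move it into the 25% band.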

============================================================================

Problem 2

1. Project Objective:
The objective of this report is to explore a month of Cold Storage data in R and find statistical
evidence on whether there is a need for some corrective action in the Cold Storage Plant or not.

>Importing the dataset in R
>Understanding the structure of the dataset
>Running an analysis on the data, checking distribution patterns and doing hypothesis testing
>Descriptive statistics
>Giving inference

2 Understanding Problem 2 data

Cold Storage is getting complaints about dairy products going sour and smelling bad. The supervisor
takes out temperature data for 35 days and decides to maintain the temperature at 3.9 C or
below.

With the given data we have to find out whether some corrective action is required in the Cold
Storage Plant.

3. Data Analysis: A step by step data exploration consists of the following steps:

1. Environment Set up and Data Import
2. Variable Identification
3. Segregate Data
4. Graphic Analysis
5. Which Hypothesis test shall be performed to check if corrective action is needed at the cold
storage plant?
6. State the Hypothesis, perform hypothesis test and determine p-value
7. Give inference

Feature Exploration
3.1 Environment Set up and Data Import
3.1.1 ## Set working directory
> setwd("C:/Users/satyam.sharma/Desktop/R programming")

3.1.2 ### install package for reading CSV files
> install.packages("readr")

3.1.3 ### import data set
### read.csv() is part of base R, so it works even before readr is loaded
> Problem2 = read.csv("Cold_Storage_Mar2018.csv")

3.1.4 ### open library for reading the csv files
### only needed for readr functions such as read_csv()
> library(readr)

3.2. Variable identification

> str(Problem2)
'data.frame': 35 obs. of 4 variables:
 $ Season     : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
 $ Month      : Factor w/ 2 levels "Feb","Mar": 1 1 1 1 1 1 1 1 1 1 ...
 $ Date       : int 11 12 13 14 15 16 17 18 19 20 ...
 $ Temperature: num 4 3.9 3.9 4 3.8 4 4.1 4 3.8 3.9 ...

> class(Problem2)
[1] "data.frame"

> dim(Problem2)
[1] 35 4

> summary(Problem2)
    Season    Month        Date        Temperature
 Summer:35   Feb:18   Min.   : 1.0   Min.   :3.800
             Mar:17   1st Qu.: 9.5   1st Qu.:3.900
                      Median :14.0   Median :3.900
                      Mean   :14.4   Mean   :3.974
                      3rd Qu.:19.5   3rd Qu.:4.100
                      Max.   :28.0   Max.   :4.600

3.4 ### Segregate data in the form of Season and Temperature

> Seasons = Problem2[, c("Season", "Temperature")]

### Find Standard Deviation or Sigma of Temperature

> SD = sd(Seasons$Temperature)
> SD
[1] 0.159674

### Find Mean of Temperature

> Mean = mean(Seasons$Temperature)
> Mean
[1] 3.974286

4. Which Hypothesis test shall be performed to check if corrective action is needed at
the cold storage plant? Justify your answer.

Since the sample size (35) is more than 30, we can conduct a Z test. A t-test is also applicable
because the population standard deviation is unknown and is estimated from the sample. Either way,
the decision uses the test statistic, the level of significance alpha and the p-value.

5. State the Hypothesis, perform hypothesis test and determine p-value

To find out whether the cold storage temperature breached 3.9 C we have to formulate the null
hypothesis (Ho) and the alternative hypothesis (H1) as follows:

Ho: Mean temperature is less than or equal to 3.9
H1: Mean temperature is greater than 3.9

As H1 is "greater than 3.9", this is a one-tailed (right-tail) test.
Since we don't know whether the temperature is normally distributed or not, we rely on the central
limit theorem, which is commonly applied when the sample size is above 30.

Z or t stat = (Xbar - Mu) / (SD / sqrt(N))

Xbar = Sample mean we calculated above, that is 3.974286
Mu = 3.9
SD = 0.159674
N = 35

By substituting these figures in R we get a Z stat value of 2.752359

> Xbar = Mean
> Mu = 3.9
> Z = (Xbar - Mu) / (SD / sqrt(35))
> Z
[1] 2.752359
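
The same statistic can be reproduced from the summary figures alone, which is handy when the raw data frame is not at hand. A minimal sketch using the rounded values reported above; the variable names are illustrative:

```r
xbar <- 3.974286  # sample mean reported above
mu0  <- 3.9       # hypothesised maximum mean temperature
s    <- 0.159674  # sample standard deviation reported above
n    <- 35        # number of days in the sample

# Standard error of the mean, then the standardised distance of xbar from mu0
se <- s / sqrt(n)
z  <- (xbar - mu0) / se
z
```

Because the inputs are rounded, the result may differ from the reported Z in the last decimal places.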

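To find the p-value we can use the R command

> pnorm(-abs(Z))
[1] 0.002958384

For a right-tail test with a positive Z this equals 1 - pnorm(Z).

Or use the Excel function NORM.S.DIST with the Z value. We have to take 1 minus the result since it is the positive (right) tail

1 - NORM.S.DIST(2.752359, TRUE) = 0.002958380972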
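
Because the standard deviation is estimated from the sample, a t distribution with N - 1 degrees of freedom is a common alternative to the normal approximation. The sketch below compares the two one-sided p-values using only the test statistic and sample size reported above; the t-based figure is an addition for comparison and is not part of the original analysis.

```r
z <- 2.752359  # test statistic reported above
n <- 35        # sample size

# Right-tail p-value under the standard normal distribution
p_z <- pnorm(z, lower.tail = FALSE)

# Right-tail p-value under a t distribution with n - 1 degrees of freedom
p_t <- pt(z, df = n - 1, lower.tail = FALSE)

c(normal = p_z, t = p_t)
```

The t distribution has heavier tails, so its p-value is somewhat larger, but both are far below the 10% significance level used in this problem.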

Now we compare the p-value with alpha, the level of significance, to see whether the p-value is
greater or lower than alpha.

In the problem statement it is given that alpha is 0.1 or 10%. To get the matching critical Z
value we use NORM.S.INV in Excel, which gives 1.281551564

Z critical = NORM.S.INV(0.9) = 1.281551564

We can also calculate the X critical value by substituting the critical Z value and the other
values in the Z stat formula and solving for Xbar.
X critical = Mu + Z critical * SD / sqrt(N) = 3.934 (approximately)
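
In R, qnorm() plays the role of Excel's NORM.S.INV. A minimal sketch of both critical values, using the figures reported above; the variable names are illustrative:

```r
alpha <- 0.10      # level of significance from the problem statement
mu0   <- 3.9       # hypothesised maximum mean temperature
s     <- 0.159674  # sample standard deviation reported above
n     <- 35        # sample size

# Critical Z for a right-tail test: the 90th percentile of the standard normal
z_crit <- qnorm(1 - alpha)

# Critical sample mean: the Z formula solved for Xbar at Z = z_crit
x_crit <- mu0 + z_crit * s / sqrt(n)

c(z_critical = z_crit, x_critical = x_crit)
```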
Inference:

The p-value (0.00296) is smaller than alpha (0.1). It
means the test statistic falls in the critical region of the normal distribution and we reject the
null hypothesis.

The Z test confirms the finding: the Z value of 2.75 is higher than the critical Z value of 1.28.
We reject the null hypothesis and accept the alternative hypothesis that the mean temperature had
indeed exceeded 3.9 C in the cold storage.
Similarly, the sample mean of 3.974 is higher than our calculated X critical value of 3.934, so it
falls in the critical area.

All of this indicates that the mean temperature had risen above 3.9 C and the storage
needs corrective action.
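
The three comparisons used in the inference are equivalent ways of stating the same right-tail decision rule. A short sketch that checks all three from the reported summary figures; the names reject_by_p, reject_by_z and reject_by_x are illustrative:

```r
alpha <- 0.10      # level of significance
xbar  <- 3.974286  # sample mean
mu0   <- 3.9       # hypothesised maximum mean temperature
s     <- 0.159674  # sample standard deviation
n     <- 35        # sample size

z <- (xbar - mu0) / (s / sqrt(n))

# 1. p-value approach: reject Ho when the right-tail p-value is below alpha
reject_by_p <- pnorm(z, lower.tail = FALSE) < alpha

# 2. Test-statistic approach: reject Ho when Z exceeds the critical Z
reject_by_z <- z > qnorm(1 - alpha)

# 3. Critical-mean approach: reject Ho when the sample mean exceeds X critical
reject_by_x <- xbar > mu0 + qnorm(1 - alpha) * s / sqrt(n)

c(p_value = reject_by_p, z_stat = reject_by_z, sample_mean = reject_by_x)
```

All three checks agree because each is an algebraic rearrangement of the same inequality.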
