Project of Statistical Packages Excel Work
SUBMITTED TO
MA’AM WAJIHA NASIR
SUBMITTED BY
MARYAM HINA
BS19034
SEMESTER
5TH
DEPARTMENT OF STATISTICS
GOVERNMENT COLLEGE WOMEN
UNIVERSITY, SIALKOT
Project of statistical packages using excel
Descriptive statistics
Descriptive statistics are brief descriptive coefficients that summarize a given data set,
which can be either a representation of the entire population or a sample of a population.
Binomial distribution
The binomial distribution describes the number of successes in n independent trials, where each
trial has only two possible outcomes and the same probability p of success. A binomial
distribution has only 2 parameters, namely n and p.
Procedure on excel: =BINOMDIST(x, n, p, FALSE) gives P(X = x); =BINOMDIST(x, n, p, TRUE)
gives the cumulative probability P(X ≤ x).
Example: n = 15, p = 0.4; find the cumulative binomial distribution with =BINOMDIST(x, 15, 0.4, TRUE).
Interpretation
The table gives the cumulative probability P(X ≤ x) for x = 0, 1, ..., 15.
x P(X ≤ x)
0 0.00047018
1 0.00517203
2 0.027114
3 0.0905019
4 0.21727771
5 0.40321555
6 0.60981316
7 0.78689682
8 0.90495259
9 0.9661667
10 0.99065234
11 0.99807223
12 0.9997211
13 0.99997477
14 0.99999893
15 1
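The tabulated values can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the original Excel work; it assumes n = 15 and p = 0.4 (the value that reproduces the tabulated figures as cumulative probabilities), and the function names are hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of the point probabilities from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Assumed parameters: n = 15 trials, p = 0.4
n, p = 15, 0.4
cdf_table = [(x, binom_cdf(x, n, p)) for x in range(n + 1)]
for x, prob in cdf_table:
    print(x, round(prob, 8))
```

The first helper plays the role of the FALSE (point probability) option and the second the role of the TRUE (cumulative) option.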
Poisson distribution
A Poisson distribution is a probability distribution that is used to show how many times
an event is likely to occur over a specified period.
Procedure on excel: =POISSON.DIST(x, mean, TRUE)
Example
x = 10, mean = 7
so P(X ≤ 10) = 0.901479
Interpretation:
From the above result, we can see that the probability that at most 10 particles enter the
counter is 0.901479.
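As a cross-check of the cumulative Poisson probability, here is a small Python sketch (an illustration outside Excel; the function name is hypothetical). It sums the point probabilities from 0 to x, which is what the cumulative option computes.

```python
from math import exp, factorial

def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson distribution with the given mean."""
    return sum(exp(-mean) * mean**k / factorial(k) for k in range(x + 1))

prob = poisson_cdf(10, 7)
print(round(prob, 6))
```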
Geometric mean
The geometric mean multiplies n values together and raises the product to the
1/n power.
Procedure on excel: =GEOMEAN(data)
Participant 1 2 3 4 5
Reaction time (milliseconds) 287 345 365 298 380
Interpretation
The geometric mean of the reaction times is about 332.97 milliseconds.
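The same calculation can be sketched in Python with the standard library, as a check on the spreadsheet result. This is an illustration only; the variable names are assumptions.

```python
from statistics import geometric_mean

# Reaction times in milliseconds, taken from the table above
reaction_times = [287, 345, 365, 298, 380]

gm = geometric_mean(reaction_times)
print(round(gm, 2))
```

Because the geometric mean can never be smaller than the harmonic mean of the same data, a result below roughly 330.9 for these values would indicate a mistake.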
Harmonic mean
The harmonic mean is calculated by dividing the number of observations by the sum of the
reciprocals of the values in the series. Equivalently, the harmonic mean is the reciprocal of the
arithmetic mean of the reciprocals.
Procedure on excel: =HARMEAN(data)
Participant 1 2 3 4 5
Reaction time (milliseconds) 287 345 365 298 380
Interpretation
The harmonic mean of the reaction times is 330.91 milliseconds.
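A short Python sketch of the harmonic mean for the same data is given below. It is an illustration outside Excel, with assumed variable names, and it computes the value both from the definition and with the standard library.

```python
from statistics import harmonic_mean

# Reaction times in milliseconds, taken from the table above
reaction_times = [287, 345, 365, 298, 380]

# Definition: number of observations divided by the sum of reciprocals
hm_manual = len(reaction_times) / sum(1 / t for t in reaction_times)
hm = harmonic_mean(reaction_times)
print(round(hm_manual, 2), round(hm, 2))
```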
Median
The median is the middle value of an ordered data set. For an even-numbered data set, find the
two values in the middle of the data set: the values at the n/2 and (n/2) + 1 positions. Then,
find their mean.
Procedure on excel: =MEDIAN(data)
Example
Reaction time (milliseconds) 287 298 345 357 365 380
Interpretation
The median reaction time is 351 milliseconds.
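The even-numbered case described above can be sketched in Python as follows. This is a minimal illustration with assumed variable names; the manual calculation mirrors the n/2 and (n/2) + 1 rule.

```python
from statistics import median

reaction_times = [287, 298, 345, 357, 365, 380]

ordered = sorted(reaction_times)
n = len(ordered)
# Positions n/2 and (n/2) + 1 are 1-based, so subtract 1 for Python indexing
manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
med = median(reaction_times)
print(manual_median, med)  # 351.0 351.0
```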
Mode
The mode is the most frequently occurring value in the data set. It’s possible to have no mode,
one mode, or more than one mode.
Procedure on excel: =mode.multi(data)
Participant 1 2 3 4 5 6 7 8 9
Reaction time (milliseconds) 267 345 401 324 401 312 382 298 303
Interpretation
The mode is 401 milliseconds because it occurs twice in the data, more often than any other value.
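A comparable check in Python is sketched below (an illustration, not the Excel procedure). `statistics.multimode` returns every most frequent value, so it also covers data with more than one mode.

```python
from statistics import multimode

reaction_times = [267, 345, 401, 324, 401, 312, 382, 298, 303]

modes = multimode(reaction_times)
print(modes)  # [401]
```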
The variability
Variability summarizes how far apart the data points are. This is important because it tells you whether the
points tend to be clustered around the center or more widely spread out.
Range
The range tells you the spread of your data from the lowest to the highest value in the
distribution. It’s the easiest measure of variability to calculate.
Procedure on excel: =MAX(data) - MIN(data)
Data (minutes) 72 110 134 190 238 287 305 324
Interpretation
The range is 252 minutes.
Interquartile range
The interquartile range gives you the spread of the middle of your distribution.
For any distribution that’s ordered from low to high, the interquartile range contains half of the
values. While the first quartile (Q1) contains the first 25% of values, the fourth quartile (Q4)
contains the last 25% of values.
Procedure on excel: Q3 - Q1
Where, Q1 is equal to =QUARTILE(data, 1)
Q3 is equal to =QUARTILE(data, 3)
Data (minutes) 72 110 134 190 238 287 305 324
Interpretation
Our IQR is 163.5 minutes.
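The range and interquartile range can be sketched in Python as well. This illustration assumes that the "inclusive" method of `statistics.quantiles` matches the interpolation used by the Excel quartile function for this data; the variable names are hypothetical.

```python
from statistics import quantiles

minutes = [72, 110, 134, 190, 238, 287, 305, 324]

# Range: highest value minus lowest value
data_range = max(minutes) - min(minutes)

# Quartiles with the inclusive (interpolating) method
q1, q2, q3 = quantiles(minutes, n=4, method="inclusive")
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 252 128.0 291.5 163.5
```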
Outlier
An outlier is a value that differs significantly from the others in a data set. Outliers can
significantly increase or decrease the mean when they are included in the calculation, because
all values are used to calculate the mean.
Procedure on excel: =IF(OR(value<lower limit, value>upper limit), "outlier", "not outlier"),
where the limits are commonly taken as Q1 - 1.5*IQR and Q3 + 1.5*IQR.
Participant 1 2 3 4 5
Reaction time (milliseconds) 832 345 365 298 380
Interpretation: 832 is an outlier in the data.
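One way to make the outlier check concrete is sketched below in Python. It assumes the common 1.5 × IQR rule for the lower and upper limits, which is a convention chosen for this illustration; the function name is hypothetical.

```python
from statistics import quantiles

def find_outliers(values):
    """Return the values lying outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

reaction_times = [832, 345, 365, 298, 380]
outliers = find_outliers(reaction_times)
print(outliers)  # [832]
```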
Standard deviation
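The standard deviation measures the typical amount of variability in your dataset, that is,
roughly how far each value lies from the mean.
Procedure on excel: =STDEV(data)
Data (minutes) 72 110 134 190 238 287 305 324
Interpretation
The standard deviation of the data is 95.55 minutes. This means that, on average, each value
deviates from the mean by about 95.55 minutes.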
Variance
The variance is the average of squared deviations from the mean. A deviation from the
mean is how far a score lies from the mean.
Procedure on excel: =VAR(data), which gives the sample variance (the sum of squared deviations divided by n - 1)
Data (minutes) 72 110 134 190 238 287 305 324
Interpretation
The variance of the data is 9129.14 square minutes.
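The sample variance and standard deviation can be checked with a short Python sketch (an illustration with assumed variable names). The manual line follows the definition with n - 1 in the denominator.

```python
from statistics import mean, stdev, variance

minutes = [72, 110, 134, 190, 238, 287, 305, 324]

# Sample variance from the definition: squared deviations divided by n - 1
m = mean(minutes)
var_manual = sum((x - m) ** 2 for x in minutes) / (len(minutes) - 1)

var = variance(minutes)
sd = stdev(minutes)
print(round(var, 2), round(sd, 2))
```

The standard deviation is the square root of the variance, which is why the two results describe the same data.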
Charts
A statistical graph or chart is defined as the pictorial representation of statistical data in
graphical form. The statistical graphs are used to represent a set of data to make it easier to
understand and interpret statistical information.
Some common types of charts are:
• Bar chart
• Pie chart
• Scatter chart
• Pareto chart
• Histogram chart
Bar chart
A bar graph is a chart that plots data using rectangular bars or columns that represent
the total number of observations in the data for each category.
[Bar chart. Categories: Comedy, Action, Romance, Drama, SciFi; value axis from 0 to 7; one bar carries the data label 4.]
Pie chart
A pie chart, sometimes called a circle chart, is a way of summarizing a set of nominal data
or displaying the different values of a given variable (e.g. percentage distribution). This type of
chart is a circle divided into a series of segments. Each segment represents a particular category.
Example
Expenses Amount
Rent 7000
Grocery 3000
Transport 800
Current 300
School fee 2000
Savings 1900
[Pie chart of the expenses: Rent 47%, Grocery 20%, Transport 5%, Current 2%, School fee 13%, Savings 13%.]
Scatter plot
A scatterplot is a type of data display that shows the relationship between two numerical
variables. Each member of the dataset gets plotted as a point whose x-y coordinates relate to its
values for the two variables.
Example
Ice Cream Sales vs Temperature
Temperature °C Ice Cream Sales
14.2° $215
16.4° $325
11.9° $185
15.2° $332
18.5° $406
22.1° $522
19.4° $412
25.1° $614
23.4° $544
18.1° $421
22.6° $445
17.2° $408
[Scatter chart of ice cream sales (vertical axis, $0 to $700) against temperature (horizontal axis), with a linear trendline.]
Pareto chart
A Pareto chart is a bar graph. The lengths of the bars represent frequency or cost and are
arranged with longest bars on the left and the shortest to the right. In this way the chart visually
depicts which situations are more significant.
Example
TYPE OF DEFECT FREQUENCY OF DEFECT % OF TOTAL CUMULATIVE %
Button Defect 23 39.0 39.0
Pocket Defect 16 27.1 66.1
Collar Defect 10 16.9 83.1
Cuff Defect 7 11.9 94.9
Sleeve Defect 3 5.1 100.0
Total 59 - -
[Pareto chart. Bars: frequency of Button, Pocket, Collar, Cuff and Sleeve defects (left axis, 0 to 25). Line: cumulative percentage (right axis), 38.98%, 66.10%, 83.05%, 94.92%, 100.00%.]
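The percentage and cumulative percentage columns of a Pareto table can be sketched in Python as follows. This is an illustration using the defect frequencies from the example; the variable names are assumptions.

```python
defects = {
    "Button Defect": 23,
    "Pocket Defect": 16,
    "Collar Defect": 10,
    "Cuff Defect": 7,
    "Sleeve Defect": 3,
}

total = sum(defects.values())
# A Pareto chart orders the categories from most to least frequent
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0
for name, count in ordered:
    running += count
    rows.append((name, count, 100 * count / total, 100 * running / total))

for name, count, pct, cum_pct in rows:
    print(f"{name:15s} {count:3d} {pct:6.2f} {cum_pct:7.2f}")
```

The cumulative column must increase down the table and end at 100%.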
Histogram chart
A histogram is a graphical representation that organizes a group of data points into user-
specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into
an easily interpreted visual by taking many data points and grouping them into logical ranges or
bins.
Example
Salary (in thousands of $) Number of employees
0–10 50
11–20 300
21–30 250
31–40 400
41–50 550
51–60 433
61–70 266
71–80 350
81–90 100
91+ 20
[Histogram titled "Salaries of employees": salary on the horizontal axis, number of employees on the vertical axis (0 to 600).]
T-test
A t-test is a type of inferential statistic used to determine if there is a significant difference
between the means of two groups, which may be related in certain features.
Procedure on excel: Go to Data → Data Analysis → t-Test: Paired Two Sample for Means → OK →
select the two data ranges → set the level of significance (alpha) → OK
Example
pre-treatment post-treatment
143 143
129 112
142 132
154 120
133 127
130 127
147 138
128 128
144 142
142 121
142 138
130 131
129 121
128 125
120 117
114 123
125 138
121 120
144 125
124 129
Solution:
                               pre-treatment   post-treatment
Mean                           133.45          127.85
Variance                       112.47          73.40
Observations                   20              20
Pearson Correlation            0.38
Hypothesized Mean Difference   0.00
df                             19.00
t Stat                         2.31
P(T<=t) one-tail               0.02
t Critical one-tail            1.73
P(T<=t) two-tail               0.03
t Critical two-tail            2.09
Hypothesis:
H0: µd =0
H1: µd ≠0
Level of significance:
α =0.05
Test statistic:
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}
Calculations: t = 2.31
Conclusion:
From the output above, the two-tail p-value is 0.03, which is less than 0.05 (0.03 < 0.05), so
we reject H0 and conclude that there is a significant difference in mean blood pressure before
and after the treatment.
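The paired t statistic can be reproduced from the raw data with a short Python sketch. This is an illustration outside the Excel Data Analysis tool; it computes only the test statistic, not the p-value, and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

pre = [143, 129, 142, 154, 133, 130, 147, 128, 144, 142,
       142, 130, 129, 128, 120, 114, 125, 121, 144, 124]
post = [143, 112, 132, 120, 127, 127, 138, 128, 142, 121,
        138, 131, 121, 125, 117, 123, 138, 120, 125, 129]

# Paired differences d = pre - post
diffs = [a - b for a, b in zip(pre, post)]
n = len(diffs)
d_bar = mean(diffs)   # mean difference
s_d = stdev(diffs)    # sample standard deviation of the differences

# t = (d_bar - mu_d) / (s_d / sqrt(n)) with mu_d = 0 under H0
t_stat = (d_bar - 0) / (s_d / sqrt(n))
print(round(d_bar, 2), round(t_stat, 2))
```

The statistic is compared with the two-tail critical value 2.09 for 19 degrees of freedom reported in the output.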
Regression
Simple linear regression
Simple linear regression is a regression model that estimates the relationship between one
independent variable and one dependent variable using a straight line. Both variables should be
quantitative.
Procedure on excel: Data → Data Analysis → Regression → select the dependent variable (Y range)
and the independent variable (X range) → choose the output location → OK.
Example
The number of pounds of steam used per month by a chemical plant is thought to be
related to the average ambient temperature (in °F) for that month. The past year’s usage and
temperature are shown in the following table:
Month Temp. (°F) Usage
Jan. 21 185.79
Feb. 24 214.47
Mar. 32 288.03
Apr. 47 424.84
May 50 454.58
June 59 539.03
July 68 621.55
Aug. 74 675.06
Sept. 62 562.03
Oct. 50 452.93
Nov. 41 369.95
Dec. 30 273.98
Answer
Regression Statistics
Multiple R 0.999933
R Square 0.999865
Adjusted R Square 0.999852
Standard Error 1.942835
Observations 12
ANOVA
             df   SS         MS         F          Significance F
Regression   1    280583.1   280583.1   74334.36   1.08E-20
Residual     10   37.74609   3.774609
Total        11   280620.9
             Coefficients   Standard Error   t Stat     P-value
Intercept    -6.3355        1.667648         -3.79906   0.003491
Temp.        9.208362       0.033774         272.6433   1.08E-20
Fitted model
\hat{Y} = -6.34 + 9.21X
Interpretation
From the above model we conclude that if temperature increases by one unit (1 °F), then
usage increases by about 9.21 units on average, i.e. the change is in the same direction.
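The least-squares slope and intercept can be checked from the data with a small Python sketch. It is an illustration with assumed variable names, using the usual formulas slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x), and it does not reproduce the full Excel output.

```python
temps = [21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30]
usage = [185.79, 214.47, 288.03, 424.84, 454.58, 539.03,
         621.55, 675.06, 562.03, 452.93, 369.95, 273.98]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(usage) / n

# Sums of cross-products and squares about the means
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, usage))
sxx = sum((x - x_bar) ** 2 for x in temps)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
print(round(intercept, 4), round(slope, 4))
```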
Individual testing
H0: \beta_1 = 0
H1: \beta_1 \neq 0
Level of significance:
\alpha = 0.05
Test statistic
t = \frac{\hat{\beta}_1 - \beta_1}{S.E.(\hat{\beta}_1)}
Calculation
t = 272.64
Conclusion:
As the p-value for the slope is 1.08E-20, which is less than 0.05, we reject H0 and conclude
that there is a significant relationship between temperature and usage.
Overall significance
H0: \beta_1 = 0
H1: \beta_1 \neq 0
Level of significance:
\alpha = 0.05
Test statistic:
F = \frac{MS_{reg}}{MS_{res}}
Calculation
F = 280583.1 / 3.774609 = 74334.36, with Significance F = 1.08E-20
Conclusion:
The calculated F is 74334.36 and its p-value (Significance F) is 1.08E-20, which is less than
0.05, so we reject H0 and conclude that the regression is significant.
Multiple linear regression
Example
An engineer at a semiconductor company wants to model the relationship between
the device HFE ( y) and three parameters: Emitter-RS (x1), Base-RS (x2), and Emitter-to-
Base RS (x3). The data are shown in the following table.
Regression Statistics
Multiple R 0.996842
R Square 0.993695
Adjusted R Square 0.992513
Standard Error 3.479627
Observations 20
ANOVA
             df   SS         MS         F          Significance F
Regression   3    30531.5    10177.17   840.5463   8.31E-18
Residual     16   193.7248   12.1078
Total        19   30725.23
                  Coefficients   Standard Error   t Stat     P-value
Intercept         47.174         49.58148         0.951444   0.355532
Emitter-RS        -9.7352        3.691625         -2.6371    0.017935
Base-RS           0.428287       0.223933         1.912564   0.073876
Emitter to base   18.23745       1.311802         13.9026    2.37E-10
Fitted model
\hat{Y} = 47.17 - 9.74X_1 + 0.43X_2 + 18.24X_3
Interpretation
From the above model we conclude that, holding the other variables fixed, a one-unit increase
in Emitter-RS changes HFE by about 9.74 units in the opposite direction, while a one-unit
increase in Base-RS or in Emitter-to-Base RS changes HFE by about 0.43 or 18.24 units
respectively in the same direction.
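Individual testing
H0: \beta_j = 0
H1: \beta_j \neq 0, for j = 1, 2, 3
Level of significance:
\alpha = 0.05
Test statistic
t = \frac{\hat{\beta}_j - \beta_j}{S.E.(\hat{\beta}_j)}
Calculation
t = -2.64 (Emitter-RS), 1.91 (Base-RS), 13.90 (Emitter-to-Base RS)
Conclusion:
The p-values for Emitter-RS (0.018) and Emitter-to-Base RS (2.37E-10) are less than 0.05, so
we reject H0 for them and conclude that each is significantly related to device HFE. The
p-value for Base-RS (0.074) is greater than 0.05, so we do not reject H0 for Base-RS.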
Overall significance
H0: \beta_1 = \beta_2 = \beta_3 = 0
H1: at least one \beta_j \neq 0
Level of significance:
\alpha = 0.05
Test statistic:
F = \frac{MS_{reg}}{MS_{res}}
Calculation
F = 10177.17 / 12.1078 = 840.55, with Significance F = 8.31E-18
Conclusion:
The calculated F is 840.55 and its p-value (Significance F) is 8.31E-18, which is less than
0.05, so we reject H0 and conclude that the regression is significant.
Level of significance
𝛼 = 0.05
Test statistic
F = \frac{MST}{MSE}
Calculation
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  0.10218    4    0.025545   8.91623    0.000266   2.866081
Columns               0.062867   5    0.012573   4.388598   0.007364   2.71089
Error                 0.0573     20   0.002865
Total                 0.222347   29
Conclusion:
As the p-values (0.000266 for rows and 0.007364 for columns) are less than 0.05, we reject H0.
This means that nozzle type has a significant effect on the shape measurement.
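The F ratios in a two-way ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below is a Python illustration using the SS and df values reported in the table above; the variable names are assumptions, and p-values are not computed.

```python
# Sums of squares and degrees of freedom taken from the ANOVA table
rows_ss, rows_df = 0.10218, 4
cols_ss, cols_df = 0.062867, 5
error_ss, error_df = 0.0573, 20

# Mean square = sum of squares / degrees of freedom
ms_rows = rows_ss / rows_df
ms_cols = cols_ss / cols_df
mse = error_ss / error_df

# F = mean square of the factor / mean square error
f_rows = ms_rows / mse
f_cols = ms_cols / mse
print(round(f_rows, 3), round(f_cols, 3))
```

Each F ratio is then compared with its critical value (F crit), or its p-value with the level of significance.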
ANOVA
Hypothesis
Ho: Leakage voltage does not depend on channel length.
H1: Leakage voltage depends on channel length.
Level of significance
𝛼 = 0.05
Test statistic
F_1 = \frac{MST}{MSE}, F_2 = \frac{MSB}{MSE}
Calculation
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  72.6575    3    24.21917   1.6072     0.239502   3.490295
Columns               90.518     4    22.6295    1.501709   0.262908   3.259167
Error                 180.83     12   15.06917
Total                 344.0055   19
Conclusion
As both p-values (0.239502 and 0.262908) are greater than 0.05, we do not reject H0. This
means that leakage voltage does not depend on channel length.
Matrix
A matrix (whose plural is matrices) is a rectangular array of numbers, symbols, or
expressions, arranged in rows and columns.
Determinant of matrix
The determinant of a matrix is the scalar value or number calculated using a square matrix.
The square matrix could be 2×2, 3×3, 4×4, or any size n × n, where the numbers of columns
and rows are equal.
Procedure on excel: =MDETERM(array)
Example
The matrix is given by A = \begin{bmatrix} 3 & -1 \\ 4 & 3 \end{bmatrix}. Find the value of |A|.
Interpretation
The determinant of the matrix is |A| = (3)(3) - (-1)(4) = 13.
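For a 2×2 matrix the determinant is ad - bc. A minimal Python sketch of this rule is shown below; the function name is hypothetical and the code is an illustration, not the Excel procedure.

```python
def det2(matrix):
    """Determinant of a 2x2 matrix [[a, b], [c, d]], computed as a*d - b*c."""
    (a, b), (c, d) = matrix
    return a * d - b * c

A = [[3, -1], [4, 3]]
det_A = det2(A)
print(det_A)  # 13
```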
Transpose of matrix
The transpose of a matrix is an operator which flips a matrix over its diagonal, so that the
rows become columns.
Procedure on excel: =TRANSPOSE(array), entered with Ctrl+Shift+Enter
Example
A = \begin{bmatrix} -2 & 5 & 6 \\ 5 & 2 & 7 \end{bmatrix}. Find the transpose of the given matrix.
Answer
The transpose of the matrix is
-2 5
5 2
6 7
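The same transpose can be sketched in Python, where `zip(*A)` pairs up the entries column by column. This is an illustration with assumed variable names.

```python
A = [[-2, 5, 6],
     [5, 2, 7]]

# Each column of A becomes a row of the transpose
A_T = [list(column) for column in zip(*A)]
for row in A_T:
    print(row)
```

A 2×3 matrix becomes a 3×2 matrix, matching the answer above.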
Inverse of matrix
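The concept of the inverse of a matrix is a multidimensional generalization of the concept of
the reciprocal of a number: the product of a number and its reciprocal is equal to 1, and the
product of a matrix and its inverse is the identity matrix.
Procedure on excel: =MINVERSE(array) returns the inverse; =MMULT(1st, 2nd) multiplies two
matrices and can be used to check it. Both are entered with Ctrl+Shift+Enter.
Example
A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}, B = \begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix}
The product of A and B is
1 0
0 1
which is the identity matrix, so B is the inverse of A.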
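The relationship between A and B can be verified with a short Python sketch. It is an illustration only: it inverts a 2×2 matrix with the adjugate formula and multiplies the two matrices to confirm that the product is the identity. The function names are hypothetical.

```python
from fractions import Fraction

def inverse2(matrix):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] using (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[2, 3], [3, 4]]
B = [[-4, 3], [3, -2]]

A_inv = inverse2(A)
product = matmul(A, B)
print(A_inv == B, product)
```

Exact fractions are used so that the comparison with B does not depend on floating-point rounding.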