Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
1
Course and programs
Timings: 10.30am-12.30pm.
Days: Sundays +++.
BITS Pilani, Pilani Campus
Pre-recorded lectures
Course handout and Evaluation
Familiarity with QM techniques
Computer software
MS Excel
for all chapters.
http://reshmat.ru/graphical_method_lpp.html
for Chapter 7, 9, 10 of TB-2: Online and free-to-use.
http://www.phpsimplex.com/simplex/simplex.htm?l=en
for Chapter 7, 9, 10 of TB-2. Online and free-to-use.
Familiarity with MS Excel?
Course feedback….1/2
3. “I, for one can say, that I find QM as easy as Physiology, if not
easier.” –BDS.
The approach
Pace of coverage
Expect examples from diverse fields
1. Your company/organisation-
2. Industry-
3. Designation-
4. Experience in years-
5. Highest educational qualification-
6. Your city-
For example-
ONGC/Oil&Gas/Asst Mgr/9 yrs/ElectricalEngg/Mumbai.
TataMotors/Automobile/Senior Engineer/6 yrs/MechanicalEngg/Pune.
Class Monitor
Statistics
Recent developments
Data in a manufacturing firm
Data visualization- Dashboard
Sales Factory
QM- Applications and techniques
Defining variables,
Collecting data
Types of data
From Wikipedia
500 200 425 425 275 375 200 350 425 200
350 425 425 425 375 375 375 500 375 425
350 200 400 500 275 500 200 400 275 200
425 350 425 425 200 425 375 350 200 500
425 500 375 200 200 375 500 425 500 425
Value Frequency
425 14
500 8
Total 50
Charts
Bar chart (Horizontal), Pie chart,
Histogram, Column chart (Vertical),
Box plot, Line chart/Time series,
Scatter, 3D chart, Stem-and-Leaf
chart ...
Pareto chart, Radar, Area, Surface,
Gantt chart, Tree map, Network
diagram, Word cloud, Venn diagram,
Onion chart….
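A minimal sketch of how the frequency table for the 50 observations listed above could be tallied. Python is an assumption here (the course itself uses MS Excel), and the variable names are illustrative.

```python
from collections import Counter

# The 50 observations from the slide, row by row.
data = [
    500, 200, 425, 425, 275, 375, 200, 350, 425, 200,
    350, 425, 425, 425, 375, 375, 375, 500, 375, 425,
    350, 200, 400, 500, 275, 500, 200, 400, 275, 200,
    425, 350, 425, 425, 200, 425, 375, 350, 200, 500,
    425, 500, 375, 200, 200, 375, 500, 425, 500, 425,
]

# Count how often each distinct value occurs.
frequency = Counter(data)

print("Value Frequency")
for value in sorted(frequency):
    print(value, frequency[value])
print("Total", sum(frequency.values()))
```

The same counts feed directly into a bar chart or pie chart of the values.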
Columns are called Fields. This dataset has 14 Fields.
Rows are called Records. This dataset has 50 records. Record # 7 to 46 are not shown.
1. ID- Roll no
2. Gender: m/male, f/female
3. Age- years
4. Height- inches
5. Class- fr/Fresher, so/Sophomore, jr/Junior, sr/Senior.
6. Major- a/Accounting, CIS, ef/Economics/Finance, ib/International Business, m/Management, r/Retailing, m/Marketing, o/Others.
7. Grad school- yes/y, no/n, u/unknown
8. GPA- Grade Point Average out of 4.00.
9. Expected salary- $ ‘000
10. Annual Salary in 5 years- expected after graduation, $ ‘000.
11. Employment status- ft/Full Time, pt/Part Time, u/Unemployed.
12. No of affiliations- of clubs
13. Satisfaction advisement- likely to advise others to join the college, 1 to 5 scale.
14. Spending- Money spent on laptop, books, etc. $.
Measurement scales
1. Nominal (Categorical)
2. Ranked (Ordinal, from Order)
3. Interval
4. Ratio
More information is carried as we move down the list, from Nominal to Ratio.
1. Nominal data
Organizing Nominal data
Gender
Category Frequency
m 26
f 24
Total 50
Major
Category Frequency
mr 10
ef 9
a 11
ib 3
is 4
o 2
un 2
m 9
Total 50
Grad School
Category Frequency
y 18
un 17
n 15
Total 50
Employment Status
Category Frequency
un 11
pt 38
ft 1
Total 50
Frequency and Relative Frequency Percentage
Gender
Category Frequency Percentage%
m 26 52
f 24 48
Total 50
Visualizing Nominal data
Gender
Category Frequency
m 26
f 24
Total 50
Major
Category Frequency
mr 10
ef 9
a 11
ib 3
is 4
o 2
un 2
m 9
Total 50
2. Ranked data
Ranked (from ‘Rank’), also called Ordinal (from ‘Order’) data.
Tall, taller, tallest.
Big, bigger, biggest.
Major, Colonel, Brigadier.
Child, adult, senior citizen.
Olympics: First, second, third, fourth….
Thickness: Very thick, thick, thin.
Taste: Good, average, below average, bad.
Temperature: Freezing, cool, warm, hot.
Garment sizes: S, M, L, XL, XXL, XXXL.
Customer satisfaction: not satisfied, somewhat satisfied,
satisfied, highly satisfied.
Visualizing Ranked data
Class
Class | Frequency | Cumulative frequency
fr | 18 | 18
so | 23 | 41
jr | 5 | 46
sr | 4 | 50
Total | 50
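Cumulative frequency only makes sense when the categories have an order, as they do for ranked data. A minimal sketch in Python (an assumption; the course uses MS Excel), using the Class frequencies from the table above:

```python
from itertools import accumulate

classes = ["fr", "so", "jr", "sr"]   # ordered from lowest to highest rank
frequency = [18, 23, 5, 4]           # frequencies from the Class table

# Running total of the frequencies, in rank order.
cumulative = list(accumulate(frequency))

for name, f, cf in zip(classes, frequency, cumulative):
    print(name, f, cf)
```

The last cumulative value always equals the total number of records.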
This chapter will be continued in the next session.
Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
1
Items of interest
1. Excel-2 will be held on 27Jan, Thursday 7-9 pm.
Recording of the session will be available.
2. HW-1: Tables and Charts is available at Taxila,
under Topic-2. HWs are not to be submitted.
Solutions of HWs are posted after 2 weeks, at
Taxila.
3. Post your messages only on Discussion Forum at
Taxila, and not at Impartus.
4. PPTs of Chapter 01 and 02, and Chapter-03 are
available in advance at Taxila, under Topic-1.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
3. Interval scale
Satisfaction Advisement
Rating | Frequency | Cumulative frequency
1 | 3 | 3
2 | 5 | 8
3 | 12 | 20
4 | 13 | 33
5 | 13 | 46
6 | 3 | 49
7 | 1 | 50
Total | 50
Visualizing Interval data
Satisfaction Advisement
Rating | Frequency | Cumulative frequency
1 | 3 | 3
2 | 5 | 8
3 | 12 | 20
4 | 13 | 33
5 | 13 | 46
6 | 3 | 49
7 | 1 | 50
Total | 50
4. Ratio scale
Organizing and Visualizing Ratio data
Spending, $ | Frequency | Cumulative frequency
0-200 | 1 | 1
201-400 | 20 | 21
401-600 | 20 | 41
601-800 | 8 | 49
801-1000 | 1 | 50
Total | 50
Frequency polygon
Stem-and-Leaf plot
Data (sorted): 153, 154, 154, 162, 165, 169, 172, 176, 176, 176, 177, 181, 182, 186, 187, 193.
Histogram
Class Interval | Frequency
150-160 | 3
160-170 | 3
170-180 | 5
180-190 | 4
190-200 | 1
Total | 16
Stem-and-Leaf plot
Stem | Leaf
15 | 344
16 | 259
17 | 26667
18 | 1267
19 | 3
• Stem-and-Leaf plot retains the data, unlike Histogram.
• Stem-and-Leaf plot is used when data size is small.
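A minimal sketch of how a stem-and-leaf plot could be built for the 16 observations on this slide (153 to 193). Python and the helper name `stem_and_leaf` are assumptions for illustration; the stem is taken as all digits except the last, and the leaf is the last digit.

```python
from collections import defaultdict

data = [153, 154, 154, 162, 165, 169, 172, 176, 176, 176,
        177, 181, 182, 186, 187, 193]

def stem_and_leaf(values):
    """Group each value by its stem (value // 10); the leaf is the last digit."""
    plot = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem] += str(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, leaves)
```

Every original observation can be read back from the plot, which is the sense in which it retains the data.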
Two categorical variables
Idly 3 4 7
Total 4 6 10
Two categorical variables
Contingency Table
Visualizing two categorical variables-
Column charts
Two variables are- Gender (f/m) and Class (fr, so, jr, sr).
Class | Gender f | Gender m | Total
fr | 3 | 2 | 5
so | 11 | 12 | 23
jr | 7 | 11 | 18
sr | 3 | 1 | 4
Total | 24 | 26 | 50
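A minimal sketch of how the row, column and grand totals of this Class by Gender contingency table could be checked. Python and the dictionary layout are assumptions for illustration; the cell counts are the ones on the slide.

```python
# Cell counts: Class (rows) by Gender (columns).
table = {
    "fr": {"f": 3, "m": 2},
    "so": {"f": 11, "m": 12},
    "jr": {"f": 7, "m": 11},
    "sr": {"f": 3, "m": 1},
}

# Row totals: one per Class.
row_totals = {cls: sum(counts.values()) for cls, counts in table.items()}

# Column totals: one per Gender.
column_totals = {g: sum(counts[g] for counts in table.values()) for g in ("f", "m")}

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```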
Two numerical variables- Line chart
Year No (million)
2001 2.54
2002 2.38
2003 2.73
2004 3.46
2005 3.92
2006 4.45
2007 5.08
2008 5.28
2009 5.17
2010 5.78
2011 6.31
2012 6.58
2013 6.97
2014 7.68
2015 8.03
2016 8.80
Two numerical variables- Scatter plot
Nominal data-???
Ranked data-???
Interval data-???
Ratio data-???
HW:
Measurement scale of each variable (10 Columns)?
Warranty Claims- Truck tyres
Tyre Product Code | Claim Acceptance | Tyre Defect | Tyre Usage (%) | Claim Loss (Rs) | Claim Month/Year | Production Month | Claim Location | Manufacturing Plant Code | Tyre Hardness (SHA)
104113 Accepted-Special K LOCK RING /FITMENT D 37.50 9,310.62 Apr-19 Apr-18 Faridabad 1400 75
104113 Accepted-Special K LOCK RING /FITMENT D 50.00 7,448.50 Apr-19 Apr-18 Faridabad 1400 78
104113 Accepted-Special K LOCK RING /FITMENT D 40.00 8,938.20 Apr-19 Apr-18 Faridabad 1400 77
104113 Accepted-Special K FITMENT/LOCK RING/BE 66.25 5,027.74 Apr-19 Jul-18 Faridabad 1400 73
101766 Accepted-Manufacturing TREAD / SHOULDER SEP 54.05 6,269.42 Apr-19 Jun-16 Faridabad 1400 74
104113 Accepted-Special K TURN UP SEPARATION 55.00 6,703.65 May-19 Jul-18 Faridabad 1400 76
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 35.00 9,683.05 May-19 Feb-18 Faridabad 1400 75
104113 Accepted-Manufacturing BELT EDGE / BELT SEP 37.50 9,310.62 May-19 Jun-18 Faridabad 1400 76
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 55.00 6,703.65 May-19 Feb-18 Faridabad 1400 74
104113 Rejected-Non Manufacturing FITMENT/LOCK RING/BE 5.00 0.00 May-19 Jan-19 Faridabad 1400 78
104113 Accepted-Manufacturing TURN UP SEPARATION 26.75 10,912.05 May-19 Aug-18 Faridabad 1400 77
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 52.50 7,076.07 Jun-19 Sep-18 Faridabad 1400 78
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 42.50 8,565.77 Jun-19 May-18 Faridabad 1400 75
104113 Accepted-Special K TURN UP SEPARATION 60.00 5,958.80 Jun-19 May-18 Faridabad 1400 74
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 45.00 8,193.35 Jun-19 Oct-18 Faridabad 1400 74
104113 Accepted-Special K TURN UP SEPARATION 15.00 12,947.20 Aug-19 Feb-19 Chandigarh 1400 72
105627 Accepted-Manufacturing BELT EDGE / BELT SEP 60.00 6,184.40 Aug-19 Dec-18 Chandigarh 1400 74
104113 Accepted-Special K TURN UP SEPARATION 55.00 6,854.40 Aug-19 Nov-17 Chandigarh 1400 76
104112 Accepted-Special K TURN UP SEPARATION 60.00 6,094.40 Aug-19 Jan-19 Chandigarh 1400 76
104113 Accepted-Special K TURN UP SEPARATION 70.00 4,569.60 Aug-19 Jan-17 Chandigarh 1400 77
Nominal data-???
Ranked data-???
Interval data-???
Ratio data-???
Contents
1. Graphs- Bar charts and Pie charts
2. Histograms
3. Frequency tables
4. Contingency tables
5. Leaf and Stem diagram
6. Line charts
7. Fun (Application of Histograms)
8. More Fun (Application of Histograms)
9. Big Fun (Application of Histograms)
10. Medical Image processing- (Application of Histograms)
Nice to know
35
The origin of …..
Structured and Unstructured data
Take 5… Bar charts
1. X axis in a Bar chart has Categories; X axis in a Histogram has Interval or Ratio data.
2. There should not be a gap between the ‘bars’ in a Histogram.
Take 5… Stem-and-Leaf plots
Take 5… Line (Time Series) charts
Additional charts
MS Excel can also plot three-variable charts
Here three variables are- High, Low and Close prices of a stock.
Bad or Incorrect charts…1/2
Pie chart would have been better?
8 variables plotted, difficult to analyse.
3D, is it adding value?
Mutually exclusive and Collectively
exhaustive
Raw data: 10 10 20 20 20 30 40 50 50
Class Interval | Frequency
1-10 | 2
11-20 | 3
21-30 | 1
31-40 | 1
41-50 | 2
Total | 9
Not mutually exclusive
Frequency table
One nominal variable- Blood Group
Contingency table
Three nominal variables- Blood group, Gender and Rh
Contingency table
Two nominal variables- Blood Group and Ethnicity
Charts of Categorical data
1 Pie chart
2 Column chart
3 Side-by-side chart
4 Stacked horizontal chart
5 Stacked column chart
Two numerical variables charts
Data on Map
Next Chapter
Chapter-3: Numerical Descriptive Measures
Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
1
Items of interest
1. Excel-3 will be held on 1 Feb, Tuesday 7-9 pm.
Recording of the session will be available.
2. Solution of HW-01 is now available at Taxila,
under Topic-2.
3. HW-2: Numerical Descriptive Measures is
available at Taxila, under Topic-2. HWs are
not to be submitted. Solutions of HWs are
posted after 2 weeks, at Taxila.
4. Post your messages only on Discussion Forum
at Taxila, and not at Impartus.
5. PPT of Chapter-03 is available in advance at
Taxila, under Topic-1.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Relevant PRLs
Numerical Descriptive Measures
1. Central tendency
2. Variation
3. Shape
4. Exploring numerical data
5. Covariance and Coefficient of correlation
(will be covered later, with Chapter-12)
Summary measures
Chapter summary- Numerical Descriptive Measures
Runs scored in Tests by Tendulkar (played 347 and batted in 329 innings)
Innings played | 329
Total score | 15917
Mean | 48
Stdev | 51
Median | 32
Minimum | 0
Quartile 1, Q1 | 10
Quartile 2, Q2 | 32
Quartile 3, Q3 | 73.5
Maximum | 248
IQR, Q3 - Q1 | 63.5
CoV | 1.05
Skewness | 1.5
Kurtosis | -1.2
[Figure: runs per innings, with Mean, Median and Outliers marked, and a Boxplot.]
1. Central tendency
(Mean, Median and Mode)
8
Central tendency measures for raw data
2. Median- value of middle observation- after sorting the data in ascending order.
1 3 4 8 9: Median value= 4.
11 22 30 40 50 60 700000: Median value= 40.
1 3 4 8 9 12: When even number of observations, Median value= 6 = (4+8)/2.
2 5 6 7 8 9 11 18 1500
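The median examples above can be checked with a short sketch. Python and its standard `statistics` module are assumptions here, not part of the course material.

```python
from statistics import median

# The three worked examples from the slide.
examples = [
    [1, 3, 4, 8, 9],                      # odd count: middle observation
    [11, 22, 30, 40, 50, 60, 700000],     # the extreme value does not move the median
    [1, 3, 4, 8, 9, 12],                  # even count: average of the two middle values
]

medians = [median(sorted(values)) for values in examples]
print(medians)
```

With an even number of observations, `median` returns the average of the two middle values, matching (4+8)/2 on the slide.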
When to use Mode?
Median 450
Mode 400 11 times
Central Tendency Measures-
Raw vs. Grouped data
Raw data: Sorted
200 400 400 500 550
250 400 425 500 600
300 400 450 500 600
300 400 450 500 600
350 400 450 500 600
350 400 450 500 600
350 400 450 500 650
350 400 450 525 700
360 400 500 550 800
Grouped data
Class Interval | Freq. | Cum. Freq.
100-300 | 4 | 4
300-500 | 33 | 37
500-700 | 11 | 48
700-900 | 1 | 49
900-1100 | 1 | 50
Total | 50
The observation 500 is considered in Class Interval 300-500 and so on.
Notice that the Mean, Median and Mode values for Raw data and Grouped data are not equal.
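A minimal sketch of the usual grouped-data mean, assuming each class is represented by its midpoint (this midpoint convention is an assumption; the slide does not spell out the computation). Python is used for illustration only.

```python
# (class interval, frequency) pairs from the grouped table.
classes = [
    ((100, 300), 4),
    ((300, 500), 33),
    ((500, 700), 11),
    ((700, 900), 1),
    ((900, 1100), 1),
]

total = sum(freq for _, freq in classes)

# Weight each class midpoint by its frequency, then divide by the total count.
grouped_mean = sum(((low + high) / 2) * freq for (low, high), freq in classes) / total
print(total, grouped_mean)
```

Because every observation in a class is replaced by the midpoint, the grouped mean generally differs from the mean of the raw data.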
Mean of Grouped data- Computation
What is the difference?
2. Variation
16
Variation in daily life
Chart 1 (items 1 to 10): 3 6 4 2 4 1 4 2 1 3. Total= 3 + 6 + 4 + 2 + 4 + 1 + 4 + 2 + 1 + 3 = 30.
Chart 2 (items 1 to 10): 3 3 3 3 3 3 3 3 3 3. Total= 3 x 10 = 30.
Following may mean high variation
High volatility
High fluctuations
Low predictability
High risk
Not steady
High uncertainty
Highly uneven
High noise
Low reliability
Poor quality
High vibrations
Highly inconsistent
High contrast (Image)
Measuring variability
Most important slide of this course
1. Error
Class Interval, CI Freq.
100-300 4
300-500 33
500-700 11
3. Variation
4. Correlation
Most important formula of this course
Errors of 1, 2, 3, 4, 5? Mean=15/5=3.
2. Variation
22
Measuring variation
1. Range = Maximum - Minimum
2. Variance, population: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \text{Mean})^2$
3. Standard deviation, population = $\sigma$
4. Coefficient of Variation, population = $\sigma / \text{Mean}$
5. Variance, sample: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \text{Mean})^2$
6. Standard deviation, sample = $s$
7. Coefficient of Variation, sample = $s / \text{Mean}$
8. Mean absolute deviation = $\frac{1}{N} \sum_{i=1}^{N} |x_i - \text{Mean}|$
9. Z score = $(x_i - \text{Mean}) / \sigma$
10. Quartiles ($Q_1$, $Q_2$, $Q_3$): Smallest 25%, 50%, 75% observations.
11. Inter-quartile range = $Q_3 - Q_1$
12. 5-number summary: Minimum, Q1, Q2, Q3, Maximum.
13. Boxplot (called Box and Whisker chart in MS Excel 2019): Plot of 5-number summary.
Higher variation- Red or Blue?
Range 4 4
Range= Maximum-Minimum.
Variance (σ²) and Standard deviation (σ)
Blue Data- 1, 5, 3, 4, 2.
Mean = 15/5 = 3
Data: 1 5 3 4 2
1. Error from Mean: (1-3) (5-3) (3-3) (4-3) (2-3) = -2 2 0 1 -1
2. Square the Error: 4 4 0 1 1
Red Data- 3, 3, 1, 3, 5.
Mean = 15/5 = 3
Data: 3 3 1 3 5
1. Error from Mean: (3-3) (3-3) (1-3) (3-3) (5-3) = 0 0 -2 0 2
2. Square the Error: 0 0 4 0 4
Mean of Square of Errors (Variance) = (0+0+4+0+4)/5 = 1.6
5. Standard deviation = + Square Root of Mean of Square of Errors (RMSE) = +√1.6 = 1.265
Higher variation- Red or Blue?
Range 4 4
Stdev 1.41 1.26
Range= Maximum-Minimum.
Standard deviation = $\sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \text{Mean})^2}$
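A minimal sketch comparing the Blue (1, 5, 3, 4, 2) and Red (3, 3, 1, 3, 5) datasets. Python's `statistics` module is an assumption for illustration; `pstdev` divides by N (population) and `stdev` divides by N-1 (sample).

```python
from statistics import pstdev, pvariance, stdev

blue = [1, 5, 3, 4, 2]
red = [3, 3, 1, 3, 5]

# Both datasets have the same range, so range cannot separate them.
blue_range = max(blue) - min(blue)
red_range = max(red) - min(red)

# Population measures: divide the sum of squared errors by N.
blue_var, red_var = pvariance(blue), pvariance(red)
blue_sd, red_sd = pstdev(blue), pstdev(red)

# Sample standard deviation: divide by N-1 instead.
blue_sample_sd, red_sample_sd = stdev(blue), stdev(red)

print(blue_range, red_range)
print(round(blue_sd, 2), round(red_sd, 2))
print(round(blue_sample_sd, 2), round(red_sample_sd, 2))
```

The population values reproduce the 1.41 and 1.26 quoted on the slide; the sample versions are larger because of the N-1 divisor.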
Remaining chapter will be covered in the next session.
Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
1
Items of interest
1. Post your messages only on Discussion
Forum at Taxila, and not at Impartus.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Relevant PRLs
Numerical Descriptive Measures
1. Central tendency
To be done today
2. Variation (partly covered in previous session)
3. Shape
4. Exploring numerical data
5. Covariance and Coefficient of correlation (will
be covered later, with Chapter-12)
2. Variation
6
Measuring variation
1. Range = Maximum - Minimum
2. Variance, population (entire data): $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \text{Mean})^2$
3. Standard deviation, population = $\sigma$ = RMSE
4. Coefficient of Variation, population = $\sigma / \text{Mean}$
5. Variance, sample (partial data): $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \text{Mean})^2$
6. Standard deviation, sample = $s$
7. Coefficient of Variation, sample = $s / \text{Mean}$
8. Mean absolute deviation = $\frac{1}{N} \sum_{i=1}^{N} |x_i - \text{Mean}|$
9. Z score = $(x_i - \text{Mean}) / \sigma$
10. Quartiles ($Q_1$, $Q_2$, $Q_3$): Smallest 25%, 50%, 75% observations.
11. Inter-quartile range = $Q_3 - Q_1$
12. 5-number summary: Minimum, Q1, Q2, Q3, Maximum.
13. Boxplot (called Box and Whisker chart in MS Excel 2019): Plot of 5-number summary.
Higher variation- Red or Blue?
Range 4 4
Stdev 1.41 1.26
Range= Maximum-Minimum.
Stdev = $\sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \text{Mean})^2}$.
HW: DIY
Range, Variance, Standard deviation?
A: 10 10 10 10 10 10.
D: 0 20 0 20 0 20 0 20 0 20.
Dataset | Range | Variance | Standard Deviation
A
B
C
D
Do only by hand; do not use calculator or computer.
Range- Uses and shortcomings
Uses
Stock price- minimum and maximum in a day.
Ambient temperature- minimum and maximum in a day.
Blood pressure- high and low within a few minutes.
R-chart (Range) in Statistical Process Control (SPC).
Range is computed only from two observations. Hence, easy to compute.
[Charts: Stock price during a day; Ambient temperature during a day.]
ISC Exam results 2020- A school’s analysis
https://aniruddhadeb.blogspot.com/2020/07/
Coefficient of Variation, CoV
Measure | NSE | Sensex | Comparison of variation
Min | 16614 | 55822 |
Max | 18477 | 61766 |
Mean | 17616 | 59096 |
Range | 1863 | 5944 | NSE << Sensex
Stdev | 457 | 1493 | NSE < Sensex
CoV | 2.6 | 2.5 | NSE ≈ Sensex, almost equal
When the means of two datasets differ a lot, as in both the cases here, use CoV to compare variation between them.
CoV is dimensionless. Hence, CoV can be used to compare variation between very different kinds of systems- say variation in the incomes and experience in years of employees of a company; Javelin (meters) and Marathon (secs), etc.
CoV= Standard deviation/Mean * 100
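A minimal sketch of the CoV comparison, using the Mean and Stdev values quoted for NSE and Sensex on this slide. Python and the function name `coefficient_of_variation` are assumptions for illustration.

```python
def coefficient_of_variation(stdev, mean):
    """CoV in percent: standard deviation relative to the mean."""
    return stdev / mean * 100

nse_cov = coefficient_of_variation(457, 17616)
sensex_cov = coefficient_of_variation(1493, 59096)

print(round(nse_cov, 1), round(sensex_cov, 1))
```

Although the Sensex standard deviation is more than three times that of NSE, the two CoV values are almost equal, because each deviation is scaled by its own mean.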
Z score
15
Z score
How far is an observation from the mean, in terms of standard deviations.
Unlike Mean and Variance, Z score is not a summary measure of the dataset; Z score can be computed for each data point.
Z scores of 1, 2, 3, 4, 5?
Mean = 15/5=3.
Standard deviation= 1.41.
Uses
To identify Outliers (extreme values).
Observations with Z value < -3 or > +3 are called Outliers.
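A minimal sketch answering the question on this slide: the Z scores of 1, 2, 3, 4, 5, using the population standard deviation (1.41). Python is an assumption for illustration; the outlier cut-off of 3 standard deviations on either side of the mean is the rule stated in the slide.

```python
from statistics import mean, pstdev

data = [1, 2, 3, 4, 5]
mu = mean(data)         # 3
sigma = pstdev(data)    # population standard deviation, about 1.41

# One Z score per observation: distance from the mean in standard deviations.
z_scores = [(x - mu) / sigma for x in data]

# Flag observations more than 3 standard deviations from the mean.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]

print([round(z, 2) for z in z_scores])
print(outliers)
```

None of the five values is an outlier here; the largest Z score is only about 1.41.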
Z score- Outlier
Outliers
Mean
Z = (ObservedValue – Mean)/Stdev
Typical shapes of frequency
distributions…1/2
1. Symmetric, Bell shaped, high concentration in the middle.
2. Symmetric, less concentration in the middle.
3. Symmetric and flattest. There is no mode.
4. Negatively skewed, tail on left side.
5. Positively skewed, tail on right side.
6. Another Positively skewed, tail on right side.
Typical shapes of frequency
distributions…2/2
1. Symmetric, Bell shaped, high concentration in the middle. Mean=Median=Mode
2. Symmetric, less concentration in the middle. Mean=Median=Mode
3. Symmetric and flattest. There is no mode. Mean=Median
4. Negatively skewed, tail on left side. Mean<Median<Mode
5. Positively skewed, tail on right side. Mode<Median<Mean
6. Another Positively skewed, tail on right side. Mode<Median<Mean
Flatness (or Peakedness) of a frequency distribution
Quartiles
24
Quartiles (from Quarter)
Divide the sorted data into 4 quarters- 25% observations in each quarter.
Dataset: 40, 50, 60, 80, 100, 110, 120, 180, 220, 300, 600, 700, 900, 910, 930 and 950.
Q1, First Quartile- Lowest 25% observations. Q1 = 90 = (80+100)/2.
Q2, Second Quartile- Lowest 50% observations. Q2 = 200 = (180+220)/2.
Q3, Third Quartile- Lowest 75% observations. Q3 = 800 = (700+900)/2.
Inter-Quartile Range (IQR) = Q3-Q1. IQR = 800 - 90 = 710.
Notice that Q2 = Median.
Quartiles are used to study variation in the data, and to spot whether the distribution of data is symmetric.
Quartiles are also used in 5-number summary and Boxplots.
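A minimal sketch of the quartile calculation for the 16-value dataset on this slide, using the median-of-halves method that matches the worked values (Q1=90, Q2=200, Q3=800). Python and the helper name `quartiles` are assumptions; other tools, including spreadsheet functions, may interpolate differently and give slightly different quartiles.

```python
from statistics import median

data = [40, 50, 60, 80, 100, 110, 120, 180,
        220, 300, 600, 700, 900, 910, 930, 950]

def quartiles(values):
    """Q1, Q2, Q3 by median-of-halves; assumes at least two observations.

    When the count is odd, the middle value is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[-half:]
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
five_number_summary = (min(data), q1, q2, q3, max(data))

print(q1, q2, q3, iqr)
print(five_number_summary)
```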
5-number Summary
26
5-number summary
The dataset is summarized by following 5 numbers-
1. Minimum
2. Q1- Quartile 1
3. Q2- Quartile 2
4. Q3- Quartile 3
5. Maximum
Dataset: 40, 50, 60, 80, 100, 110, 120, 180, 220, 300, 600, 700, 900, 910, 930 and 950.
Minimum = 40; Q1 = 90 = (80+100)/2; Q2 = 200 = (180+220)/2; Q3 = 800 = (700+900)/2; Maximum = 950.
5-number summary is used to study variation in the data, and to quickly establish whether distribution of data is symmetric.
Used in Boxplots.
5-number summary of above dataset- 40, 90, 200, 800 and 950
The Boxplot
28
Boxplot
Boxplot, another example
5-number summary
1. Minimum: 2
2. Q1- 55 [=(50+60)/2]
3. Q2- 71 [=(70+72)/2]
4. Q3- 111 [=(110+112)/2]
5. Maximum:120
Typical Boxplots
Boxplot no. | Q1, Lowest 25% | Q2, Lowest 50% | Q3, Lowest 75% | All 100% | Comments
1 | 25 | 5 | 5 | 25 | Symmetric, narrow IQR
2 | 20 | 20 | 20 | 20 | Symmetric, wider IQR, narrower Q1
3 | 20 | 30 | 30 | 20 | Symmetric, very wide IQR, narrower Q1
4 | 35 | 15 | 15 | 35 | Symmetric, narrow IQR, wider Q1
5 | 10 | 40 | 40 | 10 | Symmetric, very wide IQR, very narrow Q1
6 | 25 | 5 | 5 | 25 | Symmetric, low (30) median
7 | 45 | 10 | 10 | 15 | Negative (left) skewed
8 | 20 | 5 | 10 | 45 | Positive (Right) skewed
Contents
1. CentralTendency- Raw
2. DescriptiveMeasures
3. StemAndLeaf
4. BoxPlot
5. CentralTendency- Grouped
6. Variation1
7. Variation2
8. Z-score-1
9. Z-score-2
10. Shape
11. MoreFun Returns
Nice to know
38
The origin of….
In modern times
Average- Early 1500s- A part of the cargo
was thrown overboard to make the ship
lighter/safer/stable when it faced bad storm.
The losses were distributed proportionately
among the merchants whose goods were on
the ship.
Applications of mean
Median in other disciplines…
An application of Median…
https://www.bulbs.com/learning/arl.aspx
The rated life of light bulbs is Median value, and not Mean value.
Another application of Median…
Image Processing
Severely corrupted image on the left has salt and pepper
noise…it has been improved by using a Median filter.
https://en.wikipedia.org/wiki/Median_filter
Median…..
Income and wealth distribution has high inequality. Hence median, and not mean, is preferred.
Few properties of Mean, Median, Mode
HW: Rank these graphs on their variations
[Six bar charts, numbered 1 to 6; each has categories A to L on the X axis and a scale of 0 to 100 on the Y axis.]
The above three standard deviations (σ) have been computed from the histograms of their digital images.
Low standard deviation- Low contrast image; High standard deviation- High contrast image.
Standard deviation- Signal processing
Test Cricket
Dravid Tendulkar Sehwag
Total score 13289 15917 8229
Mean 46 48 47
Standard deviation 48 51 58
Measuring variation of a dataset
HW…. A tale of 3 exams
A B C
Mean= 105. Mean= 75. Mean= 45.
Stdev= 24.5. Stdev= 27. Stdev= 24.5.
Skewness= -1.3. Skewness= 0. Skewness= 1.3.
Kurtosis= 1.26. Kurtosis= 0.67. Kurtosis= 1.26.
No. of observations= 262. No. of observations= 262. Number of observations= 262.
Take 5… Boxplots
From Husband-Wife
169 pairs dataset,
available at Taxila.
Take 5… Outliers
A B C D
Pre- Tokyo Olympics race
From mythology….
Formulas
Mean- Discrete and Continuous
Discrete sequence ($x_i$, $i$ = 1, 2, 3, ….. N):
Mean = $\frac{1}{N} \sum_{i=1}^{N} x_i$
Continuous function $f(x)$ between a and b:
Mean = $\frac{1}{b-a} \int_a^b f(x)\,dx$
Next Chapter
Chapter-4: Basic Probability
Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
1
Items of interest
1. Quiz-1 will be available at Taxila (eLearn) between
14-24 Feb. Syllabus- Chapters 1 to 4, TB-1. Last
date will not be extended.
2. Extra class…… on 15Feb, Tue 7-9 pm.
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
What happens when….
An application of measurement of
uncertainty-2
An application of measurement of
uncertainty-3
Probability of attack
X Y Y+ Z Z+
• Z+ category is a security detail of 55 personnel, including 10+ NSG commandos and police personnel.
• Z category is a security detail of 22 personnel, including 4-6 NSG commandos and police personnel.
• Y+ category is a security detail of 11 personnel, including 2-4 commandos and police personnel.
• Y category is a security detail of 8 personnel, including 1 or 2 commandos and police personnel.
• X category is a security detail of 2 personnel, with no commandos but only armed police personnel,
From Wikipedia.
An application of measurement of
uncertainty-4
Average-1/26= 3.48%
?
Uncertainty can be measured (estimated)
in above cases..
Business applications-1/2
Business applications-1/2
Rain
No Rain
50% days.
10% days.
Applications: Visit Mumbai or not; Schedule IPL/Test match or not; Insurance for Olympics 20XX…
Application: Portfolio selection by Mutual Funds…
Earthquakes
0 1 2 3 4 5 6 7 8 9 10 11 12 …
Getting values of probability
(p-: 150-151, TB-1)
1. A priori
Classical/Equi-likely.
Textbook examples of Coin tossing, Playing cards,
Throwing a dice, etc.
When you know nothing.
2. Empirical
From historical data, observations, or experiments.
Life tables in insurance, earthquakes, rainfall,
twins, quality, stock market …
3. Subjective
Personal judgement.
Covid-19 will be over in 2024, Outcome of India vs
Brazil cricket match …
A priori probability
Probability- a priori…1/3
Probability = $\frac{\text{Number of outcomes in which the event occurs}}{\text{Total number of possible outcomes}}$
1. Tossing a coin-
Outcomes- Head or Tail.
P(Head) = P(Tail) = ½.
2. Throwing a Dice-
Outcomes- 1, 2, 3, 4, 5, or 6.
P(1)=P(2)=P(3)=P(4)=P(5)=P(6)= 1/6.
P(Even) = 3/6. P(<3) = 2/6.
3. Births-
Outcomes- Male or Female.
P(Male) = P(Female) = ½.
4. Playing cards-
Outcomes- 52 nos. Its probability tree will be very large- 52
branches, hence not shown.
P(King) = 4/52.
P(Heart) = 13/52.
Event and probability | Number of outcomes in which the event occurs | Total number of possible outcomes
1. P(4) = 1/6 | 1 | 1, 2, 3, 4, 5, 6.
2. P(5) = 1/6 | 1 | 1, 2, 3, 4, 5, 6.
3. P(Even) = 3/6 | 3 | 1, 2, 3, 4, 5, 6.
4. P(<5) = 4/6 | 4 | 1, 2, 3, 4, 5, 6.
5. P(<=5) = 5/6 | 5 | 1, 2, 3, 4, 5, 6.
6. P(Divisible by 3) = 2/6 | 2 | 1, 2, 3, 4, 5, 6.
7. P(Divisible by 5) = 1/6 | 1 | 1, 2, 3, 4, 5, 6.
8. P(Prime) = 3/6 | 3 | 1, 2, 3, 4, 5, 6.
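A minimal sketch of the a priori calculation for a single dice: count the outcomes in which the event occurs and divide by the total number of possible outcomes. Python and the helper name `probability` are assumptions for illustration; `Fraction` prints results in lowest terms, so 3/6 appears as 1/2.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]   # equally likely faces of one dice

def probability(event):
    """A priori probability: favourable outcomes / total possible outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_even = probability(lambda o: o % 2 == 0)
p_less_than_5 = probability(lambda o: o < 5)
p_divisible_by_3 = probability(lambda o: o % 3 == 0)
p_prime = probability(lambda o: o in (2, 3, 5))

print(p_even, p_less_than_5, p_divisible_by_3, p_prime)
```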
Probability- a priori...3/3
2. P(Diamond) = 13/52.
3. P(Picture) = 12/52.
4. P(=7) = 4/52.
5. P(<7) = 24/52.
6. P(King) = 4/52.
P(Red) > P(<7) > P(Diamond) > P(Picture) > P(King) = P(=7).
Shortcoming of ‘a priori’ approach
Uncertainty, Probability and Risk
Outcome-1
When probabilities are not considered, there is a risk.
Outcome-2
p Outcome-1
When probabilities are considered, still there is a risk.
1-p Outcome-2
Empirical probability-
From experiments or observations
Empirical probability
When probability is computed from experiments, observations, surveys, etc.
Item | Probability
Left-handed | 1 : 10 persons
Twins | 3 : 100 births
Breast Cancer | 1 : 8 Women in US
Lung Cancer | 17.2 in 100 male smokers; 11.6 in 100 female smokers
Vegetarian | 38 : 100 persons
Aircraft crash | 1 : 48 lakh flights
Boys to Girls ratio | 51.2 : 48.8 (In most industrialized countries)
Sources of these probabilities are given in a later slide- Nice to Know.
Empirical probability computation-
Examples
S&P BSE Sensex observed for 26 days (that is, 25 day-to-day changes)- Down- 11 times, Up- 14 times.
P(Down) = 11/25 = 44%.
P(Up) = 14/25 = 56%.
Range   Frequency   Probability*
20-30   1           0.01
30-40   40          0.26
40-50   76          0.50
50-60   26          0.17
60-70   6           0.04
70-80   1           0.01
80-90   1           0.01
Total   151         1
* or Relative Frequency
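The Probability column is each frequency divided by the total. A minimal sketch of that computation, using the counts from the table and the Sensex example (the variable names are illustrative, not from the slides):

```python
# Frequency table from the slide (Range -> Frequency).
frequency = {
    "20-30": 1, "30-40": 40, "40-50": 76, "50-60": 26,
    "60-70": 6, "70-80": 1, "80-90": 1,
}

total = sum(frequency.values())
relative_frequency = {rng: count / total for rng, count in frequency.items()}

for rng, p in relative_frequency.items():
    print(f"{rng}: {p:.2f}")

# Sensex example: 11 Down and 14 Up out of 25 observed changes.
p_down = 11 / (11 + 14)
p_up = 14 / (11 + 14)
print(p_down, p_up)
```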
Subjective probability
Subjective probability
Based on experience, private knowledge, personal
opinions, biases, etc.
3. Sports betting-
P(IndiaWillWin) = 0.40. BookieA.
P(IndiaWillWin) = 0.45. BookieB.
P(IndiaWillWin) = 0.70. BookieC.
4. Cancer?
P(Cancer=Yes) = 0.40. DoctorA.
P(Cancer=Yes) = 0.45. DoctorB.
P(Cancer=Yes) = 0.70. DoctorC.
Conditional probability
Joint, Marginal and Conditional probability- for two or more events
Types of probability
(p: 153-156, TB-1)
For single event- Simple probability.
Single Coin
Outcome   Probability
H         1/2
T         1/2
Total     1
Single Dice
Outcome   Probability
1         1/6
2         1/6
3         1/6
4         1/6
5         1/6
6         1/6
Total     1
Two coins
          Coin B
Coin A    Head   Tail   Total
Head      0.25   0.25   0.5
Tail      0.25   0.25   0.5
Total     0.5    0.5    1.0
Playing cards
Color     King   NotKing   Total
Red       2/52   24/52     26/52
Black     2/52   24/52     26/52
Total     4/52   48/52     52/52
Probabilities-
Joint         Both A and B occur. P(A and B).
Marginal      Only A is considered (B may or may not occur). P(A) = P(A and B) + P(A and NotB).
              Only B is considered (A may or may not occur). P(B) = P(A and B) + P(NotA and B).
Conditional   A occurs given that B has occurred. P(A/B).
SK rule- For small classroom problems, draw a Probability (Decision) Tree for better understanding and faster calculations.
Remaining chapter will be covered in the next session.
Quantitative Methods
Items of interest
1. Quiz-1 is available at Taxila (eLearn) between 14-
24 Feb. Syllabus- Chapters 1 to 4, TB-1. Last date
will not be extended.
2. Excel HW-03 will be available at Taxila on 17 Feb.
3. PPT of Chapter-4 (Basic Probability) is available in
advance at Taxila, under Topic-1.
4. Students who have joined late should refer to the
Course Handout available at Taxila (eLearn) and
PPT of Lecture-1 (16Jan) available at Impartus for
prescribed Textbooks, Evaluation plan, coverage,
syllabus, etc.
5. Post your messages only on Discussion Forum at
Taxila, and not at Impartus.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Conditional probability
Joint, Marginal and Conditional probability- for two or more events
Types of probability
(p: 153-156, TB-1)
For single event- Simple probability.
Single Coin
Outcome   Probability
H         1/2
T         1/2
Total     1
Single Dice
Outcome   Probability
1         1/6
2         1/6
3         1/6
4         1/6
5         1/6
6         1/6
Total     1
Two coins
          Head   Tail   Total
Head      1/4    1/4    2/4
Tail      1/4    1/4    2/4
Total     2/4    2/4    4/4
Playing cards
Color     King   NotKing   Total
Red       2/52   24/52     26/52
Black     2/52   24/52     26/52
Total     4/52   48/52     52/52
Marginal probability- concerned with only one event.
P(King) means the probability that the card is a King.
1. P(King)= 4/52.
2. P(Red)= 26/52.
3. P(7)= 4/52.
4. P(Picture)= 12/52.
5. P(Diamond)= 13/52.
Joint probability- both events occur.
P(Red and King) means the probability that the card is of Red color and it is also a King.
6. P(Red and King)= 2/52.
7. P(Diamond and Red)= 13/52.
8. P(Picture and Red)= 6/52.
9. P(Black and Red)= 0/52.
10. P(<3 and Red)= 4/52.
Conditional probability- has knowledge of one of the events.
P(Red/King) means the probability that the card is of Red color if the card is known to be a King.
11. P(Red/King)= 2/4.
12. P(Red/7)= 2/4.
13. P(Diamond/Picture)= 3/12.
14. P(Picture/Diamond)= 3/13.
Here all the probabilities were computed from the data (the picture of 52 cards), and not from the probabilities of other events.
Notice that P(Diamond/Picture) is not equal to P(Picture/Diamond).
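These card probabilities can be reproduced by listing the 52 cards and counting. The sketch below is an illustration under stated assumptions: the helper names (`prob`, `cond_prob`, `is_red`, and so on) are hypothetical, Ace is treated as a non-picture card, and Picture means Jack, Queen or King.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Club", "Diamond", "Heart", "Spade"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def is_red(card):
    return card[1] in ("Diamond", "Heart")


def is_king(card):
    return card[0] == "K"


def is_picture(card):
    return card[0] in ("J", "Q", "K")


def is_diamond(card):
    return card[1] == "Diamond"


def prob(event, cards=deck):
    """Marginal or joint probability: count matching cards, divide by total."""
    return Fraction(sum(1 for c in cards if event(c)), len(cards))


def cond_prob(event, given):
    """Conditional probability: restrict the deck to the 'given' cards first."""
    reduced = [c for c in deck if given(c)]
    return prob(event, reduced)


p_king = prob(is_king)                                      # marginal, 4/52
p_red_and_king = prob(lambda c: is_red(c) and is_king(c))   # joint, 2/52
p_red_given_king = cond_prob(is_red, is_king)               # conditional, 2/4
p_diamond_given_picture = cond_prob(is_diamond, is_picture)  # 3/12
p_picture_given_diamond = cond_prob(is_picture, is_diamond)  # 3/13
print(p_king, p_red_and_king, p_red_given_king)
```

Conditioning is implemented here as shrinking the deck to the known cards, which mirrors counting from the picture of 52 cards.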
Computing probabilities from historical data- Joint and Marginal probabilities
Table: Historical data
Customer #   Food   Beverage
1            Dosa   Tea
2            Dosa   Coffee
3            Dosa   Coffee
4            Idly   Tea
…
9            Idly   Coffee
10           Idly   Coffee
Event-1: Food (Events- Dosa, Idly).
Event-2: Beverage (Events- Tea, Coffee).
Marginal probability-
P(Dosa)= 3/10 = 0.3. P(Idly) = 7/10 = 0.7.
P(Tea) = 4/10 = 0.4. P(Coffee)= 6/10 = 0.6.
Joint probability- (AND)
P(Dosa and Tea)= 1/10 = 0.1. P(Dosa and Coffee)= 2/10 = 0.2.
P(Idly and Tea) = 3/10 = 0.3. P(Idly and Coffee) = 4/10 = 0.4.
Joint and Marginal probabilities (Food by Beverage)-
        Tea   Coffee   Total
Dosa    0.1   0.2      0.3
Idly    0.3   0.4      0.7
Total   0.4   0.6      1.0
Conditional probability-
P(Tea/Idly) = 3/7 = 0.43. P(Coffee/Idly) = 4/7 = 0.57.
Conditional probabilities of Food given Beverage (each column totals 1)-
        Tea    Coffee
Dosa    0.25   0.33
Idly    0.75   0.67
Total   1.00   1.00
Or-
P(Dosa or Tea)= 6/10 = 0.6. P(Dosa or Coffee)= 7/10 = 0.7.
P(Idly or Tea) = 8/10 = 0.8. P(Idly or Coffee) = 9/10 = 0.9.
In general, for events A (Event 1) and B (Event 2), here with A= Dosa and B= Tea-
P(A/B) = P(A and B) / P(B) = 0.1 / 0.4 = 0.25.
P(A or B) = P(A) + P(B) – P(A and B) = 0.3 + 0.4 - 0.1 = 0.6.
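The same joint, marginal, conditional and 'or' probabilities can be derived from the four joint counts (1, 2, 3 and 4 customers out of 10). A small sketch, with hypothetical helper names `marginal_food` and `marginal_beverage`:

```python
from fractions import Fraction

# Joint counts implied by the slide: (Food, Beverage) -> number of customers.
counts = {
    ("Dosa", "Tea"): 1, ("Dosa", "Coffee"): 2,
    ("Idly", "Tea"): 3, ("Idly", "Coffee"): 4,
}
n = sum(counts.values())  # 10 customers
joint = {pair: Fraction(c, n) for pair, c in counts.items()}


def marginal_food(food):
    """P(food): add the joint probabilities across beverages."""
    return sum(p for (f, _), p in joint.items() if f == food)


def marginal_beverage(beverage):
    """P(beverage): add the joint probabilities across foods."""
    return sum(p for (_, b), p in joint.items() if b == beverage)


# Conditional: P(A/B) = P(A and B) / P(B).
p_dosa_given_tea = joint[("Dosa", "Tea")] / marginal_beverage("Tea")
p_tea_given_idly = joint[("Idly", "Tea")] / marginal_food("Idly")

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_dosa_or_tea = (marginal_food("Dosa") + marginal_beverage("Tea")
                 - joint[("Dosa", "Tea")])

print(p_dosa_given_tea, p_tea_given_idly, p_dosa_or_tea)
```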
* p-155.
** p-160.
*** p-164, General Multiplication rule for Independent events.
**** p-156, General Addition rule.
Conditional probability
Kitty party
1. How many guests took Frooti?
= PizzaAndFrooti + BurgerAndFrooti
= 100*0.25*0.20 + 100*0.75*0.60 = 5 + 45 = 50.
2. What % of the guests who took Frooti had taken Pizza?
= 5/ (5 + 45) = 0.10, or 10%.
3. What % of the guests who took Frooti had taken Burger?
= 45/ (5 + 45) = 0.90, or 90%.
Quantitative Methods
Items of interest
1. Quiz-1 is available at Taxila (eLearn) between 14-24 Feb.
Syllabus- Chapters 1 to 4, TB-1. Last date will not be
extended.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Bayes’ Theorem
Kitty party
Probability tree (100 guests)- P(Pizza)= 0.25, P(Burger)= 0.75. P(Frooti/Pizza)= 0.20, P(Coke/Pizza)= 0.80. P(Frooti/Burger)= 0.60, P(Coke/Burger)= 0.40.
Guests- PizzaAndFrooti= 5, BurgerAndFrooti= 45, Frooti total= 50. PizzaAndCoke= 20, BurgerAndCoke= 30, Coke total= 50.
1. How many guests took Frooti?
= PizzaAndFrooti + BurgerAndFrooti
= 100*0.25*0.20 + 100*0.75*0.60 = 5 + 45 = 50.
2. What % of the guests who took Frooti had taken Pizza?
= 5/ (5 + 45) = 0.10, or 10%.
3. What % of the guests who took Frooti had taken Burger?
= 45/ (5 + 45) = 0.90, or 90%.
4. How many guests took Coke?
= PizzaAndCoke + BurgerAndCoke
= 100*0.25*0.80 + 100*0.75*0.40 = 20 + 30 = 50.
5. What % of the guests who took Coke had taken Pizza?
= 20/ (20 + 30) = 0.40, or 40%.
6. What % of the guests who took Coke had taken Burger?
= 30/ (20 + 30) = 0.60, or 60%.
For 2 above, the following formulas were used (even without thinking about them)-
P(Pizza/Frooti) = P(Pizza and Frooti) / P(Frooti)
= P(Pizza and Frooti) / [P(Pizza and Frooti) + P(Burger and Frooti)]
= P(Pizza)*P(Frooti/Pizza) / [P(Pizza)*P(Frooti/Pizza) + P(Burger)*P(Frooti/Burger)]
= 0.25*0.20 / (0.25*0.20 + 0.75*0.60)
= 0.05 / (0.05 + 0.45)
= 0.10, or 10%.
You have learnt Bayes’ Theorem !!!
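The Bayes' Theorem calculation for the Kitty party can be written as one small function. This is a sketch only: the function name `bayes` and the dictionary layout are assumptions made for illustration, not something given in the slides.

```python
def bayes(priors, likelihoods):
    """Posterior P(cause / evidence) for every cause.

    priors:      P(cause), e.g. P(Pizza) and P(Burger)
    likelihoods: P(evidence / cause), e.g. P(Frooti/Pizza) and P(Frooti/Burger)
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())  # P(evidence), e.g. P(Frooti)
    return {cause: joint[cause] / evidence for cause in joint}


priors = {"Pizza": 0.25, "Burger": 0.75}

# Questions 2 and 3: food taken by guests who took Frooti.
given_frooti = bayes(priors, {"Pizza": 0.20, "Burger": 0.60})

# Questions 5 and 6: food taken by guests who took Coke.
given_coke = bayes(priors, {"Pizza": 0.80, "Burger": 0.40})

print({k: round(v, 2) for k, v in given_frooti.items()})
print({k: round(v, 2) for k, v in given_coke.items()})
```

The denominator is the sum of the numerators over all causes, which is why the posteriors for Pizza and Burger always add up to 1.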
Reverse probability and Bayes’ Theorem
Why second test?
. False Positives
False Negatives
From Textbook…
(p: 169-170, TB-1)
The probability that a person has a certain disease is 0.03, or 3% (Positivity rate).
If the disease is actually present, the probability that the medical diagnostic test will give a positive result (indicating the disease is present) is 0.90.
If the disease is not actually present, the probability of a positive test result is 0.02.
These two probabilities describe the efficacy of the testing procedure.
Probability tree- Disease present 0.03, not present 0.97. If present: test positive 0.90, negative 0.10. If not present: test positive 0.02, negative 0.98. Overall: test positive 0.0464, negative 0.9536.
b. Suppose the test has given a positive result. What is the probability that the disease is actually present?
= HasDiseaseANDTestsPositive / TestsPositive
= 0.03 * 0.9 / 0.0464 = 0.5819 or 58.19%.
P(DY/TP) = P(DY) * P(TP/DY) / P(TP)
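The same answer can be reached by counting people instead of multiplying probabilities, in the way the Kitty party example counts guests. The sketch below assumes a hypothetical group of 10,000 people (this number is not in the textbook problem; it is chosen only to give round counts).

```python
population = 10_000             # hypothetical group size, chosen for round numbers
p_disease = 0.03                # P(DY)
p_pos_given_disease = 0.90      # P(TP/DY)
p_pos_given_no_disease = 0.02   # P(TP/DN), the false positive rate

with_disease = population * p_disease                      # about 300 people
without_disease = population - with_disease                # about 9,700 people
true_positives = with_disease * p_pos_given_disease        # about 270 people
false_positives = without_disease * p_pos_given_no_disease  # about 194 people
all_positives = true_positives + false_positives           # about 464 people

p_positive = all_positives / population                 # P(TP)
p_disease_given_positive = true_positives / all_positives  # P(DY/TP)

print(round(p_positive, 4), round(p_disease_given_positive, 4))
```

Counting this way shows why the answer is well below 0.90: the false positives among the many healthy people are almost as numerous as the true positives.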
‘Reverse’ probability of previous problem
Given a positive test result: disease present 0.5819, disease not present 0.4181.
From Textbook
(p: 167-168, TB-1)
P(F) = P(S)*P(F/S) + P(U)*P(F/U)
= (0.4 * 0.8 + 0.6 * 0.3)
= 0.50.
P(S/F) = (0.4 * 0.8)/ (0.4 * 0.8 + 0.6 * 0.3)
= 0.32 / (0.32 + 0.18)
= 0.64.
HW-03:
Download Excel file at Taxila. Below Topic-2.
Contents
1. A priori
2. Empirical- Graph
3. Empirical- Dataset
4. Empirical- BSE
5. CFC Hospital
6. BBC
7. Cancer
8. NotObvious
9. Fun with 2019
10. More Fun with 2019
Hot seat
p1 p2 p3
No fever
fever
Probabilities
Probabilities
Outcome Outcome
Nice to know
The origin of Probability
God and uncertainty
Empirical probability-1
1/48,00,000
90%
https://en.wikipedia.org/wiki/Handedness#:~:text=In%20human%20biology%2C%20handedness%20is,called%20the%20non%2Ddominant%20hand.
3%
38%
Empirical probability-2
1:8
17.2%, 11.6%
51.2%
≠ 50:50
Empirical probability-3
P(Male) ≠ P(Female) ?
Bathtub distribution
Start observing 1,00,000 newly born babies, and record the
surviving number after every 5 years, Col(3).
Col (4) is the difference in successive values of Col(3),
Col(2) = Col(4)/Col(3).
A graph between Col(2) on Y axis and Col(1) on X axis will
resemble the graph given on the left side.
An application of probability- file
compression
A file of 50KB is compressed to 20 KB file using probability.
An application of probability-
Storage strategy in a Warehouse
https://www.allaboutlean.com/storage-strategies-random-chaotic-abc/
Reliability based pricing of Warranty?
HW: Explain the difference… why one is lower and another is higher in terms of reliability.
Applications of conditional probability
Email spam filters. Conditional probability is used to classify emails like the one reproduced below as spam.
Other applications
Autocorrect spellings in SMS, MS Word.
fb- ‘people you may know.’
Amazon’s/Netflix’s recommendation system- if watched Tom
and Jerry, high probability will watch Chotta Bhim.
Target advertising- if female and young, show ads of cosmetics.
Target pricing- if purchased iPhone, high probability the buyer is
rich… try selling other products above the competitive price.
Loan/Credit card approval (next slide- Banking services).
Medical investigation- (slides later- Medical diagnosis- 1, 2 and 3).
Crime investigation (slide later- Crime Patrol).
Neural Networks and Machine learning.
Why second Covid-19 test/ second opinion?
Application of conditional probability-
Banking services
For card addicts
Fig.1 Fig.2
Method-1: Get probabilities for Fig.1 directly from the picture of 52 cards above.
Method-2: Get probabilities for Fig.1 without referring to the picture of 52 cards. Instead, refer to the probabilities available in Fig. 2 and use Bayes’ Theorem.
P(Diamond/Picture)= (13/52*3/13)/(13/52*3/13+39/52*9/39)= 3/12.
P(Diamond/NotPicture)= (13/52*10/13)/(13/52*10/13+39/52*30/39)= 10/40.
P(NotDiamond/Picture)= 1- P(Diamond/Picture)= 1- 3/12= 9/12.
P(NotDiamond/NotPicture)= 1- P(Diamond/NotPicture)= 1 - 10/40= 30/40.
Method-2 is extensively used in practice since raw data (here, the picture of 52 cards) is rarely available.
Application of conditional probability-
Medical diagnosis-1
P(HasDiabetes/UrinateALot)= a. https://www.cdc.gov/diabetes/basics/symptoms.html
P(HasDiabetes/DoesNotUrinateALot)= b.
Above, a ≠ b.
Application of conditional probability-
Medical diagnosis-2
HW: Easy.
If a patient smokes, what is the probability he/she
has Chest pain?
https://journals.plos.org/plosone/article/figures?id=10.1371/journal.pone.0195029
a
b
From above-
P(HusbandDoctor/WifeDoctor)= 1/4= 0.25.
P(WifeDoctor/HusbandDoctor)= 0.16.
Field placement
Seven self-declared experts on Coin Tossing have
offered the following strategies for the next several tosses-
Expert # 1: HHHHHHHHHHHHH…. Every time choose Head.
Expert # 2: TTTTTTTTTTTTTTTT….
Expert # 3: HTHTHTHTHTHTHT….
Expert # 4: HHHTTTHHHTTTHHHTTT....
Expert # 5: Same as the last outcome.
Expert # 6: Opposite of the last outcome.
Expert # 7: Toss another coin, and choose its outcome.
…..
HW:
Which Expert’s advice should Kohli follow?
For Cinephiles
p Khushi
1-p Gham
From Mythology
Others
Conditional probability- From Mythology
Ashwathama
has died
Conditional probability- From Bollywood
A B
HAHK?
A or B?
Hum
A priori vs. Empirical
A priori- 50:50. (Equi-likely)
Empirical- 100:0. (Based on experiments)
HW: Make the toss fair
Make the game fair
https://www.dutchreferee.com/alternative-penalty-shootout-for-football-attacker-defender-goalkeeper/
HW:
Suggest two or more rules to make penalty shootouts fairer
(closer to 50:50).
Crime patrol
Next chapter
Chapter-5: Discrete Probability Distributions
Chapter-5:
Discrete Probability Distributions
This chapter
Textbook # Chapter # Chapter Title
1 1 Defining and Collecting Data
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Topics
Chapter-5: Discrete Probability Distributions
Relevant Pre-recorded lectures, accessible from Taxila
Most important slide of this course
1. Error
2. Frequency/Probability distribution
Class Interval, CI   Freq.
100-300              4
300-500              33
500-700              11
700-900              1
900-1100             1
Total                50
3. Variation
4. Correlation
This slide was first used in L-03, 30Jan.
Hospital stay
Days    Probability
1       0.10
2       0.15
3       0.30
4       0.25
5       0.15
5+      0.05
Total   1
Discrete means the outcome is a-
Categorical variable (H/T, Won/Lost/Draw), or
an integer (1, 2, 3, 4…).
But not a fraction (0.666, 1.414, 1.618, 2.718, 3.1415, 9.11, etc.). Fractions are considered in Continuous probability distributions, like Normal distribution.
A probability distribution may also be expressed by an equation.
Probability distribution
p-184, TB-1.
Interruptions per day, x   Probability, P(x)
0 0.35
1 0.25
2 0.20
3 0.10
4 0.05
5 0.05
Total 1
Expected Value (EV)
p-184, TB-1.
Interruptions per day, x   Probability, P(x)   x*P(x)
0                          0.35                0.00
1                          0.25                0.25
2                          0.20                0.40
3                          0.10                0.30
4                          0.05                0.20
5                          0.05                0.25
Total                      1                   1.40 (EV)
Standard deviation of interruptions
p-184, TB-1.
Interruptions per day, x   Probability, P(x)   (x-Mean)^2, C   C*P(x)
0                          0.35                1.96            0.69
1                          0.25                0.16            0.04
2                          0.20                0.36            0.07
3                          0.10                2.56            0.26
4                          0.05                6.76            0.34
5                          0.05                12.96           0.65
Total                      1                                   2.04 (Variance)
EV, µ= ∑ xi * P(xi).
Variance, σ^2 = ∑ (xi-µ)^2 * P(xi)
where xi is a value of the random variable and P(xi) is its probability.
Standard deviation, σ = + Sqrt(Variance) = Sqrt(2.04) = 1.43.
Potential applications-
Service Level Agreements (SLA); AMC rate.
No of technicians required; No of spare parts required.
Reliability studies- Mean Time Between Failures (MTBF).
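The Expected Value, Variance and Standard deviation of the interruptions distribution can be computed directly from the two formulas. A minimal sketch using the table values (variable names are illustrative):

```python
from math import sqrt

x_values = [0, 1, 2, 3, 4, 5]                          # interruptions per day
probabilities = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05]   # P(x)

# EV, mu = sum of x * P(x)
mean = sum(x * p for x, p in zip(x_values, probabilities))

# Variance = sum of (x - mu)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probabilities))

stdev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(stdev, 2))
```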
Sources of probability distributions
1. Empirical
From historical data, experiments.
Example- Interruptions: Empirical probability distribution.
2. Theoretical
Binomial
Poisson
Example- A fair dice: Theoretical probability distribution.
Numerous other theoretical distributions (Uniform).
Normal, F, t, Chi-square (later in the course)
Not in the course- Uniform, Geometric, Hypergeometric, Beta,
Gamma, Maxwell-Boltzmann, Cauchy, Rayleigh, Erlang, …
Binomial distribution
Binomial examples
1. Coin: Head/Tail.
2. Births: Male/Female. Or, Underweight/NotUnderweight.
3. Quality: Ok/Defective.
4. Machine status: Working/Not Working.
5. Diagnostic result: Positive/Negative.
6. Cards: Red/Black. Or, Picture/NotPicture. Or, Diamond/NotDiamond.
7. KBC: Correct/Incorrect.
8. Will vote for: Cong/NotCong.
9. Football: Win/NotWin.
10. Patient: Inpatient/Outpatient.
11. SS Operation: Successful/Failure.
12. Dice: Odd/Even. Or, Prime/Composite. Or, <2 / >=2.
13. More:Defaulter/NotDefaulter; OnDuty/OnLeave; Employed/Unemployed;
Graduate/NotGraduate; BSEup/BSEdown; Immigrant/NotImmigrant.
Generate Binomial distribution….
Probability tree- first toss: H 0.5, T 0.5; second toss: H 0.5, T 0.5.
No.     Outcome   Probability   Calculations
1       HH        0.25          0.5*0.5
2       HT        0.25          0.5*0.5
3       TH        0.25          0.5*0.5
4       TT        0.25          0.5*0.5
Total             1.00
Probability of 0, 1, 2 Heads?
P(x)= 2Cx (0.5)^x (1-0.5)^(2-x)
Excel function-
=BINOM.DIST(x,2,0.5,FALSE)
Generate Binomial distribution….
P(MF)= P(M)*P(F)= 0.6*0.4= 0.24. And so on, when events are independent.
Generate Binomial distribution….
Given- P(M)= 0.6, P(F)= 0.4.
No      Outcome   Probability   Calculations
1       MMM       0.216         0.6*0.6*0.6
2       MMF       0.144         0.6*0.6*0.4
3       MFM       0.144         0.6*0.4*0.6
4       MFF       0.096         0.6*0.4*0.4
5       FMM       0.144         0.4*0.6*0.6
6       FMF       0.096         0.4*0.6*0.4
7       FFM       0.096         0.4*0.4*0.6
8       FFF       0.064         0.4*0.4*0.4
Total             1.000
Probability of 0, 1, 2, 3 Males?
P(x)= 3Cx (0.6)^x (1-0.6)^(3-x)
Excel function-
=BINOM.DIST(x,3,0.6,FALSE)
P(MFM)= P(M)*P(F)*P(M)= 0.6*0.4*0.6= 0.144. And so on, when events are independent.
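The table of eight outcomes can be generated by enumeration, and then grouped by the number of Males to obtain the Binomial distribution. A sketch under the slide's assumption that births are independent with P(M)= 0.6 (variable names are illustrative):

```python
from collections import defaultdict
from itertools import product

p = {"M": 0.6, "F": 0.4}   # given probabilities
n_children = 3

# Probability of 0, 1, 2, 3 Males, accumulated over all 2^3 outcomes.
distribution = defaultdict(float)

for outcome in product("MF", repeat=n_children):
    prob = 1.0
    for child in outcome:      # independent events: multiply the probabilities
        prob *= p[child]
    print("".join(outcome), round(prob, 3))
    distribution[outcome.count("M")] += prob

for males in sorted(distribution):
    print(males, round(distribution[males], 3))
```

Changing `n_children` to 4 generates the 16 outcomes of the four-children case in the same way.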
Generate Binomial distribution….
P(MFMF)= P(M)*P(F)*P(M)*P(F)= 0.6*0.4*0.6*0.4= 0.0576. And so on, when events are independent.
From Textbook….1/2
p-183, 191-192, TB-1.
When customers submit orders online, the Accounting Information System reviews the order for possible mistakes. Any questionable invoices are tagged and included in a daily exception report.
Recent data collected by the company show that the likelihood is 10% that an order will be tagged. [π=P(Y)=0.10, P(N)=0.90].
Tagged, Y- 0.10. NotTagged, N- 0.90.
a. What is the probability that there are 3 tagged order forms in a sample of 4? (x=3; n=4).
P(Tagged=3)=?
b. What is the probability that there are 3 or more tagged order forms in a sample of 4? (x=3, x=4; n=4).
P(Tagged>=3)=?
c. What is the probability that there are less than 3 tagged forms in a sample of 4? (x=0, x=1, x=2; n=4).
P(Tagged<3)=?
From Textbook….2/2
p-183, 191-192, TB-1.
Quantitative Methods
Items of interest
1. Quiz-1 will close on 24 Feb. Syllabus- Chapters 1 to 4,
TB-1. Last date will not be extended.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Topics
Chapter-5: Discrete Probability Distributions
Relevant Pre-recorded lectures, accessible from Taxila
Binomial distribution
Binomial distribution-
Formula and Excel function
Invoice Audit. p-191, TB-1.
P(Tagged, Y), π= 0.1; P(NotTagged, N)= 1- π= 1-0.1= 0.9. n=4.
P(x) = nCx π^x (1-π)^(n-x)
P(3)= 4C3 (0.1)^3 (1-0.1)^(4-3) = 4 * 0.0009= 0.0036. Three tagged.
P(4)= 4C4 (0.1)^4 (1-0.1)^(4-4) = 1 * 0.0001= 0.0001. Four tagged.
Outcomes:
Outcome no   Outcomes   Calculations       Probability
…
11           NYNY       0.9*0.1*0.9*0.1    0.0081
12           NYNN       0.9*0.1*0.9*0.9    0.0729
13           NNYY       0.9*0.9*0.1*0.1    0.0081
14           NNYN       0.9*0.9*0.1*0.9    0.0729
15           NNNY       0.9*0.9*0.9*0.1    0.0729
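The Binomial formula P(x) = nCx π^x (1-π)^(n-x) translates almost directly into code. The sketch below applies it to the Invoice Audit problem (π= 0.1, n= 4); the function name `binomial_pmf` is hypothetical and plays the role of Excel's BINOM.DIST with the last argument FALSE.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(x successes in n independent trials), each with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 4, 0.10

p_exactly_3 = binomial_pmf(3, n, pi)                             # P(Tagged=3)
p_3_or_more = sum(binomial_pmf(x, n, pi) for x in (3, 4))        # P(Tagged>=3)
p_less_than_3 = sum(binomial_pmf(x, n, pi) for x in (0, 1, 2))   # P(Tagged<3)

print(round(p_exactly_3, 4), round(p_3_or_more, 4), round(p_less_than_3, 4))
```

P(Tagged>=3) and P(Tagged<3) are complements, so they add up to 1.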
Statistical-
If P(A)=P(A/B), then A is independent of B, i.e., B does not affect probability of A.
P(King)= 4/52=1/13, P(King/Red)= 2/26= 1/13. Since P(King)= P(King/Red),
the event King is independent of event Red.
P(King)= 4/52=1/13, P(King/Picture)= 4/12=1/3. Since P(King)≠
P(King/Picture), the event King is dependent on event Picture.
Binomial probability distributions
Generate probability distributions for-
Solutions- previous slide
1 2
Excel function-
=BINOM.DIST(x,n,π,FALSE)
3 4 5
6 7 8
Poisson distribution
What is the Probability distribution?
1. Avg no. of accidents= 4/day. Probability of 0, 1, 2, 3, 4, 5… accidents, x?
2. Avg no. of potholes= 6/km. Probability of 0, 1, 2, 3, 4, 5… potholes, x?
3. Avg no. of goals= 3.2/match. Probability of 0, 1, 2, 3, 4, 5… goals, x?
4. Avg no. of teeth cavities= 3.28/patient. Probability of 0, 1, 2, 3, 4, 5… teeth cavities, x?
5. Avg no. of shooting stars= 0.3/hour. Probability of 0, 1, 2, 3, 4, 5… shooting stars, x?
6. Avg no. of typos= 2.7/page. Probability of 0, 1, 2, 3, 4, 5… typos, x?
Distribution of the data is essential to make decisions.
Often distribution of the data is not available.
But, Mean (Avg.) of the data may be available.
If only Mean is known, then Poisson distribution formula can be used provided certain conditions are met.
Solution of 1 above. Probability of 0, 1, 2, 3…accidents? Mean=4.
Excel function-
=POISSON.DIST(x,4,FALSE)
x     Probability, P(x)
0     0.0183
1     0.0733
2     0.1465
3     0.1954
4     0.1954
5     0.1563
6     0.1042
7     0.0595
8     0.0298
9     0.0132
10    0.0053
…     …
Sum   1.0000
Poisson distribution formula
P(x) = e^(-λ) * λ^x / x!
λ= mean number of events in the interval.
e= 2.718… a constant.
x= 0, 1, 2, 3…. number of events (accidents, cavities, goals, potholes…).
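The Poisson formula can be evaluated for any x once the mean λ is known. The sketch below uses a hypothetical helper `poisson_pmf` (the Python counterpart of Excel's POISSON.DIST with the last argument FALSE) and tabulates the case Mean= 4 accidents per day.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events per interval is lam."""
    return exp(-lam) * lam**x / factorial(x)


mean_accidents = 4
table = {x: round(poisson_pmf(x, mean_accidents), 4) for x in range(11)}

for x, p in table.items():
    print(x, p)
```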
From Textbook
p-196, TB-1.
Suppose the mean number of customers (λ) who arrive per minute at the bank during the noon-to-1 PM hour is 3.
Average no. of customers per minute, λ = 3.
What is the probability exactly two customers will arrive in a given minute?
P(x=2)= 0.2240, or 22.4%.
What is the probability more than two customers will arrive in a given minute?
P(x>2)= P(3) + P(4) + P(5) + P(6) + P(7) +… or
= 1 – P(0) – P(1) – P(2)
= 1 - 0.0498 - 0.1494 - 0.2240
= 0.5768, or 57.68%.
P(0)= 0.0498. No customer arrives.
P(1)= 0.1494. 1 customer arrives.
P(2)= 0.2240. 2 customers arrive.
P(3)= 0.2240. 3 customers arrive.
P(4)= 0.1680. And so on.
P(5)= 0.1008.
…
P(8)= e^(-3) * 3^8 / 8! = 0.0081.
P(9)= …
P(10)= …
…
P(200)= …
Excel function-
=POISSON.DIST(x,3,FALSE)
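Because x has no upper limit, P(x>2) is easier to obtain as 1 – P(0) – P(1) – P(2) than by adding the long tail. The sketch below checks both routes for λ= 3; the helper name `poisson_pmf` is hypothetical, and the tail is cut off at x= 49 as an assumption, since the later terms are negligibly small.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x arrivals) when the mean number of arrivals per minute is lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 3  # mean customers per minute

# Route 1: complement of P(0) + P(1) + P(2).
p_more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))

# Route 2: add the tail P(3) + P(4) + ... (truncated at x = 49).
tail_sum = sum(poisson_pmf(x, lam) for x in range(3, 50))

print(round(poisson_pmf(2, lam), 4), round(p_more_than_2, 4), round(tail_sum, 4))
```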
From Textbook
p-197, TB-1.
Excel function-
=POISSON.DIST(x,2.5,FALSE)
Using MS Excel
Using the Formula
Mean no. of customers per minute, λ= 3.
=POISSON.DIST(x,3,FALSE)
Excel function-
x Probability, P(x) MS Excel function =POISSON.DIST(x,Mean,FALSE)
Poisson probability distributions
Generate probability distributions for-
Poisson probability distributions
Excel function-
=POISSON.DIST(x,Mean,FALSE)
x= 1, 2, 3….20….
On Poisson distribution
Requirements-
The variable of interest is the number of times an event occurs in a given interval (time/length/area).
The probability that an event occurs is the same in every interval of equal size.
The number of events that occur in one interval is independent of the number of events that occur in another interval.
The probability that two or more events will occur in an interval approaches zero as the interval becomes smaller.
Binomial distribution: Mean= n*π. Variance= n*π*(1-π). Stdev= Sqrt(Variance).
Examples:
Coin Tossing- n=100, No of Tosses. P(Head), π = 0.5. Mean no. of Heads= 100*0.5= 50. Variance= 100*0.5*0.5= 25. Stdev= Sqrt(25)= 5.
Children in family- n=4, Children in family. P(Male), π = 0.6. Mean no. of Males= 4*0.6= 2.4. Variance= 4*0.6*0.4= 0.96. Stdev= Sqrt(0.96)= 0.98.
Quality inspection- n=400, Lot size. P(Defective), π= 0.03. Mean no. of Defectives= 400*0.03= 12. Variance= 400*0.03*0.97= 11.64. Stdev= Sqrt(11.64)= 3.4.
MCQ questions- n=30, No of questions. P(Correct), π = 0.20. Mean no. of Correct answers= 30*0.20= 6. Variance= 30*0.20*0.80= 4.8. Stdev= Sqrt(4.8)= 2.2.
Poisson distribution: Mean= λ, given or known. Variance= Mean= λ.
Examples:
Mean no of injuries, λ= 3/month. Variance=Mean= λ= 3.
Mean no. of potholes, λ= 6/km. Variance=Mean= λ= 6.
Mean no of goals scored, λ= 3.2/match. Variance=Mean= λ= 3.2.
Mean no. shooting stars, λ= 0.3/hour. Variance=Mean= λ= 0.3.
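The shortcut formulas Mean= n*π and Variance= n*π*(1-π) give the same results as the general definitions EV= ∑ x*P(x) and Variance= ∑ (x-µ)^2*P(x). The sketch below checks this for the Children in family example and then applies the shortcut to the Quality inspection example; the function name `binomial_summary` is hypothetical.

```python
from math import comb, sqrt


def binomial_summary(n, pi):
    """Mean, variance and standard deviation from the shortcut formulas."""
    mean = n * pi
    variance = n * pi * (1 - pi)
    return mean, variance, sqrt(variance)


# Check against the definitions for n=4, pi=0.6 (Children in family).
n, pi = 4, 0.6
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * p for x, p in enumerate(pmf))
variance_from_pmf = sum((x - mean_from_pmf) ** 2 * p for x, p in enumerate(pmf))

print(binomial_summary(n, pi), mean_from_pmf, variance_from_pmf)

# Quality inspection: n=400, pi=0.03.
quality_mean, quality_variance, quality_stdev = binomial_summary(400, 0.03)
print(quality_mean, round(quality_variance, 2), round(quality_stdev, 2))
```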
HW-04:
Download Excel file at Taxila. Below Topic-2.
Contents-
1. EV- Expected Value and Variance
2. Binomial
3. BetiBachao
4. Quality
5. MumbaiRains
6. Poisson
7. Ambulance
8. eLudo
9. BigBFun
Nice to know
Origin of ….
Binomial distribution:
Introduced by Jakob Bernoulli, from the
family of famous Swiss mathematicians.
Published in 1713, posthumously.
Poisson distribution:
Introduced in 1837 by Poisson, a French
mathematician.
While studying wrongful convictions.
Binomial distribution in action
HW
HW: KBC for Monkeys
A monkey has reached the hot seat of KBC. The
monkey picks the answers randomly (say, it
uses a four-sided fair ‘dice’ with A, B, C, and D
on the faces).
1. What is the probability that the monkey answers
all 15 questions correctly without using any
lifeline?
2. What is the average amount (Expected value)
won by the monkey? What is the variance?
3. What is the probability that option A gets chosen
on all 15 questions?
4. Suggest strategies to choose lifelines- when to use
Ask the Expert, take Audience poll, or use 50:50
lifeline?
5. How many people are in the audience? (Hint: Use
Mode value and Minimum % responses of the
audience poll result).
From Mythology
HW:
What is the probability of having 100 sons and
1 daughter in a family of 101 children?
Application of Binomial distribution
A town has 600 families with 2 children and 100 families with one child.
https://vikaspedia.in/social-welfare/women-and-child-development/child-development-1/girl-child-welfare/state-wise-schemes-for-girl-child-welfare/sivagami-ammaiyar-memorial-girl-child-protection-scheme
Next chapter
Chapter-6: The Normal Distribution
Chapter-6:
The Normal Distribution
This chapter
Textbook # Chapter # Chapter Title
1 1 Defining and Collecting Data
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Distribution of data
Birthweight (kg)
N=3,326
Mean=3.39 kg
Stdev= 0.55kg.
Resemblance to the theoretical Normal
distribution
Normal distribution- A theoretical distribution.
Inner diameter of bearing rings-
N=120.
Mean= 62.998 mm.
Stdev= 0.020 mm.
IQ score-
N=736,808.
Obtaining the probabilities
If the mean and standard deviation of a dataset are known, then the data can be approximated by the Normal distribution provided the following conditions are met-
Distribution of the data is nearly symmetric.
A vast majority of the observations are close to the mean.
There are very few extremely small or large values.
Normal distribution
1. Normal distribution curve is symmetric about Mean, and a
vast majority of observations lie close to the mean.
2. Other names- Bell-shaped curve, Law of error and Gaussian.
3. It is a continuous distribution- its x-axis can have fractional
values like 3.39, 5.55, etc. (weight, diameter…) and the
curve ranges from –infinity to +infinity on x-axis.
4. Area under the curve represents probability. Total area
under the curve is 1, that is, probability of all the events=1.
5. A Normal distribution is described by two parameters- mean (μ) and standard deviation (σ).
6. Reading probabilities from the graph is not easy. Therefore, published tables are used to obtain area (probability) of different regions.
7. Textbooks provide only one table for mean, μ= 0 and standard deviation, σ = 1, called z table, or Cumulative Standardized Normal distribution table (p-540-541, TB-1).
8. z value is used to get probabilities from this table when μ≠ 0 or σ ≠ 1.
Excel functions-
Probability (or Area)=? =NORM.DIST(x,Mean,Stdev,TRUE)
x=? =NORM.INV(Probability,Mean,Stdev)
Quantitative Methods
Items of interest
1. Quiz-1: Correct answer of- Defectives in 25 nos of parts:
Minor defect (6) and Major defect (2). P(Major/Defect)=
2/8 and not 6/8. Correction will be made in 4-5 days.
QM Mid-Term test
QM Mid-Term test-
1. Syllabus: Chapter 1 to 8; TB-1.
2. Number of questions-5-7; spread over all the chapters.
3. Numerical: Non-Numerical; planning for … 70:30.
4. Read QP instructions carefully. Questions may have sub-parts.
5. Ensure what is asked is answered, and completely.
6. Show all the workings- no marks without detailed workings.
7. Only hand written will be evaluated. No computer typed/No
Screenshots/No computer outputs.
8. Explain well. Express well.
The approach
Pace of coverage
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan
Topics
Area is given, x?
Plantation size: Mean= 20 acres, Stdev= 4 acres.
MS Excel formula to get x-
=NORM.INV(AreaToTheLeftOfx,Mean,Stdev)
Plantation size      Area to the Left   x, acres   Excel Function
Smallest 10%         0.1                14.9       =NORM.INV(0.1,20,4)
Smallest 30%         0.3                17.9       =NORM.INV(0.3,20,4)
Smallest 50%         0.5                20.0       =NORM.INV(0.5,20,4)
Smallest 80%         0.8                23.4       =NORM.INV(0.8,20,4)
Largest 10%          0.9                25.1       =NORM.INV(0.9,20,4)
Largest 40%          0.6                21.0       =NORM.INV(0.6,20,4)
Middle 50% (lower)   0.2500             17.3       =NORM.INV(0.25,20,4)
Middle 50% (upper)   0.7500             22.7       =NORM.INV(0.75,20,4)
Middle 80% (lower)   0.1000             14.9       =NORM.INV(0.1,20,4)
Middle 80% (upper)   0.9000             25.1       =NORM.INV(0.9,20,4)
Middle 50% plantations. = 17.3 to 22.7.
Middle 80% plantations. = 14.9 to 25.1.
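For those who prefer a script to a spreadsheet, the same lookups can be sketched in Python. This is only an illustration: it assumes the standard-library class statistics.NormalDist as a stand-in for =NORM.INV, and the helper name x_for_left_area is made up for this example.

```python
from statistics import NormalDist

# Plantation sizes as in the Excel formulas: Mean = 20 acres, Stdev = 4 acres.
plantation = NormalDist(mu=20, sigma=4)

def x_for_left_area(area):
    """Hypothetical helper: counterpart of =NORM.INV(area, 20, 4)."""
    return plantation.inv_cdf(area)

smallest_10 = x_for_left_area(0.10)
# The largest 10% start where 90% of the area lies to the left.
largest_10 = x_for_left_area(0.90)
# The middle 50% leaves 25% in each tail.
middle_50 = (x_for_left_area(0.25), x_for_left_area(0.75))

print(round(smallest_10, 1), round(largest_10, 1))
print(tuple(round(v, 1) for v in middle_50))
```

The point to notice is the same as on the slide: "largest" and "middle" questions must first be converted into an area to the left of x.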
[Figure: Standardized Normal curve; horizontal axis z from -4 to 4.]
This table is for Mean, μ=0 and Stdev, σ=1.
This table gives area (probability) to the left of z.
The area to the right of z= (1 – area to the left of z).
Examples-
1: Area to the left of z= -1 is 0.1587, or 15.87%.
2: Area to the left of z= 2 is 0.9772, or 97.72%.
3: Area to the left of z= 1.12 is 0.8686, or 86.86%.
Notice that-
68.26% observations lie within Mean ± 1 Stdev.
95.44% observations lie within Mean ± 2 Stdev.
99.73% observations lie within Mean ± 3 Stdev.
Area to the left of –z Area to the left of +z
If μ≠0 or σ≠1, then compute z= (x – μ)/σ to use
the z table. Notice z= x when μ=0 and σ=1.
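The z table rules above can be checked with a short Python sketch. It assumes the standard-library statistics.NormalDist in place of the printed table; the helper names z_score and area_within are illustrative only. Because it works with unrounded areas, the last digit can differ slightly from figures read off a four-decimal table.

```python
from statistics import NormalDist

# Mean 0 and Stdev 1: the distribution tabulated in the z table.
std_normal = NormalDist()

def z_score(x, mean, stdev):
    """Standardize x so that the z table can be used."""
    return (x - mean) / stdev

def area_within(k):
    """Area between Mean - k*Stdev and Mean + k*Stdev."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

left_of_112 = std_normal.cdf(1.12)        # area to the left of z = 1.12
right_of_112 = 1 - left_of_112            # area to the right of z = 1.12

print(round(left_of_112, 4))
print([round(area_within(k), 4) for k in (1, 2, 3)])
```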
Area (probability) is given, find z
Area                   z=?
1. 5.48% (0.0548)      -1.60
2. 88.49% (0.8849)     1.20
3. 30% (0.3000)        -0.52
4. 50% (0.5000)        0.00
5. 80% (0.8000)        0.84
6. 90% (0.9000)        1.28
7. 99% (0.9900)        2.33
8. 99.5% (0.9950)      2.58
MS Excel function for the above problems, Mean=0 and Stdev=1.
=NORM.INV(Area,0,1)
MS Excel function, in general-
=NORM.INV(Area,Mean,Stdev)
It gives x value for the area given to the left of x.
x is given, Area=?- Using z Table
The average height (μ) of visitors to the Statute Museum
has been recorded as 150 cms with standard deviation (σ)
of 5 cms.
Area is given, x=?- Using z table (plantation size: Mean= 20, Stdev= 4)
Plantation size      Area to the Left   z       x = Mean + z*Stdev
Smallest 50%         0.5000             0.00    20+0*4 = 20.0.
Smallest 80%         0.8000             0.84    20+0.84*4 = 23.4.
Largest 10%          0.9000             1.28    20+1.28*4 = 25.1.
Largest 40%          0.6000             0.26    20+0.26*4 = 21.0.
Middle 50% (lower)   0.2500             -0.67   20+(-0.67)*4 = 17.3.
Middle 50% (upper)   0.7500             0.67    20+0.67*4 = 22.7.
Middle 80% (lower)   0.1000             -1.28   20+(-1.28)*4 = 14.9.
Middle 80% (upper)   0.9000             1.28    20+1.28*4 = 25.1.
Largest 40% plantations. = 21.0.
Middle 50% plantations. = 17.3 to 22.7.
Middle 80% plantations. = 14.9 to 25.1.
Probability    Required Excel function                                Excel output
P(x>9)         =1-NORM.DIST(9,7,2,TRUE)                               0.1587
P(<7 or >9)    =NORM.DIST(7,7,2,TRUE)+(1-NORM.DIST(9,7,2,TRUE))       0.6587
P(5 to 9)      =NORM.DIST(9,7,2,TRUE)-NORM.DIST(5,7,2,TRUE)           0.6827 (Area= 68.27%)
MS Excel function-
=Norm.Dist(X,Mean,Stdev,TRUE).
It gives area to the left of x.
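The three probabilities can also be reproduced outside Excel. The sketch below assumes Python's standard-library statistics.NormalDist, whose cdf method plays the role of =NORM.DIST(x,Mean,Stdev,TRUE); the variable names are illustrative.

```python
from statistics import NormalDist

# Mean = 7 and Stdev = 2, as used in the Excel formulas.
download = NormalDist(mu=7, sigma=2)

# cdf(x) is the area to the left of x, so right-tail areas need 1 - cdf(x).
p_more_than_9 = 1 - download.cdf(9)
p_below_7_or_above_9 = download.cdf(7) + (1 - download.cdf(9))
p_5_to_9 = download.cdf(9) - download.cdf(5)

print(round(p_more_than_9, 4))
print(round(p_below_7_or_above_9, 4))
print(round(p_5_to_9, 4))
```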
From Textbook- Using Z table
p- 213 & 214. TB-1.
From z table: p-540 & 541, TB-1.
x    z = (x-Mean)/Stdev    Area to the left of z
5    (5-7)/2 = -1.00       0.1587
7    (7-7)/2 = 0.00        0.5000
9    (9-7)/2 = 1.00        0.8413
b. Probability download time is <7 or >9 secs?
P(<7 or >9)= P(<7) + P(>9)
= 0.5000 + (1-0.8413)
= 0.6587, or 65.87%. (Two light blue areas.)
c. Probability download time is 5 to 9 secs?
P(5 to 9)= P(<9) - P(<5)
= 0.8413 - 0.1587
= 0.6826, or 68.26%. (Red area.)
Using MS Excel: Obesity
MS Excel function-
=NORM.DIST(x,Mean,Stdev,TRUE)
MS Excel gives area to the left of x.
BMI             Area     MS Excel Formula
Less than 18    0.0334   =NORM.DIST(18,29,6,TRUE)
18 to 23        0.1253   =NORM.DIST(23,29,6,TRUE)-NORM.DIST(18,29,6,TRUE)
23 to 27        0.2108   =NORM.DIST(27,29,6,TRUE)-NORM.DIST(23,29,6,TRUE)
27 to 32        0.3220   =NORM.DIST(32,29,6,TRUE)-NORM.DIST(27,29,6,TRUE)
32 to 37        0.2173   =NORM.DIST(37,29,6,TRUE)-NORM.DIST(32,29,6,TRUE)
More than 37    0.0912   =1-NORM.DIST(37,29,6,TRUE)
Sum             1.0000
[Figure: Normal curve of BMI (Mean= 29, Stdev= 6) with cut-offs at 18, 23, 27, 32 and 37. Areas from left to right: Underweight 3.3%, Normal weight 12.5%, Over weight 21.1%, Obesity Class-I 32.2%, Obesity Class-II 21.7%, Obesity Class-III 9.1%.]
Potential applications:
Capacity planning & target marketing by Wellness centers, Demand
for nutrition, Demand for Obesity Class-III specialists……
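Splitting a Normal distribution into bands, as done for BMI here, follows one pattern: take the cumulative area at each cut-off and subtract neighbours. A minimal sketch, assuming Python's statistics.NormalDist and a made-up helper band_areas:

```python
from statistics import NormalDist

bmi = NormalDist(mu=29, sigma=6)   # Mean and Stdev from the Excel formulas
cutoffs = [18, 23, 27, 32, 37]     # BMI cut-offs between the six bands

def band_areas(dist, cutoffs):
    """Areas below the first cut-off, between consecutive cut-offs, and above the last."""
    cumulative = [0.0] + [dist.cdf(c) for c in cutoffs] + [1.0]
    return [high - low for low, high in zip(cumulative, cumulative[1:])]

areas = band_areas(bmi, cutoffs)
for area in areas:
    print(f"{area:.4f}")
print(f"Sum {sum(areas):.4f}")
```

Because each band is a difference of cumulative areas, the bands always add up to 1, which is a quick check on the working.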
Using Z table: Distribution of IQ
IQ test results have near Normal distribution, with Mean, μ=100 and Standard
deviation, σ=15.
HW-05:
Download Excel file at Taxila. Below Topic-2.
Likely Contents-
1. Find Area- z value is given.
2. Find z value- Area is given.
3. IQ- Get distribution of IQ; Mean and Stdev are given.
4. Obesity- Get distribution of BMI; Mean and Stdev are given.
5. Museum- Revenue projection from Height-based pricing.
6. QuarterFinals- Stadium capacity.
7. Warranty- Financial impact of different duration of warranty, and
Budgeting.
8. JaipurExpress- OnTimeEveryTime.
Nice to know
Big Bang origin of ….
Normal distribution
This distribution first appeared in a paper
on gambling by De Moivre, a Frenchman, in 1733.
Other contributors- Laplace- a Frenchman,
and Gauss- a German, 1807.
The name Normal is attributed to Galton
and Pearson, both English, 1880s/1890s.
Normal distribution?
CBSE. Class 12. Examinations 2015.
“The distribution, in this case, is much more normal and
symmetrical than the individual subjects' distribution.”
https://en.wikipedia.org/wiki/Central_Board_of_Secondary_Education
Compare following Normal distributions
[Figure: three panels of Normal curves. Panel 1: Blue, Red and Yellow curves. Panel 2: Blue and Red curves. Panel 3: curves A, B, C, D centred at 60, 70, 90, 100.]
Rank above Normal distributions on their means and standard deviations.
Panel 1- Mean: Blue=Red=Yellow=1000. Stdev: Blue < Red < Yellow.
Panel 2- Mean: Blue (30) < Red (50). Stdev: Blue = Red.
Panel 3- Mean: A (60) < B (70) < C (90) < D (100). Stdev: D < A < C < B.
Beware- Three types of tables
Application: Height-based pricing
The average height (μ) of visitors to the Statute Museum
has been recorded as 150 cms with standard deviation
(σ) of 5 cms.
The museum is planning to introduce the following
height-based entry fee.
Height, X cm   Entry fee, Rs   Visitors, %   Visitors, nos.   Revenue, Rs
< 140          0               2.28          14               -
140-147        10              25.15         151              1,509
147-157        20              64.50         387              7,740
> 157          50              8.08          48               2,423
Total                          100.00        600              11,672
(Expected) Visitors, % = From Normal distribution.
(Expected) Visitors, nos. = (Expected) Visitors, %/100 * Visitors (600).
(Expected) Revenue= Fee, Rs * (Expected) Visitors, nos.
Excel functions used-
0.0228; =Norm.Dist(140,150,5,True)
0.2515; =Norm.Dist(147,150,5,True) - Norm.Dist(140,150,5,True)
0.6450; =Norm.Dist(157,150,5,True) - Norm.Dist(147,150,5,True)
0.0808; =1-Norm.Dist(157,150,5,True)
What is the expected daily revenue (Expected Value) if
the museum gets 600 visitors a day? Ans- Rs 11,672/day.
Height is assumed to be Normal distributed,
with Mean= 150 cms and Stdev= 5 cms.
This problem also appears in HW-05,
Excel file at Taxila.
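The expected-revenue working can be sketched in Python as a cross-check. The sketch assumes statistics.NormalDist in place of =Norm.Dist; the fee_bands layout and the helper band_probability are illustrative choices, not part of the HW file.

```python
from statistics import NormalDist

height = NormalDist(mu=150, sigma=5)   # visitor height in cm
daily_visitors = 600

# (lower bound, upper bound, entry fee in Rs); None marks an open-ended band.
fee_bands = [(None, 140, 0), (140, 147, 10), (147, 157, 20), (157, None, 50)]

def band_probability(dist, low, high):
    """Share of visitors whose height falls between low and high."""
    lower_area = 0.0 if low is None else dist.cdf(low)
    upper_area = 1.0 if high is None else dist.cdf(high)
    return upper_area - lower_area

# Expected value: fee * share of visitors * number of visitors, summed over bands.
expected_revenue = sum(
    fee * band_probability(height, low, high) * daily_visitors
    for low, high, fee in fee_bands
)
print(round(expected_revenue))
```

The sketch keeps fractional visitors, as the Excel working does, so the total agrees with the table even though the visitor counts shown there are rounded.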
Application: Warranty
OnOff Ltd sells 100K nos. of electric bulbs annually.
The Mean life of the bulbs is 15K hours and Stdev is 4K
hours. Assume life of the bulbs can be approximated
by the Normal distribution.
If a bulb fails within the warranty period, OnOff
offers to refund Rs 200.
What is the annual expected warranty refund cost if
the warranty period is-
a. 12K hours? Rs 45.33 L.
b. 13K hours? Rs 61.71 L.
c. 14K hours? Rs 80.26 L.
Warranty, '000 hours   Area %   Expected failures, nos   Refund Amount, Rs   Excel function
12                     22.7     22,663                   45,32,547           =Norm.Dist(12,15,4,True)
13                     30.9     30,854                   61,70,751           =Norm.Dist(13,15,4,True)
14                     40.1     40,129                   80,25,873           =Norm.Dist(14,15,4,True)
Expected failures, nos= Area %/100 * 100,000.
Refund Amount, Rs= Expected failures * Refund/failure (Rs 200).
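The warranty working can be expressed as a small function of the warranty period, which makes it easy to try other durations. This is a sketch assuming Python's statistics.NormalDist; the function name expected_refund is made up for illustration.

```python
from statistics import NormalDist

bulb_life = NormalDist(mu=15, sigma=4)   # life in '000 hours
bulbs_sold = 100_000                     # bulbs sold per year
refund_per_failure = 200                 # Rs refunded per failed bulb

def expected_refund(warranty_khours):
    """Expected annual refund (Rs) for a warranty period given in '000 hours."""
    # Share of bulbs failing within warranty = area to the left of the warranty period.
    failure_share = bulb_life.cdf(warranty_khours)
    expected_failures = failure_share * bulbs_sold
    return expected_failures * refund_per_failure

for warranty in (12, 13, 14):
    print(warranty, round(expected_refund(warranty)))
```

Each extra 1K hours of warranty adds roughly Rs 16 to 19 lakh of expected refunds here, because the warranty periods sit on the steep part of the curve just below the Mean.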
HW: Which of these distributions can be approximated by Normal distribution?
[Figures: distributions taken from the sources below, including one for Ophthalmologists and one for ENT specialists.]
https://www.researchgate.net/publication/318569954_Early_diagnosis_of_chronic_conditions_and_lifestyle_modification/figures?lo=1
https://en.wikipedia.org/wiki/Gestational_age
https://www.researchgate.net/publication/24356000_Lung_Cancer_Susceptibility_Model_Based_on_Age_Family_History_and_Genetic_Variants/figures?lo=1
https://dizziness-and-balance.com/disorders/bppv/bppv.html
Next chapter
Chapter-7: Sampling Distributions
Quantitative Methods
Items of interest
1. Will be used today- “The Cumulative Standardized
Normal Distribution” table on p-540 & 541.
Most important slide of this course
1. Error
2. Frequency/Probability distribution
3. Variation
4. Correlation
Class Interval, CI   Freq.
100-300              4
300-500              33
500-700              11
700-900              1
900-1100             1
Total                50
This slide was first used in L-03, on 30Jan.
Samples
Census and Sampling
Census
Entire population (population, tiger,
agriculture, health facilities).
Sampling
A portion of the population.
Quality, voting, blood, soil, customer
surveys, voice, interviews, ….
Why samples?
Quicker.
Cheaper.
May not participate/Not available.
When tests are destructive.
Scientifically chosen samples can
give good accuracy about the
properties of population.
We want to know-
Who will win- A or B?
Is the water safe? Is the soil suitable for crop?
Is the new Drug safe? Is it effective?
Bulbs/Concrete cubes meet the specs?
Life of elephants?
What is the inflation (price change)?
Customer/Employee satisfaction surveys.
What % people believe in Evolution theory?
Time spent on watching the TV in the last week.
Own a pet- Y/N?
Probability sampling
Simple random sampling
Each item has equal probability of getting chosen
(=1/N, N is population size).
Systematic sampling
Every nth customer/item/bottle on the production
line.
Stratified sampling
Samples from each stratum: Men/Women
(20%/80%); Rural/Urban (30%, 70%);
Steel/Chemicals/Telecom stocks (10%, 20%,
70%); Customers/Non Customers (10%, 90%);
Tourist/NonTourist (60%, 40%).
Cluster sampling
Samples from a geographical district.
[Figure: four panels. Simple random sampling: probability=1/12 of each item. Stratified sampling: Probability=proportional to each stratum. Systematic sampling: Every 3rd item/bottle/customer. Cluster sampling: probability=1/6 of each cluster.]
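The four schemes can be compared on a toy population of 12 items. The sketch below uses only Python's standard random module; the population, the two strata, the six clusters of two items, the seed and the helper stratified_sample are all assumptions chosen for illustration.

```python
import random

rng = random.Random(42)              # fixed seed so the run is repeatable
population = list(range(1, 13))      # 12 items, numbered 1 to 12

# Simple random sampling: every item has the same chance of selection.
simple = rng.sample(population, 4)

# Systematic sampling: random start, then every 3rd item.
start = rng.randrange(3)
systematic = population[start::3]

# Stratified sampling: sample each stratum in proportion to its size.
strata = {"A": population[:4], "B": population[4:]}   # 4 and 8 items

def stratified_sample(strata, total_size, rng):
    """Hypothetical helper: proportional allocation across strata."""
    pop_size = sum(len(members) for members in strata.values())
    chosen = []
    for members in strata.values():
        share = round(total_size * len(members) / pop_size)
        chosen.extend(rng.sample(members, share))
    return chosen

stratified = stratified_sample(strata, 3, rng)        # 1 from A, 2 from B

# Cluster sampling: pick whole clusters at random, keep every item in them.
clusters = [population[i:i + 2] for i in range(0, 12, 2)]   # 6 clusters
cluster_sample = [item for cluster in rng.sample(clusters, 2) for item in cluster]

print(simple, systematic, stratified, cluster_sample)
```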
Parameter and Statistic
Parameter (Population); Statistic (Sample).
1. Coverage error- Excluded from the frame.
2. Nonresponse error.
3. Measurement error- Bad or leading question.
SE (sampling error): 45.8-36.3= 9.5.
Sampling distributions
Sampling distribution of the mean
Sampling distribution of the proportion
Relationship between population and
samples
Characteristics of the sample are known. What are
the characteristics of its population?
[Figure: a sample of five values, 11, 32, 5, 78, 45, drawn from a population.]
X̄ =34.2; S=29.3, n=5.
µ=?, σ=?
Chapter-8 to 11: Confidence Interval Estimation,
Hypothesis Testing, Two Sample Tests and ANOVA, Chi-Square Tests.
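The sample statistics quoted for this slide can be recomputed in a few lines. The sketch assumes the five sample values are 11, 32, 5, 78 and 45, as read from the slide's figure, and uses Python's standard statistics module; stdev divides by n-1, which matches the sample standard deviation S.

```python
from statistics import mean, stdev

# Sample values as read from the figure (an assumption of this sketch).
sample = [11, 32, 5, 78, 45]

n = len(sample)
x_bar = mean(sample)    # sample mean
s = stdev(sample)       # sample standard deviation, divides by n-1

print(n, round(x_bar, 1), round(s, 1))
```

These are statistics of the sample; the population parameters µ and σ remain unknown and have to be estimated from them.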
To understand above-
1. First understand: When samples are drawn from a
population whose characteristics are known, what are the
characteristics of samples?
2. Then establish a relationship between sample
characteristics and population characteristics.
Chapter-7: Sampling Distributions (Mean and Proportion)
Sampling distribution.
Mean of X̄ = μ.
Problem-1
p-238, TB-1.
Standard error is Standard deviation of the sampling
distribution.
Problem-2
p-239-240, TB-1.
A cereal filling machine is set to fill the boxes with 368g of cereal.
The standard deviation of the filling process is 15g (σ=15g).
Assume Normal distribution.
A random sample of 25 boxes is taken.
What is the probability that the sample mean is below 365g?
Mean of the sampling distribution of X̄ will be 368g.
Its standard error will be σX̅ = σ/√n = 15/√25 = 3.0g.
Sampling distribution will be Normally distributed (due to CLT).
z= (365-368)/3.0 = -1.00; area to the left of z= -1.00 is 0.1587.
The probability that the sample mean is below 365g is 15.87%.
The middle 95% of the sample means lie between 362.12g and 373.88g.
To get x values from Excel functions-
=Norm.Inv(0.025,368,3) 362.12, for 2.5% area on the left side.
=Norm.Inv(0.975,368,3) 373.88, for 97.5% area on the left side.
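The steps of this problem (standard error first, then the Normal area) can be sketched in Python. The sketch assumes statistics.NormalDist in place of the Excel functions; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25      # process mean (g), process Stdev (g), sample size

# Standard error = Stdev of the sampling distribution of the mean.
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean: Normal(mu, standard_error).
sample_mean_dist = NormalDist(mu, standard_error)

p_below_365 = sample_mean_dist.cdf(365)
lower = sample_mean_dist.inv_cdf(0.025)   # counterpart of =Norm.Inv(0.025,368,3)
upper = sample_mean_dist.inv_cdf(0.975)   # counterpart of =Norm.Inv(0.975,368,3)

print(standard_error, round(p_below_365, 4), round(lower, 2), round(upper, 2))
```

Using sigma (15g) instead of the standard error (3g) in the second step is the usual slip; it would answer a question about a single box, not about the mean of 25 boxes.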
A report says that 32% adults are unable to stop thinking about work
while on vacation. {That is, population proportion, π is 0.32.}
A random sample of 200 vacationers is taken.
What is the probability that more than 40% vacationers in the sample
are unable to stop thinking about work?
Mean of the sampling distribution of the proportion = π = 0.32.
StdError, σp = √(π*(1-π)/n) = √(0.32*0.68/200) = 0.033.
Using z table-
z= (p-π)/σp = (0.40-0.32)/0.033= 2.424.
From z table, Area on the left of z= 2.42 is 0.9922.
Area on the right of z= 1.0 - 0.9922 = 0.0078, or 0.78%.
To get the value from Excel function-
=1-Norm.Dist(0.40,0.32,0.033,True) 0.0077, or 0.77%.
The probability that more than 40% vacationers in the sample are
unable to stop thinking about work is about 0.78%.
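The same calculation can be sketched in Python, assuming statistics.NormalDist in place of =Norm.Dist. The sketch keeps the standard error unrounded, so its answer can differ from the slide's 0.0077 and 0.0078 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.32, 200            # population proportion and sample size

# Standard error of the sample proportion.
standard_error = sqrt(pi_pop * (1 - pi_pop) / n)

# Sampling distribution of the sample proportion, approximated as Normal.
proportion_dist = NormalDist(pi_pop, standard_error)

z = (0.40 - pi_pop) / standard_error
p_more_than_40 = 1 - proportion_dist.cdf(0.40)   # area to the right of 0.40

print(round(standard_error, 3), round(z, 3), round(p_more_than_40, 4))
```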
Nice to know
A nice way to explain….
Beware
1. In TB-1:
z table gives area to the left of z value (p-540, 541).
t table gives area to the right of t value (p-542, 543).
Chi-square table gives area to the right of Chi-square value (p-544).
F tables give area to the right of F value (p-545 to 551).
2. In MS Excel:
MS Excel functions for probability distributions are consistent:
They all give areas to the left of z/t/Chi-square/F value.
Guess- Why all to the left?
My Guess: Because Bill Gates is left-handed.
Next chapter
Chapter-8: Confidence Interval estimation
Quantitative Methods
BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad
Items of interest
1. Will be used today-
“The Cumulative Standardized Normal Distribution” table on p-540 & 541.
“Critical Values of t” table on p-542 & 543.
Topics
3. Determining sample size. (Will be done in the next session; not in the syllabus for the Mid-Term exam.)
Estimation of population proportion and
Population mean
[Figure: a population of Y/N responses (left) and a population of numeric values (right).]
Types of Estimation-
1. Estimation of population proportion (left.)
2. Estimation of population mean (right).
Point and Interval estimates
Making the estimates
Sample Mean, X̄
Notice that, after rearranging the last two equations-
For population mean-
Margin of Error= Population mean - Sample mean.
For population proportion-
Margin of Error= Population proportion - Sample proportion.
Confidence level
Equations for Estimating population mean and
proportion
Population mean, µ= Sample mean ± Margin of Error
= Sample mean ± t-value * Std error of mean
= Sample mean ± t-value * S/√n
= X̄ ± t-value * S/√n
Population proportion, π= Sample proportion ± Margin of Error
= p ± z-value * Std error of proportion
= p ± z-value * √(p*(1-p)/n)
Standard error is standard deviation of the sampling distribution.
n- sample size. S- Standard deviation of the sample (divide by n-1, and not by n).
For mean- t-value from t-table, p-542, 543, TB-1.
For proportion: p- proportion in the sample. Z-value from Z-table, p-540, 541; TB-1.
z= (X − μ)/σ = (Observed Value − Pop. Mean)/Pop. Stdev
t= (X̄ − μ)/(S/√n) = (Sample Mean − Pop. Mean)/Stdev of Sampling distribution
Excel function gives t-value for the given area on the left tail.
=T.INV(AreaOnTheLeftOftValue,df).
=T.INV(0.975,15) =2.131. for 97.5% area on left or 2.5% on right.
=T.INV(0.950,10) =1.812. for 95.0% area on left or 5.0% on right.
t-table given in TB-1 has df up to 120.
Not for everyone: The ratio of a standard Normal variate to the square root of an
independent Chi-square variate divided by its degrees of freedom has t-distribution;
here the degrees of freedom are n-1.
Degrees of freedom, df
Sample variance, S² = 1/(n-1) * ∑ (Xi − X̄)²
Sample characteristics-
Sample size, n= 27. Mean, X̄= 43.89 days. Stdev.s, S= 25.28 days.
Sample characteristics-
Sample size, n= 50.
Mean, X̄= 5.5014g. Stdev.s, S= 0.1058g.
t-value from Excel function- df=n-1= 50-1= 49.
=T.Inv(area in the left tail,df) [0.5% and 99.5%; middle area 99%]
=T.Inv(0.005,49) = -2.6800 0.5% area in the left tail.
=T.Inv(0.995,49) = +2.6800 99.5% area in the left tail.
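With the t-value in hand, the 99% interval for the n= 50 sample follows from X̄ ± t-value * S/√n. The sketch below is only an illustration: Python's standard library has no t distribution, so the t-value 2.6800 is taken as given from the T.Inv output above rather than computed.

```python
from math import sqrt

n, x_bar, s = 50, 5.5014, 0.1058   # sample size, sample mean (g), sample Stdev (g)
t_value = 2.6800                   # assumed: =T.Inv(0.995,49) as quoted above

standard_error = s / sqrt(n)
margin_of_error = t_value * standard_error

lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(round(margin_of_error, 4), round(lower, 4), round(upper, 4))
```

A wider confidence level needs a larger t-value and therefore a wider interval; a larger sample shrinks the standard error and narrows it.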
Sample characteristics-
Sample size, n= 100. Invoices containing errors= 10 nos.
Proportion containing errors, p=10/100= 0.10 or 10%.
a. Point estimate, π = Sample proportion.
π =p= 0.10.
Sample characteristics-
Sample size, n=200. Non-conforming newspapers= 35 nos.
Proportion non-conforming, p = 35/200 = 0.175 or 17.5%.
a. Estimate the proportion of newspapers (π) having non-conformance attribute?
b. What is the estimation interval, for 90% confidence level?
Interval estimate of π = p ± z90% * Std. Error of proportion
= p ± z90% * √(p*(1 − p)/n)
= 0.175 ± 1.645 * √(0.175*(1 − 0.175)/200)
= 0.175 ± 0.0442, or 0.1308 to 0.2192.
13.08 to 21.92%. (b)
z-value from Excel function-
=NORM.INV(area in the left tail,0,1) [5% and 95%; middle area 90%].
=NORM.INV(0.05,0,1) = -1.645 5% area in the left tail.
=NORM.INV(0.95,0,1) = +1.645 95% area in the left tail.
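The interval estimate for the newspaper problem can be reproduced with a short Python sketch. It assumes statistics.NormalDist().inv_cdf as the counterpart of =NORM.INV(area,0,1); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, non_conforming = 200, 35
p = non_conforming / n                  # sample proportion, 0.175

confidence = 0.90
# Middle 90% leaves 5% in each tail, so look up the z-value for 95% area on the left.
z_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sqrt(p * (1 - p) / n)
margin_of_error = z_value * standard_error

lower, upper = p - margin_of_error, p + margin_of_error
print(round(z_value, 3), round(margin_of_error, 4), round(lower, 4), round(upper, 4))
```

Changing confidence to 0.95 or 0.99 in this sketch shows how the interval widens as the confidence level rises.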