
09-03-2022

POINT ESTIMATE & SAMPLING DISTRIBUTION
Dr. Navneet Bhatt

NUMERICAL SUMMARIES OF DATA

• Data are the numeric observations of a phenomenon of interest.
• The totality of all observations is a population.
• A portion of the population, selected at random and used for analysis, is a random sample.

Dr. Navneet Bhatt, ASMSOC, NMIMS 1



NUMERICAL SUMMARIES OF DATA

• We gain an understanding of this collection (the population) by describing it numerically and graphically, usually with the sample data.
• We describe the collection in terms of Shape, Outliers, Center, and Spread (SOCS).
• The center is measured by the mean.
• The spread is measured by the variance.

NUMERICAL SUMMARIES

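The variance is the average of the squared deviations from the mean. For a sample of size $n$, the sum of squared deviations is divided by $n - 1$ rather than $n$:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The standard deviation $s = \sqrt{s^2}$ measures how far data values typically are from their mean.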
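As a rough illustration, which is not part of the original slides, Python's standard `statistics` module computes these summaries directly. The sketch reuses the four observations from the point-estimator example in these notes. Note that `variance` and `stdev` divide by $n - 1$, while `pvariance` divides by $n$.

```python
import statistics

# Four observations (reused from the point-estimator example in these notes).
data = [25, 30, 29, 31]

mean = statistics.mean(data)          # centre: (25 + 30 + 29 + 31) / 4
s2 = statistics.variance(data)        # sample variance: squared deviations / (n - 1)
s = statistics.stdev(data)            # sample standard deviation: sqrt(s2)
pop_var = statistics.pvariance(data)  # squared deviations / n

print(f"mean = {mean}, s^2 = {s2:.4f}, s = {s:.4f}, pvariance = {pop_var:.4f}")
```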


POINT ESTIMATION

• Estimation is the process of learning about and determining a population parameter from a model fitted to the sample data.
• There are three main ways of learning about a population parameter from a sample statistic:
✓ Point estimation
✓ Interval estimation
✓ Hypothesis testing

POINT ESTIMATION

• A point estimate is a single, reasonable value, calculated from the sample, that estimates a population parameter.
• If $X_1, X_2, \dots, X_n$ are random variables, then functions of these random variables, such as $\bar{X}$ and $S^2$, are also random variables, called statistics.


SAMPLING DISTRIBUTION

• The probability distribution of a statistic is called a sampling distribution.
• To get a sampling distribution:
1. Take a sample of size $n$ (a given number such as 5, 10, or 1000) from a population.
2. Compute the statistic (e.g., the mean) and record it.
3. Repeat steps 1 and 2 many times (in principle, infinitely often for large populations).
4. Plot the resulting sampling distribution: the distribution of the statistic over repeated samples.

SAMPLING DISTRIBUTION

• Objective: to estimate the average number of coins a person carries.
• Step 1: Take a sample of size n = 10.
• Step 2: Record the statistic (the sample mean).
• Step 3: Repeat the experiment.
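The four steps can be sketched as a small simulation. The population here is entirely hypothetical: 10,000 people, each carrying between 0 and 20 coins uniformly at random, invented only for illustration. Step 4 is summarised numerically rather than plotted. The spread of the sample means should shrink as $n$ grows from 10 to 25 to 50.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: coins carried by each of 10,000 people (0 to 20, uniform).
population = [random.randint(0, 20) for _ in range(10_000)]

def sample_means(pop, n, repeats=2_000):
    """Steps 1-3: draw a sample of size n, record its mean, and repeat."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(repeats)]

results = {n: sample_means(population, n) for n in (10, 25, 50)}

# Step 4, summarised numerically instead of plotted.
print(f"population mean = {statistics.fmean(population):.2f}")
for n, means in results.items():
    print(f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.2f}, "
          f"SD of sample means = {statistics.stdev(means):.3f}")
```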


SAMPLING DISTRIBUTION

[Figure: sampling distributions of the sample mean for sample sizes n = 10, n = 25 and n = 50]

POINT ESTIMATOR

• A point estimate of some population parameter $\theta$ is a single numerical value $\hat{\theta}$ of a statistic $\hat{\Theta}$.
• The statistic $\hat{\Theta}$ is called the point estimator.
• Example: suppose that the random variable $X$ is normally distributed with an unknown mean $\mu$. The sample mean is a point estimator of the unknown population mean $\mu$; that is, $\hat{\mu} = \bar{X}$. After the sample has been selected, the numerical value $\bar{x}$ is the point estimate of $\mu$. Thus, if $x_1 = 25$, $x_2 = 30$, $x_3 = 29$ and $x_4 = 31$, the point estimate of $\mu$ is
$$\bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75$$


SOME PARAMETERS & THEIR STATISTICS

• Ways to estimate the mean of a population:


– We could choose the:
• Sample mean
• Sample median
• Average of the largest and smallest observations in the sample
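For illustration, the three candidate estimators can be applied to the four observations from the point-estimator example ($x_1 = 25$, $x_2 = 30$, $x_3 = 29$, $x_4 = 31$). The sketch shows that each estimator gives a different point estimate of $\mu$ from the same sample.

```python
import statistics

sample = [25, 30, 29, 31]  # x1..x4 from the point-estimator example

estimates = {
    "sample mean": statistics.mean(sample),        # (25 + 30 + 29 + 31) / 4
    "sample median": statistics.median(sample),    # middle two values 29 and 30, averaged
    "midrange": (max(sample) + min(sample)) / 2,   # (31 + 25) / 2
}
for name, value in estimates.items():
    print(f"{name:>13}: {value}")
```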

SOME DEFINITIONS

• The random variables $X_1, X_2, \dots, X_n$ are a random sample of size $n$ if:
a) the $X_i$'s are independent random variables, and
b) every $X_i$ has the same probability distribution.
• A statistic is any function of the observations in a random sample, e.g. $\bar{X}$, $S^2$, $S$.
We use statistics to estimate parameters.


SOME DEFINITIONS

• Consider determining the sampling distribution of the sample mean $\bar{X}$.
• If a random sample of size $n$ is taken from a normal population with mean $\mu$ and variance $\sigma^2$, then each observation in this sample ($X_1, X_2, \dots, X_n$) is a normally and independently distributed random variable with mean $\mu$ and variance $\sigma^2$.
• Reason: $\bar{X} = (X_1 + X_2 + \dots + X_n)/n$ is a linear function of independent, normally distributed random variables, and such linear functions are also normally distributed.

SOME DEFINITIONS

• Conclusion: For a normal population, the sample mean $\bar{X}$ has a normal distribution with mean
$$\mu_{\bar{X}} = \mu$$
and variance
$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$
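A quick simulation check of this conclusion follows. It assumes a normal population with $\mu = 100$, $\sigma = 10$ and $n = 25$; these values are borrowed from the resistor example, and any other values would work. The simulated mean and variance of $\bar{X}$ should land close to $\mu = 100$ and $\sigma^2/n = 4$.

```python
import random
import statistics

random.seed(7)  # reproducible illustration
mu, sigma, n = 100.0, 10.0, 25   # assumed values (the same as in the resistor example)

# 5,000 sample means, each from n independent N(mu, sigma^2) observations.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(5_000)]

print(f"mean of X-bar:     {statistics.fmean(means):.3f}  (theory: {mu})")
print(f"variance of X-bar: {statistics.variance(means):.3f}  (theory: {sigma**2 / n})")
```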


THE CENTRAL LIMIT THEOREM

• The Central Limit Theorem is one of the most powerful and useful ideas in all of statistics.
• The Central Limit Theorem is concerned with drawing finite samples of size n from a population with mean μ and finite standard deviation σ.
• The conclusion is that if we collect many samples of size n with a "large enough n," calculate each sample's mean, and create a histogram (distribution) of those means, then the resulting distribution will be approximately normal.

THE CENTRAL LIMIT THEOREM

• The Central Limit Theorem states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger (a common rule of thumb is $n > 30$), no matter what the shape of the population distribution.
• Taking more samples makes the histogram of the sample means smoother; taking larger samples makes it look more like a normal distribution.
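Here is a small simulation sketch of the theorem. It assumes a strongly right-skewed exponential population with mean 1, an arbitrary choice not taken from the slides. For such a population the skewness of $\bar{X}$ is $2/\sqrt{n}$. The printed skewness of the simulated sample means should therefore shrink toward 0, the value for a normal distribution, as $n$ grows.

```python
import random
import statistics

random.seed(3)  # reproducible illustration

def skewness(xs):
    """Average cubed z-score: 0 for symmetric data, about 2 for exponential data."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def sample_means(n, repeats=4_000):
    """Means of `repeats` samples of size n from an exponential population with mean 1."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

skews = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}
for n, g in skews.items():
    print(f"n = {n:3d}: skewness of the sample means = {g:.2f}")
```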


THE CENTRAL LIMIT THEOREM

• Central Limit Theorem: If $X_1, X_2, \dots, X_n$ is a random sample of size $n$ taken from a population (either finite or infinite) with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.

Large samples tend to produce sample estimates close to the parameter.


EXAMPLE

• A synthetic fiber used in manufacturing carpet has tensile strength that is normally distributed with mean 520 kN/m² and standard deviation 25 kN/m². Find the probability that a random sample of $n = 6$ fiber specimens will have a sample mean tensile strength that exceeds 525 kN/m².
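A minimal sketch of the calculation using `statistics.NormalDist` from the Python standard library. Because the population is normal, $\bar{X}$ is exactly normal with standard deviation $\sigma/\sqrt{n}$. The required probability is the upper-tail area beyond the standardized value $z = (525 - 520)/(25/\sqrt{6}) \approx 0.49$, which gives about 0.312.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 520, 25, 6     # kN/m^2; population is normal
se = sigma / sqrt(n)          # standard deviation of X-bar, about 10.21
z = (525 - mu) / se           # about 0.49
p = 1 - NormalDist().cdf(z)   # P(X-bar > 525) = P(Z > z)
print(f"se = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```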

EXAMPLE

• An electronics company manufactures resistors that have a mean resistance of 100 ohms and a standard deviation of 10 ohms. The distribution of resistance is normal. Find the probability that a random sample of n = 25 resistors will have an average resistance of less than 95 ohms.
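The same approach works here, sketched with `statistics.NormalDist`. Here $\bar{X} \sim N(100, 2^2)$ because $\sigma/\sqrt{n} = 10/5 = 2$. The probability is the lower-tail area below 95, i.e. $P(Z < -2.5)$, which is about 0.0062.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 10, 25                   # ohms; population is normal
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # X-bar ~ N(100, 2^2)
p = xbar_dist.cdf(95)                        # P(X-bar < 95) = P(Z < -2.5)
print(f"P = {p:.4f}")
```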


QUESTION

• The amount of time that a customer spends waiting at an airport check-in counter is a random variable with mean 8.2 minutes and standard deviation 1.5 minutes. Suppose that a random sample of n = 49 customers is observed. Find the probability that the average waiting time for these customers is
(a) less than 10 minutes
(b) between 5 and 10 minutes
(c) less than 6 minutes
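A possible way to check the answers, assuming that the Central Limit Theorem applies: the population shape is not stated, but $n = 49$ is large. The standard error is $1.5/7 \approx 0.214$ minutes. The standardized values are extreme (about 8.4, −14.9 and −10.3), so the three probabilities come out essentially as 1, 1 and 0.

```python
from math import sqrt
from statistics import NormalDist

# Population shape is not given; with n = 49 the CLT makes X-bar approximately normal.
mu, sigma, n = 8.2, 1.5, 49             # minutes
xbar = NormalDist(mu, sigma / sqrt(n))  # standard error = 1.5 / 7, about 0.214

p_a = xbar.cdf(10)                  # (a) P(X-bar < 10),      z about 8.4
p_b = xbar.cdf(10) - xbar.cdf(5)    # (b) P(5 < X-bar < 10),  z from about -14.9 to 8.4
p_c = xbar.cdf(6)                   # (c) P(X-bar < 6),       z about -10.3
print(f"(a) {p_a:.6f}  (b) {p_b:.6f}  (c) {p_c:.2e}")
```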

SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO MEANS

• Suppose we have two independent populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, and let $\bar{X}_1$ and $\bar{X}_2$ be the sample means of two independent random samples of sizes $n_1$ and $n_2$ from these populations. Then the sampling distribution of
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is approximately standard normal if the conditions of the central limit theorem apply.
• If the two populations are normal, then the sampling distribution of $Z$ is exactly standard normal.


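EXAMPLE: SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO MEANS

• The effective life of a component used in a jet-turbine aircraft engine is a random variable with mean 5000 hours and standard deviation 40 hours, and its distribution is close to normal. The engine manufacturer introduces an improvement into the manufacturing process for this component that changes the parameters to a mean of 5050 hours and a standard deviation of 30 hours. Random samples of size $n_1 = 16$ (original process) and $n_2 = 25$ (improved process) are selected. What is the probability that the difference in the two sample means, $\bar{X}_2 - \bar{X}_1$, is at least 25 hours?

• The distribution of $\bar{X}_1$ is normal with mean $\mu_1 = 5000$ hours, and the distribution of $\bar{X}_2$ is normal with mean $\mu_2 = 5050$ hours. Now the distribution of $\bar{X}_2 - \bar{X}_1$ is normal with mean $\mu_2 - \mu_1 = 5050 - 5000 = 50$ hours and variance
$$\frac{\sigma_2^2}{n_2} + \frac{\sigma_1^2}{n_1} = \frac{30^2}{25} + \frac{40^2}{16} = 36 + 100 = 136 \text{ hours}^2$$
Hence
$$P(\bar{X}_2 - \bar{X}_1 \ge 25) = P\left(Z \ge \frac{25 - 50}{\sqrt{136}}\right) = P(Z \ge -2.14) = 0.9838$$

[Figure: the sampling distribution of $\bar{X}_2 - \bar{X}_1$]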
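A sketch of the same computation with `statistics.NormalDist`, taking the difference of independent normal sample means to be normal. Without rounding $z$ to −2.14, the result is about 0.9840; the slide's 0.9838 comes from the rounded $z$.

```python
from math import sqrt
from statistics import NormalDist

mu1, sd1, n1 = 5000, 40, 16   # original process (hours)
mu2, sd2, n2 = 5050, 30, 25   # improved process (hours)

var_diff = sd1**2 / n1 + sd2**2 / n2           # 100 + 36 = 136
diff = NormalDist(mu2 - mu1, sqrt(var_diff))   # distribution of X2-bar - X1-bar
p = 1 - diff.cdf(25)                           # P(X2-bar - X1-bar >= 25)
print(f"variance = {var_diff}, P = {p:.4f}")
```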


EXAMPLE

The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a
standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0
years and a standard deviation of 0.8 year. What is the probability that a random sample of
36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than
the mean lifetime of a sample of 49 tubes from manufacturer B?
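A possible solution sketch follows. The slide does not say whether lifetimes are normally distributed, so the code assumes the Central Limit Theorem applies, with $n_A = 36$ and $n_B = 49$. The difference $\bar{X}_A - \bar{X}_B$ then has mean $6.5 - 6.0 = 0.5$ year and variance $0.9^2/36 + 0.8^2/49$. The probability of a difference of at least 1 year comes out to roughly 0.004.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sd_a, n_a = 6.5, 0.9, 36   # manufacturer A (years)
mu_b, sd_b, n_b = 6.0, 0.8, 49   # manufacturer B (years)

# X_A-bar - X_B-bar is approximately normal (CLT; both sample sizes exceed 30).
diff = NormalDist(mu_a - mu_b, sqrt(sd_a**2 / n_a + sd_b**2 / n_b))
p = 1 - diff.cdf(1.0)            # P(X_A-bar - X_B-bar >= 1)
print(f"mean = {diff.mean:.2f}, sd = {diff.stdev:.4f}, P = {p:.4f}")
```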


CONFIDENCE INTERVAL


UNDERSTANDING CONFIDENCE INTERVAL


CONFIDENCE INTERVAL

• A confidence interval is a range of values, computed from sample data, that we are fairly sure contains the true value of a population parameter.
• Example: Average Height
• We measure the heights of 40 randomly chosen men, and get a:
– mean height of 175 cm
– standard deviation of 20 cm


CONFIDENCE INTERVAL

• The 95% Confidence Interval (we will show how to calculate it later) is:
175 cm ± 6.2 cm, i.e. from 168.8 cm to 181.2 cm

[Figure: number line from 165 cm to 185 cm marking the interval from 168.8 cm to 181.2 cm]

• This says the true mean of ALL men (if we could measure their heights) is likely to be between 168.8 cm and 181.2 cm. But it might not be!

CONFIDENCE INTERVAL

• The "95%" says that 95% of experiments like the one we just did will produce an interval that includes the true mean, but 5% won't.
• So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.

175 cm ± 6.2 cm (168.8 cm to 181.2 cm)
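The "95% of experiments" interpretation can be checked by simulation. This sketch assumes a hypothetical population of heights that is normal with mean 175 cm and standard deviation 20 cm, treated as the unknown truth. It repeats the 40-person experiment 2,000 times and counts how often $\bar{x} \pm 1.960\,s/\sqrt{n}$ covers the true mean. Because $s$ stands in for $\sigma$ here, the fraction tends to come out slightly below 0.95; the t-based interval later in these notes corrects for this.

```python
import random
import statistics
from math import sqrt

random.seed(11)  # reproducible illustration
TRUE_MEAN, TRUE_SD = 175.0, 20.0   # hypothetical population of heights (cm)
n, z = 40, 1.960

def one_interval():
    """Draw one sample of n heights and return its 95% interval x-bar +/- z*s/sqrt(n)."""
    heights = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    xbar, s = statistics.fmean(heights), statistics.stdev(heights)
    half = z * s / sqrt(n)
    return xbar - half, xbar + half

intervals = [one_interval() for _ in range(2_000)]
coverage = sum(lo <= TRUE_MEAN <= hi for lo, hi in intervals) / len(intervals)
print(f"fraction of intervals that contain the true mean: {coverage:.3f}")
```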


CALCULATING THE CONFIDENCE INTERVAL

• Step 1: Write down the sample size $n$, and calculate the mean $\bar{X}$ and standard deviation $S$ of the sample:
– Sample size: $n = 40$
– Mean: $\bar{X} = 175$
– Standard deviation: $S = 20$

CALCULATING THE CONFIDENCE INTERVAL

• Step 2: Decide what confidence level you want; 90%, 95% and 99% are common choices. Then find the $z$ value for that confidence level here:

Confidence level    z
80%                 1.282
85%                 1.440
90%                 1.645
95%                 1.960
99%                 2.576
99.5%               2.807
99.9%               3.291

• For 95%, the $z$ value is 1.960.


CALCULATING THE CONFIDENCE INTERVAL

• Step 3: Use that $z$ in this formula for the confidence interval:
$$\bar{X} \pm z \frac{S}{\sqrt{n}}$$

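CALCULATING THE CONFIDENCE INTERVAL

• So, we have
$$175 \pm 1.960 \times \frac{20}{\sqrt{40}} = 175 \pm 6.20 \text{ cm}$$
i.e. the interval runs from 168.8 cm to 181.2 cm.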
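Steps 1 to 3 can be sketched in a few lines of Python. The inputs are the summary values from Step 1, and $z = 1.960$ is taken from the table in Step 2. The margin of error is $z \cdot S/\sqrt{n} \approx 6.2$ cm, which reproduces the interval 168.8 cm to 181.2 cm.

```python
from math import sqrt

n, xbar, s = 40, 175.0, 20.0   # from Step 1
z = 1.960                      # 95% level, from the table in Step 2

margin = z * s / sqrt(n)       # 1.960 * 20 / sqrt(40), about 6.2 cm
lower, upper = xbar - margin, xbar + margin
print(f"{xbar} +/- {margin:.1f} cm  ->  [{lower:.1f}, {upper:.1f}]")
```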


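HOW TO FIND THE z-VALUE FROM A TABLE?

• For CI = 95%, $\alpha = 0.05$ (or $\alpha/2 = 0.025$).
• Find the $z$ for which $\Phi(z) = 0.95 + 0.025 = 0.975$; that is, the area under the standard normal curve to the left of $z$ is 0.975, leaving an area of 0.025 to the right. The standard normal table gives $z = 1.96$.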
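Instead of a printed table, the inverse of $\Phi$ can be evaluated directly. This sketch uses `NormalDist().inv_cdf` from Python's standard library, looking up $z$ with $\Phi(z) = \text{confidence} + \alpha/2$ as described above. Rounded to three decimals, it reproduces the $z$ column of the Step 2 table.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value: the z with Phi(z) = confidence + alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(confidence + alpha / 2)

for level in (0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:6.1%}: z = {z_value(level):.3f}")
```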



CONFIDENCE INTERVAL AND ITS PROPERTIES

• A confidence interval estimate for $\mu$ is an interval of the form
$$l \le \mu \le u$$
where the end-points $l$ and $u$ are computed from the sample data.
• There is a probability of $1 - \alpha$ of selecting a sample for which the CI will contain the true value of $\mu$.
• The endpoints or bounds $l$ and $u$ are called the lower and upper confidence limits, and $1 - \alpha$ is called the confidence coefficient.



CONFIDENCE INTERVAL ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE KNOWN

• If $\bar{x}$ is the sample mean of a random sample of size $n$ from a normal population with known variance $\sigma^2$, a $100(1 - \alpha)\%$ CI on $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point of the standard normal distribution.

EXAMPLE: METALLIC MATERIAL TRANSITION

Ten measurements of impact energy (J) on specimens of A238 steel cut at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. The impact energy is normally distributed with $\sigma = 1$ J. Find a 95% CI for $\mu$, the mean impact energy.
Answer:
The required quantities are $z_{\alpha/2} = z_{0.025} = 1.96$, $n = 10$, $\sigma = 1$, and $\bar{x} = 64.46$. The resulting CI is
$$64.46 - 1.96\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\frac{1}{\sqrt{10}}$$
Interpretation: Based on the sample data, a range of highly plausible values for the mean impact energy of A238 steel at 60°C is
$$63.84 \text{ J} \le \mu \le 65.08 \text{ J}$$
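A minimal sketch of the known-variance interval applied to the ten impact-energy measurements, using only the standard library. Here $z_{0.025} = 1.96$ and $\sigma = 1$ J, as given in the example.

```python
import statistics
from math import sqrt

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]  # J
sigma, z = 1.0, 1.96          # known sigma; z_{0.025}

xbar = statistics.fmean(energy)           # 64.46
half = z * sigma / sqrt(len(energy))      # about 0.62
lower, upper = xbar - half, xbar + half
print(f"{lower:.2f} J <= mu <= {upper:.2f} J")
```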


ONE-SIDED CONFIDENCE BOUNDS

One-Sided Confidence Bounds on the Mean, Variance Known
• A $100(1 - \alpha)\%$ upper-confidence bound for $\mu$ is
$$\mu \le u = \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$
• and a $100(1 - \alpha)\%$ lower-confidence bound for $\mu$ is
$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} = l \le \mu$$

EXAMPLE: ONE-SIDED CONFIDENCE BOUND

The same impact-testing data from the previous example are used to construct a lower, one-sided 95% confidence interval for the mean impact energy.
Answer: $z_{\alpha} = 1.64$, $n = 10$, $\sigma = 1$, and $\bar{x} = 64.46$.
A $100(1 - \alpha)\%$ lower-confidence bound for $\mu$ is
$$64.46 - 1.64\frac{1}{\sqrt{10}} \le \mu \quad\Rightarrow\quad 63.94 \le \mu$$

The lower limit of a one-sided interval is always greater than the lower limit of a two-sided interval of equal confidence.

The upper limit of a one-sided interval is always less than the upper limit of a two-sided interval of equal confidence.
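The following sketch computes both lower limits side by side to illustrate the first note. It obtains $z_{0.05}$ and $z_{0.025}$ from `NormalDist().inv_cdf`; $z_{0.05} \approx 1.645$, which the slide rounds to 1.64. Because $z_{\alpha} < z_{\alpha/2}$, the one-sided lower bound sits above the two-sided lower limit.

```python
import statistics
from math import sqrt
from statistics import NormalDist

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]  # J
sigma, alpha = 1.0, 0.05
n, xbar = len(energy), statistics.fmean(energy)

z_one = NormalDist().inv_cdf(1 - alpha)      # z_0.05, about 1.645 (slide rounds to 1.64)
z_two = NormalDist().inv_cdf(1 - alpha / 2)  # z_0.025, about 1.960

lower_one_sided = xbar - z_one * sigma / sqrt(n)
lower_two_sided = xbar - z_two * sigma / sqrt(n)
print(f"one-sided:             {lower_one_sided:.2f} <= mu")
print(f"two-sided lower limit: {lower_two_sided:.2f}")
```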


EXAMPLE

A manufacturer produces piston rings for an automobile engine. It is known that ring diameter is normally distributed with $\sigma = 0.004$ millimeters. A random sample of 20 rings has a mean diameter of $\bar{x} = 74.036$ millimeters.
(a) Construct a 99% two-sided confidence interval on the mean piston ring diameter.
For CI = 99%, $\alpha = 0.01$ (or $\alpha/2 = 0.005$); the $z$ value satisfies $\Phi(z) = 0.99 + 0.005 = 0.995$, so
$z_{\alpha/2} = z_{0.005} = 2.58$ → $74.0337 \le \mu \le 74.0383$
(b) Construct a 99% lower-confidence bound on the mean piston ring diameter.
For CI = 99%, $\alpha = 0.01$; the $z$ value satisfies $\Phi(z) = 0.99$, so
$z_{\alpha} = z_{0.01} = 2.33$ → $\mu \ge 74.0339$
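A short sketch reproducing parts (a) and (b). It uses the rounded table values $z_{0.005} = 2.58$ and $z_{0.01} = 2.33$ from the solution, so that the printed limits match the slide.

```python
from math import sqrt

sigma, n, xbar = 0.004, 20, 74.036   # mm
se = sigma / sqrt(n)

# (a) two-sided 99% CI with z_{0.005} = 2.58 (table value used on the slide)
a_lower, a_upper = xbar - 2.58 * se, xbar + 2.58 * se
# (b) 99% lower-confidence bound with z_{0.01} = 2.33
b_lower = xbar - 2.33 * se

print(f"(a) {a_lower:.4f} <= mu <= {a_upper:.4f}")
print(f"(b) mu >= {b_lower:.4f}")
```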


THE t DISTRIBUTION (STUDENT'S t DISTRIBUTION)

William Sealy Gosset
• Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. The random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with $n - 1$ degrees of freedom.

The t distribution is a probability distribution that is used to estimate population parameters when:
✓ the sample size is small and/or
✓ the population variance is unknown

THE t DISTRIBUTION

• The t probability density function is
$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{\left[(x^2/k) + 1\right]^{(k+1)/2}}, \quad -\infty < x < \infty$$
where $k$ is the number of degrees of freedom.
• As the number of degrees of freedom $k \to \infty$, the limiting form of the t distribution is the standard normal distribution.

If the sample size is large enough, say $n \ge 30$, the distribution of $T$ does not differ considerably from the standard normal. However, for $n < 30$, it is useful to deal with the exact distribution of $T$.
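The density formula can be evaluated directly to see the limiting behaviour. This sketch computes the Gamma-function ratio with `math.lgamma`, an implementation choice that avoids overflow for large $k$. It compares $f(0)$ and $f(3)$ with the standard normal density: as $k$ grows, the t density approaches the normal one, while for small $k$ it has heavier tails.

```python
import math

def t_pdf(x, k):
    """Student t density with k degrees of freedom (lgamma avoids overflow for large k)."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(math.pi * k)
    return math.exp(log_c) * (x * x / k + 1) ** (-(k + 1) / 2)

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for k in (1, 5, 30, 1000):
    print(f"k = {k:4d}: f(0) = {t_pdf(0, k):.4f}, f(3) = {t_pdf(3, k):.5f}")
print(f"N(0, 1):   f(0) = {normal_pdf(0):.4f}, f(3) = {normal_pdf(3):.5f}")
```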


THE t DISTRIBUTION

• Shape: bell-shaped, symmetric
• Center: centered at zero
• Spread: controlled by the degrees of freedom
• Sample size = $n$
• Degrees of freedom = $n - 1$

THE t DISTRIBUTION

• Let $t_{\alpha,k}$ be the value of the random variable $T$ with $k$ degrees of freedom above which we find an area (or probability) $\alpha$.
• Thus, $t_{\alpha,k}$ is an upper-tailed $100\alpha$ percentage point of the t distribution with $k$ degrees of freedom.

Percentage points of the t distribution.


It is customary to let $t_{\alpha}$ represent the t-value above which we find an area equal to $\alpha$. Hence, the t-value with 10 degrees of freedom leaving an area of 0.025 to the right is $t_{0.025} = 2.228$.

Since the t distribution is symmetric about a mean of zero, we have $t_{1-\alpha} = -t_{\alpha}$; that is, the t-value leaving an area of $1 - \alpha$ to the right, and therefore an area of $\alpha$ to the left, is equal to the negative t-value that leaves an area of $\alpha$ in the right tail of the distribution. For example, $t_{0.95} = -t_{0.05}$ and $t_{0.99} = -t_{0.01}$.

EXAMPLE

• The t-value with $k = 14$ degrees of freedom that leaves an area of 0.025 to the left, and therefore an area of 0.975 to the right, is
$t_{0.975} = -t_{0.025} = -2.145$
• Find $P(-t_{0.025} < T < t_{0.05})$.
Since $t_{0.05}$ leaves an area of 0.05 to the right, and $-t_{0.025}$ leaves an area of 0.025 to the left, we find a total area of
$1 - 0.05 - 0.025 = 0.925$
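Python's standard library has no t-distribution functions, so this sketch builds a CDF by numerically integrating the density with Simpson's rule; `t_pdf` and `t_cdf` are illustrative helpers, not a library API. With the table values for 14 degrees of freedom ($t_{0.025} = 2.145$, $t_{0.05} = 1.761$), the area between $-t_{0.025}$ and $t_{0.05}$ should come out close to 0.925. In practice, a statistics package's t CDF would replace these helpers.

```python
import math

def t_pdf(x, df):
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)
    return math.exp(log_c) * (x * x / df + 1) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2_000):
    """P(T <= x): Simpson's rule for the area between 0 and |x|, then symmetry about 0."""
    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

df = 14
t_025, t_05 = 2.145, 1.761   # table values for 14 degrees of freedom
print(f"P(T > {t_05}) = {1 - t_cdf(t_05, df):.4f}")
print(f"P(-{t_025} < T < {t_05}) = {t_cdf(t_05, df) - t_cdf(-t_025, df):.4f}")
```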


EXAMPLE

• Find $k$ such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal distribution, where
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
From the table, we note that 1.761 corresponds to $t_{0.05}$ when $v = 14$. Therefore, $-t_{0.05} = -1.761$. Since $k$ in the original probability statement is to the left of $-t_{0.05} = -1.761$, let $k = -t_{\alpha}$. The area between $k$ and $-1.761$ is then $0.05 - \alpha$, so $0.045 = 0.05 - \alpha$, or $\alpha = 0.005$. Hence, from the table with $v = 14$,
$k = -t_{0.005} = -2.977$ and $P(-2.977 < T < -1.761) = 0.045$.
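The table lookup can also be replaced by a root search. This sketch reuses an illustrative Simpson's-rule t CDF, since the standard library has none; the helper names are hypothetical. It bisects for the lower limit $k$ with $P(k < T < -1.761) = 0.045$ at 14 degrees of freedom. The result should agree with $-t_{0.005} = -2.977$ to within the rounding of the table value 1.761.

```python
import math

def t_pdf(x, df):
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)
    return math.exp(log_c) * (x * x / df + 1) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2_000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def lower_limit(upper, prob, df, lo=-20.0, iters=60):
    """Bisection for c < upper with P(c < T < upper) = prob."""
    target = t_cdf(upper, df) - prob   # we need t_cdf(c) == target
    hi = upper
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid                   # too little area left of mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

k_value = lower_limit(-1.761, 0.045, df=14)   # v = n - 1 = 14
print(f"k = {k_value:.3f}")
```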

CONFIDENCE INTERVAL ON MEAN, VARIANCE UNKNOWN

• If $\bar{x}$ and $s$ are the mean and standard deviation of a random sample from a normal distribution with unknown variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval on $\mu$ is given by
$$\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,n-1}$ is the upper $100\alpha/2$ percentage point of the t distribution with $n - 1$ degrees of freedom.
• One-sided confidence bounds on the mean are found by replacing $t_{\alpha/2,n-1}$ in the above equation with $t_{\alpha,n-1}$.


EXAMPLE: ALLOY ADHESION

Construct a 95% CI on $\mu$ for the following data:
19.8 10.1 14.9 7.5 15.4 15.4
15.4 18.5 7.9 12.7 11.9 11.4
11.4 14.1 17.6 16.7 15.8
19.5 8.8 13.6 11.9 11.4

The sample mean is $\bar{x} = 13.71$ and the sample standard deviation is $s = 3.55$.

Answer: Since $n = 22$, we have $n - 1 = 21$ degrees of freedom for t, so $t_{0.025,21} = 2.080$ [Table].
The resulting CI is
$$13.71 - 2.080\frac{3.55}{\sqrt{22}} \le \mu \le 13.71 + 2.080\frac{3.55}{\sqrt{22}}$$
$$12.14 \le \mu \le 15.28$$
Interpretation: The CI is fairly wide because there is a lot of variability in the measurements. A larger sample size would have led to a shorter interval.
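A sketch of the t-interval computed from the raw data. The critical value $t_{0.025,21} = 2.080$ is copied from the t table because Python's standard library has no t quantile function. Computed without first rounding $\bar{x}$ and $s$, the upper limit prints as 15.29 rather than the slide's 15.28.

```python
import statistics
from math import sqrt

adhesion = [19.8, 10.1, 14.9, 7.5, 15.4, 15.4,
            15.4, 18.5, 7.9, 12.7, 11.9, 11.4,
            11.4, 14.1, 17.6, 16.7, 15.8,
            19.5, 8.8, 13.6, 11.9, 11.4]
t_crit = 2.080   # t_{0.025,21}, copied from the t table

n = len(adhesion)                  # 22
xbar = statistics.fmean(adhesion)  # about 13.71
s = statistics.stdev(adhesion)     # about 3.55 (divides by n - 1)
half = t_crit * s / sqrt(n)
lower, upper = xbar - half, xbar + half
print(f"n = {n}, x-bar = {xbar:.2f}, s = {s:.2f}")
print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```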

EXAMPLE

• Acme Corporation manufactures light bulbs. The CEO claims that an average Acme light bulb lasts 300 days. A researcher randomly selects 15 bulbs for testing. The sampled bulbs last an average of 290 days, with a standard deviation of 50 days. If the CEO's claim were true, what is the probability that 15 randomly selected bulbs would have an average life of no more than 290 days?
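A possible solution sketch: since only the sample standard deviation is known, the statistic $T = (\bar{x} - 300)/(s/\sqrt{n})$ is referred to a t distribution with $n - 1 = 14$ degrees of freedom. Bulb lifetimes are assumed to be approximately normal, which the example does not state. The CDF is again an illustrative Simpson's-rule helper, not a library function. The probability comes out at roughly 0.23.

```python
import math

def t_pdf(x, df):
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)
    return math.exp(log_c) * (x * x / df + 1) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2_000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu0, xbar, s, n = 300, 290, 50, 15
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # about -0.775
df = n - 1                                   # 14
p = t_cdf(t_stat, df)                        # P(X-bar <= 290) if mu = 300
print(f"t = {t_stat:.3f}, df = {df}, P = {p:.3f}")
```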
