Statistics Midterms Notes
MODULE:
PROFESSOR: Sir Tommie Dicang Date: October 23, 2022
AVERAGE DEVIATION
Ungrouped data
$AD = \frac{\sum |x - \mu|}{N}$ or $AD = \frac{\sum |x - \bar{x}|}{n}$
AD = average deviation
x = the value of any particular observation or measurement
µ = population mean
Xbar = sample mean
N = population
n = sample population
Example:
The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670, P860, P480. Find the average deviation.
Xbar = 592.50
AD = 905 / 8 = 113.13
Grouped data
$AD = \frac{\sum f|x - \mu|}{N}$ or $AD = \frac{\sum f|x - \bar{x}|}{n}$

SAMPLE VARIANCE AND STANDARD DEVIATION
Ungrouped data
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$
x = value of any particular observation or measurement
$\sum x$ = sum of all xs
$\sum x^2$ = sum of all the squares of xs
Xbar = sample mean
n = sample population
Grouped data
$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1}$
$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$
$s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}$
$s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}}$
$s^2$ = sample variance
s = sample SD
x = value of any particular observation or measurement
$\sum f x$ = sum of all the products of f and xs
$\sum f x^2$ = sum of all the products of f and the squares of xs
Xbar = sample mean
n = sample population
f = frequency
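The ungrouped formulas above can be checked with a short sketch. This is only an illustration, assuming plain Python lists as input; the function names are made up for this example and do not come from the notes.

```python
from math import sqrt

def average_deviation(values):
    """Mean of the absolute deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def sample_variance(values):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

# Daily rates from the GMS Inc. example in the notes.
daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]

print(average_deviation(daily_rates))      # 113.125, which the notes round to 113.13
print(sqrt(sample_variance(daily_rates)))  # sample SD of the same data
```

Both variance functions should agree up to floating-point rounding, which is a quick way to confirm that the two forms of the formula are equivalent.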
SUBJECT: Statistical and Analytic Process
POPULATION VARIANCE AND STANDARD DEVIATION
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
$\sigma^2$ = population variance
σ = population SD
X = value of any particular observation or measurement
μ = population mean
N = population
Example:
The monthly incomes of the five research directors of Recoletos schools are P55,000, P59,500, P62,500, P57,000 and P61,000. Find the variance and SD.
Compute for the mean: 59,000
Variance: 36,500,000 / 5 = 7,300,000
SD: 2,701.85

QUARTILES, DECILES AND PERCENTILES
Quartiles – cut-off points that split the data into 4 groups such that 25% of the observations are in each group.
Quantiles – general term for such cut-off points
Deciles – split the data into 10 parts
Percentiles – split the data into 100 parts (centiles)
*The lowest quartile is also the 25th percentile, and the median is the 50th percentile or the 5th decile.
Ungrouped data
$Q_k = \frac{k(N + 1)}{4}$
Qk = quartile
N = population
k = quartile location
Example:
Find the first, second and third quartiles of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58 and 55.
Arranged in ascending order: 45, 46, 48, 51, 53, 54, 55, 58, 59
Position of Q1: $\frac{k(N + 1)}{4} = \frac{1(9 + 1)}{4} = 2.5$
Position of Q2: $\frac{2(9 + 1)}{4} = 5$
Position of Q3: $\frac{3(9 + 1)}{4} = 7.5$
$Q_1 = \frac{46 + 48}{2} = 47$
$Q_2 = 53$
$Q_3 = \frac{55 + 58}{2} = 56.5$
Grouped data
$Q_k = LB + \left[\frac{\frac{kN}{4} - cf}{f}\right](i)$
Qk = quartile
N = population
k = quartile location
LB = lower boundary of the quartile class
f = frequency of the quartile class
cf = cumulative frequency before the quartile class
i = class interval

DECILE AND PERCENTILE FOR GROUPED DATA
$D_k = LB + \left(\frac{\frac{kN}{10} - cf}{f}\right)(i)$
Dk = decile
N = population
k = decile location
LB = lower boundary of the decile class
f = frequency of the decile class
cf = cumulative frequency before the decile class
$P_k = LB + \left(\frac{\frac{kN}{100} - cf}{f}\right)(i)$
Pk = percentile
N = population
k = percentile location
LB = lower boundary of the percentile class
f = frequency of the percentile class
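The population variance example and the ungrouped quartile example above can be reproduced with a minimal sketch. The function names are hypothetical. The notes only show positions ending in .5 (averaging two neighbours), so the linear interpolation used here for other fractional positions is an assumption.

```python
from math import sqrt

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def quartile(values, k):
    """k-th quartile of ungrouped data using the position k(N + 1)/4.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring sorted values (a .5 position is then the
    plain average, as in the notes). Assumes the position falls inside the
    data, which holds for N >= 3.
    """
    data = sorted(values)
    position = k * (len(data) + 1) / 4   # 1-based position in the sorted data
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return data[whole - 1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

incomes = [55000, 59500, 62500, 57000, 61000]
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]

print(population_variance(incomes))        # variance of the monthly incomes
print(sqrt(population_variance(incomes)))  # SD of the monthly incomes
print([quartile(ages, k) for k in (1, 2, 3)])
```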
MIDHINGE
- Mean of the first and third quartiles in the data set
- Used to overcome potential problems introduced by extreme values in the data set
$\text{midhinge} = \frac{Q_1 + Q_3}{2}$
Example:
Find the midhinge of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58 and 55.
$\text{Midhinge} = \frac{Q_1 + Q_3}{2} = \frac{47 + 56.5}{2} = 51.75$

INTERQUARTILE RANGE
- Midspread or middle fifty
- Measure of statistical dispersion, equal to the difference between the third and first quartiles
- It is a robust (strong) statistic, having a breakdown point of 25%, and is preferred to the total range.
- Used to build box plots
$IQR = Q_3 - Q_1$
Example:
Find the IQR of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
IQR = 56.5 – 47 = 9.5

QUARTILE DEVIATION
Example:
$QD = \frac{56.5 - 47}{2} = 4.75$
Example:
Determine the midhinge, IQR, and QD of the frequency distribution on the ages of 50 people taking travel tours.
$\text{Midhinge} = \frac{Q_1 + Q_3}{2} = \frac{40 + 58.82}{2} = 49.41$
$IQR = Q_3 - Q_1 = 58.82 - 40 = 18.82$
$QD = \frac{Q_3 - Q_1}{2} = \frac{58.82 - 40}{2} = 9.41$

COEFFICIENT OF VARIATION
- Compares the standard deviations of two samples with different units of measure
- SD divided by the mean
- Expressed as a percentage
For sample: $CV = \frac{s}{\bar{x}}(100\%)$
For population: $CV = \frac{\sigma}{\mu}(100\%)$
CV = coefficient of variation
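The midhinge, IQR, QD and CV definitions above reduce to one-line calculations once the quartiles are known. The sketch below assumes the quartiles are already computed and passed in. The function names and the CV inputs are hypothetical, chosen only for illustration.

```python
def midhinge(q1, q3):
    """Mean of the first and third quartiles."""
    return (q1 + q3) / 2

def interquartile_range(q1, q3):
    """Difference between the third and first quartiles."""
    return q3 - q1

def quartile_deviation(q1, q3):
    """Half of the interquartile range."""
    return (q3 - q1) / 2

def coefficient_of_variation(sd, mean):
    """SD divided by the mean, expressed as a percentage."""
    return sd / mean * 100

# Quartiles of the nine ages in the notes: Q1 = 47, Q3 = 56.5.
print(midhinge(47, 56.5), interquartile_range(47, 56.5), quartile_deviation(47, 56.5))

# Hypothetical sample with SD 5 and mean 50, used only to show the CV call.
print(coefficient_of_variation(5, 50))
```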
CHEBYSHEV’S THEOREM
- For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$
- Where k is any constant greater than 1
Example:
The mean price of a laptop computer is P25,500 and the SD is P2,500. Find the price range for which at least 88.89% of the laptops will sell.
At least 88.89% corresponds to k = 3, since 1 – 1/3² = 0.8889.
25500 + 3(2500) = 33000
25500 – 3(2500) = 18000

KURTOSIS (KYRTOS or KURTOS)
- Greek word for bulging
- It is a statistical measure used to describe the distribution of observed data around the mean.
- It measures the relative peakedness or flatness of a distribution
$kurt = \left\{\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^4\right\} - \frac{3(n-1)^2}{(n-2)(n-3)}$
kurt = kurtosis
n = sample population
X = the value of any particular measure
Xbar = sample mean
σ = SD

3 TYPES OF KURTOSIS
1. Leptokurtic – Distributions where values cluster heavily or pile up in the center. These are tall distributions with narrow humps and long, high tails. Positive kurtosis (kurt > 0) denotes a high degree of peakedness.
2. Mesokurtic – Intermediate distributions which are neither too peaked nor too flat. The values …
3. Platykurtic – … humps and short tails. Negative kurtosis (kurt < 0) denotes a low degree of peakedness.

NORMAL DISTRIBUTION
- Gaussian distribution
- Bell-shaped continuous probability distribution widely used in statistical inference that describes data that cluster around a mean
Properties:
- bell-shaped
- mean, median and mode are equal and are located at the center of the distribution
- normal distribution curve is symmetric about the mean (the shape is the same on both sides)
- normal distribution is continuous
- normal curve is asymptotic (it never touches the x-axis)
- total area under the normal distribution curve is 1.00 or 100%
- area under the part of a normal curve that lies within 1 SD of the mean is about 68%; within 2 SD, about 95%; and within 3 SD, about 99.7%.

STANDARD NORMAL (Z) DISTRIBUTION
- Z value is the signed distance between a selected value, designated X, and the mean, µ, divided by the SD.
- Other terms for the Z value are the z score, the z statistic, the standard normal deviate or the standard normal value.
$Z = \frac{X - \mu}{\sigma}$
z = z value
X = the value of any particular observation or measurement
µ = the mean of the distribution
σ = SD of the observation
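Chebyshev's bound and the kurtosis formula above can be sketched as follows. The function names are hypothetical. The notes define σ only as "sd", so treating it as the sample standard deviation (dividing by n − 1) is an assumption, and the five-value data set is made up for illustration.

```python
from math import sqrt

def chebyshev_min_proportion(k):
    """Minimum proportion of values within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Range mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

def kurtosis(values):
    """Kurtosis by the formula in the notes; needs at least 4 values.

    Assumption: sigma is the sample SD (sum of squares divided by n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth_powers = sum(((x - mean) / sd) ** 4 for x in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth_powers - correction

# Laptop example from the notes: mean P25,500, SD P2,500, k = 3.
print(chebyshev_min_proportion(3))
print(chebyshev_interval(25500, 2500, 3))

# Hypothetical flat data set; a negative result indicates low peakedness.
print(kurtosis([1, 2, 3, 4, 5]))
```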
Example:
Find the area under the standard normal distribution curve between z = 0 and z = 1.85.
- Using the table, locate 1.8 and 0.05; then ztab = 0.4678
- P(0 < z < 1.85) = 0.4678
Example 2:
The average age of bank managers is 40 years. Assume the variable is normally distributed. If the SD is 5 years, find the probability that the age of a randomly selected bank manager will be in the range between 35 and 46 years old.
- z = (35 – 40) / 5 = -1.00; z = (46 – 40) / 5 = 1.20
- P(-1.00 < z < 0) = 0.3413; P(0 < z < 1.20) = 0.3849
- P(35 < x < 46) = P(-1.00 < z < 1.20) = 0.3413 + 0.3849 = 0.7262
Example:
The average Pag-ibig salary loan for RFS Pharmacy Inc. employees is P23,000. If the debt is normally distributed with an SD of P2,500, find the probability that an employee owes less than P18,500.
Step 1: Draw the figure and represent the area.
- The required area is the right tail of the normal curve. From the table, P(0 < z < 1.15) = 0.3749.
- Subtract P(0 < z < 1.15) = 0.3749 from 0.5000, since half of the area under the curve is to the right of z = 0.
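The table lookups in these examples can be cross-checked without a z table. The sketch below builds the standard normal cumulative probability from `math.erf` in the Python standard library. The function names are hypothetical, and small differences from the printed answers come from the four-decimal rounding of table values.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_score(x, mean, sd):
    """Signed distance of x from the mean, in SD units."""
    return (x - mean) / sd

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return phi(z_high) - phi(z_low)

# Area between z = 0 and z = 1.85 (table value 0.4678).
print(area_between(0, 1.85))

# Bank managers: mean 40, SD 5, ages between 35 and 46 (table answer 0.7262).
print(area_between(z_score(35, 40, 5), z_score(46, 40, 5)))

# Right-tail area beyond z = 1.15, i.e. 0.5000 minus the table value 0.3749.
print(1 - phi(1.15))

# Pag-ibig loan: mean 23,000, SD 2,500, probability of owing less than 18,500.
print(phi(z_score(18500, 23000, 2500)))
```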
- The central limit theorem is used to gain information about a sample mean when the variable is normally distributed or the sample size is 30 or more.
Example 1:
The average cost per household of owning a brand-new car is P5,000. Suppose that we randomly select 40 households. Determine the probability that the sample mean for these 40 households is more than P5,350. Assume the variable is normally distributed and the SD is P1,230.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1230}{\sqrt{40}} = 194.48$
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{5350 - 5000}{1230/\sqrt{40}} = 1.80$
- Ztab = 0.4641; since the area is in the right tail, subtract it from 0.500
- P(xbar > 5350) = P(z > 1.80) = 0.500 – 0.4641 = 0.0359

NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
- A binomial distribution with n trials and probability p of success gets closer to a normal distribution as n increases.
- The normal distribution has the same mean µ = np and SD σ = √(npq) as the binomial distribution
The formulas for the mean and SD for the binomial distribution are as follows:
$\mu = np$, $\sigma = \sqrt{npq}$ where $q = 1 - p$

CONFIDENCE INTERVALS AND SAMPLE SIZE
Estimation - One aspect of inferential statistics
- process of estimating the value of a parameter from information drawn from a sample
- determines the approximate value of a population parameter

CONFIDENCE INTERVALS OF THE MEAN AND SAMPLE SIZE, SD KNOWN
Point estimate – value of a sample statistic that is used to estimate a population parameter.
- For the estimation of the population mean, the margin of error is calculated as: Margin of error = $\pm 1.960\,\sigma_{\bar{x}}$ or $\pm 1.960\,s_{\bar{x}}$
- $s_{\bar{x}}$ is a point estimator of $\sigma_{\bar{x}}$

CONFIDENCE INTERVALS
- good estimator
1. unbiased estimator – an estimator of a population parameter whose expected value is equal to that parameter
2. consistent estimator – if the difference between the estimator and the parameter grows smaller as the sample size grows larger
3. relatively efficient estimator – if there are 2 unbiased estimators of a parameter, the one whose variance is smaller.
Interval estimate - interval or a range of values used to estimate the parameter being estimated
- degree of confidence can be assigned before an interval estimate is prepared
Confidence level – states how much confidence we have that this interval contains the true population parameter.
- denoted by (1 – α)100%. When stated as a probability, it is called the CONFIDENCE COEFFICIENT and is denoted as 1 – α (0.99, 0.95, and 0.90).
The formula for the confidence interval of the mean for a specific α is
$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the lower confidence limit (LCL)
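The central limit theorem example and the binomial mean and SD formulas above can be sketched as follows. The function names are hypothetical, the normal probability comes from `math.erf` rather than a z table, and the binomial inputs (n = 100, p = 0.5) are made up for illustration.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def standard_error(sigma, n):
    """SD of the sample mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

def prob_sample_mean_above(x_bar, mu, sigma, n):
    """P(sample mean > x_bar) under the normal model for the sample mean."""
    z = (x_bar - mu) / standard_error(sigma, n)
    return 1 - phi(z)

def binomial_mean_sd(n, p):
    """Mean np and SD sqrt(npq) of a binomial distribution, with q = 1 - p."""
    q = 1 - p
    return n * p, sqrt(n * p * q)

# Car-cost example from the notes: mu = 5000, sigma = 1230, n = 40.
print(standard_error(1230, 40))
print(prob_sample_mean_above(5350, 5000, 1230, 40))

# Hypothetical binomial with 100 trials and success probability 0.5.
print(binomial_mean_sd(100, 0.5))
```

The probability differs slightly from the notes' 0.0359 because the notes round z to 1.80 before using the table.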
Example:
An independent researcher wishes to estimate the average amount of money a young professional spends on clothing each month. A sample of 40 young professionals who spend on clothing found the mean to be P500 and the SD to be P80. Find the best point estimate of the population mean and the 95% confidence interval of the population mean.
- xbar = 500; sd = 80; n = 40
- The best point estimate of the population mean is the sample mean, xbar = 500
- For the 95% confidence interval, $z_{\alpha/2}$ = 1.960
- $\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
- $500 - 1.960\left(\frac{80}{\sqrt{40}}\right) < \mu < 500 + 1.960\left(\frac{80}{\sqrt{40}}\right)$
- $500 - 24.792 < \mu < 500 + 24.792$
- $475.208 < \mu < 524.792$

SAMPLE SIZE DETERMINATION
To get an accurate estimate we need three things:
1. the maximum error of estimate
2. the population standard deviation
3. the degree of confidence
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$
n = sample size
σ = population standard deviation
E = maximum error of estimate

CONFIDENCE INTERVALS FOR THE MEAN AND SAMPLE SIZE, σ UNKNOWN
- When σ is known or unknown and n ≥ 30, the standard normal distribution is used to determine the confidence intervals of the mean.
- When σ is unknown and n < 30, the t distribution is used:
$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
The degree of freedom is n – 1.
Example:
A recent study of 19 students of San Sebastian College – Recoletos, Manila showed that the mean of the distance they traveled to go to school was 10.2 km. The standard deviation of the sample was 2.3 km. Find the 99% confidence interval of the true mean.
- xbar = 10.2; s = 2.3; n = 19
- Find alpha/2 = 0.01/2 = 0.005
- Using the t-distribution table, with df = n – 1 = 19 – 1 = 18 (one tailed); t = 2.878
- $\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
- $10.2 - 2.878\left(\frac{2.3}{\sqrt{19}}\right) < \mu < 10.2 + 2.878\left(\frac{2.3}{\sqrt{19}}\right)$
- $10.2 - 1.519 < \mu < 10.2 + 1.519$
- $8.681 < \mu < 11.719$

CONFIDENCE INTERVALS AND SAMPLE SIZE FOR PROPORTIONS
Proportions can be taken from populations or samples.
$\hat{p} = \frac{X}{n}$ and $\hat{q} = \frac{n - X}{n}$ or $\hat{q} = 1 - \hat{p}$
To construct a confidence interval about a proportion, one must use the maximum error of estimate, which is
• $E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
Confidence intervals about the proportions must meet the criteria that np ≥ 5 and nq ≥ 5.
• $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

SAMPLE SIZE FOR PROPORTIONS
By using the maximum error part of the confidence interval formula, it is possible to determine the size of the sample that must be taken in order to estimate p with a desired accuracy. The maximum error of estimate for a proportion can be expressed as
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2$
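The interval and sample-size formulas above can be collected into a small sketch. The function names are hypothetical. The critical value (z or t) is passed in as read from a table, because the standard library has no t distribution. Rounding the sample size up to the next whole number is an assumption that the notes do not state.

```python
from math import ceil, sqrt

def mean_confidence_interval(x_bar, sd, n, critical_value):
    """x_bar -/+ critical_value * sd / sqrt(n); works for z or t values."""
    margin = critical_value * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

def sample_size_for_mean(z, sigma, max_error):
    """n = (z * sigma / E)^2, rounded up (rounding rule is an assumption)."""
    return ceil((z * sigma / max_error) ** 2)

def proportion_confidence_interval(x, n, z):
    """p_hat -/+ z * sqrt(p_hat * q_hat / n), with p_hat = x / n."""
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

def sample_size_for_proportion(p_hat, z, max_error):
    """n = p_hat * q_hat * (z / E)^2, rounded up (assumption)."""
    return ceil(p_hat * (1 - p_hat) * (z / max_error) ** 2)

# Clothing example: xbar = 500, sd = 80, n = 40, z = 1.960.
print(mean_confidence_interval(500, 80, 40, 1.960))

# Travel-distance example: xbar = 10.2, s = 2.3, n = 19, t = 2.878 (df = 18).
print(mean_confidence_interval(10.2, 2.3, 19, 2.878))

# Hypothetical inputs, used only to show the remaining calls.
print(sample_size_for_mean(1.960, 80, 20))
print(proportion_confidence_interval(60, 200, 1.960))
print(sample_size_for_proportion(0.5, 1.960, 0.05))
```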