FORMULAS


PART 1: DATA BINNING AND DESCRIPTIVE STATISTICS

1. Data binning (Equal-width binning)


- The number of bins:

- The width of a bin:
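A minimal sketch in Python, assuming the common choices k ≈ √n (rounded up) for the number of bins and width = (max − min)/k; other rules such as Sturges' are equally valid:

import math

def equal_width_bins(data, k=None):
    # Number of bins: assumed rule k = ceil(sqrt(n)); any other rule (e.g. Sturges') works too.
    n = len(data)
    if k is None:
        k = math.ceil(math.sqrt(n))
    lo, hi = min(data), max(data)
    w = (hi - lo) / k                        # width of a bin = (max - min) / k
    edges = [lo + i * w for i in range(k + 1)]
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) // w), k - 1)   # the maximum value falls into the last bin
        counts[i] += 1
    return edges, counts

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(equal_width_bins(data, k=4))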

2. Frequency tables

Value     Frequency    Percentage     Cumulative       Cumulative
  Xi         fi          di (%)       frequency        percentage
  X1         f1            d1             f1                d1
  X2         f2            d2           f1+f2            d1+d2
  ...        ...           ...            ...               ...
  Xk         fk            dk        f1+f2+...+fk     d1+d2+...+dk
Total         n           100
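A minimal sketch of building such a table from raw data with Python; the sample values are illustrative and the columns mirror the table above:

from collections import Counter

values = [1, 2, 2, 3, 3, 3, 4]
freq = Counter(values)                       # fi for each distinct value Xi
n = sum(freq.values())

cum_f, cum_d = 0, 0.0
print(f"{'Value':>6} {'fi':>4} {'di(%)':>7} {'cum fi':>7} {'cum di(%)':>10}")
for x in sorted(freq):
    fi = freq[x]
    di = 100 * fi / n                        # percentage
    cum_f += fi                              # cumulative frequency
    cum_d += di                              # cumulative percentage
    print(f"{x:>6} {fi:>4} {di:>7.1f} {cum_f:>7} {cum_d:>10.1f}")
print(f"{'Total':>6} {n:>4} {100.0:>7.1f}")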

3. Numerical measures of central tendency


3.1. Mean:
- Arithmetic mean:

- Weighted mean:
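A minimal sketch of both means, assuming the standard definitions X̄ = Σxi/n and X̄ = Σwi·xi/Σwi:

def arithmetic_mean(xs):
    # X-bar = (x1 + x2 + ... + xn) / n
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # X-bar = sum(wi * xi) / sum(wi); the weights wi are often the frequencies fi
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

print(arithmetic_mean([1800, 1900, 2000, 2100]))      # 1950.0
print(weighted_mean([1800, 1900, 2000], [2, 3, 5]))   # 1930.0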

3.2. Mode: is the value that occurs with the greatest frequency.
 For the unbinned data: Mode is the value with the highest frequency.
 For the equal-width binned data: The bin containing the mode value is
the bin with the highest frequency. The mode value is calculated by this
formula:

 For binned data with unequal bin widths: The bin containing the mode is determined not by the frequency but by the distribution density (Distribution density = Frequency / Width):
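A minimal sketch for the binned mode, assuming the usual interpolation Mode = L + h·(f_mo − f_prev)/((f_mo − f_prev) + (f_mo − f_next)), applied here to densities fi/hi so that the same code also covers unequal bin widths (with equal widths it reduces to the frequency version); L and h are the lower limit and width of the modal bin:

def grouped_mode(lowers, widths, freqs):
    # Modal bin: highest density fi/hi (for equal widths this is just the highest frequency).
    dens = [f / h for f, h in zip(freqs, widths)]
    i = dens.index(max(dens))
    d_prev = dens[i] - (dens[i - 1] if i > 0 else 0)
    d_next = dens[i] - (dens[i + 1] if i < len(dens) - 1 else 0)
    # Assumed textbook interpolation inside the modal bin.
    return lowers[i] + widths[i] * d_prev / (d_prev + d_next)

# Bins [10,20), [20,30), [30,40) with frequencies 4, 9, 5 -> mode ≈ 25.56
print(grouped_mode([10, 20, 30], [10, 10, 10], [4, 9, 5]))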

3.3. Median: is the value in the middle when the data are arranged in
ascending order (smallest value to largest value). The median divides the
data into 2 parts, with each part having one-half of the observations
(50%).

- For an odd number of observations:

- For an even number of observations:

- The median value in the binned dataset:


S1: Calculate the cumulative frequencies
S2: The bin containing the median is the first bin whose cumulative frequency is greater than or equal to (n+1)/2
S3: Apply this formula:
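A minimal sketch for both cases, assuming the usual grouped-data formula Me = L + h·(n/2 − F_prev)/f_me, with the median bin located by the rule in S2 (L, h: lower limit and width of the median bin; F_prev: cumulative frequency before it; f_me: its frequency):

def median_raw(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def median_binned(lowers, widths, freqs):
    n = sum(freqs)
    cum = 0
    for L, h, f in zip(lowers, widths, freqs):
        if cum + f >= (n + 1) / 2:            # S2: first bin whose cumulative frequency reaches (n+1)/2
            return L + h * (n / 2 - cum) / f  # S3: assumed formula L + h*(n/2 - F_prev)/f_me
        cum += f

print(median_raw([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]))   # 2150.0
print(median_binned([10, 20, 30], [10, 10, 10], [4, 9, 5]))           # ≈ 25.56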

3.4. Quartiles: divide the sorted data set in ascending order into four
parts, with each part having one-fourth of the observations (25%).

If n+1 is divisible by 4:

If n+1 is NOT divisible by 4:

EX: We have the following numbers: 1800; 1900; 2000; 2100; 2200;
2500; 2700; 2800.

 Based on the weighted average:


→ n=8 → (n+1)/4 = 2.25; 2(n+1)/4 = 4.5; 3(n+1)/4 = 6.75
→ Q1 = 1900 + 0.25(2000 − 1900) = 1925
   Q2 = 2100 + 0.5(2200 − 2100) = 2150
   Q3 = 2500 + 0.75(2700 − 2500) = 2650

 Based on Tukey's Hinges


→ n=8 → (n+1)/4 = 2 1/4; 2(n+1)/4 = 4 1/2; 3(n+1)/4 = 6 3/4
→ Q1 = 1900 + 1/2(2000 − 1900) = 1950
   Q2 = 2100 + 1/2(2200 − 2100) = 2150
   Q3 = 2500 + 1/2(2700 − 2500) = 2600
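The weighted-average results above can be checked with the position rule k(n+1)/4 and linear interpolation; a minimal sketch that reproduces Q1 = 1925, Q2 = 2150 and Q3 = 2650 (Tukey's hinges replace the fractions 0.25 and 0.75 by 0.5, giving 1950, 2150, 2600):

def quartile_weighted(xs, k):
    # k-th quartile at position k(n+1)/4 with linear interpolation between neighbours.
    s = sorted(xs)
    pos = k * (len(s) + 1) / 4
    i = int(pos)
    frac = pos - i
    return s[i - 1] if frac == 0 else s[i - 1] + frac * (s[i] - s[i - 1])

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print([quartile_weighted(data, k) for k in (1, 2, 3)])   # [1925.0, 2150.0, 2650.0]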

3.5. Percentiles: In a dataset, the pth percentile divides the data into two
parts: approximately p% of the observations are less than the pth
percentile, and approximately (100 – p)% of the observations are greater
than the pth percentile.
Qp = the value at position (p/100)·(n+1) in the sorted data, interpolating between the two neighbouring values when the position is not an integer.
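A minimal sketch of the same (n+1) position rule for an arbitrary percentile p, clamping positions that fall outside the data:

def percentile(xs, p):
    # Value at position (p/100)(n+1), clamped to the data range and interpolated.
    s = sorted(xs)
    n = len(s)
    pos = min(max(p / 100 * (n + 1), 1), n)
    i = int(pos)
    frac = pos - i
    return s[i - 1] if i >= n or frac == 0 else s[i - 1] + frac * (s[i] - s[i - 1])

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))   # 1925.0 2150.0 2800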
4. Numerical measures of dispersion (variability)

4.1. Range: is the difference between the maximum and the minimum
value of the data set.

R=XMax-XMin
4.2. Interquartile Range: is the difference between the third quartile, Q3,
and the first quartile, Q1. It is the range for the middle 50% of the data. It
overcomes the dependency on extreme values (or outliers).

RQ=Q3-Q1

4.3. Variance: is the average of the squared differences from the
mean. In other words, it is the square of the standard deviation.

 In the unbinned dataset:

 In the binned dataset with corresponding frequencies:

4.4. Standard Deviation: is one of the measures of dispersion. Standard
Deviation (usually denoted by σ for the population and by S for the
sample) is calculated by taking the square root of the variance:

σ = √(σ²)        S = √(S²)

4.5. Coefficient of Variation: is a relative measure of variability; it
measures the standard deviation relative to the mean.
CV (population) = (σ/μ) × 100%        CV (sample) = (S/X̄) × 100%
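A minimal sketch tying sections 4.1–4.5 together with numpy (ddof=0 gives the population variance σ², ddof=1 the sample variance S²; note that numpy's default quartile interpolation differs slightly from the (n+1) rule used above):

import numpy as np

x = np.array([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800], dtype=float)

r = x.max() - x.min()                                  # 4.1 Range
q1, q3 = np.percentile(x, [25, 75])                    # numpy's default interpolation rule
iqr = q3 - q1                                          # 4.2 Interquartile range
var_pop, var_sam = x.var(ddof=0), x.var(ddof=1)        # 4.3 population / sample variance
sd_pop, sd_sam = np.sqrt(var_pop), np.sqrt(var_sam)    # 4.4 standard deviations
cv_sam = sd_sam / x.mean() * 100                       # 4.5 coefficient of variation (%)

print(r, iqr, var_pop, var_sam, sd_pop, sd_sam, round(cv_sam, 1))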

5. Chebyshev’s Theorem: any sample (regardless of the shape of the
distribution) with mean X̄ and standard deviation S has at least (1 − 1/m²)·100% of its values falling in the range (X̄ − mS; X̄ + mS), with m > 1. Or, at least (1 − 1/m²)·100% of the data values must be within m standard deviations of the mean, where m is any value greater than 1.

Some of the implications of this theorem, with m = 1.5, 2, 2.5 and 3
standard deviations, follow:
• At least 55.6% of the data values must be within (X̄ − 1.5S; X̄ + 1.5S)
• At least 75% of the data values must be within (X̄ − 2S; X̄ + 2S)
• At least 84% of the data values must be within (X̄ − 2.5S; X̄ + 2.5S)
• At least 88.9% of the data values must be within (X̄ − 3S; X̄ + 3S)
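A small sketch that computes the Chebyshev lower bound (1 − 1/m²)·100% and compares it with the coverage actually observed in a sample (the data are illustrative):

import numpy as np

def chebyshev_check(x, m):
    xbar, s = x.mean(), x.std(ddof=1)
    inside = (x >= xbar - m * s) & (x <= xbar + m * s)
    bound = (1 - 1 / m**2) * 100        # guaranteed minimum coverage
    return bound, inside.mean() * 100   # observed coverage

x = np.array([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800], dtype=float)
for m in (1.5, 2, 2.5, 3):
    bound, actual = chebyshev_check(x, m)
    print(f"m={m}: at least {bound:.1f}% guaranteed, {actual:.1f}% observed")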

6. Empirical Rule: When the data is believed to approximate the normal
distribution (a bell-shaped distribution):
• Approximately 68% of the data values will be within (X̄ − S; X̄ + S)
• Approximately 95% of the data values will be within (X̄ − 2S; X̄ + 2S)
• Approximately 99.7% of the data values will be within (X̄ − 3S; X̄ + 3S)

7. Z-scores: are measures of relative location that help us determine how far
a particular value is from the mean, measured in terms of standard
deviations from the mean.

Z (population) = (X − μ)/σ        Z (sample) = (X − X̄)/S
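A minimal sketch of both z-score versions (ddof=0 for the population σ, ddof=1 for the sample S):

import numpy as np

def z_scores(x, population=False):
    x = np.asarray(x, dtype=float)
    center = x.mean()                              # mu or X-bar
    spread = x.std(ddof=0 if population else 1)    # sigma or S
    return (x - center) / spread

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(np.round(z_scores(x), 2))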

PART 2: INTERVAL ESTIMATION AND HYPOTHESIS TEST


1. Interval estimation (for 1 sample):

2. Estimating the mean difference between 2 independent samples


With σ1 and σ2 known:

With σ1 and σ2 unknown:

3. Estimating the mean difference between 2 paired samples

4. Estimating the proportion difference between 2 independent samples
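A minimal sketch for item 1, assuming the standard forms X̄ ± z(α/2)·σ/√n when σ is known and X̄ ± t(α/2, n−1)·S/√n when it is not; items 2–4 follow the same pattern with the appropriate standard error and degrees of freedom:

import math
from scipy import stats

def mean_ci(x, conf=0.95, sigma=None):
    # z-interval when sigma is known, t-interval (df = n-1) when it is estimated by S.
    n, xbar = len(x), sum(x) / len(x)
    if sigma is not None:
        half = stats.norm.ppf(1 - (1 - conf) / 2) * sigma / math.sqrt(n)
    else:
        s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
        half = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1) * s / math.sqrt(n)
    return xbar - half, xbar + half

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(mean_ci(x))               # sigma unknown: t-interval
print(mean_ci(x, sigma=350))    # sigma assumed known: z-interval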

5. Hypothesis test about the population mean (1 sample)

Rejection rules:

(similar when comparing t values)
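A minimal sketch of the one-sample test, using scipy's t-test when σ is unknown and the z statistic (X̄ − μ0)/(σ/√n) when σ is assumed known; the rejection rule compares the p-value with α (or the statistic with the critical value):

import math
from scipy import stats

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
mu0, alpha = 2000, 0.05

# sigma unknown: one-sample t-test (two-sided H1: mu != mu0)
res = stats.ttest_1samp(x, popmean=mu0)
print(res.statistic, res.pvalue, "reject H0" if res.pvalue < alpha else "do not reject H0")

# sigma assumed known: z statistic compared with the critical value
sigma = 350
z = (sum(x) / len(x) - mu0) / (sigma / math.sqrt(len(x)))
z_crit = stats.norm.ppf(1 - alpha / 2)
print(z, "reject H0" if abs(z) > z_crit else "do not reject H0")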

6. Hypothesis test about the population proportion (1 sample)

7. Hypothesis test about the mean difference between 2 independent samples

Rejection rules:

(similar when comparing t values)

8. Hypothesis test about the mean difference between 2 paired samples

Rejection rules: similar when comparing Z or t values for
1 sample

9. Hypothesis test about the proportion difference between 2 independent samples

Rejection rules: similar when comparing Z or t values for 1 sample
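A minimal sketch of items 7–9 with scipy for the mean tests and a hand-coded pooled z statistic for the proportions (all data are illustrative):

import math
from scipy import stats

a = [22, 25, 27, 30, 31, 35]
b = [20, 21, 24, 26, 28, 29]

# Item 7: mean difference, two independent samples
# (equal_var=True gives the pooled-variance version; False gives Welch's test)
print(stats.ttest_ind(a, b, equal_var=False))

# Item 8: mean difference, two paired samples
print(stats.ttest_rel(a, b))

# Item 9: proportion difference, two independent samples (pooled z statistic)
x1, n1, x2, n2 = 45, 100, 30, 90
p1, p2, p = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
print(z, 2 * (1 - stats.norm.cdf(abs(z))))   # statistic and two-sided p-value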

10. Analysis of variance (ANOVA)


One-way ANOVA (Fisher test or F test):

SST = SSG + SSW

Post-hoc one-way ANOVA (Tukey test):

o If the response (or outcome) variable is not normally distributed, we
can convert the quantitative data to categorical data (i.e., using the
ordinal scale) and apply a non-parametric test called Kruskal-Wallis:
H0: μ1 = μ2 = ... = μk
H1: not all population means are equal

Reject H0 if:
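Both tests are available directly in scipy; a minimal sketch with three illustrative groups, rejecting H0 when the p-value is below α:

from scipy import stats

g1 = [23, 25, 27, 29, 31]
g2 = [20, 22, 24, 26, 28]
g3 = [30, 32, 34, 36, 38]

f_stat, p_anova = stats.f_oneway(g1, g2, g3)   # one-way ANOVA (F test)
h_stat, p_kw = stats.kruskal(g1, g2, g3)       # Kruskal-Wallis (non-parametric)

alpha = 0.05
print("ANOVA:", f_stat, p_anova, "reject H0" if p_anova < alpha else "do not reject H0")
print("Kruskal-Wallis:", h_stat, p_kw, "reject H0" if p_kw < alpha else "do not reject H0")

Newer SciPy versions also provide scipy.stats.tukey_hsd for the post-hoc Tukey comparison in item 10.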

PART 3: TIME SERIES DATA AND FORECASTING / INDEX NUMBERS

1. Forecasting using the Moving Averages Method
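A minimal sketch, assuming the simple k-period moving average in which the forecast for the next period is the mean of the last k observations:

def moving_average_forecast(y, k=3):
    # Forecast for the next period = mean of the last k observations.
    return sum(y[-k:]) / k

y = [120, 125, 130, 128, 135, 140]
print(moving_average_forecast(y, k=3))   # (128 + 135 + 140) / 3 ≈ 134.33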

2. Forecasting using the Arithmetic Progression Method

3. Forecasting using the Geometric Progression Method

4. Forecasting using the Exponential growth method

5. Forecasting using Linear Trend Regression
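A minimal sketch, assuming the linear trend ŷt = b0 + b1·t fitted by least squares; raising the polynomial degree to 2 gives the quadratic trend of item 6:

import numpy as np

y = np.array([120, 125, 130, 128, 135, 140], dtype=float)
t = np.arange(1, len(y) + 1)

b1, b0 = np.polyfit(t, y, deg=1)     # least-squares slope b1 and intercept b0
forecast = b0 + b1 * (len(y) + 1)    # extrapolate the trend one period ahead
print(b0, b1, forecast)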

6. Forecasting using Quadratic Trend Equation

7. Forecasting using Exponential Trend Equation

8. Forecasting using Time Series Decomposition

9. Forecasting using the Exponential Smoothing Method
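A minimal sketch of simple exponential smoothing, assuming the standard recursion F(t+1) = α·Yt + (1 − α)·Ft with the forecast initialised to the first observation:

def exponential_smoothing_forecast(y, alpha=0.3):
    # F_{t+1} = alpha * Y_t + (1 - alpha) * F_t, with F initialised to the first observation.
    f = y[0]
    for value in y:
        f = alpha * value + (1 - alpha) * f
    return f

y = [120, 125, 130, 128, 135, 140]
print(round(exponential_smoothing_forecast(y, alpha=0.3), 2))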

10. Forecasting using the Holt-Winters forecasting method

11. Simple index number

12. Unweighted aggregate price index

13. Weighted aggregate price index: when the quantity of usage is the measure
of importance.

14. Weighted aggregate quantity index: when the price is the measure of
importance.
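A minimal sketch of items 13 and 14, assuming Laspeyres-style weighting (base-period quantities weight the price index, base-period prices weight the quantity index); all figures are illustrative:

p0 = [2.0, 3.5, 1.2]   # base-period prices
q0 = [10, 4, 25]       # base-period quantities
p1 = [2.2, 3.8, 1.5]   # current-period prices
q1 = [12, 5, 24]       # current-period quantities

def weighted_price_index(p0, p1, q0):
    # Item 13: prices weighted by base-period quantities (Laspeyres-style)
    return 100 * sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))

def weighted_quantity_index(q0, q1, p0):
    # Item 14: quantities weighted by base-period prices
    return 100 * sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))

print(round(weighted_price_index(p0, p1, q0), 1))
print(round(weighted_quantity_index(q0, q1, p0), 1))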

15. Composite index system
