FORMULAS


PART 1: DATA BINNING AND DESCRIPTIVE STATISTICS

1. Data binning (Equal-width binning)


- The number of bins:

- The width of a bin:
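A minimal sketch in Python, assuming the common choices k ≈ √n (rounded up) for the number of bins and width = (max − min)/k; other rules such as Sturges' are equally valid:

import math

def equal_width_bins(data, k=None):
    # Number of bins: assumed rule k = ceil(sqrt(n)); any other rule (e.g. Sturges') works too.
    n = len(data)
    if k is None:
        k = math.ceil(math.sqrt(n))
    lo, hi = min(data), max(data)
    w = (hi - lo) / k                        # width of a bin = (max - min) / k
    edges = [lo + i * w for i in range(k + 1)]
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) // w), k - 1)   # the maximum value falls into the last bin
        counts[i] += 1
    return edges, counts

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(equal_width_bins(data, k=4))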

2. Frequency tables

Value     Frequency    Percentage     Cumulative       Cumulative
  Xi         fi          di (%)       frequency        percentage
  X1         f1            d1             f1                d1
  X2         f2            d2           f1+f2            d1+d2
  ...        ...           ...            ...               ...
  Xk         fk            dk        f1+f2+...+fk     d1+d2+...+dk
Total         n           100
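A minimal sketch of building such a table from raw data with Python; the sample values are illustrative and the columns mirror the table above:

from collections import Counter

values = [1, 2, 2, 3, 3, 3, 4]
freq = Counter(values)                       # fi for each distinct value Xi
n = sum(freq.values())

cum_f, cum_d = 0, 0.0
print(f"{'Value':>6} {'fi':>4} {'di(%)':>7} {'cum fi':>7} {'cum di(%)':>10}")
for x in sorted(freq):
    fi = freq[x]
    di = 100 * fi / n                        # percentage
    cum_f += fi                              # cumulative frequency
    cum_d += di                              # cumulative percentage
    print(f"{x:>6} {fi:>4} {di:>7.1f} {cum_f:>7} {cum_d:>10.1f}")
print(f"{'Total':>6} {n:>4} {100.0:>7.1f}")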

3. Numerical measures of central tendency


3.1. Mean:
- Arithmetic mean:

- Weighted mean:
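A minimal sketch of both means, assuming the standard definitions X̄ = Σxi/n and X̄ = Σwi·xi/Σwi:

def arithmetic_mean(xs):
    # X-bar = (x1 + x2 + ... + xn) / n
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # X-bar = sum(wi * xi) / sum(wi); the weights wi are often the frequencies fi
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

print(arithmetic_mean([1800, 1900, 2000, 2100]))      # 1950.0
print(weighted_mean([1800, 1900, 2000], [2, 3, 5]))   # 1930.0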

3.2. Mode: is the value that occurs with the greatest frequency.
 For the unbinned data: Mode is the value with the highest frequency.
 For the equal-width binned data: The bin containing the mode value is
the bin with the highest frequency. The mode value is calculated by this
formula:

 For binned data with unequal bin widths: The bin containing the mode is determined not by the frequency but by the distribution density (Distribution density = Frequency / Width):
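A minimal sketch for the binned mode, assuming the usual interpolation Mode = L + h·(f_mo − f_prev)/((f_mo − f_prev) + (f_mo − f_next)), applied here to densities fi/hi so that the same code also covers unequal bin widths (with equal widths it reduces to the frequency version); L and h are the lower limit and width of the modal bin:

def grouped_mode(lowers, widths, freqs):
    # Modal bin: highest density fi/hi (for equal widths this is just the highest frequency).
    dens = [f / h for f, h in zip(freqs, widths)]
    i = dens.index(max(dens))
    d_prev = dens[i] - (dens[i - 1] if i > 0 else 0)
    d_next = dens[i] - (dens[i + 1] if i < len(dens) - 1 else 0)
    # Assumed textbook interpolation inside the modal bin.
    return lowers[i] + widths[i] * d_prev / (d_prev + d_next)

# Bins [10,20), [20,30), [30,40) with frequencies 4, 9, 5 -> mode ≈ 25.56
print(grouped_mode([10, 20, 30], [10, 10, 10], [4, 9, 5]))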

3.3. Median: is the value in the middle when the data are arranged in
ascending order (smallest value to largest value). The median divides the
data into 2 parts, with each part having one-half of the observations
(50%).

- For an odd number of observations:

- For an even number of observations:

- The median value in the binned dataset:


S1: Calculate the cumulative frequencies
S2: The bin containing the median is the first bin whose cumulative frequency is greater than or equal to (n+1)/2
S3: Apply this formula:
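A minimal sketch for both cases, assuming the usual grouped-data formula Me = L + h·(n/2 − F_prev)/f_me, with the median bin located by the rule in S2 (L, h: lower limit and width of the median bin; F_prev: cumulative frequency before it; f_me: its frequency):

def median_raw(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def median_binned(lowers, widths, freqs):
    n = sum(freqs)
    cum = 0
    for L, h, f in zip(lowers, widths, freqs):
        if cum + f >= (n + 1) / 2:            # S2: first bin whose cumulative frequency reaches (n+1)/2
            return L + h * (n / 2 - cum) / f  # S3: assumed formula L + h*(n/2 - F_prev)/f_me
        cum += f

print(median_raw([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]))   # 2150.0
print(median_binned([10, 20, 30], [10, 10, 10], [4, 9, 5]))           # ≈ 25.56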

3.4. Quartiles: divide the sorted data set in ascending order into four
parts, with each part having one-fourth of the observations (25%).

If n+1 is divisible by 4:

If n+1 is NOT divisible by 4:

EX: We have the following numbers: 1800; 1900; 2000; 2100; 2200;
2500; 2700; 2800.

 Based on the weighted average:


→ n=8 → (n+1)/4 = 2.25; 2(n+1)/4 = 4.5; 3(n+1)/4 = 6.75
→ Q1 = 1900 + 0.25(2000 − 1900) = 1925
   Q2 = 2100 + 0.5(2200 − 2100) = 2150
   Q3 = 2500 + 0.75(2700 − 2500) = 2650

 Based on Tukey's Hinges


→ n=8 → (n+1)/4 = 2 1/4; 2(n+1)/4 = 4 1/2; 3(n+1)/4 = 6 3/4
→ Q1 = 1900 + 1/2(2000 − 1900) = 1950
   Q2 = 2100 + 1/2(2200 − 2100) = 2150
   Q3 = 2500 + 1/2(2700 − 2500) = 2600
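The weighted-average results above can be checked with the position rule k(n+1)/4 and linear interpolation; a minimal sketch that reproduces Q1 = 1925, Q2 = 2150 and Q3 = 2650 (Tukey's hinges replace the fractions 0.25 and 0.75 by 0.5, giving 1950, 2150, 2600):

def quartile_weighted(xs, k):
    # k-th quartile at position k(n+1)/4 with linear interpolation between neighbours.
    s = sorted(xs)
    pos = k * (len(s) + 1) / 4
    i = int(pos)
    frac = pos - i
    return s[i - 1] if frac == 0 else s[i - 1] + frac * (s[i] - s[i - 1])

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print([quartile_weighted(data, k) for k in (1, 2, 3)])   # [1925.0, 2150.0, 2650.0]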

3.5. Percentiles: In a dataset, the pth percentile divides the data into two
parts: approximately p% of the observations are less than the pth
percentile, and approximately (100 – p)% of the observations are greater
than the pth percentile.
Qp = the value at position (p/100)·(n+1) in the sorted data, interpolating between the two neighbouring values when the position is not an integer.
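A minimal sketch of the same (n+1) position rule for an arbitrary percentile p, clamping positions that fall outside the data:

def percentile(xs, p):
    # Value at position (p/100)(n+1), clamped to the data range and interpolated.
    s = sorted(xs)
    n = len(s)
    pos = min(max(p / 100 * (n + 1), 1), n)
    i = int(pos)
    frac = pos - i
    return s[i - 1] if i >= n or frac == 0 else s[i - 1] + frac * (s[i] - s[i - 1])

data = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))   # 1925.0 2150.0 2800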
4. Numerical measures of dispersion (variability)

4.1. Range: is the difference between the maximum and the minimum
value of the data set.

R=XMax-XMin
4.2. Interquartile Range: is the difference between the third quartile, Q3,
and the first quartile, Q1. It is the range for the middle 50% of the data. It
overcomes the dependency on extreme values (or outliers).

RQ=Q3-Q1

4.3. Variance: is the average of the squared differences from the
mean. In other words, it is the square of the standard deviation.

 In the unbinned dataset:

 In the binned dataset with corresponding frequencies:

4.4. Standard Deviation: is one of the measures of dispersion. Standard
Deviation (usually denoted by σ for the population and by S for the
sample) is calculated by taking the square root of the variance:

σ = √(σ²)        S = √(S²)

4.5. Coefficient of Variation: is a relative measure of variability; it
measures the standard deviation relative to the mean.
CV (population) = (σ/μ) × 100%        CV (sample) = (S/X̄) × 100%
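A minimal sketch tying sections 4.1–4.5 together with numpy (ddof=0 gives the population variance σ², ddof=1 the sample variance S²; note that numpy's default quartile interpolation differs slightly from the (n+1) rule used above):

import numpy as np

x = np.array([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800], dtype=float)

r = x.max() - x.min()                                  # 4.1 Range
q1, q3 = np.percentile(x, [25, 75])                    # numpy's default interpolation rule
iqr = q3 - q1                                          # 4.2 Interquartile range
var_pop, var_sam = x.var(ddof=0), x.var(ddof=1)        # 4.3 population / sample variance
sd_pop, sd_sam = np.sqrt(var_pop), np.sqrt(var_sam)    # 4.4 standard deviations
cv_sam = sd_sam / x.mean() * 100                       # 4.5 coefficient of variation (%)

print(r, iqr, var_pop, var_sam, sd_pop, sd_sam, round(cv_sam, 1))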

5. Chebyshev’s Theorem: any sample (regardless of the shape of the
distribution) with mean X̄ and standard deviation S has at least (1 − 1/m²)·100% of its values falling in the range (X̄ − mS; X̄ + mS), with m > 1. Or, at least (1 − 1/m²)·100% of the data values must be within m standard deviations of the mean, where m is any value greater than 1.

Some of the implications of this theorem, with m = 1.5, 2, 2.5 and 3
standard deviations, follow:
• At least 55.6% of the data values must be within (X̄ − 1.5S; X̄ + 1.5S)
• At least 75% of the data values must be within (X̄ − 2S; X̄ + 2S)
• At least 84% of the data values must be within (X̄ − 2.5S; X̄ + 2.5S)
• At least 88.9% of the data values must be within (X̄ − 3S; X̄ + 3S)
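A small sketch that computes the Chebyshev lower bound (1 − 1/m²)·100% and compares it with the coverage actually observed in a sample (the data are illustrative):

import numpy as np

def chebyshev_check(x, m):
    xbar, s = x.mean(), x.std(ddof=1)
    inside = (x >= xbar - m * s) & (x <= xbar + m * s)
    bound = (1 - 1 / m**2) * 100        # guaranteed minimum coverage
    return bound, inside.mean() * 100   # observed coverage

x = np.array([1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800], dtype=float)
for m in (1.5, 2, 2.5, 3):
    bound, actual = chebyshev_check(x, m)
    print(f"m={m}: at least {bound:.1f}% guaranteed, {actual:.1f}% observed")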

6. Empirical Rule: When the data is believed to approximate the normal
distribution (a bell-shaped distribution):
• Approximately 68% of the data values will be within (X̄ − S; X̄ + S)
• Approximately 95% of the data values will be within (X̄ − 2S; X̄ + 2S)
• Approximately 99.7% of the data values will be within (X̄ − 3S; X̄ + 3S)

7. Z-scores: are measures of relative location that help us determine how far
a particular value is from the mean, measured in terms of standard
deviations from the mean.

Z (population) = (X − μ)/σ        Z (sample) = (X − X̄)/S
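A minimal sketch of both z-score versions (ddof=0 for the population σ, ddof=1 for the sample S):

import numpy as np

def z_scores(x, population=False):
    x = np.asarray(x, dtype=float)
    center = x.mean()                              # mu or X-bar
    spread = x.std(ddof=0 if population else 1)    # sigma or S
    return (x - center) / spread

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(np.round(z_scores(x), 2))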

PART 2: INTERVAL ESTIMATION AND HYPOTHESIS TEST


1. Interval estimation (for 1 sample):

2. Estimating the mean difference between 2 independent samples


With σ1 and σ2 known:

With σ1 and σ2 unknown:

3. Estimating the mean difference between 2 paired samples

4. Estimating the proportion difference between 2 independent samples
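A minimal sketch for item 1, assuming the standard forms X̄ ± z(α/2)·σ/√n when σ is known and X̄ ± t(α/2, n−1)·S/√n when it is not; items 2–4 follow the same pattern with the appropriate standard error and degrees of freedom:

import math
from scipy import stats

def mean_ci(x, conf=0.95, sigma=None):
    # z-interval when sigma is known, t-interval (df = n-1) when it is estimated by S.
    n, xbar = len(x), sum(x) / len(x)
    if sigma is not None:
        half = stats.norm.ppf(1 - (1 - conf) / 2) * sigma / math.sqrt(n)
    else:
        s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
        half = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1) * s / math.sqrt(n)
    return xbar - half, xbar + half

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
print(mean_ci(x))               # sigma unknown: t-interval
print(mean_ci(x, sigma=350))    # sigma assumed known: z-interval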

5. Hypothesis test about the population mean (1 sample)

Rejection rules:

(similar when comparing t values)
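A minimal sketch of the one-sample test, using scipy's t-test when σ is unknown and the z statistic (X̄ − μ0)/(σ/√n) when σ is assumed known; the rejection rule compares the p-value with α (or the statistic with the critical value):

import math
from scipy import stats

x = [1800, 1900, 2000, 2100, 2200, 2500, 2700, 2800]
mu0, alpha = 2000, 0.05

# sigma unknown: one-sample t-test (two-sided H1: mu != mu0)
res = stats.ttest_1samp(x, popmean=mu0)
print(res.statistic, res.pvalue, "reject H0" if res.pvalue < alpha else "do not reject H0")

# sigma assumed known: z statistic compared with the critical value
sigma = 350
z = (sum(x) / len(x) - mu0) / (sigma / math.sqrt(len(x)))
z_crit = stats.norm.ppf(1 - alpha / 2)
print(z, "reject H0" if abs(z) > z_crit else "do not reject H0")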

6. Hypothesis test about the population proportion (1 sample)

7. Hypothesis test about the mean difference between 2 independent samples

Rejection rules:

(similar when comparing t values)

8. Hypothesis test about the mean difference between 2 paired samples

Rejection rules: similar when comparing Z or t values for
1 sample

9. Hypothesis test about the proportion difference between 2 independent samples

Rejection rules: similar when comparing Z or t values for 1 sample
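A minimal sketch of items 7–9 with scipy for the mean tests and a hand-coded pooled z statistic for the proportions (all data are illustrative):

import math
from scipy import stats

a = [22, 25, 27, 30, 31, 35]
b = [20, 21, 24, 26, 28, 29]

# Item 7: mean difference, two independent samples
# (equal_var=True gives the pooled-variance version; False gives Welch's test)
print(stats.ttest_ind(a, b, equal_var=False))

# Item 8: mean difference, two paired samples
print(stats.ttest_rel(a, b))

# Item 9: proportion difference, two independent samples (pooled z statistic)
x1, n1, x2, n2 = 45, 100, 30, 90
p1, p2, p = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
print(z, 2 * (1 - stats.norm.cdf(abs(z))))   # statistic and two-sided p-value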

10. Analysis of variance (ANOVA)


One-way ANOVA (Fisher test or F test):

SST = SSG + SSW

Post-hoc one-way ANOVA (Tukey test):

o If the response (or outcome) variable is not normally distributed, we
can convert the quantitative data to categorical data (i.e., using the
ordinal scale) and apply a non-parametric test called Kruskal-Wallis:
H0: μ1 = μ2 = ... = μk
H1: not all population means are equal

Reject H0 if:
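Both tests are available directly in scipy; a minimal sketch with three illustrative groups, rejecting H0 when the p-value is below α:

from scipy import stats

g1 = [23, 25, 27, 29, 31]
g2 = [20, 22, 24, 26, 28]
g3 = [30, 32, 34, 36, 38]

f_stat, p_anova = stats.f_oneway(g1, g2, g3)   # one-way ANOVA (F test)
h_stat, p_kw = stats.kruskal(g1, g2, g3)       # Kruskal-Wallis (non-parametric)

alpha = 0.05
print("ANOVA:", f_stat, p_anova, "reject H0" if p_anova < alpha else "do not reject H0")
print("Kruskal-Wallis:", h_stat, p_kw, "reject H0" if p_kw < alpha else "do not reject H0")

Newer SciPy versions also provide scipy.stats.tukey_hsd for the post-hoc Tukey comparison in item 10.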

PART 3: TIME SERIES DATA AND FORECASTING / INDEX NUMBERS

1. Forecasting using the Moving Averages Method
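A minimal sketch, assuming the simple k-period moving average in which the forecast for the next period is the mean of the last k observations:

def moving_average_forecast(y, k=3):
    # Forecast for the next period = mean of the last k observations.
    return sum(y[-k:]) / k

y = [120, 125, 130, 128, 135, 140]
print(moving_average_forecast(y, k=3))   # (128 + 135 + 140) / 3 ≈ 134.33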

2. Forecasting using the Arithmetic Progression Method

3. Forecasting using the Geometric Progression Method

4. Forecasting using the Exponential growth method

5. Forecasting using Linear Trend Regression
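A minimal sketch, assuming the linear trend ŷt = b0 + b1·t fitted by least squares; raising the polynomial degree to 2 gives the quadratic trend of item 6:

import numpy as np

y = np.array([120, 125, 130, 128, 135, 140], dtype=float)
t = np.arange(1, len(y) + 1)

b1, b0 = np.polyfit(t, y, deg=1)     # least-squares slope b1 and intercept b0
forecast = b0 + b1 * (len(y) + 1)    # extrapolate the trend one period ahead
print(b0, b1, forecast)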

6. Forecasting using Quadratic Trend Equation

7. Forecasting using Exponential Trend Equation

8. Forecasting using Time Series Decomposition

9. Forecasting using the Exponential Smoothing Method
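A minimal sketch of simple exponential smoothing, assuming the standard recursion F(t+1) = α·Yt + (1 − α)·Ft with the forecast initialised to the first observation:

def exponential_smoothing_forecast(y, alpha=0.3):
    # F_{t+1} = alpha * Y_t + (1 - alpha) * F_t, with F initialised to the first observation.
    f = y[0]
    for value in y:
        f = alpha * value + (1 - alpha) * f
    return f

y = [120, 125, 130, 128, 135, 140]
print(round(exponential_smoothing_forecast(y, alpha=0.3), 2))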

10. Forecasting using the Holt-Winters forecasting method

11. Simple index number

12. Unweighted aggregate price index

13. Weighted aggregate price index: when the quantity of usage is the measure
of importance.

14. Weighted aggregate quantity index: when the price is the measure of
importance.
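A minimal sketch of items 13 and 14, assuming Laspeyres-style weighting (base-period quantities weight the price index, base-period prices weight the quantity index); all figures are illustrative:

p0 = [2.0, 3.5, 1.2]   # base-period prices
q0 = [10, 4, 25]       # base-period quantities
p1 = [2.2, 3.8, 1.5]   # current-period prices
q1 = [12, 5, 24]       # current-period quantities

def weighted_price_index(p0, p1, q0):
    # Item 13: prices weighted by base-period quantities (Laspeyres-style)
    return 100 * sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))

def weighted_quantity_index(q0, q1, p0):
    # Item 14: quantities weighted by base-period prices
    return 100 * sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))

print(round(weighted_price_index(p0, p1, q0), 1))
print(round(weighted_quantity_index(q0, q1, p0), 1))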

15. Composite index system
