Basic Statistics Terms and Calculations


Statistics - The science of collecting, organizing, describing, and interpreting data or information.
In the study of statistics, it is important to be familiar with a variety of terms.

Data Set - A collection of information.

Variable - A characteristic about which information can be collected.

Distribution - The way a variable’s values are spread over the possible values. The distribution can be
displayed in a table or a graph.

Qualitative Variable - A variable whose values are classified into categories (e.g. colors, sports, makes of cars).

Quantitative Variable - A variable whose values are numerical and describe how much or how many of something
there is (e.g. people's ages, heights, salaries, test scores).

Mode - The most common value that shows up in a data set. There can be more than one mode. If all of
the data values appear only once, then there is no mode. If the data set is displayed as a bar graph, the
mode(s) will show up as the highest point(s)/peak(s). Both Qualitative and Quantitative variables can have a
mode.
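
As an illustration of the definition above, here is a minimal Python sketch of finding the mode(s). The function name `find_modes` and both sample lists are hypothetical, chosen only for this example; returning an empty list for "no mode" is one possible convention, not the only one.

```python
from collections import Counter

def find_modes(values):
    """Return a list of the mode(s), or an empty list when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []  # an empty data set has no mode
    highest = max(counts.values())
    if highest == 1:
        # Every value appears only once, so there is no mode.
        return []
    # Keep every value that ties for the highest count (there can be several).
    return [value for value, count in counts.items() if count == highest]

grades = [80, 92, 76, 80, 92, 54]         # hypothetical quantitative data
colors = ["red", "blue", "red", "green"]  # hypothetical qualitative data

print(find_modes(grades))     # two values tie for most frequent
print(find_modes(colors))     # works for categories as well as numbers
print(find_modes([1, 2, 3]))  # no repeated value, so no mode
```

The same function handles both kinds of variable because finding a mode only requires counting, not arithmetic.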

Note: There are certain terms used in statistical analysis that only apply to
Quantitative Variables. Often, these terms involve a mathematical computation
as part of their definition. For example…

Mean – The average value of a data set. This can be calculated by summing up all of the individual
values in the data set and dividing the total by the number of data values (n) in the set.

Median - The middle value in a sorted (i.e. low to high) data set. If there is an even number of values,
then it is the average of the two middle values.

Range – The difference between the highest and lowest values of a data set.
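
The three definitions above can be sketched in a few lines of Python. The list `values` is a hypothetical data set, picked with an even number of values so that the median has to be the average of the two middle values.

```python
import statistics

values = [7, 3, 10, 4]  # hypothetical data set (even number of values)

# Mean: sum of all values divided by the number of values (n).
mean = sum(values) / len(values)         # (7 + 3 + 10 + 4) / 4

# Median: middle of the sorted data; sorted order is 3, 4, 7, 10,
# so the median is the average of the two middle values, 4 and 7.
median = statistics.median(values)

# Range: highest value minus lowest value.
data_range = max(values) - min(values)   # 10 - 3

print(mean, median, data_range)
```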

Variation - A measure of how widely data values are spread out from the center of a data set.

Variance - A measure of how far the values in a data set are from the mean, on average. It is computed
as the average of the squared differences between each value and the mean. To complete the
calculation, it is necessary to know whether the data set is from a population or a sample.

Standard Deviation – A measure of how far data values are spread around the mean of a data set. It is
computed as the square root of the variance. Therefore, to complete the
calculation, it is necessary to know whether the data set is from a population or a
sample.
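
To see why the population/sample distinction matters, the sketch below runs both versions of each calculation on the same numbers using Python's standard `statistics` module. The list `data` is a hypothetical data set chosen because its results are easy to check by hand.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set; mean is 5

# Treating the data as an entire population: divide by n.
pop_variance = statistics.pvariance(data)  # 32 / 8
pop_std_dev = statistics.pstdev(data)      # square root of the population variance

# Treating the same data as a sample: divide by n - 1.
sample_variance = statistics.variance(data)  # 32 / 7
sample_std_dev = statistics.stdev(data)      # square root of the sample variance

print(pop_variance, pop_std_dev)
print(sample_variance, sample_std_dev)
```

The sample versions come out slightly larger because the sum of squared differences is divided by a smaller number.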

z-score - A measure of how many standard deviations a specific value (x) in the data set is from the
mean of the data set. The z-score is positive if the data value is greater than the mean and
negative if it is less than the mean. This term can also be referred to as “the standard score.”

Note: When analyzing a data set, it is necessary to identify if the data is from a
population or from a sample. Different symbols and formulas are used to
represent certain statistical terms depending on whether the data is from an
entire population or a sampling of the population.

Population – The complete set of people or things being studied.


Sample – A subset of the population from which the raw data are actually obtained. Sampling
techniques are often utilized if it is not feasible to gather the entire population of data.

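Terms, Symbols, and Formulas

Sigma (symbol: Σ) - This symbol is used to represent the sum of a specific set of values.

Mean (symbols: μ for a population, x̄ for a sample) - The sum of all values in the data set divided by the
total number of values in the data set. The same calculation is used for a population and for a sample:

$$\mu = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots}{n} \quad \text{(population)}$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots}{n} \quad \text{(sample)}$$

where $x_i$ = individual data points and $n$ = the number of data values.

Variance (symbols: σ² for a population, s² for a sample)

Population Variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$$

Sample Variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation (symbols: σ for a population, s for a sample)

Population Standard Deviation (σ):

$$\sigma = \sqrt{\text{population variance}} = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$$

Sample Standard Deviation (s):

$$s = \sqrt{\text{sample variance}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Z-Score (symbol: z)

$$z = \frac{x - \mu}{\sigma} \quad \text{(population)} \qquad z = \frac{x - \bar{x}}{s} \quad \text{(sample)}$$

where x = specific value from the data set.
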
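
The formulas above translate almost directly into code. The following is a minimal sketch written without any statistics library so that each step stays visible. The function names, the `population` flag, and the list `data` are all assumptions made for this illustration.

```python
import math

def mean(values):
    """Sum of all values divided by the number of values (n)."""
    return sum(values) / len(values)

def variance(values, population=True):
    """Sum of squared differences from the mean, divided by n or n - 1."""
    m = mean(values)
    sum_of_squares = sum((x - m) ** 2 for x in values)
    # Population data divides by n; sample data divides by n - 1.
    divisor = len(values) if population else len(values) - 1
    return sum_of_squares / divisor

def standard_deviation(values, population=True):
    """Square root of the matching (population or sample) variance."""
    return math.sqrt(variance(values, population))

def z_score(x, values, population=True):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean(values)) / standard_deviation(values, population)

data = [1, 2, 3, 4, 5]  # hypothetical data set; mean is 3

print(variance(data))                    # population: 10 / 5
print(variance(data, population=False))  # sample: 10 / 4
print(z_score(5, data))                  # positive, since 5 is above the mean
```

Using a single flag for both cases keeps the only difference between the two variance formulas, the divisor, in one place.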
Example - The population of class grades on a recent exam is displayed in the table.
Using the following normally distributed data set, identify or calculate the following -

a. Mode
b. Mean
c. Median
d. Range
e. Variance
f. Standard Deviation
g. z-score for the student who received an 88 on the exam.

Students' Grades on the Most Recent Statistics Exam

96 74 92 75 64
80 70 54 81 80
72 88 61 76 92

Note: Before analyzing the data, it can be helpful to re-order it from smallest to biggest! This visually
allows the identification of some important information.

Re-ordering the data from low to high makes it easier to identify the High, Low, any
repeated numbers, and the “middle” of the data set.

Students' Grades on the Most Recent Statistics Exam
(Data Re-ordered from low to high)

54 61 64 70 72
74 75 76 80 80
81 88 92 92 96

Lowest data point - 54
Highest data point - 96
The Median - 76. Since there is an odd number of data points in the set, it is the point in the middle.
Modes - There are 2 modes (80 and 92) because they both show up twice.

Answers –

a. Mode(s) - 80 and 92 (The data values that show up the most often)

b. Mean - Sum all of the data points together, then divide by the total number of data points.

$$\mu = \frac{\sum x_i}{n} = \frac{1155}{15} = 77$$

c. Median - 76 (Center of data set)

d. Range - Range = High − Low = 96 − 54 = 42


e. Variance - If calculating by hand, it can be helpful to create a table.

Table for Calculating Variance

 n    Data Point (x)    Mean (μ)    (x − μ)    (x − μ)²
 1         54              77         -23        529
 2         61              77         -16        256
 3         64              77         -13        169
 4         70              77          -7         49
 5         72              77          -5         25
 6         74              77          -3          9
 7         75              77          -2          4
 8         76              77          -1          1
 9         80              77           3          9
10         80              77           3          9
11         81              77           4         16
12         88              77          11        121
13         92              77          15        225
14         92              77          15        225
15         96              77          19        361
Sum of the (x − μ)² column                       2008

$$\text{Variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{2008}{15} = 133.87$$

f. Standard Deviation -

$$\sigma = \sqrt{\text{population variance}} = \sqrt{133.87} = 11.57$$

g. Z-score for student who received an 88 on the exam -

$$z = \frac{x - \mu}{\sigma} = \frac{88 - 77}{11.57} = 0.95$$
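
The answers to this example can be double-checked with a short script. This is only a sketch: the variable names are made up for the illustration, and the population formulas are used because the example states that the grades are a population.

```python
import math
import statistics

grades = [96, 74, 92, 75, 64, 80, 70, 54, 81, 80, 72, 88, 61, 76, 92]

n = len(grades)
modes = sorted(statistics.multimode(grades))  # every value tied for most frequent
mu = sum(grades) / n                          # population mean
median = statistics.median(grades)
data_range = max(grades) - min(grades)

# Rebuild the hand-calculation table: (x, x - mu, (x - mu)^2) for each grade.
rows = [(x, x - mu, (x - mu) ** 2) for x in sorted(grades)]
sum_of_squares = sum(squared for _, _, squared in rows)

variance = sum_of_squares / n   # population variance: divide by n
sigma = math.sqrt(variance)     # population standard deviation
z_88 = (88 - mu) / sigma        # z-score for the grade of 88

for i, (x, deviation, squared) in enumerate(rows, start=1):
    print(f"{i:>2} {x:>4} {deviation:>6.0f} {squared:>6.0f}")

print("modes:", modes, "mean:", mu, "median:", median, "range:", data_range)
print("variance:", round(variance, 2))
print("standard deviation:", round(sigma, 2))
print("z-score for 88:", round(z_88, 2))
```

Rounded to two decimal places, the script agrees with the hand calculation for every part of the example.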

Try this problem on your own!

Corporate Profit Levels in billions in 2012
(Source: CNN Money)

Exxon         41.1      Ford         20.2
Walmart       15.7      H-P           7.1
Chevron       26.9      AT&T          3.9
Conoco        12.4      Valero        2.1
GM             9.2      BoA           1.5
GE            14.2      McKesson      1.2
Berk.Hath.    10.2      Verizon       2.4
Fannie Mae   -16.9

Find the following for the given data set

a. Mode (Answ: No Mode)
b. Mean (Answ: 10.08)
c. Median (Answ: 9.2)
d. Range (Answ: 58)
e. Variance (Answ: 163.49)
f. Standard Deviation (Answ: 12.79)
g. Z-score for Walmart (Answ: 0.44)
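
After working the problem by hand, the listed answers can be checked with a sketch like the one below. It assumes the data set is treated as a population, which is what the listed variance and standard deviation correspond to. The dictionary name `profits` and the other variable names are hypothetical.

```python
import statistics

# Profit levels copied from the table above, keyed by company name.
profits = {
    "Exxon": 41.1, "Walmart": 15.7, "Chevron": 26.9, "Conoco": 12.4,
    "GM": 9.2, "GE": 14.2, "Berk.Hath.": 10.2, "Fannie Mae": -16.9,
    "Ford": 20.2, "H-P": 7.1, "AT&T": 3.9, "Valero": 2.1,
    "BoA": 1.5, "McKesson": 1.2, "Verizon": 2.4,
}
values = list(profits.values())

# No mode if every value appears exactly once.
has_mode = len(set(values)) < len(values)

mean = statistics.mean(values)
median = statistics.median(values)
data_range = max(values) - min(values)
variance = statistics.pvariance(values)  # population variance (divide by n)
std_dev = statistics.pstdev(values)      # population standard deviation
z_walmart = (profits["Walmart"] - mean) / std_dev

print("mode exists:", has_mode)
print(round(mean, 2), round(median, 2), round(data_range, 2))
print(round(variance, 2), round(std_dev, 2), round(z_walmart, 2))
```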
