Unit 2- Statistics Notes

The document provides a comprehensive overview of statistics fundamentals, including definitions, types of data, and various statistical measures. It covers qualitative and quantitative data, descriptive statistics, and measures of central tendency such as mean, geometric mean, and harmonic mean. Additionally, it discusses the advantages and disadvantages of different statistical methods and their applications.

Contents

1 Introduction to Statistics Fundamentals
1.1 What is Statistics?
1.1.1 Need of Statistics
1.1.2 Advantages of Statistics
1.1.3 Disadvantages of Statistics
1.1.4 Applications of Statistics
1.2 Types of Data or Variables in Statistics
1.3 Qualitative vs Quantitative Data
1.3.1 Qualitative Data or Categorical Data
1.3.1.1 Nominal Data
1.3.1.2 Ordinal Data
1.3.2 Quantitative Data or Numerical Data
1.3.2.1 Discrete Data
1.3.2.2 Continuous Data
1.4 Interval vs Ratio Data
1.4.0.1 Interval Data
1.4.0.2 Ratio Data
1.5 Other Types of Data/Variables
1.5.1 Primary Data
1.5.2 Binary (Dichotomous) Variable
1.6 Statistical Series
1.7 On the Basis of Characteristics
1.7.1 Time series
1.7.2 Spatial Series
1.7.3 Condition Series
1.8 On the Basis of Construction
1.8.1 Individual Series (or Raw Data)
1.8.2 Frequency
1.8.2.1 Frequency Table
1.8.2.2 Types of Frequency
1.8.2.3 Frequency Distribution
1.8.3 Discrete Series or UnGrouped Frequency Distribution or Frequency Array
1.8.4 Continuous Series or Grouped Frequency Distribution
1.8.4.1 Important Terms under Continuous Series
1.8.4.2 Inclusive Series
1.8.4.3 Exclusive Series
1.8.4.4 Conversion of Inclusive Series into Exclusive Series?
1.8.4.5 Difference between Inclusive and Exclusive Series
1.8.5 Open End Series
1.9 Types of Statistics
1.10 Descriptive Statistics
1.10.1 Key Components of Descriptive Statistics:
1.11 Population vs Sample
1.12 Measures of Central Tendency
1.13 Mean
1.13.1 Types of Mean (or Pythagorean mean)
1.14 Arithmetic Mean
1.14.1 How to Calculate Arithmetic Mean?
1.14.2 Mean of Individual Series
1.14.2.1 Direct Method
1.14.2.2 Assumed Mean Method
1.14.2.3 Step-Deviation Method
1.14.2.4 Assumed Mean Method vs Step Deviation Method
1.14.3 Mean of Ungrouped Frequency Distribution or Discrete Series
1.14.3.1 Direct Method
1.14.3.2 Assumed Mean Method
1.14.3.3 Step-Deviation Method
1.14.4 Mean of Continuous Series (or Grouped Frequency Distribution)
1.14.4.1 Direct Method
1.14.4.2 Assumed Mean Method
1.14.4.3 Step-Deviation Method
1.14.5 Practice Questions
1.15 Geometric Mean
1.15.1 Advantages of Geometric Mean
1.15.2 Disadvantages of Geometric Mean
1.15.3 Application of Geometric Mean
1.15.4 Geometric Mean for Individual Series
1.15.5 Geometric Mean for Discrete Series
1.15.6 Geometric Mean for Continuous Series
1.15.7 Geometric Mean vs Arithmetic Mean
1.15.8 When is the Geometric mean better than the Arithmetic mean?
1.16 Harmonic Mean
1.16.1 Advantages of Harmonic Mean
1.16.2 Limitations of Harmonic Mean
1.16.3 Applications of Harmonic Mean
1.16.4 Harmonic Mean for Individual Series
1.16.5 Harmonic Mean for a Discrete Series
1.16.6 Harmonic Mean for a Continuous Series
1.16.7 Weighted Harmonic Mean
1.17 Relation Between AM, GM, and HM
1.18 Median
1.18.1 Advantages of the Median
1.18.2 Limitations of the Median
1.18.3 Applications of the Median
1.18.4 Median for Individual Series
1.18.5 Median for Discrete Series
1.18.6 Median for Continuous Series
1.19 Mode
1.19.1 Advantages of Mode
1.19.2 Limitations of Mode
1.19.3 Applications of Mode
1.19.4 Mode for Individual Series
1.19.5 Mode for Discrete Series
1.19.6 Mode for Continuous Series
1.19.7 Relationship Between Mean, Median, and Mode
1.20 Measure of Dispersion
1.20.1 Advantages of Measures of Dispersion
1.20.2 Limitations of Measures of Dispersion
1.20.3 Applications of Measures of Dispersion
1.20.4 Types of Measures of Dispersion
1.20.4.1 Absolute Measure of Dispersion
1.20.4.2 Relative Measure of Dispersion
1.21 Range
1.21.1 Range for Individual Series
1.21.2 Range for Discrete Series
1.21.3 Range for Continuous Series
1.21.4 Advantages of Range
1.21.5 Limitations of Range
1.21.6 Applications of Range
1.22 Mean Deviation (MD)
1.22.1 Mean Deviation for Individual Series
1.22.1.1 Mean Deviation around Mean for Individual Series
1.22.1.2 Mean Deviation around Median for Individual Series
1.22.1.3 Mean Deviation around Mode for Individual Series
1.22.2 Mean Deviation for Discrete Series
1.22.2.1 Mean Deviation around the Mean for Discrete Series
1.22.2.2 Mean Deviation around the Median for Discrete series
1.22.2.3 Mean Deviation around the Mode for Discrete Series
1.22.3 Mean Deviation for Continuous Series
1.22.3.1 Mean Deviation around Mean for a Continuous Series
1.22.3.2 Mean Deviation around Median for a Continuous Series
1.22.3.3 Mean Deviation around Mode for Continuous Series
1.22.4 Advantages of Mean Deviation
1.22.5 Limitations of Mean Deviation
1.22.6 Applications of Mean Deviation
1.23 Sampling
1.23.1 Probability Sampling
1.23.1.1 Random Sampling
1.23.1.2 Systematic Sampling
1.23.1.3 Stratified Sampling
1.23.1.4 Cluster Sampling
1.23.1.5 Multi-Stage Sampling
1.23.2 Non-Probability Sampling
1.23.2.1 Convenience Sampling
1.23.2.2 Judgmental Sampling
1.23.2.3 Quota Sampling
1.24 Inferential Statistics

Chapter 1

Introduction to Statistics Fundamentals

1.1 What is Statistics?


• The term statistics, derived from the word state, was used to refer to a collection of facts of interest to the
state. Statistics is the art of learning from data.
• Statistics is a branch of applied mathematics concerned with collecting, characterizing, analyzing, and
extracting conclusions from quantitative data. [1]
• Statistics involves using mathematical techniques to summarize and describe data, as well as to draw
conclusions and make decisions based on data.

1.1.1 Need of Statistics


Statistics provide the necessary tools to collect, analyze, interpret, and present data effectively. They help in making
decisions that are based on data rather than assumptions, thus reducing uncertainty and increasing the reliability of
conclusions drawn from data. Here are several key reasons why statistics are essential:

1. Data Analysis and Interpretation


• Extracting Meaningful Information: Statistics provide tools and methodologies to analyze large
volumes of data, extracting patterns and trends that might not be immediately apparent.
• Summarizing Data: Measures like mean, median, and standard deviation help summarize complex
data sets into understandable metrics.
2. Decision Making
• Evidence-Based Decisions: In fields like business, medicine, and public policy, decisions must be
based on data rather than intuition. Statistical analysis helps make informed and objective decisions.
• Risk Assessment: Statistics help in evaluating risks and uncertainties, enabling better decision-
making in uncertain conditions.
3. Predictive Analysis
• Forecasting: Statistical models can predict future trends based on historical data, which is crucial in
finance, economics, weather forecasting, and more.
• Trend Analysis: By analyzing past and present data, statistics help identify trends that can influence
future strategies.
4. Quality Control and Improvement
• Monitoring Processes: In manufacturing and service industries, statistical methods are used to monitor
and control processes, ensuring products and services meet quality standards.
• Continuous Improvement: Methodologies such as Six Sigma and statistical tools such as control charts
help in identifying areas for improvement and in maintaining high-quality standards.
[1] https://www.youtube.com/watch?v=p205pOUEYMk
5. Scientific Research
• Hypothesis Testing: Statistics are essential in testing hypotheses and validating research findings
across various scientific disciplines.
• Data Collection and Analysis: Researchers use statistical methods to design experiments, collect
data, and analyze results, ensuring the validity and reliability of their studies.
6. Understanding Variability
• Managing Uncertainty: Statistics help in understanding and managing variability in data, which is
inherent in any real-world process or phenomenon.
• Quantifying Differences: Through statistical tests, it’s possible to determine if observed differences
in data are significant or due to random variation.
7. Policy Formulation and Evaluation
• Public Policy: Governments and organizations use statistical data to formulate policies, assess their
impact, and make necessary adjustments.
• Socio-Economic Analysis: Statistics help in understanding social and economic issues, guiding
policy decisions on health, education, employment, and more.
8. Business and Market Research
• Consumer Insights: Businesses use statistics to understand consumer behavior, preferences, and
market trends.
• Product Development: Statistical analysis helps in identifying market needs, leading to the
development of new products and services.
9. Education and Psychology
• Educational Assessment: Statistics are used to analyze educational data, assess student performance,
and improve teaching methods.
• Psychological Research: In psychology, statistics help in studying human behavior, testing theories,
and validating psychological assessments.
10. Healthcare and Medicine
• Clinical Trials: Statistics are crucial in designing and analyzing clinical trials to ensure the efficacy
and safety of new treatments.
• Epidemiology: Statistical methods help in studying the distribution and determinants of health-
related events in populations, guiding public health interventions.
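As a minimal sketch of the "Summarizing Data" point in item 1 above, the snippet below reduces a small data set to three summary figures using Python's standard `statistics` module. The exam scores are invented purely for illustration.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [62, 75, 75, 81, 90, 48, 67]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value after sorting
stdev_score = statistics.stdev(scores)    # sample standard deviation

print(f"mean={mean_score:.2f}, median={median_score}, stdev={stdev_score:.2f}")
```

Three numbers now stand in for the whole list, which is the sense in which such measures make a data set easier to understand.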

1.1.2 Advantages of Statistics
• Informed Decision Making
– Data-Driven Decisions: Statistics enable decisions based on data rather than intuition, increasing
the reliability and effectiveness of outcomes.
– Risk Management: Statistical analysis helps in identifying and managing risks, allowing for better
planning and mitigation strategies.
• Predictive Analysis
• Quality Control
• Scientific Research
• Understanding Variability
• Policy Formulation and Evaluation
• Business Applications
• Healthcare Applications

1.1.3 Disadvantages of Statistics


• Misinterpretation of Data
– Complexity: Statistical methods can be complex, leading to misinterpretation or misuse if not
properly understood.
– Overgeneralization: Incorrect conclusions may be drawn if statistical results are generalized
beyond the scope of the data.
• Data Quality Issues
– Garbage In, Garbage Out: Poor quality or biased data can lead to misleading results and incorrect
conclusions.
– Sampling Errors: Improper sampling techniques can result in non-representative samples, affecting
the validity of the results.
• Ethical Issues
– Data Manipulation: There is a risk of manipulating data or using selective statistics to mislead or
support a specific agenda.
– Privacy Concerns: Collecting and analyzing personal data raises privacy and ethical concerns,
especially in sensitive areas like healthcare.
• Resource Intensive
– Time-Consuming: Collecting, analyzing, and interpreting statistical data can be time-consuming
and resource-intensive.
– Cost: Conducting large-scale surveys or experiments can be costly, requiring significant financial
and human resources.
• Statistical Limitations
– Assumptions: Many statistical methods rely on certain assumptions (e.g., normality, independence),
and violating these assumptions can affect the results.
– Causation vs. Correlation: Statistics can identify correlations but not necessarily causation, leading
to potential misinterpretation of cause-and-effect relationships.
• Dynamic Nature of Data

1.1.4 Applications of Statistics
1. Business and Economics
• Market Research: Analyzing consumer behavior, preferences, and market trends to guide marketing
strategies and product development.
• Quality Control: Using statistical methods to monitor and improve product and service quality.
• Financial Analysis: Evaluating investment opportunities, assessing risks, and forecasting financial
trends.
• Operational Efficiency: Optimizing supply chain management, inventory control, and resource
allocation.
2. Healthcare and Medicine
• Clinical Trials: Designing and analyzing clinical trials to determine the efficacy and safety of new
drugs and treatments.
• Epidemiology: Studying the distribution and determinants of health-related events to guide public
health interventions and policy.
• Medical Research: Analyzing data from medical studies to understand disease patterns, treatment
outcomes, and health risks.
• Health Services Management: Improving hospital management, patient care, and resource allocation
through statistical analysis.
3. Social Sciences
• Sociological Research: Analyzing social behaviors, trends, and patterns to understand societal
dynamics and inform policy.
• Psychology: Using statistical methods to validate psychological theories, assess interventions, and
analyze behavioral data.
• Education: Evaluating educational programs, assessing student performance, and improving teaching
methods through data analysis.
4. Engineering and Manufacturing
• Quality Assurance: Applying statistical process control (SPC) to monitor and improve manufacturing
processes.
• Reliability Engineering: Analyzing the reliability and life-cycle of products to enhance durability and
performance.
• Design of Experiments: Optimizing product design and development through systematic
experimentation and analysis.
5. Environmental Science
• Climate Studies: Analyzing climate data to understand trends, model climate change, and predict
future conditions.
• Environmental Monitoring: Assessing pollution levels, natural resource management, and ecological
impacts through statistical analysis.
• Conservation Biology: Studying species populations, habitat use, and conservation strategies using
statistical methods.
6. Government and Public Policy
• Census and Surveys: Collecting and analyzing population data to inform policy decisions and
resource allocation.
• Economic Planning: Using statistical models to forecast economic growth, unemployment, inflation,
and other macroeconomic indicators.
• Policy Evaluation: Assessing the impact and effectiveness of public policies and programs through
data analysis.

7. Sports and Entertainment
• Performance Analysis: Analyzing athlete performance, game statistics, and team strategies to
enhance competitive edge.
• Audience Analytics: Studying viewer preferences, ratings, and engagement to optimize content and
marketing strategies in media and entertainment.
8. Information Technology and Data Science
• Machine Learning: Using statistical methods to develop algorithms for predictive modeling,
classification, and clustering.
• Data Mining: Extracting meaningful patterns and insights from large datasets to inform business
decisions and strategies.
• Cybersecurity: Analyzing security threats, intrusion patterns, and system vulnerabilities through
statistical techniques.
9. Agriculture and Food Science
• Crop Yield Analysis: Studying factors affecting crop yields, pest control, and soil health to improve
agricultural practices.
• Food Safety: Monitoring and analyzing food production processes to ensure safety and compliance
with health regulations.
10. Education
• Assessment and Evaluation: Analyzing student performance data, evaluating educational programs,
and improving instructional methods.
• Educational Research: Using statistical methods to study learning outcomes, teaching effectiveness,
and educational trends.
11. Astronomy and Space Science
• Astrophysical Research: Analyzing astronomical data to study celestial bodies, cosmic phenomena,
and the structure of the universe.
• Space Mission Planning: Using statistical models to plan and optimize space missions, satellite
deployments, and exploration strategies.
12. Law and Forensics
• Criminology: Analyzing crime data to understand trends, patterns, and the effectiveness of law
enforcement strategies.
• Forensic Analysis: Using statistical methods in forensic science to analyze evidence, identify
patterns, and solve crimes.

1.2 Types of Data or Variables in Statistics

1.3 Qualitative vs Quantitative Data

1.3.1 Qualitative Data or Categorical Data


• Qualitative data, also known as categorical data, is not numerical. [2]
• Qualitative data represents characteristics and describes data that fits into categories.
• Qualitative data encompasses non-numerical information categorized into groups or classes.
• Categorical variables are defined in terms of natural-language labels rather than numbers, such as a
person’s gender or home town.
• Sometimes categorical data can hold numerical values, but those values have no mathematical meaning,
such as a birth date or PIN code. Here, the birth date and PIN code are written with digits, but arithmetic
on them is not meaningful.
• Qualitative data is further divided into two categories:
– Nominal Data
– Ordinal Data

[2] https://www.youtube.com/watch?v=E1C5hB0yAM4
1.3.1.1 Nominal Data
• Nominal data is a type of data that consists of categories or names that cannot be ordered or ranked.
• Nominal data is often used to categorize observations into groups that cannot be ranked against one another.
• In other words, nominal data has no inherent order or ranking. Therefore, if you change the order of
its values, the meaning does not change.
• Examples of nominal data include:
– Gender (Male, Female)
– Race (White, Black, Asian)
– Religion (Hinduism, Christianity, Islam, Judaism)
– Blood type (A, B, AB, O)

• Nominal data can be represented using frequency tables and bar charts, which display the number or
proportion of observations in each category.
• For example, a frequency table for gender might show the number of males and females in a sample of
people.
• Nominal data is analyzed using non-parametric tests, which do not make any assumptions about the
underlying distribution of the data.
• Common non-parametric tests for nominal data include Chi-Squared Tests and Fisher’s Exact Tests.
These tests are used to compare the frequency or proportion of observations in different categories.
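The frequency table described above can be sketched in a few lines of Python. This is an illustrative example only: the sample of blood types is invented, and the variable names are assumptions rather than anything defined in these notes.

```python
from collections import Counter

# Hypothetical sample of blood types (nominal categories with no order).
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "A", "O", "B"]

counts = Counter(blood_types)
total = len(blood_types)

# Frequency table: (count, proportion) for each category.
frequency_table = {
    category: (count, count / total) for category, count in counts.items()
}

# Sorting here is only for tidy printing; the order carries no meaning.
for category, (count, proportion) in sorted(frequency_table.items()):
    print(f"{category:>2}: count={count}, proportion={proportion:.2f}")

# The most frequent category (the mode) is well defined even without an order.
mode_category = counts.most_common(1)[0][0]
print("mode:", mode_category)
```

Because the categories cannot be ranked, counts, proportions, and the mode are the natural summaries; a mean or median of blood types would have no meaning.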

1.3.1.2 Ordinal Data
• Ordinal data is a type of data that consists of categories that can be ordered or ranked. However, the distance
between categories is not necessarily equal.
• Ordinal data is nearly the same as nominal data, except that its ordering matters.
• Ordinal data is often used to measure subjective attributes or opinions, where there is a natural order to the
responses.
• Examples of ordinal data include education level (Elementary, Middle, High School, College), job position
(Manager, Supervisor, Employee), etc.

• Note that the difference between Elementary and High School is not the same as the difference between
High School and College. This is the main limitation of ordinal data: the sizes of the differences between
values are not known. Because of that, ordinal scales are usually used to measure non-numeric features like
happiness, customer satisfaction and so on.
• Ordinal data can be represented using bar charts or line charts. These displays show the order or ranking of
the categories, but they do not imply that the distances between categories are equal.
• Ordinal data is analyzed using non-parametric tests, which make no assumptions about the underlying
distribution of the data.
• Common non-parametric tests for ordinal data include the Wilcoxon Signed-Rank test and Mann-Whitney
U test.
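To illustrate how order can be used without assuming equal spacing, the sketch below ranks the education levels from the example above and finds the median category. The survey responses are invented, and the integer ranks encode order only, not distance.

```python
import statistics

# Ordered categories from the example above; a rank encodes position only.
levels = ["Elementary", "Middle", "High School", "College"]
rank = {level: i for i, level in enumerate(levels)}

# Hypothetical survey responses.
responses = ["College", "Middle", "High School", "Elementary",
             "High School", "College", "High School"]

# Sorting by rank is meaningful for ordinal data.
ordered = sorted(responses, key=rank.__getitem__)

# median_low always returns an actual data point, so it maps back to a category.
median_rank = statistics.median_low(rank[r] for r in responses)
median_level = levels[median_rank]

print("ordered:", ordered)
print("median level:", median_level)
```

The median is used here instead of the mean because averaging the ranks would silently assume that the steps between levels are equal, which ordinal data does not guarantee.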

1.3.2 Quantitative Data or Numerical Data
• Quantitative data is about quantity: something that we can measure in numbers. [3]
• Quantitative data is a fundamental component of statistics, providing a numerical foundation for analysis
and decision-making.
• Quantitative data are data represented numerically, including anything that can be counted, measured, or
given a numerical value.
• It is also called numerical data (i.e., how much, how often, how many).
• Quantitative data is used to represent quantities, measurements, and observations such as height, weight,
and length.

• Quantitative data is further classified into two categories: [4]
– Discrete Data
– Continuous Data

[3] https://www.youtube.com/watch?v=kNARs2oeuk0
[4] https://www.youtube.com/watch?v=Cg0W6mod9Hw
1.3.2.1 Discrete Data
• Discrete data is a type of data in statistics that takes only distinct, countable values.
• Discrete data has a finite or countably infinite number of possible values, and those values cannot be
subdivided meaningfully.
• In a discrete dataset, there are gaps between the possible values: no valid value lies between two adjacent
data points.
• Examples of discrete data are:
– Marks of the students in a class test
– Number of customers
– Dice rolls: When rolling a six-sided dice, the possible outcomes are discrete and countable, ranging
from 1 to 6.
• Discrete Data is often analyzed using Statistical techniques tailored to discrete variables, such as frequency
distributions, bar charts, and probability calculations. These methods help to summarize and interpret Data
that can be counted or categorized into distinct values.
• Key characteristics of discrete data:
– Countable and non-divisible: Discrete variables take separate, countable values, typically
non-negative whole numbers such as counts (5, 10, 15, and so on).
– Easy to visualize: Discrete data can be easily visualized and demonstrated using simple statistical
methods such as bar charts, line charts, or pie charts.
– Can be categorical: Discrete data can also be categorical, containing a finite number of possible values,
such as the gender of a person.
– Described by discrete distributions: Probabilities are assigned to individual values rather than to
ranges, which makes analyzing discrete values more practical.
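The dice example above can be turned into a short sketch covering both techniques mentioned: a frequency distribution of observed rolls and a probability calculation. The die is assumed to be fair, and the list of rolls is invented for illustration.

```python
from collections import Counter
from fractions import Fraction

# Theoretical distribution of a six-sided die (assumed fair).
faces = range(1, 7)
pmf = {face: Fraction(1, 6) for face in faces}

# Hypothetical observed rolls.
rolls = [3, 6, 1, 3, 2, 6, 6, 4, 5, 3]
observed = Counter(rolls)

# Relative frequency of each face in the observed data.
relative_frequency = {
    face: Fraction(observed[face], len(rolls)) for face in faces
}

# Probability of rolling at least 5: add the probabilities of the
# individual outcomes, which works because the outcomes are countable.
p_at_least_5 = pmf[5] + pmf[6]

for face in faces:
    print(f"face {face}: observed {observed[face]}, "
          f"relative frequency {relative_frequency[face]}")
print("P(roll >= 5) =", p_at_least_5)
```

`Fraction` is used so that the probabilities stay exact; with only ten rolls, the relative frequencies are not expected to match 1/6 closely.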

1.3.2.2 Continuous Data


• Continuous data, in contrast to discrete data, can take any value within a continuous range.
• A continuous variable can take an infinite number of possible values within a given range.
• The values are typically expressed as decimals or fractions and can be divided into smaller and smaller
intervals.
• Examples of the continuous data types are:
– Height of individuals: Heights can vary continuously, from fractions of an inch to several feet.
Measuring the height of people yields Continuous Data.
– Temperature readings: Temperature measurements can include decimal values and vary
continuously, ranging from below freezing to triple digits above zero.
– Weight of products: In manufacturing and commerce, the weight of products is often measured with
precision, resulting in Continuous Data.
• Key characteristics of continuous data are:
– Can change over time: Continuous measurements often vary over time and can take different values
at different time points.
– May or may not have decimals: Continuous data comprises random variables whose values may or
may not be whole numbers.
– Visualized with line graphs or histograms: Continuous data is commonly displayed with line graphs
or histograms, and the shape of its distribution can be described by its skewness.
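Because a continuous variable rarely repeats an exact value, it is usually summarized by counting how many observations fall into each interval. The sketch below does this for a handful of invented heights; the class width of 10 cm and the starting point of 150 cm are arbitrary assumptions.

```python
# Hypothetical heights in centimetres, invented for illustration.
heights_cm = [151.2, 158.7, 160.0, 163.4, 167.9, 170.0, 171.5, 176.3, 182.8]

# Assumed class intervals: [150, 160), [160, 170), [170, 180), [180, 190).
lower, width, n_classes = 150.0, 10.0, 4

counts = [0] * n_classes
for h in heights_cm:
    index = int((h - lower) // width)  # which class interval h falls into
    counts[index] += 1

for i, count in enumerate(counts):
    start = lower + i * width
    print(f"[{start:.0f}, {start + width:.0f}): {count}")
```

Each interval includes its lower limit and excludes its upper limit, so a height of exactly 160.0 is counted in the second class rather than the first.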

1.4 Interval vs Ratio Data
Interval vs Ratio Data: Video Lecture [5]

1.4.0.1 Interval Data


• Interval data is numerical data with meaningful intervals between values, but no true zero point.
• Interval data is measured on a scale in which consecutive values are an equal distance apart and follow a
clear order.
• Because interval data lacks an absolute zero point, ratio comparisons of magnitude (e.g. A is twice as
large as B) are not meaningful.
• Interval scales have no true zero and can represent values below zero. For example, you can measure
temperatures below 0 degrees Celsius, such as -10 degrees.
• Interval variables are also commonly known as scaled variables.
• Examples:
– Temperature in Celsius: Differences are meaningful, but 0°C does not mean the absence
of temperature.
– Calendar years: 2000, 2010, 2020 (the intervals are equal, but there is no true zero year).

1.4.0.2 Ratio Data


• Numerical variables with meaningful intervals and a true zero point, allowing for the calculation of ratios.
• Ratio data uses absolute zero as a reference point for measurement. In other words, Ratio data has a defined
zero point, whereas interval data lacks the absolute zero point.
• Ratio variables never fall below zero. Height and weight are measured from 0 upward and never fall below it.
• Ratio data can include variables like income, height, weight, annual sales, market share, product defect rates, time to repurchase, unemployment rate, and crime rate. As an analyst, you can say a crime rate of 10% is twice that of 5%, or annual sales of 2 million are about 33% greater than 1.5 million.
• Ratio variables, like interval variables, are also commonly known as Scaled variables.
– Examples: Income: 0, 50,000, 100,000 (income of 0 means no income, and you can say that 100,000 is twice as much as 50,000).
– Distance: 0 km, 5 km, 10 km (0 means no distance, and 10 km is twice as far as 5 km).
– Age: 0 years, 25 years, 50 years (0 means no age, and 50 years is twice as old as 25 years).
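The difference between interval and ratio scales can be checked numerically. The sketch below is illustrative only: the Kelvin scale and the helper `celsius_to_kelvin` are not part of these notes and are introduced as an assumption to show that a ratio of Celsius readings changes when the zero point moves, while a difference does not.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the Kelvin scale, which has a true zero."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale: they survive the shift.
difference_celsius = warm_c - cool_c
difference_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20 °C looks like "twice" 10 °C only because of where 0 °C sits.
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_celsius:.2f} vs {difference_kelvin:.2f}")
print(f"ratio: {ratio_celsius:.3f} vs {ratio_kelvin:.3f}")
```

The two differences agree, whereas the ratio drops from 2 to roughly 1.035 once a true zero is used.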
5 https://www.youtube.com/watch?v=kNARs2oeuk0

1.5 Other Types of Data/Variables
1.5.1 Primary Data
Primary data is defined as data that is collected for the first time. It is raw data on which no analysis has yet been performed.

1.5.2 Binary (Dichotomous) Variable


• Variables with only two possible values.
– Light Switch: On, Off.
– Pass/Fail: Pass, Fail.
– Yes/No: Yes, No.

1.6 Statistical Series
[Figure: classification tree of Statistical Series. On the basis of Characteristics: Time Series, Spatial Series, Condition Series. On the basis of Construction: Individual Series (or Raw Data) and Frequency Distribution. Frequency Distribution divides into Ungrouped Frequency Distribution (Discrete Series, Frequency Array) and Grouped Frequency Distribution (Continuous Series: Inclusive Series, Exclusive Series, Open End Series, Cumulative Frequency Series, Mid-Value Frequency Series, Equal and Unequal Class Interval Series).]

• Data is important for researchers, but in its raw form it is hardly usable.
• Therefore, data is often organized in series to facilitate analysis and interpretation.⁶
• Each series has its own characteristics and obeys some general principles.
• Such types of series are very important for researchers and economists to gain insights so that they can use
them for actionable purposes.
• A statistical series refers to a set of observations arranged in a particular order based on one or more criteria.
• In other words, arranging data in some logical order, such as according to the time of occurrence, size, or some other measurable or non-measurable characteristic, is known as a Statistical Series.⁷
• Understanding the different types of statistical series is crucial for effectively analyzing and presenting data.
• Statistical Series can be classified:
– On the Basis of Characteristics:
* Time Series
* Spatial Series
* Condition Series
– On the Basis of Construction:
* Individual Series
* Discrete Series
* Continuous Series

6 https://www.youtube.com/watch?v=NWNW1jln8cc
7 https://www.youtube.com/watch?v=VunpIAw5pPg

1.7 On the Basis of Characteristics
When the data is arranged on the basis of qualitative characteristics, statistical series are of three kinds:
• Time Series
• Spatial Series
• Condition Series

1.7.1 Time series
• If the different values taken by a variable in a period of time are arranged in chronological order, the series
obtained is called a Time Series. Thus, a Time series is a series of data points indexed (or listed or graphed)
in time order.
• Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a
sequence of discrete-time data.
• Simply, time series is a statistical series in which the given data is presented with regard to time unit; i.e.,
day, month, week, or year.
• Time series analysis is used for non-stationary data—things that are constantly fluctuating over time or are
affected by time. Industries like finance, retail, and economics frequently use time series analysis because
currency and sales are always changing. Stock market analysis is an excellent example of time series
analysis in action, especially with automated trading algorithms. Likewise, time series analysis is ideal for
forecasting weather changes, helping meteorologists predict everything from tomorrow’s weather report to
future years of climate change.
• Examples of time series analysis in action include:
– Weather data
– Rainfall measurements
– Temperature readings
– Heart rate monitoring (EKG)
– Brain monitoring (EEG)
– Quarterly sales
– Stock prices
– Automated stock trading
– Industry forecasts
– Interest rates

1.7.2 Spatial Series
• Spatial data is any type of data that directly or indirectly references a specific geographical area or location.
• Example: The following is the sex ratio of 6 different states of India as per the Census of 2011.

1.7.3 Condition Series
• If data is classified according to the changes occurring in a variable under a certain condition, the series is called a Condition Series.
• For instance, the students of a class can be arranged according to their age, height, weight, marks, etc.
• Example: The following is the table showing the arrangement of 40 students in a class according to their age. It is a condition series because the data is arranged on the basis of the age of the students.

1.8 On the Basis of Construction
When the data is arranged on the basis of quantitative characteristics, statistical series are of three kinds:⁸ ⁹

• Individual Series
– Unorganized Individual Series
– Organized Individual Series
• Discrete Series
• Continuous Series
– Exclusive Series
– Inclusive Series
– Open-end Distribution
– Cumulative Frequency Series
– Equal and Unequal Class Interval Series
– Mid-value Series

8 https://www.youtube.com/watch?v=NWNW1jln8cc
9 https://www.tutorialspoint.com/statistical-series

1.8.1 Individual Series (or Raw Data)
• Individual series is that series in which the terms are listed singly.
• In simple terms, a separate value of the measurement is given to each item.
• Example: If the marks of 10 students of a class are given individually as 80, 82, 75, 95, 77, 81, 60, 35, 54, and 99, then the resultant series will be an individual series.
• In such series, there is no class of the items and also there is no frequency of the items.
• The two types of individual series are:

1. Unorganized Individual Series


– A series with raw data or an unarranged mass of data is known as Unorganised Series. Raw
Data is the data in its original form.
– Simply put, when the investigator collects the data and has not arranged it in a systematic
manner, then the collected data will be known as unorganised data.
– The data presented through unorganised series does not provide the investigator with any useful
information; instead, it confuses them.
2. Organized Individual Series
– A series with orderly arranged raw data is known as Organised Individual Series.
– An individual series can be arranged in ascending or descending order.

1.8.2 Frequency
• In statistics, the frequency or absolute frequency of an event $i$ is the number $n_i$ of times the observation has occurred or been recorded in an experiment or study.
• Frequency is basically the number of times a data item occurs in the series. In other words, it deals with
how frequent a data item is in the series.

1.8.2.1 Frequency Table


• After data collection, we have to show data in a meaningful manner for better understanding. Organize the
data in such a way that all its features are summarized in a table.
• A frequency table is a way to present data. The data are counted and ordered to summarize larger sets of
data. With a frequency table you can analyze the way the data is distributed across different values.
• Example: Twenty students were asked how many hours they worked per day. Their responses, in hours, are
as follows:
5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3
Work Hours Frequency
2 3
3 5
4 3
5 6
6 2
7 1

Table 1.1: Frequency Table of Student Work Hours
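The tally in Table 1.1 can be reproduced programmatically. The following is a minimal sketch, assuming the 20 responses listed above are stored in a plain list; the variable names are illustrative and not part of the notes.

```python
from collections import Counter

# Hours worked per day reported by the 20 students in the example.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(hours)

print("Work Hours  Frequency")
for value in sorted(frequency):
    print(f"{value:>10}  {frequency[value]:>9}")
```

Sorting the keys before printing gives the rows in the same ascending order as the table.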

1.8.2.2 Types of Frequency


• There are several types of Frequency in statistics:¹⁰
– Frequency or Absolute Frequency
– Absolute Cumulative Frequency
– Relative Frequency
– Cumulative Relative Frequency
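The four types listed above can be computed side by side. This sketch reuses the student work-hours responses from the frequency table example; the definitions used (a running total for cumulative frequency, and division by the total count for relative frequency) are the standard ones and are stated here as an assumption, since the notes only name the types.

```python
from collections import Counter
from itertools import accumulate

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

counts = Counter(hours)
values = sorted(counts)

absolute = [counts[v] for v in values]            # absolute frequency
cumulative = list(accumulate(absolute))           # absolute cumulative frequency
total = sum(absolute)
relative = [f / total for f in absolute]          # relative frequency
cumulative_relative = [c / total for c in cumulative]

for row in zip(values, absolute, cumulative, relative, cumulative_relative):
    print("{:>2} {:>3} {:>3} {:>6.2f} {:>6.2f}".format(*row))
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.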

1.8.2.3 Frequency Distribution


• A frequency distribution shows the frequency of repeated items in a graphical form or tabular form.
• It gives a visual display of the frequency of items or shows the number of times they occurred.
• Types of Frequency Distribution:
– Ungrouped Frequency Distribution or Discrete Series
– Grouped Frequency Distribution or Continuous Series

10 https://edu.gcfglobal.org/en/statistics-basic-concepts/frequency-tables/1/

1.8.3 Discrete Series or UnGrouped Frequency Distribution or Frequency Array
• Discrete Series is nothing but ungrouped frequency distribution series where different values of the variables
are shown with their respective frequencies.
• The classification of data for a discrete variable is known as Frequency Array.
• In discrete series, data obtained in raw form are presented along with their frequencies. In such a series,
data are not presented in ascending or descending manner.
• Instead, the data and its frequencies are presented in a tabular or grouped manner.
• For example, if the monthly wages of seven employees of a company are 10,000, 12,000, 10,000, 12,000, 13,000, 14,000, and 15,000, then the discrete series will be made as follows:

1.8.4 Continuous Series or Grouped Frequency Distribution
• A discrete series cannot take any value in an interval; therefore, in cases where it is essential to represent
continuous variables with a range of values of different items of a given data, Continuous Series is used.
• In continuous series (grouped frequency distribution), the value of a variable is grouped into several class
intervals (such as 0-5,5-10,10-15) along with the corresponding frequencies.
• Other names of Continuous Series are Frequency Distribution, Grouped Frequency Distribution, Series with
Class Intervals, and Series of Grouped Data.
• Different types of Continuous Series:¹¹ ¹²
– Inclusive Series
– Exclusive Series
– Open-end Distribution
– Cumulative Frequency Series
– Equal and Unequal Class Interval Series
– Mid-value Series

11 https://www.geeksforgeeks.org/types-of-frequency-distribution/
12 https://www.toppr.com/guides/economics/organisation-of-data/frequency-distribution/

1.8.4.1 Important Terms under Continuous Series
• Class: Class in Continuous Series refers to a group of numbers in which the items are placed. For example,
0-5, 5-10, 10-15, 15-20, 20-25, etc.
• Number of Classes: The decision regarding the number of classes of a given data usually depends upon
the judgement of the individual investigator. Even though there is no strict rule regarding the number of
classes, the number should not be very small or very large.
• Class Limits: In continuous series, the class limit is formed by the two numbers between which every class
is located. The lowest value of the class is known as Lower Limit and the highest value of the class is known
as Upper Limit. For example, if a class is 5 - 10, then 5 is the lower limit and 10 is the upper limit.
• Class Interval: It is the difference between the lower limit and upper limit of a class.
• Range: It is the difference between the lower limit of the first class interval and the upper limit of the last class interval. For example, if the classes of a distribution are 0-5, 5-10, 10-15, ..., up to 45-50, then the range will be 50 − 0 = 50.
• Width of Class Intervals: At the time of constructing the frequency distribution, it is suggested that the width of each class interval is equal in size. The formula for determining the size or width of each class interval is as follows:

$$\text{width} = \frac{\text{range}}{\sqrt{\text{sample size}}}$$

• How to make a grouped frequency table?
Example: A sociologist conducted a survey of 20 adults. She wants to report the frequency distribution of
the ages of the survey respondents. The respondents were the following ages in years:
52, 34, 32, 29, 63, 40, 46, 54, 36, 36, 24, 19, 45, 20, 28, 29, 38, 33, 49, 37

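$$\text{Range} = \text{Highest} - \text{Lowest} = 63 - 19 = 44$$

$$\text{Width} = \frac{\text{Range}}{\sqrt{\text{Sample Size}}} = \frac{44}{\sqrt{20}} = 9.84 \approx 10$$

The class intervals are listed below. Each class includes its lower limit but not its upper limit, so that an age such as 29 falls in exactly one class:

19 ≤ a < 29
29 ≤ a < 39
39 ≤ a < 49
49 ≤ a < 59
59 ≤ a < 69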
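The worked example can be carried through to the finished grouped frequency table. The sketch below assumes that the width is rounded to the nearest whole number, that the first class starts at the smallest observation, and that each class contains its lower limit but not its upper limit, so every age is counted exactly once; the notes do not state the class frequencies, so those are computed here rather than quoted.

```python
import math

ages = [52, 34, 32, 29, 63, 40, 46, 54, 36, 36,
        24, 19, 45, 20, 28, 29, 38, 33, 49, 37]

data_range = max(ages) - min(ages)                  # 63 - 19 = 44
width = round(data_range / math.sqrt(len(ages)))    # 44 / sqrt(20) = 9.84 -> 10

# Build classes [lower, upper) until the largest observation is covered.
table = []
lower = min(ages)
while lower <= max(ages):
    upper = lower + width
    count = sum(1 for age in ages if lower <= age < upper)
    table.append((lower, upper, count))
    lower = upper

for lower, upper, count in table:
    print(f"{lower} <= a < {upper}: {count}")
```

The class frequencies add up to the sample size of 20, which is a quick check that no observation was dropped or counted twice.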

1.8.4.2 Inclusive Series
• The series with class intervals, in which all the items having the range from the lower limit up to the upper
limit are included, is known as Inclusive Series.
• However, there is a gap (typically between 0.1 and 1) between the upper limit of one class interval and the lower limit of the next class interval.
• For example, class intervals of an inclusive series can be, 0-9, 10-19, 20-29, 30-39, and so on. In this case,
the gap between the upper limit of one class interval and the lower limit of the next class interval is 1.

Work Hours Frequency


0-9 0
10-19 2
20-29 8
30-39 3
40-49 5
50-59 6
60-69 6

Table 1.2: Frequency Distribution in Inclusive Series Example

• From the above table of inclusive series, it can be seen that the upper limit of one class interval (say, 9 of
interval 0-9) is not the same as the lower limit of the next class interval (10 of interval 10-19). Also, all the
values that come under 0-9, including 0 and 9 are included in the frequency against 0-9.
• For statistical calculation, sometimes it becomes necessary to convert the inclusive series into exclusive series. Suppose, in the above example, some students have obtained marks such as 10.5, 40.5, etc. In this case, this series will be converted into exclusive series.

1.8.4.3 Exclusive Series
• The series with class intervals, in which all the items having the range from the lower limit to the value just
below its upper limit are included, is known as the Exclusive Series.
• For example, if a class interval is 0-10, and the values of the given series are 4, 10, 2, 15, 8, and 9, then only
4, 2, 8, and 9 will be included in the 0-10 class interval. 10 and 15 will be included in the next class interval,
i.e., 10-20.
• In Exclusive Series, the upper limit of a class interval is the lower limit of the next class interval.

Work Hours Frequency


0-10 0
10-20 2
20-30 8
30-40 3
40-50 5
50-60 6
60-70 6

Table 1.3: Frequency Distribution in Exclusive Series Example

• From the above table of exclusive series, it can be seen that the upper limit of the first class interval is the lower limit of the second class interval, and so on.
• If the data includes a value 10, it will be included in the class interval 10-20, not in 0-10.

1.8.4.4 Conversion of Inclusive Series into Exclusive Series
• For statistical calculation, sometimes it becomes necessary to convert the inclusive series into exclusive series.
• Suppose, in the above example, some students have obtained marks such as 10.5, 40.5, etc. In this case, this series will be converted into exclusive series.
• The steps for converting an inclusive series into exclusive series are:

– In the first step, calculate the difference between the upper class limit of one class interval and the lower limit of the next class interval.
– The next step is to divide the difference by two and then add the resulting value to the upper limit of
every class interval and subtract it from the lower limit of every class interval.

• The inclusive series of the above example is converted into exclusive series as under:

Work Hours Frequency


-0.5 - 9.5 0
9.5 - 19.5 2
19.5 - 29.5 8
29.5 - 39.5 3
39.5 - 49.5 5
49.5 - 59.5 6
59.5 - 69.5 6

Table 1.4: Frequency Distribution in Exclusive Series Example
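The two conversion steps can be sketched in a few lines. The class limits are taken from the inclusive series example (Table 1.2); the variable names are illustrative. Applying the rule literally to every class gives a first class of -0.5 to 9.5.

```python
# Inclusive classes from Table 1.2 as (lower limit, upper limit) pairs.
inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

# Step 1: gap between the upper limit of one class and the lower limit of the next.
gap = inclusive[1][0] - inclusive[0][1]     # 10 - 9 = 1

# Step 2: halve the gap, subtract it from every lower limit,
# and add it to every upper limit.
half_gap = gap / 2
exclusive = [(lower - half_gap, upper + half_gap) for lower, upper in inclusive]

for lower, upper in exclusive:
    print(f"{lower} - {upper}")
```

After the conversion, the upper limit of each class equals the lower limit of the next, which is the defining property of an exclusive series.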

1.8.4.5 Difference between Inclusive and Exclusive Series
• In Inclusive Series, the upper limit of one class interval is not the same as the lower limit of the next class
interval. There is a gap ranging from 0.1 to 1.0 between the upper class limit of one class interval and the
lower class limit of the next class interval. However, in the Exclusive Series, the upper limit of one class
interval is the same as the lower limit of the next class interval.
• In the case of Inclusive Series, the value of the upper and the lower limit are included in that class interval
only. However, in the case of Exclusive Series, the value of upper limit of a class interval is not included in
that interval, instead, it is included in the next class interval.
• Inclusive Series is suitable for an investigator only if the value is in complete number and not in decimal
form. However, an Exclusive Series is suitable for an investigator whether the value is in complete number
or decimal form.
• Counting in Inclusive Series is possible only after converting it into an Exclusive Series. However, counting
in Exclusive Series is possible in all cases.

1.8.5 Open End Series
• Sometimes the lower limit of the first class interval and the upper limit of the last class interval are not available; instead, Less than or Below is mentioned in the former case (in place of the lower limit of the first class interval), and More than or Above is mentioned in the latter case (in place of the upper limit of the last class interval). These types of series are known as Open End Series.

• For statistical calculations, if one needs to change the first and last class open-end class interval into limits,
it can be done by the general practice of giving the same magnitude or class size to these intervals as the
class size of other class intervals.
• In the above example, the magnitude of other class intervals is 5. Therefore, the open-end class intervals
can be written as 5-10 and 30-35, respectively.

1.9 Types of Statistics
Statistics can be broadly classified into two main types:
1. Descriptive Statistics
2. Inferential Statistics

1.10 Descriptive Statistics


• Descriptive Statistics offers methods to describe and summarize data set by transforming raw observations
into meaningful information that is easy to interpret and share.
• A data set is a collection of responses or observations from a sample or entire population.
• The data set is summarized from a population sample using factors such as mean and standard deviation.
• Descriptive Statistics is a way of organizing, presenting, and explaining the data set using tables and graphs
(histograms, pie charts, bars, and scatter plots).

1.10.1 Key Components of Descriptive Statistics:


• Measures of Central Tendency:
– Mean: The average of a set of numbers.
– Median: The middle value when the data is ordered.
– Mode: The most frequently occurring value in the dataset.
• Measures of Dispersion/Variability/Spread:
– Range: The difference between the highest and lowest values.
– Variance: The average of the squared differences from the mean.
– Standard Deviation: The square root of the variance, representing the spread of the data.
– Interquartile Range (IQR): The range between the first quartile (25th percentile) and the third quartile
(75th percentile).
• Data Representations:
– Histograms: Bar graphs representing the frequency distribution of numerical data.
– Bar Charts: Graphs representing categorical data with rectangular bars.
– Pie Charts: Circular charts divided into sectors representing proportions.
– Box Plots: Visual representations of the distribution of data based on five-number summaries
(minimum, first quartile, median, third quartile, and maximum).
– Pictograph
– Frequency Distribution
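The numerical measures listed above are available in Python's standard `statistics` module. The sketch below reuses the student work-hours responses from the frequency table example as sample input. It uses `pvariance` and `pstdev` because they match the definition given above (the average of the squared differences from the mean); quartile conventions differ between textbooks and software, so the IQR line should be read as one common convention rather than the only one.

```python
import statistics

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Measures of central tendency
mean = statistics.mean(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)

# Measures of dispersion
data_range = max(hours) - min(hours)
variance = statistics.pvariance(hours)   # average squared difference from the mean
std_dev = statistics.pstdev(hours)       # square root of the variance

# Quartiles using the module's default method; other conventions exist.
q1, _, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

print(mean, median, mode, data_range, variance, std_dev, iqr)
```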

1.11 Population vs Sample
• Population: A collection or set of individuals or objects or events whose properties are to be analyzed.
• Sample: A subset of the population is called ‘Sample’. A well-chosen sample will contain most of the
information about a particular population parameter.

• Outliers: An outlier is a data point that differs significantly from the majority of the data taken from a
sample or population. There are many possible causes of outliers, but here are a few to start you off:
– Natural variation in data
– Change in the behavior of the observed system
– Errors in data collection

1.12 Measures of Central Tendency
• Central Tendencies in Statistics are the numerical values that are used to represent the mid-value or central value of a large collection of numerical data. These numerical values are called central values in Statistics.
• Measures of central tendency are statistical metrics that describe the center of a distribution, giving a single value that is representative of the entire distribution or dataset.
• Such a value is of great significance because it depicts the nature or characteristics of the entire data, which
is otherwise very difficult to observe.
• The three most common measures of central tendency are:
– Mean : provides the average value of the dataset
– Median: provides the central value of the dataset
– Mode: provides the most frequent value in the dataset

1.13 Mean
• Mean is the most commonly used measure of central tendency in Statistics.
• Mean is the central tendency of the distributed data, which refers to the average value of the given set of
data.
• The method of finding the mean is also different depending on the type of data (Grouped or Ungrouped
Data).
• Mean is also referred to as the average.
• Mean is sensitive to skewed data and extreme values.

1.13.1 Types of Mean (or Pythagorean mean)


There are three main types of mean value that you will be studying in statistics.

• Arithmetic Mean
• Geometric Mean
• Harmonic Mean
When not specified, the mean is generally referred to as the arithmetic mean.

1.14 Arithmetic Mean
1.14.1 How to Calculate Arithmetic Mean?
There are three ways to determine the arithmetic mean for Grouped or Ungrouped Data (Individual, Discrete, and Continuous Series):¹³ ¹⁴

• Direct Method
• Assumed Mean Method or Short-Cut Method
• Step Deviation Method

13 https://www.youtube.com/playlist?list=PLYwJOKtPsLuiFjFGKDFoPZOM0g4JBKUrj
14 https://www.youtube.com/playlist?list=PLEHGYFbPuuMEhz_AU8iCrBTYb5eNtFpeg

1.14.2 Mean of Individual Series
• Raw data is a dataset that simply contains all the observations in no particular order.
• The series in which the items are listed singly is known as Individual Series.
• The mean of raw data is calculated by adding up all the observations and dividing the sum by the total number of observations in the set.
• Mean = Sum of all Observations ÷ Total number of Observations
• The population mean is represented by the Greek letter µ (mu).

• The sample mean is represented by x̄ (x-bar).


• The sample mean is usually the best, unbiased estimate of the population mean. However, the mean is
influenced by extreme values (outliers) and may not be the best measure of center with strongly skewed
data.

1.14.2.1 Direct Method
• The following equations compute the population mean and sample mean:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $N$ is the total number of observations in the population.

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $n$ is the total number of observations in the sample.
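As a quick illustration of the direct method, the sketch below applies the formula to the marks of 10 students used earlier as the example of an individual series. The function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(observations):
    """Direct method: sum of all observations divided by their count."""
    return sum(observations) / len(observations)

# Marks of 10 students from the individual series example.
marks = [80, 82, 75, 95, 77, 81, 60, 35, 54, 99]

mean_marks = arithmetic_mean(marks)
print(f"Sum = {sum(marks)}, n = {len(marks)}, mean = {mean_marks:.1f}")
```

The same function serves for a population or a sample; only the symbol used for the result (µ or x̄) differs.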

1.14.2.2 Assumed Mean Method
• The assumed mean method finds the actual mean of the data by first assuming a mean value.¹⁵
• When the calculation of the mean for raw data using the direct method becomes very tedious, the mean can be calculated using the assumed mean method.
• The direct method can involve summing significantly bigger numbers. The assumed mean approach, also known as a shift of origin, decreases the likelihood of calculation errors because it gives you smaller numbers to work with (including negative deviations that offset the positive ones and lower the sum).
• The Assumed Mean method simplifies the calculation of the arithmetic mean by reducing the size of the
numbers involved in the calculation, making it easier to compute, thus suitable if your data set has large
values.
• The following equations compute the population mean and sample mean:

$$\mu = A + \frac{\sum d_i}{N}$$

where $A$ is the assumed mean and $d_i = x_i - A$ is the deviation of each observation from the assumed mean.

$$\bar{x} = A + \frac{\sum d_i}{n}$$
• Advantages:
– Simplifies arithmetic by using smaller numbers.
– Reduces computational complexity.
• Disadvantages:
– Assumed mean is still a central value, so deviations might still be relatively large.
• How to Calculate Mean using Assumed Mean Method?: We can calculate mean using the assumed
mean method by following the below steps:
1. Choose an Assumed Mean (A): Select a value from the data, often a central value, to act as an
assumed mean.
2. Calculate the Deviations (d): Subtract the assumed mean from each data point to find the deviation
di = xi − A, where xi is each data point.
3. Find the Sum of Deviations (∑ di ): Add up all the deviations.
4. Calculate the Mean using the above formulas
15 https://testbook.com/maths/assumed-mean-method

• Example:
– Assume your data set is 73, 75, 76, 78 and 79.
– Sort your data set from smallest to largest.
– Assume a mean. This should be a number that you feel is a close representation of your data set.
– In a simple example, take the number in the center of your data set; in this case 76.
– Subtract your assumed mean from each data entry.
– In our example, 73 − 76 = −3, 75 − 76 = −1, 76 − 76 = 0, 78 − 76 = 2, and 79 − 76 = 3.
– Add together these differences from the mean.
– (−3) + (−1) + 0 + 2 + 3 = 1
– Divide the sum of the differences from assumed mean by the number of data points.
– 1/5 = 0.2
– Add the result of the division to your assumed mean.
– Mean = 76 + 0.2 = 76.2
• Example: Find the mean of the following data using Assumed mean method 40, 50, 55, 78, 58

Taking A = 40, the deviations are 0, 10, 15, 38, and 18, which sum to 81:

$$\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n} = 40 + \frac{81}{5} = 56.2$$
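The assumed mean method can be sketched as a small function. The name `assumed_mean` and its parameters are illustrative; the two data sets are the worked examples above. The result does not depend on which value is assumed, since a poorer guess only makes the deviations larger.

```python
def assumed_mean(observations, assumed):
    """Mean via shift of origin: A + (sum of deviations from A) / n."""
    deviations = [x - assumed for x in observations]
    return assumed + sum(deviations) / len(observations)

first = assumed_mean([73, 75, 76, 78, 79], assumed=76)   # 76 + 1/5
second = assumed_mean([40, 50, 55, 78, 58], assumed=40)  # 40 + 81/5

# Any other assumed value gives the same mean.
other_guess = assumed_mean([40, 50, 55, 78, 58], assumed=55)

print(first, second, other_guess)
```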

• Find the average for the following data using Assumed mean method

$$\bar{x} = A + \frac{\sum d_i}{N} = 8 + \frac{17}{10} = 9.7$$

1.14.2.3 Step-Deviation Method
• The Step Deviation method is an extension of the Assumed Mean method.
• This method further simplifies calculations by choosing a common factor (step size) to reduce the size of
the deviations from an assumed mean.
• Advantage:
– The step deviations simplify the calculations, especially when the original deviations are large or unwieldy.
– Makes it easier to work with data when the values are spread out over a large range.
• Disadvantage:
– Requires an additional step of selecting an appropriate step size h.
– May not always lead to simpler calculations if h is not chosen wisely.
• How to Calculate Mean using Step Deviation Method?

– Choose an Assumed Mean (A): Select a value close to the center of your data as the assumed mean.
This value can be one of the data points.
– Calculate the Deviations (d): Subtract the assumed mean from each data point to find the deviation
di = xi − A, where xi is each data point.
– Select a Common Factor (h): Choose a common factor h (also known as the step size), which could be a convenient value, such as 2, 5, 10, etc., depending on the data range.
– Calculate Step Deviations: Divide each deviation by the chosen factor h to obtain the step deviations $u_i$:

$$u_i = \frac{d_i}{h} = \frac{x_i - A}{h}$$

– Find the Sum of Step Deviations ($\sum u_i$): Add up all the step deviations.
– Calculate the Mean: The following equations compute the population mean and sample mean:

$$\mu = A + h \times \frac{\sum u_i}{N}$$

$$\bar{x} = A + h \times \frac{\sum u_i}{n}$$

• Example: Let’s consider the following ungrouped data: 47, 53, 59, 65, 71

1. Choose an Assumed Mean (A): Select A = 59 (a central value from the data).
2. Calculate Deviations (d):
(a) d1 = 47 − 59 = −12
(b) d2 = 53 − 59 = −6
(c) d3 = 59 − 59 = 0
(d) d4 = 65 − 59 = 6
(e) d5 = 71 − 59 = 12
3. Select a Common Factor (h): Choose h = 6 (a convenient value given the range of deviations).
4. Calculate Step Deviations ($u_i = d_i / h$):
(a) $u_1 = \frac{-12}{6} = -2$
(b) $u_2 = \frac{-6}{6} = -1$
(c) $u_3 = \frac{0}{6} = 0$
(d) $u_4 = \frac{6}{6} = 1$
(e) $u_5 = \frac{12}{6} = 2$

5. Find the Sum of Step Deviations ($\sum u_i$):

$$\sum u_i = (-2) + (-1) + 0 + 1 + 2 = 0$$

6. Calculate the Arithmetic Mean using the above formula:

$$\bar{x} = A + h \times \frac{\sum u_i}{n} = 59 + 6 \times \frac{0}{5} = 59$$
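The step deviation procedure can be sketched as follows. The function name `step_deviation_mean` and its parameters are illustrative; the first call uses the data, assumed mean, and step size from the example above, and the second uses a different data set (35, 40, 60, 75, 90) with A = 60 and h = 5 as a further check.

```python
def step_deviation_mean(observations, assumed, step):
    """Mean via shift of origin and change of scale: A + h * (sum of u_i) / n."""
    step_deviations = [(x - assumed) / step for x in observations]
    return assumed + step * sum(step_deviations) / len(observations)

example = step_deviation_mean([47, 53, 59, 65, 71], assumed=59, step=6)
check = step_deviation_mean([35, 40, 60, 75, 90], assumed=60, step=5)

print(example, check)
```

In both cases the step deviations cancel out, so the mean equals the assumed mean.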

• Example: Find the mean of the following data using direct method, assumed mean method and step
deviation method. 40, 50, 55, 78, 58

• Find the average for the following data 35, 40, 60, 75, 90 using step-deviation method.

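Taking A = 60 and h = 5, the step deviations are −5, −4, 0, 3, and 6, which sum to 0:

$$\bar{x} = A + h \times \frac{\sum u_i}{n} = 60 + 5 \times \frac{0}{5} = 60$$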

1.14.2.4 Assumed Mean Method vs Step Deviation Method
• Both methods give exactly the same arithmetic mean as the direct method. Neither requires the mean to be known in advance; they differ only in how they simplify the arithmetic.
• The assumed mean method shifts the origin: each observation is replaced by its deviation $d_i = x_i - A$ from the assumed mean $A$, and the mean is $\bar{x} = A + \frac{\sum d_i}{n}$.
• The step deviation method shifts the origin and also changes the scale: each deviation is divided by a common factor $h$, giving $u_i = \frac{x_i - A}{h}$, and the mean is $\bar{x} = A + h \times \frac{\sum u_i}{n}$.
• The choice of method does not change the standard deviation, which is always measured from the actual mean $\bar{x}$. When the data is treated as a whole population, the sum of squared deviations is divided by $n$:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

• When the data is a sample, the sum of squared deviations is divided by $n - 1$:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

• To summarize:
– Use the assumed mean method when the values are large but their deviations from a central value are small.
– Use the step deviation method when the deviations share a common factor, for example when the values are equally spaced.

1.14.3 Mean of Ungrouped Frequency Distribution or Discrete Series
• In a discrete series (ungrouped frequency distribution), each value of the variable is listed together with the number of times it is repeated.
• It means that the frequencies are given corresponding to the different values of variables.
• The total number of observations in a discrete series, $N$, equals the sum of the frequencies, $\sum f_i$.
• Example of Discrete Series: If 6 students of a class score 50 marks, 4 students score 60 marks, 7 students
score 70 marks, 3 students score 80 marks, and 5 students score 90 marks, then this information will be
shown as:

Figure 1.1: Frequency Table

1.14.3.1 Direct Method
1. List the Data: Prepare a frequency table with values ($x_i$) and their corresponding frequencies ($f_i$).

2. Calculate the Product of ($x_i$) and ($f_i$): Multiply each value by its frequency to get $x_i f_i$.

3. Find the Sum of the Products ($\sum x_i f_i$): Add all the products together.

4. Find the Total Frequency ($\sum f_i$): Add all the frequencies together.

5. Calculate the Arithmetic Mean: Use the formula:

• Arithmetic Mean for Sample:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_N x_N}{f_1 + f_2 + \cdots + f_N} = \frac{\sum_{i=1}^{N} x_i f_i}{\sum f_i}$$
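The direct method for a discrete series can be sketched with the marks example from the start of this section (6 students score 50, 4 score 60, 7 score 70, 3 score 80, and 5 score 90). The variable names are illustrative, and the resulting mean is computed here rather than quoted from the notes.

```python
# Values (x_i) and their frequencies (f_i) from the discrete series example.
marks = [50, 60, 70, 80, 90]
students = [6, 4, 7, 3, 5]

# Steps 2-3: multiply each value by its frequency and add the products.
weighted_sum = sum(x * f for x, f in zip(marks, students))

# Step 4: total frequency.
total_frequency = sum(students)

# Step 5: arithmetic mean.
mean_marks = weighted_sum / total_frequency

print(f"sum(x*f) = {weighted_sum}, sum(f) = {total_frequency}, mean = {mean_marks}")
```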

• Example:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{264}{28} \approx 9.43$$

• Example: Calculate the mean of the following distribution, which represents the scores obtained by students
in a quiz.

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{3595}{115} \approx 31.26$$

• Example: If the mean of the following distribution is 28, locate the missing frequency.

1.14.3.2 Assumed Mean Method
1. Choose an Assumed Mean (A): Select a value close to the center of the data as the assumed mean.

2. Calculate Deviations ($d_i$): Find the deviation of each value from the assumed mean, $d_i = x_i - A$.

3. Multiply Deviations by Frequencies: Multiply each deviation by its corresponding frequency, $f_i \times d_i$.

4. Find the Sum of the Products: Add all the products together, $\sum f_i d_i$.

5. Calculate the Arithmetic Mean: Use the formula:

• Arithmetic Mean for Sample:

$$\bar{x} = A + \frac{f_1 d_1 + f_2 d_2 + \cdots + f_n d_n}{f_1 + f_2 + \cdots + f_n} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = A + \frac{f_1 d_1 + f_2 d_2 + \cdots + f_N d_N}{f_1 + f_2 + \cdots + f_N} = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i}$$
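The same steps can be sketched in code. The data is the marks example from the start of this section (6 students score 50, 4 score 60, 7 score 70, 3 score 80, and 5 score 90); the choice A = 70 is an assumption made for illustration, as any central value would do.

```python
marks = [50, 60, 70, 80, 90]      # values x_i
students = [6, 4, 7, 3, 5]        # frequencies f_i

assumed = 70                      # step 1: assumed mean A (illustrative choice)

# Step 2: deviations d_i = x_i - A
deviations = [x - assumed for x in marks]

# Steps 3-4: sum of f_i * d_i
sum_fd = sum(f * d for f, d in zip(students, deviations))

# Step 5: mean = A + sum(f_i * d_i) / sum(f_i)
mean_marks = assumed + sum_fd / sum(students)

print(deviations, sum_fd, mean_marks)
```

The result agrees with the direct method applied to the same table, since shifting the origin does not change the mean.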

• Example: Calculate the arithmetic mean for the following data using Assumed Mean Method.

• Take the assumed mean A = 80

$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum f_i} = 80 + \frac{115}{50} = 82.3$$

• Example: Consider the following data set and calculate the mean using Direct, Assumed Mean and Step
Deviation Method.

• Answer: Mean = 153


• Example: Consider the following data set and calculate the mean using Direct, Assumed Mean and Step
Deviation Method.

• Answer: Mean = 48.02

1.14.3.3 Step-Deviation Method
1. Choose an Assumed Mean (A): Select a central value as the assumed mean.

2. Calculate Deviations ($d_i$): Find the deviation of each value from the assumed mean, $d_i = x_i - A$.

3. Select a Common Factor (h): Choose a step size based on the data.

4. Calculate Step Deviations (ui ): Find step deviations using ui = di /h.


5. Multiply Step Deviations by Frequencies: f i .ui

6. Find the Sum of the Products: ∑ fi .ui


7. Calculate the Arithmetic Mean
• Arithmetic Mean for Sample:

$$\bar{x} = A + h \times \frac{\sum_{i=1}^{n} f_i u_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = A + h \times \frac{\sum_{i=1}^{N} f_i u_i}{\sum f_i}$$
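A short sketch of the step-deviation steps, under the same assumptions as before: `mean_step_deviation` is a name introduced here, the data are the marks and frequencies from the discrete-series example in this section, and A = 70 with h = 10 are illustrative choices (h is the common gap between the marks).

```python
def mean_step_deviation(values, freqs, assumed, step):
    """Arithmetic mean of a discrete series by the step-deviation method."""
    if step == 0:
        raise ValueError("step (h) must be non-zero")
    total_freq = sum(freqs)                                   # sum of f_i
    step_devs = [(x - assumed) / step for x in values]        # u_i = (x_i - A) / h
    sum_fu = sum(f * u for f, u in zip(freqs, step_devs))     # sum of f_i * u_i
    return assumed + step * sum_fu / total_freq


marks = [50, 60, 70, 80, 90]
students = [6, 4, 7, 3, 5]
# Step deviations are -2, -1, 0, 1, 2, so sum(f_i * u_i) = -3.
print(mean_step_deviation(marks, students, assumed=70, step=10))  # 70 + 10 * (-3 / 25), i.e. 68.8
```

Dividing by h only rescales the deviations to small integers, which is why the result matches the direct and assumed mean methods.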

• Determine the arithmetic mean from the following frequency table using Step-Deviation Method:

$$\mu = A + h \times \frac{\sum_{i=1}^{N} f_i u_i}{\sum f_i} = 60 + 10 \times \frac{-20}{50} = 56$$

1.14.4 Mean of Continuous Series (or Grouped Frequency Distribution)
• In a continuous series (grouped frequency distribution), the data is grouped into class intervals with
corresponding frequencies.
• Each class interval represents a range of values (such as 0-5,5-10,10-15), and the frequency shows how
many observations fall within that interval.
• Example: If 15 students of a class score marks between 50-60, 10 students score marks between 60-70,
and 20 students score marks between 70-80, then this information will be shown as:

1.14.4.1 Direct Method
1. Determine the Midpoint (Class Mark) for Each Class Interval: For each class interval, find the midpoint ($x_i$) using:

$$x_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

2. Calculate the Product of the Midpoint and Frequency: Multiply each midpoint by its corresponding frequency, $f_i x_i$.

3. Find the Sum of the Products: Add all the products together, $\sum x_i f_i$.

4. Find the Total Frequency: Add all the frequencies together, $\sum f_i$.

5. Calculate the Arithmetic Mean: Use the formula:
• Arithmetic Mean for Sample:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = \frac{\sum_{i=1}^{N} x_i f_i}{\sum f_i}$$
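For a grouped distribution the only extra step is replacing each class by its midpoint. The sketch below assumes classes are given as (lower, upper) pairs; the name `grouped_mean` is introduced here, and the data are the class intervals from the example in this section (50-60, 60-70, 70-80 with frequencies 15, 10, 20).

```python
def grouped_mean(intervals, freqs):
    """Arithmetic mean of a grouped frequency distribution (direct method)."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]  # class marks x_i
    total_freq = sum(freqs)                                          # sum of f_i
    if total_freq <= 0:
        raise ValueError("total frequency must be positive")
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))            # sum of f_i * x_i
    return sum_fx / total_freq


intervals = [(50, 60), (60, 70), (70, 80)]
freqs = [15, 10, 20]
print(round(grouped_mean(intervals, freqs), 2))  # 2975 / 45, about 66.11
```

Because every observation in a class is treated as if it sat at the midpoint, the result is an approximation of the mean of the raw data.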

1.14.4.2 Assumed Mean Method
1. Determine the Midpoint (Class Mark) for Each Class Interval: For each class interval, find the midpoint ($x_i$) using:

$$x_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

2. Choose an Assumed Mean (A): Select a midpoint value close to the center of the data as the assumed mean.

3. Calculate Deviations ($d_i$): Find the deviation of each midpoint from the assumed mean, $d_i = x_i - A$.

4. Multiply Deviations by Frequencies: $f_i d_i$

5. Find the Sum of the Products: $\sum f_i d_i$

6. Find the Total Frequency: Add all the frequencies together, $\sum f_i$.

7. Calculate the Arithmetic Mean
• Arithmetic Mean for Sample:

$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i}$$
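Why the shortcut gives the same answer as the direct method: writing each midpoint as $x_i = A + d_i$ and substituting into the direct formula yields the assumed mean formula. The derivation below is a sketch; the numerical check reuses the class intervals 50-60, 60-70, 70-80 with frequencies 15, 10, 20 from the example earlier in this section, and A = 65 is chosen here purely for illustration.

```latex
\begin{aligned}
\bar{x} &= \frac{\sum f_i x_i}{\sum f_i}
         = \frac{\sum f_i (A + d_i)}{\sum f_i}
         = \frac{A \sum f_i + \sum f_i d_i}{\sum f_i}
         = A + \frac{\sum f_i d_i}{\sum f_i} \\[4pt]
\text{Check: } & x_i = 55,\ 65,\ 75, \quad d_i = -10,\ 0,\ 10, \quad
\sum f_i d_i = 15(-10) + 10(0) + 20(10) = 50 \\[4pt]
\bar{x} &= 65 + \frac{50}{45} \approx 66.11
\end{aligned}
```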

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i} = 25 + \frac{-10}{110} \approx 24.9$$

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i} = 7 + \frac{32}{20} = 8.6$$

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i} = 50 + \frac{580}{150} \approx 53.87$$

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i} = 25 + \frac{-10}{110} \approx 24.9$$

$$\mu = A + \frac{\sum_{i=1}^{N} f_i d_i}{\sum f_i} = 40 + \frac{360}{35} \approx 50.29$$

1.14.4.3 Step-Deviation Method
1. Determine the Midpoint (Class Mark) for Each Class Interval: For each class interval, find the midpoint ($x_i$) using:

$$x_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

2. Choose an Assumed Mean (A): Select a midpoint value close to the center of the data as the assumed mean.

3. Calculate Deviations ($d_i$): Find the deviation of each midpoint from the assumed mean, $d_i = x_i - A$.

4. Select a Common Factor (h): Choose a step size based on the data, usually the class width.

5. Calculate Step Deviations ($u_i$): Find step deviations using $u_i = d_i / h$.

6. Multiply Step Deviations by Frequencies: $f_i u_i$

7. Find the Sum of the Products: $\sum f_i u_i$

8. Calculate the Arithmetic Mean
• Arithmetic Mean for Sample:

$$\bar{x} = A + h \times \frac{\sum_{i=1}^{n} f_i u_i}{\sum f_i}$$

• Arithmetic Mean for Population:

$$\mu = A + h \times \frac{\sum_{i=1}^{N} f_i u_i}{\sum f_i}$$
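The grouped step-deviation procedure can be sketched like this. The function name `grouped_mean_step_deviation` is hypothetical, classes are assumed to be (lower, upper) pairs, and the data are the intervals 50-60, 60-70, 70-80 with frequencies 15, 10, 20 from the continuous-series example earlier in this section; A = 65 and h = 10 (the class width) are illustrative choices.

```python
def grouped_mean_step_deviation(intervals, freqs, assumed, step):
    """Mean of a grouped distribution by the step-deviation method."""
    if step == 0:
        raise ValueError("step (h) must be non-zero")
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]  # x_i
    step_devs = [(x - assumed) / step for x in midpoints]            # u_i = (x_i - A) / h
    sum_fu = sum(f * u for f, u in zip(freqs, step_devs))            # sum of f_i * u_i
    return assumed + step * sum_fu / sum(freqs)


intervals = [(50, 60), (60, 70), (70, 80)]
freqs = [15, 10, 20]
# Midpoints 55, 65, 75 give u_i = -1, 0, 1, so sum(f_i * u_i) = 5.
print(round(grouped_mean_step_deviation(intervals, freqs, assumed=65, step=10), 2))  # about 66.11
```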

$$\mu = A + h \times \frac{\sum_{i=1}^{N} f_i u_i}{\sum f_i} = 35 + 10 \times \frac{-4}{35} \approx 33.86$$

Example: Calculate average profit earned by 50 companies from the following data using Step Deviation
Method:

$$\mu = A + h \times \frac{\sum_{i=1}^{N} f_i u_i}{\sum f_i} = 50 + 20 \times \frac{-15}{50} = 44$$

1.14.5 Practice Questions
• Calculate the mean for the following set of data 2, 6, 7, 9, 15, 11, 13, 12
• If there are 5 observations, which are 27, 11, 17, 19, and 21 then find the mean
• Find the mean for the following sample data set: 6.4, 5.2, 7.9, 3.4
• Find the mean of 9, 6, -3, 2, -7, 1
• Find the mean of 5,10,15,20,25.
• Find the mean of the given data set: 10,20,30,40,50,60,70,80,90.
• Calculate the mean of the first 10 natural numbers.
• Find the mean of the first 10 even numbers.
• Find the mean of the first 10 odd numbers.
• The Mean of a series with 5 items is 40, and the values of four items are 35, 10, 65, 50. Find out the missing
5th item.

1.15 Geometric Mean [16]
• The geometric mean is a measure of central tendency that is particularly useful when dealing with data
that involves rates, ratios, percentages, or data that grows exponentially.
• The geometric mean is the appropriate average for percentage changes and ratios, which are derived from
underlying values and combine by multiplication, whereas the arithmetic mean suits the values themselves,
which combine by addition.

1.15.1 Advantages of Geometric Mean


• Appropriate for Multiplicative Relationships: The geometric mean is suitable when the data involves
rates of change, percentages, or ratios, especially in cases where the values are not independent of each
other.
• Less Affected by Extreme Values: Compared to the arithmetic mean, the geometric mean is less influenced
by extreme values, making it a more reliable measure in the presence of outliers.
• Consistent Measurement: The geometric mean provides a consistent measure of central tendency in
datasets that are log-normally distributed.
• Useful in Financial and Economic Data: It is commonly used in finance to calculate average growth rates
over time, such as the compound annual growth rate (CAGR).

1.15.2 Disadvantages of Geometric Mean


• Limited to Positive Numbers: The geometric mean is meaningful only for positive numbers. A negative
value can make the root undefined in the real numbers, and a single zero forces the geometric mean to zero
regardless of the other values.
• Complexity in Calculation: Calculating the geometric mean involves multiplication and taking roots,
which can be more complex than calculating the arithmetic mean, especially for large datasets.
• Not Always Intuitive: The geometric mean may not be as intuitive to interpret as the arithmetic mean,
especially for those unfamiliar with the concept.
• Sensitive to Small Values: The geometric mean can be significantly affected by very small values in the
dataset, as these can drag the mean down.

1.15.3 Application of Geometric Mean


• Financial and Economic Analysis: The geometric mean is widely used in finance to calculate average
rates of return over time. For example, the compound annual growth rate (CAGR) of an investment is
calculated using the geometric mean.
• Data Involving Percentages and Ratios: When analyzing data involving percentages or ratios, such as
growth rates, inflation rates, or interest rates, the geometric mean is often the preferred measure of central
tendency.
• Growth Rates: The geometric mean is used to calculate average growth rates in various fields, including
biology, economics, and demographics.
• Risk Analysis: In risk analysis, the geometric mean is used to determine the average level of risk over time,
particularly when dealing with multiplicative processes.
• Population Studies: It is used in population studies where growth rates are considered, such as in epidemi-
ology to calculate the average rate of spread of diseases.

[16] https://www.youtube.com/watch?v=HuZdOvoK4hM

1.15.4 Geometric Mean for Individual Series
• The geometric mean of a set of n values is the nth root of the product of all n observed values.
• Before calculating this (geometric mean) measure of central tendency, note that:
– The geometric mean can only be found for positive values.
– If any value in the dataset is zero, the geometric mean is zero.
• There are two main steps to calculating the geometric mean:
– Multiply all values together to get their product.
– Find the nth root of the product (n is the number of values).
• Given a set of n positive numbers $x_1, x_2, x_3, \ldots, x_n$, the geometric mean (GM) is calculated as:

$$\text{GM} = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$
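The two steps above can be sketched in Python. Multiplying many values can overflow or underflow, so this sketch averages logarithms instead, which is mathematically the same as taking the nth root of the product. The function name and sample inputs are illustrative assumptions; Python 3.8+ also ships `statistics.geometric_mean` for the positive-value case.

```python
import math


def geometric_mean(values):
    """Geometric mean of an individual series, computed via logarithms."""
    if not values:
        raise ValueError("at least one value is required")
    if any(x < 0 for x in values):
        raise ValueError("geometric mean is only found for non-negative values")
    if any(x == 0 for x in values):
        return 0.0                                   # a single zero makes the product zero
    log_sum = sum(math.log(x) for x in values)       # log of the product
    return math.exp(log_sum / len(values))           # nth root of the product


print(round(geometric_mean([4, 9]), 6))      # sqrt(4 * 9) = 6.0
print(round(geometric_mean([2, 4, 8]), 6))   # cube root of 64 = 4.0
```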

• While the arithmetic means show higher efficiency for Machine A, the geometric means show that Machine
B is more efficient.
• The geometric mean is more accurate here because the arithmetic mean is skewed towards values that are
higher than most of the dataset.

1.15.5 Geometric Mean for Discrete Series
• Steps to Calculate Geometric Mean:

1. Raise the Values to the Power of Their Frequencies: Raise each data value to the power of its corresponding
frequency.
2. Multiply All the Resulting Terms: Take the product of all the values obtained in step 1.
3. Take the Nth Root: Take the Nth root of the product, where N is the sum of the frequencies.

• Given a set of data with frequencies, the formula for the geometric mean (GM) is:

$$\text{GM} = \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{1/N} = \sqrt[N]{x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_n^{f_n}}$$

where:
• $x_1, x_2, \ldots, x_n$ are the data values.
• $f_1, f_2, \ldots, f_n$ are the corresponding frequencies of these values.
• $N = \sum_{i=1}^{n} f_i$ is the total number of observations.
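A minimal sketch of the discrete-series formula, assuming strictly positive values. Raising $x_i$ to the power $f_i$ becomes multiplying $\log x_i$ by $f_i$ when working with logarithms. The function name and the small dataset (values 2, 4, 8 with frequencies 1, 2, 1) are made up for illustration.

```python
import math


def geometric_mean_discrete(values, freqs):
    """Geometric mean of a discrete series with positive values."""
    if any(x <= 0 for x in values):
        raise ValueError("this sketch assumes strictly positive values")
    total_freq = sum(freqs)                                         # N = sum of f_i
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))   # log of prod x_i ** f_i
    return math.exp(log_sum / total_freq)                           # Nth root


# (2**1 * 4**2 * 8**1) ** (1/4) = 256 ** (1/4) = 4
print(round(geometric_mean_discrete([2, 4, 8], [1, 2, 1]), 6))  # 4.0
```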

Example:

Example:

The Geometric Mean of the given numbers is 211.15

1.15.6 Geometric Mean for Continuous Series
• Steps to Calculate Geometric Mean:
– Calculate the Midpoint (Class Mark) for Each Class Interval:

$$x_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

– Raise Each Midpoint $x_i$ to the Power of Its Frequency $f_i$.
– Multiply All the Results: Take the product of all the values obtained in step 2.
– Take the Nth Root: Take the Nth root of the product, where N is the sum of the frequencies.
• The formula for the geometric mean for continuous series is:

$$\text{GM} = \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{1/N} = \sqrt[N]{x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_n^{f_n}}$$

where:
– $x_1, x_2, \ldots, x_n$ are the midpoints of the class intervals.
– $f_1, f_2, \ldots, f_n$ are the corresponding frequencies of each class interval.
– $N = \sum_{i=1}^{n} f_i$ is the total number of observations.

Step 1: Calculate the Midpoint

$$M_1 = \frac{1000 + 2000}{2} = 1500, \quad M_2 = \frac{2000 + 3000}{2} = 2500, \quad M_3 = \frac{3000 + 4000}{2} = 3500, \quad M_4 = \frac{4000 + 5000}{2} = 4500$$

Step 2: Raise Each Midpoint to the Power of its Frequency

$$1500^3 = 3.375 \times 10^{9}, \quad 2500^5 = 9.765625 \times 10^{16}, \quad 3500^4 = 1.500625 \times 10^{14}, \quad 4500^2 = 2.025 \times 10^{7}$$

Step 3: Product:

$$3.375 \times 10^{9} \times 9.765625 \times 10^{16} \times 1.500625 \times 10^{14} \times 2.025 \times 10^{7} \approx 1.0015 \times 10^{48}$$

Step 4: Take the Nth Root:
– First, calculate the total frequency N:

$$N = 3 + 5 + 4 + 2 = 14$$

– Now, take the 14th root of the product:

$$\sqrt[14]{1.0015 \times 10^{48}} \approx 2683.0$$
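The worked example can be checked numerically. The sketch below takes the class intervals 1000-2000, 2000-3000, 3000-4000, 4000-5000 and the frequencies 3, 5, 4, 2 from the steps above; the function name `geometric_mean_grouped` is an assumption. It averages logarithms rather than forming the very large product directly.

```python
import math


def geometric_mean_grouped(intervals, freqs):
    """Geometric mean of a grouped distribution using class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    if any(m <= 0 for m in midpoints):
        raise ValueError("this sketch assumes strictly positive midpoints")
    total_freq = sum(freqs)                                            # N
    log_sum = sum(f * math.log(m) for m, f in zip(midpoints, freqs))   # log of the product
    return math.exp(log_sum / total_freq)                              # Nth root


intervals = [(1000, 2000), (2000, 3000), (3000, 4000), (4000, 5000)]
freqs = [3, 5, 4, 2]
print(round(geometric_mean_grouped(intervals, freqs), 2))
```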

• Example:

• Example:

The geometric mean of the percentages declared by the companies is 14.58

1.15.7 Geometric Mean vs Arithmetic Mean
• Use Cases:
– Arithmetic Mean:
* Used when data points are independent of each other.
* Commonly used for data that is additive in nature, such as total scores, sums, and averages in
daily use.
* Examples: Average income, average test scores, average temperature.

– Geometric Mean:
* Used when data points are interrelated, such as when dealing with rates of change, ratios, or
proportional growth.
* Commonly used in finance, economics, and population studies where growth rates, percentages,
or ratios are involved.
* Examples: Compound annual growth rate (CAGR), average growth rate of populations, invest-
ment returns over time.

• Sensitivity to Values:
– Arithmetic Mean:
* Highly sensitive to extreme values (outliers) as it directly sums all the values.
* An unusually high or low value can skew the mean.
– Geometric Mean:
* Less sensitive to extreme values since it multiplies values and takes the root, thereby diluting
the impact of outliers.
* Provides a better central tendency measure in skewed distributions, especially when dealing
with multiplicative processes.
• Mathematical Relationship:
– The geometric mean is always less than or equal to the arithmetic mean for any set of non-negative
data.
Geometric Mean ≤ Arithmetic Mean
– The only time they are equal is when all the data points are the same.
x1 = x2 = . . . = xn

1.15.8 When is the Geometric mean better than the Arithmetic mean?
• Multiplicative Processes or When Dealing with Growth Rates: If a stock grows by 20% in the first year,
30% in the second year, and declines by 10% in the third year, the geometric mean will give the true average
annual growth rate, considering the compounding effect.
• Non-Negative Data with Skewed Distributions:
– For datasets that are heavily skewed, especially with positive values, the geometric mean is less
influenced by extreme outliers than the arithmetic mean, providing a better central tendency measure.
– In a positively skewed distribution, there’s a cluster of lower scores and a spread-out tail on the right.
Income distribution is a common example of a skewed dataset.
– While most values tend to be low, the arithmetic mean is often pulled upward (or rightward) by high
values or outliers in a positively skewed dataset.

– Because the geometric mean tends to be lower than the arithmetic mean, it represents smaller values
better than the arithmetic mean.
• Normalizing Data: The geometric mean is useful in normalizing different data sets, especially when
comparing ratios or indices. For example, when comparing price indexes across different periods or regions,
the geometric mean helps in making the comparisons more meaningful.
• Consistent Measurement Across Different Scales or Combining Different Scales: When aggregating
data measured on different scales, the geometric mean helps in maintaining consistency, as it doesn’t
disproportionately weigh higher values.
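The stock example in the first bullet (growth of 20%, then 30%, then a 10% decline) can be worked through in code. This is a sketch with a hypothetical function name; rates are assumed to be given as fractions, and each rate is converted to a growth factor (1 + rate) before taking the geometric mean.

```python
import math


def average_growth_rate(rates):
    """Average per-period growth rate from fractional rates such as 0.20."""
    factors = [1 + r for r in rates]                  # growth factors
    if any(f <= 0 for f in factors):
        raise ValueError("each growth factor must be positive")
    gm = math.prod(factors) ** (1 / len(factors))     # geometric mean of the factors
    return gm - 1


rates = [0.20, 0.30, -0.10]
geometric = average_growth_rate(rates)
arithmetic = sum(rates) / len(rates)
print(f"geometric: {geometric:.2%}, arithmetic: {arithmetic:.2%}")
# geometric: 11.98%, arithmetic: 13.33%
```

Compounding 11.98% for three years reproduces the actual overall factor 1.2 × 1.3 × 0.9 = 1.404, whereas compounding the arithmetic figure of 13.33% would overstate it.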

1.16 Harmonic Mean [17]
• The harmonic mean is one of the measures of central tendency. It is particularly useful when the data
set contains rates or ratios, such as speeds and other per-unit quantities (km/hr, km/liter, hours/semester, tonnes/month).

1.16.1 Advantages of Harmonic Mean


• Appropriate for Rates and Ratios: The harmonic mean is most suitable for datasets involving rates, like
speeds, densities, or any other ratio-based data.
• Minimizes the Impact of Large Outliers: It gives less weight to larger numbers, which makes it useful
for situations where larger values could disproportionately affect the mean.

• Consistent with the Average Rate Concept: For example, the harmonic mean correctly calculates the
average speed when traveling the same distance at different speeds.
• Useful in Finance and Economics: It is commonly used to calculate the average price-earnings (P/E) ratio
or the average cost in dollar-cost averaging.

1.16.2 Limitations of Harmonic Mean


• Sensitive to Small Values: The harmonic mean can be significantly affected by very small values in the
dataset.
• Not Defined for Zero or Negative Values: The harmonic mean cannot be calculated if any of the data
values are zero or negative (as these would make reciprocals undefined). The harmonic mean is designed to
deal with rates, ratios, or averages of speeds, which are naturally positive quantities. Including negative
values in these contexts would not provide a valid or interpretable measure.
• Less Intuitive Interpretation: Compared to the arithmetic mean, the harmonic mean can be less intuitive,
especially for those unfamiliar with its concept.
• Complexity in Large Datasets: Calculating the harmonic mean for large datasets or when dealing with
complex ratios can be computationally intensive.

[17] https://www.youtube.com/watch?v=LK52iuIp84o

1.16.3 Applications of Harmonic Mean
• Speed and Time Problems: The harmonic mean is used to find the average speed of an object when it
covers equal distances at different speeds. (If the speeds are held for equal time intervals instead, the
arithmetic mean gives the average speed.)
• Finance: In finance, the harmonic mean is used to calculate the average price-earnings ratio (P/E ratio) for
companies. It is also used in calculating the weighted average cost of capital (WACC).
• Physics: The harmonic mean is used in situations involving rates of change, such as calculating average
densities, resistance in parallel circuits, or optical problems involving different mediums.
• Economics: It is used to calculate the average rate of growth or to analyze economic data where rates or
ratios are prevalent.
• Harmonic Mean in Decision Making: In multi-criteria decision analysis, the harmonic mean is used when
the aggregation of criteria favors a balance rather than the dominance of one criterion over others.

1.16.4 Harmonic Mean for Individual Series
• The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the data values.
Here, the total number of observations is divided by the sum of reciprocals of all observations.
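• Thus, for n observations $x_1, x_2, \ldots, x_n$, the harmonic mean formula is given by

$$\text{HM} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$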

• Steps to Calculate Harmonic Mean:


1. Reciprocate Each Value: Find the reciprocal of each value in the dataset.
2. Sum the Reciprocals: Add up all the reciprocals obtained in step 1.
3. Divide the Number of Observations by the Sum of Reciprocals: Use the formula to calculate the
harmonic mean.
• Harmonic Mean of Two Numbers: Say we want to find the harmonic mean of any two numbers, a and b,
in a data set. Both a and b are non-zero numbers. Thus, using the aforementioned formula with n = 2, we get

$$\text{HM} = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$$
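The three steps translate directly into code. The sketch below uses a hypothetical function name and an illustrative trip: the same distance driven once at 40 km/h and once at 60 km/h. Python's standard library also provides `statistics.harmonic_mean`.

```python
def harmonic_mean(values):
    """Harmonic mean of an individual series of positive values."""
    if not values:
        raise ValueError("at least one value is required")
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires strictly positive values")
    reciprocal_sum = sum(1 / x for x in values)   # steps 1 and 2
    return len(values) / reciprocal_sum           # step 3


# Two-number case: 2ab / (a + b) = 2 * 40 * 60 / 100 = 48
print(round(harmonic_mean([40, 60]), 6))  # 48.0
```

The average speed for the round trip is 48 km/h, lower than the arithmetic mean of 50 km/h, because more time is spent travelling at the slower speed.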

1.16.5 Harmonic Mean for a Discrete Series
• Given a set of n values $x_1, x_2, \ldots, x_n$ with corresponding frequencies $f_1, f_2, \ldots, f_n$, the harmonic
mean (HM) can be calculated as:

$$\text{HM} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$

where:
– $N = \sum_{i=1}^{n} f_i$ is the total number of observations (sum of all frequencies).
– $x_i$ is the ith data value.
– $f_i$ is the frequency of the ith data value.
• Steps to Calculate Harmonic Mean for a Discrete Series:
1. Calculate the Reciprocal of Each Value: For each data value $x_i$, find the reciprocal $\frac{1}{x_i}$.
2. Multiply Each Reciprocal by Its Frequency: For each value, multiply its reciprocal by the corresponding
frequency $f_i$.
3. Sum the Results: Sum all the products obtained from step 2.
4. Divide the Total Number of Observations by the Sum of Reciprocals: Use the formula to calculate
the harmonic mean.

• Example: Suppose we have the following dataset representing the frequency of different response times (in
seconds) of a computer system:

Response Time (x_i)    Frequency (f_i)
2                      3
4                      5
6                      4
8                      2

Step 1: Calculate the Reciprocal of Each Value:

$$\frac{1}{2} = 0.5, \quad \frac{1}{4} = 0.25, \quad \frac{1}{6} \approx 0.1667, \quad \frac{1}{8} = 0.125$$

Step 2: Multiply Each Reciprocal by Its Frequency:

$$3 \times 0.5 = 1.5, \quad 5 \times 0.25 = 1.25, \quad 4 \times 0.1667 \approx 0.6668, \quad 2 \times 0.125 = 0.25$$

Step 3: Sum the Results:

$$1.5 + 1.25 + 0.6668 + 0.25 = 3.6668$$

Step 4: Divide the Total Number of Observations by the Sum of Reciprocals:
First, calculate the total number of observations (N):

$$N = 3 + 5 + 4 + 2 = 14$$

Now, use the harmonic mean formula:

$$\text{HM} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{14}{3.6668} \approx 3.82 \text{ seconds}$$
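The response-time example can be reproduced with a short sketch. The function name `harmonic_mean_discrete` is an assumption; the values and frequencies are those in the table above.

```python
def harmonic_mean_discrete(values, freqs):
    """Harmonic mean of a discrete series: N / sum(f_i / x_i)."""
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires strictly positive values")
    total_freq = sum(freqs)                                        # N
    weighted_reciprocals = sum(f / x for x, f in zip(values, freqs))  # sum of f_i / x_i
    return total_freq / weighted_reciprocals


response_times = [2, 4, 6, 8]
frequencies = [3, 5, 4, 2]
print(round(harmonic_mean_discrete(response_times, frequencies), 2))  # 3.82
```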

1.16.6 Harmonic Mean for a Continuous Series
• Calculating the harmonic mean for a continuous series (grouped frequency distribution) is an extension of
the process used for discrete series.
• In a continuous series, data is grouped into class intervals, and the harmonic mean is calculated by using the
midpoints of these intervals along with their corresponding frequencies.
• Formula for Harmonic Mean in a Continuous Series: Given a set of class intervals, the harmonic mean
(HM) is calculated using the midpoints of the intervals and their frequencies. The formula is:

$$\text{HM} = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{m_i}}$$

where:
– $N = \sum_{i=1}^{n} f_i$ is the total number of observations (sum of all frequencies).
– $f_i$ is the frequency of the ith class interval.
– $m_i$ is the midpoint of the ith class interval.
• Steps to Calculate Harmonic Mean for a Continuous Series
1. Find the Midpoint of Each Class Interval: For each class interval, calculate the midpoint ($m_i$)
using the formula:

$$m_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

2. Calculate the Reciprocal of Each Midpoint: Find the reciprocal of each midpoint ($\frac{1}{m_i}$).
3. Multiply Each Reciprocal by Its Frequency: For each class interval, multiply the reciprocal of the
midpoint by the corresponding frequency ($f_i$).
4. Sum the Results: Sum all the products obtained from step 3.
5. Divide the Total Number of Observations by the Sum of Reciprocals: Use the formula to calculate
the harmonic mean.

• Example: Suppose we have the following data representing the time taken (in minutes) by a group of
students to complete a test, grouped into class intervals:

Class Interval (minutes)    Frequency (f_i)
10-20                       5
20-30                       8
30-40                       12
40-50                       6

Step 1: Find the Midpoint of Each Class Interval:

$$m_1 = \frac{10 + 20}{2} = 15, \quad m_2 = \frac{20 + 30}{2} = 25, \quad m_3 = \frac{30 + 40}{2} = 35, \quad m_4 = \frac{40 + 50}{2} = 45$$

Step 2: Calculate the Reciprocal of Each Midpoint:

$$\frac{1}{15} \approx 0.0667, \quad \frac{1}{25} = 0.04, \quad \frac{1}{35} \approx 0.0286, \quad \frac{1}{45} \approx 0.0222$$

Step 3: Multiply Each Reciprocal by Its Frequency (using unrounded reciprocals):

$$\frac{5}{15} \approx 0.3333, \quad \frac{8}{25} = 0.32, \quad \frac{12}{35} \approx 0.3429, \quad \frac{6}{45} \approx 0.1333$$

Step 4: Sum the Results:

$$0.3333 + 0.32 + 0.3429 + 0.1333 = 1.1295$$

Step 5: Divide the Total Number of Observations by the Sum of Reciprocals:
First, calculate the total number of observations (N):

$$N = 5 + 8 + 12 + 6 = 31$$

Now, use the harmonic mean formula:

$$\text{HM} = \frac{31}{1.1295} \approx 27.45 \text{ minutes}$$
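A sketch of the grouped calculation, using the class intervals and frequencies from the table above; the function name `harmonic_mean_grouped` is an assumption. Because the code keeps full precision in the reciprocals, its last digits can differ slightly from a hand calculation that rounds intermediate values.

```python
def harmonic_mean_grouped(intervals, freqs):
    """Harmonic mean of a grouped distribution using class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]   # m_i
    if any(m <= 0 for m in midpoints):
        raise ValueError("harmonic mean requires strictly positive midpoints")
    total_freq = sum(freqs)                                           # N
    weighted_reciprocals = sum(f / m for m, f in zip(midpoints, freqs))  # sum of f_i / m_i
    return total_freq / weighted_reciprocals


intervals = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 6]
print(round(harmonic_mean_grouped(intervals, freqs), 2))
```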

1.16.7 Weighted Harmonic Mean
• The weighted harmonic mean is an extension of the harmonic mean that accounts for the importance or
weight of each observation.
• It is particularly useful when different data points contribute unequally to the overall mean, where not all
observations have the same significance.
• Formula for Weighted Harmonic Mean: Given a set of values $x_1, x_2, \ldots, x_n$ with corresponding
weights $w_1, w_2, \ldots, w_n$, the weighted harmonic mean (WHM) is calculated using the formula:

$$\text{WHM} = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \frac{w_i}{x_i}}$$

where:
– $w_i$ is the weight associated with the ith value.
– $x_i$ is the ith data value.
• Steps to Calculate Weighted Harmonic Mean
1. Calculate the Reciprocal of Each Value: For each data value $x_i$, find the reciprocal $\frac{1}{x_i}$.
2. Multiply Each Reciprocal by Its Corresponding Weight: For each value, multiply its reciprocal
by the corresponding weight $w_i$.
3. Sum the Results: Sum all the products obtained from step 2.
4. Divide the Sum of the Weights by the Sum of the Weighted Reciprocals: Use the formula to
calculate the weighted harmonic mean.

• Example: Suppose we have the following dataset representing the time taken (in hours) by different
machines to complete a task, with the number of tasks completed as the weight:

Time (x_i)    Number of Tasks (w_i)
2             3
4             5
6             2

Step 1: Calculate the Reciprocal of Each Value:

$$\frac{1}{2} = 0.5, \quad \frac{1}{4} = 0.25, \quad \frac{1}{6} \approx 0.1667$$

Step 2: Multiply Each Reciprocal by Its Corresponding Weight:

$$3 \times 0.5 = 1.5, \quad 5 \times 0.25 = 1.25, \quad 2 \times 0.1667 \approx 0.3334$$

Step 3: Sum the Results:

$$1.5 + 1.25 + 0.3334 = 3.0834$$

Step 4: Sum the Weights:

$$3 + 5 + 2 = 10$$

Step 5: Divide the Sum of the Weights by the Sum of the Weighted Reciprocals:

$$\text{WHM} = \frac{10}{3.0834} \approx 3.24 \text{ hours}$$
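The machine example can be reproduced with the sketch below. The function name `weighted_harmonic_mean` is an assumption; the times and task counts come from the table above. Weights need not be whole numbers, which is the main difference from the discrete-series version.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires strictly positive values")
    weight_sum = sum(weights)                                         # sum of w_i
    weighted_reciprocals = sum(w / x for x, w in zip(values, weights))   # sum of w_i / x_i
    return weight_sum / weighted_reciprocals


times = [2, 4, 6]      # hours per task
tasks = [3, 5, 2]      # number of tasks, used as weights
print(round(weighted_harmonic_mean(times, tasks), 2))  # 3.24
```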

1.17 Relation Between AM, GM, and HM
• For any set of positive numbers, the arithmetic mean is always greater than or equal to the geometric
mean, which in turn is greater than or equal to the harmonic mean. Equality holds only when all the values are the same.

$$\text{HM} \le \text{GM} \le \text{AM}$$

• For two positive numbers a and b, the square of the geometric mean is equal to the product of the
arithmetic mean and the harmonic mean.

$$\text{HM} \times \text{AM} = \text{GM}^2$$

• This follows because $\text{AM} = \frac{a+b}{2}$ and $\text{HM} = \frac{2ab}{a+b}$, so their product is $ab = \text{GM}^2$. The geometric mean of
two numbers is therefore the square root of the product of their arithmetic and harmonic means, which shows its
centrality among these means. For more than two numbers this identity does not hold in general.
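Both relations are easy to check numerically. The sketch below defines the three means with illustrative function names and uses made-up positive data; it checks the ordering HM ≤ GM ≤ AM on a three-value set and the product identity on a pair of numbers.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    return len(values) / sum(1 / x for x in values)


data = [2, 4, 8]
am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
print(hm <= gm <= am)  # True

pair = [4, 9]
product = harmonic_mean(pair) * arithmetic_mean(pair)
print(math.isclose(product, geometric_mean(pair) ** 2))  # True: both equal 36
```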

1.18 Median
• The median is a measure of central tendency that represents the middle value in a sorted, ascending or
descending, list of numbers.
• If the dataset contains an odd number of observations, the median is the middle number. If there is an even
number of observations, the median is typically calculated as the average of the two middle numbers.

1.18.1 Advantages of the Median


• Not Affected by Extreme Values (Outliers): The median is robust to outliers and skewed data because it
focuses on the middle value rather than the overall distribution.
• Simple to Understand and Calculate: The concept of the median is straightforward—it is simply the
middle value in a sorted list. This makes it easy to understand and calculate, especially for small datasets.
• Useful for Ordinal Data: The median is suitable for ordinal data (data that can be ranked or ordered) since
it focuses on the position of values. It can be used even when the actual distances between ranks are not
equal or known.
• Unaffected by Open-ended Distributions: The median can still be computed in cases where data distri-
bution is open-ended (e.g., income levels "above 200,000") because it does not require knowing the exact
values at the tails of the distribution.
• Useful in Non-normal Distributions: For skewed distributions (e.g., income, property prices), the median
provides a better representation of central tendency than the mean, which can be pulled toward the longer
tail.

1.18.2 Limitations of the Median
• Not Suitable for All Types of Data: The median is not appropriate for nominal data (data that cannot be
ordered or ranked) because it requires a sense of order among the values.
• Less Sensitive to Data Changes: The median is less sensitive to changes in the data than the mean.
For example, changing a value without moving it across the median position does not affect the median,
while it does change the mean.
• Does Not Utilize All Data Points: The median only considers the middle value(s), ignoring the actual
values of all other observations. Therefore, it may not provide a comprehensive picture of the dataset,
especially when the distribution is complex or multimodal.
• Difficult to Use in Mathematical Calculations: Unlike the mean, which can be easily used in further
statistical calculations (like variance and standard deviation), the median does not lend itself to further
mathematical analysis as easily.
• Less Informative for Small Sample Sizes: In very small datasets, the median might not provide as clear an
insight into central tendency as it does in larger datasets because the middle value might not be representative
of the overall trend.

1.18.3 Applications of the Median


• Income and Wealth Distribution: In economics, the median is often used to report income or wealth
distribution. This helps in understanding the typical income of a population, especially when the income
distribution is highly skewed.
• Real Estate Market Analysis: Median property prices are often reported in real estate markets to provide
a clearer picture of the typical market condition, avoiding the skewing effect of extremely high or low
property values.
• Describing Central Tendency in Skewed Distributions: In fields like environmental science, biology,
and medicine, where data distributions are often skewed (e.g., pollutant levels, survival times), the median
is a preferred measure of central tendency.
• Public Health and Social Science: Median values are commonly used in health statistics to report measures
like the median age, median survival time, or median income. This helps to represent a central trend without
the influence of outliers.
• Consumer Behavior: Companies might use the median to understand typical customer spending patterns
or other behaviors, avoiding being misled by extreme behavior that could skew the mean.
• Quality Control: In manufacturing and quality control, the median can be used to assess the central
tendency of product measurements, such as weight, size, or other attributes, to ensure they are within
acceptable limits.

1.18.4 Median for Individual Series
To calculate the median for an individual series (ungrouped or raw data series), follow these steps:

• Arrange the Data in Order:


Sort the data values in ascending order (from the smallest to the largest).
• Determine the Number of Observations (n):
Count the total number of data points in the series. Let this number be n.
• Find the Median Position:
The median position depends on whether n (the total number of observations) is odd or even:
– If n is odd, the median is the middle value of the sorted list.
– If n is even, the median is the average of the two middle values.
Specifically:

– If n is odd: Median = the $\left(\frac{n+1}{2}\right)$th value
– If n is even: Median = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th values

• Example: Odd Number of Observations
Consider the following individual series of data: 7, 3, 5, 9, 1
1. Arrange in Ascending Order:
1, 3, 5, 7, 9
2. Count the Observations (n):
n = 5 (odd number)
3. Find the Median Position:

$$\text{Median position} = \frac{n+1}{2} = \frac{5+1}{2} = 3$$

4. Identify the Median Value:
The 3rd value in the ordered list is 5.

Therefore, the median is 5.

• Example: Even Number of Observations
Consider the following individual series of data: 8, 3, 4, 10
1. Arrange in Ascending Order:
3, 4, 8, 10
2. Count the Observations (n):
n = 4 (even number)
3. Find the Median Position:
The median is the average of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th values:

$$\frac{n}{2} = \frac{4}{2} = 2 \quad \text{and} \quad \frac{n}{2} + 1 = 3$$

4. Identify the Median Value:
The 2nd value is 4 and the 3rd value is 8.

$$\text{Median} = \frac{4 + 8}{2} = 6$$

Therefore, the median is 6.
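The odd and even cases can be combined in one small function. This is a sketch with an illustrative name (`median_individual`); Python's standard library offers `statistics.median` for the same purpose. Note that the positions in the text are 1-based, while Python lists are 0-based.

```python
def median_individual(values):
    """Median of an individual (ungrouped) series."""
    ordered = sorted(values)                 # arrange in ascending order
    n = len(ordered)                         # number of observations
    if n == 0:
        raise ValueError("at least one value is required")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]               # the ((n + 1) / 2)th value, 1-based
    # average of the (n / 2)th and (n / 2 + 1)th values, 1-based
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_individual([7, 3, 5, 9, 1]))   # 5
print(median_individual([8, 3, 4, 10]))     # 6.0
```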

• Example:
– Step 1: Consider the data: 4, 4, 6, 3, and 2. Let’s arrange this data in ascending order: 2, 3, 4, 4, 6.
– Step 2: Count the number of values. There are 5 values.
– Step 3: Look for the middle value. The middle value is the median. Thus, median = 4.

• Example: The ages of the members of a weekend poker team are listed below. Find the median of the
set: 42, 40, 50, 60, 35, 58, 32

– Step 1: Arrange the data items in ascending order.
Original set: 42, 40, 50, 60, 35, 58, 32
Ordered set: 32, 35, 40, 42, 50, 58, 60
– Step 2: Count the number of observations: n = 7.
– Step 3: Calculate the median using the formula

$$\text{Median} = \left(\frac{n+1}{2}\right)\text{th term} = \left(\frac{7+1}{2}\right)\text{th term} = 4\text{th term} = 42$$

1.18.5 Median for Discrete Series
Calculating the median for a discrete series (ungrouped frequency distribution) involves determining the middle
value of the data, taking into account the frequency of each data point. Here’s a step-by-step guide:

1. Prepare the Data:
List the data values and their corresponding frequencies. A discrete series consists of individual data points
and their frequencies.
For example, consider the following data:

Data (x)    Frequency (f)
2           3
4           5
6           4
8           2
10          1

2. Calculate the Cumulative Frequency (cf):
Calculate the cumulative frequency for each data point. The cumulative frequency is the running total of the
frequencies up to that point.

Data (x)    Frequency (f)    Cumulative Frequency (cf)
2           3                3
4           5                8
6           4                12
8           2                14
10          1                15

3. Find the Median Position:
Calculate the total number of observations (N), which is the sum of all frequencies:

$$N = \sum f = 3 + 5 + 4 + 2 + 1 = 15$$

The median position is given by:

$$\text{Median position} = \frac{N + 1}{2}$$

In this example:

$$\text{Median position} = \frac{15 + 1}{2} = 8$$

4. Locate the Median Class:
Look at the cumulative frequency column to find where the median position falls. Identify the first cumulative
frequency that is equal to or greater than the median position.
In this example, the 8th position falls within the cumulative frequency of 8, which corresponds to the data
value x = 4.
5. Determine the Median Value:
The median is the data value corresponding to the median position. Based on the cumulative frequency, the
median position of 8 falls under the data value x = 4.
Therefore, the median is 4.

Example:

Data (x) Frequency (f)


5 2
10 3
15 5
20 4
25 1
1. Calculate Cumulative Frequency:

Data (x) Frequency (f) Cumulative Frequency (cf)


5 2 2
10 3 5
15 5 10
20 4 14
25 1 15
2. Find N and Median Position:

N = 2 + 3 + 5 + 4 + 1 = 15
\[ \text{Median position} = \frac{15 + 1}{2} = 8 \]
3. Locate the Median Class:
The 8th position is under the cumulative frequency of 10, corresponding to x = 15.
4. Determine the Median:
The median is 15.
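The cumulative-frequency procedure for a discrete series can be sketched as below. This is an illustrative sketch only; `median_discrete` is a hypothetical name, and the code follows the rule stated in these notes (take the first value whose cumulative frequency reaches the (N + 1)/2 position).

```python
def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies (hypothetical helper)."""
    pairs = sorted(zip(values, freqs))     # make sure the values are in ascending order
    n = sum(f for _, f in pairs)           # N = sum of all frequencies
    position = (n + 1) / 2                 # median position
    cumulative = 0
    for x, f in pairs:
        cumulative += f                    # running total = cumulative frequency
        if cumulative >= position:         # first cf that reaches the median position
            return x
    raise ValueError("median of an empty series is undefined")


print(median_discrete([2, 4, 6, 8, 10], [3, 5, 4, 2, 1]))     # 4
print(median_discrete([5, 10, 15, 20, 25], [2, 3, 5, 4, 1]))  # 15
```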

1.18.6 Median for Continuous Series
Calculating the median for a continuous series (grouped frequency distribution) involves finding the value that
divides the data into two equal parts, taking into account the frequencies and class intervals. Here’s a detailed guide:

1. Prepare the Data:


Organize the data into a table with class intervals and corresponding frequencies. A continuous series
groups data into intervals with associated frequencies.
For example, consider the following data:

Class Interval Frequency (f)


0 − 10 5
10 − 20 8
20 − 30 12
30 − 40 7
40 − 50 3
2. Calculate the Cumulative Frequency (cf):
Calculate the cumulative frequency for each class interval. The cumulative frequency is the running total of
the frequencies up to that interval.

Class Interval Frequency (f) Cumulative Frequency (cf)


0 − 10 5 5
10 − 20 8 13
20 − 30 12 25
30 − 40 7 32
40 − 50 3 35
3. Find the Median Position:
Calculate the total number of observations (N ), which is the sum of all frequencies:

N = ∑ f = 5 + 8 + 12 + 7 + 3 = 35
The median position is given by:
\[ \text{Median position} = \frac{N}{2} \]

In this example:

\[ \text{Median position} = \frac{35}{2} = 17.5 \]
4. Locate the Median Class:
Identify the class interval where the cumulative frequency is greater than or equal to the median position.
This class interval is called the median class.
In this example, the cumulative frequency first exceeds 17.5 at 25, which falls in the class interval 20 − 30.

5. Apply the Median Formula:
Use the following formula to calculate the median:
\[ \text{Median} = L + \left(\frac{\frac{N}{2} - cF}{f_m}\right) \times C \]

where:
• L = lower boundary of the median class
• cF = cumulative frequency of the class preceding the median class
• f m = frequency of the median class
• C = class width (size of the class interval)
For the class interval 20 − 30:
• L = 20
• cF = 13 (cumulative frequency of the class before the median class)
• f m = 12 (frequency of the median class)
• C = 10 (width of each class interval)
Substituting the values into the formula:
 
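\[
\begin{aligned}
\text{Median} &= 20 + \left(\frac{17.5 - 13}{12}\right) \times 10 \\
&= 20 + \left(\frac{4.5}{12}\right) \times 10 \\
&= 20 + 0.375 \times 10 \\
&= 20 + 3.75 = 23.75
\end{aligned}
\]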

Example: Let’s use another example to solidify the concept:

Class Interval Frequency (f)


5 − 15 6
15 − 25 11
25 − 35 15
35 − 45 8
45 − 55 5
1. Calculate Cumulative Frequency:

Class Interval Frequency (f) Cumulative Frequency (cf)


5 − 15 6 6
15 − 25 11 17
25 − 35 15 32
35 − 45 8 40
45 − 55 5 45
2. Find N and Median Position:

N = 6 + 11 + 15 + 8 + 5 = 45
\[ \text{Median position} = \frac{45}{2} = 22.5 \]
3. Locate the Median Class:
The 22.5th position is under the cumulative frequency of 32, corresponding to the class interval 25 − 35.
4. Apply the Median Formula:
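\[
\begin{aligned}
\text{Median} &= L + \left(\frac{\frac{N}{2} - cF}{f_m}\right) \times C \\
&= 25 + \left(\frac{22.5 - 17}{15}\right) \times 10 \\
&= 25 + \left(\frac{5.5}{15}\right) \times 10 \\
&= 25 + 3.67 \approx 28.67
\end{aligned}
\]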
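The grouped-data median formula can be checked with a short Python sketch. The function name `median_grouped` and its parameters are assumptions made for this illustration; it also assumes that all classes share the same width and are listed in ascending order.

```python
def median_grouped(lower_bounds, width, freqs):
    """Median of a continuous series: L + ((N/2 - cF) / f_m) * C (hypothetical helper)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    half = n / 2                           # median position N/2
    cumulative = 0                         # cF: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:         # first class whose cf reaches N/2 = median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


print(median_grouped([0, 10, 20, 30, 40], 10, [5, 8, 12, 7, 3]))             # 23.75
print(round(median_grouped([5, 15, 25, 35, 45], 10, [6, 11, 15, 8, 5]), 2))  # 28.67
```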


1.19 Mode
• Mode is a measure of central tendency that identifies the most frequently occurring value in a dataset.
• Mode: The mode of a dataset is the value (or values) that appear most frequently. A dataset may have
one mode (unimodal), more than one mode (bimodal or multimodal), or no mode at all (if all values are
unique).
• Example:
– Single Mode (Unimodal):
* Data: 2, 4, 4, 4, 5, 6, 7
* Mode: 4 (since 4 appears most frequently)
– Two Modes (Bimodal):
* Data: 1, 2, 3, 4, 4, 5, 5, 6
* Modes: 4 and 5 (both appear twice, more than any other values)
– Multiple Modes (Multimodal):
* Data: 1, 2, 2, 3, 3, 4, 4
* Modes: 2, 3, and 4 (each appears twice)
– No Mode:
* Data: 1, 2, 3, 4, 5
* Mode: None (no value repeats)

1.19.1 Advantages of Mode
• Simplicity and Ease of Calculation: The mode is straightforward to identify and calculate, especially in
small or ordered datasets.
• Applicable to Categorical Data: The mode is particularly useful for qualitative (categorical or nominal)
data, where arithmetic operations such as calculating a mean are not meaningful. For example, it identifies
the most common category (e.g., favorite color, most purchased product).
• Not Affected by Extreme Values: Since the mode is based on frequency, it is not influenced by extreme
values (outliers), which can skew the mean.

1.19.2 Limitations of Mode


• Not Always Unique: A dataset can have no mode or multiple modes, making it less reliable as a single
measure of central tendency.
• Less Informative with Large Data: For large or continuous datasets, the mode may not provide as much
insight as the mean or median, especially if there are no distinct peaks in the data distribution.
• Ignores Other Data: The mode only considers the most frequent value(s) and ignores the rest of the data,
which can lead to a loss of information.
• Sensitive to Binning in Continuous Data: When continuous data is grouped into intervals, the mode can
change significantly based on the choice of intervals, making it less robust in such cases.

1.19.3 Applications of Mode


• Market Research: In surveys and market research, mode helps identify the most preferred product, service,
or brand. For example, finding the most commonly purchased item in a store.
• Public Health: Mode can be used to determine the most common disease or health condition in a population,
which helps in resource allocation and planning.
• Education: In educational assessments, the mode can indicate the most common grade or score among
students, providing insights into teaching effectiveness or areas needing improvement.
• Sociology and Demography: Mode is used to identify the most common social or demographic character-
istic in a population, such as the most common household size or the most frequent age group.
• Retail and Inventory Management: Identifying the mode in sales data helps retailers stock the most
popular items, optimizing inventory and reducing stockouts.

1.19.4 Mode for Individual Series
Steps to Calculate the Mode for Individual Series:

1. Organize the Data: Arrange the data values in ascending or descending order to facilitate the identification
of frequencies.
2. Count the Frequency: Determine the frequency of each value (i.e., how many times each value appears in
the dataset).
3. Identify the Mode: The mode is the value with the highest frequency. If multiple values share the highest
frequency, the dataset is multimodal (i.e., has more than one mode).

Example 1: Unimodal Series (Single Mode)

• Consider the following dataset of student test scores:


Data: 85, 78, 92, 85, 70, 85, 92, 78
• Steps to Calculate Mode:
1. Organize the Data:
Sorted Data: 70, 78, 78, 85, 85, 85, 92, 92
2. Count the Frequency:
70: appears 1 time
78: appears 2 times
85: appears 3 times
92: appears 2 times

3. Identify the Mode: The mode is 85 because it appears 3 times, which is more frequent than any
other value.
• Result: Mode = 85

Example 2: Bimodal Series (Two Modes)

• Consider the following dataset of student test scores:


Data: 12, 15, 18, 15, 19, 12, 17, 15, 12
• Steps to Calculate Mode:
1. Organize the Data:
Sorted Data: 12, 12, 12, 15, 15, 15, 17, 18, 19
2. Count the Frequency:
12: appears 3 times
15: appears 3 times
17: appears 1 time
18: appears 1 time
19: appears 1 time

3. Identify the Mode: The mode values are 12 and 15, each appearing 3 times.
• Result: Modes = 12 and 15 (bimodal)

Example 3: No Mode

• Consider the following dataset of student test scores:


Data: 5, 7, 9, 10, 11
• Steps to Calculate Mode:
1. Organize the Data:
Sorted Data: 5, 7, 9, 10, 11
2. Count the Frequency:
5: appears 1 time
7: appears 1 time
9: appears 1 time
10: appears 1 time
11: appears 1 time

3. Identify the Mode: Since each value appears only once, there is no mode.
• Result: No Mode (all values have the same frequency)
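The three examples above can be reproduced with a small Python sketch. The helper name `find_modes` is hypothetical, and the convention that a series has no mode when every value occurs only once is taken from these notes; an empty list stands for "no mode".

```python
from collections import Counter


def find_modes(values):
    """Return all modes of an individual series as a sorted list (hypothetical helper)."""
    counts = Counter(values)               # Step 2: frequency of each value
    if not counts:
        return []
    highest = max(counts.values())         # Step 3: highest frequency
    if highest == 1:
        return []                          # every value occurs once: no mode
    return sorted(x for x, c in counts.items() if c == highest)


print(find_modes([85, 78, 92, 85, 70, 85, 92, 78]))      # [85]
print(find_modes([12, 15, 18, 15, 19, 12, 17, 15, 12]))  # [12, 15]
print(find_modes([5, 7, 9, 10, 11]))                     # []
```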

1.19.5 Mode for Discrete Series
Steps to Calculate the Mode for Discrete Series

1. List the Values and Their Frequencies: Start by organizing the data into a table format, showing each
distinct value and its corresponding frequency.
2. Identify the Highest Frequency: Look for the highest frequency in the list. The value corresponding to
this highest frequency is the mode.
3. Check for Multiple Modes: If more than one value shares the highest frequency, the series is multimodal,
meaning it has multiple modes.

Example:
Number of Sales (x) Frequency (f)
5 2
6 3
7 6
8 8
9 5
10 4
1. Organize the Data:
The data is already organized into a table with the number of sales (x) and their corresponding frequencies
( f ).
2. Identify the Highest Frequency:
From the table, the highest frequency is 8.
3. Find the Value Corresponding to the Highest Frequency:
The number of sales corresponding to the highest frequency (8) is 8.

Result:
Mode = 8
(The most frequent number of sales made by a representative is 8.)

Example:
Number of Pets (x) Frequency (f)
0 3
1 5
2 7
3 7
4 2
5 1
1. Organize the Data:
The data is already listed with the number of pets (x) and their frequencies ( f ).
2. Identify the Highest Frequency:
The highest frequency here is 7.
3. Find the Values Corresponding to the Highest Frequency:
The numbers of pets corresponding to the highest frequency (7) are 2 and 3.

Result:
Modes = 2 and 3
(The most common number of pets owned by families is either 2 or 3. This dataset is bimodal.)
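For a discrete series the frequencies are already tabulated, so finding the mode reduces to picking the value(s) with the highest frequency. A minimal sketch, with `modes_discrete` as a hypothetical helper name:

```python
def modes_discrete(values, freqs):
    """Value(s) with the highest frequency in a discrete series (hypothetical helper)."""
    highest = max(freqs)                   # Step 2: identify the highest frequency
    # Step 3: collect every value that shares it (more than one means multimodal)
    return [x for x, f in zip(values, freqs) if f == highest]


print(modes_discrete([5, 6, 7, 8, 9, 10], [2, 3, 6, 8, 5, 4]))  # [8]
print(modes_discrete([0, 1, 2, 3, 4, 5], [3, 5, 7, 7, 2, 1]))   # [2, 3]
```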

1.19.6 Mode for Continuous Series
Steps to Calculate the Mode for Continuous Series:

1. Identify the Modal Class: The modal class is the class interval with the highest frequency.
2. Use the Mode Formula for Grouped Data: Once the modal class is identified, use the following formula
to calculate the mode:
 
\[ \text{Mode} = L + \left(\frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\right) \times h \]
where:
• L = Lower boundary of the modal class
• fm = Frequency of the modal class
• fm−1 = Frequency of the class preceding the modal class
• fm+1 = Frequency of the class succeeding the modal class
• h = Width of the class intervals (assuming all classes have the same width)

Example:
Class Interval (Scores) Frequency (f)
0 − 10 5
10 − 20 8
20 − 30 12
30 − 40 20
40 − 50 10
50 − 60 6
1. Identify the Modal Class: The class interval with the highest frequency is the modal class. In this case,
the modal class is 30-40 (with a frequency of 20).
2. Calculate the Mode Using the Formula:

• L = 30 (Lower boundary of the modal class)


• fm = 20 (Frequency of the modal class)
• fm−1 = 12 (Frequency of the class preceding the modal class, 20-30)
• fm+1 = 10 (Frequency of the class succeeding the modal class, 40-50)
• h = 10 (Width of the class intervals)
Substitute these values into the mode formula:
 
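\[
\begin{aligned}
\text{Mode} &= 30 + \left(\frac{20 - 12}{(20 - 12) + (20 - 10)}\right) \times 10 \\
&= 30 + \left(\frac{8}{8 + 10}\right) \times 10 \\
&= 30 + \left(\frac{8}{18}\right) \times 10 \\
&= 30 + \left(\frac{4}{9}\right) \times 10 \\
&= 30 + 4.44 \approx 34.44
\end{aligned}
\]

Result: Mode ≈ 34.44
(The most frequent score is estimated to be about 34.44, which lies within the modal class 30-40.)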
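The grouped-data mode formula can be sketched in Python as follows. The name `mode_grouped` is hypothetical. Two assumptions are made that the notes do not state: if the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and if several classes tie for the highest frequency, the first one is used.

```python
def mode_grouped(lower_bounds, width, freqs):
    """Mode of a continuous series (hypothetical helper):
    L + ((f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))) * h
    """
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no preceding class
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0  # assumption: 0 if no succeeding class
    denominator = (f_m - f_prev) + (f_m - f_next)
    if denominator == 0:
        raise ValueError("formula is undefined when neighbouring frequencies equal f_m")
    return lower_bounds[m] + (f_m - f_prev) / denominator * width


scores = mode_grouped([0, 10, 20, 30, 40, 50], 10, [5, 8, 12, 20, 10, 6])
print(round(scores, 2))  # 34.44
```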

1.19.7 Relationship Between Mean, Median, and Mode

1. Symmetric Distribution (e.g., Normal or Gaussian Distribution):


• In a perfectly symmetric, unimodal distribution (such as the normal distribution), the mean, median,
and mode are all equal and located at the center of the distribution.
• Relationship:

Mean = Median = Mode


• Example: Heights of adults in a population are often normally distributed.
2. Positively Skewed Distribution (or Right-Skewed):
• In a positively skewed distribution, the tail on the right side of the distribution is longer or fatter than
the left side.
• Relationship:

Mode < Median < Mean


• Example: Income distribution in many countries, where a small number of individuals earn signifi-
cantly more than the majority.
3. Negatively Skewed Distribution (Left-Skewed):
• In a negatively skewed distribution, the tail on the left side is longer or fatter than the right side.
• Relationship:

Mean < Median < Mode


• Example: Age at retirement, where most people retire around a certain age, but a few retire much
earlier.

4. Empirical Relationship (Approximation):
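• In many practical scenarios, especially with moderate skewness, the mean, median, and mode are
approximately related by the following empirical formula:

\[ \text{Mean} - \text{Mode} \approx 3 \times (\text{Mean} - \text{Median}) \]

or, equivalently,

\[ 2 \times \text{Mean} + \text{Mode} \approx 3 \times \text{Median} \]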
• This relationship helps to estimate one measure if the other two are known and indicates the degree
of skewness.
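As a small illustration of using the empirical relationship to estimate one measure from the other two, the sketch below rearranges it to Mode ≈ 3 × Median − 2 × Mean. The function name and the input values (mean 30, median 28) are made up for this example and do not come from the notes.

```python
def estimate_mode(mean, median):
    """Estimate the mode from the empirical relationship (hypothetical helper).

    Rearranged from Mean - Mode ≈ 3 × (Mean - Median).
    """
    return 3 * median - 2 * mean


# Hypothetical values for a moderately right-skewed dataset
mode_estimate = estimate_mode(mean=30, median=28)
print(mode_estimate)  # 24, so Mode < Median < Mean (positive skew)
```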

1.20 Measure of Dispersion
• While the measures of central tendency (like mean, median, and mode) provide information about the central
point of a dataset, measures of dispersion tell us how much the data values deviate from this central point.
• Measures of dispersion in statistics are statistical tools used to describe how spread out or scattered the data
is around an average value. It helps to understand if the data points are close together or far apart.
• Measures of Dispersion measure the scattering of the data. It tells us how the values are distributed in the
data set.
• They help in understanding the degree of variability and the reliability of the central tendency.
• Dispersion shows the variability or consistency in a set of data.

1.20.1 Advantages of Measures of Dispersion


• Understanding Data Variability: Measures of dispersion provide insights into how spread out the data
values are, helping to understand the reliability of the mean or median.
• Identifying Outliers: They help in identifying outliers, which can significantly affect data analysis and
decision-making.
• Comparing Datasets: By comparing the dispersion of different datasets, one can determine which dataset
has more variability and thus, potentially more uncertainty.
• Making Informed Decisions: Knowing the extent of data variability is crucial for making accurate
predictions, quality control, and risk management.

1.20.2 Limitations of Measures of Dispersion


• Sensitivity to Outliers: Some measures, like range and standard deviation, are sensitive to extreme values,
which can distort the measure of dispersion.
• Not Informing About Data Shape: Measures of dispersion do not provide information about the shape of
the data distribution (e.g., skewness or kurtosis).
• Interpretation Difficulty: Variance is expressed in squared units, so it may be harder to interpret than
measures expressed in the original data units (such as the standard deviation or the range).
• Dependence on Sample Size: Some measures, like range, can change with the size of the dataset, making
it less reliable for small samples.

1.20.3 Applications of Measures of Dispersion


• Finance: Used to assess the risk or volatility of investments. Standard deviation is a key metric in portfolio
management.
• Quality Control: Variance and standard deviation are used to monitor the consistency and quality of
manufacturing processes.
• Healthcare: Measures of dispersion help in analyzing the spread of health-related variables, such as blood
pressure or cholesterol levels, across a population.
• Market Research: Understanding the variability in consumer preferences, spending habits, or product
ratings.
• Education: Analyzing the spread of student test scores to understand performance variability and improve
teaching methods.

1.20.4 Types of Measures of Dispersion
Measures of dispersion can be classified into the following two types :

1. Absolute Measure of Dispersion


2. Relative Measure of Dispersion

1.20.4.1 Absolute Measure of Dispersion
• The measures of dispersion that measure and express the amount of variation in a dataset in the units of data
themselves are called Absolute Measure of Dispersion.
• They provide a direct and specific measure of the spread of values.
• Advantages of Absolute Measures:
– Easy Interpretation: Since they are in the same units as the data, they are easy to understand and
interpret.
– Direct Measurement: They provide a direct measure of the spread of data.
• Limitations of Absolute Measures:
– Unit Dependency: These measures are dependent on the units of measurement, which makes
comparisons across different datasets with different units challenging.
– Not Scaled: They do not provide relative comparisons or standardized measures of variability.
• Some absolute measures of dispersion are:
– Range
– Mean Deviation
– Standard Deviation
– Variance
– Quartile Deviation
– Interquartile Range

1.20.4.2 Relative Measure of Dispersion


• Relative measures of dispersion are dimensionless measures that express the amount of variation in a dataset
relative to a central value, such as the mean or median.
• They provide a way to compare variability across datasets with different units or scales.
• We use relative measures of dispersion to compare the scattering of two quantities that are measured in
different units.
• Advantages of Relative Measures:
– Unit-Free: These measures are not dependent on the units of measurement, making them useful for
comparing variability across different datasets or variables.
– Scalability: They provide standardized measures of dispersion that can be easily compared.
• Limitations of Relative Measures:
– Complexity: Relative measures can be more complex to calculate and understand compared to
absolute measures.
– Dependence on Central Value: Their interpretation is heavily dependent on the accuracy and
stability of the central value (e.g., mean), which can be affected by outliers.
• Here are some of the relative measures of dispersion:

– Coefficient of Range
– Coefficient of Variation
– Coefficient of Mean Deviation
– Coefficient of Quartile Deviation

1.21 Range
• The range is the simplest measure of dispersion, calculated as the difference between the maximum and
minimum values in a dataset.
• It provides a quick snapshot of the extent to which the data varies but does not provide information about
the distribution of values within the range.

1.21.1 Range for Individual Series
• The range is the difference between the largest and the smallest values in the distribution.
• How to Calculate Range?
1. Identify the maximum value (the largest value) in your dataset.
2. Identify the minimum value (the smallest value) in your dataset.
3. Subtract the minimum value from the maximum value to find the range.

Range = Maximum value − Minimum value

1.21.2 Range for Discrete Series
• In the context of a discrete series, which typically represents data with specific values and their corresponding
frequencies, the range is still calculated as the difference between the maximum and minimum values.
• These values are the highest and lowest data points in the series, not the frequencies themselves.
• The frequencies (how often each value occurs) do not affect the calculation of the range. The range is
purely based on the extreme values in the data.
• Steps to Calculate Range for a Discrete Series
1. Identify the Maximum Value: Find the highest value in the dataset (the maximum).
2. Identify the Minimum Value: Find the lowest value in the dataset (the minimum).
3. Calculate the Range: Subtract the minimum value from the maximum value.

Range = Maximum value − Minimum value

1.21.3 Range for Continuous Series
• In a continuous series, the data is grouped into intervals or classes. To calculate the range for such a series,
we focus on the class boundaries rather than individual data points.
• The range in a continuous series is the difference between the upper boundary of the highest class and the
lower boundary of the lowest class.
• Steps to Calculate Range for a Continuous Series:
1. Identify the Upper Boundary of the Highest Class: This is the maximum value of the dataset,
represented by the upper limit of the last class interval.
2. Identify the Lower Boundary of the Lowest Class: This is the minimum value of the dataset,
represented by the lower limit of the first class interval.
3. Calculate the Range: Subtract the lower boundary of the lowest class from the upper boundary of
the highest class.

Range = Upper Limit of the Last Class Interval − Lower Limit of First Class Interval
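The range calculations for individual, discrete, and continuous series can be sketched together. The function names and the sample data below are illustrative assumptions, not taken from the notes' worked examples.

```python
def range_of_values(values):
    """Range of an individual or discrete series: maximum value - minimum value."""
    return max(values) - min(values)


def range_grouped(intervals):
    """Range of a continuous series: upper limit of last class - lower limit of first class."""
    lowest = min(lower for lower, _ in intervals)
    highest = max(upper for _, upper in intervals)
    return highest - lowest


# Individual series (hypothetical data)
print(range_of_values([2, 3, 4, 5, 7]))        # 5

# Discrete series: only the values matter, the frequencies are ignored
values, freqs = [5, 6, 7, 8, 9, 10], [2, 3, 6, 8, 5, 4]
print(range_of_values(values))                 # 5

# Continuous series given as (lower limit, upper limit) pairs
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(range_grouped(classes))                  # 50
```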

1.21.4 Advantages of Range
• Simplicity: The range is easy to understand and calculate. It gives a quick sense of the variability in the
dataset.
• Quick Comparison: It is useful for making quick comparisons between the spread of different datasets.
• Initial Insight: Provides an initial idea about the variability or spread of the data, which can be useful in
exploratory data analysis.

1.21.5 Limitations of Range


• Sensitivity to Outliers: The range is highly sensitive to extreme values (outliers). A single very high or
very low value can significantly affect the range.

In such a case, the range suggests much more variability in the data than there actually is: although the
range is large, most values are clustered around a clear middle.

• No Information About Distribution: It does not provide any information about how the values are
distributed within the range. Two datasets can have the same range but different distributions.
• Not Robust: The range is not a robust measure of dispersion because it only considers the extreme values
and ignores the rest of the data.

1.21.6 Applications of Range
• Quality Control: In manufacturing, the range can be used to monitor the consistency of product dimensions,
weights, or other characteristics.
• Weather Reports: The range is often used in weather reports to indicate the difference between the highest
and lowest temperatures recorded in a day or a specific period.
• Finance: In finance, the range can be used to measure the volatility of stock prices over a given period.
• Education: Teachers and educational researchers use the range to analyze the spread of students’ scores in
tests and exams.

1.22 Mean Deviation (MD)
• Mean deviation is used to show how far the observations are situated from the central point of the data (the
central point can be either mean, median or mode).
• The mean deviation of a data distribution is defined as the mean of the absolute deviations of the
observations from a suitable central value. This central value can be any one of the measures of central
tendency: the mean, the median, or the mode.
• Steps to Calculate Mean Deviation:
1. Calculate the Central Value: Determine whether you are calculating the MD around the Mean ,
Median or Mode and calculate that value.
2. Find Absolute Deviations: Compute the absolute deviations of each data point from the chosen
central value.
3. Sum the Absolute Deviations: Add up all the absolute deviations.
4. Calculate the Mean Deviation: Divide the sum of absolute deviations by the number of observations.
• Some deviations from the central value are positive and some are negative. If they were added as they are,
they would tend to cancel each other out (deviations from the mean always sum to zero), which is why
absolute values are used.

1.22.1 Mean Deviation for Individual Series
For an individual series, the Mean Deviation can be calculated around the Mean (x̄) (Section 1.14.2), Median (M)
(Section 1.18.4) or Mode (Mo) (Section 1.19.4). The formulas are as follows:

1. Mean Deviation about the Mean:

\[ \text{MD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \]

2. Mean Deviation about the Median:

\[ \text{MD} = \frac{\sum_{i=1}^{n} |x_i - M|}{n} \]

3. Mean Deviation about the Mode:

\[ \text{MD} = \frac{\sum_{i=1}^{n} |x_i - M_o|}{n} \]

where:

• n = number of observations
• xi = each individual observation
• x̄ = mean of the data
• M = median of the data
• Mo = mode of the data
• |xi − x̄|, |xi − M| or |xi − Mo| = absolute deviation from the mean, median or mode

1.22.1.1 Mean Deviation around Mean for Individual Series
Consider a dataset representing the number of hours students study per day:

{2, 3, 4, 5, 7}
• Step 1: Calculate the Mean

\[ \bar{x} = \frac{2 + 3 + 4 + 5 + 7}{5} = \frac{21}{5} = 4.2 \]
• Step 2: Find the Absolute Deviations from the Mean

|2 − 4.2| = 2.2
|3 − 4.2| = 1.2
|4 − 4.2| = 0.2
|5 − 4.2| = 0.8
|7 − 4.2| = 2.8

• Step 3: Sum the Absolute Deviations

Sum of absolute deviations = 2.2 + 1.2 + 0.2 + 0.8 + 2.8 = 7.2


• Step 4: Calculate the Mean Deviation

\[ \text{Mean Deviation} = \frac{7.2}{5} = 1.44 \]
Explanation: The mean deviation of 1.44 hours indicates that, on average, each student’s study time deviates
from the mean study time (4.2 hours) by about 1.44 hours. This provides a sense of how much the study habits vary
among the students.

1.22.1.2 Mean Deviation around Median for Individual Series

1.22.1.3 Mean Deviation around Mode for Individual Series
• Consider a dataset representing the number of items sold by a store each day:

{3, 3, 4, 4, 4, 5, 6}
• Step 1: Find the Mode
The mode (Mo) is the most frequently occurring value:

Mo = 4
• Step 2: Find the Absolute Deviations from the Mode

|3 − 4| = 1
|3 − 4| = 1
|4 − 4| = 0
|4 − 4| = 0
|4 − 4| = 0
|5 − 4| = 1
|6 − 4| = 2

• Step 3: Sum the Absolute Deviations

Sum of absolute deviations = 1+1+0+0+0+1+2 = 5


• Step 4: Calculate the Mean Deviation

\[ \text{Mean Deviation} = \frac{5}{7} \approx 0.71 \]
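The four steps for an individual series are the same whichever central value is chosen, so they can be sketched with one function that takes the central value as an argument. The helper name `mean_deviation` is hypothetical; `mean` and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, mode


def mean_deviation(values, center):
    """Mean of the absolute deviations of the values from a chosen central value."""
    deviations = [abs(x - center) for x in values]  # Step 2: absolute deviations
    return sum(deviations) / len(values)            # Steps 3 and 4: sum, then divide by n


hours = [2, 3, 4, 5, 7]
print(round(mean_deviation(hours, mean(hours)), 2))  # 1.44 (around the mean 4.2)

items = [3, 3, 4, 4, 4, 5, 6]
print(round(mean_deviation(items, mode(items)), 2))  # 0.71 (around the mode 4)
```

Passing `statistics.median(values)` as the center gives the mean deviation around the median in the same way.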

1.22.2 Mean Deviation for Discrete Series
The general formula for Mean Deviation (MD) around a central value (C) (Mean - Section 1.14.3, Median - 1.18.5 ,
Mode - 1.19.5) is:

\[ \text{MD} = \frac{\sum |x_i - C| \times f_i}{\sum f_i} \]
where:

• xi = each individual observation


• fi = frequency of each observation
• C = central value (mean, median, or mode)
• |xi −C| = absolute deviation from the central value
• ∑ fi = total number of observations (sum of frequencies)

1.22.2.1 Mean Deviation around the Mean for Discrete Series
1. Calculate the Mean (x̄):

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \]

2. Find Absolute Deviations from the Mean: Calculate |xi − x̄| for each observation xi.

3. Multiply by Frequencies: Multiply each absolute deviation by its corresponding frequency fi.

4. Sum and Calculate Mean Deviation:

\[ \text{MD} = \frac{\sum |x_i - \bar{x}| \times f_i}{\sum f_i} \]

Example:
xi (Value) fi (Frequency)
2 3
4 5
6 4
8 2

1. Calculate the Mean:

\[ \bar{x} = \frac{(2 \times 3) + (4 \times 5) + (6 \times 4) + (8 \times 2)}{3 + 5 + 4 + 2} = \frac{6 + 20 + 24 + 16}{14} = \frac{66}{14} \approx 4.71 \]
2. Find Absolute Deviations from the Mean

xi fi |xi − x| |xi − x| × fi
2 3 |2 − 4.71| = 2.71 2.71 × 3 = 8.13
4 5 |4 − 4.71| = 0.71 0.71 × 5 = 3.55
6 4 |6 − 4.71| = 1.29 1.29 × 4 = 5.16
8 2 |8 − 4.71| = 3.29 3.29 × 2 = 6.58
3. Mean Deviation Around Mean:

\[ \text{MD} = \frac{8.13 + 3.55 + 5.16 + 6.58}{14} = \frac{23.42}{14} \approx 1.67 \]

1.22.2.2 Mean Deviation around the Median for Discrete series
1. Find the Median: For a discrete series, use the cumulative frequency to locate the median value (Section 1.18.5).

2. Find Absolute Deviations from the Median: Calculate |xi − M| for each observation xi .

3. Multiply by Frequencies: Multiply each absolute deviation by its corresponding frequency f i .


4. Sum and Calculate Mean Deviation:

\[ \text{MD} = \frac{\sum |x_i - M| \times f_i}{\sum f_i} \]

Example:
1. Find the Median: Since there are 14 total observations, the median position is the (14 + 1)/2 = 7.5th
observation.

xi fi Cumulative Frequency
2 3 3
4 5 8
6 4 12
8 2 14

2. Both the 7th and the 8th observations fall under the cumulative frequency of 8, so the median is M = 4.
xi fi |xi − M| |xi − M| × fi
2 3 |2 − 4| = 2 2 × 3 = 6
4 5 |4 − 4| = 0 0 × 5 = 0
6 4 |6 − 4| = 2 2 × 4 = 8
8 2 |8 − 4| = 4 4 × 2 = 8
3. Mean Deviation Around Median:

\[ \text{MD} = \frac{6 + 0 + 8 + 8}{14} = \frac{22}{14} \approx 1.57 \]

1.22.2.3 Mean Deviation around the Mode for Discrete Series
1. Identify the Mode: The mode is the value that appears most frequently in the series.
2. Calculate Absolute Deviations from the Mode: For each observation xi , calculate the absolute deviation
from the mode |xi − Mo|.

3. Multiply by Frequencies: Multiply each absolute deviation by the corresponding frequency f i .


4. Calculate the Mean Deviation:

\[ \text{MD}_{\text{Mode}} = \frac{\sum |x_i - M_o| \times f_i}{\sum f_i} \]

Where:
• xi = each individual observation
• fi = frequency of each observation
• Mo = the modal value
• |xi − Mo| = absolute deviation from the mode
• ∑ fi = total number of observations (sum of frequencies)

Example:
xi (Value) fi (Frequency)
2 3
4 5
6 4
8 2
1. Step 1: Identify the Mode
The mode is the value with the highest frequency. Here, the mode is Mode = 4 because it has the highest
frequency (5).
2. Step 2: Calculate Absolute Deviations from the Mode
Calculate |xi − 4| for each value xi :

xi fi |xi − 4|
2 3 |2 − 4| = 2
4 5 |4 − 4| = 0
6 4 |6 − 4| = 2
8 2 |8 − 4| = 4
3. Step 3: Multiply by Frequencies
Now multiply the absolute deviations by the corresponding frequencies:

xi fi |xi − 4| |xi − 4| × fi
2 3 2 2×3 = 6
4 5 0 0×5 = 0
6 4 2 2×4 = 8
8 2 4 4×2 = 8
4. Step 4: Calculate the Mean Deviation Around the Mode: Finally, sum all the products and divide by the
total frequency:
\[ \text{MD} = \frac{6 + 0 + 8 + 8}{3 + 5 + 4 + 2} = \frac{22}{14} \approx 1.57 \]
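The frequency-weighted formula for a discrete series can be sketched as below, reusing the table from the examples in this section. The helper name `mean_deviation_discrete` is hypothetical, and the central value is passed in so that the same function covers the mean, median, and mode.

```python
def mean_deviation_discrete(values, freqs, center):
    """Frequency-weighted mean of absolute deviations from a central value."""
    total = sum(freqs)                                           # sum of frequencies
    weighted = sum(f * abs(x - center) for x, f in zip(values, freqs))
    return weighted / total


values = [2, 4, 6, 8]
freqs = [3, 5, 4, 2]

mean_x = sum(x * f for x, f in zip(values, freqs)) / sum(freqs)  # 66 / 14
md_mean = mean_deviation_discrete(values, freqs, mean_x)
md_median = mean_deviation_discrete(values, freqs, 4)            # median (and mode) = 4

print(round(md_mean, 2))    # 1.67
print(round(md_median, 2))  # 1.57
```

Because the median and the mode are both 4 for this table, the mean deviations around the median and around the mode coincide.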

1.22.3 Mean Deviation for Continuous Series
The central values Mean, Median and Mode for Continuous series can be revised in Section 1.14.4, Section 1.18.6,
and Section 1.19.6 respectively.

1.22.3.1 Mean Deviation around Mean for a Continuous Series


1. Find the Midpoints:

\[ x_i = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2} \]

2. Calculate the Mean (Section 1.14.4):

\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \]

where fi is the frequency and xi is the midpoint.
3. Calculate Absolute Deviations from the Mean:

\[ |x_i - \bar{x}| \]

4. Multiply Absolute Deviations by Frequency:

\[ f_i |x_i - \bar{x}| \]

5. Calculate the Mean Deviation Around the Mean:

\[ \text{MD}_{\bar{x}} = \frac{\sum f_i |x_i - \bar{x}|}{\sum f_i} \]

Example:
Consider the following frequency distribution:

Class Interval Frequency ( f i )


0 − 10 5
10 − 20 8
20 − 30 12
30 − 40 10
40 − 50 5
1. Find midpoints xi :
Class Interval fi xi
0 − 10 5 5
10 − 20 8 15
20 − 30 12 25
30 − 40 10 35
40 − 50 5 45
2. Calculate the mean:

\[ \bar{x} = \frac{5 \times 5 + 8 \times 15 + 12 \times 25 + 10 \times 35 + 5 \times 45}{40} = \frac{1020}{40} = 25.5 \]

3. Absolute deviations and their products with frequency:

xi fi |xi − x̄| fi |xi − x̄|
5 5 20.5 102.5
15 8 10.5 84
25 12 0.5 6
35 10 9.5 95
45 5 19.5 97.5
4. Calculate Mean Deviation Around the Mean:

\[ \text{MD}_{\bar{x}} = \frac{102.5 + 84 + 6 + 95 + 97.5}{40} = \frac{385}{40} = 9.625 \]

1.22.3.2 Mean Deviation around Median for a Continuous Series
1. Find the Median Class (1.18.6): The median class is the class interval where the cumulative frequency is
greater than or equal to half the total frequency.
2. Use the Formula to Calculate the Median:

\[ \text{Median} = L + \left(\frac{\frac{N}{2} - cF}{f_m}\right) \times C \]

where:
• L = lower boundary of the median class
• cF = cumulative frequency of the class preceding the median class
• fm = frequency of the median class
• C = class width (size of the class interval)
3. Calculate Absolute Deviations from the Median: For each class midpoint xi, calculate the absolute deviation
from the median |xi − M|.
4. Multiply by Frequencies and calculate the Mean Deviation as done with the Mean.

\[ \text{MD} = \frac{\sum f_i |x_i - \text{Median}|}{\sum f_i} \]

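Example:
Using the frequency distribution from the example in Section 1.22.3.1 (N = 40), the median class is 20 − 30, so

\[ \text{Median} = 20 + \left(\frac{20 - 13}{12}\right) \times 10 \approx 25.83 \]

Now compute the absolute deviations and Mean Deviation around the median:

\[ \text{MD}_{\text{Median}} = \frac{104.165 + 86.664 + 9.996 + 91.67 + 95.835}{40} = \frac{388.33}{40} \approx 9.71 \]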

1.22.3.3 Mean Deviation around Mode for Continuous Series
1. Identify the Modal Class: The modal class is the class interval with the highest frequency.
2. Find the mode (Section 1.19.6) using the formula:

\[ \text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h \]
Where:
• L = lower boundary of the modal class
• f1 = frequency of the modal class
• f0 = frequency of the class preceding the modal class
• f2 = frequency of the class succeeding the modal class
• h = class width
3. Calculate the Absolute Deviations from the Mode:

|xi − Mode|
where xi is the midpoint of each class interval.
4. Multiply Absolute Deviations by Frequencies:

fi |xi − Mode|
where fi is the frequency of each class interval.
5. Calculate Mean Deviation around the mode using the formula:

\[ MD_{\text{Mode}} = \frac{\sum f_i |x_i - \text{Mode}|}{\sum f_i} \]

Example:
1. Consider the following frequency distribution:

Class Interval    Frequency (fi)
0 − 10            5
10 − 20           8
20 − 30           12
30 − 40           10
40 − 50           5
2. Step 1: Identify the Modal Class
The class interval 20 − 30 has the highest frequency (12), so it is the modal class.
3. Step 2: Calculate the Mode
Using the formula:
\[ \text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h \]
where:
• L = 20 (lower boundary of the modal class)
• f1 = 12 (frequency of the modal class)
• f0 = 8 (frequency of the class before, 10 − 20)
• f2 = 10 (frequency of the class after, 30 − 40)
• h = 10 (class width)
Substituting the values:

\[ \text{Mode} = 20 + \frac{12 - 8}{2 \times 12 - 8 - 10} \times 10 = 20 + \frac{4}{24 - 18} \times 10 = 20 + \frac{40}{6} \approx 20 + 6.67 = 26.67 \]
So, the mode is approximately 26.67.
4. Step 3: Calculate the Midpoints xi
The midpoints xi for each class interval are calculated as follows:

\[ x_i = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2} \]

Class Interval    xi (Midpoint)
0 − 10            5
10 − 20           15
20 − 30           25
30 − 40           35
40 − 50           45

5. Step 4: Calculate the Absolute Deviations and Multiply by Frequency

xi    fi    |xi − Mode|              fi |xi − Mode|
5     5     |5 − 26.67| = 21.67      5 × 21.67 = 108.35
15    8     |15 − 26.67| = 11.67     8 × 11.67 = 93.36
25    12    |25 − 26.67| = 1.67      12 × 1.67 = 20.04
35    10    |35 − 26.67| = 8.33      10 × 8.33 = 83.30
45    5     |45 − 26.67| = 18.33     5 × 18.33 = 91.65
6. Step 5: Calculate the Mean Deviation Around the Mode
The total frequency (∑ fi) is:

\[ \sum f_i = 5 + 8 + 12 + 10 + 5 = 40 \]

The sum of fi |xi − Mode| is:

\[ 108.35 + 93.36 + 20.04 + 83.30 + 91.65 = 396.70 \]

Finally, the Mean Deviation around the Mode is:

\[ MD_{\text{Mode}} = \frac{396.70}{40} \approx 9.92 \]
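The whole calculation can be checked with a short Python sketch. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption made here for illustration; the notes do not discuss that edge case. The formula is also undefined when 2 f1 − f0 − f2 = 0, which this sketch does not handle.

```python
def grouped_mode(lowers, width, freqs):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class before (assumed 0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after (assumed 0 if none)
    return lowers[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width


lowers = [0, 10, 20, 30, 40]  # lower class boundaries
width = 10                    # class width h
freqs = [5, 8, 12, 10, 5]
midpoints = [lo + width / 2 for lo in lowers]

mode = grouped_mode(lowers, width, freqs)
total = sum(freqs)
md_mode = sum(f * abs(x - mode) for x, f in zip(midpoints, freqs)) / total
print(round(mode, 2), round(md_mode, 2))
```

The code keeps the unrounded mode (26.666...), so the individual products differ slightly from the hand-rounded table even though the final value agrees to two decimals.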

1.22.4 Advantages of Mean Deviation
• Simplicity: Mean deviation is easy to understand and compute, as it involves basic arithmetic operations.
• Based on All Observations: Because it uses every data value, it reflects the dispersion of the whole data set
rather than only its extreme values.
• Uses Absolute Values: By using absolute deviations, mean deviation avoids the problem of positive
and negative deviations canceling each other out, which would happen if the signed deviations were simply
summed (about the mean, they always sum to zero).
• Indicative of Variability: Mean deviation provides a straightforward indication of the average amount of
variation or dispersion from the central value.

1.22.5 Limitations of Mean Deviation


• Not Uniquely Defined: It can be computed about the mean, the median, or the mode, and each choice gives a
different value.
• Ignores Signs: Taking absolute values discards the sign of each deviation, so the measure says nothing about
the direction of the deviations and is awkward to manipulate algebraically.
• Less Sensitive to Extreme Values: Unlike variance and standard deviation, which square the deviations,
mean deviation may understate the impact of extreme values or outliers because it only considers the
absolute differences.
• Not as Widely Used: Mean deviation is not as commonly used as variance or standard deviation in statistical
analysis, which can make it less familiar to some practitioners.
• Lacks Mathematical Properties: Mean deviation lacks some of the desirable mathematical properties
(e.g., differentiability) that variance and standard deviation have, which limits its use in more advanced
statistical methods.

1.22.6 Applications of Mean Deviation


• Economics: Mean deviation is used to measure the variability of economic data, such as income distribution
or price changes.
• Quality Control: In manufacturing, mean deviation helps assess the consistency of product dimensions or
other quality characteristics.
• Social Sciences: Researchers use mean deviation to understand the spread of survey responses or
demographic data, particularly when comparing groups.

1.23 Sampling
• Why do we need sampling? Suppose you are asked to survey the eating habits of teenagers in the US. There
are over 42 million teens in the US, and the number keeps growing. Is it possible to survey each of these
42 million individuals about their eating habits? Obviously not! That is why sampling is used.
• How can one choose a sample that best represents the entire population? Sampling is a statistical
method for selecting a subset of individual observations from a population so that the subset represents the
entire population.
• There are two main types of Sampling techniques:
– Probability Sampling
– Non-Probability Sampling

1.23.1 Probability Sampling
• This is a sampling technique in which samples from a large population are chosen using the theory of
probability.
• Probability sampling techniques ensure that every member of the population has a known and non-zero
chance of being selected.
• The main types of probability sampling are:
– Simple Random Sampling or Random Sampling
– Systematic Sampling
– Stratified Sampling
– Cluster Sampling
– Multi-Stage Sampling

1.23.1.1 Random Sampling
• In this method, each member of the population has an equal chance of being selected in the sample.
• Example: A company wants to survey its employees’ job satisfaction. They use a random number generator
to select 50 employees out of 500, ensuring each employee has an equal chance of being chosen.

• Advantages:
– Easy to implement.
– Reduces selection bias.
• Disadvantages:
– Requires a complete list of the population.
– May not be practical for large populations.
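The employee example can be sketched with Python's standard `random` module. The employee IDs 1 to 500 and the seed are assumptions made for illustration; `Random.sample` draws without replacement, so every employee has the same chance of selection and nobody is picked twice.

```python
import random

rng = random.Random(42)  # fixed seed only so the illustration is repeatable

# Hypothetical sampling frame: a complete list of the 500 employee IDs.
employee_ids = list(range(1, 501))

# Simple random sample of 50 employees, drawn without replacement.
sample = rng.sample(employee_ids, k=50)
print(sorted(sample)[:10])
```

The need for `employee_ids` to exist in full before sampling is exactly the "complete list of the population" requirement noted above.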

1.23.1.2 Systematic Sampling
• In systematic sampling, every nth record is chosen from the population after a random starting point
among the first n records.
• Example: In a factory with 1000 products, an inspector picks a random starting point among the first 10
products (say, the 5th) and then selects every 10th product after it (the 5th, 15th, 25th, ...) for quality testing.

• Advantages:
– Simple and quick to implement.
– Ensures a spread across the population.
• Disadvantages:
– Can introduce bias if there is a hidden pattern in the population.
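A minimal sketch of the factory example in Python follows. The function name `systematic_sample` and the product numbering 1 to 1000 are assumptions for illustration.

```python
import random


def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)  # random offset in 0 .. k-1
    return population[start::k]


products = list(range(1, 1001))  # hypothetical product numbers 1..1000
sample = systematic_sample(products, 10, random.Random(0))
print(len(sample), sample[:5])
```

Only the starting point is random; everything after it is determined, which is why a periodic pattern in the population with the same period as the sampling interval can bias the result.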

1.23.1.3 Stratified Sampling
• Stratified sampling divides the population into strata (subgroups; singular: stratum).
• A stratum is a subset of the population that shares at least one common characteristic.
• Random sampling is then used to select a sufficient number of subjects from each stratum.
• Example: A researcher wants to study the income levels of different age groups. They divide the population
into age strata (e.g., 18-29, 30-49, 50-69) and randomly select individuals from each stratum.

• Advantages:
– Ensures representation of all subgroups.
– Increases precision.
• Disadvantages:
– Requires detailed population information.
– More complex to administer.
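The age-group example can be sketched as follows. This is an illustration under stated assumptions: the population of 520 people with ages 18 to 69 is synthetic, the names `stratified_sample` and `age_group` are hypothetical, and proportional allocation (the same sampling fraction in every stratum) is one common choice rather than something the notes prescribe.

```python
import random
from collections import defaultdict


def age_group(person):
    """Map a (person_id, age) pair to its age stratum."""
    _, age = person
    if age <= 29:
        return "18-29"
    if age <= 49:
        return "30-49"
    return "50-69"


def stratified_sample(units, stratum_of, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample


# Synthetic population: 520 people, ages cycling through 18..69.
people = [(i, 18 + i % 52) for i in range(520)]
sample = stratified_sample(people, age_group, 0.1, random.Random(3))
print(len(sample))
```

Every stratum is guaranteed to appear in the sample, which simple random sampling cannot promise for small subgroups.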

1.23.1.4 Cluster Sampling
• Cluster sampling divides the population into clusters, randomly selects some clusters, and then samples all
or some members within those clusters.
• Example: A school district wants to evaluate student performance. They randomly select 5 out of 20
schools (clusters) and then test all students in those selected schools.

• Advantages:
– Cost-effective for large populations.
– Reduces travel and administrative costs.
• Disadvantages:
– Less precise if the clusters differ markedly from one another, since each selected cluster is then a poor
stand-in for the population.
– Can increase sampling error.
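A sketch of the school-district example in Python follows. The 20 schools with 30 students each are synthetic data, and `cluster_sample` is a hypothetical helper; it implements the one-stage variant in which every member of a selected cluster is included.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and return all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for c in chosen for m in clusters[c]]
    return chosen, members


# Synthetic district: 20 schools (clusters) with 30 students each.
schools = {
    f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 21)
}

chosen, students = cluster_sample(schools, 5, random.Random(1))
print(chosen, len(students))
```

Randomness enters only at the cluster level here, in contrast to stratified sampling, where every subgroup is sampled.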

1.23.1.5 Multi-Stage Sampling
• Multistage sampling is an extension of cluster sampling in that, first, clusters are randomly selected and,
second, sample units within the selected clusters are randomly selected.
• It involves multiple stages of sampling, where each stage becomes progressively smaller and more focused.
• Here’s a step-by-step explanation:
– Stage 1: Define the Primary Sampling Units (PSUs) - Divide the population into larger groups or clusters,
such as cities, states, or regions.
– Stage 2: Select PSUs - Select a random sample of PSUs.
– Stage 3: Define the Secondary Sampling Units (SSUs) - Divide the selected PSUs into smaller sub-groups,
such as neighborhoods or blocks, and select a random sample of them.
– Stage 4: Final Sample - Select a random sample of individuals or units from the selected SSUs.
• Example: A national health survey first randomly selects regions (stage 1), then randomly selects towns
within those regions (stage 2), and finally selects households within those towns (stage 3).

• Advantages:
– Flexible and cost-effective.
– Suitable for large-scale surveys.
• Disadvantages:
– Complex to design and analyze.
– Errors can accumulate at each stage, affecting the overall accuracy of the sample.
– If not properly implemented, multistage sampling can introduce bias at each stage.
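The national health survey example (regions, then towns, then households) can be sketched as nested random draws. The sampling frame below is synthetic, and the function name `multistage_sample` and the sample sizes at each stage are assumptions for illustration.

```python
import random


def multistage_sample(frame, n_regions, n_towns, n_households, rng):
    """Three-stage sample: regions, then towns within them, then households."""
    sample = []
    for region in rng.sample(sorted(frame), n_regions):       # stage 1
        towns = frame[region]
        for town in rng.sample(sorted(towns), n_towns):       # stage 2
            sample.extend(rng.sample(towns[town], n_households))  # stage 3
    return sample


# Synthetic frame: 4 regions x 5 towns x 20 households.
frame = {
    f"R{r}": {
        f"R{r}-T{t}": [f"R{r}-T{t}-H{h}" for h in range(20)] for t in range(5)
    }
    for r in range(4)
}

households = multistage_sample(frame, 2, 2, 3, random.Random(7))
print(len(households), households[:3])
```

Household lists are only needed for the towns actually selected, which is the main practical saving of the method.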

1.23.2 Non-Probability Sampling
• Non-probability sampling techniques do not give every individual in the population a known, non-zero chance
of being selected.
• These techniques are often used when probability sampling is not feasible.

1.23.2.1 Convenience Sampling


• Convenience sampling is also known as opportunity sampling.
• It involves collecting samples from easily accessible locations or sources.
• Example: A researcher surveys customers at a shopping mall because they are easily accessible.
• Advantages:
– Quick and inexpensive.
– Easy to implement.
• Disadvantages:
– High risk of bias.
– Not representative of the population.

1.23.2.2 Snowball Sampling


• Snowball sampling is a recruitment technique in which research participants are asked to assist researchers
in identifying other potential subjects.
• Participants recruit other participants from their acquaintances. Thus the sample group is said to grow like a
rolling snowball.
• Example: A researcher studying a rare disease starts with one patient and asks them to refer other patients
they know.
• Advantages:
– Useful for hard-to-reach populations.
– Builds networks of related subjects.
• Disadvantages:
– Potential for bias.
– Limited control over the sample composition.

1.23.2.3 Quota Sampling
• Quota sampling is a method for selecting survey participants that is a non-probabilistic version of stratified
sampling.
• It relies on the non-random selection of a predetermined number or proportion of units, called a quota, so
that specific characteristics are represented in the sample.
• You first divide the population into mutually exclusive subgroups (called strata), defined by characteristics
you choose in advance, and then recruit sample units from each subgroup until its quota is reached.
• Example: A researcher ensures that their sample includes a certain number of men and women, age groups,
and ethnic backgrounds, reflecting the population’s proportions.
• Advantages:
– Ensures representation of specific groups.
– More practical than stratified sampling.
• Disadvantages:
– Can introduce bias.
– Not random, limiting generalizability.
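The recruiting rule can be sketched in Python. The respondents, their order of arrival, and the quotas are made up for illustration, and `quota_sample` is a hypothetical helper. Units are accepted in whatever order they turn up, with no random draw, which is what separates this from stratified sampling.

```python
def quota_sample(stream, group_of, quotas):
    """Accept units in arrival order until every group's quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for unit in stream:
        group = group_of(unit)
        if group in quotas and counts[group] < quotas[group]:
            sample.append(unit)
            counts[group] += 1
        if counts == quotas:  # all quotas met, stop recruiting
            break
    return sample


# Hypothetical respondents as (id, gender), in the order they were approached.
respondents = [
    ("r1", "F"), ("r2", "F"), ("r3", "M"), ("r4", "F"),
    ("r5", "M"), ("r6", "M"), ("r7", "F"),
]
quotas = {"F": 2, "M": 2}

sample = quota_sample(respondents, lambda r: r[1], quotas)
print(sample)
```

The result depends entirely on who happens to arrive first, which illustrates the bias risk listed above.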
