Sta111 Lecture Note 1
INTRODUCTION
Statistics plays a vital role in numerous disciplines by providing tools for data collection, analysis,
interpretation, and decision-making. Below is a comprehensive exploration of how statistics is applied
across different fields, highlighting its importance and practical use cases.
Statistics in Computer Science: In computer science, statistics is essential for designing algorithms,
machine learning, data mining, and performance analysis. It helps in managing and interpreting large
datasets, often referred to as "big data," to derive insights and make predictions.
Applications:
1. Machine Learning and AI: Statistical models form the backbone of machine learning algorithms,
where data is used to train models that can predict outcomes.
➢ Example: A spam filter in email services uses logistic regression, a statistical method, to
classify emails as spam or non-spam.
2. Performance Analysis: Statistics helps in analyzing the efficiency and performance of algorithms.
➢ Example: Calculating the average execution time of sorting algorithms to determine their
efficiency.
3. Natural Language Processing (NLP): Statistical methods are used to process and analyze textual
data.
➢ Example: Sentiment analysis in social media posts using statistical classification
techniques.
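As a rough illustration of the performance-analysis example above, the sketch below runs two sorting routines several times each and reports the mean and sample standard deviation of the run times. The helper names (time_sort, insertion_sort), the input size, and the number of repeats are assumptions chosen for illustration; measured times will differ from machine to machine.

```python
import random
import statistics
import time


def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def time_sort(sort_fn, data, repeats=20):
    """Return (mean, sample standard deviation) of run time in seconds."""
    durations = []
    for _ in range(repeats):
        sample = list(data)  # fresh copy so every run sorts the same unsorted input
        start = time.perf_counter()
        sort_fn(sample)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


random.seed(0)  # fixed seed so the input data are reproducible
data = [random.random() for _ in range(500)]

mean_builtin, sd_builtin = time_sort(sorted, data)
mean_insertion, sd_insertion = time_sort(insertion_sort, data)

print(f"built-in sorted: mean {mean_builtin:.6f} s, sd {sd_builtin:.6f} s")
print(f"insertion sort : mean {mean_insertion:.6f} s, sd {sd_insertion:.6f} s")
```

Reporting the standard deviation alongside the mean shows how much the individual runs vary, which matters when two algorithms have similar average times.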
Statistics in Physics: Physics often deals with experimental data and uncertainties, making statistics
crucial for analyzing and interpreting results. Statistical tools help in understanding complex systems
and validating theories through empirical data.
Applications in Physics:
1. Quantum Mechanics: Probability distributions describe the behavior of particles at a quantum
level.
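➢ Example: The Heisenberg Uncertainty Principle is stated in terms of standard deviations: the
product of the uncertainties in a particle's position and momentum cannot fall below a fixed
bound, so the two cannot both be measured with arbitrary precision simultaneously.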
2. Experimental Data Analysis: Physicists use statistical methods to analyze experimental data and
reduce measurement errors.
➢ Example: Analyzing data from the Large Hadron Collider (LHC) to discover new particles.
Statistics in Geology: Geologists use statistics to analyze geological data, predict natural disasters,
and assess the availability of natural resources.
Applications:
1. Mineral Exploration: Statistical models help estimate the probability of finding minerals in a
given area.
➢ Example: Geostatistical techniques are used to predict the presence of oil reserves.
2. Seismology: Analyzing seismic data to predict earthquakes and understand tectonic movements.
➢ Example: Calculating the probability of future earthquakes based on historical data.
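One common way to turn historical counts into a probability is to assume that earthquakes occur as a Poisson process with a constant average rate. The sketch below makes that assumption explicitly. The counts used (12 events in 100 years) are hypothetical, and the function name prob_at_least_one is introduced only for illustration.

```python
import math


def prob_at_least_one(events, years_observed, horizon_years):
    """Probability of at least one event within the horizon.

    Assumes a Poisson process with a constant rate estimated from the record.
    """
    rate_per_year = events / years_observed
    # P(no event in the horizon) = exp(-rate * horizon) under the Poisson assumption
    return 1.0 - math.exp(-rate_per_year * horizon_years)


# Hypothetical record: 12 earthquakes observed over 100 years
p = prob_at_least_one(events=12, years_observed=100, horizon_years=10)
print(f"Estimated probability of at least one earthquake in 10 years: {p:.3f}")
```

The constant-rate assumption is a simplification. Real seismic hazard models also account for features such as aftershock clustering, which this sketch ignores.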
Statistics in Engineering: Engineering relies on statistics for quality control, reliability testing, and
optimization of processes. It helps engineers design systems that meet performance and safety
standards.
Applications in Engineering:
1. Quality Control: Statistical Process Control (SPC) ensures that products meet quality standards.
➢ Example: Monitoring the diameter of manufactured parts to ensure they fall within specified
tolerances.
2. Reliability Engineering: Predicting the lifespan of components and systems.
➢ Example: Calculating the failure rate of electronic components to design more reliable systems.
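The quality-control example above can be sketched as a simple three-sigma check. The diameters below are hypothetical, and the limits are computed from the sample mean and sample standard deviation, which is a simplification: control charts used in practice usually estimate the process spread from subgroups or moving ranges.

```python
import statistics

# Hypothetical diameters (mm) measured while the process was running normally
diameters_mm = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]

center = statistics.mean(diameters_mm)
sigma = statistics.stdev(diameters_mm)  # sample standard deviation
lower_limit = center - 3 * sigma
upper_limit = center + 3 * sigma


def out_of_control(values, lower, upper):
    """Return the measurements that fall outside the control limits."""
    return [v for v in values if v < lower or v > upper]


# Hypothetical new measurements to check against the limits
new_parts = [10.01, 10.08, 9.95]
flagged = out_of_control(new_parts, lower_limit, upper_limit)

print(f"centre line: {center:.3f} mm")
print(f"control limits: {lower_limit:.3f} mm to {upper_limit:.3f} mm")
print(f"flagged parts: {flagged}")
```

Control limits describe what the process normally produces. They are not the same as the specified tolerances, which describe what the customer will accept.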
Statistics in Biological Sciences and Medicine: Statistics is indispensable in biology and medicine
for designing experiments, analyzing biological data, and evaluating treatments' effectiveness.
Applications:
1. Clinical Trials: Statistical methods evaluate the effectiveness of new drugs or treatments.
➢ Example: A randomized controlled trial uses statistics to determine whether a new cancer drug
is more effective than existing treatments.
2. Epidemiology: Studying the distribution and determinants of diseases in populations.
➢ Example: Analyzing data from COVID-19 cases to predict infection rates and outcomes.
3. Genetics: Analyzing genetic data to understand hereditary diseases.
➢ Example: Statistical models identify genes associated with diseases like diabetes.
Statistics in Economics: Economists use statistics to analyze economic data, forecast trends, and
evaluate policies. It helps in understanding market behavior and economic indicators.
Applications:
1. Economic Forecasting: Predicting future economic trends based on historical data.
➢ Example: Using time series analysis to forecast inflation rates.
2. Policy Evaluation: Assessing the impact of government policies on the economy.
➢ Example: Analyzing the effects of tax cuts on consumer spending.
3. Market Analysis: Understanding consumer behavior and market dynamics.
➢ Example: Analyzing survey data to understand demand for new products.
Statistics in Banking and Finance: In banking and finance, statistics helps in risk assessment,
investment analysis, and portfolio management. It supports decision-making and financial planning.
Applications in Banking and Finance:
1. Risk Management: Identifying and quantifying risks associated with investments.
➢ Example: Value-at-Risk (VaR) models estimate the potential loss in a portfolio.
2. Credit Scoring: Assessing the creditworthiness of borrowers using statistical models.
➢ Example: Banks use logistic regression to predict the likelihood of loan default.
3. Stock Market Analysis: Predicting stock prices and market trends.
➢ Example: Statistical arbitrage strategies in quantitative trading.
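One simple way to obtain a Value-at-Risk figure is historical simulation: collect past returns and read off a low percentile. The sketch below assumes a 95% confidence level, a hypothetical list of 20 daily returns, and a hypothetical portfolio value. It follows one of several possible percentile conventions and is not a production risk model.

```python
import statistics

# Hypothetical daily portfolio returns (as fractions), for illustration only
daily_returns = [
    0.012, -0.008, 0.005, -0.021, 0.009, 0.003, -0.015, 0.007, -0.002, 0.011,
    -0.030, 0.004, 0.006, -0.010, 0.002, 0.008, -0.004, 0.001, -0.018, 0.010,
]


def historical_var_95(returns):
    """Return the 95% one-period VaR as a positive fraction of portfolio value."""
    # quantiles(..., n=20) returns 19 cut points; the first is the 5th percentile
    fifth_percentile = statistics.quantiles(returns, n=20)[0]
    return -fifth_percentile


portfolio_value = 1_000_000  # hypothetical, in any currency unit
var_fraction = historical_var_95(daily_returns)

print(f"95% one-day VaR: {var_fraction:.4%} of the portfolio")
print(f"That is about {var_fraction * portfolio_value:,.0f} currency units")
```

Read this way, the figure says that on roughly 1 day in 20 the loss is expected to be at least this large, provided the future resembles the historical sample.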
Statistics in Accounting: In accounting, statistics aids in auditing, cost control, and financial analysis.
It ensures accuracy and compliance with financial regulations.
Applications in Accounting:
1. Auditing: Statistical sampling techniques select a representative sample of transactions for
auditing.
➢ Example: Auditors use random sampling to verify financial statements.
2. Cost Analysis: Estimating costs and identifying cost-saving opportunities.
➢ Example: Analyzing production costs to identify inefficiencies.
3. Forecasting: Predicting future financial performance.
➢ Example: Using regression analysis to forecast revenue.
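The revenue-forecasting example above can be sketched with a straight-line (least squares) fit. The revenue figures are hypothetical and the function name fit_line is introduced only for illustration. A linear trend is itself an assumption that should be checked against the data before any forecast is trusted.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical annual revenue (in thousands) for years 1 to 5
years = [1, 2, 3, 4, 5]
revenue = [102, 108, 121, 129, 140]

slope, intercept = fit_line(years, revenue)
forecast_year_6 = intercept + slope * 6

print(f"fitted line: revenue = {intercept:.1f} + {slope:.1f} * year")
print(f"forecast for year 6: {forecast_year_6:.1f}")
```

The slope estimates the average change in revenue per year, and the forecast simply extends the fitted line one year beyond the observed data.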
Statistics in Agricultural Sciences: In agriculture, statistics is used to improve crop yields, optimize
resource use, and assess the impact of environmental factors.
Applications:
1. Experimental Design: Designing experiments to test the effects of fertilizers or pesticides.
➢ Example: Comparing crop yields under different irrigation methods using analysis of variance
(ANOVA).
2. Yield Prediction: Predicting crop yields based on weather and soil data.
➢ Example: Using regression models to forecast wheat production.
3. Resource Optimization: Allocating resources efficiently to maximize output.
➢ Example: Optimizing fertilizer use to reduce costs and increase yields.
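The ANOVA example above compares the variation between groups with the variation within groups. The sketch below computes the one-way ANOVA F statistic by hand for three irrigation methods. The yield figures and method names are hypothetical, and the function name one_way_anova_f is introduced only for illustration.

```python
import statistics

# Hypothetical crop yields (tonnes per hectare) under three irrigation methods
yields = {
    "drip": [20, 22, 24],
    "sprinkler": [25, 27, 29],
    "flood": [30, 32, 34],
}


def one_way_anova_f(groups):
    """Return the F statistic (between-group over within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - statistics.mean(group)) ** 2 for group in groups for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


f_stat = one_way_anova_f(list(yields.values()))
print(f"F statistic: {f_stat:.2f}")
```

A large F value suggests that the group means differ by more than chance alone would explain. Converting F into a p-value requires the F distribution, which is available in statistical tables or in libraries such as SciPy (scipy.stats.f_oneway).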
Statistics in Food Science: In food science, statistics helps in quality control, sensory evaluation, and
product development. It ensures food safety and compliance with standards.
Applications in Food Science:
1. Quality Control: Monitoring the quality of food products during production.
➢ Example: Using control charts to monitor the weight of packaged products.
2. Sensory Evaluation: Analyzing consumer preferences and product acceptability.
➢ Example: Conducting taste tests to compare different flavors of a product.
3. Shelf Life Studies: Estimating the shelf life of food products.
➢ Example: Analyzing microbial growth data to determine expiration dates.
Descriptive statistics is a branch of statistics that focuses on summarizing, organizing, and analyzing
data to provide meaningful insights. It involves methods that describe and visualize data through
numerical measures, tables, and graphical representations. Unlike inferential statistics, which makes
predictions or inferences about a population based on a sample, descriptive statistics only describes
the data at hand. This course will equip students with the fundamental tools needed to explore and
summarize datasets effectively.
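As a small illustration of the numerical measures and tables mentioned above, the sketch below summarizes a list of exam scores using Python's standard library. The scores are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical exam scores for ten students
scores = [55, 62, 62, 70, 48, 62, 75, 70, 81, 55]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

# Frequency table: how many times each score occurs
frequency = Counter(scores)

for name, value in summary.items():
    print(f"{name:>6}: {value}")

print("score  frequency")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]}")
```

Every quantity here describes only the ten scores at hand. No claim is made about a wider population, which is what separates descriptive from inferential statistics.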
DATA
Today, we work with laboratory equipment that continuously produces large volumes of
high-dimensional data. However, without an understanding of statistics, that is, the
techniques required to analyse, summarize and interpret these data, we are very limited in what
we can learn from our observations, which in turn hinders our ability to move forward in our
research, decision making and planning. Even with experiments that generate very little data,
there is a need to simulate phenomena by modeling the behaviour of systems and their
parameters, which again often needs to be done statistically. It is therefore imperative that we
understand some basic concepts of statistics in our field. Knowledge of statistics in various
fields assists in the following ways.
i) It enables you to read and understand the various statistical studies performed in
your field. To have this understanding, you must be knowledgeable about the
vocabulary, symbols, concepts, and statistical procedures used in these studies.
ii) It allows you to conduct research in your field, since statistical procedures are basic
to research. To accomplish this, you must be able to design experiments; collect,
organize, analyse, and summarize data; and possibly make reliable predictions or
forecasts for future use. You must also be able to communicate the results of the
study in your own words.
iii) It helps you become a better consumer and citizen. For example, you can make
intelligent decisions about what products to purchase based on consumer studies,
about government spending based on utilization studies, and so on.
Statistical data are the basic raw materials for statistical investigation. In Statistics,
information is essentially referred to as data. In everything we do, we seek information to guide
us. In fact, the activities we embark upon today will provide information to guide us better in
executing similar activities in the future. Gathering information may be formal or informal.
Formal gathering of information involves documented information, in which everything that
has been observed in the past or is being observed currently is expected to be kept in its
original (or raw) form. Informal gathering of information involves information about past
experiences that were not immediately captured. It may not always provide the level of
information that complete retrieval under the formal method provides.
DATA: Data are the values (measurements or observations) that a variable such as age, weight, height,
exam score, or shoe size can assume. Biological data, in particular, are data or measurements
collected from biological sources. They are often stored or exchanged in digital form, commonly
in files or databases.
Examples of biological data include:
▪ Sequences: DNA, RNA, Protein
▪ Structures of biological molecules
▪ Gene expression profiles
▪ Biochemical pathways
▪ Chromosomal mapping
▪ Phylogenetic data
▪ Single Nucleotide Polymorphisms (SNPs), etc.
The challenge thus lies in using statistical methods to analyse such biological data and to draw
meaningful inferences for immediate and future use.
DATA COLLECTION
There are two main sources of data in statistics, namely:
- Primary source
- Secondary source
➢ Primary source of data
Data from a primary source are obtained directly from the units under study. Primary
sources provide data compiled from a complete count of the population, or from a sample of
the population where the population is too large for a complete count. Primary data can be
collected by any of the following:
i. Direct personal observations (e.g. Laboratory experiments)
ii. Personal interview
iii. Mailed questionnaire
iv. Questionnaires administered by enumerators
v. Direct interview by people
Advantages of Primary source of data
i. It supplies exact information
ii. It gives more reliable data than the secondary source
iii. It gives more detailed data than the secondary source
Disadvantages of Primary source of data
i. It is very expensive
ii. It takes time
iii. It may involve large non-responses.
Advantages of secondary source of data
i. It gives quicker information than the primary source
ii. It is more timely than the primary source
iii. It is not as expensive as the primary source
Disadvantages of secondary source of data
i. It gives less information than the primary source
ii. Its scope may be wider or narrower than the objectives of the research
iii. It may not give information as detailed as the primary source
NOTE
▪ VARIATE (VARIABLE): A variate is any quantity or attribute whose value varies from one
unit of investigation to another. In other words, a variable is a characteristic or attribute that
can assume different values. Variables whose values are determined by chance are called
random variables.
▪ OBSERVATION: An observation is the value taken by a variate or variable for a particular
unit of investigation.
SCALES OF MEASUREMENT
Scales of measurement, also known as levels of measurement or measurement scales, are a
fundamental concept in statistics that categorize the types of data based on their nature and
characteristics. These scales help researchers and statisticians determine the appropriate statistical
techniques and operations that can be applied to a particular set of data. There are four primary scales
of measurement:
1) NOMINAL: The nominal scale of measurement classifies data into mutually exclusive
(non-overlapping) categories in which no order or ranking can be imposed on the data. Data in
this category are qualitative. Examples include gender, marital status, colour of substance,
course of study, etc.
2) ORDINAL: The ordinal level of measurement classifies data into categories that can be ranked;
however, precise differences between the ranks do not exist. For instance, when people are
classified according to their shoe size (small, medium, or large), a large variation exists among
the individuals in each class. Other examples include class of degree and position in a
competition.
3) INTERVAL: Interval data are ordered, with equal and meaningful intervals between values, but
they lack a true zero point. Examples include temperature measured in Celsius or Fahrenheit and
IQ scores. Addition and subtraction are meaningful on interval data, but ratios are not: 20 °C is
10 degrees warmer than 10 °C, yet it is not "twice as hot". Means, standard deviations, and
parametric statistical tests can be used.
4) RATIO: Ratio data have all the properties of interval data, but they also have a true zero point,
which indicates the absence of the measured quantity. Examples include height, weight, age, and
income. All arithmetic operations (addition, subtraction, multiplication, division) are meaningful
with ratio data. You can use means, standard deviations, and a wide range of statistical techniques.
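The four scales above differ in which summaries and operations make sense. The sketch below illustrates this with small hypothetical datasets: a mode for nominal data, a median rank for ordinal data, and a comparison showing that ratios change with the unit on an interval scale but not on a ratio scale. All values and variable names are assumptions made for illustration.

```python
import statistics

# Nominal: categories with no order, so the mode is the natural summary
marital_status = ["single", "married", "single", "divorced", "single"]
most_common_status = statistics.mode(marital_status)

# Ordinal: categories can be ranked, so a median category is meaningful
degree_order = ["Pass", "Third", "Second Lower", "Second Upper", "First"]
degrees = ["Third", "Second Lower", "First", "Second Upper", "Second Lower"]
ranks = [degree_order.index(d) for d in degrees]
median_degree = degree_order[statistics.median_low(ranks)]


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


# Interval: differences are meaningful, but ratios depend on the unit chosen
temps_c = [10.0, 20.0]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]
ratio_in_celsius = temps_c[1] / temps_c[0]
ratio_in_fahrenheit = temps_f[1] / temps_f[0]

# Ratio: a true zero makes ratios the same in any unit
weights_kg = [40.0, 80.0]
weights_lb = [w * 2.20462 for w in weights_kg]
ratio_in_kg = weights_kg[1] / weights_kg[0]
ratio_in_lb = weights_lb[1] / weights_lb[0]

print(f"nominal mode: {most_common_status}")
print(f"ordinal median: {median_degree}")
print(f"temperature ratio: {ratio_in_celsius:.2f} (C) vs {ratio_in_fahrenheit:.2f} (F)")
print(f"weight ratio: {ratio_in_kg:.2f} (kg) vs {ratio_in_lb:.2f} (lb)")
```

Because the temperature ratio changes when the unit changes, statements such as "twice as hot" are not meaningful on an interval scale, whereas "twice as heavy" holds in any unit of weight.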