Module 001 Basic Statistical Concept
Basic Statistical Concept
Introduction
In today’s technologically advanced world, we have access to large volumes of data. The first
step of data analysis is to accurately summarize all of this data, both graphically and
numerically, so that we can understand what the data reveals. Using and interpreting
data correctly is essential for making informed decisions. For instance, when you see an
opinion survey about a certain TV program, you may be interested in the proportion of
respondents who actually like the program.
This is an example of an application of statistics.
What is Statistics?
According to the International Encyclopedia of Statistical Science, Statistics is the
study of how to collect, organize, analyze, and interpret numerical information from
data. It is both the science of uncertainty and the technology of extracting information
from data.
It is a particularly useful branch of mathematics that is not only studied theoretically
by advanced mathematicians but one that is used by researchers in many fields to
organize, analyze and summarize data. Statistical methods and analyses are often used
to communicate research findings and to support hypotheses and give credibility to
research methodology and conclusions.
Statistics is a branch of science that deals with the collection, organization, and analysis of
data and with drawing inferences from samples about the whole population. This
requires a proper study design, an appropriate selection of the study sample, and the
choice of a suitable statistical test. Adequate knowledge of statistics is necessary for
properly designing an epidemiological study or a clinical trial. Improper statistical
methods may result in erroneous conclusions, which may lead to unethical practice.
Variables
A variable is a characteristic that varies from one individual member of a population to
another. Variables such as height and weight are measured on some type of scale,
convey quantitative information, and are called quantitative variables.
Sex and eye color give qualitative information and are called qualitative variables.
Quantitative Variables
Quantitative or numerical data are numerical measurements that arise from a
natural numerical scale. They represent measurable quantities, and the values these
variables can take can be ordered in a logical or natural way. Examples are:
Size of shoes,
Price of houses,
Number of semesters studied, and
Weight of a person
Quantitative variables are subdivided into discrete and continuous
measurements:
1. Discrete numerical data are recorded as whole numbers (integers) such as 0, 1, 2, 3, …
Observations that can be counted constitute discrete data.
Example:
Number of episodes of respiratory arrests or the number of re-intubations in
an intensive care unit
2. Continuous data can assume any value within a range, and observations that can be
measured constitute continuous data.
Example:
The serial serum glucose levels, partial pressure of oxygen in arterial blood
and the esophageal temperature.
Qualitative Variables
Qualitative data are measurements for which there is no natural numerical scale,
but which consist of attributes, labels, or other non-numerical characteristics. These
are variables that cannot be ordered in a logical or natural way. For example:
The color of the eye,
The name of a political party, and
The type of transport used to travel to work.
These are all qualitative variables. There is no reason to list blue eyes before
brown eyes (or vice versa), nor does it make sense to list buses before trains (or vice
versa).
Quantitative Method
The extent to which the observations cluster around a central location is described
by the central tendency, and the spread towards the extremes is described by the
degree of dispersion.
The mean is the sum of all the values in a set divided by the number of values:
$$\mu = \frac{a_1 + a_2 + \dots + a_n}{n}$$
where:
μ is the population mean (or x̄ is the sample mean),
n is the total number of items in a set, and
aᵢ is each element in a set.
Example:
Given the set of values {1, 2, 4, 7}, we substitute the values into the formula:
$$\mu = \frac{1 + 2 + 4 + 7}{4} = \frac{14}{4} = 3.5$$
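The mean formula and the worked example above can be expressed as a short Python function. This is a minimal sketch: the function name `mean` is our own choice, and it assumes the input is a non-empty list of numbers.

```python
def mean(values):
    """Arithmetic mean: (a1 + a2 + ... + an) / n."""
    if len(values) == 0:
        # The mean of an empty set is undefined, so refuse to divide by zero.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Reproduces the worked example: (1 + 2 + 4 + 7) / 4
print(mean([1, 2, 4, 7]))  # 3.5
```

Python's standard library also provides `statistics.mean`, which gives the same result for this set.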
The median is the middle value of a distribution when the data are ranked (half of the
values in the sample lie above the median and half below it). If the number of
values in a set is even, the median is the sum of the two middle values divided
by two (2).
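The rank-then-pick procedure for the median can be sketched in Python. The function name `median` is our own, and the sketch assumes a non-empty list of numbers; Python's `statistics.median` follows the same even/odd rule.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)  # rank the data first
    n = len(ranked)
    if n == 0:
        raise ValueError("median() requires at least one value")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ranked[middle]
    # Even count: sum of the two middle values divided by two.
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median([7, 1, 4]))     # 4
print(median([1, 2, 4, 7]))  # 3.0
```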
The mode is the most frequently occurring value in a distribution. A set can have more
than one mode.
Unimodal – A set that has only one mode
Bimodal – A set with two modes
Multimodal – A set with three or more modes
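Counting how often each value occurs is enough to find the mode(s) and apply the three labels above. This is a sketch under one stated assumption: every value tied for the highest frequency is treated as a mode, including the case where every value appears exactly once, which some texts instead describe as having no mode. The function names `modes` and `modality` are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("modes() requires at least one value")
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


def modality(values):
    """Label a set as unimodal, bimodal or multimodal by its number of modes."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


print(modes([1, 2, 2, 3]), modality([1, 2, 2, 3]))              # [2] unimodal
print(modes([1, 2, 2, 3, 3]), modality([1, 2, 2, 3, 3]))        # [2, 3] bimodal
print(modes([1, 1, 2, 2, 3, 3]), modality([1, 1, 2, 2, 3, 3]))  # [1, 2, 3] multimodal
```

Because it only counts occurrences, the same sketch also works for qualitative data such as eye colors.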
Variance is a measure of how spread out a distribution is. It indicates how closely
the individual observations cluster about the mean value. The variance of a
population is defined by the following formula:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
where:
σ² is the population variance,
μ is the population mean,
Xᵢ is the i-th element from the population, and
N is the number of elements in the population.
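As a worked illustration, treat the four values {1, 2, 4, 7} from the mean example as a complete population, so that μ = 3.5 and N = 4. Substituting into the formula gives:

$$\sigma^2 = \frac{(1-3.5)^2 + (2-3.5)^2 + (4-3.5)^2 + (7-3.5)^2}{4} = \frac{6.25 + 2.25 + 0.25 + 12.25}{4} = \frac{21}{4} = 5.25$$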
The variance of a sample is defined by the following formula:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
where:
s² is the sample variance,
x̄ is the sample mean,
xᵢ is the i-th element from the sample, and
n is the number of elements in the sample.
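The only difference between the population and sample formulas is the denominator. The sketch below makes that explicit for the set {1, 2, 4, 7}, whose squared deviations sum to 21. The function names are our own; Python's `statistics.pvariance` and `statistics.variance` are the standard-library counterparts and are printed as a cross-check.

```python
import statistics


def population_variance(values):
    """sigma^2 = sum((X_i - mu)^2) / N, treating the values as the whole population."""
    n = len(values)
    if n == 0:
        raise ValueError("population_variance() requires at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1), treating the values as a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance() requires at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 2, 4, 7]
print(population_variance(data))  # 5.25 (21 divided by N = 4)
print(sample_variance(data))      # 7.0  (21 divided by n - 1 = 3)

# Cross-check against the standard library's equivalents.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n − 1 gives a slightly larger value than dividing by N, which matters most for small samples.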
The formula for the variance of a population has the value ‘N’ as the denominator, whereas
the sample variance divides by ‘n−1’. The expression ‘n−1’ is known as the degrees of freedom
and is one less than the number of observations in the sample. Once the sample mean is fixed,
every observation is free to vary except the last one, whose value is then determined.
The variance is measured in squared units. To make the interpretation
of the data simple and to retain the basic unit of observation, the square root of
the variance is used. The square root of the variance is the standard deviation (SD). The
SD of a sample is defined by the following formula:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
where:
s is the sample SD,
x̄ is the sample mean,
xᵢ is the i-th element from the sample, and
n is the number of elements in the sample.
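Taking the square root brings the spread back to the original units. A minimal sketch of the sample SD formula, assuming a list of at least two numbers; the name `sample_sd` is hypothetical, while `statistics.stdev` is the standard-library equivalent (and `statistics.pstdev` the population version).

```python
import math
import statistics


def sample_sd(values):
    """s = sqrt(sum((x_i - x_bar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_sd() requires at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [1, 2, 4, 7]
s = sample_sd(data)
print(round(s, 4))             # 2.6458, the square root of the sample variance 7
print(statistics.stdev(data))  # standard-library equivalent
```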
Population vs Sample
We begin with a simple example. There are millions of passenger automobiles in the
United States. What is their average value? It is obviously impractical to attempt to
solve this problem directly by assessing the value of every single car in the country,
adding up all those numbers, and then dividing by however many numbers there are.
Instead, the best we can do would be to estimate the average. One natural way to do
so would be to randomly select some cars, say 200 of them, ascertain the value of each
of those cars, and find the average of those 200 numbers.
The set of all those millions of vehicles is called the population of interest, the
value attached to each one is a measurement, and the average value is a parameter. The
set of 200 cars selected from the population is called a sample, and the 200 numbers,
the monetary values of the cars we selected, are the sample data. The average of
the sample data is a statistic.
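The car example can be simulated in a few lines. This is a sketch under stated assumptions: the 100,000 "car values" are randomly generated numbers standing in for a real population (they are not real data), and the uniform range of 2,000 to 60,000 is arbitrary. The point is only to show the difference between a parameter, computed from every member of the population, and a statistic, computed from a random sample of 200. `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics

random.seed(2018)  # fixed seed so the illustration is reproducible

# Hypothetical population: 100,000 simulated car values (not real data).
population = [random.uniform(2_000, 60_000) for _ in range(100_000)]

# Parameter: the average over every member of the population.
parameter = statistics.fmean(population)

# Sample: 200 cars drawn at random without replacement.
sample = random.sample(population, 200)

# Statistic: the average of the 200 sample values, used to estimate the parameter.
statistic = statistics.fmean(sample)

print(f"population mean (parameter): {parameter:,.2f}")
print(f"sample mean (statistic):     {statistic:,.2f}")
```

In practice the parameter is unknown; it is computed here only because the simulated population is small enough to enumerate.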
Population
In simple terms, a population is the aggregate of all elements under study that share
one or more common characteristics; for example, all people living in India constitute
a population. A population is not confined to people only; it may also include
animals, events, objects, buildings, etc. It can be of any size, and the number of
elements or members in a population is known as the population size; for example, if there are
a hundred million people in India, then the population size (N) is 100 million. The
different types of population are discussed below:
1. Finite Population: When the number of elements of the population is fixed,
making it possible to enumerate it in its entirety, the population is said to be
finite.
Examples
The population of all workers working in the sugar factory
The population of motorcycles produced by a particular company
The population of mosquitoes in a town
The population of taxpayers in India
Sample
By the term sample, we mean a part of the population chosen at random for participation
in the study. The sample so selected should represent the population
in all its characteristics and should be free from bias, so as to produce a miniature
cross-section of the population, because the sample observations are used to make
generalizations about the population.
In other words, the respondents selected from the population constitute a ‘sample’, and
the process of selecting respondents is known as ‘sampling.’ The units under study
are called sampling units, and the number of units in a sample is called the sample size.
While conducting statistical testing, samples are mainly used when the population is
too large to include all of its members in the study.
The difference between population and sample can be drawn clearly on the following
grounds:
The population consists of each and every element of the entire group. On the other
hand, only a subset of the items of the population is included in a sample.
A characteristic of the population based on all units is called a parameter, while a
measure computed from sample observations is called a statistic.
When information is collected from all units of the population, the process is known as a
census or complete enumeration. Conversely, a sample survey is conducted to
gather information from the sample using a sampling method.
With a population, the focus is on identifying the characteristics of the elements, whereas
with a sample, the focus is on making generalizations about the characteristics of the
population from which the sample came.
Importance of Statistics
In general, statistics can be defined as a branch of applied research that is
concerned with the development and application of methods for collecting, organizing,
presenting, analyzing and interpreting quantitative data in such a way that the
reliability of conclusions based on data may be evaluated in terms of probability
statements. It can be used in diverse fields of study; some of the purposes of
statistics are as follows:
1. Presenting Facts in a Definite Form
We can represent things in their true form with the help of figures. Without
a statistical study, our ideas would be vague and indefinite. Facts should be
given in a definite form. Results given in numbers are more convincing than
results expressed in qualitative terms. Statements such as "there is a lot of
unemployment in India" or "the population is increasing at a faster rate" are not
in definite form. Statements should be in definite form, such as "the population
in 2004 would be 15% more than in 1990."
2. Simplifying Complex Data
Because statistics are presented in a definite form, they also help condense
the data into a few important figures, so statistical methods present meaningful
information. In other words, statistics helps simplify complex data to
make it understandable. The data may be presented in the form
of a graph, a diagram, an average, coefficients, etc. For example, we
cannot know the overall price situation from the individual prices of all goods,
but we can know it if we have an index of the general level of prices.
3. Comparisons
5. Forecasting
Statistics is not only concerned with the above functions; it also helps predict
the future course of phenomena. We can make future policies on
the basis of estimates made with the help of statistics. We can predict the
demand for goods in 2005 if we know the population in 2004 and the past
growth rate of the population. Similarly, a businessman can exploit the
market situation successfully if he knows the trends in the
market. Statistics helps in shaping future policies.
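The population forecast described above can be sketched under a simplifying assumption: that the population grows at a constant compound annual rate estimated from two past observations. The figures below (100 million in 2002, 121 million in 2004) are hypothetical and chosen only to keep the arithmetic easy to follow, and the function names are illustrative. Turning a population projection into a demand forecast would need a further assumption, such as a per-person demand figure, which is not modeled here.

```python
def average_growth_rate(start_value, end_value, years):
    """Constant compound annual growth rate implied by two observations."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(value, rate, years=1):
    """Extend a value forward assuming the same compound growth rate continues."""
    return value * (1 + rate) ** years


# Hypothetical population figures, in millions.
pop_2002, pop_2004 = 100.0, 121.0
rate = average_growth_rate(pop_2002, pop_2004, years=2)
pop_2005 = project(pop_2004, rate, years=1)

print(f"estimated growth rate: {rate:.1%}")          # about 10.0%
print(f"projected 2005 population: {pop_2005:.1f}")  # about 133.1 million
```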
6. Policy Making
With the help of statistics we can frame favourable policies. How much food needs
to be imported in 2007? It depends on food production in 2007
and the demand for food in 2007. Without knowing these factors we cannot
estimate the amount of imports. On the basis of forecasts, the government forms
policies about food grains, housing, etc. But if the forecasts are not correct,
the whole setup will be affected.
7. It Enlarges Knowledge
Whipple rightly remarks that “Statistics enables one to enlarge his horizon”.
When a person goes through the various procedures of statistics, it widens his
knowledge, thinking, and reasoning power, and it helps him reach rational
conclusions.
8. To Measure Uncertainty
The future is uncertain, but statistics helps authorities in many areas make
better estimates by collecting and analyzing data from the past, so that
uncertainty can be reduced. To make a forecast, we also have to identify trends
in past behavior, for which we use techniques such as regression, interpolation
and time series analysis.
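Of the techniques named above, a straight-line trend fitted by least-squares regression is the simplest to show. The sketch below assumes the trend is linear in time, uses hypothetical data that lie exactly on the line y = 1 + 2x so the result is easy to check, and then extrapolates one period ahead. The function name `fit_linear_trend` is our own; real series rarely fit a line this neatly, and extrapolation assumes the past trend continues.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: average change in y per period
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical past observations for periods 1-4.
periods = [1, 2, 3, 4]
values = [3, 5, 7, 9]

a, b = fit_linear_trend(periods, values)
forecast = a + b * 5  # extrapolate the trend to period 5
print(a, b, forecast)  # 1.0 2.0 11.0
```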
Some of the major purposes of statistics are to help us understand and describe
phenomena in our world and to help us draw reliable conclusions about those
phenomena. Statistics plays an important role in every field of human activity.
4. Banking – Banks make use of statistics for a number of purposes. Bankers use
statistical approaches to estimate the number of depositors and their claims on a
given day.