
Scope and Limitations of Statistics

This document discusses the scope and limitations of statistics. It explains that statistics are now used widely in many fields like economics, business, science, and administration. Some key points:
• Statistics help formulate and evaluate economic plans, and express economic problems numerically.
• Statistics are useful for businesses in formulating policies and forecasting trends.
• Administrators rely on statistics for collecting information on policies.
• Statistical methods are used extensively in research fields like agriculture, health, and social sciences.
• Data collection methods include primary methods like surveys and observations, and secondary methods of using existing data.
Classification and tabulation of data are important for presentation.

Uploaded by Venkat Ramu
Copyright © All Rights Reserved

Scope and Limitations of Statistics
Dr. T N KAVITHA
Assistant Professor of Mathematics
SCSVMV
Here we discuss the importance, scope, and limitations of statistics.
The scope of statistics was limited in ancient times, as the government used statistics for the purpose of administration alone.
Gradually, the subject became more and more popular and its application became more extensive. There is now hardly any field of human activity where statistics are not used. Today, statistics are used by economists, businessmen, scientists, administrators, etc.
What is the Scope of Statistics?
All economic plans are formulated on the basis of statistical data. The success of a plan is also evaluated with the help of statistics.
Economic problems such as production, consumption, wages, prices, profits, unemployment, poverty, etc. can be expressed numerically.
Statistics are very useful to businessmen. They help businessmen formulate business policies and forecast future trends.
Efficient administration cannot be conceived without statistics. Since its origin, statistics has been used to collect information regarding military and fiscal policies.
Statistical methods are extensively used in every type of research work. Whether in agriculture, health, or social science, statistics help in carrying out different types of research.
Data Collection Methods
In statistics, data collection is the process of gathering information from all the relevant sources to find a solution to the research problem. It helps to evaluate the outcome of the problem. Data collection methods allow a person to arrive at an answer to the relevant question. Most organizations use data collection methods to make predictions about future probabilities and trends.
Data collection methods can be classified into two types, namely
• Primary Data Collection methods
• Secondary Data Collection methods
Primary data, or raw data, is information obtained directly from a first-hand source through experiments, surveys, or observations. Primary data collection methods are further classified into two types:
• Quantitative Data Collection Methods
• Qualitative Data Collection Methods
Quantitative Data Collection Methods
It is based on mathematical calculations using various formats such as close-ended questions and measures such as the mean, median or mode. This method is cheaper than qualitative data collection methods, and it can be applied in a short duration of time.
Qualitative Data Collection Methods
It does not involve any mathematical calculations. This method is closely associated with elements that are not quantifiable. Qualitative data collection methods include interviews, questionnaires, observations, case studies, etc.
There are several methods to collect primary data. They are:
• Observation Method
• Questionnaire Method
• Interview Method
Observation Method
The different types of observation are:
• Structured and unstructured observation
• Controlled and uncontrolled observation
• Participant, non-participant and disguised observation
Interview Method
This method collects data in the form of oral or verbal responses. It is carried out in two ways:
• Personal Interview – In this method, a person known as an interviewer asks the questions face to face to the other person. A personal interview can be structured or unstructured, and can take the form of a direct investigation, a focused conversation, etc.
• Telephonic Interview – In this method, an interviewer obtains information by contacting people on the telephone and asking the questions or their views orally.
Questionnaire Method
In this method, a set of questions is mailed to the respondents.
They should read, reply to and subsequently return the questionnaire.
The questions are printed in a definite order on the form.
A good survey should have the following features:
• Short and simple
• Should follow a logical sequence
• Provide adequate space for answers
• Avoid technical terms
• Should have a good physical appearance, such as the colour and quality of the paper, to attract the attention of the respondent
Schedules
This method is similar to the questionnaire method with a slight difference: enumerators are specially appointed for the purpose of filling in the schedules. The enumerator explains the aims and objects of the investigation and may remove misunderstandings, if any arise. Enumerators should be trained to perform their job with hard work and patience.
Secondary Data Collection Methods
Secondary data is data collected by someone other than the actual user. It means that the information is already available, and someone analyses it. Secondary data includes magazines, newspapers, books, journals, etc. It may be either published data or unpublished data. Sources of secondary data include:
• Government publications
• Public records
• Historical and statistical documents
• Business documents
• Technical and trade journals
• Diaries
• Letters
• Unpublished biographies, etc.
Classification & Tabulation
Classification is the grouping of related facts/data into different classes according to certain common characteristics.
Bases of data classification: there are four broad bases:
1. Geographical
2. Chronological or Temporal
3. Qualitative
4. Quantitative
Geographical classification, i.e. area-wise:
• Total population of India by states and by districts
• Number of deaths due to COVID-19 by countries
• Deaths in Tamil Nadu by districts
So area-wise classification may be by world, countries, states, districts, regions, etc.
Chronological or temporal classification, i.e. on the basis of time: year-wise, month-wise, week-wise, day-wise, time-wise, etc.
Qualitative classification, i.e. on the basis of some attributes or characters.
Example: people by place of residence, Rural-Urban, Male-Female, Illiterate-Literate, etc.

Sex      Urban   Rural
Boys     200     390
Girls    167     100
Quantitative classification: on the basis of quantitative class intervals.
For example, students of a college may be classified according to weight as follows:

Weight (lbs)      90-100  100-110  110-120  120-130  130-140  140-150  Total
No. of students   50      200      260      360      90       40       1000
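The quantitative classification above can be sketched in Python: raw values are sorted into equal class intervals and the frequency of each class is counted. This is a minimal sketch under stated assumptions: the function name `classify`, the convention that a class includes its lower limit but not its upper limit (so 100 falls in 100-110), and the twelve sample weights are all hypothetical, not the data behind the table.

```python
from collections import Counter


def classify(values, lower, width, n_classes):
    """Count values falling in equal class intervals starting at `lower`.

    A class includes its lower limit but not its upper limit, except that the
    last class also includes its upper limit so the maximum is not lost.
    """
    upper = lower + width * n_classes
    counts = Counter()
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} lies outside {lower}-{upper}")
        k = min(int((v - lower) // width), n_classes - 1)  # index of the class
        counts[k] += 1
    table = []
    for k in range(n_classes):
        start = lower + k * width
        table.append((f"{start}-{start + width}", counts[k]))
    return table


# Hypothetical raw weights in lbs (not the data behind the table above).
weights = [95, 102, 108, 113, 118, 119, 121, 125, 127, 129, 134, 141]
for interval, freq in classify(weights, lower=90, width=10, n_classes=6):
    print(f"{interval:>8}  {freq}")
print("Total", len(weights))
```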
Presentation of data refers to exhibiting data in an attractive and useful manner so that it can be easily interpreted. The three main forms of presentation of data are:
1. Textual presentation
2. Data tables or tabulation
3. Diagrammatic presentation
Tabulation is the systematic arrangement of statistical data in columns and rows.
It involves the orderly and systematic presentation of numerical data in a form designed to explain the problem under consideration.
Tabulation helps in drawing inferences from the statistical figures.
In general, tabulation is classified into two types: simple tabulation and complex tabulation.
Simple tabulation gives information regarding one or more independent questions.
Complex tabulation gives information regarding two or more mutually dependent questions.
Two-Way Table
This type of table gives information regarding two mutually dependent questions.
For example, a one-way table answers the question "How many millions of persons are in the divisions?". A two-way table further shows how many of them are female and how many male, by giving a column for each.
Three-Way Table
A three-way table gives information regarding three mutually dependent and inter-related questions.
For example, from a one-way table we get information about the population, and from a two-way table we get information about the number of males and females in the various divisions.
We can extend the same table to a three-way table by asking the question, “How many males and females are literate?”
Thus the collected statistical data will show the following three mutually dependent and inter-related questions:
1. Population in the various divisions.
2. Their sex-wise distribution.
3. Their literacy status.
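A one-, two- or three-way table is essentially a count of records over one, two or three characteristics. The sketch below assumes a handful of hypothetical survey records (division, sex, literacy) and uses Python's `collections.Counter` to tabulate them; the division names and counts are illustrative only.

```python
from collections import Counter

# Hypothetical survey records: (division, sex, literate?)
records = [
    ("North", "Male", True), ("North", "Female", True),
    ("North", "Female", False), ("South", "Male", False),
    ("South", "Male", True), ("South", "Female", True),
]

one_way = Counter(div for div, _, _ in records)                  # population by division
two_way = Counter((div, sex) for div, sex, _ in records)         # division x sex
three_way = Counter((div, sex, lit) for div, sex, lit in records)  # division x sex x literacy

divisions = sorted(one_way)
print("Division  Male  Female  Total")
for d in divisions:
    print(f"{d:<8}  {two_way[(d, 'Male')]:>4}  {two_way[(d, 'Female')]:>6}  {one_way[d]:>5}")

print()
print("Division  Sex     Literate  Illiterate")
for d in divisions:
    for s in ("Male", "Female"):
        print(f"{d:<8}  {s:<6}  {three_way[(d, s, True)]:>8}  {three_way[(d, s, False)]:>10}")
```

A `Counter` returns 0 for a combination that never occurs, so empty cells of the table print as 0 rather than raising an error.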
A table has the following parts:
• Table number
• Title
• Headnotes
• Stubs
• Caption
• Body or field
• Footnotes
• Source
Table Number: Each table should have a specific table number for ease of access and locating. This number can be cited anywhere as a reference that leads the reader directly to the data in that particular table.
Title: A table must contain a title that clearly tells
the readers about the data it contains, time period
of study, place of study and the nature of
classification of data.
Headnotes: A headnote supplements the title and displays more information about the table. Generally, headnotes present the units of data in brackets at the end of a table title.
Stubs: These are the titles of the rows in a table. Thus a stub displays information about the data contained in a particular row.
Caption: A caption is the title of a column in the data table. In fact, it is the counterpart of a stub and indicates the information contained in a column.
Body or field: The body of a table is the content
of a table in its entirety. Each item in a body is
known as a ‘cell’.
Footnotes: Footnotes are rarely used. In effect,
they supplement the title of a table if required.
Source: When using data obtained from a
secondary source, this source has to be
mentioned below the footnote.
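To make the parts of a table concrete, here is a small sketch that lays out a plain-text table with a table number, title, headnote, captions, stubs, body, and an optional footnote and source. The function `render_table` is hypothetical, and the title and headnote given to the Boys/Girls data from the qualitative classification example are assumed for illustration.

```python
def render_table(number, title, headnote, captions, rows, footnote=None, source=None):
    """Return a plain-text table laid out with the parts of a table.

    captions: column titles (the first one heads the stub column)
    rows: list of (stub, [cell, cell, ...]) pairs forming the body
    """
    # Each grid line is [stub, cell, cell, ...]; the caption row comes first.
    grid = [list(captions)] + [[stub, *cells] for stub, cells in rows]
    # One width per column so that every column lines up uniformly.
    widths = [max(len(str(item)) for item in column) for column in zip(*grid)]
    lines = [f"Table {number}: {title} {headnote}"]
    for line in grid:
        lines.append("  ".join(str(item).ljust(w) for item, w in zip(line, widths)).rstrip())
    if footnote:
        lines.append(f"Note: {footnote}")
    if source:
        lines.append(f"Source: {source}")  # source goes below the footnote
    return "\n".join(lines)


print(render_table(
    number=1,
    title="Boys and girls by place of residence",  # assumed title
    headnote="(number of persons)",                 # assumed headnote
    captions=["Sex", "Urban", "Rural"],
    rows=[("Boys", [200, 390]), ("Girls", [167, 100])],
))
```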
There are many guidelines for constructing a good table. Some basic ones are:
The title should be in accordance with the objective of
study:
Comparison:
Alternative location of stubs:
Headings:
Footnote:
Size of columns:
Use of abbreviations:
Units:
The title should be in accordance with the
objective of study: The title of a table should
provide a quick insight into the table.
Comparison: If two rows or columns may need to be compared, they should be kept close to each other.
Alternative location of stubs: If the rows in a
data table are lengthy, then the stubs can be
placed on the right-hand side of the table.
Headings: Headings should be written in a
singular form. For example, ‘good’ must be used
instead of ‘goods’.
Footnote: A footnote should be given only if
needed.
Size of columns: Size of columns must be
uniform and symmetrical.
Use of abbreviations: Headings and sub-
headings should be free of abbreviations.
Units: There should be a clear specification of
units above the columns.
Advantages of tabulation:
1. A large mass of confusing data is easily reduced to a reasonable form that is understandable to the mind.
2. Once arranged in a suitable form, the data give the condition of the situation at a glance, i.e. a bird's-eye view.
3. From a table it is easy to draw reasonable conclusions or inferences.
4. Tables provide grounds for analysis of the data.
5. Errors and omissions, if any, are easily detected in tabulation.
Graphical Representation
Graphical representation is a way of analysing numerical data. It exhibits the relation between data, ideas, information and concepts in a diagram. It is easy to understand and is one of the most important learning strategies. The choice of graph always depends on the type of information in a particular domain.
There are different types of graphical
representation. Some of them are as follows:
Line Graphs
Bar Graphs
Histograms
Line Plot
Frequency Table
Circle Graph
Stem and Leaf Plot
Box and Whisker Plot
Line Graphs – A line graph (or linear graph) is used to display continuous data, and it is useful for predicting future events over time.
Bar Graphs – A bar graph is used to display categories of data; it compares the data using solid bars to represent the quantities.
Histograms – A graph that uses bars to represent the frequency of numerical data organised into intervals. Since all the intervals are equal and continuous, all the bars have the same width.
Line Plot – It shows the frequency of data on a given number line. An ‘x’ is placed above the number line each time that data value occurs.
Frequency Table – The table shows the number of pieces of data that fall within a given interval.
Circle Graph – Also known as a pie chart, it shows the relationship of the parts to the whole. The whole circle represents 100%, and each category occupies a sector proportional to its percentage, such as 15% or 56%.
Stem and Leaf Plot – In a stem and leaf plot, the data are organised from the least value to the greatest value. The digits of the least place value form the leaves, and the digits of the next place value form the stems.
Box and Whisker Plot – This plot summarises the data by dividing it into four parts. The box and whiskers show the range (spread) and the middle (median) of the data.
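Two of the displays described above are easy to compute directly. The sketch below builds a stem and leaf plot (tens digit as stem, units digit as leaf, which assumes non-negative whole numbers) and the five values a box and whisker plot draws: minimum, Q1, median, Q3 and maximum. It reuses the ten observations from the mean/median/mode exercise in this document. Quartiles are taken with `statistics.quantiles`, whose default method uses the (n + 1)p position; other quartile conventions give slightly different values.

```python
import statistics
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (tens digit and above) to its sorted leaves (units digit)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum: the values a box and whisker plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return min(data), q1, median, q3, max(data)


data = [25, 32, 28, 34, 24, 31, 36, 27, 29, 30]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))
print(five_number_summary(data))
```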
There are certain rules to effectively present the
information in the graphical representation. They
are:
Suitable Title: Make sure that the appropriate
title is given to the graph which indicates the
subject of the presentation.
Measurement Unit: Mention the
measurement unit in the graph.
Proper Scale: To represent the data in an accurate
manner, choose a proper scale.
Index: Index the appropriate colours, shades, lines,
design in the graphs for better understanding.
Data Sources: Include the source of information
wherever it is necessary at the bottom of the graph.
Keep it Simple: Construct a graph in an easy way
that everyone can understand.
Neat: Choose the correct size, fonts, colours, etc. so that the graph serves as a visual aid for the presentation of information.
Algebraic principles are applied to all types of graphical representation of data.
Graphs are drawn using two lines called coordinate axes.
The horizontal axis is denoted as the x-axis and the vertical axis as the y-axis.
The point at which the two lines intersect is called the origin ‘O’.
On the x-axis, distances from the origin to the right take positive values, and distances from the origin to the left take negative values.
Similarly, on the y-axis, points above the origin take positive values, and points below the origin take negative values.
Generally, a frequency distribution is represented graphically by the following methods:
• Histogram
• Smoothed frequency graph
• Pie diagram
• Cumulative or ogive frequency graph
• Frequency polygon
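The ogive and the frequency polygon are drawn from simple derived values: running totals against upper class limits, and frequencies against class mid-points. This sketch computes those points for the weight table in the quantitative classification example; it only produces the coordinates, and the actual plotting (for instance with a charting library) is left out.

```python
from itertools import accumulate

# Weight table from the quantitative classification example (lbs).
classes = [(90, 100), (100, 110), (110, 120), (120, 130), (130, 140), (140, 150)]
freqs = [50, 200, 260, 360, 90, 40]

# Less-than ogive: each upper class limit against the cumulative frequency.
less_than = list(zip((upper for _, upper in classes), accumulate(freqs)))

# Frequency polygon: each class mid-point against its frequency.
polygon = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]

print(less_than)
print(polygon)
```

In practice a frequency polygon is usually closed by adding zero-frequency mid-points one class below and one class above the data.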
Some of the merits of using graphs are as
follows:
The graph is easily understood by everyone
without any prior knowledge.
It saves time
It allows us to relate and compare the data for
different time periods
It is used in statistics to determine the mean,
median and mode for different data, as well as in
the interpolation and the extrapolation of data.
Some of the advantages of graphical
representation are:
It makes data more easily understandable.
It saves time.
It makes the comparison of data more efficient.
Measures of Central Tendency
• Mean - Average
• Median - Middle most value
• Mode - Most repeated value
$\text{Mean} = \dfrac{\sum x}{n}$

$\text{Mean} = \dfrac{\sum x}{n} = \dfrac{x_1 + x_2 + x_3 + x_4 + \cdots}{n}$

$\text{Mean} = A + \dfrac{\sum d}{n}$, where $d = x - A$ and $A$ is the assumed mean (individual observations)
Mode = the item which occurs the greatest number of times.
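The assumed-mean (short-cut) formula gives the same result as the direct formula for any choice of A, because the deviations d = x - A simply shift every value by a constant. A minimal sketch; the observations and the assumed mean A = 100 are hypothetical.

```python
def mean_direct(xs):
    """Mean = (sum of x) / n."""
    return sum(xs) / len(xs)


def mean_assumed(xs, a):
    """Short-cut method: Mean = A + (sum of d) / n, where d = x - A."""
    deviations = [x - a for x in xs]
    return a + sum(deviations) / len(xs)


# Hypothetical observations; A = 100 is an arbitrary assumed mean.
xs = [102, 105, 98, 110, 95]
print(mean_direct(xs), mean_assumed(xs, 100))
```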
Relation Between Mean, Median and Mode
If the value of the mode is equal to the value of the median and the mean, we call it a symmetrical data set. For moderately skewed (asymmetrical) data sets, there is an approximate empirical relationship between the three M's (mean, median and mode):
Mean - Mode = 3 (Mean - Median)
(OR)
Mode = Mean - 3 Mean + 3 Median
(OR)
Mode = 3 Median - 2 Mean
Find the A.M., median and mode of the following set of observations:
25, 32, 28, 34, 24, 31, 36, 27, 29, 30.
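A sketch of this exercise in Python. The mean and median follow directly; since no observation repeats, the data have no mode in the sense of a most repeated value, so this sketch falls back on the empirical relation Mode = 3 Median - 2 Mean. That fallback is an assumption: the relation is only an approximation meant for moderately skewed data.

```python
import statistics
from collections import Counter

data = [25, 32, 28, 34, 24, 31, 36, 27, 29, 30]

mean = sum(data) / len(data)      # arithmetic mean: sum of x over n
median = statistics.median(data)  # even n: average of the two middle values

counts = Counter(data)
top = max(counts.values())
if top > 1:
    modes = [x for x, c in counts.items() if c == top]
    print(f"Mean = {mean}, Median = {median}, Mode(s) = {modes}")
else:
    # No value repeats, so estimate the mode from Mode = 3 Median - 2 Mean.
    empirical_mode = 3 * median - 2 * mean
    print(f"Mean = {mean}, Median = {median}, Empirical mode = {empirical_mode:.1f}")
```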
Find the mean, median and mode for the following data:
Class (x)   Frequency (f)
0-10 20
10-20 5
20-30 3
30-40 8
40-50 10
50-60 35
60-70 10
70-80 4
80-90 3
90-100 2
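A sketch for the grouped table above, assuming the standard textbook interpolation formulas for continuous (exclusive) classes: Mean = Σfm/N with class mid-points m; Median = L + ((N/2 - cf)/f) × h; Mode = L + ((f1 - f0)/(2f1 - f0 - f2)) × h. Here L is the lower limit of the median or modal class, cf the cumulative frequency before the median class, f its frequency, f1 the modal-class frequency, f0 and f2 the frequencies of its neighbouring classes, and h the class width.

```python
from itertools import accumulate

# Class limits and frequencies from the table above.
lowers = list(range(0, 100, 10))
width = 10
freqs = [20, 5, 3, 8, 10, 35, 10, 4, 3, 2]
n = sum(freqs)

# Mean: sum(f * m) / N, using each class mid-point m.
mids = [lo + width / 2 for lo in lowers]
mean = sum(f * m for f, m in zip(freqs, mids)) / n

# Median: L + (N/2 - cf) / f * h for the first class whose cumulative frequency reaches N/2.
cum = list(accumulate(freqs))
k = next(i for i, c in enumerate(cum) if c >= n / 2)
cf_before = cum[k - 1] if k > 0 else 0
median = lowers[k] + (n / 2 - cf_before) / freqs[k] * width

# Mode: L + (f1 - f0) / (2 f1 - f0 - f2) * h for the class with the highest frequency.
m = freqs.index(max(freqs))
f0 = freqs[m - 1] if m > 0 else 0
f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
mode = lowers[m] + (freqs[m] - f0) / (2 * freqs[m] - f0 - f2) * width

print(f"Mean = {mean:.2f}, Median = {median:.2f}, Mode = {mode:.2f}")
```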
Measures of Dispersion
• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation
Range
Range = L - S
L - Largest value
S - Smallest value

$\text{Coefficient of Range} = \dfrac{L - S}{L + S}$
Quartile Deviation
Mean Deviation (Individual Observations)
Mean Deviation (Discrete Series)
Mean Deviation (Continuous Series)
Standard Deviation (Individual Observations)
Standard Deviation (Discrete Series)
Standard Deviation (Continuous Series)
Coefficient of Variation
Range
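The measures of dispersion listed above can be sketched for the ten observations from the earlier exercise. The formulas are the usual textbook ones, assumed here: quartile deviation (Q3 - Q1)/2, mean deviation about the mean, standard deviation with divisor n, and coefficient of variation (SD / Mean) × 100. Mean deviation is sometimes taken about the median, and standard deviation with divisor n - 1; either choice would change the numbers.

```python
import statistics

data = [25, 32, 28, 34, 24, 31, 36, 27, 29, 30]
n = len(data)
mean = sum(data) / n

# Range and coefficient of range
largest, smallest = max(data), min(data)
data_range = largest - smallest
coeff_range = (largest - smallest) / (largest + smallest)

# Quartile deviation = (Q3 - Q1) / 2, quartiles by the (n + 1)p positional rule
q1, _, q3 = statistics.quantiles(data, n=4)  # 'exclusive' method by default
quartile_dev = (q3 - q1) / 2

# Mean deviation about the mean = sum |x - mean| / n
mean_dev = sum(abs(x - mean) for x in data) / n

# Standard deviation with divisor n (population form) and coefficient of variation
sd = statistics.pstdev(data)
cv = sd / mean * 100

print(f"Range = {data_range}, Coefficient of range = {coeff_range:.2f}")
print(f"QD = {quartile_dev}, MD = {mean_dev:.2f}, SD = {sd:.4f}, CV = {cv:.2f}%")
```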
Geometric Mean, Weighted Arithmetic Mean & Harmonic Mean
A simple way to define the harmonic mean is as the reciprocal of the arithmetic mean of the reciprocals of the observations, i.e. $H.M. = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$. The most important condition for it is that none of the observations should be zero.
If all the observations taken by a variable are equal to a constant, say k, then the harmonic mean of the observations is also k.
For positive observations, the harmonic mean has the least value when compared to the geometric mean and the arithmetic mean:
A.M. ≥ G.M. ≥ H.M.
Merits:
• A harmonic mean is rigidly defined
• It is based upon all the observations
• It is not much affected by fluctuations of sampling
• More weight is given to smaller items
Demerits:
• Not easily understandable
• Difficult to compute
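A minimal sketch of the harmonic mean as defined above (n divided by the sum of reciprocals), with a check that A.M. ≥ G.M. ≥ H.M. on the ten observations from the earlier exercise. `harmonic_mean` here is a hand-written helper; the standard library's `statistics.geometric_mean` (Python 3.8+) supplies the G.M. for comparison.

```python
import statistics


def harmonic_mean(xs):
    """H.M. = n / sum(1/x): the reciprocal of the mean of the reciprocals."""
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    return len(xs) / sum(1 / x for x in xs)


data = [25, 32, 28, 34, 24, 31, 36, 27, 29, 30]
am = sum(data) / len(data)
gm = statistics.geometric_mean(data)
hm = harmonic_mean(data)
print(f"A.M. = {am:.4f} >= G.M. = {gm:.4f} >= H.M. = {hm:.4f}")
```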
The Geometric Mean is a special type of average
where we multiply the numbers together and
then take a square root (for two numbers), cube
root (for three numbers) etc.
Geometric mean = G.M. = $\left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$
A geometric mean is a mean or average which shows the central tendency of a set of numbers by using the product of their values. For a set of n observations, the geometric mean is the nth root of their product. The geometric mean G.M. for a set of numbers $x_1, x_2, \ldots, x_n$ is given as

$G.M. = (x_1 \cdot x_2 \cdots x_n)^{1/n}$

or $G.M. = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$.

The geometric mean of two numbers, say x and y, is the square root of their product x × y. For three numbers, it is the cube root of their product, i.e. $(xyz)^{1/3}$.
In order to make our calculation easy and less time-consuming, we use logarithms in the calculation of geometric means.
Since $G.M. = (x_1 \cdot x_2 \cdots x_n)^{1/n}$, taking logs on both sides, we have

$\log G.M. = \frac{1}{n} \log(x_1 \cdot x_2 \cdots x_n)$

or $\log G.M. = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n)$

or $\log G.M. = \frac{1}{n} \sum_{i=1}^{n} \log x_i$

or $G.M. = \text{Antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$.
For a grouped frequency distribution, the geometric mean G.M. is

$G.M. = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$, where $N = \sum_{i=1}^{n} f_i$.

Taking logarithms on both sides, we get

$\log G.M. = \frac{1}{N}(f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n) = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i$.
x   2   4   5   8
f   3   3   2   2
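Applying the log formula for grouped data to the x/f table above: a minimal sketch using base-10 logarithms, so that the antilog is 10 raised to the result. Any base gives the same G.M. as long as the antilog uses the same base.

```python
import math

x = [2, 4, 5, 8]
f = [3, 3, 2, 2]
N = sum(f)

# log G.M. = (1/N) * sum(f_i * log10 x_i); G.M. = antilog of that value
log_gm = sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / N
gm = 10 ** log_gm

print(f"log G.M. = {log_gm:.4f}, G.M. = {gm:.4f}")
```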
• The logarithm of geometric mean is the
arithmetic mean of the logarithms of given
values
• If all the observations assumed by a variable are equal to a constant, say K > 0, then the G.M. of the observations is also K
• The geometric mean of the ratio of two
variables is the ratio of the geometric means of
the two variables
• The geometric mean of the product of two
variables is the product of their geometric means
Suppose $G_1$ and $G_2$ are the geometric means of two series of sizes $n_1$ and $n_2$ respectively. The geometric mean G of the combined groups is:

$\log G = \dfrac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$

or $G = \text{Antilog}\left[\dfrac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right]$

In general, for k groups with sizes $n_i$ and geometric means $G_i$, i = 1 to k, we have

$G = \text{Antilog}\left[\dfrac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k}\right]$
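A quick way to check the combined G.M. formula is to pool two small series and compare. The two series below are hypothetical; `combined_gm` is an illustrative helper, and `statistics.geometric_mean` is used only for the per-group and pooled values.

```python
import math
import statistics


def combined_gm(groups):
    """groups: list of (n_i, G_i). G = antilog[sum(n_i log G_i) / sum(n_i)]."""
    total = sum(n for n, _ in groups)
    return 10 ** (sum(n * math.log10(g) for n, g in groups) / total)


# Hypothetical series, used only to check the formula against pooled data.
series1 = [2, 4, 8]
series2 = [3, 12]
g1 = statistics.geometric_mean(series1)
g2 = statistics.geometric_mean(series2)
G = combined_gm([(len(series1), g1), (len(series2), g2)])
print(G, statistics.geometric_mean(series1 + series2))
```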
Merits:
• A geometric mean is based upon all the observations
• It is rigidly defined
• It is not much affected by fluctuations of sampling
• It gives more weight to small items
Demerits:
• A geometric mean is not easily understandable by a non-mathematical person
• If any of the observations is zero, the geometric mean becomes zero
• If any of the observations is negative, the geometric mean may become imaginary
Weighted Mean
Weighted Mean is an average computed by
giving different weights to some of the
individual values. If all the weights are equal,
then the weighted mean is the same as the
arithmetic mean.
Question: Suppose that a marketing firm conducts a survey of
1,000 households to determine the average number of TVs each
household owns. The data show a large number of households
with two or three TVs and a smaller number with one or four.
Every household in the sample has at least one TV and no
household has more than four. Find the mean number of TVs per
household.
Number of TVs per household   1    2    3    4
Number of households           73   378  459  90
Solution:
As many of the values in this data set are repeated
multiple times, you can easily compute the sample
mean as a weighted mean. Follow these steps to
calculate the weighted arithmetic mean:
Step 1: Assign a weight to each value in the dataset:
• x1 = 1, w1 = 73
• x2 = 2, w2 = 378
• x3 = 3, w3 = 459
• x4 = 4, w4 = 90
Step 2: Compute the numerator of the weighted mean formula. Multiply each sample value by its weight and then add the products together:

$\sum_{i=1}^{4} w_i x_i = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 = (1)(73) + (2)(378) + (3)(459) + (4)(90) = 73 + 756 + 1377 + 360 = 2566$

Step 3: Compute the denominator of the weighted mean formula by adding the weights together:

$\sum_{i=1}^{4} w_i = w_1 + w_2 + w_3 + w_4 = 73 + 378 + 459 + 90 = 1000$

Step 4: Divide the numerator by the denominator:

$\dfrac{\sum_{i=1}^{4} w_i x_i}{\sum_{i=1}^{4} w_i} = \dfrac{2566}{1000} = 2.566$
The mean number of TVs per household in this
sample is 2.566.
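The four steps above reduce to a few lines of Python. A minimal sketch using the TV data from the example; `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


tvs = [1, 2, 3, 4]               # number of TVs per household
households = [73, 378, 459, 90]  # number of households (the weights)
print(weighted_mean(tvs, households))
```

With equal weights the function returns the ordinary arithmetic mean, which matches the remark at the start of this section.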
UNIT III
CORRELATION & REGRESSION

BY
DR. T N KAVITHA
ASSISTANT PROFESSOR OF MATHEMATICS
SCSVMV
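Regression:
$b_{yx} = \frac{5}{4} = 1.25$
$b_{xy} = \frac{9}{20} = 0.45$
$r = \pm\sqrt{b_{yx} \times b_{xy}} = \pm\sqrt{1.25 \times 0.45} = \pm\sqrt{0.5625} = \pm 0.75$
Since both regression coefficients are positive, r takes the positive sign: r = 0.75.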
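The calculation above can be written as a small helper. This sketch assumes the standard property that r has the same sign as the two regression coefficients (which must agree in sign); `Fraction` keeps 5/4 and 9/20 exact so that the square root comes out as exactly 0.75. The helper name `r_from_regression` is illustrative.

```python
import math
from fractions import Fraction


def r_from_regression(b_yx, b_xy):
    """r = sqrt(b_yx * b_xy), taking the common sign of the two regression coefficients."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


r = r_from_regression(Fraction(5, 4), Fraction(9, 20))
print(r)
```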
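Rank Correlation:
$\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the ranks of each pair and n is the number of pairs.
Rank correlation for repeated ranks:
$\rho = 1 - \dfrac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$
where $m_1, m_2, \ldots$ are the number of times a value is repeated.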
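A sketch of Spearman's rank correlation with the repeated-ranks correction, assuming the usual textbook form: tied values share the average of the ranks they occupy, and (m³ - m)/12 is added to Σd² for every value repeated m times, in either series. The data are hypothetical. Libraries such as SciPy instead compute Pearson's r on the average ranks, which can differ slightly when ties are present.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they occupy."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(values)
    # A value repeated m times from rank p occupies ranks p..p+m-1, whose mean is p + (m-1)/2.
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rank_correlation(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Hypothetical marks of four students in two subjects; two students tie in x.
x = [10, 20, 20, 30]
y = [1, 3, 2, 4]
print(spearman_rank_correlation(x, y))
```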
Measures of Skewness
The measures of central tendency and variation
do not reveal all the characteristics of a given set
of data.
For example, two distributions may have the
same mean and standard deviation but may differ
widely in the shape of their distribution.
Either the distribution of data is symmetrical or it
is not. If the distribution of data is not
symmetrical, it is called asymmetrical or skewed.
Thus skewness refers to the lack of symmetry in
distribution.
[Figure: symmetrical distribution (mean = median = mode); asymmetrical or skewed distribution]
Negative skewness
If the longer tail is towards the lower values or left-hand side, the skewness is negative. Negative skewness arises when the mean is decreased by some extremely low values, thus making mean < median < mode.
Positive skewness
If the longer tail of the distribution is towards the higher values or right-hand side, the skewness is positive. Positive skewness occurs when the mean is increased by some unusually high values, thereby making mean > median > mode.
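RELATIVE SKEWNESS
Karl Pearson's coefficient of skewness: $Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$ or, for a moderately skewed distribution, $Sk_P = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$.
Note
If the value of this coefficient is zero, the distribution is symmetrical (mean = median = mode).
If the value of the coefficient is positive, it is a positively skewed distribution (mean > median > mode).
If the value of the coefficient is negative, it is a negatively skewed distribution (mean < median < mode).
The value of this coefficient usually lies between ±1.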
When the distribution is open-ended, when extreme values are present in the data, or when positional measures such as the median and quartiles are given, the formula for the coefficient of skewness given by Bowley is more appropriate.
Absolute Measures of Skewness
Following are the absolute measures of skewness:
1. Skewness (Sk) = Mean - Median
2. Skewness (Sk) = Mean - Mode
3. Skewness (Sk) = (Q3 - Q2) - (Q2 - Q1)
To compare two series, we do not calculate these absolute measures; instead we calculate the relative measures, which are called coefficients of skewness.
Coefficients of skewness are pure numbers, independent of the units of measurement.
For a distribution, Karl Pearson’s coefficient of skewness is 0.64, the S.D. is 13 and the mean is 59.2. Find the mode and the median.
Karl Pearson’s coefficient of skewness is 1.28, the mean is 164 and the mode is 100. Find the standard deviation.
The following are the marks of 150 students in an examination. Calculate Karl Pearson’s coefficient of skewness.

Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of students   10    40     20     0      10     40     16     14
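A sketch of the three exercises, assuming Karl Pearson's coefficient in the form Sk = (Mean - Mode)/SD together with the empirical relation Mode = 3 Median - 2 Mean. The grouped data use class mid-points and the population (divisor N) standard deviation. The third distribution has two classes with the highest frequency (40), so its mode is ill-defined; the sketch therefore uses Sk = 3(Mean - Median)/SD, which is a judgment call rather than something stated in the exercise.

```python
import math
from itertools import accumulate

# Exercise 1: Sk = (Mean - Mode) / SD  =>  Mode = Mean - Sk * SD,
# and Mode = 3 Median - 2 Mean  =>  Median = (Mode + 2 Mean) / 3.
sk, sd, mean = 0.64, 13, 59.2
mode = mean - sk * sd
median = (mode + 2 * mean) / 3
print(f"Ex 1: Mode = {mode:.2f}, Median = {median:.2f}")

# Exercise 2: SD = (Mean - Mode) / Sk
sd2 = (164 - 100) / 1.28
print(f"Ex 2: SD = {sd2:.2f}")

# Exercise 3: grouped marks of 150 students (class width 10).
lowers = list(range(0, 80, 10))
freqs = [10, 40, 20, 0, 10, 40, 16, 14]
n, h = sum(freqs), 10
mids = [lo + h / 2 for lo in lowers]
mean3 = sum(f * m for f, m in zip(freqs, mids)) / n
sd3 = math.sqrt(sum(f * m * m for f, m in zip(freqs, mids)) / n - mean3 ** 2)
cum = list(accumulate(freqs))
k = next(i for i, c in enumerate(cum) if c >= n / 2)  # median class
cf_before = cum[k - 1] if k > 0 else 0
median3 = lowers[k] + (n / 2 - cf_before) / freqs[k] * h
# Two classes share the highest frequency, so use the median form of the coefficient.
sk3 = 3 * (mean3 - median3) / sd3
print(f"Ex 3: Mean = {mean3:.2f}, Median = {median3:.2f}, SD = {sd3:.2f}, Sk = {sk3:.3f}")
```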
