10. The general term for any unit that is measured in research
is called a variable.
Answer: T
Statistics
“Statistics is a way to get information from
data”
Data → Statistics → Information
Data: Facts, especially numerical facts, collected together for
information.
Information: Knowledge communicated concerning some particular fact
related to these data.
Definitions…
A variable is some characteristic of a
population or sample which is different for
different samples. (Quantitative
characteristics)
E.g. student grades, Student height
Independent variable: explanatory or causal variable
Latent variable: cannot be measured directly
I. Descriptive
II. Inferential
Types of data:-
1. Quantitative data
2. Qualitative data
3. Time series data (longitudinal
data)
Qualitative data
When a particular characteristic cannot be measured, but can be
expressed as a frequency.
Non-numerical
Ex: gender, religion.
Enumeration data
Nominal and Ordinal
Quantitative data
Both the characteristics and frequency of a
variable can be measured.
Measurement data
Continuous and discrete data
DISCRETE/ categorical
whole number
Example: The no. of family members
The no. of heart beats
The no. of admissions in a day
CONTINUOUS
Example: Height (cm/feet/inch)
Weight
Hb
Blood sugar
Blood pressure
Time Series Data…
Observations measured at the same point
in time are called cross-sectional data.
Kelvin scale
Eg.
IQ
Credit score
SES
How to represent the data ?
Data Presentation
Qualitative data
Bar
Pie or sector diagram & doughnut chart
Pictogram
Map or spot diagram
Venn diagram
Quantitative data
Histogram
Frequency polygon & Frequency curve
Line diagram
Cumulative frequency polygon (ogive)
Scatter diagram
Venn diagram
Shows degree of overlap and exclusivity for
2 or more characteristics or factors within a
sample, or population
Bar diagram
Simple
Multiple or compound
Component or proportional
Frequency polygon
Frequency Curve
Line diagram
Ogive…
Is a graph of a cumulative frequency
distribution.
first class…
next class: .355 + .185 = .540
:
last class: .930 + .070 = 1.00
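The running sums above can be sketched in a few lines of Python. This is only an illustration: the first, second and last relative frequencies (.355, .185, .070) are taken from the slide, while the single middle value of .390 is an assumption that stands in for the classes the slide elides.

```python
from itertools import accumulate

# Relative frequency of each class. 0.355, 0.185 and 0.070 come from the
# slide; 0.390 is an ASSUMED single middle class replacing the elided ones.
relative_freq = [0.355, 0.185, 0.390, 0.070]

# The ogive plots the running total of the relative frequencies.
# Rounding only hides floating-point noise in the sums.
cumulative = [round(total, 3) for total in accumulate(relative_freq)]

print(cumulative)
```

The last cumulative value is always 1.00, which is a quick check that no class has been left out.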
Ogive…
The ogive can be used
to answer questions
like:
“around $35”
Scatter Diagram…
Dot diagram/ Correlation diagram/ Scatter
plot
Example :- A real estate agent wanted to
know to what extent the selling price of a
home is related to its size…
$\hat{y} = a + bX$
Y = Dependent variable
X = Independent variable.
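The slide does not say how a and b are obtained. A common choice is ordinary least squares, and the sketch below assumes that method. The house sizes and selling prices are hypothetical numbers chosen to lie exactly on a line so that the result is easy to check by hand; `fit_line` is an illustrative helper name, not something from the lecture.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# HYPOTHETICAL data: X = house size, Y = selling price (arbitrary units).
house_size = [1, 2, 3, 4, 5]
selling_price = [3, 5, 7, 9, 11]

a, b = fit_line(house_size, selling_price)
print(f"y_hat = {a} + {b} * X")
```

A positive slope b corresponds to the pattern described for the real estate example: larger houses tend to sell for more.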
Scatter Diagram…
It appears that in fact there is a
relationship, that is, the greater the house
size the greater the selling price…
Patterns of Scatter
Diagrams…
Linearity and Direction are two concepts we
are interested in
Weak or Non-Linear Relationship
A scatter diagram is the only diagram for
quantitative data in which the relationship
between two variables is shown.
Box plot / Box and Whisker plot
A box plot represents the quartiles
(25%, 50% and 75%) and the range of a
continuous and ordered data set.
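As a rough sketch, the five values a box plot draws can be computed with Python's standard `statistics` module. The data set is illustrative and assumed for this example. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is one option among several.

```python
import statistics

# Illustrative data set (an assumption for this sketch).
data = [4, 8, 15, 24, 39, 50]

# n=4 splits the data into quarters and returns the three cut points
# Q1 (25%), Q2 (50%, the median) and Q3 (75%).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

minimum, maximum = min(data), max(data)
data_range = maximum - minimum

print("min:", minimum, "Q1:", q1, "median:", q2, "Q3:", q3, "max:", maximum)
print("range:", data_range)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.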
Stem and Leaf plot
1. The best method to show the association between
height and weight of children in a class is by
(d) Histogram
Answer: c
2. Two variables can be plotted together in which of
the following diagram?
(b) Histogram
Answer: d
3. The age and sex structure of a population may be
represented by
Answer: c
4. Between height and weight there is an
(a) Association
(b) Correlation
(c) Proportion
(d) Index
Answer: a & b
5. An analysis of the religion of populations, who reside in a
rural block, reveals that 45% are Hindu, 30% are Muslims,
15% are Christians, and 10% are Jains. These data would best
be depicted graphically by which of the following diagram?
(d) Histogram
(b) Ogive
Sample Mean: $\bar{X} = \frac{\sum X}{n}$
Population Mean: $\mu = \frac{\sum X}{N}$
Statistics is a pattern
language…
            Population   Sample
Size        N            n
Mean        μ            X̄
Problem of being “mean”
The main problem associated with the
mean value of some data is that it is
sensitive to outliers.
The Median
Because the mean can be sensitive to
extreme values, the median is sometimes a
more useful and more representative measure.
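A quick way to see this sensitivity is to compare the two measures on a small data set with one extreme value. The sketch below uses the values {4, 4, 4, 4, 50} and Python's standard `statistics` module.

```python
import statistics

# Four identical values and one extreme value (outlier).
data = [4, 4, 4, 4, 50]

mean_value = statistics.mean(data)      # pulled upward by the outlier
median_value = statistics.median(data)  # middle value, ignores how large 50 is

print("mean:", mean_value)
print("median:", median_value)
```

The mean comes out at 13.2 even though four of the five values are 4, while the median stays at 4.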
E.g.
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases,
but the data sets have very different
distributions…
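The same comparison can be checked with a short sketch. `value_range` is an illustrative helper name, not part of the lecture.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


set_a = [4, 4, 4, 4, 50]
set_b = [4, 8, 15, 24, 39, 50]

# Both ranges depend only on the two extreme values (4 and 50),
# so they are identical even though the data are spread very differently.
range_a = value_range(set_a)
range_b = value_range(set_b)

print(range_a, range_b)
```

Because the range ignores everything between the extremes, it cannot distinguish these two data sets.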
Statistics is a pattern
language…
            Population   Sample
Size        N            n
Mean        μ            X̄
Variance    σ²           S²
Variance… population mean
$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
$\sqrt{\ }$ = square root
$\sum$ = sum (sigma)
$X$ = score for each point in data
$\bar{X}$ = mean of scores for the variable
$n$ = sample size (number of observations or cases)
Variance
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
• Note that this is the same equation except for
no square root taken.
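A minimal sketch of both formulas, using the n - 1 denominator. The scores are hypothetical values chosen so that the mean is a whole number; `sample_variance` is an illustrative helper name. The result is cross-checked against `statistics.variance`, which also uses n - 1.

```python
import math
import statistics


def sample_variance(scores):
    """S^2 = sum((X - X_bar)^2) / (n - 1)."""
    n = len(scores)
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)


# HYPOTHETICAL scores; their mean is exactly 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(scores)  # variance: no square root taken
s = math.sqrt(s2)             # standard deviation: square root of the variance

print("variance:", round(s2, 3))
print("standard deviation:", round(s, 3))
print("matches statistics.variance:", math.isclose(s2, statistics.variance(scores)))
```

For these scores the squared deviations sum to 32, so the variance is 32 / 7.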
Sample Variance
$\chi^2 = \sum \frac{(O - E)^2}{E}$
$= \frac{(22 - 18.36)^2}{18.36} + \frac{(68 - 71.55)^2}{71.55} + \frac{(14 - 17.54)^2}{17.54} + \frac{(72 - 68.37)^2}{68.37}$
$\approx 1.79$
Step-4: finding the degree of freedom (df)
df= (C-1) (R-1)
= (2-1) (2-1)
=1
The critical value of χ² for df = 1 at a significance level
of 0.05 is 3.84.
Since our observed value (1.79) is lower, we cannot reject
the null hypothesis: the data do not show that Vac-B is
superior to Vac-A.
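The arithmetic can be re-checked with a short sketch that uses the observed and expected counts exactly as printed above. Without intermediate rounding the sum comes out at about 1.80, slightly above the 1.79 quoted on the slide, and the conclusion is the same.

```python
# Observed and expected counts as printed in the worked example.
observed = [22, 68, 14, 72]
expected = [18.36, 71.55, 17.54, 68.37]

# Chi-square statistic: sum of (O - E)^2 / E over all four cells.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 3.84  # df = 1, significance level 0.05 (from the slide)
significant = chi2 > critical_value

print("chi-square:", round(chi2, 2))
print("significant at 0.05:", significant)
```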
Q: A study was conducted to compare the
effectiveness of ORS and home-based
fluid in children under 5 years of age. Out of
100 under-5 children taking ORS, 15 developed
dehydration. Out of 150 under-5 children taking
home-based fluid, 35 developed
dehydration. Test for an association between
dehydration and the type of fluid (ORS or
home-based fluid).
              Dehydration (+)   Dehydration (-)   Total   Attack rate
ORS                15                 85           100      15%
Home-based         35                115           150      23.3%
Total              50                200           250
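One possible way to work this question is sketched below. It assumes the same uncorrected chi-square formula, sum of (O - E)^2 / E, with each expected count taken as row total × column total / grand total. `chi_square` is an illustrative helper name, and 3.84 is the critical value for df = 1 at the 0.05 level.

```python
def chi_square(table):
    """Return (chi2, df) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table[0]) - 1) * (len(table) - 1)  # (C - 1)(R - 1)
    return chi2, df


# Rows: ORS, home-based fluid. Columns: dehydration (+), dehydration (-).
table = [[15, 85],
         [35, 115]]

attack_rates = [row[0] / sum(row) for row in table]  # 15/100 and 35/150
chi2, df = chi_square(table)

print("attack rates:", [f"{r:.1%}" for r in attack_rates])
print("chi-square:", round(chi2, 2), "df:", df)
print("significant at 0.05:", chi2 > 3.84)
```

Under these assumptions the expected counts are 20, 80, 30 and 120, giving a chi-square of about 2.60. That is below 3.84, so the difference in attack rates (15% vs 23.3%) would not be statistically significant at the 0.05 level.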