Classification of Data: Objectives: Understand How Data Are Classified. Recognize The Different Types of Data


CLASSIFICATION OF DATA

Objectives:
1. Understand how data are
classified.
2. Recognize the different types
of data.
VARIABLES AND DATA
What is a variable?
A variable is any one of the measures which
comprise the database in a study.
or:
A variable is a single parameter in a research
database.
THEREFORE: RESEARCH DATABASES ARE
COMPOSED OF DATA SETS OF
VARIABLES
PROPERTIES OF VARIABLES
- RELATIONAL DESCRIPTION
  - How variables relate to each other
- FUNCTIONAL DESCRIPTION
RELATIONAL DESCRIPTION OF
VARIABLES
A. INDEPENDENT. Those which you randomize, control or manipulate.

B. DEPENDENT. Those which you measure: the results, the outcome, etc. (that which you hope comes out well!)
Univariate Statistics
- One independent; one dependent variable
- More than one of each is called “multivariate” statistics
FUNCTIONAL DESCRIPTION OF
VARIABLES (Levels of Measurement)
A. QUALITATIVE (categorical)
1. Nominal: data described by a quality, property or category that is given a name, e.g. eye color
2. Ordinal: categorical data which can be assigned arbitrary numerical ranks. The ranks have an order, but the values do not relate to one another mathematically (the gaps between ranks need not be equal), e.g. a 1-10 scale for pain
B. QUANTITATIVE (numerical)
1. Continuous: infinitely variable (interval, where the zero point is arbitrary, or ratio, where the zero point is not arbitrary)
2. Discrete: not infinitely variable
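As a rough illustration of why the level of measurement matters, the sketch below picks a summary statistic that suits each type of data. All values are hypothetical and chosen only for the example; the choice of Python is likewise an assumption, not part of the original slides.

```python
from statistics import mean, median, mode

# Hypothetical example values, chosen only for illustration.
eye_color = ["brown", "blue", "brown", "green", "brown"]   # nominal
pain_score = [2, 3, 3, 7, 9]                               # ordinal (1-10 scale)
glucose_mg_dl = [88.5, 92.1, 101.3, 95.0, 110.7]           # continuous, ratio
offspring = [0, 2, 1, 3, 2]                                # discrete counts

# Nominal: only counting categories is meaningful, so report the mode.
nominal_summary = mode(eye_color)

# Ordinal: order is meaningful, so the median is a sensible summary.
# A mean would treat the gaps between ranks as equal, which they need not be.
ordinal_summary = median(pain_score)

# Quantitative: arithmetic on the values is meaningful, so the mean applies.
continuous_summary = mean(glucose_mg_dl)
discrete_summary = mean(offspring)

print(nominal_summary, ordinal_summary, continuous_summary, discrete_summary)
```

The point of the sketch is the pairing of data type and summary, not the particular numbers.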
Some Examples:
NOMINAL DATA
Pos/Neg (eg, dichotomous data)
Variety of lab tests
Gene expression, amplification, etc
Male/Female
Disease Present/Absent

Multiple categories
Mild, moderate, severe
Phenotypic data
Food preferences
Some Examples:
ORDINAL DATA

Scales and indices (assessments):
Pain (VAS), e.g. 1-10 scale
Physiologic responses
Psychological tests (Likert scale) (never, likely, always, etc.)
Grading histology slides

Assignment based on ranks:
Water quality
Range assignment

REMEMBER: THESE “NUMBERS” ARE RANKS, NOT RATIO-SCALE MEASUREMENTS!


Some Examples:
CONTINUOUS DATA

Measurements using common analytical instrumentation:
Glucose
Blood Pressure
Cell surface receptors

Measurements using mathematically-derived scales:
Height
Weight
Some Examples:
DISCRETE DATA

Always discrete (“Attribute Data”):
Number of offspring
Genotypic: e.g., deletions, translocations, etc.

Conveniently discrete (NB: these don’t have to be discrete and are not always discrete):
Age
Time
“Thinking on your feet”
- What kind of data are pH measurements?
  - Continuous, interval
- How about temperature?
  - Continuous, interval for Fahrenheit and Celsius (0 °F and 0 °C are arbitrary); continuous, ratio for kelvin (0 K is absolute zero, not an arbitrary zero)
- You use a device to measure a dependent variable and I use a different device to measure the same variable. Your measurements record to two decimal places; mine only to whole numbers. Comment on which measurements are discrete (if any) and which are continuous (if any).
Dispersion or Variability
Objectives:
1. Recognize the difference between the
study sample and the population
2. Understand the causes of dispersion in
populations, samples and data
3. Learn how to graphically describe
dispersion.
DESCRIBING POPULATIONS

A. The sample as representative of the population
B. The distribution of a numerical variable in the sample
THE SAMPLE
[Diagram relating POPULATION, SAMPLE (study groups), STUDY, DATA, INFERENCE and CONCLUSION]

Therefore, your conclusions about a population are only as good as your sample.
DISPERSION OR VARIABILITY
- Inherent in the population (biological variability)
- Due to the observer (experimental error)
  - a. Systematic errors have the same magnitude and direction in every measurement, so they can be corrected, e.g. by subtracting a blank
  - b. Random errors are caused by imprecision and other factors inherent in the observation; on average they are normally distributed
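The difference between the two kinds of experimental error can be sketched with a small simulation. The true value, the size of the blank and the noise level are all hypothetical numbers picked for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

true_value = 5.0      # hypothetical true quantity being measured
blank_offset = 0.3    # hypothetical systematic error (e.g. an instrument blank)
noise_sd = 0.1        # hypothetical size of the random error

# Each reading = true value + constant offset + normally distributed noise.
readings = [true_value + blank_offset + random.gauss(0.0, noise_sd)
            for _ in range(1000)]

# A systematic error shifts every reading the same way,
# so subtracting the blank removes it.
corrected = [r - blank_offset for r in readings]

bias_before = mean(readings) - true_value    # close to the blank offset
bias_after = mean(corrected) - true_value    # close to zero
spread_before = stdev(readings)              # random error is untouched...
spread_after = stdev(corrected)              # ...by the blank correction

print(f"bias before: {bias_before:.3f}, after: {bias_after:.3f}")
print(f"spread before: {spread_before:.3f}, after: {spread_after:.3f}")
```

Subtracting the blank changes the bias but not the spread: random error can only be reduced by more precise or repeated measurement.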
The “Normal” Distribution

- THE NORMAL DISTRIBUTION IS ONE WHICH FOLLOWS A GAUSSIAN MODEL FOR THE BINOMIAL DISTRIBUTION OF PROBABILITIES
THE BINOMIAL DISTRIBUTION
Frequency distribution obtained when sampling a population for a variable with two mutually exclusive outcomes, e.g., male and female (M&F), heads and tails (H&T)
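A minimal sketch of how closely a Gaussian curve can follow a binomial distribution, assuming 20 fair coin tosses (both numbers are arbitrary choices for the example):

```python
from math import comb, exp, pi, sqrt

n = 20     # number of trials (e.g. coin tosses); hypothetical
p = 0.5    # probability of "heads" on each trial

# Exact binomial probability of k heads in n tosses.
binomial = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Gaussian curve with the same mean and standard deviation.
mu = n * p
sigma = sqrt(n * p * (1 - p))
gaussian = [exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
            for k in range(n + 1)]

# Largest difference between the two at any k.
largest_gap = max(abs(b - g) for b, g in zip(binomial, gaussian))

for k in range(6, 15):
    print(f"k={k:2d}  binomial={binomial[k]:.4f}  gaussian={gaussian[k]:.4f}")
print(f"largest gap: {largest_gap:.4f}")
```

With these settings the two sets of probabilities agree to within about 0.01 at every k; the agreement generally improves as n grows.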
PROPERTIES OF A DISTRIBUTION I

- Absolute vs. relative: n vs. proportion

[Figure: two histograms of the frequency distribution of Data 1. Left panel: absolute counts (y-axis “Number”, 0 to 10). Right panel: relative frequency (y-axis 0.000 to 0.100). x-axis in both panels: Bin Center, 0 to 20.]
PROPERTIES OF A DISTRIBUTION I
- Differential vs. cumulative. In a differential distribution, each bin contains the value for that bin only; in a cumulative distribution, each bin is the sum of that bin and all previous bins
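The four ways of expressing a frequency distribution (absolute or relative, differential or cumulative) can be sketched on a small, made-up data set with one bin per distinct value:

```python
from collections import Counter
from itertools import accumulate

# Hypothetical integer-valued observations, one bin per distinct value.
data = [3, 5, 5, 6, 6, 6, 7, 7, 8, 10]

counts = Counter(data)
bins = sorted(counts)

absolute = [counts[b] for b in bins]                      # n per bin
relative = [c / len(data) for c in absolute]              # proportion per bin
cumulative_abs = list(accumulate(absolute))               # this bin plus all previous
cumulative_rel = [c / len(data) for c in cumulative_abs]  # ends at 1.0

for b, a, r, cr in zip(bins, absolute, relative, cumulative_rel):
    print(f"bin {b:2d}: n={a}  proportion={r:.2f}  cumulative={cr:.2f}")
```

The relative values always sum to 1, and the last cumulative bin always equals the total (n, or 1.0 for proportions).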

[Figure: two histograms of the frequency distribution of Data 1. Left panel: differential relative frequency (y-axis 0.000 to 0.100). Right panel: cumulative relative frequency (y-axis 0.00 to 1.00). x-axis in both panels: Bin Center, 0 to 20.]
PROPERTIES OF A DISTRIBUTION II
- SHAPE
  - Tails: values far from central tendencies
  - Skewness: asymmetry caused by “uneven” distribution
  - Kurtosis: “flatness” or “peakedness” of curve
- MEASURES OF CENTRAL TENDENCY
  - Mode: value representing greatest number of counts
  - Median: 50% of values above; 50% below
  - Mean: mathematical average, i.e., $\bar{x} = \sum x_i / n$
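A short sketch of these properties on a made-up, right-skewed sample. The skewness and kurtosis here use the simple moment-based (population) formulas, which is one common convention among several; statistical packages may apply small-sample corrections and report slightly different numbers.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical right-skewed sample.
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]

m = mean(data)
sd = pstdev(data)

# Moment-based skewness: positive when the long tail is on the right.
skewness = sum(((x - m) / sd) ** 3 for x in data) / len(data)

# Moment-based excess kurtosis: 0 for a normal curve,
# positive for heavier tails and a sharper peak.
excess_kurtosis = sum(((x - m) / sd) ** 4 for x in data) / len(data) - 3

print("mode:", mode(data), "median:", median(data), "mean:", m)
print(f"skewness: {skewness:.2f}  excess kurtosis: {excess_kurtosis:.2f}")
```

In this sample the mode, median and mean fall in that order (2, 3, 4.2), which is the usual pattern when the long tail is on the right.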
THE NORMAL DISTRIBUTION
1. Two-tailed, bell-shaped, mathematically described
as “Gaussian”
2. Median, mode and mean are identical
SOME EXAMPLES
Plotting Distributions; Line Graph
[Figure: line graph of a frequency distribution. y-axis: Percentage (0.000 to 0.275); x-axis: Bin Center (0.0 to 27.5).]
Plotting Distributions; Bar Graph

[Figure: bar graph of a frequency distribution. y-axis: Percentage (0 to 18); x-axis: Bin Center (0 to 26).]
“Thinking on your feet”
- If you do logarithmic transformation of a typically skewed distribution of a biological parameter from a population, should it be base 10 log or natural log?
  - Doesn’t matter which one
- Would it make a difference and why (or why not)?
  - No. The two logarithms differ only by a constant factor (ln x ≈ 2.303 × log10 x), so the transformed distribution has the same shape either way; only the values of the parameter differ because of the base you use
- *Can you calculate percentiles without graphing your data? How?
  - Yes. Rank the values from lowest to highest; the pth percentile is the value below which p% of the ranked values fall, found as a proportion of the way through the ranked list
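Both answers can be checked numerically. The sketch below uses made-up positive values; `statistics.quantiles` is just one of several interpolation conventions for percentiles, so other tools may give slightly different cut points.

```python
from math import log, log10
from statistics import quantiles

# Hypothetical right-skewed positive values.
data = [1.2, 1.5, 2.0, 2.2, 3.1, 4.0, 5.5, 9.0, 15.0, 40.0]

ln_values = [log(x) for x in data]
log10_values = [log10(x) for x in data]

# ln(x) = ln(10) * log10(x), so the ratio is the same constant for every point.
ratios = [a / b for a, b in zip(ln_values, log10_values)]

# Percentiles straight from the ranked data, no graph needed.
# quantiles(..., n=100) returns the 99 cut points P1..P99.
percentiles = quantiles(data, n=100)
p50 = percentiles[49]   # the 50th percentile, i.e. the median

print(f"ln/log10 ratio for every point: {ratios[0]:.4f}")
print(f"50th percentile: {p50:.2f}")
```

Because every transformed value is rescaled by the same factor, the choice of base stretches the axis but cannot change the shape of the distribution.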
QUANTIFYING DISPERSION
Objectives:
1. Recognize why dispersion is described
in quantitative terms.
2. Learn the various ways in which
dispersion can be quantified.
3. Understand the purpose of each of these
terms.
The RANGE
- Simplest description of variability
- Gives no indication of the manner in which a variable is distributed
- Can be misleading, e.g., the statement “values of x ranged from 0.1 to 10.1” describes both of these data sets:
  - 0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 10.1
  - 0.1, 9.8, 8.9, 10.0, 10.0, 9.5, 10.1
The VARIANCE
- Actually gives an indication of the distribution because it depends upon how far the data points are from the mean.
- The variance depends on:
  - the difference between each data point and the mean (how far from the mean is each point?) and
  - the number of points (intuitively, more points should give a better estimate of the degree of dispersion)
The VARIANCE
The difference between each point and the mean, summed:
$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \dots + (x_n - \bar{x})$
or: $\sum (x - \bar{x})$, and then with each difference squared: $\sum (x - \bar{x})^2$,
divided by $n$ or $n-1$

WELL WHICH ONE????


“BIASED” versus “UNBIASED”
The population vs. the sample
…remember the difference?

A variance estimate is called “biased” if it divides by your own sample size, n, when you have only sampled part of the population. Dividing by the number of values should only be used if you are measuring the entire population. → population

Otherwise, use Bessel’s correction and divide by n-1. → sample

* N for population; n for sample
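Why n-1 rather than n? A small simulation can make the bias visible. The population below is synthetic (normally distributed with a hypothetical SD of 2), and the sample size of 5 is an arbitrary choice.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values with SD 2 (variance close to 4).
population = [random.gauss(10.0, 2.0) for _ in range(10_000)]
mu = sum(population) / len(population)
true_variance = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared differences between each point and the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 5
biased, unbiased = [], []
for _ in range(20_000):
    sample = random.sample(population, n)
    ss = sum_sq_dev(sample)
    biased.append(ss / n)          # divide by n
    unbiased.append(ss / (n - 1))  # Bessel's correction

mean_biased = sum(biased) / len(biased)
mean_unbiased = sum(unbiased) / len(unbiased)

print(f"population variance: {true_variance:.2f}")
print(f"average estimate dividing by n:   {mean_biased:.2f}")
print(f"average estimate dividing by n-1: {mean_unbiased:.2f}")
```

Averaged over many samples, dividing by n underestimates the population variance by a factor of (n-1)/n, while dividing by n-1 lands close to it.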


The VARIANCE
So our first quantitative measurement of dispersion, the variance, S, is given by:

$S = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

We hardly ever use it!!!!


The STANDARD DEVIATION
What we most often use to describe dispersion is the standard deviation, called s (little “S”) and given by the square root of the variance:

$s = S^{1/2}$
or
$s = \left( \dfrac{\sum (x - \bar{x})^2}{n - 1} \right)^{1/2}$
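The formulas above can be followed step by step on a small, made-up data set and cross-checked against Python's `statistics` module:

```python
from math import sqrt
from statistics import pvariance, stdev, variance

data = [4.0, 6.0, 7.0, 9.0, 14.0]   # hypothetical measurements
n = len(data)

x_bar = sum(data) / n                       # mean
deviations = [x - x_bar for x in data]      # (x - x_bar); these sum to zero
sum_sq = sum(d ** 2 for d in deviations)    # sum of squared deviations

var_population = sum_sq / n        # divide by n: whole population measured
var_sample = sum_sq / (n - 1)      # divide by n-1: Bessel's correction
sd_sample = sqrt(var_sample)       # s, the square root of the variance

print(f"mean={x_bar}, sum of squares={sum_sq}")
print(f"variance (n)={var_population:.2f}, variance (n-1)={var_sample:.2f}")
print(f"standard deviation s={sd_sample:.3f}")
```

The raw deviations always sum to zero, which is why they are squared before summing. `statistics.variance` and `statistics.stdev` use the n-1 form, while `statistics.pvariance` uses n.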
POPULATION vs. SAMPLE PARAMETERS

- $\mu$ vs. $\bar{x}$ (mean)
- $\sigma$ vs. $s$ or SD (standard deviation)
WHAT’S A STANDARD DEVIATION?
- One standard deviation is the distance either side of the mean within which 68.27% of all values of a variable from a normally distributed sample will be found.
- ±2 SD will include 95.45% of all points
- ±3 SD will include 99.73% of all points

FROM YOUR SAMPLE!!!

Wouldn’t it be nice to get a good estimate of the dispersion in your POPULATION from these??
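These coverage figures follow directly from the Gaussian curve and can be reproduced with the error function. The function name `coverage_within` is made up for this sketch.

```python
from math import erf, sqrt

def coverage_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

The same function also shows where the 1.96 used for 95% limits comes from: `coverage_within(1.96)` is very close to 0.95.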
THE CENTRAL LIMIT THEOREM

- States that:
  - a large number of means, gathered from many samples from a given population, will be normally distributed,
  - the mean of the means of all these samples will approximate the population mean, and
  - the standard deviation of these sample means will approximate the population standard deviation divided by the square root of the sample size, $\sigma / \sqrt{n}$
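A simulation sketch of the theorem, assuming a deliberately skewed synthetic population (exponential) and samples of size 30; both choices are arbitrary.

```python
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population (exponential, mean and SD both near 1).
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = mean(population)
pop_sd = pstdev(population)

# Draw many samples of size n and keep the mean of each one.
n = 30
sample_means = [mean(random.sample(population, n)) for _ in range(5_000)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
expected_sd = pop_sd / n ** 0.5   # population SD divided by sqrt(n)

print(f"population mean {pop_mean:.3f}  vs mean of means {mean_of_means:.3f}")
print(f"population SD   {pop_sd:.3f}")
print(f"SD of means     {sd_of_means:.3f}  vs SD/sqrt(n) {expected_sd:.3f}")
```

The sample means cluster far more tightly than the individual values: their spread tracks the population SD divided by the square root of n, not the population SD itself.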
THE “STANDARD ERROR”
- One implication of the CL theorem is that we can learn about dispersion in the population from which our sample derives if we can estimate the standard deviation of the sample means.
- We call this estimate the standard error of the mean, abbreviated SEM*
- SEM is calculated (very simply) by:
$\mathrm{SEM} = \mathrm{SD} / \sqrt{n}$
* more strictly: $s_{\bar{x}}$
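A minimal worked example of the SEM calculation, using a made-up set of five measurements:

```python
from math import sqrt
from statistics import stdev

data = [4.0, 6.0, 7.0, 9.0, 14.0]   # hypothetical measurements

sd = stdev(data)             # sample standard deviation (n-1 form)
sem = sd / sqrt(len(data))   # standard error of the mean

print(f"SD = {sd:.3f}, n = {len(data)}, SEM = {sem:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.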
CONFIDENCE INTERVALS (CI)
- Thus SEM tells us something about our population
- We can use this information to calculate Confidence Intervals, which tell us how precisely our sample mean estimates the mean of the population of interest
CONFIDENCE INTERVALS (CI)
- Answers the question: Within what range of values can I expect the population mean to lie, with X% confidence? (Where X is the confidence level of the CI.)
- Can be calculated for different levels of confidence, e.g., 90%, 95%, 99% CI (usually 95%).
- So a 95% CI of 5 to 15 about a mean value of 10 means that I’m 95% confident that the population mean falls in the range 5 to 15. (Strictly, about 95% of intervals built this way from repeated samples would contain the population mean.)
- CIs can be estimated from the simple equation:
lower end of 95% CI $= \bar{x} - (1.96 \times \mathrm{SEM})$
upper end of 95% CI $= \bar{x} + (1.96 \times \mathrm{SEM})$
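The equation above can be sketched as a small function. The data are made up, and the function name `ci95` is an assumption for this example. The 1.96 multiplier comes from the normal distribution and suits reasonably large samples; for small samples a multiplier from the t distribution is usually preferred.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(values):
    """Approximate 95% confidence interval for the mean: x_bar +/- 1.96 * SEM."""
    x_bar = mean(values)
    sem = stdev(values) / sqrt(len(values))
    return x_bar - 1.96 * sem, x_bar + 1.96 * sem

# Hypothetical sample of 40 measurements centred on 10.
data = [8.0, 9.0, 10.0, 11.0, 12.0] * 8

lower, upper = ci95(data)
print(f"mean = {mean(data):.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

The interval is symmetric about the sample mean and narrows as the sample grows, because the SEM shrinks with the square root of n.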
