Classification of Data: Objectives: Understand How Data Are Classified. Recognize The Different Types of Data

CLASSIFICATION OF DATA
Objectives:
1. Understand how data are classified.
2. Recognize the different types of data.
VARIABLES AND DATA
What is a variable?
A variable is any one of the measures that make up the database in a study.
or:
A variable is a single parameter in a research database.
THEREFORE: RESEARCH DATABASES ARE COMPOSED OF DATA SETS OF VARIABLES
PROPERTIES OF VARIABLES
• RELATIONAL DESCRIPTION
  - How variables relate to each other
• FUNCTIONAL DESCRIPTION
RELATIONAL DESCRIPTION OF VARIABLES
A. INDEPENDENT. Those that you randomize, control, or manipulate.
B. DEPENDENT. That which you measure: the results, the outcome, etc. (or what you hope comes out well!)
Univariate Statistics
• One independent; one dependent variable
• More than one of each is called “multivariate” statistics
FUNCTIONAL DESCRIPTION OF VARIABLES (Levels of Measurement)
A. QUALITATIVE (categorical)
1. Nominal: data described by a quality, property, or category and given a name, e.g. eye color
2. Ordinal: categorical data that can be assigned arbitrary numerical ranks. The ranks give an order but do not relate to one another mathematically, e.g. on a 1-10 pain scale a score of 8 is not "twice" a score of 4
B. QUANTITATIVE (numerical)
1. Continuous: infinitely variable (interval: zero is arbitrary; ratio: zero is not arbitrary)
2. Discrete: not infinitely variable
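One way to make the levels of measurement concrete is to ask which summary is meaningful for each. The sketch below is only an illustration: the data values and variable names are hypothetical, and the pairing of summary to level is a common rule of thumb rather than something stated on the slides.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data, one list per level of measurement.
eye_color = ["brown", "blue", "brown", "green", "brown"]   # nominal
pain_score = [2, 3, 3, 7, 9]                                # ordinal (1-10 ranks)
glucose_mg_dl = [88.5, 92.1, 101.3, 95.0, 110.6]            # continuous
offspring = [0, 2, 1, 3, 2]                                 # discrete counts

def mode_of(values):
    """Most frequent category; usable at every level, including nominal."""
    return Counter(values).most_common(1)[0][0]

nominal_summary = mode_of(eye_color)        # only counting makes sense
ordinal_summary = median(pain_score)        # order is meaningful, spacing is not
continuous_summary = mean(glucose_mg_dl)    # arithmetic is meaningful
discrete_summary = mean(offspring)          # counts also support arithmetic

print(nominal_summary, ordinal_summary, continuous_summary, discrete_summary)
```

Averaging the pain scores would run without error, which is exactly why the level of measurement has to be decided by the analyst and not by the software.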
Some Examples:
NOMINAL DATA
• Pos/Neg (e.g., dichotomous data)
  - Variety of lab tests
  - Gene expression, amplification, etc.
  - Male/Female
  - Disease Present/Absent
• Multiple categories
  - Mild, moderate, severe
  - Phenotypic data
  - Food preferences
Some Examples:
ORDINAL DATA
• Scales and indices (assessments):
  - Pain (VAS), e.g. 1-10 scale
  - Physiologic responses
  - Psychological tests (Likert scale: never, likely, always, etc.)
  - Grading histology slides
• Assignment based on ranks:
  - Water quality
  - Range assignment
REMEMBER: THESE “NUMBERS” ARE NOT RATIONAL!

Some Examples:
CONTINUOUS DATA
• Measurements using common analytical instrumentation
  - Glucose
  - Blood Pressure
  - Cell surface receptors
• Measurements using mathematically-derived scales
  - Height
  - Weight
Some Examples:
DISCRETE DATA
• Always discrete (“Attribute Data”)
  - Number of offspring
  - Genotypic: e.g., deletions, translocations, etc.
• Conveniently discrete (NB: these don't have to be discrete and are not always discrete)
  - Age
  - Time
“Thinking on your feet”
• What kind of data are pH measurements?
  - Continuous, interval
• How about temperature?
  - Continuous, interval for Fahrenheit and Celsius (0 °F and 0 °C are arbitrary); continuous, ratio for kelvin (0 K is not arbitrary)
• You use a device to measure a dependent variable and I use a different device to measure the same variable. Your measurements record to two decimal places; mine only to whole numbers. Comment on which measurements are discrete (if any) and which are continuous (if any).
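The interval versus ratio distinction for temperature can be checked with a little arithmetic. This is a minimal sketch with two hypothetical readings; the helper name `celsius_to_kelvin` is made up for the illustration.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Shift to the kelvin scale, whose zero is absolute zero."""
    return t_c + 273.15

t1_c, t2_c = 10.0, 20.0   # hypothetical readings in degrees Celsius

# Differences are meaningful on an interval scale and survive the conversion.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on a ratio scale.
ratio_c = t2_c / t1_c                                         # 2.0 on the Celsius scale
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)   # about 1.035 in kelvin

print(f"difference: {diff_c:.2f} C = {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

The difference is 10 degrees on both scales, but 20 °C is not "twice as hot" as 10 °C: the ratio changes as soon as the arbitrary zero is removed.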
Dispersion or Variability
Objectives:
1. Recognize the difference between the study sample and the population.
2. Understand the causes of dispersion in populations, samples, and data.
3. Learn how to graphically describe dispersion.
DESCRIBING POPULATIONS
A. The sample as representative of the population
B. The distribution of a numerical variable in the sample
THE SAMPLE
[Diagram: POPULATION → SAMPLE (study groups) → STUDY → DATA → INFERENCE → CONCLUSION about the POPULATION]
Therefore, your conclusions about a population are only as good as your sample
DISPERSION OR VARIABILITY
• Inherent in the population (biological variability)
• Due to the observer (experimental error)
  a. Systematic errors have the same magnitude and direction in every measurement, so they can be corrected, e.g. by subtracting a blank
  b. Random errors are caused by imprecision and other factors inherent in the observation; on average they are normally distributed
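The two kinds of experimental error behave differently under averaging, which a small simulation can show. This is a sketch under stated assumptions: the true value, the offset, and the noise level are all hypothetical numbers chosen for the illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 5.0      # hypothetical true analyte signal
BLANK_OFFSET = 0.8    # hypothetical systematic error: same size, same direction
NOISE_SD = 0.5        # hypothetical random error: scatter around zero

def measure() -> float:
    """One reading = truth + constant offset + normally distributed noise."""
    return TRUE_VALUE + BLANK_OFFSET + random.gauss(0.0, NOISE_SD)

readings = [measure() for _ in range(1000)]

raw_mean = mean(readings)                  # averaging shrinks the random error only
corrected_mean = raw_mean - BLANK_OFFSET   # subtracting the blank removes the offset

print(f"raw mean:       {raw_mean:.3f}")
print(f"corrected mean: {corrected_mean:.3f}")
```

However many readings are averaged, the raw mean stays near 5.8 and not 5.0; only the blank subtraction removes the systematic part.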
The “Normal” Distribution
• THE NORMAL DISTRIBUTION IS ONE WHICH FOLLOWS A GAUSSIAN MODEL FOR THE BINOMIAL DISTRIBUTION OF PROBABILITIES
THE BINOMIAL DISTRIBUTION
Frequency distribution obtained when sampling a population with two mutually exclusive outcomes, e.g., male and female (M&F) or heads and tails (H&T)
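The link between the binomial distribution and its Gaussian model can be seen by putting the two side by side. A minimal sketch, assuming 20 tosses of a fair coin (both numbers are hypothetical choices for the illustration):

```python
import math
from statistics import NormalDist

n, p = 20, 0.5   # hypothetical: 20 coin tosses, P(heads) = 0.5

def binomial_pmf(k: int) -> float:
    """Probability of exactly k heads in n tosses."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Gaussian model with the same mean and standard deviation as the binomial.
gauss = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

for k in range(0, n + 1, 2):
    print(f"k={k:2d}  binomial={binomial_pmf(k):.4f}  gaussian={gauss.pdf(k):.4f}")
```

The two columns agree closely near the center and drift apart slightly in the tails; the agreement improves as n grows.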
PROPERTIES OF A DISTRIBUTION I
• Absolute vs. relative: n vs. proportion
[Figure: Histogram of Data 1, frequency distribution. Two panels plotted against Bin Center (0-20): absolute counts (Number) and relative frequency (proportion).]
PROPERTIES OF A DISTRIBUTION I
• Differential vs. cumulative. In differential, each bin contains the value for that bin only; in cumulative, each bin is the sum of that bin and all previous bins
[Figure: Histogram of Data 1, frequency distribution. Two panels plotted against Bin Center (0-20): differential relative frequency and cumulative relative frequency.]
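The absolute, relative, and cumulative views of one frequency distribution can be built from the same counts. The sketch below uses a small hypothetical data set in which each distinct value is treated as one bin.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of a discrete variable; each distinct value is one bin.
data = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]

counts = Counter(data)
bins = sorted(counts)

absolute = [counts[b] for b in bins]             # n per bin
relative = [c / len(data) for c in absolute]     # proportion per bin (differential)
cumulative = list(accumulate(relative))          # running total of proportions

for b, a, r, c in zip(bins, absolute, relative, cumulative):
    print(f"bin {b:2d}: n={a}  proportion={r:.2f}  cumulative={c:.2f}")
```

The relative column always sums to 1, and the last cumulative bin is therefore always 1 as well.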
PROPERTIES OF A DISTRIBUTION II
• SHAPE
  - Tails: values far from central tendencies
  - Skewness: asymmetry caused by “uneven” distribution
  - Kurtosis: “flatness” or “peakedness” of curve
• MEASURES OF CENTRAL TENDENCY
  - Mode: value representing the greatest number of counts
  - Median: 50% of values above; 50% below
  - Mean: mathematical average, i.e., $\sum x_i / n$
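The three measures of central tendency can be compared on a deliberately skewed sample. A minimal sketch using Python's standard `statistics` module; the sample values are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: one large value drags the mean upward.
sample = [2, 3, 3, 3, 4, 5, 15]

print("mode:  ", mode(sample))     # most frequent value
print("median:", median(sample))   # middle value of the sorted data
print("mean:  ", mean(sample))     # sum of values divided by n
```

Here the mode and median are both 3 while the mean is 5: the long right tail pulls the mean away from the bulk of the data, which is one practical sign of skewness.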
THE NORMAL DISTRIBUTION
1. Two-tailed, bell-shaped, mathematically described
as “Gaussian”
2. Median, mode and mean are identical
SOME EXAMPLES
Plotting Distributions; Line Graph
[Figure: line graph of Percentage (0.000-0.275) vs. Bin Center (0.0-27.5).]
Plotting Distributions; Bar Graph
[Figure: bar graph of Percentage (0-18) vs. Bin Center (0-26).]
“Thinking on your feet”
• If you do a logarithmic transformation of a typically skewed distribution of a biological parameter from a population, should it be base 10 log or natural log?
  - Doesn't matter which one
• Would it make a difference and why (or why not)?
  - Doesn't matter which one. The two logs differ only by a constant factor (ln x ≈ 2.303 × log10 x), so the transformed distribution ends up with the same (normal) shape either way; only the values of the transformed parameter differ
• *Can you calculate percentiles without graphing your data? How?
  - Yes. Sort the values and use their ranks: the percentile of a value is the proportion of observations at or below it
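The claim that the log base does not matter can be checked numerically. The sketch below uses hypothetical right-skewed values and a hand-written `skewness` helper (population skewness, an assumption of this illustration) to compare the shape after each transformation.

```python
import math
from statistics import mean, pstdev

# Hypothetical right-skewed sample (values spanning two orders of magnitude).
values = [1.2, 1.5, 2.0, 2.4, 3.1, 4.8, 7.5, 12.0, 30.0, 95.0]

def skewness(xs):
    """Population skewness: mean cubed z-score (0 for a symmetric sample)."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

log10_values = [math.log10(v) for v in values]
ln_values = [math.log(v) for v in values]

print(f"raw skewness:   {skewness(values):.3f}")
print(f"log10 skewness: {skewness(log10_values):.3f}")
print(f"ln skewness:    {skewness(ln_values):.3f}")
```

Every natural-log value is the base-10 value multiplied by ln 10, so the two transformed samples have identical skewness; both are less skewed than the raw data.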
QUANTIFYING DISPERSION
Objectives:
1. Recognize why dispersion is described in quantitative terms.
2. Learn the various ways in which dispersion can be quantified.
3. Understand the purpose of each of these terms.
The RANGE
• Simplest description of variability
• Gives no indication of the manner in which a variable is distributed
• Can be misleading, e.g., the statement “values of x ranged from 0.1 to 10.1” describes both of these data sets:
  - 0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 10.1
  - 0.1, 9.8, 8.9, 10.0, 10.0, 9.5, 10.1
The VARIANCE
• Actually gives an indication of the distribution because it depends upon how far the point estimates are from the mean.
• The variance depends on
  - the difference between each point estimate and the mean (how far from the mean is each point?) and
  - the number of points (intuitively, shouldn't more points result in a better estimate of the degree of dispersion?): more points = better estimate
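The VARIANCE
The difference between each point and the mean, summed:
$(x_1-\bar{x}) + (x_2-\bar{x}) + (x_3-\bar{x}) + \dots + (x_n-\bar{x})$
or: $\sum (x_i-\bar{x})$, and then with each difference squared before summing: $\sum (x_i-\bar{x})^2$
divided by $n$ or $n-1$
WELL WHICH ONE????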

“BIASED” versus “UNBIASED”
The population vs. the sample
…remember the difference?
Estimates are called “biased” if they divide by your own sample size, n. Divide by n only if you have measured the entire population (→ population variance).
Otherwise, use Bessel's correction and divide by n-1 (→ sample variance).
* N for population; n for sample
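Python's standard `statistics` module offers both denominators, which makes the difference easy to see. The five measurements below are hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical sample of n = 5 measurements.
x = [4.0, 6.0, 7.0, 9.0, 14.0]

biased = pvariance(x)     # divides the sum of squares by n
unbiased = variance(x)    # divides by n - 1 (Bessel's correction)

print(f"divide by n:     {biased:.2f}")
print(f"divide by n - 1: {unbiased:.2f}")
```

The n-1 version is always the larger of the two, and the gap shrinks as the sample grows.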

The VARIANCE
So our first quantitative measurement of dispersion, the variance, S, is given by:
$S = \dfrac{\sum (x_i-\bar{x})^2}{n-1}$
We hardly ever use it!!!!

The STANDARD DEVIATION
What we most often use to describe dispersion is the standard deviation, called s (little “S”) and given by the square root of the variance:
$s = S^{1/2}$
or
$s = \left( \dfrac{\sum (x_i-\bar{x})^2}{n-1} \right)^{1/2}$
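The formula can be followed step by step in code. This is a sketch with a hypothetical sample; it keeps the slides' symbols (S for the variance, s for the standard deviation) and compares the hand calculation with the library function `statistics.stdev`.

```python
import math
from statistics import stdev

x = [4.0, 6.0, 7.0, 9.0, 14.0]   # hypothetical sample

n = len(x)
x_bar = sum(x) / n
sum_sq = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations

S = sum_sq / (n - 1)   # sample variance, in squared units of x
s = math.sqrt(S)       # standard deviation, in the same units as x

library_s = stdev(x)   # the standard library gives the same answer

print(f"variance S = {S:.2f}, standard deviation s = {s:.2f}")
print(f"statistics.stdev agrees: {math.isclose(s, library_s)}")
```

Because s is in the same units as the measurements, it is the easier of the two to report next to a mean.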
THE NORMAL DISTRIBUTION
1. Two-tailed, bell-shaped, mathematically described
as “Gaussian”
2. Median, mode and mean are identical
POPULATION vs. SAMPLE PARAMETERS
• μ vs. x̄
• σ vs. s or SD
WHAT’S A STANDARD DEVIATION?
• It's the distance either side of the mean within which 68.27% of all point estimates of a variable from a normally distributed sample will be found.
• 2 SD will include 95.45% of all points
• 3 SD will include 99.73% of all points
FROM YOUR SAMPLE!!!
Wouldn’t it be nice to get a good estimate of the dispersion in your POPULATION from these??
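The coverage figures for 1, 2, and 3 SD can be reproduced from the normal distribution itself. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage` is made up for the illustration.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within k SD of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.2%}")
```

These fractions hold only when the data really are normally distributed; for a skewed sample the same multiples of SD can cover quite different proportions.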
THE CENTRAL LIMIT THEOREM
• States that:
  - the means of a large number of samples from a given population will be approximately normally distributed,
  - the mean of the means of all these samples will approximate the population mean, and
  - the standard deviation of these sample means will approximate the population standard deviation divided by the square root of the sample size ($\sigma/\sqrt{n}$)
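The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical exponential distribution (strongly skewed, with mean 1 and standard deviation 1), and the number and size of the samples are arbitrary choices. It compares the SD of the sample means with the population SD divided by √n, the quantity that the standard error of the mean estimates.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

N_SAMPLES = 2000   # hypothetical number of repeated studies
n = 25             # hypothetical size of each sample

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print(f"mean of means: {mean_of_means:.3f}  (population mean = 1)")
print(f"SD of means:   {sd_of_means:.3f}  (sigma / sqrt(n) = {1 / n ** 0.5:.3f})")
```

Even though individual values from this population are far from normal, the sample means cluster symmetrically around 1 with a spread close to 0.2.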
THE “STANDARD ERROR”
• One implication of the CL theorem is that we can learn about dispersion in the population from which our sample derives if we can estimate the standard deviation of the sample means.
• We call this estimate the standard error of the mean, abbreviated SEM*
• SEM is calculated (very simply) by:
$\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$
* more strictly: $s_{\bar{x}}$
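The SEM calculation takes one line once the SD is known. A minimal sketch with a hypothetical sample; the function name `sem` is made up for the illustration.

```python
import math
from statistics import stdev

def sem(xs):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(xs) / math.sqrt(len(xs))

x = [4.0, 6.0, 7.0, 9.0, 14.0]   # hypothetical sample, n = 5

print(f"SD  = {stdev(x):.3f}")
print(f"SEM = {sem(x):.3f}")
```

The SEM is always smaller than the SD for n > 1, and it keeps shrinking as n grows even when the SD itself stays roughly constant.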
CONFIDENCE INTERVALS (CI)
• Thus SEM tells us something about our population
• We can use this information to calculate Confidence Intervals, which tell us the range within which the population mean is likely to lie
CONFIDENCE INTERVALS (CI)
• Answers the question: within what range of values can I expect the population mean to lie, with X% confidence? (Where X is the confidence level of the CI.)
• Can be calculated for different levels of confidence, e.g., 90%, 95%, 99% CI (usually 95%).
• So a 95% CI of 5-15 about a mean value of 10 means that I'm 95% confident that the population mean falls in the range 5 to 15.
• CIs can be estimated from the simple equation (1.96 is the normal-distribution multiplier, appropriate for large samples):
lower end of 95% CI = $\bar{x} - (1.96 \times \mathrm{SEM})$
upper end of 95% CI = $\bar{x} + (1.96 \times \mathrm{SEM})$
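The equation above can be sketched as a short function. The sample values and the function name `confidence_interval` are hypothetical; the multiplier is taken from the normal distribution, as on the slide, whereas for small samples a multiplier from the t distribution is usually preferred.

```python
import math
from statistics import NormalDist, mean, stdev

x = [9.1, 10.4, 9.8, 10.9, 10.2, 9.5, 10.6, 9.9]   # hypothetical sample

def confidence_interval(xs, level=0.95):
    """Normal-approximation CI for the mean: x_bar +/- z * SEM."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    x_bar = mean(xs)
    sem = stdev(xs) / math.sqrt(len(xs))
    return x_bar - z * sem, x_bar + z * sem

low, high = confidence_interval(x)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Calling `confidence_interval(x, level=0.99)` gives a wider interval: more confidence costs precision.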
