Basics in Statistics: Study Design and Estimation
Michael Edlinger
Medical Statistics, Informatics and Health Economics
INNSBRUCK MEDICAL UNIVERSITY
First session
Descriptive Statistics and Data Management
Basics Statistics 2, 2016/2017, ME
Objectives
Study Design and Estimation
Literature
Median or mean?
Measures of variability
Quantiles, cumulative distribution function, and box-plot
The 2 by 2 table
What is a confidence interval?
What is a significance test? General issues
Common significance tests
Contents
Probability theory
Estimation
Study design
Exercises
Test (separate)
Probability theory
Basics of sampling theory (warnings)
Probability
The probability (p) that a certain event (A) will take place: p(A)
0 ≤ p(A) ≤ 1
p(Ω) = 1, where Ω is the certain event (the complete sample space)
Probability
Calculation rules:
• complement rule: p(Ā) = 1 − p(A), where Ā is the event "not A"
• addition rule: p(B ∪ C) = p(B) + p(C) − p(B ∩ C)
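As a quick check of the two rules, here is a minimal sketch using one throw of a fair six-sided die. The die, the events A, B and C, and the helper `p` are illustrative assumptions and do not come from the slides.

```python
from fractions import Fraction

# Sample space: one throw of a fair six-sided die (illustrative assumption).
OMEGA = frozenset(range(1, 7))

def p(event):
    """Probability of an event (a subset of OMEGA) when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({6})          # "a six is thrown"
B = frozenset({2, 4, 6})    # "an even number is thrown"
C = frozenset({4, 5, 6})    # "a number above three is thrown"

# Complement rule: p(not A) = 1 - p(A)
p_not_A = 1 - p(A)

# Addition rule: p(B or C) = p(B) + p(C) - p(B and C)
p_B_or_C = p(B) + p(C) - p(B & C)

print(p_not_A, p_B_or_C)
```

Subtracting p(B and C) corrects for the outcomes 4 and 6, which would otherwise be counted twice.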
Bayes' theorem (German: Satz von Bayes)
Example
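The example shown on the slide did not survive extraction. As a stand-in, here is a minimal sketch of the standard form of the theorem, $p(A \mid B) = p(B \mid A)\,p(A) / p(B)$, applied to a diagnostic test. The prevalence, sensitivity and specificity are hypothetical numbers chosen for illustration, and the function name `posterior` is likewise an assumption.

```python
def posterior(prior, sensitivity, specificity):
    """p(disease | positive test) according to Bayes' theorem.

    prior        p(disease), e.g. the prevalence
    sensitivity  p(positive test | disease)
    specificity  p(negative test | no disease)
    """
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    # The denominator is p(positive test), summed over diseased and non-diseased.
    return true_positive / (true_positive + false_positive)

# Hypothetical values: prevalence 1%, sensitivity 90%, specificity 95%.
ppv = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"{ppv:.3f}")
```

With a rare disease, most positive results come from the large non-diseased group, so the probability of disease given a positive test stays well below the sensitivity.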
More examples
Distributions
Discrete probability distributions:
• Poisson
• Bernoulli
• binomial
• geometric
• negative binomial
Normal distribution
[Figure: density curve of a Normal distribution; vertical axis f(x), from 0.00 to 0.25]
Normal distribution
$X \sim N(\mu, \sigma^2)$, with $-\infty < X < \infty$
Standardisation: $Z = \dfrac{X - \mu}{\sigma}$, so that $Z \sim N(0, 1)$
Many practical applications, important characteristics, basis for
various test distributions
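A minimal sketch of standardisation, using `NormalDist` from Python's standard library. The population mean, standard deviation and observed value are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 120.0, 15.0   # hypothetical population mean and standard deviation
x = 150.0                 # hypothetical observed value

# Standardisation: Z = (X - mu) / sigma
z = (x - mu) / sigma

# The probability of a value below x is the same on both scales.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(z, round(p_standard, 4))
```

Because every Normal distribution can be mapped onto N(0, 1) in this way, a single table (or function) for the standard Normal distribution is enough.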
Questions ?
Estimation
Exploratory versus confirmatory research
Descriptive versus inferential research
Hypothesis testing
Hypothesis testing:
• H0: the null hypothesis that there is no effect
• H1: the alternative hypothesis that there is an effect
Hypothesis testing
Interpretation of p-values is problematic
Samples give results that differ from what is true in the population
H0 is tested by checking whether or not the p-value lies below the chosen
cut-off point (the significance level)
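To make the comparison with the cut-off point concrete, here is a minimal sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known, which is rarely the case in practice. All numbers, the cut-off of 0.05 and the function name `two_sided_p_value` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean = mu0, with known sigma."""
    se = sigma / sqrt(n)              # standard error of the mean
    z = (sample_mean - mu0) / se      # test statistic
    # Probability, under H0, of a statistic at least as extreme as the observed one.
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05   # chosen cut-off point (assumption)
p_value = two_sided_p_value(sample_mean=103.0, mu0=100.0, sigma=10.0, n=50)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

The p-value is calculated under the assumption that H0 is true, which is why it cannot be read as the probability that H0 is true.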
Hypothesis testing
Significance
Too often the custom is to treat the analysis, based on the p-value, as a
process for making a decision
Hypothesis testing
The problem
Hypothesis testing
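Common misinterpretations of the p-value:
• the probability of the data having arisen by chance
• the probability that the observed effect is not a real one
Correct reading: the probability of obtaining data at least as extreme as
those observed, assuming that H0 is true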
Questions ?
Effect size
Measurement of the occurrence of disease:
• prevalence: existing cases n / population N at a particular point in time
• incidence:
- cumulative incidence: new cases n in a time period / population at risk
- incidence rate: new cases n per unit of person-time, e.g. per 1000 person-years
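The three measures differ only in what is counted and what is used as the denominator. A minimal sketch follows; the function names and all counts are hypothetical and serve only as illustration.

```python
def prevalence(existing_cases, population):
    """Proportion of the population that has the disease at one point in time."""
    return existing_cases / population

def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the population at risk that develops the disease in the period."""
    return new_cases / population_at_risk

def incidence_rate_per_1000(new_cases, person_years):
    """Number of new cases per 1000 person-years of follow-up."""
    return 1000 * new_cases / person_years

# Hypothetical study: 5000 people, 150 already diseased at baseline,
# 40 new cases among the 4850 at risk, 19000 person-years of follow-up.
prev = prevalence(150, 5000)
cum_inc = cumulative_incidence(40, 4850)
rate = incidence_rate_per_1000(40, 19000)

print(prev, round(cum_inc, 4), round(rate, 2))
```

People who already have the disease at baseline are not at risk of becoming a new case, so they are left out of the denominator for the cumulative incidence.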
Precision
In a sample:
$se = \dfrac{\sigma}{\sqrt{n}}$
where:
se = standard error of the mean
$\sigma$ = standard deviation in the population
n = sample size
Precision
Continuous variable
$se(\bar{x}) = \dfrac{s}{\sqrt{n}}$
$se(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
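A minimal sketch of both standard errors for a continuous variable, using the sample standard deviation s from Python's `statistics.stdev`. The function names and the two groups of measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def se_mean_difference(values1, values2):
    """Standard error of the difference between two independent sample means."""
    return sqrt(stdev(values1) ** 2 / len(values1)
                + stdev(values2) ** 2 / len(values2))

# Hypothetical measurements in two independent groups.
group1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group2 = [4.4, 4.9, 4.2, 4.7, 4.5, 4.6]

print(mean(group1) - mean(group2), se_mean_difference(group1, group2))
```

The variances of the two means add up, which is why the difference between two means is less precise than either mean on its own.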
Precision
Dichotomy
$se(p) = \sqrt{\dfrac{p(1 - p)}{n}}$
$se(p_1 - p_2) = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
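The same idea for a dichotomous variable, as a minimal sketch. The function names and the proportions and sample sizes are hypothetical.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion p based on n observations."""
    return sqrt(p * (1 - p) / n)

def se_proportion_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical samples: 50% of 100 patients, and 30% versus 15% in two groups of 100.
se_single = se_proportion(0.5, 100)
se_diff = se_proportion_difference(0.30, 100, 0.15, 100)

print(se_single, se_diff)
```

For a given n the standard error is largest at p = 0.5, so that value gives a conservative figure when planning a sample size.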
Precision
Two features:
• the mean observed in a sample:
best estimate of the true value in the population
• the distribution of the means obtained in several samples:
approximately Normal-distributed for large samples
confidence interval:
a range of values which very likely includes the true value
Precision
95% confidence interval typically:
between $\bar{x} - 1.96 \, se$ and $\bar{x} + 1.96 \, se$
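A minimal sketch of this interval for a sample mean. It relies on the Normal approximation, so it suits large samples; for small samples the multiplier would normally come from the t-distribution instead of 1.96. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval_mean(values, level=0.95):
    """Normal-approximation confidence interval for a mean: mean -/+ z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    return m - z * se, m + z * se

# Hypothetical measurements.
values = [128, 135, 121, 140, 132, 126, 138, 130, 124, 136]
lower, upper = confidence_interval_mean(values)

print(round(lower, 1), round(upper, 1))
```

The interval is symmetric around the sample mean, and its width shrinks with the square root of the sample size.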
Statistical modelling
Statistical model: a mathematical relationship, between 2 or more
variables, that gives an approximate description of the data;
a simplification compatible with the data
Important:
1. certain assumptions must be fulfilled to be able to use a model
2. how well the model fits the data:
– systematic discrepancies
– soundness of prediction for an individual
Statistical modelling
Non-parametric methods
Questions ?
Analysis method
Scale:
• quantitative: discrete, continuous
• qualitative: dichotomous, nominal, ordinal
Parametric methods
One group of observations:
• mean and confidence interval
• one sample t-test
Non-parametric methods
One group of observations: Wilcoxon signed-rank test
Qualitative variables
One variable:
• proportion and confidence interval
• z-test
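A minimal sketch of a z-test for one proportion, based on the Normal approximation and therefore meant for reasonably large samples. The function name, the counts and the hypothesised proportion of 0.5 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: population proportion = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)    # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 100 patients respond; H0 states a proportion of 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)

print(round(z, 2), round(p_value, 4))
```

The standard error is calculated from p0 and not from the observed proportion, because the test statistic is evaluated under H0.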
a. cohort study:
• difference between proportions and confidence interval
• z-test
• relative risk (RR) and confidence interval
• χ²-test (with 1 d.f.)
• Fisher's exact test
b. case-control study:
• odds ratio (OR) and (exact) confidence interval
• χ²-test (with 1 d.f.)
• Fisher's exact test
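A minimal sketch of the relative risk and the odds ratio from a 2 by 2 table, each with an approximate 95% confidence interval calculated on the log scale. These are large-sample intervals, not the exact interval mentioned for the odds ratio. The cell labels a, b, c, d, the function names and the counts are hypothetical.

```python
from math import exp, log, sqrt

Z_95 = 1.96   # multiplier for an approximate 95% confidence interval

def relative_risk(a, b, c, d):
    """RR with approximate 95% CI.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, exp(log(rr) - Z_95 * se_log), exp(log(rr) + Z_95 * se_log)

def odds_ratio(a, b, c, d):
    """OR with approximate 95% CI (same cell layout as above)."""
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - Z_95 * se_log), exp(log(or_) + Z_95 * se_log)

# Hypothetical table: 30 of 100 exposed and 15 of 100 unexposed have the outcome.
rr, rr_low, rr_high = relative_risk(30, 70, 15, 85)
odds, or_low, or_high = odds_ratio(30, 70, 15, 85)

print(round(rr, 2), round(odds, 2))
```

In a case-control study the numbers of cases and controls are fixed by the design, so risks cannot be estimated directly and the odds ratio is used instead of the relative risk.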
Correlation - warning
[Figure: data with correlation coefficient r = 0.816; source: Wikipedia]
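The value r = 0.816 matches Anscombe's quartet, a well-known set of four data sets with nearly identical summary statistics but very different shapes. That the slide shows this quartet is an assumption. The sketch below computes the Pearson correlation for the first two data sets of the quartet; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet, data sets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)   # roughly linear relation with scatter
r_curved = pearson_r(x, y_curved)   # smooth curved relation

print(round(r_linear, 3), round(r_curved, 3))
```

Both data sets give practically the same r, although only the first is reasonably described by a straight line. A correlation coefficient should therefore always be read alongside a scatter plot.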
Questions ?
Study design
Most studies aim to answer simple questions
The data from a good study can be analysed in many ways, but not
much can afterwards compensate for study design problems
Classifications:
• experimental - observational
• prospective - retrospective: the way the data are collected
• longitudinal - cross-sectional: repeated observations over time or only one
Variation
Care is needed to make samples "representative" of, or suitable for,
the population
Classification
Experimental study: to investigate the effects of an intervention,
e.g.:
• clinical trials
• animal studies
• laboratory studies
Strong inferences, comparisons between groups
Experimental research is often prospective and longitudinal
Trials
Randomisation: allocation of treatments to patients at random
Trials
Random allocation:
• to prevent bias (confounding)
• statistical theory is based on the idea of random sampling
Types of randomisation:
• simple
• block
• stratified
• cluster
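A minimal sketch contrasting simple and block randomisation for a trial with two arms. The function names, the arm labels "A" and "B", the block size of 4 and the seed are illustrative assumptions. A real trial would use a concealed allocation list prepared in advance.

```python
import random

def simple_randomisation(n_patients, arms=("A", "B"), seed=None):
    """Each patient is allocated independently; group sizes may end up unequal."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_patients)]

def block_randomisation(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Every block contains each arm equally often, in random order."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]

allocation = block_randomisation(12, seed=2016)
print(allocation)
```

Block randomisation keeps the group sizes balanced throughout recruitment. Stratified randomisation can be sketched in the same way by running a separate block randomisation within each stratum.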
Observational studies
Randomisation often not an option
(unethical, impractical, expensive)
Observational studies
Case-control study
Advantages:
• simple, thus quick and cheap
• valuable when the condition is very rare
Main disadvantage:
• possible biases
Observational studies
Cohort study
Advantages:
• more possibilities to focus on subgroups and determinants
• data recording can be carefully controlled
Disadvantages:
• time-consuming, thus expensive
• selection of subjects not always obvious (clinic-based versus
population-based studies; natural history of disease)
• loss to follow-up: bias (loss related to outcome/determinant)
• surveillance bias: more intensive investigation of the high-risk group
Observational studies
Cross-sectional study
Advantages:
• no recall bias
• no loss to follow-up
• relatively cheap and easy
Problems:
• sample selection: representativeness
• response rates often quite low (volunteer bias)
• sequence in time: what is cause and what is effect?
Questions ?
Causal
Aim:
to causally explain the course of a disease as influenced by treatment
Causal
Aim:
to causally explain the occurrence of a disease from a determinant
Descriptive
Aim:
to predict the probability of presence of a disease
from clinical and non-clinical profile
Descriptive
Aim:
to predict the course of a disease
from clinical and non-clinical profile
SPSS session
• Compare means
• Crosstabs (incl. OR)
• Non-parametric tests
http://www.bmj.com/about-bmj/resources-readers/publications
• How to read a paper
• Epidemiology for the uninitiated
• Statistics at square one
Literature
Median or mean?
Measures of variability
Quantiles, cumulative distribution function, and box-plot
The 2 by 2 table
What is a confidence interval?
What is a significance test? General issues
Common significance tests
Consultation
Test
Questions ?
Exercises
Formulate a research question with an exposure and an outcome
Interpretation, reflection