BRM Data Analysis Techniques
Statistical Analysis: Key Terms
• Hypothesis
– Unproven proposition: a supposition that
tentatively explains certain facts or
phenomena.
– An assumption about the nature of the world.
• Null Hypothesis
– Statement that there is no difference between the sample and the population (i.e., no effect).
• Alternative Hypothesis
– Statement that indicates the opposite of the
null hypothesis.
Choosing the Appropriate Statistical Technique
Univariate analysis
Univariate analysis involves the examination
across cases of one variable at a time.
There are three major characteristics of a
single variable that we tend to look at:
– the distribution
– the central tendency
– the dispersion
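As a rough illustration (not part of the original slides), these three characteristics can be computed in Python with the standard library; the sample values below are invented:

```python
import statistics

# Hypothetical sample of one variable (e.g., 12 survey scores)
scores = [4, 5, 3, 4, 5, 2, 4, 3, 5, 4, 1, 4]

# Distribution: frequency of each distinct value
frequencies = {value: scores.count(value) for value in sorted(set(scores))}

# Central tendency: mean, median, mode
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion: range, variance, standard deviation
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # population variance (divide by N)
std_dev = statistics.pstdev(scores)

print(frequencies, mean, median, mode, value_range, variance, std_dev)
```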
Covariance and Variance

Cov(X, Y) = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / N

Var(X) = Σ(Xᵢ − X̄)(Xᵢ − X̄) / N = Σ(Xᵢ − X̄)² / N
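A minimal Python sketch of these two formulas (the data values are invented for illustration):

```python
def covariance(x, y):
    """Cov(X, Y) = sum((Xi - mean(X)) * (Yi - mean(Y))) / N"""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def variance(x):
    """Var(X) = Cov(X, X) = sum((Xi - mean(X))**2) / N"""
    return covariance(x, x)

# Hypothetical data
x = [2, 4, 6, 8, 10]
y = [1, 3, 5, 7, 11]
print(covariance(x, y), variance(x))
```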
Correlation (continued)
Scatter plot examples (figures on the original slides):
– Scatter Plot #1: moderate correlation
– Scatter Plot #2: negative correlation
– Scatter Plot #3: high correlation
– Scatter Plot #4: very low correlation
Correlation (continued)
C. How to compute a correlation coefficient?
• By hand:
– Definitional formula (the familiar one):
r = Cov(X, Y) / (S_X · S_Y)
– Computational formula (different but equivalent)
• By SPSS: Analyze → Correlate → Bivariate
Correlation Coefficient (r): Definitional Formula

r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

Correlation Coefficient (r): Computational Formula

r = [N ΣXY − (ΣX)(ΣY)] / √[ (N ΣX² − (ΣX)²) · (N ΣY² − (ΣY)²) ]
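The two formulas always give the same r; a short Python check, using invented data rather than anything from the slides:

```python
from math import sqrt

def r_definitional(x, y):
    # r = sum((X - mean)(Y - mean)) / sqrt(sum((X - mean)^2) * sum((Y - mean)^2))
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den

def r_computational(x, y):
    # r = (N*sum(XY) - sum(X)*sum(Y)) /
    #     sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2))
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    syy = sum(yi ** 2 for yi in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(r_definitional(x, y), r_computational(x, y))  # identical values
```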
Correlation (continued)
D. How to test correlation for significance?
1. Test the null hypothesis that the correlation is zero (r = 0)
2. Use a t-test:
t = r √(N − 2) / √(1 − r²),  with df = N − 2
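As a sketch (not SPSS output), the same t-test can be computed in Python with scipy; the r and N values below are assumed for illustration:

```python
from math import sqrt
from scipy import stats

r = 0.73   # hypothetical sample correlation
N = 20     # hypothetical sample size

df = N - 2
t = r * sqrt(N - 2) / sqrt(1 - r ** 2)

# Two-tailed p-value from the t distribution
p = 2 * stats.t.sf(abs(t), df)
print(t, df, p)
```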
Correlation (continued)
E. What are the assumptions/requirements of correlation?
1. Numeric variables (interval or ratio level)
2. Linear relationship between variables
3. Random sampling (for significance test)
4. Normal distribution of data (for significance test)
F. What to do if the assumptions do not hold
1. May be able to transform variables
2. May use ranks instead of scores
– Pearson Correlation Coefficient (scores)
– Spearman Correlation Coefficient (ranks)
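For instance, scipy provides both coefficients; a brief comparison on invented data where the relationship is monotonic but nonlinear:

```python
from scipy import stats

# Hypothetical scores; y is a monotonic but nonlinear function of x
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

pearson_r, pearson_p = stats.pearsonr(x, y)       # uses raw scores
spearman_rho, spearman_p = stats.spearmanr(x, y)  # uses ranks

print(pearson_r, spearman_rho)  # Spearman's rho is exactly 1.0 here
```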
Correlation (continued)
G. How to interpret correlations
1. Sign of coefficient?
2. Magnitude of coefficient (−1 ≤ r ≤ +1)
Usual Scale: (slightly different from textbook)
+1.00 perfect correlation
+.75 strong correlation
+.50 moderately strong correlation
+.25 moderate correlation
+.10 weak correlation
.00 no correlation (unrelated)
-.10 weak negative correlation
(and so on for negative correlations)
Correlation (continued)
G. How to interpret correlations (continued)
NOTE: A zero correlation may indicate that the relationship is nonlinear
(rather than no association between the variables)
H. Important to check the shape of the distribution: linearity,
lopsidedness (skew), and unusual “outliers”
– Scatterplots = usual method
– Line graphs (if the scatter plot is hard to read)
– May need to transform or edit the data (see the sketch below):
• Transformations to make the relationship more “linear”
• Exclusion or recoding of “outliers”
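One common transformation is the logarithm; a sketch, on invented exponential-looking data, of how a log transform can straighten a curved relationship (this example is not from the slides):

```python
import math
from scipy import stats

# Invented illustrative values: y grows roughly exponentially with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 4.3, 8.2, 16.5, 31.0, 66.0, 130.0, 255.0]

r_raw, _ = stats.pearsonr(x, y)                          # curved relationship understates r
r_log, _ = stats.pearsonr(x, [math.log(v) for v in y])   # log transform linearizes it

print(r_raw, r_log)  # r on log(y) is noticeably closer to 1
```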
Correlation (continued)
– Scatterplots vs. Line graphs (example)
Correlation (continued)
I. How to report correlational results?
1. Single correlations (r and significance - in text)
2. Multiple correlations (matrix of coefficients in a separate table)
– Note the triangular-mirrored nature of the matrix
Finally, decide whether you are doing a one-tailed or two-tailed test. In this
example, since there is no strong prior theory to suggest whether the relationship
between height and self-esteem would be positive or negative, we opt for the
two-tailed test.
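Purely to illustrate the mechanics (the r and N values are assumed, not taken from the slides), the two-tailed p-value is simply twice the one-tailed p-value for the same t:

```python
from math import sqrt
from scipy import stats

r, N = 0.73, 20            # assumed values for illustration
t = r * sqrt(N - 2) / sqrt(1 - r ** 2)

p_one_tailed = stats.t.sf(t, N - 2)    # H1: positive correlation only
p_two_tailed = 2 * p_one_tailed        # H1: correlation in either direction

print(p_one_tailed, p_two_tailed)
```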
Other Correlations
The specific type of correlation illustrated here is known as
the Pearson Product Moment Correlation.
It is appropriate when both variables are measured at an interval level.
However, there is a wide variety of other types of
correlations for other circumstances. For instance,
if you have two ordinal variables, you could use the
Spearman Rank Order Correlation (rho) or the Kendall Rank
Order Correlation (tau).
When one measure is a continuous interval-level one and
the other is dichotomous (i.e., two-category), you can use
the Point-Biserial Correlation.
For other situations, consult the web-based statistics
selection program, Selecting Statistics, at
http://trochim.human.cornell.edu/selstat/ssstart.htm.
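scipy also covers these alternatives; a brief sketch with invented data (the point-biserial case uses a dichotomous variable coded 0/1):

```python
from scipy import stats

# Invented ordinal ratings from two judges
judge_a = [1, 2, 3, 4, 5, 6, 7, 8]
judge_b = [2, 1, 4, 3, 6, 5, 8, 7]

rho, _ = stats.spearmanr(judge_a, judge_b)   # Spearman rank order (rho)
tau, _ = stats.kendalltau(judge_a, judge_b)  # Kendall rank order (tau)

# Invented dichotomous group (0/1) and a continuous measure
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [3.1, 2.8, 3.5, 3.0, 4.2, 4.8, 4.5, 4.1]
r_pb, _ = stats.pointbiserialr(group, score)  # point-biserial correlation

print(rho, tau, r_pb)
```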
Regression Analysis
• Simple (Bivariate) Linear Regression
– A measure of linear association that investigates straight-
line relationships between a continuous dependent variable
and an independent variable that is usually continuous, but
can be a categorical dummy variable.
• The Regression Equation (Y = α + βX )
– Y = the continuous dependent variable
– X = the independent variable
– α = the Y intercept (regression line intercepts Y axis)
– β = the slope coefficient (rise over run)
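A minimal sketch of estimating α and β in Python with scipy.stats.linregress; the variables and values are invented, not from the slides:

```python
from scipy import stats

# Invented data: X = advertising spend, Y = sales
x = [10, 20, 30, 40, 50, 60]
y = [25, 33, 41, 48, 59, 66]

result = stats.linregress(x, y)

alpha = result.intercept   # Y intercept (value of Y when X = 0)
beta = result.slope        # slope: change in Y per one-unit change in X

print(f"Y = {alpha:.2f} + {beta:.2f} * X")
```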
The Regression Equation
• Parameter Estimate Choices
– β is indicative of the strength and direction of the
relationship between the independent and dependent
variable.
– α (Y intercept) is a fixed point treated as a
constant (the predicted value of Y when X = 0)
• Standardized Regression Coefficient (β)
– Estimated coefficient of the strength of relationship
between the independent and dependent variables.
– Expressed on a standardized scale where higher
absolute values indicate stronger relationships
(range is from -1 to 1).
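In simple regression the standardized coefficient equals Pearson's r; one way to see this in Python (a sketch on the same kind of invented data, standardizing with z-scores):

```python
import statistics
from scipy import stats

x = [10, 20, 30, 40, 50, 60]
y = [25, 33, 41, 48, 59, 66]

def z_scores(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Regressing standardized Y on standardized X gives the standardized beta
std_beta = stats.linregress(z_scores(x), z_scores(y)).slope
r, _ = stats.pearsonr(x, y)

print(std_beta, r)  # identical in simple (one-predictor) regression
```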
Simple Regression Results Example