Unit 3
Descriptive Statistics and Hypothesis Testing with SPSS
©shumank deep
Definition
Purpose
Importance of Tabulation:
Steps for Tabulation in SPSS
Select Frequencies: Navigate to Analyze > Descriptive Statistics > Frequencies.
Choose Variables: Select the variables you want to analyze (e.g., Age or Income).
Add cumulative percentages and charts (e.g., histograms).
Export/Save Results: Export the results to a document or presentation format.
Interpreting a Frequency Table
Understand Column Layout:
Analyze Percentages:
• Example: “25% of respondents fall within the 25–34 age group.”
Cumulative Percentages:
• Useful for understanding data distribution thresholds.
• Example: “50% of respondents are under 35 years old.”
Spot Trends:
Highlight Outliers:
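The percentage and cumulative-percentage columns can be reproduced outside SPSS to check how they are built. The following is a minimal sketch in Python; the `age_groups` data and the `frequency_table` helper are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

# Hypothetical age-group responses (illustrative data, not from the slides).
age_groups = ["18-24", "25-34", "25-34", "35-44", "18-24", "25-34", "45+", "35-44"]

def frequency_table(values):
    """Return rows of (category, count, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent  # running total of the percent column
        rows.append((category, counts[category], percent, cumulative))
    return rows

for category, count, percent, cumulative in frequency_table(age_groups):
    print(f"{category:<6} {count:>3} {percent:6.1f}% {cumulative:6.1f}%")
```

The cumulative column always ends at 100%, which is a quick sanity check on any frequency table.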
Steps for Creating a Cross-Tabulation in SPSS
Open your dataset in SPSS.
Navigate to Analyze > Descriptive Statistics > Crosstabs.
Choose a column variable (e.g., Purchase Decision).
Customize the Table: Under Cells, select counts and percentages (row, column, or total).
Generate the Crosstab: Click OK to view the cross-tabulation in the SPSS output window.
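To see what the counts and row percentages in a cross-tabulation represent, the same table can be built by hand. This is a rough sketch in Python; the `records` data and the `crosstab` helper are assumptions made for illustration, not SPSS output.

```python
from collections import Counter

# Hypothetical survey records: (gender, purchase decision).
records = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "No"), ("Male", "Yes"),
    ("Female", "Yes"), ("Female", "Yes"), ("Female", "No"), ("Female", "Yes"),
]

def crosstab(pairs):
    """Map each row category to {column category: (count, row percent)}."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        row_total = sum(counts[(r, c)] for c in cols)
        table[r] = {c: (counts[(r, c)], 100 * counts[(r, c)] / row_total)
                    for c in cols}
    return table

table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```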
Chi-Square Test
• Example: Does gender influence purchase decisions?
Key Outputs
• Observed Counts: Actual data counts in each category combination.
• Expected Counts: The counts expected if no relationship exists.
• Pearson Chi-Square Value: Statistic indicating the difference between observed and expected counts.
Steps to Analyze
• Compare Observed and Expected Counts for notable differences.
• Look at the Chi-Square Value and associated P-value.
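The relationship between observed counts, expected counts, and the Pearson chi-square value can be sketched for a 2x2 table as follows. The counts are hypothetical, the `chi_square_2x2` helper is an illustration only, and the p-value formula used here is valid only for 1 degree of freedom. No continuity correction is applied.

```python
import math

# Hypothetical observed counts: rows = gender, columns = purchase (Yes, No).
observed = [[30, 20],
            [20, 30]]

def chi_square_2x2(table):
    """Return (expected counts, Pearson chi-square, p-value) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    # With 1 degree of freedom the chi-square tail area has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

expected, chi2, p_value = chi_square_2x2(observed)
print(expected, round(chi2, 3), round(p_value, 4))
```

With these counts every expected cell is 25, so the statistic is driven entirely by the gap of 5 between observed and expected in each cell.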
Interpreting Chi-Square Results
Measures of Location
Definition
• Measures of location describe the central point of a dataset.
• They summarize where most values in the data tend to cluster.
Key Measures
• Mean (Average):
• Sum of all data values divided by the number of values.
• Example: Average exam score.
• Median (Middle Value):
• The central value when data is sorted in ascending order.
• Example: Median income, which is less distorted by outliers than the mean.
• Mode (Most Frequent):
• The value that appears most often.
• Example: Most common shoe size.
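The three measures can be compared on one small dataset. The sketch below uses Python's standard `statistics` module; the `scores` list is hypothetical and includes one high value to show how the mean and median diverge.

```python
import statistics

# Hypothetical exam scores with one unusually high value (95).
scores = [55, 60, 60, 65, 70, 75, 95]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
```

Here the single high score pulls the mean above the median, which is why the median is often preferred for skewed data.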
Definition:
Example:
Definition: The value that appears most frequently.
Where to
common shoe sizes.
Mean: Best for symmetrically distributed data without extreme values.
Graphical Data Display
Purpose of Graphical Displays
• Summarize large datasets visually.
• Identify patterns, trends, and outliers.
• Enhance understanding and decision-making.
Common Types of Graphs
• Pie Charts
• Boxplots
• Histograms
Role of SPSS in Graphical Displays
• Easy creation and customization of visuals.
Pie Charts
Definition
• Circular chart showing proportions of categories.
When to Use
• Best for displaying parts of a whole (percentages).
• Example: Market share distribution among companies.
SPSS Steps
• Go to Graphs > Chart Builder.
• Choose Pie Chart and drag a variable to the sector.
• Click OK to generate the chart.
Interpretation
• Larger slices indicate higher proportions.
• Example: 50% of customers prefer Product A.
Histograms
Key Terms in a Boxplot
Median (Middle Line in the Box): The middle value of the data. Splits the data into two equal halves.
Whiskers (Lines Outside the Box): Show the smallest and largest values within a normal range. They do not include extreme or unusual values.
Outliers (Dots Outside the Whiskers): Values that are far away from the rest of the data. These are considered unusual.
Minimum and Maximum (Ends of Whiskers): The smallest and largest values that are still within the normal range.
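The boxplot terms can be made concrete with a small calculation. The sketch below assumes the common 1.5 × IQR rule for the "normal range" and Python's "inclusive" quartile method; SPSS may compute quartiles slightly differently, so treat the numbers as illustrative. The `values` data and `boxplot_summary` helper are hypothetical.

```python
import statistics

# Hypothetical data with one unusually large value (60).
values = [12, 15, 17, 18, 20, 21, 23, 24, 26, 60]

def boxplot_summary(data):
    """Return the median, quartiles, whisker ends, and outliers of the data."""
    data = sorted(data)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed rule: points beyond 1.5 * IQR from the box count as outliers.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # ends of the whiskers
        "outliers": outliers,
    }

summary = boxplot_summary(values)
print(summary)
```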
Boxplots in Action
When to Use Boxplots
Interpreting a Boxplot
Key Measures
• Range:
• Difference between the largest and smallest values.
When to Use Standard Deviation
• Use to understand the spread of data around the mean.
• Smaller SD: Data points are close to the mean.
• Larger SD: Data points are widely spread.
Illustration
• Dataset 1: {50, 52, 54, 56, 58} → Low SD (values close to the mean).
• Dataset 2: {30, 50, 70, 90, 110} → High SD (values spread out).
Key Insights
• Small variability: Data is consistent and predictable.
• Large variability: Data shows more diversity or uncertainty.
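The two illustration datasets can be checked numerically. This minimal sketch computes the range and the sample standard deviation (the n − 1 form) with Python's `statistics` module; the `spread` helper name is an assumption.

```python
import statistics

dataset_1 = [50, 52, 54, 56, 58]
dataset_2 = [30, 50, 70, 90, 110]

def spread(data):
    """Return (range, sample standard deviation) for a list of numbers."""
    return max(data) - min(data), statistics.stdev(data)

range_1, sd_1 = spread(dataset_1)
range_2, sd_2 = spread(dataset_2)
print(f"Dataset 1: range={range_1}, SD={sd_1:.2f}")
print(f"Dataset 2: range={range_2}, SD={sd_2:.2f}")
```

Both datasets are centered on different means, but the second has a standard deviation ten times larger because its values are spaced ten times further apart.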
How to Measure Variability in SPSS
Steps to Calculate Variability in SPSS
Open your dataset in SPSS.
Navigate to Analyze > Descriptive Statistics > Descriptives.
Select the variable(s) you want to analyze.
Check the option for Standard Deviation (selected by default).
Click OK to view the output table.
Introduction to Measures of Shape
Definition
• Measures of shape describe the distribution of data relative to symmetry and tail behavior.
Key Metrics
• Skewness: Indicates the symmetry of the distribution.
• Kurtosis: Describes the "tailedness" or concentration of data in the tails vs. the center.
Why Measures of Shape Matter
• They help identify:
• Asymmetry in data distribution.
• Presence of extreme values or heavy tails.
When to Use
• Use to check if data meets assumptions for statistical tests (e.g., normality).
• Compare distributions across datasets.
Definition of Skewness
Symmetrical Distribution:
• Skewness = 0.
• Right-Skewed (Positive Skew): Tail extends to the right (e.g., income data).
• Left-Skewed (Negative Skew): Tail extends to the left (e.g., retirement age).
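The sign of the skewness statistic can be checked on small examples. The sketch below uses the bias-adjusted sample skewness formula commonly reported by statistical packages; the `sample_skewness` helper and the datasets are hypothetical.

```python
import statistics

def sample_skewness(data):
    """Bias-adjusted sample skewness: n/((n-1)(n-2)) * sum(z**3)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD (n - 1 in the denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 20]    # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: tail to the left

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name}: skewness = {sample_skewness(data):.3f}")
```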
Kurtosis - Breaking It Down
• Sharp peak: Data is concentrated around the center.
• Flat peak: Data is more evenly spread out.
Tails of the Distribution
• Tails: Refers to the extreme values at the edges of the distribution.
• Heavy tails: More extreme values than normal.
• Light tails: Fewer extreme values than normal.
• Zero Excess Kurtosis (=0): Data has a peak and tails similar to a normal distribution (Mesokurtic).
• Example: Heights of adults in a population.
• Positive Excess Kurtosis (>0): Data has a sharp peak with heavy tails (Leptokurtic).
• Example: Extreme events like high stock market returns.
• Negative Excess Kurtosis (<0): Data has a flat peak with light tails (Platykurtic).
• Example: Data evenly spread, like grades in an easy test.
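The sign of excess kurtosis can be illustrated the same way. This sketch uses the bias-adjusted sample excess kurtosis formula commonly reported by statistical packages; the `excess_kurtosis` helper and both datasets are hypothetical.

```python
import statistics

def excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    z4 = sum(((x - mean) / sd) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

evenly_spread = list(range(1, 11))             # flat, light-tailed data
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly central, two extremes

print(f"evenly spread: {excess_kurtosis(evenly_spread):.2f}")  # negative
print(f"heavy-tailed:  {excess_kurtosis(heavy_tailed):.2f}")   # positive
```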
Definition
Key Terms
Importance
Types of Hypotheses
Null Hypothesis (H₀): Assumes no relationship or effect exists between variables.
Example: "There is no difference in test scores between students taught online and in-person."
Parametric Tests
Definition
• Parametric tests assume the data follows a specific distribution, typically normal.
• ANOVA: Compares means across multiple groups.
• Example: Comparing sales performance of three regions.
• Pearson Correlation: Measures the strength of the linear relationship between two variables.
• Example: Correlation between study hours and exam scores.
When to Use
• Data is continuous.
• Data follows a normal distribution.
• Variances are homogeneous across groups.
Advantages
• More powerful when assumptions are met.
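As an illustration of one parametric statistic, Pearson's correlation can be computed directly from its definition. The sketch below is a minimal hand-rolled version; the `pearson_r` helper and the study-hours data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical study hours and exam scores for five students.
hours = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, exam_scores)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.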
Definition
When to Use
Advantages
Parametric vs. Non-Parametric Tests
• Key Differences
Aspect        Parametric                          Non-Parametric
Assumptions   Assumes normal distribution         No distribution assumption
Data Type     Continuous, interval/ratio          Ordinal, categorical, or non-normal
Efficiency    More efficient with large samples   Works well with small samples
• Choosing the Right Test
• Use Parametric: When assumptions are met (e.g., normality).
• Use Non-Parametric: When data is skewed or assumptions are not met.
• Example Scenarios
• Parametric: Testing average sales between two regions.
• Non-Parametric: Testing customer satisfaction rankings between two stores.
Why Analyze Samples?
Importance
One-Sample Test
Purpose: Test if the sample mean differs significantly from a known value.
Example: Testing whether the average height of students is different from 5.5 feet.
One-Sample Test
1. Steps
• Navigate to Analyze > Compare Means > One-Sample T Test.
• Select the variable (e.g., test scores).
• Enter the test value (e.g., 50).
• Click OK.
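The statistic behind a one-sample t-test can be sketched by hand. The example below computes only the t value and degrees of freedom; the `test_scores` data and the `one_sample_t` helper are hypothetical, and the p-value is left to SPSS or a t-table.

```python
import math
import statistics

# Hypothetical test scores, compared against a test value of 50.
test_scores = [52, 55, 48, 60, 57, 53, 49, 58, 54, 56]
test_value = 50

def one_sample_t(data, mu):
    """Return (t statistic, degrees of freedom) for H0: mean == mu."""
    n = len(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    t = (statistics.mean(data) - mu) / standard_error
    return t, n - 1

t_stat, df = one_sample_t(test_scores, test_value)
print(f"t = {t_stat:.3f}, df = {df}")
# For df = 9, the standard two-tailed critical value at alpha = 0.05 is 2.262,
# so a |t| above that would be significant at p < 0.05.
```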
Two Independent Samples Test - Overview
Two Independent Samples Test in SPSS
Steps
Interpret Results
Paired Samples Test - Overview
Purpose: Compare means of the same group at two different times or under two conditions.
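A paired test works on the difference within each pair, which reduces it to a one-sample test on those differences. The sketch below shows that reduction; the `before` and `after` scores and the `paired_t` helper are hypothetical.

```python
import math
import statistics

# Hypothetical scores for the same five students before and after training.
before = [70, 65, 80, 75, 60]
after = [75, 70, 82, 80, 66]

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
# For df = 4, the standard two-tailed critical value at alpha = 0.05 is 2.776.
```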
Paired Samples Test in SPSS
Key Insights for All Tests
Threshold for Significance: Use p < 0.05 as the standard for determining significant results.
Effect Size: Evaluate practical significance using effect size metrics (e.g., Cohen’s d).
Visualize Results: Use charts (e.g., bar charts, boxplots) to illustrate findings.
Understanding Cohen’s d
What is Cohen’s d?
• A measure of effect size that shows the difference between two group averages in terms of standard deviation units.
• Helps to understand the practical importance of a difference.
Example Scenario:
• Comparing the effectiveness of two teaching methods on student test scores.
Definition
Effect Size
Types of Effect Sizes
4. Key Thresholds
When and How to Use Effect Size
Calculate
• Pearson’s r: Correlation coefficient for relationships.
• Eta-Squared: Proportion of variance explained in ANOVA.
Example Interpretations
• Cohen’s d = 0.6: Moderate difference in test scores between two teaching methods.
• Pearson’s r = 0.4: Moderate positive correlation between hours studied and exam scores.
Perform a T-Test:
SPSS
• Group 1: Mean = 75, SD = 10
• Group 2: Mean = 70, SD = 12
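• Approximation: Difference = 75 − 70 = 5; Average SD = (10 + 12) ÷ 2 = 11
• Cohen’s d ≈ 5 ÷ 11 ≈ 0.45 (approaching a medium effect; Cohen’s conventional benchmarks are 0.2 small, 0.5 medium, 0.8 large).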
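The same calculation can be done with a pooled standard deviation instead of a simple average of the two SDs. This is a minimal sketch that assumes equal group sizes; the `cohens_d` helper name is hypothetical.

```python
import math

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using a pooled SD, assuming both groups have equal size."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

d = cohens_d(75, 10, 70, 12)
print(f"Cohen's d = {d:.3f}")
```

The pooled SD here is about 11.05, so the result is very close to the 0.45 obtained with the simple average of 11.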
Alternative Tools:
Analysis of Variance (ANOVA)
What is One-Way ANOVA?
A statistical test used to determine if there are significant differences between the means of three or more groups.
Assumptions
• Data is normally distributed.
• Observations are independent.
Key Terms in One-Way ANOVA
Within-Group Variance: Variability within each group.
Perform One-Way ANOVA in SPSS
Navigate in SPSS
Define Variables
Additional Options
ANOVA Table
Results
Post Hoc Tests
• Provide pairwise comparisons to identify specific group differences.
• Example: Tukey test indicates Group A > Group B.
Effect Size (Eta-Squared)
• Small: 0.01
• Medium: 0.06
• Large: 0.14
Example
Scenario: Comparing test scores across three teaching methods: Traditional, Online, and Hybrid.
Results
Post Hoc Test: Online scores > Traditional and Hybrid.
Interpretation: Teaching method significantly affects test scores.
Conclusion: Online method shows significantly higher scores.
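A scenario of this kind can be sketched numerically to show where the F statistic and eta-squared come from. The scores below are invented for illustration and are not the data behind the slide; the `one_way_anova` helper is hypothetical.

```python
# Hypothetical test scores for three teaching methods (illustrative only).
groups = {
    "Traditional": [70, 72, 68, 74, 71],
    "Online": [80, 82, 78, 85, 80],
    "Hybrid": [73, 75, 72, 76, 74],
}

def one_way_anova(samples):
    """Return (F statistic, eta-squared) for a dict of group -> scores."""
    all_values = [x for values in samples.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = ss_within = 0.0
    for values in samples.values():
        group_mean = sum(values) / len(values)
        # Between-group: how far each group mean is from the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within-group: variability of scores around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in values)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

f_stat, eta_squared = one_way_anova(groups)
print(f"F = {f_stat:.2f}, eta-squared = {eta_squared:.3f}")
```

A large F means the group means differ by much more than the scatter within groups would suggest; a post hoc test is still needed to say which groups differ.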
Limitations and Best Practices
Limitations Best Practices SPSS Advantages
Introduction to Analysis of Covariance (ANCOVA)
What is ANCOVA?
• Combines ANOVA and regression to evaluate mean differences between groups while controlling for a covariate.
Purpose
• Adjusts the dependent variable by removing the influence of a covariate.
• Increases precision by reducing variability caused by the covariate.
Key Applications
• Evaluating the effect of an intervention while accounting for preexisting differences (e.g., comparing teaching methods while controlling for prior knowledge).
Key Components of ANCOVA
Dependent Variable
• The outcome being measured (e.g., test scores).
Independent Variable
• The categorical factor being tested (e.g., teaching method).
Covariate
• A continuous variable that influences the dependent variable (e.g., prior knowledge).
Assumptions
• Linearity between covariate and dependent variable.
• Homogeneity of regression slopes.
• Normal distribution and homogeneity of variance.
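How a covariate "adjusts" group means can be shown with a small calculation. The sketch below estimates one pooled within-group slope (relying on the homogeneity-of-slopes assumption) and shifts each group mean to the overall covariate mean. The data and the `adjusted_means` helper are hypothetical, and this covers only the adjusted means, not the full ANCOVA significance test.

```python
# Hypothetical (prior knowledge, test score) pairs for two teaching methods.
groups = {
    "A": [(1, 2), (2, 4), (3, 6)],
    "B": [(3, 7), (4, 9), (5, 11)],
}

def adjusted_means(samples):
    """Return covariate-adjusted outcome means for each group."""
    all_x = [x for pairs in samples.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)
    sxy = sxx = 0.0
    means = {}
    for label, pairs in samples.items():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        means[label] = (mean_x, mean_y)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression slope
    # Shift each group's mean outcome to the overall covariate mean.
    return {label: mean_y - slope * (mean_x - grand_x)
            for label, (mean_x, mean_y) in means.items()}

adjusted = adjusted_means(groups)
print(adjusted)
```

In this example the raw group means differ by 5 points, but after adjusting for prior knowledge the gap shrinks to 1 point, since group B started with higher prior knowledge.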
Steps to Perform ANCOVA in SPSS
Prepare Data
• Ensure the covariate and dependent variable are continuous.
Set Variables
• Dependent Variable: Outcome (e.g., test scores).
• Fixed Factor: Group (e.g., teaching method).
Benefits
• Controls for extraneous variables, improving accuracy.
• Reduces error variance, increasing statistical power.
Limitations
• Misinterpretation risk if covariates are not independent of the independent variable.