Social Work Research and Statistics July 18 2023 Quevedo
- Collection of data
- Analysis of data
- Interpretation of data
- Presentation of data
Population
Population Sample
What’s a Variable?
➢ a characteristic that varies from one
person or thing to another, i.e., it is any
characteristic that varies from one
individual member of the population
to another.
How do variables vary across cases?
Quantitative Variable
- Numerical variable
- assumes data which are expressed in numerical values or obtained from
counting or measurement.
Qualitative Variable
- Categorical Variable
- assumes data which are classified according to kind or characteristic by
which they differ.
Levels of Measurement – first proposed by the American psychologist Stanley Smith Stevens in 1946.
Qualitative
- Nominal Scale
- Ordinal Scale
Methods of Collecting Data
- Direct or Interview
- Indirect or Questionnaire
- Registration
- Observation
- Experiment
- Test
Methods of Presenting Data
Textual Method
Tabular Method - classifying and arranging data in a table.
Semi-Tabular
Graphical Presentations
Bar Graphs
Line Graph
Pie Chart
Pictograph
Map Graph or Cartogram
Sampling
Reasons for Sampling
- Save Money
- Save Product
Types of Sampling
Probability sampling:
• Simple
   • Lottery
   • Table of Random Numbers
• Stratified
• Systematic
• Cluster
• Multi-stage
Non-probability sampling:
• Convenience
• Judgment
• Quota
• Snowball
Simple Random Sampling
Third Year 40
Cumulative frequency
Class Interval   Frequency   Midpoint   < cf
83 - 94              4         88.5      60
71 - 82              7         76.5      56
59 - 70             14         64.5      49
47 - 58             19         52.5      35
35 - 46             11         40.5      16
23 - 34              5         28.5       5
Class size is the difference between the upper class boundary and the lower class boundary of a class interval.
Example: i = 94.5 - 82.5 = 12
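The midpoint and less-than cumulative frequency columns can be rebuilt from the class limits and frequencies alone. The following is a minimal sketch, assuming the six classes shown in the table; the function name `frequency_table` is illustrative and not from the slides.

```python
# Class limits and frequencies copied from the frequency table, lowest class first.
classes = [(23, 34, 5), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 7), (83, 94, 4)]

def frequency_table(rows):
    """Return (lower, upper, frequency, midpoint, less-than cf) for each class."""
    table = []
    running_total = 0
    for lower, upper, freq in rows:
        running_total += freq              # less-than cumulative frequency
        midpoint = (lower + upper) / 2     # average of the two class limits
        table.append((lower, upper, freq, midpoint, running_total))
    return table

table = frequency_table(classes)

# Class size: upper class boundary minus lower class boundary of one interval.
lower_limit, upper_limit, _ = classes[0]
class_size = (upper_limit + 0.5) - (lower_limit - 0.5)

for row in reversed(table):                # print highest class first, as on the slide
    print(row)
print("class size:", class_size)
```

The highest class should carry a cumulative frequency equal to the total number of cases (60 here), which is a quick check that no frequency was mistyped.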
Construction of Frequency Distribution
Frequency Polygon
• is a linear graph where frequencies of each interval
are assumed concentrated at the midpoint of the
interval.
Ogive
• is a line graph of cumulative frequency distribution.
• points plotted correspond to class boundaries and
cumulative frequencies.
The reasons for constructing a frequency
distribution are as follows:
Midrange
Mean ( X̄ ) - a mathematical representation of the typical value of a series of numbers. It is commonly referred to as "average" or as "arithmetic mean".
Median ( Md ) - a positional measure and the middlemost value in the distribution.
Mode ( Mo ) - the value or item in a distribution which occurs most frequently or has the highest frequency.
For Ungrouped Data
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where:
X_i = individual score
n = sample size
Example. Find the mean in the following set of measurements:
16, 23, 17, 2, 23, 25, 17, 17, 20, 23
Solution.
$$\bar{X} = \frac{16 + 23 + 17 + 2 + 23 + 25 + 17 + 17 + 20 + 23}{10} = \frac{183}{10} = 18.3$$
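The same computation can be checked in a few lines of Python. This is only a sketch using the ten measurements from the example; the variable names are illustrative.

```python
from statistics import mean

# The ten measurements from the example.
scores = [16, 23, 17, 2, 23, 25, 17, 17, 20, 23]

# Mean for ungrouped data: sum of the scores divided by the sample size.
sample_mean = sum(scores) / len(scores)

# The standard library gives the same value.
library_mean = mean(scores)

print(sample_mean, library_mean)
```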
For Ungrouped Data (weighted mean)
$$\bar{X} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
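A weighted mean multiplies each value by its weight, adds the products, and divides by the sum of the weights. The sketch below uses hypothetical grades and unit weights that are not from the slides; they only illustrate the formula.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical data: three course grades weighted by their unit loads.
grades = [1.5, 2.0, 1.25]
units = [3, 5, 2]

weighted_average = weighted_mean(grades, units)
print(weighted_average)
```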
For Ungrouped Data
1. Arrange the data in a particular order.
2. Identify the middle value which is the
median in the distribution.
Example. Find the median in the following set of measurements:
a) 16, 23, 17, 2, 23, 25, 17, 20, 23
b) 85, 90, 60, 84, 65, 60, 86, 83, 76, 74
Solution.
a) 2, 16, 17, 17, 20, 23, 23, 23, 25
Md = 20
b) 60, 60, 65, 74, 76, 83, 84, 85, 86, 90
$$Md = \frac{76 + 83}{2} = 79.5$$
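The two steps (arrange, then pick the middle) can be sketched as follows, using the two data sets from the example. The helper name `median_of` is illustrative.

```python
def median_of(data):
    """Sort the data, then return the middle value (or the mean of the two middle values)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd n: single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two middle values

set_a = [16, 23, 17, 2, 23, 25, 17, 20, 23]
set_b = [85, 90, 60, 84, 65, 60, 86, 83, 76, 74]

median_a = median_of(set_a)
median_b = median_of(set_b)
print(median_a, median_b)
```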
For Ungrouped Data
Look for the value that occurs most
frequently in the distribution.
Example. Find the mode in the following set of measurements:
a) 15, 20, 30, 15, 20, 20, 24, 32, 18
b) 3.5, 2.5, 3.0, 2.5, 2.5, 2.0, 3.0, 2.0, 3.0, 1.8
c) 5, 8, 10, 12, 13, 7, 6, 11, 14, 15
d) 1, 1, 2, 2, 3, 3, 4, 4, 5
e) 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
Solution.
a) Mo = 20 (unimodal)
b) Mo = 2.5, 3.0 (bimodal)
c) Mo = Does not exist.
d) Mo = 1, 2, 3, 4 (multimodal)
e) Mo = Does not exist.
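The five cases above follow one rule: count each value, keep the values with the highest count, and report no mode when every value occurs equally often. The sketch below encodes that rule; treating "all counts equal" as "does not exist" is the convention these solutions appear to use, and the function name `modes` is illustrative.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes, or an empty list when no mode exists."""
    counts = Counter(data)
    highest = max(counts.values())
    # If several distinct values all share the same count, no value stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

result_a = modes([15, 20, 30, 15, 20, 20, 24, 32, 18])
result_b = modes([3.5, 2.5, 3.0, 2.5, 2.5, 2.0, 3.0, 2.0, 3.0, 1.8])
result_c = modes([5, 8, 10, 12, 13, 7, 6, 11, 14, 15])
result_d = modes([1, 1, 2, 2, 3, 3, 4, 4, 5])
result_e = modes([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
print(result_a, result_b, result_c, result_d, result_e)
```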
Measures of Center
Mean – Median – Mode
➢ The mean is easy to compute. You only deal with one
number. It is not so with the median.
➢ The mean is affected by outliers while the median is
resistant. In a sense, the median is able to resist the pull of a
far away value, but the mean is drawn to such values.
➢ A change in any of the numbers changes the mean, and the
mean can be changed drastically by changing an extreme
value.
➢ In contrast, the median and the mode of a set of data are
usually not changed by changing an extreme value.
➢ The mean, the median, and the mode are all averages;
however, they are generally not equal.
Measures of Center
Which measure of center is most useful?
1.) A shoe manufacturer wants to know the average shoe size of
women.
Answer: _____________
2.) Another teacher wants to know how well her class performed
in a long test.
Answer: ______________
3.) A teacher wants to know about her students' family situation.
She asks for the number of children in their families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
Mean = ___ Median = _____ Mode = ____
Answer: ______________
Measures of Center
Which measure of center is most useful?
1.) A shoe manufacturer wants to know the average shoe size of
women.
Answer: Mode (practical)
2.) Another teacher wants to know how well her class performed in
a long test.
Answer: Median (due to outliers)
3.) A teacher wants to know about her students' family situation.
She asks for the number of children in their families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
Mean = 2.79 Median = 2.5 Mode = 2
Answer: Mean (when the three are almost the same, choose the
mean, especially because the data are on an interval/ratio scale.)
Measures of Center
Compare the mean, the median, and the mode for the
salaries of 5 employees of a small company.
Mean = P101,200
Median = P 36,000
Mode = P 20,000
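The gap between the three values shows how one extreme salary pulls the mean upward while the median and mode stay put. The salary list below is hypothetical (the slide's own data did not survive); it was chosen only because it reproduces the three summary values quoted above.

```python
from statistics import mean, median, mode

# Hypothetical salaries of 5 employees (not the slide's data), in pesos.
salaries = [20000, 20000, 36000, 40000, 390000]

salary_mean = mean(salaries)      # pulled up by the single large salary
salary_median = median(salaries)  # middle value of the sorted list
salary_mode = mode(salaries)      # most frequent salary

# Replace the extreme value and recompute: the mean moves, the median does not.
adjusted = [20000, 20000, 36000, 40000, 44000]
adjusted_mean = mean(adjusted)
adjusted_median = median(adjusted)

print(salary_mean, salary_median, salary_mode)
print(adjusted_mean, adjusted_median)
```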
describe the
characteristic of
a set of data
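Variance & Standard Deviation
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Sample Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$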
Example. Find the variance and standard deviation in the following
set of measurements:
2, 4, 4, 4, 5, 5, 7, 9
Solution.
$$\bar{X} = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5$$
For Variance:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{32}{7} \approx 4.57$$
For Standard Deviation:
$$s = \sqrt{\frac{32}{7}} = \sqrt{4.57} \approx 2.14$$
On average, the scores deviate from the mean of 5 by about 2,
which implies a bracket of values ranging from 5 - 2 to
5 + 2, that is, from 3 to 7.
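The worked example can be reproduced step by step. This is a minimal sketch using the eight measurements above; it also shows the population versions (dividing by N instead of n - 1) for comparison.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n                                # mean

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

sample_variance = sum_sq_dev / (n - 1)               # divide by n - 1 for a sample
sample_sd = sqrt(sample_variance)

population_variance = sum_sq_dev / n                 # divide by N for a population
population_sd = sqrt(population_variance)

print(x_bar, sum_sq_dev)
print(round(sample_variance, 2), round(sample_sd, 2))
print(population_variance, population_sd)
```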
Range - is the simplest and the easiest to compute among the
measures of variability.
3, 5, 1, 2, 7, 4, 1, 2, 1, 4, 2, 5
Solution.
1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 7
Range = 7 – 1 = 6
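As a quick check of the solution, the range is the largest value minus the smallest. A short sketch with the twelve values above:

```python
values = [3, 5, 1, 2, 7, 4, 1, 2, 1, 4, 2, 5]

# Range: highest value minus lowest value.
data_range = max(values) - min(values)
print(sorted(values), data_range)
```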
Coefficient of Variation
$$z = \frac{x - \mu}{\sigma}$$
where
z is the "z-score" (Standard Score)
x is the value to be standardized
μ is the mean
σ is the standard deviation
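$$P_1 = \left(\frac{1N}{100}\right)^{th} \qquad P_{25} = \left(\frac{N}{4}\right)^{th} \qquad P_{50} = \left(\frac{N}{2}\right)^{th}$$
$$Q_1 = \left(\frac{1N}{4}\right)^{th} \qquad Q_2 = \left(\frac{N}{2}\right)^{th} \qquad Q_3 = \left(\frac{3N}{4}\right)^{th}$$
$$D_1 = \left(\frac{1N}{10}\right)^{th} \qquad D_3 = \left(\frac{3N}{10}\right)^{th} \qquad D_5 = \left(\frac{N}{2}\right)^{th}$$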
Solution:
$$Q_2 = \frac{N}{2} = \frac{23}{2} = 11.5^{th} = 100$$
$$D_4 = \frac{4N}{10} = \frac{4(23)}{10} = \frac{92}{10} = 9.2^{th} = 100$$
$$P_{25} = \frac{N}{4} = \frac{23}{4} = 5.75^{th} = 98$$
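The positions in the solution all come from one rule: the k-th of m equal parts sits at position kN/m in the ordered data. The sketch below computes only the positions for N = 23; the scores themselves are not reproduced here, so the looked-up values (100, 100, 98) are not recomputed. The function name `position` is illustrative.

```python
def position(k, parts, n):
    """Position of the k-th quantile when n ordered scores are split into `parts` equal parts."""
    return k * n / parts

n = 23
q2_position = position(2, 4, n)      # second quartile
d4_position = position(4, 10, n)     # fourth decile
p25_position = position(25, 100, n)  # 25th percentile

print(q2_position, d4_position, p25_position)
```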
Measure of Relative Position
Percentiles and Quartiles
are useful when you want to know where the score is located
in reference to the other scores.
➢ Percentile is a data value for which the specified percentage
of the data is below that value.
➢ The median is the 50th percentile.
➢ The 25th, 50th , 75th percentiles divide the data into lower
quartile Q1, middle quartile Q2, and upper quartile Q3,
respectively.
➢ In using quartiles, there are five numbers to be used
altogether: min value, Q1, median, Q3, and max value.
➢ Quartiles are useful for box plots.
Normal Distribution
Data can be "distributed" (spread out) in different ways.
$$sk_m = \frac{\sum (x - \bar{x})^3}{(n - 1)\, s^3}$$
If skm > 0, the distribution is positively skewed.
If skm < 0, the distribution is negatively skewed.
If skm = 0, the distribution is symmetrical.
Kurtosis is a measure of the degree of peakedness or flatness of the
distribution.
Types of Kurtosis:
1) Leptokurtic (tall distribution) - is symmetrical in shape but the
center peak is much higher; that is, there is a higher frequency of
values near the mean.
2) Platykurtic (flat distribution)- is one in which most of the
values share about the same frequency of occurrence. As a result,
the curve is very flat, or plateau-like.
3) Mesokurtic (normal distribution) - is a symmetrical distribution that is neither very peaked nor very flat.
$$k_m = \frac{\sum (x - \bar{x})^4}{(n - 1)\, s^4}$$
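Both moment coefficients share one shape: the sum of deviations from the mean raised to a power (3 for skewness, 4 for kurtosis), divided by (n - 1) times the sample standard deviation raised to the same power. The sketch below assumes that reading of the slide formulas; the data sets are hypothetical.

```python
from math import sqrt

def moment_coefficient(data, power):
    """Sum of (x - mean)^power divided by (n - 1) * s^power, with s the sample SD."""
    n = len(data)
    x_bar = sum(data) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    return sum((x - x_bar) ** power for x in data) / ((n - 1) * s ** power)

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail
left_tailed = [1, 9, 10, 10, 10]   # hypothetical data with a long left tail

skew_symmetric = moment_coefficient(symmetric, 3)
skew_right = moment_coefficient(right_tailed, 3)
skew_left = moment_coefficient(left_tailed, 3)
kurtosis_symmetric = moment_coefficient(symmetric, 4)

print(skew_symmetric, skew_right, skew_left, kurtosis_symmetric)
```

A coefficient of zero for the symmetric set, a positive one for the right-tailed set, and a negative one for the left-tailed set match the three skewness rules stated earlier.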
$$z = \frac{x - \mu}{\sigma}$$
where
z is the "z-score" (Standard Score)
x is the value to be standardized
μ is the hypothesized mean
σ is the standard deviation
$$z = \frac{x - \mu}{\sigma} = \frac{64.1 - 66.5}{2.4} = -1$$
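The standardization can be written as a one-line function. This sketch reuses the numbers from the example (x = 64.1, μ = 66.5, σ = 2.4); the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (positive) or below (negative) mu."""
    return (x - mu) / sigma

z = z_score(64.1, 66.5, 2.4)
print(round(z, 2))
```

A negative result means the value lies below the hypothesized mean.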
Types of Tests:
1) One-tailed Test – a test of
statistical hypothesis where the
region of rejection is on only
one side of the sampling
distribution.
2) Two-tailed Test – a test of
statistical hypothesis where the
region of rejection is on both sides
of the sampling distribution.
2. Which of the following statements exhibits a non-predictive
alternative hypothesis?
a) H1: µ = µo
b) H1: µ ≠ µo
c) H1: µ > µo
d) H1: µ < µo
Concepts of Hypothesis Testing
1) p-Value of a Test
The p-value of a test provides a measure of how much
statistical evidence exists to support the alternative hypothesis.
Interpreting the p-value
➢ If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
[Figure: p-value scale. Surviving region labels: "Weak Evidence (Not Significant)" and "No Evidence (Not Significant)". Example shown: p-value = .0359]
7. A political analyst surveys a random sample of registered
voters from District 1 in Davao City and compares the results
with those obtained from a random sample of registered
voters from District 2. This would be an example of what
type of test?
a) independent samples
b) independent samples only if the sample sizes are equal
c) dependent samples
d) dependent samples only if the sample sizes are equal
8. Which of the following assumptions can be made for the
t-test for the difference between the means of two
independent populations?
One-Way Classification
H0 : μ1 = μ2 = … = μk
H1 : at least two of the means are not equal.
9. Which of the following is NOT a required condition for
one-way ANOVA?
a) The sample sizes must be equal.
b) The populations must all be normally distributed.
c) The population variances must be equal.
d) The samples for each treatment must be selected
randomly and independently.
10. One-way ANOVA is applied to independent samples
taken from three normally distributed populations with
equal variances. Which of the following is the null
hypothesis for this procedure?
a) H0: μ1 + μ 2 + μ 3 = 0
b) H0: μ1 + μ 2 + μ 3 ≠ 0
c) H0: μ1 = μ 2 = μ 3 = 0
d) H0: μ1 = μ 2 = μ 3
Measures of Relationship
o Statistics are widely used in the social sciences in
making predictions which are based upon the fact
that two variables are related.
o The process of obtaining the measure of the degree
of relationship or association between variables is
called correlation analysis.
o When a known measure of one variable is used to
make estimates of a second variable, the process is
known as regression analysis.
Regression Analysis
o Regression analysis is the process by which one
variable Y is predicted from another variable X.
o The variable Y is called the dependent variable and
X is called the independent variable or the predictor.
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}$$
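The intercept a depends on the slope b, whose formula did not survive on this slide. The sketch below assumes the standard least-squares slope, b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²), and then applies a = Ȳ - bX̄. The data are hypothetical and the function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Standard least-squares slope (assumed; not shown on the slide).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of Y minus b times mean of X.
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

a, b = least_squares_line(x_values, y_values)
predicted_at_6 = a + b * 6   # predicted Y when X = 6
print(a, b, predicted_at_6)
```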
Correlation Analysis
o Correlation analysis is used to measure the linear
relationship or association between two variables.
o The measure of the degree of association between
two variables is known as the coefficient of
correlation (r).
o The value of r varies from –1 to +1. This can be
expressed as the interval –1 ≤ r ≤ 1.
o For perfectly positive correlation, r = 1, while in a
perfectly negative correlation, r = –1 .
o If r = 0, then there is no linear relation existing
between the two variables.
Correlation Analysis
o A positive correlation is present when high values in one
variable are associated with high values of another variable or
vice versa.
o On the other hand, when high values on one variable are
associated with low values of the other variable or vice versa,
a negative correlation is present.
Correlation Analysis
o The degree of linear relationship can be interpreted
by using the following range of values:
Range of Value of r Description
0.90 to 1.00 or (-0.90 to -1.00) Very high positive (negative) correlation
0.70 to 0.89 or (-0.70 to -0.89) High positive (negative) correlation
0.50 to 0.69 or (-0.50 to -0.69) Moderate positive (negative) correlation
0.30 to 0.49 or (-0.30 to -0.49) Low positive (negative) correlation
0.00 to 0.29 or ( 0.00 to -0.29) Little, if any correlation
Correlation Analysis
Pearson Product Moment Correlation Coefficient
o is a measure of the linear correlation (dependence) between
two variables X and Y, giving a value between +1 and −1
inclusive, where 1 is total positive correlation, 0 is no
correlation, and −1 is total negative correlation.
o is widely used in the sciences as a measure of the degree of
linear dependence between two variables. It was developed
by Karl Pearson from a related idea introduced by Francis
Galton in the 1880s.
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
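The computational formula needs only five sums: ΣX, ΣY, ΣXY, ΣX² and ΣY². The following sketch applies it to hypothetical paired data; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation via the computational (raw-score) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
r_perfect_negative = pearson_r([1, 2, 3], [6, 4, 2])
print(round(r, 4), r_perfect_negative)
```

On the interpretation scale given earlier, the first result (about 0.77) would be described as a high positive correlation.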
11. Assuming that a linear relationship exists between Age
(X) and Job Satisfaction (Y), if the coefficient of
correlation (r) equals -0.95, what does this mean?
a) there is very weak correlation.
b) if the value of X is low, the value of Y is high.
c) the value of X is always greater than the value of Y.
d) if the value of X is high, so is the value of Y.
12. A regression analysis between weight (y in pounds) and
height (x in inches) resulted in the following least squares
line: y = 120 + 5x. This implies that if the height is
increased by 1 inch, the weight is expected to which of
the following?
a) increase by 1 pound.
b) decrease by 1 pound.
c) increase by 5 pounds.
d) increase by 24 pounds.
13. In the simple linear regression model, what does the
y-intercept represent?
a) change in y per unit change in x.
b) change in x per unit change in y.
c) value of y when x = 0.
d) value of x when y = 0.
All these tests are however similar in that they provide decision-
making information about the population and all are based upon the
difference between the observed sample frequencies and some
expected or theoretical frequencies of a population.
The Chi-Square Test, χ²
The Test for Goodness-of-Fit
To determine if a set of observed data corresponds to some
theoretical distribution, a chi-square goodness-of-fit test is
performed. It is used to determine whether a set of observed
frequencies of one variable is the same as the expected frequencies
on the same variable.
The basic formula for the chi-square is
$$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$$
where OF = observed frequency
EF = expected frequency
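To make the formula concrete, the sketch below computes a goodness-of-fit statistic for one variable with three categories. The counts (68, 27, 5) and reported percentages (74%, 16%, 10%) are borrowed from the firearm-related deaths item that appears later in this deck; the function name is illustrative, and the critical value still has to be read from a chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (OF - EF)^2 / EF over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [68, 27, 5]                 # accidental, homicide, suicide counts
proportions = [0.74, 0.16, 0.10]       # reported distribution
total = sum(observed)

# Expected frequency = reported proportion times the total number of cases.
expected = [p * total for p in proportions]

chi_square = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # one variable: number of categories minus 1

print(round(chi_square, 3), degrees_of_freedom)
```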
The Chi-Square Test, χ²
Degrees of Freedom
The number of degrees of freedom is based on the number of cells in
the contingency table. The formula for degrees of freedom is
df = (c – 1)(r – 1)
where c is the number of columns and r is the number of rows in the
contingency table.
If c = 1, df = (r – 1) or if r = 1, df = (c – 1)
To obtain the critical value for the chi-square, use the chi-square
distribution table. The tabular value can be obtained by getting the
intersection of the level of significance and the degrees of freedom.
The Chi-Square Test, χ²
Computing Expected Frequencies
                                      Total Row
Observed          A      B      C         X
Frequency         D      E      F         Y
                  G      H      I         Z
Total Column      Q      R      S         T
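The slide's formula for the expected frequencies did not survive. The sketch below assumes the usual rule for a contingency table: the expected frequency of a cell is its row total times its column total, divided by the grand total (for cell A in the layout above, X × Q / T). The observed counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Hypothetical observed frequencies: 2 rows by 3 columns.
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)

# Degrees of freedom: (columns - 1) * (rows - 1).
df = (len(observed[0]) - 1) * (len(observed) - 1)

# Chi-square statistic summed over every cell.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

print(expected, df, round(chi_square, 3))
```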
Tests                              Choosing a parametric test             Choosing a non-parametric test
Correlation test                   Pearson                                Spearman
Independent measures, 2 groups     Independent-measures t-test            Mann-Whitney test
Independent measures, >2 groups    One-way, independent-measures ANOVA    Kruskal-Wallis test
Repeated measures, 2 conditions    Matched-pair t-test                    Wilcoxon test
Repeated measures, >2 conditions   One-way, repeated measures ANOVA       Friedman's test
Other Non-Parametric Tests
Spearman's rank correlation coefficient or Spearman's
rho,
• named after Charles Spearman and often denoted by the
Greek letter ρ (rho), is a non-parametric measure of
statistical dependence between two variables.
• It assesses how well the relationship between two variables
can be described using a monotonic function.
• Spearman's coefficient, like any correlation calculation, is
appropriate for both continuous and discrete variables,
including ordinal variables.
• The Spearman correlation coefficient is defined as
the Pearson correlation coefficient between the ranked
variables.
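The last point suggests a direct way to compute ρ: replace each value by its rank, then apply the Pearson formula to the ranks. The sketch below does that, giving tied values the average of the ranks they occupy (a common convention, assumed here). The data and function names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1      # average of positions pos..end, 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2, sy2 = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation between the ranked variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: Y always increases with X, but not in a straight line.
x_values = [10, 20, 30, 40, 50]
y_values = [1, 4, 9, 16, 100]

rho = spearman_rho(x_values, y_values)
r_linear = pearson_r(x_values, y_values)
tied_ranks = average_ranks([10, 20, 20, 30])
print(rho, round(r_linear, 3), tied_ranks)
```

Because the relationship is perfectly monotonic, ρ reaches 1 even though the Pearson r on the raw values stays below 1.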
Other Non-Parametric Tests
Mann–Whitney U test
• is a non-parametric test of the null hypothesis that two
populations are the same against an alternative hypothesis,
especially that a particular population tends to have larger
values than the other.
• It has greater efficiency than the t-test on non-normal
distributions, such as a mixture of normal distributions, and it
is nearly as efficient as the t-test on normal distributions.
• It was named after Henry Berthold Mann and Donald
Ransom Whitney.
Other Non-Parametric Tests
Kruskal–Wallis one-way analysis of variance by ranks
(named after William Kruskal and W. Allen Wallis) is a non-
parametric method for testing whether samples originate
from the same distribution.
• It is used for comparing two or more samples that are
independent, and that may have different sample sizes, and
extends the Mann-Whitney U test to more than two groups.
• The parametric equivalent of the Kruskal-Wallis test is
the one-way analysis of variance (ANOVA).
Other Non-Parametric Tests
Wilcoxon signed-rank test
• is a non-parametric statistical hypothesis test used when
comparing two related samples, matched samples, or
repeated measurements on a single sample to assess
whether their population mean ranks differ (i.e. it is a paired
difference test).
• It can be used as an alternative to the paired Student t-
test, t-test for matched pairs, or the t-test for dependent
samples when the population cannot be assumed to
be normally distributed.
• The test is named for Frank Wilcoxon (1892–1965) who, in a
single paper, proposed both it and the rank-sum test for two
independent samples.
Other Non-Parametric Tests
Friedman test
• is a non-parametric statistical test developed by
the U.S. economist Milton Friedman.
• Similar to the parametric repeated measures ANOVA, it is
used to detect differences in treatments across multiple test
attempts.
• The procedure involves ranking each row (or block) together,
then considering the values of ranks by columns.
16. A researcher read that firearm-related deaths for people
aged 1 to 18 were distributed as follows: 74% were
accidental, 16% were homicides, and 10% were
suicides. In her city, there were 68 accidental deaths, 27
homicides, and 5 suicides during the past year. What
statistical test should she use if she wants to test the
claim that the percentages are equal?
a) t-test on dependent sample
b) Anova
c) chi-square
d) pearson r
17. In a large department store, the owner wishes to see
whether the number of shoplifting incidents per day will
change if the number of uniformed security officers is
doubled. The number of shoplifting incidents was
recorded for 7 days before security was increased and for 7
days after the increase. The owner wants to find out if
there is a difference in the number of shoplifting
incidents before and after the increase in security. What
test will he perform?
a) Kruskal-Wallis
b) Spearman rho
c) Mann-Whitney
d) Wilcoxon signed-rank test