CH 8 Data Analysis
DATA ANALYSIS
Before analyzing your research data, it is important to know what type of data you have,
because the type of data determines the analysis technique you can use.
The following figure summarizes the types of data used in research.
1 01-09-2024
8.1.1. Quantitative data can be divided into
two distinct groups:
A. Categorical and
B. Numerical
A. Categorical data
quantities.
Categorical data can be further sub-divided into:
1. Nominal data - whose values cannot be measured numerically
or ranked. Rather, these data simply count the
number of occurrences in each category of a variable.
Examples of nominal variables:
Where a person lives (AA, Adama, B/Dar, etc.)
Gender (male, female)
Nationality (American, Ethiopian, Chinese)
Ethnicity (Oromo, Amhara, Tigre, Gurage…)
2. Ranked/Ordinal data - whose values can be placed in rank order
Examples of ordinal data
degree, Masters)
Agreement (strongly disagree, disagree, neutral, agree, strongly agree)
Descriptive data with only two categories are known as
dichotomous data.
E.g. gender can be divided into female and male.
B. Numerical Data
Coding the Data
Coding is the process of translating information gathered from
questionnaires or other sources into something that can be
analyzed.
It involves assigning a value to the information given; often the value is
given a label.
Coding can make data more consistent
0=No 1=Yes
(1 = value assigned, Yes = label of value)
OR: 1=No 2=Yes
When you assign a value, you must also make it clear what that value
means.
In the first example above, 1=Yes, but in the second example 1=No.
1 = Northeast
2 = South
3 = Northwest
4 = Midwest
5 = Southwest
Order does not matter; no ordered value is associated with each
response.
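The coding schemes above can be sketched as small codebooks that map each stored value to its label. This is only an illustration: the names YES_NO_LABELS, REGION_LABELS, encode and decode, and the sample answers, are hypothetical and not part of the original slides.

```python
# Hypothetical codebooks: each maps a stored value to its label.
YES_NO_LABELS = {0: "No", 1: "Yes"}
REGION_LABELS = {1: "Northeast", 2: "South", 3: "Northwest",
                 4: "Midwest", 5: "Southwest"}


def encode(responses, labels):
    """Translate text responses into coded values using a value -> label codebook."""
    # Invert the codebook so that a label such as "Yes" looks up its value.
    value_of = {label.lower(): value for value, label in labels.items()}
    # Normalising case and spaces is what makes the coded data consistent.
    return [value_of[r.strip().lower()] for r in responses]


def decode(values, labels):
    """Translate coded values back into their labels."""
    return [labels[v] for v in values]


raw_answers = ["Yes", "no", "YES", "No "]  # made-up questionnaire answers
coded_answers = encode(raw_answers, YES_NO_LABELS)
print(coded_answers)                   # [1, 0, 1, 0]
print(decode([1, 5], REGION_LABELS))   # ['Northeast', 'Southwest']
```

Keeping the codebook next to the data makes it clear what each value means, whichever scheme (0/1 or 1/2) is chosen.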
Coding: Continuous Variables
Creating categories from a continuous variable (e.g. age) is
common.
A continuous variable may be broken down into chosen categories,
creating an ordinal categorical variable.
Example: variable = AGE
1 = 0–9 years old
2 = 10–19 years old
3 = 20–39 years old
4 = 40–59 years old
5 = 60 years or older
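A minimal sketch of the AGE coding above, assuming ages are recorded in whole years. The function name code_age and the sample ages are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of categories 2 to 5 in the AGE example (10, 20, 40, 60).
AGE_LOWER_BOUNDS = [10, 20, 40, 60]


def code_age(age):
    """Return the ordinal code (1 to 5) for an age in years."""
    if age < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many lower bounds the age has reached.
    return bisect_right(AGE_LOWER_BOUNDS, age) + 1


ages = [4, 10, 19, 20, 39, 40, 59, 60, 83]  # made-up ages
print([code_age(a) for a in ages])  # [1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Because the categories keep their natural order, the resulting codes form an ordinal variable.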
8.2. Types of Data Analysis
Data analysis is the process of inspecting, cleaning, transforming, and modelling data.
A. Measures of Central Tendency
1. Mode
Mode can be defined as the most frequently occurring value in a
group of observations.
If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
then the mode would be 39, because a score of 39 occurs three times, more often than any other score.
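The mode of these scores can be checked with Python's standard library. This is a small sketch; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

# Count how often each score occurs, then pick the most frequent one.
frequencies = Counter(scores)
most_common_score = mode(scores)

print(most_common_score)               # 39
print(frequencies[most_common_score])  # 3
```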
2. Median
Median is defined as the middle value in an ordered arrangement
of observations.
measurements.
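As an illustration, the median of the scores used in the mode example can be computed as follows. With an even number of observations there is no single middle value, so the two middle values are averaged. The shortened list odd_scores is a hypothetical variation added only to show the odd case.

```python
from statistics import median

scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

# 14 observations: the median is the average of the 7th and 8th values (38 and 39).
median_even = median(scores)

# Dropping the last score leaves 13 observations: the median is the 7th value.
odd_scores = scores[:-1]
median_odd = median(odd_scores)

print(median_even)  # 38.5
print(median_odd)   # 38
```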
Bivariate Analysis/Relationships between Variables
They try to ascertain how two variables are related to each other.
There are three different types of chi-square analysis:
1. Chi-square test for goodness of fit
2. Chi-square test for homogeneity
3. Chi-square test for independence
The first is used to see if the sample has been drawn from
the population, and the second to see if populations are
homogeneous with respect to a given characteristic.
The first two are not common, and we will focus on the third
type of test.
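A rough sketch of the chi-square statistic for a contingency table, assuming the third type referred to here is the chi-square test for independence. The 2x2 counts are made up for illustration, and the function name chi_square_independence is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Made-up counts: rows = two groups, columns = two response categories.
observed_table = [[30, 20],
                  [20, 30]]
chi2, df = chi_square_independence(observed_table)
print(chi2, df)  # 4.0 1
```

With 1 degree of freedom the critical value at the 5% level is 3.841, so a statistic of 4.0 would lead to rejecting independence for these made-up counts.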
8.2.1.2. Correlation Analysis
Correlation is a measure of the relationship between two variables. It has wide
application in business and statistics.
The correlation coefficient describes the direction of the correlation, that is,
whether it is
• Positive or
• Negative,
And the strength of the correlation, that is, whether an existing correlation is:
• Strong or
• Weak.
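A minimal sketch of the Pearson correlation coefficient, showing how its sign gives the direction and its size gives the strength. The data are made up, and the 0.5 cut-off between "weak" and "strong" is only an illustrative assumption, not a rule from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def describe(r, strong_cutoff=0.5):
    """Label direction and strength; the cut-off is an assumed convention."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return direction, strength


x = [1, 2, 3, 4, 5]  # made-up values of the first variable
y = [2, 4, 5, 4, 5]  # made-up values of the second variable
r = pearson_r(x, y)
print(round(r, 4))   # 0.7746
print(describe(r))   # ('positive', 'strong')
```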
8.2.1.3. Bi-variate regression analysis
Regression is one of the most frequently used techniques in business and
social research.
The linear regression equation takes the
following form:
Y = β0 + β1X + ε
Variables:
X = Independent Variable (we provide this)
Y = Dependent Variable (we observe this)
Parameters:
β0 = Y-Intercept
β1 = Slope
ε = error term
Note: β1 indicates the change in the dependent variable for
every unit change in the independent variable.
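As a sketch, the intercept β0 and slope β1 can be estimated by ordinary least squares; the error term ε is whatever the fitted line leaves unexplained. The data and the function name fit_line are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope: change in Y per unit change in X
    b0 = mean_y - b1 * mean_x   # intercept: fitted value of Y when X = 0
    return b0, b1


x = [1, 2, 3, 4, 5]  # made-up independent variable
y = [2, 4, 5, 4, 5]  # made-up dependent variable
b0, b1 = fit_line(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6

# Residuals are the estimates of the error term for each observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

Here a slope of 0.6 means that the fitted value of Y rises by 0.6 for every one-unit increase in X.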
Regression coefficient
The unstandardized coefficient can be used in the equation as
R values
R represents the correlation between the observed values and the
research.
R square is the square of R and gives the proportion of variance in the
dependent variable that is explained by the independent variable(s).
Therefore the adjusted R square that takes into account these things and
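A small sketch of R square and adjusted R square, assuming the usual adjustment for the sample size n and the number of independent variables k. The data are made up, and the coefficients 2.2 and 0.6 are taken as given from a least-squares fit to them.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the model."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


def adjusted_r_squared(r2, n, k):
    """Adjust R square for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


x = [1, 2, 3, 4, 5]                        # made-up independent variable
y = [2, 4, 5, 4, 5]                        # made-up dependent variable
predicted = [2.2 + 0.6 * xi for xi in x]   # fitted values, coefficients assumed

r2 = r_squared(y, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
print(round(r2, 4), round(adj_r2, 4))  # 0.6 0.4667
```

The adjusted value is lower than R square, and the gap grows as the sample gets smaller or more independent variables are added.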
2. Multicollinearity
How to know the presence of multicollinearity?
1. If the Variance Inflation Factor (VIF) > 5, which means the Tolerance is < 0.2, as
tolerance is the inverse of VIF.
2. If any two independent variables have a variance proportion in excess of 0.9 (column value)
corresponding to any row in which the condition index is in excess of 30.
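The first check can be sketched for the simplest case of two independent variables, where the VIF of each equals 1 / (1 - r²) and r is the correlation between them. With more independent variables, r² would be replaced by the R square from regressing one independent variable on all the others. The data and function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Return (VIF, tolerance) when the model has exactly two predictors."""
    r2 = pearson_r(x1, x2) ** 2
    vif = 1 / (1 - r2)  # fails if the predictors are perfectly collinear
    return vif, 1 / vif


x1 = [1, 2, 3, 4, 5]                 # made-up predictor
x2_mild = [2, 4, 5, 4, 5]            # moderately related to x1
x2_close = [1.1, 2.0, 2.9, 4.2, 5.0]  # almost a copy of x1

for x2 in (x2_mild, x2_close):
    vif, tolerance = vif_two_predictors(x1, x2)
    print(round(vif, 2), round(tolerance, 3), "problem" if vif > 5 else "ok")
```

The first pair gives a VIF of 2.5 (tolerance 0.4), below the threshold of 5; the second pair gives a VIF far above 5 and would be flagged.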
8.2.2. Multivariate Analysis
In many real life situations, it becomes necessary to analyse
analysis techniques.
8.2.2.1. Multiple linear Regression
In simple regression, there is one dependent variable and one
independent variable.
regression.
y = a + b1x1 + b2x2 + … + bkxk
Where: y is the dependent variable, x1, x2, … xk are the independent variables, a is
the Y intercept, and b1, b2, … bk are the regression coefficients.
Note: All the conditions and tests above are common in case of
multivariate analysis too.
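A rough sketch of how the intercept a and the coefficients b1 … bk of a multiple linear regression can be estimated by least squares, here by solving the normal equations with plain Gaussian elimination. The function names and data are hypothetical; the data were constructed so that y = 1 + 2·x1 + 3·x2 exactly, which makes the result easy to check.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - known) / m[i][i]
    return coef


def fit_multiple(predictors, y):
    """Return [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk."""
    n = len(y)
    # A column of ones lets the intercept be estimated like any coefficient.
    columns = [[1.0] * n] + [[float(v) for v in col] for col in predictors]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


x1 = [1, 2, 3, 4, 5]     # made-up first independent variable
x2 = [2, 1, 4, 3, 6]     # made-up second independent variable
y = [9, 8, 19, 18, 29]   # built as 1 + 2*x1 + 3*x2
a, b1, b2 = fit_multiple([x1, x2], y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

In practice a statistics package would be used for this; the sketch only shows what the estimated a and b values in the equation correspond to.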