
CH 8 Data Analysis

Chapter Eight discusses data analysis, outlining the importance of understanding data types, which include categorical and numerical data, before applying statistical techniques. It explains various coding methods for data consistency and describes different types of data analysis, including univariate, bivariate, and multivariate analysis, along with specific statistical methods such as regression and correlation. The chapter emphasizes the significance of these analyses in deriving insights and making informed decisions based on the data.

Uploaded by Abdiman Habibo
Copyright © All Rights Reserved

CHAPTER EIGHT

DATA ANALYSIS

Data for Analysis

• Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data.

Before analyzing the data for your research, it is important to know the type of data you have at hand, because the type of data determines which technique you can use.
The following figure shows the types of data used in research.
1 01-09-2024
8.1.1. Quantitative data can be divided into two distinct groups:

A. Categorical and
B. Numerical
A. Categorical data

• These are data whose values cannot be measured numerically as quantities.
• Categorical data can be further sub-divided into:
1. Nominal data: data whose values cannot be measured numerically or ranked. Rather, these data simply count the number of occurrences in each category of a variable.
Examples of nominal variables:
Where a person lives (AA, Adama, B/Dar, etc.)
Gender (male, female)
Nationality (American, Ethiopian, Chinese)
Ethnicity (Oromo, Amhara, Tigre, Gurage…)
2. Ranked/Ordinal data: data whose values can be ranked in order.
• Examples of ordinal data:
• Education (Elementary school, High school, College Diploma, College degree, Masters)
• Agreement (strongly disagree, disagree, neutral, agree, strongly agree)
• Rating (poor, fair, good, excellent)
• Frequency (never, sometimes, often, always)
• Any other scale ("On a scale of 1 to 5...")
• Descriptive data with only two categories are known as dichotomous data.
• E.g. gender can be divided into female and male.
• Questions with a 'yes' or 'no' response are another example.
B. Numerical Data
• Numerical data, sometimes termed 'quantifiable', are those whose values are measured or counted numerically as quantities.
• Numerical data can be analysed using a far wider range of statistics than categorical data.

Coding the Data
• Coding: the process of translating information gathered from questionnaires or other sources into something that can be analyzed.
• It involves assigning a value to the information given; the value is often given a label.
• Coding can make data more consistent.
• Example: Question = Sex
• Answers = Male, Female, M, or F
• Coding will avoid such inconsistencies.

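To make the idea concrete, here is a minimal Python sketch of coding the Sex question. It assumes the raw answers arrive as free-text strings and uses an illustrative codebook (0 = Male, 1 = Female); the names `SEX_CODES`, `SEX_LABELS`, and `code_sex` are hypothetical, not part of any statistics package.

```python
# Hypothetical codebook: every spelling we expect maps to one numeric code.
SEX_CODES = {"male": 0, "m": 0, "female": 1, "f": 1}
# Value labels make it clear what each code means.
SEX_LABELS = {0: "Male", 1: "Female"}


def code_sex(answer: str) -> int:
    """Translate a raw answer such as 'M' or ' female ' into its numeric code."""
    key = answer.strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"Uncodable answer: {answer!r}")
    return SEX_CODES[key]


raw_answers = ["Male", "Female", "M", "F", " female "]
codes = [code_sex(a) for a in raw_answers]
print(codes)                           # [0, 1, 0, 1, 1]
print([SEX_LABELS[c] for c in codes])  # ['Male', 'Female', 'Male', 'Female', 'Female']
```

Raising an error for an unrecognized answer, rather than guessing, keeps uncodable responses visible so they can be checked by hand.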

Coding Systems
• Common coding systems (code and label) for dichotomous variables:
0 = No, 1 = Yes
(1 = value assigned, Yes = label of value)
OR: 1 = No, 2 = Yes
• When you assign a value, you must also make it clear what that value means.
• In the first example above 1 = Yes, but in the second example 1 = No.
• As long as it is clear how the data are coded, either is fine.

Coding: Ordinal Variables
• The coding process is similar to that for other categorical variables.
• Example: variable EDUCATION, possible coding:
0 = Did not graduate from high school
1 = High school graduate
2 = Some college or post-high school education
3 = College graduate
• It could also be coded in reverse order (0 = college graduate, 3 = did not graduate from high school).
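A small Python sketch of the EDUCATION codebook, assuming the codes are consecutive integers; `EDUCATION` and `reverse_code` are illustrative names. Reverse coding maps each code c to (lowest code + highest code) - c, so the order flips while each label stays attached to the right category.

```python
# Codebook for EDUCATION as listed in the text (code -> label).
EDUCATION = {
    0: "Did not graduate from high school",
    1: "High school graduate",
    2: "Some college or post-high school education",
    3: "College graduate",
}
MIN_CODE, MAX_CODE = min(EDUCATION), max(EDUCATION)


def reverse_code(code: int) -> int:
    """Flip the ordinal order: 0 becomes 3, 1 becomes 2, and so on."""
    return MIN_CODE + MAX_CODE - code


# Rebuild the codebook under the reversed scheme.
reversed_labels = {reverse_code(c): label for c, label in EDUCATION.items()}
print(reversed_labels[0])  # College graduate
print(reversed_labels[3])  # Did not graduate from high school
```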


Coding: Nominal Variables
For coding nominal variables, order makes no difference.
• Example: variable RESIDENCE
1 = Northeast
2 = South
3 = Northwest
4 = Midwest
5 = Southwest
• Order does not matter: no ordered value is associated with each response.
Coding: Continuous Variables
Creating categories from a continuous variable (e.g. age) is common.
• A continuous variable may be broken down into chosen categories by creating an ordinal categorical variable.
• Example: variable = AGE
1 = 0–9 years old
2 = 10–19 years old
3 = 20–39 years old
4 = 40–59 years old
5 = 60 years or older
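The AGE categories above can be assigned with a simple lookup. This Python sketch assumes ages are recorded in whole years; `code_age` and `AGE_BANDS` are hypothetical names, and fractional ages (e.g. 9.5) would need an explicit rounding rule first.

```python
# (inclusive upper bound in years, AGE code) for each closed category.
AGE_BANDS = [(9, 1), (19, 2), (39, 3), (59, 4)]
OPEN_ENDED_CODE = 5  # 60 years or older


def code_age(age: int) -> int:
    """Return the ordinal AGE code (1-5) for an age in whole years."""
    if age < 0:
        raise ValueError("age cannot be negative")
    for upper, code in AGE_BANDS:
        if age <= upper:
            return code
    return OPEN_ENDED_CODE


ages = [4, 19, 20, 45, 60, 83]
print([code_age(a) for a in ages])  # [1, 2, 3, 4, 5, 5]
```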

8.2. Types of Data Analysis
• Data analysis is the process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
• Data analysis can be made using:
(i) Descriptive Statistics
(ii) Inferential Statistics
• Descriptive statistics are used to describe, summarize, or explain a given set of data.
• Inferential statistics are used to infer characteristics of a population from a sample.
8.2.1. Univariate Analysis
• Univariate analysis describes a single variable in terms of the applicable unit of analysis.
• Measures of central tendency and measures of dispersion are the typical categories of univariate analysis.

A. Measures of Central Tendency
• The three most frequently used measures of central tendency are:
• Mode
• Median and
• Mean

1. Mode
• The mode can be defined as the most frequently occurring value in a group of observations.
• If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
• Then the mode would be 39, because a score of 39 occurs three times, more than any other score.
• The mode is a very good measure for ascertaining the location of a distribution in the case of nominal data.
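The mode can be found by counting occurrences. This Python sketch returns every value tied for the highest count, since a distribution may have more than one mode; `modes` is an illustrative helper (the standard library's `statistics.multimode` behaves similarly). It works for nominal data such as place of residence as well as for numbers.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there may be several modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
print(modes(scores))                             # [39]
print(modes(["Adama", "AA", "Adama", "B/Dar"]))  # ['Adama']
```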

2. Median
• The median is defined as the middle value in an ordered arrangement of observations.
• The median is often used to summarize the location of a distribution.
• Further, the median can be used with ordinal, interval, or ratio measurements.
• If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
there are 14 scores (an even number), so the median is the average of the 7th and 8th ordered values:
$$\text{Median} = \frac{38 + 39}{2} = 38.5$$
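A short Python sketch of the median calculation, written out by hand to show the even-count rule; `median` here is an illustrative function, and the standard library's `statistics.median` gives the same result for this data.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values (here the 7th and 8th).
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
print(median(scores))  # 38.5
```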
3. Mean
• The arithmetic mean is the most commonly used and accepted measure of central tendency.
• It should be used in the case of interval or ratio data.
If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
the mean of the distribution will be:
$$\bar{x} = \frac{32+32+35+36+37+38+38+39+39+39+40+40+42+45}{14} = \frac{532}{14} = 38$$
Mid-mean, geometric mean, and mid-range are other types of means (p. 139 of QRM).
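The arithmetic mean, and the mid-range listed among the alternative means, can be checked in a few lines of Python using the scores from the example; the variable names are illustrative.

```python
import statistics

scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

# Arithmetic mean: the sum of all scores divided by the number of scores.
total = sum(scores)          # 532
mean = total / len(scores)   # 532 / 14
print(mean)                  # 38.0

# The standard library gives the same value.
assert statistics.mean(scores) == mean

# Mid-range: halfway between the smallest and largest score.
mid_range = (min(scores) + max(scores)) / 2
print(mid_range)             # 38.5
```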

Bivariate Analysis/Relationships between Variables
• Bivariate analysis helps researchers to know the nature, direction, and significance of the relationship between two variables in the study.
• Often in practical situations, researchers are interested in describing associations between variables.
• They try to ascertain how two variables are related to each other, that is, whether a change in one affects the other.
• The measures of association depend on the nature of the data, and an association could be positive, negative, or neutral (no association).


8.2.1.1. Relation between Two Nominal Variables: χ² Test
This analysis technique is used to determine whether there is a relationship between two nominal variables.
• E.g. Is viewing a television advertisement for a product (yes/no) related to buying that product (buy/not buy)?
• An international business researcher wants to establish whether the performance of a firm (categorized as loss, breakeven, or profit) depends on the country in which it is located (categorized as low, middle, or high income).

There are three different types of chi-square analysis:
1. Chi-square test for goodness of fit
2. Chi-square test for homogeneity
3. Chi-square test of independence
• The first is used to test whether a sample has been drawn from a population with a specified distribution, and the second to test whether two or more populations are homogeneous with respect to a given characteristic.
• These two are less common, so we will focus on the third type of test.
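The chi-square test of independence compares observed counts with the counts expected if the two variables were unrelated (row total × column total ÷ grand total). This Python sketch uses hypothetical counts for the advertisement example, and the function name is illustrative. It computes the plain Pearson statistic without Yates' continuity correction, so for 2 × 2 tables it will differ from `scipy.stats.chi2_contingency`, which applies that correction by default.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = viewed the advert (yes, no); columns = (buy, not buy).
observed = [[30, 20],
            [10, 40]]
chi2, dof = chi_square_independence(observed)
print(round(chi2, 3), dof)  # 16.667 1

# 3.841 is the critical value of chi-square with 1 degree of freedom at the 0.05 level.
print("Reject independence" if chi2 > 3.841 else "No evidence of association")
```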
8.2.1.2. Correlation Analysis
• Correlation is a measure of the relationship between two variables. It has wide application in business and statistics.
• The correlation coefficient describes the direction of the correlation, that is, whether it is
• Positive or
• Negative,
• and the strength of the correlation, that is, whether an existing correlation is:
• Strong or
• Weak.
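A minimal Python sketch of Pearson's correlation coefficient, a common measure of linear correlation between two numerical variables. The data are hypothetical, and the 0.5 cutoff used to label a correlation "strong" is an assumption for illustration only; conventions for strength thresholds vary between fields and textbooks.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong_cutoff=0.5):
    """Direction from the sign of r; strength from |r| against an ASSUMED cutoff."""
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {direction}"


# Hypothetical data, e.g. advertising spend (x) and sales (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), describe(r))  # 0.7746 strong positive
```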

8.2.1.3. Bivariate Regression Analysis
• Regression is one of the most frequently used techniques in business and social research.
• Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of another variable (the independent variable).
• The most common form of regression is linear regression, where the dependent variable is related to the independent variable in a linear way.

• The linear regression equation takes the following form:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Variables:
X = Independent variable (we provide this)
Y = Dependent variable (we observe this)
Parameters:
β0 = Y-intercept
β1 = Slope
ε = Error term
Note: β1 indicates the change in the dependent variable for every one-unit change in the independent variable.

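The parameters β0 and β1 are usually estimated by ordinary least squares: β1 is the sum of cross-products of deviations divided by the sum of squared deviations of X, and β0 = mean(Y) - β1 × mean(X). The Python sketch below uses hypothetical data and an illustrative function name.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for Y = b0 + b1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope: change in Y per one-unit change in X
    b0 = mean_y - b1 * mean_x   # intercept: predicted Y when X = 0
    return b0, b1


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
b0, b1 = fit_simple_regression(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print(round(b0 + b1 * 6, 2))       # 5.8  (predicted Y when X = 6)
```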
Regression coefficient
• A regression coefficient measures how strongly the predictor (independent variable, IDV) predicts the dependent variable (DV).
• There are two types of regression coefficients:
1. Unstandardized coefficients
2. Standardized coefficients (Beta values)

• The unstandardized coefficients of the different independent variables can be used in the equation, along with the constant term, to predict the value of the dependent variable.
o i.e. the difference in “Y” per unit change in “X”
• The standardized coefficient (Beta) is measured in standard deviations, i.e. the difference in “Y”, in standard deviations, per one-standard-deviation difference in “X”.
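To show how the two kinds of coefficient relate, this Python sketch computes an unstandardized slope and converts it to a standardized Beta by multiplying by SD(X)/SD(Y). The data and function name are hypothetical. With only one predictor, Beta equals the Pearson correlation between X and Y, which is a handy check.

```python
import statistics


def standardized_beta(x, y):
    """Return (unstandardized slope b1, standardized Beta) for a one-predictor regression."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx  # unstandardized: change in Y per one-unit change in X
    # Beta rescales b1 into standard-deviation units: b1 * (SD of X / SD of Y).
    beta = b1 * statistics.stdev(x) / statistics.stdev(y)
    return b1, beta


x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical outcome values
b1, beta = standardized_beta(x, y)
print(round(b1, 4), round(beta, 4))  # 0.6 0.7746
```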

R values
• R represents the correlation between the observed values and the predicted values (based on the regression equation obtained) of the dependent variable.
• It is used to measure how well the model used for the research fits the data.

• R square is the square of R and gives the proportion of variance in the dependent variable accounted for by the set of independent variables chosen for the model.
• The R-square value tends to be inflated when the number of independent variables is large relative to the number of cases.
• Therefore, the adjusted R square, which takes these into account, provides more accurate information about the fit of the model.
• While it is not uncommon to get R-square values as high as 0.99 in the natural sciences, a much lower R² (0.10–0.20) is acceptable in social science research.
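A Python sketch computing R, R square, and adjusted R square for a one-predictor least-squares fit on hypothetical data, using adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) with n cases and k independent variables; `fit_summary` is an illustrative name.

```python
import math


def fit_summary(x, y):
    """R, R-square, and adjusted R-square for a one-predictor least-squares fit."""
    n, k = len(y), 1  # n cases, k independent variables
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    y_hat = [b0 + b1 * a for a in x]  # predicted values of the dependent variable
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, y_hat))
    r_square = 1 - ss_residual / ss_total  # proportion of variance explained
    # Adjusted R-square penalizes each extra predictor relative to the sample size.
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    return math.sqrt(r_square), r_square, adj_r_square


r, r2, adj = fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(r2, 4), round(adj, 4))  # 0.7746 0.6 0.4667
```

With only five cases, adjusted R² (about 0.47) is noticeably lower than R² (0.6), which illustrates the penalty for a small sample.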

2. Multicollinearity
• Multicollinearity is a situation in which two or more IVs are highly correlated with each other.
• If the variables are very highly correlated with each other, it is difficult to obtain reliable estimates of their individual regression coefficients.
• In other words, when two variables are highly correlated, they both convey essentially the same information.

How can the presence of multicollinearity be detected?
1. The variance inflation factor (VIF) is greater than 5 or, equivalently, the tolerance is less than 0.2 (tolerance is the reciprocal of VIF).
2. Any two IDVs have variance proportions in excess of 0.9 (column values) corresponding to any row in which the condition index is in excess of 30.
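For a model with exactly two independent variables, each one's VIF reduces to 1/(1 - r²), where r is the correlation between the two predictors; with more predictors, the VIF of a predictor is 1/(1 - R²) from regressing it on all the others. This Python sketch covers only the two-predictor case, on hypothetical data, and the function name is illustrative. The condition-index check in point 2 needs an eigen-decomposition and is not shown.

```python
def vif_two_predictors(x1, x2):
    """VIF and tolerance when a model has exactly two independent variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_square = s12 ** 2 / (s11 * s22)  # squared correlation between the predictors
    if r_square >= 1:
        raise ValueError("perfect collinearity: VIF is infinite")
    vif = 1 / (1 - r_square)
    tolerance = 1 / vif                # equivalently 1 - r_square
    return vif, tolerance


x1 = [1, 2, 3, 4, 5]   # hypothetical predictor 1
x2 = [2, 4, 5, 4, 5]   # hypothetical predictor 2
vif, tol = vif_two_predictors(x1, x2)
flag = vif > 5 or tol < 0.2  # rule of thumb from the text
print(round(vif, 2), round(tol, 2), flag)  # 2.5 0.4 False
```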

• If there is a serious multicollinearity problem, consider solutions such as:
• Removing highly correlated predictors
• Linearly combining predictors, such as adding them together
• Running entirely different analyses, such as principal components analysis (which replaces correlated predictors with a smaller set of uncorrelated components)

8.2.2. Multivariate Analysis
• In many real-life situations it becomes necessary to analyse relationships among three or more variables, which has led to the popularity of multivariate statistics.
• Multivariate statistical techniques look at the pattern of relationships between several variables simultaneously.
• The following section deals with categories of multivariate analysis techniques.

8.2.2.1. Multiple Linear Regression
• In simple regression there is one dependent variable and one independent variable, whereas in multiple regression there is one dependent variable and many independent variables.
• It examines the relationship between a single metric dependent variable and two or more metric independent variables.

• Assumptions of normality and linearity should be checked before using multiple regression.
The multiple regression equation takes the form:
$$y = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$$
where y is the dependent variable, x1, x2, … xk are the independent variables, a is the Y-intercept, and b1, b2, … bk are the regression coefficients.
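A minimal Python sketch of estimating a, b1, …, bk by ordinary least squares, solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination. The data are constructed so that y = 1 + 2x1 + 3x2 exactly, which makes the answer easy to verify; the function names are illustrative. It does not check the normality and linearity assumptions, and statistical packages typically use more numerically stable methods (such as QR decomposition) than the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upward.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Least-squares [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(row[j] * row[k] for row in rows) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(rows, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]  # constructed so that y = 1 + 2*x1 + 3*x2 exactly
coefs = fit_multiple_regression([x1, x2], y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, 3.0]
```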

Note: all the conditions and tests above also apply to multivariate analysis.
End

Thanks

Questions
