Processing of Data


PROCESSING OF DATA

The data collected in research is processed and analyzed to reach conclusions or to verify the hypothesis made.
Processing of data is important because it makes further analysis easier and more efficient. Technically, processing of data means:
1. Editing of the data
2. Coding of data
3. Classification of data
4. Tabulation of data.
EDITING:
Data editing is the process by which collected data is examined to detect errors and omissions, which are then corrected as far as possible before proceeding further.

Editing is done at two levels:
1. Micro Editing
2. Macro Editing

Editing is of two types:
1. Field Editing
2. Central Editing
Need/ Objectives of Data Editing

1. To check data accuracy
2. To remove errors
3. To improve quality
4. To assess the performance of field staff
5. To support decision making
How is Data Edited?

1. Incorrect answers
2. Incomplete answers
3. Inconsistent answers
4. "Don't know" answers or no reply
FIELD EDITING:
This type of editing deals with abbreviated or illegibly written data. It is most effective when done on the same day as the interview or the very next day. The investigator must not jump to conclusions while doing field editing.
CENTRAL EDITING:
This type of editing takes place once the entire data collection process has been completed. A single or common editor corrects errors such as an entry in the wrong place or an entry in the wrong unit. As a rule, all wrong answers should be dropped from the final results.
CODING:
Responses may be classified on the basis of one or more common concepts. In coding, a particular numeral or symbol is assigned to each answer in order to place the responses in definite categories or classes.
The classes of responses determined by the researcher should be appropriate and suitable to the study.
Coding enables efficient and effective analysis because the responses are categorized into meaningful classes.
Coding decisions should be considered while designing the questionnaire or any other data collection tool.
Coding can be done manually or by computer.
Essentials of CODING:
• Codes should cover all types of answers given by respondents
• Categories should not overlap
• A code sheet should be prepared for quick reference
• Coding should be done by a knowledgeable person
• Codes may use numbers, symbols, or letters
CLASSIFICATION:
• Classification of data means that the collected raw data is categorized into groups having common features.
• Data having common characteristics are placed in a common group.
• The entire data collected is categorized into various groups or classes, which convey a meaning to the researcher.
Classification is done in two ways:
1. Classification according to attributes.
2. Classification according to the class intervals.
CLASSIFICATION ACCORDING TO THE ATTRIBUTES:
Here the data is classified on the basis of common characteristics, which can be descriptive (literacy, sex, honesty, marital status, etc.) or numerical (weight, height, income, etc.).
Descriptive features are qualitative in nature and cannot be measured quantitatively, but they are still taken into account while making an analysis.
The analysis used for such classified data is known as statistics of attributes, and the classification is known as classification according to attributes.
CLASSIFICATION ON THE BASIS OF THE INTERVAL:
The numerical features of data can be measured quantitatively and analyzed with the help of statistical units. Data relating to income, production, age, weight, etc. come under this category. This type of data is known as statistics of variables, and the data is classified by way of intervals.
CLASSIFICATION ACCORDING TO THE CLASS
INTERVAL USUALLY INVOLVES THE FOLLOWING
THREE MAIN PROBLEMS:
1. Number of Classes.
2. How to select class limits.
3. How to determine the frequency of each class.
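The three problems above can be illustrated with a minimal Python sketch. It assumes equal-width classes, and when no class count is supplied it falls back on Sturges' rule, a common rule of thumb that the source does not prescribe. The function name `frequency_distribution` and the income figures are hypothetical.

```python
import math

def frequency_distribution(values, num_classes=None):
    """Group numeric values into equal-width class intervals and count them."""
    n = len(values)
    low, high = min(values), max(values)
    if high == low:
        # All values identical: a single class holds everything.
        return [((low, high), n)]
    if num_classes is None:
        # Problem 1 (number of classes): Sturges' rule, an assumed rule of thumb.
        num_classes = 1 + math.ceil(math.log2(n))
    # Problem 2 (class limits): equal-width intervals from the minimum to the maximum.
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for v in values:
        # Problem 3 (frequencies): each class is [lower, upper);
        # the maximum value is placed in the last class.
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    limits = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return list(zip(limits, counts))

# Hypothetical incomes (illustrative values only).
incomes = [12, 18, 25, 31, 36, 40, 44, 52, 58, 60]
table = frequency_distribution(incomes, num_classes=4)
for (lower, upper), count in table:
    print(f"{lower:5.1f} - {upper:5.1f}: {count}")
```

Other conventions (inclusive upper limits, unequal widths, open-ended classes) are equally valid; the choice here is only for illustration.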
TABULATION:
The mass of data collected has to be arranged in some kind of
concise and logical order.
Tabulation summarizes the raw data and displays data in form
of some statistical tables.
Tabulation is an orderly arrangement of data in rows and
columns.
OBJECTIVES OF TABULATION:
1. Conserves space and minimizes explanatory and descriptive statements.
2. Facilitates comparison and summarization.
3. Facilitates detection of errors and omissions.
4. Establishes the basis for various statistical computations.
BASIC PRINCIPLES OF TABULATION:
1. Tables should be clear, concise & adequately titled.
2. Every table should be distinctly numbered for easy
reference.
3. Column headings & row headings of the table should be
clear & brief.
4. Units of measurement should be specified at appropriate
places.
5. Explanatory footnotes concerning the table should be
placed at appropriate places.
6. Source of information of data should be clearly indicated.
7. The columns & rows should be clearly separated with dark lines.
8. Demarcation should also be made between data of one class and that of another.
9. Comparable data should be put side by side.
10. Figures in percentages should be approximated (rounded) before tabulation.
11. Figures, symbols, etc. should be properly aligned and adequately spaced to enhance readability.
12. Abbreviations should be avoided.
ANALYSIS OF DATA

The important statistical measures used to analyze research or survey data are:
1. Measures of central tendency (mean, median and mode)
2. Measures of dispersion (standard deviation, range, mean deviation)
3. Measures of asymmetry (skewness)
4. Measures of relationship (correlation and regression)
5. Association in case of attributes
6. Time series analysis
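As a minimal sketch of the first two groups of measures, the following Python snippet uses the standard `statistics` module. The `scores` list is a hypothetical sample, and the standard deviation shown is the sample version (divisor n - 1), which is an assumption about which variant is wanted.

```python
import statistics

# Hypothetical sample of nine observations (illustrative only).
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
value_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
mean_deviation = sum(abs(x - mean) for x in scores) / len(scores)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} std_dev={std_dev:.2f} mean_deviation={mean_deviation:.2f}")
```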
TESTING THE HYPOTHESIS
Several factors determine the appropriate statistical technique to use when conducting a hypothesis test. The most important are:
1. The type of data being measured.
2. The purpose or objective of the statistical inference.

Hypotheses can be tested by various techniques. Hypothesis testing techniques are divided into two broad categories:
1. Parametric Tests.
2. Non-Parametric Tests.
PARAMETRIC TESTS:

These tests depend on assumptions, typically that the population(s) from which the data are randomly sampled have a normal distribution. Types of parametric tests are:

1. t-test
2. z-test
3. F-test
4. χ²-test
PRECAUTIONS IN INTERPRETATION:
1. The researcher must ensure that the data is appropriate, trustworthy and adequate for drawing inferences.
2. The researcher must be cautious about errors and take the necessary action if an error arises.
3. The researcher must ensure the correctness of the data analysis process, whether the data is qualitative or quantitative.
4. The researcher must try to bring hidden facts and non-obvious factors to the front and combine them with the factual interpretation.
5. The researcher must also ensure constant interaction between the initial hypothesis, empirical observations, and theoretical concepts.
Meaning of Analysis
• Data analysis is a method in which data is collected and organized so
that one can derive helpful information from it.
• In other words, the main purpose of data analysis is to look at what
the data is trying to tell us.
• For example, what does the data show or do? What does the data
not show or do?
Features of Data Analysis
• In-depth evaluation of organized data
• Done after tabulation and before interpretation
• Process of re-arranging data
• Meaningful data
• Improves quality of research
Definition of Descriptive Statistics
• Descriptive statistics quantitatively describes the important characteristics of the dataset.
• To describe these properties, it uses measures of central tendency (mean, median, mode) and measures of dispersion (range, standard deviation, quartile deviation, variance, etc.).
• The researcher summarises the data in a useful way with the help of numerical and graphical tools such as charts, tables, and graphs, so as to represent the data accurately.
Definition of Inferential Statistics
• Inferential statistics is all about generalising from the sample to the population, i.e. the results of the analysis of the sample can be extended to the larger population from which the sample is taken.
• It is a convenient way to draw conclusions about the population when it is not possible to query each and every member of the universe.
• The sample chosen should be representative of the entire population; therefore, it should contain the important features of the population.
• Inferential statistics is used to determine the probability of properties of the population on the basis of the properties of the sample, by employing probability theory.
• The major inferential statistics are based on statistical models such as analysis of variance, the chi-square test, Student's t distribution, regression analysis, etc.
Basis for Comparison: Descriptive Statistics vs. Inferential Statistics

Meaning
  Descriptive: Descriptive statistics is the branch of statistics concerned with describing the population under study.
  Inferential: Inferential statistics is the type of statistics that focuses on drawing conclusions about the population on the basis of sample analysis and observation.

What it does
  Descriptive: Organizes, analyzes and presents data in a meaningful way.
  Inferential: Compares, tests and predicts data.

Form of final result
  Descriptive: Charts, graphs and tables
  Inferential: Probability

Usage
  Descriptive: To describe a situation.
  Inferential: To explain the chances of occurrence of an event.

Function
  Descriptive: It explains the data that is already known, to summarize the sample.
  Inferential: It attempts to reach conclusions about the population that extend beyond the data available.
No correlation:
If there is absolutely no correlation present, the value given is 0.
Perfect linear correlation:
A perfect positive correlation is given the value of 1.
A perfect negative correlation is given the value of -1.
Strong linear correlation:
The closer the number is to 1 or -1, the stronger the correlation, that is, the stronger the relationship between the variables.
Weak linear correlation:
The closer the number is to 0, the weaker the correlation.
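The values described above can be reproduced with a short Python sketch of the Pearson correlation coefficient. The source does not name the coefficient, so Pearson's r is assumed here; the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises in step with x
perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls in step with x
no_linear = pearson_r(xs, [2, 1, 0, 1, 2])           # symmetric V shape

print(perfect_positive, perfect_negative, round(no_linear, 6))
```

The V-shaped data set shows that a coefficient near 0 only rules out a linear relationship; the variables may still be related in some other way.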
Causal Analysis
• The "why" of research: cause and effect
• Root cause, reasons, effects, results, consequences
• Circular cause, e.g. birth and population
• Systemic cause: multiple related causes joined together, e.g. cultural change
Data Interpretation Importance
• Evaluation of collected data and drawing of conclusions
• Important step in Marketing Research
• Recommendation only after critical analysis
• Logical conclusions
• Creative process
Data Interpretation Steps
• There are four steps in data interpretation:
1. Evaluate the data
2. Draw tentative conclusions
3. Test the tentative conclusions
4. Arrive at a final conclusion
Chi Square Test
• Bivariate analysis
• Tests the relationship between two categorical variables (nominal or ordinal scale)
• Example: consumer survey

Code  Income of Respondents   Code  Purchase Intention
1     Less than 10,000        1     No
2     10,001 – 20,000         2     Low
3     20,001 – 30,000         3     High
4     Above 30,000            4     Very High

Intent     Code  Less than 10,000  10,001 – 20,000  20,001 – 30,000  Above 30,000  Total
None       1     1                 -                -                -             1
Low        2     1                 1                1                1             4
High       3     -                 1                1                1             3
Very High  4     -                 -                1                1             2
Total            2                 2                3                3             10
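As a rough sketch, the chi-square statistic for the cross-tabulation above can be computed in plain Python. The dashes in the table are assumed to mean zero respondents, and the function name `chi_square` is hypothetical. With only 10 respondents the expected counts are very small, so the figure is illustrative rather than a reliable test result.

```python
# Observed counts: rows are purchase intention, columns are income class.
observed = [
    [1, 0, 0, 0],  # None
    [1, 1, 1, 1],  # Low
    [0, 1, 1, 1],  # High
    [0, 0, 1, 1],  # Very High
]

def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

chi2, dof = chi_square(observed)
print(f"chi-square = {chi2:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared with the critical chi-square value for the chosen level of significance and these degrees of freedom.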
Z Test and T Test
• Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with smaller sample sizes (n < 30).
• Both methods assume a normal distribution of the data, but z-tests are most useful when the population standard deviation is known.
• T-tests are more commonly used than z-tests.
• Recognize the possible influence of sampling errors.
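The difference between the two tests can be sketched for the one-sample case, which is an assumed setting since the source does not specify one. The z statistic uses a known population standard deviation `sigma`, while the t statistic estimates it from the sample. The function names, the sample, and the hypothesized mean `mu0` are all hypothetical.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic; sigma is the known population standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """One-sample t statistic and degrees of freedom; the SD is estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical small sample (n < 30), tested against a hypothesized mean of 50.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
z = z_statistic(sample, mu0=50, sigma=2.0)
t, dof = t_statistic(sample, mu0=50)
print(f"z = {z:.3f}; t = {t:.3f} with {dof} degrees of freedom")
```

Each statistic is then compared with the critical value of its own distribution (standard normal for z, Student's t with n - 1 degrees of freedom for t).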
Null Hypothesis
• A null hypothesis is a statistical hypothesis stating that no significant difference exists between the set of variables.
• It is the original or default statement, with no effect, often represented by H0 (H-zero).
• It is always the hypothesis that is tested.


Alternative Hypothesis
• An alternative hypothesis is a statistical hypothesis used in hypothesis testing which states that there is a significant difference between the set of variables. It is often referred to as the hypothesis other than the null hypothesis and is often denoted by H1 (H-one).
• It is what the researcher seeks to prove in an indirect way, by using the test.
• The acceptance of the alternative hypothesis depends on the rejection of the null hypothesis, i.e. unless the null hypothesis is rejected, the alternative hypothesis cannot be accepted.
Examples
• Null Hypothesis: There is no gender effect regarding those who eat vegetarian meals on a regular basis.
• Alternative Hypothesis: Females are more likely than males to eat vegetarian meals on a regular basis.

• Null Hypothesis: There is no difference in the amount of weight loss when comparing a low carbohydrate diet with a low fat diet.
• Alternative Hypothesis: The mean weight loss should be greater for those on a low carbohydrate diet when compared with those on a low fat diet.
Action   Student 1 (Regular and Topper)   Student 2 (Irregular and scoring less marks)
Passes   Correct                          Incorrect (Beta) error
Fails    Incorrect (Alpha) error          Correct
Type I Error
• When the null hypothesis is true and you reject it, you make a type I error.
• The probability of making a type I error is α, which is the level of significance you set for your hypothesis test.
• An α of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis.
• To lower this risk, you must use a lower value for α.


Type II Error
• When the null hypothesis is false and you fail to reject it, you make a type II error.
• The probability of making a type II error is β, which depends on the power of the test.
• You can decrease your risk of committing a type II error by ensuring your test has enough power.
• The probability of rejecting the null hypothesis when it is false is equal to 1–β. This value is the power of the test.
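The relationship between α, β and power can be sketched for a one-sided, one-sample z-test with known standard deviation. This setting is an assumption chosen because its power has a simple closed form; the function name `z_test_power` and all parameter values are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test when the true mean exceeds
    the hypothesized mean by `effect`."""
    std_normal = NormalDist()
    # Critical value: reject the null hypothesis when z exceeds this.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at this shift.
    shift = effect * (n ** 0.5) / sigma
    return 1 - std_normal.cdf(z_crit - shift)

power_small = z_test_power(effect=1.0, sigma=4.0, n=25)
power_large = z_test_power(effect=1.0, sigma=4.0, n=100)
beta_large = 1 - power_large  # probability of a type II error

print(f"power with n=25:  {power_small:.3f}")
print(f"power with n=100: {power_large:.3f} (beta = {beta_large:.3f})")
```

In this sketch a larger sample raises the power and so lowers β, while lowering α (with everything else fixed) raises the critical value and reduces the power.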
Level of Significance
• The maximum probability of a Type I error that we are willing to risk
• Can be 5%, which means a Type I error on average 5% of the time, i.e. 95% confidence of no such error
• Can be 1%, which means a Type I error on average 1% of the time
Types of Data Displays
Data Tabulation
• Tabulation is the systematic arrangement of the statistical data in
columns or rows. It involves the orderly and systematic presentation
of numerical data in a form designed to explain the problem under
consideration. Tabulation helps in drawing the inference from the
statistical figures.
• Sorting and counting
• Grouping, classification and summary
• Speed and accuracy
• Understand multiple variables
• Manual Tabulation – Tally Marks
• Machine Tabulation
ONE-WAY TABLE
DIVISION    POPULATION (Millions)
Karachi     10.875968
Hyderabad   14.186954
Sukkur      12.994401

TWO-WAY TABLE
DIVISION    POPULATION (Millions)
            Male    Female    Total
Karachi
Hyderabad
Sukkur

THREE-WAY TABLE
DIVISION    POPULATION (Millions)
            Male                         Female                       Total
            Literate  Illiterate  Total  Literate  Illiterate  Total  Literate  Illiterate
Karachi
Hyderabad
Sukkur
[Figure: Tally Chart, "Favorite Pets" (Grade 1)]
[Figure: Frequency Table, "Favorite Pets" (Grades 2 and 3)]

Note: A frequency table may or may not have a column for the tally marks.
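A frequency table with an optional tally column can be sketched in Python with `collections.Counter`. The survey responses are hypothetical (the original chart data is not available), and the `tally` helper, which writes each group of five as `||||/`, is an illustrative convention.

```python
from collections import Counter

# Hypothetical "favorite pet" responses (illustrative only).
responses = ["dog", "cat", "dog", "fish", "cat", "dog", "bird", "dog"]

# Sorting and counting: Counter maps each category to its frequency.
frequency = Counter(responses)

def tally(count):
    """Render a count as tally marks, grouping every five as '||||/'."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

print(f"{'Pet':<6}{'Tally':<12}{'Frequency'}")
for pet, count in frequency.most_common():
    print(f"{pet:<6}{tally(count):<12}{count}")
```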
Bar Graphs (Grades 2, 3, 4, 5)

Bar graph: A bar graph displays discrete data in separate columns. A double bar graph can be used to compare two data sets. Categories are considered unordered and can be rearranged alphabetically, by size, etc.

Advantages:
• Visually strong
• Can easily compare two or three data sets

Disadvantages:
• Graph categories can be reordered to emphasize certain effects
• Use only with discrete data
Bar graphs by grade:
• Grade 2: Single Bar Graph
• Grade 3: Single Bar Graph
• Grade 4: Double Bar Graph
• Grade 5: Multi-Bar Graph

[Figures: bar graph example; horizontal bar graph example; vertical bar graph example]
Which Direction?
Vertical Bar Graph
Displays data better than a horizontal bar graph and is preferred when possible.

Horizontal Bar Graph
Useful when category names are too long to fit at the foot of a column.

[Figures: vertical vs. horizontal bar graphs; double bar graph (Grade 4); multi-bar graph (Grade 5)]
Line Graph (Grades 3, 4, 5)

Line graph: A line graph plots continuous data as points and then joins them with a line. Multiple data sets can be graphed together, but a key must be used.

Advantages:
• Can compare multiple continuous data sets easily
• Interim data can be inferred from the graph line

Disadvantages:
• Use only with continuous data
Line graphs by grade:
• Grade 3: Single Line Graph
• Grade 4: Single Line Graph
• Grade 5: Double Line Graph

• Line graphs are usually drawn to represent time series data, for example temperature, rainfall, population growth, birth rates and death rates.

[Figures: single line graph (Grades 3 and 4); double line graph (Grade 5)]
Pie Chart – Circle Graph (Grade 4)

Pie chart: A pie chart displays data as a percentage of the whole. Each pie section should have a label and percentage. A total data number should be included.

Advantages:
• Visually appealing
• Shows percent of total for each category

Disadvantages:
• No exact numerical data
• Hard to compare 2 data sets
• "Other" category can be a problem
• Total unknown unless specified
• Best for 3 – 7 categories
• Use only with discrete data
[Figure: pie chart (circle graph) example]

Pie (circle) charts - more info
• A way of summarizing a set of categorical data or displaying the different values of a given variable (e.g. percentage distribution).
• A circle is divided into a series of segments. Each segment represents a particular category.
• The area of each segment is the same proportion of the circle's area as the category is of the total data set.
• Quite popular. The circle provides a visual concept of the whole (100%).
• Best used for displaying statistical information when there are no more than six components; otherwise, the resulting picture will be too complex to understand.
• Pie charts are not useful when the values of each component are similar, because it is difficult to see the differences between slice sizes.
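The proportional rule above amounts to simple arithmetic: each category's share of the total gives both its percentage and its central angle out of 360 degrees. A minimal Python sketch follows; the function name `pie_segments` and the budget figures are hypothetical.

```python
def pie_segments(counts):
    """Map each category to (percentage of total, central angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * value / total, 360 * value / total)
        for label, value in counts.items()
    }

# Hypothetical monthly budget shares (illustrative only).
budget = {"Rent": 50, "Food": 25, "Transport": 15, "Other": 10}
segments = pie_segments(budget)

for label, (percent, angle) in segments.items():
    print(f"{label:<10}{percent:5.1f}%  {angle:6.1f} degrees")
```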
Histograms (Grade 6)

Histogram: A histogram is a type of bar graph that displays continuous data in ordered columns. Categories are of continuous measure such as time, inches, temperature, etc.

Advantages:
• Visually strong
• Can compare to normal curve
• Usually the vertical axis is a frequency count of items falling into each category

Disadvantages:
• Cannot read exact values because data is grouped into categories
• More difficult to compare two data sets
• Use only with continuous data
[Figures: histogram; stacked vertical bar graph example; histogram example (a type of bar graph); frequency polygon; "Salaries of Acme"]
Scatter Plot

Scatter plot: A scatter plot displays the relationship between two factors of the experiment. A trend line is used to determine positive, negative or no correlation.

Advantages:
• Shows a trend in the data relationship
• Retains exact data values and sample size
• Shows minimum/maximum and outliers

Disadvantages:
• Hard to visualize results in large data sets
• Flat trend line gives inconclusive results
• Data on both axes should be continuous
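One common way to obtain a trend line is an ordinary least-squares fit. The source does not say how the line is drawn, so this method is an assumption, and the function names, the data, and the `tolerance` used to call a slope "flat" are hypothetical.

```python
def trend_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def direction(slope, tolerance=1e-9):
    """Classify the trend; the tolerance for 'flat' is an arbitrary illustrative choice."""
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "none"

# Hypothetical paired measurements (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = trend_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f} ({direction(slope)} trend)")
```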