
MODULE 1

Overview on Data Analysis


Prepared by:
Asst. Prof. ISRAEL P. PENERO
Learning Outcomes

At the end of this lesson, the student should be able to:


1. Define what data analysis is.
2. Name and explain the major approaches to data analysis
and the required specifications for analyzing data.
3. Differentiate descriptive from inferential statistics.
4. Define and explain some important terms in data analysis.
5. Identify data according to its level of measurement.
The world is full of data. If you look around, you will see
many types and kinds of data. Any information, whether
numerical or non-numerical, can be called data. Studying
data analysis is essential; among other things, it can serve
as a backbone of a nation's economy.
Module 1 presents an overview of data analysis: where and
how to obtain data, whether quantitative or qualitative, and
why correctly treated data are important in every aspect of
our lives. The process of analyzing data, as well as its major
approaches and the required specifications, will also be
discussed in this module. Important terms used throughout
the course will also be presented, together with a discussion
of the levels of measurement of data.
But what do we mean by data analysis? What are the
different statistical instruments used to analyze data, and
how do we analyze it?

This course will help students use different statistical
instruments to interpret and analyze data.

The course covers methods for obtaining data at different
levels of measurement, probability, sampling techniques and
estimation, statistical intervals, hypothesis testing, statistical
inference, and simple linear regression and correlation.
Based on the e-book "Tutorials Point:
Simply Easy Learning on Data Analysis
with Excel", data analysis is a process
of inspecting, cleaning, transforming,
and modeling data with the goal of
discovering useful information,
suggesting conclusions, and supporting
decision-making.
Several data analysis techniques, known by a
variety of names, exist across domains such as
business, science, and social science. The major
data analysis approaches are as follows:
 Data Mining
 Business Intelligence
 Statistical Analysis
 Predictive Analytics
 Text Analytics
Data Mining

Data Mining is the analysis of large quantities of data to
extract previously unknown, interesting patterns, unusual
data, and dependencies. Note that the goal is the extraction
of patterns and knowledge from large amounts of data, not
the extraction of the data itself.

Data mining involves computer science methods at the
intersection of artificial intelligence, machine learning,
statistics, and database systems. The patterns obtained from
data mining can be considered a summary of the input data
that can be used in further analysis or to obtain more
accurate prediction results, for example, by a machine
learning or predictive analytics tool.
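To make this concrete, here is a minimal sketch in Python, assuming NumPy and scikit-learn are available; the customer records and their meaning are hypothetical. It extracts a previously unknown grouping (pattern) from unlabeled records using k-means clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer records: [monthly_visits, average_spend]
X = np.array([[2, 120], [3, 150], [2, 130],
              [12, 900], [11, 950], [13, 880]])

# Extract a previously unknown grouping from the unlabeled records
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # e.g., [0 0 0 1 1 1]: two customer segments
print(km.cluster_centers_)  # a compact summary of the input data
```

The cluster centers serve as the kind of summary mentioned above, which can feed further analysis or a predictive model.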
Business Intelligence

Business Intelligence techniques and tools are used
for the acquisition and transformation of large
amounts of unstructured business data to help
identify, develop, and create new strategic
business opportunities.

The goal of business intelligence is to allow
easy interpretation of large volumes of data to
identify new opportunities. It helps in
implementing an effective strategy based on
insights that can provide businesses with a
competitive market advantage and long-term stability.
Statistical Analysis

Statistics is the study of the collection, organization, presentation, analysis, and
interpretation of data.

In data analysis, two main statistical methodologies are used.

Descriptive statistics: In descriptive statistics, data from the entire population or a
sample is summarized with numerical descriptors such as the mean and standard
deviation for continuous data, and frequency and percentage for categorical data.

Inferential statistics: It uses patterns in the sample data to draw inferences about
the population represented, accounting for randomness. These inferences may take
the following forms:

• answering yes/no questions about the data (hypothesis testing)
• estimating numerical characteristics of the data (estimation)
• describing associations within the data (correlation)
• modeling relationships within the data (e.g., regression analysis)
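As a hedged sketch in Python, assuming NumPy and SciPy are installed and using a hypothetical rainfall sample, the code below first computes descriptive summaries and then draws the inferences listed above: an estimate with a 95% confidence interval and a yes/no hypothesis test:

```python
import numpy as np
from scipy import stats

# Hypothetical sample: monthly rainfall (mm) for 10 months
rainfall = np.array([3.1, 3.4, 3.6, 3.2, 3.5, 3.3, 3.4, 3.6, 3.2, 3.7])

# Descriptive statistics: summarize the sample itself
mean = rainfall.mean()
std = rainfall.std(ddof=1)  # sample standard deviation
print(f"mean = {mean:.2f}, std = {std:.2f}")

# Estimation: 95% confidence interval for the population mean
low, high = stats.t.interval(0.95, df=len(rainfall) - 1,
                             loc=mean, scale=stats.sem(rainfall))
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")

# Hypothesis testing: is the population mean different from 3.0 mm?
t_stat, p_value = stats.ttest_1samp(rainfall, popmean=3.0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```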
Statistical Inference
The purpose of collecting data on a sample is
not simply to have data on that sample.
Researchers take the sample in order to infer
from that data some conclusion about the wider
population represented by the sample.

Statistical inference provides methods for
drawing conclusions about a population from
sample data.
Predictive Analytics

Predictive analytics uses statistical models to
analyze current and historical data in order to
make forecasts (predictions) about future or
otherwise unknown events. In business,
predictive analytics is used to identify risks
and opportunities.
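A minimal sketch of the idea, assuming Python with NumPy and hypothetical sales figures: fit a linear trend to current and historical data, then forecast the next period:

```python
import numpy as np

# Hypothetical historical monthly sales
sales = np.array([100.0, 108.0, 115.0, 121.0, 130.0, 138.0])
months = np.arange(len(sales))

# Fit a linear trend, then predict the next (unknown) month
slope, intercept = np.polyfit(months, sales, deg=1)
forecast = slope * len(sales) + intercept
print(f"forecast for month {len(sales)}: {forecast:.1f}")
```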
Text Analytics

Text Analytics, also referred to as Text Mining
or Text Data Mining, is the process of deriving
high-quality information from text. Text mining
usually involves structuring the input text,
deriving patterns within the structured data
using means such as statistical pattern learning,
and finally evaluating and interpreting the output.
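As a toy sketch of that pipeline in Python (standard library only; the input text is hypothetical), the code below structures raw text into tokens and derives a simple frequency pattern from the structured data:

```python
import re
from collections import Counter

text = "Data analysis turns raw data into useful information about the data."

# Structure the input text: lowercase it and split it into word tokens
tokens = re.findall(r"[a-z]+", text.lower())

# Derive a simple pattern: word frequencies within the structured tokens
print(Counter(tokens).most_common(3))  # e.g., [('data', 3), ...]
```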
Data Analysis Process
Data Analysis is defined by the statistician John
Tukey in 1961 as "Procedures for analyzing data,
techniques for interpreting the results of such
procedures, ways of planning the gathering of data
to make its analysis easier, more precise or more
accurate, and all the machinery and results of
(mathematical) statistics which apply to analyzing
data.”
Thus, data analysis is a process for obtaining
large, unstructured data from various sources and
converting it into information that is useful for (a)
answering questions, (b) testing hypotheses, (c)
supporting decision-making, and (d) disproving theories.
SOME MEASUREMENTS IN DESCRIPTIVE AND INFERENTIAL STATISTICS
In analyzing the gathered data, concepts of statistics are essential. Using the
appropriate statistical tool in treating the data leads to correct results and conclusions.
But how can we identify whether a statement is descriptive or inferential?

The following illustrations show how to identify whether a statement belongs to
descriptive or inferential statistics.

A. Rainfall

a) Descriptive

The average rainfall during the 10 months of the year 2015 is 3.4 mm.

b) Inferential

The average rainfall during the 10 months of the year 2015 is 3.4 mm,
and for the next 10 months in the year 2016, we can expect rainfall between 3.2
and 3.5 mm.
B. Academic Records

a) Descriptive

The academic records of the graduating students of the College of
Informatics and Computing Sciences of Batangas State University for the
last four years show that 85% of the entering freshmen eventually
graduated.

b) Inferential

If a student enters the college as a freshman, we could conclude from
this study that his or her chances of eventually graduating are better than
80%.
DEFINITION OF TERMS
Data

These are the facts or sets of
information gathered or under study.
Examples:

Numerical Information:
i. Age
ii. Height
iii. Numerical grade

Non-Numerical Information:
i. Level of awareness
ii. Emotions
iii. Grade in letters, such as A, A+, A-, etc.
Primary data

These are information collected from
sources such as personal interviews,
questionnaires, or surveys with a specific
intention on a specific subject. Information
may also come from observations and
discussions by the researcher himself or
herself. Collecting primary data can be a
lengthy process, but it does provide first-hand
information.
Example

A face-to-face interview or focus
group discussion conducted with the aid of
interview guide questions. The information
gathered can be recorded and transcribed.
Secondary data

These are information that is already
available somewhere, such as in journals, on the
internet, in a company's records or, on a larger
scale, in corporate or governmental archives.
Secondary data allows for the comparison of, say,
several years' worth of statistical information
relating to, for example, a sector of the
economy, where the information may be used
to measure the effects of change or whatever
it is that is being researched.
Examples:
i. News gathered from television networks
ii. Articles from newspapers or journals,
whether in print or direct from the internet
iii. Results in published and unpublished
reading materials such as theses and
dissertations
Qualitative Data

These are data attributes which
cannot be subjected to meaningful
arithmetic. They involve, or may be
influenced by, subjectivity (bias).
Examples:

i. attitudes
ii. feelings
iii. willingness
iv. likeliness
Quantitative Data

These are numerical in nature, and
therefore meaningful arithmetic can be done on
them. Quantitative data are an expression of a
property or quality in numerical terms. Data
measured and expressed in quantity enable
(1) precise measurement; (2) knowing trends or
changes over time; and (3) comparisons of trends
or individual units.
Examples:

i. number of students in a class
ii. numerical grades of students
iii. number of heartbeats in a specific time
Discrete Data

It assumes only exact values and can be
obtained by counting.
Examples:
i. Number of siblings
ii. Total number of houses in a certain barangay
iii. Number of members of the family
Continuous Data

It assumes infinite
values within a specified
interval and can be
obtained by
measurement.
Examples:
i. Height
ii. Average grade
iii. Age (in years, months, weeks, days, hours,
minutes, seconds, …)
VARIABLES/DATA ACCORDING TO LEVELS OF MEASUREMENT
Nominal data or Nominal Scale

- is data collected as simple labels, names, or
categories with no intrinsic ordering
- is sometimes called a categorical variable and
cannot be measured numerically
- observations with the same label belong to the
same category
- is the lowest level of measurement
- frequencies or counts of observations belonging
to the same category can be obtained
- sounds like "name"
Examples:

i. Brand of laundry soap
ii. Names of dishes
iii. Name of a fast-food chain
iv. Local celebrities
Ordinal data

- is similar to a categorical variable
- is data collected as labels or classes with an
implied ordering in these labels
- the distance between two labels cannot be
quantified; the difference from nominal data is
that there is a clear ordering of the categories
- is a level of measurement higher than nominal
- ranking can be done on the data
Examples:

i. Academic rank
ii. Beauty titles in a certain pageant
iii. Military rank
iv. Level of agreement (Strongly Agree, Agree,
Disagree, Strongly Disagree)
Interval data

- is similar to an ordinal variable, except that the
intervals between the values of the variable are
equally spaced
- is data collected that can be ordered and, in
addition, may be added or subtracted, but cannot
be divided or multiplied
- the distance between any two numbers on the
scale is of known size; the unit of measurement is
constant but arbitrary, and there is no "true" zero
point
- is a level of measurement higher than ordinal
Examples:

i. Time
ii. Age in terms of years
iii. Number of family members from one
occupied house to another
Ratio

- is data collected that has all the properties of
the interval scale and, in addition, can be
multiplied and divided
- is data that has a true zero point
- is the highest level of measurement
Examples:

i. Score on a certain quiz
ii. Income
iii. Number of votes in a presidential election
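One way to see how the levels behave in practice is to mirror them in column types. The sketch below is illustrative only, assuming Python with pandas installed; the column names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "soap_brand": ["A", "B", "A", "C"],      # nominal: labels only
    "agreement": ["Agree", "Strongly Agree",
                  "Disagree", "Agree"],      # ordinal: ordered labels
    "income": [12000, 15000, 9000, 20000],   # ratio: true zero point
})

# Nominal: unordered categories; only counts/frequencies are meaningful
df["soap_brand"] = df["soap_brand"].astype("category")
print(df["soap_brand"].value_counts())

# Ordinal: ordered categories; ranking and comparison are meaningful
order = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]
df["agreement"] = pd.Categorical(df["agreement"], categories=order, ordered=True)
print(df["agreement"].min())  # lowest level of agreement observed

# Ratio: full arithmetic, including meaningful ratios
print(df["income"].mean(), df["income"].max() / df["income"].min())
```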
DATA COLLECTION AND SAMPLING METHOD
Gathering data is one of the most tedious parts of
research. A researcher must spend much time and effort
to finish his/her research work, and a lot of patience is
required from time to time. Collecting information from
different sources is no joke, and before a researcher
treats the gathered data, he/she must use the correct
method of data collection and the correct number of
samples.
When a researcher uses statistical inference, the
phase of statistics that provides methods for drawing
conclusions about a population, the correct number of
samples from the population is essential. To determine it,
the researcher needs to know the total size of the
population, from which the number of samples to be used
in an investigation can be computed. We could use
G*Power or the Raosoft sample size calculator. But
what is sampling, and what do we mean by sample?
Sampling is the act, process, or technique of
selecting an appropriate sample, or a representative part
of a population, for the purpose of determining the
characteristics of the whole population. A sample refers
to the portion of a population that represents its
characteristics or properties, whereby the members of the
group or set vary or differ from one another. The sample
size can be obtained with the aid of G*Power or the
Raosoft sample size calculator, as mentioned previously.
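Such calculators implement standard sample-size formulas. As a hedged sketch (not the exact internals of G*Power or Raosoft), the Python function below applies Cochran's formula with a finite population correction; the inputs are hypothetical:

```python
import math

def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's formula with finite population correction.

    population -- total population size (N)
    margin     -- desired margin of error (0.05 = 5%)
    z          -- z-score for the confidence level (1.96 ~ 95%)
    p          -- assumed proportion; 0.5 is the most conservative
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return math.ceil(n)

# e.g., 5,000 enrolled students, 5% margin of error, 95% confidence
print(sample_size(5000))  # about 357
```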
SAMPLING TECHNIQUES
In doing sampling, it is very important
that you know the appropriate and correct
sampling technique to use. There are
different kinds of sampling techniques in
both quantitative and qualitative research.
In our discussion, we will use the
quantitative side of sampling, that is, the
probability sampling methods.
Simple Random Sampling. This means
that all members of the population have an
equal chance of being included in the sample.
This sampling technique is also known as the
fishbowl or lottery sampling technique.
Example:

Your teacher wants to determine the performance of his
students in the preliminary examination in Data Analysis.
He already knows how many students he needs as his sample.
He writes the name of each student on an individual slip of
paper and places the slips in a bowl. Then he picks slips one
at a time until he completes the total number of desired
samples.
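A minimal Python sketch of this fishbowl/lottery draw, with a hypothetical class roster and sample size:

```python
import random

# Hypothetical class roster: the population of 40 students
students = [f"Student_{i}" for i in range(1, 41)]

random.seed(7)  # fixed seed only to make the illustration reproducible
chosen = random.sample(students, k=10)  # every student equally likely
print(chosen)
```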
Systematic Sampling. It
refers to the selection of every
kth member of the population
with the starting point
determined at random.
Example:
Let us say one of your teacher's sections will be
his/her population. Based on that population (the
section), the teacher performs systematic sampling:
the starting point is determined at random, and every
kth student from that point serves as a sample until
the teacher completes the desired number of samples
(students).
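A short Python sketch of the same procedure, using a hypothetical roster; the interval k is derived from the desired sample size:

```python
import random

population = [f"Student_{i}" for i in range(1, 41)]
n = 10                              # desired number of samples
k = len(population) // n            # sampling interval
start = random.randrange(k)         # random start in the first interval
sample = population[start::k][:n]   # every kth member from the start
print(sample)
```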
Stratified Random Sampling.
This is used when the population
can be subdivided into several
smaller groups or strata, and then
samples are randomly selected
from each stratum.
Example:
Let us say your subjects are the students in Batangas
State University - Alangilan. We all know that a large
number of students are officially enrolled on the campus.
Once you know the desired number of students for your
sample, you can break the students into subgroups by
college, such as the College of Engineering; the College of
Architecture, Fine Arts and Design; the College of Informatics
and Computing Sciences; and the College of Industrial
Technology, and from each college you determine the number
of students needed for the sample size.
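A Python sketch of stratified random sampling with proportional allocation; the strata (colleges) and their sizes are hypothetical:

```python
import random

# Hypothetical strata: students grouped by college
strata = {
    "Engineering": [f"ENG_{i}" for i in range(1, 201)],
    "Informatics": [f"ICS_{i}" for i in range(1, 101)],
    "IndTech":     [f"IT_{i}" for i in range(1, 101)],
}
total_needed = 40
population_size = sum(len(members) for members in strata.values())

sample = []
for college, members in strata.items():
    # Proportional allocation: each stratum contributes its fair share
    share = round(total_needed * len(members) / population_size)
    sample.extend(random.sample(members, share))
print(len(sample))  # 40: 20 from Engineering, 10 from each other stratum
```

With uneven stratum sizes, the rounded shares may need a small adjustment so they sum exactly to the desired total.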
Cluster Sampling. This is
sometimes called area sampling.
It is usually used when the
population is very large. In this
technique, groups or clusters,
rather than individuals, are
randomly selected.
Example:

Let us say you will be using the cluster sampling method.
The first thing to do is divide the population into N groups
called clusters; then randomly select clusters and, from each
selected cluster, draw the desired number of samples per
group to generate the total number of samples.
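A Python sketch of the procedure described above, with hypothetical clusters (barangays) and sizes: clusters are sampled first, then individuals within each selected cluster:

```python
import random

# Hypothetical clusters: households grouped by barangay (area)
clusters = {f"Barangay_{i}": [f"HH_{i}_{j}" for j in range(1, 51)]
            for i in range(1, 11)}

chosen_areas = random.sample(list(clusters), k=3)  # sample clusters, not people
sample = []
for area in chosen_areas:
    sample.extend(random.sample(clusters[area], k=10))  # then sample within each
print(chosen_areas, len(sample))  # 3 areas, 30 households in total
```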

Note: The first two techniques can be used when the
population is small, while the last two suit a large
population.
DATA ANALYSIS PROCESS
In our previous lesson, we defined data
analysis as a process of collecting,
transforming, cleaning, and modeling data
with the goal of discovering the required
information.
The results so obtained are
communicated, suggesting conclusions and
supporting decision-making. Data
visualization is at times used to portray the
data for ease of discovering the useful
patterns in the data.
The Data Analysis Process consists of the
following phases, which are iterative in nature:

 Data Requirements Specification
 Data Collection
 Data Processing
 Data Cleaning
 Data Analysis
 Communication
FLOW CHART ON DATA ANALYSIS PROCESS
Data Requirements
Specification
The data required for analysis is
based on a question or an experiment.
Based on the requirements of those
directing the analysis, the data
necessary as inputs to the analysis is
identified (e.g., a population of people).
Specific variables regarding the
population (e.g., age and income) may
be specified and obtained. Data may
be numerical (quantitative) or
categorical (qualitative).
Data Collection

Data Collection is the process of gathering information
on targeted variables identified as data requirements. The
emphasis is on ensuring accurate and honest collection of
data. Data Collection ensures that the data gathered is
accurate, so that the related decisions are valid. Data
Collection provides both a baseline to measure against and a
target to improve.

Data is collected from various sources, ranging from
organizational databases to information in web pages.
The data thus obtained may not be structured and may
contain irrelevant information. Hence, the collected data
needs to be subjected to Data Processing and Data
Cleaning.
Data Processing

The data that is collected must be
processed or organized for analysis. This
includes structuring the data as required
by the relevant statistical analysis tools.
For example, the data might have to be
placed into rows and columns in a table
within a spreadsheet or statistical
application. A data model might also have
to be created.
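As an illustrative sketch, assuming Python with pandas and hypothetical raw records, processing might mean structuring records into the typed rows and columns that analysis tools expect:

```python
import pandas as pd

# Hypothetical raw records, e.g. exported from different sources as text
raw = [
    {"name": "Ana", "age": "21", "income": "12,000"},
    {"name": "Ben", "age": "23", "income": "15,000"},
]

# Structure into rows and columns, with the numeric types tools expect
df = pd.DataFrame(raw)
df["age"] = df["age"].astype(int)
df["income"] = df["income"].str.replace(",", "").astype(float)
print(df.dtypes)
print(df)
```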
Data Cleaning

The processed and organized data may be
incomplete, contain duplicates, or contain errors.
Data Cleaning is the process of preventing and
correcting these errors.

There are several types of data cleaning that
depend on the type of data. For example, while
cleaning financial data, certain totals might be
compared against reliable published numbers or
defined thresholds. Likewise, quantitative data
methods can be used to detect outliers, which
would subsequently be excluded from the analysis.
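A small Python sketch of these cleaning steps, assuming pandas and hypothetical data: drop duplicates and incomplete records, then flag outliers with the interquartile-range (IQR) rule:

```python
import pandas as pd

df = pd.DataFrame({"income": [12000, 15000, 15000, 9000, 250000, None]})

df = df.drop_duplicates()  # remove duplicate records
df = df.dropna()           # drop incomplete records

# Flag outliers with the IQR rule and keep only values inside the fences
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
within = df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[within]         # the 250000 outlier is excluded here
print(clean)
```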
Data Analysis
Data that is processed, organized, and cleaned is ready for analysis.
Various data analysis techniques are available to understand, interpret, and derive
conclusions based on the requirements. Data visualization may also be used to
examine the data in graphical format, to obtain additional insight regarding the
messages within the data.

Statistical data models such as correlation and regression analysis can be used
to identify the relations among the data variables. These models, which are
descriptive of the data, are helpful in simplifying analysis and communicating results.
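As a hedged illustration in Python, assuming NumPy and SciPy and hypothetical paired observations, correlation and simple linear regression can be computed as follows:

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations: hours studied vs. quiz score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([55, 60, 62, 70, 71, 78, 80, 88])

# Correlation: strength of the linear association between the variables
r, p = stats.pearsonr(hours, score)
print(f"Pearson r = {r:.3f} (p = {p:.4f})")

# Simple linear regression: model score as a function of hours
fit = stats.linregress(hours, score)
print(f"score = {fit.slope:.2f} * hours + {fit.intercept:.2f}")
```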

The process might require additional data cleaning or additional data
collection, and hence these activities are iterative in nature.

The results of the data analysis are to be reported in a format required by
the users to support their decisions and further action. The feedback from the users
might result in additional analysis.

The data analysts can choose data visualization techniques, such as tables
and charts, which help in communicating the message clearly and efficiently to the
users. The analysis tools provide the facility to highlight the required information
with color codes and formatting in tables and charts.
