Chapter 2 DS

Chapter 2 discusses various data types, including structured, semi-structured, and unstructured data, along with their characteristics and differences. It also categorizes data based on its collection methods into primary and secondary data, and outlines different data types in statistics such as nominal, ordinal, discrete, and continuous data. Additionally, the chapter covers various data sources including databases, files, APIs, web scraping, sensors, and social media.

Uploaded by

trexwarrior92


Chapter-2

Data Types and Sources

Data Types and Sources: Different types of data: structured, unstructured,
semi-structured, Data sources: databases, files, APIs, web scraping, sensors,
social media

Data can be structured, semi-structured, or unstructured.

1. Structured data
Structured data is data whose elements are addressable for effective
analysis. It is organized into a formatted repository, typically a
database, and covers all data that can be stored in an SQL database as
tables with rows and columns. Such data has relational keys and maps easily
onto pre-designed fields. Today it is the most commonly processed kind of
data and the simplest to manage. Example: relational data.

2. Semi-structured data
Semi-structured data is information that does not reside in a relational
database but has some organizational properties, such as tags or key-value
pairs, that make it easier to analyze. With some processing it can be stored
in a relational database, although this can be very hard for some kinds of
semi-structured data; the semi-structured form exists partly to save space.
Example: XML data.
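As a minimal sketch of what "some organizational properties" means in practice, the snippet below parses a small XML document with Python's standard library. The document, its tag names, and the helper `parse_students` are hypothetical and only illustrate that records share tags without sharing a fixed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags give structure, but the second record lacks <email>.
XML_TEXT = """
<students>
  <student id="1"><name>Asha</name><email>asha@example.com</email></student>
  <student id="2"><name>Ravi</name></student>
</students>
"""

def parse_students(xml_text):
    """Return one dict per <student>; fields missing in the XML become None."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.findall("student"):
        records.append({
            "id": node.get("id"),              # attribute value
            "name": node.findtext("name"),     # text of the child tag
            "email": node.findtext("email"),   # None when the tag is absent
        })
    return records

students = parse_students(XML_TEXT)
print(students)
```

Mapping these records onto a relational table would require deciding what to do with the missing field, which is one reason such data can be hard to store relationally.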

3. Unstructured data
Unstructured data is data that is not organized in a predefined manner or does not
have a predefined data model, so it is not a good fit for a mainstream relational
database. Alternative platforms are therefore used to store and manage it. It is
increasingly prevalent in IT systems and is used by organizations in a variety of
business intelligence and analytics applications. Examples: Word documents, PDF
files, plain text, media logs.

Differences between Structured, Semi-structured and Unstructured data:

| Properties | Structured data | Semi-structured data | Unstructured data |
|---|---|---|---|
| Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
| Transaction management | Matured transaction management and various concurrency techniques | Transaction management is adapted from DBMS and is not matured | No transaction management and no concurrency |
| Version management | Versioning over tuples, rows, and tables | Versioning over tuples or graphs is possible | Versioned as a whole |
| Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is no schema |
| Scalability | It is very difficult to scale the DB schema | Its scaling is simpler than that of structured data | It is more scalable |
| Robustness | Very robust | New technology, not very widespread | - |

Data Types Based on Its Collection

Based on how data is collected, it can be divided into two categories - Primary and
Secondary data. Let’s review the key differences between these two types in the
following table -

| Factor | Primary Data | Secondary Data |
|---|---|---|
| Definition | Primary data refers to the first-hand data collected by the team. It is collected based on the researcher's needs. | Secondary data has been collected by other teams in the past. It does not necessarily need to be aligned with the researcher's requirements. |
| Data | Real-time data | Historical data |
| Process | Time consuming | Quick and easy |
| Collection time | Long | Short |
| Available in | Raw and crude form | Refined form |
| Accuracy and reliability | Very high | Relatively less |
| Examples | Personal interviews, surveys, observations, etc. | Websites, articles, research papers, historical data, etc. |

Types of Data:
In this chapter, data in statistics is classified into four categories:
• Nominal data
• Ordinal data
• Discrete data
• Continuous data

Nominal and ordinal data are qualitative, while discrete and continuous data are
quantitative. A related and widely used scheme, the levels of measurement, names
four types instead: nominal, ordinal, interval, and ratio. Both classifications
describe the nature of the data being collected or analyzed, and they help determine
the appropriate statistical tests to use.

Qualitative Data (Categorical Data)

As the name suggests, qualitative data describes the qualities or features of what is
being observed rather than measuring them. It is also called categorical data because
it sorts observations into categories. Examples include the gender of people in a
population sample, their family names, and so on.
Qualitative data is further divided into two categories:
• Nominal Data
• Ordinal Data

Nominal Data
Nominal data is a type of data that consists of categories or names that cannot be
ordered or ranked. It is often used to sort observations into groups, and the groups
cannot be ranked against one another. In other words, nominal data has no inherent
order. Examples of nominal data include gender (male or female), race (White, Black,
Asian), religion (Hinduism, Christianity, Islam, Judaism), and blood type (A, B, AB, O).
Nominal data can be represented using frequency tables and bar charts, which
display the number or proportion of observations in each category. For example, a
frequency table for gender might show the number of males and females in a sample
of people.
Nominal data is analyzed using non-parametric tests, which do not make any
assumptions about the underlying distribution of the data. Common non-parametric
tests for nominal data include the chi-squared test and Fisher's exact test. These
tests are used to compare the frequency or proportion of observations in different
categories.
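The frequency table described above can be sketched in a few lines of Python. The sample of blood types and the helper name `frequency_table` are made up for illustration; only the standard library is used.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (blood type).
blood_types = ["A", "O", "B", "O", "AB", "O", "A", "B"]

def frequency_table(values):
    """Map each category to (count, proportion of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

table = frequency_table(blood_types)
# Sorting here is only for display; nominal categories have no inherent order.
for category, (count, proportion) in sorted(table.items()):
    print(f"{category:>2}  count={count}  proportion={proportion:.3f}")
```

The same counts are what a bar chart of the variable would display, one bar per category.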

Ordinal Data
Ordinal data is a type of data that consists of categories that can be ordered or ranked.
However, the distance between categories is not necessarily equal. Ordinal data is
often used to measure subjective attributes or opinions, where there is a natural order
to the responses. Examples of ordinal data include education level (Elementary,
Middle, High School, College), job position (Manager, Supervisor, Employee), etc.
Ordinal data can be represented using bar charts and line charts. These displays show
the order or ranking of the categories, but they do not imply that the distances
between categories are equal.
Ordinal data is analyzed using non-parametric tests, which make no assumptions
about the underlying distribution of the data. Common non-parametric tests for
ordinal data include the Wilcoxon signed-rank test and the Mann-Whitney U test.
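To illustrate that ordinal categories carry an order but not a distance, the sketch below assigns each education level a rank and uses the ranks only for sorting and for finding a median category. The responses and the names `LEVELS`, `RANK`, and `median_level` are hypothetical.

```python
from statistics import median_low

# The order of this list defines the ranking of the categories.
LEVELS = ["Elementary", "Middle", "High School", "College"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical survey responses.
responses = ["College", "Middle", "High School", "Elementary", "High School"]

def median_level(values):
    """Return the middle category; median_low always returns an observed rank."""
    ranks = [RANK[value] for value in values]
    return LEVELS[median_low(ranks)]

ordered = sorted(responses, key=RANK.get)
print(ordered)
print(median_level(responses))
```

The ranks are used for ordering only; averaging them would wrongly assume that the steps between levels are equal.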

Quantitative Data (Numerical Data)
Quantitative data is data expressed as numerical values. It is also called numerical
data. This data type is used to represent quantities such as height, weight, and
length. Quantitative data is further classified into two categories:
• Discrete Data
• Continuous Data

Discrete Data
Discrete data is quantitative data that can take only distinct, separate values.
These values can be counted, typically as whole numbers.
Examples of discrete data are,
• Number of students in a class
• Marks of the students in a class test (when awarded as whole numbers)
• Number of members in a family, etc.

Continuous Data
Continuous data is quantitative data that is measured on a continuous range. The
variable can take any value between the lower and upper limits of that range.
Examples of continuous data are,
• Temperature
• Height of students in a class
• Weight of different members of a family
• Salary range of workers in a factory, etc.

Difference between Quantitative and Qualitative Data

Quantitative and qualitative data differ substantially. The basic differences
between them are summarized in the table below.

| Quantitative data | Qualitative data |
|---|---|
| Data is depicted in numerical terms. | Data is not depicted in numerical terms. |
| Can be shown in numbers and variables like ratio, percentage, and more. | Could be about the behavioral attributes of a person or thing. |
| Examples: 100%, 1:3, 123 | Examples: loud behavior, fair skin, soft quality, and more. |

Difference between Discrete and Continuous Data

Discrete data and continuous data both come under quantitative data. The
differences between them are summarized in the table below.

| Discrete Data | Continuous Data |
|---|---|
| Data that has clear gaps between possible values. | Data that falls into a continuous series. |
| Discrete data is countable. | Continuous data is measurable. |
| There are distinct or different values in discrete data. | Every value within a range is included in continuous data. |
| Discrete data is depicted using bar graphs. | Continuous data is depicted using histograms. |
| An ungrouped frequency distribution of discrete data is tabulated against single values. | A grouped frequency distribution of continuous data is tabulated against groups (intervals) of values. |
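The last row of the comparison can be made concrete with a short sketch: discrete values are counted one value at a time, while continuous values are first grouped into intervals. The data, the interval width, and the helper names `ungrouped_frequency` and `grouped_frequency` are assumptions chosen for illustration.

```python
from collections import Counter

def ungrouped_frequency(values):
    """Count how often each single value occurs (suits discrete data)."""
    return dict(sorted(Counter(values).items()))

def grouped_frequency(values, start, width, bins):
    """Count values per half-open interval [lo, lo + width), starting at `start`."""
    counts = {(start + i * width, start + (i + 1) * width): 0 for i in range(bins)}
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            lo = start + index * width
            counts[(lo, lo + width)] += 1
    return counts

siblings = [0, 1, 1, 2, 3, 1, 2]                                  # discrete counts
heights_cm = [150.2, 154.8, 161.0, 158.3, 167.5, 172.1, 163.4]    # continuous measurements

sibling_table = ungrouped_frequency(siblings)
height_table = grouped_frequency(heights_cm, start=150, width=10, bins=3)
print(sibling_table)
print(height_table)
```

A bar graph would plot `sibling_table` directly, while `height_table` holds the interval counts that a histogram draws.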

Data Sources:

A data source is the location from which the data being used originates. It may be
the place where the data is first created or where physical information is first
digitized. However, even highly refined data can serve as a source, as long as
another process accesses and uses it.

Databases
A database is an organized collection of structured information, or data, typically
stored electronically in a computer system. A database is usually controlled by a
database management system (DBMS).
Types:
• Relational database
• NoSQL database
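As a small sketch of a relational database acting as a data source, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table name, columns, and rows are hypothetical.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory relational database with one small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
        [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meena", 91)],
    )
    conn.commit()
    return conn

conn = build_demo_db()
# Parameterized query: rows and columns are addressed through the schema.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks >= ? ORDER BY marks DESC",
    (80,),
).fetchall()
conn.close()
print(top)
```

The fixed schema (every row has an id, a name, and marks) is what makes this structured data in the sense used earlier in the chapter.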

Files
Data can be stored in files in various formats, such as text files, CSV files,
Excel spreadsheets, and more.
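A minimal sketch of reading CSV data with the standard `csv` module follows. To keep it self-contained, the file contents are held in a string and wrapped in `io.StringIO`; with a real file, an object returned by `open(path, newline="")` could be passed instead. The column names are hypothetical.

```python
import csv
import io

# Stand-in for the contents of a CSV file.
CSV_TEXT = "name,marks\nAsha,82\nRavi,67\n"

def read_rows(text_stream):
    """Parse CSV rows into dicts, converting the marks column to int."""
    reader = csv.DictReader(text_stream)  # the first line supplies the field names
    return [{"name": row["name"], "marks": int(row["marks"])} for row in reader]

rows = read_rows(io.StringIO(CSV_TEXT))
print(rows)
```

CSV stores every value as text, so numeric columns need an explicit conversion as shown.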

APIs (Application Programming Interfaces)

API stands for Application Programming Interface. In this context, the word
Application refers to any software with a distinct function, and Interface can be
thought of as a contract of service between two applications. This contract defines
how the two communicate with each other using requests and responses.
Types:
• Web APIs: Allow access to data over HTTP (e.g. RESTful APIs) and usually return
data in JSON or XML format.
• Library APIs: APIs provided by programming libraries to access specific functions
and data.
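The request and response contract of a web API can be sketched without touching the network. In the snippet below, the endpoint URL, the query parameters, and the JSON field names are all hypothetical; a real API documents its own.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/weather"  # hypothetical endpoint

def build_request_url(base_url, params):
    """Encode query parameters into the URL of a GET request."""
    return f"{base_url}?{urlencode(params)}"

def parse_response(body):
    """Decode a JSON response body and pick out the fields of interest."""
    payload = json.loads(body)
    return payload["city"], payload["temperature_c"]

url = build_request_url(BASE_URL, {"city": "Pune", "units": "metric"})
# Stand-in for the body a server might return for the request above.
sample_body = '{"city": "Pune", "temperature_c": 29.5}'
city, temperature = parse_response(sample_body)
print(url)
print(city, temperature)
```

In a real program the URL would be sent with an HTTP client such as `urllib.request.urlopen`, and the returned body passed to `parse_response`.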

Web Scraping
Web scraping is the process of using bots to extract content and data from a website.
Unlike screen scraping, which only copies the pixels displayed on screen, web scraping
extracts the underlying HTML code and, with it, data stored in a database. The scraper
can then replicate entire website content elsewhere.
Usage: Extracting news articles, product information, reviews, and more from
websites.
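A minimal sketch of extracting data from HTML with the standard `html.parser` module is shown below. The page is an inline string rather than a downloaded document, and the choice of `<h2>` tags for headlines is an assumption about a hypothetical page layout.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text found inside <h2> elements."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headlines[-1] += data

# Stand-in for HTML fetched from a news site.
PAGE = "<html><body><h2>Rain expected</h2><p>Details...</p><h2>Markets rise</h2></body></html>"

def extract_headlines(html):
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [headline.strip() for headline in parser.headlines]

headlines = extract_headlines(PAGE)
print(headlines)
```

Real pages differ in layout, so the tags to look for have to be chosen by inspecting the HTML of the site being scraped.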

Sensors

A sensor is a device that detects and responds to some type of input from the physical
environment. The input can be light, heat, motion, moisture, pressure, or any number
of other environmental phenomena. Sensors collect data from the environment or from
devices, providing valuable information for various applications and IoT projects.
In the context of data science, sensor data is valuable for IoT applications,
environmental monitoring, health care, manufacturing, and more.
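Sensor readings usually arrive as a stream of numbers and often need smoothing before analysis. The sketch below applies a simple moving average to made-up temperature readings; the window size and the function name `moving_average` are illustrative choices, not something prescribed by the chapter.

```python
from collections import deque

def moving_average(readings, window):
    """Average each reading with up to `window - 1` readings before it."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    smoothed = []
    for reading in readings:
        recent.append(reading)
        smoothed.append(sum(recent) / len(recent))
    return smoothed

# Hypothetical temperature readings in degrees Celsius, with one spike.
temps = [20.0, 22.0, 21.0, 35.0, 23.0]
smoothed = moving_average(temps, window=3)
print(smoothed)
```

The spike at 35.0 is damped in the smoothed series, which is one common first step before looking for trends in sensor data.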

Social Media

Social media platforms generate vast amounts of data daily, including text messages,
videos, and user engagement metrics.
Usage: Analyzing trends, sentiments, user behavior, and engagement patterns.
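One simple form of trend analysis is counting hashtags across posts. The sketch below does this with the standard library; the posts are invented, and treating hashtags case-insensitively is an assumption.

```python
import re
from collections import Counter

# Hypothetical posts collected from a social media platform.
posts = [
    "Loving the new phone! #tech #gadgets",
    "Exam week again... #StudentLife",
    "Big launch today #Tech",
]

def hashtag_counts(texts):
    """Count hashtags across all posts, ignoring letter case."""
    tags = []
    for text in texts:
        tags.extend(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return Counter(tags)

counts = hashtag_counts(posts)
print(counts.most_common(1))
```

The most frequent tags give a rough picture of what is trending in the collected sample.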

Chapter Ends…
