Introduction to Big Data Analytics_thendral1

The document provides an overview of Big Data, including its definition, characteristics (volume, velocity, variety, and veracity), and types (structured, semi-structured, and unstructured data). It discusses the challenges posed by traditional data processing tools in handling massive datasets and the importance of data quality. Additionally, it highlights various applications and case studies of Big Data analytics in different sectors.

Big Data Analytics

N.Thendral
Agenda

• Introduction
• Big Data Fundamentals
• Scalability and Parallel Processing
• Designing Data Architecture
• Data Sources and Quality
• Data Pre-processing and Storing
• Data Storage and Analysis
• Big Data Analytics Applications
• Case Studies
Lesson 1
Introduction To BIG DATA

Big Data Characteristics, Types and Classifications


Introduction to Big Data
Analytics
• Definition of Big Data
Big Data is a term used to describe a massive volume of structured and
unstructured data that is too large and complex for traditional data processing
tools to handle.

Advances in technology have led to the production and storage of voluminous
amounts of data. Earlier, data sizes were measured in megabytes (10^6 B);
nowadays petabytes (10^15 B) of data are processed and analyzed to discover new
facts and generate new knowledge. Conventional systems for storage, processing
and analysis are challenged by the large growth in the volume of data, the
variety of data in various forms and formats, increasing complexity, faster
generation of data, and the need to process, analyze and use the data quickly.
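
To make the jump in scale concrete, the following minimal sketch expresses the decimal (SI) byte units as powers of ten and compares a megabyte with a petabyte. The names `UNITS` and `to_bytes` are illustrative and do not come from the slides.

```python
# Decimal (SI) storage units expressed as powers of ten, in bytes.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * UNITS[unit]


# One petabyte holds a billion megabytes.
ratio = to_bytes(1, "PB") // to_bytes(1, "MB")
print(ratio)  # 1000000000
```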
Evolution of Big Data and its characteristics

[Figure: evolution of Big Data. Abbreviations: EDP = electronic data processing; ERP = enterprise resource planning]
Characteristics of Big Data

• Volume: Big data involves large datasets that can range from
terabytes to petabytes or even more.
• Velocity: Data is generated and collected at high speeds, often in real-
time.
• Variety: Data comes in various forms, including text, images, videos,
and more.
• Veracity: Data quality and trustworthiness can be a challenge in big
data.
VOLUME
Volume means “how much data is generated”. Nowadays, organizations,
people and systems generate or receive very large amounts of data,
from terabytes (TB) to petabytes (PB) to exabytes (EB) and more.

VOLUME= Very Large Amount of Data


VELOCITY

Velocity means “how fast data is produced”. Nowadays, organizations,
people and systems generate huge amounts of data at a very fast rate.

VELOCITY = Data Produced at a Very Fast Rate


VARIETY
Variety means “different forms of data”. Nowadays, organizations,
people and systems generate huge amounts of data at a very fast rate
and in different formats. The different data formats are discussed
later in this lesson.

VARIETY = Data Produced in Different Formats


VERACITY

Veracity means “the quality, correctness or accuracy of captured
data”. Of the 4 Vs, it is the most important V for any Big Data
solution, because without correct data there is no use in storing large
amounts of data at a fast rate and in different formats. The data
should deliver correct business value.

VERACITY = The Correctness of Data
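
One way to make veracity measurable is to count how many captured records pass basic validity checks. The sketch below is only an illustration: the record fields (`name`, `age`), the sample records and the validity rules are all assumptions, not part of the original material.

```python
def is_valid(record):
    """Return True if a record passes simple, assumed quality rules."""
    name = record.get("name")
    age = record.get("age")
    # Rule 1 (assumed): name must be a non-empty string.
    if not isinstance(name, str) or not name.strip():
        return False
    # Rule 2 (assumed): age must be an integer in a plausible range.
    if not isinstance(age, int) or not 0 <= age <= 120:
        return False
    return True


def veracity_score(records):
    """Fraction of records that pass the checks (0.0 for an empty input)."""
    if not records:
        return 0.0
    return sum(is_valid(r) for r in records) / len(records)


# Hypothetical sample: two clean records, one with an empty name,
# one with an impossible age.
sample = [
    {"name": "A", "age": 35},
    {"name": "B", "age": 41},
    {"name": "", "age": 29},
    {"name": "D", "age": -5},
]
print(veracity_score(sample))  # 0.5
```

A low score signals that the data needs cleaning before volume, velocity or variety can add any business value.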


Definitions
“Big Data is also data but with a huge size. Big Data is a term used to
describe a collection of data that is huge in volume and yet growing
exponentially with time. In short such data is so large and complex that
none of the traditional data management tools are able to store it or
process it efficiently.”
“Extremely large data sets that may be analyzed computationally to
reveal patterns, trends and associations, especially relating to human
behavior and interaction, are known as Big Data.”
Web Data
Web data is the data present on web servers (or enterprise servers) in the form of text,
images, videos, audios and multimedia files for web users. Internet applications
including web sites, web services, web portals, online business applications, emails,
chats, tweets and social networks provide and consume the web data.

Examples:
◗ Wikipedia
◗ Google Maps
◗ YouTube
◗ Facebook
Classification of Data

• structured
• semi-structured
• multi-structured
• unstructured
Structured
• Any data that can be stored, accessed and processed in a fixed
format is termed 'structured' data.
• Structured data conform to and associate with data schemas and data
models.
• However, problems arise when the size of such data grows to a huge
extent; typical sizes are now in the range of multiple zettabytes.
Examples Of Structured Data
An 'Employee' table in a database is an example of Structured Data
Structured data enables the following:
• Data insert, delete, update and append
• Indexing to enable faster data retrieval
• Scalability, which enables increasing or decreasing capacities and data processing
operations such as storing, processing and analytics
• Transaction processing that follows the ACID rules (Atomicity, Consistency, Isolation and
Durability)
• Encryption and decryption for data security
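
The operations listed above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `employee` table layout, the index name and the sample rows are hypothetical, and SQLite stands in for whatever relational database is actually used.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Fixed schema: every row has the same typed columns (assumed layout).
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT, salary REAL)"
)

# Index to speed up lookups by department.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Insert, update and delete inside one transaction: the `with` block
# commits on success and rolls back if an exception is raised.
with conn:
    conn.executemany(
        "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
        [("Asha", "Sales", 50000.0), ("Ravi", "IT", 60000.0), ("Meena", "IT", 65000.0)],
    )
    conn.execute("UPDATE employee SET salary = salary + 5000 WHERE name = ?", ("Asha",))
    conn.execute("DELETE FROM employee WHERE name = ?", ("Ravi",))

# Atomicity: the second insert violates NOT NULL, so the whole
# transaction (including the first insert) is rolled back.
try:
    with conn:
        conn.execute(
            "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
            ("Kiran", "HR", 40000.0),
        )
        conn.execute(
            "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
            (None, "HR", 1.0),
        )
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT name, dept, salary FROM employee ORDER BY id").fetchall()
print(rows)  # [('Asha', 'Sales', 55000.0), ('Meena', 'IT', 65000.0)]
```

The failed transaction leaves no trace of "Kiran" in the table, which is the atomicity part of ACID in action.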


Unstructured
• Any data whose form or structure is unknown is classified as
unstructured data.
• In addition to its huge size, unstructured data poses multiple
challenges in terms of processing it to derive value.
• A typical example of unstructured data is a heterogeneous data
source containing a combination of simple text files, images, videos
etc.
• Example: the output returned by a 'Google Search'.
Unstructured Data

• The data do not possess data features such as a table or a database.
• Unstructured data are found in file types such as .TXT and .CSV.
• Data may be stored as key-value pairs, such as hash key-value pairs.
• Data may have internal structures, as in e-mails.
• The data do not reveal relationships or hierarchies.
• The relationships, schema and features need to be established separately.
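
As a minimal sketch of establishing a schema separately, the example below takes a raw e-mail (plain text with some internal structure) and pulls its parts into named fields using Python's standard `email` module. The message content and the field names in `record` are hypothetical.

```python
from email import message_from_string

# Hypothetical raw message; as stored, it is just text with no table or schema.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "Hi Bob, the report is attached.\n"
)

msg = message_from_string(raw)

# Establish a simple schema by pulling the header fields and body into a dict.
record = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
    "body": msg.get_payload().strip(),
}
print(record["subject"])  # Quarterly report
```

Only after this extraction step can the data be indexed, queried or joined with other records.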


Semi Structured Data

Examples of semi-structured data are XML and JSON documents.
Semi-structured data contain tags or other markers, which separate semantic
elements and enforce hierarchies of records and fields within the data.
Semi-structured data do not conform to or associate with formal data
model structures, such as the relational database and table models.
Examples Of Semi-structured Data

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
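
Because the tags carry the field names, records like these can be read without a predefined table schema. The sketch below parses three of the records with Python's standard `xml.etree.ElementTree` and re-emits one as JSON. The wrapping `<records>` root element is an assumption, added only so that the text forms a single well-formed XML document.

```python
import json
import xml.etree.ElementTree as ET

# Records from the example above, wrapped in an assumed <records> root.
xml_text = """<records>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</records>"""

root = ET.fromstring(xml_text)

# Each <rec> element maps directly to a dict keyed by its child tags.
records = [
    {
        "name": rec.findtext("name"),
        "sex": rec.findtext("sex"),
        "age": int(rec.findtext("age")),
    }
    for rec in root.findall("rec")
]

# The same record as JSON, the other semi-structured format named in the text.
print(json.dumps(records[0]))
# {"name": "Prashant Rao", "sex": "Male", "age": 35}
```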
Big Data Types
• Social networks and web data, such as Facebook, Twitter, e-mails, blogs
and YouTube.
• Transactions data and Business Processes (BPs) data, such as credit card
transactions and flight bookings, and public agencies data, such as
medical records and insurance business data.
• Customer master data, such as data for facial recognition and for the
name, date of birth, marriage anniversary, gender, location and income
category.
• Machine-generated data, such as machine-to-machine or Internet of
Things data, and data from computers, sensors, trackers and web logs.
• Human-generated data, such as biometrics data, human–machine
interaction data, e-mail records with a mail server and a MySQL database
of student grades.
Big Data Examples
1. Chocolate Marketing Company with large number of installed
Automatic Chocolate Vending Machines (ACVMs)
2. Automotive Components and Predictive Automotive Maintenance
Services (ACPAMS) rendering customer services for maintenance
and servicing of (Internet) connected cars and their components
3. Weather data Recording, Monitoring and Prediction (WRMP)
Organization
4. A toy company optimizing its services, products and schedules,
and devising ways of using Big Data processing and storage
for descriptive, predictive and prescriptive analytics
Basis of Big Data Classification
• Big Data formats
• Data Stores structure
• Big Data sources
• Processing data rates
• Processing Big Data rates
• Analysis types
• Big Data processing methods
• Data analysis methods
• Data usages
Summary

We learnt about:

• Evolution of Big Data
• Big Data Definitions
• Big Data Characteristics
• Big Data Types
• Basis of Big Data Classification
