Introduction to Big Data Analytics
N.Thendral
Agenda
• Introduction
• Big Data Fundamentals
• Scalability and Parallel Processing
• Designing Data Architecture
• Data Sources and Quality
• Data Pre-processing and Storing
• Data Storage and Analysis
• Big Data Analytics Applications
• Case Studies
Lesson 1
Introduction to Big Data
Advances in technology have led to the production and storage of voluminous
amounts of data. Earlier, data sizes were measured in megabytes (10^6 B); nowadays
petabytes (10^15 B) are processed and analyzed to discover new facts and generate
new knowledge. Conventional systems for storage, processing and analysis face
challenges from the large growth in data volume, the variety of forms and
formats, increasing complexity, faster data generation, and the need for quick
processing, analysis and usage.
Evolution of Big Data and its characteristics
• Volume: Big data involves large datasets that can range from
terabytes to petabytes or even more.
• Velocity: Data is generated and collected at high speeds, often in real-
time.
• Variety: Data comes in various forms, including text, images, videos,
and more.
• Veracity: Data quality and trustworthiness can be a challenge in big
data.
VOLUME
Volume means “how much data is generated”. Nowadays, organizations,
people and systems generate or receive vast amounts of data, ranging
from terabytes (TB) to petabytes (PB) to exabytes (EB) and more.
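To make the unit scale above concrete, here is a minimal Python sketch that formats a byte count in decimal (SI) units. The helper name `human_size` and the unit list are illustrative assumptions, not part of the original slides.

```python
# Decimal (SI) byte units: each step is a factor of 1000.
# 1 MB = 10**6 B, 1 TB = 10**12 B, 1 PB = 10**15 B, 1 EB = 10**18 B.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_size(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(human_size(10**6))    # one megabyte
print(human_size(10**15))   # one petabyte
print(10**15 // 10**6)      # how many megabytes fit in one petabyte
```

The last line shows that a petabyte is a billion (10^9) megabytes, which is the jump in scale the introduction describes.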
Web data includes images, videos, audio and multimedia files for web users. Internet
applications, including web sites, web services, web portals, online business applications,
emails, chats, tweets and social networks, provide and consume web data.
Examples:
◗ Wikipedia
◗ Google Maps
◗ YouTube
◗ Facebook
Classification of Data
• Structured
• Semi-structured
• Multi-structured
• Unstructured
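As a rough illustration of these classes, the following Python sketch holds the same kind of information in three shapes. All values are made up for illustration, and the `phones` field is a hypothetical addition used only to show that semi-structured records need not share identical fields.

```python
import csv
import io
import json

# Structured: a fixed schema, every row has exactly the same columns.
csv_text = "name,sex,age\nPrashant Rao,Male,35\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys; fields may differ between records.
json_text = '{"name": "Prashant Rao", "age": 35, "phones": ["home", "work"]}'
record = json.loads(json_text)

# Unstructured: free text with no schema; extracting the age needs text processing.
note = "Prashant Rao, aged 35, called about his order."

print(rows[0]["age"])       # CSV values arrive as strings
print(record["age"])        # JSON keeps the numeric type
print("35" in note)         # only a plain substring search is possible here
```

Multi-structured data would be a collection that mixes several of these shapes in one data set.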
Structured
• Any data that can be stored, accessed and processed in a fixed
format is termed 'structured' data.
• Structured data conforms to, and is associated with, data schemas and
data models.
• However, problems arise when the size of such data grows very
large; typical sizes are now in the range of multiple
zettabytes.
Examples Of Structured Data
An 'Employee' table in a database is an example of structured data.
Structured data enables the following:
• Data insert, delete, update and append
• Scalability, which enables increasing or decreasing capacities and data processing operations
such as storing, processing and analytics
• Transaction processing, which follows the ACID rules (Atomicity, Consistency, Isolation and
Durability)
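The operations listed above can be sketched with Python's built-in `sqlite3` module. The `employee` table layout, the sample rows and the age constraint are assumptions chosen for illustration, not a schema taken from the slides.

```python
import sqlite3

# In-memory database with a hypothetical 'employee' table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER CHECK (age >= 18))"
)

# Insert and update inside one transaction; 'with conn' commits on success.
with conn:
    conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("Prashant Rao", 35))
    conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("Seema R.", 41))
    conn.execute("UPDATE employee SET age = 36 WHERE name = ?", ("Prashant Rao",))

# Atomicity: the second statement violates the CHECK constraint,
# so the whole transaction, including the DELETE, is rolled back.
try:
    with conn:
        conn.execute("DELETE FROM employee WHERE name = ?", ("Seema R.",))
        conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("Invalid", 10))
except sqlite3.IntegrityError:
    pass  # the failed transaction left the table unchanged

names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")]
print(names)
```

Both original rows are still present after the failed transaction, which is the "all or nothing" behaviour the A in ACID refers to.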
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
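Tagged records like the ones above can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that uses the first three records; the wrapping `<recs>` root element is an assumption added here because an XML parser needs a single root.

```python
import xml.etree.ElementTree as ET

records_xml = """
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
"""

# The fragment has no single root element, so wrap it in a
# hypothetical <recs> element before parsing.
root = ET.fromstring(f"<recs>{records_xml}</recs>")

# Turn each <rec> into a dictionary, converting age to an integer.
people = [
    {
        "name": rec.findtext("name"),
        "sex": rec.findtext("sex"),
        "age": int(rec.findtext("age")),
    }
    for rec in root.findall("rec")
]

average_age = sum(p["age"] for p in people) / len(people)
print(people[1]["name"], average_age)
```

The tags describe each field, so no external schema is needed to read the records; that self-describing quality is what distinguishes this kind of data from a fixed-format table.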
Big Data Types
• Social networks and web data, such as Facebook, Twitter, e-mails, blogs
and YouTube.
• Transactions data and Business Processes (BPs) data, such as credit card
transactions and flight bookings, and public agencies data, such as
medical records and insurance business data.
• Customer master data, such as data for facial recognition and for the
name, date of birth, marriage anniversary, gender, location and income
category.
• Machine-generated data, such as machine-to-machine or Internet of
Things data, and data from computers, sensors, trackers and web logs.
• Human-generated data, such as biometrics data, human–machine
interaction data, e-mail records with a mail server and a MySQL database
of student grades.
Big Data Examples
1. Chocolate Marketing Company with a large number of installed
Automatic Chocolate Vending Machines (ACVMs)
2. Automotive Components and Predictive Automotive Maintenance
Services (ACPAMS) rendering customer services for maintenance
and servicing of (Internet) connected cars and their components
3. Weather data Recording, Monitoring and Prediction (WRMP)
Organization
4. A toy company optimizing its services, products and
schedules, and devising ways of using Big Data processing and storage
for descriptive, predictive and prescriptive analytics
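The three kinds of analytics named in the last example can be sketched on a toy version of the vending machine case. Every number below is made up for illustration, and the three-day average is only a naive stand-in for a real predictive model.

```python
from statistics import mean

# Hypothetical daily sales counts for one vending machine (made-up numbers).
daily_sales = [40, 42, 38, 45, 50, 48, 52]
stock_left = 60          # hypothetical units still in the machine
days_until_refill = 2    # hypothetical days until the scheduled refill

# Descriptive analytics: summarize what has happened.
average_sales = mean(daily_sales)

# Predictive analytics: estimate what will happen.
# Naive forecast: the mean of the last three days.
forecast_per_day = mean(daily_sales[-3:])

# Prescriptive analytics: recommend what to do about it.
expected_demand = forecast_per_day * days_until_refill
action = "refill early" if expected_demand > stock_left else "keep schedule"

print(average_sales, forecast_per_day, action)
```

With these numbers the forecast demand over two days exceeds the remaining stock, so the sketch recommends an early refill.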
Basis of Big Data Classification
• Big Data formats
• Data Stores structure
• Big Data sources
• Processing data rates
• Processing Big Data rates
• Analysis types
• Big Data processing methods
• Data analysis methods
• Data usages
Summary
We learnt