Big Data Engineer 2021-Ecosystem-Course Guide (2) - 21-30

The document provides an introduction to big data, covering its definition, challenges, types, and the five Vs (Volume, Velocity, Variety, Veracity, and Value). It highlights the exponential growth of data and the complexities involved in managing and analyzing it, as well as the various use cases across different industries. Additionally, it discusses big data analytics techniques and the importance of deriving actionable insights from large datasets.

Uploaded by

moahh20011
Copyright
© © All Rights Reserved

V11.2
Unit 1. Introduction to big data

Topics
• Big data overview
• Big data use cases
• Evolution from traditional data processing to big data processing
• Introduction to Apache Hadoop and the Hadoop infrastructure

Introduction to big data © Copyright IBM Corporation 2021

Figure 1-3. Topics

© Copyright IBM Corp. 2016, 2021 1-5


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Introduction to big data


• The big data tsunami.
• The Vs of big data (3Vs, 4Vs, 5Vs, and so on). The count depends on who does the counting.
• The infrastructure:
  - Apache open source
  - The distributions
  - The add-ons
  - Open Data Platform initiative (ODPi.org)
• Some basic terminology.


Figure 1-4. Introduction to big data

Big data is a term that is used to describe large collections of data (also known as data sets). Big
data might be unstructured and grow so large and so quickly that it is difficult to manage with
regular database or statistics tools.


Big data: A tsunami that is hitting us


• We are witnessing a tsunami of data:
  - Huge volumes
  - Data of different types and formats
  - Impacts on the business at new and ever-increasing speeds
• The challenges:
  - Capturing, transporting, and moving the data
  - Managing the data, the hardware that is involved, and the software (open source and not)
  - Processing, from munging the raw data to programming and providing insight into the data
  - Storing, safeguarding, and securing
“Big data refers to non-conventional strategies and innovative technologies that are
used by businesses and organizations to capture, manage, process, and make sense
of a large volume of data.”
• The industries that are involved.
• The future.


Figure 1-5. Big data: A tsunami that is hitting us

We are witnessing a tsunami of data: huge volumes in different types and formats, which makes
managing, processing, storing, safeguarding, securing, and transporting the data a real challenge.
“Big data refers to non-conventional strategies and innovative technologies that are used by
businesses and organizations to capture, manage, process, and make sense of a large volume of
data.” (Source: Reed, J., Data Analytics: Applicable Data to Advance Any Business. Seattle, WA:
CreateSpace Independent Publishing Platform, 2017. 1544916507.)
The analogies:
• Elephant (hence the logo of Hadoop)
• Humongous (the word from which MongoDB takes its name)
• Streams, data lakes, and oceans of data


Some examples of big data


• Science
  - Astronomy
  - Atmospheric science
  - Genomics
  - Biogeochemical
  - Biological
  - Other complex / interdisciplinary scientific research
• Social
  - Social networks
  - Social data:
    - Person to person and client to client (P2P and C2C): wish lists on Amazon.com, Craigslist
    - Person to world (P2W): Twitter, Facebook, LinkedIn
• Medical records
• Commercial
• Web, event, and database logs
• "Digital exhaust", which is the result of human interaction with the internet
• Sensor networks
• RFID
• Internet text and documents
• Internet search indexing
• Call detail records (CDRs)
• Photographic archives
• Video and audio archives
• Large-scale e-commerce
• Regular government business and commerce needs
• Military and homeland security surveillance


Figure 1-6. Some examples of big data

Data comes from many sources: historical and new data that is generated by social media apps,
science, and medical research; stream data from web applications; and IoT sensor data. The amount
of data is larger than ever, growing exponentially, and arriving in many different formats.
The business value in the data comes from the meaning that you can harvest from it. Deriving
business value from all that data is a significant problem.


Types of big data

Structured
• Data that can be stored and processed in a fixed format, which is also known as a schema.

Semi-structured
• Data that does not have a formal structure of a data model, that is, a table definition in a
relational DBMS, but has some organizational properties like tags and other markers to separate
semantic elements that make it easier to analyze, such as XML or JSON.

Unstructured
• Data that has an unknown form and that cannot be stored in an RDBMS and analyzed unless it is
transformed into a structured format is called unstructured data.
• Text files and multimedia contents like images, audio, and videos are examples of unstructured
data. Unstructured data is growing quicker than other data. Experts say that 80% of the data in an
organization is unstructured.


Figure 1-7. Types of big data

Here are the types of big data:


• Structured: Data that can be stored and processed in a fixed format is called structured data.
Data that is stored in a relational database management system (RDBMS) is one example of
structured data. It is easier to process structured data because it has a fixed schema.
Structured Query Language (SQL) is often used to manage such data.
• Semi-structured: Semi-structured data is a type of data that does not have the formal structure
of a data model, such as a table definition in a relational DBMS. Semi-structured data has some
organizational properties like tags and other markers to separate semantic elements, which
makes it easier to analyze. XML files or JSON documents are examples of semi-structured
data.
• Unstructured: Data that has an unknown form and cannot be stored in an RDBMS and
analyzed unless it is transformed into a structured format is called unstructured data. Examples
include text files, images, audio, videos, tweets, Facebook status updates, instant messenger
conversations, blogs, voice recordings, and sensor data. These types of data do not have a
defined pattern. Unstructured data is growing faster than the other types. Experts say that 80%
of the data in an organization is unstructured. Unstructured data is often a reflection of human
thoughts, emotions, and feelings, which sometimes are difficult to express by using exact words.
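
The three types can be illustrated with a minimal Python sketch. It is not part of the course material: the sample data, field names, and the email pattern are all hypothetical and serve only to show how the amount of up-front structure changes the way data is read.

```python
import csv
import io
import json
import re

# Structured: a fixed schema, so every row has the same named columns.
# (Sample data is made up for illustration.)
structured_csv = "id,name,city\n1,Ana,Lima\n2,Bo,Oslo\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: no table definition, but keys act as tags that mark
# the semantic elements. Records may differ in shape (the second has no "tags").
semi_structured = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Bo"}]'
records = json.loads(semi_structured)
tags_per_record = [record.get("tags", []) for record in records]

# Unstructured: free text with no defined pattern. Some structure must be
# extracted (here with a simplified, assumed email pattern) before analysis.
unstructured_text = "Loved the new phone! Battery lasts 2 days. Contact me at ana@example.com"
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", unstructured_text)

print(rows[0]["city"])
print(tags_per_record)
print(emails)
```

The structured rows can be addressed by column name immediately, the semi-structured records need a default for the missing key, and the free text yields usable fields only after an extraction step.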


The four classic dimensions of big data (the four Vs)

• Volume: Scale of data
• Velocity: Analysis of streaming data
• Variety: Different forms of data
• Veracity: Uncertainty of data

There is a fifth V, which is Value. It is the reason for working with big data to obtain business
insight.


Figure 1-8. The four classic dimensions of big data (the four Vs)

Here are the five Vs of big data:


• Data volume
People and systems are more connected than ever before. This interconnection leads to more
data sources, which results in an amount of data that is larger than ever before (and constantly
growing). Old data is being digitized, which contributes to the volume. The increased volume of
data requires a constant increase of computing power to derive value (meaning) from the data.
Traditional computing methods do not work on the volume of data that is accumulating today.
• Data velocity
The speed at which data comes into the organization, and the number of directions from which it
arrives, are increasing due to interconnection and advances in network technology. Data is coming
in faster than we can make sense of it. The faster the data comes in and the more varied the
sources, the harder it is to derive value (meaning) from the data. Traditional computing methods
do not work on data that is coming in at today’s speeds.

• Data variety
More sources of data mean more varieties of data in different formats: from traditional
documents and databases, to semi-structured and unstructured data from click streams, GPS
location data, social media apps, and IoT (to name a few). Different data formats mean that it is
tougher to derive value (meaning) from the data because it must all be extracted for processing
in different ways. Traditional computing methods do not work on all these different varieties of
data.
• Data veracity
There are usually noise, biases, and abnormalities in data, so a huge amount of data is likely to
carry some uncertainty. After the data is gathered, it must be curated, sanitized, and cleansed.
Often, this process is seen as the thankless job of being a data janitor, and it can take more
than 85% of a data analyst’s or data scientist’s time. Veracity in data analysis is considered the
biggest challenge when compared to volume, velocity, and variety. Large volume, wide variety,
and high velocity, along with high-end technology, have no significance if the data that is
collected or reported is incorrect. Data trustworthiness (in other words, the quality of data) is of
the highest importance in the big data world.
• Data value
The business value in the data comes from the meaning that we can harvest from it. The value
comes from converting a large volume of data into actionable insights that are generated by
analyzing information, which leads to smarter decision making.
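
As a rough illustration of the cleansing step that is described under data veracity, here is a minimal Python sketch. The sensor readings, the `cleanse` helper, and the plausible temperature range are all assumptions made up for this example; real curation rules depend on the data source.

```python
from statistics import mean

# Hypothetical raw sensor readings: (sensor_id, temperature in Celsius or None).
raw_readings = [
    ("s1", 21.5),
    ("s1", 21.5),   # exact duplicate
    ("s2", None),   # missing value
    ("s3", 22.1),
    ("s4", 950.0),  # implausible reading (noise)
    ("s5", 20.9),
]

def cleanse(readings, low=-50.0, high=60.0):
    """Drop missing values, values outside an assumed plausible range, and duplicates."""
    seen = set()
    clean = []
    for reading in readings:
        _, value = reading
        if value is None:
            continue  # missing value
        if not (low <= value <= high):
            continue  # outside the assumed plausible range
        if reading in seen:
            continue  # exact duplicate of an earlier reading
        seen.add(reading)
        clean.append(reading)
    return clean

clean_readings = cleanse(raw_readings)

# Compare the average before and after cleansing (missing values excluded in both).
raw_mean = mean(value for _, value in raw_readings if value is not None)
clean_mean = mean(value for _, value in clean_readings)

print(clean_readings)
print(round(raw_mean, 1), round(clean_mean, 1))
```

A single noisy reading is enough to pull the raw average far away from the cleansed one, which is why incorrect data undermines any insight that is drawn from it.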
References:
• What is big data? More than volume, velocity and variety:
https://developer.ibm.com/blogs/what-is-big-data-more-than-volume-velocity-and-variety/
• The Four Vs of Big Data:
https://www.ibmbigdatahub.com/infographic/four-vs-big-data
• Big Data Analytics:
ftp://ftp.software.ibm.com/software/tw/Defining_Big_Data_through_3V_v.pdf
• The 5 Vs of big data:
https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data/
• The 4 Vs of Big Data for Yielding Invaluable Gems of Information:
https://www.promptcloud.com/blog/The-4-Vs-of-Big-Data-for-Yielding-Invaluable-Gems-of-Information


An insight into big data analytic techniques

[Figure: Data science at the center, surrounded by related techniques and skills: domain
knowledge, business strategy, communications, statistics, visualizations, neurocomputing, data
mining, machine learning, pattern recognition, business analysis, presentation, KDD, AI,
databases and data processing, problem solving, and inquisitiveness.]


Figure 1-9. An insight into big data analytic techniques

Big data analytics is the use of advanced analytic techniques against large, diverse data sets from
different sources and in different sizes from terabytes to zettabytes. There are several specialized
techniques and technologies that are involved. The slide shows some of the big data analytics
techniques and the relationship between them. This list is not exhaustive, but it helps you
understand the complexity of the problem domain.
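
To give a feel for the range from terabytes to zettabytes, the following Python sketch does some back-of-the-envelope arithmetic. The 200 MB/s read throughput and the worker counts are hypothetical figures chosen only for illustration; the byte units use decimal (SI) definitions.

```python
# Decimal (SI) byte units.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}

# Hypothetical sequential read throughput of one worker: 200 MB/s.
ASSUMED_THROUGHPUT = 200 * 10**6

def scan_time_seconds(size_bytes, throughput_bytes_per_s, workers=1):
    """Time to read the data once if it is split evenly across the workers."""
    return size_bytes / (throughput_bytes_per_s * workers)

terabytes_per_zettabyte = UNITS["ZB"] // UNITS["TB"]

one_tb_single = scan_time_seconds(UNITS["TB"], ASSUMED_THROUGHPUT)
one_pb_single = scan_time_seconds(UNITS["PB"], ASSUMED_THROUGHPUT)
one_pb_parallel = scan_time_seconds(UNITS["PB"], ASSUMED_THROUGHPUT, workers=1000)

print(terabytes_per_zettabyte)    # a zettabyte is a billion terabytes
print(one_tb_single)              # seconds for 1 TB on one worker
print(one_pb_single / 86400)      # days for 1 PB on one worker
print(one_pb_parallel)            # seconds for 1 PB on 1000 workers
```

Under these assumptions, one worker needs well over a month just to read a petabyte once, while a thousand workers reading in parallel need about as long as a single worker needs for a terabyte.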
For more information, see the articles that are listed under References.
References:
• An Insight into 26 Big Data Analytic Techniques: Part 1:
https://blogs.systweak.com/an-insight-into-26-big-data-analytic-techniques-part-1/
• An Insight into 26 Big Data Analytic Techniques: Part 2:
https://blogs.systweak.com/an-insight-into-26-big-data-analytic-techniques-part-2/
• Big data analytics:
https://www.ibm.com/analytics/hadoop/big-data-analytics
• A Beginner’s Guide to Big Data Analytics:
https://blogs.systweak.com/a-beginners-guide-to-big-data-analytics/

1.2. Big data use cases


Big data use cases


Figure 1-10. Big data use cases
