Lec 1 - Introduction to Big Data

The document outlines a course on Big Data, focusing on its storage, processing, and core NoSQL concepts. It discusses the characteristics of Big Data, including volume, velocity, variety, veracity, and value, along with the classification of data types. Additionally, it highlights enterprise technologies related to Big Data and Business Intelligence, emphasizing the differences between traditional BI and Big Data methodologies.

Understanding Big Data - I
Lecture 1

Course Objectives

▪ Understand the fundamentals of storage and processing of Big Data
▪ Illustrate the core concepts of NoSQL for Big Data
Prerequisites

▪ To be familiar with the following database concepts:
  • Database model and schema
  • Different database models
  • Normalization
  • Standard SQL
  • Distributed databases
  • Transaction processing and concurrency control

Lecture Outlines

▪ Why Big Data?

▪ The Definition of Big Data

▪ Characteristics/Challenges of Big Data

▪ Classification of Big Data

▪ Applications of Big Data

▪ Enterprise Technologies for Big Data and Business Intelligence

Why Big Data?
The model of generating and consuming data has changed.

▪ 2.5 quintillion (10^18) bytes of data are generated every day!
  • Social media sites
  • Sensors
  • Digital photos
  • Business transactions
  • Location-based data
  • Generative AI tools
  [Figure: data sources feeding Big Data - websites, social media, billing, ERP, CRM, network switches, RFID]

Source: IBM, http://www-01.ibm.com/software/data/bigdata/
Glen Mules - Big Data University
Why Big Data?

▪ Big data itself isn't new: it has been here for a while and is growing exponentially.
▪ What is new is the technology to process and analyze it:
  • Increase of storage capacities
  • Increase of processing power
  • Availability of data

It is all about deriving new insight for the business: available technology can cost-effectively manage and analyze all available data in its native form, whether unstructured, structured, or streaming.
What is Big Data?

▪ Wikipedia
  • Big data is a term for datasets that are so large or complex that traditional data processing applications are inadequate to deal with them.
  • Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.

▪ Gartner
  • Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.

▪ Academia
  • Big Data is any data that is expensive to manage and hard to extract value from.

▪ DeepSeek
  • Big Data refers to extremely large and complex datasets that traditional data processing tools and methods are unable to efficiently manage, analyse, or interpret.

Key idea: "Big" is relative! "Difficult Data" is perhaps more apt! (Bill Howe, UW)
Characteristics of Big Data: 3 Vs

Volume: super-exponential growth in data volume

▪ Refers to the vast amount of data generated every second.
▪ It is not only terabytes, but zettabytes or brontobytes.
▪ This makes most datasets too large to store and analyse using traditional database technology.
▪ Big data tools use distributed systems to store and analyse data that are dotted around anywhere in the world.

https://www.virtualb.it/en/blog/big-data-or-small-data-the-value-its-not-about-quantity-but-about-quality/
▪ High data volumes impose:
  • Distinct data storage and processing demands.
  • Additional data preparation, curation and management processes.

https://medium.com/analytics-vidhya/the-5-vs-of-big-data-2758bfcc51d
Velocity: data can arrive at fast speeds

▪ The most challenging V to conquer, since it has a compounding effect on the other Vs.
▪ The velocity of data translates into the amount of time it takes for the data to be processed once it arrives.
▪ Coping with the fast inflow of data requires designing highly elastic and available data processing solutions and corresponding data storage capabilities.

"The 3 V's of Big Data: Velocity Remains A Challenge for Many." Dennis Duckworth, Jan 4, 2023
▪ It is a challenge to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

https://www.researchgate.net/figure/Examples-of-big-data-velocity_fig3_313400371
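To make the velocity challenge concrete, here is a minimal sketch of windowed, as-it-arrives processing in Python. The event source is simulated and the window size is an arbitrary assumption; real high-velocity pipelines would use a streaming engine, but the idea is the same: aggregate incrementally instead of storing everything first and querying later.

```python
# Minimal sketch: tumbling-window aggregation over a simulated event stream.
# The random event generator stands in for a real sensor/clickstream feed.
import random
import time
from collections import defaultdict

WINDOW_SECONDS = 2  # hypothetical window size

def event_stream(n_events=50):
    """Simulate events arriving quickly: (timestamp, sensor_id, value)."""
    for _ in range(n_events):
        yield time.time(), f"sensor-{random.randint(1, 3)}", random.random()
        time.sleep(0.05)

def run():
    window_start = time.time()
    counts, sums = defaultdict(int), defaultdict(float)
    for ts, sensor, value in event_stream():
        if ts - window_start >= WINDOW_SECONDS:
            # Emit the aggregate for the closed window, then reset.
            for s in sorted(counts):
                print(f"window avg {s}: {sums[s] / counts[s]:.3f} ({counts[s]} events)")
            window_start, counts, sums = ts, defaultdict(int), defaultdict(float)
        counts[sensor] += 1
        sums[sensor] += value

if __name__ == "__main__":
    run()
```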
Variety: multiple formats and types of data

▪ Refers to the different types of data we need to use. In fact, 80% of the world's data is unstructured.
▪ Data variety brings challenges in terms of data integration, transformation, processing, and storage.
  • Structured data: financial transactions, students' records, etc.
  • Documents: unstructured text data (Web); semi-structured data (XML, RDF triples, etc.)
  • Graphs: social networks, Semantic Web (RDF graphs), road networks, etc.
  • Data streams: sensor data, RFID data, network data, trajectory data, etc.
  • Time series data: stock exchange data, video/audio data, trajectory, EEG data, etc.
  • Multimedia data: audio, video, image, etc.

They could also be 4 V's. © 2014 IBM Corporation
Veracity: how accurate or truthful a data set may be

▪ Veracity is the degree to which data is accurate, precise, and trustworthy, given the bias, noise, and abnormality in the data.
▪ It also refers to incomplete data or the presence of errors, outliers, and missing values.
▪ Converting this type of data into a consistent, consolidated, and unified source of information creates a big challenge for the enterprise (see the cleansing sketch below).

https://www.researchgate.net/figure/Conceptualization-of-the-Components-of-Big-Data-Veracity_fig3_260178341

"Veracity: The Most Important 'V' of Big Data", Aug 29, 2019
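As a toy illustration of the veracity problem, the sketch below applies two very simple quality rules to made-up sensor records: drop missing values, then flag outliers with a median-based deviation test. Real cleansing pipelines are far more elaborate; the records and thresholds here are purely hypothetical.

```python
# Minimal sketch: basic veracity checks (missing values, outliers) with the stdlib.
import statistics

records = [
    {"id": 1, "temp": 21.4},
    {"id": 2, "temp": None},    # missing value
    {"id": 3, "temp": 22.1},
    {"id": 4, "temp": 250.0},   # likely a sensor error (outlier)
    {"id": 5, "temp": 20.9},
]

# 1. Drop records with missing values.
complete = [r for r in records if r["temp"] is not None]

# 2. Flag outliers far from the median (median absolute deviation rule).
values = [r["temp"] for r in complete]
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)
clean = [r for r in complete if mad == 0 or abs(r["temp"] - median) <= 3 * mad]

print(f"kept {len(clean)} of {len(records)} records:", clean)
```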
Or 5 V's. © 2014 IBM Corporation

Value: the usefulness of data

▪ The value characteristic is intuitively related to the veracity characteristic in that the higher the data fidelity, the more value it holds for the business.
▪ Value is also dependent on how long data processing takes, because analytics results have a shelf-life; e.g., a 20-minute delayed stock quote has little to no value for making a trade compared to a quote that is 20 milliseconds old.
▪ Value and time are inversely related: the longer it takes for data to be turned into meaningful information, the less value it has for a business.
▪ Apart from veracity and time, value is also impacted by the following lifecycle-related concerns:
  • How well has the data been stored?
  • Were valuable attributes of the data removed during data cleansing?
  • Are the right types of questions being asked during data analysis?
  • Are the results of the analysis being accurately communicated to the appropriate decision-makers?
And 10 V's have also been proposed.

Classification of Big Data

▪ The data processed by Big Data solutions can be human-generated or machine-generated, although it is ultimately the responsibility of machines to generate the analytic results.
▪ Human-generated data is the result of human interaction with systems:
  • Online services and digital devices.
▪ Machine-generated data is generated by software programs and hardware devices in response to real-world events:
  • A point-of-sale system, or information conveyed from the numerous sensors in a cell phone.
▪ The primary types of data are:
  • Structured data
  • Unstructured data
  • Semi-structured data
▪ Apart from these three fundamental data types, another important type of data in Big Data environments is metadata.
Classification of Big Data: Structured data

▪ Structured data is data which is in an organized form (e.g., in rows and columns) and can be easily used by a computer program. Relationships exist between entities of data.
▪ Data stored in databases is an example of structured data.
▪ It is easy to work with structured data with respect to the following (see the sketch after this list):
  • CRUD operations – SQL
  • Indexing
  • Security
  • Scalability – scale up
  • Transaction processing – ACID properties
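A minimal sketch of why structured data is easy to work with, using Python's built-in sqlite3 module as a stand-in for any relational database; the students table and its rows are hypothetical. SQL-based CRUD and atomic (ACID) commits come almost for free once the data fits rows and columns.

```python
# Minimal sketch: CRUD on structured (row/column) data with sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

with conn:  # the with-block commits as one atomic transaction (or rolls back on error)
    conn.execute("INSERT INTO students VALUES (1, 'Aya', 3.4)")   # Create
    conn.execute("INSERT INTO students VALUES (2, 'Omar', 3.1)")
    conn.execute("UPDATE students SET gpa = 3.6 WHERE id = 2")    # Update
    conn.execute("DELETE FROM students WHERE id = 1")             # Delete

for row in conn.execute("SELECT id, name, gpa FROM students"):    # Read
    print(row)

conn.close()
```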


Classification of Big Data: Semi-structured data

▪ Semi-structured data has a defined level of structure and consistency but is not relational in nature. Instead, semi-structured data is hierarchical or graph-based.
  • XML, markup languages like HTML, etc.
▪ There is no separation between the data and the schema.
▪ Semi-structured data often has special pre-processing and storage requirements, especially if the underlying format is not text-based.
  • For example, validating an XML file to ensure that it conforms to its schema definition.
▪ Metadata for this data is available but is not sufficient.
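The sketch below parses a hypothetical semi-structured XML record with the standard-library ElementTree: it confirms the document is well-formed and walks its hierarchy. Full XSD schema validation typically requires a third-party library (e.g. lxml), so only a basic required-element check is shown here.

```python
# Minimal sketch: parsing and lightly checking a hypothetical XML record.
import xml.etree.ElementTree as ET

doc = """
<order id="42">
  <customer>Aya</customer>
  <items>
    <item sku="A-1" qty="2"/>
    <item sku="B-7" qty="1"/>
  </items>
</order>
"""

root = ET.fromstring(doc)           # raises ParseError if the XML is malformed
assert root.find("customer") is not None, "required element missing"

print("order", root.get("id"), "for", root.findtext("customer"))
for item in root.iter("item"):
    print("  item", item.get("sku"), "x", item.get("qty"))
```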
Classification of Big Data: Unstructured data

▪ Unstructured data is data which does not conform to a data model or is not in a form which can be easily used by a computer program.
▪ It is estimated that unstructured data makes up 80% of the data within any given enterprise.
  • e.g., memos, chat rooms, presentations, the body of an email, images, videos, letters, research papers, white papers, etc.
▪ Unstructured data has a faster growth rate than structured data.
▪ Unstructured data cannot be directly processed or queried using SQL. If it is required to be stored within a relational database, it is stored in a table as a Binary Large Object (BLOB).
▪ Alternatively, a NoSQL database can be used to store unstructured data alongside structured data.
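As a small illustration of the BLOB approach, the sketch below stores an opaque binary payload in a relational table with sqlite3; the table name and the placeholder bytes are made up. SQL can describe the row (name, size) but cannot query inside the payload itself.

```python
# Minimal sketch: storing an unstructured payload as a BLOB in a relational table.
import sqlite3

payload = b"\x89PNG...pretend-this-is-image-bytes..."  # placeholder binary content

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)")
conn.execute("INSERT INTO documents (name, content) VALUES (?, ?)",
             ("scan.png", sqlite3.Binary(payload)))

name, size = conn.execute(
    "SELECT name, length(content) FROM documents WHERE id = 1").fetchone()
print(f"stored {name}: {size} bytes (content itself is opaque to SQL)")
conn.close()
```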
Classification of Big Data: Metadata

▪ Metadata provides information about a dataset's characteristics and structure.
▪ It is mostly machine-generated and can be appended to data.
▪ The tracking of metadata is crucial to Big Data processing, storage and analysis because it provides information about the pedigree of the data and its provenance during processing.
▪ Examples of metadata include:
  • XML tags providing the author and creation date of a document
  • Attributes providing the file size and resolution of a digital photograph
▪ Big Data solutions rely on metadata, particularly when processing semi-structured and unstructured data.
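A minimal sketch of capturing file-level metadata next to a dataset so its provenance can be tracked later. The path, field names and sample file are hypothetical; richer metadata such as image resolution or document author would need format-specific readers.

```python
# Minimal sketch: recording basic descriptive metadata for a data file.
import json
import os
import time

def collect_metadata(path: str) -> dict:
    """Return basic descriptive metadata (name, size, modification time) for one file."""
    st = os.stat(path)
    return {
        "file": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(st.st_mtime)),
    }

if __name__ == "__main__":
    # Create a small sample file so the example is self-contained.
    with open("sample_data.csv", "w") as f:
        f.write("id,value\n1,10\n2,20\n")
    print(json.dumps(collect_metadata("sample_data.csv"), indent=2))
```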
Applications of Big Data

▪ Hurricane moving-path prediction
▪ Protein-to-protein interaction networks
▪ Satellite imagery, mobile stations, distributed sensor networks, geographical plotting, …
▪ Web data
▪ Volcano monitoring
▪ Digital health care
▪ Intelligent transportation


Self-learning

▪ Kamal, Raj, and Preeti Saxena. Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning. McGraw-Hill Education, 2019. Chapter 1, Section 1.7: Big Data Analytics Applications and Case Studies.
▪ Bahga, Arshdeep, and Vijay Madisetti. Big Data Analytics: A Hands-On Approach, 2020. Chapter 1, Section 1.4: Domain Specific Examples of Big Data.
Enterprise Technologies for Big Data and Business Intelligence

▪ Big Data has ties to business architecture at each of the organizational layers.
▪ In an enterprise executed as a layered system, the strategic layer constrains the tactical layer, which directs the operational layer.
▪ The transformation of data into information, information into knowledge and knowledge into wisdom requires an understanding of the following concepts:
  • Online Transaction Processing (OLTP)
  • Online Analytical Processing (OLAP)
  • Extract Transform Load (ETL)
  • Data Warehouses
  • Data Marts
  • Traditional BI
  • Big Data BI
Online Transaction Processing (OLTP)

▪ OLTP is a software system that processes transaction-oriented data.
▪ The term "online transaction" refers to the completion of an activity in real time, not batch processing.
▪ OLTP systems store operational data that is normalized. This data is a common source of structured data and serves as input to many analytic processes.
  • Examples of OLTP systems: ticket reservation systems, banking and point-of-sale systems.
▪ OLTP queries comprise simple insert, delete and update operations with sub-second response times.
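To ground this, here is a minimal sketch of an OLTP-style operation with sqlite3: a short transfer between two rows of a hypothetical accounts table, executed as one small transaction that either commits fully or rolls back. The table and balances are invented for illustration.

```python
# Minimal sketch: a short OLTP-style transaction (transfer between two accounts).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500.0), (2, 120.0)])

def transfer(src: int, dst: int, amount: float) -> None:
    """Move money between accounts inside a single atomic transaction."""
    with conn:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")  # aborts; nothing is committed
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

transfer(1, 2, 75.0)
print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 425.0), (2, 195.0)]
```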
Online Analytical Processing (OLAP)

▪ OLAP systems are used for processing data analysis queries. They form an integral part of business intelligence, data mining and machine learning processes.
▪ They are relevant to Big Data in that they can serve both as a data source and as a data sink that is capable of receiving data.
▪ They are used in diagnostic, predictive and prescriptive analytics.
▪ OLAP systems perform long-running, complex queries against a multidimensional database whose structure is optimized for performing advanced analytics.
▪ OLAP systems store historical data that is aggregated and denormalized to support fast reporting capability.
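For contrast with the OLTP sketch above, here is a minimal OLAP-style aggregate query over a small, denormalized sales fact table in sqlite3. The table, rows and dimensions (month, region) are hypothetical; a real OLAP cube would pre-aggregate across many more dimensions.

```python
# Minimal sketch: an analytical roll-up query over a denormalized fact table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales_fact (
    sale_date TEXT, region TEXT, product TEXT, quantity INTEGER, revenue REAL)""")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-05", "EMEA", "widget", 10, 150.0),
    ("2024-01-07", "EMEA", "gadget",  4, 260.0),
    ("2024-02-02", "APAC", "widget",  7, 105.0),
    ("2024-02-11", "EMEA", "widget",  3,  45.0),
])

# Roll-up: revenue by month and region (a typical reporting slice).
query = """
SELECT substr(sale_date, 1, 7) AS month, region,
       SUM(revenue) AS total_revenue, SUM(quantity) AS units
FROM sales_fact
GROUP BY month, region
ORDER BY month, region
"""
for row in conn.execute(query):
    print(row)
conn.close()
```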
Extract Transform Load (ETL)

▪ ETL is a process of loading data from a source system into a target system. It represents the main operation through which data warehouses are fed data.
▪ The required data is first obtained or extracted from the sources, after which the extracts are modified or transformed by the application of rules. Finally, the data is inserted or loaded into the target system.
▪ The source system and the target system can be a database, a flat file, or an application.
▪ A Big Data solution encompasses the ETL feature-set for converting data of different types.
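The three steps can be shown end to end in a tiny sketch: extract from a flat file, transform by applying simple rules, load into a target database. The file name, rules and target table are invented for illustration; production ETL would use dedicated tooling and far richer rules.

```python
# Minimal sketch of Extract-Transform-Load with the standard library.
import csv
import sqlite3

# Create a small source file so the example is self-contained.
with open("orders.csv", "w", newline="") as f:
    f.write("order_id,amount_usd,country\n1001,19.99,eg\n1002,-5.00,de\n1003,42.50,eg\n")

# Extract: read raw rows from the source (a flat file here).
with open("orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: drop invalid amounts, normalize country codes, convert types.
clean = [
    {"order_id": int(r["order_id"]),
     "amount_usd": float(r["amount_usd"]),
     "country": r["country"].upper()}
    for r in rows if float(r["amount_usd"]) > 0
]

# Load: insert the transformed records into the target system.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (order_id INTEGER, amount_usd REAL, country TEXT)")
target.executemany("INSERT INTO orders VALUES (:order_id, :amount_usd, :country)", clean)
print(target.execute("SELECT * FROM orders").fetchall())
```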
Data Warehouses

▪ A data warehouse is a central, enterprise-wide repository consisting of historical and current data.
▪ Data warehouses are heavily used by BI to run various analytical queries.
▪ They usually interface with an OLAP system to support multi-dimensional analytical queries.
▪ Data pertaining to multiple business entities from different operational systems is periodically extracted, validated, transformed and consolidated into a single denormalized database.
Data Mart

▪ A data mart is a subset of the data stored in a data warehouse that typically belongs to a department, division, or specific line of business. Data warehouses can have multiple data marts.
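One simple way to picture this is a department-scoped view over a warehouse table, as in the sketch below; the warehouse table, the "retail" department and the view are all hypothetical, and real data marts are often separate physical stores rather than views.

```python
# Minimal sketch: a data mart modeled as a department-scoped view over a warehouse table.
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_sales (region TEXT, department TEXT, revenue REAL)")
dw.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", [
    ("EMEA", "retail",    1200.0),
    ("EMEA", "wholesale",  900.0),
    ("APAC", "retail",     700.0),
])

# The "retail data mart": only the slice that belongs to the retail department.
dw.execute("CREATE VIEW retail_mart AS "
           "SELECT region, revenue FROM dw_sales WHERE department = 'retail'")

print(dw.execute("SELECT region, SUM(revenue) FROM retail_mart GROUP BY region").fetchall())
dw.close()
```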
Business Intelligence

Traditional BI
▪ Utilizes descriptive and diagnostic analytics to provide information on historical and current events.
▪ It is not "intelligent" because it only provides answers to correctly formulated questions.
  • Ad-hoc reports – custom-made reports on a specific area of the business
  • Dashboards – a holistic view of key business areas at periodic intervals, in real time or near real time

Big Data BI
▪ Builds upon traditional BI by acting on the cleansed, consolidated enterprise-wide data in the data warehouse and combining it with semi-structured and unstructured data sources.
▪ It comprises both predictive and prescriptive analytics to facilitate the development of an enterprise-wide understanding of business performance.
Business Intelligence vs Big Data

▪ Although Big Data and Business Intelligence are two technologies used to analyse data to help companies in the decision-making process, there are differences between them. They differ in the way they work as much as in the type of data they analyse.
▪ Traditional BI methodology is based on the principle of grouping all business data into a central Data Warehouse and analysing it in offline mode. The data is structured in a conventional relational database with an additional set of indexes and forms of access to the tables (multidimensional cubes).
▪ These are the main differences between Big Data and Business Intelligence:
  • In a Big Data environment, information is stored on a distributed file system, rather than on a central server. It is a much safer and more flexible space.
  • Big Data solutions carry the processing functions to the data, rather than the data to the functions. As the analysis is centered on the information, it is easier to handle larger amounts of information in a more agile way.
  • Big Data can analyse data in different formats, both structured and unstructured. The volume of unstructured data is growing at levels much higher than that of structured data. Nevertheless, its analysis carries different challenges, which Big Data solutions address by allowing a global analysis of various sources of information.
  • Data processed by Big Data solutions can be historical or come from real-time sources. Thus, companies can make decisions that affect their business in an agile and efficient way.
  • Big Data technology uses massively parallel processing (MPP) concepts, which improves the speed of analysis. With MPP, many instructions are executed simultaneously: the jobs are divided into several parallel execution parts, and at the end the partial results are recombined and presented. This allows large volumes of information to be analysed quickly (a toy illustration follows below).
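The sketch below is a toy, single-machine illustration of the MPP idea, not a real MPP engine: a synthetic dataset is split into partitions, each partition is aggregated in a separate worker process, and the partial results are recombined at the end.

```python
# Toy illustration of partitioned, parallel aggregation (the core MPP idea).
from multiprocessing import Pool

def partial_sum(partition):
    """Aggregate one partition independently ('processing goes to the data')."""
    return sum(partition)

if __name__ == "__main__":
    data = list(range(1_000_000))                       # synthetic dataset
    n_workers = 4
    chunk = len(data) // n_workers
    partitions = [data[i * chunk:(i + 1) * chunk] for i in range(n_workers - 1)]
    partitions.append(data[(n_workers - 1) * chunk:])   # last partition takes the remainder

    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, partitions)    # executed in parallel

    print("total =", sum(partials))                     # recombine the partial results
```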
Case Study

▪ You are required to go through the case study at the end of Chapter 1: "Ensure to Insure (ETI)".
▪ Erl T, Khattak W, Buhler P. Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall Press; 2016 Jan.
