BDA-Lec3

The document outlines the Big Data Analytics Lifecycle, detailing its stages from business case evaluation to data analysis, using ETI Insurance Company as a case study. It emphasizes the importance of structured methodologies for managing tasks related to data acquisition, processing, and analysis. Each stage is crucial for ensuring effective data handling and achieving business objectives, particularly in detecting fraudulent claims.


Big Data Analytics


Dr. Nesma Mahmoud
Lecture 3: More on Big Data Analytics
What will we learn in this lecture?

01. Intro to Big Data Analytics Lifecycle

02. Detailed Big Data Analytics Lifecycle with Case Study

03. Real Project Example


01. Intro to Big Data Analytics Lifecycle
Intro
● To address the distinct requirements for performing
analysis on Big Data, a step-by-step methodology is
needed to organize the activities and tasks involved with
acquiring, processing, analyzing and repurposing data.
Big Data Analytics Lifecycle can be …
● This is a specific data analytics
lifecycle that organizes and manages
the tasks and activities associated
with the analysis of Big Data.
02. Detailed Big Data Analytics Lifecycle with Case Study
ETI Insurance Company (Case Study)
● ETI’s Big Data journey has reached the stage where its IT team
possesses the necessary skills and the management is convinced of the
potential benefits that a Big Data solution can bring in support of the
business goals.

● The CEO and the directors are eager to see Big Data in action.
○ In response to this, the IT team, in partnership with the business
personnel, take on ETI’s first Big Data project.

○ After a thorough evaluation process, the “detection of fraudulent claims” objective is chosen as the first Big Data solution.

● The team then follows a step-by-step approach as set forth by the Big
Data Analytics Lifecycle in pursuit of achieving this objective.
Stage 1: Business Case Evaluation
● Also called “define the problem”

● Each Big Data analytics lifecycle must begin with a well-defined business case that presents a clear understanding of the justification, motivation and goals of carrying out the analysis.

● The Business Case Evaluation stage requires that a business case be created, assessed and approved prior to proceeding with the actual hands-on analysis tasks.
Stage 1: Business Case Evaluation
● Based on business requirements that are
documented in the business case, it can be
determined whether the business problems being
addressed are really Big Data problems.
○ In order to qualify as a Big Data problem, a
business problem needs to be directly related
to one or more of the Big Data characteristics
of volume, velocity, or variety
● Another outcome of this stage is the determination
of the underlying budget required to carry out the
analysis project.
○ Any required purchase, such as tools,
hardware and training, must be understood
in advance so that the
anticipated investment can be weighed
against the expected benefits of achieving
the goals.
Stage 1: Business Case Evaluation (Case Study)
● Carrying out Big Data analysis for the “detection of fraudulent claims”
○ corresponds to a decrease in monetary loss and hence carries complete business backing.

● For keeping the analysis somewhat straightforward, the scope of Big Data analysis is limited to identification of fraud in the building sector.
○ ETI provides building and contents insurance to both domestic and commercial customers.

● To measure the success of the Big Data solution for fraud detection, one
of the KPIs set is the reduction in fraudulent claims by 15%.
Stage 1: Business Case Evaluation (Case Study)
● Taking their budget into account,
○ the team decides that their largest expense will be in the procuring of new infrastructure that is appropriate for building a Big Data solution environment.
○ They realize that they will be leveraging open source technologies to support batch
processing and therefore do not believe that a large, initial up-front investment is
required for tooling.
○ However, when they consider the broader Big Data analytics lifecycle, the team
members realize that they should budget for the acquisition of additional data
quality and cleansing tools and newer data visualization technologies.
○ After accounting for these expenses, a cost-benefit analysis reveals that the
investment in the Big Data solution can return itself several times over if the targeted
fraud-detecting KPIs can be attained.
○ As a result of this analysis, the team believes that a strong business case exists for
using Big Data for enhanced data analysis.
Stage 2: Data Identification
● This stage is dedicated to identifying the
datasets required for the analysis project
and their sources.

● Identifying a wider variety of data sources may increase the probability of finding hidden patterns and correlations.

● Depending on the business scope of the analysis project and the nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise.
Stage 2: Data Identification
● In the case of internal datasets,
○ a list of available datasets from internal sources, such as data marts and operational systems, is typically compiled and matched against a predefined dataset specification.

● In the case of external datasets,
○ a list of possible third-party data providers, such as data markets and publicly available datasets, is compiled.
○ Some forms of external data may be embedded within blogs or other types of content-based web sites, in which case they may need to be harvested via automated tools.
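To make the dataset-matching step concrete, the following Python sketch compares a compiled list of internal datasets against a predefined dataset specification. The dataset names and required fields are hypothetical, chosen only to illustrate the idea.

# A minimal sketch of matching available internal datasets against a
# predefined dataset specification (names and fields are hypothetical).
required_spec = {
    "policy_data": {"policy_id", "customer_id", "start_date"},
    "claim_data":  {"claim_id", "policy_id", "amount", "is_fraudulent"},
}

available = {
    "policy_data": {"policy_id", "customer_id", "start_date", "premium"},
    "claim_data":  {"claim_id", "policy_id", "amount"},
    "call_center_notes": {"note_id", "claim_id", "text"},
}

for name, needed_fields in required_spec.items():
    if name not in available:
        print(f"{name}: MISSING from internal sources")
        continue
    missing = needed_fields - available[name]
    status = "OK" if not missing else f"missing fields: {sorted(missing)}"
    print(f"{name}: {status}")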
Stage 2: Data Identification (Case Study)
A number of internal and external datasets are identified.

Internal data includes:
• policy data,
• insurance application documents,
• claim data,
• claim adjuster notes,
• incident photographs,
• call center agent notes and emails.

External data includes:
• social media data (Twitter feeds),
• weather reports,
• geographical (GIS) data,
• census data.

• Nearly all datasets cover the past five years.

• The claim data consists of historical claim records with multiple fields, one of which specifies whether the claim was fraudulent or legitimate.
Stage 3: Data Acquisition and Filtering
● In this stage, the data is gathered from all of the
data sources that were identified during the
previous stage.

● The acquired data is then subjected to automated filtering for the removal of corrupt data or data that has been deemed to have no value to the analysis objectives.

● Depending on the type of data source, data may come as a collection of files, such as data purchased from a third-party data provider, or may require API integration, such as with Twitter.
Stage 3: Data Acquisition and Filtering
● In many cases, especially where external,
unstructured data is concerned, some or most of
the acquired data may be irrelevant (noise) and
can be discarded as part of the filtering process.

● Data classified as “corrupt” can include records with missing or nonsensical values or invalid data types.

● Data that is filtered out for one analysis may possibly be valuable for a different type of analysis.
○ Therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To minimize the required storage space, the verbatim copy can be compressed.
Stage 3: Data Acquisition and Filtering
● Both internal and external data needs to be persisted (stored) once it gets generated or enters the enterprise boundary.
○ For batch analytics, this data is persisted to
disk prior to analysis.
○ For real-time analytics, the data is analyzed
first and then persisted to disk.
● As evidenced in this Figure, metadata can be
added via automation to data from both internal
and external data sources to improve the
classification and querying.
Examples of appended metadata include dataset size and structure, source information, date and time of creation or collection and language-specific information. It is vital that metadata be machine-readable and passed forward along subsequent analysis stages. This helps maintain data provenance throughout the Big Data analytics lifecycle, which helps to establish and preserve data accuracy and quality.
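As a rough illustration of this kind of automated metadata appending, the Python sketch below computes a small, machine-readable provenance record for an acquired file. The file name, the helper function and the exact metadata fields are assumptions made for the example, not a prescribed implementation.

# A minimal sketch of appending machine-readable provenance metadata to an
# acquired dataset (the file, field names and helper are illustrative).
import hashlib
import json
import os
from datetime import datetime, timezone

def build_metadata(path, source, language="en"):
    with open(path, "rb") as f:
        content = f.read()
    return {
        "dataset_name": os.path.basename(path),
        "source": source,
        "size_bytes": len(content),
        "checksum_sha256": hashlib.sha256(content).hexdigest(),
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        "language": language,
    }

# Write a tiny sample file so the sketch runs end to end.
with open("twitter_feed.json", "w") as f:
    f.write('{"user_id": 1, "text": "sample tweet"}\n')

# Store the metadata next to the dataset so it can be passed forward
# along subsequent analysis stages.
meta = build_metadata("twitter_feed.json", source="Twitter API")
with open("twitter_feed.meta.json", "w") as f:
    json.dump(meta, f, indent=2)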
Stage 3: Data Acquisition and Filtering (Case Study)

Data → Its Source

• policy data → policy administration system
• claim data, incident photographs and claim adjuster notes → claims management system
• insurance application documents → document management system
• the claim adjuster notes → currently embedded within the claim data; a separate process is used to extract them
• call center agent notes and emails → CRM system
• rest of the datasets → third-party data providers
Stage 3: Data Acquisition and Filtering (Case Study)
● A compressed copy of the original version of all of the datasets is stored
on-disk.

● From a provenance perspective, the following metadata is tracked to capture the pedigree of each dataset: dataset’s name, source, size, format, checksum, acquired date and number of records.

● A quick check of the data quality of the Twitter feeds and weather reports suggests that around four to five percent of their records are corrupt.
○ Consequently, two batch data filtering jobs are established to remove
the corrupt records.
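A batch filtering job of this kind could look roughly like the pandas sketch below. The column names, the sentinel value and the validity ranges are assumptions chosen to illustrate removing corrupt records; they are not taken from ETI's actual datasets.

# A minimal sketch of a batch filtering job that drops corrupt records
# (missing or nonsensical values); columns and rules are illustrative.
import pandas as pd

raw = pd.DataFrame({
    "station_id":  ["S1", "S2", None, "S4"],
    "temperature": [21.5, -999.0, 18.2, 19.8],   # -999.0 stands in for a nonsensical value
    "wind_speed":  [12.0, 8.5, 7.1, None],
})

filtered = raw.dropna(subset=["station_id", "wind_speed"])      # remove records with missing values
filtered = filtered[filtered["temperature"].between(-60, 60)]   # remove nonsensical readings

print(f"kept {len(filtered)} of {len(raw)} records")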
Stage 4: Data Extraction
● Some of the data identified as input for the
analysis may arrive in a format incompatible
with the Big Data solution.

● The need to address disparate types of data is more likely with data from external sources.

● The Data Extraction lifecycle stage is dedicated to extracting disparate data and transforming it into a format that the underlying Big Data solution can use for the purpose of the data analysis.
Stage 4: Data Extraction
● The extent of extraction and transformation
required depends on the types of analytics and
capabilities of the Big Data solution.
○ For example, extracting the required fields
from delimited textual data, such as with
webserver log files, may not be necessary if
the underlying Big Data solution can already
directly process those files.
○ Similarly, extracting text for text analytics,
which requires scans of whole documents,
is simplified if the underlying Big Data
solution can directly read the document in its
native format
Stage 4: Data Extraction

One figure demonstrates the extraction of the latitude and longitude coordinates of a user from a single JSON field. Further transformation is needed in order to separate the data into two separate fields as required by the Big Data solution.

Another figure illustrates the extraction of comments and a user ID embedded within an XML document without the need for further transformation.
Stage 4: Data Extraction (Case Study)
● The IT team observes that some of the datasets will need to be pre-processed in order to extract the required fields.
● For example,
○ the tweets dataset is in JSON format.
■ In order to be able to analyze the tweets, the user id,
timestamp and the tweet text need to be extracted and
converted to tabular form.
○ the weather dataset arrives in a hierarchical format (XML),
■ fields such as timestamp, temperature forecast, wind speed
forecast, wind direction forecast, snow forecast and flood
forecast are also extracted and saved in a tabular form.
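A pre-processing step along these lines might be sketched as follows with Python's standard json and xml modules plus pandas. The sample payloads and field names are invented stand-ins, not the actual Twitter or weather report schemas.

# A minimal sketch of extracting fields from JSON (tweets) and XML (weather)
# and converting them to tabular form; payloads and fields are illustrative.
import json
import xml.etree.ElementTree as ET
import pandas as pd

tweet_json = '{"user": {"id": 42}, "timestamp": "2015-06-01T10:00:00Z", "text": "storm damage on Main St"}'
t = json.loads(tweet_json)
tweets = pd.DataFrame([{
    "user_id":   t["user"]["id"],
    "timestamp": t["timestamp"],
    "tweet":     t["text"],
}])

weather_xml = """<forecast>
  <timestamp>2015-06-01T06:00:00Z</timestamp>
  <temperature>18.5</temperature>
  <wind_speed>25</wind_speed>
</forecast>"""
root = ET.fromstring(weather_xml)
weather = pd.DataFrame([{
    "timestamp":   root.findtext("timestamp"),
    "temperature": float(root.findtext("temperature")),
    "wind_speed":  float(root.findtext("wind_speed")),
}])

print(tweets)
print(weather)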
Stage 5: Data Validation and Cleansing
● It is dedicated to establishing often complex
validation rules and removing any known
invalid data.
● Invalid data can skew and falsify analysis
results.
● Unlike traditional enterprise data, where the
data structure is pre-defined and data is
pre-validated, data input into Big Data
analyses can be unstructured without any
indication of validity.
○ Its complexity can further make it
difficult to arrive at a set of suitable
validation constraints.
Stage 5: Data Validation and Cleansing
● Big Data solutions often receive redundant data
across different datasets.
○ This redundancy can be exploited to explore
interconnected datasets in order to
assemble validation parameters and fill in
missing valid data.
○ For example, as illustrated in this Figure:
■ The first value in Dataset B is validated against its corresponding value in Dataset A.
■ The second value in Dataset B is not validated against its corresponding value in Dataset A.
■ If a value is missing, it is inserted from Dataset A.
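The validate-and-fill idea illustrated by the figure could be sketched in pandas as shown below. The shared key and column names are assumptions used only for illustration.

# A minimal sketch of exploiting redundancy across datasets: values in
# Dataset B are checked against Dataset A, and missing values are filled
# in from Dataset A (key and column names are illustrative).
import pandas as pd

dataset_a = pd.DataFrame({"id": [1, 2, 3], "postcode": ["AB1", "CD2", "EF3"]})
dataset_b = pd.DataFrame({"id": [1, 2, 3], "postcode": ["AB1", "XX9", None]})

merged = dataset_b.merge(dataset_a, on="id", suffixes=("_b", "_a"))
merged["valid"] = merged["postcode_b"] == merged["postcode_a"]            # validation flag
merged["postcode_b"] = merged["postcode_b"].fillna(merged["postcode_a"])  # fill missing from A

print(merged)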
Stage 5: Data Validation and Cleansing
● For batch analytics,
○ data validation and cleansing can be
achieved via an offline ETL operation.
● For real-time analytics,
○ a more complex in-memory system is
required to validate and cleanse the
data as it arrives from the source.

● Provenance can play an important role in determining the accuracy and quality of questionable data.
Stage 5: Data Validation and Cleansing (Case Study)
● To keep costs down,
○ ETI is currently using free versions of the weather and the census
datasets that are not guaranteed to be 100% accurate.
■ As a result, these datasets need to be validated and cleansed.
○ Based on the published field information,
■ the team is able to check the extracted fields for typographical
errors and any incorrect data as well as data type and range
validation.
■ A rule is established that a record will not be removed if it
contains some meaningful level of information even though
some of its fields may contain invalid data.
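One way such a rule might be sketched is shown below: type and range checks turn invalid values into missing ones, and a record is kept as long as it still carries a meaningful level of information. The thresholds, value ranges and the "at least two valid fields" criterion are illustrative assumptions.

# A minimal sketch of type/range validation that keeps a record as long as
# it still carries a meaningful level of information (rules are illustrative).
import pandas as pd

weather = pd.DataFrame({
    "timestamp":   ["2015-06-01", "2015-06-02", "not-a-date"],
    "temperature": [18.5, 999.0, 17.0],
    "wind_speed":  [25.0, 10.0, None],
})

weather["timestamp"] = pd.to_datetime(weather["timestamp"], errors="coerce")   # invalid dates -> missing
weather.loc[~weather["temperature"].between(-60, 60), "temperature"] = None    # out-of-range -> missing

# Keep a record only if at least two of its three fields are still valid.
meaningful = weather.notna().sum(axis=1) >= 2
weather = weather[meaningful]
print(weather)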
Stage 6: Data Aggregation and Representation
● This stage is dedicated to integrating
multiple datasets together to arrive at a
unified view.
● Data may be spread across multiple
datasets, requiring that datasets be joined
together via common fields, for example
date or ID.
○ In other cases, the same data fields
may appear in multiple datasets, such
as date of birth.
● Either way, a method of data reconciliation
is required or the dataset representing the
correct value needs to be determined.
Stage 6: Data Aggregation and Representation
● Performing this stage can become complicated because of differences
in:
○ • Data Structure – Although the data format may be the same, the
data model may be different.
○ • Semantics – A value that is labeled differently in two different
datasets may mean the same thing, for example “surname” and
“last name.”

● The large volumes processed by Big Data solutions can make data
aggregation a time and effort-intensive operation.

● Reconciling these differences can require complex logic that is executed automatically without the need for human intervention.
Stage 6: Data Aggregation and Representation
● A data structure standardized by the Big Data solution can act as a common
denominator that can be used for a range of analysis techniques and projects.
○ This can require establishing a central, standard analysis repository, such as
a NoSQL database, as shown in the following Figure.

A simple example of data aggregation where two datasets are aggregated together using
the Id field.
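A join of this kind, together with a simple semantic reconciliation, might be sketched in pandas as follows; the table and column names are illustrative.

# A minimal sketch of aggregating two datasets into a unified view via a
# common "id" field, plus a small semantic reconciliation (names are illustrative).
import pandas as pd

policies = pd.DataFrame({"id": [100, 101], "policy_type": ["building", "contents"]})
claims   = pd.DataFrame({"id": [100, 101], "claim_amount": [5400.0, 1200.0]})

unified = policies.merge(claims, on="id", how="inner")   # join on the common Id field
print(unified)

# Semantics: "surname" and "last_name" label the same thing in two datasets,
# so one is renamed before the datasets are combined.
customers_a = pd.DataFrame({"id": [100], "surname": ["Smith"]})
customers_b = pd.DataFrame({"id": [101], "last_name": ["Jones"]})
customers = pd.concat(
    [customers_a.rename(columns={"surname": "last_name"}), customers_b],
    ignore_index=True,
)
print(customers)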
Stage 6: Data Aggregation and Representation
● This Figure shows the same piece of data stored in two different formats.
○ Dataset A contains the desired piece of data, but it is part of a BLOB that is
not readily accessible for querying.
○ Dataset B contains the same piece of data organized in column-based
storage, enabling each field to be queried individually.

Dataset A and B can be combined to create a standardized data structure with a Big Data
solution.
Stage 6: Data Aggregation and Representation (Case Study)
● For meaningful analysis of data,
○ it is decided to join together policy data, claim data and call center
agent notes in a single dataset that is tabular in nature where each
field can be referenced via a data query.
○ It is thought that this will not only help with the current data analysis
task of detecting fraudulent claims but will also help with other data
analysis tasks, such as risk evaluation and speedy settlement of
claims.
○ The resulting dataset is stored in a NoSQL database.
Stage 7: Data Analysis
● The Data Analysis stage is dedicated to carrying out the
actual analysis task, which typically involves one or more
types of analytics.
● This stage can be iterative in nature,
○ especially if the data analysis is predictive analytics,
in which case analysis is repeated until the
appropriate pattern or correlation is uncovered.
● Depending on the type of analytic result required,
○ this stage can be as simple as querying a dataset to
compute an aggregation for comparison.
○ On the other hand, it can be as challenging as
combining data mining and complex statistical
analysis techniques to discover patterns and
anomalies or to generate a statistical or
mathematical model to depict relationships between
variables.
Stage 7: Data Analysis (Case Study)
● The IT team involves the data analysts at this stage as it does not have the right
skillset for analyzing data in support of detecting fraudulent claims.
● In order to be able to detect fraudulent transactions,
○ first the nature of fraudulent claims needs to be analyzed in order to find
which characteristics differentiate a fraudulent claim from a legitimate claim.
○ For this, the predictive data analysis approach is taken. As part of this
analysis, a range of analysis techniques are applied.
● This stage is repeated a number of times as the results generated after the first
pass are not conclusive enough to comprehend what makes a fraudulent claim
different from a legitimate claim.
● As part of this exercise, attributes that are less indicative of a fraudulent claim are
dropped while attributes that carry a direct relationship are kept or added.
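The lecture does not name a specific algorithm; as one hedged illustration, the sketch below trains a scikit-learn decision tree on labeled historical claims and inspects which attributes are most indicative of fraud. The feature names and values are invented for the example.

# A minimal sketch of predictive analysis on historical claim data: train a
# classifier on labeled claims and inspect which attributes matter most.
# Feature names and values are invented; scikit-learn is one possible tool.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

claims = pd.DataFrame({
    "customer_age":     [25, 52, 37, 61, 29, 44, 33, 58],
    "policy_age_yrs":   [1, 10, 3, 15, 1, 7, 2, 12],
    "num_prior_claims": [4, 0, 2, 1, 5, 0, 3, 1],
    "claim_value":      [9000, 1200, 6500, 800, 11000, 1500, 7200, 900],
    "is_fraudulent":    [1, 0, 1, 0, 1, 0, 1, 0],
})

X = claims.drop(columns="is_fraudulent")
y = claims["is_fraudulent"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
print("feature importances:", dict(zip(X.columns, model.feature_importances_)))

Attributes with near-zero importance would be candidates to drop on the next iteration, matching the iterative refinement described above.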
Stage 8: Data Visualization
● The ability to analyze massive amounts of data and
find useful insights carries little value if the only
ones that can interpret the results are the analysts.
● The Data Visualization stage is dedicated to using
data visualization techniques and tools to
graphically communicate the analysis results for
effective interpretation by business users.
● The results of completing the Data Visualization
stage provide users with the ability to perform
visual analysis, allowing for the discovery of
answers to questions that users have not yet even
formulated.
Stage 8: Data Visualization (Case Study)
● The team has discovered some interesting findings and now needs to
convey the results to the actuaries, underwriters and claim adjusters.

● Different visualization methods are used including bar and line graphs
and scatter plots.
○ Scatter plots are used to analyze groups of fraudulent and
legitimate claims in the light of different factors, such as customer
age, age of policy, number of claims made and value of claim.
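A scatter plot of this kind could be produced with matplotlib roughly as sketched below; the data points are invented and only illustrate the plotting approach.

# A minimal sketch of a scatter plot comparing fraudulent and legitimate
# claims by customer age and claim value (data points are invented).
import matplotlib.pyplot as plt
import pandas as pd

claims = pd.DataFrame({
    "customer_age":  [25, 52, 37, 61, 29, 44],
    "claim_value":   [9000, 1200, 6500, 800, 11000, 1500],
    "is_fraudulent": [1, 0, 1, 0, 1, 0],
})

for label, marker, name in [(1, "x", "fraudulent"), (0, "o", "legitimate")]:
    subset = claims[claims["is_fraudulent"] == label]
    plt.scatter(subset["customer_age"], subset["claim_value"], marker=marker, label=name)

plt.xlabel("Customer age")
plt.ylabel("Claim value")
plt.legend()
plt.show()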
Stage 9: Utilization of Analysis Results
● Subsequent to analysis results being made available to
business users to support business decision-making, such
as via dashboards, there may be further opportunities to
utilize the analysis results.
● This stage is dedicated to determining how and where
processed analysis data can be further leveraged.
● Depending on the nature of the analysis problems being
addressed, it is possible for the analysis results to produce
“models” that encapsulate new insights and understandings
about the nature of the patterns and relationships that exist
within the data that was analyzed.
○ A model may look like a mathematical equation or a set of rules.
○ Models can be used to improve business process logic and
application system logic, and they can form the basis of a new
system or software program.
Stage 9: Utilization of Analysis Results (Case Study)
● Based on the data analysis results, the underwriting and the claims
settlement users have now developed an understanding of the nature of
fraudulent claims.

● However, in order to realize tangible benefits from this data analysis exercise, a model based on a machine-learning technique is generated, which is then incorporated into the existing claim processing system to flag fraudulent claims.
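Operationalizing the model could be as simple as scoring each incoming claim and attaching a review flag, roughly as in the sketch below. The flag_claim helper, the threshold and the field names are assumptions; model is assumed to be the classifier sketched earlier in the Data Analysis stage.

# A minimal sketch of utilizing the analysis result: score an incoming claim
# with the trained model and flag it for manual review if the predicted
# fraud probability exceeds a threshold (threshold and fields are illustrative).
import pandas as pd

def flag_claim(model, claim: dict, threshold: float = 0.7) -> dict:
    features = pd.DataFrame([claim])                       # one-row feature table
    fraud_probability = model.predict_proba(features)[0][1]
    claim["fraud_probability"] = round(float(fraud_probability), 3)
    claim["flagged_for_review"] = fraud_probability >= threshold
    return claim

# Example usage, assuming `model` is the classifier from the Data Analysis sketch:
incoming = {"customer_age": 27, "policy_age_yrs": 1,
            "num_prior_claims": 4, "claim_value": 9800}
print(flag_claim(model, incoming))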
03. Real Project Example
Real Project From Kaggle
● Insurance Claims - Fraud Detection

○ Steps: https://medium.com/@sauravkarki10.12/insurance-claim-fraud-detection-project-c700e31c7602
○ https://www.kaggle.com/datasets/mykeysid10/insurance-claims-fraud-detection/data
Thanks!
Do you have any questions?

CREDITS: This presentation template was created by Slidesgo, and includes icons by Flaticon, and infographics & images by Freepik
