Chapter 1. Understanding Big Data


CHAPTER 1.

UNDERSTANDING
BIG DATA

LECTURER: MBA – PHẠM NGỌC BẢO DUY

BIG DATA FUNDAMENTALS: CONCEPTS, DRIVERS & TECHNIQUES – PRENTICE HALL


20/05/2020 701015 - Big Data Applications in Business (Ứng dụng dữ liệu lớn trong kinh doanh)


OUTLINE

• Concepts and Terminology
• Big Data Characteristics
• Different Types of Data
• Case Study Background
BIG DATA OVERVIEW

• What is Big Data?
  --> Big Data is a field dedicated to the analysis, processing and storage of large collections of data that frequently originate from disparate sources.
• When is Big Data used?
  --> Big Data is used when traditional data analysis, processing and storage technologies and techniques are insufficient.


INSIGHTS AND BENEFITS OF BIG DATA

• Operational optimization
• Actionable intelligence
• Identification of new markets
• Accurate predictions
• Fault and fraud detection
• More detailed records
• Improved decision-making
• Scientific discoveries

CONCEPTS AND TERMINOLOGY

DATASETS

• Datasets are collections or groups of related data.
• Each member of a dataset (a datum) shares the same set of attributes or properties as the other members of the same dataset.
• Examples:
  - tweets stored in a flat file
  - a collection of image files in a directory
  - an extract of rows from a database table stored in a CSV-formatted file
  - historical weather observations stored as XML files
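
The idea that every datum in a dataset shares the same attributes can be sketched with a small CSV extract. This is only an illustration: the table content, the column names and the helper `load_dataset` are hypothetical and do not come from the slides.

```python
import csv
import io

# Hypothetical CSV extract of a database table (illustrative values only).
CSV_EXTRACT = """customer_id,name,city
1,Alice,Hanoi
2,Bob,Da Nang
3,Chi,Hue
"""

def load_dataset(text):
    """Parse CSV text into a list of dicts, one dict per datum."""
    return list(csv.DictReader(io.StringIO(text)))

dataset = load_dataset(CSV_EXTRACT)

# Collect the attribute names of every datum; a well-formed dataset
# yields exactly one distinct set of attribute names.
attribute_sets = {tuple(datum.keys()) for datum in dataset}
print(len(dataset), attribute_sets)
```

If the extract were inconsistent (for example, a row from a different table), `attribute_sets` would contain more than one entry.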


DATASETS

Figure 1.1: Three datasets based on three different data formats.


DATA ANALYSIS

• Data analysis is the process of examining data to find facts, relationships, patterns, insights and/or trends.
• The overall goal of data analysis is to support better decision-making.
• Example:
  - Analyzing ice cream sales data to determine how the number of ice cream cones sold is related to the daily temperature.
  - The results support decisions about how much ice cream a store should order in relation to weather forecast information.
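
One way to quantify the relationship in the ice cream example is a Pearson correlation coefficient. The slides do not name a specific technique, so treat this as a sketch under that assumption; the temperature and sales figures below are invented for illustration.

```python
from math import sqrt

# Hypothetical daily observations (illustrative values, not real sales data).
temperatures_c = [18, 22, 25, 28, 31, 34]
cones_sold = [120, 150, 180, 210, 260, 300]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(temperatures_c, cones_sold)
print(f"correlation between temperature and cones sold: {r:.3f}")
```

A value close to 1 would suggest that warmer days go together with higher sales, which is the kind of finding that could inform how much stock to order ahead of a warm forecast.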


DATA ANALYSIS

Figure 1.2: The symbol used to represent data analysis.



DATA ANALYTICS

• Data analytics is a discipline that includes:
  - (a) the management of the complete data lifecycle (which encompasses collecting, cleansing, organizing, storing, analyzing and governing data), and
  - (b) the development of analysis methods, scientific techniques and automated tools.
• It is a broader term that encompasses data analysis.

DATA ANALYTICS

Figure 1.3: The symbol used to represent data analytics.



DATA ANALYTICS

• Different kinds of organizations use data analytics tools and techniques in different ways.
• Examples:
  - In business-oriented environments, data analytics results can lower operational costs and facilitate strategic decision-making.
  - In the scientific domain, data analytics can help identify the cause of a phenomenon to improve the accuracy of predictions.
  - In service-based environments like public sector organizations, data analytics can help strengthen the focus on delivering high-quality services by driving down costs.

DATA ANALYTICS

There are four general categories of analytics, distinguished by the results they produce:

• descriptive analytics
• diagnostic analytics
• predictive analytics
• prescriptive analytics


DATA ANALYTICS

Figure 1.4: Value and complexity increase from descriptive to prescriptive analytics.


DESCRIPTIVE ANALYTICS

• Descriptive analytics answers questions about events that have already occurred. It contextualizes data to generate information.
• Sample questions:
  - What was the sales volume over the past 12 months?
  - What is the number of support calls received, categorized by severity and geographic location?
  - What is the monthly commission earned by each sales agent?
• The resulting reports are generally static and display historical data in the form of data grids or charts.
• Queries are executed on operational data stores within an enterprise, for example a Customer Relationship Management (CRM) system or an Enterprise Resource Planning (ERP) system, via ad-hoc reporting or dashboards (see Figure 1.5).

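
A descriptive question such as "What is the monthly commission earned by each sales agent?" boils down to grouping and summing historical records. The sketch below assumes a hypothetical list of `(month, agent, commission)` records; the names and amounts are invented.

```python
from collections import defaultdict

# Hypothetical sales records: (month, agent, commission in USD).
sales = [
    ("2020-01", "agent_a", 500.0),
    ("2020-01", "agent_b", 300.0),
    ("2020-01", "agent_a", 250.0),
    ("2020-02", "agent_b", 400.0),
]

def monthly_commission(records):
    """Sum commissions per (month, agent) pair."""
    totals = defaultdict(float)
    for month, agent, commission in records:
        totals[(month, agent)] += commission
    return dict(totals)

report = monthly_commission(sales)
for (month, agent), total in sorted(report.items()):
    print(month, agent, total)
```

The output is a static summary of what already happened, which is what distinguishes descriptive analytics from the other three categories.
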
DESCRIPTIVE ANALYTICS

Figure 1.5: The operational systems, pictured left, are queried via descriptive analytics tools to generate reports or dashboards, pictured right.


DIAGNOSTIC ANALYTICS

• Diagnostic analytics aims to determine the cause of a phenomenon that occurred in the past, using questions that focus on the reason behind the event.
• Sample questions:
  - Why were Q2 sales less than Q1 sales?
  - Why have there been more support calls originating from the Eastern region than from the Western region?
  - Why was there an increase in patient re-admission rates over the past three months?
• The queries are performed on multidimensional data held in analytic processing systems that support drill-down and roll-up analysis (see Figure 1.6).

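
Roll-up and drill-down can be sketched on a tiny two-level hierarchy (region, then city). The support-call figures and the function names below are hypothetical; real analytic processing systems work on far richer multidimensional structures.

```python
from collections import defaultdict

# Hypothetical support-call facts: (region, city, number of calls).
calls = [
    ("East", "City 1", 120),
    ("East", "City 2", 80),
    ("West", "City 3", 60),
    ("West", "City 4", 50),
]

def roll_up(facts):
    """Aggregate calls from city level up to region level."""
    totals = defaultdict(int)
    for region, _city, count in facts:
        totals[region] += count
    return dict(totals)

def drill_down(facts, region):
    """Break one region's total back down into its cities."""
    return {city: count for r, city, count in facts if r == region}

by_region = roll_up(calls)
busiest = max(by_region, key=by_region.get)
detail = drill_down(calls, busiest)
print(by_region, busiest, detail)
```

Rolling up shows which region has more calls; drilling down into that region is the first step towards asking why.
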
DIAGNOSTIC ANALYTICS

Figure 1.6: Diagnostic analytics can result in data that is suitable for performing drill-down and roll-up analysis.


PREDICTIVE ANALYTICS

• Predictive analytics aims to determine the outcome of a future event and to generate predictions based on a built model.
• Sample questions (what-if rationale):
  - What are the chances that a customer will default on a loan if they have missed a monthly payment?
  - What will be the patient survival rate if Drug B is administered instead of Drug A?
  - If a customer has purchased Products A and B, what are the chances that they will also purchase Product C?


PREDICTIVE ANALYTICS

• The strength and magnitude of the associations found in past events form the basis of the models (which include patterns, trends and exceptions found in historical and current data).
• The models have implicit dependencies on the conditions under which the past events occurred. If these underlying conditions change, the models that make predictions need to be updated.
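
The sample question about Products A, B and C can be approximated with a very simple model: a relative frequency computed from past purchases. This is only a sketch of the idea; the purchase histories are invented and real predictive tools use far more elaborate models.

```python
# Hypothetical purchase histories: one set of products per customer.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C"},
]

def conditional_probability(histories, given, target):
    """Estimate P(target | given) as a relative frequency in past data."""
    matching = [h for h in histories if given <= h]
    if not matching:
        return None  # no past evidence for this condition
    return sum(1 for h in matching if target in h) / len(matching)

p_c_given_ab = conditional_probability(baskets, {"A", "B"}, "C")
print(p_c_given_ab)
```

Because the estimate depends entirely on the recorded histories, it inherits the conditions under which those purchases were made; if those conditions change, the estimate has to be recomputed from newer data.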


PREDICTIVE ANALYTICS

Figure 1.7: Predictive analytics tools can provide user-friendly front-end interfaces.


PRESCRIPTIVE ANALYTICS

• Prescriptive analytics builds upon the results of predictive analytics by prescribing actions that should be taken and explaining why (because it embeds elements of situational understanding).
• Sample questions:
  - Among three drugs, which one provides the best results?
  - When is the best time to trade a particular stock?
• Various outcomes are calculated, and the best course of action for each outcome is suggested. The approach shifts from explanatory to advisory and can include the simulation of various scenarios.

PRESCRIPTIVE ANALYTICS

Figure 1.8: Prescriptive analytics involves the use of business rules and internal and/or external data to perform an in-depth analysis.


BUSINESS INTELLIGENCE (BI)

• BI applies analytics to the large amounts of data that an enterprise generates through its business processes and information systems (data that has typically been consolidated into an enterprise data warehouse) in order to gain insight into the performance of the enterprise.
• The output of BI can be surfaced to a dashboard that allows managers to access and analyze the results and potentially refine the analytic queries to further explore the data (see Figure 1.9).


BUSINESS INTELLIGENCE (BI)

Figure 1.9: BI can be used to improve business applications, consolidate data in data warehouses and analyze queries via a dashboard.


KEY PERFORMANCE INDICATORS (KPI)

• A KPI is a metric that:
  - gauges success within a particular business context,
  - helps identify business performance problems and demonstrate regulatory compliance, and
  - acts as a quantifiable reference point for measuring a specific aspect of a business’ overall performance.
• KPIs are linked with an enterprise’s overall strategic goals and objectives.
• KPIs are often displayed via a KPI dashboard, which compares the actual measurements with the threshold values of each KPI (see the KPI dashboard figure that follows).

KEY PERFORMANCE INDICATORS (KPI)

Figure 1.10: A KPI dashboard acts as a central reference point for gauging business performance.
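
The comparison of actual measurements with KPI threshold values can be sketched as follows. The KPI names, thresholds and measurements are hypothetical, and the two-state "on target / off target" status is an assumption made for brevity.

```python
# Hypothetical KPI definitions: a threshold and whether higher values are better.
kpis = {
    "claims_settled_within_30_days_pct": {"threshold": 90.0, "higher_is_better": True},
    "customer_defection_rate_pct": {"threshold": 5.0, "higher_is_better": False},
}

# Hypothetical actual measurements for the current period.
actuals = {
    "claims_settled_within_30_days_pct": 93.5,
    "customer_defection_rate_pct": 7.2,
}

def kpi_status(actual, threshold, higher_is_better):
    """Compare one measurement with its threshold."""
    met = actual >= threshold if higher_is_better else actual <= threshold
    return "on target" if met else "off target"

dashboard = {
    name: kpi_status(actuals[name], spec["threshold"], spec["higher_is_better"])
    for name, spec in kpis.items()
}
print(dashboard)
```

A real dashboard would typically add trends and several status bands, but the core operation is this threshold comparison.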


BIG DATA CHARACTERISTICS
FIVE BIG DATA TRAITS

• The five Big Data traits (volume, velocity, variety, veracity and value) are commonly referred to as the Five Vs.
• They are used to differentiate “Big” data from other forms of data.


VOLUME

• The anticipated volume of data is substantial and ever-growing.
• Figure 1.12 provides a visual representation of the large volume of data being created daily by organizations and users worldwide.

Figure 1.12: Organizations and users worldwide create over 2.5 EBs of data a day. As a point of comparison, the Library of Congress currently holds more than 300 TBs of data.


VOLUME

• Typical data sources that generate high data volumes:
  - online transactions, such as point-of-sale and banking
  - scientific and research experiments, such as the Large Hadron Collider and the Atacama Large Millimeter/Submillimeter Array telescope
  - sensors, such as GPS sensors, RFIDs, smart meters and telematics
  - social media, such as Facebook and Twitter

VELOCITY

• Data can arrive at fast speeds, and enormous datasets can accumulate within very short periods of time.
• Depending on the data source, velocity may not always be high. For example, MRI scan images are not generated as frequently as log entries from a high-traffic webserver.


VELOCITY

Figure 1.13: Examples of high-velocity Big Data that can easily be generated in a given minute:
• 350,000 tweets
• 300 hours of video footage uploaded to YouTube
• 171 million emails
• 330 GBs of sensor data from a jet engine

VARIETY

• Variety refers to the multiple formats and types of data.
• Data variety brings challenges for enterprises in terms of data integration, transformation, processing and storage.
• Figure 1.14 gives an example of data variety, which includes structured data (e.g. financial transactions), semi-structured data (e.g. emails) and unstructured data (e.g. images).


VERACITY

• Veracity refers to the quality or fidelity of data.
  - Noise is data that cannot be converted into information and thus has no value.
  - Signals are data that have value and lead to meaningful information.
• The signal-to-noise ratio
  - Data with a high signal-to-noise ratio has more veracity than data with a lower ratio.
  - The signal-to-noise ratio depends on the source of the data and its type. For example, data that is acquired in a controlled manner (such as via online customer registrations) usually contains less noise than data acquired via uncontrolled sources.

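
The slides describe the signal-to-noise ratio only qualitatively. One simple way to make it concrete, assumed here purely for illustration, is to count usable records (signal) against unusable ones (noise). The rule in `is_signal` and both sample lists are hypothetical.

```python
def is_signal(record):
    """Hypothetical rule: a reading counts as signal if it parses as a number."""
    try:
        float(record)
        return True
    except ValueError:
        return False

def signal_to_noise(records):
    """Ratio of usable records to unusable ones (infinite if there is no noise)."""
    signal = sum(1 for r in records if is_signal(r))
    noise = len(records) - signal
    return float("inf") if noise == 0 else signal / noise

# Readings captured through a validated form versus free-text input.
controlled = ["21.5", "22.0", "21.8", "n/a"]
uncontrolled = ["21.5", "??", "", "hot!"]

print(signal_to_noise(controlled), signal_to_noise(uncontrolled))
```

Under this counting rule the controlled source scores higher, which matches the claim that data acquired in a controlled manner tends to have more veracity.
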
VALUE

• Value is the usefulness of data for an enterprise.
• Value is impacted by:
  - (a) the veracity of the data
  - (b) the timeliness of generated analytic results (how long data processing takes)
  - (c) lifecycle-related concerns


VALUE

• (a) the veracity
  - The higher the data fidelity, the more value it holds for the business.


VALUE

• (b) the timeliness
  - Value and time are inversely related: the longer it takes to turn data into meaningful information, the less value it has for a business.
  - This is because analytics results have a shelf-life. For example, a stock quote that is delayed by 20 minutes has little to no value for making a trade compared to a quote that is 20 milliseconds old.

VALUE

• (c) the lifecycle-related concerns
  - How well has the data been stored?
  - Were valuable attributes of the data removed during data cleansing?
  - Are the right types of questions being asked during data analysis?
  - Are the results of the analysis being accurately communicated to the appropriate decision-makers?


DIFFERENT TYPES OF DATA
THE DATA PROCESSED BY BIG DATA SOLUTIONS CAN BE

Human-generated data
• The result of human interaction with systems, such as online services and digital devices.

Machine-generated data
• Generated by software programs and hardware devices in response to real-world events.
• For example, a log file captures an authorization decision made by a security service, and a point-of-sale system generates a transaction against inventory to reflect items purchased by a customer.


Human-generated data
Figure 1.16: Examples of human-generated data include social media, blog posts, emails, photo sharing and messaging.

Machine-generated data
Figure 1.17: Examples of machine-generated data include web logs, sensor data, telemetry data, smart meter data and appliance usage data.


PRIMARY TYPES OF DATA

Both human-generated and machine-generated data can be in various formats or types:
• structured data
• unstructured data
• semi-structured data
• metadata (another type)


STRUCTURED DATA

• Structured data conforms to a data model or schema and is often stored in tabular form.
• It is used to capture relationships between different entities and is therefore usually stored in a relational database.
• Structured data is frequently generated by enterprise applications and information systems like ERP and CRM systems.
• Examples: banking transactions, invoices and customer records.

Figure 1.18: The symbol used to represent structured data stored in a tabular form.


UNSTRUCTURED DATA

• Unstructured data does not conform to a data model or data schema. It is estimated to account for about 80% of the data within any given enterprise.
• Unstructured data has a faster growth rate than structured data.
• Unstructured data is either textual or binary, and is often conveyed via files that are self-contained and non-relational.
  - A text file may contain the contents of various tweets or blog postings.
  - Binary files are often media files that contain image, audio or video data.


UNSTRUCTURED DATA

• Special-purpose logic is usually required to process and store unstructured data.
  - For example, to play a video file, it is essential that the correct codec (coder-decoder) is available.
• Unstructured data cannot be directly processed or queried using SQL.
  - If it has to be stored within a relational database, it is stored in a table as a Binary Large Object (BLOB).
  - Alternatively, a Not-only SQL (NoSQL) database is a non-relational database that can be used to store unstructured data alongside structured data.

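
Storing unstructured data as a BLOB can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; the table name `incident_photos`, the column names and the byte content are hypothetical.

```python
import sqlite3

# Stand-in bytes for a binary media file (hypothetical content).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01, 0x02])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incident_photos (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)
conn.execute(
    "INSERT INTO incident_photos (name, content) VALUES (?, ?)",
    ("photo_1.png", image_bytes),
)

# SQL can fetch the BLOB as a whole, but it cannot query what the image shows.
name, content = conn.execute(
    "SELECT name, content FROM incident_photos WHERE id = 1"
).fetchone()
conn.close()

print(name, len(content))
```

The database returns the bytes unchanged; interpreting them still requires special-purpose logic outside SQL, such as an image decoder.
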
SEMI-STRUCTURED DATA

• Semi-structured data has a defined level of structure and consistency, but is not relational in nature.
• It is hierarchical or graph-based, and is commonly stored in files that contain text.
• It is more easily processed than unstructured data.

Figure 1.20: XML, JSON and sensor data are common forms of semi-structured data.


SEMI-STRUCTURED DATA

• Examples of common sources of semi-structured data: electronic data interchange (EDI) files, spreadsheets, RSS feeds and sensor data.
• Semi-structured data often has special pre-processing and storage requirements.
• An example of pre-processing is the validation of an XML file to ensure that it conforms to its schema definition.
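
The XML pre-processing step can be approximated with the standard library. Note the assumption: `xml.etree.ElementTree` checks well-formedness only and does not perform schema validation, so the "schema" below is just a hypothetical set of required child elements; full schema validation would need a dedicated XML Schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: the child elements every observation needs.
REQUIRED_FIELDS = {"station", "date", "temperature"}

def validate_observation(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != "observation":
        return [f"unexpected root element: {root.tag}"]
    present = {child.tag for child in root}
    return [f"missing element: {name}" for name in sorted(REQUIRED_FIELDS - present)]

good = (
    "<observation><station>S1</station><date>2020-05-20</date>"
    "<temperature>31</temperature></observation>"
)
bad = "<observation><station>S1</station></observation>"

print(validate_observation(good))
print(validate_observation(bad))
```

Documents that fail such a check can be rejected or repaired before they enter storage, which is the purpose of the pre-processing step.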


METADATA

• Metadata provides information about a dataset’s characteristics and structure.
• It is mostly machine-generated and can be appended to data.
• The tracking of metadata is crucial to Big Data processing, storage and analysis because it provides information about the pedigree of the data and its provenance during processing.
• Examples:
  - XML tags providing the author and creation date of a document
  - attributes providing the file size and resolution of a digital photograph

METADATA

• Big Data solutions rely on metadata, particularly when processing semi-structured and unstructured data.

Figure 1.21: The symbol used to represent metadata.
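
Machine-generated metadata such as file size and modification time can be read without opening the data itself. The sketch below writes a small hypothetical file to a temporary directory and inspects it with `os.stat`; the file name, its text and the helper `file_metadata` are invented for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone

def file_metadata(path):
    """Collect machine-generated metadata about a file."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
    }

note_text = "Customer reported water damage."

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "claim_note.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(note_text)
    metadata = file_metadata(path)

print(metadata)
```

None of these values describe what the note says; they describe the file that carries it, which is the distinction between data and metadata.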


CASE STUDY BACKGROUND

• Company introduction
• Company history and company structure
• IT environment: technical infrastructure and automation environment
• Business goals and obstacles to adopting a data-driven IT solution
• Big Data adoption: case study example


COMPANY INTRODUCTION

• Ensure to Insure (ETI) is a leading insurance company that provides a range of insurance plans in the health, building, marine and aviation sectors.
• A globally dispersed customer base of 25 million.
• 5,000 employees.
• Annual revenue of more than 350,000,000 USD.


COMPANY HISTORY

• 50 years ago, ETI started as an exclusive health insurance provider.
• Later, ETI extended its services to property and casualty insurance plans in the building, marine and aviation sectors.
• Each of the four sectors has a core team of specialized and experienced agents, actuaries, underwriters and claim adjusters.

ETI’s key departments: Underwriting, Claims Settlement, Customer care, Legal, Marketing, Human resource, Accounts and IT.

Agents
• generating the company’s revenue by selling policies
Actuaries
• managing risk assessment
• designing new insurance plans and revising existing plans
• performing what-if analyses and making use of dashboards and scorecards for scenario evaluation
Underwriters
• evaluating new insurance applications and deciding on the premium amount
Claim adjusters
• investigating claims made against a policy
• arriving at a settlement amount for the policyholder

COMPANY HISTORY

• Communication channels between the Customer care department and prospective and existing customers:
  - telephone
  - email
  - social media
• Core competence:
  - providing competitive policies and premium customer service that does not end once a policy has been sold
  - helping to achieve increased levels of customer acquisition and retention
  - relying heavily on its actuaries to create insurance plans that reflect the needs of its customers

IT ENVIRONMENT – TECHNICAL INFRASTRUCTURE AND AUTOMATION ENVIRONMENT

A set of client-server and mainframe platforms and systems:
• policy quotation
• policy administration
• claims management
• risk assessment
• document management
• billing
• enterprise resource planning (ERP)
• customer relationship management (CRM)


IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM

• Policy quotation system
  - Creates new insurance plans.
  - Provides quotes to prospective customers.
  - Is integrated with the website and the customer care portal so that website visitors and customer care agents can obtain insurance quotes.
• Policy administration system
  - Handles policy lifecycle management, including issuance, update, renewal and cancellation of policies.
• Claims management system
  - Deals with claim processing activities.
  - A claim is registered when a policyholder makes a report. It is then assigned to a claim adjuster, who analyzes the claim in light of the information that was submitted when the claim was made, as well as other background information obtained from different internal and external sources. Based on the analyzed information, the claim is settled following a certain set of business rules.

IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM

• Risk assessment system
  - Is used by the actuaries to assess any potential risk, such as a storm or a flood, that could result in policyholders making claims.
  - Runs probability-based risk evaluation that involves executing various mathematical and statistical models.
• Document management system
  - Serves as a central repository for all kinds of documents, including policies, claims, scanned documents and customer correspondence.
• Billing system
  - Keeps track of premium collection from customers.
  - Generates various reminders for customers who have missed their payment.

IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM

• ERP system
  - Supports the day-to-day running of ETI, including human resource management and accounts.
• CRM system
  - Records all aspects of customer communication via phone, email and postal mail.
  - Serves as a portal for call center agents dealing with customer enquiries.
  - Allows the marketing team to create, run and manage marketing campaigns.
• ==> Data from the operational systems above is exported to an Enterprise Data Warehouse (EDW), which is used:
  - to generate reports for financial and performance analysis
  - to generate reports for different regulatory authorities to ensure continuous regulatory compliance

BUSINESS GOALS AND OBSTACLES

• Over the past few decades, ETI has suffered from a falling share price and a decrease in market share.
• A committee of senior managers was formed to investigate and make recommendations.

Main reason: Existing regulations change and new regulations are introduced quickly and frequently, but the company is slow to respond and has not been able to ensure full and continuous compliance.
Consequence: ETI had to pay heavy fines.

Main reason: Insurance plans are created and policies are underwritten without a thorough risk assessment --> incorrect premiums are set and more payouts are made than anticipated.
Consequence: The profit made on investments is reduced.

Main reason: Insurance plans are generally based on the actuaries’ experience and analysis of the population as a whole --> they only apply to an average set of customers.
Consequence: Customers whose circumstances deviate from the average set are not interested in such insurance plans.

Main reason: The increased number of complex and hard-to-detect fraudulent claims and the associated payments made against them.
Consequence: Direct monetary loss + indirect loss (due to the costs related to the processing of fraudulent claims).

Main reason: A significant increase in natural disasters such as floods, storms and epidemics --> an increased number of high-end genuine claims.
Consequence: Loss in revenue.

Main reason: Customer defection due to slow claims processing and insurance products that no longer match the needs of customers.
Consequence: Loss in the number of customers + declines in revenue.

Main reason: The emergence of tech-savvy competitors that employ telematics to provide personalized policies.
Consequence: Loss in the number of customers + declines in revenue.

STRATEGIC GOALS TO IMPROVE PROFITABILITY

1. Decrease losses by:
  - (a) improving risk evaluation and maximizing risk mitigation, which applies both to the creation of insurance plans and to the screening of new applications at the time of issuing a policy,
  - (b) implementing a proactive catastrophe management system that decreases the number of potential claims resulting from a calamity, and
  - (c) detecting fraudulent claims.

2. Decrease customer defection and improve customer retention with:
  - (a) speedy settlement of claims, and
  - (b) personalized and competitive policies based on individual circumstances rather than demographic generalization alone.

3. Achieve and maintain full regulatory compliance at all times by employing enhanced risk management techniques that can better predict risks.

OBSTACLES TO ADOPTING A DATA-DRIVEN IT SOLUTION

• Acquiring, storing and processing unstructured data from internal and external data sources: currently, only structured data is stored and processed.
• Processing large amounts of data in a timely manner: although the amount of data currently processed cannot be classified as large, the reports take a long time to generate.
• Processing multiple types of data and combining structured data with unstructured data: unstructured data such as documents and call center logs cannot currently be processed, while structured data is used in isolation for all types of analyses.

==> The recommendation is that ETI should adopt Big Data.

BIG DATA ADOPTION - CASE STUDY EXAMPLE

1. IT team and skills for Big Data implementation
• Problems
  - There are no in-house Big Data skills.
  - ETI has to choose between hiring a Big Data consultant and sending its IT team on a Big Data training course.
• Solutions
  - Only the senior IT team members are sent on the Big Data training course.
  - In the long term, these trained team members will become a permanent in-house Big Data resource and can also train junior team members to further increase the in-house Big Data skillset.
2. During the Big Data training course
• Problems
  - There is no common vocabulary of terms.
  - The team lacks business exposure and an understanding of BI and of how to establish appropriate KPIs.
• Solutions
  - Building a glossary of terms for datasets including claims, policies, quotes, customer profile data and census data.

BIG DATA ADOPTION - CASE STUDY EXAMPLE

3. Data Analytics
• ETI decides to use both descriptive and diagnostic analytics.
• Descriptive analytics is for:
  - querying the policy administration system to determine the number of policies sold each day
  - querying the claims management system to find out how many claims are submitted daily
  - querying the billing system to find out how many customers are behind on their premium payments
• Diagnostic analytics is for:
  - various BI activities, such as performing queries to answer questions such as why last month’s sales target was not met
  - performing drill-down operations to break down sales by type and location so that it can be determined which locations underperformed for specific types of policies
• In the future, predictive and prescriptive analytics will be adopted gradually, by first implementing predictive analytics and then slowly building up the capabilities needed to implement prescriptive analytics.
  - Predictive analytics will enable the detection of fraudulent claims by predicting which claims are fraudulent and, in the case of customer defection, by predicting which customers are likely to defect.
  - Later, prescriptive analytics will prescribe the correct premium amount considering all risk factors, or the best course of action to take for mitigating claims when faced with catastrophes such as floods or storms.

BIG DATA ADOPTION - CASE STUDY EXAMPLE

4. Identifying Data Characteristics
• Volume
  - A large amount of transactional data is generated as a result of processing claims, selling new policies and changing existing policies.
  - There is a large volume of unstructured data, both inside and outside the company, including health records, documents submitted by customers at the time of submitting an insurance application, property schedules, fleet data, social media data and weather data.
• Velocity
  - For the in-flow of data, some is low velocity (such as the claims submission data and the new policies issued data) and some is high velocity (such as webserver logs and insurance quotes).
  - For data from outside the company, social media data and weather data may arrive at a fast pace.
  - For catastrophe management and fraudulent claim detection, data needs to be processed quickly to minimize losses.

BIG DATA ADOPTION - CASE STUDY EXAMPLE

4. Identifying Data Characteristics
• Variety
  - ETI has to incorporate a range of datasets that include health records, policy data, claim data, quote data, social media data, call center agent notes, claim adjuster notes, incident photographs, weather reports, census data, webserver logs and emails.
• Veracity
  - Inside ETI’s boundary, data has high veracity thanks to data validation performed at multiple stages, such as data entry, function-level input validation and data persistence.
  - Outside ETI’s boundary, data has low veracity (such as social media data and weather data) and requires an increased level of data validation and cleansing.
• Value
  - ETI has to draw maximum value out of the available datasets by ensuring that the datasets are stored in their original form and that they are subjected to the right type of analytics.

BIG DATA ADOPTION - CASE STUDY EXAMPLE

5. Identifying Types of Data
• Structured data: policy data, claim data, customer profile data and quote data.
• Unstructured data: social media data, insurance application documents, call center agent notes, claim adjuster notes and incident photographs.
• Semi-structured data: health records, customer profile data, weather reports, census data, webserver logs and emails.
• Metadata is a new concept, as ETI’s current data management procedures neither create nor append any metadata.
  - Why? --> Because all data that ETI stores and processes is structured in nature and originates from within the company. Hence, the origins and the characteristics of the data are implicitly known.
  - Solution --> For the structured data, the data dictionary and the existence of last updated timestamp and last updated user-id columns within the different relational database tables can be used as a form of metadata.

THANK YOU
