Chapter 1. Understanding Big Data
20/05/2020
20/05/2019 701015 - Big Data Applications in Business
BIG DATA OVERVIEW
Scientific discoveries
CONCEPTS AND TERMINOLOGY
DATASETS
DESCRIPTIVE ANALYTICS
Queries are executed on operational data stores from within an enterprise, for example a Customer Relationship Management (CRM) system or an Enterprise Resource Planning (ERP) system, via ad-hoc reporting or dashboards (see Figure 1.5).
Figure 1.5: The operational systems, pictured left, are queried via descriptive
analytics tools to generate reports or dashboards, pictured right.
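As a rough illustration of an ad-hoc descriptive report, the sketch below runs a simple aggregate query against an in-memory SQLite database that stands in for an operational store. The table name `enquiries`, its columns and the sample rows are hypothetical and are not taken from any real CRM or ERP system.

```python
import sqlite3


def enquiries_per_channel(conn):
    """Ad-hoc descriptive report: count of enquiries per channel."""
    rows = conn.execute(
        "SELECT channel, COUNT(*) FROM enquiries "
        "GROUP BY channel ORDER BY channel"
    ).fetchall()
    # Each row is a (channel, count) tuple, so dict() builds the report.
    return dict(rows)


# In-memory database standing in for an operational data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enquiries (id INTEGER PRIMARY KEY, channel TEXT)")
conn.executemany(
    "INSERT INTO enquiries (channel) VALUES (?)",
    [("phone",), ("email",), ("phone",), ("social media",)],
)

report = enquiries_per_channel(conn)
print(report)
```

A dashboard would typically render the same aggregate as a chart; the query itself only describes what has already happened.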
KEY PERFORMANCE INDICATORS (KPI)
... the actual measurements with threshold values of the KPI (see the figure below).
Figure 1.10: A KPI dashboard acts as a central reference point for gauging business
performance.
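The comparison of actual measurements with threshold values can be sketched in a few lines. The KPI names, measurements and thresholds below are made up for illustration, and the assumption that each KPI has a lower and an upper threshold is ours, not the slide's.

```python
def kpi_status(actual, lower, upper):
    """Compare an actual measurement with its threshold values."""
    if actual < lower:
        return "below threshold"
    if actual > upper:
        return "above threshold"
    return "within range"


# Hypothetical KPIs: name -> (actual measurement, lower, upper threshold).
kpis = {
    "policies_sold_per_day": (42, 50, 200),
    "claims_submitted_per_day": (30, 0, 40),
    "customers_behind_on_payment": (120, 0, 100),
}

# One status line per KPI, as a dashboard might display it.
dashboard = {name: kpi_status(*values) for name, values in kpis.items()}
for name, status in dashboard.items():
    print(f"{name}: {status}")
```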
Company introduction
Company history and Company structure
IT environment – Technical Infrastructure and Automation Environment
Business goals and obstacles to adopting a data-driven IT solution
Big Data adoption - Case Study Example
ETI’s key departments:
• Underwriting department
• Claims department
• Settlement department
• Legal department
• Marketing department
• Accounts department
• Customer care department
• Human resource department
• IT department
Agents
• generating the company’s revenue by selling policies
Actuaries
• managing risk assessment
• designing new insurance plans and revising existing plans
• performing what-if analyses and making use of dashboards and scorecards for scenario
evaluation
Underwriters
• evaluating new insurance applications and deciding on the premium amount
Claim adjusters
• investigating claims made against a policy
• arriving at a settlement amount for the policyholder
COMPANY HISTORY
Communication channels between Customer care department and prospective
and existing customers:
telephone
email
social media
Core competence:
providing competitive policies and premium customer service that does not end once
a policy has been sold.
helping to achieve increased levels of customer acquisition and retention.
relying heavily on its actuaries to create insurance plans that reflect the needs of its customers.
IT ENVIRONMENT – TECHNICAL INFRASTRUCTURE AND AUTOMATION ENVIRONMENT
A set of client-server and mainframe platforms and systems:
• policy quotation
• policy administration
• claims management
• risk assessment
• document management
• billing
• enterprise resource planning (ERP)
• customer relationship management (CRM)
To generate various reminders for customers who have missed their payment via …
IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM
ERP system
Day-to-day running of ETI, including human resource management and accounts
CRM system
To record all aspects of customer communication via phone, email and postal mail
To serve as a portal for call center agents for dealing with customer enquiries.
To allow the marketing team to create, run and manage marketing campaigns.
==> Data from the operational systems above is exported to an Enterprise Data Warehouse (EDW)
To generate reports for financial and performance analysis.
To generate reports for different regulatory authorities to ensure continuous regulatory compliance.
BUSINESS GOALS AND OBSTACLES
Over the past few decades, ETI has suffered from a falling share price and a decrease in market share.
A committee comprising senior managers was formed to investigate and make recommendations.
1. Main reason: The insurance plans are generally based on the actuaries’ experience and analysis of the population as a whole --> they only apply to an average set of customers.
   Consequence: Customers whose circumstances deviate from the average set are not interested in such insurance plans.
2. Main reason: The increased number of complex and hard-to-detect fraudulent claims and the associated payments being made against them.
   Consequence: Direct monetary loss + indirect loss (due to the costs related to the processing of fraudulent claims).
3. Main reason: The emergence of tech-savvy competitors that employ the use of telematics to provide personalized policies.
   Consequence: Loss in the number of customers + declines in revenue.
STRATEGIC GOALS TO IMPROVE PROFITABILITY
1. Decrease losses by:
(a) improving risk evaluation and maximizing risk mitigation, which applies to both creation of
insurance plans and when new applications are screened at the time of issuing a policy,
(b) implementing a proactive catastrophe management system that decreases the number of potential
claims resulting from a calamity, and
(c) detecting fraudulent claims.
3. Achieve and maintain full regulatory compliance at all times by employing enhanced risk management techniques that can better predict risks
OBSTACLES TO ADOPTING A DATA-DRIVEN IT SOLUTION
• Acquiring, storing and processing unstructured data from internal and external data sources – Currently, only structured data is stored and processed.
• Processing large amounts of data in a timely manner – Although the amount of data processed cannot be classified as large, the reports take a long time to generate.
• Processing multiple types of data and combining structured data with unstructured data – Unstructured data such as documents and call center logs cannot currently be processed, while structured data is used in isolation for all types of analyses.
==> a recommendation that ETI should adopt Big Data
BIG DATA ADOPTION - CASE STUDY EXAMPLE
1. IT team and skills for Big Data implementation
Problems
No in-house Big Data skills
ETI has to choose between hiring a Big Data consultant and sending its IT team on a Big Data training course.
Solutions
Sending only the senior IT team members to the Big Data training course.
In the long term, these trained team members will become a permanent in-house Big Data resource and can also train junior team members to further increase the in-house Big Data skillset.
2. During Big Data training course
Problems
No common vocabulary of terms
Lack of business exposure and of an understanding of BI and the establishment of appropriate KPIs
Solutions
Building a terms glossary for datasets including claims, policies, quotes, customer profile data and census data.
3. Data Analytics
Deciding to use both descriptive and diagnostic analytics
Descriptive analytics is for:
• querying the policy administration system to determine the number of policies sold each day
• querying the claims management system to find out how many claims are submitted daily
• querying the billing system to find out how many customers are behind on their premium payments
Diagnostic analytics is for:
• various BI activities, such as performing queries to answer questions such as why last month’s sales target was not met
• performing drill-down operations to break down sales by type and location so that it can be determined which locations underperformed for specific types of policies
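A drill-down of this kind can be sketched as a grouping step followed by a comparison against targets. The sales records, policy types, locations and target figures below are invented for illustration only; they are not ETI data.

```python
from collections import defaultdict

# Hypothetical sales records; a real analysis would query the EDW.
sales = [
    {"policy_type": "home", "location": "north", "amount": 120},
    {"policy_type": "home", "location": "north", "amount": 30},
    {"policy_type": "home", "location": "south", "amount": 40},
    {"policy_type": "auto", "location": "north", "amount": 70},
    {"policy_type": "auto", "location": "south", "amount": 200},
]

# Hypothetical sales targets per (policy type, location).
targets = {
    ("home", "north"): 100,
    ("home", "south"): 90,
    ("auto", "north"): 80,
    ("auto", "south"): 150,
}


def drill_down(records):
    """Break total sales down by policy type and location."""
    totals = defaultdict(int)
    for record in records:
        totals[(record["policy_type"], record["location"])] += record["amount"]
    return dict(totals)


def underperformers(totals, targets):
    """Return the (type, location) pairs whose sales fell short of target."""
    return sorted(key for key, goal in targets.items() if totals.get(key, 0) < goal)


totals = drill_down(sales)
missed = underperformers(totals, targets)
print(missed)
```

The first function answers the descriptive question (what was sold where), while the second narrows it to the diagnostic one (which combinations missed their target).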
In the future, utilizing predictive and prescriptive analytics in a gradual manner by first implementing predictive analytics and then slowly building up their capabilities to implement prescriptive analytics.
• Predictive analytics will enable detection of fraudulent claims by predicting which claim is a fraudulent one and, in the case of customer defection, by predicting which customers are likely to defect.
• Later, via prescriptive analytics, prescribing the correct premium amount considering all risk factors or prescribing the best course of action to take for mitigating claims when faced with catastrophes, such as floods or storms.
4. Identifying Data Characteristics
Volume
A large amount of transactional data is generated as a result of processing claims, selling new policies and changes to existing policies.
A large volume of unstructured data, both inside and outside the company, including health records, documents submitted by the customers at the time of submitting an insurance application, property schedules, fleet data, social media data and weather data.
Velocity
For in-flow data, some is low velocity (such as the claims submission data and the new policies issued data) and some is high velocity (such as webserver logs and insurance quotes).
For data from outside the company, social media data and weather data may arrive at a fast pace.
ETI has to draw maximum value out of the available datasets by ensuring that the datasets are stored in their original form and that they are subjected to the right type of analytics.
5. Identifying Types of Data
Structured data: policy data, claim data, customer profile data and quote data.
Unstructured data: social media data, insurance application documents, call center
agent notes, claim adjuster notes and incident photographs.
Semi-structured data: health records, customer profile data, weather reports, census data, webserver logs and emails.
Metadata is a new concept, as ETI’s current data management procedures neither create nor append any metadata.
Why? --> Because all data currently stored and processed at ETI is structured in nature and originates from within the company. Hence, the origins and the characteristics of the data are implicitly known.
Solution --> For the structured data, the data dictionary and the existence of last updated timestamp and last updated user-id columns within the different relational database tables can be …
THANK YOU