Big Data Analytics
PwC
Introduction to
Big Data
What it is and Why it
Matters
01
What is Big Data?
“Big Data” exceeds the capacity of traditional analytics and information management
paradigms across what is known as the 4 V’s: Volume, Variety, Velocity, and Veracity
• Veracity – With exponential increases of data from unfiltered and constantly flowing data sources, data quality often suffers and new methods must find ways to “sift” through junk to find meaning
• Velocity – The speed at which data is generated and used. New data is being created every second and in some cases it may need to be analyzed just as quickly
• Variety – Represents the diversity of the data. Data sets will vary by type (e.g. social networking, media, text) and they will vary in how well they are structured
• Volume – Reflects the size of a data set. New information is generated daily and in some cases hourly, creating data sets that are measured in terabytes and petabytes
The Promise of Big Data
Even more important than its definition is what Big Data promises to achieve:
intelligence in the moment.
Traditional Techniques & Issues vs. Big Data Differentiators
Veracity
• Traditional: Does not account for biases, …
Velocity
• Big Data: In real-time: …
Volume
• Traditional: Analysis is limited to small data sets
• Traditional: Analyzing large data sets = High Costs & High Memory
• Big Data: Scalable for huge amounts of multi-sourced data
• Big Data: Facilitation of massively parallel processing
• Big Data: Low-cost data storage
Types of Big Data
Variety is the most distinctive aspect of Big Data. New technologies and new types of data
have driven much of the evolution around Big Data.
(1) M. Milakovich, “Anticipatory Government: Integrating big data for Smaller Government”, in Oxford Internet Institute “Internet, Politics, Policy 2012” Conference, Oxford, 2012
(2) D. Boyd and K. Crawford, “Six Provocations for big data,” in A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, 2011
Why is Big Data valuable?
We have identified 5 key areas where Big Data is uniquely valuable:
• Performance Improvement – Exploration for, and discovery of, new needs can drive organizations to fine tune for optimal performance and efficiency (employee data).
• New Business Models/Services – Discovery of trends will lead organizations to form new business models to adapt by creating new service offerings for their customers. Intermediary companies with big data expertise will provide analytics to 3rd parties.
$1 One study estimated the potential value of big data in
the U.S. health care, European public sector
administration, global personal location data, U.S.
Big Data: Structured, semi-structured or unstructured information distinguished by one or more of the four “V”s: Veracity, Velocity, Variety, Volume.
[Figure: diagram relating Big Data, Open Data, and Crowdsourced Data]
Graphic and definitions based on “Big Data in Action for Development,” World Bank, worldbank.org
Big Data
Analytics
Putting Big Data to Work
02
It’s not just about the data…
It is important to understand the distinction between Big Data sets (large, unstructured,
fast, and uncertain data) and ‘Big Data Analytics’.
… It’s also about what, how, and why you use it
Big Data Analytics – the process of harnessing Big Data to yield actionable insights – is a
combination of five key elements:
Big Data Analytical Capabilities
Continuing increases in processing capacity have opened the door to a range of advanced
algorithms and modeling techniques that can produce valuable insights from Big Data.
[Figure axis labels: Structured, Unstructured, Emerging]
• A/B/N Testing – Experiment to find the most effective variation of a website, product, etc.
• Simulation Modeling – Experiment with a system virtually
• Classification – Organize data points into known categories
• Spatial Analysis – Extract geographic or topological information
• Visualization – Use visual representations of data to find and communicate info
• Predictive Modeling – Use data to forecast or infer behavior
• Complex Event Processing – Combine data sources to recognize events
• Sentiment Analysis – Extract consumer reactions based on social media behavior
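As an illustration of the A/B testing capability, the sketch below compares the conversion rates of two website variations with a two-proportion z-test. The visitor and conversion counts are hypothetical, and the z-test is one common choice for this comparison, not a method prescribed by the slides.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variations perform equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: variation A vs. variation B of a landing page
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variations is unlikely to be chance alone; with more than two variations (A/B/N), a correction for multiple comparisons would also be needed.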
[Figure: analytics stages from rear-view to forward-looking, with increasing business value]
• Descriptive Analytics – What happened? Describe, summarize and analyze historical data
• Diagnostic Analytics – Why did it happen? Identify causes of past trends and outcomes
• Predictive Analytics – What could happen? Predict future outcomes based on the …
• Prescriptive Analytics – What should be done? Recommend ‘right’ or optimal actions or decisions
• Continuous Analytics – How do we adapt to change? Monitor, decide, and act autonomously or semi-autonomously
Characteristics, data sources, and techniques listed across the stages:
• Observed behavior or events
• Non-traditional data sources such as social listening and web crawling
• Statistical and regression analysis
• Dynamic visualization
• Sentiment Scoring
• Graph analysis and Natural Language Processing to identify hidden relationships and themes
• Behavioral economics
• Forward-looking view of current and future value
• Real-time product and service propositions (graph analysis, entity resolution on data lakes to infer present customer need)
• Rapid evaluation of multiple ‘what-if’ scenarios
• Optimization decisions and actions
• Dual objective models
• Monitor results on a continuous basis
• Dynamically adjust strategies based on changing environment and improved predictions
• Agent-based and dynamic simulation models, time-series analysis
Examples of Big Data Analytics in Action
Market Leaders are leveraging Big Data Analytics to generate value by starting with a
business need and focusing on implementing actionable insights quickly and decisively
Big Data Analytics in Development
Big Data Analytics is making an equally impressive impact on Development interventions
– allowing decision-makers to reach and serve previously neglected populations.
03
Step 1: Be Yourself
Beginning with a clear understanding of the specific questions you intend to use Big Data
Analytics to address can help guide where and which data solutions are deployed.
[Figure axis: from value enablement (Operational) up to value enhancement (Strategic)]
Strategic – Delivering future value
• Data-driven decision-making in real time
• Use analytics to develop new programs/opportunities
• Relies heavily on data supplied by others
• Often struggles to move away from exclusively intuitive decision-making
Tactical – Enabling strategy and improving performance
• Use analytics to reduce political divergence and drive consensus
• Real-time analytics to enable quick responses to events
• Use data to develop personalized services
• Need for more objective and higher quality data
Operational – Day to day operations
• Struggle to move from narrow focus on reactive operations to more proactive, comprehensive management of daily operations
• High value for digitization of operational processes across program units
• Often already proficient in traditional business intelligence
Step 2: Secure People & Skills
The competencies required of “data scientists” within an analytics organization or project
converge from multiple skill domains.
• Subject Area or Domain Expertise – Deep understanding of industry, subject area, or research domain to help determine which questions need answering and on what frequency, specificity, or geography
• Statistical & Mathematical – Expertise in statistical techniques, tools and languages used to run analyses that generate insights to effectively determine and communicate actionable insights
• Computer Science & Programming – Comfort in programming across various languages, a thorough understanding of external and internal data sources, data gathering, storing, and retrieving methods which help combine disparate data sources to generate unique insights
• Organization-specific Information Knowledge – Organization-specific knowledge about data assets – including enterprise “metadata” – their location and appropriate business context for use in advanced analytics
Step 3: Let objectives dictate structure, not vice versa
How analytics efforts or organizations are structured – whether reporting is vertically or
horizontally aligned, how interconnected or autonomous separate units are, how resources
and successes are shared – can influence efficiency and impact.
Three organizing models, each drawn with a central Analytics Competency Center and BI applications:
• Distributed Analytics
  – Analytics Tools: Managed locally
  – Analytics Staff/Competencies: Placed within individual units
• Federated Analytics
  – Analytics Tools: Managed locally, but connected to group framework
  – Analytics Staff/Competencies: Placed within individual units
• Centralized Analytics
  – Analytics Tools: Controlled centrally, with units having access to shared resources
  – Analytics Staff/Competencies: Placed within central analytics team, available as needed to support individual units
Where staff are placed within individual units, skills are tailored to specific region or subject matter.
The ‘Hub-Spoke’ operating model often serves as a well-synchronized, connected system
[Figure: Sample Hub-Spoke Interaction Model. Numbered elements: (1) Central Decision Hub, (2) Competency Center (‘Standards’), (3) Centers of Excellence (Regional), (4) Local ‘Spokes’. Other labels: Global Business Strategy, Local Business Operations, Local Adoption of Practices, ‘Standardization’.]
3. Innovate, invest in, and build new analytics capabilities, and gradually push them out to the
business as user sophistication matures (e.g., data visualization).
8. Allow distributed analytics where it makes sense, but tightly govern and ensure cataloguing.
Case Study
‘Nowcasting’ Economic
Activity in Colombia
04
Situation
In Colombia, the leading economic indicators
used to analyze economic activity have an
average lag of 10 weeks. This presents challenges
for the well-timed design of economic policy and
monitoring of economic shocks or trends.
Group Discussion
Brainstorming Breakout
Solution
Based on web searches performed by Google users, Google Trends (GT) provides daily
information about the query volume for a given search term in a given geographic region.
For Colombia, GT data are available at the departmental level and also for the largest
municipalities.
The Colombian Administrative Department for National Statistics (DANE, for its
acronym in Spanish) combined indexes built using GT data with its own official economic
activity data (at both the aggregate and the sectoral level). Both sources are publicly
available. DANE used them to construct leading indicators that determine, in real time,
the short-term trend of different economic sectors, as well as their turning points.
In some sense, the GT data takes the place of traditional consumer-sentiment surveys. For
example, using data for a certain keyword (such as the brand of a certain product) might
be justified where a drop or surge in web searches for that keyword can be linked to a
fall or rise in demand for the product and, therefore, to lower or higher production in
the sector that makes it.
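The idea of a search-based leading indicator can be sketched as a simple regression: fit the official activity index on the search index over months where both are known, then apply the fit to the latest search value before the official figure is published. All numbers below are hypothetical, and a single-variable least-squares fit is an assumption made for illustration, not DANE's actual model.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical monthly history: search index (0-100) and official activity index
gt_index = [42, 45, 50, 48, 55, 60, 58, 63]
official = [98.1, 98.9, 100.2, 99.8, 101.5, 102.9, 102.3, 103.6]

a, b = fit_line(gt_index, official)

# Current month: the search index is already available, the official figure is not
gt_now = 66
nowcast = a + b * gt_now
print(f"nowcast of activity index: {nowcast:.1f}")
```

In practice the fit would be checked out of sample, since a search term that tracked a sector in the past may stop doing so.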
Example: “Ahorro” vs. Unemployment Rate
Ahorro – savings
Example: “Zapatos” vs. Employment Rate
Zapatos – shoes
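Comparisons like the ones in these examples usually start by checking whether the search series moves ahead of the official series. The sketch below computes the correlation at several lags. The data are hypothetical, and the indicator is deliberately constructed to follow the search series by two periods so the result can be verified.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlations(search, indicator, max_lag):
    """Correlate search[t] with indicator[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]
        y = indicator[lag:]
        result[lag] = pearson(x, y)
    return result

# Hypothetical search index for a keyword
search = [3, 5, 4, 8, 6, 9, 7, 10, 12, 11]
# Hypothetical indicator built to trail the search index by two periods
indicator = [2.0, 3.0] + [0.5 * s + 1 for s in search[:-2]]

correlations = lagged_correlations(search, indicator, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

A high correlation at a positive lag is only a first screen; it does not by itself show that the keyword is a reliable leading indicator.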
Find Out More
Melanie Thomas Armstrong
Leading Partner, International Public Sector
+1 (202) 320-7098
Melanie.Thomas.Armstrong@us.pwc.com
Jean Young
Managing Director, International Public Sector Data Analytics
+1 (703) 918-1001
Jean.M.Young@us.pwc.com
This publication has been prepared for general guidance on matters of interest only, and does not constitute professional advice. You should not act upon the
information contained in this publication without obtaining specific professional advice. No representation or warranty (express or implied) is given as to the
accuracy or completeness of the information contained in this publication, and, to the extent permitted by law, PricewaterhouseCoopers LLP, its members,
employees and agents do not accept or assume any liability, responsibility or duty of care for any consequences of you or anyone else acting, or refraining to
act, in reliance on the information contained in this publication or for any decision based on it.
© 2017 PricewaterhouseCoopers LLP. All rights reserved. In this document, “PwC” refers to PricewaterhouseCoopers LLP which is a member firm of
PricewaterhouseCoopers International Limited, each member firm of which is a separate legal entity.
Appendix
Emerging Data
Storage and
Infrastructure
Options
Building an Analytics Organization: Critical Components
Emerging Infrastructure – Data Storage Options
Introduction to Hadoop
Sources: http://hadoop.apache.org/
Cloud Computing – The model is compelling; cloud computing can improve flexibility,
scalability and cost management. Businesses best able to realize the potential will
establish a cohesive business strategy, as cloud computing can transform your entire
organization: people, processes, and systems.
Source: PwC, “Digital IQ Snapshot: Cloud”; PwC, “FS Viewpoint: Clouds is the forecast”
Text Mining and
Natural Language
Processing
Data Mining, Text Mining, and Natural Language Processing
What are they and how are they used?
• Data Mining – Extraction of implicit, previously unknown, and potentially useful information from data
• Text Mining – Analysis of large quantities of natural language text and detecting lexical or linguistic usage patterns to extract probably useful information
• Natural Language Processing – NLP is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications
Source: Text Mining, Ian Witten, 2004
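A minimal sketch of "detecting lexical usage patterns": counting adjacent word pairs (bigrams) across a small set of texts to surface recurring phrases. The sample sentences are hypothetical, and the regular-expression tokenizer is a simplification that real text mining toolkits replace with language-aware processing.

```python
import re
from collections import Counter

def bigram_counts(texts):
    """Count adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        # Simplified tokenizer: lowercase runs of letters and apostrophes
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical snippets of text
texts = [
    "Interest rates are rising",
    "Rising interest rates hurt savings",
    "Savings rates fall",
]

counts = bigram_counts(texts)
print(counts.most_common(1))
```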
Natural Language Processing and Text Mining
What are they and how are they used?
Text Mining/Analytics Tools
Tool kits that provide capabilities for identifying and analyzing features
within individual or groups of texts
• ReVerb – A program that automatically identifies and extracts binary relationships from English sentences. Capabilities: Information extraction, Topic Identification, Topic Linking
• Open text summarizer – Open source tool for summarizing texts. Capabilities: Document summarization
• Open Calais – Web based API that is used to analyze content and extract topics or information. Capabilities: Attribute/feature extraction, Fact identification
• Knowledge Search – Family of techniques and tools for searching and organizing large data collections. Capabilities: Semantic Analysis
• KH Coder – A free software for Quantitative Content Analysis or Text Mining. Capabilities: Text Parsing, Document search, Network analysis
Resources
Tutorials, Tools, Applications, and Research Groups
DeepQA, Image
Analytics, and
Audio Analytics
DeepQA
Overview and Introduction
What is DeepQA?
• DeepQA forms the core of Watson, the open-domain question analysis and answering system
• The DeepQA stack comprises a set of search, NLP, learning, and scoring algorithms
• DeepQA operates on a distributed computing infrastructure that leverages Map Reduce and the Unstructured Information Management Architecture
What is the target problem set?
• Understanding the meaning and context of human language
• Searching and retrieving information from a large library of unstructured information
• Identifying accurate and precise answers to questions that are complex and must be sourced from a large knowledge set
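The Map Reduce pattern mentioned above can be illustrated in a single process: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. This is a sketch of the programming model only. It does not use the Hadoop API, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)

documents = ["big data needs big tools", "data tools for text"]

# In a cluster, map_phase would run in parallel on separate machines
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call sees only its own document and each reduce call sees only one key, both steps can be spread across many machines.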
DeepQA Infrastructure Technology
Data Management and Search
• Unstructured Information Architecture – UIMA
• SQL Server – MySQL, Apache Derby
• Java Natural Language Toolkit – OpenNLP, Stanford NLP
Platform and Administration
• Virtualization Host – VMware, Xen
• File Management/Archival – rsync
• OS – Fedora
Business Applications
DeepQA provides capabilities that can facilitate knowledge discovery, improve
customer interaction, and uncover hidden facts
Knowledge Discovery
Overview: Search internal and external unstructured/structured information assets to uncover previously unknown knowledge
Objectives:
• Identify information about a subject through deep analysis of internal and external information sources
• Answer questions about a business problem or trend that may be difficult to analyze within traditional data sources
E-Discovery
Overview: Search documents and communications to uncover relevant information associated with a specific topic
Objectives:
• Identify business topics and trends within communication and documents
• Search for non-compliance activities within internal and external data sources
Contract Evaluations
Overview: Search through single or multiple contracts to answer specific questions about the nature of the contract
Objectives:
• Identify key facts or issues that comprise a contract or sets of contracts
• Identify contracts or legal documents that contain similar entities or features
Relationship Management
Overview: Provide the ability to interact with consumers providing precise responses to technical and open domain questions
Objectives:
• Provide a platform for automatically answering consumer questions about products or services
• Reduce reliance on call centers and improve interaction with consumers
Consumer Discovery
Overview: Search consumer communications, social media, and sales information to identify opportunities and demographics
Objectives:
• Identify background information about consumers
• Identify consumer qualities that create risks or represent opportunities
Technical Troubleshooting
Overview: Find answers to technical and process problems through …
Objectives:
• Utilize unstructured data and communications to identify solutions or root causes to system and process problems
Areas for Further Research
Infrastructure/Tools and Search Technologies/Concepts
Tools
• Hadoop Map/Reduce – The tool is used to distribute queries, analysis, and other processing activities across multiple CPUs. Further research is required to understand the tool’s architecture and how to integrate it with other tool kits (OpenNLP, UIMA, Lucene, etc.).
• OpenNLP – A Java library for NLP tasks. Need to evaluate the tool’s capabilities and gaps as well as how it can be incorporated into UIMA.
• OpenCYC – An open common sense reasoning platform. Need to better understand the tool’s role as well as how it fits within the other technologies.
• UIMA – An architecture for managing unstructured data. Further research is needed to understand how to run in parallel and how the SDK can be applied to NLP activities.
• Lucene – A text search platform. Further research is needed to understand the library and how to incorporate it into UIMA.
Search
• Text Search Scoring – Algorithms are used to score search results based on their alignment with the question. Further research is needed to understand what models and scoring metrics can be applied to search results at various phases of DeepQA.
• Triple Store Search – Triple stores maintain data in a subject-predicate-object structure and are used for turning around quick facts. Further research is needed to understand the philosophy and technologies behind these data storage mechanisms.
• Commonsense Reasoning – Research is required to understand the branch of AI, technologies and role within DeepQA.
• Document/Information Retrieval – Generate research on information and document retrieval practices. Technologies and algorithms need to be reviewed. Falls within a broader research topic for enterprise search.
Areas for Further Research
Machine Learning and Natural Language Processing
Machine Learning
• MetaLearners – Research the concept and how they are used to evaluate learning models and assign a confidence score based on the learning models that are used to rank search results
• Question Classification – Identify techniques and models that can be employed to analyze and classify questions
• Search Ranking Models – Research which models are available for ranking search results based on the various search and recall techniques that are employed for a question
NLP
• Logical Form Analysis – Research how SNA is used to discover logical relationships within text and produce an understanding about the information within the text
• Semantic Structure Analysis – Identify tools and algorithms that are employed to uncover semantic relationships within texts/phrases and how these relationships can be applied to extract relevant information for question analysis and search
• Relationship Analysis – Research techniques and tools for uncovering temporal, geospatial and spatial relationships within a knowledge set
• Feature Extraction – Evaluate tools and algorithms that are used to extract features of entities from text and identify methods for structuring the data for search
• Phrase Analysis – Identify algorithms and tools that can be applied to extract key phrases from text based on a search context
URLs
Overviews and Applications
Links
The AI Behind Watson
How to build a Watson Jr.
Image Analytics Overview
How can we extract insight from images and video?
Overview
• The process of pulling relevant information from an image or sets of images for advanced classification and traditional analysis
• Applies image capture, image processing, and machine learning techniques to extract, quantify, and structure image information
Advantages
• Provides a method to structure, organize, and search information that is stored within images
• Offers an additional data set that can be applied to understanding consumer behavior, automating business processes, and discovering knowledge in enterprise content
Image Analytics Tools
There are few standalone packages that are capable of performing robust
image analysis; however, solutions can be developed using existing
frameworks and analytics toolkits
Capability columns: Image Processing, Machine Learning, Computer Vision
• OpenCV – Open source library of computer vision functions that is accessible via C, Java, and Python. Marked for all three capabilities.
• PAXit Image Analysis – Integrated image analysis platform that provides basic feature identification functions. Marked for two of the three capabilities.
• ImageJ – Java based image processing platform that can be accessed via an API and expanded with custom plugins. Marked for one of the three capabilities.
URLs
Tutorials, Tools, Applications, and Research Groups
Audio Analytics Overview
How can we extract insight from audio and voice media?
Overview
• The process of capturing audio and analyzing its features to extract the content and context of an event
• Applies speech analysis and signal processing principles to structure audio information for analysis via NLP or traditional analytics techniques
Advantages
• Provides a method for identifying events or common patterns within sound bites
• Offers a way of capturing not only the content and topics within a conversation, but also the emotions and context
Audio Analytics: Capabilities and Insights
What data can we capture from sound bites that can be used to enhance other
data or analysis?
[Figure: sound rate and sound or pitch frequency plotted against time]
• Rate – defines how quickly a sound or a pattern of sound is occurring and can be used to evaluate the nature of an exchange, the state of the sound source, and context of the topic
• Power and Intensity – measures the loudness of the sound or event and provides a way of evaluating the mood or emotion of the sound source
• Sound and Pitch – a measure of the sound quality and can serve as a tool for isolating separate audio events or sources as well as measuring changes to the sound source
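Two of these features can be computed directly from raw samples. The sketch below measures power as the root-mean-square amplitude and estimates pitch from the zero-crossing rate. The test tone is synthetic, and zero-crossing counting is an assumption chosen for simplicity: it works for a clean single tone but not for speech or mixed audio, where spectral methods are normally used.

```python
import math

def synth_tone(freq_hz, amplitude, sample_rate, seconds):
    """Generate a pure sine tone as a list of samples."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def rms_power(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_frequency(samples, sample_rate):
    """Estimate the frequency of a clean tone from its sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    # A sine wave crosses zero twice per cycle
    return crossings / (2 * duration)

# Synthetic 440 Hz tone, one second at 8 kHz
tone = synth_tone(freq_hz=440, amplitude=0.5, sample_rate=8000, seconds=1.0)
power = rms_power(tone)
pitch = zero_crossing_frequency(tone, sample_rate=8000)
print(f"power = {power:.3f}, pitch = {pitch:.1f} Hz")
```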
Audio Analytics Applications
Analysis Objectives
Audio Analytics Tools
There are few tools on the market that provide a broad range of audio
analysis capabilities. However, basic audio analysis and natural language
tool kits can be combined for robust analytics
Tool Overview Audio Processing Information Retrieval
URLs
Tutorials, Tools, Applications, and Research Groups
Social Network
Analysis
Applications
Analyze organizational structures to identify opportunities that can improve
communication, productivity, and collaborations
Collaboration Analysis
Analysis: Evaluate team structures, information flows among team members, and information exchanges with other teams to improve working structures
Objectives:
• Identify team structures that are not effective
• Identify informal organizational structures
• Identify individuals/roles or groups that are influential to collaborative work environments
Applications
Analyze network structures, communication channels, and information flows
to identify operational enhancements
Disaster Recovery Planning
Analysis: Assess organizational structures and communication patterns as they relate to the groups that play a role in disaster recovery plans
Objectives:
• Identify communication improvements to disaster recovery teams
• Identify weak links among functional groups to improve collaboration during recovery plan execution
Data/Information Dissemination
Analysis: Assess how data points or information sets originate or are distributed across the enterprise to their intended targets
Objectives:
• Identify overlapping information sets and bottlenecks for information dissemination
• Assess how organization structures or information architecture impact the flow of information to its targets
Fraud Detection/Prevention
Analysis: Assess the organization or external network to identify communication or collaboration patterns that align with known fraudulent activity
Objectives:
• Identify network agents that collaborate with known fraudulent agents
• Identify activities that align with known fraudulent behavior
Process Discovery/Improvement
Analysis: Analyze the organization structure and communication patterns to uncover process improvements or identify new processes
Objectives:
• Identify process improvements through discovery of hidden process steps, communication flows, and actors
• Discover undocumented or informal processes that are hidden within frequent collaboration and communication paths
Supply Chain Analysis
Analysis: Evaluate the structure of a supply network and the interactions among the entities that comprise the network to identify gaps, bottlenecks and sourcing strategies
Objectives:
• Identify communication gaps that could impact dependent process or operations
• Identify strategic relationships to optimize the supply network
• Identify supply nodes that create inefficiencies
Applications
Analyze social media networks and consumer feedback to improve product
offerings and market interactions
Novelty/Sentiment Diffusion Analysis
Analysis: Observe how a specific topic, news article or sentiment diffuses through a consumer network
Objectives:
• Assess how target consumers/market will react to a piece of news or campaign
• Evaluate how long news, data, or sentiment will be retained within a system and how far it will spread
Market Influencer Identification
Analysis: Monitor and analyze connections within social media networks to identify markets or consumers that are influential within communities
Objectives:
• Identify individuals or groups that influence markets and adoption
• Identify untapped markets
• Identify market segments as targets for ad campaigns to improve product/service adoption
Consumer Segmentation
Analysis: Analyze the connections and consumer attributes within the target market to discover communities or groups with common characteristics
Objectives:
• Improve product or service offerings based on attributes that connect the consumer market
• Develop strategies to target new or existing consumers based on identified segmentation characteristics
Product or Brand Diffusion Analysis
Analysis: Analyze the flow of communication or ideas through a market segment to evaluate how a product may diffuse
Objectives:
• Identify segments or individuals that will be likely early adopters
• Identify incentives or campaigns that will improve product/service adoption
Recommendation Systems
Analysis: Analyze consumer network connections and common features among consumers to develop recommendations
Objectives:
• Identify new feature sets for products and services
• Assess new markets for selling similar or new products
• Target consumers with specific products or services
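Influencer identification is often approached with a centrality measure. The sketch below computes degree centrality, the share of other people each person is directly connected to, on a small undirected network. The names and connections are hypothetical, and degree centrality is only one of several measures (betweenness and eigenvector centrality are common alternatives).

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of other nodes each node is directly connected to.

    Assumes an undirected network with at least two distinct nodes.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical "who interacts with whom" connections
edges = [
    ("ana", "ben"),
    ("ana", "carla"),
    ("ana", "diego"),
    ("ben", "carla"),
    ("diego", "eva"),
]

scores = degree_centrality(edges)
most_connected = max(scores, key=scores.get)
print(most_connected, scores[most_connected])
```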
Tools
Social network analysis plug-ins and APIs for development/scripting
languages and data analysis tools
Tools
Proprietary and open source social network analysis interactive application
suites
Resources
Tutorials, Tools, Applications, and Research Groups
Introductory Lecture
Additional Case
Studies
Example 1
Source: Bruce Bigelow, Big Data, Big Biology, and the ‘Tipping Point’ in Quantified Health: Takeaways from Xconomy’s On-the-Record Dinner,
Xconomy, April 26, 2012
Example 3
When biological and phenotypic features were integrated alongside chemical structures to
predict adverse drug reactions, prediction accuracy increased from 0.9054 to 0.9524.
Source: Liu M, Wu Y, Chen Y, et al. Large-scale prediction of adverse drug reactions by integrating chemical, biological, and phenotypic properties
of drugs. J Am Med Inform Assoc 2012;19:e28–35.
Other Examples
[Figure labels: Satellite Data, Location, Map Data, Property-Specific Data]
• Allianz – Allianz is ‘mashing’ satellite data, third-party street-level data, images, and other internal data to better understand risk concentrations and manage concentration risk in commercial property insurance
• Hartford Steam Boiler – Hartford Steam Boiler is using sensors and real-time sensor data to monitor assets, reduce losses and manage risks better. Hartford Steam Boiler has been able to manage concentration risks and reduce losses, having one of the lowest combined ratios for a commercial insurer
Executives are currently using big data to uncover what is currently going on in their business, to
understand why, to predict future performance and to understand what actions P&G should take
Source: “Procter & Gamble – Business Sphere and Decision Cockpits”, Ravi Kalakota, Practical Analytics Wordpress, Feb. 2012, mskcc.org/cancer-care; eWeek.com, Healthcare IT News, IBM Watson to Aid Sloan-Kettering With Cancer Research, March 2012
Big Data
Analytics
Technology &
Vendor Mappings
Big Data Analytics – Technology & Vendor Mappings
• HP Private Cloud – HP
• HP Helion – HP
3. Data Ingestion & Integration
• Talend – Talend
• Hive – Apache Software Foundation
• Drill – Apache Software Foundation
Data Integration
• Staging – *Need assistance in locating
• Persistent Staging – *Need assistance in locating
• File Exchange – *Need assistance in locating
• File Storage – *Need assistance in locating
• Riak – Basho
• AWS DynamoDB – Amazon Web Services
• Couchbase – Couchbase