Are You Ready For Big Data
By DC Denison
An examination of Big Data and its role
in the next generation digital experience
The first weeks of 2013 were barely out of the gate when one
industry analyst was already predicting that Big Data would be
Time magazine's 2013 Person of the Year. What Google,
Amazon, and Facebook have done with Big Data is impressive,
even enviable.
But what about companies with smaller staffs and budgets? When
does it make sense to start up a Big Data program? If your email
marketing system isn't talking to your sales force automation
system, and neither is synched up with your online purchase
system, are you really ready to tackle a Big Data project?
DC Denison covered the technology scene for The Boston Globe.
Chapter 1
Volume. Billions of computers, smartphone users, and objects are now operating and interacting with each other,
generating exabytes (trust me, that's big) of data every day.
Variety. Much of this data is unstructured, meaning that it doesn't fit neatly into a standard relational database.
Unstructured data is a book review on Amazon, a comment on a blog, a video on YouTube, a podcast, a tweet.
Velocity. A smartphone user's location data is changing constantly; so is the value of a portfolio held by a customer
of an online financial service. These kinds of rapid updates present new challenges to information systems.
Recently Mike Gualtieri, a principal analyst with Forrester Research in Cambridge, MA (and the analyst who believes that
Big Data will get Time's year-end honor), has come up with what he believes is a more pragmatic definition of Big Data,
one that he claims is an actionable, complete definition for IT and business professionals.
Here's how Gualtieri defines it: "Big Data is the frontier of a firm's ability to store, process, and access all the data it needs to
operate effectively, make decisions, reduce risks, and serve customers."
Gualtieri says that the key to his definition can be summed up in the acronym SPA, which stands for: Store, Process, and
Access. He advises clients to ask themselves these three questions to determine whether they are able to wring value from
a Big Data project:
Access. Can you retrieve, search, integrate, and visualize the data?
Gualtieri's point is that businesses should define their Big Data projects not by the size or shape of the data, but by what
they can accomplish (what they can do) with large, varied, and fast-moving data.
The two definitions frame the issue. The three Vs describe what Big Data is: how big it is, what it looks like, and how fast it moves.
Gualtieri's definition describes what a company should expect to do with a Big Data project.
Chapter 2
In many cases, Big Data is used for the exact same kind of analytics you've been doing for some time but with more data points
from new data sources added to the mix.
Franks, who is the author of the book Taming the Big Data Tidal Wave (John Wiley & Sons, April 2012), points out that forward-looking
companies are always struggling with new data types. In the late 1990s and early 2000s, for example, many organizations
were struggling to use transactional data for broad analytics purposes. Now transaction data is not much of a challenge, he says.
More recently, companies are getting used to working with online browsing history, a data type that was once considered
daunting.
Big Data, according to Franks, is simply "a continuation of the struggle we've always had to incorporate ever-growing and
ever more diverse data sources into analytics to enable better business decisions."
That's why the definition of Big Data, according to Mike Gualtieri, includes the word "frontier." The push to incorporate ever
larger and more varied data is an essential part of a Big Data project.
To get the most from a Big Data project, experts say, you should start with a goal in mind. Do you want to measure the
effectiveness of your marketing and advertising? How about incorporating the voice of the consumer in your product
lifecycle decisions? Big Data can also be used to create brand new information products and services.
Be explicit about your business goals. That will shape your project and increase the odds of a successful outcome. If you
don't have a goal, take a look at the Examples of Big Data. Maybe one of those project goals can be adapted to work for
your company or organization.
Remember, too: Most often Big Data "aha" moments result from the intersections among a variety of data sources. Large
collections of data tend to be stored in silos. Powerful new strategies and insights emerge when you cut across those vertical
containers.
Shawndra Hill, who works with and teaches about Big Data in the Operations and Information Management Department
at The Wharton School of the University of Pennsylvania, advises that a company "should first understand the state of
the art in data mining for their domain in order to identify the best benchmarks for their project and to see whether some
existing solution is available to solve their problems." The next step, according to Hill: "Calculate the expected gain from
implementing the project in the best and worst cases of success and compare the estimates to the expected cost of taking
on the project. No Big Data project should start just because it's fashionable."
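Hill's comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a method Hill prescribes: the `Scenario` class, the `assess` function, the three verdict strings, and all dollar figures are hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    gain: float  # estimated gain in dollars if this scenario occurs (made-up figure)


def net_value(scenario: Scenario, expected_cost: float) -> float:
    """Estimated gain minus the expected cost of taking on the project."""
    return scenario.gain - expected_cost


def assess(best: Scenario, worst: Scenario, expected_cost: float) -> str:
    """Compare best-case and worst-case estimates against the expected cost."""
    best_net = net_value(best, expected_cost)
    worst_net = net_value(worst, expected_cost)
    if worst_net >= 0:
        return "pays off even in the worst case"
    if best_net <= 0:
        return "does not pay off even in the best case"
    return "pays off only if things go well"


# Illustrative numbers only; substitute your own estimates.
best = Scenario("best case", 500_000)
worst = Scenario("worst case", 50_000)
cost = 200_000

print(assess(best, worst, cost))  # pays off only if things go well
```

A project that lands in the middle verdict is the one that most needs a small pilot before a larger commitment.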
The most successful Big Data projects are also action-oriented, with a strong internal push toward acting on the insights
that emerge from the analysis. This is why consultants like Mike Gualtieri caution companies to avoid accepting Big Data
projects that generate "lazy data."
"If you have a data warehouse and you're just producing reports, that's not Big Data," Gualtieri says. "You have to be able to
use the information to create a competitive advantage in your markets."
Chapter 3
Chapter 4
Chapter 5
What skills do you need to implement a Big Data project? The field you want to explore is called data
science.
Data scientist is a relatively new job title, but thousands already have it on their business cards (700 at Google alone). Yet
because the field is so new (university programs are rare), Big Data professionals are hard to find. McKinsey & Co. predicts
that by 2018, the U.S. could face a shortage of more than 1.5 million specialists needed to capture, store, manage, and
analyze Big Data.
In Fall 2012, Thomas Davenport and D.J. Patil wrote a cover story for the Harvard Business Review that outlined strategies for
staffing Big Data projects.
One approach they recommended: Grow your own. Recruit and develop Big Data talent in-house, or look for achievers in
any field with a strong data and computational focus and grow with them. Experimental physics and systems biology, for
example, are two fields that could generate promising data scientists, according to Davenport and Patil.
But Davenport and Patil warn that the search won't be easy. What makes it particularly difficult, Davenport says, is that the
best data scientists need a variety of technical, business, analytical, and relationship skills. According to Davenport, the best
data scientists often have advanced computer science degrees, or advanced degrees in fields such as physics, biology, or
social sciences that require a lot of computer work. In addition, they have to be familiar with a wide variety of technologies
such as Hadoop, MapReduce, and related tools; programming languages like Python; and disciplines like natural language
processing.
Also, "Nothing beats experience," adds Shawndra Hill. She says that the best data scientists have loved data for a long time
and have gained an intuition about what can and can't be done. They also have a creative eye to think about how to use new
data to solve old problems and old data to solve new problems.
"Paths to data science usually start with an interest in solving hard problems," Hill says. "The rest of the path is lined
with exciting hard problems that have been solved successfully over time. The speed of computing makes so much
more possible."
In addition to these technical skills, data scientists also need the attributes previously necessary for analytical
professionals, including mathematical and statistical skills, business acumen, and the ability to communicate effectively
with customers, product managers, and decision makers.
The skills are so varied, Davenport reported, that some companies have decided to create data science teams
that together embody this collection of skills. The yearly salary for data scientists, according to the online career site
Glassdoor, ranges from $80K to $220K.
One encouraging sign for companies in search of expertise is that many of the hottest, most lavishly funded startups
in the Big Data arena are working on products that mix analytics with Big Data, often in a cloud-based service.
Ultimately these products could lighten the load for companies hoping to get a Big Data project off the ground. Until
then, Davenport says, the goal is to find data talent that is "a hybrid of data hacker, analyst, communicator, and trusted
adviser." That combination, he admits, is extremely powerful, and rare.
Chapter 6
"Start small with Big Data" is the advice from author Bill Franks. Identify a few relatively simple
analytics that won't take much time or data to run. For example, an online retailer might start
by identifying what products each customer viewed within just a few key categories so that the
company can send a follow-up offer if they don't purchase.
"An organization that is entering the Big Data waters needs simple, intuitive examples to see what the data can do," Franks
says, adding that this approach also yields results that are easy to test to see what type of lift the analytics provide.
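The retailer example is simple enough to sketch in Python. The version below is a minimal illustration under stated assumptions, not Franks's implementation: the record layouts, the `KEY_CATEGORIES` set, and the `follow_up_candidates` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "key categories" the retailer has chosen to focus on.
KEY_CATEGORIES = {"cameras", "laptops"}


def follow_up_candidates(views, purchases, key_categories=KEY_CATEGORIES):
    """Return, per customer, the key-category products they viewed but did not buy.

    views:     iterable of (customer_id, product_id, category) tuples
    purchases: iterable of (customer_id, product_id) tuples
    """
    purchased = set(purchases)
    candidates = defaultdict(set)
    for customer_id, product_id, category in views:
        in_focus = category in key_categories
        not_bought = (customer_id, product_id) not in purchased
        if in_focus and not_bought:
            candidates[customer_id].add(product_id)
    # Sort product ids so the output is stable from run to run.
    return {customer: sorted(products) for customer, products in candidates.items()}


# Made-up sample data for illustration.
views = [
    ("alice", "cam-1", "cameras"),
    ("alice", "sock-9", "apparel"),
    ("bob", "lap-2", "laptops"),
    ("bob", "cam-1", "cameras"),
]
purchases = [("bob", "lap-2")]

print(follow_up_candidates(views, purchases))
# {'alice': ['cam-1'], 'bob': ['cam-1']}
```

Restricting the analysis to a short list of categories keeps the first experiment small, which is the point of the advice.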
Next, design a one-off test on some company data: a single month of data from one division for one set of products, for
example. Franks cautions against attempting to analyze all of the data all of the time when first starting. That can muddy
the water with too much data, and lead to high initial costs, a problem that plagues many Big Data initiatives. Instead, use
only the data you need to perform the initial tests. At this point, Franks recommends, turn analytic professionals loose on
the data. They can create test and control groups to whom they can send the follow-up offers, and then they can help
analyze the results. During this process, they'll also learn an awful lot about the data and how to use it.
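One common way to run such a test is to assign customers to the two groups at random and compare their purchase rates. The Python sketch below assumes that design. The function names, the 50/50 split, and the sample counts are hypothetical, and a real analysis would also check whether the difference is statistically significant.

```python
import random


def split_test_control(customer_ids, test_fraction=0.5, seed=0):
    """Randomly split customers into a test group (gets the offer) and a control group."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    cut = int(len(ids) * test_fraction)
    return ids[:cut], ids[cut:]


def conversion_rate(purchases: int, customers: int) -> float:
    """Share of customers in a group who went on to purchase."""
    if customers <= 0:
        raise ValueError("group must contain at least one customer")
    return purchases / customers


def relative_lift(test_rate: float, control_rate: float) -> float:
    """Improvement of the test group over the control group, relative to control."""
    if control_rate == 0:
        raise ValueError("control rate is zero; relative lift is undefined")
    return (test_rate - control_rate) / control_rate


# Made-up results: 60 of 1,000 offer recipients bought, versus 40 of 1,000 in control.
test_rate = conversion_rate(60, 1000)
control_rate = conversion_rate(40, 1000)
print(f"lift: {relative_lift(test_rate, control_rate):.0%}")  # lift: 50%
```

The control group is what makes the lift figure meaningful: without it, there is no way to tell purchases caused by the offer from purchases that would have happened anyway.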
Successful prototypes also make it far easier to get the support required for a larger, more comprehensive effort. Best
of all, the full effort will now be less risky because the data is better understood and the value is already partially proven.
It's also worthwhile to learn early when the initial analytics aren't as valuable as hoped. It tells you to focus your effort
elsewhere before you've wasted many months and a lot of money.
"Pursuing Big Data with small, targeted steps can actually be the fastest, least expensive, and most effective way to go,"
Franks says. "It enables an organization to prove there's value in a major investment before making it, and to understand
better how to make a Big Data program pay off for the long term."
Whatever the size of your initial foray, experts advise remembering that it's a process, a loop. Don't expect fantastic insights
the very first time you route two data streams into the same river. Often the benefits don't start to accrue until after you've
run your tests through a few iterations.
Even then, because of the newness of the field, Big Data projects, even successful ones, can be frustrating. "We still have
a ways to go to be able to combine evidence from different types of data sources, for example from text, social networks,
and time series data," says Shawndra Hill. "The methods have not caught up yet with the scale and complexities of today's
Big Data." She adds, "This is both exciting and scary. Exciting because there are a lot of new solutions to be generated,
and scary because we are probably leaving a lot of value in databases, and that value may be harder to find as Big Data
becomes even bigger data with even more complexity and noise."
Get Started
Analyst Mike Gualtieri likes to cite a Forrester study that predicts that by 2016, 1 billion people will have smartphones and
tablets, "and that number will keep increasing," he says. "The more technology people use, the more data they generate,
and the more opportunity there is to provide personal experiences," Gualtieri says. "The firms that make things personal
will drive things in the future. The others will drop off." To those who are on the fence, considering a Big Data project,
Gualtieri has a simple piece of advice.
"Don't sit this out," he urges. "This is real."
Examples of Big Data
Consumer product companies and retail organizations are monitoring social media like Facebook and Twitter to get an unprecedented view into customer behavior, preferences, and product perception.
Manufacturers are monitoring minute vibration data from their equipment, which changes slightly as it wears down, to predict the optimal time to replace or maintain it. Replacing it too soon wastes money; replacing it too late triggers an expensive work stoppage.
Manufacturers are also monitoring social networks, but with a different goal than marketers: They are using them to detect aftermarket support issues before a warranty failure becomes publicly detrimental.
Financial services organizations are using data mined from customer interactions to slice and dice their users into finely tuned segments. This enables these financial institutions to create increasingly relevant and sophisticated offers.
Advertising and marketing agencies are tracking social media to understand responsiveness to campaigns, promotions, and other advertising media.
Insurance companies are using Big Data analysis to see which home insurance applications can be immediately processed and which ones need a validating in-person visit from an agent.
By embracing social media, retail organizations are engaging brand advocates, changing the perception of brand antagonists, and even enabling enthusiastic customers to sell their products.
Hospitals are analyzing medical data and patient records to predict which patients are likely to seek readmission within a few months of discharge. The hospital can then intervene in hopes of preventing another costly hospital stay.
Web-based businesses are developing information products that combine data gathered from customers to offer more appealing recommendations and more successful coupon programs.
The government is making data public at the national, state, and city levels for users to develop new applications that can generate public good.
Sports teams are using data for tracking ticket sales and even for tracking team strategies.
Recommended Reading:
Books, Reports, Blogs, and Conferences
Books
Analytics at Work: Smarter Decisions, Better Results by Thomas H. Davenport, Jeanne G. Harris, and Robert Morison
(Harvard Business Review Press, 2010).
Taming the Big Data Tidal Wave by Bill Franks (John Wiley & Sons, 2012).
Reports
Data Scientist: The Sexiest Job of the 21st Century by Thomas H. Davenport and D.J. Patil (Harvard Business Review, Oct. 2012).
HBR Reprint R1210D.
Big Data Now: Current Perspectives from O'Reilly Media. Free download.
Data Jujitsu: The Art of Turning Data into Product by D.J. Patil (O'Reilly Media, 2012).
Conferences
Strata [http://strataconf.com/strata2013/public/content/home]
Structure:Data [http://event.gigaom.com/structuredata/]
Strata + Hadoop World [http://strataconf.com/stratany2012/]