BOOK - The - Basic - Principles - of - People - Analytics
PEOPLE
ANALYTICS
Learn how to use HR data to drive better
outcomes for your business and employees
WRITTEN BY
ERIK VAN VULPEN
The Basic Principles of People Analytics   Copyright © AIHR
THE BASIC PRINCIPLES OF
PEOPLE ANALYTICS
WRITTEN BY
ERIK VAN VULPEN
PUBLISHED BY
AIHR
www.aihr.com
www.analyticsinhr.com
ISBN: 9781097268757
© Analytics in HR B.V.
FOREWORD
1. PEOPLE ANALYTICS
5. TEAM SKILLSETS
CONCLUSION
REFERENCES
The head of people analytics of a large Fast Moving Consumer Goods com-
pany is woken up at 5 in the morning by her ringing phone. After answering
with a moody “Olivia”, she is surprised to hear it’s the CHRO who tells her
to get out of bed and report to the office within the hour. Olivia has worked
for the same company for eight years already, in different roles. She has
never had her direct manager tell her to report to the office within an hour
– let alone at 5 AM.
At 5:55 AM the office looks deserted, except for a few senior
managers who are scurrying past a puzzled-looking security officer. He always
liked the night shift because he would usually arrive after everyone
had left and leave early in the morning before everyone arrives. Today,
however, seems different.
After arriving in the conference room and greeting her colleagues who are
all part of the HR management group, Olivia waits, slightly nervous, for the
CHRO to arrive, who, according to his secretary, is about to finish up an
emergency meeting of the board of directors.
At 6:05 AM the CHRO walks in. His hair has a suspiciously trendy out-of-
bed look – which, Olivia notices with a grin, is hard to pull off for a silver-
haired 60-year-old. The CHRO cuts right to the point. The board has been
informed of a hostile takeover attempt by one of their competitors. The
news will be public within a few hours and is expected to have a direct im-
pact on the business. The key short-term priorities of the board are to con-
tinue business as usual, while the board comes up with and executes a de-
fence strategy.
The CHRO notes that HR’s direct contribution is to ensure that employee
morale stays up, monitor anomalies in the workforce, including absence
Olivia has never done so much in a single day. She has coordinated with her
people analytics team and asked them to actively monitor the chatter on
the company's internal social network platform. Through sentiment analysis, the
team is able to summarize a lot of unstructured information into structured
themes and assess the associated sentiment. This makes it easy to recognize
tone of voice, expressed emotions, and the contagiousness of these
messages (measured through comments, upvotes, and other social actions). In
addition, she has prepared a series of pulse surveys that will be sent
out every day over the next couple of weeks to measure attitudes
towards the company attempting the takeover. This type of survey sends a few
very specific questions to a small, randomly selected group of employees
to get a representative reading of employee attitudes while minimizing
the inconvenience of a traditional questionnaire. A pulse survey is also
a very good tool for testing messages on an employee focus group and gauging
their perceived impact and tone of voice. So, after a hurried lunch at her
desk, she spends two hours with the communications team to coordinate
and directly test the wording of a press release scheduled for later that
afternoon.
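The kind of sentiment analysis Olivia's team runs can be illustrated with a deliberately simple sketch. It assumes tiny hand-made word lists and theme keywords (the hypothetical `POSITIVE`, `NEGATIVE`, and `THEMES`), whereas real platforms use trained language models, and it approximates "contagiousness" as comments plus upvotes, which is only one possible reading of the social actions mentioned above.

```python
import re
from collections import defaultdict

# Hypothetical word lists; real platforms use trained language models instead.
POSITIVE = {"great", "proud", "confident", "support", "good"}
NEGATIVE = {"worried", "angry", "unfair", "bad", "afraid"}
# Hypothetical keywords that map free text onto structured themes.
THEMES = {"takeover": {"takeover", "merger", "acquisition"},
          "jobs": {"job", "jobs", "layoffs", "contract"}}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Score in [-1, 1]: (positive hits - negative hits) / all sentiment hits."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def summarize(posts):
    """Per theme: average sentiment and total reach (comments + upvotes)."""
    scores, reach = defaultdict(list), defaultdict(int)
    for post in posts:
        words = set(tokenize(post["text"]))
        for theme, keywords in THEMES.items():
            if words & keywords:
                scores[theme].append(sentiment(post["text"]))
                reach[theme] += post["comments"] + post["upvotes"]
    return {t: {"avg_sentiment": sum(s) / len(s), "reach": reach[t]}
            for t, s in scores.items()}


# Two made-up posts, for illustration only.
posts = [
    {"text": "Worried about the takeover and my job", "comments": 12, "upvotes": 40},
    {"text": "Proud of our team, confident we beat this takeover", "comments": 3, "upvotes": 10},
]
print(summarize(posts))
```

Even this toy version shows the point of the exercise: a neutral average on a theme can hide a highly shared negative message, so reach matters as much as tone.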
After spending well over 12 hours in the office, Olivia takes an Uber
home. She usually goes by public transport, but after an especially
busy day like this, the 10-minute car ride is a moment to relax. When the
driver asks her whether she uses the service often, she smiles. A few
days after the company started working with the sentiment analysis
platform that she championed internally, it had announced its
intention to stop reimbursing Uber rides. The general sentiment in
the company was so negative after this announcement that, when she
The original version of this book was written in 2016. This second, revised
version was completed in April 2019 and includes many additional examples.
This book is published by AIHR, the largest online academy in the field of
people analytics. Enjoy the book and good luck with your people analytics
journey!
Business case
Google is one of the most innovative companies in the world. After its
founding, it experienced astronomical growth. The company expanded to more than
20 000 employees in ten years' time, more than doubling its workforce every
single year. In 2007, the number of new hires peaked at 200 new employees
every week.
This meant that Google had to spend a tremendous amount of time on recruiting
and selecting new employees. Every applicant was interviewed by the hiring
manager and by their future colleagues. Some managers spent half a week
talking to candidates!
People analytics is about looking into these numbers. Instead of (or in addi-
tion to) relying on gut feeling, people analytics helps organizations to rely
on data – just like it helped Google evaluate their hiring process. This data
helps us make better decisions. By analyzing the data, decisions can be
made based on facts and numbers: people analytics is a data-driven ap-
proach to managing people at work (Gal, Jensen & Stein, 2017).1
In order to perform data analysis, you need data. This data often originates
from different systems. For instance, to perform their analysis, Google had
to ask their interviewers to rate candidates, as well as collect data from
their Applicant Tracking System and their Performance Review System.
Thus, people analytics often involves aggregating data from different
sources or systems. This aggregation requires programming skills as well as
knowledge of the company’s IT infrastructure. To analyze the data, you
need an analyst with an aptitude for working with data and statistics.
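A minimal sketch of that aggregation step, assuming three hypothetical extracts (interviewer ratings, an applicant tracking system export, and performance reviews) that share an `employee_id` key. It uses the pandas `merge` function; all column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical extracts from three separate systems.
interview_ratings = pd.DataFrame({"employee_id": [1, 1, 2, 3],
                                  "interviewer_score": [3.5, 4.0, 2.5, 4.5]})
ats = pd.DataFrame({"employee_id": [1, 2, 3],
                    "source": ["referral", "job board", "campus"]})
reviews = pd.DataFrame({"employee_id": [1, 2],
                        "performance_rating": [4, 3]})

# Several interviewers rate each candidate, so average per candidate first.
avg_scores = (interview_ratings
              .groupby("employee_id", as_index=False)["interviewer_score"]
              .mean())

# Join on the shared key; a left join keeps hires without a review yet (NaN).
combined = (avg_scores
            .merge(ats, on="employee_id", how="left")
            .merge(reviews, on="employee_id", how="left"))
print(combined)
```

In practice the hard part is rarely the join itself but agreeing on a shared key across systems, which is where knowledge of the IT infrastructure comes in.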
Missing skillsets: what happens if one skillset is missing?
When you say analytics, most people think of finance or marketing. These
are fields that already measure everything they can measure. On a website,
every button click is recorded, every conversion is measured, and every
sale is registered. In fact, a well-oiled Finance Department is able to show
the conversions for every single dollar spent on online marketing.
Now I need to qualify this statement immediately. The old adage in the
early days of marketing was always: “Half the money I spend on advertising is
wasted; the trouble is, I don’t know which half”. Although today we are very good
at tracking advertising budgets and revenue from ads, the adage still
holds. A conversion on a website can be attributed to the impression
a person got when they saw the ad in their Facebook feed, but also to the moment
that same person later searched for the product and entered the
website via a Google advertisement. The point is that this is a discussion
about how we should measure, not about whether we should measure.
That distinction is important, because I’ve almost never heard a similar in-
depth discussion about HR data…
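To make the "how should we measure" question concrete, here is a sketch of three common marketing attribution rules applied to the same customer journey. The channel names and the revenue figure are hypothetical; the point is only that identical data yields different answers depending on the measurement rule chosen.

```python
def attribute(touchpoints, revenue, rule):
    """Split the revenue of one conversion across the channels a customer touched."""
    credit = {channel: 0.0 for channel in touchpoints}
    if rule == "first_touch":
        credit[touchpoints[0]] += revenue      # all credit to the first impression
    elif rule == "last_touch":
        credit[touchpoints[-1]] += revenue     # all credit to the final click
    elif rule == "linear":
        for channel in touchpoints:            # equal share for every touchpoint
            credit[channel] += revenue / len(touchpoints)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return credit


# Hypothetical journey: saw a Facebook ad, later clicked a Google ad and bought.
journey = ["facebook_ad", "google_ad"]
for rule in ("first_touch", "last_touch", "linear"):
    print(rule, attribute(journey, 100.0, rule))
```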
Around the turn of the new millennium, new research showed that inter-
views did not necessarily predict future performance very well. Indeed,
when interviews were not done well, they were a very unreliable tool for
selecting new candidates. It turned out that this also held true for Google.
Candidates who came to Google for a job interview never got a second chance
to make a good first impression. By the first handshake, the interviewer
subconsciously knew whether they liked or disliked the candidate. The
interviewer would then spend the next hour looking for cues that would confirm
that first impression. It turned out that when a candidate made a bad
first impression, it was almost impossible to turn it
into a good second impression.
The Google analysts found that the interviewing process did not reliably
predict which candidate would perform better than others. The only thing
it did measure accurately was whether or not the interviewer liked the
candidate! That was a big problem because managers at Google spent
roughly five to ten hours interviewing every new hire. This means that
some managers were involved in the hiring process almost on a full-time
basis. A lot of time and money was wasted in inefficient interviewing pro-
cesses. Yet, despite all that time and effort, managers at Google were not
hiring the best people.
The analysis also revealed that multiple interviews with the same candi-
date did not lead to a better estimation of future performance. After the
fourth interview, managers were just as good at estimating performance as
after the tenth interview.
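One way to see why extra interviews add little is the Spearman-Brown formula from psychometrics. It gives the reliability of the average of k parallel ratings when a single rating has reliability rho: rho_k = k * rho / (1 + (k - 1) * rho). This is a standard textbook result, not the analysis Google's team reported, and reliability is only an upper bound on how well ratings can predict performance. The single-interview reliability of 0.5 below is a hypothetical value.

```python
def spearman_brown(rho, k):
    """Reliability of the mean of k parallel ratings with single-rating reliability rho."""
    return k * rho / (1 + (k - 1) * rho)


single_interview_reliability = 0.5  # hypothetical value, for illustration only
for k in (1, 2, 4, 10):
    print(k, round(spearman_brown(single_interview_reliability, k), 3))
```

Under this assumption, going from four to ten interviews raises reliability from 0.80 to about 0.91 while more than doubling the time spent interviewing.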
So what does this teach us? The way Google hired was traditional. Man-
agers and employees at Google spent over a hundred thousand hours in-
terviewing new candidates in their first ten years. Only after the data ana-
lysts ran the numbers was it discovered that their interviewing system was
very time-consuming without actually leading to better hires. The numbers
showed that the interview process needed to become more efficient and
more effective.
People analytics is most important for HR and the CEO. HR data and ana-
lytics help HR to make better decisions about the way people are managed.
This means that it can potentially impact all HR processes, such as recruit-
ment and selection (as we saw at Google), compensation, learning and de-
velopment, or firing. Yet, it goes even further than this.
Bloomberg
Bloomberg, a major financial news and data company, sells terminals for
20 000 dollars a year. These terminals provide quick access to the latest
news, sales figures, and other data. Bloomberg tracks all keystrokes on
these terminals, both for their employees and for their customers.3 The
customer information can then be used to provide a better and more
streamlined service. The employee information is useful for analyzing how
often people work and how productive they are. Productivity, in this case, is
measured in keystrokes, which allows Bloomberg to analyze which
journalist produces content the fastest. In addition, Bloomberg tracks
when people check in and out of its 192 offices all over the world. Research
shows that people who arrive late for work are more likely to be absent
in the near future or even to switch jobs (Griffeth, Hom & Gaertner,
2000).
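A minimal sketch of how check-in logs could be turned into a lateness measure that might feed an absence or turnover analysis. The log format, the 9:00 start time, and the 15-minute grace period are all hypothetical assumptions, not Bloomberg's system.

```python
from collections import defaultdict
from datetime import datetime, time

START = time(9, 0)   # hypothetical official start time
GRACE_MINUTES = 15   # hypothetical tolerance before an arrival counts as late


def late_share(checkins):
    """Return, per employee, the share of days with a late first check-in."""
    first_in = {}  # (employee, date) -> earliest check-in that day
    for employee, stamp in checkins:
        ts = datetime.fromisoformat(stamp)
        key = (employee, ts.date())
        if key not in first_in or ts < first_in[key]:
            first_in[key] = ts
    days, late = defaultdict(int), defaultdict(int)
    for (employee, day), ts in first_in.items():
        limit = datetime.combine(day, START)
        days[employee] += 1
        if (ts - limit).total_seconds() > GRACE_MINUTES * 60:
            late[employee] += 1
    return {e: late[e] / days[e] for e in days}


# Made-up badge log: (employee, check-in timestamp).
log = [("ana", "2019-04-01T08:55:00"), ("ana", "2019-04-02T09:40:00"),
       ("ben", "2019-04-01T09:05:00"), ("ben", "2019-04-01T08:30:00")]
print(late_share(log))
```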
Humanize
These are among the more futuristic examples, but analytics
applies to many day-to-day questions as well. Some questions that can be
answered through analytics include:
• Does our free fitness program actually benefit our employees’ health
and happiness?
In the next chapters of this book, we will give you more examples and en-
able you to build a process for answering the questions that really matter
to you and your organization. Our goal is to make you more familiar with
In the early 1900s, Frederick Winslow Taylor published a book titled “The
Principles of Scientific Management”. In his book, Taylor, who was a me-
chanical engineer, applied the engineering principles familiar to him to the
work that was done by factory employees. According to Taylor, workers
would be more productive when their task matched their personal capabili-
ties, and when there was a reduction in activities and movements extrane-
ous to the task’s completion (Saylor Foundation, 2013)5.
One of Taylor’s followers was car manufacturer Henry Ford. Ford was a
successful businessman who had produced many different cars, which he
labeled alphabetically (the first being his Ford Model A). Ford’s newest car,
the Model T, was very popular amongst consumers. In its first year of pro-
duction Ford sold well over 10 000 vehicles.
This tremendous demand for cars forced Ford to consider more efficient
production methods. To achieve this, he hired Taylor to observe his workers
and come up with efficiency increasing ways to make new cars. Taylor rec-
ommended that larger car parts should remain stationary, while smaller
parts would be brought to the car. Ford studied Taylor’s observations and
applied his principles of scientific management to his production process.
Furthermore, he decided that the workers should also remain stationary.
The car would physically move from workstation to workstation, where
workers at each station would perform their specialized tasks before the
car was moved to the next station. This process would be repeated until the
car was complete (EyeWitness to History, 2005)6.
However, Ford found that to successfully complete their task, some work-
stations required more time than others. This led him to recalibrate tooling
At the same time, Australian-born social scientist Elton Mayo was conducting his
famous experiments at Western Electric’s Hawthorne Works plant. Mayo was studying the
impact of lighting conditions on workers, exposing some workers to higher
levels of illumination than others. When Mayo measured post-intervention
productivity, he found that workers were 25% more productive compared
to when he began the experiments. No matter how physical conditions
were altered, workers were still more productive. What happened?
Personnel management
From that point on, history shows a growing emphasis on job enrichment,
rapid technological progress, surging global competition, and the rise of the
service industry, in which employees’ skills are particularly valuable. These
factors pressured the personnel department into shifting its focus from
personnel management and administration to a role centered on the
reinforcement of company policy and culture through people practices. So, as
employees have become part of the company’s (human) capital, the core
of employee management has shifted to growth and engagement, and
management practices aim to get the most out of people. In addition, HR
professionals are now called business partners and serve as support to line
managers. Compared to personnel management, an efficient HRM depart-
ment offers a number of integrated services: recruitment, hiring, firing,
learning and development practices, and performance appraisal. These ser-
vices have become more integrated and in line with the company’s vision
and strategy. This integration has been coined Strategic Human Resource
Management.
This is where people analytics enters the picture. Where personnel man-
agement focuses on administration and HRM focuses on supporting em-
ployees, people analytics brings the science back to HR. People analytics
allows HR to quantify its efforts and impact in order to encourage better
people decisions. It is, in a sense, a revival of people-driven scientific
management.
This idea, that people are best managed by taking a data-driven approach,
is new to many HR practitioners. Instead of relying on gut feeling, HR de-
ploys analytics so as to speak the same language as all the other depart-
ments in the organization: numbers. People analytics lets HR convert a
(people) problem into a numeric rating and a dollar amount. It potentially
enables HR to calculate the Return on Investment (ROI) of people policies.
An ROI shows the added value of these policies and gives HR the power to
show that it can actually help the business earn more money by hiring the
right people and making better people decisions. Although reducing a per-
son to a single number sounds scary to some, it offers HR a weapon that
aids in establishing its position as a serious business partner. HR analytics
and people analytics are strong tools for HRM to become more strategic.
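The ROI calculation itself is simple arithmetic: ROI = (benefit - cost) / cost. The sketch below applies it to a hypothetical retention program; every figure, including the assumed replacement cost per leaver, is invented for illustration, and estimating the benefit credibly is where the real analytical work lies.

```python
def roi(benefit, cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    return (benefit - cost) / cost


# Hypothetical retention program.
program_cost = 50_000        # total cost of the program
leavers_prevented = 4        # estimated reduction in resignations
replacement_cost = 30_000    # assumed cost of replacing one leaver
benefit = leavers_prevented * replacement_cost  # 120 000

print(f"ROI = {roi(benefit, program_cost):.0%}")
```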
In fact, many argue that HR can only become a true business partner when it
quantifies its own impact and actively influences business decisions using
data. Otherwise, HR remains a business assistant that does important work
without adding to the value and competitiveness of the business.
Interest in people analytics has spiked in the last couple of years. According
to Ahrefs, a leading search engine optimization tool, around 6 400 people
search Google for people analytics every month. This term is most popular in the
U.S., Brazil, and India. A similar trend is visible for HR analytics, with over
11 000 Internet users searching for it every month.
Google Trends (showing relative interest) on HR analytics. Search popularity has increased
over 1800% since November 2007. An almost identical trend is visible for people analytics.
The words people analytics and HR analytics are often used interchange-
ably. Although the term HR analytics has frequently been employed, there
has recently been an increased need to put HR analytics into a broader
perspective. At the same time, the term people analytics has become more
popular especially due to the rising demand for HR analytics as a separate
center of excellence within existing companies.
• Evidence-based HR
• A competitive advantage
Evidence-based HR
• Will our managers become better managers when they take leader-
ship training?
• Does the sales training we offer impact our people’s sales perfor-
mance?
In order to answer these questions, we need to run the numbers, just like
Google did in our example in chapter one. Google asked the question: Can a
hiring manager predict employee performance? According to the literature,
they cannot. Yet no manager would readily believe that their hunch about how a
new hire would perform was wrong. Only by running and showing the
numbers could HR prove that the managers’ hunches were indeed incorrect
and that new hiring practices were necessary. This is evidence-based HR.
During their highly selective training, the U.S. Special Forces predict which
candidates are most likely to succeed. Two key predictors are ‘grit’ and the
ability to do more than 80 pushups. Grit proved a more accurate predictor
of training success than IQ.
A friend of mine once told me a story about her cleaning business. In order
to retain customers, she wanted to raise customer satisfaction. So, she
started training the customer service employees to provide higher quality
customer care. Contrary to what she had hoped, this had no impact on
customer satisfaction (which was measured three times a year).
After talking to several customers, she discovered that it was the cleaners
who made the biggest impact on customer (dis)satisfaction, not the
customer service employees. The cleaners were the ones who worked in the
customers’ homes and offices, so they were the ones in direct contact
with the customers. They needed to be more flexible when office workers
put in overtime or when homeowners came home earlier, which often
conflicted with cleaning schedules.
The customer care problem was solved by training the cleaners in customer
etiquette and providing them with more autonomy in scheduling. This had
a tremendous impact on customer satisfaction and customer retention.
The attentive reader will point out that my argument in this section focuses
on the advantage of data-driven HR, not evidence-based HR, as I claimed in
the subtitle. The difference may be mostly academic, but I do think it is
important to elaborate on this. Data-driven means that progress in an activity
is driven by data instead of by intuition or personal experience.17
Evidence-based is more than this. Evidence-based working is the application of
the scientific method to day-to-day business challenges. This involves a
number of steps:
• Analyze the data. This is where analytics and statistics come in –
and this is the part that some hard-liners refer to as ‘the real
people/HR analytics’. As I stated earlier in this chapter, we take a
broader definition.
“White candidates” (John Andrews and Jenny Hughes) were invited for an
interview 23% of the time. “Black African applicants” (Abu Olasemi and
Yinka Olatunde) were invited only 13% of the time. “Muslim
candidates” (Fatima Khan and Nasser Hanif) were only invited 9% of the
time.
The success rates of the applicants varied wildly despite their identical ap-
plications and CVs. The article suggests that people who make the selection
harbor a racist view, however unconscious it may be. And this is in a time
when organizations are increasingly and actively trying to promote diversi-
ty!
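The invitation rates above can be expressed as adverse impact ratios: each group's selection rate divided by the rate of the most-selected group. In U.S. employment practice, a ratio below 0.8 (the "four-fifths rule" from the EEOC Uniform Guidelines) is commonly treated as a flag for possible adverse impact. Applying that U.S. rule of thumb to this study is our own illustration, not part of the original article.

```python
# Interview invitation rates reported in the study above.
rates = {"white": 0.23, "black_african": 0.13, "muslim": 0.09}


def impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate."""
    best = max(selection_rates.values())
    return {group: rate / best for group, rate in selection_rates.items()}


for group, ratio in impact_ratios(rates).items():
    flag = "below 0.8" if ratio < 0.8 else "ok"
    print(f"{group}: {ratio:.2f} ({flag})")
```

Both minority groups fall well below the 0.8 threshold (about 0.57 and 0.39).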
1) RGRRR
2) GRGRRR
3) GRRRRR
As mentioned before, the die has four green and two red sides. People,
therefore, perceive ‘green’ to be a more likely outcome compared to ‘red’.
Since the first option only has one ‘green’ outcome, and option 2 has two
‘green’ outcomes, option two seems to be more likely to happen. This is why
most people would choose the second option. However, option 1 and 2 are
This is another example of how (in this case very simple) decisions are ex-
posed to biases that we are not aware of. These biases color our judgement
despite our best efforts to make good, rational, and fact-based decisions.
As in our previous example, it shows how evidence-based decision-making
can help people make better and more accurate decisions. The remarkable
thing is that humans are bad decision-makers. The example in chapter
one already showed that managers were unable to accurately predict
performance. Even if an algorithm is able to predict only 30-40% of future
performance, it already outperforms humans. This is why analytics gives us the
potential to make better decisions and be fairer to everyone.
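The die example can be checked with exact arithmetic. Assuming independent rolls of a die with four green (G) and two red (R) faces, the probability of an exact sequence is the product of 2/3 for every G and 1/3 for every R. This simplified calculation looks at each sequence on its own; it shows that option 2, which is option 1 with an extra G in front, can never be the more likely one.

```python
from fractions import Fraction

P = {"G": Fraction(4, 6), "R": Fraction(2, 6)}  # four green faces, two red faces


def sequence_probability(seq):
    """Probability of rolling exactly this sequence with independent rolls."""
    prob = Fraction(1)
    for face in seq:
        prob *= P[face]
    return prob


for option in ("RGRRR", "GRGRRR", "GRRRRR"):
    print(option, sequence_probability(option))
```

Option 1 comes out at 2/243 (that is, 6/729), against 4/729 for option 2 and 2/729 for option 3.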
The strategic role for HR was first advocated by Ulrich in his 1997 book
‘Human Resource Champions: The Next Agenda for Adding Value and Delivering
Results’.22 In this book he proposed four HR roles. The role that got the
most attention was HR as a strategic partner. Ever since, HR professionals
have been painfully aware of their lack of strategic focus. In order to play a
more strategic role, HR has to be able to show its added value. We saw this
in our Google example: by selecting better candidates, analytics enabled
the company to build a stronger and more suitable workforce and thus
added to the long-term profitability of the company. HR should enable the
business to reach its organizational goals by creating an effective
organization and by measuring HR’s contributions to these goals. On top
of that, people analytics enables HR to save time and money, i.e. to become
more efficient. This data-driven approach will help HR to become a more
strategic partner.
Now this doesn’t mean that the other traditional roles of HR will become
less important – to the contrary! The value of people analytics depends on
the quality of the data that is used. Being an administrative expert will en-
sure high data standards. At the same time, analytics may lead to a number
of conflicts of interest between the employer and the employee. This is
where the employee champion role comes in. Lastly, the implementation of
ULRICH MODEL
A final consideration for the value of people analytics is the employee fo-
cus. There is a good case to be made that it is a firm’s ethical and sometimes
legal duty to take care of its employees. An example of the latter involves
Danish companies, which are required by law to report how people con-
tribute to value creation, or Dutch companies, which have a mandatory
duty of care to be a “good employer”. People analytics can help firms in this
process, as illustrated in the following text.
Due to the aging workforce, the pensionable age worldwide is steadily ris-
ing. Countries like the U.S., Ireland, Spain, Germany, and France will in-
crease the retirement age of workers over the next few decades. This means
that people are leaving the workforce at an older age – and have to work
longer. In general, these older workers are absent more frequently than
younger employees.
The people analytics team in this company decided to analyze this specific
group using both quantitative and qualitative methods. Research showed
that absence for seniors is often caused by chronic illnesses, which are
more prevalent at an older age.24 As such, healthy seniors are not necessar-
ily more absent. In line with this research, the team found that the inter-
ventions were effective for people who experienced high workloads and
work stress (often because of physically challenging work) but they did not
make much difference for the majority of this group, most of whom were
healthy and liked their job.
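Finding that an intervention helps only part of a group is a subgroup analysis. A minimal sketch with hypothetical records, comparing average absence days before and after an intervention split by workload; the group labels and figures are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (workload group, absence days before, absence days after)
records = [
    ("high", 14, 8), ("high", 12, 7), ("high", 16, 9),
    ("normal", 5, 5), ("normal", 4, 5), ("normal", 6, 5),
]


def change_by_group(rows):
    """Average change in absence days (after - before) per subgroup."""
    changes = defaultdict(list)
    for group, before, after in rows:
        changes[group].append(after - before)
    return {group: mean(values) for group, values in changes.items()}


print(change_by_group(records))
```

A real analysis would also need a comparison group that did not receive the intervention, because absence can change over time for other reasons.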
Only a small minority of companies has fully developed their analytical ca-
pabilities. We have already listed a large number of benefits that people
analytics provides. Why don’t all organizations have a fully developed
analytics department?
The answer to this question is complex. There are a number of reasons why
HR lags behind the rest of the organization in terms of analytical capabili-
ties. The next few paragraphs will give an overview of the constraints hold-
ing back HR. These constraints are also likely to limit the adoption of peo-
ple analytics within the company you work in.
Lack of skill
That being said, the skills needed to run an effective HR department have
changed over time. Analytical capabilities require knowledge of data
extraction, aggregation, and data structuring. Since traditional HR
departments lack the IT and data analytics skills to adopt an analytical
approach, a lot of organizations struggle to apply people analytics.
HR often struggles to get past the wall of Boudreau. This is because, on the one hand, data
from multiple systems need to be combined in order to be properly analyzed while, on the
other hand, more advanced data analytics methods are required to do the actual analysis.
Wall of Boudreau
The lack of skill impacts HR’s ability to adopt more advanced analyses. HR
is proficient in creating scorecards and reporting basic data like the number
of sick days people take and benchmarking performance between depart-
ments. These descriptive analytics are relatively easy to produce. However,
HR is typically unable to engage in more advanced analytics. When HR
wants to undertake predictive and prescriptive analysis, it hits a wall. This
‘wall’ was first mentioned by Boudreau and Cascio (2010) and has, there-
fore, been coined ‘the wall of Boudreau’.25 According to Boudreau, HR gets
‘stuck’ because it lacks the skills necessary to use more advanced analytical
methods.
The good news is that companies are increasingly combining their existing
data sources in new (cloud) data storage solutions. This enables them to
play with people data in existing business intelligence systems and makes
extraction of data easier for HR data analytics professionals. This accumu-
lation of aggregated data is one of the drivers behind company-wide ana-
The wall of Boudreau shows that HR has to pass through a few ‘phases’ to
develop analytical capabilities. In an effort to help organizations reach ana-
lytics maturity, Bersin by Deloitte created four talent analytics maturity
levels. Organizations that struggle with descriptive analytics have a lower
analytics maturity level than organizations that are actively doing
predictive analytics. These maturity levels help organizations identify
where they currently stand and what they need to do to develop mature
analytical capabilities. We will discuss this in more depth in the next
chapter.
At the time of writing the first edition of this book (late 2016), the majority of
companies were at levels 1 and 2. These organizations primarily focus on
operational reporting: metrics such as headcount, attrition, cost of labor,
and absenteeism are reported. However, not much is done
with this information. This kind of reporting is part of day-to-day business,
and keeping the reporting up to date is usually time-consuming.
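As an example of such an operational metric, one common definition of attrition rate divides the number of leavers in a period by the average headcount over that period. Definitions differ between organizations; the sketch below assumes this one and uses hypothetical headcount snapshots (the opening headcount plus each quarter-end).

```python
def attrition_rate(leavers, headcounts):
    """Leavers in the period divided by the average of the headcount snapshots."""
    average_headcount = sum(headcounts) / len(headcounts)
    return leavers / average_headcount


# Hypothetical: opening headcount plus four quarter-end headcounts, 30 leavers in the year.
quarter_headcounts = [480, 500, 510, 520, 500]
print(f"{attrition_rate(30, quarter_headcounts):.1%}")
```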
There is a high hygiene factor associated with this type of reporting. Hy-
giene is something that is taken for granted; when someone is hygienic, it
goes unnoticed, but when someone isn't, people surely notice. The same
goes for HR data: you won’t get recognition when the data is up-to-date,
but if it’s not, you will have a problem. Data-driven decision-making is hard
At the time of the first revision of this book (early 2019), Deloitte had
updated its people analytics model into one that addresses this
critique. This model is a bit more complex and – arguably for that reason – less
popular. It also has four levels but focuses more on the process. I will now
At level 2, companies have started with their people analytics journey and
are actively trying to consolidate data, improve accuracy, timeliness, priva-
cy, and security. There is a focus on creating a single source of truth, which
is a data warehouse that contains all the company data. 42% of organiza-
tions at this level have this data warehouse already, while most others are
in the process of building one. These organizations are able to report data
at least at a basic level and have a dedicated people analytics leader to build
a centralized team. They make consistent use of embedded analytics tools
in core HR systems. Only 24% of these organizations report having an HR
function that successfully aligns HR actions and initiatives with business
goals. People analytics is mostly limited to the HR community but there is
some cooperation to align key metrics’ definitions. Approximately 69% of
organizations are at level 2.
*Note that percentages for this metric are not intended to be additive. “High maturity” refers to
organisations at Levels 3 and 4, while “low maturity” refers to those at Levels 1 and 2.
1) A business context
2) A marketing context
3) An HR context
5) An IT context
MISSING SKILLSETS
People analytics consists of a combination of different skillsets, some of which are rare to find in HR.
It’s not necessary to have a large analytics team. Different team members
can fulfill multiple roles within the team.
A business context
However, when done correctly, analytics can have a dramatic impact on the
business.
A marketing context
It is not enough to just tackle the key strategic issues. In order to promote
meaningful change, HR needs to be able to translate numbers into tangible
Translating data into actual insights is no simple feat. The way data is pre-
sented to people can have a bigger effect on what people do with it than
the data itself. As such, considering the different ways to present the data
before carefully choosing the format will increase the data’s impact. You
can, for example, present information in a dashboard that managers can log
in to, or you can send them an occasional email with a PDF report. Often-
times, and especially when the data does not play an important part in the
day-to-day business, managers forget to log into the dashboard and don’t
look at the data at all. In that case, a monthly or quarterly PDF report sent
to their email is more likely to lead to an action than a self-service dash-
board. In other words, the impact your data will have depends on the way
you present and deliver it to your target audience.
It is also important to think about the data you should not present. It is
tempting to show everyone all your data. However, that would result in
information overload for the average manager: they would see the data
but not do anything with it. Effective action is encouraged
by presenting only the data that is crucial to eliciting the appropriate
action. Nothing more, nothing less.
• Knowing how and which data add value to the business – and which
doesn't
An HR context
For example, when you want to predict flight risk, there are a number of
factors you should consider. Age, tenure, sex, education, and seniority are
all relevant factors. However, there are many more factors. In the field of
occupational psychology, these turnover drivers have been studied in-
tensely since the early 1930s. This knowledge contributes to the identifica-
Previously, I worked with a firm that found that employee turnover was especially high in its international operating divisions, but it could not explain why. It turned out that people in these divisions frequently traveled between different countries and spent many nights in hotel rooms, away from their homes. By including the number of hotel bookings per employee in their analysis (frequent international travel is a stress factor), the firm found that this factor greatly influenced actual turnover, especially for recently married women in their thirties. Most importantly, this was a valuable factor to take into consideration, as the firm could influence it relatively easily.
A data analytics context
First of all, a data analyst needs statistical knowledge. Simple relational analytics like correlation and regression analyses, as well as more complex models like predictive analytics and data mining techniques, require a solid understanding of statistics.
When an analyst selects data, he/she has to know what a relevant sample
size is, how different variables interact with each other, and how these can
be included in an analysis. Statistical knowledge is also helpful in selecting
the right tools and techniques to do data analytics. For example, when ana-
lyzing turnover, you can use a regression model to estimate the most im-
portant drivers of employee turnover. But you can also use a survival model
to estimate the chances of employees leaving the company based on cer-
tain factors. Both analyses offer interesting results and answer a similar
question. Choosing the analysis that best fits the business problem is part
of the statistical skillset. We will talk more about different data analysis
techniques in chapter nine.
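To make the regression-versus-survival distinction a little more concrete, here is a minimal Kaplan-Meier sketch in plain Python. Kaplan-Meier is one common survival technique, not necessarily the model the author has in mind, and the tenure records are hypothetical. Each record holds the number of months an employee was observed and whether they left; employees who are still employed are "censored", which is exactly the information a survival model uses and a simple count of leavers throws away.

```python
# Hypothetical records: (months of tenure observed, left_company flag).
# left_company=False means the employee is still employed (a censored record).
records = [
    (3, True), (5, False), (8, True), (8, True), (12, False),
    (14, True), (20, False), (24, True), (30, False), (36, False),
]


def kaplan_meier(data):
    """Return [(month, survival_probability)] for every month in which someone left."""
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted({month for month, _ in data}):
        exits = sum(1 for month, left in data if month == t and left)
        if exits:
            # Probability of staying past t, given still employed just before t
            survival *= 1 - exits / at_risk
            curve.append((t, survival))
        # Everyone observed at t (leavers and censored) drops out of the risk set
        at_risk -= sum(1 for month, _ in data if month == t)
    return curve


for month, prob in kaplan_meier(records):
    print(f"P(still employed after {month} months) = {prob:.2f}")
```

A statistics package (for example R's `survival` package) would add confidence intervals and covariates; the sketch only shows the core calculation.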
Since HR analytics applies best to larger organizations, the data sets involved also tend to be larger. Tools like Excel and SPSS can only handle so much data before
they start clogging up your computer’s memory and start struggling with
R is a tool for statistical computation and graphics. It enables the data analyst to quickly import, manipulate, and analyze data through text commands. This makes it less intuitive than Excel and SPSS, but it is much more powerful and nimble when dealing with massive data sets.
In terms of competencies, the skillset required for the data analytics con-
text includes:
• Being able to work with software like Excel and SPSS/Stata or other
relevant software
An IT context
A data analyst’s skills are more closely linked to the IT context than to any of the other contexts. Different types of analysis require different data, so it helps to understand IT structures when aggregating data from different sources. For example, when a company wants to relate engagement data to performance outcomes, it needs to extract demographic personnel data from the main HR system. Performance data originates from a performance management system, while engagement data is most often collected by a third party. Aggregating these different data sources is a challenge that requires a specific set of capabilities. It is not uncommon for analytics teams to request access to real-time data for
In people analytics, there are two processes that are often confused with each other: reporting (or dashboarding) and analytics. A common problem is that People Analytics (analytics being the active word) is viewed as nothing more than a reporting activity. So let’s clarify these two definitions. First, we have reporting. This involves gathering data and displaying it on dashboards and reports. While reports are valuable and can help steer the business, they focus only on the here-and-now rather than on what is likely to happen in the future; that is, they are not predictive. Furthermore, they do not recommend courses of action to correct problems; that is, they are not prescriptive. Second, we have analytics or statistical modelling. This involves proactive activities such as sorting your employees from low to high performers and then identifying the factors that distinguish low from high. This information can then be used to recruit and develop more high performers.
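The "sort employees from low to high performers" step can be sketched as follows. This is a deliberately simple, hypothetical illustration: the field names and numbers are invented, and comparing averages between the bottom and top half only suggests candidate factors. It does not show that training or tenure causes higher performance.

```python
from statistics import mean

# Hypothetical employee records; the field names are illustrative assumptions.
employees = [
    {"id": 1, "performance": 2.1, "training_hours": 4, "tenure_years": 1},
    {"id": 2, "performance": 4.5, "training_hours": 20, "tenure_years": 6},
    {"id": 3, "performance": 3.0, "training_hours": 8, "tenure_years": 3},
    {"id": 4, "performance": 4.8, "training_hours": 24, "tenure_years": 5},
    {"id": 5, "performance": 2.6, "training_hours": 6, "tenure_years": 2},
    {"id": 6, "performance": 4.0, "training_hours": 16, "tenure_years": 4},
]


def compare_low_high(rows, outcome, factors):
    """Sort by outcome, split into a bottom and top half, and average each factor per half."""
    ranked = sorted(rows, key=lambda r: r[outcome])
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[half:]
    return {f: (mean(r[f] for r in low), mean(r[f] for r in high)) for f in factors}


result = compare_low_high(employees, "performance", ["training_hours", "tenure_years"])
for factor, (low_avg, high_avg) in result.items():
    print(f"{factor}: low performers {low_avg}, high performers {high_avg}")
```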
Proper data management looks like the above. Different data sources are stored in a data warehouse. This is what we referred to when we wrote about a ‘single source of truth’: a data warehouse is the one place where you will find all your data.
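The aggregation challenge described in the IT context, bringing HR, performance, and engagement data together in one place, usually comes down to joining records on a shared key. Here is a minimal sketch, assuming each system can export records keyed on the same employee ID. A real data warehouse would do this in SQL or an ETL tool; the data and names here are hypothetical.

```python
# Hypothetical extracts from three systems, each keyed on employee ID
hris = {
    101: {"department": "Sales", "tenure_years": 4},
    102: {"department": "Finance", "tenure_years": 2},
    103: {"department": "Sales", "tenure_years": 7},
}
performance = {101: {"rating": 4}, 103: {"rating": 3}}
engagement = {101: {"engagement": 3.8}, 102: {"engagement": 4.2}, 103: {"engagement": 2.9}}


def merge_sources(base, *others):
    """Inner join: keep only employee IDs that appear in every source."""
    common_ids = set(base).intersection(*others)
    merged = {}
    for emp_id in sorted(common_ids):
        row = {"employee_id": emp_id, **base[emp_id]}
        for source in others:
            row.update(source[emp_id])
        merged[emp_id] = row
    return merged


merged = merge_sources(hris, performance, engagement)
missing = set(hris) - set(merged)
print(merged)
print("Dropped (not in every source):", missing)
```

An inner join silently drops employees who are missing from any one source (employee 102 has no performance rating here), so it is worth reporting who was dropped rather than analyzing only what remains.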
From this single source, reporting dashboards can be created, and data can be analyzed. While analysis is still a mostly manual process, reporting can be fully automated. This is a top priority for many people analytics functions, as automated reporting frees up significant resources that can then be used for advanced data analysis. This full process falls under the IT context and will be very hard to achieve with HR professionals alone.
For those who want to learn more about the vital role of the HR analytics
leader, check van den Heuvel and Bondarouk’s 2016 paper titled “The rise
(and fall) of HR analytics: a study into the future applications, value, struc-
ture, and system support”. This paper describes a set of Dutch organiza-
When one or more of these skillsets are missing, teams tend to run into trouble. By identifying these problems, the team can often pinpoint the areas in which it lacks capabilities.
The marketing focus helps to advocate and ‘sell’ analytics within the organization. How managers act on data is influenced by the way it is presented to them. Additionally, people in different departments and at different levels of the organization want to see different things in the data. Having a customer-driven (marketing) approach will greatly aid in promoting analytics. Without this focus, analytics will still provide beautiful insights, but their impact on the business will be diminished due to low adoption.
When a team lacks an HR focus, they run the risk of relying too heavily on
the available data. HR analytics is essentially an applied science. As is common in applied science, research (analytics) starts with what we already
know. Based on this information, hypotheses are created and tested. With-
Perhaps most importantly, the team needs data analytic capabilities. These
skills are vital for selecting and cleaning the relevant data, and for choosing the most appropriate analyses. Without this skillset, the team will fail to move beyond operational reporting, fail to analyze data effectively, and ultimately fail to apply more advanced strategic and predictive analytics.
The team needs IT skills to effectively aggregate data and automate report-
ing functions. Without the knowledge of IT infrastructures or the ability to
extract data, the analyst will struggle to obtain data from different systems.
This will hinder, or even halt, the analytics team’s progress. Last but not
least, the HR analytics leader ensures internal support and effective
project management to reach the goals of the people analytics function
within a set timeframe.
Now, you may ask: how does people analytics work? The people analytics
process can be divided into five sequential steps. Every organization has to
follow these steps in order to successfully complete a people analytics
project.
Before you start analyzing your data, you will need to know what questions
you want to answer, or what hypothesis you want to validate. Don’t just
start with any question: choose a question that addresses the CEO’s top priority.
The people analytics cycle involves five steps, which are often repeated multiple times to
successfully use analytics to solve a business problem.
In chapter nine we will discuss the basics of data analysis. We will explain
the different methods of data analysis and illustrate them with examples.
Finally, chapter ten covers the interpretation of your results and how to act on them.
The first step in the people analytics process is about asking the right ques-
tions. All research starts with one or more questions or hypotheses. They
provide guidance as they structure the entire research project. Your hy-
pothesis influences what data you need to select, how you analyze your
data, and what actions you take to execute on the insights that the data
yields. Thus, this chapter will examine how to ask the right question.
This was an amazing discovery and a win for the people analytics team.
The team was also able to identify factors that contributed to employee turnover and could advise managers and HR business partners based
on their data. In the end, they created a dashboard that was accessible to
key managers inside the organization.
People analytics provides both HR and the CEO with tools to produce
amazing insights. Once a good analytics team is in place, its success within
the organization depends on whether or not it is able to solve important
business problems.
Of course, these topics differ per country and organization. Public organi-
zations struggle more with the costs of absenteeism, while private organi-
zations struggle more with high levels of turnover. As we mentioned earlier,
most organizations in the Netherlands do not struggle with turnover. How-
ever, employees with long-term work-related disabilities, like burnouts, are
top of mind. A recent Dutch regulation dictates that some companies have
to pay salaries of disabled employees for up to twelve years after they have
fallen ill. Paying a single employee’s salary for this period amounts to roughly 500 000 euro. If analytics can prevent even one employee per year from burning out, it will already benefit the organization. This is a topic that is top of mind for the CEO.
In the US, for example, turnover is a much more important issue. In fact,
turnover analytics is a starting point for people analytics in many compa-
nies in both the U.S. and Europe.
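Because turnover is such a common starting point, it helps to pin down how the basic metric is computed. One common convention (an assumption here, since organizations define it differently) divides the number of leavers in a period by the average headcount over that period:

```python
def turnover_rate(leavers, headcount_start, headcount_end):
    """Turnover percentage for a period: leavers divided by average headcount."""
    average_headcount = (headcount_start + headcount_end) / 2
    return 100 * leavers / average_headcount


# Hypothetical figures for one year
print(f"{turnover_rate(leavers=30, headcount_start=190, headcount_end=210):.1f}%")
```

With 30 leavers and an average headcount of 200, this gives a turnover rate of 15%.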
It turned out that Jane was losing money just as fast as she was losing employees. Her organization’s turnover was greatly reducing her profit margins, and she wasn’t even fully aware of it. There are a number of costs associated with high turnover.
1) Knowledge and contacts are lost: Besides losing specific (tacit) knowledge, the company loses connections as well. This can be especially painful for an accounting firm like Jane’s. When clients stay with the firm for multiple years, chances are that they will have several different accountants over that period. Each new accountant has to become familiar with the client company again, which takes up valuable time. Contacts are even more vital for salespeople, as they can take their clients with them. In addition, turnover has a large impact on long-term tenders and projects. When key personnel leave, they take years of (sometimes irreplaceable) knowledge with them.
This also emphasizes that, in order to apply people analytics, you should
look at the best ways of adding value to the company. This means that the
issues you’ll work on need to connect with a top business priority and that
HR (analytics) should add value to that specific priority.
When we take a step back and examine the role of people management
within a company, we often see HR struggling to add value to the business.
On the one hand, HR struggles to create value, and on the other hand, HR
struggles to show how it adds value. In order to become both more benefi-
cial and more strategic to the business, HR should be more concerned
about adding value.
These are great goals. They are, however, not enough to create value for
the business. The question remains what the impact of supporting the line manager, or of being taken more seriously by management, actually is, and why that adds value to the business. In order to create impact, HR should
examine its added value. When HR becomes aware of its added value, oth-
er initiatives, like people analytics, also greatly increase in value.
Ulrich and Dulebohn (2015) write that HR practitioners should focus more
on the results of their work, instead of focusing on the work itself.35 In or-
der to achieve this, HR practitioners need to explain why they do what they
do. This is best done in a “so that” statement.
Ulrich, 2015, p. 6.
The tricky thing is that HR professionals find it very difficult to define their
added value. In order to identify this, HR professionals should answer why
they do what they do, twice. This added value is often tremendous, but also
invisible. By asking the “so what” question two times, HR will have a much
easier task in specifying how it adds value to business processes.
Only by answering the “so what” question can HR specify how it adds value
to key business challenges, such as doing business in emerging markets and
stimulating product innovation.
Once you know what questions you want to have answered, you can de-
termine the data you need to conduct your analysis. HR analytics and peo-
ple analytics are deeply rooted in quantitative science. This means that
there are a few key principles that you need to remember when conducting
an analysis. These principles prevent you from drawing incorrect conclu-
sions.
There are three key principles you need to keep in mind when you select
your data. The first one has to do with the level of analysis, the second with
the importance of context, and the third with the complexity of the out-
comes.
Level of analysis
Every variable can be grouped into one of these levels. For example, indi-
vidual performance ratings say something about the individual. Team per-
formance says something about a group. Revenue says something about
the entire organization. These three variables are attributed to different
levels.
With every analysis you do, it is very important to keep in mind the relevant
level of analysis. For instance, the individual performance of all team mem-
bers does not equal the performance of the team. There are other factors
at play that influence team performance. When the personalities in the
team are not compatible, or people have overlapping skillsets, a team will
be less likely to perform well – even though each team member is a star
In line with this, when all the divisions in an organization perform well it
does not mean that the overall organization performs equally well. If the
divisions do not cooperate and lack synergy, the organization as a whole is
less likely to benefit from the excellent performance of its individual divi-
sions. When you look at divisions separately, you miss the synergies that
can take place, which can potentially make the whole greater than the sum
of its parts.
In other words: you can't fully deduce the effects at one level from variables that describe another level. For instance, you are less
likely to find an effect when you want to relate individual engagement lev-
els to organizational performance than when you want to relate individual
engagement levels to individual performance. The level of analysis is there-
fore important to keep in mind for every analysis you’ll do.
To find the strongest effects, it is best to stay at the same level of analysis. Of course, you can analyze relations that cross a single level, e.g. relate individual engagement levels to team performance, but you should be aware that information gets lost (for example, the synergies that happen when people work together). Analyzing relations from the individual level to the organizational level is much harder, because you will simply miss too much information. Relating individual engagement levels to organizational bottom-line performance is therefore difficult: the missing information reduces the effect of the predictor variable and leads to insignificant and potentially useless findings.
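The sketch below illustrates moving individual data up to the team level so that it can be related to a team outcome. The names, scores, and team outcomes are hypothetical. Averaging per team puts both variables on the same level, but it also discards information: team A's average of 4.0 hides scores ranging from 3 to 5.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual records: (employee, team, engagement score on a 1-5 scale)
individuals = [
    ("ann", "A", 4.0), ("bob", "A", 3.0), ("cas", "A", 5.0),
    ("dee", "B", 2.0), ("eli", "B", 3.0),
    ("fay", "C", 4.5), ("gus", "C", 3.5),
]
team_performance = {"A": 82, "B": 64, "C": 77}  # hypothetical team-level outcome


def aggregate_to_team(rows):
    """Average individual scores per team so they sit on the same level as team outcomes."""
    by_team = defaultdict(list)
    for _, team, score in rows:
        by_team[team].append(score)
    return {team: mean(scores) for team, scores in by_team.items()}


team_engagement = aggregate_to_team(individuals)
for team in sorted(team_engagement):
    print(team, team_engagement[team], team_performance[team])
```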
When you use people analytics, context is very important. When you want
to explain a team’s behavior, you need to pay attention to all the factors
that play a role in predicting this behavior. However, context goes further
than just the level of analysis you use.
In contrast, Groysberg found that when a star was hired by another company, his/her performance plunged. Groysberg’s data showed that 47% of
analysts did poorly in the year after they left their firm. Performance
dropped by about 20% and did not recover, not even after five years!
Now, why did the performance of these star analysts drop as soon as they
switched jobs? What happened is that these stars’ performance is only par-
tially explained by their personal skills and capabilities. James Cunningham
was still a very smart and capable analyst after joining First Boston. How-
ever, he was not the best anymore.
“Resentful of the rainmaker (and his pay), other managers avoid the
newcomer, cut off information to him, and refuse to cooperate. That
hurts the star’s ego as well as his ability to perform. Meanwhile, he has
to unlearn old practices as he learns new ones. But stars are unusually
slow to adopt fresh approaches to work, primarily because of their past
successes, and they are unwilling to fit easily into organizations. They
become more amenable to change only when they realize that their per-
formance is slipping. By that time, they have developed reputations that
are hard to change.”
When you see something happen within your organization you should al-
ways ask yourself about the context in which it happened. This holds espe-
cially true for performance ratings. In general, we tend to underestimate
the influence of external factors and overemphasize the role of internal
factors. This means that we attribute both good and bad performance
mostly (or exclusively) to the person’s judgment and skills, while we forget
the importance of the environment and the role of colleagues and bosses.
This is what psychologists call the fundamental attribution error.
Complexity in outcomes
Selecting the right data sources is key to conducting your analysis. Say you
want to predict performance, how would you define it? Is it the number of
sales? Is it customer satisfaction? Is it manager-rated performance?
These are real questions. Sales employees can receive a favorable rating
from their manager but if their sales numbers don’t add up, they are not
useful to the organization. Or are they? With the previous examples in
mind, how do these sales people contribute to the team and support others
in their sales efforts? These are questions that you have to start asking
yourself, before you start your analysis.
If you want to know which team is best at playing ice hockey, you should
not look at who wins most of the time, neither should you look at who
scores the most goals. You should look at who has the most ‘shot-at-
goal’ (SAG) events. Here is why.
On average, a National Hockey League team scores 450 goals, has 5 000 shots on goal (SOG) and 9 000 SAG. Whereas SAG includes all shots directed toward the goal, SOG only counts the shots that were stopped by the goaltender or resulted in a goal. This means that for every game won, an average of 2.3 goals, 7.8 SOG, and 10.6 SAG occur.
This example may change how you look at measuring sales, especially complex ‘solution sales’. The sales cycle in business-to-business solution sales can take up to 1.5 years. As in hockey,
there are other metrics that predict sales success better. Examples could
be the number of contacts a sales person has or the number of phone calls
he/she makes.
Thus, complexity in outcomes means that the more complex (and rarer) it is
for your work to have a successful outcome, the more closely you should pay attention to how you can reliably measure success.
After you’ve thought about which analyses you want to run and identified
the specific data you need for these analyses, you’ll get to the next step:
data cleaning. This is a very important step. A common saying in data analy-
sis is: “garbage in, garbage out”. You can put a lot of thought and effort into
your data analysis and come up with lots of results – but your results will
mean nothing if the input data is not accurate. In fact, the results may even
be harmful to your workforce because they misrepresent reality. This is
why data quality, or integrity, is so important.
HR data is oftentimes dirty. Dirty data are data records that contain errors.
This can be caused by different things. Data can be missing, the same func-
tions may have multiple and/or different labels, there may be multiple
records for the same people in multiple systems which do not perfectly
match, and so on.
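As an illustration of cleaning label variants and duplicate records, here is a minimal sketch. The records and the `LABEL_MAP` of known variants are hypothetical; in practice the mapping would be built by reviewing the distinct labels found in your own systems.

```python
# Hypothetical raw records pulled from two systems
raw = [
    {"employee_id": "E01", "function": "HR Business Partner"},
    {"employee_id": "E01", "function": "hr business partner "},
    {"employee_id": "E02", "function": "HRBP"},
    {"employee_id": "E03", "function": "Payroll Specialist"},
]

# Illustrative mapping from known label variants (lower-cased) to one canonical label
LABEL_MAP = {
    "hr business partner": "HR Business Partner",
    "hrbp": "HR Business Partner",
    "payroll specialist": "Payroll Specialist",
}


def clean(records):
    """Map label variants to a canonical label, then drop records that become duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        raw_label = rec["function"].strip()
        # Unknown variants are kept as-is so they can be reviewed by hand
        label = LABEL_MAP.get(raw_label.lower(), raw_label)
        key = (rec["employee_id"], label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"employee_id": rec["employee_id"], "function": label})
    return cleaned


for rec in clean(raw):
    print(rec)
```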
Of course you can start cleaning all your data at once. However, this can
take tremendous amounts of time so it is much smarter to carefully select
and clean only the data you need to perform a specific analysis. This ap-
proach will prevent a lot of unnecessary work and produce results faster.
Data management
When you are cleaning data, you will inevitably change it; e.g. you manually
add a missing record or change a misspelled name. Depending on the quality of your HR data, this data-cleaning phase can take a lot of time, but it will also improve your data quality. Higher data quality will lead to more accurate analyses. This also means that you end up with a dataset that
is more valuable than the data originally extracted from the system. Since
this data is of a higher quality, it’s preferable to store it in a manner con-
ducive to later use.
performance – thus helping you to specify the attributes you need to focus
on in the selection procedure.
If you want to improve the data coming from your system, start by review-
ing your system configuration. To determine if system configuration
changes are needed, a company can gather and track the repeated errors
within the system. They also can look at their past audits to see how the
system’s configuration contributed to the error.
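Gathering and tracking repeated errors can start as simply as counting them. Below is a minimal sketch with a hypothetical error log, where each entry records the affected field and the type of error. Errors that recur point to configuration problems rather than one-off data entry mistakes; the threshold of two occurrences is an arbitrary choice for illustration.

```python
from collections import Counter

# Hypothetical error log: (field, error type) pairs found during audits
error_log = [
    ("birth_date", "placeholder value"), ("job_title", "mismatch with contract"),
    ("birth_date", "placeholder value"), ("cost_center", "missing"),
    ("birth_date", "placeholder value"), ("cost_center", "missing"),
]


def recurring_errors(log, threshold=2):
    """Return (field, error) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(log)
    return [(item, n) for item, n in counts.most_common() if n >= threshold]


for (field, error), n in recurring_errors(error_log):
    print(f"{field}: {error} ({n}x)")
```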
Companies often cite user error in data entry for poor integrity, but data
entry error should never be considered the root cause of a repeated issue.
These situations either need improved configuration or additional training.
High-quality data is a byproduct of proper system configuration.
There are many simple configuration changes that can quickly improve the
integrity of data entered into the system. Below we will list four of the most
common configuration changes:
Mandatory fields
Example: Birthdays are a mandatory field in the system, but your German
works council does not allow this data to be stored. Local German HR will
need to enter a fake date to bypass the mandatory field (01/01/1900). If an
employee were to transfer from Germany to the UK and the birthdate field
is not corrected, then inaccurate data creeps into other parts of the system.
This spread of inaccurate data can turn a once-localized exception into a problem that calls the entire field into question. Allowing the field to be left blank is a better
solution for high data integrity.
Rule #1: Eliminate mandatory requirements for fields not needed in every
country or not always available at the time of entry.
The chart below illustrates the difference between blank data and using
dummy values to complete a field. At first glance, a fully completed field
looks good in a system, but this often hides inaccuracies and makes errors difficult to decipher, as shown by the highlighted entries.
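The difference between blank fields and dummy values can be made visible with a simple audit. This sketch assumes, as in the German example above, that 01/01/1900 is the dummy date used to bypass the mandatory field; the records are hypothetical. A field can look largely complete while most of its values are unusable.

```python
from datetime import date

# Hypothetical birth dates; None means the field was left blank
birth_dates = {
    "E01": date(1985, 4, 12),
    "E02": date(1900, 1, 1),
    "E03": None,
    "E04": date(1900, 1, 1),
}

# Assumed dummy value used to get past a mandatory field
PLACEHOLDERS = {date(1900, 1, 1)}


def audit(values, placeholders=PLACEHOLDERS):
    """Split records into valid, blank, and placeholder (looks complete but is not)."""
    report = {"valid": [], "blank": [], "placeholder": []}
    for emp_id, value in values.items():
        if value is None:
            report["blank"].append(emp_id)
        elif value in placeholders:
            report["placeholder"].append(emp_id)
        else:
            report["valid"].append(emp_id)
    return report


report = audit(birth_dates)
filled = len(birth_dates) - len(report["blank"])
print(f"Field looks {filled / len(birth_dates):.0%} complete")
print(f"Actually valid: {len(report['valid']) / len(birth_dates):.0%}")
```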
Duplicated information
While there is often a business need for these specific individual fields, look
for alternatives to identify the required information.
[Table columns: First Name | Last Name | Employee Status | FTE | Hours | Work Schedule | Benefits Eligibility]
Unnecessary Fields
Example: A system has a default field for “Number of Children” which seems
potentially useful in the future. The field is left in the system as self-report-
ed. Some employees fill it out. Some include all children, and others include
children they list on their insurance. It’s not maintained or reported by any-
one.
The CEO then sees the fields and asks for an analysis of children’s impact
on turnover. The poor data quality of this field readily becomes apparent
and eliminates any opportunity for meaningful analysis. It’s impossible to
determine which employees omitted the question versus those who do not
have children. It’s difficult to know if the information is up-to-date or accu-
rate as the number of children is a changing figure. Since the data has never
had a purpose, there is no way of knowing the data quality. Also, a work
council, or data protection officer, may have approved this field, but may
not be comfortable with it in the context of turnover analytics.
Where does local HR look when they need data to answer a question? Is it
in your HRIS system? A separate spreadsheet? Payroll? If the answer is any-
thing other than your global HRIS system, you need to investigate the rea-
son. A system that is updated as an administrative task for local HR and
provides them with no value will result in low data quality.
Example: Local HR adds a new hire with the incorrect title. This error
may not be noticed until a promotion/review period or another time it
is brought to the manager’s attention. If the system allows managers
and employees to view their job title at any time, the error will be seen quickly and can be corrected immediately.
When it comes to high quality data, there are two criteria that are of par-
ticular importance. These are validity and reliability. When data is not valid
or reliable, it may tell you something other than what you were looking for.
The following section describes this.
Validity
The city of Boston created an app that their drivers could install on their
smartphone. The app would measure bumps in the road and report their
location via GPS. These bumps were then recorded and the city road ser-
vice would fix them. According to a spokesperson: “[the] data provides the
city with real-time information it uses to fix problems and plan long-term
investments”.39
However, not everyone benefitted equally from this system. The app was
mainly used by young people and in more affluent communities, while the
poorer communities did not have equal access to smartphones and mobile
data. This is a significant bias in the data.
2) Are there any significant biases in the way we measured our data?
Reliability
Reliability is about measuring the same thing over and over again and
achieving the same result. When you measure someone’s engagement in
the morning you want to have a similar result as when you measure it again
in the afternoon. This is because engagement is a trait that is relatively sta-
ble over time. The same holds true for different raters. If you ask both Bill
and Jim to rate Wendy’s engagement, you want both Bill and Jim to give
Wendy the same rating. However, when the scales that are used to rate
Wendy's engagement are vague and open to different interpretations, Bill
and Jim will likely give Wendy different ratings. This is called a rater bias,
which is best avoided.
This might sound obvious but it is not. Oftentimes reported data is influ-
enced by other factors, like the instructions that are given and the mood of
the person who gives the rating. This is the big question when we talk about
reliability: Are the same scores achieved when the same data is measured
in the same way by different people and at different times of the day/week?
1) Did we consistently produce the same results when the same thing
was measured multiple times?
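The Bill-and-Jim question can be quantified. The book does not prescribe a specific measure; Cohen's kappa is one standard statistic for agreement between two raters, which corrects their raw agreement for the agreement you would expect by chance alone. The ratings below are hypothetical. Kappa treats each rating as a separate category; for ordered scales, a weighted kappa or an intraclass correlation is often preferred.

```python
from collections import Counter

# Hypothetical engagement ratings (1-5) given by two raters to the same ten employees
bill = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3]
jim = [4, 3, 4, 2, 4, 2, 4, 5, 3, 3]


def cohens_kappa(a, b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same category independently
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


print(f"Cohen's kappa: {cohens_kappa(bill, jim):.2f}")
```

Here the raters agree on 7 of 10 employees, but after correcting for chance the kappa is about 0.59.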
Alyssa wrote that in order to improve the data coming from your system,
you need to gather and track repeated errors. This section will explore this
in more detail by providing a data cleaning checklist.
This is a very practical checklist with six steps for data cleaning:
3) Check data labels across multiple fields and merged datasets and
see if all the data matches.
6) Define valid data output and remove all invalid data values.
a) This is useful for all data. Character data is easily defined (e.g.
gender is defined by M or F). These are the valid data values.
Any other values are presumed to be invalid. This data can be
easily flagged for inspection by using a formula.
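Step 6 of the checklist, defining valid values and flagging everything else for inspection, can be sketched as follows. The valid-value definitions and records are hypothetical; the gender codes follow the M/F example in the checklist.

```python
# Hypothetical definitions of valid values per field
VALID_VALUES = {
    "gender": {"M", "F"},
    "contract": {"permanent", "temporary"},
}

# Hypothetical records to check
records = [
    {"id": 1, "gender": "F", "contract": "permanent"},
    {"id": 2, "gender": "f", "contract": "permanent"},
    {"id": 3, "gender": "M", "contract": "temp"},
]


def flag_invalid(rows, rules):
    """Return (id, field, value) for every value that falls outside its valid set."""
    flagged = []
    for row in rows:
        for field, allowed in rules.items():
            value = row.get(field)
            if value not in allowed:
                flagged.append((row["id"], field, value))
    return flagged


for item in flag_invalid(records, VALID_VALUES):
    print("Inspect:", item)
```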
Based on the outcomes of this checklist, you can identify if data problems
are incidental or structural. As stated earlier, structural data problems need
to be fixed by improving systems and data practices.
[Table: data cleaning checklist; surviving fragments mention checking IDs, checking the data labels of categorical data, mislabeled categorical values, numerical variables, and non-matching data.]
In this chapter, we will dive into the actual analytics. First, we’ll discuss the
three main categories of data analysis followed by several examples of dif-
ferent data analytic techniques. Data analytics is all about finding relation-
ships between variables. For example, a lot of people talk about how impor-
tant employee engagement is for performance. Data analytics can be used
to see how engagement (variable 1) impacts performance (variable 2).
There are multiple ways of analyzing how one variable relates to another
and a few of these ways will be exemplified later.
As you read in chapter 4, the three main categories of data analysis are de-
scriptive, predictive, and prescriptive analytics. These categories of analyt-
ics form the basis of people analytics and business intelligence in general.
As we mentioned at the beginning of the book, business intelligence refers
to the techniques and tools used to derive useful insight and information
from raw data. People analytics is a specific example of business intelli-
gence.
Descriptive analytics
Descriptive analytics is the simplest class of analytics: it summarizes what is in the data. For example, descriptive statistics can show you how many employees left the company last month and how much this number increased compared to the month before. These analytics are familiar to most people, as they can be done using standard reporting tools.
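A minimal sketch of these descriptive statistics, using Python's standard `statistics` module on hypothetical monthly leaver counts:

```python
from statistics import mean, median

# Hypothetical number of leavers per month
leavers_per_month = {"Jan": 12, "Feb": 9, "Mar": 11, "Apr": 15}

values = list(leavers_per_month.values())
summary = {
    "n": len(values),
    "mean": mean(values),
    "median": median(values),
    "min": min(values),
    "max": max(values),
}

# Month-over-month change for the latest month, in percent
*_, previous, latest = values
change_pct = 100 * (latest - previous) / previous

print(summary)
print(f"Leavers in the latest month changed by {change_pct:+.1f}% versus the month before")
```

Everything here describes what has already happened; none of it says anything about next month.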
Traditional dashboards, scorecards, and business reports are typical examples of this type of analytics, which enables the user to summarize what has happened and see how different data are correlated. Descriptive analytics is often referred to as
‘slice and dice’, as it enables the user to play with the data by calculating the
population size, mean, median, minimum and maximum, frequency, etc. of
their dataset. Some business tools that provide descriptive analytics are
Predictive analytics
As a more advanced class of analytics, predictive analytics can, for instance,
show you how many people are expected to leave in the next month and
how many more are expected to leave the months after.
Predictive analytics answers the questions “what will happen?” and “why
will it happen?”. These analytics provide a much more tangible grasp of the
data by enabling the user to predict, or forecast, what is likely to happen. As
you can imagine, these tools can be very powerful and, when applied cor-
rectly, have the potential to directly impact decision-making. For example,
when you want to predict which employees are likely to leave your compa-
ny, or how investments in learning and development will impact next year’s
performance, you are applying predictive analytics. This sort of analytics
can be regression analysis or more advanced machine learning techniques,
like decision trees, neural networks, and Naïve Bayes. Performing these analytics requires advanced to expert knowledge of statistics and data analysis, as well as tools like SPSS, R, and Weka.
Prescriptive analytics
The most advanced class of analytics is prescriptive analytics. Prescriptive analytics gives advice and helps you take appropriate action. Where predictive analytics tells us: “There is an 80% chance that one of your data scien-
In the next section, we will give you some examples of descriptive and predictive analyses to give you a sense of how they work. Although we tried to keep it as simple as possible, the section is quite statistical. Don’t worry if you don’t fully understand everything: it is included to give you a feel for how some of the most commonly used analyses work.
One of the first things you will notice is that males seem to receive higher performance ratings than females. To test whether this holds true, we can run a correlation analysis to find out whether your eyes are playing tricks on you, or whether both variables really are statistically associated with each other.
The correlation analysis shows that gender and performance are indeed significantly correlated with each other. In this example, the correlation (expressed as the point-biserial correlation coefficient, $r_{pb}$, which is equivalent to Pearson’s correlation when one of the two variables has only two values) is 0.64, which is considered a moderate correlation. In other words: there is a correlation between someone’s gender and their performance rating in this example.
A correlation of 0.64 indicates that around 41% of the variance in one variable (gender) is shared with the other variable (performance rating). The 41% is known as the coefficient of determination, $r^2$ ($r^2 = 0.64^2 \approx 0.41$). This value tells us how much of the variability in performance is shared with the variability in gender.
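To make the correlation step concrete, here is a small Python sketch that computes the point-biserial correlation for a 0/1-coded gender variable, checks that it matches Pearson’s formula, and derives the coefficient of determination. The ratings are hypothetical, because the book’s dataset is not reproduced here, so the result will not match the 0.64 in the text. The 1 = male, 0 = female coding is also an assumption for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def point_biserial_r(groups, y):
    """Point-biserial correlation between a 0/1 group variable and a numeric variable."""
    y1 = [v for g, v in zip(groups, y) if g == 1]
    y0 = [v for g, v in zip(groups, y) if g == 0]
    n, n1, n0 = len(y), len(y1), len(y0)
    # Difference in group means, scaled by the population standard deviation of y.
    return (mean(y1) - mean(y0)) / pstdev(y) * sqrt(n1 * n0 / n ** 2)

# Hypothetical data: gender coded 1 = male, 0 = female; performance on a 1-5 scale.
gender = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
performance = [4, 5, 4, 3, 3, 2, 3, 4, 2, 5]

r = point_biserial_r(gender, performance)
print(f"r_pb = {r:.2f}, Pearson r = {pearson_r(gender, performance):.2f}, r^2 = {r ** 2:.2f}")
```

Both functions return the same number, which is why the point-biserial coefficient can be interpreted exactly like Pearson’s r.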
If you look at the data again, you see another pattern: all males but one are senior, while the majority of females are junior. Maybe it’s not gender that determines who performs better or worse, but the employee’s seniority.
The take-home message: (1) correlation does not equal causation, and (2)
always look at your data a second time because you may have missed
something.
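One simple way to take that second look is to compare average ratings by gender within each seniority level. The Python sketch below uses a hypothetical dataset, constructed so that men score higher overall while men and women at the same seniority level have the same average rating. It shows the mechanics of stratifying; it does not reproduce the book’s data.

```python
from collections import defaultdict, namedtuple
from statistics import mean

Employee = namedtuple("Employee", ["gender", "seniority", "rating"])

# Hypothetical data (not the book's): all males but one are senior,
# while the majority of females are junior.
employees = [
    Employee("M", "senior", 4), Employee("M", "senior", 5),
    Employee("M", "senior", 4), Employee("M", "senior", 5),
    Employee("M", "junior", 2),
    Employee("F", "senior", 4), Employee("F", "senior", 5),
    Employee("F", "junior", 2), Employee("F", "junior", 3), Employee("F", "junior", 1),
]

def mean_rating_by(records, key):
    """Average rating per group, where key(record) defines the group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record.rating)
    return {group: mean(ratings) for group, ratings in groups.items()}

print("By gender:", mean_rating_by(employees, key=lambda e: e.gender))
print("By seniority and gender:",
      mean_rating_by(employees, key=lambda e: (e.seniority, e.gender)))
```

In this made-up example the overall gap between men and women disappears once seniority is held constant, which is exactly the kind of pattern the take-home message warns about.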
HR manager Jill has long suspected that many employees take sick days when the weather outside is nicer, but she couldn’t prove it until she learned about regression analysis. Over the last ten days, Jill wrote down how many people called in sick and the maximum temperature on each day. Here’s what her dataset looks like:
Day  Max. temp. (°C)  Max. temp. (°F)  Number sick
1    10               50               8
2    15               59               7
3    18               64.4             9
4    26               78.8             15
5    31               87.8             18
6    32               89.6             20
7    29               84.2             20
In this picture, you see a scatterplot with a line, which is the line of best fit. What does best fit mean, you ask? Pretend that all the points on the graph are houses. We need to build a straight road and ensure that the walking distance from each house to the road is as short as possible on average. This way, most of the inhabitants don’t have to walk a long way to the road and most people will be happy. It best fits their need to be close to the road. Similarly, if you were to draw a straight line from left to right in the
The line of best fit represents the shortest routes (shown in black
lines) from all individual data points.41 This is the regression line.
This line is also called the regression line. It’s important because it shows
how changes in one variable (e.g. temperature) can affect the other (e.g.
sick days). The formula for this line is:
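$Y = \text{constant} + a_1 x$
Here, $x$ is the value of the explanatory variable (in this case, temperature), $a_1$ is its coefficient (the slope of the line), and the constant is the value of $Y$ when $x$ is zero. It is possible to add multiple explanatory variables to the equation: $Y = \text{constant} + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$. The formula for this specific line is: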
When Jill sees that next week is going to be a really hot week, she knows that she can expect an increase in absence, and she can thus call in a few extra employees to cover for the absentees. This helps to guarantee the continuity of business activities.
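The line of best fit can be computed with ordinary least squares. This Python sketch fits Y = constant + a1 * x to the seven days of Jill’s table shown above; the remaining days of her ten-day record are not included here, so the coefficients will differ from the book’s line. The forecast temperatures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (constant, a1) for y = constant + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    constant = mean_y - a1 * mean_x
    return constant, a1

temperature_c = [10, 15, 18, 26, 31, 32, 29]  # days 1-7 of Jill's table
sick = [8, 7, 9, 15, 18, 20, 20]

constant, a1 = fit_line(temperature_c, sick)
print(f"sick = {constant:.2f} + {a1:.2f} * temperature (°C)")

# Hypothetical maximum temperatures for a hot week ahead.
forecast_c = [30, 33, 34]
expected = [constant + a1 * t for t in forecast_c]
print("Expected number sick:", [round(e, 1) for e in expected])
```

On these seven rows the fit comes out at roughly sick = -0.72 + 0.63 * temperature, i.e. about 0.63 extra absentees per degree Celsius.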
Side note: To build a much more accurate and reliable model, we need more data. The problem with the current approach is that the regression line’s accuracy is tested on the same dataset that was used to create the line. That’s very much like a student who marks his/her own paper: to get an objective estimate of this student’s skills, you’d prefer someone else to mark the paper. That’s why you want to test your regression line on fresh data. In addition, we would want to gather a lot more data to build a more accurate algorithm; in this case, more data is better.
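The advice to test on fresh data can be illustrated with leave-one-out cross-validation, a standard technique chosen here for illustration (the book does not prescribe it): the line is refitted seven times, each time leaving out one day and predicting that day. The sketch compares this out-of-sample error with the in-sample error on the same seven rows of Jill’s table.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = constant + a1 * x; returns (constant, a1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - a1 * mean_x, a1

def mean_absolute_error(xs, ys, constant, a1):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(y - (constant + a1 * x)) for x, y in zip(xs, ys)) / len(xs)

def leave_one_out_mae(xs, ys):
    """Refit without each day in turn and score the prediction for the held-out day."""
    errors = []
    for i in range(len(xs)):
        constant, a1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(ys[i] - (constant + a1 * xs[i])))
    return sum(errors) / len(errors)

temperature_c = [10, 15, 18, 26, 31, 32, 29]
sick = [8, 7, 9, 15, 18, 20, 20]

in_sample = mean_absolute_error(temperature_c, sick, *fit_line(temperature_c, sick))
held_out = leave_one_out_mae(temperature_c, sick)
print(f"In-sample MAE:     {in_sample:.2f} sick employees per day")
print(f"Leave-one-out MAE: {held_out:.2f} sick employees per day")
```

For ordinary least squares, the error on a held-out day is never smaller than its in-sample error, which is the student-marking-their-own-paper effect in numbers.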
A common and rather simple method of creating a predictive and even prescriptive model is the decision tree. A decision tree is a tree-like model consisting of decisions and their possible consequences. In a decision tree,
Let’s take a different dataset. Imagine your neighbor Paul bought the new BMW Z4 convertible. Since he bought it, he’s taken every chance he’s gotten to drive his new convertible.
Given that you’d really like to have a convertible as well, you want to see how often Paul drives the convertible with the roof off. For fourteen days, you wrote down whether Paul left home with the roof off. For the sake of this example, you also wrote down the weather forecast, temperature, and humidity on a piece of paper.
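[Figure: decision tree for Paul’s roof-off days. OUTLOOK is the first split: rainy → NO (5.0), sunny → YES (5.0/1.0); cloudy days are split further on HUMIDITY: high → NO (2.0), low → YES (2.0). Numbers in parentheses: days reaching the leaf / days misclassified.]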
In our example, the weather outlook is the single best predictor of whether or not Paul will leave with the roof off: outlook is the variable with the largest information gain. On all five days when the outlook was rainy, Paul didn’t take his roof off. Four out of five times when the outlook was sunny, Paul did take his roof off. However, when the outlook was cloudy, the outcome was fifty-fifty. The second-best predictor in this example is humidity. This means that we have two variables that are predictive: outlook and humidity.
In other words: the weather forecast and humidity can be used to predict fairly accurately whether Paul will take the roof off his convertible. The great thing about a decision tree like this is that it clearly visualizes how the decision sequence works. Paul only takes the roof off when it is sunny, or when it is cloudy but humidity is low (which is often the case when the weather is nicer).
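Information gain measures how much a split reduces the uncertainty (entropy) about the outcome. This Python sketch computes it for the outlook split using the counts given in the text: 0 of 5 rainy days and 4 of 5 sunny days with the roof off. It assumes the remaining 4 of the 14 days were cloudy, with 2 of them roof-off (the “fifty-fifty” outcome). Entropy is measured in bits (log base 2), a common convention.

```python
from math import log2

def entropy(yes, no):
    """Shannon entropy (in bits) of a yes/no outcome with the given counts."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(branches):
    """branches: list of (yes, no) counts, one per branch of a candidate split."""
    yes = sum(y for y, _ in branches)
    no = sum(n for _, n in branches)
    total = yes + no
    # Entropy remaining after the split, weighted by the share of days in each branch.
    weighted = sum((y + n) / total * entropy(y, n) for y, n in branches)
    return entropy(yes, no) - weighted

# (roof off, roof on) counts per outlook; the cloudy counts are inferred, see above.
outlook = {"rainy": (0, 5), "sunny": (4, 1), "cloudy": (2, 2)}
gain = information_gain(list(outlook.values()))
print(f"Information gain of outlook: {gain:.3f} bits")
```

With these counts the gain is about 0.44 bits. A tree-learning algorithm computes this value for every candidate attribute and splits on the largest one.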
Even though this simple example might seem very logical, it does show how predictive analytics works. If you enrich your ‘dataset’ with the average outlook and humidity throughout the year, you can estimate the number of days that you can ride a convertible with the roof off, and this might help you make a better-informed decision about buying a convertible yourself. The process of using algorithms to learn from existing data to make specific predictions about the future is called data mining. Eric Siegel compares this to a salesperson: positive and negative interactions teach a salesperson which techniques work and which do not. In a similar way, predictive analytics is a process that enables organizations to learn from previous experiences (data).42
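The tree can also be written as a handful of rules and combined with climate data to estimate roof-off days per year, as described above. In this Python sketch, humidity is simplified to a low/high flag because the exact humidity threshold used by the tree is not given here, and the days-per-year figures are hypothetical.

```python
def roof_off(outlook, humidity_is_low):
    """Rules read off the tree: sunny -> yes; cloudy -> yes only if humidity is low; rainy -> no."""
    if outlook == "sunny":
        return True
    if outlook == "cloudy":
        return humidity_is_low
    return False

# Hypothetical number of days per year for each (outlook, low humidity?) combination.
days_per_year = {
    ("sunny", True): 60, ("sunny", False): 40,
    ("cloudy", True): 50, ("cloudy", False): 80,
    ("rainy", True): 35, ("rainy", False): 100,
}

expected_roof_off_days = sum(
    days for (outlook, low_humidity), days in days_per_year.items()
    if roof_off(outlook, low_humidity)
)
print(f"Estimated roof-off days per year: {expected_roof_off_days}")
```

Keep in mind that the rules inherit the tree’s errors: in the fourteen days of data, one sunny day was misclassified.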
In the previous example, we wrote about decision trees. A decision tree enables you to make predictions and visualizes the path an algorithm takes to arrive at its outcome. However, there are even more advanced decision
In a forest, these decision trees do not necessarily fork first at the most predictive attribute. Instead, each tree is typically built on a random sample of the data and considers only a random subset of the attributes at each split. This produces a large number of different decision trees that all try to predict the chosen outcome.
This means that one algorithm contains many different trees, which may predict different outcomes. The random forest algorithm reaches its decision by taking a majority vote among all the trees: the outcome with the most votes becomes the forest’s prediction. These ‘democratic’ forests are often more accurate than a single decision tree. As you can imagine, you need much more elaborate datasets than the one we used in the previous example to leverage the full potential of random forests.
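The majority vote itself is simple to express. In this Python sketch, three hand-written rules stand in for trained decision trees (a real random forest learns its trees from data, often hundreds of them), and the employee attributes are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Most common prediction; on a tie, Counter keeps the value seen first."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each maps an employee record to a prediction.
trees = [
    lambda e: "leave" if e["engagement"] < 3 else "stay",
    lambda e: "leave" if e["commute_km"] > 40 else "stay",
    lambda e: "leave" if e["years_since_promotion"] > 3 else "stay",
]

def forest_predict(trees, employee):
    """Collect one vote per tree and return the majority outcome."""
    votes = [tree(employee) for tree in trees]
    return majority_vote(votes)

employee = {"engagement": 2, "commute_km": 10, "years_since_promotion": 4}
print(forest_predict(trees, employee))
```

Here two of the three trees vote “leave”, so that becomes the forest’s prediction, even though no single tree explains the outcome on its own.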
A drawback is that it’s often quite difficult to see how the algorithm came to its prediction, because the prediction is the result of many different decision trees combined. It’s (almost) impossible to visualize all these trees and infer how the algorithm arrived at its prediction. This is often referred to as a black box: we can observe its input and output, but we don’t know the workings of the algorithm itself.
We have arrived at the last step of the HR analytics process cycle: interpretation & execution. In the previous steps, we defined a question that is relevant to the business, selected and cleaned the relevant data, and then analyzed it. Using the results of our analysis, we can now continue to the final step: interpretation and execution.
Part of the first step is re-analyzing your results: Did you really find what you were looking for? Or did you find an answer to a different question? On top of that, did you look at the data in a smart way and take all relevant factors into consideration? If not, you should go through the analytics cycle again and revise your analysis.
Interpreting results
The second step involves the interpretation of your results. This step goes hand in hand with what we discussed in the previous section. Do your results answer the questions that were asked at the beginning of the analysis? Often, one or more new questions pop up, which need to be answered before the results can be accurately interpreted. By going through the people analytics cycle again, you can answer these new questions and form a more complete answer to your original question.
Always take a second look at your data when you stumble upon an interesting finding. A prime example of this occurred when I studied innovative behavior amongst employees within a professional service firm. This firm had a very hierarchical organizational structure, which is not uncommon in these kinds of firms. In this case, a large number of employees were managed by a smaller group of firm partners.
To interpret the results in the best possible way, you should have an intimate knowledge of what’s going on in the business. This is very helpful for explaining the patterns in the data and for creating a plan to act on these findings.
The final step is the presentation of your results. How will you sell your results to the business? Who is your audience? How will you distribute your message to them? Moreover, how will you explain your findings?
These are all questions you need to answer before you present your results. We already discussed this in chapter five: you need to sell the results. The way you present and visualize your data is essential to communicating your message effectively. An HR dashboard with information for managers is usually ineffective because managers forget about it and thus don’t use it.44 In this case, a monthly email with a nice-looking report would serve your purpose better, as it is easily opened and also acts as a reminder for the managers.
Lastly, you need to consider who you want to share your information with. In attrition analysis, it is common to estimate the chance that an employee will leave. What you don’t want is for a manager to see this information, go up to an employee, and ask: “I see there’s an 80% chance that you will leave the company within the next twelve months. Why?” To avoid situations like this, companies like Hewlett-Packard extensively train a select group of managers before they give them this information.45 Your data and insights can be very powerful, so use them wisely.
One of the commonly heard arguments is that, if you want your findings to really have an impact, you should relate them to the Holy Grail of people analytics: return on investment (ROI). The reasoning behind this is that a financial number creates a clear and urgent message to directors and managers: investing in people will earn us money. That’s why a solid business case will greatly benefit the adoption of your findings. Managers will love it when you come up with an ROI.
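For reference, ROI is usually expressed as the net benefit of an investment divided by its cost. A minimal Python sketch with hypothetical figures for a retention programme:

```python
def roi_percent(benefit, cost):
    """Return on investment in percent: net benefit relative to cost."""
    return 100 * (benefit - cost) / cost

# Hypothetical figures: a programme costing 50,000 that is estimated to save 80,000
# in replacement costs.
print(f"ROI: {roi_percent(80_000, 50_000):.0f}%")
```

The arithmetic is trivial; in practice the hard part is estimating the benefit in a way that directors and managers find credible.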
Lather, rinse, repeat is an instruction often found on shampoo bottles and has been dubbed the ‘shampoo algorithm’. Taken literally, it would produce an endless loop that only stops when the user runs out of shampoo. I wouldn’t ad-
This goes both ways. By looking more closely, you’ll find details that influence how people behave and react, like how someone’s seniority influences their innovative behavior. Yet, by taking a step back and looking at the broader picture, you’ll discover factors at a higher level that may have influenced your findings, like a new CEO who set a new strategy or recent budget cuts that had an impact on people’s behavior and attitudes.
Unfortunately, not all HR analytics projects succeed. Some never really get off the ground and others don’t produce tangible results. To wrap this book up, I will briefly discuss five reasons why HR analytics projects fail. Knowing these common pitfalls will help you be more successful in your upcoming HR analytics projects.
It’s easy to get excited when you start an HR analytics project, but don’t fall into the trap of becoming overexcited. A grand vision and high ambitions are required to get HR analytics off the ground, but they should not apply to the first few projects. Companies often bite off more than they can chew and end up stuck in projects that are too large to manage. These projects can take years to complete, cost tremendous amounts of money, and produce results that are no longer relevant.
These wins are very important. They enable the team to learn and work together more effectively while increasing the visibility of the people analytics group throughout the organization. As some people tend to be skeptical about people analytics, it’s important to demonstrate its value early on by presenting these short-term wins.
The take-home here is to keep the project as agile as possible. Part of this is
not focusing on the full organization but just on a group of key employees. If
you want to improve customer satisfaction, you can focus on the entire
company – or just on the front-office personnel. The latter group is much
A second trap, which may be just as common as the first one, is a lack of relevance to the business. It’s not uncommon for an analytics project to focus on an interesting topic that doesn’t actually add value to the business.
A good rule of thumb is to focus on one of the top three business priorities of the CEO. The CEO is not concerned about the number of employees he has or about the latest engagement scores. He’s concerned about whether he has the right people with the right skills to execute the company’s strategy, and he wants to know how he can increase his revenue while minimizing costs. Only by focusing on a top business priority will HR analytics provide tangible value.
It’s not uncommon for HR to discover that they cannot gain access to email or social network data, or to individual employee survey data because the employees were promised full anonymity. Involving compliance early in the project increases its chances of success and prevents investing time and resources in projects that were doomed to fail from the start.
4. Bad data
A fourth reason why HR analytics projects fail is bad and messy data. HR data is known for not being the most pristine: unlike in finance, the numbers never need to add up perfectly. It’s not rare for things like function or department names to be mislabeled or abbreviated in different ways. In addition, records of promotions and previous functions within the same company are often messy, if they exist at all, which makes it hard to track employment history.
Bad data can make a project fail in two major ways. Firstly, the analysis can become distorted when data is mislabeled; e.g., one job type could be analyzed as two different jobs due to a typo. As the saying goes: “garbage in, garbage out”, meaning that poor-quality input produces erroneous output.
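A first cleaning step for the mislabeling problem is to normalize labels before counting them. The Python sketch below lowercases job titles, collapses extra whitespace, and expands abbreviations using a small lookup table. The titles and the abbreviation map are hypothetical; in practice the map would come from an audit of your own data.

```python
from collections import Counter

# Hypothetical raw job titles as they might appear in an HR system.
raw_titles = ["Sales Manager", "sales manager ", "Sls Mgr", "HR Advisor", "HR  advisor"]

# Hypothetical abbreviation map; build yours from the variants found in your own data.
ABBREVIATIONS = {"sls": "sales", "mgr": "manager"}

def normalize_title(title):
    """Lowercase, split on any whitespace, and expand known abbreviations."""
    words = title.lower().split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

print("Before cleaning:", Counter(raw_titles))
print("After cleaning: ", Counter(normalize_title(t) for t in raw_titles))
```

Before cleaning, the five records look like five different jobs; after normalization they collapse into two.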
Secondly, cleaning the data is a very time-consuming process and can take months or even years. Large organizations frequently use different software systems in different countries, with different data (entry) procedures. Add cultural differences on topics like performance, promotion policies, and training to the mix, and you run the risk of comparing apples to oranges. Especially in these situations, it’s excep-
For example, it’s very hard (if not impossible) to change some things, like an employee’s sex or age. These variables are interesting and should be included in an analysis as control variables, but they cannot easily be manipulated (you cannot change someone’s sex). Other attributes, like engagement, can be influenced by various interventions. It’s therefore much more useful to see how engagement levels impact bottom-line performance than to see how sex impacts turnover intentions.
HR analytics is still a novel approach for a lot of companies, and its projects are therefore prone to failure. By focusing on top business priorities, by involving compliance early on, and by planning quick wins, an HR analytics project can greatly improve its chances of success. The quick wins are crucial because they force the project team to define a specific question whose answer doesn’t require huge amounts of data (cleaning), and they boost the team’s morale and visibility within the organization.
This book describes the basic principles of people analytics. My aim for this book was to convince you, the reader, that working in a more data-driven way offers great value to both the Human Resources department and the company as a whole. Moreover, making decisions in a more data-driven way increases the likelihood of better business outcomes.
This is also what we strive to do at AIHR. On the back of our business cards there is a quote by William Edwards Deming. Deming was a famous American mathematician and statistician who helped spur the Japanese post-war economic miracle of the 1950s and 1960s, during which Japan rose to have the world’s second-largest economy. Renowned for his work on the plan-do-check-act iterative management method, which formed the basis of the lean manufacturing method, Deming famously said:
1 Definition by Gal, U., Jensen, T. B., & Stein, M. K. (2017). People Analytics
in the Age of Big Data: An Agenda for IS Research.
2 To read more about how Google manages people using data, check Bock,
L. (2015). Work rules!: Insights from inside Google that will transform how
you live and lead (First edition.). New York: Twelve.
3 Quartz (2013, May 1). Bloomberg’s culture is all about omniscience, down to
from http://www.business.com/management-theory/human-relations-management-theory-key-terms/
nooCampus, Leuven.
12 Sundmark (2016). People Analytics – An Example Using R. Retrieved from https://www.linkedin.com/pulse/people-analytics-example-using-r-lyndon-sundmark
13 Heuvel & Bondarouk (2016). The Rise (and Fall) of HR Analytics: A Study into the Future Applications, Value, Structure, and System Support. Retrieved from http://doc.utwente.nl/99593/1/Van%20den%20Heuvel%20Bondarouk%202016%20HRIC%20Sidney%20-%20Metis.pdf
14 Examples from Eric Siegel’s book Predictive Analytics (2013). Eric Siegel
(2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or
Die. John Wiley & Sons, Inc.
15 NY Times (2008, June 4). UN says solving food crisis could cost $30 billion.
18 The Guardian (2004, July 12). Muslim names harm job chances. Retrieved from https://www.theguardian.com/money/2004/jul/12/discriminationatwork.workandcareers
19 Steve McConnell (2011, January 9). 10x Software Development. Retrieved from http://web.archive.org/web/20130327120705/http:/forums.construx.com:80/blogs/stevemcc/archive/2011/01/09/origins-of-10x-how-valid-is-the-underlying-research.aspx
20 The war for talent was coined by Steven Hankin of McKinsey and Com-
Press. Note: The Ulrich model has since 1997 evolved into a much more
comprehensive model, e.g. Ulrich, D., Kryscynski, D., Ulrich, M., Brockbank,
W. (2017) Victory Through Organization.
23 The best example is a study published in Science in which researchers at-
werknemer zonder aandoening vrijwel even hoog als van jongere (transl. ...employee without a health condition almost as high as that of younger ones). https://www.cbs.nl/nl-nl/nieuws/2014/27/ziekteverzuim-oudere-werknemer-zonder-aandoening-vrijwel-even-hoog-als-van-jongere [Dutch].
25 Wayne Cascio and John Boudreau (2010). Investing in people: Financial
study into the future applications, value, structure, and system support. Retrieved from https://research.utwente.nl/en/publications/the-rise-and-fall-of-hr-analytics-a-study-into-the-future-applica. Note: The study is a few years old and since there’s been a lot of discussion about best practices for
the analytics leader that helps to solve the problems that van den Heuvel and Bondarouk so eloquently describe.
http://www.haygroup.com/nl/Press/Details.aspx?ID=37385 [Dutch].
29 Rasmussen, T., & Ulrich, D. (2015). Learning from practice: how HR ana-
naar kantoor en altijd de trapleuning vasthouden (transl. Why Shell people are not allowed to take their baby to the office and always hold on to the railing). Retrieved from https://decorrespondent.nl/4907/waarom-shell-mensen-hun-baby-niet-mogen-meenemen-naar-kantoor-en-altijd-de-trapleuning-vasthouden/540795563-a4c4dec5
35 Ulrich, D., & Dulebohn, J. H. (2015). Are we there yet? What's next for
ten-laws-of-hockey-analytics/.
39 Tim Harford (2014, March 28). Big data: are we making a big mistake? Re-
the individual data points is squared in order to achieve the best fit (by squaring these distances, longer distances between the line and the points are penalized more heavily). The line with the least squares, i.e. the smallest sum of squared distances, is the line that fits the data best.
42 Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click,