120 Data Science Interview Questions
To help readers prepare, we’ve gathered 120 interview questions in product metrics, programming and databases, probability, experimentation and inference, data analysis, and predictive modeling. These questions are all either real data science interview questions or inspired by real ones, and should help readers develop the skills needed to succeed in a data science role.
The role of a data scientist is highly malleable and company-dependent. However, the general skill set needed is similar across companies. Candidates need:
• Technical skills - data analysis and programming
• Business/product intuition - metrics and identifying opportunities for impact
• Communication ability - clarity in explaining findings and insights
To prepare for your interview, you may want to brush up on probability, data analysis, SQL, coding, and experimental design. The questions in this guide should help you do so. Because the backgrounds of data science applicants vary widely, interviews are often more holistic, testing your intuition and your analytical and communication abilities rather than focusing on specific technical concepts.
Prepare to discuss past work that involved analyzing large, complicated datasets, defending your approaches and communicating what you learned during the project. Expect questions about how to measure the “goodness” of a feature of the company’s product, and be sure to approach these problems in a scientific and principled way. You have a good chance of getting a product metrics or experimentation question based on problems the company is actually tackling at the time.
Check your company’s engineering or data blog to see if anything is relevant. Be familiar with A/B testing and with common metrics that companies similar to the one you are interviewing with may use. Brush up on your Python (especially IPython notebooks) and/or R skills to prepare for a potential live data analysis problem.
And finally, of course, follow general interview advice. Prepare to elaborate on related projects from your resume. Be enthusiastic. Share your thoughts with your interviewer as you work through a problem or a piece of analysis. And be sure to answer the question!
Please feel free to reach out to us with questions, comments and suggestions at www.datasciencehandbook.me
CONTENTS
PREDICTIVE MODELING
PROGRAMMING
PROBABILITY
STATISTICAL INFERENCE
DATA ANALYSIS
PRODUCT METRICS
COMMUNICATION
10 How could you collect and analyze social media data to predict the weather?
12 How would you design the “People You May Know” feature on LinkedIn or Facebook?
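A common baseline for question 12 is triadic closure: suggest people who share many friends with the user. This is not necessarily what LinkedIn or Facebook actually use. The minimal Python sketch below assumes the friendship graph fits in memory as a dict that maps each user to a set of friends. The function name and toy data are hypothetical.

```python
from collections import Counter


def people_you_may_know(graph, user, k=5):
    """Suggest up to k non-friends of `user`, ranked by number of mutual friends.

    `graph` (hypothetical structure) maps each user ID to a set of friend IDs;
    friendships are assumed symmetric (a in graph[b] implies b in graph[a]).
    """
    friends = graph.get(user, set())
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they are already friends with.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken by ID so the output is deterministic.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


if __name__ == "__main__":
    toy_graph = {  # hypothetical data
        "ann": {"bob", "cat"},
        "bob": {"ann", "cat", "dan"},
        "cat": {"ann", "bob", "dan", "eve"},
        "dan": {"bob", "cat"},
        "eve": {"cat"},
    }
    print(people_you_may_know(toy_graph, "ann"))  # ['dan', 'eve']
```

This covers only candidate generation. A fuller answer would likely add other signals, such as shared schools or employers, and rank candidates with a learned model.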
15 In a search engine, given partial data on what the user has typed, how would you predict the user’s eventual search query?
16 Given a database of all previous alumni donations to your university, how would you predict which recent alumni are most likely to donate?
17 You’re Uber and you want to design a heatmap to recommend to drivers where to wait for a passenger. How would you approach this?
18 How would you build a model to predict a March Madness bracket?
19 You want to run a regression to predict the probability of a flight delay, but there are flights with delays of up to 12 hours that are really messing up your model. How can you address this?
PRO TIP
Variations on ordinary linear regression can help address some problems that come up when working with real data. LASSO helps when you have too many predictors by favoring weights of zero. Ridge regression can help reduce the variance of your weights and predictions by shrinking the weights toward 0. Least absolute deviations or robust linear regression can help when you have outliers. Logistic regression is used for binary outcomes, and Poisson regression can be used to model count data.
4 How can you get a fair coin toss if someone hands you a
coin that is weighted to come up heads more often than
tails?
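A standard answer to question 4 is von Neumann’s trick: flip the coin twice, call HT heads and TH tails, and discard HH or TT. HT and TH each occur with probability p(1 - p), so the two outcomes are equally likely for any bias 0 < p < 1, assuming independent flips. This minimal Python sketch simulates the weighted coin with `random.Random`. The function names are illustrative.

```python
import random


def biased_flip(p_heads, rng):
    """Simulate the weighted coin: True means heads, with probability p_heads."""
    return rng.random() < p_heads


def fair_flip(p_heads, rng):
    """Von Neumann's trick: flip twice, map HT -> heads and TH -> tails, else retry."""
    while True:
        first = biased_flip(p_heads, rng)
        second = biased_flip(p_heads, rng)
        if first != second:
            # HT and TH both have probability p * (1 - p), so this is fair.
            return first


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    heads = sum(fair_flip(0.8, rng) for _ in range(n))
    print(heads / n)
```

Each attempt uses two flips and succeeds with probability 2p(1 - p). The expected number of pairs per fair flip is therefore 1 / (2p(1 - p)), which is 3.125 pairs for p = 0.8.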
11 You call 2 UberXs and 3 Lyfts. If the time that each takes to reach you is IID, what is the probability that all the Lyfts arrive first? What is the probability that all the UberXs arrive first?
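One way to reason about question 11 is to assume the arrival times are IID and continuous, so ties have probability zero. Then every ordering of the five cars is equally likely. The sketch below brute-forces the orderings with `itertools.permutations`. It agrees with the closed form 3!·2!/5! = 1/C(5,3) = 1/10, and the UberX case is likewise 2!·3!/5! = 1/10. The function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def prob_all_first(n_first, n_other):
    """P(all n_first cars arrive before every one of the n_other cars).

    Assumes IID continuous arrival times, so ties have probability 0 and all
    orderings of the distinct cars are equally likely.
    """
    # Tag each car with an index so permutations treats them as distinct.
    cars = [(i, "first") for i in range(n_first)] + [
        (n_first + i, "other") for i in range(n_other)
    ]
    total = favorable = 0
    for order in permutations(cars):
        total += 1
        if all(kind == "first" for _, kind in order[:n_first]):
            favorable += 1
    return Fraction(favorable, total)


print(prob_all_first(3, 2))  # 3 Lyfts before 2 UberXs: 1/10
print(prob_all_first(2, 3))  # 2 UberXs before 3 Lyfts: 1/10
```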
18 You have two coins: one is fair and comes up heads with probability 1/2, and the other is biased and comes up heads with probability 3/4. You randomly pick a coin and flip it twice, and get heads both times. What is the probability that you picked the fair coin?
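Question 18 is a direct application of Bayes’ rule: P(fair | HH) = P(HH | fair)P(fair) / [P(HH | fair)P(fair) + P(HH | biased)P(biased)]. This assumes that “randomly pick” means each coin is chosen with probability 1/2. Here is a short Python sketch using exact fractions. The function name and parameters are illustrative.

```python
from fractions import Fraction


def posterior_fair(n_heads, p_fair=Fraction(1, 2), p_biased=Fraction(3, 4),
                   prior_fair=Fraction(1, 2)):
    """P(fair coin | n_heads heads in n_heads flips), by Bayes' rule."""
    like_fair = p_fair ** n_heads      # P(all heads | fair)
    like_biased = p_biased ** n_heads  # P(all heads | biased)
    evidence = prior_fair * like_fair + (1 - prior_fair) * like_biased
    return prior_fair * like_fair / evidence


print(posterior_fair(2))  # 4/13
```

4/13 ≈ 0.31 is below the prior of 1/2, so two heads shift belief toward the biased coin.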
6 How would you run an A/B test for many variants, say 20
or more?
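Part of the answer to question 6 is the multiple-comparisons problem. Suppose 20 variants are each compared with control at α = 0.05, and the tests are treated as independent (an approximation, since they share a control). The chance of at least one false positive is then 1 - 0.95^20 ≈ 0.64. This minimal Python sketch computes that family-wise error rate and applies two standard corrections: Bonferroni and the Holm step-down procedure. The p-values are made up for illustration.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses rejected when each test uses threshold alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...) with alpha / (m - k).

    Stops at the first p-value that fails. Controls the family-wise error rate
    and always rejects at least as many hypotheses as Bonferroni.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for k, i in enumerate(order):
        if p_values[i] > alpha / (m - k):
            break
        rejected.append(i)
    return sorted(rejected)


if __name__ == "__main__":
    print(round(family_wise_error_rate(0.05, 20), 3))  # 0.642
    p_values = [0.001, 0.011, 0.013, 0.04, 0.3]  # made-up example
    print(bonferroni(p_values))  # [0]
    print(holm(p_values))        # [0, 1, 2]
```

If controlling the false discovery rate is acceptable instead of the family-wise error rate, the Benjamini-Hochberg procedure is a common alternative.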
6 How can you make sure that you don’t analyze something
that ends up meaningless?
16 Given that you have wifi data in your office, how would
you determine which rooms and areas are underutilized
and overutilized?
17 How could you use GPS data from a car to determine the
quality of a driver?
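For question 17, one possible feature is the rate of harsh braking. Others include speeding relative to the limit or sharp cornering. The minimal Python sketch below assumes a hypothetical feed of GPS speeds in m/s at a fixed sampling interval. The -3 m/s² threshold is an illustrative assumption, not a standard, and would need validation against outcomes such as crashes or claims.

```python
def harsh_braking_events(speeds_mps, interval_s=1.0, threshold_mps2=-3.0):
    """Count consecutive-sample decelerations sharper than threshold_mps2.

    speeds_mps: speeds in m/s sampled every interval_s seconds (hypothetical feed).
    threshold_mps2: illustrative cutoff, not a standard; tune and validate it.
    """
    events = 0
    for prev, curr in zip(speeds_mps, speeds_mps[1:]):
        accel = (curr - prev) / interval_s  # m/s^2 between samples
        if accel < threshold_mps2:
            events += 1
    return events


def harsh_braking_per_100_km(speeds_mps, interval_s=1.0, threshold_mps2=-3.0):
    """Normalize the event count by distance so trips of different lengths compare."""
    # Approximate distance as speed * sampling interval, summed over samples.
    distance_km = sum(speeds_mps) * interval_s / 1000
    if distance_km == 0:
        return 0.0
    events = harsh_braking_events(speeds_mps, interval_s, threshold_mps2)
    return 100 * events / distance_km


if __name__ == "__main__":
    trip = [20, 20, 15, 15, 14, 10, 10]  # hypothetical 1 Hz speed samples
    print(harsh_braking_events(trip))  # 2
```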
18 Given accelerometer, altitude, and fuel usage data from a car, how would you determine the optimum acceleration pattern to drive over hills?
19 Given position data of NBA players in a season’s games, how would you evaluate a basketball player’s defensive ability?
20 How would you quantify the influence of a Twitter user?
21 Given location data of golf balls in games, how would you construct a model that can advise golfers where to aim?
22 You have 100 mathletes and 100 math problems. Each mathlete gets to choose 10 problems to solve. Given data on who got what problem correct, how would you rank the problems in terms of difficulty?
PRO TIP
If you are asked to analyze a dataset during the interview, the interviewer wants to learn about your comfort with your statistical software and your ability to generate interesting insights in a short period of time. We recommend making visualizations first, to show that you know good practices, to prevent later missteps, and to identify any transformations the data needs. Be sure to talk about your procedure and anticipate questions about your approach.
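For question 22, raw percent-correct can mislead because mathletes choose their own problems. A hard problem picked mostly by strong mathletes can look easy. One way to adjust for this, assuming choices depend mainly on ability, is a Rasch-style item response model: P(correct) = σ(ability_i - difficulty_j), fit only on attempted pairs. This is our suggestion, not the only answer. The minimal Python sketch below fits the model by gradient ascent with a small L2 penalty, so estimates stay finite when a problem is always or never solved. The learning rate, penalty, and step count are untuned assumptions.

```python
import math


def fit_rasch(responses, n_people, n_problems, lr=0.1, l2=0.1, steps=2000):
    """Fit P(correct) = sigmoid(ability[i] - difficulty[j]) by gradient ascent.

    responses: (person, problem, correct) triples for attempted problems only,
    with correct in {0, 1}. The l2 penalty keeps estimates finite when a problem
    is always or never solved, and pins down the otherwise arbitrary shared
    offset between abilities and difficulties.
    """
    ability = [0.0] * n_people
    difficulty = [0.0] * n_problems
    for _ in range(steps):
        # Gradient of the penalized log-likelihood.
        grad_a = [-l2 * a for a in ability]
        grad_d = [-l2 * d for d in difficulty]
        for i, j, correct in responses:
            p = 1.0 / (1.0 + math.exp(difficulty[j] - ability[i]))
            residual = correct - p
            grad_a[i] += residual
            grad_d[j] -= residual
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
    return ability, difficulty


def rank_by_difficulty(difficulty):
    """Problem indices ordered from easiest to hardest."""
    return sorted(range(len(difficulty)), key=lambda j: difficulty[j])


if __name__ == "__main__":
    # Hypothetical data: 3 mathletes who each attempted the same 3 problems.
    data = [(0, 0, 1), (0, 1, 1), (0, 2, 0),
            (1, 0, 1), (1, 1, 0), (1, 2, 0),
            (2, 0, 1), (2, 1, 1), (2, 2, 1)]
    _, difficulty = fit_rasch(data, n_people=3, n_problems=3)
    print(rank_by_difficulty(difficulty))  # [0, 1, 2]
```

In this toy data, everyone attempted everything, so percent-correct gives the same ranking. The model only differs when attempts are self-selected, as in the original question.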
12 You are on the data science team at Uber and you are asked to start thinking about surge pricing. What would be the objectives of such a product, and how would you start looking into this?
13 Say that you are Netflix. How would you determine what original series you should invest in and create?
14 What kinds of services would find churn (a metric that tracks how many customers leave the service) helpful? How would you calculate churn?
15 Let’s say that you’re scheduling content for a content provider on television. How would you determine the best times to schedule content?
PRO TIP
Interviewers are looking for candidates who have strong intuition about metrics for success. You should give many possible metrics, each a bit more specific than the previous. The interviewer may stop and ask you to elaborate or describe how you would collect or visualize the data. Prepare to justify why the metric is important, relevant, and measurable.
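For question 14, one common definition of churn rate is the fraction of customers active at the start of a period who are no longer active at its end. Variants divide by the average customer count over the period or measure lost revenue instead. This minimal Python sketch assumes hypothetical sets of active customer IDs at the start and end of the period.

```python
def churn_rate(active_start, active_end):
    """Fraction of customers active at the start of a period who are gone by its end.

    Both arguments are sets of customer IDs. Customers who sign up during the
    period are ignored, since they are not in active_start.
    """
    if not active_start:
        # No customers at the start: define churn as 0 rather than divide by zero.
        return 0.0
    churned = active_start - active_end
    return len(churned) / len(active_start)


if __name__ == "__main__":
    start = {"c1", "c2", "c3", "c4"}  # hypothetical IDs active on day 1
    end = {"c2", "c3", "c5"}          # c5 is a new sign-up
    print(churn_rate(start, end))     # 0.5
```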