Data Analyst Interview Questions and Answers

15 Data Analyst Interview Questions and Answers
Written by Coursera • Updated on Apr 5, 2022
Walk into your data analyst interview with confidence by preparing with these 15 interview
questions.
If you’re like many people, the job interview can be one of the most
intimidating parts of the job search process. But it doesn’t have to be. With a
bit of advance preparation, you can walk into your data analyst interview
feeling calm and confident.
In this article, we’ll go over some of the most common interview questions
you’re likely to encounter as you apply for an entry-level data analyst
position. We’ll walk through what the interviewer is looking for and how best
to answer each question. Finally, we’ll cover some tips and best practices for
interviewing success. Let’s get started.
General data analyst interview questions

These questions cover data analysis from a high level and are more likely to
show up early in an interview.
1. Tell me about yourself.

What they’re really asking: What makes you the right fit for this job?
This question can sound broad and open ended, but it’s really about your
relationship with data analytics. Keep your answer focused on your journey
toward becoming a data analyst. What sparked your interest in the field?
What data analyst skills do you bring from previous jobs or coursework?
As you formulate your answer, try to answer these three questions:

 What excites you about data analysis?
 What excites you about this role?
 What makes you the best candidate for the job?
Interviewers might also ask:

 What made you want to become a data analyst?
 What brought you here?
 How would you describe yourself as a data analyst?
2. What do data analysts do?

What they’re really asking: Do you understand the role and its value to the
company?
If you’re applying for a job as a data analyst, you likely know the basics
of what data analysts do. Go beyond a simple dictionary definition to
demonstrate your understanding of the role and its importance.
Outline the main tasks of a data analyst: identify, collect, clean, analyze, and
interpret. Talk about how these tasks can lead to better business decisions,
and be ready to explain the value of data-driven decision making.
Interviewers might also ask:
 What is the process of data analysis?
 What steps do you take to solve a business problem?
 What is your process when you start a new project?
3. What was your most successful/most challenging data analysis project?
What they’re really asking: What are your strengths and weaknesses?
When an interviewer asks you this type of question, they’re often looking to
evaluate your strengths and weaknesses as a data analyst. How do you
overcome challenges, and how do you measure the success of a data
project?
Getting asked about a project you’re proud of is your chance to highlight
your skills and strengths. Do this by discussing your role in the project and
what made it so successful. As you prepare your answer, take a look at the
original job description. See if you can incorporate some of the skills and
requirements listed.
If you get asked the negative version of the question (least successful or
most challenging project), be honest as you focus your answer on lessons
learned. Identify what went wrong—maybe your data was incomplete or your
sample size was too small—and talk about what you’d do differently in the
future to correct the error. We’re human, and mistakes are a part of life.
What’s important here is your ability to learn from them.
Interviewers might also ask:
 Walk me through your portfolio.
 What is your greatest strength as a data analyst? How about your greatest
weakness?
 Tell me about a data problem that challenged you.
4. What’s the largest data set you’ve worked with?

What they’re really asking: Can you handle large data sets?
Many businesses have more data at their disposal than ever before. Hiring
managers want to know that you can work with large, complex data sets.
Focus your answer on the size and type of data. How many entries and
variables did you work with? What types of data were in the set?
The experience you highlight doesn't have to come from a job. You’ll often
have the chance to work with data sets of varying sizes and types as a part
of a data analysis course, bootcamp, certificate program, or degree. As you
put together a portfolio, you may also complete some independent projects
where you find and analyze a data set. All of this is valid material to build
your answer.
Interviewers might also ask:
 What type of data have you worked with in the past?
Data analysis process questions

The work of a data analyst involves a range of tasks and skills. Interviewers
will likely ask questions specific to various parts of the data analysis process
to evaluate how well you perform each step.
5. Explain how you would estimate … ?

What they’re really asking: What’s your thought process? Are you an
analytical thinker?
With this type of question (sometimes called a guesstimate), the interviewer
presents you with a problem to solve. How would you estimate the best
month to offer a discount on shoes? How would you estimate the weekly
profit of your favorite restaurant?
The purpose here is to evaluate your ability to problem solve and your
overall comfort working with numbers. Since this is about how you think,
think out loud as you work through your answer.
 What types of data would you need?
 Where might you find that data?
 Once you have the data, how would you use it to calculate an estimate?
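Once the inputs are reasoned out, the estimate itself is usually a short chain of multiplications. The sketch below works through the weekly-profit example; every input value (seats, table turns, occupancy, average check, profit margin) is a hypothetical assumption you would state and justify out loud, not real data.

```python
# All inputs are hypothetical assumptions for illustration, not real data.
def estimate_weekly_profit(seats, turns_per_day, occupancy, avg_check,
                           days_open, profit_margin):
    """Chain simple assumptions into a weekly profit estimate."""
    covers_per_day = seats * turns_per_day * occupancy  # guests served per day
    weekly_revenue = covers_per_day * avg_check * days_open
    return weekly_revenue * profit_margin

# Assumed: 40 seats, 3 turns a day, 75% full, $25 average check,
# open 7 days a week, 10% net margin.
estimate = estimate_weekly_profit(40, 3, 0.75, 25.0, 7, 0.10)
print(f"Estimated weekly profit: ${estimate:,.0f}")
```

Walking the interviewer through each factor matters more than the final number, and changing one assumption at a time shows how sensitive the estimate is to it.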
6. What is your process for cleaning data?

What they’re really asking: How do you handle missing data, outliers,
duplicate data, etc.?
As a data analyst, data preparation, also known as data cleaning or data
cleansing, will often account for a majority of your time. A potential employer
is going to want to know that you’re familiar with the process and why it’s
important.
In your answer, give a short description of what data cleaning is and why it’s
important to the overall process. Then walk through the steps you typically
take to clean a data set. Consider mentioning how you handle:
 Missing data
 Duplicate data
 Data from different sources
 Structural errors
 Outliers
Interviewers might also ask:
 How do you deal with messy data?
 What is data cleaning?
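The cleaning steps listed above can be sketched in plain Python. This is a minimal illustration, assuming a small hypothetical list of records; the field names, the median fill, and the 1.5 × IQR outlier rule are illustrative choices rather than the only options. In practice the same steps are often done with a library such as pandas.

```python
import statistics

# Hypothetical raw records merged from two sources; field names are illustrative.
raw = [
    {"id": 1, "city": "New York ", "amount": 120.0},
    {"id": 2, "city": "new york", "amount": 95.0},
    {"id": 2, "city": "new york", "amount": 95.0},  # exact duplicate
    {"id": 3, "city": "Boston", "amount": None},    # missing value
    {"id": 4, "city": "BOSTON", "amount": 110.0},
    {"id": 5, "city": "Chicago", "amount": 105.0},
    {"id": 6, "city": "Chicago", "amount": 9999.0}, # suspicious value
]

def clean(records):
    # 1. Duplicate data: keep the first copy of each identical record.
    seen, rows = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            rows.append(dict(r))  # copy so the raw data stays untouched
    # 2. Structural errors: trim stray whitespace and standardize case.
    for r in rows:
        r["city"] = r["city"].strip().title()
    # 3. Missing data: fill gaps with the median of the observed amounts.
    median = statistics.median(r["amount"] for r in rows if r["amount"] is not None)
    for r in rows:
        if r["amount"] is None:
            r["amount"] = median
    # 4. Outliers: flag (rather than delete) values outside 1.5 * IQR.
    q1, _, q3 = statistics.quantiles([r["amount"] for r in rows], n=4)
    iqr = q3 - q1
    for r in rows:
        r["outlier"] = not (q1 - 1.5 * iqr <= r["amount"] <= q3 + 1.5 * iqr)
    return rows

cleaned = clean(raw)
```

Flagging outliers instead of dropping them leaves the final decision to someone who understands the business context.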
7. How do you explain technical concepts to a non-technical audience?
What they’re really asking: How are your communication skills?
While the ability to draw insights from data is a critical skill for a data analyst,
being able to communicate those insights to stakeholders, management,
and non-technical co-workers is just as important.
Your answer should include the types of audiences you’ve presented to in
the past (size, background, context). If you don’t have a lot of experience
presenting, you can still talk about how you’d present data findings
differently depending on the audience.
Interviewers might also ask:
 What is your experience conducting presentations?
 Why are communication skills important to a data analyst?
 How do you present your findings to management?
Tip: In some cases, your interviewer might not be involved in data analysis. The entire interview,
then, is an opportunity to demonstrate your ability to communicate clearly. Consider practicing your
answers on a non-technical friend or family member.
8. Tell me about a time when you got unexpected results.
What they’re really asking: Do you let the data or your expectations drive
your analysis?
Effective data analysts let the data tell the story. After all, data-driven
decisions are based on the facts rather than intuition or gut feelings. When
asking this question, an interviewer might be trying to determine:
 How you validate results to ensure accuracy
 How you overcome selection bias
 If you’re able to find new business opportunities in surprising results
Be sure to describe the situation that surprised you, as well as what you
learned from it. This is your opportunity to demonstrate your natural curiosity
and excitement to learn new things from data.
9. How would you go about measuring the performance of our company?
What they’re really asking: Have you done your research?
Before your interview, be sure to do some research on the company, its
business goals, and the larger industry. Think about the types of business
problems that could be solved through data analysis, and what types of data
you’d need to perform that analysis. Read up on how data is used in the
industry and by competitors.
Show that you can be business-minded by tying this back to the business.
How would this analysis bring value to the company?
Technical skill questions

Interviewers will be looking for candidates who can leverage a wide range
of technical data analyst skills. These questions are geared toward
evaluating your competency across several skills.
10. What data analytics software are you familiar with?
What they’re really asking: Do you have basic competency with common
tools? How much training will you need?
This is a good time to revisit the job listing to look for any software
emphasized in the description. As you answer, explain how you’ve used that
software (or something similar) in the past. Show your familiarity with the tool
by using associated terminology.
Mention software solutions you’ve used for various stages of the data
analysis process. You don’t need to go into great detail here. What you used
and what you used it for should suffice.
Interviewers might also ask:
 What data software have you used in the past?
 What data analytics software are you trained in?
Tip: Gain experience with data analytics software through a Guided Project on Coursera. Get
hands-on learning in under two hours, without having to download or purchase software. You’ll be
ready with something to talk about during your next interview for analysis tools like:
R
Power BI Desktop
Python
Google Sheets
Tableau
Microsoft Excel
MySQL
11. What scripting languages are you trained in?

As a data analyst, you’ll more than likely have to use both SQL and a
statistical programming language like R or Python. If you’re already familiar
with the language of choice at the company you’re applying to, great. If not,
you can take this time to show enthusiasm for learning. Point out that your
experience with one (or more) languages has set you up for success in
learning new ones. Talk about how you’re currently growing your skills.
Interviewers might also ask:
 What functions in SQL do you like most?
 Do you prefer R or Python?
Five SQL interview questions for data analysts
Knowledge of SQL is one of the most important skills you can have as a data analyst. Many
interviews for data analyst jobs include an SQL screening where you’ll be asked to write code on a
computer or whiteboard. Here are five SQL questions and tasks to prepare for:
1. Create an SQL query: Be ready to use JOIN clauses and the COUNT function to show a query result
from a given database.
2. Describe an SQL query: Given an SQL query, explain what data is being retrieved.
3. Modify a database: Insert new rows, modify existing records, or permanently delete records
from a database.
4. Debug a query: Correct the errors in an existing query to make it functional.
5. Define an SQL term: Understand what terms like foreign and primary key, truncate, drop, union,
union all, and left join and inner join mean (and when you’d use them).
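To practice the JOIN and COUNT, LEFT JOIN vs. INNER JOIN, and UNION vs. UNION ALL items without installing a database server, you can run SQL through Python’s built-in sqlite3 module. The tables and rows below are hypothetical, and other SQL dialects may differ in details.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        -- Foreign key; SQLite only enforces it when PRAGMA foreign_keys = ON.
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# JOIN + COUNT: orders per customer. An INNER JOIN drops customers with no orders.
inner = conn.execute("""
    SELECT c.name, COUNT(o.id) FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# LEFT JOIN keeps every customer; COUNT(o.id) skips the NULLs, so Cy gets 0.
left = conn.execute("""
    SELECT c.name, COUNT(o.id) FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# UNION removes duplicate rows; UNION ALL keeps them.
union = conn.execute(
    "SELECT customer_id FROM orders UNION SELECT id FROM customers").fetchall()
union_all = conn.execute(
    "SELECT customer_id FROM orders UNION ALL SELECT id FROM customers").fetchall()

print(inner)   # [('Ana', 2), ('Ben', 1)]
print(left)    # [('Ana', 2), ('Ben', 1), ('Cy', 0)]
print(len(union), len(union_all))  # 3 6
conn.close()
```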
12. What statistical methods have you used in data analysis?
What they’re really asking: Do you have basic statistical knowledge?
Most entry-level data analyst roles will require at least a basic competency in
statistics, as well as an understanding of how statistical analysis ties into
business goals. List the types of statistical calculations you’ve used in the
past and what business insights those calculations yielded.
If you’ve ever worked with or created statistical models, be sure to mention
that as well. If you’re not already, familiarize yourself with the following
statistical concepts:
 Mean
 Standard deviation
 Variance
 Regression
 Sample size
 Descriptive and inferential statistics
Interviewers might also ask:
 What is your knowledge of statistics?
 How have you used statistics in your work as a data analyst?
13. How have you used Excel for data analysis in the past?
Spreadsheets rank among the most common tools used by data analysts.
It’s common for interviews to include one or more questions meant to gauge
your skill working with data in Microsoft Excel.
Five Excel interview questions for data analysts
Here are five more questions specific to Excel that you might be asked during your interview:
1. What is a VLOOKUP, and what are its limitations?
2. What is a pivot table, and how do you make one?
3. How do you find and remove duplicate data?
4. What are INDEX and MATCH functions, and how do they work together?
5. What’s the difference between a function and a formula?
Need a quick refresher before your interview? Get a hands-on walkthrough
of important functions and techniques in under 90 minutes with Problem
Solving Using Microsoft Excel.
14. Explain the term…

What they’re really asking: Are you familiar with the terminology of data
analytics?
Throughout your interview, you may be asked to define a term or explain
what it means. In most cases, the interviewer is trying to determine how well
you know the field, as well as how effective you are at communicating
technical concepts in simple terms. While it’s impossible to know what exact
terms you may be asked about, here are a few you should be familiar with:
 Normal distribution
 Data wrangling
 KNN imputation method
 Clustering
 Outlier
 N-grams
15. Can you describe the difference between … ?

Similar to the last type of question, these interview questions help determine
your knowledge of analytics concepts by asking you to compare two related
terms. Some pairs you might want to be familiar with include:
 Data mining vs. data profiling
 Quantitative vs. qualitative data
 Variance vs. covariance
 Univariate vs. bivariate vs. multivariate analysis
 Clustered vs. non-clustered index
 1-sample T-test vs. 2-sample T-test in SQL
 Joining vs. blending in Tableau
Data Analyst Interview Questions: Basic
This section covers the basic questions you need to know about data analytics and its terminology.
Q1. What is the difference between Data Mining and Data Analysis?
Data Mining | Data Analysis
Used to recognize patterns in data stored. | Used to order & organize raw data in a meaningful manner.
Mining is performed on clean and well-documented data. | The analysis of data involves Data Cleaning. So, data is not present in a well-documented format.
Results extracted from data mining are not easy to interpret. | Results extracted from data analysis are easy to interpret.
Table 1: Data Mining vs Data Analysis – Data Analyst Interview Questions
So, to summarize: Data Mining is often used to identify patterns in stored data. It is mostly used for Machine Learning, where analysts recognize the patterns with the help of algorithms. Data Analysis, by contrast, is used to gather insights from raw data, which has to be cleaned and organized before the analysis is performed.
Q2. What is the process of Data Analysis?
Data analysis is the process of collecting, cleansing, interpreting, transforming and modeling data to gather insights and generate reports that help the business increase profits. Refer to the image below to see the various steps involved in the process.
Fig 1: Process of Data Analysis – Data Analyst Interview Questions
 Collect Data: The data gets collected from various sources and is stored so that it can be cleaned and prepared. In this step, missing values and outliers are identified and handled, for example by removing or imputing them.
 Analyze Data: Once the data is ready, the next step is to analyze it. A model is run repeatedly and refined. Then, the model is validated to check whether it meets the business requirements.
 Create Reports: Finally, the model is implemented, and the reports it generates are passed on to the stakeholders.
Q3. What is the difference between Data Mining and Data Profiling?
Data Mining: Data Mining refers to analyzing data to find relationships that have not been discovered before. It mainly focuses on detecting unusual records, finding dependencies, and cluster analysis.
Data Profiling: Data Profiling refers to the process of analyzing individual attributes of data. It mainly focuses on providing valuable information about data attributes such as data type, frequency, etc.
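A basic profile can be as simple as a few counts per attribute. The sketch below profiles one hypothetical column and treats None and empty strings as missing, which is an assumption: real data may encode missing values differently.

```python
from collections import Counter

def profile_column(values, missing=(None, "")):
    """Summarize one attribute: size, completeness, types, distinct values, top value."""
    present = [v for v in values if v not in missing]
    counts = Counter(present)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "types": dict(Counter(type(v).__name__ for v in present)),
        "distinct": len(counts),
        "most_frequent": top_value,
        "frequency": top_count,
    }

# Hypothetical "country" column from a customer table.
country = ["US", "US", "CA", None, "MX", "US", "", "CA"]
print(profile_column(country))
```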
Q4. What is data cleansing and what are the best ways to practice
data cleansing?
Data cleansing, also called data cleaning or data wrangling, is the process of identifying and removing errors to enhance the quality of data. Refer to the image below for the various ways to deal with missing data.
Fig 2: Ways of Data Cleansing – Data Analyst Interview Questions
Q5. What are the important steps in the data validation process?
As the name suggests, Data Validation is the process of checking whether data is accurate and fit for use. It mainly involves two processes: Data Screening and Data Verification.
 Data Screening: Different kinds of algorithms are used in this step to screen the entire dataset for inaccurate values.
 Data Verification: Each suspected value is evaluated against various use cases, and then a final decision is taken on whether the value should be included in the data or not.
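The two steps can be sketched as a rule-based screen followed by a review decision. The field name, the 0 to 100 valid range, and the reviewer’s approval below are all hypothetical.

```python
# Hypothetical rule for an "age" field: must be a number from 0 to 100.
def screen(records, field, low, high):
    """Data screening: return the records whose value breaks a type or range rule."""
    suspects = []
    for r in records:
        value = r.get(field)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            suspects.append(r)
    return suspects

def verify(records, suspects, approved_ids):
    """Data verification: exclude each suspect unless a reviewer approved it."""
    rejected = {r["id"] for r in suspects} - set(approved_ids)
    return [r for r in records if r["id"] not in rejected]

people = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},    # impossible value
    {"id": 3, "age": "41"},  # wrong type (stored as text)
    {"id": 4, "age": 101},   # breaks the rule, but may be genuine
]
suspects = screen(people, "age", 0, 100)
validated = verify(people, suspects, approved_ids={4})  # reviewer confirmed id 4
```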
Q6. What do you think are the criteria to say whether a developed data
model is good or not?
Well, the answer to this question may vary from person to person. But below are a
few criteria which I think are a must to be considered to decide whether a developed
data model is good or not:
 A model developed for the dataset should have predictable performance. This is
required to predict the future.
 A model is said to be a good model if it can easily adapt to changes according to
business requirements.
 If the data gets changed, the model should be able to scale according to the data.
 The model developed should also be easy for clients to consume, producing actionable and profitable results.
Q7. When do you think you should retrain a model? Is it dependent on the data?
Business data keeps changing on a day-to-day basis, even though its format usually does not. Whenever a business enters a new market, faces a sudden rise in competition, or sees its own position rising or falling, it is recommended to retrain the model. In short, whenever business dynamics change, retrain the model so that it reflects the changing behavior of customers.
Q8. Can you mention a few problems that data analysts usually encounter while performing the analysis?
The following are a few problems that are usually encountered while performing data analysis.
 Duplicate entries and spelling mistakes reduce data quality.
 If you are extracting data from a poor source, you would have to spend a lot of time cleaning the data.
 Data extracted from different sources may vary in representation. When you combine data from these sources, reconciling the differences in representation can cause delays.
 Lastly, incomplete data makes it difficult to perform the analysis.
Q9. What is the KNN imputation method?
This method imputes a missing attribute value using the values of that attribute in the records that are most similar to the record with the missing value (its k nearest neighbors). The similarity between records is determined using a distance function.
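A minimal sketch of KNN imputation for one numeric column, assuming Euclidean distance, k = 2, and that the other columns are complete. Library implementations such as scikit-learn’s KNNImputer also handle missing features, and features are usually scaled first so that one column does not dominate the distance.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill None in column `target` with the mean of `target` over the k nearest
    rows that have it, using Euclidean distance on the other columns."""
    donors = [r for r in rows if r[target] is not None]
    result = []
    for r in rows:
        new = dict(r)  # never modify the input rows
        if r[target] is None:
            features = [c for c in r if c != target]
            point = [r[c] for c in features]
            neighbors = sorted(
                donors, key=lambda d: math.dist(point, [d[c] for c in features]))[:k]
            new[target] = sum(n[target] for n in neighbors) / len(neighbors)
        result.append(new)
    return result

# Hypothetical data: the last person's weight is missing.
people = [
    {"height": 150, "weight": 50},
    {"height": 160, "weight": 58},
    {"height": 170, "weight": 66},
    {"height": 180, "weight": 80},
    {"height": 165, "weight": None},
]
filled = knn_impute(people, "weight")
```

With k = 2, the missing weight becomes the average weight of the two people closest in height.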
Q10. What framework did Apache develop for processing large datasets in a distributed computing environment?
The Apache Hadoop Ecosystem was developed for processing large datasets for applications in a distributed computing environment. The Hadoop Ecosystem consists of the following components.
 HDFS -> Hadoop Distributed File System
 YARN -> Yet Another Resource Negotiator
 MapReduce -> Data processing using programming
 Spark -> In-memory Data Processing
 PIG, HIVE -> Data Processing Services using Query (SQL-like)
 HBase -> NoSQL Database
 Mahout, Spark MLlib -> Machine Learning
 Apache Drill -> SQL on Hadoop
 Zookeeper -> Managing Cluster
 Oozie -> Job Scheduling
 Flume, Sqoop -> Data Ingesting Services
 Solr & Lucene -> Searching & Indexing
 Ambari -> Provision, Monitor and Maintain cluster
Now, let’s move on to the next set of questions: the Excel interview questions.
Data Analyst Interview Questions: Excel
Microsoft Excel is one of the simplest and most powerful software applications available. It lets users perform quantitative and statistical analysis through an intuitive interface for data manipulation, so much so that its usage spans many different domains and professional requirements. This is an important field that gives you a head start toward becoming a Data Analyst. So, let us quickly discuss the questions asked on this topic.
Q1. Can you tell what a waterfall chart is and when we use it?
The waterfall chart shows the positive and negative values that lead to a final result value. For example, if you are analyzing a company’s net income, you can include all the cost values in this chart. With such a chart, you can visually see how revenue turns into net income as each cost is deducted.
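Behind the chart, each floating bar simply starts where the previous one ended. This sketch computes those start and end points; the income-statement figures are hypothetical.

```python
# Hypothetical income-statement steps in $1,000s: a starting value, then costs.
steps = [("Revenue", 500), ("Cost of goods sold", -200),
         ("Operating costs", -150), ("Taxes", -40)]

def waterfall(steps):
    """Return (label, start, end) for each floating bar and the final total."""
    bars, running = [], 0
    for label, change in steps:
        start, running = running, running + change
        bars.append((label, start, running))
    return bars, running

bars, net_income = waterfall(steps)
for label, start, end in bars:
    print(f"{label:<20}{start:>6} -> {end:>6}")
print(f"{'Net income':<20}{net_income:>6}")
```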
Q2. How can you highlight cells with negative values in Excel?
You can highlight cells with negative values in Excel by using the conditional
formatting. Below are the steps that you can follow:
 Select the cells in which you want to highlight negative values.
 Go to the Home tab and click on the Conditional Formatting option.
 Go to Highlight Cells Rules and click on the Less Than option.
 In the Less Than dialog box, specify the value as 0.
Fig 3: Snapshot of Highlighting cells in Excel – Data Analyst Interview Questions
Q3. How can you clear all the formatting without actually removing the cell
contents?
Sometimes you may want to remove all the formatting and keep only the basic data. To do this, use the ‘Clear Formats’ option in the Home tab, which appears when you click the ‘Clear’ drop-down.
Fig 4: Snapshot of clearing all formatting in Excel – Data Analyst Interview Questions
Q4. What is a Pivot Table, and what are the different sections of a Pivot
Table?
A Pivot Table is a feature in Microsoft Excel that lets you quickly summarize huge datasets. It is easy to use, as creating a report only requires dragging and dropping row and column headers.
A Pivot Table is made up of four different sections:
 Values Area: Values are reported in this area.
 Rows Area: The headings present to the left of the values area.
 Columns Area: The headings at the top of the values area make up the columns area.
 Filter Area: An optional filter used to drill down into the dataset.
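To show what the four areas do, here is a rough plain-Python analogue of a pivot table that sums one field. The records and field names are hypothetical, and Excel’s pivot tables offer far more (other aggregations, subtotals, grand totals).

```python
from collections import defaultdict

# Hypothetical source rows, one dict per row of the worksheet.
sales = [
    {"region": "East", "quarter": "Q1", "product": "A", "amount": 100},
    {"region": "East", "quarter": "Q2", "product": "B", "amount": 150},
    {"region": "West", "quarter": "Q1", "product": "A", "amount": 200},
    {"region": "West", "quarter": "Q1", "product": "B", "amount": 50},
    {"region": "West", "quarter": "Q2", "product": "A", "amount": 120},
]

def pivot(records, rows, columns, values, filters=None):
    """Sum `values` for every (rows, columns) pair of the records that pass `filters`."""
    filters = filters or {}
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        if all(rec[field] == wanted for field, wanted in filters.items()):  # Filter area
            table[rec[rows]][rec[columns]] += rec[values]  # Rows x Columns -> Values area
    return {row: dict(cols) for row, cols in table.items()}

print(pivot(sales, rows="region", columns="quarter", values="amount"))
print(pivot(sales, "region", "quarter", "amount", filters={"product": "A"}))
```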
Q5. Can you make a Pivot Table from multiple tables?
Yes, we can create one Pivot Table from multiple different tables when there is a
connection between these tables.
Q6. How can we select all blank cells in Excel?
If you wish to select all the blank cells in Excel, then you can use the Go To Special
Dialog Box in Excel. Below are the steps that you can follow to select all the blank
cells in Excel.
 First, select the entire dataset and press F5. This opens the Go To dialog box.
 Click the ‘Special’ button, which opens the Go To Special dialog box.
 After that, select Blanks and click OK.
The final step will select all the blank cells in your dataset.
Q7. What are the most common questions you should ask a client before
creating a dashboard?
Well, the answer to this question varies from case to case. But here are a few common questions that you can ask before creating a dashboard in Excel:
 Purpose of the dashboard
 Different data sources
 Usage of the Excel dashboard
 The frequency at which the dashboard needs to be updated
 The version of Office the client uses
Q8. What is a Print Area and how can you set it in Excel?
A Print Area in Excel is a range of cells that you designate to print whenever you
print that worksheet. For example, if you just want to print the first 20 rows from the
entire worksheet, then you can set the first 20 rows as the Print Area.
Now, to set the Print Area in Excel, you can follow the below steps:
 Select the cells for which you want to set the Print Area.
 Then, click on the Page Layout Tab.
 Click on Print Area.
 Click on Set Print Area.
Q9. What steps can you take to handle slow Excel workbooks?
Well, there are various ways to handle slow Excel workbooks. Here are a few of them:
 Try using manual calculation mode.
 Keep all the referenced data in a single sheet.
 Use Excel tables and named ranges.
 Use helper columns instead of array formulas.
 Avoid referencing entire rows or columns.
 Convert formulas that no longer need to recalculate into values.
Q10. Can you sort multiple columns at one time?
Multiple sorting means sorting by one column and then sorting by another column while keeping the order of the first column intact. In Excel, you can sort multiple columns at one time.
To do this, use the Sort dialog box: select the data that you want to sort, click on the Data tab, and then click on the Sort icon.
In this dialog box, specify the details for the first column, then click the Add Level button to add another column to sort by.
Moving on, the next set of questions relates to statistics.
Data Analyst Interview Questions: Statistics

Statistics is a branch of mathematics dealing with the collection, organization, analysis, interpretation, and presentation of data. Statistics can be divided into two categories: Descriptive and Inferential Statistics. Because so much of data analysis rests on statistics, knowing this field well gives you a kickstart in a Data Analysis career.
Q1. What do you understand by the term Normal Distribution?
This is one of the most important and widely used distributions in statistics. Commonly known as the Bell Curve or Gaussian curve, a normal distribution is fully described by its mean, which sets its center, and its standard deviation, which sets how widely values spread around that center.
Refer to the below image.
Fig 5: Normal Distribution – Data Analyst Interview Questions
As you can see in the above image, data is usually distributed around a central value
without any bias to the left or right side. Also, the random variables are distributed in
the form of a symmetrical bell-shaped curve.
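Python’s standard library includes statistics.NormalDist, which can be used to check the familiar empirical rule: roughly 68%, 95% and 99.7% of values fall within one, two and three standard deviations of the mean.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# Probability of falling within k standard deviations of the mean.
within = {k: dist.cdf(k) - dist.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd of the mean: {share:.1%}")
```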
Q2. What is A/B Testing?
A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. Also known as split testing, it is an analytical method that estimates population parameters based on sample statistics. For example, a test can compare two versions of a web page by showing variants A and B to a similar number of visitors; the variant with the better conversion rate wins.
The goal of A/B testing is to identify whether a change to a web page improves an outcome. For example, if you have spent a large amount of money on a banner ad, you can measure the return on that investment through the ad’s click-through rate.
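One common way to decide whether B’s conversion rate really beats A’s is a two-proportion z-test. This is a sketch that assumes large samples and independent visitors; the visitor and conversion counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate if H0 (no difference) holds
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: A converts 200 of 2,000 visitors, B converts 240 of 2,000.
z, p_value = two_proportion_z_test(200, 2000, 240, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Here the p-value comes out just under 0.05, so at a 5% significance level B’s lift would count as statistically significant; whether the lift is worth acting on is still a business judgment.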
Q3. What is the statistical power of sensitivity?
Sensitivity is used to evaluate the accuracy of a classifier, which can be Logistic Regression, a Support Vector Machine, a Random Forest, etc.
If I have to define sensitivity, sensitivity (also called recall or the true positive rate) is the ratio of correctly predicted positive events to all actual positive events. True positives are the events that were actually true and that the model also predicted as true:
Sensitivity = True Positives / (True Positives + False Negatives)
Fig 6: Sensitivity Formula – Data Analyst Interview Questions
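Sensitivity is straightforward to compute from actual and predicted labels. A minimal sketch with hypothetical labels, where 1 marks a positive event:

```python
def sensitivity(actual, predicted, positive=1):
    """Sensitivity (recall, true positive rate) = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp / (tp + fn)

# Hypothetical ground truth and classifier predictions (1 = positive event).
actual    = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1, 0, 1]
print(sensitivity(actual, predicted))
```

The false positive in the sixth position does not affect sensitivity; that kind of error is captured by other metrics such as specificity or precision.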
Q4. What is the Alternative Hypothesis?
To explain the Alternative Hypothesis, you can first explain what the null hypothesis is. The Null Hypothesis is a statement, usually of no effect or no difference, that is tested for possible rejection under the assumption that any observed result is due to chance.
After this, you can say that the alternative hypothesis is the statement contrary to the Null Hypothesis. It usually holds that the observations are the result of a real effect, with some variation due to chance.
Q5. What is the difference between univariate, bivariate and
multivariate analysis?
The three are distinguished by the number of variables analyzed at a time. The differences between univariate, bivariate and multivariate analysis are as follows:
 Univariate: A descriptive statistical technique that analyzes a single variable at a time, for example its distribution, mean or spread.
 Bivariate: This analysis is used to find the relationship, or difference, between two variables at a time.
 Multivariate: The study of more than two variables at once is multivariate analysis. This analysis is used to understand the effect of several variables on the responses.
Q6. Can you tell me what are Eigenvectors and Eigenvalues?
Eigenvectors: Eigenvectors are used to understand linear transformations. In data analysis, they are usually calculated for a correlation or a covariance matrix.
For definition purposes, you can say that Eigenvectors are the directions along which a specific linear transformation acts by flipping, compressing or stretching. Formally, a nonzero vector v is an eigenvector of a matrix A if Av = λv for some scalar λ.
Eigenvalue: Eigenvalues can be referred to as the strength of the transformation, that is, the factor λ by which stretching or compression occurs in the direction of the corresponding eigenvector.
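For a symmetric 2 × 2 matrix, such as a small covariance matrix, the eigenpairs can be found by hand, which makes the definition Av = λv easy to check. The matrix below is a hypothetical example; for real data you would use a linear algebra library such as NumPy (numpy.linalg.eigh for symmetric matrices).

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0."""
    half_trace = (a + d) / 2
    det = a * d - b * b
    disc = math.sqrt(half_trace ** 2 - det)  # never negative for symmetric matrices
    pairs = []
    for lam in (half_trace + disc, half_trace - disc):
        # Row 1 of (A - lam*I) v = 0 gives (a - lam) * v1 + b * v2 = 0, so v = (b, lam - a).
        pairs.append((lam, (b, lam - a)))
    return pairs

A = [[2, 1], [1, 2]]  # hypothetical covariance-like matrix
for lam, (v1, v2) in eig_sym_2x2(2, 1, 2):
    av = (A[0][0] * v1 + A[0][1] * v2, A[1][0] * v1 + A[1][1] * v2)
    print(f"lambda={lam}, v={(v1, v2)}, Av={av}")  # Av equals lambda * v
```

The eigenvector (1, 1), with eigenvalue 3, is the direction this matrix stretches most; in principal component analysis, that direction of a covariance matrix is the first principal component.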
Q7. What is the difference between 1-Sample T-test, and 2-Sample T-test?
You can answer this question by first explaining what exactly T-tests are.
T-Tests are a type of hypothesis test used to compare means. Each test that you perform reduces your sample data to a single value, the T-value. For a 1-sample T-test, the formula is t = (x̄ - μ0) / (s / √n), where x̄ is the sample mean, μ0 is the null hypothesis value, s is the sample standard deviation, and n is the sample size.
Fig 7: Formula to calculate t-value – Data Analyst Interview Questions
Now, to explain this formula, you can use the analogy of the signal-to-noise ratio,
since the formula is in a ratio format.
Here, the numerator would be a signal and the denominator would be the noise.
So, to calculate the signal for a 1-Sample T-test, you subtract the null hypothesis value from the sample mean. If your sample mean is equal to 7 and the null hypothesis value is 2, then the signal would be equal to 5.
So, we can say that the difference between the sample mean and the null hypothesis
is directly proportional to the strength of the signal.
Now, if you observe the denominator which is the noise, in our case it is the measure
of variability known as the standard error of the mean. So, this basically indicates
how accurately your sample estimates the mean of the population or your complete
dataset.
So, you can consider noise to be inversely related to the precision of the sample: the less noise, the more precise the estimate.
Now, the ratio of the signal to the noise is the 1-sample T-value. It shows how distinguishable your signal is from the noise.
To calculate the 2-sample T-value, the signal is the difference between the two sample means (minus the difference expected under the null hypothesis, which is usually zero), and the noise is the standard error of that difference.
So, to summarize, the 1-Sample T-test determines whether a sample’s mean differs from a hypothesized value, while the 2-Sample T-test determines whether the difference between the means of 2 sample sets is statistically significant for the entire population or could be purely due to chance.
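Both statistics can be computed with Python’s statistics module. This sketch reuses the example of a sample mean of 7 against a null hypothesis value of 2; the samples themselves are hypothetical. The 2-sample version uses Welch’s form, which does not assume equal variances (a pooled-variance form is also common), and turning t into a p-value would additionally need the t distribution, for example from SciPy’s scipy.stats module.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Signal (sample mean - mu0) divided by noise (standard error of the mean)."""
    signal = statistics.mean(sample) - mu0
    noise = statistics.stdev(sample) / math.sqrt(len(sample))
    return signal / noise

def welch_t(a, b):
    """Difference of two sample means divided by the standard error of that difference."""
    noise = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / noise

sample = [5, 6, 7, 8, 9]  # mean 7
other = [3, 4, 5, 6, 7]   # mean 5
t1 = one_sample_t(sample, mu0=2)  # signal of 5
t2 = welch_t(sample, other)
print(f"1-sample t = {t1:.3f}, 2-sample t = {t2:.3f}")
```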
Q8. What are different types of Hypothesis Testing?
The different types of hypothesis testing are as follows:
 T-test: The t-test is used when the population standard deviation is unknown and the sample size is comparatively small.
 Chi-Square Test for Independence: This test is used to find out whether there is a significant association between categorical variables in a population sample.
 Analysis of Variance (ANOVA): This kind of hypothesis testing is used to analyze differences between the means of various groups. It is used much like a t-test, but for more than two groups.
 Welch’s T-test: This test checks for equality of means between two population samples without assuming that the two populations have equal variances.
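To make the chi-square test of independence concrete, here is a minimal Python sketch that computes Pearson's chi-square statistic and its degrees of freedom for a contingency table. The counts are invented for illustration, and the p-value step is left out; in practice a library routine such as `scipy.stats.chi2_contingency` would handle it.

```python
def chi_square_independence(table):
    """Pearson's chi-square statistic for a contingency table (a list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = customer segment, columns = bought / did not buy.
observed = [[10, 20],
            [30, 40]]
print(chi_square_independence(observed))
```

A larger statistic (relative to the chi-square distribution with `dof` degrees of freedom) means the observed counts drift further from what independence would predict.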
Q9. How to represent a Bayesian Network in the form of Markov Random Fields (MRF)?
To represent a Bayesian Network in the form of Markov Random Fields, you can consider the following examples:
Consider two variables, A and B, connected by an edge A → B in a Bayesian network. The probability distribution then factorizes into the probability of A times the conditional probability of B given A, that is, P(A)P(B|A). If we represent the same network as a Markov Random Field, it becomes a single potential function over the pair, φ(A, B). Refer below:
Fig 7: Representation of Bayesian Network in MRF – Data Analyst Interview Questions
Well, that was a simple example to start with. Now, moving on to a more complex example where one variable is the parent of the other two. Here, A is the parent variable and it points down to B and C. In such a case, the probability distribution equals the probability of A times the conditional probability of B given A and the conditional probability of C given A, that is, P(A)P(B|A)P(C|A). To convert this into a Markov Random Field, factorize the similarly structured undirected graph into a potential function for the A-B edge and a potential function for the A-C edge. Refer to the image below.
Fig 8: Representation of Bayesian Network in MRF – Data Analyst Interview Questions
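The two factorizations can be checked numerically. This sketch assumes binary variables and made-up probability tables for the A → B, A → C network, builds the Bayesian-network joint P(A)P(B|A)P(C|A), and compares it with a Markov Random Field built from one potential per edge. Folding P(A) into the A-B potential is just one valid choice; any split whose product matches the joint would do.

```python
from itertools import product

# Hypothetical conditional probability tables for the network A -> B, A -> C.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, a)


def bayes_joint(a, b, c):
    # Bayesian network factorization: P(A) * P(B|A) * P(C|A)
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_a[(c, a)]


def phi_ab(a, b):
    # Potential for the A-B edge; P(A) is folded in here.
    return p_a[a] * p_b_given_a[(b, a)]


def phi_ac(a, c):
    # Potential for the A-C edge.
    return p_c_given_a[(c, a)]


def mrf_joint(a, b, c):
    # Normalized product of edge potentials (Z is 1 here because the potentials come from CPTs).
    z = sum(phi_ab(x, y) * phi_ac(x, w) for x, y, w in product((0, 1), repeat=3))
    return phi_ab(a, b) * phi_ac(a, c) / z


for a, b, c in product((0, 1), repeat=3):
    assert abs(bayes_joint(a, b, c) - mrf_joint(a, b, c)) < 1e-12
print("Both factorizations define the same joint distribution.")
```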
Q10. What is the difference between variance and covariance?
Variance and covariance are two mathematical terms that are used frequently in statistics. Variance refers to how far a set of numbers is spread out from its mean. Covariance, on the other hand, refers to how two random variables change together. It is used to calculate the correlation between variables: the correlation is the covariance rescaled by the standard deviations of both variables.
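A short Python sketch makes the relationship between the three quantities explicit. It uses the sample (n - 1) versions of variance and covariance and hypothetical data. Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`, but the formulas are spelled out here so each step is visible.

```python
import math


def variance(xs):
    """Sample variance: average squared distance from the mean (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance: how two variables move together around their means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
scores = [52, 55, 61, 64, 70]    # hypothetical test scores
print(variance(hours), covariance(hours, scores), correlation(hours, scores))
```

Note that the covariance of a variable with itself is its variance, and that the correlation always lies between -1 and 1 while the covariance depends on the units of the data.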
Now, let us move on to the next set of questions which is the SAS Interview
Questions.
Data Analyst Interview Questions: SAS
Statistical Analysis System (SAS), provided by SAS Institute, is one of the most popular data analytics tools on the market. In simple words, SAS can process complex data and generate meaningful insights that help organizations make better decisions or predict possible outcomes in the near future. It lets you mine, alter, manage and retrieve data from different sources and analyze it.
Q1. What is interleaving in SAS?

Interleaving in SAS means combining individual sorted SAS data sets into one
sorted data set. You can interleave data sets using a SET statement along with a BY
statement.
In the example that you can see below, the data sets are sorted by the variable Age.
Fig 9: Example for Interleaving in SAS – Data Analyst Interview Questions
Since both data sets are already sorted by Age (for example with PROC SORT), we can interleave them on Age by writing the following DATA step:
data combined;
   set Data1 Data2;
   by Age;
run;
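Interleaving is essentially a merge of already-sorted inputs. As a Python analogy (not SAS), `heapq.merge` makes the same single pass over two lists that are each sorted by age; the rows below are hypothetical. On ties, `heapq.merge` yields items from the earlier input first, which matches how a BY group lists observations from the first data set in the SET statement before those from the second.

```python
import heapq

# Hypothetical rows from two data sets, each already sorted by Age.
data1 = [("Ann", 23), ("Raj", 31), ("Li", 45)]
data2 = [("Bo", 27), ("Eva", 31), ("Tom", 52)]

# One pass over both sorted inputs, yielding rows in Age order,
# the same idea as SET Data1 Data2; BY Age; in a SAS DATA step.
combined = list(heapq.merge(data1, data2, key=lambda row: row[1]))
for name, age in combined:
    print(name, age)
```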
Q2. What is the basic syntax style of writing code in SAS?
The basic syntax style of writing code in SAS is as follows:
1. Write the DATA statement, which names the dataset.
2. Write the INPUT statement to name the variables in the data set.
3. End every statement with a semicolon.
4. Separate the words within a statement with at least one space.
Q3. What is the difference between the Do Index, Do While and the Do
Until loop? Give examples.
To answer this question, you can first explain what exactly a Do loop is. A Do loop is used to execute a block of code repeatedly, based on a condition. You can refer to the image below to see the workflow of the Do loop.
Fig 10: Workflow of Do Loop – Data Analyst Interview Questions
 Do Index loop: An index variable runs from a start value to a stop value. The SAS statements are executed repeatedly until the index variable passes its stop value.
 Do While loop: The Do While loop uses a WHILE condition, which is evaluated at the top of each iteration. The loop executes the block of code while the condition is true and terminates as soon as the condition becomes false.
 Do Until loop: The Do Until loop uses an UNTIL condition, which is evaluated at the bottom of each iteration, so the block always executes at least once. The loop keeps executing while the condition is false and terminates once the condition becomes true.
If you have to explain with respect to code, let us say we want to calculate SUM, the running total of the loop variable VAR, and see the final value of VAR.
For the loops you can write the code as follows:
Do Index
DATA ExampleLoop;
   SUM = 0;
   DO VAR = 1 TO 10;
      SUM = SUM + VAR;
   END;
RUN;
PROC PRINT DATA = ExampleLoop;
RUN;
The output would be:
Obs  SUM  VAR
1    55   11
Table 2: Output of Do Index Loop – Data Analyst Interview Questions
Do While
DATA ExampleLoop;
   SUM = 0;
   VAR = 1;
   DO WHILE (VAR < 15);
      SUM = SUM + VAR;
      VAR + 1;
   END;
RUN;
PROC PRINT DATA = ExampleLoop;
RUN;
The output would be:
Obs  SUM  VAR
1    105  15
Table 3: Output of Do While Loop – Data Analyst Interview Questions
Do Until
DATA ExampleLoop;
   SUM = 0;
   VAR = 1;
   DO UNTIL (VAR > 15);
      SUM = SUM + VAR;
      VAR + 1;
   END;
RUN;
PROC PRINT;
RUN;
The output would be:
Obs  SUM  VAR
1    120  16
Table 4: Output of Do Until Loop – Data Analyst Interview Questions
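To double-check the three printed results, this Python sketch (an emulation, not SAS code) mirrors each loop, including where the condition is tested. Note that the DO index loop leaves VAR one step past its stop value, which is why the output shows VAR = 11.

```python
# DO VAR = 1 TO 10: the index runs from 1 to 10, then is incremented once more.
total, var = 0, 1
while var <= 10:
    total += var
    var += 1
do_index = (total, var)        # (55, 11)

# DO WHILE (VAR < 15): condition checked at the top of each iteration.
total, var = 0, 1
while var < 15:
    total += var
    var += 1
do_while = (total, var)        # (105, 15)

# DO UNTIL (VAR > 15): condition checked at the bottom, so the body runs at least once.
total, var = 0, 1
while True:
    total += var
    var += 1
    if var > 15:
        break
do_until = (total, var)        # (120, 16)

print(do_index, do_while, do_until)
```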
Q4. What is the ANYDIGIT function in SAS?
The ANYDIGIT function searches a character string for the first occurrence of any digit (0 to 9). If it finds one, it returns the position of that character in the string; if the string contains no digit, it returns 0.
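As a hedged illustration of what ANYDIGIT returns, here is a small Python function that mimics its basic behavior. The name `anydigit` is hypothetical, and the sketch ignores the optional start-position argument that the SAS function also accepts.

```python
DIGITS = "0123456789"


def anydigit(text):
    """Return the 1-based position of the first digit 0-9 in text, or 0 if there is none."""
    for position, ch in enumerate(text, start=1):
        if ch in DIGITS:
            return position
    return 0


print(anydigit("AB12"))       # 3
print(anydigit("no digits"))  # 0
```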
Q5. Can you tell the difference between VAR X1-X3 and VAR X1--X3?
When you specify a single dash between the variables, you specify a numbered range list: the consecutively numbered variables with the same prefix. If you specify a double dash between the variables, you specify a name range list: all the variables stored in the data set from the first variable through the second, in their order in the data set, whatever their names.
For example:
Consider the following data set:
Data Set: ID NAME X1 X2 Y1 X3
Then, X1-X3 would return X1 X2 X3
and X1--X3 would return X1 X2 Y1 X3
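The difference between the two list styles can be mimicked in a few lines of Python. The helpers `numbered_range` and `positional_range` are hypothetical names for this sketch; the variable order is taken from the example data set.

```python
import re

columns = ["ID", "NAME", "X1", "X2", "Y1", "X3"]   # the data set's variable order


def numbered_range(start, end):
    """X1-X3: names with a shared prefix and consecutive numeric suffixes."""
    prefix = re.match(r"\D+", start).group()
    first, last = int(start[len(prefix):]), int(end[len(prefix):])
    return [f"{prefix}{i}" for i in range(first, last + 1)]


def positional_range(columns, start, end):
    """X1--X3: every variable stored between start and end, whatever its name."""
    i, j = columns.index(start), columns.index(end)
    return columns[i:j + 1]


print(numbered_range("X1", "X3"))               # ['X1', 'X2', 'X3']
print(positional_range(columns, "X1", "X3"))    # ['X1', 'X2', 'Y1', 'X3']
```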

Q6. What is the purpose of trailing @ and @@? How do you use them?
The trailing @ and @@ are line-hold specifiers. When you use a trailing @ in the INPUT statement, it gives you the ability to read a part of the raw data line, test it, and decide how the additional data should be read from the same record.
 The single trailing @ tells the SAS system to “hold the line” for the next INPUT statement in the same iteration of the DATA step; the line is released when that iteration ends.
 The double trailing @@ tells the SAS system to “hold the line more strongly”.
An INPUT statement ending with @@ instructs the program to release the current raw data line only when there are no data values left to be read from that line. The @@, therefore, holds the input record even across multiple iterations of the DATA step.
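For intuition about @@, this Python sketch reads several observations from each raw line, which is its typical use case. The data lines are hypothetical; in SAS the same pattern is commonly written as `input Name $ Score @@;`.

```python
# Hypothetical raw data: each line holds several (name, score) pairs.
raw_lines = ["Ann 90 Bo 85 Li 77", "Raj 88"]

observations = []
for line in raw_lines:
    tokens = line.split()
    # Like INPUT ... @@: keep reading pairs from the same line until it is exhausted,
    # and only then move on to the next line.
    for i in range(0, len(tokens), 2):
        observations.append((tokens[i], int(tokens[i + 1])))

print(observations)  # [('Ann', 90), ('Bo', 85), ('Li', 77), ('Raj', 88)]
```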
Q7. What would be the result of the following SAS function (given that 31 Dec 2017 is a Saturday)?
Weeks = intck('week', '31dec2017'd, '01jan2018'd);
Years = intck('year', '31dec2017'd, '01jan2018'd);
Months = intck('month', '31dec2017'd, '01jan2018'd);
Here, INTCK counts the number of interval boundaries crossed between 31st December 2017 and 1st January 2018. By default, SAS weeks start on Sunday. If 31st December 2017 is a Saturday, then 1st January 2018 is a Sunday, the first day of the next week.
 Hence, Weeks = 1 since the two days fall in different weeks.
 Years = 1 since the two days are in different calendar years.
 Months = 1 since the two days are in different calendar months.
Note that in the actual calendar, 31st December 2017 was a Sunday, so both dates fall in the same Sunday-to-Saturday week and SAS would return Weeks = 0; Years and Months are 1 either way.
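The counting rule behind INTCK can be reproduced in Python to see both answers. This sketch assumes INTCK's default discrete method (count the interval boundaries crossed) and weeks that start on Sunday; the helper names are hypothetical, and it is not a full reimplementation of INTCK.

```python
from datetime import date, timedelta


def weeks_between(start, end):
    """Count Sunday week boundaries after start, up to and including end."""
    count, day = 0, start + timedelta(days=1)
    while day <= end:
        if day.weekday() == 6:          # Python numbering: Monday=0 ... Sunday=6
            count += 1
        day += timedelta(days=1)
    return count


def months_between(start, end):
    return (end.year - start.year) * 12 + (end.month - start.month)


def years_between(start, end):
    return end.year - start.year


d1, d2 = date(2017, 12, 31), date(2018, 1, 1)
print(d1.weekday())                                                          # 6, a Sunday
print(weeks_between(d1, d2), months_between(d1, d2), years_between(d1, d2))  # 0 1 1
```

Starting one day earlier, from Saturday 30 December 2017, the same week count returns 1, which is the scenario the question assumes.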
Q8. How does PROC SQL work?
PROC SQL processes all the observations of a table simultaneously, as a set, rather than one observation at a time as a DATA step does. The following steps occur when PROC SQL executes:
 SAS scans each statement in the SQL procedure and checks it for syntax errors.
 The SQL optimizer scans the query inside the statement and decides how the query should be executed in order to minimize the runtime.
 Any tables in the FROM clause are loaded into the data engine, where they can be accessed in memory.
 Code and calculations are executed.
 The final table is created in memory.
 The final table is sent to the output table described in the SQL statement.
Q9. If you are given an unsorted data set, how will you read the last
observation to a new dataset?
We can read the last observation into a new dataset using the END= option of the SET statement.
For example:
data example.newdataset;
   set example.olddataset end=last;
   if last;
run;
Here, newdataset is the new data set to be created and olddataset is the existing data set. last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation, so the subsetting IF keeps only that observation.
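The END= flag works because SAS knows when it has read the final observation, without sorting or counting first. As a loose Python analogy, the hypothetical `last_observation` helper below looks one item ahead to detect the end of any iterable in a single pass.

```python
_END = object()  # sentinel meaning "no more rows"


def last_observation(rows):
    """Return the final item of any iterable in one pass (None if it is empty)."""
    iterator = iter(rows)
    current = next(iterator, _END)
    while current is not _END:
        following = next(iterator, _END)
        if following is _END:          # like last = 1: this is the final read
            return current
        current = following
    return None


unsorted_rows = ({"id": 3}, {"id": 1}, {"id": 2})   # hypothetical, unsorted
print(last_observation(unsorted_rows))             # {'id': 2}
```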
Q10. What are the differences between the SUM function and using the “+” operator?
The SUM function returns the sum of its non-missing arguments, whereas the “+” operator returns a missing value if any of the arguments are missing. Consider the following example.
Example:
data exampledata1;
   input a b c;
   cards;
44 4 4
34 3 4
34 3 4
. 1 2
24 . 4
44 4 .
25 3 1
;
run;
data exampledata2;
   set exampledata1;
   x = sum(a,b,c);
   y = a+b+c;
run;
In the output, the value of y is missing for the 4th, 5th, and 6th observations because we used the “+” operator to calculate y.
x   y
52  52
41  41
41  41
3   .
28  .
48  .
29  29
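The missing-value behavior can be emulated in Python, with `None` standing in for a SAS missing value. The helpers `sas_sum` and `plus` are hypothetical names for this sketch; it reproduces the x and y columns above.

```python
rows = [(44, 4, 4), (34, 3, 4), (34, 3, 4), (None, 1, 2),
        (24, None, 4), (44, 4, None), (25, 3, 1)]   # None stands in for a SAS missing value


def sas_sum(*values):
    """Like SUM(): add the non-missing arguments (missing only if every argument is missing)."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def plus(*values):
    """Like a + b + c: any missing operand makes the whole result missing."""
    return None if any(v is None for v in values) else sum(values)


for a, b, c in rows:
    print(sas_sum(a, b, c), plus(a, b, c))
```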
If you wish to see more questions on SAS, then refer to a full-fledged article on SAS Interview Questions.
Now, let us move on to the next set of questions which is the SQL Interview
Questions.
Data Analyst Interview Questions: SQL
Relational database management systems (RDBMS) are among the most commonly used databases to date, and therefore SQL skills are indispensable in many job roles, such as data analyst. Knowing Structured Query Language boosts your path to becoming a data analyst, as it makes clear in your interviews that you know how to handle databases.
Q1. What is the default port for SQL Server?
The default TCP port assigned by the Internet Assigned Numbers Authority (IANA) for Microsoft SQL Server is 1433.
Q2. What do you mean by DBMS? What are its different types?
A Database Management System (DBMS) is a software application that interacts with the user, applications and the database itself to capture and analyze data. The data stored in the database can be modified, retrieved and deleted, and can be of any type, like strings, numbers, images, etc.
There are mainly 4 types of DBMS, which are Hierarchical, Relational, Network, and Object-Oriented DBMS.
 Hierarchical DBMS: As the name suggests, this type of DBMS has a predecessor-successor style of relationship. So, it has a structure similar to that of a tree, wherein the nodes represent records and the branches of the tree represent fields.
 Relational DBMS (RDBMS): This type of DBMS uses a structure that allows users to identify and access data in relation to another piece of data in the database.
 Network DBMS: This type of DBMS supports many-to-many relationships, wherein multiple member records can be linked.
 Object-oriented DBMS: This type of DBMS uses small, individual pieces of software called objects. Each object contains a piece of data and the instructions for the actions to be done with the data.
Q3. What is ACID property in a database?
ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. These properties ensure that database transactions are processed reliably. Each term is defined below.
 Atomicity: A transaction is treated as a single unit of work that either succeeds completely or fails completely. Even if only one operation within the transaction fails, the entire transaction fails and the database state is left unchanged.
 Consistency: A transaction can only bring the database from one valid state to another, so the data always meets all the defined validation rules.
 Isolation: Isolation keeps transactions separated from each other until they are finished, so concurrent transactions do not see each other's intermediate changes.
 Durability: Once a transaction is committed, it is never lost. The database keeps track of changes in such a way that, even after a power loss, crash or any other error, the server can recover from an abnormal termination without losing committed data.
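Atomicity is easy to observe with Python's built-in `sqlite3` module. In this minimal sketch (the table, names and amounts are hypothetical), a transfer updates two rows inside one transaction; the second update violates a CHECK constraint, so the whole transaction is rolled back and the first update disappears too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")  # violates CHECK
except sqlite3.IntegrityError as exc:
    print("Transaction rolled back:", exc)

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # bob's +200 was undone along with the failed update
```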
Q4. What is Normalization? Explain different types of Normalization with advantages.
Normalization is the process of organizing data to avoid duplication and redundancy. There are many successive levels of normalization, called normal forms. Each normal form builds on the previous one. The first three normal forms are usually adequate.
 First Normal Form (1NF) – No repeating groups within rows.
 Second Normal Form (2NF) – Every non-key (supporting) column value depends on the whole primary key.
 Third Normal Form (3NF) – Every non-key column depends solely on the primary key and not on any other non-key (supporting) column.
 Boyce-Codd Normal Form (BCNF) – BCNF is the advanced version of 3NF. A table is in BCNF if it is in 3NF and, for every non-trivial functional dependency X -> Y, X is a superkey of the table.
Some of the advantages are:
 Better Database organization
 More Tables with smaller rows
 Efficient data access
 Greater Flexibility for Queries
 Quickly find the information
 Easier to implement Security
 Allows easy modification
 Reduction of redundant and duplicate data
 More Compact Database
 Ensure Consistent data after modification
Q5. What are the different types of Joins?
The various types of joins used to retrieve data between tables are Inner Join, Left Join, Right Join and Full Outer Join. Refer to the image on the right side.
 Inner join: Inner Join in MySQL is the most common type of join. It returns only the rows that have matching values in both tables, that is, the rows where the join condition is satisfied.
 Left Join: Left Join in MySQL returns all the rows from the left table, plus the matching rows from the right table; where there is no match, the right table's columns are NULL.
 Right Join: Right Join in MySQL returns all the rows from the right table, plus the matching rows from the left table; where there is no match, the left table's columns are NULL.
 Full Join: Full join returns all the rows from both the left-hand and the right-hand tables, matching rows where the join condition is met and filling in NULL where there is no match. (MySQL does not support FULL JOIN directly; it is commonly emulated with a UNION of a LEFT JOIN and a RIGHT JOIN.)
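The difference between an inner and a left join can be tried out with Python's built-in `sqlite3` module. The tables below are hypothetical. The sketch sticks to INNER and LEFT joins because SQLite only supports RIGHT and FULL OUTER JOIN from version 3.39.0 onward, and the SQLite bundled with Python may be older.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES (1, 'Asha', 10), (2, 'Ben', 20), (3, 'Chen', NULL);
    INSERT INTO departments VALUES (10, 'Sales'), (30, 'HR');
""")

# Only rows with a match in both tables.
inner = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id
""").fetchall()

# Every employee; dept_name is NULL (None) when there is no matching department.
left = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(inner)  # [('Asha', 'Sales')]
print(left)   # [('Asha', 'Sales'), ('Ben', None), ('Chen', None)]
```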
Q6. Suppose you have a table of employee details consisting of the columns (employeeId, employeeName), and you want to fetch alternate records from the table. How do you think you can perform this task?
You can fetch alternate tuples by using the row number of each tuple. Say we want to display the employeeId of the even-numbered records; then you can use the MOD function and write the following query:
Select employeeId from (Select ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId from employee) AS numbered where mod(rownumber, 2) = 0;
where ‘employee’ is the table name.
Similarly, if you want to display the employeeId of the odd-numbered records, then you can write the following query:
Select employeeId from (Select ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId from employee) AS numbered where mod(rownumber, 2) = 1;
MOD is available in MySQL, Oracle and PostgreSQL; in SQL Server, use the % operator instead.
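Here is a runnable version of the alternate-records idea using Python's built-in `sqlite3` module. The employee rows are hypothetical, rows are numbered in employeeId order with `ROW_NUMBER()` (which needs SQLite 3.25 or later), and SQLite's `%` operator stands in for MOD, since `mod()` is not available in every SQLite build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, employeeName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(101, "Asha"), (102, "Ben"), (103, "Chen"), (104, "Dana"), (105, "Eli")])

# Number the rows, then keep those whose row number has the requested parity.
query = """
    SELECT employeeId
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId
          FROM employee) AS numbered
    WHERE rownumber % 2 = ?
    ORDER BY rownumber
"""
even = [row[0] for row in conn.execute(query, (0,))]
odd = [row[0] for row in conn.execute(query, (1,))]
print(even, odd)  # [102, 104] [101, 103, 105]
```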
Q7. Consider the following two tables.
Table 5: Example Table – Data Analyst Interview Questions
Now, write a query to get the list of customers who took the course
more than once on the same day. The customers should be grouped
by customer, and course and the list should be ordered according to
the most recent date.
SELECT c.Customer_Id,
       CustomerName,
       Course_Id,
       Course_Date,
       COUNT(Customer_Course_Id) AS count
FROM customers c
JOIN course_details d ON d.Customer_Id = c.Customer_Id
GROUP BY c.Customer_Id,
         CustomerName,
         Course_Id,
         Course_Date
HAVING COUNT(Customer_Course_Id) > 1
ORDER BY Course_Date DESC;
Table 6: Output Table – Data Analyst Interview Questions
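Because the contents of Table 5 are not reproduced in the text, this sketch invents a few rows with the same column names and runs the query with Python's built-in `sqlite3` module. Dates are stored as ISO `YYYY-MM-DD` strings so that sorting them as text matches sorting by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (Customer_Id INTEGER, CustomerName TEXT);
    CREATE TABLE course_details (Customer_Course_Id INTEGER, Customer_Id INTEGER,
                                 Course_Id INTEGER, Course_Date TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO course_details VALUES
        (1, 1, 7, '2023-01-10'), (2, 1, 7, '2023-01-10'),
        (3, 2, 8, '2023-02-01'), (4, 2, 8, '2023-02-01'), (5, 2, 8, '2023-02-01'),
        (6, 2, 9, '2023-03-05');
""")

rows = conn.execute("""
    SELECT c.Customer_Id, CustomerName, Course_Id, Course_Date,
           COUNT(Customer_Course_Id) AS count
    FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
    GROUP BY c.Customer_Id, CustomerName, Course_Id, Course_Date
    HAVING COUNT(Customer_Course_Id) > 1
    ORDER BY Course_Date DESC
""").fetchall()
print(rows)  # [(2, 'Ben', 8, '2023-02-01', 3), (1, 'Asha', 7, '2023-01-10', 2)]
```

Ben's single enrollment in course 9 is dropped by the HAVING clause, since it occurs only once on that day.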
Q8. Consider the below Employee_Details table. Here the table has various features such as Employee_Id, EmployeeName, Age, Gender, and Shift. The Shift has m = Morning Shift and e = Evening Shift. Now, you have to swap the ‘m’ and the ‘e’ values with a single update query.
You can write the below query:
UPDATE Employee_Details SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END;
Table 8: Output Table – Data Analyst Interview Questions
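The swap can be tried in Python's built-in `sqlite3` module. This sketch uses hypothetical rows, spells out both WHEN branches, and ends with `ELSE Shift` so that a NULL or any unexpected value is left unchanged instead of being turned into 'm'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (Employee_Id INTEGER, Shift TEXT)")
conn.executemany("INSERT INTO Employee_Details VALUES (?, ?)",
                 [(1, "m"), (2, "e"), (3, "m"), (4, None)])

# One statement swaps both values; each row is evaluated against its original Shift.
conn.execute("""
    UPDATE Employee_Details
    SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END
""")
shifts = conn.execute("SELECT Employee_Id, Shift FROM Employee_Details ORDER BY Employee_Id").fetchall()
print(shifts)  # [(1, 'e'), (2, 'm'), (3, 'e'), (4, None)]
```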
Q9. Write a SQL query to get the third highest salary of an employee from the Employee_Details table as illustrated below.
SELECT TOP 1 Salary
FROM (
    SELECT TOP 3 Salary
    FROM Employee_Details
    ORDER BY Salary DESC) AS emp
ORDER BY Salary ASC;
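`TOP` is SQL Server syntax; databases such as SQLite, MySQL and PostgreSQL use `LIMIT` instead. This sketch runs the same top-3-then-smallest idea in Python's built-in `sqlite3` module with hypothetical salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (EmployeeName TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee_Details VALUES (?, ?)",
                 [("Asha", 70000), ("Ben", 52000), ("Chen", 91000), ("Dana", 64000), ("Eli", 48000)])

# Take the 3 highest salaries, then the smallest of those.
third = conn.execute("""
    SELECT Salary FROM (
        SELECT Salary FROM Employee_Details ORDER BY Salary DESC LIMIT 3
    ) AS emp
    ORDER BY Salary ASC
    LIMIT 1
""").fetchone()[0]
print(third)  # 64000
```

If salaries can repeat, selecting `DISTINCT Salary` in the inner query gives the third highest distinct salary rather than the salary on the third row.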
Q10. What is the difference between NVL and NVL2 functions in SQL?
NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are Oracle SQL functions that check whether the value of exp1 is null.
If we use NVL(exp1, exp2) and exp1 is not null, the value of exp1 is returned; otherwise, the value of exp2 is returned. exp2 should be of the same data type as exp1, or of a type that can be implicitly converted to it.
Similarly, if we use NVL2(exp1, exp2, exp3) and exp1 is not null, exp2 is returned; otherwise, the value of exp3 is returned.
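NVL and NVL2 are Oracle functions, so this sketch uses SQLite equivalents through Python's built-in `sqlite3` module: `IFNULL` behaves like NVL, and a `CASE WHEN ... IS NOT NULL` expression reproduces NVL2. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, commission INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("Asha", 500), ("Ben", None)])

rows = conn.execute("""
    SELECT name,
           IFNULL(commission, 0) AS nvl_result,      -- like NVL(commission, 0)
           CASE WHEN commission IS NOT NULL THEN 'paid' ELSE 'none' END
               AS nvl2_result                         -- like NVL2(commission, 'paid', 'none')
    FROM emp
    ORDER BY name
""").fetchall()
print(rows)  # [('Asha', 500, 'paid'), ('Ben', 0, 'none')]
```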
If you wish to see more questions on SQL, then refer to a full-fledged article on SQL Interview Questions.
Now, moving on to the next set of questions, i.e. the Tableau Interview Questions.
Data Analyst Interview Questions: Tableau
Tableau is business intelligence software that allows anyone to connect to their data, visualize it, and create interactive, shareable dashboards. Knowing Tableau will enhance your understanding of data analysis and data visualization.
Q1. What are the differences between Tableau and Power BI?
Parameters | Tableau | Power BI
Cost | Tableau may cost you around $1000 for a yearly subscription | $100 for a yearly subscription
Licensing | Tableau is not free | 3 months trial period
Ease of use | Tableau offers variety when it comes to implementation and consulting services | Power BI is easier to implement
Visualization | Scales better to larger datasets | It is easier to upload data sets
Year of establishment | 2003 | 2013
Cost | High | Low
Application | Ad-hoc analysis | Dashboard
Users | Analysts | Technical / non-technical people
Support level | High | Low
Scalability (large data sets) | Very good | Good
Licensing | Flexible | Rigid
Overall functionality | Very good | Good
Infrastructure | Flexible | Software as a service
Table 10: Differences between Tableau and Power BI – Data Analyst Interview Questions
Q2. What is a dual axis?
Dual Axis is a feature provided by Tableau. It helps users view the scales of two measures in the same graph. Websites such as Indeed.com make use of dual axes to show the comparison between two measures and the growth of these two measures over a specific set of years. Dual axes let you compare multiple measures at once, with two independent axes layered on top of one another. Refer to the image below to see how it looks.
Fig 11: Representation of Dual Axis – Data Analyst Interview Questions
Q3. What is the difference between joining and blending in Tableau?
Joining is the term used when you combine data from the same source, for example, worksheets in an Excel file or tables in an Oracle database.
Blending, by contrast, requires two completely defined data sources in your report.
Q4. How to create a calculated field in Tableau?
To create a calculated field in Tableau, you can follow the below steps:
 Click the drop-down to the right of Dimensions on the Data pane and select “Create > Calculated Field” to open the calculation editor.
 Name the new field and create a formula.
Take a look at the snapshot below:

Fig 12: Snapshot of calculated fields – Data Analyst Interview Questions
Q5. How to view underlying SQL Queries in Tableau?
To view the underlying SQL Queries in Tableau, we mainly have two options:
 Use the Performance Recording Feature: You have to create a performance recording to record information about the main events as you interact with the workbook. Users can then view the performance metrics in a workbook created by Tableau.
Help -> Settings and Performance -> Start Performance Recording.
Help -> Settings and Performance -> Stop Performance Recording.
 Review the Tableau Desktop Logs: You can review the Tableau Desktop logs located at C:\Users\My Documents\My Tableau Repository. For a live connection to the data source, you can check the log.txt and tabprotosrv.txt files. For an extract, check the tdeserver.txt file.
Q6. Design a view in a map such that if a user selects any country, the states under that country show profit and sales.
According to your question, you must have country, state, profit and sales fields in your dataset.
 Double-click on the country field.
 Drag the state and drop it into the Marks card.
 Drag the sales and drop it into size.
 Drag profit and drop it into color.
 Click on size legend and increase the size.
 Right-click on the country field and select show quick filter.
 Select any country now and check the view.
Q7. What is the difference between heat map and tree map?
A heat map is used for comparing categories with color and size. With heat maps, you can compare two different measures together. A treemap is a powerful visualization that does the same as a heat map. Apart from that, it is also used for illustrating hierarchical data and part-to-whole relationships.
Fig 13: Difference Between Heat Map and Tree Map – Data Analyst Interview Questions
Q8. What is aggregation and disaggregation of data?
Aggregation of data: Aggregation of data refers to the process of viewing numeric values, or measures, at a higher and more summarized level. When you place a measure on a shelf, Tableau automatically aggregates your data. You can tell whether an aggregation has been applied to a field by looking at the function, because the function always appears in front of the field’s name when it is placed on a shelf.
Example: The Sales field becomes SUM(Sales) after aggregation.
You can aggregate measures using Tableau only for relational data sources. Multidimensional data sources contain aggregated data only. In Tableau, multidimensional data sources are supported only on Windows.
Disaggregation of data: Disaggregation of data allows you to view every row of the data source, which can be useful when analyzing measures.
Example: Consider a scenario where you are analyzing results from a product satisfaction survey, with the Age of participants along one axis. You can aggregate the Age field to determine the average age of participants, or you can disaggregate the data to determine the age at which the participants were most satisfied with their product.
Q9. Can you tell how to create stories in Tableau?
Stories are used to narrate a sequence of events or make a business use-case.
The Tableau Dashboard provides various options to create a story. Each story point can be based on a different view or dashboard, or the entire story can be based on the same visualization, just seen at different stages, with different marks filtered and annotations added.
To create a story in Tableau you can follow the below steps:
 Click the New Story tab.
 In the lower-left corner of the screen, choose a size for your story. Choose from one of the predefined sizes, or set a custom size, in pixels.
 By default, your story gets its title from its sheet name. To edit it, double-click the title. You can also change your title’s font, color, and alignment. Click Apply to view your changes.
 To start building your story, drag a sheet from the Story tab on the left and drop it into the center of the view.
 Click Add a caption to summarize the story point.
 To highlight a key takeaway for your viewers, drag a text object over to the story worksheet and type your comment.
 To further highlight the main idea of this story point, you can change a filter or sort on a field in the view, then save your changes by clicking Update above the navigator box.
Q10. Can you tell how to embed views onto Web pages?
You can embed interactive Tableau views and dashboards into web pages, blogs,
wiki pages, web applications, and intranet portals. Embedded views update as the
underlying data changes, or as their workbooks are updated on Tableau Server.
Embedded views follow the same licensing and permission restrictions used
on Tableau Server. That is, to see a Tableau view that’s embedded in a web page,
the person accessing the view must also have an account on Tableau Server.
Alternatively, if your organization uses a core-based license on Tableau Server, a
Guest account is available. This allows people in your organization to view and
interact with Tableau views embedded in web pages without having to sign in to the
server. Contact your server or site administrator to find out if the Guest user is
enabled for the site you publish to.
You can do the following to embed views and adjust their default appearance:
 Get the embed code provided with a view: The Share button at the top of each view includes embed code that you can copy and paste into your webpage. (The Share button doesn’t appear in embedded views if you change the showShareOptions parameter to false in the code.)
 Customize the embed code: You can customize the embed code using
parameters that control the toolbar, tabs, and more. For more information, see
Parameters for Embed Code.
 Use the Tableau JavaScript API: Web developers can use Tableau JavaScript
objects in web applications. To get access to the API, documentation, code
examples, and the Tableau developer community, see the Tableau Developer
Portal.
If you wish to see more questions on Tableau, then refer to a full-fledged article on Tableau Interview Questions.
Now, moving on to something more interesting, I have put together a set of 5 puzzles that are commonly asked in Data Analyst interviews.
Data Analyst Interview Questions: Puzzles

The analytics industry relies predominantly on professionals who not only excel at the various data analysis tools available in the market but also have excellent problem-solving skills. The most important skill you need is your approach to the problem, and you should be able to explain that approach clearly to the interviewer.
So let’s get started!

Q1. There are 3 mislabeled jars: the first contains only black balls, the second only white balls, and the third a mixture of white and black balls. Every jar carries the wrong label. Now, you can pick as many balls as required to label each jar correctly.
Tell the minimum number of balls to be picked up in this process of labeling the jars.
The key observation is that every label is wrong, so the jar labeled Black + White must contain either only black balls or only white balls, but not both. Pick one ball from the jar labeled Black + White, and let us assume it is a black ball. That jar must then be the Black jar. The jar labeled White cannot contain only white balls (its label is wrong) and cannot be the Black jar (already identified), so it must be the Black + White jar. The remaining jar, labeled Black, must be the White jar. The case where you draw a white ball works the same way. So, by picking up just one ball, you can correctly label all the jars.
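The argument can also be checked by brute force. This Python sketch enumerates every way of assigning contents to the three jars so that no label is correct, then keeps only the assignments consistent with the color of one ball drawn from the jar labeled Black + White ("BW"). Exactly one assignment survives for either color.

```python
from itertools import permutations

labels = ["B", "W", "BW"]   # jar i carries labels[i]; every label is wrong


def consistent_assignments(drawn):
    """Contents assignments with every label wrong, given one ball drawn from the jar labeled 'BW'."""
    result = []
    for contents in permutations(["B", "W", "BW"]):
        if any(c == l for c, l in zip(contents, labels)):
            continue                                  # some label would be correct: not allowed
        if drawn not in contents[labels.index("BW")]:
            continue                                  # that jar could not have produced this color
        result.append(contents)
    return result


for ball in ["B", "W"]:
    print(ball, consistent_assignments(ball))
```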
Q2. A pumpkin must be divided into 8 equal pieces. You can make only 3 cuts.
How do you think you will make this possible?
The approach to answering this question is simple. Cut the pumpkin horizontally through the center, then make 2 vertical cuts through the center that intersect each other at right angles. This gives you 8 equal pieces.
Q3. There are 5 lanes on a race track. One needs to find out the 3 fastest horses among a total of 25.
Determine the minimum number of races to be conducted in order to find the fastest three horses.
Now, you can start solving the problem by considering the number of horses racing. Since there are 25 horses and 5 lanes, 5 races are conducted initially, with each group having 5 horses. Next, a sixth race is conducted between the winners of the first 5 races to rank them; let the top 3 be X1, Y1, and Z1, where X1 is the fastest.
Since X1 beat all the other group winners, X1 is the fastest horse among all 25. But how do you find the 2nd and the 3rd fastest? We cannot assume that Y1 and Z1 are 2nd and 3rd, since the other horses from X1’s group could be faster than Y1 and Z1. So, to determine this, a 7th race is conducted between horses Y1 and Z1, the horses that came 2nd and 3rd in X1’s group (X2, X3), and the second horse from Y1’s group (Y2). No other horse can still be in the top 3.
So, the horses that finish 1st and 2nd in the 7th race are the 2nd and the 3rd fastest horses overall, and the minimum number of races is 7.
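A quick simulation can confirm the 7-race strategy. This Python sketch assumes each horse has a fixed, distinct speed and that a race simply ranks the horses it contains; it compares the result against the true top 3 for many random speed assignments.

```python
import random


def top3_in_seven_races(speeds):
    """speeds[h] is horse h's speed (25 distinct values); returns the 3 fastest using 7 races."""
    def race(horses):
        # One race of at most 5 horses: finishing order, fastest first.
        return sorted(horses, key=lambda h: speeds[h], reverse=True)

    groups = [race(range(g * 5, g * 5 + 5)) for g in range(5)]      # races 1-5
    winners = race([group[0] for group in groups])                   # race 6
    by_winner = {group[0]: group for group in groups}
    x, y, z = (by_winner[w] for w in winners[:3])                    # groups of X1, Y1, Z1
    final = race([x[1], x[2], y[0], y[1], z[0]])                     # race 7: X2, X3, Y1, Y2, Z1
    return [x[0], final[0], final[1]]


rng = random.Random(42)
for _ in range(1000):
    speeds = rng.sample(range(1000), 25)                             # distinct random speeds
    expected = sorted(range(25), key=lambda h: speeds[h], reverse=True)[:3]
    assert top3_in_seven_races(speeds) == expected
print("7 races found the top 3 in every trial")
```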
Q4. Consider 10 stacks of 10 coins each, where each coin weighs 10 grams. However, one of the 10 stacks is defective, and this defective stack contains coins of 9 grams each.
Find the minimum number of weighings needed to identify the defective stack.
The solution to this puzzle is very simple. Pick 1 coin from the 1st stack, 2 coins from the 2nd stack, 3 coins from the 3rd stack and so on, up to 10 coins from the 10th stack. That adds up to 55 coins.
If none of the coins were defective, the total weight would be 55*10 = 550 grams. If stack 1 is defective, the total weight is 1 gram less than 550 grams, that is, 549 grams. Similarly, if stack 2 is defective, the total weight is 2 grams less than 550 grams, that is, 548 grams. You can work out the other 8 cases the same way: the shortfall from 550 grams equals the number of the defective stack.
So, just one weighing is needed to identify the defective stack.
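The weighing trick is easy to simulate. In this sketch, `make_scale` is a hypothetical stand-in for the single weighing, and `find_defective_stack` reads the defective stack off the shortfall from 550 grams.

```python
def find_defective_stack(weigh):
    """Take k coins from stack k (k = 1..10), weigh them once, read the stack off the shortfall."""
    coins_taken = {stack: stack for stack in range(1, 11)}   # 1 + 2 + ... + 10 = 55 coins
    expected = 10 * sum(coins_taken.values())                 # 550 g if every coin were good
    return expected - weigh(coins_taken)                      # each light coin is 1 g short


def make_scale(defective):
    """Hypothetical scale for the puzzle: coins weigh 10 g, or 9 g in the defective stack."""
    def weigh(coins_taken):
        return sum(n * (9 if stack == defective else 10) for stack, n in coins_taken.items())
    return weigh


for d in range(1, 11):
    assert find_defective_stack(make_scale(d)) == d
print("One weighing identifies the defective stack every time")
```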
Q5. Two buses running towards each other on the same track are each moving at a speed of 40 km/hr and are separated by 80 km. A bird takes flight from bus A and flies towards bus B at a constant speed of 100 km/hr. Once it reaches bus B, it turns and starts flying back towards bus A. The bird keeps flying back and forth until the buses collide.
Find the distance traveled by the bird.
The solution to the above problem can be as follows:
 The speed at which the two buses approach each other = (40 + 40) km/hr = 80 km/hr.
 The time taken for the buses to collide = 80 km / 80 km/hr = 1 hour.
 The total distance traveled by the bird = 100 km/hr * 1 hr = 100 km.
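The shortcut avoids summing an infinite series, but the series itself can be added up numerically to confirm the answer. This Python sketch follows the bird leg by leg, using the speeds and distance from the puzzle, until the gap between the buses is negligible.

```python
def bird_distance(gap_km=80.0, bus_speed=40.0, bird_speed=100.0, tol=1e-12):
    """Sum the bird's back-and-forth legs until the gap between the buses vanishes."""
    total = 0.0
    while gap_km > tol:
        # The bird and the oncoming bus close the gap at bird_speed + bus_speed.
        t = gap_km / (bird_speed + bus_speed)
        total += bird_speed * t
        gap_km -= 2 * bus_speed * t        # both buses kept moving during this leg
    return total


print(bird_distance())  # approximately 100 km, matching 100 km/hr * 1 hr
```

Each leg shrinks the gap by the same factor (3/7 with these numbers), so the legs form a geometric series whose sum is exactly the 100 km found above.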
So, that’s an end to this article on Data Analyst Interview Questions.
I hope you found this Data Analyst Interview Questions blog informative. The questions covered in this blog are among the most sought-after questions in data analyst interviews and will help you crack your interviews.
Data Analyst Masters Program follows a set structure with 4 core courses and 7 electives spread across 15 weeks. It makes you an expert in key technologies related to Data Analytics. It is a structured learning path recommended by leading industry experts that ensures you transform into an expert Data Analytics professional while learning to use tools such as R, SAS, Tableau, QlikView, Advanced Excel, Machine learning, etc. Individual courses focus on specialization in one or two specific skills; however, if you intend to become a Data Analyst, then this is the path for you to follow.
Table of Contents
What is a Data Analyst?
Data Analyst Roles and Responsibilities Include:
Data Analyst Skills Required
Key Skills for a Data Analyst
Data Analyst Qualifications: What Does it Take
Data has changed the face of our world over the last ten years. The emails and text messages we share and the YouTube videos we watch are part of the nearly 2.5 quintillion bytes of data generated daily across the world. Businesses, both large and small, deal with massive data volumes, and a lot depends on their ability to glean meaningful insights from them. A data analyst does precisely that: they interpret statistical data and turn it into useful information that businesses and organizations can use for critical decision-making.
Organizations in all sectors increasingly depend on data to make critical
business decisions, such as which products to make, which markets to enter, what
investments to make, or which customers to target. They also use data to
identify weak areas in the business that need to be addressed.
As a result, data analyst has become one of the most in-demand jobs worldwide,
and data analysts are sought after by the world’s biggest organizations. Data analyst
salaries and perks reflect the demand for this role, which is likely to keep
growing rapidly.
So if you have the skills required to become a data analyst, you would be remiss not
to take advantage of this opportunity.
What is a Data Analyst?
A data analyst can be defined as someone who has the knowledge and
skills to turn raw data into information and insight that can be used to make
business decisions.
Data Analyst Roles and Responsibilities Include:
 Using automated tools to extract data from primary and secondary sources
 Removing corrupted data and fixing coding errors and related problems
 Developing and maintaining databases and data systems, and reorganizing data into a readable
format
 Performing analysis to assess the quality and meaning of data
 Filtering data by reviewing reports and performance indicators to identify and correct code
problems
 Using statistical tools to identify, analyze, and interpret patterns and trends in complex
data sets that can help with diagnosis and prediction
 Assigning numerical values to essential business functions so that business performance
can be assessed and compared over periods of time
 Analyzing local, national, and global trends that impact both the organization and the
industry
 Preparing reports for management that describe trends, patterns, and predictions using
relevant data
 Working with programmers, engineers, and management heads to identify process
improvement opportunities, propose system modifications, and devise data
governance strategies
 Preparing final analysis reports that help stakeholders understand the data-analysis
steps, enabling them to make important decisions based on various facts and trends
Another integral element of the data analyst job description is exploratory data
analysis (EDA). In an EDA project, the analyst scrutinizes the data to recognize and
identify patterns, and then uses data modeling techniques to summarize the data's
overall features.
Data Analyst Skills Required
A successful data analyst needs a combination of technical as well as
leadership skills. A background in Mathematics, Statistics, Computer Science,
Information Management, or Economics can serve as a solid foundation to build your
career as a data analyst.
Key Skills for a Data Analyst
 Strong mathematical skills to help collect, measure, organize, and analyze data
 Knowledge of programming and query languages such as SQL, R, MATLAB, and Python, and
of databases such as Oracle
 Technical proficiency in database design and development, data models, data mining
techniques, and segmentation
 Experience with reporting packages such as Business Objects, programming
(JavaScript, XML, or ETL frameworks), and databases
 Proficiency in statistics and in statistical packages such as Excel, SPSS, and SAS for
analyzing datasets
 Adept at using data processing platforms like Hadoop and Apache Spark
 Knowledge of data visualization software like Tableau and Qlik
 Knowledge of how to create and apply the most accurate algorithms to datasets in order
to find solutions
 Problem-solving skills
 Accuracy and attention to detail
 Adept at queries, writing reports, and making presentations
 Team-working skills
 Verbal and written communication skills
 Proven working experience in data analysis

Data Analyst Qualifications: What Does it Take
You need more than technical expertise to excel in a career in data analytics. A
bachelor’s degree in a field that emphasizes statistical and analytical skills is
desired. Students from a mathematics, statistics, computer science, or economics
background usually have an edge in the data analyst career path. However, a
postgraduate course in data analytics, such as a Data Analytics Bootcamp, can make you
an industry-ready professional.
You will also need soft skills such as:
 excellent communication and presentation skills
 the ability to think critically
 creativity
 a systematic and logical approach to problem-solving
 teamwork skills
Data Analyst Salary: How Much Does a Data Analyst Make?
The data analyst salary depends on a number of factors, such as educational
qualification, location, relevant experience, and skill set.
The average annual salary of an experienced data analyst can range from
approximately $60,000 to $140,000. Financial and technology firms tend to offer
higher-than-average pay packages.
The cross-market average data analyst salary is approximately $73,528.
Data analysts typically move on to higher positions like senior data analysts, data
scientists, data analytics managers, business analysts, etc. Greater responsibility
comes with a substantial pay rise as well. It is estimated that the average annual
salary of data scientists starts at around $95,000, while that of analytical managers
begins at approximately $106,000.
Top Companies Hiring Data Analysts
If you are looking for a data analyst job, you can choose from more than 86 thousand
open jobs worldwide. Shocking, isn’t it? This is mainly because nearly all industries
benefit from data analysis. Today, the data analyst job description is branching off
into various specializations like finance, healthcare, business, marketing, and e-commerce.
Presently, business intelligence companies have the highest number of job openings
for data analysts in the US and Europe, followed by finance, sharing economy
services, healthcare, and entertainment companies.
Some of the top global companies hiring data analysts include Amazon, Netflix,
Google, Intuit, Facebook, Apple, and Cisco Systems. Smaller companies include Focus
KPI, Affinity Solutions, and Norgate Technology. Financial giants like PayPal and
Barclays are also hiring data analysts across various departments.
What is Data Analysis?
Data analysis is the process of analyzing, modeling, and interpreting data to
draw insights or conclusions. With the insights gained, informed decisions can be
made. It is used by every industry, which is why data analysts are in high demand. A
data analyst's core responsibility is to work with large amounts of data and
search for hidden insights. By interpreting a wide range of data, data analysts help
organizations understand the current state of the business.
Data Analyst Interview Questions for Freshers

1. What are the responsibilities of a Data Analyst?
Some of the responsibilities of a data analyst include:

 Collecting and analyzing data using statistical techniques and reporting the results
accordingly.
 Interpreting and analyzing trends or patterns in complex data sets.
 Establishing business needs together with business or management teams.
 Finding opportunities for improvement in existing processes or areas.
 Commissioning and decommissioning data sets.
 Following guidelines when processing confidential data or information.
 Examining the changes and updates that have been made to the source production
systems.
 Providing end-users with training on new reports and dashboards.
 Assisting with the data storage structure, data mining, and data cleansing.
2. Write some key skills usually required for a data analyst.
Some of the key skills required for a data analyst include:

 Knowledge of reporting packages (Business Objects), coding languages and frameworks
(e.g., XML, JavaScript, ETL), and databases (SQL, SQLite, etc.) is a must.
 Ability to analyze, organize, collect, and disseminate big data accurately and
efficiently.
 The ability to design databases, construct data models, perform data mining, and
segment data.
 Good understanding of statistical packages for analyzing large datasets (SAS, SPSS,
Microsoft Excel, etc.).
 Effective problem-solving, teamwork, and written and verbal communication
skills.
 Excellent at writing queries, reports, and presentations.
 Understanding of data visualization software, including Tableau and Qlik.
 The ability to create and apply the most accurate algorithms to datasets for finding
solutions.
3. What is the data analysis process?
Data analysis generally refers to the process of assembling, cleaning, interpreting,
transforming, and modeling data to gain insights or conclusions and generate
reports to help businesses become more profitable. The following diagram
illustrates the various steps involved in the process:
 Collect Data: The data is collected from a variety of sources and is then stored to
be cleaned and prepared. Preparation involves handling missing values and
outliers.
 Analyze Data: Once the data is prepared, the next step is to analyze it. A model is
run repeatedly and improved with each run. The model is then validated to ensure
that it meets the requirements.
 Create Reports: Finally, the model is implemented, and reports are generated
and distributed to stakeholders.
4. What are the different challenges one faces during data analysis?
While analyzing data, a Data Analyst can encounter the following issues:
 Duplicate entries and spelling errors, which hamper and reduce data quality.
 The representation of data obtained from multiple sources may differ. This may cause
a delay in the analysis process if the collected data are combined after being
cleaned and organized.
 Incomplete data, another major challenge, which can lead to errors or faulty results.
 A lot of time spent cleaning the data if you are extracting it from a poor source.
 Business stakeholders' unrealistic timelines and expectations.
 Data blending/integration from multiple sources, particularly if there are no
consistent parameters and conventions.
 Insufficient data architecture and tools to achieve the analytics goals on time.
5. Explain data cleansing.
Data cleaning, also known as data cleansing, data scrubbing, or wrangling, is
the process of identifying and then modifying, replacing, or deleting the
incorrect, incomplete, inaccurate, irrelevant, or missing portions of the data as the
need arises. This fundamental element of data science ensures that data is correct,
consistent, and usable.
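As a small illustration of those steps, here is a Python sketch that trims and normalizes a text field, deletes rows whose age is missing or malformed, and removes duplicates that only become visible after normalization. The field names (`name`, `age`) and the rules are hypothetical; real cleansing rules depend on the dataset.

```python
def clean_records(records):
    """Minimal cleansing pass over a list of dicts (illustrative rules only).

    - trims whitespace and normalizes the case of the 'name' field
    - drops records whose 'age' is missing, non-numeric, or negative
    - removes duplicates that remain after normalization
    """
    cleaned = []
    seen = set()
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        try:
            age = int(rec.get("age"))
        except (TypeError, ValueError):
            continue  # incomplete or malformed row: delete it
        if not name or age < 0:
            continue  # inaccurate row: delete it
        key = (name, age)
        if key in seen:
            continue  # duplicate entry
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": "34"},
    {"name": "ALICE", "age": 34},
    {"name": "bob", "age": None},
    {"name": "carol", "age": "n/a"},
    {"name": "dave", "age": "41"},
]
print(clean_records(raw))
# [{'name': 'Alice', 'age': 34}, {'name': 'Dave', 'age': 41}]
```

Deleting bad rows is only one option; depending on the analysis, missing values could instead be replaced (imputed), as described later in this article.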
6. What are the tools useful for data analysis?
Some of the tools useful for data analysis include:
 RapidMiner
 KNIME
 Google Search Operators
 Google Fusion Tables
 Solver
 NodeXL
 OpenRefine
 Wolfram Alpha
 io
 Tableau, etc.
7. Write the difference between data mining and data profiling.
Data mining Process: It generally involves analyzing data to find relations that
were not previously discovered. In this case, the emphasis is on finding unusual
records, detecting dependencies, and analyzing clusters. It also involves analyzing
large datasets to determine trends and patterns in them.
Data Profiling Process: It generally involves analyzing the data's individual
attributes. In this case, the emphasis is on providing useful information on data
attributes such as data type, frequency, etc. Additionally, it facilitates the
discovery and evaluation of enterprise metadata.
Data Mining | Data Profiling
It involves analyzing a pre-built database to identify patterns. | It involves analyses of raw data from existing datasets.
It also analyzes existing databases and large datasets to convert raw data into useful information. | In this, statistical or informative summaries of the data are collected.
It usually involves finding hidden patterns and seeking out new, useful, and non-trivial data to generate useful information. | It usually involves the evaluation of data sets to ensure consistency, uniqueness, and logic.
Data mining is incapable of identifying inaccurate or incorrect data values. | In data profiling, erroneous data is identified during the initial stage of analysis.
Classification, regression, clustering, summarization, estimation, and description are some primary data mining tasks that need to be performed. | This process involves using discoveries and analytical methods to gather statistics or summaries about the data.
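To make the profiling side concrete, the Python sketch below summarizes each column of a small table: how many values are missing, how many are distinct, which Python types occur (a mix of types often signals a data quality problem), and the most frequent value. The column names and rows are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: size, missing count, distinct count, types, top value."""
    present = [v for v in values if v is not None and v != ""]
    type_counts = Counter(type(v).__name__ for v in present)
    most_common = Counter(present).most_common(1)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(type_counts),
        "top": most_common[0] if most_common else None,
    }


def profile_table(rows):
    """Profile every column that appears in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"city": "Austin", "sales": 120},
    {"city": "Boston", "sales": None},
    {"city": "Austin", "sales": "95"},  # a number stored as text
]
for column, stats in profile_table(rows).items():
    print(column, stats)
```

Here the `sales` column would be reported with one missing value and two different types (`int` and `str`), which is the kind of finding profiling is meant to surface before any mining begins.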
8. Which validation methods are employed by data analysts?
In the process of data validation, it is important to determine the accuracy of the
information as well as the quality of the source. Datasets can be validated in many
ways. Methods of data validation commonly used by Data Analysts include:
 Field Level Validation: This method validates data as and when it is entered into
the field. The errors can be corrected as you go.
 Form Level Validation: This type of validation is performed after the user submits
the form. The whole data entry form is checked at once: every field is validated, and
any errors are highlighted so that the user can fix them.
 Data Saving Validation: This technique validates data when a file or database
record is saved. The process is commonly employed when several data entry forms
must be validated.
 Search Criteria Validation: It validates the user's search criteria in order
to provide the user with accurate and related results. Its main purpose is to ensure
that the search results returned by a user's query are highly relevant.
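Field level and form level validation can be sketched in a few lines of Python. The field names and rules below (`email`, `age`, `zip`) are hypothetical examples, and the email pattern is deliberately loose rather than a complete address check.

```python
import re

# Hypothetical per-field rules: each returns an error message, or None if the value is valid.
RULES = {
    "email": lambda v: None if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) else "invalid email",
    "age": lambda v: None if v.isdigit() and 0 < int(v) < 130 else "age must be 1-129",
    "zip": lambda v: None if re.fullmatch(r"\d{5}", v) else "zip must be 5 digits",
}


def validate_field(name, value):
    """Field level validation: check a single value as soon as it is entered."""
    rule = RULES.get(name)
    return rule(value) if rule else None


def validate_form(form):
    """Form level validation: check every field at once and collect all errors."""
    errors = {}
    for name, value in form.items():
        message = validate_field(name, value)
        if message:
            errors[name] = message
    return errors


print(validate_field("zip", "1234"))  # zip must be 5 digits
print(validate_form({"email": "ana@example.com", "age": "200", "zip": "02139"}))
# {'age': 'age must be 1-129'}
```

Reusing the same per-field rules for both levels keeps the checks consistent: the field level call gives instant feedback, while the form level call reports every problem in one pass.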
9. Explain Outlier.
In a dataset, outliers are values that lie far from the other observations. An
outlier may indicate either variability in the measurement or an experimental
error. There are two kinds of outliers: univariate and multivariate. The graph
depicted below shows there are four outliers in the dataset.
10. What are the ways to detect outliers? Explain different ways to
deal with it.
Two common methods for detecting outliers are:
 Box Plot Method: According to this method, a value is considered an outlier if it
lies more than 1.5*IQR (interquartile range, Q3 - Q1) above the upper quartile (Q3)
or below the lower quartile (Q1), that is, if it is above Q3 + 1.5*IQR or below Q1 - 1.5*IQR.
 Standard Deviation Method: According to this method, an outlier is defined as a
value that is greater than mean + (3*standard deviation) or lower than mean - (3*standard deviation).
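Both rules can be written with Python's standard `statistics` module. This is a sketch: `statistics.quantiles` estimates Q1 and Q3 with its default "exclusive" interpolation, other tools may compute quartiles slightly differently, and the sample data is made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Box plot rule: flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Standard deviation rule: flag values more than threshold * sd from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]


data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
print(iqr_outliers(data))     # [95]
print(zscore_outliers(data))  # [95]
```

With small samples the standard deviation rule is weaker than it looks, because an extreme value also inflates the standard deviation it is measured against; the box plot rule, based on quartiles, is less affected by this.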
11. Write the difference between data analysis and data mining.
Data Analysis: It generally involves extracting, cleansing, transforming, modeling,
and visualizing data in order to obtain useful and important information that may
contribute towards determining conclusions and deciding what to do next.
Data analysis has been in use since the 1960s.
Data Mining: In data mining, also known as knowledge discovery in databases (KDD),
huge quantities of data are explored and analyzed to find patterns and rules.
It has been a buzzword since the 1990s.
Data Analysis | Data Mining
Analyzing data provides insight or tests hypotheses. | A hidden pattern is identified and discovered in large datasets.
It consists of collecting, preparing, and modeling data in order to extract meaning or insights. | This is considered one of the activities in data analysis.
Data-driven decisions can be made this way. | Data usability is the main objective.
Data visualization is certainly required. | Visualization is generally not necessary.
It is an interdisciplinary field that requires knowledge of computer science, statistics, mathematics, and machine learning. | Databases, machine learning, and statistics are usually combined in this field.
Here the dataset can be large, medium, or small, and it can be structured, semi-structured, or unstructured. | In this case, datasets are typically large and structured.
12. Explain the KNN imputation method.
A KNN (K-nearest neighbor) model is one of the most common techniques for
imputation. It matches a point in multidimensional space with its closest k
neighbors, using a distance function to compare records on their other attributes.
The values of those nearest neighbors are then used to impute the missing value
(for example, by taking their mean).
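A minimal Python sketch of KNN imputation follows. It assumes purely numeric rows, Euclidean distance over the non-missing columns, and imputation by the mean of the k neighbors (one common choice; a median or distance-weighted mean is also used). The function name `knn_impute` and the rows are hypothetical.

```python
import math


def knn_impute(rows, target, k=2):
    """Return a copy of `rows` with None in column `target` replaced by the
    mean of that column over the k nearest complete rows.

    Assumes all other columns are numeric, present, and on comparable scales.
    """
    def features(row):
        return [v for i, v in enumerate(row) if i != target]

    complete = [r for r in rows if r[target] is not None]
    result = []
    for row in rows:
        filled = list(row)
        if row[target] is None:
            # Rank complete rows by distance to this row on the other attributes.
            neighbors = sorted(
                complete, key=lambda other: math.dist(features(row), features(other))
            )[:k]
            filled[target] = sum(n[target] for n in neighbors) / len(neighbors)
        result.append(filled)
    return result


# Hypothetical rows: [feature_1, feature_2, value]; the last value is missing.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
filled_rows = knn_impute(rows, target=2, k=2)
print(filled_rows[-1])  # [1.05, 2.05, 11.0]
```

Because distances are computed on raw values, features measured on very different scales should normally be standardized first; otherwise the largest-scale column dominates the choice of neighbors.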
13. Explain Normal Distribution.
Known as the bell curve or the Gaussian distribution, the Normal Distribution plays a
key role in statistics and underpins many machine learning methods. It is defined by
its mean and standard deviation, which together describe how the values of a
variable are distributed.
The above image illustrates how data usually tend to be distributed around a
central value with no bias on either side. In addition, the random variables are
distributed according to a symmetrical bell-shaped curve.
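Python's standard library includes `statistics.NormalDist`, which can be used to check the familiar 68-95-99.7 rule and the symmetry of the bell curve. The mean of 170 and standard deviation of 10 below are arbitrary example values.

```python
import math
from statistics import NormalDist

# Arbitrary example: a normally distributed variable with mean 170 and sd 10.
dist = NormalDist(mu=170, sigma=10)

# Share of values within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    share = dist.cdf(170 + 10 * k) - dist.cdf(170 - 10 * k)
    print(f"within {k} sd: {share:.3f}")
# within 1 sd: 0.683
# within 2 sd: 0.954
# within 3 sd: 0.997

# Symmetry: the density is the same at equal distances on either side of the mean.
print(math.isclose(dist.pdf(160), dist.pdf(180)))  # True
```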
14. What do you mean by data visualization?
The term data visualization refers to a graphical representation of information and
data. Data visualization tools enable users to easily see and understand trends,
outliers, and patterns in data through the use of visual elements like charts, graphs,
and maps. Data can be viewed and analyzed in a smarter way, and it can be
converted into diagrams and charts with the use of this technology.
15. How does data visualization help you?
Data visualization has grown rapidly in popularity because it makes complex data
easy to view and understand in the form of charts and graphs. In addition to
providing data in a format that is easier to understand, it highlights trends and
outliers. The best visualizations illuminate meaningful information while removing
noise from data.
16. Mention some of the python libraries used in data analysis.
Several Python libraries that can be used for data analysis include:
 NumPy
 Bokeh
 Matplotlib
 Pandas
 SciPy
 scikit-learn, etc.
17. Explain a hash table.
Hash tables are data structures that store data in an associative manner: each
value is stored under a key, and the data is kept in an array in which each value
has its own index. A hash function computes an index into this array of slots,
from which the desired value can be retrieved.
18. What do you mean by collisions in a hash table? Explain the ways
to avoid it.
A hash table collision occurs when two different keys hash to the same index.
Collisions are a problem because two elements cannot share the same slot in an
array. The following methods can be used to resolve such hash collisions:
 Separate chaining technique: This method stores all items that hash to the same
slot in a secondary data structure (typically a linked list) attached to that slot.
 Open addressing technique: This technique probes the table for unfilled slots and
stores the item in the first unfilled slot it finds.
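Both collision strategies fit in a short Python sketch. It assumes a fixed table of 8 slots with no resizing and no deletion (deleting from an open-addressing table needs extra "tombstone" markers), and it relies on the fact that Python's `hash()` of a small non-negative integer is the integer itself, so keys 1 and 9 land in the same slot. The class names are hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: every slot holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the same bucket

    def get(self, key):
        for existing, value in self.slots[self._index(key)]:
            if existing == key:
                return value
        raise KeyError(key)


class LinearProbingHashTable:
    """Open addressing with linear probing: on a collision, try the next slot."""

    _EMPTY = object()  # sentinel marking a never-used slot

    def __init__(self, size=8):
        self.keys = [self._EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        start = hash(key) % len(self.keys)
        for step in range(len(self.keys)):
            yield (start + step) % len(self.keys)

    def put(self, key, value):
        for i in self._probe(key):
            if self.keys[i] is self._EMPTY or self.keys[i] == key:
                self.keys[i], self.values[i] = key, value
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        for i in self._probe(key):
            if self.keys[i] is self._EMPTY:
                break  # an empty slot ends the probe sequence
            if self.keys[i] == key:
                return self.values[i]
        raise KeyError(key)


chained, probed = ChainedHashTable(), LinearProbingHashTable()
for table in (chained, probed):
    table.put(1, "one")
    table.put(9, "nine")  # 9 % 8 == 1 % 8, so the keys collide
print(chained.slots[1])                # [(1, 'one'), (9, 'nine')]
print(probed.keys[1], probed.keys[2])  # 1 9
print(chained.get(9), probed.get(9))   # nine nine
```

Chaining lets a slot hold any number of items at the cost of a list per slot, while open addressing keeps everything in one array but slows down as the table fills up.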
Data Analyst Interview Questions for Experienced

19. Write characteristics of a good data model.
An effective data model must possess the following characteristics in order to be
considered good:
 It performs predictably, so that outcomes can be estimated as accurately as possible.
 As business demands change, it should be adaptable and responsive enough to
accommodate those changes as needed.
 The model should scale in proportion to changes in the data.
 Clients/customers should be able to reap tangible and profitable benefits from it.
20. Write disadvantages of Data analysis.
The following are some disadvantages of data analysis:

 Data analytics may put customer privacy at risk by exposing information about
transactions, purchases, and subscriptions.
 Tools can be complex and require previous training.
 Choosing the right analytics tool every time requires a lot of skills and expertise.
 It is possible to misuse the information obtained with data analytics by targeting
people with certain political beliefs or ethnicities.
21. Explain Collaborative Filtering.
Collaborative filtering (CF) builds a recommendation system from user behavioral
data. It filters information by analyzing data from other users and their
interactions with the system. This method assumes that people who
agree in their evaluation of particular items will likely agree again in the future.
Collaborative filtering has three major components: users, items, and interests.
Example:
Collaborative filtering can be seen, for instance, on online shopping sites when you
see phrases such as "recommended for you".
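The idea can be sketched as user-based collaborative filtering in Python: measure how similar two users' rating vectors are with cosine similarity (treating unrated items as 0), then score the items a user has not rated by adding up other users' ratings weighted by that similarity. The users, items, and ratings are made up, and production recommenders typically add rating normalization and far larger neighborhoods.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "ana": {"laptop": 5, "mouse": 4, "monitor": 4},
    "ben": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "cara": {"blender": 5, "toaster": 4, "mouse": 1},
}


def cosine(a, b):
    """Cosine similarity of two rating vectors; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, data, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, their_ratings in data.items():
        if other == user:
            continue
        sim = cosine(data[user], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))  # ['keyboard', 'blender']
```

Ben rated the same items as Ana in a similar way, so his keyboard rating dominates; Cara's items still appear, but with a much smaller weight because her tastes overlap with Ana's only through one poorly rated item.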
22. What do you mean by Time Series Analysis? Where is it used?
In Time Series Analysis (TSA), a sequence of data points is analyzed over
an interval of time. Instead of just recording the data points intermittently or
randomly, analysts record data points at regular intervals over a period of time.
TSA can be performed in two domains: the frequency domain and the time domain.
As TSA has a broad scope of application, it can be used in a variety of fields. TSA
plays a vital role in the following fields:
 Statistics
 Signal processing
 Econometrics
 Weather forecasting
 Earthquake prediction
 Astronomy
 Applied science
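A simple time-domain technique is the moving average, which smooths regularly spaced observations so the underlying trend is easier to see. The sketch below assumes a trailing window of three periods and uses made-up monthly sales figures.

```python
def moving_average(series, window=3):
    """Trailing moving average of a regularly spaced series (time-domain smoothing)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical monthly sales recorded at equal intervals.
monthly_sales = [120, 135, 150, 140, 160, 175]
print([round(x, 2) for x in moving_average(monthly_sales)])
# [135.0, 141.67, 150.0, 158.33]
```

The output has `window - 1` fewer points than the input, because the first full window only ends at the third month.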
23. What do you mean by clustering algorithms? Write different
properties of clustering algorithms.
Clustering is the process of categorizing data into groups, or clusters: it identifies
groups of similar data within a dataset. It is the technique of grouping a set of objects so
that the objects within the same cluster are more similar to one another than to
those located in other clusters. Clustering algorithms can be characterized by the
following properties:
 Flat or hierarchical
 Hard or Soft
 Iterative
 Disjunctive
24. What is a Pivot table? Write its usage.
One of the basic tools for data analysis is the Pivot Table. With this feature, you can
quickly summarize large datasets in Microsoft Excel. Using it, we can turn columns
into rows and rows into columns. Furthermore, it permits grouping by any field
(column) and applying advanced calculations to the groups. It is an extremely easy-to-use
feature, since you just drag and drop row/column headers to build a report.
Pivot tables consist of four different sections:
 Value Area: This is where values are reported.
 Row Area: The row area consists of the headings to the left of the values.
 Column Area: The headings above the values area make up the column area.
 Filter Area: Using this filter, you can drill down into the data set.
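The row, column, value, and filter areas can be imitated in plain Python, which shows what a pivot table computes under the hood. This is a sketch that only sums values; the field names and records are hypothetical.

```python
from collections import defaultdict


def pivot(records, row_field, col_field, value_field, filters=None):
    """Sum value_field for each (row, column) pair, like a spreadsheet pivot table.

    `filters` maps field -> required value, mimicking the Filter area.
    """
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        if filters and any(rec.get(f) != v for f, v in filters.items()):
            continue  # filtered out
        table[rec[row_field]][rec[col_field]] += rec[value_field]
    return {row: dict(cols) for row, cols in table.items()}


sales = [
    {"region": "East", "quarter": "Q1", "product": "A", "amount": 100},
    {"region": "East", "quarter": "Q2", "product": "B", "amount": 150},
    {"region": "West", "quarter": "Q1", "product": "A", "amount": 80},
    {"region": "East", "quarter": "Q1", "product": "B", "amount": 40},
]
print(pivot(sales, "region", "quarter", "amount"))
# {'East': {'Q1': 140.0, 'Q2': 150.0}, 'West': {'Q1': 80.0}}
print(pivot(sales, "region", "quarter", "amount", filters={"product": "A"}))
# {'East': {'Q1': 100.0}, 'West': {'Q1': 80.0}}
```

Swapping `row_field` and `col_field` turns rows into columns, which is exactly the pivoting the feature is named after.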
25. What do you mean by univariate, bivariate, and multivariate
analysis?
 Univariate Analysis: The word uni means one and variate means variable, so a
univariate analysis involves only one variable. Among the three analyses,
this is the simplest, since only one variable is involved.
Example: A simple example of univariate data could be height as shown below:
 Bivariate Analysis: The word bi means two and variate means variable, so a
bivariate analysis involves two variables. It examines the relationship between
the two variables, which may be dependent on or independent of each other.
Example: A simple example of bivariate data could be temperature and ice cream
sales in the summer season.
 Multivariate Analysis: In situations where more than two variables are to be
analyzed simultaneously, multivariate analysis is necessary. It is similar to bivariate
analysis, except that there are more variables involved.
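For the temperature and ice cream example, a common bivariate measure is the Pearson correlation coefficient. The sketch below computes it from scratch; the temperature and sales figures are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


temperature_c = [22, 25, 28, 31, 34]         # hypothetical daily highs
ice_cream_sales = [200, 240, 310, 360, 420]  # hypothetical units sold
r = pearson(temperature_c, ice_cream_sales)
print(round(r, 3))  # 0.997
```

A value close to 1 indicates a strong positive linear relationship; on its own it does not show that one variable causes the other.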
26. Name some popular tools used in big data.
In order to handle Big Data, multiple tools are used. There are a few popular ones
as follows:
 Hadoop
 Spark
 Scala
 Hive
 Flume
 Mahout, etc.
27. Explain Hierarchical clustering.

This algorithm groups objects into clusters based on their similarities, and it is also called
hierarchical cluster analysis. Hierarchical clustering produces a hierarchy of
clusters in which each cluster is distinct from the others.
This clustering technique can be divided into two types:
 Agglomerative Clustering (which uses a bottom-up strategy, repeatedly merging the closest clusters)
 Divisive Clustering (which uses a top-down strategy, repeatedly splitting clusters)
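The agglomerative (bottom-up) variant can be sketched in Python for one-dimensional points. The sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and stops once a requested number of clusters remains; real implementations support other linkages and build a full dendrogram.

```python
def agglomerative(points, num_clusters=1):
    """Bottom-up clustering of 1-D points with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until num_clusters remain. Returns (clusters, merge_log).
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters], merges


clusters, log = agglomerative([1, 2, 10, 11, 25], num_clusters=2)
print(clusters)  # [[1, 2, 10, 11], [25]]
for left, right, dist in log:
    print(left, "+", right, "at distance", dist)
```

The merge log records the order in which clusters were joined, which is the information a dendrogram draws.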
28. What do you mean by logistic regression?
Logistic regression is a statistical model used to study datasets in which one or
more independent variables determine a particular outcome, typically a binary one
(such as yes/no). By modeling the relationship between the independent variables and
that outcome, the model predicts the probability that the dependent variable takes a
particular value.
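A minimal Python sketch of binary logistic regression with a single independent variable, fitted by batch gradient descent on the log-loss. The learning rate, epoch count, and the hours-studied data are illustrative assumptions; libraries such as scikit-learn use more robust solvers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: hours studied vs. passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
for h in (2, 7):
    print(h, "hours -> probability of passing", round(sigmoid(w * h + b), 2))
```

The model outputs a probability rather than a label; a threshold such as 0.5 turns it into a pass/fail prediction.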
29. What do you mean by the K-means algorithm?
One of the most famous partitioning methods is K-means. With this unsupervised
learning algorithm, unlabeled data is grouped into clusters. Here, 'k' indicates the
number of clusters. The algorithm tries to keep the clusters well separated from one
another. Since it is an unsupervised model, there are no labels for the clusters to work with.
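The classic K-means procedure (Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. This Python sketch handles 2-D points and, as a simplification, uses the first k points as starting centroids; practical implementations usually pick random or k-means++ seeds. The sample points are made up.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm for 2-D points (first k points used as initial centroids)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 1.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
# [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

Because centroids are means, a single extreme point can pull a centroid far from the rest of its cluster, which is why K-means is considered sensitive to outliers.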
30. Write the difference between variance and covariance.
Variance: In statistics, variance measures how far the values in a data set are spread
out from their mean (average) value. The greater the variance, the farther the numbers
in the data set are from the mean; the smaller the variance, the nearer the numbers
are to the mean. Variance is calculated as follows:
σ² = Σ(X − μ)² / N
Here, X represents an individual data point, μ represents the mean of the
data points, and N represents the total number of data points.
Covariance: Covariance is another common concept in statistics, like variance. In
statistics, covariance is a measure of how two random variables change
together. Covariance is calculated as follows:
Cov(X, Y) = Σ(X − x̄)(Y − ȳ) / N
Here, X represents the independent variable, Y represents the dependent variable,
x̄ (x-bar) represents the mean of X, ȳ (y-bar) represents the mean of Y, and N
represents the total number of data points in the sample.
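Both definitions translate directly into Python. The two functions below divide by N, as in the definitions of N given above; note that sample estimators (for example `statistics.variance`) divide by N - 1 instead. The numbers are arbitrary.

```python
import statistics


def population_variance(xs):
    """sigma^2 = sum((X - mu)^2) / N"""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    """Cov(X, Y) = sum((X - x_bar) * (Y - y_bar)) / N"""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


xs = [2, 4, 4, 4, 5, 5, 7, 9]
ys = [1, 3, 2, 4, 4, 5, 6, 8]
print(population_variance(xs))                              # 4.0
print(population_variance(xs) == statistics.pvariance(xs))  # True
print(covariance(xs, ys))                                   # 4.0
```

A positive covariance, as here, means the two variables tend to rise together; the covariance of a variable with itself equals its variance.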
31. What are the advantages of using version control?
Also known as source control, version control is a system for tracking and managing
changes to software. Records, files, datasets, or documents can be managed with it.
Version control has the following advantages:
 Deletions, edits, and creations of datasets since the original copy can be
analyzed with version control.
 Software development becomes more transparent with this method.
 It helps distinguish different versions of a document from one another, so the
latest version can be easily identified.
 It maintains a complete history of project files, which comes in handy if the
central server ever fails.
 It makes it easy to securely store and maintain multiple versions and variants of
code files.
 Using it, you can view the changes made to different files.
32. Explain N-gram
An N-gram is a contiguous sequence of n items (adjacent words or letters) in a given
text or speech. N-grams are the basis of the N-gram probabilistic language model,
which predicts the next item in a sequence from the previous (n-1) items.
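A short Python sketch makes both ideas concrete: extracting n-grams from a list of words, and a bigram model (n = 2) that predicts the next word from the single previous word by picking its most frequent successor. The sample sentence and function names are hypothetical, and real language models also smooth counts for unseen sequences.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigram_model(tokens):
    """Count which word follows each word (an n-gram model with n = 2)."""
    model = defaultdict(Counter)
    for prev, nxt in ngrams(tokens, 2):
        model[prev][nxt] += 1
    return model


def predict_next(model, word):
    """Most frequent successor of `word`, or None if it was never followed by anything."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None


text = "the data is clean the data is large the model is simple".split()
print(ngrams(text, 3)[:2])  # [('the', 'data', 'is'), ('data', 'is', 'clean')]
model = train_bigram_model(text)
print(predict_next(model, "the"))   # data
print(predict_next(model, "data"))  # is
```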
33. Mention some of the statistical techniques that are used by Data
analysts.
Performing data analysis requires the use of many different statistical techniques.
Some important ones are as follows:
 Markov process
 Cluster analysis
 Imputation techniques
 Bayesian methodologies
 Rank statistics
34. What's the difference between a data lake and a data warehouse?
The storage of data is a big deal. Companies that use big data have been in the
news a lot lately, as they try to maximize its potential. For the layperson, data
storage is usually handled by traditional databases. For storing, managing, and
analyzing big data, however, companies use data warehouses and data lakes.
Data Warehouse: This is considered an ideal place to store all the data you gather
from many sources. A data warehouse is a centralized repository of data where data
from operational systems and other sources are stored. It is a standard tool for
integrating data across team and department silos in mid- and large-sized
companies. It collects and manages data from varied sources to provide meaningful
business insights. Data warehouses can be of the following types:
 Enterprise data warehouse (EDW): Provides decision support for the entire
organization.
 Operational Data Store (ODS): Has functionality such as reporting sales data or
employee data.
Data Lake: Data lakes are large storage repositories that store raw data in its
original format until it is needed. Their large amount of data improves analytical
performance and native integration. A data lake addresses the biggest weakness of
data warehouses: their lack of flexibility. With a data lake, neither planning nor
knowledge of data analysis is required up front; the analysis is assumed to happen
later, on demand.
Multiple Choice Questions

1.
Which is a process of Data Analysis?
Inspecting data
Cleaning data
Transforming data
All of the above
2.
Which of the following is not a major approach to data analysis?
Data Mining
Predictive Intelligence
Business Intelligence
Text Analytics
3.
What is meant by 'outlier'?
Missing data causes a score to be excluded from the analysis
Arithmetic mean
Unquantifiable variables
A value that is extreme at either end of a distribution
4.
In what situations should a multivariate analysis be conducted?
A spurious relationship may exist between two variables.
If there were an intervening variable.
There might be a third variable moderating the relationship.
All of the above.
5.
Which of the following statements is true about Data Visualization?
Information graphics, such as tables and charts, are used in data visualization to provide
information to users clearly and efficiently.
Data Visualization simplifies the analysis of large amounts of data.
Data Visualization makes complicated data more understandable, accessible, and usable.
All of the above
6.
____ is a collection of observations usually recorded at equal intervals of time.
Array data
Data
Geometric Series
Time Series Data
7.
Which of the following is an important process used to extract data patterns using
intelligent methods?
Warehousing
Data Mining
Text Mining
None of the above
8.
Which of the following is incorrect about hierarchical clustering?
Hierarchical clustering is also called HCA.
Choosing an appropriate metric can influence the cluster shape.
Generally, splits and merges are determined in a greedy way.
All of the above.
9.
Which of the following algorithms is the most sensitive to outliers?
K-means clustering algorithm
K-medians clustering algorithm
K-modes clustering algorithm
None of the above
10.
Collaborative filtering aims to accomplish what?
Providing recommended products and services
Collaborating with other sites
Determining audience on the basis of demographics
Figuring out what new customers do during their spare time
11.
The PivotTable Fields List does not include which of the following boxes?
Column Labels
Report Filter
Values
Formulas
1. Mention the differences between Data Mining and Data Profiling.
Data Mining | Data Profiling
Data mining is the process of discovering relevant information that has not yet been identified before. In data mining, raw data is converted into valuable information. | Data profiling is done to evaluate a dataset for its uniqueness, logic, and consistency.
Data mining cannot identify inaccurate or incorrect data values. | In data profiling, erroneous data is identified during the initial stage of analysis.
2. Define the term 'Data Wrangling' in Data Analytics.
Data wrangling is the process of cleaning, structuring, and enriching raw data into a desired,
usable format for better decision-making. It involves discovering, structuring, cleaning,
enriching, validating, and analyzing data. This process can transform and map large amounts
of data extracted from various sources into a more useful format. Techniques such as merging,
grouping, concatenating, joining, and sorting are used to analyze the data, after which it is
ready to be used with another dataset.
3. What are the various steps involved in any analytics project?
This is one of the most basic data analyst interview questions. The various steps
involved in any common analytics project are as follows:
Understanding the Problem
Understand the business problem, define the organizational goals, and plan for a
lucrative solution.
Collecting Data
Gather the right data from various sources and other information based on your
priorities.
Cleaning Data
Clean the data to remove unwanted, redundant, and missing values, and make it
ready for analysis.
Exploring and Analyzing Data
Use data visualization and business intelligence tools, data mining techniques, and
predictive modeling to analyze data.
Interpreting the Results
Interpret the results to find hidden patterns and future trends and to gain insights.
4. What are the common problems that data analysts encounter during analysis?
The common problems that data analysts encounter in an analytics project are:
 Handling duplicate data
 Collecting meaningful, correct data at the right time
 Handling data purging and storage problems
 Making data secure and dealing with compliance issues
5. Which technical tools have you used for analysis and presentation purposes?
As a data analyst, you are expected to know the tools mentioned below for analysis
and presentation purposes. Some of the popular tools you should know are:
MS SQL Server, MySQL
For working with data stored in relational databases
MS Excel, Tableau
For creating reports and dashboards
Python, R, SPSS
For statistical analysis, data modeling, and exploratory analysis
MS PowerPoint
For presenting the final results and important conclusions
6. What are the best methods for data cleaning?
 Create a data cleaning plan by understanding where common errors take place, and
keep communication open.
 Identify and remove duplicates before working with the data. This makes the data analysis
process easier and more effective.
 Focus on the accuracy of the data. Set up cross-field validation, maintain the value types of
data, and enforce mandatory constraints.
 Normalize the data at the entry point so that it is less chaotic. This ensures that all
information is standardized, leading to fewer errors on entry.
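As an illustration of these practices, here is a minimal plain-Python sketch. It assumes records arrive as dictionaries with hypothetical name, email, and age fields; it normalizes values at entry, enforces mandatory fields and a numeric age, and then drops duplicates. A real pipeline would more likely use a library such as pandas, but the logic is the same.

```python
# Hypothetical raw records; field names are illustrative only.
raw_records = [
    {"name": " Alice ", "email": "ALICE@example.com", "age": "34"},
    {"name": "alice", "email": "alice@example.com ", "age": "34"},  # duplicate once normalized
    {"name": "Bob", "email": "bob@example.com", "age": "n/a"},      # invalid age type
    {"name": "", "email": "carol@example.com", "age": "29"},        # missing mandatory name
    {"name": "Dan", "email": "dan@example.com", "age": "41"},
]

REQUIRED_FIELDS = ("name", "email", "age")


def normalize(record):
    """Standardize text at the entry point so equivalent values compare equal."""
    return {
        "name": record["name"].strip().title(),
        "email": record["email"].strip().lower(),
        "age": record["age"].strip(),
    }


def is_valid(record):
    """Enforce mandatory fields and the expected value type for age."""
    if any(not record[field] for field in REQUIRED_FIELDS):
        return False
    return record["age"].isdigit()


def clean(records):
    seen = set()
    cleaned = []
    for record in map(normalize, records):
        if not is_valid(record):
            continue
        key = (record["name"], record["email"])  # identity used for de-duplication
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**record, "age": int(record["age"])})
    return cleaned


cleaned = clean(raw_records)
```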
7. What is the significance of Exploratory Data Analysis (EDA)?
 Exploratory data analysis (EDA) helps to understand the data better.
 It helps you obtain confidence in your data to a point where you’re ready to engage a
machine learning algorithm.
 It allows you to refine your selection of feature variables that will be used later for model
building.
 You can discover hidden trends and insights from the data.
8. Explain descriptive, predictive, and prescriptive analytics.
Descriptive | Predictive | Prescriptive
Provides insights into the past to answer “what has happened” | Understands the future to answer “what could happen” | Suggests various courses of action to answer “what should you do”
Uses data aggregation and data mining techniques | Uses statistical models and forecasting techniques | Uses simulation algorithms and optimization techniques to advise possible outcomes
Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before | Example: An ice cream company can forecast how much ice cream, and which flavors, it is likely to sell in the coming days | Example: Lower prices to increase the sale of ice creams, or produce more/fewer quantities of a specific flavor of ice cream
9. What are the different types of sampling techniques used by data analysts?
Sampling is a statistical method to select a subset of data from an entire dataset
(population) to estimate the characteristics of the whole population.
There are five main types of sampling methods:
 Simple random sampling
 Systematic sampling
 Cluster sampling
 Stratified sampling
 Judgmental or purposive sampling
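The four random methods above can be sketched with Python's standard random module. The population, its region field, and the sample sizes are hypothetical. Note the contrast between stratified sampling (sample within every region) and cluster sampling (pick whole regions at random). Judgmental sampling relies on the analyst's choice rather than a random mechanism, so it is not sketched.

```python
import random

# Hypothetical population: 100 customer IDs, each tagged with one of four regions.
population = [{"id": i, "region": ["North", "South", "East", "West"][i % 4]} for i in range(100)]
rng = random.Random(42)  # fixed seed so the sketch is reproducible


def simple_random_sample(data, n):
    # Every member has an equal chance of being selected.
    return rng.sample(data, n)


def systematic_sample(data, n):
    # Pick a random start, then take every k-th member.
    k = len(data) // n
    start = rng.randrange(k)
    return data[start::k][:n]


def stratified_sample(data, n_per_stratum, key):
    # Split into subgroups (strata) and sample from each one separately.
    strata = {}
    for row in data:
        strata.setdefault(row[key], []).append(row)
    return [row for group in strata.values() for row in rng.sample(group, n_per_stratum)]


def cluster_sample(data, n_clusters, key):
    # Randomly choose whole clusters and keep every member of those clusters.
    clusters = sorted({row[key] for row in data})
    chosen = set(rng.sample(clusters, n_clusters))
    return [row for row in data if row[key] in chosen]


srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)
strat = stratified_sample(population, 3, "region")
clus = cluster_sample(population, 2, "region")
```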
10. Describe univariate, bivariate, and multivariate analysis.
Univariate analysis is the simplest form of data analysis, where the data being analyzed
contains only one variable.
Example - Studying the heights of players in the NBA.
Univariate analysis can be described using central tendency, dispersion, quartiles, bar
charts, histograms, pie charts, and frequency distribution tables.
Bivariate analysis involves the analysis of two variables to find causes, relationships, and
correlations between the variables.
Example – Analyzing the sale of ice creams based on the temperature outside.
Bivariate analysis can be explained using correlation coefficients, linear regression,
logistic regression, scatter plots, and box plots.
Multivariate analysis involves the analysis of three or more variables to understand the
relationship of each variable with the other variables.
Example – Analyzing revenue based on several kinds of expenditure, such as advertising,
staffing, and inventory costs.
Multivariate analysis can be performed using multiple regression, factor analysis,
classification and regression trees, cluster analysis, principal component analysis,
dual-axis charts, etc.
Statistics Data Analyst Interview Questions
11. How can you handle missing values in a dataset?
This is one of the most frequently asked data analyst interview questions, and the
interviewer expects you to give a detailed answer here, not just the names of the
methods. Four common methods to handle missing values in a dataset are:
Listwise Deletion
In the listwise deletion method, an entire record is excluded from analysis if any
single value is missing.
Average Imputation
Take the average value of the other participants' responses and fill in the missing
value.
Regression Substitution
You can use multiple-regression analyses to estimate a missing value.
Multiple Imputations
Multiple imputation creates several plausible values for each missing entry, based on the
correlations in the observed data plus random error, and then averages the results obtained
from the simulated datasets.
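The two simplest methods, listwise deletion and average (mean) imputation, can be sketched in plain Python as follows. The survey records are hypothetical and None marks a missing value. Regression substitution and multiple imputation are usually done with a statistics library and are not shown.

```python
from statistics import mean

# Hypothetical survey responses; None marks a missing value.
responses = [
    {"respondent": 1, "age": 25, "score": 7.0},
    {"respondent": 2, "age": None, "score": 6.0},
    {"respondent": 3, "age": 35, "score": None},
    {"respondent": 4, "age": 45, "score": 9.0},
]


def listwise_deletion(rows):
    """Drop any record that has at least one missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def mean_imputation(rows, column):
    """Replace missing values in one column with the mean of the observed values."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


complete_cases = listwise_deletion(responses)   # respondents 1 and 4 remain
imputed = mean_imputation(responses, "age")     # respondent 2 gets age 35
```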
12. Explain the term Normal Distribution.
Normal distribution refers to a continuous probability distribution that is symmetric
about the mean. In a graph, a normal distribution appears as a bell curve.
 The mean, median, and mode are equal
 All of them are located at the center of the distribution
 About 68% of the data falls within one standard deviation of the mean
 About 95% of the data lies within two standard deviations of the mean
 About 99.7% of the data lies within three standard deviations of the mean
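The 68-95-99.7 figures can be checked with Python's statistics.NormalDist: for the standard normal distribution, the share of values within k standard deviations of the mean is cdf(k) - cdf(-k).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass within k standard deviations of the mean: P(-k < Z < k).
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```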
13. What is Time Series analysis?
Time series analysis is a statistical procedure that deals with an ordered sequence
of values of a variable at equally spaced time intervals. Because time series data are
collected at adjacent periods, the observations are often correlated with one another.
This feature distinguishes time series data from cross-sectional data.
Below is an example of time-series data on coronavirus cases and its graph.

14. How is Overfitting different from Underfitting?
This is another frequently asked data analyst interview question, and you are
expected to cover all the given differences!
Overfitting | Underfitting
The model fits the training set well. | The model neither fits the training data well nor generalizes to new data.
Performance drops considerably on the test set. | The model performs poorly on both the training set and the test set.
Happens when the model learns the random fluctuations and noise in the training dataset in detail. | Happens when there is too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
15. How do you treat outliers in a dataset?
An outlier is a data point that is distant from other similar points. Outliers may be due to
variability in the measurement or may indicate experimental errors.
The graph depicted below shows there are three outliers in the dataset.
To deal with outliers, you can use the following four methods:
 Drop the outlier records
 Cap the outlier values
 Assign a new value
 Try a new transformation
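One common way to flag outliers, not necessarily the method behind the article's graph, is the interquartile-range (IQR) rule: values more than 1.5 times the IQR below the first quartile or above the third quartile are treated as outliers. The sketch applies it to hypothetical measurements and shows two of the treatments listed, dropping and capping.

```python
from statistics import quantiles

# Hypothetical measurements with one suspicious value.
values = [12, 13, 14, 15, 16, 17, 18, 95]

# Quartiles, then fences at 1.5 * IQR beyond Q1 and Q3.
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
dropped = [v for v in values if lower <= v <= upper]   # option 1: drop the outlier records
capped = [min(max(v, lower), upper) for v in values]    # option 2: cap values at the fences
```

Here the fences are 8.5 and 22.5, so only 95 is flagged, and capping replaces it with 22.5.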
17. What are the different types of Hypothesis testing?
Hypothesis testing is the procedure used by statisticians and scientists to decide whether to
reject a statistical hypothesis. Every hypothesis test involves two types of hypotheses:
 Null hypothesis: It states that there is no relation between the predictor and outcome
variables in the population. It is denoted by H0.
Example: There is no association between a patient’s BMI and diabetes.
 Alternative hypothesis: It states that there is some relation between the predictor and
outcome variables in the population. It is denoted by H1.
Example: There is an association between a patient’s BMI and diabetes.
18. Explain Type I and Type II errors in statistics.
In hypothesis testing, a Type I error occurs when the null hypothesis is rejected even
though it is true. It is also known as a false positive.
A Type II error occurs when the null hypothesis is not rejected even though it is false. It is
also known as a false negative.
Excel Data Analyst Interview Questions
19. In Microsoft Excel, a numeric value can be treated as a text value if it is preceded by what?
An apostrophe ('). For example, entering '123 stores 123 as text rather than as a number.
20. What is the difference between COUNT, COUNTA, COUNTBLANK, and COUNTIF in Excel?
 COUNT function returns the count of numeric cells in a range
 COUNTA function counts the non-blank cells in a range
 COUNTBLANK function gives the count of blank cells in a range
 COUNTIF function counts the cells in a range that meet a given condition
21. How do you make a dropdown list in MS Excel?
 First, click on the Data tab that is present in the ribbon.
 Under the Data Tools group, select Data Validation.
 Then navigate to Settings > Allow > List.
 Select the source you want to provide as a list array.
22. Can you provide a dynamic range in “Data Source” for a Pivot table?
Yes, you can provide a dynamic range in the “Data Source” of a pivot table. To do
that, create a named range using the OFFSET function, and then base the pivot table
on that named range.
23. What is the function to find the day of the week for a particular date
value?
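To get the day of the week, you can use the WEEKDAY() function.
The above function will return 6 as the result, i.e., 17th December is a Saturday. Note that
the result depends on WEEKDAY's optional return_type argument: with the default (1 = Sunday
through 7 = Saturday) a Saturday returns 7, while return_type 2 (1 = Monday through
7 = Sunday) returns 6 for a Saturday.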
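Because the meaning of WEEKDAY()'s result depends on its return type, it can help to reproduce the two most common conventions outside Excel. This Python sketch maps datetime.date.isoweekday() (Monday = 1 through Sunday = 7) onto return types 1 and 2; the helper name excel_weekday is hypothetical, and the year 2022 is an assumption, since the article gives only 17th December.

```python
from datetime import date


def excel_weekday(d, return_type=1):
    """Mimic two of Excel's WEEKDAY() return types using ISO weekdays (Mon=1 .. Sun=7)."""
    iso = d.isoweekday()
    if return_type == 1:  # Excel default: 1 = Sunday ... 7 = Saturday
        return iso % 7 + 1
    if return_type == 2:  # 1 = Monday ... 7 = Sunday
        return iso
    raise ValueError("only return types 1 and 2 are sketched here")


d = date(2022, 12, 17)  # assumed year
print(d.strftime("%A"), excel_weekday(d, 1), excel_weekday(d, 2))  # Saturday 7 6
```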
24. How does the AND() function work in Excel?
AND() is a logical function that checks multiple conditions and returns TRUE or
FALSE based on whether the conditions are met.
Syntax: AND(logical1, [logical2], [logical3], ...)
In the below example, we are checking if the marks are greater than 45. The result
will be true if the mark is >45, else it will be false.
25. Explain how VLOOKUP works in Excel.
VLOOKUP is used when you need to find things in a table or a range by row.
VLOOKUP accepts the following four parameters:
lookup_value - The value to look for in the first column of a table
table_array - The table from which to extract the value
col_index_num - The number of the column from which to extract the value
range_lookup - [optional] TRUE = approximate match (default), FALSE = exact match
Let’s understand VLOOKUP with an example.
If you wanted to find the department to which Stuart belongs, you could use the
VLOOKUP function as shown below:
=VLOOKUP(A11, A2:E7, 3, 0)
Here, cell A11 has the lookup value, A2:E7 is the table array, 3 is the column index
number with information about departments, and 0 (equivalent to FALSE, an exact match)
is the range lookup.
If you press Enter, it returns “Marketing”, indicating that Stuart is from the marketing
department.
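To make the exact and approximate match modes concrete, here is a rough Python analogue of VLOOKUP. The function name vlookup and the table rows are hypothetical (only Stuart's Marketing department comes from the example above). Unlike Excel, this sketch compares text case-sensitively; approximate matching assumes the first column is sorted in ascending order, as Excel also requires.

```python
from bisect import bisect_right

# Hypothetical employee table: name, employee ID, department.
table = [
    ["Alice", 101, "Finance"],
    ["Stuart", 102, "Marketing"],
    ["Zoe", 103, "Sales"],
]


def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Rough analogue of Excel's VLOOKUP; col_index is 1-based, as in Excel."""
    if not range_lookup:
        # Exact match: first row whose first column equals lookup_value.
        for row in table:
            if row[0] == lookup_value:
                return row[col_index - 1]
        raise KeyError(lookup_value)  # Excel would show #N/A
    # Approximate match: last row whose first column is <= lookup_value.
    first_column = [row[0] for row in table]
    pos = bisect_right(first_column, lookup_value) - 1
    if pos < 0:
        raise KeyError(lookup_value)
    return table[pos][col_index - 1]


print(vlookup("Stuart", table, 3, False))  # Marketing
```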
26. What function would you use to get the current date and time in Excel?
In Excel, you can use the TODAY() function to get the current date and the NOW() function
to get the current date and time.
27. Using the below sales table, calculate the total quantity sold by sales
representatives whose name starts with A, and the cost of each item they
have sold is greater than 10.
You can use the SUMIFS() function to find the total quantity.
For the Sales Rep column, you need to give the criteria as “A*”, meaning the name
should start with the letter “A”. For the “Cost each” column, the criteria should be “>10”,
meaning the cost of each item is greater than 10.
The result is 13.
28. Using the data given below, create a pivot table to find the total sales
made by each sales representative for each item. Display the sales as % of
the grand total.
 Select the entire table range, click on the Insert tab and choose PivotTable
 Select the table range and the worksheet where you want to place the pivot table
 Drag Sale Total on to Values, and Sales Rep and Item on to Row Labels. This gives the
sum of sales made by each representative for every item they have sold.
 Right-click on “Sum of Sale Total” and expand Show Values As to select % of Grand
Total.
 Below is the resultant pivot table.
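The same summary can be reproduced in pandas with pivot_table(), then dividing each sum by the grand total. The sales records below are hypothetical, since the article's table is not reproduced.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "Sales Rep": ["Ann", "Ann", "Bob", "Bob", "Ann"],
    "Item": ["Pen", "Desk", "Pen", "Desk", "Pen"],
    "Sale Total": [100.0, 300.0, 200.0, 150.0, 250.0],
})

# Sum of Sale Total per rep and item (Row Labels = Sales Rep, Item; Values = Sale Total).
pivot = sales.pivot_table(index=["Sales Rep", "Item"], values="Sale Total", aggfunc="sum")

# Equivalent of "Show Values As > % of Grand Total".
pivot["% of Grand Total"] = pivot["Sale Total"] / pivot["Sale Total"].sum() * 100
print(pivot)
```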
SQL Data Analyst Interview Questions
29. How do you subset or filter data in SQL?
To subset or filter data in SQL, we use WHERE and HAVING clauses.
Consider the following movie table.

Using this table, let’s find the records for movies that were directed by Brad Bird.
Now, let’s filter the table for directors whose movies have an average duration
greater than 115 minutes.
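Because the article's table and queries are not reproduced, here is a sketch using Python's built-in sqlite3 module with a hypothetical movie table (the titles and durations are placeholders). The first query filters rows with WHERE; the second filters groups with HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, director TEXT, duration INTEGER)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [
        ("Film A", "Brad Bird", 120),
        ("Film B", "Brad Bird", 118),
        ("Film C", "Director X", 100),
        ("Film D", "Director X", 110),
    ],
)

# WHERE filters individual rows.
by_bird = conn.execute(
    "SELECT title FROM movie WHERE director = 'Brad Bird' ORDER BY title"
).fetchall()

# HAVING filters groups after aggregation.
long_directors = conn.execute(
    """
    SELECT director, AVG(duration) AS avg_duration
    FROM movie
    GROUP BY director
    HAVING AVG(duration) > 115
    """
).fetchall()
```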
30. What is the difference between a WHERE clause and a HAVING clause in SQL?
Answer all of the given differences when this data analyst interview question is
asked, and also give the syntax for each to prove your thorough knowledge to
the interviewer.
WHERE | HAVING
The WHERE clause operates on row data. | The HAVING clause operates on aggregated data.
In the WHERE clause, the filter occurs before any groupings are made. | HAVING is used to filter values from a group.
Aggregate functions cannot be used. | Aggregate functions can be used.
Syntax of WHERE clause:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Syntax of HAVING clause:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
31. Is the below SQL query correct? If not, how will you rectify it?
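The query stated above is incorrect because, in most SQL databases, a column alias defined
in the SELECT list cannot be used to filter data in the WHERE clause. It will throw an error.
To rectify it, repeat the original expression in the WHERE clause instead of the alias, or
compute the alias in a subquery and filter on it in the outer query.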
32. How are Union, Intersect, and Except used in SQL?
The Union operator combines the output of two or more SELECT statements and removes
duplicate rows.
Syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Let’s consider the following example, where there are two tables - Region 1 and
Region 2.
To get the unique records, we use Union.
The Intersect operator returns the records common to the results of two or more
SELECT statements.
Syntax:
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
The Except operator returns the records from the first SELECT statement that do not
appear in the results of the second.
Syntax:
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;
Below is the SQL query to return uncommon records from region 1.
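The three set operators can be tried with Python's sqlite3 module. The region1 and region2 tables and their rows are hypothetical stand-ins for the Region 1 and Region 2 tables mentioned above. (Oracle has traditionally used MINUS in place of EXCEPT.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region1 (product TEXT);
    CREATE TABLE region2 (product TEXT);
    INSERT INTO region1 VALUES ('Laptop'), ('Phone'), ('Tablet');
    INSERT INTO region2 VALUES ('Phone'), ('Tablet'), ('Watch');
    """
)


def run(sql):
    # Return the single-column result as a sorted Python list.
    return sorted(row[0] for row in conn.execute(sql))


union_rows = run("SELECT product FROM region1 UNION SELECT product FROM region2")
intersect_rows = run("SELECT product FROM region1 INTERSECT SELECT product FROM region2")
except_rows = run("SELECT product FROM region1 EXCEPT SELECT product FROM region2")
```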
33. What is a Subquery in SQL?
A subquery in SQL is a query within another query. It is also known as a nested
query or an inner query. A subquery returns data, such as a single value or a list of
values, that the main (outer) query then uses.
There are two types: correlated and non-correlated subqueries.
Below is an example of a subquery that returns the name, email ID, and phone
number of employees located in Texas.
SELECT name, email, phone
FROM employee
WHERE emp_id IN (
SELECT emp_id
FROM employee
WHERE city = 'Texas');
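The example above is non-correlated: the inner query runs once, independently of the outer query. A correlated subquery instead refers to a column of the outer query and is evaluated for each outer row. This sqlite3 sketch, with a hypothetical employee table, contrasts the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
# Hypothetical data for illustration only.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Sales", 50), ("Ben", "Sales", 70), ("Cid", "IT", 90), ("Dee", "IT", 100)],
)

# Non-correlated: compare against the company-wide average, computed once.
above_company_avg = conn.execute(
    "SELECT name FROM employee WHERE salary > (SELECT AVG(salary) FROM employee) ORDER BY name"
).fetchall()

# Correlated: the inner query uses e.dept from the outer row, so it runs per row.
above_dept_avg = conn.execute(
    """
    SELECT e.name FROM employee AS e
    WHERE e.salary > (SELECT AVG(i.salary) FROM employee AS i WHERE i.dept = e.dept)
    ORDER BY e.name
    """
).fetchall()
```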
34. Using the product_price table, write an SQL query to find the record
with the fourth-highest market price.
Fig: Product Price table
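First, select the top four records in descending order of mkt_price:
select top 4 * from product_price order by mkt_price desc;
Now, select the top one from the above result, ordered by mkt_price in ascending order:
select top 1 * from (select top 4 * from product_price order by mkt_price desc) as t order by mkt_price asc;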
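Databases such as SQLite, MySQL, and PostgreSQL use LIMIT and OFFSET rather than TOP. A minimal sqlite3 sketch with hypothetical prices follows; if several products can share a price and you want the fourth-highest distinct price, a window function such as DENSE_RANK() is a safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, mkt_price REAL)")
# Hypothetical prices, not the article's product_price table.
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?)",
    [("P1", 500), ("P2", 120), ("P3", 340), ("P4", 80), ("P5", 260), ("P6", 410)],
)

# Skip the three highest prices, then take one row: the fourth-highest.
fourth = conn.execute(
    "SELECT product, mkt_price FROM product_price ORDER BY mkt_price DESC LIMIT 1 OFFSET 3"
).fetchone()
```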
35. From the product_price table, write an SQL query to find the total and
average market price for each currency where the average market price is
greater than 100, and the currency is in INR or AUD.
The SQL query is as follows:
The output of the query is as follows:
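The article's query and output are not reproduced here. One plausible version, assuming product_price has currency and mkt_price columns, filters rows with WHERE ... IN and groups with HAVING; the sqlite3 sketch below runs it against hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, currency TEXT, mkt_price REAL)")
# Hypothetical rows; the column names are assumptions, not the article's schema.
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?, ?)",
    [("P1", "INR", 150), ("P2", "INR", 90), ("P3", "AUD", 60), ("P4", "AUD", 80), ("P5", "USD", 300)],
)

rows = conn.execute(
    """
    SELECT currency, SUM(mkt_price) AS total_price, AVG(mkt_price) AS avg_price
    FROM product_price
    WHERE currency IN ('INR', 'AUD')      -- row-level filter
    GROUP BY currency
    HAVING AVG(mkt_price) > 100           -- group-level filter
    """
).fetchall()
```

With these rows, only INR survives: AUD is removed by HAVING (average 70) and USD by WHERE.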
36. Using the product and sales order detail table, find the products with
total units sold greater than 1.5 million.
Fig: Products table
Fig: Sales order detail table
We can use an inner join to get records from both tables. We’ll join the tables
based on a common key column, i.e., ProductID.
The result of the SQL query is shown below.
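A sketch of such a query, run with sqlite3 against hypothetical products and sales_order_detail tables, joins on ProductID, sums the units per product, and keeps products above 1.5 million with HAVING. The Name, OrderID, and OrderQty column names are assumptions; only ProductID comes from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE sales_order_detail (OrderID INTEGER, ProductID INTEGER, OrderQty INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales_order_detail VALUES (10, 1, 900000), (11, 1, 700000), (12, 2, 400000);
    """
)

rows = conn.execute(
    """
    SELECT p.Name, SUM(s.OrderQty) AS total_units
    FROM products AS p
    INNER JOIN sales_order_detail AS s ON p.ProductID = s.ProductID
    GROUP BY p.ProductID, p.Name
    HAVING SUM(s.OrderQty) > 1500000
    """
).fetchall()
```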

37. How do you write a stored procedure in SQL?
You must be thoroughly prepared for this question before your next data analyst
interview. A stored procedure is a saved SQL script that can be run repeatedly to
perform a task.
Let’s look at an example of creating a stored procedure to find the sum of the squares of
the first N natural numbers.
 Create a procedure by giving a name, here it’s squaresum1
 Declare the variables
 Write the formula using the set statement
 Print the values of the computed variable
 To run the stored procedure, use the EXEC command
Output: Displays the sum of the squares of the first four natural numbers
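The procedure's SET statement is not shown, so the formula it uses is an assumption; a natural choice is the closed form for the sum of the first N squares. For N = 4 it gives the value the output line describes:

```latex
\sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6},
\qquad
N = 4:\quad \frac{4 \cdot 5 \cdot 9}{6} = 30 = 1 + 4 + 9 + 16
```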
38. Write an SQL stored procedure to print all even numbers between two numbers
given by the user.
Here is the output to print all even numbers between 30 and 45.
Tableau Data Analyst Interview Questions
39. How is joining different from blending in Tableau?

Data Joining | Data Blending
Data joining can only be carried out when the data comes from the same source. | Data blending is used when the data is from two or more different sources.
E.g.: Combining two or more worksheets from the same Excel file or two tables from the same database. | E.g.: Combining an Oracle table with a SQL Server table, an Excel sheet with an Oracle table, or two sheets from Excel.
All the combined sheets or tables contain a common set of dimensions and measures. | In data blending, each data source contains its own set of dimensions and measures.
40. What do you understand by LOD in Tableau?
LOD in Tableau stands for Level of Detail. It is an expression that is used to execute
complex queries involving many dimensions at the data sourcing level. Using LOD
expressions, you can find duplicate values, synchronize chart axes, and create bins on
aggregated data.
41. What are the different connection types in Tableau Software?
There are two main types of connections available in Tableau.
Extract: An extract is a snapshot of the data that is extracted from the data source
and placed into the Tableau repository. This snapshot can be refreshed
periodically, either fully or incrementally.
Live: A live connection connects directly to the data source, and the data is
fetched straight from its tables. So, the data is always up to date and consistent.
42. What are the different joins that Tableau provides?
Joins in Tableau work similarly to the SQL join statement. Below are the types of
joins that Tableau supports:
 Left Outer Join
 Right Outer Join
 Full Outer Join
 Inner Join
43. What is a Gantt Chart in Tableau?
A Gantt chart in Tableau shows the progress of a value over a period of time, i.e., it shows
the duration of events. It consists of bars along a time axis. The Gantt chart is
mostly used as a project management tool, where each bar represents a task in
the project.
44. Using the Sample Superstore dataset, create a view in Tableau to analyze the sales,
profit, and quantity sold across different subcategories of items present under each category.
 Load the Sample - Superstore dataset
 Drag Category and Subcategory columns into Rows, and Sales on to Columns. It will
result in a horizontal bar chart.
 Drag Profit on to Colour, and Quantity on to Label. Sort the Sales axis in descending
order of the sum of sales within each sub-category.
45. Create a dual-axis chart in Tableau to present Sales and Profit across
different years using the Sample Superstore dataset.
 Drag the Order Date field from Dimensions on to Columns, and convert it into continuous
Month.
 Drag Sales on to Rows, and Profit to the right corner of the view until you see a light
green rectangle.
 Synchronize the right axis by right-clicking on the profit axis.
 Under the Marks card, change SUM(Sales) to Bar and SUM(Profit) to Line and adjust the
size.
46. Design a view in Tableau to show State-wise Sales and Profit using the
Sample Superstore dataset.
 Drag the Country field on to the view section and expand it to see the States.
 Drag the Sales field on to Size, and Profit on to Colour.
 Increase the size of the bubbles, add a border, and halo color.
From the above map, it is clear that states like Washington, California, and New York
have the highest sales and profits, while Texas, Pennsylvania, and Ohio have good
sales but the lowest profits.
47. What is the difference between Treemaps and Heatmaps in Tableau?
Treemaps | Heatmaps
Treemaps are used to display data in nested rectangles. | Heatmaps can visualize measures against dimensions, using color and size to differentiate one or more dimensions and up to two measures.
You use dimensions to define the structure of the treemap, and measures to define the size or color of the individual rectangles. | The layout is like a text table, with variations in values encoded as colors.
Treemaps are a relatively simple data visualization that can provide insight in a visually attractive format. | In a heatmap, you can quickly see a wide array of information.
48. Using the Sample Superstore dataset, display the top 5 and bottom 5
customers based on their profit.
 Drag Customer Name field on to Rows, and Profit on to Columns.
 Right-click on the Customer Name column to create a set
 Give a name to the set and select the Top tab to choose the top 5 customers by
sum(profit)
 Similarly, create a set for the bottom five customers by sum(profit)
 Select both the sets, right-click to create a combined set. Give a name to the set and
choose All members in both sets.
 Drag top and bottom customers set on to Filters, and Profit field on to Colour to get the
desired result.
Python Data Analyst Interview Questions
49. What is the correct syntax for reshape() function in NumPy?
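The article's answer is not reproduced here. In NumPy, reshaping is commonly written as np.reshape(a, new_shape) or with the array method a.reshape(new_shape), where new_shape is a tuple of integers and one dimension may be -1 so that NumPy infers it. A minimal sketch:

```python
import numpy as np

arr = np.arange(6)              # array([0, 1, 2, 3, 4, 5])

# Function form and method form; the total number of elements must stay the same.
m1 = np.reshape(arr, (2, 3))
m2 = arr.reshape(3, 2)
m3 = arr.reshape(-1, 3)         # -1 lets NumPy infer this dimension (here, 2)
```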

50. What are the different ways to create a data frame in Pandas?
There are several ways to create a Pandas data frame; two common ones are:
 By initializing a list
 By initializing a dictionary
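Both approaches in a minimal sketch with hypothetical employee data; the two data frames end up identical.

```python
import pandas as pd

# From a list of rows, supplying the column names separately.
df_from_list = pd.DataFrame([["Ann", 28], ["Ben", 35]], columns=["Name", "Age"])

# From a dictionary mapping column names to column values.
df_from_dict = pd.DataFrame({"Name": ["Ann", "Ben"], "Age": [28, 35]})
```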
51. Write the Python code to create an employee’s data frame from the
“emp.csv” file and display the head and summary.
To create a DataFrame in Python, import the Pandas library and use the read_csv() function
to load the .csv file. Pass the correct path to the file, including the file name followed by
its .csv extension.
To display the head of the dataset, use the head() function.
The describe() method returns the summary statistics of the data frame.
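A minimal sketch of these steps. So that it runs without the actual file, it reads an in-memory CSV whose columns are hypothetical; with the real file you would pass its path instead, e.g. pd.read_csv("emp.csv").

```python
import io

import pandas as pd

# Stand-in for "emp.csv"; the columns and values are hypothetical.
csv_text = """Name,Department,Age,Salary
Ann,Finance,28,52000
Ben,Marketing,35,61000
Cid,Sales,41,58000
"""
emp = pd.read_csv(io.StringIO(csv_text))

print(emp.head())       # first five rows (here, all three)
print(emp.describe())   # count, mean, std, min, quartiles, max of numeric columns
```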

52. How will you select the Department and Age columns from an
Employee data frame?
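You can use the column names to extract the desired columns by passing them as a list
inside square brackets, for example employee[['Department', 'Age']], where employee is
the data frame.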
53. Suppose there is an array num = np.array([[1,2,3],[4,5,6],[7,8,9]]). How would you
extract the value 8 using 2D indexing?
Since NumPy uses zero-based indexing, the value 8 is at row index 2 and column index 1
(the third row and second column). We pass these index positions to the array: num[2, 1].
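A minimal sketch of the 2D indexing, using the array from the question:

```python
import numpy as np

num = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Row index 2 (third row), column index 1 (second column); indices start at 0.
value = num[2, 1]
print(value)  # 8
```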
54. Suppose there is an array that has values [0,1,2,3,4,5,6,7,8,9]. How will
you display the following values from the array - [1,3,5,7,9]?
Since we only want the odd numbers from 0 to 9, you can perform the modulus
operation and keep the values whose remainder, when divided by 2, is equal to 1.
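A sketch of the modulus approach using a boolean mask; for this particular array, slicing with a step of 2 gives the same result.

```python
import numpy as np

arr = np.arange(10)             # [0 1 2 3 4 5 6 7 8 9]

# Boolean mask: keep elements whose remainder after division by 2 is 1.
odd = arr[arr % 2 == 1]
print(odd)                      # [1 3 5 7 9]

# Slicing alternative: start at index 1 and take every second element.
odd_slice = arr[1::2]
```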
55. There are two arrays, ‘a’ and ‘b’. Stack the arrays a and b horizontally
using the NumPy library in Python.
You can either use the concatenate() or the hstack() function to stack the arrays.
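A minimal sketch with two hypothetical 2-D arrays that have the same number of rows. For 2-D inputs, np.hstack((a, b)) is equivalent to np.concatenate((a, b), axis=1); for 1-D inputs, hstack simply joins them end to end.

```python
import numpy as np

# Hypothetical arrays with matching row counts.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h1 = np.hstack((a, b))                  # horizontal (column-wise) stacking
h2 = np.concatenate((a, b), axis=1)     # same result for 2-D inputs
print(h1)
# [[1 2 5 6]
#  [3 4 7 8]]
```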
56. How can you add a column to a Pandas Data Frame?
Suppose there is an emp data frame that has information about a few employees.
Let’s add an Address column to that data frame.
Declare a list of values, one per row, and assign it to a new Address column.
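A minimal sketch with a hypothetical emp data frame and address list; the list needs exactly one value per row. DataFrame.insert() can be used instead if the new column must go at a specific position.

```python
import pandas as pd

# Hypothetical emp data frame.
emp = pd.DataFrame({"Name": ["Ann", "Ben", "Cid"], "Age": [28, 35, 41]})

# One address per row; assigning to a new column name appends the column.
address = ["Delhi", "Pune", "Chennai"]
emp["Address"] = address
```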

57. How will you print four random integers between 1 and 15 using
NumPy?
To generate random numbers using NumPy, we use the np.random.randint() function.
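Note that the upper bound of np.random.randint() is exclusive, so the upper argument must be 16 to include 15. A minimal sketch, also showing the newer Generator API (np.random.default_rng().integers()), which likewise excludes the upper bound by default:

```python
import numpy as np

# Legacy API: high is exclusive, so 16 is needed to allow 15.
legacy = np.random.randint(1, 16, size=4)

# Generator API (recommended for new code); seeded here for reproducibility.
rng = np.random.default_rng(seed=0)
modern = rng.integers(1, 16, size=4)

print(legacy, modern)
```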
58. From the below DataFrame, how will you find each column's unique
values and subset the data for Age<35 and Height>6?
To find the unique values and the number of unique elements, use the unique() and
nunique() functions.
Now, subset the data for Age<35 and Height>6.
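A minimal sketch with a hypothetical DataFrame (Height is assumed to be in feet). Each condition is wrapped in parentheses and combined with &, because Python's and keyword does not work element-wise on pandas Series.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cid", "Dev"],
    "Age": [28, 35, 31, 28],
    "Height": [5.6, 6.2, 6.1, 6.3],
})

# Unique values and the count of unique values for every column.
for column in df.columns:
    print(column, df[column].unique(), df[column].nunique())

# Rows where Age < 35 and Height > 6.
subset = df[(df["Age"] < 35) & (df["Height"] > 6)]
```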
59. Plot a sine graph using NumPy and Matplotlib library in Python.
Below is the resulting sine graph.
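A minimal sketch of such a plot. It selects matplotlib's non-interactive Agg backend and saves the figure to an in-memory PNG so that it runs without a display; in a notebook or desktop session you would typically call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)   # 200 points over one full period
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Sine wave")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")    # or fig.savefig("sine.png") / plt.show()
```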

60. Using the below Pandas data frame, find the company with the highest
average sales. Derive the summary statistics for the sales column and
transpose the statistics.
 Group by the Company column and use the mean() function to find the average sales
 Use the describe() function to find the summary statistics
 Apply the transpose() function over the describe() method to transpose the statistics
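A minimal sketch of these steps with a hypothetical sales data frame; idxmax() returns the label (here, the company) with the largest average.

```python
import pandas as pd

# Hypothetical sales data frame.
df = pd.DataFrame({
    "Company": ["A", "A", "B", "B", "C"],
    "Sales": [200, 240, 310, 290, 150],
})

avg_sales = df.groupby("Company")["Sales"].mean()   # A: 220, B: 300, C: 150
top_company = avg_sales.idxmax()                    # company with the highest average

summary = df[["Sales"]].describe()                  # statistics as rows
summary_t = summary.transpose()                     # one row per column, statistics as columns
```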
So, those were the 60 data analyst interview questions that can help you crack your
next data analyst interview and help you become a data analyst.
Conclusion

