Aai101 Data Science Question Bank
QUESTION BANK
SUBJECT CODE: AAI101 YEAR / SEM: I/II
SUBJECT NAME: INTRODUCTION TO DATA SCIENCE
ACADEMIC YEAR: 2024-2025
NAME OF THE FACULTY: MR. MADHAVAN R
Course Outcomes
After successful completion of the course, the students should be able to:
Course Outcome No   Course Outcomes
CO2 Understand different types of data description for data science process
CO3 Gain knowledge on relationships between data
Q.No  Questions  Topic  BT Level  Mark
1. Define Data Science and Big Data. (NOV/DEC 2022)
   Topic: Data Science: Uses and Benefits | BT Level: BTL1 | Mark: 2
2. What is the role of data science in business, medical research, healthcare, education, social media, technology and financial institutions?
   Topic: Data Science: Uses and Benefits | BT Level: BTL1 | Mark: 2
3. What are the main types/categories of data?
   Topic: Facets of Data | BT Level: BTL1 | Mark: 2
4. What are predictor and target variables, and what is their use in data modeling? Elaborate in detail with a program and an example.
   Topic: Data Modelling | BT Level: BTL2 | Mark: 15
5. Elaborate the steps of data exploration techniques, given that the data set is a population dataset from the years 2011 to 2016 and the quantity of data required ranges from 6 to 50.
   Topic: Data Exploration | BT Level: BTL3 | Mark: 15
UNIT - 2
Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data with Averages – Describing Variability – Normal Distributions and Standard (z) Scores
PART – A
Q.No  Questions  Topic  BT Level  Mark
5. Topic: Describing Data with Tables and Graphs
   FRIENDS      f
   400-above    2
   350-399      5
   300-349     12
   250-299     17
   200-249     23
   150-199     49
   100-149     27
   50-99       29
   0-49        36
   Total      200
6. What is a relative frequency distribution? The GRE scores for a group of graduate school applicants are distributed as follows:
   GRE Score   Frequency
   725-749       1
   700-724       3
   675-699      14
   650-674      30
   625-649      34
   600-624      42
   575-599      30
   550-574      27
   525-549      13
   500-524       4
   475-499       2
   TOTAL       200
   Topic: Describing Data with Tables and Graphs | BT Level: BTL3 | Mark: 8
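For question 6 above, a relative frequency distribution expresses each class frequency as a proportion of the total frequency. The following is a minimal sketch, assuming the interval printed as 650-774 is meant to be 650-674; the variable names are illustrative.

```python
# Class intervals and frequencies copied from the GRE table in question 6.
# Assumption: the interval printed as "650-774" is read as 650-674.
gre_freq = {
    "725-749": 1, "700-724": 3, "675-699": 14, "650-674": 30,
    "625-649": 34, "600-624": 42, "575-599": 30, "550-574": 27,
    "525-549": 13, "500-524": 4, "475-499": 2,
}

total = sum(gre_freq.values())  # total number of applicants

# Relative frequency = class frequency / total frequency.
relative_freq = {interval: f / total for interval, f in gre_freq.items()}

for interval, proportion in relative_freq.items():
    print(f"{interval:>8}  {proportion:.3f}")
```

The proportions should sum to 1 (within rounding), which is a quick check on the arithmetic.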
8. (i) What is a z-score? Outline the steps to obtain a z-score. (APR/MAY 2023)
   Topic: Normal Distributions and z score | BT Level: BTL3 | Mark: 7
9. (ii) Express each of the following scores as a z-score: first, Mary's intelligence quotient is 135, given a mean of 100 and a standard deviation of 15; second, Mary obtained a score of 470 in the competitive examination conducted in April 2022, given a mean of 500 and a standard deviation of 100. (APR/MAY 2023)
   Topic: Normal Distributions and z score | BT Level: BTL3 | Mark: 6
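For question 9 above, a z-score is obtained by subtracting the mean from the raw score and dividing by the standard deviation. The sketch below applies that definition to the two scores given; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


iq_z = z_score(135, 100, 15)     # intelligence quotient
exam_z = z_score(470, 500, 100)  # competitive examination score

print(f"IQ z-score:   {iq_z:.2f}")
print(f"Exam z-score: {exam_z:.2f}")
```

A positive result places the score above the mean and a negative result places it below.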
10. What is a frequency distribution? Customers who have purchased a particular product rated the usability of the product on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows:
    3 7 2 7 8
    3 1 4 10 3
    2 5 3 5 8
    9 7 6 3 7
    8 9 7 3 6
    Construct a frequency distribution for the above data. (APR/MAY 2023)
    Topic: Describing Data with Tables and Graphs | BT Level: BTL3 | Mark: 5
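For question 10 above, a frequency distribution for ungrouped data simply counts how often each rating occurs. One possible sketch uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# The 25 usability ratings from question 10, row by row.
ratings = [3, 7, 2, 7, 8,
           3, 1, 4, 10, 3,
           2, 5, 3, 5, 8,
           9, 7, 6, 3, 7,
           8, 9, 7, 3, 6]

freq = Counter(ratings)  # maps each rating to its number of occurrences

print("Rating   f")
for rating in range(10, 0, -1):  # list the classes from highest to lowest
    print(f"{rating:>6}  {freq[rating]:>2}")
print(f" Total  {sum(freq.values()):>2}")
```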
PART – C
1. (i) What is mode? Can there be distributions with no mode or more than one mode? The owner of a new car conducts six gas mileage tests and obtains the following results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9. Find the mode for these data.
   (ii) What is median? Outline the steps to find the median, and find the median, with steps, for the following scores: first, a set of five scores 2, 8, 2, 7, 6; second, a set of six scores 3, 8, 9, 3, 1, 8. (APRIL/MAY 2023)
   Topic: Describing Data with Tables and Graphs | BT Level: BTL3 | Mark: 15
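The answers to question 1 above can be checked with the standard-library `statistics` module. This is a sketch for verification only; the variable names are illustrative, and `multimode` is used because it returns every most-frequent value, which also covers distributions with more than one mode.

```python
import statistics

# (i) Mode of the six gas mileage tests (miles per gallon).
mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]
modes = statistics.multimode(mileage)  # list of all most-frequent values

# (ii) Median: sort the scores, then take the middle score (odd count)
# or the average of the two middle scores (even count).
five_scores = [2, 8, 2, 7, 6]
six_scores = [3, 8, 9, 3, 1, 8]
median_five = statistics.median(five_scores)
median_six = statistics.median(six_scores)

print("Mode(s):", modes)
print("Sorted five scores:", sorted(five_scores), "median:", median_five)
print("Sorted six scores: ", sorted(six_scores), "median:", median_six)
```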
2. During their first swim through a water maze, 15 laboratory rats made the following number of errors (blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3. (AU APR/MAY 2024)
   Topic: Describing Data with Averages | BT Level: BTL2 | Mark: 15
3. Explain the Normal Distribution in detail.
   Topic: Normal Distributions and z score | BT Level: BTL2 | Mark: 15
4. Explain describing data with tables and graphs in detail.
   Topic: Describing Data with Tables and Graphs | BT Level: BTL2 | Mark: 15
5. Explain the types of variables in data science in detail. Give a case study on sample and population mean and variance with relevant examples and a dataset.
   Topic: Types of variables | BT Level: BTL2 | Mark: 15
UNIT - 3
Q.No  Questions  Topic  BT Level  Mark
2. Helen sent 10 greeting cards to her friends and received 8 cards back. What kind of relationship is this? Explain briefly. (NOV/DEC 2022)
   Topic: Correlation | BT Level: BTL3 | Mark: 2
3. What is the use of a scatter plot? (APRIL/MAY 2023)
   Topic: Scatter plots | BT Level: BTL4 | Mark: 2
18. What are the key properties of the Pearson correlation coefficient? (AU NOV/DEC 2023)
   Topic: Correlation coefficient | BT Level: BTL1 | Mark: 2
19. Compare correlation and regression.
   Topic: Correlation | BT Level: BTL2 | Mark: 2
25. What is multicollinearity?
   Topic: Multiple regression equations | BT Level: BTL4 | Mark: 2
PART – B
1. Calculate the correlation coefficient for the heights, in inches, of fathers (x) and their sons (y) with the data presented below. (APR/MAY 2023)
   x   66 68 68 70 71 72 72
   y   68 70 69 72 72 72 74
   Topic: Correlation coefficient | BT Level: BTL3 | Mark: 13
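For the fathers and sons data above, the Pearson correlation coefficient can be computed as r = SPxy / sqrt(SSx * SSy), where SPxy is the sum of products of deviations and SSx, SSy are the sums of squared deviations. The following is a minimal sketch of that calculation in plain Python; the variable names are illustrative.

```python
from math import sqrt

x = [66, 68, 68, 70, 71, 72, 72]  # fathers' heights (inches)
y = [68, 70, 69, 72, 72, 72, 74]  # sons' heights (inches)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and sum of products, all based on deviations from the means.
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sp_xy / sqrt(ss_x * ss_y)
print(f"SSx = {ss_x:.3f}, SSy = {ss_y:.3f}, SPxy = {sp_xy:.3f}")
print(f"r = {r:.2f}")
```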
2. The values of x and their corresponding values of y are presented below. (APR/MAY 2023)
   x   0.5 1.5 2.5 3.5 4.5 5.5 6.5
   y   2.5 3.5 5.5 4.5 6.5 8.5 10.5
   (i) Find the least squares regression line y = ax + b.
   (ii) Estimate the value of y when x = 10.
   Topic: Least squares | BT Level: BTL3 | Mark: 13
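For the least squares question above, the slope is a = SPxy / SSx and the intercept is b = mean(y) - a * mean(x). The sketch below applies these formulas directly to the seven (x, y) pairs; the variable names are illustrative.

```python
x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
y = [2.5, 3.5, 5.5, 4.5, 6.5, 8.5, 10.5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: sum of products of deviations divided by the sum of squares of x.
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
a = sp_xy / ss_x

# Intercept: the fitted line passes through the point (mean_x, mean_y).
b = mean_y - a * mean_x

y_at_10 = a * 10 + b
print(f"y = {a:.4f}x + {b:.4f}")
print(f"Estimated y at x = 10: {y_at_10:.4f}")
```

Note that x = 10 lies outside the observed range of x, so the estimate is an extrapolation.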
3. 2 2
   3 2
   1 1
   2 2
   (1) Construct a scatterplot to verify a lack of pronounced curvilinearity.
   (2) Determine the least squares equation for these data. (Remember, you will first have to calculate r, SSy and SSx.)
   (3) Determine the standard error of estimate, Sy/x, given that n = 7. (NOV/DEC 2022)
   Topic: Scatter plots
4. (i) In studies dating back over 100 years, it is well established that regression toward the mean occurs between the heights of fathers and the heights of their adult sons. Indicate whether the following statements are true or false.
   (1) Sons of tall fathers will tend to be shorter than their fathers.
   (2) Sons of short fathers will tend to be taller than the mean for all sons.
   (3) Every son of a tall father will be shorter than his father.
   (4) Taken as a group, adult sons are shorter than their fathers.
   (5) Fathers of tall sons will tend to be taller than their sons.
   (6) Fathers of short sons will tend to be taller than their sons but shorter than the mean for all fathers.
   (ii) Interpret the value of r² in correlation-based analysis. (NOV/DEC 2022)
   Topic: Correlation | BT Level: BTL3 | Mark: 13
5. (i) In statistics, what is the impact when the goodness-of-fit test score is low?
   (ii) Given the following dataset of employees, use regression analysis to find the expected salary of an employee whose age is 45.
   Age   Salary
   54    67000
   42    43000
   49    55000
   57    71000
   35    25000
   (AU APR/MAY 2024)
   Topic: Regression | BT Level: BTL3 | Mark: 13
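For part (ii) of question 5 above, one way to obtain the expected salary is to fit a straight line salary = slope * age + intercept by least squares and evaluate it at age 45. The sketch below uses `numpy.polyfit` with degree 1 for the fit; this is one possible approach, not necessarily the method expected in the examination, and the variable names are illustrative.

```python
import numpy as np

age = np.array([54, 42, 49, 57, 35])
salary = np.array([67000, 43000, 55000, 71000, 25000])

# Degree-1 polynomial fit: coefficients come back highest power first,
# i.e. [slope, intercept] for a straight line.
slope, intercept = np.polyfit(age, salary, 1)

predicted = slope * 45 + intercept
print(f"salary = {slope:.2f} * age + ({intercept:.2f})")
print(f"Expected salary at age 45: {predicted:.2f}")
```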
6. (i) Define autocorrelation. How is it calculated? What does a negative correlation convey?
   (ii) What is the philosophy of logistic regression? What kind of model is it? What does logistic regression predict? Tabulate the cardinal differences between linear and logistic regression. (AU APR/MAY 2024)
   Topic: Linear Regression | BT Level: BTL2 | Mark: 13
7. (i) Explain scatter plot.
   (ii) Describe range and variance. (AU NOV/DEC 2023)
   Topic: Scatter plot | BT Level: BTL2 | Mark: 13
8. Calculate the value of r using the computation formula for the following data:
   FRIENDS   SENT   RECEIVED
   Dories     13      14
   Steve       9      18
   Mike        7      12
   Andrea      5      10
   John        1       6
   Topic: Correlation coefficient | BT Level: BTL3 | Mark: 13
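For question 8 above, the computation formula works from raw totals instead of deviations: SSx = ΣX² - (ΣX)²/n, SSy = ΣY² - (ΣY)²/n, SPxy = ΣXY - (ΣX)(ΣY)/n, and r = SPxy / sqrt(SSx * SSy). The following is a minimal sketch of those steps; the variable names are illustrative.

```python
from math import sqrt

sent = [13, 9, 7, 5, 1]        # cards sent (X)
received = [14, 18, 12, 10, 6]  # cards received (Y)
n = len(sent)

# Raw totals needed by the computation formulas.
sum_x = sum(sent)
sum_y = sum(received)
sum_x2 = sum(x * x for x in sent)
sum_y2 = sum(y * y for y in received)
sum_xy = sum(x * y for x, y in zip(sent, received))

ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp_xy = sum_xy - sum_x * sum_y / n

r = sp_xy / sqrt(ss_x * ss_y)
print(f"SSx = {ss_x}, SSy = {ss_y}, SPxy = {sp_xy}")
print(f"r = {r:.2f}")
```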
9. (i) Explain the correlation coefficient.
   (ii) Explain, with an example, how the least squares equation is used to minimize the total of all squared prediction errors. (AU NOV/DEC 2023)
   Topic: Correlation coefficient | BT Level: BTL3 | Mark: 6
10. Explain Multiple Regression Equations in detail.
   Topic: Multiple regression | BT Level: BTL2 | Mark: 7
PART – C
1. Assume that an r of -.80 describes the strong negative relationship between years of heavy smoking (X) and life expectancy (Y). Assume, furthermore, that the distributions of heavy smoking and life expectancy each have the following means and sums of squares:
   X̄ = 5, Ȳ = 60, SSx = 35, SSy = 70
   (i) Determine the least squares regression equation for predicting life expectancy from years of heavy smoking.
   (ii) Determine the standard error of estimate, Sy/x, assuming that the correlation of -.80 was based on n = 50 pairs of observations.
   (iii) Supply a rough interpretation of Sy/x.
   (iv) Predict the life expectancy for John, who has smoked heavily for 8 years.
   (v) Predict the life expectancy for Katie, who has never smoked heavily. (NOV/DEC 2022)
   Topic: Standard error of estimate | BT Level: BTL3 | Mark: 15
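The smoking and life expectancy question above can be worked from summary statistics alone. The sketch below makes two assumptions that should be checked against the course text: the given values are read as mean of X = 5, mean of Y = 60, SSx = 35 and SSy = 70, and the standard error of estimate uses the n - 2 form, Sy/x = sqrt(SSy * (1 - r²) / (n - 2)). The function name `predict` is illustrative.

```python
from math import sqrt

# Summary statistics (assumed reading of the values given in the question).
r = -0.80
mean_x, mean_y = 5, 60   # mean years of heavy smoking, mean life expectancy
ss_x, ss_y = 35, 70      # sums of squares for X and Y
n = 50                   # number of pairs of observations

# (i) Least squares equation Y' = bX + a.
b = r * sqrt(ss_y / ss_x)
a = mean_y - b * mean_x

# (ii) Standard error of estimate (assumption: n - 2 in the denominator).
s_yx = sqrt(ss_y * (1 - r ** 2) / (n - 2))


def predict(years_smoking):
    """Predicted life expectancy for a given number of years of heavy smoking."""
    return b * years_smoking + a


print(f"Y' = {b:.2f}X + {a:.2f}")
print(f"Sy/x = {s_yx:.2f}")
print(f"John (8 years):  {predict(8):.2f}")
print(f"Katie (0 years): {predict(0):.2f}")
```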
2. Consider the following dataset with one response variable y and two predictor variables x1 and x2:
   y    140 155 159 179 192 200 212 215
   x1    60  62  67  70  71  72  75  78
   x2    22  25  24  20  15  14  14  11
   Topic: Multiple regression | BT Level: BTL3 | Mark: 15
20 6.8 91738
8 3.2 64445
6 2.2 39891
4 7.1 98273
21 3.2 54445
7 10.5 121872
29 6.0 93490
19 4.0 55794
11 8.2 11812
UNIT 4
4. Create a data frame with the following key-data pairs: A-10, C-20, C-5, B-10, C-10. Find the sum for each key and display the result for each key group. (NOV/DEC 2022)
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
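One possible answer to question 4 above builds the data frame from a dictionary and then groups by the key column. The column names `key` and `data` are illustrative choices, not given in the question.

```python
import pandas as pd

# Key-data pairs from the question: A-10, C-20, C-5, B-10, C-10.
df = pd.DataFrame({
    "key": ["A", "C", "C", "B", "C"],
    "data": [10, 20, 5, 10, 10],
})

# Split the rows by key, then sum the data values within each group.
totals = df.groupby("key")["data"].sum()
print(totals)
```

The result is a Series indexed by key, with one total per group.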
5. Explain partial sort. (AU NOV/DEC 2023)
   Topic: Sorting arrays | BT Level: BTL2 | Mark: 2
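As an illustration for question 5 above, NumPy offers `np.partition` for partial sorting: it places the k smallest values to the left of position k without fully sorting either side. The sample array below is illustrative, not taken from the question bank.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])  # illustrative values

# After partitioning at k = 3, the three smallest values occupy the first
# three positions (in arbitrary order) and the element at index 3 is the
# value that a full sort would place there.
part = np.partition(x, 3)

print(part)
print("Three smallest (unordered):", part[:3])
```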
6. Under what circumstances is pivot_table() in pandas used? (AU APR/MAY 2024)
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
7. Write the output for the following NumPy code:
   (i) np.array([3,14,4,2,3])
   (ii) np.array([1,2,3,4],dtype='float32')
   (iii) np.array([range(i,i+3) for i in [2,4,6]])
   (iv) np.zeros(10,dtype=int)
   (v) np.ones((3,5), dtype=float)
   (vi) np.full((3,5),3.14)
   (vii) np.arange(0,20,20)
   (viii) np.linspace(0,1,50)
   (ix) np.random.random((3,3))
   (x) np.random.normal(0,1,(3,3))
   Topic: NumPy Arrays | BT Level: BTL2 | Mark: 2
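The calls in question 7 above can be run as a single script to inspect their outputs. This sketch assumes the function names are read as `np.arange` and `np.linspace` (the standard NumPy spellings) and keeps the arguments as printed in the question; for the larger and the random arrays only the shape is printed, since the random values differ on every run.

```python
import numpy as np

print(np.array([3, 14, 4, 2, 3]))                # 1D integer array
print(np.array([1, 2, 3, 4], dtype="float32"))   # same values stored as 32-bit floats

nested = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(nested)                                    # 3x3 array built from three ranges

print(np.zeros(10, dtype=int))                   # ten integer zeros
print(np.ones((3, 5), dtype=float).shape)        # 3x5 array of ones
print(np.full((3, 5), 3.14).shape)               # 3x5 array filled with 3.14

stepped = np.arange(0, 20, 20)                   # start 0, stop 20, step 20
print(stepped)

spaced = np.linspace(0, 1, 50)                   # 50 evenly spaced values from 0 to 1
print(spaced.shape)

print(np.random.random((3, 3)).shape)            # uniform random values in [0, 1)
print(np.random.normal(0, 1, (3, 3)).shape)      # normal random values, mean 0, sd 1
```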
8. Using appropriate data visualization modules, develop a Python code snippet that generates a simple sinusoidal wave in an empty gridded axes. (AU APR/MAY 2024)
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
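One possible snippet for question 8 above uses Matplotlib with NumPy. It is a sketch under a few assumptions: the non-interactive `Agg` backend is selected so the script also runs without a display, the plotted range 0 to 10 is arbitrary, and the output file name `sine_wave.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Sample the sine function on an evenly spaced grid.
x = np.linspace(0, 10, 1000)
y = np.sin(x)

fig, ax = plt.subplots()  # empty figure with a single axes
ax.grid(True)             # turn the axes into a gridded axes
ax.plot(x, y)             # draw the sinusoidal wave
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

fig.savefig("sine_wave.png")  # hypothetical output file name
```

In an interactive session, `plt.show()` can replace the `savefig` call.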
9. What is a DataFrame?
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
10. How can a pandas DataFrame be constructed?
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
11. What are indexers?
   Topic: Hierarchical indexing | BT Level: BTL2 | Mark: 2
12. How can missing data be handled in Python?
   Topic: Handling missing data | BT Level: BTL2 | Mark: 2
13. How can operations be performed on null values in pandas?
   Topic: Handling missing data | BT Level: BTL2 | Mark: 2
14. Define hierarchical indexing.
   Topic: Hierarchical indexing | BT Level: BTL2 | Mark: 2
15. What is a pivot table?
   Topic: Data manipulation with pandas | BT Level: BTL2 | Mark: 2
16. Identify the details maintained by Python to store an integer.
   Topic: Structured data | BT Level: BTL2 | Mark: 2
17. Write Python code to create 1D, 2D and 3D NumPy arrays.
   Topic: NumPy arrays | BT Level: BTL2 | Mark: 2
18. How do you verify the shape of a 1D, 2D and 3D/ND array respectively?
   Topic: NumPy arrays | BT Level: BTL2 | Mark: 2
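Questions 17 and 18 above can be illustrated together: the sketch creates one array of each dimensionality and then inspects the `ndim`, `shape` and `size` attributes. The array contents and names are illustrative.

```python
import numpy as np

a1 = np.array([1, 2, 3])                  # 1D: a single row of three values
a2 = np.array([[1, 2, 3], [4, 5, 6]])     # 2D: two rows, three columns
a3 = np.zeros((2, 3, 4))                  # 3D: two blocks of 3x4 zeros

for name, arr in [("a1", a1), ("a2", a2), ("a3", a3)]:
    # ndim = number of dimensions, shape = size along each dimension,
    # size = total number of elements.
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```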
19. Compare Python lists with arrays.
   Topic: Data Indexing and Selection | BT Level: BTL2 | Mark: 2
20. Write a short note on the Python array object.
   Topic: Data Indexing and Selection | BT Level: BTL2 | Mark: 2
21. How is slicing performed to access the elements of NumPy arrays?
   Topic: Data Indexing and Selection | BT Level: BTL2 | Mark: 2
22. Summarize some built-in pandas aggregations. (NOV/DEC 2023)
   Topic: Aggregation and Grouping | BT Level: BTL2 | Mark: 2
23. What are indexing and negative indexing in a tuple?
   Topic: Data Indexing and Selection | BT Level: BTL2 | Mark: 2
24. List the aggregate functions of NumPy.
   Topic: Aggregation and Grouping | BT Level: BTL2 | Mark: 2
25. What is fancy indexing?
   Topic: Fancy Indexing | BT Level: BTL2 | Mark: 2
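As an illustration for question 25 above, fancy indexing means passing a list or array of indices instead of a single index, so that several elements are selected at once. The sample values below are illustrative.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])  # illustrative values

# A list of indices selects several elements in one step.
picked = x[[0, 3, 4]]
print(picked)

# The result takes the shape of the index array, not of the array being indexed.
idx = np.array([[0, 1],
                [3, 4]])
picked_2d = x[idx]
print(picked_2d)
```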
PART - B
1. Imagine you have a series of data that represents the amount of precipitation each day for a year in a given city. Load the daily rainfall statistics for the city of Chennai in 2021, which are given in the CSV file Chennairainfall2021.csv, using pandas. Generate a histogram for rainy days and find the days that have high rainfall. (NOV/DEC 2022)
   Topic: Data manipulation with pandas | BT Level: BTL3 | Mark: 13
2. Consider that an e-commerce organization like Amazon has sales for different regions in the CSV files NorthSales, SouthSales, WestSales and EastSales. They want to combine the North and West region sales, and the South and East region sales, to find the aggregate sales of these collaborating regions. Help them to do so using Python code. (NOV/DEC 2022)
   Topic: Data manipulation with pandas | BT Level: BTL3 | Mark: 13
3. Explain grouping in Python with an example. (AU NOV/DEC 2023)
   Topic: Aggregation and Grouping | BT Level: BTL3 | Mark: 13
UNIT 5
PART – A
Topic