Data Science Model Examination
(Regulations 2021)
3. Tabulate the differences between univariate, bivariate, and multivariate analysis. Give
examples.
6. Briefly explain Type I and Type II errors in statistics. Identify the relationship between standard
error and margin of error.
7. Assuming that the null hypothesis is correct, what does it mean when the p-value is high, and
what does it mean when it is low?
9. Outline a few approaches to detecting outliers. Explain different ways to deal with them.
PART B (5 × 13 = 65 marks)
11. (a) Briefly describe exploratory analysis in dataset analysis and the knowledge discovery
process. (13)
Or
(b) (i) Outline the purpose of data cleansing. How are missing and null data attributes
handled and modified during preprocessing? (6)
(ii) Explain the data analytics life cycle. Briefly describe regression analysis. (7)
12. (a) (i) Indicate whether each of the following distributions is positively or negatively
skewed. The distribution of (6)
(2) GPAs for all students at some college have a mean of 3.01 and a median of 3.20
(1) Find the mode, median and mean for these data.
(2) Draw the distribution for balanced, positively skewed or negatively skewed. (7)
Or
(b) (i) Assume that SAT math scores approximate a normal curve with a mean of 500
and a standard deviation of 100.
Sketch a normal curve and shade in the target area(s) described by each of the
following statements:
(ii) Assume that the burning times of electric light bulbs approximate a normal curve
with a mean of 1200 hours and a standard deviation of 150 hours and standard
deviation of 120 hours. If a large number of new lights are installed at the same time,
at what time will.
13. (a) (i) Among 100 couples who had undergone marital counseling, 60 couples described
their relationships as improved, and among this latter group, 45 couples had children. The
remaining couples described their relationships as unimproved, and among this group, 5
couples had children.
(1) What is the probability of randomly selecting a couple who described their
relationship as improved? (2)
(2) What is the probability of randomly selecting a couple with children? (2)
(3) What is the conditional probability of randomly selecting a couple with children,
given that their relationship was described as improved? (2)
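The three probabilities asked for in (1) to (3) follow directly from the counts in the question. A minimal sketch, assuming the 40 couples not described as improved make up the whole "unimproved" group; the variable names are illustrative only:

```python
from fractions import Fraction

# Counts taken from the question text.
total_couples = 100
improved = 60
improved_with_children = 45
unimproved = total_couples - improved        # the remaining 40 couples
unimproved_with_children = 5

# (1) P(improved)
p_improved = Fraction(improved, total_couples)

# (2) P(children): couples with children in either group
p_children = Fraction(improved_with_children + unimproved_with_children,
                      total_couples)

# (3) P(children | improved): restrict attention to the improved group
p_children_given_improved = Fraction(improved_with_children, improved)

print(float(p_improved), float(p_children), float(p_children_given_improved))
```

Exact fractions are used so that the results are not affected by floating-point rounding.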
(ii) The probability of a boy being born equals 0.50, or 1/2, as does the probability of a girl
being born. For a randomly selected family with two children, what’s the probability of
Or
(b) (i) The normal range for a widely accepted measure of body size, the Body Mass Index
(BMI), ranges from 18.5 to 25. Using the midrange BMI score of 21.75 as the null
hypothesized value for the population mean, test this hypothesis at the 0.01 level of
significance given a random sample of 30 weight-watcher participants who show a mean
BMI = 22.2 and a standard deviation of 3.1. (6)
(ii) State any two reasons why the research hypothesis is not tested directly. Explain
them in brief (7)
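The BMI question in (b) (i) can be checked with a one-sample t test. A minimal sketch, assuming a two-tailed test (the question does not state a direction) and a critical value of 2.756 read from a standard t table for 29 degrees of freedom; the function name `one_sample_t` is hypothetical:

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the t ratio for a single sample mean."""
    standard_error = sample_sd / math.sqrt(n)   # estimated standard error of the mean
    return (sample_mean - hypothesized_mean) / standard_error

# Values taken from the question text.
t_stat = one_sample_t(sample_mean=22.2, hypothesized_mean=21.75,
                      sample_sd=3.1, n=30)

degrees_of_freedom = 30 - 1
critical_t = 2.756          # assumption: two-tailed, alpha = 0.01, df = 29 (t table)

reject_null = abs(t_stat) > critical_t
print(round(t_stat, 2), reject_null)
```

Under these assumptions the observed t falls well inside the critical values, so the null hypothesis would be retained.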
14. (a) (i) Twenty-three overweight male volunteers are randomly assigned to three different
treatment programs designed to produce a weight loss by focusing on either diet, exercise, or
the modification of eating behavior. Weight changes were recorded, to the nearest pound, for
all participants who completed the two-month experiment. Positive scores signify a weight
drop; negative scores, a weight gain.
                Weight Change
     Diet   Exercise   Behavior Modification
       3       -1        7
       4        0       10
       0        8        0
       3        4       12
      10       -3       12
       3        0        5
T     22       12       63
n      8        6        9
(ii) The F test describes the ratio of two sources of variability: that for subjects treated
differently and that for subjects treated similarly. Is there any sense in which the t-test for two
independent groups can be viewed likewise? (7)
Or
(b) (i) Briefly explain the partial squared curvilinear correlation. What is its purpose? (6)
(ii) Briefly explain Tukey's HSD test. Additionally, briefly explain two-factor
ANOVA. (7)
15. (a) (i) In statistics, what happens when the goodness-of-fit test score is low? (6)
(ii) Given the following employee dataset, use regression analysis to find the expected
salary of an employee whose age is 45. (7)
Age Salary
54 07000
42 43090
49 55000
57 71000
35 25000
Or
(b) (i) Define autocorrelation. How is it calculated? What does a negative correlation
convey? (6)
(ii) What is the philosophy of logistic regression? What kind of model is it? What does
logistic regression predict? Tabulate the cardinal differences between linear and logistic
regression. (7)
16. (a) (i) Discuss the role of the sampling distribution in inferential statistics. (8)
(ii) Indicate whether each of the following distributions is positively or negatively skewed.
The distribution of
Or
(b) (i) Assume that we have a stream of items of large and unknown length that we can only
iterate over once. Devise an effective sampling algorithm that randomly chooses an item from
this stream such that each item is equally likely to be selected. (8)
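One well-known technique that meets these constraints is reservoir sampling with a reservoir of size one. A minimal sketch, not a model answer; the function name `sample_one` and the `rng` parameter are assumptions made for illustration:

```python
import random

def sample_one(stream, rng=random):
    """Pick one item uniformly at random from an iterable of unknown length.

    The stream is read exactly once and only the current choice is stored.
    Returns None if the stream is empty.
    """
    chosen = None
    for count, item in enumerate(stream, start=1):
        # Keep the new item with probability 1/count; otherwise keep the old choice.
        if rng.randrange(count) == 0:
            chosen = item
    return chosen

# Usage: works on any one-pass iterable, such as a generator.
print(sample_one(x * x for x in range(1000)))
```

The k-th item is kept with probability 1/k and then survives each later step j with probability (j - 1)/j, so after n items every item has been selected with probability 1/n.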
(ii) During their first swim through a water maze, 15 laboratory rats made the
following number of errors (blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5, 8, 5, 6,
2, 12, 10, 4, 3.
Find the mode, median and mean for these data.
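The three measures for the water-maze error counts above can be verified with Python's standard `statistics` module. A minimal sketch; the variable names are illustrative only:

```python
import statistics

# Error counts for the 15 rats, copied from the question text.
errors = [2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3]

error_mode = statistics.mode(errors)      # most frequent value
error_median = statistics.median(errors)  # middle value of the sorted scores
error_mean = statistics.mean(errors)      # sum of scores divided by their number

print(error_mode, error_median, error_mean)
```

The mean exceeds the median here because the few large scores (17 and 28) pull it upward, which is the pattern of a positively skewed distribution.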