
Data Science Model Examination

Uploaded by

arfath
Copyright
© All Rights Reserved

Artificial Intelligence and Data Science

AD 3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

(Common to: Computer Science and Business Systems)

(Regulations 2021)

Time: Three hours    Maximum: 100 marks

Answer ALL questions

PART A (10 × 2 = 20 marks)

1. List down any five skills required for a data analyst.

2. Outline the significance of exploratory data analysis (EDA).

3. Tabulate the differences between univariate, bivariate, and multivariate analysis. Give
examples.

4. Give an example of a data set with a non-Gaussian distribution.

5. Explain the term 'Normal Distribution'.

6. Write briefly about Type I and Type II errors in statistics. Identify the relationship between
standard error and margin of error.

7. Assuming that the null hypothesis is correct, what does it mean when the p-values are high
and low?

8. Define the term one-factor ANOVA.

9. Outline a few approaches to detect outliers. Explain different ways to deal with them.

10. Give an approach to handle missing values in a dataset.

PART B (5 × 13 = 65 marks)

11. (a) Write briefly about exploratory analysis in the data analysis and knowledge discovery
process. (13)

Or

(b) (i) Outline the purpose of data cleansing. How are missing and null data attributes
handled and modified during preprocessing? (6)

(ii) Explain the data analytics life cycle. Write briefly about regression analysis. (7)

12. (a) (i) Indicate whether each of the following distributions is positively or negatively
skewed. (6)

(1) Incomes of taxpayers have a mean of $48,000 and a median of $43,000.

(2) GPAs for all students at some college have a mean of 3.01 and a median of 3.20.
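A rule of thumb that applies to this kind of item: extreme scores pull the mean toward the tail, so a mean above the median suggests positive skew and a mean below it suggests negative skew. A minimal sketch of that check follows; `skew_direction` is a hypothetical helper name, not a library function.

```python
def skew_direction(mean: float, median: float) -> str:
    """Classify skew from the mean-median comparison (a rule of thumb only)."""
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "balanced"


# Values taken from the two items above.
incomes = skew_direction(48_000, 43_000)
gpas = skew_direction(3.01, 3.20)
print(incomes, gpas)
```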

(ii) Consider the following number of online examination attempts of 15 students:

2, 17, 5, 3, 28, 7, 5, 8, 8, 8, 2, 12, 10, 4, 3

(1) Find the mode, median and mean for these data.

(2) Sketch the distribution and state whether it is balanced, positively skewed or
negatively skewed. (7)
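One way to check the three measures of central tendency is with Python's standard `statistics` module (the sketch assumes Python 3.8 or later for `multimode`). The variable names are illustrative.

```python
from statistics import mean, median, multimode

# The 15 attempt counts listed in the question.
attempts = [2, 17, 5, 3, 28, 7, 5, 8, 8, 8, 2, 12, 10, 4, 3]

modes = multimode(attempts)   # list of the most frequent value(s)
mid = median(attempts)        # middle value of the sorted data
avg = mean(attempts)          # arithmetic mean

print(modes, mid, round(avg, 2))
```

Here the mean exceeds the median, which by the usual mean-median comparison points to a positively skewed distribution.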

Or

(b) (i) Assume that SAT math scores approximate a normal curve with a mean of 500
and a standard deviation of 100.

Sketch a normal curve and shade in the target area(s) described by each of the
following statements:

• More than 570

• Less than 515

• Between 520 and 540 (5)
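The shaded areas can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean and standard deviation stated in the question; it computes proportions only and does not draw the curve.

```python
from statistics import NormalDist

# SAT math scores: mean 500, standard deviation 100 (from the question).
sat = NormalDist(mu=500, sigma=100)

p_above_570 = 1 - sat.cdf(570)               # area to the right of 570
p_below_515 = sat.cdf(515)                   # area to the left of 515
p_520_to_540 = sat.cdf(540) - sat.cdf(520)   # area between 520 and 540

print(round(p_above_570, 4), round(p_below_515, 4), round(p_520_to_540, 4))
```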

(ii) Assume that the burning times of electric light bulbs approximate a normal curve
with a mean of 1200 hours and a standard deviation of 150 hours and standard
deviation of 120 hours. If a large number of new lights are installed at the same time,
at what time will

• 1 percent fail? (2)

• 50 percent fail? (2)

• 95 percent fail? (4)
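Each part asks for the burning time below which a given fraction of bulbs has failed, which is a percentile of the normal curve. Because the question as printed mentions two standard deviations (150 and 120 hours), the sketch below takes the standard deviation as a parameter and evaluates both. `failure_times` is a hypothetical helper name.

```python
from statistics import NormalDist


def failure_times(mean_hours, sd_hours, fractions=(0.01, 0.50, 0.95)):
    """Return the burning time by which each fraction of bulbs has failed."""
    dist = NormalDist(mu=mean_hours, sigma=sd_hours)
    return {p: dist.inv_cdf(p) for p in fractions}


# Assumption: the mean is 1200 hours; both printed standard deviations are tried.
times_120 = failure_times(1200, 120)
times_150 = failure_times(1200, 150)

for sd, times in ((120, times_120), (150, times_150)):
    print(sd, {p: round(t, 1) for p, t in times.items()})
```

Whichever standard deviation applies, the 50 percent point equals the mean, since the normal curve is symmetric.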

13. (a) (i) Among 100 couples who had undergone marital counseling, 60 couples described
their relationships as improved, and among this latter group, 45 couples had children. The
remaining couples described their relationships as unimproved, and among this group, 5
couples had children.

(1) What is the probability of randomly selecting a couple who described their
relationship as improved? (2)

(2) What is the probability of randomly selecting a couple with children? (2)

(3) What is the conditional probability of randomly selecting a couple with children,
given that their relationship was described as improved? (2)
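The three probabilities follow directly from the counts in the question. A short sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Counts stated in the question.
total_couples = 100
improved = 60
improved_with_children = 45
unimproved_with_children = 5

p_improved = Fraction(improved, total_couples)
p_children = Fraction(improved_with_children + unimproved_with_children, total_couples)
# Conditional probability: restrict attention to the 60 improved couples.
p_children_given_improved = Fraction(improved_with_children, improved)

print(p_improved, p_children, p_children_given_improved)
```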

(ii) The probability of a boy being born equals 0.50, or 1/2, as does the probability of a girl
being born. For a randomly selected family with two children, what’s the probability of

(1) Two boys, that is, a boy and a boy? (3)


(2) Two girls? (2)

(3) Either two boys or two girls? (2)
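These parts use the multiplication rule (for two births) and the addition rule (for mutually exclusive outcomes). The sketch below enumerates the four possible families, under the assumption that the two births are independent.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Assumption: births are independent, so each ordered pair has probability 1/2 * 1/2.
outcomes = {pair: half * half for pair in product("BG", repeat=2)}

p_two_boys = outcomes[("B", "B")]
p_two_girls = outcomes[("G", "G")]
# Two boys and two girls cannot both occur, so their probabilities add.
p_same_sex = p_two_boys + p_two_girls

print(p_two_boys, p_two_girls, p_same_sex)
```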

Or

(b) (i) The normal range for a widely accepted measure of body size, the Body Mass Index
(BMI), ranges from 18.5 to 25. Using the midrange BMI score of 21.75 as the null
hypothesized value for the population mean, test this hypothesis at the 0.01 level of
significance given a random sample of 30 weight-watcher participants who show a mean
BMI = 22.2 and a standard deviation of 3.1. (6)
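Since only the sample standard deviation is given, a one-sample t test fits this problem. The sketch below assumes a two-tailed test (the question does not state a direction) and takes the critical value 2.756 for 29 degrees of freedom from a standard t table.

```python
from math import sqrt

# Values stated in the question.
mu_0 = 21.75          # null-hypothesized population mean
sample_mean = 22.2
sample_sd = 3.1
n = 30

standard_error = sample_sd / sqrt(n)
t = (sample_mean - mu_0) / standard_error

# Assumption: two-tailed test at the 0.01 level, df = n - 1 = 29 (t-table value).
t_critical = 2.756
reject_null = abs(t) > t_critical

print(round(t, 3), reject_null)
```

Under these assumptions the observed t falls well inside the critical values, so the null hypothesis is retained.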

(ii) State any two reasons why the research hypothesis is not tested directly. Explain
them briefly. (7)

14. (a) (i) Twenty-three overweight male volunteers are randomly assigned to three different
treatment programs designed to produce a weight loss by focusing on either diet, exercise, or
the modification of eating behavior. Weight changes were recorded, to the nearest pound, for
all participants who completed the two-month experiment. Positive scores signify a weight
drop; negative scores, a weight gain.

                  Diet    Exercise    Behavior Modification
Weight Change       3        -1                7
                    4         0               10
                    0         8                0
                    3         4               12
                   10        -3               12
                    3         0                5

T                  22        12               63

n                   8         6                9

ΣX = G = 97; N = 23; ΣX² = 961

Summarize the results with an ANOVA table. (6)
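The ANOVA table can be built from the summary values given with the data (group totals T, group sizes n, G, N and ΣX²), without the individual scores. A minimal sketch using the computational formulas for sums of squares:

```python
# Summary values stated in the question.
totals = {"Diet": 22, "Exercise": 12, "Behavior modification": 63}
sizes = {"Diet": 8, "Exercise": 6, "Behavior modification": 9}
G, N, sum_x2 = 97, 23, 961

correction = G**2 / N                                   # G^2 / N
ss_between = sum(totals[g] ** 2 / sizes[g] for g in totals) - correction
ss_total = sum_x2 - correction
ss_within = ss_total - ss_between

df_between = len(totals) - 1
df_within = N - len(totals)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ss_between:>10.2f}{df_between:>5}{ms_between:>10.2f}{f_ratio:>8.2f}")
print(f"{'Within':<10}{ss_within:>10.2f}{df_within:>5}{ms_within:>10.2f}")
print(f"{'Total':<10}{ss_total:>10.2f}{N - 1:>5}")
```

The F ratio is then compared with the critical F for 2 and 20 degrees of freedom at the chosen level of significance.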

(ii) The F test describes the ratio of two sources of variability: that for subjects treated
differently and that for subjects treated similarly. Is there any sense in which the t-test for two
independent groups can be viewed likewise? (7)
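One way to see the connection: for two independent groups, the squared pooled-variance t equals the one-way F ratio. The sketch below checks this numerically on made-up scores; the data and the function names `pooled_t` and `one_way_f` are hypothetical.

```python
from statistics import mean


def pooled_t(a, b):
    """t ratio for two independent groups using the pooled variance estimate."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    standard_error = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / standard_error


def one_way_f(groups):
    """F ratio: between-group mean square over within-group mean square."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores, for illustration only.
group_a = [3, 5, 4, 6, 2]
group_b = [7, 9, 6, 8, 10]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(round(t, 4), round(t**2, 4), round(f, 4))
```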
Or

(b) (i) Write briefly about partial squared curvilinear correlation. What is its purpose? (6)

(ii) Write briefly about Tukey’s HSD test. Additionally, explain two-factor ANOVA
in brief. (7)

15. (a) (i) In statistics, what happens when the goodness-of-fit test score is low? (6)

(ii) Given the following employee dataset, use regression analysis to find the expected
salary of an employee whose age is 45. (7)

Age Salary

54 07000

42 43090

49 55000

57 71000

35 25000
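A problem of this type is usually solved with simple least-squares regression. The sketch below shows the slope and intercept formulas on a small made-up dataset; `fit_line`, `predict` and the demo values are hypothetical and are not the table's figures.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Hypothetical demo data lying exactly on y = 1 + 2x.
demo_x = [1, 2, 3, 4]
demo_y = [3, 5, 7, 9]
slope, intercept = fit_line(demo_x, demo_y)
demo_prediction = predict(5, slope, intercept)
print(slope, intercept, demo_prediction)
```

For the question itself, the Age column would be passed as `xs`, the Salary column as `ys`, and the fitted line evaluated at age 45.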

Or

(b) (i) Define autocorrelation. How is it calculated? What does a negative correlation
convey? (6)
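A common definition of the lag-k autocorrelation divides the sum of products of deviations k steps apart by the total sum of squared deviations. The sketch below implements that definition; the function name and the demo series are hypothetical.

```python
def autocorrelation(series, lag):
    """Lag-k sample autocorrelation (assumes the series is not constant)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    m = sum(series) / n
    denominator = sum((x - m) ** 2 for x in series)
    numerator = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return numerator / denominator


# Hypothetical series that flips sign at every step.
alternating = [1, -1, 1, -1, 1, -1]
r1 = autocorrelation(alternating, 1)
r2 = autocorrelation(alternating, 2)
print(round(r1, 4), round(r2, 4))
```

In this demo the lag-1 value is negative because each observation tends to be followed by one on the opposite side of the mean, while the lag-2 value is positive.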

(ii) What is the philosophy of logistic regression? What kind of model is it? What does
logistic regression predict? Tabulate the cardinal differences between linear and logistic
regression. (7)
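As an illustration of what logistic regression predicts, the sketch below passes a linear score through the sigmoid function to obtain a probability between 0 and 1, then thresholds it to get a class label. The weight, bias and threshold are hypothetical values chosen for the demo, not fitted parameters.

```python
from math import exp


def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1 / (1 + exp(-z))


def predict_probability(x, weight, bias):
    """Probability of the positive class for a single feature x."""
    return sigmoid(weight * x + bias)


def predict_class(x, weight, bias, threshold=0.5):
    """Class label obtained by thresholding the predicted probability."""
    return int(predict_probability(x, weight, bias) >= threshold)


# Hypothetical parameters, for illustration only.
weight, bias = 1.5, -3.0
p_low = predict_probability(0.0, weight, bias)
p_mid = predict_probability(2.0, weight, bias)   # linear score is exactly 0
p_high = predict_probability(4.0, weight, bias)
print(round(p_low, 4), round(p_mid, 4), round(p_high, 4))
```

Linear regression, by contrast, would report the linear score itself as the predicted value.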

PART C (1 × 15 = 15 marks)

16. (a) (i) Discuss the role of sampling distribution in inferential statistics. (8)

(ii) Indicate whether each of the following distributions is positively or negatively skewed.

• Incomes of taxpayers have a mean of $48,000 and a median of $43,000.

• GPAs for all students at some college have a mean of 3.01 and a median of 3.20.

• Daily TV viewing times for preschool children have a mean of 55 minutes and a median of
73 minutes. (7)

Or

(b) (i) Assume that we have a stream of items of large and unknown length that we can only
iterate over once. Devise an effective sampling algorithm that randomly chooses an item from
this stream such that each item is equally likely to be selected. (8)
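One well-known approach to this kind of problem is reservoir sampling with a reservoir of size one: keep the first item, then replace the kept item with the i-th item with probability 1/i. A minimal sketch follows; `sample_one` is a hypothetical name, and the seeded check at the end is only for illustration.

```python
import random
from collections import Counter


def sample_one(stream, rng=random):
    """Pick one item uniformly at random from an iterable in a single pass."""
    chosen = None
    count = 0
    for item in stream:
        count += 1
        # Replace the current choice with probability 1 / count.
        if rng.randrange(count) == 0:
            chosen = item
    if count == 0:
        raise ValueError("stream is empty")
    return chosen


# Rough uniformity check on a short stream, using a seeded generator.
rng = random.Random(42)
counts = Counter(sample_one(iter(range(5)), rng) for _ in range(10_000))
print(sorted(counts.items()))
```

The sketch uses constant memory and never needs the stream's length. By induction, after i items each one seen so far is the current choice with probability 1/i.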

(ii) During their first swim through a water maze, 15 laboratory rats made the
following number of errors (blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5, 8, 5, 6,
2, 12, 10, 4, 3.

Find the mode, median and mean for these data.

Without constructing a frequency distribution or graph, would you characterize
the shape of this distribution as balanced, positively skewed or negatively
skewed? (7)
