Sample Quiz1 Questions

The document provides sample quiz questions for an introduction to analytics modeling course. It includes 4 multiple choice or short answer questions that cover topics like linear classification, decision trees, probability modeling, time series analysis, clustering, and exponential smoothing. The questions test understanding of different modeling techniques and their appropriate applications to sample prediction problems. Sample data and partial solutions are provided for some questions.

ISYE 6501x Introduction to Analytics Modeling

Sample Quiz #1 Questions

NOTES

1. The real quiz will have more questions, and cover more material; these questions are just meant
to give you an idea of the question style and depth.

2. Because of the online format, I will try to make some of the answers more structured than the
purely-free-answer format in two of the questions below.

3. This is being posted early, because a bunch of you asked for it. Some of the topics covered
below are things you’ll see in the weeks between now and when you take the quiz, so if they
don’t look familiar yet, don’t worry!
NAME____________________________

ISYE 6501x, Introduction to Analytics Modeling

Quiz #1 – 90 minute time limit

INSTRUCTIONS

• Work alone. Do not collaborate with or copy from anyone else.

• You may use any of the following resources:

  - One sheet (both sides) of handwritten (not photocopied or scanned) notes

• If any question seems ambiguous, use the most reasonable interpretation (i.e., don’t be like Calvin).

• Good luck!
1. Figure A below shows a linear classifier (dashed line) for a classification problem, using two
predictors (x1 and x2) to separate between black and white points. Figure B shows a CART
(classification tree) approach to the same problem.

[Figure A: linear classifier. Figure B: classification tree.]
In each leaf of Figure B, “a out of b” means that there are
b data points in the leaf, and a of them are
classified correctly using the leaf’s answer.

a. In Figure A, which predictor (x1 or x2) is not important for separating between the black
and white points in this model?

(CORRECT ANSWER: x2. The classifier is a vertical line, so all that matters is whether x1 is larger or
smaller than 3.5.)

b. In Figure B, both x1 and x2 are used to classify the points (even though one was
unimportant in Figure A). Which classification model do you think is better (Figure A or
Figure B), and why?

CHOICES
i. Figure A, because Figure B overfits the lower-rightmost leaf.
ii. Figure B, because it misclassifies 11 points, and Figure A misclassifies 12 points.
iii. Figure B, because it uses both predictors for classification.
iv. Figure A, because it is a simpler model.

(CORRECT ANSWER: i. The lower-rightmost leaf has just one data point in it, a clear example of
overfitting. Although ii is a true statement, it is not the correct answer: B is not a better model
even though it misclassifies one fewer point, because the apparently better fit is due to overfitting.
iv might be a reasonable answer in general, but in this case the overfitting of B outweighs the
question of which model is slightly simpler.
NOTE: As we saw in the lessons, as a rule of thumb each leaf should have at least 5% of the
data points, and a common rule of thumb for a factor-based model is to have at least 10 times
as many data points as factors selected.)
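
The 5% rule of thumb in the note can be checked mechanically. The sketch below is only an illustration: the leaf sizes are invented (Figure B is not reproduced here), and the function name `undersized_leaves` is hypothetical.

```python
def undersized_leaves(leaf_sizes, min_fraction=0.05):
    """Return the indices of leaves holding fewer than min_fraction of all points."""
    total = sum(leaf_sizes)
    return [i for i, n in enumerate(leaf_sizes) if n < min_fraction * total]


# Hypothetical leaf sizes, invented for illustration (not read from Figure B).
example_leaf_sizes = [20, 15, 14, 1]
flagged = undersized_leaves(example_leaf_sizes)
print(flagged)
```

A leaf that is flagged this way is a candidate for pruning, since a split that isolates a single point is more likely fitting noise than a real pattern.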
2. A geologist would like to build a model to predict the probability that a volcano will erupt in a given
week. The geologist has previous eruption data, as well as several factors that can be used as
predictors.

a. Which of the following models would be most appropriate for the geologist to use to predict the
probability of an eruption?

CHOICES
a. ARIMA
b. CART
c. Cross-validation
d. CUSUM
e. Exponential smoothing
f. GARCH
g. k-means clustering
h. k-nearest-neighbor classification
i. Linear regression
j. Logistic regression
k. Support vector machine

(CORRECT ANSWER: j. Logistic regression is the model we’ve seen (or will see before the quiz) for
predicting probabilities from a set of factors.)
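
As a rough illustration of why logistic regression fits this task, the sketch below evaluates the logistic function p = 1 / (1 + exp(-(a0 + a1*x1 + ... + am*xm))), which always returns a value between 0 and 1. The coefficients, intercept, and factor values are hypothetical; a real model would estimate them from the geologist's eruption data.

```python
import math


def eruption_probability(factors, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(a0 + sum(a_j * x_j))))."""
    z = intercept + sum(a * x for a, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values, invented for illustration only.
coefficients = [0.8, -0.3]
intercept = -2.0
p = eruption_probability([1.5, 0.5], coefficients, intercept)
print(round(p, 3))
```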

b. Select all of the following models that would be appropriate for the geologist to use to classify
data points into “eruption” or “not eruption”.

CHOICES
a. ARIMA
b. CART
c. CUSUM
d. k-nearest-neighbor
e. Support vector machine

(CORRECT ANSWERS: b,d,e (all three must be selected for full credit). All three of these are
methods we’ve seen (or will see before the quiz) for classifying based on a set of factors.)
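
To make the classification setting concrete, here is a minimal k-nearest-neighbor sketch in plain Python. The two-factor data points are invented for illustration, and the function name `knn_classify` is hypothetical; only the labels “eruption” and “not eruption” come from the question.

```python
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented training data: two hypothetical predictor values per week.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_labels = ["not eruption"] * 3 + ["eruption"] * 3
prediction = knn_classify(train_points, train_labels, (5.0, 5.0))
print(prediction)
```

Unlike logistic regression, this returns a class label directly rather than a probability.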

c. The geologist used a simple 50% threshold: if the model predicts a probability of 50% or more
that the volcano will erupt, the geologist recommends that the nearby towns be evacuated. Do
you think 50% is the right threshold, or should it be higher or lower? Explain why.

(CORRECT ANSWER: I accepted any answer that made sense. Many people suggested using a
lower threshold, since lives would be at stake: even if the probability of eruption is small, the
towns should be evacuated. Others suggested a higher threshold, also to save lives, because of
the crying-wolf effect: if the towns were evacuated a couple of times and there was no eruption,
people would be less likely to evacuate again, even if the model suggested a higher probability.
And there were other answers I accepted too. Basically, if your answer used analytics reasoning
correctly, you got credit; if it didn’t use analytics reasoning correctly, you didn’t get credit.)

d. Instead of building a model to predict probability, suppose the geologist measures the
magnitude of tremors in the area over time. When the magnitude of the tremors changes (gets
much larger), the geologist will recommend evacuating nearby towns. Which of the following
models would be most appropriate for the geologist to use?

(CORRECT ANSWER: d. CUSUM is the model we’ve seen for directly detecting changes.)
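
A minimal sketch of CUSUM for detecting an increase, assuming the usual form S_t = max(0, S_{t-1} + (x_t - mu - C)) with an alarm when S_t reaches a threshold T. The tremor magnitudes, the baseline mean, and the values of C and T below are all invented for illustration.

```python
def cusum_increase(values, mean, c, threshold):
    """Return the index of the first observation at which the CUSUM statistic
    S_t = max(0, S_{t-1} + (x_t - mean - c)) reaches the threshold, else None."""
    s = 0.0
    for t, x in enumerate(values):
        s = max(0.0, s + (x - mean - c))
        if s >= threshold:
            return t
    return None


# Hypothetical tremor magnitudes: stable near 2.0, then a sustained jump.
tremors = [2.0, 2.1, 1.9, 2.0, 2.2, 3.5, 3.8, 4.1, 4.0]
first_alarm = cusum_increase(tremors, mean=2.0, c=0.5, threshold=3.0)
print(first_alarm)
```

Raising C or the threshold makes the detector slower but less prone to false alarms, which is the same tradeoff discussed for the evacuation threshold.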
3. A police department has used a k-means approach to find geographic clusters of instances of crime
in their city. Their goal is to find locations for new police patrols to reduce crime in high-crime
areas. The figures below show the solutions for k=2, k=3, and k=4; circles are crime instances and
stars are cluster centers.

k=2

k=3

k=4

Based on how far officers can effectively patrol, the police department initially selected the k=4 solution.
However, they then realized that some of the crimes were committed during the day and others at
night, and they might have two sets of new patrols, one set during the day and one set at night. How
would you suggest the police department redo or change its analysis?

(CORRECT ANSWER: I accepted any answer that made good analytics sense. The most common
one (and the one I was expecting) is that the department could do two clusterings: one using only
daytime data points and one using only nighttime data points, and create daytime patrols and
nighttime patrols based on those two solutions.)
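
The expected answer, one clustering per time period, can be sketched as follows. Everything here is an assumption made for illustration: the crime coordinates are invented, k=2 is used for both periods only to keep the example small (the department would choose k again for each subset), and the seeding rule is a simplification.

```python
def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm on 2-D points.

    Seeding with evenly spaced input points is a simplification for this
    sketch; practical implementations usually try several random starts.
    """
    centers = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda j: (x - centers[j][0]) ** 2 + (y - centers[j][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers


# Invented crime records: (x, y, time of day).
crimes = [
    (1.0, 1.0, "day"), (1.2, 0.8, "day"), (8.0, 8.0, "day"), (8.2, 7.8, "day"),
    (1.0, 8.0, "night"), (0.8, 8.2, "night"), (8.0, 1.0, "night"), (8.2, 1.2, "night"),
]
day_points = [(x, y) for x, y, period in crimes if period == "day"]
night_points = [(x, y) for x, y, period in crimes if period == "night"]

day_centers = kmeans(day_points, k=2)      # candidate daytime patrol locations
night_centers = kmeans(night_points, k=2)  # candidate nighttime patrol locations
print(day_centers, night_centers)
```

In this invented data the daytime and nighttime centers land in different parts of the city, which a single clustering of all crimes would blur together.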
4. Recall the equations for triple exponential smoothing (Winters’/Holt-Winters method):

$$S_t = \alpha \frac{x_t}{C_{t-L}} + (1 - \alpha)(S_{t-1} + T_{t-1})$$
$$T_t = \beta (S_t - S_{t-1}) + (1 - \beta) T_{t-1}$$
$$C_t = \gamma \frac{x_t}{S_t} + (1 - \gamma) C_{t-L}$$
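
One update step of these three equations can be written directly in code. This is a sketch only: the function name `holt_winters_step` and the numeric inputs are hypothetical, and `c_prev_cycle` stands for the seasonal factor from one cycle earlier, C_{t-L}.

```python
def holt_winters_step(x_t, s_prev, t_prev, c_prev_cycle, alpha, beta, gamma):
    """Apply one multiplicative Holt-Winters update.

    s_prev is S_{t-1}, t_prev is T_{t-1}, c_prev_cycle is C_{t-L}.
    Returns the new baseline S_t, trend T_t, and seasonal factor C_t.
    """
    s_t = alpha * x_t / c_prev_cycle + (1 - alpha) * (s_prev + t_prev)
    t_t = beta * (s_t - s_prev) + (1 - beta) * t_prev
    c_t = gamma * x_t / s_t + (1 - gamma) * c_prev_cycle
    return s_t, t_t, c_t


# Invented numbers: the observation is 10% above a flat baseline of 100,
# and the seasonal factor for this position in the cycle was already 1.1.
s_t, t_t, c_t = holt_winters_step(
    x_t=110.0, s_prev=100.0, t_prev=0.0, c_prev_cycle=1.1,
    alpha=0.5, beta=0.5, gamma=0.5,
)
print(s_t, t_t, c_t)
```

Because the observation matches what the seasonal factor already predicts, the baseline stays near 100, the trend stays near 0, and the seasonal factor stays near 1.1.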

A construction vehicle manufacturer wants to use this model to analyze a production process
where construction vehicles are produced in batches of exactly 170, and a batch takes an
average of 9 days to be completed (usually between 8 and 10). Our data includes the day each
vehicle’s production is completed, its sequence in the batch (e.g., 57th out of 170), the day
within the batch that it was completed (e.g., completed on the 3rd day the batch was being
produced), and the number of hours the vehicle operated before its first breakdown.

Vehicle ID    Sequence in batch    Day within batch that    Hours of operation
                                   vehicle was produced     before first breakdown
047-92-1HA    56                   3                        1570
091-46-7ZQ    57                   3                        2349
854-A9-21B    58                   3                        3016
620-88-4GA    59                   4                        2201
Etc.

Based on this data, the manufacturer wants to use a triple exponential smoothing model to
determine whether any patterns exist in the number of hours before the first breakdown, based
on a vehicle’s sequence number in its batch.

For each of the mathematical terms on the left, pick the appropriate number or description
from the right.

Terms:
a. x_t
b. L

Options:
i. 170
ii. 9
iii. Sequence in batch
iv. Day within batch that vehicle was produced
v. Hours of operation before first breakdown

(CORRECT ANSWERS:
a. v. x_t is the observed value of the response, the hours of operation before the first
breakdown.
b. i. L is the length of the cycle. Since the question specifies “a vehicle’s sequence
number in its batch”, the cycle length is the 170 vehicles in each batch.)

c. If the manufacturer observes that the values of C are generally close to 1, except that
they are significantly lower than 1 for vehicles built near the beginning of batches, what
can be concluded?

CHOICES
i. There is no effect of sequence in batch on the number of hours before the first
breakdown.
ii. Vehicles built early in a batch tend to break down more quickly.
iii. Vehicles built early in a batch tend to break down more quickly, because
workers are adjusting to the different specifications in each new batch.
iv. Vehicles built early in a batch tend to take longer to break down.
v. Vehicles built early in a batch tend to take longer to break down, because
workers are paying closer attention to their work early in each new batch.

(CORRECT ANSWER: ii. Values of C less than 1 mean that the response (hours before first
breakdown) is lower, so those vehicles tend to break down sooner. However, all this model
can do is make the observation; there’s nothing in it to explain why the effect is observed – so
although iii. might sound like it makes sense, the model does not say anything about
causality.)

d. If the values of T tend to be slightly positive, what can be concluded?

CHOICES
i. Vehicles built more recently tend to take longer to break down.
ii. Vehicles built more recently tend to break down more quickly.

(CORRECT ANSWER: i. Positive values of T mean that the response is getting higher over time,
so newer vehicles’ responses (time until first breakdown) tend to be higher. So, vehicles built
more recently tend to take longer to break down for the first time.)

e. Suppose the manufacturer wanted to use a regression model to answer the same
question, using the same data: two predictors (sequence in batch and day within batch)
and one response (hours of operation before first breakdown).

If the manufacturer first used principal component analysis on the data, what would you
expect?

CHOICES
i. The first component would be much more important than the second.
ii. The second component would be much more important than the first.
iii. The two components would have approximately the same importance.

(CORRECT ANSWER: i. With 170 vehicles produced in 8-10 days, the two predictors (sequence
in batch, and day within batch) will be highly correlated. So, the second component will have
much less importance, because its effect will only be whatever is uncorrelated with the first
component. Consequently, in the PCA results the first eigenvalue will be much larger than the
second.)
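
The effect can be illustrated numerically. The sketch below assumes both predictors are standardized, so PCA works on the 2x2 correlation matrix, whose eigenvalues are 1 + |r| and 1 - |r| for correlation r. The day assignment is a hypothetical even 9-day schedule, not the manufacturer's actual data.

```python
import math


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# One batch: sequence numbers 1..170, spread evenly over 9 days (an assumption).
sequence = list(range(1, 171))
day = [(s - 1) * 9 // 170 + 1 for s in sequence]

r = correlation(sequence, day)
# Eigenvalues of the correlation matrix [[1, r], [r, 1]].
eigenvalues = (1 + abs(r), 1 - abs(r))
explained_by_first = eigenvalues[0] / sum(eigenvalues)
print(r, eigenvalues, explained_by_first)
```

With predictors this strongly correlated, nearly all of the variance falls on the first component, which matches answer i.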
