
Questions For Chapter 2

1. The document contains a set of multiple choice and fill-in-the-blank questions about machine learning concepts from Chapter 2.
2. The multiple choice questions cover topics like dependent and independent variables, classification vs prediction problems, dealing with outliers, rule precision and coverage, neural networks vs linear regression, association rules, uses of clustering, and evaluation metrics.
3. The fill-in-the-blank questions involve calculating percentages from a confusion matrix, identifying values within a confusion matrix, computing lift, and determining the better model based on costs.

Uploaded by

Yousaf Bashir
Copyright: © All Rights Reserved

Questions for Chapter 2

Multiple Choice Questions


(2.1)

1. Another name for an output attribute.


a. predictive variable
b. independent variable
c. estimated variable
d. dependent variable

2. Classification problems are distinguished from estimation problems in that


a. classification problems require the output attribute to be numeric.
b. classification problems require the output attribute to be categorical.
c. classification problems do not allow an output attribute.
d. classification problems are designed to predict future outcomes.

3. Which statement is true about prediction problems?


a. The output attribute must be categorical.
b. The output attribute must be numeric.
c. The resultant model is designed to determine future outcomes.
d. The resultant model is designed to classify current behavior.

4. Which statement about outliers is true?


a. Outliers should be identified and removed from a dataset.
b. Outliers should be part of the training dataset but should not be present in the test data.
c. Outliers should be part of the test dataset but should not be present in the training data.
d. The nature of the problem determines how outliers are used.
e. More than one of a, b, c, or d is true.

(2.2)

5. Assume that we have a data set containing information about 200 individuals. One hundred of these
individuals have purchased life insurance. A supervised data mining session has discovered the
following rule:

IF age < 30 and credit card insurance = yes
THEN life insurance = yes
Rule Precision: 70%
Rule Coverage: 30%

How many individuals in the class life insurance = no have credit card insurance and are less than 30
years old?
a. 140
b. 60
c. 42
d. 18

6. Which statement is true about neural network and linear regression models?
a. Both models require input attributes to be numeric.
b. Both models require numeric attributes to range between 0 and 1.
c. The output of both models is a categorical attribute value.
d. Both techniques build models whose output is determined by a linear sum of weighted input
attribute values.
e. More than one of a, b, c, or d is true.

(2.3)

7. Unlike traditional production rules, association rules


a. allow the same variable to be an input attribute in one rule and an output attribute in another rule.
b. allow more than one input attribute in a single rule.
c. require input attributes to take on numeric values.
d. require each rule to have exactly one categorical output attribute.

(2.4)

8. Which of the following is a common use of unsupervised clustering?


a. detect outliers
b. determine a best set of input attributes for supervised learning
c. evaluate the likely performance of a supervised learner model
d. determine if meaningful relationships can be found in a dataset
e. All of a, b, c, and d are common uses of unsupervised clustering.

(2.5)

9. The average positive difference between computed and desired outcome values.
a. root mean squared error
b. mean squared error
c. mean absolute error
d. mean positive error

10. Given desired class C and population P, lift is defined as


a. the probability of class C given population P divided by the probability of C given a sample taken
from the population.
b. the probability of population P given a sample taken from P.
c. the probability of class C given a sample taken from population P.
d. the probability of class C given a sample taken from population P divided by the probability of C
within the entire population P.

11. Bootstrapping allows us to
a. choose the same training instance several times.
b. choose the same test set instance several times.
c. build models with alternative subsets of the training data several times.
d. test a model with alternative subsets of the test data several times.

12. With this method, all available data are partitioned into n fixed-size units. n - 1 of the
units are used for training, whereas the nth unit is the test set.
a. x-prediction
b. stratification
c. cross validation
d. bootstrapping

Fill in the Blank


Use the three-class confusion matrix below to answer questions 1 through 3.

                 Computed Decision
            Class 1   Class 2   Class 3
Class 1        10         5         3
Class 2         5        15         3
Class 3         2         2         5

1. What percent of the instances were correctly classified?


2. How many class 2 instances are in the dataset?
3. How many instances were incorrectly classified with class 3?

Use the confusion matrix for Model X and confusion matrix for Model Y to answer questions 4 through 6.

Model X   Computed   Computed        Model Y   Computed   Computed
          Accept     Reject                    Accept     Reject
Accept      10          5            Accept       6          9
Reject      25         60            Reject      15         70

4. How many instances were classified as an accept by Model X?


5. Compute the lift for Model Y.
6. You will notice that the lift for both models is the same. Assume that the cost of a false reject is
significantly higher than the cost of a false accept. Which model is the better choice?

Answers to Chapter 2 Questions


Multiple Choice Questions
1. d
2. b
3. c
4. d
5. d
6. a
7. a
8. e
9. c
10. d
11. a
12. c
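
Answer 5 can be reproduced with a short calculation. The sketch below assumes that rule coverage is measured against all 200 individuals (the share of the dataset that satisfies the IF part of the rule) and that rule precision is the share of those covered individuals with life insurance = yes. That reading is an assumption, chosen because it is the one that leads to choice d. The function name rule_counts is illustrative and does not come from the source.

```python
from fractions import Fraction

def rule_counts(total, precision, coverage):
    """Return (covered, correct, incorrect) instance counts for a rule.

    Assumption: coverage is the fraction of ALL instances that satisfy the
    rule's antecedent, and precision is the fraction of those covered
    instances that also satisfy the rule's consequent.
    """
    covered = total * coverage        # instances matching the IF part
    correct = covered * precision     # covered and life insurance = yes
    incorrect = covered - correct     # covered but life insurance = no
    return covered, correct, incorrect

covered, correct, incorrect = rule_counts(200, Fraction(70, 100), Fraction(30, 100))
print(covered, correct, incorrect)    # prints: 60 42 18
```

Under this reading, 60 individuals satisfy the rule's conditions, 42 of them hold life insurance, and the remaining 18 do not. If coverage were instead measured against only the 100 individuals in the life insurance = yes class, the counts would differ.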

Fill in the Blank


1. 60%
2. 23
3. 6
4. 35
5. 40/21 (about 1.90), computed as (6/21) / (15/100)
6. Model X
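
The fill-in-the-blank answers can be checked directly from the matrices. The sketch below assumes that rows hold the actual class and columns hold the computed decision, and it applies the lift definition from multiple choice question 10: the probability of the class within the sample the model selects, divided by its probability in the whole population. All function and variable names are illustrative.

```python
from fractions import Fraction

# Assumption: rows are actual classes, columns are computed decisions.
three_class = [
    [10, 5, 3],
    [5, 15, 3],
    [2, 2, 5],
]
model_x = [[10, 5], [25, 60]]   # class order: accept, reject
model_y = [[6, 9], [15, 70]]

def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return Fraction(correct, total)

def actual_count(m, k):
    """Number of instances whose actual class is k (row sum)."""
    return sum(m[k])

def wrongly_computed_as(m, k):
    """Instances assigned to class k that actually belong elsewhere."""
    return sum(m[i][k] for i in range(len(m)) if i != k)

def lift(m, k=0):
    """P(class k | computed as k) divided by P(class k) overall."""
    total = sum(sum(row) for row in m)
    computed_k = sum(row[k] for row in m)
    p_sample = Fraction(m[k][k], computed_k)
    p_population = Fraction(sum(m[k]), total)
    return p_sample / p_population

print(accuracy(three_class))                # 3/5, i.e. 60%
print(actual_count(three_class, 1))         # 23
print(wrongly_computed_as(three_class, 2))  # 6
print(sum(row[0] for row in model_x))       # 35
print(lift(model_x), lift(model_y))         # 40/21 40/21
print(model_x[0][1], model_y[0][1])         # false rejects: 5 9
```

Evaluated on the matrices as printed, both models have a lift of 40/21 (about 1.90). Model X makes 5 false rejects against 9 for Model Y, which is why it is the better choice when a false reject is the costlier error.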
