HW_02
Number of *'s ~ difficulty of problem, where * means easy and *** means difficult.
*Q1: Consider the perceptron in two dimensions: h(x) = sign(w^T x), where w = [w0, w1, w2]^T
and x = [1, x1, x2]^T. Technically, x has three coordinates, but we call this perceptron two-
dimensional because the first coordinate is fixed at 1.
(a) Show that the regions on the plane where h(x) = +1 and h(x) = -1 are separated by a line. If
we express this line by the equation x2 = ax1 + b, what are the slope a and intercept b in terms
of w0, w1, w2?
(b) Draw a picture for the cases w = [1, 2, 3]^T and w = -[1, 2, 3]^T. In more than two dimensions,
the +1 and -1 regions are separated by a hyperplane, the generalization of a line.
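A minimal plotting sketch for part (b), assuming numpy and matplotlib are available; it shades the +1 and -1 regions of h(x) over a grid, which makes the separating line visible without giving away the algebra of part (a):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_regions(w, ax):
        # Shade the +1 / -1 regions of h(x) = sign(w0 + w1*x1 + w2*x2)
        x1, x2 = np.meshgrid(np.linspace(-5, 5, 400), np.linspace(-5, 5, 400))
        h = np.sign(w[0] + w[1] * x1 + w[2] * x2)
        ax.contourf(x1, x2, h, levels=[-1.5, 0, 1.5], colors=["lightblue", "salmon"])
        ax.set_title(f"w = {w}")
        ax.set_xlabel("x1")
        ax.set_ylabel("x2")

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    plot_regions([1, 2, 3], axes[0])
    plot_regions([-1, -2, -3], axes[1])
    plt.tight_layout()
    plt.show()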
**Q2: Suppose we have features x ∈ R^p and a two-class response, class-1 and class-2, with
class sizes N1 and N2, and the classes coded as -N/N1 and N/N2, where N = N1 + N2. Show
that the LDA rule classifies a query point x to class-2 if
x^T \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) > \frac{1}{2} (\hat{\mu}_2 + \hat{\mu}_1)^T \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) - \log\left(\frac{N_2}{N_1}\right)
and class-1 otherwise.
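Hint: a natural starting point is the pair of estimated LDA discriminant functions, with priors \hat{\pi}_k = N_k / N; the rule above follows from the condition \delta_2(x) > \delta_1(x):

\delta_k(x) = x^T \hat{\Sigma}^{-1} \hat{\mu}_k - \frac{1}{2} \hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k, \qquad k = 1, 2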
***Q3: You are designing a spam classifier for an email service. The classifier analyzes an
email for the words "WIN" and "MONEY" and decides whether the email is spam or not.
Based on historical data, the following probabilities are known:
If the email is spam, the probability that it contains the word "WIN" is 0.8.
If the email is not spam, the probability that it contains the word "WIN" is 0.1.
If the email is spam, the probability that it contains the word "MONEY" is 0.7.
If the email is not spam, the probability that it contains the word "MONEY" is 0.2.
If the email is spam, the probability that it contains both "WIN" and "MONEY" is 0.5.
If the email is not spam, the probability that it contains both "WIN" & "MONEY" is 0.05.
Compute:
(a) The marginal probability that an email contains both "WIN" and "MONEY", i.e., P(WIN
∩ MONEY).
(b) If an email contains both "WIN" and "MONEY", calculate the probability that it is spam,
i.e., P(Spam | WIN ∩ MONEY)
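Note that both parts also need the prior probability that an email is spam, which the statement above does not give. The sketch below uses P(Spam) = 0.4 purely as a placeholder assumption (substitute the prior from your course material); part (a) is the law of total probability and part (b) is Bayes' theorem:

    # Placeholder prior: P(Spam) is NOT given in the problem statement
    p_spam = 0.4
    p_both_given_spam = 0.5        # P(WIN and MONEY | Spam)
    p_both_given_not_spam = 0.05   # P(WIN and MONEY | Not Spam)

    # (a) Law of total probability:
    # P(WIN and MONEY) = P(both | Spam) P(Spam) + P(both | Not Spam) P(Not Spam)
    p_both = p_both_given_spam * p_spam + p_both_given_not_spam * (1 - p_spam)

    # (b) Bayes' theorem:
    # P(Spam | WIN and MONEY) = P(both | Spam) P(Spam) / P(WIN and MONEY)
    p_spam_given_both = p_both_given_spam * p_spam / p_both

    print(f"P(WIN and MONEY) = {p_both:.3f}")
    print(f"P(Spam | WIN and MONEY) = {p_spam_given_both:.3f}")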
*Q4: A company is trying to classify a new customer as either “High Spender” or “Low
Spender” based on the amount spent in the last two months. The dataset contains the following
points:
Training Data:
Customer | Amount spent (month 1) | Amount spent (month 2) | Class
1        | 50                     | 60                     | Low Spender
2        | 45                     | 55                     | Low Spender
5        | 55                     | 65                     | Low Spender
(a) Use the Euclidean distance formula to calculate the distance of a new customer with
attributes:
Amount spent (month 1) = 60, Amount spent (month 2) = 70
from all the training points.
(b) If k=3, classify the new customer as either “High Spender” or “Low Spender” based on
the majority vote of the nearest neighbours.
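A minimal sketch of part (a) with numpy, using the three training rows that appear in the table above:

    import numpy as np

    # Training points from the table: (month-1 spend, month-2 spend) and class
    X_train = np.array([[50, 60], [45, 55], [55, 65]])
    labels = ["Low Spender", "Low Spender", "Low Spender"]
    query = np.array([60, 70])  # the new customer

    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - query, axis=1)
    for d, lab in sorted(zip(dists, labels)):
        print(f"distance = {d:.2f} -> {lab}")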
***Q5: The Perceptron algorithm updates its weights whenever it encounters a misclassified
point. The sequence in which data points are presented can influence the updates and,
potentially, the total number of mistakes made during training.
Your task is to investigate this effect:
(a) Implement the Perceptron algorithm for a binary classification problem where
y ∈ {−1, 1} (a sketch covering (a)-(c) follows this question).
o Initialize the weight vector w and bias b to zero.
o Iterate through the dataset for a maximum of 1000 epochs or until convergence.
o Record the total number of mistakes made during training.
(b) Generate a synthetic 2D dataset which is linearly separable. Run the perceptron
algorithm on this data and print the number of mistakes. (Read about make_blobs from
sklearn to generate the dataset)
(c) Shuffle the dataset into 5 random permutations and run the Perceptron algorithm on
each permutation. For each permutation, record the number of mistakes made by the
algorithm.
(d) Are these mistakes consistent or do they change with the order of data presented? What
can you conclude?
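A minimal sketch for parts (a)-(c), assuming numpy and scikit-learn; the blob spread and the seeds are illustrative choices, and you should verify (e.g., by plotting) that the generated data really is linearly separable:

    import numpy as np
    from sklearn.datasets import make_blobs

    def perceptron(X, y, max_epochs=1000):
        # (a) zero-initialized weights and bias; returns (w, b, total mistakes)
        w, b, mistakes = np.zeros(X.shape[1]), 0.0, 0
        for _ in range(max_epochs):
            errors = 0
            for xi, yi in zip(X, y):
                if yi * (np.dot(w, xi) + b) <= 0:  # misclassified point
                    w += yi * xi
                    b += yi
                    errors += 1
            mistakes += errors
            if errors == 0:  # converged: a full pass with no updates
                break
        return w, b, mistakes

    # (b) linearly separable 2D data; labels remapped from {0, 1} to {-1, +1}
    X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.8, random_state=0)
    y = 2 * y - 1
    print("mistakes:", perceptron(X, y)[2])

    # (c) five random permutations of the same dataset
    rng = np.random.default_rng(0)
    for i in range(5):
        idx = rng.permutation(len(X))
        print(f"permutation {i + 1}: {perceptron(X[idx], y[idx])[2]} mistakes")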
***Q6: Heads up: this question is a long read, but it covers all the steps of approaching an ML
problem, so stick with it to the end. It will be fun :)
k Nearest Neighbours, or kNN, is the simplest of all machine learning algorithms. It simply
calculates the distance between a sample data point and all the training data points. Then,
it selects the k nearest data points, where k can be any integer. Finally, it assigns the sample
data point to the class to which the majority of the k data points belong. For this problem, you
will be using the UCI Breast Cancer Wisconsin (Original) dataset (PFA). Its first 10 columns
are features and the last (11th) column is the class of breast cancer: Benign (2) /
Malignant (4). Feel free to use sklearn and pandas.
Exploratory Data Analysis (EDA) is an essential part of any data science problem as it
provides us insights about the data.
(a) Statistically analyse the dataset by computing values such as the mean, median, standard
deviation, count, minimum, and maximum.
(b) Check if all the features are numerical values, as we will be finding the distance between
numerical features only. Try to convert the non-numerical values to numerical ones.
(c) Plot the frequency distribution of the 10 features as subplots in a single plot.
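A minimal EDA sketch for (a)-(c) with pandas and matplotlib; the file name is a placeholder, and the assumption that the features are the first 10 columns comes from the problem statement, so adjust both to the attached file:

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("breast_cancer_wisconsin.csv")  # placeholder file name

    # (b) inspect dtypes; coerce non-numerical entries (e.g. '?') to NaN
    print(df.dtypes)
    feature_cols = df.columns[:10]
    df[feature_cols] = df[feature_cols].apply(pd.to_numeric, errors="coerce")

    # (a) count, mean, std, min, quartiles (incl. median), max per column
    print(df.describe())

    # (c) frequency distribution of the 10 features as subplots in one figure
    fig, axes = plt.subplots(2, 5, figsize=(16, 6))
    for ax, col in zip(axes.ravel(), feature_cols):
        df[col].hist(ax=ax, bins=20)
        ax.set_title(col)
    plt.tight_layout()
    plt.show()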
Many times the training dataset is not refined, and it is our job to take care of that before
training the model. Null values cause problems in model training because they can be
misleading. In these cases, feature engineering comes into play.
(d) Try to find if there are any null values in the columns.
(e) If you find there are null values, then try to fill those with the median of that feature, as
it can be a good approximation to start training.
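A minimal sketch for (d) and (e), continuing from the DataFrame above:

    # (d) count null values per column
    print(df.isnull().sum())

    # (e) fill nulls with the median of each feature column
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())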
Dataset generation will be the next step, i.e., the division of the dataset into training and testing
sets. This is done to get an idea of model accuracy while keeping in mind the problem of overfitting.
(f) Divide the dataset into training and testing sets. Make sure to set a random state so you don't
get different data and results each time you run the notebook. (A sketch covering (f)-(k) follows this list.)
As the features are on different scales, we want to make sure that each of them contributes
equally. Standard scaling is a method that transforms all the training columns
to have a mean of 0 and a standard deviation of 1.
(g) Perform standard scaling over all the features.
(h) Train the model using different values of K.
(i) Find the training and test errors and accuracies from the model predictions, and plot the
errors as K varies.
(j) Find the K value where the error is the least.
(k) Plot a confusion matrix of the final model.
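A minimal end-to-end sketch for (f)-(k), continuing from the DataFrame above and assuming the class column is the last one; the K range, test fraction, and seed are illustrative choices:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import ConfusionMatrixDisplay

    X = df[feature_cols].values
    y = df[df.columns[-1]].values  # class: Benign (2) / Malignant (4)

    # (f) fixed random_state for a reproducible split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # (g) zero mean / unit variance; fit the scaler on the training data only
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # (h)-(i) train for a range of K and record training and test errors
    ks = range(1, 26)
    train_err, test_err = [], []
    for k in ks:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        train_err.append(1 - knn.score(X_train, y_train))
        test_err.append(1 - knn.score(X_test, y_test))

    plt.plot(ks, train_err, label="train error")
    plt.plot(ks, test_err, label="test error")
    plt.xlabel("K"); plt.ylabel("error"); plt.legend(); plt.show()

    # (j) K with the lowest test error
    best_k = ks[int(np.argmin(test_err))]
    print("best K:", best_k)

    # (k) confusion matrix of the final model
    final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
    ConfusionMatrixDisplay.from_estimator(final, X_test, y_test)
    plt.show()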
**Q7: In this question we will be using a Stock Market dataset (provided to you). The dataset
contains 1089 weekly returns for 21 years from 1990 to 2010.
(a) Provide some graphical summaries of the provided dataset; if you observe any patterns,
kindly report them.
(b) Fit an LDA model using the training data from 1990 to 2007, with 'Lag2' as the
only predictor. Compute the confusion matrix and the fraction of correct predictions on the
test data (from 2008 to 2010). (A sketch follows this question.)
(c) Experiment with different predictors and compute the confusion matrix for each.
(d) (Optional) Repeat (b) for KNN and Naïve Bayes.
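A minimal sketch for part (b); the file name and the column names ('Year', 'Lag2', 'Direction') are assumptions based on the usual weekly stock-market dataset, so adjust them to the provided file:

    import pandas as pd
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import confusion_matrix, accuracy_score

    weekly = pd.read_csv("Weekly.csv")  # placeholder file name
    train = weekly["Year"] <= 2007      # 1990-2007 train, 2008-2010 test

    X_train, y_train = weekly.loc[train, ["Lag2"]], weekly.loc[train, "Direction"]
    X_test, y_test = weekly.loc[~train, ["Lag2"]], weekly.loc[~train, "Direction"]

    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    pred = lda.predict(X_test)
    print(confusion_matrix(y_test, pred))
    print("fraction of correct predictions:", accuracy_score(y_test, pred))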