Vijaya ML

The document discusses analyzing election data to build models to predict which party voters will vote for. It provides details on exploratory data analysis conducted on a dataset with 1525 voters and 9 variables. Various classification models were applied, including logistic regression, LDA, KNN, naive Bayes, decision tree, and ensemble methods like gradient boosting and random forest. Performance metrics like accuracy, confusion matrix and ROC curves were calculated and compared to determine the optimized model. The gradient boosting model showed the best performance with 89% accuracy on training data and 84% on test data. The document also discusses analyzing inaugural speeches from 3 US Presidents: Roosevelt, Kennedy and Nixon. Details like character, word and sentence counts are provided.

Uploaded by

Vijayalakshmi Palaniappan


Problem 1:

You are hired by one of the leading news channels CNBE who wants to analyze recent elections. This
survey was conducted on 1525 voters with 9 variables. You have to build a model, to predict which party
a voter will vote for on the basis of the given information, to create an exit poll that will help in predicting
overall win and seats covered by a particular party.

1.1 Read the dataset. Do the descriptive statistics and do the null value condition check. Write an
inference on it. (4 Marks)

EDA (Exploratory Data Analysis)

The first step of the analysis is to import all the necessary libraries. Then we load the
given data set. To see the first entries in the data set, we used head().

From the above result we infer that there are 10 columns in total, with 1525 entries in each column.
All the variables are of integer type except “vote” and “gender”, which are of object type.

Before proceeding, we can remove the “unnamed” column, as it carries no information for the analysis.
After removing the “unnamed” column, our data set looks like this:
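The loading, inspection, and column-removal steps described above can be sketched as follows. This is only an illustration: the report's file is not shown, so a tiny hand-made frame stands in for it, and the column name `age`, the party labels, and the exact spelling `Unnamed: 0` are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the real data set would be
# read from the file supplied with the assignment (path not shown here).
df = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 4],                  # leftover index column (assumed spelling)
    "vote": ["PartyA", "PartyA", "PartyB", "PartyA"],  # placeholder labels
    "age": [43, 36, 35, 24],                     # assumed column name
    "Blair": [4, 4, 5, 2],
    "Hague": [1, 4, 2, 1],
    "gender": ["female", "male", "male", "female"],
})

print(df.head())           # first rows of the data set
df.info()                  # dtypes and non-null counts per column
print(df.isnull().sum())   # null-value check per column

# Drop the index-like column, which says nothing about the voter.
df = df.drop(columns=["Unnamed: 0"])
print(df.describe(include="all"))  # descriptive statistics
```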

Data Description:

Checking for the duplicates:

Total number of duplicate rows = 8

The number of duplicate rows is very small, so we can drop them and proceed.
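A minimal sketch of the duplicate check and removal, using a small made-up frame rather than the survey data (the values and the `age` column name are illustrative assumptions):

```python
import pandas as pd

# Toy frame with one fully repeated row (rows 0 and 2 are identical).
df = pd.DataFrame({
    "vote": ["A", "B", "A", "A"],
    "age": [40, 52, 40, 31],
})

# duplicated() marks every repeat of an earlier row as True.
n_dupes = int(df.duplicated().sum())
print("Total number of duplicate rows =", n_dupes)

# Keep the first occurrence of each row and renumber the index.
df_clean = df.drop_duplicates().reset_index(drop=True)
print(df_clean.shape)
```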

1.2 Perform Univariate and Bivariate Analysis. Do exploratory data analysis. Check for Outliers.

Univariate Analysis and Outlier Check

Exploratory Data Analysis is mainly performed using the following methods:

Univariate analysis: provides summary statistics for each field in the raw data set, that is, a summary of one variable at a time. Ex: CDF, PDF, box plot.

Bivariate analysis: finds the relationship between each variable in the dataset and the target variable of interest, or more generally between any two variables. Ex: box plot, violin plot.

Multivariate analysis: examines interactions between more than two fields in the dataset. Ex: pair plot and 3D scatter plot.

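As a numeric companion to the box plot, the outlier check can be sketched with the usual 1.5 × IQR whisker rule. The report does not state which rule it used, so the rule, the helper name `iqr_outliers`, and the sample ages below are assumptions for illustration.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]

# Made-up ages with one obviously extreme value.
ages = pd.Series([24, 35, 36, 41, 43, 48, 52, 120], name="age")
outliers = iqr_outliers(ages)
print(outliers.tolist())
```

The same helper can be applied column by column to every numeric field of the data set.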
Univariate Analysis:
Histogram:

1. Economic.cond.National:
Multivariate Analysis:
Heat Map:

The heat map shows no notable correlation between any pair of variables.
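The numbers behind such a heat map come from a correlation matrix. The sketch below computes one on randomly generated ratings (not the survey data; the third column name is a placeholder) and reports the strongest off-diagonal value, which is one way to back up a statement about weak correlation.

```python
import numpy as np
import pandas as pd

# Random 1-5 ratings standing in for the numeric survey columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 6, size=(200, 3)),
    columns=["Blair", "Hague", "other_rating"],  # last name is a placeholder
)

corr = df.corr()  # pairwise Pearson correlation

# Hide the diagonal (always 1.0) and find the strongest remaining value.
mask = ~np.eye(len(corr), dtype=bool)
strongest = corr.where(mask).abs().max().max()
print(corr.round(2))
print("strongest off-diagonal |r| =", round(float(strongest), 3))
```

A matrix like `corr` is what a plotting call such as seaborn's `heatmap` would then display.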

Data Preparation:
1.3 Encode the data (having string values) for Modelling. Is Scaling necessary here or not? Data
Split: Split the data into train and test (70:30).

Encoding the dataset:

Scaling is necessary for the KNN model, because KNN classifies by distance and unscaled features with large ranges would dominate the distance.
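The preparation steps (encode string columns, split 70:30, scale) might look like the sketch below. The toy frame, the `age` column, the stratified split, and the choice of `StandardScaler` are assumptions; the report does not say which encoder or scaler it used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy frame with two string columns, mimicking "vote" and "gender".
df = pd.DataFrame({
    "vote": ["A", "B"] * 10,
    "gender": ["female", "male", "male", "female", "male"] * 4,
    "age": list(range(25, 45)),       # assumed column name
    "Blair": [1, 2, 3, 4, 5] * 4,
})

# Encode each string column as integer category codes.
for col in ["vote", "gender"]:
    df[col] = pd.Categorical(df[col]).codes

X = df.drop(columns="vote")
y = df["vote"]

# 70:30 split; stratify keeps the class balance equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

# Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.shape, X_test_s.shape)
```

Fitting the scaler on the training rows alone avoids leaking information from the test set into the model.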

1.4 Apply Logistic Regression and LDA (linear discriminant analysis). (4 marks)

MODEL 1: LOGISTIC REGRESSION

First we apply logistic regression and fit the model on the training data.

Then we predict on the training and the testing data.

After predicting, we compute the accuracy on the training and testing data.

Training set Accuracy:

Testing set Accuracy:

Confusion matrix and classification report for training data:

Confusion matrix and classification report for test data:

Based on the training and testing accuracy, the model is suitable for use.
The precision and recall values are also good.
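The fit, predict, and evaluate steps for logistic regression can be sketched as below. Synthetic data from `make_classification` replaces the survey data, so the printed scores say nothing about the report's results; `max_iter=1000` is an assumed setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary problem standing in for the encoded survey data.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Same three outputs for the training part and the testing part.
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(X_part)
    print(name, "accuracy:", round(accuracy_score(y_part, pred), 3))
    print(confusion_matrix(y_part, pred))
    print(classification_report(y_part, pred))
```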

MODEL 2: LDA

First we applied the LDA model and fitted it on the training data. After that, we predicted on
the training and the testing data.

Train accuracy:

Test accuracy:

Confusion matrix and classification report for training set:

Confusion matrix and classification report for testing set:

The LDA model also has good accuracy and good precision values.

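A comparable sketch for LDA, again on synthetic data rather than the survey, using scikit-learn's `LinearDiscriminantAnalysis` with default settings (an assumption, since the report does not list its parameters):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

train_acc = lda.score(X_train, y_train)   # mean accuracy on training data
test_acc = lda.score(X_test, y_test)      # mean accuracy on testing data
print("train accuracy:", round(train_acc, 3))
print("test accuracy:", round(test_acc, 3))
print(confusion_matrix(y_test, lda.predict(X_test)))
```

Comparing the two accuracies side by side is a quick check for overfitting: a large gap would suggest the model does not generalize.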
1.5 Apply KNN Model and Naïve Bayes Model. Interpret the results. (4 marks)

MODEL 3: KNN

Applying KNN and fitting the training data:

Predicting on the training and the testing data:

Accuracy for training set:

Accuracy for testing set:

Confusion matrix and classification report for training set:

Confusion matrix and classification report for testing set:

From these results, the KNN model has good accuracy on both the training and
the testing sets, with a good precision score.
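Because KNN depends on scaled inputs, one way to sketch it is as a pipeline that scales and then classifies. The data is synthetic, and `n_neighbors=5` is scikit-learn's default rather than a value taken from the report.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# The scaler is fitted inside the pipeline, on the training data only.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

knn_train_acc = knn.score(X_train, y_train)
knn_test_acc = knn.score(X_test, y_test)
print("train accuracy:", round(knn_train_acc, 3))
print("test accuracy:", round(knn_test_acc, 3))
```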

MODEL 4: NAÏVE BAYES

After fitting the model on the training data, the prediction results are as follows:

Training set Accuracy:

Testing set Accuracy:

Confusion matrix and classification report for training data:

Confusion matrix and classification report for testing data:
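A naive Bayes sketch on synthetic data follows. `GaussianNB` is assumed here; the report does not say which naive Bayes variant it used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

nb = GaussianNB().fit(X_train, y_train)

nb_train_acc = nb.score(X_train, y_train)
nb_test_acc = nb.score(X_test, y_test)
print("train accuracy:", round(nb_train_acc, 3))
print("test accuracy:", round(nb_test_acc, 3))
print(classification_report(y_test, nb.predict(X_test)))
```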

1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting. (7
marks)

ADA BOOSTING:

The accuracy, classification report, and confusion matrix of AdaBoost on the training set are as follows:

The accuracy, classification report, and confusion matrix of AdaBoost on the testing set are as follows:

GRADIENT BOOSTING:

Performance metrics on the train data set:

Performance metrics on the test data set:

DECISION TREE:

Performance metrics on the train data set:

Performance metrics on the test data set:

RANDOM FOREST:

Performance metrics on the train data set:

Performance metrics on the test data set:

BAGGING:

Performance metrics on the train data set:

Performance metrics on the test data set:
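The five tree-based models above can be fitted and compared in one loop, as in this sketch. The data is synthetic and every hyperparameter (`n_estimators`, `max_depth`, `random_state`) is an assumption; the report's tuned values are not shown. Bagging wraps a random forest, following the assignment text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=1),
    # Base estimator passed positionally so the call works across versions.
    "Bagging": BaggingClassifier(
        RandomForestClassifier(n_estimators=20, random_state=1),
        n_estimators=5,
        random_state=1,
    ),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f} test={scores[name][1]:.3f}")
```

Holding train and test accuracy together in `scores` makes it easy to spot models that fit the training data far better than the test data.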

1.7 Performance Metrics: Check the performance of Predictions on Train and Test sets using
Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model. Final
Model: Compare the models and write inference which model is best/optimized.

LOGISTIC REGRESSION:

Confusion matrix:

AUC on train and test sets and ROC curve:

LDA:

Confusion matrix and classification report:

AUC and ROC curve:

KNN MODEL:

Confusion matrix and classification report:

AUC and ROC curve:

NAÏVE BAYES MODEL:

Confusion matrix and classification report:

AUC and ROC curve:
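The ROC curve and ROC_AUC score for any of the fitted models can be obtained from its predicted class probabilities. This sketch uses logistic regression on synthetic data as the example model; the same pattern applies to the other classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class (column 1) drives both metrics.
train_prob = model.predict_proba(X_train)[:, 1]
test_prob = model.predict_proba(X_test)[:, 1]

auc_train = roc_auc_score(y_train, train_prob)
auc_test = roc_auc_score(y_test, test_prob)
print("AUC train:", round(auc_train, 3), "AUC test:", round(auc_test, 3))

# Points of the ROC curve for the test set; plot fpr against tpr to draw it.
fpr, tpr, thresholds = roc_curve(y_test, test_prob)
```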

Model comparison:

Among all the models, gradient boosting shows the highest accuracy: 89% on the training set and 84% on the
testing set. Its precision and recall are also good.

Inference:

The most important variables are “Hague” and “Blair”. People gave 4 stars to Blair and 2 stars to
Hague.

Problem 2:
In this particular project, we are going to work on the inaugural corpora from the nltk in
Python. We will be looking at the following speeches of the Presidents of the United
States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973

(Hint: use .words(), .raw(), .sents() for extracting counts)

2.1 Find the number of characters, words, and sentences for the mentioned documents.

Roosevelt:

Number of characters:

Number of words:

Number of sentences:

Kennedy:

Number of characters:

Number of words:

Number of sentences:

Nixon:

Number of characters:

Number of words:

Number of sentences:
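The three counts can be taken with the corpus reader methods named in the hint. In the sketch below, `speech_counts` and `inaugural_counts` are hypothetical helper names; the NLTK part needs the `inaugural` corpus and sentence-tokenizer data to be downloaded, so it is wrapped in a function that is not called automatically.

```python
def speech_counts(corpus, fileid):
    """Count characters, words, and sentences of one document in a corpus reader."""
    return {
        "characters": len(corpus.raw(fileid)),   # raw text length, including spaces
        "words": len(corpus.words(fileid)),      # tokens; punctuation counts as tokens
        "sentences": len(corpus.sents(fileid)),
    }

def inaugural_counts():
    """Run the counts on the three speeches; requires NLTK and its data."""
    import nltk
    nltk.download("inaugural", quiet=True)
    # sents() needs sentence-tokenizer data; newer NLTK releases may ask
    # for "punkt_tab" instead of "punkt".
    nltk.download("punkt", quiet=True)
    from nltk.corpus import inaugural

    fileids = ["1941-Roosevelt.txt", "1961-Kennedy.txt", "1973-Nixon.txt"]
    return {fileid: speech_counts(inaugural, fileid) for fileid in fileids}
```

Calling `inaugural_counts()` returns one dictionary of counts per speech. Note that `words()` includes punctuation tokens, so the word count is higher than a count of alphabetic words only.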

2.2 Remove all the stopwords from all three speeches. (3 Marks)

2.3 Which word occurs the most number of times in his inaugural address for each president? Mention
the top three words. (after removing the stopwords)

Roosevelt:

The most frequent word is “national”.

Kennedy:

The most frequent words are “world”, “sides”, and “new”.

Nixon:

The most frequent words are “America”, “peace”, and “world”.
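Stopword removal and the top-three count can be sketched with a plain `Counter`. The helper name `top_words`, the lowercasing, and the dropping of non-alphabetic tokens are assumptions; the report does not describe its exact cleaning steps. The word list here is made up for illustration.

```python
from collections import Counter

def top_words(words, stopwords, n=3):
    """Return the n most frequent words after removing stopwords and punctuation."""
    kept = [
        w.lower()
        for w in words
        if w.isalpha() and w.lower() not in stopwords
    ]
    return Counter(kept).most_common(n)

# Tiny made-up token list and stopword set.
sample = ["The", "nation", "and", "the", "world", ",", "Nation", "peace", "world", "nation"]
print(top_words(sample, {"the", "and"}))
```

For the speeches themselves, the tokens would come from the corpus reader's `words()` method and the stopword set from `nltk.corpus.stopwords.words("english")`, after downloading the `stopwords` data.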
