Vijaya ML
You are hired by CNBE, one of the leading news channels, which wants to analyze the recent elections. The
survey was conducted on 1525 voters with 9 variables. You have to build a model to predict which party
a voter will vote for, based on the given information, in order to create an exit poll that will help predict
the overall winner and the seats won by a particular party.
1.1 Read the dataset. Do the descriptive statistics and do the null value condition check. Write an
inference on it. (4 Marks)
The first step in the analysis is to import all the necessary libraries and then load the given
data set. To view the first few entries of the data set, we used head().
From the above result, we infer that there are 10 columns in total, with 1525 entries in each column.
All the variables are of integer type except “vote” and “gender”, which are of object type.
Before proceeding further, we can drop the “Unnamed” column, as it carries no information to analyse.
After removing the “Unnamed” column, our data set looks as follows:
Data Description:
The number of duplicate rows is very small, so we can drop them and proceed.
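The reading and cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not the original notebook: the file name in the comment is hypothetical, the index column is assumed to be called `Unnamed: 0` (pandas' default name for a header-less column), and a tiny made-up frame stands in for the real 1525-row data so the block runs on its own. The party labels are illustrative only.

```python
import pandas as pd


def prepare_election_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the index-like column and duplicate rows, printing the basic checks on the way."""
    # 'Unnamed: 0' is pandas' default name for a header-less (index) column in a CSV
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")

    print(df.dtypes)                        # integer vs object columns
    print(df.describe(include="all").T)     # descriptive statistics per column
    print(df.isnull().sum())                # null value check per column
    print("Duplicate rows:", df.duplicated().sum())

    return df.drop_duplicates().reset_index(drop=True)


# Real data would be loaded with e.g. pd.read_csv("Election_Data.csv") (hypothetical file name).
# Made-up toy rows; the last row repeats the first once 'Unnamed: 0' is dropped.
toy = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3, 4],
    "vote": ["Labour", "Conservative", "Labour", "Labour"],
    "economic.cond.national": [3, 4, 4, 3],
    "Blair": [4, 2, 5, 4],
    "Hague": [1, 4, 2, 1],
    "gender": ["female", "male", "male", "female"],
})
clean = prepare_election_data(toy)
print(clean.shape)
```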
Bivariate analysis: examines the relationship between two variables, usually between each variable in
the dataset and the target variable of interest. Examples: box plot, violin plot.
1. Economic.cond.National:
Multivariate Analysis:
Heat Map:
Data Preparation:
1. Encode the data (having string values) for modelling. Is scaling necessary here or not? Data
Split: Split the data into train and test (70:30).
Encoding the dataset:
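Encoding, the 70:30 split and scaling can be sketched with pandas and scikit-learn. Assumptions: the target column is `vote`, string features such as `gender` are one-hot encoded with `drop_first=True`, and `random_state=1` is an arbitrary choice. On the scaling question: tree-based models are insensitive to feature scaling, whereas distance-based models such as KNN are not, so the sketch fits a scaler on the training split only. The toy values are made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def encode_and_split(df: pd.DataFrame, target: str = "vote", seed: int = 1):
    """One-hot encode string columns, split 70:30, and scale with train-set statistics."""
    # Target labels become integer codes (categories are sorted alphabetically)
    y = df[target].astype("category").cat.codes
    # One-hot encode the remaining object columns; drop_first removes a redundant dummy
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed, stratify=y
    )
    # Fit the scaler on the training split only, so no test information leaks in
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Toy frame with made-up values, only to show the call
toy = pd.DataFrame({
    "vote": ["Labour", "Conservative"] * 10,
    "Blair": [1, 2, 3, 4, 5] * 4,
    "gender": ["male", "female", "female", "male"] * 5,
})
X_tr, X_te, y_tr, y_te = encode_and_split(toy)
print(X_tr.shape, X_te.shape)
```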
After predicting, we compute the accuracy on the training and testing data.
Based on the training and testing accuracies, the model is good to use.
The precision and the recall values are also good.
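The fit, predict and evaluate loop for this first model can be sketched as below. The model heading is not shown in this section; logistic regression is assumed here because it is the first model listed in the performance comparison. Synthetic data from `make_classification` stands in for the encoded election data, so the printed numbers will not match the report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded election features (not the real data)
X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy, confusion matrix and precision/recall for both splits
for split, X_part, y_part in [("Train", X_train, y_train), ("Test", X_test, y_test)]:
    pred = model.predict(X_part)
    print(f"{split} accuracy: {accuracy_score(y_part, pred):.3f}")
    print(confusion_matrix(y_part, pred))
    print(classification_report(y_part, pred))
```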
Model 2: LDA
First, we fitted the LDA model on the dataset. We then made predictions on both the
training and the testing sets.
Train accuracy:
Test Accuracy:
Confusion matrix and classification report for the training set:
The LDA model also has good accuracy and good precision values.
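A comparable sketch for LDA, again on synthetic stand-in data rather than the election dataset. `LinearDiscriminantAnalysis` is used with its default settings, which is an assumption since the report does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
train_acc = lda.score(X_train, y_train)
test_acc = lda.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.3f}  Test accuracy: {test_acc:.3f}")

# Confusion matrix and classification report for the training set
train_pred = lda.predict(X_train)
print(confusion_matrix(y_train, train_pred))
print(classification_report(y_train, train_pred))
```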
1.5 Apply KNN Model and Naïve Bayes Model. Interpret the results. (4 marks)
MODEL 3: KNN
The KNN model has good accuracy on both the training and the testing sets, with a good
precision score.
After fitting the model, the prediction results are as follows:
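A sketch of the KNN and Naïve Bayes fits on synthetic stand-in data. The value `n_neighbors=5` (scikit-learn's default) and the Gaussian variant of Naïve Bayes are assumptions; the report does not state which were used. KNN is wrapped in a pipeline with `StandardScaler` because its distance computations are sensitive to feature scales.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

models = {
    # Scaling happens inside the pipeline, so it is fitted on the training data only
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train accuracy {scores[name][0]:.3f}, test accuracy {scores[name][1]:.3f}")
```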
1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting. (7
marks)
Ada Boosting
The prediction score for the training set, along with the accuracy, classification report and confusion
matrix of AdaBoost, is as follows:
The prediction score for the testing set, along with the accuracy, classification report and confusion
matrix of AdaBoost, is as follows:
GRADIENT BOOSTING:
Performance metrics on the training data set:
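A sketch of the training-set performance metrics for gradient boosting, using scikit-learn's `GradientBoostingClassifier` with default hyperparameters (an assumption) on synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

gb = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)

# Metrics on the data the model was trained on
train_pred = gb.predict(X_train)
train_acc = accuracy_score(y_train, train_pred)
print(f"Train accuracy: {train_acc:.3f}")
print(confusion_matrix(y_train, train_pred))
print(classification_report(y_train, train_pred))
```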
DECISION TREE:
RANDOM FOREST:
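Model tuning and bagging can be sketched with `GridSearchCV`. Random forest is the bagging method the task asks for (each tree is trained on a bootstrap sample and considers a random subset of features at each split). The parameter grids below are small, hypothetical choices to keep the search fast; they are not the grids used in the report, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# Hypothetical grids; limiting depth and leaf size curbs overfitting
searches = {
    "Decision Tree": GridSearchCV(
        DecisionTreeClassifier(random_state=1),
        {"max_depth": [3, 5, 7], "min_samples_leaf": [10, 30]},
        cv=3,
    ),
    "Random Forest": GridSearchCV(
        RandomForestClassifier(n_estimators=100, random_state=1),
        {"max_depth": [5, 7], "max_features": ["sqrt", 4]},
        cv=3,
    ),
}

for name, search in searches.items():
    search.fit(X_train, y_train)
    best = search.best_estimator_
    print(name, search.best_params_,
          f"train={best.score(X_train, y_train):.3f}",
          f"test={best.score(X_test, y_test):.3f}")
```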
1.7 Performance Metrics: Check the performance of Predictions on Train and Test sets using
Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model. Final
Model: Compare the models and write inference which model is best/optimized.
LOGISTIC REGRESSION:
Confusion matrix:
AUC on the train and test sets, and ROC curve:
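A sketch of how the AUC and ROC curve points can be computed for the train and test sets. Logistic regression and the synthetic data are stand-ins; the same loop applies to any fitted classifier that exposes `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

auc, curves = {}, {}
for split, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    # ROC/AUC need a score per sample, so use the probability of class 1, not hard labels
    prob = model.predict_proba(X_part)[:, 1]
    auc[split] = roc_auc_score(y_part, prob)
    curves[split] = roc_curve(y_part, prob)  # (fpr, tpr, thresholds)
    print(f"AUC on {split}: {auc[split]:.3f}")
```

To plot the ROC curve, pass the `fpr` and `tpr` arrays from `curves["test"]` to `matplotlib.pyplot.plot`, together with the diagonal from (0, 0) to (1, 1) as the chance-level reference.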
LDA:
KNN MODEL:
Among all the models, gradient boosting shows the highest accuracy: 89% on the training set and 84% on the
testing set. Its precision and recall are also good.
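The model comparison can be tabulated in one pass, as sketched below. The three models shown, the synthetic data and the column names are assumptions chosen for illustration; the 89% and 84% figures above come from the report, not from this code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({
        "model": name,
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
        # AUC uses the predicted probability of the positive class
        "test_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    })

# Best test accuracy first; a large train/test gap would hint at overfitting
summary = pd.DataFrame(rows).set_index("model").sort_values("test_acc", ascending=False)
print(summary.round(3))
```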
Inference:
The most important variables are “Hague” and “Blair”. Voters mostly gave Blair 4 stars and
Hague 2 stars.
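One way to rank variables by importance is the `feature_importances_` attribute of a fitted tree ensemble such as gradient boosting. The report does not say how importance was measured, so this is an assumption. The sketch uses synthetic data in which only the first two columns (given the hypothetical names `f0` and `f1`) carry signal, so they are expected to rank at the top.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: with shuffle=False, only the first two columns are informative
X, y = make_classification(
    n_samples=1500, n_features=8, n_informative=2, n_redundant=0,
    shuffle=False, random_state=1,
)
names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical column names

gb = GradientBoostingClassifier(random_state=1).fit(X, y)

# Impurity-based importances, normalised to sum to 1
importances = pd.Series(gb.feature_importances_, index=names).sort_values(ascending=False)
print(importances.round(3))
```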
Problem 2:
In this project, we work on the inaugural corpus from NLTK in
Python. We look at the following speeches of the Presidents of the United
States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973
Roosevelt:
Number of characters:
Number of words:
Number of sentences:
Kennedy:
Number of characters:
Number of words:
Number of sentences:
Nixon:
Number of characters:
Number of words:
Number of sentences:
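The three counts can be approximated without any corpus download using regular expressions, as sketched below. This is a swapped-in, simplified technique rather than NLTK's tokenizers: words are alphabetic runs (with an optional apostrophe suffix) and a sentence ends at `.`, `!` or `?`. The numbers will therefore differ somewhat from `inaugural.words()` and `inaugural.sents()`, which also count punctuation tokens.

```python
import re


def speech_stats(text: str) -> dict:
    """Rough character, word and sentence counts for one speech (regex approximation)."""
    # Alphabetic runs, optionally followed by an apostrophe part (e.g. "nation's")
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    # A run of ., ! or ? followed by whitespace or end of text closes a sentence
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text.strip()) if s]
    return {"characters": len(text), "words": len(words), "sentences": len(sentences)}


print(speech_stats("Let us begin anew. Ask not!"))
```

With NLTK installed, the raw text of each speech is available via `nltk.download("inaugural")` followed by `nltk.corpus.inaugural.raw("1941-Roosevelt.txt")`; the Kennedy and Nixon file IDs are `1961-Kennedy.txt` and `1973-Nixon.txt`.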
2.2 Remove all the stopwords from all three speeches. – 3 Marks
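Stopword removal can be sketched as a filter over lower-cased tokens. The small set below is a hand-picked stand-in (an assumption) for a full stopword list such as `nltk.corpus.stopwords.words("english")`, which requires `nltk.download("stopwords")`.

```python
import re

# Hand-picked stand-in for a full English stopword list (assumption; real lists are longer)
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "our", "is", "it", "that", "for", "not"}


def remove_stopwords(text: str, stopwords=STOPWORDS) -> list:
    """Lower-case the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]


print(remove_stopwords("The world is very different now."))
```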
2.3 Which word occurs the most number of times in his inaugural address for each president? Mention
the top three words. (after removing the stopwords)
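Given the stopword-free tokens of a speech, `collections.Counter` gives the most frequent words directly; the sketch below returns the top three with their counts. The token list in the example is made up.

```python
from collections import Counter


def top_words(tokens, n=3):
    """Return the n most frequent tokens with their counts (ties keep first-seen order)."""
    return Counter(tokens).most_common(n)


print(top_words(["world", "new", "world", "sides", "new", "world"]))
```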
Roosevelt:
Kennedy:
The most frequent words are “world”, “sides” and “new”.
Nixon: