Predictive Modelling Report
REGARDS,
AKSHAY PANKAR
GREAT LEARNING 1
AKSHAY PANKAR PREDICTIVE MODELLING REPORT
1.1 Read the data and do exploratory data analysis. Describe the data briefly. (Check the Data
types, shape, EDA, 5 point summary). Perform Univariate, Bivariate Analysis, Multivariate
Analysis.
A. Most of the columns in the data are numeric in nature ('int64' or 'float64' type). The
runqsz and user name columns are string columns ('object' type). We will be dropping
the 'runqsz' column for prediction purposes.
B. Replace the missing values with median values of the columns. Note that we do not
need to specify the column names below. Every column's missing value is replaced
with that column's median respectively.
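The median imputation described in step B can be sketched as below. This is a minimal illustration on a toy DataFrame, not the report's actual code: the values, the column `other_col` and the labels in `runqsz` are invented for the example, and only `lwrite` and `runqsz` are column names taken from the report.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (all values are invented).
df = pd.DataFrame({
    "lwrite": [2.0, np.nan, 4.0, 10.0],
    "other_col": [1.0, 3.0, np.nan, 5.0],   # hypothetical numeric column
    "runqsz": ["A", "B", "A", "A"],          # hypothetical string labels
})

# median(numeric_only=True) returns one median per numeric column.
# fillna aligns that Series on column names, so every column is filled
# with its own median without naming the columns explicitly.
df_imputed = df.fillna(df.median(numeric_only=True))

print(df_imputed)
print(df_imputed.isna().sum())
```

String columns are left untouched by this call, so they would need separate handling if they contained missing values.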
C. Univariate Analysis
D. Bivariate analysis among the different variables can be done using a scatter matrix plot.
Seaborn's pairplot draws a grid of pairwise scatter plots, with each variable's own
distribution on the diagonal.
Output
<seaborn.axisgrid.PairGrid at 0x1aab1315e50>
E. Multivariate Analysis
1.2 Impute null values if present, also check for the values which are equal to zero. Do
they have any meaning or do we need to change them or drop them? Check for
the possibility of creating new features if required. Also check for outliers and
duplicates if there.
Yes, creating new features is possible. Null values were imputed here by filling the
missing values with the column medians, and the columns were checked for values equal
to zero.
The box plot of lwrite (image 2) shows that outliers are present in the data set.
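A box plot flags points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly; it is an illustration under assumptions, with a hypothetical helper `iqr_outliers` and invented sample values rather than the actual lwrite column.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Boolean mask keeps only the points outside the whisker range.
    return arr[(arr < lower) | (arr > upper)]

sample = [10, 12, 11, 13, 12, 11, 95]  # invented values, one obvious outlier
print(iqr_outliers(sample))
```

Whether such points should be capped, removed or kept depends on whether they are measurement errors or genuine extreme observations.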
1.3 Encode the data (having string values) for Modelling. Split the data into train and
test (70:30). Apply Linear regression using scikit learn. Perform checks for significant
variables using appropriate method from statsmodel. Create multiple models and check
the performance of Predictions on Train and Test sets using Rsquare, RMSE & Adj Rsquare.
Compare these models and select the best one with appropriate reasoning.
Comparing the two regression results provided, we can see that both models have
relatively similar R-squared values, with the first model having an R-squared of 0.598
and the second model having an R-squared of 0.602. This indicates that both models
explain approximately the same amount of variability in the dependent variable (usr)
using the independent variables included.
However, the second model has more independent variables (20) compared to the
first model, which has an unspecified number of independent variables. The second
model also has a lower F-statistic value (431.3) compared to the first model (608.4),
indicating that the first model may be a better fit for the data.
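Because the two models differ in the number of independent variables, adjusted R-squared is a fairer basis for comparison than R-squared alone. The sketch below shows the standard formulas for adjusted R-squared and RMSE. The function names are hypothetical, and the sample size passed in the usage line is an assumed placeholder, since the report does not state the number of observations.

```python
import math

def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# R-squared and the 20 predictors come from the report; n_obs is ASSUMED.
assumed_n_obs = 1000
print(adjusted_r2(0.602, n_obs=assumed_n_obs, n_features=20))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Adjusted R-squared is always at most R-squared when at least one predictor is used, and the penalty grows with the number of predictors relative to the sample size.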
1.4 Inference: Basis on these predictions, what are the business insights and
recommendations.
Overall, the business insights and recommendations from these regression analyses
would likely depend on the specific independent variables included in the models and
the specific goals of the analysis. Further analysis and interpretation of the results
would be necessary to provide more specific insights and recommendations.
The problem is to predict whether or not a respondent uses a contraceptive method of
choice, based on their demographic and socio-economic characteristics.
2.1 Data Ingestion: Read the dataset. Do the descriptive statistics and do null value condition
check, check for duplicates and outliers and write an inference on it. Perform Univariate and
Bivariate Analysis and Multivariate Analysis.
A. The columns in the data are of 'object' and 'float64' type, with one column of 'int64'
type.
B. The null values were checked and treated.
From the above graph we can conclude that as the use of contraceptives increases, the
birth rate decreases.
From the above graph we can conclude that as the wife's education level increases, the
use of contraceptives increases.
G. Multivariate Analysis
2.2 Do not scale the data. Encode the data (having string values) for Modelling. Data
Split: Split the data into train and test (70:30). Apply Logistic Regression and LDA
(linear discriminant analysis) and CART.
LDA Output
Fig 15 - CART.
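The 70:30 split and the three classifiers named in 2.2 can be sketched with scikit-learn as below. This is an illustration under assumptions, not the report's code: the data is synthetic (generated by `make_classification`), the encoding step for string columns is omitted, and the `random_state` values and `max_iter` setting are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded dataset (NOT the report's data).
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

# 70:30 train/test split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.3f}, test={scores[name][1]:.3f}")
```

An unconstrained decision tree tends to fit the training set almost perfectly, so parameters such as `max_depth` or `min_samples_leaf` are commonly tuned before comparing it with the linear models.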
2.3 Performance Metrics: Check the performance of Predictions on Train and Test sets using
Accuracy, Confusion Matrix, Plot ROC curve and get ROC_AUC score for each model. Final
Model: Compare both the models and write inference which model is best/optimized.
LDA:
Accuracy (train): 0.6707692307692308
Accuracy (test): 0.6483253588516746
Confusion Matrix:
[[ 90 97]
[ 50 181]]
ROC AUC Score: 0.6850707225038777
CART:
Accuracy (train): 0.9794871794871794
Accuracy (test): 0.65311004784689
Confusion Matrix:
[[114 73]
[ 72 159]]
ROC AUC Score: 0.6487487557006274
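The reported confusion matrices can be checked against the reported accuracies, and they also give per-class recall. The sketch below does this with a hypothetical helper `summarize`. It assumes the matrices use scikit-learn's layout (rows are actual classes, columns are predicted classes) and that they were computed on the test set; the second assumption is supported by the fact that the accuracies they imply match the reported test accuracies.

```python
def summarize(cm):
    """cm is [[TN, FP], [FN, TP]] (assumed scikit-learn layout)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall_class_0": tn / (tn + fp),
        "recall_class_1": tp / (tp + fn),
        "precision_class_1": tp / (tp + fp),
    }

# Confusion matrices copied from the LDA and CART results above.
lda_test = [[90, 97], [50, 181]]
cart_test = [[114, 73], [72, 159]]

for name, cm in [("LDA", lda_test), ("CART", cart_test)]:
    print(name, summarize(cm))
```

Read alongside the accuracies above, the large gap between CART's train accuracy (about 0.979) and test accuracy (about 0.653) suggests overfitting, while LDA's train and test accuracies are close to each other.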
2.4 Inference: Basis on these predictions, what are the insights and recommendations.
Overall, the business insights and recommendations from such analyses would likely
depend on the specific independent variables included in the models and the specific
goals of the analysis. Further analysis and interpretation of the results would be
necessary to provide more specific insights and recommendations.