Loan Prediction Project

Uploaded by

Jathavi

Data Mining Project

Loan Prediction Using Data Mining

RITU KUMARI
Overview
• Dataset Generation
• Data Cleaning
• Attribute Selection
• Data Mining
• Analysis of results
Data Generation
• The data is collected from the kaggle.com
website. It describes Dream Housing Finance,
a company that deals in all kinds of home loans
and has a presence across urban, semi-urban
and rural areas. A customer first applies for a
home loan, and the company then validates the
customer's eligibility for the loan. We use the
data directly because kaggle.com provides it in
CSV format.
Some Data Attributes
• Loan_ID: Unique Loan ID
• Gender: Male/ Female
• Married: Applicant married (Y/N)
• Dependents: Number of dependents on the applicant
• Education: Applicant Education (Graduate/ Under Graduate)
• Self_Employed: Self employed (Y/N)
• ApplicantIncome: Applicant income
• CoapplicantIncome: Co-applicant income
• LoanAmount: Loan amount in thousands
• Loan_Amount_Term: Term of loan in months
• Credit_History: credit history meets guidelines
• Property_Area: Urban/ Semi Urban/ Rural
• Loan_Status: Loan approved (Y/N)
Attribute Selection
• We select different attributes for loan prediction and
individual applicant prediction.
• We first create all possible subsets of the attributes.
• For each subset, a classification algorithm such as
Random Forest or Decision Tree is trained to predict the
target from the features available in that subset.
• The prediction accuracy is then calculated for each of
the subsets.
• After comparing the accuracies, the method chooses the
subset of attributes whose accuracy is the highest.
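
The subset search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `best_subset` and `toy_score` are hypothetical names, and the toy scorer stands in for "train a classifier on this subset and return its accuracy". Note that n attributes give 2^n - 1 non-empty subsets, so an exhaustive search is only practical for small attribute sets.

```python
from itertools import combinations
from typing import Callable, Sequence


def best_subset(
    attributes: Sequence[str],
    score: Callable[[tuple[str, ...]], float],
) -> tuple[tuple[str, ...], float]:
    """Score every non-empty subset and return the highest-scoring one."""
    best, best_score = (), float("-inf")
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            current = score(subset)
            if current > best_score:
                best, best_score = subset, current
    return best, best_score


# Hypothetical scorer for illustration only. In practice this function would
# train a classifier on the given subset and return its validation accuracy.
def toy_score(subset: tuple[str, ...]) -> float:
    useful = {"Credit_History": 0.30, "Total_Income": 0.10}
    # Attributes not listed as useful carry a small penalty.
    return 0.5 + sum(useful.get(name, -0.01) for name in subset)


attrs = ["Gender", "Credit_History", "Total_Income", "Property_Area"]
subset, acc = best_subset(attrs, toy_score)
print(subset, round(acc, 2))
```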
Data Mining Techniques
• We use Logistic Regression with stratified k-fold
cross-validation to predict the loan status. The
results indicate that the error of the Logistic
Regression classifier is lower than that of the
Current Run Rate method in estimating the final
prediction.
• Stratification is the process of rearranging the
data so that each fold is a good representative
of the whole.
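
To make the idea of stratification concrete, here is a small sketch that deals the samples of each class evenly across k folds. It is a simplified stand-in written for illustration (the function name `stratified_folds` is an assumption, and no shuffling is done); in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds,
        # so every fold keeps roughly the overall class proportions.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels: 8 approved ("Y") and 4 rejected ("N") loans.
labels = ["Y"] * 8 + ["N"] * 4
folds = stratified_folds(labels, k=4)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
print(fold_counts)
```

Each fold ends up with the same 2:1 ratio of approved to rejected loans as the full toy dataset, which is what "a good representative of the whole" means here.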
Data Mining Techniques
• Based on the domain knowledge, we can
come up with new attributes that might affect
the target variable. We will create the
following three new attributes.
• Total Income - we will combine the Applicant
Income and Co-applicant Income. If the total
income is high, chances of loan approval might
also be high.
Data Mining Techniques
• Balance Income - This is the income left after the EMI has
been paid. The idea behind this variable is that if the
value is high, the person is more likely to repay the loan,
which increases the chances of loan approval.
• EMI - EMI is the monthly amount to be paid by the
applicant to repay the loan. The idea behind this variable
is that people who have high EMIs might find it difficult to
pay back the loan. We can calculate the EMI as the ratio of
the loan amount to the loan amount term.
• By examining their distributions, I checked that these
three attributes give better results than the previously
given attributes already present in the training set.
• The correlation between the old features and these new
features will be very high, and logistic regression
assumes that the variables are not highly correlated. We
also want to remove noise from the dataset, so removing
the correlated features helps reduce the noise too.
Algorithm for the New Attributes
1. Total_Income = Applicant_Income + Coapplicant_Income
2. EMI = Loan_Amount / Loan_Amount_Term
3. Balance_Income = Total_Income - (EMI * 1000)

The EMI is multiplied by 1000 to make the units equal, because the loan amount is given in thousands.
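
The three formulas can be sketched in code as follows. The column names are taken from the attribute list (ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term); the helper name `add_income_features` and the sample applicant are made up for illustration.

```python
def add_income_features(row: dict) -> dict:
    """Return a copy of row with Total_Income, EMI and Balance_Income added."""
    total_income = row["ApplicantIncome"] + row["CoapplicantIncome"]
    # LoanAmount is in thousands and Loan_Amount_Term is in months,
    # so EMI is also expressed in thousands per month.
    emi = row["LoanAmount"] / row["Loan_Amount_Term"]
    # Multiply EMI by 1000 so that it is in the same units as the income.
    balance_income = total_income - emi * 1000
    return {
        **row,
        "Total_Income": total_income,
        "EMI": emi,
        "Balance_Income": balance_income,
    }


# Made-up applicant, not taken from the dataset.
applicant = {
    "ApplicantIncome": 5000,
    "CoapplicantIncome": 1500,
    "LoanAmount": 180,
    "Loan_Amount_Term": 360,
}
features = add_income_features(applicant)
print(features["Total_Income"], features["EMI"], features["Balance_Income"])
```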

Data Mining Algorithm
Decision Tree
In this technique, we split the population or sample into
two or more homogeneous sets based on the most significant
splitter / differentiator among the input variables.
Decision trees use multiple algorithms to decide how to split
a node into two or more sub-nodes. Creating sub-nodes
increases the homogeneity of the resulting sub-nodes; in
other words, the purity of each node increases with respect
to the target variable.
The mean validation accuracy for this model is 0.69.
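
As an illustration of "purity increasing after a split", the sketch below uses Gini impurity, one common purity measure for decision trees. The slides do not say which measure was used, so treating it as Gini is an assumption, and the toy labels are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for a more mixed node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two sub-nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy node with 6 approved ("Y") and 4 rejected ("N") loans.
parent = ["Y"] * 6 + ["N"] * 4
# Hypothetical split, e.g. on whether the credit history meets guidelines.
left = ["Y"] * 5 + ["N"] * 1
right = ["Y"] * 1 + ["N"] * 3

print(round(gini(parent), 3), round(split_impurity(left, right), 3))
```

A split is worthwhile when the weighted impurity of the sub-nodes is lower than the impurity of the parent node, which is the case for this toy split.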
Data Mining Algorithm
Random Forest
• Random Forest is a tree-based bootstrapping algorithm
in which a number of weak learners (decision trees) are
combined to make a powerful prediction model.
• For every individual learner, a random sample of rows and
a few randomly chosen variables are used to build a
decision tree model.
• The final prediction is a function of all the predictions
made by the individual learners.
• For a regression problem, the final prediction can be the
mean of all the predictions; for a classification problem
such as loan status, it can be the majority vote.
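
The sampling and aggregation steps can be sketched without any tree-building code. The helper names below are hypothetical and the individual learners are left out; the sketch only shows how rows and variables are drawn at random and how the learners' outputs could be combined.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]


def pick_features(features, n, rng):
    """Choose n distinct variables at random for one learner."""
    return rng.sample(features, n)


def majority_vote(predictions):
    """Classification: the most frequent prediction wins."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the final prediction is the mean of all predictions."""
    return mean(predictions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))
sample = bootstrap_sample(rows, rng)
chosen = pick_features(["Credit_History", "Total_Income", "EMI", "Property_Area"], 2, rng)

print(majority_vote(["Y", "N", "Y"]))
print(average([1.0, 2.0, 3.0]))
```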
Data Mining Algorithm
XGBoost
XGBoost works only with numeric variables
and we have already replaced the categorical
variables with numeric variables.
We got an accuracy of 0.73611 with this
model.
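
The slides do not show how the categorical variables were replaced with numbers. One simple possibility, given here only as an assumed illustration, is to map each distinct category to an integer code. The helper name `encode_categories` and the sample rows are hypothetical.

```python
def encode_categories(rows, column):
    """Replace the string values in one column with integer codes."""
    categories = sorted({row[column] for row in rows})
    codes = {value: code for code, value in enumerate(categories)}
    encoded = [{**row, column: codes[row[column]]} for row in rows]
    return encoded, codes


# Made-up rows using the Property_Area values from the attribute list.
rows = [
    {"Loan_ID": "A", "Property_Area": "Urban"},
    {"Loan_ID": "B", "Property_Area": "Rural"},
    {"Loan_ID": "C", "Property_Area": "Semi Urban"},
    {"Loan_ID": "D", "Property_Area": "Urban"},
]
encoded, codes = encode_categories(rows, "Property_Area")
print(codes)
```

Integer codes impose an artificial order on the categories, so one-hot encoding is a common alternative when that order is not meaningful.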
Result
• After trying and testing 4 different algorithms,
the best accuracy on the public leaderboard was
achieved by Logistic Regression (0.7847),
followed by Random Forest (0.7638).

Thank you!!!
RITU KUMARI
