Loan Prediction Project


Uploaded by Jathavi
Copyright © All Rights Reserved

Data Mining Project

Loan Prediction Using Data Mining


RITU KUMARI
Overview

Dataset Data Attribute


Cleaning Selection
Generation

Analysis of
Data Mining
results
Data Generation
• The data is collected from the kaggle.com website, which provides data about Dream Housing Finance, a company that deals in all home loans and has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan. We use the data directly, because kaggle.com provides it in CSV format.
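A minimal sketch of loading the CSV with Python's standard csv module, assuming the file has a header row with the column names listed in the next slide. The helper names load_loans and status_counts are hypothetical; values are kept as strings so that missing cells can be dealt with explicitly during data cleaning.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_loans(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read loan records from CSV text into dicts keyed by column name.

    Values stay as strings and empty cells stay as "", so missing values
    can be handled explicitly during data cleaning.
    """
    return list(csv.DictReader(lines))


def status_counts(rows: List[Dict[str, str]]) -> Counter:
    """Count how many applications have each Loan_Status value (Y/N)."""
    return Counter(row["Loan_Status"] for row in rows)
```

For the downloaded file, open it with newline="" (as the csv module recommends) and pass the handle, e.g. with open("train.csv", newline="") as f: loans = load_loans(f), where train.csv is a hypothetical file name.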
Some Data Attributes
• Loan_ID: Unique loan ID
• Gender: Male/Female
• Married: Applicant married (Y/N)
• Dependents: Number of dependents on the applicant
• Education: Applicant education (Graduate/Under Graduate)
• Self_Employed: Self-employed (Y/N)
• ApplicantIncome: Applicant income
• CoapplicantIncome: Co-applicant income
• LoanAmount: Loan amount in thousands
• Loan_Amount_Term: Term of the loan in months
• Credit_History: Credit history meets guidelines
• Property_Area: Urban/Semi Urban/Rural
• Loan_Status: Loan approved (Y/N)
Attribute Selection
• We select different attributes for loan prediction and for individual applicant prediction.
• This method first creates all possible subsets of attributes.
• It then uses a classification algorithm such as Random Forest or Decision Tree to predict the target (Loan_Status) from the features available in each subset.
• It then calculates the prediction accuracy for each subset.
• After comparing the accuracies of all subsets, the method chooses the subset of attributes with the highest accuracy.
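The exhaustive subset search described above can be sketched as follows. The score callback is a hypothetical stand-in for "train a classifier such as Random Forest or a Decision Tree on this subset and return its accuracy"; the sketch only shows the search and the selection of the best subset.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def best_attribute_subset(
    attributes: Sequence[str],
    score: Callable[[Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], float]:
    """Evaluate every non-empty subset of attributes and keep the most accurate one.

    `score` trains a classifier on the given subset and returns its accuracy
    (for example, a cross-validated accuracy). On ties, the first subset found
    (the smallest one) is kept.
    """
    best_subset: Tuple[str, ...] = ()
    best_score = float("-inf")
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            acc = score(subset)
            if acc > best_score:
                best_subset, best_score = subset, acc
    return best_subset, best_score
```

Note that n attributes give 2^n - 1 subsets: the 11 predictors listed earlier (excluding Loan_ID and Loan_Status) already mean 2047 classifier fits, multiplied by the number of cross-validation folds if score uses them.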
Data Mining Techniques
• We use Logistic Regression with stratified k-fold cross-validation to predict the loan status. In our results, the error of the Logistic Regression classifier was lower than that of the Current Run Rate method in estimating the final prediction.
• Stratification is the process of rearranging the data so that each fold is a good representative of the whole. For example, if 70% of the applications in the full dataset are approved, each fold should also contain about 70% approved applications.
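A dependency-free sketch of stratified k-fold cross-validation around a plain logistic regression trained by batch gradient descent. It assumes the features are already numeric (and ideally scaled) and that Loan_Status is encoded as 1 = approved, 0 = not approved; the learning rate, number of epochs, k and seed are illustrative assumptions. In practice a library implementation (for example scikit-learn's StratifiedKFold and LogisticRegression) would normally be used instead.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split row indices into k folds that each keep roughly the overall class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class round-robin across the folds, continuing where the
        # previous class stopped so that fold sizes also stay balanced.
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)
    return folds


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Batch gradient descent on the log-loss; returns [bias, w1, ..., wd]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
    return 1 if p >= 0.5 else 0


def cross_val_accuracy(X: List[List[float]], y: List[int], k: int = 5) -> float:
    """Mean test-fold accuracy over stratified k-fold cross-validation."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Dealing each class round-robin is what makes the split stratified: a class with 6 rows split into 2 folds always puts 3 rows in each fold, whatever the shuffle order.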
Data Mining Techniques
• Based on domain knowledge, we can come up with new attributes that might affect the target variable. We create the following three new attributes.
• Total Income: the sum of Applicant Income and Co-applicant Income. If the total income is high, the chances of loan approval might also be high.
• Balance Income: the income left after the EMI has been paid. The idea behind this variable is that if the value is high, the person is more likely to repay the loan, which increases the chances of loan approval.

• EMI: the monthly amount the applicant pays to repay the loan. The idea behind this variable is that people with a high EMI might find it difficult to pay back the loan. We calculate the EMI as the ratio of the loan amount to the loan amount term.
• By examining their distributions, I checked that these three attributes give better results than the original attributes already present in the training set.
• The correlation between the old features and these new features will be very high, and logistic regression assumes that the variables are not highly correlated. We also want to remove noise from the dataset, so removing the correlated original features will help reduce the noise too.
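A small sketch of this check: compute the Pearson correlation between a new feature and an original column it was derived from, then drop the originals. The list of dropped columns is an assumption based on which raw attributes the three new attributes are built from; the correlation is implemented by hand to stay dependency-free.

```python
import math
from typing import Dict, List, Sequence

# Assumed: the raw columns that Total_Income, EMI and Balance_Income are built from.
ORIGINAL_COLUMNS = ("ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term")


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_original_columns(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the rows without the columns the new attributes replace."""
    return [{k: v for k, v in row.items() if k not in ORIGINAL_COLUMNS} for row in rows]
```

A correlation close to +1 or -1 between, say, ApplicantIncome and Total_Income would confirm that keeping both adds little information for logistic regression.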
Algorithm for the New Attributes
1. Total_Income = ApplicantIncome + CoapplicantIncome
2. EMI = LoanAmount / Loan_Amount_Term
3. Balance_Income = Total_Income - (EMI * 1000)

EMI is multiplied by 1000 because LoanAmount is recorded in thousands; this puts EMI in the same units as income.
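The three formulas above translate directly into code. This is a minimal sketch assuming each record is a dict of already-numeric values keyed by the dataset's column names; the function name add_new_attributes is hypothetical.

```python
from typing import Dict


def add_new_attributes(row: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `row` with Total_Income, EMI and Balance_Income added.

    LoanAmount is in thousands, so EMI comes out in thousands per month and is
    multiplied by 1000 before it is subtracted from income. As in the slide's
    formula, EMI is a simple ratio that ignores interest.
    """
    out = dict(row)
    out["Total_Income"] = row["ApplicantIncome"] + row["CoapplicantIncome"]
    out["EMI"] = row["LoanAmount"] / row["Loan_Amount_Term"]
    out["Balance_Income"] = out["Total_Income"] - out["EMI"] * 1000
    return out
```

For a hypothetical applicant with ApplicantIncome 5000, CoapplicantIncome 1500, LoanAmount 120 (that is, 120,000) and Loan_Amount_Term 360 months: Total_Income = 6500, EMI = 120 / 360 ≈ 0.333, and Balance_Income = 6500 - 333.33 ≈ 6166.67.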


Data Mining Algorithm
Decision Tree
In this technique, we split the population or sample into two or more homogeneous sets based on the most significant splitter/differentiator among the input variables.
Decision trees can use several algorithms (for example, Gini impurity or information gain) to decide whether to split a node into two or more sub-nodes. Creating sub-nodes increases the homogeneity of the resulting sub-nodes; in other words, the purity of each node increases with respect to the target variable.
The mean validation accuracy for this model is 0.69.
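To make "purity increases" concrete, here is a sketch of one common splitting criterion, Gini impurity, used to choose a threshold on a single numeric attribute. The deck does not say which criterion its model used, so Gini is an assumption here, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Find the threshold t (split: value <= t vs value > t) with the lowest
    size-weighted Gini impurity; returns (None, parent impurity) if no split helps."""
    n = len(labels)
    best_t: Optional[float] = None
    best_impurity = gini(labels)
    # The largest distinct value is skipped: it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity
```

For example, if Credit_History values [0, 0, 1, 1, 1] had the made-up labels N, N, Y, Y, Y, splitting at 0 would produce two pure nodes (impurity 0.0), down from a parent impurity of 0.48.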
Data Mining Algorithm
Random Forest
• Random Forest is a tree-based bootstrapping (bagging) algorithm in which a number of weak learners (decision trees) are combined to make a powerful prediction model.
• For every individual learner, a random sample of rows and a few randomly chosen variables are used to build a decision tree model.
• The final prediction is a function of all the predictions made by the individual learners.
• For a classification problem such as loan status, the final prediction is typically the majority vote of the learners; for a regression problem, it can be the mean of all the predictions.
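A compact sketch of the bagging idea described above, under simplifying assumptions: depth-one trees (decision stumps) stand in for full decision trees, features are numeric, labels are 0/1, and n_trees, n_features and the seed are illustrative values. It shows the bootstrap sample, the random choice of variables per learner, and the majority vote; a real project would more likely use a library implementation such as scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, int, int]


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Fit a depth-one tree: the (feature, threshold) split with the fewest training errors."""
    overall = Counter(y).most_common(1)[0][0]
    best, best_errors = None, len(y) + 1
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            # Each side predicts its majority class; an empty side falls back to the overall majority.
            lab_l = Counter(left).most_common(1)[0][0] if left else overall
            lab_r = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(yi != lab_l for yi in left) + sum(yi != lab_r for yi in right)
            if errors < best_errors:
                best, best_errors = (f, t, lab_l, lab_r), errors
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, lab_l, lab_r = stump
    return lab_l if row[f] <= t else lab_r


def fit_forest(X: List[List[float]], y: List[int], n_trees: int = 25,
               n_features: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows drawn with replacement
        feats = rng.sample(range(d), n_features)     # a few randomly chosen variables per learner
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest: List[Stump], row: Sequence[float]) -> int:
    """Classification: majority vote over all learners."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]
```

Each stump alone is a weak learner trained on a slightly different sample; averaging their votes is what reduces variance compared with a single tree.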
Data Mining Algorithm
XGBoost
XGBoost works only with numeric variables, and we have already replaced the categorical variables with numeric ones.
We got an accuracy of 0.73611 with this model.
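A sketch of replacing the categorical attributes with numbers before training. The numeric codes are hypothetical, since the deck does not say which values were used; Y/N fields accept both "Y"/"N" and "Yes"/"No" as an assumption about how the raw file spells them, and a capped Dependents value such as "3+" is assumed to be possible. Missing values are assumed to have been filled during data cleaning.

```python
from typing import Dict

# Hypothetical codes: the deck does not state which numbers were used.
YES_NO = {"Y": 1.0, "Yes": 1.0, "N": 0.0, "No": 0.0}
BINARY_MAPS: Dict[str, Dict[str, float]] = {
    "Gender": {"Male": 1.0, "Female": 0.0},
    "Married": YES_NO,
    "Education": {"Graduate": 1.0, "Under Graduate": 0.0},
    "Self_Employed": YES_NO,
    "Loan_Status": YES_NO,
}
NUMERIC = ("ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term", "Credit_History")
PROPERTY_AREAS = ("Urban", "Semi Urban", "Rural")


def encode_row(row: Dict[str, str]) -> Dict[str, float]:
    """Turn one cleaned record (all values as strings, no missing cells) into numbers."""
    out = {col: mapping[row[col]] for col, mapping in BINARY_MAPS.items()}
    for col in NUMERIC:
        out[col] = float(row[col])
    # Assumed: a capped category such as "3+" may appear, so strip a trailing "+".
    out["Dependents"] = float(row["Dependents"].rstrip("+"))
    # One-hot encode Property_Area so the model does not read an order into 0/1/2 codes.
    for area in PROPERTY_AREAS:
        out["Property_Area_" + area.replace(" ", "_")] = 1.0 if row["Property_Area"] == area else 0.0
    return out
```

One-hot encoding is used for Property_Area because it has three unordered values; the two-valued attributes only need a single 0/1 column each.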
Result
• After testing four different algorithms (Logistic Regression, Decision Tree, Random Forest and XGBoost), the best accuracy on the public leaderboard was achieved by Logistic Regression (0.7847), followed by Random Forest (0.7638).

Thank you!!!
RITU KUMARI
