Loan Prediction Project
Analysis of Data Mining Results
Data Generation
• The data is collected from the kaggle.com
website, which hosts a data set from Dream
Housing Finance, a company that deals in all
kinds of home loans and has a presence across
urban, semi-urban and rural areas. A customer
first applies for a home loan, and the company
then validates the customer's eligibility. We
use the data directly because kaggle.com
provides it in CSV format.
Some Data Attributes
• Loan_ID: Unique Loan ID
• Gender: Male/ Female
• Married: Applicant married (Y/N)
• Dependents: Number of dependents on the applicant
• Education: Applicant Education (Graduate/ Under Graduate)
• Self_Employed: Self employed (Y/N)
• ApplicantIncome: Applicant income
• CoapplicantIncome: Co-applicant income
• LoanAmount: Loan amount in thousands
• Loan_Amount_Term: Term of loan in months
• Credit_History: Credit history meets guidelines
• Property_Area: Urban/ Semi Urban/ Rural
• Loan_Status: Loan approved (Y/N)
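As a minimal sketch of reading such a CSV file, the snippet below parses records with the column layout listed above using only the Python standard library. The two sample rows, the `SAMPLE_CSV` constant, and the `load_loans` helper are made up for illustration and are not taken from the Kaggle file; empty numeric cells are assumed to mean "missing".

```python
import csv
import io

# Two invented rows in the column layout listed above (not real Kaggle data).
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
X001,Male,No,0,Graduate,No,5000,0,120,360,1,Urban,Y
X002,Female,Yes,1,Graduate,Yes,3000,1500,,360,0,Rural,N
"""

NUMERIC_COLUMNS = (
    "ApplicantIncome",
    "CoapplicantIncome",
    "LoanAmount",
    "Loan_Amount_Term",
    "Credit_History",
)


def load_loans(handle):
    """Read loan records from an open CSV file or file-like object."""
    rows = []
    for record in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            value = record[column].strip()
            # Assumption: an empty cell is a missing value, stored as None.
            record[column] = float(value) if value else None
        rows.append(record)
    return rows


loans = load_loans(io.StringIO(SAMPLE_CSV))
print(len(loans), loans[0]["Loan_Status"], loans[1]["LoanAmount"])
```

For a real run, the same helper could be given an opened file instead of the in-memory sample.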
Attribute Selection
• We select different attributes for loan prediction and
individual applicant prediction.
• We first create all possible subsets of attributes.
• For each subset, a classification algorithm such as
Random Forest or Decision Tree predicts the target
(Loan_Status) from the features available in that subset.
• The accuracy of prediction is then calculated for each
subset.
• After comparing the accuracies, the method chooses the
subset of attributes with the highest accuracy.
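The subset search described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the `TRAIN` and `TEST` rows are invented, and `table_accuracy` is a simple majority-vote lookup table standing in for a real classifier such as Random Forest or Decision Tree.

```python
from collections import Counter
from itertools import combinations

# Invented toy rows; the real data set has more attributes and records.
TRAIN = [
    {"Credit_History": 1, "Married": "Y", "Education": "Graduate", "Loan_Status": "Y"},
    {"Credit_History": 1, "Married": "N", "Education": "Graduate", "Loan_Status": "Y"},
    {"Credit_History": 0, "Married": "Y", "Education": "Graduate", "Loan_Status": "N"},
    {"Credit_History": 0, "Married": "N", "Education": "Under Graduate", "Loan_Status": "N"},
    {"Credit_History": 1, "Married": "Y", "Education": "Under Graduate", "Loan_Status": "Y"},
]
TEST = [
    {"Credit_History": 1, "Married": "N", "Education": "Under Graduate", "Loan_Status": "Y"},
    {"Credit_History": 0, "Married": "Y", "Education": "Under Graduate", "Loan_Status": "N"},
    {"Credit_History": 0, "Married": "N", "Education": "Graduate", "Loan_Status": "N"},
    {"Credit_History": 1, "Married": "Y", "Education": "Graduate", "Loan_Status": "Y"},
]


def all_subsets(attributes):
    """Yield every non-empty subset of the attributes, smallest first."""
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)


def table_accuracy(subset):
    """Stand-in classifier: majority label per combination of attribute values."""
    votes = {}
    for row in TRAIN:
        key = tuple(row[name] for name in subset)
        votes.setdefault(key, Counter())[row["Loan_Status"]] += 1
    # Fall back to the overall majority label for unseen combinations.
    default = Counter(row["Loan_Status"] for row in TRAIN).most_common(1)[0][0]
    hits = 0
    for row in TEST:
        key = tuple(row[name] for name in subset)
        predicted = votes[key].most_common(1)[0][0] if key in votes else default
        hits += predicted == row["Loan_Status"]
    return hits / len(TEST)


def best_subset(attributes, evaluate):
    """Score every subset and keep the first one with the highest accuracy."""
    best, best_score = None, float("-inf")
    for subset in all_subsets(attributes):
        score = evaluate(subset)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score


ATTRIBUTES = ["Credit_History", "Married", "Education"]
chosen, accuracy = best_subset(ATTRIBUTES, table_accuracy)
print(chosen, accuracy)
```

The number of non-empty subsets grows as 2^n - 1, so the 11 predictor attributes of this data set (all columns except Loan_ID and Loan_Status) already give 2047 candidate subsets to train and score.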
Data Mining Techniques
• We use Logistic Regression with stratified k-fold
cross-validation to predict the loan status. The
results show that the error of the Linear Regression
classifier is lower than that of the Current Run Rate
method in estimating the final prediction.
• Stratification is the process of rearranging the
data so that each fold is a good representative of
the whole, i.e. each fold keeps roughly the same
proportion of approved and rejected loans as the
full data set.
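To make the idea of stratification concrete, here is a small pure-Python sketch that splits records into k folds while keeping the class proportions in each fold. The `stratified_folds` function is a hypothetical helper written for this illustration; it does not shuffle the data, and a library routine such as scikit-learn's `StratifiedKFold` would normally be used in practice.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split record indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    # Deal the indices of each class out to the folds in round-robin order,
    # so every fold receives a near-equal share of every class.
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Invented labels: six approved loans and three rejected ones.
labels = ["Y"] * 6 + ["N"] * 3
folds = stratified_folds(labels, k=3)
for fold in folds:
    print([labels[i] for i in fold])
```

In k-fold cross-validation, each fold would serve once as the validation set while the model is trained on the remaining k - 1 folds.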
• Based on domain knowledge, we can come up
with new attributes that might affect the
target variable. We will create the following
three new attributes.
• Total Income - the sum of Applicant Income
and Co-applicant Income. If the total income
is high, the chances of loan approval might
also be high.
• Balance Income - the income left after the EMI has been
paid. The idea behind this variable is that if the value is
high, the person is more likely to repay the loan, which
increases the chances of loan approval.
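The derived attributes above could be computed along these lines. This is a sketch under assumptions the slides do not state: the EMI is approximated as the loan amount divided by the term with no interest, LoanAmount is converted from thousands to the same unit as income, and the names `add_income_features`, `Total_Income`, `EMI` and `Balance_Income` are chosen for illustration only.

```python
def add_income_features(applicant):
    """Return a copy of the record with the derived income attributes added."""
    enriched = dict(applicant)
    total_income = applicant["ApplicantIncome"] + applicant["CoapplicantIncome"]
    # Assumption: interest-free monthly instalment; LoanAmount is in thousands.
    emi = applicant["LoanAmount"] * 1000 / applicant["Loan_Amount_Term"]
    enriched["Total_Income"] = total_income
    enriched["EMI"] = emi
    enriched["Balance_Income"] = total_income - emi
    return enriched


# Invented applicant record using the attribute names of the data set.
applicant = {
    "ApplicantIncome": 5000,
    "CoapplicantIncome": 1500,
    "LoanAmount": 120,        # in thousands
    "Loan_Amount_Term": 360,  # in months
}
features = add_income_features(applicant)
print(features["Total_Income"], round(features["EMI"], 2), round(features["Balance_Income"], 2))
```

Records with a missing LoanAmount or Loan_Amount_Term would need to be imputed or skipped before this step.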
Thank you!!!
RITU KUMARI