Credit Card Fraud Detection
In the banking sector, many people apply for loans, but a bank has only limited
assets to lend, so identifying the applicants to whom a loan can safely be granted
is a difficult process. In this project we try to reduce the risk involved in
selecting a safe applicant, saving the bank considerable effort and assets. This is
done by mining the big data of previous records of people to whom loans were
granted, and on the basis of these records the machine is trained using the machine
learning model that gives the most accurate result. The main objective of this
project is to predict whether granting a loan to a particular applicant will be
safe or not. The project is divided into four sections: (i) data collection,
(ii) comparison of machine learning models on the collected data, (iii) training of
the system on the most promising model, and (iv) testing.
1.INTRODUCTION
Distribution of loans is the core business of almost every bank. The main portion
of a bank's assets comes directly from the profit earned on the loans it
distributes. In the banking environment, the prime objective is to invest assets
where they are in safe hands. Many banks and large financial companies today
approve loans only after a rigorous verification and validation process, but there
is still no guarantee that the selected applicant is the most deserving of all
applicants. This method helps us determine whether an applicant is suitable or
not. Loan prediction is very helpful for both the bank employee and the applicant.
The goal of this paper is to provide a simple, fast and easy way to select the
right applicants. It can provide the bank with special benefits. The applicant may
have a time limit within which to learn whether his or her loan will be
sanctioned. A loan prediction system makes it possible to jump to a specific
application so that it can be checked on a priority basis. This paper is intended
specifically for the management authority of banks and finance organizations; no
outside investor can alter the processing, and the entire prediction process is
carried out privately. A report may be sent against a particular loan ID to
different departments of the bank so that they can take appropriate action on the
request, which helps the other departments complete their formalities.
Because of their high accuracy and ability to express a statistical model in
simple language, decision trees are widely used in the banking industry. Because
government organizations closely monitor lending practices in many countries,
executives need to be able to explain why one applicant was rejected for a loan
while another was accepted. Such data is also useful to consumers trying to
understand why their credit rating is unsatisfactory. Automated credit scoring
systems are likely to be used to automatically accept or reject telephone or
internet credit requests.
We introduce an effective prediction technique that helps the banker predict the
credit risk of customers who have applied for a loan. The paper describes a
prototype that organizations can use to make the correct decision to approve or
reject a customer's loan request. We will also see how the model's results can be
tuned to minimize the errors that lead to financial loss for the institution.
Banks play a vital role in a market economy, and the success or failure of an
organization largely depends on its ability to evaluate credit risk. Before
granting a loan, the bank must decide whether the customer is a good or bad risk.
Predicting a customer's status, i.e. whether a borrower will turn out good or bad
in the future, is a challenging task for any organization or bank. Loan-defaulter
prediction is essentially a binary classification problem in which the applicant's
creditability for receiving a loan must be assessed: the task is to classify each
customer as good or bad. However, developing such a model is very challenging
because of the increasing demand for loans.
2.LITERATURE SURVEY
1. An exploratory data analysis for loan prediction based on nature of the
clients
This paper's main purpose is to classify and analyse the nature of loan
applicants. The paper classifies the customers according to certain parameters.
Classification is carried out using exploratory data analysis, a technique for
analysing data sets that summarizes their main features with visual methods.
Here the author explains the ensemble method of loan prediction. The paper
describes a prototype of the model that an organization can use to make the right
decision to approve or reject customer loans. It presents an ensemble model for
loan forecasting built under various training algorithms using several parameters.
The main purpose of the paper is to test model accuracy and to develop a new
ensemble model that combines the outputs of three different models to predict
applicants' loans.
Here the author used six machine learning classification models to predict
applications: Decision Trees, Random Forest, Support Vector Machine, linear
models, Neural Networks and AdaBoost. The main purpose of this paper is to provide
a simple, immediate and quick way to select eligible applicants. The paper
considers the banking scenario in which many people apply for bank loans but the
bank has limited slots, so loans can be granted to only a limited number of
applicants.
Here the author discusses credit risk and loan prediction. This paper provides
information about credit prediction and credit risk; a bank's success depends
mainly on credit risk analysis, which plays a vital role in the banking domain.
The paper uses the random forest method.
This work involves creating an ensemble model by combining three separate machine
learning models, yielding a prototype that organizations can use to decide
correctly whether to approve or reject a customer's loan application. The
application can help banks predict the future status of a loan and, based on that,
take action in the initial days of the loan. Using this application, banks can
reduce the number of bad loans and the losses they cause.
Here the author uses a Naïve Bayes model to predict the sanctioning of loans. The
paper proposes a loan sanction system that uses certain attributes to determine
whether or not a loan should be granted to a consumer. The system we propose helps
bankers identify the credible customers among those who have applied for loans,
improving the chances of loans being repaid on time.
3.SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:
Earlier work on loan prediction analysis used the Naïve Bayes and KNN
classifiers. These algorithms have drawbacks in the existing system; in the
proposed system we overcome these problems by using other algorithms.
3.2 PROPOSED SYSTEM
The proposed model focuses on predicting the creditability of customers for loan
repayment by analysing their behaviour. The input to the model is the collected
customer behaviour. Based on the output of the classifier, a decision can be made
on whether to approve or reject the customer's request. Using different data
analytics tools, loan prediction and its severity can be forecast. In this process
the data is trained using different algorithms, and user data is then compared
with the trained data to predict the nature of the loan, as sketched below.
Python is well suited to data analytics, which helps us analyse the data with
better models in data science. Python's libraries make the prediction for loan
data and produce results on multiple terms, considering all properties of the
customer when predicting.
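As a concrete illustration of comparing algorithms on the collected data, the
following minimal sketch cross-validates three candidate classifiers. The file
name loan_data.csv and the Loan_Status target column are placeholders assumed for
the example, not part of the original design.

```python
# A minimal sketch of comparing candidate models by cross-validation;
# file name and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("loan_data.csv").dropna()
X = pd.get_dummies(data.drop("Loan_Status", axis=1))  # encode categoricals
y = data["Loan_Status"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

The model with the highest mean cross-validation accuracy would then be the most
promising one to train on the full data set.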
Data Selection
The data collected for the mining process may contain missing values, noise or
inconsistencies. A data mining process applied to high-quality data will produce
efficient data mining results.
Data Pre-Processing
This is the most time-consuming phase of a data mining process. It deals with the
preparation and transformation of the initial data set into the final data set.
Classification
Its objective is to derive a model that describes and distinguishes data classes
or concepts. The derived model is based on the analysis of a set of training data.
Prediction
The model is tested on the test data set using the predict function. Prediction is
used to estimate missing or unavailable numerical data values rather than class
labels.
Evaluation
In the final stage, the designed system is tested with the test set and its
performance is assessed.
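A minimal end-to-end sketch of the stages above, again assuming a hypothetical
loan_data.csv file with a Loan_Status target column:

```python
# A sketch of the stages described above, with hypothetical file and
# column names; not the project's actual code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Data selection: load the raw records.
data = pd.read_csv("loan_data.csv")

# Data pre-processing: fill missing values and encode categoricals.
data = data.fillna(data.mode().iloc[0])
X = pd.get_dummies(data.drop("Loan_Status", axis=1))
y = data["Loan_Status"]

# Classification: derive a model from the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Prediction: apply the trained model to unseen test data.
y_pred = model.predict(X_test)

# Evaluation: assess performance on the held-out set.
print("Accuracy:", accuracy_score(y_test, y_pred))
```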
Proposed Algorithms
HARDWARE REQUIREMENTS:
SOFTWARE REQUIREMENTS:
FEASIBILITY STUDY
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have
on the organization. The amount of funds that the company can pour into the
research and development of the system is limited, so the expenditures must be
justified. The developed system is well within the budget, which was achieved
because most of the technologies used are freely available; only the customized
products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place a high
demand on the available technical resources, as this would in turn place high
demands on the client. The developed system must have modest requirements, so
that only minimal or no changes are required to implement it.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system but must accept it
as a necessity. The level of acceptance by the users depends solely on the
methods employed to educate users about the system and to make them familiar
with it. Their confidence must be raised so that they can also offer constructive
criticism, which is welcomed, as they are the final users of the system.
4.SYSTEM DESIGN
This section describes the system in narrative form using non-technical terms. It
should provide a high-level system architecture diagram showing a subsystem
breakout of the system, if applicable. The high-level system architecture or
subsystem diagrams should, if applicable, show interfaces to external systems.
Supply a high-level context diagram for the system and subsystems, if applicable.
This section also describes any constraints in the system design (referencing any
trade-off analyses conducted, such as resource use versus productivity, or
conflicts with other systems) and includes any assumptions made by the project
team in developing the system design.
It also lists the organization code and title of the key points of contact (and
alternates if appropriate) for the information system development effort. These
points of contact should include the Project Manager, System Proponent, User
Organization, Quality Assurance (QA) Manager, Security Manager, and Configuration
Manager, as appropriate.
SYSTEM ARCHITECTURE
System architecture is the conceptual model that defines the structure, behaviour
and other views of a system. An architecture description is a formal description
and representation of a system, organized in a way that supports reasoning about
the structure and behaviour of the system. It consists of the system components
and the sub-systems developed, which work together to implement the overall
system.
Diagrams play a very important role in UML. The kinds of modelling diagrams used
here are as follows:
In the above diagram, the actors are the customer, the loan officer and the
admin. The customer sends data to the system, which splits it into blocks and
passes it to Python. Python then performs the data cleaning, i.e. data integration
and data repair, and the results are stored. These results can be viewed using
Python and can be stored on the server for future use.
Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML)
is a type of static structure diagram that describes the structure of a system by showing
the system's classes, their attributes, operations (or methods), and the relationships
among objects.
The class diagram is the most commonly drawn UML diagram. It represents the
static design view of the system. It consolidates the set of classes, interfaces,
collaborations and their relationships.
In the above class diagram, the relationships, i.e. the dependencies between each
of the classes, are sketched out. In addition, the operations performed in each
class are also shown.
Sequence Diagram
A sequence diagram simply depicts interaction between objects in a sequential
order i.e. the order in which these interactions take place. We can also use the terms
event diagrams or event scenarios to refer to a sequence diagram. Sequence diagrams
describe how and in what order the objects in a system function.
This is an interaction diagram that addresses the structural relationships of the
objects that send and receive messages. It comprises a set of parts, the
connectors that link those parts, and the messages sent and received by them.
This diagram is used to represent the dynamic view of the system.
In the above collaboration diagram, the diagram contains objects, links and
sequence numbers. The actors are the customer, the loan officer and the admin.
These objects are connected to each other using links, and a sequence number
indicates the time order of a message.
State Chart Diagram
State chart diagram is one of the five UML diagrams used to model the dynamic
nature of a system. They define different states of an object during its lifetime and these
states are changed by events. State chart diagram describes the flow of control from one
state to another state.
In the above state chart diagram, there are two components: states and
transitions. States represent situations during the life of an object. A state
can easily be drawn (for example in SmartDraw) as a rectangle with rounded
corners.
Component Diagram
A component diagram, also known as a UML component diagram, describes the
organization and wiring of the physical components in a system. In the first version of
UML, components included in these diagrams were physical documents, database table,
files, and executables, all physical elements with a location.
The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by the
system. The data flow diagram (DFD) is one of the most important modelling tools.
It is used to model the system components: the system processes, the data used by
the processes, the external entities that interact with the system, and the
information flows in the system.
A DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information
flow and the transformations that are applied as data moves from input to output.
A DFD may be used to represent a system at any level of abstraction and may be
partitioned into levels that represent increasing information flow and functional
detail.
Data Flow Diagram
A level 0 DFD describes the system-wide boundaries, the input to and output flows
from the system, and the major processes. DFD Level 0 is also called a Context
Diagram. It is a basic overview of the whole system or process being analysed or
modelled, designed as an at-a-glance view that shows the system as a single
high-level process together with its relationships to external entities.
In the above diagram, the customer exports the data to the loan officer. The loan
officer maintains the data and stores it on the server.
In the above diagram, the customer exports the data to the loan officer. The loan
officer maintains the data and stores it on the server. The server installs the
required libraries to perform their actions.
Level 2: File Level Detail Data Flow
In the above diagram, the customer exports the data to the loan officer. The loan
officer maintains the data and stores it on the server. The server installs the
required libraries in the system to perform their actions. The admin performs the
analysis on the data.
Level 3:
LIST OF MODULES
Client Module
Admin Module
Data Analyst
Client Module
A client module is a network module that supports and implements the client side
of a Network Programming Interface (NPI). A client module registers itself with
the Network Module Registrar as a client of the NPI that it supports, and it can
register itself as a client of more than one NPI.
The client module takes the data from the customer and sends it to the admin. The
client then receives the result of the analysis, performed using different types
of algorithms to obtain an accuracy value, and sends that result to the customer
over the network.
Admin Module
The admin receives the data from the client, performs the analysis on the data
using the given algorithms and generates the results. The admin stores the data
for future reference and keeps it updated.
The admin also grants permissions to users and controls those permissions. The
admin provides security for the data so that no issues arise.
Data Analyst
The analyst requests the data from the admin, and the admin sends the appropriate
data to the analyst. The analyst performs the analysis on the data, produces the
results and sends them back to the admin, who stores the data and results for
future reference. The analyst's work includes interpreting data, analysing
results using statistical techniques, and developing and implementing data
analysis and data collection systems and other strategies that optimize
statistical efficiency and quality.
Algorithm
Logistic Regression
Logistic Regression is a machine learning algorithm used for classification
problems; it is a predictive-analysis algorithm based on the concept of
probability. Logistic Regression can be viewed as a Linear Regression model that
uses a more complex cost function: instead of a linear function, it uses the
'sigmoid function', also known as the 'logistic function'.
Sigmoid Function
In order to map predicted values to probabilities, we use the sigmoid function.
The function maps any real value to a value between 0 and 1. In machine learning,
we use the sigmoid to map predictions to probabilities, as in the sketch after
the pros and cons below.
Pros:
Cons:
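A small sketch of the sigmoid, 1 / (1 + e^(-z)), and of scikit-learn's
LogisticRegression applying it; the toy data here is invented purely for
illustration.

```python
# A minimal sketch of the sigmoid function and logistic regression;
# the one-feature toy data below is made up for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real value into the (0, 1) range: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5 -- the decision boundary
print(sigmoid(4.0))   # ~0.982, confidently class 1
print(sigmoid(-4.0))  # ~0.018, confidently class 0

# Logistic regression applies the sigmoid to a linear combination of
# the features to produce a class probability.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[3.5]]))  # probability of each class
```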
Decision Tree
Decision Trees are a type of supervised machine learning (that is, the training
data specifies what the input is and what the corresponding output should be) in
which the data is continuously split according to a certain parameter. The tree
can be explained by two entities, namely decision nodes and leaves. The leaves
are the decisions or final outcomes; the decision nodes are where the data is
split. They can be used to solve both regression and classification problems; a
short sketch follows the pros and cons below.
Decision Trees use various algorithms to decide whether to split a node into two
or more sub-nodes. The creation of sub-nodes increases the homogeneity of the
resulting sub-nodes; in other words, the purity of each node increases with
respect to the target variable.
The Decision Tree algorithm falls under the category of supervised learning. It
can be used to solve both regression and classification problems. A decision tree
uses a tree representation to solve the problem, in which each leaf node
corresponds to a class label and the attributes are represented on the internal
nodes of the tree.
Pros:
Cons:
A small change in the data can cause a large change in the structure of the
decision tree, causing instability.
For a decision tree, calculations can sometimes become far more complex than
for other algorithms.
Decision trees often take more time to train the model.
Decision tree training is relatively expensive because of the complexity and
time involved.
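A brief sketch of fitting a decision tree and printing its decision nodes and
leaves; scikit-learn's bundled iris data is used here purely for illustration.

```python
# A hedged sketch of a decision tree classifier; the iris dataset
# stands in for the loan data only to keep the example self-contained.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# The printed rules show how each decision node splits the data and
# which class label each leaf assigns.
print(export_text(tree))
print("Test accuracy:", tree.score(X_test, y_test))
```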
Random Forest
Random Forest is a supervised learning algorithm used for both classification and
regression, although it is mainly used for classification problems. Just as a
forest is made up of trees, and more trees make a more robust forest, the random
forest algorithm creates decision trees on data samples, gets a prediction from
each of them, and finally selects the best solution by voting. It is an ensemble
method that is better than a single decision tree because it reduces over-fitting
by averaging the results, as the sketch after the pros and cons below
illustrates.
Pros:
Cons:
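A minimal sketch of the voting idea on synthetic data; comparing one constituent
tree with the whole forest illustrates how averaging reduces over-fitting.

```python
# A minimal sketch of how a random forest aggregates many trees; the
# dataset here is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Each tree votes; the forest's prediction is the majority vote, which
# averages out the over-fitting of any single tree.
print("Single-tree accuracy:", forest.estimators_[0].score(X_test, y_test))
print("Forest accuracy:", forest.score(X_test, y_test))
```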
What is Python :-
Below are some facts about Python.
Python is currently the most widely used multi-purpose, high-level programming
language. Programmers have to type relatively little, and the language's
indentation requirement keeps the code readable.
Python language is being used by almost all tech-giant companies like – Google,
Amazon, Facebook, Instagram, Dropbox, Uber… etc.
The biggest strength of Python is its huge collection of standard libraries,
which can be used for the following –
Machine Learning
GUI Applications (like Kivy, Tkinter, PyQt etc. )
Web frameworks like Django (used by YouTube, Instagram, Dropbox)
Image processing (like Opencv, Pillow)
Web scraping (like Scrapy, BeautifulSoup, Selenium)
Test frameworks
Multimedia
Advantages of Python :-
1. Extensive Libraries
Python ships with an extensive library containing code for various purposes such
as regular expressions, documentation generation, unit testing, web browsers,
threading, databases, CGI, email, image manipulation, and more, so we don't have
to write all of that code manually.
2. Extensible
As we have seen earlier, Python can be extended to other languages. You can write some
of your code in languages like C++ or C. This comes in handy, especially in projects.
3. Embeddable
Complementary to extensibility, Python is embeddable as well. You can put your
Python code in the source code of another language, like C++. This lets us add
scripting capabilities to our code in the other language.
4. Improved Productivity
5. IOT Opportunities
Since Python forms the basis of new platforms like the Raspberry Pi, its future
in the Internet of Things looks bright. This is a way to connect the language
with the real world.
6. Simple
When working with Java, you may have to create a class to print 'Hello World'.
But in Python, just a print statement will do. It is also quite easy to learn,
understand, and code. This is why when people pick up Python, they have a hard
time adjusting to other more verbose languages like Java.
7. Readable
Because it is not such a verbose language, reading Python is much like reading English.
This is the reason why it is so easy to learn, understand, and code. It also does not need
curly braces to define blocks, and indentation is mandatory. This further aids the
readability of the code.
8. Object-Oriented
9. Free and Open-Source
Like we said earlier, Python is freely available. But not only can you download
Python for free, you can also download its source code, make changes to it, and
even distribute it. It downloads with an extensive collection of libraries to
help you with your tasks.
10. Portable
When you code your project in a language like C++, you may need to make some changes
to it if you want to run it on another platform. But it isn’t the same with Python. Here, you
need to code only once, and you can run it anywhere. This is called Write Once Run
Anywhere (WORA). However, you need to be careful enough not to include any system-
dependent features.
11. Interpreted
Lastly, we will say that it is an interpreted language. Since statements are executed one by
one, debugging is easier than in compiled languages.
1. Less Coding
Almost every task done in Python requires less coding than the same task done in
other languages. Python also has awesome standard library support, so you don't
have to search for third-party libraries to get your job done. This is why many
people suggest that beginners learn Python.
2. Affordable
Python is free therefore individuals, small companies or big organizations can leverage the
free available resources to build applications. Python is popular and widely used so it gives
you better community support.
The 2019 Github annual survey showed us that Python has overtaken Java in the most
popular programming language category.
Python code can run on any machine whether it is Linux, Mac or Windows. Programmers
need to learn different languages for different jobs but with Python, you can professionally
build web apps, perform data analysis and machine learning, automate things, do web
scraping and also build games and powerful visualizations. It is an all-rounder programming
language.
Disadvantages of Python
So far, we’ve seen why Python is a great choice for your project. But if you choose it, you
should be aware of its consequences as well. Let’s now see the downsides of choosing
Python over another language.
1. Speed Limitations
We have seen that Python code is executed line by line. But since Python is interpreted, it
often results in slow execution. This, however, isn’t a problem unless speed is a focal point
for the project. In other words, unless high speed is a requirement, the benefits offered by
Python are enough to distract us from its speed limitations.
As you know, Python is dynamically-typed. This means that you don’t need to declare the
type of variable while writing the code. It uses duck-typing. But wait, what’s that? Well, it
just means that if it looks like a duck, it must be a duck. While this is easy on the
programmers during coding, it can raise run-time errors.
5. Simple
No, we’re not kidding. Python’s simplicity can indeed be a problem. Take my example. I
don’t do Java, I’m more of a Python person. To me, its syntax is so simple that the verbosity
of Java code seems unnecessary.
This was all about the Advantages and Disadvantages of Python Programming Language.
History of Python : -
What do the alphabet and the programming language Python have in common? Right, both
start with ABC. If we are talking about ABC in the Python context, it's clear that the
programming language ABC is meant. ABC is a general-purpose programming language
and programming environment, which had been developed in the Netherlands, Amsterdam,
at the CWI (Centrum Wiskunde & Informatica). The greatest achievement of ABC was
to influence the design of Python. Python was conceptualized in the late 1980s.
Guido van Rossum worked at that time on a project at the CWI called Amoeba, a
distributed operating system. In an interview with Bill Venners, Guido van Rossum
said: "In the early 1980s, I worked as an implementer on a team building a
language called ABC at Centrum voor Wiskunde en Informatica (CWI). I don't know
how well people know ABC's influence on Python. I try to mention ABC's influence
because I'm indebted to everything I learned during that project and to the
people who worked on it." Later in the same interview,
Guido van Rossum continued: "I remembered all my experience and some of my frustration
with ABC. I decided to try to design a simple scripting language that possessed some of
ABC's better properties, but without its problems. So I started typing. I created a simple
virtual machine, a simple parser, and a simple runtime. I made my own version of the
various ABC parts that I liked. I created a basic syntax, used indentation for statement
grouping instead of curly braces or begin-end blocks, and developed a small number of
powerful data types: a hash table (or dictionary, as we call it), a list, strings, and numbers."
Before we take a look at the details of various machine learning methods, let's start by
looking at what machine learning is, and what it isn't. Machine learning is often categorized
as a subfield of artificial intelligence, but I find that categorization can often be misleading
at first brush. The study of machine learning certainly arose from research in this context,
but in the data science application of machine learning methods, it's more helpful to think of
machine learning as a means of building models of data.
At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning.
Human beings are, at this moment, the most intelligent and advanced species on
earth, because they can think, evaluate and solve complex problems. AI, on the
other side, is still in its initial stage and hasn't surpassed human intelligence
in many aspects. The question, then, is: why do we need to make machines learn?
The most suitable reason for doing this is "to make decisions, based on data,
with efficiency and scale".
Lately, organizations are investing heavily in newer technologies like Artificial Intelligence,
Machine Learning and Deep Learning to get the key information from data to perform
several real-world tasks and solve problems. We can call it data-driven decisions taken by
machines, particularly to automate the process. These data-driven decisions can be used,
instead of using programming logic, in problems that cannot be programmed
inherently.
The fact is that we can't do without human intelligence, but the other aspect is
that we all need to solve real-world problems efficiently and at a huge scale.
That is why the need for machine learning arises.
While machine learning is rapidly evolving, making significant strides in
cybersecurity and autonomous cars, this segment of AI as a whole still has a long
way to go. The reason is that ML has not yet been able to overcome a number of
challenges. The challenges ML currently faces are −
Quality of data − Having good-quality data for ML algorithms is one of the
biggest challenges. Use of low-quality data leads to problems related to data
preprocessing and feature extraction.
No clear objective for formulating business problems − Having no clear objective and
well-defined goal for business problems is another key challenge for ML because this
technology is not that mature yet.
Curse of dimensionality − Another challenge ML model faces is too many features of data
points. This can be a real hindrance.
Machine learning is the most rapidly growing technology, and according to
researchers we are in the golden years of AI and ML. It is used to solve many
complex real-world problems that cannot be solved with the traditional approach.
The following are some real-world applications of ML −
Emotion analysis
Sentiment analysis
Speech synthesis
Speech recognition
Customer segmentation
Object recognition
Fraud detection
Fraud prevention
Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “Field of
study that gives computers the capability to learn without being explicitly
programmed”.
And that was the beginning of Machine Learning! In modern times, Machine Learning is one
of the most popular (if not the most!) career choices. According to Indeed, Machine Learning
Engineer Is The Best Job of 2019 with a 344% growth and an average base salary
of $146,085 per year.
But there is still a lot of doubt about what exactly Machine Learning is and how
to start learning it. So this article deals with the basics of Machine Learning
and the path you can follow to eventually become a full-fledged Machine Learning
Engineer. Now let's get started!
This is a rough roadmap you can follow on your way to becoming an insanely talented
Machine Learning Engineer. Of course, you can always modify the steps according to your
needs to reach your desired end-goal!
In case you are a genius, you could start ML directly but normally, there are some
prerequisites that you need to know which include Linear Algebra, Multivariate Calculus,
Statistics, and Python. And if you don’t know these, never fear! You don’t need a Ph.D.
degree in these topics to get started but you do need a basic understanding.
Both Linear Algebra and Multivariate Calculus are important in Machine Learning. However,
the extent to which you need them depends on your role as a data scientist. If you are more
focused on application heavy machine learning, then you will not be that heavily focused on
maths as there are many common libraries available. But if you want to focus on R&D in
Machine Learning, then mastery of Linear Algebra and Multivariate Calculus is very
important as you will have to implement many ML algorithms from scratch.
Data plays a huge role in Machine Learning. In fact, around 80% of your time as an ML
expert will be spent collecting and cleaning data. And statistics is a field that handles the
collection, analysis, and presentation of data. So it is no surprise that you need to learn it!!!
Some of the key concepts in statistics that are important here are statistical
significance, probability distributions, hypothesis testing and regression.
Bayesian thinking is also a very important part of ML; it deals with concepts
such as conditional probability, priors and posteriors, and maximum likelihood.
Some people prefer to skip Linear Algebra, Multivariate Calculus and Statistics and learn
them as they go along with trial and error. But the one thing that you absolutely cannot skip
is Python! While there are other languages you can use for Machine Learning like R, Scala,
etc. Python is currently the most popular language for ML. In fact, there are many Python
libraries that are specifically useful for Artificial Intelligence and Machine Learning such
as Keras, TensorFlow, Scikit-learn, etc.
So if you want to learn ML, it’s best if you learn Python! You can do that using various
online resources and courses such as Fork Python available Free on GeeksforGeeks.
Step 2 – Learn Various ML Concepts
Now that you are done with the prerequisites, you can move on to actually learning ML
(Which is the fun part!!!) It’s best to start with the basics and then move on to the more
complicated stuff. Some of the basic concepts in ML are:
Model – A model is a specific representation learned from data by applying some machine
learning algorithm. A model is also called a hypothesis.
Feature – A feature is an individual measurable property of the data. A set of numeric
features can be conveniently described by a feature vector. Feature vectors are fed as input to
the model. For example, in order to predict a fruit, there may be features like color, smell,
taste, etc.
Target (Label) – A target variable or label is the value to be predicted by our model. For the
fruit example discussed in the feature section, the label with each set of input would be the
name of the fruit like apple, orange, banana, etc.
Training – The idea is to give the model a set of inputs (features) and their
expected outputs (labels), so that after training we have a model (hypothesis)
that will map new data to one of the categories it was trained on.
Prediction – Once our model is ready, it can be fed a set of inputs to which it will provide a
predicted output(label).
Supervised Learning – This involves learning from a training dataset with labeled data using
classification and regression models. This learning process continues until the required level
of performance is achieved.
Unsupervised Learning – This involves using unlabelled data and then finding the
underlying structure in the data in order to learn more and more about the data itself using
factor and cluster analysis models.
Semi-supervised Learning – This involves using unlabelled data like Unsupervised Learning
with a small amount of labeled data. Using labeled data vastly increases the learning accuracy
and is also more cost-effective than Supervised Learning.
Reinforcement Learning – This involves learning optimal actions through trial and error. So
the next action is decided by learning behaviors that are based on the current state and that will
maximize the reward in the future.
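A toy illustration of the terms above, with an invented fruit data set: each row
is a feature vector, each label a target value, fit performs the training, and
predict produces the prediction.

```python
# A toy illustration of features, labels, training and prediction;
# the fruit data is invented for the example.
from sklearn.tree import DecisionTreeClassifier

# Each feature vector: [weight_grams, smoothness_0_to_10]
features = [[150, 8], [170, 9], [130, 3], [120, 2]]
labels = ["apple", "apple", "orange", "orange"]  # target values

# Training: the model (hypothesis) is learned from the labeled data.
model = DecisionTreeClassifier().fit(features, labels)

# Prediction: the trained model maps a new feature vector to a label.
print(model.predict([[140, 7]]))  # e.g. ['apple']
```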
Advantages of Machine learning :-
Machine Learning can review large volumes of data and discover specific trends and patterns
that would not be apparent to humans. For instance, for an e-commerce website like Amazon, it
serves to understand the browsing behaviors and purchase histories of its users to help cater to
the right products, deals, and reminders relevant to them. It uses the results to reveal relevant
advertisements to them.
With ML, you don't need to babysit your project every step of the way. Since it
means giving machines the ability to learn, it lets them make predictions and
also improve the algorithms on their own. A common example of this is anti-virus
software, which learns to filter new threats as they are recognized. ML is also
good at recognizing spam.
3. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This lets
them make better decisions. Say you need to make a weather forecast model. As the amount of
data you have keeps growing, your algorithms learn to make more accurate predictions faster.
Machine learning algorithms are good at handling data that is multi-dimensional
and multi-variety, and they can do this in dynamic or uncertain environments.
5. Wide Applications
You could be an e-tailer or a healthcare provider and make ML work for you. Where it does
apply, it holds the capability to help deliver a much more personal experience to customers
while also targeting the right customers.
Disadvantages of Machine Learning :-
1. Data Acquisition
Machine Learning requires massive data sets to train on, and these should be
inclusive/unbiased, and of good quality. There can also be times where they must wait for new
data to be generated.
2. Time and Resources
ML needs enough time to let the algorithms learn and develop enough to fulfill
their purpose with a considerable amount of accuracy and relevancy. It also needs
massive resources to function. This can mean additional requirements of computer
power for you.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the
algorithms. You must also carefully choose the algorithms for your purpose.
4. High error-susceptibility
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an
algorithm with data sets small enough to not be inclusive. You end up with biased predictions
coming from a biased training set. This leads to irrelevant advertisements being displayed to
customers. In the case of ML, such blunders can set off a chain of errors that can go undetected
for long periods of time. And when they do get noticed, it takes quite some time to recognize
the source of the issue, and even longer to correct it.
Purpose :-
Python
Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive − you can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Python also acknowledges that speed of development is important. Readable and
terse code is part of this, and so is access to powerful constructs that avoid
tedious repetition of code. Maintainability also ties into this; it may be an all
but useless metric, but it does say something about how much code you have to
scan, read and/or understand to troubleshoot problems or tweak behaviours. This
speed of development, the ease with which a programmer of other languages can
pick up basic Python skills, and the huge standard library are key to another
area where Python excels: all its tools have been quick to implement, have saved
a lot of time, and several of them have later been patched and updated by people
with no Python background - without breaking.
Tensorflow
TensorFlow was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source license on November 9, 2015.
Numpy
Pandas
Matplotlib
Scikit-learn
Scikit-learn provides a range of supervised and unsupervised learning algorithms
via a consistent interface in Python. It is licensed under a permissive
simplified BSD license and is distributed in many Linux distributions,
encouraging academic and commercial use.
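A brief sketch of that consistent interface: two different estimators are trained
and scored through identical fit/score calls, so one can be swapped for the other
without changing the surrounding code. The bundled breast cancer data set is used
only to keep the example self-contained.

```python
# A sketch of scikit-learn's uniform estimator interface; any
# classifier could replace the two shown here.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for estimator in (LogisticRegression(max_iter=5000), SVC()):
    estimator.fit(X_train, y_train)  # identical call for both models
    print(type(estimator).__name__, estimator.score(X_test, y_test))
```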
Python
Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.
There have been several updates to Python over the years. The question is how to
install Python. This might be confusing for a beginner who wants to start
learning Python, but this tutorial will resolve that query. The latest version of
Python at the time of writing is 3.7.4; in other words, it is Python 3.
Note: Python version 3.7.4 cannot be used on Windows XP or earlier devices.
Before you start with the installation process of Python, you first need to know
your system requirements. You must download the Python version that matches your
operating system and processor. My system is a Windows 64-bit operating system,
so the steps below install Python version 3.7.4 (Python 3) on a Windows device.
The steps for installing Python on Windows 10, 8 and 7 are divided into four
parts to help you understand them better.
Step 1: Go to the official site to download and install python using Google Chrome or any
other web browser. OR Click on the following link: https://www.python.org
Step 2: Check for the latest and correct version for your operating system.
Step 3: You can either select the yellow Download Python 3.7.4 for Windows
button, or scroll further down and click the download for your specific version.
Here we download the most recent Python version for Windows, 3.7.4.
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see a different version of python along with the operating system.
• To download Windows 32-bit python, you can select any one from the three options:
Windows x86 embeddable zip file, Windows x86 executable installer or Windows x86 web-
based installer.
• To download Windows 64-bit python, you can select any one from the three options:
Windows x86-64 embeddable zip file, Windows x86-64 executable installer or Windows x86-
64 web-based installer.
Here we will use the Windows x86-64 web-based installer. This completes the first
part, choosing which version of Python to download. Now we move ahead with the
second part of installing Python, i.e. the installation itself.
Note: To know the changes or updates that are made in the version you can click on the
Release Note Option.
Installation of Python
Step 1: Go to Download and Open the downloaded python version to carry out the installation
process.
Step 2: Before you click on Install Now, Make sure to put a tick on Add Python 3.7 to PATH.
Step 3: Click on Install Now. After the installation is successful, click on Close.
With these above three steps on python installation, you have successfully and correctly
installed Python. Now is the time to verify the installation.
Note: The installation process might take a couple of minutes.
Step 5: Name the file, and set 'Save as type' to Python files. Click on SAVE.
Here I have named the file Hey World.
Step 6: Now, for example, enter print
6.SYSTEM TEST
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way
to check the functionality of components, subassemblies, assemblies and/or the
finished product. It is the process of exercising software with the intent of
ensuring that the software system meets its requirements and user expectations
and does not fail in an unacceptable manner. There are various types of test, and
each test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application, and it is done after
the completion of an individual unit before integration. This is structural
testing that relies on knowledge of the unit's construction and is invasive. Unit
tests perform basic tests at the component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each
unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
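As a hedged example, a unit test for a single pre-processing unit might look like
the following; fill_missing is a hypothetical helper, not part of the actual
project code.

```python
# A sketch of a unit test for one pre-processing unit; fill_missing
# is a hypothetical helper written here for illustration.
import unittest
import pandas as pd

def fill_missing(df):
    """Fill missing values with each column's most frequent value."""
    return df.fillna(df.mode().iloc[0])

class TestPreprocessing(unittest.TestCase):
    def test_no_missing_values_remain(self):
        df = pd.DataFrame({"Gender": ["Male", None, "Female", "Male"]})
        result = fill_missing(df)
        self.assertFalse(result.isnull().values.any())

    def test_mode_is_used(self):
        df = pd.DataFrame({"Gender": ["Male", None, "Male", "Female"]})
        result = fill_missing(df)
        self.assertEqual(result.loc[1, "Gender"], "Male")

if __name__ == "__main__":
    unittest.main()
```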
Integration testing
Integration tests are designed to test integrated software components to
determine whether they actually run as one program. Testing is event driven and
is more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as shown
by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems
that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system documentation, and
user manuals.
Functional testing is centered on the following items:
System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration points.
White Box Testing
White box testing is testing in which the software tester has knowledge of the
inner workings, structure and language of the software, or at least its purpose.
It is used to test areas that cannot be reached from a black-box level.
Unit testing is usually conducted as part of a combined code and unit test phase
of the software lifecycle, although it is not uncommon for coding and unit testing to be
conducted as two distinct phases.
Field testing will be performed manually and functional tests will be written in
detail.
Test objectives
All field entries must work properly.
Pages must be activated from the identified link.
The entry screen, messages and responses must not be delayed.
Features to be tested
Verify that the entries are of the correct format
No duplicate entries should be allowed
All links should take the user to the correct page.
Integration Testing
Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by interface
defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level –
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
TEST CASES
Test cases consist of an arrangement of steps, conditions and inputs that can be
used while performing testing tasks. The main intention of this activity is to
determine whether a product passes or fails in terms of functionality and other
aspects. The process of developing test cases can also help find problems in the
requirements or design of an application. A test case acts as the starting point
for test execution, and after a set of input values has been applied, the
application reaches a definitive result and leaves the system at an end point,
also known as the execution post-condition.
7.SCREENSHOTS
1. INDEPENDENT VARIABLE
Independent Variable
In the above figure, the independent variables are the inputs to the process
being analysed. The independent variables are Gender, Marital Status,
Self-Employed, Credit History and Education.
2. BIVARIATE ANALYSIS
Bivariate Analysis
In the above figure, having looked at every variable individually in the
univariate analysis, we now explore them again with respect to the target
variable in the bivariate analysis.
3. CO-APPLICANT STATUS
In the above figure, the co-applicant income and loan amount variables are
analysed in a similar manner.
4. CONFUSION MATRIX STATUS
In the above figure, the model is evaluated with a confusion matrix. The
confusion matrix shows how many loans were predicted in each class and how many
loan approvals were predicted correctly.
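A small sketch of computing such a confusion matrix with scikit-learn; the actual
and predicted labels below are illustrative stand-ins for the project's test
data.

```python
# A minimal confusion-matrix sketch; y_test and y_pred are invented
# stand-ins for the real test labels and model predictions.
from sklearn.metrics import confusion_matrix, accuracy_score

y_test = [1, 0, 1, 1, 0, 1, 0, 1]   # actual loan status (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]   # model predictions (illustrative)

# Rows are actual classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```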
5. CREDIT HISTORY STATUS
6. DEPENDENT STATUS
Dependent Status
In the above figure, it shows whether the customer has dependents and whether
the customer pays on time or not.
7. EDUCATION STATUS
In the above figure, it shows how many customers are graduates and how many are
non-graduates.
In the above figure, it checks whether the customer's income is normally
distributed or not.
11. MARRIED STATUS
Married Status
In the above figure, it shows how many customers are married and how many are
single.
12. MATRIX STATUS
Matrix Status
In the above figure, the matrix shows the relationships among customer income,
loan amount, credit history and loan status.
13. PROPERTY AREA STATUS
In the above figure, it shows whether the customer's property area is rural,
semi-urban or urban.
14. SELF EMPLOYED STATUS
In the above figure, it shows whether the customer is self-employed or not.
15. TOTAL INCOME
In the above figure, it shows the total customer income in the test data set and
the training data set.
16. FEATURE IMPORTANCE STATUS
In the above figure, it shows the importance of each customer feature to the
model.
17. LOGISTIC REGRESSION
In the above figure, it shows the loan prediction accuracy obtained using
logistic regression.
18. DECISION TREE
In the above figure, it shows the loan prediction accuracy obtained using a
decision tree.
19. RANDOM FOREST
In the above figure, it shows the loan prediction accuracy obtained using a
random forest.
20. XGBOOST
Accuracy of XG BOOST
In the above figure, it shows the loan prediction accuracy obtained using
XGBoost.
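A hedged sketch of how such an XGBoost accuracy figure could be produced,
assuming the xgboost package is installed, the target is encoded as 0/1, and
X_train/X_test/y_train/y_test come from a train/test split like the one sketched
earlier.

```python
# A sketch of scoring XGBoost on the held-out split; assumes the
# xgboost package is installed and the target is already 0/1-encoded.
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

xgb = XGBClassifier(n_estimators=100, eval_metric="logloss")
xgb.fit(X_train, y_train)
print("XGBoost accuracy:", accuracy_score(y_test, xgb.predict(X_test)))
```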
8.CONCLUSION
The main purpose of the project is to classify and analyse the nature of the loan
applicants. From a proper analysis of the data set and the constraints of the
banking sector, seven different graphs were generated and visualized. From these
graphs many conclusions were drawn and much information was inferred, such as
that short-term loans were preferred by the majority of loan applicants and that
clients mostly apply for loans for debt consolidation. This work can be extended
to a higher level in the future: a predictive model for loans that uses machine
learning algorithms, where the result from each graph in the paper can be taken
as an individual criterion for the machine learning algorithm.
9. FUTURE ENHANCEMENTS
As a future enhancement, we can analyse the data using various other types of
algorithms. This project work can be extended to a higher level in the future: a
predictive model for loans that uses machine learning algorithms, where the
result from each graph in the paper can be taken as an individual criterion for
the machine learning algorithm. In the coming years many models can be used in
building knowledge-management platforms for customer service that improve
first-call resolution, average handling time, and customer satisfaction rates. In
finance, such models support the prediction of future outcomes and the assignment
of probabilities to those outcomes. This will definitely help open up efficient
delivery channels for the banking industry. It is important to implement other
techniques that outperform popular data mining models and to test them on this
domain.
10. REFERENCES
Cowell, R.G., Dawid, A.P., Lauritzen, S.L., and Spiegelhalter, D.J. (1999).
Probabilistic Networks and Expert Systems. Berlin: Springer. This is a good
introduction to probabilistic graphical models.
Kumar Arun, Garg Ishan, and Kaur Sanmeet (May-Jun. 2016). Loan Approval
Prediction based on Machine Learning Approach. IOSR Journal of Computer
Engineering (IOSR-JCE).
Wei Li, Shuai Ding, Yi Chen, and Shanlin Yang. Heterogeneous Ensemble for Default
Prediction of Peer-to-Peer Lending in China. Key Laboratory of Process
Optimization and Intelligent Decision Making, Ministry of Education, Hefei
University of Technology, Hefei, China.