
Batch6DocumentationFinal_1

The document presents a major project report on 'Phishing Scam Detection Using Machine Learning' by students from SRM University, focusing on the development of a machine learning algorithm to detect phishing scams. It highlights the challenges of phishing detection, the need for a reliable system, and the use of a decision tree algorithm to classify phishing attempts. The report includes sections on literature review, system design, implementation, and future enhancements, emphasizing the importance of combating phishing threats in the digital landscape.

Uploaded by

Sudipta Das
Copyright
© © All Rights Reserved

PHISHING SCAM DETECTION USING

MACHINE LEARNING
MAJOR PROJECT REPORT

Submitted by

BADRI NARAYAN.M [Reg No: RA1511003020078]


VIJAY KISHORE.V [Reg No: RA1511003020073]
KEERTHIVASAN.R [Reg No: RA1511003020088]
NARESH SOLANKI [Reg No: RA1511003020124]

Under the guidance of


Mrs. ELAKIYA.R
(Asst. Professor (O.G), Department of Computer Science & Engineering)

COMPUTER SCIENCE & ENGINEERING

FACULTY OF ENGINEERING AND TECHNOLOGY

Ramapuram, Chennai, 600089

MAY 2019
SRM UNIVERSITY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that this project report titled “PHISHING SCAM DETECTION USING
MACHINE LEARNING” is the bonafide work of “BADRI NARAYAN MOHAN
[Reg No: RA1511003020078], VIJAY KISHORE.V [Reg No: RA1511003020073],
KEERTHIVASAN.R [Reg No: RA1511003020088], NARESH SOLANKI [Reg
No: RA1511003020124]”, who carried out the project work under my supervision.
Certified further, that to the best of my knowledge the work reported herein does not
form part of any other project report or dissertation.

SIGNATURE SIGNATURE

Mrs.ELAKIYA.R, M.E. Dr. J. JAGADEESAN, M.Tech, Ph.D.

ASSISTANT PROFESSOR (O.G) PROFESSOR AND HEAD

Dept. of Computer Science and Engg. Dept. of Computer Science and Engg.

SRM Institute of Science and Technology SRM Institute of Science and Technology
Ramapuram Ramapuram
CHENNAI CHENNAI

Submitted for the project viva voce held on ……………………… at


SRM Institute of Science and Technology, Ramapuram, Chennai - 89

EXAMINER I EXAMINER II
ABSTRACT

Phishing, the crime of using technical means to steal the sensitive data of Internet
users, is currently a serious threat facing the Internet, and losses due to phishing
are growing steadily. Detecting phishing scams is a very challenging problem because
phishing is predominantly a semantics-based attack that exploits human
vulnerabilities rather than network or system vulnerabilities. Two main approaches
are widely used in software detection schemes: blacklists/whitelists and machine
learning. Every phishing technique has different parameters and a different type of
attack. Using the decision tree algorithm, we determine whether a site is legitimate
or a scam. We do this by grouping samples according to diverse parameters and
features, which helps the machine learning algorithm learn. This project is built on
the central idea of stopping scams that deceive a user into giving up personal and
sensitive details, which could be used against the user to obtain money or any other
equally valuable possession.
TABLE OF CONTENTS

Abstract iii
List of Figures iv

Chapter 1 INTRODUCTION Page No.


1.1 Overview 1
1.2 Problem Statement 2
1.3 Objective 2
1.4 Organization of the report 3

Chapter 2 LITERATURE SURVEY 4


2.1 Introduction 4
2.2 Existing System 4
2.2.1 A Switch to email 5
2.2.2 Spear phishing 5
2.2.3 Cybersquatting 6
2.2.4 Typosquatting 6
2.2.5 PageRank Based Detection Technique 7
2.2.6 Intelligent Anti-phishing Strategy Model 7
2.2.7 Antiphishing through Phishing Target Discovery 7

2.2.8 Software-Defined Network Function Virtualization 8

2.2.9 The Fog Computing Paradigm: Scenarios and Security Issues 8


2.3 Issues in Existing System 8
2.4 Summary of Literature Survey 9
2.5 Proposed System 9

Chapter 3 SPECIFICATIONS 10
3.1 Introduction 10
3.1.1 Purpose of Content 10

3.2 OVERALL DESCRIPTION 11


3.2.1 Project features 12
3.2.2 Operating environment 13

3.2.2.1 Python 13

3.2.3.2 Liclipse 14

3.2.3.3 Structured Query Language (SQL) 15

3.2.3.4 SQL standard and proprietary extensions 16

3.2.3.5 SQL commands and syntax 17

3.2.3 Design and Implementation Constraints 18

3.2.4 Preliminary Assumptions and Dependencies 18

Chapter 4 SYSTEM DESIGN 19

3.1 Introduction 19
3.2 System Architecture 20
4.2.1 Description 20
3.3 System Requirement 21
3.4 Summary 21

Chapter 5 MODULE DESCRIPTION 22


4.1 Introduction 23
4.2 Primary Domain 23
4.3 Sub Domain 23
4.4 Page Rank 24
4.5 Alexa Reputation 24
4.6 Google Index 25
4.7 Summary 25
Chapter 6 SYSTEM IMPLEMENTATION 26
5.1 Introduction 26
6.2 Implementation Details 26
6.2.1 Sample coding 26
6.2.2 Screen Shots 33
6.3 Overview of the Platform 34
6.4 Performance and Experiment Analysis 36
6.5 Summary 38

Chapter 7 CONCLUSION AND FUTURE ENHANCEMENT 39

CONFERENCE AND PUBLICATION DETAILS 40

REFERENCES 42
CHAPTER 1

INTRODUCTION
Phishing scams are rampant and widespread across the world. The internet has
become a crucial and indispensable infrastructure for human society; it has helped
both individuals and corporations over the years and has provided a platform for
worldwide connectivity. However, it also has its fair share of drawbacks, especially
when it comes to security. One of those security threats is phishing. Phishing is a
technique that combines technical tactics and social engineering to lure gullible
people into leaking personal and valuable data. Phishers have multiple methods at
their disposal to steal sensitive information. One such method is to create replicas
of real websites, so that users are led to fraudulent sites where they unsuspectingly
enter credentials such as ATM card numbers, PINs and other important data. Phishers
also send spoofed e-mails that pretend to come from legitimate corporations. These
trick recipients into trusting the contents, which slyly ask for information such as
usernames, user IDs and passwords for accounts commonly held on social media and
e-commerce websites, among others. Such e-mails also lure people into phoney
schemes. The main reason internet users fall for these methods is that phishers
abuse everything from logos and slogans to trademarks and other corporate
identifiers, which makes the fake websites and e-mails dangerously similar to their
original and legitimate counterparts. In the United States alone, scams and thefts
that happen over the internet have cost 71 billion dollars in damage. Hence,
phishing continues to be one of the fastest-growing identity theft scams on the
internet.

1.1 OVERVIEW
Blacklisting is commonly used by web browsers to warn users about potentially
dangerous web pages that appear in their blacklists. However, such lists do not
include previously unseen URLs, since it is non-trivial to decide whether an unseen
URL is malicious. Hence, phishing detection faces challenges such as real-time
detection, which is not possible with blacklisting because it is impossible to
maintain an exhaustive list of phishing websites. Generalization ability is another
challenge for phishing detection, as attackers constantly revise their techniques
and build resilient infrastructure to support ongoing phishing activity. One of the
important building blocks of such an infrastructure is the botnet, which is used to
generate automated phishing emails and also to host phishing sites. A recent study
by the APWG (Anti-Phishing Working Group) suggests that attackers may be using more
sophisticated schemes and infrastructures to exploit the ever-expanding number of
popular brands. To summarize, there is an urgent need for a reliable phishing
detection system that can deliver near-perfect accuracy in an internet environment
where the number of attackers and the volume of phishing activity continue to grow.

1.2 PROBLEM STATEMENT


Phishing has grown to a level where people are tricked very easily, and there is a
pressing need to stop it. Serious types of scams have been emerging in the present
era of economic and technological revolution. The problem addressed here is to stop
these scams by bringing forth methods and techniques that render such attacks
ineffective.
If phishers are steadily blocked from their usual activities and methods, and the
alternatives they turn to are blocked as well, they will eventually become frustrated.
In this way we can gradually eradicate this problem and move on to stopping other
harmful activities in our daily lives.

1.3 OBJECTIVE
The main objective is to develop a suitable machine learning algorithm that helps in
analyzing and blocking phishing tools and software. With it, users can be aware of
their environment and can take part in fighting the immoral activities happening over
the internet. Here we use the decision tree algorithm and train the computer with
data sets to distinguish whether a website is a phishing website or not. The
algorithm constructs a decision tree based on two measures, information gain and
purity, which help in separating the samples. The main things to look out for are the
tree that is constructed and how big it is, since the tree is what distinguishes a
scam from a real site.

1.4 ORGANIZATION OF THE REPORT


The project mainly focuses on implementation, capturing, extracting and
processing. There are seven chapters that deal with the various design and
implementation details.

Chapter 1
The chapter deals with the overview of the project, the objective of the project and
the problem the project addresses.
Chapter 2
The chapter covers the features of the existing system and the proposed system.
The issues in the existing system are discussed.
Chapter 3
The chapter covers the purpose of the project, a description of the software used and
the features of the platform.
Chapter 4
The chapter deals with the proposed system architecture and the flow of the process.
The design of the entire project is presented.
Chapter 5
The chapter covers the various modules involved in the project and the architecture
of the entire system. The working of each module is explained with a description.
Chapter 6
The chapter deals with the system implementation, including details about the
platform used, the implementation source code and screenshots of the output
produced.
Chapter 7
The chapter deals with the conclusion and the future enhancement of the project.

CHAPTER 2

LITERATURE SURVEY

2.1 INTRODUCTION
The idea of phishing has been around for over two decades and can be traced back
to the 1990s on AOL (America Online). A group of hackers banded together under the
name ‘warez community’ and can be considered the first “phishers”. During the early
stages of phishing, a generator was built to produce random credit card numbers,
which were later used to create counterfeit accounts on AOL. When they managed to
match a genuine card, they created accounts and spammed others in AOL's community,
where some users took the bait. Around 1995, AOL managed to stop such random credit
card generators, but the warez community moved on to other techniques and started
posing as AOL representatives, asking users through AOL Messenger for private data.

2.2 EXISTING SYSTEM


2.2.1 A Switch to email
Internet users became more aware of such malicious activity over time, and this
forced phishers to move on to email. At the time, phishing emails were difficult to
craft convincingly; they were cheap to send but rarely good enough to fool
recipients, as they were poorly constructed and loaded with grammatical mistakes.
This forced phishers to refine their techniques, and they quickly became more
sophisticated. In late 2003, phishers started imitating mainstream organizations
with domains such as yahoo-billing.com and ebay-fulfillment.com. This time they
used refined techniques, such as increasingly genuine-looking mails, which easily
lured gullible people into believing they were genuine. In October 2003, PayPal
customers were hit by the Mimail virus, which used pop-up windows resembling
PayPal's to make users give away their usernames and passwords, which immediately
went to the attackers. In today’s climate phishers have vastly changed their
strategies; such attackers are becoming increasingly difficult to trace and have
developed many approaches to easily gain the trust of individuals. Email phishing is
a numbers game. An attacker sending out a large number of fraudulent messages can
net significant information and sums of money, even if only a small percentage of
recipients fall for the scam. As seen above, there are several techniques attackers
use to increase their success rates.
For one, they go to great lengths in designing phishing messages to mimic real
messages from the spoofed organization. Using the same phrasing, typefaces, logos
and signatures makes the messages appear legitimate. In addition, attackers usually
try to push users into action by creating a sense of urgency. For example, an email
could threaten account expiration and put the recipient on a timer. Applying such
pressure makes the user less diligent and more prone to error. Lastly, links inside
messages resemble their legitimate counterparts, but typically have a misspelled
domain name or extra subdomains. In the example URLs, avitahr.in/careers was
changed to avitahr.inrenewal.com. Similarities between the two addresses give the
impression of a secure link, making the recipient less aware that an attack is
taking place.
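
The lookalike-link trick described above can be illustrated with a minimal sketch. It is not part of the project's implementation; the function names `hostname_of` and `looks_like_spoof` are hypothetical, and the rule (the trusted name appears in the hostname, but the hostname is neither the trusted domain nor one of its subdomains) is only one simple heuristic.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    return (urlsplit(url).hostname or "").lower()


def looks_like_spoof(url, trusted_domain):
    """Flag hostnames that contain the trusted name without belonging to it.

    Assumption for illustration: a link is suspicious when the trusted
    domain appears inside the hostname, yet the hostname is neither the
    trusted domain itself nor one of its subdomains.
    """
    host = hostname_of(url)
    trusted = trusted_domain.lower()
    if host == trusted or host.endswith("." + trusted):
        return False  # genuine domain or a genuine subdomain
    return trusted in host


print(looks_like_spoof("avitahr.in/careers", "avitahr.in"))
print(looks_like_spoof("avitahr.inrenewal.com", "avitahr.in"))
```

With the two URLs from the text, the first call reports the genuine address as not suspicious and the second flags the altered one.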

2.2.2 Spear phishing


Spear phishing targets a specific individual or enterprise.
An attack may play out as follows:
A perpetrator researches the names of employees within an organization's marketing
department and gains access to the latest project invoices. Posing as the marketing
director, the attacker emails a departmental project manager (PM) using a subject
line that reads, Updated invoice for Q3 campaigns. The text, style and included logo
duplicate the organization's standard email template. A link in the email redirects
to a password-protected internal document, which is in fact a spoofed version of a
stolen invoice. The PM is asked to log in to view the document. The attacker steals
his credentials, gaining full access to sensitive areas within the organization's
network. By providing an attacker with valid login credentials, spear phishing is an
effective method for executing the first stage of an advanced persistent threat
(APT).

2.2.3 Cybersquatting
Cybersquatting is registering, trafficking in, or using a domain name with the
bad-faith intent to profit from the goodwill of a trademark belonging to someone
else. The cybersquatter may offer to sell the domain, at an inflated price, to the
person or company who owns a trademark contained within the name, or may use it for
fraudulent purposes such as phishing. For instance, suppose the name of your company
is "Avita HR solutions" and you register avitahrsolutions.com. Phishers can then
register avitahrsolutions.net, avitahrsolutions.org or avitahrsolutions.biz and use
them for fraudulent purposes.

2.2.4 Typosquatting
Typosquatting, also called URL hijacking, is a type of cybersquatting that relies on
mistakes, such as typographical errors made by Internet users when entering a
website address into a web browser, or on typographical errors that are hard to
notice at a quick glance. URLs created through typosquatting look like a trusted
URL. A user may accidentally enter a wrong website address or click a link that
looks like a trusted domain, and in this way may visit an alternative website owned
by a phisher.
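
One common way to catch near-miss spellings like these is to measure the edit distance between a candidate domain and a trusted one. The sketch below is illustrative only: the helper names, the threshold of 2 and the misspelled domain are assumptions, not values taken from this project.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def possible_typosquat(candidate, trusted, max_distance=2):
    """True when candidate differs from trusted by only a few characters.

    max_distance=2 is an assumed threshold chosen for illustration.
    """
    candidate, trusted = candidate.lower(), trusted.lower()
    return candidate != trusted and edit_distance(candidate, trusted) <= max_distance


# "avitahrsolutons.com" is a hypothetical misspelling (one letter dropped).
print(possible_typosquat("avitahrsolutons.com", "avitahrsolutions.com"))
```

A real deployment would compare candidates against a list of trusted domains and would need a threshold tuned to avoid flagging unrelated short names.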

2.2.5 A PageRank Based Detection Technique for Phishing Web Sites

Phishing is an attempt to acquire a user's information without the user's knowledge,
by tricking the user with a website or email that looks like a legitimate one.
Phishing is a social cyber threat that causes severe economic loss to users; as a
result of phishing attacks, the number of online transaction users is declining.
This paper aims to design and implement a new technique to detect phishing web sites
using Google’s PageRank. Google gives a PageRank value to each site on the web. The
work uses the PageRank value and other features to distinguish phishing sites from
normal sites. The authors collected a dataset of 100 phishing sites and 100
legitimate sites. Using this Google PageRank technique, 98% of the sites are
correctly classified, with a false positive rate of only 0.02 and a false negative
rate of 0.02.
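
To make the reported figures concrete, the sketch below shows how accuracy, false positive rate and false negative rate follow from confusion-matrix counts. The counts used here (98 of 100 phishing sites and 98 of 100 legitimate sites classified correctly) are an assumption chosen to be consistent with the reported rates; the paper's actual counts are not given in this report.

```python
def rates(tp, fn, tn, fp):
    """Compute accuracy, false positive rate and false negative rate.

    tp/fn count phishing sites (caught / missed);
    tn/fp count legitimate sites (passed / wrongly flagged).
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Assumed counts for a 100 + 100 site dataset, consistent with the text.
result = rates(tp=98, fn=2, tn=98, fp=2)
print(result)
```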

2.2.6 An Intelligent Anti-phishing Strategy Model for Phishing Website
Detection

As a new form of malicious software, phishing websites have appeared frequently in
recent years, causing great harm to online financial services and data security.
This paper designs and implements an intelligent model for detecting phishing
websites. In this model, 10 different types of features, such as title, keyword and
link text information, are extracted to represent the website. Heterogeneous
classifiers are then built on these different features. A principled ensemble
classification algorithm is proposed to combine the predicted results from the
different phishing detection classifiers. A hierarchical clustering technique is
employed for automatic phishing categorization. Case studies on large, real daily
collections of phishing websites from Kingsoft Internet Security Lab demonstrate
that the proposed model outperforms other commonly used anti-phishing methods and
tools in phishing website detection.
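
The idea of combining several classifiers' predictions can be illustrated with simple majority voting. This is a stand-in technique, not the paper's ensemble algorithm, whose details are not described here; the function name and the tie-breaking rule are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier labels into one label by majority vote.

    Each entry is "phishing" or "legitimate". Ties are resolved as
    "phishing", a cautious choice assumed for this illustration.
    """
    counts = Counter(predictions)
    if counts["phishing"] >= counts["legitimate"]:
        return "phishing"
    return "legitimate"


# Hypothetical outputs of classifiers built on title, keyword and link text.
print(majority_vote(["phishing", "legitimate", "phishing"]))
```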

2.2.7 Antiphishing through Phishing Target Discovery

Phishing attacks are growing in both volume and sophistication. The antiphishing
method described here collects webpages with either a direct or indirect
association with a given suspicious webpage. This enables the discovery of a
webpage’s so-called “parasitic” community and then ultimately its phishing target,
that is, the page with the strongest parasitic relationship to the suspicious webpage.
Finding this target lets users determine whether the given webpage is a phishing page.

2.2.8 Software-Defined Network Function Virtualization: A Survey

Diverse proprietary network appliances increase both the capital and operational
expense of service providers, meanwhile causing problems of network ossification.
Network function virtualization (NFV) is proposed to address these issues by
implementing network functions as pure software on commodity and general hardware.
NFV allows flexible provisioning, deployment, and centralized management of virtual
network functions. Integrated with SDN, the software-defined NFV architecture
further offers agile steering and joint optimization of network functions and
resources. This architecture benefits a wide range of applications (e.g., service
chaining) and is becoming the dominant form of NFV. In this survey, we present a
thorough investigation of the development of NFV under the software-defined NFV
architecture, with an emphasis on service chaining as its application. We first
introduce the software-defined NFV architecture as the state of the art of NFV and
present relationships between NFV and SDN. Then, we provide a historic view of the
involvement from middlebox to NFV. Finally, we introduce significant challenges and
relevant solutions of NFV, and discuss its future research directions by different
application domains.

2.2.9 The Fog Computing Paradigm: Scenarios and Security Issues

Fog Computing is a paradigm that extends Cloud computing and services to the edge
of the network. Similar to Cloud, Fog provides data, compute, storage, and
application services to end-users. In this article, we elaborate the motivation and
advantages of Fog computing, and analyse its applications in a series of real
scenarios, such as Smart Grid, smart traffic lights in vehicular networks and software
defined networks. We discuss the state-of-the-art of Fog computing and similar work
under the same umbrella. Security and privacy issues are further disclosed according
to current Fog computing paradigm. As an example, we study a typical attack, man-
in-the-middle attack, for the discussion of security in Fog computing. We investigate
the stealthy features of this attack by examining its CPU and memory consumption on
Fog device.

2.3 ISSUES IN EXISTING SYSTEM


Phishing sites are also growing rapidly in quantity and complexity. It is hard for
users to tell whether an incoming link is legitimate or not. Many phishing sites are
hosted on popular platforms, such as blogs or Google Sites, where ranking features
are not useful for phishing identification. New URLs have low ranking values, similar
to those of phishing URLs.

2.4 SUMMARY OF LITERATURE SURVEY
Various detection methods have been proposed on the basis of attack type, attacking
notion, motive, region, vulnerability and so on. These kinds of attacks are highly
diverse, and it takes time to assemble datasets for them. This can be addressed
through proper classification and an understanding of how a particular platform
actually operates. Among the surveyed work, the intelligent anti-phishing strategy
model of Section 2.2.6 is representative: it extracts 10 different types of features,
such as title, keyword and link text information, to represent a website, builds
heterogeneous classifiers on these features, combines their predictions with a
principled ensemble classification algorithm, and uses hierarchical clustering for
automatic phishing categorization. Case studies on large, real daily collections of
phishing websites from Kingsoft Internet Security Lab show that this model
outperforms other commonly used anti-phishing methods and tools in phishing website
detection.

2.5 PROPOSED SYSTEM


Detecting phishing domains is a classification problem, which means we need labeled
datasets containing samples of phishing domains and legitimate domains for the
training phase. The dataset used in the training phase is a crucial factor in
building a successful detection mechanism. We have to use samples whose classes are
known precisely: samples labeled as phishing must be definitively identified as
phishing, and samples labeled as legitimate must be definitively identified as
legitimate. Otherwise, the system will not work correctly, because it would be
trained on samples whose classes we are not sure about. For classification, we use a
machine learning algorithm called the decision tree algorithm. Based on the raw data
acquired from reputed sources such as Alexa, PhishTank and other data resources, we
create datasets consisting of features that are checked one by one in the decision
tree.

We use uniform resource locator (URL) features and web traffic features to detect
phishing sites based on a designed neuro-fuzzy framework. Building on the new
approach of fog computing, as promoted by Cisco, we design an anti-phishing model to
transparently monitor and protect fog users from phishing attacks.

The experimental results of our proposed approach, based on a large-scale dataset
collected from real phishing cases, have shown that our system can effectively
prevent phishing attacks and improve the security of the network.

A fog-based anti-phishing service can automatically detect phishing URLs. Moreover,
we can flexibly embed AI techniques to improve performance, since a fog node has
powerful computing resources. In particular, fog nodes can be deployed using
underlying network function virtualization technologies, in which anti-phishing
tools can be run as a virtual machine and share resources with other functions of
the fog nodes, such as routers or gateways. Thus, as a baseline deployment, our goal
is to deploy an anti-phishing service at the edge of the network.

CHAPTER 3
SPECIFICATIONS

3.1 INTRODUCTION
The internet has become a crucial and indispensable infrastructure for human
society; it has helped both individuals and corporations over the years and has
provided a platform for worldwide connectivity. However, it also has its fair share
of drawbacks, especially when it comes to security. One of those security threats is
phishing. Phishing is a technique that combines technical tactics and social
engineering to lure gullible people into leaking personal and valuable data.
Phishers have multiple methods at their disposal to steal sensitive information. One
such method is to create replicas of real websites, so that users are led to
fraudulent sites where they unsuspectingly enter credentials such as ATM card
numbers, PINs and other important data. Phishers also send spoofed e-mails that
pretend to come from legitimate corporations. These trick recipients into trusting
the contents, which slyly ask for information such as usernames, user IDs and
passwords for accounts commonly held on social media and e-commerce websites, among
others. Such e-mails also lure people into phoney schemes. The main reason internet
users fall for these methods is that phishers abuse everything from logos and
slogans to trademarks and other corporate identifiers, which makes the fake websites
and e-mails dangerously similar to their original and legitimate counterparts. In
the United States alone, scams and thefts that happen over the internet have cost 71
billion dollars in damage. Hence, phishing continues to be one of the
fastest-growing identity theft scams on the internet.

3.1.1 PURPOSE OF PROJECT


In the era of advanced technologies, where computers, laptops and other
processor-based devices are an integral part of day-to-day life, effort is required
to make them more secure and safe. Our aim is to design a tool that notifies people
whether a platform, website or email is a phishing attempt. A machine learning
scheme based on the decision tree algorithm is proposed. This report presents
formulas and algorithms that are not very complex to understand.

3.2 OVERALL DESCRIPTION


3.2.1 Project features
Common phishing detection techniques such as blacklisting and content filtering have
multiple drawbacks and are not effective enough against the phishing techniques
attackers use today. Filtering techniques such as Bayesian content filtering can be
bypassed through “Bayesian poisoning”, which circumvents the filtering process, while
blacklisting results in multiple false positives. It is therefore crucial to bring
machine learning into the picture to optimize phishing detection, as it performs much
better than exhaustive methods of detection.

One such algorithm is the decision tree algorithm, which can be thought of as an
enriched nested if-else structure. The algorithm checks features one by one to
determine whether a given URL is legitimate or not. URLs are passed through a tree
that contains nodes and leaves. The nodes are drawn as ellipses and represent the
features, while the leaves are drawn as rectangles and represent the classes. Each
sample is fed into the tree and tested against the feature at each node until it
reaches a leaf, which gives the class to which it belongs (phishing or legitimate).
Once the sample has gone through this journey, it is clear whether it is phishing
or legitimate.
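
The "nested if-else" view of a decision tree can be made concrete with a small hand-written example. This is only a sketch: the feature names (`google_indexed`, `subdomain_count`, `page_rank`) and the thresholds are hypothetical, and a trained tree would choose its own features and split points from the data.

```python
def classify_url(features):
    """Walk a tiny hand-written tree: each `if` is a node, each `return` a leaf."""
    if features["google_indexed"]:             # node: is the page indexed?
        if features["subdomain_count"] <= 2:   # node: few subdomains?
            return "legitimate"                # leaf
        else:
            if features["page_rank"] >= 0.5:   # node: well-ranked page?
                return "legitimate"            # leaf
            else:
                return "phishing"              # leaf
    else:
        return "phishing"                      # leaf


sample = {"google_indexed": True, "subdomain_count": 4, "page_rank": 0.1}
print(classify_url(sample))
```

The sample above passes the first check, fails the subdomain check and then fails the rank check, so it ends at a phishing leaf.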

The important thing, then, is to choose the features intelligently. To achieve
this, the algorithm uses two equations that provide values for two parameters: gain
score and entropy. This method is called the information gain method.

Gain score is found using this equation:

Fig 3.1: Gain Score Formula

The higher the gain score, the greater the distinguishing ability of the feature.
Thus the objective is to rank the features in the tree in decreasing order of gain
score. The entropy value used in the gain equation has its own equation. It is a
statistical measure of the purity of the samples used.

Fig 3.2: Entropy Formula
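
The formula figures are not reproduced in this text. For reference, the standard information gain definitions, which the description here appears to follow, are given below. The symbols are assumptions introduced for illustration: S is the set of samples, p_i is the fraction of samples in class i (with c = 2 classes, phishing and legitimate), A is a feature, and S_v is the subset of S in which A takes the value v.

```latex
\begin{align}
  H(S) &= -\sum_{i=1}^{c} p_i \log_2 p_i \\
  \mathrm{Gain}(S, A) &= H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
\end{align}
```

Under this reading, H(S) plays the role of the entropy of the whole sample set, and the weighted sum is the entropy remaining after splitting on feature A.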

There are two entropy values: Original Entropy and Relative Entropy. Original
Entropy is constant, while Relative Entropy changes with the feature being
evaluated. The lower the Relative Entropy, the purer the samples. To obtain the
hierarchy of features, we run multiple samples through the tree to determine the gain
score of each feature using the two equations, and rank the features in decreasing
order of gain score. To grow the tree, leaves are converted into nodes as extra
features are added. Once the leaves reach high purity, the tree is big enough and
the training process can be finished.
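
The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration using the standard entropy and information gain definitions, not the project's code; the toy samples, the feature names and the reading of "Original Entropy" and "Relative Entropy" as the entropy before and after a split are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_score(samples, labels, feature):
    """Entropy of all labels minus the weighted entropy after splitting."""
    original = entropy(labels)  # assumed to match "Original Entropy"
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    relative = sum(len(g) / len(labels) * entropy(g)  # assumed "Relative Entropy"
                   for g in groups.values())
    return original - relative


# Toy data with hypothetical features, for illustration only.
samples = [
    {"google_indexed": True, "has_many_subdomains": False},
    {"google_indexed": True, "has_many_subdomains": True},
    {"google_indexed": False, "has_many_subdomains": True},
    {"google_indexed": False, "has_many_subdomains": False},
]
labels = ["legitimate", "legitimate", "phishing", "phishing"]

# Rank features in decreasing order of gain score.
ranked = sorted(samples[0], key=lambda f: gain_score(samples, labels, f),
                reverse=True)
print(ranked)
```

In this toy data, `google_indexed` separates the two classes perfectly and is ranked first, while `has_many_subdomains` tells us nothing and has a gain of zero.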

The effectiveness of this algorithm depends on the variety of the samples used and
the sources from which the samples are extracted. To make detection generalize, we
need to ensure the samples are drawn from multiple data sources; otherwise the model
will only work well on the existing dataset and not on real-world data.

3.2.2 Operating environment

Python is an interpreted, high-level, general-purpose programming language. Created
by Guido van Rossum and first released in 1991, Python has a design philosophy that
emphasizes code readability, notably using significant whitespace. It provides
constructs that enable clear programming on both small and large scales. Van Rossum
led the language community until stepping down as leader in July 2018.

Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional
and procedural. It also has a comprehensive standard library.

1) Easy

When we say the word 'easy', we mean it in several contexts.

2) Easy to Code

Python is very easy to code. Compared to other popular languages like Java and C++,
it is easier to code in Python. Anyone can learn Python syntax in just a few hours,
although mastering Python requires learning its advanced concepts, packages and
modules, and that takes time. Python is therefore programmer-friendly.

3) Easy to Read

First, let us consider expressiveness. Suppose we have two languages A and B, and all
programs that can be written in A can be written in B using local transformations.
However, there are some programs that can be written in B, but not in A, using local
transformations. Then B is said to be more expressive than A. Python provides a
myriad of constructs that help us focus on the solution rather than on the syntax.
This is one of the outstanding Python features that explain why one should learn
Python.

4) Free and Open-Source

Firstly, Python is freely available. It can be downloaded from the following link.

https://www.python.org/downloads/

Instructions for downloading and installing Python are available on the same site.
Secondly, it is open-source. This means its source code is available to the public.
You can download it, change it, use it, and distribute it. This is called
FLOSS (Free/Libre and Open Source Software). The Python community is headed toward
one goal: an ever-improving Python.

5) High-Level

Python is a high-level language. It is programmer-friendly, and there is no need to
worry about low-level details such as memory management.

3.2.2.1 Python:

Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined with
dynamic typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability
and therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased
productivity it provides. Since there is no compilation step, the edit-test-debug cycle
is incredibly fast. Debugging Python programs is easy: a bug or bad input will never
cause a segmentation fault. Instead, when the interpreter discovers an error, it raises
an exception. When the program doesn't catch the exception, the interpreter prints a
stack trace. A source level debugger allows inspection of local and global variables,
evaluation of arbitrary expressions, setting breakpoints, stepping through the code a
line at a time, and so on. The debugger is written in Python itself, testifying to
Python's introspective power. On the other hand, often the quickest way to debug a
program is to add a few print statements to the source: the fast edit-test-debug cycle
makes this simple approach very effective.
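
The exception behaviour described above can be seen in a short sketch. The function name `parse_port` and the input string are hypothetical; `int()` raising `ValueError` on non-numeric text and `traceback.format_exc()` are standard Python behaviour.

```python
import traceback


def parse_port(text):
    """Convert text to a port number; int() raises ValueError on bad input."""
    return int(text)


try:
    parse_port("eighty")
except ValueError as error:
    # A caught exception can be handled and the program keeps running.
    message = f"handled: {error}"
    # The same stack trace the interpreter would print for an uncaught error.
    trace = traceback.format_exc()

print(message)
print(trace)
```

If the `try`/`except` were removed, the interpreter would stop and print the stack trace itself.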

3.2.2.2 LiClipse:

• Lightweight editors, theming and usability improvements for Eclipse.
• LiClipse provides a new experience for Eclipse users.
• With it, users get out of the box:
    • A fast editor supporting many languages
    • Support for TextMate Bundles
    • A simple way to add support for a new language
    • Usability improvements for all Eclipse editors featuring:
        • Multiple cursors
        • Vertical indent guides
        • Themed scrollbars
        • Improved text search capabilities (with Lucene index-based searching,
          support for external folders, open editors and additional filtering on
          the results page)
    • HTML preview for the RST, Markdown and HTML editors
    • Native installers
    • Improved theming support based on Eclipse 4 improvements
• Release Highlights for LiClipse 5.1.3:
    • Updated PyDev to 7.0.3.
    • Debugger performance improvements (on Python 3.6 onwards).
    • Mypy can be used for doing code analysis.
    • Black can be used as the code formatting engine.
    • It's now possible to use pipenv for managing virtual environments.
    • It's possible to manage virtual environments from the editor (Ctrl+2,
      pip/conda/pipenv).
    • Updated EGit.

3.2.2.3 Structured Query Language (SQL):

SQL (Structured Query Language) is a standardized programming language
that's used to manage relational databases and perform various operations on the data
in them. Initially created in the 1970s, SQL is regularly used not only by database
administrators, but also by developers writing data integration scripts and data
analysts looking to set up and run analytical queries.

The uses of SQL include modifying database table and index structures;
adding, updating and deleting rows of data; and retrieving subsets of information from
within a database for transaction processing and analytics applications. Queries and
other SQL operations take the form of commands written as statements. Commonly
used SQL statements include select, insert, update, delete, create, alter and
truncate.

SQL became the de facto standard programming language for relational
databases after they emerged in the late 1970s and early 1980s. Also known as SQL
databases, relational systems comprise a set of tables containing data in rows and
columns. Each column in a table corresponds to a category of data (for example,
customer name or address), while each row contains a data value for the intersecting
column.

3.2.2.4 SQL standard and proprietary extensions

An official SQL standard was adopted by the American National Standards
Institute (ANSI) in 1986 and then by the International Organization for
Standardization, known as ISO, in 1987. More than a half-dozen joint updates to the
standard have been released by the two standards development bodies since then,
including SQL:2011 and SQL:2016.

Both proprietary and open source relational database management
systems built around SQL are available for use by organizations. They
include Microsoft SQL Server, Oracle Database, IBM DB2, SAP HANA, SAP
Adaptive Server, MySQL (now owned by Oracle) and PostgreSQL. However, many
of these database products support SQL with proprietary extensions to the standard
language for procedural programming and other functions. For example, Microsoft
offers a set of extensions called Transact-SQL (T-SQL), while Oracle's extended
version of the standard is PL/SQL. As a result, the different variants of SQL offered
by vendors aren't fully compatible with one another.

3.2.2.5 SQL commands and syntax:

SQL commands are divided into several different types, among them data
manipulation language (DML) and data definition language (DDL) statements,
transaction controls and security measures. The DML vocabulary is used to retrieve
and manipulate data, while DDL statements are for defining and modifying database
structures. The transaction controls help manage transaction processing, ensuring that
transactions are either completed or rolled back if errors or problems occur. The
security statements are used to control database access as well as to create user roles
and permissions.
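
The statement types can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL back end, and the table name `urls`, its columns and the example rows are hypothetical. SQLite has no user accounts, so security statements such as GRANT are not shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the report's back end is MySQL.
conn = sqlite3.connect(":memory:")

# DDL: define the table structure (table and column names are hypothetical).
conn.execute("CREATE TABLE urls (url TEXT PRIMARY KEY, label TEXT NOT NULL)")

# DML inside a transaction: the `with` block commits on success
# and rolls back if an exception is raised.
with conn:
    conn.execute("INSERT INTO urls VALUES (?, ?)", ("http://example.com", "legitimate"))
    conn.execute("INSERT INTO urls VALUES (?, ?)", ("http://examp1e.com/login", "phishing"))

try:
    with conn:
        conn.execute("UPDATE urls SET label = 'unknown'")
        # Violates the PRIMARY KEY, so the whole transaction is rolled back.
        conn.execute("INSERT INTO urls VALUES (?, ?)", ("http://example.com", "phishing"))
except sqlite3.IntegrityError:
    pass

# DML: retrieve the rows; the failed UPDATE left the labels untouched.
rows = conn.execute("SELECT url, label FROM urls ORDER BY url").fetchall()
print(rows)
```

The second transaction fails part-way, so its UPDATE is undone along with the rejected INSERT, which is the "completed or rolled back" behaviour described above.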

3.2.3 Design and Implementation Constraints:

Machine Learning is highly effective and much better than traditional phishing
detection methods such as blacklisting and filtering, but it comes with its own set of
challenges.

1. Error diagnosis and correction: Machine Learning is effective but can
sometimes fail to diagnose erroneous feature classification. This can
lead to more false negatives, as phishing URLs can bypass feature checking and
be placed in the legitimate class of URLs.
2. Time constraints in learning: Unlike approaches that are fed features directly
or search exhaustively to figure out the class of URLs, Machine Learning learns
from historical data. This means that it cannot make reliable decisions
immediately from a small existing data set; it takes time to develop an
architecture big enough to perform well. Delivering optimal performance is
therefore time consuming.
3. Problems with verification: Even though features are classified in a
systematic manner and trees are redeveloped from time to time to offer better
phishing detection, it is hard to ensure each and every feature is completely
accurate, because machine learning deals with statistical truths instead of literal
truths. This means that even a Machine Learning architecture could have
inaccurate feature classification and hierarchical placement of features, which
could result in inaccurate detection of phishing URLs.
4. Limitations of predictions: A Machine Learning system cannot always provide
rational reasoning for every decision it makes. Such systems are also limited to
solving the problems they are given instead of raising potential problems.
In addition, a Machine Learning system can develop its own set of hidden or
unintentional biases, which affect the ability of the architecture to form trees
with the highest accuracy of phishing detection.
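
The error types mentioned in this list can be made concrete by counting them. The sketch below treats phishing as the positive class, so a phishing URL classified as legitimate is a false negative and a legitimate URL flagged as phishing is a false positive. The function name and the five example labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="phishing"):
    """Count TP, FP, FN, TN, treating `positive` as the class to detect."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical labels, for illustration only.
actual = ["phishing", "phishing", "legitimate", "legitimate", "phishing"]
predicted = ["phishing", "legitimate", "legitimate", "phishing", "phishing"]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts, accuracy)
```

Tracking the four counts separately, rather than accuracy alone, shows whether a model errs by missing phishing URLs or by blocking legitimate ones.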

3.2.4 Preliminary Assumptions and Dependencies:


Machine Learning is an efficient way to tackle phishing and detect
phishing environments. Within Machine Learning, multiple approaches are available
for phishing detection. One such approach is the decision tree algorithm. This
algorithm is chosen because it is expected to perform better than the other
approaches: it checks the features comprehensively to classify potential phishing
samples into the phishing class or the legitimate class.

The Decision Tree Algorithm can be considered an enriched nested if-else
structure. It checks the features one by one to determine whether a given
URL is legitimate. URLs are passed through a tree made of nodes and
leaves. The nodes, drawn as ellipses, represent the features, while the leaves, drawn
as rectangles, represent the classes. Each sample is fed into the tree and tested
against the feature at every node it reaches, which determines the class (phishing
or legitimate) to which it belongs. Once the sample reaches a leaf, it is clear
whether it is phishing or legitimate.
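
The nested if-else view can be written out directly. The sketch below is a hand-written tree for illustration only: the feature names, their order and the PageRank threshold of 3 are assumptions, whereas in the proposed system the hierarchy would be learned from gain scores.

```python
def classify(features):
    """Hand-written tree: each `if` is a node, each `return` is a leaf."""
    if features["misspelled_primary_domain"]:
        return "phishing"
    if features["brand_in_subdomain"] or features["brand_in_path"]:
        # A brand name outside the primary domain is suspicious unless
        # the site also has a strong reputation (threshold is arbitrary).
        if features["page_rank"] < 3:
            return "phishing"
        return "legitimate"
    if not features["google_indexed"]:
        return "phishing"
    return "legitimate"


# Hypothetical sample: brand name in the path and a low PageRank.
sample = {
    "misspelled_primary_domain": False,
    "brand_in_subdomain": False,
    "brand_in_path": True,
    "page_rank": 1,
    "google_indexed": True,
}
print(classify(sample))
```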

CHAPTER 4

SYSTEM DESIGN

4.1 INTRODUCTION
Phishing is a technique which employs technical tactics and social engineering to lure
unsuspecting people into leaking personal and valuable data. Phishers have
multiple methods at their disposal to steal sensitive information. One such form of
phishing creates replicas of real websites, so that users are led to fraudulent
websites where they unknowingly give up credentials such as ATM card numbers,
PINs and other important data. Phishers also create spoofed e-mails disguised as
coming from legitimate corporations. These trick recipients into believing the
e-mails are genuine and into complying with their contents, which slyly ask for
information such as usernames, user IDs and passwords for accounts commonly held on
social media and e-commerce websites, among others. Such e-mails also lure people
into phony schemes.

4.2 SYSTEM ARCHITECTURE

Fig 4.1: System Architecture

4.2.1 Description
Traditional phishing detection techniques such as blacklisting and content filtering
have multiple drawbacks and are not efficient enough against the phishing techniques
used by attackers today. Filtering techniques such as Bayesian Content Filtering
can be easily bypassed through “Bayesian Poisoning”, which circumvents the filtering
process. Blacklisting lets many new phishing URLs through (false negatives) because
such exhaustive lists cover only a limited amount of data. It is therefore crucial to
bring Machine Learning into the picture to optimize phishing detection, and the
Decision Tree Algorithm is chosen as it is expected to give better results.

A Decision Tree can be considered an enriched nested if-else structure. The
algorithm checks the features one by one to determine whether a given URL is
legitimate. URLs are passed through a tree made of nodes and leaves. The nodes,
drawn as ellipses, represent the features, while the leaves, drawn as rectangles,
represent the classes. Each sample is fed into the tree and tested against the
feature at every node it reaches, which determines the class (phishing or
legitimate) to which it belongs. Once the sample reaches a leaf, it is clear whether
it is phishing or legitimate.

The important thing, therefore, is to choose the features intelligently. To achieve
this, the algorithm uses two equations which provide values for two parameters:
Gain score and Entropy. This method is called the Information Gain method.

4.3 SYSTEM REQUIREMENTS

Hardware Requirements
Processor: - Intel Core i3 or above
Speed: - 1.50 GHz
Memory: - 2 GB RAM
Hard Disk Drive: - 200 GB

Software Requirements
Development Platform: - Windows 10
Coding Language: - Python, SQL
Tools: - LiClipse
Back End: - MySQL

4.4 SUMMARY
Traditional phishing detection techniques such as blacklisting and content filtering
have multiple drawbacks and are not efficient enough against the phishing techniques
used by attackers today. Filtering techniques such as Bayesian Content Filtering
can be easily bypassed through “Bayesian Poisoning”, which circumvents the filtering
process. Blacklisting lets many new phishing URLs through (false negatives) because
such exhaustive lists cover only a limited amount of data. It is therefore crucial to
bring Machine Learning into the picture to optimize phishing detection, and the
Decision Tree Algorithm is chosen as it is expected to give better results.

A Decision Tree can be considered an enriched nested if-else structure. The
algorithm checks the features one by one to determine whether a given URL is
legitimate. URLs are passed through a tree made of nodes and leaves. The nodes,
drawn as ellipses, represent the features, while the leaves, drawn as rectangles,
represent the classes. Each sample is fed into the tree and tested against the
feature at every node it reaches, which determines the class (phishing or
legitimate) to which it belongs. Once the sample reaches a leaf, it is clear whether
it is phishing or legitimate.

The important thing, therefore, is to choose the features intelligently. To achieve
this, the algorithm uses two equations which provide values for two parameters:
Gain score and Entropy. This method is called the Information Gain method.

The gain score is computed using this equation:

Fig 4.2 : Gain Score Formula

The gain score of a feature is directly proportional to its ability to distinguish
phishing samples from legitimate ones. The objective is therefore to rank the features
in the tree in decreasing order of gain score. The Entropy value used in the equation
has its own equation. It is a statistical measure of the impurity of the samples:
the lower the entropy, the purer the sample set.

Fig 4.3 : Entropy Formula

There are two Entropy values: Original Entropy and Relative Entropy. Original Entropy
is constant, while Relative Entropy changes with the feature being evaluated. The
purity of the samples is inversely related to the Relative Entropy value. To obtain
the hierarchy of features, we run multiple samples through the tree, compute the gain
score of each feature using the two equations, and rank the features in decreasing
order of gain score. To grow the tree, leaves are converted into nodes as extra
features are added. Once the leaves reach high purity, the tree is big enough and the
training process can be finished.
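
The growth procedure can be sketched as a small recursive function in the style of the textbook ID3 algorithm. This is an illustration under that assumption rather than the project's implementation; the feature names (`misspelled`, `indexed`) and the toy samples are hypothetical, and `predict` only handles feature values seen during training.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def split(rows, labels, feature):
    """Group (row, label) pairs by the value of `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append((row, label))
    return groups


def gain(rows, labels, feature):
    relative = sum(
        len(g) / len(rows) * entropy([label for _, label in g])
        for g in split(rows, labels, feature).values()
    )
    return entropy(labels) - relative


def grow(rows, labels, features):
    """Return a leaf (class name) or a node {"feature": ..., "children": {...}}."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the leaf is pure or no features remain.
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, labels, f))
    if gain(rows, labels, best) <= 1e-12:
        return majority  # no remaining feature improves purity
    remaining = [f for f in features if f != best]
    children = {}
    # Convert this leaf into a node with one child per feature value.
    for value, group in split(rows, labels, best).items():
        sub_rows = [row for row, _ in group]
        sub_labels = [label for _, label in group]
        children[value] = grow(sub_rows, sub_labels, remaining)
    return {"feature": best, "children": children}


def predict(tree, row):
    """Walk from the root to a leaf, following the row's feature values."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree


# Hypothetical toy data, for illustration only.
rows = [
    {"misspelled": 1, "indexed": 0},
    {"misspelled": 1, "indexed": 1},
    {"misspelled": 0, "indexed": 0},
    {"misspelled": 0, "indexed": 1},
]
labels = ["phishing", "phishing", "phishing", "legitimate"]

tree = grow(rows, labels, ["misspelled", "indexed"])
print(tree)
```

Growth stops by itself once every leaf contains samples of a single class, which corresponds to the high-purity stopping condition described above.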

The effectiveness of this algorithm depends on the variety of the samples used and the
sources from which they are extracted. To generalize detection success, we need to
ensure the samples are drawn from multiple data sources; otherwise the model will work
well only on the existing dataset and not on real-world data.

CHAPTER 5

MODULE DESCRIPTION
5.1 INTRODUCTION
Our project is divided into the following modules:
• Primary Domain
• Sub Domain
• Path Domain
• Page Rank
• Alexa Reputation
• Google Index

5.2 PRIMARY DOMAIN

Phishers cannot use the original Primary Domain since it is already registered by the
original organization. Consequently, phishers register misspellings of it, or similar
Primary Domains, for their phishing websites to deceive users.
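
One possible way to flag such look-alike domains is to compare a candidate against a list of known domains by string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the list `KNOWN_DOMAINS`, the function name and the 0.8 threshold are assumptions for illustration, not part of the report.

```python
from difflib import SequenceMatcher

# Hypothetical list of brand domains, for illustration only.
KNOWN_DOMAINS = ["paypal.com", "google.com", "amazon.com"]


def closest_known_domain(domain, threshold=0.8):
    """Return the known domain that `domain` imitates, or None.

    An exact match is not an imitation, so it also returns None.
    """
    if domain in KNOWN_DOMAINS:
        return None
    best = max(KNOWN_DOMAINS, key=lambda k: SequenceMatcher(None, domain, k).ratio())
    if SequenceMatcher(None, domain, best).ratio() >= threshold:
        return best
    return None


print(closest_known_domain("paypa1.com"))
```

A domain that is very similar to, but not identical with, a known domain would then set the Primary Domain feature for that URL.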

5.3 SUB DOMAIN

Phishers often prepend the domain of the targeted website to their own site. For
example, phishers prepend the Sub Domain "paypal.com" to some other domain (e.g., one
ending in ".io" or ".business"), which may trick users into trusting the phishing URL.

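A check for this pattern can be sketched with `urllib.parse.urlsplit`. The rule that the primary domain is the last two labels of the host name is a deliberate simplification (it is wrong for suffixes such as ".co.uk"), the URL must include a scheme, and the host "attacker.io" and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def split_host(url):
    """Return (sub_domain, primary_domain) using a naive last-two-labels rule."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    primary = ".".join(labels[-2:])
    sub = ".".join(labels[:-2])
    return sub, primary


def brand_in_sub_domain(url, brand="paypal"):
    """True when the brand is in the sub domain but not in the primary domain."""
    sub, primary = split_host(url)
    return brand in sub and brand not in primary


print(split_host("http://paypal.com.attacker.io/login"))
print(brand_in_sub_domain("http://paypal.com.attacker.io/login"))
```
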
5.4 PATH DOMAIN

This is a sub-folder of the URL. Phishers can also use the Path Domain to
trick users. For example, phishers may direct users to the URL
www.attack.com/paypal, where the phishing site's interface is similar to the original
one. A careless user will think that this URL belongs to the "paypal.com" site.
On mobile phones with small graphical interfaces in particular, it can be very hard
to recognize such phishing URLs.
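
The same idea applies to the path. The sketch below flags a URL whose path mentions a brand that its host name does not; the function name and the default brand are hypothetical, and a scheme is added when the URL lacks one so that `urlsplit` can separate host and path.

```python
from urllib.parse import urlsplit


def brand_in_path(url, brand="paypal"):
    """True when the brand appears in the path but not in the host name."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    return brand in parts.path.lower() and brand not in host


print(brand_in_path("www.attack.com/paypal"))
```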

5.5 PAGE RANK

Google's search engine uses a link analysis algorithm to compute PageRank values.
Most phishing web pages have a low PageRank because these sites exist only for a
short time.
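
To show why a short-lived page ends up with a low value, the following sketch runs a simplified textbook PageRank by power iteration. It is not Google's production algorithm; the damping factor of 0.85 is the commonly quoted default, and the four-page link graph is hypothetical.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict: page -> outgoing links."""
    pages = list(links)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# Hypothetical link graph: nobody links to the short-lived phishing page.
links = {
    "bank": ["news"],
    "news": ["bank", "shop"],
    "shop": ["bank"],
    "phish": ["bank"],
}
ranks = page_rank(links)
print(sorted(ranks, key=ranks.get))
```

Because no other page links to it, the phishing page keeps only the minimum share of rank, however many links it places to legitimate sites.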

5.6 ALEXA REPUTATION

The Alexa Reputation value of a site is calculated as the number of links
from other pages to it. Alexa Reputation is similar to PageRank: the Alexa
Reputation values of phishing sites are much lower than those of
legitimate sites.

5.7 GOOGLE INDEX

The Google index records all legitimate websites that are visited by Google's crawlers.
Google frequently updates this index for its search engine. The Google Index values
of phishing sites are much smaller than those of legitimate sites.

5.8 SUMMARY

Thus these are the various modules present in our project implementation.

CHAPTER 6

SYSTEM IMPLEMENTATION

6.1 INTRODUCTION
In this chapter, the implementation of the system is described in detail.

6.2 IMPLEMENTATION DETAILS


6.2.1 SAMPLE CODES
6.2.2 SCREEN SHOTS
6.3 OVERVIEW OF THE PLATFORM
6.4 PERFORMANCE AND EXPERIMENT ANALYSIS

6.5 SUMMARY
This chapter presented the details of the simulation and implementation of the
project, along with sample code for phishing scam detection and a screenshot of
every module. The proposed system has been executed successfully.

CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENT

Thus we have devised a robust mechanism built around a rigorous machine learning
algorithm that distinguishes the features used to determine whether an infrastructure
is legitimate or a phishing infrastructure, with a high accuracy rate. This
mechanism takes in the newest parameters to devise a method to protect people from
getting phished. The method uses a scoring technique to distinguish whether a sample
is phishing or legitimate. With a proper training set, the comparison set can be
easily classified, which makes this a powerful tool. With enhancements in
the training datasets and tools used, we will be able to create more efficient machine
learning algorithms that can achieve better success rates in real-world detection of
phishing infrastructures.

