Project Report

On
Fake User Identification on Social Networking Site (Twitter)

Submitted for partial fulfillment of requirement for the degree of

BACHELOR OF ENGINEERING

(Computer Science and Engineering)

Submitted By:
Priyanka Bhise
Vaishnavi Uttarwar
Vishal Narote
Karan Kukade

Under the Guidance of


Prof. A. R. Deshmukh

Department of Computer Science & Engineering,


Prof. Ram Meghe Institute of Technology &
Research, Badnera
2020-2021
Certificate

This is to certify that the Project (8KS07) entitled

Fake User Identification on Social Networking Site (Twitter)

is a bona fide work and it is submitted to the

Sant Gadge Baba Amravati University, Amravati

By
Priyanka Bhise

Vaishnavi Uttarwar

Vishal Narote

Karan Kukade

in partial fulfillment of the requirement for the degree of Bachelor of Engineering in
Computer Science & Engineering, during the academic year 2020-2021 under my
guidance.
Prof. A. R. Deshmukh                        Dr. G. R. Bamnote
Guide                                       Head
Department of Computer Sci. & Engg.         Department of Computer Sci. & Engg.
PRMITR, Badnera                             PRMITR, Badnera

Department of Computer Science & Engineering,


Prof. Ram Meghe Institute of Technology &
Research, Badnera
2020-2021
ACKNOWLEDGEMENT
It gives us immense pleasure to express our gratitude to Prof. A. R. Deshmukh, our guide, who
provided us with constructive criticism and positive feedback during the preparation of this project. We
are indebted to Dr. G. R. Bamnote, Head of the Department of Computer Science and Engineering, and to the
other teaching and non-teaching staff, who were always there whenever we needed any help and who
constantly motivated us during this work. Without them and their co-operation, completing this project
work would have been difficult.

Priyanka Bhise (Roll No. 16)

Vaishnavi Uttarwar (Roll No. 17)

Vishal Narote (Roll No. 18)

Karan Kukade (Roll No. 19)

B.E.(CSE) VIII Semester


ABSTRACT

Social networking sites engage millions of users around the world. Users' interactions with
these social sites, such as Twitter and Facebook, have a tremendous impact and occasionally
undesirable repercussions on daily life. The prominent social networking sites have turned into a
target platform for spammers to disperse a huge amount of irrelevant and deleterious
information. Twitter, for example, has become one of the most extravagantly used platforms of all
time and therefore attracts an unreasonable amount of spam. Fake users send undesired tweets
to users to promote services or websites that not only affect legitimate users but also disrupt
resource consumption. Moreover, the possibility of spreading invalid information to users
through fake identities has increased, which results in the spread of harmful content. Recently,
the detection of spammers and the identification of fake users on Twitter have become a common
area of research in contemporary online social networks (OSNs). In this paper, we
review the techniques used for detecting spammers on Twitter. Moreover, a taxonomy of
Twitter spam detection approaches is presented that classifies the techniques based on their
ability to detect: (i) fake content, (ii) spam based on URLs, (iii) spam in trending topics,
and (iv) fake users. The presented techniques are also compared based on various features, such as user
features, content features, graph features, structure features, and time features. We hope
that the presented study will be a useful resource for researchers to find the highlights of
recent developments in Twitter spam detection on a single platform.
TABLE OF CONTENTS

Sr. No. Contents Page No.


1 Introduction 1
1.1 System Study 3
1.1.1 Economical Feasibility 3
1.1.2 Technical Feasibility 4
1.1.3 Social Feasibility 4
1.2 Objectives 4
1.3 Problem Statement 4
2 Literature Review 6
3 System Architecture 10
3.1 System Analysis 10
3.1.1 Existing System 10
3.1.2 Proposed System 10
3.2 System Design 11
3.2.1 Dataflow Diagram 12
3.2.2 UML Diagram 15
3.2.3 Use Case Diagram 16
3.2.4 Class Diagram 18
3.2.5 Sequence Diagram 20
3.2.6 Activity Diagram 22
3.3 Input Design & Output Design 24
3.3.1 Input Design 24
3.3.2 Output Design 25
4 Implementation 27
4.1 Admin Module 27
4.2 Data Collection 28
4.3 Train And Test 29
4.4 User Check 30
4.5 Detection 31
LIST OF FIGURES

Sr. No. Description Page No.

Fig.3.2 System Architecture 14

Fig.3.2.1 Dataflow Diagram 15

Fig.3.2.2 UML Diagram 16

Fig.3.2.3 Use Case Diagram 18

Fig.3.2.4 Class Diagram 19

Fig.3.2.5 Sequence Diagram 20

Fig.3.2.6 Activity Diagram 23


LIST OF SCREENSHOTS

Sr. No. Description Page No.

4.1 Admin module 24

4.2 Data Collection 24

4.3 Train and Test 25

4.4 User Check 30

4.4.1 User Check Parameters 30

4.5 Detection 31
Spammer detection and fake user identification
on social networking sites

1. INTRODUCTION

I. Social Network

Wikipedia defines a social network service as a service which “focuses on the building and
verifying of online social networks for communities of people who share interests and activities, or
who are interested in exploring the interests and activities of others, and which necessitates the
use of software.”

A report published by OCLC provides the following definition of social networking sites: “Web
sites primarily designed to facilitate interaction between users who share interests, attitudes and
activities, such as Facebook, Mixi and MySpace.”

II. What Social Networks Can Be Used For

Social networks can provide a range of benefits to members of an organization[3]:


• Support for learning: Social networks can enhance informal learning and support
social connections within groups of learners and with those involved in the support of
learning.
• Support for members of an organisation: Social networks can potentially be used
by all members of an organisation, and not just those involved in working with
students. Social networks can help the development of communities of practice.
• Engaging with others: Passive use of social networks can provide valuable business
intelligence and feedback on institutional services (although this may give rise to
ethical concerns).

PRMIT&R//DCSE//2020-21 Page no1


• Ease of access to information and applications: The ease of use of many social
networking services can provide benefits to users by simplifying access to other tools
and applications. The Facebook Platform provides an example of how a social
networking service can be used as an environment for other tools.
• Common interface: A possible benefit of social networks may be the common
interface which spans work / social boundaries. Since such services are often used in
a personal capacity, the interface and the way the service works may be familiar, thus
minimising the training and support needed to exploit the services in a professional
context. This can, however, also be a barrier to those who wish to have strict
boundaries between work and social activities.

III. Examples of Social Networking Services


Examples of popular social networking services include:
• Facebook: Facebook is a social networking Web site that allows people to
communicate with their friends and exchange information. In May 2007 Facebook
launched the Facebook Platform, which provides a framework for developers to create
applications that interact with core Facebook features.
• MySpace: MySpace is a social networking Web site offering an interactive, user-submitted
network of friends, personal profiles, blogs and groups, commonly used for
sharing photos, music and videos.
• Ning: An online platform for creating social Web sites and social networks aimed at
users who want to create networks around specific interests or have limited technical
skills.
• Twitter: Twitter is an example of a micro-blogging service. Twitter can be used in a
variety of ways, including sharing brief information with users and providing support
for one’s peers.

Note that this brief list of popular social networking services omits popular social sharing
services such as Flickr and YouTube.

IV. Opportunities and Challenges


The popularity and ease of use of social networking services have excited institutions with
their potential in a variety of areas. However, effective use of social networking services poses
a number of challenges for institutions, including the long-term sustainability of the services; user
concerns over the use of social tools in a work or study context; a variety of technical issues; and
legal issues such as copyright, privacy and accessibility.
Institutions would be advised to consider the implications carefully before promoting significant use
of such services.

1.1 SYSTEM STUDY
• FEASIBILITY STUDY: The feasibility of the project is analyzed in this phase, and a
business proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis, the feasibility study of the proposed system is
carried out to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major requirements
for the system is essential[7].

Three key considerations involved in the feasibility analysis are:


• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

1.1.1 ECONOMICAL FEASIBILITY


This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research and
development of the system is limited, so the expenditures must be justified. The developed
system is well within the budget, which was achieved because most of the technologies used
are freely available. Only the customized products had to be purchased.

1.1.2 TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, since this would in turn place high demands on the client.
The developed system must have modest requirements, as only minimal or no changes are
required for implementing this system.

1.1.3 SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by
the users solely depends on the methods that are employed to educate the user about the
system and to make him familiar with it. His level of confidence must be raised so that he is
also able to offer constructive criticism, which is welcomed, as he is the final user.

1.2 Objectives
1. To propose a system which is more effective and accurate than the existing system.
2. To test the system with real-time data.
3. To study machine learning methodology using real-time datasets with different
characteristics and accomplishments.

1.3 Problem Statement

Some people use social media sites for bad purposes. They create accounts on
social networking sites using fake information and use them to do harmful things, such as spreading
rumours, pretending to be genuine users in order to befriend people who do not know them,
harassing someone, or blackmailing people. To address this, we propose a system that can help
detect fake users on a social networking site (Twitter).

2. LITERATURE REVIEW

1. Statistical features-based real-time detection of drifted Twitter spam


AUTHORS: C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min
Twitter spam has become a critical problem nowadays. Recent works focus on applying
machine learning techniques for Twitter spam detection, which make use of the statistical
features of tweets. In our labelled tweets data set, however, we observe that the statistical
properties of spam tweets vary over time, and thus, the performance of existing machine
learning-based classifiers decreases[13]. This issue is referred to as “Twitter Spam Drift”. In
order to tackle this problem, we first carry out a deep analysis on the statistical features of one
million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme.
The proposed scheme can discover “changed” spam tweets from unlabeled tweets and
incorporate them into the classifier's training process. A number of experiments are performed to
evaluate the proposed scheme. The results show that our proposed Lfun scheme can
significantly improve the spam detection accuracy in real-world scenarios.
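The summary above does not describe the internals of the proposed scheme, so it is not reproduced here. As a rough illustration of the general idea of folding confidently classified unlabelled tweets back into training, the following sketch shows a plain self-training loop (a simpler, swapped-in technique, not the authors' method) built on a nearest-centroid classifier over numeric tweet features. The function names, the `margin` threshold and the number of rounds are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train(
    labelled: List[Tuple[Vector, int]],  # (features, label); 1 = spam, 0 = non-spam
    unlabelled: List[Vector],
    margin: float = 1.0,  # hypothetical confidence threshold
    rounds: int = 5,
) -> List[Tuple[Vector, int]]:
    """Pseudo-label unlabelled vectors that are clearly closer to one class
    centroid than the other, add them to the training set, and repeat."""
    train = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        spam_c = centroid([x for x, y in train if y == 1])
        ham_c = centroid([x for x, y in train if y == 0])
        confident, uncertain = [], []
        for x in pool:
            d_spam, d_ham = distance(x, spam_c), distance(x, ham_c)
            if abs(d_spam - d_ham) >= margin:
                confident.append((x, 1 if d_spam < d_ham else 0))
            else:
                uncertain.append(x)
        if not confident:  # nothing new to learn from; stop early
            break
        train.extend(confident)
        pool = uncertain
    return train
```

With labelled points at (0, 0) and (10, 10), the unlabelled points (1, 1) and (9, 9) are absorbed as non-spam and spam respectively, while (5, 5), which is equidistant from both centroids, stays unlabelled.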

2. Automatically identifying fake news in popular Twitter threads


AUTHORS: C. Buntain and J. Golbeck
Information quality in social media is an increasingly important issue, but web-scale data
hinders experts' ability to assess and correct much of the inaccurate content, or "fake news,"
present in these platforms. This paper develops a method for automating fake news detection
on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter
datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and
PHEME, a dataset of potential rumours in Twitter and journalistic assessments of their
accuracies. We apply this method to Twitter content sourced from BuzzFeed’s fake news
dataset and show that models trained against crowdsourced workers outperform models based on
journalists' assessments and models trained on a pooled dataset of both crowdsourced workers
and journalists[6]. All three datasets, aligned into a uniform format, are also publicly available.
A feature analysis then identifies the features that are most predictive for crowdsourced and
journalistic accuracy assessments, the results of which are consistent with prior work. We close
with a discussion contrasting accuracy and credibility and why models of non-experts
outperform models of journalists for fake news detection in Twitter.

3. A performance evaluation of machine learning-based streaming spam tweets detection

AUTHORS: C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian

The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets
to Twitter users to promote websites or services, which are harmful to normal users. In order to
stop spammers, researchers have proposed a number of mechanisms. The focus of recent works is on
the application of machine learning techniques to Twitter spam detection. However,
tweets are retrieved in a streaming way, and Twitter provides the Streaming API for developers and
researchers to access public tweets in real time. A performance evaluation of existing
machine learning-based streaming spam detection methods has been lacking. In this paper, we bridged
the gap by carrying out a performance evaluation from three different aspects:
data, feature, and model. A big ground truth of over 600 million public tweets was created by
using a commercial URL-based security tool. For real-time spam detection, we further extracted 12
lightweight features for tweet representation. Spam detection was then transformed into a binary
classification problem in the feature space, which can be solved by conventional machine
learning algorithms. We evaluated the impact of different factors on the spam detection
performance, including the spam to non-spam ratio, feature discretization, training data size,
data sampling, time-related data, and machine learning algorithms[9]. The results show that
streaming spam tweet detection is still a big challenge and that a robust detection technique should
take into account the three aspects of data, feature, and model.
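To make "lightweight features" and "feature discretization" concrete, the following hypothetical sketch computes a few per-tweet counts directly from the tweet text and bins a numeric feature into equal-width intervals. The feature names and the bin count are illustrative assumptions. This is not the exact 12-feature set or the discretization procedure used in the cited study.

```python
import re
from typing import Dict, List

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def lightweight_features(text: str) -> Dict[str, int]:
    """Cheap per-tweet counts that need no external lookups (illustrative subset)."""
    return {
        "no_urls": len(URL_RE.findall(text)),
        "no_hashtags": len(HASHTAG_RE.findall(text)),
        "no_mentions": len(MENTION_RE.findall(text)),
        "no_chars": len(text),
        "no_digits": sum(ch.isdigit() for ch in text),
    }


def equal_width_bins(values: List[float], n_bins: int = 4) -> List[int]:
    """Discretize a numeric feature into n_bins equal-width intervals (0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For example, `equal_width_bins([0, 1, 2, 3, 4, 8], n_bins=4)` returns `[0, 0, 1, 1, 2, 3]`. Each tweet then becomes a fixed-length vector that any conventional binary classifier can consume.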

4. A model-based approach for identifying spammers in social networks


AUTHORS: F. Fathaliani and M. Bouguessa
In this paper, we view the task of identifying spammers in social networks from a mixture
modelling perspective, based on which we devise a principled unsupervised approach to detect
spammers. In our approach, we first represent each user of the social network with a feature
vector that reflects its behaviour and interactions with other participants. Next, based on the
estimated users' feature vectors, we propose a statistical framework that uses the Dirichlet
distribution in order to identify spammers. The proposed approach is able to automatically
discriminate between spammers and legitimate users, while existing unsupervised approaches
require human intervention in order to set informal threshold parameters to detect spammers.
Furthermore, our approach is general in the sense that it can be applied to different online social
sites. To demonstrate the suitability of the proposed method, we conducted experiments on real data
extracted from Instagram and Twitter.

5. Spam detection of Twitter traffic: A framework based on random forests and nonuniform feature sampling

AUTHORS: C. Meda, E. Ragusa, C. Gianoglio, R. Zunino, A. Ottaviano, E. Scillia, and R. Surlinelli
Law Enforcement Agencies play a crucial role in the analysis of open data and need effective
techniques to filter troublesome information. In a real scenario, Law Enforcement Agencies
analyze social networks, e.g., Twitter, monitoring events and profiling accounts. Unfortunately,
among the huge number of internet users, there are people who use microblogs for harassing
other people or spreading malicious content. Users' classification and spammers' identification is a
useful technique for relieving Twitter traffic of uninformative content[15]. This work
proposes a framework that exploits non-uniform feature sampling inside a gray-box machine
learning system, using a variant of the Random Forests algorithm to identify spammers inside
Twitter traffic. Experiments are performed on a popular Twitter dataset and on a new dataset of
Twitter users. The newly provided Twitter dataset is made up of users labelled as spammers or
legitimate users, described by 54 features. Experimental results demonstrate the effectiveness
of the enriched feature sampling method.
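The core idea of non-uniform feature sampling is that each tree in the forest draws its candidate features with unequal probabilities instead of uniformly. A minimal sketch of that sampling step alone is shown below, assuming per-feature weights are already known (the weights in the example are invented). It uses the Efraimidis-Spirakis method for weighted sampling without replacement; it is not the authors' framework, and tree training itself is omitted.

```python
import random
from typing import List, Sequence


def weighted_feature_subset(
    weights: Sequence[float], k: int, rng: random.Random
) -> List[int]:
    """Pick k distinct feature indices, favouring features with larger weights.

    Efraimidis-Spirakis: give feature i the key u_i ** (1 / w_i) with
    u_i ~ U(0, 1) and keep the k largest keys. All weights must be positive.
    """
    if k > len(weights):
        raise ValueError("k cannot exceed the number of features")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:k])


def feature_subsets_for_forest(
    weights: Sequence[float], k: int, n_trees: int, seed: int = 0
) -> List[List[int]]:
    """One weighted feature subset per tree (tree training itself is omitted)."""
    rng = random.Random(seed)
    return [weighted_feature_subset(weights, k, rng) for _ in range(n_trees)]
```

In a full forest, each tree would be trained only on the columns in its subset and the trees' votes combined by majority; a feature with a high weight then appears in far more trees than a low-weight one.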

3. SYSTEM ARCHITECTURE

3.1 SYSTEM ANALYSIS

3.1.1 EXISTING SYSTEM:
• Tingmin et al. provide a survey of new methods and techniques for Twitter spam
detection. The survey presents a comparative study of the current approaches.

• On the other hand, S. J. Soman et al. conducted a survey on the different behaviours
exhibited by spammers on the Twitter social network. The study also provides a literature
review that recognizes the existence of spammers on the Twitter social network.

• Despite all the existing studies, there is still a gap in the existing literature. Therefore, to
bridge the gap, we review the state of the art in spammer detection and fake user
identification on Twitter[11].

DISADVANTAGES OF EXISTING SYSTEM:


• No efficient methods are used.
• No real-time data is used.
• The approaches are more complex.
3.1.2 PROPOSED SYSTEM:
• The aim of this project is to detect fake users on Twitter and to present a
framework that classifies the detection approaches into several categories. For classification, we have
identified four means of reporting spammers that can be helpful in identifying fake
identities of users. Spammers can be identified based on: (i) fake content; (ii) URL-based spam
detection; (iii) detecting spam in trending topics; and (iv) fake user identification.

• Moreover, the analysis also shows that machine learning-based techniques can be
effective for identifying fake users on Twitter. However, the selection of the most feasible
techniques and methods is highly dependent on the available data.
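Category (ii) can be illustrated with a deliberately simple, rule-based baseline: extract the links in a tweet and flag the tweet if any link points into a listed domain. This is a hypothetical sketch, not the report's method. The blocklist and the helper names are assumptions, and a machine learning detector would typically use such a flag as one feature among many.

```python
import re
from typing import Iterable, List, Set
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def hosts_in(text: str) -> List[str]:
    """Host names (lower-cased by urlparse) of every http(s) link in a tweet."""
    hosts = []
    for url in URL_RE.findall(text):
        # Drop trailing punctuation that is usually part of the sentence, not the URL.
        host = urlparse(url.rstrip(".,!?;:)")).hostname
        if host:
            hosts.append(host)
    return hosts


def is_blocked(host: str, blocklist: Set[str]) -> bool:
    """True if the host or any parent domain (a.b.example -> b.example -> example) is listed."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def url_spam_flag(text: str, blocklist: Iterable[str]) -> bool:
    """Flag a tweet if any of its links points into a blocklisted domain."""
    blocked = {domain.lower() for domain in blocklist}
    return any(is_blocked(host, blocked) for host in hosts_in(text))
```

Checking parent domains means that listing `bad-site.example` also catches links to subdomains such as `win.bad-site.example`.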

ADVANTAGES OF PROPOSED SYSTEM:


• This study includes a machine learning methodology proposed using real-time datasets with
different characteristics and accomplishments.

• The proposed system is more effective and accurate than other existing systems.
• It has been tested with real-time data.

3.2 SYSTEM DESIGN

[Fig. 3.2 labels: Twitter (tweet data); Fake User; Admin: Get sample tweets from Twitter, Pre-processing, Analysis Spam tweets from the Twitter]

Fig 3.2 System Architecture
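The Pre-processing stage shown in Fig 3.2 is not specified further in the report. Assuming it means normalising raw tweet text before analysis, a minimal sketch could lowercase the text, replace links and @mentions with placeholder tokens, keep hashtag words, and split the result into tokens. The function name and the placeholder tokens `<url>` and `<user>` are hypothetical choices.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9_]+|<url>|<user>")


def preprocess(text: str) -> List[str]:
    """Normalise one tweet into a list of lower-case tokens."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)       # keep the fact that a link was present
    text = MENTION_RE.sub(" <user> ", text)  # anonymise mentioned accounts
    text = text.replace("#", " ")            # "#freegift" -> "freegift"
    return TOKEN_RE.findall(text)
```

For example, `preprocess("Check THIS out @Bob #FreeGift https://t.co/abc!!")` returns `['check', 'this', 'out', '<user>', 'freegift', '<url>']`. Keeping placeholders rather than deleting links and mentions preserves signals that spam analysis often relies on.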

[Block diagram labels: Client, Twitter, Admin; Register; Login; Timeline; View Profile; Tweet on Timeline; View user details; View tweets from Twitter; Classification; Detect Fake User; File upload; File details; File view; File download]

Fig 3.2 Block diagram

3.2.1 DATA FLOW DIAGRAM

• The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing carried out on
this data, and the output data generated by the system.

• The data flow diagram (DFD) is one of the most important modelling tools. It is used to
model the system components. These components are the system process, the data used
by the process, an external entity that interacts with the system, and the information flows in
the system.

• A DFD shows how information moves through the system and how it is modified by a
series of transformations[21]. It is a graphical technique that depicts information flow
and the transformations that are applied as data moves from input to output.

• A DFD may be used to represent a system at any level of abstraction. A DFD may be
partitioned into levels that represent increasing information flow and functional detail.

[Data flow diagram labels: User, CSP, Admin; Register; Login; View Timeline; Authorized; post from the Twitter; View Login; Client User Details; View user Tweets; View Trending Topic; Compose tweet; View Following; View Followers; Classification; Fake Content, URL based spam detect and Trending based; Fake User Identification; Logout]

Fig 3.2.1 DATA FLOW DIAGRAM

3.2.2 UML DIAGRAMS

UML stands for Unified Modelling Language. UML is a standardized general-purpose
modelling language in the field of object-oriented software engineering. The standard was
created by, and is managed by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.

The Unified Modelling Language is a standard language for specifying, visualizing,
constructing and documenting the artefacts of software systems, as well as for business
modelling and other non-software systems.

The UML represents a collection of best engineering practices that have proven
successful in the modelling of large and complex systems.

The UML is a very important part of developing object-oriented software and the
software development process[18]. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:

The primary goals in the design of the UML are as follows:

1. Provide users a ready-to-use, expressive visual modelling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.

3.2.3 USE CASE DIAGRAM:


A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases[30]. The main purpose of a use
case diagram is to show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.

[Use case diagram: actors USER and ADMIN; use cases: Registration; Login; View following and followers; View Timeline; View profile; Compose tweet; View trending; View User Details; View user Tweets; Classification; Fake Content, URL based spam detect and Trending based; Logout]

Fig 3.2.3 Use Case Diagram

3.2.4 CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes[29].
It explains which class contains information.

[Class diagram: ADMIN (Login, Get sample tweets from Twitter (), Pre-processing (), Analysis Spam tweets from the Twitter ()); FAKE (Login, Generate tweets ()); TWITTER; OSNGUI]

Fig 3.2.4 CLASS DIAGRAM

3.2.5 SEQUENCE DIAGRAM:

A sequence diagram in Unified Modelling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart[25]. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

3.2.5 SEQUENCE DIAGRAM

3.2.6 ACTIVITY DIAGRAM

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modelling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system[20]. An activity diagram shows the overall flow of control.

[Activity diagram labels: Start; USER, ADMIN; Client Login; View Timeline; Authorized; post from the Twitter; View User Details; View user Tweets; View Trending Topic; Compose tweet; View Following; View Followers; Classification; Fake Content, URL based spam detect and Trending based; Fake User Identification]

Fig 3.2.6 ACTIVITY DIAGRAM

3.3 INPUT DESIGN AND OUTPUT DESIGN


3.3.1 INPUT DESIGN
The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, that is, the steps necessary to
put transaction data into a usable form for processing. This can be achieved by having the
computer read data from a written or printed document, or by having people key
the data directly into the system. The design of input focuses on controlling the amount
of input required, controlling errors, avoiding delay, avoiding extra steps and keeping the
process simple[18]. The input is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:

• What data should be given as input?
• How should the data be arranged or coded?
• The dialog to guide the operating personnel in providing input.
• Methods for preparing input validations and the steps to follow when errors occur.

OBJECTIVES
Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and
to show the correct direction to the management for getting correct information from the
computerized system.
• It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free
from errors. The data entry screen is designed in such a way that all the data
manipulations can be performed. It also provides record viewing facilities.
• When the data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided when needed so that the user is never
left in a maze. Thus, the objective of input design is to create an input layout that
is easy to follow.
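The validity check described above could look like the following sketch for a registration screen. The field names and rules (username pattern, e-mail shape, minimum password length) are hypothetical, since the report does not specify them. Each failed rule produces a readable message so that the user knows exactly what to correct.

```python
import re
from typing import Dict, List

USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,15}$")     # assumed username rule
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose shape check


def validate_registration(form: Dict[str, str]) -> List[str]:
    """Return human-readable errors; an empty list means the input is valid."""
    errors = []
    username = form.get("username", "").strip()
    email = form.get("email", "").strip()
    password = form.get("password", "")

    if not USERNAME_RE.match(username):
        errors.append("Username must be 3-15 letters, digits or underscores.")
    if not EMAIL_RE.match(email):
        errors.append("Please enter a valid e-mail address.")
    if len(password) < 8:  # assumed minimum length
        errors.append("Password must be at least 8 characters long.")
    if password != form.get("confirm_password", ""):
        errors.append("Passwords do not match.")
    return errors
```

Collecting all errors in one pass, rather than stopping at the first, lets the screen show every problem at once and saves the user extra round trips.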

3.3.2 OUTPUT DESIGN


A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, results of processing are communicated to the users and to
other systems through outputs. Output design determines how the information is to be
displayed for immediate need, as well as the hard copy output. It is the most important and direct source of
information to the user[32]. Efficient and intelligent output design improves the system's relationship
with the user and helps user decision-making.

1. Designing computer output should proceed in an organized, well-thought-out manner; the
right output must be developed while ensuring that each output element is designed so that
people will find the system easy to use effectively. When analysts design computer
output, they should identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create documents, reports, or other formats that contain information produced by the system.
The output form of an information system should accomplish one or more of the following
objectives.

• Convey information about past activities, current status or projections of the future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.

a) Machine Learning Technique:


• A number of features associated with tweet content and with the characteristics of users
are identified for detecting spammers. These features serve as the attributes of the machine
learning process that classifies users, i.e., determines whether or not they are spammers.

• To develop an approach for detecting spammers on Twitter, a labelled collection of users
pre-classified as fake or legitimate is built [27]. The steps needed to construct this labelled
collection and give it the desired properties are then carried out.

• In other words, these are the steps that must be examined in order to develop a collection of
users that can be labelled as fake or legitimate. Finally, user attributes are identified based on
their behavior, e.g., whom they interact with and how frequently they interact.

• To confirm these intuitions, the features of the users in the labelled collection are examined.
Two attribute sets are considered, namely content attributes and user behavior attributes, to
differentiate one user from another.
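The two attribute sets described above can be sketched as follows. This is a minimal illustration, not the project's actual feature set: the specific attributes chosen here (URLs, hashtags and mentions per tweet as content attributes; follower-to-followee ratio and tweets per day as user behavior attributes) and the `Tweet`/`UserProfile` record types are assumptions, loosely modeled on fields of Twitter's user objects.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record types for this sketch; the field names loosely follow
# Twitter's v1.1 user object but are assumptions, not the project's schema.
@dataclass
class Tweet:
    text: str

@dataclass
class UserProfile:
    followers_count: int
    friends_count: int       # accounts this user follows
    statuses_count: int      # total tweets posted
    created_at: datetime     # account creation time
    tweets: list = field(default_factory=list)

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

def content_attributes(tweets):
    """Content attributes: average URLs, hashtags and mentions per tweet."""
    n = max(len(tweets), 1)  # avoid division by zero for users with no tweets
    return {
        "urls_per_tweet": sum(len(URL_RE.findall(t.text)) for t in tweets) / n,
        "hashtags_per_tweet": sum(len(HASHTAG_RE.findall(t.text)) for t in tweets) / n,
        "mentions_per_tweet": sum(len(MENTION_RE.findall(t.text)) for t in tweets) / n,
    }

def behavior_attributes(user, now):
    """User behavior attributes: follower/followee ratio and posting rate."""
    age_days = max((now - user.created_at).days, 1)
    return {
        "followers_per_followee": user.followers_count / max(user.friends_count, 1),
        "tweets_per_day": user.statuses_count / age_days,
    }

def feature_vector(user, now):
    """Merge both attribute sets into one vector with a fixed (sorted) key order."""
    features = {**content_attributes(user.tweets), **behavior_attributes(user, now)}
    return [features[name] for name in sorted(features)]
```

Ratios such as these can span very different ranges, so in practice the vector would usually be standardized before it is handed to a classifier.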


4. IMPLEMENTATION

MODULES:
• Admin Module
• Data Collection
• Train and Test
• Machine Learning Technique
• Detection of Fake User

MODULE DESCRIPTIONS:
4.1 Admin Module:
In the first module, we develop the Online Social Networking (OSN) system module, building the
system around the features of an online social networking system, Twitter. This module is used
for admin login and authentication.

4.1 Admin Module
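The report does not say how admin credentials are stored or checked. As a hedged sketch only, the snippet below shows one common way to authenticate an admin login with Python's standard library: passwords are stored as salted PBKDF2 hashes (`hashlib.pbkdf2_hmac`) and compared in constant time (`hmac.compare_digest`). The in-memory `ADMINS` store, the iteration count and the helper names are hypothetical; a real deployment would keep these records in a database.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # PBKDF2 work factor; an assumption, tune for your hardware

# Hypothetical in-memory admin store: username -> (salt, derived key).
ADMINS = {}

def hash_password(password, salt=None):
    """Derive a PBKDF2-HMAC-SHA256 key; a fresh random salt is used if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def register_admin(username, password):
    """Store only the salt and derived key, never the plain-text password."""
    ADMINS[username] = hash_password(password)

def authenticate(username, password):
    """Return True only if the username exists and the password matches."""
    record = ADMINS.get(username)
    if record is None:
        return False
    salt, expected = record
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison
```

Storing a per-user random salt means two admins with the same password still get different stored keys.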


4.2 Data Collection:


We use a Python library called Tweepy to connect to the Twitter API and collect the data. We
download tweets containing certain keywords, so as to capture the words or hashtags relevant
to fake users.
Some of the most important fields are:
• Text, which contains the text included in the tweet.

• Created at, which is a timestamp of when the tweet was created.

• User, which contains information about the user who created the tweet, such as the username
and user id.

4.2 Data Collection


4.2 Data Collection
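Once tweets have been downloaded (for example with Tweepy), the fields listed above can be pulled out of each tweet's raw JSON. The sketch below is hedged: it assumes tweets arrive as Twitter API v1.1 JSON, where `created_at` is a string such as `Wed Oct 10 20:19:24 +0000 2018` and the author is a nested `user` object with `screen_name` and `id`. The simple substring keyword filter and the helper names are illustrative, not the project's code.

```python
import json
from datetime import datetime

# Timestamp layout used by Twitter API v1.1, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def extract_fields(raw_tweet):
    """Keep only the Text, Created at and User fields of one tweet."""
    tweet = json.loads(raw_tweet) if isinstance(raw_tweet, str) else raw_tweet
    return {
        "text": tweet["text"],
        "created_at": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        "username": tweet["user"]["screen_name"],
        "user_id": tweet["user"]["id"],
    }

def matches_keywords(text, keywords):
    """Case-insensitive check for any keyword or hashtag in the tweet text."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

def collect(raw_tweets, keywords):
    """Extract fields from every tweet and keep those mentioning a keyword."""
    records = (extract_fields(t) for t in raw_tweets)
    return [r for r in records if matches_keywords(r["text"], keywords)]
```

`collect` accepts either JSON strings or already-parsed dictionaries, so it can sit directly after whatever download step is used.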

4.3 Train and Test:


In the proposed framework, metadata features are extracted from the additional information
available about a user's tweets, whereas content-based features aim to capture the user's
message-posting behavior and the quality of the text the user writes in posts.

4.3 Train and Test:
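The train-and-test step can be illustrated with a small pure-Python sketch: labelled users (each already turned into a numeric vector of metadata and content-based features) are shuffled, split into training and test sets, a classifier is fit on the training part and scored on the held-out part. The report does not name its classifier here, so a nearest-centroid classifier is used purely as a stand-in; the 25% test ratio and the fixed seed are assumptions.

```python
import math
import random

def train_test_split(samples, labels, test_ratio=0.25, seed=0):
    """Shuffle indices reproducibly and split into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])

def fit_centroids(X, y):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def accuracy(centroids, X, y):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y) if y else 0.0
```

Because the distance is Euclidean, features on larger scales dominate, so standardizing each feature with training-set statistics before fitting is advisable. scikit-learn offers `train_test_split` and a `NearestCentroid` classifier for the same workflow.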


4.4 User Check


This module is used to check whether a particular user exists on Twitter. If the user exists, it
displays the parameters shown in 4.4.1.

4.4 User Check

4.4.1 User Check Parameters
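The User Check logic can be sketched as a lookup followed by a selection of the displayed parameters. Everything here is an assumption for illustration: `lookup` stands in for whatever call the project makes to Twitter (it must return a profile dictionary, or `None` when the account does not exist), and `CHECK_PARAMETERS` is a guess at the fields shown in 4.4.1, named after Twitter v1.1 user-object fields.

```python
from typing import Callable, Optional

# Hypothetical parameter list; the fields actually shown in 4.4.1 may differ.
CHECK_PARAMETERS = (
    "screen_name",
    "followers_count",
    "friends_count",
    "statuses_count",
    "created_at",
    "verified",
)

def check_user(screen_name: str,
               lookup: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Return the displayed parameters if the user exists, otherwise None."""
    profile = lookup(screen_name)
    if profile is None:
        return None  # user is not present on Twitter
    # Missing fields are reported as None rather than raising KeyError.
    return {name: profile.get(name) for name in CHECK_PARAMETERS}
```

With a dictionary of cached profiles, `check_user("alice", profiles.get)` works directly, since `dict.get` returns `None` for unknown keys.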


4.5 Detection
This module detects whether a particular user is fake or legitimate.

4.5 Detection
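The report does not spell out how the Detection module reaches its verdict. One hedged possibility, shown below, is to compare the checked user's feature vector with per-class centroids learned during training and to report "uncertain" when the user sits almost equally close to both classes. The class names, the centroid values in the usage example and the `min_gap` threshold are hypothetical.

```python
import math

def detect(features, centroids, min_gap=0.1):
    """Classify one user's feature vector against per-class centroids.

    Returns (label, distances). The label is "uncertain" when the relative gap
    between the nearest and second-nearest centroid is below min_gap.
    """
    distances = {label: math.dist(features, c) for label, c in centroids.items()}
    ranked = sorted(distances, key=distances.get)  # nearest class first
    if len(ranked) > 1:
        nearest, runner_up = distances[ranked[0]], distances[ranked[1]]
        if runner_up > 0 and (runner_up - nearest) / runner_up < min_gap:
            return "uncertain", distances
    return ranked[0], distances
```

The "uncertain" band is a design choice: it lets the interface flag borderline accounts for manual review instead of forcing a hard label, e.g. `detect([3.0, 3.0], {"fake": [5.5, 5.5], "legitimate": [0.5, 0.5]})` falls exactly between the two illustrative centroids.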


5. CONCLUSION

Despite the development of successful strategies for fake user identification on Twitter, many
problems remain open for further research. These issues are highlighted as follows. Fake user
identification on social media is a problem that needs to be explored because of the serious
repercussions of fake content at the individual as well as at other levels.
Another related subject worth exploring is the discovery of rumor sources on social media.
While a few experiments based on different techniques have already been performed to identify
the origins of misinformation, more advanced approaches, e.g., social-network-based
approaches, could be extended because of their demonstrated efficacy.


6. REFERENCES

[1] B. Erşahin, Ö. Aktaş, D. Kılınç, and C. Akyol, "Twitter fake account detection," in Proc. Int. Conf. Comput. Sci. Eng. (UBMK), Oct. 2017, pp. 388-392.

[2] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida, "Detecting spammers on Twitter," in Proc. Collaboration, Electron. Messaging, Anti-Abuse Spam Conf. (CEAS), vol. 6, Jul. 2010, p. 12.

[3] S. Gharge and M. Chavan, "An integrated approach for malicious tweets detection using NLP," in Proc. Int. Conf. Inventive Commun. Comput. Technol. (ICICCT), Mar. 2017, pp. 435-438.

[4] T. Wu, S. Wen, Y. Xiang, and W. Zhou, "Twitter spam detection: Survey of new approaches and comparative study," Comput. Secur., vol. 76, pp. 265-284, Jul. 2018.

[5] S. J. Soman, "A survey on behaviors exhibited by spammers in popular social media networks," in Proc. Int. Conf. Circuit, Power Comput. Technol. (ICCPCT), Mar. 2016, pp. 1-6.


[6] A. Gupta, H. Lamba, and P. Kumaraguru, "$1.00 per RT #BostonMarathon #prayforboston: Analyzing fake content on Twitter," in Proc. eCrime Researchers Summit (eCRS), 2013, pp. 1-12.

[7] F. Concone, A. De Paola, G. Lo Re, and M. Morana, "Twitter analysis for real-time malware discovery," in Proc. AEIT Int. Annu. Conf., Sep. 2017, pp. 1-6.

[8] N. Eshraqi, M. Jalali, and M. H. Moattar, "Detecting spam tweets in Twitter using a data stream clustering algorithm," in Proc. Int. Congr. Technol., Commun. Knowl. (ICTCK), Nov. 2015, pp. 347-351.

[9] C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, "Statistical features-based real-time detection of drifted Twitter spam," IEEE Trans. Inf. Forensics Security, vol. 12, no. 4, pp. 914-925, Apr. 2017.

[10] C. Buntain and J. Golbeck, "Automatically identifying fake news in popular Twitter threads," in Proc. IEEE Int. Conf. Smart Cloud (SmartCloud), Nov. 2017, pp. 208-215.

[11] C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian, "A performance evaluation of machine learning-based streaming spam tweets detection," IEEE Trans. Comput. Social Syst., vol. 2, no. 3, pp. 65-76, Sep. 2015.


[12] G. Stafford and L. L. Yu, "An evaluation of the effect of spam on Twitter trending topics," in Proc. Int. Conf. Social Comput., Sep. 2013, pp. 373-378.

[13] M. Mateen, M. A. Iqbal, M. Aleem, and M. A. Islam, "A hybrid approach for spam detection for Twitter," in Proc. 14th Int. Bhurban Conf. Appl. Sci. Technol. (IBCAST), Jan. 2017, pp. 466-471.

[14] A. Gupta and R. Kaushal, "Improving spam detection in online social networks," in Proc. Int. Conf. Cogn. Comput. Inf. Process. (CCIP), Mar. 2015, pp. 1-6.

[15] F. Fathaliani and M. Bouguessa, "A model-based approach for identifying spammers in social networks," in Proc. IEEE Int. Conf. Data Sci. Adv. Anal. (DSAA), Oct. 2015, pp. 1-9.

[16] V. Chauhan, A. Pilaniya, V. Middha, A. Gupta, U. Bana, B. R. Prasad, and S. Agarwal, "Anomalous behavior detection in social networking," in Proc. 8th Int. Conf. Comput., Commun. Netw. Technol. (ICCCNT), Jul. 2017, pp. 1-5.

[17] S. Jeong, G. Noh, H. Oh, and C.-K. Kim, "Follow spam detection based on cascaded social information," Inf. Sci., vol. 369, pp. 481-499, Nov. 2016.


[18] M. Washha, A. Qaroush, and F. Sedes, "Leveraging time for spammers detection on Twitter," in Proc. 8th Int. Conf. Manage. Digit. EcoSyst., Nov. 2016, pp. 109-116.

[19] B. Wang, A. Zubiaga, M. Liakata, and R. Procter, "Making the most of tweet-inherent features for social spam detection on Twitter," 2015, arXiv:1503.07405. [Online]. Available: https://arxiv.org/abs/1503.07405

[20] M. Hussain, M. Ahmed, H. A. Khattak, M. Imran, A. Khan, S. Din, A. Ahmad, G. Jeon, and A. G. Reddy, "Towards ontology-based multilingual URL filtering: A big data problem," J. Supercomput., vol. 74, no. 10, pp. 5003-5021, Oct. 2018.

[21] C. Meda, E. Ragusa, C. Gianoglio, R. Zunino, A. Ottaviano, E. Scillia, and R. Surlinelli, "Spam detection of Twitter traffic: A framework based on random forests and non-uniform feature sampling," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 811-817.

[22] S. Ghosh, G. Korlam, and N. Ganguly, "Spammers' networks within online social networks: A case-study on Twitter," in Proc. 20th Int. Conf. Companion World Wide Web, Mar. 2011, pp. 41-42.


[23] C. Chen, S. Wen, J. Zhang, Y. Xiang, J. Oliver, A. Alelaiwi, and M. M. Hassan, "Investigating the deceptive information in Twitter spam," Future Gener. Comput. Syst., vol. 72, pp. 319-326, Jul. 2017.

[24] I. David, O. S. Siordia, and D. Moctezuma, "Features combination for the detection of malicious Twitter accounts," in Proc. IEEE Int. Autumn Meeting Power, Electron. Comput. (ROPEC), Nov. 2016, pp. 1-6.

[25] M. Babcock, R. A. V. Cox, and S. Kumar, "Diffusion of pro- and anti-false information tweets: The Black Panther movie case," Comput. Math. Org. Theory, vol. 25, no. 1, pp. 72-84, Mar. 2019.

[26] S. Keretna, A. Hossny, and D. Creighton, "Recognising user identity in Twitter social networks via text mining," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Oct. 2013, pp. 3079-3082.

[27] C. Meda, F. Bisio, P. Gastaldo, and R. Zunino, "A machine learning approach for Twitter spammers detection," in Proc. Int. Carnahan Conf. Secur. Technol. (ICCST), Oct. 2014, pp. 1-6.


[28] W. Chen, C. K. Yeo, C. T. Lau, and B. S. Lee, "Real-time Twitter content polluter detection based on direct features," in Proc. 2nd Int. Conf. Inf. Sci. Secur. (ICISS), Dec. 2015, pp. 1-4.

[29] H. Shen and X. Liu, "Detecting spammers on Twitter based on content and social interaction," in Proc. Int. Conf. Netw. Inf. Syst. Comput., Jan. 2015, pp. 413-417.

[30] G. Jain, M. Sharma, and B. Agarwal, "Spam detection in social media using convolutional and long short term memory neural network," Ann. Math. Artif. Intell., vol. 85, no. 1, pp. 21-44, Jan. 2019.

[31] M. Washha, A. Qaroush, M. Mezghani, and F. Sedes, "A topic-based hidden Markov model for real-time spam tweets filtering," Procedia Comput. Sci., vol. 112, pp. 833-843, Jan. 2017.

[32] F. Pierri and S. Ceri, "False news on social media: A data-driven survey," 2019, arXiv:1902.07539. [Online]. Available: https://arxiv.org/abs/1902.07539

[33] S. Sadiq, Y. Yan, A. Taylor, M.-L. Shyu, S.-C. Chen, and D. Feaster, "AAFA: Associative affinity factor analysis for bot detection and stance classification in Twitter," in Proc. IEEE Int. Conf. Inf. Reuse Integr. (IRI), Aug. 2017, pp. 356-365.


[34] M. U. S. Khan, M. Ali, A. Abbas, S. U. Khan, and A. Y. Zomaya, "Segregating spammers and unsolicited bloggers from genuine experts on Twitter," IEEE Trans. Dependable Secure Comput., vol. 15, no. 4, pp. 551-560, Jul./Aug. 2018.



RESUME

1)

PERSONAL DETAILS
NAME: Priyanka Shailesh Bhise
DATE OF BIRTH: 28-03-1998
ADDRESS: At. Post: Dongarkhada, Tq.: Kalamb
MOBILE NO: 9623634620
EMAIL ID: pihubhise28@gmail.com

EDUCATION DETAILS

Examination | Name of Board | Passing Year | % of Marks / CGPA
10th SSC | Maharashtra State Board | 2014 | 81.80
12th HSC | Maharashtra State Board | 2016 | 54.31
Diploma | MSBTE | 2018 | 77.69
BE (Degree) | SGBAU | 2020 | 83.24

B.E. Year | Semester | Passing Year & Month | Marks Obt/Out of | % of Marks | Pointer
IInd Year | III-SEM | Winter-18 | 426/650 | 60.15 | 6.46
IInd Year | IV-SEM | Summer-19 | 554/700 | 74.42 | 8.86
IIIrd Year | V-SEM | Winter-19 | 521/700 | 65.71 | 7.68
IIIrd Year | VI-SEM | Summer-20 | 540/700 | 81.14 | 9.73
IVth Year | VII-SEM | Winter-21 | 673/700 | 96.14 | 10
IVth Year | VIII-SEM | - | - | - | -

PLACEMENT DETAILS
Campus Placement (If Any): No
Name of Company (If Any): -

FUTURE PLANNING
Higher Studies / Job Preferences:
Higher Studies: -
Job: Yes
Training: -
Business: -

Place: Amravati
Date: 17/05/21
Signature: Priyanka S. Bhise

2)

PERSONAL DETAILS
NAME: Vaishnavi Sunil Uttarwar
DATE OF BIRTH: 30-06-1998
ADDRESS: Shivaji Nagar, Arni
MOBILE NO: 7249371275
EMAIL ID: vaishnaviuttarwar64@gmail.com

EDUCATION DETAILS

Examination | Name of Board | Passing Year | % of Marks / CGPA
10th SSC | Maharashtra State Board | 2015 | 81
Diploma | MSBTE | 2018 | 74.38
BE (Degree) | SGBAU | 2021 | 72.63

B.E. Year | Semester | Passing Year & Month | Marks Obt/Out of | % of Marks | Pointer
IInd Year | III-SEM | Winter-17 | 392/650 | 60.6 | 6.77
IInd Year | IV-SEM | Summer-18 | 473/700 | 67.5 | 7.4
IIIrd Year | V-SEM | Winter-18 | 427/700 | 61 | 6.62
IIIrd Year | VI-SEM | Summer-19 | 549/700 | 78 | 9.15
IVth Year | VII-SEM | Winter-19 | 663/700 | 94.71 | 10
IVth Year | VIII-SEM | - | - | - | -

PLACEMENT DETAILS
Campus Placement (If Any): No
Name of Company (If Any): -

FUTURE PLANNING
Higher Studies / Job Preferences:
Higher Studies: -
Job: Yes
Training: -
Business: -

Place: Amravati
Date: 17/05/21
Signature: Vaishnavi S Uttarwar

3)

PERSONAL DETAILS
NAME: Vishal Nana Narote
DATE OF BIRTH: 09-05-1999
ADDRESS: At Deopur, Post Dudha, Tq. and Dist. Buldhana
MOBILE NO: 8308080614
EMAIL ID: vishalnarote891@gmail.com

EDUCATION DETAILS

Examination | Name of Board | Passing Year | % of Marks / CGPA
10th SSC | Maharashtra State Board | 2015 | 88
12th HSC | Maharashtra State Board | 2017 | 63
BE (Degree) | SGBAU | 2021 | 75

B.E. Year | Semester | Passing Year & Month | Marks Obt/Out of | % of Marks | Pointer
Ist Year | I-SEM | Winter-17 | 369/600 | 61.5 | 6.4
Ist Year | II-SEM | Summer-18 | 384/600 | 64 | 7.29
IInd Year | III-SEM | Winter-18 | 415/650 | 67 | 7.5
IInd Year | IV-SEM | Summer-19 | 481/700 | 69 | 7.7
IIIrd Year | V-SEM | Winter-19 | 499/700 | 72 | 8.3
IIIrd Year | VI-SEM | Summer-20 | 597/700 | 85 | 9.8
IVth Year | VII-SEM | Winter-20 | 648/700 | 93 | 9.8
IVth Year | VIII-SEM | - | - | - | -

PLACEMENT DETAILS
Campus Placement (If Any): Yes
Name of Company (If Any): Wipro

FUTURE PLANNING
Higher Studies / Job Preferences:
Higher Studies: -
Job: Yes
Training: -
Business: -

Place: Buldhana
Date: 17/05/2021
Signature: Vishal N Narote

4)

PERSONAL DETAILS
NAME: Karan Subhash Kukade
DATE OF BIRTH: 24-02-1999
ADDRESS: Sant Dhnyaneshwar Ward, Hinganghat
MOBILE NO: 7588302429
EMAIL ID: karankukade18@gmail.com

EDUCATION DETAILS

Examination | Name of Board | Passing Year | % of Marks / CGPA
10th SSC | Maharashtra State Board | 2015 | 89.20
12th HSC | Maharashtra State Board | 2017 | 75.54
BE (Degree) | SGBAU | 2021 | 68.67

B.E. Year | Semester | Passing Year & Month | Marks Obt/Out of | % of Marks | Pointer
Ist Year | I-SEM | Winter-17 | 380/600 | 66.33 | 7.68
Ist Year | II-SEM | Summer-18 | 359/600 | 59.50 | 6.46
IInd Year | III-SEM | Winter-18 | 378/650 | 56.92 | 6.36
IInd Year | IV-SEM | Summer-19 | 446/700 | 62.14 | 7.08
IIIrd Year | V-SEM | Winter-19 | 433/700 | 63.43 | 7.27
IIIrd Year | VI-SEM | Summer-20 | 514/700 | 77.71 | 9.15
IVth Year | VII-SEM | Winter-20 | 414/700 | 94.14 | 10
IVth Year | VIII-SEM | - | - | - | -

PLACEMENT DETAILS
Campus Placement (If Any): Yes
Name of Company (If Any): Mindtree

FUTURE PLANNING
Higher Studies / Job Preferences:
Higher Studies: -
Job: Yes
Training: -
Business: -

Place: Hinganghat
Date: 17/05/2021
Signature: Karan Kukade
