Project Report: Bachelor of Engineering
On
Fake User Identification on Social Networking Site(Twitter)
BACHELOR OF ENGINEERING
Submitted By:-
Priyanka Bhise
Vaishnavi Uttarwar
Vishal Narote
Karan Kukade
By
Priyanka Bhise
Vaishnavi Uttarwar
Vishal Narote
Karan Kukade
In the partial fulfillment of the requirement for the degree of Bachelor of Engineering in
Computer Science & Engineering, during the academic year 2020-2021 under my
guidance.
Prof. A. R. Deshmukh                     Dr. G. R. Bamnote
Guide                                    Head
Department of Computer Sci. & Engg.      Department of Computer Sci. & Engg.
PRMITR, Badnera                          PRMITR, Badnera
Vaishnavi Uttarwar(Roll.No.17)
Vishal Narote(Roll.No.18)
Karan Kukade(Roll.No.19)
Social networking sites engage millions of users around the world. Users' interactions with
these sites, such as Twitter and Facebook, have a tremendous impact on daily life and
occasionally undesirable repercussions. The prominent social networking sites have turned into a
target platform for spammers to disperse huge amounts of irrelevant and deleterious
information. Twitter, for example, has become one of the most heavily used platforms of all
time and therefore attracts an unreasonable amount of spam. Fake users send undesired tweets
to users to promote services or websites, which not only affects legitimate users but also disrupts
resource consumption. Moreover, the possibility of spreading invalid information to users
through fake identities has increased, which results in the circulation of harmful content. Recently,
the detection of spammers and the identification of fake users on Twitter has become a common
area of research in contemporary online social networks (OSNs). In this paper, we
perform a review of techniques used for detecting spammers on Twitter. Moreover, a taxonomy of the
Twitter spam detection approaches is presented that classifies the techniques based on their
ability to detect: (i) fake content, (ii) spam based on URL, (iii) spam in trending topics,
and (iv) fake users. The presented techniques are also compared based on various features, such as user
features, content features, graph features, structure features, and time features. We are hopeful
that the presented study will be a useful resource for researchers to find the highlights of
recent developments in Twitter spam detection on a single platform.
Spammer detection and fake user identification
on social networking sites
1. INTRODUCTION
I. Social Network
Wikipedia defines a social network service as a service which “focuses on the building and
verifying of online social networks for communities of people who share interests and activities, or
who are interested in exploring the interests and activities of others, and which necessitates the
use of software.”
A report published by OCLC provides the following definition of social networking sites: “Web
sites primarily designed to facilitate interaction between users who share interests, attitudes and
activities, such as Facebook, Mixi and MySpace.”
Ease of access to information and applications: The ease of use of many social
networking services can provide benefits to users by simplifying access to other tools
and applications. The Facebook Platform provides an example of how a social
networking service can be used as an environment for other tools.
Common interface: A possible benefit of social networks may be the common
interface which spans work / social boundaries. Since such services are often used in
a personal capacity, the interface and the way the service works may be familiar, thus
minimising the training and support needed to exploit the services in a professional
context. This can, however, also be a barrier to those who wish to have strict
boundaries between work and social activities.
Note that this brief list of popular social networking services omits popular social sharing
services such as Flickr and YouTube.
1.1 SYSTEM STUDY
FEASIBILITY STUDY: The feasibility of the project is analyzed in this phase, and a
business proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis, the feasibility study of the proposed system is
carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major requirements
for the system is essential[7].
1.1.2 TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, since this would in turn place high demands on the
client. The developed system must have modest requirements, so that only minimal or
no changes are required for implementing it.
1.1.3 SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by
the users depends solely on the methods that are employed to educate users about the
system and to make them familiar with it. Their level of confidence must be raised so that they are
also able to offer constructive criticism, which is welcomed, as they are the final users.
1.2 Objectives
1. To propose a system that is more effective and accurate than the existing system.
2. To test the system with real-time data.
3. To study machine learning methodology using real-time datasets with different
characteristics and accomplishments.
1.3 Problem Statement
Some people use social media sites for harmful purposes. They create accounts on
social networking sites using fake information and use them to spread rumours, to pose as
genuine users and befriend people they do not even know, and to harass or blackmail
others. To address these problems, we propose a system that helps to detect fake users
on a social networking site (Twitter).
2. LITERATURE REVIEW
A feature analysis then identifies features that are most predictive for crowd sourced and
journalistic accuracy assessments, results of which are consistent with prior work. We close
with a discussion contrasting accuracy and credibility and why models of nonexperts
outperform models of journalists for fake news detection in Twitter.
The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets
to Twitter users to promote websites or services, which are harmful to normal users. In order to
stop spammers, researchers have proposed a number of mechanisms. The focus of recent works is on
the application of machine learning techniques to Twitter spam detection. However,
tweets are retrieved in a streaming way, and Twitter provides the Streaming API for developers and
researchers to access public tweets in real time. A performance evaluation of
existing machine learning-based streaming spam detection methods is lacking. In this paper, we bridged
the gap by carrying out a performance evaluation from three different aspects:
data, feature, and model. A large ground-truth of over 600 million public tweets was created by
using a commercial URL-based security tool. For real-time spam detection, we further extracted 12
lightweight features for tweet representation. Spam detection was then transformed into a binary
classification problem in the feature space, which can be solved by conventional machine
learning algorithms. We evaluated the impact of different factors on the spam detection
performance, which included spam to nonspam ratio, feature discretization, training data size,
data sampling, time-related data, and machine learning algorithms[9]. The results show that
streaming spam tweet detection is still a big challenge and that a robust detection technique should
take into account the three aspects of data, feature, and model.
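As a rough illustration of the feature-space view described above, the sketch below maps the text of a single tweet to a small numeric vector that a conventional classifier could consume. The five features used here (character length, word count, and counts of URLs, hashtags, and mentions) and the name `extract_features` are assumptions chosen for illustration; the 12 lightweight features of the cited study are not enumerated in this report.

```python
import re

# Hypothetical lightweight patterns; the cited study's 12 features are not listed here.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_features(text):
    """Map one tweet's text to a fixed-length numeric feature vector."""
    return [
        len(text),                       # number of characters
        len(text.split()),               # number of whitespace-separated words
        len(URL_RE.findall(text)),       # number of URLs
        len(HASHTAG_RE.findall(text)),   # number of hashtags
        len(MENTION_RE.findall(text)),   # number of user mentions
    ]


sample = "Win a FREE phone now http://example.com/win #free #prize @someone"
print(extract_features(sample))
```

Each tweet then becomes one row of a feature matrix, and the spam or non-spam label becomes the target of a binary classifier.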
analyze Social Networks, i.e. Twitter, monitoring events and profiling accounts. Unfortunately,
among the huge number of internet users, there are people who use micro blogs for harassing
other people or spreading malicious content. Users' classification and spammers' identification is a
useful technique for relieving Twitter traffic of uninformative content[15]. This work
proposes a framework that exploits a non-uniform feature sampling inside a gray box Machine
Learning System, using a variant of the Random Forests Algorithm to identify spammers inside
Twitter traffic. Experiments are made on a popular Twitter dataset and on a new dataset of
Twitter users. The newly provided Twitter dataset is made up of users labelled as spammers or
legitimate users, described by 54 features. Experimental results demonstrate the effectiveness
of the enriched feature sampling method.
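To make the idea of non-uniform feature sampling more concrete, the sketch below draws a subset of feature indices where each feature has its own selection weight, instead of the uniform draw used by a standard random forest when it picks candidate features. This is a generic weighted-sampling routine written for illustration, not the algorithm of the cited work; the function name `sample_feature_subset` and the example weights are assumptions.

```python
import random


def sample_feature_subset(weights, k, rng):
    """Draw k distinct feature indices; each draw is proportional to the remaining weights.

    Features with zero weight are never selected.
    """
    remaining = {i: w for i, w in enumerate(weights) if w > 0}
    if k > len(remaining):
        raise ValueError("k exceeds the number of features with positive weight")
    chosen = []
    for _ in range(k):
        indices = list(remaining)
        idx = rng.choices(indices, weights=[remaining[i] for i in indices], k=1)[0]
        chosen.append(idx)
        del remaining[idx]  # sample without replacement
    return chosen


rng = random.Random(0)
# Hypothetical weights for four features; feature 1 is excluded entirely.
print(sample_feature_subset([0.5, 0.0, 0.3, 0.2], 3, rng))
```

In a forest, one such subset could be drawn per tree (or per split), so that features believed to be more informative are examined more often.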
3. SYSTEM ARCHITECTURE
3.1 SYSTEM ANALYSIS
3.1.1 EXISTING SYSTEM:
Tingmin et al. provide a survey of new methods and techniques for Twitter spam
detection. The above survey presents a comparative study of the current approaches.
Despite all the existing studies, there is still a gap in the existing literature. Therefore, to
bridge the gap, we review the state of the art in spammer detection and fake user
identification on Twitter[11].
Moreover, the analysis also shows that machine learning-based techniques can be
effective for identifying fake users on Twitter. However, the selection of the most feasible
techniques and methods is highly dependent on the available data.
The proposed system is more effective and accurate than other existing systems.
It is tested with real-time data.
[Figure: system diagram. Labels: Client (Register, Login, View Profile, View Timeline, Tweet on Timeline, file upload, File Download, File Details), TWITTER, ADMIN (View user details, View tweets from Twitter, Analysis of spam tweets from Twitter, Classification).]
The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing carried out on
this data, and the output data generated by this system.
The data flow diagram (DFD) is one of the most important modelling tools. It is used to
model the system components. These components are the system process, the data used
by the process, an external entity that interacts with the system, and the information flows in
the system.
A DFD shows how the information moves through the system and how it is modified by a
series of transformations[21]. It is a graphical technique that depicts information flow
and the transformations that are applied as data moves from input to output.
A DFD may be used to represent a system at any
level of abstraction. A DFD may be partitioned into levels that represent increasing
information flow and functional detail.
[Figure: data flow diagram. USER: Register, Login, View Client Details, View Timeline, View Trending Topic, Compose tweet, Logout. ADMIN: Login, View User Details, Authorized post from the Twitter, View user Tweets, Classification, Fake User Identification, Logout.]
3.2.2 UML DIAGRAMS
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML is comprised of two major components: a meta-
model and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.
The UML represents a collection of best engineering practices that have proven
successful in the modelling of large and complex systems.
The UML is a very important part of developing object-oriented software and the
software development process[18]. The UML uses mostly graphical notations to express the
design of software projects.
GOALS:
7. Integrate best practices.
[Figure: UML diagram with actors USER and ADMIN. Labels: Registration, Login, View Timeline, View profile, Compose tweet, View trending, View User Details, Classification, Logout.]
In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes[29].
[Figure: class diagram. Labels: FAKE, ADMIN, Login, Login, OSNGUI.]
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modelling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system[20]. An activity diagram shows the overall flow of control.
[Figure: activity diagram. Start. USER: View Timeline, Client Login Details, View Trending Topic, Compose tweet, View Following, View Followers. ADMIN: View User, Authorized post from the Twitter, View user Tweets, Classification (Fake Content, URL based spam detect and Trending based), Fake User Identification.]
OBJECTIVES
Input Design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and
show the correct direction to the management for getting correct information from the
computerized system.
It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free
from errors. The data entry screen is designed in such a way that all the data
manipulations can be performed. It also provides record viewing facilities.
When data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is not
left confused at any point. Thus the objective of input design is to create an input layout that
is easy to follow.
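The validity check and user messages described above could look roughly like the following sketch. The form fields (`username`, `email`, `password`), the rules, and the function name `validate_registration` are all hypothetical; the report does not specify which inputs the registration screen collects or how they are validated.

```python
import re

# Deliberately simple pattern for illustration; not a full e-mail validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_registration(form):
    """Return a dict of field name -> message; an empty dict means the input is valid."""
    errors = {}
    if not form.get("username", "").strip():
        errors["username"] = "Username is required."
    if not EMAIL_RE.match(form.get("email", "")):
        errors["email"] = "Enter a valid email address."
    if len(form.get("password", "")) < 8:
        errors["password"] = "Password must be at least 8 characters."
    return errors


print(validate_registration({"username": "alice", "email": "alice@example", "password": "secret"}))
```

Returning one message per field lets the data entry screen show each message next to the input it refers to.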
1. Designing computer output should proceed in an organized, well thought out manner; the
right output must be developed while ensuring that each output element is designed so that
people will find the system easy and effective to use. When analysts design computer
output, they should identify the specific output that is needed to meet the requirements.
In order to develop an approach for detecting spammers on Twitter, a labelled
collection of users, pre-classified as fake or legitimate, has been built[27]. Next,
the steps needed to construct the labelled collection and to
acquire its various desired properties are taken.
In other words, these are the steps that must be examined to develop a collection of users that
can be labelled as fake users or legitimate users. At the end, user attributes are
identified based on their behavior, e.g., who they interact with and what the frequency of
their interaction is.
In order to confirm this intuition, the features of users in the labelled collection have been
checked. Two attribute sets are considered, i.e., content attributes and user behavior
attributes, to differentiate one user from the other.
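The two attribute sets mentioned above can be sketched as follows. The record layout `UserRecord` and the four attributes computed here (follower-to-following ratio, tweets per day, fraction of tweets containing a URL, fraction of duplicate tweets) are assumptions for illustration only; the report does not list the exact attributes that were used.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical per-user record from the labelled collection."""
    followers: int
    following: int
    account_age_days: int
    tweets: list  # tweet texts as strings


def behavior_attributes(user):
    """User-behavior attributes: how the account connects and how often it posts."""
    return {
        "follower_following_ratio": user.followers / max(user.following, 1),
        "tweets_per_day": len(user.tweets) / max(user.account_age_days, 1),
    }


def content_attributes(user):
    """Content attributes: what the account's tweets contain."""
    if not user.tweets:
        return {"url_fraction": 0.0, "duplicate_fraction": 0.0}
    n = len(user.tweets)
    with_url = sum("http://" in t or "https://" in t for t in user.tweets)
    return {
        "url_fraction": with_url / n,
        "duplicate_fraction": 1 - len(set(user.tweets)) / n,
    }


u = UserRecord(10, 200, 5, ["buy http://a.example", "buy http://a.example", "hello", "hi"])
print(behavior_attributes(u), content_attributes(u))
```

Concatenating both dictionaries gives one feature row per user, with the fake or legitimate label as the target.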
4. IMPLEMENTATION
MODULES:
Admin Module
Data Collection
Train and Test
Machine Learning Technique
Detection of Fake User
MODULE DESCRIPTIONS:
4.1 Admin Module:
In the first module, we develop the Online Social Networking (OSN) system module. We build up the
system with the features of an online social networking system, Twitter. This module is
used for admin login with authentication.
• User, which contains information about the user that created the tweet, like the username and
user id.
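A minimal sketch of reading the user information from a collected tweet is given below. It assumes the tweet arrives as JSON with a nested `user` object carrying `id` and `screen_name`, which is how the Twitter v1.1 tweet format is commonly described; the sample payload is made up, and `tweet_author` is a hypothetical helper name.

```python
import json

# Made-up sample payload; real tweets carry many more fields.
raw_tweet = '{"id_str": "1", "text": "hello world", "user": {"id": 42, "screen_name": "example_user"}}'


def tweet_author(raw):
    """Return (user id, username) for the account that created the tweet."""
    tweet = json.loads(raw)
    user = tweet.get("user", {})
    return user.get("id"), user.get("screen_name")


print(tweet_author(raw_tweet))  # prints (42, 'example_user')
```

Using `.get` keeps the helper from failing on records where the user object is missing, in which case it returns `(None, None)`.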
4.5 Detection
This module helps to detect the particular user.
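The module sequence listed earlier (Data Collection, Train and Test, Machine Learning Technique, Detection of Fake User) can be sketched end to end as below. This is only an illustration under stated assumptions: the toy rows, the two features (URL fraction and tweets per day), and the nearest-centroid classifier are stand-ins chosen for brevity, since the report does not name the learning algorithm or the dataset.

```python
import random


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_centroids(train):
    """Average the feature vectors of each class label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Toy labelled collection: [url_fraction, tweets_per_day] -> label (made-up values).
data = [
    ([0.90, 40.0], "fake"), ([0.80, 55.0], "fake"),
    ([0.95, 60.0], "fake"), ([0.85, 45.0], "fake"),
    ([0.10, 2.0], "legitimate"), ([0.20, 5.0], "legitimate"),
    ([0.05, 1.0], "legitimate"), ([0.15, 3.0], "legitimate"),
]
train, test = train_test_split(data, test_fraction=0.25, seed=0)
centroids = fit_centroids(train)
accuracy = sum(predict(centroids, f) == y for f, y in test) / len(test)
print(accuracy)
```

On real data the features would come from the collected tweets and user records, and the nearest-centroid step would be replaced by whichever classifier the Machine Learning Technique module uses.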
5. CONCLUSION
Despite the development of successful strategies for fake user identification on Twitter, there
are still many open problems for researchers to address. The issues are highlighted
as follows: Fake user identification on social media is a problem that needs to be explored
because of its serious repercussions at the individual level as well as at other levels.
Another related subject that is worth exploring is the discovery of rumor sources on social
media. While a few experiments focused on different techniques have already been
performed to identify the origins of misinformation, more advanced approaches, e.g., social
network-based approaches, can be extended because of their demonstrated efficacy.
6. REFERENCES
[1] B. Erşahin, Ö. Aktaş, D. Kılınç, and C. Akyol, "Twitter fake account detection," in Proc.
Int. Conf. Comput. Sci. Eng. (UBMK), Oct. 2017, pp. 388-392.
[3] S. Gharge and M. Chavan, "An integrated approach for malicious tweets detection using
NLP," in Proc. Int. Conf. Inventive Commun. Comput. Technol. (ICICCT), Mar. 2017, pp.
435-438.
[4] T. Wu, S. Wen, Y. Xiang, and W. Zhou, "Twitter spam detection: Survey of new
approaches and comparative study," Comput. Secur., vol. 76, pp. 265-284, Jul. 2018.
1-6.
[8] N. Eshraqi, M. Jalali, and M. H. Moattar, "Detecting spam tweets in Twitter using a data
stream clustering algorithm," in Proc. Int. Congr. Technol., Commun. Knowl. (ICTCK),
Nov. 2015, pp. 347-351.
[12] G. Stafford and L. L. Yu, "An evaluation of the effect of spam on Twitter trending topics," in
Proc. Int. Conf. Social Comput., Sep. 2013, pp. 373-378.
[13] M. Mateen, M. A. Iqbal, M. Aleem, and M. A. Islam, "A hybrid approach for spam
detection for Twitter," in Proc. 14th Int. Bhurban Conf. Appl. Sci. Technol. (IBCAST),
Jan.
[14] A. Gupta and R. Kaushal, "Improving spam detection in online social networks," in Proc.
Int. Conf. Cogn. Comput. Inf. Process. (CCIP), Mar. 2015, pp. 1-6.
[16] V. Chauhan, A. Pilaniya, V. Middha, A. Gupta, U. Bana, B. R. Prasad, and S. Agarwal,
"Anomalous behavior detection in social networking," in Proc. 8th Int. Conf. Comput.,
Commun. Netw. Technol. (ICCCNT), Jul. 2017, pp. 1-5.
[17] S. Jeong, G. Noh, H. Oh, and C.-K. Kim, "Follow spam detection based on cascaded social
information," Inf. Sci., vol. 369, pp. 481-499, Nov. 2016.
[19] B. Wang, A. Zubiaga, M. Liakata, and R. Procter, "Making the most of tweet-inherent
features for social spam detection on Twitter," 2015, arXiv:1503.07405. [Online].
Available:
https://arxiv.org/abs/1503.07405
[21] C. Meda, E. Ragusa, C. Gianoglio, R. Zunino, A. Ottaviano, E. Scillia, and R. Surlinelli,
"Spam detection of Twitter traffic: A framework based on random forests and non-uniform
feature sampling," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining
(ASONAM), Aug. 2016, pp. 811-817.
[26] S. Keretna, A. Hossny, and D. Creighton, "Recognising user identity in Twitter social
networks via text mining," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Oct. 2013, pp.
3079-3082.
[27] C. Meda, F. Bisio, P. Gastaldo, and R. Zunino, "A machine learning approach for Twitter
spammers detection," in Proc. Int. Carnahan Conf. Secur. Technol. (ICCST), Oct. 2014, pp.
1-6.
[29] H. Shen and X. Liu, "Detecting spammers on Twitter based on content and social
interaction," in Proc. Int. Conf. Netw. Inf. Syst. Comput., pp. 413-417, Jan. 2015.
2017.
[32] F. Pierri and S. Ceri, "False news on social media: A data-driven survey," 2019,
arXiv:1902.07539. [Online]. Available: https://arxiv.org/abs/1902.07539
[33] S. Sadiq, Y. Yan, A. Taylor, M.-L. Shyu, S.-C. Chen, and D. Feaster, "AAFA: Associative
affinity factor analysis for bot detection and stance classification in Twitter," in Proc. IEEE Int.
Conf. Inf. Reuse Integr. (IRI), Aug. 2017, pp. 356-365.
Dependable Secure Comput., vol. 15, no. 4, pp. 551-560, Jul./Aug. 2018.
1)
PERSONAL DETAILS
NAME Priyanka Shailesh Bhise
DATE OF BIRTH 28-03-1998
ADDRESS At. Post: Dongarkhada, Tq: Kalamb
MOBILE NO 9623634620
EMAIL_ID pihubhise28@gmail.com
EDUCATION DETAILS
PLACEMENT DETAILS
Campus Placement (If Any) No
Name Of Company (If Any)
FUTURE PLANNING
Higher Studies / Job Preferences
Higher Studies -
Job Yes
Training -
Business -
Place: Amravati Signature
Date: 17/05/21 Priyanka S. Bhise
2)
PERSONAL DETAILS
NAME Vaishnavi Sunil Uttarwar
DATE OF BIRTH 30-06-1998
ADDRESS Shivaji Nagar, Arni
MOBILE NO 7249371275
EMAIL_ID vaishnaviuttarwar64@gmail.com
EDUCATION DETAILS
PLACEMENT DETAILS
Campus Placement (If Any) No
Name Of Company (If Any)
FUTURE PLANNING
Higher Studies / Job Preferences
Higher Studies -
Job Yes
Training -
Business -
Place: Amravati Signature
Date: 17/05/21 Vaishnavi S Uttarwar
3)
PERSONAL DETAILS
NAME Vishal Nana Narote
DATE OF BIRTH 09-05-1999
ADDRESS At Deopur Post Dudha Tq And Dist. Buldhana
MOBILE NO 8308080614
EMAIL_ID vishalnarote891@gmail.com
EDUCATION DETAILS
              Name of Board             Passing Year   % of Marks/CGPA
10th SSC      Maharashtra State Board   2015           88
12th HSC      Maharashtra State Board   2017           63
BE (Degree)   SGBAU                     2021           75

Bachelor of Engineering (B.E.)   Passing Year & Month   Marks Obt/Out of   % of Marks   Pointer
Ist Year     I-SEM               Winter-17              369/600            61.5         6.4
             II-SEM              Summer-18              384/600            64           7.29
IInd Year    III-SEM             Winter-18              415/650            67           7.5
             IV-SEM              Summer-19              481/700            69           7.7
IIIrd Year   V-SEM               Winter-19              499/700            72           8.3
             VI-SEM              Summer-20              597/700            85           9.8
IVth Year    VII-SEM             Winter-20              648/700            93           9.8
             VIII-SEM            -                      -                  -            -
PLACEMENT DETAILS
Campus Placement (If Any) Yes
Name Of Company (If Any) Wipro
FUTURE PLANNING
Higher Studies / Job Preferences
Higher Studies -
Job Yes
Training -
Business -
Place: Buldhana Signature
Date: 17/05/2021 Vishal N Narote
4)
PERSONAL DETAILS
NAME Karan Subhash Kukade
DATE OF BIRTH 24-02-1999
ADDRESS Sant Dhnyaneshwar ward, Hinganghat
MOBILE NO 7588302429
EMAIL_ID karankukade18@gmail.com
EDUCATION DETAILS
              Name of Board             Passing Year   % of Marks/CGPA
10th SSC      Maharashtra State Board   2015           89.20
12th HSC      Maharashtra State Board   2017           75.54
BE (Degree)   SGBAU                     2021           68.67

Bachelor of Engineering (B.E.)   Passing Year & Month   Marks Obt/Out of   % of Marks   Pointer
Ist Year     I-SEM               Winter-17              380/600            66.33        7.68
             II-SEM              Summer-18              359/600            59.50        6.46
IInd Year    III-SEM             Winter-18              378/650            56.92        6.36
             IV-SEM              Summer-19              446/700            62.14        7.08
IIIrd Year   V-SEM               Winter-19              433/700            63.43        7.27
             VI-SEM              Summer-20              514/700            77.71        9.15
IVth Year    VII-SEM             Winter-20              414/700            94.14        10
             VIII-SEM            -                      -                  -            -
PLACEMENT DETAILS
Campus Placement (If Any) Yes
Name Of Company (If Any) Mindtree
FUTURE PLANNING
Higher Studies / Job Preferences
Higher Studies -
Job Yes
Training -
Business -