Identification of Online Child Predators and Cyber Harassers in Social Media Environments


K Sowjanya Lakshmi
Dept. of Computer Science and Engineering
Vignan’s Foundation for Science, Technology and Research
sowjikolasani29@gmail.com

K Sumana Angel
Dept. of Computer Science and Engineering
Vignan’s Foundation for Science, Technology and Research
sumana.angel00@gmail.com

G Bhoomika
Dept. of Computer Science and Engineering
Vignan’s Foundation for Science, Technology and Research
gavinibhumika@gmail.com

Dr J Vinoj
Assistant Professor
Dept. of Computer Science and Engineering
Vignan’s Foundation for Science, Technology and Research
vinojbu@gmail.com

Abstract—Social media platforms have become an essential part of our lives in the current digital era, linking individuals across the globe. Although these platforms are very beneficial, they have also exposed children and other vulnerable people to a variety of online threats. These threats include cyberbullies and child predators who prey on children online and exploit the anonymity and reach of social media to harm others. In the past, platforms relied mostly on manual reporting and human moderators to counter these threats: after users reported suspicious activity, moderators reviewed the content to decide whether it broke any site rules. This reactive strategy frequently caused delays, allowing harmful content to spread before appropriate action could be taken. Recognizing the need for more proactive and effective solutions, researchers have turned to machine learning, a subfield of artificial intelligence that enables computers to learn from data and make predictions. The objective of this work is an automated system that uses machine learning techniques to identify potential online child predators and cyber harassers quickly and effectively. Compared with conventional techniques, the proposed machine learning-based approach has several advantages; the main one is that it considerably speeds up platform response times, allowing offensive content and users to be removed promptly.

Keywords—child predators, cyber harassers, natural language processing, machine learning

I. INTRODUCTION
Social media platforms have completely changed how we connect and communicate with others in the current digital era. Without question, these platforms have created a great deal of opportunity for global engagement, but they have also created a number of serious issues. These include people who abuse the anonymity and reach of the internet for malicious purposes, such as child predators and cyber harassers. Protecting people, especially children, from these online dangers has become a top priority. Since the early days of the internet, when problems such as child predators and cyber harassers first surfaced, there have been efforts to respond to online threats, and numerous initiatives have been taken to counter these risks over the years.
These initiatives have included legal action, user-focused education campaigns, and technology-based solutions. With the development of technology, especially in data analysis and machine learning, new avenues opened up for more accurate detection and mitigation of these online threats. Practical reasons dictate the need for an automated method to detect online child predators and cyber harassers:
-Scale: Manual monitoring is not practicable given the huge amount of online material. Processing and analyzing such large datasets effectively requires automated methods.
-Speed: Online threats can grow quickly. Quick detection and action are essential to stop the damage.
-Complexity: Identifying predatory or abusive activity frequently requires analyzing language, photos, and user behavior patterns. Data analysis and machine learning techniques can greatly improve this identification.
The "Identification of Online Child Predators and Cyber Harassers" application presented here is a web application built with the Django framework. To address the complex issues posed by online dangers, it integrates several key components: machine learning algorithms, content monitoring, user registration, and an admin panel. Users can report suspicious activity they come across, while administrators can monitor, evaluate, and act against harmful content and people.
In summary, tools and systems such as the Django-based application presented here play a crucial role in mitigating the enduring problems posed by cyberbullies and child predators in the modern digital environment.
These technologies aim to protect privacy and security while fostering safer online settings, especially for children and other vulnerable populations, by integrating technology, user interaction, and regulatory compliance.
II. LITERATURE REVIEW
Online harassment has been a widespread problem since the early days of social media, and it remains one. The initial goal of research in this area was to create an automated system that could identify and report this kind of wrongdoing. Two families of methods, machine learning and deep learning, have been studied to prevent or identify instances of sexual harassment and to shield children from bullying, in order to provide a secure environment. Using fuzzy logic and genetic algorithms, the authors of [2] monitored the incidence of cyberbullying on social media platforms. They recognized and categorized offensive, harassing, racist, and terroristic remarks, as well as other cyberbullying-related words and actions on social media, and obtained an F-measure of 0.91. A genetic algorithm is employed to optimize the parameters and obtain the right performance.
The authors of [3] used three term-weighting schemes to filter Facebook messages: term frequency-inverse document frequency (TF-IDF), entropy, and a modified TF-IDF for feature selection. Recall, accuracy, and precision were measured for a Support Vector Machine (SVM) classifier. According to the test results, the modified TF-IDF scheme outperforms the other schemes, with an accuracy of 96.50%.
The study in [4] tested several supervised machine learning algorithms to analyze online harassment in Twitter messages, as part of a competition on social media and harassment. TF-IDF and Word2Vec embeddings were used to extract features. The results correctly covered more than 80% of the forms of harassment considered in the data.
The study in [5] combines a state-of-the-art sentence-vector approach with emotion analysis. As a new approach to identifying sexual predators, word vectors are generated using a Long Short-Term Memory recurrent neural network (LSTM-RNN) linguistic model. The final step, extracting an emotion value from the SoftMax layer outputs, achieved a recall of 81.10% and improved accuracy.
The authors of [6] used a CNN to extract features from tags and build a classifier for Twitter posts with malicious intent. They applied it to a four-month Twitter dataset to identify the circumstances surrounding stories that carried such intent, with the aim of informing laws against gender-based violence.
Sweta Karlekar [7] describes the work of the SafeCity web community in categorizing and rating various kinds of sexual harassment. SafeCity uses the experiences shared by victims to help them build online directories, to provide more comprehensive safety advice, and to help others find relevant cases in order to prevent further sexual assault. The single-label CNN-RNN model achieves 86.5% accuracy in processing, connecting, and annotating tags.
Espinoza [8] builds a new Twitter dataset covering four harassment-detection categories. Two deep learning architectures, CNN and LSTM, were used to classify the tweets. The F1 score was 55% during training but only 46% on the test set.
Arijit Ghosh Chowdhury [9] proposes a disclosure language model. The ULMFiT fine-tuning architecture consists of a language model, an intermediate domain-specific stage (Twitter), and a task-specific classifier. The overall comparison shows the benefit of choosing lightweight, task-specific language models based on LSTMs, with an enriched vocabulary that captures the linguistic subtleties of text describing sexual harassment.
About 10,000 personal accounts of sexual harassment were annotated, and the neural network models achieved 92.9% accuracy in automatic story classification. The classification was further advanced by examining the importance of individual features.

III. SYSTEM ANALYSIS
Identifying malicious content on a site involves combining several Python modules, such as pandas, with machine learning methods.
The first step applies statistical analysis to a number of posts to identify any malicious activity. Users whose degree of suspicion rises above a predetermined cutoff are then categorized as suspects.
Next, the posts made by each suspected user are examined thoroughly, including any multimedia content such as pictures, videos, and audio files. Artificial intelligence is used together with image and audio analysis techniques to perform this analysis and determine whether the suspect is a predator.
The outcomes of this procedure help identify trends in child grooming. Finally, information on possible predators is forwarded to law enforcement.

A. EXISTING SYSTEM
Techniques already exist for locating child predators in online gaming, voice chat, and other online entertainment. With these techniques, parents can shield their children from sexual exploitation while they play online games or take part in voice chats. However, with the prevalence of the internet today, many children are turning to social networking sites as their main way to communicate with others.
Because these sites lack detection systems for sexual predators, children who use them are exposed to danger. The current method uses five algorithms to classify conversations. They include a conversation-centered method that applies a Ridge or Naive Bayes classifier to the TF-IDF feature set, and a neural network classifier that also processes the TF-IDF feature set.
The proposed system instead employs a novel approach for text and image categorization based on a supervised machine learning technique, the Support Vector Machine (SVM), which solves two-class (binary) classification problems.

B. PROPOSED SYSTEM
Our project's goal is to find instances of child harassment on social media by applying a number of machine
learning techniques, including K-Nearest Neighbors, Random Forest, Support Vector Machine (SVM), Naive Bayes, and Decision Tree. All the models are trained on a combination of phrases and messages considered normal and messages considered harassing. Once trained, a model is applied to user posts to identify whether they contain harassing or normal content.
This project uses the Django framework to build a web application for identifying child predators and cyberbullies in social media settings. The project also presents the backend logic of this application, which is intended to track and detect child predators and cyberbullies on social media. Users can register, log in, and post, and machine learning techniques categorize text messages as potentially harmful or not. Both users and administrators can view web pages displaying the results.

Figure 1: Proposed system architecture (components: Normal User, Predator User, Post & Comment, Detection System, Text Classification, SVM for Image Detection, Dataset, Cyber Admin)

The following are the primary elements and features:
⚫ Import Statements: The application begins by importing the Django modules and Python libraries it uses. These libraries provide web development, data processing, machine learning, and database access utilities.
⚫ Global Variables: Several global variables are introduced at the outset for data processing and machine learning, including classifier, label_count, X, Y, and corpus.
⚫ Django Views: Several Django views are defined, each linked to a unique URL endpoint. They include Index, SendPost, Register, Admin, Login, AddCyberMessages, RunAlgorithms, MonitorPost, AddBullyingWords, Signup, UserLogin, AdminLogin, ViewUsers, ViewUserPost, word_count, prediction, cal_accuracy, and classifyPost. These views handle HTTP requests and render HTML templates for the application's pages.
⚫ Database Access: The pymysql package is used to connect to a MySQL database, through which the application retrieves and inserts data, including user and post information.
⚫ Machine Learning Algorithms: A dataset named "dataset.txt" is loaded and preprocessed, and machine learning models (SVM, Decision Tree, K-Nearest Neighbors, Random Forest, and Naive Bayes) are trained to categorize text data. The classifier variable holds the chosen model for later use.
⚫ Web Forms Handling: Functions such as AddBullyingWords, Signup, UserLogin, and AdminLogin handle user-submitted forms and carry out tasks such as adding information to the database or verifying user credentials.
⚫ File Upload: The application manages file uploads, including text files with messages to be categorized and pictures from user profiles.
⚫ HTML Templates: The user interface is rendered with HTML templates such as "index.html", "SendPost.html", "Register.html", and "Admin.html".
⚫ Data Processing: The preprocessing step tokenizes words, removes special characters, and converts text to lowercase.
⚫ Classification: Text messages are classified with the machine learning models, and the results are shown to the user.
⚫ Session Management: Upon successful login, the application saves the username in a file called "session.txt" to maintain the user session.
⚫ Presentation of Results: HTML templates present the categorization results to the user along with other relevant data.

C. ALGORITHM
SVM uses a hyperplane to partition the dataset into distinct classes, choosing the hyperplane with the largest margin. The main objective of SVM is to find the hyperplane in a high-dimensional space that optimally separates the data points into classes: this hyperplane is the decision boundary, with the points of one class on one side and the points of the other class on the opposite side.
In short, SVM looks for the hyperplane that maximizes the margin, which is the gap between the decision boundary and the nearest data points on either side (the support vectors). A maximum margin improves the SVM's ability to generalize, that is, to correctly label unseen data. Labeled data is fed to the model during training. In the prediction phase, the trained SVM assigns each new data point to a class according to the side of the hyperplane on which it falls.
Fig: SVM Algorithm
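The SVM description above can be made concrete with a small, self-contained sketch. It is not the authors' implementation: the paper trains SVM and other models on "dataset.txt" through its RunAlgorithms view, whereas the sketch below uses plain Python, a bag-of-words representation, and the Pegasos stochastic subgradient method (a swapped-in training technique) to minimize the soft-margin SVM objective (λ/2)·||w||² + mean(max(0, 1 − y·w·x)). The preprocessing mirrors the Data Processing step (lowercasing, removing special characters, tokenizing). For simplicity the hyperplane passes through the origin (no bias term, as in basic Pegasos), and the function names, the regularization strength lam, the epoch count, and the toy messages are illustrative assumptions.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase, replace special characters with spaces, and tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def featurize(text):
    """Bag-of-words term counts (a stand-in for the paper's feature extraction)."""
    return Counter(preprocess(text))


def score(w, x):
    """Decision value w . x; its sign gives the predicted class."""
    return sum(w.get(term, 0.0) * count for term, count in x.items())


def train_pegasos(texts, labels, lam=0.01, epochs=20):
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y * w.x)) with Pegasos steps.

    labels: +1 for harassing, -1 for normal (hypothetical encoding).
    Examples are visited in a fixed order so that runs are reproducible.
    """
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos step size
            margin = y * score(w, x)
            # Gradient of the L2 regularizer shrinks every weight toward zero.
            for term in w:
                w[term] *= 1.0 - eta * lam
            # The hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                for term, count in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * count
    return w


def predict(w, text):
    """Return +1 (harassing) or -1 (normal) by the side of the hyperplane."""
    return 1 if score(w, featurize(text)) > 0 else -1


if __name__ == "__main__":
    texts = [
        "you are so stupid",
        "shut up loser",
        "everyone hates you, loser!",
        "see me at practice tomorrow",
        "happy birthday, have a great day",
        "thanks for the help with homework",
    ]
    labels = [1, 1, 1, -1, -1, -1]
    w = train_pegasos(texts, labels)
    print(predict(w, "You STUPID loser!!"))     # 1
    print(predict(w, "have a great practice"))  # -1
```

Because the toy harassing and normal messages share no words, every harassing token ends up with a positive weight and every normal token with a negative one, so the two test messages fall on opposite sides of the hyperplane. On real data, a library implementation such as scikit-learn's LinearSVC or SVC would normally be used instead.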
IV. DATASET
The dataset contains two columns, "Tweet" and "Text Label":
-Tweet: Each row holds a single short text message ("tweet") posted by a user on a social networking site such as Twitter. Tweets are usually limited to a fixed number of characters (for example, 280 on Twitter).
-Text Label: Each row holds the label or category assigned to the tweet in the same row of the "Tweet" column. Labels can indicate subjects or themes of the tweet, such as "sports", "politics", or "entertainment", or whether a tweet is positive, negative, or neutral.
Datasets of this kind are commonly used in machine learning, sentiment analysis, text categorization, and natural language processing (NLP) applications. They enable researchers and analysts to develop and evaluate algorithms that automatically classify or examine Twitter content according to the given text labels.

A. DATABASE CREATION
The first set of SQL queries in this project creates a MySQL database called "cyber" with two tables, "users" and "posts".
Table of Users:
-username (varchar(50)): Stores account usernames, up to 50 characters long.
-password (varchar(50)): Stores user passwords. In a production system, passwords should be stored securely hashed, not in plain text.
-contact_no (varchar(12)): Stores users' contact numbers, up to 12 characters long.
-email (varchar(50)): Stores users' email addresses.
-address (varchar(50)): Stores users' addresses.
-status (varchar(30)): Stores the status of the user or other additional information.
Table of Posts:
-Sender (varchar(50)): Stores the username or identity of the post's sender.
-filename (varchar(50)): Stores the name of a file attached to the post.
-msg (varchar(300)): Stores the post's message content, up to 300 characters long.
-posttime (timestamp): Records the post's creation time; when a new post is made, the date and time are logged automatically.
-status (varchar(50)): Stores the status of the post or other information about it, much like the "status" field in the "users" table.

V. IMPLEMENTATION AND RESULTS
To create a new user account, open the screen shown above and click the "Register Here" link.
Enter the relevant information in the form shown above and click the "Register" button.
After completing the sign-up process on the screen above, select the "Administrator" link and log in as the administrator to view the new user's details.
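Section IV-A notes that passwords should be stored hashed rather than in plain text, and the sign-up step just described is where that hashing would happen. The following is a minimal sketch using only Python's standard library (hashlib.pbkdf2_hmac with a random per-user salt). The helper names hash_password and verify_password and the iteration count are assumptions for illustration, not part of the described application.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the deployment hardware


def hash_password(password, salt=None):
    """Return 'salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 and a random salt."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


if __name__ == "__main__":
    stored = hash_password("s3cret")
    print(len(stored))                        # 97
    print(verify_password("s3cret", stored))  # True
    print(verify_password("wrong", stored))   # False
```

The stored string is 97 characters long (32 hex characters of salt, a separator, and 64 hex characters of SHA-256 output), so the password column would need to be wider than the varchar(50) listed for the users table. Because the application is built on Django, the make_password and check_password helpers in django.contrib.auth.hashers are a ready-made alternative.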
To reach the next screen, log in as the "admin" user on the screen above by entering "admin" as both the username and password. Upon successful login, the following screen becomes accessible.
Now, the administrator can click the "View Users" link to see the list of all users.
In the above screen, the creation of the "Rajesh" account is shown. By clicking the "Monitor Posts" button, the administrator can view the history of posts made by users.
The figure above shows the page where administrators can add specific words or messages to the dataset. These words are typically considered bullying or related to cyber harassment, and the dataset is used to train machine learning models to detect such content.
In the figure above, the admin selects each algorithm in turn and presses the "Submit" button to train the model; the accuracy of each algorithm is then shown. The admin must repeat these steps every time the server is restarted or new bullying messages are added, and must run at least one algorithm before the system can automatically detect whether a user is a harasser.
The "Send Post" page of the user interface is shown in Figure 10.12. On this page, users can write and send a message with a photo. Its main components are:
➢ Text input field: allows users to type messages.
➢ File upload: allows users to include a picture or photo in their post.
➢ Send button: submits the post.
The View Post page displays all messages, with uploaded photos, posted by users. The figure shows all user posts and shows that the proposed system uses machine learning to identify whether each message comes from a cyber harasser or a non-harasser. Machine learning models trained on the dataset records predict whether a term is harassing or not. Through the "add words" module described earlier, the admin can add to the dataset both terms that indicate a potential harasser and terms that do not. After terms are added, the algorithms are run to train the model, and the application then automatically predicts whether the person is harassing or not.
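Section III describes flagging users whose degree of suspicion rises above a predetermined cutoff, while the screens above predict harasser or non-harasser for individual posts. The sketch below shows one way post-level predictions could be aggregated into a per-user decision. The keyword matcher only stands in for a trained classifier, and the word list, the cutoff of 0.5, and the minimum of two posts per user are hypothetical choices; the (sender, msg) pairs mirror the Sender and msg columns of the posts table.

```python
from collections import defaultdict

# Hypothetical stand-in for words an admin adds via AddBullyingWords;
# in the described system a trained classifier would make this decision.
BULLYING_WORDS = {"stupid", "loser", "ugly"}


def is_harassing(msg):
    """Toy post classifier: True if any listed word appears in the message."""
    tokens = {tok.strip(".,!?").lower() for tok in msg.split()}
    return not tokens.isdisjoint(BULLYING_WORDS)


def flag_suspects(posts, cutoff=0.5, min_posts=2):
    """Return senders whose share of harassing posts exceeds `cutoff`.

    posts: iterable of (sender, msg) pairs. Senders with fewer than
    `min_posts` posts are skipped so a single message cannot decide.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for sender, msg in posts:
        totals[sender] += 1
        if is_harassing(msg):
            flagged[sender] += 1
    return sorted(
        sender
        for sender, n in totals.items()
        if n >= min_posts and flagged[sender] / n > cutoff
    )


if __name__ == "__main__":
    posts = [
        ("user1", "Good luck in the match today!"),
        ("user1", "See you at school"),
        ("user2", "You are so stupid"),
        ("user2", "nobody likes you, loser"),
        ("user2", "whatever"),
        ("user3", "ugly"),
    ]
    print(flag_suspects(posts))  # ['user2']
```

Here user2 is flagged because two of three posts match, while user3 is skipped for having only one post; lowering min_posts to 1 would flag user3 as well. Normalizing by the number of posts keeps prolific but benign users from being flagged for a single borderline message.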
VI. CONCLUSION
This study presents the backend logic of a web application focused on cyberbullying detection and user management. It represents a thorough strategy for addressing cyberbullying-related problems and improving user experiences. A key feature of the application is user management: users can sign up, log in, and check their profiles. User data such as usernames, passwords (which should be further protected using methods like hashing and salting), contact details, and status are stored in a MySQL database. This feature is the cornerstone of both the user experience and the administration of user interactions. One of the application's most notable features is its ability to identify cyberbullying. The program extracts relevant features from text messages and keeps them in a dataset for training and inference. Users can submit posts with sender names, messages, timestamps, statuses, and filenames (for attached photos).
These posts are recorded in a database, forming a thorough record of user interactions. The program provides a set of user-friendly web pages through which users can register, log in, view profiles, write posts, run machine learning algorithms for detection, add cyberbullying-related phrases to the dataset, and monitor posts. This accessible and intuitive interface makes user involvement easier.
Online predation poses serious hazards, and the costs of the sexual exploitation of children, to the children themselves and to society, are too great to ignore. Child groomers usually pose as children with similar interests and hobbies in order to build relationships with children and gain access to them; the goal is to establish a trusting relationship with the child. To protect children, our initiative looks for such predators and, if it finds any, immediately notifies the cyber administrative authorities so that proper action can be taken. Examining questionable content on a platform involves the following sequential steps:
➢ Collecting the posts made by the suspected person, including multimedia content such as pictures, music, and videos.
➢ Analyzing the acquired data using the NSFW library, artificial intelligence, urllib, and the IGPL Python module.
➢ Classifying the person as a predator or a suspect, depending on the results of the investigation.
➢ Analyzing child grooming practices and statistical data to identify the person as a predator.
➢ Automatically sending the predator classification to a server-stored Gmail account.

VII. FUTURE WORK
Although the current application already performs well, there is still considerable room for improvement and growth. Directions include stronger user authentication and ongoing optimization of the machine learning models for cyberbullying detection through the investigation of other algorithms, feature engineering approaches, and hyperparameter tuning.

REFERENCES
[1] N. Amer, "Arabic-sexual-harassment-dataset." Available: https://github.com/Nooramer8/Arabic-sexual-harassment-dataset [Accessed: 09-10-2023].
[2] B. S. Nandhini and J. Sheeba, "Online social network bullying detection using intelligence techniques," Procedia Computer Science, vol. 45, pp. 485-492, 2015, doi: 10.1016/j.procs.2015.03.085.
[3] A. S. A. Al-Katheri and M. M. Siraj, "Classification of sexual harassment on Facebook using term weighting schemes," International Journal of Innovative Computing, vol. 8, no. 1, pp. 15-19, 2018, doi: 10.11113/ijic.v8n1.157.
[4] M. Saeidi, S. Sousa, E. Milios, N. Zeh, and L. Berton, "Categorizing online harassment on Twitter," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2019, pp. 283-297, doi: 10.1007/978-3-030-43887-6_22.
[5] D. Liu, C. Y. Suen, and O. Ormandjieva, "A novel way of identifying cyber predators," arXiv:1712.03903, 2017, doi: 10.48550/arXiv.1712.03903.
[6] R. Pandey et al., "Distributional semantics approach to detect intent in Twitter conversations on sexual assaults," in 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2018, pp. 270-277, doi: 10.1109/wi.2018.00-80.
[7] S. Karlekar and M. Bansal, "SafeCity: Understanding diverse forms of sexual harassment personal stories," arXiv preprint, 2018, doi: 10.18653/v1/d18-1303.
[8] I. Espinoza and F. Weiss, "Detection of harassment on Twitter with deep learning techniques," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 1168, 2019, pp. 307-313, doi: 10.1007/978-3-030-43887-6.
[9] Y. Liu et al., "Sexual harassment story classification and key information identification," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2385-2388, doi: 10.1145/3357384.3358146.
[10] C. H. Ngejane, G. Mabuza-Hocquet, J. H. P. Eloff, and S. Lefophane, "Mitigating online sexual grooming cybercrime on social media using machine learning: A desktop survey," in 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), Aug. 2018, pp. 1-6.
