IEEE-Paper 1pdf
Dr J Vinoj
Assistant Professor
Dept. of Computer Science and Engineering
Vignan’s Foundation for
Science, Technology and Research
vinojbu@gmail.com
Abstract—Social media platforms have become an essential aspect of our lives in the current digital era, linking individuals from around the globe. Although these platforms have been very beneficial, they have also exposed children and other vulnerable people to a variety of online threats. These dangers include cyberbullies and child predators who prey on children online and use social media's anonymity and reach to hurt other people. In the past, countering these dangers depended mostly on manual reporting and human moderators. After users reported suspicious activity, human moderators reviewed the content to see whether it broke any site rules. This reactive strategy frequently caused delays, allowing dangerous content to proliferate before appropriate action could be taken. Recognizing the need for more proactive and effective solutions, researchers have turned to machine learning, a subfield of artificial intelligence that enables computers to learn from data and make predictions. The objective is to create an automated system that can quickly and effectively identify possible online child predators and cyber harassers using machine learning techniques. The proposed machine learning-based approach has a number of benefits over conventional techniques. The main benefit is that it considerably speeds up platform response times, allowing for the quick removal of offensive content and users.

Keywords—child predators, cyber harassers, natural language processing, machine learning

I. INTRODUCTION
Social media platforms have completely changed how we connect and communicate with others in the current digital era. Without question, these platforms have created a great deal of opportunity for global engagement, but they have also created a number of serious issues. These difficulties include the presence of people who abuse the anonymity and reach of the internet for malicious reasons, such as child predators and cyber harassers. Protecting people from these internet dangers has become a top priority, especially for children. Responses to online threats date back to the early days of the internet, when problems like child predators and cyber harassers first surfaced. Numerous initiatives have been taken over the years to counter these risks.
Legal actions, user-focused education campaigns, and the creation of technology-based solutions have all been part of these initiatives. With the development of technology, especially in the areas of data analysis and machine learning, new avenues have opened up for more accurate detection and mitigation of these online threats. Practical reasons dictate the necessity for an automated method to detect online child predators and cyber harassers:
- Scale: Manual monitoring is not practicable due to the huge amount of internet material. Effective processing and analysis of large datasets requires automated methods.
- Speed: Online threats can spread quickly. To limit damage, quick detection and action are essential.
- Complexity: Identifying predatory or abusive activity frequently requires analyzing language, photos, and user behavior patterns. Techniques for data analysis and machine learning can greatly improve this analysis.
Thus, the "Identification of Online Child Predators and Cyber Harassers" application presented here is a web application built with the Django framework. To address the complex issues posed by online dangers, it incorporates a number of components, such as machine learning algorithms, content monitoring, user registration, and an admin panel. Users can report suspicious activity they come across, while administrators can monitor, evaluate, and take action against dangerous content and people.
To summarise, the creation and use of tools and systems such as the Django-based application presented here play a crucial role in mitigating the enduring problems posed by cyberbullies and child predators in the modern digital environment.
These technologies aim to protect privacy and security while fostering safer online settings, especially for children and other vulnerable populations, by integrating technology, user interaction, and regulatory compliance.
II. LITERATURE REVIEW
Online harassment has been a widespread problem since the early days of social media, and it still is. The initial goal of research in this area was to create an automated system that could identify and report this kind of wrongdoing. Two families of methods, machine learning and deep learning, have been studied to prevent or identify instances of sexual harassment and to shield children from bullying in order to provide a secure environment. Using fuzzy logic and genetic algorithms, the authors of [2] monitored the incidence of cyberbullying on social media platforms. They recognized and categorized offensive, harassing, racist, and terroristic remarks as well as other cyberbullying-related words and actions on social media. A genetic algorithm was employed to optimize parameters and performance, and the F-measure obtained was 0.91.
The authors of [3] used three weighting schemes to filter Facebook messages: entropy, term frequency-inverse document frequency (TF-IDF), and a modified TF-IDF for feature selection. Recall, accuracy, and precision were measured using a Support Vector Machine (SVM) classifier. According to the test results, the modified TF-IDF scheme outperforms the other schemes, with an accuracy of 96.50%. The study in [4] tested several supervised machine learning algorithms, together with natural language processing, to analyse online harassment in Twitter messages as part of a competition on social media and harassment. TF-IDF and Word2Vec embeddings were used to extract features.
The results were more than 80% accurate across all the forms of harassment considered in the data. The study in [5] combines a state-of-the-art approach to sentence vectors with emotion analysis. Word vectors are generated using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) language model as a new approach to the identification of sexual predators. With a recall of 81.10%, the final step of extracting an emotion value from the SoftMax layer outputs resulted in a new high in accuracy.
The authors of [6] used a CNN to extract features from tags and train a classifier for Twitter posts with malevolent intent. They applied it to a four-month Twitter dataset to find the circumstances surrounding posts that carried malicious intent, in order to inform laws against gender-based violence. Sweta Karlekar [7] describes the work of the SafeCity web community in categorizing and rating various kinds of sexual harassment. SafeCity uses the experiences shared by victims to help develop online directories, provide more comprehensive safety advice services, and help others find relevant cases in order to prevent further sexual assault.
The single-label CNN-RNN model has an accuracy of 86.5% in processing, connecting, and annotating tags. Espinoza [8] develops a new Twitter dataset covering four categories of harassment. Two deep learning architectures, CNN and LSTM, were used to classify the tweets. The F1 score was 55 percent on the training data but only 46 percent on the test set.
Arijit Josh Chowdhury [9] proposes a disclosure language model. A language model, a task-specific classifier, and a medium-specific component, namely for Twitter, constitute the ULMFiT fine-tuning architecture. The overall comparison shows the benefits of choosing specific, lightweight language models backed by LSTMs with an enhanced vocabulary, which capture the linguistic subtleties of text that describes sexual harassment.
About 10,000 personal accounts of sexual harassment were annotated, and the neural network models then achieved 92.9% accuracy in automatic story classification. Further improvements in classification were made by considering the importance of the features in more detail.

III. SYSTEM ANALYSIS
The process of identifying malicious information on a site combines several Python modules, such as pandas, with machine learning methods.
The first step is to examine a number of posts using statistical analysis to identify any malicious activity. Users whose degree of suspicion rises above a predetermined cutoff are then categorized as suspects.
Next, the posts made by the suspected user are examined thoroughly, including any multimedia content such as pictures, videos, and audio files. Artificial intelligence is used in conjunction with image and audio analysis techniques to perform this analysis and determine whether the suspect is a predator.
The outcomes of this procedure help identify trends in child grooming. Lastly, information on possible predators is forwarded to law enforcement.

A. EXISTING SYSTEM
Techniques currently exist for locating child predators on the internet in the areas of gaming, voice chat, and other online entertainment. Using these techniques, parents can shield their children from sexual exploitation when they play online games or engage in voice chats. However, with the prevalence of the internet in today's world, many children are turning to social networking sites as their main way to communicate with others.
Because these sites do not have detection systems in place for sexual predators, children are put in danger. The existing method uses five algorithms to classify conversations. It includes the conversation-centered method, which applies a Ridge or Naive Bayes classifier to the TF-IDF feature set, and a Neural Network classifier, which also processes the TF-IDF feature set.
Our proposed system employs a novel approach for text and image categorization. This approach is a supervised machine learning technique known as the Support Vector Machine (SVM), which is used to solve binary classification problems.

B. PROPOSED SYSTEM
Our project's goal is to find instances of child harassment on social media by applying a number of machine
learning techniques, including K-Nearest Neighbors, Random Forest, Support Vector Machine (SVM), Naive Bayes, and Decision Tree. All the models will be trained on a combination of phrases and messages that are considered normal and those that are harassing. Once trained, the model will be applied to user posts to identify whether they contain harassing or normal material.
This project uses the Django framework to build a web application whose backend logic is intended to track and detect child predators and cyberbullies on social media. Users can register, log in, post, and use machine learning techniques to categorize text messages as potentially harmful or not. Web pages displaying the results are available for users and administrators to view.

[Figure: block diagram with the labels "Normal User", "Predator User", "Detection System", "Post & Comment", "Text Classification", "SVM for Image Detection", and "Dataset"]

⚫ Database Access: This uses the pymysql package to create a connection to a MySQL database. It communicates with the database to retrieve and insert data, including user and post information.
⚫ Machine Learning: A dataset named "dataset.txt" is loaded and preprocessed, and machine learning models (such as SVM, Decision Tree, K-Nearest Neighbors, Random Forest, and Naive Bayes) are trained to categorize text data. The classifier variable holds the chosen model for later use.
⚫ Web Forms Handling: AddBullyingWords, Signup, UserLogin, and AdminLogin are some of the functions that handle user-submitted forms and carry out tasks like adding information to the database or verifying user credentials.
⚫ File Upload: This manages the uploading of files, including text files with messages that need to be categorized and pictures from user profiles.
⚫ HTML Templates: To render the user interface, the web application uses HTML templates (such as "index.html," "SendPost.html," "Register.html," and "Admin.html").
⚫ Data Processing: As part of the preprocessing step, this tokenizes words, removes special characters, and converts text to lowercase.
⚫ Classification: Text messages are classified using machine learning models, and the user is shown the results.
⚫ Session Management: Upon successful login, the script saves the username in a file called "session.txt" to maintain the user session.
⚫ Presentation of Findings: HTML templates are used to present the categorization results to the user along with other pertinent data.
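
The data processing and classification steps listed above can be illustrated with a minimal sketch. It is not the application's actual code: it assumes a hand-written multinomial Naive Bayes classifier (one of the five model types named in the text) in place of whichever library implementation was used, and the function name preprocess, the class name NaiveBayesTextClassifier, and the toy training messages are all hypothetical. The real format of "dataset.txt" is not specified, so a small in-memory list stands in for it.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, replace special characters with spaces, and tokenize on whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # messages per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = preprocess(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for the contents of "dataset.txt".
train_texts = [
    "See you at practice tomorrow",
    "Great game today, well played",
    "Nobody likes you, loser",
    "You are so stupid and ugly",
]
train_labels = ["normal", "normal", "harassing", "harassing"]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(classifier.predict("you are a stupid loser"))   # prints: harassing
print(classifier.predict("well played at practice"))  # prints: normal
```

In a setting like the one described, each of the five model types would be trained on the same preprocessed data and one of them kept for later use, which is the role the text assigns to the classifier variable.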
A. DATABASE CREATION: