Social_Media_Fake_Account_Detection_Report_Enhanced_Final
1. Introduction
Social media platforms have revolutionized the way people communicate and share
information. However, the growing popularity of these platforms has brought a
surge in fake accounts created for malicious purposes such as spreading
misinformation, phishing, spamming, and impersonation.
Fake accounts are a growing concern not only because they skew analytics and
user engagement statistics, but also because they can be weaponized in
coordinated campaigns to mislead the public or commit financial fraud. Case
studies of bot networks influencing public opinion or engagement metrics on
major platforms like Twitter and Instagram underscore the significance of this
issue.
This project aims to develop a system that detects such fake accounts using
machine learning techniques. The system uses user profile features and
behavioral patterns to classify accounts as real or fake.
2. Objective
The primary objective of this project is to build a reliable fake account detection system that
can identify suspicious users on social media platforms.
The scope covers data preprocessing, feature selection, model building, evaluation, and
result visualization to support decision-making.
3. System Design
Security and reliability are built into the system through access control
mechanisms, encryption protocols for data in transit and at rest, and
regular model audits. The pipeline is also designed to be fault-tolerant
with retry mechanisms and logging for monitoring.
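The retry-and-logging behavior described above could look roughly like the following. This is a minimal sketch using only the Python standard library; the helper name `run_with_retry`, the attempt count, and the backoff delays are illustrative assumptions, not details taken from the project.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def run_with_retry(step, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run a pipeline step, retrying with exponential backoff on failure.

    `step` is any zero-argument callable (hypothetical example: a function
    that downloads the uploaded CSV). The last failure is re-raised so the
    caller can decide how to handle it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            logger.info("step succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            # Wait 0.5 s, 1 s, 2 s, ... between attempts (with the defaults).
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.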
The admin interacts with the system by uploading the user data, initiating fake account
detection, viewing analytics, and exporting results.
The system performs preprocessing, prediction, and visualization based on the uploaded
dataset.
User data is stored as structured CSV files with fields such as User_ID, Username,
Follower_Count, and Following_Count. These fields supply the features used for model
training and prediction. The files are kept in cloud storage (e.g., AWS S3, Firebase
Storage) for scalability and accessibility.
Sequence Flow:
1. Admin uploads user data via UI.
2. System performs preprocessing using Pandas.
3. Machine Learning model (Scikit-learn or ANN) is loaded.
4. Predictions are made.
5. Matplotlib is used to generate visual analytics.
6. User downloads or exports the reports.
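The sequence flow above can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the project's actual code: the sample rows, the `Is_Fake` label column, the derived `Follow_Ratio` feature, and the choice of logistic regression are all hypothetical, and the Matplotlib plotting step is left out for brevity. For simplicity the model is trained inline instead of being loaded from disk.

```python
import io

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical sample standing in for an uploaded CSV. Column names other
# than Is_Fake follow the report; Is_Fake is an assumed label column.
SAMPLE_CSV = """User_ID,Username,Follower_Count,Following_Count,Is_Fake
1,alice,1200,300,0
2,bob,950,410,0
3,carol,2300,180,0
4,dave,780,520,0
5,xx_promo_91,3,4800,1
6,free_gifts_77,0,5100,1
7,win_now_22,5,4500,1
8,bot_4451,1,3900,1
"""

FEATURES = ["Follower_Count", "Following_Count", "Follow_Ratio"]


def preprocess(df):
    """Drop incomplete rows and derive a follower/following ratio."""
    df = df.dropna(subset=["Follower_Count", "Following_Count"]).copy()
    # The +1 avoids division by zero for accounts that follow nobody.
    df["Follow_Ratio"] = df["Follower_Count"] / (df["Following_Count"] + 1)
    return df


def train(df):
    """Fit a scaled logistic regression on the labeled rows."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(df[FEATURES], df["Is_Fake"])
    return model


def predict(model, df):
    """Return one row per account with the predicted label."""
    out = df[["User_ID", "Username"]].copy()
    out["Predicted_Fake"] = model.predict(df[FEATURES])
    return out


data = preprocess(pd.read_csv(io.StringIO(SAMPLE_CSV)))   # steps 1-2
model = train(data)                                       # step 3
report = predict(model, data)                             # step 4
report_csv = report.to_csv(index=False)                   # step 6 (export)
```

In a real deployment the model would be trained on a separate labeled dataset and evaluated on held-out accounts; predicting on the training rows here only keeps the sketch short.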
Components:
- Frontend: Flask/Django Web UI or Jupyter Notebook
- Backend: Python ML scripts (Pandas, Scikit-learn, ANN)
- Storage: Cloud storage (CSV format)
4. Technologies Used
Preliminary Results:
- Accuracy: 92%
- Precision: 90%
- Recall: 93%
- F1 Score: 91%
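As a consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the confusion-matrix counts in the second call are hypothetical and serve only to show how the three metrics relate.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


# Reported values: precision 90%, recall 93%.
reported_f1 = f1_from(0.90, 0.93)
print(f"F1 from reported precision/recall: {reported_f1:.4f}")  # about 0.9148

# Hypothetical counts: 90 fake accounts caught, 10 false alarms, 10 missed.
print(precision_recall_f1(tp=90, fp=10, fn=10))
```

The recomputed value rounds to 91%, which agrees with the reported F1 score.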
6. Challenges and Limitations
With the sheer volume of users interacting on social media platforms, identifying
inauthentic behavior has become crucial. Platforms must defend against coordinated bot
activity, deepfake profile images, and large-scale misinformation campaigns.
In addition to system accuracy and usability, scalability and resilience are key. The solution
must remain performant under increasing data loads and be adaptable to evolving platform
structures and usage patterns.
It’s also critical that the design prioritizes data privacy and modularity. Each subsystem
should be individually maintainable, and components should communicate via secure,
well-documented APIs. This enables rapid iteration and platform compatibility.
The system supports real-time data ingestion and batch processing through the
integration of message queuing systems like Apache Kafka and distributed processing
engines. Low-code and AutoML tools such as PyCaret also reduce development time.
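One common way to serve both real-time and batch workloads is to group an incoming event stream into small batches before scoring. The sketch below shows that idea with a plain Python iterable standing in for the message queue; it does not use any Kafka client API, and the names `micro_batches` and `process_stream` are hypothetical.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield lists of up to `batch_size` events from any iterable stream."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_stream(events, score_batch, batch_size=100):
    """Score a stream batch by batch and collect the results in order.

    `score_batch` is any callable that maps a list of events to a list of
    results, for example a wrapper around a trained model's predict method.
    """
    results = []
    for batch in micro_batches(events, batch_size):
        results.extend(score_batch(batch))
    return results
```

With a batch size of 1 this behaves like per-event scoring, while a large batch size approximates offline batch processing.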
Continued evaluation on unseen data and edge-case scenarios helps ensure robustness.
Metrics should also reflect resource efficiency and latency under load, helping to assess
production-readiness in high-throughput environments.
To reduce model brittleness, ensemble approaches and frequent retraining cycles are used.
These methods help the model adapt to novel data patterns and attacker innovations.
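An ensemble with a retraining hook might be set up along these lines. This is a sketch under assumptions: the report does not say which models are combined, so the soft-voting mix of logistic regression and a random forest is illustrative, and synthetic data from `make_classification` stands in for real account features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_ensemble():
    """Soft-voting ensemble that averages the two models' probabilities."""
    return VotingClassifier(
        estimators=[
            ("logreg", make_pipeline(StandardScaler(),
                                     LogisticRegression(max_iter=1000))),
            ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="soft",
    )


def retrain(X, y):
    """Fit a fresh ensemble; intended to be called on each retraining cycle."""
    model = build_ensemble()
    model.fit(X, y)
    return model


# Synthetic stand-in for labeled account features (assumption).
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = retrain(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)
print(f"hold-out accuracy: {holdout_accuracy:.2f}")
```

Calling `retrain` on a sliding window of recent labeled data is one simple way to schedule the retraining cycles mentioned above.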