Social_Media_Fake_Account_Detection_Report_Enhanced_Final

The document outlines a project aimed at developing a machine learning-based system for detecting fake accounts on social media platforms, addressing the growing concern of misinformation and fraud. It details the system's objectives, design, technologies used, evaluation metrics, challenges, and future enhancements, emphasizing the need for adaptability, data privacy, and interdisciplinary collaboration. Key performance goals include maximizing detection precision and ensuring real-time processing while maintaining user data security.

Uploaded by

rohitadwani365

System Design Report

Project Title: Social Media Fake Account Detection

1. Introduction

Fake accounts are a growing concern not only because they skew
analytics and user engagement statistics, but also because they can be
weaponized in coordinated campaigns to mislead the public or execute
financial fraud. The significance of this issue is underscored by numerous
case studies in which bot networks have influenced public opinion or
engagement metrics on major platforms like Twitter and Instagram.

In recent years, the sophistication of fake accounts has grown markedly.
With AI-driven content generation, fake accounts can now simulate highly
realistic behavior, making detection increasingly complex. Social media
companies are investing heavily in countermeasures, but an adaptable and
data-driven approach remains crucial.

These challenges demand not only technological solutions but also
interdisciplinary collaboration between data scientists, cybersecurity
experts, and sociologists.

Social media platforms have revolutionized the way people communicate and share
information. However, as these platforms have grown in popularity, so has the
number of fake accounts created for malicious purposes such as spreading
misinformation, phishing, spamming, and impersonation.

This project aims to develop a system for detecting such fake accounts using machine
learning techniques. The system leverages user profile features and
behavioral patterns to classify accounts as real or fake.

2. Objective

Additional goals include improving detection precision by incorporating
NLP features and temporal activity tracking. Furthermore, the system
aims to be adaptable across different social media platforms with varying
data availability.

Key performance goals include maximizing precision without sacrificing
recall, ensuring low latency for real-time detection, and maintaining user
data privacy through anonymized feature extraction. The system is
expected to provide actionable insights through dashboards and alert
systems for administrators.
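
As one way to picture anonymized feature extraction, the sketch below replaces direct identifiers with a keyed hash before features are passed on. It is an illustration under stated assumptions, not the project's implementation: the key `PEPPER`, the output field `User_Key`, and the choice of which columns count as identifiers are all hypothetical. Strictly speaking this is pseudonymization, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be loaded from a secrets store.
PEPPER = b"replace-with-a-secret-key"

# Columns treated as direct identifiers in this sketch (an assumption).
IDENTIFIER_FIELDS = {"User_ID", "Username"}


def pseudonymize(user_id: str, key: bytes = PEPPER) -> str:
    """Return a keyed SHA-256 digest of the user ID (stable for a given key)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict) -> dict:
    """Drop direct identifiers; keep a pseudonymous key plus the other features."""
    features = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    features["User_Key"] = pseudonymize(str(record["User_ID"]))
    return features
```

Because the digest is stable for a given key, predictions can still be joined back to accounts by whoever holds the key, while downstream components never see the raw username.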

The primary objective of this project is to build a reliable fake account detection system that
can identify suspicious users on social media platforms.
The work covers data preprocessing, feature selection, model building, evaluation, and
result visualization to support decision-making.

3. System Design

The design ensures modularity, where components such as preprocessing,
prediction, and visualization can be independently scaled or upgraded.
This modularity also allows for easy integration with new data sources or
changes in social media API policies.

Security and reliability are built into the system through access control
mechanisms, encryption protocols for data in transit and at rest, and
regular model audits. The pipeline is also designed to be fault-tolerant
with retry mechanisms and logging for monitoring.
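
A minimal sketch of a retry mechanism with logging is shown below. The function name `with_retries`, the logger name, and the exponential backoff schedule are assumptions chosen for illustration; the report does not specify how retries are implemented.

```python
import logging
import time

logger = logging.getLogger("pipeline")  # hypothetical logger name


def with_retries(step, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call step() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            # Log every failure so monitoring can see transient errors.
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

A pipeline stage such as a cloud storage upload could be wrapped as `with_retries(lambda: upload(path))`, where `upload` stands for whatever I/O call the stage performs. The `sleep` parameter is injectable so the backoff can be tested without waiting.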

Each system component is decoupled, allowing independent deployment,
which ensures that failures in one service do not propagate to others.
Continuous integration and deployment (CI/CD) pipelines help automate
updates and testing.

3.1 Use Case Diagram

Below is the use case diagram that shows the interaction between the admin and the
system:

The admin interacts with the system by uploading user data, initiating fake account
detection, viewing analytics, and exporting results.
The system performs preprocessing, prediction, and visualization on the uploaded
dataset.

3.2 Database Design and Data Storage

Data is stored in structured CSV files with fields such as User_ID, Username, Followers,
and Following. These features are used for model training
and prediction. The files are hosted on cloud storage (e.g., AWS S3, Firebase Storage) for
scalability and accessibility.

Example CSV Schema:

- User_ID
- Username
- Followers
- Following
- Posts
- Bio_Length
- Profile_Pic_Status (0/1)
- Verified (True/False)
- Creation_Date
- Engagement_Score
- Label (Real/Fake)
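
To make the schema concrete, the sketch below parses rows with these column names into typed records using only the Python standard library (the pipeline itself would typically do this with Pandas). The `Account` class, the helper names, and the ISO date format for Creation_Date are assumptions; the report does not state a date format.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    user_id: str
    username: str
    followers: int
    following: int
    posts: int
    bio_length: int
    has_profile_pic: bool
    verified: bool
    creation_date: date
    engagement_score: float
    label: str


def parse_row(row: dict) -> Account:
    """Convert one CSV row (all strings) into a typed Account record."""
    return Account(
        user_id=row["User_ID"],
        username=row["Username"],
        followers=int(row["Followers"]),
        following=int(row["Following"]),
        posts=int(row["Posts"]),
        bio_length=int(row["Bio_Length"]),
        has_profile_pic=row["Profile_Pic_Status"] == "1",
        verified=row["Verified"].strip().lower() == "true",
        # Assumes ISO dates (YYYY-MM-DD); the format is not given in the report.
        creation_date=date.fromisoformat(row["Creation_Date"]),
        engagement_score=float(row["Engagement_Score"]),
        label=row["Label"],
    )


def load_accounts(csv_text: str) -> list:
    """Parse CSV text with a header row into a list of Account records."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

Parsing into explicit types at load time means malformed values (for example a non-numeric follower count) fail immediately with a `ValueError` rather than surfacing later during training.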

3.3 Sequence Diagram / Activity Diagram

Sequence Flow:
1. Admin uploads user data via UI.
2. System performs preprocessing using Pandas.
3. Machine Learning model (Scikit-learn or ANN) is loaded.
4. Predictions are made.
5. Matplotlib is used to generate visual analytics.
6. Admin downloads or exports reports.

Activity Diagram Steps:

- Start
- Upload Data
- Clean and Transform Data
- Train/Load Model
- Predict Labels
- Visualize Output
- Export/Save Results
- End
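
The clean, transform, and predict steps above can be sketched as a small chain of functions. This is an illustrative skeleton only: the cleaning rule, the derived features (`follower_ratio`, `posts_per_day`), and the `predict` callable are assumptions, and the trained model from the report would be supplied as `predict`.

```python
from datetime import date


def clean(rows):
    """Drop rows whose count fields are missing, non-integer, or negative."""
    kept = []
    for row in rows:
        counts = (row.get("Followers"), row.get("Following"), row.get("Posts"))
        if all(isinstance(c, int) and c >= 0 for c in counts):
            kept.append(row)
    return kept


def transform(row, today):
    """Derive numeric features from one cleaned row (feature choice is illustrative)."""
    age_days = (today - row["Creation_Date"]).days
    return {
        # +1 avoids division by zero for accounts that follow nobody
        "follower_ratio": row["Followers"] / (row["Following"] + 1),
        "posts_per_day": row["Posts"] / max(age_days, 1),
        "has_profile_pic": int(row["Profile_Pic_Status"]),
        "bio_length": row["Bio_Length"],
    }


def run_pipeline(rows, predict, today):
    """Clean, transform, then predict; `predict` maps a feature dict to a label."""
    results = []
    for row in clean(rows):
        label = predict(transform(row, today))
        results.append({"User_ID": row["User_ID"], "Label": label})
    return results
```

Keeping each step as a separate function mirrors the modular design described in Section 3: the cleaning rule or the model can be swapped without touching the other stages.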

3.4 Deployment Diagram

The system is deployed with the following architecture:

- Client Node: User interface for admin
- Processing Node: Python backend that runs ML models
- Cloud Node: Storage for input/output files
- Visualization Node: Generates graphical outputs

Components:
- Frontend: Flask/Django Web UI or Jupyter Notebook
- Backend: Python ML scripts (Pandas, Scikit-learn, ANN)
- Storage: Cloud storage (CSV format)

4. Technologies Used

To support scalable deployment, containerization technologies such as
Docker and orchestration tools like Kubernetes can be employed. For NLP,
libraries such as spaCy and Transformers from Hugging Face are
particularly valuable for entity recognition and sentiment analysis.

In addition to the core stack, integration with visualization libraries such
as Plotly and dashboards with Dash enables real-time analytics.
Elasticsearch can be added for log aggregation and anomaly detection in
behavior trends.

Serverless computing options like AWS Lambda or Google Cloud Functions
can further reduce infrastructure costs and enable dynamic scaling based
on usage spikes.

Programming Language: Python

Libraries and Frameworks:

- Pandas: Data preprocessing
- Scikit-learn: Machine learning algorithms
- Matplotlib: Visualization
- NLP (spaCy, Hugging Face Transformers): Text analysis on bios and posts
- ANN (artificial neural network): Deep learning for feature-based classification

Data Storage: CSV (cloud hosted)

Hardware: Cloud infrastructure (scalable and distributed)

5. Evaluation Metrics and Results

It is also important to evaluate the robustness of the model across
datasets from different platforms. Cross-validation and testing on unseen
social media datasets ensure that the model generalizes well and
maintains performance.
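
As a rough illustration of cross-validation on an imbalanced label set, the sketch below builds stratified folds with the standard library so that each fold keeps roughly the overall Real/Fake mix. The function names are hypothetical; in a Scikit-learn pipeline the library's own stratified splitter would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds so each fold keeps roughly the overall class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each account appears in exactly one test fold, so averaging a metric over the k splits uses every labeled example for evaluation once.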

To ensure fairness and mitigate algorithmic bias, the evaluation includes
subgroup analysis by user demographic, activity type, and content
domain. Additionally, A/B testing can be conducted to measure
user-facing impact when deploying new model versions.
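
One simple form of subgroup analysis is to compute error rates separately per group and compare them. The sketch below does this for recall on fake accounts and for the false positive rate on real accounts; the record layout `(group, true_label, predicted_label)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def rates_by_group(records, positive="Fake"):
    """Return recall and false positive rate per subgroup.

    Each record is a (group, true_label, predicted_label) tuple.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, truth, predicted in records:
        if truth == positive:
            key = "tp" if predicted == positive else "fn"
        else:
            key = "fp" if predicted == positive else "tn"
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        report[group] = {
            # None marks a rate that is undefined because the group has no such cases.
            "recall": c["tp"] / positives if positives else None,
            "false_positive_rate": c["fp"] / negatives if negatives else None,
        }
    return report
```

A large gap in false positive rate between groups would indicate that genuine users in one group are flagged more often than in another, which is the kind of disparity this analysis is meant to surface.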

Model performance is evaluated using the following metrics:

- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix
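
The metrics listed above all derive from the four confusion matrix counts. The sketch below shows the standard formulas, treating "Fake" as the positive class (an assumption; the report does not say which class is positive). The function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion matrix counts.

    tp: fake accounts flagged as fake    fp: real accounts flagged as fake
    fn: fake accounts missed             tn: real accounts left alone
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }
```

As a consistency check, the preliminary precision (90%) and recall (93%) reported in this section give `f1_score(0.90, 0.93)` of approximately 0.915, which agrees with the reported F1 of 91%.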

Preliminary Results:
- Accuracy: 92%
- Precision: 90%
- Recall: 93%
- F1-Score: 91%

6. Challenges and Limitations

Another limitation is the difficulty in maintaining up-to-date labeled
datasets, as manual labeling is time-consuming. Additionally, the
detection system might be less effective on newly created fake accounts
that have minimal activity.

An emerging concern is adversarial machine learning, where attackers
attempt to manipulate models by introducing noisy data or mimicking
real users. Techniques such as adversarial training and differential privacy
can be explored as countermeasures.

Ethical considerations also include the potential misclassification of
accounts and the impact on user trust. Therefore, transparency and user
appeal mechanisms should be included in production systems.

- Imbalanced dataset with more real accounts than fake.
- Feature extraction from unstructured bio data is challenging.
- Constant evolution of fake account behavior.
- Scalability of detection for real-time platforms.
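
For the class imbalance noted in the list above, one common mitigation is to weight each class inversely to its frequency during training. The sketch below computes such weights with the formula n_samples / (n_classes * class_count), which is the same heuristic Scikit-learn applies for `class_weight="balanced"`. Whether the project uses class weighting is not stated in the report; this is offered only as an example of the technique.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}
```

With 90 real and 10 fake accounts, this gives a weight of about 0.56 for Real and 5.0 for Fake, so each misclassified fake account costs roughly nine times as much as a misclassified real one during training.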

7. Future Enhancements

Beyond BERT and LSTM, exploring federated learning approaches could
allow training across multiple platforms without compromising user
privacy. Blockchain technology may also be investigated for
authenticating account provenance and tracking changes over time.

Another promising direction is the use of graph neural networks (GNNs)
for social graph-based inference. Such models leverage relationships and
interactions between users rather than isolated features. Integration with
big data platforms like Apache Spark or Flink will also enable handling of
massive, streaming datasets.

Lastly, ongoing collaboration with academic institutions can provide
access to benchmark datasets and help integrate cutting-edge research.

- Integrate with social media APIs for real-time detection.
- Enhance NLP analysis using advanced transformers like BERT.
- Use deep learning models like LSTM for behavior tracking.
- Build a user-friendly dashboard for continuous monitoring.

With the sheer volume of users interacting on social media platforms, identifying
inauthentic behavior has become crucial. Platforms must defend against coordinated bot
activity, deepfake profile images, and large-scale misinformation campaigns.

In addition to system accuracy and usability, scalability and resilience are key. The solution
must remain performant under increasing data loads and be adaptable to evolving platform
structures and usage patterns.

It’s also critical that the design prioritizes data privacy and modularity. Each subsystem
should be individually maintainable, and components should communicate via secure,
well-documented APIs. This enables rapid iteration and platform compatibility.

The system supports real-time data ingestion and batch processing, thanks to the
integration of message queuing systems like Apache Kafka and distributed processing
engines. AutoML tools such as PyCaret also reduce development time.

Continued evaluation on unseen data and edge-case scenarios ensures robustness. Metrics
should also reflect resource efficiency and latency under load, helping to assess
production-readiness in high-throughput environments.
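
Latency under load can be summarized with percentiles rather than an average, since tail latency matters most for real-time detection. The sketch below times repeated calls and reports the median and 95th percentile; the function names and the choice of percentiles are assumptions, and `predict` stands for whatever inference call is being measured.

```python
import statistics
import time


def measure_latency(predict, inputs):
    """Time each call to predict() and return per-call latencies in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def latency_summary(latencies_ms):
    """Summarize latencies with the median and the 95th percentile."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=20)
    return {"p50_ms": statistics.median(latencies_ms), "p95_ms": cuts[-1]}
```

This measures single-threaded, in-process latency only. Assessing behavior under concurrent load would additionally require a load generator, which is outside the scope of this sketch.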

To reduce model brittleness, ensemble approaches and frequent retraining cycles are used.
These methods ensure adaptability in response to novel data patterns and attacker
innovations.
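
A minimal sketch of one ensemble approach, majority voting, is given below. Each model is represented as a callable that maps a feature dict to a label; this interface and the function name are assumptions, and the report does not specify which ensemble method is used.

```python
from collections import Counter


def majority_vote(models, features):
    """Return the label predicted by most models; ties go to the label seen first."""
    votes = [model(features) for model in models]
    return Counter(votes).most_common(1)[0][0]
```

Because the final label depends on several models, an attacker who learns to evade one of them does not automatically evade the ensemble, which is the sense in which voting can reduce brittleness.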

Open-source collaboration, plugin architectures, and sandbox testing environments will
also support broader community engagement and sustained research contributions.
