0% found this document useful (0 votes)

24 views32 pages

EMAIL SPAM FINAL (2)

The document presents a project report on 'Email/SMS Spam Detection Using Machine Learning' aimed at developing a system to identify and filter spam messages in digital communications. It discusses the limitations of traditional spam detection methods and proposes a machine learning-based approach that utilizes natural language processing for improved accuracy and context-aware analysis. The report outlines the project's objectives, scope, system requirements, and methodology, emphasizing the importance of adapting to evolving spam tactics.

Uploaded by

asranganathan59

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views32 pages

EMAIL SPAM FINAL (2)

Uploaded by

asranganathan59

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 32

EMAIL/SMS SPAM DETECTION USING

MACHINE LEARNING

A PROJECT REPORT

Submitted by

Pavethra M (621522205037)
Pavithra S (621522205038)
Dhushara S (621522205015)
Dhanya R (621520205013)

in partial fulfillment for the award of the degree

BACHELOR OF TECHNOLOGY

INFORMATION TECHNOLOGY

MAHENDRA COLLEGE OF ENGINEERING,

MAHENDRA SALEM CAMPUS-636106.

ANNA UNIVERSITY::CHENNAI 600 025

MAY 2025
i
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this Naan Mudhalvan & TNDSC Skill Development course of project report
“EMAIL/SMS SPAM DETECTION USING MACHINE LEARNING ” is the
bonafide work done by

Pavethra M (621522205037)
Pavithra S (621522205038)
Dhushara S (621522205015)
Dhanya R (621522205013)

Who carried out the project work under my supervision.

SIGNATURE SIGNATURE
Dr. T. AKILA, M.E, Ph.D., Dr. T. AKILA, M.E, Ph.D.,
ASSOCIATE PROFESSOR, ASSOCIATE PROFESSOR,
HEAD OF THE DEPARTMENT, SUPERVISIOR,
Department of Information Technology, Department of Information Technology,
Mahendra College of Engineering, Mahendra College of Engineering,
Minnampalli, Salem-636106. Minnampalli, Salem-636106.

Submitted to Project and Viva-Voce Examination held on at MCE.

INTERNAL EXAMINER EXTERNAL EXAMINER

ii
ACKNOWLEDGEMENT

The Success and final outcome of this project required a lot of guidance and assistance
from many people and an extremely fortunate to have got this all along with the completion
of my project work.

We request and thank Thirumigu. M. G. BHARATHKUMAR, Founder &

Chairman, Shrimathi. VALLIYAMMAL BHARATHKUMAR, Secretary for their
guidance and blessings, also we express our deepest gratitude to Managing Directors Er. Ba.
MAHENDHIRAN, Er. B. MAHA AJAY PRASATH, who modeled us both technically and
morally for achieving greater success in life.

We were extremely grateful to Dr. N. MOHANASUNDARARAJU, Principal for his

constant encouragement, inspiration, presence, and blessings throughout our course,
especially for providing us with an environment to complete our project successfully.

We also extend my sincere appreciation to Dr. T.AKILA, Head of the Department of

Information Technology who provided her valuable suggestions and precious time in
accomplishing my project report.

We owe my profound gratitude to our guide, Dr. T.AKILA , Head of the Department
of Information Technology who took an interest in our projectwork and provided all the
necessary information for developing the project successfully. We also thank all the staff
members of our college and technicians for their help in making this project a successful one.

Lastly, we would like to thank the almighty and my parents for their moral support and
my friends with whom shared my day-to-day experience and received lots of suggestions that
improved my quality of work.

iii
ABSTRACT

Nowadays communication plays a major role in everything be it professional or

personal. Email communication service is being used extensively because of its free use
services, low-cost operations, accessibility, and popularity. Emails have one major
security flaw that is anyone can send an email to anyone just by getting their unique
user id. This security flaw is being exploited by some businesses and ill-motivated
persons for advertising, phishing, malicious purposes, and finally fraud. This produces
a kind of email category called SPAM.

Spam refers to any email that contains an advertisement, unrelated and frequent
emails. These emails are increasing day by day in numbers. Studies show that around
55 percent of all emails are some kind of spam. A lot of effort is being put into this
by service providers. Spam is evolving by changing the obvious markers of detection.
Moreover, the spam detection of service providers can never be aggressive with
classification because it may cause potential information loss to incase of a
misclassification.

To tackle this problem we present a new and efficient method to detect spam
using machine learning and natural language processing. A tool that can detect and
classify spam. In addition to that, it also provides information regarding the text
provided in a quick view format for user convenience.

iv
TABLE OF CONTENTS

CHAPTER CONTENTS PAGE

NO NO

ABSTRACT Iv

1 INTRODUCTION 1

1.1 OVERVIEW 1

1.2 SCOPE OF PROJECT 1

1.3 OBJECTIVE OF THE PROJECT 2

2 LITERARTUTE REVIEW 3

2.1 SPAM DETECTION TECHNIQUES 3

2.2 MACHINE LEARNING IN TEXT 3

CLASSIFICATION

2.3 NATURAL LANGUAGE PROCESSING 3

FOR SPAM FILTERING

2.4 FEATURE EXTRACTION METHODS 4

2.5 SMS SPAM DETECTION DATASETS 4

v
2.6 EMAIL SPAM FILTERING SYSTEMS 4

3 SYSTEM ANALYSIS 5

3.1 EXISTING SYSTEM 5

3.1.1 DISADVANTAGE OF EXISTING 5

SYSTEM

6
3.2 PROPOSED SYSTEM

3.2.1 ADVANTAGE OF PROPOSED SYSTEM 6

4 SYSTEM REQUIREMENTS 7

4.1 HARDWARE REQUIREMENTS 7

4.2 SOFTWARE REQUIREMENTS 7

5 SYSTEM DESIGNAND DEVELOPMENT 8

5.1 ML ARCHITECTURE OVERVIEW 8

5.2 SYSTEM ARCHITECURE 8

5.3 DATA COLLECTION 9

5.4 DATA PRE -PROCESSING 9

5.5 TEXTING DATA SET 9

5.6 ALGORITHM SELECTION 9

vi
5.6.1 ALGORITHM USED 10

5.6.2 NAIVE BAYES CLASSIFIER 10

5.6.3 RANDOM FOREST 10

5.6.4 SUPPORT VECTOR MACHINE (SVM) 10

5.6.5 NEURAL NETWORKS 11

5.6.6 MODEL PERFORMANCE COMPARISON 11

6 UML DIAGRAMS 12

6.1 CLASS DIAGRAM 12

6.2 USE CASE DIAGRAM 13

6.3 ACTIVITY DIAGRAM 14

7 PERFORMANCE ANALYSIS 15

7.1 ABOUT THE DATASET 15

7.2 ACCURACY COMPARISON OF 16

ALGORITHMS

7.3 METHODOLOGY 17

8 CONCULSION 18

9 FUTURE ENHANCEMENT 19

vii
10 APPENDIX 20

10.1 SOURCE CODE 20

10.2 OUTPUT SCREENS 22

11 BIBLIOGRAPHY 23

viii
CHAPTER-1
INTRODUCTION

1.1 OVERVIEW

Email/SMS Spam Detection Using Machine Learning is an intelligent system

designed to identify and filter unsolicited and potentially harmful messages in digital
communication platforms. This application leverages modern machine learning techniques
to analyze the textual content of emails and SMS messages, enabling accurate classification
into spam or legitimate (ham) categories. The platform is built to assist both individual users
and organizations in protecting themselves against spam, phishing attempts, and fraudulent
content that could compromise security or productivity.

1.2 SCOPE OF THE PROJECT

The scope of the Email/SMS Spam Detection Using Machine Learning project is to
develop a robust and intelligent system capable of automatically identifying and filtering
spam messages in both email and SMS communications. This system is designed to aid
users—including individuals, enterprises, and digital communication platforms—by
reducing exposure to unsolicited, malicious, or fraudulent content. The application leverages
a combination of natural language processing (NLP), machine learning algorithms, and real-
time data processing to classify messages with high accuracy.

While the project focuses on the intelligent detection and classification of spam
messages, it does not encompass direct integration with existing email/SMS providers or
mobile network operators, nor does it include features for data encryption, cybersecurity
measures, or phishing site blocking. The primary goal is to demonstrate the effectiveness of
machine learning in spam detection and provide a foundation for future development and
integration into larger communication security frameworks.

1
1.3 OBJECTIVE OF THE PROJECT

In this study, the Email/SMS Spam Detection Using Machine Learning project aims
to empower users by providing them with a reliable, intelligent system that can
automatically detect and filter spam messages from genuine communication. By integrating
key components such as natural language processing, text classification, feature extraction
techniques, and machine learning algorithms into a single, user-friendly application, the
project seeks to simplify the process of identifying unwanted or malicious content and
enhance the overall communication experience.
This system is designed to improve digital communication security and efficiency by
offering accurate spam classification, timely alerts, and real-time message analysis. Through
predictive models and continual learning, it helps users avoid phishing attempts, reduce
distractions from unsolicited messages, and ensure that only relevant and safe content
reaches their inbox. Ultimately, the platform bridges the gap between traditional rule-based
spam filters and modern AI-powered detection, enabling smarter, faster, and more adaptable
spam management for both personal and organizational use.

2
CHAPTER-2
LITERATURE REVIEW

2.1 TITLE: SPAM DETECTION TECHNIQUES

Authors:Dr.RaviMenon
This literature review explores various traditional and modern techniques used to
detect spam in email and SMS communications. It discusses rule-based filtering, blacklists,
heuristic-based systems, and more recent approaches like machine learning and statistical
models. The review highlights how the evolution of spam patterns has necessitated the use
of adaptive and intelligent detection mechanisms to ensure high accuracy and reliability.

2.2 TITLE: MACHINE LEARNING IN TEXT CLASSIFICATION

Authors:Dr.AnjaliSharma
This review focuses on the application of machine learning algorithms for classifying
text messages into spam and non-spam categories. It covers techniques such as Naive Bayes,
Support Vector Machines, and Decision Trees. The study emphasizes how supervised
learning with labeled datasets improves prediction accuracy and enables automated, scalable
filtering of unwanted message.

2.3 TITLE: NATURAL LANGUAGE PROCESSING FOR SPAM FILTERING

Authors:Dr.KaranVerma
This review discusses the importance of Natural Language Processing (NLP) in pre-
processing and analyzing textual data for spam detection. Techniques such as tokenization,
stop-word removal, and part-of-speech tagging are reviewed for their roles in enhancing the
performance of classification models. NLP enables a deeper understanding of message
context, improving model effectiveness.

3
2.4 TITLE: FEATURE EXTRACTION METHODS (TF-IDF, BAG OF WORDS)
Authors:Dr.SnehaJoshi
This review examines the most widely used feature extraction methods in spam
detection systems, namely TF-IDF (Term Frequency–Inverse Document Frequency) and
Bag of Words. It evaluates how these techniques convert raw text into numerical
representations that can be processed by machine learning algorithms and compares their
strengths, limitations, and use cases.

2.5 TITLE: SMS SPAM DETECTION DATASETS

Authors:Dr.RahulMehta
This literature survey analyzes publicly available datasets used in SMS spam detection
research, such as the UCI SMS Spam Collection. It evaluates the dataset quality, balance
between spam and ham messages, preprocessing needs, and applicability for training,
validation, and benchmarking of machine learning models.

2.6 TITLE: EMAIL SPAM FILTERING SYSTEMS

Authors:Prof.JenniferThomas
This review investigates the design and implementation of spam filtering systems in
email platforms. It discusses the architecture of systems deployed in major email services,
the use of Bayesian filters, adaptive learning mechanisms, and integration with user
feedback. The effectiveness of combining multiple filtering techniques is also highlighted.

4
CHAPTER-3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

In the traditional spam detection systems used in email and SMS platforms, the
approach largely relies on rule-based filters and keyword matching. These systems operate
by scanning incoming messages for specific terms or phrases commonly associated with
spam and then flagging or filtering such messages accordingly. While this method has
served as a foundational defense against unwanted communications, it is increasingly
inadequate in the face of evolving spam techniques and the dynamic nature of digital
communication.

3.1.1 DISADVANTAGES OF EXISTING SYSTEM

1. Lack of Adaptability to Evolving Spam Tactics
Traditional rule-based spam filters are rigid and often unable to adapt to new and
sophisticated spam techniques.
2. High False Positives and False Negatives
Conventional systems struggle to accurately distinguish between spam and
legitimate messages.
3. Limited Context Understanding
Rule-based systems operate primarily on keyword detection and lack the ability to
understand context or semantic meaning in messages.
4. Inability to Process Short or Informal Texts
SMS messages are often short, unstructured, and written in informal language or
abbreviations.

5
3.2 PROPOSED SYSTEM
The proposed system introduces an intelligent, machine learning-based approach to
automatically detect and filter spam messages in both email and SMS communications.
This system is designed to overcome the limitations of traditional rule-based filters by
leveraging data-driven algorithms that can learn, adapt, and improve over time. It aims to
offer a more accurate, context-aware, and scalable solution for identifying unwanted or
malicious messages.
3.2.1.1 ADVANTAGE OF PROPOSED SYSTEM
1. Improved Accuracy Through Machine Learning

The proposed system leverages supervised machine learning algorithms trained on

large datasets, enabling it to accurately classify messages as spam or non-spam.

2. Context-Aware Message Analysis

By incorporating Natural Language Processing (NLP) techniques, the system is

capable of understanding the context and semantics of messages.

3. Support for Both Email and SMS Formats

The system is designed to handle different types of message formats, including the
short, informal language common in SMS. This makes it versatile and applicable across
multiple communication platforms and use cases.

4. User Feedback Integration

The platform allows users to provide feedback on message classifications. This

feedback loop enables continuous learning, personalization, and improved model
performance over time, making the system more responsive to individual user needs.

6
CHAPTER-4
SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS

 PROCESSOR: MINIMUM DUAL-CORE CPU (QUAD-CORE OR HIGHER

RECOMMENDED FOR MODEL TRAINING AND BATCH PROCESSING)

 RAM: MINIMUM 4 GB (8 GB OR HIGHER RECOMMENDED)

 STORAGE: MINIMUM 20 GB OF FREE DISK SPACE (FOR DATASETS, LOGS,

AND MODEL FILES)

 NETWORK: STABLE BROADBAND INTERNET CONNECTION

4.2 SOFTWARE REQUIREMENTS

 OPERATING SYSTEM: WINDOWS 10 OR LATER, UBUNTU/LINUX (PREFERRED

FOR PYTHON-BASED ML DEVELOPMENT), OR MACOS
 PROGRAMMING LANGUAGE: PYTHON 3.8 OR HIGHER (USED FOR DATA
PROCESSING, MODEL TRAINING, AND DEPLOYMENT)
 LIBRARIES & FRAMEWORKS:
 SCIKIT-LEARN: FOR MACHINE LEARNING MODELS
 NUMPY & PANDAS: FOR DATA MANIPULATION AND ANALYSIS
 MATPLOTLIB & SEABORN: FOR VISUALIZATION
 DEVELOPMENT ENVIRONMENT: JUPYTER NOTEBOOK, VS CODE.
 PACKAGE MANAGER: PIP OR CONDA
 WEB FRAMEWORK : STREAMLIT (FOR BUILDING UI OR API INTERFACE)
 BROWSER: LATEST VERSION OF CHROME, FIREFOX, EDGE, OR SAFARI.
7
CHAPTER-5

SYSTEMDESIGN AND DEVELOPMENT

5.1 ML ARCHITECTUREOVERVIEW

5.2 SYSTEM ARCHITECURE

Figure 5.2 System Architecture

8
5.3 DATA COLLECTION

Data was collected from publicly available sources such as the UCI SMS Spam
Collection, Enron Email Dataset, and Kaggle repositories. These datasets include labeled
spam and non-spam messages from diverse formats. All data was anonymized to ensure
privacy compliance and balanced using synthetic sampling techniques where necessary.

5.4 DATA PRE-PROCESSING

Collected messages were cleaned and prepared using text preprocessing techniques
such as lowercasing, punctuation removal, tokenization, stop-word removal, stemming,
and lemmatization. The processed text was converted into numerical vectors using TF-
IDF or Bag of Words, making it suitable for machine learning model input.

5.5 TESTING DATASET

A portion of the dataset was reserved for testing. It underwent the same
preprocessing as the training data. Model performance was evaluated using metrics like
accuracy, precision, recall, and F1-score to ensure effective generalization to new
messages.

5.6 ALGORITHM SELECTION

Several algorithms were tested, including Naive Bayes, SVM, Random Forest, and
Logistic Regression. Deep learning models like CNN and LSTM were also considered.
The best-performing model, based on evaluation metrics, was selected for deployment.

9
5.6.1 ALGORITHM USED

To effectively detect spam messages, several machine learning algorithms were

explored and implemented. Each algorithm was evaluated based on its ability to classify
email and SMS messages accurately while minimizing false positives and negatives. The
algorithms selected for this study include Naive Bayes, Random Forest, Support Vector
Machine (SVM), and Neural Networks. These models were chosen for their proven
efficiency in text classification tasks and their adaptability to different data structures.

5.6.2 NAIVE BAYES CLASSIFIER

Naive Bayes is a probabilistic algorithm based on Bayes' Theorem, assuming

independence among features. It is particularly effective in spam detection due to its
simplicity and speed. The model calculates the probability of a message being spam or ham
based on word frequency. Despite its simplicity, Naive Bayes often performs competitively
with more complex models, especially on well-preprocessed textual data.

5.6.3 RANDOM FOREST

Random Forest is an ensemble learning method that constructs multiple decision

trees during training and outputs the majority vote of the trees for classification. It handles
high-dimensional data well and is robust against overfitting. In spam detection, Random
Forest can capture non-linear patterns and perform feature importance analysis, enhancing
model interpretability and performance.

5.6.4 SUPPORT VECTOR MACHINE (SVM)

Support Vector Machine is a powerful supervised learning algorithm used for binary

10
classification. It identifies the optimal hyperplane that best separates spam from non-spam
messages in the feature space. SVM is particularly useful when dealing with high-
dimensional data like TF-IDF vectors and is known for its high accuracy and robustness in
spam filtering.

5.6.5 NEURAL NETWORKS

Neural Networks, including architectures like CNN and LSTM, are used to model
complex and contextual patterns in text data. These deep learning models automatically
learn representations from raw input without manual feature engineering. While they
require more computational resources and training time, they can offer superior accuracy,
especially with large and diverse datasets.

5.6.6 MODEL PERFORMANCE COMPARISON

Each algorithm was trained and tested on the same pre-processed dataset. Their
performance was evaluated using metrics such as accuracy, precision, recall, F1-score, and
ROC-AUC.

 Naive Bayes demonstrated fast performance with reasonable accuracy.

 Random Forest offered a good balance between accuracy and

interpretability.

 SVM provided high precision and was effective in minimizing false positives.

 Neural Networks achieved the highest accuracy but required more

computational resources.

Based on this comparison, the model best suited for deployment was selected considering
both performance and implementation feasibility.
11
CHAPTER-6
UML DIAGRAMS

6.1 CLASS DIAGRAM

Figure 6.1 Class Diagram

12
6.2 USE CASE DIAGRAM

Figure 6.2 Use Case Diagram

13
6.3 ACTIVITY DIAGRAM

Figure 7.3 Activity Diagram

14
CHAPTER-7

PERFORMANCE ANALYSIS

7.1 ABOUT THE DATASET

The dataset used for training and evaluating the spam detection model consists of:

 Source: Publicly available datasets such as SMS Spam Collection Dataset

(UCI) or Enron Email Dataset.

 Features:

o Text content (Email/SMS body)

o Labels (Spam/Ham)

 Preprocessing Steps:

o Text Cleaning: Removing special characters, stopwords, and lowercasing.

o Tokenization & Vectorization: Using TF-IDF or Word2Vec for feature

extraction.

o Balancing: Handling class imbalance (if any) using

oversampling/undersampling.

Dataset Statistics:

Class Count

Spam 1,000

Ham 4,000

15
7.2 ACCURACY COMPARISON OF ALGORITHMS
Different Machine Learning algorithms were tested for spam detection, and
their performance was compared using metrics like Accuracy, Precision, Recall, and F1-
Score.
Comparison Table:

Algorithm Accuracy (%) Precision Recall F1-Score

Naive Bayes 92.5 0.91 0.88 0.89

Logistic Regression 94.2 0.93 0.91 0.92

Random Forest 95.1 0.94 0.93 0.93

SVM 93.8 0.92 0.90 0.91

Deep Learning (LSTM) 96.0 0.95 0.94 0.94

Observations:
 Naive Bayes is fast but less accurate for complex spam patterns.
 Logistic Regression provides a good balance between speed and accuracy.
 Random Forest & SVM perform well but may overfit on small datasets.
 LSTM (Deep Learning) achieves the highest accuracy but requires more
computational power.

16
7.3 METHODOLOGY

The spam detection system was developed using the following steps:
1. Data Collection & Preprocessing
 Gathered labeled datasets (spam/ham).
 Cleaned text data (removed noise, normalized text).
 Applied TF-IDF/Word Embeddings for feature extraction.
2. Model Training
 Split data into 70% training, 30% testing.
 Trained multiple ML models (Naive Bayes, SVM, Random Forest, etc.).
 Implemented LSTM for deep learning-based classification.
3. Evaluation Metrics
 Used Confusion Matrix, Precision, Recall, F1-Score.
 Compared model performance to select the best one.
4. Deployment (Optional)
 Integrated the best model into a Flask/Django web app or Android app.
 Enabled real-time spam detection for emails/SMS.
Conclusion
 Best Performing Model: LSTM (Deep Learning) with 96% accuracy.
 Best Trade-off Model: Random Forest (95.1% accuracy, less resource-
intensive).
 Future improvements could include BERT-based models for better
contextual understanding.
This analysis confirms that machine learning effectively detects spam, with deep
learning providing the highest accuracy at the cost of higher computational requirements.

17
CHAPTER-8
CONCULSION
The primary goal of this project was to develop an intelligent, automated system
capable of detecting and classifying email and SMS messages as spam or legitimate (ham)
using machine learning techniques. Through the integration of text preprocessing methods,
feature extraction strategies such as TF-IDF and Bag of Words, and classification algorithms
including Naive Bayes, Random Forest, SVM, and Neural Networks, the system successfully
demonstrated its ability to accurately identify unwanted or harmful messages.
The implementation of Natural Language Processing (NLP) techniques significantly
enhanced the system’s ability to interpret message content, while supervised learning
algorithms enabled precise spam detection based on historical data. Among the evaluated
models, Neural Networks and Support Vector Machines yielded the highest accuracy, though
Naive Bayes offered superior performance in terms of speed and computational efficiency.
This project not only underscores the potential of machine learning in combating spam
across digital communication channels but also highlights the importance of continual model
training and data updates to stay ahead of evolving spam tactics. The system was designed to
be scalable, adaptable, and user-friendly, making it applicable in real-world scenarios ranging
from individual users to enterprise communication platforms.
In conclusion, the machine learning-based spam detection system provides a reliable
solution to a widespread problem, improving user experience, enhancing communication
security, and reducing the risk of phishing and other cyber threats. With further improvements
and integration into live platforms, this solution can be a vital component in maintaining safe
and efficient digital communication.

18
CHAPTER-9
FUTURE ENHANCEMENT

The current implementation of the Email/SMS Spam Detection system lays a strong
foundation for automated spam classification using machine learning. However, to address
the growing complexity of spam tactics and ensure broader applicability, several future
enhancements can be considered:
1. Advanced Deep Learning Integration
Future versions can incorporate state-of-the-art deep learning models such as LSTM,
GRU, and transformer-based architectures (e.g., BERT) for improved context
understanding and semantic analysis of messages.
2. Real-Time Filtering
Implementing real-time spam detection with minimal latency will allow seamless
integration into live messaging platforms and communication gateways, enhancing user
protection instantly.
3. Multilingual Support
Extending the model to support messages in regional and international languages
will broaden its usability and effectiveness in diverse linguistic environments.
4. Adaptive Learning with User Feedback
Enabling a continuous learning loop where the model updates based on user
feedback will help improve accuracy and reduce misclassification over time.
5. Phishing and Threat Detection
Expanding the system to detect phishing links, scams, and malware-infected
messages will provide users with an additional layer of security.
6. Browser and Mobile App Deployment
Creating lightweight, user-friendly browser extensions or mobile applications will
improve accessibility, allowing end-users to benefit from spam detection on the go.

19
CHAPTER-10
APPENDIX

10.1 SOURCE CODE

App.py

import streamlit as st
import pickle
import string
from nltk.corpus import stopwords
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
def transform_text(text):
text = text.lower()
text = nltk.word_tokenize(text)

y = []
for i in text:
if i.isalnum():
y.append(i)

text = y[:]
y.clear()

for i in text:

20
if i not in stopwords.words('english') and i not in string.punctuation:
y.append(i)

text = y[:]
y.clear()
for i in text:
y.append(ps.stem(i))
return " ".join(y)
tfidf = pickle.load(open('vectorizer.pkl','rb'))
model = pickle.load(open('model.pkl','rb'))

st.title("Email/SMS Spam Classifier")

input_sms = st.text_area("Enter the message")

if st.button('Predict'):

# 1. preprocess
transformed_sms = transform_text(input_sms)
# 2. vectorize
vector_input = tfidf.transform([transformed_sms])
# 3. predict
result = model.predict(vector_input)[0]
# 4. Display
if result == 1:
st.header("Spam")
else:
st.header("Not Spam")
21
10.2 OUTPUT SCREENS

22
CHAPTER-11
BIBLIOGRAPHY

REFERENCES

1. S. H. a. M. A. T. Toma, "An Analysis of Supervised Machine Learning Algorithms for

Spam Email Detection," in International Conference on Automation, Control and
Mechatronics for Industry 4.0 (ACMI), 2021.

2. S. Nandhini and J. Marseline K.S., "Performance Evaluation of Machine Learning

Algorithms for Email Spam Detection," in International Conference on Emerging Trends
in Information Technology and Engineering (ic-ETITE), 2020.

3. A. L. a. S. S. S. Gadde, "SMS Spam Detection using Machine Learning and Deep

Learning Techniques," in 7th International Conference on Advanced Computing and
Communication Systems (ICACCS), 2021, 2021.

4. V. B. a. B. K. P. Sethi, "SMS spam detection and comparison of various machine

learning algorithms," in International Conference on Computing and Communication
Technologies for Smart Nation (IC3TSN), 2017.

5. G. D. a. A. R. P. Navaney, "SMS Spam Filtering Using Supervised Machine Learning

Algorithms," in 8th International Conference on Cloud Computing, Data Science &
Engineering (Confluence), 2018.

23
6. S. O.Olatunji, "Extreme Learning Machines and Support Vector Machines models for
email spam detection," in IEEE 30th Canadian Conference on Electrical and Computer
Engineering (CCECE), 2017.

7. S. S. a. N. N. Kumar, "Email Spam Detection Using Machine Learning Algorithms," in

Second International Conference on Inventive Research in Computing Applications
(CIRCA), 2020.

8. Harika, "Analytics Vidhya," [Online]. Available:

https://www.analyticsvidhya.com/blog/2021/07/an-introduction-to-logistic-
regression/.

9. H. Deng, "Towards Data Science," [Online]. Available:

https://towardsdatascience.com/random-forest-3a55c3aca46d.

10. d.AI, "deepai,"Available:deepai.org/machine-learning-glossary-and

terms/accuracy-error-rate.

NLP Report
No ratings yet
NLP Report
19 pages
MINI_PROJECT REPORT (1)
No ratings yet
MINI_PROJECT REPORT (1)
21 pages
aryan blackbook 1
No ratings yet
aryan blackbook 1
29 pages
Maid hiring management system
No ratings yet
Maid hiring management system
43 pages
Project Report Emaildetection
No ratings yet
Project Report Emaildetection
44 pages
2020CSEPID63 - Spam Alert System Synopsis Final
No ratings yet
2020CSEPID63 - Spam Alert System Synopsis Final
12 pages
DOCUMENT
No ratings yet
DOCUMENT
32 pages
Email spam detection edited
No ratings yet
Email spam detection edited
30 pages
Project_Report_Template_AICTE_Internship_2025
No ratings yet
Project_Report_Template_AICTE_Internship_2025
21 pages
Final PPT
No ratings yet
Final PPT
18 pages
Spam Detection in Text Using Machine Learning 1
No ratings yet
Spam Detection in Text Using Machine Learning 1
85 pages
Print 22may2023
No ratings yet
Print 22may2023
54 pages
Final Report Spam Classifier
No ratings yet
Final Report Spam Classifier
24 pages
EMAIL+SPAM+DETECTION Final Fishries++ (2658+to+2664) - 1
No ratings yet
EMAIL+SPAM+DETECTION Final Fishries++ (2658+to+2664) - 1
7 pages
Project_Report_Template_AICTE_Internship_2025
No ratings yet
Project_Report_Template_AICTE_Internship_2025
20 pages
693613494-Project-Report-Emaildetection-4-44
No ratings yet
693613494-Project-Report-Emaildetection-4-44
41 pages
Sms Spam Detection
No ratings yet
Sms Spam Detection
51 pages
Reportfile
No ratings yet
Reportfile
10 pages
ml lab
No ratings yet
ml lab
13 pages
Kriti Final Report
No ratings yet
Kriti Final Report
60 pages
0_SPAM MAIL PREDICTION
No ratings yet
0_SPAM MAIL PREDICTION
29 pages
AntiSpam
No ratings yet
AntiSpam
26 pages
E-Mail Spam Detection
No ratings yet
E-Mail Spam Detection
8 pages
Final Documentation
No ratings yet
Final Documentation
82 pages
PRUTHVIRAJ MICOR FOML
No ratings yet
PRUTHVIRAJ MICOR FOML
26 pages
Email Spam Detection
No ratings yet
Email Spam Detection
8 pages
mounika (1)
No ratings yet
mounika (1)
8 pages
Spam Email Detection Using Python and Machine Learning
No ratings yet
Spam Email Detection Using Python and Machine Learning
14 pages
emailSpamDetection
No ratings yet
emailSpamDetection
8 pages
vaibhav tiwari final project
No ratings yet
vaibhav tiwari final project
32 pages
vishal FOML micro project vishal & milan
No ratings yet
vishal FOML micro project vishal & milan
26 pages
Presentation 3
No ratings yet
Presentation 3
13 pages
1822 b Deleted Merged Cropped
No ratings yet
1822 b Deleted Merged Cropped
40 pages
Amrit Science Campus: Submitted by
No ratings yet
Amrit Science Campus: Submitted by
35 pages
Aiml Pro
No ratings yet
Aiml Pro
14 pages
Final_report(Saie)
No ratings yet
Final_report(Saie)
38 pages
Spam Detection in Emails Using Machine Learning
No ratings yet
Spam Detection in Emails Using Machine Learning
56 pages
Sms Spam Detectionn (1)
No ratings yet
Sms Spam Detectionn (1)
63 pages
Final Project Report PDF
No ratings yet
Final Project Report PDF
35 pages
Synopsis On
No ratings yet
Synopsis On
8 pages
Abhishek mini proj^. file
No ratings yet
Abhishek mini proj^. file
19 pages
SMS SPAM DETECTION USING NAÏVE BAYES ALGORITHM-5
No ratings yet
SMS SPAM DETECTION USING NAÏVE BAYES ALGORITHM-5
6 pages
SMS SPAM FILTERING report
No ratings yet
SMS SPAM FILTERING report
38 pages
Miniproject Thirukumaran
No ratings yet
Miniproject Thirukumaran
38 pages
email report
No ratings yet
email report
15 pages
Email Spam Detection Ppt Github
No ratings yet
Email Spam Detection Ppt Github
11 pages
Spam email. Classifier
No ratings yet
Spam email. Classifier
44 pages
Spam Detection Synopsis
No ratings yet
Spam Detection Synopsis
8 pages
Research Paper Spam Detection
No ratings yet
Research Paper Spam Detection
4 pages
TABLE CONTENT 1
No ratings yet
TABLE CONTENT 1
3 pages
REPORT[1]_1
No ratings yet
REPORT[1]_1
35 pages
Jawaharlal Nehru Technology University-A, Ananthapur: A Social Relevant Project Report Submitted To
No ratings yet
Jawaharlal Nehru Technology University-A, Ananthapur: A Social Relevant Project Report Submitted To
24 pages
Devangi It Report
No ratings yet
Devangi It Report
22 pages
Second Progress Report
No ratings yet
Second Progress Report
17 pages
sms spam detection project final
No ratings yet
sms spam detection project final
59 pages
Presentation
No ratings yet
Presentation
27 pages
Final CPP Project
No ratings yet
Final CPP Project
19 pages
NNDL MINI PROJECT REPORT (1)
No ratings yet
NNDL MINI PROJECT REPORT (1)
14 pages
AI and Deep Learning for Networks
From Everand
AI and Deep Learning for Networks
Gopee Mukhopadhyay
No ratings yet
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
BDA Final
No ratings yet
BDA Final
33 pages
email spam
No ratings yet
email spam
8 pages
Smart Farm Docsfinal
No ratings yet
Smart Farm Docsfinal
96 pages
Heart Disease Prediction Report
No ratings yet
Heart Disease Prediction Report
112 pages
Heart Disease Prediction Project Report (1)
No ratings yet
Heart Disease Prediction Project Report (1)
4 pages
TSP CMC 41333
No ratings yet
TSP CMC 41333
14 pages
Obfuscated Malware Detection Using Dilated Convolutional Network
0% (1)
Obfuscated Malware Detection Using Dilated Convolutional Network
6 pages
Lasya Priya Capstone
No ratings yet
Lasya Priya Capstone
64 pages
AIML-UNIT-3
No ratings yet
AIML-UNIT-3
17 pages
CCCS-CIC-AndMal-2020 (2)
No ratings yet
CCCS-CIC-AndMal-2020 (2)
6 pages
Forecasting Directional Movements of Stock Prices For Intraday Trading Using LSTM and Random Forests
No ratings yet
Forecasting Directional Movements of Stock Prices For Intraday Trading Using LSTM and Random Forests
8 pages
Kashish Singh Resume
No ratings yet
Kashish Singh Resume
1 page
Financial Fraud Detection in Healthcare Using Machine and Deep Learning
No ratings yet
Financial Fraud Detection in Healthcare Using Machine and Deep Learning
25 pages
(Internet of Everything (IoE) ) Rashmi Agrawal (Editor) Marcin Paprzycki (Editor) Neha Gupta (Editor) - Big Data IoT and Mach
No ratings yet
(Internet of Everything (IoE) ) Rashmi Agrawal (Editor) Marcin Paprzycki (Editor) Neha Gupta (Editor) - Big Data IoT and Mach
339 pages
Sat - 95.Pdf - Heart Disease Prediction Using Machine Learning Algorithms
No ratings yet
Sat - 95.Pdf - Heart Disease Prediction Using Machine Learning Algorithms
11 pages
Credit Risk - Predictive Modelling
No ratings yet
Credit Risk - Predictive Modelling
47 pages
Machine Learning in A Nutshell
No ratings yet
Machine Learning in A Nutshell
36 pages
Cyberbullying Detection On Twitter Using Machine Learning A Review
No ratings yet
Cyberbullying Detection On Twitter Using Machine Learning A Review
5 pages
Air Quality Index Using Machine Learning a Jordan Case Study
No ratings yet
Air Quality Index Using Machine Learning a Jordan Case Study
11 pages
Assessing Water Quality of An Ecologically Critical Urban Canal Incorporating Machine Learning Approaches
No ratings yet
Assessing Water Quality of An Ecologically Critical Urban Canal Incorporating Machine Learning Approaches
23 pages
Draft Report Final Year
No ratings yet
Draft Report Final Year
62 pages
Unit-I
No ratings yet
Unit-I
23 pages
Minor Project Report
No ratings yet
Minor Project Report
46 pages
Article PP 1416-1433
No ratings yet
Article PP 1416-1433
18 pages
XG Boost
No ratings yet
XG Boost
5 pages
Proactive Collections Management: Using Artificial Intelligence To Predict Invoice Payment Dates By: Sonali Nanda
No ratings yet
Proactive Collections Management: Using Artificial Intelligence To Predict Invoice Payment Dates By: Sonali Nanda
22 pages
Computer Science Project Titles 2024 25 Takeoff Edu Group (1)
No ratings yet
Computer Science Project Titles 2024 25 Takeoff Edu Group (1)
19 pages
Tuning Parameters
No ratings yet
Tuning Parameters
15 pages
Crop Recommendation On Analyzing Soil Using Machine Learning
No ratings yet
Crop Recommendation On Analyzing Soil Using Machine Learning
8 pages
Codeforces
No ratings yet
Codeforces
3 pages
Predictive-Analytics-for-Demand-Forecasting-and-Planning
No ratings yet
Predictive-Analytics-for-Demand-Forecasting-and-Planning
3 pages
72255028
100% (1)
72255028
81 pages
CV Generate
No ratings yet
CV Generate
1 page

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.