International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 11 | Nov 2022 www.irjet.net p-ISSN: 2395-0072

Email Spam Detection Using Machine Learning

Prof. Prachi Nilekar, Tamboli Abdul Salam, Manish Kumar Gupta,
Krishna Sharma, Safwan Attar

ALARD COLLEGE OF ENGINEERING & MANAGEMENT

(ALARD Knowledge Park, Survey No. 50, Marunje, Near Rajiv Gandhi IT Park, Hinjewadi, Pune-411057)
Approved by AICTE. Recognized by DTE. NAAC Accredited. Affiliated to SPPU (Pune University).
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Email spam has become a major problem: as the number of internet users grows, so does the volume of spam. Spam is used for phishing, fraud, and other illegal and unethical practices, and the malicious links it carries can harm or compromise the recipient's system. It is easy for spammers to create a fake profile and email account and pose as a real person, and they target people who are unaware of these frauds. There is therefore a need to identify fraudulent spam emails. This project identifies spam using machine learning techniques. This paper discusses several machine learning algorithms, applies each of them to our dataset, and selects the algorithm with the best accuracy and precision for email spam detection.

Key Words: Machine Learning, Naive Bayes, Support Vector Machine, DTS, Random Forest, Bagging, Boosting

1. INTRODUCTION

Machine learning approaches to email filtering are efficient because they learn from a set of training data: a collection of emails that have already been classified. Many machine learning algorithms can be used for email filtering, including Naive Bayes, Support Vector Machines, Neural Networks, K-nearest neighbor, and Random Forests.

Why Machine Learning: Machine learning allows the user to feed a computer algorithm an immense amount of data and have the computer analyze it and make data-driven recommendations and decisions based only on the input data.

What is DATASET: A dataset is a collection of data or related information that is composed of separate elements. A dataset for email spam contains both spam and non-spam messages.

What is Train and Test datasets: The main difference between training data and test data is that training data is the subset of the original data used to train a machine learning model, whereas test data is used to check the accuracy of the model. The training dataset is usually larger than the test dataset. Train and test datasets are two key concepts in machine learning: the training dataset is used to fit the model, and the test dataset is used to evaluate the model.

Fig -1: Train and Test Model

Machine learning algorithms are used to classify text into two categories, spam and ham, and to predict a score for each message. The objective of developing this model is to detect and score words quickly and accurately.

2. MACHINE LEARNING CLASSIFICATION ALGORITHMS

Naive Bayes: Naive Bayes is a classification algorithm suitable for both binary and multiclass classification. It performs better with categorical input variables than with numerical variables. It is useful for making predictions and forecasting data based on historical results. It is based on Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

P(A|B) is the posterior probability of hypothesis A given evidence B, and P(B|A) is the likelihood of the evidence given the hypothesis.

P(A) is Prior Probability: The probability of the hypothesis before seeing the evidence.

P(B) is Marginal Probability: The probability of the evidence.

Support Vector Machine: SVMs are used in intrusion detection, face detection, email classification, gene classification, web page classification, and similar tasks. They can handle classification and regression on both linear and non-linear data.
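As a rough illustration of how the pieces introduced so far fit together (a labelled dataset, a train/test split, a Naive Bayes classifier, and scoring by accuracy and precision), the following minimal Python sketch trains on a handful of invented messages. The toy messages, the function names, the 25% test ratio, and the use of Laplace smoothing are all assumptions made for this example; they are not taken from the project's actual dataset or code.

```python
import math
import random
from collections import Counter

# Hypothetical toy dataset: (message, label) pairs invented for illustration.
DATA = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("claim your free prize", "spam"),
    ("cheap loans win cash", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch at noon tomorrow", "ham"),
    ("see the attached project notes", "ham"),
]


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


def fit(rows):
    """Count labels and per-label word frequencies from the training rows."""
    label_counts = Counter(label for _, label in rows)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in rows:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior, i.e. log P(A)
        for word in text.split():
            # Smoothed log likelihood of each word given the label.
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy_and_precision(model, rows, positive="spam"):
    """Accuracy over all rows; precision over rows predicted as `positive`."""
    preds = [(predict(model, text), label) for text, label in rows]
    correct = sum(p == y for p, y in preds)
    flagged = [y for p, y in preds if p == positive]
    precision = flagged.count(positive) / len(flagged) if flagged else 0.0
    return correct / len(preds), precision


# Fit on the training rows only, then score on the held-out rows.
train_rows, test_rows = train_test_split(DATA)
model = fit(train_rows)
print(accuracy_and_precision(model, test_rows))
```

With only eight messages the held-out scores are not meaningful; the point is the workflow: fit on the training rows only, then score on rows the model has not seen.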

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 735
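The paper does not say how its SVM was trained. Purely as a sketch of the idea behind a linear SVM, the Python below fits a separating hyperplane by sub-gradient descent on the regularised hinge loss. The two features per email, the toy points, and the hyperparameters (lr, lam, epochs) are hypothetical, and the non-linear (kernel) case mentioned in the text is not covered.

```python
# Hypothetical features: (count of spam-like words, count of work-like words).
# Labels follow the usual SVM convention: +1 = spam, -1 = ham.
TRAIN = [
    ((3.0, 0.0), 1), ((4.0, 1.0), 1), ((3.0, 1.0), 1),
    ((0.0, 3.0), -1), ((1.0, 4.0), -1), ((0.0, 2.0), -1),
]


def train_linear_svm(samples, lr=0.1, lam=0.01, epochs=200):
    """Minimise the regularised hinge loss by sub-gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Inside the margin or misclassified: move towards the sample.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only apply the regulariser.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def classify(w, b, x):
    """Return +1 (spam) or -1 (ham) depending on the side of the hyperplane."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1


w, b = train_linear_svm(TRAIN)
print(classify(w, b, (5.0, 0.0)), classify(w, b, (0.0, 5.0)))
```

The hinge loss only penalises points that fall inside the margin, which is why the update rule changes the weights for those points and merely shrinks them otherwise.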
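Section 2 also describes decision trees, KNN, and random forests. As a hedged illustration of the three quantities those descriptions rely on (entropy of a set of labels, Euclidean distance between two points, and a bootstrap sample drawn with replacement), here is a short Python sketch. The function names, the default k = 3, and the toy points in the usage lines are assumptions for this example only.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def euclidean(p, q):
    """Distance between points p and q: sqrt((x - a)^2 + (y - b)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train, point, k=3):
    """Majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda row: euclidean(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bootstrap_sample(rows, seed=0):
    """Draw len(rows) rows with replacement, as a random forest does per tree."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]


# Hypothetical 2-D feature vectors with labels, for illustration only.
POINTS = [
    ((3, 0), "spam"), ((4, 1), "spam"), ((3, 1), "spam"),
    ((0, 3), "ham"), ((1, 4), "ham"), ((0, 2), "ham"),
]
print(entropy([label for _, label in POINTS]))
print(knn_predict(POINTS, (4, 0)))
print(len(bootstrap_sample(POINTS)))
```

A perfectly mixed two-class set has entropy 1 and a pure set has entropy 0, which is what a decision tree exploits when it picks the split that reduces entropy the most.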

Fig -2: Support Vector Machine

Decision tree: Decision trees are extremely useful for data analytics and machine learning because they break down complex data into more manageable chunks. They are often used in these fields for predictive analysis, data classification, and regression.

Entropy using the frequency table of one attribute: E(S) = Σ −p_i log2(p_i), where p_i is the proportion of samples in class i.

Entropy using the frequency table of two attributes: E(T, X) = Σ P(c) · E(c), summed over the values c of attribute X.

KNN: The KNN algorithm can compete with the most accurate models because it makes highly accurate predictions. It is suited to applications that require high accuracy but do not require a human-readable model. The quality of the predictions depends on the distance measure. Formula:

dist((x, y), (a, b)) = √((x − a)² + (y − b)²)

Random forest classifiers: Random forest classifiers can be used to solve regression or classification problems. The random forest algorithm is composed of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called a bootstrap sample.

3. OBJECTIVES OF THE STUDY

Machine learning algorithms are used to classify text into two categories, spam and ham, and to predict a score for each message. The purpose of developing this model is to recognize and score words rapidly and accurately.

4. SCOPE OF THE STUDY

The proposed system will detect spam mails effectively: it extracts them using machine learning algorithms and gives results with high accuracy and good performance. This project required a coordinated scope of work. These project scopes will help focus the project. The scopes are:

• Modify existing machine learning algorithms.

• Use and classify datasets, including data preparation, classification, and visualization.

• Score the data to determine the accuracy of spam detection.

• The proposed system will detect the credibility of each mail and filter spam messages.

• The proposed system will save the user's time and eliminate the risk of spam mails.

Use case diagrams describe the high-level functions and scope of the system and identify the interactions between the system and its actors. A use case diagram outlines how external entities (users) interact with an internal software system.

Fig -3: Use Case Diagram

A state diagram consists of states, transitions, activities, and events. It describes the different states that an object moves through, or provides an abstract description of the behavior of a system.

Fig -4: State Diagram
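To make the idea of states, transitions, and events in a state diagram concrete, here is a small Python sketch of a state machine for a single email passing through a spam filter. The state names and events are hypothetical and chosen only for illustration; they are not read off Fig. 4.

```python
from enum import Enum


class MailState(Enum):
    """Hypothetical states; the paper's Fig. 4 is not reproduced in the text."""
    RECEIVED = "received"
    SCORED = "scored"
    INBOX = "inbox"
    SPAM_FOLDER = "spam_folder"


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (MailState.RECEIVED, "score"): MailState.SCORED,
    (MailState.SCORED, "mark_ham"): MailState.INBOX,
    (MailState.SCORED, "mark_spam"): MailState.SPAM_FOLDER,
}


def step(state, event):
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.name}"
        ) from None


def run(events, state=MailState.RECEIVED):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["score", "mark_spam"]).name)
```

Keeping the transitions in one table makes invalid moves (for example, scoring a mail that is already in the inbox) fail loudly instead of silently changing state.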

Activity diagrams are graphical representations of workflows with support for selection, repetition, and concurrency of step-by-step activities and tasks.

Fig -5: Activity Diagram

5. PROJECT ARCHITECTURE DIAGRAM

An architectural diagram is a visual representation that shows the physical implementation of the components of a software system. It shows the general structure of the software system and the associations, boundaries and limits between each element.

Fig -6: Architecture Diagram of Email Spam Detection

6. CONCLUSIONS

In addition to lessening the workload, this system also corrects any false data held about its users. It benefits users, whose valuable time and data are preserved, and affected users or authorities whose data is immensely important and will be secured from misuse.

We are able to classify email as spam or non-spam. If many people use the system and the number of emails becomes very large, it will be difficult to handle all possible mails, as our project deals with only a limited number of emails.

The website is intended for end users and is user-friendly: an end user can use it without any outside help and without conflicts. The goal of the website is to classify email as spam or non-spam using machine learning. It is free to use, and its maintenance cost (coding, updates, uploading data, datasets, etc.) is low. We successfully achieved the goals of the project.

ACKNOWLEDGEMENT

This paper was supported by Alard College of Engineering & Management, Pune 411057. We are very thankful to all those who provided us with valuable guidance towards the completion of this Seminar Report on “Email Spam Detection Using Machine Learning” as part of the syllabus of our course. We express our sincere gratitude to the cooperative department, which provided us with valuable assistance and the requirements for the system development. We are very grateful to Prof. Prachi Nilekar for guiding us in the right manner, resolving our doubts by giving us time whenever we required it, and sharing knowledge and experience in making this project.

REFERENCES

[1] A. Sharaff and U. Srinivasarao (2020), "Towards classification of email through selection of informative features," First International Conference on Power, Control and Computing Technologies (ICPC2T), Raipur, India, pp. 316-320, DOI: 10.1109/ICPC2T48082.2020.9071488.

[2] Adebayo A. Alli, Modupe Odusami, Olusola A. Alli and Sanjay Misra (2019), "A review of soft techniques for SMS classification: methods, approaches and applications," Engineering Applications of Artificial Intelligence, DOI: 10.1016/j.engappai.2019.08.024.

[3] A. Sharma and H. Kaur, "Improved email spam classification method using integrated particle swarm optimization and decision tree," in 2nd International Conference on Next Generation Computing Technologies, pp. 516-521, DOI: 10.1109/NGCT.2016.7877470.


[4] A. Sharaff, A. Dhadse and Naresh K. Nagwani (2016), "Comparative study of classification algorithms for spam email detection," in Emerging Research in Computing, Information, Communication and Applications, pp. 237-244, Springer, Berlin, Germany, DOI: 10.1007/978-81-322-2553-9_23.

[5] Alfandi O., Dahmani N. and Kaddoura S., "A Spam Email Detection Mechanism for English Language Text Emails Using Deep Learning Approach," IEEE 29th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises, Bayonne, France, pp. 193-198, DOI: 10.1109/WETICE49692.2020.00045.

[6] Amin, Hossain N. & Rahman M. M., "A Bangla Spam Email Detection and Datasets Creation Approach based on Machine Learning Algorithms," 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering, Rajshahi, Bangladesh, 2019, pp. 169-172, DOI: 10.1109/ICECTE48615.2019.9303525.
