Irjet V9i11154

This document discusses using machine learning algorithms to detect email spam. It begins with an introduction to machine learning approaches for email filtering and to classification algorithms such as Naive Bayes, support vector machines, decision trees, KNN, and random forests. The objectives are to use these algorithms to classify text as spam or ham and to predict scores accurately. The scope of work includes modifying ML algorithms, using and classifying datasets, scoring data for accuracy, and detecting the credibility of mail to filter spam messages. Diagrams show use cases, system states, and workflows. The conclusion is that machine learning is an effective approach for email spam detection.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 11 | Nov 2022 www.irjet.net p-ISSN: 2395-0072

Email Spam Detection Using Machine Learning


Prof. Prachi Nilekar, Tamboli Abdul Salam, Manish Kumar Gupta,
Krishna Sharma, Safwan Attar

ALARD COLLEGE OF ENGINEERING & MANAGEMENT


(ALARD Knowledge Park, Survey No. 50, Marunje, Near Rajiv Gandhi IT Park, Hinjewadi, Pune-411057)
Approved by AICTE. Recognized by DTE. NAAC Accredited. Affiliated to SPPU (Pune University).
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Email spam has become a major problem: as the number of internet users grows rapidly, the volume of spam grows with it. Spam is used for phishing, fraud, and other illegal and unethical practices, and spam emails often carry malicious links that can harm or compromise the recipient's system. It is very simple for spammers to create a fake profile and email account and to pose as a real person in their emails, targeting people who are not aware of these frauds. There is therefore a need to identify fraudulent spam mails. This project identifies spam using machine learning techniques. This paper discusses machine learning algorithms, applies all of them to our dataset, and selects the algorithm with the best accuracy and precision in email spam detection.

Key Words: (Machine Learning, Naive Bayes, Support Vector Machine, DTS, Random Forest, Bagging, Boosting)

1. INTRODUCTION

Machine learning approaches are efficient for this task: they use a set of training data consisting of emails that have already been classified. Many machine learning algorithms can be used for email filtering, including Naive Bayes, support vector machines, neural networks, K-nearest neighbor, and random forests.

Why Machine Learning: Machine learning allows the user to feed a computer algorithm an immense amount of data and have the computer analyze it and make data-driven recommendations and decisions based only on the input data.

What is a Dataset: A dataset is a collection of data or related information that is composed of separate elements. A dataset for email spam contains spam and non-spam messages.

What are Train and Test Datasets: Training and test datasets are two key concepts in machine learning. The training data is the subset of the original data that is used to fit (train) a machine learning model, whereas the test data is used to evaluate the accuracy of the model. The training dataset is usually larger than the test dataset.

Fig -1: Train and Test Model

Machine learning algorithms are used to classify text into two categories, spam and ham, and to predict the score accurately. The objective of developing this model is to detect and score words quickly and accurately.

2. MACHINE LEARNING CLASSIFICATION ALGORITHMS

Naive Bayes: Naive Bayes is a classification algorithm suitable for both binary and multiclass classification. Naive Bayes performs better for categorical input variables than for numerical variables. It is useful for making predictions and forecasting data based on historical results. It is based on Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

P(A) is Prior Probability: The probability of the hypothesis before seeing the evidence.

P(B) is Marginal Probability: The probability of the evidence.

Support Vector Machine: SVMs are used in intrusion detection, face detection, email classification, gene classification, web pages, etc. They can handle classification and regression on linear and non-linear data.
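As a minimal sketch of the train/test workflow and the Naive Bayes classifier described above, the following Python example splits a tiny labelled corpus, fits a multinomial Naive Bayes model on the training part, and classifies the held-out messages. The corpus, the function names (split_dataset, train_naive_bayes, predict), the 75/25 split ratio, and the use of Laplace smoothing are illustrative assumptions, not the dataset or implementation used in this project.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus of (text, label) pairs; not the project's dataset.
MESSAGES = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap loans win big", "spam"),
    ("claim your free reward", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("project report attached", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("please review the report", "ham"),
]


def split_dataset(data, train_fraction=0.75, seed=0):
    """Shuffle the data and split it into a larger training set and a smaller test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train_naive_bayes(train):
    """Count messages per class and word occurrences per class."""
    class_counts = Counter(label for _, label in train)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in train:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class maximising log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Prior P(A). The marginal P(B) is the same for every class,
        # so it can be left out of the comparison.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Laplace (add-one) smoothing avoids zero probabilities for unseen words.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_set, test_set = split_dataset(MESSAGES)
model = train_naive_bayes(train_set)
for text, actual in test_set:
    print(f"{text!r}: predicted={predict(model, text)}, actual={actual}")
```

The model is fitted only on the training set and the test set is used only for checking predictions, which mirrors the separation between training and test data described in the introduction.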

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 735

Fig -2: Support Vector Machine

Decision tree: Decision trees are extremely useful for data analytics and machine learning because they break down complex data into more manageable chunks. They are often used in these fields for predictive analysis, data classification, and regression.

Entropy using the frequency table of one attribute:

E(S) = -Σ pᵢ log₂(pᵢ), where pᵢ is the proportion of samples belonging to class i.

Entropy using the frequency table of two attributes:

E(T, X) = Σ P(c) · E(c), summed over each value c of attribute X, where P(c) is the proportion of samples with X = c and E(c) is the entropy of the target within that subset.

KNN: The KNN algorithm makes highly accurate predictions and can therefore compete with the most accurate models. It is used for applications that require high accuracy but do not require a human-readable model. The quality of the predictions depends on the distance measure. Formula:

dist((x, y), (a, b)) = √((x - a)² + (y - b)²)

Random forest classifiers: Random forest classifiers can be used to solve regression or classification problems. The random forest algorithm is composed of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called a bootstrap sample.

3. OBJECTIVES OF THE STUDY

Machine learning algorithms are used to classify text into two categories, spam and ham, and to predict the score accurately. The purpose of developing this model is to recognize and score words rapidly and accurately.

4. SCOPE OF THE STUDY

The proposed system will effectively detect spam mails, extracting them with machine learning algorithms, and will give results with high accuracy and good performance. This project requires a coordinated scope of work. These project scopes will help focus the project. The scopes are:

• Modifying existing machine learning algorithms.

• Using and classifying datasets, including data preparation, classification, and visualization.

• Scoring the data to determine the accuracy of spam detection.

• The proposed system will detect the credibility of the mail and filter spam messages.

• The proposed system will save the user's time and eliminate the risk posed by spam mails.

Use case diagrams describe the high-level functions and scope of the system and identify the interactions between the system and its actors. A use case diagram outlines how external entities (users) interact with an internal software system.

Fig -3: Use Case Diagram

A state diagram consists of states, transitions, activities, and events. It describes the different states that an object moves through, or provides an abstract description of the behavior of a system.

Fig -4: State Diagram
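The scoring step listed in the scope above, determining how accurately spam is detected, can be illustrated with a short sketch. It computes accuracy and precision, the two criteria the abstract names for choosing an algorithm, from lists of actual and predicted labels. The function names and the sample labels are hypothetical and only show the arithmetic; they are not results from this project.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Return (tp, fp, tn, fn), treating `positive` as the class being detected."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # spam correctly flagged
        elif p == positive:
            fp += 1  # ham wrongly flagged as spam
        elif a == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # ham correctly passed
    return tp, fp, tn, fn


def accuracy(actual, predicted):
    """Fraction of all messages that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(actual, predicted):
    """Fraction of messages flagged as spam that really were spam."""
    tp, fp, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels for illustration only.
actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

print(f"accuracy  = {accuracy(actual, predicted):.2f}")   # 0.75
print(f"precision = {precision(actual, predicted):.2f}")  # 0.67
```

Reporting precision alongside accuracy matters for a spam filter because a false positive means a legitimate message is hidden from the user.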

Activity diagrams are graphical representations of workflows with support for selection, repetition, and concurrency of step-by-step activities and tasks.

Fig -5: Activity Diagram

5. PROJECT ARCHITECTURE DIAGRAM

An architectural diagram is a visual representation that shows the physical implementation of the components of a software system. It shows the general structure of the software system and the associations, boundaries, and limits between each element.

Fig -6: Architecture Diagram of Email Spam Detection

6. CONCLUSIONS

In addition to lessening the workload, this system also corrects any false data about the users that they may have. It benefits users, whose valuable time and data are preserved, and affected users or authorities whose data is immensely important and will be secured from misuse.

We are able to classify email as spam or non-spam. If many people use the system and the number of emails is huge, it will be difficult to handle all possible mails, as our project deals with only a limited amount.

The website is intended for end users and is user-friendly: an end user can use it without any other help and without any conflicts. The goal of the website is “Email Spam or Non Spam” classification using machine learning. It is free to use, and its maintenance cost (coding, updates, uploading data, datasets, etc.) is low. We successfully achieved these goals.

ACKNOWLEDGEMENT

This paper was supported by Alard College of Engineering & Management, Pune 411057. We are very thankful to all those who have provided us valuable guidance towards the completion of this Seminar Report on “Email Spam Detection Using Machine Learning” as part of the syllabus of our course. We express our sincere gratitude towards the cooperative department, which has provided us with valuable assistance and requirements for the system development. We are very grateful to Prof. Prachi Nilekar for guiding us in the right manner, resolving our doubts by giving us their time whenever we required it, and providing their knowledge and experience in making this project.