Irjet V9i11154
What is a dataset: A dataset is a collection of data or related information that is composed of separate elements. A dataset for e-mail spam detection contains both spam and non-spam messages.

What are train and test datasets: The main difference between training data and test data is that training data is the subset of the original data used to train a machine learning model, whereas test data is used to check the accuracy of the model. The training dataset is usually larger than the test dataset. Training and test datasets are two key concepts in machine learning.

P(B) is the marginal probability: the probability of the evidence.

Support Vector Machine: SVMs are used in intrusion detection, face detection, email classification, gene classification, classification of web pages, etc. They can handle classification and regression on linear and non-linear data.
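The train/test split described above can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the helper names `train_test_split` and `accuracy`, the 80/20 ratio, the fixed seed, and the toy messages are all assumptions made for the example.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical toy dataset of (message, label) pairs.
messages = [
    ("Win a free prize now", "spam"),
    ("Lowest price on loans, click here", "spam"),
    ("You have been selected for a reward", "spam"),
    ("Claim your lottery winnings today", "spam"),
    ("Meeting moved to 3 pm", "ham"),
    ("Please review the attached report", "ham"),
    ("Lunch tomorrow?", "ham"),
    ("Your invoice for October", "ham"),
    ("Project deadline reminder", "ham"),
    ("Notes from today's lecture", "ham"),
]

train, test = train_test_split(messages, test_fraction=0.2, seed=42)
print(len(train), "training messages,", len(test), "test messages")
```

The model is fitted only on `train`; `accuracy` is then computed on the labels of `test`, which the model has not seen during training.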
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 735
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 11 | Nov 2022 www.irjet.net p-ISSN: 2395-0072
Fig -2: Support Vector Machine

Decision tree: Decision trees are extremely useful for data analytics and machine learning because they break down complex data into more manageable chunks. They are often used in these fields for predictive analysis, data classification, and regression.

Entropy using the frequency table of one attribute: $E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$, where $p_i$ is the proportion of records belonging to class $i$ and $c$ is the number of classes.

This proposed system will detect the credibility of the mail and filter spam messages.

This proposed system will save the user's time and eliminate the risk of spam mails.

Use case diagrams describe the high-level functions and scope of the system. These diagrams also identify the interactions between the system and its actors. A use case diagram outlines how external entities (users) interact with an internal software system.
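The decision tree discussion above refers to entropy computed from the frequency table of one attribute. A minimal sketch of that calculation follows, assuming the standard Shannon entropy with base-2 logarithms. The function name `entropy` and the example counts are illustrative and are not taken from the paper.

```python
import math


def entropy(class_counts):
    """Shannon entropy, in bits, of a class-frequency table.

    `class_counts` maps each class label to the number of records
    that carry that label.
    """
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("frequency table is empty")
    result = 0.0
    for count in class_counts.values():
        if count > 0:  # a class with zero records contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


# Hypothetical frequency table for the target attribute of a mail dataset.
label_counts = {"spam": 5, "ham": 9}
print(round(entropy(label_counts), 3))
```

An evenly split table gives the maximum entropy of 1 bit for two classes, and a table with a single class gives 0. A decision tree prefers splits that reduce this value the most.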
Fig -5: Activity Diagram

5. PROJECT ARCHITECTURE DIAGRAM

An architectural diagram is a visual representation that shows the physical implementation of the components of a software system. It shows the general structure of the software system and the associations, boundaries, and limits between its elements.

ACKNOWLEDGEMENT

This paper was supported by Alard College of Engineering & Management, Pune 411057. We are very thankful to all those who provided us valuable guidance towards the completion of this Seminar Report on “Email Spam Detection Using Machine Learning” as part of the syllabus of our course. We express our sincere gratitude to the cooperative department, which provided us with valuable assistance and the requirements for the system development. We are also very grateful to Prof. Prachi Nikelar for guiding us in the right manner, resolving our doubts by giving us their time whenever we required it, and sharing their knowledge and experience in making this project.