Spam Detection & Classification Final
DETECTION USING INTEGRATION OF LOGISTIC REGRESSION AND PSO ALGORITHM
TEAMMATES
UG SCHOLAR REGISTER NUMBER
ABHINAY.S 412417205001
NARAYANAN.M 412417205061
NISHANTH.T 412417205063
GUIDE
MRS. J.GAYATHRI
SUSTAINABLE DEVELOPMENT GOAL
• Spam mail causes various harmful effects, including exposing users to unwanted images, decreasing company productivity, and blocking Internet Service Provider (ISP) networks.
• Additionally, spam mail may contain viruses designed for fraudulent activity. Thus, a robust and efficient spam mail filtering and classification process is needed.
• Collaborative spam detection techniques can handle large-scale e-mail data contributed by multiple sources, but they have the well-known problem of requiring disclosure of e-mail content.
• Distance-preserving hashes are one common solution for preserving the privacy of e-mail content while still enabling message classification for spam detection.
• A Big Data privacy-preserving collaborative spam detection platform can be built on top of a standard MapReduce facility. Without PSO feature selection the training accuracy is lower; when the best-fitting feature set is selected, the classification accuracy is higher.
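To illustrate the distance-preserving hashing idea above, here is a minimal sketch of SimHash, one well-known locality-sensitive fingerprint. The slides do not name a specific hash, so SimHash, the 64-bit size and the helper names `simhash` and `hamming` are assumptions. Near-duplicate messages tend to produce fingerprints with a small Hamming distance, so collaborating parties can compare fingerprints instead of exchanging raw message text.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a SimHash fingerprint (bits <= 64); similar texts tend to get nearby fingerprints."""
    weights = [0] * bits
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable 64-bit token hash (the built-in hash() is salted per process).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because tokens are lowercased and punctuation is ignored, "Win MONEY now!" and "win money now" get the same fingerprint (distance 0).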
OBJECTIVE
• Spam emails can cause serious issues for Personal Computer (PC) users who have not installed antivirus solutions.
• Spam wastes working time in organizations, decreasing company productivity and degrading overall system performance.
• Our proposed system will be trained for text spam classification, and it will check each user's mail content against our training set.
• If the spam content exceeds a predefined threshold value, the system automatically blocks the mail.
EXISTING SYSTEM
2. Email Spam Detection using Integrated Approach of Naïve Bayes and Particle Swarm Optimization
   Author/Year/Venue: Kriti Agarwal, Tarun Kumar / 2018 / Proceedings of the Second International Conference on Intelligent Computing and Control Systems (ICICCS 2018)
   Methodology: Naïve Bayes is used for learning and classification, and PSO for global optimization of the parameters of the NB approach.
   Drawback: Naïve Bayes fails to predict data that are not present in the training data set.
3. A Machine Learning based Spam Detection Mechanism
   Author/Year/Venue: Nikhil Govil, Kunal Agarwal / 2020 / Fourth International Conference on Computing Methodologies and Communication (ICCMC)
   Methodology: The algorithm generates a dictionary and features and trains them through machine learning for effective results.
   Drawback: Naïve Bayes fails to predict data that are not present in the training data set.
4. Performance Evaluation of Machine Learning Algorithms for Email Spam Detection
   Author/Year/Venue: Nandhini.S, Dr. Jeen Marseline.K.S / 2020 / International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE)
   Methodology: Five classification algorithms are compared, and the Random Tree algorithm is concluded to be the best choice for performance.
   Drawback: A large number of trees can make the algorithm too slow and ineffective for real-time predictions.
5. Detecting Spam Email With Machine Learning Optimized With Bio-Inspired Metaheuristic Algorithms
   Author/Year/Venue: Simran Gibson, Biju Isaac / 2020 / IEEE Access, Volume 8
   Methodology: Bio-inspired algorithms such as Particle Swarm Optimization and Genetic Algorithm are used to optimize the performance of the classifiers.
   Drawback: Using a pure genetic algorithm is time-consuming and computationally expensive.
PROPOSED SYSTEM
• Collaborative spam detection techniques can handle large-scale e-mail data contributed by multiple sources, but they have the well-known problem of requiring disclosure of e-mail content.
• Here we combine PSO (Particle Swarm Optimization) with logistic regression in order to select the best feature set for training.
• The system will be trained for text spam classification.
• It checks each user's mail content against the training set; if the spam content exceeds a predefined threshold value, the system automatically blocks the mail.
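The threshold rule above can be sketched as follows, assuming the classifier already produces a spam probability for each message. The function name `filter_inbox` and the value of `SPAM_THRESHOLD` are hypothetical; the slides only specify "a predefined threshold value".

```python
SPAM_THRESHOLD = 0.7  # hypothetical value; the slides only say "predefined threshold value"


def filter_inbox(spam_scores, threshold=SPAM_THRESHOLD):
    """Split message ids into (delivered, blocked) using each mail's spam probability.

    spam_scores: dict mapping message id -> probability in [0, 1] from the classifier.
    A mail is blocked only when its score is strictly above the threshold.
    """
    delivered, blocked = [], []
    for message_id, score in spam_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {message_id!r} is not a probability: {score}")
        (blocked if score > threshold else delivered).append(message_id)
    return delivered, blocked
```

For example, `filter_inbox({"a": 0.9, "b": 0.2, "c": 0.7})` delivers `b` and `c` and blocks `a`.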
ADVANTAGES
• After anomaly detection, the system automatically performs the necessary actions.
• High detection accuracy
• Low time complexity
RESOURCE REQUIREMENTS
Hardware Requirements
• Processor : Pentium Dual Core or above
• Speed : 2.80 GHz
• RAM : 2 GB
• Hard Disk : 20 GB
Software Requirements
• Operating System : Windows or Linux
• Technology : Python
• Web Framework : Django
• IDE : Python Launcher
• Database : SQLite3
• Packages and Libraries : NumPy, pandas, TensorFlow
MODULES
1) Load Dataset:
In this module, unstructured data (text, CSV files, etc.) is converted into a structured format by writing it into a database, so that our algorithms can be applied to it.
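A minimal sketch of this module using Python's built-in `csv` and `sqlite3` modules (SQLite3 is listed in the software requirements). The column names `label` and `text`, the table name `emails` and the function `load_dataset` are assumptions for illustration, not the project's actual schema.

```python
import csv
import io
import sqlite3


def load_dataset(csv_file, conn):
    """Read (label, text) rows from a CSV file object and store them in an SQLite table.

    Assumes a header row with hypothetical columns 'label' and 'text'.
    Returns the number of rows written.
    """
    reader = csv.DictReader(csv_file)
    rows = [(row["label"].strip().lower(), row["text"]) for row in reader]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "id INTEGER PRIMARY KEY, label TEXT NOT NULL, text TEXT)"
    )
    # Parameter placeholders (?) avoid building SQL strings from mail content.
    conn.executemany("INSERT INTO emails (label, text) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = io.StringIO("label,text\nspam,WIN a free prize now\nham,Meeting moved to 3 pm\n")
    print(load_dataset(sample, conn))  # 2
    print(conn.execute("SELECT label, text FROM emails").fetchall())
```

For a real dataset, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.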
2) Data Pre-processing:
• The data can have many irrelevant and missing parts. To handle this, data cleaning is performed; it involves handling missing data, noisy data, etc.
• This module is the most important one, because without this operation the predictions may be incorrect.
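A minimal cleaning sketch covering the two cases named above: missing data (records with no label or empty text are dropped) and noisy data (HTML tags, URLs, digits and punctuation are stripped). The stop-word list, the `url` placeholder token and the function names are illustrative assumptions, not the project's actual rules.

```python
import re

# Small illustrative stop-word list (an assumption; real projects usually use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "for", "on", "at"}


def clean_text(text):
    """Remove noise from one message and return its list of tokens."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)            # strip HTML tags
    text = re.sub(r"https?://\S+", " url ", text)   # replace links with a placeholder token
    text = re.sub(r"[^a-z\s]", " ", text)           # drop digits and punctuation
    return [t for t in text.split() if len(t) > 1 and t not in STOPWORDS]


def preprocess(records):
    """records: iterable of (label, text) pairs. Drops missing, empty and duplicate rows."""
    cleaned, seen = [], set()
    for label, text in records:
        if not label or not text or not text.strip():
            continue                                   # missing data
        label = label.strip().lower()
        tokens = clean_text(text)
        key = (label, " ".join(tokens))
        if not tokens or key in seen:
            continue                                   # only noise left, or a duplicate
        seen.add(key)
        cleaned.append((label, tokens))
    return cleaned
```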
3) Feature Extraction:
• Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features).
• This new, reduced set of features should then be able to summarize most of the information contained in the original set. In this way, a summarized version of the original features can be created from a combination of the original set.
• In this module, feature engineering extracts both individual behavior and frequency-based behavior in order to identify spam.
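The frequency-based behavior mentioned in the last bullet is commonly captured with TF-IDF weights. The sketch below assumes TF-IDF (the slides do not name a weighting scheme) and uses only the standard library: it builds the vocabulary and IDF weights on training documents, then maps any tokenized message to an L2-normalized vector. `fit_idf`, `transform` and `min_df` are hypothetical names. TF-IDF by itself produces frequency features, not the reduced feature set described in the first bullet.

```python
import math
from collections import Counter


def fit_idf(train_docs, min_df=1):
    """Build a vocabulary and smoothed IDF weights from tokenized training documents."""
    n_docs = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))  # document frequency
    terms = sorted(t for t, n in df.items() if n >= min_df)
    vocab = {t: i for i, t in enumerate(terms)}
    idf = [math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in terms]
    return vocab, idf


def transform(docs, vocab, idf):
    """Map tokenized documents to L2-normalized TF-IDF vectors; unknown terms are ignored."""
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            j = vocab.get(term)
            if j is not None:
                vec[j] = (count / len(doc)) * idf[j]  # term frequency x inverse document frequency
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors
```

Rare terms such as "win" receive a larger IDF weight than terms shared by many messages, such as "free" in a corpus where it appears often.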
4) Spam Detection:
• Spam is identified using a parallel classifier and an anomaly detector. Users receive many e-mails per day, so these e-mails are collected and passed to the parallel classifier.
• In the parallel classifier, similar messages are put into the same bucket and the bucket size is recorded. The anomaly detector flags spam based on the size of the bucket.
• PSO is a metaheuristic: it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions.
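The bucket-based detection described above can be sketched as follows. The similarity rule (two messages share a bucket when they contain the same set of words, ignoring case, digits and punctuation) and `min_bucket_size` are assumptions, since the slides do not define "similar" or the size threshold. The bucketing is shown sequentially, although each message's key depends only on that message and could therefore be computed in parallel.

```python
import re
from collections import defaultdict


def bucket_key(text):
    """Hypothetical similarity rule: same set of lowercase words (digits/punctuation ignored)."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(sorted(set(words)))


def detect_bulk_spam(messages, min_bucket_size=3):
    """Group messages into buckets and flag every message in a bucket of at least min_bucket_size.

    messages: dict mapping message id -> message text.
    Returns (set of flagged ids, dict of bucket key -> bucket size).
    """
    buckets = defaultdict(list)
    for message_id, text in messages.items():
        buckets[bucket_key(text)].append(message_id)
    sizes = {key: len(ids) for key, ids in buckets.items()}
    # Anomaly rule: an unusually large bucket means many near-identical copies were received.
    flagged = {mid for key, ids in buckets.items() if sizes[key] >= min_bucket_size for mid in ids}
    return flagged, sizes
```

With the default threshold, three variants of "win $... now" land in one bucket of size 3 and are flagged, while a single unrelated message is not.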
ALGORITHM
• A particle swarm optimization algorithm combined with a logistic regression model is proposed to produce accurate predictions of spam content.
• After data pre-processing and feature extraction, the logistic regression algorithm is applied for classification.
• Then, to optimize the results, Particle Swarm Optimization is applied.
• Experimental results demonstrate the usefulness of the proposed method in obtaining significantly improved classification performance with few features. Further, the results show that the proposed method performs competitively compared with other existing fitness functions.
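A compact sketch of the LR + PSO combination, assuming PSO acts as a wrapper for feature selection as stated in the proposed system. It uses binary PSO with a sigmoid transfer function (one standard PSO variant; the slides do not say which variant is used), and a particle's fitness is the validation accuracy of a logistic regression trained on the selected features minus a small per-feature penalty. All hyperparameters (`n_particles`, `iters`, `w_inertia`, `c1`, `c2`, `alpha`, learning rate, epochs) are illustrative values, not the project's settings.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=100, lr=0.5):
    """Fit logistic regression by batch gradient descent on log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Label 1 (spam) when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]


def fitness(mask, X_tr, y_tr, X_val, y_val, alpha=0.01):
    """Validation accuracy of LR on the selected columns, minus a small per-feature penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0

    def select(X):
        return [[row[j] for j in cols] for row in X]

    w, b = train_logreg(select(X_tr), y_tr)
    correct = sum(p == t for p, t in zip(predict(w, b, select(X_val)), y_val))
    return correct / len(y_val) - alpha * len(cols) / len(mask)


def binary_pso(X_tr, y_tr, X_val, y_val, n_particles=8, iters=10,
               w_inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    d = len(X_tr[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X_tr, y_tr, X_val, y_val) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w_inertia * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])   # pull toward personal best
                     + c2 * r2 * (gbest[j] - pos[i][j]))     # pull toward global best
                vel[i][j] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: velocity becomes the probability that feature j is selected.
                pos[i][j] = 1 if rng.random() < sigmoid(vel[i][j]) else 0
            f = fitness(pos[i], X_tr, y_tr, X_val, y_val)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

The penalty `alpha` makes PSO prefer smaller subsets when two masks reach the same accuracy, which matches the goal of good classification performance with few features.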
SYSTEM ARCHITECTURE DIAGRAM
[Figure: system architecture. Labels: Unknown User (write an email, attach an email); User (inbox mail); Spam Mail Detector (receives all mails, capture mails, data preprocessing, spam detection on mail content and spammed content); decision "Is mail a spam?" (Yes / No); block or filter spam mail; notify the mail receiver; notify the network administrator.]
SEQUENCE DIAGRAM
[Figure: sequence diagram between Unknown User, User and Spam Mail Detection System. Steps: writes a mail; attaches a file; acknowledgement of a received mail; data preprocessing; feature extraction; spam detection algorithm; blocks those mails or notifies users of the spam mail.]
SCREENSHOTS
[Figures: screenshot of the CSV file, code screenshots, and output model screenshots.]
CONCLUSION
• The main challenge is tracking the changing concept of spam, which makes spam filtering an incremental learning problem. To tackle this problem, the contribution of this research work falls under spam filtering based on machine learning techniques.
• In the future, we plan to create an efficient machine learning spam filtering method based on different types of weighting processes that additionally takes hyperlink weighting into account.
LIST OF REFERENCES
• "Hybrid Decision Tree and Logistic Regression Classifier for Email Spam Detection" by Adi Wijaya, Achmad Bisri (2016)
• "Email Spam Detection using Integrated Approach of Naïve Bayes and Particle Swarm Optimization" by Kriti Agarwal, Tarun Kumar (2018)
• "A Machine Learning based Spam Detection Mechanism" by Nikhil Govil, Kunal Agarwal (2020)
• "Performance Evaluation of Machine Learning Algorithms for Email Spam Detection" by Nandhini.S, Dr. Jeen Marseline.K.S (2020)
• "Detecting Spam Email With Machine Learning Optimized With Bio-Inspired Metaheuristic Algorithms" by Simran Gibson, Biju Isaac (2020)
THANK YOU