Spam Detection Model

This document outlines the implementation of a spam detection model using the Naive Bayes classifier, specifically the MultinomialNB variant from scikit-learn. It details the data preprocessing steps, including importing libraries, vectorizing text data, splitting the dataset, training the model, and evaluating its performance. The model also includes a user interface for predicting whether a message is spam and highlighting spam-indicative words.

Uploaded by

githouse36

Madda Walabu University

College of Computing, Department of Computer Science
3rd year second semester 2024

AI GROUP ASSIGNMENT

BY :
NAMUSA HASSAN UGR/22318/13
BEKAM UGR/
Spam Detection Model
Spam Detection Using Naive Bayes Classifier

1. Introduction

Spam detection is an essential application in the domain of Natural Language Processing (NLP).
The goal is to classify email messages as either "spam" or "ham" (non-spam). In this document,
we discuss the implementation of a spam detection model using the Naive Bayes classifier and
describe the data preprocessing steps undertaken.

2. Model Used: Naive Bayes Classifier

The model used in our project is the Naive Bayes classifier, specifically the MultinomialNB
variant from the scikit-learn library. MultinomialNB models word counts and is widely used for
text classification, which makes it well suited to spam detection. Naive Bayes applies Bayes'
theorem under the "naive" assumption that, given the class, each feature (here, each word count)
is independent of every other feature.
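
To make the independence assumption concrete, the following is a minimal hand-written sketch of the Naive Bayes calculation. The word probabilities and class priors are invented for illustration and do not come from our dataset; the function name posterior is likewise our own.

```python
import math

# Toy per-class word probabilities P(word | class); illustrative values only.
word_probs = {
    "spam": {"free": 0.30, "prize": 0.20, "meeting": 0.01},
    "ham": {"free": 0.05, "prize": 0.01, "meeting": 0.20},
}
# Toy class priors P(class); also illustrative.
priors = {"spam": 0.4, "ham": 0.6}


def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    # Work in log space: log P(class) + sum of log P(word | class).
    log_scores = {
        label: math.log(priors[label])
        + sum(math.log(word_probs[label][w]) for w in words)
        for label in priors
    }
    # Normalise so the class probabilities sum to 1.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / total for label, s in log_scores.items()}


print(posterior(["free", "prize"]))
```

Because the words are treated as independent given the class, the score for each class is simply the prior multiplied by one factor per word. The message "free prize" therefore comes out strongly on the spam side with these toy numbers.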

3. Data Preprocessing Steps

Step 1: Importing Libraries and Loading the Dataset

To start, we imported essential libraries for data handling, text processing, model building, and
evaluation. The dataset containing emails and their labels (spam or ham) was then loaded into a
DataFrame for further processing.

import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import MultinomialNB

from sklearn.metrics import accuracy_score

dataset = pd.read_csv('emails.csv')
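
Before vectorizing, it can help to check the loaded data for missing text, duplicate rows, and class balance. The sketch below uses a small hypothetical DataFrame in place of emails.csv so that it runs on its own; the column names text and spam are the ones used in the later steps.

```python
import pandas as pd

# Hypothetical stand-in for emails.csv so the sketch is self-contained.
dataset = pd.DataFrame({
    "text": ["Win a free prize now", "Lunch at noon?", None, "Lunch at noon?"],
    "spam": [1, 0, 0, 0],
})

# Drop rows with missing text and exact duplicate rows.
dataset = dataset.dropna(subset=["text"]).drop_duplicates().reset_index(drop=True)

# Fraction of ham (0) and spam (1) messages that remain.
class_balance = dataset["spam"].value_counts(normalize=True)

print(dataset.shape)
print(class_balance)
```

A strongly unbalanced class distribution is worth knowing about early, because it affects how the accuracy score in Step 5 should be read.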

Step 2: Vectorizing Text Data

The raw text data from the emails needed to be converted into a numerical format that the Naive
Bayes model could process. This was achieved using a technique called vectorization.
The CountVectorizer was used to transform the text into a matrix of token counts, where each
column represents a unique word and each row corresponds to an email.

vectorizer = CountVectorizer()

X = vectorizer.fit_transform(dataset['text'])

y = dataset['spam']

Step 3: Splitting the Dataset

The dataset was split into two parts: a training set and a testing set. The training set is used to
train the model, while the testing set is used to evaluate the model's performance. An 80-20 split
was chosen, meaning 80% of the data was used for training and 20% was reserved for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
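
One possible refinement, shown here only as a sketch on made-up messages, is to split the raw text first and fit the vectorizer on the training portion alone, so that words seen only in the test emails do not influence the vocabulary. The stratify argument keeps the spam/ham ratio the same in both parts. The variable names text_train and text_test are our own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy data: five spam (1) and five ham (0) messages; illustrative only.
texts = [
    "free prize now", "win cash today", "claim your reward",
    "free cash offer", "urgent prize claim",
    "meeting at noon", "see you tomorrow", "project update attached",
    "lunch on friday", "notes from class",
]
labels = [1] * 5 + [0] * 5

# Split the raw text, keeping the class ratio equal in both parts.
text_train, text_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Learn the vocabulary from the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(text_train)
X_test = vectorizer.transform(text_test)

print(X_train.shape, X_test.shape)
```

Both matrices share the same columns, because the test text is transformed with the vocabulary learned from the training text.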

Step 4: Training the Naive Bayes Model

With the text data vectorized and the dataset split, the Naive Bayes model was trained on the
training set. This involves fitting the model to the training data, allowing it to learn the patterns
and characteristics of spam and ham emails.

model = MultinomialNB()

model.fit(X_train, y_train)
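
As an alternative arrangement, the vectorizer and the classifier can be chained with scikit-learn's make_pipeline so that raw text goes in and labels come out. The sketch below uses four made-up training messages. The alpha parameter is the additive (Laplace) smoothing strength of MultinomialNB; 1.0 is the library default, and it prevents a word unseen in one class from giving that class a probability of zero.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (1 = spam, 0 = ham); illustrative only.
train_texts = ["free prize now", "win free cash", "meeting at noon", "lunch at noon"]
train_labels = [1, 1, 0, 0]

# Vectorizer and classifier in one object; alpha=1.0 is Laplace smoothing.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
pipeline.fit(train_texts, train_labels)

# The pipeline accepts raw strings and vectorizes them internally.
predictions = pipeline.predict(["free prize", "noon meeting"])
print(predictions)
```

With a pipeline there is no separate vectorizer object to keep in step with the model, which reduces the chance of transforming new messages with the wrong vocabulary.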

Step 5: Evaluating the Model

After training, the model's performance was evaluated using the testing set. This was done by
predicting the labels of the test data and comparing them to the actual labels. The accuracy score,
the fraction of test emails classified correctly, was then calculated.

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)
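
Accuracy alone can look good even when many spam emails are missed, especially if spam is the minority class. The sketch below uses made-up label arrays (not our model's output) to show how a confusion matrix, precision, and recall give a fuller picture.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Made-up labels for illustration: 8 ham (0) and 2 spam (1) emails.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_hat)
# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_hat)
# Of the emails flagged as spam, how many really were spam.
precision = precision_score(y_true, y_hat)
# Of the real spam emails, how many were flagged.
recall = recall_score(y_true, y_hat)

print("Accuracy:", accuracy)
print("Confusion matrix:\n", matrix)
print("Precision:", precision)
print("Recall:", recall)
```

In this toy case the accuracy is 0.8, yet only one of the two spam emails is caught, which the recall of 0.5 makes visible.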

Step 6: Predicting and Highlighting Spam Words

To enhance the model's usability and interpretability, a function was defined to predict whether a
given message is spam and to highlight key words that indicate spam. This function takes a
message as input, predicts its spam probability, and identifies words that are strongly associated
with spam.

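def predictMessage(message):
    # Vectorize with the already-fitted vectorizer (transform only, no refit).
    message_vector = vectorizer.transform([message])

    prediction = model.predict(message_vector)
    # Column 1 is the spam class, assuming labels 0 = ham and 1 = spam.
    spam_probability = model.predict_proba(message_vector)[0][1]

    # Per-word log P(word | spam), indexed through the vectorizer's vocabulary.
    spam_weights = model.feature_log_prob_[1]
    vocabulary = vectorizer.vocabulary_

    # Tokenize the same way the vectorizer does (lowercased, punctuation dropped)
    # so that message words can match vocabulary entries.
    message_words = vectorizer.build_analyzer()(message)

    spam_words = [
        word for word in message_words
        # -1 is a log-probability cutoff (about 0.37); adjust threshold as needed.
        if word in vocabulary and spam_weights[vocabulary[word]] > -1
    ]

    result = "Spam" if prediction[0] == 1 else "Ham"

    return {
        "result": result,
        "spam_probability": spam_probability,
        "spam_words": spam_words,
    }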
User Interface:

The model is designed to interact with users by predicting the nature of the entered message
(spam or ham) and highlighting spam-indicative words.

userMessage = input('Enter text to predict: ')

prediction = predictMessage(userMessage)

print(f"The message is: {prediction['result']}")

print(f"Spam Probability: {prediction['spam_probability']}")

print(f"Spam Words Highlighted: {prediction['spam_words']}")

4. Conclusion

The Naive Bayes classifier proves to be an effective and straightforward method for spam
detection in text data. The preprocessing steps, including vectorization and data splitting, are
crucial in transforming the raw text into a format suitable for model training and evaluation. The
inclusion of a function to predict and highlight spam words enhances the interpretability and
usability of the model in practical applications.
