

Comprehensive Report: Breast Cancer Diagnosis using Machine Learning

1. Dataset & Preprocessing


The analysis was conducted using the Breast Cancer Wisconsin (Diagnostic)
dataset, fetched using fetch_ucirepo(id=17) from the UCI Machine Learning
Repository. This dataset consists of 30 numeric features representing cell nuclei
characteristics from digitized images of fine needle aspirate (FNA) of breast masses.

Target Variable (y): Encoded to binary (0 = Malignant, 1 = Benign)

Features (X): Standardized using StandardScaler to ensure equal contribution of all variables

Train-Test Split: 70-30 ratio with stratification to maintain class distribution

from ucimlrepo import fetch_ucirepo
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split

# Fetch the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI repository
data = fetch_ucirepo(id=17)
X = data.data.features
y = data.data.targets

# Label encoding
le = LabelEncoder()
y = le.fit_transform(y.values.ravel())  # Convert benign/malignant labels to 0/1

# Train-test split (70/30, stratified on the class labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Standardization (fit on the training data only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

2. Baseline Model Performance (Without PCA)

Two classifiers were used initially with all 30 standardized features:

🔹 Gaussian Naive Bayes

Accuracy: 93.57%

Strengths: High precision and recall across both classes

Limitation: Assumes feature independence, which may not hold

🔹 K-Nearest Neighbors (KNN, k=5)

Accuracy: 96.49%

Highlights: Near-perfect recall and precision for the malignant class

Observation: Outperforms Naive Bayes, likely due to better handling of feature interactions

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Gaussian Naive Bayes
nb = GaussianNB()
nb.fit(X_train_scaled, y_train)
y_pred_nb = nb.predict(X_test_scaled)
print("NB Accuracy:", accuracy_score(y_test, y_pred_nb))

# KNN with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))

3. Dimensionality Reduction Using PCA

To improve model efficiency, Principal Component Analysis (PCA) was applied to reduce the dataset's dimensionality. Experiments were conducted using 10, 9, and 8 principal components.
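
A minimal sketch of this experiment, assuming PCA is fit on the scaled training data only and both classifiers are retrained on the transformed features:

from sklearn.decomposition import PCA

# Repeat the baseline comparison at each reduced dimensionality
for n in (10, 9, 8):
    pca = PCA(n_components=n)
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.transform(X_test_scaled)
    for name, model in [("Naive Bayes", GaussianNB()),
                        ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        model.fit(X_train_pca, y_train)
        print(f"{name}, {n} components: {model.score(X_test_pca, y_test):.4f}")

The resulting accuracies were: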

Model         Components   Accuracy
Naive Bayes   10           91.81%
Naive Bayes   9            91.81%
Naive Bayes   8            91.81%
KNN           10           95.91%
KNN           9            95.91%
KNN           8            96.49%

Interpretation:

KNN retained strong performance with reduced features

Naive Bayes experienced a slight drop, possibly due to PCA affecting independence among features

PCA helped simplify the model without major accuracy loss


4. Hyperparameter Tuning of KNN

The optimal value of k for KNN was determined by evaluating model accuracy for k from 1 to 15. The following observations were made:

Peak performance at k=5

Accuracy decreased slightly as k moved away from 5

# Hyperparameter tuning loop: evaluate k = 1..15
accuracies = []
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train_scaled, y_train)
    acc = model.score(X_test_scaled, y_test)
    accuracies.append(acc)

# Report the best-performing k
best_k = accuracies.index(max(accuracies)) + 1
print("Best k:", best_k, "accuracy:", max(accuracies))

5. Visual Analysis

🔸 Correlation Matrix

Identified strong linear correlations between features like radius_mean and area_mean

Helped justify the use of PCA to remove redundancy

🔸 PCA Visualization

PCA scatter plots showed good class separability even after dimensionality reduction

🔸 K vs Accuracy Plot

Reinforced that k=5 was the optimal choice for KNN
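
The plots themselves are not reproduced here; a minimal matplotlib/seaborn sketch of all three, assuming X is the feature DataFrame from Section 1 and accuracies is the list from Section 4:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA

# Correlation matrix of the raw features
sns.heatmap(X.corr(), cmap="coolwarm", center=0)
plt.title("Feature Correlation Matrix")
plt.show()

# Scatter of the first two principal components, colored by class
pca2 = PCA(n_components=2)
X_2d = pca2.fit_transform(X_train_scaled)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_train, cmap="coolwarm", s=15)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA Projection of Training Data")
plt.show()

# k vs accuracy from the tuning loop
plt.plot(range(1, 16), accuracies, marker="o")
plt.xlabel("k")
plt.ylabel("Test accuracy")
plt.title("KNN: k vs Accuracy")
plt.show()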

6. Key Insights

KNN Superiority: Outperformed Naive Bayes in both full and reduced feature spaces

PCA Effectiveness: Reduced computation without compromising accuracy for KNN

Model Sensitivity: Naive Bayes is more sensitive to dimensionality changes

Feature Redundancy: PCA handled multicollinearity well, preserving essential variance

7. Conclusion and Recommendation

KNN is the most effective model for this diagnostic task. It balances accuracy and interpretability, and its robustness to PCA makes it ideal for scalable systems.

🔧 Final Recommendation:

Use KNN with 8 PCA components for a strong trade-off between speed and predictive performance

Consider further tuning and testing with other distance metrics or ensemble methods
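
As a sketch of how this recommendation might be packaged, the steps above can be chained in a scikit-learn Pipeline; the hyperparameter values follow the report, while the structure itself is illustrative rather than part of the original notebook:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Recommended configuration: standardize -> 8 PCA components -> KNN with k=5
final_model = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=8)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
final_model.fit(X_train, y_train)
print("Pipeline test accuracy:", final_model.score(X_test, y_test))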

