

Comprehensive Report: Breast Cancer Diagnosis using Machine Learning

1. Dataset & Preprocessing


The analysis was conducted using the Breast Cancer Wisconsin (Diagnostic)
dataset, fetched using fetch_ucirepo(id=17) from the UCI Machine Learning
Repository. This dataset consists of 30 numeric features representing cell nuclei
characteristics from digitized images of fine needle aspirate (FNA) of breast masses.

Target Variable (y): Encoded to binary (0 = Malignant, 1 = Benign)

Features (X): Standardized using StandardScaler to ensure equal contribution of all variables

Train-Test Split: 70-30 ratio with stratification to maintain class distribution

from ucimlrepo import fetch_ucirepo
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split

# Fetch the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI repository
data = fetch_ucirepo(id=17)
X = data.data.features
y = data.data.targets

# Label encoding
le = LabelEncoder()
y = le.fit_transform(y.values.ravel())  # Convert benign/malignant labels to 0/1

# Train-test split (70/30, stratified on the class labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Standardization (fit on the training data only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

2. Baseline Model Performance (Without PCA)

Two classifiers were used initially with all 30 standardized features:

🔹 Gaussian Naive Bayes

Accuracy: 93.57%

Strengths: High precision and recall across both classes

Limitation: Assumes feature independence, which may not hold

🔹 K-Nearest Neighbors (KNN, k=5)

Accuracy: 96.49%

Highlights: Near-perfect recall and precision for the malignant class

Observation: Outperforms Naive Bayes, likely due to better handling of feature interactions

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Gaussian Naive Bayes
nb = GaussianNB()
nb.fit(X_train_scaled, y_train)
y_pred_nb = nb.predict(X_test_scaled)
print("NB Accuracy:", accuracy_score(y_test, y_pred_nb))

# KNN with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))

3. Dimensionality Reduction Using PCA

To improve model efficiency, Principal Component Analysis (PCA) was applied to reduce the dataset's dimensionality. Experiments were conducted using 10, 9, and 8 principal components.
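
A minimal sketch of this experiment, assuming PCA is fit on the scaled training data only and both classifiers are retrained on the transformed features:

from sklearn.decomposition import PCA

# Repeat the baseline comparison at each reduced dimensionality
for n in (10, 9, 8):
    pca = PCA(n_components=n)
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.transform(X_test_scaled)
    for name, model in [("Naive Bayes", GaussianNB()),
                        ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        model.fit(X_train_pca, y_train)
        print(f"{name}, {n} components: {model.score(X_test_pca, y_test):.4f}")

The resulting accuracies were: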

Model         Components   Accuracy
Naive Bayes   10           91.81%
Naive Bayes   9            91.81%
Naive Bayes   8            91.81%
KNN           10           95.91%
KNN           9            95.91%
KNN           8            96.49%

Interpretation:

KNN retained strong performance with reduced features

Naive Bayes experienced a slight drop, possibly due to PCA affecting independence among features

PCA helped simplify the model without major accuracy loss


4. Hyperparameter Tuning of KNN

The optimal value of k for KNN was determined by evaluating model accuracy for k from 1 to 15. The following observations were made:

Peak performance at k=5

Accuracy decreased slightly as k moved away from 5

# Hyperparameter tuning loop: evaluate k = 1..15
accuracies = []
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train_scaled, y_train)
    acc = model.score(X_test_scaled, y_test)
    accuracies.append(acc)

# Report the best-performing k
best_k = accuracies.index(max(accuracies)) + 1
print("Best k:", best_k, "accuracy:", max(accuracies))

5. Visual Analysis

🔸 Correlation Matrix

Identified strong linear correlations between features like radius_mean and area_mean

Helped justify the use of PCA to remove redundancy

🔸 PCA Visualization

PCA scatter plots showed good class separability even after dimensionality reduction

🔸 K vs Accuracy Plot

Reinforced that k=5 was the optimal choice for KNN
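
The plots themselves are not reproduced here; a minimal matplotlib/seaborn sketch of all three, assuming X is the feature DataFrame from Section 1 and accuracies is the list from Section 4:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA

# Correlation matrix of the raw features
sns.heatmap(X.corr(), cmap="coolwarm", center=0)
plt.title("Feature Correlation Matrix")
plt.show()

# Scatter of the first two principal components, colored by class
pca2 = PCA(n_components=2)
X_2d = pca2.fit_transform(X_train_scaled)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_train, cmap="coolwarm", s=15)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA Projection of Training Data")
plt.show()

# k vs accuracy from the tuning loop
plt.plot(range(1, 16), accuracies, marker="o")
plt.xlabel("k")
plt.ylabel("Test accuracy")
plt.title("KNN: k vs Accuracy")
plt.show()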

6. Key Insights

KNN Superiority: Outperformed Naive Bayes in both full and reduced feature spaces

PCA Effectiveness: Reduced computation without compromising accuracy for KNN

Model Sensitivity: Naive Bayes is more sensitive to dimensionality changes

Feature Redundancy: PCA handled multicollinearity well, preserving essential variance

7. Conclusion and Recommendation

KNN is the most effective model for this diagnostic task. It balances accuracy and interpretability, and its robustness to PCA makes it ideal for scalable systems.

🔧 Final Recommendation:

Use KNN with 8 PCA components for a strong trade-off between speed and predictive performance

Consider further tuning and testing with other distance metrics or ensemble methods
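
As a sketch of how this recommendation might be packaged, the steps above can be chained in a scikit-learn Pipeline; the hyperparameter values follow the report, while the structure itself is illustrative rather than part of the original notebook:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Recommended configuration: standardize -> 8 PCA components -> KNN with k=5
final_model = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=8)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
final_model.fit(X_train, y_train)
print("Pipeline test accuracy:", final_model.score(X_test, y_test))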

