0% found this document useful (0 votes)

5 views

Assignment

This document presents an assignment on the application of the K-Nearest Neighbors (KNN) algorithm for breast cancer diagnosis, detailing the entire workflow from data acquisition to model evaluation. It includes sections on dataset exploration, preprocessing, methodology, and model performance metrics, highlighting the effectiveness of KNN in clinical decision support. The report concludes with suggestions for future work to enhance model accuracy and diagnostic capabilities.

Uploaded by

kholifaahmadalamin

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views

Assignment

Uploaded by

kholifaahmadalamin

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Atish Dipankar University of Science &

Technology
Department of Computer Science and Engineering

Assignment
on KNN using Breast Cancer Dataset

K-Nearest Neighbors for Breast Cancer Diagnosis

Student: Kholipha Ahmmad Al-Amin

ID: 221-0217-203

Course: Microprocessor & Assembly Languages

CSE-317/318 [Section-01]

Teacher: Prof. Mahmudur Rahman Roni

Coordinator, Department of CSE

March 18, 2025

KNN on Breast Cancer Dataset ID: 221-0217-203

Contents
1 Introduction 2

2 Dataset and Exploratory Data Analysis 2

3 Data Preprocessing 3

4 Methodology 3
4.1 K-Nearest Neighbors (KNN) Classifier . . . . . . . . . . . . . . . . . . . 3
4.2 Enhanced Machine Learning Pipeline . . . . . . . . . . . . . . . . . . . . 3
4.3 Feature Importance and Analysis . . . . . . . . . . . . . . . . . . . . . . 4

5 Model Training and Evaluation 4

5.1 Training Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
5.2 Expanded Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 4
5.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

6 Python Code Overview 5

7 Conclusion and Future Work 7

Appendix 7

1
KNN on Breast Cancer Dataset ID: 221-0217-203

Abstract
Abstract
This comprehensive ultra pro legendary report investigates the application
of the K-Nearest Neighbors (KNN) algorithm for breast cancer diagnosis.
The document details an end-to-end workflowfrom data acquisition and
preprocessing to model training, evaluation, and error analysisaugmented
with state-of-the-art visualizations and analytical insights. This framework
not only demonstrates the potential of KNN in clinical decision support but
also sets a benchmark for future diagnostic research.

1 Introduction
Breast cancer continues to be a significant global health concern. With the advent of
machine learning, innovative approacheslike the KNN classifierhave emerged as pivotal tools
for early detection and treatment planning. This report provides a detailed walkthrough
of constructing a KNN-based diagnostic model, covering all essential phases such as
data exploration, preprocessing, model implementation, evaluation, and error analysis.
Additionally, the report enriches the discussion with feature importance analysis and
insights into model limitations, ensuring a well-rounded investigation.

2 Dataset and Exploratory Data Analysis

The dataset includes numerous features representing tumor characteristics along with
diagnosis labels. Key initial insights include:

• A robust, well-organized dataset with significant statistical properties.

• Elimination of redundant columns (e.g., Unnamed: 32) ensuring data clarity.

• A clear understanding of feature distributions achieved through detailed visualizations.

Figure 1 presents an exemplary summary of the Exploratory Data Analysis (EDA),

showcasing the distribution of vital variables and statistical insights.

2
KNN on Breast Cancer Dataset ID: 221-0217-203

Figure 1: Exploratory Data Analysis Summary

3 Data Preprocessing
Data preprocessing is crucial to prepare the dataset for modeling. The steps include:

1. Data Cleaning: Removing extraneous columns and correcting inconsistencies.

2. Missing Value Imputation: Employing median imputation to handle any missing data.

3. Feature Scaling: Utilizing StandardScaler for normalization, ensuring that all features
contribute equally.

4. Data Splitting: Dividing the dataset into 70% for training and 30% for testing to
validate model performance.

4 Methodology
4.1 K-Nearest Neighbors (KNN) Classifier
The KNN algorithm classifies instances based on their proximity to training examples in
feature space. In this study:

• The hyperparameter k is set to 3, as determined by preliminary experimentation.

• The Euclidean distance metric is applied to measure similarity between instances.

4.2 Enhanced Machine Learning Pipeline

Figure 2 illustrates the enhanced machine learning pipeline with a detailed TikZ diagram.
The diagram highlights each step, providing a clear roadmap from data acquisition to
model evaluation.

3
KNN on Breast Cancer Dataset ID: 221-0217-203

Data Acquisition Data Cleaning Imputation Feature Scaling Train-Test Split

Model Evaluation KNN Training

Figure 2: Enhanced Machine Learning Pipeline for Breast Cancer Diagnosis

4.3 Feature Importance and Analysis

Understanding which features contribute most to model predictions can guide further
improvements. Although KNN does not inherently provide feature importance scores, the
following methods can be employed:

• Correlation Analysis: Evaluating the Pearson correlation coefficient between each

feature and the diagnosis.

• Recursive Feature Elimination (RFE): Utilizing RFE with alternative classifiers to

rank feature importance.

• Visualization: Creating heatmaps and pair plots to visually assess feature interactions.

5 Model Training and Evaluation

5.1 Training Process
Post-preprocessing, the KNN classifier was trained using the designated training set. The
evaluation phase included:

• Accuracy Score: Providing an overall measure of prediction accuracy.

• Confusion Matrix: Visualizing the true versus predicted labels.

• Classification Report: Detailing precision, recall, and F1-score for each diagnosis
category.

5.2 Expanded Evaluation Metrics

In addition to traditional metrics, further evaluation is provided through:

• Receiver Operating Characteristic (ROC) Curve: Illustrating the diagnostic ability

of the model.

• Area Under the Curve (AUC): Quantifying the model’s overall performance.

5.3 Results and Discussion

The classifier achieved an accuracy of approximately XX% (please update with actual
value). Figure 3 displays the confusion matrix, and additional ROC analysis is discussed
in the subsequent section. The model’s performance demonstrates strong predictive power,

4
KNN on Breast Cancer Dataset ID: 221-0217-203

while the error analysis highlights potential areas for hyperparameter tuning and further
feature engineering.

Figure 3: Confusion Matrix of the KNN Classifier

6 Python Code Overview

For transparency and reproducibility, the complete Python code used for this study is
provided below. The code encompasses all stagesfrom data loading and preprocessing to
model training and evaluation.
1 import pandas as pd
2 import numpy as np
3 import matplotlib . pyplot as plt
4 import seaborn as sns
5 from sklearn . model_selection import train_test_split
6 from sklearn . preprocessing import StandardScaler
7 from sklearn . neighbors import K N e i g h b o r s C l a s s i f i e r
8 from sklearn . metrics import accuracy_score , confusion_matrix ,
classification_report , roc_curve , auc
9

10 df = pd . read_csv ( ’/ content /2025 -03 -12 B r e a s t _ C a n c e r _ D i a g n o s t i c . csv

’ , index_col =0)
11 if ’ Unnamed : 32 ’ in df . columns :
12 df = df . drop ( ’ Unnamed : 32 ’ , axis =1)
13 print ( " Dataset Head : " )
14 print ( df . head () )
15

16 target = df [ ’ diagnosis ’]
17 features = df . drop ( ’ diagnosis ’ , axis =1)
18

5
KNN on Breast Cancer Dataset ID: 221-0217-203

19 if features . isnull () . sum () . sum () > 0:

20 print ( " Missing values detected . Imputing using median values .
")
21 features = features . fillna ( features . median () )
22

23 scaler = StandardScaler ()
24 scaled_features = scaler . fit_transform ( features )
25 df_scaled = pd . DataFrame ( scaled_features , columns = features .
columns )
26 print ( " \ nScaled Features Head : " )
27 print ( df_scaled . head () )
28

29 plt . figure ( figsize =(12 , 8) )

30 plt . subplot (2 , 2 , 1)
31 sns . histplot ( df [ " radius_mean " ] , kde = True , bins =20 , color = " blue " )
32 plt . title ( " Distribution of Radius Mean " )
33 plt . subplot (2 , 2 , 2)
34 sns . histplot ( df [ " texture_mean " ] , kde = True , bins =20 , color = " green "
)
35 plt . title ( " Distribution of Texture Mean " )
36 plt . subplot (2 , 2 , 3)
37 sns . boxplot ( x = df [ " diagnosis " ] , y = df [ " radius_mean " ] , palette = "
coolwarm " )
38 plt . title ( " Radius Mean by Diagnosis " )
39 plt . subplot (2 , 2 , 4)
40 sns . boxplot ( x = df [ " diagnosis " ] , y = df [ " texture_mean " ] , palette = "
coolwarm " )
41 plt . title ( " Texture Mean by Diagnosis " )
42 plt . tight_layout ()
43 plt . savefig ( " eda_summary . png " , dpi =1200)
44 plt . show ()
45

46 X_train , X_test , y_train , y_test = train_test_split (

47 scaled_features , target , test_size =0.30 , random_state =42
48 )
49

50 knn = KNe i g h b o r s C l a s s i f i e r ( n_neighbors =3)

51 knn . fit ( X_train , y_train )
52 y_pred = knn . predict ( X_test )
53

54 accuracy = accuracy_score ( y_test , y_pred )

55 conf_mat = confusion_matrix ( y_test , y_pred )
56 report = c l a s s i f i c a t i o n _ r e p o r t ( y_test , y_pred )
57 print ( " \ nAccuracy : " , accuracy )
58 print ( " \ nConfusion Matrix :\ n " , conf_mat )
59 print ( " \ nClassification Report :\ n " , report )
60

61 fpr , tpr , thresholds = roc_curve ( y_test . map ({ ’B ’:0 , ’M ’ :1}) , knn .

predict_proba ( X_test ) [: ,1])
62 roc_auc = auc ( fpr , tpr )
63 print ( " \ nAUC : " , roc_auc )

6
KNN on Breast Cancer Dataset ID: 221-0217-203

65 plt . figure ( figsize =(8 ,6) )

66 plt . plot ( fpr , tpr , label = ’ ROC curve ( area = %0.2 f ) ’ % roc_auc ,
color = ’ darkorange ’)
67 plt . plot ([0 , 1] , [0 , 1] , color = ’ navy ’ , linestyle = ’ -- ’)
68 plt . xlabel ( ’ False Positive Rate ’)
69 plt . ylabel ( ’ True Positive Rate ’)
70 plt . title ( ’ Receiver Operating Characteristic ’)
71 plt . legend ( loc = " lower right " )
72 plt . tight_layout ()
73 plt . savefig ( " roc_curve . png " , dpi =1200)
74 plt . show ()
75

76 plt . figure ( figsize =(8 ,6) )

77 sns . heatmap ( conf_mat , annot = True , fmt = " d " , cmap = " Blues " ,
78 xticklabels = sorted ( target . unique () ) ,
79 yticklabels = sorted ( target . unique () ) )
80 plt . title ( " Confusion Matrix \ nKNN - Breast Cancer Diagnosis " )
81 plt . xlabel ( " Predicted " )
82 plt . ylabel ( " Actual " )
83 plt . tight_layout ()
84 plt . savefig ( " confusion_matrix . png " , dpi =1200)
85 plt . show ()
Listing 1: Python Code for Breast Cancer Diagnosis using KNN

7 Conclusion and Future Work

The KNN classifier demonstrates considerable potential in accurately diagnosing breast
cancer. Its inherent simplicity and transparency make it an attractive model, while further
refinements can be pursued by:

• Hyperparameter tuning using cross-validation for optimal performance.

• Integration of more advanced classifiers (e.g., SVM, ensemble methods) for comparative
analysis.

• Implementation of feature selection techniques to identify the most significant predictors.

• Extensive ROC and AUC analysis to further validate the diagnostic capability.

In summary, this ultra pro legendary assignment not only sets a solid foundation for
breast cancer diagnosis using machine learning but also paves the way for innovative
future research and clinical advancements.

Appendix
Additional resources, extended code snippets, and comprehensive experimental logs are
provided for further reference.

SPLK-2003 Updated Dumps - Splunk SOAR Certified Automation Developer Exam
0% (1)
SPLK-2003 Updated Dumps - Splunk SOAR Certified Automation Developer Exam
11 pages
Mantenimiento Dispensador - MINIMECH
No ratings yet
Mantenimiento Dispensador - MINIMECH
183 pages
Mining Software
No ratings yet
Mining Software
12 pages
Mid Term Assignment Data Warehousing and Data Mining Section: C Name: Joy, MD - Monowar Hossain ID: 18-38618-2
No ratings yet
Mid Term Assignment Data Warehousing and Data Mining Section: C Name: Joy, MD - Monowar Hossain ID: 18-38618-2
3 pages
Breast Cancer Classification
No ratings yet
Breast Cancer Classification
18 pages
Breastcancer Research
No ratings yet
Breastcancer Research
9 pages
Breast Cancer Prediction
No ratings yet
Breast Cancer Prediction
5 pages
Breast Cancer Detection
No ratings yet
Breast Cancer Detection
15 pages
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
César Pérez López
No ratings yet
Breast Cancer Detection and Prediction: Created by
No ratings yet
Breast Cancer Detection and Prediction: Created by
20 pages
BSAN Case 3
No ratings yet
BSAN Case 3
9 pages
v08i03-06
No ratings yet
v08i03-06
6 pages
LAB9
No ratings yet
LAB9
3 pages
Expt 6
No ratings yet
Expt 6
3 pages
Breast_Cancer_Classification_Report
No ratings yet
Breast_Cancer_Classification_Report
16 pages
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
César Pérez López
No ratings yet
Cancer Detection
No ratings yet
Cancer Detection
12 pages
Breast Cancer Prediction Using Machine Learning
No ratings yet
Breast Cancer Prediction Using Machine Learning
1 page
Project Final
No ratings yet
Project Final
15 pages
Cancer Detection Using Data Mining
No ratings yet
Cancer Detection Using Data Mining
13 pages
Building A Simple Machine Learning Model On Breast Cancer Data
No ratings yet
Building A Simple Machine Learning Model On Breast Cancer Data
12 pages
K Nearest Neighbor Algorithm in Python - Towards Data Science
No ratings yet
K Nearest Neighbor Algorithm in Python - Towards Data Science
7 pages
Awesome
No ratings yet
Awesome
6 pages
Breast Cancer
No ratings yet
Breast Cancer
20 pages
3
No ratings yet
3
5 pages
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: SUPPORT VECTOR MACHINE, LOGISTIC REGRESSION, DISCRIMINANT ANALYSIS and DECISION TREES: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: SUPPORT VECTOR MACHINE, LOGISTIC REGRESSION, DISCRIMINANT ANALYSIS and DECISION TREES: Examples with MATLAB
César Pérez López
No ratings yet
Ankita Patra
No ratings yet
Ankita Patra
17 pages
Journal-Breast Cancer Prediction
No ratings yet
Journal-Breast Cancer Prediction
10 pages
Foml Project Report
No ratings yet
Foml Project Report
8 pages
Machine Learning Evaluation Metrics Lecturer
No ratings yet
Machine Learning Evaluation Metrics Lecturer
30 pages
Breast Cancer Survival Prediction With Machine Learning
No ratings yet
Breast Cancer Survival Prediction With Machine Learning
12 pages
EEE385L- Project Report - Group 3
No ratings yet
EEE385L- Project Report - Group 3
44 pages
IJERT Developing A Web Based System For
No ratings yet
IJERT Developing A Web Based System For
5 pages
HW Wincon
No ratings yet
HW Wincon
3 pages
Paper 3
No ratings yet
Paper 3
2 pages
Machine Learning For Breast Cancer Diagnosis A Proof of Concept
No ratings yet
Machine Learning For Breast Cancer Diagnosis A Proof of Concept
27 pages
S2_24_WIPRO_AML_Labcourse2_kittu
No ratings yet
S2_24_WIPRO_AML_Labcourse2_kittu
15 pages
DSASSign4
No ratings yet
DSASSign4
11 pages
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
From Everand
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
César Pérez López
No ratings yet
A Complete Guide To KNN
No ratings yet
A Complete Guide To KNN
16 pages
IDS Project Group 11
No ratings yet
IDS Project Group 11
35 pages
Using Predictive Analytics Model To Diagnose Breast Cnacer
No ratings yet
Using Predictive Analytics Model To Diagnose Breast Cnacer
9 pages
Breast Cancer Using Image Processing
No ratings yet
Breast Cancer Using Image Processing
3 pages
Aditya Predictive
No ratings yet
Aditya Predictive
12 pages
Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning
From Everand
Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning
Michael Gilliland
No ratings yet
Predicting Breast Cancer Using Logistic Regression - by Mo Kaiser - The Startup - Medium
No ratings yet
Predicting Breast Cancer Using Logistic Regression - by Mo Kaiser - The Startup - Medium
15 pages
Untitled PDF
No ratings yet
Untitled PDF
6 pages
Predictive Breast Cancer Statistical Modelling for Early Diagnosis
No ratings yet
Predictive Breast Cancer Statistical Modelling for Early Diagnosis
19 pages
BCPUML Breast Cancer Prediction Using Machine Learning Approach—a Performance Analysis
No ratings yet
BCPUML Breast Cancer Prediction Using Machine Learning Approach—a Performance Analysis
10 pages
Breast Cancer Detection Algo Comparison
No ratings yet
Breast Cancer Detection Algo Comparison
15 pages
Validation on selected breast cancer drugs of physicochemical features by using machine learning models
No ratings yet
Validation on selected breast cancer drugs of physicochemical features by using machine learning models
10 pages
Practical 7
No ratings yet
Practical 7
6 pages
Project Proposal_Breast Cancer Classification(1)
No ratings yet
Project Proposal_Breast Cancer Classification(1)
2 pages
Machine Learning Lab5
No ratings yet
Machine Learning Lab5
2 pages
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
Breast Cancer Detection Using Python & Machine Learning
No ratings yet
Breast Cancer Detection Using Python & Machine Learning
12 pages
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
Experiment 4
No ratings yet
Experiment 4
8 pages
Project Report: Bangladesh University of Business & Technology (BUBT)
No ratings yet
Project Report: Bangladesh University of Business & Technology (BUBT)
18 pages
Breast Cancer Detection Using Machine Learning Algorithm PDF
No ratings yet
Breast Cancer Detection Using Machine Learning Algorithm PDF
29 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Research Paper Final
No ratings yet
Research Paper Final
11 pages
Palak Project
No ratings yet
Palak Project
15 pages
Pillow Block Ball Bearing
No ratings yet
Pillow Block Ball Bearing
5 pages
Vignesh Well Test Operator-Daq
No ratings yet
Vignesh Well Test Operator-Daq
4 pages
Mott Macdonald and Systra Saved 10,000 Hours On Design of Britain'S High Speed Two Project
No ratings yet
Mott Macdonald and Systra Saved 10,000 Hours On Design of Britain'S High Speed Two Project
2 pages
White Paper c11-562881
No ratings yet
White Paper c11-562881
44 pages
RCX3_CNT_E_V3.15
No ratings yet
RCX3_CNT_E_V3.15
430 pages
Vi Cheat Sheet
No ratings yet
Vi Cheat Sheet
4 pages
Flow Controls
No ratings yet
Flow Controls
35 pages
CitectVBA Reference Guide
No ratings yet
CitectVBA Reference Guide
220 pages
Data Sheet Servo Allen Bradley
No ratings yet
Data Sheet Servo Allen Bradley
354 pages
10CS33 QB
No ratings yet
10CS33 QB
11 pages
Baumol
No ratings yet
Baumol
9 pages
Programming in C Practice Book - Module 2
No ratings yet
Programming in C Practice Book - Module 2
119 pages
KarthicK Resume
No ratings yet
KarthicK Resume
6 pages
Prajwal Siddarth: Curriculum Vitae
No ratings yet
Prajwal Siddarth: Curriculum Vitae
3 pages
Color Basics
No ratings yet
Color Basics
10 pages
Photometer 4040 v5+ PDF
No ratings yet
Photometer 4040 v5+ PDF
71 pages
LTE Attach Procedure
100% (4)
LTE Attach Procedure
34 pages
Digitizing Tools Sfumato Stitch: (C) BALARAD, S.R.O
No ratings yet
Digitizing Tools Sfumato Stitch: (C) BALARAD, S.R.O
162 pages
Rapid download FPGA Prototyping by VHDL Examples Xilinx MicroBlaze MCS SoC 2nd Edition Pong P. Chu ebook PDF all chapters
No ratings yet
Rapid download FPGA Prototyping by VHDL Examples Xilinx MicroBlaze MCS SoC 2nd Edition Pong P. Chu ebook PDF all chapters
19 pages
Brookfield Dv-Ii+ Pro Programmable Viscometer: Operating Instructions Manual No
No ratings yet
Brookfield Dv-Ii+ Pro Programmable Viscometer: Operating Instructions Manual No
82 pages
TC1475, TC2175, TF2139
100% (1)
TC1475, TC2175, TF2139
28 pages
Saab Ais
No ratings yet
Saab Ais
2 pages
Owl 20
No ratings yet
Owl 20
32 pages
Agile Unified Process
0% (1)
Agile Unified Process
2 pages
EE6211-Electric Circuits Laboratory
No ratings yet
EE6211-Electric Circuits Laboratory
88 pages
Bus Cont Plan
No ratings yet
Bus Cont Plan
45 pages
Oren
No ratings yet
Oren
1 page

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Assignment

Uploaded by

Assignment

Uploaded by

Atish Dipankar University of Science &

K-Nearest Neighbors for Breast Cancer Diagnosis

Student: Kholipha Ahmmad Al-Amin

Course: Microprocessor & Assembly Languages

Teacher: Prof. Mahmudur Rahman Roni

March 18, 2025

2 Dataset and Exploratory Data Analysis 2

5 Model Training and Evaluation 4

6 Python Code Overview 5

7 Conclusion and Future Work 7

2 Dataset and Exploratory Data Analysis

• A robust, well-organized dataset with significant statistical properties.

• Elimination of redundant columns (e.g., Unnamed: 32) ensuring data clarity.

• A clear understanding of feature distributions achieved through detailed visualizations.

Figure 1 presents an exemplary summary of the Exploratory Data Analysis (EDA),

Figure 1: Exploratory Data Analysis Summary

1. Data Cleaning: Removing extraneous columns and correcting inconsistencies.

• The hyperparameter k is set to 3, as determined by preliminary experimentation.

• The Euclidean distance metric is applied to measure similarity between instances.

4.2 Enhanced Machine Learning Pipeline

Data Acquisition Data Cleaning Imputation Feature Scaling Train-Test Split

Model Evaluation KNN Training

Figure 2: Enhanced Machine Learning Pipeline for Breast Cancer Diagnosis

4.3 Feature Importance and Analysis

• Correlation Analysis: Evaluating the Pearson correlation coefficient between each

• Recursive Feature Elimination (RFE): Utilizing RFE with alternative classifiers to

5 Model Training and Evaluation

• Accuracy Score: Providing an overall measure of prediction accuracy.

• Confusion Matrix: Visualizing the true versus predicted labels.

5.2 Expanded Evaluation Metrics

• Receiver Operating Characteristic (ROC) Curve: Illustrating the diagnostic ability

5.3 Results and Discussion

Figure 3: Confusion Matrix of the KNN Classifier

6 Python Code Overview

10 df = pd . read_csv ( ’/ content /2025 -03 -12 B r e a s t _ C a n c e r _ D i a g n o s t i c . csv

19 if features . isnull () . sum () . sum () > 0:

29 plt . figure ( figsize =(12 , 8) )

46 X_train , X_test , y_train , y_test = train_test_split (

50 knn = KNe i g h b o r s C l a s s i f i e r ( n_neighbors =3)

54 accuracy = accuracy_score ( y_test , y_pred )

61 fpr , tpr , thresholds = roc_curve ( y_test . map ({ ’B ’:0 , ’M ’ :1}) , knn .

65 plt . figure ( figsize =(8 ,6) )

76 plt . figure ( figsize =(8 ,6) )

7 Conclusion and Future Work

• Hyperparameter tuning using cross-validation for optimal performance.

• Implementation of feature selection techniques to identify the most significant predictors.

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.