This project report details the development of a machine learning model that classifies breast cancer as benign or malignant using the Wisconsin Breast Cancer Dataset. The SVM classifier, optimized through hyperparameter tuning, achieved high accuracy and outperformed other models such as Random Forest and Logistic Regression on most metrics. Future work includes incorporating more data and exploring deep learning techniques for real-time diagnostic support.
Breast Cancer Classification Using Machine Learning
Data Science Project Report
Author: Khadija Tajir
Date: April 06, 2025
Table of Contents
1. Introduction
2. Problem Statement
3. Dataset Overview
4. Data Cleaning
5. Exploratory Data Analysis (EDA)
6. Data Preprocessing
7. Predictive Modeling (SVM)
8. Model Optimization
9. Model Comparison
10. Evaluation Metrics
11. Results and Discussion
12. Conclusion
13. Future Work
14. References

1. Introduction
This report documents a machine learning project that classifies breast cancer as benign or malignant using predictive analytics.

2. Problem Statement
Early detection of breast cancer significantly improves the chances of successful treatment. The goal is to develop a robust classification model to assist medical diagnosis.

3. Dataset Overview
The dataset used in this project is the Wisconsin Breast Cancer Dataset. Its features are computed from digitized images of fine needle aspirates (FNA) of breast masses.

4. Data Cleaning
The initial steps were checking for null values, handling duplicate rows, and removing irrelevant columns such as unnamed index columns and ID columns.

5. Exploratory Data Analysis (EDA)
EDA explored the relationships between variables such as mean radius and texture. Distribution plots, correlation heatmaps, and a class-balance visualization were produced.

6. Data Preprocessing
The features were standardized with StandardScaler (zero mean, unit variance), and the target variable was label-encoded to convert the two diagnosis categories into a binary format.

7. Predictive Modeling (SVM)
A support vector machine (SVM) classifier was trained on the preprocessed data using the radial basis function (RBF) kernel, which is well suited to non-linear classification.

8. Model Optimization
The SVM classifier's hyperparameters, including C and gamma, were tuned with GridSearchCV.

9. Model Comparison
Other models, namely Random Forest, Logistic Regression, and k-nearest neighbors (KNN), were compared with the SVM on accuracy, precision, and recall.
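The cleaning, preprocessing, tuning, and comparison steps described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the author's code. It assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`) rather than the CSV file used in the project. The grid values for `C` and `gamma`, the 80/20 stratified split, `random_state=42`, the default settings of the comparison models, and the helper names `load_and_clean`, `tuned_svm`, and `compare_models` are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC


def load_and_clean() -> tuple[pd.DataFrame, np.ndarray]:
    """Hypothetical helper: load the bundled WDBC copy, clean it, and encode labels."""
    bunch = load_breast_cancer(as_frame=True)
    df = bunch.frame.copy()
    # Assumption: rebuild string labels so encoding mirrors a CSV 'diagnosis' column.
    df["diagnosis"] = bunch.target_names[df.pop("target").to_numpy()]
    # Drop ID or unnamed index columns if present (a no-op on the bundled copy).
    df = df.drop(columns=[c for c in df.columns if c == "id" or c.startswith("Unnamed")])
    # Drop rows with missing values and exact duplicate rows.
    df = df.dropna().drop_duplicates()
    # LabelEncoder sorts classes alphabetically: benign -> 0, malignant -> 1.
    y = LabelEncoder().fit_transform(df.pop("diagnosis"))
    return df, y


def tuned_svm(X_train: pd.DataFrame, y_train: np.ndarray) -> GridSearchCV:
    """Hypothetical helper: grid-search C and gamma for an RBF-kernel SVM."""
    # Scaling sits inside the pipeline, so each CV fold is standardized on its own training part.
    pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC(kernel="rbf"))])
    # Assumed grid values, not the ones used in the report.
    param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.001]}
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    return search.fit(X_train, y_train)


def compare_models(random_state: int = 42) -> dict[str, dict[str, float]]:
    """Hypothetical helper: fit the tuned SVM and three comparison models, score on a test set."""
    X, y = load_and_clean()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        "SVM (tuned)": tuned_svm(X_train, y_train).best_estimator_,
        "Random Forest": RandomForestClassifier(random_state=random_state).fit(X_train, y_train),
        "Logistic Regression": Pipeline(
            [("scaler", StandardScaler()), ("lr", LogisticRegression(max_iter=1000))]
        ).fit(X_train, y_train),
        "KNN": Pipeline(
            [("scaler", StandardScaler()), ("knn", KNeighborsClassifier())]
        ).fit(X_train, y_train),
    }
    results = {}
    for name, model in models.items():
        pred = model.predict(X_test)
        # Precision and recall treat malignant (label 1) as the positive class.
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "recall": recall_score(y_test, pred),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_models().items():
        print(f"{name:<20}" + "  ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

Keeping StandardScaler inside a Pipeline means that during GridSearchCV each fold is scaled using only its own training portion, so statistics from the validation fold do not leak into tuning. The printed scores depend on the split and grid chosen here and are not the figures reported in this project.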
10. Evaluation Metrics
Model performance was evaluated with accuracy, the confusion matrix, the classification report, and the ROC-AUC score.

11. Results and Discussion
The optimized SVM model achieved high classification accuracy and outperformed the baseline models on most metrics.

12. Conclusion
The project built a predictive model for breast cancer classification and highlights the strength of SVM for medical diagnosis.

13. Future Work
Planned extensions are to incorporate additional data, experiment with deep learning, and deploy the model as a web application for real-time diagnostic support.

14. References
1. UCI Machine Learning Repository.
2. scikit-learn documentation.
3. Breast cancer research journals.