JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY, NOIDA

B. TECH 5th SEMESTER

Fundamentals of Machine Learning Project

TITLE OF PROJECT:

Cancer Detection Models

Submitted By:

Enrollment No.   Name
22103124         Khushi Agarwal
22103148         Rishav Sachdeva
22103143         Soham Kukreti
22103151         Daksh Jain

Submitted To: Dr. Sherry Garg

PROJECT REPORT
PROBLEM STATEMENT

Cancer diagnosis is a critical area of healthcare that demands accurate and timely
predictions. Early and precise detection of cancer can significantly improve
treatment outcomes and patient survival rates. This project focuses on building
machine learning models that classify the cancer level (benign or malignant)
from patient data.

OVERVIEW OF THE PROJECT

This project applies machine learning techniques to detect cancer from
medical data. Using K-Nearest Neighbors (KNN), Logistic Regression, Naive
Bayes, and Support Vector Machines (SVM), the project compares how
effectively each algorithm predicts cancer level. This comparison identifies
the most accurate and efficient model for automated cancer detection.

OBJECTIVE OF THE PROJECT

● To develop machine learning-based cancer detection models capable of
classifying cancer as benign or malignant.
● To compare the accuracy and effectiveness of the KNN, Logistic Regression,
Naive Bayes, and SVM algorithms.
● To provide a reliable and efficient tool to assist in the early diagnosis of
cancer.

ALGORITHM INSIGHTS

1. K-Nearest Neighbors (KNN)
○ A non-parametric algorithm that classifies data points based on the
majority class of their k nearest neighbors.
○ Suitable for small datasets and intuitive to implement, though it can be
computationally expensive for large datasets.
2. Logistic Regression
○ A statistical method for binary classification based on a linear
relationship between the input features and the log-odds of the target
variable.
○ Simple to implement and interpret, making it a baseline model for
classification tasks.
3. Naive Bayes
○ A probabilistic algorithm based on Bayes’ theorem, assuming feature
independence.
○ Fast and efficient for high-dimensional datasets, though its
independence assumption may not hold for all features.
4. Support Vector Machine (SVM)
○ A margin-based classifier that finds the hyperplane separating the classes
with the largest margin; kernel functions extend it to non-linear boundaries.
○ Effective for high-dimensional spaces and datasets with a clear
margin of separation.
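To make the KNN voting rule concrete, here is a minimal from-scratch sketch. It assumes Euclidean distance and an odd k, so a two-class vote cannot tie. The function `knn_predict` and the toy points are illustrative only and are not the project's code.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy 2-D data: label 0 clusters near the origin, label 1 near (5, 5)
X_train = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.8],
                    [5.0, 5.0], [5.2, 4.7], [4.8, 5.3]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.3, 0.3])))  # 0
print(knn_predict(X_train, y_train, np.array([4.9, 5.1])))  # 1
```

Each prediction computes a distance to every training point. This is why KNN becomes expensive on large datasets and why unscaled features with large ranges can dominate the vote.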
FLOWCHART
MODEL IMPLEMENTATION AND EVALUATION

1. Dataset Overview

The dataset contains cancer-related medical data and labels.

● Total Records: 1000
● Columns:
○ 23 Features
○ Features include patient attributes and symptoms such as Obesity,
GeneticRisk, BalancedDiet, WeightLost, ShortnessOfBreath, etc.
○ Target: Level (0 for benign, 1 for malignant).
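The following is a minimal sketch of separating the features from the `Level` target with pandas. The three-row DataFrame uses made-up values and only three of the 23 feature columns. `split_features_target` is a hypothetical helper, not part of the project's code.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "Level"):
    """Separate the feature columns from the target column and check the labels are binary."""
    X = df.drop(columns=[target])
    y = df[target]
    # The report encodes the target as 0 = benign, 1 = malignant
    unexpected = set(y.unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"unexpected labels in {target!r}: {sorted(unexpected)}")
    return X, y


# Hypothetical three-row stand-in with made-up values (not the real dataset)
df = pd.DataFrame({
    "Obesity": [4, 7, 2],
    "GeneticRisk": [3, 7, 2],
    "ShortnessOfBreath": [2, 8, 1],
    "Level": [0, 1, 0],
})
X, y = split_features_target(df)
print(X.shape, y.tolist())  # (3, 3) [0, 1, 0]
```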

2. Missing Data Handling

● Identified missing values and imputed them using the median value for
numerical columns.
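Median imputation can be sketched with scikit-learn's `SimpleImputer`. The column values below are made up. In a real run, the imputer should be fit on the training split only, so that test-set medians do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up numeric columns with missing entries (NaN)
df = pd.DataFrame({
    "Obesity": [2.0, np.nan, 6.0, 4.0],
    "GeneticRisk": [1.0, 3.0, np.nan, 7.0],
})

# Replace each NaN with the median of the non-missing values in its own column
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)  # Obesity NaN -> 4.0, GeneticRisk NaN -> 3.0
```

The median is less sensitive to outliers than the mean, which is a common reason to prefer it for skewed medical measurements.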

3. Feature Engineering

● Encoded categorical variables (if any) using one-hot encoding.
● Standardized numerical features to zero mean and unit variance, which
matters most for the distance-based KNN and the margin-based SVM.

MODEL TRAINING AND EVALUATION

1. Algorithms/Models Implemented

● K-Nearest Neighbors (KNN): Tested for its simplicity and performance on
small datasets.
● Logistic Regression: Used as a baseline model for cancer detection.
● Naive Bayes: Implemented to leverage its probabilistic approach for
classification.
● Support Vector Machine (SVM): Included for its robustness in handling
complex decision boundaries.
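The following sketch compares the four model families using 5-fold stratified cross-validation. It rests on several assumptions. First, scikit-learn's bundled breast cancer dataset stands in for the project's 1000-record data. Second, Gaussian Naive Bayes is used as the Naive Bayes variant. Third, the hyperparameters (k = 5, RBF kernel, C = 1.0) are illustrative defaults, not the project's settings. Scores from this sketch will therefore not match this report's results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset bundled with scikit-learn (NOT the report's 1000-record dataset).
# Note: here target 0 = malignant, 1 = benign, the reverse of the report's encoding.
X, y = load_breast_cancer(return_X_y=True)

# Assumed hyperparameters, chosen for illustration only
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf", C=1.0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, clf in models.items():
    # The scaler sits inside the pipeline, so it is re-fit on each training fold
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name:20s} accuracy = {scores.mean():.3f} ± {scores.std():.3f}")
```

Keeping the scaler inside the pipeline prevents test-fold statistics from leaking into preprocessing. Reporting the mean and spread across folds also gives a fairer comparison than a single train/test split.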

2. Evaluation Metrics

● Accuracy: Measures the proportion of correct predictions.
● Precision, Recall, F1 Score: Precision is the fraction of samples predicted
malignant that are truly malignant, recall is the fraction of malignant samples
the model detects, and F1 is the harmonic mean of the two.
● Confusion Matrix: Visualizes true positives, false positives, true negatives,
and false negatives.
● Cross-validation: Checks that performance is consistent across different
train/test splits rather than an artifact of a single split.
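The metrics above can be checked by hand on a small example using scikit-learn's `sklearn.metrics` functions. The labels below are made up, with 1 = malignant treated as the positive class.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels: 1 = malignant (positive class), 0 = benign
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# Rows are true classes, columns predicted classes, in label order [0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)                   # 5 1 1 3
print(accuracy_score(y_true, y_pred))   # 0.8  = (TP + TN) / total
print(precision_score(y_true, y_pred))  # 0.75 = TP / (TP + FP)
print(recall_score(y_true, y_pred))     # 0.75 = TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 0.75 = harmonic mean of precision and recall
```

In cancer detection, the false negatives (here, one missed malignant case) are usually the costliest error. For that reason, recall is often reported alongside accuracy.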

Confusion Matrices for all 4 models:

1. Naive Bayes:
Accuracy: 91%

2. Logistic Regression:
Accuracy: 98.4%

3. Support Vector Machine (SVM):
Accuracy: 99.9%

4. K-Nearest Neighbors (KNN):
Accuracy: 96.5%

CONCLUSION

This project demonstrates the application of machine learning algorithms to cancer
detection. Among the four algorithms tested, Support Vector Machines (SVM)
emerged as the most effective, achieving the highest accuracy (99.9%) and F1
score. Building on this result, the SVM model could serve as a supporting tool for
early cancer diagnosis. The study also highlights the strengths and weaknesses of
KNN, Logistic Regression, and Naive Bayes, providing a comparison of their
applicability to medical data analysis.

