Aditya Predictive

Uploaded by

adityasah895

Project Report

PREDICTIVE-ANALYSIS

Course code: INT234

Submitted in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology

School of Computer Science and Engineering

PROJECT-REPORT

Submitted By:
Name: Aditya Sah

Section: K21RM
Reg no: 12111525
Roll No: 27

Submitted to: Tanima Thakur Ma'am

Introduction:
This project explores predictive analysis on a synthetic breast cancer dataset,
focusing on identifying patterns in cell characteristics that differentiate benign
from malignant tumors. By applying five machine learning algorithms—K-Nearest
Neighbors (KNN), Linear Regression, Polynomial Regression, K-Means Clustering,
and Support Vector Machine (SVM)—we aim to evaluate and compare their
performance in classifying tumor characteristics. Each model's suitability for
breast cancer diagnosis is analyzed through visualizations and performance
metrics, offering insights into the best algorithmic approach for such data. This
analysis has potential applications in early detection, assisting healthcare
professionals in diagnosing and predicting cancer progression based on cellular
attributes.

Dataset Used:
The dataset consists of synthetic data generated to reflect the cell
characteristics of real-world breast cancer samples.
The main attributes include:
• cell_size: Describes the size of the cells, which can vary between benign and
malignant samples.
• cell_shape: Reflects the shape of the cells, an essential feature as malignant
cells tend to have irregular shapes.
• smoothness: Indicates the smoothness of cell borders, which can differ
significantly in cancerous cells.
• cell_density: Density metric for cell formation.
• symmetry: Symmetry attribute of the cell nuclei.
Since the primary task is predictive analysis, this dataset is suitable for a
binary classification model that predicts the diagnosis (benign or malignant)
from these cell characteristics.

Data Preprocessing:
Preprocessing is crucial to ensure that models interpret the data uniformly,
reducing noise and optimizing for accurate predictions. Here are the
preprocessing steps applied to the dataset:
• Normalization: Cell attributes were normalized to rescale the data between
0 and 1. This prevents features with larger numeric ranges from dominating
distance-based models and ensures each feature contributes proportionally.
• Encoding: The target variable, diagnosis, was converted into a binary factor
with labels "Benign" and "Malignant." This step simplifies classification,
allowing algorithms to learn the difference between the two classes.
• Splitting: The dataset was split into training and testing subsets. This
step is essential for evaluating model performance, as it allows us to
train models on one subset and test them on another to detect overfitting.
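
The three preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the project. The helper names (`min_max_normalize`, `encode_diagnosis`, `train_test_split`) and the toy values are assumptions made for the example.

```python
import random

def min_max_normalize(column):
    """Rescale a list of numbers to the [0, 1] range: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def encode_diagnosis(labels):
    """Map the two class labels to integers (0 = Benign, 1 = Malignant)."""
    mapping = {"Benign": 0, "Malignant": 1}
    return [mapping[label] for label in labels]

def train_test_split(rows, test_size, seed=42):
    """Shuffle a copy of the rows and hold out `test_size` of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical toy values, not taken from the project's dataset.
cell_size = [2.0, 4.0, 6.0, 10.0]
diagnosis = ["Benign", "Benign", "Malignant", "Malignant"]

scaled = min_max_normalize(cell_size)   # [0.0, 0.25, 0.5, 1.0]
encoded = encode_diagnosis(diagnosis)   # [0, 0, 1, 1]
rows = list(zip(scaled, encoded))
train, test = train_test_split(rows, test_size=1)
```

Fixing the seed makes the split reproducible, so different models can be compared on the same held-out rows.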

Algorithms Used:
1. K-Nearest Neighbors (KNN):
KNN is a simple yet effective classification algorithm. It classifies a sample based
on the class of its nearest neighbors. The distance between samples is calculated,
and the majority class among the k-nearest samples determines the classification.
For this project, we used a k value of 21, selected based on exploratory tests to
optimize accuracy.
KNN Workflow
1. Training: The model was trained on normalized data (449 samples).
2. Testing: 100 samples from the test set were used for evaluation.
3. Evaluation: A confusion matrix was generated to examine classification
accuracy, precision, and recall.
KNN's main advantage is simplicity, but it can be sensitive to the choice of k and
the distribution of the data.
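
As a rough illustration of the workflow above, the sketch below implements majority-vote KNN with Euclidean distance and derives accuracy, precision, and recall from confusion-matrix counts. It is not the project's implementation. The function names and the six training points are hypothetical, and k = 3 is used only because the toy set is tiny (the report uses k = 21).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    # Pair each training point's Euclidean distance to the query with its label.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def classification_metrics(actual, predicted, positive="Malignant"):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical normalized (cell_size, cell_shape) pairs, for illustration only.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
train_y = ["Benign", "Benign", "Benign",
           "Malignant", "Malignant", "Malignant"]
test_X = [(0.2, 0.2), (0.9, 0.9), (0.1, 0.1), (0.7, 0.8)]
test_y = ["Benign", "Malignant", "Benign", "Malignant"]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
metrics = classification_metrics(test_y, predictions)
```

Because every prediction scans the full training set, the cost of a query grows with the number of training samples, which is one reason KNN can become slow on larger datasets.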

2. Linear Regression:
Linear regression is typically used for predictive modeling on continuous data. For
this project, we implemented a linear regression model to predict cell_shape
based on cell_size. This model explores the relationship between these features,
which may help identify trends in cell characteristics that correlate with
malignancy.
Linear Regression Workflow
1. Model Fit: A linear equation is fitted to the data, attempting to minimize the
difference between predicted and actual values of cell_shape.
2. Training and Testing: The training and test sets were used to observe how
well the model generalizes.
3. Visualization: The linear relationship was visualized, showing how
cell_shape varies with cell_size.
Linear regression performs well for simple, linearly-related data. However, it
struggles to capture complex patterns, which led to exploring polynomial
regression.
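
The model fit described above can be illustrated with the closed-form least-squares solution for a single predictor. This is a sketch under stated assumptions: the `fit_line` and `r_squared` helpers and the five data points are hypothetical and do not come from the project's dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical normalized values: cell_size as predictor, cell_shape as response.
cell_size = [0.0, 0.25, 0.5, 0.75, 1.0]
cell_shape = [0.1, 0.3, 0.45, 0.7, 0.9]

intercept, slope = fit_line(cell_size, cell_shape)
r2 = r_squared(cell_size, cell_shape, intercept, slope)
```

Computing R² on the test set as well as the training set is one simple way to observe how well the fitted line generalizes.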

3. Polynomial Regression:
Polynomial regression extends linear regression by incorporating higher-degree
terms. This approach allows the model to capture non-linear relationships. We
introduced cell_size^2 and cell_size^3 terms to improve the model fit for non-
linear variations in the data.
Polynomial Regression Workflow
1. Model Fit: Polynomial terms were added, producing a more flexible curve
that fits the data more accurately than a straight line.
2. Training and Testing: The polynomial model was tested to verify improved
performance over the linear model.
3. Visualization: We plotted the polynomial regression results, showing a more
nuanced curve that captures non-linearity between cell_size and
cell_shape.
Polynomial regression is beneficial when relationships are complex, as seen in
biological data like this dataset. However, higher-degree terms may lead to
overfitting, so careful tuning is essential.
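
To make the cubic fit concrete, the sketch below builds the least-squares normal equations for the terms 1, cell_size, cell_size^2, and cell_size^3 and solves them with Gaussian elimination. It is an illustrative standard-library version, not the project's code. The function names are hypothetical, and the sample points are generated from an assumed cubic so the recovered coefficients can be checked.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations (X^T X) b = X^T y."""
    n = degree + 1
    # A[i][j] = sum of x^(i+j); rhs[i] = sum of y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= factor * A[col][c]
            rhs[row] -= factor * rhs[col]
    # Back substitution, from the last coefficient to the first.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(A[row][c] * coeffs[c] for c in range(row + 1, n))
        coeffs[row] = (rhs[row] - known) / A[row][row]
    return coeffs  # [b0, b1, b2, b3] for b0 + b1*x + b2*x^2 + b3*x^3

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

# Hypothetical data generated from an assumed cubic, for illustration only.
true_coeffs = [1.0, 2.0, -1.0, 0.5]
cell_size = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
cell_shape = [predict(true_coeffs, x) for x in cell_size]

fitted = fit_polynomial(cell_size, cell_shape, degree=3)
```

Raising `degree` makes the curve more flexible, which is where the overfitting risk mentioned above comes from: with enough terms the curve passes through every training point, noise included.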

4. K-Means Clustering:
K-Means clustering is an unsupervised learning algorithm that segments data into
clusters based on similarity. Here, it was used to group data points into clusters to
explore patterns in cell characteristics without relying on labeled data.
K-Means Workflow
1. Elbow Method: This method was used to identify an optimal number of
clusters (6 in this case).
2. Cluster Assignment: Each data point was assigned to one of the six clusters.
3. Visualization: Clusters were plotted to observe segmentation, which
provides insight into potential patterns in cell characteristics across the
dataset.
K-Means clustering is valuable for identifying inherent structure within data,
making it useful in exploring cell attribute patterns without labels.
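
The elbow method compares the within-cluster sum of squares (WCSS) across candidate values of k and looks for the point where adding clusters stops paying off. The sketch below is a small standard-library version of Lloyd's algorithm, not the project's implementation. The deterministic farthest-first seeding is an assumption chosen to keep the example reproducible, and the nine points are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, WCSS)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[i]
            )
        if updated == centroids:  # assignments no longer move the centroids
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical 2-D points forming three visible groups, for illustration only.
points = [(0, 0), (0, 1), (1, 0),
          (5, 5), (5, 6), (6, 5),
          (10, 0), (10, 1), (11, 0)]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
```

On this toy data the WCSS falls sharply up to k = 3 and only slightly afterwards, so the elbow sits at 3. The same reading of the curve is what the workflow above applies to the project's data.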

5. Support Vector Machine (SVM):
SVM is a robust classification algorithm that maximizes the margin between
classes. It performs well on high-dimensional data and was used here to classify
samples based on clusters generated by K-Means. A linear kernel was applied for
simplicity.
SVM Workflow
1. Data Preparation: Cluster labels were added to the dataset as a response
variable.
2. Training and Testing: Data was split, and features were scaled to ensure
uniformity.
3. Classification and Visualization: SVM decision boundaries were visualized,
showing the separation between clusters.
SVM's strength lies in its effectiveness on complex data, making it suitable for
high-dimensional classification tasks. However, it can be computationally
intensive, particularly with larger datasets.
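
To show what margin maximization with a linear kernel looks like in practice, the sketch below trains a two-class linear SVM by sub-gradient descent on the regularized hinge loss. It is a simplified illustration, not the project's code: the report's SVM separates several K-Means cluster labels, whereas this example handles only two classes coded as -1 and +1. The function names, hyperparameters, and scaled feature values are all assumptions.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Sub-gradient descent on the regularized hinge loss; labels are -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: push the boundary away.
                w = [wi - lr * (lam * wi - label * xi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Correct with enough margin: apply only the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0 that x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical scaled features for two clusters, for illustration only.
X = [(-2, -2), (-2, -1), (-1, -2), (2, 2), (2, 1), (1, 2)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
train_predictions = [svm_predict(w, b, x) for x in X]
```

Extending this to more than two cluster labels would need a scheme such as one-vs-rest, with one binary classifier trained per cluster.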

Performance Comparison:

Algorithm               Accuracy/Results                         Key Observations
K-Nearest Neighbors     High accuracy for binary classification  Effective for classification but sensitive to k value.
Linear Regression       Moderate fit                             Useful for linear trends, but limited with complex relationships.
Polynomial Regression   Improved fit over linear                 Better for non-linear relationships; suited to this dataset.
K-Means Clustering      Effective clustering, 5 clusters         Elbow method found 5 clusters; useful for initial grouping.
Support Vector Machine  Accurate on cluster classification       Strong at boundary definition; suitable for cluster labels.

In this project, each algorithm demonstrated distinct strengths and limitations.
K-Nearest Neighbors (KNN) was effective and easy to understand, but its accuracy
depended on the chosen k value, and prediction was computationally intensive
because each query is compared against the training samples. Linear Regression
provided basic insights into feature relationships, but it could not capture
complex patterns, which Polynomial Regression handled better by fitting
non-linear trends. K-Means Clustering identified natural groupings in the dataset,
useful for discovering potential tumor subtypes, though it required careful
selection of the number of clusters. Finally, Support Vector Machine (SVM)
achieved high classification accuracy with clear decision boundaries, showing its
effectiveness in high-dimensional data but requiring careful tuning. Overall,
combining these algorithms gave a comprehensive analysis, highlighting each
model's suitability for different aspects of tumor classification and pattern
recognition.

Overall Project Findings:

Analysis of the synthetic breast cancer dataset highlighted key insights across
the machine learning models:
➢ KNN and SVM: Both classifiers performed well on their respective tasks.
KNN predicted benign versus malignant cases with simple, interpretable
results but was sensitive to the choice of k, while SVM, trained on the
K-Means cluster labels, drew precise boundaries between clusters,
showing strong potential for diagnostic applications.
➢ Feature Relationships: Linear and polynomial regression models explored
relationships between features (e.g., cell size and shape), revealing that
more complex, non-linear models can better capture these biological
interactions.
➢ Clustering Insights: K-means clustering identified natural subgroups within
the data, which could be relevant for discovering potential subtypes within
the benign and malignant categories. This clustering could guide further
research into subtype-specific treatment or prognosis.
➢ Combined Modeling Approach: The findings suggest that using classification
models (KNN or SVM) for prediction, with regression insights to inform
feature selection and clustering to reveal data structure, could lead to a
well-rounded and interpretable predictive framework for cancer
diagnostics.
Together, these results reinforce that a multi-model approach is beneficial for
complex healthcare data, balancing predictive accuracy with interpretability.
Pairing a classifier such as SVM or KNN with regression-based insights and
clustering could therefore yield an effective predictive analysis framework for
breast cancer diagnosis.

Conclusion:

The multi-algorithm approach provided valuable insights into the dataset's
structure and the relationships within. KNN and SVM demonstrated strong
performance in classification, while polynomial regression effectively modeled
non-linear relationships. K-Means clustering enabled the identification of distinct
cell characteristic patterns, highlighting possible tumor subtypes. The study
indicates that for similar datasets, using a combination of supervised and
unsupervised algorithms allows for a comprehensive understanding, supporting
early detection and diagnosis efforts.

In conclusion, the predictive analysis techniques applied here lay a foundation for
further research on cell characteristic analysis, potentially aiding healthcare
professionals in early cancer detection. Future work could explore additional
features, larger datasets, or other advanced algorithms to enhance predictive
accuracy and reliability further.
