5 Breast Cancer Model - Ipynb Colab

Uploaded by

anshikarana09925


10/11/24, 6:34 PM breast-cancer-model.ipynb - Colab

Q1-Imports

import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv('/kaggle/input/breast-cancer-dataset/breast-cancer.csv')
print(df.head())

id diagnosis radius_mean texture_mean perimeter_mean area_mean \

0 842302 M 17.99 10.38 122.80 1001.0
1 842517 M 20.57 17.77 132.90 1326.0
2 84300903 M 19.69 21.25 130.00 1203.0
3 84348301 M 11.42 20.38 77.58 386.1
4 84358402 M 20.29 14.34 135.10 1297.0

smoothness_mean compactness_mean concavity_mean concave points_mean \

0 0.11840 0.27760 0.3001 0.14710
1 0.08474 0.07864 0.0869 0.07017
2 0.10960 0.15990 0.1974 0.12790
3 0.14250 0.28390 0.2414 0.10520
4 0.10030 0.13280 0.1980 0.10430

... radius_worst texture_worst perimeter_worst area_worst \

0 ... 25.38 17.33 184.60 2019.0
1 ... 24.99 23.41 158.80 1956.0
2 ... 23.57 25.53 152.50 1709.0
3 ... 14.91 26.50 98.87 567.7
4 ... 22.54 16.67 152.20 1575.0

smoothness_worst compactness_worst concavity_worst concave points_worst \

0 0.1622 0.6656 0.7119 0.2654
1 0.1238 0.1866 0.2416 0.1860
2 0.1444 0.4245 0.4504 0.2430
3 0.2098 0.8663 0.6869 0.2575
4 0.1374 0.2050 0.4000 0.1625

symmetry_worst fractal_dimension_worst
0 0.4601 0.11890
1 0.2750 0.08902
2 0.3613 0.08758
3 0.6638 0.17300
4 0.2364 0.07678

[5 rows x 32 columns]

df.head()

         id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  concave points_mean  ...  radius_worst  texture_worst  ...
0    842302          M        17.99         10.38          122.80     1001.0          0.11840           0.27760          0.3001              0.14710  ...         25.38          17.33  ...
1    842517          M        20.57         17.77          132.90     1326.0          0.08474           0.07864          0.0869              0.07017  ...         24.99          23.41  ...
2  84300903          M        19.69         21.25          130.00     1203.0          0.10960           0.15990          0.1974              0.12790  ...         23.57          25.53  ...
3  84348301          M        11.42         20.38           77.58      386.1          0.14250           0.28390          0.2414              0.10520  ...         14.91          26.50  ...
4  84358402          M        20.29         14.34          135.10     1297.0          0.10030           0.13280          0.1980              0.10430  ...         22.54          16.67  ...

5 rows × 32 columns

Q2-Data Analysis

# 1. Understanding the structure of the dataset

print("Dataset Information:")
print(df.info()) # Data types and non-null counts; df.info() prints directly and returns None, so this call also prints "None"

print("\nSummary Statistics:")
print(df.describe()) # Summary statistics of numerical features

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 569 non-null int64
1 diagnosis 569 non-null object
2 radius_mean 569 non-null float64
3 texture_mean 569 non-null float64
4 perimeter_mean 569 non-null float64
5 area_mean 569 non-null float64
6 smoothness_mean 569 non-null float64
7 compactness_mean 569 non-null float64
8 concavity_mean 569 non-null float64
9 concave points_mean 569 non-null float64
10 symmetry_mean 569 non-null float64
11 fractal_dimension_mean 569 non-null float64
12 radius_se 569 non-null float64
13 texture_se 569 non-null float64
14 perimeter_se 569 non-null float64
15 area_se 569 non-null float64
16 smoothness_se 569 non-null float64
17 compactness_se 569 non-null float64
18 concavity_se 569 non-null float64
19 concave points_se 569 non-null float64
20 symmetry_se 569 non-null float64
21 fractal_dimension_se 569 non-null float64
22 radius_worst 569 non-null float64
23 texture_worst 569 non-null float64
24 perimeter_worst 569 non-null float64
25 area_worst 569 non-null float64
26 smoothness_worst 569 non-null float64
27 compactness_worst 569 non-null float64
28 concavity_worst 569 non-null float64
29 concave points_worst 569 non-null float64
30 symmetry_worst 569 non-null float64
31 fractal_dimension_worst 569 non-null float64
dtypes: float64(30), int64(1), object(1)
memory usage: 142.4+ KB
None

Summary Statistics:
id radius_mean texture_mean perimeter_mean area_mean \
count 5.690000e+02 569.000000 569.000000 569.000000 569.000000
mean 3.037183e+07 14.127292 19.289649 91.969033 654.889104
std 1.250206e+08 3.524049 4.301036 24.298981 351.914129
min 8.670000e+03 6.981000 9.710000 43.790000 143.500000
25% 8.692180e+05 11.700000 16.170000 75.170000 420.300000
50% 9.060240e+05 13.370000 18.840000 86.240000 551.100000
75% 8.813129e+06 15.780000 21.800000 104.100000 782.700000
max 9.113205e+08 28.110000 39.280000 188.500000 2501.000000

smoothness_mean compactness_mean concavity_mean concave points_mean \

count 569.000000 569.000000 569.000000 569.000000
mean 0.096360 0.104341 0.088799 0.048919
std 0.014064 0.052813 0.079720 0.038803
min 0.052630 0.019380 0.000000 0.000000

# 2. Checking for missing values

missing_values = df.isnull().sum()
print("\nMissing values in each column:")
print(missing_values)

Missing values in each column:

id 0
diagnosis 0
radius_mean 0
texture_mean 0
perimeter_mean 0
area_mean 0
smoothness_mean 0
compactness_mean 0
concavity_mean 0
concave points_mean 0
symmetry_mean 0
fractal_dimension_mean 0
radius_se 0
texture_se 0
perimeter_se 0
area_se 0
smoothness_se 0
compactness_se 0
concavity_se 0
concave points_se 0
symmetry_se 0
fractal_dimension_se 0
radius_worst 0
texture_worst 0
perimeter_worst 0
area_worst 0
smoothness_worst 0
compactness_worst 0
concavity_worst 0
concave points_worst 0
symmetry_worst 0
fractal_dimension_worst 0
dtype: int64

# 3. Analyzing the distribution of the target variable (assume 'diagnosis' as the target)
print("\nTarget variable distribution (Diagnosis):")
print(df['diagnosis'].value_counts())

Target variable distribution (Diagnosis):

diagnosis
B 357
M 212
Name: count, dtype: int64
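
The counts above show a moderately imbalanced target. A minimal sketch, using only the two counts printed above, of the class shares and of the accuracy a trivial "always benign" classifier would reach; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above.
counts = {"B": 357, "M": 212}

total = sum(counts.values())  # 569 samples in all
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a classifier that always predicts the majority class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model trained later should beat this baseline by a clear margin, and recall on the malignant class is worth reporting alongside accuracy for that reason.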

# 4. Visualizing some important aspects of the dataset

import matplotlib.pyplot as plt
import seaborn as sns

# Select only numeric columns for the correlation matrix

numeric_df = df.select_dtypes(include=[np.number])

# Visualizing the correlation between numerical features

plt.figure(figsize=(12, 8))
sns.heatmap(numeric_df.corr(), annot=False, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
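
A heatmap over about thirty features is hard to read precisely. As a complement, the sketch below lists the most strongly correlated feature pairs as numbers. The helper `top_correlated_pairs` and the small `demo` frame are hypothetical stand-ins, not part of the original notebook; on the real data one would pass `numeric_df`, probably after dropping the `id` column, which `select_dtypes` keeps because it is an integer.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr().abs()
    cols = corr.columns
    # Indices of the strict upper triangle, so each pair is listed once
    # and the diagonal (self-correlation) is skipped.
    rows_idx, cols_idx = np.triu_indices(len(cols), k=1)
    labels = [f"{cols[i]} ~ {cols[j]}" for i, j in zip(rows_idx, cols_idx)]
    pairs = pd.Series(corr.to_numpy()[rows_idx, cols_idx], index=labels)
    return pairs.sort_values(ascending=False).head(n)


# Hypothetical stand-in data: perimeter is built from radius, texture is independent.
rng = np.random.default_rng(0)
radius = rng.normal(14, 3, 200)
demo = pd.DataFrame({
    "radius_mean": radius,
    "perimeter_mean": 2 * np.pi * radius + rng.normal(0, 0.5, 200),
    "texture_mean": rng.normal(19, 4, 200),
})

print(top_correlated_pairs(demo, n=2))
```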


# Print the column names to verify

print(df.columns)

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',

'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
'fractal_dimension_se', 'radius_worst', 'texture_worst',
'perimeter_worst', 'area_worst', 'smoothness_worst',
'compactness_worst', 'concavity_worst', 'concave points_worst',
'symmetry_worst', 'fractal_dimension_worst'],
dtype='object')

# Boxplot of radius_mean grouped by diagnosis

print(df.columns)
print(df[['diagnosis', 'radius_mean']].head())
sns.boxplot(x='diagnosis', y='radius_mean', data=df)
plt.show()

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',

plt.figure(figsize=(6, 4))
sns.countplot(x='diagnosis', data=df) # Number of benign (B) and malignant (M) samples
plt.title('Count of Diagnosis in Dataset')
plt.show()
print(df.columns)

Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',

Q3-Data Preprocessing, Assign data and labels, Scaling Data, Splitting Data

from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split

# 1. Assign Data (Features) and Labels (Target)

# 'diagnosis' is the target; every other column is used as a feature.
# Note: this keeps the non-predictive 'id' column, which is why the shapes below show 31 features.
X = df.drop(columns=['diagnosis']) # Features (all columns except 'diagnosis')
y = df['diagnosis'] # Labels (target)

# 2. Encode the labels

# 'diagnosis' is categorical: map 'M' (malignant) to 1 and 'B' (benign) to 0
y = y.map({'M': 1, 'B': 0})

# 3. Scaling the Data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X) # Standardize each feature to zero mean and unit variance (fitted on all rows, before the split, so test-set statistics influence the scaling)

# 4. Splitting the Data

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Display shapes of training and testing sets

print("Training set size:", X_train.shape)
print("Test set size:", X_test.shape)

Training set size: (455, 31)

Test set size: (114, 31)
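
The preprocessing above scales all rows before splitting and keeps the `id` column as a feature. A hedged alternative, sketched below, drops `id`, splits first (stratified on the label), and fits the scaler on the training rows only. The helper name `split_then_scale` and the small `demo` frame are hypothetical; this is not the notebook's method, and the metrics reported in this document were not produced with it.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_then_scale(frame, test_size=0.2, seed=42):
    """Drop 'id', encode the target, split, then scale using training statistics only."""
    X = frame.drop(columns=["id", "diagnosis"])
    y = frame["diagnosis"].map({"M": 1, "B": 0})
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scaler = StandardScaler().fit(X_train)  # the test rows never influence the scaler
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Hypothetical stand-in frame with the same column layout as the CSV (two features only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "id": np.arange(100),
    "diagnosis": ["M"] * 40 + ["B"] * 60,
    "radius_mean": rng.normal(14, 3, 100),
    "texture_mean": rng.normal(19, 4, 100),
})

X_tr, X_te, y_tr, y_te = split_then_scale(demo)
print("Training set size:", X_tr.shape)
print("Test set size:", X_te.shape)
```

With `stratify=y`, the malignant share is the same in both partitions, which matters for a test set of only about a hundred rows.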

Q4-Model Implementation

# Importing the Naive Bayes classifier

from sklearn.naive_bayes import GaussianNB

# 1. Initialize the Naive Bayes classifier

model = GaussianNB()

# 2. Train the model using the training data

model.fit(X_train, y_train)

# 3. Make predictions on the test set

y_pred = model.predict(X_test)

# 4. Evaluate the model's performance

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# 5. Print the evaluation metrics

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

Accuracy: 0.96
Precision: 0.98
Recall: 0.93
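
These figures come from a single 114-row test split, so they can shift with a different `random_state`. A minimal sketch of a cross-validated estimate follows. It assumes that scikit-learn's bundled `load_breast_cancer` data (569 rows, 30 features, no `id` column) can stand in for the CSV used in this notebook, and it wraps the scaler and the classifier in a pipeline so that scaling is refitted inside each fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
# The bundled copy encodes 0 = malignant, 1 = benign; flip it so that
# malignant = 1, matching the {'M': 1, 'B': 0} mapping used in this notebook.
y = 1 - data.target

pipeline = make_pipeline(StandardScaler(), GaussianNB())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Recall on the malignant class, one score per fold.
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="recall")
print("Recall per fold:", scores.round(2))
print(f"Mean recall: {scores.mean():.2f} (std {scores.std():.2f})")
```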

Q5-Calculate the accuracy, precision, and recall for your data set

from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, ConfusionMatrixDisplay

# 1. Evaluate the model's performance

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# 2. Print the evaluation metrics

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

# 3. Calculate and display the confusion matrix

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Benign (0)', 'Malignant (1)'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()

Accuracy: 0.96
Precision: 0.98
Recall: 0.93
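
The three scores are all functions of the four cells of the confusion matrix. The sketch below derives them by hand, adds the F1 score (imported in Q1 but not used), and checks the results against scikit-learn. The label lists are hypothetical and chosen for illustration only; they are not the notebook's predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels for illustration only (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# For binary labels {0, 1}, ravel() yields the cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted malignant that are malignant
recall = tp / (tp + fn)                     # share of malignant cases that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

# The hand-computed values agree with scikit-learn's functions.
assert abs(accuracy - accuracy_score(y_true, y_hat)) < 1e-12
assert abs(precision - precision_score(y_true, y_hat)) < 1e-12
assert abs(recall - recall_score(y_true, y_hat)) < 1e-12
assert abs(f1 - f1_score(y_true, y_hat)) < 1e-12

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1: {f1:.2f}")
```

In a screening context a false negative (a missed malignant case) is usually the costlier error, so recall is the figure to watch when comparing models.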

https://colab.research.google.com/drive/1A6NDSz1WmmrrWvLv5ija25kkoZoQlEN8#scrollTo=XIq3C7iloen6&printMode=true
