5 Breast Cancer Model - Ipynb Colab
Q1-Imports
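import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed by the plt.figure / plt.show calls in the plotting cell
import seaborn as sns  # needed by sns.countplot in the plotting cell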
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
df = pd.read_csv('/kaggle/input/breast-cancer-dataset/breast-cancer.csv')
print(df.head())
symmetry_worst fractal_dimension_worst
0 0.4601 0.11890
1 0.2750 0.08902
2 0.3613 0.08758
3 0.6638 0.17300
4 0.2364 0.07678
[5 rows x 32 columns]
df.head()
         id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  concave points_mean  ...  radius_worst  texture_worst  ...
0    842302          M        17.99         10.38          122.80     1001.0          0.11840           0.27760          0.3001              0.14710  ...         25.38          17.33  ...
1    842517          M        20.57         17.77          132.90     1326.0          0.08474           0.07864          0.0869              0.07017  ...         24.99          23.41  ...
2  84300903          M        19.69         21.25          130.00     1203.0          0.10960           0.15990          0.1974              0.12790  ...         23.57          25.53  ...
3  84348301          M        11.42         20.38           77.58      386.1          0.14250           0.28390          0.2414              0.10520  ...         14.91          26.50  ...
4  84358402          M        20.29         14.34          135.10     1297.0          0.10030           0.13280          0.1980              0.10430  ...         22.54          16.67  ...
5 rows × 32 columns
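print("Dataset Information:")
print(df.info())  # df.info() prints its own report and returns None, hence the "None" line in the output
print("\nSummary Statistics:")
print(df.describe())  # Summary statistics of numerical features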
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 569 non-null int64
1 diagnosis 569 non-null object
2 radius_mean 569 non-null float64
3 texture_mean 569 non-null float64
4 perimeter_mean 569 non-null float64
5 area_mean 569 non-null float64
6 smoothness_mean 569 non-null float64
7 compactness_mean 569 non-null float64
8 concavity_mean 569 non-null float64
9 concave points_mean 569 non-null float64
10 symmetry_mean 569 non-null float64
11 fractal_dimension_mean 569 non-null float64
12 radius_se 569 non-null float64
13 texture_se 569 non-null float64
14 perimeter_se 569 non-null float64
15 area_se 569 non-null float64
16 smoothness_se 569 non-null float64
17 compactness_se 569 non-null float64
https://colab.research.google.com/drive/1A6NDSz1WmmrrWvLv5ija25kkoZoQlEN8#scrollTo=XIq3C7iloen6&printMode=true 1/5
10/11/24, 6:34 PM breast-cancer-model.ipynb - Colab
18 concavity_se 569 non-null float64
19 concave points_se 569 non-null float64
20 symmetry_se 569 non-null float64
21 fractal_dimension_se 569 non-null float64
22 radius_worst 569 non-null float64
23 texture_worst 569 non-null float64
24 perimeter_worst 569 non-null float64
25 area_worst 569 non-null float64
26 smoothness_worst 569 non-null float64
27 compactness_worst 569 non-null float64
28 concavity_worst 569 non-null float64
29 concave points_worst 569 non-null float64
30 symmetry_worst 569 non-null float64
31 fractal_dimension_worst 569 non-null float64
dtypes: float64(30), int64(1), object(1)
memory usage: 142.4+ KB
None
Summary Statistics:
id radius_mean texture_mean perimeter_mean area_mean \
count 5.690000e+02 569.000000 569.000000 569.000000 569.000000
mean 3.037183e+07 14.127292 19.289649 91.969033 654.889104
std 1.250206e+08 3.524049 4.301036 24.298981 351.914129
min 8.670000e+03 6.981000 9.710000 43.790000 143.500000
25% 8.692180e+05 11.700000 16.170000 75.170000 420.300000
50% 9.060240e+05 13.370000 18.840000 86.240000 551.100000
75% 8.813129e+06 15.780000 21.800000 104.100000 782.700000
max 9.113205e+08 28.110000 39.280000 188.500000 2501.000000
# 3. Analyzing the distribution of the target variable (assume 'diagnosis' as the target)
print("\nTarget variable distribution (Diagnosis):")
print(df['diagnosis'].value_counts())
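Raw counts alone do not show how unbalanced the two classes are. A minimal sketch of reporting counts and proportions side by side follows; `demo_labels` is a made-up five-row stand-in for `df['diagnosis']`, not the real column.

```python
import pandas as pd

# Hypothetical stand-in for df['diagnosis']; the real column has 569 rows.
demo_labels = pd.Series(["M", "B", "B", "M", "B"], name="diagnosis")

counts = demo_labels.value_counts()                # absolute count per class
shares = demo_labels.value_counts(normalize=True)  # fraction of rows per class

# Align the two Series on the class label and show them together.
print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
```

On the real data the same two calls can be applied to `df['diagnosis']` directly.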
plt.figure(figsize=(6, 4))
sns.countplot(x='diagnosis', data=df)  # one bar per diagnosis class; the DataFrame is passed via 'data='
plt.title('Count of Diagnosis in Dataset')
plt.show()
print(df.columns)
Q3-Data Preprocessing, Assign data and labels, Scaling Data, Splitting Data
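The code for this step is not preserved in the export. The sketch below shows one way the four sub-steps named in the heading could look. It rests on several assumptions: `demo_df` is a random stand-in that only mimics the `id` / `diagnosis` layout of the real CSV, `preprocess` is a hypothetical helper, and the 80/20 split, the `M` -> 1 / `B` -> 0 encoding, and the use of `StandardScaler` are illustrative choices rather than the notebook's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real CSV: same 'id' / 'diagnosis' layout,
# but only two feature columns filled with random values.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "id": np.arange(100),
    "diagnosis": rng.choice(["M", "B"], size=100),
    "radius_mean": rng.normal(14.0, 3.5, size=100),
    "texture_mean": rng.normal(19.0, 4.3, size=100),
})


def preprocess(frame, test_size=0.2, seed=42):
    """Assign data and labels, split, then scale (hypothetical helper)."""
    # Data: every column except the row identifier and the target.
    X = frame.drop(columns=["id", "diagnosis"])
    # Labels: malignant -> 1, benign -> 0 (assumed encoding).
    y = frame["diagnosis"].map({"M": 1, "B": 0})
    # Split before scaling so the scaler never sees the test rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit on training rows only
    X_test_scaled = scaler.transform(X_test)        # reuse training mean and std
    return X_train_scaled, X_test_scaled, y_train, y_test


X_train, X_test, y_train, y_test = preprocess(demo_df)
print(X_train.shape, X_test.shape)
```

The heading lists scaling before splitting; this sketch deliberately splits first, because fitting the scaler on all rows would let test-set statistics leak into training.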
Q4-Model Implementation
Accuracy: 0.96
Precision: 0.98
Recall: 0.93
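The cell that produced the scores above is not preserved in the export, so which of the imported classifiers was used is unknown. As an illustration only, the sketch below trains a `LogisticRegression` inside a scaling pipeline. It assumes scikit-learn's bundled `load_breast_cancer` data as a stand-in for the Kaggle CSV (that copy encodes malignant as 0, so the labels are flipped to make malignant the positive class), and the split ratio and `max_iter` value are arbitrary choices. Its scores need not match the ones reported here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the Kaggle CSV.
X, y_raw = load_breast_cancer(return_X_y=True)
y = (y_raw == 0).astype(int)  # bundled target: 0 = malignant; recode malignant -> 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training fold only, then the classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall: {recall_score(y_test, y_pred):.2f}")
```

Any of the other imported estimators (for example `RandomForestClassifier` or `SVC`) could be dropped into the same pipeline in place of `LogisticRegression`.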
Q5-Calculate the accuracy, precision, and recall for your data set
Accuracy: 0.96
Precision: 0.98
Recall: 0.93
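To make explicit what the three scores measure, the sketch below computes them from confusion-matrix counts in plain Python. `classification_scores` is a hypothetical helper and the ten labels are made-up toy data, not the notebook's predictions.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    accuracy = (tp + tn) / len(pairs)                   # share of all predictions that are right
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted positives that are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # share of actual positives that are found
    return accuracy, precision, recall


# Toy labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

accuracy, precision, recall = classification_scores(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")    # Accuracy: 0.80
print(f"Precision: {precision:.2f}")  # Precision: 0.75
print(f"Recall: {recall:.2f}")        # Recall: 0.75
```

Read this way, and assuming malignant is the positive class, a precision of 0.98 with a recall of 0.93 would mean that nearly every predicted malignant case is correct while roughly 7% of actual malignant cases are missed.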