Awesome
Here's a fully drafted, detailed report you can copy into a Word document or share as a project report. It's based on your notebook and the PDF summary, and written in a clear, structured, and professional tone:
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Label encoding
le = LabelEncoder()
y = le.fit_transform(y)  # Convert benign/malignant to 0/1

# Train-test split (stratified to preserve the class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

# Standardization (fit the scaler on the training data only to avoid leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Accuracy values reported in the notebook: 93.57% and 96.49%.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# KNN with k = 5 on the standardized features
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))
Model         Components   Accuracy
Naive Bayes   10           91.81%
Naive Bayes    9           91.81%
Naive Bayes    8           91.81%
KNN           10           95.91%
KNN            9           95.91%
KNN            8           96.49%
Interpretation: KNN outperforms Naive Bayes at every component count, and reducing the PCA dimensionality from 10 to 8 components does not degrade performance; KNN actually peaks at 8 components (96.49%), while Naive Bayes holds steady at 91.81%.
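For reference, the component sweep behind the table could be reproduced with a sketch along these lines (assuming GaussianNB for the Naive Bayes model and the scaled train/test splits defined earlier; the notebook's exact code may differ):

from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Evaluate both models on 10, 9, and 8 PCA components
for n in (10, 9, 8):
    pca = PCA(n_components=n)
    X_train_pca = pca.fit_transform(X_train_scaled)  # fit PCA on training data only
    X_test_pca = pca.transform(X_test_scaled)
    for name, model in [("Naive Bayes", GaussianNB()),
                        ("KNN", KNeighborsClassifier(n_neighbors=5))]:
        model.fit(X_train_pca, y_train)
        acc = accuracy_score(y_test, model.predict(X_test_pca))
        print(f"{name}, {n} components: {acc:.2%}")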
4. Hyperparameter Tuning of KNN
The optimal value of k in KNN was determined by evaluating model accuracy for k from 1 to 15, and the observations from that sweep informed the final choice of k.
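A minimal sketch of that sweep, assuming the scaled train/test splits from the preprocessing step, looks like this:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Evaluate KNN test accuracy for k = 1..15
k_values = list(range(1, 16))
accuracies = []
for k in k_values:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train_scaled, y_train)
    accuracies.append(accuracy_score(y_test, knn_k.predict(X_test_scaled)))

best_k = k_values[accuracies.index(max(accuracies))]
print("Best k:", best_k, "with accuracy:", max(accuracies))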
5. Visual Analysis
🔸 Correlation Matrix
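A correlation heatmap of the input features can be drawn with a short seaborn sketch (assuming the feature matrix X is a pandas DataFrame):

import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap of pairwise feature correlations
plt.figure(figsize=(12, 10))
sns.heatmap(X.corr(), cmap="coolwarm", center=0)
plt.title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()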
🔸 PCA Visualization
PCA scatter plots showed good class separability even after dimensionality reduction.
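A two-component projection is enough to visualize this; a minimal sketch, assuming the scaled training data from earlier:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the standardized training data onto the first two principal components
pca_2d = PCA(n_components=2)
X_train_2d = pca_2d.fit_transform(X_train_scaled)

plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, cmap="coolwarm", alpha=0.7)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Projection of the Training Data")
plt.show()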
🔸 K vs Accuracy Plot
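Assuming the k_values and accuracies lists from the sweep sketched in Section 4, the plot can be drawn with matplotlib:

import matplotlib.pyplot as plt

# Test accuracy as a function of k (uses k_values and accuracies from Section 4)
plt.plot(k_values, accuracies, marker="o")
plt.xlabel("k (number of neighbors)")
plt.ylabel("Test accuracy")
plt.title("KNN Accuracy vs. k")
plt.grid(True)
plt.show()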
6. Key Insights
🔧 Final Recommendation:
Use KNN with 8 PCA components for a strong trade-off between speed and predictive performance.
Consider further tuning and testing with other distance metrics or ensemble methods, as sketched below.
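One way to explore other distance metrics, sketched here with scikit-learn's GridSearchCV (the metric list is illustrative, not taken from the notebook):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Cross-validated search over k and distance metric
param_grid = {
    "n_neighbors": list(range(1, 16)),
    "metric": ["euclidean", "manhattan", "minkowski"],
}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train_scaled, y_train)
print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)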
Let me know if you'd like me to turn this into a formatted PDF or Word version later!