Mini Project Sushant 612210154
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('Dry_Bean_Dataset.csv')
df.head()
df.info()
df.describe()
df.isnull().sum()
1. Handling Missing Values: df.isnull().sum() reports the number of missing values in each column. If any are present, numerical columns can be imputed with the column mean, and categorical columns can be forward-filled from the previous non-missing value.
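A minimal sketch of the two strategies above, on a small hypothetical DataFrame. The values and the categorical Grade column are invented for illustration (the Dry Bean feature columns are numeric), and the target column is deliberately left out because imputing labels would distort training:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; "Grade" is an invented categorical column.
df_demo = pd.DataFrame({
    "Area": [28395.0, np.nan, 29380.0, 30008.0],
    "Perimeter": [610.3, 638.0, np.nan, 645.9],
    "Grade": ["A", None, "A", "B"],
})

# Numerical columns: replace each NaN with that column's mean.
num_cols = df_demo.select_dtypes(include="number").columns
df_demo[num_cols] = df_demo[num_cols].fillna(df_demo[num_cols].mean())

# Categorical columns: carry the previous non-missing value forward.
cat_cols = df_demo.select_dtypes(exclude="number").columns
df_demo[cat_cols] = df_demo[cat_cols].ffill()

print(df_demo)
print("Missing values left:", df_demo.isnull().sum().sum())
```

Forward-fill assumes row order is meaningful; if it is not, filling with the most frequent value (the column mode) is a common alternative.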
2. Splitting the dataset using train_test_split and scaling the features:
X = df.drop(columns=['Class'])
y = df['Class']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
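The scaler above is fitted on the training split only and then reused on the test split, so no test-set statistics leak into training. The sketch below checks that property on synthetic data from make_classification (a stand-in, not the bean CSV). The stratify=y argument is an optional addition not in the original; it keeps class proportions equal across both splits:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bean features.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42,
    stratify=y)  # optional, not in the original: preserves class proportions

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learns mean/std from training rows only
X_test_s = scaler.transform(X_test)        # reuses the training mean/std

# Training features are centred with unit variance by construction...
print(np.round(X_train_s.mean(axis=0), 6))
print(np.round(X_train_s.std(axis=0), 6))
# ...while the test features are only approximately so, because the
# test rows never influenced the fitted statistics.
print(np.round(X_test_s.mean(axis=0), 3))
```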
# Distribution plots of two shape features
sns.displot(df['Area'])
sns.displot(df['MajorAxisLength'])
3. Model Training and Evaluation (accuracy on the test set):
Logistic Regression:
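from sklearn.linear_model import LogisticRegression
# The training cell is not shown in the original; these settings are assumed.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred_logreg = logreg.predict(X_test)
print("Accuracy for Logistic Regression:", accuracy_score(y_test, y_pred_logreg))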
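Accuracy is the fraction of test samples whose predicted class equals the true class. A small check on hypothetical labels (using Dry Bean class names) shows that accuracy_score matches that hand count, and that classification_report, already imported above, gives per-class precision and recall that a single accuracy number hides:

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels for five beans.
y_true = ["SEKER", "BOMBAY", "DERMASON", "SIRA", "SIRA"]
y_pred = ["SEKER", "BOMBAY", "SIRA", "SIRA", "DERMASON"]

# Accuracy = number of exact matches / number of samples.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                          # 0.6
print(accuracy_score(y_true, y_pred))  # 0.6

# Per-class precision, recall and F1.
print(classification_report(y_true, y_pred))
```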
Support Vector Machine (SVM):
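from sklearn.svm import SVC
svm = SVC().fit(X_train, y_train)  # training cell not shown in the original; default settings assumed
y_pred_svm = svm.predict(X_test)
print("Accuracy for SVM:", accuracy_score(y_test, y_pred_svm))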
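K-Nearest Neighbors (KNN):
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier().fit(X_train, y_train)  # training cell not shown in the original; default settings assumed
y_pred_knn = knn.predict(X_test)
print("Accuracy for KNN:", accuracy_score(y_test, y_pred_knn))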
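The three evaluation cells can also be written as one loop that records each test accuracy and picks the highest. This sketch runs on synthetic 7-class data with 16 features (chosen to mirror the shape of the bean data, not loaded from the CSV), and its hyperparameters are assumptions rather than the project's settings. Wrapping each model in make_pipeline with StandardScaler is a design choice not used in the original; it keeps the scaler fitted on training rows only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 7-class stand-in for the bean data.
X, y = make_classification(n_samples=1400, n_features=16, n_informative=8,
                           n_classes=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Assumed settings, not tuned values from the project.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)  # scaler fit on training data only
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    print(f"Accuracy for {name}: {scores[name]:.4f}")

best = max(scores, key=scores.get)
print("Best-performing model:", best)
```

On the real data, X and y from the split step would replace the synthetic arrays; the accuracies printed here are unrelated to the project's reported results.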
The best-performing model is the Support Vector Machine (SVM), with a test-set accuracy of 93.29%.
The most important features for predicting the dry bean class are Area, MajorAxisLength, and Perimeter.
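The write-up does not show how this ranking was obtained. Since RandomForestClassifier is imported at the top, one plausible route (an assumption) is the forest's impurity-based feature_importances_. The sketch uses synthetic data in which only the first three columns are informative; the column names are borrowed from the bean dataset purely as labels:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the 3 informative features in the first 3 columns;
# the remaining 3 columns are pure noise.
X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           n_redundant=0, n_classes=3, shuffle=False,
                           random_state=0)
cols = ["Area", "MajorAxisLength", "Perimeter", "Extent", "Solidity", "roundness"]
X = pd.DataFrame(X, columns=cols)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# Impurity-based importances: one value per column, summing to 1.
importances = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
print(importances)
print("Top 3:", list(importances.index[:3]))
```

Impurity-based importances can favour high-cardinality features, so sklearn.inspection.permutation_importance on the test split is a useful cross-check.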