ML-journal
Practical journal
Seat Number [ ]
Deccan Education Society's
Department of Computer Science and Information Technology
CERTIFICATE
This is to certify that the student of M.Sc. (Computer Science) with Seat No. [        ] has completed the
Practical journal of the paper Applied Machine Learning and Deep Learning under my supervision
in this College during the year 2023-2024.
Lecturer-In-Charge H.O.D.
Department of Computer Science & IT
Date:
INDEX
6. 21/05/24 Practical-3.2_Linear_Regression_standard_scalar_california_housing
7. 27/05/24 Practical-3.3_ridge_Regression_standard_scalar_california_housing
8. 27/05/24 Practical-3.4_Lasso_Regression_standard_scalar_california_housing
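The opening cell of the first practical (the pandas import and the CSV load that produce df) is lost in this export; a minimal sketch, where the file name is a hypothetical:
[ ]: import pandas as pd
df = pd.read_csv("./dataset/score.csv")  # hypothetical path; the actual CSV name is not shown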
[2]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Hours 25 non-null float64
1 Scores 25 non-null int64
dtypes: float64(1), int64(1)
memory usage: 532.0 bytes
[3]: x = df.iloc[:,:-1]
y = df.iloc[:,-1]
[4]: x
[4]: Hours
0 2.5
1 5.1
2 3.2
3 8.5
4 3.5
5 1.5
6 9.2
7 5.5
8 8.3
9 2.7
10 7.7
11 5.9
12 4.5
13 3.3
14 1.1
15 8.9
16 2.5
17 1.9
18 6.1
19 7.4
20 2.7
21 4.8
22 3.8
23 6.9
24 7.8
[5]: y
[5]: 0 21
1 47
2 27
3 75
4 30
5 20
6 88
7 60
8 81
9 25
10 85
11 62
12 41
13 42
14 17
15 95
16 30
17 24
18 67
19 69
20 30
21 54
22 35
23 76
24 86
Name: Scores, dtype: int64
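Cells [6]-[7], which split the data, are not shown; a minimal sketch, consistent with the 20 training and 5 test targets displayed below (random_state is an assumption):
[ ]: from sklearn.model_selection import train_test_split
# an 80/20 split of the 25 rows gives 20 train / 5 test, matching ytrain and ytest
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)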
[8]: ytrain
[8]: 10 85
18 67
19 69
4 30
2 27
20 30
6 88
7 60
22 35
1 47
16 30
0 21
15 95
24 86
23 76
9 25
8 81
12 41
11 62
5 20
Name: Scores, dtype: int64
[9]: from sklearn.linear_model import LinearRegression
[10]: lr = LinearRegression()
[11]: lr.fit(xtrain,ytrain)
[11]: LinearRegression()
Beta 0 (intercept)
[14]: lr.intercept_
[14]: -1.5369573315500702
Beta 1 (slope coefficient)
[15]: lr.coef_
[15]: array([10.46110829])
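The cell that generates predictions is elided; presumably:
[ ]: predictions = lr.predict(xtest)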
[16]: print(ytest)
print(predictions)
14 17
13 42
17 24
3 75
21 54
Name: Scores, dtype: int64
[ 9.97026179 32.98470004 18.33914843 87.38246316 48.67636248]
[18]: from sklearn.metrics import mean_absolute_error, r2_score
mean_absolute_error(ytest,predictions)
[18]: 7.882398086270432
[19]: r2_score(ytest,predictions)
[19]: 0.8421031525243527
Practical-1.2_linear_regression_Advertising_dataset
[3]: df = pd.read_csv('./dataset/Advertising.csv')
[4]: df
[6]: x = df.iloc[:,:-1]
x
     Unnamed: 0     TV  Radio  Newspaper
197         198  177.0    9.3        6.4
198         199  283.6   42.0       66.2
199         200  232.1    8.6        8.7
[7]: y = df.iloc[:,-1]
y
[7]: 0 22.1
1 10.4
2 9.3
3 18.5
4 12.9
…
195 7.6
196 9.7
197 12.8
198 25.5
199 13.4
Name: Sales, Length: 200, dtype: float64
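The model-construction and split cells ([8]-[9]) are elided; a minimal sketch (the split parameters are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
lr = LinearRegression()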
[10]: lr.fit(xtrain,ytrain)
[10]: LinearRegression()
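Cell [11], which produces predict1, is elided; presumably:
[ ]: predict1 = lr.predict(xtest)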
[12]: predict1
[13]: from sklearn.metrics import r2_score,mean_absolute_error
[14]: print(r2_score(ytest,predict1))
0.8021354391516504
[15]: print(mean_absolute_error(ytest,predict1))
1.5264827355298254
Practical-2.1_Logistic_Regression_placement_dataset
[2]: df = pd.read_csv("./dataset/placement.csv")
[3]: df
[4]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 cgpa 1000 non-null float64
1 placement_exam_marks 1000 non-null float64
2 placed 1000 non-null int64
dtypes: float64(2), int64(1)
memory usage: 23.6 KB
[5]: df.shape
[5]: (1000, 3)
[6]: df.head()
[6]: cgpa placement_exam_marks placed
0 7.19 26.0 1
1 7.46 38.0 1
2 7.54 40.0 1
3 6.42 8.0 1
4 7.23 17.0 0
[7]: df['placed'].unique()
[8]: X = df.iloc[:,:-1]
[9]: y = df.iloc[:,-1]
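The split, model, fit, and prediction cells are largely elided; a minimal reconstruction, consistent with the 250 test-set predictions shown in cell [18] below (the classifier variable name and random_state are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# 25% of 1000 rows gives the 250 predictions printed below
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=42)
lr = LogisticRegression()
lr.fit(xtrain, ytrain)
probability = lr.predict_proba(xtest)
predictions = lr.predict(xtest)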
[14]: LogisticRegression()
[17]: probability[:10]
[18]: predictions
[18]: array([0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0])
[20]: 0.48
Practical-2.2_Logistic_Regression_iris_cv_kfold
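The import and dataset-loading cells are elided; a minimal sketch:
[ ]: from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris = load_iris()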
[5]: x = iris.data
[14]: x[:10]
[7]: y = iris.target
[8]: y
[8]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
[9]: lr = LogisticRegression()
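The cell constructing k_fold is not shown; a plausible reconstruction (n_splits, shuffle, and random_state are assumptions):
[ ]: from sklearn.model_selection import KFold, cross_val_score
k_fold = KFold(n_splits=5, shuffle=True, random_state=42)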
[11]: score = cross_val_score(lr, x, y, cv=k_fold)
Practical-3.1_Lasso_Regression_housing-dataset
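The cell that loads the housing CSV into df is elided; a minimal sketch, where the file name is a hypothetical:
[ ]: import pandas as pd
df = pd.read_csv("./dataset/housing.csv")  # hypothetical path; the actual CSV name is not shown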
# First 3
df.head(3)
# Random 3
df.sample(3)
# Last 3
df.tail(3)
[2]: x = df.iloc[:,:-1]
x.shape
[3]: y = df.iloc[:,-1]
y
[3]: 0 24.0
1 21.6
2 34.7
3 33.4
4 36.2
…
501 22.4
502 20.6
503 23.9
504 22.0
505 11.9
Name: MEDV, Length: 506, dtype: float64
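The split and model-construction cells are elided; a minimal sketch (the split parameters are assumptions):
[ ]: from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = Lasso()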
[6]: model.fit(xtrain, ytrain)
[6]: Lasso()
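The prediction and scoring cells ([7]-[8]) are elided; presumably:
[ ]: from sklearn.metrics import r2_score
ypred1 = model.predict(xtest)
r2_score(ytest, ypred1)  # output shown in cell [8] below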
[8]: 0.662198077052326
model1 = Lasso()
params = {'alpha':[0.00001,0.0001,0.001,0.01]}
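The grid-search cell is elided; a minimal sketch using GridSearchCV (the cv value is an assumption), which yields the best estimator shown in cell [11]:
[ ]: from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(model1, params, cv=5)
grid.fit(xtrain, ytrain)
model2 = grid.best_estimator_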
[11]: Lasso(alpha=0.01)
[12]: ypred2 = model2.predict(xtest)
r2_score(ytest, ypred2)
[12]: 0.7787372388293925
Practical-3.2_Linear_Regression_standard_scalar_california_housing
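The loading cell is elided; presumably fetch_california_housing, mirroring Practicals 3.3 and 3.4 (the variable name boston_df is the notebook's own; the target column name matches the display below):
[ ]: import pandas as pd
from sklearn.datasets import fetch_california_housing
ds = fetch_california_housing()
boston_df = pd.DataFrame(ds.data, columns=ds.feature_names)
boston_df["Price_House"] = ds.target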
Longitude Price_House
0 -122.23 4.526
1 -122.22 3.585
2 -122.24 3.521
3 -122.25 3.413
4 -122.25 3.422
… … …
20635 -121.09 0.781
20636 -121.21 0.771
20637 -121.22 0.923
20638 -121.32 0.847
20639 -121.24 0.894
[20640 rows x 9 columns]
[3]: x = boston_df.iloc[:,:-1]
[4]: x
Longitude
0 -122.23
1 -122.22
2 -122.24
3 -122.25
4 -122.25
… …
20635 -121.09
20636 -121.21
20637 -121.22
20638 -121.32
20639 -121.24
[5]: y = boston_df.iloc[:,-1]
[6]: y
[6]: 0 4.526
1 3.585
2 3.521
3 3.413
4 3.422
…
20635 0.781
20636 0.771
20637 0.923
20638 0.847
20639 0.894
Name: Price_House, Length: 20640, dtype: float64
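The scaling, split, and model cells are elided; a minimal sketch (the split parameters are assumptions):
[ ]: from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
x_sc = StandardScaler().fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(x_sc, y, test_size=0.2, random_state=42)
lr = LinearRegression()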
[8]: x_sc
[11]: lr.fit(xtrain,ytrain)
[11]: LinearRegression()
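The prediction cell and the metrics import are elided; presumably:
[ ]: from sklearn.metrics import r2_score, mean_absolute_error
predict1 = lr.predict(xtest)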
[14]: print(r2_score(ytest,predict1))
0.6015507891610434
[15]: print(mean_absolute_error(ytest,predict1))
0.5362588391493065
Practical-3.3_ridge_Regression_standard_scalar_california_housing
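Cell [1], which loads the dataset into df, is elided; a minimal sketch mirroring cell [2] of Practical-3.4:
[ ]: import pandas as pd
from sklearn.datasets import fetch_california_housing
ds = fetch_california_housing()
df = pd.DataFrame(ds.data, columns=ds.feature_names)
df.head(3)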
Longitude
0 -122.23
1 -122.22
2 -122.24
[2]: df["HousePrice"]=ds.target
df.head(3)
Longitude HousePrice
0 -122.23 4.526
1 -122.22 3.585
2 -122.24 3.521
[3]: X = df.iloc[:,:-1]
y = df.iloc[:,-1]
[4]: X.head(3)
[4]: MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85
Longitude
0 -122.23
1 -122.22
2 -122.24
[5]: y.head(3)
[5]: 0 4.526
1 3.585
2 3.521
Name: HousePrice, dtype: float64
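The scaling, split, and model cells are elided; a minimal reconstruction (alpha=10 matches the repr in cell [9]; the other parameters are assumptions):
[ ]: from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
x_sc = StandardScaler().fit_transform(X)
xtrain, xtest, ytrain, ytest = train_test_split(x_sc, y, test_size=0.2, random_state=42)
rr = Ridge(alpha=10)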
[9]: rr.fit(xtrain,ytrain)
[9]: Ridge(alpha=10)
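Cell [10], which produces predict, is elided; presumably:
[ ]: predict = rr.predict(xtest)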
[11]: from sklearn.metrics import r2_score, mean_absolute_error
r2_score(ytest,predict)
[11]: 0.6014258698314037
[12]: mean_absolute_error(ytest,predict)
[12]: np.float64(0.536239634635214)
Practical-3.4_Lasso_Regression_standard_scalar_california_housing
[2]: ds = fetch_california_housing()
df = pd.DataFrame(ds.data, columns = ds.feature_names)
df["HousePrice"] = ds.target
df.head(3)
Longitude HousePrice
0 -122.23 4.526
1 -122.22 3.585
2 -122.24 3.521
[3]: X = df.iloc[:,:-1]
y = df.iloc[:,-1]
[4]: X.head(3)
Longitude
0 -122.23
1 -122.22
2 -122.24
[5]: y.head(3)
[5]: 0 4.526
1 3.585
2 3.521
Name: HousePrice, dtype: float64
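The scaling, split, fit, and prediction cells are elided; a minimal reconstruction (alpha=10 matches the repr in cell [9]; the split parameters are assumptions). Note that alpha=10 is strong enough to shrink every coefficient to zero on this standardized data, so the model effectively predicts a constant, which is why the R² in cell [11] is approximately zero:
[ ]: from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score, mean_absolute_error
x_sc = StandardScaler().fit_transform(X)
xtrain, xtest, ytrain, ytest = train_test_split(x_sc, y, test_size=0.2, random_state=42)
model = Lasso(alpha=10)
model.fit(xtrain, ytrain)
predict = model.predict(xtest)
r2_score(ytest, predict)  # output shown in cell [11] below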
x_sc
[9]: Lasso(alpha=10)
[11]: -0.00033321189864432554
[12]: mean_absolute_error(ytest,predict)
[12]: np.float64(0.9147942692371251)
Practical-4_KNN_Regressor-assignment-done
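The input of cell [1] (the CSV load into data and the feature selection that yields x_og) is lost in the export; a minimal sketch, where the file name is a hypothetical:
[ ]: import pandas as pd
data = pd.read_csv("./dataset/housing.csv")  # hypothetical path
x_og = data.iloc[:,:-1]  # all columns except medv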
[1]: crim zn indus chas nox rm age dis rad tax ptratio \
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8
b lstat medv
0 396.90 4.98 24.0
1 396.90 9.14 21.6
2 392.83 4.03 34.7
[4]: from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x = sc.fit_transform(x_og)
x
[5]: x.shape
[5]: (506, 13)
[6]: y = data.iloc[:,-1]
y.shape
[6]: (506,)
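The split, fit, and scoring cells are elided; a minimal sketch (n_neighbors=3 matches the repr in cell [8]; the split parameters are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(xtrain, ytrain)
knn.score(xtest, ytest)  # R^2 on the test split; cell [10] output below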
[8]: KNeighborsRegressor(n_neighbors=3)
[10]: 0.8069614057252134
[12]: ss = StandardScaler()
ss.fit(x_og)
x = ss.transform(x_og)
scoring = "neg_mean_squared_error"
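The construction and fitting of search are elided; a plausible sketch with a hypothetical parameter grid:
[ ]: from sklearn.model_selection import GridSearchCV
params = {"n_neighbors": range(1, 21)}  # hypothetical grid
search = GridSearchCV(KNeighborsRegressor(), params, scoring=scoring, cv=5)
search.fit(x, y)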
print(f"Best Params: {search.best_params_}")
[19]: 0.8638489106891928
Practical-5_KNN_Classifier_assignment_done
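The loading cell is elided; presumably load_iris, as in Practical-2.2:
[ ]: from sklearn.datasets import load_iris
iris = load_iris()
x = iris.data
y = iris.target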
[2]: x
[5.4, 3.4, 1.5, 0.4],
[5.2, 4.1, 1.5, 0.1],
[5.5, 4.2, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.2],
[5. , 3.2, 1.2, 0.2],
[5.5, 3.5, 1.3, 0.2],
[4.9, 3.6, 1.4, 0.1],
[4.4, 3. , 1.3, 0.2],
[5.1, 3.4, 1.5, 0.2],
[5. , 3.5, 1.3, 0.3],
[4.5, 2.3, 1.3, 0.3],
[4.4, 3.2, 1.3, 0.2],
[5. , 3.5, 1.6, 0.6],
[5.1, 3.8, 1.9, 0.4],
[4.8, 3. , 1.4, 0.3],
[5.1, 3.8, 1.6, 0.2],
[4.6, 3.2, 1.4, 0.2],
[5.3, 3.7, 1.5, 0.2],
[5. , 3.3, 1.4, 0.2],
[7. , 3.2, 4.7, 1.4],
[6.4, 3.2, 4.5, 1.5],
[6.9, 3.1, 4.9, 1.5],
[5.5, 2.3, 4. , 1.3],
[6.5, 2.8, 4.6, 1.5],
[5.7, 2.8, 4.5, 1.3],
[6.3, 3.3, 4.7, 1.6],
[4.9, 2.4, 3.3, 1. ],
[6.6, 2.9, 4.6, 1.3],
[5.2, 2.7, 3.9, 1.4],
[5. , 2. , 3.5, 1. ],
[5.9, 3. , 4.2, 1.5],
[6. , 2.2, 4. , 1. ],
[6.1, 2.9, 4.7, 1.4],
[5.6, 2.9, 3.6, 1.3],
[6.7, 3.1, 4.4, 1.4],
[5.6, 3. , 4.5, 1.5],
[5.8, 2.7, 4.1, 1. ],
[6.2, 2.2, 4.5, 1.5],
[5.6, 2.5, 3.9, 1.1],
[5.9, 3.2, 4.8, 1.8],
[6.1, 2.8, 4. , 1.3],
[6.3, 2.5, 4.9, 1.5],
[6.1, 2.8, 4.7, 1.2],
[6.4, 2.9, 4.3, 1.3],
[6.6, 3. , 4.4, 1.4],
[6.8, 2.8, 4.8, 1.4],
[6.7, 3. , 5. , 1.7],
[6. , 2.9, 4.5, 1.5],
[5.7, 2.6, 3.5, 1. ],
[5.5, 2.4, 3.8, 1.1],
[5.5, 2.4, 3.7, 1. ],
[5.8, 2.7, 3.9, 1.2],
[6. , 2.7, 5.1, 1.6],
[5.4, 3. , 4.5, 1.5],
[6. , 3.4, 4.5, 1.6],
[6.7, 3.1, 4.7, 1.5],
[6.3, 2.3, 4.4, 1.3],
[5.6, 3. , 4.1, 1.3],
[5.5, 2.5, 4. , 1.3],
[5.5, 2.6, 4.4, 1.2],
[6.1, 3. , 4.6, 1.4],
[5.8, 2.6, 4. , 1.2],
[5. , 2.3, 3.3, 1. ],
[5.6, 2.7, 4.2, 1.3],
[5.7, 3. , 4.2, 1.2],
[5.7, 2.9, 4.2, 1.3],
[6.2, 2.9, 4.3, 1.3],
[5.1, 2.5, 3. , 1.1],
[5.7, 2.8, 4.1, 1.3],
[6.3, 3.3, 6. , 2.5],
[5.8, 2.7, 5.1, 1.9],
[7.1, 3. , 5.9, 2.1],
[6.3, 2.9, 5.6, 1.8],
[6.5, 3. , 5.8, 2.2],
[7.6, 3. , 6.6, 2.1],
[4.9, 2.5, 4.5, 1.7],
[7.3, 2.9, 6.3, 1.8],
[6.7, 2.5, 5.8, 1.8],
[7.2, 3.6, 6.1, 2.5],
[6.5, 3.2, 5.1, 2. ],
[6.4, 2.7, 5.3, 1.9],
[6.8, 3. , 5.5, 2.1],
[5.7, 2.5, 5. , 2. ],
[5.8, 2.8, 5.1, 2.4],
[6.4, 3.2, 5.3, 2.3],
[6.5, 3. , 5.5, 1.8],
[7.7, 3.8, 6.7, 2.2],
[7.7, 2.6, 6.9, 2.3],
[6. , 2.2, 5. , 1.5],
[6.9, 3.2, 5.7, 2.3],
[5.6, 2.8, 4.9, 2. ],
[7.7, 2.8, 6.7, 2. ],
[6.3, 2.7, 4.9, 1.8],
[6.7, 3.3, 5.7, 2.1],
[7.2, 3.2, 6. , 1.8],
[6.2, 2.8, 4.8, 1.8],
[6.1, 3. , 4.9, 1.8],
[6.4, 2.8, 5.6, 2.1],
[7.2, 3. , 5.8, 1.6],
[7.4, 2.8, 6.1, 1.9],
[7.9, 3.8, 6.4, 2. ],
[6.4, 2.8, 5.6, 2.2],
[6.3, 2.8, 5.1, 1.5],
[6.1, 2.6, 5.6, 1.4],
[7.7, 3. , 6.1, 2.3],
[6.3, 3.4, 5.6, 2.4],
[6.4, 3.1, 5.5, 1.8],
[6. , 3. , 4.8, 1.8],
[6.9, 3.1, 5.4, 2.1],
[6.7, 3.1, 5.6, 2.4],
[6.9, 3.1, 5.1, 2.3],
[5.8, 2.7, 5.1, 1.9],
[6.8, 3.2, 5.9, 2.3],
[6.7, 3.3, 5.7, 2.5],
[6.7, 3. , 5.2, 2.3],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])
[3]: y
[3]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
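The split, fit, and prediction cells are elided; a minimal sketch (the split parameters are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier()
knn.fit(xtrain, ytrain)
predictions = knn.predict(xtest)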
[5]: KNeighborsClassifier()
[7]: from sklearn.metrics import accuracy_score
accuracy_score(ytest, predictions)
[7]: 1.0
[14]: 1.0
Practical-6_K-means-R
[1]: library(ggplot2)
library(cluster)
[3]: df = as.data.frame(read.csv("./dataset/marks.csv"))
df
A data.frame: 17 × 3
English Maths Science
<int> <int> <int>
99 100 100
98 99 97
92 9 96
95 92 94
90 100 96
80 75 82
75 83 80
72 73 74
71 82 76
73 74 76
34 32 28
26 28 30
32 30 31
98 97 98
30 29 29
78 75 78
100 99 100
[6]: final_model = kmeans(kmdata,3,nstart = 25)
final_model
Cluster means:
English Maths Science
1 30.50 29.75000 29.50000
2 92.00 9.00000 96.00000
3 85.75 87.41667 87.58333
Clustering vector:
[1] 3 3 2 3 3 3 3 3 3 3 1 1 1 3 1 3 3
Within cluster sum of squares by cluster:
[1] 48.750 0.000 4254.083
(between_SS / total_SS = 88.8 %)
Available components:
Practical-7_SVM_classifier
[4]: df = pd.read_csv("./dataset/diabetes.csv")
[5]: df.head(3)
[6]: df.sample(2)
[7]: df.shape
[7]: (768, 9)
[8]: df["Outcome"].value_counts()
[8]: Outcome
0 500
1 268
Name: count, dtype: int64
[9]: X = df.iloc[:,:-1]
X.shape
[9]: (768, 8)
[10]: y = df.iloc[:,-1]
y.shape
[10]: (768,)
[11]: X
DiabetesPedigreeFunction Age
0 0.627 50
1 0.351 31
2 0.672 32
3 0.167 21
4 2.288 33
.. … …
763 0.171 63
764 0.340 27
765 0.245 30
766 0.349 47
767 0.315 23
[12]: y
[12]: 0 1
1 0
2 1
3 0
4 1
..
763 0
764 0
765 0
766 1
767 0
Name: Outcome, Length: 768, dtype: int64
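The split and model-construction cells are elided; a minimal reconstruction, consistent with the 192-row test set implied by the confusion matrix below (the kernel and random_state are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# 25% of 768 rows gives 192 test samples, matching the confusion-matrix total
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=42)
model1 = SVC(kernel="linear")  # kernel choice is an assumption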
model1.fit(xtrain, ytrain)
predictions = model1.predict(xtest)
accuracy_score(ytest, predictions)
[17]: 0.78125
[20]: 0.796875
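The inputs of cells [18]-[23] are elided; the confusion-matrix cell was presumably:
[ ]: from sklearn.metrics import confusion_matrix
confusion_matrix(ytest, predictions)  # the elided cell's exact inputs are unknown; this matrix matches the 0.796875 model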
[21]: array([[113, 10],
[ 29, 40]])
[23]: 0.7708333333333334
Practical-8_Ensemble_Bagging
[3]: df = load_breast_cancer()
x=df.data
y=df.target
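The split and pipeline cells ([4]-[5]) are elided; a reconstruction consistent with the Pipeline repr in cell [9] below (the split parameters are assumptions):
[ ]: from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=1)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=1))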
[6]: pipeline.fit(xtrain,ytrain)
[7]: print("Score:",pipeline.score(xtest,ytest))
pred1=pipeline.predict(xtest)
from sklearn.metrics import accuracy_score
print("Accuracy Score:",accuracy_score(ytest,pred1))
Score: 0.9790209790209791
Accuracy Score: 0.9790209790209791
[8]: from sklearn.ensemble import BaggingClassifier
bgclassifier = BaggingClassifier(estimator=pipeline, n_estimators=100, max_features=10, max_samples=100, random_state=1)
[9]: bgclassifier.fit(xtrain,ytrain)
[9]: BaggingClassifier(estimator=Pipeline(steps=[('standardscaler',
                                                 StandardScaler()),
                                                ('logisticregression',
                                                 LogisticRegression(random_state=1))]),
                       max_features=10, max_samples=100, n_estimators=100,
                       random_state=1)
[10]: print(bgclassifier.score(xtest,ytest))
0.958041958041958