FDS Lab
Machine learning is the science of programming computers so that they can learn from
different types of data. Arthur Samuel defined it as the "field of study that gives
computers the ability to learn without being explicitly programmed". Machine learning is
used to solve many kinds of real-life problems.
In the past, users performed machine learning tasks by manually coding every algorithm,
together with its mathematical and statistical formulas.
This process was time-consuming, inefficient, and tiresome compared with using Python
libraries, frameworks, and modules. Today, users can rely on Python, which is one of the
most popular and productive languages for machine learning. Python has replaced many other
languages in this field because its vast collection of libraries makes the work easier and simpler.
In this tutorial, we will discuss some of the most widely used Python libraries for machine learning:
o NumPy
o SciPy
o Scikit-learn
o Theano
o TensorFlow
o Keras
o PyTorch
o Pandas
o Matplotlib
NumPy
NumPy is one of the most popular libraries in Python. It is used for processing large
multi-dimensional arrays and matrices with the help of a large collection of high-level
mathematical functions. It is mainly used for fundamental scientific computation in
machine learning, and it is widely used for linear algebra, Fourier transforms, and
random number generation. Other high-end libraries, such as TensorFlow, use NumPy
internally for the manipulation of tensors.
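The three capabilities named above (linear algebra, Fourier transforms, and random number generation) can be sketched as follows. This is a minimal illustration, assuming only that NumPy is installed; the variable names and sample values are chosen for this sketch and are not taken from the original example.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b
# (3x + y = 9 and x + 2y = 8, whose solution is x = 2, y = 3)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform: a constant signal has only a zero-frequency component
signal = np.ones(4)
spectrum = np.fft.fft(signal)
print(spectrum)

# Random numbers: a seeded generator gives reproducible samples in [0, 1)
rng = np.random.default_rng(seed=0)
samples = rng.random(3)
print(samples)
```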
Example:
Output:
SciPy
Example 1:
Output:
Example 2:
Output:
Scikit-learn
Scikit-learn is a Python library for classical machine learning algorithms. It is built
on top of two basic Python libraries, NumPy and SciPy. Scikit-learn is popular among
machine learning developers because it supports both supervised and unsupervised learning
algorithms. This library can also be used for data analysis and data mining.
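A supervised-learning workflow of the kind described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; the choice of the bundled Iris dataset and of a decision tree classifier is an assumption made for this sketch, and the model is scored on its own training data only to keep the example short.

```python
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier

# Load the bundled Iris dataset (150 samples, 3 classes of 50 samples each)
iris = datasets.load_iris()

# Fit a decision tree on the full dataset (no train/test split in this sketch)
model = DecisionTreeClassifier(random_state=0)
model.fit(iris.data, iris.target)

# Predict on the same data and summarise the results
predicted = model.predict(iris.data)
print(metrics.classification_report(iris.target, predicted))
cm = metrics.confusion_matrix(iris.target, predicted)
print(cm)
```

Because the tree is scored on the data it was trained on, the reported figures are optimistic; a held-out test set gives a fairer estimate.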
Example:
Output:
DecisionTreeClassifier()
precision recall f1-score support
[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]
Theano
Theano is a well-known Python library for defining, evaluating, and optimizing
mathematical expressions, especially those that involve multi-dimensional arrays.
It does this efficiently by optimizing the utilization of the CPU and GPU. As machine
learning relies heavily on mathematics and statistics, Theano makes it easier for users
to perform these mathematical operations.
Theano makes extensive use of unit testing and self-verification to detect and diagnose
different types of errors. It is a powerful library that can be used in large-scale,
computationally intensive scientific projects, yet it is simple and approachable enough
for individuals to use in their own projects.
Example:
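import theano as th
import theano.tensor as Tt

# Symbolic matrix of double-precision values
k = Tt.dmatrix('k')
# Element-wise logistic (sigmoid) expression
r = 1 / (1 + Tt.exp(-k))
# Compile the symbolic expression into a callable function
logistic_1 = th.function([k], r)
logistic_1([[0, 1], [-1, -2]])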
Output:
array([[0.5       , 0.73105858],
       [0.26894142, 0.11920292]])
TensorFlow
Example:
Output:
[ 2 12 30 56]
Keras
Example:
Output:
PyTorch
PyTorch is an open-source Python library for machine learning based on Torch, which is
implemented in C with a Lua wrapper. It offers numerous tools and libraries that support
computer vision, Natural Language Processing (NLP), and many other machine learning
applications. This library also allows users to perform computations on tensors with
GPU acceleration.
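Tensor computation with optional GPU acceleration can be sketched as a small training loop that records the loss at every step. This is a minimal illustration, assuming PyTorch is installed; the layer sizes, learning rate, random data, and variable names are assumptions chosen for this sketch, and the loss values it prints depend on the random initialisation.

```python
import torch

torch.manual_seed(0)

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Random input and target data (sizes are illustrative)
n_samples, d_in, d_hidden, d_out = 64, 1000, 100, 10
x = torch.randn(n_samples, d_in, device=device)
y = torch.randn(n_samples, d_out, device=device)

# Weights of a two-layer network, tracked by autograd
w1 = torch.randn(d_in, d_hidden, device=device, requires_grad=True)
w2 = torch.randn(d_hidden, d_out, device=device, requires_grad=True)

learning_rate = 1e-6
losses = []
for step in range(500):
    # Forward pass: linear layer, ReLU, linear layer
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # Sum-of-squares loss
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass computes gradients for w1 and w2
    loss.backward()

    # Gradient descent update, done outside of autograd tracking
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

print(0, losses[0])
print(499, losses[-1])
```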
Example:
Output:
0 35089116.0
1 33087792.0
2 42227192.0
3 56113208.0
4 61125684.0
5 45541204.0
6 21011108.0
7 6972017.0
8 2523046.5
9 1342124.5
10 950067.5625
11 753290.25
12 620475.875
13 519006.71875
14 437975.9375
15 372063.125
16 317840.8125
17 272874.46875
18 235348.421875
.
.
.
497 7.426088268402964e-05
498 7.348413055296987e-05
499 7.258950790856034e-05
Pandas
Pandas is a Python library that is mainly used for data analysis. Users have to prepare
a dataset before using it to train a machine learning model. Pandas makes this easy for
developers, as it was developed specifically for data extraction and preparation. It
provides high-level data structures and a wide variety of tools for analysing data in detail.
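The kind of dataset preparation mentioned above can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the column names and values are hypothetical and exist only for this sketch.

```python
import pandas as pd

# Hypothetical raw data with one missing measurement
raw = pd.DataFrame({
    "height_cm": [150.0, 160.0, None, 180.0],
    "label": ["a", "b", "a", "b"],
})

# Fill the missing value with the mean of the observed values
clean = raw.copy()
clean["height_cm"] = clean["height_cm"].fillna(clean["height_cm"].mean())

# Summarise the numeric column and count the rows per label
print(clean["height_cm"].describe())
counts = clean["label"].value_counts()
print(counts)
```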
Example:
import pandas as pd

data_1 = {
    "Countries": ["Bhutan", "Cape Verde", "Chad", "Estonia", "Guinea", "Kenya", "Libya", "Mexico"],
    "capital": ["Thimphu", "Praia", "N'Djamena", "Tallinn", "Conakry", "Nairobi", "Tripoli", "Mexico City"],
    "Currency": ["Ngultrum", "Cape Verdean escudo", "CFA Franc", "Estonia Kroon; Euro",
                 "Guinean franc", "Kenya shilling", "Libyan dinar", "Mexican peso"],
    "population": [20.4, 143.5, 12.52, 135.7, 52.98, 76.21, 34.28, 54.32],
}

# Build a DataFrame from the dictionary and display it
data_1_table = pd.DataFrame(data_1)
print(data_1_table)
Output:
Experiment-VI
# Python program to implement decision tree algorithm and plot the tree

# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Loading the dataset
iris = load_iris()

# Converting the data to a pandas dataframe
data = pd.DataFrame(data = iris.data, columns = iris.feature_names)

# Creating a separate column for the target variable of iris dataset
data['Species'] = iris.target

# Replacing the categories of target variable with the actual names of the species
target = np.unique(iris.target)
target_n = np.unique(iris.target_names)
target_dict = dict(zip(target, target_n))
data['Species'] = data['Species'].replace(target_dict)

# Separating the independent and dependent variables of the dataset
x = data.drop(columns = "Species")
y = data["Species"]
# Plain lists are used so that the plotting functions accept them directly
names_features = list(x.columns)
target_labels = list(y.unique())

# Splitting the dataset into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 93)

# Importing the Decision Tree classifier class from sklearn
from sklearn.tree import DecisionTreeClassifier

# Creating an instance of the classifier class
dtc = DecisionTreeClassifier(max_depth = 3, random_state = 93)

# Fitting the training dataset to the model
dtc.fit(x_train, y_train)

# Plotting the Decision Tree
plt.figure(figsize = (30, 10), facecolor = 'b')
Tree = tree.plot_tree(dtc, feature_names = names_features, class_names = target_labels,
                      rounded = True, filled = True, fontsize = 14)
plt.show()

# Predicting the species of the testing dataset
y_pred = dtc.predict(x_test)

# Finding the confusion matrix (rows and columns follow the order of target_labels)
confusion_matrix = metrics.confusion_matrix(y_test, y_pred, labels = target_labels)
matrix = pd.DataFrame(confusion_matrix)

# Creating the figure and its axes together so that the heatmap is drawn on this figure
sns.set(font_scale = 1.3)
fig, axis = plt.subplots(figsize = (10, 7))

# Plotting heatmap
sns.heatmap(matrix, annot = True, fmt = "g", ax = axis, cmap = "magma")
axis.set_title('Confusion Matrix')
axis.set_xlabel("Predicted Values", fontsize = 10)
axis.set_xticklabels(target_labels)
axis.set_ylabel("True Labels", fontsize = 10)
axis.set_yticklabels(target_labels, rotation = 0)
plt.show()
Output: