Machine Learning File
MACHINE LEARNING
Assignment 3
Vridhi Gupta
A2305218276
6 CSE 5 X
Machine learning is broadly divided into three types:
Supervised learning
Unsupervised learning
Reinforcement learning
In supervised ML, the model is trained on labelled datasets, where each example has both an input and the corresponding output.
During training, the data is generally split in an 80:20 ratio: 80% as training data and 20% as testing data. For the training portion we provide both the inputs and the outputs, and the model learns from this data only. Once the model is trained, it is tested on the remaining 20% of the data, which it has never seen before: the model predicts a value for each input, and we compare the prediction with the actual value to calculate accuracy.
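The 80:20 split described above can be sketched in plain Python (the toy dataset and the fixed seed are illustrative, not part of the assignment):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into training and testing portions."""
    rng = random.Random(seed)
    shuffled = data[:]            # copy so the original list is untouched
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:split], shuffled[split:]

# toy labelled dataset: (input, output) pairs
dataset = [(x, 2 * x) for x in range(10)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # 8 2
```

In practice a library routine (for example scikit-learn's `train_test_split`) is normally used, but the logic is the same: shuffle, then cut at the 80% mark.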
Comparison of common algorithms:

Linear regression
- Approach: parametric method
- Datasets: can handle big-data analysis
- Training: requires the data to be split into train and test sets

K-NN (k-nearest neighbours)
- Approach: non-parametric
- Datasets: works best on small datasets that do not have many features
- Training: does not require any training; just load the dataset and it runs

Gaussian naive Bayes
- Approach: parametric
- Datasets: highly accurate when applied to data with a big number of classes
- Training: does require training

Decision tree
- Approach: non-parametric
- Datasets: easy to use for small datasets
- Training: can work directly off a table of data, without any prior design work

Random forest
- Approach: non-parametric
- Datasets: suitable for situations that have a large dataset
- Training: the dataset is divided into subsets and given to each decision tree
Unsupervised ML uses machine learning algorithms to analyse and cluster unlabelled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. It is self-learning: the algorithm can find previously unknown patterns in datasets that do not have any labels.
Unsupervised learning is divided into two subgroups: clustering and association problems.
Clustering
It is mainly used for finding structure in a collection of data. It processes your data and finds natural clusters if they exist in the data. Various types of clustering are:
Hierarchical clustering
K-means clustering
K-NN (k nearest neighbors)
Principal Component Analysis
Singular Value Decomposition
Independent Component Analysis
Hierarchical Clustering:
Hierarchical clustering is an algorithm which builds a hierarchy of clusters. It begins with every data point assigned to a cluster of its own. Then, at each step, the two closest clusters are merged into the same cluster. The algorithm ends when there is only one cluster left.
K-means Clustering
K-means is an iterative clustering algorithm which refines the cluster assignments at every iteration. Initially, the desired number of clusters k is selected, and the data points are clustered into k groups. A larger k means smaller groups with more granularity; a lower k means larger groups with less granularity.
The output of the algorithm is a group of "labels": each data point is assigned to one of the k groups. In k-means clustering, each group is defined by a centroid. The centroids act like the heart of the cluster, capturing the points closest to them and adding them to the cluster.
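The assign-then-update loop described above can be sketched in plain Python. This is a minimal 1-D illustration with hand-picked starting centroids, not a production implementation:

```python
def kmeans(points, centroids, iterations=10):
    """Simple 1-D k-means: assign each point to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # one centroid settles near each natural group
```

Real implementations also choose the starting centroids carefully (e.g. k-means++), since a bad initialisation can lead to a poor final clustering.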
Agglomerative clustering
Dendrogram
Agglomerative clustering:
Unlike K-means, this clustering method does not require the number of clusters K as an input. The agglomeration process starts by treating each data point as a single cluster. Using some distance measure, it reduces the number of clusters by one in each iteration through merging. Lastly, we have one big cluster that contains all the objects.
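The merge loop just described can be sketched as follows (1-D points, distance between cluster means; a toy illustration only — real implementations use linkage criteria such as single or complete linkage):

```python
def agglomerate(points, target_clusters=1):
    """Start with each point in its own cluster; repeatedly merge the
    two clusters whose means are closest, until target_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        mean = lambda c: sum(c) / len(c)
        # find the closest pair of clusters by distance between their means
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: abs(mean(clusters[ab[0]]) - mean(clusters[ab[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([1.0, 2.0, 9.0, 10.0], target_clusters=2))
```

Recording the distance at which each merge happens is exactly what produces the dendrogram discussed next.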
Dendrogram:
In this clustering method, every level of the dendrogram shows a possible clustering. The height at which two clusters join shows their level of similarity: the closer to the bottom they join, the more similar the clusters are. Picking the final grouping from a dendrogram is not natural and is mostly subjective.
K-Nearest Neighbours
K-nearest neighbours is the simplest of all machine learning classifiers. It differs from other machine learning techniques in that it does not produce a model. It is a simple algorithm which stores all available cases and classifies new instances based on a similarity measure.
It works very well when there is a clear distance between examples. The learning speed is slow when the training set is large and the distance calculation is nontrivial.
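Because K-NN stores the cases and votes among the nearest ones, the whole classifier fits in a few lines. A minimal sketch with made-up 2-D points and Euclidean distance:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training examples (Euclidean distance on 2-D points)."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# toy labelled points: ((x, y), class)
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
print(knn_predict(train, (2, 2)))  # a
print(knn_predict(train, (9, 9)))  # b
```

The `sorted` call over the full training set is what makes prediction slow on large datasets, matching the note above about learning speed.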
Principal Component Analysis
In case you have a higher-dimensional space, you need to select a basis for that space and keep only the 200 most important scores of that basis. Each such basis vector is known as a principal component. The subset you select constitutes a new space which is small in size compared to the original space. It maintains as much of the complexity of the data as possible.
Applications of unsupervised learning:
Clustering automatically splits the dataset into groups based on their similarities.
Anomaly detection can discover unusual data points in your dataset; it is useful for finding fraudulent transactions.
Disadvantages of unsupervised learning:
You cannot get precise information regarding data sorting, because the data used in unsupervised learning is not labelled and the expected output is not known.
The results are less accurate because the input data is not known and not labelled by people in advance, meaning the machine has to do this itself.
Accuracy is used when the true positives and true negatives are more important, while F1-score is used when the false negatives and false positives are crucial.
Accuracy can be used when the class distribution is similar, while F1-score is a better metric when the classes are imbalanced.
Accuracy
One of the more obvious metrics, it is the measure of all the correctly identified cases.
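Both metrics follow directly from the four confusion-matrix counts. A small helper using the standard formulas (the counts in the example are made up):

```python
def accuracy_and_f1(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / all cases;
    F1 = harmonic mean of precision and recall."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

acc, f1 = accuracy_and_f1(tp=8, tn=80, fp=2, fn=10)
print(round(acc, 2), round(f1, 2))  # 0.88 0.57
```

Note how the imbalanced example (many true negatives, many false negatives) scores high on accuracy but much lower on F1, which is exactly why F1 is preferred for imbalanced classes.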