Machine Learning File


AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY, UTTAR PRADESH

MACHINE LEARNING

Assignment 3
Vridhi Gupta
A2305218276
6 CSE 5 X

Submitted To: Mr. Roshan Lal

Question: Compare all ML models and explain accuracy and F1 score.


What is machine learning?
Suppose a machine has to predict whether a customer will buy a specific product, let's say
"Antivirus", this year or not. The machine will do this by looking at past
experience: the data on products that the customer has bought every year. If he
buys Antivirus every year, then there is a high probability that the customer is going
to buy an antivirus this year as well. This is how machine learning works at the basic
conceptual level.

There are 3 types of machine learning (ML):

Supervised learning

Unsupervised learning

Reinforcement learning

How these work:

In supervised ML, the model is trained on labelled datasets; these datasets
contain both the input and the expected output.

During training, the data is generally split in the ratio 80:20, with 80% as training data and 20%
as testing data. For the 80% training data, we give the model both the input and the output, and it
learns from the training data only. Once the model is ready, it is tested: inputs are fetched from
the remaining 20 percent of the data, which the model has never seen before. The model predicts
a value, we compare it with the actual value, and we calculate the accuracy.
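The 80:20 split and accuracy calculation described above can be sketched in plain Python. The toy purchase data and the trivial majority-label "model" here are made up purely for illustration:

```python
import random

# Toy labelled dataset: (feature, label) pairs, invented for illustration.
data = [(i, 1 if i % 3 else 0) for i in range(100)]

random.seed(0)
random.shuffle(data)

# 80:20 split: the model learns from the first 80%,
# and is evaluated on the unseen remaining 20%.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# A deliberately simple "model": predict the majority label seen in training.
labels = [y for _, y in train]
majority = max(set(labels), key=labels.count)

# Accuracy = correct predictions / total predictions on the test set.
correct = sum(1 for _, y in test if majority == y)
accuracy = correct / len(test)
print(f"accuracy: {accuracy:.2f}")
```

Any real model would replace the majority-label rule, but the split-train-predict-compare flow stays the same.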

Example of Supervised Learning Algorithms:


 Linear Regression
 Nearest Neighbor
 Gaussian Naive Bayes
 Decision Trees.
 Support Vector Machine (SVM)
 Random Forest

Comparison of supervised learning algorithms:

Linear Regression
 Algorithm: finds the set of coefficients that minimizes error on the training dataset.
 Approach: parametric.
 Datasets: can handle big-data analysis.
 Training: requires the data to be split into train and test sets.

Nearest Neighbor (KNN)
 Algorithm: saves the training data.
 Approach: non-parametric.
 Datasets: works best on small datasets that do not have many features.
 Training: does not require any training; just load the dataset and it runs.

Gaussian Naive Bayes
 Algorithm: used when the features have continuous values.
 Approach: parametric.
 Datasets: highly accurate when applied to big data.
 Training: does require training.

Decision Trees
 Algorithm: used for both classification and regression problems.
 Approach: non-parametric.
 Datasets: easy to use for a small number of classes.
 Training: can work directly from a table of data, without any prior design work.

Random Forest
 Algorithm: used for both classification and regression problems.
 Approach: non-parametric.
 Datasets: suitable for situations that have a large dataset.
 Training: the dataset is divided into subsets and a subset is given to each decision tree.
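As a rough illustration, the algorithms compared above can be trained and scored on a single dataset, assuming scikit-learn is installed. The Iris dataset is an arbitrary choice for this sketch, and Logistic Regression is swapped in for Linear Regression since this is a classification task; scores will vary with the data:

```python
# A minimal sketch: fit each classifier on the same 80:20 split
# and report test-set accuracy. Requires scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

On a small, well-separated dataset like this, most of the models score similarly; differences between them show up more clearly on larger or noisier data.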

Advantages of supervised learning

Helps to optimize performance criteria with the help of experience.

Supervised ML helps to solve various types of real-world computation problems.

Disadvantages of supervised learning

Classifying big data can be challenging.

Training for supervised learning needs a lot of computation time, so it requires a lot of
time.

Unsupervised ML uses machine learning algorithms to analyze and cluster unlabeled datasets.
These algorithms discover hidden patterns or data groupings without the need for human intervention. It is
self-learning, where the algorithm can find previously unknown patterns in datasets that do not have any
labels.

Unsupervised learning is divided into two subgroups, i.e., clustering and association problems.

Clustering

It is mainly used for finding structure in a collection of data. It processes your data and finds
natural clusters if they exist in the data. Various types of clustering are:

Clustering Types

 Hierarchical clustering
 K-means clustering
 K-NN (k nearest neighbors)
 Principal Component Analysis
 Singular Value Decomposition
 Independent Component Analysis

Hierarchical Clustering:

Hierarchical clustering is an algorithm which builds a hierarchy of clusters. It begins with each
data point assigned to a cluster of its own. Then, at each step, the two closest clusters are merged into the
same cluster. This algorithm ends when there is only one cluster left.
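The bottom-up merging described above can be sketched on a handful of made-up 1-D points, using single-linkage distance as one possible way to measure the gap between clusters:

```python
# Toy agglomerative hierarchical clustering: every point starts in its
# own cluster, and the two closest clusters are merged until one remains.
points = [1.0, 1.5, 5.0, 5.2, 9.0]
clusters = [[p] for p in points]
merges = []  # record of (cluster_a, cluster_b) merged at each step

def single_link_distance(a, b):
    # Single linkage: distance between the closest members of two clusters.
    return min(abs(x - y) for x in a for y in b)

while len(clusters) > 1:
    # Find the pair of clusters with the smallest distance.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link_distance(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append((clusters[i], clusters[j]))
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(merges)  # the first merge joins the two closest points
```

The sequence of merges recorded here is exactly what a dendrogram (discussed below) draws.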

K-means Clustering

K-means is an iterative clustering algorithm which refines the cluster centres in every
iteration. Initially, the desired number of clusters is selected. In this clustering method, you
need to cluster the data points into k groups. A larger k means smaller groups with more
granularity; in the same way, a lower k means larger groups with less granularity.

The output of the algorithm is a group of "labels": it assigns each data point to one of the k groups.
In k-means clustering, each group is defined by creating a centroid for each group. The
centroids are like the heart of the cluster: each captures the points closest to it and adds
them to the cluster.
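The assign-then-update loop described above can be sketched in plain Python on made-up 1-D points, assuming k = 2 and hand-picked starting centroids:

```python
# Minimal k-means sketch: label each point with its nearest centroid,
# then move each centroid to the mean of its group, until nothing changes.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.9]
centroids = [0.0, 10.0]  # initial guesses for the 2 group centres

for _ in range(10):
    # Assignment step: each point gets the label of its nearest centroid.
    labels = [min(range(2), key=lambda k: abs(p - centroids[k])) for p in points]
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = [
        sum(p for p, l in zip(points, labels) if l == k) / labels.count(k)
        for k in range(2)
    ]
    if new_centroids == centroids:
        break  # assignments have settled
    centroids = new_centroids

print(labels)     # the "labels" output: which of the k groups each point is in
print(centroids)  # the final group centres
```

Real implementations also guard against a centroid losing all its points; that case cannot occur with this toy data.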

Hierarchical clustering further defines two related concepts:

 Agglomerative clustering
 Dendrogram
Agglomerative clustering:

This type of hierarchical clustering does not require the number of
clusters K as an input. The agglomeration process starts by forming each data point as a single cluster.

This method uses some distance measure and reduces the number of clusters (by one in each
iteration) through a merging process. Lastly, we have one big cluster that contains all the objects.

Dendrogram:

In this clustering method, every level shows a possible clustering. The height of a join in the
dendrogram shows the level of similarity between the two joined clusters: the lower two
clusters join, the more similar they are. Choosing which level of the dendrogram to read the
groups from is not natural and is mostly subjective.

K-Nearest Neighbors

K-nearest neighbour is the simplest of all machine learning classifiers. It differs from other
machine learning techniques in that it does not produce a model. It is a simple algorithm which
stores all available cases and classifies new instances based on a similarity measure.

It works very well when there is a clear distance between examples. Classification is slow
when the training set is large and the distance calculation is nontrivial.
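The store-and-vote behaviour described above can be sketched with made-up 2-D examples, using squared Euclidean distance as the similarity measure:

```python
# Toy k-nearest-neighbour classifier: no model is trained; all examples
# are stored, and a new point is labelled by a majority vote among its
# k closest stored examples.
from collections import Counter

# Stored (feature, label) examples; values invented for illustration.
examples = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
            ((6.0, 6.0), "blue"), ((5.8, 6.2), "blue"), ((6.1, 5.9), "blue")]

def knn_predict(point, k=3):
    # Sort stored examples by squared Euclidean distance to the query point.
    by_distance = sorted(
        examples,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))
print(knn_predict((6.0, 6.1)))
```

The full sort here is what makes prediction slow on large training sets; practical libraries use spatial index structures instead.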

Principal Components Analysis:

In case you have data in a high-dimensional space, you select a basis for that space and keep
only the, say, 200 most important scores of that basis. These basis directions are known as
principal components. The subset you select constitutes a new space which is small in size compared
to the original space, while maintaining as much of the complexity of the data as possible.

Applications of unsupervised machine learning

 Clustering automatically splits the dataset into groups based on their similarities.
 Anomaly detection can discover unusual data points in your dataset. It is useful for
finding fraudulent transactions.

Disadvantages of Unsupervised Learning

 You cannot get precise information regarding data sorting, and the output is not
known, as the data used in unsupervised learning is not labelled.
 The results are less accurate because the input data is not known and not labelled by
people in advance. This means that the machine has to do this itself.

Difference between supervised and unsupervised learning

Supervised learning trains on labelled datasets that contain both the input and the expected
output, so its predictions can be checked against known answers. Unsupervised learning works
on unlabeled datasets and discovers hidden patterns or groupings without human intervention.
Explain accuracy, F1 score, and precision.

Accuracy is used when the True Positives and True Negatives are more important,
while F1-score is used when the False Negatives and False Positives are crucial.

Accuracy can be used when the class distribution is similar, while F1-score is a better
metric when there are imbalanced classes.

Accuracy

One of the more obvious metrics, it is the measure of all the correctly identified cases.
It is most used when all the classes are equally important.

Precision: It is the measure of the correctly identified positive cases
out of all the predicted positive cases. Thus, it is useful when the cost of False
Positives is high.

Recall: It is the measure of the correctly identified positive cases out of all the actual
positive cases. It is important when the cost of False Negatives is high.
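These measures can be computed from scratch for a small made-up binary example (1 = positive class), which also shows how F1 combines precision and recall:

```python
# Accuracy, precision, recall, and F1 from the four confusion counts.
actual    = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # correct positives out of predicted positives
recall    = tp / (tp + fn)   # correct positives out of actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, it stays low if either one is low, which is why it is preferred over accuracy for imbalanced classes.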
