Machine Learning File
MACHINE LEARNING
Assignment 3
Vridhi Gupta
A2305218276
6 CSE 5 X
Machine learning is broadly divided into three types:
Supervised learning
Unsupervised learning
Reinforcement learning
In supervised ML, the model is trained on labelled datasets, where each example has both an input and the corresponding output.
During training, the data is generally split in an 80:20 ratio: 80% as training data and 20% as testing data. For the training portion we provide both the inputs and the outputs, and the model learns from this data only. Once the model is trained, it is tested on the remaining 20% of the data, which it has never seen before: the model predicts a value for each input, and we compare the prediction with the actual value to calculate accuracy.
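The 80:20 split described above can be sketched in plain Python (the toy dataset and the fixed seed are illustrative, not part of the assignment):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into training and testing portions."""
    rng = random.Random(seed)
    shuffled = data[:]            # copy so the original list is untouched
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:split], shuffled[split:]

# toy labelled dataset: (input, output) pairs
dataset = [(x, 2 * x) for x in range(10)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # 8 2
```

In practice a library routine (for example scikit-learn's `train_test_split`) is normally used, but the logic is the same: shuffle, then cut at the 80% mark.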
Comparison of common algorithms:

Linear regression
- Approach: parametric method
- Datasets: can handle big-data analysis
- Training: requires the data to be split into train and test sets

K-NN (k-nearest neighbours)
- Approach: non-parametric
- Datasets: works best on small datasets that do not have many features
- Training: does not require any training; just load the dataset and it runs

Gaussian naive Bayes
- Approach: parametric
- Datasets: highly accurate when applied to data with a big number of classes
- Training: does require training

Decision tree
- Approach: non-parametric
- Datasets: easy to use for small datasets
- Training: can work directly off a table of data, without any prior design work

Random forest
- Approach: non-parametric
- Datasets: suitable for situations that have a large dataset
- Training: the dataset is divided into subsets and given to each decision tree
Unsupervised ML uses machine learning algorithms to analyse and cluster unlabelled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. It is self-learning: the algorithm can find previously unknown patterns in datasets that do not have any labels.
Unsupervised learning is divided into two subgroups: clustering and association problems.
Clustering
It is mainly used for finding structure in a collection of data. It processes your data and finds natural clusters if they exist in the data. Various types of clustering are:
Hierarchical clustering
K-means clustering
K-NN (k nearest neighbors)
Principal Component Analysis
Singular Value Decomposition
Independent Component Analysis
Hierarchical Clustering:
Hierarchical clustering is an algorithm which builds a hierarchy of clusters. It begins with every data point assigned to a cluster of its own. Then, at each step, the two closest clusters are merged into the same cluster. The algorithm ends when there is only one cluster left.
K-means Clustering
K-means is an iterative clustering algorithm which refines the cluster assignments at every iteration. Initially, the desired number of clusters k is selected, and the data points are clustered into k groups. A larger k means smaller groups with more granularity; a lower k means larger groups with less granularity.
The output of the algorithm is a group of "labels": each data point is assigned to one of the k groups. In k-means clustering, each group is defined by a centroid. The centroids act like the heart of the cluster, capturing the points closest to them and adding them to the cluster.
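The assign-then-update loop described above can be sketched in plain Python. This is a minimal 1-D illustration with hand-picked starting centroids, not a production implementation:

```python
def kmeans(points, centroids, iterations=10):
    """Simple 1-D k-means: assign each point to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # one centroid settles near each natural group
```

Real implementations also choose the starting centroids carefully (e.g. k-means++), since a bad initialisation can lead to a poor final clustering.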
Agglomerative clustering
Dendrogram
Agglomerative clustering:
Unlike K-means, this clustering method does not require the number of clusters K as an input. The agglomeration process starts by treating each data point as a single cluster. Using some distance measure, it reduces the number of clusters by one in each iteration through merging. Lastly, we have one big cluster that contains all the objects.
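The merge loop just described can be sketched as follows (1-D points, distance between cluster means; a toy illustration only — real implementations use linkage criteria such as single or complete linkage):

```python
def agglomerate(points, target_clusters=1):
    """Start with each point in its own cluster; repeatedly merge the
    two clusters whose means are closest, until target_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        mean = lambda c: sum(c) / len(c)
        # find the closest pair of clusters by distance between their means
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: abs(mean(clusters[ab[0]]) - mean(clusters[ab[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerate([1.0, 2.0, 9.0, 10.0], target_clusters=2))
```

Recording the distance at which each merge happens is exactly what produces the dendrogram discussed next.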
Dendrogram:
In this clustering method, every level of the dendrogram shows a possible clustering. The height at which two clusters join shows their level of similarity: the closer to the bottom they join, the more similar the clusters are. Picking the final grouping from a dendrogram is not natural and is mostly subjective.
K-Nearest Neighbours
K-nearest neighbours is the simplest of all machine learning classifiers. It differs from other machine learning techniques in that it does not produce a model. It is a simple algorithm which stores all available cases and classifies new instances based on a similarity measure.
It works very well when there is a clear distance between examples. The learning speed is slow when the training set is large and the distance calculation is nontrivial.
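Because K-NN stores the cases and votes among the nearest ones, the whole classifier fits in a few lines. A minimal sketch with made-up 2-D points and Euclidean distance:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training examples (Euclidean distance on 2-D points)."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# toy labelled points: ((x, y), class)
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
print(knn_predict(train, (2, 2)))  # a
print(knn_predict(train, (9, 9)))  # b
```

The `sorted` call over the full training set is what makes prediction slow on large datasets, matching the note above about learning speed.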
Principal Component Analysis
In case you have a higher-dimensional space, you need to select a basis for that space and keep only the 200 most important scores of that basis. Each such basis vector is known as a principal component. The subset you select constitutes a new space which is small in size compared to the original space. It maintains as much of the complexity of the data as possible.
Applications of unsupervised learning:
Clustering automatically splits the dataset into groups based on their similarities.
Anomaly detection can discover unusual data points in your dataset; it is useful for finding fraudulent transactions.
Disadvantages of unsupervised learning:
You cannot get precise information regarding data sorting, because the data used in unsupervised learning is not labelled and the expected output is not known.
The results are less accurate because the input data is not known and not labelled by people in advance, meaning the machine has to do this itself.
Accuracy is used when the true positives and true negatives are more important, while F1-score is used when the false negatives and false positives are crucial.
Accuracy can be used when the class distribution is similar, while F1-score is a better metric when the classes are imbalanced.
Accuracy
One of the more obvious metrics, it is the measure of all the correctly identified cases.
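Both metrics follow directly from the four confusion-matrix counts. A small helper using the standard formulas (the counts in the example are made up):

```python
def accuracy_and_f1(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / all cases;
    F1 = harmonic mean of precision and recall."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

acc, f1 = accuracy_and_f1(tp=8, tn=80, fp=2, fn=10)
print(round(acc, 2), round(f1, 2))  # 0.88 0.57
```

Note how the imbalanced example (many true negatives, many false negatives) scores high on accuracy but much lower on F1, which is exactly why F1 is preferred for imbalanced classes.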