Module 2 - ML
Data cleaning
• Missing values
• Corrupted data
• Remove unnecessary data
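The three cleaning steps can be sketched in plain Python as below. This is a minimal illustration, assuming the data is a list of dictionaries (one per record); the function name `clean_records` and the columns `id`, `age`, and `note` are hypothetical. Corrupted values (ones that cannot be parsed as numbers) are treated as missing, and missing values are filled with the column mean, which is only one of several possible imputation strategies.

```python
from statistics import mean


def clean_records(rows, numeric_cols, drop_cols):
    """Return a cleaned copy of `rows`, a list of dicts (hypothetical helper)."""
    cleaned = []
    for row in rows:
        # Remove unnecessary data: drop columns that are not needed for modelling.
        record = {k: v for k, v in row.items() if k not in drop_cols}
        # Corrupted data: values that cannot be parsed as numbers become missing (None).
        for col in numeric_cols:
            value = record.get(col)
            try:
                record[col] = None if value is None else float(value)
            except (TypeError, ValueError):
                record[col] = None
        cleaned.append(record)

    # Missing values: impute each numeric column with the mean of its valid entries.
    for col in numeric_cols:
        valid = [r[col] for r in cleaned if r[col] is not None]
        fill = mean(valid) if valid else 0.0
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
    return cleaned


raw = [
    {"id": 1, "age": "20", "note": "a"},
    {"id": 2, "age": None, "note": "b"},   # missing value
    {"id": 3, "age": "abc", "note": "c"},  # corrupted value
    {"id": 4, "age": "40", "note": "d"},
]
print(clean_records(raw, numeric_cols=["age"], drop_cols={"note"}))
```

Here the missing and corrupted ages are both replaced by 30.0, the mean of the two valid ages, and the `note` column is dropped.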
Step 4: Exploratory Data Analysis
• Data exploration involves understanding the patterns and trends in the data. At this stage, useful insights are drawn and the correlations between variables are examined.
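A first pass at exploration often means summary statistics plus pairwise correlations. The sketch below computes both in plain Python; the helpers `summarize` and `pearson` and the columns `hours` and `score` are illustrative assumptions, not part of any library. Pearson correlation measures only linear relationships, so a value near 0 does not rule out other kinds of dependence.

```python
from statistics import mean


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical dataset: hours studied and exam score for five students.
data = {"hours": [1, 2, 3, 4, 5], "score": [52, 55, 61, 70, 74]}
for name, column in data.items():
    print(name, summarize(column))
print("corr(hours, score) =", round(pearson(data["hours"], data["score"]), 3))
```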
Step 5: Building a Machine Learning Model
Machine Learning
• Supervised Learning: regression, time-series forecasting
• Unsupervised Learning: clustering
• Reinforcement Learning: robotics, IoT
Supervised Machine Learning
• Supervised learning is a technique in which we teach or train the machine using well-labelled data.
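To make "training on labelled data" concrete, here is a minimal sketch of one of the simplest supervised learners, a 1-nearest-neighbour classifier: it stores the labelled examples and predicts the label of the closest one. The function name and the toy points and labels are hypothetical.

```python
import math


def predict_1nn(train_X, train_y, x):
    """Return the label of the labelled training example closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]


# Labelled training data: each point comes with its correct class.
train_X = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
train_y = ["A", "A", "B", "B"]

print(predict_1nn(train_X, train_y, (1.5, 1.2)))  # → A
print(predict_1nn(train_X, train_y, (8.5, 8.5)))  # → B
```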
Unsupervised Machine Learning
• Unsupervised learning is the training of a machine using unlabelled information, allowing the algorithm to act on that information without guidance.
Reinforcement learning
• Reinforcement learning is a part of ML in which an agent is placed in an environment and learns to behave in it by performing certain actions and observing the rewards it gets from those actions.
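The agent, environment, and reward loop can be illustrated with a multi-armed bandit, one of the simplest reinforcement learning settings. In this sketch the environment is a hypothetical list of fixed rewards per action; real environments usually return noisy or delayed rewards and have state, which this toy omits. The agent uses an epsilon-greedy rule: it mostly repeats the action with the best average reward seen so far and occasionally explores a random one.

```python
import random


def run_bandit(true_rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit whose action rewards are fixed."""
    rng = random.Random(seed)
    n_actions = len(true_rewards)
    counts = [0] * n_actions       # how often each action was taken
    estimates = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        untried = [a for a in range(n_actions) if counts[a] == 0]
        if untried:
            action = untried[0]  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = true_rewards[action]  # the environment's response
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([1.0, 5.0, 2.0])
print(estimates)  # → [1.0, 5.0, 2.0]
print(counts)
```

After a few trials the agent settles on the action with reward 5.0, so that action dominates `counts`.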
Supervised v/s Unsupervised v/s Reinforcement learning
Types of problems solved using machine learning
Classification
• Classification is a predictive modelling problem based on supervised learning.
• Classification is the process of categorizing a given set of data into classes; it can be applied to both structured and unstructured data.
• The process starts by predicting the class of given data points.
• The classes are often referred to as targets, labels, or categories.
Types of classification
• Binary classification
• Multi-class classification
• Multi-label classification
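The three types differ mainly in the shape of the labels. The hypothetical helper below inspects a list of labels: if a label is itself a collection, each example can carry several labels (multi-label); otherwise two distinct classes means binary and more means multi-class. This is a heuristic on the labels seen, assumed here for illustration only.

```python
def classification_type(labels):
    """Guess the classification setting from example labels (illustrative heuristic)."""
    # Multi-label: each example may belong to several classes at once.
    if any(isinstance(y, (set, frozenset, list, tuple)) for y in labels):
        return "multi-label"
    # Otherwise each example has exactly one class.
    return "binary" if len(set(labels)) == 2 else "multi-class"


print(classification_type(["spam", "not spam", "spam"]))             # → binary
print(classification_type(["cat", "dog", "bird"]))                    # → multi-class
print(classification_type([{"action", "comedy"}, {"drama"}, set()]))  # → multi-label
```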
Regression
• Regression is a supervised learning technique that uses one or more independent variables to predict an outcome.
• The goal of regression is to find the best-fitting relationship between the dependent variable and a set of independent variables.
• The output is a continuous real value.
• It quantitatively explains how each factor (independent variable) affects the outcome.
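With a single independent variable, the least-squares best-fitting line y = slope * x + intercept has a closed form: slope = Σ(x - mean_x)(y - mean_y) / Σ(x - mean_x)², and intercept = mean_y - slope * mean_x. The sketch below implements that formula (ordinary least squares, one common definition of "best fit"); the function names and the data are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(slope, intercept, x):
    """Continuous real-valued prediction for a new input."""
    return slope * x + intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)              # → 2.0 1.0
print(predict(slope, intercept, 5))  # → 11.0
```

The fitted slope (2.0) quantifies the factor's effect: each extra unit of x adds 2 to the predicted outcome.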
Clustering
• Clustering is an unsupervised learning method, hence the input raw data does not have labels.
• It is a way of grouping the data points into different clusters, each consisting of similar data points.
• Objects that are similar stay in the same group, which has few or no similarities with other groups.
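k-means is one widely used clustering algorithm; the slides do not name a specific one, so it is an assumed choice here. In this minimal plain-Python sketch, points are assigned to the nearest centroid, each centroid moves to the mean of its points, and the two steps repeat until the centroids stop moving. Initialisation uses a simple farthest-point heuristic so the toy example is deterministic; library implementations typically use k-means++ or random restarts. Function names and data are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Unlabelled points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```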
Regression v/s Classification v/s Clustering
Features and Labels in Machine Learning
Identify labels
• For a regression model, the Score column is the label you would choose, as it holds a numeric value. Regression models are used to predict a value from a continuous range.
• For a classification model, the Pass column is the label you would choose, as it has a small set of distinct values. Classification models are used to predict one of a list of distinct categories.
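In code, choosing the label amounts to separating one column (the target) from the feature columns. In the sketch below the records and the `StudyHours` column are hypothetical, chosen to mirror the Score and Pass columns mentioned above. Assuming Pass is derived from Score, each label column is excluded when predicting the other, because it would leak the answer.

```python
def split_features_label(rows, label, exclude=()):
    """Split records into feature dicts (X) and a list of label values (y)."""
    X = [{k: v for k, v in r.items() if k != label and k not in exclude} for r in rows]
    y = [r[label] for r in rows]
    return X, y


# Hypothetical records; StudyHours is an invented feature column.
rows = [
    {"StudyHours": 10, "Score": 82, "Pass": True},
    {"StudyHours": 2, "Score": 41, "Pass": False},
    {"StudyHours": 6, "Score": 65, "Pass": True},
]

# Regression: the numeric Score column is the label.
X_reg, y_reg = split_features_label(rows, "Score", exclude={"Pass"})
# Classification: the Pass column (distinct values) is the label.
X_clf, y_clf = split_features_label(rows, "Pass", exclude={"Score"})
print(y_reg)  # → [82, 41, 65]
print(y_clf)  # → [True, False, True]
```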
Feature selection
• A feature is a column in your dataset that serves as an input. Features are used to train the model to predict the outcome, that is, to fit the label.
• Feature selection is the process of selecting a
subset of relevant features to use when building
and training the model. Feature selection restricts
the data to the most valuable inputs, reducing
noise and improving training performance.
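One very simple filter-style selection method drops features whose values barely vary, since a near-constant column carries little information for any model. The sketch below is illustrative: the helper name, column names, and threshold are assumptions. Other approaches score each feature against the label (for example by correlation) or search over feature subsets using the model itself.

```python
from statistics import pvariance


def select_by_variance(columns, threshold=0.0):
    """Keep the names of features whose population variance exceeds `threshold`."""
    return [name for name, values in columns.items() if pvariance(values) > threshold]


features = {
    "age": [23, 35, 47, 52],
    "country_code": [1, 1, 1, 1],  # constant: carries no information
    "income": [30.5, 42.0, 58.3, 61.1],
}
print(select_by_variance(features))  # → ['age', 'income']
```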
Feature engineering
• Feature engineering is the process of creating new
features from raw data to increase the predictive power
of the machine learning model. Engineered features
capture additional information that is not available in
the original feature set.
• Examples of feature engineering are as follows:
– Aggregating data
– Calculating a moving average
– Calculating the difference over time
– Converting text into a numeric value
– Grouping data
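Each example in the list can be written as a small transformation that produces a new column. The helpers below are plain-Python sketches with hypothetical names and data; converting text into numbers is shown as simple label encoding, which is only one option (one-hot encoding is another).

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def difference_over_time(values):
    """Change between consecutive observations."""
    return [b - a for a, b in zip(values, values[1:])]


def group_and_aggregate(rows, key, field):
    """Group rows by `key` and sum `field` within each group."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[field]
    return totals


def encode_text(categories):
    """Map each distinct text value to an integer code (label encoding)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


sales = [10, 12, 14, 16, 18]
orders = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 50},
    {"region": "north", "amount": 25},
]
print(moving_average(sales, 3))                         # → [12.0, 14.0, 16.0]
print(difference_over_time(sales))                      # → [2, 2, 2, 2]
print(group_and_aggregate(orders, "region", "amount"))  # → {'north': 125, 'south': 50}
print(encode_text(["red", "blue", "red"]))              # → ([1, 0, 1], {'blue': 0, 'red': 1})
```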
Types of cross validation
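2. K-Fold cross validation
• With K = 5 and 1000 records, each fold holds 1000/5 = 200 records.
• In each experiment (Exp 1 to Exp 5), a different fold of 200 records is used as the testing data and the remaining 800 records are used as the training data.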
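The fold arithmetic for K-fold cross validation (1000 records, K = 5, 200 test and 800 training records per experiment) can be reproduced with a short generator. This sketch splits indices in order without shuffling, an assumption made for clarity; in practice the data is usually shuffled (or stratified) before folding. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for each of the k experiments."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so every sample is tested exactly once.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for exp, (train, test) in enumerate(k_fold_indices(1000, 5), start=1):
    print(f"Exp {exp}: train={len(train)}, test={len(test)}")  # each line: train=800, test=200
```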
Splitting data
Why is a Confusion Matrix needed?
Confusion Matrix
What is a Confusion Matrix?
• A confusion matrix is one of the evaluation metrics for classification.
• It is used to evaluate the performance of classification algorithms by tabulating the predicted classes against the actual classes.
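For a binary problem the confusion matrix reduces to four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below tallies them from actual and predicted labels and derives accuracy; the function name and the sample labels are hypothetical.

```python
def confusion_matrix(actual, predicted, positive):
    """Count TP, FP, FN, TN for a binary classifier, treating `positive` as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted, positive=1)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm)        # → {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print(accuracy)  # → 0.6666666666666666
```

Accuracy alone can hide poor performance on an imbalanced dataset; the separate FP and FN counts show which kind of mistake the model makes.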
Confusion matrix