
All the IPython Notebooks in the **Machine Learning** lecture series by **[Mr. Mounesh Gouda](https://www.linkedin.com/in/mounesh-gouda-858069246/)** are available at **[GitHub](https://github.com/Mouneshgouda)**.

Mounesh Gouda

Unit 3: Classification and Tree-based Methods

Probabilities in machine learning:

Probability Distributions:

Continuous Distributions: Probability distributions such as the Gaussian (Normal) distribution, the exponential distribution, etc., are frequently used in machine learning algorithms. For example, Gaussian distributions are commonly used in statistical models, and the parameters of these distributions are often estimated from data.

Discrete Distributions: Probability mass functions for discrete distributions, such as the binomial distribution or multinomial distribution, are
used in situations where the outcome is discrete (e.g., counting successes in a fixed number of trials).
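
As a small illustration of how distribution parameters can be estimated from data, here is a minimal sketch using NumPy and SciPy; the sample values are made up for demonstration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample assumed to come from a Gaussian distribution
data = np.array([4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.1])

# Maximum-likelihood estimates of the Gaussian parameters
mu_hat = data.mean()
sigma_hat = data.std()          # MLE uses the 1/n (biased) estimator, the NumPy default
print(f"estimated mean = {mu_hat:.3f}, std = {sigma_hat:.3f}")

# Density of a new observation under the fitted Gaussian
print(stats.norm.pdf(5.0, loc=mu_hat, scale=sigma_hat))

# Discrete case: probability of exactly 3 successes in 10 trials with p = 0.4
print(stats.binom.pmf(k=3, n=10, p=0.4))
```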

Bayesian Inference:

Bayesian Probability: Bayesian probability is a framework for updating beliefs about a hypothesis as evidence is collected. In machine
learning, Bayesian methods are used for model training and updating based on new information.

Bayesian Networks: These are graphical models that represent the probabilistic relationships among a set of variables. Bayesian networks
are commonly used in areas like medical diagnosis, fraud detection, and risk assessment.

Probabilistic Models:

Probabilistic Graphical Models (PGMs): PGMs, such as Bayesian networks and Markov networks, are used to represent and reason about
uncertainty in complex systems. They are especially useful in situations where variables are interdependent.

Hidden Markov Models (HMMs): HMMs are used for modeling time-series data with hidden states. They have applications in speech
recognition, bioinformatics, and finance.

Classification and Regression:

Logistic Regression: In binary classification problems, logistic regression models the probability of a given instance belonging to a particular
class.

Softmax Regression: In multi-class classification problems, softmax regression models the probabilities of an instance belonging to each
class.
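
As a brief sketch (using scikit-learn and its bundled iris data purely for illustration), both cases come down to calling `predict_proba`, which returns one probability per class for each instance:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# The iris dataset stands in for any multi-class classification problem
X, y = load_iris(return_X_y=True)

# For multi-class targets, recent scikit-learn versions fit a multinomial
# (softmax) logistic regression by default
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict_proba(X[:2]))   # class-membership probabilities per instance
print(clf.predict(X[:2]))         # the class with the highest probability
```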

Evaluation Metrics:

Probability Calibration: In binary classification, probability calibration ensures that predicted probabilities align well with actual outcomes.
Calibrated probabilities are important in applications like risk assessment and fraud detection.

Precision, Recall, F1-Score: These metrics are often used to evaluate the performance of classification models; they are computed from predicted class labels, which are typically obtained by thresholding the predicted probabilities.

Ensemble Methods:

Random Forest, Gradient Boosting: Ensemble methods often use probabilistic models at their core. They combine predictions from multiple
models to improve overall performance.

Naive Bayes is a classification technique based on Bayes' theorem, which assumes that all features that predict the target outcome are
independent of each other. It calculates the probability of each category and then selects the category with the highest probability. It has been
used successfully for many purposes, but is especially useful for natural language processing (NLP) problems.

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to that event.

What makes Naive Bayes a “Naive” algorithm?

The Naive Bayes classifier assumes that the features we use to predict the target are independent and do not affect each other. In real-life data, features usually do depend on each other in determining the target, but the Naive Bayes classifier ignores this.

Although the independence assumption is rarely correct in real-world data, it often works well in practice; this simplifying assumption is why the algorithm is called "Naive".

Math behind the Naive Bayes Algorithm

Given a feature vector X = (x1, x2, …, xn) and a class variable y, Bayes' theorem states that:

$$P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}$$

We are interested in calculating the posterior probability P(y | X) from the likelihood P(X | y), the prior P(y), and the evidence P(X).

Using the chain rule, the likelihood P(X | y) can be decomposed as:

$$P(X \mid y) = P(x_1 \mid x_2, \dots, x_n, y)\, P(x_2 \mid x_3, \dots, x_n, y) \cdots P(x_n \mid y)$$

However, these conditional terms are hard to estimate, so Naive Bayes makes the "naive" assumption that the features are conditionally independent of each other given the class.

Thus, by conditional independence, we have:

$$P(X \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$

Since the denominator P(X) remains the same for all classes, we can compare classes using only the numerator:

$$P(y \mid X) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The Naive Bayes classifier combines this model with a decision rule. The general rule is to choose the class with the highest posterior probability; this is called the maximum a posteriori (MAP) decision rule:

$$\hat{y} = \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

How Naive Bayes really works:

To make it clearer, let's explain with an example:

Suppose we have a group of emails and we want to classify them as spam or not spam.

Our dataset contains 15 non-spam emails and 10 spam emails. After some analysis, the frequency of each word in each class was recorded (the word-frequency table is not reproduced here).

Note: Stop words like "the", "a", "on", "is", "all" have been removed, as they do not carry important meaning and are usually removed from texts. The same applies to numbers and punctuation.

Exploring some probabilities:

P(Dear | Not Spam) = 8/34, P(Visit | Not Spam) = 2/34, P(Dear | Spam) = 3/47, P(Visit | Spam) = 6/47,

and so on.

Now assume we have the message "Hello friend" and we want to know whether it is spam or not.

Using Bayes' theorem, and ignoring the denominator (which is the same for both classes):

$$P(\text{Not Spam} \mid \text{Hello friend}) \propto P(\text{Hello friend} \mid \text{Not Spam})\, P(\text{Not Spam})$$

and similarly for the Spam class.

But P(Hello friend | Not Spam) = 0, because the exact phrase "Hello friend" never appears in our dataset: the table records single words, not whole sentences. For the same reason, P(Hello friend | Spam) = 0 as well, which would make both posterior scores zero and tell us nothing.

But wait: we said that Naive Bayes assumes that the features we use to predict the target are independent.

So, under the conditional independence assumption, we can multiply the per-word probabilities instead:

$$P(\text{Hello friend} \mid \text{Not Spam}) = P(\text{Hello} \mid \text{Not Spam}) \times P(\text{friend} \mid \text{Not Spam})$$

Multiplying by the prior P(Not Spam) gives the score for the Not Spam class. Now let's calculate the probability of being spam using the same procedure and compare the two scores.

The Not Spam score turns out to be larger, so the message "Hello friend" is classified as not spam.
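
The calculation above can be written as a short Python sketch. The per-word counts below are hypothetical (the original frequency table is not reproduced here); only the class totals of 34 non-spam words and 47 spam words, and the 15/10 email split, follow the text.

```python
# Hypothetical per-class word counts (assumed for illustration)
not_spam_counts = {"dear": 8, "visit": 2, "hello": 5, "friend": 3}
spam_counts     = {"dear": 3, "visit": 6, "hello": 2, "friend": 1}
total_not_spam_words, total_spam_words = 34, 47
n_not_spam, n_spam = 15, 10                    # number of emails in each class

def class_score(words, counts, total_words, prior):
    """Unnormalised posterior: prior * product of per-word likelihoods."""
    score = prior
    for w in words:
        score *= counts.get(w, 0) / total_words
    return score

message = ["hello", "friend"]
prior_not_spam = n_not_spam / (n_not_spam + n_spam)
prior_spam = n_spam / (n_not_spam + n_spam)

score_not_spam = class_score(message, not_spam_counts, total_not_spam_words, prior_not_spam)
score_spam = class_score(message, spam_counts, total_spam_words, prior_spam)

print("not spam" if score_not_spam > score_spam else "spam")
```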

Pros and Cons for Naive Bayes

Pros:

Requires a small amount of training data, so training takes less time.

Handles continuous and discrete data, and is not sensitive to irrelevant features.

Very simple, fast, and easy to implement.

Can be used for both binary and multi-class classification problems.

Highly scalable, as it scales linearly with the number of predictor features and data points.

When the Naive Bayes conditional independence assumption holds true, it converges more quickly than discriminative models like logistic regression.

Cons:

The assumption of independent predictors/features: Naive Bayes implicitly assumes that all attributes are mutually independent, which is almost impossible to find in real-world data.

If a categorical variable has a value that appears in the test dataset but was not observed in the training dataset, the model will assign it a zero probability and will not be able to make a prediction. This is called the "zero-frequency problem", and it can be solved using smoothing techniques, as shown in the sketch below.
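
As a minimal sketch of smoothing in practice, scikit-learn's MultinomialNB applies Laplace (add-one) smoothing through its `alpha` parameter; the toy emails below are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, invented for illustration (1 = spam, 0 = not spam)
emails = ["win a free prize now", "meeting schedule for friday",
          "free prize claim now", "project update and schedule"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(emails)

# alpha=1.0 gives every word a small non-zero count in every class,
# so an unseen word no longer zeroes out the whole product
model = MultinomialNB(alpha=1.0).fit(X, labels)

print(model.predict_proba(vec.transform(["urgent prize inside"])))
```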

Applications of Naive Bayes Algorithm


Real-time Prediction.

Multi-class Prediction.
Text classification/ Spam Filtering/ Sentiment Analysis.

Recommendation Systems.

Support Vector Machine (SVM)

Support Vector Machines (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. The primary goal of SVM is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly separates the data points of different classes.

Here's a brief explanation of SVM:

Hyperplane:

In a 2-dimensional space, a hyperplane is a line. In 3-dimensional space, it is a plane, and so on. In general, a hyperplane is an (N-1)-dimensional subspace of an N-dimensional space. For a binary classification problem (two classes), the SVM aims to find the hyperplane that best separates the two classes.

Support Vectors:

Support vectors are the data points that are closest to the hyperplane and have the most significant influence on its position. These support vectors essentially "support" the optimal separation between classes.

Margin:

The margin is the distance between the hyperplane and the nearest data point from each class (the support vectors). SVM aims to maximize this margin because a larger margin generally leads to a more robust and generalized model.

Kernel Trick:

SVMs can efficiently handle non-linear decision boundaries through the kernel trick. The kernel function implicitly maps the input features into a higher-dimensional space, making it possible to find a hyperplane that separates the classes. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid.

C Parameter:

The regularization parameter C is crucial in SVM. It determines the trade-off between having a smooth decision boundary and classifying training points correctly. A small C encourages a larger margin but may misclassify more points, while a large C classifies more training points correctly but may result in a smaller margin.

Soft Margin SVM:

In real-world scenarios, data may not always be perfectly separable. In such cases, SVM can be adapted to allow some misclassifications by introducing slack variables. Soft margin SVM finds a balance between maximizing the margin and minimizing the misclassification errors.

Multi-Class Classification:

SVM can be extended to multi-class classification using techniques like one-vs-one or one-vs-all.

In summary, SVM is a versatile algorithm suitable for both linear and non-linear classification tasks. Its ability to handle high-dimensional spaces and the flexibility provided by different kernel functions make it a popular choice in various applications, including image recognition, text classification, and bioinformatics.
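
The sketch below (using scikit-learn, with a made-up two-moons dataset) ties the kernel trick and the C parameter together: an RBF kernel handles the non-linear boundary, while C sets the margin-vs-misclassification trade-off.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy, non-linearly separable data: two interleaving half-moons
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel for the non-linear boundary; C controls the soft-margin trade-off
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
print("support vectors per class:", model.named_steps["svc"].n_support_)
```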
Margin

The margin is the minimum distance from the hyperplane to any of the observations.

Functional Margin: the theoretical definition of the margin, based on the raw (unnormalized) value of the decision function.

Geometric Margin: the functional margin normalized by the length of the weight vector, i.e. the actual distance to the hyperplane.

A wider margin means future points can be classified with greater certainty.

Why maximize Margin ?

Points near the decision surface represent uncertain classification decisions (roughly 50% either way).

A classifier with a large margin makes no low-certainty classification decisions.

It gives a classification safety margin with respect to slight errors in measurement.


Why maximize Margin ?

An SVM classifier places a large margin around the decision boundary.

This acts as a "fat separator" between the classes.

There are fewer choices of where a fat separator can be placed, in comparison to a thin hyperplane.

Linear SVM Mathematically


Hard Margin

Finding the classifier

Solve a constrained optimization problem (a standard form is sketched below).
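
The optimization problem itself is not reproduced in this text; a standard statement of the hard-margin linear SVM, assuming labels $y_i \in \{-1, +1\}$, is:

$$
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, N.
$$

Minimizing $\tfrac{1}{2}\lVert w \rVert^{2}$ is equivalent to maximizing the geometric margin $1 / \lVert w \rVert$.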


The real world: not so clean!
Hard vs. Soft Margin

Soft Margin SVM:

Always has a solution.

More robust to outliers.

Hard Margin SVM:

Requires no regularization parameter (there is no C to tune).

Why Dual Problem


Why Dual?

High-dimensional data: the number of features is much larger than the number of samples (p >> N), e.g. image data or genetic data.

When p >> N, we also have N·p >> N·N, so the N × N matrix of pairwise inner products between samples is much smaller than the N × p data matrix.

The dual problem depends on the data only through these N × N inner products, which leads to the 'kernel' formulation.
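
A standard form of the hard-margin dual problem (stated here as an assumption, since the original slide's equations are not reproduced) depends on the training data only through the $N \times N$ matrix of pairwise inner products:

$$
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i \;-\; \tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, \langle x_i, x_j \rangle
\quad \text{subject to} \quad \alpha_i \ge 0,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0.
$$

Replacing each inner product $\langle x_i, x_j \rangle$ with a kernel $K(x_i, x_j)$ gives the kernelized SVM without ever computing coordinates in the higher-dimensional space.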

Decision Trees

A decision tree is a type of non-parametric supervised learning algorithm characterized by its hierarchical, tree-like structure. The tree
comprises various components, including a root node, branches, internal nodes, and leaf nodes.

The foundational concept of decision trees has greatly influenced classical machine learning algorithms such as Random Forests, Bagging,
and Boosted Decision Trees. The fundamental idea behind decision trees is to represent data using a tree structure. In this structure, each
internal node serves as a test on a specific attribute, essentially representing a condition. Each branch emanating from an internal node
signifies an outcome of the corresponding test, and each leaf node, also known as a terminal node, holds a class label.

This representation allows decision trees to make decisions based on a series of attribute tests, leading to a path from the root to a specific
leaf node. Decision trees are valuable in classification and regression tasks due to their interpretability, ease of understanding, and ability to
capture complex decision boundaries in the data. The tree structure is utilized to segment the feature space into regions, with each leaf node
corresponding to a particular class or regression value.

Before learning more about decision trees let’s get familiar with some of the terminologies.

Root Nodes: These nodes mark the starting point of a decision tree, initiating the division of the population based on various features.

Decision Nodes: After the root node, the resulting nodes from subsequent splits are referred to as decision nodes.

Leaf Nodes: Nodes where further division is not feasible are termed as leaf nodes or terminal nodes.

Branch/Sub-tree: Similar to a sub-graph in a graph, a subsection of a decision tree is known as a sub-tree or branch.

Pruning: This process involves selectively removing nodes to prevent overfitting and improve the generalization ability of the decision tree.
Pruning ensures a more balanced and accurate model by trimming unnecessary branches and nodes.
Why use decision trees?

Reflecting Human Thinking: Decision trees often mirror the logical reasoning inherent in human decision-making. This characteristic makes
them intuitive and straightforward for individuals to comprehend.

Transparent Structure: Decision trees adopt a tree-like structure, contributing to the ease of understanding the underlying logic. The clear and
visual representation simplifies the interpretation of decision-making processes within the algorithm.

Decision Tree Example

Let's explore the concept of decision trees through an example. Decision trees are employed for constructing classification or regression
models, structured in the form of a tree. This process involves breaking down a dataset into progressively smaller subsets while
simultaneously developing an associated decision tree. The outcome is a tree structure comprising decision nodes and leaf nodes.

A decision node (e.g., "Outlook") features two or more branches (e.g., "Sunny," "Overcast," and "Rainy"). Meanwhile, a leaf node (e.g.,
"Play") signifies a specific classification or decision. The highest decision node in the tree, corresponding to the most influential predictor, is
referred to as the root node. Notably, decision trees exhibit versatility by accommodating both categorical and numerical data in their
construction.
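
A minimal sketch of this example with scikit-learn follows; the tiny weather table is invented to mirror the Outlook/Play example, and the categorical features are ordinally encoded for simplicity.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up "play tennis"-style data in the spirit of the Outlook/Play example
data = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast", "Sunny", "Rainy"],
    "Windy":   ["Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes"],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"],
})

enc = OrdinalEncoder()                              # encode categories as numbers
X = enc.fit_transform(data[["Outlook", "Windy"]])
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned tree as nested if/else rules
print(export_text(tree, feature_names=["Outlook", "Windy"]))
```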

Entropy

Entropy is a measure of impurity in a set of samples; it quantifies the randomness of the class labels.

Entropy comes from information theory: the higher the entropy, the higher the information content (uncertainty).

$$H(X) = -\sum_{i} p_i \log_2 p_i$$

where X is the total sample and p_i is the probability (proportion) of category i in X.

Gini Index

The Gini index is a measure of impurity (or purity) used when building decision trees with the CART (Classification and Regression Trees) algorithm.

An attribute with a low Gini index should be preferred over an attribute with a high Gini index.

The CART algorithm uses the Gini index to create binary splits.

The Gini index can be calculated using the following formula:

$$Gini = 1 - \sum_{i} p_i^2$$

Information Gain

We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be
learned.

Information gain tells us how important a given attribute of the feature vectors is.

We will use it to decide the ordering of attributes in the nodes of a decision tree.

Information Gain = Entropy(parent) − [weighted average Entropy(children)]
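
A minimal NumPy sketch of these three quantities, with a small invented split for illustration:

```python
import numpy as np

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the categories present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini index = 1 - sum(p_i^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, children):
    """Entropy(parent) minus the weighted average entropy of the children."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

# Example: a node with 6 "yes" / 4 "no" labels split into two child nodes
parent = ["yes"] * 6 + ["no"] * 4
children = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]

print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("information gain of the split:", information_gain(parent, children))
```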

Decision Tree to Decision Rules

A decision tree can easily be transformed into a set of rules by tracing the path from the root node to each leaf node, one rule per leaf.

Pruning is a crucial process in decision tree construction aimed at optimizing the tree's structure. The primary goal is to strike a balance
between a tree that is too large, increasing the risk of overfitting, and one that is too small, potentially missing important features in the
dataset. Pruning helps reduce the size of the tree while maintaining or even improving accuracy. Two commonly used tree pruning
techniques are:

Cost Complexity Pruning: Cost Complexity Pruning involves the use of a hyperparameter known as the "cost-complexity parameter" (often
denoted as alpha). This parameter controls the trade-off between the complexity of the tree and its fit to the training data. By iteratively
pruning subtrees based on this parameter, the algorithm seeks to find the optimal balance, leading to a tree with improved generalization
performance on unseen data.
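
A minimal sketch with scikit-learn, whose `ccp_alpha` parameter implements cost complexity pruning (the breast-cancer dataset here is just a stand-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alpha values for this training set, from no pruning to full pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit with one mid-range alpha; larger alphas prune more aggressively
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)

print("alpha:", alpha)
print("leaves:", pruned.get_n_leaves())
print("test accuracy:", pruned.score(X_test, y_test))
```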

Reduced Error Pruning: Reduced Error Pruning, also known as post-pruning, is a technique where the tree is initially allowed to grow to its full
extent, capturing as much detail from the training data as possible. Subsequently, nodes are removed based on their contribution to reducing
errors on a validation dataset. This process continues until further pruning does not result in a significant improvement in performance.

These pruning methods are essential for preventing overfitting and enhancing the model's ability to generalize to new data. By carefully
removing unnecessary nodes, decision trees become more interpretable, computationally efficient, and better suited for real-world
applications.

What is Ensemble Learning?

Ensemble methods represent a robust approach to enhancing model performance by combining predictions from multiple models. Leveraging several models together typically results in better outcomes than any single model on its own.

Types of Ensemble Learning

Bagging or Bootstrap Aggregation — Random Forest

Boosting — AdaBoost, XGBoost and Gradient Boosting


Why do we use Ensemble Techniques?

These techniques help reduce variance (bagging) and bias (boosting), and improve overall predictions.


Bagging: Bagging, as exemplified by Random Forest, effectively addresses overfitting concerns and exhibits reduced training time. While
there is occasional bias increase, variance is mitigated, and the use of independent parallel classifiers contributes to its robust performance.

Boosting: Gradient Boosting, a representative of boosting techniques, may encounter overfitting issues, which can be mitigated through
parameter tuning. Boosting is adept at reducing bias and is characterized as a set of sequential classifiers, emphasizing its sequential nature
in model building.

Boosting

Boosting algorithms are a family of algorithms that combine weak learners into a strong learner.

What is the idea behind boosting algorithms?

Boosting algorithms aim to learn weak classifiers that exhibit slight correlation with the true classification. These weak classifiers are then
combined to form a strong classifier that demonstrates high correlation with the true classification.

Illustrating this concept with the mail spam detection problem involves breaking down the task into several steps:

Identifying whether the email contains a phrase such as 'you have earned a prize'.

Checking whether the email contains only an image.

Determining the sender of the email.

Assessing how caps lock was used in the email.

Examining the subject line of the email.

Each of these steps represents a weak classifier, individually insufficient for determining whether the email is spam. However, when
combined, these weak classifiers work synergistically, resulting in a robust and accurate spam detection system with a higher probability of
correctly identifying spam emails.

How does a boosting algorithm work?

Boosting algorithms operate through an iterative process, progressively learning weak classifiers and integrating them into a final strong
classifier. The inclusion of weak classifiers is typically weighted based on their accuracy. Following each iteration, the training data undergoes
reweighting, with misclassified instances gaining weight and correctly classified instances losing weight. In subsequent iterations, the focus of
the learner is primarily on instances that were previously misclassified. The distinctive characteristics of various boosting algorithms primarily
stem from the specific reweighting approaches applied to the training set.
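
A minimal sketch using scikit-learn's AdaBoostClassifier, which implements this kind of reweighting scheme; the synthetic dataset is made up for illustration, and the default weak learner is a depth-1 decision tree (a "stump"):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting round up-weights previously misclassified points and adds
# another weighted weak learner (by default, a depth-1 decision stump)
boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)

print("test accuracy:", boost.score(X_test, y_test))
```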

Bagging (Bootstrap Aggregating)

As discussed before, bagging is an ensemble technique mainly used to reduce the variance of our predictions by combining the results of multiple classifiers modelled on different sub-samples of the same dataset.

The main steps involved in bagging are:

Generating Multiple Datasets: Through sampling with replacement from the original dataset, new datasets are created.

Constructing Multiple Classifiers: Each of these smaller datasets undergoes the building of a classifier, typically using the same model across
all datasets.

Combining Classifiers: The predictions from each individual classifier are then aggregated to form an improved classifier, often characterized
by significantly reduced variance.

Bagging operates akin to a "Divide and Conquer" strategy, involving a collection of predictive models executed on diverse subsets derived
from the original dataset. These models are subsequently amalgamated to enhance accuracy and promote model stability.
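
A minimal sketch of these steps using scikit-learn (with a synthetic dataset invented for illustration), comparing a single tree against bagged trees and a random forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging trains each tree on a bootstrap sample and aggregates the predictions;
# a random forest additionally randomizes the features considered at each split
models = [
    ("single tree", DecisionTreeClassifier(random_state=0)),
    ("bagged trees", BaggingClassifier(n_estimators=100, random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

for name, model in models:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```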

Gradient Boosting

Gradient Boosting is an ensemble learning technique that works by combining the predictions of multiple weak learners, typically decision trees, to create a strong predictive model. The key idea behind Gradient Boosting is to iteratively improve the model by fitting a weak learner to the residual errors of the existing model.

How does Gradient Boosting work? (Iterative Corrections)

Compute the Target Column Average:

Start by determining the average of the target column to establish an initial prediction.

Residual Calculation:

Calculate residuals by comparing actual and predicted values; the residuals are the differences between them.

Model Training with Residuals:

Train a model using the residuals as the target variable, aiming to capture patterns not considered in the initial prediction.

Update Predictions:

Adjust the current prediction by incorporating the predictions obtained from the newly trained model.

Optimize the Loss Function:

Enhance the model by optimizing the loss function of the previous learner, contributing to a refined and more accurate predictive model.
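
These iterative corrections can be sketched from scratch for a regression target; this is an illustrative toy implementation (made-up data, small regression trees as weak learners), not a production one:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)   # toy regression data

learning_rate, n_rounds = 0.1, 100

# Step 1: the initial prediction is the average of the target column
prediction = np.full_like(y, y.mean())
trees = []

for _ in range(n_rounds):
    residuals = y - prediction                      # Step 2: residuals
    tree = DecisionTreeRegressor(max_depth=2)       # Step 3: fit a weak learner to them
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # Step 4: update the running prediction
    trees.append(tree)

# Each round takes a small step that reduces the squared-error loss
print("training MSE:", float(np.mean((y - prediction) ** 2)))
```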
XGBoost

XGBoost, short for eXtreme Gradient Boosting, is an optimized and efficient implementation of the gradient boosting algorithm. It has gained popularity in machine learning competitions and various applications due to its speed and performance. XGBoost is designed to be highly scalable and provides high predictive accuracy.

Here are the key components and aspects of XGBoost:

Gradient Boosting Framework:

XGBoost belongs to the family of ensemble learning methods, specifically the gradient boosting framework. It builds a series of weak learners (typically decision trees) sequentially, with each subsequent learner correcting the errors of the previous ones.

Objective Function:

XGBoost uses a customizable objective function that combines a loss function and a regularization term. The objective function is optimized during the training process, ensuring a robust and accurate model.

Regularization:

XGBoost incorporates L1 (Lasso) and L2 (Ridge) regularization terms into its objective function to prevent overfitting and improve the generalization of the model.

Tree Pruning:

XGBoost employs a technique called "pruning" during the construction of decision trees. Pruning helps avoid the growth of deep, overfit trees, contributing to the overall efficiency and interpretability of the model.

Parallel and Distributed Computing:

XGBoost is designed to take advantage of parallel and distributed computing capabilities. It can efficiently handle large datasets and speed up the training process by utilizing multiple cores or distributed computing environments.

Handling Missing Values:

XGBoost has a built-in mechanism for handling missing values in the dataset. It automatically learns the best way to route missing values during the training process.

Feature Importance:

XGBoost provides feature importance scores, allowing users to interpret the contribution of each feature to the model's predictions. This information aids in feature selection and in understanding the model's behavior.

Cross-Validation:

XGBoost facilitates the use of cross-validation during the training process, enabling robust model evaluation and hyperparameter tuning.

XGBoost's efficiency, scalability, and effectiveness make it a popular choice for various machine learning tasks, including classification, regression, and ranking problems.
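
A minimal sketch of XGBoost's scikit-learn-style interface follows; it assumes the `xgboost` Python package is installed, and the dataset and hyperparameter values are chosen only for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # assumes the xgboost package is installed

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,      # number of boosted trees
    max_depth=4,           # depth of each tree
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    reg_lambda=1.0,        # L2 (Ridge) regularization term
    reg_alpha=0.0,         # L1 (Lasso) regularization term
    random_state=0,
)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
print("top feature importances:", model.feature_importances_[:5])
```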

Credits

https://www.kaggle.com/
https://medium.com/
https://www.wikipedia.org/

