
Machine Learning
Module 4
Bayesian Learning
Introduction
Probability-based learning is a crucial practical learning approach that combines
prior knowledge or prior probabilities with newly observed data to make decisions
or predictions
It relies heavily on probability theory, which helps in modeling randomness,
uncertainty, and noise when predicting future events
1. It helps describe how random events occur and how uncertainty can be
mathematically captured
2. Especially useful when dealing with massive data, as it models and predicts
hidden quantities
Probabilistic vs Deterministic Models
A probabilistic model does not give a single solution; instead, it gives a probability distribution
over possible outcomes
A deterministic model, given the same initial conditions, will always produce the same output
Introduction - Bayesian Learning
While probabilistic learning generally models uncertainty using observed data,
Bayesian learning goes a step further by using subjective probabilities
Personal belief or interpretation about the likelihood of an event
These beliefs can change over time as more data is observed
Two major algorithms under Bayesian learning
1. Naïve Bayes Learning
2. Bayesian Belief Networks (BBN)
Fundamentals of Bayes Theorem
The prior probability reflects what we already know about an event before any new
observations are made
It’s our starting point. It represents the initial degree of belief in a hypothesis
before seeing the new evidence.
Likelihood measures how probable the observed evidence is, assuming that the
hypothesis is true.
It’s denoted as P(Evidence|Hypothesis) — the probability of the observed
evidence given that the hypothesis holds.
It helps us evaluate how well the hypothesis explains the new data
Posterior probability is the updated probability of the hypothesis after observing
the new evidence
It’s denoted as P(Hypothesis|Evidence) — the probability that the hypothesis is
true given the observed evidence.
Classification Using Bayes Model
Naïve Bayes Classification models are based fundamentally on Bayes Theorem
Bayes' rule is a mathematical formula used to compute the posterior probability of
a hypothesis, given prior information and new evidence

Bayes' rule: P(Hypothesis h | Evidence E) = [P(Evidence E | Hypothesis h) × P(Hypothesis h)] / P(Evidence E)

P(Hypothesis h | Evidence E) = Posterior probability
P(Evidence E | Hypothesis h) = Likelihood
P(Hypothesis h) = Prior probability
P(Evidence E) = Marginal probability (overall probability of the evidence under all hypotheses)
Maximum A Posteriori (MAP) Hypothesis: hMAP = argmax over hypotheses h of [P(E | h) × P(h)]
Maximum Likelihood (ML) Hypothesis: hML = argmax over hypotheses h of P(E | h)

P(Boy participates in tournament) = 90% = 0.9
P(He is sick | Boy participates in tournament) = 20% = 0.2
P(He is sick) = 40% = 0.4
P(Boy participates in tournament | He is sick) = ?

By Bayes' rule:
P(Boy participates | He is sick) = (0.2 × 0.9) / 0.4 = 0.45

Naïve Bayes Algorithm
Naïve Bayes is based on Bayes Theorem
It uses probability to make decisions: given some features, what is the probability that
the data belongs to a particular class
The algorithm assumes that all features are independent of each other given the
class label
In reality, this assumption may not always be true (hence the term "naïve")
Naïve Bayes Algorithm
Step 1: Frequency Matrix and Prior Probability

Step 2: Frequency Matrix and Likelihood probability for each of the feature
2a: CGPA:
Naïve Bayes Algorithm
2b: Interactiveness

2c: Practical Knowledge


Naïve Bayes Algorithm
2d: Communication Skills

Given the test data (CGPA ≥ 9, Interactiveness = Yes, Practical Knowledge = Average,
Communication Skills = Good), apply Bayes' theorem to determine whether the
student will receive a job offer or not.
Naïve Bayes Algorithm
Step 3: Calculate the probability of each hypothesis

Compute for Job Offer = Yes

Compute for Job Offer = No

Final Prediction: Job Offer = Yes
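The frequency and likelihood tables for this example live in slide images that are not reproduced here, so the following is only a minimal Python sketch of the same procedure: it estimates the prior and the per-feature conditional probabilities from a small hypothetical training set (the rows and counts are illustrative, not the original table) and then scores the given test instance for each class.

```python
from collections import Counter, defaultdict

# Hypothetical training data (the original frequency tables are in the slide images);
# each row: (CGPA band, Interactiveness, Practical Knowledge, Communication Skills, Job Offer)
data = [
    (">=9", "Yes", "Very good", "Good",     "Yes"),
    (">=8", "No",  "Good",      "Moderate", "Yes"),
    (">=9", "No",  "Average",   "Poor",     "No"),
    ("<8",  "No",  "Average",   "Good",     "No"),
    (">=8", "Yes", "Good",      "Moderate", "Yes"),
    (">=9", "Yes", "Good",      "Moderate", "Yes"),
    ("<8",  "Yes", "Good",      "Poor",     "No"),
    (">=9", "No",  "Very good", "Good",     "Yes"),
    (">=8", "Yes", "Good",      "Good",     "Yes"),
    (">=8", "Yes", "Average",   "Good",     "Yes"),
]

features = ["CGPA", "Interactiveness", "Practical Knowledge", "Communication Skills"]
class_counts = Counter(row[-1] for row in data)          # counts for the prior P(class)
cond_counts = defaultdict(Counter)                       # counts for P(feature value | class)
for row in data:
    label = row[-1]
    for i, name in enumerate(features):
        cond_counts[(name, label)][row[i]] += 1

def score(instance, label):
    """P(class) multiplied by the product of P(feature value | class), no smoothing."""
    p = class_counts[label] / len(data)
    for name in features:
        p *= cond_counts[(name, label)][instance[name]] / class_counts[label]
    return p

test = {"CGPA": ">=9", "Interactiveness": "Yes",
        "Practical Knowledge": "Average", "Communication Skills": "Good"}
scores = {label: score(test, label) for label in class_counts}
print(scores, "->", max(scores, key=scores.get))
```

With this toy data the larger product again falls on Job Offer = Yes, but the numbers are not the slide's numbers; only the procedure matches.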


Zero Probability Error in Naïve Bayes
The test data given is: CGPA ≥ 8, Interactiveness = Yes, Practical Knowledge =
Average, Communication Skills = Good
Step 1: Computing for 'Job Offer = Yes'

Step 1: Computing for 'Job Offer = No'

Since one of the probabilities is zero, the entire multiplication becomes zero
Laplace Correction (Smoothing) is suggested
Add 1 to each count (even if it was zero)
Adjust the denominator accordingly (to account for the added counts)
This ensures no attribute probability is exactly zero, allowing the model to make
better predictions even for unseen combinations, as illustrated in the sketch below
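A minimal sketch of the add-one (Laplace) correction for a single categorical likelihood; the counts here are hypothetical and only illustrate how a zero count stops zeroing out the whole product.

```python
def laplace_likelihood(value_count, class_count, n_distinct_values):
    """Add-one smoothed estimate of P(feature value | class)."""
    return (value_count + 1) / (class_count + n_distinct_values)

# Hypothetical counts: the value "CGPA >= 8" was never seen with class "Job Offer = No"
raw = 0 / 4                                # unsmoothed estimate -> 0, which kills the product
smoothed = laplace_likelihood(0, 4, 3)     # 3 distinct CGPA bands assumed
print(raw, smoothed)                       # 0.0 followed by roughly 0.1429
```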
Bayes Optimal Classifier
It is a probabilistic model that uses Bayes' theorem to find the best possible
classification for a new data point
Instead of choosing just the single most probable hypothesis, it combines all
hypotheses based on their posterior probabilities
It aggregates the predictions of all hypotheses instead of committing early to one
hypothesis
Bayes Optimal Classifier

Step 1: MAP Classifier (Maximum A Posteriori): P(h1 | T) = 0.3 (maximum among all hypotheses)


MAP predicts: COVID Negative
Step 2: Bayes Optimal Classifier
Bayes Optimal does not pick just one hypothesis — it combines all hypotheses.
Step 3: Calculate total probability for COVID Negative
Bayes Optimal Classifier

Step 4: Calculate total probability for COVID Positive

Bayes Optimal Classifier predicts: COVID Positive
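The full posterior table for this slide is not reproduced here, so the sketch below uses hypothetical posteriors for five hypotheses (only P(h1 | T) = 0.3 is taken from the slide) to show how the Bayes optimal decision sums posterior mass per predicted class instead of committing to the single MAP hypothesis. As in the COVID example, the MAP choice and the combined choice can disagree.

```python
# Hypothetical posteriors P(h | T) and each hypothesis's prediction for the new instance
hypotheses = [
    ("h1", 0.30, "Negative"),   # MAP hypothesis: highest single posterior
    ("h2", 0.20, "Positive"),
    ("h3", 0.20, "Positive"),
    ("h4", 0.15, "Positive"),
    ("h5", 0.15, "Negative"),
]

# MAP classifier: follow only the single most probable hypothesis
map_pred = max(hypotheses, key=lambda h: h[1])[2]

# Bayes optimal classifier: sum posterior probability over all hypotheses, per predicted class
totals = {}
for _, posterior, prediction in hypotheses:
    totals[prediction] = totals.get(prediction, 0.0) + posterior
optimal_pred = max(totals, key=totals.get)

print("MAP prediction:", map_pred)            # Negative (h1 alone)
print("Class totals:", totals)                # {'Negative': 0.45, 'Positive': 0.55}
print("Bayes optimal prediction:", optimal_pred)   # Positive
```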


Gibbs Algorithm
Bayes Optimal Classifier computes the posterior probability for all hypotheses in
the hypothesis space
This process is computationally very expensive when there are many hypotheses.
Gibbs algorithm is a sampling-based method
It randomly selects one hypothesis according to the posterior probability
distribution
Then it uses only that one selected hypothesis to classify the new instance
It can be shown that the expected prediction error of the Gibbs algorithm is at most twice that
of the Bayes Optimal classifier
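A minimal sketch of the Gibbs step, reusing the hypothetical posteriors from the previous sketch: one hypothesis is drawn at random in proportion to its posterior and then used on its own to classify the new instance.

```python
import random

# Hypothetical posteriors P(h | T) and per-hypothesis predictions (same values as the sketch above)
posteriors = {"h1": 0.30, "h2": 0.20, "h3": 0.20, "h4": 0.15, "h5": 0.15}
predictions = {"h1": "Negative", "h2": "Positive", "h3": "Positive",
               "h4": "Positive", "h5": "Negative"}

def gibbs_classify(rng=random):
    # Sample a single hypothesis with probability proportional to its posterior ...
    names = list(posteriors)
    weights = [posteriors[n] for n in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    # ... and use only that one hypothesis to classify
    return predictions[chosen]

print(gibbs_classify())
```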
Naïve Bayes Algorithm For Continuous Attributes
Traditional Naïve Bayes works best when features are discrete
When handling continuous data, Naïve Bayes needs special techniques to work
properly
1. Discretize Continuous Features
2. Apply a Normal (Gaussian) Distribution
In Gaussian Naïve Bayes, each continuous feature is assumed to come from a
Gaussian (Normal) distribution
So, for each feature in each class, calculate: Mean (μ), the center of the
distribution, and Variance (σ²), the spread or width of the distribution.
Compute the probability density of the feature values using the Gaussian formula:
P(x | class) = (1 / √(2πσ²)) × exp(-(x - μ)² / (2σ²))
Naïve Bayes Algorithm For Continuous Attributes
Step 1: Frequency Matrix and Prior Probability

Step 2a: Mean and Standard Deviation Computation for CGPA


Naïve Bayes Algorithm For Continuous Attributes
Step 2b: Consider Feature Interactiveness

Test data: CGPA = 8.5, Interactiveness = Yes


Step 3: Calculate the probability of each hypothesis
Naïve Bayes Algorithm For Continuous Attributes
Test data: CGPA = 8.5, Interactiveness = Yes
Step 3: Calculate the probability of each hypothesis

Since 0.297 > 0.0369, the student is classified as 'Job Offer = Yes'
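The class-wise mean and standard deviation for CGPA, and the Interactiveness likelihoods, are in tables shown as slide images, so the sketch below uses hypothetical values for those parameters and only demonstrates how the Gaussian density is combined with the prior and the categorical likelihood for the test instance (CGPA = 8.5, Interactiveness = Yes).

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density N(x; mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical per-class parameters (the original values are in the slide tables)
params = {
    "Job Offer = Yes": {"prior": 0.7, "cgpa_mu": 8.6, "cgpa_sigma": 0.5, "p_interactive_yes": 5 / 7},
    "Job Offer = No":  {"prior": 0.3, "cgpa_mu": 7.0, "cgpa_sigma": 0.6, "p_interactive_yes": 1 / 3},
}

cgpa, interactive = 8.5, "Yes"
scores = {}
for label, p in params.items():
    likelihood_cgpa = gaussian_pdf(cgpa, p["cgpa_mu"], p["cgpa_sigma"])     # continuous feature
    likelihood_inter = p["p_interactive_yes"] if interactive == "Yes" else 1 - p["p_interactive_yes"]
    scores[label] = p["prior"] * likelihood_cgpa * likelihood_inter

print(scores, "->", max(scores, key=scores.get))
```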


Module 4
Artificial Neural Network
Artificial Neural Networks
Artificial Neural Networks (ANNs) are inspired by the structure and functioning of
the human brain
The human brain consists of billions of neurons — small, interconnected processing
units
These neurons are connected in the form of a network, which behaves much like a
directed graph
Neurons receive information, process it, and then transmit the result to other
neurons. This is how humans learn and adapt
ANN mimics the brain’s way of solving complex and non-linear problems
Each neuron in an ANN is called a node or computing unit, capable of performing
computations
An ANN consists of many such units that work together in parallel, learning patterns
and relationships from data
These units are organized into layers: Input Layer, Hidden Layer, and Output Layer
Artificial Neural Networks
Central Nervous System (CNS): comprises the brain and spinal cord
Peripheral Nervous System (PNS): includes all neurons outside the CNS
Types of Neurons
1. Sensory Neurons
2. Motor Neurons
3. Interneurons
Four Key Parts of a Neuron
1. Dendrites: Collect electrical signals and carry them
toward the cell body
2. Soma (Cell Body): Integrates the incoming signals. If the combined signal is strong
enough, it triggers a response
3. Axon: Transmits processed signals away from the cell body to other neurons or
muscles
4. Synapse: The point where the axon of one neuron connects to the dendrite of another
neuron
Artificial Neurons
The Two-Step Process
1. Receives weighted inputs from other neurons
2. Applies a threshold function (activation function)

The neuron first computes the weighted sum of its inputs: x = w₁x₁ + w₂x₂ + ... + wₙxₙ
This sum is passed to an activation function f(x)
This function decides whether the neuron fires or not

The McCulloch & Pitts Neuron Model is a simple mathematical abstraction of a biological
neuron that was designed to work with binary (Boolean) inputs and outputs
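A minimal sketch of the two-step artificial neuron described above: a weighted sum followed by a threshold (step) activation, in the McCulloch-Pitts style with binary inputs; the weights and threshold are arbitrary illustrative values chosen so the unit behaves like an AND gate.

```python
def neuron(inputs, weights, threshold):
    """Step 1: weighted sum of the inputs. Step 2: threshold (step) activation."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# With weights (1, 1) and threshold 2, the unit fires only when both binary inputs are 1 (AND)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", neuron((x1, x2), (1, 1), threshold=2))
```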
Artificial Neural Networks Structure
Input Layer: this is where the data enters the network
Hidden Layer: contains neurons (nodes) that process inputs from the input layer
A summation function computes the weighted sum of inputs
An activation function (e.g., ReLU, sigmoid) then decides the output of the neuron
Output Layer: produces the final output of the network
Activation Functions
An activation function decides whether a neuron should be activated or not
Introduces non-linearity into the network
Controls the firing behavior of a neuron
Normalizes output
1. Identity / Linear Function
Output is directly proportional to input
Used when no thresholding is needed
Limitation: No non-linearity introduced
2. Binary Step Function
Simple thresholding function
3. Bipolar Step Function
Like the binary step, but with output
range: -1 to +1
Useful in bipolar logic systems
Activation Functions
4. Sigmoid (Logistic) Function
S-shaped (sigmoid curve)
Smooth and continuous
Issue: Vanishing gradient problem
5. Bipolar Sigmoid Function
Like sigmoid, but range is (-1, +1)
Offers bipolar output for symmetrical learning
6. Ramp Function
Piecewise linear function
Smooth transition between 0 and 1
7. Tanh (Hyperbolic Tangent Function)
Scaled version of sigmoid
Centered around zero – better for convergence
Activation Functions
8. ReLU (Rectified Linear Unit)
Most commonly used in deep learning
Fast and reduces vanishing gradient problem
Outputs 0 for negatives, linear for positives
9. Softmax Function
Used in multi-class classification in the output layer
Converts raw scores (logits) into probabilities
Output values sum to 1
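The exact formulas for these activation functions appear in the slide images, so the sketch below gives one common NumPy form for each; the ramp function is written as a clamp of the input to [0, 1], which is one usual convention and is assumed here.

```python
import numpy as np

def identity(x):        return x                                       # output proportional to input
def binary_step(x):     return np.where(x >= 0, 1, 0)                  # simple thresholding
def bipolar_step(x):    return np.where(x >= 0, 1, -1)                 # output in {-1, +1}
def sigmoid(x):         return 1.0 / (1.0 + np.exp(-x))                # S-shaped, range (0, 1)
def bipolar_sigmoid(x): return 2.0 / (1.0 + np.exp(-x)) - 1.0          # range (-1, +1)
def ramp(x):            return np.clip(x, 0.0, 1.0)                    # piecewise linear between 0 and 1
def tanh(x):            return np.tanh(x)                              # zero-centred, range (-1, +1)
def relu(x):            return np.maximum(0.0, x)                      # 0 for negatives, linear for positives

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1 (multi-class output layer)."""
    shifted = logits - np.max(logits)      # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), sigmoid(x), softmax(x))
```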
Perceptron and Learning
The Perceptron is a type of linear binary classifier designed for supervised learning, introduced
by Frank Rosenblatt in 1958
It was based on the McCulloch-Pitts model but enhanced with
Variable weights for each input
An extra constant input called the bias, which helps shift the decision boundary
1. Inputs from other neurons
2. Weights and Bias
The purpose of weights is to give importance to features;
bias allows flexibility in the decision boundary
3. Net-sum Calculation
The weighted sum of the inputs plus the bias is computed: net = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
(equivalently, a fixed threshold θ can be subtracted from the weighted sum, as in the example below)
4. Activation Function
The binary step function is used

The basic perceptron can only solve linearly separable problems


Perceptron and Learning

Epoch 1: Initial Weights (w₁ = 0.3, w₂ = -0.2)


Perceptron and Learning

Epoch 2: Weights (w₁ = 0.5, w₂ = 0)


Epoch x₁ x₂ Ydes Yest Error w₁ w₂ Status

2 0 0 0 Step((0 × 0.5 + 0 × 0) – 0.4) = Step(–0.4) = 0 0 0.5 0 No change

2 0 1 0 Step((0 × 0.5 + 1 × 0) – 0.4) = Step(–0.4) = 0 0 0.5 0 No change

2 1 0 0 Step((1 × 0.5 + 0 × 0) – 0.4) = Step(0.1) = 0 0 0.5 0 No change

2 1 1 1 Step((1 × 0.5 + 1 × 0) – 0.4) = Step(0.1) = 0 1 0.7 0.2 Change


Perceptron and Learning

Epoch 3: Weights (w₁ = 0.7, w₂ = 0.2)

Epoch x₁ x₂ Ydes Yest Error w₁ w₂ Status


3 0 0 0 Step((0 × 0.7 + 0 × 0.2) –0.4) = Step(-0.4) = 0 0 0.7 0.2 No change

3 0 1 0 Step((0 × 0.7 + 1 × 0.2) – 0.4) = Step(-0.2) = 0 0 0.7 0.2 No change

3 1 0 0 Step((1 × 0.7 + 0 × 0.2) – 0.4) = Step(0.3) = 0 0 0.7 0.2 No change

3 1 1 1 Step((1 × 0.7 + 1 × 0.2) – 0.4) = Step(0.5) = 1 0 0.7 0.2 No change
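A minimal sketch of the perceptron learning rule for the AND function traced in the epochs above. The threshold (0.4) and learning rate (0.2) are assumed here from the weight changes shown; a standard step-at-zero activation is used, so the intermediate weights need not match the table exactly, but the loop converges to a separating line for AND.

```python
# AND training set: ((x1, x2), desired output)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2 = 0.3, -0.2        # initial weights from Epoch 1
theta, lr = 0.4, 0.2      # threshold and learning rate assumed from the weight updates above

for epoch in range(1, 20):
    errors = 0
    for (x1, x2), desired in data:
        y = 1 if (w1 * x1 + w2 * x2 - theta) >= 0 else 0   # step activation
        error = desired - y
        if error != 0:
            # Perceptron learning rule: w <- w + lr * error * x
            w1 += lr * error * x1
            w2 += lr * error * x2
            errors += 1
    print(f"Epoch {epoch}: w1={w1:.1f}, w2={w2:.1f}, errors={errors}")
    if errors == 0:        # stop once a full epoch passes with no misclassifications
        break
```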


Perceptron and Learning – XOR Problem
A perceptron is a type of artificial neuron used in machine learning. It works well when solving
linearly separable Boolean functions – AND, OR, NAND
The XOR function is not linearly separable. There is no single line
or plane that can separate the input combinations into classes of 0s
and 1s in a two-dimensional space
This limitation was famously pointed out by Marvin Minsky and
Seymour Papert in their 1969 book Perceptrons

Researchers moved towards multi-layered networks—called Multi-Layer Perceptrons (MLPs)—which
include one or more hidden layers between input and output
These hidden layers allow MLPs to model non-linear relationships like XOR
In 1974, Paul Werbos proposed the backpropagation algorithm, which enabled learning in
multi-layer networks by calculating gradients efficiently
In 1986, Rumelhart, Hinton, and Williams popularized backpropagation, leading to a
renaissance in neural network research
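A minimal NumPy sketch of a two-layer network (one hidden layer, sigmoid activations) trained with backpropagation on XOR, the problem a single perceptron cannot solve. The hidden-layer size, learning rate, and iteration count are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units, one output unit
W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros((1, 1))
lr = 1.0

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error, using sigmoid'(s) = s * (1 - s)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * (h.T @ d_out);  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 3).ravel())   # typically approaches [0, 1, 1, 0]
```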
Types of ANNs
1. Feed Forward Neural Network (FFNN)
A Feed Forward Neural Network is a layered neural model where the information flows only
in one direction:
→ from input layer → through hidden layers → to the output layer
Type                      Hidden Layers   Learning Capacity
Single-layer FFNN         None            Can solve only linearly separable problems
Multi-layer FFNN (MLP)    One or more     Can solve non-linear and more complex problems

Works well when relationships between input features are relatively straightforward


Types of ANNs
2. Fully Connected Neural Network
Every neuron in one layer is connected to every neuron in the next layer
These connections are weighted, and weights are adjusted during training to minimize
prediction errors

Powerful model due to full connectivity
Easy to implement using libraries like Keras, TensorFlow, and PyTorch
Less efficient for spatial or sequential data (like images or text) compared to CNNs or RNNs
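A minimal sketch of a fully connected (dense) network using the Keras Sequential API mentioned above; the input dimension (20 features), the layer widths, and the 3-class output are hypothetical placeholders, not values from the slides.

```python
import tensorflow as tf

# Every Dense layer connects each of its units to every unit in the previous layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                              # 20 input features (hypothetical)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),           # 3 output classes (hypothetical)
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training would adjust the connection weights to minimise prediction error, e.g.:
# model.fit(X_train, y_train, epochs=10, batch_size=32)
```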
Types of ANNs
3. Multi-Layer Perceptron (MLP)
A Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network that
contains:
One input layer
One or more hidden layers
One output layer

Can model complex functions due to hidden layers
Suitable for classification and regression tasks
Slow training compared to other models
Needs large datasets for high accuracy
Types of ANNs
4. Feedback Neural Network
A Feedback Neural Network is a type of neural network where the output is fed back into
the network
The network has memory and its state can evolve over time based on previous inputs
Suitable for problems involving sequential data or time-dependent data
Feature                    Feedforward NN             Feedback NN (RNN)
Data Flow                  One-way (input → output)   Bi-directional (with feedback)
Memory of Past Inputs      No                         Yes (remembers previous state)
Suitable for Time Series   No                         Yes
Simplicity                 More straightforward       More complex due to loops
Training Complexity        Less                       More (requires techniques like BPTT)
Popular Applications of Artificial Neural Networks
Artificial Neural Networks (ANNs) have become essential tools for solving complex problems
that involve non-linear and dynamic data patterns
They are particularly effective in situations involving incomplete, noisy, or ambiguous data,
mimicking human-like learning and decision-making capabilities
1. ANNs play a crucial role in real-time operations such as face recognition, emotion
detection, autonomous vehicles, navigation and routing systems, target tracking, and vehicle
scheduling
2. They are used for stock market analysis, sales forecasting, modeling customer behavior, and
conducting market research to support strategic business decisions
3. Banking Applications include credit scoring, loan approval forecasting, fraud detection, risk
assessment, currency price prediction, and real estate valuation
4. ANNs support personalized learning environments through adaptive educational software
and help in predicting student performance and learning outcomes
5. In the medical field, ANNs assist in diagnosing diseases, interpreting symptoms,
recognizing medical patterns in imaging, and supporting drug discovery processes
6. Neural networks are widely adopted in robotics, aerospace, electronic circuit design,
communication systems, chemical process control, manufacturing, and food technology for
modeling and automation
Advantages of Artificial Neural Networks
1. ANNs can model and solve highly non-linear and complex problems that are difficult for
traditional algorithms
2. ANNs can effectively learn from data, recognize intricate patterns, and solve problems in a
manner similar to human reasoning
3. Due to their structure, ANNs can process multiple operations simultaneously, resulting in
faster computations and predictions
4. ANNs are resilient to incomplete, noisy, or imprecise data, enabling them to function even
with limited information
5. They scale well with large data volumes and tend to outperform many traditional learning
methods as the data size grows
Limitations of Artificial Neural Networks
1. Training ANNs requires powerful processors with parallel processing capabilities,
especially for large networks and multiple training epochs
2. ANNs are often criticized for their lack of interpretability. Understanding the internal
workings and representations at each layer is challenging
3. Designing and training neural networks can be highly complex and may require significant
time and expertise
4. Neural networks typically require large datasets to train effectively. Their performance may
degrade significantly on smaller datasets
5. Compared to traditional machine learning models, ANNs demand more memory, processing
power, and energy, making them costlier to run
Challenges of Artificial Neural Networks
1. One of the most demanding aspects of using ANNs is the training process
Overfitting (memorizing noise) or underfitting (failing to learn patterns) often occur when the
training dataset is not properly selected or curated
ANNs typically require extensive amounts of training data to achieve robustness and to be
viable for real-time applications
2. Difficulty in Optimizing Parameters
Determining the ideal weights and bias values in neural networks is a complex task
