Unit – 3
Introduction to AI
Artificial Intelligence (AI) refers to the ability of machines or software systems to mimic human
intelligence and perform tasks that would typically require human cognition. These tasks can
include reasoning, decision-making, learning, problem-solving, speech recognition, and visual
perception.
2. Greedy Algorithms
Concept: A greedy algorithm makes the best local choice at each step with the hope that
these local choices will lead to a globally optimal solution.
Key Idea: The greedy approach does not reconsider its choices, assuming that local
optima lead to the global optimum.
Steps:
1. Make the optimal choice at the current step.
2. Proceed to the next step with the decision made.
Time Complexity: Typically linear, or O(n log n) when the candidates must first be sorted,
depending on the problem and how the greedy choices are made.
Examples:
o Activity Selection Problem: Select the maximum number of non-overlapping activities
by repeatedly choosing the one with the earliest finish time (a short sketch follows at the
end of this section).
o Huffman Coding: Used in data compression; the greedy algorithm repeatedly combines
the two nodes with the smallest frequencies into a new node.
o Dijkstra’s Algorithm: Finds the shortest paths in a weighted graph with non-negative
edge weights by repeatedly choosing the unvisited vertex with the smallest tentative distance.
Advantages:
o Fast and simple.
o Can solve many problems efficiently when the greedy choice property holds.
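As an illustration of the greedy choice, here is a minimal Python sketch of the Activity Selection
Problem described above. The function name and the (start, finish) tuple format are assumptions
made for this example; the algorithm sorts by finish time and keeps every activity that starts no
earlier than the last selected activity ends.

```python
def select_activities(activities):
    """Greedy activity selection: pick a maximum set of non-overlapping
    activities by always taking the one with the earliest finish time
    among the remaining candidates."""
    # Sort by finish time -- this is the greedy criterion.
    activities = sorted(activities, key=lambda a: a[1])
    selected = []
    last_finish = float("-inf")
    for start, finish in activities:
        # Keep the activity only if it does not overlap the last one chosen.
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

# Example with (start, finish) pairs:
print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (5, 9), (8, 9)]))
# -> [(1, 4), (5, 7), (8, 9)]
```

Note that the initial sort dominates the running time (O(n log n)); the greedy pass itself is linear.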
4. Gradient Descent
Concept: Gradient descent is an optimization algorithm used to minimize a function by
iteratively moving towards the minimum of the function in the direction of the negative
gradient (the steepest descent).
Key Idea: By taking steps proportional to the negative of the gradient, you can gradually
converge to a minimum (or a local minimum) of the function.
Steps:
1. Start with an initial guess for the parameters.
2. Compute the gradient (partial derivatives) of the loss function with respect to each
parameter.
3. Update the parameters by moving in the direction opposite to the gradient.
4. Repeat until convergence (i.e., the change in the loss function is very small).
Time Complexity: The number of iterations is key. Each iteration involves computing
the gradient and updating the parameters, which typically costs O(n) work per iteration
over n training examples.
Examples:
o Neural Networks: Used in backpropagation to minimize the error between
predicted and actual outputs.
o Linear Regression: Minimizing the sum of squared errors between the predicted
and actual outputs.
o Logistic Regression: Minimizing the cross-entropy loss between predicted
probabilities and true labels.
Advantages:
o Widely used in machine learning, especially for training models.
o Can handle large datasets efficiently.
o Works well for many optimization problems, especially when the exact solution is
hard to compute directly.
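The linear regression example above can be written out directly. The following is a minimal sketch
of batch gradient descent minimizing the mean squared error of a one-variable linear model
y ≈ w·x + b; the learning rate, the number of iterations, and the toy data are assumptions chosen
only for illustration.

```python
import numpy as np

def gradient_descent_linreg(x, y, lr=0.05, n_iters=1000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0                      # step 1: initial guess for the parameters
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y          # prediction error for every example
        # Step 2: gradient of MSE = (1/n) * sum(error**2) w.r.t. w and b.
        grad_w = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * error.sum()
        # Step 3: move in the direction opposite to the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1; the fit should recover w ≈ 3, b ≈ 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 1.0
print(gradient_descent_linreg(x, y))
```

Here the gradient uses the whole dataset at each step (batch gradient descent); stochastic and
mini-batch variants would compute it on a subset of the data instead.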
Perceptron
The Perceptron is the simplest form of a neural network model and can be considered the
building block for more complex neural networks.
Key Features:
Single Layer: It consists of a single layer of neurons (also called units or nodes), which
perform a binary classification task.
Inputs and Weights: The perceptron receives inputs, each of which is multiplied by a
corresponding weight.
Activation Function: The weighted sum of the inputs is passed through an activation
function, typically a step function, which determines the output (0 or 1).
Formula:
Let’s define:
x1, x2…, xn as the input features.
w1, w2…, wn as the weights.
b as the bias term.
y as the output.
The perceptron computes:
y = f( ∑ (i=1 to n) wi·xi + b )
where f is the activation function. For a basic perceptron, f is usually a step function:
f(z) = 1 if z > 0
f(z) = 0 if z ≤ 0
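For example (with values chosen only for illustration): if x1 = 1, x2 = 0, w1 = 0.5, w2 = −0.3, and
b = −0.2, the weighted sum is 0.5·1 + (−0.3)·0 + (−0.2) = 0.3. Since 0.3 > 0, the step function
outputs y = 1.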
Learning Rule (Training):
The weights w1, w2,…,wn and the bias b are updated during training using a simple rule:
wi ← wi + Δwi, where Δwi = η (ytrue − ypred) · xi
η is the learning rate, ytrue is the true label, and ypred is the predicted output.
The bias is updated with the same rule: b ← b + η (ytrue − ypred).
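The formula and learning rule above translate almost directly into code. Below is a minimal sketch
of a perceptron trained with that update rule; the function names, the learning rate, and the
AND-gate training data are assumptions used only for illustration.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def predict(x, w, b):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(data, n_features, lr=0.1, epochs=20):
    """Perceptron learning rule: w_i <- w_i + lr * (y_true - y_pred) * x_i."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y_true in data:
            y_pred = predict(x, w, b)
            update = lr * (y_true - y_pred)   # zero when the prediction is correct
            w = [wi + update * xi for wi, xi in zip(w, x)]
            b += update                       # bias updated with the same rule
    return w, b

# AND gate: linearly separable, so a single perceptron can learn it.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data, n_features=2)
print([predict(x, w, b) for x, _ in data])   # expected: [0, 0, 0, 1]
```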
Limitations:
Linearly Separable Problems: Perceptrons can only solve problems where the data is
linearly separable (i.e., it can be divided by a straight line or hyperplane); for example, a
single perceptron cannot learn the XOR function.
No Hidden Layers: A perceptron is very limited due to its single-layer structure, making
it unable to model complex, nonlinear relationships.
Advantages of a Feedforward Network (FFN)
Simple Architecture: Easy to implement and understand.
Flexibility: Can be used for a variety of tasks, including classification and regression.
Backpropagation
Backpropagation is the key algorithm used for training neural networks, including MLPs and
FFNs. It allows the network to adjust the weights based on the error between the predicted
output and the actual target.
Key Idea:
Goal: Adjust the weights so that the loss function, i.e., the error between the predicted
and the actual outputs, is minimized.
How It Works: Backpropagation computes the gradient of the loss function with
respect to each weight in the network by applying the chain rule of calculus. This allows
us to perform gradient descent and update the weights efficiently.
Steps in Backpropagation:
1. Forward Pass: Compute the output of the network for a given input by passing it
through all the layers.
2. Compute the Loss: Calculate the loss (or error) by comparing the predicted output with
the true label using a loss function (e.g., mean squared error for regression or cross-
entropy for classification).
3. Backward Pass:
o Compute the gradient of the loss function with respect to the output layer.
o Using the chain rule, propagate the gradient back through the network, layer by
layer, adjusting the weights.
o Update the weights using the gradient descent update rule: w = w – η ⋅ ∂L/∂w
where L is the loss function and η is the learning rate.
4. Repeat: Continue this process iteratively across the entire dataset, updating the weights
after each mini-batch or epoch until convergence.
Key Components:
Loss Function: Measures the difference between the predicted output and the true output.
Common choices are Mean Squared Error (MSE) for regression tasks and Cross-
Entropy for classification tasks.
Gradient Descent: Used to update the weights based on the gradients computed during
backpropagation.
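To tie the steps together, here is a minimal sketch of backpropagation for a tiny fully connected
network with one sigmoid hidden layer and a mean squared error loss. The network size, learning
rate, toy OR-gate data, and number of epochs are assumptions chosen purely for illustration; real
systems would normally use a library such as PyTorch or TensorFlow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 2 hidden units (sigmoid) -> 1 output (sigmoid).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

# Toy dataset (OR gate) and learning rate, chosen only for illustration.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
lr = 0.5

for epoch in range(2000):
    # 1. Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    y_pred = sigmoid(h @ W2 + b2)
    # 2. Loss: mean squared error between prediction and target.
    loss = np.mean((y_pred - y) ** 2)
    # 3. Backward pass: apply the chain rule layer by layer.
    d_out = 2 * (y_pred - y) / len(X) * y_pred * (1 - y_pred)  # gradient at the output pre-activation
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * h * (1 - h)                    # propagate the gradient to the hidden layer
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    # 4. Gradient descent update: w <- w - lr * dL/dw
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("final loss:", loss)  # should be far below the initial loss
print("predictions:", sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```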