Unit I
Artificial Intelligence (AI) is the general term for making computers do things that would
require intelligence if done by humans. Two major fields within AI are Machine
Learning (ML) and Neural Networks (NN). Both are subfields of Artificial Intelligence, and
each has its own methods and algorithms for solving problems.
Machine learning
Machine Learning (ML) lets computers learn from data and experience to improve their
performance on a task or decision-making process. ML relies on statistics and probability
theory for this purpose. Machine learning algorithms parse data, learn from it, and make
determinations without being explicitly programmed for each case. They are often categorized
as supervised or unsupervised. Supervised algorithms learn from labeled examples and apply what
they have learned to new data; unsupervised algorithms draw inferences from unlabeled datasets.
Machine learning algorithms are designed to find linear and non-linear relationships in a
given set of data. They do so through statistical methods that train the algorithm to
classify or predict from a dataset.
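As a small illustration of "learning a relationship from data", the sketch below fits a straight line to four made-up points using the closed-form least-squares formulas. The data, the variable names, and the choice of a linear model are assumptions for illustration only; real tasks usually need more data and richer models.

```python
# Toy supervised-learning example: fit y = w*x + b by ordinary least squares.
# The data points are invented for illustration (they follow y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares estimates of the slope w and intercept b.
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variance = sum((x - mean_x) ** 2 for x in xs)
w = covariance / variance
b = mean_y - w * mean_x

def predict(x):
    """Apply the learned linear relationship to a new input."""
    return w * x + b

print("slope:", w, "intercept:", b, "prediction for x=5:", predict(5.0))
```

The "training" here is a single formula, but the pattern is the same as in larger systems: parameters are estimated from examples and then reused on inputs that were not in the training data.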
Deep learning
Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to
deliver state-of-the-art accuracy in object detection, speech recognition and language translation.
Deep learning is a crucial technology behind driverless cars and enables the machine analysis of
large amounts of complex data — for example, recognizing the faces of people who appear in an
image or video.
Neural networks
Neural networks are inspired by biological neurons in the human brain and are composed of
layers of connected nodes called “neurons” that contain mathematical functions to process
incoming data and predict an output value. An artificial neural network learns by example, much
as humans learn from parents, teachers, and peers. A network consists of at least three layers:
an input layer, one or more hidden layers, and an output layer. Each layer contains nodes (also
known as neurons) that compute an output from their weighted inputs.
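The layer structure described above can be sketched as a forward pass in NumPy. The layer sizes (3 inputs, 4 hidden neurons, 1 output), the random weights, and the sigmoid activation are all assumptions chosen for illustration; the notes do not prescribe any of them.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden neurons, 1 output neuron.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # weights from input layer to hidden layer
b1 = np.zeros(4)               # hidden-layer biases
W2 = rng.normal(size=(4, 1))   # weights from hidden layer to output layer
b2 = np.zeros(1)               # output-layer bias

def forward(x):
    """Each neuron computes a weighted sum of its inputs, then an activation."""
    hidden = sigmoid(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

x = np.array([0.5, -1.0, 2.0])  # one example with 3 input features
y = forward(x)
print("output shape:", y.shape)
```

The weights are random here, so the output is not meaningful; training would adjust W1, b1, W2, and b2 so that the output matches the desired values.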
Working of deep learning: First, identify the actual problem and understand it well enough to
aim for the right solution; also check feasibility, that is, whether the problem is a good fit
for deep learning. Second, identify the relevant data that corresponds to the problem and
prepare it accordingly. Third, choose an appropriate deep learning algorithm. Fourth, train the
algorithm on the training dataset. Fifth, run final testing of the trained model on the dataset.
Examples:
How to recognize a square from other shapes?
...a) Does it have four sides?
...b) Is it a closed figure?
...c) Are adjacent sides perpendicular to each other?
...d) Are all sides equal?
So deep learning takes the complex task of identifying the shape and breaks it down into
simpler tasks that together solve the larger one.
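The four checks can be written out by hand, as in the sketch below. This is only an illustration of the decomposition: the function name, the input format (four vertices listed in order), and the tolerance are assumptions, and a deep network would learn comparable intermediate features from examples rather than have them coded explicitly.

```python
def is_square(points, tol=1e-9):
    """points: four (x, y) vertices listed in order around the figure."""
    # a) Four sides: four vertices joined in order give four sides.
    if len(points) != 4:
        return False

    # b) Closed figure: the last vertex is joined back to the first.
    edges = []
    for i in range(4):
        (x1, y1), (x2, y2) = points[i], points[(i + 1) % 4]
        edges.append((x2 - x1, y2 - y1))

    lengths_sq = [dx * dx + dy * dy for dx, dy in edges]
    if min(lengths_sq) <= tol:          # reject zero-length sides
        return False

    # c) Adjacent sides perpendicular: their dot product is zero.
    for i in range(4):
        (ax, ay), (bx, by) = edges[i], edges[(i + 1) % 4]
        if abs(ax * bx + ay * by) > tol:
            return False

    # d) All sides equal in length.
    return max(lengths_sq) - min(lengths_sq) <= tol

print(is_square([(0, 0), (1, 0), (1, 1), (0, 1)]))   # unit square
print(is_square([(0, 0), (2, 0), (2, 1), (0, 1)]))   # rectangle
```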
Limitations:
Learning through observations only.
The issue of biases.
Advantages :
Best-in-class performance on many problems.
Reduces need for feature engineering.
Eliminates unnecessary costs.
Identifies defects easily that are difficult to detect.
Disadvantages :
Large amount of data required.
Computationally expensive to train.
No strong theoretical foundation.
Applications :
Automatic Text Generation – A corpus of text is learned, and from this model new text is
generated word by word or character by character. The model can learn how
to spell, punctuate, and form sentences, and it may even capture the style of the corpus.
Healthcare – Helps in diagnosing various diseases and treating them.
Automatic Machine Translation – Words, phrases, or sentences in one language are
transformed into another language (deep learning is achieving top results in the areas of text
and images).
Image Recognition – Recognizes and identifies people and objects in images and helps to
understand content and context. This area is already being used in gaming, retail, tourism, etc.
Predicting Earthquakes – Teaches a computer to perform viscoelastic computations, which are
used in predicting earthquakes.
Deep learning has a wide range of applications in various fields such as computer vision, speech
recognition, natural language processing, and many more. Some of the most common
applications include:
Image and video recognition: Deep learning models are used to automatically classify images
and videos, detect objects, and identify faces. Applications include image and video search
engines, self-driving cars, and surveillance systems.
Speech recognition: Deep learning models are used to transcribe and translate speech in real-
time, which is used in voice-controlled devices, such as virtual assistants, and accessibility
technology for people with hearing impairments.
Natural Language Processing: Deep learning models are used to understand, generate and
translate human languages. Applications include machine translation, text summarization, and
sentiment analysis.
Robotics: Deep learning models are used to control robots and drones, and to improve their
ability to perceive and interact with the environment.
Healthcare: Deep learning models are used in medical imaging to detect diseases, in drug
discovery to identify new treatments, and in genomics to understand the underlying causes of
diseases.
Finance: Deep learning models are used to detect fraud, predict stock prices, and analyze
financial data.
Gaming: Deep learning models are used to create more realistic characters and environments,
and to improve the gameplay experience.
Recommender Systems: Deep learning models are used to make personalized recommendations
to users, such as product recommendations, movie recommendations, and news
recommendations.
Social Media: Deep learning models are used to identify fake news, to flag harmful content and
to filter out spam.
Autonomous systems: Deep learning models are used in self-driving cars, drones, and other
autonomous systems to make decisions based on sensor data.
(ii) CNN– Also known as a Convolutional Neural Network. It is mainly used for
image data and computer vision tasks. A real-life application is object
detection in autonomous vehicles. It combines convolutional layers with layers of ordinary
neurons. For image data it is generally more effective than a plain ANN or an RNN.
(iii) RNN–
Also known as a Recurrent Neural Network. It is used to process and interpret time
series and other sequential data. In this type of model, the output from a processing node is
fed back into nodes in the same or previous layers. The best-known type of RNN is the LSTM
(Long Short-Term Memory) network.
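The core operation of each of the two architectures above can be sketched in a few lines of NumPy. Everything below is a simplified assumption for illustration: the convolution uses no padding and a stride of 1, the recurrent layer is a plain tanh RNN rather than an LSTM, and the sizes, weights, and the hypothetical helper names conv2d and rnn_forward are made up.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def rnn_forward(sequence, W_x, W_h, b):
    """Process a sequence step by step, feeding the hidden state back in."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_x + h @ W_h + b)   # new state depends on the old state
        states.append(h)
    return np.stack(states)

# CNN side: a 4x4 image whose right half is bright, and a vertical-edge kernel.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1.0, 1.0]])
feature_map = conv2d(image, kernel)
print(feature_map[0])   # first row of the feature map: [0. 1. 0.]

# RNN side: 5 time steps with 2 features each, and 3 hidden units.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(2, 3))
W_h = rng.normal(size=(3, 3))
b = np.zeros(3)
sequence = rng.normal(size=(5, 2))
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)     # (5, 3): one hidden state per time step
```

The feature map is non-zero only where the image changes from dark to bright, which is how convolutional layers pick out local patterns. The recurrent loop reuses the same weights at every step, which is what lets an RNN handle sequences of any length.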
Now that we know the basics of neural networks, note that their learning capability is what
makes them interesting.
There are 3 types of learning in neural networks, namely:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning: As the name suggests, this type of learning is overseen by a
supervisor. It is like learning with a teacher. The training data consists of pairs, each
containing an input and the desired output. The output from the model is compared with the
desired output and an error is calculated. This error signal is sent back into the network to
adjust the weights. The adjustment is repeated until no more adjustments can be made and the
output of the model matches the desired output. Here there is feedback from the environment to
the model.
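The compare, compute error, adjust weights loop can be sketched with a single neuron trained by the classic perceptron rule. The task (logical AND), the learning rate, and the epoch limit are assumptions chosen so the example stays small; larger networks use backpropagation to send the error signal through several layers.

```python
# Training pairs: (input, desired output) for logical AND.
training_pairs = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5   # assumed value

def predict(inputs):
    """Weighted sum of the inputs followed by a step threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

for epoch in range(100):
    mistakes = 0
    for inputs, desired in training_pairs:
        error = desired - predict(inputs)      # compare with the desired output
        if error != 0:
            mistakes += 1
            for i, x in enumerate(inputs):     # send the error back to the weights
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    if mistakes == 0:                          # no more adjustments needed
        break

print([predict(inputs) for inputs, _ in training_pairs])
```

Training stops once a full pass over the data produces no errors, which mirrors the "till no more adjustments can be made" condition in the text.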
Reinforcement Learning: It combines aspects of both supervised learning and unsupervised
learning. It is like learning with a critic. Here there is no exact
feedback from the environment; rather, there is critic feedback. The critic tells how close our
solution is, and the model learns on its own based on this information. It is similar to
supervised learning in that it receives feedback from the environment, but it differs in that it
does not receive the desired output, only the critic's evaluation.
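The difference between a teacher and a critic can be illustrated with a deliberately tiny sketch. The learner below never sees the desired output; it only receives a score saying how close its guess is. The hidden target, the scoring function, and the trial-and-error search are all assumptions for illustration, and this is not a full reinforcement learning algorithm, only the feedback pattern.

```python
HIDDEN_TARGET = 7.0   # known only to the critic (hypothetical value)

def critic(guess):
    """Return a score: higher means closer. The target itself is never revealed."""
    return -abs(guess - HIDDEN_TARGET)

guess = 0.0
step = 1.0
score = critic(guess)

for _ in range(50):
    improved = False
    # Try a move in each direction and keep it only if the critic scores it higher.
    for candidate in (guess + step, guess - step):
        candidate_score = critic(candidate)
        if candidate_score > score:
            guess, score = candidate, candidate_score
            improved = True
            break
    if not improved:
        step /= 2      # neither direction helped: search more finely

print("final guess:", guess)
```

A supervised learner would be told "the answer is 7" and could compute an exact error; here the learner has to discover a good answer from scores alone.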
1. Scalars (0D tensors): A tensor that contains only one number is called a scalar
(0-dimensional tensor). In NumPy, a float32 or float64 number is a scalar tensor.
2. Vectors (1D tensors): An array of numbers is called a vector or 1D tensor. A 1D tensor has
exactly one axis.
For example, x = np.array([1,2,3,4,5,6,7,8]) has 8 entries and is hence called an 8-dimensional
vector. An 8D tensor and an 8D vector are different. An 8D vector has only one axis and 8
dimensions along that axis, whereas an 8D tensor has 8 axes and may have any number of
dimensions along each axis.
Dimensionality can denote either the number of entries along a specific axis (as in the case of our
8D vector) or the number of axes in a tensor (such as an 8D tensor).
3. Matrices (2D tensors): An array of vectors is a matrix or 2D tensor. A matrix has two axes
(rows and columns).
>>> import numpy as np
>>> x = np.array([[5, 78, 2, 34, 0],[6, 79, 3, 35, 1],[7, 80, 4, 36, 2]])
>>> x.ndim
2
The entries from the first axis are called the rows, and the entries from the second axis are called
the columns.
The data above consists of three 5D vectors stacked into a 2D tensor, that is, a tensor with 2
axes. Thus the shape of x is (3, 5), where 3 is the number of rows and 5 is the number of columns.
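The distinction between the number of axes and the number of entries along an axis can be checked directly in NumPy. In the sketch below the variable names and the all-zeros 8-axis tensor are made up for illustration; ndim reports the number of axes and shape reports the number of entries along each axis.

```python
import numpy as np

scalar = np.float32(12.0)                        # 0D tensor: a single number
vector = np.array([1, 2, 3, 4, 5, 6, 7, 8])      # 1D tensor: an 8-dimensional vector
matrix = np.array([[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]])           # 2D tensor: 3 rows, 5 columns
tensor_8d = np.zeros((2,) * 8)                   # 8D tensor: 8 axes, 2 entries along each

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor_8d", tensor_8d)]:
    print(name, "axes:", t.ndim, "shape:", t.shape)
```

The 8-dimensional vector has one axis with 8 entries, while the 8D tensor has 8 axes, which is exactly the difference the text describes.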
Optimizers are algorithms or methods used to update the parameters of the network, such as
weights and biases, to minimize the loss. In other words, optimizers solve optimization
problems by minimizing a function, which in the case of neural networks is the loss function.
Consider a continuous, smooth function f(x) = y, mapping a real number x to a new real number
y. Because the function is continuous, a small change in x can only result in a small change in
y; that’s the intuition behind continuity. Let’s say you increase x by a small amount epsilon_x:
this results in a small epsilon_y change to y:
f(x + epsilon_x) = y + epsilon_y
In addition, because the function is smooth (its curve doesn’t have any abrupt angles),
when epsilon_x is small enough, around a certain point p, it’s possible to approximate f as a
linear function of slope a, so that epsilon_y becomes a * epsilon_x:
f(x + epsilon_x) = y + a * epsilon_x
This linear approximation is valid only when x is close enough to p. The slope
a is called the derivative of f at p. If a is negative, a small increase of x around p will
result in a decrease of f(x), and if a is positive, a small increase in x will result in an
increase of f(x). Further, the absolute value of a (the magnitude of the derivative) tells you
how quickly this increase or decrease will happen.
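The linear approximation can be checked numerically. The sketch below assumes the example function f(x) = x^2 and the point p = 3, neither of which comes from the notes, and estimates the slope a with a central difference before comparing the true change in f with a * epsilon_x.

```python
def f(x):
    return x ** 2          # example smooth function (an assumption)

def numerical_derivative(func, p, h=1e-6):
    """Central-difference estimate of the slope of func at the point p."""
    return (func(p + h) - func(p - h)) / (2 * h)

p = 3.0
a = numerical_derivative(f, p)              # close to 6, since the derivative of x^2 is 2x

epsilon_x = 0.01
actual_change = f(p + epsilon_x) - f(p)     # epsilon_y
linear_estimate = a * epsilon_x             # a * epsilon_x

print("slope a:", a)
print("actual change:", actual_change)
print("linear estimate:", linear_estimate)
```

The two changes agree closely for a small epsilon_x, and the gap grows as epsilon_x gets larger, which is why the approximation is only valid near p.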
Optimizers update the parameters of a neural network, such as its weights and biases, to
minimize the loss function; some optimizers also adapt the learning rate during training. Here,
the loss function acts as a guide to the terrain, telling the optimizer whether it is moving in
the right direction to reach the bottom of the valley, the global minimum. The gradient
generalizes the slope to functions of several parameters: it is made up of partial derivatives,
each describing how the loss changes in response to a small change in one parameter. This
change in the loss tells us which step to take next to reduce the loss.
The loss function, which defines the feedback signal used for learning
The optimizer, which determines how learning proceeds
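How a loss function and an optimizer work together can be sketched with plain gradient descent on a one-parameter toy loss. The loss (w - 3)^2, the starting point, the learning rate, and the number of steps are all assumptions for illustration; practical optimizers apply the same idea to millions of parameters at once.

```python
def loss(w):
    return (w - 3.0) ** 2      # toy loss with its minimum at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)     # derivative of the loss with respect to w

w = 0.0                        # initial parameter value
learning_rate = 0.1            # step size (assumed value)
history = []

for step in range(100):
    w -= learning_rate * gradient(w)   # move against the gradient
    history.append(loss(w))

print("final w:", w)
print("first loss:", history[0], "last loss:", history[-1])
```

Each update moves w a little in the direction that lowers the loss, so the recorded losses shrink toward zero as w approaches 3.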