Topic wise Lecture Notes: Unit 4


UNIT-5

1.1 Support Vector Machine Algorithm [1]

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used
for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily place a new data point in the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the diagram below, in which two different categories are classified using a decision boundary or hyperplane:

Figure 1-1. SVM approach

Example: SVM can be understood with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be built with the SVM algorithm. We first train the model on many images of cats and dogs so that it learns their different features, and then test it on the strange creature. The SVM draws a decision boundary between the two classes (cat and dog) using the extreme cases (support vectors) of each class, and on the basis of these support vectors it classifies the new animal as a cat. Consider the diagram below:

Figure 1-2. Example

SVM algorithm can be used for Face detection, image classification, text categorization, etc.

Types of SVM

SVM can be of two types:

i. Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
ii. Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
1.1.1 Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.

We always create the hyperplane with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.

Support Vectors: The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

1.1.2 How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood with an example. Suppose we have a dataset that has two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue. Consider the image below:
Figure 1-3. Example

Since this is a 2-d space, we can separate these two classes with just a straight line. But there can be multiple lines that separate these classes. Consider the image below:

Figure 1-4. Example


Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.

Figure 1-5. Hyperplane
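The idea can be illustrated with a short script. This is a minimal sketch, assuming scikit-learn is installed; the two-feature blob data simply stands in for the green/blue example above and is not part of the original notes.

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two linearly separable clusters standing in for the "green" and "blue" tags
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# kernel="linear" fits a maximum-margin straight-line boundary (the hyperplane)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

print("Number of support vectors:", len(clf.support_vectors_))  # points that define the margin
print("Test accuracy:", clf.score(X_test, y_test))

The fitted model exposes the support vectors directly, which is a convenient way to see that only the extreme points near the boundary determine the position of the hyperplane.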

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-linear
data, we cannot draw a single straight line. Consider the below image:

Figure 1-6. Example


So to separate these data points, we need to add one more dimension. For linear data, we have
used two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be
calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:

Figure 1-7. Example

So now, SVM will divide the datasets into classes in the following way. Consider the below
image:
Figure 1-8. Example

Since we are now in 3-d space, the separating boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-d space by taking z = 1, it becomes:

Figure 1-9. Example


Hence we get a circle of radius 1 in the case of non-linear data.
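The same idea can be sketched in code. This is a minimal illustration, assuming scikit-learn is installed; make_circles generates a concentric pattern like the one described above, and the RBF kernel is shown as the usual implicit alternative to building the extra feature by hand.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by a straight line in (x, y)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the third dimension z = x^2 + y^2 described in the text
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])
print("Linear SVM on (x, y, z):", SVC(kernel="linear").fit(X3, y).score(X3, y))

# The kernel trick achieves the same separation without computing z explicitly
print("RBF-kernel SVM on (x, y):", SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y))

In practice the kernel trick (the RBF kernel above) performs this mapping implicitly, so the extra dimension never has to be constructed explicitly.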

2. Bayes’ Theorem [2]


Bayes' theorem (also known as the Bayes rule or Bayes law) is used to determine the conditional probability of event A when event B has already occurred.
The general statement of Bayes' theorem is: "The conditional probability of an event A, given the occurrence of another event B, is equal to the product of the probability of B given A and the probability of A, divided by the probability of event B," i.e.
P(A|B) = P(B|A)P(A) / P(B)

where,

● P(A) and P(B) are the probabilities of events A and B

● P(A|B) is the probability of event A when event B happens

● P(B|A) is the probability of event B when A happens

2.1 Bayes Theorem Statement


Bayes' theorem for a set of n events is defined as follows.
Let E1, E2, …, En be a set of events associated with the sample space S, in which all the events E1, E2, …, En have a non-zero probability of occurrence and form a partition of S. Let A be an event from S for which we have to find the probability. Then, according to Bayes' theorem,
P(Ei|A) = P(Ei)P(A|Ei) / ∑ P(Ek)P(A|Ek)
for k = 1, 2, 3, …., n

2.2 Bayes Theorem Formula


For any two events A and B, the formula for Bayes' theorem is given by:

P(A|B) = P(B|A)P(A) / P(B)

where,

● P(A) and P(B) are the probabilities of events A and B also P(B) is never equal to zero.

● P(A|B) is the probability of event A when event B happens

● P(B|A) is the probability of event B when A happens

2.3 Bayes Theorem Derivation


The proof of Bayes' theorem follows from the conditional probability formula:
P(Ei|A) = P(Ei ∩ A) / P(A)   ...(i)
By the multiplication rule of probability,
P(Ei ∩ A) = P(Ei) P(A|Ei)   ...(ii)
By the total probability theorem,
P(A) = ∑ P(Ek) P(A|Ek)   ...(iii)
Substituting the values of P(Ei ∩ A) and P(A) from (ii) and (iii) into (i), we get
P(Ei|A) = P(Ei) P(A|Ei) / ∑ P(Ek) P(A|Ek)
Bayes’ theorem is also known as the formula for the probability of “causes”. As we know, the
Ei‘s are a partition of the sample space S, and at any given time only one of the events Ei occurs.
Thus we conclude that the Bayes’ theorem formula gives the probability of a particular Ei, given
the event A has occurred.
Various terms used in the Bayes theorem are explained below in this article.

2.4 Terms Related to Bayes Theorem


After learning about Bayes theorem in detail, let us understand some important terms related to
the concepts we covered in formula and derivation.
Hypotheses
The events E1, E2, …, En in the sample space are called the hypotheses.
Priori Probability
Priori Probability is the initial probability of an event occurring before any new data is taken into
account. P(Ei) is the priori probability of hypothesis Ei.
Posterior Probability
Posterior Probability is the updated probability of an event after considering new information.
Probability P(Ei|A) is considered as the posterior probability of hypothesis Ei
Conditional Probability
The probability of an event A based on the occurrence of another event B is termed conditional
Probability. It is denoted as P(A|B) and represents the probability of A when event B has already
happened.
Joint Probability
When the probability of two or more events occurring together at the same time is measured, it is called the joint probability. For two events A and B, the joint probability is denoted as P(A∩B).
Random Variables
Real-valued variables whose possible values are determined by random experiments are called
random variables. The probability of finding such variables is the experimental probability.
2.5 Applications of Bayes’ Theorem
Bayesian inference is very important and has found application in various activities, including
medicine, science, philosophy, engineering, sports, law, etc., and Bayesian inference is directly
derived from Bayes’ theorem.
Example: Bayes' theorem can be used to interpret the result of a medical test by taking into account how likely a person is to have the disease in the first place and the overall accuracy of the test.
2.6 Difference between Conditional Probability and Bayes Theorem
The difference between Conditional Probability and Bayes Theorem can be understood with the
help of the table given below,

Bayes' Theorem: derived using the definition of conditional probability; it is used to find the reverse probability. Formula: P(A|B) = [P(B|A)P(A)] / P(B)

Conditional Probability: the probability of event A when event B has already occurred. Formula: P(A|B) = P(A∩B) / P(B)

2.7 Theorem of Total Probability


Let E1, E2, …, En be mutually exclusive and exhaustive events associated with a random experiment, and let E be an event that occurs with some Ei. Then
P(E) = ∑ᵢ₌₁ⁿ P(E|Ei) · P(Ei)
Proof:
Let S be the sample space. Then,
S = E1 ∪ E2 ∪ … ∪ En and Ei ∩ Ej = ∅ for i ≠ j.
E = E ∩ S
  = E ∩ (E1 ∪ E2 ∪ … ∪ En)
  = (E ∩ E1) ∪ (E ∩ E2) ∪ … ∪ (E ∩ En)
P(E) = P{(E ∩ E1) ∪ (E ∩ E2) ∪ … ∪ (E ∩ En)}
     = P(E ∩ E1) + P(E ∩ E2) + … + P(E ∩ En)   [since (E ∩ E1), (E ∩ E2), …, (E ∩ En) are pairwise disjoint]
     = P(E|E1)·P(E1) + P(E|E2)·P(E2) + … + P(E|En)·P(En)   [by the multiplication theorem]
     = ∑ᵢ₌₁ⁿ P(E|Ei) · P(Ei)

2.8 Numerical Examples of Bayes Theorem


Example 1: A person has undertaken a job. The probabilities of completion of the job on
time with and without rain are 0.44 and 0.95 respectively. If the probability that it will rain
is 0.45, then determine the probability that the job will be completed on time.
Solution:
Let E be the event that the job is completed on time, A the event that it rains, and B the event that it does not rain. We have
P(A) = 0.45,
P(no rain) = P(B) = 1 − P(A) = 1 − 0.45 = 0.55
The conditional probabilities of completing the job on time are
P(E|A) = 0.44 and P(E|B) = 0.95
Since events A and B form a partition of the sample space S, by the total probability theorem we have
P(E) = P(A) P(E|A) + P(B) P(E|B)
= 0.45 × 0.44 + 0.55 × 0.95
= 0.198 + 0.5225 = 0.7205
So, the probability that the job will be completed on time is 0.7205.

Example 2: There are three urns containing 3 white and 2 black balls; 2 white and 3 black balls; and 1 black and 4 white balls, respectively. There is an equal probability of each urn being chosen, and one ball is drawn at random from the chosen urn. What is the probability that a white ball is drawn?
Solution:
Let E1, E2, and E3 be the events of choosing the first, second, and third urn respectively. Then,
P(E1) = P(E2) = P(E3) =1/3
Let E be the event that a white ball is drawn. Then,
P(E|E1) = 3/5, P(E|E2) = 2/5, P(E|E3) = 4/5
By the theorem of total probability, we have
P(E) = P(E|E1)·P(E1) + P(E|E2)·P(E2) + P(E|E3)·P(E3)
= (3/5 × 1/3) + (2/5 × 1/3) + (4/5 × 1/3)
= 9/15 = 3/5
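Both worked examples can be verified with a few lines of plain Python; this is only a numeric check of the arithmetic above.

# Example 1: total probability of finishing the job on time
p_rain, p_no_rain = 0.45, 0.55
p_done_given_rain, p_done_given_no_rain = 0.44, 0.95
print(p_rain * p_done_given_rain + p_no_rain * p_done_given_no_rain)  # 0.7205

# Example 2: probability of drawing a white ball
priors = [1/3, 1/3, 1/3]        # P(E1), P(E2), P(E3)
likelihoods = [3/5, 2/5, 4/5]   # P(E|Ei) for each urn
p_white = sum(p * l for p, l in zip(priors, likelihoods))
print(p_white)                  # 0.6 = 3/5

# Bayes' theorem then gives the posterior that urn 3 was chosen, given a white ball
print(priors[2] * likelihoods[2] / p_white)  # (1/3 * 4/5) / (3/5) = 4/9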

3. Applications of Machine Learning for Computer Vision [3]


AI is nothing new to most readers: Siri, Alexa, and web chatbots have made AI commonplace. Yet computer vision gives AI a pair of eyes, and those eyes can be taught with machine learning.

3.1 What is machine learning?


Machine learning is the application of statistical models and algorithms to perform tasks without
the need to introduce explicit instructions. It relies on inference and pattern recognition using
existing data sets. It requires minimal assistance from programmers in making decisions.
3.2 What is computer vision?
Computer vision refers to the ability of a machine to understand images and videos. It mimics
the capability of human vision by acquiring, processing, and analyzing real-world data and
synthesizing them into useful information. It uses a camera to capture images and videos to
analyze, which can then be purposed for object recognition, motion estimation, and video
tracking.

3.3 Six applications of computer vision


Machine learning and computer vision are often used together to effectively acquire, analyze, and interpret captured visual data. Here are six applications of these technologies in the marketplace.

3.3.1. Automotive
Self-driving cars are slowly making their way into the market, with more companies looking for innovative ways to bring more electric vehicles onto the road. Computer vision technology helps these self-driving vehicles "see" the environment, while machine learning algorithms create the "brains" that help the computer vision system interpret the objects around the car.

Self-driving cars are equipped with multiple cameras to provide a complete 360-degree view of
the environment within a range of hundreds of meters. Tesla cars, for instance, use up to eight surround cameras to achieve this feat. Twelve ultrasonic sensors for detecting hard and soft objects on the road, and a forward-facing radar that can detect other vehicles even through rain or fog, complement the cameras.
With large amounts of data being fed into the vehicle, a simple computer won’t be enough to
handle the influx of information. This is why all self-driving cars have an onboard computer with
computer vision features created through machine learning.

The cameras and sensors are tasked to both detect and classify objects in the environment – like
pedestrians. The location, density, shape, and depth of the objects have to be considered
instantaneously to enable the rest of the driving system to make appropriate decisions. All these
computations are only possible through the integration of machine learning and deep neural
networks which results in features like pedestrian detection.
Figure 3-1. A Tesla Car’s Vision (source-Tesla)

Road conditions, traffic situations, and other environmental factors don’t remain the same every
time you get in the car. Having a computer simply memorize what it sees won’t be useful when
changes are suddenly introduced into the environment. Machine learning helps the computer
“understand” what it sees, allowing the system to quickly adapt to whichever environment it’s
brought into. That’s artificial intelligence.

3.3.2. Banking
Banks are also using computer vision and machine learning to quickly authenticate documents such as IDs, checks, and passports. A customer can simply take a photo of themselves or their ID with a mobile device to authorize transactions; liveness detection and anti-spoofing models are learned with machine learning and applied through computer vision.

Some banks are starting to implement online deposit of checks through a mobile phone app.
Using computer vision and machine learning, the system is designed to read the important details
on an uploaded photo of a check for deposit. The algorithm can automatically correct distortions,
skews, warps, and poor lighting conditions present on the image.
There’s no need to go to the bank to deposit checks or process other transactions that used to be
done over-the-counter. The Mercantile Bank of Michigan which adopted this system was able to
realize a 20% increase in its online bank users.
3.3.3. Industrial facilities management
The industrial sector has critical infrastructure which must always be monitored, secured, and
regulated to avoid any kind of loss or damage. In the oil industry, for example, remote oil wells
must be monitored regularly to ensure smooth operation. However, with sites deployed in several
regions, it would be very costly to do site visits every so often.

Using machine learning and computer vision, oil companies can monitor sites 24/7 without
having to deploy employees. The system can be programmed to read tank levels, spot leaks, and
ensure the security of the facilities. Alerts are raised whenever an anomaly is detected in any of
the sites, enabling a quick response from the management team.

The way computer vision is used in the scenario above can be adopted by chemical factories,
refineries, and even nuclear power plants. Sensors and camera feed must all be connected and
handled by a powerful AI fully capable of utilizing computer vision and machine learning to
detect pedestrians and vehicles approaching or entering the facilities.

3.3.4. Healthcare
There are several applications for machine learning and computer vision in healthcare.
Accurately classifying illnesses is becoming better now, thanks to computer vision technology.
With machine learning training, AI can “learn” what diseases look like in medical imaging. It is
now even possible to diagnose patients using a mobile phone, eliminating the need to line up in
hospitals for an appointment.

Gauss Surgical, a medical technology company, is using cloud-based computer vision


technology and machine learning algorithms to estimate blood loss during surgical operations.
Using an iPad-based app and a camera, the captured images of suction canisters and surgical
sponges are analyzed to predict the possibility of hemorrhage. They’re found to be more accurate
than the visual estimates of doctors during medical procedures.
3.3.5. Retail
Computer vision is already powering identification and recommendation engines for several high-traffic sites, as well as inventory systems, and it is also being used in the physical world by other companies.
Amazon, notably, recently opened their Amazon Go store where shoppers can just pick up any
item and leave the store without having to go through a checkout counter. Automatic electronic
payments are made possible by equipping the Go store with cameras with computer vision
capabilities.

Figure 3-2. an Amazon Go Branch (Source:CNET)

Cameras are placed on aisles and shelves to monitor when a customer picks up or returns an
item. Each customer is assigned a virtual basket that gets filled according to the item they take
from the shelves. When done, customers can freely walk out of the store and the cost will be
charged to their Amazon account.

Cashiers have been eliminated through this program, a personnel cost saving, allowing for a faster and more convenient checkout process. Security is also not an issue, since the system can track multiple individuals simultaneously without using facial recognition.

Amazon has also applied for a patent for a virtual mirror. This technology makes use of
computer vision to project the image of the individual looking at the mirror. Various
superimpositions like clothes and accessories can then be placed over the reflection, allowing the
shopper to try different items without needing to physically put them on.

3.3.6. Security
The security sector benefits greatly from the combination of machine learning and computer vision. For instance, airports, stadiums, and even streets are fitted with facial
recognition systems to identify terrorists and wanted criminals. Cameras can quickly match an
individual’s face against a database and prompt authorities on the presence of known threats in
the facility.

Offices are also installing CCTV cameras to identify who enters and exits the premises. Some
rooms accessible only to authorized personnel can be set with an automatic alarm when an
unrecognized individual is identified by the camera linked to a computer vision system.

Retail security has also been quick to take up computer vision and machine learning to improve
the safety of business assets. Retailers have been using computer technology to reduce theft and
losses at their branches by installing intelligent cameras in the vicinity.
Checkout can also be monitored. Using computer vision technology, cameras can be placed over
checkout counters to monitor product scans. Any item that crosses the scanner without being
tagged as a sale is labeled by the software as a loss. The report is then sent to the management to
handle the issue and prevent similar incidents from happening.[3]
4. Speech Recognition [4]

Speech recognition, or speech-to-text, is the ability of a machine or program to identify words spoken aloud and convert them into readable text. Rudimentary speech recognition software has a limited vocabulary and may only identify words and phrases when spoken clearly. More sophisticated software can handle natural speech, different accents and various languages.

Speech recognition uses a broad array of research in computer science, linguistics and computer
engineering. Many modern devices and text-focused programs have speech recognition functions
in them to allow for easier or hands-free use of a device.

Speech recognition and voice recognition are two different technologies and should not be
confused:

● Speech recognition is used to identify words in spoken language.

● Voice recognition is a biometric technology for identifying an individual's voice.

4.1 How does speech recognition work?

Speech recognition systems use computer algorithms to process and interpret spoken words and
convert them into text. A software program turns the sound a microphone records into written
language that computers and humans can understand, following these four steps:

1. analyze the audio;

2. break it into parts;

3. digitize it into a computer-readable format; and

4. use an algorithm to match it to the most suitable text representation.
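As an illustration of these steps, the sketch below uses the third-party SpeechRecognition package for Python; both the package (pip install SpeechRecognition) and the audio file name are assumptions made here for the example, not part of the original text.

import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.AudioFile("sample.wav") as source:   # 1. analyze the audio (hypothetical file)
    audio = recognizer.record(source)        # 2-3. break it into frames and digitize it

try:
    # 4. match the digitized audio to the most suitable text (Google Web Speech API backend)
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")

Libraries like this hide the acoustic and language models described below behind a single call, but the same four stages are happening underneath.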

Speech recognition software must adapt to the highly variable and context-specific nature of
human speech. The software algorithms that process and organize audio into text are trained on
different speech patterns, speaking styles, languages, dialects, accents and phrasings. The
software also separates spoken audio from background noise that often accompanies the signal.
To meet these requirements, speech recognition systems use two types of models:

● Acoustic models. These represent the relationship between linguistic units of speech and
audio signals.

● Language models. Here, sounds are matched with word sequences to distinguish between
words that sound similar.

4.2 What applications is speech recognition used for?

Speech recognition systems have quite a few applications. Here is a sampling of them.

Mobile devices. Smartphones use voice commands for call routing, speech-to-text processing,
voice dialing and voice search. Users can respond to a text without looking at their devices. On
Apple iPhones, speech recognition powers the keyboard and Siri, the virtual assistant.
Functionality is available in secondary languages, too. Speech recognition can also be found in
word processing applications like Microsoft Word, where users can dictate words to be turned
into text.

Figure 4-1.Virtual assistants use speech recognition to communicate with users and perform a variety of tasks
triggered by voice commands.

Education. Speech recognition software is used in language instruction. The software hears the
user's speech and offers help with pronunciation.
Customer service. Automated voice assistants listen to customer queries and provide helpful resources.

Healthcare applications. Doctors can use speech recognition software to transcribe notes in real
time into healthcare records.

Disability assistance. Speech recognition software can translate spoken words into text
using closed captions to enable a person with hearing loss to understand what others are saying.
Speech recognition can also enable those with limited use of their hands to work with computers,
using voice commands instead of typing.

Court reporting. Software can be used to transcribe courtroom proceedings, precluding the need
for human transcribers.

Emotion recognition. This technology can analyze certain vocal characteristics to determine what
emotion the speaker is feeling. Paired with sentiment analysis, this can reveal how someone feels
about a product or service.

Hands-free communication. Drivers use voice control for hands-free communication, controlling
phones, radios and global positioning systems, for instance.

4.3 What are the features of speech recognition systems?

Good speech recognition programs let users customize them to their needs. The features that
enable this include:

● Language weighting. This feature tells the algorithm to give special attention to certain
words, such as those spoken frequently or that are unique to the conversation or subject. For
example, the software can be trained to listen for specific product references.

● Acoustic training. The software tunes out ambient noise that pollutes spoken audio. Software
programs with acoustic training can distinguish speaking style, pace and volume amid the din
of many people speaking in an office.

● Speaker labeling. This capability enables a program to label individual participants and
identify their specific contributions to a conversation.
● Profanity filtering. Here, the software filters out undesirable words and language.

4.4 What are the different speech recognition algorithms?

The power behind speech recognition features comes from a set of algorithms and technologies.
They include the following:

● Hidden Markov model. HMMs are used in autonomous systems where a state is partially
observable or when all of the information necessary to make a decision is not immediately
available to the sensor (in speech recognition's case, a microphone). An example of this is in
acoustic modeling, where a program must match linguistic units to audio signals using
statistical probability.

● Natural language processing. NLP eases and accelerates the speech recognition process.

● N-grams. This simple approach to language models creates a probability distribution for a word sequence. For example, an algorithm looks at the last few words spoken, approximates the history of the sample of speech, and uses that to estimate the probability of the next word or phrase. (A toy bigram sketch follows this list.)

● Artificial intelligence. AI and machine learning methods like deep learning and neural
networks are common in advanced speech recognition software. These systems use grammar,
structure, syntax and composition of audio and voice signals to process speech. Machine
learning systems gain knowledge with each use, making them well suited for nuances like
accents.
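To make the n-gram idea concrete, here is a toy bigram model in Python; the tiny corpus is invented for illustration, and real systems train on far larger text collections.

from collections import Counter, defaultdict

corpus = "call my phone please call my office".split()

# Count how often each word follows the previous one
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_probability(prev, nxt):
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

print(next_word_probability("call", "my"))   # 1.0 in this toy corpus
print(next_word_probability("my", "phone"))  # 0.5

A recognizer uses such probabilities to prefer word sequences that are more likely in the language, which helps it distinguish between words that sound similar.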

4.5 What are the advantages of speech recognition?

There are several advantages to using speech recognition software, including the following:

● Machine-to-human communication. The technology enables electronic devices to communicate with humans in natural language or conversational speech.

● Readily accessible. This software is frequently installed in computers and mobile devices,
making it accessible.

● Easy to use. Well-designed software is straightforward to operate and often runs in the
background.
● Continuous, automatic improvement. Speech recognition systems that incorporate AI become
more effective and easier to use over time. As systems complete speech recognition tasks,
they generate more data about human speech and get better at what they do.

4.6 What are the disadvantages of speech recognition?

While convenient, speech recognition technology still has a few issues to work through.
Limitations include:

● Inconsistent performance. The systems may be unable to capture words accurately because
of variations in pronunciation, lack of support for some languages and inability to sort
through background noise. Ambient noise can be especially challenging. Acoustic training
can help filter it out, but these programs aren't perfect. Sometimes it's impossible to isolate
the human voice.

● Speed. Some speech recognition programs take time to deploy and master. The speech
processing may feel relatively slow.

● Source file issues. Speech recognition success depends on the recording equipment used, not
just the software.

5. Natural Language Processing [5]

Humans communicate with each other using words and text. The way that humans convey
information to each other is called Natural Language. Every day, humans share a large quantity of information with each other in various languages as speech or text.

However, computers cannot interpret this data, which is in natural language, as they
communicate in 1s and 0s. The data produced is precious and can offer valuable insights.
Hence, you need computers to be able to understand, emulate and respond intelligently to
human speech.

Natural Language Processing or NLP refers to the branch of Artificial Intelligence that gives
the machines the ability to read, understand and derive meaning from human languages.

NLP combines the field of linguistics and computer science to decipher language structure
and guidelines and to make models which can comprehend, break down and separate
significant details from text and speech.
Figure 5-1. Constituents of NLP

5.1 How to Perform NLP?

The steps to perform preprocessing of data in NLP include:

● Segmentation:

You first need to break the entire document down into its constituent sentences. You can do this by segmenting the article at punctuation marks such as full stops and commas.

Figure 5-2. Segmentation

● Tokenizing:
For the algorithm to understand these sentences, you need to get the words in a sentence and explain them individually to the algorithm. So, you break down your sentence into its constituent words and store them. This is called tokenizing, and each word is called a token.

Figure 5-3. Tokenizing

● Removing Stop Words:

You can make the learning process faster by getting rid of non-essential words, which add little
meaning to our statement and are just there to make our statement sound more cohesive. Words
such as was, in, is, and, the, are called stop words and can be removed.

Figure 5-4. Removing Stop Words

● Stemming:

Stemming is the process of obtaining the word stem of a word. The word stem is the base form to which affixes are added to create new words (for example, "jump" is the stem of "jumping" and "jumped").
Figure 5-5. stemming

● Lemmatization:

Lemmatization is the process of obtaining the root stem of a word. The root stem is the base form of a word that is present in the dictionary and from which the word is derived. You can also identify the base words for different words based on tense, mood, gender, etc.

Figure 5-6.Lemmatization

● Part of Speech Tagging:


Now, you must explain the concept of nouns, verbs, articles, and other parts of speech to the machine by adding these tags to our words. This is called part-of-speech (POS) tagging.

Figure 5-7.Part of Speech Tagging

● Named Entity Tagging:

Next, introduce your machine to pop culture references and everyday names by flagging names
of movies, important personalities or locations, etc that may occur in the document. You do this
by classifying the words into subcategories. This helps you find any keywords in a sentence. The
subcategories are person, location, monetary value, quantity, organization, movie.

After performing the preprocessing steps, you then give your resultant data to a machine learning
algorithm like Naive Bayes, etc., to create your NLP application.
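A minimal sketch of this preprocessing pipeline, assuming the NLTK library and its data files (punkt, stopwords, wordnet, the POS tagger and NE chunker models) have been installed; the sample sentence is invented for illustration.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "London is the capital of England. The bridges there are beautiful."

sentences = nltk.sent_tokenize(text)        # segmentation
tokens = nltk.word_tokenize(sentences[0])   # tokenizing

stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.lower() not in stop_words]   # removing stop words

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
stems = [stemmer.stem(t) for t in content]            # stemming
lemmas = [lemmatizer.lemmatize(t) for t in content]   # lemmatization

tagged = nltk.pos_tag(tokens)      # part-of-speech tagging
entities = nltk.ne_chunk(tagged)   # named entity tagging (e.g. London as a location)

print(stems)
print(lemmas)
print(tagged)

The resulting tokens, tags and entities are what you would then feed to a classifier such as Naive Bayes.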

5.2 Applications of NLP

NLP is one of the ways that people have humanized machines and reduced the need for labor. It
has led to the automation of speech-related tasks and human interaction. Some applications of
NLP include :

● Translation Tools: Tools such as Google Translate, Amazon Translate, etc. translate sentences
from one language to another using NLP.

● Chatbots: Chatbots can be found on most websites and are a way for companies to deal with
common queries quickly.
● Virtual Assistants: Virtual Assistants like Siri, Cortana, Google Home, Alexa, etc can not
only talk to you but understand commands given to them.

● Targeted Advertising: Have you ever talked about a product or service or just googled
something and then started seeing ads for it? This is called targeted advertising, and it
helps generate tons of revenue for sellers as they can reach niche audiences at the right
time.

● Autocorrect: Autocorrect will automatically correct any spelling mistakes you make; grammar checkers go a step further and help you write flawlessly.

6. CASE STUDY ON IMAGENET [6]

Figure 6-1. A snapshot of two root to leaf branches of ImageNet: mammal sub tree and vehicle sub tree.

ImageNet is a large database or dataset of over 14 million images. It was designed by academics for computer vision research and was the first of its kind in terms of scale. Images are organized and labelled in a hierarchy.

In Machine Learning and Deep Neural Networks, machines are trained on a vast dataset of
various images. Machines are required to learn useful features from these training images. Once
learned, they can use these features to classify images and perform many other tasks associated
with computer vision. ImageNet gives researchers a common set of images to benchmark their
models and algorithms.

It's fair to say that ImageNet has played an important role in the advancement of computer
vision.

● Where is ImageNet useful and how has it advanced computer vision?

ImageNet is useful for many computer vision applications such as object recognition, image
classification and object localization.

Prior to ImageNet, a researcher wrote one algorithm to identify dogs, another to identify cats,
and so on. After training with ImageNet, the same algorithm could be used to identify different
objects.

The diversity and size of ImageNet meant that a computer looked at and learned from many
variations of the same object. These variations could include camera angles, lighting conditions,
and so on. Models built from such extensive training were better at many computer vision tasks.
ImageNet convinced researchers that large datasets were important for algorithms and models to
work well. In fact, their algorithms performed better after they were trained with ImageNet
dataset.

Samy Bengio, a Google research scientist, has said of ImageNet, "Its size is by far much greater
than anything else available in the computer vision community, and thus helped some
researchers develop algorithms they could never have produced otherwise."

6.1 What are some technical details of ImageNet?

ImageNet consists of 14,197,122 images organized into 21,841 subcategories. These subcategories can be considered as sub-trees of 27 high-level categories. Thus, ImageNet is a well-organized hierarchy that makes it useful for supervised machine learning tasks.

On average, there are over 500 images per subcategory. The category "animal" is the most widely covered, with 3822 subcategories and 2799K images. The "appliance" category has on average 1164 images per subcategory, the highest of any category. Among the categories with the fewest images are "amphibian", "appliance", and "utensil".

As many as 1,034,908 images have been annotated with bounding boxes. For example, if an
image contains a cat as its main subject, the coordinates of a rectangle that bounds the cat are
also published on ImageNet. This makes it useful for computer vision tasks such as object
localization and detection.

Then there's Scale-Invariant Feature Transform (SIFT) used in computer vision. SIFT helps
in detecting local features in an image. ImageNet gives researchers 1000 subcategories
with SIFT features covering about 1.2 million images.

Images vary in resolution but it's common practice to train deep learning models on sub-sampled
images of 256x256 pixels.
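As a practical illustration, the sketch below loads a model pre-trained on the ImageNet-1k (ILSVRC) subset using torchvision; it assumes a recent PyTorch/torchvision installation (0.13+), and the image path is a placeholder.

import torch
from torchvision import models
from PIL import Image

# ResNet-50 with weights trained on the ImageNet-1k (ILSVRC) subset
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()

# Standard ImageNet preprocessing: resize, centre-crop to 224x224, normalize
preprocess = weights.transforms()

image = Image.open("cat.jpg")           # placeholder image path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)
print(weights.meta["categories"][logits.argmax().item()])  # one of the 1000 synset names

This kind of transfer from ImageNet-pretrained weights is one of the main ways the dataset has benefited tasks well beyond the original benchmark.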

6.2 Explain how ImageNet defined the subcategories?

Figure 6-2. Treemap Visualization of first-level subcategories of geological formations.

● In fact, ImageNet did not define these subcategories on its own but derived these from
WordNet. WordNet is a database of English words linked together by semantic relationships.
Words of similar meaning are grouped together into a synonym set, simply called synset.
Hypernyms are synsets that are more general. Thus, "organism" is a hypernym of "plant".
Hyponyms are synsets that are more specific. Thus, "aquatic" is a hyponym of "plant".
This hierarchy makes it useful for computer vision tasks. If the model is not sure about a subcategory, it can simply classify the image higher up the hierarchy, where the error probability is lower. For example, if the model is unsure that it's looking at a rabbit, it can simply classify it as a mammal.

While WordNet has 100K+ synsets, only the nouns have been considered by ImageNet.
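The WordNet hierarchy that ImageNet builds on can be inspected directly. This is a small sketch assuming NLTK and its wordnet corpus are installed (nltk.download("wordnet")).

from nltk.corpus import wordnet as wn

plant = wn.synset("plant.n.02")   # the living-organism sense of "plant"
print(plant.definition())
print(plant.hypernyms())          # more general synsets, e.g. organism
print(plant.hyponyms()[:5])       # more specific synsets

ImageNet attaches its images to noun synsets exactly like these, which is what gives the dataset its tree structure.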

6.3 How were the images labelled in ImageNet?

In the early stages of the ImageNet project, a quick calculation showed that by employing a few
people, they would need 19 years to label the images collected for ImageNet. But in the summer
of 2008, researchers came to know about an Amazon service called Mechanical Turk. This meant that image labelling could be crowdsourced via this service: humans all over the world would label the images for a small fee.

Humans make mistakes and therefore we must have checks in place to overcome them. Each
human is given a task of 100 images. In each task, 6 "gold standard" images are placed with
known labels. At most 2 errors are allowed on these standard images, otherwise the task has to
be restarted.

In addition, the same image is labeled by three different humans. When there's disagreement,
such ambiguous images are resubmitted to another human with tighter quality threshold (only
one allowed error on the standard images).

6.4 How are the images of ImageNet licensed?

Images for ImageNet were collected from various online sources. ImageNet doesn't own the
copyright for any of the images. This has implications for how ImageNet shares the images with researchers.

For public access, ImageNet provides image thumbnails and URLs from where the original
images were downloaded. Researchers can use these URLs to download the original images.
However, those who wish to use the images for non-commercial or educational purposes can create an account on ImageNet and request access. This will allow direct download of images
from ImageNet. This is useful when the original sources of images are no longer available.

The dataset can be explored via a browser-based user interface. Alternatively, there's also
an API. Researchers may want to read the API Documentation. This documentation also shares
how to download image features and bounding boxes.
6.5 What is the ImageNet Challenge and what's its connection with the dataset?

Figure 6-3. Performance of winning entries of ILSVRC 2010-2014

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was an annual computer vision
contest held between 2010 and 2017. It's also called ImageNet Challenge.

For this challenge, the training data is a subset of ImageNet: 1000 synsets, 1.2 million images.
Images for validation and test are not part of ImageNet and are taken from Flickr and via image
search engines. There are 50K images for validation and 150K images for testing. These are
hand-labeled with the presence or absence of 1000 synsets.

The Challenge included three tasks: image classification, single-object localization (since ILSVRC 2011), and object detection (since ILSVRC 2013). More difficult tasks are based upon these tasks. In particular, image classification is the common denominator for many other
upon these tasks. In particular, image classification is the common denominator for many other
computer vision tasks. Tasks related to video processing, but not part of the main competition,
were added in ILSVRC 2015. These were object detection in video and scene classification.

REFERENCES-

1. https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm

2. https://www.geeksforgeeks.org/bayes-theorem/

3. https://www.chooch.com/blog/6-applications-of-machine-learning-for-computer-vision/

4. https://www.techtarget.com/searchcustomerexperience/definition/speech-recognition

5. https://www.simplilearn.com/tutorials/artificial-intelligence-tutorial/what-is-natural-language-processing-nlp

6. https://devopedia.org/imagenet
