
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
LIBFEXDLBDSEAIS01
MASTHEAD

Publisher:
The London Institute of Banking & Finance
8th Floor, Peninsular House
36 Monument Street
London
EC3R 8LJ
United Kingdom

Administrative Centre Address:


4-9 Burgate Lane
Canterbury
Kent
CT1 2XJ
United Kingdom

LIBFEXDLBDSEAIS01
Version No.: 001-2024-0327

Hamidreza Kobdani; Kristina Schaaff


Cover image: Adobe Stock, 2024.

© 2024 The London Institute of Banking & Finance


This course book is protected by copyright. All rights reserved.
This course book may not be reproduced and/or electronically edited, duplicated, or distributed in any kind of form without written permission from The London Institute of Banking & Finance.
The authors/publishers have identified the authors and sources of all graphics to the best
of their abilities. However, if any erroneous information has been provided, please notify
us accordingly.

TABLE OF CONTENTS
INTRODUCTION TO ARTIFICIAL INTELLIGENCE

Introduction
Signposts Throughout the Course Book
Learning Objectives

Unit 1
History of AI
1.1 Historical Developments
1.2 AI Winter
1.3 Expert Systems
1.4 Notable Advances

Unit 2
Modern AI Systems
2.1 Narrow versus General AI
2.2 Application Areas

Unit 3
Reinforcement Learning
3.1 What is Reinforcement Learning?
3.2 Markov Decision Process and Value Function
3.3 Temporal Difference and Q-Learning

Unit 4
Natural Language Processing
4.1 Introduction to NLP and Application Areas
4.2 Basic NLP Techniques
4.3 Vectorizing Data

Unit 5
Computer Vision
5.1 Introduction to Computer Vision
5.2 Image Representation and Geometry
5.3 Feature Detection
5.4 Semantic Segmentation

Appendix
List of References
List of Tables and Figures

INTRODUCTION
WELCOME
SIGNPOSTS THROUGHOUT THE COURSE BOOK

This course book contains the core content for this course. Additional learning materials
can be found on the learning platform, but this course book should form the basis for your
learning.

The content of this course book is divided into units, which are divided further into sec-
tions. Each section contains only one new key concept to allow you to quickly and effi-
ciently add new learning material to your existing knowledge.

At the end of each section of the digital course book, you will find self-check questions.
These questions are designed to help you check whether you have understood the con-
cepts in each section.

For all modules with a final exam, you must complete the knowledge tests on the learning
platform. You will pass the knowledge test for each unit when you answer at least 80% of
the questions correctly.

When you have passed the knowledge tests for all the units, the course is considered fin-
ished and you will be able to register for the final assessment. Please ensure that you com-
plete the evaluation prior to registering for the assessment.

Good luck!

LEARNING OBJECTIVES
In this course, you will get an introduction to the field of artificial intelligence.

The discipline of Artificial Intelligence originates from various fields of study such as cog-
nitive science and neuroscience. The coursebook starts with an overview of important
events and paradigms that have shaped the current understanding of artificial intelli-
gence. In addition, you will learn about the typical tasks and application areas of artificial
intelligence.

On completion of this coursebook, you will understand the concepts behind reinforcement learning, which are comparable to the human way of learning in the real world through exploration and exploitation.

Moreover, you will learn about the fundamentals of natural language processing and com-
puter vision. Both are important for artificial agents to be able to interact with their envi-
ronment.

UNIT 1
HISTORY OF AI

STUDY GOALS

On completion of this unit, you will be able to …

– describe how artificial intelligence has developed as a scientific discipline.


– understand the AI winters and their causes.
– explain the importance of expert systems and how they have contributed to artificial
intelligence.
– talk about the advances of artificial intelligence.
1. HISTORY OF AI

Introduction
This unit will discuss the history of artificial intelligence (AI). We will start with the histori-
cal developments of AI which date back to Ancient Greece. We will also discuss the recent
history of AI.

In the next step, we will learn about the AI winters. From a historical perspective, there
have been different hype cycles in the development of AI because not all requirements for
a performant system could be met at that time.

We will also examine expert systems and their development. The last section closes with a discussion of the notable advances in artificial intelligence, including modern concepts and their use cases.

Figure 1: Historical Development of AI

Source: Created on behalf of IU (2022).

The figure above illustrates the milestones in AI which will be discussed in the following
sections.

1.1 Historical Developments


Even though historical accounts of artificial intelligence often start in the 1950s, when the term was first applied in computer science, the first considerations about AI date back to around 350 BCE. We will therefore begin with a brief overview of the ancient history of artificial intelligence before exploring the more recent past.

Aristotle, Greek Philosopher (384–322 BCE)

Aristotle was the first to formalize human thinking in a way that allows it to be imitated. To formalize logical conclusions, he fully enumerated all possible categorical syllogisms (Giles, 2016).

Figure 2: Aristotle, Greek Philosopher (384-322 BCE)

Source: (Pixabay, n.d.-a)

Syllogisms (Greek: syllogismós, “conclusion,” “inference”) use deductive reasoning to derive workable conclusions from two or more given propositions. Logic programming languages as they are used today build on a contemporary equivalent of Aristotle’s way of formalizing thought through logical derivation. Modern AI algorithms can be programmed such that they derive valid logical conclusions from a given set of previously defined rules.
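To make this concrete, Aristotle’s classic syllogism (“All men are mortal; Socrates is a man; therefore, Socrates is mortal”) can be sketched as a tiny rule-based program. The following Python sketch is purely illustrative and not part of any particular AI library; encoding the universal statement as a propositional if-then rule is an assumption made for this example.

```python
# Minimal forward chaining over if-then rules (illustrative sketch).
# Facts are strings; each rule maps a set of premises to a conclusion.
rules = [
    ({"Socrates is a man"}, "Socrates is mortal"),   # "all men are mortal"
    ({"Socrates is mortal"}, "Socrates will die"),
]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"Socrates is a man"}, rules)
print("Socrates is mortal" in derived)  # True
```

Note how the second conclusion is reached only by chaining through the first, which is the essence of deriving conclusions not explicitly stated in the premises.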

Leonardo da Vinci, Italian Polymath (1452–1519)

Leonardo da Vinci designed a hypothetical computing machine on paper, even though it was never put into practice. The machine had 13 registers and demonstrated that, based on a program stored in memory or mechanics, a black box can accept inputs and produce outputs.

These early considerations about computing machinery are very important because prog-
ress in computing is a necessary precondition for any sort of development in AI.

René Descartes, French Philosopher (1596–1650)

The French philosopher Descartes believed that rationality and reason can be defined
using principles from mechanics and mathematics. The ability to formulate objectives
using equations is an important foundation for AI, as its objectives are defined mathemati-
cally. According to Descartes, rationalism and materialism are two sides of the same coin
(Bracken, 1984). This links to the methods used in AI where rational decisions are derived
in a mathematical way.

Thomas Hobbes, British Philosopher (1588–1679)

Thomas Hobbes refined Descartes’ theories about rationality and reason. In his work, he identified similarities between human reasoning and the computations of machines. Hobbes described that, in rational decision-making, humans employ operations similar to calculus, such that they can be formalized in a way that is analogous to mathematics (Flasiński, 2016).

David Hume, Scottish Philosopher (1711–1776)

Hume made fundamental contributions to questions of logical induction and the concept of causal reasoning (Wright, 2009). For example, he combined learning principles with repeated exposure, which has had – among others – a considerable influence on the learning curve (Russell & Norvig, 2022).

Learning curve
The learning curve is a graphical representation of the ratio between a learning outcome and the time required to solve a new task.

Nowadays, many machine learning algorithms are based on the principle of deriving patterns or relations in data through repeated exposure.
Recent History of Artificial Intelligence

The recent history of AI started around 1956, when the seminal Dartmouth conference took place. The term artificial intelligence was first coined at this conference, and a definition of the concept was proposed (Nilsson, 2009). In the following, we will discuss the key personalities, organizations, and concepts in the development of AI.

Key personalities

During the decade of AI’s inception, important personalities contributed to the discipline.

Alan Turing was an English computer scientist and mathematician who formalized and mechanized rational thought processes. In 1950, he conceptualized the well-known Turing Test. In this test, an AI communicates with a human observer; if the observer cannot reliably distinguish whether they are conversing with a machine or another human, the machine is considered a real AI (Turing, 1950).

The American scientist John McCarthy studied automata. It was he who first coined the
term “artificial intelligence” during preparations for the Dartmouth conference (McCarthy
et al., 1955). In cooperation with the Massachusetts Institute of Technology (MIT) and
International Business Machines (IBM), he established AI as an independent field of study.
He was the inventor of the programming language Lisp in 1958 (McCarthy, 1960). For more
than 30 years, Lisp was used in a variety of AI applications, such as fraud detection and
robotics. In the 1960s, he founded the Stanford Artificial Intelligence Laboratory, which has
had a significant influence on research on implementing human capabilities, like reason-
ing, listening, and seeing, in machines (Feigenbaum, 2012).

American researcher Marvin Minsky, a founder of the MIT Artificial Intelligence Laboratory
in 1959, was another important participant in the Dartmouth conference. Minsky com-
bined insights from AI and cognitive science (Horgan, 1993).

With a background in linguistics and philosophy, Noam Chomsky is another scientist who
contributed to the development of AI. His works about formal language theory and the
development of the Chomsky hierarchy still play an important role in areas such as natural
language processing (NLP). Besides that, he is well known for his critical views on topics
such as social media.

Key institutions

The most influential institutions involved in the development of AI are Dartmouth College
and MIT. Since the Dartmouth conference, there have been several important conferences
at Dartmouth College discussing the latest developments in AI. Many of the early influential AI researchers have taught at MIT, making it a key institution for AI research. In addition, companies such as IBM and Intel, and government research institutions such as the Defense Advanced Research Projects Agency (DARPA), have contributed much to AI by funding research on the subject (Crevier, 1993).

Key disciplines contributing to the development of AI

Many research areas have been contributing to the development of artificial intelligence.
The most important areas are decision theory, game theory, neuroscience, and natural
language processing:

• In decision theory, mathematical probability and economic utility are combined. This provides the formal criteria for decision-making in AI regarding economic benefit and dealing with uncertainty.
• Game theory is an important foundation for rational agents to learn strategies to solve games. It is based on the research of the Hungarian-American mathematician John von Neumann (1903–1957) and the German-American economist and game theorist Oskar Morgenstern (1902–1977) (Leonard, 2010).
• The insights from neuroscience about how the brain works are increasingly used in arti-
ficial intelligence models, especially as the importance of artificial neural networks
(ANN) is increasing. Nowadays, there are many models in AI trying to emulate the way
the brain stores information and solves problems.

• Natural language processing (NLP) combines linguistics and computer science. The goal
of NLP is to process not only written language (text) but also spoken language (speech).
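To illustrate the decision-theoretic idea of combining probability and utility, consider the following Python sketch; the actions, probabilities, and utility values are invented purely for illustration.

```python
# Expected-utility decision making (illustrative, invented numbers).
# Each action maps to a list of (probability, utility) outcome pairs.
actions = {
    "invest":    [(0.6, 100), (0.4, -50)],  # risky option
    "hold_cash": [(1.0, 10)],               # safe option
}

def expected_utility(outcomes):
    """Formal decision criterion: sum of probability-weighted utilities."""
    return sum(p * u for p, u in outcomes)

# A rational agent picks the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # invest
```

Here the risky action wins because its expected utility (0.6 · 100 + 0.4 · (−50) = 40) exceeds the safe option’s (10); this is exactly the formal criterion for decision-making under uncertainty mentioned above.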

High-level programming languages are important for programming AI. They are closer to human language than low-level languages such as machine code or assembly language and allow programmers to work independently of the hardware’s instruction sets. Some of the languages that have been widely used for AI are Lisp, Prolog, and Python:

• Lisp was developed by John McCarthy and is one of the oldest programming languages. The name comes from “list processing,” as Lisp is able to process character strings in a unique way (McCarthy, 1960). Even though it dates back to the 1950s, it was not only used for early AI programming but is still relevant today.
• Another early AI programming language is Prolog, which was specially designed to prove theorems and solve logical formulas.
• Nowadays, the general-purpose high-level language Python is the most important programming language for AI. As Python is open source, there are extensive libraries that help programmers create applications very efficiently.

There are three important factors that have contributed to the recent progress in artificial
intelligence:

• Increasing availability of massive amounts of data, which are required to develop and
train AI algorithms.
• Large improvements in data processing capacity of computers.
• New insights from mathematics, cognitive science, philosophy, and machine learning.

These factors support the development of approaches that were previously impossible, be
it because of a lack of processing capability or a lack of training data.

1.2 AI Winter
The term “AI winter” first appeared in the 1980s. It was coined by AI researchers to
describe periods when interest, research activities, and funding of AI projects significantly
decreased (Crevier, 1993). The term might sound a bit dramatic. However, it reflects the
culture of AI, which is known for its excitement and exuberance.

Historically, the term has its origin in the expression “nuclear winter,” which describes a hypothetical after-effect of a nuclear world war: the atmosphere is filled with ash, sunlight can no longer reach the Earth’s surface, temperatures drop excessively, and nothing is able to grow. Transferred to AI, the term marks periods in which interest in and funding of AI technologies were significantly reduced, causing a reduction in research activities. Downturns like this are usually based on exaggerated expectations of the capabilities of new technologies that cannot realistically be met.

There have been two AI winters. The first lasted approximately from 1974 to 1980 and the
second from 1987 to 1993 (Crevier, 1993).

The First AI Winter (1974–1980)

During the Cold War between the former Soviet Union and the United States (US), automatic language translation was one of the major drivers of funding for AI research activities (Hutchins, 1997). As there were not enough translators to meet demand, expectations for automating this task were high. However, the promised outcomes in machine translation could not be delivered, and early attempts to automatically translate language failed spectacularly. One of the big challenges at that time was handling word ambiguities. For instance, the English saying “out of sight, out of mind” was reportedly translated into Russian as the equivalent of “invisible idiot” (Hutchins, 1995).

When the Automatic Language Processing Advisory Committee evaluated the results of the research that had been generously funded by the US, it concluded that machine translation was neither more accurate, nor faster, nor cheaper than employing human translators (Automatic Language Processing Advisory Committee, 1966). Additionally, perceptrons – at that time a popular model of neurally inspired AI – had severe shortcomings, as even simple logical functions, such as exclusive or (XOR), could not be represented in those early systems.
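The XOR limitation can be demonstrated directly. The following Python sketch is an illustrative, minimal implementation of the classic perceptron learning rule (not historical code): it learns the linearly separable AND function perfectly, while for XOR no single-layer perceptron can classify all four inputs correctly, since no single line separates the two classes.

```python
# A single perceptron: hard-threshold unit trained with the perceptron rule.
def train_perceptron(samples, epochs=100, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out                 # 0 if correct
            w[0] += lr * err * x1              # adjust weights toward target
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    hits = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
        for (x1, x2), t in samples
    )
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(AND)
print(accuracy(AND, w, b))  # 1.0 -- AND is linearly separable

w, b = train_perceptron(XOR)
print(accuracy(XOR, w, b))  # below 1.0 -- XOR is not
```

Any linear decision boundary gets at most three of the four XOR cases right, so the training loop can never reach full accuracy, no matter how long it runs; this was precisely the shortcoming of the early perceptron systems.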

The Second AI Winter (1987–1993)

The second AI winter started around 1987, when the AI community became more pessimistic about developments. One major reason for this was the collapse of the Lisp machine business, which led to the perception that the industry might end (Newquist, 1994). Moreover, it turned out that it was not possible to develop early successful examples of expert systems beyond a certain point. Those expert systems had been the main driver of the renewed interest in AI systems after the first AI winter. The reason for the limitations was that the growth of fact databases was no longer manageable, and results were unreliable for unknown inputs, i.e., inputs on which the machines had not been trained.

Lisp machine
A Lisp machine is a type of computer that supports the Lisp language.

However, there are also arguments that there is no such thing as an AI winter and that the concept is a myth spread by a few prominent researchers and organizations that had lost money (Kurzweil, 2014). While interest in Lisp machines and expert systems decreased, AI remained deeply embedded in many other types of processing operations, such as credit card transactions.

Causes of the AI Winters

There are several conditions that can cause AI winters. The three most important require-
ments for the success of artificial intelligence are

• algorithms and experience with them,


• computing capacity, and
• the availability of data.

The past AI winters occurred because not all requirements were met.

During the first AI winter, there were already powerful algorithms. However, for successful
results, it is necessary to process a huge amount of data. This requires a lot of memory
capacity as well as high processing speed. At the time, there were not enough data availa-
ble to properly train those algorithms. Therefore, the expectations of interested parties
and investors could not be met. As the funded research was unable to produce the prom-
ised results, the funding was stopped.

By the 1980s, computing capacity had increased enough to train the available algorithms on small data sets. However, as approaches from machine learning and deep learning became integral parts of AI in the late 1980s, there was a greater need for large data sets to train AI systems, which became an issue. The lack of labeled training data – even though computing capacity would have been available – created the perception that several of the AI projects had failed.

As the AI winters show, it is impossible to make progress towards developing algorithms


for AI unless there is enough computing capacity (i.e., data storage and processing speed)
and training data.

The Next AI Winter

Nowadays, all three aspects mentioned above are fully met. There is enough computa-
tional power to train the available algorithms on a large number of existing data sets. The
figure below summarizes the preconditions for AI to be successful.

Figure 3: Important Aspects of AI

Source: Created on behalf of IU (2022).

However, the question of whether there might be another AI winter in the future can hardly be answered. If a hyped concept receives substantial funding but does not perform, it might be defunded, which could cause another AI winter. Nevertheless, AI technologies are nowadays embedded in many other fields of research, and if low-performing projects are defunded, there is always room for new developments. Therefore, everybody is free to decide whether AI winters are simply a myth or if the concept really matters.

1.3 Expert Systems


One of the key concepts in the history of artificial intelligence is expert systems. Expert systems belong to the group of knowledge-based systems. As the name suggests, the goal of expert systems is to emulate the decision- and solution-finding process using the domain-specific knowledge of an expert. The word “expert” refers to a human with specialized experience and knowledge in a given field, such as medicine or mechanics. Since problems in any given domain may be similar to each other, but never quite alike, solving problems in that domain cannot be accomplished by memorization alone. Rather, problem-solving is supplemented by a method that involves matching or applying experiential knowledge to new problems and application scenarios.

Components of an Expert System

Expert systems are designed to help a non-expert user make decisions based on the
knowledge of an expert.

The figure below illustrates the typical components of an expert system:

Figure 4: Components of an Expert System

Source: Created on behalf of IU (2022).

Expert systems are composed of a body of formalized expert knowledge from a specific application domain, which is stored in the knowledge base. The inference engine uses the knowledge base to draw conclusions from the rules and facts in the knowledge base. It implements rules of logical reasoning to derive new facts, rules, and conclusions not explicitly contained in the given corpus of the knowledge base. A user interface enables the non-expert user to interact with the expert system to solve a given problem from the application domain.
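As an illustrative sketch of this separation of components (invented for this discussion, not a real expert-system shell), the knowledge base can be held as plain data while a small backward-chaining inference engine reasons over it; the diagnostic facts and rules below are hypothetical.

```python
# Sketch of the three components: a knowledge base (facts + rules),
# an inference engine (backward chaining), and a minimal "interface".
# The medical-style facts and rules are invented for illustration.
KNOWLEDGE_BASE = {
    "facts": {"fever", "cough"},
    "rules": [
        ({"fever", "cough"}, "flu suspected"),
        ({"flu suspected"}, "recommend rest"),
    ],
}

def prove(goal, kb, depth=0):
    """Backward chaining: a goal holds if it is a known fact, or if some
    rule concludes it and all of that rule's premises can be proven."""
    if goal in kb["facts"]:
        return True
    if depth > 10:  # guard against circular rule chains
        return False
    return any(
        conclusion == goal and all(prove(p, kb, depth + 1) for p in premises)
        for premises, conclusion in kb["rules"]
    )

# Minimal "user interface": query the system for a recommendation.
print(prove("recommend rest", KNOWLEDGE_BASE))  # True
```

Note that the engine never mentions flu or fever by name: swapping in a different knowledge base yields an expert system for an entirely different domain, which is the rapid-prototyping property discussed later in this section.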

Types of Expert Systems

With respect to the representation of knowledge, three approaches to expert systems can
be distinguished:

• Case-based systems store examples of concrete problems together with a successful solution. When presented with a novel, previously unseen case, the system tries to retrieve a solution to a similar case and apply this solution to the case at hand. The key challenge is defining a suitable similarity measure to compare problem settings.
• Rule-based systems represent the knowledge base in the form of facts and “if A, then B”-type rules that describe relations between facts.
• If the problem class to be solved can be categorized as a decision problem, the knowledge can be represented in a decision tree. Such trees are typically generated by analyzing a set of examples.
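The retrieval step of a case-based system can be sketched as nearest-neighbor search over stored cases, with Euclidean distance standing in for the similarity measure; the feature vectors and repair solutions below are invented for illustration.

```python
# Case-based reasoning sketch: retrieve the stored case most similar
# to a new problem and reuse its solution (invented example data).
# Each case: (feature vector of symptoms, known solution).
CASE_BASE = [
    ((1.0, 0.0, 0.0), "replace battery"),
    ((0.0, 1.0, 0.0), "refill coolant"),
    ((0.0, 0.0, 1.0), "change tires"),
]

def distance(a, b):
    """Euclidean distance as a simple stand-in similarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def retrieve(problem, case_base):
    """Return the solution of the nearest stored case."""
    features, solution = min(case_base, key=lambda c: distance(problem, c[0]))
    return solution

print(retrieve((0.9, 0.1, 0.0), CASE_BASE))  # replace battery
```

Defining the right distance function is exactly the “suitable similarity measure” challenge named above: with poorly chosen features or weights, the nearest case may suggest the wrong solution.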

Development of Expert Systems

Historically, expert systems are an outgrowth of earlier attempts at implementing a gen-


eral problem solver. This approach is primarily associated with the researchers Herbert A.
Simon and Allen Newell, who, in the late 1950s, used a combination of insights from cogni-
tive science and mathematical models of formal reasoning to build a system intended to
solve arbitrary problems by successive reduction to simpler problems (Kuipers & Prasad,
2021). While this attempt was ultimately considered a failure when compared to its lofty
goals, it has nevertheless proven highly influential in the development of cognitive sci-
ence.

One of the initial insights gained from the attempt at general problem solving was that the construction of a domain-specific problem solver should – at least in principle – be easier to achieve. This paved the way for thinking about systems that combine domain-specific knowledge with reasoning patterns appropriate to that domain. Edward Feigenbaum, who worked at Stanford University, the leading academic institution for the subject at the time, defined the term expert system and built the first practical examples while leading the Heuristic Programming Project (Kuipers & Prasad, 2021).

The first notable application was Dendral, a system for identifying organic molecules. In
the next step, expert systems were established to help with medical diagnoses of infec-
tious diseases based on given data and rules (Woods, 1973). The expert system that
evolved out of this was called MYCIN, which had a knowledge base of around 600 rules.
However, it took until the 1980s for expert systems to reach the height of research interest,
leading to the development of commercial applications.

The main achievement of expert systems was their role in pioneering the idea of a formal,
yet accessible representation of knowledge. This representation was explicit in the sense
that it was formulated as a set of facts and rules that were suitable for creation, inspec-
tion, and review by a domain expert. This approach thus clearly separates domain-specific
business logic from the general logic needed to run the program – the latter encapsulated
in the inference engine. In stark contrast, more conventional programming approaches implicitly represent both internal control and business logic in the form of program code that is hard to read and understand by people who are not IT experts.
the approach championed by expert systems enabled even non-programmers to develop,
improve, and maintain a software solution. Moreover, it introduced the idea of rapid pro-
totyping since the fixed inference engine enabled the creation of programs for entirely dif-
ferent purposes simply by changing the set of underlying rules in the knowledge base.

However, a major downside of the classical expert system paradigm, which finally led to a sharp decline in its popularity, was also related to the knowledge base. As expert systems were engineered for a growing number of applications, many interesting use cases required larger and larger knowledge bases to satisfactorily represent the domain in question. This proved problematic in two respects:

1. Firstly, the computational complexity of inference grows faster than linearly with the number of facts and rules. This means that for many practical problems, the system’s answering times were prohibitively high.
2. Secondly, as a knowledge base grows, proving its consistency – ensuring that no constituent parts contradict each other – becomes exceedingly challenging.

Additionally, rule-based systems in general lack the ability to learn from experience. Exist-
ing rules cannot be modified by the expert system itself. Updates of the knowledge base
can only be done by the expert.

1.4 Notable Advances


After illustrating the downturns of AI winters, it is time to shift the focus to the prosperous
times when artificial intelligence has made huge advances. After an overview of the
research topics that have been in focus in the respective eras, we will examine the most
important developments in adjacent fields of study and how they relate to the progress in
artificial intelligence. Finally, we will examine the future prospects of AI.

Nascent Artificial Intelligence (1956–1974)

In the early years, AI research was dominated by “symbolic” AI. In this approach, rules from formal logic are used to formalize thought processes as the manipulation of symbolic representations of information. Accordingly, AI systems developed during this era deal with the implementation of logical calculus. In most cases, this is done by implementing a search strategy, where solutions are derived in a step-by-step procedure. The steps in this procedure are either inferred logically from a preceding step or systematically derived using backtracking of possible alternatives to avoid dead ends.
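This kind of step-by-step derivation with backtracking can be illustrated with the classic N-queens puzzle; the Python sketch below is a generic example of the search strategy, not a reconstruction of any historical system.

```python
# Backtracking search sketch: build a solution step by step and
# back out of dead ends (classic N-queens as a stand-in problem).
def solve(n, cols=()):
    """Place one queen per row; cols[r] is the column of the queen in row r.
    Backtrack whenever a placement conflicts with an earlier one."""
    row = len(cols)
    if row == n:                 # all rows filled: a complete solution
        return [cols]
    solutions = []
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):
            solutions.extend(solve(n, cols + (col,)))  # next step
        # else: dead end -- backtrack and try the next column
    return solutions

print(len(solve(4)))  # 2 solutions for the 4-queens puzzle
```

Each recursive call extends a partial solution by one step; when no column works, the procedure returns to the previous step and tries an alternative, which is exactly the backtracking behavior described above.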

The early years were also the period in which the first attempts at natural language processing were developed. The first approaches to language processing focused on highly limited environments and settings, which made initial successes possible. The simplification of working environments – a “microworld” approach – also yielded good results in the fields of computer vision and robot control.

In parallel, the first theoretical models of neurons were developed. The research focus was
on the interaction between those cells (i.e., computational units) to implement basic logi-
cal functions in networks.

Knowledge Representation (1980–1987)

The focus of the first wave of AI research was primarily on logical inference. In contrast, the main topics of the second wave were driven by the attempt to solve the problem of knowledge representation. This shift in focus was caused by the insight that, in day-to-day situations, intelligent behavior is based not only on logical inference but much more on general knowledge about the way the world works. This knowledge-based view of intelligence was the origin of early expert systems. The main characteristic of these technologies was that domain-relevant knowledge was systematically stored in databases. Using these databases, a set of methods was developed to access that knowledge in an efficient, effective way.

The emerging interest in AI after the first AI winter was also accompanied by an upturn in
governmental funding at the beginning of the 1980s with projects such as the Alvey
project in the UK and the Fifth Generation Computer project of the Japanese Government
(Russell & Norvig, 2022).

Additionally, in this period, the early setbacks of neurally inspired AI approaches could be addressed with new network models and the use of backpropagation as a training method in layered networks of computational units.

Learning from Data (Since 1993)

During the 1990s, there were major advances of AI in games, when the computer system Deep Blue became the first to beat Garry Kasparov, the world champion in chess at that time.

Aside from this notable but narrow success, AI methods have become widely used in the
development of real-world applications. Successful approaches in the subfields of AI have
gradually found their way into everyday life – often without being explicitly labeled as AI.
In addition, since the early 1990s, a growing number of ideas from decision theory,
mathematics, statistics, and operations research have contributed significantly to AI
becoming a rigorous and mature scientific discipline. The paradigm of intelligent agents,
in particular, has become increasingly popular. In this context, the concept of intelligent
agents from economic theory combines with the notions of objects and modularity from
computer science to form the idea of entities that can act intelligently. This perspective
shifts the focus from AI as an imitation of human intelligence to the study of intelligent
agents and of intelligence in general.

The advances in AI since the 1990s have been supported by a significant increase in data
storage and computational capacities. Along with this, during the rise of the internet,
there has been an incomparable increase in variety, velocity, and volume of generated
data, which also supported the AI boom.

In 2012, the latest upturn in interest in AI research started, when deep learning was
developed based on advances in connectionist machine learning models. The increase in
data processing and information storage capabilities, combined with larger data corpora,
brought theoretical advances in machine learning models into practice. With deep
learning, new performance levels could be achieved in many machine learning benchmark
problems. This led to a revival of interest in well-established learning models, such as
reinforcement learning, and created space for new ideas, such as adversarial learning.

Adjacent Fields of Study

There are many fields of study that continuously contribute to AI research. The most influ-
ential fields will be described in the following.

Linguistics

Linguistics can be broadly described as the science of natural language. It deals with
exploring the structural (grammatical) and phonetic properties of interpersonal communi-
cation. To understand language, it is necessary to understand the context and the subject
matter in which it is used. In his book Syntactic Structures, Noam Chomsky (1957) made an
important contribution to linguistics and, therefore, to natural language processing. Since
our thoughts are so closely linked to language as a form of representation, one could take
it a step further and link creativity and thought to linguistic AI. For example, how is it pos-
sible that a child says something it has never said before? In AI, we understand natural
language as a medium of communication in a specific context. Therefore, language is
much more than just a representation of words.

Cognition

In the context of AI, cognition refers to different capabilities such as perception,
reasoning, intelligence, learning and understanding, and thinking and comprehension.
This is also reflected in the word "recognition". A large part of our current
understanding of cognition is a combination of psychology and computer science. In
psychology, theories and hypotheses are formed from observations of humans and
animals. In computer science, behavior is modeled based on what has been observed in
psychology. When modeling the brain by a computer, we have the same principle of stimulus
and response as in the human brain. When the computer receives a stimulus, an internal
representation of that stimulus is made. The response to that stimulus can lead to the
original model being modified. Once we have a well-working computer model for a specific
situation, the next step will be to find out how decisions are made. As decisions based
on AI are involved in more and more areas of our lives, it is important that the
reasoning process is transparent to an external observer. Therefore, explainability (the
ability to explain how a decision has been made) is becoming increasingly important.
However, approaches based on deep learning still lack explainability.

Games

When relating games to AI, this includes much more than gambling or computer games.
Rather, games refer to learning, probability, and uncertainty. In the early twentieth
century, game theory was established as a mathematical field of study by Oskar
Morgenstern and John von Neumann (Leonard, 2010). In game theory, a comprehensive
taxonomy of games was developed and, in connection with this, some gaming strategies
were proven to be optimal.

Another discipline related to game theory is decision theory. While game theory is more
about how the moves of one player affect the options of another player, decision theory
deals with usefulness and uncertainty, i.e., utility and probability. Both are not necessarily
about winning but more about learning, experimenting with possible options, and finding
out what works based on observations.

Games like chess, checkers, and poker are usually played for the challenge of winning or
for entertainment. Nowadays, machines can play better than human players. Until 2016,
people believed that the game of Go might be an unsolvable challenge for computers
because of its combinatorial complexity. The objective of the game is to surround the
most territory on a board with 19 horizontal and 19 vertical lines. Even though the
ruleset is quite simple, the complexity comes from the large size of the game board and
the resulting number of possible moves. This complexity makes it impossible to apply
methods that have been used for games like chess and checkers. However, DeepMind
developed the system AlphaGo based on deep networks and reinforcement learning, and in
2016 it became the first system to beat Lee Sedol, one of the world's best Go players
(Silver et al., 2016).

Not long after AlphaGo, DeepMind developed the system AlphaZero (Silver et al., 2018). In
contrast to AlphaGo, which learned from records of past Go games, AlphaZero learns
solely through intensive self-play, given only the set of rules. This system turned out
to be even stronger than AlphaGo. Remarkably, AlphaZero even found some effective and
efficient strategies that had, so far, been missed by Go experts.

The Internet of Things

It has only been a few years since the term “Internet of things” (IoT) first came up. IoT con-
nects physical and virtual devices using technologies from information and communica-
tion technology. In our everyday lives, we are surrounded by a multitude of physical devi-
ces that are always connected, such as phones, smart home devices, cars, and wearables.
The communication between those devices produces a huge amount of data which links
IoT to AI. While IoT itself is only about connecting devices and collecting data, AI can help
add intelligent behavior to the interaction between those machines.

Having intelligent devices integrated into our everyday lives not only creates
opportunities but also many new challenges. For instance, medication data based on the
physical measurements of a wearable device could be used positively, to remind a person
to take their medication, but also to decide on a possible increase in their health
insurance rate. Therefore, topics like the ethics of data use and privacy violations
become increasingly important in view of the new fields of use of AI.

Quantum computing

Quantum computing is based on the physical theory of quantum mechanics. Quantum
mechanics deals with the behavior of subatomic particles, which follow different rules
than those described by classical physics. For instance, in quantum mechanics it is
possible for an electron to be in two different states at the same time. Quantum
mechanics assumes that physical systems can be characterized using a wave function
describing the probabilities of the system being in a particular state. The goal is to
exploit these quantum properties to build supercomputers on which new algorithmic
approaches can be implemented, allowing them to outperform classical machines (Giles,
2018). The kind of information processing offered by quantum computing is well suited to
the probabilistic approach inherent in many AI technologies. Therefore, quantum
computers offer the possibility of accelerating AI applications and thus achieving a
real advantage in processing speed. However, due to the early stage of development of
these systems, the use of quantum computing for AI has hardly been researched.

The Future of AI

It is always highly speculative to assess the impact of a research area or new
technology on the future, as future prospects will always be biased by previous
experiences. Therefore, we do not attempt to predict the long-term future of AI.
Nevertheless, we want to examine the directions of developments in AI and the supporting
technologies.

The Gartner hype curve is frequently used to evaluate the potential of new technologies
(Gartner, 2021). The hype curve is presented in a diagram where the y-axis represents the
expectations towards a new technology and time is plotted on the x-axis.

The time axis is characterized by five phases:

1. In the discovery phase, a technological trigger or breakthrough generates significant
   interest and triggers the innovation.
2. The peak phase of exaggerated expectations is usually accompanied by much enthusiasm.
   Even though there may be successful applications, most of them struggle with early
   problems.
3. The period of disillusionment shows that not all expectations can be met.
4. In the period of enlightenment, the value of the innovation is recognized. There is
   an understanding of the practical benefits, but also of the limitations of the new
   technology.
5. In the last period, a plateau of productivity is reached, and the new technology
   becomes the norm. The final level of this plateau depends on whether the technology
   is adopted in a niche or a mass market.

Figure 5: The Gartner Hype Cycle

Source: Created on behalf of IU (2022) based on (Gartner, 2018).

The hype cycle has some similarities with the bell shape of a normal distribution,
except that the right end of the curve leads into an increasing slope that eventually
flattens out.

In 2021, the hype cycle for artificial intelligence showed the following trends (Gartner,
2021):

• In the innovation trigger phase, subjects like composite AI (a combination of
  different approaches from AI) and general AI (the ability of a machine to perform
  humanlike intellectual tasks) appear. Moreover, topics like Human-Centered AI and
  Responsible AI show that human integration is becoming increasingly important for the
  future of AI.
• Deep neural networks, which have been the driver for new levels of performance in
  many machine learning applications over the past decades, are still at the peak phase
  of inflated expectations, or hype. Moreover, topics like knowledge graphs and smart
  robots appear in that phase.
• In the disillusionment phase, we find topics like autonomous vehicles, which have
  experienced defunding as the high expectations in this area could not be met.

So far, none of the topics of AI have reached the plateau of productivity. This reflects
that these technologies are not yet generally accepted and in fully productive use.

SUMMARY
Research about artificial intelligence has been of interest for a long time.
The first theoretical thoughts about artificial intelligence date back to
Greek philosophers like Aristotle. Those early considerations were con-
tinued by philosophers like Hobbes and Descartes. Since the 1950s, it
has also become an important component of computer science and
made important contributions in areas such as knowledge representa-
tion in expert systems, machine learning, and modeling neural net-
works.

In the past decades, there have been several ups and downs in AI
research. They were caused by a cycle between innovations accompa-
nied by high expectations and disappointment when those expectations
could not be met, often because of technical limitations.

Over time, AI has been shaped by different paradigms from multiple disciplines. The most
popular paradigm nowadays is deep learning. New fields of application like IoT or
quantum computing offer a vast number of opportunities for how AI can be used. However,
it remains to be seen how intelligent behavior will be implemented in machines in the
future.

UNIT 2
MODERN AI SYSTEMS

STUDY GOALS

On completion of this unit, you will be able to…

– explain the difference between narrow and general artificial intelligence systems.
– name the most important application areas for artificial intelligence.
– understand the importance of artificial intelligence for corporate activities.
2. MODERN AI SYSTEMS

Introduction
Artificial intelligence has become an integral part of our everyday life. There are several
examples where we do not even notice the presence of AI, be it in Google maps or smart
replies in Gmail.

There are two categories of AI that will be explained in the following unit: narrow and gen-
eral AI.

Organizations like Gartner, McKinsey, or PricewaterhouseCoopers (PwC) predict a
mind-blowing future for AI. Reports like the PwC report (2018) estimate that AI might
contribute 15.7 trillion USD to the global economy. Therefore, after discussing the two
categories of AI, we will focus on the most important application areas of AI.
Additionally, we will explore how modern AI systems can be evaluated.

2.1 Narrow versus General AI


Recent research in artificial intelligence distinguishes between two types: artificial
narrow intelligence (ANI), also referred to as weak artificial intelligence, and
artificial general intelligence (AGI), or strong artificial intelligence. In ANI,
systems are built to perform specialized functions in controlled environments, whereas
AGI comprises open-ended, flexible, and domain-independent forms of intelligence like
that expressed by human beings.

Even though many people believe that we already have some sort of strong artificial intel-
ligence, current approaches are still implemented in a domain-specific way and lack the
necessary flexibility to be considered AGI. However, there is a large consensus that it is
only a matter of time until artificial intelligence will be able to outperform human intelli-
gence. Results from a survey of 352 AI researchers indicate that there is a 50 percent
chance that algorithms might reach that state by 2060 (Grace et al., 2017).

In the following, we will have a closer look at the underlying concepts of weak and strong
artificial intelligence.

Artificial Narrow Intelligence

The term ANI or weak AI reflects the current and future artificial intelligence. Systems
based on ANI can already solve complex problems or tasks faster than humans. However,
the capabilities of those systems are limited to the use cases for which they have been
designed. In contrast to the human brain, narrow systems cannot generalize from a spe-
cific task to a task from another domain.

For example, a particular device or system that can play chess will probably not be able
to play another strategy game, like Go or Shogi, without being explicitly programmed to
learn that game. Voice assistants such as Siri or Alexa can be seen as a sort of hybrid
intelligence combining several weak AIs. These tools are able to interpret natural
language and analyze the words using their databases in order to complete different
tasks. However, they are only able to solve the limited set of problems for which their
algorithms are suitable and for which they have been trained. For instance, currently,
they would not be able to analyze pictures or optimize traffic.

In short, ANI includes the display of intelligence with regard to complex problem solving
and the display of intelligence relative to one single task.

Artificial General Intelligence

The reference point against which AGI is measured and judged is the versatile cognitive
abilities of humans. The goal of AGI is not only to imitate the interpretation of
sensory input, but also to emulate the whole spectrum of human cognitive abilities. This
includes all abilities currently represented by ANI, as well as the ability of
domain-independent generalization, meaning that knowledge of one task can be applied to
another task in a different domain. This might also include motivation and volition.
Some philosophical sources go one step further and require AGI to have some sort of
consciousness or self-awareness (Searle, 1980). Developing an AGI would require the
following system capabilities:

• cognitive ability to function and learn in multiple domains
• intelligence on a human level across all domains
• independent ability to solve problems
• problem-solving abilities at an average human level over multiple domains
• abstract thinking abilities without drawing directly on past experience
• the cognitive skill to form new ideas about hypothetical concepts
• perception of the whole environment in which the system acts
• self-motivation and self-awareness

Considering the current state of AGI, it is difficult to imagine developing a system
that meets these requirements. In addition, both types of AI entail the concept of
superintelligence. This concept goes even further than current conceptions and describes
the idea that an intelligent system can reach a level of cognition beyond human
capabilities, a self-improvement that might be achieved through a recursive cycle.
However, this level of AI is above AGI and still very abstract.

2.2 Application Areas


Due to the latest advances in computational and data storage capabilities, applications
for AI have continuously increased in the past years. The options for where AI can be
applied are almost endless.

The growing interest is also corroborated by an increase in research activities.
According to the annual AI Index (Zhang et al., 2021), the number of journal
publications on AI grew by 34.5 percent from 2019 to 2020, and since 2010 the number of
AI papers has increased more than twenty-fold. The most popular research topics have
been natural language processing and computer vision, which are important for various
areas of application.

In a global survey about the state of AI, McKinsey & Company (2021) identified the
following industries as the main fields of AI adoption: High Tech/Telecom, Automotive
and Assembly, Financial Services, Business, Legal and Professional Services,
Healthcare/Pharma, and Consumer Goods/Retail. In the following section, we will have a
closer look at these fields.

AI adoption
The use of AI capabilities, such as machine learning, in at least one business function
is called AI adoption.

Figure 6: Application Areas of Al

Source: Created on behalf of IU (2022).

The figure above summarizes the most important domains in which AI is used.

High Tech and Telecommunication

Due to the constant increase in global network traffic and network equipment, there has
been rapid growth of AI in telecommunication. In this area, AI can be used not only to
optimize and automate networks but also to ensure that the networks are healthy and
secure.

Used for predictive maintenance, AI can help fix network issues even before they occur.
Moreover, network anomalies can be accurately predicted using self-optimizing networks.

Big data makes it possible to easily detect network anomalies and therefore prevent
fraudulent behavior within networks.

Automotive and Assembly

In the past years, autonomous driving has become a huge research topic. Over the next
decades, it will drastically transform the automotive industry from a steel-driven to a
software-driven industry. Nowadays, cars are already equipped with many sensors to
ensure the driver's safety, such as lane-keeping or emergency braking assistance.

Intelligent sensors can also detect technical problems with the car or risks arising
from the driver – such as fatigue or being under the influence of alcohol – and initiate
appropriate actions.

As in high tech and telecommunication, AI can be used in assembly processes for
predictive maintenance and to fix inefficiencies in the assembly line. Moreover, using
computer vision, it is already possible to detect defects faster and more accurately
than a human.

Financial Services

Financial services offer numerous applications for artificial intelligence. Intelligent
algorithms enable financial institutions to detect and prevent fraudulent transactions
and money laundering much earlier than was previously possible. Computer vision
algorithms can be used to precisely identify counterfeit signatures by comparing them to
scans of the originals stored in a database.

Additionally, many banks and brokers already use robo-advising: based on a user's
investment profile, accurate recommendations about future investments can be made
(D'Acunto et al., 2019). Portfolios can also be optimized using AI applications.

Business, Legal, and Professional Services

Especially in industries where paperwork and repetitive tasks play an important role, AI
can help to make processes faster and more efficient.

Significant elements of routine workflows are currently being automated using robotic
process automation (RPA), which can drastically reduce administrative costs. Systems in
RPA do not necessarily have to be enabled with intelligent AI capabilities. However,
methods such as natural language processing and computer vision can help enhance those
processes with more intelligent business logic.

Robotic process automation
The automated execution of repetitive, manual, time-consuming, or error-prone tasks by
software bots is described as robotic process automation.

The ongoing developments in big data technologies can help companies extract more
information from their data. Predictive analytics can be used to identify current and
future trends in the markets a company operates in and react accordingly.

Another important use case is the reduction of risk and fraud, especially in legal, account-
ing, and consulting practices. Intelligent agents can help to identify potentially fraudulent
patterns, which will allow for earlier responses.

Healthcare and Pharma

In the last few years, healthcare and pharma have been the fastest growing area adopting
AI.

AI-based systems can help detect diseases based on symptoms. For instance, recent
studies have been able to use AI-based systems to detect COVID-19 based on cough
recordings (Laguarta et al., 2020).

AI can offer many advantages not only in diagnostics. Intelligent agents can be used to
monitor patients according to their needs. Moreover, regarding medication, AI can help
find an optimal combination of prescriptions to avoid side effects.

Wearable devices – such as heart rate or body temperature trackers – can be used to con-
stantly observe the vital parameters of a person. Based on this data, an agent can give
advice about the wearer’s condition. Moreover, in case critical anomalies are detected, it is
possible to initiate an emergency call.

Consumer Goods and Retail

The consumer goods and retail industry focuses on predicting customer behavior. Websites
track how a customer's profile changes over their visits. This allows for personal
purchase predictions for each customer. This data can be used not only to make
personalized shopping recommendations but also to optimize the whole supply chain and
inform future research.

Market segmentation is nowadays no longer based on geographical regions such as province
or country. Modern technologies make it possible to segment customer behavior on a
street-by-street basis. This information can be used to fine-tune operations and decide
whether store locations should be kept or closed.

Additionally, the recent improvements in natural language processing technologies are
increasingly used for chatbots and conversational interfaces. When it comes to customer
retention and customer service, a well-developed artificial agent is key to ensuring
customer satisfaction.

Evaluation of AI Systems

As the above-mentioned examples illustrate, the application areas for modern AI systems
are almost unlimited. More and more companies manage to support their business models
with AI or even create completely new ones. Therefore, it is important to carefully
evaluate new systems. When evaluating AI systems, it is crucial that all data sets are
independent of each other and follow a similar probability distribution.

To develop proper models for AI applications, the available data is split into three data
sets:

1. Training data set: As the name indicates, this data set is used to fit the parameters of
an algorithm during the training process.
2. Development set: This data set is often also referred to as a validation set. It is used to
evaluate the performance of the model developed using the training set and for fur-
ther optimization. It is important that the development set contains data that have
not been included in the training data.
3. Test set: Once the model is finalized using the training and the development sets,
   the test set can be used for a final evaluation of the model. As with the
   development set, it is important that the data in the test set have not been used
   before. The test set is only used once to validate the model and to ensure that it
   is not overfitted.
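The three-way split described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the course material: the function name, the 70/15/15 fractions, and the use of shuffling are assumptions. (Libraries such as scikit-learn provide similar ready-made utilities.)

```python
import random

def train_dev_test_split(samples, dev_frac=0.15, test_frac=0.15, seed=42):
    """Split a dataset into disjoint training, development, and test sets.

    Shuffling first helps all three sets follow a similar distribution,
    as required for a proper evaluation. The fractions are illustrative.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = shuffled[:n_test]               # used only once, for the final evaluation
    dev = shuffled[n_test:n_test + n_dev]  # used to tune and validate the model
    train = shuffled[n_test + n_dev:]      # used to fit the model parameters
    return train, dev, test

train, dev, test = train_dev_test_split(list(range(100)))
print(len(train), len(dev), len(test))  # 70 15 15
```

Because the three sets are disjoint, performance measured on the development and test sets reflects how the model behaves on data it has never seen during training.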

When developing and tuning algorithms, metrics should be in place to evaluate how well
an algorithm performs, both on its own and compared to other systems. In a binary
classification task, accuracy, precision, recall, and F-score are metrics that are
commonly used for this purpose.

For example, fraud detection in financial services is a binary classification task: a
financial transaction can be categorized either as fraud or not. Based on this, there
are four categories of classification results:

1. True positives (TP): samples that were correctly classified as positive, i.e.,
   fraudulent transactions labeled as fraud
2. False positives (FP): results that wrongly indicate a sample to be positive even
   though it is negative, i.e., a non-fraudulent transaction being categorized as fraud
3. True negatives (TN): results that were correctly classified as negative, i.e.,
   non-fraudulent transactions that were also labeled as such
4. False negatives (FN): results that were wrongly classified as negative even though
   they should have been positive, i.e., fraudulent transactions that were classified
   as non-fraud

The classification results can be displayed in a confusion matrix, also known as an
error matrix, as shown in the figure below.

Figure 7: The Confusion Matrix

Source: Created on behalf of IU (2022).

Using these categories, the above-mentioned metrics can be computed.

Accuracy is an indicator for how many samples were classified correctly. It can be com-
puted as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

It measures the percentage of all predictions that were correct. Precision denotes the
number of positive samples that were classified correctly in relation to all samples
predicted as positive:

Precision = TP / (TP + FP)

Recall indicates how many of the positively detected samples were identified correctly in
relation to the total number of samples that should have been identified as such:

Recall = TP / (TP + FN)

Finally, the F-score combines precision and recall in one score:

F = 2 · (precision · recall) / (precision + recall)

In classification tasks with more than two classes, these metrics can be calculated for
every class. The per-class values can then be averaged to obtain one metric for all
classes.
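The four metrics above can be computed directly from the confusion-matrix counts. The following Python sketch uses hypothetical fraud-detection counts chosen purely for illustration; the function name is an assumption, not part of the course material.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical fraud-detection results: 80 frauds caught (TP), 20 false alarms (FP),
# 890 legitimate transactions passed (TN), and 10 frauds missed (FN).
acc, prec, rec, f = classification_metrics(tp=80, fp=20, tn=890, fn=10)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f={f:.3f}")
# accuracy=0.970 precision=0.800 recall=0.889 f=0.842
```

Note how the high accuracy (0.970) is mostly driven by the many true negatives, while precision and recall give a more informative picture of how well the rare fraud class is actually detected.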

SUMMARY
There are two types of AI: narrow and general. Current AI systems all
belong to the category of ANI. ANI can solve complex problems faster
than humans. However, its capabilities are limited to the domain for
which it has been programmed. Even though the term ANI might suggest
a limitation, it is embedded in many areas of our lives. In contrast to
that, AGI (AI which has the cognitive abilities to transfer knowledge to
other areas of application) remains a theoretical construct, but is still an
important research topic.

The application areas for AI are almost unlimited. AI has had a signifi-
cant impact on today’s corporate landscape. Use cases, such as the opti-
mization of service operations, the enhancement of products based on
AI, and automation of manual processes, can help companies towards
optimizing their business functions. Those use cases stretch across a
wide range of industries, be it automotive and assembly, financial serv-
ices, healthcare and pharma, consumer goods, and many more.

UNIT 3
REINFORCEMENT LEARNING

STUDY GOALS

On completion of this unit, you will be able to …

– explain the basic principles of reinforcement learning.


– understand Markov decision processes.
– use the Q-learning algorithm.
3. REINFORCEMENT LEARNING

Introduction
Imagine you are lost in a labyrinth and have to find your way out. As you are there for
the first time, you do not know which way leads to the exit. Moreover, there are
dangerous fields in the labyrinth, and you should avoid stepping on them.

You have four actions you can perform: move up, down, left, or right. As you do not know
the labyrinth, the only way to find your way out is to see what happens when you perform
random actions. During the learning process, you will find that there are fields in the
labyrinth that reward you by letting you escape. However, there are also fields where
you will receive a negative reward, as they are dangerous to step on. After some time,
the experience you have gained walking around will let you find your way out without
stepping on the dangerous fields. This process of learning by reward is called
reinforcement learning.

Figure 8: Initial Situation in the Labyrinth

Source: Created on behalf of IU (2022).

In this unit, you will learn more about the basic ideas of reinforcement learning and the
underlying principles. Moreover, you will get to know algorithms, such as Q-learning, that
can help you optimize the learning experience.

3.1 What is Reinforcement Learning?


Generally, in machine learning, there exist three techniques to train a specific learning
model: supervised, unsupervised, and reinforcement learning.

In supervised learning, a machine learns how to solve a problem based on a previously
labeled data set. Typical application areas for supervised learning are regression and
classification problems, such as credit risk estimation or spam detection. Training
these kinds of algorithms takes much effort because a large amount of pre-labeled
training data is required.

In unsupervised learning, training is performed using unlabeled data to discover the
underlying patterns. Based on the input data, clusters are identified which can later be
used for classification. This approach is often used to organize massive amounts of
unstructured data, such as customer behavior, to identify relevant peer groups.

Reinforcement learning techniques follow a more explorative approach. Algorithms based
on this approach improve themselves by interacting with the environment. In contrast to
supervised and unsupervised learning, no predefined data is required. An agent learns on
an unknown set of data based on the reward the environment returns to the agent. The
following table summarizes the basic terms of reinforcement learning.

Table 1: Basic Terms of Reinforcement Learning

Agent: Performs actions in an environment and receives rewards for doing so
Action (A): The set of all possible actions the agent can perform
Environment (E): The scenario the agent must explore
States (S): The set of all possible states in the given environment
Reward (R): Immediate feedback from the environment to reward an agent's action
Policy (π): The policy the agent applies to determine the next action based on the
current state
Value (V): The long-term value of the current state S using the policy π

Source: Created on behalf of IU (2022).

Within the process of reinforcement learning, the agent starts in a certain state s_t ∈ S and
applies an action a_t ∈ A(s_t) to the environment E, where A(s_t) is the set of actions
available at state s_t. The environment reacts by returning a new state s_{t+1} and a reward
r_{t+1} to the agent. In the next step, the agent will apply the next action a_{t+1} to the
environment, which will again return a new state and a reward.

In the introductory example, you are acting as the agent in the labyrinth environment. The
actions you can perform are to move up, down, left, or right. After each move, you will
reach another state by moving to another field in the labyrinth. Each time you perform an
action, you will receive a reward from the environment. It will be positive if you reach the
door or negative if you step on a dangerous field. From your new position, the whole
learning cycle will start again. Your goal will be to maximize your reward. The process of
receiving a reward as a function of a state-action pair can be formalized as follows:

f(s_t, a_t) = r_{t+1}

The whole process of agent-environment interaction is illustrated in the figure below.

Figure 9: The Process of Reinforcement Learning

Source: Created on behalf of IU (2022).

The process of an action being selected from a given state, transitioning to a new state,
and receiving a reward happens repeatedly. For a sequence of discrete time steps
t = 0, 1, 2, … starting at the state s_0 ∈ S, the agent-environment interaction will lead to a
sequence:

s_0, a_0, r_1, s_1, a_1, r_2, s_2, a_2, r_3, s_3, …

The goal of the agent is to maximize the reward it will receive during the learning process.
The cycle will continue until the agent ends in a terminal state. The total reward R_t after a
time T can be computed as the sum of rewards received up to this point:

R_t = r_{t+1} + r_{t+2} + … + r_T

This reward is also referred to as the value V^π(s) of the state s under the policy π. In our
example, the maximum reward will be received once you reach the exit of the labyrinth. We
will have a closer look at the value function in the next section.
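The interaction cycle described above can be sketched in a few lines of code. This is an illustrative sketch only: the `policy` and `env_step` functions, the state encoding, and the reward values are assumptions, not part of the course material.

```python
def run_episode(policy, env_step, start_state, max_steps=100):
    """Run one episode of the agent-environment loop and return the
    visited states and the total reward R_t = r_{t+1} + ... + r_T."""
    state = start_state
    total_reward = 0.0
    trajectory = [state]
    for _ in range(max_steps):
        action = policy(state)                         # agent picks a_t in state s_t
        state, reward, done = env_step(state, action)  # env returns s_{t+1}, r_{t+1}
        total_reward += reward
        trajectory.append(state)
        if done:                                       # terminal state (exit or trap)
            break
    return trajectory, total_reward
```

In the labyrinth example, `env_step` would return a positive reward for the door field and a negative one for dangerous fields.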

3.2 Markov Decision Process and Value Function
To be able to evaluate different paths in the labyrinth, we need a suitable approach to
compare interaction sequences. One method to formalize sequential decision-making is
Markov Decision Processes (MDP). In the following, we will discuss how MDPs work.

The Markov Decision Process

MDPs are used to estimate the probability of a future event based on a sequence of possi-
ble events. If a present state holds all the relevant information about past actions, it is said
to have the “Markov property”. In reinforcement learning, the Markov property is critical
because all decisions and values are functions of the present state (Sutton & Barto, 2018),
i.e., decisions are made depending on the environment’s state.

When a task in reinforcement learning satisfies the Markov property, it can be modeled as
an MDP. The process representing the sequence of events in an MDP is called a Markov
chain. If the Markov property is satisfied, in every state of a Markov chain, the probability
that another state is reached depends solely on two factors: the transition probability of
reaching the next state and the present state. MDPs consist of the following components:

• States S
• Actions A
• Rewards for an action at a certain state: R(s, a, s′)
• Transition probabilities for the actions to move from one state to the next state: T_a(s, s′)

Because of the Markov property, the transition function depends only on the current state:

P(s_{t+1} | s_t, a_t, s_{t−1}, a_{t−1}, …) = P(s_{t+1} | s_t, a_t) = T_{a_t}(s_t, s_{t+1})

The equation above states that the probability P of transitioning from state s_t to state
s_{t+1} given an action a_t depends only on the current state s_t and action a_t and not on
any previous states or actions.

Which action is picked in a certain state is described by the Policy π:

π(s, a) = P(a_t = a | s_t = s)

Using our labyrinth example, the position at which you stand offers no information about
the sequence of states you took to get there. However, your position in the labyrinth repre-
sents all the required information for the decision about your next state, which means it
has the Markov property.
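The Markov property can be made concrete in code: sampling the next state consults only the current state and action, never the history. The transition table below (a "slippery" move that sometimes fails) is an invented example, not part of the course text.

```python
import random

# Hypothetical transition probabilities T_a(s, s'): the next state depends
# only on the current state s and the chosen action a (Markov property).
T = {
    ("s0", "right"): {"s1": 0.9, "s0": 0.1},  # slippery move: 10% chance to stay put
    ("s1", "right"): {"s2": 1.0},
}

def sample_next_state(state, action, rng=random.random):
    """Draw s' from T_a(s, .): no history beyond (s, a) is consulted."""
    dist = T[(state, action)]
    threshold, acc = rng(), 0.0
    for next_state, p in dist.items():
        acc += p
        if threshold < acc:
            return next_state
    return next_state
```

Passing a fixed `rng` makes the sampling deterministic, which is convenient for testing.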

The Value Function

In addition to the previously explained concepts, reinforcement learning algorithms use
value functions. Value functions give an estimation of how good it is for an agent to be
in a given state and to perform a specific action in that state (Sutton & Barto, 2018).

Previously, we learned that the value of a state can be computed as the sum of rewards
received within the learning process. Additionally, a discount rate can be used to evaluate
the rewards of future actions at the present state. The discount rate indicates the
likelihood of reaching a reward state in the future. This helps the agent select actions more
precisely according to the expected reward. An action a_{t+1} will then be chosen to
maximize the expected discounted return:

V^π(s) = E_π[ r_{t+1} + γ r_{t+2} + … + γ^{T−t−1} r_T | s_t = s ]

       = E_π[ ∑_{k=0}^{T−t−1} γ^k r_{t+k+1} | s_t = s ]

where γ is the discount rate, with 0 ≤ γ ≤ 1, expressing how strongly future rewards are
weighted. A value of γ closer to 1 gives future rewards a higher weight. Especially in sce-
narios where the length of time the process will take is not known in advance, it is impor-
tant to set γ < 1, as otherwise the value function will not converge.
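The discounted return can be computed directly from a list of rewards. The reward sequence in the example below is invented for illustration.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma^k * rewards[k], i.e., the discounted return of
    one trajectory of rewards r_{t+1}, r_{t+2}, ... seen from time t."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```

For instance, with γ = 0.9, a reward of 10 that lies two steps in the future contributes only 0.9² · 10 = 8.1 to the value of the current state.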

The following figure illustrates which action the agent should optimally perform in the
respective states of the labyrinth to maximize the reward, i.e., trying to reach the exit and
avoid the dangerous field.

Figure 10: Transitions in the Labyrinth

Source: Created on behalf of IU (2022).

3.3 Temporal Difference and Q-Learning
So far, we have discussed model-based reinforcement learning. That means that an agent
tries to understand the model of the environment. All decisions are based on a value func-
tion, which depends on the current state and the future state in which the agent will end up.

In contrast to this, model-free approaches evaluate the quality of individual actions
directly. Q-learning is a very well-known model-free reinforcement learning algo-
rithm and is based on the concept of temporal difference learning. In the following, we will
explain the underlying concepts of temporal difference and Q-learning.

Temporal Difference Learning

As temporal difference (TD) learning is a model-free approach, no model of the
learning environment is required. Instead, learning happens directly from the experience in
a system that is partially unknown. As the name indicates, TD learning makes predictions
based on the fact that there is often a correlation between subsequent predictions. The
most prominent example to illustrate the principle of TD learning by Sutton (1988) is
about forecasting the weather. Let’s say we want to predict the weather on a Monday. In a
supervised learning approach, one would use the prediction of every day and compare it
to the actual outcome. The model would be updated once it is Monday. In contrast to that,
a TD approach compares the prediction of each day to the prediction of the following day,
i.e., it considers the temporal difference between subsequent days and updates each day's
prediction based on the prediction for the following day. Therefore, TD learning makes
better use of the experience over time.

Q-Learning

One well-known algorithm based on TD learning is Q-learning. After initialization, the
agent conducts random actions, which are then evaluated. Based on the outcome of an
action, the agent adapts its behavior for subsequent actions.

The goal of the Q-learning algorithm is to maximize the quality function Q(s, a), i.e., to
maximize the cumulative reward while being in a given state s by predicting the best
action a (van Otterlo & Wiering, 2012). During the learning process, Q(s, a) is iteratively
updated using the Bellman equation:

Q(s, a) = r + γ max_{a′} Q(s′, a′)

The Bellman equation computes the expected reward in an MDP of taking an action in a
certain state. The reward is broken into the immediate reward and the total future
expected reward.

All Q-values computed during the learning process are stored in the Q-matrix. In every
iteration, the matrix is used to find the best possible action. When the agent has to
perform a new action, it will look for the maximum Q-value of the state-action pair.

The Q-learning Algorithm

In the following, we will walk through the Q-learning algorithm. The algorithm consists of
an initialization and an iteration phase. In the initialization phase, all values in the Q-table are
set to 0. In the iteration phase, the agent will perform the following steps:

1. Choose an action for the current state. In this phase there are two different strategies
that can be followed:
• Exploration: perform random actions in order to find out more information about
the environment
• Exploitation: perform actions based on the information which is already known
about the environment based on the Q-table. The goal is to maximize the return
2. Perform the chosen action
3. Evaluate the outcome and get the value of the reward. Based on the result the Q-table
will be updated.
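The steps above can be sketched as tabular Q-learning in Python. The corridor environment, reward values, and hyperparameters (episodes, ε, γ) used here are invented for illustration; the update follows the simple Bellman form given in the text, while practical implementations usually add a learning rate α.

```python
import random

def q_learning(env_step, states, actions, start, episodes=500,
               gamma=0.9, epsilon=0.1, max_steps=50):
    """Tabular Q-learning with epsilon-greedy action selection.
    Update rule: Q(s, a) = r + gamma * max_a' Q(s', a')."""
    Q = {(s, a): 0.0 for s in states for a in actions}   # initialization phase
    for _ in range(episodes):                            # iteration phase
        s = start
        for _ in range(max_steps):
            if random.random() < epsilon:                # exploration
                a = random.choice(actions)
            else:                                        # exploitation
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = env_step(s, a)                 # perform the chosen action
            if done:
                Q[(s, a)] = r                            # terminal: no future reward
            else:
                Q[(s, a)] = r + gamma * max(Q[(s2, a2)] for a2 in actions)
            s = s2
            if done:
                break
    return Q
```

After training, reading the action with the maximum Q-value per state yields the learned policy.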

SUMMARY
Reinforcement learning deals with finding the best strategy for how an
agent should behave in an environment to achieve a certain goal. The
learning process of that agent happens based on a reward system which
either rewards the agent for good decisions or punishes it for bad ones.

To model the process of the agent moving in the environment, Markov
decision processes can be used. A value function can be applied to the
system to better evaluate the quality of future decisions.

The Q-learning algorithm is a model-free approach from temporal differ-
ence learning in which the agent gathers information about the environ-
ment based on exploration and exploitation.

Overall, the reinforcement learning process is very similar to learning
through trial and error in real life.

UNIT 4
NATURAL LANGUAGE PROCESSING

STUDY GOALS

On completion of this unit, you will be able to…

– explain the historical background of NLP.
– name the most important areas of application.
– distinguish between statistical- and rule-based NLP techniques.
– understand how to vectorize data.
4. NATURAL LANGUAGE PROCESSING

Introduction
Natural language processing (NLP) is one of the major application domains in artificial
intelligence.

NLP can be divided into three subdomains: speech recognition, language understanding,
and language generation. Each will be addressed in the following sections. After an intro-
duction to NLP and its application areas, you will learn more about the basic NLP techni-
ques and how data vectorization works.

4.1 Introduction to NLP and Application Areas
NLP is an interdisciplinary field with roots in computer science (especially the area of arti-
ficial intelligence), cognitive science, and linguistics. It deals with processing, understand-
ing, and generating natural language (Kaddari et al., 2021). In human-computer interac-
tion, NLP has a key role when it comes to making the interaction more natural. Therefore,
the goal of NLP is to use and interpret language on a similar level to that of humans. This
does more than just help humans to interact with the computer using natural language;
there are many interesting use cases, ranging from automatic machine translation to gen-
erating text excerpts, or even complete literary works. As mentioned above, there are
three subdomains in NLP:

1. Speech recognition: identifies words in spoken language and includes speech-to-text
processing
2. Natural language understanding: extracts the meaning of words and sentences as well
as reading comprehension
3. Natural language generation: generates meaningful sentences and texts

All these subdomains build on methods from artificial intelligence and form the basis for
the areas of application of NLP.

Historical Developments

Early NLP research dates back to the seventeenth century, when Descartes and Leibniz
conducted some early theoretical research about NLP (Schwartz, 2019). It became a tech-
nical discipline in the mid-1950s. The geopolitical tension between the former Soviet
Union and the United States led to an increased demand for English-Russian translators.
Therefore, attempts were made to outsource translation to machines. Even though the first
results were promising, machine translation turned out to be much more complex than
originally thought, especially as no significant progress could be seen. In 1964 the Auto-
matic Language Processing Advisory Committee classified the NLP technology as “hope-
less” and decided to temporarily stop the research funding in this area. This was seen as
the start of the NLP winter.

Almost 20 years after the NLP winter began, NLP started to regain interest. This was due to
the following three developments:

1. Increase of computing power: Computing power significantly increased, allowing for
more computationally intensive algorithms following Moore's law.
2. Shift of paradigms: Early language models were based on a grammatical approach
that tried to implement complex rule-based systems to deal with the complexity of
everyday language. More recent research had shifted towards models that are based
on statistical and decision-theoretic foundations, such as decision trees.
3. Part-of-speech tagging (POS): For this technique, a text is split into smaller units, i.e.,
individual sentences, words, or sub-words. Using POS tagging, grammatical word
functions and categories are added to a given text. This allows speech to be described
using Markov models, in which the next state is defined based on the current state and
a set of transition probabilities. In contrast to approaches that consider the whole
history, this is a major reduction of complexity.

Taken together, the shift to statistical, decision-theoretic, and machine learning models
increased the robustness of NLP, especially concerning the ability to deal with unknown
constellations. Moreover, the improved computing power allowed a much bigger amount
of training data to be processed, which was now available because of the growing amount
of electronic literature. This opened up big opportunities for the available algorithms to
learn and improve.

NLP and the Turing Test

One of the early pioneers in AI was the mathematician and computer scientist Alan Mathi-
son Turing. In his research, he formed the theoretical foundation of what became the
Turing test (Turing, 1950). In the test, a human test person uses a chat to interview two
chat partners: another human and a chatbot. Both try to convince the test person that
they are human. If the test person cannot identify which of their conversational partners is
human and which is the machine, the test is successfully passed. According to Turing,
passing the test allows the assumption that the intellectual abilities of a computer are on
the same level as the human brain.

The Turing test primarily addresses the natural language processing abilities of a machine.
Therefore, the Turing test has often been criticized as being too focused on functionality
and not on consciousness. One early attempt to pass the Turing test was done by Joseph
Weizenbaum who developed a computer program to simulate a conversation with a psy-
chotherapist (Weizenbaum, 1966). His computer program ELIZA was one of the first con-
versational AIs. To process the sentence entered by the user, ELIZA utilizes rule-based pat-
tern matching combined with a thesaurus. The publication got some remarkable feedback
from the community. Nevertheless, the simplicity of this approach was soon recognized
and according to the expectations from the community, ELIZA did not pass the Turing test.

In 2014, the chatbot “Eugene Goostman” was the first AI that seemed to have
passed the Turing test. The chatbot pretended to be a 13-year-old boy from Ukraine who
was not a native English speaker. This trick was used to explain why the bot did not know
everything and sometimes made mistakes with the language. However, this trick was also
the reason why the validity of the experiment was later questioned (Masnick, 2014).

Application Areas of NLP

Now we will briefly describe the major application areas of NLP.

Topic identification

As the name indicates, topic identification deals with the challenge of automatically finding
the topics of a given text (May et al., 2015). This can be done either in a supervised or in an
unsupervised way. In supervised topic identification, a model can, for instance, be trained
on newspaper articles that have been labeled with topics, such as politics, sports, or cul-
ture. In an unsupervised setting, the topics are not known in advance. In this case, the
algorithm has to deal with topic modeling or topic discovery to find clusters with similar
topics.

Popular use cases for topic identification are, for instance, social media and brand moni-
toring, customer support, and market research. Topic identification can help find out what
people think about a brand or a product. Social media provides a tremendous amount of
text data that can be analyzed for these use cases. Customers can be grouped according
to their interests, and reactions to certain advertisements or marketing campaigns can be
easily analyzed. When it comes to market research, topic identification can help when
analyzing open answers in questionnaires. If those answers are pre-classified, it can
reduce the effort to analyze open answers.

Moreover, in customer support, topic identification can be beneficial by categorizing the
customers' requests by topics. Automatically forwarding requests to specialized employ-
ees can not only reduce costs, but also increase customer satisfaction.

Text summarization

Text summarization deals with methods to automatically generate summaries of a given
text that contain the most relevant information from the source. Algorithms for text sum-
marization are based on extractive and abstractive techniques. Extractive algorithms pro-
duce a summary of a given text by extracting the most important word sequences.
Abstractive techniques, conversely, generate summaries by creating a new text and
rewriting the content of the original document.

A common text summarization technique that works in an unsupervised extractive way is
TextRank (Mihalcea & Tarau, 2004). This algorithm compares every sentence of a given text
with all other sentences by computing a similarity score for every pair of sentences. A
score closer to one indicates a higher similarity between two sentences; a sentence that is
similar to many other sentences represents the content of the text well. For each sentence,
the scores are summed to obtain a sentence rank. After sorting the sentences according to
their rank, it is easy to evaluate the importance of each one and create a summary from a
predefined number of sentences with the highest rank.
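The ranking step can be illustrated with a simplified sketch. Note that this is a rough stand-in: real TextRank scores sentences with a graph-based algorithm similar to PageRank, whereas the sketch below simply sums pairwise word-overlap similarities. The example sentences are invented.

```python
def summarize(sentences, top_n=1):
    """Rank sentences by summed pairwise word-overlap similarity (a
    simplified stand-in for TextRank's graph-based score) and return the
    top_n highest-ranked sentences in their original order."""
    token_sets = [set(s.lower().split()) for s in sentences]

    def similarity(a, b):
        # Jaccard overlap in [0, 1]: 1 means identical word sets
        return len(a & b) / len(a | b) if a | b else 0.0

    ranks = [
        sum(similarity(token_sets[i], token_sets[j])
            for j in range(len(sentences)) if j != i)
        for i in range(len(sentences))
    ]
    best = sorted(range(len(sentences)),
                  key=lambda i: ranks[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares many words with the rest of the text ends up with a high rank and is selected for the summary.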

There are two major challenges when dealing with supervised extractive text summariza-
tion, as training requires a lot of hand-annotated text data. These are:

1. It is necessary that the annotations contain the words that have to be in the summary.
When humans summarize texts, they tend to do this in an abstract way. Therefore, it is
hard to find training data in the required format.
2. The decision about what information should be included in the summary is subjective
and depends on the focus of a task. While a product description would focus more on
the technical aspects of a text, a summary of the business value of a product will put
the emphasis on completely different aspects.

A typical use case for text summarization is presenting a user with a preview of the content of
search results or articles. This makes it easier to quickly analyze a huge amount of infor-
mation. Moreover, in question answering, text summarization techniques can be used to
help a user find answers to certain questions in a document.

Sentiment analysis

Sentiment analysis captures subjective aspects of texts (Nasukawa & Yi, 2003), such as
analyzing the author's mood in a tweet on Twitter. Like topic identification, sentiment
analysis deals with text classification. The major difference between topic identification
and sentiment analysis is that topic identification focuses on objective aspects of the text
while sentiment analysis centers on subjective characteristics like moods and emotions.

The application areas for sentiment analysis are manifold. Customer sentiment analysis
has gained much traction as a research field lately. The ability to track customers’ senti-
ments over time can, for instance, give important insights about how customers react to
changes of a product/a service or how external factors like global crises influence custom-
ers’ perceptions. Social networks, such as Facebook, Instagram, and Twitter, provide huge
amounts of data about how customers feel about a product. Having a better understand-
ing of customers' needs can help modify and optimize business processes accordingly.
Detecting emotions from user-generated content comes with some big challenges when
dealing with irony/sarcasm, negation, and multipolarity.

There is much sarcasm in user-generated content, especially in social media. Even for
humans, it can sometimes be hard to detect sarcasm, which makes it even more difficult
for a machine. Let us, for instance, look at the sentence

“Wow, your phone has an internal storage of 1 Gigabyte?”

Only a few years back this would have been a straightforward sentence. Now, if said about
a modern smartphone, it is easy for a human to tell that this statement is sarcastic. While
there has been some recent success in sarcasm detection using methods from deep learn-
ing (Ghosh & Veale, 2016), dealing with sarcasm remains a challenging task.

Negation is another challenge when trying to detect a statement's sentiment. Negation
can be explicit or implicit, and can also be expressed through the morphology of a word,
denoted by prefixes, such as “dis-” and “non-,” or suffixes, such as “-less”. Double negation
is another language construct that can easily be misunderstood. While double negatives
will cancel each other out most of the time, in some contexts they can also intensify the
negation. Considering negation in the model used for sentiment analysis can help to
significantly increase the accuracy (Sharif et al., 2016).

An additional challenge in sentiment analysis can be multipolarity, meaning that some
parts of the text can be positive while others are negative. Given the sentence “The display
of my new phone is awesome, but the audio quality is really poor”, the sentiment for the
display is positive while it is negative for the speakers. Simply calculating the average of
the sentiment might lead to information loss. Therefore, a better approach to tackle this
issue would be to split the sentence into two parts: one for the positive review of the dis-
play and one for the negative feedback about the speakers.
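The suggested splitting approach can be sketched as follows. The tiny sentiment lexicon and the single contrast marker "but" are illustrative assumptions, not a real sentiment model.

```python
# Illustrative lexicon; a real system would use a trained model instead.
LEXICON = {"awesome": 1.0, "great": 1.0, "poor": -1.0, "terrible": -1.0}

def clause_sentiments(sentence):
    """Split a sentence at the contrast marker 'but' and score each clause
    as the sum of the lexicon values of its words."""
    clauses = [c.strip(" ,.") for c in sentence.lower().split(" but ")]
    return [(clause,
             sum(LEXICON.get(w.strip(".,"), 0.0) for w in clause.split()))
            for clause in clauses]
```

Applied to the example sentence above, the first clause receives a positive score and the second a negative one, instead of a washed-out average over the whole sentence.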

Named entity recognition

Named entity recognition (NER) deals with the challenge of locating and classifying
named entities in an unstructured text. Those entities can then be assigned to categories
such as names, locations, time and date expressions, organizations, quantities, and many
more. NER plays an important role in understanding the content of a text. Especially for
text analysis and data organization, NER is a good starting point for further analysis. The
following figure shows an example of how entities can be identified from a sentence.

Figure 11: Example for Named Entity Recognition

Source: Created on behalf of IU (2022).

NER can be used in all domains where categorizing text can be advantageous. For
instance, tickets in customer support can be categorized according to their topics. Tickets
can then automatically be forwarded to a specialist. Also, if data has to be anonymized
due to privacy regulations, NER can help to save costs. It can identify personal data and
automatically remove it. Depending on the quality of the underlying data, manual cleanup
is no longer necessary. Another use case is to extract information from candidate resumes
in the application process. It can significantly decrease the workload of the HR depart-
ment, especially when there are many applicants (Zimmermann et al., 2016).

The biggest challenge in NER is that training a model requires a large amount of
annotated data. The resulting model will always be limited to the specific tasks and the
specific subset of entities on which it has been trained.

Translation

Machine translation (MT) is a subfield of NLP that combines several disciplines. Using
methods from artificial intelligence, computer science, information theory, and statistics,
in MT text or speech are automatically translated from one language to another.

In the last decades, the quality of MT has significantly improved. In most cases, the quality
of machine translations is still not as good as those done by professional translators. How-
ever, combining MT and manual post-processing is nowadays often faster than translating
everything manually. Like in any other area of NLP, the output quality depends signifi-
cantly on the quality of the training data. Therefore, often domain-specific data is used.
While in the past, the most commonly used method was statistical machine translation
(SMT), neural machine translation (NMT) has become more popular (Koehn & Knowles,
2017). In statistical machine translation, translations are generated using statistical
models that were built based on the analysis of bilingual text corpora. In neural machine
translation, an artificial neural network is used to learn a statistical model for MT.

MT can be used for text-to-text translations as well as speech-to-speech translations. Using
MT for text can help quickly translate text documents or websites, assist professional
translators to accelerate the translation process, or serve as part of a speech-to-speech
translation system. As globalization progresses, MT has become more important every day.
In 2016, Google was translating over 100 billion words per day in more than 100 languages
(Turovsky, 2016). The following figure shows how text-to-text translation is interlinked with
speech-to-speech translation.

Figure 12: Text-to-text Translation as a Part of Speech-to-Speech Translation

Source: Created on behalf of IU (2022).

Text-to-text translation and speech-to-speech translation are becoming increasingly
important. This process has been accelerated by the huge increase of video chats and con-
ferences in recent years. Speech-to-speech translation can help bridge language barriers
using applications such as the Skype translator.

As the figure illustrates, the core of a speech-to-speech translation system is a text-to-text
translation system. Before the translation starts, speech has to be converted into text
using methods from automatic speech recognition (ASR). After the translation, text-to-
speech (TTS) synthesis is used to produce speech from the translated text. Therefore, the
quality of the output does not only depend on the quality of the MT component, but also
on the quality of the ASR and TTS components, which makes speech-to-speech-transla-
tion challenging.

Nowadays, the two biggest challenges in MT are domain mismatch and under-resourced
languages. Domain mismatch means that words and sentences can have different transla-
tions based on the domain. Thus, it is important to use domain adaption when developing
an MT system for a special use case.

For some combinations of languages in MT, there are no bilingual text corpora available
for source and target language. One approach to solving the problem of under-resourced
languages is to use pivot MT. In pivot MT, the source and target language are bridged using
a third language (Kim et al., 2019). When, for instance, translating from Khmer (Cambodia)
to Zulu (South Africa), a text will first be translated from Khmer to English and afterwards
from English to Zulu.

Chatbots

Chatbots are text-based dialogue systems. They allow interaction with a computer based
on text in natural language. Based on the input, the system will reply in natural language.
Sometimes, chatbots are used in combination with an avatar simulating a character or
personality. One of the most popular chatbots was ELIZA, imitating a psychotherapist.
Chatbots are often used in messenger apps, such as Facebook, or website chats. Moreover,
they form the basis for digital assistants, like Alexa, Siri, or Google Assistant.

Chatbots can be categorized according to their level of intelligence:

• Notification assistants (level 1): These chatbots only interact unidirectionally with the
user. They can be used for notifications about events or updates (i.e., push notifica-
tions).
• Frequently asked questions assistants (level 2): Those bots can bi-directionally interact
with a user. They can interpret the user’s query and find an appropriate answer in a
knowledge base.
• Contextual assistants (level 3): These chatbots can not only interact bidirectionally, but
also be context-aware based on the conversation history.

In the future, it is likely that further levels of chatbots will evolve. A chatbot is based on
three components:

1. Natural language understanding (NLU): This component parses the input text and
identifies the intent and the entities of the user (user information).
2. Dialogue management component: The goal of this component is to interpret the
intent and entities identified by the NLU in context with the conversation and decide
the reaction of the bot.
3. Message generator component: Based on the output of the other components, the
task of this component is to generate the answer of the chatbot by either filling a pre-
defined template or by generating a free text.
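The three components listed above can be sketched as a minimal pipeline. The intents, keyword patterns, and answer templates below are invented examples, not part of the course text; a real system would use trained NLU models rather than keyword matching.

```python
def nlu(text):
    """1. NLU: map the input text to an intent (keyword-based for brevity)."""
    text = text.lower()
    if "book" in text and "flight" in text:
        return {"intent": "book_flight"}
    if "hello" in text:
        return {"intent": "greet"}
    return {"intent": "unknown"}

def dialogue_manager(parsed):
    """2. Dialogue management: decide the bot's reaction to the intent."""
    return {"greet": "reply_greeting",
            "book_flight": "ask_destination"}.get(parsed["intent"], "handover")

def generate_message(action):
    """3. Message generation: fill a predefined template for the action."""
    templates = {"reply_greeting": "Hello! How can I help you?",
                 "ask_destination": "Sure - where would you like to fly to?",
                 "handover": "Let me forward you to a human colleague."}
    return templates[action]

def chatbot(text):
    """Chain the three components into one reply."""
    return generate_message(dialogue_manager(nlu(text)))
```

The `handover` action illustrates the interface to a human support team mentioned above: any request the bot cannot interpret is forwarded.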

Chatbots can save a lot of time and money. Therefore, use cases are increasing continu-
ously. They are normally available 24/7 at comparatively low costs and can easily be
scaled if necessary. In customer service, they can not only reply to customers’ requests,
but also give product recommendations or make travel arrangements, such as hotel or
flight reservations. If a request is too complicated for a bot, there are usually interfaces to
forward requests to a human support team.

In marketing, chatbots can be used to generate leads. An increasing number of company
websites are already using bots to ask sales-oriented questions to otherwise anonymous
visitors. The replies can be used to acquire new leads. Moreover, the chatbot can provide
guidance or present new products or information about special offers. Using chatbots as
personal sales agents can also help reduce the efforts of humans, which allows them to
focus on more complex tasks.

4.2 Basic NLP Techniques


Early systems for NLP applied rules based on linguistic structures. Those rules were often
handwritten and only applied to the domain for which they were designed. Nowadays,
most NLP systems use statistical methods from machine learning.

Rule-Based Techniques

Rule-based techniques for NLP use a set of predefined rules to tackle a given problem.
Those rules try to reproduce the way humans build sentences. A simple example for a
rule-based system is the extraction of single words from a text based on the very simple
rule “divide the text at every blank space”. Looking at terms like “New York” already illus-
trates how fast a seemingly simple problem can get complicated. Therefore, more com-
plex systems are based on linguistic structures using formal grammars.
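The blank-space rule from the paragraph above takes one line of Python, and the "New York" example immediately shows its limits:

```python
def tokenize(text):
    """Apply the simple rule: divide the text at every blank space."""
    return text.split(" ")

print(tokenize("I enjoy studying artificial intelligence"))
# The rule splits the single name "New York" into two separate tokens:
print(tokenize("She moved to New York"))
```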

The rule-based approach implies that, to build a system, humans have to be involved in
the process. Because of this, one of the major advantages of rule-based systems is the
explainability: As the rules have been designed by humans, it is easy to understand how a
task has been processed and to locate errors.

Rule-based systems can be developed flexibly. Typically, it is unnecessary to make
changes to the core of an application when rules are changed, be it by adding new rules or
correcting existing ones. Another advantage of rule-based systems is that the amount of
training data required to develop the system is comparatively small.

The major drawback of the rule-based approach is that it requires experts to build a set of
appropriate rules. Moreover, rule-based systems are normally built in a domain-specific
way, which makes it difficult to use a system in a domain for which it was not designed.

Statistical-Based Techniques

Since computational power has increased in the past decades, systems based on statisti-
cal methods – which are often subsumed under the term machine learning – have
replaced most of the early rule-based systems. Those methods follow a data-driven
approach. The models generated by statistical-based methods are trained with a huge
amount of training data to derive the rules of a given task. After that, the models can be
used to classify a set of unknown data to make predictions. In contrast to rule-based sys-
tems, statistical-based systems do not require expert knowledge about the domain. They
can easily be developed based on existing methods and improved by providing appropri-
ate data. Also transferring the model to another domain is much easier than for rule-based
systems.

However, one disadvantage of systems based on statistical machine learning techniques is
that much annotated training data is required to produce good results, whereas rule-
based systems can perform well when there is only limited data available for a specific
task.

Tasks

In general, NLP tasks can be divided into four categories: syntax, semantics, discourse,
and speech. In the following, we will give an overview of those tasks.

Syntax

Syntactical tasks in NLP deal with the features of language such as categories, word boun-
daries, and grammatical functions. Typical tasks dealing with the syntax are tokenization
and part-of-speech (POS) tagging.

The goal of tokenization is to split a text into individual units such as words, sentences, or
sub-word units. For instance, the sentence “I enjoy studying artificial intelligence.” could
be tokenized into “I”, “enjoy”, “studying”, “artificial”, “intelligence”, “.”.

POS tagging – also called grammatical tagging – goes one step further and adds grammat-
ical word functions and categories to the text. The following example illustrates how a
sentence can be analyzed using POS tagging.

Figure 13: Part-of-Speech Tagging

Source: Created on behalf of IU (2022).

Syntactic ambiguity, i.e., words that cannot be clearly assigned to a category, is a big
challenge in NLP.

One commonly used example for syntactic ambiguity is the sentence

“Time flies like an arrow”

which can be interpreted in many different ways. Two of the possible interpretations are

1. Time passes as quickly as an arrow.
2. There exists a particular arrow such that every “time fly” (insect) likes that arrow.

In the first interpretation, “like” is a comparative preposition, while in the second interpre-
tation it is a verb.

Semantics

The focus of semantic tasks is on the meaning of words and sentences. Understanding the
meaning of a text is essential for most application areas of NLP. In sentiment analysis, sub-
jective aspects of the language are analyzed. For instance, when analyzing posts on social
media, it is important to understand what the text means to identify whether it is positive
or negative. Named entity recognition (NER) is another research field where semantics are
important for correct classification results. Identifying entities, such as names, locations,
or dates, from a given text cannot be done without understanding the semantics of a text.
In topic identification, a given text is labeled with a topic. Therefore, it is important to
understand what the text is about. For instance, newspaper articles could be labeled with
topics such as “politics”, “culture”, or “weather”.

If NLP is used for answering questions, a computer needs to create an appropriate answer
to a certain question. Assuming a question answering algorithm was trained on this course
book, the algorithm might display this section when asked “What are the typical tasks in
NLP?” For this purpose, the semantics of this section must be interpreted correctly. Also,
in machine translation, understanding the correct meaning of a text is essential. Other-
wise, the translation will yield results that are hard to understand or even wrong.

Figure 14: The Importance of Semantics in NLP

Source: Created on behalf of IU (2022).

The figure above illustrates how important it is to properly understand the semantics of a
text.

Discourse

Discourse deals with text that is longer than a single sentence. It is important for tasks,
such as topic identification and text summarization, where an algorithm produces a sum-
mary of a given text by extracting the most important sentences. Analyzing the discourse
of a text involves several sub-tasks, like identifying the topic structure, analysis of the cor-
eference, and the conversation structure.

Speech

Speech tasks in NLP are all about spoken language. In speech tasks two sub-tasks can be
distinguished:

1. Speech-to-text (STT): Also referred to as automatic speech recognition (ASR), it
converts spoken language into text.
2. Text-to-speech (TTS): Also called speech synthesis, it transforms written text into
spoken language.

Both are important for conversational interfaces, such as voice assistants like Siri or
Alexa. The following figure summarizes the typical tasks in NLP.

Figure 15: NLP Tasks

Source: Created on behalf of IU (2022).

4.3 Vectorizing Data
In machine learning, algorithms only accept numeric input. Therefore, if we want to
extract information from unstructured text, the text first has to be converted into a
numerical format that the computer can process.

In the following, we want to introduce two approaches to how words can be embedded
into a semantic vector space: the bag-of-words approach, which is simple, and the more
powerful concept of neural word and sentence vectors.

Bag-of-Words

One of the easiest approaches to convert textual information into numbers is the bag-of-
words (BoW) model. Using BoW a text is represented by a vector that describes the num-
ber of word occurrences in a given text document. The term “bag” refers to the fact that
once the words are put into the unique set of words describing a text, all information
about the structure or order of the words in a text is discarded. To understand the BoW
approach, we will use the following example text:

• Darren loves dogs
• Darren does not like cats
• Cats are not like dogs

In the first step, we need to identify all unique words from the text. For this purpose, we
use tokenization.

In the above text, the following words are used:

Darren, loves, dogs, does, not, like, cats, are

In the next step, we need to score the words in every sentence. As we know that our
vocabulary consists of 8 words, the resulting vector will have a length of 8. The BoW vec-
tors for the sentences above will look as follows:

• [1, 1, 1, 0, 0, 0, 0, 0]
• [1, 0, 0, 1, 1, 1, 1, 0]
• [0, 0, 1, 0, 1, 1, 1, 1]

There are different methods to score the words in the BoW model. In the above sentences,
every word only occurred once; therefore, the resulting vectors are a binary representation
of the text. If the whole text from the above example were summarized in one vector, the
following options are available:

• Boolean representation: the vector simply indicates if a word occurs or not:
  [1, 1, 1, 1, 1, 1, 1, 1]
• Count of words: the resulting vector reflects how often a word occurs:
  [2, 1, 2, 1, 2, 2, 2, 1]

As you will notice, this representation no longer contains any information about the origi-
nal order of the words.
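The example above can be reproduced with a short, self-contained sketch: the vocabulary is built in order of first occurrence, and each sentence is scored by its word counts.

```python
sentences = [
    "Darren loves dogs",
    "Darren does not like cats",
    "Cats are not like dogs",
]

# Step 1: tokenize and build the vocabulary in order of first occurrence.
vocab = []
for sentence in sentences:
    for word in sentence.lower().split():
        if word not in vocab:
            vocab.append(word)

# Step 2: score each sentence against the vocabulary (here: word counts).
def bow_vector(sentence, vocab):
    words = sentence.lower().split()
    return [words.count(v) for v in vocab]

for sentence in sentences:
    print(bow_vector(sentence, vocab))
# [1, 1, 1, 0, 0, 0, 0, 0]
# [1, 0, 0, 1, 1, 1, 1, 0]
# [0, 0, 1, 0, 1, 1, 1, 1]
```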

Limitations of Bag-of-Words

Taken together, the BoW model is very simple, and this simplicity induces some major disadvantages:

• Selection of vocabulary: The vocabulary of the model has to be selected very carefully.
The balance between the size of the model and sparsity must always be kept in mind.
The larger the vocabulary, the higher will be the sparsity of the vectors.
• Risk of high sparsity: For computational reasons, it is more difficult to model sparse rep-
resentation of data, as the complexity of time and space will increase with higher spar-
sity. Moreover, it is more difficult to make use of the data if only a little information is
contained in a large representational space.
• Loss of meaning: Using BoW, neither word order nor context nor sense are considered.
In our example, the different meanings of “like” (once being used as a preposition and
once as a verb) get completely lost. In situations like that, the BoW model does not
perform well.

Word Vectors

To be able to embed words in a semantic vector space they can be represented as word
vectors. Linear operations can be applied to find word analogies and similarities. These
word similarities can, for instance, be based on the cosine similarity. Most importantly,
once words are transformed into word vectors, they can be used as an input for machine
learning models, like artificial neural networks and linear classifiers. In the following,
three vectorization methods will be presented: Word2Vec, TF-IDF, and GloVe.

Word2Vec

The Word2Vec model is based on a simple neural network. The neural network generates
word embeddings based on only one hidden layer (Mikolov et al., 2013). A research mile-
stone was passed when Google Research published the model in 2013. The input layer of
the neural network expects a “one-hot vector.” The one-hot vector is a BoW vector for one
single word. This means that all indices of that vector are set to 0 except for the index of
the word, which is analyzed. This index is set to 1.
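A one-hot input vector can be illustrated with the toy vocabulary from the bag-of-words example above; the vocabulary here is an assumption for illustration, whereas real Word2Vec vocabularies contain many thousands of words.

```python
vocab = ["darren", "loves", "dogs", "does", "not", "like", "cats", "are"]

def one_hot(word, vocab):
    """All indices are 0 except the index of the analyzed word, which is 1."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("dogs", vocab))  # [0, 0, 1, 0, 0, 0, 0, 0]
```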

Training the neural network for Word2Vec requires a large text corpus. This could, for
instance, be a Wikipedia dump. When the training is performed, a fixed-length word win-
dow with length N is slid over the corpus. Typical values for N would, for example, be
N = 5 or N = 10.

In Word2Vec there are two prediction models:

1. Continuous Bag-of-Words (CBOW): This model can be used if the goal is to predict one
missing word in a fixed window in the context of the other N − 1 words. As an input
vector, we can either use the average or the sum of the one-hot vectors.
2. Skip-gram: If we have one word within a fixed window, with this model we can predict
the remaining N − 1 context words.

The difference between both models is illustrated in the figure below.

Figure 16: Comparison of CBOW and Skip-Gram

Source: Created on behalf of IU (2022).

One important aspect of CBOW is that the prediction outcome is not influenced by the
order of the context words. In skip-gram, nearby context words are weighted more heavily
than more distant context words. While CBOW generally performs faster, the skip-gram
architecture is better suited for infrequent words.

When training Word2Vec, the goal is to maximize the probabilities for those words that
appeared in the fixed window of the analyzed sample from the data corpus used for train-
ing. The function we receive from this process is the objective function.

In an NLP task, the goal is usually not to find a model to predict the next word based on a
given text snippet, but to analyze the syntax and semantics of a given word or text. If we
remove the output layer of the model we generated before and look at the hidden layer
instead, we can extract the output vector from this layer. Neural networks usually develop
a strong abstraction and generalization ability on their last layers. It is, therefore, possible
to use the output vector of the hidden layer as an abstract representation of the features
from the input data. Thus, we can use it as an embedding vector for the word we want to
analyze.

Nowadays, there are many pre-trained Word2Vec models available for various languages
that can easily be adapted for specific NLP tasks.

Term frequency-inverse document frequency

In BoW, the vector only reflects which vocabulary words are contained in the document
and how often. Therefore, all words are given the same weight when analyzing a text, no
matter their importance. Term frequency-inverse document frequency (TF-IDF) is a statis-
tical measure from information retrieval that tackles this problem and is one of the most
commonly used weighting schemes in information retrieval (Beel et al., 2016). In TF-IDF
the term frequency (TF) is combined with the inverse document frequency (IDF). The rele-
vance of a word increases with its frequency in a certain text but is compensated by the
word frequency in the whole data set. For the computation of TF-IDF we need the follow-
ing parameters:

• Term frequency (TF) reflects how often a term t occurs in a document d. The word order
  is not relevant in this case. The number of occurrences is weighted by the total number
  of terms in the document:

  TF(t, d) = (number of occurrences of t in d) / (number of words in d)

• Document frequency (DF) indicates the percentage of documents including a specific
  term t in relation to the total number of documents D. This can be seen as an indicator
  for the importance of the term:

  DF(t, D) = (number of documents containing t) / (total number of documents D)

• Inverse document frequency (IDF) tests the relevance of a particular term. As the name
  suggests, it is the inverse of the document frequency, logarithmically scaled:

  IDF(t, D) = log(1 / DF(t, D))

The final TF-IDF score for a term can be computed as follows:

TF-IDF(t, d, D) = TF(t, d) · IDF(t, D)

High values of TF-IDF indicate words that occur often in a document while the number of
documents that contain the respective term is small compared to the total amount of
documents. Therefore, TF-IDF can help find terms in a document that are most important
in a text.
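The TF, DF, and IDF formulas translate directly into code. The toy corpus below is an assumption for illustration:

```python
import math

corpus = [
    "darren loves dogs",
    "darren does not like cats",
    "cats are not like dogs",
]

def tf(term, document):
    """Occurrences of the term, weighted by the document length."""
    words = document.split()
    return words.count(term) / len(words)

def df(term, documents):
    """Share of documents that contain the term."""
    return sum(1 for d in documents if term in d.split()) / len(documents)

def tf_idf(term, document, documents):
    """TF-IDF(t, d, D) = TF(t, d) * log(1 / DF(t, D))."""
    return tf(term, document) * math.log(1 / df(term, documents))

# "loves" occurs in only one document, so it scores higher than "dogs",
# which occurs in two of the three documents.
print(tf_idf("loves", corpus[0], corpus))
print(tf_idf("dogs", corpus[0], corpus))
```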

GloVe

Global Vectors for word representation (GloVe) is another vectorization method commonly
used in NLP. While Word2Vec is a predictive model, GloVe is an unsupervised approach
based on the counts of words. It was developed because Pennington et al. (2014) con-
cluded that the skip-gram approach in Word2Vec does not fully consider the statistical
information when it comes to word co-occurrences. Therefore, they combined the skip-
gram approach with the benefits of matrix factorization, i.e., the decomposition of a
matrix into its components to simplify complex matrix operations. The GloVe model uses
a co-occurrence matrix, which contains information about the word context. The model
has been shown to outperform related models, especially for named entity recognition
and similarity tasks (Pennington et al., 2014).

Sentence Vectors

So far, we have learned how to represent words as vectors. However, various NLP tasks,
like question answering or sentiment analysis, require not only the analysis of a single
word but of a whole sentence or paragraph. Therefore, we also need a way how to encode
a sequence of words to be able to process it with a learning algorithm.

One approach is to build an average of the vectors of a sentence from Word2Vec and use
the resulting vectors as input for a model. However, this method would come with the dis-
advantage that the word order is no longer included in the word encodings. For instance,
the sentences “I put the coffee in the cup” and “I put the cup in the coffee” contain the
same words. Only the word order makes the difference in the sentence.

To tackle the problem of dealing with text snippets of various lengths, there exist several
approaches. In the following sections, we will present a selection of those algorithms.
Please note that in the following the term “sentence” will also be used to represent a
whole paragraph of text, not only as a sentence in a strict grammatical way.
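The averaging approach, and the word-order problem it causes, can be shown with made-up two-dimensional word vectors; the embedding values below are arbitrary assumptions, not trained vectors.

```python
word_vecs = {  # assumed toy embeddings
    "i": [0.0, 0.0], "put": [0.0, 0.5], "the": [0.0, 0.0],
    "coffee": [1.0, 0.0], "in": [0.5, 0.5], "cup": [0.25, 1.0],
}

def sentence_vector(sentence):
    """Average the word vectors of all words in the sentence."""
    vecs = [word_vecs[w] for w in sentence.lower().split()]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

v1 = sentence_vector("I put the coffee in the cup")
v2 = sentence_vector("I put the cup in the coffee")
print(v1 == v2)  # True: averaging discards the word order
```

Both sentences contain the same words, so they end up with the same sentence vector even though their meanings differ.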

Skip-thought

In the skip-thought vectors approach (Kiros et al., 2015) the concept of the skip-gram
architecture we introduced previously in the section about the Word2Vec approach is
transferred to the level of sentences.

Like Word2Vec, skip-thought requires a large text corpus to train the model. In contrast to
Word2Vec, instead of using a sliding word window, skip-thought analyzes a triple of three
consecutive sentences. The resulting model is a typical example of an encoder-decoder
architecture. The middle sentence from the triple is used as an input for the encoder. The
encoder produces an output, which is connected to the decoder. There are two ways to
optimize the model: the decoder can either be used to predict the following or the previ-
ous sentence of the sentence the encoder received.

There are some NLP tasks that do not require a prediction model. For those tasks, the
decoder part is no longer needed after the training and can be discarded. To get the vector
representation of the sentence, we can use the output vector of the encoder.

In case we use the model to only predict the following or the previous sentence, the result
is a uni-skip vector. When concatenating two uni-skip vectors so that one predicts the pre-
vious and the other predicts the next sentence, the result is called a bi-skip vector. If n-
dimensional uni-skip vectors are combined with n-dimensional bi-skip vectors, the result
will be a 2n-dimensional combine-skip vector. In a comparison of several skip-thought
models, the combine-skip model has been proven to perform slightly better.

There is a pre-trained English language model available to the public based on the Book-
Corpus dataset.

Universal sentence encoder (USE)

The universal sentence encoder (USE) is a family of models for sentence embedding that
was developed by Google Research (Cer et al., 2018). There are two architecture variants
of the USE. One variant uses a deep averaging network (DAN) (Iyyer et al., 2015) and is
faster but less accurate, while the other variant utilizes a transformer model.

Also for USE, there are pre-trained models available to the public: one English model and
one multilingual model (Chidambaram et al., 2019). These models are both based on the
DAN architecture.

Bidirectional encoder representations from transformers (BERT)

As the name indicates, BERT (Devlin et al., 2018) is based on the transformer architec-
ture. Like USE, this model was introduced by Google Research. The language model is
available open-source and has been pre-trained on a large text corpus in two combined
and unsupervised ways: masked language model and next sentence prediction.

With the masked language model, a sentence is taken from the training set. In the next
step, about 15 percent of the words in that sentence are masked. For example, in the sen-
tence

“I like to [mask1] a cup of coffee with [mask2] in the morning”

the words “drink” and “milk” have been masked. The model is then trained to predict the
missing words in the sentence. The focus of the model is to understand the context of the
words. The processing of the text data is no longer done in a unidirectional way from
either left to right or right to left.

Using next sentence prediction as a training method, the model receives a pair of two sen-
tences. The model's goal is to predict if the first sentence is followed by the second sen-
tence. Therefore, the resulting model focuses mainly on how a pair of sentences are
related.

Both models were trained together to minimize the combined loss function of the two
strategies.
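The masking step can be sketched as follows. The 15 percent ratio comes from the description above; the token selection and the fixed random seed are simplifications of this author's own (BERT additionally replaces some selected tokens with random words instead of the [MASK] token).

```python
import random

def mask_tokens(tokens, ratio=0.15, seed=0):
    """Replace roughly `ratio` of the tokens with [MASK] and remember
    the original words the model would have to predict."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

sentence = "I like to drink a cup of coffee with milk in the morning".split()
masked, targets = mask_tokens(sentence)
print(masked)
print(targets)  # the positions and words the model must predict
```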

SUMMARY
The use of NLP in computer science dates back to the 1950s. There is a
wide range of application areas for NLP, which include topics such as
question answering, sentiment analysis, named entity recognition, and
topic identification.

To be able to process language with computers, vectorization techni-
ques such as Bag-of-Words, word vectors, and sentence vectors are
used. However, these models come with some limitations. For example,
if Bag-of-Words is used, we lose all information about word order. There-
fore, this model can only be used if the word order is not crucial. More-
over, some models, including BERT, have limitations towards the input
length of a text (e.g., 256-word tokens). A larger paragraph of text can
only be embedded using tricks, like segmenting it into smaller parts.

Nevertheless, there has been huge progress in NLP in the past years as
computational power has been increasing drastically and larger data
corpora have become available to train language models.

UNIT 5
COMPUTER VISION

STUDY GOALS

On completion of this unit, you will be able to …

– define computer vision.
– explain how to represent images as pixels.
– distinguish between detection, description, and matching of features.
– correct distortion with calibration methods.
5. COMPUTER VISION

Introduction
This unit will discuss the basic principles of computer vision. It starts with a definition of
the topic, the historical background, and an overview of the most important computer
vision tasks. After that, you will learn how an image can be represented as an array of pix-
els and how images can be modified using filters. We will illustrate how to detect features
in images, such as edges, corners, and blobs. This knowledge will be used to illustrate how
you can use calibration and deal with distortion.

Moreover, this unit addresses the topic of semantic segmentation, which can be used to
classify pixels into categories.

5.1 Introduction to Computer Vision


Computer vision is a topic that combines multiple disciplines. It is a mixture of computer
science (especially artificial intelligence) and engineering (Wiley & Lucas, 2018). Computer
vision tries to model human visual perception by processing and analyzing visual data.
This data can, for instance, be static pictures or videos from cameras. The goal is to get a
deep understanding of the visual aspects of the real world (Wiley & Lucas, 2018). Com-
puter vision includes tasks, like the classification of objects or motion detection.

Historical Developments

Research in computer vision began in the 1960s at some of the pioneering universities for
robotics and AI, such as Stanford University, the Massachusetts Institute of Technology,
and Carnegie Mellon University. The goal of that early research was to mimic the visual
system of humans (Szeliski, 2022). Researchers tried to make robots more intelligent by
automating the process of image analysis using a camera attached to the computer. The
big difference between digital image processing and computer vision at that time was that
researchers tried to reconstruct the 3D structure from the real world to gain a better
understanding of the scene (Szeliski, 2022).

Early foundations of algorithms, such as line labeling, edge extraction, object representa-
tion and motion estimation date back to the 1970s (Szeliski, 2022). In the 1980s, there was
a shift of focus towards the quantitative aspects of computer vision and mathematical
analysis. Concepts, such as inference of shape from characteristics like texture, shading,
contour models, and focus, evolved. In the 1990s, methods from photogrammetry (a
group of contactless methods to derive the position and shape of physical objects
directly from photographic images) were used to develop algorithms for sparse 3D
reconstructions of scenes based on multiple images. The results led to a better
understanding of camera calibration. Statistical methods, in particular eigenfaces, were
used for facial recognition from pictures. Due to an increasing interaction between
computer vision and computer graphics, there has been a significant change in methods
like morphing, image-based modeling and rendering, image stitching, light-field
rendering, and interpolation of views (Szeliski, 2022).

Current developments go in the direction of optimization frameworks and machine
learning approaches for those feature-based techniques. Further stimulation of the field
of computer vision comes from recent developments in deep learning. These new
methods outperform the classical methods on benchmark computer image datasets in
many tasks such as segmentation and classification or optical flow (O’Mahony et al.,
2020).

Typical Tasks

There are four major categories in computer vision: recognition tasks, motion analysis,
image restoration and geometry reconstruction. The following figure illustrates those
tasks.

Figure 17: Categories of Computer Vision Tasks

Source: Created on behalf of IU (2022).

Recognition tasks

There are different types of recognition tasks in computer vision. Typical tasks involve the
detection of objects, persons, poses, or images. Object recognition deals with the estima-
tion of different classes of objects that are contained in an image (Zou et al., 2019). For
instance, a very basic classifier could be used to detect whether there is a hazardous mate-
rial label on an image or not. A more specific classifier could additionally recognize
information about the label type, such as “flammable” or “poison.” Object recognition
is also important in the area of autonomous driving to detect other vehicles or pedes-
trians.

In object identification tasks, objects or persons that are in an image are identified using
unique features (Barik & Mondal, 2010). For person identification, for example, a computer
vision system can use characteristics, such as fingerprint, face, or handwriting. Facial
recognition, for instance, uses biometric features from an image and compares them to the
biometric features of other images from a given database. Person identification is com-
monly used to verify the identity of a person for access control.

Pose estimation tasks play an important role in autonomous driving. The goal is to esti-
mate the orientation and/or position of a given object relative to the camera (Chen et al.,
2020). This can, for instance, be the distance to another vehicle ahead or an obstacle on
the road.

In optical character recognition (OCR), handwritten or printed text is recognized from an
image and converted into a string, which can be processed by a machine (Islam et al.,
2017). In online banking, for instance, OCR can be used to extract the relevant information
for bank transfers such as the amount or the bank account information, from an invoice.

Motion analysis tasks

In classical odometry, motion sensors are used to estimate the change of the position of
an object over time. Visual odometry, on the other hand, analyzes a sequence of images to
gather information about the position and orientation of the camera (Aqel et al., 2016).
Autonomous cleaning bots can, for instance, use this information to estimate the location
in a specific room.

In tracking tasks, an object is located and followed in successive frames. A frame can be
defined as a single image in a longer sequence of images, such as videos or animations
(Yilmaz et al., 2006). This can, for instance, be the tracking of people, vehicles, or animals.

Image restoration tasks

Image restoration deals with the process of recovering a blurry or noisy image to an image
of better and clearer quality. This can, for instance, be old photographs, but also movies
that were damaged over time. To recover the image quality, filters like median or low-pass
filters can remove the noise (Dhruv et al., 2017). In computer vision, noise refers to a
quality loss of an image caused by a disturbed signal. Nowadays, methods from image
restoration can also be used to restore missing or damaged parts of an artwork.

Geometry reconstruction tasks

In geometry reconstruction tasks, virtual 3D models of scenes or even real-world objects
are estimated from videos or images (Han et al., 2021). This is typically done based on
multiple images that are taken from different perspectives.

Challenges in Computer Vision

In computer vision, there are five major challenges that must be tackled (Szeliski, 2022):

• The illumination of an object is very important. If lighting conditions change, this can
yield different results in the recognition process. For instance, red can easily be
detected as orange if the environment is bright.

• Differentiating similar objects can also be difficult in recognition tasks. If a system is
trained to recognize a ball it might also try to identify an egg as a ball.
• The size and aspect ratios of objects in images or videos pose another challenge in com-
puter vision. In an image, objects that are further away will appear to be smaller than
closer objects even if they are the same size.
• Algorithms must be able to deal with rotation of an object. If we look for instance at a
pencil on a table, it can either look like a line when we look from the top or as a circle
when we change to a different perspective.
• The location of objects can vary. In computer vision, this effect is called translation.
Going back to our example of the pencil, it should not make a difference to the algo-
rithm if the pencil is located on the center of a paper or next to it.

Because of these challenges, there is much research towards algorithms that are scale-,
rotation-, and/or translation-invariant (Szeliski, 2022).

5.2 Image Representation and Geometry


Computer vision is about processing digital images. To be able to process images with a
computer, this section starts with an explanation of how to represent images in the form
of numerical data. For this purpose, we introduce the concept of pixels. Subsequently, we
will address the topic of filters and how images can be modified using filters.

Pixels

Images are constructed as a two-dimensional pixel array (Lyra et al., 2011). A pixel is the
smallest unit of a picture. The word originates from the terms “picture” (pix) and
“element” (el) (Lyon, 2006). A pixel is normally represented as a single square with one
color. It becomes visible when zooming deep into a digital image. You can see an example
of the pixels of an image in the figure below.

Figure 18: Pixels of a Digital Image

Source: Created on behalf of IU (2022).

The resolution of an image specifies its number of pixels. The higher the resolution, the
more details will be in the image. Conversely, if the resolution is low, the picture might
look fuzzy or blurry.

Color representations

There are various ways to represent the color of a pixel as a numerical value. The easiest
way is to use monochrome pictures. In this case, the color of a pixel will be represented by
a single bit, being 0 or 1. In a true color image, a pixel will be represented by 24 bits.

The following table shows the most important color representations with the according
number of available colors (color depth).

Table 2: Color Representations in Images

Name          Color representation   Color depth
Monochrome    1 bit                  2 colors
              8 bit                  2^8 = 256 gray scale intensity levels or colors
Real color    15 bit                 2^15 = 32,768 colors
High color    16 bit                 2^16 = 65,536 colors
True color    24 bit                 2^24 = 16,777,216 colors
Deep color    30–48 bit              2^30 – 2^48 colors

Source: Created on behalf of IU (2022).

One way to represent colors is the RGB color representation. We illustrate this using the
24-bit color representation. Using RGB, the 24 bits of a pixel are separated into three parts,
each 8 bits in length. Each of those parts represents the intensity of a color between 0 and
255. The first is the color red (R), the second green (G), and the last blue (B). Out of these
three components, all other colors can be mixed additively. For instance, the color code
RGB(0, 255, 0) yields 100 percent green. If all values are set to 0, the resulting color is
black. If all values are set to 255, it is white. The figure below illustrates how the
colors are mixed in an additive way.

Figure 19: Additive Mixing of Colors

Source: Created on behalf of IU (2022).
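As a side note, the additive mixing described above can be sketched in a few lines of Python. The helper name mix_additive is our own; it simply sums the channels and clips at 255, which is one common way to model additive mixing of 24-bit RGB values:

```python
def mix_additive(*colors):
    """Additively mix RGB triples, clipping each channel at 255."""
    r = min(sum(c[0] for c in colors), 255)
    g = min(sum(c[1] for c in colors), 255)
    b = min(sum(c[2] for c in colors), 255)
    return (r, g, b)

RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

print(mix_additive(RED, GREEN))        # (255, 255, 0) -> yellow
print(mix_additive(RED, GREEN, BLUE))  # (255, 255, 255) -> white
```

Mixing all three primaries at full intensity yields white, exactly as in the figure above.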

Another way to represent colors is the CMYK model. In contrast to the RGB representation,
it is a subtractive color model comprised of cyan, magenta, yellow, and key (black). The
color values in CMYK range from 0 to 1. Therefore, to convert colors from RGB to CMYK, the
RGB values first have to be divided by 255. The values of cyan, magenta, yellow, and key
can then be computed as follows:

K = 1 − max(R/255, G/255, B/255)

C = (1 − R/255 − K) / (1 − K)

M = (1 − G/255 − K) / (1 − K)

Y = (1 − B/255 − K) / (1 − K)

While RGB is better suited for the digital representation of images, CMYK is commonly
used for printed material.

Images as functions

We will now discuss how an image can be built from single pixels. To do that, we need a
function that maps a two-dimensional coordinate (x, y) to a specific color value. On the
x-axis, we begin on the left with a value of 0 and continue to the right until the maximum
width of the image is reached. On the y-axis, we begin with 0 at the top and reach the
height of the image at the bottom.

Let us look at the function f(x, y) for an 8-bit gray scale image. The function value
f(42, 100) = 0 would mean that we have a black pixel 42 pixels to the right of and 100
pixels below the starting point. In a 24-bit image, the result of the function would be a
triple value indicating the RGB intensity of the specified pixel.
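A minimal sketch of this idea in Python, using a small nested list as a toy 8-bit gray scale image (the dimensions and pixel values are our own examples, stored row-major as pixels[y][x]):

```python
WIDTH, HEIGHT = 4, 3
# Start with an all-white 4x3 image (gray value 255 everywhere).
pixels = [[255] * WIDTH for _ in range(HEIGHT)]
pixels[2][1] = 0  # a black pixel 1 to the right of and 2 below the top-left corner

def f(x, y):
    """Gray value at column x (counted from the left) and row y (from the top)."""
    return pixels[y][x]

print(f(1, 2))  # 0   -> black
print(f(0, 0))  # 255 -> white
```

For a 24-bit image, each entry would hold an (R, G, B) triple instead of a single gray value.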

Filters

Filters play an important role in computer vision when it comes to applying effects to an
image, implementing techniques like smoothing or inpainting, or extracting useful infor-
mation from an image, like the detection of corners or edges. A filter can be defined as a
function that takes an image as input, applies modifications to that image, and returns the
filtered image as output (Szeliski, 2022).

2D convolution

A frequently used technique to filter images is 2D convolution. If 2D convolution is applied
to an image, a small matrix (also called a convolution matrix or kernel) is moved over the
matrix of the image pixel by pixel; at each position, the kernel values are multiplied
element-wise with the underlying image values and summed up. The convolution matrix
usually consists of 3x3 or 5x5 elements (Smith, 1997).

The convolution of an image I with an n × n kernel k with center coordinate a can
be calculated as follows:

I′(x, y) = ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ I(x + i − a, y + j − a) · k(i, j)

where I′(x, y) is the value of the resulting image I′ at position (x, y), while I is the origi-
nal image. The center coordinate a is 2 for a 3x3 convolution matrix, 3 for a 5x5 convolution
matrix, and so forth. To understand the process, we will use the following example of a
3x3 convolution. The kernel matrix used for the convolution is shown in the middle col-
umn of the figure.

Figure 20: 2D Image Convolution

Source: Created on behalf of IU (2022).

The kernel matrix is moved over each position of the input image. In our input image the
current position is marked orange. In our example we start with the center position of the
image and multiply the image on this position with the values of the kernel matrix. The
resulting value for the center position of our filtered image is computed as follows:

0 · 41 + 0 · 26 + 0 · 86 + 0 · 27 + 0 · 42 + 1 · 47 + 0 · 44 + 0 · 88 + 0 · 41 = 47

In the next step, we shift the kernel matrix to the next position and compute the new value
of the filtered image:

0 · 26 + 0 · 86 + 0 · 41 + 0 · 42 + 0 · 47 + 1 · 93 + 0 · 88 + 0 · 41 + 0 · 24 = 93

The bottom row in our figure shows the result after all positions of the image have been
multiplied with the kernel matrix.
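The procedure above can be sketched in plain Python. This is a minimal, unoptimized version (function name and sample values are our own; border pixels are simply left unchanged for brevity, and the code uses 0-based indices with center offset a = n // 2 instead of the 1-based indices of the formula):

```python
def convolve2d(image, kernel):
    """Apply an n x n kernel (n odd) to a gray scale image given as a nested list."""
    n = len(kernel)   # kernel size
    a = n // 2        # offset of the kernel center (0-based)
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy the image; borders stay unchanged
    for y in range(a, h - a):
        for x in range(a, w - a):
            # Element-wise multiply the kernel with the image patch and sum up.
            out[y][x] = sum(
                image[y + j - a][x + i - a] * kernel[j][i]
                for j in range(n) for i in range(n)
            )
    return out

# A kernel that picks the pixel to the right, as in the worked example:
kernel = [[0, 0, 0],
          [0, 0, 1],
          [0, 0, 0]]
image = [[41, 26, 86, 41],
         [27, 42, 47, 93],
         [44, 88, 41, 24]]
print(convolve2d(image, kernel)[1][1])  # 47, as computed above
```

With this kernel, every interior pixel takes the value of its right-hand neighbor, matching the two hand computations above.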

