ML Unit 1 CS


UNIT 1
Introduction

KCS 055 / KOE 073: Machine Learning

Anurag Malik
(Associate Prof., CS & E)
CS & E Dept., M.I.T. Moradabad

B.Tech V / VII CS / ME
Recommended Books:

1. Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.

2. Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.

3. Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.

4. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, Berlin.

September 30, 2023 1


Syllabus KCS 055

September 30, 2023 2


September 30, 2023 3
What is Learning?
• “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” - Herbert Simon
• Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes and preferences.
• The ability to learn is possessed by humans, animals, and some machines.
• “Learning is making useful changes in our minds.” - Marvin Minsky
• Some learning is immediate, induced by a single event (e.g. being burned by a hot stove), but much skill and knowledge accumulates from repeated experiences.

September 30, 2023 4


Types of Learning
1. Visual (Spatial): Information is represented with images and diagrams, letting learners focus on meaning; suited to fields such as architecture, engineering, project management, or design.
2. Aural (Auditory-Musical): If you need to hear something said out loud to understand it, you are an auditory learner; suited to work as a musician, recording engineer, speech pathologist, or language teacher.
3. Verbal (Linguistic): People who find it easier to express themselves by writing or speaking can be regarded as verbal learners.
4. Physical (Kinesthetic): Learning happens when the learner carries out a physical activity, rather than listening to a lecture or watching a demonstration.

September 30, 2023 5


Types of Learning (Cont…)
5. Logical (Mathematical): If you like using your brain for logical and mathematical reasoning, you are a logical learner. You easily recognise patterns and can connect seemingly unrelated concepts; suited to scientific research, accountancy, bookkeeping, or computer programming.
6. Social (Interpersonal): If you are at your best when socializing and communicating with people, both verbally and non-verbally, you are a social learner. People often come to you to talk and ask for advice; suited to counseling, teaching, training and coaching, sales, politics, and human resources, among others.

September 30, 2023 6


Related Fields
Machine learning draws on and overlaps with many related fields: data mining, control theory, statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience.

Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.

7
Well-Posed Learning Problems
• Learning can be defined as a computer program that improves its performance at some task through experience.
• Definition of learning: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Let's look at some examples of well-posed learning problems:
  • Learning to play checkers
  • Learning to recognize spoken words (the SPHINX system)
  • Learning to drive an autonomous vehicle (the ALVINN system)
  • Learning to classify new astronomical structures
  • Predicting recovery rates of pneumonia patients
  • Detecting fraudulent use of credit cards

September 30, 2023 8


Well-Posed Learning Problems
• Three features define a learning problem: the class of tasks, the measure of performance to be improved, and the source of experience.
• A checkers learning problem:
  • Task T: playing checkers
  • Performance measure P: percent of games won against opponents
  • Training experience E: playing practice games against itself
• We can specify many learning problems in this fashion, such as learning to recognize handwritten words, or learning to drive a robotic automobile autonomously.
• A handwriting recognition learning problem:
  • Task T: recognizing and classifying handwritten words within images
  • Performance measure P: percent of words correctly classified
  • Training experience E: a database of handwritten words with given classifications

September 30, 2023 9


Well-Posed Learning Problems
• A robot driving learning problem:
  • Task T: driving on public four-lane highways using vision sensors
  • Performance measure P: average distance traveled before an error (as judged by a human overseer)
  • Training experience E: a sequence of images and steering commands recorded while observing a human driver

September 30, 2023 10


DESIGNING A LEARNING SYSTEM

1. Choosing the Training Experience

2. Choosing the Target Function

3. Choosing a Representation for the Target Function

4. Choosing a Function Approximation Algorithm

5. The Final Design


September 30, 2023 11
Designing a Learning System
• While designing a learning system, various design issues and approaches must be considered.
1. Choosing the Training Experience: The first design choice we face is to choose the type of training experience from which our system will learn. The type of training experience available can have a significant impact on the success or failure of the learner.
  • One key attribute is whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
  • A second important attribute of the training experience is the degree to which the learner controls the sequence of training examples.
  • A third important attribute of the training experience is how well it represents the distribution of examples over which the final system performance P must be measured.

September 30, 2023 12


Designing a Learning System
A checkers learning problem:
Task T: Playing checkers (draughts)
Performance Measures P: percent of games won in world tournament
Training Experience E: games played against itself

What experience?

What exactly should be learned?

How shall it be represented?

What specific algorithm to learn it?

September 30, 2023 13


Direct versus Indirect Learning
1. Direct: individual checkers board states and the correct move for each.
2. Indirect: move sequences and the final outcomes of various games played.

Credit assignment problem: determining the degree to which each move in the sequence deserves credit or blame for the final outcome. A game can be lost even when early moves are optimal if they are followed later by poor moves, or vice versa.
September 30, 2023 14
Teacher or not?
Degree to which the learner controls the sequence of training examples:

1. The teacher selects informative board states and provides the correct moves.

2. For each board state the learner finds particularly confusing, it asks the teacher for the correct move.

3. The learner may have complete control, as it does when it learns by playing against itself with no teacher; it may choose between experimenting with novel board states or honing its skill by playing minor variations of promising lines of play.

September 30, 2023 15


1. Choose Training Experience
• How well the training experience represents the distribution of examples over which the final system performance P must be measured.

• P is the percent of games won in the world tournament; there is an obvious danger when E consists only of games played against itself (we probably can't get the world champion to teach the computer!).

• Most current theories of machine learning assume that the distribution of training examples is identical to the distribution of test examples.

• It is IMPORTANT to keep in mind that this assumption must often be violated in practice.

• E: play games against itself (with the advantage of getting a lot of data this way).

September 30, 2023 16


2. Choose a Target Function
The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.

• ChooseMove: B -> M, where B is any legal board state and M is a legal move (hopefully the "best" legal move).

• Alternatively, a function V: B -> ℝ (the real numbers) which maps any legal board state in B to some real value, where higher scores are assigned to better board states.

Now use the legal moves to generate every subsequent board state and use V to choose the best one, and therefore the best legal move.

September 30, 2023 17


Choose a Target Function II

Let us define the target value V(b) for an arbitrary board state b in B as follows:
• V(b) = 100, if b is a final board state that is won
• V(b) = -100, if b is a final board state that is lost
• V(b) = 0, if b is a final board state that is a draw
• V(b) = V(b′), if b is not a final state, where b′ is the best final board state that can be reached starting from b, assuming both players play optimally

September 30, 2023 18


3. Choosing a Representation for the Target Function
• Given the ideal target function V, we will choose a representation that the learning system will use to describe the function V' that it will learn.
• The function V' will be calculated as a linear combination of the following board features:
  • x1: the number of black pieces on the board
  • x2: the number of red pieces on the board
  • x3: the number of black kings on the board
  • x4: the number of red kings on the board
  • x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
  • x6: the number of red pieces threatened by black
September 30, 2023 19
3. Choosing a Representation for the Target Function
• Thus, the learning program will represent V'(b) as a linear function of the form:

  V'(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

• where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm; each weight wi determines the relative importance of the corresponding board feature xi.
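For concreteness, a minimal Python sketch (not from the slides; the weight and feature values are made up) of how such a linear evaluation function is computed:

```python
# Hypothetical weights w0..w6 and board features x1..x6.
weights = [0.5, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]   # w0 (bias) followed by w1..w6
features = [3, 0, 1, 0, 0, 0]                      # x1..x6 for one board state b

def v_hat(w, x):
    """Evaluate V'(b) = w0 + w1*x1 + ... + w6*x6 for one board state."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

print(v_hat(weights, features))   # the learned score assigned to this board state
```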
September 30, 2023 20
Design So Far
• T: Checkers
• P: percent of games won in the world tournament
• E: games played against self
• V: Board -> ℝ
• Target function representation:
  V'(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

September 30, 2023 21


4. Choose Function Approximation Algorithm
• In order to learn the target function V we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
• In other words, each training example is an ordered pair of the form <b, Vtrain(b)>.
• For instance, the following training example describes a board state b in which black has won the game (note that x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100:

  <(x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100>

a) Estimating training values: the training value of an intermediate board state b can be estimated from the current approximation applied to its successor, i.e. Vtrain(b) <- V'(Successor(b)).

b) Adjusting the weights: choose the weights wi that best fit the training examples, for example by minimizing the squared error between Vtrain(b) and V'(b) using the LMS (least mean squares) update rule.

September 30, 2023 22
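A minimal sketch of the LMS-style weight update used in Mitchell's checkers example (the learning rate and numbers below are hypothetical):

```python
def lms_update(weights, features, v_train, lr=0.1):
    """One LMS step: wi <- wi + lr * (Vtrain(b) - V'(b)) * xi, with x0 = 1 for the bias w0."""
    # Current prediction V'(b) = w0 + w1*x1 + ... + w6*x6
    v_hat = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - v_hat
    weights[0] += lr * error                     # bias weight uses x0 = 1
    for i, x in enumerate(features, start=1):
        weights[i] += lr * error * x
    return weights

# Example: the winning board state <(3,0,1,0,0,0), +100> nudges the weights toward +100.
print(lms_update([0.0] * 7, [3, 0, 1, 0, 0, 0], v_train=100))
```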


5. The Final Design
• The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components of many learning systems. These four modules are:
1. Performance System: solves the performance task using the learned target function(s). It takes an instance of a new problem as input and produces a trace of its solution (the game history) as output.

2. Critic: takes the history of the problem as input and produces a set of training examples of the target function as output.

3. Generalizer: takes training examples as input and produces an estimate of the target function as its output hypothesis. It generalizes from specific training examples, hypothesizing a general function that covers the examples.

4. Experiment Generator: takes the current hypothesis (the currently learned function) as input and outputs a new problem (i.e., an initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.

September 30, 2023 23


5. The Final Design

Fig. Final Design of checkers learner problem

September 30, 2023 24


What is Machine Learning?
• A branch of artificial intelligence, concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.
• As intelligence requires knowledge, it is necessary for computers to acquire knowledge.
• "Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge."
• https://www.youtube.com/watch?v=Cx5aNwnZYDc
• https://www.youtube.com/watch?v=YhSeTEumjVA
• https://www.youtube.com/watch?v=ZoemTySxFso

September 30, 2023 25


Machine Learning Paradigms
• rote learning
• learning by being told (advice-taking)
• learning from examples (induction)
• learning by analogy
• speed-up learning
• concept learning
• clustering
• discovery
September 30, 2023 26
Why Machine Learning?
• No human experts
  • industrial/manufacturing control
  • mass spectrometer analysis, drug design, astronomic discovery
• Black-box human expertise
  • face/handwriting/speech recognition
  • driving a car, flying a plane
• Rapidly changing phenomena
  • credit scoring, financial modeling
  • diagnosis, fraud detection
• Need for customization/personalization
  • personalized news reader
  • movie/book recommendation
• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power is available
September 30, 2023 27
AI vs ML vs DL

September 30, 2023 28


Continue……

September 30, 2023 29


Data Science vs. Machine Learning

1. Data Science is a field about processes and systems to extract knowledge from structured and semi-structured data. / Machine Learning is a field of study that gives computers the capability to learn without being explicitly programmed.

2. Data Science needs the entire analytics universe. / Machine Learning is a combination of machine and data science.

3. Data Science is the branch that deals with data. / Machines utilize data science techniques to learn about the data.

4. Data in Data Science may or may not have evolved from a machine or mechanical process. / Machine Learning uses various techniques like regression and supervised clustering.

5. Data Science, as a broader term, not only focuses on algorithms and statistics but also takes care of data processing. / Machine Learning is focused only on algorithms and statistics.

6. Data Science is a broad term for multiple disciplines. / Machine Learning fits within data science.

7. Data Science involves many operations: data gathering, data cleaning, data manipulation, etc. / Machine Learning is of three types: unsupervised learning, reinforcement learning, supervised learning.

8. Example: Netflix uses Data Science technology. / Example: Facebook uses Machine Learning technology.

September 30, 2023 30


Tools used for AI,ML and Deep Learning

September 30, 2023 31


Continue……
1. TensorFlow
• TensorFlow is an open source software library used for numerical computation with data flow graphs. It emerged from the dedicated efforts of engineers and researchers on the Google Brain team. TensorFlow's flexible architecture allows you to deploy computation to multiple GPUs or CPUs in a server, mobile device, or desktop using a single API.
2. IBM Watson
• IBM has been a pioneer in the field of Artificial Intelligence, having worked on this technology for a very long time. The company has its own AI platform named Watson, which houses numerous AI tools for both business users and developers. Watson is available as a set of open APIs, through which users can access many starter kits and sample code. Users can use them to build virtual agents and cognitive search engines. Moreover, the cherry on the cake for Watson is its chatbot-building platform, which is designed for beginners and requires little machine learning skill.
3. Caffe
• Caffe is a deep learning C++ framework developed with modularity, expression, and speed in mind. In terms of how it works, Caffe's focus remains on convolutional networks for computer vision applications.

September 30, 2023 32


Continue……

4. Deeplearning4j
• Deeplearning4j is termed the first open-source, commercial-grade, distributed deep learning library developed for Scala and Java. Its easy-to-use infrastructure makes it well suited to non-researchers. The most fascinating quality of DL4J is that it can import neural net models from many major frameworks via Keras, including Theano, Caffe, and TensorFlow.

5. Torch
• Torch is also an open source machine learning library, used by many large IT firms including Yandex, IBM, the Idiap Research Institute, and the Facebook AI Research group. It can also be described as a scientific computing framework and a scripting language based on the Lua programming language. After its successful adoption on web platforms, Torch has also been extended for use on iOS and Android.

September 30, 2023 33


Learning System Model

Fig. Learning system model: input samples are fed to a learning method to build the system; the system is fitted during training and evaluated during testing.

September 30, 2023 34


Training and Testing

Fig. The universal set of examples (unobserved) is divided into a training set (observed, obtained during data acquisition) and a testing set (unobserved, encountered in practical usage).
September 30, 2023 35
Training and Testing
• Training is the process of making the system able to learn.
• No free lunch rule:
  • The training set and testing set come from the same distribution.
  • We need to make some assumptions or introduce bias.

September 30, 2023 36


Algorithms

Supervised Unsupervised
learning learning

September 30, 2023 37


EXAMPLES OF ML
• Personalization: Online services like Amazon and Netflix use AI to personalize our experience. They learn from our and other users' previous purchases and recommend relevant content for us.

• Image recognition: ML can be used for face detection in an image. There is a separate category for each person in a database of several people.

• Medical diagnoses: ML is trained to recognize cancerous tissues.
September 30, 2023 38
• Speech recognition: the translation of spoken words into text. It is used in voice searches and more. Voice user interfaces include voice dialing, call routing, and appliance control (see also natural language processing).

• Data mining: the application of ML methods to large databases.

• Fraud detection: Banks use AI to detect strange activity on our accounts. Unexpected activity, such as foreign transactions, can be flagged by the algorithm.

September 30, 2023 39


DATA MINING (KDD)

September 30, 2023 40


KDD Process

• Selection: obtain data from various sources.
• Preprocessing: cleanse the data.
• Transformation: convert to a common format; transform to a new format.
• Data mining: obtain the desired results.
• Interpretation/Evaluation: present results to the user in a meaningful manner.

September 30, 2023 41


KDD Process: Several Key Steps
Many people treat data mining as a synonym for another popularly used term, Knowledge
Discovery from Data, or KDD. Alternatively, others view data mining as simply an essential
step in the process of knowledge discovery. Knowledge discovery as a process
is depicted in Figure 1.4 and consists of an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the
database)
4. Data transformation (where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to
extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the user)

Steps 1 to 4 are different forms of data preprocessing, where the data are prepared for
mining. The data mining step may interact with the user or a knowledge base. The
interesting patterns are presented to the user and may be stored as new knowledge in the
knowledge base
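A minimal sketch (hypothetical column names and toy values, not from the slides) of what preprocessing steps 1-4 might look like with pandas before the mining step:

```python
import pandas as pd

# Toy customer table containing noise: a missing value and a duplicate row.
raw = pd.DataFrame({"age": [25, None, 40, 40],
                    "income": ["40K", "30K", "50K", "50K"],
                    "buys_computer": ["yes", "no", "yes", "yes"]})

clean = raw.dropna().drop_duplicates()                        # 1-2. data cleaning / integration
selected = clean[["age", "income", "buys_computer"]].copy()   # 3. data selection
selected["income"] = selected["income"].str.rstrip("K").astype(int) * 1000   # 4. transformation
print(selected)   # data is now prepared for the data mining step
```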

September 30, 2023 42


•Email filtering: Email services use AI to filter
incoming emails. Users can train their spam filters
by marking emails as spam.

• Prediction: ML can be used in prediction systems. Considering a loan example, to compute the probability of a default, the system needs to classify the available data into groups.

•Computer vision, Computational biology, Robot


control, Handwriting recognition
September 30, 2023 43
History of ML
• 1950 — Alan Turing creates the "Turing Test" to determine if a computer has real intelligence. To pass the test, a computer must be able to fool a human into believing it is also human.
• 1952 — Arthur Samuel wrote the first computer learning program. The program played the game of checkers, and the IBM computer improved at the game the more it played, studying which moves made up winning strategies and incorporating those moves into its program.
• 1957 — Frank Rosenblatt designed the first neural network for computers (the perceptron), which simulates the thought processes of the human brain.
• 1967 — The "nearest neighbor" algorithm was written, allowing computers to begin using very basic pattern recognition. This could be used to map a route for traveling salesmen, starting at a random city but ensuring they visit all cities during a short tour.
• 1979 — Students at Stanford University invent the "Stanford Cart", which can navigate obstacles in a room on its own.
• 1981 — Gerald Dejong introduces the concept of Explanation Based Learning (EBL), in which a computer analyses training data and creates a general rule it can follow by discarding unimportant data.

September 30, 2023 45


History of ML
• 1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words the same way a baby does.
• 1990s — Work on machine learning shifts from a knowledge-driven approach to a data-driven approach. Scientists begin creating programs for computers to analyze large amounts of data and draw conclusions — or "learn" — from the results.
• 1997 — IBM's Deep Blue beats the world champion at chess.
• 2006 — Geoffrey Hinton coins the term "deep learning" to explain new algorithms that let computers "see" and distinguish objects and text in images and videos.
• 2010 — The Microsoft Kinect can track 20 human features at a rate of 30 times per second, allowing people to interact with the computer via movements and gestures.
• 2011 — IBM's Watson beats its human competitors at Jeopardy.
• 2011 — Google Brain is developed, and its deep neural network can learn to discover and categorize objects much the way a cat does.

September 30, 2023 46


History of ML
• 2012 – Google's X Lab develops a machine learning algorithm that is able to autonomously browse YouTube videos to identify the videos that contain cats.
• 2014 – Facebook develops DeepFace, a software algorithm that is able to recognize or verify individuals in photos to the same level as humans can.
• 2015 – Amazon launches its own machine learning platform.
• 2015 – Microsoft creates the Distributed Machine Learning Toolkit, which enables the efficient distribution of machine learning problems across multiple computers.
• 2015 – Over 3,000 AI and robotics researchers, endorsed by Stephen Hawking, Elon Musk and Steve Wozniak (among many others), sign an open letter warning of the danger of autonomous weapons which select and engage targets without human intervention.
• 2016 – Google's artificial intelligence algorithm beats a professional player at the Chinese board game Go, which is considered the world's most complex board game and many times harder than chess. The AlphaGo algorithm developed by Google DeepMind managed to win five games out of five in the Go competition.

September 30, 2023 47


Some Issues in Machine Learning
• What algorithms can approximate functions well (and when)?
• How does the number of training examples influence accuracy?
• How does the complexity of the hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?
• How can prior knowledge of the learner help?
• What clues can we get from biological learning systems?
• How can systems alter their own representations?
• Understanding which processes need automation.
• Lack of quality data.
• Inadequate infrastructure.
• Implementation.
• Lack of skilled resources.

September 30, 2023 48


TRADITIONAL PROGRAMMING VS ML

September 30, 2023 49


Machine Learning Approaches

September 30, 2023 50


Machine Learning Approaches

September 30, 2023 51


TYPES OF ML
Using data for answering questions
Training Predicting

September 30, 2023 52


Supervised vs. Unsupervised Learning

• Supervised learning (classification)
  • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  • New data is classified based on the training set.
  • No new class is generated.
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown.
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
  • New classes can be generated.

September 30, 2023 53


1. SUPERVISED LEARNING
• Supervised learning algorithms are the ones that involve direct supervision of the operation.

• The developer labels sample data and sets strict boundaries upon which the algorithm operates.

• The algorithm learns through examples for which we know the desired output (what we want to predict).

• The primary purpose of supervised learning is to scale the scope of the data and to make predictions about unavailable, future or unseen data based on labeled sample data.

September 30, 2023 54


1. SUPERVISED LEARNING
• It is a spoon-fed version of machine learning:
  • you select what kind of information (samples) to "feed" the algorithm;
  • you select what kind of results are desired (for example "yes/no" or "true/false").

• Examples:
  • Is this a cat or a dog?
  • Are these emails spam or not?
  • Predict the market value of houses, given the square metres, number of rooms, neighborhood, etc.

September 30, 2023 55


TYPES OF SUPERVISED LEARNING
• Classification separates the data, Regression fits the data.

September 30, 2023 59


TYPES OF SUPERVISED LEARNING
I. Classification (Categorical Target Variable)

• Classification is the process whereby incoming data is labeled based on past data samples; the algorithm is trained, often on manually labeled examples, to recognize certain types of objects and categorize them accordingly.

• The system has to know how to differentiate types of information and perform optical character, image, or binary recognition (deciding whether a particular bit of data is compliant or non-compliant with specific requirements, in a "yes" or "no" manner).

• e.g. medical imaging.


September 30, 2023 60
TYPES OF SUPERVISED LEARNING
II. Regression (Continuous Target Variable)

•Regression is the process of identifying patterns and calculating


the predictions of continuous outcomes.

•The system has to understand the numbers, their values, grouping


(for example, heights and widths), etc.

•eg. Housing Price Prediction
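A minimal scikit-learn sketch of a regression with a continuous target (the house data below is hypothetical):

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [square metres, number of rooms] -> price
X = [[50, 2], [80, 3], [120, 4], [65, 2]]
y = [150_000, 230_000, 350_000, 180_000]

model = LinearRegression().fit(X, y)
print(model.predict([[100, 3]]))   # predicted market value of an unseen house
```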

September 30, 2023 61


PROS & CONS OF SUPERVISED LEARNING
 PROS
• It allows to collect and produce data from previous experience.

• It is more trustworthy compared to unsupervised learning, which can be


computationally complex and less accurate in some instances.

 CONS
• Concrete examples are required for training classifiers.

• Decision boundaries can be over trained in absence of right examples.

• Difficulty in classifying big data.

September 30, 2023 62


EXAMPLE OF SUPERVISED
LEARNING ALGORITHMS

•Linear Regression
•k-Nearest Neighbor
•Naive Bayes
•Decision Trees
•Support Vector Machine (SVM)
•Random Forest
•Neural Networks (Deep learning)

September 30, 2023 63


Classification by Decision Tree Induction (DTI)
• DTI is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node is the root node.
• Why are decision tree classifiers so popular?
  • The construction of DT classifiers does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
  • DTs can handle high-dimensional data.
  • Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate.
  • They have good accuracy.
  • They may be used in medicine, manufacturing and production, financial analysis, astronomy and molecular biology.

September 30, 2023 64


Output: A Decision Tree for "buys_computer"

  age?
    <=30   -> student?
                no  -> no
                yes -> yes
    31..40 -> yes
    >40    -> credit rating?
                excellent -> no
                fair      -> yes
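A minimal scikit-learn sketch of inducing such a tree; the tiny encoded dataset below is hypothetical and only mimics the age/student/credit_rating attributes:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age (0:<=30, 1:31..40, 2:>40), student (0/1), credit_rating (0:excellent, 1:fair)]
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [2, 0, 0], [2, 1, 1], [2, 0, 1]]
y = [0, 1, 1, 0, 1, 1]          # buys_computer: 0 = no, 1 = yes

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["age", "student", "credit_rating"]))
print(tree.predict([[0, 1, 0]]))   # classify a new, unseen tuple
```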

September 30, 2023 65


Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities (the probability that a given tuple belongs to a particular class).
• Foundation: based on Bayes' theorem, given by Thomas Bayes.
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has comparable performance with decision tree and selected neural network classifiers.
• Class conditional independence: naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence.
• Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.
• Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.
• Bayesian belief networks: graphical models that allow the representation of dependencies among subsets of attributes.
September 30, 2023 66
Naïve Bayesian Classification
• The naïve Bayes classifier uses all the attributes.
• Two assumptions:
  – Attributes are equally important.
  – Attributes are statistically independent, i.e., knowing the value of one attribute says nothing about the value of another.
• The equal-importance and independence assumptions are never correct in real-life datasets.

September 30, 2023 67


Bayesian Theorem: Basics
• Let X be a data sample ("evidence"): its class label is unknown.
• Let H be the hypothesis that X belongs to class C.
  • E.g., our world of tuples is confined to customers described by the attributes age and income. X is a 35-year-old customer with an income of $40,000.
• Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X. P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.
• P(H) (prior probability): the initial probability.
  • E.g., X will buy a computer, regardless of age, income, etc.
• P(X): the prior probability of X, i.e. the probability that the sample data is observed (that a person from our set of customers is 35 years old and earns $40,000).
• P(X|H) (posterior probability): the probability of observing the sample X, given that the hypothesis holds.
  • E.g., given that X will buy a computer, the probability that X is 31..40 with medium income.
September 30, 2023 68
Bayesian Theorem
• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

  P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as: posterior = likelihood x prior / evidence.
• The classifier predicts that X belongs to class Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes.
• Practical difficulty: it requires initial knowledge of many probabilities, at significant computational cost.
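A tiny numeric illustration of the theorem; the probabilities below are made-up values, not taken from the slides:

```python
# Hypothetical figures for the customer example above.
p_h = 0.5           # prior P(H): a customer buys a computer
p_x_given_h = 0.3   # likelihood P(X|H): customer is 35 and earns $40,000, given they buy
p_x = 0.2           # evidence P(X): any customer is 35 and earns $40,000

p_h_given_x = p_x_given_h * p_h / p_x   # Bayes' theorem
print(p_h_given_x)                      # 0.75 -> posterior probability of buying
```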
September 30, 2023 69
Artificial Neural Networks
• Artificial Neural Networks (ANNs) were started by psychologists and neurobiologists to develop and test computational analogues of neurons.
• Other names: connectionist learning, prediction by neural networks, adaptive networks, neural computation, parallel distributed processing, collective computation.
• Artificial neural network components:
  • Units: a neural network is composed of a number of nodes, or units; a unit is a metaphor for a nerve cell body.
  • Links: units are connected by links; links represent synaptic connections from one unit to another.
  • Weights: each link has a numeric weight.

September 30, 2023 70
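A minimal sketch of a single unit combining its weighted input links through a sigmoid activation (the numbers are hypothetical):

```python
import math

def unit(inputs, weights, bias):
    """One unit: weighted sum over its input links, squashed by a sigmoid activation."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))

print(unit(inputs=[0.5, 0.2], weights=[0.4, -0.7], bias=0.1))   # output of one neuron
```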


Genetic Algorithms (GA)
• Genetic algorithms are based on an analogy to biological evolution.
• An initial population is created consisting of randomly generated rules.
  • Each rule is represented by a string of bits.
  • E.g., "if A1 and ¬A2 then C2" can be encoded as 100.
  • If an attribute has k > 2 values, k bits can be used.
• Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring.
• The fitness of a rule is represented by its classification accuracy on a set of training examples.
• Offspring are generated by crossover and mutation.
• The process continues until a population P evolves in which each rule in P satisfies a prespecified fitness threshold.
• Slow, but easily parallelizable.
September 30, 2023 71
Genetic Algorithms
A Genetic Algorithm (GA) is a computational model consisting of five parts (a minimal sketch follows below):
• A starting set of individuals, P.
• Crossover: a technique to combine two parents to create offspring.
• Mutation: randomly change an individual.
• Fitness: determine the best individuals.
• An algorithm which applies the crossover and mutation techniques to P iteratively, using the fitness function to determine the best individuals in P to keep.
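A toy sketch of those five parts in Python; the bit-counting fitness function is hypothetical and stands in for classification accuracy:

```python
import random

def fitness(rule):                         # toy fitness: number of 1-bits in the rule
    return sum(rule)

def crossover(p1, p2):                     # single-point crossover of two parents
    point = random.randrange(1, len(p1))
    return p1[:point] + p2[point:]

def mutate(rule, rate=0.05):               # flip each bit with a small probability
    return [1 - b if random.random() < rate else b for b in rule]

def evolve(pop_size=20, length=8, generations=30):
    P = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        P.sort(key=fitness, reverse=True)
        parents = P[: pop_size // 2]       # survival of the fittest
        offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(pop_size - len(parents))]
        P = parents + offspring
    return max(P, key=fitness)

print(evolve())   # best individual found
```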

September 30, 2023 72


What is the Support Vector Machine?
• "Support Vector Machine" (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.
• Support vectors are simply the coordinates of individual observations. The SVM classifier is a frontier that best segregates the two classes (hyper-plane/line).
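A minimal scikit-learn sketch of fitting such a separating hyper-plane on a built-in dataset:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)   # find the separating hyper-plane
print(clf.score(X_test, y_test))                   # accuracy on unseen data
```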

September 30, 2023 73


2. UNSUPERVISED LEARNING
• Unsupervised learning feeds on unlabeled data.

• Supervised learning needs to know the desired results in order to sort the data, whereas in unsupervised machine learning algorithms the desired results are unknown and yet to be defined.

• As no teacher is provided, no labeled training is given to the machine. The machine is therefore restricted to finding the hidden structure in unlabeled data by itself.

September 30, 2023 74


2. UNSUPERVISED LEARNING
• Unsupervised machine learning algorithms are used for:
  • exploring the structure of the information;
  • extracting valuable insights;
  • detecting patterns;
  • descriptive modeling.

• E.g. I have photos and want to put them into 20 groups.

September 30, 2023 75


TYPES OF UNSUPERVISED LEARNING

September 30, 2023 76


TYPES OF UNSUPERVISED LEARNING
I. Clustering (Target Variable not available)

• Clustering is an exploration of the data used to segment it into meaningful groups (i.e., clusters) based on their internal patterns, without prior knowledge of group credentials.

• The credentials are defined by the similarity of individual data objects and also by aspects of their dissimilarity from the rest.

• e.g. customer segmentation: grouping customers by purchasing behavior.

September 30, 2023 77


TYPES OF UNSUPERVISED LEARNING
II. Association(Target Variable not available) –
• An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as
people that buy X also tend to buy Y.

eg. Market Basket Analysis

September 30, 2023 78


EXAMPLE OF UNSUPERVISED
LEARNING ALGORITHMS

•PCA
•t-SNE
•k-means
•DBSCAN
•Apriori algorithm
• FP – Growth

• Dimensionality reduction: there is a lot of noise in the incoming data; machine learning algorithms use dimensionality reduction to remove this noise while distilling the relevant information.
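A minimal scikit-learn sketch combining two of the algorithms above, PCA for dimensionality reduction followed by k-means clustering (class labels are never used):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)             # labels are ignored: unsupervised
X_2d = PCA(n_components=2).fit_transform(X)   # dimensionality reduction (noise removal)
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X_2d)
print(clusters[:10])                          # cluster assignment of the first 10 samples
```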
September 30, 2023 79
REINFORCEMENT LEARNING
• Reinforcement learning is about taking suitable actions to maximize reward in a particular situation.

• It uses exploration/exploitation: an action takes place, its consequences are observed, and the next action takes the results of the first action into account.

• In supervised learning, the training data comes with an answer key, so the model is trained on the correct answers themselves. In reinforcement learning there is no answer; the agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.
September 30, 2023 80
REINFORCEMENT LEARNING

• Agent: an assumed entity which performs actions in an environment to gain some reward.

• Environment: a scenario that an agent has to face; it gives feedback via a positive or negative reward signal.

• State (s): the current situation returned by the environment.

September 30, 2023 81


REINFORCEMENT LEARNING
• The two main types of reward signals are:
  • A positive reward signal encourages continuing performance of a particular sequence of actions.
  • A negative reward signal penalizes certain activities and urges the algorithm to correct itself to stop getting penalties.

• However, the function of the reward signal may vary depending on the nature of the information.

• Overall, the system tries to maximize positive rewards and minimize negative ones.

• https://www.youtube.com/watch?v=KiHdKynXDtw

September 30, 2023 82


REINFORCEMENT LEARNING
• Input: the initial state from which the model will start.

• Output: there are many possible outputs, as there is a variety of solutions to a particular problem.

• Training: training is based on the input; the model returns a state, and the user decides whether to reward or punish the model based on its output.

• The model continues to learn.

• The best solution is decided based on the maximum reward.

September 30, 2023 83


REINFORCEMENT LEARNING

Various practical applications of Reinforcement Learning:

• RL can be used in robotics for industrial automation.

• RL can be used in machine learning and data processing.

• RL can be used to create training systems that provide custom instruction and materials according to the requirements of students.
September 30, 2023 84
REINFORCEMENT LEARNING

There are two important learning models in reinforcement learning (a minimal Q-learning sketch follows below):

• Markov Decision Process

• Q-learning
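A minimal sketch of tabular Q-learning, assuming the standard update rule Q(s,a) <- Q(s,a) + α (r + γ max_a' Q(s',a') - Q(s,a)); the states, actions and reward below are made-up placeholders:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                    # Q[(state, action)] -> estimated return
alpha, gamma, epsilon = 0.1, 0.9, 0.1     # learning rate, discount factor, exploration rate

def choose_action(state, actions):
    """Exploration/exploitation: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step toward reward + gamma * best estimated value of the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example of a single interaction in a made-up two-state environment.
q_update(state="s0", action="right", reward=1.0, next_state="s1", actions=["left", "right"])
print(Q[("s0", "right")])   # 0.1
```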

September 30, 2023 85


APPLICATIONS OF
SUPERVISED LEARNING,
UNSUPERVISED LEARNING
& REINFORCEMENT
LEARNING

September 30, 2023 86


TYPES OF ML
SUPERVISED LEARNING (Task Driven: predict next values)
• Regression: Linear, Polynomial
• Decision Tree
• Random Forest
• Classification: KNN, Trees, Logistic Regression, Naive Bayes, SVM

UNSUPERVISED LEARNING (Data Driven: identify clusters)
• Clustering: SVD, PCA, K-Means
• Dimensionality Reduction: Text Mining, Face Recognition, Big Data Visualization, Image Recognition
• Association Analysis: Apriori, FP-Growth
• Hidden Markov Model

REINFORCEMENT LEARNING (Learn from errors)
• Dynamic Programming
• Monte Carlo Tree Search (MCTS)
• Heuristic Methods
• Q-Learning
• Deep Adversarial Networks
• Temporal Difference (TD)
• Asynchronous Actor-Critic Agents (A3C)
September 30, 2023 88
STEPS TO SOLVE A MACHINE
LEARNING PROBLEM

• Data Gathering: collect data from various sources.

• Data Preprocessing: clean the data to have homogeneity.

• Feature Engineering: make your data more useful.

• Algorithm Selection & Training: select the right machine learning model.

• Making Predictions: evaluate the model.

September 30, 2023 89


1. Data Gathering
• Might depend on human work:
  • Manual labeling for supervised learning.
  • Domain knowledge; maybe even experts.

• May come for free, or "sort of": e.g., machine translation.

• The more the better: some algorithms need large amounts of data to be useful (e.g., neural networks).

• Quantity and quality of data dictate model accuracy.


September 30, 2023 90
2. Data Preprocessing
• Is there anything wrong with the data?
  • Missing values
  • Outliers
  • Bad encoding (for text)
  • Wrongly-labeled examples
  • Biased data: do I have many more samples of one class than the rest?

• Do I need to fix/remove any data?
September 30, 2023 91
3. Feature Engineering
• A feature is an individual measurable property of a phenomenon being observed.

• Our inputs are represented by a set of features.

• To classify spam email, features could be:
  • Number of words that have been ch4ng3d like this.
  • Language of the email (0 = English, 1 = Spanish).
  • Number of emojis.
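A toy sketch of turning one e-mail into such numeric features; the rules and character set below are hypothetical stand-ins:

```python
import re

def spam_features(email_text):
    """Map an e-mail to the toy features listed above."""
    words = email_text.split()
    return {
        "changed_words": sum(bool(re.search(r"[a-z]\d", w.lower())) for w in words),
        "language": int("hola" in email_text.lower()),      # crude 0=English, 1=Spanish flag
        "emojis": sum(ch in "😀😂🎉💰" for ch in email_text),
    }

print(spam_features("W1n fr33 m0ney now 💰 hola"))
```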

September 30, 2023 92


3. Feature Engineering
• Extract more information from existing data to make it more useful.
  • With good features, most algorithms can learn faster.

• Requires thought and knowledge of the data.

• Two steps:
  • Variable transformation (e.g., dates into weekdays, normalizing).
  • Feature creation (e.g., n-grams for texts, whether a word is capitalized to detect names, etc.).

September 30, 2023 93
4. Algorithm Selection & Training
• Supervised: Linear classifier, Naive Bayes, Support Vector Machines (SVM), Decision Tree, Random Forests, k-Nearest Neighbors, Neural Networks (deep learning)

• Unsupervised: PCA, t-SNE, k-means, DBSCAN, Apriori algorithm, FP-Growth

• Reinforcement: SARSA-λ, Q-Learning, Markov Decision Process
September 30, 2023 94
4. Algorithm Selection & Training
• Goal of training: making the correct prediction as often as possible.

• Incremental improvement:
  • Use of metrics for evaluating performance and comparing solutions.
  • Hyperparameter tuning (a hyperparameter is a parameter whose value is used to control the learning process).
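A minimal scikit-learn sketch of selecting a model, tuning one hyperparameter with cross-validation, and evaluating on held-out data (the dataset and parameter grid are illustrative choices, not prescribed by the slides):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameter tuning: C controls the amount of regularization applied during learning.
search = GridSearchCV(LogisticRegression(max_iter=5000), {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # the chosen hyperparameter value
print(search.score(X_test, y_test))  # accuracy of the tuned model on unseen data
```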
September 30, 2023 95
5. Making Predictions

September 30, 2023 96


Types of data in clustering analysis
• Interval-scaled attributes
• Binary attributes
• Nominal attributes
• Ordinal attributes
• Ratio-scaled attributes
• Attributes of mixed type

September 30, 2023 97


Data Types

Interval-Scaled Attributes
• Continuous measurements on a roughly linear scale.
• Examples: a height scale may range over the metre or the foot scale; a weight scale may range over the kilogram or the pound scale (e.g. 20 kg to 120 kg).
• Heights need to be standardized, since different scales can be used to express the same absolute measurement.

September 30, 2023 98


Binary Variables
• A contingency table for binary data (object i in rows, object j in columns):

            j = 1    j = 0    sum
  i = 1       a        b     a + b
  i = 0       c        d     c + d
  sum       a + c    b + d     p

• Distance measure for symmetric binary variables:   d(i, j) = (b + c) / (a + b + c + d)
• Distance measure for asymmetric binary variables:  d(i, j) = (b + c) / (a + b + c)
• Jaccard coefficient (similarity measure for asymmetric binary variables):  sim_Jaccard(i, j) = a / (a + b + c)
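A small sketch computing the three measures for two 0/1 vectors (the vectors are made-up examples):

```python
def binary_measures(i, j):
    """Contingency counts a, b, c, d for two binary vectors, then the three measures above."""
    a = sum(x == 1 and y == 1 for x, y in zip(i, j))
    b = sum(x == 1 and y == 0 for x, y in zip(i, j))
    c = sum(x == 0 and y == 1 for x, y in zip(i, j))
    d = sum(x == 0 and y == 0 for x, y in zip(i, j))
    symmetric = (b + c) / (a + b + c + d)
    asymmetric = (b + c) / (a + b + c)
    jaccard = a / (a + b + c)
    return symmetric, asymmetric, jaccard

print(binary_measures([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))   # (0.4, 0.5, 0.5)
```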
September 30, 2023 99
Nominal / Categorical Variables

• A generalization of the binary variable in that it can take more than 2 states, e.g., red, yellow, blue, green.
• Method 1: simple matching
  • m: # of matches, p: total # of variables
  • d(i, j) = (p - m) / p
• Method 2: use a large number of binary variables
  • creating a new binary variable for each of the M nominal states

September 30, 2023 100


Ratio-Scaled Variables

• Ratio-scaled variable: a positive measurement on a nonlinear scale, approximately at exponential scale, such as Ae^(Bt) or Ae^(-Bt).
• Methods:
  • treat them like interval-scaled variables (not a good choice! the scale can be distorted)
  • apply a logarithmic transformation: y_if = log(x_if)
  • treat them as continuous ordinal data and treat their rank as interval-scaled

September 30, 2023 Data Mining: Concepts and Techniques 101


Machine Learning vs. Deep Learning

1. Machine Learning is a superset of Deep Learning. / Deep Learning is a subset of Machine Learning.

2. The data representation in Machine Learning is quite different, as it uses structured data. / The data representation in Deep Learning is quite different, as it uses neural networks (ANNs).

3. Machine Learning is an evolution of AI. / Deep Learning is an evolution of Machine Learning; essentially, it is "deep" machine learning.

4. Machine Learning works with thousands of data points. / Deep Learning works with big data: millions of data points.

5. Outputs: a numerical value, such as a classification score. / Outputs: anything from numerical values to free-form elements, such as free text and sound.

6. Uses various types of automated algorithms that learn to model functions and predict future actions from data. / Uses a neural network that passes data through processing layers to interpret data features and relations.

7. Algorithms are directed by data analysts to examine specific variables in data sets. / Algorithms are largely self-directed for data analysis once they are put into production.

8. Machine Learning is widely used to stay competitive and learn new things. / Deep Learning solves complex machine learning problems.

September 30, 2023 102
