5th Unit Answer Bank AIML


State and explain non-parametric density estimation

Nonparametric Density Estimation

Density estimation: given a sample S = {x_i}, i = 1..N, drawn from a distribution, obtain an estimate of the density function at any point.

Parametric approach: assume a parametric density family f(·|θ), e.g. N(µ, σ²), and obtain the best estimator of θ.
Advantages:
• Efficient
• Robust to noise: robust estimators can be used
Problem with parametric methods:
• An incorrectly specified parametric model has a bias that cannot be removed even with a large number of samples.

Nonparametric approach: directly obtain a good estimate of the entire density from the sample. The most famous example is the histogram.
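To make the histogram estimator concrete, here is a minimal sketch in Python (NumPy, the simulated sample, and the choice of 20 bins are assumptions made for illustration, not part of the original answer):

import numpy as np

# Sample S = {x_i}, i = 1..N, drawn from an unknown distribution
# (here simulated from a normal distribution purely for illustration).
rng = np.random.default_rng(0)
S = rng.normal(loc=0.0, scale=1.0, size=1000)

# Histogram density estimate: counts normalised so the bars integrate to 1.
counts, edges = np.histogram(S, bins=20, density=True)

def density_at(x):
    """Return the histogram estimate of the density at point x."""
    idx = np.searchsorted(edges, x, side="right") - 1
    if 0 <= idx < len(counts):
        return counts[idx]
    return 0.0  # outside the range covered by the sample

print(density_at(0.0))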
Analyze the K-nearest neighbor estimator
K-Nearest Neighbors
The KNN algorithm assumes that similar things exist in close proximity. In other words,
similar things are near to each other.

“Birds of a feather flock together.”


[Image: similar data points typically exist close to each other]
Notice in the image above that most of the time, similar data points are close to each other.
The KNN algorithm hinges on this assumption being true enough for the algorithm to be
useful. KNN captures the idea of similarity (sometimes called distance, proximity, or
closeness) with some mathematics we might have learned in childhood: calculating the
distance between points on a graph.

Note: An understanding of how we calculate the distance between points on a graph is
necessary before moving on. If you are unfamiliar with or need a refresher on how this
calculation is done, thoroughly read "Distance Between 2 Points" in its entirety and
come right back.

There are other ways of calculating distance, and one way might be preferable depending
on the problem we are solving. However, the straight-line distance (also called the
Euclidean distance) is a popular and familiar choice.
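For reference, here is a minimal sketch of the straight-line (Euclidean) distance between two points, written as a plain Python function (the tuple-of-coordinates point format is an assumption made here for illustration):

from math import sqrt

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0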

The KNN Algorithm

1. Load the data

2. Initialize K to your chosen number of neighbors

3. For each example in the data

3.1 Calculate the distance between the query example and the current example from the
data.

3.2 Add the distance and the index of the example to an ordered collection

4. Sort the ordered collection of distances and indices from smallest to largest (in
ascending order) by the distances
5. Pick the first K entries from the sorted collection

6. Get the labels of the selected K entries

7. If regression, return the mean of the K labels

8. If classification, return the mode of the K labels
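The steps above translate almost line for line into code. Below is a minimal sketch in plain Python; the helper names knn and euclidean_distance and the (features, label) data format are assumptions made here for illustration:

from math import sqrt
from statistics import mean, mode

def euclidean_distance(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn(data, query, k, regression=False):
    """data: list of (features, label) pairs; query: feature tuple."""
    # Steps 3-3.2: distance from the query to every example, kept with its index.
    distances = [(euclidean_distance(features, query), i)
                 for i, (features, _) in enumerate(data)]
    # Step 4: sort by distance, ascending.
    distances.sort()
    # Steps 5-6: labels of the K nearest examples.
    k_labels = [data[i][1] for _, i in distances[:k]]
    # Steps 7-8: mean for regression, mode for classification.
    return mean(k_labels) if regression else mode(k_labels)

# Example: classify a point using its 3 nearest neighbours.
data = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"), ((8, 8), "green")]
print(knn(data, query=(1.5, 1.5), k=3))  # "red"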

Choosing the right value for K

To select the K that’s right for your data, we run the KNN algorithm several times with
different values of K and choose the K that reduces the number of errors we encounter
while maintaining the algorithm’s ability to accurately make predictions when it’s given
data it hasn’t seen before.

Here are some things to keep in mind:

1. As we decrease the value of K to 1, our predictions become less stable. Imagine K=1
and a query point surrounded by several reds and one green, where the green happens
to be the single nearest neighbor. Reasonably, we would think the query point is most
likely red, but because K=1, KNN incorrectly predicts that the query point is green.

2. Conversely, as we increase the value of K, our predictions become more stable due to
majority voting / averaging, and thus more likely to be accurate (up to a certain point).
Eventually, we begin to see an increasing number of errors; it is at this point that we
know we have pushed the value of K too far.

3. In cases where we are taking a majority vote (e.g. picking the mode in a classification
problem) among labels, we usually make K an odd number to have a tiebreaker.
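One common way to carry out this search (one reasonable option, not prescribed by the text above) is cross-validation over a grid of K values, for example with scikit-learn:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate values of K and keep the one with the
# best cross-validated accuracy on held-out folds.
scores = {}
for k in range(1, 16, 2):  # odd values of K act as a tiebreaker
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])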

Advantages
1. The algorithm is simple and easy to implement.

2. There’s no need to build a model, tune several parameters, or make additional
assumptions.

3. The algorithm is versatile. It can be used for classification, regression, and search.

Disadvantages

1. The algorithm gets significantly slower as the number of examples and/or
predictors/independent variables increases.

Elaborate on non-parametric classification?

Nonparametric Machine Learning Algorithms


Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine
learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.

Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry
too much about choosing just the right features.

Nonparametric methods seek to best fit the training data in constructing the mapping function, whilst maintaining some
ability to generalize to unseen data. As such, they are able to fit a large number of functional forms.

An easy to understand nonparametric model is the k-nearest neighbors algorithm that makes predictions based on the k
most similar training patterns for a new data instance. The method does not assume anything about the form of the
mapping function other than patterns that are close are likely to have a similar output variable.

Some more examples of popular nonparametric machine learning algorithms are:

• k-Nearest Neighbors
• Decision Trees like CART and C4.5
• Support Vector Machines
Benefits of Nonparametric Machine Learning Algorithms:

• Flexibility: Capable of fitting a large number of functional forms.
• Power: No assumptions (or weak assumptions) about the underlying function.
• Performance: Can result in higher performance models for prediction.

Limitations of Nonparametric Machine Learning Algorithms:

• More data: Require a lot more training data to estimate the mapping function.
• Slower: A lot slower to train as they often have far more parameters to train.
• Overfitting: More of a risk to overfit the training data and it is harder to explain why specific predictions are made.

Nonparametric machine learning algorithms are those which do not make specific
assumptions about the form of the mapping function; by not making such assumptions,
they are free to choose any functional form from the training data.
The word nonparametric does not mean that the model has no parameters at all, but
rather that its parameters are flexible and can change. When dealing with ranked data,
one may turn to nonparametric modeling, in which the order in which the observations
are ranked carries part of the information usually captured by parameters.
A simple-to-understand nonparametric model is the k-nearest neighbors algorithm, which
makes predictions for a new data instance based on its k most similar training patterns.
The only assumption it makes about the data set is that the training patterns that are
most similar are most likely to have a similar result.


Parametric vs. Nonparametric modeling

1. Parametric models summarize the data with a fixed, finite set of parameters, while
nonparametric models let the effective number of parameters grow with the training data.

2. Parametric models are able to infer the traditional measures associated with normal
distributions, including the mean, median, and mode. While some of the data handled by
nonparametric methods may be roughly normal, one often cannot assume the data comes
from a normal distribution.

3. Feature engineering is important for parametric models, because you can poison a
parametric model by feeding it many unrelated features. Nonparametric models handle
much of this themselves: we can feed all the data we have to a nonparametric algorithm,
and it can largely ignore the unimportant features rather than being thrown off by them.

4. A parametric model can predict future values using only the parameters. While
nonparametric machine learning algorithms are often slower and require large amounts
of data, they are rather flexible as they minimize the assumptions they make about the
data.

Justify the condensed nearest neighbor?


class imblearn.under_sampling.CondensedNearestNeighbour(*, sampling_strategy='auto', random_state=None, n_neighbors=None, n_seeds_S=1, n_jobs=None)

Undersample based on the condensed nearest neighbour method.

Read more in the User Guide.

Parameters

sampling_strategy : str, list or callable
    Sampling information to sample the data set.
    • When str, specify the class targeted by the resampling. Note that the number of samples will not be equal in each class. Possible choices are:
      'majority': resample only the majority class;
      'not minority': resample all classes but the minority class;
      'not majority': resample all classes but the majority class;
      'all': resample all classes;
      'auto': equivalent to 'not minority'.
    • When list, the list contains the classes targeted by the resampling.
    • When callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each class.

random_state : int, RandomState instance, default=None
    Control the randomization of the algorithm.
    • If int, random_state is the seed used by the random number generator;
    • If RandomState instance, random_state is the random number generator;
    • If None, the random number generator is the RandomState instance used by np.random.

n_neighbors : int or estimator object, default=None
    If int, the size of the neighbourhood to consider when computing the nearest neighbors. If object, an estimator that inherits from KNeighborsMixin that will be used to find the nearest neighbors. If None, a KNeighborsClassifier with a 1-NN rule will be used.

n_seeds_S : int, default=1
    Number of samples to extract in order to build the set S.

n_jobs : int, default=None
    Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.

Attributes

sampling_strategy_ : dict
    Dictionary containing the information to sample the dataset. The keys correspond to the class labels from which to sample and the values are the number of samples to sample.

estimator_ : estimator object
    The validated K-nearest neighbour estimator created from the n_neighbors parameter.

sample_indices_ : ndarray of shape (n_new_samples,)
    Indices of the samples selected. (New in version 0.4.)

n_features_in_ : int
    Number of features in the input dataset. (New in version 0.9.)
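To show the method in action, here is a minimal usage sketch assuming the imbalanced-learn package is installed; the synthetic dataset and its 90/10 class ratio are arbitrary choices for illustration:

from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import CondensedNearestNeighbour

# Build an artificial imbalanced two-class dataset.
X, y = make_classification(n_samples=1000, n_features=4,
                           weights=[0.9, 0.1], random_state=42)
print("Before:", Counter(y))

# Undersample the majority class with the condensed nearest neighbour rule.
cnn = CondensedNearestNeighbour(random_state=42)
X_resampled, y_resampled = cnn.fit_resample(X, y)
print("After:", Counter(y_resampled))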

Illustrate the K-armed bandit in detail?


List and explain in detail the elements of reinforcement
learning?
Machine learning and Artificial Intelligence are becoming hot technologies right now. While most of
the available technologies and expertise are centered around supervised and unsupervised techniques,
the real AI paradigm, as nature presents it to us, lies in the ability to learn while interacting with the
environment.
Traditional supervised learning is unrealistic, as no real-world entity would ever find itself in a
situation where it is presented with a set of positive and negative examples. Rather, living organisms
learn by interacting and experimenting. By doing so they not only learn a model to discriminate
between various categories but also learn the right policy to obtain the desired outcome.
Supervised learning is suitable for generalizing to new situations but not for learning from
interactions with an environment: all possible interactions would have to be in the database, labelled
with the correct action. The agent can generalize by inferring an action for an unseen environment
state, but it has no means of correcting itself if it was wrong.
Reinforcement learning is learning how to map situations to actions so as to maximize a
numerical reward signal. The system is not told which actions to take, but instead must discover
which actions yield the most reward by trying them.
This class of policy learning is called reinforcement learning, and it belongs to the class of semi-
supervised learning methodologies.
In this tutorial we will explore the fundamental concepts behind these techniques and implement
them using Python.
Reward, policy, actions: what do all these terms mean? These are some of the fundamental concepts
in reinforcement learning, and I will explain them in the coming sections.
Agents and Environments
An agent can be viewed as an object that is perceiving its environment through sensors and acting
upon that environment through actuators. This simple idea is illustrated in the following figure.

An animal agent has eyes, ears, and other organs for sensors and mouth, legs, wings, and so on for
actuators. A software agent receives data as sensory inputs and acts on the environment by displaying
on the screen, writing files, and sending network packets.
We use the term percept to refer to the agent’s perceptual inputs at any given instant. An agent’s
choice of action at any given time depends on what it perceives or has perceived up to that point, but
not on anything it has not perceived. The agent’s behavior is therefore described by the agent
function, which maps any given percept sequence to an action.
This function can be thought of as a very large table (infinite, in fact) which could, in principle, be
constructed by trying out all possible percept sequences and recording the agent’s response to each.
In practice, the table is implemented by the agent program.
The Elements of Reinforcement Learning
Beyond the agent and the environment, there are four main elements of a reinforcement learning
system: a policy, a reward, a value function, and, optionally, a model of the environment.
A policy defines the way the agent behaves at a given time. Roughly speaking, a policy is a mapping
from the states of the environment to the actions the agent takes in those states. The policy can be a
simple function or a lookup table in the simplest cases, or it may involve complex function
computations. The policy is the core of what the agent learns.
A reward defines the goal of a reinforcement learning problem. On each time step, the agent’s action
results in a reward. The agent’s ultimate objective is to maximize the total reward it receives; the
reward thus distinguishes good action outcomes from bad ones for the agent. In a natural system, we
might think of rewards as experiences of pleasure and pain.
The reward is the primary means of shaping the policy: if an action selected by the policy results in
a low reward, the policy can be changed to select some other action in the same situation.
Whereas the reward signal indicates what is good in an immediate sense (each action immediately
results in a reward), a value function defines what is good in the long run.
The value of a state is the total amount of reward the agent can expect to accumulate in the
future if it starts from that state. Values indicate the long-term desirability of states, taking
into account the likely future states and the rewards those states yield. Even if a state
yields a low immediate reward, it can still have a high value because it is regularly followed by other
states that yield higher rewards.
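For readers who want the standard formula (an addition here, not spelled out in the original answer), the value of a state s under a policy π is usually written as the expected discounted sum of future rewards:

Vπ(s) = E[ R(t+1) + γ·R(t+2) + γ²·R(t+3) + … | S(t) = s ]

where γ (with 0 ≤ γ ≤ 1) is a discount factor that controls how strongly future rewards count toward the value of the current state.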
The interplay between rewards and values is often confusing for beginners, since one is an aggregation
of the other. Rewards are primary and immediate; values, on the other hand, are predictions of
reward and therefore secondary. Without rewards there would be no values, and the only purpose of
estimating values is to achieve more reward. Nevertheless, it is values that we consider when making
and evaluating decisions: action choices are ultimately made based on value judgments.
The agent should seek actions that lead to states of highest value, not highest immediate reward,
because those states ultimately earn the greatest amount of reward over the long run.
How, then, do we determine values and rewards?
Rewards are given directly by the environment, but values must be continually estimated from the
sequences of observations the agent makes at each interaction. This makes efficient value estimation
the most important component of reinforcement learning algorithms.

Another important element of some reinforcement learning systems is a model of the
environment. This is something that reproduces the behavior of the environment and allows
inferences to be made about how the environment will react. Such a model helps the agent
predict the next reward if an action is taken, and hence base the current action selection on the
predicted reaction of the environment.
Exploitation vs. exploration
A Reinforcement Learning agent will gradually learn the best (or near-best) policy essentially based
on trial and error, through random interactions with the environment and by incorporating the
responses of these interactions, in order to improve the overall performance. The agent’s actions
serve both as a means to explore (learn better strategies) and a way to exploit (greedily use the best
available strategy). Since exploration is costly in terms of resources, time and opportunity, a crucial
question in reinforcement learning is how to address the dichotomy between exploration of uncharted
territory and exploitation of existing proven strategies. Specifically, the agent has to balance
greedily exploiting what it has learned so far, choosing the actions that currently yield the highest
reward, against continuously exploring the environment to acquire more information and potentially
achieve a higher value in the long term.
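One standard way to strike this balance is the epsilon-greedy rule (named here as an illustration; the text above does not commit to a specific method). The sketch below applies it to a simple K-armed bandit in which each arm pays a noisy reward; all the numbers are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)

# A K-armed bandit: each arm pays a reward drawn from its own (hidden) mean.
true_means = rng.normal(size=5)          # 5 arms, unknown to the agent
estimates = np.zeros(5)                  # the agent's value estimates
counts = np.zeros(5)                     # how often each arm was pulled
epsilon = 0.1                            # fraction of exploratory moves

for step in range(10_000):
    if rng.random() < epsilon:
        arm = rng.integers(5)            # explore: pick a random arm
    else:
        arm = int(np.argmax(estimates))  # exploit: pick the best arm so far
    reward = rng.normal(true_means[arm], 1.0)
    counts[arm] += 1
    # Incremental average of the rewards observed for this arm.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("best true arm:", int(np.argmax(true_means)),
      "most-pulled arm:", int(np.argmax(counts)))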

An example
To illustrate these ideas, let’s use a simple example: the vacuum-cleaner world described below.

This is a simple, made-up world, so we can describe everything that happens in it and consider
several variations. This particular world has 9 locations: squares labelled by coordinates (i, j), where
i = 1, 2, 3 and j = 1, 2, 3. The vacuum agent perceives which square it is in and whether there is dirt
in the square. It can choose to move left, right, up or down, suck up the dirt, or do nothing. One very
simple agent function is the following: if the current square is dirty, then suck; otherwise, move to
the next square.
It’s important to define when a reward is given to the agent and whether it is positive or negative. A
naive approach would be to give a positive reward whenever the agent has cleaned all the squares.
However, as the agent explores randomly, the chance of receiving that reward by cleaning all the
squares is small. To guide the agent towards the desired goal, a better strategy would be to give
a small positive reward whenever it cleans a square, a small negative reward if the agent
attempts to clean an already cleaned square, and a big positive reward when all squares are
cleaned.
Here is how the vacuum-cleaner problem would be approached using value functions.
First, we set up a table of numbers, one for each possible state of this small world. Each
number will be the latest estimate of the probability of finishing the cleaning from that state. We
treat this estimate as the state’s value, and the whole table is the learned value function. State A has
a higher value than state B, or is considered “better” than state B, if the current estimate of the
probability of finishing the cleaning from A is higher than it is from B. All the states in which all the
squares are clean have a probability of 1, because the cleaning is already complete.
To select the next move, the agent examines the states that would result from each of the possible
moves (one for each of the 4 directions, plus the 2 options of sucking up dirt or not) and looks up their
current values in the table. Most of the time the agent moves greedily, selecting the move that
leads to the state with the greatest value, that is, with the highest estimated probability of finishing
the cleaning. Occasionally, however, the agent chooses randomly from among the other moves instead.
These are called exploratory moves because they cause the agent to experience states it might
otherwise never see. The value of this exploration becomes apparent if we add to the value function a
reward related to how fast the space is cleaned; this would allow the agent to select a better space-
traversal strategy.
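The selection rule just described can be sketched in code. Below is a minimal, simplified illustration; the state encoding, the fixed 3x3 grid, the uniform initial value of 0.5, and the epsilon of 0.1 are assumptions made here for brevity, not part of the original text:

import random

random.seed(0)

SIZE = 3
MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

def state_key(pos, dirt):
    """A state is the agent position plus the set of still-dirty squares."""
    return (pos, frozenset(dirt))

# Value table: estimated probability of finishing the cleaning from each state.
# Unknown states start at 0.5; fully clean states are worth 1.0.
values = {}

def value(pos, dirt):
    if not dirt:
        return 1.0
    return values.setdefault(state_key(pos, dirt), 0.5)

def choose_move(pos, dirt, epsilon=0.1):
    """Greedy most of the time, with occasional exploratory moves."""
    candidates = []
    if pos in dirt:                                    # sucking is an option here
        candidates.append(("suck", pos, dirt - {pos}))
    for name, (di, dj) in MOVES.items():
        ni, nj = pos[0] + di, pos[1] + dj
        if 1 <= ni <= SIZE and 1 <= nj <= SIZE:
            candidates.append((name, (ni, nj), dirt))
    if random.random() < epsilon:                      # exploratory move
        return random.choice(candidates)
    return max(candidates, key=lambda c: value(c[1], c[2]))  # greedy move

dirt = {(1, 1), (3, 2)}
print(choose_move((1, 2), dirt))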

Elaborate model-based learning with examples


What is Model-Based Machine Learning (MBML)?
The field of machine learning has seen the development of thousands of learning algorithms.
Typically, scientists choose from these algorithms to solve specific problems, their choices often
being limited by their familiarity with these algorithms. In this classical/traditional framework of
machine learning, scientists are constrained to making some assumptions so as to use an existing
algorithm. This is in contrast to the model-based machine learning approach which seeks to
create a bespoke solution tailored to each new problem.

The goal of MBML is "to provide a single development framework which supports the creation
of a wide range of bespoke models". This framework emerged from an important convergence
of three key ideas:

1. the adoption of a Bayesian viewpoint,
2. the use of factor graphs (a type of probabilistic graphical model), and
3. the application of fast, deterministic, efficient and approximate inference algorithms.

The core idea is that all assumptions about the problem domain are made explicit in the form of a
model. In this framework, a model is simply a set of assumptions about the world expressed in a
probabilistic graphical format with all the parameters and variables expressed as random
components.

The Key Ideas of MBML


Bayesian Inference

The first key idea enabling this different framework for machine learning is Bayesian
inference/learning. In MBML, latent/hidden parameters are expressed as random variables with
probability distributions. This allows for a coherent and principled manner of quantification of
uncertainty in the model parameters. Once the observed variables in the model are fixed to their
observed values, initially assumed probability distributions (i.e. priors) are updated using the
Bayes' theorem.

This is in contrast to the traditional/classical machine learning framework, where model
parameters are assigned average values that are determined by optimizing an objective function.
Bayesian inference on large models over millions of variables is similarly implemented using the
Bayes' theorem but in a more complex manner. This is because Bayes' theorem is an exact
inference technique that is intractable over large datasets. In the past decade, the increase in the
processing power of computers has enabled research and development of fast and efficient
inference algorithms that can scale to large data like Belief Propagation (BP), Expectation
Propagation (EP), and Variational Bayes (VB).

Factor Graphs

The second cornerstone to MBML is the use of Probabilistic Graphical Models (PGM),
particularly factor graphs. A PGM is a diagrammatic representation of the joint probability
distribution over all random variables in a model expressed as a graph. Factor graphs are a type
of PGM that consist of circular nodes representing random variables, square nodes for the
conditional probability distributions (factors), and edges for the conditional dependencies between
nodes (Figure 1). They provide a general framework for modeling the joint distribution of a set
of random variables.

The joint probability P(μ, X) over the whole model in Figure 1 is factorized as:

P(μ, X)=P(μ)*P(X|μ)

Where μ is the model parameter and X are the set of observed variables.
Figure 1: A Factor Graph

In factor graphs, we treat the latent parameters as random variables and learn their probability
distributions using Bayesian inference algorithms along the graph. Inference/learning reduces to
taking products of factors over subsets of variables in the graph, which allows for easy
implementation of local message-passing algorithms.

Probabilistic Programming (PP)

There's a revolution in computer science called probabilistic programming (PP), in which
programming languages are now built to compute with uncertainty in addition to computing with
logic. This means that such languages can support random variables, constraints on variables, and
inference packages. Using a PP language, you can describe a model of your problem in a compact
form with a few lines of code, and an inference engine is then called to automatically generate
inference routines (and even source code) to solve that problem. Some notable examples of PP
languages include Infer.NET, Stan, BUGS, Church, Figaro and PyMC.

Stages of MBML
There are 3 steps to model-based machine learning, namely:

1. Describe the Model: Describe the process that generated the data using factor graphs.
2. Condition on Observed Data: Fix the observed variables to their known quantities.
3. Perform Inference: Perform backward reasoning to update the prior distribution over the latent
variables or parameters. In other words, calculate the posterior probability distributions of latent
variables conditioned on observed variables.
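As a toy illustration of these three stages, here is a minimal sketch using PyMC (one of the PP languages listed above; assuming PyMC v4+ is installed). The single-parameter Gaussian model mirrors the factorization P(μ, X) = P(μ)*P(X|μ) from Figure 1; the priors and the simulated data are arbitrary choices made here:

import numpy as np
import pymc as pm

# Observed data X (simulated here purely for illustration).
rng = np.random.default_rng(0)
X_obs = rng.normal(loc=2.0, scale=1.0, size=100)

with pm.Model() as model:
    # 1. Describe the model: prior P(mu) and likelihood P(X | mu).
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    X = pm.Normal("X", mu=mu, sigma=1.0, observed=X_obs)  # 2. Condition on data

    # 3. Perform inference: approximate the posterior P(mu | X).
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

print(float(idata.posterior["mu"].mean()))

The three stages map directly onto the code: the model block describes the assumptions, observed= conditions on the data, and pm.sample performs the (approximate) inference.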
Write in detail about partially observable states in learning?
A partially observable Markov decision process (POMDP) is a combination of an MDP and a hidden
Markov model. Instead of the state being observable, there are partial and/or noisy observations of the state that the
agent gets to observe before it has to act.
A POMDP consists of:

• S, a set of states of the world;
• A, a set of actions;
• O, a set of possible observations;
• P(S0), which gives the probability distribution of the starting state;
• P(S′ | S, A), which specifies the dynamics: the probability of getting to state S′ by doing action A from state S;
• R(S, A, S′), which gives the expected reward of starting in state S, doing action A, and transitioning to state S′; and
• P(O | S), which gives the probability of observing O given that the state is S.
A finite part of a POMDP can be depicted using a decision diagram, as in Figure 9.22.

Figure 9.22: A POMDP as a dynamic decision network
There are three main ways to approach the problem of computing the optimal policy for a POMDP:

• Solve the associated dynamic decision network using variable elimination for decision networks (Figure 9.13, extended to include discounted rewards). The policy created is a function of the history of the agent. The problem with this approach is that the history is unbounded, and the size of a policy is exponential in the length of the history. This only works when the history is short or is deliberately cut short.

• Make the policy a function of the belief state, a probability distribution over the states. Maintaining the belief state is the problem of filtering. The problem with this approach is that, with n states, the set of belief states is an (n−1)-dimensional real space. However, because the value of a sequence of actions only depends on the states, the expected value is a linear function of the values of the states. Because plans can be conditional on observations, and we only consider optimal actions for any belief state, the optimal policy for any finite look-ahead is piecewise linear and convex.

• Search over the space of controllers for the best controller. Thus, the agent searches over what to remember and what to do based on its belief state and observations. Note that the first two proposals are instances of this approach: the agent remembers all of its history, or the agent has a belief state that is a probability distribution over possible states. In general, the agent may want to remember some parts of its history but have probabilities over some other features. Because it is unconstrained over what to remember, the search space is enormous.
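To make the belief-state idea concrete, here is a minimal sketch of the filtering update for a tiny two-state POMDP; the transition, observation, and belief numbers are invented here purely for illustration:

import numpy as np

# Two hidden states; one action considered; two possible observations.
# P_trans[s, s'] = P(S' = s' | S = s, A = a) for the chosen action a.
P_trans = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
# P_obs[s', o] = P(O = o | S' = s').
P_obs = np.array([[0.7, 0.3],
                  [0.1, 0.9]])

def belief_update(belief, observation):
    """b'(s') is proportional to P(O | s') * sum_s P(s' | s, a) * b(s)."""
    predicted = belief @ P_trans              # push the belief through the dynamics
    unnormalized = P_obs[:, observation] * predicted
    return unnormalized / unnormalized.sum()  # renormalize to a distribution

belief = np.array([0.5, 0.5])                 # initial belief over the two states
belief = belief_update(belief, observation=1)
print(belief)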
