What Are The Basic Concepts in Machine Learning
I have found that the best way to discover and get a handle on the basic concepts in machine
learning is to review the introductory chapters of machine learning textbooks and to
watch the videos from the first module of online courses.
Machine Learning
The first half of the lecture is on the general topic of machine learning.
Writing software is the bottleneck: we do not have enough good developers. Let the data
do the work instead of people. Machine learning is the way to make programming
scalable.
Sample applications of machine learning:
Web search: ranking pages based on what you are most likely to click on.
Computational biology: rational design of drugs in the computer based on past
experiments.
Finance: deciding who to send which credit card offers to, evaluating the risk on
credit offers, and deciding where to invest money.
E-commerce: predicting customer churn, and whether or not a transaction is
fraudulent.
Space exploration: space probes and radio astronomy.
Robotics: how to handle uncertainty in new environments, as in autonomous self-driving
cars.
Information extraction: asking questions over databases across the web.
Social networks: machine learning to extract value from data on relationships and
preferences.
Debugging: use in labor-intensive computer science processes such as debugging,
where machine learning could suggest where the bug is likely to be.
What is your domain of interest and how could you use machine learning in that
domain?
Types of Learning
There are four types of machine learning:
Supervised learning: (also called inductive learning) the training data includes the
desired outputs.
Unsupervised learning: the training data does not include the desired outputs;
clustering is an example.
Semi-supervised learning: the training data includes a few desired outputs.
Reinforcement learning: rewards come from a sequence of actions.
Supervised learning is the most mature, the most studied, and the type of learning used
by most machine learning algorithms.
Inductive Learning is where we are given examples of a function in the form of data (x)
and the output of the function (f(x)). The goal of inductive learning is to learn the function
for new data (x).
Classification: when the function being learned is discrete.
Regression: when the function being learned is continuous.
Probability Estimation: when the output of the function is a probability.
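As a rough illustration of the three cases, here is a minimal sketch using scikit-learn on made-up data; the numbers and model choices are placeholders, not part of the lecture.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up training data: inputs x and observed outputs f(x).
X = [[1], [2], [3], [4], [5]]

# Classification: the function being learned is discrete (0/1 labels).
clf = LogisticRegression().fit(X, [0, 0, 0, 1, 1])
print(clf.predict([[3.5]]))        # predicted class for new data

# Regression: the function being learned is continuous.
reg = LinearRegression().fit(X, [1.1, 1.9, 3.2, 3.9, 5.1])
print(reg.predict([[3.5]]))        # predicted value for new data

# Probability estimation: the output of the function is a probability.
print(clf.predict_proba([[3.5]]))  # [P(class 0), P(class 1)]
```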
Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine learning in
practice as a data analyst or data scientist. In practice, the process often looks like:
1. Start Loop
1. Understand the domain, prior knowledge and goals. Talk to domain
experts. Often the goals are very unclear, and you often have more things to try than you
can possibly implement.
2. Data integration, selection, cleaning and pre-processing. This is often
the most time-consuming part. It is important to have high-quality data, but the more
data you have, the dirtier it tends to be. Garbage in, garbage out.
3. Learning models. The fun part. This part is very mature. The tools are
general.
4. Interpreting results. Sometimes it does not matter how the model works
as long as it delivers results. Other domains require that the model is understandable,
and you will be challenged by human experts.
5. Consolidating and deploying discovered knowledge. The majority of
projects that are successful in the lab are not used in practice. It is very hard to get
something used.
2. End Loop
It is not a one-shot process; it is a cycle. You need to run the loop until you get a result
that you can use in practice. The data can also change, requiring a new pass through the loop.
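As a minimal sketch of one pass through steps 2 to 4, assuming a hypothetical CSV file with hypothetical column names (churn prediction is just a convenient stand-in problem):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical data set: the file and column names are illustrative only.
df = pd.read_csv("customers.csv")

# Step 2: selection, cleaning and pre-processing (often the slowest part).
df = df.dropna(subset=["age", "balance", "churned"])
X, y = df[["age", "balance"]], df["churned"]

# Step 3: learn a model (the mature, well-tooled part).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Step 4: interpret the results before anyone relies on them.
print(classification_report(y_test, model.predict(X_test)))
```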
Inductive Learning
The second part of the lecture is on the topic of inductive learning. This is the general
theory behind supervised learning.
In practice it is almost always too hard to estimate the function exactly, so we look for
very good approximations of it. Inductive learning is a good idea for problems such as
the following:
Problems where there is no human expert. If people do not know the answer
they cannot write a program to solve it. These are areas of true discovery.
Humans can perform the task but no one can describe how to do it. There
are problems where humans can do things that computers cannot do, or cannot do well.
Examples include riding a bike or driving a car.
Problems where the desired function changes frequently. Humans could
describe it and could write a program to do it, but the problem changes too often for that
to be cost-effective. An example is the stock market.
Problems where each user needs a custom function. It is not cost-effective to
write a custom program for each user. Examples are recommendations of movies or books
on Netflix or Amazon.
The Essence of Inductive Learning
We can write a program that works perfectly for the data that we have. This function will
be maximally overfit, but we have no idea how well it will work on new data; it will likely
work very badly, because we may never see the same examples again.
The data alone is not enough. If you assume nothing about the problem, you can predict
anything you like, and that would be naive.
In practice we are not naive. There is an underlying problem, and we are interested in an
accurate approximation of the function. The number of possible classifiers is doubly
exponential in the number of input attributes: with n boolean attributes there are 2^n
input states and 2^(2^n) possible boolean classifiers, so for n = 6 there are already about
1.8 x 10^19 candidates. Finding a good approximation of the function is very difficult.
There are classes of hypotheses that we can try: the forms, or representations, that the
solution may take. We cannot know which is most suitable for our problem beforehand;
we have to use experimentation to discover what works on the problem, as in the sketch
below.
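For instance, a minimal sketch of that experimentation might compare a few common hypothesis classes by cross-validation; the particular models and the built-in data set are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try several hypothesis classes; since we cannot know beforehand which
# representation suits the problem, we measure each one empirically.
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              KNeighborsClassifier()):
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{type(model).__name__}: {scores.mean():.3f}")
```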
Training example: a sample from x, together with its output from the target function.
Target function: the mapping function f from x to f(x).
Hypothesis: a candidate function that approximates f.
Concept: a boolean target function, with positive and negative examples for the
1/0 class values.
Classifier: the output of the learning program; it can be used to classify new samples.
Learner: the process that creates the classifier.
Hypothesis space: the set of possible approximations of f that the algorithm can
create.
Version space: the subset of the hypothesis space that is consistent with the
observed data.
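To make the last two terms concrete, here is a small sketch that enumerates the full hypothesis space of boolean functions on two binary inputs and filters it down to the version space; the training examples are made up.

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))  # the 4 possible input states

# Hypothesis space: every boolean function on 2 inputs, encoded as a
# tuple of 4 outputs, one per input state (2^(2^2) = 16 hypotheses).
hypothesis_space = list(product([0, 1], repeat=len(inputs)))

# Observed training examples: (x, f(x)) pairs (made-up data).
training = {(0, 0): 0, (1, 1): 1}

# Version space: the hypotheses consistent with every observed example.
version_space = [h for h in hypothesis_space
                 if all(h[inputs.index(x)] == fx for x, fx in training.items())]

print(len(hypothesis_space), "hypotheses,", len(version_space), "consistent")
# -> 16 hypotheses, 4 consistent
```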
Key issues in machine learning:
Search procedure
Direct computation: No search, just calculate what is needed.
Local: Search through the hypothesis space to refine the hypothesis.
Constructive: Build the hypothesis piece by piece.
Timing
Eager: Learning performed up front. Most algorithms are eager.
Lazy: Learning performed at the time that it is needed.
Online vs Batch
Online: Learning based on each pattern as it is observed.
Batch: Learning over groups of patterns. Most algorithms are batch.
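As a closing sketch, one way to see the online/batch distinction on made-up data: an online perceptron-style learner that updates after each pattern, next to a batch least-squares fit over all patterns at once. Both learners are illustrative stand-ins, not algorithms from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # target: the sign of the sum

# Online: update the hypothesis after each pattern as it is observed.
w = np.zeros(2)
for xi, yi in zip(X, y):
    pred = int(xi @ w > 0)
    w += (yi - pred) * xi  # perceptron-style update on each mistake

# Batch: learn from the whole group of patterns at once
# (least squares on +/-1 targets as a stand-in for a batch learner).
w_batch, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)

print("online accuracy:", np.mean((X @ w > 0).astype(int) == y))
print("batch accuracy:", np.mean((X @ w_batch > 0).astype(int) == y))
```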