martin-fabbri/reinforcement-learning-playground

Reinforcement Learning Playground

What is reinforcement learning?

  • Reinforcement learning is a branch of machine learning.
  • It involves an agent and an environment.
  • The agent learns the optimal behavior for maximizing rewards.

When should we worry about sequential decision making?

Limited supervision: you know what you want, but not how to get it.

Delayed consequences: the effects of a decision may only show up later.

Why learn RL?

  • Not just for games
  • Make optimal decisions
  • Maximize efficiency

What are RL applications?

  • Robotics
  • Self-driving cars
  • Inventory management
  • Financial investments
  • Decision-based situations

RL terminology

What is the agent?

  • The agent is the algorithm
  • It decides which action to take.
  • It monitors the environment.
  • It is the one that learns.
  • Its only outputs are decisions (actions, controls).

What is an environment?

  • The environment is everything the agent can interact with.
  • Agent's actions affect the environment.
  • It responds to the agent's actions with consequences (observations, rewards).
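
As a rough illustration of the points above, an environment can be sketched as an object that accepts an action and returns the consequences. The `CorridorEnv` class, its `reset`/`step` methods, and the reward values are all hypothetical names and numbers chosen for this example, not part of any particular library.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: cells 0..length-1, with the goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return the consequences."""
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        # The agent's action changes the environment, clipped to the corridor walls.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        # Assumed reward scheme, for illustration only.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


env = CorridorEnv(length=3)
observation = env.reset()
observation, reward, done = env.step(+1)
print(observation, reward, done)  # prints: 1 -0.1 False
```

The environment never tells the agent what to do; it only reports what happened after each action.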

What is a state?

  • The state is a representation of what the agent can sense.
  • It does not always cover the entire environment; it is limited to what the agent can sense.
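
A small sketch may help separate the full environment from the state the agent actually receives. The `sense` function, the `radius` parameter, and the cell labels are hypothetical and exist only for this example.

```python
def sense(world, position, radius=1):
    """Return only the cells within `radius` of the agent (a hypothetical sensor)."""
    start = max(position - radius, 0)
    end = min(position + radius + 1, len(world))
    return world[start:end]


# The full environment has six cells, but the agent never sees all of them at once.
world = ["wall", "empty", "empty", "coin", "empty", "exit"]
state = sense(world, position=2)
print(state)  # prints: ['empty', 'empty', 'coin']
```

Here the exit exists in the environment but is not part of the state until the agent moves close enough to sense it.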

What is an action?

  • An action is what an agent can do in a given state.
  • Actions are limited by the environment.
  • The action's goal is to maximize reward.
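
The list above can be illustrated with a short sketch: the environment restricts which actions are available, and the agent picks among them the one it expects to be most rewarding. The function names, the action labels, and the `estimated_reward` dictionary are assumptions made for this example; how such estimates are learned is not covered here.

```python
def valid_actions(position, length):
    """Actions the environment allows at `position` in a corridor of `length` cells."""
    actions = []
    if position > 0:
        actions.append("left")
    if position < length - 1:
        actions.append("right")
    return actions


def greedy_action(position, length, estimated_reward):
    """Pick the valid action with the highest estimated reward (assumes length >= 2)."""
    candidates = valid_actions(position, length)
    return max(candidates, key=lambda action: estimated_reward.get(action, 0.0))


# At the left wall only "right" is possible, whatever the estimates say.
print(valid_actions(0, 3))  # prints: ['right']
print(greedy_action(1, 3, {"left": 0.2, "right": 0.9}))  # prints: right
```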

What is the reward?

  • The result of taking an action.
  • Feedback from the environment.
  • It can be positive or negative.
  • It helps encourage or discourage certain actions, policies, or behaviors.
  • It is what the agent tries to optimize.
  • Rewards are hard to formulate.
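
What the agent optimizes is usually not a single reward but the total reward collected over time. One common way to express this is the discounted return, where later rewards count a little less than earlier ones. The discount factor `gamma` is an assumption introduced here and does not appear in the notes above; the sketch below is only one way to compute the quantity.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step k by gamma ** k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # prints: 1.75
```

With `gamma=1.0` the function reduces to a plain sum, and with smaller values the agent cares more about immediate rewards.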

Where do rewards come from?

  • When playing video games, rewards come from scores.

Are there other forms of supervision?

  • Learning from demonstrations.

    • Directly copying observed behavior.
    • Inferring rewards from observed behavior.
  • Learning from observing the world.

    • Learning to predict.
    • Unsupervised learning.
  • Learning from other tasks

    • Transfer learning

What is the standard reinforcement loop?

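  • The agent observes the current state of the environment.
  • The agent chooses an action based on that state.
  • The environment responds with a reward and a new state.
  • The cycle repeats until the task ends.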
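
Using the terms defined earlier (agent, environment, state, action, reward), the interaction loop can be sketched roughly as follows. `LineWorld`, `RandomAgent`, and `run_episode` are hypothetical names invented for this example, and the agent here acts at random rather than learning.

```python
import random


class LineWorld:
    """Hypothetical environment: start at cell 0, reach cell `goal` to finish."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Moving left of cell 0 is not possible.
        self.position = max(self.position + action, 0)
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done


class RandomAgent:
    """Placeholder agent that ignores the state and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward                  # the reward is the feedback signal
        if done:
            break
    return total_reward


print(run_episode(LineWorld(), RandomAgent()))
```

A learning agent would add one more step inside the loop: updating its behavior from the observed state, action, and reward.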

What is Deep Reinforcement Learning?

  • Deep learning: end-to-end training of expressive, multi-layer models.
  • Deep models are what allow RL algorithms to solve complex problems end-to-end.

Why Deep Reinforcement Learning?

  • Deep = can process complex sensory input

What can deep learning & RL do well now?

  • Acquire a high degree of proficiency in domains governed by simple, known rules.
  • Learn simple skills from raw sensory inputs, given enough experience.
  • Learn by imitation, given enough human-provided expert behavior.

What has proven challenging so far?

  • Humans can learn incredibly quickly, while deep RL methods are usually slow.
  • Humans can reuse past knowledge.
    • Transfer learning in deep RL is an open problem
  • Not clear what the reward function should be

How do we build intelligent machines?

Learning as the basis of intelligence.

  • Some things we can all do.
  • Some things we can only learn.