Deep Reinforcement Learning
CS 294-112
Course logistics
Class Information & Resources
Instructor: Sergey Levine
Head GSI: Kate Rakelly
GSI: Greg Kahn
GSI: Sid Reddy
GSI: Michael Chang
uGSI: Soroush Nasiriany
consequences: observations, rewards
Levine*, Finn*, et al. ‘16
What is deep RL, and why should we care?
Standard computer vision: features (e.g. HOG) -> mid-level features (e.g. DPM) -> classifier (e.g. SVM) (Felzenszwalb ‘08)
Deep learning: end-to-end training
Standard reinforcement learning: features -> ? more features -> ? linear policy or value func. -> action
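To make the "features, then linear policy, then action" pipeline concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the observation (a cart's position and velocity), the `features` function, the two action names, and the weight values are hypothetical and do not come from the slides.

```python
def features(observation):
    """Hand-designed features (hypothetical): position, velocity, and a bias term."""
    position, velocity = observation
    return [position, velocity, 1.0]


def linear_policy(weights, observation, actions=("left", "right")):
    """Score each action with a dot product over the features; pick the best score."""
    phi = features(observation)
    scores = [sum(w * f for w, f in zip(row, phi)) for row in weights]
    best = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best]


# One weight row per action. These numbers are made up for illustration.
WEIGHTS = [
    [1.0, 0.5, 0.0],    # "left" scores high when the cart is right of center
    [-1.0, -0.5, 0.0],  # "right" scores high when the cart is left of center
]

if __name__ == "__main__":
    print(linear_policy(WEIGHTS, (2.0, 0.0)))
    print(linear_policy(WEIGHTS, (-1.0, 0.0)))
```

In this sketch only the weights would be learned, while the features stay fixed by hand. End-to-end training would instead learn the feature extractor and the policy together.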
Sensorimotor loop: action (run away)
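The sensorimotor loop can be sketched as a short program in which an agent repeatedly observes, acts, and receives a reward. This is a toy illustration under stated assumptions: `ToyEnvironment`, `run_episode`, and both policies are hypothetical names invented here, and the `reset`/`step` interface is only loosely modeled on common RL library conventions.

```python
class ToyEnvironment:
    """Hypothetical 1-D world: the agent starts at 0 and is rewarded for reaching +3."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the position

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position += action
        done = self.position >= 3
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the observe -> act -> consequence loop and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # decision (action)
        observation, reward, done = env.step(action)  # consequences
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    return 1


def always_left(observation):
    return -1


if __name__ == "__main__":
    print(run_episode(ToyEnvironment(), always_right))
    print(run_episode(ToyEnvironment(), always_left))
```

Note that the policy never receives a label for the correct action. It only sees rewards that follow from its own choices.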
Example: robotics
Robotic control pipeline: observations -> state estimation (e.g. vision) -> modeling & prediction -> planning -> low-level control -> controls
tiny, highly specialized “visual cortex”
tiny, highly specialized “motor cortex”
no direct supervision
actions have consequences
decisions (actions)
Cathy Wu
Why should we study this now?
Tesauro, 1995
L.-J. Lin, “Reinforcement learning for robots using neural networks.” 1993
Auditory cortex
Human echolocation (sonar)
... education one would obtain the adult brain.
- Alan Turing
Diagram labels: observations, actions, environment