Reinforcement Learning

The study of thinking.

[Figure: cognitive processes ordered from low level to higher level: Perception (sensation), Memory (encoding, retrieval), Thinking/Cognition (1. problem solving, 2. reasoning)]
Thinking is a higher-level cognitive process that requires all
sorts of cognitive operations (e.g. attention, perception,
memory, language) and is often a conscious, controlled
process.
Should we wait until we understand the lower-level
processes first? Research in higher-level cognition might
inform research at lower-level cognition and vice versa.
The study of thinking
Modern view:
• Thinking is an internal cognitive process
• The exact nature of these processes cannot be observed
directly from behavior
• However, most cognitive theories lead to testable
predictions. Behavioral experiments can test these
predictions. Cognitive processes are inferred indirectly
from behavior.
Well-defined & Ill-defined Problems
Well-defined problems have completely specified initial
conditions, goals, and operators -> works well with computer
simulation
Ill-defined problems have some aspects which are not
completely specified -> sometimes requires insight to see
problem in a new way
1. Writing a good paper = ?
2. Solving an algebra problem = ?
3. Conducting a statistical significance test = ?
4. Designing a good experiment = ?
5. Choosing a president = ?
6. Reducing drunk driving = ?
7. Being a nice person = ?
Well-defined problem solving
- given state
- goal state
- obstacles
- operators
[Figure: a puzzle shown in its INITIAL STATE and its GOAL STATE]
Play the game: http://www.mazeworks.com/hanoi/
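A minimal sketch of how the Tower of Hanoi game can be written down as a well-defined problem: an initial state, a goal state, operators (legal moves), and an obstacle (no disk on a smaller disk). The state encoding and the names `hanoi_moves` and `solve` are assumptions made for this illustration, and breadth-first search is just one simple way to search such a state space.

```python
from collections import deque

def hanoi_moves(state):
    """Operators: yield every state reachable by one legal move.

    A state is a tuple of three pegs; each peg is a tuple of disk
    sizes listed bottom to top.
    """
    for src in range(3):
        if not state[src]:
            continue  # nothing to move from an empty peg
        disk = state[src][-1]
        for dst in range(3):
            if dst == src:
                continue
            # Obstacle: a disk may not rest on a smaller disk.
            if state[dst] and state[dst][-1] < disk:
                continue
            pegs = [list(p) for p in state]
            pegs[src].pop()
            pegs[dst].append(disk)
            yield tuple(tuple(p) for p in pegs)

def solve(n_disks):
    """Breadth-first search from the initial state to the goal state."""
    disks = tuple(range(n_disks, 0, -1))  # largest disk at the bottom
    initial = (disks, (), ())
    goal = ((), (), disks)
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in hanoi_moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

path = solve(3)
print(len(path) - 1)  # moves in the shortest solution: 7
```

Because every part of the problem is specified, a program can search it exhaustively, which is why well-defined problems suit computer simulation.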

Problem solving strategies
How to solve the maze?
- trial and error
- forward
- backward
- means-ends analysis
• Most problem solving situations involve a combination
of planning (means-ends analysis), trial and error,
reinforcement learning and perhaps ... insight
• Reinforcement learning -> grew out of behaviorism
• Insight -> Gestaltist view
• Planning -> grew out of AI and cognitive psychology
Learning by Reinforcement
Associationist theories of thinking -> thinking as response learning
[Figure: stimulus S linked to responses R1, R2, R3]
Three elements of associationist theory:
1) stimulus: a problem solving situation
2) response: a particular problem solving behavior
3) associations: strength between stimulus and response
Thorndike’s work on cats in a puzzle box
• Cats initially solved the puzzle box problem by trial and
error, trying various responses until one accidentally
worked
• After being placed in the box many times, a cat learned the
successful response and pulled the string almost
immediately
Habit Family Hierarchy
Try most dominant response first, then
second strongest, etc.
1) Law of exercise: practice tends to strengthen the S-R link
2) Law of effect: responses that solve a problem
increase in strength. Responses that do not help
solve the problem lose strength
[Figure: stimulus S linked to responses R1, R2, R3]
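The habit family hierarchy and the law of effect can be sketched as a small simulation. This is an illustration under stated assumptions, not Thorndike's data: the response names, the starting strengths, and the `gain` and `loss` values are all hypothetical, and the law of exercise is not modeled.

```python
def run_trial(strengths, correct, gain=1.0, loss=0.5):
    """One placement in the puzzle box.

    Responses are tried from strongest to weakest (habit family
    hierarchy). Law of effect: a response that fails loses strength,
    the response that opens the box gains strength.
    Returns the number of responses tried.
    """
    order = sorted(strengths, key=strengths.get, reverse=True)
    for attempts, response in enumerate(order, start=1):
        if response == correct:
            strengths[response] += gain
            return attempts
        strengths[response] -= loss
    raise ValueError("correct response is not in the hierarchy")

# Hypothetical starting hierarchy for the stimulus "inside the box".
strengths = {"scratch bars": 3.0, "meow": 2.0, "pull string": 1.0}
attempts = [run_trial(strengths, "pull string") for _ in range(5)]
print(attempts)  # [3, 2, 1, 1, 1]
```

With these numbers the successful response climbs the hierarchy and is tried first from the third trial on, mirroring the cat that ends up pulling the string almost immediately.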
• What about response chains?
• E.g.: [Figure: path from start to goal]
• How can the path from initial state to goal state be
strengthened? How to avoid dead-ends?
• How can we reward a successful action that only much
later in time leads to success? -> problem of delayed
reinforcement
• Modern reinforcement learning involves passing
strengths of successful responses back through a chain.
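One common way to pass strength back through a chain is a temporal-difference style update, sketched below. The slides do not name a specific algorithm, so this choice is an assumption, as are the chain length, the discount `gamma`, the learning rate `alpha`, and the function name `backup_chain`.

```python
def backup_chain(n_states, episodes, gamma=0.9, alpha=1.0):
    """Walk a fixed response chain start -> ... -> goal repeatedly.

    Only the final step is rewarded. After each step the strength of
    the state just left is moved toward reward + gamma * strength of
    the next state, so success leaks backward through the chain.
    """
    strength = [0.0] * (n_states + 1)  # last index is the goal
    history = []
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s + 1 == n_states else 0.0
            target = reward + gamma * strength[s + 1]
            strength[s] += alpha * (target - strength[s])
        history.append([round(v, 3) for v in strength[:n_states]])
    return history

for row in backup_chain(n_states=4, episodes=4):
    print(row)
# [0.0, 0.0, 0.0, 1.0]
# [0.0, 0.0, 0.9, 1.0]
# [0.0, 0.81, 0.9, 1.0]
# [0.729, 0.81, 0.9, 1.0]
```

Each pass moves the reward signal one link closer to the start, so early actions are eventually credited even though the reward arrives much later.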
Maze example
• Reinforcement learning example for mazes
Reinforcement Learning
• Behavior follows simple associations in response chains.
No planning, no mental maps, no “insight”
• Learning from very simple feedback: failure or success
• Associative strengths between response chains are
learned. Passing strength back in time
[Figure: maze with start and goal]
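A maze learner of this kind can be sketched with tabular Q-learning, a standard reinforcement learning algorithm. The slides do not say which algorithm their maze example uses, so the algorithm, the maze layout, the reward of 1 at the goal, and the parameters `gamma`, `alpha`, `epsilon` are all assumptions for illustration. The agent has no map and no plan; it only stores a strength for each (state, action) pair.

```python
import random

MAZE = [
    "S.#.",
    ".##.",
    "...G",
]  # hypothetical layout: S start, G goal, # wall
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def find(symbol):
    for r, row in enumerate(MAZE):
        if symbol in row:
            return (r, row.index(symbol))

def step(state, action):
    """Move one cell; hitting a wall or the edge leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state

def greedy_action(q, state, rng):
    """Strongest response in this state; ties are broken at random."""
    best = max(q.get((state, a), 0.0) for a in range(4))
    return rng.choice([a for a in range(4) if q.get((state, a), 0.0) == best])

def train(episodes=200, gamma=0.9, alpha=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    start, goal = find("S"), find("G")
    q = {}  # (state, action index) -> associative strength
    for _ in range(episodes):
        state = start
        while state != goal:
            if rng.random() < epsilon:
                a = rng.randrange(4)             # trial and error
            else:
                a = greedy_action(q, state, rng)
            nxt = step(state, ACTIONS[a])
            reward = 1.0 if nxt == goal else 0.0  # feedback: success or not
            future = 0.0 if nxt == goal else max(
                q.get((nxt, b), 0.0) for b in range(4))
            old = q.get((state, a), 0.0)
            # Pass strength back in time from the next state.
            q[(state, a)] = old + alpha * (reward + gamma * future - old)
            state = nxt
    return q

def greedy_path(q, limit=50):
    rng = random.Random(0)
    state, goal = find("S"), find("G")
    path = [state]
    while state != goal and len(path) <= limit:
        state = step(state, ACTIONS[greedy_action(q, state, rng)])
        path.append(state)
    return path

q = train()
print(greedy_path(q))
```

After training, following the strongest response in each cell leads from start to goal while skipping the dead end next to the start, because strength that flows into a dead end is discounted twice on its way back.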
Demos
Reinforcement learning in mazes:
http://www.ise.pw.edu.pl/~cichosz/rl-java/
Reinforcement learning in robot-arm control:
http://www.fe.dis.titech.ac.jp/~gen/robot/robodemo.html
Robot learning task of pole-balancing and devil sticking:
http://www-clmc.usc.edu/movies/learning.html
Some Amazing Anagrams
Original                    Becomes...
Dormitory                   Dirty Room
Desperation                 A Rope Ends It
The Morse Code              Here Come Dots
Slot Machines               Cash Lost in 'em
Animosity                   Is No Amity
Snooze Alarms               Alas! No More Z's
Alec Guinness               Genuine Class
Semolina                    Is No Meal
The Public Art Galleries    Large Picture Halls, I Bet
A Decimal Point             I'm a Dot in Place
The Earthquakes             That Queer Shake
Eleven plus two             Twelve plus one
Contradiction               Accord not in it

Original: To be or not to be: that is the question, whether tis nobler in the mind to suffer the slings and arrows of outrageous fortune.
Becomes: In one of the Bard's best-thought-of tragedies, our insistent hero, Hamlet, queries on two fronts about how life turns rotten.

Original: "That's one small step for a man, one giant leap for mankind." -- Neil A. Armstrong
Becomes: A thin man ran; makes a large stride; left planet, pins flag on moon! On to Mars!
Stimulus (a new letter combination) -> Response
S: gorwn
R1: grown
R2: wrong
R3: wrgno
R4: …
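On the stimulus-response view, each rearrangement of the letters is a candidate response, and only the ones that form a word count as solutions. A minimal sketch, assuming a tiny hypothetical lexicon (`LEXICON`) in place of a real word list:

```python
from itertools import permutations

# Hypothetical mini-lexicon; a real model would use a full word list.
LEXICON = {"grown", "wrong", "drink", "water"}

def candidate_responses(letters):
    """All distinct rearrangements of the stimulus letters (R1, R2, ...)."""
    return {"".join(p) for p in permutations(letters)}

def solutions(letters, lexicon=LEXICON):
    """Candidate responses that are words in the lexicon."""
    return sorted(candidate_responses(letters) & lexicon)

print(len(candidate_responses("gorwn")))  # 120 orderings of 5 distinct letters
print(solutions("gorwn"))                 # ['grown', 'wrong']
```

Pure trial and error would have to work through up to 120 responses for a five-letter anagram, which is why response strength (for example, familiarity of the goal word) matters for solution time.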
Anagram solving time depends on:
- familiarity of goal word
- letter transition probability of goal word
- letter transition probability of presented word
- number of moves
Class Experiment
• Replicate effect of familiarity
Ready...?
• nrdki » (drink 7.0)
• aewtr » (water 3.0)
• cahtb » (batch 16.0)
• milbc » (climb 7.5)
• kcler » (clerk 17.5)
• rtypa » (party 14.0)
• huocg » (cough 23.5)
• rmcap » (cramp 12.0)
Mean solution times:
High familiarity = 7.9 sec
Low familiarity = 17.3 sec
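The reported means can be checked against the per-word times. The slides do not say which words count as high or low familiarity; the grouping below is an assumption, chosen because it reproduces the reported means, and the per-word times are taken to be in seconds like the means.

```python
from statistics import mean

# Solution times as listed in the class experiment (assumed seconds).
times = {"drink": 7.0, "water": 3.0, "batch": 16.0, "climb": 7.5,
         "clerk": 17.5, "party": 14.0, "cough": 23.5, "cramp": 12.0}

# Assumed grouping (not stated in the slides).
high_familiarity = ["drink", "water", "climb", "party"]
low_familiarity = ["batch", "clerk", "cough", "cramp"]

high_mean = mean(times[w] for w in high_familiarity)
low_mean = mean(times[w] for w in low_familiarity)
print(high_mean, low_mean)  # 7.875 17.25
```

Rounded to one decimal (halves rounded up), these are the 7.9 sec and 17.3 sec shown on the slide.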
• Can all thinking be described by trial and error / stimulus-response?
• What about insight? -> Gestaltist view
• What about planning? -> AI view

The Handcuffs Puzzle
The Set-Up
For this puzzle you need two people, some rope and some
empty space to do the puzzle in. Each person will need a piece of rope
with a loop tied in both ends, so it can be worn as handcuffs. The rope
should be reasonably long, so that the person wearing it can easily step
over it if they want.
Each person puts on a complete set of handcuffs. Before putting them
on, they loop their handcuffs around each other so they are tied
together. They then have to get themselves apart while following
these rules:
• The handcuffs cannot be removed.
• Do not break, cut, saw through, bite through or in any other way
damage the rope. Damaging each other is probably a bad idea too.
content copied from: http://ccins.camosun.bc.ca/~jbritton/jbhandcuff.htm
