Module 2 (Part 2)

What Kinds of Games?

Mainly games of strategy with the following characteristics:
• Games typically confront the agent with a competitive (adversarial) environment affected by an opponent (strategic environment).
• Games are episodic.
• We will focus on planning for
  • two-player zero-sum games with
  • deterministic game mechanics and
  • perfect information (i.e., fully observable environment).
[Flowchart: the two-player game loop — wait for the opponent's move; if the game is over, stop; otherwise generate successors, evaluate them, and make the best move; then check again whether the game is over.]
Games as Adversarial Search

• States:
  – board configurations
• Initial state:
  – the board position and which player will move
• Successor function:
  – returns a list of (move, state) pairs, each indicating a legal move and the resulting state
• Terminal test:
  – determines when the game is over
• Utility function:
  – gives a numeric value in terminal states (e.g., -1, 0, +1 for loss, tie, win)
Example: Tic-tac-toe

𝑠0: Empty board.
𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠): Play any empty square.
𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎): The player's symbol (x/o) is placed on the chosen empty square.
𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠): Did a player win, or is the game a draw?
𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠): +1 if x wins, -1 if o wins, and 0 for a draw. Utility is only defined for terminal states.

Here player x is Max and player o is Min.

Note: This game still uses a goal-based agent that plans actions to reach a winning terminal state!
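The formal description above can be sketched in code (a minimal sketch; the state encoding as a 9-tuple of cells and the helper names are illustrative assumptions, not part of the slides):

```python
# Tic-tac-toe game description: a state is a tuple of 9 cells
# holding 'x', 'o', or None.
WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

s0 = (None,)*9  # empty board; x (Max) moves first

def player(s):                     # whose turn: x moves when counts are equal
    return 'x' if s.count('x') == s.count('o') else 'o'

def actions(s):                    # play any empty square
    return [i for i, c in enumerate(s) if c is None]

def result(s, a):                  # place the mover's symbol on square a
    t = list(s); t[a] = player(s)
    return tuple(t)

def winner(s):
    for i, j, k in WIN_LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):                   # a player won, or the board is full
    return winner(s) is not None or None not in s

def utility(s):                    # only meaningful for terminal states
    return {'x': +1, 'o': -1, None: 0}[winner(s)]
```

Any search algorithm in this module only needs these five functions; the board representation is interchangeable.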
Tic-tac-toe: Partial Game Tree

[Figure: partial game tree — each tree node is a game state, each edge an action/result; terminal states have a known utility. Level 1 has 9 nodes, level 2 has 9×8.]

Note: This game has no cycles! The state space size (the number of possible boards) is much smaller than the game tree, because the same state (board) can be reached in different subtrees (redundant paths). The game tree here is a little smaller than:
9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes
Optimal Decisions

Minimax Search and Alpha-Beta Pruning
Exact Methods

• Model as nondeterministic actions: The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown moves by the opponent. We consider all possible moves by the opponent.

[Figure: minimax values (MV) are determined using a bottom-up strategy. Max levels represent OR search — find the action that leads to the best value. Min levels represent AND search — all opponent moves must be considered.]
Exercise: Simple 2-Ply Game

[Figure: Max chooses among 𝑎1, 𝑎2, 𝑎3; at each resulting Min node, Min again chooses among 𝑎1, 𝑎2, 𝑎3. Minimax values (MV) are computed bottom-up.]

Space complexity: 𝑂(𝑏𝑚)
Time complexity: 𝑂(𝑏^𝑚)
• An exact solution is only feasible for very simple games with a small branching factor!
• Example: Tic-tac-toe
  𝑏 = 9, 𝑚 = 9 → 𝑂(9^9) = 𝑂(387,420,489)
  Since 𝑏 decreases from 9 to 8, 7, …, the actual size is smaller:
  9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes
• Observations:
  • min(3, 𝑥, 𝑦) can never be more than 3.
  • max(5, min(3, 𝑥, 𝑦, …)) does not depend on the values of 𝑥 or 𝑦.
  • Minimax search applies alternating min and max, so such bounds let us skip subtrees.

[Figure: the first Min subtree yields 𝑣 = 3, so the Max node has the interval [3, +∞]. In the second Min subtree a leaf gives 𝑣 ≤ 2: utility cannot be more than 2 in that subtree, but we can already get 3 from the first subtree — prune the rest. Once a subtree is fully evaluated, the interval has a length of 0 (𝛼 = 𝛽).]
Minimax algorithm

• The adversarial analogue of DFS; v denotes the minimax value.
• Alpha-Beta = minimax search + pruning.
• Move ordering for the DFS = check good moves for Min and Max first (including on the opponent's turn).
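A depth-first minimax can be sketched as follows (a minimal sketch; the game is passed in through `actions`, `result`, `terminal`, and `utility` callbacks, which are assumed to follow the formulation given earlier):

```python
def minimax(state, actions, result, terminal, utility, maximizing=True):
    """Return (minimax value, best action) via depth-first search."""
    if terminal(state):
        return utility(state), None
    best_value, best_action = None, None
    for a in actions(state):
        # Recurse with roles swapped: Min replies to Max and vice versa.
        v, _ = minimax(result(state, a), actions, result,
                       terminal, utility, not maximizing)
        if best_value is None or (v > best_value if maximizing else v < best_value):
            best_value, best_action = v, a
    return best_value, best_action

# Example: a 2-ply game given directly as a nested list of leaf utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value, action = minimax(tree, lambda s: range(len(s)), lambda s, i: s[i],
                        lambda s: not isinstance(s, list), lambda s: s)
# Max's best first move is action 0, guaranteeing value 3.
```

Note how the same function serves both players; only the comparison direction alternates between levels.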
10
Exercise: Minimax on a Larger Tree

[Figure, animated over several slides: a depth-4 binary game tree (Max, Min, Max, Min) with leaf utilities
80 30 | 25 35 | 55 20 | 05 65 | 40 10 | 70 15 | 50 45 | 60 75.
Computed bottom-up: the lowest Min level yields 30, 25, 20, 05, 10, 15, 45, 60; the Max level above yields 30, 20, 15, 60; the next Min level yields 20, 15; the root Max value is 20.]
Minimax Strategy

• Why do we take the min value every other level of the tree?
• Because those levels are the opponent's (Min's) moves: Min picks the move that is worst for Max, so Max must plan against the opponent's best play.
Good Enough?

• Chess:
  – branching factor b ≈ 35
• The Universe:
  – number of atoms ≈ 10^78
[Figure, animated: pruning intuition on the same example tree. After seeing leaves 80 and 30, the first Min node has value 30. After leaf 25, the second Min node can be at most 25 — do we need to check the next leaf (35)? No: this branch is guaranteed to be worse than what Max already has (30), so it is pruned. Likewise, after leaves 20 and 05 give a Min value of 05, the remaining leaf (65) need not be checked.]
Alpha-Beta

• The alpha-beta procedure can speed up a depth-first minimax search.
• Alpha: a lower bound on the value that a Max node may ultimately be assigned (v > α).
• Beta: an upper bound on the value that a Min node may ultimately be assigned (v < β).
Alpha-Beta
MinVal(state, alpha, beta){ if
(terminal(state))
return utility(state); for (s in
children(state)){
child = MaxVal(s,alpha,beta); beta
= min(beta,child);
if (alpha>=beta) return child;
}
return best child (min); }
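The symmetric pair of functions can be sketched in Python (a minimal sketch mirroring the pseudocode above; the `children` callback and the tree-as-nested-lists encoding in the example are illustrative assumptions):

```python
import math

def max_val(state, children, terminal, utility, alpha, beta):
    if terminal(state):
        return utility(state)
    v = -math.inf
    for s in children(state):
        v = max(v, min_val(s, children, terminal, utility, alpha, beta))
        alpha = max(alpha, v)
        if alpha >= beta:          # cutoff: Min will never let play reach here
            return v
    return v

def min_val(state, children, terminal, utility, alpha, beta):
    if terminal(state):
        return utility(state)
    v = math.inf
    for s in children(state):
        v = min(v, max_val(s, children, terminal, utility, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:          # cutoff: Max already has something better
            return v
    return v

# Example on a tree encoded as nested lists of leaf utilities:
tree = [[80, 30], [25, 35]]
root_value = max_val(tree, lambda s: s, lambda s: not isinstance(s, list),
                     lambda s: s, -math.inf, math.inf)
# root_value is 30: min(80, 30) = 30 beats min(25, 35) = 25.
```

The search starts with the full window (α = -∞, β = ∞) at the root; each recursive call passes the current window down.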
α - the best value for Max along the path; β - the best value for Min along the path.

[Figure, animated over many slides: full alpha-beta trace on the example tree with leaves
80 30 | 25 35 | 55 20 | 05 65 | 40 10 | 70 15 | 50 45 | 60 75.
• Every node starts with α = -∞, β = ∞ and passes its current [α, β] window down to its children.
• Leaves 80 and 30 give the first Min node the value 30; the Max node above sets α = 30.
• At the next Min node, leaf 25 sets β = 25; now β ≤ α — prune! Leaf 35 is skipped, and the Max node's value is 30.
• In the second leaf group, leaves 55 and 20 give 20; then leaf 05 sets β = 05 ≤ α = 20 — prune! Leaf 65 is skipped. The left Min node's value is 20, and the root sets α = 20.
• In the right half (searched with α = 20), leaves 40 and 10 give 10, and leaves 70 and 15 give 15, so the Max node there has value 15; its parent Min node sets β = 15 ≤ α = 20 — prune! The entire last group of leaves (50, 45, 60, 75) is never evaluated.
• The root's minimax value is 20.]
Bad and Good Cases for Alpha-Beta Pruning

• Bad: worst moves encountered first.
[Figure: a depth-3 tree whose minimax value is 4; because the weakest moves are examined first, almost no branches can be pruned.]
• Good: good moves ordered first.
[Figure: the same tree with the best moves examined first; many branches (marked x) are pruned without being evaluated.]
• If we can order moves, we can get more benefit from alpha-beta.
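The effect of move ordering can be checked by counting leaf evaluations (a small sketch; the instrumented search and the tiny example tree are illustrative, not the trees from the figure):

```python
import math

def ab(node, alpha, beta, maximizing, counter):
    # Leaves are ints; count every leaf evaluation.
    if isinstance(node, int):
        counter[0] += 1
        return node
    v = -math.inf if maximizing else math.inf
    for child in node:
        w = ab(child, alpha, beta, not maximizing, counter)
        if maximizing:
            v = max(v, w); alpha = max(alpha, v)
        else:
            v = min(v, w); beta = min(beta, v)
        if alpha >= beta:      # cutoff
            break
    return v

def count_leaves(tree):
    """Return (minimax value, number of leaves evaluated)."""
    c = [0]
    v = ab(tree, -math.inf, math.inf, True, c)
    return v, c[0]

# Same game, two orderings of the same moves:
v_bad,  n_bad  = count_leaves([[3, 9], [5, 4]])   # weak move first
v_good, n_good = count_leaves([[5, 4], [3, 9]])   # strong move first
# Both find value 4, but good ordering evaluates 3 leaves instead of 4.
```

The gap grows quickly with depth and branching factor; on this 4-leaf tree it is only one leaf, but the principle is the same one that yields the O(b^(m/2)) best case.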
Properties of α-β

• Pruning does not affect the final result: alpha-beta returns exactly the same value as full minimax.

Why O(b^(m/2))?

• Let T(m) be the time complexity of a search of depth m.
• Normally: T(m) = b·T(m-1) + c, so T(m) = O(b^m).
• With perfect move ordering, only the first child at each node must be searched fully: T(m) = T(m-1) + (b-1)·T(m-2) + c, which gives T(m) = O(b^(m/2)).
Node Ordering

• Iterative deepening search: use the evaluations from the previous, shallower search to order moves.
Good Enough?

• Chess: branching factor b ≈ 35.
• The universe can play chess - can we?
Cutoff

[Figure: the same example tree searched only to a fixed depth. Below the cutoff, the true leaf utilities (80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75) are never reached; instead, the nodes at the cutoff depth are scored with a heuristic evaluation (shown as 0 here) in place of the real utility.]
Evaluation Functions: Tic Tac Toe

• Let p be a position in the game.
• Define the evaluation function f(p) by:
  – f(p) =
    • largest positive number if p is a win for the computer
    • smallest negative number if p is a win for the opponent
    • RCDC – RCDO otherwise
  – where RCDC is the number of rows, columns, and diagonals in which the computer could still win,
  – and RCDO is the number of rows, columns, and diagonals in which the opponent could still win.
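RCDC – RCDO can be computed directly (a minimal sketch; the board encoding as a 9-tuple with 'x' for the computer is an illustrative assumption):

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def f(p, computer='x', opponent='o'):
    """RCDC - RCDO: lines still open for the computer minus
    lines still open for the opponent (for non-terminal positions)."""
    rcdc = sum(1 for line in LINES
               if all(p[i] != opponent for i in line))   # no opponent mark
    rcdo = sum(1 for line in LINES
               if all(p[i] != computer for i in line))   # no computer mark
    return rcdc - rcdo

# Empty board: all 8 lines are open for both players, so f = 0.
empty = (None,)*9
# X in the center keeps all 8 lines open for X, while only the 4 lines
# avoiding the center stay open for O: f = 8 - 4 = +4.
center = (None, None, None, None, 'x', None, None, None, None)
```

This matches the common opening-theory intuition that the center is the strongest first move: it scores higher than a corner (f = +3) or an edge (f = +2).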
Sample Evaluations

• X = Computer; O = Opponent
[Figure: two example boards with the still-open rows, columns, and diagonals counted for each side to obtain f(p) = RCDC – RCDO.]
Evaluation functions

• For chess/checkers, typically a linear weighted sum of features:
  Eval(s) = w1·f1(s) + w2·f2(s) + … + wm·fm(s)
  e.g., w1 = 9 with
  f1(s) = (number of white queens) – (number of black queens),
  etc.
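As a sketch of the weighted-sum form (the feature set, weights, and dict-based state here are illustrative, not a real chess evaluation):

```python
# Linear evaluation: Eval(s) = sum_i w_i * f_i(s).
def linear_eval(state, weights, features):
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical material features for a chess-like state given as
# a dict of piece counts.
features = [
    lambda s: s['white_queens'] - s['black_queens'],
    lambda s: s['white_pawns'] - s['black_pawns'],
]
weights = [9, 1]  # a queen is weighted like nine pawns

s = {'white_queens': 1, 'black_queens': 0,
     'white_pawns': 6, 'black_pawns': 8}
# Eval = 9*(1 - 0) + 1*(6 - 8) = 7, i.e., White is ahead.
```

Keeping the evaluation linear in the features is what makes weight-tuning schemes such as Samuel's (next slide) tractable.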
Example: Samuel's Checker-Playing Program

• It uses a linear evaluation function f(n) = w1·f1(n) + w2·f2(n) + ... + wm·fm(n)
  For example: f = 6K + 4M + U
  – K = King Advantage
  – M = Man Advantage
  – U = Undenied Mobility Advantage (number of moves Max has for which Min has no jump moves)
Samuel's Checker Player

• In learning mode
• How does it change its evaluation function? Coefficient replacement:
  Δ(node) = backed-up value(node) – initial value(node)
  – if Δ > 0, then terms that contributed positively are given more weight and terms that contributed negatively get less weight
  – if Δ < 0, then terms that contributed negatively are given more weight and terms that contributed positively get less weight
Chess: Rich history of cumulative ideas

• Minimax search, evaluation function learning (1950).
• (1975).
• Circuitry (1987).

Chess game tree

[Figure: the chess game tree.]
Problem with Fixed-Depth Searches

• If we only search n moves ahead, it may be possible that a catastrophe can be delayed by a sequence of moves that do not make any progress.

Problems with a fixed ply: The Horizon Effect
Additional Refinements

• Probabilistic Cut: cut branches probabilistically based on a shallow search and global depth-level statistics (forward pruning).

The MONSTER
Other Games

                      deterministic            chance
perfect information   chess, checkers,         backgammon,
                      go, othello              monopoly

• For a chance node c with outcomes d1, …, dk and successor states S(c, di):

expectimax(c) = Σi P(di) · max over s in S(c, di) of backed-up-value(s)
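Expectimax can be sketched on a tree with explicit chance nodes (the node encoding — ('max', children), ('min', children), ('chance', [(prob, child), …]), or a numeric leaf — is an illustrative assumption):

```python
def expectimax(node):
    """Evaluate a game tree containing max, min, and chance nodes.
    Leaves are numbers; internal nodes are (kind, children) tuples."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    if kind == 'min':
        return min(expectimax(c) for c in children)
    if kind == 'chance':   # children are (probability, subtree) pairs
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(kind)

# A max node choosing between two chance nodes:
tree = ('max', [('chance', [(0.4, 3), (0.6, 5)]),    # expected value 4.2
                ('chance', [(0.4, 1), (0.6, 4)])])   # expected value 2.8
```

Unlike minimax, chance nodes average instead of taking an extremum, so pruning is harder: a bound on a chance node's value requires bounds on the leaf utilities themselves.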
Example Tree with Chance

[Figure: a tree alternating max, chance, min, chance, and max levels. The chance branches carry probabilities .4 and .6; e.g., one chance node has expected value 1.2. The leaf values include 3, 5, 1, 4, 1, 2, 4, 5.]
Complexity

• Instead of O(b^m), it is O(b^m · n^m), where n is the number of chance outcomes per node.
Imperfect Information

• E.g., card games, where the opponents' initial cards are unknown.
• Idea: For all deals consistent with what you can see:
  – compute the minimax value of the available actions for each possible deal,
  – compute the expected value over all deals.
Status of AI Game Players

• Tic Tac Toe
  – Tied for best player in world
• Othello
  – Computer better than any human
  – Human champions now refuse to play computer
• Scrabble
  – Maven beat world champions Joel Sherman and Matt Graham
• Backgammon
  – 1992, Tesauro combines 3-ply search & neural networks (with 160 hidden units), yielding a top-3 player
• Bridge
  – Gib ranked among top players in the world
• Poker
  – 2015, Heads-up limit hold'em poker is solved
• Checkers
  – 1994, Chinook ended 40-year reign of human champion Marion Tinsley
• Chess
  – 1997, Deep Blue beat human champion Garry Kasparov in a six-game match
  – Deep Blue searched 200M positions/second, up to 40 ply
  – Now looking at other applications (molecular dynamics, drug synthesis)
• Go
  – 2016, DeepMind's AlphaGo defeated Lee Sedol; 2017, …
Summary

• Games are fun to work on!
Constraint Satisfaction Problems

Constraint satisfaction problems (CSPs)

Definition:
• State is defined by a set of variables Xi (= factored state description).
• Each variable can have a value from domain Di or be unassigned (partial solution) - variables can have no value!
• Constraints are a set of rules specifying allowable combinations of values for subsets of variables (e.g., 𝑋1 ≠ 𝑋7 or 𝑋2 > 𝑋9 + 3).

General-purpose algorithms for CSPs with more power than standard search algorithms exist.
Example: Map Coloring (Graph Coloring)

[Figure: a map-coloring problem and its constraint graph - one node per region, one edge per "adjacent regions must be colored differently" constraint.]

Example: N-Queens (grid formulation)

• Variables: Xij ∈ {0, 1} (is there a queen on square i, j?)
• Constraints:
  Σij Xij = N
  (Xij, Xik) ∈ {(0, 0), (0, 1), (1, 0)}       # cannot be in same col.
  (Xij, Xkj) ∈ {(0, 0), (0, 1), (1, 0)}       # cannot be in same row.
  (Xij, Xi+k, j+k) ∈ {(0, 0), (0, 1), (1, 0)} # cannot share a diagonal
  (Xij, Xi+k, j−k) ∈ {(0, 0), (0, 1), (1, 0)} # cannot share a diagonal
  for 𝑖, 𝑗, 𝑘 ∈ {1, 2, …, 𝑁}
N-Queens: Alternative Formulation

• Variables: 𝑄1, 𝑄2, …, 𝑄𝑁 (one queen per column)
• Domains: {1, 2, …, 𝑁} # row for each column
• Constraints: ∀ i, j: non-threatening(Qi, Qj)

Example (for N = 4): Q1 = 2, Q2 = 4, Q3 = 1, Q4 = 3
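The non-threatening constraint in this formulation can be sketched as follows (a minimal check; representing the assignment as a list `q` mapping column index to row is an illustrative choice):

```python
def non_threatening(q, i, j):
    """Queens in columns i and j (rows q[i], q[j]) must not share a row
    or a diagonal; their columns differ by construction."""
    return q[i] != q[j] and abs(q[i] - q[j]) != abs(i - j)

def is_solution(q):
    n = len(q)
    return all(non_threatening(q, i, j)
               for i in range(n) for j in range(i + 1, n))

# The example assignment above, rows (2, 4, 1, 3) for columns 1..4:
# is_solution([2, 4, 1, 3]) holds.
```

Encoding one queen per column builds the "one queen per column" constraint into the variables themselves, which is why this formulation needs only O(N²) pairwise constraints instead of the grid formulation's.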
Example: Cryptarithmetic Puzzle

Given puzzle: TWO + TWO = FOUR. Find values for the letters; each letter stands for a different digit.

• Variables: T, W, O, F, U, R and the carries X1, X2
• Domains: {0, 1, 2, …, 9} (carries: {0, 1})
• Constraints:
  Alldiff(T, W, O, F, U, R)
  O + O = R + 10·X1
  W + W + X1 = U + 10·X2
  T + T + X2 = O + 10·F
  T ≠ 0, F ≠ 0
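These constraints can be checked by brute force (a small sketch using itertools; fine for six letters, though a real CSP solver would prune far earlier using the carry constraints):

```python
from itertools import permutations

def solve_two_two_four():
    """Yield digit assignments for T, W, O, F, U, R satisfying
    TWO + TWO = FOUR with all letters distinct and T, F nonzero."""
    for t, w, o, f, u, r in permutations(range(10), 6):
        if t == 0 or f == 0:
            continue
        two = 100*t + 10*w + o
        four = 1000*f + 100*o + 10*u + r
        if two + two == four:
            yield {'T': t, 'W': w, 'O': o, 'F': f, 'U': u, 'R': r}

sol = next(solve_two_two_four())
# e.g., 734 + 734 = 1468 is one of several valid solutions.
```

Checking the column sums directly subsumes the carry variables X1 and X2; in a CSP formulation they are kept explicit so that each constraint stays local to a few variables.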
Example: Sudoku

• Variables: Xij
• Domains: {1, 2, …, 9}
• Constraints:
  Alldiff(Xij in the same row)
  Alldiff(Xij in the same column)
  Alldiff(Xij in the same unit)
Some Popular Types of CSPs

• Boolean Satisfiability Problem (SAT)
  – Find variable assignments that make a Boolean expression (often expressed in conjunctive normal form) evaluate to true. NP-complete.
  – (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2 ∨ x3) ∧ ¬x1 = True
• Integer Programming
  – Variables are restricted to integers. Find a feasible solution that satisfies all constraints. The traveling salesman problem can be expressed as an integer program.
• Linear Programming
Real-world CSPs

• Assignment problems
  – e.g., who teaches what class for a fixed schedule. A teacher cannot be in two classes at the same time!
• Timetable problems
  – e.g., which class is offered when and where? No two classes in the same room at the same time.
• Scheduling in transportation and production (e.g., order of production steps).
• Many other problems can naturally be formulated as CSPs.

• We can build a search tree that assigns a value to one variable per level.
  – Tree depth: n (the number of variables)
  – Number of leaves: d^n (d is the number of values per variable)
  – A branch fails as soon as a constraint is violated.
Backtracking search algorithm

Call: Recursive-Backtracking({}, csp)

if (inference(csp, var, assignment) == failure) return failure
# Check consistency here (called "inference") and backtrack if we know that the branch will lead to failure.
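Recursive backtracking can be sketched as follows (a minimal sketch; the "inference" step here only checks consistency of the new assignment against already-assigned neighbors, and the Australia map-coloring instance is an illustrative example):

```python
def backtrack(assignment, variables, domains, neighbors):
    """Assign one variable per level; backtrack on constraint violation."""
    if len(assignment) == len(variables):
        return assignment                       # all variables assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # "Inference": the new value must differ from all assigned neighbors.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]                 # undo and try the next value
    return None                                 # failure: backtrack

# Map coloring of Australia with three colors:
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
             'SA': ['WA', 'NT', 'Q', 'NSW', 'V'],
             'Q': ['NT', 'SA', 'NSW'], 'NSW': ['Q', 'SA', 'V'],
             'V': ['SA', 'NSW'], 'T': []}
variables = list(neighbors)
domains = {v: ['red', 'green', 'blue'] for v in variables}
solution = backtrack({}, variables, domains, neighbors)
```

Stronger inference (e.g., forward checking or arc consistency) would prune domains of unassigned variables as well, detecting failure earlier than this per-assignment check.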
Local search for CSPs

• Backtracking-style CSP algorithms work with incomplete (partial) states, but only states that satisfy all constraints so far.
• Local search (e.g., hill climbing and simulated annealing) works only with "complete" states, i.e., all variables assigned, but we can allow states with unsatisfied constraints and search for assignments that reduce the number of violations.