Module 2 (Part 2)


Games
• Games typically confront the agent with a
competitive (adversarial) environment affected by
an opponent (strategic environment).
• Games are episodic.
• We will focus on planning for
• two-player zero-sum games with
• deterministic game mechanics and
• perfect information (i.e., fully observable
environment).

• We call the two players:
  1) Max tries to maximize his utility.
  2) Min tries to minimize Max's utility, since it is a zero-sum game.
Definition of a Game
• Definition:
𝑠0: The initial state (position, board, hand).
𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠): Legal moves in state 𝑠.
𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎): Transition model.
𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠): Test for terminal states.
𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠): Utility for player Max for terminal states.

• State space: a graph defined by the initial state and the transition function, containing all reachable states (e.g., chess positions).
• Game tree: a search tree superimposed on the state space. A complete game tree follows every sequence of moves from the current state to a terminal state (where the game ends).
Game Playing
Why do AI researchers study game playing?
1. It's a good reasoning problem, formal and nontrivial.
2. Direct comparison with humans and other computer programs is easy.
What Kinds of Games?
Mainly games of strategy with the following characteristics:
1. Sequence of moves to play
2. Rules that specify possible moves
3. Rules that specify a payment for each move
4. Objective is to maximize your payment
Games vs. Search Problems
• Unpredictable opponent → the solution is a strategy specifying a move for every possible opponent reply
• Time limits → unlikely to find the goal, must approximate
Two-Player Game
[Figure: flow of play. Opponent's move → generate the new position → if the game is over, stop; otherwise generate successors, evaluate them, move to the highest-valued successor, and again check whether the game is over.]
Games as Adversarial Search
• States:
  – board configurations
• Initial state:
  – the board position and which player will move
• Successor function:
  – returns a list of (move, state) pairs, each indicating a legal move and the resulting state
• Terminal test:
  – determines when the game is over
• Utility function:
  – gives a numeric value in terminal states (e.g., -1, 0, +1 for loss, tie, win)
Example: Tic-tac-toe
𝑠0: Empty board.
𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠): Play any empty square.
𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎): The player's symbol (x/o) is placed on the chosen empty square.
𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠): Did a player win, or is the game a draw?
𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠): +1 if x wins, -1 if o wins, and 0 for a draw. Utility is only defined for terminal states.
Here player x is Max and player o is Min.
Note: This game still uses a goal-based agent that plans actions to reach a winning terminal state!
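To make this formulation concrete, here is a minimal Python sketch of the tic-tac-toe game definition (the board representation and function names are our own, not from the slides; later sketches reuse it):

# State: a tuple of 9 cells, each 'x', 'o', or None; whose turn it is
# follows from the counts of placed symbols.

def s0():
    return (None,) * 9                        # the empty board

def to_move(s):
    return 'x' if s.count('x') == s.count('o') else 'o'

def actions(s):
    return [i for i in range(9) if s[i] is None]   # play any empty square

def result(s, a):
    board = list(s)
    board[a] = to_move(s)                     # place the symbol
    return tuple(board)

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),     # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),     # columns
         (0, 4, 8), (2, 4, 6)]                # diagonals

def winner(s):
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    return winner(s) is not None or all(c is not None for c in s)

def utility(s):                               # only defined for terminal states
    w = winner(s)
    return +1 if w == 'x' else -1 if w == 'o' else 0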
Tic-tac-toe: Partial Game Tree
[Figure: a partial game tree. Each node is a game state; each edge is an action/result. Level 0 has 1 node (the empty board), level 1 has 9 nodes, level 2 has 9 × 8 nodes, and so on. Redundant paths can reach the same state in different subtrees. Terminal states have a known utility. Note: this game tree has no cycles!]
The state space size (the number of possible boards) is much smaller than the complete game tree, because the same state (board) can be reached in different subtrees (redundant paths). The game tree here is a little smaller than
1 + 9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes.
Optimal Decisions: Minimax Search and Alpha-Beta Pruning

Methods for Adversarial Games
Exact Methods
• Model the opponent as nondeterministic actions: the opponent is seen as part of an environment with nondeterministic actions, where the non-determinism is the result of the unknown moves by the opponent. We consider all possible moves by the opponent.
• Find optimal decisions: minimax search and alpha-beta pruning, where each player plays optimally to the end of the game.
Heuristic Methods (when the game tree is too large)
• Heuristic Alpha-Beta Tree Search:
  a. Cut off the game tree and use a heuristic for the utility.
  b. Forward pruning: ignore poor moves.
• Monte Carlo Tree Search: estimate the utility of a state by simulating complete games and averaging the utilities.
Idea: Minimax Decision
• Assign each state 𝑠 a minimax value that reflects the utility realized if both players play optimally from 𝑠 to the end of the game:

𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) =
  𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠)                                        if 𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠)
  max over 𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) of 𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))   if 𝑚𝑜𝑣𝑒 = 𝑀𝑎𝑥
  min over 𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) of 𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))   if 𝑚𝑜𝑣𝑒 = 𝑀𝑖𝑛

• This is a recursive definition which can be solved from the terminal states backwards.
• The optimal decision for Max is the action that leads to the state with the largest minimax value, i.e., the largest possible utility if both players play optimally.
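A direct transcription of this recursive definition into Python, assuming the tic-tac-toe sketch above (where Max plays x), might look like this:

def minimax(s):
    if terminal(s):
        return utility(s)
    values = [minimax(result(s, a)) for a in actions(s)]
    return max(values) if to_move(s) == 'x' else min(values)

def minimax_decision(s):
    # Max picks the action leading to the successor with the largest minimax value.
    return max(actions(s), key=lambda a: minimax(result(s, a)))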
Minimax Search: Back-up Minimax Values
[Figure: a game tree where every node is labeled with its minimax value (MV). At the root, Max picks the action that leads to the largest MV.]
• Determine MVs using a bottom-up strategy: follow the tree to each terminal node and back up the minimax values.
• Max always picks the action that has the largest value.
• Min always picks the action that has the smallest value.
Note: This is just a generalization of AND-OR tree search and returns the first action of the conditional plan. Max nodes represent OR search (find the action that leads to the best value); Min nodes represent AND search.
Exercise: Simple 2-Ply Game
[Figure: Max moves at the root with actions 𝑎1, 𝑎2, 𝑎3; each leads to a Min node with actions 𝑎1, 𝑎2, 𝑎3. The terminal utilities for Max are 2 0 5 | -5 -2 7 | 5 -7 4.]
• What are the terminal state utilities for Min?
• Compute all MVs (minimax values).
• How do we traverse the game tree? What is the Big-O notation for time and space?
  (b: max branching factor, m: max depth of tree)
Issue: Game Tree Size
• Minimax search traverses the complete game tree using DFS!
  Space complexity: O(bm)
  Time complexity: O(b^m)
• A fast solution is only feasible for very simple games with a small branching factor!
• Example: Tic-tac-toe
  b = 9, m = 9 → O(9^9) = O(387,420,489)
  Since b decreases from 9 to 8, 7, …, the actual size is smaller:
  1 + 9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes
• We need to reduce the search space! → Game tree pruning

Alpha-Beta Pruning
• Idea: Do not search parts of the tree if they do not make a difference to the outcome.
• Observations:
  • min(3, 𝑥, 𝑦) can never be more than 3
  • max(5, min(3, 𝑥, 𝑦, …)) does not depend on the values of 𝑥 or 𝑦
  • Minimax search applies alternating min and max
• Approach: maintain bounds [𝛼, 𝛽] for the minimax value and prune subtrees (i.e., don't follow actions) that do not affect the current minimax value bound.
• Alpha is used by Max and means "𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) is at least 𝛼."
• Beta is used by Min and means "𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) is at most 𝛽."
Example: Alpha-Beta Search
[Figure: a Max/Min tree annotated with [𝛼, 𝛽] intervals. Max updates 𝛼 ("utility is at least 𝛼"); Min updates 𝛽 ("utility is at most 𝛽"). After the first Min subtree is evaluated, 𝑣 = 3, so the root bound becomes [3, +∞]. In the second Min subtree a leaf gives 𝑣 = 2, so the utility of that subtree cannot be more than 2; since we can already get 3 from the first subtree, the rest of that subtree is pruned. Once a subtree is fully evaluated, its interval has length 0 (𝛼 = 𝛽).]
Minimax Algorithm with Alpha-Beta Pruning
• Alpha-beta = minimax search + pruning; it is the adversarial analogue of DFS (v denotes the minimax value in the pseudocode below).
• Found a better action? Abandon a subtree if Max finds an action that has more value than the best-known move Min has in another subtree.
• Found a better action? Abandon a subtree if Min finds an action that has less value than the best-known move Max has in another subtree.
Move Ordering for Alpha-Beta Search
• Idea: Pruning is more effective if good alpha-beta bounds can be found in the first few checked subtrees.
• Move ordering for DFS = check good moves for Min and Max first.
• We need expert knowledge or some heuristic to determine what a good move is.
• Issue: Optimal decision algorithms still scale poorly, even when using alpha-beta pruning with move ordering.
Game Tree (2-Player, Deterministic, Turns)
[Figure: the levels of the game tree alternate between the computer's turn and the opponent's turn. The computer is Max; the opponent is Min. At the leaf nodes, the utility function is employed: a big value is good, a small value is bad.]
Minimax Terminology
• move: a move by both players
• ply: a half-move
• utility function: the function applied to leaf nodes
• backed-up value
  – of a max-position: the value of its largest successor
  – of a min-position: the value of its smallest successor
• minimax procedure: search down several levels; at the bottom level apply the utility function, back up values all the way to the root node, and select the best move at that node.
Minimax
• Perfect play for deterministic games.
• Idea: choose the move to the position with the highest minimax value = the best achievable payoff against best play.
• E.g., a 2-ply game, as in the exercise below.
Exercise: Simple 2-Ply Game with Alpha-Beta
[Figure: the same 2-ply structure; Max moves at the root with actions 𝑎1, 𝑎2, 𝑎3, each leading to a Min node with actions 𝑎1, 𝑎2, 𝑎3, and every node carries an [𝛼, 𝛽] interval. The terminal utilities for Max are 2 -5 5 | 7 0 2 | 5 -7 -4.]
• Find the [𝛼, 𝛽] intervals for all nodes.
• What part of the tree can be pruned?
• What would be the optimal move ordering?
[Figure: step-by-step back-up of minimax values on a 4-ply binary tree with leaf utilities 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75. The lowest Min level backs up 30 25 20 05 10 15 45 60, the Max level above backs up 30 20 15 60, the next Min level backs up 20 15, and the root (a Max node) gets the minimax value 20.]
Minimax Strategy
• Why do we take the min value every other level of the tree?
• These nodes represent the opponent's choice of move.
• The computer assumes that the human will choose the move that is of least value to the computer.
Properties of Minimax
• Complete?
  – Yes (if the tree is finite)
• Optimal?
  – Yes (against an optimal opponent)
  – No (does not exploit opponent weakness against a suboptimal opponent)
• Time complexity?
  – O(b^m)
• Space complexity?
  – O(bm) (depth-first exploration)
Good Enough?
• Chess:
  – branching factor b ≈ 35
  – game length m ≈ 100
  – search space b^m ≈ 35^100 ≈ 10^154
• The Universe:
  – number of atoms ≈ 10^78
  – age ≈ 10^18 seconds
  – 10^8 moves/sec × 10^78 atoms × 10^18 seconds = 10^104 moves
• Exact solution completely infeasible
[Figure: the same tree, now asking where work can be saved. After the left-most Min node backs up 30 and its sibling Min node has already produced a 25, we ask: do we need to check that node's remaining leaf? No - that branch is guaranteed to be worse than what Max already has. The same happens again after the values 20 and 05 are backed up: the remaining leaf of the 05 node need not be checked.]
Alpha-Beta
• The alpha-beta procedure can speed up a depth-first minimax search.
• Alpha: a lower bound on the value that a max node may ultimately be assigned (v > α).
• Beta: an upper bound on the value that a minimizing node may ultimately be assigned (v < β).
Alpha-Beta
MinVal(state, alpha, beta) {
    if (terminal(state))
        return utility(state);
    for (s in children(state)) {
        child = MaxVal(s, alpha, beta);
        beta = min(beta, child);
        if (alpha >= beta) return beta;   // prune: Max already has a better option elsewhere
    }
    return beta;                          // the best (smallest) child value
}

alpha = the highest value for MAX along the path
beta = the lowest value for MIN along the path
Alpha-Beta
MaxVal(state, alpha, beta) {
    if (terminal(state))
        return utility(state);
    for (s in children(state)) {
        child = MinVal(s, alpha, beta);
        alpha = max(alpha, child);
        if (alpha >= beta) return alpha;   // prune: Min already has a better option elsewhere
    }
    return alpha;                          // the best (largest) child value
}

alpha = the highest value for MAX along the path
beta = the lowest value for MIN along the path
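For illustration, here is a runnable Python version of the pseudocode above, applied to the 16-leaf example tree used in the traces that follow (the nested-list tree encoding is our own):

import math

def max_val(node, alpha, beta):
    if isinstance(node, int):                 # leaf: return its utility
        return node
    for child in node:
        alpha = max(alpha, min_val(child, alpha, beta))
        if alpha >= beta:
            return alpha                      # prune the remaining children
    return alpha

def min_val(node, alpha, beta):
    if isinstance(node, int):
        return node
    for child in node:
        beta = min(beta, max_val(child, alpha, beta))
        if alpha >= beta:
            return beta                       # prune the remaining children
    return beta

def build(vals):                              # complete binary tree over the leaves
    half = len(vals) // 2
    return vals if len(vals) == 2 else [build(vals[:half]), build(vals[half:])]

leaves = [80, 30, 25, 35, 55, 20, 5, 65, 40, 10, 70, 15, 50, 45, 60, 75]
print(max_val(build(leaves), -math.inf, math.inf))   # -> 20, matching the trace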
α - the best value for Max along the path
β - the best value for Min along the path
[Figure: step-by-step alpha-beta trace on the same 4-ply tree (leaf utilities 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75). Every node starts with [α = -∞, β = ∞]; Max nodes raise α, Min nodes lower β, and a node's remaining children are pruned as soon as β ≤ α. In the trace, this prunes the subtree after leaf 25 (so leaf 35 is skipped), the subtree after leaf 05 (so leaf 65 is skipped), and, once the right half of the tree backs up 15 < α = 20, the entire last Max subtree (leaves 50, 45, 60, 75). The root value is again 20, but several leaves were never evaluated.]
Bad and Good Cases for Alpha-Beta Pruning
• Bad: worst moves encountered first.
  [Figure: a 3-ply tree whose root value is 4; because the weakest moves are examined first, the α/β bounds stay loose and almost every leaf must be evaluated.]
• Good: good moves ordered first.
  [Figure: the same tree with the strongest moves examined first; tight bounds are established early and large parts of the tree (marked x) are pruned.]
• If we can order moves, we can get more benefit from alpha-beta.
Properties of α-β
• Pruning does not affect the final result: it computes exactly the same root value as full minimax.
• Good move ordering improves the effectiveness of pruning.
• With "perfect ordering," time complexity = O(b^(m/2)), which doubles the solvable depth of search.
• A simple example of reasoning about 'which computations are relevant' (a form of metareasoning).
Why O(b^(m/2))?
Let T(m) be the time complexity of searching to depth m.
Normally:
  T(m) = b·T(m-1) + c  ⇒  T(m) = O(b^m)
With ideal α-β pruning:
  T(m) = T(m-1) + (b-1)·T(m-2) + c  ⇒  T(m) = O(b^(m/2))
Node Ordering
• Iterative deepening search
• Use the evaluations of the previous search to order moves
• Also helps in returning a move within a given time limit
Good Enough?
• Chess:
  – branching factor b ≈ 35
  – game length m ≈ 100
  – search space with pruning b^(m/2) ≈ 35^50 ≈ 10^77
• The Universe:
  – number of atoms ≈ 10^78
  – age ≈ 10^18 seconds
• The universe can play chess - can we?
Cutting off Search
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice? With b = 35 and m = 4, b^m ≈ 10^6 positions.
A 4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
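A minimal sketch of this cutoff variant, assuming the game interface from the earlier tic-tac-toe sketch; eval_fn stands in for any heuristic evaluation function (see the evaluation-function slides below):

def h_minimax(s, depth, max_depth, eval_fn):
    if terminal(s):
        return utility(s)
    if depth >= max_depth:                    # Cutoff? replaces Terminal?
        return eval_fn(s)                     # Eval replaces Utility
    values = [h_minimax(result(s, a), depth + 1, max_depth, eval_fn)
              for a in actions(s)]
    return max(values) if to_move(s) == 'x' else min(values)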
Cutoff
[Figure: the same example tree cut off before the leaf level. At the cutoff depth, the evaluation function (here returning 0) replaces the true utilities, and these heuristic values are backed up instead.]
Evaluation Functions: Tic-Tac-Toe
• Let p be a position in the game.
• Define the utility function f(p) by:
  – f(p) =
    • largest positive number if p is a win for the computer
    • smallest negative number if p is a win for the opponent
    • RCDC – RCDO otherwise
  – where RCDC is the number of rows, columns, and diagonals in which the computer could still win,
  – and RCDO is the number of rows, columns, and diagonals in which the opponent could still win.
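A sketch of this evaluation in Python, reusing winner and LINES from the earlier tic-tac-toe sketch (BIG stands in for "largest positive number"):

BIG = 10**6

def eval_ttt(s, computer='x', opponent='o'):
    w = winner(s)
    if w == computer:
        return BIG
    if w == opponent:
        return -BIG
    # a line is still winnable for a player if the other player has no mark in it
    rcdc = sum(1 for line in LINES if all(s[i] != opponent for i in line))
    rcdo = sum(1 for line in LINES if all(s[i] != computer for i in line))
    return rcdc - rcdo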
Sample Evaluations
• X = Computer; O = Opponent
[Figure: two example boards; for each, the rows, columns, and diagonals still open to each player are counted to compute f(p) = RCDC – RCDO.]
Evaluation Functions
For chess/checkers, typically a linear weighted sum of features:
  Eval(s) = w1·f1(s) + w2·f2(s) + … + wm·fm(s)
e.g., w1 = 9 with
  f1(s) = (number of white queens) – (number of black queens),
etc.
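A generic sketch of such a weighted feature sum; the weights and feature functions named in the comment are illustrative, not a real engine's values:

def linear_eval(s, weights, features):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(s) for w, f in zip(weights, features))

# e.g., linear_eval(s, [9, 1], [queen_difference, pawn_difference]) with
# hypothetical feature functions counting white-minus-black material.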
Example: Samuel's Checker-Playing Program
• It uses a linear evaluation function
  f(n) = w1·f1(n) + w2·f2(n) + ... + wm·fm(n)
For example: f = 6K + 4M + U
– K = King Advantage
– M = Man Advantage
– U = Undenied Mobility Advantage (number of moves Max has for which Min has no jump moves)
Samuel's Checker Player
• In learning mode:
  – Computer acts as 2 players: A and B
  – A adjusts its coefficients after every move
  – B uses the static utility function
  – If A wins, its function is given to B
Samuel's Checker Player
How does A change its function?
• Coefficient replacement:
  Δ(node) = backed-up value(node) – initial value(node)
• if Δ > 0, then terms that contributed positively are given more weight and terms that contributed negatively get less weight
• if Δ < 0, then terms that contributed negatively are given more weight and terms that contributed positively get less weight
Chess: Rich History of Cumulative Ideas
• Minimax search, evaluation function learning (1950)
• Alpha-Beta search (1966)
• Transposition tables (1967)
• Iterative deepening DFS (1975)
• End-game databases, singular extensions (1977, 1980)
• Parallel search and evaluation (1983, 1985)
• Circuitry (1987)
Chess Game Tree
[Figure: a chess game tree.]
Problem with Fixed-Depth Searches
If we only search n moves ahead, it may be possible that a catastrophe can be delayed by a sequence of moves that do not make any progress.
It also works in the other direction (good moves may not be found).
Problems with a Fixed Ply: The Horizon Effect
[Figure: within the "look-ahead horizon" the program sees a choice between losing its queen and losing a pawn, but just beyond the horizon it loses the queen anyway!]
• Inevitable losses are postponed.
• Unachievable goals appear achievable.
• Short-term gains mask unavoidable losses.
Solutions
• How to counter the horizon effect:
  – Feedover
    • Do not cut off search at non-quiescent board positions (dynamic positions)
    • Example: king in danger
    • Keep searching down that path until quiescent (stable) nodes are reached
  – Secondary Search
    • Search further down the selected path to ensure that this is the best move
Quiescence Search
• This involves searching past the terminal search nodes (depth of 0) and testing all the non-quiescent or 'violent' moves until the situation becomes calm; only then is the evaluator applied.
• Enables programs to detect long capture sequences and calculate whether or not they are worth initiating.
• Expands searches to avoid evaluating a position where tactical disruption is in progress.
Additional Refinements
• Probabilistic Cut: cut branches probabilistically based on shallow search and global depth-level statistics (forward pruning).
• Openings/Endgames: for some parts of the game (especially initial and end moves), keep a catalog of best moves to make.
• Singular Extensions: find obviously good moves and try them at cutoff.
End-Game Databases
• Ken Thompson: all 5-piece end-games
• Lewis Stiller: all 6-piece end-games
  – Refuted common chess wisdom: many positions thought to be ties were really forced wins (90% for white)
  – Is perfect chess a win for white?
The MONSTER
[Figure: a 6-piece endgame position. White wins in 255 moves (Stiller, 1991).]
Deterministic Games in Practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. Checkers is now solved!
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and uses undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic!
• Othello: human champions refuse to compete against computers, which are too good.
• Go: until recently, human champions refused to compete against computers, which were too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning. In 2016, DeepMind's AlphaGo defeated Lee Sedol 4-1 to end the human reign.
Game of Go
Human champions refused to compete against computers, because the software used to be too bad.

                                 Chess    Go
Size of board                    8 x 8    19 x 19
Average no. of moves per game    100      300
Avg. branching factor per turn   35       235
Additional complexity                     Players can pass
AlphaGo (2016)
• Combination of:
  – Deep Neural Networks
  – Monte Carlo Tree Search
• More details later.
Other Games
                          deterministic       chance
perfect information       chess, checkers,    backgammon,
                          go, othello         monopoly
imperfect information     stratego            bridge, poker,
                                              scrabble
Games of Chance
• What about games that involve chance, such as
  – rolling dice
  – picking a card
• Use three kinds of nodes:
  – max nodes
  – min nodes
  – chance nodes
Games of Chance: Expectiminimax
Let 𝑐 be a chance node with outcomes (e.g., die rolls) 𝑑1, …, 𝑑𝑘, where outcome 𝑑𝑖 occurs with probability 𝑃(𝑑𝑖) and leads to the set of successor states 𝑆(𝑐, 𝑑𝑖). Then

  expectimax(𝑐) = Σ𝑖 𝑃(𝑑𝑖) · max{ backed-up-value(𝑠) : 𝑠 ∈ 𝑆(𝑐, 𝑑𝑖) }
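A sketch of expectiminimax in Python; the tagged-tuple node encoding is our own:

def expectiminimax(node):
    if isinstance(node, (int, float)):        # leaf utility
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectiminimax(c) for p, c in children)

# Tiny demo: Max chooses between a sure 2 and a fair coin flip between 0 and 5.
tree = ('max', [2, ('chance', [(0.5, 0.0), (0.5, 5.0)])])
print(expectiminimax(tree))                   # -> 2.5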
Example Tree with Chance
[Figure: a tree alternating max, chance, min, chance, and max levels. The chance edges carry probabilities .4 and .6, and the leaf utilities are 3 5 1 4 1 2 4 5; one of the chance nodes, for example, backs up an expected value of 1.2.]
Complexity
• Instead of O(b^m), it is O(b^m · n^m), where n is the number of chance outcomes.
• Since the complexity is higher (in both time and space), we cannot search as deeply.
• Pruning algorithms may be applied.
Imperfect Information
• E.g., card games, where the opponents' initial cards are unknown.
• Idea: For all deals consistent with what you can see,
  – compute the minimax value of the available actions for each possible deal,
  – compute the expected value over all deals.
Status of AI Game Players
• Tic Tac Toe
• Othello
  – Computer better than any human
  – Human champions now refuse to play computer
• Scrabble
  – Maven beat world champions Joel Sherman and Matt Graham
• Backgammon
  – 1992, Tesauro combines 3-ply search & neural networks (with 160 hidden units), yielding a top-3 player
• Bridge
  – Gib ranked among top players in the world
• Poker
  – Tied for best player in world
  – 2015, Heads-up limit hold'em poker is solved
• Checkers
  – 1994, Chinook ended 40-year reign of human champion Marion Tinsley
• Chess
  – 1997, Deep Blue beat human champion Garry Kasparov in a six-game match
  – Deep Blue searches 200M positions/second, up to 40 ply
  – Now looking at other applications (molecular dynamics, drug synthesis)
• Go
  – 2016, DeepMind's AlphaGo defeated Lee Sedol; in 2017 it defeated world No. 1 Ke Jie
Summary
• Games are fun to work on!
• They illustrate several important points about AI.
• Perfection is unattainable, so we must approximate.
• Game-playing programs have shown the world what AI can do.
Constraint Satisfaction Problems

Constraint Satisfaction Problems (CSPs)
Definition:
• State is defined by a set of variables Xi (= factored state description). Each variable can have a value from domain Di or be unassigned, i.e., variables can have no value (partial solution).
• Constraints are a set of rules specifying allowable combinations of values for subsets of variables (e.g., 𝑋1 ≠ 𝑋7 or 𝑋2 > 𝑋9 + 3).
• Solution: a state that is a
  a) Consistent assignment: satisfies all constraints
  b) Complete assignment: assigns a value to each variable

Differences from "generic" tree search:
• Atomic states (variables are only used to create human-readable labels or calculate heuristics)
• States are always complete assignments.
• Constraints are implicit in the transition function.
Differences from local search:
• Factored representation to find local moves.
• Always complete assignments.
• Constraints may not be met.

General-purpose algorithms for CSPs with more power than standard search algorithms exist.
Example: Map Coloring (Graph Coloring)
[Figure: map of Australia and the corresponding constraint graph.]
• Variables representing state: WA, NT, Q, NSW, V, SA, T
• Variable domains: {red, green, blue}
• Constraints: adjacent regions must have different colors, e.g.,
  WA ≠ NT ⇔ (WA, NT) ∈ {(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)}
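To make this concrete, the map-coloring CSP can be written down as plain data. This Python sketch (the representation is our own; variable names follow the slide) is reused by the CSP code sketches further below:

variables = ['WA', 'NT', 'Q', 'NSW', 'V', 'SA', 'T']
domains = {v: {'red', 'green', 'blue'} for v in variables}
neighbors = {                                 # adjacent regions must differ
    'WA': ['NT', 'SA'],
    'NT': ['WA', 'SA', 'Q'],
    'Q': ['NT', 'SA', 'NSW'],
    'NSW': ['Q', 'SA', 'V'],
    'V': ['SA', 'NSW'],
    'SA': ['WA', 'NT', 'Q', 'NSW', 'V'],
    'T': [],                                  # Tasmania is unconstrained
}

def consistent(var, value, assignment):
    # a value is consistent if no already-assigned neighbor has the same value
    return all(assignment.get(n) != value for n in neighbors[var])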
Example: Map Coloring
Solutions are complete and consistent assignments, e.g.,
WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green
Example: N-Queens
• Variables: 𝑋𝑖𝑗 for 𝑖, 𝑗 ∈ {1, 2, … , 𝑁}
• Domains: {0, 1}  # queen on square (i, j): no/yes
• Constraints:
  Σij 𝑋𝑖𝑗 = 𝑁
  For 𝑖, 𝑗, 𝑘 ∈ {1, 2, … , 𝑁}:
  (𝑋𝑖𝑗, 𝑋𝑖𝑘) ∈ {(0, 0), (0, 1), (1, 0)}       # no two queens in the same row
  (𝑋𝑖𝑗, 𝑋𝑘𝑗) ∈ {(0, 0), (0, 1), (1, 0)}       # no two queens in the same column
  (𝑋𝑖𝑗, 𝑋𝑖+𝑘,𝑗+𝑘) ∈ {(0, 0), (0, 1), (1, 0)}  # no two queens on the same diagonal
  (𝑋𝑖𝑗, 𝑋𝑖+𝑘,𝑗−𝑘) ∈ {(0, 0), (0, 1), (1, 0)}  # no two queens on the same diagonal
N-Queens: Alternative Formulation
[Figure: a 4×4 board with one queen Q1..Q4 per column and rows numbered 1-4.]
• Variables: 𝑄1, 𝑄2, … , 𝑄𝑁  # one queen per column
• Domains: {1, 2, … , 𝑁}  # row for each column
• Constraints:
  ∀ 𝑖, 𝑗: non-threatening(𝑄𝑖, 𝑄𝑗)
Example: Q1 = 2, Q2 = 4, Q3 = 1, Q4 = 3
Example: Cryptarithmetic Puzzle
Given puzzle: TWO + TWO = FOUR. Find values for the letters; each letter stands for a different digit.
• Variables: T, W, O, F, U, R and the carries X1, X2
• Domains: {0, 1, 2, …, 9}
• Constraints:
  Alldiff(T, W, O, F, U, R)
  O + O = R + 10·X1
  W + W + X1 = U + 10·X2
  T + T + X2 = O + 10·F
  T ≠ 0, F ≠ 0
Example: Sudoku
[Figure: a 9×9 Sudoku grid with cells 𝑋𝑖𝑗.]
• Variables: 𝑋𝑖𝑗
• Domains: {1, 2, …, 9}
• Constraints:
  Alldiff(𝑋𝑖𝑗 in the same unit)
  Alldiff(𝑋𝑖𝑗 in the same row)
  Alldiff(𝑋𝑖𝑗 in the same column)
Some Popular Types of CSPs
• Boolean Satisfiability Problem (SAT)
  • Find variable assignments that make a Boolean expression (often expressed in conjunctive normal form) evaluate as true. SAT is NP-complete.
  • (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2 ∨ x3) ∧ ¬x1 = True
• Integer Programming
  • Variables are restricted to integers. Find a feasible solution that satisfies all constraints. The traveling salesman problem can be expressed as an integer program.
• Linear Programming
  • Variables are continuous, and the constraints are linear.
Real-World CSPs
• Assignment problems
  e.g., who teaches what class for a fixed schedule. A teacher cannot be in two classes at the same time!
• Timetable problems
  e.g., which class is offered when and where? No two classes in the same room at the same time.
• Scheduling in transportation and production (e.g., order of production steps).
• Many other problems can naturally be formulated as CSPs.
• More examples of CSPs: http://www.csplib.org/
CSP as a Standard Search Formulation
State:
• Values assigned so far
Initial state:
• The empty assignment { } (all variables are unassigned)
Successor function:
• Choose an unassigned variable and assign it a value that does not violate any constraints
• Fail if no legal assignment is found
Goal state:
• Any complete and consistent assignment
Backtracking Search
• In CSPs, variable assignments are commutative. For example, [WA = red then NT = green] is the same as [NT = green then WA = red]: order is not important.
• We can build a search tree that assigns the value to one variable per level.
  • Tree depth: n (number of variables)
  • Number of leaves: d^n (d is the number of values per variable)
• Backtracking search = depth-first search for CSPs with single-variable assignments
Example: Backtracking Search (DFS)
[Figure: a backtracking search tree for map coloring; a branch whose assignment violates a constraint fails, and the search backtracks.]
Backtracking Search Algorithm
[Figure: pseudocode for Recursive-Backtracking; initial call: Recursive-Backtracking({}, csp). A code sketch of the basic algorithm follows below.]
Improving backtracking efficiency:
• Which variable should be assigned next?
• In what order should its values be tried?
• Can we detect inevitable failure early?
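A runnable sketch of the basic Recursive-Backtracking scheme for the map-coloring data defined earlier (variable selection here is simply "first unassigned"; the ordering heuristics are discussed next):

def backtracking_search(assignment=None):
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return assignment                     # complete and consistent: done
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):   # only try consistent values
            assignment[var] = value
            solution = backtracking_search(assignment)
            if solution is not None:
                return solution
            del assignment[var]               # undo the assignment and backtrack
    return None                               # no legal assignment found: fail

print(backtracking_search())   # e.g., {'WA': 'red', 'NT': 'green', ...}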
Which Variable Should Be Assigned Next? In Which Order Should Its Values Be Tried?
• Most constrained variable:
  • Keep track of the remaining legal values for unassigned variables (using the constraints)
  • Choose the variable with the fewest legal values left
  • A.k.a. minimum remaining values (MRV) heuristic
• Choose the least constraining value:
  • The value that rules out the fewest values in the remaining variables
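Sketches of both heuristics in terms of the map-coloring data defined earlier (function names are our own):

def legal_values(var, assignment):
    return [val for val in domains[var] if consistent(var, val, assignment)]

def mrv_variable(assignment):
    # most constrained variable: fewest legal values left
    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: len(legal_values(v, assignment)))

def lcv_order(var, assignment):
    # least constraining value: rules out the fewest values in the neighbors
    def ruled_out(value):
        assignment[var] = value               # tentatively assign
        n = sum(len(domains[nb]) - len(legal_values(nb, assignment))
                for nb in neighbors[var] if nb not in assignment)
        del assignment[var]                   # undo
        return n
    return sorted(legal_values(var, assignment), key=ruled_out)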
Early Detection of Failure: Forward Checking / Node Consistency
• Keep track of the remaining legal values for unassigned variables.
• Terminate the search when any variable has no legal values left (i.e., minimum remaining values = 0): stop and backtrack.
[Figure: map-coloring example where NT and SA cannot both be blue; this violates the constraint.]
Early Detection of Failure: Forward Checking / Arc Consistency
• X is arc consistent wrt Y iff for every value of X there is some allowed value of Y.
• Make X arc consistent wrt Y by throwing out any values of X for which there is no allowed value of Y.
[Figure: propagation in the map-coloring example:
1. NSW cannot be blue because SA has to be blue.
2. V cannot be red because NSW has to be red.
3. SA cannot be blue because NT is blue.
4. Fail and backtrack.]
• Arc consistency detects failure earlier than node consistency.
• There are stronger consistency checks (path consistency, K-consistency).
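A sketch of AC-3-style constraint propagation for the binary "different value" constraints of map coloring, using the data defined earlier (domains are pruned in place):

from collections import deque

def revise(domains, x, y):
    # make X arc consistent wrt Y: drop values of X with no allowed value of Y
    removed = {vx for vx in domains[x] if all(vx == vy for vy in domains[y])}
    domains[x] -= removed
    return bool(removed)

def ac3(domains):
    queue = deque((x, y) for x in variables for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y):
            if not domains[x]:
                return False                  # empty domain: fail and backtrack
            for z in neighbors[x]:            # recheck arcs pointing at X
                if z != y:
                    queue.append((z, x))
    return True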
Backtracking Search with Inference
Call: Recursive-Backtracking({}, csp)
If (inference(csp, var, assignment) == failure) return failure
# Check consistency here (called "inference") and backtrack if we know that the branch will lead to failure.
Local Search for CSPs
Backtracking-based CSP algorithms allow incomplete states, but only if they satisfy all constraints.
Local search (e.g., hill climbing and simulated annealing) works only with "complete" states, i.e., all variables assigned, but we can allow states with unsatisfied constraints.
Attempt to improve states using the min-conflicts heuristic:
1. Select a conflicted variable, and
2. choose a new value that violates the fewest constraints (local improvement step).
3. Repeat until all constraints are met.
Local search is often very effective for CSPs.
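A sketch of min-conflicts local search for the map-coloring CSP defined earlier (max_steps is an arbitrary cutoff):

import random

def conflicts(var, value, assignment):
    return sum(assignment.get(n) == value for n in neighbors[var])

def min_conflicts(max_steps=1000):
    # start from a complete (possibly inconsistent) random assignment
    assignment = {v: random.choice(sorted(domains[v])) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, assignment[v], assignment) > 0]
        if not conflicted:
            return assignment                 # all constraints are met
        var = random.choice(conflicted)       # 1. select a conflicted variable
        assignment[var] = min(domains[var],   # 2. value with fewest conflicts
                              key=lambda val: conflicts(var, val, assignment))
    return None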
Summary
• CSPs are a special type of search problem:
  • States are structured and defined by a set of variables and value assignments
  • Variables can be unassigned
  • Goal test defined by
    • Consistency with constraints
    • Completeness of the assignment
• Backtracking search = depth-first search where a successor state is generated by a consistent value assignment to a single unassigned variable
  • Starts with {} and only considers consistent assignments
  • Variable ordering and value selection heuristics can help significantly
  • Forward checking prevents assignments that guarantee later failure
• Local search can be used to search the space of all complete assignments.