
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2021

A comparison of two tree-search based algorithms
for playing 3-dimensional Connect Four

DAVID AVELLAN-HULTMAN
EMIL GUNNBERG QUERAT

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Degree project, first cycle (15 hp)
Date: June 27, 2021
Supervisor: Johan Jansson
Examiner: Pawel Herman
Swedish title: En jämförelse av två trädsöksalgoritmer för att spela
tredimensionellt Fyra i rad

Abstract
This thesis investigates general game-playing by comparing the well-known methods Alpha-beta Pruning and Monte Carlo Tree Search in a new context, namely a three-dimensional version of the game Connect Four. The methods are compared in a tournament between instances of both methods at varying levels of allowed search extent, measuring performance as a function of the average thinking time taken per move. Alpha-beta Pruning is clearly the stronger method at thinking times below 0.1 seconds. However, Monte Carlo Tree Search appears to scale better with increased thinking time and, in this experiment, overtakes Alpha-beta Pruning as the better method at thinking times of about 10 seconds. This study contributes to the body of knowledge on how these methods perform in general game-playing, but further comparisons of the methods with regard to varying game complexity, game-specific heuristics and augmentations of the methods are needed before any definite generalizations can be made.

Sammanfattning
This thesis aims to investigate general game-playing techniques through a comparison between the well-known methods Alpha-beta Pruning and Monte Carlo Tree Search in a three-dimensional version of the game Connect Four. The methods are compared in a tournament with instances of both methods at varying levels of allowed search extent, and their performance is measured as a function of the average thinking time per move. Alpha-beta Pruning is clearly the better method at thinking times below 0.1 seconds. Monte Carlo Tree Search, on the other hand, appears to scale better with increased thinking time and becomes the better method at thinking times of around 10 seconds in this experiment. This study contributes to the understanding of how these methods perform in general, but further comparisons of the methods with regard to varying game complexity, game-specific heuristics and improvements of the methods are required before any definite generalizations can be made.
Contents

1 Introduction
  1.1 Aim and Research Topic
  1.2 Scope and Approach
  1.3 Thesis outline

2 Background
  2.1 The 3D Connect 4 game
  2.2 Minimax and Alpha-beta Pruning
  2.3 Monte Carlo Tree Search
  2.4 Related work

3 Method
  3.1 Implementation and execution
  3.2 Players
    3.2.1 Alpha-beta Pruning
    3.2.2 Monte Carlo Tree Search
  3.3 Tournament
    3.3.1 Playing speed measurement

4 Results
  4.1 Results overview
  4.2 Result matrix
  4.3 Player performance
  4.4 Convergence of win rate
  4.5 Game repetition

5 Discussion
  5.1 Analyzing the ratings
  5.2 Generalizability
  5.3 Potential issues
    5.3.1 Randomness and repetition
    5.3.2 Edge-case effects
    5.3.3 Method comparability
  5.4 Future research
  5.5 Conclusion

Bibliography
Chapter 1

Introduction

The task of designing game-playing algorithms has been studied since the early days of computers. For simple games like Tic Tac Toe, a computer can easily search through all future possibilities and always play the best move. However, for most games, such as Chess or Go, the state space is far too large to search through by brute force, and the central question then becomes how to play these games as well as possible given restrictions on the time the algorithm is allowed to run.

Instead of brute force, other methods of game-play have been developed. Most traditional approaches to playing board games search through a smaller portion of future possibilities starting from the current position, employing game-specific heuristics and various speed-up techniques such as Alpha-beta Pruning. One such example is IBM's Deep Blue computer [1], which used these methods in 1997 to defeat the then-reigning World Chess Champion in a six-game match. There has since been a growing interest in developing general algorithms for game-playing, which work well for many different games without the need for game-specific heuristics [2]. An example of an algorithm that has been used in systems that attempt general game-playing is Monte Carlo Tree Search, which conducts numerous random play-outs in order to estimate the likelihood of winning for each possible next move [3]. This method has been used as a component of DeepMind's program AlphaZero, together with a deep neural network, to achieve superhuman performance in Go, Chess and Shogi [4]. This advancement is a great achievement for general game-play; however, it requires substantial computational resources for training. Not all applications of game-playing algorithms, such as simple AI opponents in computer games, require the sophistication of AlphaZero or have access to resources like DeepMind's. Instead, the generalizability of Alpha-beta Pruning and Monte Carlo Tree Search as standalone game-playing algorithms remains to be explored. By comparing the two methods in a multitude of settings, generalizations can be made about the methods, and selection can be made easier when deciding between the two in a novel setting. As a contribution to the general understanding of these methods, this study compares them in a specific setting, namely the game of 3D Connect 4, a 4-by-4-by-4 version of the classic game. The game has a large state space but a finite number of actions in each state, making it a suitable setting for both algorithms in question. Moreover, the game has not, to the authors' knowledge, been studied previously.

1.1 Aim and Research Topic

This study investigates how well tree-search based methods perform in the game of 3-dimensional Connect 4. Specifically, the goal of the study is to evaluate the performance of the two distinct methods Alpha-beta Pruning and Monte Carlo Tree Search with regard to thinking time, as determined by search depth and number of play-outs for the respective methods.

1.2 Scope and Approach

The goal is achieved by implementing players running Alpha-beta Pruning with varying search depths and players running Monte Carlo Tree Search with varying numbers of play-outs. No game-specific heuristics or hyper-parameter configurations of the algorithms are explored. The playing strengths are assessed by conducting a tournament of 3D Connect 4 where each round consists of a match-up between every pair of different players. The results are tallied and considered together with the average thinking time for each player.

1.3 Thesis outline

This report prefaces the experiment by first defining the game of 3D Connect 4 and then describing the generalized methods Alpha-beta Pruning and Monte Carlo Tree Search. The background also presents how the two methods have been analyzed in general game-playing and the results of previous comparisons between them.

Thereafter, the experiment of this study is presented, including:

• A description of the game implementation.

• The different players that were studied and how Alpha-beta Pruning and Monte Carlo Tree Search were implemented.

• The structure of the tournament.

• How playing speed was measured.

The results of the tournament are presented in terms of individual match-up results and overall rankings in relation to average thinking times. To conclude the report, the ratings are analyzed, the generalizability of the results is discussed, potential issues of the method are presented and topics for future research are suggested.
Chapter 2

Background

2.1 The 3D Connect 4 game

In this study, the term 3D Connect 4 refers to a variation of the common Connect 4 game, played on a 4-by-4-by-4 board in three dimensions. It is a game between two players, here represented as blue and red, played on 16 vertical columns arranged in a 4-by-4 square, each holding a maximum of four tokens, see figure 2.1. The board is initially empty. The players then alternate (blue moves first) adding a token of their colour to the top of the pile in a column of their choice, as long as the column is not yet fully occupied. This continues until one player wins, or all columns are full without either player winning, in which case the game is a draw. A player wins immediately upon placing a token that lies on a straight line together with three other tokens of the player's colour. This can happen in five essentially distinct ways, again, see figure 2.1.
In order to algorithmically play 3D Connect 4 with tree-search based approaches, a game tree representation of the game must be defined. The game tree consists of nodes representing board states, where the root is the initial game state and the terminal nodes are win or draw states, and edges representing token placements, i.e. transitions between board states. The game tree is finite, with a maximum branching factor of 16 at each depth level and a maximum depth of 64. As figure 2.2 shows, a specific player makes a move at every second depth level of the tree. The figure also shows two examples of terminal nodes, the bottom-left node being a win state for blue and the bottom-right node being a win state for red.
Figure 2.1: Examples of terminal states of 3D Connect 4.

Figure 2.2: Game tree representation of 3D Connect 4.

Using this representation of the game, tree-search algorithms can be used to search for winning future states and thus determine the next move of the current player. Two different methods of tree-search are presented in the following sections.
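
To make the rules and the game tree concrete, here is a minimal C++ sketch of one possible state representation. This is an illustration only, not the implementation from the repository in section 3.1: the State type, the cell indexing and all member names are assumptions. Each player's tokens live in a 64-bit mask (cell index x + 4y + 16z), and the 76 winning lines of the 4-by-4-by-4 board are precomputed as masks, so checking for a win is a scan over those masks.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 3D Connect 4 state: one bitboard per player.
struct State {
    uint64_t piece[2] = {0, 0};  // token masks for blue (0) and red (1)
    int height[16] = {0};        // tokens currently in each column
    int toMove = 0;              // whose turn it is: 0 = blue, 1 = red

    bool canPlay(int col) const { return height[col] < 4; }

    // Drop a token in column col (0..15); returns true if the move wins.
    bool play(int col) {
        int cell = col + 16 * height[col]++;  // col encodes x + 4y, level is z
        piece[toMove] |= 1ULL << cell;
        bool won = isWin(piece[toMove]);
        toMove ^= 1;
        return won;
    }

    // The 76 winning lines of the 4x4x4 board, built once as bitmasks:
    // 48 axis-aligned, 24 planar diagonals, 4 space diagonals.
    static const std::vector<uint64_t>& lines() {
        static const std::vector<uint64_t> ls = [] {
            std::vector<uint64_t> v;
            const int dirs[13][3] = {
                {1,0,0},{0,1,0},{0,0,1},{1,1,0},{1,-1,0},{1,0,1},{1,0,-1},
                {0,1,1},{0,1,-1},{1,1,1},{1,1,-1},{1,-1,1},{1,-1,-1}};
            for (const auto& d : dirs)
                for (int x = 0; x < 4; x++)
                    for (int y = 0; y < 4; y++)
                        for (int z = 0; z < 4; z++) {
                            // Keep the line only if its far end is on the board.
                            int ex = x + 3*d[0], ey = y + 3*d[1], ez = z + 3*d[2];
                            if (ex < 0 || ex > 3 || ey < 0 || ey > 3 ||
                                ez < 0 || ez > 3) continue;
                            uint64_t m = 0;
                            for (int i = 0; i < 4; i++)
                                m |= 1ULL << ((x + i*d[0]) + 4*(y + i*d[1]) +
                                              16*(z + i*d[2]));
                            v.push_back(m);
                        }
            return v;
        }();
        return ls;
    }

    static bool isWin(uint64_t p) {
        for (uint64_t m : lines())
            if ((p & m) == m) return true;  // all four cells of a line occupied
        return false;
    }
};
```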

2.2 Minimax and Alpha-beta Pruning

Minimax is a method for computing the result of a game starting from any position s in a finite game tree, assuming perfect play from both players, given the result values R(t) for the terminal states t. The usual convention is that the first player favours higher R(t) and the second player favours lower R(t), for instance assigning terminal states the value R(t) = 1 if the first player won, R(t) = 0 on a draw, and R(t) = −1 if the second player won. In zero-sum games such as Chess, Go, and indeed 3D Connect 4, where the values for both players are always opposites of each other, Minimax may be simplified by always considering the utility U(s) of a state from the point of view of the current player in the state s; this variant is known as Negamax. Terminal states are considered from the point of view of the player who would have made the next move if the game had gone on. Letting R′(t) be the value of a terminal state t to the player who did not make the last move, i.e. R′(t) = −1 if either player won and R′(t) = 0 on a draw, Negamax may be described recursively as follows:
$$U(s) = \begin{cases} R'(s) & \text{if } s \text{ is terminal} \\ \max_{\text{child } t \text{ of } s} \; -U(t) & \text{otherwise} \end{cases}$$

If using Negamax to determine what moves to play, any move which transitions to a state t that achieves the maximum −U(t) is optimal. However, since naïvely computing this requires traversing the whole exponentially-sized game tree, a depth cutoff is usually used: states at a certain depth from the start node s0 of the computation are considered terminal and assigned a heuristic utility value.

A commonly used speed-up technique for Negamax (and Minimax as well) is Alpha-beta Pruning, a method which allows certain sections of the game tree to be excluded when calculating Negamax utility values if they can be proven not to affect the final outcome [5]. This is done by always maintaining lower and upper bounds on the final answer (α and β). Pseudocode for limited-depth Negamax with Alpha-beta Pruning (in later sections referred to simply as Alpha-beta Pruning) is given in algorithm 1.

Algorithm 1 Limited-depth Negamax with Alpha-beta Pruning

1: ▷ Initially call as Negamax(s0, −∞, ∞, d0)
2: function Negamax(s, α, β, d)
3:   if s is terminal then return R′(s)
4:   if d = 0 then return Heuristic(s)
5:   ans ← α
6:   for each child t of s do
7:     ans ← max(ans, −Negamax(t, −β, −ans, d − 1))
8:     if ans ≥ β then return ans
9:   return ans
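
As a concrete rendering of algorithm 1, the search could be written in C++ as below. This is a sketch, assuming the State type sketched in section 2.1 and three hypothetical helpers (isTerminal, terminalValue and children) that are declared but not shown; the heuristic is the constant 0 used in this study.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helpers, assumed but not shown here.
bool isTerminal(const State& s);
double terminalValue(const State& s);         // R'(s): -1 if someone won, 0 on a draw
std::vector<State> children(const State& s);  // all successor states

double heuristic(const State&) { return 0.0; }  // no game-specific knowledge

// Limited-depth Negamax with Alpha-beta Pruning (algorithm 1).
double negamax(const State& s, double alpha, double beta, int depth) {
    if (isTerminal(s)) return terminalValue(s);
    if (depth == 0) return heuristic(s);        // depth cutoff
    double ans = alpha;
    for (const State& t : children(s)) {
        // Negate and swap the bounds: the child is scored for the opponent.
        ans = std::max(ans, -negamax(t, -beta, -ans, depth - 1));
        if (ans >= beta) return ans;            // prune the remaining children
    }
    return ans;
}

// Top-level call, as in the pseudocode:
//   double v = negamax(s0, -INFINITY, INFINITY, d0);
```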

2.3 Monte Carlo Tree Search

Monte Carlo Tree Search is another algorithm for exploring a game tree. It fundamentally differs from Minimax in the way nodes are visited: instead of visiting each of a set of nodes exactly once, Monte Carlo Tree Search performs a number of play-outs all starting from the root node s0 (i.e. the current state), aiming to focus its exploration on the more promising branches of the game tree [3]. Each such play-out consists of four stages:

1. Selection: Starting from the root node s0, select child nodes by repeatedly making moves that are at this point in some way optimal (see below), until either an unvisited node is reached or the game ends, i.e. at a node with no children.

2. Expansion: If an unvisited node was reached, add it to the tree.

3. Simulation: Evaluate the predicted result starting from the reached node. Terminal nodes have fixed results, but for non-terminal nodes the result is estimated by performing random moves until a terminal state is reached.

4. Backpropagation: Update the nodes on the path taken in the selection step with the information gained from the result or estimated result.

The main problem in Monte Carlo Tree Search is how to choose which child node to explore. Kocsis and Szepesvári [6] suggest picking the move i with the highest upper confidence bound on its predicted reward, calculated as
$$UCT = v_i + c\sqrt{\frac{\ln N}{n_i}},$$

where $v_i$ is the previously estimated value of the child node corresponding to the move i, N is the number of times the current node has been reached, $n_i$ is the number of times the child node corresponding to the move i has been reached, and c is a hyper-parameter controlling the degree of exploration in the search.
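
As an illustration, UCT-based child selection can be sketched in C++ as follows. The names for the per-child statistics are assumptions; unvisited children are given infinite priority so that each is tried once, matching the treatment in algorithm 4 later.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pick the index of the child with the highest UCT score.
// value[i] = v_i, visits[i] = n_i, parentVisits = N, c = exploration constant.
std::size_t selectUct(const std::vector<double>& value,
                      const std::vector<int>& visits,
                      int parentVisits, double c) {
    std::size_t best = 0;
    double bestScore = -INFINITY;
    for (std::size_t i = 0; i < value.size(); i++) {
        double score = (visits[i] == 0)
            ? INFINITY  // unvisited children are explored first
            : value[i] + c * std::sqrt(std::log(parentVisits) / visits[i]);
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```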

2.4 Related work

Clune [7] has studied the generalizability of game-playing techniques. In particular, a comparison between Alpha-beta Pruning and Monte Carlo Tree Search showed that both the branching factor (the number of moves per state) and the time allowed per move affect the match-up results, giving the advantage to Monte Carlo Tree Search in games with a high branching factor and limited time allowed per move. Alpha-beta Pruning proved to excel when the branching factor was low, or when sufficient time was provided to search for and find strong moves.

Additionally, Clune compared Alpha-beta Pruning with Monte Carlo Tree Search in variations of 2D Connect 4 (7x6 and 8x6 board size). Alpha-beta Pruning dominated and won all six games of the 7x6 version, while winning only four out of six games in the 8x6 version. Clune argues that both the branching factor and the thinking time allowed affect the performance of game-playing algorithms. Evidently, increasing the branching factor by just 1 affected Alpha-beta Pruning negatively. However, the experiment played a limited number of games, and the reliability of these results is questionable. Therefore, additional research into how branching factor and thinking time affect the two methods is warranted.
Chapter 3

Method

3.1 Implementation and execution

The game of 3D Connect 4, the different players and the tournament program were implemented in C++, and the tournament results were presented using Python and Matplotlib. All the source code for the game implementation and analysis can be found at https://github.com/astianthus/connect4_3d. The experiment carried out in this study was run on an Intel Core i5-7200U CPU over multiple sessions until the target number of rounds had been completed.

3.2 Players

The experiment compared 14 different players: 8 Alpha-beta Pruning players with different search depths, 5 Monte Carlo Tree Search players with different numbers of play-outs, and one completely random player as a baseline. The selection of players was based solely on their average thinking times, chosen so that the run time of the experiment did not grow too long. Players of increasing search depth and play-out count were added until average thinking times passed 20 seconds. The players are enumerated in table 3.1.

Player      Description
Random      Selects a random move every time
ABP-1       Alpha-beta Pruning with max depth 1
ABP-2       Alpha-beta Pruning with max depth 2
ABP-3       Alpha-beta Pruning with max depth 3
ABP-4       Alpha-beta Pruning with max depth 4
ABP-5       Alpha-beta Pruning with max depth 5
ABP-6       Alpha-beta Pruning with max depth 6
ABP-7       Alpha-beta Pruning with max depth 7
ABP-8       Alpha-beta Pruning with max depth 8
MCTS-20     Monte Carlo Tree Search with 20 play-outs
MCTS-200    Monte Carlo Tree Search with 200 play-outs
MCTS-2k     Monte Carlo Tree Search with 2 000 play-outs
MCTS-20k    Monte Carlo Tree Search with 20 000 play-outs
MCTS-200k   Monte Carlo Tree Search with 200 000 play-outs

Table 3.1: Names and descriptions of tournament contestants.

3.2.1 Alpha-beta Pruning

The Alpha-beta Pruning players use the Alpha-beta Negamax algorithm described in section 2.2 as a subroutine when selecting a move, by computing the value of the state after each legal move, and then uniformly and randomly

choosing a move among those which produce the best result, as shown in algorithm 2. The randomness is necessary to avoid game repetition, see sections 4.5 and 5.3.1. As the study is concerned with algorithms that do not use game-specific heuristics, the value of non-terminal leaf nodes used in the search is set to the constant 0, equivalent to a draw.

Algorithm 2 Move selection from state s in the ABP player with depth d0

1: function MakeAbpMove(s)
2:   v ← {}
3:   for each valid move i from s do
4:     t ← the state after making the move i
5:     v[i] ← −Negamax(t, −∞, ∞, d0 − 1)
6:   return RandomElement({i : v[i] = max(v)})
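
A hypothetical C++ version of algorithm 2 might look as follows, reusing the negamax sketch from section 2.2; legalMoves and apply are assumed helpers, and the uniform tie-break among equally valued moves provides the required randomness.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

std::vector<int> legalMoves(const State& s);  // hypothetical helper
State apply(const State& s, int move);        // hypothetical helper

int makeAbpMove(const State& s, int d0, std::mt19937& rng) {
    std::vector<int> moves = legalMoves(s);
    std::vector<double> v(moves.size());
    double best = -INFINITY;
    for (std::size_t k = 0; k < moves.size(); k++) {
        State t = apply(s, moves[k]);
        v[k] = -negamax(t, -INFINITY, INFINITY, d0 - 1);
        best = std::max(best, v[k]);
    }
    std::vector<int> ties;  // all moves achieving the maximum value
    for (std::size_t k = 0; k < moves.size(); k++)
        if (v[k] == best) ties.push_back(moves[k]);
    std::uniform_int_distribution<std::size_t> pick(0, ties.size() - 1);
    return ties[pick(rng)];
}
```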

3.2.2 Monte Carlo Tree Search

The Monte Carlo Tree Search players follow the outline given in section 2.3. Pseudocode for the specific implementation is given in algorithm 3, which describes how the algorithm chooses a move to play at the top level of the search tree every move, and in algorithm 4, which describes the actual tree search.

Algorithm 3 Move selection from state s in the MCTS player with p play-outs

1: function MakeMctsMove(s)
2:   loop p times
3:     Playout(s)
4:   v ← {}
5:   for each valid move i from s do
6:     t ← the state after making the move i
7:     v[i] ← 1 − W[t]/N[t]
8:   return RandomElement({i : v[i] > (1 − ε) max(v)})

The variables N and W are maintained by the player throughout every game, with N[s] storing the number of times the state s has been visited and W[s] storing the total utility (from 0 to N[s]) to the current player at state s, summed over each time s was visited. As with Alpha-beta Pruning, the move choices are random, but with the addition that moves that are close but not equal to the maximum may also be chosen, controlled by the parameter ε, since the move values produced by Monte Carlo Tree Search are not discrete as in Alpha-beta Pruning. For the implementation used in the experiment, ε = 0.05 was chosen, since it gave sufficient randomness (see section 5.3.1) without a significant decrease in playing strength.
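
A minimal sketch of this ε-tolerant selection (with assumed names, not the repository code) could look like the following; note that the move values here are win-rate estimates in [0, 1].

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Choose uniformly among all moves whose value is within a factor (1 - eps)
// of the best value. The pseudocode uses a strict ">"; ">=" is used here so
// the candidate set is never empty even if every estimate is 0.
int pickNearBest(const std::vector<int>& moves, const std::vector<double>& v,
                 double eps, std::mt19937& rng) {
    double best = *std::max_element(v.begin(), v.end());
    std::vector<int> candidates;
    for (std::size_t k = 0; k < v.size(); k++)
        if (v[k] >= (1.0 - eps) * best) candidates.push_back(moves[k]);
    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    return candidates[pick(rng)];
}
```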

3.3 Tournament

The tournament consisted of 10 rounds, each round comprising 2 games (giving each player the first-move advantage once) per possible match-up of different contestants. In total, 13 ∗ 14 = 182 games were played per round, and the complete tournament ran 1820 games.

After each game, the sequence of moves was recorded, and after each round, all the match-up results so far and the average thinking time for each contestant were recorded in a separate file for analysis.

3.3.1 Playing speed measurement

The average thinking times were measured in order to appropriately weigh performance against computational intensity. The thinking time for each move was calculated by starting a clock (specifically, std::chrono::high_resolution_clock of C++) before a function call telling the contestant the opponent's last move, followed by a function call asking the contestant for its move in the resulting position, after which the clock was immediately stopped.
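
A sketch of this measurement is shown below; Player and its two member functions are assumed names for the contestant interface, while the clock itself is the one named above.

```cpp
#include <chrono>

struct Player {                        // hypothetical contestant interface
    void notifyOpponentMove(int move); // update internal state
    int chooseMove();                  // think and answer with a move
};

// Ask a contestant for its move and record its thinking time in seconds.
int timedMove(Player& p, int opponentMove, double& secondsOut) {
    auto start = std::chrono::high_resolution_clock::now();
    p.notifyOpponentMove(opponentMove);
    int move = p.chooseMove();
    auto stop = std::chrono::high_resolution_clock::now();
    secondsOut = std::chrono::duration<double>(stop - start).count();
    return move;
}
```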

Algorithm 4 Monte Carlo Tree Search play-outs

1: function Playout(s)
2:   if not visited s then
3:     r ← SimulateUtility(s)
4:     (N[s], W[s]) ← (1, r)
5:     return r
6:   if s is terminal then
7:     return W[s]
8:   v ← {}
9:   for each valid move i from s do
10:    t ← the state after making the move i
11:    if not visited t then v[i] ← ∞
12:    else v[i] ← (N[t] − W[t])/N[t] + √(2 ln N[s] / N[t])   ▷ The UCT formula
13:  m ← RandomElement({i : v[i] > (1 − ε) max(v)})
14:  t ← the state after making the move m
15:  r ← 1 − Playout(t)
16:  (N[s], W[s]) ← (N[s] + 1, W[s] + r)
17:  return r

Chapter 4

Results

4.1 Results overview

10 rounds of the 14-player tournament were played over 4 sessions, totalling around 35 hours of CPU time. The tournament consisted of 1820 games, of which 947 were won by the first player, 9 were drawn, and 864 were won by the second player. The results largely show that increasing the search depth or number of play-outs increases playing strength, and that Monte Carlo Tree Search seems to benefit more from increased thinking time.

4.2 Result matrix

To begin with, we present the results of each individual match-up. Each tournament contestant played 20 games against every other contestant, 10 games with the first-move advantage and 10 games without. Figure 4.1 shows the win rate for each individual match-up, with the player with the first-move advantage on the y-axis. The win rate is considered from the perspective of the player with the first move and is calculated as the achieved score divided by the maximum possible score. Wins count as one point and draws count as half a point, as in equation 4.1, where w is the number of games won, d is the number of draws and n is the number of games played.

$$\text{win rate} = \frac{w + 0.5d}{n} \quad (4.1)$$
The result matrix in figure 4.1 can be divided into 4 sections, each corresponding to either an intra-method or an inter-method match-up group. The top left section holds ABP vs ABP, the top right holds ABP vs MCTS, the bottom left holds MCTS vs ABP and the bottom right holds MCTS vs MCTS.

Figure 4.1: Average game result of the y-axis player against the x-axis player when the y-axis player starts. A win for the y-axis player gives score 1, a draw gives score 0.5, and a loss gives score 0.
The win rates in the respective match-up groups show a few over-arching trends. First, greater search depth and greater numbers of play-outs consistently result in higher win rates for Alpha-beta Pruning and Monte Carlo Tree Search, respectively, in the intra-method match-ups. For instance, the lower triangles in the intra-method sections have win rates of 0.8 or greater in 24/38 match-ups for Alpha-beta Pruning and 9/10 match-ups for Monte Carlo Tree Search. The same relationship holds in the upper intra-method triangles, with stronger players generally beating weaker players, although there are a few outliers where a weaker algorithm happens to beat a stronger one in a majority of the games.

The inter-method win rates display the same relationship: an increased number of play-outs or an increased search depth generally results in higher win rates. However, the boundary where one method gains an advantage over the other is not as clear, since there are many roughly equal match-ups.

4.3 Player performance

Player performance was measured by the overall number of wins, draws and losses throughout the tournament. Each player was scored according to equation 4.2, a slight adjustment of equation 4.1, where the parameters W, D and N correspond to the total number of wins, draws and games played, respectively.

$$\text{overall win rate} = \frac{W + 0.5D}{N} \quad (4.2)$$

Figure 4.2 shows the relationship between average thinking time and overall win rate for the different methods. Both average thinking time and overall win rate increased with each step of search depth for Alpha-beta Pruning and with each step of play-out count for Monte Carlo Tree Search. The individual data points are therefore not labeled; instead, the players are grouped by method, which orders them from left to right in increasing order of search depth and number of play-outs.

Figure 4.2: The overall win rates and average thinking times of the different players. From left to right, Alpha-beta Pruning players (orange) are displayed in order of increasing search depth and Monte Carlo Tree Search players (green) in order of increasing number of play-outs.

A comparison of the two methods shows that Alpha-beta Pruning outperforms Monte Carlo Tree Search at average thinking times below 0.1 seconds. However, the overall score of Monte Carlo Tree Search appears to increase faster with thinking time than that of Alpha-beta Pruning. The over-arching trend suggests that Monte Carlo Tree Search is the stronger method at thinking times over 1 second, and indeed MCTS-200k performs much better than the strongest Alpha-beta Pruning player despite having a similar thinking time.

4.4 Convergence of win rate

In order to assess the reliability of the produced results, the win rate after each tournament round (including all previous rounds up to that point) has been plotted for each player in figure 4.3. MCTS-200k maintains a consistent lead over second place throughout the tournament, but the internal order of the three runners-up (MCTS-20k, ABP-7 and ABP-8) changes at multiple points, indicating that 10 rounds might not have been sufficient to assess their precise win rates and suggesting that the experiment could benefit from additional rounds of game-play. Other pairs of players also swapped places during the tournament, but overall the win rates changed only by a small amount during the last few rounds.

Figure 4.3: Convergence of win rate throughout the tournament.

4.5 Game repetition

Since randomness was introduced to the methods in order to avoid repetition of games, all games from the tournament were analyzed together after it had ended to determine whether any repetition had occurred. This showed that no game had been completely repeated during the whole tournament, and the longest that any two games agreed was for the first 5 moves.
Chapter 5

Discussion

5.1 Analyzing the ratings

The findings suggest that the preferred method of game-play depends on the context, specifically with regard to time constraints. In contexts where sub-second thinking times are preferable, Alpha-beta Pruning is the better method for playing 3D Connect 4, since it consistently outperforms Monte Carlo Tree Search players with similar or lower average move times.

If thinking time should be comparable to human thinking times, between 1 and 10 seconds, the preferred choice is not as evident. The two methods achieve similar levels of game-play, as shown by the three runners-up in figure 4.2 (ABP-7, ABP-8 and MCTS-20k).

In contexts of abundant thinking time, e.g. when maximal playing strength is desirable, Monte Carlo Tree Search is the favoured method among the players studied in this report. The results suggest that the playing strength of Monte Carlo Tree Search scales better than that of Alpha-beta Pruning with increased thinking time. However, further experimentation is needed before drawing general conclusions about the preferred method in time-abundant settings.

A possible explanation of the difference between the best performing players of each method is that Monte Carlo Tree Search will, given enough time for simulation, arrive at good enough value estimates to consistently make good moves throughout the game. In contrast, Alpha-beta Pruning will make random moves if no game-winning moves are found, giving a clear advantage to Monte Carlo Tree Search in games where there are no game-winning moves within the search depth.

The difference would be expected to diminish if Alpha-beta Pruning were to employ game-specific heuristics enabling evaluation of non-terminal states. This would likely result in fewer random moves and a higher chance of adequate moves regardless of whether a guaranteed win is within the search depth. Such an implementation would counteract the ability to make generalizations about the relationship between the methods as regards general game-playing, and so was not within the scope of this report. However, there is value in learning what role game-specific heuristics play in the performance of different methods.

5.2 Generalizability

In contrast to Clune's [7] findings on general game-play for Alpha-beta Pruning and Monte Carlo Tree Search, which show that Alpha-beta Pruning performs better with more thinking time in games of branching factor 16, this study's results show the opposite: more thinking time benefits Monte Carlo Tree Search. Clune compared the methods with limits of 2 and 16 seconds of thinking time, where Alpha-beta Pruning outperformed Monte Carlo Tree Search in the latter case, while Monte Carlo Tree Search had the advantage when given less time. Admittedly, different games were used for comparing the methods, suggesting there could be other factors at play causing this discrepancy and warranting further research into scalability in terms of thinking time.

Clune's comparison of the methods in many different games identifies Monte Carlo Tree Search as the favoured method in games of high branching factor. Adding an additional column to the standard 7x6 board of 2D Connect 4, i.e. increasing the branching factor by 1, reduced Alpha-beta Pruning's advantage over Monte Carlo Tree Search. 3D Connect 4 has a branching factor of 16, and this study's results suggest Monte Carlo Tree Search as the preferred method, adding evidence in favour of one of Clune's findings and further supporting the idea that Monte Carlo Tree Search scales better in terms of branching factor.

5.3 Potential issues

5.3.1 Randomness and repetition

The tournament experiment requires letting the same algorithms play each other multiple times, which means that the algorithms cannot be deterministic, since the same game would then be repeated every time and yield no new information. Instead, both methods were made non-deterministic by the same construction: selecting a random move among all those that to the algorithm seem at least close to optimal. The hope was that this would avoid repetition of games, and indeed the results in section 4.5 showed that no games were repeated.

5.3.2 Edge-case effects

The tournament results show that Monte Carlo Tree Search scaled better with thinking time and eventually overtook Alpha-beta Pruning. However, no Monte Carlo Tree Search player actually outperformed an Alpha-beta Pruning player with a longer average thinking time, since MCTS-200k, which clearly performed the best in the tournament, also had the longest average thinking time of all the contestants. While it cannot be said for certain from the limited data that there is some range of average thinking times where Monte Carlo Tree Search outperforms Alpha-beta Pruning, the trend in figure 4.2 is clear enough that the conclusion that Monte Carlo Tree Search scales better seems justified regardless.

5.3.3 Method comparability

Both Alpha-beta Pruning and Monte Carlo Tree Search have been extensively studied and exist in numerous forms, and there are optimizations and improvements that could be made to both methods as used in this specific experiment. It is likely that either method would beat the other if it were improved with existing augmentations, such as iterative deepening and better move-ordering for Alpha-beta Pruning [8], or parallelization for Monte Carlo Tree Search as suggested by Chaslot et al. [9]. It is therefore important not to consider this experiment a statement about the state-of-the-art versions of either method, but only an analysis of the base performance of simple implementations of the respective methods running without heuristics.

Another potential comparability problem is that the algorithms always run to completion, no matter how long that takes, and time is measured after the fact. This results in few pairs of algorithm instances having directly comparable move times, and also means that a single algorithm can take highly variable amounts of time depending on the position, which might not be desirable in an application setting. While this does not invalidate the trends observed in this study, it might be better for comparability to restrict the algorithms to fixed amounts of thinking time.

5.4 Future research

In order to further explore the scaling of playing strength with regard to thinking time, this study's experiment would have benefited from additional players of greater Alpha-beta Pruning search depth and Monte Carlo Tree Search play-out count. However, restrictions in time and computing resources limited the experiment to a maximum average thinking time of 20 seconds. By augmenting the implementations as mentioned in section 5.3.3, stronger implementations could play at reasonable speeds, or longer tournaments could be conducted to generate more data.

Another aspect of game-playing algorithms which this study did not explore is how the methods scale with regard to thinking time when employing varying sets of game-specific heuristics. Heuristics can provide better grounds for evaluating board states at the maximum depth of the Alpha-beta Pruning tree search, and have shown a significant effect on playing strength in 2D Connect 4 [10]. Heuristics can also serve as an alternative to random play-outs in the simulation phase of Monte Carlo Tree Search. The hypothesis is that the differences in scalability between the methods grow smaller with richer heuristics: Alpha-beta Pruning with richer heuristics will likely make better moves on average and thus scale better with increased thinking time, while Monte Carlo Tree Search, which is designed as a general game-playing algorithm, will likely not benefit to the same extent.

To deal with the issue of having few pairs of players with comparable thinking times, an alternative approach to studying the scalability of the two methods would be to set hard limits on thinking time and implement the methods such that they search for as long as the limit permits. For Alpha-beta Pruning this could be done with iterative deepening [8], which performs depth-first searches at increasing search depths while thinking time remains (a sketch follows below). Monte Carlo Tree Search could be adjusted to perform play-outs for as long as the thinking time allows, instead of setting a limit on the number of play-outs.
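
A minimal sketch of such a time-limited iterative-deepening wrapper is given below; abpBestMove is a hypothetical fixed-depth search in the style of algorithm 2, and the time check happens only between iterations, so a real implementation would also want to abort mid-search.

```cpp
#include <chrono>

int abpBestMove(const State& s, int depth);  // hypothetical fixed-depth search

// Search at depths 1, 2, 3, ... and keep the move from the deepest search
// completed before the time budget runs out.
int iterativeDeepeningMove(const State& s, double timeLimitSeconds) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    int bestMove = abpBestMove(s, 1);  // always have a completed answer
    for (int d = 2;
         std::chrono::duration<double>(clock::now() - start).count()
             < timeLimitSeconds;
         d++)
        bestMove = abpBestMove(s, d);  // deeper results overwrite shallower ones
    return bestMove;
}
```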

5.5 Conclusion

To summarize, the experiment has provided insights into the scalability of Alpha-beta Pruning and Monte Carlo Tree Search with regard to thinking time in the game of 3D Connect 4. Alpha-beta Pruning is the preferred method when thinking time is highly restricted, below 0.1 seconds. On the other hand, Monte Carlo Tree Search has proven to scale better with increased thinking time and could potentially be the favoured method in time-abundant contexts. It should be noted that the experiment studied players with at most 20 seconds of average thinking time, and that the observed effect of thinking time on the match-up between the two methods contradicts previous research. Therefore, additional research into the relationship between thinking time and the methods' playing strengths would contribute to a better understanding of how the methods compare.
Bibliography

[1] Feng-hsiung Hsu, Murray S. Campbell, and A. Joseph Hoane. "Deep Blue System Overview". In: Proceedings of the 9th International Conference on Supercomputing. ICS '95. Barcelona, Spain: Association for Computing Machinery, 1995, pp. 240–244. isbn: 0897917286. doi: 10.1145/224538.224567.

[2] Michael Genesereth, Nathaniel Love, and Barney Pell. "General game playing: Overview of the AAAI competition". In: AI Magazine 26.2 (2005).

[3] Guillaume Maurice Jean-Bernard Chaslot. Monte-Carlo Tree Search. Maastricht University, 2010.

[4] David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. 2017. arXiv: 1712.01815 [cs.AI].

[5] Donald E. Knuth and Ronald W. Moore. "An analysis of alpha-beta pruning". In: Artificial Intelligence 6.4 (1975), pp. 293–326.

[6] Levente Kocsis and Csaba Szepesvári. "Bandit based Monte-Carlo planning". In: European Conference on Machine Learning. Springer, 2006, pp. 282–293.

[7] James Edmond Clune III. "Heuristic Evaluation Functions for General Game Playing". PhD thesis. University of California, 2008.

[8] David L. Poole and Alan K. Mackworth. Iterative Deepening. https://artint.info/2e/html/ArtInt2e.Ch3.S5.SS3.html. Accessed: 2021-05-14.

[9] Guillaume Chaslot, Mark Winands, and H. Herik. "Parallel Monte-Carlo Tree Search". In: Sept. 2008, pp. 60–71. isbn: 978-3-540-87607-6. doi: 10.1007/978-3-540-87608-3_6.

[10] Xiyu Kang, Yiqi Wang, and Yanrui Hu. "Research on Different Heuristics for Minimax Algorithm Insight from Connect-4 Game". In: Journal of Intelligent Learning Systems and Applications 11 (Jan. 2019), pp. 15–31. doi: 10.4236/jilsa.2019.112002.
TRITA-EECS-EX-2021:479

www.kth.se
