Playing 2048 with AI - pt. 3

A Monte Carlo tree search approach to the game 2048.

The final post in a 3-part series dedicated to playing 2048 with AI. Part 1 (the game itself) available here and part 2 (reinforcement learning) available here.

Series recap

2048 is a game in which the player, presented with a board layout, chooses one of 4 actions to move the tiles (up, left, down, or right), combining tiles until one reaches a value of 2048 (or larger). Part of a game in progress looks like this:

My implementation of the game logic, using only numpy, is described in the first post in this series.

My implementation of reinforcement learning to infer optimal strategies from repeatedly playing games is described in the second post in this series. Unfortunately I was unable to validate the effectiveness of this strategy, due to computational limitations.

In this post, I will discuss an alternative approach to playing 2048 - running Monte Carlo simulations to determine each move, rather than training a neural network to decide the strategy.

Algorithm overview

For those unfamiliar, a Monte Carlo tree search algorithm uses random sampling to choose the most promising action, given a high degree of uncertainty. It has often been implemented for game strategy (notably to play the game Go).

In practice, the algorithm looks like:

  1. Computer is presented with a situation and asked for a decision
  2. Computer runs “background runs” - games starting from this situation, where every decision is made completely randomly
  3. Each background run’s final performance is scored
  4. The runs are grouped by initial move, and performance is averaged
  5. Computer performs the initial move with highest expected final performance, and repeats 1-5


In our case, playing 2048, the computer is presented with a layout and executes a series of “background runs”. Some, by chance, will be very good games with high final scores; others will quickly end with low final scores. These runs are grouped by their initial move (swipe up, left, down, or right), the final scores within each group are averaged, and the computer then swipes in the direction with the highest average - hopefully moving incrementally towards the highest expected final score.
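
As a rough sketch in Python, one decision step looks like the following. This is purely illustrative - random_rollout is a hypothetical helper standing in for "play a copy of the current state to the end with random moves and return that run's first move and final score":

from collections import defaultdict

def choose_move(state, n_runs=50):
    # Collect the final score of each random background run,
    # grouped by the run's first move. random_rollout is a
    # hypothetical helper, not part of my actual code.
    scores_by_first_move = defaultdict(list)
    for _ in range(n_runs):
        first_move, final_score = random_rollout(state)
        scores_by_first_move[first_move].append(final_score)
    # Average the scores within each group and pick the best first move.
    averages = {move: sum(s) / len(s) for move, s in scores_by_first_move.items()}
    return max(averages, key=averages.get)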

Implementation

I previously detailed my implementation of the game logic itself in numpy in the first post in this series. I will rely heavily on the GameLayout class described there, which holds a game state - including the layout and current score - and provides the functionality to update both when a direction is chosen (see GameLayout.swipe(...)).
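
For reference, the parts of that interface used throughout this post are roughly these (a paraphrased sketch for orientation, not the actual class from part 1):

class GameLayout:
    # Attributes used in this post (paraphrased):
    #   layout        - the current board of tile values
    #   score         - the current score
    #   active        - True while at least one valid move remains
    #   moves         - directions swiped so far ('w','a','s','d')
    #   final_layout  - the board when the game ended

    def swipe(self, direction):
        """Apply a move; raises an exception if the move is invalid."""

    def reset(self):
        """Clear the record of prior moves/layouts."""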

Setting up

All we need to do to start a game is initialize this class. To make this repeatable, I’m also going to use a random seed.

import numpy as np
np.random.seed(1)  # fix the seed so this walkthrough is repeatable
from game import GameLayout

game = GameLayout()  # a fresh game, starting with 2 nonzero tiles

The game is initialized with only 2 nonzero tiles, and I’d prefer to demonstrate with more going on, so let’s generate a random layout that represents a state from a little later in the game, when moves are starting to matter more. To do this, I’ll leverage the GameLayout.swipe(...) functionality with 40 random moves.

for dummy_move in np.random.choice(['w','a','s','d'], size=40):
    try:
        game.swipe(dummy_move)  # 'w'/'a'/'s'/'d' map to up/left/down/right
    except:
        # swipe raises an exception when the chosen move is invalid
        print("Random move was invalid.")
        continue

Now, if you look at game.layout, it should look like:

[ 4,  4,  0,  0,
  2, 16,  0,  0,
  4, 32,  4,  0,
  2, 16,  8,  0]

Simulating outcomes

Next, let’s go to Monte Carlo! All functionality discussed here is contained in my MonteCarloGameDriver class, but I’ll go through it step by step first.

First, we want to make a copy of the game. It’s important to make a “deep copy”, which will actually duplicate the elements inside the game object. Otherwise, when we swipe on the copy it will update the original game.layout too!

from copy import deepcopy

game_copy = deepcopy(game)
game_copy.reset() # removes prior moves/layouts

Now, we will play this copied game completely at random. This will look very much like the GameDriver class discussed in my previous post here.

while game_copy.active:
    moves = ['w','a','s','d']
    np.random.shuffle(moves)
    for move in moves:
        try:
            game_copy.swipe(move)
            break
        except:
            # move didn't work, try next move
            continue

After this snippet runs, we have a finished copy game, including a final score and all moves performed/layouts seen. In fact, we can look at game_copy.final_layout and see how far the game got:

[  4,   2,  16,   4,
 256,  16,   4,   8,
   4,  32,  64,   2,
   2,   8,  16,   4]

This random game achieved a maximum tile of 256, and we can also see that its final score (stored in game_copy.score) was 2,344.

What’s most important to note is this: the initial move this random simulation took (stored as game_copy.moves[0]) was to swipe up.
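
To recap, everything we need from this finished run is available as attributes on the copy:

game_copy.final_layout  # the board when this random game ended
game_copy.score         # its final score - 2344 here
game_copy.moves[0]      # its first move - 'w' (up) here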

Choosing optimal action

So should the computer, then, swipe up? A single simulation doesn’t have any differentiating power. Instead, we should repeat this procedure several times and take the average score for each initial move.

This procedure is implemented in my MonteCarloGameDriver.simulate(game, simulation_size) method, which takes a game and runs simulation_size number of random games starting from that initial gamestate.
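
A minimal sketch of what that method could look like - not the exact MonteCarloGameDriver code, just the pieces we have already seen stitched together - is:

from copy import deepcopy
from collections import defaultdict
import numpy as np

def simulate(game, simulation_size):
    # Final scores of the random background runs, grouped by first move.
    scores = defaultdict(list)
    for _ in range(simulation_size):
        game_copy = deepcopy(game)
        game_copy.reset()  # removes prior moves/layouts
        while game_copy.active:
            moves = ['w','a','s','d']
            np.random.shuffle(moves)
            for move in moves:
                try:
                    game_copy.swipe(move)
                    break
                except:
                    # move didn't work, try next move
                    continue
        scores[game_copy.moves[0]].append(game_copy.score)
    # Average performance for each initial move that was attempted.
    return {move: sum(s) / len(s) for move, s in scores.items()}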

Let’s see what happens when we use .simulate(...) on our random game described earlier. As a reminder, the layout was:

[ 4,  4,  0,  0,
  2, 16,  0,  0,
  4, 32,  4,  0,
  2, 16,  8,  0]

Let’s import the MonteCarloGameDriver class and run 20 simulations.

from simulator import MonteCarloGameDriver

driver = MonteCarloGameDriver()

game_performance = driver.simulate(game, 20)

The output of this is game_performance, a dictionary mapping each move - ['w','a','s','d'] - to the average score that first move produced during the simulations. In this case, we get:

game_performance = {'w': 1101.6, 'd': 1038.6666666666667, 'a': 669.0}

Note that we aren’t getting an entry for s in this case because it is impossible to swipe down from this layout. With a small simulation_size, though, it is possible that not every valid initial move gets attempted, so it is important to keep this number large enough that each possible initial move is tried a few times.

Now, we can say that the computer should swipe up, w, as this initial move has the highest expected score, given our simulations.
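
In code, that decision is just taking the key with the highest value and performing it on the real game:

best_move = max(game_performance, key=game_performance.get)  # 'w' here
game.swipe(best_move)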

Performance

Now that we’ve covered what the algorithm does and how to implement it, let’s see how it actually does!

Benchmarking

First, let’s establish a metric of success.

A low bar would be for the computer to perform consistently better than random play - hopefully we can expect at least that.

A medium bar would be for the computer to achieve 2048 the majority of the time.

A more personal goal of mine would be to have the computer beat me - or rather, to best my own personal highest-scoring game, which achieved 34,828.

Results

Using the Monte Carlo method, I ran 10 sample games at each simulation size - [5, 20, 50, 100, 150]. For comparison, I also ran 10 completely random games to see how much improvement the simulations actually buy us.
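
Each of those sample games can be sketched roughly as the loop below - again an illustration assuming a driver method like the simulate(...) sketch above, not the exact benchmarking code:

from game import GameLayout
from simulator import MonteCarloGameDriver

def play_one_game(simulation_size):
    game = GameLayout()
    driver = MonteCarloGameDriver()
    while game.active:
        performance = driver.simulate(game, simulation_size)
        best_move = max(performance, key=performance.get)
        game.swipe(best_move)
    return game.score

results = {size: [play_one_game(size) for _ in range(10)]
           for size in [5, 20, 50, 100, 150]}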

Plotted below, you can see the final scores (after the entire game), by simulation size.

We can see that the huge performance gain happens somewhere between running 20 and 50 background runs at each step. Interestingly, there isn’t dramatic improvement in average score (denoted by the horizontal lines) with additional simulations after 50 - however, we can see that the variance increases, pushing a whopping 3 games above my own personal best!

Also impressively, we can see the maximum tile each game achieves move upward as simulation size increases.

I will declare this approach a success - with just 50 background runs at each step, the computer was able to achieve the 2048 tile 60% of the time, and the 1024 tile the remaining 40%.

Code

Complete code can be found in my 2048 git repository.

This section of the project relies on numpy and pytorch 1.1.0. Additional packages (such as widgets and matplotlib) may be required to run any notebooks in the repository.

