Implementing 2048 for reinforcement learning - pt. 1

A reinforcement learning approach to the game 2048, including manually coding the game and implementing a neural network to learn how to play successfully. The first of a 3-part series, covering the game itself and data logging.

Game overview

2048 is played on a 4x4 board where tiles appear and are moved either horizontally or vertically, to combine adjacent tiles of equal value. When two tiles merge, they form a new tile that is worth 2x what each individual tiles were worth. With each combined tile pair, the game score increases by the value of the new tile.

For example, two adjacent tiles of value 4 could be combined to make a single tile of value 8, while the game score increases by 8 points. In action, it looks like:

Mechanically, all the user has to do is decide whether to swipe up, down, left, or right when given a board layout. The goal of the game is to reach the highest score, however it is also possible to “win” the game by achieving a tile of value 2048 (the game does not end at this point, it is just a relatively difficult milestone).

Implementation

I decided to implement the game myself, using only numpy, with the goal of connecting this module to a neural network, which will “learn” to play the game by repeatedly trying different strategies.

import numpy as np

Single game

Each game consists, at any given moment, of a board layout (the 4x4 tile values) and a score. As the user (or computer) makes decisions, both the layout and score are updated.

class GameLayout:
    def __init__(self):
        # initialize empty layout and zero points
        self.layout = np.zeros((4,4), dtype=np.int)
        self.score = 0
	...

But wait a second! The user never sees an empty board. When a 2048 game is initialized, it always has 2 full tiles.

class GameLayout:
    def __init__(self):
	...
	# each game starts with 2 full tiles
	self.add_random()
	self.add_random()
	...

add_random is a function that updates the layout, adding either a 2 or 4 tile to any empty tile in the 4x4.

class GameLayout:
    def add_random(self):
        # randomly choose any empty tile and fills it with a 2 or 4 tile (with 90%, 10% probabilities, respectively)
        layout = self.layout.flatten()
        options = (layout==0)
        if sum(options)!=0:
            layout[np.random.choice(range(16), p=(np.repeat(1,(16))*options)/sum(options))] = np.random.choice([2,4],
                                                                                                              p=[.9, .1])
        self.layout = layout.reshape((4,4))

Generally it is used to add randomness to the board after each move, but we’re also using it at the game’s initialization to give the user tiles to move at the start. So after initializing game = GameLayout(), game.score is zero, and game.layout could look like:

[[0, 0, 0, 2],
 [0, 0, 0, 0],
 [0, 0, 0, 2],
 [0, 0, 0, 0]]

Functionality

Next, we need to implement functionality for the user to input a move, by “swiping” on the screen in the app. In this computer-based version, I’ve chosen to allow text input, using the common w, a, s, d format to represent the directions (up, left, down, right, respectively).

class GameLayout:
    def swipe(self, choice):
        scores = np.zeros(4, dtype=np.int)
        new_layout = np.zeros((4,4), dtype=np.int)

        if choice=='w':
            score, new_layout = self.swipe_up(scores, new_layout)
        elif choice=='s':
            score, new_layout = self.swipe_down(scores, new_layout)
        elif choice=='d':
            score, new_layout = self.swipe_right(scores, new_layout)
        elif choice=='a':
            score, new_layout = self.swipe_left(scores, new_layout)
        else:
            raise Exception("Invalid input to swipe.")
			
	...

Each method (swipe_up, swipe_down, swipe_right, and swipe_left) are very similar, simply iterating through the rows/columns in a different order or direction to condense the layout appropriately. Full code is available in the repo, and is omitted here for brevity.

Some error checking is necessary - the move is only “valid” if some tiles can be moved. If not, the user must choose another direction to swipe or, if all options are exhausted, the game is over! I’ve chosen to log failed moves in a set, and when the set has all 4 valid moves (w,a,s, and d) the game must end.

class GameLayout:
    def swipe(self, choice):
	...
	
        if (new_layout!=self.layout).sum()>0: # some tiles were moved so the move is valid
            
            self.failed_moves = set() # reset failed move counter
            
            # update the game's score and layout
            self.score += scores.sum()
            self.num_moves += 1
            
            self.log_data(choice) # log data w/ old layout
            self.layout = new_layout # update to new layout

            # include the random next tile
            self.add_random()
        else:
            self.failed_moves.add(choice)
            self.active = len(self.failed_moves)<4 # otherwise, all moves have been tried and game should end
            
            # assertion error means game ends
            assert self.active 

            # exception means move didn't change the layout, and another input is required
            raise Exception('Not a valid move.')

So given our previous game.layout of:

[[0, 0, 0, 2],
 [0, 0, 0, 0],
 [0, 0, 0, 2],
 [0, 0, 0, 0]]

If we call game.swipe('s'), we’re “swiping down”, and could find game.layout has been updated to:

[[0, 0, 0, 0],
 [0, 2, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 4]]

Note the random new tile that has been added!

Data

The most important part of this project, to me, is the data generated as a game is played, as we’ll need that data to learn from each game later on, when we’re training a neural network. You may have noticed the line with self.log_data(choice) in swipe(...) earlier - this happens after a valid move is performed, to log the layout, the valid move chosen, and the score after the move happens.

class GameLayout:
    def log_data(self, move):
        formatted_move = np.zeros(4)
        formatted_move[['w','a','s','d'].index(move)] = 1

        try:
            self.layouts = np.concatenate((self.layouts, self.layout.reshape(1,-1)))
            self.moves = np.concatenate((self.moves, formatted_move.reshape(1,-1)))
            self.scores = np.append(self.scores, self.score)
        except AttributeError:
            self.layouts = self.layout.reshape(1,-1)
            self.moves = formatted_move.reshape(1,-1)
            self.scores = np.array(self.score)
  • layouts is a 16xn matrix, where each row is a layout during the game
  • moves is a 4xn matrix, where each row is one-hot encoded the w, a, s, d choice
  • scores is a flat array of length n


Note we’ve performed a single move (“swiping down”) so far in our demonstration game, so n has a value of 1. Checking back in on our example:

  • Our current game.layouts looks like:
    array([[0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0]])
  • Our current game.moves looks like:
    array([[0,1,0,0]])
  • Our current game.scores looks like:
    array([4])

Playing the game

We have all the functionality we need! Out of personal pride I decided to also cobble together a decent user interface, which takes that numpy array (game.layout) and formats it to look similar to the app interface.

An example layout would be displayed like this: I’m far from a front-end expert, having never taken a class that taught/used JavaScript or CSS, but anything’s possible with google and a little stubbornness!

The full game implementation is available in the GitHub repository, and can be played (basic UI and all), out of either a jupyter notebook locally, or via this binder: Binder

If you haven’t heard of binder before, Binder allows you to open notebooks in an executable environment, without worrying about requirements or installing/downloading anything.

My one complaint about binder is that it can be pretty slow to spin up the environment - if you’re feeling impatient and just want to play any version of 2048 to see how it works, this site has a smooth implementation, complete with javascript transitions I haven’t bothered with.

Logging data

We can’t just play a single game and expect to learn anything - the learning comes from comparing different strategies and outcomes, so we need to run and log data from several games in a row.

Automated games

The obvious method we need is programatically running multiple games and logging that data. This is what we’ll need to let the neural network play the game. The GameDriver class does just that.

class GameDriver():        
    def run_games(self, n, method=lambda layout:np.array([.25,.25,.25,.25])):
        from game import GameLayout
        for i in range(n):
            game = GameLayout()
            moves = ['w','a','s','d']
            while game.active:
                # repeatedly attempt to make a move
                for move in self.weighted_shuffle(moves, method(game.layout)):
                    try:
                        game.swipe(move)
                    except:
                        # move didn't work, try next move
                        continue
            # game is over
            self.log_game(game)

Calling run_games(...), will progamatically play n number of games, using the method function to map a layout to a probability density array of size 4, where the elements represent the probability of choosing w, a, s, or d, respectively.

This function also logs all the same data as an individual game (layouts, moves, scores) but also keeps track of the final_scores across the games, and num_moves across the games. Both of these should have length n, unless run_games(...) is called multiple times.

Manual games

Initially I thought the automated method was all I needed, but I later decided to include logging functionality for a game played manually, in case I wanted to teach the neural network to play like I do. In this case, we can call log_game(...) to store the user-created game’s data in the same way that we’re creating and storing the automated games’ data.

Code

Complete code can be found in my 2048 git repository. This post details code contained in game.py and helper.py. Further explanation will follow in another post.

This section of the project relies solely on numpy. Additional packages (such as widgets and matploblib) may be required to run any notebooks in the repository, and the deep learning will be built using pytorch.


© 2018. All rights reserved.