Playing 2048 with AI - pt. 1
Game overview
2048 is played on a 4x4 board where tiles appear and are moved either horizontally or vertically, to combine adjacent tiles of equal value. When two tiles merge, they form a new tile that is worth 2x what each individual tiles were worth. With each combined tile pair, the game score increases by the value of the new tile.
For example, two adjacent tiles of value 4 could be combined to make a single tile of value 8, while the game score increases by 8 points. In action, it looks like:
Mechanically, all the user has to do is decide whether to swipe up, down, left, or right when given a board layout. The goal of the game is to reach the highest score, however it is also possible to “win” the game by achieving a tile of value 2048 (the game does not end at this point, it is just a relatively difficult milestone).
Implementation
I decided to implement the game myself, using only numpy, with the goal of connecting this module to a neural network, which will “learn” to play the game by repeatedly trying different strategies.
import numpy as np
Single game
Each game consists, at any given moment, of a board layout (the 4x4 tile values) and a score. As the user (or computer) makes decisions, both the layout and score are updated.
class GameLayout:
def __init__(self):
# initialize empty layout and zero points
self.layout = np.zeros((4,4), dtype=np.int)
self.score = 0
...
But wait a second! The user never sees an empty board. When a 2048 game is initialized, it always has 2 full tiles.
class GameLayout:
def __init__(self):
...
# each game starts with 2 full tiles
self.add_random()
self.add_random()
...
add_random
is a function that updates the layout, adding either a 2 or 4 tile to any empty tile in the 4x4.
class GameLayout:
def add_random(self):
# randomly choose any empty tile and fills it with a 2 or 4 tile (with 90%, 10% probabilities, respectively)
layout = self.layout.flatten()
options = (layout==0)
if sum(options)!=0:
layout[np.random.choice(range(16), p=(np.repeat(1,(16))*options)/sum(options))] = np.random.choice([2,4],
p=[.9, .1])
self.layout = layout.reshape((4,4))
Generally it is used to add randomness to the board after each move, but we’re also using it at the game’s initialization to give the user tiles to move at the start. So after initializing game = GameLayout()
, game.score
is zero, and game.layout
could look like:
[[0, 0, 0, 2],
[0, 0, 0, 0],
[0, 0, 0, 2],
[0, 0, 0, 0]]
Functionality
Next, we need to implement functionality for the user to input a move, by “swiping” on the screen in the app. In this computer-based version, I’ve chosen to allow text input, using the common w
, a
, s
, d
format to represent the directions (up, left, down, right, respectively).
class GameLayout:
def swipe(self, choice):
scores = np.zeros(4, dtype=np.int)
new_layout = np.zeros((4,4), dtype=np.int)
if choice=='w':
score, new_layout = self.swipe_up(scores, new_layout)
elif choice=='s':
score, new_layout = self.swipe_down(scores, new_layout)
elif choice=='d':
score, new_layout = self.swipe_right(scores, new_layout)
elif choice=='a':
score, new_layout = self.swipe_left(scores, new_layout)
else:
raise Exception("Invalid input to swipe.")
...
Each method (swipe_up
, swipe_down
, swipe_right
, and swipe_left
) are very similar, simply iterating through the rows/columns in a different order or direction to condense
the layout appropriately. Full code is available in the repo, and is omitted here for brevity.
Some error checking is necessary - the move is only “valid” if some tiles can be moved. If not, the user must choose another direction to swipe or, if all options are exhausted, the game is over! I’ve chosen to log failed moves in a set, and when the set has all 4 valid moves (w
,a
,s
, and d
) the game must end.
class GameLayout:
def swipe(self, choice):
...
if (new_layout!=self.layout).sum()>0: # some tiles were moved so the move is valid
self.failed_moves = set() # reset failed move counter
# update the game's score and layout
self.score += scores.sum()
self.num_moves += 1
self.log_data(choice) # log data w/ old layout
self.layout = new_layout # update to new layout
# include the random next tile
self.add_random()
else:
self.failed_moves.add(choice)
self.active = len(self.failed_moves)<4 # otherwise, all moves have been tried and game should end
# assertion error means game ends
assert self.active
# exception means move didn't change the layout, and another input is required
raise Exception('Not a valid move.')
So given our previous game.layout
of:
[[0, 0, 0, 2],
[0, 0, 0, 0],
[0, 0, 0, 2],
[0, 0, 0, 0]]
If we call game.swipe('s')
, we’re “swiping down”, and could find game.layout
has been updated to:
[[0, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 4]]
Note the random new tile that has been added!
Data
The most important part of this project, to me, is the data generated as a game is played, as we’ll need that data to learn from each game later on, when we’re training a neural network. You may have noticed the line with self.log_data(choice)
in swipe(...)
earlier - this happens after a valid move is performed, to log the layout, the valid move chosen, and the score after the move happens.
class GameLayout:
def log_data(self, move):
formatted_move = np.zeros(4)
formatted_move[['w','a','s','d'].index(move)] = 1
try:
self.layouts = np.concatenate((self.layouts, self.layout.reshape(1,-1)))
self.moves = np.concatenate((self.moves, formatted_move.reshape(1,-1)))
self.scores = np.append(self.scores, self.score)
except AttributeError:
self.layouts = self.layout.reshape(1,-1)
self.moves = formatted_move.reshape(1,-1)
self.scores = np.array(self.score)
layouts
is a16xn
matrix, where each row is a layout during the gamemoves
is a4xn
matrix, where each row is one-hot encoded thew
,a
,s
,d
choicescores
is a flat array of lengthn
Note we’ve performed a single move (“swiping down”) so far in our demonstration game, so n
has a value of 1. Checking back in on our example:
- Our current
game.layouts
looks like:array([[0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0]])
- Our current
game.moves
looks like:array([[0,1,0,0]])
- Our current
game.scores
looks like:array([4])
Playing the game
We have all the functionality we need! Out of personal pride I decided to also cobble together a decent user interface, which takes that numpy array (game.layout
) and formats it to look similar to the app interface.
An example layout would be displayed like this: I’m far from a front-end expert, having never taken a class that taught/used JavaScript or CSS, but anything’s possible with google and a little stubbornness!
The full game implementation is available in the GitHub repository, and can be played (basic UI and all), out of either a jupyter notebook locally, or via this binder:
If you haven’t heard of binder before, Binder allows you to open notebooks in an executable environment, without worrying about requirements or installing/downloading anything.
My one complaint about binder is that it can be pretty slow to spin up the environment - if you’re feeling impatient and just want to play any version of 2048 to see how it works, this site has a smooth implementation, complete with javascript transitions I haven’t bothered with.
Logging data
We can’t just play a single game and expect to learn anything - the learning comes from comparing different strategies and outcomes, so we need to run and log data from several games in a row.
Automated games
The obvious method we need is programatically running multiple games and logging that data. This is what we’ll need to let the neural network play the game. The GameDriver
class does just that.
class GameDriver():
def run_games(self, n, method=lambda layout:np.array([.25,.25,.25,.25])):
from game import GameLayout
for i in range(n):
game = GameLayout()
moves = ['w','a','s','d']
while game.active:
# repeatedly attempt to make a move
for move in self.weighted_shuffle(moves, method(game.layout)):
try:
game.swipe(move)
except:
# move didn't work, try next move
continue
# game is over
self.log_game(game)
Calling run_games(...)
, will progamatically play n
number of games, using the method
function to map a layout to a probability density array of size 4, where the elements represent the probability of choosing w
, a
, s
, or d
, respectively.
This function also logs all the same data as an individual game (layouts
, moves
, scores
) but also keeps track of the final_scores
across the games, and num_moves
across the games. Both of these should have length n
, unless run_games(...)
is called multiple times.
Manual games
Initially I thought the automated method was all I needed, but I later decided to include logging functionality for a game played manually, in case I wanted to teach the neural network to play like I do. In this case, we can call log_game(...)
to store the user-created game’s data in the same way that we’re creating and storing the automated games’ data.
Code
Complete code can be found in my 2048 git repository. This post details code contained in game.py
and helper.py
. Further explanation will follow in another post.
This section of the project relies solely on numpy. Additional packages (such as widgets and matplotlib) may be required to run any notebooks in the repository, and the deep learning will be built using pytorch.