Fourth Year Project: AlphaZero (Master's Research Project)

October 2018 - June 2019

This was my fourth-year master's project. The goal was to implement a version of DeepMind's AlphaZero algorithm, which uses reinforcement learning to create an AI player for a given board game.

Structure:

The structure of AlphaZero is fairly complex and is explained in more detail in the PDF below. The basics are as follows:

A two-headed neural network that takes a board state as input and produces two outputs: an estimate of whether the current player will win or lose, and a score for every move available to the current player, indicating how good that move is
A Monte Carlo Tree Search implementation (I created two versions: one that does not use the neural network, and one that does)
A data-generation loop that produces labelled data to train the network (a player guided by the neural network plays games against itself; this is described in more detail below)
A typical supervised learning training loop using the data generated in the previous step
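
To make the data-generation step more concrete, here is a minimal sketch of how one finished self-play game could be turned into labelled training examples. It follows the general AlphaZero recipe (policy target from search visit counts, value target from the final result) and is not taken from the project code: the names Step, policy_target and label_game, and the tuple board encoding, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    """One position from a self-play game (hypothetical record format)."""
    state: Tuple[int, ...]       # board encoding at this position
    player: int                  # +1 or -1: the player to move
    visit_counts: Sequence[int]  # tree-search visit count for each move


def policy_target(visit_counts: Sequence[int]) -> List[float]:
    """Normalise visit counts into a probability distribution over moves."""
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("search produced no visits")
    return [count / total for count in visit_counts]


def label_game(steps: Sequence[Step], winner: int):
    """Turn one finished game into (state, policy, value) examples.

    winner is +1 or -1 for the winning player, or 0 for a draw.
    Each value target is from the point of view of the player to move.
    """
    return [
        (s.state, policy_target(s.visit_counts), float(winner * s.player))
        for s in steps
    ]


# Two made-up positions from a game that player +1 went on to win.
game = [
    Step(state=(0, 0, 0), player=1, visit_counts=[6, 3, 1]),
    Step(state=(1, 0, 0), player=-1, visit_counts=[0, 2, 2]),
]
examples = label_game(game, winner=1)
```

Each example pairs a board state with the two targets the network's two heads are trained towards, so the supervised training loop can treat the result as an ordinary labelled dataset.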

An object-oriented approach was taken, with an emphasis on modular code, so that the neural network, the board game and its rules, and other parameters could easily be changed; this helped to maximise efficiency when training and testing. PyTorch was used for the machine learning parts of the project.
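
As a rough illustration of this kind of modularity, the sketch below defines an abstract game interface and one tiny game that implements it. The interface, the method names and the choice of Nim are all assumptions made for this example, not the project's actual classes. The point is that code such as random_playout depends only on the interface, so a different board game can be swapped in without changing it.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional


class Game(ABC):
    """Hypothetical interface that search and training code would depend on."""

    @abstractmethod
    def legal_moves(self) -> List[int]:
        """Moves available to the player whose turn it is."""

    @abstractmethod
    def play(self, move: int) -> "Game":
        """Return the new game state after the current player makes a move."""

    @abstractmethod
    def winner(self) -> Optional[int]:
        """Return +1 or -1 once the game is over, otherwise None."""


class Nim(Game):
    """Take 1 or 2 stones per turn; whoever takes the last stone wins."""

    def __init__(self, stones: int = 5, player: int = 1):
        self.stones = stones
        self.player = player  # +1 or -1: the player to move

    def legal_moves(self) -> List[int]:
        return [n for n in (1, 2) if n <= self.stones]

    def play(self, move: int) -> "Nim":
        if move not in self.legal_moves():
            raise ValueError(f"illegal move: {move}")
        return Nim(self.stones - move, -self.player)

    def winner(self) -> Optional[int]:
        # When no stones remain, the previous mover took the last one.
        return -self.player if self.stones == 0 else None


def random_playout(game: Game, rng: random.Random) -> int:
    """Play uniformly random moves until the game ends; works for any Game."""
    while game.winner() is None:
        game = game.play(rng.choice(game.legal_moves()))
    return game.winner()


result = random_playout(Nim(stones=5), random.Random(0))
```

A tree search or a self-play loop written against Game in the same way would not need to know which board game it is playing.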

What I Learnt:

How to write modular, object-oriented code in Python
How to use PyTorch to create complex machine learning algorithms
How to research a complex algorithm and implement it
How to structure a long-term (one-year) project while being flexible with plans and achieving the desired output
How to write an academic research paper

Final Reports

As part of the project, I had to produce a final project paper, a poster, and a presentation to describe the work I had done.

Final Report

Final Poster

Final Presentation (Full)

Final Presentation (Simplified)