Fourth Year Project: AlphaZero (Master's Research Project)
October 2018 - June 2019
PYTHON
PYTORCH
This was my fourth-year master's project. The goal was to implement a version of DeepMind's AlphaZero algorithm, which uses reinforcement learning to create an AI player for a given board game.
Structure:
The structure of AlphaZero is quite complex and is explained in more detail in the PDF below. The basics are as follows:
- A two-headed neural network that takes a board state as input and produces two outputs: an estimate of whether the current player is going to win or lose, and a set of move scores that rate how good each possible move is for the current player.
- A Monte Carlo Tree Search implementation (I created two versions: one that does not use the neural network, and one that does)
- A data-generation loop that produces labelled data to train the network (the data comes from a player that uses the neural network playing games against itself, and is described in more detail below)
- A typical supervised learning training loop using the data generated in the previous step
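As a rough illustration of how the self-play games could become labelled data for the supervised step, here is a minimal sketch. It assumes the labelling scheme from the published AlphaZero description: each position is paired with the normalised search visit counts as the move-score target and the final game result, seen from the player to move, as the win/lose target. The names `TrainingExample`, `visits_to_policy` and `label_game`, and the board encoding, are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TrainingExample:
    state: Tuple[int, ...]      # flattened board (hypothetical encoding)
    policy_target: List[float]  # normalised search visit counts, one per move
    value_target: float         # +1 win, -1 loss, 0 draw, for the player to move


def visits_to_policy(visit_counts: Sequence[int]) -> List[float]:
    """Turn raw tree-search visit counts into a probability distribution."""
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("the search must visit at least one move")
    return [count / total for count in visit_counts]


def label_game(
    states: Sequence[Tuple[int, ...]],
    visit_counts: Sequence[Sequence[int]],
    players_to_move: Sequence[int],
    winner: int,
) -> List[TrainingExample]:
    """Label every position of one finished self-play game.

    Assumption: players are encoded as +1 and -1, and winner is 0 for a draw.
    """
    examples = []
    for state, counts, player in zip(states, visit_counts, players_to_move):
        if winner == 0:
            value = 0.0
        else:
            # The outcome is recorded from the perspective of the player to move.
            value = 1.0 if player == winner else -1.0
        examples.append(TrainingExample(state, visits_to_policy(counts), value))
    return examples


# Usage: a toy two-move game won by player +1.
game = label_game(
    states=[(0, 0, 0), (1, 0, 0)],
    visit_counts=[[3, 1], [2, 2]],
    players_to_move=[1, -1],
    winner=1,
)
```

Each `TrainingExample` then serves as one input/target pair for the training loop: the network's move-score head is fitted to `policy_target` and its win/lose head to `value_target`.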
What I Learnt:
- How to write modular, object-oriented code in Python
- How to use PyTorch to create complex machine learning algorithms
- How to research a complex algorithm and implement it
- How to structure a long-term (one-year) project while being flexible with plans and achieving the desired output
- How to write an academic research paper
Final Reports
As part of the project, I had to produce a final project paper, a poster, and a presentation to describe the work I had done.
Final Report
Final Poster
Final Presentation (Full)
Final Presentation (Simplified)