Mario AI Competition 2009 with Genetic Algorithm

This article is the 19th entry in the CAMPHOR - Advent Calendar 2016.

The Mario AI Competition 2009 was a competition organized by researchers Sergey Karakovskiy and Julian Togelius, where participants created AI to automatically control a game modeled after Super Mario Bros and compete for scores. This is not Super Mario Run.

While reading the code and experimenting, I implemented an agent and managed to clear a simple stage, so I will introduce what I did.

Preparation
- Downloaded the source code from the official site as a zip file and set up the development environment (this was the most challenging part).
Implementation of the Learning Agent
- Created an agent that learns to operate using a genetic algorithm.
Results
- After training with 10 individuals per generation, individuals that could reach the goal started to appear around the 50th generation.

Generation 1

Generation 40

Generation 50

The game program for Super Mario Bros and the sample code for the agent controlling Mario were both written in Java. However, the game program had an option to run in server mode, allowing TCP communication to control Mario, which meant the agent could be created in any language.

There was a sample client implemented in Python within the project, so I used that as a reference to implement the learning agent in Python. The sample was in Python 2, so I initially converted it to Python 3 and then proceeded with the implementation.

Even though I executed the commands as written in the official documentation, it didn't work, so I explored ways to execute it and documented them in README.md.

Mario's Action Selection#

The input and output of the agent program are as follows.

Input (Surrounding State of Mario)#

I extracted the following information from the environment information around Mario to use as input.

Whether the 7 squares around Mario are obstacles (7 bits)
Whether Mario can jump (1 bit)
Whether Mario is grounded (1 bit)

Output (Mario's Next Action)#

Whether to press ↓←→A (jump) B (fire/dash) (5 bits)

Genetic Algorithm#

Individual#

I represented the combination of input and output as a single gene locus, considering the information of all input-output patterns as a gene. In the program, it is simply an array of ints.

Selection, Crossover, and Mutation#

The trials and learning were conducted through the following process. As a result, I was able to create individuals that cleared the stage around the 50th generation.

Generate 10 initial individuals with random genes to form the first generation.
Let the first generation play the game and obtain their scores (the score is higher the longer Mario travels).
Keep the individual with the highest score to the next generation (selection).
Select 2 individuals to be parents (individuals with higher scores are more likely to be chosen) and perform two-point crossover to generate 2 offspring. Repeat this to generate a total of 9 offspring, which will be combined with the 1 individual from step 3 to form the next generation.
Mutate 30% of the offspring. Mutation involves randomly selecting several gene loci and swapping their values.
Let the generated generation play the game. -> Return to step 3.

The source code is placed here.

There are various files, but the parts I implemented mainly include the following. I apologize to those who want to read it as the implementation is not clean 🙇.

Gene (Individual) Model: https://github.com/morishin/marioai-2009/blob/yatteiku/src/python/competition/ga/individual.py
Selection, Crossover, and Mutation Operations: https://github.com/morishin/marioai-2009/blob/yatteiku/src/python/competition/ga/controller.py
Agent that operates Mario using gene information: https://github.com/morishin/marioai-2009/blob/yatteiku/src/python/competition/agents/myagent.py
Main task that actually executes: https://github.com/morishin/marioai-2009/blob/yatteiku/src/python/competition/learn.py

I thought that genetic algorithms, with their crossover and mutation processes, seemed like a somewhat arbitrary method, and I wasn't sure if they had any real meaning. However, it is certainly impressive how they can approach solutions relatively quickly through repetition. It seems they have also been used in the design of shapes for Shinkansen and satellite antennas. Additionally, it was quite challenging to understand the code written by researchers back in 2009.