Deep Learning Asteroids AI

Explore the methodologies, strategies, and key learnings from my reinforcement learning training process.

Hyperparameter Tuning

After testing over 23 configurations, the best-performing values for the key parameters were identified; the sections below walk through each one.
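For reference, the sketch below consolidates those choices in a hypothetical trainer configuration. The gamma and beta values are the ones discussed in the following sections; the buffer size is a placeholder, since the write-up only specifies that it was significantly increased from the default.

    # Hypothetical consolidation of the tuned hyperparameters (sketch only).
    ppo_config = {
        "buffer_size": 40960,  # placeholder; the point is "much larger than the default"
        "gamma": 0.99,         # medium discount factor (see Gamma below)
        "beta": 1e-4,          # reduced from the 1e-3 default (see Beta below)
    }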

Performance Graphs

Buffer Size
The graph below shows the phase reached with the default buffer size (gray) versus a significantly increased one (purple). Performance noticeably degrades with the small buffer because the agent does not keep a large memory of experiences to draw from: whenever there is a dip in performance, it no longer remembers its prior good behavior and cannot correct itself, so performance cascades downward.

Performance with default vs. increased buffer size
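To make the buffer argument concrete, here is a minimal sketch (not the project's training code) of a bounded experience buffer. With a small capacity, older successful transitions are evicted quickly, so after a dip the updates are dominated by recent poor episodes; a larger capacity keeps the earlier good behavior available to learn from.

    from collections import deque

    # Sketch only: bounded experience storage. Capacities are hypothetical.
    small_buffer = deque(maxlen=1_000)    # "default"-sized: old experiences drop out fast
    large_buffer = deque(maxlen=50_000)   # "significantly increased": good runs stay in memory

    def store(buffer, transition):
        # transition = (state, action, reward, next_state); oldest entry is evicted when full
        buffer.append(transition)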

Gamma
Consider the graph below, where blue is 0.99 (medium), pink is 0.9 (low), and yellow is 0.999 (high). Intuitively, a low gamma prioritizes short-term rewards, which suits turbulent environments where rewards are not guaranteed. This environment is turbulent, but rewards are effectively guaranteed as long as the agent survives, so the turbulence argues for a low gamma while the guaranteed rewards argue for a high one. In practice, the medium gamma learned the quickest and kept showing signs of improvement, the low gamma underperformed, and the high gamma learned far too slowly to be practical.

Performance across low, medium, and high gamma values
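One way to see the trade-off is the effective planning horizon implied by each gamma: a reward k steps ahead is weighted by gamma**k, so roughly 1 / (1 - gamma) future steps meaningfully influence each decision. The snippet below is an illustrative sketch, not project code.

    # Effective planning horizon implied by each discount factor.
    for gamma in (0.9, 0.99, 0.999):
        horizon = 1.0 / (1.0 - gamma)
        print(f"gamma={gamma}: roughly a {horizon:.0f}-step horizon")
    # 0.9   -> ~10 steps   (short-term focus)
    # 0.99  -> ~100 steps
    # 0.999 -> ~1000 steps (credit spread so thinly that learning crawls)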

Beta
The beta value controls the balance between exploitation and exploration. It was decreased from the default 1e-3 to 1e-4. A lower beta places less weight on exploration, so the agent favors sticking to and exploiting known strategies rather than trying new ones. Since the core gameplay loop is the same regardless of the phase, strategies don't change much beyond "don't get hit", so a low beta value was ideal. In the graph below, the light blue line corresponds to 1e-3 and the green line to 1e-4, with the other parameters fixed at the values chosen above. The lower beta yields a better rate of learning.

Performance with beta 1e-3 vs. 1e-4
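Assuming beta weights an entropy (exploration) bonus in the policy loss, as it does in many PPO-style trainers, the sketch below shows the effect: a larger beta rewards a more random policy, while a smaller one lets the agent commit to strategies it already knows. This is an illustration of the idea, not the trainer's actual loss.

    import numpy as np

    # Sketch: entropy-regularized policy loss (assumed interpretation of beta).
    def total_loss(policy_loss, action_probs, beta):
        entropy = -np.sum(action_probs * np.log(action_probs + 1e-8))
        # With a large beta, high-entropy (more exploratory) policies are rewarded;
        # with a small beta, the policy loss dominates and known strategies win out.
        return policy_loss - beta * entropy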

Rotation Observation
To reduce the size of the state space, training was initially done without including the ship's rotation in the agent's observations. Once the important hyperparameters had been identified, rotation was added to the observations, significantly improving performance.

Performance after rotation observation
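As an illustration of what changed, the sketch below builds a toy observation vector with and without rotation; the field names are hypothetical, since the project's exact observation layout isn't listed here. Encoding the angle as sine and cosine avoids the discontinuity at the 0/360 boundary.

    import math
    from collections import namedtuple

    Ship = namedtuple("Ship", "x y vx vy rotation_degrees")  # hypothetical fields

    def build_observation(ship, include_rotation=True):
        obs = [ship.x, ship.y, ship.vx, ship.vy]
        if include_rotation:
            angle = math.radians(ship.rotation_degrees)
            obs += [math.sin(angle), math.cos(angle)]  # lets the policy relate actions to facing
        return obs

    print(build_observation(Ship(0.0, 0.0, 1.0, 0.0, 90.0)))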

Rewards and Termination Logic

The reward system simplifies training by avoiding penalties, relying instead on episode termination to correct undesired behaviors.
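A minimal sketch of that scheme follows; the reward value, method names, and events are hypothetical rather than the project's actual API. Rewards are only ever added, and getting hit simply ends the episode, which is itself the corrective signal.

    # Sketch of a penalty-free reward scheme with termination as the correction.
    class AgentStub:
        def __init__(self):
            self.episode_reward = 0.0
            self.done = False

        def add_reward(self, value):
            self.episode_reward += value

        def end_episode(self):
            self.done = True  # no negative reward; the episode just stops

    def on_asteroid_destroyed(agent):
        agent.add_reward(1.0)  # positive feedback only

    def on_ship_hit(agent):
        agent.end_episode()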

State Representation

The agent uses four key sensors to adapt dynamically:
