Recent developments in artificial intelligence have attracted considerable attention in both academia and the popular press. For example, attention has been paid to recent successes in game playing [13, 19]. Machine learning researchers are increasingly turning to video games as benchmarking environments for testing AI algorithms. Here, we propose flipping this paradigm: using a technique known as reinforcement learning as a tool for testing the design and implementation of game mechanics.
Reinforcement learning is a field of AI research with a rich history, drawing on related fields such as optimal control, optimization, psychology, neuroscience and computer science [8, 9]. Reinforcement learning algorithms are flexible in that they can be applied to model and solve sequential decision-making problems in a multitude of settings, such as performing difficult aerobatic stunts with a helicopter [15], managing an investment portfolio [14], modeling river basin hydrology [11], playing Backgammon at a world-champion level [26], or playing Atari games better than a human [13]. The core assumption that reinforcement learning makes is that the problem can be described as a Markov Decision Process.
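For readers unfamiliar with the term, a Markov Decision Process is conventionally written as a tuple of states, actions, transition probabilities, a reward function and a discount factor. The notation below is the common textbook form, included here as an illustration rather than as notation used elsewhere in this article; the symbols are assumptions introduced for this sketch.

```latex
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr\left(S_{t+1} = s' \mid S_t = s,\, A_t = a\right)
\]
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(S_t, A_t) \right],
\qquad 0 \le \gamma < 1
\]
```

The first line states the Markov assumption (the next state depends only on the current state and action), and the second states the agent's goal: find a policy that maximizes expected discounted reward.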
The aim of reinforcement learning is for a computer algorithm (called an “agent”) to learn how to complete a given task. Unlike other machine learning techniques, the steps required to accomplish the task are not explicitly specified (a human expert might not even know how to write down the correct instructions). Instead, the agent uses a process of “free play” and exploration in a simulated environment to discover actions that lead to task completion. The key element enabling the success of this process is a “reinforcement function” (or reward function) specified by the researcher. This function dispenses virtual rewards or punishments to the agent based on its state and actions within the simulated environment. In this sense, the reward function guides the agent to learn to complete the task. For more technical details on the theory of reinforcement learning and the techniques used to solve these problems, we refer the reader to .
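The loop described above (explore, receive reward, adjust behavior) can be sketched with tabular Q-learning, one common reinforcement learning algorithm. This is a minimal illustration and not a method taken from the works cited here: the corridor environment, the reward values and all names (`reward`, `step`, `train`, `greedy_policy`) are hypothetical.

```python
import random

N_STATES = 6             # hypothetical corridor: cells 0..5, cell 5 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)       # move left or move right

def reward(state, action, next_state):
    """Designer-specified reward function: +1 at the goal, small cost per step."""
    return 1.0 if next_state == GOAL else -0.01

def step(state, action):
    """Simulated environment: returns (next_state, reward, episode_done)."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, reward(state, action, next_state), next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration ("free play")."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                         # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
            next_state, r, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward the observed reward plus discounted future value.
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state, steps = next_state, steps + 1
    return q

def greedy_policy(q):
    """Best learned action for each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

Note that the agent is never told to walk right; calling `greedy_policy(train())` shows that it learns to do so only because the reward function makes reaching the goal worthwhile.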
Reinforcement learning is a promising approach to many problems in robotics, artificial intelligence and other fields; however, the need to accurately specify the complete Markov Decision Process model can make these algorithms difficult to apply in real-world settings. For this and other reasons, researchers are increasingly using video games and simulated worlds for testing reinforcement learning methods. The motivation is that virtual benchmarks can act as proxies for real-world tasks, allowing a researcher to gauge the generalizability of an algorithm in a controlled setting. For example, Mnih et al. garnered widespread attention in 2015 by achieving superhuman performance on a range of Atari games using a reinforcement learning algorithm [13]. More recently, reinforcement learning has been used to create artificial intelligence that can play competitively with international Dota e-sports champions [19], and significant resources are being invested in the development of dedicated virtual benchmark environments for testing reinforcement learning agents [3, 17, 18, 20, 21, 24, 28].
While these efforts are important to the development of reproducible reinforcement learning research, an interesting by-product is that reinforcement learning may actually benefit the game development community. Within academia, reinforcement learning algorithms are notorious for a phenomenon known as “reward hacking”. In this situation, the reinforcement learning agent, following the specified reward function, discovers unintended consequences or behavior in the simulated world, often to the surprise and chagrin of the researchers. This results from a simple fact: it is surprisingly difficult for humans to specify what we want in a way that is both comprehensive and comprehensible to a machine. This issue, known more generally as the AI value-alignment problem, has serious implications for AI governance, interpretability and safety, and has led to entire sub-fields of research (e.g. inverse reinforcement learning [16], apprenticeship learning [1] and learning from demonstration [2]).
Reward hacking behavior may be beneficial to game design and testing inasmuch as modern game systems are often complex: there may be gameplay mechanics, scoring systems and rules that interact in subtle ways not realized even by the designers and implementers of these systems. These problems are exacerbated in the competitive, online contexts present in many of today’s video games. Additionally, the size of modern game code bases, the difficulty of software testing, and industry norms and practices such as “crunch time” culture, which encourage hastily written software, suggest that many games are published with unknown numbers of bugs, even if the implementation of the mechanics or scoring systems appears correct under most circumstances. We propose that intentional reward hacking via reinforcement learning agents might be useful as a design tool for detecting both broken game mechanics and broken implementations of game mechanics (i.e. software bugs).
Reinforcement learning agents have already demonstrated the ability to uncover both types of issues. One well-known example is a reinforcement learning agent trained to play the Coast Runners boat racing game [5]. Instead of learning to progress through the game by racing (as intended), the agent optimized for score by spinning in circles and colliding with objects. This highlighted the fact that the game’s score mechanic was “broken”: high scores could be achieved in a single level, even by failing to race (in a racing game) [6]. This example is now synonymous with the concept of reward hacking in the reinforcement learning literature, but it does not seem to have received much attention in the game design and testing communities.
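To make the idea of using a score-maximizing agent as a design check concrete, the sketch below builds a toy racing game with a respawning bonus pickup and compares the score-optimal play against the intended racing line. Everything here is hypothetical (the track, the score values, and names such as `score_optimal_rollout`), and it is not a model of the Coast Runners game. For determinism it uses exact finite-horizon dynamic programming as a stand-in for a trained reinforcement learning agent, which would only approximate the same score-maximizing behavior.

```python
from functools import lru_cache

FINISH = 4            # hypothetical track: positions 0..4, position 4 is the finish line
PICKUP_POS = 2        # bonus item that respawns every time it is collected
PICKUP_SCORE = 3
FINISH_SCORE = 10
TIME_LIMIT = 20       # steps allowed per race
ACTIONS = (+1, -1)    # drive forward or reverse

def step(pos, action):
    """Game rules: returns (next_pos, score_gained, race_finished)."""
    nxt = min(max(pos + action, 0), FINISH)
    if nxt == FINISH:
        return nxt, FINISH_SCORE, True
    return nxt, (PICKUP_SCORE if nxt == PICKUP_POS else 0), False

def value(pos, action, steps_left):
    """Score obtainable by taking `action` now and playing optimally afterwards."""
    nxt, gained, done = step(pos, action)
    return gained + (0 if done else best_score(nxt, steps_left - 1))

@lru_cache(maxsize=None)
def best_score(pos, steps_left):
    """Maximum score obtainable from `pos` with `steps_left` steps remaining."""
    if steps_left == 0:
        return 0
    return max(value(pos, a, steps_left) for a in ACTIONS)

def score_optimal_rollout(start=0):
    """Play the score-maximizing strategy; returns (score, finished, path)."""
    pos, total, finished, path = start, 0, False, [start]
    for steps_left in range(TIME_LIMIT, 0, -1):
        action = max(ACTIONS, key=lambda a: value(pos, a, steps_left))
        pos, gained, finished = step(pos, action)
        total += gained
        path.append(pos)
        if finished:
            break
    return total, finished, path

def intended_rollout(start=0):
    """The behavior the designer had in mind: always drive forward."""
    pos, total, finished = start, 0, False
    while not finished:
        pos, gained, finished = step(pos, +1)
        total += gained
    return total, finished
```

In this toy, `intended_rollout()` scores 13, while `score_optimal_rollout()` scores 37 by shuttling back and forth over the pickup and crossing the finish line only when time is about to run out. A gap between the two, together with reverse moves in the returned path, is the kind of signal a designer could use to flag a scoring mechanic for review.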
In another, more recent example, reinforcement learning researchers inadvertently discovered a bizarre and significant software bug in the QBert Atari game through reward hacking. To quote the authors: “…the agent discovers an in-game bug. First, it completes the first level and then starts to jump from platform to platform in what seems to be a random manner. For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points” [4]. This is remarkable in that this same benchmark had been play-tested for many hours by other AI researchers, and yet this bug apparently remained undiscovered until now, which suggests it was extremely difficult to reproduce. There are numerous other anecdotal examples of optimization-based AI algorithms discovering software bugs, often in the context of video games [12], but to our knowledge, this idea has received little consideration in the game design and testing literature.
Reinforcement learning is a powerful AI technique that can be applied to solve many kinds of problems. Researchers in this field are increasingly using virtual simulated worlds as a way to test and benchmark these algorithms, and we believe that this relationship may have mutual benefits for the game design industry as well. The reward hacking phenomenon described here could be utilized to test mechanics for individual, cooperative or competitive games, and will likely bring the most benefit in scenarios where complexity could hide logical or implementation issues in game mechanics. In modern video games that are played competitively online and have large codebases, downtime for patching broken game mechanics (i.e. ‘game balance’ patches) or implementations (‘software bug’ patches) can result in frustrated users and real financial impact for game companies. As in the broader software development industry, this creates strong demand for techniques to test code more efficiently. We believe that recent examples from the artificial intelligence literature demonstrate an untapped potential for reinforcement learning to address this need.
[1] Abbeel, P. and Ng, A.Y. 2004. Apprenticeship learning via inverse reinforcement learning. Twenty-first international conference on Machine learning - ICML ’04. (2004), 1. DOI:https://doi.org/10.1145/1015330.1015430.
[2] Argall, B.D. et al. 2009. A survey of robot learning from demonstration. Robotics and Autonomous Systems. 57, 5 (2009), 469–483. DOI:https://doi.org/10.1016/j.robot.2008.10.024.
[3] Beattie, C. et al. 2016. DeepMind Lab. (2016).
[4] Chrabaszcz, P. et al. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari.
[5] Coast Runners, a free online game on Kongregate: 2011. https://www.kongregate.com/games/longanimals/coast-runners. Accessed: 2018-04-06.
[6] Faulty Reward Functions in the Wild: 2016. https://blog.openai.com/faulty-reward-functions/. Accessed: 2018-04-06.
[7] Henderson, P. et al. 2018. Deep Reinforcement Learning that Matters. Thirty-Second AAAI Conference On Artificial Intelligence (AAAI) (2018).
[8] Kaelbling, L.P. et al. 1996. Reinforcement learning: A survey. Journal of artificial intelligence research. 4, (1996), 237–285.
[9] Kober, J. et al. 2013. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research. 32, 11 (Sep. 2013), 1238–1274. DOI:https://doi.org/10.1177/0278364913495721.
[10] Lake, B.M. et al. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences. 40, (2017). DOI:https://doi.org/10.1017/S0140525X16001837.
[11] Lee, J.-H. and Labadie, J.W. 2007. Stochastic optimization of multireservoir systems via reinforcement learning. Water Resources Research. 43, 11 (Nov. 2007). DOI:https://doi.org/10.1029/2006WR005627.
[12] Lehman, J. et al. 2018. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. (2018).
[13] Mnih, V. et al. 2015. Human-level control through deep reinforcement learning. Nature. 518, 7540 (Feb. 2015), 529–533. DOI:https://doi.org/10.1038/nature14236.
[14] Moody, J. et al. 1998. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting. 17, 5–6 (1998), 441–470.
[15] Ng, A.Y. et al. Autonomous inverted helicopter flight via reinforcement learning.
[16] Ng, A.Y. and Russell, S.J. 2000. Algorithms for inverse reinforcement learning. Proceedings of the 17th International Conference on Machine Learning (2000), 663–670.
[17] Nichol, A. et al. 2018. Gotta Learn Fast: A New Benchmark for Generalization in RL. (2018).
[18] NVIDIA Isaac: Virtual Simulator For Robots: 2018. https://www.nvidia.com/en-us/deep-learning-ai/industries/robotics/. Accessed: 2018-04-06.
[19] OpenAI at The International: 2017. https://openai.com/the-international/. Accessed: 2018-04-06.
[20] OpenAI Gym: 2016. https://gym.openai.com/. Accessed: 2018-04-06.
[21] OpenAI Universe: 2016. https://blog.openai.com/universe/. Accessed: 2018-04-06.
[22] Potanin, R. 2010. Forces in Play: The Business and Culture of Videogame Production. Proceedings of the 3rd International Conference on Fun and Games (New York, NY, USA, 2010), 135–143.
[23] Reinforcement Learning Lecture Series: 2015. http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html. Accessed: 2018-04-06.
[24] Shah, S. et al. 2017. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. Field and Service Robotics (2017).
[25] Sicart, M. 2008. Defining game mechanics. Game Studies. 8, 2 (2008), 1–14.
[26] Tesauro, G. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM. 38, 3 (1995), 58–68.
[27] The Economist Newspaper Ltd 2017. Shall We Play A Game? The Economist.
[28] Todorov, E. et al. 2012. Mujoco: A physics engine for model-based control. Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on (2012), 5026–5033.
Aaron Snoswell is a Ph.D. candidate at The University of Queensland’s School of Information Technology and Electrical Engineering Robotics Design Lab in Brisbane, Australia. His research focuses on the use of inverse reinforcement learning and related machine learning techniques for robotic control. With a background as a mechatronic research engineer, Aaron is passionate about technology and its potential to create global change for good, especially the ethical use and development of artificial intelligence techniques. He can be reached on Twitter @aaronsnoswell.
Centaine L. Snoswell is a Ph.D. candidate at The University of Queensland’s School of Pharmacy. Her research focus is on service evaluation in telehealth, specifically examining the economic impact when technology is used for new health interventions.