The field of digital interactive storytelling focuses heavily on research into new storytelling systems, either in theory or in implementation. The goals of these systems tend to be similar: to create an immersive and engaging user experience through a story that either is directly influenced by user actions, or at least gives the user such an impression. Many different challenges have been identified in previous work in the field, along with suggested practices that would allow them to be solved, or at least minimized in their effect.
One of the central problems in interactive storytelling, manifesting especially in story-based systems, is the concept of a combinatorial explosion. Simply put, whenever the story can branch into two or more directions, the amount of authoring work required to create the dialogue, environments and characters involved increases multiplicatively. Thus, if such forks are contained within each other, the authoring work increases exponentially until very soon a traditional branching story will reach an impossible size.
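The growth described above can be made concrete with a small calculation. The following is a minimal sketch, assuming an idealized story in which every fork offers the same number of branches and no branches ever merge again; the function name and parameters are illustrative and do not come from any particular system.

```python
def branching_story_size(branches_per_fork: int, nested_forks: int) -> tuple[int, int]:
    """Return (distinct endings, total story nodes) for a full branching tree.

    Assumes every fork has the same number of branches and that
    branches never rejoin, which is the worst case for authoring work.
    """
    endings = branches_per_fork ** nested_forks
    # One node at the root, b nodes after the first fork, b^2 after the second, ...
    total_nodes = sum(branches_per_fork ** depth for depth in range(nested_forks + 1))
    return endings, total_nodes


if __name__ == "__main__":
    for forks in (3, 5, 10):
        endings, nodes = branching_story_size(2, forks)
        print(f"{forks} nested two-way forks: {endings} endings, {nodes} nodes to author")
```

Under these assumptions, ten nested two-way forks already require 2,047 separately authored story nodes for 1,024 endings.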
One of the potential solutions to such a problem would be to guide all users down a similar path, or at least one of a few possible paths, but this type of “funnelling” can lead to unsatisfied users if not handled properly. If the users are given decisions to make, they expect those decisions to matter as well. The solution proposed here is to predict the choices users will make in a given situation, then present them with the exact type of situation that will lead them down the path the author wants them to follow.
In this paper we explore several possibilities for predictors: measurable variables that could be evaluated by the system during the execution of an interactive story. Prediction through user morality produced the best results, and in this paper we summarize the testing arrangement, results, and conclusions on the morality predictor.
The study was performed using Regicide, an interactive storytelling game in the form of a strategy roleplaying game for Android devices. Regicide was built on the IDEA (Interactive Dynamic Event-based Agency) system. The basic action loop of the system is based on events, which are discrete story fragments with an event text, which describes the initial situation; branches, which are the actions that the user can choose their character to take; and result texts, which are determined based on the chosen branch and the current game state. Events were made to be dynamic: they have variable slots that allow them to be modified by the system in order to guide the story presented to the user.
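The event structure described above could be represented along the following lines. This is only an illustrative sketch: the class names, field names, the dictionary-based game state and the sample event are all assumptions made for the example, not the actual data model of the IDEA system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Branch:
    label: str                          # action the user can choose
    result_for: Callable[[dict], str]   # maps the game state to a result text


@dataclass
class Event:
    text_template: str                  # event text with {slot} placeholders
    branches: list[Branch]
    slots: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Fill the variable slots to produce the event text shown to the user."""
        return self.text_template.format(**self.slots)


def patrol_result(state: dict) -> str:
    # Hypothetical rule: the result text depends on the current game state.
    if state["troops"] >= 5:
        return "The patrol returns safely."
    return "Your soldiers are overwhelmed."


# Hypothetical event; the system could swap the "npc" slot to steer the story.
event = Event(
    text_template="{npc} asks you to lend soldiers for a border patrol.",
    branches=[
        Branch("Agree", patrol_result),
        Branch("Refuse", lambda state: "The request is withdrawn."),
    ],
    slots={"npc": "The Duke"},
)

if __name__ == "__main__":
    print(event.render())
    print(event.branches[0].result_for({"troops": 3}))
```

Keeping the slot values separate from the text template is what would let a managing component change who or what an event is about without rewriting the event itself.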
Regicide is a simple, text-based game. Users make a series of decisions in the character of a medieval noble in a fantasy world, with the ultimate goal of deposing an increasingly unstable monarch against whom the player character also has a personal grudge. This means that users mainly act in a reactive capacity, not driving the action themselves. This limitation was a conscious choice in this particular implementation, not enforced by the system itself. In addition to the player character and his nemesis, the game contained five other major non-player characters (NPCs): four other influential nobles or other figures of importance in the kingdom, and a neutral advisor figure to help users understand the effects of their choices. Events, and the values of the variables within them, were chosen by a Drama Manager system, with the aim of creating a dramatically satisfying experience.
The game state was primarily defined by four resource variables representing the resources the player character has at their disposal, and four relationship variables representing the attitudes of the NPCs towards the player character. In addition, event results could trigger the activation or deactivation of tags, which were used to insert continuity into the story by keeping track of both minor and major occurrences that the user had played a part in. After 13 events, the user was presented with the endgame event in which they had to choose a method for their attempt at seizing the throne. Depending on their resources, relationships, and previous actions, the user would then either succeed or fail in their attempt. If the attempt was successful, the game would further evaluate the game state and generate endgame results, which provided further closure regarding the main storyline and parallel ones.
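A game state of this shape can be sketched as a small data structure. The variable names below (such as "gold" or "duke") and the starting values are hypothetical placeholders, since the text does not name the individual resources or characters; only the counts of four resources, four relationships and 13 events before the endgame are taken from the description.

```python
from dataclasses import dataclass, field

EVENTS_BEFORE_ENDGAME = 13  # stated in the text


@dataclass
class GameState:
    resources: dict[str, int]        # four resource variables (names hypothetical)
    relationships: dict[str, int]    # four NPC attitude variables (names hypothetical)
    tags: set[str] = field(default_factory=set)
    events_played: int = 0

    def apply_result(self, resources=None, relationships=None,
                     activate=(), deactivate=()) -> None:
        """Apply the outcome of one event to the state."""
        for name, delta in (resources or {}).items():
            self.resources[name] += delta
        for name, delta in (relationships or {}).items():
            self.relationships[name] += delta
        self.tags |= set(activate)       # tags record what has happened so far
        self.tags -= set(deactivate)
        self.events_played += 1

    def endgame_due(self) -> bool:
        return self.events_played >= EVENTS_BEFORE_ENDGAME


state = GameState(
    resources={"gold": 5, "troops": 5, "influence": 5, "intel": 5},
    relationships={"duke": 0, "bishop": 0, "general": 0, "merchant": 0},
)
state.apply_result(resources={"gold": -2}, relationships={"duke": 1},
                   activate={"helped_duke"})
```

Later events can then test for a tag such as "helped_duke" to refer back to earlier occurrences, which is the continuity role the tags play in the description above.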
Morality Predictor Hypothesis
The basic hypothesis was that some users may place a great deal of importance on matters of morality when navigating through an interactive story. If this tendency can be reliably recognised, it can be used as part of a tool set of predictive algorithms. For example, a longer game could use a short introduction sequence with little to no relevance to the main story that gives users a few choices of immoral actions that grant personal gain against moral actions with no apparent reward or recognition. This data could then be used to modify later events with the intention of guiding the user down a specific path by making the other options less attractive to them from a moral standpoint.
In Regicide, the user's tendencies with regard to morality in choices were tracked exactly like resources, except that the tracked value was not in any way visible to the user. Every choice the player made had a moral value ranging from -3 to 3, and deviation from the initial value was used to indicate a preference towards either immoral or moral actions. The Drama Manager used this value to adjust the estimated probability of the user selecting a given branch during the event selection process. This probability adjustment is the specific value that was collected for actual user choices in the testing process.
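As a rough illustration of this hidden variable, the sketch below accumulates the moral values of the chosen branches and reports the deviation from the starting value. The cumulative-sum update and the class and attribute names are assumptions made for the example; the text only states that the value was tracked like a resource and that choices ranged from -3 to 3.

```python
MIN_MORAL, MAX_MORAL = -3, 3  # range of moral values stated in the text


class MoralityTracker:
    """Hidden running tally of the moral values of the user's choices."""

    def __init__(self, initial: int = 0) -> None:
        self.initial = initial
        self.value = initial
        self.choices_recorded = 0

    def record(self, moral_value: int) -> None:
        if not MIN_MORAL <= moral_value <= MAX_MORAL:
            raise ValueError(f"moral value must be in [{MIN_MORAL}, {MAX_MORAL}]")
        self.value += moral_value        # assumed update rule: simple accumulation
        self.choices_recorded += 1

    @property
    def deviation(self) -> int:
        """Positive: preference for moral actions; negative: for immoral ones."""
        return self.value - self.initial


tracker = MoralityTracker()
for chosen in (2, 3, -1, 2):
    tracker.record(chosen)
```

After these four hypothetical choices the deviation is 6, which under this sketch would indicate a user leaning towards moral actions.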
The concept of morality within the scope of this study was heavily based on western norms and traditions. The testing process was also targeted at people with backgrounds in societies where these norms are observed. Adapting an interactive storytelling system using a morality-based prediction system to cultures with different moral values would present additional challenges that are not discussed here.
Data Introduction and Analysis
All data used in this study was collected through Regicide. The game was distributed through the Google Play store as a free application. The following graphs contain demographic information on the study participants:
Although there are no surprises in the demographic data, it is important to keep in mind that the test subjects were, on average, experienced roleplayers, and that the results may not fully generalize to a more diverse audience.
The data that was collected for the evaluation of the morality predictor came in the form of prediction values and the observed behavior of the user. The system calculated prediction values for each possible branch of any event that was presented to the user. These values were generated through a mathematical function using observed behavior in the previous events and the moral values assigned to each branch, and normalized so that they could be directly compared between different users and events. The function used to generate prediction values used a weighting system to ensure that predictions would not be too heavily skewed until a clear trend of moral or immoral user behavior had been established.
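The exact function is not given here, so the following is only one possible sketch of a prediction function with the properties described: it uses the moral values of earlier choices and of each branch, it is normalized to a fixed range, and a weighting term keeps early predictions cautious. The formula, the `damping` parameter and the function name are all assumptions made for illustration.

```python
MAX_MORAL = 3  # moral values range from -3 to 3


def prediction_values(past_moral_values: list[int],
                      branch_moral_values: list[int],
                      damping: float = 4.0) -> list[float]:
    """Return one prediction value in [-1, 1] per branch (illustrative only).

    Positive values mean the branch is predicted to be more likely chosen,
    negative values less likely.
    """
    n = len(past_moral_values)
    if n == 0:
        return [0.0 for _ in branch_moral_values]  # no data, no prediction
    # Average past behavior, normalized to [-1, 1].
    tendency = sum(past_moral_values) / (n * MAX_MORAL)
    # Weight that grows towards 1 as more choices are observed.
    confidence = n / (n + damping)
    return [confidence * tendency * (m / MAX_MORAL) for m in branch_moral_values]


if __name__ == "__main__":
    branches = [3, 0, -3]  # a moral, a neutral and an immoral option
    print(prediction_values([3], branches))            # early: weak predictions
    print(prediction_values([3] * 12, branches))       # late: strong predictions
```

In this sketch the value recorded for evaluation would be the entry belonging to the branch the user actually chose, so a confident prediction that turns out wrong contributes a strongly negative value.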
There were two important definitions to make when planning the standards of evaluation for the collected predictor data. The first was the choice of how many events to take into account in any given playthrough. The predictor needs some base data to work with before predictions actually have any basis or weight, but given the short nature of one game of Regicide, cutting out a large part of it presented its own issues. A compromise approach using two separate data sets was chosen: one with all events and one with the last four events of the game. The second issue that needed defining was what constituted a successful prediction value over the average of several events. The value that was settled on for a very strong prediction value was an average of one out of three events with clearly correct predictions. The reason for the relatively low expected value was that not every event offered strongly morally charged options, but it still meant that even a single strongly wrong prediction would have a significant effect on the overall average. The other thresholds were set at 8/12 of this value for strong, 5/12 for medium, and 2/12 for weak.
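The thresholds above translate directly into a small classification routine. The sketch below assumes that each event contributes a value on a scale where 1.0 means a clearly correct prediction and negative values mean a wrong one, and that thresholds are inclusive; both points, along with the function names, are assumptions rather than details given in the text.

```python
VERY_STRONG = 1 / 3  # one out of three events with clearly correct predictions

# Ordered from strictest to most lenient; fractions are those given in the text.
THRESHOLDS = [
    ("very strong", VERY_STRONG),
    ("strong", VERY_STRONG * 8 / 12),
    ("medium", VERY_STRONG * 5 / 12),
    ("weak", VERY_STRONG * 2 / 12),
]


def classify(values: list[float]) -> str:
    """Classify the average prediction value of a set of events."""
    average = sum(values) / len(values)
    for label, threshold in THRESHOLDS:
        if average >= threshold:
            return label
    return "none"


def evaluate_playthrough(values: list[float]) -> dict[str, str]:
    """Evaluate both data sets: all events and the last four events."""
    return {
        "all_events": classify(values),
        "last_four_events": classify(values[-4:]),
    }


if __name__ == "__main__":
    playthrough = [0.0] * 9 + [1.0, 0.0, 1.0, 0.0]  # hypothetical 13-event game
    print(evaluate_playthrough(playthrough))
```

Expressed as averages, the four thresholds come to roughly 0.333, 0.222, 0.139 and 0.056.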
As we can see, the very strong category is overrepresented in both graphs, especially in the 'Last Four Events' data. The average predictor values in some data sets clearly exceeded the threshold, but this is not necessarily a sign of a badly set threshold. Rather, it reflected a trend in user behavior, in which a fairly large subgroup of users literally never chose immoral options. The predictor gained in confidence throughout the run, thus making extremely confident (and correct) assumptions towards the end of the game.
In the 'All Events' data, the morality predictor found some level of success in 41% of all cases. This number improved to 45% in the 'Last Four Events' data. More detailed inspection showed that the data sets that found success were mostly the same for each set. However, the four weak results in the 'Last Four Events' data were all games that did not achieve notable values across all events, and some sets that found considerable success across all events fell off completely when only looking at the last events where the predictor should be strongest. This led to the conclusion that in terms of pure prediction power, the predictor was effective in about 36% of all cases in the study.
Further analysis also revealed a trend among morally motivated users regarding the content of choices. Strong discrepancies between the average choices made by these users and the overall average were identified on an event-by-event basis. The content of those events where such a discrepancy was found was then studied, and it was found that morally motivated users were especially likely to make different choices from the majority of users in events which had them interact with NPCs. Of special interest was the fact that despite these users' general tendency to choose “the high road”, that is, options with positive moral value, they were in fact more willing to take immoral actions when doing so would hurt an NPC that had also acted immorally. Thus, it would appear that not only do these users consider their own actions through the lens of morality, they are also more likely to evaluate other characters by the morality of their actions.
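An event-by-event comparison of this kind can be sketched as follows. The data layout, the `min_gap` cut-off for what counts as a strong discrepancy, and the way the morally motivated subgroup is supplied are all assumptions for the example; the study's actual criteria are not specified here.

```python
from statistics import mean


def discrepant_events(choices_by_user: dict[str, dict[str, int]],
                      moral_users: set[str],
                      min_gap: float = 1.0) -> dict[str, float]:
    """Find events where the subgroup's average choice differs from the overall one.

    choices_by_user maps user id -> {event id: moral value of the chosen branch}.
    Returns {event id: subgroup mean minus overall mean} for flagged events.
    """
    event_ids = sorted({e for choices in choices_by_user.values() for e in choices})
    flagged = {}
    for event_id in event_ids:
        overall = [c[event_id] for c in choices_by_user.values() if event_id in c]
        subgroup = [c[event_id] for user, c in choices_by_user.items()
                    if user in moral_users and event_id in c]
        if not subgroup:
            continue  # no subgroup member saw this event
        gap = mean(subgroup) - mean(overall)
        if abs(gap) >= min_gap:
            flagged[event_id] = gap
    return flagged


# Hypothetical data: users "a" and "b" form the morally motivated subgroup.
sample = {
    "a": {"e1": 3, "e2": 1, "e3": -2},
    "b": {"e1": 3, "e2": 1, "e3": -2},
    "c": {"e1": -3, "e2": 1, "e3": 1},
    "d": {"e1": -3, "e2": 1, "e3": 1},
}
flagged = discrepant_events(sample, moral_users={"a", "b"})
```

With this made-up data, "e1" is flagged with a positive gap and "e3" with a negative one; a negative gap marks an event where the subgroup chose less morally than the average, the pattern the text reports for events involving NPCs who had themselves acted immorally.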
Conclusion and Discussion
Although the morality predictor reached useful prediction rates in only just over one third of all evaluated cases, it can definitely be considered to have been successful in this particular study. A perfect rate of prediction is not a realistic goal, and in a storytelling system designed around predicting user choice, multiple different predictors would have to work in conjunction with one another. The quality of the predictor is also limited by the lack of a universal standard by which to judge the morality of a given choice. Apart from this one drawback, the principle seems sound. Many users reported feeling morally challenged in the survey, whether they were projecting their own morality into the choices or assuming the morality of the player character.
Some decisions made during the design and development of Regicide were evaluated with regard to the performance of the chosen predictors after the data analysis. In the case of the morality predictor, the largest positive influence was the event design on a subset of events that put users in a position where they could choose between selfish and altruistic actions with little apparent risk of being punished for choosing the selfish path. The presence of the neutral advisor character was also extremely helpful, as a way to raise user awareness of the moral implications of given choices without actually having any influence on the outcome of the choice. Another successful category of events in terms of morality prediction was events where the user was made aware of the immoral actions of another character and given options on how to react to the situation.
The short duration and purely text-based interface of Regicide stand out as the biggest limitations of the platform in terms of providing reliable data regarding morality. Both have the same effect: they lower the immersion of the user, making it more difficult for them to experience the events of the game as actual events that pertain to actual people, and not just as text and numbers on a screen. Any predictor based on user reactions to events in the game can be expected to function better when the user is exposed to those events in a clear and evocative manner. The short duration also makes it difficult for users to form emotional ties to the characters presented in the game.
In order to further evaluate the viability of predicting user choice through morality, it could be useful to create a more controlled testing environment. This would allow a researcher to isolate only the morality of choice without interference from other factors that may have affected user choices in Regicide. For example, a short interactive story without any form of reward system or goal would eliminate the possibility of users making their choice in order to achieve the greatest amount of success in the game. This would also require care in creating the story and restricting other possible influences like character design. This type of testing arrangement would only give data on the accuracy of prediction; it would not be helpful in assessing whether morality is usable as a predictor in a commercial project, because full isolation of motivation cannot possibly be achieved in any more complex implementation.
Another angle of study would be to perform more localized research, with the intention of identifying differences in the moral codes of different cultures and societies and applying these trends to a prediction engine. This kind of research would probably greatly benefit from cooperation with sociologists or psychologists, who might also be interested in seeing the results. If a morality predictor were divided into subcategories of moral norms, its output could be treated differently depending on the cultural standards of the area where a particular copy of the game was activated.
 Itkonen, Eero, 2015. Influencing Perceived Agency: A Study into User Experiences in Digital Interactive Storytelling. Master's Thesis, University of Turku, Finland.
 Kyrki, Juhani, 2015. Metrics for Predicting User Behavior and Experience in an Interactive Storytelling System. Master's Thesis, University of Turku, Finland.