Crowdsourcing the Aesthetics of Platform Games
Crowdsourcing the Aesthetics of Platform Games
Noor Shaker, Georgios N. Yannakakis, Member, IEEE, and Julian Togelius, Member, IEEE
Abstract—What are the aesthetics of platform games and what
having an algorithm that could observe a human playing a game
makes a platform level engaging, challenging, and/or frustrating?
and accurately judge what the human is experiencing as he/she
We attempt to answer such questions through mining a large set of crowdsourced gameplay data of a clone of the classic platform
is playing the game would also be useful, as this could allow
game Super Mario Bros (SMB). The data consist of 40 short game
us to adapt the game to the player, and also help us understand
levels that differ along six key level design parameters. Collec-
how human affect is expressed in behavior.
tively, these levels are played 1560 times over the Internet, and the
A number of researchers have attacked this problem from
perceived experience is annotated by experiment participants via
a top–down perspective, that is, by creating theories of the
self-reported ranking (pairwise preferences). Given the wealth of this crowdsourced data, as all details about players’ in-game be-
aesthetics of game content and gameplay based on introspec-
havior are logged, the problem becomes one of extracting mean-
tion or qualitative research methods. For example, Malone [1]
ingful numerical features at the appropriate level of abstraction for
proposed that computer games are “fun” when they have the
the construction of generic computational models of player experi-
right amount of challenge and evoke curiosity and fantasy, and
ence and, thereby, game aesthetics. We explore dissimilar types of
Koster [2] proposed that fun in games is connected to the player
features, including direct measurements of event and item frequen- cies, and features constructed through frequent sequence mining,
learning to play the game. Magerko et al.’s [3] research within
and go through an in-depth analysis of the interrelationship be-
learning games proposed an adaptation framework based on a
tween level content, players’ behavioral patterns, and reported ex-
predefined set of learning styles.
perience. Furthermore, the fusion of the extracted features allows
Such theories are, in general, too high-level and vague about
us to predict reported player experience with a high accuracy, even
key concepts to be implemented in algorithms, though some
from short game segments. In addition to advancing our insight on the factors that contribute to platform game aesthetics, the results
attempts have been made to create computational models based
are useful for the personalization of game experience via automatic
on them [4], [5].
game adaptation.
Other authors have tried to identify more speci fic and
concrete elements of game design and game content that
Index Terms— Computational aesthetics, experience-driven
procedural content generation, player experience modeling.
contribute to player experience, so-called “patterns in-game design”; Björk et al. [6] are in an ambitious ongoing effort cataloguing hundreds of such patterns, whereas other authors
NTRODUCTION I. I
discuss patterns in content design for individual genres, such as first-person shooters. For example, Hullett and Whitehead
[7] analyze some key patterns in first-person shooter games, N algorithm that could automatically judge how en- such as sniper positions and open arenas and discuss how they
contribute to player entertainment. In [8] and [9], a system is—that is, a computational model of the aesthetics of game that visualizes players’ behaviors to allow analysts to easily
gaging or interesting a particular piece of game content
content—would be useful for several reasons. One strong identify patterns and design issues is presented. Jennings-Teats reason is that such a method would help us to automatically et al. [10] showed how player experience can be altered by or semiautomatically generate good content; another is that presenting sequences of level segments rated by their dif ficulty analysis of the algorithm could help us understand what players and presented to the player according to her behavior. like in games, and ultimately contribute to understanding the
Of particular interest for our current concern is the work of cognitive and affective procedures behind human entertainment Smith et al. [11], who analyzed platform game levels and pro-
and motivation in general. As players tend to vary significantly posed a hierarchical ontology for such levels where cells con- in their preferences, it would further be useful to have an algo- tain rhythm groups which, in turn, consist of components such rithm that, given information about a particular player, could as platforms, collectibles, and switches. The authors further hy- predict the appeal of the game content for that player. Finally, pothesize about how certain design choices might affect player
experience, such as short and uneven rhythm groups making the level more challenging and longer rhythmic sections demanding
Manuscript received November 12, 2011; revised May 22, 2012 and September 15, 2012; accepted November 18, 2012. Date of publication
sustained concentration. These principles were eventually in-
December 03, 2012; date of current version September 11, 2013. The work
corporated into the Tanagra level generator, which can create
was supported in part by the Danish Research Agency, Ministry of Science,
levels with rhythmic structure but does not include methods for
Technology and Innovation under Project “AGameComIn” (274-09-0083). N. Shaker and J. Togelius are with the IT University of Copenhagen, Copen-
judging the aesthetics of completed levels [12].
hagen 2300, Denmark (e-mail: [email protected]; [email protected]).
If we find accurate such theories, we can then create theory-
G. N. Yannakakis is with the Department of Digital Games, University of
driven models of game aesthetics. However, even if the theo-
Malta, Msida MSD 2080, Malta (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online
ries are correct and suf ficiently extensive to allow prediction
at http://ieeexplore.ieee.org.
of player experience in a wide range of situations, they would
Digital Object Identi fier 10.1109/TCIAIG.2012.2231413
also need to be quantitative in order to be incorporated within
1943-068X © 2012 IEEE
SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 277
an algorithm, something that most current theoretical efforts to understand game aesthetics are not. They would also need to
be grounded in measurable quantities. For example, theories based on design patterns would need to be accompanied by al- gorithmic ways of detecting and locating such patterns.
The alternative, complimentary approach is to create data-driven (bottom–up) models of game aesthetics based on collecting data about games, game content, and players’ behavior, and correlating these data with data annotated by player experience tags. This approach, which builds on ma- chine learning and/or other statistical methods, can be seen as crowdsourcing aesthetics modeling. A few researchers have attempted to create such experience models via the affective annotation of data streams such as sounds and videos [13], but the application of crowdsourcing-based approaches for player testing and game aesthetics has not yet been investigated. The approach is also closely linked to massive-scale game data mining [14], [15]; however, direct annotations of player experience are not generally available in those studies. For an overview of research on building aesthetic game models from data, as well as a multifaced framework that interconnects player experience modeling and game adaptation, the reader is referred to the experience-driven procedural content genera- tion framework [16].
In this work, we are taking a slightly narrower view of aes- thetics. We judge the aesthetics of the level design from the players’ point of view, based on the content generated and the gameplay experience it provides. The content can be generated automatically by an algorithm or it can handcrafted by a de- signer, or indeed be created in a mixed-initiative (cocreation) fashion. We are trying to devise a data-driven approach that can automatically extract game design patterns from existing games. These patterns can be used by an algorithm for adapting the game, or they can be generalized and used by game designers when constructing a new game. The focus of the proposed work is not on adding to what we know about what makes a level frustrating or challenging, but rather to construct a quantitative measure of game aesthetics.
A. Relation to Our Own Previous Work In the past, we have published several papers about the
computational aesthetics of the platform game used in this paper, Infinite Mario Bros (IMB). Our first papers [17], [18] reported on the construction of models that predict six different aspects of player experience, based on 36 features extracted from gameplay and four controllable features, which could be used to generate levels. We used forced choice questionnaires to collect player experience data, and neuroevolutionary pref- erence learning combined with feature selection to induce the models, just as we do in this paper. Data were collected from 480 game sessions, played by at most 240 different players. Models were found that predicted certain aspects of player experience with between 73% and 91% accuracy.
A follow-up paper [19] used the same data set as the pre- vious papers, but focused on generating levels based on the models we had learned. The levels were generated by system- atically varying the four controllable features between high and low states until the parameter set was found, which yielded the
highest or lowest predicted value on one of the six player expe- rience dimensions. That parameter configuration was then used to generate personalized levels for the player that maximized predicted challenge, frustration, or fun. The generated levels were, in turn, tested, using both human players and algorithmic agents playing the game, to verify that the adaptation mecha- nism worked by tracking predicted player preference over sev- eral levels.
While these experiments were successful, it became ap- parent that the data set had some limitations. The number of controllable features (and the number of configurations of these features that were tested) was too small to permit meaningful exploration of the search space or the possibility of finding interestingly new design parameter configurations. Also, one of the controllable features (direction switching) and three of the player experience dimensions (predictability, anxiety, and boredom) turned out to be relatively uninteresting to explore in the context of the current game. The levels used in the first data set each took about a minute to play, which, we judged, was overly long, given that we wanted our model to apply to the aesthetics of the moment, in order to enable online adaptation. Finally, and most importantly, we wanted to record more detailed information about both levels and gameplay in order to see if we could find a way to predict player experience even better—to squeeze more information out of the data, as it were. Therefore, we embarked on collecting a new data set, with more levels (40) and more players (1560 games were played). In a recent paper [20], we reported on preliminary explorations of this data set. There, we tried to predict player experience of reported engagement based only on level features (disregarding all gameplay traces), and introduced the use of frequent subsequence counts (as found by sequence mining algorithms) as features extracted from levels. We also explored predicting features from only parts of levels, in order to find the minimum level segment length which would allow us to perform meaningful adaptation. It was found that both re- stricting level segment lengths and disregarding player metrics significantly decreased the predictive power of derived models.
In this paper, we explore the same data set as the one used in [20] to a much greater depth. We investigate a number of ways of extracting sequence data from levels and play traces that go beyond what we used in all previous modeling attempts, and we explore the predictive power of new direct (nonsequential) measures of both levels and player metrics, both on their own and in combination with sequence data. The goal is to create models that predict player experience as accurately as possible from observing a game level and how well a player plays it. We believe the methods we develop along the way to be potentially useful for other games which have a linear structure.
ESTBED II. T P LATFORM G AME The testbed platform game used for our study is a modified
version of Markus Persson’s IMB which is a public domain clone of Nintendo’s classic platform game Super Mario Bros (SMB) (Fig. 1). IMB features the same art assets and general game mechanics as SMB but differs in level construction; while human-authored levels have been constructed for the original SMB, IMB features infinite numbers of procedurally generated
278 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013
Fig. 2. Enemies placement using different probabilities: high probability is given (a) to placement around horizontal boxes
, (b) to placement around
gaps
, and (c) to random placement
data were extracted from raw logs and replays: content, game- play, and annotated (self-reported) player experience.
A. Content Data
Fig. 1. Snapshot from IMB, showing Mario standing on horizontally placed
blocks surrounded by different types of enemies.
Two types of content features have been extracted: direct and sequential. The direct content features are also named controllable, as they are used to generate the levels and are
levels. The gameplay in IMB consists of moving the player-con- varied to make sure several variants of the game are played and trolled character, Mario, through 2-D levels. Mario can walk and compared. The level generator of the game has been modified run, duck, jump, and shoot fireballs. The main goal of each level to create levels according to the following six controllable is to get to the end of the level. Auxiliary goals include collecting features. as many coins as possible, and clearing the level as fast as pos-
• The number of gaps in the level ; gaps are the holes in sible. For more details about the game, the reader may refer to
the game into which Mario may fall and die. [17].
• The average width of gaps
. This parameter controls ware has been used relatively extensively as a testbed for re-
The game itself is very well known, and the benchmark soft-
• The number of enemies
the number of Goombas (mushroom-like enemies) and search and as a testing environment for various AI techniques
Koopas (turtle-like enemies) scattered around the level, [19], [21]–[25]. The game is also being used as a benchmark for
affecting the level difficulty.
. The way enemies are placed around IMB has been chosen because of the popularity of SMB, the
the Mario AI Championship 1 [26].
• Enemy placement
the level determined by three probabilities which sum to high similarity between the two, the availability of an open
one.
: Enemies are placed on or collection easier, and because of the 2-D design and game me-
source clone of the game which makes development and data
— Around horizontal boxes
under a set of horizontal blocks (a number of blocks chanics it provides, which are similar to other games from the
placed horizontally without connection to the ground). same genre. The game has been used for this study not primarily
— Around gaps : Enemies are placed within a close dis- in order to find new design insights, but rather to validate that
tance to the edge of a gap.
the methodology could be used to find new design insights if
: Enemies are placed on a flat used on a less-known game genre.
— Random placement
space on the ground.
While implementing most features of SMB, the standout fea- Fig. 2 illustrates positioned enemies by giving different ture of IMB is the automatic generation of levels. Every time a
. Fig. 2(a) shows enemies placed new game is started, levels are randomly generated by traversing
values for ,
, and
by setting
to 80%. Fig. 2(b) illustrates the result of set-
to 80%, and Fig. 2(c) is the result of 80%. as specified by placement parameters. In our modified version,
a fixed width and adding features according to certain heuristics,
ting
. Mario can collect powerup we concentrated on a number of selected parameters that affect
• The number of powerups
elements hidden in boxes to upgrade his state from little to gameplay experience.
big or from big to fire. • The number of boxes . We define one variable to specify the number of two different types of boxes that exist in
ATA C III. D OLLECTION
IMB. These two types of boxes are here referred to as blocks and rocks. Blocks (which look like squares with
Data from gameplay and questionnaires have been collected question marks) contain hidden elements such as coins or from hundreds of players over the Internet via a crowdsourcing powerups. Rocks (which look like squares of bricks) may experiment. Complete games were logged, including the levels hide a coin, a powerup, or simply be empty. Mario can the players played and what actions the players took at which smash rocks only when he is in big mode. time, enabling complete replays. The following three types of The generation of levels with specified values for all pa-
1 http://www.marioai.org.
rameters is guaranteed by the generator; while generating the
SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 279
Fig. 3. An example level generated and used to collect the data.
levels, and whenever an item is to be added, these parameters
C. Reported Player Experience Data
are checked and the item is placed accordingly. We rely on self-reporting annotations based on previous re- Game designers, who are familiar with 2-D platform games, search in which very accurate player experience models of self-
and, in particular, SMB, have been consulted when selecting the report affective states have been constructed [27], [28]. How- controllable features. The features have been chosen based on ever, a number of limitations are embedded in the players’ self- their impact on the investigated affective states and their gen- reporting experience modeling, including noise due to learning erality to other 2-D platform games. Note that consulting game and self-deception, disruption to gameplay experience, and sen- designers does not conflict with the bottom–up approach fol- sitivity to memory limitations. In order to minimize these ef- lowed, which derives models of player experience based on data fects, we rely on annotated player experience data collected via collected from players while the designers’ knowledge is incor-
a four-alternative forced choice questionnaire presented after porated only when designing the experiment.
small game sessions. The questionnaire asks the player to report The first two features appeared in our previous studies [19], the preferred game for three user states: engagement, challenge,
[27], whereas the four new features are explored for the first and frustration. The selection of these states is based on earlier time here.
game survey studies [27] and our intention to capture both af- Two states (low and high) are set for each of the controllable fective and cognitive/behavioral components of gameplay ex-
parameters above, except for enemy placement, which has been perience [16]. assigned three different states allowing more control over the
The questionnaire protocol gives the players the following difficulty and diversity of the generated levels. The total number alternatives: of pairwise combinations of these states is 96. This number can
• game A[B] was/felt more E than game B[A] (cf., two-al-
be reduced to 40 by analyzing the dependencies between these
ternative forced choice);
features and eliminating the combinations that contain indepen- • both games were/felt equally E; or dent variables. All levels have been checked before starting the
• neither of the two games was/felt E; data collection in a way that assures their compatibility with the where E is the affective state under investigation.
intent parameters assigned. An example level generated by one possible combination of the controllable features is presented in
XPERIMENTAL IV. E P ROTOCOL Fig. 3.
The game survey study has been designed to collect subjec- In addition to the direct (controllable) features, sequential tive affective reports expressed as pairwise preferences of sub-
content features are also extracted. The topology of the levels jects playing the different variants (levels) of the testbed game is converted into sequences of numbers representing different by following the experimental protocol proposed in [29]. types of game items, and sequence mining techniques are ap-
According to the protocol, each subject plays a predefined set plied to extract useful patterns from the resulting sequences (see of two games. The games played differ in the levels of one or
Section V-B). more of the six controllable features presented previously. The game sessions presented to players have been con-
B. Gameplay Data structed using a level width of 100 IMB units (blocks), about While playing the game, different player actions and inter- one-third of the size usually employed when generating levels actions with game items and their corresponding timestamps for IMB games in our previous experiments [19], [27]. The have been recorded. These events are categorized in different selection of this length was due to a compromise between a groups according to the type of the event and the type of inter- window size that is big enough to allow sufficient interaction action with the game objects. The events recorded are the fol- between the player and the game to trigger the examined lowing: level completion event; Mario death event and cause affective states, and a window which is small enough to set an of death; interaction events with game items such as free coins, acceptable frequency of an adaptation mechanism applied in empty rock, coin block/rock, and power-up rock/block; Mario real time, aiming at closing the affective loop of the game [30]. enemy kill event associated with the type of actions performed
A previous study [20] has been conducted to test whether a to kill the enemy and the type of enemy; changing Mario mode smaller level width than the chosen one can be used to construct (small, big, or fire) event; changing Mario state (moving right, models for predicting players’ engagement from game content left, jump, run, duck) event; and the full trajectory of Mario as with higher accuracy than the models constructed based on
a combination of events. information from levels with the chosen width. The study con- Both direct and sequential gameplay features have been ex- cluded that the models perform best when trained on features tracted based on the aforementioned events. The detailed list of extracted from levels with the selected width rather than from features is presented in Section V.
levels with half or one-third of the width.
280 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013
TABLE I F EATURES E XTRACTED F ROM D ATA R ECORDED D URING G AMEPLAY
A total number of 780 players participated in this crowd- style. Alternatively, analyzing sequences of game content and sourcing experiment. Participants’ age covers a range between players’ behavior yields patterns that might be directly linked
16 and 64 years (31.5% females), while their location in- to player experience. For example, we would like to extract fea- cludes Denmark (46.11%), Greece (8.9%), Ireland (1.48%), tures that encapsulate whether a player performed a particular the United States (3.34%), Holland (0.74%), Finland (1.36%), action before or after encountering a specific in-game situation. France (0.37%), Syria (0.25%), Sweden (0.37%), Korea
Modeling players’ experience based on features extracted (0.12%), Spain (0.25%), or unknown (36.71%).
from sequential information provides a promising alternative for models constructed based on direct feature extraction, and
EATURE V. F E XTRACTION
by fusing these two types of representations, we anticipate constructing more accurate models of player experience than
In the following, we describe the types of features that we those constructed on one of these forms of data representation have extracted from the recorded content and gameplay data via at a time. direct and sequential feature extraction.
In the following, we describe different criteria for con-
Most of the direct features presented appear in our previous structing sequences from game content, gameplay, and the studies [17], [19]. These features are used in this work due to interaction between the two. We present two sequence mining their relevance for modeling player experience. In this work, approaches and further discuss different setups that can be used we also investigate sequential patterns extracted from gameplay for mining the extracted sequences. data.
Table II presents the different possible approaches that can be
A. Direct Gameplay Features followed to generate different types of sequences. The columns represent the different order and frequency at which information
Several features have been directly extracted from the data is logged. The rows represent what type of data is logged each recorded during gameplay (see Section III). The choice of these time an event occurs. We will be distinguishing the following features is made in order to be able to represent the difference orders/frequencies, while acknowledging that even more fine- between a large variety of IMB playing styles. In addition to grained distinctions are possible. the six controllable game features that are used to generate IMB
: time step. Information is logged at a constant rate levels, the features presented in Table I are extracted from the
(e.g., once per second), regardless of what the player does. gameplay data collected and are classified in five categories:
This yields a sequence with a length proportional to the time, interaction with items, interaction with enemies, death,
time taken by the player to play the level. and miscellaneous.
• Block: Information is logged once per block in the level, independent of the time taken by the player to traverse the
B. Sequential Features level. This yields a sequence with a length equal to the We investigate another form of indirectly representing the
level.
gameplay interaction by means of sequences, which allows in- • Gameplay event: Information is logged each time the cluding features that are based on ordering in space or time.
player changes the command issued (pressing/releasing a Gameplay features presented in Section V-A provide a quan-
button or changing direction) or something else happens titative measure of different types of game content and playing
(e.g., Mario changes the mode or stomps an enemy).
SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 281
TABLE II T HE D IFFERENT T YPES OF S EQUENCES T HAT C AN B E G ENERATED .C OLUMNS P RESENT THE T YPE OF E VENT TO B E R ECORDED ,W HILE R OWS P RESENT W HEN TO R ECORD THE E VENT .T HE C OMBINATIONS M ARKED W ITH AN XA RE THE O NES I NVESTIGATED IN T HIS P APER
Fig. 4. Snapshot from a level and the corresponding (a) platform structure se- quence representation
and (b) enemies and items sequence representation .
TABLE III
T HE D IFFERENT T YPES OF E VENTS C ONSIDERED W HEN G ENERATING THE
S EQUENCES AND T HEIR G RAPHICAL R EPRESENTATION
four different possible values: 0, 1, 2, and 3, corresponding, respectively, to nonexistence of either enemies or items, the existence of an enemy ( ); the existence of an item ( );
and the existence of an enemy and an item ( ). Fig. 4(b) illustrates an example level segment where the aforemen- tioned four states are presented.
• Content corresponding to gameplay events : We ex- plored another method in which game content at the specific player position is recorded whenever the player performs an action or interacts with game items. In this case, different content events are used: increase in plat- form height ( ); decrease in platform height ( ); the existence of an enemy ( ); the existence of a coin, a block, or a rock ( ); the existence of a coin, a block, or a
rock with an enemy ( ); and the beginning ( ) and the The information logged each time anything is logged can be
end ( ) of a gap.
either game content , player (gameplay) behavior
2) Sequential Gameplay Features: Sequences representing both game content and players’ behavior
, or
different players’ behavior have been generated by recording We will focus the discussion for the rest of the paper on the key pressed/released events (action event) or interaction with
five sequence types marked with an X in Table II. Although we item events. The action event might represent a simple action only investigate a few sequence types of all those available, our performed, such as pressing an arrow key to move left or right; sample provides a variety of options that cover different aspects or more complex players’ behaviors that can be achieved by of playing experience.
pressing a combination of keys at the same time (e.g., jumping Once we know what to sample and when, the question re- over a big gap requires the player to press the run and jump keys
mains how to turn this information into sequences using a low- together for a number of time steps). The following list describes cardinal alphabet. Below, we discuss how to do this for levels the different events that have been considered (Table III): and for gameplay traces.
• pressing an arrow key to move right, left, or duck ;
1) Sequential Content Features: Sequences capturing dif-
• pressing the jump key ;
ferent information about level geometry have been extracted by • pressing the jump key in combination with right or left key converting the content of the levels into numbers representing
different types of game items. Three different representations • pressing the run key in combination with right or left key of game content have been investigated. The full list of events
considered as well as their graphical representation is presented • pressing the run and jump keys in combination with right in Table III.
or left key
• Platform structure : A sequence is generated by com-
• not pressing any key .
paring the height of each block across the level with the
• winning the game
height of the previous block and recording the following
• losing the game ;
values: 0 if no difference found ( ); 1 if there is an in-
; crease in the platform height ( ); 2 if there is a decrease in
• killing an enemy by stomping
• unleashing a koopa shell ;
the platform height ( ); and 3 and 4 to mark the beginning
• changing Mario mode .
( ) and the ending ( ) of a gap, respectively. Fig. 4(a) Fig. 5 presents the graphical interpretation for most of the presents a part of a level and the corresponding platform actions that can be performed. structure sequence representation.
In this paper, we will consider two time window values for • Enemy and item placement : The term “items” refers to generating sequences: 0.5 s
and 0.25 s meaning the coins and the different types of boxes scattered around that an event will be registered every half or quarter of a second, the level. The existence and nonexistence states for ene- respectively. We also consider sequences generated whenever mies and items have been combined together, resulting in the player switches the action . (Note that IMB is a fast-paced
282 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013
Mining sequences across multiple time series of data—content corresponding to gameplay events
, players’ behavior ,
and multimodal sequences
—can be achieved via the GSP algorithm [33]. Martinez and Yannakakis [34] have used GPS to obtain frequent subsequences across multiple modalities of player input (physiology and game-based context).
In the following, we provide a short list of frequent subse- quence mining definitions and give a brief description of the two algorithms and the way they have been used in this paper.
Fig. 5. A graphical representation of the different actions that can be performed by the player: (a) standing still
; (b) moving right
; (c) moving left
A. Definitions
(d) ducking ; (e) jumping
; (f) jumping right
; and (g) running right
A data sequence is a sample of a sequential data set where each sequence consists of a number of events, each one asso- ciated with a time stamp. The events are ordered by increasing
game in which the player could in theory perform an action time. every 1/24 s).
is a nonempty set of simultaneous The purpose of recording these events is that players’ be- events denoted by
A sequence pattern
where is an event. A data havior and playing style can be analyzed by looking at events sequence supports a sequence pattern if and only if it contains
generated by each player and how frequently each of these all the events present in the pattern in the same order but not events occurs. Generating a sequence combining these events necessarily consecutive. in a timely manner provides a more in-depth insight about
is the minimum number of times more complex behavior patterns that might have an impact on
A minimum support
a pattern has to occur in the data sequences to be considered players’ experience.
frequent. If the number of occurrences of in the data sequences The resulting sequences of players’ behavior have a wide exceeds
a frequent pattern, and, in this case, variety, both in terms of length and structure, which reflects the fraction of data sequences that support is refereed to as the diversity of players’ playing style and complicates any se- support count
, we call
quence mining algorithm that can be applied to extract useful information. This diversity is reflected on the normalized com-
B. SPADE
A modified version of the SPADE algorithm [32] has been test for structural similarity between the sequences. The results implemented to extract frequent subsequences of different
pression distance (NCD) measure [31] that has been applied to
of applying this function on each pair of the action sequences game events. Game content for the 40 levels has been con- showed a high dissimilarity between the sequences (NCD
verted into numbers representing different types of content in 71.32% of the cases).
events, as described in Section V-B1. Different subsequence
3) Fused Sequential Features of Both Game Content and lengths and minimum support threshold values have been Gameplay Data: Game content and players’ behavior events explored. A minimum support threshold of 20 has been used, have been fused together to generate bimodal sequences
. meaning that each subsequence should occur in at least half of Events from the two modalities have been extracted, with their the levels to be considered frequent. corresponding time stamp, and then logged in the temporal order. The generated event contains information about the
C. GSP
game content at the specific position in the game where the The GSP algorithm [33] solves the sequence mining problem gameplay event occurred (which is one of the events mentioned based on an a priori algorithm with a number of generaliza-
in Section V-B1, or none if no content event from the list tions. Using GSP, we can discover patterns with a predefined happens to occur at this specific position) along with the type minimum support, define time constraints within which adja- of gameplay event.
cent events can be considered elements of the same pattern, and specify a time window for events from different modalities to
EQUENCE VI. S M INING
be considered as synchronous events.
Sequence mining techniques have been applied to extract GSP generalizes the basic definition of frequent sequential useful information from the different types of sequences gen- pattern by introducing two relaxation schemes. erated. Two algorithms for frequent item set mining have been
• Sliding window: This generalization allows the items of a implemented to find frequent sequence patterns within the
pattern to be contained in the union of the items belonging data set of sequences: the Sequential PAttern Discovery using
to different time series. According to this relaxation, a se- Equivalence classes (SPADE) algorithm and the generalized
, where and can be contained in dif- sequential pattern (GSP) algorithm. The SPADE [32] algorithm
quence
ferent time series, is allowed to be counted as a support for has been used to mine single-dimensional sequences that repre-
a subsequence as long as the time difference between sent game content independently of players’ behavior, namely,
is less than the user-specified window size . platform structure
and
• Time constraints: This relaxation specifies the time gap be- algorithm has been used in our previous work [20] for mining
and enemy and item placement
. This
tween consecutive events from one or two different time content sequences, and is used in the same way in this paper.
series. Given a user-defined gap
, a data sequence
SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 283
supports a pattern of two consecutive events
if and
TABLE IV
only if and occur in the sequence of the specified
N UMBER OF F REQUENT S EQUENTIAL P ATTERNS F OUND FOR D IFFERENT
S EQUENCE L ENGTH V ALUES A CROSS D IFFERENT T YPES OF S order and with a time difference lower than the specified EQUENCES
I S 500 AND
1 s). T HE C OLUMNS S TAND FOR :
C ONTENT C ORRESPONDING TO G AMEPLAY E VENTS ,
The GSP algorithm is used for mining sequences that rely on
G AMEPLAY B EHAVIOR
,G AMEPLAY B EHAVIOR
AND E players’ behavior VERY , since it allows more generalized
R EGISTERED E VERY
0.5 s
0.25 s
AND M ULTIMODAL
frequent patterns to be found by exploring different
S EQUENCES OF G AME C ONTENT
and it is also used for mining multimodal sequences
as, by
AND P LAYERS ’B EHAVIOR
using , we can discover simultaneous events from two different modalities. Different
values have also been explored to obtain a reasonable tradeoff between considering patterns that are gen- eralized over all players and more specific patterns. For the ex- periments presented in this paper, we use a
of 500 which
forces a sequence pattern to occur in at least 31.8% of the sam- ples to be considered frequent.
The parameter defines the threshold under which events from two different modalities can be considered as simul- taneous events. In this paper, we use
In order to lower the feature space dimensionality and the for this parameter has been chosen as a tradeoff between a small computational cost of searching for relevant features, we chose window size that does not consider simultaneous events, and a to use only frequent sequences of length three. window size that processes clearly asynchronous events from two modalities as events happening in a very small interval. For the rest of this paper, we will use parentheses to group simulta-
1 s. The value
REFERENCE VII. P L EARNING FOR M ODELING neous events.
P LAYING E XPERIENCE The
parameter is used to set up the time gap between In order to construct models that approximate the function two events to be considered as belonging to the same pattern. between gameplay features, controllable features, and reported
This parameter has a great impact on the number of frequent affective preferences, we use neuroevolutionary preference patterns that can be extracted. By assigning a large value to this learning. In other words, we use artificial evolution for shaping parameter, we allow more generalized patterns to be taken into artificial neural networks (ANNs) whose output matches the account, and, as a consequence, a large number of sequences reported (pairwise) preferences of the players. will be counted as
We proceed in a three-phase procedure in order to find net- values is that it allows considering less informative pat- works that predict preferences with high accuracy.
. Another drawback for using large
terns. Correctly tuning this parameter has a large impact on the
1) Feature selection: In the first step, we use single-layer informativeness of the resulting patterns, specially when mining
perceptrons (SLPs) to approximate the preferences of the multimodal sequences. For instance, if we use
players; sequential forward selection (SFS) [35], [36] is pattern ( , , ) can be supported by any sequence in which
3 s, the
applied to generate the input vector for the SLPs by finding the player jumps, moves right, and encounters an enemy within
the subset of features that yields the highest performance.
a 3-s interval (note that within this interval, the player might en- The quality of a feature subset is determined by threefold counter more than one enemy or a gap between the jumping and
cross validation on unseen data.
moving right events which makes this pattern somehow mis-
2) Feature space expansion: The subset of features derived leading). The experiments conducted for tuning the value of this
from SFS using SLP is then used as the input of small mul- parameter showed that a
tilayer perceptron (MLP) models (containing one layer of between the number of patterns extracted and their expressive-
of 1 s provides a good tradeoff
two hidden neurons), and SFS is used again to extract addi- ness value.
tional features from the set of remaining features, allowing features with more complicated nonlinear relationships to
D. Length of Sequences
be selected.
3) Setting ANN topology: Once all features that contribute Table IV presents the different types of sequences and the
to accurate simple MLP models are found, we optimize number of frequent subsequences found for a number of dif-
the topology of models using neuroevolutionary prefer- ferent sequence generation methods. As can be seen from the
ence learning. We start with a simple MLP topology of one table, the number of extracted subsequences is quite large for
hidden two-neuron layer; then, we increase the number of sequences containing information about players’ behavior
neurons to ten by adding two neurons at each step. Further, and the search space for automatic feature selection increases
we investigate MLPs with two hidden layers, with up to ten substantially when fusing content and gameplay events for gen-
neurons in the first and second layers. Again, the number erating sequences
; more than 2000 subsequences of length of hidden neurons starts at two and increases by adding two six have been extracted from the players’ behavior and multi-
neurons at each step; this sums to 30 different MLP topolo- modal sequences.
gies which are tested for each input vector.
284 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013
TABLE V
T OP F IVE S TATISTICALLY S IGNIFICANT C ORRELATION C OEFFICIENTS B ETWEEN R EPORTED E NGAGEMENT ,F RUSTRATION AND C HALLENGE , AND E XTRACTED
F EATURES .T HE S IGN B EFORE THE F EATURES I NDICATES P OSITIVE
OR N EGATIVE
C ORRELATION .F OR E XAMPLE , THE T IME R EQUIRED TO C OMPLETE THE L EVEL W AS F OUND TO B E P OSITIVELY C ORRELATED W ITH E NGAGEMENT ,W HILE A S EGMENT OF C ONTENT W ITH T WO A DJACENT D ECREASES IN THE P LATFORM H EIGHT W AS F OUND TO B E N EGATIVELY C ORRELATED W ITH E NGAGEMENT
AS C AN B E S EEN F ROM THE F IRST R OW
The performance of each MLP is obtained through the av- Nineteen direct features are significantly correlated, with en- erage classification accuracy in three independent runs using gagement with some of them also strongly correlated with frus- threefold cross validation. Parameter tuning tests have been con- tration and challenge, while 21 features are significantly cor- ducted to set up the parameters’ values for neuroevolutionary related with frustration, and 17 features with challenge. The user preference learning that yield the highest accuracy and features that are strongly correlated with engagement and not minimize computational effort. A population of 100 individuals with challenge are mostly related to the interaction between the is used, and evolution runs for 20 generations. A probabilistic player and blocks (mainly powerups). These features point to rank-based selection scheme is used, with higher ranked indi- the task of searching for powerups, in which the player has to viduals having higher probability of being chosen as parents. destroy blocks looking for powerups that, as a result, change Finally, reproduction was performed via uniform crossover, fol- Mario’s mode, as being particularly engaging. lowed by Gaussian mutation of 1% probability.
The avatar death feature (signifying that Mario loses a life) is the most significantly correlated with both frustration and chal-
NALYSIS VIII. A
lenge, indicating a strong relationship between death and these two player experience states.
This section provides a thorough analysis conducted for Regarding changes in platform height patterns , only the testing simple and more complex relationships between the features presented in Table V for engagement and frustration are
features extracted and the three reported states of player expe- significantly correlated with engagement and frustration, while rience. We further investigate the generality of the proposed
15 features are strongly correlated with challenge. Seven and approach by comparing the models constructed on the pre-
15 out of the features that correlated best with frustration and sented data set and the ones constructed in our previous work challenge, respectively, relate to the presence of a gap, while
in terms of the models’ performance and the features selected. engagement is significantly correlated with only four features that indicate a gap. It is interesting to note that despite the small
A. Linear Relationships pattern length (three), almost all features presented for the three
We performed an analysis for exploring statistically signif- emotional states require two or three gameplay actions to be icant correlations ( -value
5%) between players’ expressed performed.
preferences and extracted features. Correlation coefficients are Ten out of the 12 features from (item and enemy place- obtained by following the method proposed in [17]. According ment) are significantly correlated with engagement, while only to this method, correlation coefficients are calculated through three and two features correlate significantly with frustration
where is the total number of game and challenge, respectively. The first observation is that it is pairs where players expressed a clear preference (
obviously much easier to predict engagement from than to or
, predict challenge and frustration due to many more features if the player preferred the game with the larger value of the ex- significantly correlated to engagement, and the correlations are amined feature; and
) for one of the two games;
, if the player preferred the other stronger. Most features that correlate with engagement point to game in the game pair . The top five significantly correlated the placement of items and enemies. This is not the same for features for each emotional state are presented in Table V.
frustration, which demonstrates less significant effects, and the
SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 285
majority of those that do focus on the existence of an enemy; The correlations calculated and analyzed above provide basic features that correlate with challenge highlight the importance analysis with linear relationships between the extracted features of the relative placement of items and enemies in the challenge and reported emotions. However, these relationships are most perceived.
likely more complex than those that can be captured by linear Large subsets of features of players’ actions
are signif- models. The aim of the following section is to analyze the non- icantly correlated with engagement, frustration, and challenge linear relationships found using the players’ experience models. (99, 72, and 74, respectively). All features that are highly cor- related to frustration are also correlated to challenge. It is worth
B. Nonlinear Relationships
mentioning that the features that correlate the most with en- In this section, we base our analysis on which features were gagement are also significantly correlated with frustration and selected by the SFS algorithm for constructing neural-network- challenge but at different significance levels. It appears that the based player experience models. As these models take nonlinear number of jumps the player performs plays an important role relations into account, the features selected for these models in predicting engagement, as this appears in all top-five action might reveal more complicated and, in a sense, deeper relation- patterns combined in most of the cases with moving right and ships, but the analysis is also less straightforward. pressing the speed button. This can also explain the significance