Crowdsourcing the Aesthetics of Platform Games

Noor Shaker, Georgios N. Yannakakis, Member, IEEE, and Julian Togelius, Member, IEEE

Abstract—What are the aesthetics of platform games and what

having an algorithm that could observe a human playing a game

makes a platform level engaging, challenging, and/or frustrating?

and accurately judge what the human is experiencing as he/she

We attempt to answer such questions through mining a large set of crowdsourced gameplay data of a clone of the classic platform

is playing the game would also be useful, as this could allow

game Super Mario Bros (SMB). The data consist of 40 short game

us to adapt the game to the player, and also help us understand

levels that differ along six key level design parameters. Collec-

how human affect is expressed in behavior.

tively, these levels are played 1560 times over the Internet, and the

A number of researchers have attacked this problem from

perceived experience is annotated by experiment participants via

a top–down perspective, that is, by creating theories of the

self-reported ranking (pairwise preferences). Given the wealth of this crowdsourced data, as all details about players’ in-game be-

aesthetics of game content and gameplay based on introspec-

havior are logged, the problem becomes one of extracting mean-

tion or qualitative research methods. For example, Malone [1]

ingful numerical features at the appropriate level of abstraction for

proposed that computer games are “fun” when they have the

the construction of generic computational models of player experi-

right amount of challenge and evoke curiosity and fantasy, and

ence and, thereby, game aesthetics. We explore dissimilar types of

Koster [2] proposed that fun in games is connected to the player

features, including direct measurements of event and item frequencies, and features constructed through frequent sequence mining,

learning to play the game. Magerko et al.’s [3] research within

and go through an in-depth analysis of the interrelationship be-

learning games proposed an adaptation framework based on a

tween level content, players’ behavioral patterns, and reported ex-

predeﬁned set of learning styles.

perience. Furthermore, the fusion of the extracted features allows

Such theories are, in general, too high-level and vague about

us to predict reported player experience with a high accuracy, even

key concepts to be implemented in algorithms, though some

from short game segments. In addition to advancing our insight on the factors that contribute to platform game aesthetics, the results

attempts have been made to create computational models based

are useful for the personalization of game experience via automatic

on them [4], [5].

game adaptation.

Other authors have tried to identify more speci ﬁc and

concrete elements of game design and game content that

Index Terms— Computational aesthetics, experience-driven

procedural content generation, player experience modeling.

contribute to player experience, so-called “patterns in-game design”; Björk et al. [6] are in an ambitious ongoing effort cataloguing hundreds of such patterns, whereas other authors

NTRODUCTION I. I

discuss patterns in content design for individual genres, such as ﬁrst-person shooters. For example, Hullett and Whitehead

[7] analyze some key patterns in ﬁrst-person shooter games, N algorithm that could automatically judge how en- such as sniper positions and open arenas and discuss how they

contribute to player entertainment. In [8] and [9], a system is—that is, a computational model of the aesthetics of game that visualizes players’ behaviors to allow analysts to easily

gaging or interesting a particular piece of game content

content—would be useful for several reasons. One strong identify patterns and design issues is presented. Jennings-Teats reason is that such a method would help us to automatically et al. [10] showed how player experience can be altered by or semiautomatically generate good content; another is that presenting sequences of level segments rated by their dif ﬁculty analysis of the algorithm could help us understand what players and presented to the player according to her behavior. like in games, and ultimately contribute to understanding the

Of particular interest for our current concern is the work of cognitive and affective procedures behind human entertainment Smith et al. [11], who analyzed platform game levels and pro-

and motivation in general. As players tend to vary signiﬁcantly posed a hierarchical ontology for such levels where cells con- in their preferences, it would further be useful to have an algo- tain rhythm groups which, in turn, consist of components such rithm that, given information about a particular player, could as platforms, collectibles, and switches. The authors further hy- predict the appeal of the game content for that player. Finally, pothesize about how certain design choices might affect player

experience, such as short and uneven rhythm groups making the level more challenging and longer rhythmic sections demanding

Manuscript received November 12, 2011; revised May 22, 2012 and September 15, 2012; accepted November 18, 2012. Date of publication

sustained concentration. These principles were eventually in-

December 03, 2012; date of current version September 11, 2013. The work

corporated into the Tanagra level generator, which can create

was supported in part by the Danish Research Agency, Ministry of Science,

levels with rhythmic structure but does not include methods for

Technology and Innovation under Project “AGameComIn” (274-09-0083). N. Shaker and J. Togelius are with the IT University of Copenhagen, Copen-

judging the aesthetics of completed levels [12].

hagen 2300, Denmark (e-mail: [email protected]; [email protected]).

If we ﬁnd accurate such theories, we can then create theory-

G. N. Yannakakis is with the Department of Digital Games, University of

driven models of game aesthetics. However, even if the theo-

Malta, Msida MSD 2080, Malta (e-mail: [email protected]). Color versions of one or more of the ﬁgures in this paper are available online

ries are correct and suf ﬁciently extensive to allow prediction

at http://ieeexplore.ieee.org.

of player experience in a wide range of situations, they would

Digital Object Identi ﬁer 10.1109/TCIAIG.2012.2231413

also need to be quantitative in order to be incorporated within

SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 277

an algorithm, something that most current theoretical efforts to understand game aesthetics are not. They would also need to

be grounded in measurable quantities. For example, theories based on design patterns would need to be accompanied by algorithmic ways of detecting and locating such patterns.

The alternative, complimentary approach is to create data-driven (bottom–up) models of game aesthetics based on collecting data about games, game content, and players’ behavior, and correlating these data with data annotated by player experience tags. This approach, which builds on ma- chine learning and/or other statistical methods, can be seen as crowdsourcing aesthetics modeling. A few researchers have attempted to create such experience models via the affective annotation of data streams such as sounds and videos [13], but the application of crowdsourcing-based approaches for player testing and game aesthetics has not yet been investigated. The approach is also closely linked to massive-scale game data mining [14], [15]; however, direct annotations of player experience are not generally available in those studies. For an overview of research on building aesthetic game models from data, as well as a multifaced framework that interconnects player experience modeling and game adaptation, the reader is referred to the experience-driven procedural content generation framework [16].

In this work, we are taking a slightly narrower view of aesthetics. We judge the aesthetics of the level design from the players’ point of view, based on the content generated and the gameplay experience it provides. The content can be generated automatically by an algorithm or it can handcrafted by a de- signer, or indeed be created in a mixed-initiative (cocreation) fashion. We are trying to devise a data-driven approach that can automatically extract game design patterns from existing games. These patterns can be used by an algorithm for adapting the game, or they can be generalized and used by game designers when constructing a new game. The focus of the proposed work is not on adding to what we know about what makes a level frustrating or challenging, but rather to construct a quantitative measure of game aesthetics.

A. Relation to Our Own Previous Work In the past, we have published several papers about the

computational aesthetics of the platform game used in this paper, Inﬁnite Mario Bros (IMB). Our ﬁrst papers [17], [18] reported on the construction of models that predict six different aspects of player experience, based on 36 features extracted from gameplay and four controllable features, which could be used to generate levels. We used forced choice questionnaires to collect player experience data, and neuroevolutionary preference learning combined with feature selection to induce the models, just as we do in this paper. Data were collected from 480 game sessions, played by at most 240 different players. Models were found that predicted certain aspects of player experience with between 73% and 91% accuracy.

A follow-up paper [19] used the same data set as the previous papers, but focused on generating levels based on the models we had learned. The levels were generated by system- atically varying the four controllable features between high and low states until the parameter set was found, which yielded the

highest or lowest predicted value on one of the six player experience dimensions. That parameter conﬁguration was then used to generate personalized levels for the player that maximized predicted challenge, frustration, or fun. The generated levels were, in turn, tested, using both human players and algorithmic agents playing the game, to verify that the adaptation mechanism worked by tracking predicted player preference over several levels.

While these experiments were successful, it became ap- parent that the data set had some limitations. The number of controllable features (and the number of configurations of these features that were tested) was too small to permit meaningful exploration of the search space or the possibility of finding interestingly new design parameter configurations. Also, one of the controllable features (direction switching) and three of the player experience dimensions (predictability, anxiety, and boredom) turned out to be relatively uninteresting to explore in the context of the current game. The levels used in the first data set each took about a minute to play, which, we judged, was overly long, given that we wanted our model to apply to the aesthetics of the moment, in order to enable online adaptation. Finally, and most importantly, we wanted to record more detailed information about both levels and gameplay in order to see if we could find a way to predict player experience even better—to squeeze more information out of the data, as it were. Therefore, we embarked on collecting a new data set, with more levels (40) and more players (1560 games were played). In a recent paper [20], we reported on preliminary explorations of this data set. There, we tried to predict player experience of reported engagement based only on level features (disregarding all gameplay traces), and introduced the use of frequent subsequence counts (as found by sequence mining algorithms) as features extracted from levels. We also explored predicting features from only parts of levels, in order to find the minimum level segment length which would allow us to perform meaningful adaptation. It was found that both re- stricting level segment lengths and disregarding player metrics significantly decreased the predictive power of derived models.

In this paper, we explore the same data set as the one used in [20] to a much greater depth. We investigate a number of ways of extracting sequence data from levels and play traces that go beyond what we used in all previous modeling attempts, and we explore the predictive power of new direct (nonsequential) measures of both levels and player metrics, both on their own and in combination with sequence data. The goal is to create models that predict player experience as accurately as possible from observing a game level and how well a player plays it. We believe the methods we develop along the way to be potentially useful for other games which have a linear structure.

ESTBED II. T P LATFORM G AME The testbed platform game used for our study is a modiﬁed

version of Markus Persson’s IMB which is a public domain clone of Nintendo’s classic platform game Super Mario Bros (SMB) (Fig. 1). IMB features the same art assets and general game mechanics as SMB but differs in level construction; while human-authored levels have been constructed for the original SMB, IMB features inﬁnite numbers of procedurally generated

278 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013

Fig. 2. Enemies placement using different probabilities: high probability is given (a) to placement around horizontal boxes

, (b) to placement around

gaps

, and (c) to random placement

data were extracted from raw logs and replays: content, gameplay, and annotated (self-reported) player experience.

A. Content Data

Fig. 1. Snapshot from IMB, showing Mario standing on horizontally placed

blocks surrounded by different types of enemies.

Two types of content features have been extracted: direct and sequential. The direct content features are also named controllable, as they are used to generate the levels and are

levels. The gameplay in IMB consists of moving the player-con- varied to make sure several variants of the game are played and trolled character, Mario, through 2-D levels. Mario can walk and compared. The level generator of the game has been modiﬁed run, duck, jump, and shoot ﬁreballs. The main goal of each level to create levels according to the following six controllable is to get to the end of the level. Auxiliary goals include collecting features. as many coins as possible, and clearing the level as fast as pos-

• The number of gaps in the level ; gaps are the holes in sible. For more details about the game, the reader may refer to

the game into which Mario may fall and die. [17].

• The average width of gaps

. This parameter controls ware has been used relatively extensively as a testbed for re-

The game itself is very well known, and the benchmark soft-

• The number of enemies

the number of Goombas (mushroom-like enemies) and search and as a testing environment for various AI techniques

Koopas (turtle-like enemies) scattered around the level, [19], [21]–[25]. The game is also being used as a benchmark for

affecting the level difﬁculty.

. The way enemies are placed around IMB has been chosen because of the popularity of SMB, the

the Mario AI Championship 1 [26].

• Enemy placement

the level determined by three probabilities which sum to high similarity between the two, the availability of an open

one.

: Enemies are placed on or collection easier, and because of the 2-D design and game me-

source clone of the game which makes development and data

— Around horizontal boxes

under a set of horizontal blocks (a number of blocks chanics it provides, which are similar to other games from the

placed horizontally without connection to the ground). same genre. The game has been used for this study not primarily

— Around gaps : Enemies are placed within a close dis- in order to ﬁnd new design insights, but rather to validate that

tance to the edge of a gap.

the methodology could be used to ﬁnd new design insights if

: Enemies are placed on a ﬂat used on a less-known game genre.

— Random placement

space on the ground.

While implementing most features of SMB, the standout fea- Fig. 2 illustrates positioned enemies by giving different ture of IMB is the automatic generation of levels. Every time a

. Fig. 2(a) shows enemies placed new game is started, levels are randomly generated by traversing

values for ,

, and

by setting

to 80%. Fig. 2(b) illustrates the result of set-

to 80%, and Fig. 2(c) is the result of 80%. as speciﬁed by placement parameters. In our modiﬁed version,

a ﬁxed width and adding features according to certain heuristics,

ting

. Mario can collect powerup we concentrated on a number of selected parameters that affect

• The number of powerups

elements hidden in boxes to upgrade his state from little to gameplay experience.

big or from big to ﬁre. • The number of boxes . We deﬁne one variable to specify the number of two different types of boxes that exist in

ATA C III. D OLLECTION

IMB. These two types of boxes are here referred to as blocks and rocks. Blocks (which look like squares with

Data from gameplay and questionnaires have been collected question marks) contain hidden elements such as coins or from hundreds of players over the Internet via a crowdsourcing powerups. Rocks (which look like squares of bricks) may experiment. Complete games were logged, including the levels hide a coin, a powerup, or simply be empty. Mario can the players played and what actions the players took at which smash rocks only when he is in big mode. time, enabling complete replays. The following three types of The generation of levels with speciﬁed values for all pa-

1 http://www.marioai.org.

rameters is guaranteed by the generator; while generating the

SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 279

Fig. 3. An example level generated and used to collect the data.

levels, and whenever an item is to be added, these parameters

C. Reported Player Experience Data

are checked and the item is placed accordingly. We rely on self-reporting annotations based on previous re- Game designers, who are familiar with 2-D platform games, search in which very accurate player experience models of self-

and, in particular, SMB, have been consulted when selecting the report affective states have been constructed [27], [28]. How- controllable features. The features have been chosen based on ever, a number of limitations are embedded in the players’ self- their impact on the investigated affective states and their gen- reporting experience modeling, including noise due to learning erality to other 2-D platform games. Note that consulting game and self-deception, disruption to gameplay experience, and sen- designers does not conﬂict with the bottom–up approach fol- sitivity to memory limitations. In order to minimize these ef- lowed, which derives models of player experience based on data fects, we rely on annotated player experience data collected via collected from players while the designers’ knowledge is incor-

a four-alternative forced choice questionnaire presented after porated only when designing the experiment.

small game sessions. The questionnaire asks the player to report The ﬁrst two features appeared in our previous studies [19], the preferred game for three user states: engagement, challenge,

[27], whereas the four new features are explored for the ﬁrst and frustration. The selection of these states is based on earlier time here.

game survey studies [27] and our intention to capture both af- Two states (low and high) are set for each of the controllable fective and cognitive/behavioral components of gameplay ex-

parameters above, except for enemy placement, which has been perience [16]. assigned three different states allowing more control over the

The questionnaire protocol gives the players the following difﬁculty and diversity of the generated levels. The total number alternatives: of pairwise combinations of these states is 96. This number can

• game A[B] was/felt more E than game B[A] (cf., two-al-

be reduced to 40 by analyzing the dependencies between these

ternative forced choice);

features and eliminating the combinations that contain indepen- • both games were/felt equally E; or dent variables. All levels have been checked before starting the

• neither of the two games was/felt E; data collection in a way that assures their compatibility with the where E is the affective state under investigation.

intent parameters assigned. An example level generated by one possible combination of the controllable features is presented in

XPERIMENTAL IV. E P ROTOCOL Fig. 3.

The game survey study has been designed to collect subjec- In addition to the direct (controllable) features, sequential tive affective reports expressed as pairwise preferences of sub-

content features are also extracted. The topology of the levels jects playing the different variants (levels) of the testbed game is converted into sequences of numbers representing different by following the experimental protocol proposed in [29]. types of game items, and sequence mining techniques are ap-

According to the protocol, each subject plays a predeﬁned set plied to extract useful patterns from the resulting sequences (see of two games. The games played differ in the levels of one or

Section V-B). more of the six controllable features presented previously. The game sessions presented to players have been con-

B. Gameplay Data structed using a level width of 100 IMB units (blocks), about While playing the game, different player actions and inter- one-third of the size usually employed when generating levels actions with game items and their corresponding timestamps for IMB games in our previous experiments [19], [27]. The have been recorded. These events are categorized in different selection of this length was due to a compromise between a groups according to the type of the event and the type of inter- window size that is big enough to allow sufﬁcient interaction action with the game objects. The events recorded are the fol- between the player and the game to trigger the examined lowing: level completion event; Mario death event and cause affective states, and a window which is small enough to set an of death; interaction events with game items such as free coins, acceptable frequency of an adaptation mechanism applied in empty rock, coin block/rock, and power-up rock/block; Mario real time, aiming at closing the affective loop of the game [30]. enemy kill event associated with the type of actions performed

A previous study [20] has been conducted to test whether a to kill the enemy and the type of enemy; changing Mario mode smaller level width than the chosen one can be used to construct (small, big, or ﬁre) event; changing Mario state (moving right, models for predicting players’ engagement from game content left, jump, run, duck) event; and the full trajectory of Mario as with higher accuracy than the models constructed based on

a combination of events. information from levels with the chosen width. The study con- Both direct and sequential gameplay features have been ex- cluded that the models perform best when trained on features tracted based on the aforementioned events. The detailed list of extracted from levels with the selected width rather than from features is presented in Section V.

levels with half or one-third of the width.

280 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013

TABLE I F EATURES E XTRACTED F ROM D ATA R ECORDED D URING G AMEPLAY

A total number of 780 players participated in this crowd- style. Alternatively, analyzing sequences of game content and sourcing experiment. Participants’ age covers a range between players’ behavior yields patterns that might be directly linked

16 and 64 years (31.5% females), while their location in- to player experience. For example, we would like to extract fea- cludes Denmark (46.11%), Greece (8.9%), Ireland (1.48%), tures that encapsulate whether a player performed a particular the United States (3.34%), Holland (0.74%), Finland (1.36%), action before or after encountering a speciﬁc in-game situation. France (0.37%), Syria (0.25%), Sweden (0.37%), Korea

Modeling players’ experience based on features extracted (0.12%), Spain (0.25%), or unknown (36.71%).

from sequential information provides a promising alternative for models constructed based on direct feature extraction, and

EATURE V. F E XTRACTION

by fusing these two types of representations, we anticipate constructing more accurate models of player experience than

In the following, we describe the types of features that we those constructed on one of these forms of data representation have extracted from the recorded content and gameplay data via at a time. direct and sequential feature extraction.

In the following, we describe different criteria for con-

Most of the direct features presented appear in our previous structing sequences from game content, gameplay, and the studies [17], [19]. These features are used in this work due to interaction between the two. We present two sequence mining their relevance for modeling player experience. In this work, approaches and further discuss different setups that can be used we also investigate sequential patterns extracted from gameplay for mining the extracted sequences. data.

Table II presents the different possible approaches that can be

A. Direct Gameplay Features followed to generate different types of sequences. The columns represent the different order and frequency at which information

Several features have been directly extracted from the data is logged. The rows represent what type of data is logged each recorded during gameplay (see Section III). The choice of these time an event occurs. We will be distinguishing the following features is made in order to be able to represent the difference orders/frequencies, while acknowledging that even more ﬁne- between a large variety of IMB playing styles. In addition to grained distinctions are possible. the six controllable game features that are used to generate IMB

: time step. Information is logged at a constant rate levels, the features presented in Table I are extracted from the

(e.g., once per second), regardless of what the player does. gameplay data collected and are classiﬁed in ﬁve categories:

This yields a sequence with a length proportional to the time, interaction with items, interaction with enemies, death,

time taken by the player to play the level. and miscellaneous.

• Block: Information is logged once per block in the level, independent of the time taken by the player to traverse the

B. Sequential Features level. This yields a sequence with a length equal to the We investigate another form of indirectly representing the

level.

gameplay interaction by means of sequences, which allows in- • Gameplay event: Information is logged each time the cluding features that are based on ordering in space or time.

player changes the command issued (pressing/releasing a Gameplay features presented in Section V-A provide a quan-

button or changing direction) or something else happens titative measure of different types of game content and playing

(e.g., Mario changes the mode or stomps an enemy).

SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 281

TABLE II T HE D IFFERENT T YPES OF S EQUENCES T HAT C AN B E G ENERATED .C OLUMNS P RESENT THE T YPE OF E VENT TO B E R ECORDED ,W HILE R OWS P RESENT W HEN TO R ECORD THE E VENT .T HE C OMBINATIONS M ARKED W ITH AN XA RE THE O NES I NVESTIGATED IN T HIS P APER

Fig. 4. Snapshot from a level and the corresponding (a) platform structure sequence representation

and (b) enemies and items sequence representation .

TABLE III

T HE D IFFERENT T YPES OF E VENTS C ONSIDERED W HEN G ENERATING THE

S EQUENCES AND T HEIR G RAPHICAL R EPRESENTATION

four different possible values: 0, 1, 2, and 3, corresponding, respectively, to nonexistence of either enemies or items, the existence of an enemy ( ); the existence of an item ( );

and the existence of an enemy and an item ( ). Fig. 4(b) illustrates an example level segment where the aforementioned four states are presented.

• Content corresponding to gameplay events : We explored another method in which game content at the speciﬁc player position is recorded whenever the player performs an action or interacts with game items. In this case, different content events are used: increase in platform height ( ); decrease in platform height ( ); the existence of an enemy ( ); the existence of a coin, a block, or a rock ( ); the existence of a coin, a block, or a

rock with an enemy ( ); and the beginning ( ) and the The information logged each time anything is logged can be

end ( ) of a gap.

either game content , player (gameplay) behavior

2) Sequential Gameplay Features: Sequences representing both game content and players’ behavior

, or

different players’ behavior have been generated by recording We will focus the discussion for the rest of the paper on the key pressed/released events (action event) or interaction with

ﬁve sequence types marked with an X in Table II. Although we item events. The action event might represent a simple action only investigate a few sequence types of all those available, our performed, such as pressing an arrow key to move left or right; sample provides a variety of options that cover different aspects or more complex players’ behaviors that can be achieved by of playing experience.

pressing a combination of keys at the same time (e.g., jumping Once we know what to sample and when, the question re- over a big gap requires the player to press the run and jump keys

mains how to turn this information into sequences using a low- together for a number of time steps). The following list describes cardinal alphabet. Below, we discuss how to do this for levels the different events that have been considered (Table III): and for gameplay traces.

• pressing an arrow key to move right, left, or duck ;

1) Sequential Content Features: Sequences capturing dif-

• pressing the jump key ;

ferent information about level geometry have been extracted by • pressing the jump key in combination with right or left key converting the content of the levels into numbers representing

different types of game items. Three different representations • pressing the run key in combination with right or left key of game content have been investigated. The full list of events

considered as well as their graphical representation is presented • pressing the run and jump keys in combination with right in Table III.

or left key

• Platform structure : A sequence is generated by com-

• not pressing any key .

paring the height of each block across the level with the

• winning the game

height of the previous block and recording the following

• losing the game ;

values: 0 if no difference found ( ); 1 if there is an in-

; crease in the platform height ( ); 2 if there is a decrease in

• killing an enemy by stomping

• unleashing a koopa shell ;

the platform height ( ); and 3 and 4 to mark the beginning

• changing Mario mode .

( ) and the ending ( ) of a gap, respectively. Fig. 4(a) Fig. 5 presents the graphical interpretation for most of the presents a part of a level and the corresponding platform actions that can be performed. structure sequence representation.

In this paper, we will consider two time window values for • Enemy and item placement : The term “items” refers to generating sequences: 0.5 s

and 0.25 s meaning the coins and the different types of boxes scattered around that an event will be registered every half or quarter of a second, the level. The existence and nonexistence states for ene- respectively. We also consider sequences generated whenever mies and items have been combined together, resulting in the player switches the action . (Note that IMB is a fast-paced

282 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013

Mining sequences across multiple time series of data—content corresponding to gameplay events

, players’ behavior ,

and multimodal sequences

—can be achieved via the GSP algorithm [33]. Martinez and Yannakakis [34] have used GPS to obtain frequent subsequences across multiple modalities of player input (physiology and game-based context).

In the following, we provide a short list of frequent subsequence mining deﬁnitions and give a brief description of the two algorithms and the way they have been used in this paper.

Fig. 5. A graphical representation of the different actions that can be performed by the player: (a) standing still

; (b) moving right

; (c) moving left

A. Deﬁnitions

(d) ducking ; (e) jumping

; (f) jumping right

; and (g) running right

A data sequence is a sample of a sequential data set where each sequence consists of a number of events, each one associated with a time stamp. The events are ordered by increasing

game in which the player could in theory perform an action time. every 1/24 s).

is a nonempty set of simultaneous The purpose of recording these events is that players’ be- events denoted by

A sequence pattern

where is an event. A data havior and playing style can be analyzed by looking at events sequence supports a sequence pattern if and only if it contains

generated by each player and how frequently each of these all the events present in the pattern in the same order but not events occurs. Generating a sequence combining these events necessarily consecutive. in a timely manner provides a more in-depth insight about

is the minimum number of times more complex behavior patterns that might have an impact on

A minimum support

a pattern has to occur in the data sequences to be considered players’ experience.

frequent. If the number of occurrences of in the data sequences The resulting sequences of players’ behavior have a wide exceeds

a frequent pattern, and, in this case, variety, both in terms of length and structure, which reﬂects the fraction of data sequences that support is refereed to as the diversity of players’ playing style and complicates any se- support count

, we call

quence mining algorithm that can be applied to extract useful information. This diversity is reﬂected on the normalized com-

B. SPADE

A modiﬁed version of the SPADE algorithm [32] has been test for structural similarity between the sequences. The results implemented to extract frequent subsequences of different

pression distance (NCD) measure [31] that has been applied to

of applying this function on each pair of the action sequences game events. Game content for the 40 levels has been con- showed a high dissimilarity between the sequences (NCD

verted into numbers representing different types of content in 71.32% of the cases).

events, as described in Section V-B1. Different subsequence

3) Fused Sequential Features of Both Game Content and lengths and minimum support threshold values have been Gameplay Data: Game content and players’ behavior events explored. A minimum support threshold of 20 has been used, have been fused together to generate bimodal sequences

. meaning that each subsequence should occur in at least half of Events from the two modalities have been extracted, with their the levels to be considered frequent. corresponding time stamp, and then logged in the temporal order. The generated event contains information about the

C. GSP

game content at the speciﬁc position in the game where the The GSP algorithm [33] solves the sequence mining problem gameplay event occurred (which is one of the events mentioned based on an a priori algorithm with a number of generaliza-

in Section V-B1, or none if no content event from the list tions. Using GSP, we can discover patterns with a predefined happens to occur at this specific position) along with the type minimum support, define time constraints within which adja- of gameplay event.

cent events can be considered elements of the same pattern, and specify a time window for events from different modalities to

EQUENCE VI. S M INING

be considered as synchronous events.

Sequence mining techniques have been applied to extract GSP generalizes the basic deﬁnition of frequent sequential useful information from the different types of sequences gen- pattern by introducing two relaxation schemes. erated. Two algorithms for frequent item set mining have been

• Sliding window: This generalization allows the items of a implemented to ﬁnd frequent sequence patterns within the

pattern to be contained in the union of the items belonging data set of sequences: the Sequential PAttern Discovery using

to different time series. According to this relaxation, a se- Equivalence classes (SPADE) algorithm and the generalized

, where and can be contained in dif- sequential pattern (GSP) algorithm. The SPADE [32] algorithm

quence

ferent time series, is allowed to be counted as a support for has been used to mine single-dimensional sequences that repre-

a subsequence as long as the time difference between sent game content independently of players’ behavior, namely,

is less than the user-speciﬁed window size . platform structure

and

• Time constraints: This relaxation speciﬁes the time gap be- algorithm has been used in our previous work [20] for mining

and enemy and item placement

. This

tween consecutive events from one or two different time content sequences, and is used in the same way in this paper.

series. Given a user-deﬁned gap

, a data sequence

SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 283

supports a pattern of two consecutive events

if and

TABLE IV

only if and occur in the sequence of the speciﬁed

N UMBER OF F REQUENT S EQUENTIAL P ATTERNS F OUND FOR D IFFERENT

S EQUENCE L ENGTH V ALUES A CROSS D IFFERENT T YPES OF S order and with a time difference lower than the speciﬁed EQUENCES

I S 500 AND

1 s). T HE C OLUMNS S TAND FOR :

C ONTENT C ORRESPONDING TO G AMEPLAY E VENTS ,

The GSP algorithm is used for mining sequences that rely on

G AMEPLAY B EHAVIOR

,G AMEPLAY B EHAVIOR

AND E players’ behavior VERY , since it allows more generalized

R EGISTERED E VERY

0.5 s

0.25 s

AND M ULTIMODAL

frequent patterns to be found by exploring different

S EQUENCES OF G AME C ONTENT

and it is also used for mining multimodal sequences

as, by

AND P LAYERS ’B EHAVIOR

using , we can discover simultaneous events from two different modalities. Different

values have also been explored to obtain a reasonable tradeoff between considering patterns that are generalized over all players and more speciﬁc patterns. For the experiments presented in this paper, we use a

of 500 which

forces a sequence pattern to occur in at least 31.8% of the sam- ples to be considered frequent.

The parameter deﬁnes the threshold under which events from two different modalities can be considered as simultaneous events. In this paper, we use

In order to lower the feature space dimensionality and the for this parameter has been chosen as a tradeoff between a small computational cost of searching for relevant features, we chose window size that does not consider simultaneous events, and a to use only frequent sequences of length three. window size that processes clearly asynchronous events from two modalities as events happening in a very small interval. For the rest of this paper, we will use parentheses to group simulta-

1 s. The value

REFERENCE VII. P L EARNING FOR M ODELING neous events.

P LAYING E XPERIENCE The

parameter is used to set up the time gap between In order to construct models that approximate the function two events to be considered as belonging to the same pattern. between gameplay features, controllable features, and reported

This parameter has a great impact on the number of frequent affective preferences, we use neuroevolutionary preference patterns that can be extracted. By assigning a large value to this learning. In other words, we use artiﬁcial evolution for shaping parameter, we allow more generalized patterns to be taken into artiﬁcial neural networks (ANNs) whose output matches the account, and, as a consequence, a large number of sequences reported (pairwise) preferences of the players. will be counted as

We proceed in a three-phase procedure in order to ﬁnd net- values is that it allows considering less informative pat- works that predict preferences with high accuracy.

. Another drawback for using large

terns. Correctly tuning this parameter has a large impact on the

1) Feature selection: In the ﬁrst step, we use single-layer informativeness of the resulting patterns, specially when mining

perceptrons (SLPs) to approximate the preferences of the multimodal sequences. For instance, if we use

players; sequential forward selection (SFS) [35], [36] is pattern ( , , ) can be supported by any sequence in which

3 s, the

applied to generate the input vector for the SLPs by ﬁnding the player jumps, moves right, and encounters an enemy within

the subset of features that yields the highest performance.

a 3-s interval (note that within this interval, the player might en- The quality of a feature subset is determined by threefold counter more than one enemy or a gap between the jumping and

cross validation on unseen data.

moving right events which makes this pattern somehow mis-

2) Feature space expansion: The subset of features derived leading). The experiments conducted for tuning the value of this

from SFS using SLP is then used as the input of small mul- parameter showed that a

tilayer perceptron (MLP) models (containing one layer of between the number of patterns extracted and their expressive-

of 1 s provides a good tradeoff

two hidden neurons), and SFS is used again to extract addi- ness value.

tional features from the set of remaining features, allowing features with more complicated nonlinear relationships to

D. Length of Sequences

be selected.

3) Setting ANN topology: Once all features that contribute Table IV presents the different types of sequences and the

to accurate simple MLP models are found, we optimize number of frequent subsequences found for a number of dif-

the topology of models using neuroevolutionary prefer- ferent sequence generation methods. As can be seen from the

ence learning. We start with a simple MLP topology of one table, the number of extracted subsequences is quite large for

hidden two-neuron layer; then, we increase the number of sequences containing information about players’ behavior

neurons to ten by adding two neurons at each step. Further, and the search space for automatic feature selection increases

we investigate MLPs with two hidden layers, with up to ten substantially when fusing content and gameplay events for gen-

neurons in the ﬁrst and second layers. Again, the number erating sequences

; more than 2000 subsequences of length of hidden neurons starts at two and increases by adding two six have been extracted from the players’ behavior and multi-

neurons at each step; this sums to 30 different MLP topolo- modal sequences.

gies which are tested for each input vector.

284 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 5, NO. 3, SEPTEMBER 2013

TABLE V

T OP F IVE S TATISTICALLY S IGNIFICANT C ORRELATION C OEFFICIENTS B ETWEEN R EPORTED E NGAGEMENT ,F RUSTRATION AND C HALLENGE , AND E XTRACTED

F EATURES .T HE S IGN B EFORE THE F EATURES I NDICATES P OSITIVE

OR N EGATIVE

C ORRELATION .F OR E XAMPLE , THE T IME R EQUIRED TO C OMPLETE THE L EVEL W AS F OUND TO B E P OSITIVELY C ORRELATED W ITH E NGAGEMENT ,W HILE A S EGMENT OF C ONTENT W ITH T WO A DJACENT D ECREASES IN THE P LATFORM H EIGHT W AS F OUND TO B E N EGATIVELY C ORRELATED W ITH E NGAGEMENT

AS C AN B E S EEN F ROM THE F IRST R OW

The performance of each MLP is obtained through the av- Nineteen direct features are significantly correlated, with en- erage classification accuracy in three independent runs using gagement with some of them also strongly correlated with frus- threefold cross validation. Parameter tuning tests have been con- tration and challenge, while 21 features are significantly cor- ducted to set up the parameters’ values for neuroevolutionary related with frustration, and 17 features with challenge. The user preference learning that yield the highest accuracy and features that are strongly correlated with engagement and not minimize computational effort. A population of 100 individuals with challenge are mostly related to the interaction between the is used, and evolution runs for 20 generations. A probabilistic player and blocks (mainly powerups). These features point to rank-based selection scheme is used, with higher ranked indi- the task of searching for powerups, in which the player has to viduals having higher probability of being chosen as parents. destroy blocks looking for powerups that, as a result, change Finally, reproduction was performed via uniform crossover, fol- Mario’s mode, as being particularly engaging. lowed by Gaussian mutation of 1% probability.

The avatar death feature (signifying that Mario loses a life) is the most signiﬁcantly correlated with both frustration and chal-

NALYSIS VIII. A

lenge, indicating a strong relationship between death and these two player experience states.

This section provides a thorough analysis conducted for Regarding changes in platform height patterns , only the testing simple and more complex relationships between the features presented in Table V for engagement and frustration are

features extracted and the three reported states of player expe- signiﬁcantly correlated with engagement and frustration, while rience. We further investigate the generality of the proposed

15 features are strongly correlated with challenge. Seven and approach by comparing the models constructed on the pre-

15 out of the features that correlated best with frustration and sented data set and the ones constructed in our previous work challenge, respectively, relate to the presence of a gap, while

in terms of the models’ performance and the features selected. engagement is signiﬁcantly correlated with only four features that indicate a gap. It is interesting to note that despite the small

A. Linear Relationships pattern length (three), almost all features presented for the three

We performed an analysis for exploring statistically signif- emotional states require two or three gameplay actions to be icant correlations ( -value

5%) between players’ expressed performed.

preferences and extracted features. Correlation coefficients are Ten out of the 12 features from (item and enemy place- obtained by following the method proposed in [17]. According ment) are significantly correlated with engagement, while only to this method, correlation coefficients are calculated through three and two features correlate significantly with frustration

where is the total number of game and challenge, respectively. The ﬁrst observation is that it is pairs where players expressed a clear preference (

obviously much easier to predict engagement from than to or

, predict challenge and frustration due to many more features if the player preferred the game with the larger value of the ex- signiﬁcantly correlated to engagement, and the correlations are amined feature; and

) for one of the two games;

, if the player preferred the other stronger. Most features that correlate with engagement point to game in the game pair . The top ﬁve signiﬁcantly correlated the placement of items and enemies. This is not the same for features for each emotional state are presented in Table V.

frustration, which demonstrates less signiﬁcant effects, and the

SHAKER et al.: CROWDSOURCING THE AESTHETICS OF PLATFORM GAMES 285

majority of those that do focus on the existence of an enemy; The correlations calculated and analyzed above provide basic features that correlate with challenge highlight the importance analysis with linear relationships between the extracted features of the relative placement of items and enemies in the challenge and reported emotions. However, these relationships are most perceived.

likely more complex than those that can be captured by linear Large subsets of features of players’ actions

are signif- models. The aim of the following section is to analyze the non- icantly correlated with engagement, frustration, and challenge linear relationships found using the players’ experience models. (99, 72, and 74, respectively). All features that are highly correlated to frustration are also correlated to challenge. It is worth

B. Nonlinear Relationships

mentioning that the features that correlate the most with en- In this section, we base our analysis on which features were gagement are also significantly correlated with frustration and selected by the SFS algorithm for constructing neural-network- challenge but at different significance levels. It appears that the based player experience models. As these models take nonlinear number of jumps the player performs plays an important role relations into account, the features selected for these models in predicting engagement, as this appears in all top-five action might reveal more complicated and, in a sense, deeper relation- patterns combined in most of the cases with moving right and ships, but the analysis is also less straightforward. pressing the speed button. This can also explain the significance

Crowdsourcing the Aesthetics of Platform Games