ACM - Computers in Entertainment

DarwinTunes 2.0: Evolving Music within a Breeding Game

By Stephen J. Welburn, Carl J. Bussey, Armand M. Leroi, Matthias Mauch, Robert M. MacCallum

Interactive genetic algorithms have been widely applied to music generation, but they are typically aimed at single users and are thus confined to small populations that can explore only a limited creative universe. In DarwinTunes we previously showed that larger populations can be maintained using the selective input of many thousands of users. However, motivating large numbers of users proved difficult, partly because the music created in early generations often sounds unpleasant. Significant participation could only be obtained for short periods after coverage on radio, news sites and social media. In an attempt to achieve sustained higher levels of engagement, DarwinTunes 2.0 was developed with two main novelties: (1) a breeding paradigm (artificial selection), which gives each participant ownership of the audio they generate, and (2) a game-like interface in which participants garner points when their loop is selected by other users for breeding. We found that the new implementation sustained a small but active user base without active media management. Throughput, measured as the amount of evolved audio auditioned by users, was several times higher than in the previous version. Timbral analysis of the evolved music using Music Information Retrieval techniques showed that changes through time were more pronounced than differences between users. Given enough simultaneous users, the new game-based implementation, in which individual users breed loops, supports overall musical evolution driven by audience selection that is faster and arguably more engaging than before.

CCS Concepts: • Human-centered computing → Collaborative interaction; • Applied computing → Sound and music computing; • Computing methodologies → Genetic programming;

Additional Key Words and Phrases: Evolutionary algorithms, user interaction, music composition



Algorithmic music composition using Darwinian-like evolutionary principles [Miranda and Biles 2007] has a history stretching back more than 20 years. Typically, evolutionary music (EM) algorithms are implemented as interactive genetic algorithms, whereby one or more human judges assign fitness values to individuals and thus control, to some extent, their reproductive success. Given the potential for EM tools to enable anyone, regardless of their talent for composition, to create music, it is surprising that EM has not become a widespread phenomenon, particularly since the advent of the Internet and social networks.

One barrier to the uptake of EM is the fatigue experienced by human fitness evaluators who may have to listen to hundreds or thousands of musical fragments during the process. A partial solution to this is to provide computational assistance to the human participant, for example by training machine learning classifiers to mimic previous choices by the user [Johanson and Poli 1998; Tokui and Iba 2000], or by filtering individuals based on the “music-like” distribution of features extracted via Music Information Retrieval (MIR) techniques [Galanter 2013]. Another approach is to distribute the workload across multiple users. Faster progress gained by parallelising the human fitness evaluation should delay the onset of fatigue. In our previous work, in a project entitled DarwinTunes [MacCallum et al. 2012], we found that several thousand users could be enticed to participate for short periods in an EM experiment, referred to here as DT1. However, we did not manage to engage many users in the long term. We concluded that short-term interest was driven by our users’ interest in crowd-science, while longer-term commitment was deterred by deficiencies in usability and user engagement/entertainment value.

A major issue with DT1’s user interface was the requirement for users to play an Internet radio stream in a dedicated audio player, such as iTunes, and mentally transfer information (tune identifiers) from there to a web form in order to provide feedback to the algorithm. The motivation for this Byzantine implementation was to allow gapless playback of the seamless four-bar loops generated from each population individual for as many users as possible (HTML5 Audio was not widely available until 2011).

Another problem in DT1 was a lack of perceivable evolutionary progress from the user’s point of view. Biological and algorithmic evolution are slow processes, and when using only human fitness evaluation the rate of progress in EM is limited by the rate at which users can audition the loops. DT1, with its single audio stream, had no parallel auditioning capability, and therefore evolution could not be accelerated during periods of high visitor numbers. Thus, with a constant period of approximately 20–30 minutes between the auditioning of a loop and its children, few users would participate for long enough to notice any qualitative changes in the music. Without this feedback, and with the added challenge of listener fatigue, we suspect that users lacked the motivation to participate for longer periods.

One of the principal aims of the DT1 experiment was to assess the potential for audience selection to drive the evolution of appealing music. Therefore it was decided to start with a population of relatively unappealing tunes—generated at random but constrained within Western musical norms, including a 4/4 time signature and the standard Western 12-note scale. Each musical individual was only four bars long, and all individuals shared the same tempo. Furthermore, the evolutionary sound synthesis in DT1 was extremely constrained. Only genetically determined sine-wave additive synthesis was permitted, and typically bell or marimba-like timbres predominated. The rationale for not including synthesizer presets or samples of human-designed instruments was to ensure that selection in early generations reflected general musical appeal rather than an obvious bias towards popular or emotive instruments. These implementation choices therefore limited the entertainment value of the experiment—at no point were DT1 loops going to compete with “real music”, and the likelihood of reaching a mass audience was small.

Finally, DT1 provided little or no sense of community for its participants. The experimental design intentionally discouraged contact between human judges in order to obtain independent fitness ratings and avoid “viral” peer dynamics [Salganik et al. 2006]. The lack of built-in social features most likely hindered the development of DT1 into a self-sustaining phenomenon and also made it look primitive and dated.

Here we describe DarwinTunes 2.0 (DT2), the successor to the DT1 experiment. In this paper we focus on the implementation choices made with the intent to improve user engagement.


The main differences between DT2 and its predecessor are: i) a simplified user interface (UI); ii) a direct choice of breeding partners for users; and iii) gameplay and scoreboards.

The new UI was designed for the web but optimized for mobile and tablet usability from the outset. Figure 1 shows the layout of the UI. The left and right columns present eight individuals to the user via a vertical circular scrolling mechanism similar to the wheels in a fruit machine. When an individual is in the central position its audio is played via HTML5 Audio, and it can be stopped and started with a single click or touch. Thus no external audio application is required.

Gameplay revolves around users, or more appropriately “players”, having ownership of the loops they produce by breeding specific pairs of individuals together and choosing one of the eight generated offspring to survive (Figure 1B). For the first breeding event, players choose two loops belonging to other players, but thereafter they breed their most recently selected offspring with a loop selected from those belonging to other recently active players on the left hand side of the interface (Figure 1A). When another player’s loop is chosen for breeding, that player gains one point. The intention here was to provide the motivation for players to breed/select loops that would be appealing to other users.
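The round structure described above—breed two loops, audition the offspring, select one, award a point to the chosen partner—can be sketched as follows. This is our own illustrative model, not the DT2 codebase; the `Player` class, the tuple-based "genome", and the gene-wise recombination are all simplifying assumptions (DT2's actual genomes are trees; see below).

```python
import random

class Player:
    """Minimal model of a DT2 player: one current loop and a score."""
    def __init__(self, name, loop):
        self.name = name
        self.loop = loop  # the player's most recently selected offspring
        self.score = 0    # incremented when others choose this loop for breeding

def breed(parent_a, parent_b, n_offspring=8, rng=random):
    """Hypothetical recombination: each offspring takes each 'gene' from
    one parent or the other at random (a stand-in for DT2's tree crossover)."""
    return [tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))
            for _ in range(n_offspring)]

def breeding_event(player, partner, choose, rng=random):
    """One round: breed the player's loop with a partner's loop, let the
    player pick one of eight offspring, and award the partner a point."""
    offspring = breed(player.loop, partner.loop, rng=rng)
    player.loop = choose(offspring)   # player auditions and selects one child
    partner.score += 1                # partner earns a point for being chosen
    return offspring
```

The key design point is that the partner's reward is decoupled from the player's choice of child: the partner scores simply for being selected as a mate, which is what motivates players to keep their own loop appealing.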

From an individual player’s point of view, DT2 is a breeding game, evolving loops by artificial selection. However, in the broader context genes coding for appealing loops should spread throughout the population of recently active users’ loops more rapidly than the genes for less appealing music. Thus there is an element of evolution by natural selection, though this does depend on a large sustained user base.

To maximize participation we made the login procedure as simple as possible. There is no entry questionnaire or email confirmation required—users just pick a nickname, and a password if they want to play again with the same nickname. A downside to this relaxed policy is that users can easily create multiple nicknames and game the scoring system by breeding only with their alter egos. However, we wanted users to create their own breeding strategies, and allowing multiple nicknames facilitates this.

Player scores are presented on a separate webpage that continuously updates. The scoreboard shows all-time scores and also a “recent” score calculated considering only the 200 most recent breeding events. Both the game and scoreboard pages are implemented in HTML5 and Javascript and interact with the same back-end data API via AJAX calls.
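The “recent” score could be computed as in this sketch, which tallies, over a sliding window of the last 200 breeding events, how often each player's loop was chosen as a mate. The event-log shape is a hypothetical one, not the actual DT2 schema.

```python
from collections import Counter

def recent_scores(breeding_events, window=200):
    """Score each player by how often their loops were chosen for breeding
    within the last `window` events. Each event records the partner whose
    loop was picked as a mate."""
    return Counter(e["partner"] for e in breeding_events[-window:])
```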



The genetic representation of DT2 loops is essentially identical to that previously used in the DT1 experiment, with just a few new types of “genes” added. Very briefly, tree-like or branched genomes are randomly initialized following a grammar such that the genome always transforms into syntactically correct computer code. When executed, this code generates the audio data for a loop, which is the phenotype that corresponds to an individual’s genotype. The use of a tree-like genome allows the number of instrument tracks, notes in a melody, waveforms in an additive synth, and effects in an effect chain to be evolvable, and an infinite universe of four-bar loops to be explored. Notes can be placed at semiquaver resolution (triplets are not possible) and a small before- or after-beat time offset is also specified by the genotype, to allow less machine-like music to evolve.
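The genotype-to-phenotype mapping can be illustrated with the following toy version: a grammar-constrained branched genome whose leaves carry note genes (pitch, semiquaver position, small timing offset), and whose "execution" flattens it into note events. The node types, scale, and parameters are our own invention for illustration; DT2's actual grammar, gene types, and audio rendering are far richer.

```python
import random

# Toy scale: MIDI pitches of C major, standing in for DT2's 12-note constraint.
SCALE = [60, 62, 64, 65, 67, 69, 71]

def random_genome(rng, depth=0, max_depth=3):
    """Grow a branched genome: internal nodes hold children (tracks, phrases),
    leaves hold (pitch, semiquaver_position, offset) note genes. The grammar
    guarantees every genome renders to a valid list of note events."""
    if depth >= max_depth or rng.random() < 0.3:
        return ("note", rng.choice(SCALE),
                rng.randrange(64),            # semiquaver slot in a 4-bar loop
                rng.uniform(-0.02, 0.02))     # small before/after-beat offset
    return ("branch", [random_genome(rng, depth + 1, max_depth)
                       for _ in range(rng.randint(1, 4))])

def render(genome):
    """'Execute' the genome: flatten the tree into (pitch, time_in_beats)
    note events -- the phenotype corresponding to this genotype."""
    if genome[0] == "note":
        _, pitch, pos, offset = genome
        return [(pitch, pos / 4 + offset)]   # 4 semiquavers per beat
    return [event for child in genome[1] for event in render(child)]
```

Because branching depth and fan-out are themselves random, the number of notes per genome is evolvable, mirroring how DT2's tree genomes let track and note counts evolve.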




Figure 1.   Screenshots of the DT2 music breeding game. Panel A shows the selection from other users’ loops of a breeding partner for the current player’s tune “Your DarwinTune”. Panel B shows the selection of a single child from eight offspring to become “Your DarwinTune” for the next round of breeding. Clicking or touching the solid-colored circles plays that individual’s audio loop. A new user selects two loops from the left hand side when breeding for the very first time (not shown).


While DT1 used only additive sound synthesis, in order to concentrate audience selection on pure sounds with minimal human origins, DT2 additionally provides a large number of orchestral instrument and drum kit samples that the genetically specified music can make use of. A genome can encode just drums, just tuned instrument sounds, or both. In addition, a voice synthesizer fed with genetically specified phonemes provides a further source of input audio for sound generation. Three new effects were added: noisify (adds Gaussian noise), distortion, and a delay line, taking the total number of effects available in DT2 to 14. These audio synthesis enhancements, we felt, would create more accessible music and hopefully improve user engagement and recruitment. In DT1, the computational efficiency of mating and audio rendering was not critically important because loop generation was decoupled from loop auditioning; if loop generation lagged behind loop auditioning, the DT1 audio stream would replay old loops to users in order to avoid interruptions to the listening experience. In DT2, however, loops must be mated and rendered as quickly as possible because the user is waiting to audition the offspring. The genetic recombination code and some previously slow effects were therefore optimized to reduce the waiting time, and the genetic and rendering operations were run in parallel on appropriate hardware.

As with DT1, the loops in DT2 are constrained to four bars of 4/4. The fixed tempo chosen for DT2 is 130 beats per minute.



DT2 was developed primarily in response to an invitation from the Discovery Festival (DF) in the Netherlands. DF asked us to present an evolutionary music installation for their evening science expo and social event. A major challenge was to demonstrate recognizable progress in the evolution of music under listener selection lasting just a few hours. This was the primary motivation to abandon the single audio stream in favor of distributed parallel auditioning by multiple users.

DT2 was launched exclusively at the DF event on the evening of 27 September 2013. We had the use of eight tablets with two sets of headphones per device in an area with sofas and beanbags and a large screen showing the scoreboard. Additional participation was possible via visitors’ own smartphones using the event’s wireless Internet connection. Access was widened to global Internet users later that night. During the first few hours, despite a high level of interest and continuous participation on all devices, musical evolution was neither rapid nor recognizable. Two factors were most likely responsible: the noisy environment made it difficult for users to hear the differences between loops, and recombination only very rarely mixed parental contributions to offspring, so users did not feel that they were actually breeding loops. The experiment was terminated on 6 October 2013 after no significant improvement in usage or musical appeal was observed.

Some audio from the DF experiment is available on SoundCloud [SoundCloud 2013] but we do not present detailed results here. Instead we focused on making further improvements, as detailed below.



Prior to relaunch at the Anonymous Festival (AF, 10 May 2014) several fixes and new features were implemented. The recombination algorithm was changed such that up to three track nodes are preferentially chosen as crossover points (typically, two genomes are recombined at 20-30 crossover points). This change was empirically determined to generate a good mix of parental musical material in the eight offspring from each mating.
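The crossover-point selection change can be sketched as below: from a flattened list of genome nodes, up to three track nodes are chosen preferentially, and the remaining crossover points are filled at random from the other node types. The node tagging, counts, and data layout are illustrative assumptions, not DT2's internal representation.

```python
import random

def pick_crossover_points(nodes, rng, n_track=3, n_total=25):
    """Sketch of the DT2 recombination tweak: preferentially pick up to
    `n_track` track nodes as crossover points, then fill the remainder
    (to roughly 20-30 points in total) from the other node types."""
    tracks = [node for node in nodes if node["type"] == "track"]
    others = [node for node in nodes if node["type"] != "track"]
    chosen = rng.sample(tracks, min(n_track, len(tracks)))
    remainder = n_total - len(chosen)
    chosen += rng.sample(others, min(remainder, len(others)))
    return chosen
```

Biasing crossover towards track boundaries means whole instrument tracks tend to cross between parents, which is why offspring audibly mix parental material instead of sounding like one parent with invisible tweaks.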

The algorithm for presenting other users’ loops for mating was adapted to allow a social network to emerge among the players: previously all the loops were from other users selected at random, but in the new social implementation approximately half come from users whose loops the player has previously chosen for breeding.
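The partner-presentation change might look like the following sketch: roughly half of the candidate slots are filled from the player's breeding history, the rest from other active users at random. The function name and the history structure (a mapping from player to the set of past partners) are our own assumptions.

```python
import random

def candidate_partners(player, all_players, history, rng, n=8):
    """Sketch of DT2's social partner selection: about half of the `n`
    presented loops come from users this player has bred with before,
    the rest from other recently active users chosen at random."""
    past = history.get(player, set())
    previous = [p for p in all_players if p != player and p in past]
    fresh = [p for p in all_players if p != player and p not in past]
    picks = rng.sample(previous, min(n // 2, len(previous)))
    picks += rng.sample(fresh, min(n - len(picks), len(fresh)))
    rng.shuffle(picks)
    return picks
```

Falling back to random users when the history is short means a brand-new player still sees a full set of candidates.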

To enhance the sense of ownership and involvement, download links for the user’s latest selected loop were made available in WAV, MP3 and Ogg Vorbis formats. Downloaded loops can be combined offline by the player into longer pieces of music.

We presented DT2 at AF at a small table with three tablets and headphones, again with the scoreboard shown on a large monitor (alternating with some explanatory infographics). Again, visitor interest was high and the tablets were continuously busy for approximately five hours.

A few weeks after the relaunch, a streaming Internet radio service was introduced. The streaming service does not allow users to provide feedback or perform matings; it simply plays loops from the 24 most recent players in a continuous manner, with those usernames displayed in the client audio player. The radio stream is listed in the iTunes directory in the Electronica section. Later, in October 2014, the play order of loops was made smoother and more progressive using self-organizing map clustering of a small subset of the loops’ raw audio amplitudes—the loops now being streamed in cluster order.
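The cluster-ordered streaming idea can be illustrated with a toy one-dimensional self-organizing map over downsampled amplitude vectors: loops whose vectors map to nearby map units end up adjacent in the play order. The map size, training schedule, and features here are illustrative only; DT2's actual SOM configuration is not described in detail above.

```python
import random

def som_order(loops, n_units=4, epochs=50, rng=None):
    """Toy 1-D self-organizing map: train `n_units` prototype vectors on
    the loops' (downsampled) amplitude vectors, then order loops by their
    best-matching unit so that similar loops are streamed consecutively."""
    rng = rng or random.Random(0)
    dim = len(loops[0])
    units = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_units)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)          # decaying learning rate
        for v in loops:
            best = min(range(n_units), key=lambda u: dist(units[u], v))
            for u in range(n_units):             # update winner + neighbours
                influence = lr if abs(u - best) <= 1 else 0.0
                units[u] = [w + influence * (x - w)
                            for w, x in zip(units[u], v)]

    # Stream order: loop indices sorted by position along the 1-D map.
    return sorted(range(len(loops)),
                  key=lambda i: min(range(n_units),
                                    key=lambda u: dist(units[u], loops[i])))
```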

A Twitter bot was also set up to tweet from the @darwintunes account when new players joined the game, limited to at most six tweets per day. Content was also posted manually on the DarwinTunes Twitter and Facebook accounts. However, no specific attempts were made to connect with the traditional and mainstream media, as we preferred to see if the DarwinTunes game would grow in popularity organically.



User activity data for the 42 weeks following the AF relaunch is presented in Figure 2. After the AF-related peak of activity, a gradual decline in user numbers is seen, although Google Analytics (GA) session counts stay relatively flat. At week 2014-44 a large spike in activity is seen as the result of an unsolicited mention on Reddit’s InternetIsBeautiful page [Reddit 2014]. Overlaid with the data for the period following week 2014-44 is GA data from a similar period during the DT1 experiment following a mention on the website. Although the traffic spikes are of equivalent magnitude, and traffic declines markedly for both DT1 and DT2, the GA session data for DT2 is slightly more robust (GA offers no explanation for the mini-rebound during weeks 2015-03 and 2015-04).

In terms of the amount of audio generated and auditioned, DT2 has been much more successful. Figure 3 shows the number of offspring auditioned, assuming that all eight offspring per mating are auditioned. However, gameplay does not require the user to audition all offspring before selecting a single individual to take forward in the game, so in reality the fraction auditioned may be smaller. But even assuming an auditioning rate of 50%, substantially more loop auditioning activity is recorded in DT2 compared with an equivalent period during the DT1 experiment. We tentatively attribute this success to the user-wise parallelization of auditioning and to the simplification of the user interface.

From watching users play the game and discuss it on social networks, we realized that users were registering multiple nicknames not only to save their own personal “seed bank” of musical genotypes, but also to perform frequent matings between their own loops, either as part of an intentional breeding strategy or in order to game the scoring system. We defined this so-called inbreeding as the situation in which three or more breeding events took place between two specific users, in both directions, during a given week. All loops generated by breeding events between those two users in that week were flagged as inbred, and are marked in green in Figure 3. Although inbreeding does take place regularly, it remains at a relatively low level, and our measure may also capture two friendly users playing collaboratively. Thus we can infer from Figures 2 and 3 that, outside traffic spikes, at least 50 users per week are more or less independently breeding around 4000 loops (and selecting 1/8th, or 500, of these). At the time of writing (May 2015) the GA statistics show 20% returning users, indicating that user numbers are maintained by new visitors, although the vast majority of these arrive directly, without a known referral source, making further analysis of user acquisition difficult.
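One reading of the inbreeding definition above can be made precise as follows: a pair of users is flagged for a given week if at least three breeding events took place between them and both directions (A chose B's loop, and B chose A's) are present. The event format, the exact threshold semantics, and the function name are our own interpretive assumptions.

```python
from collections import Counter

def inbred_pairs(week_events, min_total=3):
    """Flag user pairs matching (one reading of) the paper's inbreeding
    definition: within one week's (chooser, partner) events, at least
    `min_total` breedings between the pair, with both directions present."""
    directed = Counter(week_events)
    flagged = set()
    for (a, b) in directed:
        if (b, a) in directed and directed[(a, b)] + directed[(b, a)] >= min_total:
            flagged.add(frozenset((a, b)))
    return flagged
```

All loops produced by events between a flagged pair in that week would then be marked as inbred, as in Figure 3.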

Figure 2.   User activity by week. Bars show the number of nicknames active in the DT2 game per week. The blue section indicates the proportion of returning users. Equivalent user session data from Google Analytics (GA), for sessions only including a visit to the game page, are plotted against the right hand axis. GA data from DT1 during a similar post-traffic spike period is also shown (dates in parentheses).


Given that we allow individual users to perform artificial selection but hope that, overall, inter-user breeding will produce more natural selection-like evolutionary dynamics, we would like to quantify the relative contributions of users and time to the evolution of musical phenotypes. To put this another way: given the musical features of a loop, is it easier to predict which user bred it, or which time period it came from? To estimate this effect we calculated the composition of each loop with respect to eight timbral descriptors, T1–T8, based on our previous work [Mauch et al. 2015] using MIR techniques. These descriptors were automatically generated from 30 second samples of human-composed chart music, and we only see appreciable quantities of three descriptors in the DT2 loops: T1: percussive, aggressive sounds; T2: mellow sounds; and T3: energetic/speech (see Figure 4). Other timbral qualities, in particular piano-like, guitar-like and vocal vowel sounds, are found, not surprisingly, in smaller quantities. Nevertheless we performed a principal components analysis (PCA) on all eight descriptors to transform and normalize the data in order to visualize the differences between user choices and the passage of time.
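The PCA step amounts to centring the loops-by-descriptors matrix and projecting it onto its leading principal axes. As a self-contained illustration, the sketch below finds the first principal axis by power iteration in pure Python; in practice a library PCA (e.g. scikit-learn) would be used, and the eight-descriptor data itself is of course not reproduced here.

```python
import random

def first_principal_component(data, iters=200, rng=None):
    """Power-iteration sketch of the PCA transform: centre the descriptor
    matrix (rows = loops, columns = timbral descriptors) and find the
    leading principal axis plus each loop's score along it."""
    rng = rng or random.Random(0)
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    v = [rng.uniform(-1, 1) for _ in range(d)]
    for _ in range(iters):
        # Multiply v by the (unnormalized) covariance matrix X^T X.
        proj = [sum(x * w for x, w in zip(row, v)) for row in centred]
        v = [sum(p * row[j] for p, row in zip(proj, centred)) for j in range(d)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in centred]
    return v, scores  # axis loadings, and each loop's first-PC coordinate
```

Week-wise or user-wise means of such scores (over the first two components) are what Figure 5 plots.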

Week-wise and user-wise (for the 40 most active users) PCA plots are shown in Figure 5. Plotting the first two principal components, which explain 40% of the variance of the dataset, we see the weekly means (Figure 5A) migrating within and between distinct regions of the plot. On the other hand, user-based means (Figure 5B) are poorly separated, except for three users whose loops contain a lot of the T1 (percussion) descriptor. The remaining users segregate into two clouds: a large cloud in the centre of the plot, high in T3 (energetic/speech), and a smaller cloud at the top right, high in T2 (mellow). Thus we tentatively conclude that although individual users have the opportunity to breed and select their own distinctive timbral style, they rarely do so, and instead musical direction is driven by the “pack” of users breeding loops with each other.



Figure 3.   Number of loops auditioned. Bars show the number of offspring generated, and assumed auditioned. The green segments indicate the proportion of loops born out of inbreeding activity (see text for definition). The number of loops generated in the DT1 experiment during an equivalent post-spike period is shown as points.



Figure 4.   Weekly distributions of four timbral descriptors T1–4, whose relative contribution is calculated for each loop. Boxes show the median and upper and lower quartiles. Whiskers show the 2nd and 98th percentiles. Not shown are descriptors T5–8, which show similar low-frequency distributions to T4.


Figure 5.   Visualization of timbral composition via first two principal components. In A, each point is a week-wise mean of PCA-transformed data with 95% confidence intervals for the mean. In B, user-wise means and confidence intervals are shown for the 40 most active users during the 42-week experiment. Points in B are projected through the same transformation as in A. The first two principal components explain 40% of the variance while the first six explain 90%. The loadings for the three main timbral descriptors, T1 (percussion), T2 (mellow), and T3 (energetic/speech), are shown as vectors.

Has DT2 generated appealing music like its DT1 predecessor? We have not performed a user survey to assess this, however audio examples are available in these two SoundCloud playlists: [SoundCloud 2014] [SoundCloud 2015a].

Finally, during the DT2 experiment, one clear case of user engagement was observed. SoundCloud user “jsmcpn” played the game regularly and downloaded loops to assemble into 20 longer tracks (usually 5 minutes or more). These tracks [SoundCloud 2015b; 2015c] demonstrate DT2’s ability to generate a diverse range of sounds and styles, even within the constraints of the 130 bpm, 4/4 loop.



As described above, the DT2 implementation has been successful in maintaining a small community of users without any active media management or other enticements. It has not succeeded in building a continuously growing community, which could be for several reasons: perhaps there is little demand for EM toys and tools; maybe the DT2 implementation still lacks the features required to build an engaged community; or perhaps such a community requires active recruitment, nurturing and management.

Beyond the site visitor statistics, we have limited anecdotal evidence of demand for EM. Only one SoundCloud user has regularly engaged in public with DT2 as a compositional tool—others may have used it privately. The user interface also needs more work, and a better interface could itself generate more demand. In particular, we would like to build native mobile apps with more intuitive and engaging modes of gameplay. One option is to provide more sense of ownership by allowing users to keep a small pool of their own individuals for breeding, rather than just the single loop which is currently available in DT2. This would allow users, ideally restricted to one nickname (e.g. via enforced Facebook, Twitter or SoundCloud login), to breed loops in several directions at once. To make it easier for users to keep track of multiple loops, we could use parameters extracted with MIR techniques to decorate each loop with visual cues. It would also be worth exploring the use of automated fitness measures using MIR data, perhaps to discard the most unpleasant sounding audio. We would also bring the scoreboard into the same interface and provide some means of sharing auto-generated tracks (made from a selection of a user’s loops) via SoundCloud and social networks. User scores could be additionally incremented based on SoundCloud “likes” and other social network activity.

As in the previous DT1 experiment, publicity generated large spikes in activity, with the number of active users then declining over the subsequent weeks. We observed a 26% increase in the number of returning users in the 15 weeks following the Reddit spike of weeks 2014-44 and 2014-45 compared to the 15 weeks preceding it, although under the assumptions of Student’s t-test this increase was not significant (p = 0.15). A sustained campaign of outreach and publicity might solidify these gains in regular, returning users, which in turn would create a more vibrant and engaging musical ecosystem for new users, who may then decide to return. However, as DarwinTunes is an unfunded part-time project, such focused outreach activities have been beyond our scope.

In summary we have made several small but sure steps towards bringing EM to a wider audience and look forward to implementing future versions of DarwinTunes.


The authors would like to thank Imperial College London for server hosting.



P. Galanter. 2013. Computational Aesthetic Evaluation: Automated Fitness Functions for Evolutionary Art, Design, and Music. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO ’13 Companion). ACM, New York, NY, USA, 1005–1038.

B. Johanson and R. Poli. 1998. GP-music: An interactive genetic programming system for music generation with automated fitness raters. University of Birmingham, Cognitive Science Research Centre.

R.M. MacCallum, M. Mauch, Austin Burt, and Armand M Leroi. 2012. Evolution of music by public choice. Proceedings of the National Academy of Sciences 109, 30 (2012), 12081–12086.

M. Mauch, R.M. MacCallum, Mark Levy, and Armand M Leroi. 2015. The evolution of popular music: USA 1960–2010. Royal Society Open Science 2, 5 (2015), 150081.

E.R. Miranda and A. Biles. 2007. Evolutionary Computer Music. Springer.

Reddit. 2014. Help a piece of music evolve through selective breeding (DarwinTunes). (2014).

M.J. Salganik, P.S. Dodds, and D.J. Watts. 2006. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 5762 (2006), 854–856.

SoundCloud. 2013. Discovery Festival 2013 (SoundCloud playlist). (2013).

SoundCloud. 2014. DarwinTunes 2.0 the first 45 days (SoundCloud playlist). (2014).

SoundCloud. 2015a. DarwinTunes 2.0 - Month by Month (SoundCloud playlist). (2015). https://soundcloud.com/uncoolbob/sets/darwintunes-20-month-by-month

SoundCloud. 2015b. The DarwinTunes Brood (SoundCloud playlist). (2015).

SoundCloud. 2015c. The DarwinTunes Brood Vol.2 (SoundCloud playlist). (2015).

N. Tokui and H. Iba. 2000. Music composition with interactive evolutionary computation. In Proceedings of the 3rd International Conference on Generative Art, Vol. 17. 215–226.



Authors’ addresses: S.J. Welburn, Queen Mary University of London; C.J. Bussey, Queen Mary University of London, (Current address) Native Instruments, Berlin, Germany; A.M. Leroi, Imperial College London; M. Mauch, Queen Mary University of London, (Current address) Apple Computer, London, UK; R.M. MacCallum, Imperial College London.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Copyright © 2019. All Rights Reserved