tinySounds

for voice and musebot ensemble

An ironic work in which tiny sounds – quiet noises made by the human voice that are barely audible – serve as an input for a noisy and exuberant musebot ensemble that autonomously responds, accompanies, and argues with the live input. Musebots are intelligent musical agents that decide how to respond to their environment – and each other – on their own, based upon their internal beliefs, desires, and intentions.

Machine learning algorithms are wonderful for sifting through data and discovering relationships; more challenging is how these algorithms can be used for generation. It isn’t that difficult, for example, to train a system to provide similar sounds from a database, given a live sound. But what’s the artistic interest in that? Similarly, it isn’t that difficult to extract live performance information from an improvising musician – activity level, general frequency range, timbre – so that the system responds likewise. But, again, purely reactive systems stop being interesting fairly quickly.

I find it much more interesting when my musebots go off on their own, exploring their own ideas through beliefs they may have formed incorrectly and unintentionally. For that reason, I usually build a lot of ambiguity into my analysis, or provide conflicting information. What happens when one musebot is sure of something, while another is absolutely sure of something else? And what if a third musebot just doesn’t care?

In tinySounds, musebots are trained using a neural net on a corpus that has been hand-tagged for valence and arousal measures, as well as preanalysed for spectral information. However, the correlation between audio features (what the musebots are listening for) and affect (valence and arousal) isn’t direct; in assigning the latter, I may decide that a sound from the corpus is complex and active, but my reasons for doing so may not rest on the same information that the musebots are given. Thus, a musebot may decide that, based upon what it has learned, a live sound is high valence / high arousal, but the listener may perceive it otherwise. This isn’t a flaw in the system; it’s a feature!
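To make the idea concrete, here is a minimal sketch of learning affect from audio features. It is not the network used in tinySounds: the two features (a normalised spectral centroid and loudness), the four tagged examples, and every function name are invented for illustration, and a single tanh layer stands in for whatever architecture the piece actually uses.

```python
import math
import random

# Hypothetical hand-tagged corpus: (centroid, loudness) -> (valence, arousal),
# all values invented for illustration and scaled to [-1, 1] for the tags.
TAGGED_CORPUS = [
    ((0.9, 0.8), (0.7, 0.9)),
    ((0.8, 0.2), (0.6, -0.5)),
    ((0.2, 0.9), (-0.6, 0.8)),
    ((0.1, 0.1), (-0.7, -0.8)),
]

def predict(weights, features):
    """One tanh layer: each output is tanh(w . x + bias); bias is the last entry of a row."""
    return tuple(
        math.tanh(sum(w * x for w, x in zip(row[:-1], features)) + row[-1])
        for row in weights
    )

def train(corpus, epochs=2000, rate=0.1, seed=0):
    """Fit the layer by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(corpus[0][0]), len(corpus[0][1])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for features, target in corpus:
            output = predict(weights, features)
            for k in range(n_out):
                # Error gradient passed back through the tanh non-linearity.
                delta = (output[k] - target[k]) * (1 - output[k] ** 2)
                for i in range(n_in):
                    weights[k][i] -= rate * delta * features[i]
                weights[k][-1] -= rate * delta
    return weights

def quadrant(valence, arousal):
    """Name the affect quadrant a (valence, arousal) pair falls into."""
    v = "high" if valence >= 0 else "low"
    a = "high" if arousal >= 0 else "low"
    return f"{v} valence / {a} arousal"

if __name__ == "__main__":
    weights = train(TAGGED_CORPUS)
    live_sound = (0.9, 0.8)  # hypothetical features extracted from the live voice
    print(quadrant(*predict(weights, live_sound)))
```

The model only ever sees the numeric features, never the reasons behind the tags, which is where the gap between a musebot's judgement and a listener's perception can open up.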

Lastly, my role as overseer in the musebot ensemble allows me to further disrupt how the musebots apply their knowledge. The corpus is organized semantically (e.g. voice sounds, kitchen sounds, transportation sounds, etc.); once a musebot is using a certain subdirectory, it can’t easily switch to another. As a result, its choice of related sound, whether affective or timbral, is limited to what is immediately available to it. If the musebots are frustrated, they haven’t mentioned it to me (yet).
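The effect of that constraint can be sketched as follows. This is an illustration under stated assumptions, not the actual musebot code: the folder names echo the categories above, but the file names, the (valence, arousal) tags, and the function `closest_sound` are all hypothetical.

```python
import math

# Hypothetical corpus: subdirectory -> {file name: (valence, arousal) tag}.
CORPUS_BY_FOLDER = {
    "voice": {"whisper.wav": (-0.2, -0.6), "laugh.wav": (0.8, 0.7)},
    "kitchen": {"kettle.wav": (0.1, 0.5), "fridge_hum.wav": (-0.1, -0.8)},
    "transportation": {"train.wav": (-0.4, 0.9)},
}

def closest_sound(target, subdirectory=None):
    """Return the file whose tag is nearest to `target`.

    With `subdirectory` set, only that folder is searched, which models a
    musebot that is locked into its current part of the corpus.
    """
    folders = [subdirectory] if subdirectory else list(CORPUS_BY_FOLDER)
    candidates = {
        name: tag
        for folder in folders
        for name, tag in CORPUS_BY_FOLDER[folder].items()
    }
    return min(candidates, key=lambda name: math.dist(candidates[name], target))

if __name__ == "__main__":
    wanted = (0.8, 0.7)  # the musebot wants something high valence / high arousal
    print(closest_sound(wanted))             # laugh.wav (best match in the whole corpus)
    print(closest_sound(wanted, "kitchen"))  # kettle.wav (best it can do where it is)
```

A musebot confined to the kitchen folder settles for the kettle even though a far better match sits one directory away.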