ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement

Authors: Keshav Bhandari¹, Sungkyun Chang¹, Tongyu Lu², Fareza R. Enus¹, Louis B. Bradshaw¹, Dorien Herremans², Simon Colton¹

Affiliations:

¹ Queen Mary University of London, UK
² Singapore University of Technology and Design, Singapore

Abstract

Deep learning has enabled remarkable advances in style transfer across various domains, offering new possibilities for creative content generation. However, in the realm of symbolic music, generating controllable and expressive performance-level style transfers for complete musical works remains challenging due to limited datasets, especially for genres such as jazz, and the lack of unified models that can handle multiple music generation tasks. This paper presents ImprovNet, a transformer-based architecture that generates expressive and controllable musical improvisations through a self-supervised corruption-refinement training strategy. ImprovNet unifies multiple capabilities within a single model: it can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks. The model's iterative generation framework allows users to control the degree of style transfer and structural similarity to the original composition. Objective and subjective evaluations demonstrate ImprovNet's effectiveness in generating musically coherent improvisations while maintaining structural relationships with the original pieces. The model outperforms the Anticipatory Music Transformer in short continuation and infilling tasks and successfully achieves recognizable genre conversion, with 79% of participants correctly identifying jazz-style improvisations.


Expressive Style Aware Improvisations

Musical improvisation is a process of generating music that introduces meaningful variations to a given piece, either by adopting a different style or retaining the original style. The goal of improvisation is to explore new creative expressions while maintaining a recognizable connection to the original content. This involves modifying elements such as melody, harmony, and rhythm to reflect either the original genre or a target genre, depending on the context and user preferences.

In this video, we showcase a cross-genre improvisation of Debussy’s Clair de Lune (Classical → Jazz) from pass 3. As the MIDI plays, the on-screen keyboard visualizes the performance, while the chord identifier in the top right highlights the harmony.

In this video, we showcase a cross-genre improvisation of Chopin’s Nocturne (Classical → Jazz) from pass 5. As seen (and heard), several chords and passages sound jazzy in this version.
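The pass numbers mentioned above relate to the iterative generation framework described in the abstract. As a rough illustration only, the toy sketch below shows what a repeated corrupt-then-refine sweep over a piece could look like. Everything in it is an assumption made for illustration: the function `iterative_refinement`, the `corrupt` and `refine` callables, and the representation of a segment as a list of MIDI pitches are hypothetical and are not taken from the ImprovNet code.

```python
import random


def iterative_refinement(segments, corrupt, refine, num_passes, seed=0):
    """Run `num_passes` corrupt-then-refine sweeps and return a snapshot per pass.

    `corrupt` and `refine` are hypothetical callables standing in for a
    corruption function and a trained model; they are not ImprovNet's API.
    """
    rng = random.Random(seed)
    current = list(segments)
    snapshots = []
    for _ in range(num_passes):
        for i in range(len(current)):
            corrupted = corrupt(current[i], rng)
            # Neighbouring segments are passed along as context for refinement.
            context = (current[:i], current[i + 1:])
            current[i] = refine(corrupted, context)
        snapshots.append(list(current))
    return snapshots


# Toy stand-ins: each segment is a list of MIDI pitches.
def drop_random_note(segment, rng):
    """Toy corruption: remove one randomly chosen note."""
    k = rng.randrange(len(segment))
    return segment[:k] + segment[k + 1:]


def append_raised_note(corrupted, context):
    """Toy 'refinement': restore the length by adding a note one semitone above the highest."""
    return corrupted + [max(corrupted) + 1]


original = [[60, 64, 67], [62, 65, 69]]
snapshots = iterative_refinement(
    original, drop_random_note, append_raised_note, num_passes=3
)
```

Each entry of `snapshots` holds the piece as it stands after one full pass. In a scheme like this, running more passes gives the model more opportunities to rewrite the material, which is one plausible way a pass count could act as a control over how far the output departs from the original.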

In this section, we present examples of expressive style-aware improvisations generated by ImprovNet. The model can perform cross-genre improvisations, transforming classical music into jazz-styled improvisations, as well as intra-genre improvisations, where the style remains consistent with the original genre. The generated improvisations exhibit diverse musical expressions while preserving the structural characteristics of the original compositions.

Original Classical Example	Cross-Genre Improvisation (Classical → Jazz)	Intra-Genre Improvisation (Classical → Classical)

Expressive Jazz Style Intra-Genre Improvisations

In this section, we present examples of expressive jazz-style intra-genre improvisations generated by ImprovNet. The model can generate jazz improvisations that keep the characteristics of the original genre while introducing expressive variations. The generated improvisations exhibit diverse musical expressions while preserving the stylistic characteristics of the input compositions.

Original Jazz Example	Intra-Genre Improvisation (Jazz → Jazz)

Expressive Style Aware Harmonization

ImprovNet can harmonize monophonic melodies with genre-specific styles, generating expressive and stylistically coherent harmonizations. In this section, we present examples of expressive jazz-style and classical-style harmonizations generated by ImprovNet using logit constraints and the Skyline corruption function.
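For readers unfamiliar with the term, a skyline reduction is commonly understood as keeping only the highest-sounding note at each point in time, which leaves a monophonic top line. The sketch below shows a simple onset-based version of that heuristic. It is a minimal illustration under stated assumptions: the `Note` tuple and the `skyline` function are hypothetical names, and ImprovNet's actual Skyline corruption function may differ in its details.

```python
from collections import namedtuple

# Hypothetical note representation: onset and duration in seconds, MIDI pitch.
Note = namedtuple("Note", ["onset", "pitch", "duration"])


def skyline(notes):
    """Keep only the highest-pitched note at each onset time."""
    top = {}
    for note in notes:
        best = top.get(note.onset)
        if best is None or note.pitch > best.pitch:
            top[note.onset] = note
    # Return the surviving notes in chronological order.
    return [top[onset] for onset in sorted(top)]


chords = [
    Note(0.0, 60, 1.0), Note(0.0, 64, 1.0), Note(0.0, 67, 1.0),  # C major triad
    Note(1.0, 62, 0.5), Note(1.0, 65, 0.5),                      # D and F together
]
melody = skyline(chords)
```

Here `melody` contains one note per onset, the top voice of each chord. This simple version compares notes only when they start together; it does not handle a held note overlapping a later, lower one.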

Monophonic Melody Example	Jazz Harmonization	Classical Harmonization

Short Prompt Continuation

ImprovNet can generate short continuations (5-20 seconds) based on prompts. In this section, we present examples of short prompt continuations generated by ImprovNet and the Anticipatory Music Transformer (AMT).

Prompt	Anticipatory Music Transformer	ImprovNet

Short Infilling

ImprovNet can perform short infilling tasks (5-20 seconds) when the left and right context segments are provided. In this section, we present examples of short infillings generated by ImprovNet and the Anticipatory Music Transformer (AMT). In these examples, the left and right context segments are each 20 seconds long, and both models generate the 20 seconds in between.
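The infilling setup can be pictured as cutting a piece into three consecutive 20-second windows: left context, the gap to be generated, and right context. The sketch below shows one way to do that split. It assumes notes are given as `(onset_seconds, pitch)` tuples; the function name `split_for_infilling` and its parameters are hypothetical and do not come from either model's code.

```python
def split_for_infilling(notes, gap_start, gap_len=20.0, context_len=20.0):
    """Split notes into left context, gap (to be regenerated), and right context.

    Each note is an (onset_seconds, pitch) tuple. Windows are half-open,
    so a note landing exactly on a boundary belongs to the later window.
    """
    gap_end = gap_start + gap_len
    left = [n for n in notes if gap_start - context_len <= n[0] < gap_start]
    target = [n for n in notes if gap_start <= n[0] < gap_end]
    right = [n for n in notes if gap_end <= n[0] < gap_end + context_len]
    return left, target, right


notes = [
    (0.0, 60), (12.5, 64),   # fall in the left context (0-20 s)
    (25.0, 67), (38.0, 65),  # fall in the gap (20-40 s)
    (41.0, 62), (59.5, 60),  # fall in the right context (40-60 s)
    (61.0, 72),              # outside all three windows
]
left, target, right = split_for_infilling(notes, gap_start=20.0)
```

A model would then see `left` and `right` and be asked to produce material for the gap, while `target` holds the original notes from that window for comparison.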

Original Composition	Anticipatory Music Transformer	ImprovNet

Bonus - Reharmonization

ImprovNet can also reharmonize an existing composition in a genre-specific style. In this section, we present examples of reharmonizations generated by ImprovNet using logit constraints and the Skyline corruption function.

Original	Jazz Reharmonization	Classical Reharmonization