Abstract
Deep learning has enabled remarkable advances in style transfer across various domains, offering new possibilities for creative content generation. In symbolic music, however, generating controllable and expressive performance-level style transfers for complete musical works remains challenging. Datasets are limited, especially for genres such as jazz, and there is no unified model that can handle multiple music generation tasks. This paper presents ImprovNet, a transformer-based architecture that generates expressive and controllable musical improvisations through a self-supervised corruption-refinement training strategy. ImprovNet unifies multiple capabilities within a single model: it can perform cross-genre and intra-genre improvisations, harmonize melodies in genre-specific styles, and execute short prompt continuation and infilling tasks. The model's iterative generation framework lets users control the degree of style transfer and the structural similarity to the original composition. Objective and subjective evaluations demonstrate ImprovNet's effectiveness in generating musically coherent improvisations that maintain structural relationships with the original pieces. The model outperforms the Anticipatory Music Transformer (AMT) on short continuation and infilling tasks and achieves recognizable genre conversion, with 79% of participants correctly identifying jazz-style improvisations.
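The iterative corruption-refinement idea can be pictured as a loop that repeatedly damages part of a piece and asks a model to repair it. The sketch below is a toy illustration only, not ImprovNet's implementation: notes are assumed to be plain `(onset, pitch)` pairs, and `corrupt`, `refine`, and `improvise` are hypothetical stand-ins (random pitch masking and a dummy "model" that fills masked pitches by small steps). Treating the number of passes as the knob that controls how far the output drifts from the original is likewise an assumption made for illustration.

```python
import random
from typing import List, Optional, Tuple

Note = Tuple[float, int]            # (onset in seconds, MIDI pitch)
Masked = Tuple[float, Optional[int]]

MASK = None  # hypothetical placeholder for a corrupted (hidden) pitch


def corrupt(notes: List[Note], ratio: float, rng: random.Random) -> List[Masked]:
    """Hypothetical corruption: hide the pitch of roughly `ratio` of the notes."""
    return [(t, MASK if rng.random() < ratio else p) for t, p in notes]


def refine(corrupted: List[Masked], rng: random.Random) -> List[Note]:
    """Stand-in for the model: fill each hidden pitch near the previous note."""
    out: List[Note] = []
    prev = 60  # arbitrary reference pitch (middle C) if the first note is hidden
    for t, p in corrupted:
        if p is MASK:
            p = prev + rng.choice([-2, -1, 1, 2])
        out.append((t, p))
        prev = p
    return out


def improvise(notes: List[Note], passes: int, ratio: float = 0.3, seed: int = 0) -> List[Note]:
    """Run `passes` corrupt-then-refine rounds; more rounds allow more drift."""
    rng = random.Random(seed)
    current = list(notes)
    for _ in range(passes):
        current = refine(corrupt(current, ratio, rng), rng)
    return current
```

Onsets are never touched in this toy version, so rhythm and structure survive while pitches wander. With `passes=0` or `ratio=0.0` the input comes back unchanged.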
Musical improvisation is a process of generating music that introduces meaningful variations to a given piece, either by adopting a different style or retaining the original style. The goal of improvisation is to explore new creative expressions while maintaining a recognizable connection to the original content. This involves modifying elements such as melody, harmony, and rhythm to reflect either the original genre or a target genre, depending on the context and user preferences.
In this video, we showcase a cross-genre improvisation of Debussy’s Clair de Lune (Classical → Jazz). As the MIDI plays, the on-screen keyboard visualizes the performance, while the chord identifier in the top right highlights the harmony.
In this section, we present examples of expressive, style-aware improvisations generated by ImprovNet. The model can perform cross-genre improvisations, transforming classical pieces into jazz-style improvisations, as well as intra-genre improvisations, in which the output stays in the original genre. The generated improvisations show diverse musical expression while preserving the structural characteristics of the original compositions.
Original Classical Example | Cross-Genre Improvisation (Classical → Jazz) | Intra-Genre Improvisation (Classical → Classical) |
---|---|---|
In this section, we present examples of expressive intra-genre jazz improvisations generated by ImprovNet. Starting from a jazz piece, the model introduces expressive variations while keeping the result in the jazz idiom and preserving the stylistic characteristics of the input composition.
Original Jazz Example | Intra-Genre Improvisation (Jazz → Jazz) |
---|---|
ImprovNet can harmonize monophonic melodies in genre-specific styles, generating expressive and stylistically coherent harmonizations. In this section, we present examples of jazz and classical harmonizations generated by ImprovNet using logit constraints and the Skyline corruption function.
Monophonic Melody Example | Jazz Harmonization | Classical Harmonization |
---|---|---|
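The Skyline heuristic is a classic way to reduce polyphonic music to a single melodic line by keeping the highest note at each onset. The sketch below shows the basic textbook variant, assuming notes are simple `(onset, duration, pitch, velocity)` records rather than ImprovNet's token representation; the `Note` type and the overlap-trimming step are illustrative choices, and ImprovNet's own Skyline corruption may differ in its details. Used as a corruption, such a function removes the accompaniment and leaves only the top line, which is one plausible reading of why it suits the harmonization task.

```python
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    onset: float     # start time in seconds
    duration: float  # length in seconds
    pitch: int       # MIDI pitch number
    velocity: int    # MIDI velocity


def skyline(notes: List[Note]) -> List[Note]:
    """Basic Skyline: keep only the highest-pitched note at each onset time."""
    by_onset: Dict[float, Note] = {}
    for n in notes:
        best = by_onset.get(n.onset)
        if best is None or n.pitch > best.pitch:
            by_onset[n.onset] = n
    melody = [by_onset[t] for t in sorted(by_onset)]

    # Shorten any note still sounding when the next melody note starts,
    # so the output is strictly monophonic.
    result: List[Note] = []
    for i, cur in enumerate(melody):
        if i + 1 < len(melody):
            gap = melody[i + 1].onset - cur.onset
            if cur.duration > gap:
                cur = cur._replace(duration=gap)
        result.append(cur)
    return result
```

Note that a lone bass note at an onset with no higher note is kept, a known limitation of the basic heuristic.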
ImprovNet can generate short continuations (5–20 seconds) of a given prompt. In this section, we present examples of short prompt continuations generated by ImprovNet and the Anticipatory Music Transformer (AMT).
Prompt | Anticipatory Music Transformer | ImprovNet |
---|---|---|
ImprovNet can also infill short gaps (5–20 seconds) when given the surrounding left and right context segments. In this section, we present examples of short infillings generated by ImprovNet and the Anticipatory Music Transformer (AMT). In these examples, the left and right context segments are each 20 seconds long, and both models generate the 20-second segment between them.
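The infilling setup above (20 seconds of left context, a 20-second middle to generate, 20 seconds of right context) amounts to slicing a piece into three consecutive time windows. A minimal sketch, assuming notes are assigned to a window by their onset time; the `Note` type, `split_for_infilling`, and its parameters are hypothetical, and the actual preprocessing may treat notes that straddle a window boundary differently.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    onset: float     # start time in seconds
    duration: float  # length in seconds
    pitch: int       # MIDI pitch number


def split_for_infilling(
    notes: List[Note], start: float, context: float = 20.0, gap: float = 20.0
) -> Tuple[List[Note], List[Note], List[Note]]:
    """Split notes into (left context, hidden middle, right context) by onset."""
    gap_start = start + context
    gap_end = gap_start + gap
    right_end = gap_end + context
    left = [n for n in notes if start <= n.onset < gap_start]
    target = [n for n in notes if gap_start <= n.onset < gap_end]
    right = [n for n in notes if gap_end <= n.onset < right_end]
    return left, target, right
```

Only `left` and `right` would be given to a model; `target` is set aside as the original passage that the generated middle can be compared against.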
ImprovNet can also reharmonize an existing composition in a genre-specific style. In this section, we present examples of reharmonizations generated by ImprovNet using logit constraints and the Skyline corruption function.
Original | Jazz Reharmonization | Classical Reharmonization |
---|---|---|
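This page does not specify which logit constraints ImprovNet applies. As a generic illustration of the technique (an assumption, not ImprovNet's actual rule set), a constraint can be enforced at sampling time by setting the logits of disallowed tokens to negative infinity before the softmax, so those tokens receive zero probability. The `pitches_below` rule is purely hypothetical: it keeps generated pitches under the current melody note so the original melody remains the top voice. The 128-entry logit vector stands in for a model's scores over MIDI pitches.

```python
import math
import random
from typing import List, Set


def constrained_sample(logits: List[float], allowed: Set[int], rng: random.Random) -> int:
    """Sample a token id from softmax(logits), restricted to ids in `allowed`."""
    if not any(0 <= i < len(logits) for i in allowed):
        raise ValueError("constraint leaves no valid token")
    # Disallowed tokens get -inf, so exp(-inf) gives them probability exactly 0.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - peak) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def pitches_below(melody_pitch: int) -> Set[int]:
    """Hypothetical constraint: accompaniment pitches must stay below the melody."""
    return set(range(melody_pitch))
```

For example, `constrained_sample(logits, pitches_below(72), rng)` can only return pitches below C5 (MIDI 72), however much probability the unconstrained model assigns above it.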