Versatile Symbolic Music-for-Music Modeling via Function Alignment

Abstract

Many music AI models learn a mapping between music content and human-defined labels. However, many annotations, such as chords, can be naturally expressed within the music modality itself, e.g., as sequences of symbolic notes. This observation allows both understanding tasks (e.g., chord recognition) and conditional generation tasks (e.g., chord-conditioned melody generation) to be unified under a music-for-music sequence modeling paradigm. In this work, we propose parameter-efficient solutions for a variety of symbolic music-for-music tasks. The high-level idea is that (1) we use a pretrained language model (LM) for both the reference and the target sequences and (2) we link these two LMs via a lightweight adapter. Experiments show that our method achieves superior performance across tasks such as chord recognition, melody generation, and drum track generation.
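To make the idea of expressing an annotation inside the music modality concrete, the following is a minimal sketch of turning chord labels into sequences of symbolic notes. The chord vocabulary, the (onset, duration, pitch) event format, and the helper names `chord_to_pitches` and `chords_to_note_events` are illustrative assumptions; they are not the encoding used by the models on this page.

```python
# Illustrative only: the chord vocabulary and the note-event format below are
# assumptions for this sketch, not the representation used by the authors.

NOTE_TO_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Semitone offsets from the root for a few common chord qualities.
QUALITY_INTERVALS = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "dim": (0, 3, 6),
    "7": (0, 4, 7, 10),
}


def chord_to_pitches(root, quality, octave=4):
    """Return MIDI pitches for a chord, using the convention C4 = 60."""
    letter, accidentals = root[0], root[1:]
    pitch_class = NOTE_TO_PITCH_CLASS[letter]
    # Each sharp raises the root by a semitone, each flat lowers it.
    pitch_class += accidentals.count("#") - accidentals.count("b")
    base = 12 * (octave + 1) + pitch_class
    return [base + interval for interval in QUALITY_INTERVALS[quality]]


def chords_to_note_events(chords, beats_per_chord=4):
    """Turn (root, quality) pairs into (onset_beat, duration_beats, pitch) events."""
    events = []
    for index, (root, quality) in enumerate(chords):
        onset = index * beats_per_chord
        for pitch in chord_to_pitches(root, quality):
            events.append((onset, beats_per_chord, pitch))
    return events


progression = [("C", "maj"), ("A", "min"), ("F", "maj"), ("G", "7")]
events = chords_to_note_events(progression)
```

Under these assumptions, a chord progression becomes an ordinary note sequence, so a chord track and a melody track can share one symbolic format and either can serve as the reference or the target sequence.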

Description of Demos

Common Settings

Compared Models

Drum to Others Generation

The model is given a drum track and generates the full song. The ground truths are from the RWC Pop Music Database.

Song | Input & Ground Truth: Prompt, Ground Truth | Our Models: FA_self, FA_cross | Baselines: cocomulla, seq2seq
RM_P090
RM_P008
RM_P005
RM_P001

Others to Drum Generation

The model is given a full song with the drum track removed and generates the missing drum track. The ground truths are from the RWC Pop Music Database.

Song | Input & Ground Truth: Prompt, Ground Truth | Our Models: FA_self, FA_cross | Baselines: cocomulla, seq2seq, assistant
RM_P005
RM_P003

Chord to Melody Generation

The model is given a chord sequence and generates the melody (typically monophonic). The ground truths are from the Nottingham Dataset.

Song | Input & Ground Truth: Prompt, Ground Truth | Our Models: FA_self, FA_cross | Baselines: cocomulla, seq2seq, melodyt5
jigs108
ashover28

Melody to Chord Generation

The model is given a monophonic melody and generates the chord sequence. The ground truths are from the Nottingham Dataset.

Song | Input & Ground Truth: Prompt, Ground Truth | Our Models: FA_self, FA_cross | Baselines: cocomulla, seq2seq, melodyt5
waltzes5
waltzes30