This entire website is fully anonymous to support double-blind peer review.
Link to Anonymized PDF paper.
Really? Yet another MIDI inpainting paper? Yes! Sure, MIDI inpainting's been demonstrated many times, but can we make it easier to do, and more expressive? Yes!
In this study, we obtain performance comparable to prior methods while using a new(er) Hierarchical Diffusion Transformer (HDiT) that keeps things simple and easy, and handles big images (long sequences) too! And we can inpaint uniquely shaped regions for our melodies and accompaniments.
But why merely solo piano MIDI??
The point of this study is not piano or MIDI per se; it's about exploring how to control generative music models, and piano MIDI is just a nice, compact data representation with which to conduct these investigations. Transformer-based approaches to music modeling tend to offer a limited suite of user-control opportunities compared to diffusion models. This paper is an early exploration of the simple idea: "what if we took advantage of the prolific work done on controllable image diffusion methods and applied it toward music generation?"
Examples
I. Click here for example Subjective Evaluation (Listening) Test
II. Example Generations below... (Work in Progress)
Example of "Drawn" Melody:
Original
PoM Undirected Melody
PoM-Drawn - RePaint=1 👎
PoM-Drawn Melody - RePaint=2 😀
PoM-Drawn - RePaint=4 👎
Trying to Spell Musical Words (Like Jacob Collier)
...doesn't sound amazing. We need to crank up the RePaint parameter to get enough notes to read the words, but more RePaint seems to introduce more randomness. Conditioning on chords might help, but that part of the code isn't working yet. What we can do instead is use a lower value of RePaint (say, 3) and then "ReMask" a few times, i.e., run the sampling again with a new mask in which the previously-generated notes are left alone. ReMask-ing with a lower RePaint value seems to preserve the "musicality" better than the randomness we got from cranking RePaint up.
(This ReMask stuff is not in the preprint, BTW, because of page limits & because I didn't develop the idea much until after submission. I can add it to the final paper, demo, & code later; for now you have to do the iteration manually yourself. A rough sketch of the loop is below.)
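Here's a minimal sketch of that ReMask loop, just to make the idea concrete. The function name `sample_inpaint`, the array layout, and the mask convention are placeholders rather than the actual code from this project; only the control flow matters: each pass inpaints with a modest RePaint value, then the freshly generated notes are dropped from the editable mask so the next pass leaves them alone.

```python
def remask(sample_inpaint, piano_roll, region_mask, n_remask=3, repaint=3):
    """Repeatedly inpaint a region, freezing notes generated on earlier passes.

    piano_roll     : (time, pitch) NumPy-like array; nonzero entries are notes
    region_mask    : boolean array, True where the model may write
    sample_inpaint : stand-in for the diffusion inpainting sampler
    """
    editable = region_mask.copy()
    for _ in range(n_remask):
        # One inpainting pass with a modest RePaint setting.
        piano_roll = sample_inpaint(piano_roll, mask=editable, repaint=repaint)
        # Freeze whatever was just generated: drop those cells from the mask.
        editable = editable & (piano_roll == 0)
    return piano_roll
```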
Interactive Demo
Iterative Workflow Idea: Once the output image is generated, download it, upload it as a new input image, edit it via drawing, then re-run the model! (A small scripting sketch of the same round trip is below, for anyone who prefers to edit the image outside the demo.)
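A minimal sketch of that round trip, assuming the demo exchanges piano rolls as ordinary image files; the filenames, pixel coordinates, and note sizes here are made up for illustration.

```python
from PIL import Image, ImageDraw

# The output image downloaded from the demo (filename is hypothetical).
img = Image.open("generated_output.png").convert("RGB")

# Hand-draw a few extra "notes" as short horizontal bars.
# x = time, y = pitch in piano-roll pixel coordinates; values are arbitrary.
draw = ImageDraw.Draw(img)
for x0, y0, length in [(40, 60, 8), (52, 55, 8), (64, 48, 12)]:
    draw.rectangle([x0, y0, x0 + length, y0 + 1], fill=(255, 255, 255))

# Save, upload this as the new input image, and re-run the model.
img.save("edited_input.png")
```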
If the demo below is "down", there is a HuggingFace Spaces version of it, but you'll have to search for it yourself, since linking it here would break double-blind anonymity.