As described by Kreuk, the model pairs Meta’s EnCodec audio tokenizer with a transformer language model that generates music as sequences of audio tokens. Users can demo MusicGen through Hugging Face’s API, though generating music can take some time depending on how many users are queued up at once. You can also duplicate the demo on the Hugging Face site into your own instance of the model for much faster outputs. Otherwise, you can download the code and run it yourself, if you have the know-how and the rig to support it.
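If you do go the local route, Meta’s open-source audiocraft library wraps the model in a few lines of Python. Here’s a minimal sketch, assuming the audiocraft package is installed and that the small checkpoint’s name matches the current release (both may vary by version):

```python
# A minimal local MusicGen run, assuming Meta's audiocraft package
# (pip install audiocraft) and the small checkpoint; names may differ by release.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio per clip

# One text prompt in, one waveform tensor out.
wavs = model.generate(["symphonic rendition of the happy birthday theme"])

for i, wav in enumerate(wavs):
    # audio_write appends the file extension and loudness-normalizes the clip.
    audio_write(f"musicgen_demo_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```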
Our own tests included a synth-heavy “symphonic rendition of the happy birthday theme” and a rather crackly “Lo-fi hip hop track with samples from nature including crickets.” The songs include no lyrics by default. Gizmodo also tested the system’s optional audio-prompt feature with a track featuring lyrics from yours truly (if you really want to stress your ears on my glass-cracking singing voice, you can find that in our previous tests of Apple Music’s karaoke feature). The prompt “Grunge song with heavy bass and violin accompaniment” came out more crackly with the added vocals than the same prompt did without them.
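That audio-prompt test uses the model’s melody-conditioning mode, which steers generation with a reference recording on top of the text prompt. A hedged sketch of what that looks like through audiocraft, with the melody checkpoint name and the vocals.wav input path both stand-ins:

```python
# Melody-conditioned generation: a text prompt plus a reference audio clip.
# Checkpoint name and input path are placeholders; adjust for your setup.
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=8)

melody, sr = torchaudio.load("vocals.wav")  # hypothetical reference recording
# generate_with_chroma conditions on the melody (chroma features) of the clip.
wavs = model.generate_with_chroma(
    descriptions=["Grunge song with heavy bass and violin accompaniment"],
    melody_wavs=melody[None],  # add a batch dimension
    melody_sample_rate=sr,
)
audio_write("grunge_melody_test", wavs[0].cpu(), model.sample_rate, strategy="loudness")
```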
It’s unclear how well the AI comprehends specific composers. We asked it to create a “Hans Zimmer score for a steampunk medieval film,” though it’s hard to say whether the AI could really replicate Zimmer’s themes.
While plenty of other models handle text generation, voice synthesis, image generation, and even short video, there haven’t been many quality examples of music generation released to the public. According to the accompanying research paper available on the preprint arXiv repository, one of the main challenges with music is that it spans the full frequency spectrum, which demands far more intense sampling than speech. That’s not to mention the complex structures and overlapping instrumentation found in music.
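Some rough arithmetic shows the scale of the problem, and why the EnCodec tokenizer matters. A back-of-the-envelope sketch, using the figures the paper reports (32 kHz audio, a 50 Hz token rate, four parallel codebooks):

```python
# Back-of-the-envelope comparison of raw audio versus EnCodec tokens,
# using rates reported in the MusicGen paper (assumed here):
# 32 kHz audio, 50 token frames per second, 4 parallel codebooks.
SAMPLE_RATE_HZ = 32_000
TOKEN_RATE_HZ = 50
CODEBOOKS = 4
CLIP_SECONDS = 30

raw_samples = SAMPLE_RATE_HZ * CLIP_SECONDS                 # 960,000 values
encodec_tokens = TOKEN_RATE_HZ * CODEBOOKS * CLIP_SECONDS   # 6,000 tokens

print(f"raw samples per 30-second clip: {raw_samples:,}")
print(f"EnCodec tokens per clip:        {encodec_tokens:,}")
print(f"sequence-length reduction:      {raw_samples // encodec_tokens}x")  # 160x
```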
Meta also compared its system to Google’s MusicLM text-to-music model, and it maintains a sample page showcasing the two models side by side for direct comparison.
Though for artists, what may be most concerning about the model is its training data. According to the research paper, MusicGen was trained on 20,000 hours of licensed music, comprising an internal dataset of 10,000 high-quality tracks along with roughly 390,000 instrument-only tracks from Shutterstock and Pond5. The Meta researchers claimed all the music their model is trained on was “covered by legal agreements with the right holders,” including a deal with Shutterstock.
Shutterstock signed a deal with DALL-E creator OpenAI last year, and it already has its own AI image generation tool pre-trained on contributors’ images. Still, that doesn’t mean artists are necessarily happy about their work being used to train AI. Some artists have already sued major AI art companies like Stability AI and Midjourney, with allegations aimed directly at how AI datasets suck up mass amounts of copyrighted content without artists’ permission. The picture grows more complicated when big tech companies like Meta can afford to license creative content for AI training. For a user, the risk that the AI is directly plagiarizing other musicians’ work, licensed or not, looms in the background.
Like most big tech companies, Meta has been on an AI kick as of late. Compared to its big tech brethren, Meta has said it wants to release more open-source models into the ether for anyone to pick up and use. It’s an interesting tactic for standing out from the likes of OpenAI, Microsoft, and Google, which have grown increasingly secretive. Still, it doesn’t mean Meta can avoid controversy, especially as creatives worry companies will hand artistic tasks to AI rather than to flesh-and-blood artists. In their paper, the Meta researchers acknowledged that AI “can represent an unfair competition for artists,” but claimed that open models can give music amateurs and professionals alike new tools for making music.