Meta just released an AI music generator that was trained on 20,000 hours of licensed music
MBW’s Stat Of The Week is a series in which we highlight a data point that deserves the attention of the global music industry. Stat Of The Week is supported by Cinq Music Group, a technology-driven record label, distribution, and rights management company.
Researchers at Facebook parent company Meta have developed an AI text-to-music generator called MusicGen.
The language model, described by Meta’s Fundamental AI Research (FAIR) team as “a simple and controllable model for music generation”, can take text prompts such as “up-beat acoustic folk” or “Pop dance track with catchy melodies” and turn them into new 12-second music clips.
The model, released as open source over the weekend, can also use melodic prompts to generate new music. You can see a demo here.
Meta says that it used 20,000 hours of licensed music to train MusicGen. That included 10,000 “high-quality” licensed music tracks and, as reported by TechCrunch, 390,000 instrument-only tracks from Shutterstock and Pond5.
Meta’s entrance into the world of text-to-music AI marks a significant moment in this fast-moving space, with the company becoming the latest tech giant, after Google, to develop its own language model that can generate new music from text prompts.
Google unveiled MusicLM, an ‘experimental AI’ tool that can generate high-fidelity music from text prompts and humming, in January, and made it publicly available last month.
Google explains that, at the public-use level, a user of its MusicLM tool types in a prompt like “soulful jazz for a dinner party”.
The MusicLM model then creates two versions of the requested song. The user can vote on which one they prefer, which Google says will “help improve the AI model”. Google’s model was trained on five million audio clips, amounting to 280,000 hours of music at 24 kHz.
The Decoder reports that, “compared to other music models such as Riffusion, Mousai, MusicLM, and Noise2Music, MusicGen performs better on both objective and subjective metrics that test how well the music matches the lyrics and how plausible the composition is”.
You can see the comparisons between music generated by the different models here.
According to Facebook Research Scientist Gabriel Synnaeve, who announced the release of the research via LinkedIn over the weekend, Meta has released “code (MIT) and pretrained models (CC-BY non-commercial) publicly for open research, reproducibility, and for the broader music community to investigate this technology”.
Meta’s researchers have also published a paper outlining the work that went into training the model. Within the paper, they outline ethical challenges around the development of generative AI models.
According to the paper, the research team “first ensured that all the data we trained on was covered by legal agreements with the right holders, in particular through an agreement with ShutterStock”.
The paper added: “A second aspect is the potential lack of diversity in the dataset we used, which contains a larger proportion of western-style music.
“However, we believe the simplification we operate in this work, e.g., using a single stage language model and a reduced number of auto-regressive steps, can help broaden the applications to new datasets.”
Another challenge highlighted by the paper is that “Generative models can represent an unfair competition for artists, which is an open problem”.
The paper added: “Open research can ensure that all actors have equal access to these models. Through the development of more advanced controls, such as the melody conditioning we introduced, we hope that such models can become useful both to music amateurs and professionals.”
News of Meta’s AI music research arrives at a time of growing disquiet around the use of generative AI in the music business, due to issues around copyright infringement and the vast daily supply of content to DSPs.
In April, AI-generated music productions that mimic the vocals of superstar artists dominated headlines after a song called “heart on my sleeve”, featuring AI-generated vocals copying the voices of Drake and The Weeknd, went viral.
The track, uploaded by an artist called ghostwriter, was subsequently removed from YouTube, Spotify and other platforms. On YouTube, the holding page for ghostwriter’s now-defunct upload confirmed what had triggered the takedown.
It read: “This video is no longer available due to a copyright claim by Universal Music Group.”
Speaking on Universal Music Group’s Q1 earnings call in April, Sir Lucian Grainge, CEO & Chairman of Universal Music Group, noted that: “Unlike its predecessors, much of the latest generative AI [i.e. ‘fake Drake’] is trained on copyrighted material, which clearly violates artists’ and labels’ rights and will put platforms completely at odds with the partnerships with us and our artists and the ones that drive success.”
In his opening remarks to analysts on that same call, Sir Lucian Grainge also criticized the “content oversupply” that currently sees around 120,000 tracks a day distributed to music streaming services.
“Not many people realize that AI has already been a major contributor to this content oversupply,” said Grainge. “Most of this AI content on DSPs comes from the prior generation of AI, a technology that is not trained on copyrighted IP and that produces very poor quality output with virtually no consumer appeal.”
The rise of AI platforms that allow users to create vast volumes of tracks at the touch of a button has also exposed the potential for generative AI to be used for streaming fraud.
Via generative AI music apps, large volumes of audio content can be created by fraudsters and uploaded to DSPs with the aim of racking up huge numbers of plays of this content via bot-driven ‘streaming farms’.
In April, Spotify removed a substantial number of tracks – many created via AI music-making platform Boomy – from its service, citing “potential cases of stream manipulation”. (There was no suggestion that Boomy itself was responsible for the “stream manipulation” in question).
Back in January, we reported on a recent French study showing that up to 3% of music streams on services like Spotify are known to be fraudulent.
Last week, France-born music streaming service Deezer set out a strategy to address both the rise of AI music and fraudulent streaming activity on its platform.
Deezer’s announcement followed remarks about AI made by Jeronimo Folgueira, CEO of Deezer, to analysts on the company’s own Q1 earnings call in April. He said: “We want to give our customers a high-quality experience and relevant content, so obviously getting AI to flood our catalog is not something we’re super keen on, and we’re working on that.”
On that same call, however, Folgueira revealed that Deezer has itself used AI to generate content for its recently-launched wellbeing app, Zen by Deezer, which offers music and audio content to aid sleep, relaxation and meditation.
A number of entities in the music business are also embracing AI music technology for various applications.
Canadian singer, songwriter and record producer Grimes, for example, launched a new AI project in beta last month, inviting users to create songs using her voice in exchange for a 50% share of the master recording royalties.
On Monday (June 12), Believe-owned music distributor TuneCore announced that it has partnered with CreateSafe and Grimes to let TuneCore artists distribute collaborations created through Grimes’ Elf.Tech AI to all major streaming platforms.
Last month, South Korea-based entertainment giant HYBE released a new single called “Masquerade”, which HYBE claimed to be the “first-ever multilingual track produced in Korean, English, Japanese, Chinese, Spanish and Vietnamese”.
According to HYBE, the artist behind the track, MIDNATT, sang the vocals in those six languages, and using AI, “the pronunciation data of native speakers was applied to the track to further refine the artist’s pronunciation and intonation”.
The multilingual track uses technology developed by Supertone, the fake voice AI company HYBE acquired last year in a deal worth around $32 million, following an initial investment in the startup in February 2021.
Cinq Music Group’s repertoire has won Grammy awards, dozens of Gold and Platinum RIAA certifications, and numerous No.1 chart positions on a variety of Billboard charts. Its repertoire includes heavyweights such as Bad Bunny, Janet Jackson, Daddy Yankee, T.I., Sean Kingston, Anuel, and hundreds more.
Music Business Worldwide