Google’s MusicLM is rather good at creating music from text descriptions

January 28, 2023

MusicLM is Google’s latest generative AI model, and it can turn text descriptions of varying complexity into high-fidelity music.

MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes.
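For a rough sense of scale behind that 24 kHz figure, here is a back-of-the-envelope sketch (not anything from Google’s code) that counts raw audio samples for the 30-second and 5-minute clips mentioned in this post. The function name and the single-channel assumption are ours, for illustration only.

```python
SAMPLE_RATE_HZ = 24_000  # sample rate quoted for MusicLM's output


def num_samples(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a clip, assuming a single (mono) channel."""
    return round(duration_seconds * sample_rate_hz)


# The 30-second caption examples and the 5-minute long generations.
print(num_samples(30))       # 720000
print(num_samples(5 * 60))   # 7200000
```

A 5-minute piece works out to roughly 7.2 million samples per channel, which gives some intuition for why staying consistent over several minutes is a notable claim.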

Text-to-music models are not new, but Google says (via TechCrunch) “MusicLM outperforms previous systems both in audio quality and adherence to the text description.” The rich caption examples below generated 30-second audio pieces:

“The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls.”

“Epic soundtrack using orchestral instruments. The piece builds tension, creates a sense of urgency. An a cappella chorus sing in unison, it creates a sense of power and strength.”

“This is an r&b/hip-hop music piece. There is a male vocal rapping and a female vocal singing in a rap-like manner. The beat is comprised of a piano playing the chords of the tune with an electronic drum backing. The atmosphere of the piece is playful and energetic. This piece could be used in the soundtrack of a high school drama movie/TV show. It could also be played at birthday parties or beach parties.”

One particularly fun demo takes a description of a painting and sets MusicLM loose on it:

Napoleon Crossing the Alps by Jacques-Louis David: “The composition shows a strongly idealized view of the real crossing that Napoleon and his army made across the Alps through the Great St Bernard Pass in May 1800.” [Wikipedia]

There is also long generation, with 5-minute pieces for “melodic techno” and “swing.”

MusicLM is capable of generating various genres and even replicating “musician experience level” (e.g., beginner, intermediate, professional). Going forward, Google might explore generating lyrics, improving vocal quality, and supporting higher sample rates.

Google has “no plans to release models at this point,” citing the need for more work. More generated music examples can be found here. It joins the company’s work on text-to-image and text-to-video generation.

Author: Abner Li
Source: 9TO5Google
