In the same way that systems like DALL-E create visuals from written prompts, Google researchers have created an AI that can generate minute-long pieces of music from text prompts and even from whistled or hummed melodies. According to TechCrunch, the system can also transform an input melody into different instruments and styles. The model is called MusicLM, and although you can’t interact with it yourself, Google has released several samples that were created using it.
Yesterday, Google published a paper on a new AI model called MusicLM.
The model generates 24 kHz music from rich captions like “A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space.” pic.twitter.com/XPv0PEQbUh
— Product Hunt 😸 (@ProductHunt) January 27, 2023
Examples of MusicLM
The examples are extensive. There are 30-second clips that sound like real songs, generated from paragraph-long descriptions of a genre, an atmosphere, and even specific instruments, as well as five-minute pieces generated from just a word or two like “melodic techno.” Perhaps my favorite is a demonstration of “Story Mode,” in which the model is essentially handed a screenplay and switches between prompts. Consider this example:
But that’s only one of its features.
Story Mode, for instance, generates music based on a sequence of text prompts.
time to meditate (0:00-0:15)
time to wake up (0:15-0:30)
time to run (0:30-0:45)
time to give 100% (0:45-0:60) pic.twitter.com/yTHgr5fIZo
— Product Hunt 😸 (@ProductHunt) January 27, 2023
It might not be for everyone, but I can totally picture a human having written it (I’ve even listened to it on loop dozens of times while writing this article). The demo site also features examples of what the model produces when asked for 10-second clips of specific instruments such as cellos or maracas (the latter being a case where the system performs relatively poorly), eight-second clips of a particular genre, music suitable for a prison break, and what a beginner versus an advanced piano player sounds like. It also includes renditions of phrases such as “future club” and “accordion death metal.”
Capabilities of MusicLM
MusicLM is capable of simulating human voices, and while it captures the tone and general sound of voices accurately, they have a distinctly strange quality to them, best described as grainy or staticky. In the previous example the effect is not especially obvious, but I think it still comes through quite clearly.
Interestingly, this is the result of asking it to compose music for the gym. You may have also noticed that the lyrics are nonsense, but in a way that, if you’re not paying attention, you wouldn’t necessarily notice, as if you were listening to someone singing in Simlish, or a song that sounds like English but isn’t.
How Google Built the AI Generator
I won’t claim to know how Google achieved these results, but if you’re the kind of person who can make sense of the diagram below, Google has published a research paper that explains the method in depth.
really well done, from SoundStream and AudioLM through MuLan to MusicLM 👏👏
the overall structure of MusicLM
= MuLan + AudioLM
= MuLan + w2v-BERT + SoundStream pic.twitter.com/7d7ks1A5sz
— Keunwoo Choi (@keunwoochoi) January 27, 2023
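For a rough mental model, the tweet’s equation can be read as a staged pipeline: a MuLan-style text embedding conditions a semantic-token stage (the w2v-BERT-based part of AudioLM), whose output in turn conditions an acoustic-token stage whose tokens a SoundStream-style codec would decode into audio. The Python sketch below is purely illustrative: every class, method, and number in it is an invented stand-in to show the staged structure, not Google’s actual code.

```python
import random

class MuLanTextEncoder:
    """Stand-in for MuLan: maps a caption to a fixed-size embedding."""
    def encode(self, caption: str) -> list[float]:
        # Toy deterministic "embedding" derived from the caption's characters.
        random.seed(sum(ord(c) for c in caption))
        return [random.random() for _ in range(8)]

class SemanticStage:
    """Stand-in for the w2v-BERT-based semantic model in AudioLM:
    produces coarse 'semantic' tokens conditioned on the text embedding."""
    def generate(self, cond: list[float], n_tokens: int) -> list[int]:
        base = int(sum(cond) * 1000)
        return [(base + i) % 512 for i in range(n_tokens)]

class AcousticStage:
    """Stand-in for the acoustic model: maps semantic tokens (plus the
    conditioning embedding) to SoundStream-codec-style acoustic tokens."""
    def generate(self, semantic: list[int], cond: list[float]) -> list[int]:
        offset = int(sum(cond))
        return [(t * 7 + offset) % 1024 for t in semantic]

def music_lm_toy(caption: str, n_tokens: int = 16) -> list[int]:
    cond = MuLanTextEncoder().encode(caption)
    semantic = SemanticStage().generate(cond, n_tokens)
    return AcousticStage().generate(semantic, cond)

codes = music_lm_toy("A fusion of reggaeton and electronic dance music")
print(len(codes))  # 16 toy codec tokens; a real SoundStream decoder would render these to 24 kHz audio
```

The point of the sketch is only the data flow: text conditions semantics, semantics condition acoustics, and a neural codec turns acoustic tokens into a waveform.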
Previous Work on Music-Generating AI
Such systems have been credited with composing pop songs, imitating Bach better than a human could in the 1990s, and accompanying live performances for decades. One recent system uses Stable Diffusion, an AI image-generation engine, to turn prompts into spectrograms, which are then converted into music. According to the paper, MusicLM outperforms other systems both in its “quality and adherence to the caption” and in its ability to take in audio and copy a melody.
This last capability is probably one of the researchers’ best demonstrations. You can listen to input audio in which someone plays or whistles a tune, and then hear how the model reproduces it as an electronic synth lead, a string quartet, and so on. From the examples I’ve heard, it handles the task well.
As with its earlier experiments in this form of AI, Google is approaching MusicLM with more caution than its competitors. “We have no plans to release models at this point,” the paper concludes, citing the risks of “potential misappropriation of creative content” (read: plagiarism) and cultural appropriation or misrepresentation.
Conclusion
It’s possible the technology will appear in one of Google’s playful musical experiments in the future, but for now, the research will mainly benefit others building musical AI systems. Google says it will publicly release a dataset of roughly 5,500 music-text pairs, which could help train and evaluate other musical AIs.