
The New Google AI Update Converts Text Into Music

Much as systems like DALL-E create images from written prompts, Google researchers have built an AI that can generate minutes-long pieces of music from text prompts, and it can even transform a whistled or hummed melody into other instruments, according to TechCrunch. The model is called MusicLM, and although you can’t interact with it yourself, the company has released a number of samples created with it.

Examples of MusicLM


The examples are extensive. There are 30-second clips of what sound like real songs, generated from paragraph-long descriptions that specify a genre, atmosphere, and even particular instruments, as well as five-minute pieces generated from just a word or two, such as “melodic techno.” Perhaps my favorite is a demonstration of “Story Mode,” in which the model is essentially given a script and switches between prompts over time. Consider this example:



It might not be for everyone, but I can totally picture a human composing it (I’ve even listened to it on loop dozens of times while writing this article). The demo site also features examples of what the model produces when asked for 10-second clips of instruments such as the cello or maracas (the latter being an example where the system performs relatively poorly), eight-second clips of a specific genre, music suitable for a prison break, and what a beginner versus an advanced piano player would sound like. It also includes the model’s interpretations of phrases such as “future club” and “accordion death metal.”

Capabilities of MusicLM


MusicLM can even simulate human voices, and while it captures the tone and general sound of voices accurately, they have a distinctly strange quality. The easiest way to describe them is grainy or staticky. The effect is not so obvious in the previous example, but the next one shows it quite clearly.


Interestingly, this is the result of asking it to compose music for a gym. You may also have noticed that the lyrics are nonsense, but in such a way that you wouldn’t necessarily notice if you weren’t paying attention, much like listening to someone singing in Simlish or to a song that sounds like English but isn’t.


Google’s Work on the AI Generator


I won’t claim to know exactly how Google arrived at these results, but if you’re the kind of person who can follow the technical details, Google has published a research paper that explains the approach in depth.


Previous Work in AI Music


AI-generated music has a history going back decades: such systems have been credited with composing pop songs, imitating Bach better than humans could in the 1990s, and accompanying live performances. One more recent system uses the AI image-generation engine Stable Diffusion to turn text prompts into spectrograms, which are then converted into music. According to the paper, MusicLM outperforms other systems in both audio quality and adherence to the text prompt, and it can also take in audio and reproduce its melody.


This last capability is probably one of the researchers’ best demonstrations. You can listen to input audio, such as someone playing or whistling a tune, and then hear how the model reproduces it as an electronic synth lead, a string quartet, and so on. From the examples I’ve heard, it does the job well.


As with its earlier experiments in this kind of AI, Google is approaching MusicLM with more caution than some of its competitors. The paper concludes, “We have no plans to release models at this point,” citing the risk of “potential misappropriation of creative content” (read: plagiarism) as well as potential cultural appropriation or misrepresentation.




It’s possible the technology could appear in one of Google’s fun musical experiments in the future, but for now, the only people who can make use of the research are those building musical AI systems. Google says it is publicly releasing a dataset of about 5,500 music-text pairs, which could help with training and evaluating other musical AIs.
