Spotify Launches GPT-3 Backed AI DJ
Synthetic characters driven by AI are going to become commonplace
Convergence. I noticed in 2021 that there were a lot of synthetic media technologies that were growing but were typically looked at in isolation. Deepfakes, virtual humans, text and image generators, chatbots, and synthetic voices were all available in a cafeteria-style menu for anyone who wanted to implement them on their own or assemble multiple technologies for bespoke applications.
However, the real power of synthetic media emerged when you combined these technologies into a packaged solution. The convergence of synthetic media technologies is the catalyst for accelerated adoption.
Spotify just launched its AI DJ in the U.S. and Canada, and it is a perfect example of using synthetic media to enhance an existing consumer product. Spotify commented in a blog post about the release:
Ready for a brand-new way to listen on Spotify and connect even more deeply with the artists you love? The DJ is a personalized AI guide that knows you and your music taste so well that it can choose what to play for you. This feature, first rolling out in beta, will deliver a curated lineup of music alongside commentary around the tracks and artists we think you’ll like in a stunningly realistic voice.
Personalized Experience with a Twist
Spotify users are accustomed to personalization. This has been a core value proposition since the early days. However, the AI DJ takes this to a new level. In many ways, it is similar to the personalized playlists Spotify is so well known for, but the DJ plays a mix of songs you have already liked as well as new songs it thinks you will enjoy. Before it plays a new recommended song, the AI DJ jumps in and tells you about the song and why it thinks you will like it.
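To make the format concrete, here is a minimal sketch of how a session could interleave familiar songs with new recommendations and place a short spoken intro before each new one. Spotify has not published how the DJ sequences tracks, so the function name, the `liked_per_new` ratio, and the placeholder commentary text are all assumptions for illustration only.

```python
def build_session(liked, recommended, liked_per_new=2):
    """Interleave liked tracks with new ones (hypothetical logic).

    After every `liked_per_new` liked tracks, one recommended track is
    queued, preceded by a commentary slot that introduces it.
    """
    queue = []
    new_tracks = iter(recommended)
    for position, title in enumerate(liked, start=1):
        queue.append(("track", title))
        if position % liked_per_new == 0:
            new_title = next(new_tracks, None)
            if new_title is not None:
                # Commentary always comes right before the new song.
                queue.append(("commentary", f"Intro for {new_title}"))
                queue.append(("track", new_title))
    return queue


session = build_session(["A", "B", "C", "D"], ["X"])
```

With four liked songs and one recommendation, the queue plays two familiar tracks, then the intro and the new track, then the remaining familiar tracks.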
Many people grew up with great radio DJs who curated the listening experience and used their interjections between songs to add interesting commentary. One problem with those experiences is that they are genre-based and not necessarily aligned with your preferences, which may extend beyond a single genre. A bigger problem is the insertion of 12-18 minutes of ads for every hour of radio listening.
Spotify’s AI-enabled DJ is not burdened by either issue. If you are a premium user, you already have an ad-free experience. And, compared with an uninterrupted stream of music, the insertion of DJ commentary every few songs takes the experience up a level.
It may not be for everyone. Some people may prefer music with no interruptions. For me, listening to a mix of new and known music with occasional DJ commentary of 20-30 seconds is a superior product experience.
Stunningly Realistic Synthetic Voice
The DJ’s voice is based on Xavier “X” Jernigan, Spotify’s Head of Cultural Partnerships. Spotify commented in its announcement:
Previously, X served as one of the hosts on Spotify’s first (and personalized) morning show, The Get Up. His personality and voice resonated with our listeners and resulted in a loyal following for the podcast. His voice is the first model for the DJ, and we’ll continue to iterate and innovate, as we do with all our products.
I have heard a lot of synthetic voices over the years, and the quality, in general, has improved markedly over the past four years. The best implementations have tended to be speech-to-speech, where voice actors read a script and AI converts their performance to a synthetic voice, usually a clone of a known actor or celebrity.
Text-to-speech, in which a synthetic voice renders text as spoken audio, is far more common and cost-efficient, but it generally does not have the same prosody fidelity for intonation and inflection. Spotify DJ is text-to-speech but seems to match the quality of speech-to-speech.
Many people were surprised when Spotify acquired text-to-speech pioneer Sonantic in June 2022. The company became famous for replicating Val Kilmer’s voice in the documentary about his life, Val, in 2021. We now know what Spotify had in mind.
GPT-3 Strikes Again
Of course, having a personalized music selection and a humanlike synthetic voice was only part of the recipe for Spotify’s new AI DJ. The company also needed to render interesting commentary when it does speak, and there would be no way to script this for 200 million premium subscribers. GPT-3 is being used to generate the words behind the voice of X.
Generative AI through the use of OpenAI technology…in the hands of our music editors [provides] you with insightful facts about the music, artists, or genres you’re listening to. The expertise of our editors is something that’s really important to our philosophy at Spotify.
We have experts in genres who know music and culture inside and out. And no one knows the music scene better than they do. With this generative AI tooling, our editors are able to scale their innate knowledge in ways never before possible.
In other words, Spotify created a fine-tuned model of GPT-3 based on its own internally generated knowledge base. And GPT-3 ensures each conversation is unique and humanlike in its delivery. There may be errors along the way in some of the generations, but this is a low-risk implementation of the technology, and it should default to the information in the knowledge base most of the time.
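The pieces described above (editor knowledge, a language model, and a synthetic voice) can be sketched as a small pipeline. Spotify has not published its implementation, so this is only an illustration under stated assumptions: it grounds the model by placing editor facts in the prompt rather than fine-tuning, and `KNOWLEDGE_BASE`, `build_prompt`, `dj_segment`, and the two stand-in functions are hypothetical names, not Spotify or OpenAI APIs.

```python
from typing import Callable

# Hypothetical editor-written knowledge base: artist -> approved facts.
KNOWLEDGE_BASE = {
    "Example Artist": [
        "Example Artist recorded their debut album in a converted garage.",
        "Their latest single was produced by a longtime collaborator.",
    ],
}


def build_prompt(artist: str, track: str, max_facts: int = 2) -> str:
    """Assemble a prompt that restricts the model to editor facts."""
    facts = KNOWLEDGE_BASE.get(artist, [])[:max_facts]
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    if not fact_lines:
        fact_lines = "- (no editor notes available)"
    return (
        "You are a friendly radio DJ. Introduce the next track briefly.\n"
        f"Track: {track} by {artist}\n"
        "Use only these editor-approved facts:\n"
        f"{fact_lines}"
    )


def dj_segment(
    artist: str,
    track: str,
    generate: Callable[[str], str],  # stands in for a language model call
    speak: Callable[[str], bytes],   # stands in for a text-to-speech call
) -> bytes:
    """Prompt -> generated script -> synthesized audio."""
    script = generate(build_prompt(artist, track))
    return speak(script)


# Stand-ins so the sketch runs without any external service.
def fake_generate(prompt: str) -> str:
    track_line = prompt.splitlines()[1]
    return "Up next: " + track_line[len("Track: "):]


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


audio = dj_segment("Example Artist", "Song", fake_generate, fake_speak)
```

Passing the model and the voice in as plain callables keeps the editor-controlled part (the facts and the prompt) separate from the generative parts, which is one way to keep errors low-risk: when no editor notes exist, the prompt says so explicitly instead of leaving the model to improvise.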
Synthetic Culture Emerges
It may seem like this is just another GPT-3 feature representing an incremental change in an already popular app. However, the AI DJ is likely to have both a symbolic and tangible impact on the adoption of synthetic media and generative AI technology.
The symbolic impact is that synthetic media is beginning to become commonplace in our cultural touchstones, which will make it both a shaper of culture as well as a reflection of it. The tangible impact is that more people than ever are being exposed to these technologies on a daily basis.
Synthetic media use is about to become as common as television or radio listening. The novelty may soon wear off because it will simply be embedded in so many services that it becomes an expectation.
If you have been wondering why people are so excited about generative AI and synthetic media, here is one more example. When new technologies become everyday technologies like the smartphone, a lot of change is afoot.
Great analysis and matches my experience.
The AI DJ still isn't universally rolled out and so skepticism seems plentiful from people who haven't even tried it yet. I got access and agree with you — this is one of the best voice experiences I've encountered because it's personalized to your tastes, entertains, works within a narrow range (isn't trying to do too much), and builds off a familiar and well known format (radio DJs).
I really want this in the car. If only Car Thing had succeeded!