I played hooky on Tuesday. Instead of running around the Wynn lobby or the Las Vegas Convention Center with 90,000 other attendees at the NAB Show, I was in San Francisco at “The Voice of the Car” summit. It was a sold-out event at the beautiful NASDAQ Center, featuring a series of keynotes and panels all with a singular focus:
To learn more about the role voice plays in the cars we drive.
There were a couple other “radio people” in the room, and most of the technologists on hand were a bit surprised to see me, and hear my perspective on why radio needs to be engaged in this technology. I attend a lot of conferences, and at the really good ones, themes emerge. From my point of view, one of the consistent threads was an emphasis on artificial voice personality.
You read that correctly.
The tech community – from Amazon to Google to SoundHound to some of the smaller players – has come to realize the optimal user experience is a voice that sounds human, interacting with consumers, answering their questions, and through AI, anticipating and fulfilling their needs.
Interestingly, that’s what great programmers have been coaching talent on for decades. Be real, be yourself, connect with your listeners, imagine you’re talking with just one person.
It turns out a mechanical, impersonal voice isn’t particularly inviting. Techies love when consumers refer to voice assistants by name – Siri, Alexa, Bixby, Hound. They’re working hard to make their voice assistants conversational, accessible without pushing buttons or talking around beeps and tones.
And as we learned almost a decade ago about cars, there’s an entirely new vocabulary – and more acronyms – that revolve around successfully integrating intelligent voices into various platforms.
There’s VUX – Voice User Experience, which sums up the overall give and take consumers have with the voices that guide them, providing information and answers.
Or ASR – Automatic Speech Recognition; the ability for the technology to understand what consumers want and deliver it to them.
And then there’s NLR – Natural Language Recognition; an even deeper “skill” for software developers to master around voice.
But when you dig deeper, all this software cannot measure our moods – Siri and Alexa don’t know when we’ve had a horrific day at work. Or when the sun comes out for the first time in weeks, lifting our spirits. Or the sense of anticipation we have when a new song is released by a favorite artist.
That’s the DJ’s job. And at great radio stations, they convey those feelings, emotions, and the local context consumers have come to expect from broadcast, real-time radio.
But that won’t stop technologists from trying. A new story in Axios says technology that can create fake celebrity speech could be the future of voice. And it’s coming sooner than we think.
Pindrop is the “audio biometrics” company featured in a story by Kaveh Waddell. The Pindrop team listened to hours and hours of Ellen DeGeneres, developing a synthetic voice.
It’s not perfect – the cadence, the emotion, the inflections are off just a bit. But you can hear where this technology is headed. You can hear how the “virtual Ellen” sounds.
And here’s the key quote from Pindrop CEO Vijay Balasubramaniyan who admits the software is not perfect:
“You are actually identifying all the things it takes to start mimicking a million years of human evolution in voice. Our synthesis systems do a good job at synthesizing a voice but not yet things like cadence, emotion and flair, which are all active areas of research.”
Google has stepped into the arena as well – featuring a limited-time John Legend voice on its Google Assistant platform.
Bernard Beanz Smalls writes in HipHop Wired that crooner Legend has lent his voice, singing “Happy Birthday,” giving the weather, and performing other simple tasks on Google Assistant here in the U.S.
https://www.youtube.com/watch?time_continue=7&v=AgHghcYqeto
You can see where this is headed – perhaps taking voicetracking as we know it to a whole new level. It also could “preserve” the voices of iconic radio personalities and voice artists, which could include Paul Harvey, Nick Michaels, Larry Lujack, or Chris Corley. Those prospects may be disturbing to some, but it’s a statement about where voice technology and the user experience are headed. And it could be a way for technology to create permanent reminders of the impact these artists had on the broadcast radio medium.
Meantime, my takeaways from “The Voice of the Car” Summit will hopefully be a reminder to programmers and on-air personalities of the importance of staying in touch with their audience and their market. You’ll read them over the next few weeks in this blog.
At this gathering, it struck me that when Artificial Intelligence is working to its full potential, it’s predicting our needs and wants based on our past behaviors and tendencies (“If you like this….”). There’s something comforting (or maybe a little creepy) about technology anticipating our moves and desires, based on our chain of experiences, knowing where we drive, our favorite Starbucks, or the dive bar we hang out in on Friday nights.
But none of those amazing algorithms can replace the element of surprise you can still get from listening to the radio – a deep cut, a guilty pleasure, a DJ doing something that’s not a benchmark bit, or a talk host breaking the usual mold and topic paradigms.
I can tell you from moderating hundreds of focus groups that listeners appreciate and remember those special moments – when Gene Simmons walked out on “Fresh Air’s” Terry Gross, when Art Penhallow played Stevie Ray Vaughan’s “Pride and Joy” two times in a row because he loved the song so much when he first heard it, or when Steve Dahl blew up a bunch of disco records in between games of a doubleheader.
AI is about predictability. When it’s on its game, AM/FM is anything but.
To truly stand out in the audio ecosphere today, radio has simply got to be better than the bots.
They’re coming to a dashboard near you.
David Manzi says
When I read anything about technological leaps and how far they might go, it reminds me of my first experience with a video game–“Pong.”
Think of the leaps between “Pong” and, say, “Pac-Man,” a few years later. Then consider the amazing leaps since Pac-Man, which is coming up on its 40th birthday!
All that to answer the question of whether the techies will figure out such voice subtleties as “cadence, emotion and flair,” with a loud and resounding “Oh YES, you better believe they will figure it out!”
All the more reason those in the radio game better stay on THEIR game, because don’t look now but a bot is nipping at our heels.
Fred Jacobs says
No doubt about it, Dave. Things are moving FAST. Thanks for the comment & the POV.