If you haven’t taken part in the yanny/laurel controversy over the last couple of days, allow me to sincerely congratulate you. But your time is up. The viral speech synth clip has met the AI hype train and the result is, like everything in this mortal world, disappointing.
Sonix, a company that produces AI-based speech recognition software, ran the ambiguous sound clip through transcription tools from Google, Amazon and IBM’s Watson, and of course through its own.
Google and Sonix managed to get it on the first try — it’s “laurel,” by the way. Not yanny. Laurel.
But Amazon stumbled, repeatedly producing “year old” as its best guess for what the robotic voice was saying. IBM’s Watson, amazingly, got it only half the time, alternating between hearing “yeah role” and “laurel.” So in a way, it’s the most human of them all.
Top: Amazon; bottom: IBM.
Sonix CEO Jamie Sutherland told me in an email that he can’t really comment on the mixed success of the other models, not having access to them.
“As you can imagine the human voice is complex and there are so many variations of volume, cadence, accent, and frequency,” he wrote. “The reality is that different companies may be optimizing for different use cases, so the results may vary. It is challenging for a speech recognition model to accommodate for everything.”
My guess as an ignorant onlooker is that it may have something to do with the frequencies the models have been trained to prioritize. Sounds reasonable enough!
It’s really an absurd endeavor to appeal to a system based on our own hearing and cognition to make an authoritative judgement in a matter on which our hearing and cognition are demonstrably lacking. But it’s still fun.