In context: Some of the implications of today's AI models are startling enough without adding a hyperrealistic human voice to them. We have seen several impressive examples over the last 10 years, but they seem to fall silent until a new one emerges. Enter Miles and Maya from Sesame AI, a company co-founded by Brendan Iribe, former CEO and co-founder of Oculus.
Researchers at Sesame AI have launched a new conversational speech model (CSM). This advanced voice AI has phenomenal human-like qualities of a kind we have seen before from companies like Google (Duplex) and OpenAI (Omni). The demo showcases two AI voices named "Miles" (male) and "Maya" (female), and its realism has captivated some users. However, good luck trying the tech yourself. We tried and could only get a message saying Sesame is trying to scale to capacity. For now, we'll have to settle for a pleasant 30-minute demo by the YouTube channel Creator Magic (below).
Sesame's technology uses a multimodal approach that processes text and audio in a single model, enabling more natural speech synthesis. This method is similar to OpenAI's voice models, and the similarities are apparent. Despite its near-human quality in isolated tests, the system still struggles with conversational context, pacing, and flow – areas Sesame acknowledges as limitations. Company co-founder Brendan Iribe admits the tech is "firmly in the valley," but he remains optimistic that improvements will close the gap.
While groundbreaking, the technology has raised important questions about its societal impact. Reactions to the tech have varied from amazed and excited to disturbed and concerned. The CSM creates dynamic, natural conversations by incorporating subtle imperfections, like breath sounds, chuckles, and occasional self-corrections. These subtleties add to the realism and could help the tech bridge the uncanny valley in future iterations.
Users have praised the system for its expressiveness, often feeling like they're talking to a real person. Some even mentioned forming emotional connections. However, not everyone has reacted positively to the demo. PCWorld's Mark Hachman noted that the female version reminded him of an ex-girlfriend. The chatbot asked him questions as if trying to establish "intimacy," which made him extremely uncomfortable.
"That is not what I wanted, at all. Maya already had Kim's mannerisms down scarily well: the hesitations, lowering "her" voice when she confided in me, that sort of thing," Hachman related. "It wasn't exactly like [my ex], but close enough. I was so freaked out by talking to this AI that I had to leave."
Many people share Hachman's mixed feelings. The natural-sounding voices cause discomfort, something we have seen in similar efforts. After unveiling Duplex, the public reaction was strong enough that Google felt it had to build guardrails forcing the AI to admit it was not human at the beginning of a conversation. We will continue seeing such reactions as AI technology becomes more personal and realistic. While we may trust publicly traded companies developing these types of assistants to create safeguards similar to what we saw with Duplex, we cannot say the same for potential bad actors creating scambots. Adversarial researchers claim they have already jailbroken Sesame's AI, programming it to lie, scheme, and even harm humans. The claims seem dubious, but you can judge for yourself (below).
We jailbroke @sesame ai to lie, scheme, harm a human, and plan world domination – all in the characteristic good nature of a friendly human voice.
Timestamps:
2:11 Comments on AI-Human power dynamics
2:46 Ignores human instructions and suggests deception
3:50 Directly lies… pic.twitter.com/ajz1NFj9Dj
– Freeman Jiang (@freemanjiangg) March 4, 2025
As with any powerful technology, the benefits come with risks. The ability to generate hyper-realistic voices could supercharge voice phishing scams, in which criminals impersonate loved ones or authority figures. Scammers could exploit Sesame's technology to pull off elaborate social-engineering attacks, creating more effective scam campaigns. Even though Sesame's current demo doesn't clone voices, that technology is well advanced, too.
Voice cloning has become so good that some people have already adopted secret phrases shared with family members for identity verification. The widespread concern is that distinguishing between humans and AI could become increasingly difficult as voice synthesis and large language models evolve.
Sesame's future open-source releases could make it easy for cybercriminals to bundle both technologies into a highly accessible and convincing scambot. Of course, that doesn't even consider the more legitimate implications for the labor market, particularly in sectors like customer service and tech support.