The AI firm Sesame has unveiled the foundational model that drives Maya, the remarkably lifelike voice assistant.
The 1-billion-parameter model (“parameters” being the individual internal values a model learns during training) is licensed under Apache 2.0, allowing commercial use with minimal restrictions. Named CSM-1B, it converts text and audio inputs into “RVQ audio codes,” as outlined in Sesame’s profile on the AI development platform Hugging Face.
RVQ stands for “residual vector quantization,” a method for encoding audio into distinct tokens known as codes. This technique is utilized in various contemporary AI audio applications, including Google’s SoundStream and Meta’s Encodec, as discussed in recent literature.
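To make the idea concrete, here is a minimal toy sketch of residual vector quantization in NumPy. This is purely illustrative of the general technique (a cascade of small codebooks, each quantizing the residual left by the previous stage) and is not Sesame’s actual tokenizer; the function names and codebooks are invented for the example.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Encode vector x as one index per stage: each codebook
    quantizes the residual left over by the previous stage."""
    codes = []
    residual = x.astype(float)
    for cb in codebooks:
        # pick the nearest codeword in this stage's codebook
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

if __name__ == "__main__":
    # two tiny stages quantizing a 2-dim "audio frame"
    codebooks = [
        np.array([[1.0, 0.0], [0.0, 1.0]]),
        np.array([[0.5, 0.0], [0.0, 0.5]]),
    ]
    x = np.array([1.0, 0.5])
    codes = rvq_encode(x, codebooks)
    print(codes, rvq_decode(codes, codebooks))
```

The appeal of the residual cascade is that each extra stage refines the approximation, so a short sequence of small integer codes can stand in for a high-dimensional audio frame.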
CSM-1B utilizes a backbone from Meta’s Llama model family, paired with an audio “decoder” component. According to Sesame, a finely-tuned variant of this model powers Maya.
In the description for CSM-1B found on Hugging Face and GitHub, Sesame states, “The model made available here is a foundational generation model. It can produce a variety of voice outputs, but is not optimized for any particular voice. […] The model does have some ability to handle non-English languages due to data contamination during training; however, its performance in those languages may not be reliable.”
The exact datasets used for the training of CSM-1B remain unspecified by Sesame.
Notably, the model lacks robust safeguards; it operates on an “honor system.” Sesame advises developers and users against employing the model to replicate someone’s voice without consent, generate misleading content such as fake news, or partake in any “harmful” or “malicious” activities.
I experimented with the demo on Hugging Face and was able to clone my voice in under a minute. From there, it was straightforward to generate speech on various topics, including sensitive ones like elections and Russian propaganda.
Sesame, co-founded by Oculus co-founder Brendan Iribe, drew significant attention in late February for its borderline-uncanny assistant technology. Both Maya and Sesame’s other assistant, Miles, exhibit breathing sounds and speech disfluencies, and can be interrupted mid-sentence, much like OpenAI’s Voice Mode.
The company has secured undisclosed funding from Andreessen Horowitz, Spark Capital, and Matrix Partners. Besides its work on voice assistant technology, Sesame is also developing AI glasses “intended for all-day wear,” equipped with its proprietary models.