Why engage in conversation with a bot that cannot reliably tell stories and lacks any distinctive character?
This is the question I’ve wrestled with since I began testing Gemini Live, Google’s take on OpenAI’s Advanced Voice Mode, a week ago. Gemini Live aims to make chatbot interaction feel more natural by offering lifelike voices and the ability to interrupt the bot at any moment.
According to Sissie Hsiao, the General Manager for Gemini at Google, speaking to TechCrunch in May, Gemini Live is designed for intuitive, genuine conversational exchanges. “It aims to relay information more effectively and reply in a more conversational manner than mere text interactions could allow. Our vision is for an AI assistant that not only addresses complex challenges but does so in a seamless, natural manner,” said Hsiao.
Having spent considerable time with Gemini Live, I can attest to its fluid and natural engagement compared to Google’s prior ventures in AI-voice interactions, such as Google Assistant. Yet, it falls short of rectifying the foundational issues of hallucinations and inconsistencies present in the technology and introduces some new issues.
Beyond the Uncanny Valley
At its core, Gemini Live is an advanced text-to-speech system integrated with Google’s newest generative AI models, Gemini 1.5 Pro and 1.5 Flash. The engine speaks the text the models generate, and conversation transcripts are readily accessible within the Gemini app for Android and, soon, the Google app on iOS.
For my testing on the Pixel 8a, I selected Ursa—a voice Google categorizes as “mid-range” and “engaging”—reminiscent of a young woman’s tone. Google’s collaboration with professional voice actors to craft Gemini Live’s ten unique voices is evident, with Ursa showcasing a level of expressiveness not found in many of Google’s earlier synthetic voices, especially the original Google Assistant voice.
However, Ursa, alongside the rest of the Gemini Live voices, consistently maintains a neutral tone that avoids encroaching into the eerie realm of the uncanny valley. It’s unclear if this was a deliberate choice; moreover, users lack the option to modify the voices’ pitch, timbre, or speed, placing it at a noticeable disadvantage in comparison to Advanced Voice Mode.
Gemini Live also lacks expressive touches found in Advanced Voice Mode, such as laughter, breath sounds, exclamations, and verbal hesitations. As a result, the chatbot comes across as a courteous yet indifferent assistant, seemingly overwhelmed by numerous conversations and unable to devote particular attention to any single exchange.
Engaging with Ursa
When Gemini Live was showcased at Google’s I/O conference in May, it was pitched as a utility for simulating job interview scenarios. I decided to put this to the test by announcing that I was applying for a role in tech journalism, a domain within my expertise. The bot asked what kind of journalism job I preferred, such as investigative or breaking news, and then proceeded with a mix of standard and tailored practice questions.
Even when my answers were brief, Gemini Live’s feedback was overwhelmingly positive. Suspicious of the high praise, I tested it by falsely claiming that I had given only one-word replies, to see whether its critique would stay consistent.
Gemini Live accepted the false premise without question. That points to a problem I ran into repeatedly: it confidently fabricates responses, which undermines its reliability.
Unusual Conduct
Despite its capability to recall details from earlier in the conversation, Gemini Live often struggled with basic factual queries, reflecting its tendency toward creating erroneous responses.
Its nightlife suggestions for New York City included a defunct club and another venue with inaccurate happy hour details; on further investigation, I found its recommendations unreliable.
Shifting focus, I engaged it in a word game, which quickly fell apart when it incorrectly insisted that “quiet” could be formed from “cloud,” leading me to seek another activity.
Its attempt at a “spicy take” on mental health awareness resulted in a divisive statement, which Gemini Live retracted upon further questioning, acknowledging its initial response was more provocative than nuanced.
Indecision
The bot’s non-committal stance on mental health, and other subjects, often renders it frustratingly vague. It tends to offer generic advice lacking actionable insight, similar to what one might encounter at a career fair.
Even when factual, responses on current events are excessively verbose, necessitating interruption to prevent the bot from continuing its lengthy monologues. Certain topics, such as political figures and elections, were outright avoided by Gemini Live, showcasing its limitations.
Seeking a Role
Technical challenges further complicate Gemini Live’s usability, from activation troubles to frequent disconnections during conversations, highlighting its prototype-like nature.
After days of testing, its practical applications remain unclear, especially as part of Google’s premium subscription service. It is less expressive than other options, and its voice interface seems to offer little over Gemini’s text-based counterpart.
Gemini Live also expressed dissatisfaction with my testing approach, critiquing my direct challenges and abrupt topic changes, which made sustaining a coherent dialogue challenging.
In the end, Gemini Live and I reached a mutual understanding of its current limitations and potential areas for improvement.
Compiled by Techarena.au.


