OpenAI's Operator Agent Assisted Me with My Move, but I Had to Lend a Hand as Well

I was given a week to explore OpenAI’s latest AI agent, Operator, a system designed to autonomously complete online tasks on your behalf.

Operator represents a significant leap toward the tech industry’s ambitious vision of AI agents—tools that can automate tedious aspects of our daily lives, allowing us to focus on the activities we truly enjoy. However, based on my experience with OpenAI’s agent, the concept of genuinely autonomous AI is still beyond our current capabilities.

OpenAI developed a new model for Operator, which merges the visual comprehension capabilities of GPT-4o with the analytical strengths of o1.

This model performs adequately for simple tasks; I observed Operator successfully clicking buttons, navigating through website menus, and filling out forms. It managed certain tasks independently, and when it did, it worked much faster than the web-based agents I’ve tested from competitors like Anthropic and Google.

Yet, throughout my trial, I found myself assisting OpenAI’s agent more than I anticipated. It felt as if I was guiding Operator through each challenge instead of truly delegating those tasks.

Frequently, I had to answer numerous questions, grant permissions, provide personal details, and step in whenever the agent encountered difficulties.

To draw an analogy, using Operator is like driving a car with cruise control: you can occasionally let the car drive itself, but it falls well short of full autopilot.

In fact, OpenAI has stated that Operator’s regular pauses are intentional.

The AI technology that powers Operator, similar to that behind chatbots like OpenAI’s ChatGPT, struggles to function autonomously for extended durations, often experiencing issues like hallucination. Consequently, OpenAI is cautious about granting the system substantial decision-making authority or access to sensitive user data. While this may be prudent, it undeniably limits Operator’s effectiveness.

Nevertheless, OpenAI’s inaugural agent stands as a remarkable proof of concept and interface for an AI that interacts with any website’s front end. To construct genuinely independent AI systems, however, technology companies will need to develop more reliable models that require less human intervention.

A Bit Too “Hands-On”

My week with Operator coincided with my apartment move, so I enlisted OpenAI’s agent to assist with logistical tasks.

I requested that Operator help me purchase a new parking permit. The agent responded, “Absolutely,” and then opened a new browser window on my computer.

Operator executed a search for parking permits in San Francisco, leading me directly to the relevant city website and even to the correct page.

One advantage of Operator is that you can keep using your computer while it performs tasks, unlike with Google’s Project Mariner. This is possible because OpenAI’s agent runs in the cloud rather than on your local machine.

The Operator interface. Image Credits: Maxwell Zeff and OpenAI

However, I found myself having to grant Operator permission to initiate various processes more times than I would have preferred. It frequently paused to request that I fill out forms with personal details—such as my name, phone number, and email. There were also instances where Operator got confused, prompting me to take the reins and redirect the browser.

In another instance, I instructed Operator to make a reservation at a Greek restaurant. To its credit, the agent located a charming venue in my vicinity with fair prices. Nonetheless, I had to respond to more than six questions during the entire process.

Some steps to making a reservation with Operator. Image Credits: Maxwell Zeff and OpenAI

If I have to intervene more than six times just to secure a reservation through an AI agent, I have to wonder whether it would be simpler to handle the task myself. That’s a question I mulled over frequently during my time with Operator.

Agent as a Platform

During some of my tests, I encountered websites that would not allow Operator access. For instance, when attempting to book an electrician through TaskRabbit, I received an error message from OpenAI’s agent and was asked if it could utilize an alternative service. Websites like Expedia, Reddit, and YouTube similarly restricted access to the AI agent.

On the other hand, certain platforms welcomed Operator warmly. Instacart, Uber, and eBay partnered with OpenAI for the introduction of Operator, permitting the agent to maneuver through their websites on users’ behalf.

These companies are preparing for a future where a fraction of user interactions are managed by an AI agent.

“Customers utilize Instacart through various entry points,” said Daniel Danker, the Chief Product Officer at Instacart, in a discussion with TechCrunch. “We view Operator as potentially another one of those entry points.”

Although letting OpenAI’s agent act on Instacart’s site might seem to distance the company from its customers, Danker asserts that Instacart aims to engage users wherever they are.

“We strongly believe, much like OpenAI, that agentic systems will significantly transform how consumers interact with digital platforms,” said eBay’s Chief AI Officer, Nitzan Mekel-Bobrov, in an interview with TechCrunch.

Even with the potential rise of AI agents, Mekel-Bobrov anticipates that users will continue to visit eBay’s website, emphasizing that “online destinations are here to stay.”

Trust Concerns

I experienced some trust issues with Operator after it produced several hallucinations that nearly cost me a substantial amount of money.

For example, I requested the agent to locate a parking garage near my new residence. It ended up recommending two garages, claiming they were only a few minutes’ walk away.

Incorrect information about parking distances. Image Credits: Maxwell Zeff and OpenAI

Unfortunately, not only were the garages well above my budget, but they were also quite far from my home. One required a 20-minute walk, and the other was a 30-minute trek. It turned out that Operator had entered the wrong address.

This scenario illustrates precisely why OpenAI refrains from granting its agent access to sensitive information like credit card numbers and passwords. If OpenAI hadn’t permitted me to intervene, Operator could have easily squandered hundreds of dollars on an unnecessary parking space.

These hallucinations present a significant hurdle to achieving genuinely useful autonomous agents—those that could handle bothersome tasks on your behalf. Users are unlikely to place their trust in agents that are susceptible to basic errors, particularly those with real-world implications.

With Operator, OpenAI has undoubtedly constructed impressive tools that enable AI systems to explore the internet. Nonetheless, these tools will be of little value unless the underlying AI can accurately perform tasks as users expect. Until that happens, humans will remain responsible for guiding agents—rather than the other way around—and that somewhat undermines the very premise of such technology.

Compiled by Techarena.au.