It’s Time to Standardize Computer Use Agents: A Call to Action for the AI Community

Over the past year, we’ve seen computer use agents (also called web agents) go from research experiments to real-world productivity tools. At ReFocus AI, we’ve been using BrowserUse (a Y Combinator-backed platform) to power our Intelliagent product, which automates quoting for insurance agents. And we’ve tested a range of other tools, from Stagehand to Browserbase. Each one showed promise, and each one also had friction.

These tools work by simulating human behavior on websites: logging in, navigating, extracting information, and taking actions, all without APIs. It’s a superpower for industries like insurance, where API access is fragmented, inconsistent, or outright unavailable.

But as more of us start building products with computer use agents, we’re running into the same problems again and again:

– Flaky selectors
– Unreliable page loading
– Poor support for auth flows
– No shared definitions of success
– No consistent telemetry or audit standards
– Inconsistent ways to handle UI changes
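Flaky selectors, for instance, push nearly every team toward the same defensive glue code. Here is a minimal sketch of the kind of retry-with-backoff wrapper builders end up reinventing; the helper name, the toy `flaky_find` function, and the timings are all hypothetical, not part of any of the tools mentioned above.

```python
import time

def retry(action, attempts=3, delay=0.01, backoff=2.0):
    """Retry a flaky zero-argument callable with exponential backoff.

    In practice `action` would wrap a selector lookup or page interaction;
    here everything is illustrative.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:  # real code would catch the tool's specific error type
            last_error = exc
            time.sleep(delay * (backoff ** attempt))
    raise last_error

# A pretend "selector" that fails twice before succeeding.
calls = {"n": 0}

def flaky_find():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("element not found")
    return "button#submit"

print(retry(flaky_find))  # succeeds on the third attempt
```

Every team writes some version of this; none of it is portable across agent frameworks, which is exactly the problem.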

At ReFocus AI, we’ve been building through it. Our product is now quoting policies in under 5 minutes with over 80% bindable accuracy, and we’re just getting started. But one thing is clear: we need a foundation.

Why we need standards now

If you’ve tried multiple tools, you know there’s no clear baseline. No interoperability. No minimal set of capabilities that every computer use agent should offer out of the box. And no common language to describe what these agents do, what they’re allowed to do, or what counts as “done.”

The result:

– Engineers reinvent the wheel every time.
– Startups build hacks to handle edge cases instead of focusing on innovation.
– Enterprises are hesitant to adopt because it feels like the Wild West.

The pace of innovation in this space is stunning. In just the past few months, we’ve seen:

– Anthropic launch Computer Use
– Google announce Project Mariner
– Amazon quietly debut Nova
– OpenAI unveil Operator
– Hugging Face experiment with Open Computer Agent

These aren’t research experiments. They’re signals. Computer use agents are becoming a core capability, and everyone’s racing to build their own.

But here’s the catch: each tool approaches the problem differently. Different ways of defining tasks. Different abstractions. No interoperability. No consistent performance expectations.

That fragmentation slows all of us down. Without a shared baseline, builders spend more time debugging than innovating. And enterprise adoption stalls because there’s no clear path to maturity or risk management.

We’re at the moment before the moment, just like with LLMs before Hugging Face and LangChain helped organize the ecosystem.

Who should lead this?

Standardization doesn’t have to come from a trillion-dollar company, but we should absolutely work with them.

The best standards emerge from broad collaboration: vendors, builders, researchers, and users. Think W3C for the web or ONNX for AI models. We need an equivalent for agents. It could take shape as:

– A community-led alliance or SIG (special interest group)
– An open-source foundation under Linux Foundation, MLCommons, or IEEE
– A working group under an organization like Hugging Face, given their ecosystem reach

I’d love to contribute and maybe even help drive this forward.

What comes next?

We should start with a simple goal: define a shared interface and a minimal set of capabilities that all compliant computer use agents should support.

From there, we can extend into:

– Security and privacy guidelines
– Observability and audit standards
– Plug-and-play compatibility across environments
– Performance benchmarks
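To make the shared-interface idea concrete, here is one possible shape for it, sketched in Python. Everything here is hypothetical: the class names, the `TaskResult` fields, and the method signature are an illustration of what a standard could specify, not an existing spec.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    """A shared definition of 'done': a status, structured output, and an audit trail."""
    success: bool
    output: dict
    audit_log: list = field(default_factory=list)

class ComputerUseAgent(ABC):
    """A hypothetical minimal interface every compliant agent could expose."""

    @abstractmethod
    def run(self, task: str, inputs: dict) -> TaskResult:
        """Execute a natural-language task and return a structured, auditable result."""

class EchoAgent(ComputerUseAgent):
    """Toy implementation showing the contract; no real browser involved."""

    def run(self, task: str, inputs: dict) -> TaskResult:
        log = [f"received task: {task}"]
        return TaskResult(success=True, output={"echo": inputs}, audit_log=log)

agent = EchoAgent()
result = agent.run("quote a policy", {"carrier": "ExampleCo"})
print(result.success, result.output)
```

The point isn’t these particular names; it’s that a common contract for task definition, success reporting, and audit logging would let tools compete on quality while remaining interchangeable.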

If we get this right, we can unlock faster innovation, more robust systems, and broader enterprise adoption.

This is a call to the builders, investors, and researchers shaping the future of agents:
Let’s build the foundation together.

If you’re working in this space, want to collaborate, or have thoughts, I’d love to connect.
