Surfer 2

H Company advances the frontier of computer-use agents with Surfer 2, a unified architecture built for real, high-impact use cases. Our mission is to move from models that know to agents that do: we create capable, context-aware systems that operate reliably in digital environments.

H is thrilled to present Surfer 2: a cross-platform computer-use agent that runs seamlessly on desktop, web, and mobile environments. Surfer 2 surpasses existing state-of-the-art agents on four major agentic benchmarks spanning multiple platforms, outperforming systems developed by other leading AI labs, such as OpenAI, Anthropic, and Google.

Key highlights

H achieves state-of-the-art results on four major agent benchmarks spanning desktop, web, and mobile environments (OSWorld, WebVoyager, WebArena, AndroidWorld), surpassing human performance on desktop and mobile environments.
Surfer 2 integrates third-party frontier models and Holo1.5 models in a unified architecture, highlighting H’s capacity to drive performance through expert agent design and informed model training.

Surfer 2: A unified architecture

Our original web browsing agent, Surfer-H, delivered Pareto-optimal performance on WebVoyager web browsing tasks. In just four months, we built on what we learned to achieve state-of-the-art results across platforms with Surfer 2.

Surfer 2 is an agent architecture for computer use. It separates strategic planning from tactical execution, with an orchestrator module managing planning and coordination while sub-agents act across interfaces.

Surfer 2 can be configured with or without the orchestrator module. When enabled, the orchestrator decomposes the primary goal into a set of sub-tasks assigned to sub-agents. Upon completion, each sub-agent reports its outcome and intermediate state back to the orchestrator. Based on these reports, the orchestrator determines its next step: producing a final output, advancing to the next sub-task, or replanning in the event of failure.
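To make the control flow above concrete, here is a minimal Python sketch of an orchestrator loop. It is an illustration only, not Surfer 2's implementation: the names Report, Planner, SubAgent, orchestrate, and the max_replans limit are all hypothetical, and the planner and sub-agent are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Report:
    """What a sub-agent hands back to the orchestrator (hypothetical shape)."""
    success: bool
    outcome: str
    state: dict


# Hypothetical signatures: a planner maps (goal, reports so far) to sub-tasks,
# and a sub-agent executes one sub-task and returns a report.
Planner = Callable[[str, List[Report]], List[str]]
SubAgent = Callable[[str], Report]


def orchestrate(goal: str, plan: Planner, run_subtask: SubAgent,
                max_replans: int = 2) -> Optional[str]:
    """Run sub-tasks in order; replan on failure, up to max_replans times."""
    reports: List[Report] = []
    subtasks = plan(goal, reports)
    replans = 0
    while subtasks:
        report = run_subtask(subtasks[0])
        reports.append(report)
        if report.success:
            subtasks = subtasks[1:]          # advance to the next sub-task
        elif replans < max_replans:
            replans += 1
            subtasks = plan(goal, reports)   # replan using what was learned
        else:
            return None                      # replanning budget exhausted
    # Every sub-task succeeded: assemble a final output from the outcomes.
    return "; ".join(r.outcome for r in reports if r.success)
```

In this sketch the three branches mirror the three decisions described above: advance, replan, or produce the final output. A real system would also pass the intermediate state in each report to the next sub-agent, which is omitted here for brevity.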

Surfer 2 follows a ReAct (reason + act) loop. At each step, it assesses its progress towards its task and determines its next action. Reliable performance across environments relies on dedicated components for visual grounding, task validation, and failure recovery, along with a robust integration layer that ensures actions translate accurately into system control. Because an agent is only as capable as the actions it can reliably perform, these layers are central to Surfer 2’s consistency.

Surfer 2 ReAct loop: at each step, the agent assesses its progress towards its task
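The ReAct loop described above can be sketched in a few lines of Python. This is a generic illustration of the reason-then-act pattern under stated assumptions, not Surfer 2's code: Step, Policy, Environment, react_loop, and the "done" stop action are hypothetical names, and the grounding, validation, and failure-recovery components are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    thought: str       # the agent's assessment of its progress
    action: str        # the action it decided to take
    observation: str   # what the environment returned


# Hypothetical signatures: the policy reads the task and history and returns
# (thought, action); the environment executes an action and returns an observation.
Policy = Callable[[str, List[Step]], Tuple[str, str]]
Environment = Callable[[str], str]


def react_loop(task: str, policy: Policy, env: Environment,
               max_steps: int = 10) -> List[Step]:
    """Alternate reasoning and acting until the policy emits 'done'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action = policy(task, history)   # reason
        if action == "done":                      # assumed stop signal
            history.append(Step(thought, action, ""))
            break
        observation = env(action)                 # act
        history.append(Step(thought, action, observation))
    return history
```

The max_steps bound is a design choice in this sketch: it guarantees termination even if the policy never emits the stop action.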

This design yields a resilient agent that continuously benefits from advances in frontier models and extracts maximum performance from them.

SOTA performance across desktop, mobile & web

Desktop

OSWorld evaluates an agent’s ability to control a full desktop environment on Ubuntu systems across diverse applications via human-like interaction. In the Foundation E2E GUI category, where only visual perception and interaction are allowed, Surfer 2 achieves a state-of-the-art pass@1 of 60.1%. Our system reaches 77.0% with pass@10, surpassing the human baseline score of 72.4%.

OSWorld E2E benchmark: Surfer 2 surpasses the human baseline score of 72.4% at pass@10
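For readers unfamiliar with the pass@1 and pass@10 figures above, the sketch below shows one common way to read pass@k: a task counts as solved if at least one of k attempts succeeds. This post does not state exactly how its scores were computed, so treat the definition, the function name pass_at_k, and the toy data as assumptions for illustration only.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of the first k attempts.

    `attempts` maps a task id to the success/failure of each attempt, in order.
    """
    if not attempts:
        raise ValueError("need at least one task")
    solved = sum(1 for results in attempts.values() if any(results[:k]))
    return solved / len(attempts)


# Toy data (made up, not benchmark results): four tasks, two attempts each.
toy = {
    "task_a": [True, False],
    "task_b": [False, True],
    "task_c": [False, False],
    "task_d": [False, False],
}
print(pass_at_k(toy, 1))  # 0.25
print(pass_at_k(toy, 2))  # 0.5
```

Under this reading, pass@k can only grow with k, which is why a pass@10 score is higher than the pass@1 score on the same benchmark.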

Web

WebArena evaluates agents in a sandboxed web environment spanning five categories (e-commerce, social forum, collaborative development, maps, CMS platforms) with functionality and data mimicking their real-world equivalents. Surfer 2 achieves a state-of-the-art score of 69.6% by decoupling planning and execution.

WebArena Benchmark with Surfer 2 state-of-the-art score of 69.6%

WebVoyager assesses web information retrieval on dynamic live websites. On this benchmark Surfer 2 reaches a success rate of 97.1%, surpassing Magnitude’s previous state-of-the-art of 93.9%.

WebVoyager Benchmark with Surfer 2 reaching a success rate of 97.1%

Mobile

AndroidWorld evaluates an agent’s capability to control an Android device and use 20 real applications. On this suite, Surfer 2 achieves a success rate of 87.1%, reaching state-of-the-art level in the visual interaction category. Relying on vision and human-like interaction (swipe, press, type) allows Surfer 2 to be proficient across applications and Android versions, surpassing the human baseline of 80.0%.

AndroidWorld Benchmark with Surfer 2 achieving a success rate of 87.1%

What’s next?

Surfer 2 is the culmination of what is currently possible using a mix of third-party and internal foundation models. By combining our proprietary agent training methods and infrastructure with the best external models available, we’ve achieved state-of-the-art performance across desktop, web, and mobile benchmarks, often matching or exceeding human capabilities.

These results didn’t come cheap: Surfer 2 runs are extremely costly. That’s why we’re now focused on training Holo2, our next-generation proprietary model designed to deliver the same breakthrough performance at a fraction of the cost, bringing truly scalable and accessible AI agents within reach.

In just the past few months, we’ve open-sourced Surfer-H, launched Holo1.5, and set new state-of-the-art records in desktop, web, and mobile performance with Surfer 2. And we’re just getting started. H Company is committed to pushing these boundaries even further, making AI more capable, reliable, and accessible for everyone.

See our Surfer-H agent in action and start using it today to find information and solve real-world problems.

Or, check out our full technical report detailing Surfer 2’s performance, state-of-the-art benchmark results, and cross-platform evaluations!