cua-driver lets AI agents click, type, and verify in any native macOS app without taking focus, cursor, or the active Space.
Key Takeaways
Works on non-AX surfaces that standard accessibility APIs can’t reach: Chromium web content and canvas-based tools like Blender, Figma, DAWs, and game engines.
Every session is recorded as a replayable trajectory, usable for RL training via cua-bench on the OSWorld, ScreenSpot, and Windows Arena benchmarks.
A unified Python SDK (cua) targets Linux containers, Linux VMs, macOS, Windows, and Android with the same Sandbox.ephemeral() API, whether running locally on QEMU or in the cloud.
cuabot gives coding agents (Claude Code, OpenClaw) isolated sandboxes with H.265 display, shared clipboard, and audio; individual windows appear natively on the host desktop.
Installation is a single curl script; an MCP server ships with the package for direct Claude Code and Cursor integration.
Hacker News Comment Review
An ex-Apple engineer validated the macOS background-automation approach, noting parallel UI test execution as the headline win, but flagged telemetry-on-by-default as a friction point for privacy-conscious adopters.
Compliance and audit readiness surfaced as an open question: trajectory logs capture what the agent did, but no mechanism yet explains the decision behind each action to a compliance team.
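To make the auditability gap concrete, here is a minimal sketch of an action-only log versus one that also carries a rationale. Everything in it is hypothetical: the TrajectoryStep record, its field names, and the unexplained_steps helper are illustrative assumptions, not cua's actual trajectory format or API.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TrajectoryStep:
    """Hypothetical log entry; NOT cua's real trajectory schema."""
    step: int
    action: str                      # what the agent did, e.g. "click"
    target: str                      # the UI element it acted on
    rationale: Optional[str] = None  # the "why"; missing in an action-only log


def unexplained_steps(steps: List[TrajectoryStep]) -> List[int]:
    """Return the step numbers that have no recorded rationale."""
    return [s.step for s in steps if not s.rationale]


steps = [
    # Action-only entry: a reviewer can see what happened, not why.
    TrajectoryStep(1, "click", "Export button"),
    # Entry with a rationale attached at decision time.
    TrajectoryStep(2, "type", "Filename field",
                   rationale="User asked for report.pdf"),
]

print(json.dumps([asdict(s) for s in steps], indent=2))
print(unexplained_steps(steps))
```

Under these assumptions, step 1 is the kind of entry the comment is concerned about: it is replayable, but nothing in it tells a compliance team why the agent acted.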
Notable Comments
@LatencyKills: Built similar tooling at Apple; endorses the implementation but calls out opt-in vs. opt-out telemetry as the one concrete criticism.
@davey2wavey: Raises an agent auditability gap: logs exist, but “how do you explain the ‘why’ behind each decision to a compliance team?”