title: "OpenAI Operator and CUA: Computer-Control Agents in 2026"
description: "A complete technical guide to OpenAI's Computer-Using Agent (CUA), the Operator product, the open-source alternative Browser-Use, and building custom computer-control agents for RPA, testing, and data extraction."
keywords: ["OpenAI Operator", "CUA", "computer-using agent", "browser automation", "AI agent", "Responses API", "Browser-Use", "RPA", "Playwright", "2026"]
date: "2026-02-26"
category: "AI Agents"
tags: ["openai", "operator", "cua", "browser-automation", "computer-use", "rpa", "playwright"]
lang: en

OpenAI Operator and CUA: Computer-Control Agents in 2026

Key Facts

CUA model name: computer-use-preview, available only through the OpenAI Responses API (not Chat Completions)
API status: Beta as of February 2026; stable enough for testing and development work
Operator product: launched as a consumer SaaS in January 2025, integrated into ChatGPT agent by July 2025; the Operator brand was deprecated after August 31, 2025
Operator access: available to ChatGPT Plus, Pro, and Team subscribers; runs inside the ChatGPT interface with a dedicated browser panel
Operator pricing: included in the ChatGPT subscription ($20/month Plus, $200/month Pro); no separate API charges for consumer use
CUA API pricing: token-based through the Responses API; each screenshot consumes 500-2000 tokens depending on resolution
Supported browsers: Chromium-based browsers (Chrome, Edge) via Playwright or Puppeteer; Firefox support is experimental
Supported OS environments: browser, mac, windows, ubuntu (set with the environment parameter)
Screenshot resolution: set with display_width and display_height; typical production configurations are 1024×768 or 1280×720
Action types: click(x, y, button), double_click(x, y), type(text), keypress(keys), scroll(x, y, scroll_x, scroll_y), move(x, y), drag(path), screenshot(), wait()
Latency per action: 1-5 seconds (model inference plus screenshot encoding); a 20-step task typically takes 30-120 seconds end to end
Safety controls (Operator): automatic pause on login and payment pages, CAPTCHA handoff to the user, session isolation, confirmation prompts for high-risk actions
CUA vs Anthropic Computer Use (architecture): CUA focuses on the browser with a managed action loop; Anthropic supports full desktop automation with developer-managed screenshots
CUA vs Anthropic (performance): CUA scores 38.1% on the OSWorld benchmark, 58.1% on WebArena, and 87% on WebVoyager; Anthropic Claude Opus 4.6 scores 72.7% on OSWorld (February 2026)
Launch partners: DoorDash, Instacart, OpenTable, Priceline, StubHub, Uber, with optimized integrations for reliable task execution
Recommended maximum steps: 30 by default; enforced in your own loop (for example through a max_steps argument) to prevent infinite loops
Guardrails built into Operator: domain allowlists, action validation before execution, read-only modes for research tasks, sandboxed browser isolation
OpenAI Agents SDK version: 0.10.2 (February 2026); pre-1.0 but publicly available for production use
Security model: sandboxed execution is recommended; the agent is vulnerable to prompt injection through adversarial page content
Token consumption: a 20-step task with 1024×768 screenshots can consume 10,000-40,000 tokens in total

What Is OpenAI Operator / CUA

Product vs API

OpenAI exposes its computer-use capability through two separate surfaces:

Operator (Consumer Product): A managed SaaS experience integrated into ChatGPT that gives end users browser automation. It launched as a standalone product in January 2025, was fully integrated into ChatGPT agent by July 2025, and was deprecated as a separate brand after August 31, 2025. Users describe tasks in natural language, watch the agent navigate in real time through the browser panel, and can pause or take over at any moment. Operator handles safety automatically: it pauses on login screens, payment pages, and CAPTCHAs and returns control to the user rather than trying to handle credentials autonomously.

CUA API (Developer Tooling): The underlying computer-use-preview model, exposed through the Responses API. Developers implement the perception-reasoning-action loop themselves: capture a screenshot, send it to the API, receive structured actions, execute them through Playwright or Puppeteer, capture a new screenshot, and repeat. With this API, developers must manage the browser infrastructure, handle security sandboxing, and implement their own guardrails.

Architecture

CUA runs a continuous three-phase loop:

Perception: The model receives a screenshot and extracts text content, interactive elements (buttons, inputs, dropdowns), spatial relationships, and state indicators (checkbox checked, dropdown expanded, validation error displayed). Critically, CUA does not read the DOM directly; it infers all structure from pixel data.
Reasoning: Using chain-of-thought, the model assesses progress toward the task goal, identifies the current page state, detects obstacles (errors, unexpected navigation), picks the best next action, and decides whether user intervention is needed.
Action Execution: The model emits structured actions (click, type, key, scroll) with pixel coordinates or text payloads. The execution layer (the developer's Playwright code or OpenAI's managed runtime) translates them into real browser interactions.

The loop continues until the task completes or the step budget is exhausted.
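
The three phases and the step budget can be sketched without any real browser or model. This is a minimal illustration, not OpenAI's implementation: `perceive`, `decide`, and `act` are hypothetical callables standing in for screenshot capture, the model call, and the execution layer.

```python
from typing import Callable, Optional


def run_agent_loop(
    perceive: Callable[[], str],
    decide: Callable[[str], Optional[dict]],
    act: Callable[[dict], None],
    max_steps: int = 30,
) -> dict:
    """Run perception -> reasoning -> action until done or out of budget."""
    for step in range(max_steps):
        observation = perceive()      # Perception: e.g. a screenshot
        action = decide(observation)  # Reasoning: None means "task done"
        if action is None:
            return {"completed": True, "steps": step}
        act(action)                   # Action execution
    return {"completed": False, "steps": max_steps}


# Toy usage: a fake "page" that needs three clicks before it is done.
state = {"clicks": 0}
result = run_agent_loop(
    perceive=lambda: f"clicks={state['clicks']}",
    decide=lambda obs: None if obs == "clicks=3" else {"type": "click"},
    act=lambda action: state.update(clicks=state["clicks"] + 1),
)
print(result)
```

Keeping the three phases behind plain callables makes it easy to test the budget logic with fakes before wiring in a browser.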

Vision-First vs DOM-Based Automation

Traditional tools (Selenium, Playwright, Puppeteer) rely on CSS selectors and XPath expressions that reference the DOM structure. CUA works at the visual layer: it recognizes a button because it looks like a button, carries descriptive text, and sits in a logical position in the UI flow. This makes CUA robust to cosmetic UI changes but adds latency (screenshot encoding plus model inference) and cost (token consumption per screenshot).

Decision Framework

When to Use Operator (SaaS)

✅ Suitable Use Cases:

Ad-hoc consumer tasks: restaurant bookings, grocery orders, information lookups
Multi-step workflows across diverse websites where building custom integrations is impractical
Tasks that need real-time supervision and intervention (for example, handling unexpected popups)
Non-technical users who need browser automation without writing code

❌ Not Suitable:

High-volume, repeatable automation (cost and latency are prohibitive)
Enterprise workflows that require audit logs, role-based access, or SLA guarantees
Tasks that require programmatic result extraction or integration with downstream systems

When to Use the CUA API

✅ Suitable Use Cases:

RPA replacement for web-based workflows (downloading invoices, form submissions across multiple sites)
Ad-hoc data extraction where writing site-specific scrapers is cost-prohibitive
High-level acceptance testing where describing scenarios in natural language is faster than writing Playwright selectors
Background async workflows where 30-120 seconds of latency per task is acceptable

❌ Not Suitable:

Real-time user interactions that require sub-second response
High-frequency automation (latency: 1-5 s per action; cost: 500-2000 tokens per screenshot)
Tasks that require pixel-perfect UI interactions (drag-and-drop, canvas manipulation)

When to Use Anthropic Computer Use

✅ Suitable Use Cases:

Full desktop automation beyond the browser (file system access, terminal commands, multi-application workflows)
Tasks that need the highest benchmark performance (72.7% on OSWorld vs CUA's 38.1%)
A pay-per-use cost model is preferred over subscription bundling
You need a developer-managed action loop with explicit screenshot handling

❌ Not Suitable:

Exclusively browser-based workflows where CUA's built-in abstraction reduces boilerplate
Teams already invested in the OpenAI ecosystem (Agents SDK, web search, file search)

When to Use Playwright (Traditional Automation)

✅ Suitable Use Cases:

Latency-critical automation (milliseconds vs seconds)
High-volume workflows (thousands of executions per day)
Stable, well-known UIs where selector maintenance is manageable
Workflows that require deterministic outcomes (CUA introduces path variability)

❌ Not Suitable:

Frequently changing UIs where selector maintenance becomes prohibitive
Novel tasks across diverse websites with no common DOM structure
Tasks that require visual understanding (reading charts, interpreting images)
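
The four checklists above can be condensed into a rough rule-of-thumb function. This is only a sketch: the `Workload` fields and the numeric thresholds (1 second, 1,000 runs per day) are assumptions chosen to mirror the lists, not vendor guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_code_integration: bool  # results must feed other systems
    needs_desktop_apps: bool      # files, terminal, non-browser applications
    max_latency_seconds: float    # acceptable time per task
    runs_per_day: int
    ui_is_stable: bool            # selectors rarely break


def recommend_tool(w: Workload) -> str:
    """Map a workload to one of the four options discussed above."""
    # Sub-second needs, or high volume on a stable UI, favor scripted automation.
    if w.max_latency_seconds < 1 or (w.runs_per_day >= 1000 and w.ui_is_stable):
        return "Playwright"
    if w.needs_desktop_apps:
        return "Anthropic Computer Use"
    if w.needs_code_integration:
        return "CUA API"
    return "Operator"


print(recommend_tool(Workload(True, False, 120.0, 50, False)))
```

Real decisions will weigh more factors (budget, compliance, team skills), so treat the output as a starting point for discussion.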

Parameters Reference Table

Parameter	Value	Notes
`model`	`computer-use-preview`	Only available through Responses API
`tools[].type`	`computer_use_preview`	Built-in tool type identifier
`tools[].display_width`	1024, 1280, 1920	Width of virtual display in pixels; 1024 typical for production
`tools[].display_height`	768, 720, 1080	Height of virtual display in pixels; 768 typical for production
`tools[].environment`	`browser`, `mac`, `windows`, `ubuntu`	Execution environment; `browser` most common
`input[].role`	`user`, `assistant`	Message role in conversation history
`input[].content[].type`	`input_text`, `input_image`	Content block type
`input[].content[].text`	String	Task description or instruction
`input[].content[].image_url`	`data:image/png;base64,{b64}`	Base64-encoded screenshot
`truncation`	`auto`, `none`	History management for long tasks
`max_steps`	30 (recommended)	Client-side loop limit, not an API parameter; prevents runaway execution
Action: `click`	`{type: "click", x, y, button}`	Pixel coordinates and mouse button
Action: `type`	`{type: "type", text: "..."}`	Keyboard input to focused element
Action: `keypress`	`{type: "keypress", keys: ["Enter"]}`	Special keys: Enter, Tab, Escape
Action: `scroll`	`{type: "scroll", x, y, scroll_x, scroll_y}`	Scroll position and deltas in pixels
Action: `screenshot`	`{type: "screenshot"}`	Explicit screenshot capture
Action: `wait`	`{type: "wait"}`	Pause for dynamic content load
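
To show how the request-side parameters in the table fit together, here is a small sketch that assembles the keyword arguments for a first request. The helper name `build_cua_request` is hypothetical; the field names come from the table above.

```python
VALID_ENVIRONMENTS = {"browser", "mac", "windows", "ubuntu"}


def build_cua_request(task, screenshot_b64, width=1024, height=768,
                      environment="browser"):
    """Assemble keyword arguments for the first request of a CUA task."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"Unsupported environment: {environment}")
    return {
        "model": "computer-use-preview",
        "tools": [{
            "type": "computer_use_preview",
            "display_width": width,
            "display_height": height,
            "environment": environment,
        }],
        "input": [{
            "role": "user",
            "content": [
                {"type": "input_text", "text": task},
                {"type": "input_image",
                 "image_url": f"data:image/png;base64,{screenshot_b64}"},
            ],
        }],
        "truncation": "auto",
    }


request = build_cua_request("Find the pricing page.", "AAAA")
print(request["tools"][0])
```

With a client from the `openai` package, the dictionary would be passed as `client.responses.create(**request)`.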

Common Pitfalls

Pitfall 1: Running Agent Without Sandboxing

❌ Incorrect:

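from playwright.sync_api import sync_playwright

# Running the agent on the developer's local machine with full access
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    # The agent can reach cookies, credentials, and the file system

✅ Correct:

from playwright.sync_api import sync_playwright

# Running the agent in an isolated Docker container.
# The Dockerfile sets up Chromium with no host volume mounts, no
# credential store, and no access to the developer's session cookies.
with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        # These flags disable Chromium's own sandbox, which is commonly needed
        # when running as root in a container. The container is the isolation
        # boundary here, so never use them on a host machine.
        args=["--no-sandbox", "--disable-setuid-sandbox"],
    )

Impact: Adversarial page content can inject prompts that manipulate the agent into accessing sensitive data, sending cookies to attacker-controlled domains, or navigating to malicious sites.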

Pitfall 2: No Step Budget or Runaway Loop Prevention

❌ Incorrect:

while True:
    response = client.responses.create(...)
    # Infinite loop if agent never reaches completion state

✅ Correct:

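import logging

logger = logging.getLogger(__name__)

MAX_STEPS = 30
for step in range(MAX_STEPS):
    response = client.responses.create(...)
    # The task is finished once the model stops requesting computer actions
    if not any(item.type == "computer_call" for item in response.output):
        break
else:
    logger.warning("Task did not complete within %d steps", MAX_STEPS)

Impact: Runaway loops consume unbounded API tokens (potentially hundreds or thousands of dollars), leave tasks hanging indefinitely, and create a poor user experience.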

Pitfall 3: Not Validating Actions Before Execution

❌ Incorrect:

for action in response.output:
    execute_action(page, action)  # Blindly execute

✅ Correct:

from urllib.parse import urlparse

ALLOWED_DOMAINS = ["shop.example.com", "checkout.example.com"]

class SecurityError(Exception):
    """Raised when the agent leaves the permitted domains."""

for item in response.output:
    if item.type != "computer_call":
        continue
    # Compare the parsed hostname exactly; a substring test would also accept
    # URLs such as https://evil.test/?next=shop.example.com
    host = urlparse(page.url).hostname or ""
    if host not in ALLOWED_DOMAINS:
        logger.error("Attempted action outside allowed domains: %s", page.url)
        raise SecurityError("Domain violation")
    execute_action(page, item.action)

Impact: The agent may navigate to phishing sites, execute malicious JavaScript, or perform unintended actions outside the task scope.
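
How the allowlist comparison is written matters. The sketch below compares a parsed hostname against the allowlist and contrasts it with a naive substring test. The helper name `is_allowed` is hypothetical, and accepting subdomains of an allowed domain is an assumption you may want to tighten.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Allow a URL only if its hostname is an allowed domain or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


ALLOWED = ["shop.example.com"]

print(is_allowed("https://shop.example.com/cart", ALLOWED))
print(is_allowed("https://evil.test/?next=shop.example.com", ALLOWED))
print(is_allowed("https://shop.example.com.evil.test/", ALLOWED))

# The naive substring test accepts the attacker's URL:
print("shop.example.com" in "https://evil.test/?next=shop.example.com")
```

The first call is accepted and the next two are rejected, while the substring test on the last line would have let the attacker's URL through.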

Pitfall 4: Using Chat Completions API Instead of Responses API

❌ Incorrect:

response = client.chat.completions.create(
    model="computer-use-preview",  # Will fail
    messages=[...]
)

✅ Correct:

response = client.responses.create(
    model="computer-use-preview",
    input=[...]
)

Impact: The API call fails with a model-not-found error. computer-use-preview is available only through the Responses API, not Chat Completions.

Pitfall 5: Ignoring Screenshot Resolution Impact on Performance

❌ Incorrect:

# Using 4K resolution unnecessarily
tools=[{
    "type": "computer_use_preview",
    "display_width": 3840,
    "display_height": 2160  # Massive token consumption
}]

✅ Correct:

# Using production-appropriate resolution
tools=[{
    "type": "computer_use_preview",
    "display_width": 1024,  # Balances readability and cost
    "display_height": 768
}]

Impact: Higher-resolution screenshots consume dramatically more tokens (a 4K frame has roughly 10.5 times as many pixels as 1024×768), increase latency, and offer only marginal benefit for most web UIs.
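
The resolution trade-off is easy to quantify in pixels. The snippet below compares common display sizes against the 1024×768 baseline. It counts pixels only; how token usage scales with pixel count is not specified here, so treat the ratios as a rough proxy rather than a billing formula.

```python
def pixel_ratio(width, height, base_width=1024, base_height=768):
    """Return how many times more pixels a display has than the baseline."""
    return (width * height) / (base_width * base_height)


for w, h in [(1024, 768), (1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: {pixel_ratio(w, h):.2f}x the pixels of 1024x768")
```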

CUA API Implementation

Basic Action Loop

A minimal implementation captures screenshots, passes them to the Responses API, extracts actions, executes them through a browser automation framework, and repeats until the task completes.

import base64
from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()

# Tool configuration shared by every request in the loop
CUA_TOOL = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}

def capture_screenshot(page):
    """Capture a screenshot and return it as base64."""
    screenshot_bytes = page.screenshot()
    return base64.b64encode(screenshot_bytes).decode("utf-8")

def execute_action(page, action):
    """Execute a single CUA action on the Playwright page."""
    if action.type == "click":
        page.mouse.click(action.x, action.y)
    elif action.type == "type":
        page.keyboard.type(action.text)
    elif action.type == "keypress":
        for key in action.keys:
            # Key names from the model may need mapping to Playwright names
            page.keyboard.press(key)
    elif action.type == "scroll":
        # Move to the scroll position, then scroll by the requested deltas
        page.mouse.move(action.x, action.y)
        page.mouse.wheel(action.scroll_x, action.scroll_y)
    elif action.type == "wait":
        page.wait_for_timeout(1000)
    # "screenshot" needs no handling: a fresh screenshot follows every action

    page.wait_for_timeout(500)  # Allow the page to respond

def screenshot_url(page):
    """Return the current screenshot as a data URL."""
    return f"data:image/png;base64,{capture_screenshot(page)}"

def run_cua_task(task: str, start_url: str = "about:blank", max_steps: int = 30):
    """Run a CUA task in a Playwright browser."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page(viewport={"width": 1024, "height": 768})
        page.goto(start_url)

        # First request: the task description plus the initial screenshot
        response = client.responses.create(
            model="computer-use-preview",
            tools=[CUA_TOOL],
            input=[{
                "role": "user",
                "content": [
                    {"type": "input_text", "text": task},
                    {"type": "input_image", "image_url": screenshot_url(page)},
                ],
            }],
            truncation="auto",
        )

        for step in range(max_steps):
            # The model requests actions as "computer_call" output items
            calls = [item for item in response.output if item.type == "computer_call"]
            if not calls:
                print("Task completed.")
                print(response.output_text)
                break

            call = calls[0]
            execute_action(page, call.action)

            # Answer the call with a new screenshot and continue the same thread
            response = client.responses.create(
                model="computer-use-preview",
                previous_response_id=response.id,
                tools=[CUA_TOOL],
                input=[{
                    "type": "computer_call_output",
                    "call_id": call.call_id,
                    "output": {
                        "type": "computer_screenshot",
                        "image_url": screenshot_url(page),
                    },
                }],
                truncation="auto",
            )

        browser.close()

# Example usage
run_cua_task(
    "Go to news.ycombinator.com and tell me the top 3 stories.",
    start_url="https://news.ycombinator.com",
)

Production-Grade Implementation with Safety Controls

import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

from openai import OpenAI
from playwright.sync_api import sync_playwright

# capture_screenshot and execute_action are the helpers from the basic loop.

class SecurityError(Exception):
    """Raised when the agent ends up outside the allowed domains."""

@dataclass
class TaskConfig:
    task_description: str
    entry_url: str
    allowed_domains: list[str]
    max_steps: int = 25
    success_pattern: Optional[str] = None

class SafeCUAAgent:
    TOOL = {
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }

    def __init__(self, config: TaskConfig):
        self.config = config
        self.client = OpenAI()

    def validate_navigation(self, url: str) -> bool:
        """Reject navigation to disallowed domains."""
        host = urlparse(url).hostname or ""
        # Match the domain exactly or as a parent domain; a bare endswith()
        # would also accept look-alikes such as "evilexample.com"
        return any(host == domain or host.endswith("." + domain)
                   for domain in self.config.allowed_domains)

    def validate_action(self, page, action) -> bool:
        """Validate an action before execution."""
        current_url = page.url

        if not self.validate_navigation(current_url):
            raise SecurityError(f"Current URL {current_url} not in allowed domains")

        # Block every click on payment pages: coordinates alone cannot tell a
        # submit button from a harmless link, so these steps go to a human
        if "checkout" in current_url or "payment" in current_url:
            if action.type == "click":
                return False

        return True

    def _guard(self, route):
        """Abort navigations that leave the allowlist; let other requests pass."""
        request = route.request
        if request.is_navigation_request() and not self.validate_navigation(request.url):
            route.abort()
        else:
            route.continue_()

    def _screenshot_url(self, page) -> str:
        return f"data:image/png;base64,{capture_screenshot(page)}"

    def run(self) -> dict:
        completed = False
        steps_taken = 0

        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page(viewport={"width": 1024, "height": 768})

            # Set up the navigation guard before the first page load
            page.route("**/*", self._guard)
            page.goto(self.config.entry_url)

            prompt = (
                f"Entry URL: {self.config.entry_url}\n"
                f"Task: {self.config.task_description}\n\n"
                "Constraints:\n"
                f"- Only navigate within: {', '.join(self.config.allowed_domains)}\n"
                "- Do not enter payment information\n"
                "- Stop if you encounter a CAPTCHA\n"
            )

            response = self.client.responses.create(
                model="computer-use-preview",
                tools=[self.TOOL],
                input=[{
                    "role": "user",
                    "content": [
                        {"type": "input_text", "text": prompt},
                        {"type": "input_image", "image_url": self._screenshot_url(page)},
                    ],
                }],
                truncation="auto",
            )

            for _ in range(self.config.max_steps):
                calls = [item for item in response.output if item.type == "computer_call"]
                if not calls:
                    completed = True
                    break

                call = calls[0]
                if not self.validate_action(page, call.action):
                    break  # Blocked action: stop and hand off to a human

                execute_action(page, call.action)
                steps_taken += 1

                response = self.client.responses.create(
                    model="computer-use-preview",
                    previous_response_id=response.id,
                    tools=[self.TOOL],
                    input=[{
                        "type": "computer_call_output",
                        "call_id": call.call_id,
                        "output": {
                            "type": "computer_screenshot",
                            "image_url": self._screenshot_url(page),
                        },
                    }],
                    truncation="auto",
                )

            browser.close()

        # Optionally require the final answer to match the success pattern
        if completed and self.config.success_pattern:
            completed = re.search(self.config.success_pattern, response.output_text) is not None

        return {"completed": completed, "steps_taken": steps_taken}

Integration with OpenAI Agents SDK

from agents import Agent, ComputerTool, ModelSettings, Runner, handoff

# Assumption: LocalPlaywrightComputer is your own class implementing the SDK's
# Computer (or AsyncComputer) interface on top of Playwright. The module path
# below is hypothetical.
from my_project.computers import LocalPlaywrightComputer

computer = LocalPlaywrightComputer()

research_agent = Agent(
    name="WebResearcher",
    instructions="""You are a web research specialist.
    Use the computer tool to navigate websites and extract information.
    Return structured data from your research.""",
    # The SDK exposes computer use through ComputerTool, not a string name
    tools=[ComputerTool(computer=computer)],
    model="computer-use-preview",
    # computer-use-preview requires automatic truncation
    model_settings=ModelSettings(truncation="auto"),
)

analysis_agent = Agent(
    name="DataAnalyst",
    instructions="""You receive raw data extracted by the WebResearcher.
    Analyze it, identify patterns, and produce actionable insights.""",
    model="gpt-4o",
)

orchestrator = Agent(
    name="Orchestrator",
    instructions="""You coordinate research tasks.
    Delegate web research to WebResearcher, then analysis to DataAnalyst.""",
    handoffs=[
        handoff(research_agent),
        handoff(analysis_agent),
    ],
    model="gpt-4o",
)

result = Runner.run_sync(
    orchestrator,
    "Research the pricing of the top 5 project management tools and compare their feature sets."
)
print(result.final_output)

Operator Use Cases

Designed For (Consumer SaaS Operator)

E-commerce and Ordering

Finding products on retail sites from a natural-language description
Filtering by criteria (price range, brand, reviews), adding to cart, selecting shipping
Pausing before payment for user confirmation

Restaurant and Service Bookings

Multi-step booking flows: date, time, party size, special requirements
Navigating OpenTable, Resy, or any other booking platform without site-specific integration

Travel and Accommodation

Searching flights across flexible dates, comparing options, selecting seats
Managing booking flows across airline and hotel websites

Data Lookup and Research

Aggregating information across multiple websites (stock levels, contact info, pricing data)
Navigating directories and databases with inconsistent UIs

Requires Custom CUA API

RPA Replacement (Enterprise)

Downloading invoices from multiple supplier portals with diverse UIs
Completing employee onboarding forms across HR systems
Generating compliance reports through legacy government portals
Reconciling customer data across CRM and ERP web interfaces

Structured Data Extraction

Extracting product catalogs from e-commerce sites into structured JSON
Monitoring competitor pricing across multiple sites
Aggregating real estate listings with specific attribute schemas

Testing Automation

Acceptance testing described in plain English instead of Playwright selectors
Cross-browser compatibility testing with visual validation
Regression testing for frequently changing UIs where selector maintenance is prohibitive

Workflow Integration

Triggering browser automation from backend services (cron jobs, webhooks)
Feeding automation results into downstream data pipelines
Programmatic result extraction for analytics or alerting
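
For programmatic result extraction, one common approach is to ask the agent to end with a JSON payload and then pull that payload out of its final text. The sketch below assumes the final answer is a plain string that may wrap the JSON in prose; `extract_first_json` is a hypothetical helper, not part of any SDK.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue  # Not valid JSON at this position; keep scanning
    return None


final_text = 'Here are the results: {"stories": ["a", "b", "c"]} Let me know if you need more.'
data = extract_first_json(final_text)
print(data["stories"])
```

Downstream code should still validate the parsed structure against an expected schema before using it.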

Production Considerations

Cost Modeling

Token Consumption:

Screenshot encoding: 500-2000 tokens per image (depends on resolution and content complexity)
20-step task at 1024×768: approximately 10,000-40,000 tokens in total
Model inference: computer-use-preview is priced per input/output token (check current pricing on platform.openai.com)

Cost Comparison:

Traditional Playwright script: infrastructure cost only (negligible per execution)
CUA API: $0.50-$2.00 per 20-step task (illustrative range based on token pricing)
Anthropic Computer Use: similar token-based pricing structure

Volume Implications:

1,000 tasks/month: manageable for API-based workflows
10,000+ tasks/month: consider a hybrid approach (CUA for novel tasks, Playwright for repeatable patterns)
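
The token arithmetic behind these estimates can be written down as a small calculator. It is a simplified sketch: it counts screenshot tokens only and ignores text tokens and any conversation history that is billed again on later steps, so real usage will be higher. The price passed in the usage example is a placeholder, not a real rate.

```python
def estimate_task_tokens(steps: int, tokens_per_screenshot: int) -> int:
    """Screenshot tokens for one task: one screenshot per step."""
    return steps * tokens_per_screenshot


def estimate_monthly_cost(tasks_per_month: int, steps: int,
                          tokens_per_screenshot: int,
                          usd_per_million_tokens: float) -> float:
    """Monthly cost in USD for a given per-token price (caller-supplied)."""
    tokens = tasks_per_month * estimate_task_tokens(steps, tokens_per_screenshot)
    return tokens / 1_000_000 * usd_per_million_tokens


low = estimate_task_tokens(20, 500)
high = estimate_task_tokens(20, 2000)
print(f"20-step task: {low:,}-{high:,} screenshot tokens")

# Placeholder price for illustration only; look up the current rate.
PLACEHOLDER_USD_PER_MILLION = 10.0
print(estimate_monthly_cost(1000, 20, 2000, PLACEHOLDER_USD_PER_MILLION))
```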

Latency Profile

Per-Step Breakdown:

Screenshot capture: 50-200ms
Base64 encoding: 10-50ms
Network upload: 100-500ms (depends on screenshot size and connection)
Model inference: 1-4 seconds
Action execution: 100-500ms
Total per step: approximately 1.5-5 seconds

Multi-Step Tasks:

10-step task: 15-50 seconds
20-step task: 30-100 seconds
30-step task: 45-150 seconds
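
The per-step breakdown can be summed mechanically to sanity-check the task-level figures. The sketch below uses the component ranges listed above; the raw sums come out at 1.26-5.25 seconds per step, slightly wider than the rounded 1.5-5 second figure.

```python
# Component latency ranges in milliseconds, taken from the breakdown above
STEP_COMPONENTS_MS = {
    "screenshot_capture": (50, 200),
    "base64_encoding": (10, 50),
    "network_upload": (100, 500),
    "model_inference": (1000, 4000),
    "action_execution": (100, 500),
}


def step_latency_ms():
    """Return the (best case, worst case) latency of one step in ms."""
    low = sum(lo for lo, _hi in STEP_COMPONENTS_MS.values())
    high = sum(hi for _lo, hi in STEP_COMPONENTS_MS.values())
    return low, high


def task_latency_seconds(steps: int):
    """Return the (best case, worst case) latency of a task in seconds."""
    low, high = step_latency_ms()
    return steps * low / 1000, steps * high / 1000


for steps in (10, 20, 30):
    low_s, high_s = task_latency_seconds(steps)
    print(f"{steps}-step task: {low_s:.1f}-{high_s:.1f} seconds")
```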

Acceptable Use Cases:

Background async workflows (invoice processing, nightly data extraction)
User-initiated tasks with progress visibility (the Operator model)

Not Suitable:

Real-time user interactions that require sub-second response
High-frequency polling or monitoring

Rate Limits

OpenAI Responses API:

Tier-based rate limits (check current limits at platform.openai.com/account/limits)
Typical production tier: 10,000 requests/minute (sufficient for most use cases)
Each API call counts as a single request, so one computer-use task makes multiple requests as its loop runs

Practical Throughput:

Single-threaded execution: 1-2 tasks per minute (limited by latency, not rate limits)
Parallel execution: 10-20 concurrent tasks are feasible (limited by Playwright browser overhead)
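
A concurrency cap like the 10-20 tasks mentioned above is commonly enforced with a semaphore. The following is a minimal sketch using asyncio; `fake_task` is a stand-in for a real CUA task, which would drive its own browser instance.

```python
import asyncio


async def run_bounded(task_factories, max_concurrency=10):
    """Run coroutine factories with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in task_factories))


async def fake_task(task_id):
    # Stand-in for one CUA task; a real one would run the action loop.
    await asyncio.sleep(0.01)
    return f"task-{task_id} done"


factories = [lambda i=i: fake_task(i) for i in range(25)]
results = asyncio.run(run_bounded(factories, max_concurrency=10))
print(len(results), "tasks finished")
```

Results come back in submission order, which keeps it simple to match each output to its input task.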

Safety Controls Implementation Checklist

Mandatory Controls:

[ ] Sandboxed browser execution (Docker container, VM, or cloud browser service)
[ ] Domain allowlist enforcement (reject navigation outside permitted domains)
[ ] Step budget limit (prevent runaway loops)
[ ] Action validation before execution (verify current URL, action type)
[ ] No credentials within the agent's reach (use pre-authenticated sessions or human handoff)

Recommended Controls:

[ ] Human confirmation gates for high-consequence actions (purchases, form submissions, deletions)
[ ] Read-only mode for research tasks (block all type actions and clicks on submit buttons)
[ ] Comprehensive logging (all actions, screenshots, errors)
[ ] Retry logic with exponential backoff (handle transient failures)
[ ] Explicit success criteria validation (verify the expected outcome before reporting completion)

Security Monitoring:

[ ] Detect prompt injection attempts (unexpected navigation, suspicious action patterns)
[ ] Alert on domain allowlist violations
[ ] Alert on rate limit violations or unexpected token consumption spikes
[ ] Track failed authentication or CAPTCHA encounters that require intervention
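
As one example from the checklist, retry logic with exponential backoff can be sketched in a few lines. The helper name, default delays, and the 10% jitter are assumptions for illustration; the `sleep` argument is injectable so the logic can be tested without waiting.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep, retry_on=(Exception,)):
    """Call fn, retrying on failure with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter avoids synchronized retries across workers
            sleep(delay + random.uniform(0, delay * 0.1))


# Toy usage: fail twice, then succeed. Delays are recorded instead of slept.
attempts = {"count": 0}
delays = []

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky, sleep=delays.append))
```

In production you would narrow `retry_on` to transient error types so that validation or security errors fail immediately.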

Performance & Benchmarks

Note: The figures below are illustrative estimates based on typical production configurations, not measurements of a specific system.

Benchmark Scores (OpenAI CUA)

OSWorld (Full Desktop Computer Use):

CUA computer-use-preview: 38.1% success rate
Comparison: Anthropic Claude Opus 4.6 reaches 72.7% (February 2026)

WebArena (Multi-Step Web Tasks):

CUA computer-use-preview: 58.1% success rate

WebVoyager (Web-Based Task Completion):

CUA computer-use-preview: 87% success rate
Comparison: Browser-Use (open source) reaches 89.1%

Production Performance Characteristics

Task Completion Rates (Typical Production Workloads):

Form completion: High success rate (80-90% for standard forms)
E-commerce checkout: Moderate success rate (pauses at payment for user confirmation)
Multi-site data aggregation: Moderate success rate (depends on site complexity and UI diversity)
CAPTCHA-protected flows: Requires human intervention (not automated)

Reliability and Repeatability:

Non-deterministic: The same task can take different action paths across runs
Variability sources: Model sampling, page load timing, dynamic content
Mitigation: Retry logic, explicit state verification, fallback to scripted automation for critical paths
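
The mitigation pattern of verifying the outcome and falling back to a scripted path can be sketched as follows. All names here are hypothetical: `agent_run` stands for a function that runs the CUA task and returns its final text, and `scripted_fallback` for a deterministic Playwright script.

```python
import re


def run_with_verification(agent_run, success_pattern, scripted_fallback,
                          max_attempts=2):
    """Accept agent output only if it matches the success pattern."""
    for _ in range(max_attempts):
        output = agent_run()
        if re.search(success_pattern, output):
            return {"source": "agent", "output": output}
    # The agent never produced a verifiable result: use the scripted path
    return {"source": "fallback", "output": scripted_fallback()}


# Toy usage: the first agent run wanders off, the second one succeeds.
outputs = iter(["I could not find the page.", "Invoice INV-1042 downloaded."])
result = run_with_verification(
    agent_run=lambda: next(outputs),
    success_pattern=r"Invoice INV-\d+ downloaded",
    scripted_fallback=lambda: "Invoice downloaded by script.",
)
print(result)
```

Checking for an explicit success signal guards against the agent reporting completion after taking an unexpected path.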

Latency Sensitivity:

Interactive tasks: 30-120 seconds is acceptable with progress visibility
Background workflows: Latency tolerance depends on SLA requirements
High-volume automation: Traditional Playwright is preferred when the requirement is under 1 second
