GPT-5.5
OpenAI’s agentic flagship: best-in-class computer-use and knowledge-work scores, at double the price.
| Context | 1M |
| Max output | 128K |
| Input (per 1M tokens) | $5.00 |
| Output (per 1M tokens) | $30.00 |
Live pricing via OpenRouter
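As a rough illustration of how the limits and prices above combine, the sketch below checks whether a request fits and estimates its cost. It assumes "1M" and "128K" mean 1,000,000 and 128,000 tokens, and that input and output share the context window; neither is confirmed on this page. The names `ModelLimits` and `check_request` are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    # Figures from the spec table; exact token counts are an assumption.
    context_tokens: int = 1_000_000    # "1M" read as 1,000,000
    max_output_tokens: int = 128_000   # "128K" read as 128,000
    input_price_per_m: float = 5.00    # USD per 1M input tokens
    output_price_per_m: float = 30.00  # USD per 1M output tokens


def check_request(input_tokens: int, output_tokens: int,
                  limits: ModelLimits = ModelLimits()) -> tuple[bool, float]:
    """Return (fits, estimated_cost_usd) for a single request.

    Assumes input and output tokens count against the same context window.
    """
    fits = (
        output_tokens <= limits.max_output_tokens
        and input_tokens + output_tokens <= limits.context_tokens
    )
    cost = (
        input_tokens * limits.input_price_per_m
        + output_tokens * limits.output_price_per_m
    ) / 1_000_000
    return fits, cost


# A 200K-token prompt with a 20K-token answer under the published limits.
fits, cost = check_request(200_000, 20_000)

# The same check against a smaller effective window, such as the ~258K
# figure reported for Codex.
codex_like = ModelLimits(context_tokens=258_000)
fits_small, _ = check_request(300_000, 10_000, codex_like)
```

Passing a different `ModelLimits` is the simplest way to model an environment where the usable context is smaller than the published figure.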
Best for
- Terminal-agent and Codex-style autonomous loops
- Computer-use / OSWorld automation
- Long-context reasoning over large codebases and corpora
Watch out
Output costs $30 per 1M tokens, and the published 1M context can shrink in practice (~258K reported in Codex). The 2x price only pays off if your workload is output-heavy enough to capture the ~40% token-efficiency savings. GPT-5.5 also trails Opus 4.8 on GDPval-AA and SWE-Bench Pro.
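To see how output share changes the effective premium, here is a minimal worked sketch. It reads "double the price" as GPT-5.4 costing $2.50 input and $15.00 output per 1M tokens, which is an assumption because this page does not list GPT-5.4 prices, and it applies the ~40% saving to output tokens only. `cost_ratio` is a hypothetical helper name.

```python
def cost_ratio(input_tokens: float, baseline_output_tokens: float,
               efficiency_gain: float = 0.40) -> float:
    """Estimated GPT-5.5 cost divided by GPT-5.4 cost for the same task.

    baseline_output_tokens is what GPT-5.4 would emit; GPT-5.5 is assumed
    to emit (1 - efficiency_gain) times as many output tokens.
    """
    if input_tokens == 0 and baseline_output_tokens == 0:
        raise ValueError("need at least one token to compare costs")

    # GPT-5.5 prices from this page (USD per 1M tokens).
    new_cost = (input_tokens * 5.00
                + baseline_output_tokens * (1 - efficiency_gain) * 30.00)
    # GPT-5.4 prices: ASSUMED to be half of GPT-5.5's.
    old_cost = input_tokens * 2.50 + baseline_output_tokens * 15.00
    return new_cost / old_cost


output_only = cost_ratio(0, 1_000)      # purely output-bound task
input_only = cost_ratio(1_000, 0)       # purely input-bound task
balanced = cost_ratio(10_000, 10_000)   # equal input and baseline output
```

Under these assumptions the ratio is 2.0 for an input-only task and about 1.2 for an output-only task, so the efficiency gain narrows the premium as output share grows but does not remove it on token cost alone.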
For creators. Strong for multi-step agentic pipelines and long-document synthesis; for real-time voice, use the separate gpt-realtime-2 family rather than GPT-5.5 directly.
Benchmarks
| GDPval | 84.9 |
| OSWorld | 78.7 |
| Tau2 Telecom | 98 |
| Terminal-Bench 2 | 82.7 |
| Long context (512K-1M) | 74 |
| AIME 2025 | 93.6 |
| SWE-Bench Pro | 58.6 |
| GDPval-AA | 1769 |
Capabilities
- Long-horizon agentic autonomy (plan, tool-use, multi-step execution)
- Strongest published computer-use scores (OSWorld 78.7%, Tau2 Telecom 98%)
- Large long-context reasoning gain (512K-1M retrieval rises to 74%)
- ~40% better output-token efficiency vs GPT-5.4 (Artificial Analysis)
- Drop-in API replacement for GPT-5.4 (same caching, tools, compaction)
Compare GPT-5.5
Claude Opus 4.8 vs GPT-5.5
Opus 4.8 leads aggregate intelligence and SWE-Bench Pro; GPT-5.5 wins computer-use, terminal-agent loops, and native voice. The split is real enough to run both.
Grok 4.3 vs GPT-5.5
GPT-5.5 is clearly the stronger model; Grok 4.3 delivers a large share of the capability at roughly a fifth of the price, which makes it the budget-frontier default.