Skip to content
OpenAIGA

gpt-oss (120b / 20b)

OpenAI’s open-weight family — Apache 2.0 reasoning that fits on one GPU or a laptop.

Read the full gpt-oss (120b / 20b) analysis

Context

131K

Max output

Input /1M

$0.04

Output /1M

$0.18

Live pricing via OpenRouter

Best for

  • Local-first, privacy-sensitive products (gpt-oss-20b offline on 16GB)
  • Single-GPU reasoning workloads (gpt-oss-120b on one 80GB card)
  • Cost-controlled agentic loops with no per-token meter or vendor lock-in

Watch out

Open weights are not free inference — self-hosting the 120b only pencils out at high steady volume or under data-residency rules; otherwise a hosted endpoint at ~$0.04-$0.15/1M is cheaper. Requires OpenAI’s harmony response format. No longer tops open-model leaderboards.

For creators. Run gpt-oss-20b locally via Ollama/LM Studio for offline drafting, agent prototyping, and any workflow where prompts/outputs can’t leave the machine.

Benchmarks

gpqa diamond80.1
mmlu pro90
aime 2025 tools97.9
swe bench verified62.4
humanitys last exam19

Capabilities

  • Apache 2.0 open weights (free, commercial-friendly, no copyleft)
  • MXFP4-quantized: 120b on a single 80GB GPU (H100/MI300X), 20b in ~16GB
  • Run via Ollama / vLLM / LM Studio / llama.cpp / HF Transformers
  • 131K-token context, low/medium/high reasoning-effort levels
  • Native function calling, browsing, Python, structured outputs (harmony format)

Compare gpt-oss (120b / 20b)

More from OpenAI

Sources