OpenAIGA

gpt-oss (120b / 20b)

Name: gpt-oss (120b / 20b)
Price: 0.04 USD
Author: OpenAI

OpenAI’s open-weight family — Apache 2.0 reasoning that fits on one GPU or a laptop.

Read the full gpt-oss (120b / 20b) analysis

Context

131K

Max output

—

Input /1M

$0.04

Output /1M

$0.18

Live pricing via OpenRouter

Best for

Local-first, privacy-sensitive products (gpt-oss-20b offline on 16GB)
Single-GPU reasoning workloads (gpt-oss-120b on one 80GB card)
Cost-controlled agentic loops with no per-token meter or vendor lock-in

Watch out

Open weights are not free inference — self-hosting the 120b only pencils out at high steady volume or under data-residency rules; otherwise a hosted endpoint at ~$0.04-$0.15/1M is cheaper. Requires OpenAI’s harmony response format. No longer tops open-model leaderboards.

For creators. Run gpt-oss-20b locally via Ollama/LM Studio for offline drafting, agent prototyping, and any workflow where prompts/outputs can’t leave the machine.

Benchmarks

gpqa diamond	80.1
mmlu pro	90
aime 2025 tools	97.9
swe bench verified	62.4
humanitys last exam	19

Capabilities

Apache 2.0 open weights (free, commercial-friendly, no copyleft)
MXFP4-quantized: 120b on a single 80GB GPU (H100/MI300X), 20b in ~16GB
Run via Ollama / vLLM / LM Studio / llama.cpp / HF Transformers
131K-token context, low/medium/high reasoning-effort levels
Native function calling, browsing, Python, structured outputs (harmony format)

Compare gpt-oss (120b / 20b)

gpt-oss vs Gemma 4

Gemma 4 wins for multimodal work on a single consumer GPU; gpt-oss wins on reasoning-per-gigabyte and scales to 120b on one 80GB card. Both are Apache 2.0.

More from OpenAI

GPT-5.5 GPT-5.2 Pro