SAN FRANCISCO, May 8 — OpenAI on Thursday cut input and output token prices on its o3 reasoning-model family by roughly 40 percent and doubled the requests-per-minute and tokens-per-minute rate limits across its paid API tiers, the company said, a move that puts the o3 line at rough price parity with Anthropic’s Claude Sonnet.
The new prices, effective immediately, drop o3 input tokens to $6 per million from $10 and output tokens to $24 per million from $40. The o3-mini variant drops to $0.66 per million input and $2.64 per million output, from $1.10 and $4.40, respectively. The o3-pro tier, which OpenAI introduced in March alongside the GPT-5 launch, is unchanged.
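For readers budgeting against the new rates, the per-request math works out as follows. This is an illustrative sketch using the o3 prices reported above; the request size (10,000 input tokens, 2,000 output tokens) is a hypothetical workload chosen for the example, not a figure from OpenAI.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request, with prices quoted per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical 10k-in / 2k-out request under prior vs. new o3 pricing.
old = request_cost(10_000, 2_000, 10.0, 40.0)  # prior: $10 in, $40 out
new = request_cost(10_000, 2_000, 6.0, 24.0)   # new:   $6 in, $24 out

print(f"old: ${old:.3f}  new: ${new:.3f}  cut: {1 - new / old:.0%}")
# old: $0.180  new: $0.108  cut: 40%
```

Because input and output prices were cut by the same proportion, the 40 percent saving holds regardless of a workload’s input/output mix.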
OpenAI attributed the cut to inference-stack efficiency gains accumulated over the first two quarters of 2026, including a switching-attention kernel rewrite the company described in a March engineering post and a recently shipped activation-recomputation change.
“The unit economics of running o3 have improved roughly in line with these price moves,” an OpenAI spokesperson said in a written statement. “We are passing the savings to API customers because we want o3 to be the default reasoning model for production use, not the premium one.”
Competitive context
The cut is the second pricing move in the AI API category in the past 30 days. Anthropic dropped Claude Sonnet 4.6 input-token pricing by 25 percent in mid-April. Google has not announced a corresponding cut on Gemini 2.5 Pro, which has remained priced at $5 per million input and $20 per million output since January.
“This is the part of the price curve that everyone has been waiting for,” said Priya Salgado-Ferreira, an analyst at the developer-tooling research firm Trailhead. “Reasoning-model pricing has been the line item in every API customer’s budget that nobody could explain to their CFO. A 40 percent cut on the leading reasoning model resets the conversation. The question now is whether Google matches on Gemini 2.5 Pro within the quarter.”
The rate-limit doubling is, for many production customers, the more material change. OpenAI’s Tier 4 paid customers — the level most production deployments operate at — now have access to 60,000 requests per minute and 4 million tokens per minute on o3, up from 30,000 and 2 million respectively. Tier 5, which OpenAI extends to a smaller set of customers by application, sees corresponding doublings.
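To see why the token-per-minute limit is often the binding constraint, a back-of-the-envelope check against the new Tier 4 o3 ceilings (60,000 requests per minute, 4 million tokens per minute, per the figures above) can be sketched as follows. The workload numbers are hypothetical, chosen for illustration.

```python
# New Tier 4 o3 limits reported above.
RPM_LIMIT = 60_000      # requests per minute
TPM_LIMIT = 4_000_000   # tokens per minute

def fits_limits(requests_per_min, avg_tokens_per_request):
    """True if a steady workload stays under both rate limits."""
    tokens_per_min = requests_per_min * avg_tokens_per_request
    return requests_per_min <= RPM_LIMIT and tokens_per_min <= TPM_LIMIT

# For long reasoning requests (say 12,000 tokens each), the token cap
# binds first: only ~333 such requests fit in a minute, far below the
# 60,000-request ceiling.
print(fits_limits(300, 12_000))  # True  (3.6M tokens/min)
print(fits_limits(400, 12_000))  # False (4.8M tokens/min exceeds 4M)
```

Under the prior 2-million-token cap, the same 12,000-token workload would have topped out near 166 requests per minute, which is why the doubling matters more than the price cut for throughput-bound deployments.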
Reasoning-tier framing
OpenAI’s framing of o3 as “the default reasoning model for production use” is a shift from the GPT-5 launch in March, when the company described o3 as a research-and-complex-task tier above GPT-5’s general-purpose tier. Thursday’s pricing language puts o3 at rough cost parity with non-reasoning models for the first time.
The company said it does not expect the cut to affect ChatGPT consumer-tier pricing, which remains $20 per month for Plus and $200 per month for Pro.
OpenAI’s next scheduled API-side announcement is expected at its DevDay developer conference on June 17 in San Francisco.
Asari Whitfield-Asari covers AI tools and developer infrastructure for Consumer Tech Wire.