Today we’re releasing GPT‑5.4 mini and nano, our most capable small models yet. They bring many of the strengths of GPT‑5.4 to faster, more efficient models designed for high-volume workloads.
GPT‑5.4 mini significantly improves over GPT‑5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than 2x faster. It also approaches the performance of the larger GPT‑5.4 model on several evaluations, including SWE-Bench Pro and OSWorld-Verified.
GPT‑5.4 nano is the smallest, cheapest version of GPT‑5.4 for tasks where speed and cost matter most. It is also a significant upgrade over GPT‑5 nano. We recommend it for classification, data extraction, ranking, and coding subagents that handle simpler supporting tasks.
These models are built for the kinds of workloads where latency directly shapes the product experience: coding assistants that need to feel responsive, subagents that quickly complete supporting tasks, computer-using systems that capture and interpret screenshots, and multimodal applications that reason over images in real time. In these settings, the best model is often not the largest one; it’s the one that can respond quickly, use tools reliably, and still perform well on complex professional tasks.
Here’s what our customers think after testing GPT‑5.4 mini and nano in their workflows:
“GPT-5.4 mini delivers strong end-to-end performance for a model in this class. In our evaluations it matched or exceeded competitive models on several output tasks and citation recall at a much lower cost. It also achieved higher end-to-end pass rates and stronger source attribution than the larger GPT-5.4 model.”
— Aabhas Sharma, CTO at Hebbia
GPT‑5.4 mini and nano are especially effective in coding workflows that benefit from fast iteration. The models handle targeted edits, codebase navigation, front-end generation, and debugging loops with low latency, making them a strong fit for coding tasks that need to be completed at faster speeds and lower costs.
In benchmarks, GPT‑5.4 mini consistently outperforms GPT‑5 mini at similar latencies and approaches GPT‑5.4-level pass rates while running much faster, delivering one of the strongest performance-per-latency tradeoffs for coding workflows.
We estimate latency by looking at the production behavior of our models and simulating it offline. The latency estimate accounts for tool call duration (code execution time), sampled tokens, and input tokens. Real-world latency may vary significantly and depends on many factors not captured in our simulation. Similarly, costs are estimated based on the API pricing of these models at the time of writing, and may change in the future. Reasoning efforts were swept from low to xhigh.
GPT‑5.4 mini is also a strong fit for systems that combine models of different sizes. In Codex, for example, a larger model like GPT‑5.4 can handle planning, coordination, and final judgment while delegating to GPT‑5.4 mini subagents that handle narrower subtasks in parallel, like searching a codebase, reviewing a large file, or processing supporting documents. Learn how subagents work in Codex in the docs.
This pattern becomes more useful as smaller models get faster and more capable. Instead of using one model for everything, developers can compose systems where larger models decide what to do and smaller models execute quickly at scale. GPT‑5.4 mini is our strongest mini model yet for that kind of workflow.
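The planner/worker split above can be sketched in a few lines. This is a minimal illustration, not Codex’s actual delegation mechanism: `plan_subtasks` and `run_subagent` are hypothetical stand-ins for calls to a larger model and to GPT‑5.4 mini subagents, respectively.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_subtasks(task: str) -> list[str]:
    # Stand-in for the larger model (e.g. GPT-5.4) deciding what to do.
    return [
        f"search codebase for '{task}'",
        f"review files related to '{task}'",
        f"summarize findings for '{task}'",
    ]


def run_subagent(subtask: str) -> str:
    # Stand-in for a smaller, faster model (e.g. GPT-5.4 mini) executing one subtask.
    return f"done: {subtask}"


def solve(task: str) -> list[str]:
    subtasks = plan_subtasks(task)      # larger model plans and coordinates
    with ThreadPoolExecutor() as pool:  # mini subagents run narrower subtasks in parallel
        return list(pool.map(run_subagent, subtasks))


results = solve("null-pointer bug")
```

The point of the structure is that only the planning step needs the larger model; everything fanned out through the pool can run on the cheaper, faster one.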
GPT‑5.4 mini is also strong on multimodal tasks, particularly those related to computer use. The model can quickly interpret screenshots of dense user interfaces to complete computer use tasks at speed. On OSWorld-Verified, GPT‑5.4 mini approaches GPT‑5.4 while significantly outperforming GPT‑5 mini.
GPT‑5.4 mini is available today in the API, Codex, and ChatGPT.
In the API, GPT‑5.4 mini supports text and image inputs, tool use, function calling, web search, file search, computer use, and skills. It has a 400k context window and costs $0.75 per 1M input tokens and $4.50 per 1M output tokens.
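At that pricing, per-request cost is simple arithmetic. A quick sketch (prices as listed at the time of writing and subject to change):

```python
# GPT-5.4 mini API pricing, per the announcement: $0.75 / 1M input tokens,
# $4.50 / 1M output tokens. Prices may change in the future.
INPUT_PRICE_PER_TOKEN = 0.75 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 4.50 / 1_000_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)


# Example: a 10k-token prompt with a 2k-token response.
cost = request_cost(10_000, 2_000)  # 0.0075 + 0.009 = $0.0165
```

At these rates a million such requests would come to roughly $16,500, which is the kind of math that makes the mini/nano tier relevant for high-volume workloads.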
In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension, and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost. Codex can also delegate to GPT‑5.4 mini subagents so that less reasoning-intensive work runs on the cheaper model.
In ChatGPT, GPT‑5.4 mini is available to Free and Go users via the “Thinking” feature in the + menu. For all other users, GPT‑5.4 mini serves as a rate limit fallback for GPT‑5.4 Thinking.
GPT‑5.4 nano is available only in the API and costs $0.20 per 1M input tokens and $1.25 per 1M output tokens.
1 The highest reasoning_effort available for GPT‑5 mini is ‘high’.
2 Overall Edit Distance. OmniDocBench was run with reasoning_effort set to ‘none’ to reflect low-cost, low-latency performance.
