GPT-4.1 nano
GPT-4.1 nano is the smallest and fastest model in the GPT-4.1 family. It is designed for high-volume, low-latency tasks such as classification, autocomplete, and routing, and it delivers strong MMLU results at the lowest price point in the GPT-4.1 lineup.
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-4.1-nano',
  prompt: 'Why is the sky blue?',
})
Playground
Try out GPT-4.1 nano by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
The provider listing reports the following metrics. Visit the docs for more info.
- Throughput: P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
- Latency: P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds.
- Success rate: Direct request success rate on AI Gateway and per provider.
About GPT-4.1 nano
GPT-4.1 nano was introduced on April 14, 2025 as the smallest and most latency-optimized model in the GPT-4.1 family. OpenAI designed it specifically for tasks where speed and cost efficiency take priority over frontier reasoning depth: classification, autocomplete, routing decisions, and other lightweight inference workloads that need to run at high volume.
Despite being the entry-level tier of the GPT-4.1 family, GPT-4.1 nano posts creditable benchmark scores for its size: 80.1% on MMLU (Massive Multitask Language Understanding) and 50.3% on GPQA (Graduate-Level Google-Proof Q&A). These numbers suggest that the GPT-4.1 training improvements carried down to the smallest variant. Like its larger siblings, it supports the full context window of 1.0M tokens. That is a notable capability at its price point, and it lets the model handle tasks that involve reading long inputs even when the outputs remain short.
GPT-4.1 nano inherits the GPT-4.1 family's 75% prompt caching discount and the removal of surcharges for long-context usage. For applications that preload a large knowledge base or system prompt once and then issue many rapid short queries against it, these economics make nano an attractive option for the query stage of a retrieval-augmented pipeline.
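To make the caching economics concrete, here is a minimal arithmetic sketch. The per-million-token price below is a hypothetical placeholder, not a quoted rate, and the names `QueryShape` and `inputCostUsd` are invented for illustration. Only the 75% discount comes from the text above.

```typescript
// ASSUMPTION: illustrative input price in USD per 1M tokens. Replace it with
// the real rate from the pricing section; this number is not a quoted price.
const ASSUMED_INPUT_PRICE_PER_M = 0.1;
// 75% discount on cached prompt tokens, as stated above.
const CACHE_DISCOUNT = 0.75;

interface QueryShape {
  cachedPrefixTokens: number; // shared knowledge base or system prompt
  freshTokens: number; // the short, per-query part of the input
}

// Input cost in USD for one query, with or without a cache hit on the prefix.
function inputCostUsd(
  q: QueryShape,
  cacheHit: boolean,
  pricePerM: number = ASSUMED_INPUT_PRICE_PER_M,
): number {
  const prefixRate = cacheHit ? pricePerM * (1 - CACHE_DISCOUNT) : pricePerM;
  return (q.cachedPrefixTokens * prefixRate + q.freshTokens * pricePerM) / 1_000_000;
}

// A 200k-token preloaded prefix followed by a 200-token question.
const query: QueryShape = { cachedPrefixTokens: 200_000, freshTokens: 200 };
const cold = inputCostUsd(query, false);
const warm = inputCostUsd(query, true);
console.log({ cold, warm, savings: 1 - warm / cold });
```

Because the short per-query tokens are never discounted, the overall saving lands slightly below 75%, and it approaches 75% as the cached prefix grows relative to the query.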
What To Consider When Choosing a Provider
- Configuration: For event-driven pipelines that fire many rapid inferences per user action (real-time intent classification, content routing), GPT-4.1 nano's speed and low cost make it practical to run inference inline without queuing.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
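The authentication point above can be sketched as a small credential resolver. This is an illustration under stated assumptions, not the gateway's documented client: the environment variable names (`AI_GATEWAY_API_KEY`, `VERCEL_OIDC_TOKEN`), the `Bearer` header scheme, and the helper names are all assumptions to verify against the documentation.

```typescript
type Credential = { kind: 'api-key' | 'oidc'; token: string };

// ASSUMPTION: these environment variable names are illustrative; confirm the
// actual names in the AI Gateway documentation.
function resolveCredential(env: Record<string, string | undefined>): Credential {
  const apiKey = env.AI_GATEWAY_API_KEY;
  if (apiKey) return { kind: 'api-key', token: apiKey };

  const oidc = env.VERCEL_OIDC_TOKEN;
  if (oidc) return { kind: 'oidc', token: oidc };

  throw new Error('No AI Gateway credential found');
}

// ASSUMPTION: a Bearer scheme is used here for the sketch. Note that only the
// single gateway credential appears; no per-provider keys are involved.
function authHeaders(cred: Credential): Record<string, string> {
  return {
    Authorization: `Bearer ${cred.token}`,
    'Content-Type': 'application/json',
  };
}

// Usage with an explicit env object (a real app would pass its environment).
const cred = resolveCredential({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(cred.kind, Object.keys(authHeaders(cred)));
```

Preferring the API key and falling back to the OIDC token is a design choice of this sketch, not a documented precedence rule.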
When to Use GPT-4.1 nano
Best For
- Real-time classification: Sentiment analysis, intent detection, and topic labeling at high request volume
- Autocomplete features: Inline suggestion experiences requiring sub-second response times
- Routing and triage: Logic within multi-model pipelines that decides which downstream model handles a request
- Short-answer extraction: Pulling answers from long documents where the context window of 1.0M tokens and nano's low cost combine well
- Cost-sensitive batch jobs: Millions of inferences that need to run economically
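The routing and triage use case listed above could look roughly like the following sketch, where nano labels each request and the label selects a downstream model. The label set, prompt wording, and helper names are hypothetical. Only `openai/gpt-4.1-nano` appears on this page; the other two slugs are assumed to follow the same pattern. The call that sends the prompt to nano is omitted so the sketch stays self-contained.

```typescript
const TIERS = ['lightweight', 'moderate', 'hard'] as const;
type Tier = (typeof TIERS)[number];

// ASSUMPTION: the mini and full-size slugs are inferred from the nano slug.
const ROUTES: Record<Tier, string> = {
  lightweight: 'openai/gpt-4.1-nano',
  moderate: 'openai/gpt-4.1-mini',
  hard: 'openai/gpt-4.1',
};

// Prompt that asks the routing model for exactly one label.
function buildRoutingPrompt(userMessage: string): string {
  return [
    `Classify the request into exactly one label: ${TIERS.join(', ')}.`,
    'Reply with the label only.',
    `Request: ${userMessage}`,
  ].join('\n');
}

function isTier(value: string): value is Tier {
  return (TIERS as readonly string[]).includes(value);
}

// Parse the reply defensively. Unrecognized replies fall back to 'hard' so
// that ambiguous requests reach the most capable model.
function parseTier(reply: string): Tier {
  const label = reply.trim().toLowerCase();
  return isTier(label) ? label : 'hard';
}

function pickModel(routerReply: string): string {
  return ROUTES[parseTier(routerReply)];
}

console.log(buildRoutingPrompt('Translate "hello" to French'));
console.log(pickModel(' Lightweight\n'));
console.log(pickModel('not sure'));
```

Falling back to the largest model trades cost for safety; a cost-sensitive pipeline might instead default to the middle tier.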
Consider Alternatives When
- Complex reasoning: GPT-4.1 mini or GPT-4.1 provide meaningfully higher capability for multi-step reasoning, code generation, or complex instruction following
- Edge-case quality: Larger models in the family handle nuanced or ambiguous inputs better
- Hard STEM problems: o1-mini or o1 are purpose-built for chain-of-thought reasoning on difficult STEM tasks
Conclusion
GPT-4.1 nano brings the GPT-4.1 family's architectural improvements, including the context window of 1.0M tokens and 75% caching discount, to the fastest and most affordable tier, making it the right choice for classification, routing, and high-throughput lightweight inference through AI Gateway.
Frequently Asked Questions
What tasks is GPT-4.1 nano specifically designed for?
OpenAI designed it for classification, autocomplete, and routing where response speed and low cost outweigh the need for frontier reasoning.
Does GPT-4.1 nano really support a context window of 1.0M tokens?
Yes. All three GPT-4.1 family members share the context window of 1.0M tokens, which is unusual for a model at nano's price and speed tier.
What benchmark scores does GPT-4.1 nano achieve?
At launch, GPT-4.1 nano scored 80.1% on MMLU and 50.3% on GPQA, showing that the family's training improvements extended to the smallest variant.
How does GPT-4.1 nano's pricing compare to the rest of the GPT-4.1 family?
See the pricing section on this page for today's rates. AI Gateway exposes each provider's pricing for GPT-4.1 nano.
Is GPT-4.1 nano suitable as the query model in a RAG pipeline?
Yes. Pairing a large preloaded knowledge base or system prompt (benefiting from the 75% cache discount) with rapid, inexpensive nano queries is a practical pattern for retrieval-augmented generation at scale.
When should I use nano versus mini versus GPT-4.1?
Nano: classification, routing, autocomplete. Mini: GPT-4o-class quality with lower cost and latency. GPT-4.1: maximum coding and instruction-following accuracy.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.