
GPT-4.1 nano

GPT-4.1 nano is the smallest and fastest model in the GPT-4.1 family, designed for high-volume, low-latency tasks such as classification, autocomplete, and routing. It scores 80.1% on MMLU at the lowest price point in the GPT-4.1 lineup.

Capabilities: File Input, Tool Use, Vision (Image), Implicit Caching

index.ts

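import { streamText } from 'ai'

// The model is addressed by its AI Gateway slug: creator/model
const result = streamText({
  model: 'openai/gpt-4.1-nano',
  prompt: 'Why is the sky blue?',
})

// Print text chunks as they arrive (top-level await needs an ES module)
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}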


Playground

Try out GPT-4.1 nano by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

AI Gateway can route requests for this model across multiple providers. To set a preference, use the provider's slug; see the AI Gateway docs for details. Using a provider means you agree to its terms, listed under Legal.

Provider	Latency	Input	Output	Cache Read	Cache Write	Web Search	Release Date
Azure	0.5s	$0.10/M	$0.40/M	$0.03/M	—	$14/K + input costs	04/14/2025
OpenAI	0.5s	$0.10/M	$0.40/M	$0.03/M	—	$10.00/K + input costs	04/14/2025
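As a rough illustration of how these prices combine, the following sketch estimates the cost of a single GPT-4.1 nano request that uses web search, for each provider. It assumes the Web Search column means a fee per 1,000 searches charged on top of normal token billing ("+ input costs"). The `ProviderPrices` type and `estimateCost` function are hypothetical helpers, not part of AI Gateway or the AI SDK, and the prices are copied from the table above and may change.

```typescript
// Hypothetical price sheet copied from the Providers table above (USD).
// Token prices are per 1M tokens; web search is per 1K searches.
interface ProviderPrices {
  inputPerM: number;
  outputPerM: number;
  webSearchPerK: number;
}

const PRICES: Record<string, ProviderPrices> = {
  azure: { inputPerM: 0.1, outputPerM: 0.4, webSearchPerK: 14 },
  openai: { inputPerM: 0.1, outputPerM: 0.4, webSearchPerK: 10 },
};

interface RequestUsage {
  inputTokens: number;
  outputTokens: number;
  webSearches: number;
}

// Estimated USD cost of one request. Assumption: the web search fee is added
// on top of ordinary token billing, as "+ input costs" suggests.
function estimateCost(prices: ProviderPrices, usage: RequestUsage): number {
  const tokenCost =
    (usage.inputTokens * prices.inputPerM + usage.outputTokens * prices.outputPerM) / 1_000_000;
  const searchCost = (usage.webSearches * prices.webSearchPerK) / 1_000;
  return tokenCost + searchCost;
}

// Example: 2,000 input tokens, 200 output tokens, one web search.
const usage: RequestUsage = { inputTokens: 2_000, outputTokens: 200, webSearches: 1 };
for (const [provider, prices] of Object.entries(PRICES)) {
  console.log(provider, estimateCost(prices, usage).toFixed(5));
}
```

In this example the token cost is only $0.00028, so the per-search fee ($0.014 on Azure, $0.010 on OpenAI) dominates and the providers' search pricing is what separates them.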

More models by OpenAI

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Web Search	Release Date
openai/gpt-5.5	—	2.8s	55tps	$5.00/M	$30.00/M	$0.50/M	—	$10.00/K + input costs	04/24/2026
openai/gpt-5.4-mini	400K	1.2s	182tps	$0.75/M	$4.50/M	$0.07/M	—	$10.00/K + input costs	03/17/2026
openai/gpt-5.4-nano	400K	0.6s	61tps	$0.20/M	$1.25/M	$0.02/M	—	$10.00/K + input costs	03/17/2026
openai/gpt-5.4	1.1M	1.1s	92tps	$2.50/M	$15.00/M	$0.25/M	—	$10.00/K + input costs	03/05/2026
openai/gpt-5-mini	400K	3.5s	306tps	$0.25/M	$2.00/M	$0.03/M	—	$14/K + input costs	08/07/2025
openai/gpt-oss-120b	131K	0.2s	723tps	$0.35/M	$0.75/M	$0.25/M	—	—	08/05/2025

About GPT-4.1 nano

GPT-4.1 nano was introduced on April 14, 2025 as the smallest and most latency-optimized model in the GPT-4.1 family. OpenAI designed it specifically for tasks where speed and cost efficiency take priority over frontier reasoning depth: classification, autocomplete, routing decisions, and other lightweight inference workloads that need to run at high volume.

Despite being the entry-level tier of the GPT-4.1 family, GPT-4.1 nano posts creditable benchmark scores for its size: 80.1% on MMLU (Massive Multitask Language Understanding) and 50.3% on GPQA (Graduate-Level Google-Proof Q&A). These numbers indicate that the GPT-4.1 training improvements carried down to the smallest variant. Like its larger siblings, it supports the full 1M-token context window, a notable capability at its price point that lets it handle tasks with long inputs even when the outputs stay short.

GPT-4.1 nano inherits the GPT-4.1 family's 75% prompt caching discount and the removal of surcharges for long-context usage. For applications that preload a large knowledge base or system prompt once and then issue many rapid short queries against it, these economics make nano an attractive option for the query stage of a retrieval-augmented pipeline.
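To make the caching economics concrete, here is a minimal sketch comparing the input-token cost of many short queries against one large, repeated prompt prefix, with and without caching. It assumes the $0.10/M input price from the Providers table and applies the 75% discount to get $0.025/M for cached reads (the table appears to show this rounded to $0.03/M). It also optimistically assumes every request after the first hits the cache; real cache hits depend on exact prefix matching and how long the cache entry lives. Output tokens are ignored because they cost the same either way, and `inputCostUsd` is a hypothetical helper.

```typescript
// Assumed prices in USD per 1M input tokens.
const INPUT_PER_M = 0.1; // GPT-4.1 nano input price from the Providers table
const CACHE_DISCOUNT = 0.75; // family-wide prompt caching discount
const CACHED_INPUT_PER_M = INPUT_PER_M * (1 - CACHE_DISCOUNT); // $0.025/M

// Input-token cost of `queries` requests that share a `prefixTokens` prefix
// (knowledge base or system prompt) followed by `queryTokens` of new text.
function inputCostUsd(
  prefixTokens: number,
  queryTokens: number,
  queries: number,
  cached: boolean,
): number {
  if (queries <= 0) return 0;
  const queryCost = queries * queryTokens * INPUT_PER_M;
  if (!cached) {
    return (queries * prefixTokens * INPUT_PER_M + queryCost) / 1_000_000;
  }
  // First request pays the full rate for the prefix (no separate cache-write fee
  // is listed); every later request is assumed to read the prefix from cache.
  const prefixCost =
    prefixTokens * INPUT_PER_M + (queries - 1) * prefixTokens * CACHED_INPUT_PER_M;
  return (prefixCost + queryCost) / 1_000_000;
}

// Example: 1,000 short queries against a 100K-token preloaded prefix.
console.log(inputCostUsd(100_000, 50, 1_000, false).toFixed(4)); // uncached
console.log(inputCostUsd(100_000, 50, 1_000, true).toFixed(4)); // cached
```

Under these assumptions, caching cuts the input bill for this workload from about $10.01 to about $2.51, close to the full 75% saving, because the shared prefix accounts for almost all input tokens.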

What To Consider When Choosing a Provider

Configuration: For event-driven pipelines that fire many rapid inferences per user action (real-time intent classification, content routing), GPT-4.1 nano's speed and low cost make it practical to run inference inline without queuing.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; bring-your-own-key (BYOK) requests are not included. See the documentation for configuration details.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GPT-4.1 nano

Best For

Real-time classification: Sentiment analysis, intent detection, and topic labeling at high request volume
Autocomplete features: Inline suggestion experiences requiring sub-second response times
Routing and triage: Logic within multi-model pipelines that decides which downstream model handles a request
Short-answer extraction: Pulling answers from long documents, where nano's 1M-token context window and low cost combine well
Cost-sensitive batch jobs: Millions of inferences that need to run economically
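For the classification and routing use cases above, a common pattern is to ask nano for a single label and normalize its reply before acting on it. The sketch below assumes a hypothetical set of support-ticket intents; `buildClassificationPrompt` and `parseIntent` are illustrative helpers. The prompt could be sent with `generateText` from the `ai` package using `model: 'openai/gpt-4.1-nano'`, with the returned `text` passed to `parseIntent`; that network call is omitted so the example runs offline.

```typescript
// Hypothetical label set for a support-ticket intent classifier.
const INTENTS = ["billing", "technical", "account", "other"] as const;
type Intent = (typeof INTENTS)[number];

// Asks for exactly one label so the reply is short and cheap to parse.
function buildClassificationPrompt(message: string): string {
  return [
    `Classify the message into exactly one of: ${INTENTS.join(", ")}.`,
    "Reply with the label only.",
    "",
    `Message: ${message}`,
  ].join("\n");
}

// Strips case, whitespace, and punctuation from the model's reply and maps it
// to a known label, falling back to "other" for anything unrecognized.
function parseIntent(reply: string): Intent {
  const cleaned = reply.trim().toLowerCase().replace(/[^a-z]/g, "");
  const match = INTENTS.find((label) => label === cleaned);
  return match ?? "other";
}

console.log(buildClassificationPrompt("I was charged twice this month."));
console.log(parseIntent(" Billing.\n"));
```

The fallback to "other" keeps a malformed reply from breaking the pipeline; such cases could be logged or escalated to a larger model.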

Consider Alternatives When

Complex reasoning: GPT-4.1 mini or GPT-4.1 provide meaningfully higher capability for multi-step reasoning, code generation, or complex instruction following
Edge-case quality: Larger models in the family handle nuanced or ambiguous inputs better
Hard STEM problems: o1-mini or o1 are purpose-built for chain-of-thought reasoning on difficult STEM tasks
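The guidance in these two lists can be encoded as a simple lookup in the routing layer of a multi-model pipeline. This is a sketch under stated assumptions: the task categories are hypothetical, and the slugs other than `openai/gpt-4.1-nano` (`openai/gpt-4.1-mini`, `openai/gpt-4.1`, `openai/o1`) are assumed to follow the same creator/model naming and should be checked against the AI Gateway model list.

```typescript
// Hypothetical task categories, drawn from the "Best For" and
// "Consider Alternatives When" lists above.
type Task =
  | "classification"
  | "autocomplete"
  | "routing"
  | "extraction"
  | "ambiguous-input"
  | "complex-reasoning"
  | "hard-stem";

// Record<Task, string> forces an entry for every task at compile time.
// Model slugs other than 'openai/gpt-4.1-nano' are assumptions.
const MODEL_FOR_TASK: Record<Task, string> = {
  classification: "openai/gpt-4.1-nano",
  autocomplete: "openai/gpt-4.1-nano",
  routing: "openai/gpt-4.1-nano",
  extraction: "openai/gpt-4.1-nano",
  "ambiguous-input": "openai/gpt-4.1-mini",
  "complex-reasoning": "openai/gpt-4.1",
  "hard-stem": "openai/o1",
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}

console.log(pickModel("routing"));
console.log(pickModel("hard-stem"));
```

Keeping the mapping in one table makes it easy to move a task to a different tier after measuring its quality and cost.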

Conclusion

GPT-4.1 nano brings the GPT-4.1 family's improvements, including the 1M-token context window and the 75% prompt caching discount, to the fastest and most affordable tier, making it a strong choice for classification, routing, and high-throughput lightweight inference through AI Gateway.

Frequently Asked Questions

What tasks is GPT-4.1 nano specifically designed for?
OpenAI designed it for classification, autocomplete, and routing where response speed and low cost outweigh the need for frontier reasoning.
Does GPT-4.1 nano really support a context window of 1.0M tokens?
Yes. All three GPT-4.1 family members share the same 1M-token context window, which is unusual for a model at nano's price and speed tier.
What benchmark scores does GPT-4.1 nano achieve?
At launch, GPT-4.1 nano scored 80.1% on MMLU and 50.3% on GPQA, showing that the family's training improvements extended to the smallest variant.
How does GPT-4.1 nano's pricing compare to the rest of the GPT-4.1 family?
GPT-4.1 nano is the lowest-priced model in the GPT-4.1 lineup. For current rates, see the Providers table on this page, which lists each provider's pricing for GPT-4.1 nano through AI Gateway.
Is GPT-4.1 nano suitable as the query model in a RAG pipeline?
Yes. Pairing a large preloaded knowledge base or system prompt (benefiting from the 75% cache discount) with rapid, inexpensive nano queries is a practical pattern for retrieval-augmented generation at scale.
When should I use nano versus mini versus GPT-4.1?
Nano: classification, routing, autocomplete. Mini: GPT-4o-class quality with lower cost and latency. GPT-4.1: maximum coding and instruction-following accuracy.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic. At the time of writing, both listed providers report about 0.5s latency.
