Gemini 2.5 Flash Lite
ActiveUltra-fast lightweight variant.
History
Gemini 2.5 Flash Lite became available via the Google API on 2025-07-22.
Training & availability
Training data has a knowledge cutoff of 2025-01-31 — information about events after that date is unlikely to appear in the model's responses. Google has not released the underlying model weights — access is via their hosted API only.
Capabilities
-
Context window: 1.0M tokens.
-
Max output: 65K tokens.
-
Input modalities: text, image.
Recommended for: vision, agentic, long-context.
Limitations
- The knowledge cutoff is 14 months old — this model will not know about recent events, releases, or API changes.
Quick start
Minimal example using the google API. Copy, paste, replace the key.
from google import genai
client = genai.Client(api_key="...")
resp = client.models.generate_content(
model="gemini-2.5-flash-lite",
contents="Explain quantum computing in one sentence.",
)
print(resp.text)Cost calculator
Estimate your monthly bill. Presets are typical workload sizes.
Providers & performance
2 providersMulti-provider inference routes for this model — sorted by throughput. Latency is time-to-first-token; throughput is output tokens per second. Data from OpenRouter, measured over the last 30 minutes.
| Provider | Throughput | Latency (TTFT) | Input $ / 1M | Output $ / 1M | Context | Quant | Supports |
|---|---|---|---|---|---|---|---|
| 170tok/s | 376ms | $0.1 | $0.4 | 1.0M | — | tools · json | |
| Google AI Studio | 124tok/s | 562ms | $0.1 | $0.4 | 1.0M | — | tools · json |
Integrations & tooling support
- Tool calling
- Supported
- Structured outputs
- Supported
Price vs quality
Priced low — good for high-volume tasks. Quality tier pending more benchmark coverage.
- Quality percentile
- —
- Effective price
- $0.325/1M
- Pricing breakdown
- $0.1/1M in
$0.4/1M out
Community ratings
Rate Gemini 2.5 Flash Lite
Sign in to rate and review.
Comments
Sign in to leave a comment.