Phi-4
Active · 86.5 · Frontier
14B small language model from Microsoft Research with state-of-the-art STEM reasoning.
Intelligence Index
86.5 / 100
Frontier, weighted across 3 benchmarks
- Math: 88.1
- General knowledge: 84.8
Computed as the mean of per-category averages across MMLU, GPQA, SWE-bench, HumanEval, MATH, GSM8K, AIME, Aider Polyglot and more. See each benchmark for methodology.
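As a rough check on the headline figure, the sketch below takes the unweighted mean of the two category scores shown on this page. That aggregation is an assumption: the page also mentions weighting, and the exact weights are not stated. The variable names are illustrative only.

```python
from statistics import mean

# Per-category averages as displayed on this page.
category_scores = {"Math": 88.1, "General knowledge": 84.8}

# Assumption: a plain, unweighted mean of the category averages.
# The page's actual weighting scheme is not given.
intelligence_index = mean(list(category_scores.values()))

print(f"Estimated index: {intelligence_index:.2f}")
```

Under that assumption the result lands close to the displayed 86.5 once rounded to one decimal place.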
Overview
Phi-4 is Microsoft's 14B-parameter small language model, trained with synthetic data to achieve strong reasoning and STEM performance. It outperforms much larger models on many benchmarks.
History
Phi-4 became available via the Microsoft API on 2025-01-10.
Training & availability
Training data has a knowledge cutoff of 2024-06-30 — information about events after that date is unlikely to appear in the model's responses. Weights are publicly available under the MIT license, making this an open-weight model suitable for on-prem deployment and fine-tuning.
Capabilities
- Context window: 16K tokens.
- Max output: 16K tokens.
- Input modalities: text.
- Intelligence Index: 86.5/100.
- Strongest categories: Math (88), General knowledge (85).
- Recommended for: math, frontier, open-source.
Limitations
- The knowledge cutoff is 21 months old, so this model will not know about recent events, releases, or API changes.
- The context window (16K tokens) is modest by 2026 standards and unsuitable for processing long documents in a single request.
- Text-only: cannot process images, audio, or video inputs.
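To make the context-window limit concrete, here is a minimal pre-flight check, offered as a sketch rather than a precise tool. It assumes "16K" means 16 × 1024 tokens, that prompt and output share one window, and that a token is roughly four characters. That last figure is a crude heuristic, not Phi-4's actual tokenizer. The function names are hypothetical.

```python
CONTEXT_WINDOW = 16_384  # assumption: "16K" read as 16 * 1024 tokens


def estimate_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (heuristic, not the real tokenizer)."""
    return max(1, len(text) // 4)


def fits_in_context(prompt: str, reserved_output_tokens: int = 1_024) -> bool:
    """Return True if the estimated prompt size plus reserved output fits the window."""
    return estimate_tokens(prompt) + reserved_output_tokens <= CONTEXT_WINDOW


short_prompt = "Summarize the attached paragraph."
long_prompt = "word " * 20_000  # 100,000 characters, about 25,000 estimated tokens

print(fits_in_context(short_prompt))  # small prompt: fits
print(fits_in_context(long_prompt))   # oversized prompt: does not fit
```

For accurate counts, the model's own tokenizer would be needed. Prompts that fail the check would have to be chunked or summarized first.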
Quick start
Minimal example using the OpenRouter API. Copy, paste, replace the key.
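```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI client works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter key
)

# Send a single-turn chat request to Phi-4.
resp = client.chat.completions.create(
    model="microsoft/phi-4",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
print(resp.choices[0].message.content)
```
Providers & performance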
2 providers
Multi-provider inference routes for this model, sorted by throughput. Latency is time-to-first-token; throughput is output tokens per second. Data from OpenRouter, measured over the last 30 minutes.
| Provider | Throughput | Latency (TTFT) | Input $ / 1M | Output $ / 1M | Context | Quant | Supports |
|---|---|---|---|---|---|---|---|
| DeepInfra | 94 tok/s | 198 ms | $0.07 | $0.14 | 16K | bf16 | json |
| NextBit | 26 tok/s | 640 ms | $0.065 | $0.14 | 16K | int4 | json |
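The table's figures can be turned into a per-request estimate. The sketch below assumes cost is billed linearly per token at the listed rates, and that response time is roughly time-to-first-token plus output tokens divided by throughput. The request size (2,000 input tokens, 500 output tokens) is an illustrative choice, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    throughput_tok_s: float   # output tokens per second
    ttft_s: float             # time-to-first-token, in seconds
    input_usd_per_m: float    # USD per 1M input tokens
    output_usd_per_m: float   # USD per 1M output tokens


# Values copied from the providers table above.
PROVIDERS = [
    Provider("DeepInfra", 94, 0.198, 0.07, 0.14),
    Provider("NextBit", 26, 0.640, 0.065, 0.14),
]


def estimate_cost_usd(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Linear per-token pricing at the listed rates."""
    return (input_tokens * p.input_usd_per_m + output_tokens * p.output_usd_per_m) / 1_000_000


def estimate_seconds(p: Provider, output_tokens: int) -> float:
    """Time-to-first-token plus steady-state generation time."""
    return p.ttft_s + output_tokens / p.throughput_tok_s


for p in PROVIDERS:
    cost = estimate_cost_usd(p, input_tokens=2_000, output_tokens=500)
    secs = estimate_seconds(p, output_tokens=500)
    print(f"{p.name}: ${cost:.5f} per request, about {secs:.1f} s")
```

With these numbers the two providers cost nearly the same per request, so the practical difference is speed. Measured throughput and latency fluctuate, so treat the timings as ballpark values.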
Popularity
Signals from open-source communities — not a quality measure, but useful for gauging adoption among developers.
Benchmarks
Integrations & tooling support
- Tool calling: Not supported
- Structured outputs: Not supported
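Since structured outputs are listed as not supported, one possible workaround is to ask for JSON in the prompt and validate the reply on the client. The helper below is a hypothetical sketch of that validation step. It takes the span from the first `{` to the last `}` and parses it. The sample reply is a made-up string for illustration, not real model output.

```python
import json


def parse_json_reply(reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a free-text reply.

    Raises ValueError if no such span exists or if it is not valid JSON
    (json.JSONDecodeError is a subclass of ValueError).
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])


# Made-up reply text: prose wrapped around a JSON object.
sample_reply = 'Sure! Here is the result:\n{"answer": 42, "unit": "km"}\nLet me know.'

parsed = parse_json_reply(sample_reply)
print(parsed["answer"], parsed["unit"])
```

A caller would typically catch `ValueError` and retry or fall back, because nothing guarantees the reply contains well-formed JSON.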
Price vs quality
Solid benchmark performance. No effective price is listed here; see the per-provider prices under Providers & performance.
- Quality percentile: 41.3%
- Effective price: n/a
- Pricing breakdown: n/a in, n/a out