Phi-4 — Microsoft | Modeldex

Phi-4

Active · 86.5 · Frontier

14B small language model from Microsoft Research with state-of-the-art STEM reasoning.

Updated 4 days ago · Structured data from Modeldex catalog

Math · Frontier · Open source

Knowledge cutoff

Jun 30, 2024 (1.8 years ago)

API release

Jan 10, 2025 (1.3 years ago)

Intelligence Index

86.5 / 100

Frontier · weighted across 3 benchmarks

Math: 88.1
General knowledge: 84.8

Computed as the mean of per-category averages. The index can draw on MMLU, GPQA, SWE-bench, HumanEval, MATH, GSM8K, AIME, Aider Polyglot and more; for Phi-4, three of these have scores (GSM8K, MATH, and MMLU). See each benchmark for methodology.
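
As a rough illustration of the "mean of per-category averages" rule, the sketch below recomputes the category scores and the index from the three benchmark scores listed on this page. It assumes an unweighted mean at both levels, which the page does not state explicitly; the function name intelligence_index is hypothetical.

```python
from statistics import mean

# Benchmark scores as listed on this page, grouped by category.
scores_by_category = {
    "Math": {"GSM8K": 95.8, "MATH": 80.4},
    "General knowledge": {"MMLU": 84.8},
}

def intelligence_index(scores):
    """Return (index, per-category means), assuming unweighted means."""
    category_means = {
        category: mean(list(benchmarks.values()))
        for category, benchmarks in scores.items()
    }
    return mean(list(category_means.values())), category_means

index, category_means = intelligence_index(scores_by_category)
print(category_means)
print(f"{index:.2f}")
```

Under that assumption, Math averages to about 88.1 and the overall value to about 86.45, which is consistent with the 86.5 shown above once rounded to one decimal.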

Overview

Phi-4 is Microsoft's 14B parameter small language model trained with synthetic data to achieve strong reasoning and STEM performance, outperforming much larger models on many benchmarks.

History

Phi-4 became available via the Microsoft API on 2025-01-10.

Training & availability

Training data has a knowledge cutoff of 2024-06-30 — information about events after that date is unlikely to appear in the model's responses. Weights are publicly available under the MIT license, making this an open-weight model suitable for on-prem deployment and fine-tuning.
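
For on-prem planning, a back-of-the-envelope estimate of weight memory can help. The sketch below treats "14B" as exactly 14 billion parameters (an approximation) and uses the two precisions that appear in the provider table on this page, bf16 and int4. It counts weights only and ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The helper name weight_memory_gb is hypothetical.

```python
# Weight-only memory estimate; NOT a full deployment sizing.
PARAMS = 14e9  # "14B" from this page, assumed to be exactly 14 billion

# Bits stored per parameter for each precision.
BITS_PER_PARAM = {"bf16": 16, "int4": 4}

def weight_memory_gb(params, bits_per_param):
    """Gigabytes (10^9 bytes) needed to hold the weights alone."""
    return params * bits_per_param / 8 / 1e9

for precision, bits in BITS_PER_PARAM.items():
    print(f"{precision}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")
```

Under these assumptions the weights alone come to roughly 28 GB in bf16 and 7 GB in int4.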

Capabilities

Context window: 16K tokens.
Max output: 16K tokens.
Input modalities: text.
Intelligence Index: 86.5/100.

Strongest categories: Math (88), General knowledge (85).

Recommended for: math, frontier, open-source.

Limitations

The knowledge cutoff is 21 months old — this model will not know about recent events, releases, or API changes.
The context window (16K tokens) is modest by 2026 standards — unsuitable for processing long documents in a single request.
Text-only — cannot process images, audio, or video inputs.
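
One way to work around the context limit is to split long inputs into pieces before sending them. The sketch below is a minimal illustration, not the model's actual tokenization: it assumes roughly 4 characters per token (a crude heuristic), treats "16K" as 16,000 tokens (the exact limit may differ), and reserves part of the window for the reply. The names estimate_tokens and split_to_fit are hypothetical.

```python
CONTEXT_WINDOW = 16_000  # "16K tokens" from this page; exact limit may differ
CHARS_PER_TOKEN = 4      # crude heuristic, NOT the model's real tokenizer

def estimate_tokens(text):
    """Approximate token count via ceiling division of the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)

def split_to_fit(text, reserved_for_output=2_000):
    """Split text into chunks that leave room for the model's reply."""
    budget_chars = (CONTEXT_WINDOW - reserved_for_output) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]

document = "x" * 120_000  # stand-in for a long document
chunks = split_to_fit(document)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

For real workloads, counting tokens with the model's own tokenizer and splitting on paragraph or sentence boundaries would be more reliable than fixed character offsets.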

Quick start

Minimal example using the OpenRouter API. Copy, paste, replace the key.

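# Requires the openai package (pip install openai).
from openai import OpenAI

# Point the OpenAI client at OpenRouter's endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter API key
)
resp = client.chat.completions.create(
    model="microsoft/phi-4",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
# Print the text of the first returned choice.
print(resp.choices[0].message.content)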

Providers & performance

2 providers

Multi-provider inference routes for this model — sorted by throughput. Latency is time-to-first-token; throughput is output tokens per second. Data from OpenRouter, measured over the last 30 minutes.

Provider	Throughput	Latency (TTFT)	Input $ / 1M tokens	Output $ / 1M tokens	Context	Quant	Supports
DeepInfra	94 tok/s	198 ms	$0.07	$0.14	16K	bf16	json
NextBit	26 tok/s	640 ms	$0.065	$0.14	16K	int4	json
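
To compare the two routes for a given workload, the table's figures can be turned into a rough per-request estimate. The sketch below assumes cost scales linearly with token counts and that response time is approximately time-to-first-token plus output tokens divided by throughput. It uses the snapshot values above, which change over time. The Provider class and estimate function are hypothetical helpers, not part of any provider's API.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    throughput_tok_s: float   # output tokens per second
    ttft_ms: float            # time to first token, in milliseconds
    input_usd_per_m: float    # USD per 1M input tokens
    output_usd_per_m: float   # USD per 1M output tokens

# Snapshot values copied from the table on this page.
PROVIDERS = [
    Provider("DeepInfra", 94, 198, 0.07, 0.14),
    Provider("NextBit", 26, 640, 0.065, 0.14),
]

def estimate(provider, input_tokens, output_tokens):
    """Return (cost in USD, approximate seconds) for one request."""
    cost = (input_tokens * provider.input_usd_per_m
            + output_tokens * provider.output_usd_per_m) / 1e6
    seconds = provider.ttft_ms / 1000 + output_tokens / provider.throughput_tok_s
    return cost, seconds

for p in PROVIDERS:
    cost, seconds = estimate(p, input_tokens=2_000, output_tokens=470)
    print(f"{p.name}: ${cost:.6f}, ~{seconds:.1f} s")
```

The estimate ignores queueing, network overhead, and any difference in output quality between bf16 and int4 quantization.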

Popularity

Signals from open-source communities — not a quality measure, but useful for gauging adoption among developers.

HuggingFace downloads

514.3K / month

HuggingFace likes

2.2K

Benchmarks

Benchmark	Category	Score	Source
GSM8K	Math	95.8% accuracy	Self-reported (Microsoft Phi-4 tech report)
MATH	Math	80.4% accuracy	Self-reported (Microsoft Phi-4 tech report)
MMLU	General knowledge	84.8% accuracy	Self-reported (Microsoft Phi-4 tech report)

Integrations & tooling support

Tool calling: Not supported
Structured outputs: Not supported

Price vs quality

Competent benchmarks

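Solid benchmark performance. Pricing varies by provider: the routes listed under Providers & performance charge $0.065 to $0.07 per 1M input tokens and $0.14 per 1M output tokens. Check the provider for current rates.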