Llama 3.3 Nemotron Super 49B
Status: Active
An efficient 49B-parameter model from NVIDIA with frontier reasoning capability.
Overview
Llama 3.3 Nemotron Super 49B is an NVIDIA model derived from Llama 3.3 through pruning and distillation, achieving strong reasoning performance with 49B parameters.
History
Llama 3.3 Nemotron Super 49B was released on 2025-03-12.
Training & availability
Weights are publicly available under the Llama 3.3 Community License, making this an open-weight model suitable for on-prem deployment and fine-tuning.
Capabilities
- Context window: 128K tokens.
- Max output: 32K tokens.
- Input modalities: text.
Recommended for: agentic, open-source.
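The context window and the output cap interact: in most chat APIs the prompt and the completion share the context window, so a long prompt leaves less room for output. Below is a minimal sketch of that budgeting. It assumes "128K" and "32K" mean 128,000 and 32,000 tokens (the page does not give exact figures) and that the prompt's token count is already known. `clamp_max_tokens` is a hypothetical helper, not part of any SDK.

```python
# Assumed values: the page lists "128K" and "32K" without exact figures.
CONTEXT_WINDOW = 128_000  # total tokens (prompt + completion), assumed decimal
MAX_OUTPUT = 32_000       # per-response output cap, assumed decimal


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest max_tokens value that respects both limits.

    Hypothetical helper: raises if the prompt alone fills the window.
    """
    if prompt_tokens < 0 or requested <= 0:
        raise ValueError("prompt_tokens must be >= 0 and requested must be > 0")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return min(requested, MAX_OUTPUT, remaining)


# Short prompt: the output cap is the binding limit.
print(clamp_max_tokens(prompt_tokens=2_000, requested=50_000))
# Long prompt: the remaining context window is the binding limit.
print(clamp_max_tokens(prompt_tokens=120_000, requested=32_000))
```

The returned value can be passed as `max_tokens` in a chat completion request.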
Limitations
- Text-only — cannot process images, audio, or video inputs.
Quick start
Minimal example using the OpenRouter API. Copy, paste, replace the key.
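from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI SDK works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter key
)

# Send a single-turn chat request and print the model's reply.
resp = client.chat.completions.create(
    model="nvidia/llama-3-3-nemotron-super-49b",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
print(resp.choices[0].message.content)

Cost calculator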
Estimate your monthly bill from the input and output prices listed under Price vs quality.
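A monthly estimate is simple arithmetic on the per-token prices listed on this page ($0.1 per 1M input tokens, $0.4 per 1M output tokens). The sketch below is a rough illustration. The workload figures and the 30-day month are assumptions, not the page's presets, and `monthly_cost` is a hypothetical helper.

```python
# Prices from this page, in USD per 1M tokens.
PRICE_IN_PER_M = 0.10
PRICE_OUT_PER_M = 0.40


def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int, days: int = 30) -> float:
    """Estimate the monthly bill in USD for a uniform workload."""
    total_in = requests_per_day * in_tokens * days
    total_out = requests_per_day * out_tokens * days
    return (total_in * PRICE_IN_PER_M + total_out * PRICE_OUT_PER_M) / 1_000_000


# Illustrative workload: 10,000 requests/day, 1,000 input and 500 output tokens each.
# 300M input tokens cost about $30 and 150M output tokens about $60.
print(f"${monthly_cost(10_000, 1_000, 500):.2f}")  # prints $90.00
```

Output tokens cost four times as much as input tokens here, so workloads with long completions dominate the bill.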
Integrations & tooling support
- Tool calling: Supported
- Structured outputs: Not supported
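Since tool calling is listed as supported, a request can advertise functions in the OpenAI-compatible `tools` format. The sketch below builds one such definition and dispatches a tool call locally. `get_weather`, its schema, and the `dispatch` helper are made up for illustration, and the network call appears only in comments because it needs a live key. Because structured outputs are listed as not supported, the sketch validates the arguments JSON defensively instead of trusting its shape.

```python
import json

# Hypothetical tool, described in the OpenAI-compatible function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> str:
    # Stub for illustration; a real tool would query a weather service.
    return f"Sunny in {city}"


HANDLERS = {"get_weather": get_weather}


def dispatch(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for, checking its arguments first."""
    if name not in HANDLERS:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return "error: arguments were not valid JSON"
    if not isinstance(args, dict) or "city" not in args:
        return "error: missing required argument 'city'"
    return HANDLERS[name](args["city"])


# With a live client, TOOLS would be passed as:
#   resp = client.chat.completions.create(model=..., messages=..., tools=TOOLS)
# and each entry in resp.choices[0].message.tool_calls handled with:
#   dispatch(call.function.name, call.function.arguments)
print(dispatch("get_weather", '{"city": "Oslo"}'))
```

Returning error strings rather than raising lets the result be sent back to the model as a tool message, so it can retry with corrected arguments.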
Price vs quality
Priced low, which makes it a good fit for high-volume tasks. The quality tier is pending more benchmark coverage.
- Quality percentile: not yet available
- Effective price: $0.325/1M tokens
- Pricing breakdown: $0.1/1M input tokens, $0.4/1M output tokens
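The page does not say how the effective price is derived. One reading that reproduces the listed $0.325/1M is a token-weighted blend of one part input to three parts output. That ratio is an inference from the numbers, not a documented formula, and `blended_price` is a hypothetical helper.

```python
import math

PRICE_IN = 0.10   # USD per 1M input tokens (from this page)
PRICE_OUT = 0.40  # USD per 1M output tokens (from this page)


def blended_price(input_share: float) -> float:
    """Token-weighted average price for a given share of input tokens."""
    return input_share * PRICE_IN + (1 - input_share) * PRICE_OUT


# Assumed 1:3 input-to-output mix, i.e. 25% of tokens are input.
effective = blended_price(0.25)
print(round(effective, 3))
assert math.isclose(effective, 0.325)
```

For a workload with a different mix, calling `blended_price` with your own input share gives a more representative per-token figure.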