Llama 3.3 Nemotron Super 49B — NVIDIA | Modeldex

Llama 3.3 Nemotron Super 49B

Active

Efficient 49B-parameter model from NVIDIA with frontier-level reasoning capability.

Updated 3 days ago. Structured data from Modeldex catalog.

Tags: Agentic, Open source

Not enough benchmark coverage yet for an Intelligence Index — needs at least 3 results across 2 categories.

Overview

Llama 3.3 Nemotron Super 49B is an NVIDIA model derived from Llama 3.3 through pruning and distillation, achieving strong reasoning performance with 49B parameters.

History

Llama 3.3 Nemotron Super 49B was released on 2025-03-12.

Training & availability

Weights are publicly available under the Llama 3.3 Community License, making this an open-weight model suitable for on-prem deployment and fine-tuning.

Capabilities

Context window: 128K tokens.
Max output: 32K tokens.
Input modalities: text.

Recommended for: agentic, open-source.
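
The limits listed above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: it assumes "128K" and "32K" mean 128,000 and 32,000 tokens (some providers use 131,072 and 32,768), and that output tokens count against the context window. The function name `output_budget` is hypothetical.

```python
# Limits taken from the Capabilities list; the exact token counts are assumptions.
CONTEXT_WINDOW = 128_000  # "128K" read as 128,000
MAX_OUTPUT = 32_000       # "32K" read as 32,000


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return the largest output size that fits both limits (0 if the prompt is too long)."""
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: prompt and output share the same context window.
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(requested_output, MAX_OUTPUT, remaining))


print(output_budget(100_000, 40_000))  # capped by the remaining context
print(output_budget(1_000, 40_000))    # capped by the max output limit
```

The returned value could be passed as `max_tokens` in a chat completion request; check the provider's documentation for the exact limits it enforces.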

Limitations

Text-only — cannot process images, audio, or video inputs.

Quick start

Minimal example using the OpenRouter API. Copy, paste, replace the key.

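from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI SDK can be pointed at it.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter key
)
resp = client.chat.completions.create(
    # Model slug as listed in this catalog; confirm it against the provider's model list.
    model="nvidia/llama-3-3-nemotron-super-49b",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
# Print the text of the first completion.
print(resp.choices[0].message.content)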

Cost calculator

Estimate your monthly bill. Presets are typical workload sizes.

Input tokens / month: 5.0M @ $0.10/1M
Output tokens / month: 2.0M @ $0.40/1M

Input cost: $0.50 (5.0M × $0.10/1M)
Output cost: $0.80 (2.0M × $0.40/1M)

Total / month: $1.30
Total / year: $15.60
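
The calculator's arithmetic can be reproduced in a few lines for other workload sizes. This is a minimal sketch using the per-token rates shown above; the function name `monthly_cost` is hypothetical, and actual billing may differ by provider.

```python
# Rates copied from the calculator above (USD per 1M tokens).
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.40


def monthly_cost(input_tokens: int, output_tokens: int) -> dict:
    """Return input, output, monthly, and yearly cost in USD, rounded to cents."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    total = input_cost + output_cost
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "month": round(total, 2),
        "year": round(total * 12, 2),
    }


# The preset from the calculator: 5.0M input and 2.0M output tokens per month.
print(monthly_cost(5_000_000, 2_000_000))
# {'input': 0.5, 'output': 0.8, 'month': 1.3, 'year': 15.6}
```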

Integrations & tooling support

Tool calling: Supported
Structured outputs: Not supported
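
Since tool calling is listed as supported, a request can carry tool definitions in the OpenAI chat-completions format. The sketch below makes no network call: it shows a tool definition and a local dispatcher for the call the model would return. The `get_weather` tool, its schema, and the `run_tool_call` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    # Stub implementation; a real tool would query a weather service.
    return {"city": city, "temp_c": 21}


HANDLERS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> dict:
    """Execute one tool call requested by the model and return its result."""
    if name not in HANDLERS:
        raise KeyError(f"unknown tool: {name}")
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return HANDLERS[name](**args)


# With the OpenAI SDK, the tool list is passed as tools=[WEATHER_TOOL] to
# client.chat.completions.create(...). Each entry in
# resp.choices[0].message.tool_calls exposes .function.name and
# .function.arguments, which map onto the two parameters below.
print(run_tool_call("get_weather", '{"city": "Oslo"}'))
```

Because structured outputs are listed as not supported, it is safer to validate the JSON arguments (as `json.loads` does here) than to assume they always match the schema.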

Price vs quality

Budget pricing

Priced low — good for high-volume tasks. Quality tier pending more benchmark coverage.