Llama 3.1 Nemotron 70B — NVIDIA | Modeldex

Llama 3.1 Nemotron 70B

Active

NVIDIA-tuned Llama 3.1 70B with state-of-the-art alignment and helpfulness.

Updated 3 days ago. Structured data from Modeldex catalog.

Agentic, Open source

Not enough benchmark coverage yet for an Intelligence Index — needs at least 3 results across 2 categories.

Overview

Llama 3.1 Nemotron 70B is NVIDIA's fine-tuned version of Llama 3.1 70B, optimized for helpfulness using NVIDIA's RLHF pipeline. It surpasses the base model on instruction following and open-ended generation.

History

Llama 3.1 Nemotron 70B was released on 2024-10-03.

Training & availability

Weights are publicly available under the Llama 3.1 Community License, making this an open-weight model suitable for on-prem deployment and fine-tuning.

Capabilities

Context window: 128K tokens.
Max output: 32K tokens.
Input modalities: text.

Recommended for: agentic, open-source.
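
The limits listed above can be turned into a simple request-budget check. This is a rough sketch under two assumptions not stated on this page: that "128K" and "32K" mean 128,000 and 32,000 tokens, and that prompt and output tokens share the same context window. The function name max_prompt_tokens is hypothetical.

```python
# Assumption: "128K" / "32K" are read as round thousands; the provider's
# exact limits may differ (for example 131,072 tokens).
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 32_000


def max_prompt_tokens(requested_output: int) -> int:
    """Return how many prompt tokens fit if `requested_output` tokens are reserved.

    Assumes prompt and completion share one context window.
    """
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


if __name__ == "__main__":
    # Reserving the full output budget leaves the rest for the prompt.
    print(max_prompt_tokens(32_000))
```

Token counts depend on the tokenizer, so any prompt length fed into this check should come from the model's own tokenizer rather than a character estimate.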

Limitations

Text-only — cannot process images, audio, or video inputs.

Quick start

Minimal example using the OpenRouter API. Copy, paste, replace the key.

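from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI SDK can be pointed at it.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter key
)
# Model slug as listed on this page; confirm it against OpenRouter's model list.
resp = client.chat.completions.create(
    model="nvidia/llama-3-1-nemotron-70b",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
# Print the text of the first (and only) completion choice.
print(resp.choices[0].message.content)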

Cost calculator

Estimate your monthly bill. Presets are typical workload sizes.

Input tokens / month: 5.0M @ $1.2/1M
Output tokens / month: 2.0M @ $1.2/1M

Input cost: $6.0 (5.0M × $1.2/1M)
Output cost: $2.4 (2.0M × $1.2/1M)

Total / month: $8.4
Total / year: $100.8
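
The calculator arithmetic above can be reproduced in a few lines, which is handy for trying other workload sizes. This is a minimal sketch: the per-million prices come from this page, while the function name monthly_cost and the constant names are illustrative.

```python
# Prices from this page, in USD per 1M tokens.
PRICE_IN_PER_M = 1.2
PRICE_OUT_PER_M = 1.2


def monthly_cost(input_tokens: int, output_tokens: int) -> tuple[float, float, float]:
    """Return (input_cost, output_cost, total) in USD for one month."""
    input_cost = input_tokens / 1_000_000 * PRICE_IN_PER_M
    output_cost = output_tokens / 1_000_000 * PRICE_OUT_PER_M
    return input_cost, output_cost, input_cost + output_cost


if __name__ == "__main__":
    # The preset shown above: 5.0M input and 2.0M output tokens per month.
    inp, out, total = monthly_cost(5_000_000, 2_000_000)
    print(f"input ${inp:.2f}, output ${out:.2f}")
    print(f"${total:.2f} / month, ${total * 12:.2f} / year")
    # prints: $8.40 / month, $100.80 / year
```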

Integrations & tooling support

Tool calling: Supported
Structured outputs: Not supported
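
Since tool calling is listed as supported, a request can pass function definitions in the OpenAI-compatible tools parameter. The sketch below is illustrative only: get_weather, REGISTRY, run_tool_call, and ask_with_tools are hypothetical names, the tool returns canned text, and the flow assumes the provider returns tool calls in the standard OpenAI chat-completions shape. Because structured outputs are listed as not supported, the tool arguments are parsed with json.loads on the client side and may need validation.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns canned text for illustration."""
    return f"Sunny in {city}"


# Tool schema in the OpenAI chat-completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call given its name and JSON-encoded arguments."""
    args = json.loads(arguments_json)
    return REGISTRY[name](**args)


def ask_with_tools(client, model: str, prompt: str) -> str:
    """Send a prompt, execute any requested tools, and return the final text.

    `client` is an openai.OpenAI instance configured as in the quick start.
    """
    messages = [{"role": "user", "content": prompt}]
    resp = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        return msg.content
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        result = run_tool_call(call.function.name, call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model=model, messages=messages, tools=TOOLS)
    return final.choices[0].message.content
```

The dispatcher is kept separate from the network call so it can be tested locally, for example run_tool_call("get_weather", '{"city": "Oslo"}').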

Price vs quality

Mid-tier pricing

Standard pricing band. Quality tier pending more benchmark coverage.

Quality percentile: —
Effective price: $1.2/1M
Pricing breakdown: $1.2/1M in, $1.2/1M out
