Llama 3.3 70B Instruct
Overview
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases.
History
Llama 3.3 70B Instruct was released by Meta on 2024-12-06.
Training & availability
Training data has a knowledge cutoff of 2023-12-31; information about events after that date is unlikely to appear in the model's responses. Meta released the model weights under the Llama 3.3 Community License, and the model is also hosted by many third-party inference providers (see the provider table below).
Capabilities
- Context window: 131K tokens.
- Input modalities: text.
- Recommended for: fast, low-cost, high-volume text tasks.
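A rough way to use the 131K-token figure in practice is a pre-flight length check. The sketch below relies on the common ~4 characters-per-token heuristic, which is an approximation; exact counts require the model's tokenizer.

```python
# Rough check that a prompt plus the expected reply fits in the
# 131K-token context window. The chars/4 ratio is a heuristic,
# not an exact token count.
CONTEXT_WINDOW = 131_000

def fits_in_context(prompt: str, max_output_tokens: int = 1024) -> bool:
    est_prompt_tokens = len(prompt) / 4  # heuristic estimate
    return est_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("Explain quantum computing in one sentence."))
```

For production use, swap the heuristic for a real tokenizer-based count.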
Limitations
- The knowledge cutoff is December 2023, so the model will not know about recent events, releases, or API changes.
- Text-only: it cannot process image, audio, or video inputs.
Pricing
- Input: $0.10 per 1M tokens
- Output: $0.32 per 1M tokens
Use the cost calculator below to estimate monthly spend for your workload.
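At these rates, monthly spend is a simple linear function of token volume. A minimal sketch, using the listed prices; the request counts and token sizes in the example are made-up workload numbers, not recommendations:

```python
# Estimate monthly spend from the listed per-1M-token prices.
INPUT_PRICE = 0.10 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.32 / 1_000_000  # $ per output token

def monthly_cost(requests_per_day: int,
                 in_tokens_per_req: int,
                 out_tokens_per_req: int,
                 days: int = 30) -> float:
    total_in = requests_per_day * in_tokens_per_req * days
    total_out = requests_per_day * out_tokens_per_req * days
    return total_in * INPUT_PRICE + total_out * OUTPUT_PRICE

# Hypothetical workload: 10,000 requests/day, 500 input + 200 output tokens each.
print(f"${monthly_cost(10_000, 500, 200):.2f}")  # → $34.20
```

Output tokens dominate the bill here despite being fewer, because the output rate is 3.2× the input rate.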
Example interactions
Curated prompts showing the model's response style — not cherry-picked to impress, picked to show what typical output looks like.
Quick start
Minimal example using the OpenRouter API. Copy, paste, replace the key.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter API key
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)

print(resp.choices[0].message.content)
```
Cost calculator
Estimate your monthly bill. Presets are typical workload sizes.
Providers & performance
16 providers. Multi-provider inference routes for this model, sorted by throughput. Latency is time-to-first-token (TTFT); throughput is output tokens per second. Data from OpenRouter, measured over the last 30 minutes.
| Provider | Throughput | Latency (TTFT) | Input $ / 1M | Output $ / 1M | Context | Quant | Supports |
|---|---|---|---|---|---|---|---|
| Groq | 184 tok/s | 308 ms | $0.59 | $0.79 | 131K | — | tools · json |
| SambaNova | 92 tok/s | 548 ms | $0.45 | $0.90 | 16K | bf16 | — |
| SambaNova | 77 tok/s | — | — | — | — | — | — |
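TTFT and throughput combine into a useful end-to-end estimate: total time ≈ TTFT + output_tokens / throughput. A small sketch, plugging in the Groq row from the table above:

```python
def response_time_s(ttft_ms: float, throughput_tok_s: float, output_tokens: int) -> float:
    """Estimated end-to-end response time: time-to-first-token plus generation time."""
    return ttft_ms / 1000 + output_tokens / throughput_tok_s

# Groq row: 308 ms TTFT at 184 tok/s, generating a 500-token reply.
print(round(response_time_s(308, 184, 500), 2))  # → 3.03
```

For long replies the throughput term dominates; for short replies TTFT matters more, which is why the table reports both.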
Popularity
Signals from open-source communities — not a quality measure, but useful for gauging adoption among developers.
Integrations & tooling support
- Tool calling: not supported on the default route; some providers in the table above (e.g. Groq) do support tool and JSON output.
- Structured outputs: not supported on the default route; availability varies by provider.
Price vs quality
Priced low, which suits high-volume tasks. Quality tier pending more benchmark coverage.
- Quality percentile: —
- Effective price: $0.265/1M
- Pricing breakdown: $0.10/1M in · $0.32/1M out
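The effective price appears to be a weighted blend of the input and output rates. A 1:3 input:output weighting reproduces the $0.265/1M figure shown above; the site's actual weighting is an assumption, so treat the ratio as illustrative:

```python
# Blended ("effective") price per 1M tokens from separate input/output rates.
# The 1:3 input:output weighting is an assumption that happens to reproduce
# the $0.265/1M figure listed for this model.
def effective_price(in_price: float, out_price: float,
                    in_weight: float = 1, out_weight: float = 3) -> float:
    total = in_weight + out_weight
    return (in_price * in_weight + out_price * out_weight) / total

print(round(effective_price(0.10, 0.32), 3))
```

With equal weights the blend would instead be $0.21/1M, so the assumed ratio materially changes the headline number.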