Model Comparison

Author: qwen
Context Length: 33K

Qwen2 72B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

It features SwiGLU activation, attention QKV bias, and grouped-query attention. The model is pretrained on extensive data and further aligned with supervised finetuning and direct preference optimization.
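The SwiGLU activation mentioned above gates one linear projection of the input with a SiLU-activated second projection. A minimal toy sketch (scalar-per-position version with illustrative vector weights; real transformer FFN layers use full projection matrices plus a down-projection):

```python
import math

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(x, w, v):
    # SwiGLU gating for a single position: SiLU(x . w) * (x . v)
    # w and v stand in for the two up-projections of the FFN block;
    # the shapes here are toy assumptions, not the 72B model's real dims.
    gate = silu(sum(xi * wi for xi, wi in zip(x, w)))
    value = sum(xi * vi for xi, vi in zip(x, v))
    return gate * value
```

The gate lets the network learn to suppress or pass each feature smoothly, which is why SwiGLU variants are common in modern LLM feed-forward blocks.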

For more details, see this blog post and GitHub repo.

Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

Provider

Together

Pricing

Input: $0.90 / M tokens
Output: $0.90 / M tokens
Images: –
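Since input and output tokens are both billed at $0.90 per million, the cost of a request is simply linear in its total token count. A quick sketch using the rates from the table above (function name is illustrative):

```python
INPUT_PER_M = 0.90   # USD per million input tokens (from the pricing table)
OUTPUT_PER_M = 0.90  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A maximal request: 29K input tokens plus the 4K output cap
# costs about $0.0297.
```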

Endpoint Features

Quantization: unknown
Max Tokens (input + output): 33K
Max Output Tokens: 4K
Stream cancellation
Supports Tools
No Prompt Training
Reasoning: –