
Supported Models

Tenstorrent supports a broad range of models across LLMs, CNNs, speech, and NLP. The tables below list, by category, the models that are officially tested and supported on Tenstorrent hardware, together with their measured benchmark results.

LLMs

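| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s |
|---|---|---|---|---|---|---|
| QwQ 32B (TP=8) | 32 | QuietBox | 133 | 25.2 | 30 | 806.4 |
| DeepSeek R1 Distill Llama 3.3 70B | 32 | QuietBox | 159 | 15.2 | 20 | 486.4 |
| Llama 3.1 70B (TP=32) | 32 | Galaxy | | 45.1 | 80 | 1443.2 |
| Llama 3.1 70B (TP=8) | 32 | QuietBox | 159 | 15.2 | 20 | 486.4 |
| Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 |
| Qwen 2.5 7B (TP=2) | 32 | n300 | 126 | 32.5 | 38 | 1040.0 |
| Qwen 2.5 72B (TP=8) | 32 | QuietBox | 333 | 14.5 | 20 | 464.0 |
| Falcon 7B | 32 | n150 | 70 | 18.3 | 26 | 585.6 |
| Falcon 7B (DP=8) | 256 | QuietBox | 88 | 15.5 | 26 | 3968.0 |
| Falcon 7B (DP=32) | 1024 | Galaxy | 223 | 4.8 | 26 | 4915.2 |
| Falcon 40B (TP=8) | 32 | QuietBox | | 11.9 | 36 | 380.8 |
| Llama 3.1 8B | 32 | p100 | 87* | 26.5* | | 848.0* |
| Llama 3.1 8B | 32 | p150 | 69* | 29.1* | | 931.2* |
| Llama 3.1 8B (DP=2) | 64 | 2 x p150 | 64* | 18.6* | | 1190.4* |
| Llama 3.1 8B | 32 | n150 | 104 | 24.6 | 23 | 787.2 |
| Llama 3.2 1B | 32 | n150 | 23 | 67.6 | 160 | 2163.2 |
| Llama 3.2 3B | 32 | n150 | 53 | 43.5 | 60 | 1392.0 |
| Mamba 2.8B | 32 | n150 | 37 | 13.7 | 41 | 438.4 |
| Mistral 7B | 32 | n150 | 240 | 25.84 | 23 | 826.9 |
| Mixtral 8x7B (TP=8) | 32 | QuietBox | 207 | 16.6 | 33 | 531.2 |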

Last Update: May 5, 2025

Notes:
  • ttft = time to first token; t/s/u = tokens per second per user; t/s = tokens per second, where t/s = t/s/u × batch.
  • TP = Tensor Parallel and DP = Data Parallel; the number after each gives the parallelization factor, i.e. how many devices the model is spread across.
  • Reported LLM performance is for an input sequence length (the number of rows filled in the KV cache) of 128 for all models except Mamba, which can accept any sequence length.
  • The reported t/s/u is the throughput for the first token generated after prefill, i.e. 1 / inter-token latency.
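The metric relationships in these notes can be checked directly against the table. The Python sketch below recomputes t/s from t/s/u and batch for a few rows copied from the LLM table, and converts t/s/u into an approximate inter-token latency in milliseconds. The helper names `total_throughput` and `inter_token_latency_ms` are hypothetical, chosen for illustration; they are not part of any Tenstorrent API.

```python
# Sketch of the LLM metric definitions from the notes (hypothetical helper names).

def total_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """t/s = t/s/u * batch: aggregate decode throughput across all users."""
    return tokens_per_sec_per_user * batch


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """t/s/u = 1 / inter-token latency, so latency in ms = 1000 / t/s/u."""
    return 1000.0 / tokens_per_sec_per_user


# (model, batch, t/s/u, reported t/s), copied from the LLM table
rows = [
    ("QwQ 32B (TP=8)", 32, 25.2, 806.4),
    ("Llama 3.2 11B Vision (TP=2)", 16, 15.8, 252.8),
    ("Falcon 7B (DP=32)", 1024, 4.8, 4915.2),
]

for model, batch, tpsu, reported in rows:
    computed = total_throughput(tpsu, batch)
    # The table reports t/s to one decimal place, so allow a small tolerance.
    assert abs(computed - reported) < 0.05, model
    latency = inter_token_latency_ms(tpsu)
    print(f"{model}: {computed:.1f} t/s, about {latency:.1f} ms between tokens")
```

The Falcon 7B (DP=32) row shows why both columns matter: each user sees only 4.8 t/s (roughly 208 ms between tokens), while the 1024-user batch yields the highest aggregate t/s in the table.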

Speech-to-Text

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s |
|---|---|---|---|---|---|---|
| Whisper distil-large-v3 | 1 | n150 | 244 | 54.7 | 45 | 54.7 |

CNNs

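| Model | Batch | Hardware | fps | Target fps |
|---|---|---|---|---|
| ResNet-50 (224x224) | 16 | n150 | 4700 | 7000 |
| ResNet-50 (DP=2) | 32 | n300 | 9200 | 14000 |
| ResNet-50 (DP=8) | 128 | QuietBox | 35800 | 56000 |
| ResNet-50 (DP=32) | 512 | Galaxy | 96800 | 224000 |
| ViT (224x224) | 8 | n150 | 1100 | 1600 |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.117 | 0.3 |
| YOLOv4 (320x320) | 1 | n150 | 120 | 300 |
| YOLOv4 (640x640) | 1 | n150 | 50 | 100 |
| SegFormer Semantic Seg (512x512) | 1 | n150 | 90 | 300 |
| Stable Diffusion 3.5 medium | 1 | n150 | 0.06 | 0.3 |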

NLP

| Model | Batch | Hardware | sen/sec | Target sen/sec |
|---|---|---|---|---|
| BERT-Large | 8 | n150 | 270 | 400 |