Tenstorrent hardware supports a broad range of models spanning LLMs, speech-to-text, CNNs, and NLP. The tables below list the officially tested and supported models by category, together with measured benchmark results and the performance targets for each.
LLMs
| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s |
|---|---|---|---|---|---|---|
| QwQ 32B (TP=8) | 32 | QuietBox | 133 | 25.2 | 30 | 806.4 |
| DeepSeek R1 Distill Llama 3.3 70B | 32 | QuietBox | 159 | 15.2 | 20 | 486.4 |
| Llama 3.1 70B (TP=32) | 32 | Galaxy | | 45.1 | 80 | 1443.2 |
| Llama 3.1 70B (TP=8) | 32 | QuietBox | 159 | 15.2 | 20 | 486.4 |
| Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 |
| Qwen 2.5 7B (TP=2) | 32 | n300 | 126 | 32.5 | 38 | 1040.0 |
| Qwen 2.5 72B (TP=8) | 32 | QuietBox | 333 | 14.5 | 20 | 464.0 |
| Falcon 7B | 32 | n150 | 70 | 18.3 | 26 | 585.6 |
| Falcon 7B (DP=8) | 256 | QuietBox | 88 | 15.5 | 26 | 3968.0 |
| Falcon 7B (DP=32) | 1024 | Galaxy | 223 | 4.8 | 26 | 4915.2 |
| Falcon 40B (TP=8) | 32 | QuietBox | | 11.9 | 36 | 380.8 |
| Llama 3.1 8B | 32 | p100 | 87* | 26.5* | | 848.0* |
| Llama 3.1 8B | 32 | p150 | 69* | 29.1* | | 931.2* |
| Llama 3.1 8B (DP=2) | 64 | 2 x p150 | 64* | 18.6* | | 1190.4* |
| Llama 3.1 8B | 32 | n150 | 104 | 24.6 | 23 | 787.2 |
| Llama 3.2 1B | 32 | n150 | 23 | 67.6 | 160 | 2163.2 |
| Llama 3.2 3B | 32 | n150 | 53 | 43.5 | 60 | 1392.0 |
| Mamba 2.8B | 32 | n150 | 37 | 13.7 | 41 | 438.4 |
| Mistral 7B | 32 | n150 | 240 | 25.84 | 23 | 826.9 |
| Mixtral 8x7B (TP=8) | 32 | QuietBox | 207 | 16.6 | 33 | 531.2 |
Last Update: May 5, 2025
- ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
- TP = Tensor Parallel, DP = Data Parallel; the number after each gives the parallelization factor across multiple devices.
- The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
- The t/s/u reported is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
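
The relationships in the notes above (t/s = t/s/u * batch, and t/s/u = 1 / inter-token latency) can be checked against the table with a short script. This is only an illustrative sketch: the helper names `aggregate_throughput` and `inter_token_latency_ms` are made up for this example and are not part of any Tenstorrent tooling, and the sample rows are copied from the LLM table above.

```python
def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """Aggregate tokens/second across the batch: t/s = t/s/u * batch."""
    return tokens_per_sec_per_user * batch


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Invert t/s/u = 1 / inter-token latency and express the result in ms."""
    return 1000.0 / tokens_per_sec_per_user


# (model, batch, t/s/u, reported t/s) copied from the LLM table above.
rows = [
    ("QwQ 32B (TP=8)", 32, 25.2, 806.4),
    ("Falcon 7B (DP=8)", 256, 15.5, 3968.0),
    ("Falcon 7B (DP=32)", 1024, 4.8, 4915.2),
]

for model, batch, tsu, reported_ts in rows:
    computed_ts = aggregate_throughput(tsu, batch)
    # The table rounds t/s to one decimal place, so allow a small tolerance.
    assert abs(computed_ts - reported_ts) < 0.05, model
    print(f"{model}: t/s = {computed_ts:.1f}, "
          f"inter-token latency = {inter_token_latency_ms(tsu):.1f} ms")
```

For the DP rows, the batch column already includes the data-parallel factor (for example, 256 = 32 per device * 8 devices for Falcon 7B on QuietBox), so no extra multiplication by DP is needed.
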
Speech-to-Text
| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s |
|---|---|---|---|---|---|---|
| Whisper distil-large-v3 | 1 | n150 | 244 | 54.7 | 45 | 54.7 |
CNNs
| Model | Batch | Hardware | fps | Target fps |
|---|---|---|---|---|
| ResNet-50 (224x224) | 16 | n150 | 4700 | 7000 |
| ResNet-50 (DP=2) | 32 | n300 | 9200 | 14000 |
| ResNet-50 (DP=8) | 128 | QuietBox | 35800 | 56000 |
| ResNet-50 (DP=32) | 512 | Galaxy | 96800 | 224000 |
| ViT (224x224) | 8 | n150 | 1100 | 1600 |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.117 | 0.3 |
| YOLOv4 (320x320) | 1 | n150 | 120 | 300 |
| YOLOv4 (640x640) | 1 | n150 | 50 | 100 |
| SegFormer Semantic Seg (512x512) | 1 | n150 | 90 | 300 |
| Stable Diffusion 3.5 medium | 1 | n150 | 0.06 | 0.3 |
NLPs
| Model | Batch | Hardware | sen/sec | Target sen/sec |
|---|---|---|---|---|
| BERT-Large | 8 | n150 | 270 | 400 |