Supported Models

Tenstorrent hardware runs a broad range of models, including LLMs, speech-to-text, CNN, and NLP models. The tables below list the officially tested and supported models by category, with measured benchmark results for each.

LLMs

Model	Batch	Hardware	ttft (ms)	t/s/u	Target t/s/u	t/s
QwQ 32B (TP=8)	32	QuietBox	133	25.2	30	806.4
DeepSeek R1 Distill Llama 3.3 70B	32	QuietBox	159	15.2	20	486.4
Llama 3.1 70B (TP=32)	32	Galaxy		45.1	80	1443.2
Llama 3.1 70B (TP=8)	32	QuietBox	159	15.2	20	486.4
Llama 3.2 11B Vision (TP=2)	16	n300	2550	15.8	17	252.8
Qwen 2.5 7B (TP=2)	32	n300	126	32.5	38	1040.0
Qwen 2.5 72B (TP=8)	32	QuietBox	333	14.5	20	464.0
Falcon 7B	32	n150	70	18.3	26	585.6
Falcon 7B (DP=8)	256	QuietBox	88	15.5	26	3968.0
Falcon 7B (DP=32)	1024	Galaxy	223	4.8	26	4915.2
Falcon 40B (TP=8)	32	QuietBox		11.9	36	380.8
Llama 3.1 8B	32	p100	87*	26.5*		848.0*
Llama 3.1 8B	32	p150	69*	29.1*		931.2*
Llama 3.1 8B (DP=2)	64	2 x p150	64*	18.6*		1190.4*
Llama 3.1 8B	32	n150	104	24.6	23	787.2
Llama 3.2 1B	32	n150	23	67.6	160	2163.2
Llama 3.2 3B	32	n150	53	43.5	60	1392.0
Mamba 2.8B	32	n150	37	13.7	41	438.4
Mistral 7B	32	n150	240	25.84	23	826.9
Mixtral 8x7B (TP=8)	32	QuietBox	207	16.6	33	531.2

Last Update: May 5, 2025

Note

ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
TP = tensor parallel, DP = data parallel; these define the parallelization factors across multiple devices.
The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
The reported t/s/u is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
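The relationships in the notes above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of any Tenstorrent tooling: the function names `aggregate_tps` and `inter_token_latency_ms` are hypothetical, and the sample rows are copied from the LLM table.

```python
def aggregate_tps(tps_per_user: float, batch: int) -> float:
    """Aggregate throughput: t/s = t/s/u * batch."""
    return tps_per_user * batch


def inter_token_latency_ms(tps_per_user: float) -> float:
    """Invert t/s/u = 1 / inter-token latency; result in milliseconds."""
    return 1000.0 / tps_per_user


# Sample rows copied from the LLM table: (model and hardware, batch, t/s/u).
rows = [
    ("Llama 3.1 8B on n150", 32, 24.6),
    ("Falcon 7B (DP=8) on QuietBox", 256, 15.5),
]

for name, batch, tps_per_user in rows:
    total = aggregate_tps(tps_per_user, batch)
    latency = inter_token_latency_ms(tps_per_user)
    print(f"{name}: {total:.1f} t/s, {latency:.1f} ms per token per user")
```

The aggregate values computed this way should match the t/s column of the corresponding table rows. The per-token latency is derived from the table rather than reported in it.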

Speech-to-Text

Model	Batch	Hardware	ttft (ms)	t/s/u	Target t/s/u	t/s
Whisper distil-large-v3	1	n150	244	54.7	45	54.7

CNNs

Model	Batch	Hardware	fps	Target fps
ResNet-50 (224x224)	16	n150	4700	7000
ResNet-50 (DP=2)	32	n300	9200	14000
ResNet-50 (DP=8)	128	QuietBox	35800	56000
ResNet-50 (DP=32)	512	Galaxy	96800	224000
ViT (224x224)	8	n150	1100	1600
Stable Diffusion 1.4 (512x512)	1	n150	0.117	0.3
YOLOv4 (320x320)	1	n150	120	300
YOLOv4 (640x640)	1	n150	50	100
SegFormer Semantic Seg (512x512)	1	n150	90	300
Stable Diffusion 3.5 medium	1	n150	0.06	0.3

NLPs

Model	Batch	Hardware	sen/sec	Target sen/sec
BERT-Large	8	n150	270	400
