
Tenstorrent NPU

Tenstorrent's Neural Processing Units (NPUs) are designed for high-performance deep learning inference, including large language models (LLMs), computer vision, and a variety of other advanced AI workloads.


Tenstorrent NPUs are built on the scalable Tensix™ architecture.
Recent models, such as the Blackhole p100a and p150a, improve on the previous-generation Wormhole boards: each Blackhole chip has roughly twice as many Tensix cores as a Wormhole ASIC (120 or 140 versus 64), runs at a higher AI clock (1.35 GHz versus 1.0 GHz), and delivers higher peak TeraFLOPS than the dual-chip Wormhole N300S board, although that board's combined memory bandwidth (576 GB/sec) is still higher.
The hardware supports multi-chip configurations, allowing large-scale AI workloads to be spread across several chips.
Its system-level design accommodates multi-user environments and supports partitioning for greater scalability.
Tenstorrent NPUs can be deployed both in on-premises infrastructure and on selected cloud platforms.
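Because the Wormhole N300S carries two ASICs while each Blackhole board carries one, board-level figures can hide the per-chip generational difference. The sketch below normalizes a few figures from the Hardware Specification table to a per-chip basis. The numbers are copied from that table; the `Board` class and the helper functions are hypothetical illustrations, not part of any Tenstorrent API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    name: str
    chips: int           # number of ASICs on the board
    tensix_cores: int    # total Tensix cores on the board
    clock_ghz: float     # AI clock in GHz
    fp16_tflops: float   # peak FP16 TeraFLOPS for the whole board


# Figures copied from the Hardware Specification table.
BOARDS = [
    Board("Wormhole N300S", chips=2, tensix_cores=128, clock_ghz=1.0, fp16_tflops=131),
    Board("Blackhole P100A", chips=1, tensix_cores=120, clock_ghz=1.35, fp16_tflops=166),
    Board("Blackhole P150A", chips=1, tensix_cores=140, clock_ghz=1.35, fp16_tflops=194),
]


def cores_per_chip(board: Board) -> float:
    """Tensix cores on a single ASIC, assuming cores are split evenly across chips."""
    return board.tensix_cores / board.chips


def fp16_tflops_per_chip(board: Board) -> float:
    """Peak FP16 TeraFLOPS per ASIC, assuming an even split across chips."""
    return board.fp16_tflops / board.chips


if __name__ == "__main__":
    baseline = BOARDS[0]  # Wormhole N300S as the reference point
    for b in BOARDS:
        ratio = cores_per_chip(b) / cores_per_chip(baseline)
        print(f"{b.name}: {cores_per_chip(b):.0f} cores/chip "
              f"({ratio:.2f}x Wormhole), {b.clock_ghz} GHz, "
              f"{fp16_tflops_per_chip(b):.1f} FP16 TFLOPS/chip")
```

The even-split assumption matches the table's "64 per ASIC" note for core count; for TeraFLOPS it is an approximation, since the table only gives board totals.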


Hardware Specification

| Feature | Wormhole N300S (Dual-Chip) | Blackhole P100A (Single-Chip) | Blackhole P150A (Single-Chip) |
|---|---|---|---|
| Tensix Cores | 128 (64 per ASIC) | 120 | 140 |
| AI Clock | 1.0 GHz | 1.35 GHz | 1.35 GHz |
| "Big RISC-V" Cores | N/A | 16 | 16 |
| SRAM | 192 MB | 180 MB | 210 MB |
| Memory | 24 GB GDDR6 | 28 GB GDDR6 | 32 GB GDDR6 |
| Memory Speed | 12 GT/sec | 16 GT/sec | 16 GT/sec |
| Memory Bandwidth | 576 GB/sec | 448 GB/sec | 512 GB/sec |
| TeraFLOPS (FP8) | 466 | 664 | 774 |
| TeraFLOPS (FP16) | 131 | 166 | 194 |
| TeraFLOPS (BLOCKFP8) | 262 | 332 | 387 |
| Total Board Power (TBP) | 300 W | 300 W | 300 W |
| System Interface | PCI Express 4.0 x16 | PCI Express 5.0 x16 | PCI Express 5.0 x16 |
| Cooling | Passive | Active | Active |
| Connectivity | 2x Warp 100, 2x QSFP-DD 200G* | N/A | 4x QSFP-DD 800G (Passive)* |
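Two useful quantities are not listed in the table but follow from it by simple arithmetic. The implied memory interface width comes from bandwidth (GB/sec) = transfer rate (GT/sec) × bus width (bits) ÷ 8, assuming the usual GDDR6 convention that the transfer rate is a per-pin data rate. The roofline "ridge point" (peak FLOP/s divided by peak memory bytes/s) estimates how many operations per byte a kernel must perform before it stops being memory-bound. This is a sketch on peak figures only; real kernels rarely reach peak, so treat the results as rough upper-bound estimates. The `SPECS` dictionary and function names are hypothetical.

```python
# Peak figures copied from the Hardware Specification table.
# name: (memory speed in GT/sec, bandwidth in GB/sec, FP16 TeraFLOPS, FP8 TeraFLOPS)
SPECS = {
    "Wormhole N300S": (12, 576, 131, 466),
    "Blackhole P100A": (16, 448, 166, 664),
    "Blackhole P150A": (16, 512, 194, 774),
}


def implied_bus_width_bits(transfer_rate_gts: float, bandwidth_gbs: float) -> float:
    """Bus width implied by: bandwidth (GB/s) = rate (GT/s) * width (bits) / 8."""
    return bandwidth_gbs * 8 / transfer_rate_gts


def ridge_point_flops_per_byte(tflops: float, bandwidth_gbs: float) -> float:
    """Roofline ridge point: peak FLOP/s divided by peak memory bytes/s."""
    return (tflops * 1e12) / (bandwidth_gbs * 1e9)


if __name__ == "__main__":
    for name, (gts, gbs, fp16, fp8) in SPECS.items():
        print(f"{name}: ~{implied_bus_width_bits(gts, gbs):.0f}-bit bus, "
              f"ridge point {ridge_point_flops_per_byte(fp16, gbs):.0f} FLOP/B (FP16), "
              f"{ridge_point_flops_per_byte(fp8, gbs):.0f} FLOP/B (FP8)")
```

For the dual-chip N300S, both results are board-level totals across its two ASICs. A higher ridge point on the Blackhole boards means that more of a workload's operations must reuse data already held in on-chip SRAM to keep the Tensix cores busy.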