Tenstorrent NPU
Tenstorrent's Neural Processing Units (NPUs) are designed for high-performance deep learning inference, including large language models (LLMs), computer vision, and a variety of other advanced AI workloads.
Tenstorrent NPUs are built on the scalable Tensix™ architecture.
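Recent models, such as the Blackhole P100A and P150A, improve on the previous-generation Wormhole boards in per-chip Tensix core count, memory bandwidth, and clock speed. Each single-chip Blackhole board has more Tensix cores than either ASIC on the dual-chip Wormhole N300S and runs at 1.35 GHz instead of 1.0 GHz. Counted per board, however, the N300S still lists more Tensix cores than the P100A and more memory bandwidth than either Blackhole board.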
The hardware is designed to support multi-chip configurations, allowing large-scale AI workloads to run flexibly across multiple chips.
Its system-level design accommodates multi-user environments and supports partitioning for scalability.
Tenstorrent NPUs can be deployed both in on-premises infrastructure and on selected cloud platforms.
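Whether Blackhole comes out ahead depends on how the figures are counted, because the Wormhole N300S carries two ASICs while the P100A and P150A each carry one. The sketch below is a minimal illustration, not a Tenstorrent tool. It copies the board-level numbers from the Hardware Specification table and divides the Wormhole figures by its chip count. The even split of memory bandwidth and FP8 throughput across the two Wormhole ASICs is an assumption, since the table states the per-ASIC split only for Tensix cores. The `Board` class and `per_chip_table` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """Board-level figures copied from the Hardware Specification table."""
    name: str
    chips: int          # ASICs on the board
    tensix_cores: int   # Tensix cores per board
    clock_ghz: float    # AI clock
    mem_bw_gbs: float   # GDDR6 bandwidth per board, GB/s
    fp8_tflops: float   # FP8 TeraFLOPS per board

    def per_chip(self, value: float) -> float:
        # Assumption: board totals are split evenly across the ASICs.
        return value / self.chips


BOARDS = [
    Board("Wormhole N300S", 2, 128, 1.0, 576, 466),
    Board("Blackhole P100A", 1, 120, 1.35, 448, 664),
    Board("Blackhole P150A", 1, 140, 1.35, 512, 774),
]


def per_chip_table(boards):
    """Map each board name to its per-chip cores, bandwidth, clock, and FP8 throughput."""
    return {
        b.name: {
            "tensix_cores": b.per_chip(b.tensix_cores),
            "mem_bw_gbs": b.per_chip(b.mem_bw_gbs),
            "clock_ghz": b.clock_ghz,  # the clock is already a per-chip figure
            "fp8_tflops": b.per_chip(b.fp8_tflops),
        }
        for b in boards
    }


if __name__ == "__main__":
    for name, row in per_chip_table(BOARDS).items():
        print(
            f"{name}: {row['tensix_cores']:.0f} cores/chip, "
            f"{row['mem_bw_gbs']:.0f} GB/s per chip, "
            f"{row['clock_ghz']} GHz, {row['fp8_tflops']:.0f} FP8 TFLOPS per chip"
        )
```

Per chip, both Blackhole boards exceed a single Wormhole ASIC in Tensix cores (120 and 140 vs. 64), memory bandwidth (448 and 512 vs. an assumed 288 GB/s), and clock (1.35 vs. 1.0 GHz). Per board, the N300S keeps the higher memory bandwidth.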
Further Resources
- Tensix™ Core Architecture Overview
- Wormhole Product Page
- Blackhole Product Page
- Tenstorrent GitHub Repositories
- Getting Started with Tenstorrent SDKs
Hardware Specification
| Feature | Wormhole N300S (Dual-Chip) | Blackhole P100A (Single-Chip) | Blackhole P150A (Single-Chip) |
|---|---|---|---|
| Tensix Cores | 128 (64 per ASIC) | 120 | 140 |
| AI Clock | 1.0 GHz | 1.35 GHz | 1.35 GHz |
| "Big RISC-V" Cores | N/A | 16 | 16 |
| SRAM | 192 MB | 180 MB | 210 MB |
| Memory | 24 GB GDDR6 | 28 GB GDDR6 | 32 GB GDDR6 |
| Memory Speed | 12 GT/sec | 16 GT/sec | 16 GT/sec |
| Memory Bandwidth | 576 GB/sec | 448 GB/sec | 512 GB/sec |
| TeraFLOPS (FP8) | 466 | 664 | 774 |
| TeraFLOPS (FP16) | 131 | 166 | 194 |
| TeraFLOPS (BLOCKFP8) | 262 | 332 | 387 |
| Total Board Power (TBP) | 300 W | 300 W | 300 W |
| System Interface | PCI Express 4.0 x16 | PCI Express 5.0 x16 | PCI Express 5.0 x16 |
| Cooling | Passive | Active | Active |
| Connectivity | 2x Warp 100, 2x QSFP-DD 200G* | N/A | 4x QSFP-DD 800G (Passive)* |
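The Memory Speed and Memory Bandwidth rows are linked by the usual peak-bandwidth relation for DRAM interfaces: bandwidth (GB/s) = transfer rate (GT/s) × bus width (bits) / 8. Solving for bus width gives the memory interface width each board implies. This is a minimal sketch that relies on that formula. The widths it computes are inferred from the table, not quoted from a Tenstorrent datasheet, and the helper names are hypothetical.

```python
def implied_bus_width_bits(bandwidth_gbs: float, rate_gts: float) -> float:
    """Solve bandwidth = rate * width / 8 for the bus width in bits."""
    return bandwidth_gbs * 8 / rate_gts


def peak_bandwidth_gbs(rate_gts: float, bus_width_bits: float) -> float:
    """Peak bandwidth in GB/s from transfer rate (GT/s) and bus width (bits)."""
    return rate_gts * bus_width_bits / 8


# (Memory Bandwidth in GB/s, Memory Speed in GT/s) from the table above.
SPECS = {
    "Wormhole N300S": (576, 12),
    "Blackhole P100A": (448, 16),
    "Blackhole P150A": (512, 16),
}

if __name__ == "__main__":
    for name, (bw, rate) in SPECS.items():
        width = implied_bus_width_bits(bw, rate)
        # Round trip: the inferred width reproduces the listed bandwidth.
        assert peak_bandwidth_gbs(rate, width) == bw
        print(f"{name}: implied {width:.0f}-bit GDDR6 interface")
```

Under this assumption the P100A and P150A imply 224-bit and 256-bit interfaces. The 384-bit figure for the dual-chip N300S would be the combined width across both ASICs.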