TT-Forge Overview

TT-Forge is Tenstorrent’s MLIR-based compiler, designed to bridge modern machine learning frameworks with Tenstorrent’s NPU hardware. It enables models from PyTorch, TensorFlow, JAX, and ONNX to be compiled into optimized binaries that run natively on Tenstorrent accelerators. As the top layer in the Tenstorrent software stack, it provides a unified path from high-level frameworks to hardware-level execution.

To support this, TT-Forge is composed of several subcomponents—each responsible for a specific part of the compilation pipeline. These modules work together to ingest models, lower them into intermediate representations, and generate deployable binaries for NPU execution:

  • TT-Torch: Converts PyTorch models into StableHLO format using Torch-MLIR.
  • TT-XLA: Connects JAX models via OpenXLA's PJRT interface.
  • TT-Forge-fe: Accepts multiple model formats and optimizes model graphs (built on Apache TVM).
  • TT-MLIR: Core compiler backend that lowers operations into Tenstorrent-specific instructions.
  • TT-TVM: Customized TVM integration for broader framework support.
Note: Together, these components ensure compatibility and performance across frameworks, compiling models into deployable binaries for NPU execution.
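The flow described above — ingest a model, lower it through intermediate representations, and emit a deployable binary — can be illustrated with a toy sketch in plain Python. This is not the real TT-Forge API; every function and field name below is a hypothetical stand-in used only to show how the stages compose:

```python
# Toy model of a frontend -> lowering -> codegen pipeline.
# Each stage is a pure function that moves the model one
# representation closer to the hardware. All names are illustrative
# stand-ins, not actual TT-Forge interfaces.

def ingest(model_source: str, framework: str) -> dict:
    """Frontend stage (cf. TT-Torch / TT-XLA / TT-Forge-fe):
    normalize a framework model into a common IR."""
    return {"ir": "stablehlo", "source": model_source, "framework": framework}

def lower(graph: dict) -> dict:
    """Core compiler stage (cf. TT-MLIR): lower the framework-neutral
    IR into device-specific operations."""
    return {**graph, "ir": "device-ir", "ops_lowered": True}

def codegen(graph: dict) -> dict:
    """Backend stage: emit a deployable artifact for the accelerator."""
    return {**graph, "artifact": "npu-binary"}

def compile_model(model_source: str, framework: str) -> dict:
    # The stages compose in order: ingest -> lower -> codegen.
    return codegen(lower(ingest(model_source, framework)))

result = compile_model("resnet50", "pytorch")
print(result["ir"], result["artifact"])  # device-ir npu-binary
```

The point of the sketch is the shape of the pipeline: each subcomponent consumes the output of the previous one, so adding support for a new framework only requires a new ingest stage, not changes to lowering or code generation.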

Why is TT-Forge needed?

Tenstorrent NPUs have a distinct architecture and do not follow the CUDA-based GPU ecosystem. As a result, AI models developed for platforms like NVIDIA GPUs cannot run on Tenstorrent hardware without adaptation. TT-Forge addresses this gap by acting as a compiler that automatically transforms and optimizes models—originally written in PyTorch, TensorFlow, JAX, or ONNX—into a graph-based intermediate representation tailored for Tenstorrent accelerators. This eliminates the need for manual rewriting or low-level tuning, and enables seamless, high-performance deployment of existing AI models on Tenstorrent devices.


Getting Started

To get started with TT-Forge: