What is an NPU?

NPU Accelerators

NPUs – neural processing units – are a type of hardware block used in semiconductor chip design to accelerate machine learning / artificial intelligence workloads. NPUs are generally referred to as accelerators or offload engines and are paired with legacy programmable processors such as CPUs, GPUs and DSPs.

The NPU concept was first conceived circa 2015, as silicon designers realized that emerging AI/ML algorithms would not run at sufficiently high speeds on legacy CPU, DSP or GPU processors – hence the idea of "offloading" portions of the machine learning graph onto an accelerator.

The concept is simple: identify the most performance-intensive elements of the AI/ML workload – chiefly the matrix multiplications common in convolutional neural networks – carve them out of the graph to run on a specialized hardware engine, and leave the remainder of the algorithm to run on the original CPU or DSP.
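
To make that partitioning concrete, here is a minimal C++ sketch of the idea. It is illustrative only – GraphNode, runs_on_npu and partition_graph are hypothetical names, not any vendor's toolchain API – but it shows how a compiler pass might split a model graph into offloaded and host-resident operators.

```cpp
#include <string>
#include <vector>

// Hypothetical representation of one operator in an AI/ML model graph.
struct GraphNode {
    std::string op;  // e.g. "Conv2D", "MatMul", "Softmax"
};

// Illustrative subset of operators a fixed-function NPU might support.
bool runs_on_npu(const GraphNode& n) {
    return n.op == "Conv2D" || n.op == "MatMul";
}

// Split the graph: matrix-heavy operators go to the NPU,
// everything else stays on the host CPU/DSP.
struct Partition {
    std::vector<GraphNode> npu;
    std::vector<GraphNode> host;
};

Partition partition_graph(const std::vector<GraphNode>& graph) {
    Partition p;
    for (const GraphNode& n : graph) {
        (runs_on_npu(n) ? p.npu : p.host).push_back(n);
    }
    return p;
}
```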

Virtually all commercially licensable NPUs follow this approach and are often sold alongside the legacy processor offered by the same IP vendor. Under this model, the legacy CPU or DSP host and the NPU work in tandem to run the AI/ML workload. The NPU is a fixed-function block with a set "vocabulary" of AI functions – or Operators – that it can execute, and the CPU/DSP carries the rest of the workload. If newer algorithms are invented with different operators, more of that future workload defaults to running on the legacy CPU/DSP, which can sharply degrade the overall throughput of the device.
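
The throughput cost of that fallback can be estimated with simple Amdahl's-law arithmetic. The sketch below uses purely illustrative numbers – a hypothetical 50x NPU speedup on supported operators – rather than measurements of any real device, to show how shrinking operator coverage erodes the overall speedup.

```cpp
#include <cstdio>

// Amdahl's-law estimate of overall speedup when only a fraction of
// the workload runs on the accelerator. All numbers here are
// illustrative assumptions, not measurements of any real device.
double effective_speedup(double offloaded_fraction, double npu_speedup) {
    // Total time = host-resident portion + accelerated portion.
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / npu_speedup);
}

int main() {
    // 95% of cycles in NPU-supported operators, each running 50x faster:
    std::printf("%.1fx\n", effective_speedup(0.95, 50.0));  // ~14.5x
    // A newer model where only 70% of cycles are NPU-supported:
    std::printf("%.1fx\n", effective_speedup(0.70, 50.0));  // ~3.2x
    return 0;
}
```

Even a modest drop in operator coverage gives back most of the accelerator's benefit, which is the core weakness of the fixed-function approach.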

GPNPU Processors

GPNPUs – general-purpose neural processing units – are a type of fully C++-programmable processor IP used in semiconductor chip design. GPNPUs run entire AI/ML workloads self-contained, without the need for a companion CPU, DSP or GPU.

GPNPUs blend the high matrix performance of an NPU with the flexibility and programmability of traditional processors. GPNPUs – such as Quadric's Chimera GPNPU – are future-proof because new operators can be written as C++ kernels running at high speed on the GPNPU itself. And because GPNPUs are code-driven, a compilation toolchain automates the targeting of new algorithms, with hundreds of different AI models compiled automatically to the platform.
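
As a hedged illustration of what writing a new operator in C++ might look like, the sketch below implements a hard-swish activation as portable scalar code. It deliberately uses no vendor API – a real Chimera kernel would be built with Quadric's SDK types and parallel constructs – and the function name and signature are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>

// Scalar C++ sketch of a kernel for a newer operator (hard-swish,
// used in models such as MobileNetV3). On a GPNPU, an operator like
// this can be added in software rather than waiting for a hardware
// revision; the name and signature here are illustrative only.
void hardswish_kernel(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float x = in[i];
        // hardswish(x) = x * clamp(x + 3, 0, 6) / 6
        out[i] = x * std::clamp(x + 3.0f, 0.0f, 6.0f) / 6.0f;
    }
}
```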
