Chimera GPNPU Supports Large Language Models

Run the newest generative transformer models with a simple software port, not a silicon respin!

Porting the Llama2 LLM in Record Time

In July 2023, Meta announced Llama2, a large language model (LLM) designed to run on-device rather than in the cloud. This breakthrough model prompted numerous SoC and IP core vendors to announce intent to support Llama2, but only far in the future: early announcements promised Llama2 support sometime in 2024, or a new IP core available for early-2024 tape-out and thus 2025 silicon shipment.

Chimera is a fully programmable NPU, programmable in C++. A small team of four Quadric engineers ported and optimized Llama2 in just under five weeks! If YOUR chip had a Chimera GPNPU, you could have raced to market a half-year ahead of your competition!
Porting a new ML workload to Chimera cores is fast because it's done by compiling an ONNX graph into C++ using our state-of-the-art Chimera Graph Compiler. Yet Chimera processors also deliver ML inference efficiency dramatically higher than what CPUs or GPUs provide. Chimera GPNPUs uniquely combine the easy programmability of a processor with the ML efficiency of an NPU "accelerator".
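
As a rough sketch of what that flow can look like from the application side (the `chimera-gc` command, the `llama2_graph.h` header, and the `llama2::CompiledGraph` class below are illustrative assumptions, not the actual Chimera SDK API):

    // Hypothetical sketch only: the real Chimera SDK headers, generated class
    // names, and compiler invocation may differ from what is shown here.
    //
    //   $ chimera-gc llama2.onnx -o llama2_graph     (assumed compiler CLI)
    //
    #include <cstdint>
    #include <vector>
    #include "llama2_graph.h"  // hypothetical C++ emitted by the graph compiler

    int main() {
        // Token IDs for a prompt; a real port would produce these with a tokenizer.
        std::vector<int32_t> prompt = {1, 15043, 29892};

        // The compiled graph is ordinary C++: construct it, bind inputs, run it.
        llama2::CompiledGraph graph;                       // hypothetical class
        graph.set_input("tokens", prompt.data(), prompt.size());
        graph.run();                                       // executes on the GPNPU

        // Read back logits for next-token prediction; sampling would follow here.
        const float* logits = graph.output("logits");
        (void)logits;
        return 0;
    }

The point of the sketch: the compiled graph is ordinary C++ that a software team can step through, profile, and modify, with no RTL changes and no silicon respin.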

Experience Chimera Yourself

Sign in to your existing account, or sign up for a new Quadric DevStudio account today, and see it for yourself.

