News Coverage

Silicon 100: Startups Worth Watching in 2024

(July 8, 2024) - EETimes - Founded in 2016, Quadric is developing machine-learning software and platforms for autonomous vehicles and robots. Its Chimera GPNPU is a licensable processor IP core that scales from 1 to 16 TOPS in a single core and intermixes scalar, vector and matrix code. In a multicore configuration, Chimera scales to hundreds of TOPS.

KANs Explode!

(June 13, 2024) Semiconductor Engineering - In late April 2024, researchers from MIT and Caltech published a paper proposing a fundamentally new approach to machine-learning networks: the Kolmogorov-Arnold Network, or KAN. In the six weeks since its publication, the AI research field has been ablaze with excitement and speculation that KANs might be a breakthrough that dramatically alters the trajectory of AI models for the better: dramatically smaller models delivering similar accuracy at orders-of-magnitude lower power consumption, in both training and inference.
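The core idea behind KANs can be sketched in a few lines: where a conventional network applies fixed activation functions at nodes, a KAN places a learnable one-dimensional function on every edge and simply sums at each node. The sketch below is illustrative only, not the paper's implementation; it uses piecewise-linear interpolation in place of the B-splines the paper uses, and all names (edge_fn, kan_layer, GRID) are invented for this example.

```python
import numpy as np

# Illustrative sketch: a KAN layer with a learnable 1-D function on
# each edge. Each edge function is a piecewise-linear interpolant over
# a fixed grid; its values at the grid points are the trainable
# parameters (real KANs parameterize the edges with B-splines).

GRID = np.linspace(-1.0, 1.0, 11)  # knot locations, shared by all edges

def edge_fn(x, coeffs):
    """Evaluate one learnable edge function phi(x) by linear interpolation."""
    return np.interp(x, GRID, coeffs)

def kan_layer(x, coeff_matrix):
    """One KAN layer: output_j = sum_i phi_{j,i}(x_i).

    x: (n_in,) input vector
    coeff_matrix: (n_out, n_in, len(GRID)) learnable parameters
    """
    n_out = coeff_matrix.shape[0]
    return np.array([
        sum(edge_fn(x[i], coeff_matrix[j, i]) for i in range(len(x)))
        for j in range(n_out)
    ])

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=3)               # 3 inputs
theta = rng.normal(size=(2, 3, len(GRID)))   # 2 outputs, 3 edges each
print(kan_layer(x, theta).shape)             # (2,)
```

Because all the learnable capacity lives in these small univariate functions, a KAN can in principle match a much larger multilayer perceptron with far fewer parameters, which is the source of the efficiency claims described above.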

The Fallacy of Operator Fallback and the Future of Machine Learning Accelerators

(May 30, 2024) SemiWiki - Managing the interplay between NPU, DSP, and CPU requires complex data transfers and synchronization, leading to increased system complexity and power consumption. Developers must contend with different programming environments and extensive porting efforts, making debugging across multiple cores even more challenging and reducing productivity.

Will Domain-Specific ICs Become Ubiquitous?

(May 16, 2023) Semiconductor Engineering - Even low-cost SoCs for mobile phones today have CPUs for running Android, complex GPUs to paint the display screen, audio DSPs for offloading audio playback in a low-power mode, video DSPs paired with NPUs in the camera subsystem to improve image capture (stabilization, filters, enhancement), baseband DSPs — often with attached NPUs — for high speed communications channel processing in the Wi-Fi and 5G subsystems, sensor hub fusion DSPs, and even power-management processors that maximize battery life.

Fallback Fails Spectacularly

(May 16, 2024) Semiconductor Engineering - Our analysis of ConvNext on an NPU+DSP architecture suggests a throughput of less than 1 inference per second. Note that these numbers for the fallback solution assume perfect 100% utilization of all the available ALUs in an extremely wide 1024-bit VLIW DSP. Reality would undoubtedly fall below the speed-of-light 100% mark, and the FPS would suffer even more. In short, fallback is unusable.
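The scaling argument above is simple arithmetic: the throughput at perfect utilization is an upper bound, and real utilization scales it down linearly. A back-of-envelope sketch (the ideal_fps value is an assumption consistent with the "less than 1 inference per second" figure, not a number from the article):

```python
# Back-of-envelope: fallback FPS at realistic ALU utilization levels.
# ideal_fps is an assumed value under the article's "<1 inference/sec
# at 100% utilization" bound; the utilization figures are illustrative.
ideal_fps = 0.9

for util in (1.00, 0.50, 0.25, 0.10):
    print(f"utilization {util:4.0%}: {ideal_fps * util:.3f} FPS")
```

Even generous utilization assumptions leave fallback throughput at a small fraction of one frame per second, which is the basis for calling it unusable.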

Dealing With AI/ML Uncertainty

(April 25, 2024) Semiconductor Engineering - When it comes to the question of how to test and correct the model, the first thing most companies need to do is establish realistic goals: what error rate is acceptable, and which severities of error must be eliminated completely. The model can then be guardbanded using those criteria.
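One common way to guardband against an error budget is to pick a confidence threshold on validation data such that the predictions the model is allowed to make meet the target error rate, deferring everything below the threshold. This is a hypothetical sketch of that idea, not a method from the article; the data and the guardband_threshold helper are invented for illustration.

```python
import numpy as np

# Hypothetical guardbanding sketch: choose the smallest confidence
# threshold whose accepted predictions meet a target error rate;
# predictions below the threshold are deferred (the guardband).

rng = np.random.default_rng(1)
conf = rng.uniform(0.5, 1.0, 1000)        # synthetic model confidences
correct = rng.uniform(size=1000) < conf   # higher confidence -> more often right

def guardband_threshold(conf, correct, max_error_rate):
    """Smallest threshold whose accepted subset meets the error budget."""
    for t in np.sort(conf):
        kept = conf >= t
        if 1.0 - correct[kept].mean() <= max_error_rate:
            return float(t)
    return float(conf.max())  # fall back to accepting only the most confident

t = guardband_threshold(conf, correct, max_error_rate=0.05)
kept = conf >= t
print(f"threshold={t:.2f}, coverage={kept.mean():.1%}, "
      f"error={1 - correct[kept].mean():.1%}")
```

The trade-off is explicit: a tighter error budget raises the threshold and lowers coverage, so the acceptable-error-rate decision described above directly sets how often the system must defer to a fallback path or a human.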

Embrace the New!

(March 14, 2024) Semiconductor Engineering - Perhaps those who are not Embracing The New are held back because they chose accelerators with limited support for new ML operators. If a team four years ago implemented a fixed-function accelerator in an SoC that cannot add new ML operators, then many newer networks – such as Transformers – cannot run on those fixed-function chips today. A new silicon respin – which takes 24 to 36 months – is needed.

Thanks for the Memories!

(February 15, 2024) Semiconductor Engineering - “I want to maximize the MAC count in my AI/ML accelerator block because the TOPS rating is what sells, but I need to cut back on memory to save cost,” said no successful chip designer, ever.

Choosing the Right Memory Configuration for AI/ML Accelerators

(February 15, 2024) AI2.news - Quadric’s Chimera GPNPU addresses the memory challenge with its intelligent approach. By analyzing data usage across ML graphs and leveraging advanced operator fusion techniques, Quadric’s technology eases memory bottlenecks.

As AI Takes Off, Chipmakers Pump Up Performance

(February 1, 2024) Electronic Design - The Chimera can provide strong ML inference performance while also running traditional C++ code. There’s no need to partition code between multiple kinds of processors. The GPNPU uses a single pipeline to handle matrix and vector operations and scalar (control) code.

© Copyright 2024 Quadric. All Rights Reserved.
