(September 26, 2023) SemiWiki - Assuming that Baby Llama is a good proxy for an edge-based LLM, Quadric made the following interesting points. First, they were able to port the 15-million-parameter network to their Chimera core in just 6 weeks. Second, this port required no hardware changes, only some tweaking of (ONNX) operations in C code to optimize for accuracy and performance. Third, they were able to reach 225 tokens/second/watt, using a 4 MB L2 memory, 16 GB/second DDR, a 5nm process, and a 1 GHz clock. Fourth, the whole process consumed 13 engineer-weeks.
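
To put the quoted figures in more familiar units, here is a minimal Python sketch of the arithmetic. It relies on the fact that a watt is one joule per second, so tokens/second/watt is the same as tokens per joule. Two things here are assumptions and not from the article: the 0.5 W power budget is purely hypothetical, and the staffing average assumes the 13 engineer-weeks were all spent within the 6-week port.

```python
# Figures quoted in the summary above.
TOKENS_PER_SECOND_PER_WATT = 225   # equivalent to tokens per joule
CALENDAR_WEEKS = 6
ENGINEER_WEEKS = 13


def energy_per_token_mj(tokens_per_joule: float) -> float:
    """Return millijoules per token; 1 W = 1 J/s, so tokens/s/W = tokens/J."""
    return 1000.0 / tokens_per_joule


def tokens_per_second(tokens_per_joule: float, power_watts: float) -> float:
    """Return throughput at a given power draw, assuming efficiency stays constant."""
    return tokens_per_joule * power_watts


def average_engineers(engineer_weeks: float, calendar_weeks: float) -> float:
    """Return average headcount, assuming all effort fell inside the calendar window."""
    return engineer_weeks / calendar_weeks


if __name__ == "__main__":
    # Hypothetical power budget, chosen only for illustration (not from the article).
    assumed_power_watts = 0.5

    print(f"Energy per token: {energy_per_token_mj(TOKENS_PER_SECOND_PER_WATT):.2f} mJ")
    print(f"Throughput at {assumed_power_watts} W: "
          f"{tokens_per_second(TOKENS_PER_SECOND_PER_WATT, assumed_power_watts):.1f} tokens/s")
    print(f"Average engineers: {average_engineers(ENGINEER_WEEKS, CALENDAR_WEEKS):.2f}")
```

Under these assumptions, 225 tokens/second/watt works out to roughly 4.44 mJ per token, and 13 engineer-weeks over 6 calendar weeks averages a little over two engineers.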