AI hardware maker Cerebras announced on Wednesday that its systems have beaten NVIDIA’s DGX B200, a system with eight Blackwell GPUs (Graphics Processing Units), on output token speed for Meta’s Llama 4 Maverick model.
Cerebras achieved an output speed of over 2,500 tokens per second, whereas NVIDIA’s system demonstrated 1,000 tokens per second.
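The gap between the two reported figures works out to a 2.5x speedup, which also translates directly into response latency. A quick sketch using only the numbers from the article:

```python
# Reported output-token speeds from the article (tokens/sec on Llama 4 Maverick).
cerebras_tps = 2_500   # Cerebras systems
blackwell_tps = 1_000  # NVIDIA DGX B200 (8 Blackwell GPUs)

speedup = cerebras_tps / blackwell_tps
print(f"Speedup: {speedup:.1f}x")  # → Speedup: 2.5x

# Time to generate a hypothetical 1,000-token response at each rate:
print(f"Cerebras: {1000 / cerebras_tps:.2f} s, Blackwell: {1000 / blackwell_tps:.2f} s")
```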
However, NVIDIA outperformed systems from Groq, AMD, Google, and other vendors. “Only Cerebras stands – and we smoked Blackwell,” said Cerebras in a post on X.
Cerebras just beat NVIDIA Blackwell
Last week: Blackwell hit 1,000 t/s on Llama 4.
Today: Cerebras hit 2,500 t/s on the same model, same benchmarks by @ArtificialAnlys
Blackwell smoked Groq, AMD, Google – everyone.
Only Cerebras stands – and we smoked Blackwell. pic.twitter.com/2Nd0W8ttOB
— Cerebras (@CerebrasSystems) May 28, 2025
Based in the United States, Cerebras builds hardware specifically designed for AI inference, the process of using a trained AI model to generate outputs. The company’s Wafer-Scale Engine (WSE) technology delivers faster inference (output token) speeds than traditional GPUs.
“We’ve tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said the company.
Last month, Meta announced a partnership with Cerebras to offer developers access to inference speeds up to 18 times faster than GPU-based solutions.
While GPUs are widely used for training AI models, which requires vast amounts of data and compute, dedicated inference solutions are now being developed at scale. Cerebras, Groq, and SambaNova are among the companies working on such hardware.
SambaNova offers the SN40L, a custom AI chip built on its Reconfigurable Dataflow Unit architecture. Manufactured on TSMC’s 5 nm process, the SN40L combines DRAM, HBM3, and SRAM on each chip.
Groq, on the other hand, offers AI inference processors called LPUs (language processing units). Instead of relying on external memory as GPUs do, LPUs keep all model parameters directly on-chip.
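The reason on-chip memory matters for single-stream generation can be made concrete with a rough, roofline-style estimate: each generated token must read every model weight once, so tokens/sec is bounded by memory bandwidth divided by model size. The sketch below is illustrative only; the model size and both bandwidth figures are assumptions, not specifications of any vendor’s chip.

```python
# Illustrative (not measured) upper bound on single-sequence decode speed.
# Each token reads all weights once, so: tokens/sec <= bandwidth / weight_bytes.
def max_decode_tps(weight_bytes: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Memory-bound ceiling on tokens/sec, ignoring compute and KV-cache traffic."""
    return mem_bandwidth_bytes_per_s / weight_bytes

# Hypothetical 70B-parameter model stored in 16-bit weights (140 GB).
weights = 70e9 * 2

hbm_bound = max_decode_tps(weights, 8e12)     # ~8 TB/s off-chip HBM (assumed figure)
sram_bound = max_decode_tps(weights, 200e12)  # ~200 TB/s aggregate on-chip SRAM (assumed)
print(f"HBM-bound: ~{hbm_bound:.0f} tok/s, on-chip-bound: ~{sram_bound:.0f} tok/s")
```

Under these assumed numbers, the on-chip ceiling is ~25x higher, which is the basic argument behind wafer-scale and LPU-style designs for fast inference.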