Google brings Multi-Token Prediction drafters to Gemma 4: 3x speedup without quality loss

One month after introducing Gemma 4, Google is releasing Multi-Token Prediction (MTP) drafters for the Gemma 4 model family. The drafters use a specialized speculative decoding architecture that lets large models such as Gemma 4 26B Mixture-of-Experts (MoE) and 31B Dense run inference up to three times faster, with no drop in output quality or reasoning accuracy. MTP works by decoupling token generation from verification: a lighter drafter model predicts multiple future tokens in parallel, and the heavy primary model performs the final verification of each predicted token. This puts otherwise idle compute to work, since the drafter can propose several tokens in the time the primary model alone would spend producing one. With this architectural update, developers can reduce latency dramatically for near real-time chat, voice communication applicat...
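The draft-then-verify loop described above can be illustrated with a toy sketch of generic greedy speculative decoding. This is not Google's MTP drafter architecture: `target_model` and `draft_model` are hypothetical stand-ins (simple deterministic rules over integer tokens), and the draft length `k` is an arbitrary choice. The sketch is meant to show only why the output matches what the large model would have produced on its own, while the large model is invoked fewer times.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large model: a deterministic toy rule.
    return (prefix[-1] * 3 + len(prefix)) % 7


def draft_model(prefix: List[int]) -> int:
    # Hypothetical cheap drafter: agrees with the target most of the time,
    # but is deliberately wrong whenever the prefix length is a multiple of 4.
    guess = (prefix[-1] * 3 + len(prefix)) % 7
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 3
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(tokens) < goal:
        # 1. The drafter proposes k tokens, each conditioned on its own drafts.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))

        # 2. The target verifies the drafts. A real system would score all
        #    positions in one batched forward pass; this toy loop counts that
        #    as a single verification pass.
        verify_passes += 1
        accepted: List[int] = []
        for tok in drafted:
            expected = target(tokens + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, discard later drafts.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft was accepted, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], verify_passes
```

Because every accepted token is checked against the target's own greedy choice, `speculative_decode(target_model, draft_model, prompt, n)` returns the same sequence as `greedy_decode(target_model, prompt, n)`. Any speedup depends on how often the drafter's guesses are accepted.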
