Dev.to tutorial Tutorials 54m ago

Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

by Ian L. Paterson

The setup: The starting line was 43 tokens per second decode on vanilla llama.cpp. The...

Metadata

Devto Id: 3694886
Reading Time Minutes: 18

Dev.to tutorial 42m ago

I Built a Voice AI Tutor in 200 Lines of Code (and Zero Backend)

Day 35 of TechFromZero. Every voice assistant you've ever used is the same three Lego bricks. Let's snap them together in a single afternoon — using only free, browser-native APIs.

Dev.to tutorial 44m ago

Building llama.cpp from source on a Dell Precision T5820 with an RTX 3090 Ti (after seven power cycles)

I pulled a Quadro M4000 out of a used Dell Precision T5820, dropped in an RTX 3090 Ti, and turned the...
