The setup: The starting line was 43 tokens per second decode on vanilla llama.cpp. The...
Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B
Day 35 of TechFromZero. Every voice assistant you've ever used is the same three Lego bricks. Let's snap them together in a single afternoon — using only free, browser-native APIs.
I pulled a Quadro M4000 out of a used Dell Precision T5820, dropped in an RTX 3090 Ti, and turned the...