Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

The setup The starting line was 43 tokens per second decode on vanilla llama.cpp. The...

Read Original

Related