Same week, small update: Run LLMs Locally

Multi-Token-Prediction (MTP) for Gemma-4-E4B and Gemma-4-26B from Unsloth. After 50% from QAT, this brings another 25-90% improvement in token generation speed.

The OpenCode config slide received a small update to reduce prompt sizes with "rtk" and "opencode-tool-search", reducing default prompt size by 60 percent.

Also added logging all prompts to the parameter list.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf

#ai #llm #llamacpp #localai #gemma4 #opencode #mtp #unsloth
