Fully Offline Local AI assistant is hungry for GPU memory

Running Qwen3.6 27B Q8_0 with a 256k context in reasoning mode takes around 50GB of GPU memory and delivers around 64 tokens/s for prompt+generation, which is quite good for a local model with that much context.

Originally published on My Tech Blog: https://minox.cosmichive.com/experiences-from-setting-up-fully-offline-local-only-ai-assisted-workstation/

#ai #aiagents #linux #llm
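As a rough sanity check on the ~50GB figure, the arithmetic below splits it into model weights and everything else. It is only a sketch under stated assumptions: the parameter count is taken as exactly 27e9, every tensor is assumed to be stored as Q8_0 (32 int8 weights plus one fp16 scale per block, i.e. 34 bytes per 32 weights), and GB means 10^9 bytes. Real GGUF files usually keep some tensors in other formats, so treat the numbers as approximate.

```python
# Back-of-the-envelope split of the ~50 GB figure quoted in the post.
# All constants below are assumptions made for illustration.
PARAMS = 27e9             # nominal parameter count ("27B"), assumed exact
Q8_0_BLOCK_WEIGHTS = 32   # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 34     # 32 int8 values + one fp16 scale per block
REPORTED_TOTAL_GB = 50    # GPU memory use reported in the post


def q8_0_weight_gb(params: float) -> float:
    """Approximate size of the weights in GB if every tensor were Q8_0."""
    bytes_per_weight = Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS  # 1.0625
    return params * bytes_per_weight / 1e9


weights_gb = q8_0_weight_gb(PARAMS)
remainder_gb = REPORTED_TOTAL_GB - weights_gb

print(f"weights: ~{weights_gb:.1f} GB")
print(f"left for KV cache and buffers: ~{remainder_gb:.1f} GB")
```

Under these assumptions the weights alone account for a bit more than half of the reported memory, and the rest would go to the 256k-context KV cache and runtime buffers.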
