Google's Gemma 4 12B is encoder-free. No vision encoder. No audio encoder. Just raw pixels → 48×48 patches → one linear projection → LLM backbone.

Traditional vision encoder: 550M params. Gemma's replacement: 35M. A format converter, not a thinking layer.

Google just proved the language backbone can handle vision and audio natively. This changes what's possible for local AI.

#EncoderFree #Gemma #LocalAI #MachineLearning
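
To make the pixels → patches → linear projection path concrete, here is a minimal sketch in plain Python. It is not Gemma's actual code. The 48×48 patch size comes from the post; the RGB channel count, the toy width `D_MODEL = 8`, and the helper names `patchify`, `project`, and `projection_params` are all assumptions made up for illustration.

```python
import random

PATCH = 48      # patch side length, as stated in the post
CHANNELS = 3    # assumption: RGB input


def patchify(image, patch=PATCH):
    """Split an H x W x C nested-list image into flattened patch vectors.

    Assumes H and W are exact multiples of `patch` (no padding or resizing).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            patches.append(flat)
    return patches


def project(patches, weight, bias):
    """Apply one linear layer; weight[j] holds the input weights of output unit j."""
    return [
        [sum(x * w for x, w in zip(vec, unit)) + b for unit, b in zip(weight, bias)]
        for vec in patches
    ]


def projection_params(d_model, patch=PATCH, channels=CHANNELS):
    """Parameter count (weights plus biases) of a single linear patch projection."""
    return patch * patch * channels * d_model + d_model


random.seed(0)
D_MODEL = 8  # hypothetical toy width, not Gemma's real hidden size
patch_dim = PATCH * PATCH * CHANNELS  # 6912 values per flattened patch

# A random 96 x 96 RGB "image" yields a 2 x 2 grid of 48 x 48 patches.
image = [
    [[random.random() for _ in range(CHANNELS)] for _ in range(96)]
    for _ in range(96)
]
weight = [[random.gauss(0, 0.02) for _ in range(patch_dim)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

tokens = project(patchify(image), weight, bias)
print(len(tokens), len(tokens[0]))  # 4 8
```

Each resulting token is just a reshaped and linearly mixed patch, which is the "format converter" idea: nothing here attends across patches or stacks layers. For a rough sense of scale, `projection_params(5120)` comes to about 35.4M under these assumptions, the same ballpark as the 35M quoted above, though the post does not state the real hidden width or channel count.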
