DiffusionGemma

Last May Google briefly released an experimental Gemini Diffusion model. I tried the preview at the time and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it.

That research has returned in the best possible way: as a new open-weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it.

NVIDIA are currently hosting the model for free on their NIM cloud API. I used that API to generate a pelican riding a bicycle, which took 4.4s (according to time uv run generate.py) to return 2,409 tokens, so at least 500 tokens/second.

Via Hacker News

Tags: google, ai, generative-ai, llms, nvidia, pelican-riding-a-bicycle, gemma, llm-release, llm-performance
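The "at least 500 tokens/second" figure comes from simple division: 2,409 tokens over 4.4 seconds is about 547 tokens/second. `time` measures the whole `uv run` process, including interpreter startup and the network round trip to the API, so the model's own generation rate is likely somewhat higher. Below is a minimal Python sketch of the calculation. The names `tokens_per_second`, `timed_throughput` and the `generate` callable are hypothetical, chosen for illustration. They are not taken from the author's generate.py.

```python
import time
from typing import Callable


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Average throughput: tokens returned divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def timed_throughput(generate: Callable[[], int]) -> float:
    """Time only the generation call, not process startup.

    `generate` is a hypothetical stand-in for whatever function sends the
    prompt to the API and returns the number of tokens it produced.
    """
    start = time.perf_counter()
    token_count = generate()
    elapsed = time.perf_counter() - start
    return tokens_per_second(token_count, elapsed)


# Figures from the post: 2,409 tokens in 4.4 seconds of whole-process wall clock.
print(f"{tokens_per_second(2409, 4.4):.1f} tokens/second")  # prints 547.5 tokens/second
```

`timed_throughput` wraps only the request in `time.perf_counter()`. This leaves out interpreter and dependency startup, which is one reason a per-request measurement would tend to report a higher rate than timing the whole `uv run` process.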