Papers with Code paper 1d ago

Dynamic Linear Attention

The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear a...