LLMs Corrupt Your Documents When You Delegate

Philippe Laban, Tobias Schnabel, Jennifer Neville (#Microsoft Research)

Large Language Models (#LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation that the LLM will faithfully execute the task without introducing errors into documents. We introduce DELEGATE-52 to study the readiness of AI systems in delegated workflows. DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains, such as coding, crystallography, and music notation. Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (#Gemini 3.1 Pro, #Claude 4.6 Opus, #GPT 5.4) #corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely.

https://doi.org/10.48550/arXiv.2604.15597 #agenti...