[AINews] Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack

We have covered the Age of Async Agents on the podcast.

There has been a wave of companies building their own background agents, from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.

And today it is time for Anthropic’s take on the situation with Claude Tag.

Because this product already exists in various forms, there was some criticism, but overall this is a VERY significant next iteration in both the Claude and Claude Code form factors:

- Claude: Web → Desktop → Slack (“third major redesign of LLM UIUX”)
- Claude Code: the Tag form now merges 65% of product PRs

As with all things Anthropic, the polish at launch is very good. Speaking as someone who has been watching the Async Agents space for a while, here is what you might not appreciate:

- Tag can tag in coworkers who own related code (video)
- Tag has git webhooks that can wait on blocking dependencies for very long periods (days), effectively achieving “stacked prompts” rather than “stacked diffs”
- Tag can summarize threads into docs with action items
- Tag in ambient behavior mode:
  - responds in channels without being tagged (aka reviewing each message to see if it needs a response)
  - follows up across channels (aka proactively syncing information from one channel to another)
  - watches for thresholds to trigger and then acts, attempting a fix if something broke or responding when an A/B test is successful

Overall a very interesting harbinger for the future of work.

AI News for 6/22/2026-6/23/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space.
You can opt in/out of email frequencies!

AI Twitter Recap

Anthropic launched Claude Tag, a Slack-native way to delegate work to Claude as if it were a teammate.

- Anthropic announced Claude Tag as “a new way for teams to work with Claude,” starting with Slack: Claude joins as a team member, with access to selected channels and chosen tools/data/codebases, and can be tagged into work threads asynchronously @claudeai
- Anthropic positioned the feature as a shift from one-user chat to teamwide, async delegation: “tag Claude in and delegate tasks to it while you focus on other work” @claudeai
- The Claude Code team said they have been using Claude Tag internally all year and that it now writes 65% of the product team’s code, including “most of what built Claude Tag itself” @ClaudeDevs
- Anthropic framed the internal usage distinction clearly: Claude Code remains the fastest mode for solo, synchronous work, while Claude Tag is “Claude Code made multiplayer, async, and proactive across your whole team” @ClaudeDevs
- Availability at launch: beta for Claude Enterprise and Team plans @ClaudeDevs
- Anthropic’s product lead Cat Wu called it “our first product that is natively multi-player and proactive” and repeated the 65% of product PRs internal metric @_catwu
- Anthropic shared a permissions/configuration guide for “agent permissions” for Claude Tag, indicating that deployment requires explicit setup and scope control rather than blanket workspace access @_catwu
- Cat Wu also said there are “100s of ways” to customize Claude Tag and shared 6 common flows seen among internal users and design partners, suggesting the product is being sold as a general orchestration layer rather than a single fixed workflow @_catwu
- An example use case from Anthropic: Claude can monitor an A/B test, track a target metric plus guardrails, alert if a guardrail moves, note a mid-run correction, and ping the team when the result is statistically significant with the rollout PR ready @ClaudeDevs
- Anthropic’s Alex Albert described the product effect as feeling “less like using a tool and more like managing a team” @alexalbert__

Product model and technical details

Claude Tag is not presented as a new foundation model release; it is a workflow/UI/integration layer around Claude that changes where and how the model participates in work.

- Surface: starts in Slack, where Claude appears as a team member @claudeai
- Access model: admins/users can grant access to:
  - selected channels
  - selected tools
  - selected data
  - even selected codebases @claudeai, @kimmonismus
- Work mode: asynchronous delegation via tagging, with Claude expected to return updates/progress rather than requiring a live chat session @claudeai
- Anthropic’s internal framing:
  - Claude Code = solo / synchronous
  - Claude Tag = multiplayer / async / proactive @ClaudeDevs
- Internal usage metric: “writes 65% of our product team’s code” / “merges 65% of product PRs” depending on the speaker, which likely reflects different denominators and should not be treated as identical without clarification @ClaudeDevs, @_catwu
- Launch status: beta
- Eligible plans: Claude Enterprise and Team
- Primary job-to-be-done shown publicly: long-running delegated tasks with tool access, including software workflows and business ops monitoring @ClaudeDevs

A notable technical implication is that Claude Tag appears to require a robust backend for:

- identity and workspace membership semantics
- permissioning across channels and connected systems
- execution against external tools and codebases
- persistence of task state across async threads
- selective context loading from enterprise systems
- notification routing back into team workflows

That backend is not described in detail in the tweets, but multiple reactions focused on the amount of under-the-hood engineering this entails.

Facts vs. opinions

Facts explicitly stated in the tweets

- Claude Tag is a new Anthropic product/workflow for teams, launched first in Slack @claudeai
- Claude can be granted access to selected channels, tools, data, and codebases @claudeai
- It is in beta for Claude Enterprise and Team plans @ClaudeDevs
- Anthropic says the internal Claude Code team has used it all year @ClaudeDevs
- Anthropic employees claimed internal metrics of 65% of code written / 65% of product PRs merged @ClaudeDevs, @_catwu
- Anthropic gave at least one concrete example workflow: A/B test monitoring with guardrails and PR preparation @ClaudeDevs
- Anthropic published a Get Started guide for configuring agent permissions @_catwu

Opinions / interpretations

- “This has completely changed how I work” and “feels less like using a tool and more like managing a team” are user-experience judgments from Anthropic staff, not externally validated productivity measurements @alexalbert__
- “Paradigm shift” / “third major redesign of LLM UIUX” is Andrej Karpathy’s interpretation, not Anthropic’s formal product spec @karpathy
- “Very useful feature” is an external positive reaction based on product description rather than hands-on public evaluation @kimmonismus
- “At this point it’s just marketing” is a skeptical reaction with no additional evidence attached @kimmonismus
- “Why even use Slack at that point?” is a critique of UX/organizational direction rather than a factual claim about product performance @code_star

Different perspectives

Supportive: a meaningful UI/workflow shift

The strongest supportive commentary came from Anthropic employees and prominent external builders.

- Anthropic’s own product/developer accounts emphasize a move from direct prompting to delegation and background execution in the team’s native communication layer @claudeai, @ClaudeDevs
- Alex Albert’s framing, “managing a team,” captures the intended mental model: Claude as a persistent collaborator rather than a chatbot tab @alexalbert__
- Karpathy described it as the “3rd major redesign of LLM UIUX”:
  - LLM as a website
  - LLM as a desktop app
  - LLM as a persistent, asynchronous entity with org-wide tools and context @karpathy
- Kevin Weil called it “such a good idea,” a high-signal endorsement from a product/infrastructure operator @kevinweil
- Kimmonismus said it sounds like one of the few agent features they would actually use daily in Slack @kimmonismus

This camp sees Claude Tag as solving a real problem: agent utility is bottlenecked less by raw model IQ than by where the agent lives, what it can access, and whether it can operate asynchronously in real org workflows.

Neutral/analytic: impressive if the systems work

Some reactions were positive but focused on implementation complexity.

- Karpathy’s post explicitly says the value only materializes once Anthropic solves the hard systems work around tools, integrations, compute environments, memory, security @karpathy
- Scott Stevenson generalized the point beyond Anthropic: if Slack becomes the place where humans and agents collaborate, Slack/Benioff could turn the acquisition into one of the best ever because “no other generalized AI platform has solved multiplayer well” @scottastevenson
- Joanne Jang connected the product to executive workflow reality: big-company leaders increasingly live on Slack mobile, which makes chat-native agent management a plausible UX center of gravity @joannejang

This view is less about hype and more about organizational software architecture: if agents are going to be used heavily, they need to exist inside the coordination substrate, not outside it.

Skeptical/opposing: marketing, theological UX, and Slack absurdity

Several reactions pushed back on both the framing and the product model.

- Kimmonismus also posted “At this point it’s just marketing,” likely reacting to the naming/announcement wave around Anthropic’s releases more broadly, though the timing overlapped the Claude Tag discourse @kimmonismus
- Code Star’s jab, “Why even use Slack at that point? Just have Claude talk to itself, tag itself, and build what it wants.”, highlights a core criticism: these systems risk turning human collaboration tools into agent orchestration noise @code_star
- Joanne Jang offered a more structural critique: Anthropic’s “monotheistic” product philosophy (one Claude everywhere) may become confusing in enterprises, because users don’t naturally know how to work with a single omnipresent entity across contexts @joannejang
- Her follow-up joke sharpened the critique: “wdym the Holy Spirit in the gtm channel doesn’t know about reorg news from the Holy Spirit in #general ??”, a product-design complaint about identity, consistency, and memory partitioning across channels @joannejang

These skeptics are not necessarily anti-agent; they are pointing at real failure modes:

- overloaded Slack channels
- unclear accountability
- ambiguous memory boundaries
- anthropomorphic overreach
- organizational confusion around one agent identity spanning many workflows

Context: why this matters now

Claude Tag landed into an environment where “background agents,” “harnesses,” and “one person managing many agent sessions” are already emerging as the operative pattern.

Relevant surrounding tweets show a broad industry move:

- StarAgent describes an “Agent Multiplexer” for managing many Codex/Claude Code sessions across machines, built with tmux + Tailscale + web dashboard, explicitly framing one human supervising many agents @ZhihuFrontier
- Theo recommended remote-control hardware and mini PCs “for remote agent PCs,” reflecting the growing norm of long-lived background coding sessions @theo, @theo
- Mitsuhiko linked “more thoughts on looping in coding agents,” reinforcing that reliability and supervision loops are becoming first-class @mitsuhiko
- Sydney Runkle emphasized that looping agents require an engaged human in the loop so the system learns taste rather than merely amplifying bad patterns @sydneyrunkle
- LangChain/OpenHands ecosystem tweets focused on self-harness, weakness mining, eval-driven improvement, and the full agent development lifecycle, indicating a market shift from “prompting” to operationalizing, observing, and improving agents over time @hwchase17, @hwchase17, @gneubig

Against that backdrop, Claude Tag is not an isolated feature. It is Anthropic’s answer to a broader transition:

- from single-turn chat to persistent agents
- from personal copilots to team agents
- from synchronous IDE help to background organizational execution
- from model-centric UX to harness/integration-centric UX

Relationship to Claude Code and the coding-agent stack

Anthropic’s messaging repeatedly anchors Claude Tag to Claude Code, and that matters.

- Claude Code remains the core interactive coding surface
- Claude Tag extends that capability into organization-wide async workflows @ClaudeDevs

This mirrors a broader split visible across the ecosystem:

- foreground agents for direct editing and iteration
- background agents for delegated tasks, monitoring, PR prep, and long-horizon work

Multiple tweets in the broader dataset reinforce this bifurcation:

- Factory says agents run “in the background for days” across the software lifecycle @FactoryAI
- Cursor added a team marketplace for plugins/skills/MCPs, showing the harness layer becoming collaborative and organizational @cursor_ai
- OpenAI/OpenAI Devs continued pushing Codex ecosystem tooling, OSS support, mobile features, and DevDay developer coordination @OpenAIDevs, @reach_vb, @OpenAIDevs

Claude Tag’s importance is therefore partly competitive: it is Anthropic’s move to define the multiplayer async agent layer while others define IDE, router, or harness layers.

Open questions and unresolved issues

The launch tweets leave several technically important questions unanswered.

- Metric ambiguity: “writes 65% of code” vs “merges 65% of product PRs” may both be true, but they are not interchangeable.
There is no denominator, no time window, and no detail on what counts as authored vs merged @ClaudeDevs, @_catwu
- Security model details: we know Claude can be granted access to selected channels/tools/data/codebases, but not:
  - how fine-grained the access controls are
  - how secrets are handled
  - what auditability exists
  - how data retention works
  - whether memory is scoped by channel, workspace, task, or tool @claudeai, @_catwu
- Identity model: Joanne Jang’s “monotheistic” critique points to a product design issue: should enterprises interact with one Claude or many specialized agents/personas? @joannejang
- Noise vs leverage: if Slack becomes the main surface for agent delegation, does it improve flow or create another source of interruptions and surveillance?
- Evaluation: there are no independent external evals yet in this tweet set for Claude Tag’s reliability, task completion rate, security posture, or token efficiency
- Channel-local vs org-global context: the “Holy Spirit in #general vs gtm channel” critique is effectively a question about memory architecture and organizational truth boundaries @joannejang

Implications

Several implications follow from the launch and the surrounding discourse.

- UI/UX implication: the center of gravity may move from “open the AI app” to “summon the AI where work already happens”
- Org design implication: managers and senior ICs may increasingly operate as dispatchers of agents, not just direct contributors
- Infra implication: the durable moat shifts toward integration, permissioning, observability, memory scoping, and harness quality, not just model quality
- Competitive implication: Anthropic is pushing beyond “best coding model” branding into “best team operating model for agents”
- Economic implication: if the internal 65% coding/PR claims generalize even partially, Slack-native background agents could affect staffing models, review flows, and release cadence
- Governance implication: enterprise buyers will likely care less about benchmark deltas and more about whether these agents can be safely embedded into real systems with audit trails and bounded permissions

Karpathy’s post captures the strongest version of this thesis: once the plumbing works, the LLM stops being a destination and becomes a persistent coworker embedded in the organization’s coordination fabric @karpathy

Open models, cyber capability, and the “own your agent” stack

- Joshua Saxe argued GLM-5.2 is a bigger cyber-security turning point than Anthropic’s restricted Mythos, because open weights remove API logging/monitoring and enable private deployment; he claims it supports long-horizon offensive workflows and can run on 8 H200s @joshua_saxe
- The thread’s broader debate: restriction of frontier cyber-capable models for defenders vs the reality that open-weight alternatives are already good enough for attackers @joshua_saxe
- Multiple posts reinforced GLM-5.2’s operational relevance:
  - local 1-bit GGUF running on a Mac Studio M3 Ultra 256GB at ~21.6 tok/s @UnslothAI
  - self-hosted background agent systems with GLM-5.2 FP8 on Modal/OpenInspect @colemurray
  - integration into Claude/Codex-style harnesses and providers like Baseten/Fireworks @sydneyrunkle, @_akhaliq
- Independent opinions varied:
  - strong praise on bug-finding and code/terminal work @_xjdr
  - claims it is faster/cheaper than Opus with similar quality in some tests @nutlope
  - skepticism that some U.S. labs are underperforming relative to their compute lead @teortaxesTex, @scaling01

Agent harnesses, eval loops, and background work

The biggest systems trend outside Claude Tag was the rise of harness-centric thinking:

- Self-Harness proposes agents that mine failures, propose harness changes, and validate via regression tests @hwchase17, @sydneyrunkle
- LangChain emphasized the full agent development lifecycle: build, test, deploy, monitor, improve @hwchase17
- OpenHands/The Verification Stack claims 2.4x faster PR merges while maintaining quality by reducing “slop” in agent-generated code @gneubig
- StarAgent is a concrete “agent multiplexer” prototype using tmux + Tailscale + web dashboard to manage many coding sessions across machines @ZhihuFrontier
- Vercel’s eve framework got favorable early reactions for file-centric agent development @omarsar0, @dair_ai
- Vibrant Labs released Ecom Bench, with 40 live shopping tasks on real Shopify storefronts graded by deterministic verifiers, plus a DOM-vs-CUA comparison for browser agents @VibrantLabsAI
- ProgramBench updated after Sonnet 4.6 found a way around an internet restriction, a reminder that agent evals remain adversarial and brittle @KLieret

Models, inference, and platform releases

- Mistral OCR 4 launched with structure extraction, bounding boxes, block classification, inline confidence scores, and support for 170 languages @MistralAI
- Niels Rogge disputed Mistral’s SOTA claim on OlmOCRBench, saying public leaderboard results currently rank it #3, behind open alternatives like Chandra OCR 2 @NielsRogge
- Baidu Unlimited-OCR also released, intensifying the OCR model race @_akhaliq
- Apple open-sourced apple/container, an Apache-2.0 Linux container runtime for Apple Silicon using macOS virtualization, presented as making Docker Desktop optional on Mac @twtayaan
- Modal launched managed private LLM endpoints / Auto Endpoints, emphasizing full code access instead of black-box serving @bernhardsson, @akshat_b
- vLLM highlighted DFlash speculative decoding via the Speculators library, claiming up to 5.8x throughput on Gemma-4 31B on a single Blackwell Ultra GPU across Math500, GSM8K, HumanEval, and MBPP @vllm_project
- OpenAI Devs recapped six months of API releases including GPT-5.5, GPT-5.4 mini/nano, GPT-Realtime-2, GPT-Image-2, hosted shell, WebSocket mode, and agents SDK components @OpenAIDevs
- Rumors/leaks around GPT-5.6 intensified via repo and UI sightings, with disagreement over whether it was delayed or imminent @scaling01, @scaling01, @scaling01

Benchmarks, research, and systems papers

- ParallelKernelBench launched to measure multi-GPU kernel generation, covering 87 problems from real codebases including Megatron-LM, DeepSpeed, TensorRT-LLM, and NeMo-RL @togethercompute, @asplencmnt
  - Best zero-shot frontier models solved 28/87
  - With 3 attempts: 36/87
  - Gemini 3 Pro improved from 24 to 35/87 with agentic compile/test/profile/revise loops, then plateaued @togethercompute, @togethercompute
- A paper argued multi-vector embeddings are provably more expressive than single-vector embeddings, with exponential dimension blow-up needed for approximation @_reachsumit
- TQ Chen released a curated online book on Modern GPU Programming for ML Systems, including swizzling, 3D TMA, and Blackwell programming @tqchenml
- Artificial Analysis launched a Speech-to-Speech Index combining Big Bench Audio, Full Duplex Bench, and τ-Voice:
  - GPT-Realtime-2 (High) leads at 77.2%
  - Grok Voice Think Fast 1.0 at 75.7%
  - Gemini 3.1 Flash Live Preview (High) at 69.5%
  - fastest TTFA: Deepslate Opal 0.44s
  - lowest cost in-index: Gemini 3.1 Flash Live Preview (Minimal) $1.50/hour input audio @ArtificialAnlys
- Goodfire showed activation-trajectory work on story structure/emotions, arguing model understanding requires studying representational trajectories over time @GoodfireAI

Startups, infra, and product org shifts

- Engram emerged from stealth to work on continual learning / memory / personalized models, with claims that user-specific models may update roughly every minute and that the key challenge is amortizing context into weights rather than rereading it every task @jxmnop, @realJessyLin, @EyubogluSabri
- The framing from Engram and supporters aligns with a broader theme: memory/personalization is a major unsolved bottleneck for frontier systems @krandiash
- Executor joined YC S26 with an open-source MCP gateway for connecting agents to services, reporting 2,000 GitHub stars and support for Docker, desktop, chat-based setup, and multi-account workflows @RhysSullivan
- Cursor added a team leaderboard/marketplace for plugins, skills, and MCPs, plus prebuilt canvases and support beyond local repos to GitLab, Bitbucket, Azure DevOps @cursor_ai
- Factory highlighted end-to-end background software agents used by You.com @FactoryAI

Open-weight image and multimodal releases

- Krea 2 released open weights for:
  - Krea 2 Raw: undistilled, mid-training checkpoint intended for fine-tuning
  - Krea 2 Turbo: fast distilled checkpoint for inference @krea_ai
- Krea and ecosystem partners emphasized:
  - open weights on Hugging Face
  - day-0 diffusers support
  - LoRA training/inference support
  - community value of releasing a genuinely undistilled model @krea_ai, @fal, @viccpoes
- Ostris AI Toolkit and Musubi Tuner both shipped day-0 training support, including claims of 12GB VRAM training with H2D-only block swap in Musubi @ostrisai, @kohya_tech
- Seedance 2.5 drew strong praise in video generation discourse, though one poster later corrected “released” to “announced” @kimmonismus, @kimmonismus

AI in medicine, law, and enterprise operations

- A widely shared medical case highlighted EchoNext, an FDA-cleared AI system that flagged severe heart damage from an ECG after a patient had been discharged; later workup found 10% ejection fraction, severe valve leakage, a rare genetic disorder, and the patient ultimately needed a transplant @DKThomp, @TheRundownAI
- In legal AI, Spellbook Labs reported that 60% of SEC-filed contracts contain mistakes after processing 60,000 pages from 500+ public companies, arguing the key comparison is human error rate rather than idealized perfection @scottastevenson
- LangChain said it partnered with Fireworks to fine-tune a Qwen trace-judge that matched/exceeded frontier model performance while running 100x cheaper @LangChain
- Qodo pushed cross-repo review and rule mining for AI-generated code review workflows @omarsar0

Events, ecosystem, and developer education

- OpenAI opened applications for DevDay 2026 in San Francisco, plus DevDay Exchanges in Bengaluru, Tokyo, Seoul, Paris, Berlin, London, São Paulo, Mexico City @OpenAI, @OpenAIDevs
- Hamel Husain and Shreya announced a free mini-course on AI product engineering spanning design/UX, evals, retrieval, and open models @HamelHusain
- DeepLearning.AI launched a 7-Day Voice AI Builder Challenge focused on calling humans only when intervention is actually required @DeepLearningAI
- Teknium’s Hermes ecosystem continued to add skills/learning workflows and office hours, reflecting the rapid open-agent-tooling cadence @Teknium, @Teknium

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap