#Safety/Alignment

Mastodon discussion Mar 29

Interesting dichotomy in AI safety today: MIT researchers are developing systems that admit uncertainty (“Humble AI”) to prevent hallucinations. Conversely, new studies show agents...

Safety/Alignment

18

Mastodon discussion Mar 28

The Register: Telling an AI model that it’s an expert programmer makes it a worse programmer. “For alignment-dependent tasks, like writing, role-playing, and safety, personas do i...

Safety/Alignment

18

Mastodon discussion Mar 28

📰 Anthropic vs OpenAI: AI Ethics Battle for the Future of AI Safety (2026). Anthropic has emerged as a direct counterpoint to OpenAI, framing its mission as an ethical antidote to wh...

OpenAI Anthropic Safety/Alignment

9

Mastodon discussion Mar 28

📰 Anthropic’s 2026 IPO at Risk: Supply Chain Battles and AI Safety Dilemmas Behind Claude Model. Anthropic is grappling with supply chain restrictions and internal safety protocols a...

Anthropic Safety/Alignment

9

Mastodon discussion Mar 28

Judge blocks Pentagon from blacklisting Anthropic over AI safety stance: A federal judge today blocked the Trump administration from designating Anthropic a supply chain risk, ruli...

Anthropic Safety/Alignment

24

YouTube video Mar 28

AI News: China Boycotts NeurIPS Over US Sanctions | OpenAI's $100K AI Safety Bug Bounty | ZGC For...

Nia and Kai break down the latest AI news! 3 stories in today's episode. ⏱ Timestamps 0:00 China Boycotts NeurIPS Over US ...

OpenAI Safety/Alignment

15

Mastodon discussion Mar 27

#Retail doesn’t just run on data. It depends on alignment across pricing, inventory, and product systems. That alignment is harder than it looks. A few thoughts after a conversation...

Safety/Alignment

24

ArXiv paper Mar 26

RefAlign: Representation Alignment for Reference-to-Video Generation

Reference-to-video (R2V) generation is a controllable video synthesis paradigm that constrains the generation process using both text prompts and reference images, enabling applica...

Safety/Alignment

18

ArXiv paper Mar 26

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

Human driving behavior is inherently personal, which is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge,...

Multimodal Safety/Alignment

18

ArXiv paper Mar 25

Cross-Modal Prototype Alignment and Mixing for Training-Free Few-Shot Classification

Vision-language models (VLMs) like CLIP are trained with the objective of aligning text and image pairs. To improve CLIP-based few-shot image classification, recent works have obse...

Safety/Alignment

18

Mastodon discussion Mar 25

#ai safety efforts seem pretty useless as long as we can't inspect and interpret what happens during inference. It may seem like an #llm recognizing it is being tested and responds di...

Safety/Alignment

18

Mastodon discussion Mar 25

📰 New Bernie Sanders AI Safety Bill Would Halt Data Center Construction. The US senator said on Tuesday that a moratorium would give lawmakers time to "ensure that AI is safe." Alexa...

Safety/Alignment

24

ArXiv paper Mar 25

Semantic Alignment across Ancient Egyptian Language Stages via Normalization-Aware Multitask Learning

We study word-level semantic alignment across four historical stages of Ancient Egyptian. These stages differ in script and orthography, and parallel data are scarce. We jointly tr...

Safety/Alignment

18

ArXiv paper Mar 25

InstanceRSR: Real-World Super-Resolution via Instance-Aware Representation Alignment

Existing real-world super-resolution (RSR) methods based on generative priors have achieved remarkable progress in producing high-quality and globally consistent reconstructions. H...

Safety/Alignment

18

Papers with Code paper Mar 25

BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment

Understanding animal species from multimodal data poses an emerging challenge at the intersection of computer vision and ecology. While recent biological models, such as BioCLIP, h...

Benchmark Safety/Alignment

21

GNews news Mar 25

US judge says Pentagon's blacklisting of Anthropic looks like punishment for its views on AI safety

A U.S. judge said on Tuesday that the Pentagon's blacklisting of Anthropic looked like an effort to punish the artificial intelligence lab for going public with its concerns about ...

Anthropic Safety/Alignment

18

Papers with Code paper Mar 24

ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations - such as o...

LLM Safety/Alignment Robotics

21

Papers with Code paper Mar 23

Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models

While Vision-Language Models (VLMs) have achieved remarkable performance, their Euclidean embeddings remain limited in capturing hierarchical relationships such as part-to-whole or...

Multimodal Safety/Alignment

21

Papers with Code paper Mar 22

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

Improving embodied reasoning in multimodal-large-language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily translate multimod...

Multimodal Safety/Alignment

21

GitHub Trending repo Mar 21

dubermandeer/Worm-GPT-LLM-2026: High-performance C++ execution engine for LLM red-teaming and prompt engineering. Deploy dynamic jailbreak payloads, bypass alignment guardrails, and utilize free autonomous uncensored conversational logic locally.

LLM Safety/Alignment

60

ArXiv paper Mar 17

Prompt Programming for Cultural Bias and Alignment of Large Language Models

Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target population...

Safety/Alignment

18

GitHub Trending repo Mar 17

lukasHoel/video_to_world: Our method reconstructs 3D worlds from video diffusion models using non-rigid alignment to resolve inherent 3D inconsistencies in the generated sequences.

Safety/Alignment

55

ArXiv paper Mar 17

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Pixel-space diffusion has recently re-emerged as a strong alternative to latent diffusion, enabling high-quality generation without pretrained autoencoders. However, standard pixel...

Safety/Alignment

18

AI Blogs (RSS) news Mar 16

Quoting A member of Anthropic’s alignment-science team

The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually sali...

Anthropic Safety/Alignment

24