#Safety/Alignment

Mastodon discussion Apr 29

Dear tech press, we will never red team or pen test our way to #AI security. This story is, in that way, a big disservice to #MLsec. Please focus on building security in. Looking a...

Safety/Alignment

18

Mastodon discussion Apr 29

The Deepfake Backlash Is Here - And It's Reshaping How AI Companies Build Products. I've watched the AI safety conversation for three years. Mostly theater. Reports that gather dust. C...

Safety/Alignment

9

GNews news Apr 28

Google’s new Pentagon deal: A turning point for AI safety

Google has officially signed a deal with the US Department of Defense to utilize its artificial intelligence models for classified work, joining a growing list of technology firms ...

Google Safety/Alignment

18

Mastodon discussion Apr 28

Data poisoning attacks in 2026 are a core pillar of AI safety that all frontier labs should pay more attention to. https://hackernoon.com/data-poisoning-attacks-on-ai-models-2026 #...

Safety/Alignment

18

NewsData.io news Apr 28

New Book "Garbage In, Faster: Why AI Needs Conversation Architects" Explores Why Human Alignment Matters More Than Ever in the Age of AI

Claude Hanhart’s timely new release argues that AI doesn’t eliminate communication problems—it accelerates them.

Anthropic Safety/Alignment

21

Mastodon discussion Apr 27

📰 UK AI Regulation: Ministers Resist Alignment With EU’s Strict AI Rules in 2026. UK officials are pushing back against aligning with the European Union’s stringent AI regulations, f...

Safety/Alignment

9

Mastodon discussion Apr 27

Okay, this one got me. 🔥😈🔥👀 Researchers found that if you wrap a harmful prompt inside a poem, AI safety filters suddenly forget what they’re supposed to do. 😳 Attack success rates g...

Safety/Alignment

9

Mastodon discussion Apr 25

@dubravkasuica YES! #AI can help, Check this out, https://medium.com/@interpretivepoliticalscience/how-ai-alignment-can-lead-humanity-to-world-peace-f900d3b38d3b #Peace

Safety/Alignment

9

Mastodon discussion Apr 25

https://medium.com/@interpretivepoliticalscience/how-ai-alignment-can-lead-humanity-to-world-peace-f900d3b38d3b #AI #AIeducation #AIalignment #generativeAI #AIethics

Safety/Alignment

18

Mastodon discussion Apr 25

The McDonald's AI jailbreak story was fabricated. The Chipotle one before it was Photoshopped. I get why they went viral, they're kinda funny. But they're pulling attention away fr...

Safety/Alignment

38

Mastodon discussion Apr 24

📰 Alignment Faking in AI Models 2026: VLAF Uncovers Hidden Deception in Language Models. New research reveals widespread alignment faking in language models, where AI systems pretend...

Safety/Alignment

9

NewsData.io news Apr 24

Delivery intelligence: The missing link between AI agents and strategic alignment

The way that work is done is changing. People are beginning to rely on AI-based agents to do a lot of the heavy lifting in their work. Jobs are becoming more about directing those ...

Safety/Alignment

21

Mastodon discussion Apr 22

"Alignment Surcharge. Universal Inference Fund Distribution" #designfiction #futurism #AI #ephemera #debris #NearFutureLaboratory generalseminar.com/seminar/s07/...

Safety/Alignment

18

Mastodon discussion Apr 21

One Developer, Two Dozen Agents, Zero Alignment, by @maggie: https://maggieappleton.com/zero-alignment #ai #aiagents #collaboration #processes

Safety/Alignment API

9

NewsData.io news Apr 21

Presidential Acceleration of Psychedelic Therapies Enters a Defining Moment as Federal Policy, FDA Alignment & Breakthrough Neurotechnology Converge

Safety/Alignment

21

Papers with Code paper Apr 21

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-a...

Safety/Alignment

21

Mastodon discussion Apr 20

🚨 New research has just been released! "OpenAI's Existential Questions: Beyond the Hype and Towards True AI Alignment" 🔗 Access the repository/documentation: https://www.unixpackages.net/dekons...

OpenAI Safety/Alignment

18

AI Blogs (RSS) news Apr 20

Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. Huawei’s HiF...

Safety/Alignment

24

NewsData.io news Apr 20

What is California’s AI safety law?

Malihe Alikhani and Aidan Kane review California's comprehensive AI law and what it means for the longer trajectory of U.S. AI regulation

Safety/Alignment

21

NewsData.io news Apr 20

Cortexa Labs: Setting A New Standard For AI Safety

When Charanarravindaa Suriess began experimenting with computers at the age of four, he was not thinking about artificial intelligence, cybersecurity or global regulatory framework...

Safety/Alignment

21

GNews news Apr 19

AI, education, policy alignment key to India’s $30 trillion goal, say experts at Bengaluru event

Bengaluru: India's ambition to become a $30 trillion economy by 2047 will hinge on how well artificial intelligence (AI) is integrated with education, ...

Safety/Alignment

18

Mastodon discussion Apr 19

Can Weak Minds Control Super AI? - Diamandis and Wissner-Gross #alignment #anthropic #ai Original timestamp: 00:06:33

Anthropic Safety/Alignment

18

Mastodon discussion Apr 18

AI Alignment Is Impossible, not just in practice but in theory https://3quarksdaily.com/3quarksdaily/2026/04/ai-alignment-is-impossible-not-just-in-practice-but-in-theory.html most ...

Safety/Alignment

18

Mastodon discussion Apr 18

For years, the AI safety debate has been dominated by extreme rhetoric. Now things are turning violent. The doomer discourse, warnings of existential risk from artificial intellige...

Safety/Alignment

9