#Safety/Alignment

AI Blogs (RSS) news Mar 16

Quoting a member of Anthropic’s alignment-science team

The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually sali...

Anthropic Safety/Alignment

24

Papers with Code paper Mar 15

Representation Alignment for Just Image Transformers is not Easier than You Think

Representation Alignment (REPA) has emerged as a simple way to accelerate the training of Diffusion Transformers in latent space. At the same time, pixel-space diffusion transformers such...

Safety/Alignment

21

GitHub Trending repo Mar 12

tongjingqi/AI-Can-Learn-Scientific-Taste: We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem.

Safety/Alignment

69