Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems.
Read more: Automated Alignment Researchers: Using large language models to scale scalable oversight