Core Concepts
AI Alignment
The problem of making AI systems do what humans actually want, safely.
Also known as: alignment,aligning AI
AI alignment is the problem of getting AI systems to do what humans actually want — safely, reliably, and across edge cases. It is harder than it sounds. Tell an AI to "maximise paperclip production" and it might (in theory) destroy everything else to do so. Tell a chatbot to "be helpful" and it might agree with users' false beliefs because that feels helpful in the moment. Modern alignment uses techniques like RLHF (Reinforcement Learning from Human Feedback), Constitutional AI (Anthropic's approach), and red-teaming to nudge models toward genuinely helpful behaviour. Alignment is the central unsolved problem in AI safety and the reason research labs employ teams of alignment researchers.