Techniques & Methods
Model Distillation
Training a small AI model to mimic a larger one — how GPT-5 mini came from GPT-5.
Also known as: knowledge distillation
Model distillation is the process of training a smaller AI model to mimic a much larger one. The recipe: have the big "teacher" model generate large volumes of responses, then train a smaller "student" model to reproduce those outputs (or, in the classic formulation, to match the teacher's output probabilities). The student often ends up 5–10x smaller and faster while retaining most of the teacher's capability on the tasks it was trained on. This is widely understood to be how small models such as GPT-5 mini, Claude Haiku, and Gemini Flash are derived from their larger siblings (GPT-5, Claude Opus, Gemini Pro), although vendors publish few details. Distillation is one of the most cost-efficient ways to deliver capability: it does not require a new architecture, only compute time on a teacher model that already exists, plus the cost of training the student.
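To make the teacher/student idea concrete, here is a minimal sketch of the classic soft-label form of distillation for a single prediction. It is an illustration under stated assumptions, not how any particular vendor trains its models: the logits are made-up numbers, the function names (`softmax`, `distillation_loss`, `distill_step`) are hypothetical, and the "student" is just a list of logits updated directly rather than a neural network.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(teacher_logits, student_logits, temperature=2.0, lr=0.5):
    """One gradient-descent step on the student's logits.

    For this loss, the gradient with respect to student logit i
    is (q_i - p_i) / temperature.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return [z - lr * (qi - pi) / temperature
            for z, qi, pi in zip(student_logits, q, p)]

# Assumed toy values: the teacher strongly prefers option 0; the student starts unsure.
teacher = [4.0, 1.0, -2.0]
student = [0.0, 0.0, 0.0]

loss_before = distillation_loss(teacher, student)
for _ in range(200):
    student = distill_step(teacher, student)
loss_after = distillation_loss(teacher, student)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

In a real setup the same loss would be averaged over many teacher outputs and back-propagated through the student network's weights; the sketch only shows the direction of the update, which pulls the student's predicted probabilities toward the teacher's.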