Models & Architectures
Mixture of Depths (MoD)
A neural architecture in which a learned router decides, token by token, which layers to run, saving compute on easy tokens.
Also known as: MoD, mixture of depths
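Mixture of Depths (MoD) is a transformer architecture in which the model dynamically decides how much computation each token receives. At each routed block, a small learned router scores the tokens; only the highest-scoring tokens, up to a fixed capacity, pass through that block's attention and MLP, and the rest skip it via the residual connection. Easy tokens (filler words, predictable continuations) therefore skip layers, while hard tokens (reasoning steps, key facts) get the full depth. Result: roughly 30% fewer FLOPs per token at comparable quality, with the exact saving depending on how many blocks are routed and how much capacity each one keeps. MoD differs from Mixture of Experts (MoE): MoE routes each token to different experts within a layer, whereas MoD decides whether a token passes through a layer at all. The two can be combined as "Mixture of Depths and Experts" (MoDE), which reportedly powers most 2026 frontier models. DeepMind published the foundational paper in 2024; by 2026 the technique is reportedly in Gemini 3.5, Llama 4 Behemoth, and Claude Opus 4.8. It is invisible to API users: purely an architectural efficiency win.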
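The per-token layer skipping described above can be illustrated with a minimal sketch in plain Python. It assumes a top-k router with a fixed capacity per block, a linear router (a dot product with a weight vector), and a block update scaled by the router score, which is one common reading of the 2024 paper. The names `mod_block`, `router_weights`, `block_fn`, and `capacity` are hypothetical and chosen for illustration, and `block_fn` stands in for a real attention + MLP block.

```python
def mod_block(tokens, router_weights, block_fn, capacity):
    """Run block_fn on only the `capacity` highest-scoring tokens.

    tokens: list of token vectors (lists of floats).
    router_weights: weight vector of an assumed linear router.
    block_fn: stand-in for the block's attention + MLP (hypothetical).
    Returns the new token vectors and the indices that were processed.
    """
    # Router score per token: dot product with the router weights.
    scores = [sum(x * w for x, w in zip(tok, router_weights)) for tok in tokens]

    # Keep the indices of the top-`capacity` tokens by score.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Processed token: residual plus the block update, scaled by
            # the router score (in training this lets the router learn).
            update = block_fn(tok)
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            # Skipped token: passes through unchanged on the residual path.
            out.append(list(tok))
    return out, sorted(selected)


# Toy example: 4 tokens, capacity 2, so the block runs on half the tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.5, 0.5]]
router_weights = [1.0, 0.0]
toy_block = lambda tok: [1.0 for _ in tok]  # placeholder for a real block

out, kept = mod_block(tokens, router_weights, toy_block, capacity=2)
print(kept)  # indices of the tokens that went through the block
```

With a capacity of 2 out of 4 tokens, the block's compute is spent on half the sequence; the other half costs only the router's dot product. A real implementation would batch this with tensor operations and handle causal sampling, which this sketch ignores.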

