Mixture of Depths (MoD) — Plain English Definition

Mixture of Depths (MoD) is a transformer architecture in which a learned router decides, layer by layer, which tokens receive computation. Easy tokens (filler words, predictable continuations) skip a layer through the residual connection; hard tokens (reasoning steps, key facts) are processed at full depth. The reported result is roughly 30% fewer FLOPs per token at comparable quality. MoD differs from Mixture of Experts (MoE): MoE routes each token to different experts within a layer, whereas MoD routes tokens through different numbers of layers. The two can be combined as "Mixture of Depths and Experts" (MoDE), which is said to power most 2026 frontier models. DeepMind published the foundational paper in 2024; by 2026 the technique is reportedly used in Gemini 3.5, Llama 4 Behemoth, and Claude Opus 4.8. The change is invisible to API users: it is purely an architectural efficiency gain.
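
The routing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the DeepMind paper or from any production model: the names `router_scores`, `mod_block`, and `toy_block` and the `capacity` parameter are hypothetical, the router is a plain dot product, and the "layer" is a stand-in function. It assumes top-k selection under a fixed per-layer capacity, with unselected tokens passing through unchanged.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def router_scores(tokens: List[Vector], router_weights: Vector) -> List[float]:
    """Score each token with a linear router (dot product with a weight vector)."""
    return [sum(x * w for x, w in zip(tok, router_weights)) for tok in tokens]


def mod_block(
    tokens: List[Vector],
    router_weights: Vector,
    block: Callable[[Vector], Vector],
    capacity: int,
) -> Tuple[List[Vector], List[int]]:
    """Run `block` only on the `capacity` highest-scoring tokens.

    Selected tokens get a residual update scaled by their router score.
    All other tokens skip the block and are copied through unchanged,
    so they cost no block compute.
    """
    scores = router_scores(tokens, router_weights)
    k = min(capacity, len(tokens))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])

    out: List[Vector] = []
    for i, tok in enumerate(tokens):
        if i in selected:
            update = block(tok)
            out.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            out.append(list(tok))  # skip path: identity
    return out, sorted(selected)


def toy_block(tok: Vector) -> Vector:
    """Hypothetical stand-in for an attention + MLP layer."""
    return [1.0 for _ in tok]


tokens = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
router_weights = [1.0, -1.0]

# With capacity 2, only half of the four tokens pay for the block.
out, selected = mod_block(tokens, router_weights, toy_block, capacity=2)
print(selected)
print(out)
```

Lowering `capacity` is what reduces compute in this sketch: the block runs on `capacity` tokens per layer instead of all of them, while the skipped tokens still reach later layers through the identity path.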

Read the full guide

What Is Mixture Of Depths
What Is Mixture Of Experts
