Mixture of Experts (MoE): Architecture & How It Works
Mixture of Experts (MoE) is an architectural paradigm that scales model capacity dramatically while keeping computational cost manageable: instead of passing every input through the entire network, a router sends each input to only a small subset of specialized expert sub-networks, so only a fraction of the model's parameters are active for any given token.
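To make the routing idea concrete, the following is a minimal sketch of a top-k MoE feed-forward layer in PyTorch. The class name, expert count, gating scheme, and dimensions are illustrative assumptions for this article, not the exact design of any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-k Mixture of Experts feed-forward layer (sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: produces a score for each expert per token
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks with identical shapes
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = self.router(x)                                  # (batch, seq, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)       # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)                     # normalize gate weights over the chosen experts

        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts; outputs are
        # combined with the corresponding gate weights.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                      # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage sketch: a batch of 4 sequences, 16 tokens each, model width 64.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
tokens = torch.randn(4, 16, 64)
output = layer(tokens)   # same shape as the input: (4, 16, 64)
```

The loop over experts is written for readability; production implementations typically batch tokens per expert and add load-balancing losses so that routing does not collapse onto a few experts.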