CDRL: cerebellar-microcircuit-inspired reinforcement learning improves sample efficiency 3.4×
By structuring the actor and critic to mirror the cerebellum's granule-Purkinje circuit organization, a new RL framework reaches the same competence with substantially less environment interaction.
The cerebellum has long been the brain region that neuroscience research engineers gesture toward when asked which biological structure looks most “designed for control.” Its granular layer expands inputs into a high-dimensional sparse representation; its Purkinje cells integrate this expansion under climbing-fiber supervision; and the resulting circuit is implicated in motor learning, predictive coding, and, increasingly, the kind of internal-model learning that RL needs.
CDRL takes the circuit-level analogy literally. The actor is a granule-layer-style sparse high-dimensional projection; the critic is a Purkinje-style supervised integrator that receives an error signal modeled on climbing-fiber dynamics. The training scheme keeps the analogy intact: errors are sparse and localized rather than backpropagated densely.
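To make the two ingredients concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It pairs a fixed sparse random expansion (granule-style) with a linear readout (Purkinje-style) that is updated by a local delta rule driven by a single scalar error, standing in for the climbing-fiber signal. Every name and setting here is an assumption made for illustration: the function names, the fan-in of 4, the top-k sparsification, the normalized learning rate, and the fixed target.

```python
import random

def make_granule_layer(n_inputs, n_granule, fan_in=4, seed=0):
    """Random sparse projection: each granule unit reads only `fan_in` inputs."""
    rng = random.Random(seed)
    layer = []
    for _ in range(n_granule):
        sources = rng.sample(range(n_inputs), min(fan_in, n_inputs))
        layer.append([(i, rng.gauss(0.0, 1.0)) for i in sources])
    return layer

def granule_expand(layer, x, k):
    """Expand x into a high-dimensional code, keeping at most the k strongest units."""
    pre = [sum(w * x[i] for i, w in unit) for unit in layer]
    winners = sorted(range(len(pre)), key=pre.__getitem__, reverse=True)[:k]
    # Sparse code as {unit index: activity}; drop non-positive activities.
    return {j: pre[j] for j in winners if pre[j] > 0.0}

def purkinje_predict(weights, code):
    """Linear readout over the active granule units only."""
    return sum(weights[j] * a for j, a in code.items())

def purkinje_update(weights, code, error, lr=0.5):
    """Local delta rule: a scalar error changes only weights on active units."""
    norm = sum(a * a for a in code.values()) + 1e-8  # normalized step size
    for j, a in code.items():
        weights[j] += lr * error * a / norm

# Hypothetical usage: fit the readout to one target value for one state.
layer = make_granule_layer(n_inputs=8, n_granule=400)
state = [0.3, -1.2, 0.5, 0.0, 0.9, -0.4, 1.1, 0.2]
code = granule_expand(layer, state, k=20)
weights = [0.0] * 400
target = 2.0  # stand-in for a return or bootstrapped value target
for _ in range(20):
    error = target - purkinje_predict(weights, code)
    purkinje_update(weights, code, error)
print(len(code), round(purkinje_predict(weights, code), 3))
```

The point of the sketch is the locality: the projection is never trained, no gradient flows through it, and each update touches only the handful of readout weights whose granule units were active for that input.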
On a battery of MuJoCo continuous-control tasks, CDRL reaches PPO-equivalent performance with 3.4× fewer environment steps. The architecture is small (the granule projection is the only large layer), which suggests the gain is structural rather than a product of parameter count. The authors are careful to flag that the cerebellar analogy is loose at the cellular level; the gain comes from the circuit topology and the locality of the error signal.
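A "k× fewer steps" figure is usually read as a ratio of steps-to-threshold: the number of environment steps the baseline needs to first reach a given return, divided by the number the new method needs. The sketch below shows that calculation, assuming this is how the headline number is defined. The curves are toy values invented for illustration, not data from the paper, and the function names are hypothetical.

```python
def steps_to_threshold(curve, threshold):
    """First environment-step count at which the return reaches the threshold."""
    for steps, ret in curve:
        if ret >= threshold:
            return steps
    return None  # threshold never reached

def sample_efficiency_ratio(baseline_curve, candidate_curve, threshold):
    """Baseline steps divided by candidate steps at the same performance level."""
    base = steps_to_threshold(baseline_curve, threshold)
    cand = steps_to_threshold(candidate_curve, threshold)
    if base is None or cand is None:
        return None
    return base / cand

# Toy learning curves as (environment steps, mean return); NOT from the paper.
baseline = [(250_000, 900.0), (500_000, 1800.0), (750_000, 2600.0), (1_000_000, 3100.0)]
candidate = [(125_000, 1500.0), (250_000, 3050.0), (500_000, 3200.0)]

ratio = sample_efficiency_ratio(baseline, candidate, threshold=3000.0)
print(ratio)  # 4.0
```

Under this reading, a ratio of 3.4 means the method needs about 29% (1/3.4) of the baseline's environment steps to reach the same return.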