A Pontryagin Perspective on Reinforcement Learning

Onno Eberhard · Claire Vernade · Michael Muehlebach

Seventh Annual Learning for Dynamics & Control Conference (L4DC 2025)
Oral · Best Paper Nomination

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, demonstrating remarkable performance compared to existing baselines.
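
To make the open-loop idea concrete, here is a minimal sketch of how Pontryagin's principle yields the gradient of the cost with respect to a fixed action sequence. It is not one of the three algorithms from the paper: it assumes the dynamics are known exactly, and the toy problem (a discretized double integrator with quadratic cost) as well as all names (`A`, `B`, `Q`, `R`, `rollout`, `cost`, `pontryagin_gradient`) are hypothetical choices made for illustration. A forward pass simulates the states, a backward pass propagates the costates, and plain gradient descent updates the actions.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): discretized double integrator
# x_{t+1} = A x_t + B u_t with quadratic running and terminal costs.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost weight (also used for the terminal cost)
R = 0.1 * np.eye(1)    # action cost weight
x0 = np.array([1.0, 0.0])
T = 30                 # horizon


def rollout(us):
    """Simulate the system open loop; returns the states x_0, ..., x_T."""
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)


def cost(us):
    """Total cost: sum of running costs plus the terminal cost."""
    xs = rollout(us)
    running = 0.5 * sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return running + 0.5 * xs[-1] @ Q @ xs[-1]


def pontryagin_gradient(us):
    """Gradient of the cost w.r.t. every action via a backward costate pass."""
    xs = rollout(us)
    lam = Q @ xs[-1]                      # lambda_T: gradient of terminal cost
    grads = np.zeros_like(us)
    for t in reversed(range(T)):
        # dJ/du_t = dc/du + (df/du)^T lambda_{t+1}
        grads[t] = R @ us[t] + B.T @ lam
        # lambda_t = dc/dx + (df/dx)^T lambda_{t+1}
        lam = Q @ xs[t] + A.T @ lam
    return grads


# Gradient descent directly on the action sequence (no policy, no value function).
us = np.zeros((T, 1))
initial_cost = cost(us)
for _ in range(500):
    us -= 0.05 * pontryagin_gradient(us)
final_cost = cost(us)
print(f"cost before: {initial_cost:.3f}, after: {final_cost:.3f}")
```

In this sketch the Jacobians of the dynamics are simply the known matrices `A` and `B`. According to the abstract, the paper's methods cover the settings where only an approximate model is available (model-based) or no model at all (model-free); how the Jacobians are obtained there is not shown here.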


Also presented at the ICML 2024 Workshop on Foundations of Reinforcement Learning and Control (OpenReview).

The video below shows the open-loop behavior learned by our model-free method on two MuJoCo tasks.