Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Cautious Actor-Critic