Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Cautious Actor-Critic