KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Publication
arXiv preprint arXiv:2205.14211