
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal