Toshinori Kitamura

Postdoc Researcher

The University of Alberta

Biography

I am a postdoctoral researcher at the University of Alberta in Csaba Szepesvári’s lab. My Ph.D. was supervised by Dr. Tadashi Kozuno and Prof. Yutaka Matsuo. I earned my master’s degree at the Nara Institute of Science and Technology under the supervision of Prof. Takamitsu Matsubara.

Download my resumé.

Interests
  • Safe Reinforcement Learning Theory
  • Bandit Algorithms
Education
  • Postdoctoral Researcher, Sep 2025 – present

    The University of Alberta

  • Postdoctoral Researcher, Apr 2025 – Sep 2025

    The University of Tokyo

  • Ph.D. in Engineering, 2022-2025

    The University of Tokyo

  • M.S. of Science and Technology, 2020-2022

    Nara Institute of Science and Technology

  • Exchange Student, 2018

    University of California, Davis

  • B.E. in Science and Technology, 2016-2020

    Keio University

Selected Publications

(2026). Emergence of exploration in policy gradient reinforcement learning via retrying. International Conference on Machine Learning (ICML).

(2026). Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions. arXiv preprint arXiv:2604.21177.

(2025). Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form. International Conference on Learning Representations (ICLR).

(2025). A Unified MDP Framework for Solving Robust, Convex, Multi-Discount Constraints, and Beyond. Finding the Frame Workshop at RLC 2025.

(2025). Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation. Neural Information Processing Systems (NeurIPS), Spotlight.

(2024). A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees. arXiv preprint arXiv:2401.17780.

(2023). Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. International Conference on Machine Learning (ICML).

Other Papers

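(2023). (Organized Session Invited Talk) On Problem Settings in Sequential Decision Making and the Effect of Prior Knowledge about the Problem on Performance Guarantees. Proceedings of the 37th Annual Conference of the Japanese Society for Artificial Intelligence (2023).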

(2022). KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal. arXiv preprint arXiv:2205.14211.

(2021). Cautious Actor-Critic. Asian Conference on Machine Learning (ACML).

(2021). Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning. arXiv preprint arXiv:2107.05798.

(2021). Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning. Asian Conference on Machine Learning (ACML).

Working Experience

Postdoctoral Researcher
Apr 2025 – Sep 2025 Tokyo, Japan
Reinforcement Learning (RL) Theory

Research Part-time
Jun 2023 – Feb 2025 Tokyo, Japan

Research Internship
Jun 2021 – Feb 2022 Tokyo, Japan

Engineering Internship
Jun 2022 – Apr 2023 Tokyo, Japan

Research Internship
Mar 2019 – Jan 2020 Tokyo, Japan

Engineering Intern
Jan 2018 – May 2018 Tokyo, Japan

Posts

Reflections on My Internship at OMRON SINIC X (OSX)
From June 2021 to February 2022, I was a research intern at OMRON SINIC X (OSX). This post summarizes my experience of the OSX internship.