Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
Quantile-based distributional RL agents are usually improved by replacing the quantile loss with an alternative L2-based loss, at the cost of losing theoretical validity and suffering distributional collapse. We propose a way to leverage effective L2 losses while maintaining an estimate of the full distribution of returns.
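To make the L1 versus L2 distinction concrete, here is a minimal sketch, not the method proposed here, of the two per-level losses at a single level tau. The first is the asymmetric L1 ("pinball") loss used by quantile regression. The second is the asymmetric L2 expectile loss, which we assume is the kind of L2-based loss meant, as the title suggests. The sketch also computes the empirical minimizer of each loss on a toy sample of returns. All function names are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, Sequence


def quantile_loss(u: float, tau: float) -> float:
    """Asymmetric L1 ("pinball") loss; u = target - estimate."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def expectile_loss(u: float, tau: float) -> float:
    """Asymmetric L2 loss; its minimizer is the tau-expectile. u = target - estimate."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def mean_loss(loss: Callable[[float, float], float],
              xs: Sequence[float], estimate: float, tau: float) -> float:
    """Average loss of one scalar estimate against all sampled returns."""
    return sum(loss(x - estimate, tau) for x in xs) / len(xs)


def sample_quantile(xs: Sequence[float], tau: float) -> float:
    """A minimizer of the mean pinball loss: the ceil(tau * n)-th order statistic."""
    s = sorted(xs)
    k = max(1, math.ceil(tau * len(s)))  # 1-indexed rank; tau assumed in (0, 1)
    return s[k - 1]


def sample_expectile(xs: Sequence[float], tau: float, iters: int = 100) -> float:
    """Minimizer of the mean expectile loss.

    Setting the derivative to zero gives e = sum(w_i * x_i) / sum(w_i) with
    w_i = tau if x_i >= e else 1 - tau. The weights depend on e, so we
    iterate this fixed point (an iteratively reweighted least-squares step).
    """
    e = sum(xs) / len(xs)  # start from the mean, i.e. the 0.5-expectile
    for _ in range(iters):
        w = [tau if x >= e else 1.0 - tau for x in xs]
        new_e = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        if new_e == e:  # weights stopped changing, so e is a fixed point
            break
        e = new_e
    return e


if __name__ == "__main__":
    returns = [0.0, 1.0, 2.0, 3.0, 10.0]  # toy sampled returns
    for tau in (0.1, 0.5, 0.9):
        print(f"tau={tau}: quantile={sample_quantile(returns, tau):.3f}, "
              f"expectile={sample_expectile(returns, tau):.3f}")
```

On this toy sample, at tau = 0.5 the quantile is the median (2) while the expectile is the mean (3.2). The two statistics also differ at other levels, so fitting an L2 (expectile) loss does not directly yield quantile estimates of the return distribution.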