Full recordShow full item record
AbstractWe consider the problem of learning hierarchical policies for Reinforcement Learning able to discover options, an option corresponding to a sub-policy over a set of primitive actions. Different models have been proposed during the last decade that usually rely on a predefined set of options. We specifically address the problem of automatically discovering options in decision processes. We describe a new RL learning framework called Bi-POMDP, and a new learning model called Budgeted Option Neural Network (BONN) able to discover options based on a budgeted learning objective. Since Bi-POMDP are more general than POMDP, our model can also be used to discover options for classical RL tasks. The BONN model is evaluated on different classical RL problems, demonstrating both quantitative and qualitative interesting results.
Comment: Under review as a conference paper at ICLR 2017