Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks
M.Sc. Thesis
Steffen Nissen
October 8, 2007
Department of Computer Science
University of Copenhagen
Denmark
Abstract
This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon.

In order for reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural network. Reinforcement learning has traditionally been combined with simple incremental neural network training algorithms, but more advanced training algorithms such as Cascade 2 exist and have the potential of achieving much higher performance. All of these advanced training algorithms are, however, batch algorithms, and since reinforcement learning is incremental this poses a challenge. As of now the potential of the advanced algorithms has not been fully exploited, and the few combination methods that have been tested have failed to produce a solution that can scale to larger problems.
The standard reinforcement learning algorithms used in combination with neural networks are Q(λ) and SARSA(λ), which for this thesis have been combined to form the Q-SARSA(λ) algorithm. This algorithm has been combined with the Cascade 2 neural network training algorithm, which is especially interesting because it is a constructive algorithm that can grow a neural network by gradually adding neurons. To combine Cascade 2 and Q-SARSA(λ), two new methods have been developed: the NFQ-SARSA(λ) algorithm, which is an enhanced version of Neural Fitted Q Iteration, and the novel sliding window cache.
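As an illustration only, one natural way to merge the Q-learning and SARSA targets is through a mixing parameter, written σ here, that interpolates between the two; the following one-step form, ignoring eligibility traces, is a sketch under that assumption and is not necessarily the exact formulation used in the thesis:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \left( (1 - \sigma) \max_{a} Q(s_{t+1}, a) + \sigma \, Q(s_{t+1}, a_{t+1}) \right) - Q(s_t, a_t) \right]

In such a formulation σ = 0 recovers the Q-learning target and σ = 1 recovers the SARSA target, with the λ eligibility traces applied on top of the combined update.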
The sliding window cache and Cascade 2 are tested on the medium-sized mountain car and cart pole problems and on the large backgammon problem. The test results show that Q-SARSA(λ) performs better than Q(λ) and SARSA(λ), and that the sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs significantly better than incrementally trained reinforcement learning. For the cart pole problem the algorithm performs especially well: it learns a policy that can balance the pole for the complete 300 steps after only 300 episodes of learning, and its resulting neural network contains only one hidden neuron. This should be compared to 262 steps for the incremental algorithm after 10,000 episodes of learning. The sliding window cache scales well to the large backgammon problem and wins 78% of the games against a heuristic player, while incremental training only wins 73% of the games. The NFQ-SARSA(λ) algorithm also outperforms the incremental algorithm for the medium-sized problems, but it is not able to scale to backgammon.
The sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs better than incrementally trained reinforcement learning for both medium sized and large problems and it is the first combination of advanced neural network training algorithms and reinforcement learning that can scale to larger problems.
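To make the idea of the sliding window cache concrete, the following is a minimal sketch rather than the thesis implementation: recent state-action pairs and their Q-SARSA(λ) targets are kept in a fixed-size window, and a batch algorithm such as Cascade 2 is periodically retrained on the window's contents. All class, method, and parameter names here (SlidingWindowCache, train_batch, capacity, retrain_interval) are illustrative assumptions.

    from collections import deque

    class SlidingWindowCache:
        """Sketch of a sliding window cache: holds the most recent
        (state, action, target) examples so that a batch training
        algorithm (a stand-in for Cascade 2) can be retrained while
        reinforcement learning keeps generating new targets."""

        def __init__(self, capacity=1000, retrain_interval=100):
            self.window = deque(maxlen=capacity)  # oldest examples are discarded
            self.retrain_interval = retrain_interval
            self.steps = 0

        def add(self, state, action, target):
            """Store one training example; return True when it is time to retrain."""
            self.window.append((state, action, target))
            self.steps += 1
            return self.steps % self.retrain_interval == 0

    def retrain(network, cache):
        """Retrain the function approximator on the whole window.
        `network.train_batch` is a hypothetical method standing in for a
        Cascade 2 batch training (and possibly neuron-adding) step."""
        network.train_batch(list(cache.window))

Because the window discards the oldest examples, the batch algorithm always trains on targets produced by a recent policy, which is the property that lets a batch method track the non-stationary targets of incremental reinforcement learning.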