
Reinforcement Learning

DOI: 10.1007/978-1-4899-7687-1_720

4 Citations • 2017

Peter Stone

journal unavailable

This exercise asks for an explanation of the oscillations and spikes in the early part of the learning curve for the optimistic initial-value method, and of what makes the method perform differently on particular early plays.

Abstract

2. Consider the optimistic initial value example (Fig. 2.4 in Sutton and Barto's book, using the numbering of the first edition). The curve represents averages over 2000 individual, randomly chosen 10-armed bandit tasks, so the result should be reliable. How do you explain the oscillations and spikes in the early part of the curve for the optimistic method? What makes this method perform differently on particular early plays?
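One way to explore the question is to reproduce the early part of the curve. The following is a minimal sketch, not the book's own code. It assumes the usual setup for this example: true action values and rewards drawn from unit-variance Gaussians, a purely greedy agent, initial estimates of 5, and a constant step size of 0.1. The function name `optimistic_greedy_curve` and all parameter names are hypothetical.

```python
import numpy as np


def optimistic_greedy_curve(n_tasks=2000, n_arms=10, n_steps=30,
                            q_init=5.0, alpha=0.1, seed=0):
    """Fraction of tasks choosing the optimal arm on each play.

    Assumed setup (not stated in the exercise text): Gaussian bandits,
    greedy selection, optimistic initial estimate q_init, step size alpha.
    """
    rng = np.random.default_rng(seed)
    # True action values: one row per task, one column per arm.
    q_true = rng.normal(0.0, 1.0, size=(n_tasks, n_arms))
    best = q_true.argmax(axis=1)
    # Every estimate starts well above any plausible reward.
    q_est = np.full((n_tasks, n_arms), q_init)
    rows = np.arange(n_tasks)
    frac_optimal = np.empty(n_steps)
    for t in range(n_steps):
        # Greedy choice; np.argmax breaks ties toward the lowest index.
        action = q_est.argmax(axis=1)
        reward = rng.normal(q_true[rows, action], 1.0)
        # Constant step-size update of the chosen arm's estimate only.
        q_est[rows, action] += alpha * (reward - q_est[rows, action])
        frac_optimal[t] = np.mean(action == best)
    return frac_optimal


if __name__ == "__main__":
    curve = optimistic_greedy_curve()
    for play, frac in enumerate(curve, start=1):
        print(f"play {play:2d}: optimal action chosen in {frac:.3f} of tasks")
```

Under these assumptions, almost any first reward pulls an arm's estimate below 5, so the first ten plays tend to try each arm once. On play 11 every estimate reflects exactly one sample, and the greedy choice favors the arm whose single reward was largest. Comparing the printed fractions for plays 10, 11, and 12 is a useful starting point for answering the exercise.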