Reinforcement Learning

4 Citations•2017•
Peter Stone
journal unavailable

This exercise asks for an explanation of the oscillations and spikes in the early part of the learning curve for the optimistic initial values method, and of what makes the method perform differently on particular early plays.

Abstract

2. Consider the optimistic initial values example (Fig. 2.4 in Sutton and Barto’s book, using the numbering of the first edition). The curve is an average over 2000 individual, randomly chosen 10-armed bandit tasks, so the result should be reliable. How do you explain the oscillations and spikes in the early part of the curve for the optimistic method? What makes this method perform differently on particular early plays?
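
One way to explore the question is to reproduce the early part of the curve in simulation. The sketch below is a minimal Python version that uses only the standard library. It relies on assumptions the exercise text does not state: the settings of the book's 10-armed testbed (true action values and rewards drawn from unit-variance normal distributions, initial estimates of +5, a constant step size of 0.1, and purely greedy selection with random tie-breaking). The function names `run_optimistic_bandit` and `percent_optimal` are hypothetical and chosen only for illustration.

```python
import random


def run_optimistic_bandit(n_arms=10, n_steps=30, q_init=5.0, alpha=0.1, rng=None):
    """Run one greedy agent with optimistic initial estimates on one bandit task.

    Returns a list of booleans: whether the optimal arm was chosen on each play.
    """
    rng = rng or random.Random()
    # Assumed testbed: true action values drawn from a unit normal distribution.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    optimal_arm = max(range(n_arms), key=lambda a: true_values[a])
    # Optimistic start: every estimate begins well above any likely reward.
    estimates = [q_init] * n_arms
    picked_optimal = []
    for _ in range(n_steps):
        best = max(estimates)
        # Greedy selection, breaking ties uniformly at random.
        candidates = [a for a, q in enumerate(estimates) if q == best]
        arm = rng.choice(candidates)
        # Assumed reward model: unit-variance noise around the true value.
        reward = rng.gauss(true_values[arm], 1.0)
        # Constant step-size update of the chosen arm's estimate.
        estimates[arm] += alpha * (reward - estimates[arm])
        picked_optimal.append(arm == optimal_arm)
    return picked_optimal


def percent_optimal(n_tasks=2000, n_steps=30, seed=0):
    """Average the optimal-action indicator over many random bandit tasks."""
    rng = random.Random(seed)
    counts = [0] * n_steps
    for _ in range(n_tasks):
        for t, hit in enumerate(run_optimistic_bandit(n_steps=n_steps, rng=rng)):
            counts[t] += hit
    return [100.0 * c / n_tasks for c in counts]


if __name__ == "__main__":
    for play, pct in enumerate(percent_optimal(), start=1):
        print(f"play {play:2d}: {pct:5.1f}% optimal")
```

Printing the percentage for each of the first 30 plays makes it possible to line the numbers up against the number of arms and see on which plays the method behaves differently. Changing `q_init` or `alpha` shows how sensitive the early pattern is to those assumed settings.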