Bandits with Rcpp
Question: This is a second attempt at correcting my earlier version, which lives here. I am translating the epsilon-greedy algorithm for multi-armed bandits to Rcpp. A summary of the code is as follows. We have a set of arms, each of which pays out a reward with a pre-defined probability. Our job is to show that mixing random draws from the arms with intermittent draws of the arm that currently has the best estimated reward eventually lets us converge on the best arm. The original algorithm can be found here.
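To make the summary concrete, here is a minimal sketch of epsilon-greedy in plain standard C++. It is not the original algorithm linked above and it does not use the Rcpp API. The names `BanditResult` and `epsilon_greedy`, the 0/1 (Bernoulli) reward, the running-mean update, and the tie-breaking rule (lowest index wins) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type for this sketch (not taken from the original code).
struct BanditResult {
    std::vector<double> estimates;  // running mean reward per arm
    std::vector<int> pulls;         // number of times each arm was drawn
};

// Epsilon-greedy on arms that pay 1 with probability probs[i], else 0
// (assumed reward model). With probability epsilon a random arm is drawn;
// otherwise the arm with the highest estimate so far is drawn.
BanditResult epsilon_greedy(const std::vector<double>& probs, double epsilon,
                            int n_steps, unsigned seed) {
    assert(!probs.empty());
    assert(epsilon >= 0.0 && epsilon <= 1.0);

    const std::size_t k = probs.size();
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick_arm(0, k - 1);

    BanditResult res{std::vector<double>(k, 0.0), std::vector<int>(k, 0)};

    for (int t = 0; t < n_steps; ++t) {
        std::size_t arm = 0;
        if (unif(rng) < epsilon) {
            arm = pick_arm(rng);  // explore: uniform random arm
        } else {
            // exploit: best estimate so far, ties go to the lowest index
            for (std::size_t i = 1; i < k; ++i) {
                if (res.estimates[i] > res.estimates[arm]) arm = i;
            }
        }
        // Simulate the payout of the chosen arm.
        const double reward = (unif(rng) < probs[arm]) ? 1.0 : 0.0;
        // Incremental update of the running mean for that arm.
        res.pulls[arm] += 1;
        res.estimates[arm] += (reward - res.estimates[arm]) / res.pulls[arm];
    }
    return res;
}
```

Calling something like `epsilon_greedy({0.1, 0.5, 0.9}, 0.1, 10000, 42u)` and inspecting `pulls` should show most draws going to the last arm. In an Rcpp version the same loop would presumably take a `NumericVector` and draw from R's RNG instead of `std::mt19937`, but that part is not shown here.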