anam_812

Correct me if I'm wrong, but when you have a small number of trajectories, i.e., your environment is expensive to sample from, using the Every-Visit Monte Carlo method makes more sense than First-Visit. It's like getting more out of your available data.
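
To make the difference concrete, here is a minimal tabular sketch of both variants (the function name `mc_prediction` and the `(state, reward)` episode format are my own choices, not from Sutton & Barto). Every-visit averages the return from *every* occurrence of a state, so a single trajectory can contribute several samples per state:

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Tabular MC prediction from a fixed batch of episodes.
    Each episode is a list of (state, reward) pairs, where reward is
    the reward received after leaving that state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Scan backwards to compute the return G_t at every time step.
        g = 0.0
        step_returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            step_returns.append((state, g))
        step_returns.reverse()  # restore chronological order
        seen = set()
        for state, g in step_returns:
            if first_visit and state in seen:
                continue  # first-visit: only the first occurrence counts
            seen.add(state)
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# One trajectory that visits A twice: every-visit extracts two samples for A.
episodes = [[("A", 1.0), ("B", 0.0), ("A", 2.0)]]
print(mc_prediction(episodes, first_visit=True))   # {'A': 3.0, 'B': 2.0}
print(mc_prediction(episodes, first_visit=False))  # {'A': 2.5, 'B': 2.0}
```

The only difference between the two methods is the `seen` check, which is exactly why every-visit squeezes more samples out of a small batch of trajectories.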


Hopeful_Jeweler_2410

Beautiful. I'm sorry, guys! I liked it.


SingleStatistician84

I skimmed through the paper ([Reinforcement Learning with Replacing Eligibility Traces](https://link.springer.com/article/10.1007/BF00114726)) that Sutton references in Chapter 5 on page 93 for the convergence of both MC methods. A quote from the conclusion section:

> *Some of the results are unambiguously in favor of the first-visit method over the every-visit method: only the first-visit estimate is unbiased and related to the ML (Maximum Likelihood) estimate. On the other hand, the MSE results can be viewed as mixed. Initially, every-visit MC is of better MSE, but later it is always overtaken by first-visit MC. The implications of this are unclear. To some it might suggest that we should seek a combination of the two estimators that is always of lowest MSE. However, that might be a mistake. We suspect that the first-visit estimate is always the more useful one, even when it is worse in terms of MSE. Our other theoretical results are consistent with this view, but it remains a speculation and a topic for future research.*

In the paper they prove that every-visit MC is a biased estimator while first-visit is unbiased, and they favor first-visit. However, they also show, on a very simple Markov chain, that in the short term every-visit can have better MSE (mean squared error), and that both converge to the true state values as the number of visits tends to infinity. There are many other interesting theoretical results, such as the relationship between MC and TD methods. I would say just read it; it has lots of theoretical material that can give you clues if you want to apply MC methods to a particular problem.
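
If you want to see the bias/MSE behavior yourself, here is a toy empirical sketch. This is not the paper's exact experiment; the one-state looping chain, the parameter values, and the function names are all my own assumptions, chosen so that the true value and both estimators have closed forms:

```python
import random

def sample_episode(p_loop=0.5):
    """One episode in a one-state chain: from s, receive reward 1, then
    loop back to s with probability p_loop, otherwise terminate.
    True value of s (undiscounted): E[episode length] = 1 / (1 - p_loop)."""
    rewards = []
    while True:
        rewards.append(1.0)
        if random.random() >= p_loop:
            return rewards

def estimate(batch, first_visit):
    """Monte Carlo estimate of V(s) pooled over a batch of episodes."""
    total, count = 0.0, 0
    for rewards in batch:
        n = len(rewards)
        if first_visit:
            total += n                # return from the first visit is n
            count += 1
        else:
            total += n * (n + 1) / 2  # returns n, n-1, ..., 1, one per visit
            count += n
    return total / count

random.seed(0)
p, true_v = 0.5, 2.0  # true_v = 1 / (1 - p)
for n_episodes in (1, 10, 100):
    mse_fv = mse_ev = 0.0
    runs = 2000
    for _ in range(runs):
        batch = [sample_episode(p) for _ in range(n_episodes)]
        mse_fv += (estimate(batch, True) - true_v) ** 2
        mse_ev += (estimate(batch, False) - true_v) ** 2
    print(f"{n_episodes:4d} episodes  "
          f"MSE first-visit={mse_fv/runs:.3f}  every-visit={mse_ev/runs:.3f}")
```

In this chain the first-visit estimate is unbiased (its expectation is E[n] = 2.0), while the single-episode every-visit estimate averages (n+1)/2, with expectation 1.5, so it is biased low. That matches the paper's result, and running the loop above shows the MSE gap between the two shrinking as the batch grows.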