Machine learning is the computational study of algorithms that improve their performance with experience, and this book covers the basic issues of artificial intelligence. Individual sections introduce the basic concepts and problems in machine learning, describe algorithms, and discuss adaptations of the learning methods to more complex problems.

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to ...

Outline
- Bandit problems and applications
- Bandits with small set of actions
  - Stochastic setting
  - Adversarial setting
- Bandits with large set of actions
  - unstructured set
  - structured set
    - linear bandits
    - Lipschitz bandits
    - tree bandits
- Extensions
Jean-Yves Audibert, Introduction to ...

For a one-armed bandit problem, only arm 1 is unknown, with some multiple prior beliefs $C$; the random payoff is simply $X_t = X_t^1$, and the stochastic process is $(X_1, \dots, X_T)$. Let $\lambda$ be the constant per-period payoff given by arm 2. Hence, I can denote a one-armed bandit problem by ...
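To make the one-armed setup a little more concrete, here is a minimal simulation sketch. It rests on several assumptions that are not in the text above: arm 1 pays a Bernoulli payoff with a single fixed success probability `p` (the multiple-prior set C is not modelled at all), and the agent follows a simple explore-then-commit rule. The function name and all parameters are hypothetical.

```python
import random


def one_armed_bandit(p, lam, horizon, n_explore, seed=0):
    """Explore-then-commit on a one-armed bandit (illustrative sketch).

    Assumptions for illustration only:
    - arm 1 pays 1 with probability p and 0 otherwise (p is hidden from the agent);
    - arm 2 pays the known constant lam in every period.
    Returns (total payoff, whether the agent stayed with arm 1).
    """
    rng = random.Random(seed)
    n_explore = min(n_explore, horizon)

    # Exploration phase: always pull the unknown arm 1.
    payoffs = [1.0 if rng.random() < p else 0.0 for _ in range(n_explore)]

    # Commit: stay with arm 1 only if its empirical mean is at least lam.
    stay = n_explore > 0 and sum(payoffs) / n_explore >= lam

    # Exploitation phase: keep pulling arm 1, or take the safe payoff lam.
    for _ in range(horizon - n_explore):
        if stay:
            payoffs.append(1.0 if rng.random() < p else 0.0)
        else:
            payoffs.append(lam)

    return sum(payoffs), stay
```

A longer exploration phase gives a more reliable estimate of arm 1, but it costs payoff whenever arm 1 turns out to be worse than the constant payoff of arm 2.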

The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing ...)
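One common way to illustrate this trade-off is an epsilon-greedy rule. The sketch below is only an example of that technique, not something the excerpt prescribes. It assumes Bernoulli arms with fixed means, and the function name and parameters are hypothetical.

```python
import random


def epsilon_greedy(arm_means, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms (illustrative sketch).

    arm_means are the true success probabilities, hidden from the agent.
    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    estimates = [0.0] * k
    total = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: play the arm with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(k), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the played arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return counts, estimates, total
```

With `epsilon=0` the agent only exploits. On `arm_means=[0.0, 1.0]` it never tries the second arm, because the first arm wins every tie at an estimate of zero. That is exactly the failure that exploration is meant to prevent.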

Adding new arms to a bandit problem isn't an issue for most bandit algorithms; any of the common algorithms will handle it just fine. Arms disappearing is more interesting, as that affects the explore/exploit tradeoff. It's been a while since I studied bandit algorithms, but "Mortal multi-armed bandits" is one paper that addresses ...
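As a rough sketch of why new arms are easy to handle, here is a UCB1-style learner that keeps its statistics in dictionaries, so arms can be added or dropped at any time. This is not the method from the paper mentioned above. The class and method names are hypothetical, and removal here simply forgets the arm, which ignores the subtler effect that arm lifetimes have on the explore/exploit balance.

```python
import math


class DynamicUCB1:
    """UCB1-style bandit whose arm set can change over time (sketch)."""

    def __init__(self):
        self.counts = {}  # arm id -> number of pulls
        self.means = {}   # arm id -> running mean reward

    def add_arm(self, arm_id):
        # A new arm starts with no pulls, so select() tries it right away.
        self.counts[arm_id] = 0
        self.means[arm_id] = 0.0

    def remove_arm(self, arm_id):
        # Forget the arm entirely; removing an unknown arm is a no-op.
        self.counts.pop(arm_id, None)
        self.means.pop(arm_id, None)

    def select(self):
        if not self.counts:
            raise ValueError("no arms available")
        # Play every untried arm once before using the index.
        for arm_id, n in self.counts.items():
            if n == 0:
                return arm_id
        total = sum(self.counts.values())
        # UCB1 index: empirical mean plus an exploration bonus.
        return max(
            self.counts,
            key=lambda a: self.means[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm_id, reward):
        if arm_id not in self.counts:
            return  # the arm disappeared before its reward arrived
        self.counts[arm_id] += 1
        self.means[arm_id] += (reward - self.means[arm_id]) / self.counts[arm_id]
```

A typical loop calls `select()`, observes a reward, and passes it to `update()`. `add_arm()` and `remove_arm()` can be called between rounds as the set of options changes.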