Here’s why I don’t like the term “multi-armed bandit” to describe the exploration-exploitation tradeoff of inference and decision analysis.

First, and less importantly, each slot machine (or “bandit”) only has one arm. Hence it’s many one-armed bandits, not one multi-armed bandit.

Second, the basic strategy in these problems is to play on lots of machines until you find out which is the best, and then concentrate your plays on that best machine. This all presupposes that either (a) you’re required to play, or (b) at least one of the machines has positive expected value. But with slot machines, they all have negative expected value for the player (that’s why they’re called “bandits”), and the best strategy is not to play at all. So the whole analogy seems backward to me.
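The explore-then-exploit strategy described above can be sketched in a few lines. This is a minimal epsilon-greedy simulation, not anyone's production algorithm; the payout means are made up for illustration, with one machine given a positive expected value so the strategy has something worth finding.

```python
import random

def epsilon_greedy(true_means, n_plays=10000, epsilon=0.1, seed=0):
    """Explore-then-exploit over K 'machines' with the given true mean payouts."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    totals = [0.0] * k
    for _ in range(n_plays):
        if 0 in counts:
            arm = counts.index(0)  # play each machine once to start
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: try a random machine
        else:
            avgs = [t / c for t, c in zip(totals, counts)]
            arm = avgs.index(max(avgs))  # exploit: play the best so far
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Play concentrates on the one machine with positive expected value.
counts = epsilon_greedy([-0.5, -0.2, 0.3])
assert counts.index(max(counts)) == 2
```

Of course, if every entry in `true_means` is negative, as with real slot machines, the best strategy is `n_plays=0`, which is the point of the complaint above.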

Third, I find the “bandit” terminology obscure and overly cute. It’s an analogy removed at two levels from reality: the optimization problem is not really like playing slot machines, and slot machines are not actually bandits. It’s basically a math joke, and I’m not a big fan of math jokes.

So, no deep principles here, the “bandit” name just doesn’t work for me. I agree with Bob Carpenter about the iid part (although I do remember a certain scene from The Grapes of Wrath with a non-iid slot machine), but other than that, the analogy seems a bit off.

The term “bandit” is just way too well-established to change at this point. These complaints about it make sense, but there have been probably thousands of papers written about “bandits” and it would just confuse the literature to switch to another name. Or even worse, no name. The stance in BDA that “empirical Bayes” shouldn’t be used because it implies other Bayesian methods aren’t empirical is a similar example. While I agree this is a crappy name, the main utility of a name is to label when the same idea is being discussed. If people had called bandits “left-orthogonal fluffy ponies” for this long, we should just stick with it.

This came up because Lauren, Andrew, and I are writing a case study on sequential design or reinforcement learning or bandits, depending on how you want to talk about it. I frequently run into terminology issues with Andrew. I’m usually the pickiest person in the room about this kind of thing. I blame my misspent youth working on natural language semantics; I believe Andrew just wants to make the world a better place terminologically.

Andrew’s also told me he finds “MAP” too jargony. I don’t get this one, because every technical term is jargon by definition. It’s not like “posterior mode” is less jargony or even less Latinate. Maybe it’s that “A” which should be an “a”?

I get why Andrew and Jennifer wrote a long explanation in their regression book about the term “random effects” being too imprecise. I get this one. And it’s easy to say X-level effects for the relevant X, as in patient-level effects or individual education-level effects, etc. I just have to run an ongoing translation in my brain from the common language of (non-)linear mixed-effects models (the acronym source for NONMEM and lme4).

Bob:

I don’t like the term “MAP estimator” because it’s presented as an “estimator” rather than as a summary of the posterior distribution. If it were called “MAP summary,” I’d be ok with it.

Justin:

You write, “there have been probably thousands of papers written about ‘bandits’ and it would just confuse the literature to switch to another name.”

If we’re just talking about math problems, then, fine, the name can be whatever. My problem, though, is with applications. My concern is that most of these thousands of papers are solving various math problems that are irrelevant to real applied concerns.

Here’s what often happens in the statistical literature: People start with a real applied problem. Then they abstract a relevant math problem and solve it. So far, so good. But then you’ll get thousands of papers elaborating the math problem, yielding lots of solutions that are irrelevant or even counterproductive to the applied problem they’re supposed to be solving. Names can make a difference in that they can give misleading impressions of what problems we should be working on.

I totally agree that people start with an applied problem, abstract it, solve it, and then keep changing the math problem in ways that make the solutions less relevant to the original problem. I’m not so sure that’s a bad thing, though! “Bandits” are used to solve a huge range of fairly disconnected applied problems. I think it makes more sense to organize the ideas in terms of the mathematical concepts, rather than the application. At least, for topics like bandits where the mathematical problem is so damnably subtle, all sorts of seemingly disparate applications end up tapping into the same pool of existing research.

If they’re all separate, playing one machine won’t change anything on another.

But if each machine/arm is a part of a genome with fixed values in certain loci, then each one selects from all the others’ ranges. If you’re homing in on a good solution for one, hopefully it will improve the reward from another.

“This all presupposes that either (a) you’re required to play”

If the game is life, well… I suppose one isn’t actually required to play, but those that don’t feel the obligation won’t be part of the phenomenon under study for long – or we won’t ever notice them!

Each pull of the arm of a bandit is a kind of dice throw, but combining randomness with an overall long-term characteristic outcome.

” (b) at least one of the machines has positive expected value”

The winning one usually does!

I think it can make sense to play, even if all “bandits” have negative expected value. The expected value may take some time to realize, so it isn’t necessarily relevant to the decision. There’s a literature in evolution and ecology about stochastic switching in which each “bandit” has negative expected value (long-term growth rate, e.g. the geometric mean of per-annum productivity in the simplest models), but a strategy that disperses among “bandits” can still have a positive long-term growth rate.

Example with citations to older lit: https://www.journals.uchicago.edu/doi/10.1086/599296

So technically, in very special cases, it makes sense to play even if all “bandits” have negative expected value alone. There must be an economics lit observing the same result but using different terms. Portfolio theory maybe.
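The diversification effect described above can be checked exactly with a toy example. The payout factors here are invented for illustration: each “machine” doubles your stake or cuts it to 0.4x with equal probability, independently each round. Alone, each machine has a negative long-term growth rate, yet splitting the stake 50/50 across two of them and rebalancing every round grows.

```python
import math
from itertools import product

# Hypothetical payout factors: each round a machine either doubles your
# stake or cuts it to 0.4x, each with probability 1/2, independently.
factors = (2.0, 0.4)

# A single machine alone: the long-run growth rate is the mean log factor.
solo = sum(math.log(f) for f in factors) / 2
assert solo < 0  # each machine shrinks your stake in the long run

# Split the stake 50/50 across two independent machines, rebalancing every
# round: the per-round factor is the average of the two machines' draws.
mixed = sum(math.log((f1 + f2) / 2) for f1, f2 in product(factors, repeat=2)) / 4
assert mixed > 0  # yet the diversified strategy grows
```

This is the “volatility pumping” effect from portfolio theory: averaging the factors before taking logs beats averaging the logs, by Jensen’s inequality.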

In any event, I think that all of this reinforces your point about the badness of the “bandit” metaphor.

Perhaps related: Ole Peters’s (and Murray Gell-Mann’s) work on ergodicity and expected utility theory. The ensemble expectation is not the same as the time average. Brilliant stuff.

(yet) It’s been that for years!

Touching a nerve here! In recent years, “multi-armed bandits” have been hailed as a solution for randomized controlled testing. The argument is that traditional fixed-sample tests (“A/B tests”) are all exploration, no exploitation. But the analogy to slot machines is, to be generous, inexact. In a slot machine, the odds are fixed but unobserved; in an A/B test, the odds are not fixed, and depend on lots of factors, like time of day and the demographics of web visitors. In the “multi-armed bandit,” the player chooses which machines to play, and plays multiple machines all at once; in an A/B test, the player is assigned exactly one of the machines at random, and not allowed to play the others. In the “multi-armed bandit,” it’s a single player moving around multiple machines; in the A/B test, it’s lots of different players, each playing a single machine. And I’m only touching the surface!
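For concreteness, the “bandit” alternative usually proposed for A/B testing is Thompson sampling: assign each visitor to the variant whose posterior draw looks best, so traffic shifts toward the winner as evidence accumulates. This is a minimal sketch with made-up conversion rates, not a recommendation; note it simulates exactly the single-player-choosing-machines setup criticized above.

```python
import random

def thompson_bernoulli(true_rates, n_visitors=2000, seed=1):
    """Adaptively assign visitors to variants via Thompson sampling:
    pick the variant with the highest draw from its Beta posterior."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [1] * k    # Beta(1, 1) uniform priors
    losses = [1] * k
    assigned = [0] * k
    for _ in range(n_visitors):
        draws = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        arm = draws.index(max(draws))
        assigned[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return assigned

# Most traffic flows to the better variant instead of a fixed 50/50 split.
assigned = thompson_bernoulli([0.05, 0.15])
assert assigned[1] > assigned[0]
```

A fixed A/B test would instead assign each visitor by a coin flip regardless of the accumulating data, which is exactly the exploration-only behavior the bandit advocates complain about.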