A jar of beads meets human certainty
I was watching an especially populist and ridiculous edition of the BBC’s science show Horizon when I saw an experiment performed which captured my imagination. It is deceptively simple, but mathematically and psychologically intriguing, and asks some interesting questions about how and why humans make decisions in the way they do. The experiment runs like this:
A psychologist shows you two jars of beads. One is mostly red: 80 reds to 20 blue ones. The other is the opposite: 80 blue, 20 red. She hides the jars under the table, and starts drawing beads out from one of them, one at a time. It’s your job to decide which jar the beads are being pulled from. When you’re sure, you tell her to stop, and you tell her which jar the beads were coming from.
So, when should you tell her to stop? Well, it obviously depends on the combination of beads which comes out. If the first ten beads are all red, it’s pretty likely to be the mostly-red jar, right? But how long should you wait? How many reds in a row should it take to convince you? There are two different answers: the cold mathematical one, and the fuzzy psychological one, and they’re both interesting in their respective ways.
One thing is for sure: you’ll definitely know one way or the other by bead 41. By that time, at least 21 of one colour will have been pulled out, and since each jar contains only 20 of its minority colour, it’s a dead cert they came from the jar where that colour dominates. In fact, 21 of the dominant colour will almost certainly come out well before that and leave you sure.
However, we can do much better than that. The probability of a given jar being responsible for a given sequence of balls can be ascertained using Bayes’ theorem, a mathematical way of quantifying how sure you can be given limited information. The limited information we have in this case is that we know one of two different jars is being used (I have assumed that each jar will be used half the time, at random), and we know the ratio of red to blue beads in each.
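As a sketch of the calculation (assuming draws with replacement, which is a close approximation for short runs from 100-bead jars, and a 50:50 prior over the two jars; the function name is my own):

```python
def p_red_jar(reds: int, blues: int) -> float:
    """Posterior probability that the mostly-red jar (80 red : 20 blue)
    is in use, given the beads drawn so far.

    Assumes draws with replacement and a 50:50 prior over the two jars,
    so by Bayes' theorem the prior cancels and only the likelihood
    ratio of the two jars matters."""
    like_red = 0.8 ** reds * 0.2 ** blues   # P(this sequence | red jar)
    like_blue = 0.2 ** reds * 0.8 ** blues  # P(this sequence | blue jar)
    return like_red / (like_red + like_blue)
```

Notice that only the surplus of one colour over the other matters: one extra red gives 80%, two extra reds give about 94%, and equal numbers leave you at 50%.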
It’s fairly simple to get a computer to draw up a table of possibilities for you. Here, you can look up the number of red beads you’ve seen across the top, and the number of blue down the side, up to a total of ten beads altogether. The percentage shows how likely it is to be the relevant jar: so 80% on a blue square means it’s 80% likely to be the blue jar, and so on.
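A minimal sketch of how such a table could be generated (assuming draws with replacement and a 50:50 prior over the jars; the helper name `certainty` is mine):

```python
def certainty(reds: int, blues: int) -> float:
    """Confidence in the likelier jar, given the beads seen so far
    (draws with replacement, 50:50 prior over the two jars)."""
    like_red = 0.8 ** reds * 0.2 ** blues
    like_blue = 0.2 ** reds * 0.8 ** blues
    return max(like_red, like_blue) / (like_red + like_blue)

# Red beads across the top, blue beads down the side, up to ten in total.
print("    " + "".join(f"{r:>7}" for r in range(11)))
for blues in range(11):
    row = "".join(f"{certainty(reds, blues):>7.1%}" for reds in range(11 - blues))
    print(f"{blues:>3} " + row)
```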
What does that table tell us? Well…
- Firstly, and obviously, if you’ve got an equal number of each type of bead, you don’t know which jar is being used.
- Having one extra bead of one colour doesn’t help much. In fact, after two alternately-coloured beads are drawn, it’s like you’ve reset the experiment, but with slightly altered odds: rather than picking from a jar split 80:20 red:blue, it’s changed to 79:19, which is almost the same.
- Having two or more surplus red (or blue) beads leads you to near-certainty (> 94%) that it’s the red (or blue) jar.
So where, rationally, should you stop? Well, that depends on the level of certainty you want, which is left (presumably intentionally) vague in this experiment. If that level is 80%, you should choose after the first bead, but if you want to be 95% sure, you’ll need at least three beads, and possibly more. You can plot the expected level of certainty against number of beads drawn:
Drawing more beads makes it more likely you’ll be more certain, obviously. This is mainly because, as the draws go on, it becomes increasingly unlikely that the two colours will remain evenly balanced (see the table in the section about checking for hoaxes).
Also, bizarrely, the maths seems to suggest that your likely level of certainty only actually increases every other go—getting an extra bead to make an even number might clinch it, but it’s just as likely not to. If anyone can think of an intuitive explanation for this apparent quirk of the numbers (or point out what I’ve done wrong…), I’d be pleased to hear from you!
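The quirk can be checked exactly. Here’s a sketch (with-replacement approximation, 50:50 prior; `expected_certainty` is my own name) which averages the confidence in the likelier jar over every possible draw sequence:

```python
from math import comb

def expected_certainty(n: int, p: float = 0.8) -> float:
    """Expected confidence in the likelier jar after n draws.

    By symmetry we can condition on the red jar being in use:
    the number of reds among n draws is then Binomial(n, p)."""
    total = 0.0
    for reds in range(n + 1):
        blues = n - reds
        prob = comb(n, reds) * p**reds * (1 - p)**blues  # P(reds | red jar)
        like_red = p**reds * (1 - p)**blues
        like_blue = (1 - p)**reds * p**blues
        total += prob * max(like_red, like_blue) / (like_red + like_blue)
    return total

for n in range(1, 7):
    print(n, round(expected_certainty(n), 4))
```

Sure enough, the expectation is 0.8 after one or two beads, 0.896 after three or four, and so on. One plausible intuition: your confidence only truly jumps when a tie is broken, which can only happen on an odd-numbered draw; when you already hold a majority, the next bead’s expected effect on your confidence cancels out exactly.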
The jar-of-beads task is, in fact, a psychological test administered to patients suspected of having schizophrenia. Some 40–70% of schizophrenics decide which jar is being used after just one bead, whereas normal people, or patients with other psychological conditions, take longer. In particular, patients with obsessive-compulsive disorder often want to see many beads before making the call.
This is called ‘jumping to conclusions bias’, and is part of the reason that schizophrenics have paranoid delusions and believe implausible hallucinations: they are happy to jump to the conclusion that they really are being stalked by a six-foot ravenous bunny rabbit.
The question which presumably none of the subjects ask is ‘how sure do you want me to be?’ If the answer is ‘80%’, those quick-picking schizophrenics are making the choice in an entirely rational way. Is 80% sure a rational default level of certainty before starting a task? Well, it depends what it is: probably better to be more than 80% sure you’ll make it when you cross the road, but it’s probably a higher degree of certainty than is possible in many social or business decisions you might make. Will John find my red dress or my blue dress sexier? Should I invest in Red Ltd or Blue PLC?
It’s interesting that the human brain is naturally pretty cautious. In the rather bizarre situation of this experiment, what motivates people to see another bead to increase their certainty? How much extra certainty do people perceive they’re getting by waiting for extra beads? What, mathematically, does ‘sure’ mean to most people, and how does it depend on situation? To what extent, and under what circumstances, does human intuition align with true probability?
It’s very difficult to assess the ‘rational’ level of certainty in a situation like this: nothing whatsoever hangs on whether you get it right, apart from the fact that asking to see more beads might make your appointment at the hospital take slightly longer. However, the experiment could be modified to test human perceptions of incentives by adding a reward which shrinks each time you ask to see another bead. Mathematically, if your reward falls faster than your certainty rises, you would be better off adopting the schizophrenic strategy and picking on the first bead. In fact, your average certainty rises pretty slowly, so only a slowly shrinking reward would make waiting worthwhile. The question is what people perceive the levels of certainty in this experiment to be, and whether that perception aligns with the mathematical reality.
Of course, a clinical psychologist doesn’t care whether it’s schizophrenics or non-schizophrenics who are behaving more ‘rationally’: what’s important is that there is a statistically significant difference between the groups which makes this a good test for schizophrenia.
Is it a hoax?
When this experiment is actually performed, the beads are not picked from one of the two jars at all: they’re picked in a pre-set order to make the experiment consistent. You’d have to be pretty paranoid (schizophrenically so, perhaps?) to suspect a kindly psychologist of tricking you in this situation. But when does rash accusation of cheating become rational? When is something just so flukey that you should suspect that it’s been engineered?
Checking for foul play in a situation like this calls for careful application of statistics. The odds of getting red, blue, red, blue, blue, red in that order are a fairly teensy 1 in 250. So should we cry foul? Well, not initially: any sequence of three reds and three blues is equally likely, and since the order isn’t important we should really consider them all together. There are 20 such sequences, bringing the chance of getting one of them up to about 1 in 12.
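The arithmetic, as a quick sketch (again approximating the draws as with-replacement; by symmetry the calculation comes out the same under either jar):

```python
from math import comb

# One particular order of 3 reds and 3 blues from the 80:20 red jar.
p_sequence = 0.8**3 * 0.2**3

# Number of ways to arrange 3 reds among 6 draws: C(6, 3) = 20.
n_orders = comb(6, 3)

print(f"one exact order: 1 in {1 / p_sequence:.0f}")
print(f"any 3-3 split:   1 in {1 / (n_orders * p_sequence):.1f}")
```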
If you do a similar calculation for every possible combination of up to twenty beads, you get a table something like this:
Since we don’t know a priori which jar is being used, there are two likely paths, one for each jar. What becomes unlikely as the number of beads drawn increases is an approximately equal number of each colour: eventually, a surplus of one colour or the other should appear, making it apparent which jar is being drawn from.
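A sketch of how quickly a near-tie becomes unlikely (with-replacement approximation; `p_near_tie` is a name of my own invention):

```python
from math import comb

def p_near_tie(n: int, margin: int = 1, p: float = 0.8) -> float:
    """Probability that the red and blue counts differ by at most
    `margin` after n draws from the 80:20 red jar (the blue jar
    gives the same answer by symmetry)."""
    return sum(comb(n, reds) * p**reds * (1 - p)**(n - reds)
               for reds in range(n + 1)
               if abs(2 * reds - n) <= margin)

for n in (2, 6, 10, 20):
    print(n, round(p_near_tie(n), 4))
```

By twenty beads, the chance of the counts still being level is a fraction of a percent.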
So, how can you catch a cheat? Well, predicting something is normally a good method: if you predict an outcome you know to be unlikely, with some reason to believe it might be being engineered, and then it happens, you’ll often be right. For example, if you went into the room thinking ‘bloody psychologist, I bet she will alternately pick out red and blue beads just to unnerve me and slow down my decision’, and then she did, you would probably be justified in crying ‘hoax!’
However, there’s a flaw in this method, if applied indiscriminately: if you go through everyday life constantly expecting unexpected stuff, then eventually you’ll be right. Magician and sceptic James Randi famously wrote a note on the back of a business card every morning which read ‘I am James Randi and I will die today’, followed by his signature and the date, just in case he died in some horrific and unusual way. The transparent lack of psychic ability needed to predict your own death in this way makes it a stinging satire on, say, astrologers—the fact is that if you make enough predictions, you’ll get it right at some point.
The only way you can unambiguously catch a cheat is if you get to observe something many times. If you observed this bead experiment a few times and the psychologist always drew the same or a similar pattern of colours, you could quickly amass enough evidence that there was something fishy afoot.
The problem of hoax detection is particularly acute in the real world, where the underlying probabilities are much harder to predict, and repeating the experiment is often impossible. If the Red party tell us that their tax policy will boost the economy, and the economy does then improve, the Blue party will undoubtedly shake their fists and say it would have happened anyway; a knockout package of evidence either way simply doesn’t exist. It is almost a mathematical guarantee that politics will remain a perpetual rollercoaster of tit-for-tat theatrics.
Hoax detection becomes an almost philosophical problem when studying the entire Universe. It would be nice to be able to take a sample of many universes, and take a survey of number of worlds with life, or strength of gravity in each to give us an idea of whether our universe is a ‘hoax’—not random, but designed as a venue for the creation of the human race by some all-powerful being. Sadly, with only one universe, we don’t know if there is a God—let alone whether he’s red or blue.