I found a paper in a clinical journal about confidence intervals. I’m not going to give the reference, but it was published in 2017, and written by a group of clinicians and methodologists, including a statistician. Its main purpose was to explain confidence intervals to clinical readers – which is undoubtedly a worthwhile aim, as there is plenty of confusion out there about what they are.
I think there is an interesting story here about what understanding people take away from these sorts of papers (of which there are quite a number), and how statements that are arguably OK as written can lead the reader to a totally wrong understanding.
Here’s the definition of confidence intervals that the authors give:
“A 95% confidence interval offers the range of values for which there is 95% certainty that the true value of the parameter lies within the confidence limits.”
That’s the sort of definition you see often, and some people don’t find it problematic, but I think most readers will be misled by it.
The correct definition is that in a long series of replicates, 95% of the confidence intervals will contain the true value, so it’s kind-of OK to say that a 95% CI has a “95% probability of including the true value,” if you understand that means that “95% of the confidence intervals that you could have obtained would contain the true value.”
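The "long series of replicates" idea is easy to check by simulation. Here is a minimal sketch, not anything from the paper: it assumes normally distributed data with a known standard deviation (so the interval is a simple z-interval), and the true mean, sigma, sample size and seed are all made-up values for illustration.

```python
import random
from statistics import NormalDist


def coverage(true_mean=10.0, sigma=2.0, n=25, replicates=10_000, seed=1):
    """Fraction of 95% z-intervals (sigma assumed known) that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)       # two-sided 95% critical value, about 1.96
    half_width = z * sigma / n ** 0.5     # identical for every replicate when sigma is known
    hits = 0
    for _ in range(replicates):
        # One replicate of the "study": n observations and their mean.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / replicates


print(f"Proportion of intervals containing the true value: {coverage():.3f}")
```

The proportion should come out close to 0.95. Note what is being counted: intervals that contain the fixed true value, not true values that fall in one fixed interval.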
Where I think this definition goes wrong is in using the definite article: “THE range of values for which there is 95% certainty…” That seems to be saying pretty clearly that we can conclude that there is a 95% probability that the true value is in this specific range. I’m pretty sure that is what most people would understand, and the next logical step is that if there is 95% probability of the true value being in this range, then when we replicate the study many times, we will find an estimate in this range 95% of the time.
That’s completely wrong – the probability that a replicate’s estimate falls inside a given 95% CI varies depending on exactly where the CI falls in relation to the true value. If you’ve got a CI that happens to be extreme, the probability of a replication’s estimate landing in that range might be very low. On average it’s around 83% (see Cumming & Maillardet 2006, ref below).
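To put some numbers on this, here is a rough sketch under simplifying assumptions of my own: normal data, known standard deviation, and a replication of the same size as the original study. The function names and the "offset" parameter (how many standard errors the original estimate sits from the true value) are just for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z = STD_NORMAL.inv_cdf(0.975)   # two-sided 95% critical value, about 1.96


def capture_probability(offset):
    """Chance that a replicate mean lands inside the original 95% CI.

    offset is the distance of the original mean from the true value,
    in standard errors. The replicate mean is distributed around the
    true value, so it is captured when it falls between offset - Z
    and offset + Z standard errors from the true value.
    """
    return STD_NORMAL.cdf(offset + Z) - STD_NORMAL.cdf(offset - Z)


def average_capture_probability():
    """Capture probability averaged over where the original CI might fall.

    The difference between two independent means has variance 2 * se^2,
    and capture happens when that difference is smaller than Z * se.
    """
    return 2 * STD_NORMAL.cdf(Z / 2 ** 0.5) - 1


for offset in (0, 1, 2, 3):
    print(f"offset {offset} SE: capture probability {capture_probability(offset):.3f}")
print(f"average: {average_capture_probability():.3f}")
```

An interval centred exactly on the true value captures replicate means 95% of the time, but the probability drops quickly as the interval drifts away from the true value, and the average over all intervals is close to the figure quoted above.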
The problem is that “95% probability of including the true value” is a property of the population of all possible confidence intervals, and unless we are very careful about language, it’s easy to convey the erroneous meaning that the “95% probability” applies to the one specific confidence interval that we have found. But in frequentist statistics it doesn’t make sense to talk about the probability of a parameter taking certain values; the parameter is fixed but unknown, so it is either in a particular confidence interval or it isn’t. That’s why the definition is as it is: 95% of the possible confidence intervals will include the true value. But we don’t know where along their length the true value will fall, or even whether it is in or out of any particular interval. It’s easy to see that “95% probability of the location of the true value” (which seems to be the interpretation in this paper) can’t be right; replications of the study will each have different data and different confidence intervals. These cannot all show the location of the true value with 95% certainty; some of them won’t even overlap!
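The point about intervals that don't even overlap can also be seen in a quick simulation. Again this is only a sketch with invented settings (normal data, known sigma, arbitrary true mean, sample size and seed), not a reconstruction of anything in the paper.

```python
import random
from statistics import NormalDist


def replicate_intervals(true_mean=10.0, sigma=2.0, n=25, replicates=1000, seed=2):
    """Return a list of (lower, upper) 95% z-intervals, one per simulated study."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5
    intervals = []
    for _ in range(replicates):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        intervals.append((sample_mean - half_width, sample_mean + half_width))
    return intervals


intervals = replicate_intervals()
highest_lower = max(lower for lower, upper in intervals)
lowest_upper = min(upper for lower, upper in intervals)

print(f"Highest lower limit: {highest_lower:.2f}")
print(f"Lowest upper limit:  {lowest_upper:.2f}")
if highest_lower > lowest_upper:
    print("At least two of these intervals do not overlap at all.")
```

If the highest lower limit is above the lowest upper limit, there are two intervals from the same population that share no values, so they can't both be "where the true value is with 95% certainty".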
What the authors seem to be doing, without realising it, is using a Bayesian interpretation. This is no surprise; people do it all the time, because it is a natural and intuitive thing to do, and many probably go through an entire career without realising that this is what they are doing. When we don’t know a parameter, it is natural to think of our uncertainty in terms of probability – it makes sense to us to talk about the most probable values, or a range of values with 95% probability. I think this is what people are doing when they talk about 95% probability of the true value being in a confidence interval. They are imagining a probability distribution for the parameter, with the confidence interval covering 95% of it. But frequentist confidence intervals aren’t probability distributions. They are just intervals.
I guess this post ought to have some nice illustrations. I might add some when I’ve got a bit of time.
Cumming, G., Maillardet, R. Psychological Methods 2006, Vol. 11, No. 3, 217–227