All entries for September 2017

## September 21, 2017

### Best sample size calculation ever!

I don't want to start obsessing about sample size calculations, because most of the time they're pretty pointless and irrelevant, but I came across a great one recently.

My award for least logical sample size calculation goes to Mitesh Patel et al., "Intratympanic methylprednisolone versus gentamicin in patients with unilateral Meniere's disease: a randomised, comparative effectiveness trial", The Lancet, 2016, 388: 2753-62.

The background: Meniere's disease causes vertigo attacks and hearing loss. Gentamicin, the standard treatment, improves vertigo but can worsen hearing. So the question is whether an alternative treatment, methylprednisolone, would be better: as good at reducing vertigo, and better in terms of hearing loss. That's not actually what the trial looked at, though; its primary outcome was the frequency of vertigo attacks. You might question the logic here. If gentamicin is already good at reducing vertigo, methylprednisolone may give no improvement on that outcome, or only a small one, but it might cause less hearing loss. So what you want is for methylprednisolone to be better in terms of hearing loss, as long as it's nearly as good as gentamicin at reducing vertigo.

Anyway, the trial used vertigo as its primary outcome, and recruited 60 people, which was its pre-planned sample size. But when you look at the sample size justification, it's all about hearing loss! Er... that's a completely different outcome. They based the sample size of 60 people on "detecting" a difference in hearing of 9 dB (SD 11 dB), i.e. on getting statistical significance if the true difference was 9 dB. Unsurprisingly, the trial didn't find a difference in vertigo frequency.
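To make the mismatch concrete, here is a minimal Python sketch of the textbook normal-approximation formula for the number per group needed to compare two means, n = 2(z₁₋α/2 + z₁₋β)²σ²/δ². Only the 9 dB difference and the 11 dB SD come from the paper; the two-sided α of 0.05, the power values and the helper name `n_per_group` are my own assumptions, and the paper's exact inputs may differ, so this is not a reconstruction of how they arrived at 60.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two means
    (two-sided test, equal group sizes, common standard deviation `sd`).
    Hypothetical helper for illustration only."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)  # round up to whole participants


# 9 dB difference and SD 11 dB from the paper; alpha and power are assumptions
for power in (0.80, 0.90):
    n = n_per_group(delta=9, sd=11, power=power)
    print(f"power {power:.0%}: {n} per group, {2 * n} in total")
```

Whatever values you plug in for α and power, every input is a hearing-loss quantity; nothing in the calculation relates to the frequency of vertigo attacks.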

This seems to be cheating. If you're going to sign up to the idea that it's meaningful to pre-plan a sample size based on a significance test, it seems important that the calculation should relate to the main outcome. Just sticking in a calculation for a different outcome doesn't really seem to be playing the game. I guess it ticks the "sample size calculation included" box, though. It's hard to believe that the lack of logic escaped the reviewers here; or maybe the authors managed to convince them that what they did made sense (in which case, maybe they could get involved in negotiating Brexit?).

Here's their section on sample size, from the paper in The Lancet:

## September 13, 2017

### Confidence (again)

I found a paper in a clinical journal about confidence intervals. I’m not going to give the reference, but it was published in 2017, and written by a group of clinicians and methodologists, including a statistician. Its main purpose was to explain confidence intervals to clinical readers – which is undoubtedly a worthwhile aim, as there is plenty of confusion out there about what they are.

I think there is an interesting story here about what understanding people take away from these sorts of papers (of which there are quite a number), and how statements that are arguably OK can lead the reader to a totally wrong understanding.

Here’s the definition of confidence intervals that the authors give:

“A 95% confidence interval offers the range of values for which there is 95% certainty that the true value of the parameter lies within the confidence limits.”

That’s the sort of definition you often see, and some people don’t find it problematic, but I think most readers will be misled by it.

The correct definition is that, in a long series of replicates, 95% of the confidence intervals will contain the true value, so it’s kind-of OK to say that a 95% CI has a “95% probability of including the true value,” if you understand that to mean “95% of the confidence intervals that you could have obtained would contain the true value.”
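The “long series of replicates” definition is easy to check by simulation. Here is a minimal Python sketch, assuming normally distributed data with a known σ so that the interval is x̄ ± 1.96σ/√n; the function name `coverage` and its defaults (true mean 0, n = 20, 10,000 replicates) are my own illustrative choices. The returned fraction should come out close to 0.95, and that 95% describes the whole collection of intervals, not any one of them.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=0.0, sigma=1.0, n=20, reps=10_000, level=0.95, seed=1):
    """Fraction of replicate z-intervals (known sigma) that contain the true mean mu.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # one replicate study: n observations, then its confidence interval
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(coverage())
```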

Where I think this definition goes wrong is in using the definite article: “THE range of values for which there is 95% certainty…” That seems to be saying pretty clearly that we can conclude that there is a 95% probability that the true value is in this specific range. I’m pretty sure that is what most people would understand, and the next logical step is to conclude that, if we replicate the study many times, the estimate will fall in this range 95% of the time.

That’s completely wrong: the probability that a replicate estimate falls within a particular 95% CI depends on exactly where that CI lies in relation to the true value. If you’ve got a CI that happens to be extreme, the probability of a replication producing an estimate in that range might be very low. On average it’s about 83.4% (see Cumming & Maillardet 2006, ref below).
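Here is a minimal Python sketch of this “capture” probability, assuming a known standard error (z-intervals), with the standard error scaled to 1; the names `capture_given_ci` and `average_capture` are hypothetical. The difference between a replicate mean and the original mean has standard deviation √2 standard errors, while the original interval's half-width is 1.96 standard errors, so on average the capture probability is P(|Z| < 1.96/√2), roughly 0.834. The per-interval function shows how that probability drops when the original CI happens to sit away from the true value.

```python
import random
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

# Average capture probability for a known standard error, whatever sigma and n are
analytic = 2 * NormalDist().cdf(Z95 / sqrt(2)) - 1


def capture_given_ci(xbar, mu=0.0, se=1.0):
    """Probability that one replicate mean lands inside the 95% CI centred on xbar."""
    lo, hi = xbar - Z95 * se, xbar + Z95 * se
    replicate = NormalDist(mu, se)  # sampling distribution of a replicate mean
    return replicate.cdf(hi) - replicate.cdf(lo)


def average_capture(reps=100_000, seed=2):
    """Simulated fraction of replications whose mean falls in the original CI (se = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        original = rng.gauss(0.0, 1.0)  # original study's mean
        replicate = rng.gauss(0.0, 1.0)  # an independent replication's mean
        if abs(replicate - original) < Z95:
            hits += 1
    return hits / reps


print(f"analytic average capture:     {analytic:.3f}")
print(f"simulated average capture:    {average_capture():.3f}")
print(f"CI centred on the true value: {capture_given_ci(0.0):.3f}")
print(f"CI centred 2 SE away:         {capture_given_ci(2.0):.3f}")
```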

The problem is that “95% probability of including the true value” is a property of the population of all possible confidence intervals, and unless we are very careful about language, it’s easy to convey the erroneous meaning that the “95% probability” applies to the one specific confidence interval that we have found. But in frequentist statistics it doesn’t make sense to talk about the probability of a parameter taking certain values; the parameter is fixed but unknown, so it is either in a particular confidence interval or it isn’t. That’s why the definition is as it is: 95% of the possible confidence intervals will include the true value. But we don’t know where along their length the true value will fall, or even whether it is in or out of any particular interval. It’s easy to see that “95% probability of the location of the true value” (which seems to be the interpretation in this paper) can’t be right; replications of the study will each have different data and different confidence intervals. These cannot all show the location of the true value with 95% certainty; some of them won’t even overlap!
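To see that last point concretely, the Python sketch below (again assuming normal data with a known standard error, scaled to 1, and with hypothetical helper names) generates 200 replicate 95% CIs around the same true value, counts how many miss it, and counts how many pairs of intervals don't overlap at all. Two independent intervals of this kind fail to overlap when their estimates differ by more than 2 × 1.96 standard errors, which happens with probability 2(1 − Φ(1.96√2)), a little over 0.5%.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

# Probability that two independent 95% z-intervals are disjoint
analytic_disjoint = 2 * (1 - NormalDist().cdf(Z95 * 2 ** 0.5))


def replicate_cis(k=200, mu=0.0, se=1.0, seed=3):
    """k replicate 95% CIs for a mean mu, each with known standard error se."""
    rng = random.Random(seed)
    cis = []
    for _ in range(k):
        est = rng.gauss(mu, se)
        cis.append((est - Z95 * se, est + Z95 * se))
    return cis


def fraction_non_overlapping(cis):
    """Fraction of all pairs of intervals that share no values at all."""
    pairs = disjoint = 0
    for i in range(len(cis)):
        for j in range(i + 1, len(cis)):
            (lo1, hi1), (lo2, hi2) = cis[i], cis[j]
            pairs += 1
            if hi1 < lo2 or hi2 < lo1:
                disjoint += 1
    return disjoint / pairs


cis = replicate_cis()
missed = sum(not (lo <= 0.0 <= hi) for lo, hi in cis)
frac = fraction_non_overlapping(cis)
print(f"{missed} of {len(cis)} intervals miss the true value 0")
print(f"pairs that don't overlap: {frac:.4f} (analytic {analytic_disjoint:.4f})")
```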

What the authors seem to be doing, without realising it, is using a Bayesian interpretation. This is no surprise; people do it all the time, because it is a natural and intuitive thing to do, and many probably go through an entire career without realising that this is what they are doing. When we don’t know a parameter, it is natural to think of our uncertainty in terms of probability – it makes sense to us to talk about the most probable values, or a range of values with 95% probability. I think this is what people are doing when they talk about 95% probability of the true value being in a confidence interval. They are imagining a probability distribution for the parameter, with the confidence interval covering 95% of it. But frequentist confidence intervals aren’t probability distributions. They are just intervals.
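The distribution people seem to have in mind can be written down, but only by stepping into a Bayesian analysis. Here is a minimal Python sketch, assuming normal data with a known σ and an (improper) flat prior on the mean; the summary numbers and function names are hypothetical. Under those assumptions the posterior for the mean is Normal(x̄, σ/√n), and its central 95% interval has exactly the same endpoints as the usual z confidence interval, which may be part of why the Bayesian reading feels so natural.

```python
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Frequentist z-interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half


def flat_prior_credible_interval(xbar, sigma, n, level=0.95):
    """Central Bayesian interval under a flat prior on the mean:
    the posterior is Normal(xbar, sigma / sqrt(n))."""
    posterior = NormalDist(xbar, sigma / n ** 0.5)
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


xbar, sigma, n = 4.2, 11.0, 30  # hypothetical summary data
print(z_confidence_interval(xbar, sigma, n))
print(flat_prior_credible_interval(xbar, sigma, n))
```

The endpoints coincide only in special cases like this one; with an informative prior, or other models, they generally differ. And even when the numbers agree, only the credible interval is a probability statement about the parameter (given the prior); the confidence interval is still just an interval.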

I guess this post ought to have some nice illustrations. I might add some when I’ve got a bit of time.

Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217–227.