“Something is rotten in the state of Denmark”
The DANISH trial (in which, pleasingly, the D stands for “Danish”, and it was conducted in Denmark too), evaluated the use of Implantable Cardioverter Defibrillators (ICD) in patients with heart failure that was not due to ischaemic heart disease. The idea of the intervention is that it can automatically restart the heart in the event of a sudden cardiac arrest – so it might help these patients, who are at increased risk of their heart stopping suddenly (obviously there is a lot more clinical detail to this).
The trial recruited 1116 patients and found that the primary outcome (death from any cause) occurred in 120/556 (21.6%) in the ICD group and 131/560 (23.4%) in control; a hazard ratio of 0.87, 95% CI 0.68, 1.12. The conclusion was (from the abstract):
“prophylactic ICD implantation … was not associated with a significantly lower long-term rate of death from any cause than was usual clinical care”;
and from the end of the paper:
“prophylactic ICD implantation … was not found to reduce longterm mortality.”
Note, in passing, the subtle change from “no significant difference” in the abstract, which at least has a chance of being interpreted as a statement about statistics, to “not found to reduce mortality” – a statement about the clinical effects. Of course the result doesn’t mean that, but the error is so common as to be completely invisible.
Reporting of the trial mostly put it across as showing no survival improvement, for example:
The main issue in this trial, however, was that the ICD intervention DID reduce sudden cardiac death, which is what the intervention is supposed to do: 24/556 (4.3%) in the ICD group and 46/560 (8.2%) in control, hazard ratio 0.50 (0.31, 0.82). All cardiovascular deaths (sudden and non-sudden) were also reduced in the ICD group, but not by so much: HR 0.77 (0.57, 1.05). You might expect a result like this if the ICD reduced sudden cardiac deaths, but in addition to this both groups have similar risk of non-sudden cardiac death. When all deaths are counted (including cardiac and other causes), the difference in the outcome that the intervention can affect starts getting swamped by outcomes that it doesn’t reduce. The sudden cardiac deaths make up a small proportion of the total, so the overall difference between the groups is dominated by deaths that weren’t likely to differ between the groups, and the difference in all-cause mortality is much smaller (and “non-significant”). So all of the results seem consistent with the intervention reducing the thing it is intended to reduce, by quite a lot, but there also being a lot of deaths due to other causes that aren’t affected by the intervention. To get my usual point in, if Bayesian methods were used, you would find a substantially greater probability of benefit for the intervention for cardiovascular death and all-cause mortality.
All-cause death was chosen as the primary outcome, and following convention, the conclusions are based on this. But the conclusion is sensitive to the choice of primary outcome: if sudden cardiac death had been the primary outcome, the trial would have been regarded as “positive”.
So, finally, to get around to the general issues. It is the convention in trials to nominate a single “primary outcome”, which is used for calculating a target sample size and for drawing the main conclusions of the study. Usually this comes down to saying there was benefit (“positive trial”) if the result gets a p-value of less than 0.05, and not if the p-value exceeds 0.05 (“negative trial”). The expectation is that a single primary outcome will be nominated (sometimes you can get away with two), but that means that the conclusions of the trial will be sensitive to this choice. I think the reason for having a single primary outcome stems from concerns over type I errors if lots of outcomes are analysed. You could them claim a “positive” trial and treatment effectiveness if any of them turned out “significant” – though obviously restricting yourself to a single primary outcome is a pretty blunt instrument for addressing multiple analysis issues.
There are lots of situations where it isn’t clear that a single outcome is sufficient for drawing conclusions from a trial, as in DANISH: the intervention should help by reducing sudden cardiac death, but that won’t be any help if it increases deaths for other reasons – so both sudden cardiac deaths and overall deaths are important. Good interpretation isn’t helped by the conventions (=bad habits) of equating “statistical significance” with clinical importance, and labelling the treatment as effective or not based on a single primary outcome.
Reference for DANISH trial: N Engl J Med 2016; 375:1221-1230, September 29, 2016