All 3 entries tagged <em>Bayesian</em>Simon Gateshttps://blogs.warwick.ac.uk/simongates/tag/bayesian/?atom=atomWarwick Blogs, University of Warwick(C) 2020 Simon Gates2020-01-28T11:24:29ZBayesian trial in the real world by Simon GatesSimon Gateshttps://blogs.warwick.ac.uk/simongates/entry/bayesian_trial_in/2017-11-01T09:18:12Z2017-11-01T09:18:12Z<p>This post arose from a discussion on Twitter about a recently-published randomised trial. Twitter isn’t the best forum for debate so I wanted to summarise my thoughts here in some more detail.</p>
<p>What was interesting about the trial was that it used a Bayesian analysis, but this provoked a lot of reaction on Twitter that seemed to miss the mark a bit. There were some features of the analysis that some people found challenging, and the Bayesian methods tended to get the blame for that, incorrectly in my view.</p>
<p>First, a bit about the trial. It’s this one: <br />
<strong>Laptook et al</strong> <em>Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy. <span class="caps">JAMA 2017</span>; 318(16): 1550-1560.</em></p>
<p>This trial randomised infants with hypoxic ischaemic encephalopathy who were aged over 6 hours to cooling to 33.5 C for 96 hours (to prevent brain injury) or no cooling. Earlier studies have established that cooling started in the first 6 hours after birth reduces death and disability, so it is plausible that starting later might also help, though maybe the effect would be smaller. The main outcome was death or disability at 18 months.</p>
<p>The methodological interest here is that they used a Bayesian final analysis, because they felt that they would only be able to recruit a restricted number of infants, and a Bayesian analysis would be more informative, as it can quantify the probability of the treatment’s benefit, rather than giving the usual significant/non-significant = works/doesn’t work dichotomy.</p>
<p>The main outcome occurred in 19/78 in the hypothermia group and 22/79 in the no hypothermia group. Their analysis used three different priors, a neutral prior (centred on <span class="caps">RR 1</span>.0), an enthusiastic prior, centred on <span class="caps">RR 0</span>.72 (as found in an earlier trial of hypothermia started before 6 hours), and a sceptical prior, centred on <span class="caps">RR 1</span>.10. The 95% interval for the neutral prior was from 0.5 to 2.0, so moderately informative.</p>
<p>The results for the Bayesian analysis with the neutral prior that were presented in the paper were: an adjusted risk ratio of 0.86, with 95% interval from 0.58 to 1.29, and 76% probability of the risk ratio being less than 1.</p>
<p>OK, that’s the background.</p>
<p>Here are some unattributed Twitter reactions:</p>
<blockquote>
<p>“This group (mis)used Bayesian methods to turn a 150 pt trial w P=0.62 into a + result w/ post prob efficacy of 76%!”</p>
</blockquote>
<blockquote>
<p>“I think the analysis is suspicious, it moves the posterior more than the actual effect size in study, regardless which prior chosen <br />
Primary outcome is 24.4% v 27.9% which is RR of 0.875 at best. Even with a weak neutral prior, should not come up with aRR to 0.86<br />
Also incredibly weak priors with high variance chosen, with these assumptions, even a n=30 trial would have shifted the posterior.”</p>
</blockquote>
<p>There were some replies from Bayesian statisticians, saying (basically) no, it looks fine. The responses were interesting to me, as I have frequently said that Bayesian methods would help clinicians to understand results from clinical trials more easily. Maybe that’s not true! So it’s worth digging a bit into what’s going on.</p>
<p>First, on the face of it 19 versus 22 patients with the outcome (that’s 24.4% versus 27.8%) doesn’t look like much of a difference. It’s the sort of difference that all of us are used to seeing described as “non-significant,” followed by a conclusion that the treatment was not effective or something like that. So to see this result meaning a probability of benefit of 76% might look as if it’s overstating the case.</p>
<p>Similarly, the unadjusted risk ratio was about 0. 875, but the Bayesian neutral-prior analysis had RR=0.86; so it looks as though there has been some alchemy in the Bayesian analysis to increase the effect size.</p>
<p>So is there a problem or not? First, the 76% probability of benefit just means 76% posterior probability (based on the prior, model and data) that the risk ratio is less than 1. There’s quite a sizeable chunk of that probability where the effect size is very small and not really much of a benefit, so it’s not 76% probability that the treatment does anything useful. The paper actually reported the probability that the absolute risk difference was >2%, which was 64%, so quite a bit lower.</p>
<p>Second, 76% probability of a risk ratio less than 1 also means 24% probability that it is more than 1, so there is a fairly substantial probability that the treatment isn’t beneficial at all. I guess we are more used to thinking of results in terms of “working” or “not working” and a 76% probability sounds like a high probability of effectiveness.</p>
<p>Third, the point estimate. The critical point here is that the results presented in the paper were adjusted estimates, using baseline measures of severity as covariates. The Bayesian analysis with neutral prior centred on 1 would in fact pull the risk ratio estimate towards 1; the reason the final estimate (0.86) shows a bigger effect than the unadjusted estimate (0.875) is the adjustment, not the Bayesian analysis. The hypothermia group was a bit more severely affected than the control group, so the unadjusted estimate is over-conservative (too near 1), and the covariate adjustment has reduced the risk ratio. So even when pulled back towards 1 by the neutral prior, it’s still lower than the unadjusted estimate.</p>
<p>Another Twitter comment was that the neutral prior was far too weak, and gave too much probability to unrealistic effect sizes. The commenter advocated using a much narrower prior centred on 1, but with much less spread. I don’t agree with that though, mainly because assuming such a prior would be equivalent to assuming more data in the prior than in the actual trial, which doesn’t seem sensible when it isn’t based on actual real data.</p>
<p>The other question about priors is what would be a reasonable expectation based on what we know already? If we believe that early starting of hypothermia gives a substantial benefit (which several trials have found, I think), then it seems totally reasonable that a later start might also be beneficial, just maybe a bit less so. The results are consistent with this interpretation – the most probable risk ratios are around 0.85.</p>
<p>Going further, the division into “early” or “late” starting of hypothermia (before or after 6 hours of age) is obviously artificial; there isn’t anything that magically happens at 6 hours, or any other point. Much more plausible is a decline in effectiveness with increasing time to onset of hypothermia. It would be really interesting and useful to understand that relationship, and the point at which it wasn’t worth starting hypothermia. That would be something that could be investigated with the data from this and other trials, as they all recruited infants with a range of ages (in this trial it was 6 to 24 hours). Maybe that’s an individual patient data meta-analysis project for someone.</p><p>This post arose from a discussion on Twitter about a recently-published randomised trial. Twitter isn’t the best forum for debate so I wanted to summarise my thoughts here in some more detail.</p>
<p>What was interesting about the trial was that it used a Bayesian analysis, but this provoked a lot of reaction on Twitter that seemed to miss the mark a bit. There were some features of the analysis that some people found challenging, and the Bayesian methods tended to get the blame for that, incorrectly in my view.</p>
<p>First, a bit about the trial. It’s this one: <br />
<strong>Laptook et al</strong> <em>Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy. <span class="caps">JAMA 2017</span>; 318(16): 1550-1560.</em></p>
<p>This trial randomised infants with hypoxic ischaemic encephalopathy who were aged over 6 hours to cooling to 33.5 C for 96 hours (to prevent brain injury) or no cooling. Earlier studies have established that cooling started in the first 6 hours after birth reduces death and disability, so it is plausible that starting later might also help, though maybe the effect would be smaller. The main outcome was death or disability at 18 months.</p>
<p>The methodological interest here is that they used a Bayesian final analysis, because they felt that they would only be able to recruit a restricted number of infants, and a Bayesian analysis would be more informative, as it can quantify the probability of the treatment’s benefit, rather than giving the usual significant/non-significant = works/doesn’t work dichotomy.</p>
<p>The main outcome occurred in 19/78 in the hypothermia group and 22/79 in the no hypothermia group. Their analysis used three different priors, a neutral prior (centred on <span class="caps">RR 1</span>.0), an enthusiastic prior, centred on <span class="caps">RR 0</span>.72 (as found in an earlier trial of hypothermia started before 6 hours), and a sceptical prior, centred on <span class="caps">RR 1</span>.10. The 95% interval for the neutral prior was from 0.5 to 2.0, so moderately informative.</p>
<p>The results for the Bayesian analysis with the neutral prior that were presented in the paper were: an adjusted risk ratio of 0.86, with 95% interval from 0.58 to 1.29, and 76% probability of the risk ratio being less than 1.</p>
<p>OK, that’s the background.</p>
<p>Here are some unattributed Twitter reactions:</p>
<blockquote>
<p>“This group (mis)used Bayesian methods to turn a 150 pt trial w P=0.62 into a + result w/ post prob efficacy of 76%!”</p>
</blockquote>
<blockquote>
<p>“I think the analysis is suspicious, it moves the posterior more than the actual effect size in study, regardless which prior chosen <br />
Primary outcome is 24.4% v 27.9% which is RR of 0.875 at best. Even with a weak neutral prior, should not come up with aRR to 0.86<br />
Also incredibly weak priors with high variance chosen, with these assumptions, even a n=30 trial would have shifted the posterior.”</p>
</blockquote>
<p>There were some replies from Bayesian statisticians, saying (basically) no, it looks fine. The responses were interesting to me, as I have frequently said that Bayesian methods would help clinicians to understand results from clinical trials more easily. Maybe that’s not true! So it’s worth digging a bit into what’s going on.</p>
<p>First, on the face of it 19 versus 22 patients with the outcome (that’s 24.4% versus 27.8%) doesn’t look like much of a difference. It’s the sort of difference that all of us are used to seeing described as “non-significant,” followed by a conclusion that the treatment was not effective or something like that. So to see this result meaning a probability of benefit of 76% might look as if it’s overstating the case.</p>
<p>Similarly, the unadjusted risk ratio was about 0. 875, but the Bayesian neutral-prior analysis had RR=0.86; so it looks as though there has been some alchemy in the Bayesian analysis to increase the effect size.</p>
<p>So is there a problem or not? First, the 76% probability of benefit just means 76% posterior probability (based on the prior, model and data) that the risk ratio is less than 1. There’s quite a sizeable chunk of that probability where the effect size is very small and not really much of a benefit, so it’s not 76% probability that the treatment does anything useful. The paper actually reported the probability that the absolute risk difference was >2%, which was 64%, so quite a bit lower.</p>
<p>Second, 76% probability of a risk ratio less than 1 also means 24% probability that it is more than 1, so there is a fairly substantial probability that the treatment isn’t beneficial at all. I guess we are more used to thinking of results in terms of “working” or “not working” and a 76% probability sounds like a high probability of effectiveness.</p>
<p>Third, the point estimate. The critical point here is that the results presented in the paper were adjusted estimates, using baseline measures of severity as covariates. The Bayesian analysis with neutral prior centred on 1 would in fact pull the risk ratio estimate towards 1; the reason the final estimate (0.86) shows a bigger effect than the unadjusted estimate (0.875) is the adjustment, not the Bayesian analysis. The hypothermia group was a bit more severely affected than the control group, so the unadjusted estimate is over-conservative (too near 1), and the covariate adjustment has reduced the risk ratio. So even when pulled back towards 1 by the neutral prior, it’s still lower than the unadjusted estimate.</p>
<p>Another Twitter comment was that the neutral prior was far too weak, and gave too much probability to unrealistic effect sizes. The commenter advocated using a much narrower prior centred on 1, but with much less spread. I don’t agree with that though, mainly because assuming such a prior would be equivalent to assuming more data in the prior than in the actual trial, which doesn’t seem sensible when it isn’t based on actual real data.</p>
<p>The other question about priors is what would be a reasonable expectation based on what we know already? If we believe that early starting of hypothermia gives a substantial benefit (which several trials have found, I think), then it seems totally reasonable that a later start might also be beneficial, just maybe a bit less so. The results are consistent with this interpretation – the most probable risk ratios are around 0.85.</p>
<p>Going further, the division into “early” or “late” starting of hypothermia (before or after 6 hours of age) is obviously artificial; there isn’t anything that magically happens at 6 hours, or any other point. Much more plausible is a decline in effectiveness with increasing time to onset of hypothermia. It would be really interesting and useful to understand that relationship, and the point at which it wasn’t worth starting hypothermia. That would be something that could be investigated with the data from this and other trials, as they all recruited infants with a range of ages (in this trial it was 6 to 24 hours). Maybe that’s an individual patient data meta-analysis project for someone.</p>The future is still in the future by Simon GatesSimon Gateshttps://blogs.warwick.ac.uk/simongates/entry/the_future_is/2017-07-31T16:35:08Z2017-07-18T09:49:01Z<p>I just did a project with a work experience student that involved looking back through four top medical journals for the past year (NEJM, JAMA, Lancet and BMJ), looking for reports of randomised trials. As you can imagine, there were quite a lot - I'm not sure exactly how many because only a subset were eligible for the study we were doing. We found 89 eligible for our study, so there were probably at least 200 in total. </p>
<p>Of all those trials, I saw only ONE that used Bayesian statistical methods. The rest were still doing all the old stuff with null hypotheses and significance testing.</p><p>I just did a project with a work experience student that involved looking back through four top medical journals for the past year (NEJM, JAMA, Lancet and BMJ), looking for reports of randomised trials. As you can imagine, there were quite a lot - I'm not sure exactly how many because only a subset were eligible for the study we were doing. We found 89 eligible for our study, so there were probably at least 200 in total. </p>
<p>Of all those trials, I saw only ONE that used Bayesian statistical methods. The rest were still doing all the old stuff with null hypotheses and significance testing.</p>Why do they say that? by Simon GatesSimon Gateshttps://blogs.warwick.ac.uk/simongates/entry/why_do_they/2015-12-10T13:17:49Z2015-12-09T15:27:02Z<p>A thing I've heard several times is that Bayesian methods might be advantageous for Phase 2 trials but not for Phase 3. I've struggled to understand why people would think that. To me, the advantage of Bayesian methods comes in the fact that the methods make sense, answer relevant questions and give understandable answers, which seem just as important in Phase 3 trials as in Phase 2.</p>
<p>One of my colleagues gave me his explanation, which I will paraphrase. He made two points:</p>
<p><em>1. Decision-making processes are different after Phase 2 and Phase 3 trials; folowing Phase 2 decisions about whether to proceed further are made by researchers or research funders, but after Phase 3 decisons (about use of therapies presumably) are taken by "society" in the form of regulators or healthcare providers. This makes the Bayesian approach harder as it is harder to formulate a sensible prior (for Phase 3 I think he means).</em></p>
<p><em>2. In Phase 3 trials sample sizes are larger so the prior is almost always swamped by the data, so Bayesian methods don't add anything.</em></p>
<p>My answer to point 1: Bayesian methods are about more than priors. I think this criticism comes from the (limited and in my view somewhat misguided) view of priors as a personal belief. That is one way of specifying them but not the most useful way. As Andrew Gelman has said, prior INFORMATION not prior BELIEF. And you can probably specify information in pretty much the same way for both Phase 2 and Phase 3 trials.</p>
<p>My answer to point 2: Bayesian methods aren't just about including prior information in the analysis (though they are great for doing that if you want to). I'll reiterate my reasons for preferring them that I gave earlier - the methods make sense, answer relevant questions and give understandable answers. Why would you want to use a method that doesn't answer the question and nobody understands? Also, If you DO have good prior information, you can reach an answer more quickly by incorporating that in the analysis - which we kind of do by doing trials and then combining them with others in meta-analyses; but doing it the Bayesian way would be neater and more efficient.</p><p>A thing I've heard several times is that Bayesian methods might be advantageous for Phase 2 trials but not for Phase 3. I've struggled to understand why people would think that. To me, the advantage of Bayesian methods comes in the fact that the methods make sense, answer relevant questions and give understandable answers, which seem just as important in Phase 3 trials as in Phase 2.</p>
<p>One of my colleagues gave me his explanation, which I will paraphrase. He made two points:</p>
<p><em>1. Decision-making processes are different after Phase 2 and Phase 3 trials; folowing Phase 2 decisions about whether to proceed further are made by researchers or research funders, but after Phase 3 decisons (about use of therapies presumably) are taken by "society" in the form of regulators or healthcare providers. This makes the Bayesian approach harder as it is harder to formulate a sensible prior (for Phase 3 I think he means).</em></p>
<p><em>2. In Phase 3 trials sample sizes are larger so the prior is almost always swamped by the data, so Bayesian methods don't add anything.</em></p>
<p>My answer to point 1: Bayesian methods are about more than priors. I think this criticism comes from the (limited and in my view somewhat misguided) view of priors as a personal belief. That is one way of specifying them but not the most useful way. As Andrew Gelman has said, prior INFORMATION not prior BELIEF. And you can probably specify information in pretty much the same way for both Phase 2 and Phase 3 trials.</p>
<p>My answer to point 2: Bayesian methods aren't just about including prior information in the analysis (though they are great for doing that if you want to). I'll reiterate my reasons for preferring them that I gave earlier - the methods make sense, answer relevant questions and give understandable answers. Why would you want to use a method that doesn't answer the question and nobody understands? Also, If you DO have good prior information, you can reach an answer more quickly by incorporating that in the analysis - which we kind of do by doing trials and then combining them with others in meta-analyses; but doing it the Bayesian way would be neater and more efficient.</p>