This blog was first published on 26th January 2017 on https://blogs.bmj.com/aim/. At the time I was in Cape Town on holiday, trying to get a rapid response published to the NG59 summary in the BMJ. It was critical of NICE, and I was negotiating over content with a legal expert from BMJ! The response took three weeks to go up, by which time it was too late to be noticed. In the meantime I created a bit of a storm with this blog, and my use of the term ‘old sceptic blogger’ in the title. This is the version edited by BMJ.
So there has been a big response to this paper press released by BMJ on behalf of the journal Acupuncture in Medicine. The response has been influenced by the usual characters – retired professors who are professional bloggers and vocal critics of anything in the realm of complementary medicine. They thrive on flexing their EBM muscles for a baying mob of fellow sceptics (see my ‘stereotypical mental image’ here). Their target in this instance is a relatively small trial on acupuncture for infantile colic. It deserved to be press released by virtue of being the largest to date in the field, but by no means because it gave a definitive answer to the question of the efficacy of acupuncture in the condition. We need to wait for an SR where the data from the 4 trials to date can be combined.
On this occasion I had the pleasure of joining a short segment on the Today programme on BBC Radio 4 led by John Humphrys. My opponent was David Colquhoun, who spent his short air-time complaining that the journal was even allowed to be published in the first place. Why would BBC Radio 4 invite a retired basic scientist and professional sceptic to be interviewed alongside one of the journal editors – a clinician with expertise in acupuncture (WMA)? At no point was it made clear that only one of the two had ever been in a position to try to help parents with a baby that they think cries excessively.
So what about the research itself? I have already said that the trial was not definitive, but it was not a bad trial. It suffered from under-recruitment, which meant that it was underpowered in terms of the statistical analysis. But it was prospectively registered, had ethical approval and the protocol was published. Primary and secondary outcomes were clearly defined, and the only change from the published protocol was to combine the two acupuncture groups in an attempt to improve the statistical power because of under-recruitment. The fact that this decision was made after the trial had begun means that the results have to be considered speculative. For this reason the editors of Acupuncture in Medicine insisted that the language in which the conclusions were framed be altered to reflect this level of uncertainty.
David Colquhoun has focussed on multiple statistical testing and p values. These are important considerations, and we could have insisted on more clarity in the paper. P values are a guide, and the commonly adopted 0.05 level must be interpreted appropriately in the circumstances. In this paper there are no definitive conclusions, so the p values recorded are there to guide future hypothesis generation and trial design. There were over 50 p values reported in this paper, so by chance alone you must expect some to be below 0.05. If one is to claim statistical significance of an outcome at the 0.05 level, ie a 1:20 likelihood of the result happening by chance alone, you can only perform the test once. If you perform the test twice you must reduce the threshold to 0.025 if you want to claim statistical significance of one or other of the tests (the Bonferroni correction).
So now we must come to the predefined outcomes. They were clearly stated, and the results of these are the only ones relevant to the conclusions of the paper. The primary outcome was the relative reduction in total crying time (TC) at 2 weeks. There were two significance tests at this point for relative TC. For a statistically significant result, the p values would need to be less than or equal to 0.025 – neither was this low, hence my comment on the Radio 4 Today programme that this was technically a negative trial (more correctly ‘not a positive trial’ – it failed to reject the null hypothesis, ie that the samples were drawn from the same population and the acupuncture intervention did not change the population treated).
Finally to the secondary outcome – this was the number of infants in each group who continued to fulfil the criteria for colic at the end of each intervention week. There were four tests of significance so we need to divide 0.05 by 4 to maintain the 1:20 chance of a random event, ie only draw conclusions regarding statistical significance if any of the tests resulted in a p value at or below 0.0125. Two of the 4 tests were below this figure, so we say that the result is unlikely to have been chance alone in this case. With hindsight it might have been good to include this explanation in the paper itself, but as editors we must constantly balance how much we push authors to adjust their papers, and in this case the editor focussed on reducing the conclusions to being speculative rather than definitive. A significant result in a secondary outcome leads to a speculative conclusion that acupuncture ‘may’ be an effective treatment option… but further research will be needed etc…
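For readers who like to see the arithmetic behind those thresholds, here is a minimal Python sketch. The function names are my own for illustration and do not come from the trial paper. The figure of 50 tests is simply a round stand-in for the ‘over 50’ p values mentioned above, and the last calculation assumes the tests are independent, which outcomes measured on the same infants rarely are.

```python
"""Multiple-testing arithmetic: an illustrative sketch, not the trial's analysis."""


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value threshold that keeps the overall level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of p values below alpha when every null hypothesis is true."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one p value below alpha when every null is true.

    Assumption: the tests are independent of one another.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 2))       # primary outcome: two tests
    print(bonferroni_threshold(0.05, 4))       # secondary outcome: four tests
    print(expected_false_positives(0.05, 50))  # round stand-in for 'over 50'
    print(round(familywise_error_rate(0.05, 50), 2))
```

Under that independence assumption, fifty tests at the 0.05 level would on average throw up two or three ‘significant’ p values by chance, with a better than nine in ten chance of at least one. That is why only the predefined outcomes, judged against the corrected thresholds, bear on the conclusions.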
Now a final word on the 3000 plus acupuncture trials that David Colquhoun mentions. His point is that there is no consistent evidence for acupuncture after over 3000 RCTs, so it clearly doesn’t work. He first quoted this figure in an editorial after discussing the largest, most statistically reliable meta-analysis to date – the Vickers et al IPDM. He admits that there is a small effect of acupuncture over sham, but follows the standard EBM mantra that it is too small to be clinically meaningful without ever considering the possibility that sham (gentle acupuncture plus context of acupuncture) can have clinically relevant effects when compared with conventional treatments. Perhaps now the best example of this is a network meta-analysis (NMA) using individual patient data (IPD), which clearly demonstrates benefits of sham acupuncture over usual care (a variety of best standard or usual care) in terms of health-related quality of life (HRQoL).
Key to abbreviations
- BMJ – British Medical Journal (company)
- EBM – evidence-based medicine
- HRQoL – health-related quality of life
- IPD – individual patient data
- IPDM – individual patient data meta-analysis
- MCID – minimal clinically important difference
- NMA – network meta-analysis
- SR – systematic review
- VAS – visual analogue scale (usually a 100mm line)
References
- Landgren K, Hallström I. Effect of minimal acupuncture for infantile colic: a multicentre, three-armed, single-blind, randomised controlled trial (ACU-COL). Acupunct Med 2017: acupmed-2016-011208. doi:10.1136/acupmed-2016-011208
- Vickers AJ, Cronin AM, Maschino AC, et al. Acupuncture for chronic pain: individual patient data meta-analysis. Arch Intern Med 2012;172:1444–53. doi:10.1001/archinternmed.2012.3654
- Saramago P, Woods B, Weatherly H, et al. Methods for network meta-analysis of continuous outcomes using individual patient data: a case study in acupuncture for chronic pain. BMC Med Res Methodol 2016;16:131. doi:10.1186/s12874-016-0224-1