Exercise not acupuncture recommended by NICE for low back pain

This blog was first published on 31st March 2016 on BMJ Blogs (link no longer available).



Low back pain and sciatica: management of non-specific low back pain and sciatica – draft clinical guideline February 2016 (link no longer available).

NICE clinical guidelines are very large pieces of work. This draft runs to over 1000 pages, plus around 2500 pages of appendices, with data extracted and analysed from nearly 600 RCTs. Having attended the guideline development group (GDG) meetings for CG88, I had a first-hand view of the size and difficulty of the task, and the GDG in this case is to be congratulated for their work in completing this draft.

In this discussion I will focus on a very small section, and highlight key data entry errors within the analysis. I hope that a careful reconsideration of the data may result in a positive recommendation for acupuncture in low back pain.

Acupuncture techniques have been used in the UK for at least 200 years, but their strongest association is with traditional East Asian medicine, and therefore they can seem conceptually alien to our contemporary scientific medicine. Whilst the GDG recognised the modern interpretation of acupuncture, and its scientific basis, under the heading of Western medical acupuncture (WMA), they went on to apply an additional requirement to acupuncture that apparently was not applied to similar interventions (exercise & manual therapies).

The GDG first discussed the necessity of a body of evidence to show specific intervention effects, that is, over and above any contextual or placebo effects.

[draft 1, p 493]

So, this involves a focus on sham controlled trials of acupuncture, which nearly always compare two forms of needling – a gentle superficial form at sites away from the most common points used (minimal or sham), and an average style of routine acupuncture usually at muscle level. The comparison of normal and sham acupuncture certainly excludes contextual and placebo effects, but it also excludes the effect of gentle needling, and therefore underestimates the whole effect attributable to needle acupuncture. Consequently, it would be inequitable to place too strong a reliance on the clinical relevance of this difference, but appropriate to focus on this for biological plausibility of the technique, before moving on to consider more pragmatic comparisons with usual care.
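The point about underestimation can be made explicit with a simple additive sketch. This is an illustration only: the symbols C (contextual and placebo effects), G (effect of gentle needling) and S (the additional effect of routine needling) are my own labels, and the assumption that the components simply add together is not something stated in the guideline.

```latex
\begin{align*}
\text{acupuncture} - \text{sham} &= (C + G + S) - (C + G) = S \\
\text{acupuncture} - \text{usual care} &= C + G + S \\
\text{effect attributable to needling} &= G + S \;\geq\; S \quad \text{whenever } G \geq 0
\end{align*}
```

Under these assumptions the sham-controlled difference isolates only S, so it understates the needling effect by whatever G happens to be.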

For the purposes of this discussion, I will focus on the sham-controlled evidence for exercise, manual therapies and acupuncture, and compare and contrast the strength of evidence and the subsequent recommendations.

Exercise was recommended, and the GDG commented:

The GDG noted that there was some evidence of benefit for all exercise types compared to sham, usual care or other active comparators, …

[draft 1, p 303]

By contrast I could not find any evidence of an effect of exercise over sham. Indeed, there were only two trials that included data. Appendix K p 60 shows a forest plot (Fig 219) with data from Albert 2012 – this plot seems to demonstrate an effect of exercise over sham for pain ≤4 months, but the data in this plot is different from that extracted from the paper and included in the table of Appendix H p 146. Indeed, the original paper reports no difference between exercise and sham in the primary outcomes, and the responder rate was actually slightly greater for sham. Only secondary outcomes of neurological signs relevant to sciatica were in favour of exercise over sham.

The second paper with data relevant to the comparison of exercise over sham reported no significant benefit in terms of the only outcome reported – psychological distress. [Appendix K, p 70]

So, there is no sham-controlled data supporting exercise interventions, yet the GDG made a positive recommendation. This positive recommendation was therefore based either on error from faulty data entry, or on low quality data that could have been entirely attributable to contextual effects, those that the GDG insisted on excluding when considering data on acupuncture.

Manual therapies were also recommended. Two small trials tested massage against sham, and there was a borderline effect over sham for pain <4 months – the lower limit of the 95% confidence interval (CI) crossed the line of no effect by 0.02. [Appendix K, p 115] Five trials (533 patients) were combined for manipulation over sham for pain ≤4 months. The mean difference in VAS (0-10) was -0.26, and the lower limit of the 95% CI reached zero. [Appendix K, p 122] There were no long-term effects >4 months. This is very weak data on which to base a positive recommendation.
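To see why a confidence bound that just reaches zero is weak evidence, the manipulation result can be worked through as a rough sketch. It assumes a symmetric, normal-approximation 95% CI and takes "reached zero" literally, so the half-width equals the size of the mean difference. The guideline's own software may have computed the interval differently.

```python
from statistics import NormalDist

# Reported pooled mean difference for manipulation vs sham (VAS 0-10).
mean_difference = -0.26

# Assumption: the CI bound nearest zero is exactly 0, so the half-width
# of the 95% CI equals the absolute mean difference.
half_width = abs(mean_difference)

# Two-sided 95% critical value of the standard normal (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)

# Standard error implied by that half-width.
implied_se = half_width / z_critical

# Two-sided p-value for the implied z statistic.
z_stat = mean_difference / implied_se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"implied SE = {implied_se:.3f}, p = {p_value:.3f}")
```

Under these assumptions the result sits exactly on the conventional 5% significance threshold, and the point estimate is about a quarter of a point on a ten-point scale.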

The meta-analysis of acupuncture over sham (minimal needling) for pain ≤4 months included a major data entry error. [Appendix K, p 153] Brinkhaus 2006 data was entered as values representing a decrease in pain score from baseline rather than as the absolute value after treatment. This error flipped the point estimate for the mean difference to the wrong side of the line of no effect, i.e. favouring sham instead of acupuncture. This resulted in a reduction in the total effect size, and more importantly an erroneously high heterogeneity (I² = 76%). Both of these potentially resulted in a reduction in the ‘quality of evidence’ (GRADE) for this item, which was consequently presented as Low quality, rather than High or Moderate. It was this uncertainty that resulted in the GDG statement:

Heterogeneity was observed in the meta-analysis that was unexplained by pre-specified subgroup analysis of type of acupuncture or duration of pain.

[draft 1, p 493]
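The mechanism behind this kind of error can be illustrated with a minimal sketch. All of the numbers below are hypothetical and are not the trial data from the guideline. The pooling method (fixed-effect inverse-variance weighting with Cochran's Q and I²) is a standard one chosen for illustration, not necessarily the one the GDG used. Entering one study's reduction from baseline as if it were a final pain score reverses the sign of that study's mean difference, which both shrinks the pooled estimate and inflates I².

```python
import math

def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling.

    studies: list of (mean_difference, standard_error) tuples.
    Returns (pooled_mean_difference, pooled_standard_error, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, studies))
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, i_squared

# Hypothetical studies; negative values favour acupuncture over sham.
correct_entry = [(-1.0, 0.30), (-0.8, 0.25), (-0.9, 0.35), (-0.7, 0.30)]

# Same studies, but the last one is mis-entered with its sign reversed,
# as happens when a decrease from baseline is treated as a final score.
wrong_entry = correct_entry[:-1] + [(0.7, 0.30)]

pooled_ok, se_ok, i2_ok = pool_fixed_effect(correct_entry)
pooled_bad, se_bad, i2_bad = pool_fixed_effect(wrong_entry)

print(f"correct entry: MD = {pooled_ok:.2f}, I2 = {i2_ok:.0f}%")
print(f"wrong entry:   MD = {pooled_bad:.2f}, I2 = {i2_bad:.0f}%")
```

In this invented example the pooled estimate still favours acupuncture after the mis-entry, but it is smaller in magnitude and the studies now look highly inconsistent, which is the pattern described above.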

Despite this error, the point estimate was -0.8, and the lower 95% CI was well clear of the line of no effect, resulting in a highly statistically significant result in favour of acupuncture over sham for pain ≤4 months. This analysis included 7 RCTs and a total of 1359 patients. Furthermore, the effect of acupuncture over sham in the long-term (>4 months) was also positive, with no heterogeneity, 4 RCTs and 1159 patients.

This data clearly demonstrates the biological plausibility of normal acupuncture over gentle needling. For clinically relevant effects we should look at the data compared with usual care. This analysis demonstrates clinically relevant effects for pain ≤4 months, but high heterogeneity. The latter is clearly related to the differences in the usual care comparisons in the larger trials: Brinkhaus 2006 used a waiting list control; and Haake 2007 used rather intensive guideline-based conventional care (physician visits, physiotherapy, NSAIDs). Despite this obvious clinical heterogeneity in the control groups, the GRADE category for quality was automatically reduced. The GDG stated that the benefits on pain were not sustained beyond 4 months; [draft 1, p 494] however, the forest plot for acupuncture compared with usual care for pain >4 months clearly demonstrates a statistically significant benefit. [Appendix K, p 159]

I note that the health economic data demonstrates a more favourable cost per quality adjusted life year (QALY) for acupuncture compared with the cost of either exercise or manual therapies. [Appendix I, p 29, p 18, p 27]

Taking all this together, I call for the GDG to look again at their data with the errors corrected, and invite them to consider a more equitable recommendation for acupuncture in low back pain.

Declaration of interests MC