Musings on heterogeneity in quantitative outcomes of acupuncture trials in LBP

This blog was first published on 4th April 2016 on

Further commentary:
Low back pain and sciatica: management of non-specific low back pain and sciatica
Draft clinical guideline February 2016

This commentary follows a previous blog post.

Late last Friday night I got around to entering the pain VAS outcome figures from the trials of acupuncture versus sham into RevMan 5, the software used for Cochrane Reviews. I was surprised to find that the high I² value for the short-term outcome in the draft guideline did not drop substantially with the corrected data (incidentally, there were errors in the data from both Brinkhaus and Leibing). Here is the corrected forest plot to replace Figure 667 [Appendix K, p 153].
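For readers who want to see where an I² figure comes from, here is a minimal sketch of the usual inverse-variance calculation for a mean difference, assuming a fixed-effect model. The function names are my own, and the numbers in the example are hypothetical placeholders, not data from Brinkhaus, Leibing, Haake or any other trial. It is not a reproduction of RevMan's internals.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (mean difference, variance) for two independent trial arms."""
    md = m1 - m2
    var = sd1 ** 2 / n1 + sd2 ** 2 / n2
    return md, var


def pool_fixed(effects):
    """Pool a list of (md, variance) pairs with inverse-variance weights.

    Returns (pooled MD, standard error, Cochran's Q, I-squared in percent).
    """
    weights = [1.0 / var for _, var in effects]
    total_w = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of variation beyond what chance alone would give.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, q, i2


if __name__ == "__main__":
    # HYPOTHETICAL arms (mean, SD, n), acupuncture first, then sham.
    hypothetical = [
        mean_difference(35.0, 25.0, 140, 44.0, 29.0, 70),
        mean_difference(21.0, 22.0, 40, 32.0, 24.0, 45),
        mean_difference(40.0, 24.0, 370, 43.0, 25.0, 375),
    ]
    pooled, se, q, i2 = pool_fixed(hypothetical)
    print(f"pooled MD = {pooled:.2f} (SE {se:.2f}), Q = {q:.2f}, I2 = {i2:.0f}%")
```

With pain scores, a negative mean difference favours the first arm. I² near zero means the trials disagree no more than chance would predict, while a high value means the spread between trials is larger than their confidence intervals can explain.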


The total mean difference in pain now reaches clinical significance, and remember that is the difference over gentle needling, not an inactive placebo intervention. However, the heterogeneity remains unexpectedly high. The outlier now is Haake. This was a huge multicentre trial with some 300 different centres, where the participating clinicians did not meet for instruction on intervention procedures, as the 26 in Brinkhaus did. The primary outcome in Haake showed both real and sham acupuncture were twice as good as guideline-based conventional care, so we might hypothesise that the sham was closer to real acupuncture than in Brinkhaus. Excluding Haake removes all heterogeneity.
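Dropping one trial at a time and recomputing I² is a simple sensitivity check for finding which trial drives the heterogeneity. The sketch below assumes a fixed-effect inverse-variance model. The trial labels and the (mean difference, variance) pairs are hypothetical, chosen only to mimic the pattern of several small trials that agree with each other and one large trial that sits apart.

```python
def i_squared(effects):
    """Return I-squared (percent) for a list of (md, variance) pairs."""
    weights = [1.0 / var for _, var in effects]
    pooled = sum(w * md for w, (md, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (md - pooled) ** 2 for w, (md, _) in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def leave_one_out(trials):
    """Map each trial name to the I-squared obtained when it is excluded."""
    result = {}
    for name in trials:
        rest = [effect for other, effect in trials.items() if other != name]
        result[name] = i_squared(rest)
    return result


if __name__ == "__main__":
    # HYPOTHETICAL data: name -> (mean difference, variance).
    trials = {
        "Small trial A": (-10.0, 4.0),
        "Small trial B": (-10.0, 4.0),
        "Small trial C": (-10.0, 4.0),
        "Large trial": (-2.0, 1.0),
    }
    print(f"All trials: I2 = {i_squared(list(trials.values())):.0f}%")
    for name, i2 in leave_one_out(trials).items():
        print(f"Excluding {name}: I2 = {i2:.0f}%")
```

If excluding a single trial takes I² to zero while excluding any other leaves it high, that trial accounts for the heterogeneity. The check says nothing about which side is closer to the truth.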


So one large trial where we suspect substantial differences in the comparator (sham acupuncture) creates all the heterogeneity. But large trials are usually held out to be more statistically reliable, so there does remain some uncertainty in interpretation. I should point out that within RevMan the pain results for Haake are positive, so whether you think the sum of the smaller trials (n=610) or Haake (n=749) is more reliable, both demonstrate a biological effect of average acupuncture over gentle acupuncture.

Moving to the long-term analysis (pain VAS >4 months), there was a data entry error here too. It was hard to spot, but glaring once noticed: the pain VAS outcome for Leibing was a negative value! How can a pain score be negative? The negative figure is clearly a change value, not an absolute value of pain at the relevant time point (this is the same data entry error made for the Brinkhaus data). Here is the corrected Figure 668 [Appendix K, p 153].
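This kind of error can be caught with a simple range check before pooling. Here is a minimal sketch, assuming the outcome is an absolute score on a 0 to 100 VAS. The function name, the scale limits and the example entries are all hypothetical.

```python
def flag_out_of_range(entries, low=0.0, high=100.0):
    """Return labels of entries whose mean lies outside the scale range.

    entries: list of (label, mean) pairs, where each mean is supposed to be
    an absolute score on the scale, not a change from baseline.
    """
    return [label for label, mean in entries if not (low <= mean <= high)]


if __name__ == "__main__":
    # HYPOTHETICAL entries; a negative "score" is almost certainly a change value.
    entries = [
        ("Trial X, acupuncture arm", 31.0),
        ("Trial Y, acupuncture arm", -12.5),
    ]
    for label in flag_out_of_range(entries):
        print(f"Check {label}: value is outside the scale range")
```

The check only catches change scores that fall outside the scale, so a change value that happens to be positive would still slip through.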


The result is statistically positive with no heterogeneity. This represents a clear long-term biological effect of average acupuncture over gentle acupuncture, although the difference is not in the range that would be regarded as clinically significant by NICE, if indeed you can judge clinical significance in an explanatory (sham-controlled) model. The absence of heterogeneity seems to be explained by a reduction in the mean difference between acupuncture and sham in the smaller trials, and no change in that of Haake, so in effect the smaller trial results got closer to the results of Haake. In terms of absolute pain scores, it seems that, on average, the patients in Haake continued to improve, whereas those in the smaller trials deteriorated slightly.

In summary, whilst there remains some uncertainty about interpretation of the clinical relevance of this data, it is clear that average acupuncture is superior to gentle acupuncture for low back pain in both the short and long term outcomes, and this data is clearly more convincing than the equivalent data for either the exercise or the manual therapies recommended in the draft NICE guideline for low back pain.

Declaration of interests MC