NONINVASIVE NONPHARMACOLOGICAL TREATMENT FOR CHRONIC PAIN: A SYSTEMATIC REVIEW UPDATE (2020)

Noninvasive Nonpharmacological Treatment for Chronic Pain:
A Systematic Review Update (April 16, 2020).

Andrea C. Skelly, Ph.D., M.P.H., Roger Chou, M.D., Joseph R. Dettori, Ph.D., M.P.H., M.P.T.,
Judith A. Turner, Ph.D., et al.

Rockville (MD): Agency for Healthcare Research and Quality (US); 2020 (Apr)

This section was compiled by Frank M. Painter, D.C.
Send all comments or additions to: Frankp@chiro.org

Results

Introduction

Results are organized by Key Question (i.e., by condition) and intervention and then organized by comparators for each subquestion. We categorized postintervention followup as short term (1 to <6 months), intermediate term (≥6 to <12 months) and long term (≥12 months). We prioritized function and pain outcomes based on validated measures. For some conditions (e.g., osteoarthritis [OA]), results are organized by affected region.
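The followup windows above translate directly into a small lookup. The sketch below is illustrative only; the function name and the label returned for followup shorter than 1 month are assumptions, not terms from the report.

```python
def followup_category(months_after_intervention: float) -> str:
    """Classify postintervention followup using the cut points stated above."""
    if months_after_intervention < 1:
        # Assumption: label for followup shorter than the 1-month lower bound.
        return "below minimum followup"
    if months_after_intervention < 6:
        return "short term"          # 1 to <6 months
    if months_after_intervention < 12:
        return "intermediate term"   # >=6 to <12 months
    return "long term"               # >=12 months


for months in (1, 6, 12, 34):
    print(months, followup_category(months))
```

The boundaries are half-open, so a 6-month assessment counts as intermediate term and a 12-month assessment as long term.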

We synthesized data qualitatively and quantitatively, using meta-analysis where appropriate. Two continuous primary outcomes (pain, function) provided adequate data for meta-analysis. For meta-analyses providing pooled estimates, we report results from heterogeneity testing. I-squared and corresponding p-values describe the degree and statistical significance of heterogeneity across studies; pooled (subtotal) estimates are statistically significant if the confidence interval does not include the value of 0 for mean differences (MDs) or the value of 1 for risk ratios (RR). (See the Methods section of this report and the protocol for additional details on data analysis and synthesis.) In general, if effect estimates tended to favor one treatment but failed to reach statistical significance with confidence interval crossing the null value of zero or one (perhaps due to sample size), the results are interpreted as showing no clear difference between treatments. If effect estimates are close to zero and not statistically significant, results are interpreted as no difference between groups.
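The interpretation rule described above can be sketched as a short function. This is a minimal illustration, not the report's procedure: the report gives no numeric cutoff for "close to zero", so the `near_null` parameter and its default are hypothetical, as are the function and label names.

```python
def interpret(estimate: float, ci_low: float, ci_high: float,
              measure: str = "MD", near_null: float = 0.05) -> str:
    """Label a pooled estimate following the rule of thumb described above.

    measure: "MD" or "SMD" (null value 0) or "RR" (null value 1).
    near_null: hypothetical cutoff for "close to the null"; not from the report.
    """
    null_value = 1.0 if measure == "RR" else 0.0
    if not (ci_low <= null_value <= ci_high):
        return "statistically significant"
    if abs(estimate - null_value) <= near_null:
        return "no difference"
    return "no clear difference"


# Estimates quoted elsewhere in this section:
print(interpret(-0.31, -0.50, -0.13, measure="SMD"))  # CI excludes 0
print(interpret(0.02, -0.28, 0.30, measure="SMD"))    # CI includes 0, estimate near 0
print(interpret(1.03, 0.49, 2.13, measure="RR"))      # CI includes 1, estimate near 1
```

Whether a nonsignificant estimate reads as "no difference" or "no clear difference" depends entirely on the chosen cutoff, so the last two labels should be treated as a judgment call.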

A list of acronyms and abbreviations appears at the end of the report.

Results of Literature Searches

Figure 2
The search and selection of articles are summarized in the literature flow diagram (Figure 2). The original database searches resulted in 4,996 potentially relevant articles; an additional 3,520 were identified for this update. After dual review of abstracts and titles, 1,574 articles across searches (381 new to this update) were selected for full-text dual review, and 252 publications (34 added for this update) were determined to meet inclusion criteria and were included in this review. Nearly one-fourth of the trials excluded at full text did not meet our criteria for followup duration (i.e., a minimum of 1 month of followup after termination of the intervention, or postintervention if the intervention duration was at least 6 months). Other common reasons for exclusion of primary trials included ineligible population and ineligible intervention or comparator (i.e., combination of treatments or if treatments were additive in nature). Data abstraction and quality assessment tables for all included studies are available in Appendix D and Appendix E.

Figure 2 is a flow chart that outlines the study retrieval and selection process. It begins with the total number of citations retrieved from the literature searches and ends with the number of studies that satisfied the inclusion criteria of the report. This figure is described further in the section entitled “Results of Literature Searches”. Briefly, a total of 8,516 potentially relevant citations were identified (4,996 from the prior AHRQ report and 3,520 for this update) and, after removal of duplicates, 7,474 (4,276 from the prior AHRQ report and 3,198 for this update) underwent title and abstract review. After dual review of abstracts and titles, 5,900 articles (3,083 from the prior AHRQ report and 2,817 for this update) were excluded. The remaining 1,574 articles (1,193 from the prior AHRQ report and 381 for this update) underwent dual review at the full text level and a total of 233 trials (202 from the prior AHRQ report and 31 for this update) in 252 publications (218 from the prior AHRQ report and 34 for this update) met the inclusion criteria and were included in this report update.
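The counts in the flow diagram can be cross-checked with simple arithmetic. The sketch below uses only the prior-report and update counts quoted above; the dictionary keys are illustrative names, not labels from the figure.

```python
# (prior AHRQ report, this update) counts at each stage of the flow diagram
flow = {
    "identified": (4996, 3520),
    "excluded_at_title_abstract": (3083, 2817),
    "full_text_review": (1193, 381),
    "included_trials": (202, 31),
    "included_publications": (218, 34),
}

totals = {stage: prior + update for stage, (prior, update) in flow.items()}

# Records screened at title/abstract level must equal those excluded at that
# stage plus those advanced to full-text review.
screened = totals["excluded_at_title_abstract"] + totals["full_text_review"]

for stage, total in totals.items():
    print(f"{stage}: {total:,}")
print(f"screened at title/abstract: {screened:,}")
```

Each stage total is the sum of its prior-report and update components, which makes transcription slips in any single figure easy to spot.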

Description of Included Studies

Table 4
A total of 233 trials (in 252 publications) were included. For each intervention category, the comparisons evaluated and their respective studies are listed in Table 4. The number of studies and related publications included for each condition (and the number of new studies and publications in this update review) are:

Chronic low back pain: 77 studies in 83 publications (9 new trials)
Chronic neck pain: 27 studies in 28 publications (2 new trials, 1 new publication)
Osteoarthritis: 62 studies in 66 publications (9 new trials in 10 publications)
Fibromyalgia: 58 studies in 66 publications (11 new trials in 12 publications)
Chronic tension headache: 9 studies

Thirty-six percent of the included trials were small (<70 participants). Across trials, most patients were female (>57%), with mean ages ranging from 31 to 78 years; patients with OA tended to be older in general than those with the other conditions (range, 52 to 76 years). Mean pain duration for patients with chronic low back pain, chronic neck pain, and OA was similar and varied widely from 6 months to 15 years. Mean symptom duration in trials of fibromyalgia and chronic tension headache tended to be at least 4 years (up to 22 years). Exercise interventions were the most commonly studied for OA and fibromyalgia. Psychological therapies were most commonly studied for fibromyalgia, and manual therapies were most commonly studied for chronic low back pain. We identified trials of acupuncture for all included conditions. Multidisciplinary rehabilitation was studied primarily for chronic low back pain and fibromyalgia. Most trials of multidisciplinary rehabilitation used a functional restoration approach either explicitly or implicitly. Limited evidence was available for hip or hand OA or chronic tension headache. The majority of trials compared nonpharmacological interventions with usual care, waitlist, no treatment, attention control, or placebo/sham, with very few trials employing pharmacological treatments or exercise as comparators. Little long-term evidence was available across conditions and interventions.

Figure 3
The majority of trials (61%) were rated fair quality with only 6 percent considered good quality (Figure 3). For chronic tension headache, no study was considered good quality. In the majority of trials (72%), attrition was under 20 percent and therefore rated as acceptable. Across trials where attrition was not acceptable, the range was 20 to 63 percent. A primary methodological limitation in many trials was the inability to effectively blind participants and in many cases providers. Poor reporting of randomization and allocation concealment methods were common shortcomings. Acceptable adherence, defined as completion of a minimum of 80 percent of planned treatment, was reported in 44 percent of trials. It was either unclear (40%) or unacceptable (16%) in the majority of trials.
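The two thresholds described above (attrition under 20 percent, adherence of at least 80 percent of planned treatment) can be expressed as simple checks. This is a sketch under assumptions: the function names are hypothetical, and treating an unreported adherence figure as "unclear" is an assumption about how that category arises.

```python
from typing import Optional


def attrition_acceptable(attrition_pct: float) -> bool:
    """Attrition under 20 percent is rated acceptable."""
    return attrition_pct < 20.0


def adherence_rating(pct_of_planned_treatment: Optional[float]) -> str:
    """Rate adherence from the percent of planned treatment completed.

    None stands for a trial that did not report adherence (assumption).
    """
    if pct_of_planned_treatment is None:
        return "unclear"
    if pct_of_planned_treatment >= 80.0:
        return "acceptable"
    return "unacceptable"


print(attrition_acceptable(12.0))   # True
print(attrition_acceptable(20.0))   # False: 20 percent is the start of the unacceptable range
print(adherence_rating(85.0))       # acceptable
print(adherence_rating(None))       # unclear
```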

Figure 3 is a bar graph depicting the distribution of individual study quality ratings across the report as a whole and by each chronic pain condition. Overall, 6% of trials were considered good quality, 61% were fair quality, and 33% were poor quality. For low back pain, 6% were good quality, 70% fair quality, and 23% poor quality. For neck pain, 4% were good quality, 59% were fair quality, and 37% were poor quality. For osteoarthritis, 8% were good quality, 63% were fair quality and 29% were poor quality. For fibromyalgia, 5% were good quality, 52% were fair quality, and 43% were poor quality. For tension headache, 22% were fair quality and 78% were poor quality; there were no good quality studies for headache.

Key Question 1. Chronic Low Back Pain

For chronic low back pain, 68 randomized controlled trials (RCTs) (in 74 publications) were included in the prior Agency for Healthcare Research and Quality (AHRQ) report (N=13,163). Two studies were rated good quality, 49 studies fair quality, and 17 studies poor quality. The prior AHRQ report found massage, yoga, psychological therapies, exercise, acupuncture, low-level laser therapy, spinal manipulation, and multidisciplinary rehabilitation associated with greater effects than usual care, attention control, sham, or placebo on improved pain or function. The strength of evidence was low or moderate, generally stronger for pain than for function, and observed at short- or intermediate-term followup, with the exception of psychological therapies, which were associated with small effects at long-term followup.

For this update, we identified nine new RCTs (N=1,026). Three of the new studies were rated good quality, four were rated fair quality, and two were rated poor quality. The new trials evaluated exercise (5 trials), massage (2 trials), yoga (2 trials), and interferential therapy (1 trial); one trial evaluated both exercise and yoga interventions. The Key Points summarize the main findings based on the evidence included in the prior report and new trials; the Key Points note where new trials contributed to findings.

Exercise for Chronic Low Back Pain

Key Points

Exercise was associated with a small improvement in short-term function compared with usual care, an attention control, or a placebo intervention (10 trials [4 new], pooled standardized mean difference [SMD] –0.31, 95% confidence interval [CI] –0.50 to –0.13, I²=32%) after excluding an outlier trial; there were no effects on intermediate-term function (5 trials [2 new], pooled SMD –0.17, 95% CI –0.39 to 0.02, I²=0%) or long-term function (1 trial, difference 0.00 on the 0 to 100 Oswestry Disability Index [ODI], 95% CI –11.4 to 11.4) (strength of evidence [SOE]: moderate for short term, low for intermediate and long term).
Exercise was associated with moderate effects on pain versus usual care, an attention control, or a placebo intervention at short-term (11 trials [5 new], pooled difference –1.21 on a 0 to 10 scale, 95% CI –1.77 to –0.65, I²=64%) and long-term (1 trial, difference –1.55, 95% CI –2.76 to –0.34), and a small effect at intermediate-term (5 trials [2 new], pooled MD –0.85, 95% CI –1.67 to –0.07, I²=50%) followup (SOE: low for all timepoints).
No trial evaluated exercise versus pharmacological therapy.
Comparisons involving exercise versus other nonpharmacological therapies are addressed in the sections for the other therapies.
Harms were not reported in most trials; one trial did not find an association between exercise and increased pain versus placebo and one trial reported no adverse events (SOE: low).

Detailed Synthesis

Table 5
Eleven trials of exercise therapy for low back pain met inclusion criteria (Table 5 and Appendix D). [31–40, 212] Six trials [31–36] were included in the prior AHRQ report and five [37–40, 212] were added for this update.

Three trials (1 new) evaluated neuromuscular re-education exercise (motor control exercises), [31, 32, 38]
four trials (2 new) muscle performance exercises (Pilates or modified Pilates), [35, 36, 40, 212]
three trials (1 new) combined exercise techniques, [33, 34, 39]
and one trial evaluated strength training. [37]
Sample sizes ranged from 42 to 295 (total sample=1,204).

Five trials compared exercise versus an attention control, [32, 33, 35, 37, 38]
four trials compared exercise versus usual care, [34, 36, 40, 212]
and two trials compared exercise versus a placebo intervention (detuned diathermy and ultrasound). [31, 39]

Five trials (1 new) [31–34, 37] were conducted in the United States, Europe, or Australia,
four trials (2 new) [35, 36, 39, 212] in Brazil,
one new trial [38] in Asia, and one new trial [40] in Iran.

The duration of exercise therapy ranged from 6 to 12 weeks and the number of exercise sessions ranged from 6 to 24. One trial reported outcomes through long-term followup, [32] four trials reported outcomes through intermediate-term followup, [31, 33, 39, 212] and the remainder only evaluated short-term outcomes.

Two trials (both new) [39, 212] were rated good quality, seven trials (2 new) [31–33, 35–38] were rated fair quality, and two trials (1 new) [34, 40] were rated poor quality (Appendix D). In two fair-quality trials, [31, 36] the main methodological limitation was the inability to blind interventions. Limitations in the other trials included unclear randomization and allocation concealment methods, high loss to followup, and baseline differences between intervention groups.

Exercise Compared With Usual Care, an Attention Control, or a Placebo Intervention

Figure 4
Exercise was associated with small effects on short-term function versus controls (11 trials, pooled SMD –0.51, 95% CI –0.98 to –0.08, I²=88%) (Figure 4). [31–40, 212] Excluding one trial [38] that reported a much higher SMD (–3.1) and smaller standard deviation (~1.0) compared to the other trials (SMD range –0.81 to 0.17 and standard deviation range 5 to 17) also resulted in a pooled estimate that favored exercise, though the difference was attenuated (10 trials, pooled SMD –0.31, 95% CI –0.50 to –0.13, I²=32%). Seven trials that evaluated function using the Roland-Morris Disability Questionnaire (RDQ) (0 to 24 scale) reported a pooled difference of –2.86 points (95% CI –3.36 to –1.05), [31, 34–36, 38, 39, 212] and two trials that used the ODI (0 to 100 scale) reported differences that ranged from 3.7 points favoring exercise [40] to 2.9 points favoring an attention control. [32] There were no clear differences in estimates when analyses were stratified according to the type of exercise (pooled SMD estimates ranged from –0.08 to –0.54) or the type of control, or when poor-quality trials were excluded. There were no differences between exercise versus controls in intermediate-term function (5 trials, pooled SMD –0.17, 95% CI –0.39 to 0.02, I²=0%) [31–33, 39, 212] or long-term function (1 trial, difference 0.00, 95% CI –11.4 to 11.4 on the ODI). [32]
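The sensitivity analysis above, pooling with and without an outlier trial, can be illustrated with a small random-effects calculation. This is a sketch under loud assumptions: the SMDs and variances below are made-up numbers, not data from the included trials, and the DerSimonian-Laird estimator is used here only because it is compact; this section does not state which random-effects model the report used (see the report's Methods).

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, 95% CI low, 95% CI high, I-squared in percent).
    """
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2


# Hypothetical SMDs and variances; the last trial plays the role of an outlier.
smds = [-0.40, -0.25, -0.10, -0.35, -3.10]
variances = [0.04, 0.05, 0.03, 0.06, 0.08]

all_trials = pool_random_effects(smds, variances)
without_outlier = pool_random_effects(smds[:-1], variances[:-1])

for label, (est, low, high, i2) in (("all trials", all_trials),
                                    ("outlier excluded", without_outlier)):
    print(f"{label}: SMD {est:.2f} (95% CI {low:.2f} to {high:.2f}), I2={i2:.0f}%")
```

With these illustrative inputs, dropping the outlier moves the pooled estimate toward zero and sharply reduces I², the same qualitative pattern described in the text.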

Figure 5
Exercise was associated with moderate effects on short-term pain versus usual care, an attention control, or a placebo intervention (11 trials, pooled difference –1.21 on a 0 to 10 scale, 95% CI –1.77 to –0.65, I²=64%) (Figure 5). [31–36, 38–40, 212] There were no clear differences in estimates when analyses were stratified according to the type of exercise (pooled differences ranged from –0.59 to –0.98 points on a 0 to 10 scale) or the type of control (usual care, attention control, or placebo intervention), or when poor-quality trials were excluded. Exercise was associated with small effects on intermediate-term pain versus controls (5 trials, pooled difference –0.85, 95% CI –1.67 to –0.07, I²=50%). [31–33, 39, 212] At long-term followup, effects of exercise on pain were moderate compared with an attention control, but findings were based on one trial (difference –1.55, 95% CI –2.76 to –0.34). [32]

Evidence on effects of exercise on quality of life was limited. One trial [32] found no differences between exercise versus an attention control on the Nottingham Health Profile at short-term, intermediate-term, or long-term followup, and one trial [36] found exercise associated with higher scores on the Short-Form 36 (SF-36) physical functioning (difference 5.8 points on a 0 to 100 scale, p=0.026), bodily pain (difference 8.3 points, p=0.03), and vitality subscales (difference 5.3 points, p=0.029) at short-term followup; there were no differences on other SF-36 subscales (Table 5). Another trial found exercise associated with greater improvement in the SF-36 Physical Component Summary versus an attention control (difference 8.26 on a 0 to 100 scale, 95% CI 5.27 to 11.25) but no difference on the SF-36 Mental Component Summary (difference 1.27, 95% CI –3.38 to 5.92). [38]

No trial evaluated effects of exercise on use of opioid therapies or healthcare utilization. There was insufficient evidence to determine effects of duration of exercise therapy or number of sessions on outcomes.

Exercise Compared With Pharmacological Therapy

No trial of exercise versus pharmacological therapy met inclusion criteria.

Exercise Compared With Other Nonpharmacological Therapies

Findings for exercise versus other nonpharmacological therapies are addressed in the sections on other nonpharmacological therapies.

Harms

Harms were not reported in most trials. One trial [31] found no difference between exercise and a placebo intervention (detuned diathermy) in likelihood of increased pain, and another trial [35] reported no adverse events (Appendix D).

Psychological Therapies for Chronic Low Back Pain

Key Points

Psychological therapy was associated with small improvements in function compared with usual care or an attention control at short-term (3 trials, pooled SMD –0.24, 95% CI –0.38 to –0.04, I²=0%), intermediate-term (3 trials, pooled SMD –0.24, 95% CI –0.38 to –0.10, I²=0%), and long-term followup (3 trials, pooled SMD –0.28, 95% CI –0.43 to –0.13, I²=0%) (SOE: moderate).
Psychological therapy was associated with small improvements in pain compared with usual care or an attention control at short-term (3 trials, pooled difference –0.75 on a 0 to 10 scale, 95% CI –1.01 to –0.41, I²=0%), intermediate-term (3 trials, pooled difference –0.71, 95% CI –0.97 to –0.46, I²=0%), and long-term followup (3 trials, pooled difference –0.55, 95% CI –0.92 to –0.23, I²=0%) (SOE: moderate).
Evidence from one poor-quality trial was too unreliable to determine effects of psychological therapy versus exercise (SOE: insufficient).
One trial of cognitive-behavioral therapy versus an attention control reported no serious adverse events and one withdrawal due to adverse events in 468 patients (SOE: low).

Detailed Synthesis

Table 6
Five trials (reported in 6 publications) of psychological therapies for low back pain met inclusion criteria (Table 6 and Appendix D). [104–108, 133, 195] All of the trials were included in the prior AHRQ report.

Three trials evaluated group cognitive-behavioral therapy (CBT), [104–107]
one trial evaluated respondent therapy (progressive muscle relaxation), [108]
and one trial evaluated operant therapy. [133] Sample sizes ranged from 49 to 701 (total sample=1,308). The number of psychological therapy sessions ranged from six to eight, and the duration of therapy ranged from 6 to 8 weeks. In one trial [106, 107] the duration of therapy was unclear.

Three trials compared psychological therapies versus usual care, [104, 105, 108]
one trial compared psychological therapy versus an attention control (advice), [106, 107]
and one trial compared psychological therapy versus exercise therapy. [133]

All trials were conducted in the United States or the United Kingdom.
Four trials reported outcomes through long-term (12 to 34 months) followup, [105–107, 133, 195]
one trial evaluated outcomes through intermediate-term followup, [104]
and one trial only evaluated short-term outcomes. [108]

Three trials [104–107] were rated fair quality and two trials poor quality (Appendix D). [108, 133] The major methodological limitation in the fair-quality trials was the inability to effectively blind patients and caregivers to the psychological intervention. Other methodological shortcomings in the poor-quality trials included unclear randomization and allocation concealment methods and high attrition.

Psychological Therapy Compared With Usual Care or an Attention Control

Figure 6
Psychological therapy was associated with small improvements in function compared with usual care or an attention control at short-term (3 trials, pooled SMD –0.24, 95% CI –0.38 to –0.04, I²=0%), [104, 106, 108] intermediate-term (3 trials, pooled SMD –0.24, 95% CI –0.38 to –0.10, I²=0%) [104–106] and long-term followup (3 trials, pooled SMD –0.28, 95% CI –0.43 to –0.13, I²=0%) (Figure 6). [105, 106, 195] Pooled differences on the RDQ or modified RDQ were –1.2 to –1.5 points at all time points. For short-term function, two fair-quality trials [104, 106, 107] evaluated CBT and one poor-quality trial [108] evaluated respondent therapy (progressive relaxation). Excluding the poor-quality trial of progressive relaxation, [108] which found no effect on short-term function (SMD –0.08, 95% CI –0.48 to 0.31), had no effect on the pooled estimate (2 trials, pooled SMD –0.26, 95% CI –0.44 to –0.05).

Figure 7
Psychological therapy was associated with small improvements in pain compared with usual care or an attention control at short-term (3 trials, pooled difference –0.75 on a 0 to 10 scale, 95% CI –1.01 to –0.41, I²=0%), [104, 106, 108] intermediate-term (3 trials, pooled difference –0.71, 95% CI –0.97 to –0.46, I²=0%), [104–106] and long-term followup (3 trials, pooled difference –0.55, 95% CI –0.92 to –0.23, I²=0%) (Figure 7). [105, 107, 195] Excluding a poor-quality trial of progressive relaxation, which found no effect on short-term pain (difference –0.14, 95% CI –1.27 to 0.99), did not change the pooled estimate (2 trials, pooled difference –0.78, 95% CI –1.08 to –0.47). For intermediate-term and long-term pain, all trials were fair quality and evaluated CBT.

Effects of psychological therapy on short-term or intermediate-term SF-36 Physical Component (PCS) or Mental Component (MCS) scores were small (differences 0 to 2 points on a 0 to 100 scale) and not statistically significant, except for short-term MCS (2 trials, pooled difference 2.18, 95% CI 0.37 to 4.05). [104, 106] One trial found no effect of psychological therapy on work status or healthcare visits [107] and one trial found no effect of psychological therapy on markers of healthcare utilization. [196]

Psychological Therapy Compared With Pharmacological Therapy

No trial of psychological versus pharmacological therapy met inclusion criteria.

Psychological Therapy Compared With Exercise

One poor-quality trial found no differences between psychological versus exercise therapy in intermediate-term or long-term function. [133] Differences on the McGill Pain Questionnaire were less than 0.5 points on a 0 to 78 scale, and differences on the Sickness Impact Profile were 0.60 to 1.30 points on a 0 to 100 scale.

Harms

Data on harms were sparse. One trial of cognitive-behavioral therapy versus an attention control reported no serious adverse events and one withdrawal due to adverse events among 468 patients randomized to CBT. [106, 107]

Figure 6 is a forest plot. Standardized mean differences were reported or calculated for three short-term studies, with a pooled standardized mean difference of –0.24 (95% confidence interval –0.38 to –0.04) and an overall I-squared value of 0%. Standardized mean differences were reported or calculated for three intermediate-term studies, with a pooled standardized mean difference of –0.24 (95% confidence interval –0.38 to –0.10) and an overall I-squared value of 0%. Standardized mean differences were reported or calculated for three long-term studies, with a pooled standardized mean difference of –0.28 (95% confidence interval –0.43 to –0.13) and an overall I-squared value of 0%.

Physical Modalities for Chronic Low Back Pain

Ultrasound

Two trials found inconsistent effects of ultrasound versus sham ultrasound on short-term function (SOE: insufficient). Two trials found no differences between ultrasound versus sham ultrasound in short-term pain (SOE: low).
One trial found no differences between ultrasound versus sham ultrasound in risk of any adverse events or risk of serious adverse events (SOE: low).

Interferential Therapy

One new trial found interferential therapy associated with effects on short-term function and pain that were below the threshold for small (statistical significance uncertain) when compared with a placebo therapy (SOE: low).

Low-Level Laser Therapy

One trial found low-level laser therapy associated with a small improvement compared with sham laser for short-term function (difference –8.2 on the 0 to 100 ODI, 95% CI –13.6 to –2.8) and a moderate improvement for short-term pain (difference –16.0 on a 0 to 100 scale, 95% CI –28.3 to –3.7) (SOE: low).
One trial found no differences between low-level laser therapy versus exercise therapy in intermediate-term function or pain (SOE: low).
One trial of low-level laser therapy reported no adverse events (SOE: low).

Traction

Two trials found no differences between traction versus sham traction in short-term function or pain (SOE: low).
Harms were not reported in either trial.

Short-Wave Diathermy

Data from a small, poor-quality trial were insufficient to determine effects of short-wave diathermy versus sham (detuned) diathermy (SOE: insufficient).

Detailed Synthesis

Ultrasound

Table 7
Two trials (n=50 and n=455) of ultrasound versus sham ultrasound for low back pain met inclusion criteria (Table 7 and Appendix D). [139, 140] Both of the trials were included in the prior AHRQ report. The duration of ultrasound therapy was 4 and 8 weeks and the number of sessions was 6 and 10. Both trials evaluated outcomes at short-term (1 month) followup. One good-quality trial [140] was conducted in the United States and one fair-quality trial [139] in Iran (Appendix E). Methodological limitations in the fair-quality trial included failure to blind care providers and unclear blinding of outcome assessors.

Ultrasound Compared With Sham Ultrasound

Limited evidence indicated no clear differences between ultrasound versus sham ultrasound at short-term followup. One good-quality trial (n=455) found no difference between ultrasound versus sham ultrasound in the RDQ (median 3 vs. 3, p=0.93), likelihood for ≥50 percent improvement in pain (RR 1.09, 95% CI 0.88 to 1.35), SF-36 general health (median 72 vs. 74), likelihood of prescription drug use for low back pain (16% vs. 18%, p=0.54), or risk of serious adverse events (1.3% vs. 2.7%, RR 0.48, 95% CI 0.12 to 1.88) or any adverse event (6.0% vs. 5.9%, RR 1.03, 95% CI 0.49 to 2.13). [140] In the smaller (n=50) fair-quality trial, there was no difference between ultrasound versus sham ultrasound in pain (mean 27.7 vs. 25.5 on a 0 to 100 scale, p=0.48), although ultrasound was associated with better function (mean 22.8 vs. 30.5 on the 0 to 40 Functional Rating Index, p=0.004). [139] No trial evaluated longer-term outcomes.

Ultrasound Compared With Pharmacological Therapy or With Exercise

No trial of ultrasound versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

One trial found no differences between ultrasound versus sham ultrasound in risk of any adverse event (RR 1.03, 95% CI 0.49 to 2.13) or serious adverse event (RR 0.48, 95% CI 0.12 to 1.88). [140]

Interferential Therapy

Table 8
One new trial (n=150) [144] of interferential therapy met inclusion criteria (Table 8 and Appendix D). It found small differences between 1 kHz or 4 kHz interferential therapy versus placebo therapy in the RDQ (differences 0.2 or 0.3 points) and pain (differences 0.2 or 0.4 points) at short-term followup; the statistical significance of findings was unclear due to errors in reporting of the confidence intervals (confidence intervals did not incorporate the point estimates). The trial was rated fair-quality due to the data discrepancies.

Interferential Therapy Compared With Pharmacological Therapy or With Exercise

No trial of interferential therapy versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

One trial found no differences between 1 kHz or 4 kHz interferential therapy versus placebo interferential current in withdrawals due to adverse events (4% vs. 4% vs. 4%, RR 1.0, 95% CI 0.14 to 6.8). [144]
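A risk ratio and its confidence interval of this kind can be computed from arm-level counts. The sketch below uses the standard log-scale (Wald) interval; the counts of 2 withdrawals among 50 patients per arm are an assumption inferred from n=150 across three arms and the reported 4 percent, not figures taken from the trial.

```python
import math


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio of arm A versus arm B with a log-scale Wald confidence interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


# Assumed counts: 2 of 50 withdrawals in an active arm vs. 2 of 50 on placebo.
rr, low, high = risk_ratio(2, 50, 2, 50)
print(f"RR {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Under these assumed counts the interval comes out close to the one reported above, and its width shows how little a handful of events can say about relative risk.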

Low-Level Laser Therapy

Table 9
Three trials of low-level laser therapy (n=34, 56, and 71) met inclusion criteria (Table 9 and Appendix D). [141, 142, 170] All of the trials were included in the prior AHRQ report. One trial [142] evaluated neodymium:yttrium-aluminum-garnet (Nd:YAG) laser and two trials [141, 170] evaluated gallium-arsenide (GaAs) laser. Two trials compared low-level laser therapy versus sham laser therapy [141, 142] and one trial low-level laser therapy versus exercise plus sham laser. [170] One trial was conducted in the United States, [142] one in Iran, [170] and one in Argentina. [141] The duration of laser therapy ranged from 2 to 6 weeks and the number of sessions ranged from 10 to 12. One trial [141] reported intermediate-term outcomes and the other two trials reported short-term outcomes.

Two trials [142, 170] were rated fair quality and one trial [141] poor quality (Appendix D). The major methodological limitation in the fair-quality trials was unclear allocation concealment methods. [142, 170] The poor-quality trial also did not report randomization methods, did not conduct intention-to-treat analysis at intermediate-term followup, and reported high attrition; it was also unclear if timing of followup was the same in all patients. [141]

Low-Level Laser Therapy Compared With Sham Laser

One fair-quality trial found Nd:YAG laser therapy associated with moderate improvement in pain (difference –16.0 on a 0 to 100 scale, 95% CI –28.3 to –3.7) and a small improvement in function (difference –8.2 points on the 0 to 100 ODI, 95% CI –13.6 to –2.8) at short-term followup. [142] A poor-quality trial found GaAs laser therapy associated with increased likelihood of having no pain at intermediate-term followup (44.7% vs. 15%, p<0.01), but the analysis was restricted to patients who reported that laser therapy was effective at the end of a 2-week course of treatment. [141]

Low-Level Laser Therapy Compared With Pharmacological Therapy

No trial of low-level laser therapy compared with pharmacological therapy met inclusion criteria.

Low-Level Laser Therapy Compared With Exercise Therapy

One fair-quality trial found no clear differences between GaAs laser therapy versus exercise plus sham laser in function (difference in change from baseline –4.4 on the 0 to 100 ODI, 95% CI –11.4 to 2.5) or pain (difference in change from baseline –0.9 on a 0 to 10 scale, 95% CI –2.5 to 0.7) at intermediate-term followup. [170] For pain, the difference at followup was similar to the baseline difference (mean 7.3 vs. 6.3), and final scores were very similar (4.4 vs. 4.3).
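The pain figures in this paragraph can be reproduced with a difference-in-change calculation, which shows why a between-group difference in change scores can mostly reflect a baseline imbalance. The sketch assumes the first value in each quoted pair belongs to the laser arm; the arm labels are illustrative.

```python
# Mean pain on a 0 to 10 scale, as quoted in the text (arm order is an assumption).
baseline = {"laser": 7.3, "exercise_plus_sham": 6.3}
final = {"laser": 4.4, "exercise_plus_sham": 4.3}

# Change from baseline within each arm (negative means pain decreased).
change = {arm: final[arm] - baseline[arm] for arm in baseline}

diff_in_change = change["laser"] - change["exercise_plus_sham"]
baseline_diff = baseline["laser"] - baseline["exercise_plus_sham"]
final_diff = final["laser"] - final["exercise_plus_sham"]

print(f"difference in change from baseline: {diff_in_change:.1f}")
print(f"baseline difference: {baseline_diff:.1f}")
print(f"difference in final scores: {final_diff:.1f}")
```

The laser arm started about 1 point higher and both arms ended at nearly the same score, so the 0.9-point difference in change is almost entirely the baseline gap.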

Harms

No adverse events were reported in any of the three trials of low-level laser therapy. [141, 142, 170]

Traction

Table 10
Two trials of traction (n=151 and 60) met inclusion criteria (Table 10 and Appendix D). [137, 138] Both of the trials were included in the prior AHRQ report. One trial [137] evaluated continuous traction (12 sessions in 5 weeks) and the other [138] evaluated intermittent traction (20 sessions in 6 weeks). The comparator in both trials was sham traction (traction at <10% or 20% of body weight, compared with 35% to 50% for active traction). Both trials were conducted in the Netherlands and reported only short-term outcomes. The trials were rated fair quality due to failure to blind care providers (Appendix E).

Traction Compared With Sham Traction

There were no differences between traction versus sham traction at short-term followup in function (25 vs. 23 on the 0 to 100 ODI in one trial and 4.7 vs. 4.0 on the 0 to 24 RDQ, difference 0.7, 95% CI –1.1 to 2.6) or pain (32 vs. 36 on a 0 to 100 scale, p=0.70 and 24 vs. 20, difference 3.7, 95% CI –8.4 to 15.8). [137, 138] One trial [138] also found no difference between intermittent traction versus sham on the total SF-36 (66 vs. 65 on a 0 to 100 scale) and one trial [137] found no difference between continuous traction versus sham in global perceived effect, work absence, or medical consumption.

Traction Compared With Pharmacological Therapy or With Exercise

No trial of traction compared with pharmacological therapy or with exercise met inclusion criteria.

Harms

Neither trial reported harms.

Short-Wave Diathermy

Table 11
Data were insufficient from one poor-quality trial (n=68) to evaluate effects of short-wave diathermy (3 times weekly for 4 weeks) versus sham (detuned) diathermy for low back pain (Table 11 and Appendix D). [143] The trial was included in the prior AHRQ report. Methodological limitations included unclear randomization and allocation concealment methods, differential attrition, and baseline differences between groups (Appendix E). Although diathermy was associated with worse pain than sham treatment at short-term (8 weeks after completion of therapy) followup (25 vs. 13), statistical significance was not reported. There were no statistically significant differences in likelihood of using analgesics (7% vs. 22%, RR 0.34, 95% CI 0.08 to 1.50) or being unable to work or having limited activities (7% vs. 19%, RR 0.40, 95% CI 0.09 to 1.80), but estimates were imprecise.

Harms

Adverse events were not evaluated in the trial.

Manual Therapies for Chronic Low Back Pain

Key Points

Spinal Manipulation

Spinal manipulation was associated with small improvements compared with sham manipulation, usual care, an attention control, or a placebo intervention in short-term function (3 trials, pooled SMD –0.34, 95% CI –0.75 to –0.02, I²=45%) and intermediate-term function (3 trials, pooled SMD –0.40, 95% CI –0.85 to –0.05, I²=65%) (SOE: low).
There was no difference between spinal manipulation versus sham manipulation, usual care, an attention control, or a placebo intervention in short-term pain (3 trials, pooled difference –0.36 on a 0 to 10 scale, 95% CI –0.62 to 0.25, I²=0%), but manipulation was associated with a small improvement compared with controls on intermediate-term pain (3 trials, pooled difference –0.64, 95% CI –0.93 to –0.35, I²=0%) (SOE: low for short term, moderate for intermediate term).
There were no differences between spinal manipulation versus exercise in short-term function (3 trials, pooled SMD 0.02, 95% CI –0.28 to 0.30; I²=37%) or intermediate-term function (4 trials, pooled SMD 0.01, 95% CI –0.15 to 0.21; I²=19%) (SOE: low).
There were no differences between spinal manipulation versus exercise in short-term pain (3 trials, pooled difference 0.31 on a 0 to 10 scale, 95% CI –0.42 to 1.06; I²=34%) or intermediate-term pain (4 trials, pooled difference 0.23, 95% CI –0.14 to 0.59, I²=0%) (SOE: low).
No serious adverse events or withdrawals due to adverse events were reported in seven trials; nonserious adverse events with manipulation (primarily increased pain) were reported in three trials (SOE: low).

Massage

Massage was associated with small improvements in short-term function compared with sham massage or usual care (6 trials [2 new], SMD –0.38, 95% CI –0.63 to –0.20, I²=0%). There were no differences between massage versus controls in intermediate-term function (3 trials, SMD –0.09, 95% CI –0.26 to 0.12, I²=0%) (SOE: moderate for short term, low for intermediate term).
Massage was associated with a small improvement in short-term pain compared with sham massage or usual care (5 trials [1 new], pooled difference –0.55 on a 0 to 10 scale, 95% CI –0.88 to –0.23, I²=0%). There was no difference between massage versus controls in intermediate-term pain (3 trials, pooled difference –0.02, 95% CI –0.56 to 0.44, I²=0%) (SOE: moderate for short term, low for intermediate term).
One trial found no differences between massage versus exercise in intermediate-term function or pain (SOE: low).
Four trials of massage reported no serious adverse events; in four trials, the proportion of massage patients who reported increased pain ranged from <1 to 26 percent (SOE: low).

Detailed Synthesis

Spinal Manipulation

Table 12
Eight trials of spinal manipulation for low back pain met inclusion criteria (Table 12 and Appendix D). [143, 171–174, 190–192] All of the trials were included in the prior AHRQ report. Six trials evaluated standard (high-velocity low-amplitude) manipulation techniques; one trial [192] evaluated flexion-distraction manipulation and one trial [172] evaluated both high-velocity low-amplitude and flexion-distraction manipulation. Sample sizes ranged from 75 to 1,001 (total sample=2,580). The number of manipulation therapy sessions ranged from 4 to 24 and the duration of therapy ranged from 4 to 12 weeks. In one trial, patients were randomized to 12 manipulation sessions over 1 month or to 12 sessions over 1 month plus biweekly maintenance sessions for an additional 10 months. [173]

Two trials compared spinal manipulation versus usual care, [172, 174]
one trial spinal manipulation versus an attention control (minimal massage), [171]
one trial spinal manipulation versus sham manipulation, [173]
one trial spinal manipulation versus a placebo treatment (sham short-wave diathermy), [143]
and four trials spinal manipulation versus exercise. [174, 190–192]

One trial was conducted in Egypt [173] and the rest in the United States, United Kingdom, or Australia. Six trials reported outcomes through intermediate-term followup [171, 173, 174, 190–192] and two trials only evaluated short-term outcomes. [143, 172]

Two trials [143, 173] were rated poor quality and the remainder fair quality (Appendix E). The major methodological limitation in the fair-quality trials was use of an unblinded design. Methodological shortcomings in the poor-quality trials included unclear randomization and allocation concealment methods, failure to report intention-to-treat analysis, and high attrition.

Spinal Manipulation Compared With Sham Manipulation, Usual Care, an Attention Control, or a Placebo Intervention

Figure 8
Spinal manipulation was associated with small improvements in function compared with controls at short-term followup (3 trials, SMD –0.34, 95% CI –0.75 to –0.02, I²=45%) [171–173] and intermediate-term followup (3 trials, SMD –0.40, 95% CI –0.85 to –0.05, I²=65%) [171, 173, 174] (Figure 8). Based on the original 0 to 100 scales (ODI and Von Korff functional disability [VF]) used in two trials, the pooled difference was –5.12 (95% CI –10.53 to 0.77) for short-term function and –9.27 (95% CI –13.42 to –5.12) for intermediate-term function. Estimates were similar when a poor-quality trial [173] was excluded. For short-term function, one trial reported similar effects for standard manipulation (difference –1.3 on the RDQ, 95% CI –2.9 to 0.6) and flexion-distraction manipulation (difference –1.9, 95% CI –3.6 to –0.2); therefore, results for both arms were combined for the pooled analysis. [172]

Figure 9
There was no clear difference between spinal manipulation versus sham manipulation, an attention control, or a placebo intervention in short-term pain (3 trials, pooled difference –0.36 on a 0 to 10 scale, 95% CI –0.62 to 0.25, I²=0%) (Figure 9). [143, 171, 173] Two of the trials were rated poor quality; the results of the fair-quality trial [171] were consistent with the overall estimate (difference –0.21, 95% CI –0.69 to 0.26). Manipulation was associated with a small improvement in intermediate-term pain compared with sham manipulation, usual care, or an attention control (3 trials, pooled difference –0.64 on a 0 to 10 scale, 95% CI –0.93 to –0.35, I²=0%). [171, 173, 174] The estimate was similar when a poor-quality trial [173] was excluded (2 trials, difference –0.60, 95% CI –0.98 to –0.21). [171, 174]

Two trials found no differences between spinal manipulation versus controls on the SF-36 MCS and PCS. [171, 174] At short-term followup, one trial [171] found no differences in PCS (mean difference 0.94 on a 0 to 100 scale, 95% CI –1.55 to 3.42) or MCS scores (mean difference –0.17 on a 0 to 100 scale, 95% CI –2.70 to 2.36). At intermediate-term followup, pooled differences were also very small and not statistically significant for the PCS (2 trials, mean difference 1.54, 95% CI –0.03 to 3.10, I²=0%) or the MCS (2 trials, mean difference 0.52, 95% CI –1.94 to 2.97, I²=44%). [171, 174]

Spinal Manipulation Compared With Pharmacological Therapy

No trial of spinal manipulation versus pharmacological therapy met inclusion criteria.

Spinal Manipulation Compared With Exercise

Figure 10
Figure 11
There were no differences between spinal manipulation versus exercise in function at short-term (3 trials, SMD 0.02, 95% CI –0.28 to 0.30, I²=37%) [190–192] or intermediate-term followup (4 trials, SMD 0.01, 95% CI –0.15 to 0.21, I²=19%) [174, 190–192] (Figure 10). Excluding one trial [192] of flexion-distraction manipulation resulted in similar findings.

There were no differences between spinal manipulation versus exercise in short-term pain (3 trials, pooled difference 0.31, 95% CI –0.42 to 1.06, I²=34%) [190–192] or intermediate-term pain (4 trials, pooled difference 0.23, 95% CI –0.14 to 0.59, I²=0%) (Figure 11). [174, 190–192] Excluding one trial [192] of flexion-distraction manipulation resulted in similar findings.

Two trials found no differences between spinal manipulation versus exercise on the SF-36 MCS and PCS. [174, 190] One trial found no differences in short-term PCS (mean difference –1.25 on a 0 to 100 scale, 95% CI –3.32 to 0.83) or MCS scores (mean difference 0.95, 95% CI –0.96 to 2.86). [190] At intermediate-term followup, pooled differences were also very small (<1 point) and not statistically significant for the PCS (2 trials, mean difference –0.89, 95% CI –2.33 to 0.55, I²=0%) or the MCS (2 trials, mean difference 0.64, 95% CI –0.96 to 2.24). [174, 190]

Harms

Seven trials of spinal manipulation reported no serious adverse events or withdrawals due to adverse events. [171–174, 190–192] Nonserious adverse events (primarily increased pain) were reported in three trials. [171, 173, 190]

Massage

Table 13
Eight trials of massage for low back pain met inclusion criteria (Table 13 and Appendix D). [108, 175–180, 189] Six trials [108, 175–178, 189] were included in the prior AHRQ report and two new trials [179, 180] were identified for this update. Massage techniques varied across trials.

Two trials evaluated reflexology, [108, 178]
two trials (one new) myofascial release, [175, 179]
one trial relaxation or structural massage, [177]
one trial (new) acupressure [180]
and two trials mixed massage techniques that included Swedish massage. [176, 189]

Sample sizes ranged from 15 to 401 (total sample=1,133).
Two trials compared massage versus sham massage, [175, 178]
three trials massage versus usual care, [108, 177, 189]
and one trial compared massage versus an attention control (self-care education). [176]

Two new trials compared the intervention to sham: one compared acupressure to sham acupressure [180] and the other compared myofascial release to sham myofascial release. [179] One trial was conducted in India, [175] one trial in Iran, [180] and the rest in the United States or Europe. The duration of massage therapy ranged from 2 to 10 weeks and the number of massage sessions ranged from 4 to 24. Three trials reported outcomes through intermediate-term followup, [176, 177, 189] and five only reported short-term outcomes. [108, 175, 178–180] No trial reported long-term outcomes.

Seven of the massage trials were rated fair quality [108, 175–179, 189] and one trial was rated poor quality [180] (Appendix E). Methodological limitations included unclear allocation concealment methods and unblinded design. One trial reported high loss to followup; [108] the poor-quality trial [180] also was unclear regarding blinding of outcome assessors and did not provide information on treatment compliance.

Massage Compared With Sham Massage, Usual Care, or an Attention Control

Figure 12
Massage was associated with small effects on short-term function versus sham massage or usual care (6 trials, SMD –0.38, 95% CI –0.63 to –0.20, I²=0%) (Figure 12). [108, 175, 177–180] The massage technique was myofascial release in two trials (pooled SMD –0.45, 95% CI –0.88 to –0.04), [175, 179] structural or relaxation massage in one trial (difference –1.72 on the 0 to 23 modified RDQ, 95% CI –2.78 to –0.67), [177] foot reflexology in two trials (pooled SMD –0.15, 95% CI –0.60 to 0.50), [108, 178] and acupressure in one trial (mean difference –12.2, 95% CI –18.6 to –5.8 on the 9 to 63 Fatigue Severity Scale). [180] Estimates were similar when trials were stratified according to whether the comparator was sham massage or usual care. There was no effect on intermediate-term function (3 trials, SMD –0.09, 95% CI –0.26 to 0.12, I²=0%) (Figure 12). [176, 177, 189]

Figure 13
Massage was associated with small effects on short-term pain versus sham massage or usual care (5 trials, pooled difference –0.55 on a 0 to 10 scale, 95% CI –0.88 to –0.23, I²=0%) (Figure 13). [108, 175, 177–179] On a 0 to 10 scale, effects were –0.60 points (95% CI –1.72 to 0.46) in two trials of foot reflexology, [108, 178] –0.68 points (95% CI –1.35 to –0.10) in two trials of myofascial release, [175, 179] and –0.35 points (95% CI –0.82 to 0.12) in a trial of relaxation or structural massage. [177] Estimates were similar when trials were stratified according to whether the comparator was sham massage or usual care. There was no difference between massage (structural or relaxation massage or mixed massage techniques, including Swedish massage) versus an attention control or usual care in intermediate-term pain (3 trials, pooled difference –0.02, 95% CI –0.56 to 0.44, I²=0%). [176, 177, 189]

One trial found no difference between massage versus usual care in use of opioids at intermediate-term followup or healthcare costs. [177] There was insufficient evidence to determine effects of duration of massage or number of massage sessions on findings. Two trials [177, 189] found no differences between massage versus usual care on the SF-36 MCS (mean difference 0.87 on a 0 to 100 scale, 95% CI –1.01 to 2.75, I²=0%) or PCS scores (mean difference 3.91 on a 0 to 100 scale, 95% CI –4.50 to 12.31, I²=77%) at intermediate-term followup, and one trial [108] found no effects on various SF-36 subscales or the Beck Depression Inventory at short-term followup. One trial found massage associated with greater likelihood of experiencing ≥3 point improvement in the RDQ or ≥20 point improvement on a 0 to 100 VAS pain scale, but did not report statistical significance, which could not be calculated because the denominators were unclear. [179]

Massage Compared With Pharmacological Therapies

No trial of massage versus pharmacological therapy met inclusion criteria.

Massage Compared With Exercise

One trial found no differences between massage versus exercise in intermediate-term function (difference 1.2 on the 0 to 24 RDQ, 95% CI –1.47 to 3.87), pain (difference 0.60 on the 0 to 10 Von Korff pain scale, 95% CI –0.67 to 1.87), or the SF-36 MCS or PCS scores (differences 0 to 3 points on 0 to 100 scales, p>0.05). [189]

Harms

Four trials [175, 176, 179, 180] of massage reported no serious adverse events, and one trial [178] reported no adverse events. In four trials, the proportion of massage patients who reported increased pain ranged from <1 to 26 percent. [175–177, 189]

Mindfulness-Based Stress Reduction for Chronic Low Back Pain

Key Points

There were no differences between mindfulness-based stress reduction (MBSR) versus usual care or attention control in short-term function (4 trials, pooled SMD –0.14, 95% CI –0.51 to 0.02, I²=0%), intermediate-term function (1 trial, SMD –0.20, 95% CI –0.46 to 0.06), or long-term function (1 trial, SMD –0.09, 95% CI –0.35 to 0.16) (SOE: low).
MBSR was associated with a small improvement compared with usual care or an attention control in short-term pain (3 trials, pooled difference –0.68 on a 0 to 10 scale, 95% CI –1.29 to –0.28, I²=45%) after excluding two poor-quality trials; MBSR was also associated with a small improvement in intermediate-term pain (1 trial, difference –0.75, 95% CI –1.16 to –0.34), with no statistically significant effects on long-term pain (1 trial, difference –0.22, 95% CI –0.63 to 0.19) (SOE: moderate for short term, low for intermediate and long term).
One trial reported temporarily increased pain in 29 percent of patients undergoing MBSR, and three trials reported no harms (SOE: low).

Detailed Synthesis

Table 14
Five trials (7 publications) of MBSR for low back pain met inclusion criteria (Table 14 and Appendix D). [104, 194–199] All of the trials were included in the prior AHRQ report. In three trials, [104, 195–198] the MBSR intervention was closely modeled on the program developed by Kabat-Zinn; [282] in the other two trials, the MBSR intervention appeared to have undergone some adaptations from the original Kabat-Zinn program. [194, 199] In all trials, the main intervention consisted of 1.5 to 2 hour weekly group sessions for 8 weeks. Sample sizes ranged from 35 to 282 (total sample=629).

Three trials compared MBSR versus usual care [104, 194–196, 199]
and two trials compared MBSR versus an attention control (education). [197, 198]
Four trials [104, 195–199] were conducted in the United States and one trial [194] in Iran.

One trial focused on patients on opioid therapy for low back pain. [199] One trial reported outcomes through long-term (22 months after 8-week MBSR course) followup, [104, 195, 196] and the others only evaluated short-term outcomes.

Three trials [104, 195–198] were rated fair quality and two trials poor quality (Appendix D). [194, 199] The major methodological limitation in the fair-quality trials was the inability to effectively blind patients and caregivers to the MBSR intervention. One poor-quality trial reported unclear randomization and allocation concealment methods and had high attrition, [194] and another poor-quality trial reported a large between-group difference in baseline pain scores (Brief Pain Inventory score 6.3 on a 0 to 10 scale with MBSR versus 4.9 with usual care). [199]

MBSR Compared With Usual Care or an Attention Control

Figure 14
MBSR was associated with no statistically significant differences in short-term function compared with usual care or an attention control (4 trials, pooled SMD –0.14, 95% CI –0.51 to 0.02, I²=0%) (Figure 14). [104, 197–199] Three trials [104, 197, 198] evaluated function using the RDQ (pooled difference –0.89 points on a 0 to 24 scale, 95% CI –2.37 to 0.30), and one trial [199] used the ODI (difference –3.00 points on a 0 to 100 scale, 95% CI –11.39 to 5.39). One trial found no difference between MBSR versus usual care in intermediate-term (SMD –0.20, 95% CI –0.46 to 0.06) or long-term function (SMD –0.09, 95% CI –0.35 to 0.16). [104, 195] There was no clear difference between MBSR versus controls in likelihood of a clinically meaningful effect on function (≥30% improvement in RDQ or RDQ improved by ≥2.5 points) at short term (2 trials, RR 1.17, 95% CI 0.88 to 1.57). [104, 197] Data were restricted to one trial for intermediate-term (RR 1.41, 95% CI 1.13 to 1.77) [104] and long-term followup (RR 1.32, 95% CI 1.00 to 1.74). [195]

Figure 15
MBSR was associated with no statistically significant effects on short-term pain compared with usual care or an attention control, when all trials were included in the analysis (5 trials, pooled difference –0.88 on a 0 to 10 scale, 95% CI –1.82 to 0.08, I²=89%) (Figure 15). [104, 194, 197–199] However, the estimate favored MBSR and statistical heterogeneity was substantial. Excluding two poor-quality trials, [194, 199] one of which reported the largest effect in favor of MBSR (–2.23 points) and the other of which was the only trial with results that favored usual care (mean difference 0.40 points), resulted in a small, statistically significant effect on short-term pain (3 trials, pooled difference –0.68, 95% CI –1.29 to –0.28, I²=45%) and reduced statistical heterogeneity. [104, 197, 198] Estimates were similar when analyses were stratified according to whether the trial evaluated usual care or an attention control comparator. One trial found MBSR associated with a small improvement compared with usual care on intermediate-term pain (difference –0.75 on a 0 to 10 scale, 95% CI –1.16 to –0.34); there was no statistically significant effect on long-term pain (difference –0.22, 95% CI –0.63 to 0.19). [195] MBSR was associated with greater likelihood of a clinically meaningful effect on pain (defined as ≥30% improvement) at short-term (2 trials, RR 1.49, 95% CI 1.14 to 1.95, I²=0%) [104, 197] and intermediate-term followup (1 trial, RR 1.56, 95% CI 1.14 to 2.14), [104] but not at long-term followup (41% vs. 31%, RR 1.32, 95% CI 0.95 to 1.85). [195]
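The long-term responder result in the preceding paragraph can be checked by hand: a risk ratio is the ratio of the two response proportions, and an approximate two-sided p-value can be recovered from a reported ratio and its 95% CI by working on the log scale. The Python sketch below is illustrative only. The function names are hypothetical, the calculation assumes a normal approximation on the log scale, and the inputs are the rounded values quoted in the text, so the result is approximate. It is not the analysis method used in the review.

```python
import math


def risk_ratio(p_intervention: float, p_control: float) -> float:
    """Ratio of the response proportion with intervention to that with control."""
    return p_intervention / p_control


def p_value_from_ratio_ci(ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided p-value from a ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    # Standard error of log(ratio), back-calculated from the CI width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = abs(math.log(ratio)) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(z)).
    return math.erfc(z / math.sqrt(2))


# Long-term >=30% pain improvement: 41% with MBSR vs. 31% with usual care.
rr = risk_ratio(0.41, 0.31)
p = p_value_from_ratio_ci(1.32, 0.95, 1.85)
print(f"RR = {rr:.2f}, approximate p = {p:.2f}")
```

Because the reported interval includes 1, the approximate p-value exceeds 0.05, which is consistent with the text's statement that the long-term difference was not statistically significant.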

Three trials found no clear differences between MBSR versus usual care or an attention control on quality of life measured by the 12-Item Short Form Health Survey (SF-12) or 36-Item Short Form Health Survey (SF-36). [104, 194, 197] Two trials reported conflicting effects on short-term PCS (mean difference 2.89, 95% CI –5.13 to 10.92, I²=97%) and MCS scores (mean difference 4.27, 95% CI –0.07 to 9.51, I²=88%), though statistical heterogeneity was high. [104, 194] One trial found no difference in intermediate-term PCS (mean difference –0.56, 95% CI –2.52 to 1.40) or MCS scores (mean difference 2.06, 95% CI 0.05 to 4.07). [104] One trial found MBSR associated with less medication use for low back pain at short term (43% vs. 54%) but not at intermediate term (47% vs. 53%); MBSR was associated with a small decrease in severity of depression (difference 0.63 points on the Patient Health Questionnaire [PHQ-8] at intermediate-term), with no clear differences in measures of healthcare utilization. [104, 196]

MBSR Compared With Pharmacological Therapy or With Exercise

No trial of MBSR versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

In one trial, 29 percent of MBSR patients reported temporarily increased pain. [104] Three trials [197–199] reported no adverse events and one trial [194] did not report adverse events.

Mind-Body Practices for Chronic Low Back Pain

Key Points

Yoga

Yoga was associated with small effects on function versus an attention or waitlist control at short-term (8 trials [2 new], pooled SMD –0.45, 95% CI –0.69 to –0.28, I²=31%) and intermediate-term followup (3 trials, pooled SMD –0.29, 95% CI –0.47 to –0.11, I²=0%) (SOE: moderate for short term, low for intermediate term).
Yoga was associated with small effects on pain versus an attention or waitlist control at short-term (7 trials [2 new], pooled difference –0.87 on a 0 to 10 scale, 95% CI –1.49 to –0.24, I²=64%) and moderate effects at intermediate-term (2 trials, pooled difference –1.16, 95% CI –2.16 to –0.27, I²=0%) (SOE: low for short term, moderate for intermediate term).
Yoga was associated with no statistically significant differences versus exercise in short-term or intermediate-term pain or function (SOE: low).
Yoga was not associated with increased risk of harms versus controls (SOE: low).

Qigong

One trial found no differences between qigong versus exercise in short-term function (difference 0.9 on the RDQ, 95% CI –0.1 to 2.0), although intermediate-term results showed a small improvement favoring exercise (difference 1.2, 95% CI 0.1 to 2.3) (SOE: low).
One trial found qigong associated with slightly worse pain versus exercise at short-term followup (difference 7.7 on a 0 to 100 scale, 95% CI 0.7 to 14.7), but the difference at intermediate-term was not statistically significant (difference 7.1, 95% CI –1.0 to 15.2) (SOE: low).
One trial found no difference between qigong versus exercise in risk of adverse events (SOE: low).

Detailed Synthesis

Yoga

Table 15
Ten trials of yoga for low back pain met inclusion criteria (Table 15, Appendix D). [37, 204–211, 220] Eight trials [204–210, 220] were included in the prior AHRQ report and two trials [37, 211] were added for this update. In the prior AHRQ report, four trials evaluated Iyengar yoga, [208–210, 220] two trials Viniyoga, [206, 207] and two trials Hatha yoga; [204, 205] one new trial evaluated Kundalini yoga [37] and the other new trial evaluated Restorative Exercise and Strength Training for Operational Resilience and Excellence (RESTORE) yoga. [211] Across all trials, sample sizes ranged from 60 to 320 (total sample=1,520).

Six trials compared yoga versus an attention control (education), [37, 205–208, 210]
two trials yoga versus wait list control, [204, 209]
one trial yoga versus usual care, [211]
and five trials yoga versus exercise. [37, 205–207, 220]

One trial was conducted in India [220] and the rest in the United States or Europe. The duration of yoga therapy ranged from 4 to 24 weeks and the number of sessions ranged from 4 to 48. In one trial, patients who received 12 weeks of yoga therapy were randomized to ongoing once-weekly maintenance sessions or to no maintenance. [205] Three trials reported outcomes through intermediate-term followup, [205, 208, 209] and seven only reported short-term outcomes. [37, 204, 206, 207, 210, 211, 220]

All of the trials were rated fair quality (Appendix E). Trials could not effectively blind patients; other methodological limitations included unclear allocation or randomization methods and high attrition.

Yoga Compared With an Attention Control or Waitlist

Figure 16
Yoga was associated with small effects on short-term function versus an attention control or waitlist (8 trials, pooled SMD –0.45, 95% CI –0.69 to –0.28, I²=31%) (Figure 16). [37, 204–208, 210, 211]

Results were similar for Viniyoga (2 trials, pooled SMD –0.54, 95% CI –1.36 to 0.18), [206, 207]
Hatha yoga (2 trials, SMD –0.45, 95% CI –0.82 to –0.09), [204, 205]
Iyengar yoga (2 trials, SMD –0.38, 95% CI –1.38 to 0.14), [208, 210]
Kundalini yoga (1 trial, SMD –0.13, 95% CI –0.57 to 0.31), [37]
or RESTORE yoga (1 trial, SMD –0.74, 95% CI –1.23 to –0.25). [211]

Six trials evaluated function using the RDQ or modified RDQ, with a difference on a 0 to 24 or 0 to 23 scale of –2.32 (95% CI –3.48 to –1.40, I²=46%). [204–208, 211] Yoga was also associated with small effects on intermediate-term function versus controls (3 trials, pooled SMD –0.29, 95% CI –0.47 to –0.11, I²=0%). [205, 208, 209] In two trials that evaluated intermediate-term function with the RDQ or modified RDQ, the difference was –1.65 points (95% CI –3.17 to –0.32, I²=0%). [205, 208] No trials were rated poor quality.

Figure 17
Yoga was associated with small effects on short-term pain versus controls (7 trials, pooled difference –0.87, 95% CI –1.49 to –0.24 on a 0 to 10 scale, I²=64%) (Figure 17). [37, 204–207, 210, 211] Estimates were similar from two trials of Viniyoga (pooled difference –1.25, 95% CI –3.78 to 1.27), [206, 207] two trials of Hatha yoga (difference –0.80, 95% CI –1.46 to –0.20), [204, 205] and one trial of Iyengar yoga (difference –1.40, 95% CI –2.43 to –0.37); [210] one trial of Kundalini yoga [37] and one trial of RESTORE yoga [211] showed no clear effects on pain, but estimates were imprecise. Yoga was also associated with moderate effects on intermediate-term pain versus controls, based on two trials (pooled difference –1.16, 95% CI –2.16 to –0.27, I²=0%). [205, 209]

Data on effects of yoga on quality of life were limited. One trial found no difference between yoga versus an attention control on the SF-36 Physical and Mental Component Summaries at short-term or intermediate-term followup (differences 0.42 to 2.02 points on a 0 to 100 scale). [208] One other trial found no differences between yoga versus an attention control on the SF-36, but did not provide data. [206]

One trial found yoga associated with lower (better) scores on the Beck Depression Inventory than waitlist at intermediate-term followup (mean 4.6 vs. 7.8 on a 0 to 63 scale, p=0.004) [209] and one trial found no difference between yoga versus waitlist in opioid use (9% vs. 7%, p=0.40) or other medical treatments for pain (39% vs. 37%, p=0.42) at short-term followup. [204] One trial found yoga associated with fewer work absence days compared with an attention control at 5 to 8 months followup (mean difference –8.0 days, 95% CI –15.8 to –0.2), but differences were not statistically significant at 1 to 4 months or at 9 to 12 months. [37]

Yoga Compared With Pharmacological Therapy

No trial of yoga versus pharmacological therapy met inclusion criteria.

Yoga Compared With Exercise

Figure 18
Figure 19
There were no differences between yoga versus exercise in short-term function (4 trials, pooled SMD –0.04, 95% CI –0.27 to 0.16, I²=0%) [37, 205–207] or intermediate-term function (1 trial, SMD –0.01, 95% CI –0.26 to 0.24) [205] (Figure 18). One trial found no difference between yoga versus exercise on the SF-36 at short-term followup (data not provided). [206] No trials were rated poor quality. Effects of yoga versus exercise on short-term pain were not statistically significant and there was marked heterogeneity (5 trials, pooled difference –0.63 on a 0 to 10 scale, 95% CI –1.68 to 0.45, I²=88%) (Figure 19). [37, 205–207, 220] Effects favored yoga in one trial of Iyengar yoga (difference –2.00, 95% CI –2.50 to –1.50) and in one trial of Viniyoga (difference –1.50, 95% CI –2.36 to –0.64). The other three trials (one trial each of Viniyoga, Kundalini yoga, and Hatha yoga) each found no differences between yoga versus exercise. One trial found no difference between yoga versus exercise in intermediate-term pain (difference 0.30, 95% CI –0.39 to 0.99). [205]
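The I² values quoted with each pooled estimate are conventionally derived from Cochran's Q statistic as I² = max(0, (Q − df)/Q), where df is the number of trials minus one. The Python sketch below assumes that conventional definition; this section does not report Q or describe the estimator used, so the Q value here is back-calculated from the reported I² purely for illustration, and the function names are hypothetical.

```python
def i_squared(q: float, n_trials: int) -> float:
    """I-squared (percent) from Cochran's Q, using the conventional definition."""
    df = n_trials - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_from_i_squared(i2_percent: float, n_trials: int) -> float:
    """Invert the definition above; only valid for 0 < I-squared < 100."""
    df = n_trials - 1
    return df / (1.0 - i2_percent / 100.0)


# Short-term pain, yoga versus exercise: 5 trials, reported I-squared of 88%.
implied_q = q_from_i_squared(88.0, 5)
print(f"Implied Q is about {implied_q:.1f} on 4 degrees of freedom")
print(f"Round trip: I-squared = {i_squared(implied_q, 5):.0f}%")

# A Q no larger than its degrees of freedom gives I-squared = 0%,
# which is how an I-squared of 0% arises under this definition.
print(f"Q = 3.0 with 5 trials: I-squared = {i_squared(3.0, 5):.0f}%")
```

Under this definition, an I² of 88% with five trials corresponds to a Q roughly eight times its degrees of freedom, which is why the text describes the heterogeneity as marked.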

Harms

Data on harms were limited, but trials reported no clear difference between yoga versus control interventions in risk of any adverse event (primarily mild, self-limiting back or joint pain). [205, 207, 208] Three serious adverse events were reported across three trials (?1% of patients), all in patients randomized to yoga: worsening back pain due to yoga, [205, 207, 208] herniated disc [205, 207, 208] and cellulitis [205] (whether the latter two complications were related to yoga is unclear).

Qigong

Table 16
One German trial (n=125) compared qigong (weekly sessions for 3 months) versus exercise therapy (including stretching and strengthening) (Table 16 and Appendix D). [219] The trial was included in the prior AHRQ report. It was rated fair quality due to baseline differences between groups, unblinded design, and suboptimal compliance (Appendix E). There was no difference between qigong versus exercise in short-term function (difference 0.9 on the 0 to 24 RDQ, 95% CI –0.1 to 2.0), although intermediate-term results slightly favored exercise (difference 1.2, 95% CI 0.1 to 2.3). Qigong was associated with slightly worse pain versus exercise at short-term followup (difference 7.7 on a 0 to 100 scale, 95% CI 0.7 to 14.7), but the difference at intermediate-term was not statistically significant (difference 7.1, 95% CI –1.0 to 15.2). There were no differences in sleep, measures of the SF-36 PCS or MCS scores, or in risk of adverse events.
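The direction of the qigong results depends on the scale: on both the RDQ and the pain scale, higher scores are worse, so a positive qigong-minus-exercise difference whose confidence interval excludes zero favors exercise. The Python sketch below applies that reading to the four intervals quoted above. The helper name and its labels are hypothetical and serve only to illustrate the interpretation rule described in the Introduction.

```python
def interpret_difference(ci_low: float, ci_high: float,
                         higher_is_worse: bool = True) -> str:
    """Label a mean difference (intervention minus comparator) from its 95% CI."""
    # A CI that contains zero is read as no clear difference.
    if ci_low <= 0.0 <= ci_high:
        return "no clear difference"
    intervention_scores_higher = ci_low > 0.0
    # Higher scores for the intervention are unfavorable when higher is worse.
    if intervention_scores_higher == higher_is_worse:
        return "favors comparator"
    return "favors intervention"


# Qigong versus exercise; differences are qigong minus exercise.
results = {
    "short-term function (RDQ)": interpret_difference(-0.1, 2.0),
    "intermediate-term function (RDQ)": interpret_difference(0.1, 2.3),
    "short-term pain (0 to 100)": interpret_difference(0.7, 14.7),
    "intermediate-term pain (0 to 100)": interpret_difference(-1.0, 15.2),
}
for outcome, label in results.items():
    print(f"{outcome}: {label}")
```

The labels reproduce the narrative: exercise is favored for intermediate-term function and short-term pain, with no clear difference for the other two outcomes.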

Acupuncture for Chronic Low Back Pain

Key Points

Acupuncture was associated with a small improvement in short-term function compared with sham acupuncture or usual care (4 trials, pooled SMD –0.23, 95% CI –0.35 to –0.04, I²=25%). There were no differences between acupuncture versus controls in intermediate-term function (3 trials, pooled SMD –0.08, 95% CI –0.42 to 0.28, I²=64%) or long-term function (1 trial, adjusted difference –3.4 on the 0 to 100 ODI, 95% CI –7.8 to 1.0) (SOE: low).
Acupuncture was associated with small improvements in short-term pain compared with sham acupuncture, usual care, an attention control, or a placebo intervention (5 trials, pooled difference –0.54 on a 0 to 10 scale, 95% CI –0.91 to –0.16, I²=25%). There was no difference in intermediate-term pain (5 trials, pooled difference –0.22, 95% CI –0.67 to 0.21, I²=0%); one trial found acupuncture associated with greater effects on long-term pain (difference –0.83, 95% CI –1.53 to –0.13) (SOE: moderate for short term, low for intermediate term and long term).
There was no clear difference between acupuncture versus control interventions in risk of study discontinuation due to adverse events. Serious adverse events were rare with acupuncture and control interventions (SOE: low).

Detailed Synthesis

Table 17
Eight trials of acupuncture for low back pain met inclusion criteria (Table 17 and Appendix D). [176, 224–230] All of the trials were included in the prior AHRQ report. All trials evaluated needle acupuncture to body acupoints; one trial also evaluated electroacupuncture. [225] Sample sizes ranged from 46 to 1,162 (total sample=2,645). Four trials compared acupuncture versus sham acupuncture, [224, 226–228] three trials acupuncture versus usual care, [226, 228, 230] two trials acupuncture versus a placebo intervention (sham transcutaneous electrical nerve stimulation [TENS]), [225, 229] and one trial acupuncture versus an attention control (self-care education). [176] One trial was conducted in Asia [227] and the rest in the United States or Europe. The duration of acupuncture therapy ranged from 6 to 12 weeks and the number of acupuncture sessions ranged from 6 to 15. One trial reported outcomes through long-term followup, [230] four trials through intermediate-term followup, [176, 224–226] and the remainder only evaluated short-term outcomes.

One trial was rated good quality, [224] five trials fair quality, [176, 226–228, 230] and two trials [225, 229] poor quality (Appendix D). Limitations in the fair-quality and poor-quality trials included unblinded design, unclear randomization or allocation concealment methods, and high attrition.

Acupuncture Compared With Sham Acupuncture, Usual Care, an Attention Control, or a Placebo Intervention

Figure 20
Acupuncture was associated with small improvements in short-term function compared with sham acupuncture or usual care (4 trials, pooled SMD –0.23, 95% CI –0.35 to –0.04, I²=25%) (Figure 20). [224, 226–228] Each trial measured function using a different scale; across trials the SMD ranged from –0.34 to 0.00. Differences were slightly greater in trials that compared acupuncture against usual care (2 trials, SMD –0.43, 95% CI –0.60 to –0.22) [226, 228] than against sham acupuncture (4 trials, SMD –0.13, 95% CI –0.24 to 0.01). [224, 226–228] None of the trials were rated poor quality. There were no differences between acupuncture versus controls in intermediate-term function (3 trials, pooled SMD –0.08, 95% CI –0.42 to 0.28, I²=64%) [176, 224, 226] or long-term function (1 trial, adjusted difference –3.4 on the 0 to 100 ODI, 95% CI –7.8 to 1.0). [230]
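
Because each trial measured function on a different scale, the pooled estimates above are standardized mean differences. As a reminder of what that quantity expresses, one common textbook form is sketched below. This is an assumption for illustration; the exact estimator is specified in the Methods section of the report, not here. The symbols (group means, standard deviations, and sample sizes) are generic placeholders rather than values from any trial.

```latex
\[
\mathrm{SMD} = \frac{\bar{x}_{\text{acupuncture}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
\]
```

Under this form, a negative SMD favors acupuncture when lower scores indicate better function, which matches the direction of the estimates reported above.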

Figure 21
Acupuncture was associated with small improvements in short-term pain compared with sham acupuncture, usual care, an attention control, or a placebo intervention (5 trials, pooled difference –0.54 on a 0 to 10 scale, 95% CI –0.91 to –0.16, I²=25%) (Figure 21). [224–228] The pooled estimate was similar when poor-quality trials were excluded. When stratified according to the type of control intervention, acupuncture was associated with greater effects when compared with usual care (2 trials, pooled difference –1.01, 95% CI –1.60 to –0.28) [226, 228] than when compared with sham acupuncture (4 trials, pooled difference –0.21, 95% CI –0.66 to 0.18). [224, 226–228] There was no difference between acupuncture versus controls in intermediate-term pain (5 trials, pooled difference –0.22, 95% CI –0.67 to 0.21, I²=0%). [176, 224–226, 230] One trial found acupuncture associated with greater effects on long-term pain than usual care (difference –0.83, 95% CI –1.53 to –0.13). [230]
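
The stratified results above can be read with the rule stated in the Introduction: an estimate is statistically significant when its confidence interval excludes the null value (0 for mean differences, 1 for risk ratios). The following is a minimal sketch of that check applied to the short-term pain intervals quoted in this paragraph; the function and variable names are illustrative and not part of the report.

```python
def excludes_null(ci_low: float, ci_high: float, null: float = 0.0) -> bool:
    """Return True when the confidence interval does not contain the null value."""
    return not (ci_low <= null <= ci_high)


# 95% CIs for short-term pain (0 to 10 scale), copied from the paragraph above.
pain_ci = {
    "all controls": (-0.91, -0.16),
    "usual care": (-1.60, -0.28),
    "sham acupuncture": (-0.66, 0.18),
}

# Map each comparison to whether its interval excludes 0.
short_term_pain = {
    name: excludes_null(low, high) for name, (low, high) in pain_ci.items()
}

for name, significant in short_term_pain.items():
    label = "statistically significant" if significant else "not statistically significant"
    print(f"{name}: {label}")
```

For risk ratios the same function applies with `null=1.0`.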

Data on effects of acupuncture on quality of life were limited. In two trials, differences between acupuncture versus sham acupuncture or usual care on short-term or intermediate-term SF-36 PCS and MCS scores were small (range 0.64 to 3.92 points on a 0 to 100 scale), and most differences were not statistically significant. [224, 228] Two trials found no clear effects of acupuncture and controls on measures of depression. [224, 227]

Two trials found no clear differences between acupuncture versus an attention control in measures of healthcare utilization (provider visits, medication fills, imaging studies, costs of services), [176, 226] and one trial found no clear differences at intermediate-term followup between acupuncture versus placebo TENS in likelihood of working full time. [225]

One trial found acupuncture associated with a higher likelihood of short-term (4.5 months) treatment response (defined as ≥33% pain improvement and ≥12% functional improvement) versus usual care (48% vs. 27%, RR 1.74, 95% CI 1.43 to 2.11), but there was no difference versus sham acupuncture (RR 1.08, 95% CI 0.92 to 1.25). [228]
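
As a rough illustration of how the risk ratio relates to the response proportions, the sketch below divides the rounded percentages quoted above. It yields about 1.78 rather than the reported 1.74; the published figure presumably comes from unrounded counts, which are not given in this section, so the calculation is only an approximation. The function name is illustrative.

```python
def risk_ratio(p_treatment: float, p_control: float) -> float:
    """Ratio of the response proportion in the treatment arm to that in the control arm."""
    if p_control <= 0:
        raise ValueError("control proportion must be positive")
    return p_treatment / p_control


# Rounded response rates from the text: 48% with acupuncture, 27% with usual care.
approx_rr = risk_ratio(0.48, 0.27)
print(round(approx_rr, 2))

# The approximation should fall inside the reported 95% CI (1.43 to 2.11).
inside_reported_ci = 1.43 <= approx_rr <= 2.11
print(inside_reported_ci)
```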

No trial evaluated effects of acupuncture on use of opioid therapies. There was insufficient evidence to determine effects of duration of acupuncture or number of acupuncture sessions on findings.

Acupuncture Compared With Pharmacological Therapy or With Exercise

No trial of acupuncture versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

Data on harms were limited but indicated no clear difference between acupuncture versus control interventions in risk of withdrawal due to adverse events. [226, 230] Serious adverse events were rare with acupuncture and control interventions. [176, 224, 226–228]

Multidisciplinary Rehabilitation for Chronic Low Back Pain

Key Points

Multidisciplinary rehabilitation was associated with small improvements in function compared with usual care at short-term (4 trials, pooled SMD –0.30, 95% CI –0.63 to 0.00, I²=58%) and intermediate-term followup (4 trials, pooled SMD –0.37, 95% CI –0.69 to –0.08, I²=34%); there was no difference in long-term function (2 trials, pooled SMD –0.04, 95% CI –0.36 to 0.35, I²=0%) (SOE: low).
Multidisciplinary rehabilitation was associated with small improvements in pain compared with usual care at short-term followup (4 trials, pooled difference –0.53 on a 0 to 10 scale, 95% CI –0.86 to –0.11, I²=0%) and intermediate-term followup (4 trials, pooled difference –0.62, 95% CI –1.06 to –0.18, I²=0%); the long-term difference was smaller and not statistically significant (2 trials, pooled difference –0.35, 95% CI –1.10 to 0.34, I²=0%) (SOE: moderate for short term and intermediate term, low for long term).
Multidisciplinary rehabilitation was associated with a small improvement compared with exercise in short-term function (6 trials, pooled SMD –0.20, 95% CI –0.54 to 0.00, I²=0%) and intermediate-term function (5 trials [excluding outlier trial], pooled SMD –0.20, 95% CI –0.40 to –0.00, I²=0%); there was no effect on long-term function (2 trials [excluding outlier trial], pooled SMD –0.07, 95% CI –0.50 to 0.39, I²=0%) (SOE: moderate for short term and intermediate term, low for long term).
Multidisciplinary rehabilitation was associated with a small improvement compared with exercise in short-term pain (6 trials, pooled difference –0.69 on a 0 to 10 scale, 95% CI –1.16 to –0.22, I²=0%) and intermediate-term pain (5 trials [excluding outlier trial], pooled difference –0.55, 95% CI –1.00 to –0.11, I²=0%); there was no effect on long-term pain (2 trials [excluding outlier trial], pooled difference 0.00, 95% CI –1.31 to 1.17) (SOE: moderate for short term and intermediate term, low for long term).
Data on harms were sparse; no serious harms were reported (SOE: insufficient).

Detailed Synthesis

Table 18
Sixteen trials (reported in 21 publications) of multidisciplinary rehabilitation for low back pain met inclusion criteria (Table 18 and Appendix D). [35, 133, 140, 189, 255–260, 269–281] All of the trials were included in the prior AHRQ report. In accordance with our definition for multidisciplinary rehabilitation, the intervention in all trials included a psychological therapy and an exercise therapy component, with therapy developed by clinicians from at least two disciplines. Most multidisciplinary rehabilitation interventions incorporated techniques and approaches consistent with principles of functional restoration. [283] The intensity of multidisciplinary rehabilitation varied substantially, with treatment ranging from 4 to 150 hours.

Five trials evaluated a multidisciplinary rehabilitation intervention that met our criteria for high intensity (≥20 hours/week or >80 hours total). [255, 260, 270, 271, 278] The duration of therapy ranged from 4 days to up to 13 weeks. Sample sizes ranged from 20 to 459 (total sample=1,964). Six trials compared multidisciplinary rehabilitation versus usual care, [255–260] nine trials compared multidisciplinary rehabilitation versus exercise therapy, [133, 257, 270, 271, 273–278] and one trial compared multidisciplinary rehabilitation versus oral medications. [269]

One trial [269] was conducted in Iran and the remainder were conducted in the United States, the United Kingdom, or Australia. Five trials reported outcomes through long-term (12 to 60 months) followup, [133, 255, 269, 270, 276] eight trials evaluated outcomes through intermediate-term followup, [133, 258–260, 271, 273, 275, 278, 279] and three trials only evaluated short-term outcomes. [256, 274, 277]

Ten trials [255, 257, 258, 270, 271, 274–278] were rated fair quality and six trials poor quality (Appendix D). [133, 256, 259, 260, 269, 273] The major methodological limitation in the fair-quality trials was the inability to effectively blind patients and caregivers to the multidisciplinary rehabilitation. Other methodological shortcomings included unclear randomization and allocation concealment methods and high attrition.

Multidisciplinary Rehabilitation Compared With Usual Care

Figure 22
Multidisciplinary rehabilitation was associated with small improvements in function compared with controls at short-term (4 trials, pooled SMD –0.30, 95% CI –0.63 to 0.00, I²=58%), [255–258] and intermediate-term followup (4 trials, pooled SMD –0.37, 95% CI –0.69 to –0.08, I²=34%) (Figure 22). [257–260] There was no difference in long-term function (2 trials, pooled SMD –0.04, 95% CI –0.36 to 0.35, I²=0%). [255, 257] In trials that measured function using the RDQ, the difference was –0.67 points (95% CI –2.15 to 0.81, 2 trials) at short term and –1.9 points (95% CI –3.70 to –0.18, 2 trials) at intermediate term. Restriction to high-intensity multidisciplinary rehabilitation interventions or exclusion of poor-quality trials had little effect on estimates. At short-term followup, effects on function were somewhat larger with high intensity multidisciplinary rehabilitation interventions (2 trials, pooled SMD –0.50, 95% CI –0.94 to –0.22) [255, 256] than with nonhigh intensity interventions (3 trials, pooled difference –0.20, 95% CI –0.38 to 0.04), [256–258] but the interaction was not statistically significant (p=0.19). At intermediate term, there were no clear differences between high intensity (1 trial, SMD –0.59, 95% CI –0.99 to –0.19) [260] and nonhigh intensity (3 trials, pooled difference –0.30, 95% CI –0.69 to 0.06) [257–259] interventions (p=0.48 for interaction).

Figure 23
Multidisciplinary rehabilitation was associated with small improvements compared with usual care in pain at short-term (4 trials, pooled difference –0.53 on a 0 to 10 scale, 95% CI –0.86 to –0.11, I²=0%) [255–258] and intermediate-term followup (4 trials, pooled difference –0.62, 95% CI –1.06 to –0.18, I²=0%) [257–260] (Figure 23). The long-term difference was smaller and not statistically significant (2 trials, pooled difference –0.35, 95% CI –1.10 to 0.34, I²=0%). [255, 257] Excluding poor-quality trials [256, 259, 260] had little effect on estimates. At short-term followup, effects on pain were somewhat larger with high intensity multidisciplinary rehabilitation interventions (2 trials, pooled difference –0.86, 95% CI –1.57 to –0.31) [255, 256] than with nonhigh intensity interventions (3 trials, pooled difference –0.35, 95% CI –0.71 to 0.15), [256–258] but the interaction between intensity and effects of multidisciplinary rehabilitation was not statistically significant (p=0.48). At intermediate term, estimates were similar for high intensity (1 trial, difference –0.53, 95% CI –1.35 to 0.29) [260] and nonhigh intensity (3 trials, pooled difference –0.66, 95% CI –1.22 to –0.09) interventions (p=0.82 for interaction). [257–259]

Data on other outcomes were limited. One trial found no differences between multidisciplinary rehabilitation versus usual care on the SF-36 Social Functioning or Mental Functioning subscales. [257] Three trials reported inconsistent effects on work or disability/sick leave status. [255, 257, 260] Two trials found multidisciplinary rehabilitation associated with fewer health system contacts versus usual care. [255, 258]

Multidisciplinary Rehabilitation Compared With Pharmacological Therapy

One poor-quality trial (n=74) found multidisciplinary rehabilitation (intensity unclear) associated with greater effects on short-term quality of life than oral medications (acetaminophen, nonsteroidal anti-inflammatory drugs [NSAIDs], and chlordiazepoxide). [269] The difference on the SF-36 PCS was 25.5 points (95% CI 14.7 to 36.3) and on the SF-36 MCS was 23.0 points (95% CI 10.8 to 35.2). Effects were smaller at intermediate term and statistically significant for the SF-36 PCS (difference 15.4, 95% CI 2.35 to 28.45) but not for the SF-36 MCS (difference 9.0, 95% CI –3.88 to 21.9). Effects were not statistically significant at long-term (12-month) followup (differences 13.6 and 4.9 points, respectively).

Multidisciplinary Rehabilitation Compared With Exercise

Figure 24
Multidisciplinary rehabilitation was associated with a small improvement in short-term function compared with exercise (6 trials, pooled SMD –0.20, 95% CI –0.54 to 0.001, I²=32%) (Figure 24). [270, 272–275, 277] Estimates were similar when a poor-quality trial [273] was excluded and when analyses were restricted to trials of high-intensity multidisciplinary rehabilitation (2 trials, pooled difference –0.14, 95% CI –0.50 to 0.22). [270, 272] Multidisciplinary rehabilitation was associated with substantially greater effects than exercise on intermediate-term function (6 trials, pooled SMD –1.04, 95% CI –2.82 to 0.71, I²=96%), but statistical heterogeneity was very large. [133, 271, 273, 275, 276, 278, 279] Excluding an outlier trial (SMD –5.31, 95% CI –6.20 to –4.42) [276] eliminated statistical heterogeneity and resulted in a markedly attenuated (small) effect (5 trials, pooled SMD –0.20, 95% CI –0.40 to –0.00, I²=0%). There was no difference between multidisciplinary rehabilitation versus exercise in long-term function (3 trials, pooled SMD –1.82, 95% CI –5.90 to 2.24, I²=98%). [133, 270, 276] Excluding the outlier trial [276] described above resulted in a pooled SMD close to 0 (–0.07, 95% CI –0.50 to 0.39, I²=0%).
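
The effect of the outlier on heterogeneity can be read from the usual definition of I², sketched here on the assumption that the report follows the standard Higgins and Thompson form (the Methods section governs the exact calculation). Here k is the number of trials, \hat{\theta}_i and w_i are each trial's estimate and inverse-variance weight, and \hat{\theta} is the pooled estimate; all are generic symbols, not values from the report.

```latex
\[
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad
I^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
\]
```

A single trial far from the others contributes a large squared deviation to Q, so removing it can move I² from very high to 0%, as happened in the intermediate-term and long-term analyses above.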

Figure 25
Multidisciplinary rehabilitation was associated with small improvements in short-term pain versus exercise (6 trials, pooled difference –0.69 on a 0 to 10 scale, 95% CI –1.16 to –0.22, I²=0%) (Figure 25). Estimates were similar when one poor-quality trial [273] was excluded (5 trials, pooled difference –0.53, 95% CI –1.12 to 0.11), and estimates were similar when analyses were stratified according to intensity of multidisciplinary rehabilitation. In two trials that evaluated high intensity multidisciplinary rehabilitation, the pooled difference was –0.62 (95% CI –1.61 to 0.37). [270, 272] Estimates at intermediate term (6 trials, pooled difference –1.20 points, 95% CI –2.43 to 0.09, I²=95%) [271, 273, 275, 277–279] and long term (3 trials, pooled difference –1.68, 95% CI –5.25 to 1.97, I²=98%) [133, 270, 276] favored multidisciplinary rehabilitation, but differences were not statistically significant. Substantial statistical heterogeneity was present in analyses of intermediate-term and long-term pain, with an outlier trial [276] that reported substantially larger effects than the other trials. For intermediate term, the outlier trial reported a difference of –3.90 points, versus –0.31 to –0.73 points in the other trials. Excluding the outlier trial eliminated statistical heterogeneity and resulted in a small, statistically significant difference in intermediate-term pain that favored multidisciplinary rehabilitation (5 trials, pooled difference –0.55, 95% CI –1.00 to –0.11, I²=0%); there was no difference in long-term pain (2 trials, pooled difference 0.00, 95% CI –1.31 to 1.17, I²=0%). For intermediate-term pain, exclusion of a poor-quality trial [273] (5 trials, pooled difference –1.52, 95% CI –3.35 to 0.39) or restriction of analyses to high intensity multidisciplinary rehabilitation interventions (2 trials, pooled difference –0.60, 95% CI –1.44 to 0.24) [271, 278, 279] did not reduce heterogeneity and differences remained not statistically significant.

Data on other outcomes were limited. One trial found multidisciplinary rehabilitation associated with better scores versus exercise on SF-36 subscales at short-term followup (differences 10 to 21 points). [277] Four trials found no clear differences between multidisciplinary rehabilitation versus exercise on severity of depression. [133, 272–274] Two trials found no clear effects on work status [270, 278, 279] and one trial found high intensity multidisciplinary rehabilitation associated with fewer days of sick leave than exercise, but nonhigh intensity rehabilitation associated with more days of sick leave. [270] Two trials found inconsistent effects on number of health system contacts. [270, 271]

Harms

Data on harms were sparse and reported in only two trials. One study reported no clear difference between multidisciplinary rehabilitation versus exercise in risk of transient worsening of pain, [277] and one trial reported no harms with either multidisciplinary rehabilitation or medications alone. [269]

Key Question 2. Chronic Neck Pain

For chronic neck pain, 25 RCTs were included in the prior AHRQ report (N=3,294). One study was rated good quality, sixteen studies fair quality, and eight studies poor quality. The prior AHRQ report found combination exercise, low-level laser therapy, Alexander Technique, and acupuncture associated with greater effects than usual care, no treatment, advice alone, or sham on improved function; only combination exercise and low-level laser therapy were also associated with greater improvement in pain. The strength of evidence was low or moderate, and effects were observed at short-, intermediate-, or long-term followup.

For this update, we identified two new RCTs (N=156) and a new publication (subanalysis) of a previously included trial; all were rated fair quality. One trial evaluated exercise and the other evaluated manual therapy (massage); the subsequent publication provided data for mind-body practices (Alexander Technique) and acupuncture. The Key Points summarize the main findings based on the evidence included in the prior report and new trials; the Key Points note where new trials contributed to findings.

Exercise for Chronic Neck Pain

Key Points

Across types of exercise, there was no clear improvement in function (3 trials [excluding outlier trial], pooled SMD –0.22, 95% CI –0.66 to 0.17, I²=73%) or pain (3 trials [excluding outlier trial], pooled SMD –0.70, 95% CI –1.62 to 0.15, I²=64%) versus no treatment, waitlist or attention control in the short term (SOE: low).
A subgroup of two trials of combination exercises (including 3 of the following 4 exercise categories: muscle performance, mobility, muscle re-education, aerobic) suggests a small benefit for function and pain versus waitlist or attention control over the short term; and function versus attention control in the long term (1 trial) (SOE: low).
There was no clear improvement in function for exercise versus no intervention at intermediate term (1 trial) and a small improvement versus attention control in the long term (1 trial) (SOE: low for both).
There was no improvement in pain for exercise versus no intervention or attention control at intermediate term (2 trials) and versus attention control at long-term (3 trials) (SOE: low for both).
The effect of exercise versus NSAIDs and muscle relaxants on function and pain was indeterminate at short or intermediate term due to insufficient evidence from a single poor-quality trial (SOE: insufficient).
Muscle performance exercise (Pilates) was associated with a small improvement in function and a substantial improvement in pain compared with oral medication (acetaminophen) in the short-term in one new fair quality trial (SOE: low).
Harms were poorly reported in trials of exercise with only two trials describing adverse events. No serious harms were reported in either trial. Minor complaints included muscle pain with exercise, knee pain and lumbar spine pain (SOE: low).

Detailed Synthesis

Table 19
Eight trials of exercise therapy for neck pain met inclusion criteria (Table 19 and Appendix D). [41–46, 100, 101] Seven trials [41–46, 100] were included in the prior AHRQ report and one [101] was added for this update. Four trials evaluated participants with chronic neck pain associated with office work, [41, 43, 45, 46] and one trial each included patients with chronic neck pain following whiplash, [44] nonspecific neck pain, [42] cervical arthritis, [100] and mechanical neck pain (new trial). [101] Across trials, participants were predominantly female (>80%); only the new trial enrolled predominantly men (78%). [101] Mean ages ranged from 38 to 52 years.

Five trials (1 new) evaluated muscle performance exercises (resistive training), [41, 43, 45, 46, 101] three combined exercise techniques, [42, 44, 100] and one neuromuscular rehabilitation. [46] Sample sizes ranged from 40 to 265 (total sample=973). Four trials compared exercise versus an attention control, [41, 43, 44, 46] one versus no treatment, [45] one versus waitlist, [42] and two (1 new) versus pharmacological care. [100, 101] Four trials were conducted in Europe, [41, 42, 45, 46] one in Australia, [44] one in China, [43] one in Turkey, [100] and one in Brazil (new trial). [101] The duration of exercise therapy ranged from 6 weeks to 12 months, and the number of supervised exercise sessions ranged from 3 to 52. Three trials reported outcomes through long-term followup, [41, 44, 46] two through intermediate-term followup, [45, 100] and three (1 new) evaluated only short-term outcomes. [42, 43, 101]

Four trials, including the new trial, were rated fair quality [43–45, 101] and four poor quality [41, 42, 46, 100] (Appendix D). In the four fair-quality trials, the main methodological limitation was the inability to blind interventions. Limitations in the other trials included inability to blind interventions, unclear randomization and allocation concealment methods, unclear or high loss to followup, and baseline differences between intervention groups.

Exercise Compared With No Treatment, Waitlist, or an Attention Control

Figure 26
Across types of exercise, there was no clear improvement in function versus no treatment, waitlist or an attention control in the short term (4 trials, pooled SMD –0.73, 95% CI –1.84 to 0.36, I²=95.1%), but statistical heterogeneity was very large [42–45] (Figure 26). Excluding an outlier trial (SMD –2.22, 95% CI –2.74 to –1.70) [43] reduced the statistical heterogeneity and resulted in an attenuated effect (SMD –0.22, 95% CI –0.66 to 0.17, I²=72.6%). However, two studies that included combination exercises (3 of the following 4 exercise categories: muscle performance, mobility, muscle re-education, aerobic) found small improvement in function compared with controls short term (2 trials, pooled SMD –0.44, 95% CI –0.76 to –0.09, data not shown in figure). [42, 44] A fair-quality study reported a continued small benefit with combination exercise in the long term (SMD –0.39, 95% CI –0.74 to –0.03). [44]

Figure 27
Exercise tended toward moderately greater effects on short-term pain compared with no treatment, waitlist or an attention control (4 trials, pooled difference –1.33, 95% CI –2.68 to 0.07, I²=89.4%), but statistical heterogeneity was very large [42–45] (Figure 27). Excluding an outlier trial (difference –2.92, 95% CI –3.38 to –2.46) [43] reduced the statistical heterogeneity and resulted in an attenuated effect (difference –0.70, 95% CI –1.62 to 0.15, I²=63.7%). The effect of exercise on reducing pain was substantially greater in trials assessing combination exercises (2 trials, pooled difference –1.12, 95% CI –1.82 to –0.43; data not shown in figure). [42, 44] There were no differences in pain comparing exercise versus controls in the intermediate term (2 trials, pooled difference –0.25, 95% CI –0.81 to 0.31, I²=0%) [41, 45] or the long term (3 trials, pooled difference 0.07, 95% CI –0.51 to 0.88, I²=0%). [41, 44, 46]

Data on effects of exercise on quality of life were limited. One fair-quality trial [44] found significant improvement in SF-36 PCS and MCS in the short term (difference in change score 3.60 on a 0-100 scale, 95% CI 1.23 to 5.97 and 4.00, 95% CI 1.24 to 6.77, respectively) and PCS in the long term (difference in change score 3.80, 95% CI 1.30 to 6.30). A poor-quality trial found no difference in SF-36 PCS or MCS in the short term. [42] No trial evaluated effects of exercise therapies on use of opioid therapies or healthcare utilization.

There was insufficient evidence to determine effects of duration of exercise therapy or number of sessions on outcomes.

Exercise Compared With Pharmacological Therapy

Two trials (1 new) compared exercise with pharmacological therapy. Differences in the pharmacological therapies and study quality precluded pooling of the trials.

One poor-quality trial (N=40) [100] comparing 1.5 months of home combination exercises (posture, stretching, strengthening and endurance exercises) versus ibuprofen plus thiocolchicoside for 15 days found no between-group difference in function (Neck Disability Index [NDI]) at 3-month (difference –2.2 on 0-50 scale, 95% CI –5.8 to 1.5) or 6-month followup (difference –1.8, 95% CI –5.7 to 2.1). The study reported similar results for pain intensity (difference –1.0 on a 0-10 scale, 95% CI –2.3 to 0.3 at 3-month and difference –0.8, 95% CI –2.3 to 0.7 at 6-month followup). The exercise group reported a better quality of life compared with the medication group at 3-month and 6-month followup using the Turkish version of the Nottingham Health Profile (difference –141, scale not stated though usual scale 0-100, 95% CI –214 to –68; difference –135, 95% CI –209 to –62, respectively). [100] The groups scored comparably on the Beck Depression Inventory at both followup periods (Table 19).
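
The 3-month and 6-month followup points above map onto the followup categories defined in the Introduction (short term: 1 to <6 months; intermediate term: ≥6 to <12 months; long term: ≥12 months). The following is a minimal sketch of that mapping. The function name is illustrative, and it assumes the input is months of postintervention followup as the report defines it.

```python
def followup_category(months: float) -> str:
    """Classify postintervention followup using the report's stated cutoffs."""
    if months < 1:
        raise ValueError("followup under 1 month falls outside the report's categories")
    if months < 6:
        return "short term"
    if months < 12:
        return "intermediate term"
    return "long term"


# Followup points reported for this trial.
for months in (3, 6):
    print(months, "months:", followup_category(months))
```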

The new fair-quality trial (N=64) [101] found Pilates exercise to be associated with a small improvement in function according to the NDI (difference –5.6 on 0-50 scale, 95% CI –8.4 to –2.8) and a substantial improvement in pain (difference –3.1 on 0-10 scale, 95% CI –4.2 to –2.1) compared with oral medication (acetaminophen) in the short term. SF-36 scores were reported for individual domains; physical functioning, bodily pain, general health, vitality, and mental health showed a small improvement with exercise compared with acetaminophen.

Exercise Compared With Other Nonpharmacological Therapies

Findings for exercise versus other nonpharmacological therapies are addressed in the sections for other nonpharmacological therapies.

Harms

Only two exercise trials reported harms. One reported only mild complaints that included muscle pain with exercise (5%), knee pain (3%), and lumbar spine pain (3%). [44] None required referral to a medical practitioner. In the other, investigators reported no serious harms related to the intervention. [42] One occurrence of minor knee pain was reported in the exercise group.

Psychological Therapies for Chronic Neck Pain

Key Points

No difference was found in function (NDI, 0–80 scale) or pain (visual analog scale [VAS], 0-10 scale) in the short term (adjusted difference 0.1, 95% CI –2.9 to 3.2 and 0.2, 95% CI –0.4 to 0.8, respectively) or intermediate term (adjusted difference 0.2, 95% CI –2.8 to 3.1 and 0.2, 95% CI –0.3 to 0.8, respectively) in one fair-quality study comparing relaxation training with no intervention or exercise (SOE: low for all). We found no trials with outcomes assessed in the long term.
We found no evidence comparing relaxation training with pharmacological therapy.
The only trial of relaxation training did not report harms.

Detailed Synthesis

Table 20
We found one trial comparing the effects of relaxation training versus no intervention (N=258) or exercise therapy (N=263) in female office workers with chronic neck pain [45] (Table 20 and Appendix D). This trial was included in the previous AHRQ report. Relaxation training and muscle performance exercise therapy were done in 30-minute sessions three times per week for 12 weeks, with 1 week of reinforcement training 6 months after randomization. Patients in the no-treatment group were instructed not to change their usual activities. Adherence to the relaxation schedule during the intervention period was 42 percent of the scheduled sessions. The nature of the intervention and control precluded blinding of participants and people administering the interventions; therefore, this trial was rated as fair quality.

Relaxation Training Compared With No Treatment

The one fair-quality trial found no between-group differences in the short term (3 months) or intermediate term (9 months) as measured by a neck disability scale (difference 0.1 on a 0-80 scale, 95% CI –2.9 to 3.2, and difference 0.2, 95% CI –2.8 to 3.1, respectively) [45] (Table 20). The neck disability scale, a nonvalidated instrument, asked whether the participant had pain or difficulty on eight functional activities, with each activity scored from 0 (no pain or hindrance) to 10 (unbearable pain or maximum hindrance), for a total of 80 points. Likewise, there were no differences in pain intensity between groups at the same time frames (difference 0.2 on a 10-point scale, 95% CI –0.4 to 0.8, and difference 0.2, 95% CI –0.3 to 0.8, respectively). There were no trials evaluating relaxation in the long term.
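
The scoring of the neck disability scale described above (eight activities, each scored 0 to 10, summed to a 0 to 80 total) can be sketched as follows. This assumes a simple unweighted sum, which is what the description implies; the function name and the example scores are hypothetical and not taken from the trial.

```python
from typing import Sequence


def neck_disability_total(item_scores: Sequence[float]) -> float:
    """Sum eight activity scores (each 0 to 10) into a 0 to 80 total."""
    if len(item_scores) != 8:
        raise ValueError("the scale has exactly eight functional activities")
    if any(score < 0 or score > 10 for score in item_scores):
        raise ValueError("each activity is scored from 0 to 10")
    return sum(item_scores)


# Hypothetical participant, for illustration only.
example_scores = [2, 3, 0, 1, 4, 2, 0, 1]
example_total = neck_disability_total(example_scores)
print(example_total)
```

On this scale the reported between-group differences of 0.1 and 0.2 points are a very small fraction of the 80-point range.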

Relaxation Training Compared With Pharmacological Therapy

We did not find any trials meeting our criteria that compared relaxation training with pharmacological therapy.

Relaxation Training Compared With Exercise Therapy

The one fair-quality trial found no differences between relaxation training and exercise therapy in the short term (3 months) or intermediate term (9 months) as measured by the neck disability scale described above (difference 0.2 on a 0-80 scale, 95% CI –2.8 to 3.2, and difference 0.2, 95% CI –2.7 to 3.2, respectively) [45] (Table 20). Similarly, there were no differences in pain intensity between groups at the same time frames (difference –0.2 on a 10-point scale, 95% CI –0.8 to 0.4, and difference –0.2, 95% CI –0.8 to 0.3, respectively). There were no trials comparing relaxation with exercise therapy in the long term.

Harms

The trial on relaxation therapy did not report harms. [45]

Physical Modalities for Chronic Neck Pain

Key Points

Low-level laser therapy was associated with a moderate improvement in short-term function (2 trials, pooled difference –13.60, 95% CI –26.30 to –6.30, I²=0%, 0-100 scale) and pain (3 trials, pooled difference –1.89 on a 0-10 scale, 95% CI –3.34 to –0.06, I²=61%) compared with sham (SOE: moderate for function and pain).
Data from two small, poor-quality trials, one evaluating cervical traction versus attention control (infrared irradiation) and the other electromagnetic fields versus sham, were insufficient to determine effects on function or pain over the short term (SOE: insufficient).
No trials assessed outcomes in the intermediate term or long term, or compared a physical modality to pharmacological therapy or exercise.
Harms were poorly reported in trials of low-level laser. Adverse effects occurred with similar frequency in the laser and sham groups in the one trial reporting such effects. The most frequently reported adverse effects included mild (78%) or moderately (60%) increased neck pain, increased pain elsewhere (78%), mild headache (60%), and tiredness (24%) (SOE: low).
The trials of cervical traction and electromagnetic fields did not report harms.

Detailed Synthesis

Table 21
A total of five trials (N range, 53 to 90; total sample=363) [145–149] evaluating physical modalities for the treatment of chronic neck pain met inclusion criteria (Table 21 and Appendixes D and E). All of the trials were included in the prior AHRQ report. Interventions included traction, laser therapy, and electromagnetic field therapy.

One trial (N=79) conducted in Hong Kong compared intermittent cervical traction versus attention control (infrared irradiation). [146] Each treatment was administered for 20 minutes twice weekly for 6 weeks. This trial was considered poor quality due to lack of patient and caregiver blinding, high and unequal attrition (41% in traction group, 58% in control), and dissimilar baseline characteristics between groups.

Three trials (N range, 53 to 90; total sample=203) [145, 147, 148] compared low-level laser therapy with sham. The mean duration of pain varied from 4 years in two trials [145, 148] to 15 years in a third. [147] Treatment consisted of laser application (wavelength range, 830 to 904 nm) over several myofascial tender points; across the trials, duration ranged from 30 seconds to 3 minutes per tender point and frequency varied from daily to twice weekly over periods of 2 or 7 weeks. One trial was rated good quality [147] and two fair quality. [145, 148] Common methodological limitations in the two fair-quality trials included inadequate reporting of treatment allocation and no or unclear blinding of the care provider. In addition, baseline characteristics were not similar in one trial, in which the intervention group tended to have more pain and tenderness and longer duration of symptoms. [145]

One trial (N=81) compared the effects of eighteen 30-minute sessions (3-5 times per week) of low frequency pulsed electromagnetic fields versus sham. [149] The treatment consisted of an electromagnetic coil against the back of the neck while the participants were lying on a pillow. The investigators covered the set of light emitting diodes that pulse to signal the coil being energized in order to blind the participants to the treatment or sham. This trial was rated as poor quality due to several factors: failure to describe the number randomized in each group; inadequate reporting of treatment compliance and information to calculate participant attrition and intent to treat analysis; care provider not blinded to treatment; and baseline characteristics dissimilar between groups.

Physical Modalities Compared With Attention Control or Sham

Traction. One poor-quality trial found no short-term differences in function comparing intermittent cervical traction versus attention control (infrared irradiation) using the Northwick Park Questionnaire (NPQ) (difference –1.8, 95% CI –10.8 to 7.2, 0-100% scale). [146] Likewise, there was no difference in pain intensity between groups (difference –0.7, 95% CI –2.2 to 0.8, 10 point scale). There were no trials evaluating cervical traction in the intermediate term or long term.

Figure 28
Figure 29
Low-Level Laser Therapy. Laser was associated with moderately greater effects compared with sham on short-term function (2 trials, pooled difference –13.60, 95% CI –26.30 to –6.30, I²=0%, 0-100 scale) (Figure 28) [147, 148] and short-term pain (3 trials, pooled difference –1.89, 95% CI –3.34 to –0.06, I²=61%, 0-10 scale) (Figure 29). [145, 147, 148] Pain improvement of greater than –3.0 on a 10-point VAS scale was substantially more common with laser therapy in the good-quality trial (RR 6.0, 95% CI 1.9 to 19.0). [147] Quality of life improvement also favored low-level laser as measured by the SF-36 PCS (difference 4.5, 95% CI 0.7 to 8.2) [147] and the Nottingham Health Profile (difference –16.1 on a 0-100 scale, 95% CI –30.9 to –1.3). [148] Measures demonstrating no difference between groups included the SF-36 MCS and the McGill Pain Questionnaire component scores [147] (Table 20). There were no trials evaluating laser therapy in the intermediate term or long term.

Electromagnetic Fields. One poor-quality trial found no between-group differences in short-term difficulty with activities of daily living (ADLs) (difference 1.6, 95% CI –1.5 to 4.8, scale 0-24, nonvalidated measure). [149] The ADL instrument asked whether the participant had pain or difficulty on eight activities scored from 0 (never) to 3 (always), for a total of 24 points.

Likewise, there was no difference in pain intensity between groups (difference 1.1, 95% CI –0.3 to 2.6, 0-10 scale) or in patients’ assessment of improvement (difference 1.2, 95% CI –15.2 to 17.6, 0-100 scale). [149] There were no trials evaluating electromagnetic fields in the intermediate term or long term.
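The nonvalidated ADL measure described above sums eight items, each scored from 0 (never) to 3 (always), to give a total between 0 and 24. The sketch below illustrates that arithmetic only; the function name is hypothetical, and treating the intermediate responses as the integers 1 and 2 is an assumption, since the text gives only the endpoint labels.

```python
def adl_total(item_scores):
    """Sum eight ADL item scores, each from 0 (never) to 3 (always)."""
    if len(item_scores) != 8:
        raise ValueError("the instrument described has exactly eight items")
    for score in item_scores:
        # Assumption: items take integer values 0, 1, 2, or 3.
        if score not in (0, 1, 2, 3):
            raise ValueError(f"item score out of range: {score}")
    return sum(item_scores)


# A participant answering 'always' to every item reaches the scale maximum.
print(adl_total([3] * 8))
```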

Physical Modalities Compared With Pharmacological Therapy or With Exercise Therapy

We did not find any trials meeting our criteria comparing a physical modality with pharmacological therapy or with exercise.

Harms

Only one laser trial reported harms. [147] The trial reported a large number of adverse effects with similar frequency in both groups. However, the sham group reported nausea significantly more frequently (42% vs. 20%) while the laser group reported stiffness more frequently (20% vs. 4%). The most frequently reported adverse effects included mild (78%) or moderate (60%) increased neck pain, increased pain elsewhere (78%), mild headache (60%), and tiredness (24%). Harms were not reported by either trial evaluating cervical traction or electromagnetic fields.

Manual Therapies for Chronic Neck Pain

Key Points

Massage

The effects of Swedish massage on function (≥5 point improvement on the NDI) versus self-management attention control were small and not statistically significant in one trial in the short term (39% versus 14%, RR 2.7, 95% CI 0.99 to 7.5) and intermediate term (57% versus 31%, RR 1.8, 95% CI 0.97 to 3.5) (SOE: low for both time periods).
Massage was associated with a small improvement in short-term function compared with attention or waitlist control (2 trials [1 new], pooled difference –3.66 on a 0-50 NDI scale, 95% CI –6.58 to –0.56, I²=10%) (SOE: low).
Massage was associated with a moderate improvement compared with waitlist control in short-term pain intensity experienced during the previous 7 days (1 new trial, difference –1.8 on a 0-10 scale, 95% CI –2.7 to –0.9) (SOE: low).
A third fair-quality trial found no clear evidence that massage improved pain in the intermediate term versus exercise (p>0.05, data not reported) (SOE: low).
Three fair-quality trials (1 new) reported no serious adverse effects; transient nonserious pain or soreness was reported during or following massage in two trials (1 new) and during or after exercise, but not massage, in a third trial (SOE: low).

Detailed Synthesis

Massage

Table 22
Three trials of massage therapy met inclusion criteria (Table 22 and Appendix D). [181–183] Two trials [181, 182] were included in the prior AHRQ report and one [183] was added for this update. Sample sizes ranged from 64 to 108 (total sample=264). One trial compared Swedish massage versus attention control (self-care education), [182] the new trial compared Tuina massage versus waitlist [183] and one trial compared classical massage versus two types of exercise (muscle re-education and strength training targeting the neck and shoulder muscles). [181] Swedish and classical massage (nonforceful) were performed on the neck and back, and in some cases the pectoral muscles and rotator cuff or arms. Tuina massage included soft tissue massage, local muscle stretching, mobilization and traction of the cervical spine, and manipulation of local pain (trigger) points; no high-velocity/low-amplitude thrusts were applied. Muscle re-education exercise was performed with a newly developed training device strapped to the head and consisted of a plate with 5 exchangeable surfaces that allow for progression of task difficulty; strength training included both isometric and dynamic exercises targeting the neck and shoulders. One trial was conducted in the United States, [182] one in Sweden [181] and the new trial in Germany. [183] One trial administered 6 massage treatments over 3 weeks, [183] a second trial 10 massage treatments over 10 weeks, [182] and the third trial 22 massage treatments over 11 weeks. [181] The new trial evaluated outcomes in the short term only; [183] of the trials included in the original report, one reported on the intermediate term only [181] and one reported on the short and intermediate term. [182]

All trials were rated fair quality (Appendix E). Methodological limitations included the inability to blind interventions in all trials, and 21 percent attrition in the trial comparing massage with exercise. [181]

Massage Therapy Compared With an Attention Control or Waitlist

Figure 30
One trial of Swedish massage versus attention control found that a greater proportion of participants in the massage group achieved ≥5 point improvement on the NDI in the short-term (39% versus 14%, RR 2.7, 95% CI 0.99 to 7.5) and intermediate term (57% versus 31%, RR 1.8, 95% CI 0.97 to 3.5). [153] Massage was associated with a small improvement in short-term function compared with attention or waitlist controls (2 trials [1 new], pooled difference –3.66 on a 0 to 50 NDI scale, 95% CI –6.58 to –0.56, I²=10.2%) (Figure 30). [182, 183] The massage technique in one trial was soft tissue massage and mobilization of upper extremity joints and the cervical spine (i.e., Tuina massage) (difference –4.8, 95% CI –7.0 to –2.6 on the 0 to 50 NDI scale) [183] and structural or relaxation massage (i.e., Swedish massage) in one trial (difference –2.3, 95% CI –4.7 to 0.1 on the 0-50 NDI scale). [182] One new, small fair quality study reported that Tuina massage was associated with moderate improvement in pain intensity experienced during the previous 7 days compared with waitlist controls (difference –1.8 on a 0-10 scale, 95% CI –2.7 to –0.9). [183] A greater proportion of participants in the Swedish massage group reported improvement in a symptom bothersomeness scale (≥30%) in the short term (55% versus 25%; RR 2.2, 95% CI 1.04 to 4.2) but not the intermediate term (43% vs. 39%; RR 1.1, 95% CI 0.6 to 2.0) compared with attention controls in one trial. [182] One new trial found no differences between groups in SF-36 PCS and MCS while one reported a better quality of life as measured by the SF-12 PCS (difference 5.6 on a 0-100 scale, 95% CI 2.4 to 8.9), but not on the SF-12 MCS (difference 2.6 on a 0-100 scale, 95% CI –1.4 to 6.6). [183]
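The pooled short-term NDI difference above can be roughly reproduced from the two per-trial estimates quoted in this paragraph. The sketch below recovers each standard error from its 95% CI and applies a simple fixed-effect, inverse-variance average. This is an illustrative assumption, not the report's analysis: the pooling model actually used is described in the Methods section, which is not reproduced here, and the function names are hypothetical.

```python
import math

Z95 = 1.96  # normal quantile for a 95% confidence interval


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z95)


def fixed_effect_pool(estimates):
    """Inverse-variance pooling of (difference, ci_lower, ci_upper) tuples."""
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in estimates]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    return pooled, pooled - Z95 * se, pooled + Z95 * se


# NDI differences (0 to 50 scale) quoted above: Tuina [183], Swedish [182].
trials = [(-4.8, -7.0, -2.6), (-2.3, -4.7, 0.1)]
pooled, lower, upper = fixed_effect_pool(trials)
print(f"pooled {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under these assumptions the point estimate comes out at about –3.66, in line with the pooled difference reported above. The fixed-effect interval (roughly –5.3 to –2.0) is narrower than the reported –6.58 to –0.56, which is what one would expect if the report's model allows for between-trial variability.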

Massage Therapy Compared With Pharmacological Therapy

No trial of manual therapy versus pharmacological therapy met inclusion criteria.

Massage Therapy Compared With Exercise

One fair-quality study reported no difference in intermediate-term pain comparing classical massage with neck coordination exercises (difference 0.2, 95% CI –0.82 to 1.22, 0-10 scale) or muscle performance exercises (no data given, p>0.05). [181] The use of opioid therapies and healthcare utilization were not evaluated.

Harms

None of the trials reported serious adverse effects. Nonserious mild adverse effects included discomfort or pain during (n=5) or after Swedish massage (n=3) in one trial. [182] In the new trial of Tuina massage, the proportion of patients reporting mild adverse events was 41.3% (19/46); most included increased pain (aching muscles, n=11; headache, n=3; and point tenderness, n=1). [183] Other mild adverse events included dizziness, sleepiness, mood swings, nausea, difficulty staying asleep, and difficulty moving the head and neck. In the third trial, transient neck or headache pain was reported in the neuromuscular training exercise group (n=10); there was no mention of complications for the strength training or massage groups. [181]

Mind-Body Practices for Chronic Neck Pain

Key Points

Alexander Technique resulted in a small improvement in function in the short term (difference –5.56 on a 0-100% scale, 95% CI –8.33 to –2.78) and intermediate term (difference –3.92, 95% CI –6.87 to –0.97) compared with usual care alone, based on one fair-quality trial (SOE: low).
There was no clear evidence that basic body awareness therapy improved function in the short term versus exercise in one fair-quality trial (SOE: low).
There is insufficient evidence from one poor-quality trial to determine the effects of qigong on intermediate-term or long-term function or pain versus exercise; no data were available for short term outcomes (SOE: insufficient).
Both fair-quality trials reported no serious treatment-related adverse events. The trial evaluating Alexander Technique versus usual care found no clear between-group difference for nonserious adverse events, such as pain and incapacity, knee injury, or muscle spasm (RR 2.25, 95% CI 1.00 to 5.04). The other trial reported no differences between basic body awareness and exercise in any nonserious adverse effect (RR 0.65, 95% CI 0.37 to 1.14) (SOE: low).

Detailed Synthesis

Table 23
Three trials (reported in 4 publications) of mind-body practices met inclusion criteria (Table 23 and Appendix D). [213, 214, 221, 222] All three trials were included in the prior AHRQ report; only a newly identified publication (subanalysis) [214] of a previously included trial [213] was added for this update. One trial evaluated the Alexander Technique (a method of self-care developed to help people enhance their control of reaction and improve their way of going about everyday activities) plus usual care (N=344), [213] one trial basic body awareness therapy (N=113), [222] and one trial qigong (N=139). [221] One trial compared mind-body techniques versus usual care [213] and two trials versus individually adjusted cervical and shoulder strengthening and stretching exercises, [221] or group-led exercises for whole body strengthening, aerobic, and coordination exercises. [222] Two trials were conducted in Sweden [221, 222] and one in England. [213] The duration of mind-body treatment ranged from 10 to 20 weeks and the number of treatment sessions ranged from 12 to 20. One trial reported outcomes during the intermediate term and long term, [221] one short-term and intermediate-term outcomes, [213] and one short-term outcomes only. [222]

Two of the trials were rated fair quality [213, 222] and one trial poor quality [221] (Appendix E). In the two fair-quality trials, the main methodological limitation was the inability to blind interventions. Limitations in the other trial included the inability to blind interventions, high attrition, and unequal loss to followup between groups.

Mind-Body Practices Compared With Usual Care

One fair-quality trial found a small improvement in function as measured by the NPQ in favor of the Alexander Technique plus usual care versus usual care alone in the short term (difference –5.56 on a 100% scale, 95% CI –8.33 to –2.78) and intermediate term (difference –3.92, 95% CI –6.87 to –0.97). [213] There were no significant differences between the intervention group and usual care for the physical component score of the SF-12 (version 2) at 1-month or 7-month followup. However, significantly larger improvements in the MCS occurred in the Alexander group versus the usual care group 7 months following treatment (difference, 2.12 on a 0-100 scale, 95% CI 0.42 to 3.82). [213] In a new secondary economic analysis of a subset (57%) of patients from a previously included trial there were no significant differences between Alexander Technique and usual care in terms of UK National Health Service (NHS) healthcare utilization (appointments or prescription items). [214] While more people paid for extra Alexander lessons in the private healthcare setting, this represented people who attended all trial sessions and paid for extra. There were no differences in terms of utilizing other private healthcare services.

Mind-Body Practices Compared With Pharmacological Therapy

No trial of mind-body practice versus pharmacological therapy met inclusion criteria.

Mind-Body Practices Compared With Exercise

There were no differences in function as measured by the NDI between basic body awareness therapy (1 fair-quality study, n=113) [222] in the short term (mean change from baseline –2 versus –1, p>0.05) or qigong (poor-quality study, n=139) [221] in the intermediate term or long term (median 22 versus 18, p>0.05, at each time period) versus exercise therapy. The trial assessing qigong found no difference in pain at 6 or 12 months following treatment (median 2.6 versus 2.3 and 2.8 versus 2.3, p>0.05, respectively). [221] Two of the eight sections of the SF-36v2 favored basic body awareness therapy versus exercise in the short term (bodily pain and social functioning) in the fair-quality trial. [222] No other section of the SF-36v2 demonstrated a difference between groups. No trial evaluated effects of mind-body practices on use of opioid therapies.

Harms

Two trials, one of basic body awareness therapy [222] and the other of Alexander Technique, [213] reported no serious adverse effects. One patient in the basic body awareness group and four patients in the exercise group reported that they discontinued treatment due to increased neck symptoms or pain in other joints (p=0.363). The event risk for all nonserious adverse events was 0.27 in the body awareness therapy group and 0.40 in the exercise group (RR 0.65, 95% CI 0.37 to 1.14). In the trial comparing Alexander Technique versus usual care, no clear difference was seen in the risk of any nonserious adverse event (e.g., pain and incapacity, knee injury, muscle spasm, and complications after surgery): RR 2.25 (95% CI 1.00 to 5.04).

Acupuncture for Chronic Neck Pain

Key Points

Acupuncture was associated with small improvements in short-term and intermediate-term function versus sham acupuncture, a placebo (sham laser), or usual care (short term, 5 trials, pooled SMD –0.40, 95% CI –0.67 to –0.14, I²=61%; intermediate term, 3 trials, pooled SMD –0.19, 95% CI –0.37 to 0.05, I²=0%). One trial reported no difference in function in the long term (SMD –0.23, 95% CI –0.61 to 0.16) (SOE: low for all time periods).
There were no differences in pain in trials comparing acupuncture with sham acupuncture or placebo interventions in the short term (4 trials [excluding outlier trial], pooled difference –0.27 on a 0-10 scale, 95% CI –0.59 to 0.05, I²=2%), intermediate term (3 trials, pooled difference 0.40, 95% CI –0.45 to 1.44, I²=19%), or long term (1 trial, difference –0.35, 95% CI –1.34 to 0.64) (SOE: low for all time periods).
There was insufficient evidence from two small poor-quality trials to draw conclusions regarding short-term function or pain for acupuncture versus NSAIDs (SOE: insufficient).
No serious adverse events were reported in six trials reporting harms. The most commonly reported nonserious adverse events in people receiving acupuncture included numbness/discomfort, fainting, and bruising (SOE: moderate).

Detailed Synthesis

Table 24
We identified nine trials (reported in 10 publications) of acupuncture that met our inclusion criteria (Table 24 and Appendix D). [213, 214, 231–237, 254] All trials were included in the prior AHRQ report; only a newly identified publication (subanalysis) [214] of a previously included trial [213] was added for this update. All trials evaluated needle acupuncture to body acupoints; two also evaluated electroacupuncture. [234, 237] Control groups included sham acupuncture in five trials, [231–234, 236] placebo intervention (sham TENS [235] and sham laser acupuncture [237]) in two trials, usual care in one trial, [213] and pharmacological therapy (Zaltoprofen [254] and Trilisate [231]) in two trials. The duration of acupuncture therapy ranged from 2 weeks to 5 months, and the number of sessions from 5 to 14. Sample sizes ranged from 30 to 345 (total sample=1,260). Across trials, participants were predominately female (from 60% to 90%) with mean ages ranging from 37 to 53 years. One trial was conducted in the United States, [231] one in Turkey, [234] and the rest in Asia [232, 233, 237, 254] or Europe. [213, 235, 236] One trial reported outcomes through long-term followup, [236] four trials through intermediate-term followup, [213, 235–237] and the remainder only evaluated short-term outcomes. [231–234, 254]

Seven trials were rated fair quality [213, 232–237] and two trials poor quality [231, 254] (Appendix E). Common limitations in the fair-quality trials included unclear allocation concealment methods and unclear care provider blinding; additionally, the poor-quality trials had baseline group dissimilarity (not controlled for) and high attrition.

Acupuncture Compared With Sham Acupuncture, Usual Care, or a Placebo Intervention

Figure 31
Figure 32
Acupuncture was associated with small improvements in short-term and intermediate-term function versus sham acupuncture, placebo (sham laser), or usual care (short term, 5 trials, [213, 232, 233, 236, 237] pooled SMD –0.40, 95% CI –0.67 to –0.14, I²=61%; intermediate term, 3 trials, [213, 236, 237] pooled SMD –0.19, 95% CI –0.37 to 0.05, I²=0.0%) (Figure 31). Trials measured function using the NDI or the NPQ; across trials the SMD ranged from –0.78 to –0.03 in the short term and –0.29 to –0.05 in the intermediate term. None of the trials were rated poor quality. One trial reported no difference in function in the long term (SMD –0.23, 95% CI –0.61 to 0.16). [236] Acupuncture was associated with small improvements in short-term pain versus controls (5 trials, pooled difference –0.66, 95% CI –1.46 to 0.11, I²=78.4%), but statistical heterogeneity was large (Figure 32). [232–234, 236, 237] Excluding an outlier trial (pooled difference –1.80, 95% CI –2.36 to –1.24) [232] eliminated statistical heterogeneity and resulted in a markedly attenuated effect (difference –0.27, 95% CI –0.59 to 0.05, I²=2%). Stratified analyses according to the type of control (sham or placebo laser) resulted in similar estimates. Trials reported no differences in pain between acupuncture versus controls in the intermediate term (3 trials, pooled difference 0.40, 95% CI –0.45 to 1.44, I²=18.7%) [235–237] or long term (1 trial, difference –0.35, 95% CI –1.34 to 0.64). [236] In a secondary economic analysis of a subset (57%) of patients, 1 trial reported that there were no significant differences between acupuncture and usual care in terms of UK NHS healthcare utilization (appointments or prescription items). [214] While more people paid for extra acupuncture in the private healthcare setting, this represented people who attended all trial sessions and paid for extra. There were no differences in terms of utilizing other private healthcare services.
In general, acupuncture did not improve quality of life compared with sham intervention in the short term or intermediate term as reported in four trials [233, 235–237] (Table 23). No trial evaluated effects of acupuncture on use of opioid therapies.
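The introduction to the Results states that a pooled estimate is statistically significant if its confidence interval does not include 0 (for mean differences) or 1 (for risk ratios). As a minimal sketch, that rule can be applied to a few of the intervals quoted in this subsection; the function name and the dictionary labels are illustrative assumptions.

```python
def excludes_null(ci_lower, ci_upper, null_value=0.0):
    """True if the confidence interval does not contain the null value.

    Use null_value=0.0 for mean differences and SMDs, 1.0 for risk ratios.
    """
    return not (ci_lower <= null_value <= ci_upper)


# 95% CIs quoted in the text above (acupuncture versus controls).
intervals = {
    "function, short term (SMD)": (-0.67, -0.14),
    "function, long term (SMD)": (-0.61, 0.16),
    "pain, short term, all 5 trials": (-1.46, 0.11),
    "pain, short term, outlier excluded": (-0.59, 0.05),
}

for label, (lo, hi) in intervals.items():
    verdict = "excludes 0" if excludes_null(lo, hi) else "includes 0"
    print(f"{label}: {verdict}")
```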

Acupuncture Compared With Pharmacological Therapy

Two small poor-quality trials evaluated acupuncture versus NSAIDs. One trial (n=27) compared acupuncture three times per week for 3 weeks versus 80 mg of Zaltoprofen alone three times per day for 3 weeks. [254] The other trial (n=30) compared 14 sessions of acupuncture versus 500 mg of Trilisate per day for 8 weeks. [231] In the short term, one trial reported no difference in NDI (difference –0.4, 95% CI –4.6 to 3.8). [254] Both trials reported no difference between groups in pain as measured by the McGill Pain Questionnaire [231] or VAS. [254] One trial found no differences between groups in the Beck Depression Index, the SF-36, or the EQ-5D in the short term [254] (Table 23).

Acupuncture Compared With Exercise Therapy

No trial of acupuncture versus exercise met inclusion criteria.

Harms

Six of the eight trials assessing acupuncture reported harms. [213, 233, 235–237, 254] No serious adverse events (defined as involving death, hospitalization, persistent disability, or a life-threatening risk in one trial [213] and undefined in the other five studies) were reported in any trial. The most commonly reported nonserious adverse effects in people receiving acupuncture included numbness/discomfort (2.7%), fainting (1.1%), and bruising (1.1%).

Key Question 3. Osteoarthritis Pain

For OA, 53 RCTs (in 56 publications) were included in the prior AHRQ report (N=6,101). Four studies were rated good quality, 31 studies fair quality, and 18 studies poor quality. The prior AHRQ report found exercise and ultrasound (US) associated with greater effects than usual care, an attention control or a sham procedure on improved function (exercise, US) or pain (exercise) for the treatment of knee OA. The strength of evidence was low or moderate, generally stronger for function than for pain, and observed at short, intermediate, and long term (with the exception of pain) for exercise but only short term for ultrasound. For hip OA, exercise and manual therapy were associated with small improvements compared with usual care and exercise for function (short and intermediate term) and pain (intermediate term). The strength of evidence was low. For hand OA, there was either no difference between treatment groups for function or pain or the evidence was insufficient to draw conclusions.

For this update, we identified nine new RCTs (in 10 publications) of knee OA (N=1,235); no new trials evaluating hip or hand OA were identified. One of the new studies was rated good quality, seven were rated fair quality, and one was rated poor quality. The new trials evaluated exercise (5 trials), psychological therapies (2 trials), and physical modalities (ultrasound) (2 trials). The Key Points summarize the main findings based on the evidence included in the prior report and new trials; the Key Points note where new trials contributed to findings.

Exercise for Osteoarthritis Knee Pain

Key Points

Exercise was associated with a small improvement in function compared with usual care, no treatment, or sham intervention in the short term (8 trials [1 new trial], pooled SMD –0.29, 95% CI –0.46 to –0.11, I²=10%), a moderate improvement in the intermediate term (11 trials [2 new trials and excluding outlier trial], pooled SMD –0.63, 95% CI –1.17 to –0.10, I²=91%), and a small improvement in the long term (4 trials [2 new trials], pooled SMD –0.22, 95% CI –0.34 to –0.08, I²=0%) (SOE: moderate for short term; low for intermediate and long term).
One trial found no statistical difference between exercise or sham procedure in the proportion of patients who reported clinically relevant reductions (≥1.75 points) in VAS pain on movement (prior week) [58% (34/59) vs. 42% (27/65); RR 1.4, 95% CI 1.0 to 2.0] or VAS global improvement in pain [59% (35/59) vs. 50% (33/65); RR 1.2, 95% CI 0.8 to 1.6] in the short term.
Exercise was associated with a small improvement in pain short term (8 trials [1 new trial], pooled difference on a 0-10 scale –0.47, 95% CI –0.86 to –0.10, I²=42%) versus usual care, no treatment, waitlist, or sham intervention (SOE: moderate), a moderate improvement intermediate term (11 trials [2 new trials], pooled difference –1.34, 95% CI –2.12 to –0.54, I²=90% on a 0-10 scale) compared with usual care, an attention control, waitlist, or no treatment (SOE: low), and a small improvement long term (4 trials [2 new trials], pooled difference –0.30 on a 0 to 10 scale, 95% CI –0.49 to 0.00, I²=0%) compared to usual care, attention control, or waitlist (SOE: low).
One new trial found that more patients who received exercise versus pharmacological therapy (analgesics and anti-inflammatory drugs) achieved a clinically important improvement in function in the intermediate term (>10 point improvement on the Knee Injury and Osteoarthritis Outcome Score [KOOS] ADL), 47% (22/47) versus 28% (13/46); RR 1.7, 95% CI 1.0 to 2.9, although the difference did not reach statistical significance. There were no differences between the groups across all other function and pain outcomes measured (SOE: low).
Harms were not well reported. Across seven trials, one reported minor temporary increase in pain with exercise, four others found no difference in worsening pain versus controls, and one reported no difference in falls or death (SOE: moderate).
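The risk ratios in the Key Points above that come with event counts (exercise versus sham for pain, and exercise versus pharmacological therapy for function) can be checked with a short calculation. The sketch below uses the standard large-sample (Wald) interval on the log scale; that choice of interval method is an assumption, since the report does not state here how these intervals were derived, and the function name is hypothetical.

```python
import math


def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald 95% CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Event counts quoted in the Key Points above.
comparisons = {
    "VAS pain on movement, exercise vs. sham": (34, 59, 27, 65),
    "VAS global improvement, exercise vs. sham": (35, 59, 33, 65),
    "KOOS ADL >10 points, exercise vs. drugs": (22, 47, 13, 46),
}

for label, counts in comparisons.items():
    rr, lower, upper = risk_ratio(*counts)
    print(f"{label}: RR {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Rounded to one decimal place, these reproduce the reported values (RR 1.4, 1.0 to 2.0; RR 1.2, 0.8 to 1.6; RR 1.7, 1.0 to 2.9). In the first and third comparisons the unrounded lower limit falls just below 1, consistent with the text describing them as not statistically significant.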

Detailed Synthesis

Table 25
Twenty-three trials (in 26 publications) of exercise therapy for knee osteoarthritis (OA) met inclusion criteria (Table 25 and Appendix D). [47–71, 102, 103] Eighteen trials (in 21 publications) [47–67] were included in the prior AHRQ report and five (in six publications) [68–71, 102, 103] were added for this update.

Eight trials evaluated muscle performance exercise versus attention control, [51, 52, 54, 57, 58, 66] no treatment [49, 53, 65] or usual care (1 new trial). [71]

In nine trials (3 new trials), the interventions consisted of combined exercise approaches compared with usual care, [47, 55, 56, 60, 63, 68–70] an attention control [64] or no treatment. [50]

Muscle performance exercises were a component of nine of these trials (3 new trials). [47, 50, 55, 56, 60, 63, 64, 68–70] One trial had an aerobic exercise arm that consisted of a facility-based, 1-hour walking program three times per week over 3 months, and it used an attention control. [51, 57, 58] A single trial evaluated a mobility exercise program based on Mechanical Diagnosis and Therapy (MDT) versus a waitlist comparator, where patients were allowed to continue receiving usual care. [61] One trial evaluated gait training (guided strategies to optimize knee movements during treadmill walking with computerized motion analysis with visual feedback) versus usual care. [32] Five trials (2 new trials) tested exercise programs as a part of physiotherapy care compared to usual care or sham. [48, 59, 67–69] The duration of exercise programs ranged from 2 to 26 weeks; the number of exercise sessions ranged from 4 to 36. One new trial compared neuromuscular reeducation exercise with pharmacological intervention. [102, 103]

Sample sizes ranged from 50 to 786 (total sample=3,633). Across the trials, the majority of patients were female (51% to 100%) with mean ages ranging from 56 to 75 years. Seven trials (2 new trials) specifically included patients with bilateral knee OA. [49, 52–54, 66, 68, 69] Six trials (1 new trial) were conducted in the United States or Canada, [51, 56–58, 60–63, 68] eight (3 new trials) in Europe, [55, 59, 64, 65, 67, 69, 71, 102, 103] five in Taiwan, [49, 52–54, 66] two in Australia or New Zealand, [47, 48] one in Brazil [50] and one new trial in Malaysia. [70] Most trials had short (7 trials [1 new trial]) [47, 55, 61, 62, 65, 67, 69] or intermediate followup (13 trials [3 new trials]). [49, 50, 52–54, 56, 62–64, 66, 68, 70, 102, 103] Four trials (1 new trial) reported long-term outcomes. [56–58, 60, 64, 71]

Sixteen trials (4 new trials) were rated fair quality (one at short-term followup [62]), [47, 48, 51, 52, 54–61, 65, 68–70, 102, 103] and nine trials (1 new trial) poor quality, [49, 50, 53, 63, 64, 66, 67, 71] including one at intermediate-term followup [62] (Appendix E). In the fair-quality trials, the main methodological limitation was a lack of blinding for the patients or care providers. Additional limitations in the poor-quality trials included unclear randomization and allocation concealment methods, unclear use of intention to treat, unclear baseline differences between intervention groups, and attrition not reported or unacceptable.

Exercise Compared With Usual Care, No Treatment, Sham, or an Attention Control

Figure 33
Functional Outcomes. Exercise was associated with a small improvement short-term in function (assessed across various measures) compared with usual care, no treatment, or sham intervention (8 trials [1 new trial], pooled SMD –0.29, 95% CI –0.46 to –0.11, I²=9.9%), [48, 55, 59, 61, 62, 65, 67, 69] (Figure 33). Estimates were similar following exclusion of poor-quality trials and when analyses were stratified by exercise and control type. In the short term, across three fair-quality trials, [55, 61, 65] a small improvement in the KOOS Sport and Recreation scale was seen with exercise compared with usual care or no treatment (pooled difference 5.88 on a 0-100 scale, 95% CI 0.28 to 11.27, I²=0%, plot not shown) but there was no clear difference between groups in the KOOS ADL (pooled difference 5.06 on a 0-100 scale, 95% CI –1.99 to 10.65, I²=44.6%, plot not shown).

Exercise was also associated with moderate improvement in function (assessed across various measures) versus usual care, no treatment, or attention control at intermediate term (12 trials [2 new trials], pooled SMD –0.98, 95% CI –1.86 to –0.13, I²=96.5%), [49, 50, 52–54, 56, 59, 62, 63, 66, 68, 70] (Figure 33). Substantial heterogeneity was present with one outlier trial [50] of combination exercise versus no treatment in elderly patients (median age 75 years) which had higher (worse) baseline Lequesne Index scores compared with other studies and a larger change from baseline score in the intervention group. Removal of this poor quality trial did not improve heterogeneity but did attenuate the pooled estimate (11 trials [2 new trials], pooled SMD –0.63, 95% CI –1.17 to –0.10, I²=90.8%). Stratification by exercise type and control type may partially explain the heterogeneity. Muscle performance exercise, but not combination exercise (5 trials), was associated with a moderate improvement in function compared with attention control or no treatment (5 trials, pooled SMD –1.44, 95% CI –2.08 to –0.79) [49, 52–54, 66] and when compared with attention control only (3 trials, pooled SMD –1.12, 95% CI –1.83 to –0.47) [52, 54, 66] and no treatment only (2 poor quality trials, pooled SMD –1.88, 95% CI –3.16 to –0.55). [49, 53] No difference was seen across studies of exercise versus usual care (5 trials [1 new trial], pooled SMD 0.05, 95% CI –0.16 to 0.26). [56, 59, 62, 63, 70]

Analyses confined to trials that evaluated function on the 0-24 point Lequesne Index also suggest a moderate improvement in intermediate-term function with exercise compared with attention control or no treatment (6 trials, pooled difference –3.42, 95% CI –5.77 to –1.07, I²=97%, plot not shown). [49, 50, 52–54, 66] Again, removal of the poor quality outlier trial [50] did not impact the heterogeneity, but yielded a slightly lower effect estimate (5 trials, pooled difference –2.40, 95% CI –3.32 to –1.44), still consistent with a moderate effect for exercise. Results were similar when analyses were stratified according to muscle performance exercise, use of attention control, and study quality (when only the two fair-quality trials were retained).

One fair-quality trial (n=101 with knee OA) [47] compared combined exercise programs to usual care for intermediate-term function using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). The exercise group had improvement in function from baseline, which was not statistically significant (mean change from baseline –12.7, 95% CI –27.1 to 1.7), while the usual care group had no change in function (mean change from baseline 1.6, 95% CI –10.5 to 13.7). Data were insufficient to determine effect size or include in the meta-analysis.

One new fair-quality trial showed no significant difference between combined exercise and usual care at intermediate term for the KOOS Sport and Recreation (difference –18.2 on a 0-100 scale, 95% CI –41.5 to 5.1) or KOOS ADL (difference –5.4 on a 0-100 scale, 95% CI –18.3 to 7.4). [70]

One trial separately analyzed participants free of disability for ADLs at baseline (n=250) and followed them to compare cumulative incidence of disability over 15 months. The aerobic exercise group had decreased risk of disability compared to the attention control group, RR 0.53 (95% CI 0.33, 0.85), as did the muscle performance exercise group compared to the attention control group, RR 0.60 (95% CI 0.38, 0.97). [57]

A small improvement in function long-term was seen across four trials (2 new trials) of exercise compared with usual care, attention control, or waitlist (pooled SMD –0.22, 95% CI –0.34 to –0.08, I²=0%), two fair [56, 68] and two poor quality [64, 71] (Figure 33). Following exclusion of the two poor quality trials the difference was slightly attenuated and no longer statistically significant (pooled SMD –0.18, 95% CI –0.38 to 0.03, I²=0%). No difference between groups was seen when exercise was compared with a waitlist control only (2 trials, pooled difference –0.17, 95% CI –0.45 to 0.15). A single, new poor-quality trial found no long-term difference in KOOS Sport and Recreation (difference 2.3 on a 0-100 scale, 95% CI –7.9 to 12.5) or KOOS ADL (difference 0.90 on a 0-100 scale, 95% CI –4.1 to 5.9) for muscle performance exercise compared with waitlist. [71]
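
The interpretation rule used throughout these results (a pooled estimate is statistically significant only when its confidence interval excludes the null value: 0 for mean differences, 1 for risk ratios) can be illustrated with a minimal Python sketch applied to the long-term function estimates above. The class and function names are hypothetical and are not part of the report's analysis.

```python
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    """A reported estimate with its 95% CI (hypothetical helper type)."""
    label: str
    estimate: float
    ci_low: float
    ci_high: float
    null_value: float = 0.0  # 0 for MD/SMD; use 1.0 for risk ratios


def interpret(e: PooledEstimate) -> str:
    """Apply the CI rule: significant only if the CI excludes the null value."""
    if e.ci_low <= e.null_value <= e.ci_high:
        return "not statistically significant"
    return "statistically significant"


# Long-term function estimates as reported in the paragraph above.
reported = [
    PooledEstimate("all four trials (SMD)", -0.22, -0.34, -0.08),
    PooledEstimate("fair-quality trials only (SMD)", -0.18, -0.38, 0.03),
    PooledEstimate("exercise vs. waitlist only", -0.17, -0.45, 0.15),
]

for e in reported:
    print(f"{e.label}: {e.estimate} ({e.ci_low} to {e.ci_high}) -> {interpret(e)}")
```

The sketch only checks statistical significance; it does not attempt the report's further distinction between "no clear difference" and "no difference," which depends on judgment about how close an estimate is to the null.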

Figure 34
Pain Outcomes. One fair-quality trial found no statistical difference between exercise and sham procedure in the proportion of patients who reported clinically relevant reductions (≥1.75 points) in VAS pain on movement (prior week) [58% (34/59) vs. 42% (27/65); RR 1.4, 95% CI 1.0 to 2.0] or VAS global improvement in pain [59% (35/59) vs. 50% (33/65); RR 1.2, 95% CI 0.8 to 1.6] in the short term. [48] Exercise was associated with a small improvement in short-term pain compared with usual care, no treatment, waitlist or sham in eight (1 new) trials (pooled difference on a 0-10 scale –0.47, 95% CI –0.86 to –0.10, I²=42%) (Figure 34). Seven trials (1 new trial) were fair quality [48, 55, 59, 61, 62, 65, 69] and one was poor quality. [67] The estimate was similar following exclusion of the poor-quality trial (pooled difference –0.45, 95% CI –0.86 to –0.04). Across studies comparing exercise with usual care, results were also similar (5 trials, pooled difference –0.53, 95% CI –1.07 to –0.02). [55, 59, 61, 62, 67]
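
The risk ratios quoted for the single trial above can be reproduced from the reported counts. The following Python sketch assumes the standard large-sample (log-scale Wald) confidence interval; the report does not state which method the authors used, and the function name is hypothetical.

```python
import math


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio of group A vs. group B with a log-scale Wald 95% CI.

    Assumption: large-sample standard error of log(RR),
    sqrt(1/a - 1/n_a + 1/b - 1/n_b); the report's own method may differ.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


# Counts reported above: exercise vs. sham.
for label, counts in {
    "VAS pain on movement": (34, 59, 27, 65),
    "VAS global improvement": (35, 59, 33, 65),
}.items():
    rr, low, high = risk_ratio(*counts)
    print(f"{label}: RR {rr:.1f}, 95% CI {low:.1f} to {high:.1f}")
```

Under this assumption the rounded results agree with the reported values (RR 1.4, 95% CI 1.0 to 2.0, and RR 1.2, 95% CI 0.8 to 1.6). Before rounding, the lower limit for pain on movement falls just below 1, which is consistent with the finding of no statistical difference.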

Exercise was associated with moderately greater improvement in intermediate-term pain compared with usual care, attention control, waitlist or no treatment across pain measures (11 trials [2 new trials], pooled difference –1.34, 95% CI –2.12 to –0.54, I²=90% on a 0-10 scale) across six fair-quality trials (2 new trials) [52, 54, 56, 59, 68, 70] and five poor-quality trials [49, 53, 62, 63, 66] (Figure 34). Following exclusion of the poor quality trials the difference between groups was attenuated and no longer statistically significant (pooled SMD –0.98, 95% CI –2.09 to 0.12). Results differed somewhat by type of exercise and type of control. Five trials (2 new trials) showed no difference between combination exercise and usual care or waitlist [56, 59, 63]; however, a substantial improvement in pain was seen for muscle performance exercise compared with attention control or no treatment (5 trials, pooled difference on 0-10 scale –2.53, 95% CI –3.23 to –1.80) [49, 52–54, 66] and when compared with attention control only (3 trials, pooled difference –2.18, 95% CI –3.15 to –1.24) [52, 54, 66] and with no treatment only (2 poor quality trials, pooled difference –3.01, 95% CI –4.00 to –1.90). [49, 53] No difference was seen across studies of exercise versus usual care (5 trials [1 new trial], pooled SMD –0.29, 95% CI –0.80 to 0.13). [56, 59, 62, 63, 70]

Exercise resulted in a small improvement in long-term pain versus usual care, waitlist or attention control (pooled difference –0.30 on a 0 to 10 scale, 95% CI –0.49 to 0.00, I²=0%), in three fair-quality trials (2 new trials) [56, 68, 71] and one large, poor-quality trial [64] (Figure 34).

Most trials evaluated pain using a traditional 0 to 10 VAS. A small improvement in short-term pain favoring exercise was observed across four trials (3 fair [one new trial], 1 poor quality, pooled difference –0.83, 95% CI –1.49 to –0.19, I²=33%) [48, 59, 67, 69]; the effect estimate was similar after exclusion of the poor quality trial (pooled difference –0.84, 95% CI –1.73 to 0.02). [67] Estimates confined to combination exercise showed a slightly greater effect size and remained significant (3 trials, pooled difference –1.14, 95% CI –1.73 to –0.41). [59, 67, 69] Findings for intermediate-term pain showed a moderate improvement with exercise (7 trials, pooled difference –2.04, 95% CI –2.86 to –1.13, I²=81%). [49, 52–54, 59, 63, 66] The pooled estimate was similar when four poor-quality trials [49, 53, 63, 66] were excluded, leaving three fair-quality trials (pooled difference –1.97, 95% CI –3.45 to –0.44). [52, 54, 59] When results were stratified by exercise type, muscle performance exercise resulted in a large effect size (5 trials, pooled difference –2.53, 95% CI –3.23 to –1.80) [49, 52–54, 66] while results for combination exercise showed no difference versus usual care (2 trials, pooled difference –0.54, 95% CI –1.55 to 0.51). [59, 63] Stratification by control type among studies reporting VAS pain yielded similar findings to those across multiple measures. No trial employing VAS reported on long-term pain.

Other Outcomes. Health-related quality of life (QoL) outcomes had mixed results (Table 24). Two fair-quality trials found no association between exercise and short-term QoL on the KOOS 0 to 100 scale (pooled difference 1.8, 95% CI –2.5 to 6.0, I²=0%, plot not shown). [55, 61] A fair-quality trial (n=65) reported no differences in mean change for short term SF-36 PCS (mean change of 3.0 [95% CI –5.9 to 16.3] versus –0.7 [95% CI –14.8 to 9.8]) and SF-36 MCS (mean change of 0.7 [95% CI –18.1 to 13.2] vs. –0.7 [95% CI –16.8 to 12.8]). [65] One fair-quality trial (n=158) reported similar health-related QoL scores between a combined exercise group and usual care using averaged intermediate- and long-term scores. The adjusted mean (standard error [SE]) SF-36 PCS were 37.6 (0.9) vs. 35.3 (0.8), respectively, and adjusted mean (SE) SF-36 MCS were 54.1 (0.8) vs. 53.7 (0.8), respectively. [60] A poor-quality trial (n=50) reported intermediate-term SF-36 scores for individual domains. Functional capacity, physical role, bodily pain, general health, and vitality showed small improvement with exercise versus attention control. [50]

A fair-quality trial (n=438) reported no difference in depressive symptoms compared with attention control (2.59 vs. 2.80, p=0.27) for muscle performance exercise, while aerobic exercise was associated with fewer depressive symptoms on the Center for Epidemiologic Studies Depression (CES-D) questionnaire compared to attention control (2.12 vs. 2.80, p<0.001). [58]

There was insufficient evidence to determine effects of duration of exercise therapy or number of sessions on outcomes. No trials reported on changes in opioid use as a result of exercise programs.

Exercise Compared With Pharmacological Therapy or With Other Nonpharmacological Therapies

One new trial (in 2 publications) of exercise therapy versus pharmacological therapy met inclusion criteria. This fair-quality trial (N=93) [102, 103] compared combined exercise with standard recommendations for analgesics and anti-inflammatory drugs and had intermediate-term followup only. More patients who received exercise versus pharmacological therapy achieved a clinically important improvement in function (>10 point improvement on KOOS ADL), 47% (22/47) versus 28% (13/46); RR 1.7, 95% CI 1.0 to 2.9; however, the difference did not reach statistical significance. There was no difference between groups for change in function from baseline: KOOS ADL (difference –3.6 on a 0-100 scale, 95% CI –9.2 to 2.1) and KOOS Sport and Recreation (difference –2.9 on a 0-100 scale, 95% CI –11.4 to 5.5). There was also no difference for change in pain from baseline according to the KOOS pain measure (difference –4.2 on a 0-100 scale, 95% CI –10.0 to 1.6), but there was a small difference for change in symptoms favoring exercise, KOOS Symptoms (difference –7.6 on a 0-100 scale, 95% CI –12.7 to –2.6). No difference in change in QoL from baseline was found with the KOOS QoL (difference –1.3 on a 0-100 scale, 95% CI –7.5 to 4.9) and the EQ-5D (difference 2.6, 95% CI –2.9 to 8.1).

Findings for exercise versus other nonpharmacological therapies are addressed in the sections for other nonpharmacological therapies.

Harms

Most trials did not report harms. One trial reported greater temporary, minor increases in pain in the exercise group versus a sham group (RR 14.7, 95% CI 2.0 to 107.7); however, the confidence interval is wide. [48] Four studies found no difference in worsening of pain symptoms with exercise versus comparators. [49, 53, 65, 66] One trial found no difference in falls or deaths. [51] No difference in adverse events (to include abdominal and intestinal symptoms, musculoskeletal symptoms, central nervous system, psychiatric symptoms, skin and subcutaneous symptoms and other) was reported for exercise compared to standard analgesics and anti-inflammatory therapy. [102, 103]

Psychological Therapy for Osteoarthritis Knee Pain

Key Points

Two new trials of motivational interviewing and CBT versus usual care and no treatment found no differences between treatment groups in function (pooled difference –2.09 on a 0-68 WOMAC function scale, 95% CI –8.70 to 1.61, I²=63.3%) but a small improvement in pain (pooled difference –0.6 on a 0-20 WOMAC pain scale, 95% CI –1.5 to –0.1, I²=0.0%) favoring the psychological treatments compared to controls in the short term (SOE: low for both function and pain).
Two trials of pain coping skills training and CBT versus usual care found no differences in function (WOMAC physical function, 0-100) or pain (WOMAC pain, 0-100); treatment effects were averaged over short term to intermediate term (difference –0.3, 95% CI –8.3 to 7.8 for function and –3.9, 95% CI –11.8 to 4.0 for pain) and intermediate term to long term (mean 35.2, 95% CI 31.8 to 38.6 vs. mean 37.5, 95% CI 33.9 to 41.2, and mean 34.5, 95% CI 30.8 to 38.2 vs. mean 38.0, 95% CI 34.1 to 41.8), respectively (SOE: low).
One trial of pain coping skills training versus strengthening exercises found no differences in WOMAC physical function scores (0-68 scale) at short term (difference 2.0, 95% CI –2.4 to 6.4) or intermediate term (difference 3.2, 95% CI –0.6 to 7.0) or in WOMAC pain scores (0-20 scale) at short term (difference –0.1, 95% CI –1.2 to 1.0) or intermediate term (difference 0.4, 95% CI –0.8 to 1.6) (SOE: low).
No serious harms were reported in either trial (SOE: low).

Detailed Synthesis

Table 26
Five trials of psychological therapies for knee OA met inclusion criteria (Table 26 and Appendix D). [109–112, 134] Three trials were included in the prior AHRQ report [109, 110, 134] and two were added for this update. [111, 112] Two trials (1 new trial) were conducted in the United States, [110, 111] one in Finland, [109] and two (1 new trial) in Australia. [112, 134] Sample sizes ranged from 67 to 155 (total sample=593). Across the trials, participants were predominately female (60% to 80%) with mean ages ranging from 58 to 64 years. Three trials (1 new trial) [109, 110, 112] compared CBT or pain coping skills training with usual care. The number and duration of psychological sessions varied between the trials (6, 2-hour sessions, 6 online sessions or 18, 1-hour sessions, respectively), as did the total duration of therapy (6 and 24 weeks). Usual care was defined as routine care provided by the patient’s primary care doctor and was not well-described in any trial. Another new trial (n=155) compared motivational interviewing focused on goal setting and physical activity with no treatment. [111] Motivational interviewing consisted of a longer initial session followed by 5 brief sessions (10-15 minutes) over 24 months. The fifth trial (n=149) [134] compared pain coping skills training (PCST) (ten 45-minute sessions) with strengthening exercises (ten 25-minute sessions); all sessions were conducted on an individual basis over a treatment period of 12 weeks. Participants randomized to receive PCST were told to practice skills daily and then as needed during followup; those in the exercise group were instructed to perform exercises four times a week during the 12-week intervention and three times a week during the followup period.

Four trials (2 new trials) were rated fair quality [109, 111, 112, 134] and one was rated poor quality [110] (see Appendix E for quality ratings). The primary methodological limitation in the fair-quality trials was the inability to effectively blind care providers, outcome assessors, and/or patients. Additional methodological shortcomings in the poor-quality trial included poor treatment compliance and high attrition (32%).

Psychological Therapies Compared With Usual Care

Figure 35
Figure 36
Four trials (2 new trials) [109–112] compared psychological therapies with usual care or no treatment. Only the short-term results of the two new, fair-quality trials (O’Moore, 2018 and Gilbert, 2018) were amenable to pooling. [111, 112] There was no statistically significant difference between groups at short term for function according to the WOMAC (pooled difference –2.09 on a 0-68 scale, 95% CI –8.70 to 1.61, I²=63.3%) (Figure 35) but there was a small improvement in pain favoring the psychological treatments compared to usual care or no treatment (pooled difference –0.60 on the 0-20 WOMAC pain scale, 95% CI –1.48 to –0.08, I²=0.0%) (Figure 36). [111, 112] One of these trials [111] also reported intermediate- and long-term results, with no statistically significant differences between treatment groups in either the WOMAC pain or function subscales at any timepoint, with the exception of a small difference in function favoring usual care at 12 months (difference 3.2, 95% CI 0.1 to 6.2). Regarding quality of life, there was no statistically significant difference between groups at short term for either the SF-12 PCS (2 trials, pooled difference 1.3 on a 0-100 scale, 95% CI –1.1 to 3.6, I²=0.0%) [111, 112] or the SF-12 MCS (2 trials, pooled difference 3.7 on a 0-100 scale, 95% CI –7.7 to 16.3, I²=90.8%). [111, 112]

Two other trials reported outcomes averaged over all post-treatment followup times and therefore were not able to be pooled. The trial of CBT averaged results from 1.5 to 10.5 months post-treatment (spanning short to intermediate term) [109] and the trial of pain coping skills training averaged results from 6 to 12 months post-treatment (spanning intermediate to long term). [110] Similar to the pooled results, no significant differences in function or pain were found between the psychological therapy and the usual care groups in either trial. Function was measured using the WOMAC physical function subscale (0-100) in both trials, over the short to intermediate term (difference –0.3, 95% CI –8.3 to 7.8) [109] and intermediate to long term (mean 35.2, 95% CI 31.8 to 38.6 vs. mean 37.5, 95% CI 33.9 to 41.2), [110] and using the Arthritis Impact Measurement Scale (AIMS) physical disability subscale in one trial [110] (Table 25). Both trials measured pain using the WOMAC pain subscale (0-100), one trial over short- to intermediate-term followup (difference –3.9, 95% CI –11.8 to 4.0) [109] and the other over intermediate- to long-term followup (mean 34.5, 95% CI 30.8 to 38.2 vs. mean 38.0, 95% CI 34.1 to 41.8). [110] Results were similar for the AIMS pain subscale and the numeric rating scale (NRS) pain scale, reported by one trial each (Table 25). Neither trial reported any differences between groups in any secondary outcome measure.

No trial evaluated effects of psychological therapies on use of opioid therapies or healthcare utilization.

Psychological Therapies Compared With Pharmacological Therapy

No trial of psychological therapy versus pharmacological therapy met inclusion criteria.

Psychological Therapies Compared With Exercise Therapy

One fair-quality trial [134] of pain coping skills training versus strengthening exercise found no between-group differences in function or pain in the short term (WOMAC physical function, difference 2.0, 95% CI –2.4 to 6.4 on a 0-68 scale and WOMAC pain, difference –0.1, 95% CI –1.2 to 1.0 on a 0-20 scale) or the intermediate term (WOMAC physical function, difference 3.2, 95% CI –0.6 to 7.0 and WOMAC pain, difference 0.4, 95% CI –0.8 to 1.6) (Table 25). Results were similar for overall pain and pain with walking, both measured on a 0-100 VAS. There were also no differences between groups on any other secondary outcome measure including opioid use at short-term or intermediate-term followup.

Harms

In the four trials of psychological interventions versus usual care, [109–112] no adverse events were observed. In the fifth trial, [134] fewer participants in the pain coping skills training group compared with the exercise group experienced pain in the knee (3% vs. 31%, p<0.001) and in other body regions (4% vs. 15%, p=0.02) during treatment; during followup, only the frequency of pain in other body areas differed between groups (0% vs. 11%, respectively, p<0.05; knee pain, 7% vs. 10%, p=0.53). Pain was mostly mild and transient.

Physical Modalities for Osteoarthritis Knee Pain

Key Points

Ultrasound

Three trials (2 new trials), one good-, one fair- and one poor-quality, found no statistically significant differences between continuous or pulsed ultrasound and sham in short-term function (pooled difference –2.50 on a 0-24 scale, 95% CI –6.37 to 1.22, I²=94.0%) and short-term pain intensity (pooled difference –1.2 on a 0-10 scale, 95% CI –3.7 to 1.3, I²=91.1%) (SOE: low).
One fair-quality trial found no differences between continuous and pulsed ultrasound versus sham in intermediate-term function (difference –2.9, 95% CI –9.19 to 3.39 and 1.6, 95% CI –3.01 to 6.22, on a 0-68 WOMAC function scale) or pain (difference –1.6, 95% CI –3.26 to 0.06 and 0.2, 95% CI –1.34 to 1.74, on a 0-20 WOMAC pain scale). There was also no difference between groups for VAS pain during rest or on movement (SOE: low).
No adverse events were reported during the two trials (SOE: low).

Transcutaneous Electrical Nerve Stimulation

One trial found no differences between TENS and placebo TENS in intermediate-term function (proportion of patients who achieved a minimal clinically important difference (MCID) on the WOMAC function subscale [≥9.1], 38% vs. 39%, RR 1.2, 95% CI 0.6 to 2.2; and difference –1.9, 95% CI –9.7 to 5.9 on the 0-100 WOMAC function subscale) or intermediate-term pain (proportion of patients who achieved MCID [≥20] in VAS pain, 56% vs. 44%, RR 1.3, 95% CI 0.8 to 2.0; and difference –5.6, 95% CI –14.9 to 3.6 on the 0-100 WOMAC pain subscale) (SOE: low for function and pain).
One trial of TENS reported no difference in the risk of minor adverse events (RR 1.06, 95% CI 0.38 to 2.97) (SOE: low).

Low-Level Laser Therapy

Evidence was insufficient from one small fair-quality and two poor-quality trials to determine effects or harms of low-level laser therapy in the short or intermediate term; no data were available for the long term (SOE: insufficient).

Microwave Diathermy

There was insufficient evidence to determine short-term effects or harms from one small, fair-quality trial (SOE: insufficient).

Pulsed Short-Wave Diathermy

There was insufficient evidence to determine effects or harms from one poor-quality trial in the short term or from another poor-quality trial in the long term (SOE: insufficient).

Electromagnetic Field

One fair-quality trial found pulsed electromagnetic fields were associated with small improvements in function (difference –3.48, 95% CI –4.44 to –2.51 on a 0-85 WOMAC ADL subscale) and pain (difference –0.84, 95% CI –1.10 to –0.58 on a 0-25 WOMAC pain subscale) versus sham short-term but differences may not be clinically significant (SOE: low).
More patients who received real versus sham electromagnetic field therapy reported throbbing or warming sensations or aggravation of pain (29% versus 7%); however, the difference was not significant (RR 1.95, 95% CI 0.81 to 4.71) (SOE: low).

Superficial Heat

Evidence was insufficient from one small fair-quality trial to determine effects or harms of superficial heat versus placebo in short-term pain (SOE: insufficient).

Braces

There was insufficient evidence from one poor-quality study to determine the effects of bracing versus usual care for intermediate-term and long-term function or pain (SOE: insufficient).
Harms were not reported.

Detailed Synthesis

Table 27
A total of 15 trials evaluating the use of a physical modality for the treatment of knee OA met inclusion criteria (Table 27 and Appendixes D and E). [150–164] Thirteen were included in the prior AHRQ report [150–162] and two were added for this update. [163, 164] Physical modalities evaluated included ultrasound (both new trials), TENS, low-level laser therapy, microwave diathermy, pulsed short-wave diathermy, electromagnetic fields, superficial heat, and bracing. All but one intervention (bracing vs. usual care) [152] were compared to a sham procedure.

Four RCTs (2 new trials; 1 good-quality, 2 fair-quality, and 1 poor-quality) that evaluated ultrasound for knee OA met the inclusion criteria. [153, 162–164] All trials required at least grade 2 radiographic knee OA using the Kellgren–Lawrence criteria for inclusion. One (new) trial evaluated continuous ultrasound, [164] one (new) evaluated pulsed ultrasound [163] and two trials had both a continuous and a pulsed ultrasound group. [153, 162] In three trials, the ultrasound groups received 1 MHz treatments five times per week for 2 weeks at an intensity of either 1 or 1.5 W/cm² and the sham comparators received the same protocol, but the power was switched off. [153, 162, 164] The fourth trial applied daily pulsed ultrasound for 10 days at 0.6 MHz with an average intensity of 120 mW/cm² and duty cycle of 20% plus participants took diclofenac sodium tablets; the comparator group received sham ultrasound (no power output) plus the diclofenac sodium tablets. [163] Compliance with the intervention protocols was not reported. Three trials reported short-term outcomes, [162–164] the other intermediate-term outcomes. The methodological shortcomings were unclear blinding of the provider or assessor, [153, 163, 164] unclear randomization procedures and concealment of treatment allocation [164] and unclear adherence to an intention-to-treat analysis. [162]

We found one good-quality (n=70) trial that compared active TENS with sham TENS for knee OA. [154] Inclusion criteria required a confirmed diagnosis of knee OA using the American College of Rheumatology criteria. The TENS protocol had patients wear a pulsed TENS device 7 hours daily for 26 weeks. The sham TENS group followed the same protocol as the active treatment group, but the device turned off after 3 minutes. Compliance with the time the TENS device was to be worn was unacceptable.

We identified three small trials (n=30, 49, and 60) that investigated low-level laser therapy versus sham laser for knee OA. [150, 157, 160] The mean age ranged from 49 to 64 years and most patients were female (62% to 75%). Two studies included patients meeting the American College of Rheumatology criteria for knee OA. [150, 160] Two trials also required an average pain intensity of greater than 3 or 4 on a 0-10 VAS, [150] while the other trial had an additional inclusion criterion of radiographic knee OA of Kellgren–Lawrence grade 2 or 3. [160] Treatment duration ranged from 2 to 4 weeks and the number of total sessions from 8 to 10. Low-level laser therapy protocols differed across the trials with doses ranging from 1.2 to 6 Joules per point (range, 5 to 6 points) and length of irradiation from 40 seconds to 2 minutes; all trials used a continuous laser beam. The sham laser comparison groups followed the same respective protocols, but the device was inactive. One trial was rated fair quality [150] and two poor quality. [157, 160] In the fair-quality trial, blinding of the care provider was unclear. The two poor-quality trials suffered from insufficient descriptions of allocation concealment methods, unclear application of intention to treat, lack of clarity regarding patient blinding, and no reporting of or unacceptable attrition.

One small (n=63), fair-quality trial compared microwave diathermy (three 30-minute sessions per week for 4 weeks) to sham. [156] The inclusion criteria required radiographic knee OA of a Kellgren and Lawrence grade 2 or 3. The power was set to 50 watts. Sham diathermy followed the same protocol, but the machine was set to off. Compliance with the treatment regimen for each group was unclear. Methodological limitations of this study included no blinding of the care providers.

Two trials (n=86 and 115) examined pulsed short-wave diathermy compared to sham diathermy. [155, 158] The mean age ranged from 62 to 75 years, and the proportion of female participants ranged from 67 to 100 percent. Both trials included patients meeting radiographic criteria for knee OA. Each trial compared two doses of short-wave diathermy to a sham diathermy group; dosages varied by intensity in one trial (mean power output of either 1.8 or 18 Watts for 20 minutes) [158] or by length of session (19 or 38 minutes at 14.5 Watts) in the other. [155] Both trials applied diathermy three times per week for 3 weeks (total of 9 sessions). Each sham diathermy group followed the same treatment protocol, but the electrical current was not applied. Compliance with the treatment regimens was acceptable for both trials. Both trials were rated poor quality due to unclear concealment of treatment allocation, a lack of care provider blinding, and unacceptable attrition.

Two trials (n=90 for both) compared the application of electromagnetic fields to sham interventions for knee OA. [151, 161] The mean age of participants was 59 and 60 years, and the proportion of female participants ranged from 48 to 70 percent. The mean duration of chronicity ranged from 9 to 11 years. The fair-quality trial enrolled participants meeting the American College of Rheumatology criteria for knee OA. [161] The inclusion criteria were not clearly presented in the poor-quality trial. [151] The intervention group in the fair-quality study received 2 hours of pulsed electromagnetic fields 5 days a week for 6 weeks. [161] The poor-quality trial had a musically modulated electromagnetic field group that received 15 daily 30-minute sessions. Music from a connected speaker modulated the parameters of the electromagnetic field. The study also had an extremely low frequency electromagnetic field group that had 15 daily 30-minute sessions, but the electromagnetic field was set at a frequency of 100 Hz. [151] The sham group in each trial followed the same respective treatment protocol, but used a noneffective electromagnetic field during the sessions. Compliance with the treatment sessions was acceptable in both trials. One trial was rated fair quality [161] and the other was rated poor quality. [151] Methodological limitations in both trials included unclear methods for allocation concealment. Additionally, in the poor-quality trial, there were baseline dissimilarities between groups, no blinding of patients, providers, or outcome assessors, and attrition was not reported. [151]

A single trial compared superficial heat with placebo (n=52). [159] Participants were included if they had grade 2 or higher using the Kellgren-Lawrence grading for radiographic knee OA. Superficial heat was provided using a knee sleeve with a heat retaining polyester and aluminum substrate. Participants were instructed to wear the sleeve at least 12 hours per day. The placebo sleeves were identical and participants received the same instructions, but the sleeve did not contain the heat retaining substrate; the extent to which patients could be truly blinded is unclear (sleeve may retain body heat and feel warmer). Compliance with wearing the sleeve was acceptable. This trial was rated fair quality due to unclear concealment of treatment allocation, and a lack of clarity regarding whether it was the provider or outcomes assessor that was blinded.

We identified one trial comparing use of a knee brace to usual care (n=118). [152] Inclusion criteria required unicompartmental knee OA, and either a varus or valgus malalignment. Patients in the intervention group were fitted with a commercially available knee brace that allowed medial unloading or lateral unloading. Usual care consisted of patient education and physical therapy and analgesics as needed. Compliance with continued use of the brace was unacceptable. This trial was rated poor quality due to lack of patient, provider, or assessor blinding, and unacceptable attrition.

Physical Modalities Compared With Sham or Usual Care

Figure 37
Figure 38
Ultrasound. Three trials (2 new; one good, one fair, one poor quality) reported function using Lequesne Index and pain (during activity) using VAS over the short term. [162–164] There were no statistically significant differences between real and sham ultrasound in either function (3 trials, pooled difference –2.50 on a 0-24 scale, 95% CI –6.37 to 1.22, I²=94.0%) (Figure 37) or pain intensity (3 trials, pooled difference –1.2 on a 0-10 scale, 95% CI –3.7 to 1.3, I²=91.1%) (Figure 38) using a profile likelihood (PL) estimate, likely due to heterogeneity between studies. Exclusion of the poor-quality study [164] resulted in slightly larger, but still not statistically significant, effects for function (2 trials, SMD –3.4, 95% CI –9.5 to 2.4, plot not shown) and pain (2 trials, pooled difference –1.9, 95% CI –5.1 to 1.1, plot not shown). Stratification by type of ultrasound (continuous vs. pulsed) resulted in similar conclusions regarding function and pain.
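The I² values reported with the pooled estimates above describe how much of the variation between trials exceeds what chance alone would produce. As a rough illustration only, the sketch below computes Cochran's Q and I² from per-study effects and standard errors using inverse-variance weights. The function name and the input numbers are hypothetical and are not data from the ultrasound trials; the report's own pooled estimates come from a PL random-effects model, which this sketch does not reproduce.

```python
def i_squared(effects, std_errors):
    """Return (fixed-effect pooled estimate, Cochran's Q, I2 in percent)."""
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of Q in excess of its degrees of freedom, floored at zero.
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0
    return pooled, q, i2


# Hypothetical inputs for illustration, NOT values from the trials in this review.
hypothetical_effects = [-4.0, -3.0, 0.5]
hypothetical_ses = [0.5, 0.6, 0.5]

pooled, q, i2 = i_squared(hypothetical_effects, hypothetical_ses)
print(f"pooled={pooled:.2f}, Q={q:.1f}, I2={i2:.0f}%")
```

When study effects disagree by far more than their standard errors allow, Q greatly exceeds its degrees of freedom and I² approaches 100 percent, as with the values above 90 percent reported for the ultrasound comparisons.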

Intermediate-term results at 6 months from one fair-quality trial showed no difference on the WOMAC Physical Function subscale (0 to 100) between either the continuous or pulsed ultrasound group versus sham ultrasound (difference –4.5, 95% CI –10.34 to 1.34, and –2.9, 95% CI –9.19 to 3.39, respectively). [153] Results for pain intensity were not consistent with regard to ultrasound method. The continuous ultrasound group had a small improvement in pain on the WOMAC pain scale compared with sham (difference –1.8, 95% CI –3.34 to –0.26), but no statistical difference was seen between pulsed ultrasound and sham (difference –1.6, 95% CI –3.26 to 0.06). There was no difference between either ultrasound group versus sham ultrasound for VAS pain during rest or on movement (Table 26).

Regarding quality of life, one new trial reported no differences in the short term between the continuous and sham ultrasound groups for change from baseline on the SF-36 PCS (mean change 7.9 vs. 6.1 on a 0-100 scale, p=0.47) and the SF-36 MCS (mean change –0.3 vs. –0.1 on a 0-100 scale, p=0.95). [164]

Transcutaneous Electrical Nerve Stimulation. No effect was seen for TENS versus placebo TENS for function or pain over the intermediate term for any outcome measured in one good-quality trial. [154] Function was measured via the WOMAC-function subscale (0 to 100); the proportion of patients who achieved a MCID ≥9.1 was 38 percent versus 39 percent (RR 1.2, 95% CI 0.6 to 2.2) and the difference in mean change scores was –1.9 (95% CI –9.7 to 5.9). Pain was measured using a VAS pain scale (difference 0.9 on a scale of 0 to 10, 95% CI –11.7 to 13.4) and the WOMAC pain subscale (difference –5.6 on a 0 to 100 scale, 95% CI –14.9 to 3.6). The proportion of patients who achieved MCID (≥20) in pain VAS was 56 percent versus 44 percent (RR 1.3, 95% CI 0.8 to 2.0). Health-related quality of life measured with the SF-36 was not different between the two groups for the physical component and mental component score (Table 26).

Low-Level Laser Therapy. One fair-quality trial reported no difference between low-level laser therapy and sham for short-term function based on median Saudi Knee Function Scale scores (range 0-112 with higher scores indicating greater severity), median difference –10 (interquartile range of –23 to –4), p=0.054. [150] There were inconclusive results for intermediate-term function. One fair-quality trial reported the low-level laser therapy group had less functional severity at 6 months compared to sham on the Saudi Knee Function Scale (median difference –21.0, 95% CI –34.0 to –7.0), p=0.006. [150] For the other poor-quality trial, neither the higher dose nor the lower dose low-level laser therapy group differed from sham on the WOMAC physical function (0 to 96) subscale (difference –3.82, 95% CI –9.75 to 2.11 and –0.14, 95% CI –6.59 to 6.31, respectively). [160] However, the evidence was considered insufficient for function.

Figure 39
Low-level laser therapy was associated with moderately less pain over the short term in one fair-quality and one poor-quality trial (pooled difference –2.00, 95% CI –4.15 to 0.04) (Figure 39). [150, 157] There was no difference between low-level laser therapy versus sham for intermediate-term pain (pooled difference –1.04, 95% CI –3.17 to 1.45). [150, 160] However, the evidence was considered insufficient for pain.

Microwave Diathermy. Data were insufficient from one small, fair-quality trial evaluating microwave diathermy. [156] The microwave diathermy group showed substantial short-term improvement compared with sham for function (difference –33.2 on a 0-85 scale, 95% CI –42.0 to –24.6, WOMAC ADL subscale) and pain (difference –8.1 on a 0-25 scale, 95% CI –10.7 to –5.3, WOMAC pain subscale). Substantial imprecision was noted.

Pulsed Short-Wave Diathermy. Data were insufficient for pulsed short-wave diathermy compared with sham. There was no difference in short-term function or pain for either the low intensity or high intensity group compared to sham diathermy based on the WOMAC in one poor-quality trial. [156] There was no difference on the WOMAC function subscale (0 to 10) between either the low intensity group versus sham (difference 0.16, 95% CI –1.51 to 1.83), or the high intensity group versus sham (difference –0.02, 95% CI –1.67 to 1.63). There was also no difference on the WOMAC pain subscale (0 to 10) for either the low or high intensity group versus sham (difference 0.15, 95% CI –1.57 to 1.87 and –0.24, 95% CI –2.02 to 1.54, respectively).

The other trial found inconsistent results among the high and low dose groups for long-term function using the KOOS (0 to 100). [155] The low dose group had substantially greater improvement on the KOOS-Daily Activities subscale compared to sham (difference 27.30, 95% CI 13.73 to 40.87), but there was no difference between the high dose group and sham on the KOOS-Daily Activities subscale (difference 10.30, 95% CI –1.24 to 21.84). Neither the low nor the high dose group differed from sham on the KOOS-recreational activities subscale (Table 26). Regarding pain intensity, the low dose group had moderately better pain NRS (0 to 10) that was not statistically significant (difference –1.8, 95% CI –3.60 to 0.00). The high dose group experienced substantially greater pain reduction than the sham group (difference –2.3, 95% CI –3.68 to –0.92).

Electromagnetic Fields. The fair-quality trial found use of pulsed electromagnetic fields did not appear to provide clinically meaningful short-term improvements in function or pain compared with sham, although statistical significance was achieved. The pulsed electromagnetic field group had better function on the WOMAC ADL subscale (0 to 85) compared with the sham group, (difference –3.48, 95% CI –4.44 to –2.51), and it had lower scores on the WOMAC pain subscale (0 to 25) versus sham (difference –0.84, 95% CI –1.10 to –0.58). [161] Based on estimated values from a graph for the poor-quality trial, [151] each group using electromagnetic fields had better function and substantially less pain in the short term on the Lequesne Index. The musically modulated electromagnetic field group had moderately better Lequesne Function scores (0-10) versus sham (mean of 6.5 vs. 3.8) and substantially lower Lequesne Pain scores (0 to 10) (mean of 1.4 vs. 6.9). The low frequency electromagnetic field group had similar benefits for function (mean of 7.1 vs. 3.83) and pain (mean of 1.4 vs. 6.85, standard deviation and statistical testing not reported), compared with sham.

Superficial Heat. Evidence from one small fair-quality trial was insufficient to determine the effects of superficial heat on short-term pain. WOMAC pain subscale scores were similar between the heat and placebo group at 1 month post-treatment (13.7 versus 13.9, respectively). [159]

Brace. Evidence from one small poor-quality trial was insufficient to determine the effects of brace treatment. There was no difference between bracing and usual care for intermediate-term or long-term function, pain, and quality of life outcomes. [152] Function was measured using the Hospital for Special Surgery (HSS) score (difference 3.2, 95% CI –0.58 to 6.98 for intermediate-term function and difference 3.0, 95% CI –1.05 to 7.05 for long-term function). Pain intensity was assessed using a VAS. The difference was –0.58 (95% CI –1.48 to 0.32) for intermediate-term pain and –0.81 (95% CI –1.76 to 0.14) for long-term pain. Health-related quality of life was measured using the Euro-Qol 5-Dimensions (EQ-5D) (difference 0.01, 95% CI –0.08 to 0.10 for both intermediate-term and long-term health-related quality of life).

Physical Modalities Compared With Pharmacological Therapy or With Exercise Therapy

No trial of physical modalities versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

In general, harms were poorly reported across the physical modality trials. Six trials (2 of low-level laser therapy, [150, 160] 2 of ultrasound therapy, [153, 162] 1 of pulsed short-wave diathermy, [158] and 1 of superficial heat [159]) reported that no adverse events or side effects occurred in either group. The good-quality trial that evaluated TENS found no difference between active and sham TENS in the risk of localized, mild rashes (18% vs. 17%; RR 1.06, 95% CI 0.38 to 2.97). [154] One trial of microwave diathermy reported two cases of symptom aggravation in the intervention group; the events were transient and neither patient withdrew from the trial. [156] More patients who received real versus sham electromagnetic field therapy reported throbbing or warming sensations or aggravation of pain (29% versus 7%); however, the difference was not significant (RR 1.95, 95% CI 0.81 to 4.71) in one fair-quality trial. [161]
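Harms in this section are summarized as risk ratios (RR) with 95% confidence intervals. The sketch below shows one common way such a figure can be derived from event counts, using a log-scale (Katz) interval. It is an illustration under stated assumptions: the function name and the counts are hypothetical, the trials' actual counts are not given in this section, and the report may have used a different method.

```python
import math


def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale (Katz) CI.

    Requires at least one event in each group; zero cells need a
    continuity correction that this sketch does not apply.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Hypothetical counts for illustration, NOT taken from any trial in this review.
rr, low, high = risk_ratio(events_a=10, n_a=50, events_b=5, n_b=50)
print(f"RR {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
print("interval includes 1:", low <= 1.0 <= high)
```

An interval that includes 1 is read in this report as no statistically significant difference in risk, even when the point estimate is well above 1.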

Manual Therapies for Osteoarthritis Knee Pain

Key Points

There was insufficient evidence from one trial to determine the effects of joint manipulation on intermediate-term function or harms versus usual care or versus exercise due to inadequate data to determine effect sizes or statistical significance (SOE: insufficient).
There was insufficient evidence from one trial to determine the effects of massage versus usual care on short-term function, pain, or harms, or to evaluate the effect of varying dosages of massage on outcomes (SOE: insufficient).

Detailed Synthesis

Table 28
Two trials were identified that met inclusion criteria and evaluated manual therapies for the treatment of knee OA, [47, 184] (Table 28 and Appendixes D and E). Both trials were included in the prior AHRQ report. Patients in both trials were required to have radiographically established knee OA meeting the American College of Rheumatology criteria.

One fair-quality trial (N=117 with knee OA) compared manual therapy with usual care (continued routine care from general practitioner and other providers) and with combination exercise. [47] The manual therapy intervention consisted of nine 50-minute sessions. Seven were delivered in the first 9 weeks and two booster sessions at week 16. All participants were prescribed a home exercise program three times per week. Compliance with the intervention was acceptable in all groups, and the methodological shortcoming of this trial was a lack of blinding for the patients and care providers. Only intermediate-term outcomes were reported.

One fair-quality trial (N=125) compared four different dosages of massage therapy with usual care (continued current treatment). [184] The massage protocol consisted of standard Swedish massage strokes applied in each intervention group over 8 weeks. The dosage varied from 240 to 720 minutes based on the frequency (once or twice per week) and duration of massage (30-60 minutes per session). Compliance was acceptable in all groups, and the methodological shortcoming of this trial was a lack of blinding for the patients and care providers in the usual care arm. Only short-term outcomes were reported.

Manual Therapies Compared With Usual Care

Manual Therapy. Data were insufficient from one fair-quality trial (n=58 with knee OA) [47] to evaluate effects of joint manipulation versus usual care over the intermediate term. The manual therapy group showed a statistically significant improvement from baseline in function as measured by the WOMAC (mean change –31.5 on a 0-240 scale, 95% CI –52.7 to –10.3), whereas the usual care group showed no improvement (mean change 1.6, 95% CI –10.5 to 13.7). However, insufficient data were provided to calculate an effect estimate (the number of patients with knee OA in each group was not provided). Pain outcomes were not reported.

Massage. Data were insufficient from one fair-quality trial (n=125) to evaluate the short-term effects of massage therapy (4 different dosages) compared with usual care. [184] Function was measured using the WOMAC total and physical function subscale scores (both 0 to 100 scales) and pain was measured using the WOMAC pain subscale and the VAS (both 0 to 10). No significant effects were seen in any outcome measure at 4 months postmassage treatment versus usual care (Table 27). Authors reported a trend for greater magnitude of change in function and pain with higher massage dosages versus lower massage dosages and versus usual care (statistical tests not provided).

Manual Therapies Compared With Pharmacological Therapy

No trial of manual therapy versus pharmacological therapy met inclusion criteria.

Manual Therapies Compared With Exercise Therapy

The trial evaluating manual therapy also included an exercise group that received aerobic warm-up, muscle strengthening, muscle stretching, and neuromuscular control exercises (n=59 with knee OA). [47] Both groups showed improvement from baseline in function (WOMAC) over the intermediate term, but the change was statistically significant in the manual therapy group only (mean change of –31.5, 95% CI –52.7 to –10.3, versus –12.7, 95% CI –27.1 to 1.7, for exercise). However, insufficient data were provided to calculate an effect estimate (the number of patients with knee OA in each group was not provided). Pain outcomes were not reported.

Harms

No serious treatment-related adverse events occurred in either trial [47, 184]; one nontrial-related death was reported in the usual care group in the trial evaluating manual therapy. [47]

Mind-Body Therapies for Osteoarthritis Knee Pain

Key Points

Data were insufficient from two small, unblinded trials to determine the effects or harms of tai chi versus attention control in the short or intermediate terms. No data on long-term outcomes were available (SOE: insufficient).

Detailed Synthesis

Table 29
Two small trials (n=31 and 40) of tai chi versus attention control in older adults met the inclusion criteria [215, 216] (Table 29 and Appendix D). Both trials were included in the prior AHRQ report. Tai chi was practiced 40 to 60 minutes two or three times per week for 24 or 36 sessions. Attention control consisted of group education classes with one trial [216] including 20 minutes of stretching for sessions 18 to 24. Blinding was not possible in either trial and was the primary methodological limitation in one fair-quality trial. [216] Additional methodological concerns in the other poor-quality trial included unclear concealment of treatment allocation and high attrition [215] (Appendix E).

Mind-Body Therapies Compared With Attention Control

There is no clear difference between tai chi and an attention control on functional outcomes across the two trials over the short term on a WOMAC physical function 0- to 85-point scale (difference 1.03, 95% CI –9.87 to 11.93) [215] or WOMAC physical function 0- to 1700-point scale (difference –183.2, 95% CI –372.6 to 6.2), [216] or at intermediate term in one of the trials (difference –105.3, 95% CI –294.7 to –84.1, 0 to 1700 scale). [216] Results for short-term pain improvement were inconsistent with no difference between groups on WOMAC pain scale in one trial (difference 0.39 on a 0-35 point scale, 95% CI –4.21 to 4.99) [215] and the other marginally favoring tai chi on 0 to 500 point WOMAC pain scale (difference –67.0, 95% CI –131.8 to –2.1), [216] but demonstrating no difference between the groups in 0 to 10 VAS pain (difference –0.65, 95% CI –2.31 to 1.02). [216] There were no differences between groups at intermediate term in this latter trial (WOMAC pain 0 to 500 scale, difference –183.2, 95% CI –372.6 to 6.2). [216] One trial noted improvement in health-related quality of life (SF-36) in the intermediate term only and depression (CES-D) and self-efficacy in the short and intermediate terms.
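Because the two tai chi trials report WOMAC on different score ranges (for example 0 to 85 versus 0 to 1700 for physical function), the raw differences above are hard to compare at a glance. The sketch below rescales each reported difference to a common 0-100 range by simple proportion. The function name is hypothetical, and linear rescaling is an illustrative assumption rather than a step the report describes; the differences and scale ranges are those quoted in the paragraph above.

```python
def to_percent_of_scale(difference, scale_min, scale_max):
    """Express a between-group difference as points on a 0-100 version of the scale."""
    return 100.0 * difference / (scale_max - scale_min)


# Short-term differences and scale ranges as reported in the text above.
reported = [
    ("WOMAC physical function, 0-85 [215]", 1.03, 0, 85),
    ("WOMAC physical function, 0-1700 [216]", -183.2, 0, 1700),
    ("WOMAC pain, 0-35 [215]", 0.39, 0, 35),
    ("WOMAC pain, 0-500 [216]", -67.0, 0, 500),
]

for label, diff, low, high in reported:
    rescaled = to_percent_of_scale(diff, low, high)
    print(f"{label}: {rescaled:+.1f} points on a 0-100 scale")
```

On this common footing the function differences correspond to roughly +1.2 and –10.8 points, and the pain differences to roughly +1.1 and –13.4 points, which makes the inconsistency between the two trials easier to see.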

Mind-Body Therapies Compared With Pharmacological Therapy or With Exercise Therapy

No trial of mind-body therapy versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

In the two trials of mind-body interventions, harms were poorly reported. One trial reported no serious adverse events [216] and the other reported sporadic complaints of muscle soreness and foot or knee pain. [215]

Acupuncture for Osteoarthritis Knee Pain

Key Points

There were no differences between acupuncture versus control interventions (sham acupuncture, waitlist, or usual care) on function in the short term (4 trials [excluding outlier trial], pooled SMD –0.05, 95% CI –0.32 to 0.38) or the intermediate term (4 trials, pooled SMD –0.15, 95% CI –0.31 to 0.02, I²=0%) (SOE: low for short term; moderate for intermediate term). Stratified analysis showed no differences between acupuncture and sham treatments (4 trials) but moderate improvement in function compared with usual care (2 trials) short term.
There were no differences between acupuncture versus control interventions (sham acupuncture, waitlist, or usual care) on pain in the short term (6 trials, pooled SMD –0.27, 95% CI –0.67 to 0.12, I²=79%) or clinically meaningful differences in the intermediate term (4 trials, pooled SMD –0.16, 95% CI –0.32 to –0.01, I²=0%) (SOE: low for short term; moderate for intermediate term). Short-term differences were significant for acupuncture versus usual care but not for acupuncture versus sham acupuncture.
Data from one poor-quality trial were insufficient to determine the effects of acupuncture versus exercise (SOE: insufficient).
There was no difference in the risk of serious adverse events between any form of acupuncture and the control group. Worsening of symptoms (7% to 14%) and mild bruising, swelling, or pain at the acupuncture site (1% to 18%) were most common; one case of infection at an electroacupuncture site was reported (SOE: moderate).

Detailed Synthesis

Table 30
Nine trials of acupuncture for knee OA were identified that met inclusion criteria [67, 238–245] (Table 30 and Appendix D). All of the trials were included in the prior AHRQ report.

Four trials evaluated traditional acupuncture, [67, 240, 242, 244] four electroacupuncture, [238, 239, 241, 243] and two laser acupuncture. [240, 245]

Three trials compared acupuncture with usual care (provision of educational leaflets, instructions to remain on current oral medications, or no changes to their ongoing treatments) [67, 238, 242] and one trial each to no treatment [240] or to waitlist control. [243]

Six trials compared acupuncture with sham procedures, which consisted of inactive laser treatment (red light on but no power applied), [240, 245] superficial needling, or acupuncture performed at nonmeridian sites, [239, 243, 244] or nonpenetrating sham acupuncture. [241]

No trials of acupuncture versus pharmacological therapy or exercise were identified. Sample sizes ranged from 30 to 527 (total sample 1,811). Duration of acupuncture treatment ranged from 2 to 12 weeks, with the number of sessions ranging from 6 to 16. Four studies were conducted in Europe, [67, 241, 242, 244] three in the United States, [238, 239, 243] and one study each was conducted in Australia [240] and Turkey. [245]

Short-term outcomes were reported by six trials [67, 238, 241, 243–245] and intermediate-term outcomes by four [239, 240, 242, 244]; no trial reported outcomes over the long term.

Two trials were rated good quality (for the comparison of acupuncture versus sham only). [240, 243] Seven trials were rated fair quality (to include the comparison of acupuncture with no treatment/waitlist in the two trials described previously) [238–241, 243–245] and two were considered poor quality [67, 242] (Appendix E). The primary methodological shortcoming in the fair-quality trials was lack of blinding; additionally, the poor-quality trials suffered from unclear allocation concealment methods and high rates of attrition (30% to 35%).

Acupuncture Compared With Usual Care, Waitlist, or Sham

Figure 40
Functional Outcomes. There was no difference between acupuncture versus control interventions (sham acupuncture, usual care, waitlist, no treatment) on WOMAC function score in the short term (5 trials, pooled SMD –0.17, 95% CI –0.71 to 0.38, I²=86%) [238, 241, 243–245] (Figure 40). All trials were considered fair quality. Removal of one outlier trial (Berman 1999) [238] attenuated the effect estimate size (4 trials, pooled SMD –0.05, 95% CI –0.32 to 0.38); results remained not statistically significant. No differences were found when the results were analyzed by the type of acupuncture used: electroacupuncture (3 trials, pooled SMD –0.34, 95% CI –1.17 to 0.46), [238, 241, 243] standard needle acupuncture (SMD –0.28, 95% CI –0.55 to 0.00), [244] or laser acupuncture (SMD 0.55, 95% CI –0.01 to 1.10) [245] compared with control interventions. When stratified by control type, no differences were found between any form of acupuncture and sham treatment (4 trials, pooled SMD –0.02, 95% CI –0.28 to 0.39); [241, 243–245] however, when acupuncture was compared with waitlist and usual care, estimates suggested moderate improvement in function (2 trials, pooled SMD –0.74, 95% CI –1.40 to –0.24, plot not shown). [238, 243] In one small, fair-quality trial [245] of low-level laser acupuncture the authors reported a difference in WOMAC function score that favored the sham control (Table 29).
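Labels such as "moderate improvement" attach a magnitude category to a standardized mean difference (SMD). The sketch below shows how such labels could be assigned. The cut points used (small 0.2 to 0.5, moderate above 0.5 to 0.8, large above 0.8) are an assumption made for this illustration; the report's Methods section defines the actual thresholds and is not reproduced here. The function name is hypothetical, and the SMDs passed to it are the short-term function estimates quoted above.

```python
def classify_smd(smd):
    """Label the absolute size of an SMD using ASSUMED cut points.

    Assumed thresholds (not confirmed by this section of the report):
    below 0.2 = below small, 0.2-0.5 = small, >0.5-0.8 = moderate, >0.8 = large.
    """
    size = abs(smd)
    if size < 0.2:
        return "below small"
    if size <= 0.5:
        return "small"
    if size <= 0.8:
        return "moderate"
    return "large"


# Short-term WOMAC function SMDs quoted in the paragraph above.
for label, smd in [
    ("all controls (5 trials)", -0.17),
    ("sham only (4 trials)", -0.02),
    ("waitlist or usual care (2 trials)", -0.74),
]:
    print(f"{label}: SMD {smd:+.2f} -> {classify_smd(smd)}")
```

A magnitude label says nothing about statistical significance, which depends on whether the confidence interval excludes zero; the two judgments are made separately in the text.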

Similarly, based on WOMAC total score, there were no differences in short-term function between acupuncture and sham, waitlist, and usual care across trials (4 trials, pooled SMD –0.30, 95% CI –0.81 to 0.21, I²=85%, plot not shown). [67, 238, 244, 245] Removal of one outlier trial (Berman 1999) [238] attenuated the effect estimate size (3 trials, pooled SMD –0.10, 95% CI –0.54 to 0.49); results remained not statistically significant. Stratification by acupuncture type, control type, and exclusion of one poor-quality trial yielded similar estimates. Results according to other measures of function were mixed. In two small, fair-quality trials authors reported significant results (Table 29), one favoring electroacupuncture compared with usual care based on the Lequesne Index (0 to 24 scale), [238] and the second favoring the sham control comparing low-level laser acupuncture based on the WOMAC total score. [245] Five additional trials reported no differences between acupuncture and any of the control conditions across other measures of function [67, 240–242, 244] (Table 29).

In the intermediate term, there was no difference between acupuncture versus control conditions (sham acupuncture, usual care, waitlist) on the WOMAC function score (4 trials, pooled SMD –0.15, 95% CI –0.31 to 0.02, I²=0%), [239, 240, 242, 244] (Figure 40). Estimates were similar when stratified by study quality, acupuncture type, and control type; however, sensitivity analyses were limited by the small number of trials. Similarly, no differences in WOMAC total score were found for standard needle acupuncture versus usual care or sham at intermediate-term followup (2 trials, pooled SMD –0.23, 95% CI –0.49 to 0.03, I²=0%, plot not shown). [242, 244] Across other measures of function, no differences were seen at intermediate term between standard needle acupuncture versus sham acupuncture on the Pain Disability Index (difference –3.5 on a 0-70 scale, 95% CI –7.7 to 0.5) in one fair-quality trial [244] or versus usual care on the Oxford Knee Score (difference 3.6 on a 12 to 60 scale, 95% CI –9.8 to 2.6) in one small poor-quality trial. [242]

No trials reported data on long-term function.

Figure 41
Pain Outcomes. There was no difference between acupuncture versus control interventions (sham acupuncture, usual care, waitlist) on pain in the short term (6 trials, pooled SMD –0.27, 95% CI –0.67 to 0.12, I²=79%) [67, 238, 241, 243–245] (Figure 41). All but one trial used the WOMAC pain score. Removal of one outlier trial (Berman 1999) [238] attenuated the effect estimate size (5 trials, pooled SMD –0.15, 95% CI –0.29 to 0.00); results remained not statistically significant. Estimates were similar after exclusion of one poor-quality trial, after stratification by acupuncture type, and when VAS or NRS was analyzed instead of the WOMAC pain score in trials reporting more than one pain measure. When stratified by control type, no differences were seen between acupuncture and sham acupuncture (4 trials, pooled SMD –0.06, 95% CI –0.24 to 0.14); [241, 243–245] however, when acupuncture was compared with waitlist or usual care, the estimate suggested moderate effects on pain (2 trials, pooled SMD –0.68, 95% CI –1.28 to –0.15). [238, 243]

There were no clinically meaningful differences between acupuncture and control interventions for pain in the intermediate term (4 trials, pooled SMD –0.16, 95% CI –0.32 to –0.01, I²=0%) [239, 240, 242, 244]; individually no trial reached statistical significance (Figure 41). Stratification based on acupuncture type, type of control intervention, and study quality yielded similar results.

No trial reported data on long-term pain.

Other Outcomes. Data on the effects of acupuncture on quality of life were limited (plots not shown). A small effect favoring acupuncture versus control conditions (sham acupuncture, usual care, waitlist, no treatment) was seen for the SF-12/SF-36 PCS (0-100 scale) in both the short term (2 trials, pooled difference 1.6, 95% CI 0.08 to 3.11, I²=0%) [243, 244] and the intermediate term (2 trials, pooled difference 1.94, 95% CI 0.03 to 3.86, I²=0%), [240, 244] but no difference was seen in the SF-12/SF-36 MCS (0-100 scale) at either timepoint: short term (2 trials, pooled difference 1.14, 95% CI –0.27 to 2.56, I²=0%) [243, 244] and intermediate term (2 trials, pooled difference –0.25, 95% CI –4.05 to 3.54, I²=70.8%). [240, 244] For individual trials, the effects were small and not statistically significant for either outcome (SF-12 or SF-36 PCS or MCS). There were no differences between acupuncture and control interventions on other quality of life measures or on measures of anxiety or depression over either the short or intermediate term (Table 29).

In one trial, [240] a small (1%) change in opioid use at intermediate term was seen with needle acupuncture (decrease from 1% to 0%), laser acupuncture (decrease from 3% to 2%), and sham acupuncture (decrease from 1% to 0%) while use remained the same in the no treatment group (Table 29).

Acupuncture Compared With Pharmacological Therapy

No trial of acupuncture versus pharmacological therapy met inclusion criteria.

Acupuncture Compared With Exercise Therapy

Data were insufficient from one poor-quality trial (n=120) [67] to evaluate the effects of weekly acupuncture versus 60 minutes of combination exercise (strengthening, aerobics, stretching, and balance training) for 6 weeks for knee OA (Table 29 and Appendix D). Methodological limitations included lack of patient or care provider blinding, unclear adherence, unacceptable attrition, and differential loss to followup (Appendix E). There were no differences between groups with regard to function on the Oxford Knee Score questionnaire (difference –0.7, 95% CI –3.5 to 2.1 on 12-60 scale) or WOMAC score (difference –1.0, 95% CI –6.7 to 4.7; scale not provided by author). Similarly there was no difference between treatments for VAS pain on a 0 to 10 scale (difference 0.22, 95% CI –0.67 to 1.11) or for anxiety or depression based on the Hospital Anxiety and Depression Scale.

Harms

All trials reported adverse events. One trial reported similar rates of serious adverse events in patients who received real versus sham acupuncture (2.1% vs. 2.7%, respectively; RR 0.75, 95% CI 0.13 to 4.39), to include hospitalizations and one case of death from myocardial infarction in the control group; none were considered to be related to the study condition or treatment. [244] All other events reported were classified as mild and there was no apparent difference in risk of adverse events between any form of acupuncture and the control groups. The most common adverse events reported were worsening of symptoms (7% to 14%) in three trials [240, 242, 243] and mild bruising, swelling, or pain at the acupuncture site (1% to 18%) in five trials. [67, 240, 242–244] One trial reported one case of an infection at the electroacupuncture site (n=455 for real and sham acupuncture groups). [243] In only one trial did an adverse event (not treatment related) lead to withdrawal: one patient (3%) in the acupuncture group had a flare-up of synovitis (nonseptic). [241]

Exercise for Osteoarthritis Hip Pain

Key Points

Exercise was associated with a small improvement in function versus usual care in the short term (3 trials, pooled SMD –0.33, 95% CI –0.58 to –0.11, I²=0%), intermediate term (2 trials, pooled SMD –0.28, 95% CI –0.55 to 0.02, I²=0%), and long term (1 trial, SMD –0.37, 95% CI –0.74 to –0.01) (SOE: low for short and intermediate term, insufficient for long term).
Exercise tended toward small improvement in short-term pain compared with usual care (3 trials, pooled SMD –0.30, 95% CI –0.70 to –0.02, I²=0%) but the results were no longer significant at intermediate term (2 trials, pooled SMD –0.14, 95% CI –0.40 to 0.12, I²=0%) or long term (1 trial, SMD –0.25, 95% CI –0.62 to 0.11) (SOE: low for short and intermediate term, insufficient for long term).
Evidence for harms was insufficient in trials of exercise with only two trials describing adverse events. However, no serious harms were reported in either trial (SOE: insufficient).
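The Introduction states that a pooled estimate is statistically significant when its confidence interval excludes the null value (0 for mean differences, 1 for risk ratios). The sketch below applies that rule mechanically to the hip exercise estimates listed in the Key Points above. The function and variable names are hypothetical, and the check only restates the interval rule; it does not replace the report's strength-of-evidence judgments.

```python
def excludes_null(ci_low, ci_high, null_value=0.0):
    """True when the confidence interval does not contain the null value."""
    return not (ci_low <= null_value <= ci_high)


# (outcome, term, SMD, CI low, CI high) as quoted in the Key Points above.
hip_exercise_vs_usual_care = [
    ("function", "short", -0.33, -0.58, -0.11),
    ("function", "intermediate", -0.28, -0.55, 0.02),
    ("function", "long", -0.37, -0.74, -0.01),
    ("pain", "short", -0.30, -0.70, -0.02),
    ("pain", "intermediate", -0.14, -0.40, 0.12),
    ("pain", "long", -0.25, -0.62, 0.11),
]

for outcome, term, smd, low, high in hip_exercise_vs_usual_care:
    verdict = "excludes 0" if excludes_null(low, high) else "includes 0"
    print(f"{outcome:8s} {term:12s} SMD {smd:+.2f} ({low:+.2f} to {high:+.2f}): {verdict}")
```

For risk ratios the same function can be called with null_value=1.0.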

Detailed Synthesis

Table 31
Four trials of exercise therapy for hip OA met the inclusion criteria (Table 31 and Appendix D). [47, 72–74] All of the trials were included in the prior AHRQ report. Three trials evaluated participants with chronic hip pain diagnosed as OA using American College of Rheumatology criteria [47, 72, 74] and one assessed participants with hip OA diagnosed clinically who were on a waitlist for hip replacement. [73] Sample sizes ranged from 45 to 203 (total sample=455). Across trials, participants were predominantly female (>50%) with mean ages ranging from 64 to 69 years. Three trials were conducted in Europe [72–74] and the other in New Zealand. [47]

All trials compared exercise with usual care, defined as care routinely provided by the patient’s primary care physician, which could include physical therapy referral. Two trials also provided education about hip OA to all participants. [72, 74] The exercise interventions included 8 to 12 supervised sessions of 30 to 60 minutes duration once per week over 8 to 12 weeks; the interventions comprised strengthening and stretching exercises (all studies), as well as neuromuscular control exercises in one trial [47] and endurance exercise in another. [74] All trials reported compliance rates with the scheduled exercise sessions between 76 and 88 percent. However, in one trial, [47] although 88 percent of patients completed more than 80 percent of the scheduled sessions, only 44 percent of participants returned logbooks to demonstrate compliance with the recommended home exercises.

Three trials were rated fair quality [47, 72, 74] and one was rated poor quality [73] (Appendix E). In all trials, the nature of the intervention and control precluded blinding of participants and researchers; patient-reported outcomes were therefore not blinded. Additionally, in the poor-quality trial, [73] concealed allocation was unclear and outcomes were poorly reported, as were attrition rates, which were substantial for pain (68%) and function (73%) outcomes.

Exercise Compared With Usual Care

Figure 42
Figure 43
Exercise was associated with a small improvement in function versus usual care in the short term (3 trials, pooled SMD –0.33, 95% CI –0.58 to –0.11, I²=0.0%), [72–74] intermediate term (2 trials, pooled SMD –0.28, 95% CI –0.55 to –0.02, I²=0.0%) [72, 74] and long term (1 trial, SMD –0.37, 95% CI –0.74 to –0.01) [72] (Figure 42). The intermediate-term findings were consistent with the additional trial not included in the meta-analysis (authors did not provide sufficient data), [47] although the small improvement in function in this trial did not reach statistical significance in those with hip OA. The small number of trials precluded meaningful sensitivity analysis.

Exercise tended toward small improvement in short-term pain compared with usual care (3 trials, pooled SMD –0.30, 95% CI –0.70 to 0.02, I²=0%) [72–74] (Figure 43), but not at intermediate term (2 trials, pooled SMD –0.14, 95% CI –0.40 to 0.12, I²=0%). [72, 74] There was moderate heterogeneity between studies and the short-term improvement in pain was observed in only one poor-quality study, [73] whereas the two fair-quality studies did not demonstrate any significant differences in short-term pain relief. [72, 74] There were no identifiable differences in methodology between the studies to explain these inconsistent findings, although the poor-quality study only reported pain outcomes for 68 percent of participants, which may have biased results. There was no difference between exercise and usual care in the long term based on a single study (SMD –0.25, 95% CI –0.62 to 0.11). [72] The small number of trials precluded meaningful sensitivity analysis.
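The rule described in the Introduction, that an estimate is statistically significant only when its confidence interval excludes the null value, can be illustrated with a minimal Python sketch. The function name `excludes_null` is hypothetical and used here only for illustration; the intervals are SMDs quoted in the paragraphs above, so the null value is 0.

```python
def excludes_null(lower: float, upper: float, null_value: float = 0.0) -> bool:
    """Return True when the confidence interval does not contain the null value.

    Use null_value=0.0 for mean differences and SMDs, 1.0 for risk or odds ratios.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return not (lower <= null_value <= upper)


# 95% CIs for SMDs quoted in the text above (hip OA, exercise vs. usual care).
reported = {
    "function, short term": (-0.58, -0.11),
    "function, long term": (-0.74, -0.01),
    "pain, intermediate term": (-0.40, 0.12),
    "pain, long term": (-0.62, 0.11),
}

for label, (lower, upper) in reported.items():
    verdict = "significant" if excludes_null(lower, upper) else "not significant"
    print(f"{label}: {verdict}")
```

The sketch only separates significant from nonsignificant estimates; judging whether a nonsignificant estimate reflects "no clear difference" or "no difference" still depends on the reviewers' assessment of how close the estimate is to the null.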

Data on effects of exercise on quality of life were limited and were reported in only two trials. [73, 74] One fair-quality trial [74] found no differences in health-related quality of life between groups in the short term and intermediate term and one poor-quality study [73] found no differences between groups in the short term. One fair-quality study found no differences between groups in terms of opioid use at any time point (proportion of patients using tramadol or codeine daily: 7.0% vs. 3.5% at 3 months, 8.6% vs. 5.2% at 9 months, and 7.0% vs. 7.4% at 21 months, p=0.73), but did report slightly fewer followup physical therapy visits in the exercise group in the intermediate and long terms [72] (Table 31).

There was insufficient evidence to determine effects of duration of exercise therapy or number of sessions on outcomes.

Exercise Compared With Pharmacological Therapy or With Other Nonpharmacological Therapies

No trial of exercise versus pharmacological therapy met inclusion criteria. Findings for exercise versus other nonpharmacological therapies are addressed in the sections for other nonpharmacological therapies.

Harms

Only two exercise trials reported on harms, and neither reported adverse events in either the exercise or usual care groups. [47, 73]

Manual Therapies for Osteoarthritis Hip Pain

Key Points

Manual therapy was associated with small improvements in short-term (difference 11.1, 95% CI 4.0 to 18.6, 0-100 scale Harris Hip Score) and intermediate-term (difference 9.7, 95% CI 1.5 to 17.9) function versus exercise (SOE: low).
Manual therapy was associated with a small effect on pain in the short term (difference –0.72 [95% CI –1.38 to –0.05] for pain at rest and –1.21 [95% CI –2.29 to –0.25] for pain walking) versus exercise (SOE: low). The impact on pain is not clear at intermediate term; there was no difference in pain at rest (adjusted difference –7.0, 95% CI –20.3 to 5.9, 0-100 scale) but there was small improvement in pain while walking (adjusted difference –12.7, 95% CI –24.0 to –1.9) (SOE: insufficient).
No trials evaluated manual therapies versus pharmacological therapy.
One trial reported that no treatment-related serious adverse events were detected and in the other, no difference in study withdrawal due to symptom aggravation was seen between manual therapy and exercise (RR 1.42, 95% CI 0.25 to 8.16) (SOE: low).
There were insufficient data to determine the effects or harms of manual therapy compared with usual care at intermediate term. No effect size could be calculated (SOE: insufficient).

Detailed Synthesis

Table 32
We identified two trials (n=69 and 109) of manual therapy for hip OA that met inclusion criteria (Table 32 and Appendix D). [47, 193] Both trials were included in the prior AHRQ report. Mean patient age ranged from 66 to 72 years and females comprised 49 to 72 percent of the populations. Both trials required a diagnosis of hip OA meeting the American College of Rheumatology (ACR) criteria for inclusion. The duration of manual therapy ranged from 5 to 16 weeks with a total of nine sessions in both groups; in one trial this included seven sessions over the first 9 weeks and two booster sessions at week 16. [47] One trial compared manual therapy to usual care (continued routine care from a general practitioner and other providers) [47] and both trials compared manual therapy to combination exercise programs. [47, 193] The number of exercise sessions matched the manual therapy group of that respective study. All participants were prescribed a home exercise program three times per week. One trial reported short-term outcomes [193] and both reported intermediate-term outcomes. One trial was conducted in New Zealand [47] and the other in the Netherlands. [193]

Both trials were rated fair quality (Appendix E). Compliance with the intervention was acceptable in all groups, and the methodological shortcomings of these trials included a lack of blinding for the patients and care providers.

Manual Therapies Compared With Usual Care

A single fair-quality trial (n=69 with hip OA) [47] found that manual therapy resulted in an improvement in function at intermediate term using the total WOMAC score (0 to 240) in the manual therapy group (mean change from baseline –22.9, 95% CI –43.3 to –2.6), while the usual care group showed little change from baseline (mean change –7.9, 95% CI –30.9 to 15.3). Lack of information on the number of patients precluded calculation of effect size, and results of statistical testing between groups were not presented.

Manual Therapies Compared With Pharmacological Therapy

No trial of manual therapy versus pharmacological therapy met inclusion criteria.

Manual Therapies Compared With Exercise

One trial found that manual therapy resulted in a small improvement in short-term function compared with exercise (adjusted difference on the 0-100 scale Harris Hip Score [HHS] of 11.1, 95% CI 4.0 to 18.6). Regarding intermediate-term function, manual therapy conferred a small benefit in both trials. The adjusted difference on the HHS was 9.7 (95% CI 1.5 to 17.9) in one trial. [193] The other trial compared function using the total WOMAC score (0 to 240), and the manual therapy group experienced a statistically significant improvement from baseline (mean change of –22.9, 95% CI –43.3 to –2.6), while the exercise group did not (mean change –12.4, 95% CI –27.1 to 2.3). [47]

Only one of the trials reported pain outcomes. Manual therapy was associated with a small improvement in short-term pain at rest and during walking compared with exercise (adjusted differences on a VAS (0 to 10) of –0.72, 95% CI –1.38 to –0.05, and –1.21, 95% CI –2.29 to –0.25, respectively). [193] Intermediate-term pain results were inconsistent. A moderate effect on VAS pain during walking was seen following manual therapy compared to exercise (adjusted difference –1.27, 95% CI –2.40 to –0.19), but there was no difference for pain at rest (adjusted difference –0.70, 95% CI –2.03 to 0.59). [193]

There was no difference in one trial [193] between manual therapy and exercise for short-term or intermediate-term quality of life measured with the SF-36 physical function, role physical, or bodily pain subscales (Table 32).

Harms

No trial-related serious adverse events were detected in one trial, [47] and there was no difference in symptom aggravation leading to withdrawal (5% vs. 4%; RR 1.42, 95% CI 0.25 to 8.16) in the other trial. [193]
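A risk ratio and its 95% confidence interval of the kind reported above can be computed from event counts with a Wald interval on the log scale. The following Python sketch is illustrative only: the function name `risk_ratio_ci` is hypothetical, and the counts (3 of 56 vs. 2 of 53) are an assumption chosen to be consistent with the reported 5% vs. 4%; they are not taken from the trial report.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio (group A vs. group B) with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts (assumption, not from the source): withdrawals due to
# symptom aggravation in 3 of 56 vs. 2 of 53 participants.
rr, lower, upper = risk_ratio_ci(3, 56, 2, 53)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these assumed counts the sketch yields an interval close to the reported one, which illustrates how wide the interval becomes when events are few; the actual trial counts may differ.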

Exercise for Osteoarthritis Hand Pain

Key Points

Data from one poor-quality trial were insufficient to determine the effects or harms (though no serious harms were reported) of exercise versus usual care in the short term (SOE: insufficient).

Detailed Synthesis

Table 33
One Norwegian trial (n=130) that evaluated the effects of strengthening and range of motion exercise (3 times weekly for 3 months plus 4 group sessions) versus usual care (treatment recommended by the patient’s general practitioner) met inclusion criteria (Table 33 and Appendix D). [75] This trial was included in the prior AHRQ report and was rated poor quality due to lack of patient blinding, baseline differences in mental health conditions, and large differential attrition between groups (exercise 29% vs. usual care 7%) (Appendix E). Only short-term data were reported.

Exercise Compared With Usual Care

Data were insufficient from one poor-quality trial. No differences between exercise and usual care were observed for function according to the Functional Index for Hand OsteoArthritis (adjusted difference –0.5 on a 0-30 scale, 95% CI –1.9 to 0.8), or for pain (adjusted difference –0.2 on a 0 to 10 VAS pain scale, 95% CI –0.8 to 0.3) at 3 months. [75] Similarly, there were no differences between groups in the proportion of Osteoarthritis Research Society International Outcome Measures in Rheumatology (OARSI OMERACT) responders (30% versus 28%). There were also no differences between groups in any secondary outcome measure, including the patient-specific function scale, hand stiffness, or patient global assessment of disease activity.

The effects of exercise on use of opioid therapies or healthcare utilization were not reported. There was insufficient evidence to determine effects of duration of exercise therapy or number of sessions on outcomes.

Exercise Compared With Pharmacological Therapy or Other Nonpharmacological Therapies

No trial of exercise versus pharmacological therapy met inclusion criteria. Findings for exercise versus other nonpharmacological therapies are addressed in the sections for other nonpharmacological therapies.

Harms

In this trial, [75] no serious adverse events were reported; 8/130 (6%) patients reported increased pain (3 in hand, 5 in neck/shoulders) but adverse events were not reported by group.

Physical Modalities for Osteoarthritis Hand Pain

Key Points

One good-quality study of low-level laser treatment versus sham found no differences in function (difference 0.2, 95% CI –0.2 to 0.6) or pain (difference 0.1, 95% CI –0.3 to 0.5) in the short term (SOE: low).
Data were insufficient from one fair-quality trial to determine effects or harms of heat therapy using paraffin compared to no treatment on function or pain in the short term (SOE: insufficient).
No serious harms were reported in the trial of low-level laser therapy (SOE: low).

Detailed Synthesis

Table 34
We identified two trials of physical modality use for hand OA (Table 34 and Appendixes D and E). [165, 166] Both were included in the prior AHRQ report. One good-quality double-blind Canadian trial (N=88) [165] compared three, 20-minute sessions of low-level laser treatment to a sham laser probe over a 6-week period. Identical treatment procedures were used in each group. All participants attended three sham laser treatment sessions prior to randomization to ensure ability to comply with the treatment protocol.

One fair-quality trial (n=56) conducted in Turkey compared 15 minutes of paraffin wrapping 5 days per week for 3 weeks with a no treatment control group. [166] Both groups received information about joint protection strategies. Methodological limitations included lack of patient blinding, unclear compliance with treatment, and poorly reported analyses.

Physical Modalities Compared With Sham or No Treatment

Low-Level Laser Therapy. In the one good-quality trial of low-level laser treatment versus sham (n=88), [165] there were no differences in short-term function (difference 0.2 on a 0-4 Australian Canadian Osteoarthritis Hand Index [AUSCAN] functional subscale, 95% CI –0.2 to 0.6) or pain (difference 0.1 on a 0-4 AUSCAN pain subscale, 95% CI –0.3 to 0.5) at 4.5 months. Likewise, no difference was seen between groups in improvement based on patient global assessment.

Paraffin Treatment. One fair-quality trial (N=56) [166] of paraffin heat treatment demonstrated no difference compared with no treatment on the AUSCAN function scale (0-36) (difference –4.0, 95% CI –8.6 to 0.6 at short-term [2.25-month] followup). Regarding pain, no clear difference was identified between the groups over the short term as there was inconsistency across measures used and analyses for outcomes were poorly reported; findings were considered insufficient. [166] While heat treatment was slightly favored based on the AUSCAN pain subscale (difference –3 on a 0-20 scale, 95% CI –5.5 to –0.5), it was not statistically significant in the authors’ intention-to-treat (ITT) analysis (p=0.07). VAS pain at rest suggested more improvement with heat therapy versus control in the ITT analysis (median 0 vs. 5.0 on a 0-10 scale, p<0.001); however, there was no clear difference between groups on VAS pain during activities of daily living (ADL) (median 5.0 vs. 7.0, p=0.09 for per protocol analysis, p=0.05 for ITT). No trial evaluated effects of physical modalities on use of opioid therapies or healthcare utilization.

Physical Modalities Compared With Pharmacological Therapy or With Exercise Therapy

No trial of a physical modality versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

Only the low-level laser therapy trial reported adverse events; no serious harms were reported. [165] One patient (2%) who received low-level laser treatment experienced erythema at the site.

Multidisciplinary Rehabilitation for Osteoarthritis Hand Pain

Key Points

One fair-quality trial of multidisciplinary rehabilitation versus waitlist control found no differences between groups over the short term in function (adjusted difference 0.49, 95% CI –0.09 to 0.37 on 0-36 scale) or pain (adjusted difference 0.40, 95% CI –0.5 to 1.3 on a 0-20 scale), or with regard to the proportion of OARSI OMERACT responders (OR 0.82, 95% CI 0.42 to 1.61) (SOE: low for all outcomes).
Data on harms were insufficient, although no serious adverse events were reported in the one trial of multidisciplinary rehabilitation versus waitlist control (SOE: insufficient).

Detailed Synthesis

Table 35
One fair-quality trial (n=147) compared four, 2.5- to 3-hour group-based sessions, delivered by an occupational therapist and a specialized nurse, consisting of self-management techniques, ergonomic principles, daily home exercises, and splint (optional) versus a waitlist control [261] (Table 35 and Appendix D). Waitlist control consisted of one 30-minute explanation of OA followed by a 3-month waiting period. Effect estimates were adjusted for baseline function or pain, body mass index (BMI), gender, and presence of erosive arthritis. Methodological limitations included lack of patient blinding and unreported compliance with treatment (Appendix E). This trial was included in the prior AHRQ report.

Of note, this intervention appeared to focus on functional restoration and while it met our broad definition of multidisciplinary rehabilitation (see footnote in Table 1), it was not consistent with how multidisciplinary rehabilitation is generally delivered clinically.

Multidisciplinary Rehabilitation Compared With Waitlist

No short-term (3 months) differences in function on the AUSCAN functional subscale (adjusted difference 0.49, 95% CI –0.09 to 0.37 on 0-36 scale) or on the AUSCAN pain subscale (adjusted difference 0.40, 95% CI –0.5 to 1.3, scale 0-20) were reported. [261]

There was no difference in the proportion of OARSI OMERACT responders (odds ratio [OR] 0.82, 95% CI 0.42 to 1.61) between groups or on any secondary outcome measure, including ADLs (Canadian Occupational Measurement Scales), health-related quality of life (SF-36), arthritis self-efficacy, pain coping, muscle strength, or joint mobility. [261] The effect of multidisciplinary rehabilitation on use of opioid therapies or healthcare utilization was not evaluated in any of the included studies.
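Confidence intervals for ratio measures such as the odds ratio above are symmetric on the log scale, which allows the standard error to be recovered from the reported bounds and the point estimate to be checked against them. The Python sketch below assumes a Wald-type 95% interval (z=1.96); the function names are hypothetical and the calculation is an illustration, not part of the review's analysis.

```python
import math


def se_from_ratio_ci(lower, upper, z=1.96):
    """Standard error of log(ratio) implied by a Wald-type confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def midpoint_on_log_scale(lower, upper):
    """Point estimate implied by an interval that is symmetric on the log scale."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


# Bounds reported above for OARSI OMERACT responders (0.42 to 1.61).
se = se_from_ratio_ci(0.42, 1.61)
center = midpoint_on_log_scale(0.42, 1.61)
print(f"SE(log OR) = {se:.3f}, implied OR = {center:.2f}")
```

The implied point estimate agrees with the reported odds ratio of 0.82 after rounding, so the interval and estimate are mutually consistent under this assumption.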

Multidisciplinary Rehabilitation Compared With Pharmacological Therapy or With Exercise Therapy

No trial of a multidisciplinary rehabilitation program versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

No serious adverse events were reported. One patient reported a swollen hand and increased pain after the second treatment session. [261]

Key Question 4. Fibromyalgia

For fibromyalgia, 47 RCTs (in 54 publications) were included in the prior AHRQ report (N=4,225). Three trials were rated good quality, twenty trials fair quality, and twenty-four trials poor quality. The prior AHRQ report found exercise, CBT, myofascial release, massage, tai chi, qigong, acupuncture, and multidisciplinary rehabilitation (MDR) associated with small to moderate improvements in function and pain over the short and intermediate term compared with an attention control, sham, no treatment or usual care. Strength of evidence was low to moderate. In the long term, small improvement in function continued for MDR and in pain for massage (low strength of evidence). CBT compared with pregabalin was associated with a small improvement in function but not pain in the short term.

For this update, we identified 11 new RCTs (in 12 publications) (N=1,194). Ten were rated fair quality and one was rated poor quality. The new trials evaluated exercise (1 trial), psychological therapies (CBT and electromyography [EMG] biofeedback) (6 trials), mindfulness practices (1 trial), mind-body practices (Tai chi) (1 trial) and acupuncture (2 trials). The Key Points summarize the main findings based on the evidence included in the prior report and new trials; the Key Points note where new trials contributed to findings.

Exercise for Fibromyalgia

Key Points

Exercise was associated with a small improvement in function compared with attention control, no treatment, or usual care in the short term (7 trials, pooled difference –7.68 on a 0 to 100 scale, 95% CI –13.04 to –1.84, I²=60%) (SOE: low) and intermediate term (8 trials, pooled difference –6.04, 95% CI –9.25 to –3.01, I²=0%) (SOE: moderate). There were no clear effects in the long term (3 trials, pooled difference –4.33, 95% CI –10.46 to 1.97, I²=0%) (SOE: low).
Exercise was associated with a small improvement in VAS pain (0 to 10 scale) compared with usual care, attention control, or no treatment in the short term (6 trials [excluding outlier trial], pooled difference –0.88, 95% CI –1.33 to –0.27, I²=1.5%), and at intermediate term (8 trials [1 new], pooled difference –0.51, 95% CI –0.92 to –0.06, I²=0%) but no effect long term (4 trials, pooled difference –0.18, 95% CI –0.77 to 0.42, I²=0%) (SOE: moderate for all time frames).
There was insufficient evidence from one small, poor-quality trial to determine the effects of aerobic exercise versus pharmacological therapy (paroxetine) on pain in the intermediate term (SOE: insufficient). There were no data on short- or long-term effects.
Data on harms were insufficient. Most trials of exercise did not report on adverse events at all. One trial reported one nonstudy-related adverse event. Two trials reported no adverse events (SOE: insufficient).

Detailed Synthesis

Table 36
Twenty-two trials (reported in 24 publications) of exercise therapy for fibromyalgia met inclusion criteria [76–99] (Table 36 and Appendix D). This included one new trial not included in the prior AHRQ report. [99] The exercise interventions varied across the trials and included combinations of different exercise types (12 trials), [77, 78, 80, 83, 85, 89, 91, 92, 94–97, 99] aerobic exercise (10 trials), [79, 81, 82, 84, 86–88, 90, 92, 93, 98] muscle performance exercise/strength training (1 trial), [86] and Pilates (1 trial). [76]

The duration of exercise therapy ranged from 1 to 8 months across the trials and the total number of exercise sessions ranged from 4 to 96 (at a frequency of 1 to 5 times per week). Many trials also included instruction for home exercise practice. Exercise was compared to usual care in nine trials, [79, 80, 90–92, 96–99] no treatment in six trials, [83–86, 89, 94, 95] attention control in five trials, [76, 78, 81, 82, 87, 88] and to waitlist, [77] sham (i.e., transcutaneous electrical stimulation), [93] and pharmacological care [93] in one trial each (the latter two control conditions were separate arms of the same trial). Usual care generally included medical treatment for fibromyalgia and continued normal daily activities (which often specifically excluded the exercise intervention being evaluated). Attention control conditions consisted of fibromyalgia education sessions, social support, general guidance on coping strategies, relaxation and stretching exercises, and physical activity planning.

Sample sizes ranged from 32 to 166 across the trials (total sample=1,428). Patient mean age ranged from 35 to 57 years, and the majority were female (89% to 100%). Thirteen trials were conducted in Europe, [79, 83, 85, 88–92, 94–99] five in North America, [78, 80–82, 84, 87] two in Brazil, [77, 86] and two in Turkey. [76, 93]

Twelve trials were rated fair quality [76, 77, 79–82, 86, 88, 89, 92, 96, 98, 99] and 10 poor quality [78, 83–85, 87, 90, 91, 93–95, 97] (Appendix E). Methodological limitations in the fair-quality trials were primarily related to unclear allocation concealment methods and lack of blinding (the nature of interventions precluded blinding of participants and researchers). Additionally, poor-quality trials also suffered from unclear randomization methods and high rates of attrition and/or differential attrition.

Exercise Compared With Usual Care, Waitlist, an Attention Control, or No Treatment

Figure 44
Functional Outcomes. Exercise was associated with a small improvement in function short term compared with usual care, an attention control, or no treatment based on Fibromyalgia Impact Questionnaire (FIQ) total scores, which reflect fibromyalgia impact on function as well as symptoms such as pain, fatigue, stiffness, anxiety, and depression, (7 trials, pooled difference –7.68 on a 0 to 100 scale, 95% CI –13.04 to –1.84, I²= 59.9%) [76, 77, 80, 83, 86, 87, 89] (Figure 44). The estimate across fair-quality trials (i.e., not including the poor-quality trials) was somewhat higher (5 trials, pooled difference –9.91, 95% CI –15.75 to –4.07). [76, 77, 80, 86, 89]

Exercise was associated with a small improvement in intermediate-term function versus controls for FIQ total score (8 trials, pooled difference on 0-100 scale, –6.04, 95% CI –9.25 to –3.01, I²= 0%) [80, 82–84, 88, 91, 92, 94] (Figure 44). Estimates were slightly smaller across the fair-quality trials only (4 trials, pooled difference –4.04, 95% CI –7.90 to –0.03). [80, 82, 88, 92] Stratification by exercise type yielded similar results for combination exercise (7 trials, pooled difference –5.75, 95% CI –9.29 to –2.54), [80, 82, 83, 88, 91, 92, 94] but there was no clear difference between aerobic exercise and no treatment or usual care (2 trials, pooled difference –8.13, 95% CI –16.24 to 0.28). [84, 92] Estimates were consistent with a slightly greater effect of exercise on function when compared with usual care (3 trials, pooled difference –6.13, 95% CI –11.71 to –1.06) [80, 91, 92] or no treatment (3 poor quality trials, pooled difference –9.97, 95% CI –16.24 to –3.45), [83, 84, 94] but there was no clear difference in two fair-quality trials using attention controls (pooled difference –3.25, 95% CI –9.32 to 5.20). [82, 88]

Exercise no longer had an effect on long-term function compared with controls based on the FIQ total score (3 trials, pooled difference on 0 to 100 scale, –4.33, 95% CI –10.46 to 1.97, I²= 0%) [82, 91, 96] (Figure 44). There were no clear differences in estimates when analyses were stratified according to the type of exercise (2 trials of combination exercise, pooled difference –4.45, 95% CI –14.39 to 6.24), [82, 91] type of comparison (2 trials of usual care, pooled difference –5.34, 95% CI –13.4 to 2.32), [91, 96] or after the exclusion of one poor-quality trial (2 trials, pooled difference –3.11, 95% CI –11.26 to 5.86). [82, 96] Findings are based on a small number of trials.
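Pooled differences and I² values like those reported above are produced by random-effects meta-analysis. The Python sketch below uses the DerSimonian-Laird estimator because it is compact; this is a substitution made for illustration and may differ from the model described in the Methods section of the report. The function name `dersimonian_laird` and all input values are hypothetical and are not trial data.

```python
import math


def dersimonian_laird(effects, std_errors, z=1.96):
    """Random-effects pooling (DerSimonian-Laird) with I-squared heterogeneity."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Re-weight each study by its within-study plus between-study variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    return {
        "pooled": pooled,
        "lower": pooled - z * se_pooled,
        "upper": pooled + z * se_pooled,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical mean differences on a 0 to 100 scale with their standard errors.
result = dersimonian_laird([-9.0, -4.0, -6.5], [3.0, 2.5, 4.0])
print(f"pooled {result['pooled']:.2f}, "
      f"95% CI {result['lower']:.2f} to {result['upper']:.2f}, "
      f"I2={result['i2']:.1f}%")
```

When the studies agree closely, the between-study variance is estimated as zero and the result matches a fixed-effect analysis; when they disagree, the interval widens, which is one reason a pooled estimate can lose statistical significance despite a sizable point estimate.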

Figure 45
Pain Outcomes. Exercise had a moderately greater effect on pain (0 to 10 VAS) in the short term compared with usual care, attention control, or no treatment (7 trials, pooled difference –1.08, 95% CI –1.75 to –0.32, I²=53.1%) [76–78, 80, 83, 85, 86] (Figure 45). Substantial heterogeneity was noted with one outlier trial of belly dance (combination exercise) versus waitlist control, reporting substantially higher estimates. [77] Excluding the outlier trial reduced heterogeneity and led to an effect size consistent with a small effect (6 trials, pooled difference –0.88, 95% CI –1.33 to –0.27, I²=1.5%). Estimates were similar when stratified by exercise type and control type. Across the fair-quality trials, the estimate was somewhat larger (4 trials, pooled difference –1.44, 95% CI –2.4 to –0.49, including the outlier). [76, 77, 80, 82, 86]

There was a small improvement in VAS pain with exercise at intermediate term (8 trials [1 new], pooled difference –0.51, 95% CI –0.92 to –0.06, I²=0%) [80, 82, 83, 90, 93, 94, 97, 99] (Figure 45). Removal of poor-quality trials [83, 90, 93, 94] and stratification by exercise and control types yielded similar estimates (pooled differences ranged from –0.40 to –0.71) with no clear differences identified.

There was no effect of exercise on pain long term (4 trials, pooled difference –0.18 on a 0-10 scale, 95% CI –0.77 to 0.42, I²=0%) [78, 82, 96, 98] (Figure 45). Similar estimates were obtained and no clear differences were seen following exclusion of one poor-quality trial or for the comparisons of aerobic exercise with usual care or combination exercise with attention control; pooled differences ranged from –0.05 to –0.26.

Other Outcomes. Data on the effects of exercise on anxiety, depression, and quality of life were often poorly reported (Table 36) and results were mixed. Exercise had no clear effect in the short term on measures of mental health, depression, anxiety, psychological distress, or sleep disturbance VAS across five trials, [76–80] with only one small poor-quality trial favoring exercise on the EQ-5D anxiety/depression scale. [85] Similarly, exercise had no clear effect on quality of life.

At intermediate term, exercise was associated with a small improvement in depression measured by the Beck Depression Inventory (BDI) compared with no treatment or usual care (4 trials, pooled difference –4.9 on a 0-63 scale, 95% CI –7.55 to –2.47, I²= 33.1%, plot not shown) [84, 91–93]; three of the four trials were poor quality. Results were similar for aerobic exercise (3 trials, pooled difference –5.34, 95% CI –8.42 to –3.03) but no difference between groups was seen in the pooled estimate for the two trials using combination exercise or when any exercise was compared with usual care only (2 trials). Across various other measures, exercise had no clear effect on depression in five trials [78, 79, 82, 88, 90]; however, one poor-quality trial favored exercise based on the FIQ depression subscale versus usual care. [94] Results for anxiety were mixed: two trials (one fair- and one poor-quality) [88, 90] reported no difference between groups while two small, poor-quality trials reported a greater improvement in anxiety on the State-Trait Anxiety Inventory (STAI) and the FIQ anxiety subscale with exercise versus usual care. [84, 94] Exercise was associated with improved quality of life (SF-36 questionnaire) in three small trials, [91, 92, 95] but not in a fourth larger fair-quality trial [88] (Table 36). Exercise had no clear effect on psychological problems in two trials [78, 80] or sleep in three trials. [78, 83, 90] One trial reported no between-group difference in analgesic medication use by 6 months, although patients randomized to aerobic exercise showed a significant reduction from baseline use. [93]

Long term, exercise had no clear effect on measures of depression, anxiety, or psychological problems in all but one poor-quality trial. [91] This same trial also reported improvement in SF-36 total scores, whereas one larger fair-quality trial did not. [79] No differences between groups in healthcare utilization were seen in the 2 months prior to the final assessment at 18 months in one trial [96] (Table 36).

Exercise Compared With Pharmacological Therapy

One small, poor-quality trial (N=32 analyzed) comparing 1.5 months of aerobic exercise (40 minutes on bicycle ergometer three times per week) versus paroxetine 20 mg daily found no between-group difference in pain on VAS at intermediate-term followup (difference –0.26 on a 0-10 scale, 95% CI –1.46 to 0.94). [93] Regarding secondary outcomes, no differences were seen for depression (BDI) or mean analgesic consumption over the intermediate term, although the exercise group showed a greater reduction from baseline in analgesic use compared with the paroxetine group.

Exercise Compared With Other Nonpharmacological Therapies

Findings for exercise versus other nonpharmacological therapies are addressed in the sections for other nonpharmacological therapies.

Harms

Most trials of exercise did not report on adverse events. One trial reported one nonstudy-related adverse event. [85] Two trials reported no adverse events. [86, 89]

Psychological Therapies for Fibromyalgia

Key Points

There was no clear difference between CBT versus usual care or waitlist in short-term function (3 trials [1 new], pooled difference –6.14 on 0-100 FIQ total scale, 95% CI –16.86 to 3.74, I²=70.6%). At intermediate term, CBT was associated with a moderate improvement in function (3 trials [1 new], pooled difference –12.82 on 0-100 FIQ total scale, 95% CI –24.07 to –2.44, I²=94.2%) versus waitlist or usual care. CBT was associated with improved function intermediate term (mean difference –1.8 on 0-10 FIQ Physical Impairment Scale, 95% CI –2.9 to –0.70) compared with attention control in an additional trial; however, two new trials found no difference between CBT and waitlist on the Pain Disability Index or West Haven-Yale Multidimensional Pain Inventory (MPI) pain interference subscale. Evidence from two poor-quality trials was insufficient to determine effects on long-term function (SOE: low for short term and intermediate term, insufficient for long term).
CBT was associated with a small improvement in pain (on a 0-10 scale) compared with usual care or waitlist in the short term (4 trials [1 new], pooled difference –0.62, 95% CI –1.08 to –0.14) but not at intermediate term (6 trials [4 new], pooled difference –0.55, 95% CI –1.13 to 0.06). There was no difference in clinically important improvement at intermediate term (≥50% on the Brief Pain Inventory) between CBT (8.3%) or emotional awareness and expression therapy (EAET) (22.5%) and usual care (12%) in one new fair-quality trial. Evidence from one poor-quality trial was insufficient to determine effects on long-term pain (SOE: low for short term and intermediate term, insufficient for long term).
Data were insufficient to determine the effects of EMG biofeedback on function and pain compared with attention controls in the short and long term (1 poor-quality trial and one new fair-quality trial) and with usual care in the intermediate term (1 poor-quality trial), and for the impact of guided imagery versus attention control in the short term (1 poor-quality trial) (SOE: insufficient for all comparisons and time points).
At intermediate term, CBT was associated with a small improvement in function versus pregabalin (plus duloxetine as needed) in two trials [1 new]; differing effect size magnitudes for the trials (–4.0 vs. –15.6, FIQ total score, 0-100 scale) resulted in substantial heterogeneity for the pooled effect estimate, making it unreliable (pooled difference –9.81, 95% CI –23.83 to 4.21, I²=96%) (SOE: low). There was no difference across these trials for VAS pain at intermediate term (2 trials [1 new], pooled difference –0.31 on a 0-10 scale, 95% CI –1.15 to 0.51, I²=63.5%) (SOE: low).
There was insufficient evidence to determine the impact on pain and function for the following: CBT versus pharmacological treatment (amitriptyline) over the short term (fair-quality trial) and electroencephalography (EEG) biofeedback versus pharmacological treatment (escitalopram) over the short and intermediate term (poor-quality trial) (SOE: insufficient). Long-term data were not reported.
There was insufficient evidence to determine the effects of psychological therapies versus exercise on function and pain in the short term (1 small trial of biofeedback), intermediate term (2 trials of CBT and biofeedback), and long term (3 trials of CBT, biofeedback, and relaxation for function; 4 trials of CBT [2], biofeedback, and relaxation for pain). All trials were considered poor quality (SOE: insufficient for function and pain at all time points).
Data on harms were insufficient. Adverse events were poorly reported across the trials but were overall minor and occurred at similar frequencies between groups. In one trial, however, fewer patients randomized to stress management (4.8%) compared with usual care (50%) withdrew from the trial, citing increased depression and worsening of symptoms, respectively. In another (new) trial comparing acceptance and commitment therapy (ACT) with pregabalin (plus duloxetine as needed) several mild adverse events were noted in the pharmacological therapy group, most commonly nausea (25%) and dry mouth (23%) (SOE: insufficient).

Detailed Synthesis

Table 37
A total of 20 trials (in 22 publications) of psychological therapy for fibromyalgia met inclusion criteria (Table 37 and Appendix D). [78, 97, 98, 113–127, 130, 131, 135, 136] Fourteen trials (across 15 publications) were included in the previous AHRQ report [78, 97, 98, 113–120, 130, 131, 135, 136] and six trials (across 7 publications) [121–127] were added for this update.

Fourteen trials (5 new trials; across 16 publications) featured a CBT component, [98, 113–117, 119–124, 126, 127, 130, 136] four trials included biofeedback (EMG or EEG), [78, 97, 125, 131] and one trial each included relaxation training [135] and guided imagery [118] (Table 36 and Appendix D).

The various psychological interventions were compared with usual care, waitlist control or attention control groups (15 trials [5 new], 17 publications), [78, 97, 98, 113–121, 124–127] pharmacological therapy (4 trials [1 new], 5 publications), [113, 122, 123, 130, 131] or exercise therapy (5 trials). [78, 97, 98, 135, 136]

The majority of subjects in all the trials were female (range 90% to 100%; many trials were limited to females) and mean ages ranged from 32 to 56 years. Sample sizes ranged between 32 and 230 subjects (total sample=1,822). Therapy duration and frequency in CBT trials ranged from 6 weekly sessions to 20 sessions over 6 months. CBT was delivered in groups in 12 trials (4 new trials) [113, 115–117, 119–124, 126, 130, 136] and by telephone [114] in another. In one trial, [127] CBT appeared to be delivered individually. Most CBT trials were of CBT as traditionally delivered for the treatment of pain problems. The exceptions included two trials (in 4 publications) of ACT; [116, 119, 122, 123] two trials that evaluated CBT for pain and CBT for pain and insomnia; [121, 127] one trial of stress management therapy that included presentations on stress mechanisms and training in pain coping and relaxation strategies; [98] and one trial of CBT for managing stress and pain. [126] These interventions were nonetheless considered to be similar to standard CBT. Session lengths ranged from 30 minutes up to 3 hours.

In the six trials of biofeedback and associated interventions, therapy duration ranged from 4 to 16 weeks; therapy was delivered individually in the four biofeedback trials and in groups in the remaining two trials. Session frequency ranged from one to five times per week, with sessions as short as 25 minutes and as long as 3 hours.

Short-term outcomes (<6 months) were reported by five trials (1 new trial) of CBT, [114–116, 119, 121, 130] three trials (1 new trial) of biofeedback [78, 125, 131] and one trial of guided imagery. [118] Intermediate outcomes (6 to <12 months) were reported by eight CBT trials (4 new trials) [113, 115, 117, 122–124, 126, 127, 136] and one trial of biofeedback. [97] Long-term outcomes (≥12 months) were reported by four CBT trials, [98, 117, 120, 136] one biofeedback trial [78] and one trial of relaxation therapy. [135] Studies were conducted in Spain (5 trials), [113, 115, 121–123, 136] the United States (5 trials), [78, 114, 120, 124, 127] Sweden (3 trials), [116, 119, 126, 135] the Netherlands (2 trials), [97, 118] Germany (2 trials), [117, 125] and one trial each in Brazil, [130] Norway [98] and Turkey. [131]
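The followup windows used throughout these results can be written down as a small lookup. The sketch below is illustrative only: the cutoffs are the ones stated in the report (short term 1 to <6 months, intermediate term ≥6 to <12 months, long term ≥12 months), while the function name `followup_category` is a hypothetical helper introduced here.

```python
def followup_category(months: float) -> str:
    """Classify postintervention followup using the report's stated cutoffs.

    `followup_category` is a hypothetical helper name, not part of the report.
    """
    if months < 1:
        # The report's shortest category starts at 1 month.
        raise ValueError("followup shorter than 1 month is outside the report's categories")
    if months < 6:
        return "short term"
    if months < 12:
        return "intermediate term"
    return "long term"


for m in (1, 3, 6, 11.5, 12):
    print(m, "->", followup_category(m))
```

Under these cutoffs, an outcome measured at exactly 6 months counts as intermediate term and one measured at exactly 12 months counts as long term.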

Among the 14 CBT trials, seven (4 new trials) were considered fair quality, [113, 116, 119, 122–124, 126, 127, 130] while the remaining seven (1 new trial) were rated poor quality [98, 114, 115, 117, 120, 121, 136] (Appendix E). Among the remaining trials of biofeedback, relaxation, and guided imagery interventions, all were rated poor quality [78, 97, 118, 131, 135] except for one new biofeedback trial which was considered to be fair-quality. [125] Methodological shortcomings included lack of blinding in fair-quality and poor-quality trials, and unclear allocation concealment methods, poor compliance, and high attrition in the poor-quality trials. In all trials, the nature of the intervention types precluded blinding of participants.

Psychological Therapies Compared With Usual Care, Waitlist, or Attention Control

Fifteen trials (5 new trials) compared psychological interventions versus usual care, waitlist, or attention control. [78, 97, 98, 113–121, 124–127] Nine trials were considered poor quality and six [5 new trials] [116, 119, 122–127] were considered fair quality. ACT is considered a form of CBT and was included in CBT-specific analyses.

Functional Outcomes. Across all types of psychological interventions, two poor quality trials reported on clinically meaningful improvement in short-term function (Table 37). Significantly more patients in the CBT group attained a clinically important improvement (≥14% on the FIQ total, 0-100 scale) from baseline compared with usual care (RR 2.8, 95% CI 1.3 to 6.1) in one trial, [115] but there was no significant difference in a smaller trial (RR 2.2, 95% CI 0.5 to 9.3). [114]
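The two risk ratios above can be read with the report's rule that an RR is statistically significant when its confidence interval excludes 1. The following is a minimal sketch, assuming the intervals were constructed symmetrically on the log scale (a common convention, but an assumption here); `rr_summary` is a hypothetical helper name.

```python
import math


def rr_summary(lower: float, upper: float, z: float = 1.959964) -> dict:
    """Summarize a reported 95% CI for a risk ratio.

    Assumes the interval is symmetric on the log scale (an assumption,
    not something the report states for these trials).
    """
    # Standard error of log(RR) implied by the interval width.
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * z)
    # Geometric mean of the bounds: the point estimate the interval implies.
    implied_rr = math.sqrt(lower * upper)
    # Significant when the interval excludes the null value of 1.
    significant = lower > 1 or upper < 1
    return {"se_log_rr": se_log_rr, "implied_rr": implied_rr, "significant": significant}


larger_trial = rr_summary(1.3, 6.1)   # reported RR 2.8 [115]
smaller_trial = rr_summary(0.5, 9.3)  # reported RR 2.2 [114]
print(larger_trial)
print(smaller_trial)
```

The implied point estimates land close to the reported RRs of 2.8 and 2.2, and only the first interval excludes 1, which matches the text's reading of a significant difference in one trial and no significant difference in the smaller one.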

Figure 46
Based on mean differences in short-term followup scores, there was no clear difference in function across psychological therapies versus usual care, waitlist or attention control (5 trials [2 new], pooled difference –2.82 on a 0-100 FIQ total scale, 95% CI –9.79 to 2.81, I²=70.6%). [115, 116, 118, 119, 121, 125] Analysis confined to CBT trials (including ACT) showed no clear difference in function compared with usual care or waitlist in the short term (3 trials [1 new], pooled difference –6.14 on a 0-100 scale, FIQ total, 95% CI –16.86 to 3.74, I²=70.6%). [115, 116, 119, 121] Two trials were fair quality (Figure 46). Analyses of differences in change scores on the FIQ gave estimates similar in magnitude (data not shown). The prior AHRQ review reported a small improvement in function with CBT versus usual care or waitlist based on two trials. [115, 116, 119] No differences between groups were seen in the trials of guided imagery (difference 1.2 on a 0-100 FIQ total scale, 95% CI –0.2 to 2.6) [118] and EMG biofeedback. In one study of EMG biofeedback versus attention control, median change from baseline was 6.0 for both groups on the Arthritis Impact Measurement Scales (AIMS) physical activity subscale (0-10 scale). [78] In a new fair-quality trial of EMG biofeedback, [125] there was no difference on the FIQ as compared with an attention control condition.

At intermediate term, one poor-quality trial reported that substantially more CBT patients achieved a clinically important functional improvement (≥14% on the FIQ total, 0-100 scale) compared with usual care (RR 2.9, 95% CI 1.9 to 17.8). [115] For analysis of mean differences in intermediate-term scores, CBT/ACT was associated with moderate improvement in function (3 trials [1 new], pooled difference –12.82 on 0-100 scale, FIQ total, 95% CI –24.07 to –2.44, I²=94.2%) [113, 115, 122, 123] versus waitlist or usual care. All trials favored CBT (2 fair, 1 poor quality) but differed in magnitude of benefit. In the prior report, the pooled effect size across the two trials of CBT versus usual care was attenuated (small improvement with CBT) and not significant due to heterogeneity (pooled difference –9.35, 95% CI –26.95 to 5.02, I²=84.5%). [113, 115] Both trials individually showed CBT had a statistically greater effect on function than usual care, but the effects differed in magnitude, and we reported this as a small improvement in function in the prior report (Figure 46). Findings from an additional trial suggested a greater improvement in function with CBT compared with attention control based on a 0 to 10 FIQ Physical Impairment Scale (difference –1.8, 95% CI –2.9 to –0.70). [117] A new fair-quality trial [127] of CBT for pain and CBT for insomnia versus waitlist found no difference between groups on the Pain Disability Index. A new fair-quality trial [126] of a CBT stress management program versus waitlist also found no difference on the West Haven-Yale Multidimensional Pain Inventory (MPI) pain interference subscale. There was no clear difference between biofeedback and usual care on function on the Sickness Impact Profile (SIP) physical score in one trial (mean change –1.6, 95% CI –3.4 to 0.2 versus –0.6, 95% CI –2.9 to 1.7, respectively, on a 0-100 scale). [97]

Data from two poor-quality trials were insufficient to determine the long-term effects of psychological therapies on function. One trial reported that CBT resulted in greater improvement compared with attention control on the FIQ Physical Impairment Scale (difference –1.8 on a 0-10 scale, 95% CI –2.85 to –0.745). [117] A trial of biofeedback versus usual care reported the same median change in the AIMS Physical Activity subscale (6.0) in both groups. [78]

Figure 47
Pain Outcomes. Psychological interventions (CBT/ACT and EMG biofeedback) were associated with a small improvement in pain compared with usual care, waitlist, or attention control, based on mean differences at short-term followup (5 trials [1 new], pooled difference –0.62, 95% CI –1.02 to –0.20, I²=0%) [78, 114–116, 119, 121] (Figure 47). Results based on the mean difference of change scores were similar, but not statistically significant (data not shown). The estimate was similar when only trials of CBT were considered (4 trials [1 new], pooled difference –0.62, 95% CI –1.08 to –0.14, plot not shown). [114–116, 119, 121] One poor-quality trial reported no difference between CBT and usual care in the proportion of patients with clinically important improvement in pain short-term (≥30% improvement on 0-10 NRS, RR 1.5, 95% CI 0.4 to 5.7). [115] The addition of the new poor-quality CBT trial [121] resulted in no changes in conclusions from the prior AHRQ report for short-term results.

At intermediate term, one poor-quality trial reported no difference in the proportion of patients showing a clinically important improvement in pain (≥30% on 0-10 NRS, RR 1.3, 95% CI 0.4 to 4.2) [115]; similarly, one new fair-quality trial reported no differences in clinically important improvement (≥50% on Brief Pain Inventory) with CBT (8.3%) or EAET (22.5%) versus usual care (12%). [124] In analyses based on mean differences in scores, psychological interventions (CBT, ACT, EMG biofeedback, and combined CBT and EAET) were associated with a small benefit for pain compared with usual care, attention control or waitlist (7 trials [4 new], pooled difference –0.62, 95% CI –1.14 to –0.09, I²=65.7%) [97, 113, 115, 122–124, 126, 127] (Figure 47). Effect sizes at intermediate term were slightly smaller in a subanalysis of therapies versus usual care only (3 trials [1 new], pooled difference –0.52, 95% CI –1.4 to –0.15). [97, 113, 115] Pooling only the six CBT trials, the effect was slightly smaller (6 trials [4 new], pooled difference –0.55, 95% CI –1.12 to 0.06) [113, 115, 122–124, 126, 127] with no clear difference between CBT and usual care, waitlist or attention control. Similarly, there was no clear difference in a subanalysis confined to the five fair-quality trials, all of which were of CBT (5 trials [4 new], pooled difference –0.48, 95% CI –1.11 to 0.24). [113, 122–124, 126, 127] In the prior AHRQ report, there was no clear difference between CBT and usual care across two studies although each tended to favor CBT. The addition of the four new fair-quality studies does not change the conclusion of no clear difference. In one new trial, the author-developed EAET, compared with attention control, was not associated with lower pain intensity at intermediate term based on the proportion of patients achieving a 50 percent or greater reduction in pain (22.5% vs. 12.0%, p=0.07) or the mean difference in pain scores using the Brief Pain Inventory 0-10 scale (–0.54, 95% CI –1.2 to 0.1), but was associated with improved fibromyalgia symptoms (difference –2.9, 95% CI –4.9 to –0.8 on the FM symptom scale, scale unclear). [124]

Three trials [78, 98, 120] reported long term effects on pain. A pooled analysis of two of these trials found no difference between these psychological therapies (CBT or biofeedback/relaxation training) and attention control or usual care (2 trials, pooled difference 0.04, 95% CI –0.89 to 0.98, I²=0%) [78, 98]; however, evidence across these two poor-quality trials was considered insufficient (Figure 47). The third trial found no difference between CBT and usual care in the proportion of participants achieving a clinically meaningful change of 12 points from baseline on the McGill Pain Questionnaire (MPQ) Sensory Scale (RR 0.54, 95% CI 0.14 to 2.2). [120]

Other Outcomes. Results for secondary outcomes were mixed across trials of CBT and ACT (Table 36). Five trials were fair quality; [116, 119, 122–124, 126, 127] the rest were poor quality.

In one fair-quality trial of ACT versus waitlist there were no differences between groups over the short term on the BDI, STAI-State scale or Short-Form-36 (SF-36) PCS; ACT was associated with improvement in the SF-36 MCS. [116, 119] In a new fair-quality trial of EMG biofeedback, [125] there was no difference on SF-36 scores compared with an attention control condition.

Five fair-quality trials of CBT/ACT reported intermediate-term outcomes. A comparison of CBT versus usual care found no differences on the Hamilton Depression Rating Scale (HAM-D) and Hamilton Anxiety Rating Scale (HAM-A). [113] A new trial of ACT versus waitlist found a benefit of ACT for the 0-100 EQ5D VAS health status rating (difference 12.2, 95% CI 7.9 to 16.5), Hospital Anxiety and Depression Scale-Anxiety (HADS-A) (difference –3.42, 95% CI –4.7 to –2.1), and Hospital Anxiety and Depression Scale-Depression (HADS-D) (difference –3.5, 95% CI –4.4 to –2.5). [122, 123] A new trial of CBT versus education attention control [124] found no difference on the Short Form-12 Physical scale, Satisfaction with Life Scale, Pittsburgh Sleep Quality Index (PSQI), Positive Affect Negative Affect Schedule (PANAS)-positive score, PANAS-negative score, Center for Epidemiologic Studies Depression Scale (CES-D), Generalized Anxiety Disorder-7, or PROMIS Fatigue Short-Form. A new fair-quality trial of CBT for insomnia, CBT for pain, and waitlist found benefits of both CBT interventions for measures of sleep, but not depression or anxiety. [127] A new fair-quality trial of CBT stress management versus waitlist found benefits of CBT for measures of affective distress and depression, but not sleep. [126] Across the poor-quality trials, results were mixed across various secondary outcome measures (Table 36).

Two poor-quality studies compared EMG biofeedback to attention control conditions; neither found differences on secondary outcomes, including the Symptoms Checklist 90-Revised Global Severity Index, SIP psychosocial score, global assessment of well-being, CES-D, and a sleep scale. [78, 97]

Psychological Therapies Compared With Pharmacological Therapy

Figure 48
Three fair-quality trials [113, 122, 123, 130] and one poor-quality trial [131] compared a psychological therapy with pharmacological treatment. Two small trials reported functional outcomes over the short term with differing results. No effect was seen for CBT (plus amitriptyline) compared with amitriptyline alone at 3 months in one fair-quality trial (difference –4.10, 95% CI –18.40 to 10.20 on the FIQ total score [0 to 100 scale]). [130] One poor-quality trial, comparing EEG biofeedback with escitalopram, reported improved mean FIQ total scores (0-100 scale) in the biofeedback group at 4 to 5 months followup (difference –29.00, 95% CI –38.58 to –19.42). [131] Substantial heterogeneity of the interventions, the medication comparators and quality of the trials precluded meaningful pooling for this outcome (Figure 48).

Intermediate-term function was reported by two fair-quality trials (1 new trial) [113, 122, 123]; both found benefits for CBT (including ACT) compared with pregabalin (plus duloxetine for depressed patients) according to the FIQ Total scale (0-100). One found a small improvement in function favoring CBT (difference –4.00 on a 0-100 scale, 95% CI –7.44 to –0.56) [113]; the other found a moderate improvement for function associated with CBT (difference –15.62, 95% CI –19.03 to –12.21). [122, 123] The pooled estimate suggests a small improvement in function (pooled difference –9.81, 95% CI –23.83 to 4.21, I²=96%), but the differences in effect magnitudes produced substantial heterogeneity (Figure 48). It is unclear how many patients in the pharmacological group received concomitant duloxetine for major depressive disorder.
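To see how two individually significant trials can yield a pooled interval that crosses zero, the two reported FIQ differences can be combined with a simple random-effects model. This is a minimal sketch, assuming a DerSimonian-Laird estimator with a normal-approximation interval and standard errors back-calculated from the reported 95% CIs; the report's own pooling method is described in its Methods section and may differ. The function names are hypothetical.

```python
import math

Z = 1.959964  # two-sided 95% quantile of the standard normal


def se_from_ci(lower, upper, z=Z):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, ses, z=Z):
    """DerSimonian-Laird random-effects pooling (illustrative, not the report's code)."""
    w = [1 / se ** 2 for se in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # between-trial variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0        # I-squared as a proportion
    w_re = [1 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), i2, tau2


# Reported intermediate-term FIQ differences, CBT/ACT versus pregabalin.
diffs = [-4.00, -15.62]
ses = [se_from_ci(-7.44, -0.56), se_from_ci(-19.03, -12.21)]

pooled, ci, i2, tau2 = dersimonian_laird(diffs, ses)
print(f"pooled difference {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, I2={i2:.1%}")
```

Under these assumptions the sketch gives a pooled difference of about –9.8 with I² near 95%, close to the reported values. Its interval also crosses zero but is narrower than the reported one, which suggests the report used a more conservative interval method. The large between-trial variance is what widens the pooled interval well beyond either trial's own interval.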

No differences in pain short-term were seen between groups in the trial of CBT versus amitriptyline (difference –0.7 on a 0-10 VAS, 95% CI –2.8 to 1.4), [130] whereas a moderate improvement was seen for EEG biofeedback compared with escitalopram (difference –2.7 on a 0-10 VAS, 95% CI –3.7 to –1.7) in the poor-quality trial. [131] Trials were not pooled given heterogeneity of both the intervention and medication comparators.

At intermediate term, no difference in pain between CBT/ACT and pregabalin was observed (2 trials [1 new], pooled difference –0.31, 95% CI –1.15 to 0.51, I²=63.5%). [113, 122, 123]

Regarding secondary outcomes, EEG biofeedback was associated with significantly better outcomes on various measures of anxiety, depression, and quality of life compared with escitalopram short term in the poor-quality trial. [131] The two fair-quality trials evaluating CBT (versus amitriptyline and versus pregabalin) [113, 130] found no differences between groups over the short or intermediate term, with the exception of a benefit of CBT for SF-36 Mental Health scores at short-term followup in one trial (difference 13.7 on a 0-100 scale, 95% CI 0.07 to 27.3). [130] In the fair-quality trial of ACT versus pregabalin (plus duloxetine for patients who were depressed), at intermediate term there was a benefit of ACT on the EQ-5D VAS measure of self-assessed health state (0-100 scale, with higher scores indicating better health; difference 9.6, 95% CI 5.2 to 14.0); the 0-21 HADS-A anxiety scale (difference –1.0, 95% CI –1.8 to –0.06); and the 0-21 HADS-D depression scale (difference –1.7, 95% CI –2.6 to –0.8). Across the two studies of CBT versus pregabalin (plus duloxetine as needed), [113, 122, 123] there was no difference between therapies on depression (measured by the HADS depression scale and the Hamilton Depression scale) intermediate term (difference –0.43, 95% CI –1.13 to 0.28, I²=93%). Two trials examined effects of pregabalin (plus duloxetine as needed) on measures of anxiety, with no difference across these studies at intermediate-term followup (difference –0.23, 95% CI –0.69 to 0.23, I²=0%).

Psychological Therapies Compared With Exercise

Five poor-quality trials compared psychological interventions with exercise; two trials evaluated CBT, [98, 136] two trials evaluated biofeedback, [78, 97] and one evaluated relaxation training [135] (Table 36). All trials were included in the prior AHRQ report.

Data were insufficient from one poor-quality trial to determine the effects of biofeedback versus combination exercise on function. The trial reported improved function based on the AIMS physical activity subscale (median change from baseline 6.0 versus 4.0, p<0.05). [78] Intermediate-term data from two poor-quality trials were insufficient to determine effects of psychological therapies on function and no clear differences in function were seen for CBT (difference –0.6, 95% CI –12.6 to 11.4 on 0-100 FIQ total score) [136] or biofeedback (mean change –1.6, 95% CI –3.4 to 0.2 vs. –0.6, 95% CI –2.9 to 1.7 on 0-100 SIP Physical score) [97] versus combination exercise. Similarly, no clear differences between psychological therapies and exercise were seen across three trials at longer term and evidence was considered insufficient. Results from two trials were not statistically significant (CBT vs. combination exercise [difference 0.1, 95% CI –10.5 to 10.7 on 0-100 FIQ total scale] [136] and relaxation training versus strength training [difference –1.7, 95% CI –9.3 to 5.9, on 0-100 FIQ Total Score]). [135] The third trial of biofeedback versus combination exercise reported improvement in function, but limited data were provided (median change from baseline, 6.0 versus 4.0, p<0.05). [78]

Data were insufficient from one poor-quality trial to determine the effects of biofeedback versus combination exercise on pain (median change from baseline, 5.2 vs. 5.4 on 0-10 VAS). [78] Across two poor-quality trials at intermediate term, no clear differences were seen for CBT (difference –1.0, 95% CI –2.8 to 0.8) [136] or biofeedback (mean change –0.6, 95% CI –6.5 to 5.3 vs. –5.5, 95% CI –10.9 to –0.1, p=not statistically significant [NS]) [97] compared with combination exercise; evidence was considered insufficient. There were no clear differences between any of the psychological therapies and exercise for pain on a 0 to 10 scale across four trials long term, including CBT versus combination exercise (difference 0.3, 95% CI –2.0 to 1.3) [136] or aerobic exercise (difference 2, 95% CI –11.6 to 15.6), [98] biofeedback versus combination exercise (median change: 5.2 vs. 5.5, p=NS), [78] and relaxation training versus strength training (difference 2.9, 95% CI –5.5 to 11.3). [135]

There were generally no significant differences on measures of mental health, depression or anxiety, or on SF-36 scales, at any time frame across five poor-quality trials. [78, 97, 98, 135, 136] Some trials did not provide data for determination of effect sizes between treatment groups or report results of significance tests (Table 36).

Harms

Only seven trials (3 fair-quality and 4 poor-quality, 2 new) reported harms, which were poorly described in general. Two trials compared CBT with usual care; in one, there were no withdrawals due to adverse events in the CBT group compared with two (3.6%) in the control group (not further described) [113] and in the other there were two withdrawals, one in each group, due to painfulness of the nociceptive flexion reflex test used as an outcome measure (not as part of treatment). [114] Two trials compared psychological therapies with attention controls. One trial reported that 4.8 percent of patients in the CBT group versus 50 percent in the control group withdrew from the study (withdrawal attributed to depression [CBT group] and symptom worsening [control group]). [117] The other trial (a new trial) reported no adverse events for CBT or attention control (education) but did note that brief symptom exacerbation (i.e., increased pain or sleep problems) was occasionally reported by patients who received the EAET intervention [124]; 4% of patients in the CBT and EAET groups (vs. 2.6% in the control group) withdrew due to treatment not of interest or fit and one (1.3%) patient in the CBT group withdrew after being diagnosed with cancer. In another trial that compared CBT with waitlist, [122, 123] 5.9% and 3.9% of CBT patients withdrew due to lack of efficacy or patient decision, respectively, compared with no patients in the waitlist group. One trial of stress management versus usual care reported one withdrawal due to cancer (unrelated to the treatment) in the intervention group compared with no withdrawals or adverse events in the control group. [98]

Physical Modalities for Fibromyalgia

Key Points

One fair-quality parallel trial found no differences between magnetic mattress pads compared with sham or usual care in intermediate-term function (difference on the 0 to 80 scale FIQ –5.0, 95% CI –14.1 to 4.1 vs. sham and –5.5, 95% CI –14.4 to 3.4 vs. usual care) or pain (difference –0.6, 95% CI –1.9 to 0.7 and –1.0, 95% CI –2.2 to 0.2, respectively on a 0 to 10 NRS) (SOE: low). Data from one small, poor-quality crossover trial were insufficient to determine the effects of a magnetic mattress versus sham on function and pain in the short term (SOE: insufficient).
There were no differences in adverse events between the functional and sham magnetic mattress pad groups (data not reported); none of the events were deemed to be related to the treatments (SOE: low).

Detailed Synthesis

Table 38
Two trials, [167, 168] one parallel and one cross-over design, evaluating the efficacy of magnetic fields for the treatment of fibromyalgia met inclusion criteria (Table 38 and Appendix D). Both trials were included in the prior AHRQ report. In both trials, the majority of patients were female (93% and 100%) and the mean ages were 45 and 50 years; symptom duration was 6 years in one trial and was not reported by the other trial. Due to the differences in trial designs we could not pool the data; therefore, these trials are reported separately.

One parallel trial (N=119), [167] conducted in the United States, compared two different magnetic mattress pads (one with a low, uniform magnetic field of negative polarity and the other a low, static magnetic field that varied spatially and in polarity) versus sham (mattress pads with demagnetized magnets) and versus usual care (management by primary care provider). All pads were used for 6 months and outcomes were measured immediately post-treatment. This trial was rated fair quality due to deviations from the randomization protocol and high attrition rate (21%) (Appendix E).

A second small, crossover trial (N=33) [168] evaluated the effects of an extremely low frequency magnetic mattress compared with a sham mattress (no magnetic field delivered). The trial was conducted in Italy. The intervention periods were 1 month and the washout period between the first and second period was 1 month; no further information was provided about the washout period. Outcomes were measured 1 month after the end of each treatment cycle (i.e., at the beginning of the second treatment cycle, after a 1 month washout, and 1 month after the end of the second treatment cycle). This trial was rated poor quality due to unclear randomization sequence generation and allocation concealment, and loss-to-followup of greater than 20% through the second treatment period; additional sources of bias in this crossover trial include no details regarding handling of missing data and no analysis of carryover effect.

Physical Modalities Compared With Usual Care or Sham

The magnetic mattress pads offered no intermediate-term benefit for either function or pain compared with either sham or usual care in the one parallel trial. [167] The difference between groups on the 0 to 80 scale FIQ at 6 months was –5.0 (95% CI –14.1 to 4.1) versus sham and –5.5 (95% CI –14.4 to 3.4) versus usual care. Regarding pain, the between-group differences were –0.6 (95% CI –1.9 to 0.7) and –1.0 (95% CI –2.2 to 0.2), respectively, on a 0 to 10 NRS. When the intervention groups were considered separately, only the magnetic mattress pad designed to expose the body to a uniform magnetic field of negative polarity resulted in lower FIQ and NRS pain scores compared with controls; however, the differences between groups were not statistically significant.

The crossover trial [168] reported statistically significant improvement in both function and pain favoring the magnetic mattress 1 month after the end of both treatment periods (i.e., over the short term); however, the evidence is considered insufficient. For patients who received magnetic therapy during the first and second (i.e., after crossing over) treatment periods, mean FIQ scores were 19.2 and 25.1 on a 0-100 scale, respectively, compared with 57.9 and 53.9 for those receiving sham during the same treatment periods (p<0.001 for both). For VAS pain, respective scores were 2.2 and 3.1 versus 5.3 and 4.6 on a 0-10 scale (p<0.001 for both). Results were similar for both the Fibromyalgia Assessment Scale and the Health Assessment Questionnaire (Table 37).

Physical Modalities Compared With Pharmacological Therapy or Exercise

No trial of physical modality versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

In the parallel trial, there were no differences in adverse events between the magnetic mattress pad and sham pad groups. [167] Type of adverse events was not reported, but none of the events were judged to be due to magnetic treatments. The crossover trial only stated that no side effects were recorded during the study. [168]

Manual Therapies for Fibromyalgia

Key Points

Myofascial release therapy was associated with a small improvement in intermediate-term function as measured by the FIQ (mean 58.6 [standard deviation, SD, 16.3] vs. 64.1 [SD 18.1] on a 100 point scale, p=0.048 for the group effect in repeated measures analysis of variance [ANOVA]), but not long-term function (mean 62.8 [SD 20.1] vs. 65.0 [SD 19.8], p=0.329), compared with sham in one fair-quality trial (SOE: low). Short-term function was not reported.
There was insufficient evidence to determine the effects of myofascial release therapy on short-term pain (1 poor-quality trial) and intermediate-term pain (1 fair-quality and 1 poor-quality trial) compared with sham; there were inconsistencies in effect estimates between the intermediate-term trials (SOE: insufficient).
Myofascial release therapy was associated with small improvement in pain long term compared with sham, based on the sensory domain (mean 18.2 [SD 8.3] vs. 21.2 [SD 7.9] on a 0-33 scale, p=0.038 for group by repeated measures ANOVA) and evaluative domain (mean 23.2 [SD 7.6] vs. 26.7 [SD 6.9] on a 0-42 scale, p=0.036) of the MPQ in one fair-quality trial; there were no differences for the affective domain of the MPQ or for VAS pain (SOE: low).
Data were insufficient for harms; however, no adverse effects occurred in one fair-quality trial (SOE: insufficient).

Detailed Synthesis

Table 39
Two trials (N=64 and 94) [185, 186] evaluating myofascial release therapy versus sham therapy for fibromyalgia met inclusion criteria (Table 39 and Appendix D). Both trials were included in the prior AHRQ report. Mean patient ages were 48 and 55 years. Baseline pain history characteristics were poorly described in both trials. The duration of myofascial release therapy was 20 weeks in both trials; sessions ranged in length from 60 to 90 minutes and were conducted once or twice a week. The sham conditions included short-wave and ultrasound electrotherapy or sham (disconnected) magnotherapy. Both trials reported intermediate-term outcomes; short-term and long-term outcomes were also reported by one trial each. One trial was rated fair quality and the other poor quality (Appendix E). Unclear allocation concealment methods and lack of blinding were the major methodological shortcomings in both trials. Additionally, the poor-quality trial did not describe the randomization process employed.

Myofascial Release Therapy Compared With Sham

Myofascial release therapy was associated with a small improvement in intermediate-term function compared with sham as measured by the FIQ (mean 58.6 [standard deviation, SD 16.3] vs. mean 64.1 [SD 18.1] on a 100 point scale, p=0.048 for the group by time effect in repeated measures ANOVA) in one fair-quality trial [185]; this effect did not persist to the long term (62.8 [SD 20.1] vs. 65.0 [SD 19.8], p=0.329, at 12 months). Function was not reported over the short term.

Figure 49
Regarding pain outcomes, one poor-quality trial reported a small effect for myofascial release compared with sham therapy over the short term (mean 8.4 vs. mean 9.4 on a 0-10 VAS at 1 month, p=0.048 for group by time repeated measures ANOVA). [186] Intermediate-term results were inconsistent across the trials as measured on a 0 to 10 VAS pain scale, with one fair-quality trial reporting a small improvement in pain for myofascial release versus sham (mean 8.25 [SD 1.13] vs. mean 8.94 [SD 1.34], p=0.043) [185] at 6 months and the other (poor quality) reporting no significant difference between groups (8.8 vs. 9.7, p=NS) (Figure 49). [186] Additional pain measures were reported over the intermediate term by the fair-quality trial, all of which showed a small benefit in favor of myofascial release: FIQ pain (8.5 [SD 0.7] vs. 8.0 [SD 1.3], p=0.042 for group by time repeated measures ANOVA) and the MPQ sensory (17.3 [SD 7.8] vs. 20.7 [SD 7.1] on a 0-33 scale, p=0.04), affective (4.5 [SD 2.9] vs. 5.2 [SD 3.8] on a 0-12 scale, p=0.04) and evaluative (21.9 [SD 7.2] vs. 26.2 [SD 6.8] on a 0-42 scale, p=0.02) dimensions. [185] This effect persisted at long-term followup for the sensory and evaluative dimensions of the MPQ only; no differences were seen between groups regarding VAS pain or the affective dimension of the MPQ at long-term followup in this trial (Table 38).

Depression, anxiety, and sleep outcomes were evaluated in one poor-quality trial, with significant improvement seen short term in the myofascial release versus the sham group on some subscales of the Short-Form-36 and on the sleep duration subscale of the PSQI, [186] but no differences between groups on the STAI or BDI (Table 38); at intermediate followup, only PSQI sleep duration was significantly improved following myofascial release versus sham.

Manual Therapy Compared With Pharmacological Therapy or Exercise

No trial of manual therapy versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

In one trial, no patient experienced an adverse effect (details not reported). [185] No information on harms was reported by the other trial.

Mindfulness Practices for Fibromyalgia

Key Points

No clear short-term effects of MBSR were seen on function compared with waitlist or attention control (difference 0 to 0.06 on a 0-10 scale) in two trials (one fair and one poor quality). Clinically meaningful improvement in function (≥14% on the FIQ total, 0-100 scale) was not different for MBSR versus either comparator (SOE: moderate).
No clear short-term effects of MBSR were seen on pain (difference 0.1 on a 0-100 VAS pain scale in one poor quality trial; difference –1.38 to –1.59 on the affective and –0.28 to –0.71 on the sensory dimension [scales not reported] of the Pain Perception Scale in one fair-quality trial) compared with waitlist or attention control in two trials (SOE: moderate). Intermediate-term and long-term outcomes were not reported.
In one new trial, meditation awareness training (MAT) was associated with a small intermediate-term improvement in function (adjusted difference –7.9, 95% CI –8.2 to –4.3 on FIQ 0-100 scale) and a small improvement in pain (adjusted difference –3.0, 95% CI –4.1 to –1.9 on the 0-45 SF-MPQ Pain Perception Index) versus attention control (SOE: low).
No trial of mindfulness practices versus pharmacological therapy or versus exercise met inclusion criteria.
Harms were not reported.

Detailed Synthesis

Table 40
We identified three trials (4 publications) of mindfulness practices for fibromyalgia that met inclusion criteria (Table 40 and Appendix D). [200–203] Two trials (3 publications) [200–202] of mindfulness-based stress reduction (MBSR) practices were included in the prior AHRQ report and one new trial [203] of “Meditation Awareness Training” (MAT) was included for this update. In both MBSR trials, the intervention was modeled after the program developed by Kabat-Zinn. The intervention lasted 8 weeks, with weekly 2.5-hour sessions, daily homework assignments, and a single 7-hour session. Sample sizes ranged from 90 to 168 (total sample=406), age ranged from 48 to 53 years, and all participants were female. Both studies compared MBSR versus waitlist control; one trial [201] also compared MBSR to an attention control group that consisted of education, relaxation, and stretching. Both studies reported only short-term outcomes. One study was conducted in the United States [200, 202] and the other in Germany. [201] The third trial (N=148, mean age 47, 83% female) compared MAT, a mindfulness-based intervention, with an attention control condition (education only). [203] MAT consisted of one 2-hour session per week for 8 weeks plus a CD of guided meditations to facilitate daily practice. Weekly sessions included a presentation, a facilitated group discussion, and guided educational exercises, with no practice or discussion of meditation. This trial was conducted in England.

Two trials (1 MBSR and 1 MAT) were considered fair quality [201, 203] and the other MBSR trial was considered poor quality [200, 202] (Appendix E). Methodological shortcomings in all trials were the lack of long-term followup and the inability to blind patients and providers. The poor-quality study also had a high rate of overall attrition as well as differential attrition between the groups.

Mindfulness Practices Compared With Waitlist or Attention Control

There were no clear short-term effects of MBSR on any function or pain measure reported compared with waitlist or attention control. Both trials compared MBSR to waitlist and reported function using the FIQ; one reported the physical function subscale (difference 0 on a 0-10 scale, 95% CI –0.32 to 0.32) [200] and the other reported the total score (difference –0.06 on a 0-10 scale, 95% CI –0.75 to 0.63). [201] The latter fair-quality trial also reported the proportion of patients who achieved a 14 percent or greater improvement in FIQ total scores: 30 percent versus 22 percent, RR 1.37 (95% CI 0.83 to 1.94). [201] Regarding pain, one trial reported a mean difference of 0.1 (95% CI –9.96 to 10.16) on a 0 to 100 VAS pain scale [200] between the MBSR and waitlist groups, while the other reported on affective (difference –1.59, 95% CI –5.01 to 1.83) and sensory (difference –0.28, 95% CI –2.30 to 1.74) domains of the Pain Perception Scale (scale not reported). [201] Estimates for function and pain were similar for the comparison of MBSR versus attention control in the fair-quality trial [201] (Table 39). The new fair-quality trial of MAT versus educational attention control reported only intermediate-term outcomes. There were small improvements in function on the 0-100 FIQ-R (adjusted difference –7.9, 95% CI –8.24 to –4.25) and in pain on the 0-45 SF-MPQ Pain Perception Index (adjusted difference –3.0, 95% CI –4.1 to –1.9) associated with MAT compared with attention control. [203]

Secondary outcomes (measures of depression, anxiety, sleep, fatigue) did not differ significantly between MBSR and waitlist or attention control in either trial [200–202] (Table 39). The fair-quality trial compared medication use (analgesics, anti-depressants, and sleep medication) between baseline and short-term followup; only antidepressant medication was reduced significantly from baseline (46% to 35%, p=0.01) but there was no group effect (data not reported). [201] In the trial of MAT versus education attention control, [203] there was an intermediate-term benefit for MAT on the 0-21 PSQI sleep measure (adjusted difference –2.3, 95% CI –2.9 to –1.6) and the 0-100 DASS measure of depression, anxiety and stress (adjusted difference –4.9, 95% CI –6.3 to –3.4).

Mindfulness-Based Stress Reduction Therapy Compared With Pharmacological Therapy or Exercise

No trial of MBSR versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

None of the three trials reported harms.

Mind-Body Therapy for Fibromyalgia

Key Points

Over the short term, two trials of mind-body practices reported small improvement in function for qigong compared with waitlist (difference –7.5, 95% CI –13.3 to –1.68) and for tai chi compared with attention control (difference –23.5, 95% CI –30 to –17) based on 0 to 100 scale total FIQ score; heterogeneity may be explained by duration and intensity of intervention and control conditions. Significantly more participants in the tai chi group also showed clinically meaningful improvement on total FIQ (RR 1.6, 95% CI 1.1 to 2.3) consistent with a small effect (SOE: low).
Qigong and tai chi were associated with moderately greater improvement in pain (0-10 scale) compared with waitlist and attention control in the short term (2 trials, pooled difference –1.44, 95% CI –2.96 to –0.23, I²=46%). Significantly more participants in the tai chi group also showed clinically meaningful improvement on VAS pain (RR 2.0, 95% CI 1.1 to 3.8) consistent with a small effect (SOE: low).
There was no evidence regarding effects of mind-body practices versus waitlist or attention control in the intermediate or long term.
In one new trial, compared with aerobic exercise, tai chi was associated with a small improvement in function 3 to 6 months postintervention (difference in change scores –5.5, 95% CI –0.6 to –10.4, FIQ-R 0-100 scale), but the effect did not persist from intermediate to longer term (6-12 months) (difference in change scores –2.7, 95% CI –2.3 to 7.7) (SOE: low). Analyses confined to two 60-minute sessions of tai chi per week for 24 weeks versus a comparable number of sessions per week of aerobic exercise suggest moderate functional improvement at intermediate term (difference in change scores –16.2, 95% CI –8.7 to –23.6, 0-100 FIQ-R scale) that was sustained long term (difference in change scores –11.1, 95% CI –2.7 to –19.6). There were no differences between tai chi overall and exercise with regard to opioid use at intermediate (OR 0.89, 95% CI 0.28 to 2.80) or long term (OR 1.08, 95% CI 0.33 to 3.51).
Data for harms were insufficient. However, one trial reported two adverse events (in two patients) judged to be possibly related to qigong practice: an increase in shoulder pain and plantar fasciitis; neither participant withdrew from the study. One trial of tai chi reported no adverse events while the second (new) trial reported that, across all intensities of tai chi vs. aerobic exercise, there were no severe treatment-related adverse events and 5.3% (8/151) versus 5.3% (4/75) mild/moderate treatment-related adverse events, respectively (SOE: insufficient).

Detailed Synthesis

Table 41
Three trials [217, 218, 223] that evaluated mind-body therapies for fibromyalgia met inclusion criteria (Table 41 and Appendix D). Two trials were included in the prior AHRQ report [217, 218] and one was added for this update. [223] Sample sizes ranged from 66 to 226 (total sample=392). Across trials, the participants were predominantly female (87% to 96%), with mean ages between 51 and 52 years. Prior to study enrollment, participants in both trials were being treated with several drugs from major analgesic and adjuvant drug groups such as analgesics/NSAIDs (53% to 73%), antidepressants (35% to 48%), and anticonvulsants (21% to 27%); in one trial, approximately 30 percent of participants were taking opioids and many participants had tried a variety of other therapies (including acupuncture, chiropractic, naturopathic/homeopathic/osteopathic therapies, massage therapy, and psychological therapies). [217]

One trial compared Qigong (3 consecutive half-day training sessions, then weekly practice/review sessions for 8 weeks plus daily at-home practice for 45 to 60 minutes) to a waiting list control condition. [217] Another trial compared tai chi (two 60-minute sessions/week for 12 weeks) to an attention control condition (40 minutes of wellness education and 20 minutes of supervised stretching exercises). [218] In the Qigong trial, the mean self-reported practice time per week for all participants who completed the trial was 4.9 hours at 2 months, 2.9 hours at 4 months, and 2.7 hours at 6 months. [217] In the tai chi study, the average percent of sessions attended during the 12-week intervention was 77 percent for the tai chi group and 70 percent for the control group. [218] The third trial [223] compared four different intensities (one 60-minute session/week for 12 weeks vs. two 60-minute sessions/week for 12 weeks vs. one 60-minute session/week for 24 weeks vs. two 60-minute sessions/week for 24 weeks) of Yang style tai chi to an aerobic exercise intervention consisting of two 60-minute sessions per week for 24 weeks. Patients in the tai chi group attended 62% of all possible classes (67% vs. 65% vs. 57% vs. 58% by intensity, respectively) and those in the exercise group attended 40%. In all three trials, patients were instructed to continue the practice at home throughout the followup period. The two trials comparing Qigong and tai chi with a waitlist and an attention control reported only short-term outcomes, while the third trial comparing tai chi with exercise reported intermediate- and long-term outcomes. Both tai chi trials were conducted in the United States [218, 223] and the Qigong trial in Canada. [217]

All trials were rated fair quality (Appendix E). Due to the nature of the intervention and control groups, blinding was not possible in these trials. Other methodological concerns included unacceptable attrition overall (30% at 12 months) and differential attrition (e.g., 11% in the most frequent tai chi group vs. 24% in the comparable exercise group at 12 months) in the new tai chi trial and differential attrition between groups in the Qigong trial (intervention 19% vs. waitlist 4% at 6 months). [217]

Mind-Body Therapies Compared With Waitlist or Attention Control

Figure 50
Figure 51
All trials were included in the prior AHRQ report. Short-term improvement in function on 0 to 100 scale total FIQ score was reported for qigong (small improvement, difference –7.51, 95% CI –13.33 to –1.69) [217] and for tai chi (substantial improvement, difference –23.50, 95% CI –29.98 to –17.02) [218] compared with waitlist or attention control. Substantial heterogeneity (I²=92%) precluded meaningful pooling for this outcome (Figure 50). Significantly more participants in the tai chi group also showed clinically meaningful improvement (reduction of ≥8.1 points from baseline) on total FIQ (RR 1.6, 95% CI 1.1 to 2.3), consistent with a small effect. Tai chi and qigong were associated with a moderate improvement in pain (0 to 10 scale) compared with wait list or attention control (2 trials, pooled difference –1.44, 95% CI –2.96 to –0.23, I²=45.6%) (Figure 51). Significantly more participants in the tai chi group also showed clinically meaningful improvement (reduction of ≥2 points from baseline) in VAS pain (RR 2.0, 95% CI 1.1 to 3.8), consistent with a small effect. Heterogeneity may in part be due to differences in duration and intensity of the intervention.
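
As a rough check on the heterogeneity statistic for the function outcome, the two trial estimates and their confidence intervals can be turned into Cochran's Q and I². The following is a minimal sketch, not the report's analysis: it assumes symmetric 95% intervals based on a normal approximation (z=1.96) and inverse-variance weights, and the function names are hypothetical. The report's own pooling method may differ.

```python
def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z)


def cochran_q_and_i2(estimates, standard_errors):
    """Return Cochran's Q and I-squared using inverse-variance weights."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * d for w, d in zip(weights, estimates)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Short-term FIQ total score differences reported in the text above.
diffs = [-7.51, -23.50]                      # qigong, tai chi
ses = [se_from_ci(-13.33, -1.69),            # qigong 95% CI
       se_from_ci(-29.98, -17.02)]           # tai chi 95% CI

q, i2 = cochran_q_and_i2(diffs, ses)
print(f"Q = {q:.2f}, I-squared = {100 * i2:.0f}%")
```

Under these assumptions the sketch gives an I² of about 92 percent, in line with the value reported for Figure 50.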

Mind-body therapy resulted in significant improvement in most secondary outcomes measured. Tai chi participants showed clinically meaningful improvement in depressive symptoms as measured by the CES-D (RR 1.8, 95% CI 1.1 to 2.9), in sleep quality as measured by the PSQI (RR 2.5, 95% CI 1.1 to 5.6), and in quality of life as measured by the SF-36 PCS (RR 3.4, 95% CI 1.4 to 8.1) and MCS (RR 2.0, 95% CI 1.0 to 4.0) compared with controls; similar results were seen for mean followup scores on these measures (Table 40). [218] In the second trial, [217] compared to a waitlist control, qigong resulted in significantly improved quality of life as measured by the SF-36 PCS (difference in change from baseline 4.4, 95% CI 1.5 to 7.3) and in sleep quality as measured by the PSQI (difference in change from baseline –2.2, 95% CI –3.6 to –0.8). The change in SF-36 MCS scores did not differ between groups.

Mind-Body Therapies Compared With Pharmacological Therapy or Exercise

No trials comparing mind-body therapies with pharmacological therapy met inclusion criteria in the prior report; no new studies were identified for this update.

One new trial of different frequencies and durations of tai chi versus aerobic exercise was identified. [223] Tai chi was associated with a small improvement in function 3 to 6 months postintervention (difference in change scores –5.5, 95% CI –0.6 to –10.4, FIQ-R, 0-100 scale) when all tai chi groups were combined versus twice weekly aerobic exercise at 6 months. At 12 months (6 to 12 months postintervention), there was no difference between the combined tai chi groups and the exercise group (difference in change scores –2.7, 95% CI –2.3 to 7.7). When analysis was confined to two 60-minute sessions of tai chi per week for 24 weeks, a moderate improvement in function based on 0-100 FIQ-R at intermediate term (difference in change scores –16.2, 95% CI –8.7 to –23.6) was seen and improvement was sustained long term (difference in change scores –11.1, 95% CI –2.7 to –19.6) versus a comparable number of sessions/weeks of aerobic exercise. Once-weekly tai chi for 24 weeks was also associated with improved function at intermediate term and long term versus twice-weekly aerobics for 24 weeks, but effect sizes were slightly smaller than with twice-weekly sessions (–7.5 and –1.9 respectively, CIs not reported) and consistent with small improvement in function.

There were no differences between tai chi overall and exercise with regard to opioid use at intermediate (OR 0.89, 95% CI 0.28 to 2.80) or long term (OR 1.08, 95% CI 0.33 to 3.51). Two weekly 60-minute tai chi sessions, versus a comparable number of aerobic exercise sessions, were associated with improved HADS-A anxiety (difference 1.6, 95% CI 0.1 to 3.1) and 0-10 PGAS global assessment (difference 1.5, 95% CI 0.4 to 2.5), but no difference on the SS symptom severity (difference 0.7, 95% CI –0.3 to 1.8), HAQ (difference 1.8, 95% CI –5.9 to 9.4), BDI depression (difference 4.6, 95% CI –0.5 to 9.7), HADS-D depression (difference 1.6, 95% CI 0.0 to 3.2), SF-36 MCS (difference 2.2, 95% CI –2.7 to 7.1), SF-36 PCS (difference 3.0, 95% CI –0.7 to 6.8) or PSQI (difference 0.9, 95% CI –0.7 to 2.5) measures.
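
The reading of these secondary outcomes follows the rule stated in the Introduction: an estimate is treated as statistically significant when its confidence interval excludes the null value (0 for mean differences, 1 for ratios). A minimal sketch of that rule applied to the intervals quoted above is shown below; the helper name `excludes_null` and the dictionary layout are illustrative assumptions, and an interval that touches the null value (as for HADS-D) is treated here as including it.

```python
def excludes_null(lower, upper, null=0.0):
    """True when the interval [lower, upper] does not contain the null value."""
    return not (lower <= null <= upper)


# 95% CIs for mean differences, twice-weekly tai chi vs. aerobic exercise.
mean_difference_cis = {
    "HADS-A": (0.1, 3.1),
    "PGAS": (0.4, 2.5),
    "SS": (-0.3, 1.8),
    "HAQ": (-5.9, 9.4),
    "BDI": (-0.5, 9.7),
    "HADS-D": (0.0, 3.2),
    "SF-36 MCS": (-2.7, 7.1),
    "SF-36 PCS": (-0.7, 6.8),
    "PSQI": (-0.7, 2.5),
}

significant = [name for name, (lo, hi) in mean_difference_cis.items()
               if excludes_null(lo, hi)]

# 95% CIs for opioid-use odds ratios (null value is 1).
opioid_or_cis = {"intermediate": (0.28, 2.80), "long term": (0.33, 3.51)}
opioid_significant = [name for name, (lo, hi) in opioid_or_cis.items()
                      if excludes_null(lo, hi, null=1.0)]

print(significant)
print(opioid_significant)
```

Only HADS-A and PGAS are flagged, and neither opioid-use odds ratio is, which matches the interpretation given in the paragraph.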

Harms

In the trial of qigong, [217] there were two adverse events judged to be possibly related to the practice. One participant reported an increase in shoulder pain and another experienced plantar fasciitis; neither participant withdrew from the study. In the trial of tai chi, no adverse events were reported. [218] In the new trial, [223] across all intensities of tai chi versus aerobic exercise, there were no severe treatment-related adverse events and 5.3% (8/151) versus 5.3% (4/75) mild/moderate treatment-related adverse events, respectively.
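
The adverse-event percentages for the new trial can be reproduced directly from the counts given in the text. This short sketch only restates that arithmetic; the variable names are illustrative.

```python
def percent(events, total):
    """Proportion of participants with an event, as a percentage."""
    return 100 * events / total


# Mild/moderate treatment-related adverse events reported in the new trial.
tai_chi_pct = percent(8, 151)    # all tai chi intensities combined
exercise_pct = percent(4, 75)    # aerobic exercise

print(f"tai chi: {tai_chi_pct:.1f}%, exercise: {exercise_pct:.1f}%")
```

Both proportions round to 5.3 percent, as reported.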

Acupuncture for Fibromyalgia

Key Points

Acupuncture was associated with a small improvement in function compared with sham acupuncture as evaluated by the FIQ Total Score (0 to 100) at short-term (3 trials [1 new], pooled difference –9.21, 95% CI –13.65 to –5.78, I²=0%) and intermediate-term followup (2 trials, pooled difference –9.82, 95% CI –14.35 to –3.01, I²=27.4%) (SOE: moderate).
There was no effect of acupuncture versus sham acupuncture on pain (0 to 10 scale) in the short term (4 trials [1 new], pooled difference –0.86, 95% CI –2.73 to 0.92, I²=88.9%) or intermediate term (3 trials, pooled difference –0.65, 95% CI –1.15 to 0.17, I²=45.5%). Across control conditions (sham or attention control), there was also no effect of acupuncture (5 trials [2 new], pooled difference –1.14, 95% CI –2.66 to 0.33, I²=91.6%) (SOE: low).
Results for secondary outcomes across trials of acupuncture versus sham were inconsistent.
No data on long-term effects were reported.
Discomfort and bruising were the most common adverse events. Across two trials, discomfort was reported by 37% to 70% of those receiving true or sham acupuncture. Across two trials, bruising was reported in 6% (1/16) to 30% (29/96) of patients who received true or sham acupuncture. Vasovagal symptoms (occurring in 4% of participants who received acupuncture in one trial) and dizziness/nausea were less common adverse events associated with acupuncture (SOE: moderate).

Detailed Synthesis

Table 42
Five trials of acupuncture for fibromyalgia were identified that met inclusion criteria (Table 42 and Appendix D). [246–250] Three trials [246–248] were included in the prior AHRQ report and two trials [249, 250] were added for this update. Four trials (2 new trials) evaluated traditional Chinese needle acupuncture [246, 248–250] and the fifth evaluated acupuncture with electrical stimulation. [247]

Four studies compared acupuncture to sham [246–249]; the fifth compared it to an education attention control. [250]

One study [246] employed three different types of sham treatments (needling for an unrelated condition, sham needling, and simulated acupuncture); one employed two different types of sham procedures (sham needling and simulated acupuncture) [249]; one used sham needling [247]; and one used simulated acupuncture. [248]

Sample sizes ranged from 30 to 164 (total sample=412), mean ages from 35 to 56 years, and the proportion of females ranged from 95 percent to 100 percent. The duration of acupuncture treatment ranged from 3 to 12 weeks, with the total number of sessions ranging from six to 24. All studies except two reported short-term and intermediate-term outcomes; the two new trials reported only short-term outcomes. [249, 250] No trial had long-term followup. Three trials were conducted in the United States, [246, 247, 250] one in Spain [248] and one in Turkey. [249]

All trials except two were considered good quality; the two new trials were considered fair quality [249, 250] (Appendix E). The primary limitation across trials was lack of acupuncturist blinding to treatment allocation; for one new fair-quality trial, the intention-to-treat principle was not followed. [249] No trial reported long-term outcomes.

Acupuncture Compared With Sham or Attention Control

Figure 52
Figure 53
Acupuncture was associated with a small improvement in function compared with sham acupuncture as evaluated by the FIQ Total Score (0 to 100) at short-term followup (3 trials, pooled difference –9.21, 95% CI –13.65 to –5.78, I²=0%) [247–249] and intermediate-term followup (2 trials, pooled difference on 0-100 scale, –9.82, 95% CI –14.35 to –3.01, I²=27.4%) [247, 248] (Figure 52). There was, however, no effect of acupuncture versus sham acupuncture on pain (0-10 scale) in the short term (4 trials, pooled difference –0.86, 95% CI –2.73 to 0.92, I²=88.9%) [246–249] or intermediate term (3 trials, pooled difference –0.65, 95% CI –1.15 to 0.17, I²=45.5%) [246–248] (Figure 53). Results based on mean difference in change scores were similar (data not shown). These conclusions are the same as in the previous report. All trials versus sham, except one, were considered good quality; the new trial [249] was considered fair quality. In the new trial, acupuncture was also compared with simulated acupuncture; at short term, a moderate improvement in function (difference –11.9, 95% CI –23.1 to –0.8, FIQ 0-100) and large improvement in pain (difference –3.7, 95% CI –5.1 to –2.4, VAS 0-10) were reported. [249] Another new, small trial of group acupuncture versus education attention control found a benefit at short term on VAS pain [250]; however, across control conditions (sham or attention control), there was no effect of acupuncture short term (5 trials [2 new], pooled difference –1.14, 95% CI –2.66 to 0.33, I²=91.6%). [246–250] Substantial heterogeneity was noted and may be due to a variety of factors including differences in intervention delivery across studies and lack of blinding (attention control).

Results for secondary outcomes across trials of acupuncture versus sham were inconsistent. In the trial of acupuncture versus three different types of sham acupuncture, [246] there was no significant benefit of acupuncture versus the combined sham groups on the SF-36 MCS score, a measure of sleep quality, or a measure of overall well-being. In the trial of six acupuncture treatments over 2 to 3 weeks, there was a benefit for true versus sham acupuncture at 1 and 7 months on the FIQ subscale of anxiety, but not depression, sleep, or well-being. [247] In the trial of one 20-minute session per week for 9 weeks plus pharmacological treatment as prescribed by a general practitioner, there was a benefit for true versus sham acupuncture at 1 month for the SF-12 MCS scale (mean relative change 30.6%, 95% CI 19.7 to 41.5 vs. 13.9%, 95% CI 5.4 to 22.5; Cohen’s d=0.38, p=0.01), and at 9.75 months for the Hamilton Rating Scale for Depression (mean relative change –19.1%, 95% CI –34.2 to –3.9 vs. –5.9%, 95% CI –16.6 to –4.8, Cohen’s d=0.22, p=0.01) and the SF-12 Mental Component scale (mean relative change, 23.0%, 95% CI 13.7 to 32.4 vs. 9.4%, 95% CI 1.9 to 16.9; Cohen’s d=0.36, p=0.01). [248] In the new trial of acupuncture versus sham and simulated acupuncture, [249] comparing acupuncture versus sham short term, there was a benefit for acupuncture on the 0-100 NHP sleep measure (difference –38.2, 95% CI –55.9 to –20.6) and the 0-63 BDI depression measure (difference –21.2, 95% CI –29.5 to –13.0). Comparing acupuncture versus simulated acupuncture short term, there was a benefit of acupuncture on the NHP sleep scale (difference –53.6, 95% CI –71.6 to –35.7) and 0-63 BDI (difference –25.2, 95% CI –32.4 to –18.1). [249]

Acupuncture Compared With Pharmacological Therapy or Exercise

No trial of acupuncture versus pharmacological therapy or versus exercise met inclusion criteria.

Harms

Discomfort and bruising were the most commonly reported adverse events. In one trial, [246] 89 of 96 treated (true or sham acupuncture) participants reported adverse events; 35 of 96 (37%) reported discomfort at needle insertion sites, 29 of 96 (30%) reported bruising, 3 of 96 (3%) reported nausea, and one of 96 (1%) felt faint at some point during the study. Discomfort was significantly less common among patients assigned to simulated acupuncture (five of 19, 29%) than among those assigned to directed acupuncture (14 of 23, 61%), acupuncture for an unrelated condition (15 of 22, 70%), or sham needling (14 of 22, 64%); p=0.02. In one trial, [247] two of 50 (4%) experienced mild vasovagal symptoms and 1 of 50 (2%) experienced a pulmonary embolism believed to be unrelated to treatment. Mild bruising and soreness were reported to be more common in the true acupuncture group, but rates were not reported. In one study, [248] 2.6 percent of sessions led to aggravation of fibromyalgia symptoms and 0.5 percent led to headache. In the true acupuncture group, pain, bruising, and vagal symptoms presented after 4.7 percent of sessions. In one new trial, no serious adverse events were reported but some patients experienced discomfort and bruising at the sites of needle insertion. [249] In the other new trial, bruising and dizziness were reported in one patient following acupuncture (of 16 randomized or 6%) versus no patients randomized to attention control. [250]

Multidisciplinary Rehabilitation for Fibromyalgia

Key Points

More multidisciplinary treatment participants experienced a clinically meaningful improvement in FIQ total score (≥14% change) compared with usual care at short (odds ratio [OR] 3.1, 95% CI 1.6 to 6.2), intermediate (OR 3.1, 95% CI 1.5 to 6.4), and long term (OR 8.8, 95% CI 2.5 to 30.9) in one poor-quality trial. Multidisciplinary treatment was associated with a small improvement in function (based on a 0-100 FIQ total score) versus usual care or waitlist in the short term (3 trials, pooled difference –6.08, 95% CI –14.17 to 0.16, I²=49%), and versus usual care at intermediate term (3 trials, pooled difference –7.77, 95% CI –12.22 to –3.83, I²=0%) and long term (2 trials, pooled difference –8.54, 95% CI –15.00 to –1.30, I²=0%) (SOE: low for short, intermediate and long term).
Multidisciplinary treatment was associated with a small improvement in pain compared with usual care or waitlist at intermediate term (3 trials, pooled difference –0.68, 95% CI –1.10 to –0.27, I²=0%); there were no clear differences compared with usual care or waitlist in the short term (2 trials [excluding an outlier trial], pooled difference on a 0-10 scale –0.24, 95% CI –0.63 to 0.15, I²=0%) or with usual care in the long term (2 trials, pooled difference –0.25, 95% CI –0.79 to 0.36, I²=0%) (SOE: low for short, intermediate and long term).
There were no differences between multidisciplinary pain treatment versus aerobic exercise at long term in one trial for function (difference –1.10, 95% CI –8.40 to 6.20, 0-100 FIQ total score) or pain (difference 0.10, 95% CI –0.67 to 0.87, 0-10 FIQ pain scale) (SOE: low).
Data were insufficient for harms. However, one poor-quality study reported on adverse events, stating that 19 percent of participants randomized to multidisciplinary treatment withdrew (versus 0% for waiting list) and two of these 16 patients gave increased pain as the reason. Reasons for other withdrawals were not given and there was not systematic reporting of adverse events (SOE: insufficient).

Detailed Synthesis

Table 43
We identified six trials (across 8 publications) of multidisciplinary treatments that met inclusion criteria (Table 43 and Appendix D). [96, 262–268] All the trials were included in the prior AHRQ report. Across trials, sample sizes ranged from 66 to 203 (total sample=801) and participants were predominantly (>90%) female with mean ages between 40 and 50 years. The multidisciplinary treatments included physical therapy or exercise training in all trials, as well as CBT and pharmacological therapy (2 trials) [263, 266]; CBT and an educational program (1 trial) [268]; sociotherapy, psychotherapy, and creative arts therapy (1 trial) [96]; relaxation exercises (1 trial) [265]; and education and group discussions (1 trial). [262] All trials compared multidisciplinary treatment with usual care or waitlist; in addition, one trial compared it with exercise. [96] Treatment duration ranged from 2 to 12 weeks and the frequency of sessions from once a week to daily (total number of sessions ranged from 12 to 24 with durations between 1.5 and 5 hours). One of the trials included two intervention arms. [268] The long-term multidisciplinary arm (2 days of education and exercise followed by 10 weeks of CBT) was determined to be most consistent with interventions employed by the other trials and was included in the pooled estimates below; results for the short-term group (2 days of education, exercise, and CBT programs) were similar to those of the long-term group and can be found in Table 42. Three trials reported outcomes over the short term (3 to 5.5 months), [262, 263, 268] three over the intermediate term (6 months), [263, 265, 266] and two over the long term (12 and 18 months). [96, 263] Five trials were conducted in Europe [96, 262–267] and one trial in Turkey. [268]

Three trials were judged to be of fair quality [96, 262, 268] and three trials were rated poor quality [263, 265, 266] (Appendix E). The nature of the intervention precluded blinding of participants and of people administering the treatments. Additional methodological shortcomings in the poor quality trials included unclear allocation concealment methods and high rates of overall attrition (21% to 43%) and differential attrition (12% to 13%) between groups.

Multidisciplinary Rehabilitation Compared With Usual Care or Waitlist

Figure 54
Clinically important FIQ improvement (≥14% change) was significantly more common for multidisciplinary treatment compared with usual care at short- (odds ratio [OR] 3.1, 95% CI 1.6 to 6.2), intermediate- (OR 3.1, 95% CI 1.5 to 6.4) and long-term followup (OR 8.8, 95% CI 2.5 to 30.9) in one poor-quality trial. [263] Multidisciplinary treatment for fibromyalgia was associated with a small improvement in function versus usual care or waitlist based on a 0 to 100 FIQ total score in the short term (3 trials, pooled difference –6.08, 95% CI –14.17 to –0.16, I²=48.9%), [262, 263, 268] and versus usual care in the intermediate term (3 trials, pooled difference –7.77, 95% CI –12.22 to –3.83, I²=0%) [263, 265, 266] (Figure 54). The short-term estimate for trials of multidisciplinary treatment versus usual care only was similar (2 trials, pooled difference –9.74, 95% CI –16.38 to –3.83). [263, 268] The slightly smaller effect of multidisciplinary rehabilitation versus usual care persisted over the long term (2 trials, pooled difference on 0–100 scale –8.54, 95% CI –15.00 to –1.30, I²=0%). [96, 263] Only one poor-quality trial reported short-term, intermediate-term, and long-term effects on function, showing a significant result for each time frame. [263]

Figure 55
Clinically important improvement in pain (≥30% change on a 0–10 scale) was more common for multidisciplinary treatment compared with usual care at intermediate-term followup in one poor-quality trial (OR 3.4, 95% CI 1.0 to 10.8) [263]; no statistically significant differences were seen between groups at short- or long-term followup. There were no clear effects of multidisciplinary treatment for fibromyalgia on pain versus usual care or waitlist in the short term (3 trials, pooled difference on a 0–10 scale –0.84, 95% CI –2.56 to 0.64, I²=83.6%), [262, 263, 268] but statistical heterogeneity was very large (Figure 55). Excluding an outlier trial (difference –2.50, 95% CI –3.73 to –1.27) [268] reduced the statistical heterogeneity and resulted in an attenuated effect (pooled difference –0.24, 95% CI –0.63 to 0.15, I²=0%). At intermediate term, multidisciplinary treatment was associated with a small improvement in pain compared with usual care (3 trials, pooled difference on a 0–10 scale –0.68, 95% CI –1.10 to –0.27, I²=0%). [263, 265, 266] Long term, there were no clear effects of multidisciplinary treatment on pain versus usual care (2 trials, pooled difference –0.25, 95% CI –0.79 to 0.36, I²=0%). [96, 263] Only one poor-quality trial reported short-, intermediate-, and long-term effects on pain, showing a significant result only at intermediate term. [263]
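
The interpretation rule used throughout these results (a pooled estimate is statistically significant only when its 95% CI excludes the null value: 0 for mean differences, 1 for ratios) can be illustrated with a minimal Python sketch. This is an illustration only, not the report's analysis code; the function name `ci_excludes_null` is hypothetical, and the intervals are the pooled pain estimates quoted above.

```python
def ci_excludes_null(lower: float, upper: float, null_value: float = 0.0) -> bool:
    """Return True when the interval lies entirely on one side of the null value."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return upper < null_value or lower > null_value


# Pooled pain differences (0-10 scale) versus usual care or waitlist, as quoted above.
pooled_pain = {
    "short term": (-2.56, 0.64),
    "intermediate term": (-1.10, -0.27),
    "long term": (-0.79, 0.36),
}

for term, (lower, upper) in pooled_pain.items():
    verdict = "significant" if ci_excludes_null(lower, upper) else "no clear difference"
    print(f"{term}: {verdict}")
```

For odds ratios or risk ratios, the same check would be run with `null_value=1.0`.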

Results were mixed across the six trials for effects of multidisciplinary treatment on secondary outcomes. Three trials were fair quality. [96, 262, 268] Across the three fair-quality trials, there were no significant differences between multidisciplinary treatment and usual care or waitlist on measures of anxiety (Generalized Anxiety Disorder–10, FIQ anxiety subscale) in two trials [96, 262] and depression (Major Depression Inventory, FIQ depression subscale, BDI) in three trials [96, 262, 268] over short-term or long-term followup. Regarding quality of life, two of these trials reported no differences between groups on the SF-36 PCS and MCS and the EQ-5D [96, 262] while the third reported significant improvement on the SF-36 PCS but not the MCS. [268] One trial reported no difference in healthcare utilization between groups during the 2 months prior to the final measurement at 18 months. [96]

Multidisciplinary Rehabilitation Compared With Pharmacological Therapy

No trial of multidisciplinary rehabilitation versus pharmacological therapy met inclusion criteria.

Multidisciplinary Rehabilitation Compared With Exercise

There was no clear effect of multidisciplinary pain treatment versus aerobic exercise at long term in one fair-quality trial [96] for physical function on the FIQ physical function scale (difference 0 on a 0–10 scale, 95% CI –0.79 to 0.79) or the FIQ total score (difference –1.10 on a 0–100 scale, 95% CI –8.40 to 6.20). Similarly, there were no significant differences on the FIQ pain scale (difference 0.10 on a 0–10 scale, 95% CI –0.67 to 0.87), or on secondary outcomes of quality of life, depression, anxiety, or healthcare utilization, with the exception of physiotherapist consultations, which were more frequent in the multidisciplinary group in the 2 months prior to the final measurement at 18 months (Table 43).

Harms

Adverse events were poorly reported by the included trials. One trial that compared multidisciplinary treatment (group pool sessions of physiotherapy, relaxation exercises, and exercise) with usual care (physical therapy, drug treatment and, in some cases, psychotherapy) [265] reported that 16 of 84 (19%) multidisciplinary participants withdrew (versus 0% for usual care) and two of these gave increased pain as the reason. Reasons for other withdrawals were not given and there was no systematic reporting of adverse events.

      Key Question 5. Chronic Tension Headache

No new trials of nonpharmacological treatments for chronic tension headache that met our inclusion criteria were identified for this update.

Psychological Therapies for Chronic Tension Headache

Key Points

There is insufficient evidence from three poor-quality trials to determine the effects of psychological therapies (CBT, relaxation) on short-term or intermediate-term function or pain compared with waitlist, placebo, or attention control (SOE: insufficient).
There is insufficient evidence from two poor-quality trials to determine the effects of CBT on short-term or intermediate-term function or pain compared with antidepressant medication (SOE: insufficient).
No long-term outcomes were reported and no trials comparing psychological therapies to biofeedback were identified that met inclusion criteria.
Data were insufficient for harms. Results were mixed across two poor-quality trials comparing CBT with antidepressant medication, with one trial reporting a lower risk of “at least mild” adverse events in the CBT group (0% vs. 59%), four of which led to withdrawal from the trial, and the second trial reporting a similarly low risk of withdrawal due to adverse events (2% to 6% across groups, including placebo) (SOE: insufficient).

Detailed Synthesis

Table 44
Three trials, all conducted in the United States, [128, 129, 132] of CBT for chronic tension headache met inclusion criteria (Table 44 and Appendix D). Sample sizes ranged from 41 to 150 (total sample=268); the mean age across trials varied from 32 to 42 years and most participants were female (56% to 80%). Duration since the onset of headache pain ranged from 10.7 to 14.5 years. All trials either excluded patients with concomitant migraines or required that they suffer from no more than one migraine per month. Two trials also specifically excluded patients with medication overuse (analgesic-abuse) headaches and required that patients be free from prophylactic headache medication upon study entry. [129, 132]

All three trials evaluated some variation of stress management therapy/cognitive coping skills training with a relaxation component; one trial (n=77) also included an additional relaxation only arm. [128] In two trials (n=41, 150), patients received three 60-minute sessions of CBT and training in home-based relaxation, [129, 132] and in the third trial (n=77), patients underwent 11 sessions (1-2 per week) of CBT plus progressive muscle relaxation training (session duration varied from 45 to 90 minutes). [128] In all trials, the interventions were administered by a psychologist or counselor over a 2-month period. Two trials compared CBT with placebo (placebo pill), [129] attention control (pseudomeditation/body awareness training) [128] and waitlist (monitoring via phone and clinical visits) control groups. [128] Two trials compared CBT with amitriptyline (25-75 mg/day). [129, 132] All trials reported short-term results; one trial also provided outcomes at intermediate-term followup. [129]

All three trials were considered poor quality (Appendix E) due to lack of blinding and large differential attrition between groups (in one trial, overall attrition was also substantial [129]). Additionally, randomization, concealment, and intention-to-treat processes were unclear in one trial. [132]

Psychological Therapy Compared With Waitlist, Placebo, or Attention Control

There was insufficient evidence from three poor-quality trials to draw conclusions regarding the effects of psychological therapies compared with waitlist, placebo, or attention control over the short term or intermediate term.

CBT plus placebo was associated with a small improvement in both short-term and intermediate-term function compared with placebo alone as measured by the Headache Disability Inventory (HDI) (scale 0–100) in one trial (difference 7.3, 95% CI 1.6 to 13.0 at 1 month and 9.3, 95% CI 3.5 to 15.1 at 6 months). [129] Long-term function was not reported.

Figure 56
Figure 57
Various pain measures were reported across trials. In general, CBT (plus relaxation), but not relaxation alone, appeared to have a small effect on short-term pain compared with waitlist, placebo, or attention control (Table 44). CBT plus relaxation was associated with a small improvement in pain on the Headache Index (HI) at 1 month compared with waitlist, attention control, or placebo across two trials (pooled SMD –0.40, 95% CI –0.79 to 0.00, I²=0%) [128, 129] (Figure 57). Relaxation only conferred no benefit for short-term pain compared with waitlist or attention control in one of these trials (difference –0.21 on a 0–20 HI scale, 95% CI –0.78 to 0.36). [128] Almost twice as many patients who received CBT plus relaxation achieved at least a 50 percent improvement in headache frequency compared with attention control or waitlist (risk ratio [RR] 1.94, 95% CI 1.03 to 3.66) over the short term in one trial; however, there was no difference between groups when the intervention was relaxation alone (RR 0.98, 95% CI 0.42 to 2.26) [128] (Figure 56). One trial reported similar favorable results regarding pain over the intermediate term for CBT plus placebo compared with placebo alone (difference –0.65, 95% CI –1.06 to –0.24) (Figure 57), with the exception of “success” (≥50% improvement from baseline in HI score), which did not differ between groups (Table 44). [129]

Medication use did not differ significantly between the CBT and relaxation therapy groups and the waitlist, placebo, or attention control groups over the short term in two trials. [128, 129] Over the intermediate term, CBT plus placebo resulted in a significant reduction in analgesic use compared with placebo alone (difference 11.8, 95% CI 1.5 to 22.1). [129]

Psychological Therapy Compared With Pharmacological Therapy

There was insufficient evidence from two poor-quality trials to draw conclusions regarding the effect of CBT versus pharmacological therapy through intermediate-term followup.

There was no difference between CBT plus placebo and antidepressant medication over the short term or intermediate term for function as measured by the HDI (scale 0–100) in one trial (difference 0.1, 95% CI –5.6 to 5.7 at 1 month and 2.4, 95% CI –3.3 to 8.0 at 6 months). [129] Long-term function was not reported.

Regarding short-term pain, two trials reported HI scores with differing results. One trial found that CBT plus placebo resulted in less improvement compared with antidepressant medication at 1 month (SMD 0.50, 95% CI 0.11 to 0.89), [129] whereas the other trial showed an improvement with CBT versus amitriptyline by 1 month, although the difference did not reach statistical significance (SMD –0.59, 95% CI –1.26 to 0.08) [132] (Figure 57); because of the significant heterogeneity between trials, we did not use the pooled estimate. There were no significant differences between CBT and pharmacological treatment for any other pain outcome reported over the short term in both trials [129, 132] or over the intermediate term in one trial [129] (Table 44).

Short-term results were mixed regarding medication use, with one trial reporting no difference between CBT and amitriptyline [132] and the other reporting a significant difference between groups favoring antidepressant therapy [129]; however, this difference did not persist to the intermediate term in the latter trial (Table 44).

Psychological Therapy Compared With Biofeedback

No trial of psychological therapy versus biofeedback met inclusion criteria.

Harms

Harms were reported by the two poor-quality trials comparing CBT with antidepressant medication, [129, 132] one of which also included a placebo group. [129] No patient who underwent CBT experienced an adverse effect versus 10 of 17 (59%) of those who took medication in one trial; [132] six events were classified as mild, two as moderate, and two as substantial (no further details provided). Four of these patients withdrew from the trial. The risk of withdrawal due to adverse events was similar across groups in the second trial: CBT (2%) versus antidepressant medication (2%) and placebo (6%); no other information was provided. [129]

Physical Modalities for Chronic Tension Headache

Key Points

There is insufficient evidence from one poor-quality trial to determine the effects of occipital transcutaneous electrical stimulation (OTES) on short-term function or pain compared with sham (SOE: insufficient).
No longer-term outcomes were reported and no trials comparing physical modalities to pharmacological therapy or to biofeedback were identified that met inclusion criteria.
Data were insufficient for harms; however, no adverse events occurred in either the real or the sham OTES group in one poor-quality trial (SOE: insufficient).

Detailed Synthesis

Table 45
Only one Italian trial [169] was identified that investigated the efficacy of OTES versus sham (Table 45 and Appendix D). Patients were excluded if they had undergone prophylactic treatment in the prior 2 months or had previous treatment with OTES. Acute medication use was permitted during the study period, but other methods of pain control or new preventive treatments were prohibited. At baseline, 46 percent of patients were overusing medications. Identical devices and procedures were used for both the real and the sham OTES, and treatment consisted of 30-minute sessions, three times per day for two consecutive weeks. Limited information on the timing of outcomes was provided, but it was assumed that data were collected at 1 and 2 months post-treatment. This trial was rated poor quality due to unclear randomization sequence, failure to control for dissimilar proportions of females between groups, and no reporting of attrition (Appendix E). The focus of the trial was on allodynia, which was not of interest to this report.

Physical Modalities Compared With Sham

There were insufficient data from one poor-quality trial to determine the short-term effects of OTES compared with sham. [169] OTES resulted in greater improvement in function at 2 months as measured by the Migraine Disability Assessment Questionnaire (difference –35.0, 95% CI –42.6 to –27.4, scale 0–21+) and in pain intensity as measured by VAS (difference –5.0 on a 0–10 scale, 95% CI –5.8 to –4.2). The proportion of patients who achieved a 50 percent or greater reduction in headache days also favored OTES (RR 12.4, 95% CI 3.2 to 47.3). Measures of depression and anxiety were both somewhat better following OTES compared with sham at 2 months; however, the between-group difference was statistically significant only for anxiety (Table 45). The proportion of patients overusing medications at 2 months was also significantly lower in the OTES group.

Physical Modalities Compared With Pharmacological Therapy or Biofeedback

No trial of physical modalities versus pharmacological therapy or biofeedback met inclusion criteria.

Harms

Authors report that neither adverse events nor side effects occurred in either the real or the sham OTES group in one poor-quality trial. [169]

Manual Therapies for Chronic Tension Headache

Key Points

Spinal manipulation therapy, compared with usual care, was associated with small and moderate improvements, respectively, in function (difference –5.0, 95% CI –9.02 to –1.16 on the Headache Impact Test, scale 36-78 and difference –10.1, 95% CI –19.5 to –0.64 on the Headache Disability Inventory, scale 0 to 100) and pain intensity (difference –1.4 on a 0-10 NRS scale, 95% CI –2.69 to –0.16) over the short term in one fair-quality trial (SOE: low). Approximately 25 percent of the patients had comorbid migraine.
There is insufficient evidence from one poor-quality trial to determine the effects of spinal manipulation therapy on short-term pain compared with amitriptyline (SOE: insufficient).
No longer-term outcomes were reported and no trials comparing manual therapies to biofeedback were identified that met inclusion criteria.
No adverse events occurred in the fair-quality trial comparing spinal manipulation to usual care, and significantly fewer adverse events were reported following manipulation versus amitriptyline in the poor-quality trial (4.3% vs. 82.1%; RR 0.05, 95% CI 0.02 to 0.16). The risk of withdrawal due to adverse events was not significantly different (1.4% vs. 8.9%; RR 0.16, 95% CI 0.02 to 1.33). Common complaints were neck stiffness in the manipulation group and dry mouth, drowsiness, and weight gain in the medication group (SOE: low).

Detailed Synthesis

Table 46
Two trials (n=75 and n=126) [187, 188] that evaluated spinal manipulation therapy (SMT) for the treatment of chronic tension headache met inclusion criteria (Table 46 and Appendix D). The majority of patients in both trials were female (61% to 78%) with mean ages ranging from 40 to 42 years and a mean headache duration of 13 years. Both trials included patients with comorbid migraine as long as their headache problem was determined by a physician to be predominantly tension-type in nature (this included 26% of patients in one trial, [187] proportion not reported in the other trial). In one trial, patients were specifically excluded if they met the criteria for medication overuse or if they had received manual therapy in the 2 months prior to enrollment. [187] At baseline, prophylactic medication use was common. Current or past use of other treatments was not reported.

One Dutch trial compared a maximum of nine 30-minute sessions of SMT over 8 weeks with usual care (information, reassurance and advice, discussion of lifestyle changes, and analgesics or NSAIDs provided by a general practitioner). [187] The second trial, conducted in the United States, compared 12 SMT sessions of 20 minutes over a 6-week treatment period versus amitriptyline (maximum dose 30 mg/day). [188] Both trials reported only short-term outcomes. One trial was rated fair quality [187] and one poor quality [188] (Appendix E). Due to the nature of the interventions, blinding of patients and researchers was not possible. Additionally, the poor-quality trial had a high rate of differential attrition (7% SMT and 27% amitriptyline).

Manual Therapies Compared With Usual Care

Only short-term data from one fair-quality trial were reported. SMT resulted in moderate and small improvements in function compared with usual care at 4.5 months post-treatment as measured by the Headache Disability Inventory (HDI, scale 0 to 100) and the Headache Impact Test (HIT-6, scale 36 to 78), respectively (difference between groups in change scores from baseline, –10.1, 95% CI –19.5 to –0.64 and –5.0, 95% CI –9.02 to –1.16). [187] Regarding pain outcomes, twice as many patients who received SMT experienced a ≥50% reduction from baseline in the number of headache days (per 2 weeks) compared with usual care: 81.6% versus 40.5%; RR 2.0 (95% CI 1.3 to 3.0). [187] Similarly, a statistically significantly greater reduction in the number of headache days (difference between groups in change scores from baseline, –4.9, 95% CI –6.95 to –2.98) and in headache pain intensity (difference in change scores from baseline, –1.4 on a 0 to 10 NRS scale, 95% CI –2.69 to –0.16) was seen following SMT. Given that 29 percent of SMT patients and 22 percent of usual care patients had comorbid migraine, it is unclear how the coexistence of these headache types may have affected the outcome. The proportion of patients who used any additional healthcare services (e.g., physical therapy, medical specialists, other) was statistically significantly lower in the SMT group compared with the usual care group (Table 46). [187] Authors report no statistically significant differences between treatments in analgesic or NSAID use; data were not provided.
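
As a quick arithmetic check on the responder result above, a risk ratio is simply the event proportion in the intervention group divided by that in the comparator group. The short Python sketch below is illustrative only (the function name `risk_ratio` is hypothetical and this is not the report's analysis code); it reproduces the point estimate from the two reported percentages. The confidence interval cannot be recomputed this way because the underlying patient counts are not given in the text.

```python
def risk_ratio(risk_intervention: float, risk_comparator: float) -> float:
    """Ratio of two event proportions, each given as a fraction between 0 and 1."""
    if not 0.0 <= risk_intervention <= 1.0:
        raise ValueError("intervention risk must be between 0 and 1")
    if not 0.0 < risk_comparator <= 1.0:
        raise ValueError("comparator risk must be greater than 0 and at most 1")
    return risk_intervention / risk_comparator


# Patients with a >=50% reduction in headache days: 81.6% (SMT) vs. 40.5% (usual care).
rr_responders = risk_ratio(0.816, 0.405)
print(f"RR = {rr_responders:.1f}")
```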

Manual Therapies Compared With Pharmacological Therapy

The evidence was insufficient from one poor-quality trial to determine the effects of spinal manipulation compared with amitriptyline over the short term. [188] The spinal manipulation group showed more improvement compared with the amitriptyline group in daily headache intensity (adjusted difference –1.4, 95% CI –2.3 to –0.3), weekly headache frequency (adjusted difference –4.2, 95% CI –6.5 to –1.9), Short Form-36 Function score (adjusted difference 4.9, 95% CI 0.4 to 9.4), and over-the-counter medication use (difference –0.9, 95% CI –1.5 to –0.3) at 1 month. Attrition in the amitriptyline group was 27 percent, compared with 7 percent in the manipulation group.

Manual Therapies Compared With Biofeedback

No trial of manual therapies versus biofeedback met inclusion criteria.

Harms

No adverse events occurred in the trial comparing spinal manipulation to usual care. [187] The poor-quality trial reported significantly fewer adverse events following spinal manipulation compared with amitriptyline (4.3% vs. 82.1%; RR 0.05, 95% CI 0.02 to 0.16), but the risk of withdrawal due to adverse events was not significantly different (1.4% vs. 8.9%; RR 0.16, 95% CI 0.02 to 1.33). [188] Patients in the manipulation group complained of neck stiffness, which resolved in all cases; common side effects in the amitriptyline group included dry mouth, drowsiness, and weight gain.

Acupuncture for Chronic Tension Headache

Key Points

There is insufficient evidence from two poor-quality trials to determine the effects of Traditional Chinese needle acupuncture on short-term (2 trials), intermediate-term (1 trial), or long-term (1 trial) pain compared with sham acupuncture (SOE: insufficient).
Laser acupuncture was associated with a small improvement in pain intensity (median difference –2, IQR 6.3, on a 0-10 VAS scale) and in the number of headache days per month (median difference –8, IQR 21.5) over the short term versus sham in one fair-quality trial (SOE: low).
No trials comparing acupuncture to pharmacological therapy or to biofeedback were identified that met inclusion criteria.
The fair-quality trial evaluating laser acupuncture reported that no adverse events occurred in either group (SOE: low).

Detailed Synthesis

Table 47
Three small trials (N=30 to 50; total sample=119) [251–253] that evaluated acupuncture versus sham treatment for chronic tension headaches met inclusion criteria (Table 47 and Appendix D). Two trials employed traditional Chinese needle acupuncture, [252, 253] while one used low-energy laser acupuncture. [251] The number of acupoints ranged from 6 to 10 across studies. The duration of treatment ranged from 5 to 10 weeks, with the total number of sessions ranging from 8 to 10 (each lasting 20 to 30 minutes, 1 to 3 times per week). Sham treatment consisted of irrelevant acupuncture (superficial needle insertion in areas without acupuncture points) or sham acupuncture (a blunt needle that simulates puncturing of the skin, or laser power output set to zero).

Across trials, participants were primarily female (49% to 87%), mean ages ranged from 33 to 49 years, and headache frequency ranged from 18 to 27 days per month. Two trials specifically excluded patients with other causes of chronic headache [251, 252]; the third trial did not note whether any of the patients had concomitant headaches. [253] One trial required patients to abstain from all other prophylactic therapies (with the exception of rescue analgesics), [253] and one trial excluded patients who had received any treatment for their headache in the 2 weeks prior to enrollment. [251] Concomitant (nonnarcotic) medication was permitted in two trials; [252, 253] the third stated that no patient took concomitant analgesics. [251] All trials assessed outcomes over the short term; one trial additionally provided intermediate- and long-term data. [253]

One trial was rated fair quality [251] and two poor quality [252, 253] (Appendix E). In all three trials, random sequence generation and concealment of allocation were not clearly reported and the care providers were not blinded to treatment. Additional methodological concerns in the poor-quality trials included unclear application of intention-to-treat methods, and failure to control for disproportionate baseline characteristics or to account for loss to followup in one trial each.

Acupuncture Compared With Sham

Figure 58
None of the trials reported on function. All three trials reported pain outcomes, although the specific measures varied across the trials. The results were mixed depending on the type of acupuncture used. No significant differences were found between needle acupuncture and sham for any pain outcome evaluated during the short term in two small poor-quality trials, [252, 253] or at intermediate- and long-term followup in one of these trials [253] (Table 47). In the third small fair-quality trial, [251] laser acupuncture resulted in a significant reduction in the number of headache days per month (median –8, interquartile range [IQR] 21.5), in pain intensity on a 0 to 10 VAS scale (median –2, IQR 6.3), and in the duration of attacks (median –4 hours, IQR 7.5) over the short term compared with the sham group, which reported no improvement from baseline on any outcome at the 3-month followup (p<0.001 for all). Substantial heterogeneity (I²=91%) precluded meaningful pooling for this outcome (Figure 58).

Acupuncture Compared With Pharmacological Therapy or Biofeedback

No trial of acupuncture versus pharmacological therapy or biofeedback met inclusion criteria.

Harms

Harms were generally not reported. The trial evaluating laser acupuncture reported that no adverse events occurred in either group. [251]

      Key Question 6. Differential Efficacy

RCTs that stratified on patient characteristics of interest, permitting evaluation of factors that might modify the effect of treatment, were considered for inclusion. Factors included age, sex, presence of comorbidities (e.g., emotional or mood disorders) and degree of nociplasticity/central sensitization. If a comparison is not listed below there was either no evidence identified that met the inclusion criteria or the included trials did not provide information on differential efficacy or harms. Studies likely had insufficient sample size to evaluate differential efficacy or harms, and evidence was considered insufficient.

Osteoarthritis Knee Pain

Key Points

There is insufficient evidence from one fair-quality trial (across 3 publications) that age, sex, race, BMI, baseline disability, pain, or depression status modify the effects of exercise in patients with OA of the knee. Sample sizes in the subgroup analyses from the Fitness, Arthritis and Seniors Trial (FAST) were likely inadequate to effectively test for modification.

Exercise Compared With Attention Control

One fair-quality trial (n=439), the FAST, reported across three publications [51, 57, 58] and included in Key Question 3, compared muscle performance (i.e., resistance training) and aerobic exercise programs to an attention control and formally evaluated factors that may modify treatment in patients with OA of the knee. Details regarding these study populations are available in the Results section for Key Question 3 and in Appendix D. Two of the reports performed formal tests for interaction; none of the demographic or clinical variables evaluated were found to modify the effect of either type of exercise. [57, 58] One publication explored whether age, sex, race, BMI, baseline disability, or baseline pain modified the effects of exercise on function based on ADL disability measures in a subgroup of patients who were free of ADL disability upon enrollment; however, no data were provided for evaluation. [57]

A second publication looked at whether the effects of exercise on pain, disability, and depression were modified by baseline depression status, that is, high versus low depressive symptomology according to the Center for Epidemiologic Studies Depression scale over time (using an adjusted repeated measures analysis of variance). However, the authors do not provide results that directly examined modification by baseline depression without the time component. [58] The third FAST publication stratified on age, sex, race, and BMI and did not perform a formal statistical test for interaction. [51] Upon visual inspection, the point estimates across groups and strata are similar, suggesting that the effect of exercise on physical disability and knee pain was not modified by any patient characteristic evaluated.

Osteoarthritis Hip Pain

Key Points

There is insufficient evidence from one fair-quality trial that age, sex, baseline pain, and the presence of radiographic OA modify the effects of exercise in patients with OA of the hip. Study authors only reported on effects that include evaluation of these factors over time. Sample size was likely inadequate to effectively test for modification.

Exercise Compared With Usual Care

One fair-quality trial (n=203) included for Key Question 3 compared combination exercise therapy (strengthening, stretching, and endurance exercises) to usual care and stratified on age, sex, race, and BMI, but it did not formally test for interaction. [74] Details regarding this study population are available in the Results section for Key Question 3 and in Appendix D. Age, sex, education, self-reported knee OA, and baseline pain and Kellgren & Lawrence radiographic OA scores were defined a priori as subgroups of interest. Although older patients (age ≥65 years), women, patients with a lower NRS pain score at baseline, and patients with radiographic OA showed somewhat larger effects of exercise therapy on function and pain, data were not systematically reported and, based on the data provided, overlapping confidence intervals suggest that the effect of exercise was not modified by any of these variables.

Fibromyalgia

Key Points

There is insufficient evidence from one poor-quality trial that baseline BMI (normal, overweight, obese) modifies the effects of multidisciplinary rehabilitation in patients with fibromyalgia. Study authors only report on effects that include evaluation of these factors over time. Sample size was likely inadequate to effectively test for modification.

Multidisciplinary Rehabilitation Compared With Usual Care

An additional publication (n=130) [264] of a poor-quality trial [263] included for Key Question 4 that compared multidisciplinary rehabilitation to usual care assessed potential modification of treatment based on baseline BMI (normal, overweight, obese). No significant interactions were found for the effect of BMI on exercise over time for any pain or function measure evaluated; however, the authors do not provide results that exclude effects of time. Details regarding this study population are available in the section on efficacy and in Appendix D.
