Noninvasive Nonpharmacological Treatment for Chronic Pain:
A Systematic Review Update
(April 16, 2020).

Andrea C. Skelly, Ph.D., M.P.H., Roger Chou, M.D., Joseph R. Dettori, Ph.D., M.P.H., M.P.T.,
Judith A. Turner, Ph.D., et al.

Rockville (MD): Agency for Healthcare Research and Quality (US); 2020 (Apr)

This section was compiled by Frank M. Painter, D.C.


The methods for this systematic review follow the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Effectiveness and Comparative Effectiveness Reviews [18] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. See the review protocol for details.

      Topic Refinement and Review Protocol

The Evidence-based Practice Center (EPC) review team reexamined the Key Questions and PICOTS (Populations, Interventions, Comparators, Outcomes, Timing, Studies, Settings) in consultation with the AHRQ Task Order Officer (TOO), representatives from the Centers for Disease Control and Prevention, and the Technical Expert Panel (TEP).

The TEP consisted of members with expertise in primary care, rheumatology, pain medicine, behavioral sciences, physical medicine and rehabilitation, and physical therapy. TEP members had expertise in treating patients with one or more of the five conditions included in this report.

The final version of the protocol for this review was posted on the AHRQ Effective Health Care Program website on March 1, 2019.

The protocol was also registered in the PROSPERO database of prospectively registered systematic reviews (CRD42019132457).

      Literature Search Strategy

A research librarian conducted searches in Ovid® MEDLINE®, Cochrane Central Register of Controlled Trials, and Cochrane Database of Systematic Reviews. For the prior report, searches were conducted from inception through November 1, 2017; for this update, they covered September 1, 2017, through September 20, 2019. For the prior report, searches were conducted without publication date restrictions, with the exception of studies of chronic low back pain, for which we relied on a recent AHRQ review [19] to identify primary studies for inclusion through 2016 (see Appendix A for full search strategies). Because there are multiple manufacturers/sources for many of the devices examined in this review, a Federal Register notice was posted to request submission of Supplemental Evidence and Data for Systematic Reviews (SEADS) via an AHRQ portal.

Responses received were reviewed, and suggested citations and other data were compared against the inclusion/exclusion criteria. No new trials eligible for inclusion were identified from these responses. We also searched for unpublished studies, and reference lists of included articles and the bibliographies of systematic reviews (published since 2010 for the prior report) were reviewed for includable literature. Literature searches will be updated during the public comment and peer review period to capture any new publications. Resulting citations, and any citations suggested during peer review and public comment, will be evaluated against the inclusion/exclusion criteria following the same process of dual review as all other studies considered for inclusion in the report.

      Inclusion and Exclusion Criteria and Study Selection

Table 1

Inclusion and exclusion criteria were developed a priori based on the Key Questions and PICOTS, in accordance with the AHRQ Methods Guide for Effectiveness and Comparative Effectiveness Reviews. [18] Criteria are detailed in Table 1. Abstracts were reviewed by at least two investigators, and full-text articles were retrieved for all citations deemed potentially appropriate for inclusion by at least one of the reviewers. Two investigators then independently reviewed all full-text articles for final inclusion. Discrepancies were resolved by discussion and consensus. A list of the included studies appears in Appendix B; excluded studies and the primary reason for exclusion are listed in Appendix C.

The focus of this review is on randomized controlled trials (RCTs) reporting on longer-term outcomes (at least 1 month postintervention) that otherwise meet our PICOTS criteria.

      Data Abstraction and Data Management

Using templates, data from included trials were abstracted into categories that included but were not limited to: study design, year, setting, country, sample size, eligibility criteria, attrition, population and clinical characteristics (including age, sex, comorbidities, diagnostic classifications/information), intervention characteristics (including the type, number, intensity, duration of, and adherence to treatments), comparator characteristics, and results (including harms). We also recorded the funding source and role of the sponsor. All abstracted study data were verified for accuracy and completeness by a second team member (Appendix D). Details are further outlined in the protocol.

      Quality (Risk of Bias) Assessment of Individual Studies

Table 2

Predefined criteria were used to assess the quality of included trials. We focused on trials with the least potential for bias and the fewest limitations. RCTs were assessed based on criteria and methods established in the Cochrane Handbook for Systematic Reviews of Interventions (Chapter 8.5 Risk of Bias Tool), [21] and precepts for appraisal developed by the Cochrane Back and Neck Group. [22] These criteria and methods were used in conjunction with the approach recommended in the AHRQ Methods Guide for Effectiveness and Comparative Effectiveness Reviews. [18] Two team members independently appraised each included study, with disagreements resolved by consensus. Studies were rated as “good,” “fair,” or “poor” as described in Table 2. Assessments of included studies are in Appendix E.

      Data Analysis and Synthesis

Meta-analyses from the 2018 report were updated and new analyses conducted if two or more studies could be combined. Data were synthesized qualitatively (e.g., ranges and descriptive analysis) and quantitatively using meta-analysis where appropriate. Results are organized by Key Question (i.e., by condition) and intervention and then by comparators for each subquestion (e.g., intervention vs. waitlist or sham for subquestion a). To the extent that the interventions were distinct, we explored separating them out for analysis and reporting. For example, we categorized various forms of exercise based on their primary mechanisms of action (Appendix F). Interventions with similar characteristics were combined (e.g., cognitive-behavioral therapy [CBT] and acceptance and commitment therapy [ACT], which is a type of CBT). [23] Duration of followup postintervention was reported and categorized as short term (1 to <6 months), intermediate term (≥6 to <12 months), and long term (≥12 months).
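The followup categories above amount to a simple lookup on time since the end of treatment. The sketch below is illustrative only: the function name is hypothetical, followup is assumed to be expressed in months postintervention, and the label for durations under 1 month (outside the review's stated focus) is an assumption.

```python
def followup_category(months_postintervention: float) -> str:
    """Map followup time (months after the end of treatment) to the review's categories."""
    if months_postintervention < 1:
        # Hypothetical label: the review focuses on outcomes at least 1 month postintervention
        return "excluded (<1 month)"
    if months_postintervention < 6:
        return "short term"          # 1 to <6 months
    if months_postintervention < 12:
        return "intermediate term"   # >=6 to <12 months
    return "long term"               # >=12 months


# Usage: print the category for a few example followup times
for months in (0.5, 1, 3, 6, 9, 12, 24):
    print(months, followup_category(months))
```

The half-open intervals mirror the text, so a 6-month followup is intermediate term and a 12-month followup is long term.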

Prioritized outcomes of function and pain, based on validated measures, are presented first. Based on input from stakeholders, improvement in function was prioritized as the most important outcome. There is overlap between functional outcome measures and quality of life measures; the Short-Form 36 (SF-36) and EuroQoL-5 Dimensions (EQ-5D) are two such measures, and they were categorized as quality of life measures for this report. For some conditions, such as osteoarthritis, results were organized by affected region (e.g., knee, hip, hand).

Results for both continuous and dichotomous outcomes were synthesized. Binary outcomes were based on the proportion of patients achieving specific thresholds of success for improved function, or other measures of success as defined in the trials (e.g., ≥30% improvement in pain score); a risk ratio and 95% confidence interval were calculated using the Rothman Episheet [24] to evaluate the presence of an association and estimate the relative effect size. For continuous outcomes, mean differences between treatments and 95% confidence intervals were calculated using GraphPad or Stata®/IC 12.1 (StataCorp, College Station, TX) to provide effect sizes and determine the presence of a statistical association.
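For readers who want to reproduce these per-trial effect sizes, the sketch below uses standard textbook formulas: a risk ratio with a log-scale (Katz) 95% confidence interval, and a mean difference with a normal-approximation 95% confidence interval. Whether the Rothman Episheet, GraphPad, or Stata use exactly these formulas is an assumption (Stata and GraphPad may, for example, use a t-based interval for means), and the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def risk_ratio_ci(events_tx, n_tx, events_ctl, n_ctl, z=Z95):
    """Risk ratio with a log-scale (Katz) confidence interval.
    Assumes nonzero event counts in both arms (no continuity correction)."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of log(RR)
    se_log = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def mean_difference_ci(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl, z=Z95):
    """Mean difference (treatment minus control) with a normal-approximation CI."""
    md = mean_tx - mean_ctl
    se = math.sqrt(sd_tx ** 2 / n_tx + sd_ctl ** 2 / n_ctl)
    return md, md - z * se, md + z * se


# Usage with made-up numbers: 30/60 responders vs. 15/60, and 0-10 pain scores
print(risk_ratio_ci(30, 60, 15, 60))
print(mean_difference_ci(3.0, 2.0, 50, 4.0, 2.0, 50))
```

A risk ratio interval that excludes 1, or a mean difference interval that excludes 0, indicates a statistical association at the 5% level.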

We conducted meta-analysis to quantitatively synthesize evidence. To determine the appropriateness of meta-analysis, we considered clinical and methodological diversity and assessed statistical heterogeneity. Two continuous primary outcomes (pain and function) and one secondary outcome (quality of life) provided adequate data for meta-analysis. Mean difference (MD) was used as the effect measure if the studies reported outcomes using the same scale, or if the outcomes could be converted to the same scale (e.g., 0-100 pain ratings were converted to a 0-10 scale); otherwise, standardized mean difference (SMD) was used when the reported outcomes used different scales but measured the same underlying construct (e.g., function). In the primary analysis, MD and SMD were calculated using the followup score, and sensitivity analyses were conducted using the change from baseline. When the standard deviation (SD) was not reported and could not be calculated from the reported data, it was imputed using the average SD of the other studies in the same meta-analysis, by assuming the same coefficient of variation as those studies, or by using the baseline SD if the baseline SD was reported and the followup SD was not.
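The SD imputation options and the SMD described above can be sketched as follows. This is a minimal illustration under stated assumptions: the order of precedence in `resolve_followup_sd` is hypothetical (the text lists the options without ranking them), and Hedges' g is used as one common SMD estimator because the report does not name the estimator it used. All function names are hypothetical.

```python
import math
from statistics import mean


def impute_sd_average(known_sds):
    """Missing SD -> average SD of the other studies in the same meta-analysis."""
    return mean(known_sds)


def impute_sd_cv(study_mean, known_means, known_sds):
    """Missing SD -> study mean times the average coefficient of variation
    (SD / mean) of the other studies in the same meta-analysis."""
    cv = mean(sd / m for sd, m in zip(known_sds, known_means))
    return cv * study_mean


def resolve_followup_sd(followup_sd, baseline_sd, known_sds):
    """Hypothetical precedence: reported followup SD, then baseline SD,
    then the average SD of the other studies."""
    if followup_sd is not None:
        return followup_sd
    if baseline_sd is not None:
        return baseline_sd
    return impute_sd_average(known_sds)


def hedges_g(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """SMD as Hedges' g: mean difference over the pooled SD,
    multiplied by the small-sample correction factor J."""
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_tx + n_ctl - 2)
    )
    d = (mean_tx - mean_ctl) / pooled_sd
    correction = 1 - 3 / (4 * (n_tx + n_ctl) - 9)
    return d * correction


# Usage with made-up numbers
print(resolve_followup_sd(None, 2.5, [2.0, 3.0]))
print(hedges_g(3.0, 2.0, 50, 4.0, 2.0, 50))
```

Because g is unitless, trials measuring function on different instruments can be pooled on a common scale, which is why SMD is reserved for outcomes reported on different scales.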

We assumed random effects across studies and used both the DerSimonian-Laird method [25] and the profile-likelihood model [26] to combine studies. Statistical heterogeneity among the studies was assessed using the standard Cochran’s chi-square test and the I² statistic. [27] The p-values for the chi-square test were reported in the forest plots. Primary analyses were stratified by disease type, intervention, control group (usual care, exercise, or pharmacological treatment), and length of followup (short, intermediate, and long term). Controls included usual care, waitlist, no treatment, placebo, sham treatment, attention control, or other groups that involved at most minimal active treatment. As data permitted, we performed additional sensitivity and subgroup analyses based on specific interventions (e.g., type of acupuncture, type of exercise, intervention intensity) and control types (as described above), and by excluding outlying studies and studies rated as poor. Meta-regression was conducted to test the interaction between the intervention effects and intervention characteristics if warranted by the data.
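The DerSimonian-Laird random-effects method, together with Cochran's Q and I², can be sketched in a few lines. This is a minimal, dependency-free illustration (the function name is hypothetical), not the software the authors used; the profile-likelihood model is not shown. The p-value for Q is omitted to avoid a SciPy dependency; with SciPy it could be obtained as `scipy.stats.chi2.sf(Q, df)`.

```python
import math


def dersimonian_laird(effects, variances, z=1.959963984540054):
    """Random-effects pooling (DerSimonian-Laird) with Cochran's Q and I^2.
    `effects` are per-study estimates (e.g., MDs); `variances` are their squared SEs."""
    k = len(effects)
    w = [1 / v for v in variances]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variation due to heterogeneity
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "Q": q, "df": df, "I2": i2}


# Usage with made-up mean differences and variances
print(dersimonian_laird([-1.2, -0.4, -0.9], [0.10, 0.08, 0.20]))
```

When the studies agree closely, tau² is estimated as zero and the result reduces to the fixed-effect inverse-variance estimate.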

To facilitate interpretation of results across trials and interventions, we categorized the magnitude of effects for function and pain outcomes as in our previous reviews. [19, 28] In general, we classified effects for measures with a 0 to 10 scale for pain or function as small (0.5 to 1 point), moderate (>1 to 2 points), or large/substantial (>2 points) (see additional information in Assessing Applicability). Where data were available, proportions of patients meeting clinically important improvement were reported. If effect estimates tended to favor one treatment but were not statistically significant, with the confidence interval crossing the null value (zero for differences, one for ratios), perhaps due to small sample size, the results are interpreted as showing no clear difference between treatments. If effect estimates are close to zero and not statistically significant, results are interpreted as no difference between groups.
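The magnitude categories and interpretation rules above can be expressed as a small decision procedure for a between-group difference on a 0 to 10 scale. This sketch is hypothetical in two respects: the function names are illustrative, and treating "close to zero" as "below the 0.5-point small-effect threshold" is an assumption, since the text does not define that cutoff.

```python
def magnitude_0_to_10(diff):
    """Classify a between-group difference on a 0-10 pain or function scale."""
    size = abs(diff)
    if size > 2:
        return "large/substantial"
    if size > 1:
        return "moderate"
    if size >= 0.5:
        return "small"
    return "below small threshold"


def ci_excludes_null(lower, upper, null=0.0):
    """True if the CI does not contain the null value (0 for differences, 1 for ratios)."""
    return not (lower <= null <= upper)


def interpret(diff, lower, upper):
    """Assumption: 'close to zero' is taken to mean below the small-effect threshold."""
    if ci_excludes_null(lower, upper):
        return f"statistically significant; {magnitude_0_to_10(diff)} effect"
    if magnitude_0_to_10(diff) == "below small threshold":
        return "no difference between groups"
    return "no clear difference between treatments"


# Usage with made-up estimates and 95% CIs
print(interpret(-1.5, -2.3, -0.7))
print(interpret(-1.2, -2.0, 0.1))
print(interpret(0.1, -0.5, 0.7))
```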

      Grading the Strength of Evidence for Major Comparisons and Outcomes

The strength of evidence for each Key Question and primary outcome (function, pain, harms) was initially assessed by one researcher experienced in grading strength of evidence, in accordance with AHRQ guidance [29, 30] and as described in the protocol. The initial assessment was independently reviewed by at least one other experienced senior investigator. The overall strength of evidence (SOE) was determined based on assessment of study limitations (graded low, moderate, or high); consistency of results across trials (graded consistent, inconsistent, or, for single studies, unknown); the directness of the evidence linking the interventions with health outcomes (graded direct or indirect); precision of the effect estimate (graded precise or imprecise); and reporting bias (suspected or undetected). Bodies of evidence consisting of RCTs were initially considered high strength. All outcomes were considered direct.

Table 3

The final strength of evidence grade was assigned by evaluating and weighing the combined results of the above domains and considering the highest quality evidence available. While studies rated as poor quality were not excluded, such studies were considered less reliable than higher quality studies when synthesizing the evidence, particularly when discrepancies across studies were noted. The strength of evidence was assigned an overall grade of high, moderate, low, or insufficient according to a four-level scale (Table 3). When all of the studies for a primary outcome were rated poor quality, we rated the strength of evidence as insufficient. SOE tables for primary outcomes are presented in Appendix G. Summary strength of evidence tables were updated based on the totality of underlying evidence (i.e., evidence from the 2018 systematic review [16] in combination with the newly identified studies), and the impact of new trials on SOE is indicated in the summary tables.

      Assessing Applicability

Applicability was assessed using the PICOTS framework by examining the abstracted characteristics of the patient populations for each condition (e.g., demographic characteristics, condition-specific diagnostic criteria, symptoms, presence of medical and psychiatric comorbidities, and other psychosocial factors); the interventions (e.g., availability in the United States; dose, frequency, or intensity of treatment; and methods of administration); and the clinical settings (e.g., primary care, specialty setting, or developing country vs. developed country) in which the included studies were performed.

The magnitude of effects for pain and function (Appendix H) was classified using the system from our previous AHRQ review on noninvasive treatment for low back pain, [28] recognizing that small effects under this system may not meet standard thresholds for clinically meaningful effects. We applied the following definitions:

  • Small effect

    • For pain:   as a mean between-group difference following treatment of 5 to 10 points on a 0- to 100-point visual analog scale (VAS), 0.5 to 1.0 point on a 0- to 10-point numeric rating scale (NRS), or equivalent

    • For function:   as a mean difference of 5 to 10 points on the 0- to 100-point Oswestry Disability Index (ODI) or Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), or 1 to 2 points on the 0- to 24-point Roland-Morris Disability Questionnaire (RDQ) or Lequesne Index (LI), or equivalent

    • For any outcome:   as a SMD of 0.2 to 0.5

  • Moderate effect

    • For pain:   as a mean difference of 10 to 20 points on a 0- to 100-point VAS, or equivalent

    • For function:   as a mean difference of 10 to 20 points (on a 0- to 100-point scale) on the ODI or WOMAC, or 2 to 5 points on the RDQ or LI, or equivalent

    • For any outcome:   as a SMD of >0.5 to 0.8

  • Large effect

    • For pain:   as a mean difference of ≥20 points on a 0- to 100-point VAS

    • For function:   as a mean difference of ≥20 points (on a 0- to 100-point scale) on the ODI or WOMAC, or ≥5 points on the RDQ or LI, or equivalent

    • For any outcome:   as a SMD of >0.8
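The definitions above can be encoded as per-instrument thresholds. This is a sketch under stated assumptions: the scale keys and function names are hypothetical, and because adjacent ranges in the text share endpoints (e.g., 10 points is both the top of "small" and the bottom of "moderate" on the VAS), assigning a shared endpoint to the lower category is an assumption.

```python
# Hypothetical scale keys -> (small starts at, moderate starts above, large starts at)
SCALES = {
    "VAS_0_100": (5.0, 10.0, 20.0),
    "NRS_0_10": (0.5, 1.0, 2.0),
    "ODI_or_WOMAC_0_100": (5.0, 10.0, 20.0),
    "RDQ_or_LI": (1.0, 2.0, 5.0),
}


def classify(diff, scale):
    """Classify an absolute between-group mean difference on a named instrument."""
    small_at, moderate_above, large_at = SCALES[scale]
    size = abs(diff)
    if size >= large_at:
        return "large"
    if size > moderate_above:
        return "moderate"
    if size >= small_at:
        return "small"
    return "below small threshold"


def classify_smd(smd):
    """Classify a standardized mean difference (applies to any outcome)."""
    size = abs(smd)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small threshold"


# Usage with made-up effects
print(classify(-15, "VAS_0_100"), classify(3, "RDQ_or_LI"), classify_smd(0.6))
```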

Information regarding effect size definitions for other outcome measures is available in Appendix H. What constitutes a clinically important effect varies across individual patients and is influenced by a number of factors, such as preferences, duration and type of chronic pain, baseline symptom severity, harms, and costs.

      Peer Review and Public Commentary

Peer reviewers with expertise in primary care and management of the included chronic pain conditions were invited to provide written comments on the draft report. The AHRQ TOO and an EPC Associate Editor also provided comments and editorial review. Subsequently, the peer-reviewed draft report was posted on the AHRQ website for 4 weeks for public comment. A disposition of comments report with authors’ responses to the peer and public review comments will be posted after publication of the final Comparative Effectiveness Review on the AHRQ website.


                       © 1995–2022 ~ The Chiropractic Resource Organization ~ All Rights Reserved