RELIABILITY AND VALIDITY OF CLINICAL TESTS TO ASSESS THE FUNCTION OF THE CERVICAL SPINE IN ADULTS WITH NECK PAIN AND ITS ASSOCIATED DISORDERS: PART 5. A SYSTEMATIC REVIEW FROM THE CERVICAL ASSESSMENT AND DIAGNOSIS RESEARCH EVALUATION (CADRE) COLLABORATION
 
   


This section is compiled by Frank M. Painter, D.C.
 
   

FROM:   European Journal of Physiotherapy 2019 (Jul 8); 1–32

Nadège Lemeunier, Minisha Suri-Chilana, Patrick Welsh, Heather M. Shearer, Margareta Nordin, et al.

Institut Franco-Européen de Chiropraxie,
72 chemin de la Flambère,
31300, Toulouse, France.
nlemeunier@ifec.net


The purpose of this study was to determine the reliability and validity of clinical tests used to assess cervical function, muscle strength and endurance in adults with neck pain and its associated disorders (NAD). This is a systematic review and update of the Bone and Joint Decade 2000–2010 Task Force on NAD. We systematically searched five electronic databases. Eligible reliability and validity studies were critically appraised using the QAREL and QUADAS-2 tools, respectively. Validity studies were ranked according to the Sackett and Haynes classification to determine clinical utility. Early studies of novel tests provide preliminary evidence, and phase III/IV studies are necessary to confirm the validity of tests in clinical practice. We conducted a best evidence synthesis. We screened 7846 citations and critically appraised 28 articles. Eighteen low risk of bias articles provide preliminary evidence of reliability and validity (phase I/II) for the cranio-cervical flexion test and deep cervical extensor (DCE) test in patients with NAD. Only two clinical tests were found to be reliable and valid: the cranio-cervical flexion test and the DCE test could assess cervical muscle strength in adults with NAD. However, the evidence is supported by only phase I and II validity studies from the Sackett and Haynes classification.

KEYWORDS:   Systematic review, neck pain, functional tests, neck strength, neck endurance, reliability, validity



From the FULL TEXT Article:

Introduction

Neck pain and its associated disorders (NAD) is a common disorder and can have a significant impact on an individual’s function and quality of life. NAD is ranked as the fourth leading cause of disability and 21st for overall burden of disease. [1] The annual prevalence of non-specific NAD is estimated to be 30–50% globally, with 1.7–11.5% reporting activity-limiting pain. [2] Furthermore, 50–85% of individuals reported a second episode of neck pain 1–5 years following initial onset. [3] Overall, NAD is often recurrent and represents a significant source of pain and activity limitations in the working population. [3, 4]

The physical examination of patients with NAD involves observation, range of motion, palpation, and neurological examination. Functional tests are employed as additional examination tools to provide measures of an individual’s abilities to perform physical tasks such as lifting and overhead reaching. [5–8] In theory, these tests can provide clinicians with performance-based outcomes to evaluate daily physical abilities and inform return to function and clinical goals. Although functional tests are frequently employed by clinicians, their reliability and validity remain unclear.

A systematic review by the Bone and Joint Decade 2000–2010 Task Force on Neck Pain and its Associated Disorders (Neck Pain Task Force) investigated the reliability and validity of various assessment and diagnostic procedures for NAD, including neck function, muscle strength and endurance tests. [9] The Neck Pain Task Force identified:

(1)   preliminary evidence that lower functional ability is associated with higher pain intensity in patients with chronic NAD;

(2)   evidence that muscle testing of the neck and upper extremity has poor reliability, which may be due to measurement error (Kappa < 0.60); and

(3)   preliminary evidence that cervical flexor endurance tests in the supine position may help differentiate between patients with whiplash-associated disorder (WAD) grade II and healthy controls. [9]

The search used by the Neck Pain Task Force included literature published up to 2006, and an update of the systematic review is needed to determine the reliability and validity of functional tests for the assessment of NAD.

      Aim

The purpose of our systematic review was to update the Neck Pain Task Force review and determine the reliability and validity of clinical tests used in the assessment of neck function in adults aged 18 years or older with NAD grades I–IV. This review is the last in a series of five systematic reviews updating the Neck Pain Task Force on assessment of patients with NAD. [10–13] Together, these reviews will inform the development of a clinical practice guideline for the clinical assessment of NAD.




Methods

      Registration

We registered two review protocols with the International Prospective Register of Systematic Reviews (PROSPERO) on 2 February 2016 (CRD4201603XXXX for the functional tests section and CRD4201603XXXX for the muscle strength and endurance tests section).

      Eligibility criteria

Population:   We included studies of adults, 18 years of age and older, with NAD (grades I–IV) including WAD (grades I–IV). We defined NAD according to the Neck Pain Task Force (Supplementary Table S1) [14] and WAD according to the Quebec Task Force [15] (Supplementary Table S2). NAD includes non-traumatic neck pain and neck pain subsequent to a traffic collision (whiplash), with or without its associated disorders, which include arm pain radiating from the neck and upper thoracic pain, and/or headache, and/or temporomandibular joint pain where they are associated with neck pain. [14]

According to the Neck Pain Task Force, NAD is classified into four grades [14]:

  • Grade I:   Pain of low intensity and related to low levels of disability and interference with activities of daily living. No signs or symptoms suggestive of major structural pathology and no or minor interference with activities of daily living.

  • Grade II:   Pain of high intensity, but associated with low level of disability and interference with activities of daily living. No signs or symptoms of major structural pathology, but major interference with activities of daily living.

  • Grade III:   Pain that is associated with high levels of disability and moderate limitations in activities of daily living. No signs or symptoms of major structural pathology, but presence of neurologic signs such as decreased deep tendon reflexes, weakness, and/or sensory deficits.

  • Grade IV:   Pain that is associated with high levels of disability and severe limitations in activities of daily living. Signs or symptoms of major structural pathology, such as fracture, myelopathy, neoplasm, or systemic disease; requires prompt investigation and treatment.

The Quebec Task Force Classification of Grades of Whiplash-associated Disorder [15]:

  • Grade I WAD:   Neck pain and associated symptoms in the absence of objective physical signs.

  • Grade II WAD:   Neck pain and associated symptoms in the presence of objective physical signs and without evidence of neurological involvement.

  • Grade III WAD:   Neck pain and associated symptoms with evidence of neurological involvement including decreased or absent reflexes, decreased or limited sensation, or muscular weakness.

  • Grade IV WAD:   Neck pain and associated symptoms accompanied by fracture and dislocation.

      Interventions

We limited our review to studies assessing the reliability and validity of neck function, muscle strength and endurance tests used to assess NAD patients. Reliability refers to the ability of a test to give an equivalent result with repeated application in a person with a particular level of a disease. [16] Reliability can be measured within (intra-rater) and between (inter-rater) individuals performing a test. We also considered test–retest reliability which is defined as the stability of a clinical phenomenon in subjects who are supposed to have not changed. Validity refers to the degree to which persons with or without the condition under study are correctly categorised. [16] Construct validity is the degree to which a test measures what it purports to measure, while criterion validity compares a measure to a gold standard. [16]

      Definition of functional tests

The definition of a functional test is adapted from Solway et al. [8] Functional tests are measures of functional status and capacity, referring primarily to the ability to undertake physically demanding activities of daily living or work-related tasks. [8] Examples of functional tests include, but are not limited to, the assessment of lifting, stepping, hopping or general movement (e.g. walking, running, or gait). We included studies that assessed home and work-related function and functional capacity evaluations. We excluded active and passive range of motion tests and orthopaedic tests, which were reported in other reviews. [10, 13]

      Definition of muscle strength and endurance tests

The National Strength and Conditioning Association of America defines muscle strength as the maximal force that a muscle or muscle group can generate at a specified velocity or as an isometric contraction. [17] Muscle endurance is defined as the time limit of a person’s ability to maintain an isometric force or a power level involving combinations of concentric and/or eccentric muscular contractions. [17] Tests of neck strength and endurance include but are not limited to manual muscle testing, dynamometry, and endurance tests. [7, 18–20]

      Study characteristics

To be included in the systematic review, studies met the following inclusion criteria:

(1)   English or French language;

(2)   published from 1 January 2005 to 7 November 2017;

(3)   published in a peer-reviewed journal;

(4)   reliability or validity studies of neck functional tests; or muscle strength and/or endurance; and

(5)   study population including adults (18 years of age or older) with grades I–IV neck pain (including non-traumatic neck pain and neck pain subsequent to a traffic collision) with or without its associated disorders.

If studies included a mixed population with individuals less than 18 years of age, results must be stratified for adults 18 years of age and older. In studies with multiple diagnostic assessments or tests (e.g. strength, range of motion, and palpation), results must be stratified for each test.

We excluded studies meeting any of the following criteria:

(1)   publication types including guidelines, letters, editorials, commentaries, unpublished manuscripts, dissertations, government reports, books and book chapters, conference proceedings, meeting abstracts, lectures and addresses, consensus development statements, guideline statements;

(2)   study designs including systematic and non-systematic reviews, and case studies;

(3)   cadaveric or animal studies;

(4)   studies only targeting individuals with serious pathology or systemic diseases (including but not limited to fractures, dislocations, myelopathy, neoplasms, and infection);

(5)   sample size less than 20 per group;

(6)   studies utilising devices that are not commonly used or very expensive for a typical clinical practice (e.g. electromyography [EMG]).

      Data sources and searches

We developed a search strategy in consultation with a health sciences librarian, which was reviewed by a second librarian. We systematically searched the following electronic databases from 1 January 2005 to 7 November 2017: MEDLINE, Cochrane Central Register of Controlled Trials, CINAHL, and PubMed. We also searched SPORTDiscus for the muscle strength and endurance strategy. Search terms consisted of subject headings specific to each database (e.g. MeSH in MEDLINE) and free text words relevant to

(1)   NAD or WAD I–IV,

(2)   diagnosis/validity/reliability/reproducibility, and

(3)   neck muscle strength and/or endurance, or functional test and/or visual inspection (Supplementary Tables S3 and S4).

Visual inspection findings were reported in a separate review. [13] We first developed the search strategy in MEDLINE and subsequently adapted the search to the other bibliographic databases. Our search overlapped the Neck Pain Task Force (NPTF) search by one year to ensure studies were not missed during this period.

      Study selection

We exported all citations identified by the search strategy into EndNote for reference management and tracking of the screening process. Eight pairs of independent reviewers screened articles in two stages. Stage one involved screening of titles and abstracts for relevant and possibly relevant citations based on the inclusion and exclusion criteria. Citations deemed possibly relevant from the first stage were reviewed in the second stage using the full text article. Disagreements were resolved by discussion between the paired reviewers to reach consensus. If consensus could not be reached, the citation was independently screened by a third reviewer and discussed with the other two reviewers to reach consensus.

      Assessment of risk of bias

Pairs of reviewers (thirteen pairs in total) independently critically appraised all relevant studies. We assessed the internal validity of each study using the modified Quality Appraisal Tool for Studies of Diagnostic Reliability (QAREL) [21] criteria for diagnostic reliability studies, and the modified Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria for diagnostic accuracy studies. [22] We modified the QAREL and QUADAS-2 instruments to include

(1)   a question on whether the study objective was clear;

(2)   not applicable options for certain questions (QAREL items # 3, 4, 5, 6, 8, QUADAS items # 3.1, 3.2, 3.3 and 3.B); and

(3)   the Sackett and Haynes classification (in the QUADAS-2 instrument) described below. [23]

Based on these critical appraisal criteria, a study was considered low risk of bias if reviewers agreed that selection bias (questions 2 and 3 of the QAREL checklist and questions in domain 1 for the QUADAS-2 checklist) and measurement bias (questions 4–10 of the QAREL checklist and questions in domains 2 and 3 for the QUADAS-2 checklist) did not threaten the internal validity of a study.

Consensus between reviewers was reached through discussion and an independent third reviewer was involved when consensus could not be reached. We contacted authors if additional information was needed to ensure the critical appraisal was accurate. Following appraisal, we considered studies with adequate internal validity as low risk of bias and included these studies in the best evidence synthesis. We classified each low risk of bias study according to the classification system described by Sackett and Haynes. [23] Phase I studies assess differences in the results of the diagnostic test between patients and healthy individuals. Phase II studies assess the association between test results and a reference standard in patients diagnosed with the condition (i.e. NAD), whereas phase III studies assess a test’s ability to perform in a population with the suspected condition. Finally, phase IV studies examine whether patients who were assessed with the test have better outcomes than untested individuals. [23] Early studies of novel tests provide preliminary evidence of clinical utility, and phase III or IV studies are needed to inform the validity and utility of a test in clinical practice.

      Data extraction and synthesis of results

Two reviewers (M. S., P. W.) extracted data from low risk of bias studies to build evidence tables. A second reviewer checked the extracted data (H. S., J. W., or N. L.). Meta-analysis would be performed in the event that the accepted studies were statistically and clinically homogenous. In the case of heterogeneity, a qualitative synthesis of findings from the studies with a low risk of bias would be performed to develop evidence statements according to the principles of best evidence synthesis. [24] Specifically, the research team used evidence tables to outline the best evidence on each topic, identify consistencies and inconsistencies in this evidence, and formulate summary statements to describe the body of evidence and compare the results to the NPTF findings. [9]

      Statistical analyses

We computed the inter-rater reliability for the screening of articles using the kappa coefficient (k) and 95% confidence intervals (CI). [25] We also calculated the percentage agreement for classifying studies into high or low risk of bias following independent critical appraisal.
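As an illustration of these two agreement statistics, the following is a minimal sketch of Cohen's kappa for two reviewers making include/exclude decisions, together with a simple percentage agreement. The function names, the 2×2 counts, and the large-sample standard error approximation are assumptions chosen for illustration; the review does not state which formula or software was used.

```python
import math

def cohen_kappa(a, b, c, d, z_crit=1.96):
    """Cohen's kappa for two raters and a binary decision.

    a: both include          b: rater 1 include, rater 2 exclude
    c: rater 1 exclude, rater 2 include          d: both exclude
    Returns (kappa, ci_low, ci_high). The CI uses a simple
    large-sample approximation of the standard error (an assumption).
    """
    n = a + b + c + d
    p_o = (a + d) / n                                        # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z_crit * se, kappa + z_crit * se

def percent_agreement(agreed, total):
    """Percentage of items on which both reviewers gave the same rating."""
    return 100.0 * agreed / total

# Hypothetical screening counts, not taken from the review
kappa, ci_low, ci_high = cohen_kappa(a=40, b=5, c=5, d=50)
print(f"kappa = {kappa:.2f} (95% CI: {ci_low:.2f} to {ci_high:.2f})")
print(f"agreement = {percent_agreement(9, 12):.0f}%")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is usually lower than the raw percentage agreement computed on the same ratings.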

      Reporting

Our review complies with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [26] and Statement for Reporting Studies of Diagnostic Accuracy (STARD). [27]



Results

      Study selection

We identified 10,085 citations, removed 2239 duplicates, and screened 7,846 articles for eligibility (Figures 1 and 2).

We screened 165 citations for eligibility using full text and 129 were excluded due to

(1)   sample size less than 20 (n = 22);

(2)   irrelevant outcomes (n = 49);

(3)   ineligible study population (n = 27);

(4)   ineligible study design (n = 15);

(5)   ineligible publication type (n = 14);

(6)   ineligible language (n = 1);

(7)   ineligible device (i.e. too sophisticated machine) (n = 1).
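The selection counts above can be cross-checked with simple arithmetic. The short sketch below only restates the numbers reported in this section; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph
identified = 10085
duplicates = 2239
screened = identified - duplicates          # titles and abstracts screened

full_text_screened = 165
exclusions = {
    "sample size less than 20": 22,
    "irrelevant outcomes": 49,
    "ineligible study population": 27,
    "ineligible study design": 15,
    "ineligible publication type": 14,
    "ineligible language": 1,
    "ineligible device": 1,
}
excluded = sum(exclusions.values())          # full-text exclusions
remaining = full_text_screened - excluded    # articles left for critical appraisal

print(screened, excluded, remaining)
```

The totals are internally consistent: 7846 citations screened, 129 full-text exclusions, and 36 articles remaining for critical appraisal.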

We critically appraised 36 articles, of which nine focussed only on visual inspection and were reported in a separate review. [13] The remaining 27 articles included 28 studies, as an article could explore both reliability and validity. Eighteen articles (reporting on 19 studies) were low risk of bias, which included nine reliability studies [28–36] and 10 validity studies [37–45]; among these, one article assessed both reliability and validity. [34] Nine studies were deemed to be high risk of bias [37, 42, 44, 46–51] and excluded from the best evidence synthesis; these included the reliability sections of two otherwise low risk of bias articles [42, 44] and the muscle strength study of a low risk of bias article. [37]

The inter-rater reliability for screening of articles was k = 0.76 (95% CI: 0.67–0.84) for the visual inspection and functional tests section, and k = 0.88 (95% CI: 0.80–0.96) for the muscle strength and endurance tests section. The visual inspection findings were reported in a separate review. [13] In total, the percentage agreement for independent critical appraisal of studies (high versus low risk of bias) was 75% (21/28). A meta-analysis was not possible due to clinical heterogeneity between studies; we therefore conducted a best evidence synthesis.

      Study characteristics

Of the nine reliability studies with low risk of bias, five examined inter-rater reliability [28, 29, 31, 33, 36], one examined intra-rater reliability [32], and three examined both. [30, 34, 35]

Reliability was studied for neck:

(1)   function (n = 3) [28–30];

(2)   muscle strength (n = 2) [34, 35]; and

(3)   muscle endurance (n = 6). [31–36]

In these articles, neck pain was defined as chronic NAD I–II [30–32, 34, 35], NAD I–II [29], or NAD I–III [28] of unknown duration, and chronic NAD I–III. [33, 36]

Of the 10 validity studies with low risk of bias, eight were phase I studies [34, 37, 39–41, 43–45], of which four included a phase II component [34, 39, 41, 43], and two studies were phase II only. [38, 42]

The validity was studied for neck:

(1)   function (n = 3) [37–39];

(2)   muscle strength (n = 4) [34, 40, 42, 45]; and

(3)   muscle endurance (n = 4) [34, 41, 43, 44] in patients with NAD.

Seven studies targeted NAD grades I and II [34, 37, 38, 40, 42, 44, 45], one targeted NAD grades I–III [43], and two targeted NAD grade III. [39,41]

The studies were conducted in the United States [31, 33, 36], Australia [32, 42], Canada [39], Denmark [34, 35], Hong Kong [40], Portugal [44], Spain [30, 45], Sweden [28, 37, 41, 43], and Switzerland [29, 38]. The examiners consisted of physical therapy students [34] and trained physiotherapists [28–31, 33, 35–37, 39, 42, 46], while the remaining studies did not report the background of the examiners. [32, 38, 40, 41, 43, 44]

A variety of functional tests were used:

(1)   active cervical movement control tests (ROM) in various positions (n = 3) [28–30];

(2)   active shoulder movement control tests (n = 3) [28–30];

(3)   handgrip strength (n = 2) [37, 38]; and

(4)   lifting overhead, overhead working and repetitive reaching (n = 2) [38, 39] or other functional tests (n = 2). [29, 30]

Tests of neck muscle strength included the cranio-cervical flexion test (CCFT) performed with pressure biofeedback [34, 35, 40, 42] or cervical muscle strength. [45] Tests of neck muscle endurance included the chin tuck neck flexion test [31–33, 44], neck flexor muscle endurance (NFME) test [35], neck extensor test (NET) [32, 35, 36, 44], deep cervical extensor (DCE) test [34], and neck muscle endurance (NME) test. [41, 43] All tests are described in the glossary (Table 1).

      Risk of bias within studies

All reliability studies with low risk of bias had clear study objectives as well as appropriate sample selection, inter-rater blinding, time intervals between measurements, interpretation of tests, and use of statistics (Table 2).

However, some studies did not provide

(1)   clear information regarding raters (n = 2) [29, 32];

(2)   adequate information regarding intra-rater blinding (n = 2) [34, 35];

(3)   a clear description of blinding to the reference standard (n = 1) [35];

(4)   a clear description regarding blinding of clinical information (n = 2) [33, 36] or of additional cues, such as scars or unique identifying features on imaging films (n = 7) [29, 30, 32–36]; and

(5)   information on, or variation in, the order of examination (n = 5). [30, 31, 33, 34, 36]

All validity studies with low risk of bias had appropriate exclusion criteria, reference standard and time interval (Table 3). However, some studies did not provide clear information regarding patient flow (n = 3) [40, 41, 43] and blinding to reference standard or index results (n = 3) [38, 41, 43] (Table 3).

Nine studies were deemed to be high risk of bias [37, 42, 44, 46–51] and excluded from the best evidence synthesis; these included the reliability sections of two otherwise low risk of bias articles [42, 44] and the muscle strength study of a low risk of bias article. [37]

These studies were excluded due to

(1)   inappropriate rater selection [44];

(2)   absence of blinding [44, 49, 50];

(3)   incorrect interpretation of results [51];

(4)   unexplained study flow [46, 51];

(5)   inappropriate reference standards (not reliable or valid) [48]; or

(6)   insufficient information in the Methods section. [47]



Summary of evidence

      Reliability of neck functional tests

Three studies assessed the reliability of functional tests used for neck pain assessment. Two examined inter-rater reliability only [28, 29] and one examined both inter- and intra-rater reliabilities. [30] Evidence from three studies supports the reliability of active cervical and arms control tests in the assessment of NAD I and II (Table 4). However, the reliability findings of active shoulder tests for the assessment of NAD I–III patients showed important measurement errors [28] (Table 4).

      Active cervical and shoulder control tests.

Patroncini et al. examined the reliability of active movement control tests of the cervical spine and upper limb in individuals with NAD I and II of unknown duration. Movement control tests examined whether there was impairment in the control of movement during functional activities. [29] Statistically significant results of clinical importance for diagnosis were shown for all index tests. Specifically, active cervical motion produced similar results for all ranges, with higher reliability shown for nodding movement with head on the wall (k = 0.80, 95% CI: 0.55–1.00), chin protraction–retraction (k = 0.91, 95% CI: 0.75–1.00), and neck flexion when supine (k = 0.81, 95% CI: 0.61–1.00). Unilateral arm flexion (k = 0.74, 95% CI: 0.47–0.95) and bilateral shoulder elevation (k = 1.0) were also shown to have inter-rater reliability (Table 4). [29]

Segarra et al. examined chronic NAD I–II patients in comparison to patients with musculoskeletal disorders other than the cervical spine. [30] Both inter- and intra-rater reliabilities were assessed for active cervical and shoulder ROM in various body positions. [30] Inter-rater reliability was assessed by video recordings with a 2–week period between ratings. All tests were shown to have significant reliability in seated, standing and 4–point kneeling positions (Table 4). Inter-rater reliability was greatest for active cervical rotation (k = 0.81, 95% CI: 0.58–1.00), active upper cervical rotation in 4–point kneeling (k = 0.80, 95% CI: 0.66–0.93), active cervical extension while seated (k = 0.73, 95% CI: 0.29–0.91), and active bilateral arm flexion while standing (k = 0.71, 95% CI: 0.44–0.93). The same tests were shown to have intra-rater reliability (0.70 (0.49–0.83) < ICC (95% CI) < 0.94 (0.90–0.97)), along with active cervical extension (ICC = 0.92; 95% CI: 0.86–0.95) in 4–point kneeling (Table 4). [30]

A recent study by Aasa et al. assessed the inter-rater reliability of movement control tests in NAD I–III patients of unknown duration and healthy age-matched controls. [28] Index tests included: active maximal neck extension, rotation, active scapulo-/gleno-humeral medial rotation in the scapular plane, serratus anterior and lower trapezius control tests. Serratus anterior testing involved downward rotation, elevation and retraction of the scapula in 4–point kneeling position. Lower trapezius control was tested with downward elevation and retraction of the scapula while prone.

Raters observed video recordings independently. Inter-rater reliability of all index tests was statistically significant with important measurement errors. For expert raters, greater inter-rater reliability was found for active neck rotation and serratus anterior tests (k = 0.89; SE = 0.08; p<0.01) (Table 4). For the novice pair, statistically significant results were found for gleno-humeral medial rotation right (k = 0.60; SE = 0.18; p<0.01) and left (k = 0.62; SE = 0.20; p<0.01), and lower trapezius control tests (k = 0.58; SE = 0.17; p<0.01). Experts were found to have a higher inter-rater reliability compared to novice raters on all tests, except right gleno-humeral medial rotation (Table 4). [28]

      Active movement control tests of the upper limb.

Patroncini et al. examined the reliability of active movement control tests of the upper limb in individuals with NAD I–II of unknown duration. [29] Movement control tests that assessed upper body forward-backward motion (k = 0.84, 95% CI: 0.68–0.94) and forward bending in standing (k = 1.0) demonstrated inter-rater reliability, as well as weighted arm flexion to 90° (k = 0.85, 95% CI: 0.55–1.00) (Table 4). [29] Segarra et al. examined chronic NAD I–II patients in comparison to patients with musculoskeletal disorders other than the cervical spine. [30] Both inter- and intra-rater reliabilities were assessed for rocking backwards in 4–point kneeling. [30] This test was shown to have significant reproducibility, with higher intra-rater reliability (0.78 (0.54–0.99) < k (95% CI) < 0.80 (0.54–1.0)) than inter-rater reliability (k = 0.36; 95% CI: 0.12–0.68) (Table 4). [30]

      Validity of neck functional tests

Three articles examined the validity of functional tests used for NAD patients. [37–39] Two phase I and two phase II studies provide preliminary evidence that active shoulder control tests, handgrip strength and tests such as lifting overhead, overhead working and repetitive reaching may be helpful for the assessment of NAD I-II patients (Table 5). However, the clinical accuracy of these tests is not known.

      Active shoulder control tests.

Juul-Kristensen et al. designed a phase I study comparing computer workers with recurrent neck and shoulder trouble with healthy workers reporting little or no neck or shoulder dysfunction in the last year. [37] The functional test evaluated was maximum voluntary contraction (MVC) of shoulder elevation. Construct validity was assessed by determining the mean difference in functional outcomes between groups (symptomatic-control). There was a statistically significant difference of clinical importance for diagnosis between groups (Table 5). Participants with self-reported neck trouble had decreased shoulder elevation MVC compared to their asymptomatic colleagues (mean differences symptomatic versus asymptomatic of –44.00 (95% CI: –51.38, –36.62) and –66.00 (95% CI: –77.46, –54.54) newtons for the right and the left side, respectively). [37]

      Handgrip strength.

Juul-Kristensen et al. also evaluated handgrip strength using a dynamometer. [37] Construct validity was assessed by determining the mean difference in functional outcomes between groups (symptomatic-control). Participants with self-reported neck trouble had decreased right-handed grip strength compared to their asymptomatic colleagues. However, they found there was no statistically significant difference between groups using left handgrip strength (Table 5). [37]

A phase II study by Trippolini et al. measured the construct validity of functional tests in patients with persistent NAD I and II. [38] Participants performed tests by incrementally increasing weight until reaching their maximal ability. The correlations between handgrip strength and reference standards were calculated; the reference standards included the numeric rating scale (NRS) for pain, spinal function sort (SFS) for functional ability, neck disability index (NDI) for disability, and the hospital anxiety and depression scale (HADS-A/D) for anxiety and depression. For handgrip strength (in kgF), all correlations with reference standards were statistically and clinically significant, indicating decreased grip strength was associated with increased pain, disability, anxiety, depression (–0.28 (–0.38 to –0.17) < Pearson’s r (95% CI) < –0.25 (–0.35 to –0.15)) and decreased functional ability (Pearson’s r = 0.38 (95% CI: 0.28–0.47)) (Table 5). [38] Statistically significant gender differences were also shown, with all favouring greater abilities in males (Table 5). [38]
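To show how a confidence interval of this kind relates to a correlation coefficient, the sketch below computes an approximate 95% CI for Pearson's r using the Fisher z-transformation. This is a common textbook method and is an assumption here, since the review does not describe how the intervals were derived; the sample size `n` in the example is hypothetical and not taken from the cited study.

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """Approximate CI for Pearson's r via the Fisher z-transformation.

    r: sample correlation, n: sample size (must be > 3).
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    low = math.tanh(z - z_crit * se)  # back-transform the limits
    high = math.tanh(z + z_crit * se)
    return low, high

# Hypothetical sample size, chosen only for illustration
low, high = pearson_ci(r=-0.28, n=300)
print(f"r = -0.28, 95% CI: {low:.2f} to {high:.2f}")
```

A correlation is statistically significant at the 5% level when its 95% CI excludes zero, so a significant negative correlation has both limits below zero.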

      Lifting overhead, overhead working and repetitive reaching.

A phase II study by Trippolini et al. measured the construct validity of functional tests in patients with persistent NAD I and II. [38] The correlations between lifting overhead, overhead working, and repetitive reaching and reference standards were calculated. Reference standards included: numeric rating scale (NRS) for pain, spinal function sort (SFS) for functional ability, neck disability index (NDI) for disability, and the hospital anxiety and depression scale (HADS-A/D) for anxiety and depression. Lifting overhead (kg), working overhead, and repetitive reaching significantly correlated with all reference standards, indicating decreased functionality was associated with increased pain, disability, anxiety, depression and decreased functional ability (Table 5). [38] Statistically significant gender differences were also shown for lifting and repetitive reaching, all favouring greater abilities in males (mean differences between male and female from 3.80 kg (95% CI: 2.57–5.03) to 8.20 s (95% CI: 2.23–14.17), respectively) (Table 5). [38]

A phase I and II validity study assessed functional impairment tests in patients with WAD II and controls. [39]

Three timed tasks were performed:

(1)   the waist-up test consisting of grabbing, lifting, moving, and placing containers on waist-level and 25 cm above waist level shelves;

(2)   the same task except that the two shelves are placed at eye level and 25 cm below; and

(3)   an overhead work task.

There was a significant mean difference between groups (WAD II versus controls) for all tasks (p<0.001). Performance scores on the tasks were negatively correlated with pain intensity (NPRS) (Spearman’s r from –0.37 to –0.46), neck disability (NDI) (Spearman’s r from –0.32 to –0.43), and arm and shoulder disability (DASH) (Spearman’s r from –0.25 to –0.36), and positively correlated with cervical range of motion (CROM) (Spearman’s r from 0.01 to 0.51). [39]

      Reliability of neck muscle strength tests

Two low risk of bias studies provide evidence of inter-rater and intra-rater reliability for the CCFT in patients with NAD I and II and healthy controls [34, 35] (Table 4). Evidence from these studies supports the reliability of the cranio-cervical flexion test in the assessment of NAD I and II patients.

      Cranio-cervical flexion test.

The inter-rater reliability intraclass correlation coefficient (ICC) ranged from 0.63 (95% CI: 0.41–0.78) to 0.82 (95% CI: 0.67–0.91) measured at two minutes. The intra-rater reliability ranged from 0.70 (95% CI: 0.43–0.85) to 0.86 (95% CI: 0.72–0.93), with measurements occurring between one and seven days. [34]

Juul et al. (2013) reported an inter-rater reliability ICC ranging between 0.85 (95% CI: 0.76–0.91) and 0.86 (95% CI: 0.81–0.93) measured at ten minutes. The intra-rater reliability ranged from 0.69 (95% CI: 0.53–0.80) to 0.81 (95% CI: 0.70–0.88), with measurements taken at 1 and 3 working days (Table 4). [35]
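
For readers unfamiliar with how an ICC is derived, the following is a minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from the ANOVA mean squares. This section does not state which ICC form each study used, so the choice of ICC(2,1) is an assumption, and the scores in the example are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a subjects-by-raters table of scores.

    ratings: list of rows, one row per subject, one column per rater.
    Assumes a complete table (every rater scored every subject).
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores for five participants rated by two examiners.
scores = [[22, 24], [26, 26], [28, 30], [24, 22], [30, 28]]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")
```

Because this form penalises systematic differences between raters as well as random error, it can give lower values than consistency-based forms applied to the same data.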

      Validity of neck muscle strength tests

Three low risk of bias studies provide evidence of validity (phases I and II) for the CCFT as a measure of neck muscle strength in patients with NAD/WAD grades I and II (Table 5). [34, 40, 42] Results from these studies provide preliminary evidence that the cranio-cervical flexion test may be helpful in the assessment of NAD and WAD I and II patients. Another phase I study reported preliminary evidence of validity for a cervical muscle strength test in the assessment of patients with NAD I–II. [45] However, the clinical accuracy of these tests is not known.

      Cranio-cervical flexion test (CCFT).

Two phase I validity studies reported a significant difference in CCFT scores [1.71/30mmHg (95% CI: 0.22–3.21); p = 0.03] [34] and performance (p<0.001) [40] between patients with NAD I and II and healthy controls (Table 5). A phase II validity study reported a non-statistically significant negative correlation between muscle strength (as measured by CCFT activation score and performance index) and both pain intensity and disability (as measured by the Visual Analogue Scale and Neck Disability Index, respectively) (Table 5). [42] Similarly, Jørgensen et al. (phase II study) reported a negative correlation between the CCFT and both the Neck Disability Index and Numeric Rating Scale, and a positive correlation with the SF36-Physical Component Score (Table 5). [34]

      Cervical muscle strength test.

A phase I validity study reported a significant median difference in cervical muscle strength between NAD I and II patients and a control group (from 3.25 kg (95% CI: 1.75–4.76); p<0.05 in latero-flexion to 4.82 kg (95% CI: 2.93–6.71); p<0.05 in extension) [45] (Table 5).

      Reliability of NME tests

Six studies reported on the reliability of five NME tests (Table 4). [31–36] Evidence from these studies supports the reliability of the chin tuck neck flexion test, neck extensor endurance test, and DCE test in the assessment of NAD I and II patients, and the reliability of NFME and NE tests for the assessment of NAD I–III patients.

      Chin tuck neck flexion test.

Cleland et al. (2006) reported the inter-rater reliability [ICC = 0.57 (95% CI: 0.14–0.81)] of the Chin Tuck Neck Flexion Test among adult patients with NAD grades I and II, with a mean duration of 69 d [31] (Table 4).

Two articles studied a test similar to the chin tuck neck flexion test (described in Table 1). One study measured neck flexor endurance in patients with NAD I and II of at least 6 months duration. [32] The intra-rater reliability coefficient was ICC = 0.93 (95% CI: 0.86–0.97), with the repeat measurement taken three days later. [32] Hanney et al. reported on the inter-rater reliability of the Neck Flexor Endurance Test in patients with NAD grades I–III with a mean symptom duration of 259 d, ICC = 0.70 (95% CI: 0.40–0.87). [33]

      NFME test.

One study provided evidence of the intra-rater reliability of the NFME test in patients with NAD I and II and healthy controls. [35] Juul et al. (2013) measured neck flexor endurance in supine and while seated.

The intra-rater reliability ICC ranged from 0.68 (95% CI: 0.52–0.80) to 0.75 (95% CI: 0.61–0.85) in supine, and from 0.42 (95% CI: 0.18–0.60) to 0.59 (95% CI: 0.40–0.73) when seated. [35] Juul et al. also reported the inter-rater reliability of the NFME, which ranged from 0.70 (95% CI: 0.55–0.81) to 0.73 (95% CI: 0.59–0.83) in supine, and from 0.56 (95% CI: 0.37–0.71) to 0.74 (95% CI: 0.56–0.84) when seated. [35]

      NET.

Three studies provide evidence of the reliability of the NET. [32, 35, 36] Edmondston et al. reported the intra-rater reliability for patients with NAD I and II of greater than 6 months duration [ICC = 0.88 (95% CI: 0.75–0.95)]. [32] Juul et al. (2013) reported that the intra-rater ICC ranged from 0.14 (95% CI: –0.17 to 0.37) to 0.41 (95% CI: 0.17–0.60) for patients with NAD I and II of greater than 4 weeks duration. [35] The inter-rater reliability of the NET ranged from 0.19 (95% CI: –0.06 to 0.42) to 0.25 (95% CI: –0.01 to 0.47) [35] (Table 4).

Sebastian et al. provided evidence of the inter-rater reliability for the cervical extensor endurance test in patients with NAD I and III of unknown duration [k = 0.80 (95% CI: 0.59–1.01)] [36] (Table 4).
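
The upper confidence limit of 1.01 reported above exceeds the maximum possible kappa of 1. One plausible explanation, offered here as an assumption and not confirmed by the source, is that a symmetric normal-approximation interval was used, which can overshoot 1 when kappa is high and the sample is small. The sketch below shows Cohen's kappa with a simple large-sample standard error; the 2x2 counts are hypothetical.

```python
import math


def cohen_kappa_ci(table, z_crit=1.96):
    """Cohen's kappa with a symmetric normal-approximation interval.

    table: square agreement table, table[i][j] = number of patients
           rated category i by rater A and category j by rater B.
    Uses the simple large-sample standard error
    sqrt(p_o * (1 - p_o) / (n * (1 - p_e)^2)), which is an approximation.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row_p = [sum(row) / n for row in table]                 # rater A marginals
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_p, col_p))          # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z_crit * se, kappa + z_crit * se


# Hypothetical counts for 20 patients rated positive/negative by two examiners.
kappa, low, high = cohen_kappa_ci([[12, 1], [1, 6]])
print(f"kappa = {kappa:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```

In practice, an upper limit above 1 is usually truncated to 1 when reported, or a method that respects the bounds of kappa is used instead.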

      DCE test.

A study by Jørgensen et al. assessed the DCE test in patients with NAD grades I and II and healthy controls. [34] The inter-rater reliability ranged from 0.75 (95% CI: 0.55–0.87) to 0.76 (95% CI: 0.59–0.86) with an intra-rater reliability ranging from 0.77 (95% CI: 0.55–0.89) to 0.90 (95% CI: 0.79–0.95) [34] (Table 4).

      Validity of neck endurance tests

Four articles provided evidence of phase I/II validity of the NME test, deep cervical endurance test or neck flexor and extensor tests. [34, 41, 43, 44] These studies provide preliminary evidence for the validity of the chin tuck neck flexion, neck extensor, and DCE tests in NAD I and II patients and of NME tests in NAD I–III patients (Table 5). However, the clinical accuracy of these tests is not known.

      Chin tuck neck flexion test.

A phase I validity study reported a median difference (in seconds) in deep neck flexor endurance between NAD I and II patients (18.82 s (interquartile range: 8.08)) and a control group (26.29 s (interquartile range: 24.13)) [44] (Table 5).

      NET.

A phase I validity study reported a median difference (in minutes) in deep neck extensor endurance between NAD I and II patients (3.44 min (interquartile range: 3.03)) and a control group (3.54 min (interquartile range: 2.04)) [44] (Table 5).

      NME test.

Two studies provide evidence of validity of the NME test. [41, 43] Halvorsen et al. examined the construct validity (phase II) of the NME test compared to the Visual Analogue Scale (VAS), Neck Disability Index (NDI), and Tampa Scale of Kinesiophobia (TSK). [41] Results suggested that patients with NAD III had a significantly reduced NME time compared to healthy controls (prone: p < 0.01; supine: p = 0.017) (Table 5). [41] Peolsson et al. investigated the construct validity (phase II) of the NME test compared to the VAS and NDI in patients with NAD I–III, cervical disc disease, and healthy controls. [43]

There was a negative correlation between

(1)   NME test performed in a dorsal position and Visual Analogue Scale for pain intensity (Pearson’s r = –0.30; p = 0.01); and

(2)   NME test performed in ventral position and the Neck Disability Index (Pearson’s r = –0.23; p = 0.07), which suggests that as neck pain and disability increase, neck endurance decreases [43] (Table 5).

      DCE test.

A phase I validity study reported a significant median difference in time on the DCE test between NAD I and II patients and a control group (29.21 seconds; p = 0.06) [34] (Table 5). Jørgensen et al. also reported on the construct validity of the DCE test compared to the NDI, SF36-PCS, and the Numeric Rating Scale in NAD I and II patients. [34] There was a negative correlation between the DCE test and both the Neck Disability Index and the Numeric Rating Scale, and a positive correlation between the test and SF36-PCS (Table 5). [34]



Discussion

      Summary of the results and update of the neck pain task force findings

Only tests with both reliability and validity findings could be used in clinic. For functional testing as an assessment tool, our review suggests inter- and intra-rater reliability for active movement control tests of the cervical spine and upper extremity. In particular, bilateral shoulder elevation/flexion and forward bending when standing produced a perfect inter-rater reliability score in NAD II patients. The evidence suggests inter-rater reliability for active cervical rotation and scapular medial and downward rotation as well. However, the evidence suggests some important measurement errors and the clinical accuracy of these tests is not known. For muscle and endurance tests, our results demonstrate preliminary evidence of reliability and phase I and II validity for the cranio-cervical flexion test and deep cervical extension test for the assessment of patients with NAD. In addition, evidence for reliability and preliminary validity of NME tests including both neck flexor tests and extensor tests was identified. However, the clinical accuracy of these tests is not known.

      Update of the Bone and Joint Decade 2000–2010 Task Force on neck pain’s systematic review

For functional tests, the Neck Pain Task Force (NPTF) reported on one study that showed some evidence for construct validity (phase II). [9] Ljungquist et al. showed that patients with NAD I–II had less lifting ability from waist to shoulder compared to those with lower back pain. [52] They found that a higher rating of pain intensity or pain behaviour was associated with lower performance on functional tests (i.e. stepping, lifting, and walking). No studies were reported on the reliability of this assessment method. Our findings add new evidence for functional tests.

For muscle and endurance tests, the reliability and validity results of this review are in agreement with the Neck Pain Task Force, which accepted seven studies dealing with neck muscle strength (active arm and shoulder control tests and flexor tests) as an assessment tool for patients with neck pain. [9] Six of these articles were phase I/II and one was a phase III but utilising EMG to measure muscle strength (which was excluded from this review). In their review, Nordin et al. (2008) reported that neck muscle strength tests had a slight to moderate inter-examiner reliability (kappa ≤0.60) in patients with neck pain, with or without radiculopathy. [9]

The Neck Pain Task Force also reported on one validity study which found that cervical flexor muscle endurance could distinguish between patients with WAD II and healthy controls. [9] Our review is in agreement with the Neck Pain Task Force, as we identified significant mean differences in NFME (supine) between patients with NAD III and healthy controls. [41]

Furthermore, the Neck Pain Task Force did not identify any low risk of bias studies assessing cervical extensor endurance tests. However, we found new evidence that examines the preliminary validity of two cervical extensor endurance tests for NAD: the NME test in prone and the DCE test. [34, 41] Even though the Neck Pain Task Force was published more than 10 years ago, there is still an important need for phase III and IV validity studies to establish the utility of these tests in clinical practice.

      Comparison of results to previous systematic reviews

One previous systematic review [7] investigated the reliability and validity of neck muscle strength and endurance tests since the Neck Pain Task Force. [9] de Koning et al. reported that muscle endurance tests of the short neck flexors and the cervical progressive iso-inertial lifting evaluation (PILE) test could be reliable, with intra-rater ICCs ranging from 0.88 to 0.96. An almost perfect inter-observer reliability coefficient was reported (ICC = 1.00 (95% CI: 0.99–1.0)). [7] No studies using the PILE test met the inclusion criteria of the current review (sample size < 20). [37]

      Strengths and limitations

There are several strengths to our review. First, we worked with a librarian to develop a search strategy that was comprehensive and methodologically rigorous. To minimise errors, this search strategy was reviewed by a second independent librarian. Second, prior to reviewing the literature, we outlined detailed inclusion and exclusion criteria to identify relevant citations from the searched literature. Third, we searched multiple databases using database-specific subject headings (e.g. MeSH) when available. Fourth, we had multiple pairs of independent reviewers complete screening and critical appraisal to minimise error and bias. Fifth, we used standardised quality assessment tools (QAREL/QUADAS-2) for the critical appraisal process. Finally, we used best-evidence synthesis to minimise the risk of bias associated with the inclusion of low-quality studies. No cut-off points were used to decide the risk of bias of each article; instead, pairs of reviewers were trained to determine the overall internal validity of the studies and to assess how biases influenced the results.

Our review has some limitations. First, our literature search was restricted to the English and French languages and potentially admissible non-English/French studies may have been excluded. However, previous systematic reviews of clinical trials have investigated the impact of language restriction and found that it does not lead to bias as most large trials are published in English. [53–57]

Second, it is possible that we missed potentially relevant studies, despite using a sensitive search strategy and an independent screening process. We updated our literature search to November 2017, but found no new information. Nevertheless, it is possible that new research has been published since.

Third, there is judgment in the critical appraisal process, which may vary between reviewers. However, we minimised this by using pairs of independent, trained reviewers and standardised quality assessment tools. Fourth, most clinical tests described in this review involve a subjective evaluation of the patient, which may lead to measurement error. This could be especially important when comparing novice versus experienced examiners, and could explain some of the inconsistency outlined in our results. Finally, we elected to keep shoulder tests because it is important to report what has been published in the literature. However, the pathophysiological rationale for some of these tests was lacking or ill-conceived.



Conclusions

We found active shoulder tests reliable and valid to assess neck function in adults with NAD. Experts were found to have a higher inter-rater reliability compared to novice raters. The cranio-cervical flexion test and DCE test were also reported to be reliable and valid for the assessment of cervical muscle strength in NAD patients. Overall, the evidence is preliminary at best, supported by phase I and II validity studies from the Sackett and Haynes classification. [23] Clinicians must consider the preliminary nature of the evidence when considering the use of these tests in clinic.

More than 10 years after the publication of the Neck Pain Task Force, we still know little about the reliability and validity of clinical tests used to assess cervical function, muscle strength, and endurance in adults with neck pain. At best the current literature provides preliminary evidence for the active shoulder tests, cranio-cervical flexion tests and the DCE test. Therefore, the clinical utility of these tests remains unknown. Future high-quality studies, particularly phase III validity studies, are needed to inform the use of these tests for the assessment of NAD in clinical practice and their utility for treatment recommendations.


Acknowledgement:

The authors acknowledge and thank Mrs Sophie Despeyroux, librarian at the Haute Autorité de Santé, for her suggestions and review of the search strategy. This research was undertaken, in part, thanks to funding from the Canada Research Chairs programme to Dr Pierre Côté, Canada Research Chair in Disability Prevention and Rehabilitation at the University of Ontario Institute of Technology.


Conflict of Interest

The authors report no declarations of interest. None of these associations were involved in the collection of data, data analysis, interpretation of data, or drafting of the manuscript.


Funding

This study was funded by the Institut Franco-Européen de Chiropraxie, the Association Française de Chiropraxie and the Fond de Dotation de Recherche en Chiropraxie in France.



References:

  1. GBD 2017 DALYs and HALE Collaborators.
    Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases
    and injuries and healthy life expectancy (HALE) for 195 countries and territories,
    1990–2017: a systematic analysis for the Global Burden of Disease Study 2017.
    The Lancet. 2018;392(10159):1859–1922.

  2. Hogg-Johnson, S, van der Velde, G, Carroll, LJ et al.
    The Burden and Determinants of Neck Pain in the General Population: Results of the
    Bone and Joint Decade 2000–2010 Task Force on Neck Pain and Its Associated Disorders

    Spine (Phila Pa 1976). 2008 (Feb 15); 33 (4 Suppl): S39–51

  3. Carroll LJ, Hogg-Johnson S, Cote P, van der Velde G, Holm LW, et al.
    Course and Prognostic Factors for Neck Pain in Workers: Results of the Bone and Joint Decade
    2000–2010 Task Force on Neck Pain and Its Associated Disorders

    Spine (Phila Pa 1976). 2008 (Feb 15); 33 (4 Suppl): S93–100

  4. Cote P, van der Velde G, Cassidy JD, Carroll LJ, Hogg-Johnson S, Holm LW, et al.
    The Burden and Determinants of Neck Pain in Workers: Results of the Bone and Joint Decade
    2000–2010 Task Force on Neck Pain and Its Associated Disorders

    Spine (Phila Pa 1976). 2008 (Feb 15); 33 (4 Suppl): S60–74

  5. Silverman JL, Rodriquez AA, Agre JC.
    Quantitative cervical flexor strength in healthy subjects and in subjects with mechanical neck pain.
    Arch Phys Med Rehabil. 1991;72:679–681.

  6. Rodriquez AA, Bilkey WJ, Agre JC.
    Therapeutic exercise in chronic neck and back pain.
    Arch Phys Med Rehabil. 1992;73:870–875.

  7. de Koning CH, van den Heuvel SP, Staal JB, et al.
    Clinimetric evaluation of methods to measure muscle functioning in patients with non-specific
    neck pain: a systematic review.
    BMC Musculoskelet Disord. 2008;9:142.

  8. Solway S, Brooks D, Lacasse Y, et al.
    A qualitative systematic overview of the measurement properties of functional walk test
    used in the cardiorespiratory domain.
    Chest. 2001;119:256–270.

  9. Nordin M, Carragee EJ, Hogg-Johnson S, Weiner SS, Hurwitz EL, Peloso PM, et al.
    Assessment of Neck Pain and Its Associated Disorders: Results of the Bone and Joint Decade
    2000–2010 Task Force on Neck Pain and Its Associated Disorders

    Spine (Phila Pa 1976). 2008 (Feb 15); 33 (4 Suppl): S101–S122

  10. Lemeunier N; da Silva-Oolup S; Chow N; Southerst D; Carroll L; Wong JJ; et al.
    Reliability and Validity of Clinical Tests to Assess the Anatomical Integrity of the Cervical Spine
    in Adults with Neck Pain and its Associated Disorders: Part 1- A Systematic Review from the
    Cervical Assessment and Diagnosis Research Evaluation (CADRE) Collaboration

    European Spine Journal 2017 (Sep); 26 (9): 2225–2241

  11. Moser N, Lemeunier N, Southerst D, Shearer H, Murnaghan K, Sutton D, Cote P (2017)
    Validity and Reliability of Clinical Prediction Rules used to Screen for Cervical Spine Injury
    in Alert Low-risk Patients with Blunt Trauma to the Neck: Part 2. A Systematic Review
    from the Cervical Assessment and Diagnosis Research Evaluation
    (CADRE) Collaboration

    European Spine Journal 2018 (Jun); 27 (6): 1219–1233

  12. Lemeunier N; da Silva-Oolup S; Olesen K; Carroll LJ; Shearer H; Wong JJ; Brady OD; et al.
    Reliability and validity of clinical tests to assess measurements of pain and disability in adults
    with neck pain and its associated disorders: Part 3. A systematic review from the Cervical
    Assessment and Diagnosis Research Evaluation (CADRE) Collaboration

    Musculoskeletal Science & Practice 2018 (Dec);   38:   128–147

  13. Lemeunier N, Jeoun EB, Suri M, et al.
    Reliability and Validity of Clinical Tests to Assess Posture, Pain Location, and Cervical Spine Mobility
    in Adults with Neck Pain and its Associated Disorders: Part 4. A Systematic Review from the
    Cervical Assessment and Diagnosis Research Evaluation (CADRE) Collaboration

    Musculoskeletal Science & Practice 2018 (Dec);   38:   128–147

  14. Guzman J, Hurwitz EL, Carroll LJ, Haldeman S, Cote P, Carragee EJ, et al.
    A New Conceptual Model Of Neck Pain: Linking Onset, Course, And Care
    Results of the Bone and Joint Decade 2000–2010 Task Force on
    Neck Pain and Its Associated Disorders

    Spine (Phila Pa 1976). 2008 (Feb 15); 33 (4 Suppl): S14–23

  15. Spitzer WO, Skovron ML, Salmi LR, Cassidy JD, Duranceau J, Suissa S, Zeiss E.
    Scientific Monograph of the Quebec Task Force on Whiplash-Associated Disorders
    Redefining Whiplash and its Management

    Spine (Phila Pa 1976). 1995 (Apr 15); 20 (8 Suppl): S1-S73

  16. Rothman KJ.
    Modern epidemiology.
    Philadelphia, USA:
    Wolters Kluwer Health/Lippincott Williams & Wilkins; 2008.

  17. Knuttgen HG, Kraemer WJ.
    Terminology and measurement in exercise performance.
    J Strength Cond Res. 1987;1:1–10.

  18. Strimpakos N, Oldham JA.
    Objective measurements of neck function. A critical review of their validity and reliability.
    Phys Ther Rev. 2001;6:39–51.

  19. Strimpakos N.
    The assessment of the cervical spine. Part 2: strength and endurance/fatigue.
    J Bodyw Mov Ther. 2011;15: 417–430.

  20. Dvir Z, Prushansky T.
    Cervical muscles strength testing: methods and clinical implications.
    J Manipulative Physiol Ther. 2008;31: 518–524.

  21. Lucas N, Macaskill P, Irwig L, et al.
    The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL).
    BMC Med Res Methodol. 2013;13:111.

  22. Whiting PF, Rutjes AW, Westwood ME, et al.
    QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies.
    Ann Intern Med. 2011;155:529–536.

  23. Sackett DL, Haynes RB.
    The architecture of diagnostic research.
    BMJ. 2002;324:539–541.

  24. Slavin RE.
    Best evidence synthesis: an intelligent alternative to meta-analysis.
    J Clin Epidemiol. 1995;48:9–18.

  25. Viera AJ, Garrett JM.
    Understanding interobserver agreement: the kappa statistic.
    Fam Med. 2005;37:360.

  26. Moher D, Liberati A, Tetzlaff J, Altman DG.
    Preferred Reporting Items for Systematic Reviews
    and Meta-Analyses: The PRISMA Statement

    PLoS Medicine 2009 (Jul 21); 6 (7): e1000100

  27. Bossuyt PM, Reitsma JB, Bruns DE, et al.
    Toward complete and accurate reporting of studies of diagnostic accuracy.
    Am J Clin Pathol. 2003;119:18–22.

  28. Aasa B, Lundström L, Papacosta D, et al.
    Do we see the same movement impairments? The inter-rater reliability of movement tests
    for experienced and novice physiotherapists.
    Eur J Physiother. 2014;16:173–182.

  29. Patroncini M, Hannig S, Meichtry A, et al.
    Reliability of movement control tests on the cervical spine.
    BMC Musculoskelet Disord. 2014;15:402.

  30. Segarra V, Duenas L, et al.
    Inter-and intra-tester reliability of a battery of cervical movement control dysfunction tests.
    Man Ther. 2015;20:570.

  31. Cleland JA, Childs JD, Fritz JM, Whitman JM.
    Interrater reliability of the history and physical examination in patients with
    mechanical neck pain.
    Arch Phys Med Rehabil. 2006;87:1388–1395.

  32. Edmondston SJ, Wallumrød ME, Macleid F, Kvamme LS, et al.
    Reliability of isometric muscle endurance tests in subjects with postural neck pain.
    J Manipulative Physiol Ther. 2008;31:348–354.

  33. Hanney WJ, George SZ, Kolber MJ, et al.
    Inter-rater reliability of select physical examination procedures in patients with neck pain.
    Physiother Theory Pract. 2011;27:345–352.

  34. Jørgensen R, Ris I, Falla D, Juul-Kristensen B.
    Reliability, construct and discriminative validity of clinical testing in subjects with
    and without chronic neck pain.
    BMC Musculoskelet Disord. 2014;15: 408.

  35. Juul T, Langberg H, Enoch F, et al.
    The intra- and inter-rater reliability of five clinical muscle performance tests in patients
    with and without neck pain.
    BMC Musculoskelet Disord. 2013;14:339.

  36. Sebastian D, Chovvath R, Malladi R.
    Cervical extensor endurance test: a reliability study.
    J Bodyw Mov Ther [Internet]. 2015;19: 213–216.

  37. Juul-Kristensen B, Kadefors R, Hansen K, et al.
    Clinical signs and physical function in neck and upper extremities among elderly female
    computer users: the NEW study.
    Eur J Appl Physiol. 2006; 96:136–145.

  38. Trippolini MA, Dijkstra PU, Geertzen JHB, et al.
    Construct validity of functional capacity evaluation in patients with
    Whiplash-Associated disorders.
    J Occup Rehabil. 2015;25:481–492.

  39. Pierrynowski M, McPhee C, Mehta SP, et al.
    Intra and inter-rater reliability and convergent validity of FITHaNSA in individuals with
    Grade II Whiplash Associated disorder.
    Open Orthop J. 2016;10: 179–189.

  40. Chiu TTW, Law EYH, Chiu THF.
    Performance of the craniocervical flexion test in subjects with and without chronic neck pain.
    J Orthop Sports Phys Ther. 2005;35:567–571.

  41. Halvorsen M, Abbott A, Peolsson A, et al.
    Endurance and fatigue characteristics in the neck muscles during sub-maximal isometric
    test in patients with cervical radiculopathy.
    Eur Spine J. 2014;23: 590–598.

  42. Hudswell S, von Mengersen M, Lucas N.
    The cranio-cervical flexion test using pressure biofeedback: a useful measure of cervical
    dysfunction in the clinical setting?
    Int J Osteopath Med. 2005;8: 98–105.

  43. Peolsson A, Kjellman G.
    Neck muscle endurance in nonspecific patients with neck pain and in patients after anterior
    cervical decompression and fusion.
    J Manipulative Physiol Ther. 2007;30: 343–350.

  44. Lourenço AS, Lameiras C, Silva AG.
    Neck flexor and extensor muscle endurance in subclinical neck pain: intrarater reliability,
    standard error of measurement, minimal detectable change, and comparison with
    asymptomatic participants in a university student population.
    J Manipulative Physiol Ther. 2016;39: 427–433.

  45. Lopez-de-Uralde-Villanueva I, Sollano-Vallez E, Del Corral T.
    Reduction of cervical and respiratory muscle strength in patients with chronic nonspecific
    neck pain and having moderate to severe disability.
    Disabil Rehabil. 2017;96(3):203–210.

  46. Trippolini MA, Reneman MF, Jansen B, et al.
    Reliability and safety of functional capacity evaluation in patients with whiplash
    associated disorders.
    J Occup Rehabil. 2013;23:381–390.

  47. O’Leary S, Jull G, Vicenzino B.
    Do dorsal head contact forces have the potential to identify impairment during
    graded-craniocervical flexor muscle contractions?
    Arch Phys Med Rehabil. 2005;86: 1763–1766.

  48. Kahlaee AH, Rezasoltani A, Ghamkhar L.
    Is the clinical cervical extensor endurance test capable of differentiating the local
    and global muscles?
    Spine J. 2017;17:913–921.

  49. Rastovic P, Gojanovic MD, Berberovic M, et al.
    Isometric muscle fatigue of the paravertebral and upper extremity muscles after whiplash injury.
    Ann Saudi Med. 2017;37:297–307.

  50. Martins F, Bento A, Silva AG.
    Within-session and between-session reliability, construct validity, and comparison
    between individuals with and without neck pain of four neck muscle tests.
    Pm R. 2018;10:183–193.

  51. Cagnie B, Cools A, De Loose V, et al.
    Differences in isometric neck muscle strength between healthy controls and women with
    chronic neck pain: the use of a reliable measurement.
    Arch Phys Med Rehabil. 2007;88:1441.

  52. Ljungquist T, Jensen IB, Nygren A, et al.
    Physical performance tests for people with long-term spinal pain:
    aspects of construct validity.
    J Rehabil Med. 2003;35:69–75.

  53. Jüni P, Holenstein F, Sterne J, et al.
    Direction and impact of language bias in meta-analyses of controlled trials: empirical study.
    Int J Epidemiol. 2002;31:115–123.

  54. Moher D, Fortin P, Jadad AR, et al.
    Completeness of reporting of trials published in languages other than English:
    implications for conduct and reporting of systematic reviews.
    Lancet. 1996;347: 363–366.

  55. Moher D, Pham B, Lawson ML, et al.
    The inclusion of reports of randomised trials published in languages other than English
    in systematic reviews.
    Health Technol Assess. 2003;7:1–90.

  56. Morrison A, Polisena J, Husereau D, et al.
    The effect of English-language restriction on systematic review-based meta analyses:
    a systematic review of empirical studies.
    Int J Technol Assess Health Care. 2012;28:138–144.

  57. Sutton AJ, Duval SJ, Tweedie RL, et al.
    Empirical assessment of effect of publication bias on meta-analyses.
    BMJ. 2000;320: 1574–1577.
