Interexaminer Reliability of Seated Motion Palpation
for the Stiffest Spinal Site

This section is compiled by Frank M. Painter, D.C.

FROM: J Manipulative Physiol Ther. 2018 (Sep);   41 (7):   571–579 ~ FULL TEXT

Kelly Holt, PhD, David Russell, DC, Robert Cooperstein, MA, DC, Morgan Young, DC, Matthew Sherson, DC, Heidi Haavik, DC, PhD

Center for Chiropractic Research,
New Zealand College of Chiropractic,
Auckland, New Zealand.

OBJECTIVES:   The purpose of this study was to assess the interexaminer reliability of palpation for stiffness in the cervical, thoracic, and lumbar spinal regions.

METHODS:   In this secondary data analysis, data from 70 patients from a chiropractic college outpatient clinic were analyzed. Two doctors of chiropractic palpated for the stiffest site within each spinal region. Each was asked to select the stiffest segment and to rate their confidence in their palpation findings. Reliability between examiners was calculated as the Median Absolute Examiner Difference (MedianAED) and data dispersion as the Median Absolute Deviation (MAD). Interquartile analysis of the paired examiner differences was performed.

RESULTS:   In total, 210 paired observations were analyzed. Nonparametric data precluded reliability determination using intraclass correlation. Findings included lumbar MedianAED = 0.5 vertebral equivalents (VE), thoracic = 1.7 VE, and cervical = 1.4 VE. For the combined dataset, MedianAED = 1.1 VE; MAD was lowest in the lumbar spine (0.3 VE), highest in the thoracic spine (1.4 VE), and 1.1 VE for the combined dataset. Examiners agreed on the segment or the motion segment containing the stiffest site in 54% of the observations.

CONCLUSIONS:   Interexaminer reliability for palpation was good between 2 clinicians for the stiffest site in each region of the spine and in the combined dataset. This is consistent with previous studies of motion palpation using continuous analysis.

KEYWORDS:   Observer Variation; Palpation; Reproducibility of Results; Spine

From the FULL TEXT Article:


A commonly used chiropractic method for assessment of spine function is motion palpation (MP). [1, 2] This method has been found to be unreliable, with levels of agreement often no better than those associated with chance. [3] This has led to criticism of chiropractic programs for teaching MP and of its use as an assessment in clinical practice. [4, 5] It has been argued [6] that previous study designs may have had flaws that could account for the poor agreement results. Among the design flaws that may have lowered concordance were forcing examiners to classify each segment as either hypomobile or not, even when the participant lacked a stiff spinal site, and forcing a response when examiners were unsure whether a specific level was hypomobile. [6]

A continuous measures analytic system, combined with stratification by examiner confidence, has demonstrated good interexaminer reliability in detecting areas of maximum spinal stiffness. [6–9] Level-by-level analysis of agreement at each spinal level, assessed using the κ statistic, addresses a different question and would not be expected to detect examiner agreement on locations of maximum spinal stiffness. Although this continuous measures approach seemed potentially valuable, the present authors thought it best that this prior work be replicated using similar methodology to substantiate it. The goal was to evaluate if seated motion assessment would obtain results comparable to those obtained using the prone, supine, and side-posture methods used in these prior studies. [6–9] Thus, the primary objective of this study was to assess interexaminer reliability of MP assessment for stiffness in the cervical, thoracic, and lumbar spinal regions.


This study found seated manual MP for the stiffest spinal site (SSS) to have good reliability between 2 doctors of chiropractic. However, to be clinically useful, this procedure must also be valid when compared with a reference standard. The reference standard could be objective measurement of spinal stiffness, but that should ultimately be buttressed by clinical studies showing that treatment based on the identification of spinal stiffness results in enhanced clinical outcomes.

Few studies have reported on the validity for manual palpation of spinal stiffness. A systematic review [17] reported equivocal results, with variable results and generally low sensitivity. It included 5 studies; in 3 of the studies, the palpators assessed spinal stiffness in mannequins featuring variable segmental stiffness, whereas the 2 other studies used pain as a reference standard. The relevance of using mannequins to simulate in vivo spinal stiffness is unclear. In another validity study not included in the review, palpators examining the sacroiliac joint were unable to identify known cases of ankylosing spondylitis as fixated. [18] Another study with a similar design involving the cervical spine [19] demonstrated that palpators were able to detect fixation at the site of congenital block vertebrae. A comprehensive review by Snodgrass et al [20] discussed not only the measurement (manual and instrumented) of spinal stiffness, but also its utility in diagnosis, prognosis, and treatment decision-making. Both the reliability and validity of manual assessment of spinal stiffness were found equivocal at best.

Among the studies included by Snodgrass et al, [20] only that of Campbell and Snodgrass [21] used a stiffest spinal site method similar to the one in the present study, rather than more typical assessment of segmental stiffness. In their study, an experienced physiotherapist manually identified the thoracic segment perceived to be the stiffest by applying posteroanterior pressure on the spinous processes of the thoracic spine. An instrument was used both before and after spinal manipulation to measure stiffness at not only the segment perceived to be stiffest, but also the 2 levels caudal and cephalad to it. Stiffness at spinal locations judged to be very stiff tended to decrease with spinal manipulation.

The poor reliability of manually assessing individual spinal segments for stiffness [22–24] does not preclude reliability in determining the stiffest site within a defined spinal region. A continuous measures analytic system, combined with stratification by examiner confidence, has demonstrated good interexaminer reliability in detecting areas of maximum spinal stiffness. [6–9] When a clinical variable can be measured using either discrete or continuous analysis, there are good reasons to expect greater reliability when using continuous data. A quantitative review [25] compared continuous and categorical measures for assessing psychopathology and found a 15% increase in reliability and a 37% increase in validity using continuous measures, allowing a 50% reduction in the sample size required for a given level of statistical power. Another study [26] showed that low reliability in assessing patients may result from the use of discrete diagnostic criteria that fail to recognize continuous variation in patients’ presentations.

One explanation for why segmental palpation is less likely to detect agreement than a stiffest site paradigm may be that when findings that actually lie on a continuum are dichotomized, information is lost. It is proposed that if stiffness were distributed over a spinal region spanning more than 1 spinal motion segment, rating adjacent motion segments as moving or not moving might fail to identify larger fields of spinal stiffness and, thus, miss the overlap of those fields among examiners’ perceptions. Some investigators have attempted to address this issue by liberalizing their definition of agreement. For example, Christensen et al [14] considered motion palpators to agree on a site of hypomobility when their findings were within ±1 spinal segment, as did Harlick et al [15] in a study of the accuracy of static spinal palpation. Although these efforts to transcend the requirement that examiners agree on the same segment seem reasonable, the SSS paradigm goes further, extending the definition of agreement to fields of spinal stiffness centered around the SSS. Addressing the SSS would affect the contacted vertebra and the 2 motion segments including it, each of which, in turn, would influence the subjacent segment. This “field of stiffness” analysis thus leads directly to a “field of manipulative impact” analysis, possibly explaining clinical outcomes for spinal manipulation despite frequently low levels of examiner agreement on exact sites for spine care. [16]

Assessing the stiffest spinal site within a defined region may more closely match the practice of chiropractic clinicians than formulating a judgement for each spinal vertebra as “moving or not moving,” the examination method that has been used in typical research settings. Level-by-level analysis may find multiple hypomobile levels within a spinal region, although a clinician would probably not adjust each hypomobile segment. Because spinal manipulation often results in multiple cavitations, [27, 28] it may be less specific than usually intended. However, because manipulation necessarily impacts at least 2 motion segments, treating what seems to be the most hypomobile segment within a defined region may impact adjacent spinal levels. This study’s finding that 75% of examiner differences among 210 observations were ≤2 VE apart implies this SSS protocol would usually “capture” the site of the patient’s complaint.

Interexaminer reliability in this study was determined by calculating MedianAED and MAD; the nonparametric distribution of examiner differences precluded calculating ICC in the usual manner and, for the same reason, precluded using Bland-Altman limits of agreement. [29, 30] Median Absolute Examiner Difference provides a useful measure of the typical examiner difference in localizing the SSS and is especially useful when examiner differences are not normally distributed. [13, 31] Median Absolute Deviation is a robust measure of dispersion (analogous to a standard deviation) that is resilient to outliers and suitable for datasets that are not normally distributed. [13] It is mathematically defined as the median of the absolute deviations of all paired observations from the median absolute examiner difference. [13, 32] It is resistant to extreme outliers at the minimum and maximum ends of the range of examiner differences because relatively large differences have no greater impact on the median value than smaller ones. Median Absolute Deviation quantifies the variability of the data, painting a more detailed picture than the more typically reported range (the maximum and minimum values), whose magnitude is strongly affected by extreme values. Because the height of a typical vertebra varies by spinal region, examiner differences reported in centimeters misleadingly imply different degrees of examiner reliability depending on the region. Reporting the data as VEs allowed immediate comparison of examiner reliability irrespective of spinal region and is more clinically intuitive than centimeters.
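To make these definitions concrete, the following sketch computes MedianAED and MAD exactly as defined above. The paired examiner differences are hypothetical and for illustration only; they are not data from this study.

```python
# Hypothetical paired examiner differences, in vertebral equivalents (VE).
# Illustrative only; these are not the study's data.

def median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def median_aed(diffs):
    """Median Absolute Examiner Difference: the median of the
    absolute values of the paired examiner differences."""
    return median([abs(d) for d in diffs])

def mad(diffs):
    """Median Absolute Deviation, as defined in the text: the median of
    the absolute deviations of all paired observations from MedianAED."""
    center = median_aed(diffs)
    return median([abs(abs(d) - center) for d in diffs])

diffs = [0.0, 0.5, -0.5, 1.0, -1.5, 2.0, 0.5]
print(median_aed(diffs))  # → 0.5 (typical examiner difference, in VE)
print(mad(diffs))         # → 0.5 (robust dispersion around that median)
```

Because both statistics are medians, the single extreme value (2.0 VE) does not pull them upward the way a mean or standard deviation would, which is the robustness property the text describes.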

To assist in interpreting the findings for MedianAED (Fig 5), consider a case in which the first examiner judged the SSS to be at the middle of a given segment (right). If the second examiner identifies an SSS that is ≤1.5 VE segments distant, it must be concluded the examiners agreed that 1 of the 2 motion segments including the first examiner's SSS was hypomobile. This does not imply their findings were in a range spanning 2 motion segments, but only that agreement may have occurred on the motion segment above or below. In this study, this occurred 85.6% of the time in the lumbar spine, 47.1% of the time in the thoracic spine, and 61.4% of the time in the cervical spine.

If 1 of the examiners had identified the SSS at either the very top or the very bottom of a segment (left), the MedianAED consistent with agreeing on the hypomobile motion segment including the SSS narrows to ≤1.0 VE. This is because a MedianAED > 1.0 VE would mean the second examiner might have found the SSS to lie more than 1 vertebra distant (Fig 5, lower left), thus within the subjacent motion segment. Examiner differences ≤1.0 VE occurred 75.8% of the time in the lumbar spine, 40.1% of the time in the thoracic spine, and 41.4% of the time in the cervical spine.

For the combined dataset, agreement on the SSS or the motion segment containing it occurred 56.2% of the time at the ≤1.5 VE cutoff point (ie, the first examiner found the SSS at a vertebral center) and 46.7% of the time at the ≤1.0 VE cutoff point (ie, the first examiner found the SSS at the top or bottom of a vertebra). Under the reasonable assumption that an examiner was equally likely to locate the SSS nearest the middle or nearest the top or bottom of a vertebra, it may be most reasonable to use ≤1.25 VE as the cutoff point for stating the examiners agreed on the SSS or the motion segment including it. This occurred 54.3% of the time for the combined dataset. These figures show less agreement than data reported [9] for the prior studies. [6–8] In that secondary analysis, examiner differences were ≤1.5 VE 77.0% of the time in the combined dataset, exceeding the 56.2% seen in the present study, and the unstratified MedianAED was 0.7 VE, compared with 1.1 VE in the present study.
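The cutoff analysis above reduces to counting what share of absolute examiner differences fall at or below a chosen threshold. A minimal sketch, using hypothetical differences rather than the study's data:

```python
def agreement_rate(diffs, cutoff_ve):
    """Fraction of paired observations whose absolute examiner
    difference is at or below the cutoff (in vertebral equivalents)."""
    hits = sum(1 for d in diffs if abs(d) <= cutoff_ve)
    return hits / len(diffs)

# Hypothetical absolute examiner differences in VE:
diffs = [0.0, 0.5, 0.5, 1.0, 1.2, 1.5, 2.0, 3.0]
for cutoff in (1.0, 1.25, 1.5):
    print(cutoff, agreement_rate(diffs, cutoff))
# → 1.0 0.5
# → 1.25 0.625
# → 1.5 0.75
```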

A box and whisker plot summarizes the results for the combined dataset (Fig 3). This plot divides the data into quartile groups: the lower whisker represents the smallest 25% of examiner differences, the blue box represents the middle half, and the upper whisker represents the largest 25%. The 13 dots above the line across the upper whisker represent data points defined as outliers because they exceed the third quartile by more than 1.5 times the interquartile range (the height of the box). The value at the top of the box is 2.0 VE, meaning 75% of the examiner differences among the 210 observations were ≤2.0 VE.
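The outlier rule the plot uses (a point lying more than 1.5 times the interquartile range above the third quartile) can be sketched as follows, again with hypothetical differences rather than the study's data:

```python
import statistics

def upper_outliers(values):
    """Return values above Q3 + 1.5 * IQR, the box-plot outlier
    rule described in the text."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v > q3 + 1.5 * iqr]

# Hypothetical examiner differences in VE; 8.0 is an extreme value:
diffs = [0.0, 0.5, 0.5, 1.0, 1.0, 1.5, 2.0, 8.0]
print(upper_outliers(diffs))  # → [8.0]
```

Note that quartile conventions vary slightly among software packages, so a plotting library may flag marginal points somewhat differently.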

There were several differences between the present study and the previous, otherwise similar, studies, [6–8] possibly accounting for the modest difference in outcomes. In the present study, MP was performed in the seated position, assessing the excursion of primarily forward flexion and extension movements; whereas the prior studies assessed prone end-feel (participants prone in the thoracic spine assessments, supine in the cervical spine, and in side-posture in the lumbar spine). Excursion MP assesses the motion of a vertebra in relation to an adjacent vertebra, whereas end-feel palpation assesses the stiffness to palpation of a single vertebra. [33] Unlike the present study, in the prior studies, [9] interexaminer reliability varied with examiner confidence; when both examiners were confident in their findings (53.4%), the median examiner difference decreased to 0.6 VE, increased to 1.0 VE when 1 lacked confidence, and increased to 1.8 VE when both lacked confidence. In the previous studies, paired examiner differences were normally distributed, [9] allowing the use of both ICC and Bland-Altman limits of agreement to assess interexaminer reliability in addition to the MedianAED calculations used in both the previous and present studies.
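The confidence stratification reported for the prior studies amounts to grouping the paired differences by examiner confidence and computing a median within each stratum. A hypothetical sketch (the records and values below are invented for illustration, not taken from any of the studies):

```python
import statistics
from collections import defaultdict

# Hypothetical records: (absolute difference in VE,
#                        examiner 1 confident?, examiner 2 confident?)
records = [
    (0.5, True, True), (0.75, True, True),
    (1.0, True, False), (1.25, False, True),
    (1.5, False, False), (2.0, False, False),
]

def stratum(c1, c2):
    """Label a paired observation by how many examiners were confident."""
    if c1 and c2:
        return "both confident"
    if c1 or c2:
        return "one confident"
    return "neither confident"

groups = defaultdict(list)
for diff, c1, c2 in records:
    groups[stratum(c1, c2)].append(diff)

for name, diffs in groups.items():
    print(name, statistics.median(diffs))
# → both confident 0.625
# → one confident 1.125
# → neither confident 1.75
```

In this invented dataset, as in the prior studies' stratified results, the typical examiner difference grows as examiner confidence declines.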


Limitations of this study include the use of a convenience sample of participants based on logistics in the Chiropractic Centre and examiner availability, which likely resulted in sampling bias. Only 2 examiners, who may have had unique experience and skills, participated in the study. The examiners were full-time chiropractic educators; thus, it could be argued that they were not representative of chiropractors in the field. Therefore, these findings may not necessarily be extrapolated to other chiropractors.

Although examiners were blinded to any other prior findings during the study, it is possible that they were familiar with some patients’ prior findings or with clinical or nonclinical cues from previous visits to the Chiropractic Centre that they may have supervised. Although there was no power analysis to determine sample size, the investigators considered 70 participants, each of whom was examined in each spinal region for a total of 210 observations, to be ample. Although ICC could not be used because of the nonparametric nature of the data, the recommended number of participants for either a complete dataset or a subset in this kind of study is about 35, to have 80% power at the 5% significance level to detect ICC ≥ 0.6. [34] It was likely reasonable to extrapolate this judgement on sample size to the MedianAED calculations used in the present study. Although MAD is robust to extreme values because a larger extreme value has no greater impact than a smaller one, this robustness is also an important weakness: so-called extreme outliers at the lower and upper extremes of examiner differences may represent an important characteristic of the examination method under investigation.

The methodological choice to select the more caudal site for the SSS when an examiner found at least 2 sites indistinguishably stiff may have biased the results to some degree. Without a reference standard, it cannot be confirmed that there actually were areas of clinically significant spinal stiffness among the participants studied. Although clinician disagreement on the stiffest or otherwise defined optimal site of spine care may lead to suboptimal results or even harm patients, the authors are not aware of studies confirming or excluding that possibility. Snodgrass et al [20] did conclude that there is limited evidence that manually assessed spinal stiffness may be associated with radiographic findings of rotational hypomobility or hypermobility and that spinal stiffness decreases in the short term immediately after high-velocity, low-amplitude manipulation in symptomatic persons. In the end, the limited amount of information available on kinematic and clinical changes after manipulation of arbitrarily selected spinal segments may not be entirely relevant to pre-post changes in the properties of the SSS.

Motion palpation findings should not solely determine clinical interventions; clinicians should also consider additional parameters related to the patient and doctor, including the patient’s symptoms, comorbidities, prior response to care, preferences, values, goals, and specific diagnoses. It should also be noted that demonstrating an examination procedure to be reliable does not, by itself, demonstrate that it is clinically useful. It must also be shown to obtain valid information, generally through comparison of its results with those of a reference standard. Future studies might address whether deploying this stiffest site paradigm results in improved clinical outcomes.


More than half the time (54.3%), the examiners in this study agreed on the exact segment, or at least the motion segment, that was stiff in a given spinal region. The MedianAED for the combined dataset was 1.1 VE. This information supports high levels of interexaminer reliability for the SSS in each region of the spine and in the combined dataset. Reliability estimates based on absolute examiner differences were relatively higher and seemed more trustworthy than prior estimates based on discrete analysis and the κ statistic because the assessment method more closely resembled that used by clinicians in practice. The reliability seen in this study is broadly consistent with previous studies of MP using continuous analysis.


  1. Holt, K, Kelly, B, and Taylor, H.
    Practice characteristics of chiropractors in New Zealand.
    Chiropr J Aust. 2009; 39: 103–109

  2. Walker, BF and Buchbinder, R.
    Most Commonly Used Methods of Detecting Spinal Subluxation and the Preferred Term for
    its Description: A Survey of Chiropractors in Victoria, Australia

    J Manipulative Physiol Ther. 1997 (Nov);   20 (9):   583–589

  3. Haneline, M, Cooperstein, R, Young, M, and Birkeland, K.
    An annotated bibliography of spinal motion palpation reliability studies.
    JCCA J Can Chiropr Assoc. 2009; 53: 40–58

  4. Troyanovich, SJ and Harrison, DD.
    Motion palpation: it's time to accept the evidence.
    J Manipulative Physiol Ther. 1998; 21: 568–571

  5. Hestbaek, L and Leboeuf-Yde, C.
    Are chiropractic tests for the lumbo-pelvic spine reliable and valid?
    A systematic critical literature review.
    J Manipulative Physiol Ther. 2000; 23: 258–275

  6. Cooperstein, R, Haneline, M, and Young, M.
    Interexaminer Reliability of Thoracic Motion Palpation Using Confidence Ratings and Continuous Analysis
    J Chiropractic Medicine 2010 (Sep);   9 (3):   99–106

  7. Cooperstein, R, Young, M, and Haneline, M.
    Interexaminer Reliability of Cervical Motion Palpation Using Continuous Measures and Rater Confidence Levels
    J Can Chiropr Assoc. 2013 (Jun);   57 (2):   156–164

  8. Cooperstein, R and Young, M.
    The Reliability of Lumbar Motion Palpation Using Continuous Analysis and Confidence Ratings
    J Can Chiropr Assoc. 2016 (Jun);   60 (2):   146–157

  9. Cooperstein, R and Young, M.
    The Reliability of Spinal Motion Palpation Determination of the Location of the Stiffest
    Spinal Site is Influenced by Confidence Ratings: A Secondary Analysis of Three Studies
    Chiropractic & Manual Therapies 2016 (Dec 20);   24:   50

  10. Holt, K, Russell, D, Cooperstein, R, Young, M, Sherson, M, and Haavik, H.
    Interexaminer reliability of the detection of vertebral subluxations using continuous
    measures and confidence levels.
    Chiropr J Aust. 2018; 46: 100–117

  11. Abbott, JH, Flynn, TW, Fritz, JM, Hing, WA, Reid, D, and Whitman, JM.
    Manual physical assessment of spinal segmental motion: intent and validity.
    Man Ther. 2009; 14: 36–44

  12. Bergmann, T and Peterson, DH.
    Chiropractic Technique. 3rd ed.
    Elsevier, St. Louis, MO; 2011

  13. Leys, C, Ley, C, Klein, O, Bernard, P, and Licata, L.
    Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median.
    J Exp Soc Psychol. 2013; 49: 764–766

  14. Christensen, HW, Vach, W, Vach, K et al.
    Palpation of the upper thoracic spine: an observer reliability study.
    J Manipulative Physiol Ther. 2002; 25: 285–292

  15. Harlick, JC, Milosavljevic, S, and Milburn, PD.
    Palpation identification of spinous processes in the lumbar spine.
    Man Ther. 2007; 12: 56–62

  16. Triano, JJ, Budgell, B, Bagnulo, A et al.
    Review Of Methods Used By Chiropractors To Determine The Site For Applying Manipulation
    Chiropractic & Manual Therapies 2013 (Oct 21); 21 (1): 36

  17. Najm, WI, Seffinger, MA, Mishra, SI et al.
    Content validity of manual spinal palpatory exams - A systematic review.
    BMC Complement Altern Med. 2003; 3: 1

  18. Mior, SA, McGregor, M, and Schut, B.
    The role of experience in clinical accuracy.
    J Manipulative Physiol Ther. 1990; 13: 68–71

  19. Humphreys, BK, Delahaye, M, and Peterson, CK.
    An Investigation into the Validity of Cervical Spine Motion Palpation Using
    Subjects with Congenital Block Vertebrae as a 'Gold Standard'

    BMC Musculoskelet Disord 2004 (Jun 15); 5 (1): 19

  20. Snodgrass, SJ, Haskins, R, and Rivett, DA.
    A structured review of spinal stiffness as a kinesiological outcome of manipulation:
    its measurement and utility in diagnosis, prognosis and treatment decision-making.
    J Electromyogr Kinesiol. 2012; 22: 708–723

  21. Campbell, BD and Snodgrass, SJ.
    The effects of thoracic manipulation on posteroanterior spinal stiffness.
    J Orthop Sports Phys Ther. 2010; 40: 685–693

  22. Matyas, TA and Bach, TM.
    The reliability of selected techniques in clinical arthrometrics.
    Aust J Physiother. 1985; 31: 175–199

  23. Maher, C and Adams, R.
    Reliability of pain and stiffness assessments in clinical manual lumbar spine examination.
    (discussion 809-811)
    Phys Ther. 1994; 74: 801–809

  24. Binkley, J, Stratford, PW, and Gill, C.
    Interrater reliability of lumbar accessory motion mobility testing.
    (discussion 793-795)
    Phys Ther. 1995; 75: 786–792

  25. Markon, KE, Chmielewski, M, and Miller, CJ.
    The reliability and validity of discrete and continuous measures of psychopathology:
    a quantitative review.
    Psychol Bull. 2011; 137: 856–879

  26. Baca-Garcia, E, Perez-Rodriguez, MM, Basurte-Villamor, I et al.
    Diagnostic stability of psychiatric disorders in clinical practice.
    Br J Psychiatry. 2007; 190: 210–216

  27. Ross, JK, Bereznick, DE, and McGill, SM.
    Determining cavitation location during lumbar and thoracic spinal manipulation:
    is spinal manipulation accurate and specific?
    Spine. 2004; 29: 1452–1457

  28. Ross, K, Bereznick, DE, and McGill, SM.
    The accuracy and specificity of lumbar and thoracic spinal manipulation.
    J Chiropr Educ. 2004; 18: 26

  29. Bland, JM and Altman, DG.
    Statistical methods for assessing agreement between two methods of clinical measurement.
    Lancet. 1986; 1: 307–310

  30. Bland, JM and Altman, DG.
    Comparing methods of measurement: why plotting differences against standard method is misleading.
    Lancet. 1986; 346: 1085–1087

  31. Rouse, MW, Borsting, E, and Deland, PN.
    Convergence Insufficiency and Reading Study (CIRS) Group.
    Reliability of binocular vision measurements used in the classification of convergence insufficiency.
    Optom Vis Sci. 2002; 79: 254–264

  32. Huang, S, Wang, T, and Yang, M.
    The evaluation of statistical methods for estimating the lower limit of detection.
    Assay Drug Dev Technol. 2013; 11: 35–43

  33. Cooperstein, R.
    Two types of motion palpation: the excursion and the end-feel methods.
    JACA Online. 2008; 45: 25–26

  34. Eliasziw, M, Young, SL, Woodbury, MG, and Fryday-Field, K.
    Statistical methodology for the concurrent assessment of interrater and intrarater reliability:
    using goniometric measurements as an example.
    Phys Ther. 1994; 74: 777–788

