J Can Chiropr Assoc 2025 (Nov 30); 69 (3): 238–254

Table 1.

Descriptive information extracted from the 28 articles included in our review.

Field | First author, year | Title | Study design | Evidence hierarchy insights (a)
Clinical practice | Aldous, 2024 [14] | Wheel replacing pyramid: better paradigm representing totality of evidence-based medicine | Narrative review
  • 1. Proposes a ‘totality of evidence’ wheel: a non-hierarchical framework that includes all study designs to give a comprehensive view of medical evidence. It is intended for fast-evolving situations such as the COVID-19 pandemic, where it may enable quicker, informed decision-making.


  • 2. The evidence pyramid places RCTs at the top, potentially overshadowing other study designs. For example, well-conducted observational studies are sometimes neglected because of their lower position. The authors argue that the traditional evidence pyramid restricts the scope of information and thereby hampers medical progress, particularly in emergencies. The wheel structure they proposed, which is non-hierarchical in nature, would enable medical professionals to consider a broader array of evidence, including population studies and narrative accounts, which are often excluded in traditional pyramid-based thinking.

Antoniou, 2022 [15] | An overview of evidence quality assessment methods, evidence to decision frameworks, and reporting standards in guideline development | Narrative review
  • 1. Distinguishes between strength of evidence assessments and evidence hierarchies. While both aim to provide clinicians, patients, and researchers with a comprehensive evaluation of the evidence, assessments provide judgements on confidence in study findings, whereas hierarchies rank evidence by study design (e.g., RCTs highest, expert opinion lowest). Hierarchies are simple and easy for non-experts to use, aiding guideline development for therapeutic effects, harms, and other clinical questions. They are also easy to understand as a basis for clinical practice guidance. Within hierarchies, the level of evidence does not necessarily reflect the strength of a recommendation.

    The authors developed their own hierarchy to align evidence with class of recommendation. For vascular surgery, they suggested associating clinical decisions with recommendation wording as follows:
    - Evidence from multiple RCTs showing favourable results for a given treatment: “is recommended”.
    - Clearly favourable results from a single RCT or a large non-randomized study: “should be considered”.
    - Less clearly favourable results (efficacy less well established) from such single studies: “may be considered”.
    - Unfavourable results potentially suggesting harm, from expert consensus or small studies: “is not recommended”.


  • 2. They felt that hierarchies are overly simplistic, failing to account for important factors of evidence beyond study design that are essential for clinical decision-making.

Anttila, 2016 [9] | Conclusiveness resolves the conflict between quality of evidence and imprecision in GRADE | Commentary
  • 1. Highlights that the GRADE guideline presents significant challenges in understanding the key concepts of “quality of evidence” and “imprecision,” particularly when they are considered together. This confusion may hinder the practical process of evidence assessment, indicating a need for explicit guidance in the GRADE framework. Quality is not objectively calculated. Instead, it reflects reviewers’ confidence in how close the estimate is to the true effect, expressed on a 4-point ordinal scale. Imprecision is a reason for downgrading evidence quality. It incorporates aspects such as sample size, statistical power, confidence intervals, and critical margins regarding benefits and harms. However, including critical margins within the concept of imprecision causes confusion, because these margins do not necessarily reflect how statistically close the parameter value is to the estimate.

Bosdriesz, 2020 [10] | Evidence-based medicine: when observational studies are better than randomized controlled trials | Narrative review
  • 3. RCTs are the gold standard for evaluating the intended effects of interventions because randomization minimizes confounding by indication. However, RCTs can have limitations, including limited generalizability, high costs, short follow-up, ethical concerns, and smaller sample sizes. When RCTs are not feasible, observational studies (e.g., cohort or case-control) are used. Observational studies may raise concerns about confounding, but they offer greater generalizability and allow the effect of naturally occurring exposures on an outcome to be measured. Ultimately, the research question should guide the choice of study design.

Chloros, 2023 [16] | Has anything changed in evidence-based medicine? | Commentary
  • 1. The evidence pyramid ranks research designs in the following order: meta-analyses and systematic reviews at the top, then RCTs, cohort and case-control studies, case series, case reports, and expert opinion at the bottom. The pyramid has been refined periodically. Its top has remained consistent, but the lower levels vary and sometimes include laboratory and animal research. The pyramid separates evidence into “robust” (levels 1 and 2) and “less robust” categories so that the best evidence can be prioritized in research and clinical practice. Many view the pyramid as a hierarchy. However, not all research questions can be addressed by RCTs, which primarily aim to reduce bias and confounding.


  • 2. The traditional evidence pyramid, based solely on methodology, is oversimplified and potentially misleading. A poorly conducted RCT can yield unreliable results, while a well-executed observational study may produce strong evidence.


  • 3. Urgent public health needs (e.g., COVID-19 pandemic) sometimes necessitate considering multiple forms of evidence, such as robust observational studies, in addition to RCTs.

Cuello-Garcia, 2022 [17] | GRADE guidance 24: optimizing the integration of randomized and non-randomized studies of interventions in evidence syntheses and health guidelines | Commentary
  • 1. The authors recommend using the GRADE methodology to assess the certainty of evidence from RCTs for each outcome individually. If high certainty is achieved, further evaluation of non-randomized studies of interventions (NRSI) is unnecessary. However, if RCT evidence is of low or very low certainty, NRSIs can be considered to enhance overall certainty. In cases where RCT evidence is moderate, NRSIs may be integrated to address issues like indirectness. The authors caution that while large NRSIs with precise estimates may be appealing, they should be carefully evaluated for bias using appropriate tools (e.g., ROBINS-I).

Djulbegovic, 2022 [18] | High quality (certainty) evidence changes less often than low-quality evidence, but the magnitude of effect size does not systematically differ between studies with low versus high-quality evidence | Meta-epidemiological study
  • 1. Within a traditional evidence hierarchy, the authors found that lower-quality evidence changes more often than higher-quality evidence. This suggests that higher-quality evidence is more valid and reliable. However, the magnitude of treatment effects did not differ significantly between low- and high-quality evidence. The GRADE approach may therefore not effectively differentiate how quality of evidence affects treatment effect sizes. The authors suggest that current methods of appraising evidence may need to be reassessed so that they capture quality of evidence as intended. If studies with low- and high-quality evidence produce similar effect sizes, this challenges the assumption that higher-quality evidence is always more valid or applicable for informing clinical decisions.


  • 2. As above.

Djulbegovic, 2024 [19] | High certainty evidence is stable and trustworthy, whereas evidence of moderate or lower certainty may be equally prone to being unstable | Meta-epidemiological study
  • 1. Found that high-quality evidence, free from limitations, rarely changes with new data, while evidence with even one limitation (moderate quality) is more likely to change. Moderate-quality evidence often has a single limitation and should be interpreted cautiously when strong recommendations are issued. Lower-quality evidence (moderate, low, or very low) exhibited more frequent changes, larger deviations, and greater uncertainty. Limitations, especially imprecision and indirectness, significantly affected changes in effect estimates and their statistical significance.

Galbraith, 2017 [28] | A real-world approach to evidence-based medicine in general practice: a competency framework derived from a systematic review and Delphi process | Systematic review and Delphi process
  • 1. Proposes a competency framework to bridge real-world practice and EBP. Proposes viewing evidence in terms of what is most appropriate, and suggests that relying solely on evidence to guide a search for ‘real-world’ evidence is not best practice.


  • 3. Emphasizes the importance of clinician expertise, because viewing evidence alone is insufficient. Also suggests that EBP is rigid in its application.

Hohmann, 2018 [29] | Research pearls: how do we establish the level of evidence? | Commentary
  • 1. Acknowledges a traditional evidence hierarchy that categorizes research into five levels (I–V), where Level I represents the highest quality and Level V the lowest. The authors state that these levels help classify studies by design and rigour, with higher levels often offering more reliable results for clinical practice.


  • 3. They suggest that the level of evidence assigned to studies in the hierarchy reflects study design rather than quality. Even a poorly executed ‘level 1’ trial can be downgraded if it lacks power or proper design. Level of evidence is just one measure of quality, and relying on it alone does not reflect the definition of EBP.

Mayoral, 2021 [30] | Decision-making in medicine: a Kuhnian approach | Commentary
  • 1. Criticizes the traditional thought process of using an evidence pyramid to guide evidence consideration, suggesting it imposes constraints on clinical decision-making that can contribute to a lack of holistic care for individual patients with their own contexts and circumstances.

Mercuri, 2018 [31] | The evolution of GRADE (part 1): is there a theoretical and/or empirical basis for the GRADE framework? | Narrative review
  • 1. Critiques the GRADE framework for lacking theoretical and empirical justification for its criteria for assessing evidence quality and making clinical recommendations. The authors state that GRADE relies on a modified hierarchy of evidence. That hierarchy itself lacks a solid theoretical foundation, which suggests the EBP hierarchy is based more on belief than on scientific proof. These limitations are most evident in the prioritization of RCTs over other well-designed studies. The authors note that empirical evidence for the superiority of RCTs in controlling bias is inconclusive, since some well-designed non-randomized studies yield similar effect estimates. The article suggests that unless these foundational issues are addressed, GRADE may not improve on the limitations of the EBP evidence hierarchy, and it could suffer from the same limitations in guiding clinical practice.

Mercuri, 2018 [32] | The evolution of GRADE (part 2): still searching for a theoretical and/or empirical basis for the GRADE framework | Narrative review
  • 1. Highlights research critiquing the GRADE framework for adopting Bradford Hill’s criteria (implicitly and explicitly) without fully integrating them into a coherent theoretical basis and not clearly articulating the connection. They also note that GRADE lacks explicit consideration of biological plausibility and mechanisms, which are downplayed in EBP hierarchies but are important for understanding causation. They critique EBP’s reliance on evidence hierarchies, particularly the emphasis on randomization. Proponents of EBP argue that randomization balances study groups, leading to more reliable effect estimates. However, literature is presented that questions the philosophical and empirical basis of randomization’s superiority. Even with balanced groups, external validity and individual patient applicability remain problematic, as generalizability and patient-specific outcomes are not always addressed effectively.

Mercuri, 2018 [11] | The evolution of GRADE (part 3): a framework built on science or faith? | Narrative review
  • 1. States that GRADE categorizes studies into RCTs and observational studies, with the latter consistently rated as lower-quality evidence. No clear reasoning is given for why these types of studies are grouped together or rated similarly. The authors suggest that observational studies were classified as starting at “low certainty” based on internal discussion rather than empirical evidence. They also find it unclear why certain criteria for assessing evidence quality and making recommendations were selected and others excluded. In their view, changes to the framework have been introduced by consensus rather than on scientific evidence. In addition, the lack of operational definitions for key criteria leaves too much room for user judgement, which raises concerns about the validity of the resulting recommendations. They conclude that GRADE’s foundation is weak, because it lacks the theoretical or empirical support needed to justify its approach. They argue that until the framework is substantiated by scientific evidence, the validity of its recommendations remains uncertain and reliance on it should be cautious.

Mercuri, 2018 [12] | What confidence should we have in GRADE? | Commentary
  • 1. Summarizes how grades are assigned within GRADE and the evidence hierarchy:
    - RCTs receive a “high” grade, signifying high confidence in the effect.
    - Observational studies are graded “low.”
    - Other sources (e.g., lab studies, case reports) are graded “very low.”
    Criteria are provided to adjust these grades, increasing or decreasing confidence based on factors such as study limitations, effect size, or bias.


  • 2. They criticize GRADE for suggesting that certain types of evidence, like observational studies or expert opinion, are discarded when stronger evidence (e.g., RCTs) is available. They suggest it also lacks clarity on how to integrate evidence from diverse sources (e.g., RCTs with observational or basic science findings), and that the hierarchy implies that higher-quality evidence, such as RCTs, automatically outweighs lower-quality studies, which may undermine the value of the broader evidence base.

Mugerauer, 2020 [13] | Professional judgement in clinical practice (part 3): a better alternative to strong evidence-based medicine | Narrative review
  • 1. Suggests a major issue with EBP is its unrealistic focus on certainty, leading to the mistaken belief that if clinicians make different decisions, it means they do not know what they are doing. This results in a push for rigid, standardized guidelines based on evidence ‘level’ or quality, with RCTs seen as the most “objective.” However, they argue that skilled practitioners recognize that uncertainty is normal, especially when treating unique patients with multiple conditions in complex and varying environments. They suggest that clinician expertise is therefore not only important, but necessary when considering EBP and evidence hierarchies.

Noman, 2024 [20] | Simplifying the concept of level of evidence in lay language for all aspects of learners: in brief review | Commentary
  • 1. The authors conceptualized the evidence hierarchy as divided into filtered and unfiltered categories, reflecting different levels of synthesis and evaluation. Filtered information, positioned at the top of the pyramid, includes systematic reviews, meta-analyses, and critically appraised topics and articles. These forms of evidence undergo rigorous assessment and synthesis, providing highly reliable information that can guide clinical practice without further scrutiny from practitioners. Unfiltered information, located in the middle tiers, comprises primary research studies, such as RCTs and observational studies, which, while potentially more current and specific, require practitioners to critically evaluate their quality and relevance before application.


  • 2. While filtered evidence is easier to apply due to its pre-evaluated nature, it may not always be available or applicable to specific clinical scenarios, necessitating a reliance on unfiltered sources. Additionally, the base of the pyramid, which includes expert opinion and background information, though not considered high-level evidence, still plays a role in forming the foundation of clinical knowledge, especially in areas where high-level evidence is lacking. Practitioners are encouraged to carefully select and apply the best available evidence, balancing the reliability of filtered sources with the immediacy and specificity of unfiltered ones, and to remain mindful of the context and limitations inherent in lower levels of evidence.

Ritson, 2023 [22] | Bridging the gap: evidence-based practice guidelines for sports nutritionists | Narrative review
  • 1. Suggests the hierarchy is based on susceptibility to bias from study design. For intervention-focused questions, systematic reviews and meta-analyses of RCTs (Level 1) and evidence syntheses (Level 2) are preferred due to their rigorous appraisal process. Evidence hierarchies provide practitioners with insight on the degree of certainty they can have when providing recommendations. The authors further suggested that practitioners prioritize evidence from the top of these hierarchies as a result, but should not disregard evidence at the bottom of hierarchies when making recommendations, particularly when evidence at the top has gaps.


  • 2. Despite the defined hierarchy informing levels of bias and trustworthiness, in applied sports and exercise nutrition, this hierarchy is not always definitive. Such high-level evidence can take years to publish and may not fully address a practitioner’s specific PICO question, leading them to rely on lower-tier evidence. While RCTs offer strong internal validity, their high control levels can reduce practical relevance. Although the top of the hierarchy should be prioritized, the full hierarchy should still be considered.

Semrau, 2023 [23] | Common misunderstandings of evidence-based medicine | Commentary
  • 1. The evidence pyramid is appropriate for identifying the highest-quality evidence for doubtful, mechanistically unexplained effects that require a control group. In these cases, a control-group baseline is needed to establish the treatment effect.

    However, they feel the pyramid’s structure is misleading when assessing parameters associated with specific interventions. There, RCTs may not provide the highest-quality evidence in cases where a comparator group does not affect the quality of evidence.

  • 2. When available, they feel that different study designs (or levels of evidence) should be assessed. For example, RCTs can demonstrate a probable cause-effect relationship or indicate a treatment’s practical usefulness, but significant results do not guarantee a true effect. Positive RCTs cannot definitively prove a therapy’s benefit, and negative results cannot disprove a known cause-effect relationship.

  • 3. They suggested reconsidering traditional evidence pyramids, and advised that the most suitable evidence be determined by the specific parameter being evaluated.

Sekhon, 2024 [24] | Synthesis of guidance available for assessing methodological quality and grading of evidence from qualitative research to inform clinical recommendations: a systematic literature review | Systematic review
  • 1. Identified two approaches for summarizing the quality of qualitative research for clinical guidance: a qualitative evidence hierarchy and a research pyramid. Both rank qualitative systematic reviews and meta-syntheses at the top, similar to quantitative research, and each suggests that the top of the hierarchy is reserved for studies providing the most ‘evidence’. However, qualitative research focuses on experiences, barriers, facilitators, and the feasibility of implementation, which are not easily ranked in the same hierarchical way as quantitative evidence.

Szajewska, 2018 [21] | Evidence-based medicine and clinical research: both are needed, neither is perfect | Commentary
  • 1. Acknowledges that an evidence hierarchy is appropriate and that systematic reviews are the strongest form of evidence, but emphasizes that contextual factors must be considered. Depending on the clinical question, observational studies may be more applicable. A framework is proposed to account for this.

Vere, 2019 [26] | Evidence-based medicine as science | Commentary
  • 1. Critiques the notion that EBP, through the evidence hierarchy, fits neatly into traditional scientific methods such as inductivism and falsificationism, which focus on theory confirmation or falsification through observation. EBP prioritizes empirical evidence through hierarchies (RCTs, meta-analyses) but this does not align well with traditional scientific theories. Hierarchies rank evidence quality but do not necessarily test or advance scientific theories directly.

Wieten, 2018 [27] | Expertise in evidence-based medicine: a tale of three models | Commentary
  • 1. Summarizes three initial models of EBP, one of which is the evidence pyramid. The authors note that the first pyramid, composed of four layers, is used to inform the GRADE framework.

  • 2. Explains that clinician expertise is an important consideration when evaluating evidence for use in practice. The authors argue that many pyramids incorrectly treat clinician expertise as a form of evidence. Instead, expertise should be considered a process for appraising and integrating various forms of evidence.

Wallace, 2022 [25] | Hierarchy of evidence within the medical literature | Commentary
  • 1. Defines the hierarchy in the same way as others, running from observational studies up through RCTs, systematic reviews, and meta-analyses. The authors believe the hierarchy should be applied when performing literature searches, particularly when clinicians are pressed for time. However, the overall quality of evidence from each study design still depends on the strengths and limitations identified during critical appraisal. When a clinical question has only an abundance of lower-level observational studies, this should also prompt the development of higher-level studies on the same topic.

Geo-science | St. John, 2017 [36] | The strength of evidence pyramid: one approach for characterizing the strength of evidence of geoscience education research (GER) community claims | Commentary
  • 1. Proposes a modified 5-level evidence pyramid in geoscience education that places “Practitioner wisdom/expert opinion” at its foundation, recognizing educators’ unique insights into what and how to teach. The pyramid distinguishes between qualitative and quantitative studies, separating case studies and cohort studies, while emphasizing the role of clinical expertise in assessing quality. At the top are meta-analyses and systematic reviews, which are less common as they summarize primary research. This model is similar to the EBP hierarchy, highlighting the need for context-sensitive decision-making using hierarchical frameworks.

Public health | Irving, 2016 [33] | A critical review of grading systems: implications for public health policy | Narrative review
  • 1. While RCTs are considered the best method to minimize bias and are frequently regarded as the ideal research design within evidence hierarchies, this does not mean RCTs are appropriate for all types of questions. The authors suggest that some grading systems often overlook issues like flawed randomization or unequal group sizes in RCTs. Additionally, RCTs may not always be appropriate or ethical for all research areas. Observational studies, especially large population-based studies, may offer applicable findings and include more diverse participants, enhancing their external validity.

Jervelund, 2022 [35] | Evidence in public health: an integrated, multi-disciplinary concept | Commentary
  • 1. The authors feel the typical hierarchy of ranking study designs based on methodological rigour and risk of bias with systematic reviews and meta-analyses at the top, and expert opinions at the bottom, is less applicable in a field with large epistemological diversity (e.g., context variability) such as public health.


  • 2. They advocate an evidence typology rather than a rigid hierarchy. The typology evaluates the quality of evidence by how appropriate a study design is for a specific research question. Under this approach, quantitative methods are best for studying causal relationships, qualitative methods are relevant for understanding social contexts, and mixed methods can be used to optimize public health outcomes.

Parkhurst, 2016 [34] | What constitutes “good” evidence for public health and social policy-making? From hierarchies to appropriateness | Commentary
  • 1. The authors acknowledge that “good evidence” for clinical practice often relies on evidence hierarchies, with RCTs typically viewed as the gold standard due to their scientific rigour. However, they suggest there is growing recognition that these hierarchies may not always provide the best guidance for policy-making. Evidence hierarchies prioritize internal validity, but policy decisions require broader considerations, such as social, political, and economic factors, which are often not suitably investigated using RCTs.


  • 2. The authors argue for a framework based on the appropriateness of evidence, which considers relevance to policy concerns, applicability to local contexts, and alignment with public health goals.

COVID-19 = coronavirus disease of 2019;
EBP = evidence-based practice;
GRADE = grading of recommendations assessment, development, and evaluation;
NRSI = non-randomized studies of interventions;
PICO = patient, intervention, comparison, outcome;
RCT = randomized controlled trial;
ROBINS-I = risk of bias in nonrandomized studies – of interventions.

(a) Review categories (numbered 1 to 3 in the final column):

(1) contemporary understandings of the evidence pyramid, including how it is used and understood;

(2) critiques of the evidence pyramid in relation to EBP; and

(3) contextual considerations when applying the evidence pyramid to clinical decision-making.