Appendix Table 2. U.S. Preventive Services Task Force Hierarchy of Research Design and Quality Rating Criteria (a)
Hierarchy of research design
I: Properly conducted RCT
II-1: Well-designed controlled trial without randomization
II-2: Well-designed cohort or case–control analytic study
II-3: Multiple time series with or without the intervention; dramatic results from uncontrolled experiments
III: Opinions of respected authorities, based on clinical experience; descriptive studies or case reports; reports of expert committees
Design-specific criteria and quality category definitions
Systematic reviews
Criteria
Comprehensiveness of sources considered/search strategy used
Standard appraisal of included studies
Validity of conclusions
Recency and relevance are especially important for systematic reviews
Definition of ratings based on criteria above:
Good: recent, relevant review with comprehensive sources and search strategies; explicit and relevant selection criteria; standard appraisal of included studies; and valid conclusions
Fair: recent, relevant review that is not clearly biased but lacks comprehensive sources and search strategies
Poor: outdated, irrelevant, or biased review without systematic search for studies, explicit selection criteria, or standard appraisal of studies
Case–control studies
Criteria
Accurate ascertainment of cases
Nonbiased selection of cases/controls with exclusion criteria applied equally to both
Response rate
Diagnostic testing procedures applied equally to each group
Measurement of exposure accurate and applied equally to each group
Appropriate attention to potential confounding variables
Definition of ratings based on criteria above:
Good: appropriate ascertainment of cases and nonbiased selection of case and control participants; exclusion criteria applied equally to cases and controls; response rate >80%; diagnostic procedures and measurements accurate and applied equally to cases and controls; and appropriate attention to confounding variables
Fair: recent, relevant, without major apparent selection or diagnostic work-up bias but with response rates <80% or attention to some but not all important confounding variables
Poor: major selection or diagnostic work-up biases, response rates <50%, or inattention to confounding variables
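The case–control rating definitions above can be read as a small decision rule. The following is a minimal sketch under stated assumptions, not an official USPSTF tool: the names `CaseControlAppraisal` and `rate_case_control` are hypothetical, the appraisal is reduced to three inputs, and a response rate of exactly 80% (which the table does not assign to a category) is treated as fair. The "recent, relevant" qualifier in the fair definition is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CaseControlAppraisal:
    """Hypothetical summary of a reviewer's appraisal of one case-control study."""
    major_selection_or_workup_bias: bool  # major selection or diagnostic work-up bias present
    response_rate: float                  # fraction between 0 and 1
    confounding: str                      # "all", "some", or "none" of the important confounders addressed


def rate_case_control(a: CaseControlAppraisal) -> str:
    """Return "good", "fair", or "poor" following the rating definitions in the table."""
    # Poor: any major bias, response rate below 50%, or no attention to confounding.
    if (a.major_selection_or_workup_bias
            or a.response_rate < 0.50
            or a.confounding == "none"):
        return "poor"
    # Good: response rate above 80% and all important confounders addressed.
    if a.response_rate > 0.80 and a.confounding == "all":
        return "good"
    # Fair: everything in between (assumption: exactly 80% falls here).
    return "fair"


if __name__ == "__main__":
    study = CaseControlAppraisal(
        major_selection_or_workup_bias=False,
        response_rate=0.72,
        confounding="all",
    )
    print(rate_case_control(study))
```

The poor check runs first so that a single fatal flaw overrides otherwise strong characteristics, which mirrors how the table words the poor category.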
RCTs and cohort studies
Criteria
Initial assembly of comparable groups
RCTs: adequate randomization, including first concealment and whether potential confounders were distributed equally among groups
Cohort studies: consideration of potential confounders with either restriction or measurement for adjustment in the analysis; consideration of inception cohorts
Maintenance of comparable groups (includes attrition, crossovers, adherence, contamination)
Important differential loss to follow-up or overall high loss to follow-up
Measurements: equal, reliable, and valid (includes masking of outcome assessment)
Clear definition of the interventions
All important outcomes considered
Definition of ratings based on criteria above:
Good: evaluates relevant available screening tests; uses a credible reference standard; interprets reference standard independently of screening test; reliability of test assessed; has few indeterminate results or handles indeterminate results in a reasonable manner; includes large number (>100) of broad-spectrum patients
Fair: evaluates relevant available screening tests; uses reasonable although not best standard; interprets reference standard independently of screening test; moderate sample size (50–100 participants) and a "medium" spectrum of patients
Poor: has fatal flaw, such as uses inappropriate reference standard; screening test improperly administered; biased ascertainment of reference standard; very small sample size or very narrowly selected spectrum of patients
Diagnostic accuracy studies
Criteria
Screening test relevant, available for primary care, adequately described
Study uses a credible reference standard, performed regardless of test results
Reference standard interpreted independently of screening test
Handles indeterminate results in a reasonable manner
Spectrum of patients included in study
Sample size
Administration of reliable screening test
Definition of ratings based on criteria above:
Good: evaluates relevant available screening test; uses a credible reference standard; interprets reference standard independently of screening test; reliability of test assessed; has few indeterminate results or handles indeterminate results in a reasonable manner; includes large number (>100) of broad-spectrum patients with and without disease
Fair: evaluates relevant available screening test; uses reasonable although not best standard; interprets reference standard independently of screening test; moderate sample size (50–100 participants) and a "medium" spectrum of patients
Poor: has fatal flaw, such as uses inappropriate reference standard; screening test improperly administered; biased ascertainment of reference standard; very small sample size or very narrowly selected spectrum of patients
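The diagnostic accuracy ratings above combine a reference-standard judgment with sample-size and patient-spectrum bands. The following is a minimal sketch under stated assumptions, not an official USPSTF tool: the function name `rate_diagnostic_study` and its parameters are hypothetical, "very small sample size" is assumed to mean fewer than 50 patients (the table gives no number), and non-independent interpretation of the reference standard is treated as biased ascertainment. Test reliability and the handling of indeterminate results are left out for brevity.

```python
def rate_diagnostic_study(
    reference_standard: str,            # "credible", "reasonable", or "inappropriate"
    independent_interpretation: bool,   # reference standard read independently of the screening test
    n_patients: int,                    # number of patients in the study
    spectrum: str,                      # "broad", "medium", or "narrow"
    test_properly_administered: bool = True,
) -> str:
    """Return "good", "fair", or "poor" following the rating definitions in the table."""
    # Poor: any fatal flaw. Assumption: fewer than 50 patients counts as "very small".
    if (reference_standard == "inappropriate"
            or not independent_interpretation
            or not test_properly_administered
            or n_patients < 50
            or spectrum == "narrow"):
        return "poor"
    # Good: credible reference standard, more than 100 patients, broad spectrum.
    if reference_standard == "credible" and n_patients > 100 and spectrum == "broad":
        return "good"
    # Fair: reasonable standard, 50 to 100 patients, or a medium spectrum.
    return "fair"


if __name__ == "__main__":
    print(rate_diagnostic_study("reasonable", True, 80, "medium"))
```

A study with exactly 100 patients does not meet the ">100" threshold for good, so this sketch rates it fair even with a credible reference standard and a broad spectrum.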
a. Based on information from references 25 and 56. RCT = randomized, controlled trial.