Plan • GRADE background • two steps – confidence in estimates (quality of evidence) – strength of recommendation • quality and strength can differ • profiles and summary of findings • exercises in applying GRADE
• any experience participating in guideline panels? • Is grading recommendations a good idea? • Why? • experience with grading – systems used?
Grading is a good idea, but which grading system to use? • many available – Australian National Health and MRC – Oxford Centre for Evidence-Based Medicine – Scottish Intercollegiate Guidelines Network (SIGN) – US Preventive Services Task Force – American professional organizations • AHA/ACC, ACCP, AAP, Endocrine Society, etc. • cause of confusion, dismay
Common international grading system? • GRADE (Grades of Recommendation, Assessment, Development and Evaluation) • international group – Australian NHMRC, SIGN, USPSTF, WHO, NICE, Oxford CEBM, CDC, CC • ~30 meetings over last 13 years • (~10 to 70 attendees)
Grading system – for what? • interventions – management strategy 1 versus 2 • what GRADE is not about – individual studies (it rates the body of evidence)
What GRADE is not primarily about • diagnostic accuracy questions – in lung cancer, what is the accuracy of CT scanning of the mediastinum? • what it is about: diagnostic impact – does use of CT scanning improve outcomes? • prognosis
[Figure: timeline of adoption by 70+ organizations, 2005 to 2011]
GRADE uptake
What are we grading? • two components • confidence in estimate of effect adequate to support decision (quality of body of evidence) • high, moderate, low, very low • strength of recommendation • strong and weak
Confidence in estimate (quality of evidence) • spectrum from no confidence to totally confident • very low, low, moderate, high
Structured question • patients: – women considering breast cancer screening – aged 50 to 74 – no high-risk genetic mutation or chest radiation • intervention – film mammography • alternative – no screening
Need to define all patient-important outcomes and evaluate their importance • desirable consequences – reduction in breast cancer mortality • undesirable consequences – false positive screening results - anxiety – invasive procedures from positive results – complications of invasive procedures – unnecessary diagnosis and treatment
Determinants of confidence • RCTs start high • observational studies start low • what can lower confidence? • risk of bias • inconsistency • indirectness • imprecision • publication bias
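The rating logic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `rate_confidence`, the integer encoding of the four levels, and the simple subtraction are assumptions made for the sketch. In practice each rating is a judgment, not a mechanical count.

```python
# Hypothetical sketch: names and the numeric encoding are illustrative assumptions.
LEVELS = ["very low", "low", "moderate", "high"]

# The five reasons for lowering confidence listed on the slide.
DOWNGRADE_DOMAINS = {
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
}


def rate_confidence(design, concerns):
    """Return a confidence level for a body of evidence.

    design   -- "rct" (starts high) or "observational" (starts low)
    concerns -- dict mapping a downgrade domain to the levels lost (1 or 2)
    """
    start = {"rct": 3, "observational": 1}[design]
    unknown = set(concerns) - DOWNGRADE_DOMAINS
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    score = start - sum(concerns.values())
    # The scale has a floor: confidence cannot drop below "very low".
    return LEVELS[max(score, 0)]


print(rate_confidence("rct", {}))
print(rate_confidence("rct", {"imprecision": 1}))
print(rate_confidence("observational", {"risk of bias": 1}))
```

The sketch covers only rating down; factors that raise confidence are left out.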
Risk of Bias • well established – concealment – intention to treat principle observed – blinding – completeness of follow-up • more recent – selective outcome reporting bias – stopping early for benefit
Consistency of results • if inconsistency, look for explanation – patients, intervention, outcome, methods • judgment of consistency – variation in size of effect – overlap in confidence intervals – statistical significance of heterogeneity – I²
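For the last criterion, I² is commonly computed from Cochran's Q and its degrees of freedom as max(0, (Q − df) / Q) × 100%. The sketch below shows that arithmetic only; the function name `i_squared` and the Q values are hypothetical, chosen to illustrate the formula and not taken from any trial in this deck.

```python
def i_squared(q, n_studies):
    """I-squared (percent) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q values, used only to show the arithmetic.
print(i_squared(q=20.0, n_studies=11))  # Q well above df: substantial heterogeneity
print(i_squared(q=8.0, n_studies=11))   # Q below df: truncated to zero
```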
Relative Risk with 95% CI for Vitamin D Non-vertebral Fractures
Relative Risk with 95% CI for Vitamin D (Non-Vertebral Fractures, Dose >400)
Relative Risk with 95% CI for Vitamin D (Non-Vertebral Fractures, Dose = 400)
Confidence judgments: Directness • populations – older, sicker or more co-morbidity • interventions – warfarin in trials vs clinical practice • outcomes – important versus surrogate outcomes – glucose control versus CV events
Figure 6: Hierarchy of outcomes according to their patient-importance to assess the effect of phosphate-lowering drugs in patients with renal failure and hyperphosphatemia • critical for decision making (7 to 9) – mortality (9) – myocardial infarction (8) – fractures (7) • important, but not critical for decision making (4 to 6) – pain due to soft tissue calcification / function (6) • of low patient-importance (1 to 3) – flatulence (2) • surrogates of declining importance – coronary calcification, then Ca²⁺/P product – bone density, then Ca²⁺/P product – soft tissue calcification, then Ca²⁺/P product • lower by one level or by two levels for indirectness
Directness • interested in A versus B • available data: A vs C, B vs C • example: alendronate, risedronate, placebo
Imprecision • small sample size – small number of events • wide confidence intervals – uncertainty about magnitude of effect • how do you decide what is too wide? • primary criterion: – would decisions differ at ends of CI
Precision • atrial fibrillation at risk of stroke • warfarin increases serious GI bleeding – 3% per year • 1,000 patients, 1 fewer stroke – 30 more bleeds for each stroke prevented • 1,000 patients, 100 fewer strokes – 3 strokes prevented for each bleed • where is your threshold? – how many strokes must be prevented per 100 patients to accept 3% bleeding?
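The two scenarios on this slide come down to one division. A minimal sketch, assuming the slide's 3% annual bleeding risk is expressed as 30 extra bleeds per 1,000 patients; the function name is hypothetical.

```python
def bleeds_per_stroke_prevented(strokes_prevented_per_1000, extra_bleeds_per_1000=30):
    """Extra serious GI bleeds incurred for each stroke prevented.

    The default of 30 per 1,000 corresponds to the 3% per year on the slide.
    """
    return extra_bleeds_per_1000 / strokes_prevented_per_1000


# 1 fewer stroke per 1,000 patients: 30 bleeds for each stroke prevented.
print(bleeds_per_stroke_prevented(1))

# 100 fewer strokes per 1,000 patients: 0.3 bleeds per stroke prevented,
# which is roughly 3 strokes prevented for each bleed.
ratio = bleeds_per_stroke_prevented(100)
print(ratio, round(1 / ratio, 1))
```

Somewhere between these two extremes lies the smallest stroke reduction a patient or panel would accept in exchange for the bleeding risk. That value is the decision threshold.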
[Figure series: confidence intervals plotted on an absolute-effect scale from 0 to 1.0%, with a 0.5% mark in the final panel]
Example: clopidogrel or ASA? • patients with threatened stroke • RCT of clopidogrel vs ASA – 19,185 patients • ischaemic stroke, MI, or vascular death compared – 939 events (5.32%) with clopidogrel – 1021 events (5.83%) with aspirin • RR 0.91 (95% CI 0.83 – 0.99) (p=0.043) • rate down for precision?
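One way to approach the closing question is to convert the relative effect into absolute terms and check whether a decision threshold falls inside the interval. The sketch below does that with the event rates and confidence limits from the slide. The helper names and both thresholds (0.5 and 1.0 percentage points) are assumptions for illustration, not values from the trial.

```python
clopidogrel_rate = 5.32  # % with ischaemic stroke, MI, or vascular death
aspirin_rate = 5.83      # %

rr = clopidogrel_rate / aspirin_rate
arr = aspirin_rate - clopidogrel_rate  # absolute reduction, percentage points


def absolute_reduction(baseline_pct, relative_risk):
    """Absolute risk reduction (percentage points) implied by a relative risk."""
    return baseline_pct * (1 - relative_risk)


# Apply the ends of the 95% CI for RR (0.83 to 0.99) to the aspirin rate.
largest = absolute_reduction(aspirin_rate, 0.83)
smallest = absolute_reduction(aspirin_rate, 0.99)


def decision_differs(low, high, threshold):
    """True when the threshold lies between the ends of the interval."""
    return low < threshold < high


print(round(rr, 2), round(arr, 2))
print(round(smallest, 2), round(largest, 2))
print(decision_differs(smallest, largest, threshold=0.5))  # assumed threshold
print(decision_differs(smallest, largest, threshold=1.0))  # assumed threshold
```

With a threshold inside the interval, the two ends of the CI point to different decisions, which is the slide's primary criterion for rating down. With a threshold outside it, the decision is the same at both ends.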
Clopidogrel or ASA for threatened vascular events • RCT, 19,185 patients • RR 0.91 (95% CI 0.83 – 0.99) • [figure: absolute-effect scale with labels 1.7%, 0.9 – 0.1%, 1.0%, and 0]
Non-inferiority • [figure series: confidence intervals shown against the no-difference line at 0]
Publication bias • high likelihood could lower quality • when to suspect • number of small studies • industry sponsored
Funnel Plot Fish oil on mortality
What can raise confidence? • large magnitude can rate up one level – very large two levels • common criteria – everyone used to do badly – almost everyone does well – quick action • hip replacement for hip osteoarthritis
Dose-response gradient • childhood lymphoblastic leukemia • risk for CNS malignancies 15 years after cranial irradiation • no radiation: 1% (95% CI 0% to 2.1%) • 12 Gy: 1.6% (95% CI 0% to 3.4%) • 18 Gy: 3.3% (95% CI 0.9% to 5.6%).
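The gradient on this slide can be checked directly from the three data points. A minimal sketch with hypothetical helper names; it tests only whether the point estimates rise with dose and whether intervals overlap, so it is not a formal trend test.

```python
# (dose in Gy, risk %, 95% CI lower %, 95% CI upper %), as given on the slide
observations = [
    (0, 1.0, 0.0, 2.1),
    (12, 1.6, 0.0, 3.4),
    (18, 3.3, 0.9, 5.6),
]


def has_monotonic_gradient(rows):
    """True when the point estimates rise strictly with dose."""
    risks = [risk for _, risk, _, _ in sorted(rows)]
    return all(a < b for a, b in zip(risks, risks[1:]))


def intervals_overlap(row_a, row_b):
    """True when the two confidence intervals share any values."""
    return row_a[2] <= row_b[3] and row_b[2] <= row_a[3]


print(has_monotonic_gradient(observations))
print(intervals_overlap(observations[0], observations[2]))
```

The point estimates increase at every step even though the intervals for the lowest and highest doses overlap.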
Confidence assessment criteria
Beta blockers in non-cardiac surgery: quality assessment and summary of findings
Outcome | Number of participants (studies) | Risk of bias | Consistency | Directness | Precision | Publication bias | Relative effect (95% CI) | Absolute risk difference | Quality
Myocardial infarction | 10,125 (9) | No serious limitations | No serious limitations | No serious limitations | No serious limitations | Not detected | 0.71 (0.57 to 0.86) | 1.5% fewer (0.7% fewer to 2.1% fewer) | High
Mortality | 10,205 (7) | No serious limitations | No serious limitations | No serious limitations | Imprecise | Not detected | 1.23 (0.98 to 1.55) | 0.5% more (0.1% fewer to 1.3% more) | Moderate
Stroke | 10,889 (5) | No serious limitations | No serious limitations | No serious limitations | No serious limitations | Not detected | 2.21 (1.37 to 3.55) | 0.5% more (0.2% more to 1.3% more) | High
High versus low PEEP in ALI and ARDS
Population | No. of participants (trials) † | Higher PEEP | Lower PEEP | Adjusted relative risk (95% CI; P-value) ‡ | Adjusted absolute risk difference (95% CI) | Quality
Patients with ARDS | 1892 (3) | 324/951 (34.1%) | 368/941 (39.1%) | 0.90 (0.81 to 1.00; 0.049) | -3.9% (-7.4% to -0.04%) | High
Patients without ARDS | 404 (3) | 50/184 (27.2%) | 41/220 (18.6%) | 1.37 (0.98 to 1.92; 0.065) | 6.9% (-0.4% to 17.1%) | Moderate (imprecision)
Overall level of evidence • most systems just use evidence about primary benefit outcome • but what about others (risk)? • what to do? • options – ignore all but primary – weakest of any outcome – some blended approach – weakest of critical outcomes
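The last option, the weakest of the critical outcomes, is the rule GRADE adopts. A minimal sketch of that rule follows; the function name is hypothetical, and treating the three beta-blocker outcomes from this deck as critical is an assumption made for the example, as is the invented non-critical outcome.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def overall_confidence(outcomes):
    """Lowest rating among critical outcomes.

    outcomes -- list of (name, rating, is_critical) tuples
    """
    critical = [rating for _, rating, is_critical in outcomes if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=LEVELS.index)


# Ratings from the beta-blocker example; the critical flags are assumed.
outcomes = [
    ("myocardial infarction", "high", True),
    ("mortality", "moderate", True),
    ("stroke", "high", True),
    ("hypothetical non-critical outcome", "low", False),
]
print(overall_confidence(outcomes))
```

The non-critical outcome does not pull the overall rating down, which is what separates this rule from "weakest of any outcome".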
Strength of Recommendation • strong recommendation – benefits clearly outweigh risks/hassle/cost – risk/hassle/cost clearly outweighs benefit • what can downgrade strength? • low confidence in estimates • close balance between up and downsides
Risk/Benefit tradeoff • aspirin after myocardial infarction – 25% reduction in relative risk – side effects minimal, cost minimal – benefit obviously much greater than risk/cost • warfarin in low risk atrial fibrillation – warfarin reduces stroke vs ASA by 50% – but if risk only 1% per year, ARR 0.5% – increased bleeds by 1% per year
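The warfarin arithmetic on this slide can be written out as follows. This is a sketch using the slide's figures; the function name and the number-needed-to-treat and number-needed-to-harm framing are additions for illustration.

```python
def absolute_risk_reduction(baseline_risk, relative_risk_reduction):
    """Absolute risk reduction from a baseline risk and a relative reduction."""
    return baseline_risk * relative_risk_reduction


# Low-risk atrial fibrillation: 1% stroke risk per year, 50% relative reduction.
arr = absolute_risk_reduction(0.01, 0.50)
extra_bleed_risk = 0.01  # bleeds increased by 1% per year

strokes_prevented_per_1000 = round(arr * 1000)
extra_bleeds_per_1000 = round(extra_bleed_risk * 1000)
number_needed_to_treat = round(1 / arr)
number_needed_to_harm = round(1 / extra_bleed_risk)

print(strokes_prevented_per_1000, extra_bleeds_per_1000)
print(number_needed_to_treat, number_needed_to_harm)
```

Per 1,000 patients a year, about 5 strokes are prevented at the cost of about 10 extra bleeds. The balance is close enough that the choice depends on how a patient weighs a stroke against a bleed.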
Strength of Recommendations • aspirin after MI – do it • warfarin rather than ASA in Afib – probably do it – probably don't do it
Significance of strong vs weak • variability in patient preference – strong, almost all same choice (> 90%) – weak, choice varies appreciably • interaction with patient – strong, just inform patient – weak, ensure choice reflects values • use of decision aid – strong, don’t bother – weak, use the aid • quality of care criterion – strong, consider – weak, don’t consider