Mammo by the numbers: Benchmarks help identify MDs who don't make the grade
Interventions that bring underperforming physicians up to newly developed performance criteria for diagnostic mammography would increase cancer diagnoses and reduce false-positives, according to an article published online Jan. 7 in Radiology.
Patricia A. Carney, PhD, of Oregon Health and Science University in Portland, and colleagues created the performance thresholds for physicians interpreting diagnostic mammography studies based on a review of National Cancer Institute data.
“Identifying low performers who might benefit from additional training should lead to more accurate and cost-effective diagnostic mammography,” wrote the authors.
Cut points were established for each performance measure to flag physicians who may need remedial training. Final cut points for workup of abnormal screening exams were:
- Sensitivity less than 80 percent
- Specificity less than 80 percent or greater than 95 percent
- Abnormal interpretation rate less than 8 percent or greater than 25 percent
- Positive predictive value (PPV) of biopsy recommendation less than 15 percent or greater than 40 percent
- PPV of biopsy performed less than 20 percent or greater than 45 percent
- Cancer diagnosis rate less than 20 per 1,000 interpretations
Cut points for workup of a breast lump were:
- Sensitivity less than 85 percent
- Specificity less than 83 percent or greater than 95 percent
- Abnormal interpretation rate less than 10 percent or greater than 25 percent
- PPV of biopsy recommendation less than 25 percent or greater than 50 percent
- PPV of biopsy performed less than 30 percent or greater than 55 percent
- Cancer diagnosis rate less than 40 per 1,000 interpretations
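The two sets of cut points above can be expressed as a simple lookup and range check. The following is a minimal sketch, not the authors' method: the dictionary keys, the function name `flag_measures`, and the treatment of values exactly on a cut point as acceptable (the article says "less than" and "greater than") are assumptions made for illustration.

```python
# Cut points as reported in the article. Values are percentages, except
# cancer_diagnosis_rate, which is per 1,000 interpretations.
# Each entry is (lower bound, upper bound); None means no upper bound was given.
# Key names are hypothetical labels chosen for this sketch.
CUT_POINTS = {
    "screening_workup": {
        "sensitivity": (80, None),
        "specificity": (80, 95),
        "abnormal_interpretation_rate": (8, 25),
        "ppv_biopsy_recommended": (15, 40),
        "ppv_biopsy_performed": (20, 45),
        "cancer_diagnosis_rate": (20, None),
    },
    "breast_lump_workup": {
        "sensitivity": (85, None),
        "specificity": (83, 95),
        "abnormal_interpretation_rate": (10, 25),
        "ppv_biopsy_recommended": (25, 50),
        "ppv_biopsy_performed": (30, 55),
        "cancer_diagnosis_rate": (40, None),
    },
}


def flag_measures(indication, measures):
    """Return (measure, 'below' | 'above') pairs that fall outside the cut points.

    `indication` is one of the CUT_POINTS keys; `measures` maps measure names
    to observed values. Measures missing from `measures` are skipped.
    """
    flagged = []
    for name, (low, high) in CUT_POINTS[indication].items():
        if name not in measures:
            continue
        value = measures[name]
        if value < low:
            flagged.append((name, "below"))
        elif high is not None and value > high:
            flagged.append((name, "above"))
    return flagged
```

Called with a physician's observed measures and either `"screening_workup"` or `"breast_lump_workup"`, the function lists which measures fall outside the acceptable range and in which direction. As the authors caution, a flag of this kind would still need to be read in the context of practice setting.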
Criteria were based on an examination of Breast Cancer Surveillance Consortium (BCSC) data, according to Carney and colleagues. They explained that outcome measures with both an upper and a lower bound for acceptability, such as abnormal interpretation rate, reflect concern for limiting false-positives.
“This is because too high an abnormal interpretation rate, which typically results in a low PPV and specificity, may indicate an excessive number of abnormal assessments, resulting in increased false-positives and a low probability of diagnosing cancer among women with a positive assessment,” wrote the authors.
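The relationship the authors describe can be illustrated with the standard confusion-matrix definitions of these measures. This is a sketch under stated assumptions: the function name `diagnostic_measures` is hypothetical, PPV here is the PPV of a positive assessment rather than of a biopsy recommendation, and the two sets of counts are invented purely for illustration and do not come from the study.

```python
def diagnostic_measures(tp, fp, tn, fn):
    """Compute standard measures from counts of true/false positives/negatives.

    Percentages are on a 0-100 scale; cancer diagnosis rate is per 1,000
    interpretations.
    """
    total = tp + fp + tn + fn
    positives = tp + fp  # all abnormal interpretations
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "abnormal_interpretation_rate": 100 * positives / total,
        "ppv": 100 * tp / positives,
        "cancer_diagnosis_rate": 1000 * tp / total,
    }


# Hypothetical readers (invented counts): each interprets 1,000 exams
# containing 30 cancers and finds 25 of them, but reader B calls three
# times as many false-positives as reader A.
reader_a = diagnostic_measures(tp=25, fp=100, tn=870, fn=5)
reader_b = diagnostic_measures(tp=25, fp=300, tn=670, fn=5)
```

With the same number of cancers found, reader B's higher abnormal interpretation rate comes entirely from additional false-positives, so both PPV and specificity are lower than reader A's, which is the pattern the upper bounds are meant to catch.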
Carney and colleagues also used BCSC data to estimate the expected clinical impact of the thresholds. If every physician’s performance were moved into an acceptable range, an additional 86 cancers would be diagnosed per 100,000 women undergoing workup after screening exams, and an additional 335 cancers would be diagnosed per 100,000 women undergoing workup of a breast lump. False-positives would be reduced by 1,067 per 100,000 for workups after screening exams and by 634 per 100,000 for workups of breast lumps.
The authors indicated that between 16 and 42 percent of BCSC interpreting physicians fall outside one of the indicated cut points, though they did offer the caveat that performance should be considered within the context of practice setting. “Performance measures may be affected by many factors, such as differences in patient populations and a low number of cancers diagnosed,” they wrote.