Radiology: Radiologists’ accuracy unaffected by expected findings
“Over the decades it has been proposed that observer performance is affected by prior expectations,” according to Warren M. Reed, from the department of medical imaging and radiation sciences, part of the faculty of health sciences at the University of Sydney, in Lidcombe, Australia, and colleagues, who noted that support for the idea has come from well-known experiments in fields ranging from sociology and psychology to medicine. Previous radiology studies have reported a small to moderate effect of the expected number of abnormal findings on confidence and accuracy among residents and faculty reading x-rays and angiograms, though the studies lacked consensus.
Reed and co-authors sought to measure the effect of the expected number of abnormalities (the ‘prevalence effect’) on radiologists’ performance and reading times. Twenty-two experienced radiologists were divided into three groups, with each group including at least one (out of five total) thoracic specialist. The groups were asked to interpret a set of 30 posteroanterior chest x-rays, 15 of which contained between one and three pulmonary nodules.
Each group read the same set of 30 films on two different days. Group one was given expected counts of 9 and 15 nodules for its two readings; group two was given abnormality-prevalence expectations (of pulmonary nodules) of 15 and 22; and group three was given an expected number of 15 for one reading and no expectation for the other. The order of the two readings was switched for half of the members of each group. Receiver operating characteristic (ROC) curves were created based on the placement and subjective confidence of identified nodules for each radiologist.
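For readers unfamiliar with the metric, the area under an ROC curve can be read as the probability that a reader assigns a higher confidence score to a randomly chosen abnormal case than to a randomly chosen normal one, with ties counting as half. The Python sketch below illustrates that idea only. It is a simplified, case-level calculation, not the localization-based analysis the authors describe, and the function name `auc_from_ratings`, the 1-to-5 confidence scale, and all rating values are hypothetical.

```python
def auc_from_ratings(abnormal_scores, normal_scores):
    """Estimate AUC as P(abnormal score > normal score) + 0.5 * P(tie).

    This pairwise comparison is equivalent to the normalized
    Mann-Whitney U statistic. Inputs are confidence ratings,
    where a higher value means more confidence that a nodule is present.
    """
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0      # abnormal case correctly ranked higher
            elif a == n:
                wins += 0.5      # tie counts as half
    return wins / (len(abnormal_scores) * len(normal_scores))


# Hypothetical ratings from one reader (NOT data from the study).
nodule_cases = [5, 4, 4, 3, 5, 2, 4, 5]
normal_cases = [1, 2, 1, 3, 2, 1, 4, 1]

auc = auc_from_ratings(nodule_cases, normal_cases)
print(f"Illustrative AUC: {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of abnormal from normal cases, which is the scale on which the group results below are reported.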
No significant differences were observed in the overall area under the curve (AUC) for the group ROC curves between the two disparate expectation readings. Compared with the standard reading of a radiologist who received no instruction on reading the 30 films, group one demonstrated an AUC of 0.91 when told to expect 9 nodules and an AUC of 0.93 when the expectation was 15 (the actual number of cases with nodules present).
Group two produced an AUC of 0.87 when told to expect 15 nodules and 0.85 when told to expect 22. Group three scored 0.92 for the 15-nodule expectation and 0.91 when given no expectation.
Although accuracy remained unchanged with different expectations, all groups spent significantly more time reviewing each image as the number of nodules they were told to expect increased. Moreover, the median number of fixations per image (defined as 100 msec of no more than 1 degree of movement by both eyes) increased significantly for all three groups as prevalence expectation grew.
At the same time, median fixation time on each detected lesion significantly decreased for groups two and three as prevalence expectation increased; group one experienced a similar effect just shy of significance. All these effects held for group three when moving from an expected to an unexpected number of nodules.
The authors said these effects showed “visual search significantly alters at higher prevalence expectation rates, with radiologists searching more, and possibly hesitating more, when told to expect a higher abnormality-prevalence rate.
“It was hypothesized that expert radiologists would record more false-positive findings when told a high prevalence level and conversely would record more false-negative findings when told a low prevalence level; however, this was not supported in this work,” Reed and colleagues continued.
The authors acknowledged that the expectation reported to radiologists might not have been their actual expectation, a distinction that would be difficult to measure precisely.
The findings could have important implications for laboratory experiments. Many studies use higher prevalence rates than are found in clinical practice in order to test and observe experimental effects more efficiently; however, this laboratory condition is considered a limitation of those studies. If expected prevalence does not affect radiologists’ interpretations, or affects them only minimally, “[s]tudies could therefore be designed to obtain the highest statistical power per unit observer effort to test the hypothesis posed,” the authors noted.
Overall, the authors concluded that “[n]o evidence was provided that expert radiologists’ interpretative skills were affected by preconceptions due to prevalence expectation.”