ChatGPT's radiology board success has experts rethinking resident education
Following ChatGPT’s strong performance on a mock radiology board exam, experts are calling for radiology training programs to rethink how they educate residents.
In an editorial published in Radiology alongside ChatGPT’s test results, authors Ana P. Lourenco, Grayson L. Baird and Priscilla J. Slanetz, from the department of diagnostic imaging at the Warren Alpert Medical School of Brown University and the department of radiology at Boston University Medical Center, suggested that while the chatbot’s scores do reflect its impressive strengths, its weaknesses could present a unique opportunity for educators [1].
The authors pointed to the chatbot’s performance on the mock board: it performed well on questions requiring low-order thinking but worse on questions intended to prompt more intense cognitive processing, such as those involving the description of imaging findings, calculation and classification, and the application of concepts [2].
“Basically, large language models scoured the internet for words and did well on low-hanging fruit (low taxonomy) questions,” the authors noted. “The key finding is that ChatGPT did not perform well on higher-order taxonomy material and potentially may not be able to become proficient in it.”
Low-order learning takes less time than higher-order learning, and ChatGPT is very skilled at it: the chatbot can answer multiple-choice-style questions about radiology quite efficiently, despite never having been trained on radiology-specific data. However, unlike humans, the chatbot is incapable of thinking critically; although it can answer a question correctly, it often cannot explain how it arrived at its conclusion, a skill that is critical for emerging radiologists.
Unfortunately, low-order learning comes first, and radiology residents must dedicate a considerable amount of time to it before they can reach proficiency in higher-order learning. This focus, the authors suggested, can reduce residents’ time spent building critical thinking skills that set them apart from AI.
“In other words, a human usually learns to handle higher taxonomy content by mastering the low-hanging fruit first,” the group explained. “The metacognitive knowledge is attained only after mastering the factual, conceptual, and procedural knowledge.”
Due to the way radiology boards are formatted, residents are required to memorize facts and concepts that enable them to spot correct answers on multiple-choice questions rather than reason through them. The authors cautioned that this “teach to test” approach inhibits residents’ ability to grow cognitively.
“What is the value of having residents take an examination that ChatGPT can successfully pass (or can surpass the performance of residents)? If we are not careful, AI may ultimately outperform residents—and not just on examinations—if we only teach to the test,” the group wrote.
Their solution? In short, they suggested that educators abandon “teach to test” methods and instead encourage students “to read and spend more time at the view box!” This means creating evaluations that demand higher-order knowledge, clinical judgment and critical problem-solving skills.
The American Board of Radiology’s recent announcement that it will return to oral board exams is one example of how such educational change can be achieved; the authors described the ABR’s move as a “positive” step in the right direction.
“As AI evolves, radiology education and evaluation must also evolve, adapting in ways that prepare residents over and above the capability of AI,” the authors concluded.
The study abstract detailing ChatGPT’s test results can be found here, and the accompanying editorial here.