Maybe no need to worry? ‘ChatGPT Bot Flunks Gastroenterology Exam’: one study should not make us complacent.
[Read online: https://www.medscape.com/viewarticle/992318 ]
Diana Swift May 23, 2023
ChatGPT, a popular artificial intelligence language-processing model, failed a gastroenterology self-assessment test several times in a recent study.
Versions 3 and 4 of the chatbot scored only 65% and 62%, respectively, on the American College of Gastroenterology (ACG) Self-Assessment Test. The minimum passing grade is 70%.
"You might expect a physician to score 99%, or at least 95%," lead author Arvind J. Trindade, MD, regional director of endoscopy at Northwell Health (Central Region) in New Hyde Park, New York, told Medscape Medical News in an interview.
The study was published online May 22 in the American Journal of Gastroenterology.
https://journals.lww.com/ajg/Abstract/9900/ChatGPT_Fails_the_Multiple_Ch...
Trindade and colleagues undertook the study amid mounting reports of students using the tool across many academic areas, including law and medicine, and growing interest in the chatbot's potential in medical education.
"I saw gastroenterology students typing questions into it. I wanted to know how accurate it was in gastroenterology — if it was going to be used in medical education and patient care," said Trindade, who is also an associate professor at the Feinstein Institutes for Medical Research in Manhasset, New York. "Based on our research, ChatGPT should not be used for medical education in gastroenterology at this time, and it has a way to go before it should be implemented into the healthcare field."
Poor Showing
The researchers tested the two versions of ChatGPT on both the 2021 and 2022 online ACG Self-Assessment Test, a multiple-choice exam designed to gauge how well a trainee would do on the American Board of Internal Medicine Gastroenterology board examination.
Questions that involved image selection were excluded from the study. For those that remained, the questions and answer choices were copied and pasted directly into ChatGPT, which returned answers and explanations. The corresponding answer was selected on the ACG website based on the chatbot's response.
Of the 455 questions posed, ChatGPT-3 correctly answered 296, and ChatGPT-4 got 284 right. There was no discernible pattern in the type of question that the chatbot answered incorrectly, but questions on surveillance timing for various disease states, diagnosis, and pharmaceutical regimens were all answered incorrectly.
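The reported percentages can be checked against the raw counts. The short sketch below recomputes each version's score and how many more correct answers it would have needed to reach the 70% pass mark. It assumes, as a simplification, that the 70% threshold is applied to the pooled 455 questions from both years (the ACG pass mark applies to each test separately); the function and variable names are illustrative and do not come from the study.

```python
# Illustrative arithmetic only; names are hypothetical, counts are from the article.
TOTAL_QUESTIONS = 455   # text-only questions from the 2021 and 2022 tests combined
PASS_MARK_PERCENT = 70  # ACG minimum passing grade (assumed here to apply to the pooled set)

correct = {"ChatGPT-3": 296, "ChatGPT-4": 284}


def percent_correct(n_correct: int, total: int) -> float:
    """Share of questions answered correctly, as a percentage."""
    return 100 * n_correct / total


def min_correct_to_pass(total: int, pass_percent: int) -> int:
    """Smallest whole number of correct answers that reaches pass_percent (integer ceiling)."""
    return (total * pass_percent + 99) // 100


needed = min_correct_to_pass(TOTAL_QUESTIONS, PASS_MARK_PERCENT)
for model, n in correct.items():
    score = percent_correct(n, TOTAL_QUESTIONS)
    print(f"{model}: {n}/{TOTAL_QUESTIONS} = {score:.1f}%, "
          f"{needed - n} short of the {needed} needed to reach {PASS_MARK_PERCENT}%")
```

Run as-is, it reports about 65.1% and 62.4%, consistent with the rounded 65% and 62% above, with the two versions falling 23 and 35 answers short of the 319 correct answers that 70% of 455 would require.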
The reasons for the tool's poor performance could lie with the large language model underpinning ChatGPT, the researchers write. The model was trained on freely available information — not specifically on medical literature and not on materials that require paid journal subscriptions — to be a general-purpose interactive program.
Additionally, the chatbot may use information from a variety of sources, including non- or quasi-medical sources or out-of-date sources, which can lead to errors, they note. ChatGPT-3 was last updated in June 2021 and ChatGPT-4 in September 2021.
"ChatGPT does not have an intrinsic understanding of an issue," Trindade said. "Its basic function is to predict the next word in a string of text to produce an expected response, regardless of whether such a response is factually correct or not."
Previous Research
In a previous study, ChatGPT was able to pass parts of the US Medical Licensing Examination (USMLE).
https://www.medscape.com/viewarticle/987549
The chatbot may have performed better on the USMLE because the information tested on the exam may have been more widely available for ChatGPT's language training, Trindade said. "In addition, the threshold for passing [the USMLE] is lower with regard to the percentage of questions correctly answered," he said.
ChatGPT seems to fare better at helping to inform patients than it does on medical exams. The chatbot provided generally satisfactory answers to common patient queries about colonoscopy in one study and about hepatocellular carcinoma and liver cirrhosis in another study.
For ChatGPT to be valuable in medical education, "future versions would need to be updated with medical resources such as journal articles, society guidelines, and medical databases, such as UpToDate," Trindade said. "With directed medical training in gastroenterology, it may be a future tool for education or patient use in this field, but not currently as it is now. Before it can be used in gastroenterology, it should be validated."
That said, he noted, medical education has evolved from being based on textbooks and print journals to include internet-based journal data and practice guidelines on specialty websites. If properly primed, resources such as ChatGPT may be the next logical step.
This study received no funding. Trindade is a consultant for Pentax Medical, Boston Scientific, Lucid Diagnostics, and Exact Sciences and receives research support from Lucid Diagnostics.
Am J Gastroenterol. Published online May 22, 2023. Abstract
https://journals.lww.com/ajg/Abstract/9900/ChatGPT_Fails_the_Multiple_Ch...
Diana Swift is a freelance medical journalist based in Toronto.
--
Prof Joseph Ana
Lead Senior Fellow / Medical Consultant.
Center for Clinical Governance Research &
Patient Safety (ACCGR&PS) @ HRI GLOBAL
P: +234 (0) 8063600642
E: info@hri-global.org
8 Amaku Street, State Housing, Calabar, Nigeria.
www.hri-global.org