From MedPage Today:
ChatGPT was good at answering low-complexity cardiology questions, but came up short on more complicated cases, researchers found.
Presented with a series of case vignettes built from cardiology-related questions that patients had for their primary care providers, the generative artificial intelligence (AI) tool answered correctly 90% of the time. But with a more complex set of vignettes involving questions that primary care doctors referred to their cardiology colleagues, the technology answered correctly only half of the time, reported Ralf Harskamp, MD, PhD, and Lukas de Clercq, MSc, of Amsterdam UMC in the Netherlands.
ChatGPT also performed reasonably well when answering cardiology trivia questions, scoring 74% overall, the researchers noted in a preprint posted to medRxiv.
“Specialized medical information probably comprises a very small part of the training data, and yet that implicit knowledge is somewhere in this giant stochastic space of the model. So we were surprised to see that it performed rather well,” de Clercq, a PhD candidate, told MedPage Today.
“We did not give it any context for these questions,” he added. “It might perform even better if you provide it with the current guidelines and then ask questions regarding those guidelines.”
David Asch, MD, MBA, senior vice dean for strategic initiatives at the University of Pennsylvania Perelman School of Medicine in Philadelphia, who was not involved in the study, said it’s important to evaluate ChatGPT’s performance in all situations, from answering medical trivia questions to more complex tasks such as assessing patient questions and complicated clinical scenarios.
“We have high standards for accuracy in medicine,” said Asch, who recently published a conversation with ChatGPT in NEJM Catalyst. “If you’re using an AI program to direct patients to hospital parking, a little bit of inaccuracy is not life-threatening. It may be frustrating, but there are no fatalities.”
Since ChatGPT has shown off its general medical capabilities by passing the U.S. Medical Licensing Examination (USMLE) a few times, researchers have started to conduct more specialty-specific assessments, including in hepatology, breast cancer, and now cardiology.
Harskamp and de Clercq first put its cardiology knowledge to the test by posing 50 cardiovascular trivia questions based on quizzes for medical professionals. While its overall accuracy was 74%, accuracy varied slightly across domains, from 80% in each of three categories (coronary artery disease, pulmonary and venous thrombotic embolism, and heart failure) to 70% for atrial fibrillation and 60% for cardiovascular risk management.
They then posed a total of 20 clinical vignettes to the program, comparing its accuracy with expert clinical opinion.
Of the 10 scenarios in which patients posed questions to their primary care doctors (for instance, about whether their symptoms were a reason for concern, about medication use, and about behavioral changes such as diet), ChatGPT’s advice was in line with actual clinical advice and care in nine.
One major inconsistency occurred when the chatbot advised using thrombolytic agents as secondary preventive medications in a patient with a prior myocardial infarction (MI). “While there is a place for thrombolytics in the acute phase [in settings where primary percutaneous coronary intervention is not available], these medications have no place in chronic management of patients post-MI,” the researchers wrote.