An artificial intelligence program created explanations of heart test results that were in most cases accurate, relevant, and easy for patients to understand, a new study finds.
The study addressed the echocardiogram, which uses sound waves to create pictures of blood flowing through the heart's chambers and valves. Echocardiogram reports include machine-generated numerical measures of function, as well as comments from the interpreting cardiologist on the heart's size, the pressure in its vessels, and tissue thickness, which can signal the presence of disease. In the form typically generated by doctors, the reports are difficult for patients to understand, often resulting in unnecessary worry, say the study authors.
To address the issue, 好色tv Langone Health has been testing the capabilities of a form of artificial intelligence (AI) that generates likely options for the next word in any sentence based on how people use words in context on the internet. A result of this next-word prediction is that such generative AI "chatbots" can reply to questions in simple language. However, AI programs, which work based on probabilities instead of actually thinking and may produce inaccurate summaries, are meant to assist, not replace, human providers.
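For readers curious how next-word prediction works in principle, the toy sketch below builds word-pair counts from a few invented sentences and ranks likely next words. The corpus and probabilities are made up for illustration; real generative AI models are neural networks trained on vastly larger collections of text, but the core idea of ranking likely next words is the same.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word follows which
# in a tiny invented corpus, then rank candidate next words by frequency.
corpus = (
    "the heart pumps blood through the body "
    "the heart has four chambers "
    "the valves control blood flow through the heart"
).split()

follow_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word, k=3):
    """Return the k most likely next words after `word` in the toy corpus."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

print(predict_next("the"))  # e.g. [('heart', 0.6), ('body', 0.2), ('valves', 0.2)]
```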
In March 2023, 好色tv Langone requested access to GPT-4, the latest generative AI tool from OpenAI, the company that created the ChatGPT chatbot. 好色tv Langone licensed one of the first "private instances" of the tool, which freed clinicians to experiment with AI using real patient data while also adhering to privacy rules.
Coming out of that effort, the current study analyzed 100 doctor-written reports on a common type of echocardiogram test to see whether GPT-4 could efficiently generate human-friendly explanations of test results. Five board-certified echocardiographers evaluated the AI-generated explanations on five-point scales for accuracy, relevance, and understandability, and either agreed or strongly agreed that 73 percent of the reports were suitable to send to patients without any changes.
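The release does not describe the study's actual prompts or software pipeline, but the minimal sketch below shows how a report might be rewritten with a GPT-4-style chat model, assuming access to an OpenAI-compatible endpoint. The model name, prompt wording, and sample report text are placeholders; any real use would need to run inside an approved private instance with clinician review before anything reaches a patient.

```python
from openai import OpenAI  # assumes the `openai` Python package is installed

# Sketch of drafting a patient-friendly explanation from a report.
# The prompt, model name, and report text are hypothetical; the study's
# actual setup is not described in this release.
client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

report_text = (
    "Left ventricular ejection fraction 60-65%. No significant valvular disease. "
    "Normal chamber sizes and wall thickness."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "Rewrite echocardiogram reports in plain language for patients. "
                "Do not add findings that are not in the report."
            ),
        },
        {"role": "user", "content": report_text},
    ],
)

draft = response.choices[0].message.content
print(draft)  # a clinician would review and correct this draft before sending
```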
All AI explanations were rated either "all true" (84 percent) or "mostly correct" (16 percent). In terms of relevance, 76 percent of explanations were judged to contain "all of the important information," 15 percent "most of it," 7 percent "about half," and 2 percent "less than half." None of the explanations with missing information were rated as "potentially dangerous," the authors say.
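As a rough illustration of how such percentages are tallied, the sketch below aggregates hypothetical five-point accuracy ratings into labeled categories. The ratings, and all category names other than "all true" and "mostly correct," are invented placeholders rather than the study's data.

```python
from collections import Counter

# Tally hypothetical five-point accuracy ratings into labeled categories.
# Only "all true" and "mostly correct" appear in the release; the other
# labels, and the ratings themselves, are invented placeholders.
accuracy_labels = {
    5: "all true",
    4: "mostly correct",
    3: "about half correct",
    2: "mostly incorrect",
    1: "all incorrect",
}

ratings = [5, 5, 4, 5, 5, 4, 5, 5, 5, 5]  # one score per explanation (placeholder data)

counts = Counter(accuracy_labels[r] for r in ratings)
for label, count in counts.most_common():
    print(f"{label}: {100 * count / len(ratings):.0f}%")
# Prints "all true: 80%" and "mostly correct: 20%" for the placeholder ratings above.
```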
"Our study, the first to evaluate GPT-4 in this way, shows that generative AI models can be effective in helping clinicians to explain echocardiogram results to patients," said corresponding author Lior Jankelson, MD, PhD, an associate professor at 好色tv Grossman School of Medicine and an artificial intelligence leader in the Leon H. Charney Division of Cardiology. "Fast, accurate explanations may lessen patient worry and reduce the sometimes-overwhelming volume of patient messages to clinicians."
The federal mandate for the immediate release of test results to patients under the 21st Century Cures Act of 2016 has been linked to dramatic increases in the number of inquiries to clinicians, say the study authors. Patients receive raw test results, do not understand them, and grow anxious while they wait for clinicians to reach them with explanations, the researchers say.
Ideally, clinicians would advise patients about their echocardiogram results the instant they are released, but that is delayed as providers struggle to manually enter large amounts of related information into the electronic health record. "If dependable enough, AI tools could help clinicians explain results at the moment they are released," said first study author Jacob Martin, MD, a cardiology fellow at 好色tv Langone. "Our plan moving forward is to measure the impact of explanations drafted by AI and refined by clinicians on patient anxiety, satisfaction, and clinician workload."
The new study also found that 16 percent of the AI explanations contained inaccurate information. In one error, the AI-generated explanation stated that "a small amount of fluid, known as a pleural effusion, is present in the space surrounding your right lung." The tool mistakenly concluded that the effusion was small, an error known in the industry as an AI hallucination. The researchers emphasized that human oversight is important to refine drafts from AI, including correcting any inaccuracies before they reach patients.
To get the perspective of lay people on the clarity of AI explanations, the research team also surveyed participants without clinical backgrounds. In short, the reports were well received, said the authors. Nonclinical participants found 97 percent of the AI-generated rewrites more understandable than the original reports, and said the rewrites reduced worry in many cases.
"This added analysis underscores the potential of AI to improve patient understanding and ease anxiety," Dr. Martin added. "Our next step will be to integrate these refined tools into clinical practice to enhance patient care and reduce clinician workload."
Along with Dr. Martin and Dr. Jankelson, 好色tv Langone study authors in the Leon H. Charney Division of Cardiology were Muhamed Saric, MD, PhD; Alan F. Vainrib, MD; Daniel Bamira, MD; Samuel Bernard, MD; Richard Ro, MD; Theodore Hill; and Larry A. Chinitz, MD. Additional 好色tv Langone study authors were Jonathan S. Austrian, MD, in Medical Center Information Technology (MCIT); Hao Zang and Vidya Koesmahargyo; and Mathew R. Williams, MD.
Media Inquiries
Greg Williams
Phone: 212-404-3500
Gregory.Williams@好色tvLangone.org