
People often doubt medical AI advice, and often with good reason

Chatbots like ChatGPT are popular, but their medical advice is met with skepticism, according to a study from Würzburg. This could influence the future of digital medicine.

Caution is advised with chatbots in medicine.

An unexplained stomach ache, a persistent cough, or an unusual spot on a toenail: googling one's symptoms is nothing new, and with the growing popularity of AI-based chatbots like ChatGPT, the options for digital self-diagnosis appear to be expanding. However, a study from Würzburg, published in the journal "Nature Medicine", shows that there are still significant reservations about the medical competence of such artificial intelligence.

Perception of AI Advice Investigated

The Würzburg researchers investigated how people react to AI-generated medical advice. "We weren't interested in the technical competence of the AI, but rather how the AI output is perceived," says Moritz Reis from the Julius-Maximilians-Universität.

To do this, the research team divided over 2,000 participants into three groups, each receiving identical medical advice. The first group was told the recommendations came from a doctor, the second was told they came from an AI-based chatbot, and the third group believed the advice came from a chatbot but had been reviewed by a doctor.

Participants rated the recommendations for reliability, understandability, and empathy. Whenever they suspected AI involvement, they perceived the advice as less empathetic and reliable. This was true even for the group that believed a doctor had reviewed the AI recommendations. Consequently, they were less likely to follow these recommendations. "The bias against AI is statistically significant, although not overwhelming," comments Reis.

Explanations for AI Skepticism

Reis attributes the AI skepticism partly to stereotypes: "Many believe a machine can't be empathetic." However, all three groups rated the advice as equally understandable.

For the research group, this AI skepticism matters because AI is playing an increasingly significant role in medicine, and many studies on new AI applications are currently being published. Public acceptance is therefore crucial, says Reis: "The question of future AI use in medicine isn't just about what's technically possible, but also about what patients are willing to accept." This calls for education about such applications and about AI in general. "Other studies have shown how important it is for patient trust that a human doctor has the final say, together with the patient," emphasizes Reis.

Transparency as a Key Factor

Reis considers transparency particularly relevant: "This means, for example, that an AI doesn't just make a diagnosis, but also explains in a traceable way what information led to this result."

The quality of such AI-generated diagnoses has been under scientific scrutiny for some time, with mixed results. A 2023 study in the "Journal of Medical Internet Research", for instance, found that ChatGPT achieved high diagnostic accuracy, correctly identifying the final diagnosis in nearly 77% of 36 cases. A Dutch study even suggested that ChatGPT's diagnostic capabilities were on par with those of emergency-room doctors, accurately diagnosing 97% of cases based on anonymized data from 30 patients.

Conversely, a 2023 study published in the journal "JAMA" found that ChatGPT correctly diagnosed only 27 out of 70 medical cases, a mere 39%. A study published in "JAMA Pediatrics" concluded that its accuracy was even worse for conditions primarily affecting children.

ChatGPT in Medical Education

A recent study published in the journal "PLOS ONE" investigated whether ChatGPT could be useful in medical education. The research team from the London Health Sciences Centre in Canada noted that the chatbot has access to a vast knowledge base and can communicate this information interactively and understandably.

The team fed ChatGPT 150 "case challenges" from a database of medical case histories that describe symptoms and disease progression. These challenges, presented in a multiple-choice format, ask medical students and professionals to make a diagnosis and develop a treatment plan.

ChatGPT was correct in only 74 out of 150 cases, or just under half. The study found that ChatGPT struggles with interpreting lab values and imaging results and often overlooks important information. The authors concluded that in its current form, ChatGPT is not accurate enough to be used as a diagnostic tool and should be used with caution as both a diagnostic aid and educational tool.

"The combination of high relevance and relatively low accuracy argues against relying on ChatGPT for medical advice, as it may present misleading information," the study warned, a caution that likely applies to laypeople using the bot for digital self-diagnosis as well.

ChatGPT's Self-Assessment

ChatGPT itself emphasizes that it is not suitable for making diagnoses. When asked about its diagnostic capabilities, the bot responds: "I am not a doctor and do not have medical training. I can provide information on medical topics, offer general advice, and answer questions, but I cannot make medical diagnoses or provide professional medical advice. If you have health problems or questions, you should always consult a doctor or a qualified healthcare provider."

