December 1, 2025

Evaluation of the Quality and Reliability of ChatGPT-4's Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment

Cherrez-Ojeda I, Zuberbier T, Rodas-Valero G, Sanchez J, Rudenko M, Dramburg S, Demoly P, Caimmi D, Gómez RM, Ramon GD, Fouda GE, Quimby KR, Chong-Neto H, Llosa OC, Larco JI, Monge Ortega OP, Faytong-Haro M, Pfaar O, Bousquet J, Robles-Velasco K. Clin Transl Allergy. 2025 Dec;15(12):e70130. doi: 10.1002/clt2.70130.


ABSTRACT

Background

Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) is an advanced large language model (LLM) with potential applications in medical education and patient care. While allergen immunotherapy (AIT) can change the course of allergic diseases, it can also create uncertainty for patients, who often turn to readily available resources such as ChatGPT-4 to address their doubts. This study aimed to use validated tools to evaluate the quality, reliability, and readability of the information ChatGPT-4 provides on AIT.

Methods

Twenty-four questions were selected in accordance with the EAACI clinical guidelines on AIT and submitted to ChatGPT-4. Independent reviewers evaluated the responses using three validated instruments: the DISCERN instrument (quality), the JAMA Benchmark criteria (reliability), and the Flesch-Kincaid Readability Tests (readability). Descriptive statistics summarized findings across categories.
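As context for the readability analysis, the Flesch-Kincaid metrics named above are simple closed-form formulas over sentence length and syllable counts. The sketch below is illustrative only, using a rough vowel-group syllable heuristic rather than the study's actual tooling (validated tools rely on pronunciation dictionaries):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

On these scales, a Reading Ease score of 0-30 corresponds to "very difficult" (college graduate) text, which is the band most ChatGPT-4 responses fell into.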

Results

ChatGPT-4 responses were generally rated as “fair quality” on DISCERN, with strengths in classification/formulations and special populations. Notably, the tool provided good-quality responses on the preventive effects of AIT in children and premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated “insufficient information” (median = 0–1), primarily due to absent authorship, attribution, disclosure, and currency. Readability analyses revealed a college graduate–level requirement, with most responses classified as “very difficult” to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and difficult readability for patients.

Conclusions

ChatGPT-4 provides generally well-structured responses on AIT but lacks reliability and readability for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.
