

Search Results (1 to 10 of 95)



Authors’ Reply: Citation Accuracy Challenges Posed by Large Language Models


We also propose another solution aimed at fundamentally reducing citation errors: the development of “Reference-Accurate” academic LLMs by major global publishers. Leading journals could develop their own specialized LLMs, trained exclusively on rigorously verified academic literature from robust databases. This targeted training would ensure that every generated reference is accurate and directly traceable to published work.

Mohamad-Hani Temsah, Ayman Al-Eyadhy, Amr Jamal, Khalid Alhasan, Khalid H Malki

JMIR Med Educ 2025;11:e73698

Citation Accuracy Challenges Posed by Large Language Models


Developers must continuously improve LLM technology and algorithms, users must strengthen their awareness and critical evaluation skills when using LLMs, and academic institutions must reinforce the management of, and education in, sound academic practices. Only through these efforts can we ensure that LLMs play a positive role in academic research and promote the dissemination and progress of knowledge.

Manlin Zhang, Tianyu Zhao

JMIR Med Educ 2025;11:e72998

Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review


The emergence of several recent pilot studies of LLM-enabled data extraction prompts the need for a scoping review to map the current landscape, including definitions, frameworks, and future directions for this novel tool in clinical data extraction. This review seeks to address this gap in the literature by characterizing primary research articles that evaluated an LLM tool applied to data extraction from unstructured clinical text into structured data.

David Chen, Saif Addeen Alnassar, Kate Elizabeth Avison, Ryan S Huang, Srinivas Raman

JMIR Cancer 2025;11:e65984

The AI Reviewer: Evaluating AI’s Role in Citation Screening for Streamlined Systematic Reviews


As we ran each citation through a given LLM only once, multiple runs or “prompt engineering” strategies could yield more consistent or refined outcomes when evaluating LLMs. Nonetheless, our study offers a novel approach by directly comparing the performance of multiple LLMs, thus providing insight into how different architectures perform on the same dataset.

Jamie Ghossein, Brett N Hryciw, Tim Ramsay, Kwadwo Kyeremanteng

JMIR Form Res 2025;9:e58366

Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework


Our LLM-based model integrates readily with other LLM-based solutions for the medical domain, enabling clinical automation. Our generative pretrained transformer (GPT)-based LLM outperforms the BERT-based CheXbert model on many pathologies and, with a far larger context length, can handle long reports that CheXbert cannot. Our model also outperforms previous labelers [8] for many pathologies on an external dataset, MIMIC-CXR [9].

Abdullah Abdullah, Seong Tae Kim

JMIR Med Inform 2025;13:e68618

Prompt Engineering an Informational Chatbot for Education on Mental Health Using a Multiagent Approach for Enhanced Compliance With Prompt Instructions: Algorithm Development and Validation


Therefore, to leverage the benefits of LLMs in mental health care while avoiding the numerous risks, it is crucial to develop robust systems for restricting the scope of LLM-powered chatbots to the supplementary roles in which they excel and ensuring that they do not drift into taking on superficially similar roles. Prompting is a technique often used to direct chatbots toward producing more accurate and relevant responses without having to collect new training data and retrain the LLM [10].

Per Niklas Waaler, Musarrat Hussain, Igor Molchanov, Lars Ailo Bongo, Brita Elvevåg

JMIR AI 2025;4:e69820

Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination


This task poses a different challenge to an LLM than English-language medical board examinations [12,13], as the performance of such models in other languages, and in combination with more recent GPT versions and available plugins, has not been explored. In the medical field, where mistakes can have severe consequences, assessing the degree of uncertainty is of paramount importance [14].

Julian Madrid, Philipp Diehl, Mischa Selig, Bernd Rolauffs, Felix Patricius Hans, Hans-Jörg Busch, Tobias Scheef, Leo Benning

JMIR Med Educ 2025;11:e58375

Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes


We aimed to estimate the presence and size of bias related to gender and sexual orientation produced by ChatGPT-4, a common LLM, as well as MentalLLaMA, an LLM fine-tuned for the mental health domain, exemplified by their application in the context of ED symptomatology and health-related quality of life (HRQoL) of patients with AN or BN.

Rebekka Schnepper, Noa Roemmel, Rainer Schaefert, Lena Lambrecht-Walzinger, Gunther Meinlschmidt

JMIR Ment Health 2025;12:e57986