Research Letter
doi:10.2196/49889
Introduction
Nonmelanoma skin cancer (NMSC) represents the most prevalent form of cancer worldwide [1]. Patients with NMSC seek information from various resources. Prior work has shown that large language models (LLMs) such as ChatGPT can generate medical information in response to questions [2]; however, results vary significantly depending on the prompts entered. A few-shot approach, in which one provides several example prompts and outputs, performs well [3], as does the few-shot chain of thought approach, in which the examples include the reasoning behind correct answers, encouraging the model to reason through the question [4]. Zero-shot chain of thought (ZS-COT) prompting provides no example prompts; instead, it uses phrases that encourage the LLM to “think” through its response, with significant improvements in accuracy in some contexts [5]. In this study, we explored ChatGPT’s performance in answering questions about NMSC using both standard and ZS-COT prompting.
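To make the contrast between the two styles concrete, the following is a minimal sketch of how a standard prompt and its ZS-COT counterpart could be constructed; the `ask_chatgpt` helper is a hypothetical placeholder for whatever interface is used to query the model, not a real API call or the authors' actual code.

```python
# Minimal sketch contrasting standard and zero-shot chain-of-thought
# (ZS-COT) prompting. `ask_chatgpt` is a hypothetical placeholder for
# the ChatGPT interface, not a real library call.

def build_prompts(question: str) -> dict:
    """Return the standard prompt and its ZS-COT variant for a question."""
    return {
        "standard": question,
        # ZS-COT appends a cue that encourages step-by-step reasoning
        # (Kojima et al [5]).
        "zs_cot": f"{question} Let's think step by step.",
    }


if __name__ == "__main__":
    prompts = build_prompts("What causes basal cell carcinoma?")
    for style, prompt in prompts.items():
        print(f"{style}: {prompt}")
        # response = ask_chatgpt(prompt)  # hypothetical call to the model
```

Methods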
Overview
We generated 25 common clinical questions about NMSC in four categories: general, diagnosis, management, and risk factors. Prompts were entered into ChatGPT 4.0 on March 31, 2023, and responses were recorded for both standard and ZS-COT prompting (Figure 1A). Ending ZS-COT queries with “Let’s think step by step” has been shown to improve performance in previous work [5]. Three attending dermatologists independently reviewed the outputs and graded whether they would be appropriate for a patient-facing website and for an electronic health record (EHR) message draft to a patient. Responses were also evaluated for accuracy on a 5-point scale, with 1 being completely inaccurate and 5 being completely accurate, and reviewers indicated which of the two prompting styles they preferred. Statistical differences between prompting styles were computed using the Wilcoxon test. Statistical analysis was performed in R version 4.2.2 (R Foundation for Statistical Computing).
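For readers who want to reproduce a comparable analysis, the snippet below is a minimal sketch of a paired Wilcoxon test in Python; the accuracy scores shown are hypothetical placeholders rather than the study data, and the authors' actual analysis was performed in R.

```python
# Hypothetical sketch of the per-question accuracy comparison between
# prompting styles. The scores below are illustrative placeholders,
# NOT the study's data; the published analysis used R 4.2.2.
from scipy.stats import wilcoxon

standard_scores = [5, 5, 4, 5, 5, 5, 4, 5, 5, 5]  # standard prompting
zs_cot_scores = [5, 4, 4, 5, 5, 5, 5, 5, 4, 5]    # ZS-COT prompting

# Paired test across the same questions; tied pairs (identical scores)
# are dropped under the default "wilcox" zero-difference handling.
stat, p_value = wilcoxon(standard_scores, zs_cot_scores)
print(f"W = {stat}, P = {p_value:.3f}")
```

Ethical Considerations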
This study did not require institutional review board approval.
Results
Averaging all accuracy scores (5-point scale), we found a combined accuracy of 4.89 across both the standard and ZS-COT prompts. Across all 25 questions, the mean accuracy score was 4.92 for the standard prompt and 4.87 for the ZS-COT prompt, a nonsignificant relative difference of 1.03% ([4.92 − 4.87]/4.87). Both prompting styles were deemed 100% appropriate for a patient-facing information portal for general, diagnosis, management, and risk factor questions. For EHR message responses, outputs were appropriate for 97% of general questions, 92% of diagnosis questions, 85% of management questions, and 100% of risk factor questions (Figure 1B). The lowest accuracy grade was 4 for standard prompting and 2 for ZS-COT prompting (Figure 1C); the score of 2 was given for the prompt “What causes basal cell carcinoma?” (Multimedia Appendix 1).
Discussion
This exploratory qualitative study found that LLMs can provide accurate patient information regarding NMSC that is appropriate for both general websites and EHR messages. We found that ZS-COT prompting did not yield more accurate dermatologic information. Limitations of this study include that we explored only a subset of the clinical questions patients may have about NMSC, that there is no objective standard for appropriateness, and that the dermatologists’ personal biases may have influenced their response preferences. As LLMs continue to evolve and be adopted, clinicians must monitor their clinical utility and how different prompting methods may change the quality of results.
Conflicts of Interest
BU is an employee of Mount Sinai and has received research funds (grants paid to the institution) from Incyte, Rapt Therapeutics, and Pfizer. He is also a consultant for Arcutis Biotherapeutics, Castle Biosciences, Fresenius Kabi, Pfizer, and Sanofi. JU is an employee of Mount Sinai and is a consultant for AbbVie, Castle Biosciences, Dermavant, Janssen, Menlo Therapeutics, Mitsubishi Tanabe Pharma America, and UCB. The rest of the authors declare no relevant conflicts of interest.
Multimedia Appendix 1
Evaluated nonmelanoma skin cancer questions.
DOCX File, 18 KB
References
1. Dubas LE, Ingraffea A. Nonmelanoma skin cancer. Facial Plast Surg Clin North Am. Feb 2013;21(1):43-53. [CrossRef] [Medline]
2. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA. Mar 14, 2023;329(10):842-844. [FREE Full text] [CrossRef] [Medline]
3. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. arXiv. Preprint posted online on May 28, 2020. [FREE Full text]
4. Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-thought prompting elicits reasoning in large language models. arXiv. Preprint posted online on January 28, 2022. [FREE Full text]
5. Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. arXiv. Preprint posted online on May 24, 2022. [FREE Full text]
Abbreviations
EHR: electronic health record
LLM: large language model
NMSC: nonmelanoma skin cancer
ZS-COT: zero-shot chain of thought
Edited by J Solomon, I Brooks; submitted 12.06.23; peer-reviewed by A Hidki, U Kanike, D Chrimes; comments to author 21.09.23; revised version received 02.10.23; accepted 03.12.23; published 14.12.23.
Copyright © Ross O'Hagan, Dina Poplausky, Jade N Young, Nicholas Gulati, Melissa Levoska, Benjamin Ungar, Jonathan Ungar. Originally published in JMIR Dermatology (http://derma.jmir.org), 14.12.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Dermatology, is properly cited. The complete bibliographic information, a link to the original publication on http://derma.jmir.org, as well as this copyright and license information must be included.