Research Letter
doi:10.2196/50163
Introduction
A study of 402 randomly selected Medicaid enrollees reported an average 5th-grade reading level, which is lower than the average 8th-grade reading level of US adults [1,2].
Therefore, the American Medical Association (AMA) recommends developing health materials at a 6th-grade reading level or lower [3]. However, a 2018 systematic review of 7891 health websites reported that educational health materials are often written at 10th- to 15th-grade reading levels [4]. In a study evaluating ChatGPT-generated materials for 14 dermatological diseases, the content was at a 10th-grade reading level [5].
We hypothesized that ChatGPT could be prompted to rewrite health materials at a lower grade level, in line with AMA recommendations. We assessed the readability of ChatGPT-generated dermatology information and of public educational resources on the American Academy of Dermatology Association (AAD) website and determined whether strategic prompting would enhance the material’s readability.

Methods
We inputted the AAD website’s sunscreen and melanoma FAQs individually into ChatGPT and compiled the corresponding outputs, along with the outputs generated after each of 2 supplemental prompts: “I don’t understand, please clarify” and “I still don’t understand, please clarify.” We used well-established readability and health literacy assessment tools and a single web-based readability calculator to calculate 7 different scores [6,7],
and computed an “average readability” score from these grade-level outputs. A 2-sample t test was used for comparisons, with significance set at P<.05. To determine information accuracy before and after prompting, 3 dermatology residents blindly evaluated the educational materials using a numerical scale: 1 (not accurate), 2 (somewhat accurate), and 3 (accurate).
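The prompting was performed through the ChatGPT (GPT-3.5) interface rather than code; the following is a hypothetical sketch of an equivalent multi-turn exchange using the OpenAI Python client. The model name, prompts, and FAQ text shown are placeholders or assumptions, not the study materials.

```python
# Hypothetical reproduction of the prompting protocol (not the study's actual workflow).
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def rewrite_with_prompts(faq_text: str) -> list[str]:
    """Return the initial output plus the outputs after each supplemental prompt."""
    messages = [{"role": "user", "content": faq_text}]
    follow_ups = ["I don't understand, please clarify",
                  "I still don't understand, please clarify"]
    outputs = []
    # Initial response to the pasted FAQ
    reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    answer = reply.choices[0].message.content
    outputs.append(answer)
    # Two supplemental clarification prompts, keeping the conversation history
    for prompt in follow_ups:
        messages += [{"role": "assistant", "content": answer},
                     {"role": "user", "content": prompt}]
        reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
        answer = reply.choices[0].message.content
        outputs.append(answer)
    return outputs
```

Likewise, the readability scoring was done with a web-based calculator [7]; as a second illustrative sketch only, the same 7 formulas, the “average readability” metric (Table 1, footnote b), and the 2-sample t test can be computed with the open-source textstat and SciPy packages. The sample texts and variable names below are placeholders, not the study data.

```python
# Illustrative sketch of the readability scoring and statistical comparison.
import textstat
from scipy import stats

# The 5 formulas whose output is a US grade level (used for "average readability")
GRADE_LEVEL_METRICS = [
    textstat.flesch_kincaid_grade,          # Flesch-Kincaid Grade Level
    textstat.coleman_liau_index,            # Coleman-Liau Index
    textstat.smog_index,                    # SMOG Index
    textstat.automated_readability_index,   # Automated Readability Index
    textstat.linsear_write_formula,         # Linsear Write Formula
]

def readability_profile(text: str) -> dict:
    """Compute the 7 scores reported in Table 1 for one piece of FAQ text."""
    grades = [metric(text) for metric in GRADE_LEVEL_METRICS]
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "gunning_fog": textstat.gunning_fog(text),
        "grade_levels": grades,
        # Footnote b: average of the five grade-level outputs
        "average_readability": sum(grades) / len(grades),
    }

# Hypothetical per-question texts (AAD originals vs ChatGPT rewrites)
aad_texts = [
    "Sunscreen protects the skin by absorbing or reflecting ultraviolet radiation.",
    "Dermatologists recommend broad-spectrum sunscreen with a sun protection factor of 30 or higher.",
]
chatgpt_texts = [
    "Sunscreen keeps your skin safe from the sun.",
    "Use a sunscreen with SPF 30 or more every day.",
]

aad_scores = [readability_profile(t)["average_readability"] for t in aad_texts]
gpt_scores = [readability_profile(t)["average_readability"] for t in chatgpt_texts]

# 2-sample t test on average readability; significance threshold P < .05
result = stats.ttest_ind(aad_scores, gpt_scores)
print(f"t = {result.statistic:.2f}, P = {result.pvalue:.3f}")
```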
Results

The AAD’s sunscreen FAQs and melanoma FAQs had Flesch Reading Ease scores of 60.9 (standard/average) and 56.2 (fairly difficult), respectively. The initial ChatGPT output had readability scores of 60.5 (standard/average) and 46.5 (difficult) for the sunscreen and melanoma questions, respectively. Subsequent prompting resulted in readability scores of 69.4 (standard/average) and 80.2 (easy) for the sunscreen questions and 58.9 (fairly difficult) and 59.3 (fairly difficult) for the melanoma questions (Table 1).
Table 1. Readability and accuracy scores for AAD and ChatGPT-generated sunscreen and melanoma FAQ materials.

| | AAD | ChatGPT | ChatGPT with 1 prompt | ChatGPT with 2 prompts |
| --- | --- | --- | --- | --- |
| Sunscreen FAQs | | | | |
| Flesch Reading Ease score | 60.9 (standard/average) | 60.5 (standard/average) | 69.4 (standard/average) | 80.2 (easy) |
| Gunning Fog | 11.1 (hard) | 11.7 (hard) | 8.0 (fairly easy) | 6.2 (fairly easy) |
| Flesch-Kincaid Grade Level | 8.9 (9th grade) | 9.1 (9th grade) | 5.6 (6th grade) | 3.8 (4th grade) |
| Coleman-Liau Index | 10.0 (10th grade) | 10.0 (10th grade) | 10.0 (10th grade) | 8.0 (8th grade) |
| SMOG^a Index | 8.2 (8th grade) | 8.6 (9th grade) | 6.0 (6th grade) | 4.9 (5th grade) |
| Automated Readability Index | 9.4 (9th grade) | 9.4 (9th grade) | 4.6 (5th grade) | 2.5 (3rd grade) |
| Linsear Write Formula | 9.3 (9th grade) | 10.8 (11th grade) | 4.0 (4th grade) | 2.8 (3rd grade) |
| Average readability^b | 9.2 (9th grade) | 9.6 (10th grade) | 6.0 (6th grade) | 4.4 (4th grade) |
| Melanoma FAQs | | | | |
| Flesch Reading Ease score | 56.2 (fairly difficult) | 46.5 (difficult) | 58.9 (fairly difficult) | 59.3 (fairly difficult) |
| Gunning Fog | 12.5 (hard to read) | 13.7 (hard to read) | 11.0 (hard to read) | 10.9 (hard to read) |
| Flesch-Kincaid Grade Level | 9.5 (10th grade) | 10.5 (11th grade) | 8.0 (8th grade) | 7.9 (8th grade) |
| Coleman-Liau Index | 9.0 (9th grade) | 12.0 (12th grade) | 10.0 (10th grade) | 8.0 (8th grade) |
| SMOG Index | 9.4 (9th grade) | 10.1 (10th grade) | 8.3 (8th grade) | 8.2 (8th grade) |
| Automated Readability Index | 8.4 (8th grade) | 9.7 (10th grade) | 6.9 (7th grade) | 6.3 (6th grade) |
| Linsear Write Formula | 10.8 (11th grade) | 9.5 (10th grade) | 7.0 (7th grade) | 6.8 (7th grade) |
| Average readability | 9.4 (9th grade) | 10.4 (10th grade) | 8.0 (8th grade) | 7.4 (7th grade) |
| Accuracy score^c, mean (SD) | 2.82 (0.25) | 2.89 (0.19) | 2.63 (0.41) | 2.62 (0.37) |
^a SMOG: Simple Measure of Gobbledygook.
^b The average readability score was computed by averaging the tests with grade levels as outputs: Flesch-Kincaid Grade Level, Coleman-Liau Index, SMOG Index, Automated Readability Index, and Linsear Write Formula.
^c The accuracy score represents the mean score of 3 dermatology residents who assessed the educational materials using a numerical scale: 1 (not accurate), 2 (somewhat accurate), and 3 (accurate).
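As a worked check of footnote b (assuming a simple unweighted mean of the 5 grade-level scores), the AAD sunscreen column of Table 1 gives:

$$\frac{8.9 + 10.0 + 8.2 + 9.4 + 9.3}{5} = \frac{45.8}{5} \approx 9.2 \quad \text{(9th grade)}$$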
The AAD’s sunscreen FAQs and melanoma FAQs had average readability levels of 9.2 and 9.4 (both 9th grade), respectively, and the original ChatGPT sunscreen and melanoma outputs had readability levels of 9.6 and 10.4 (both 10th grade), respectively, with no significant differences in readability between the AAD and ChatGPT materials for either question set (P=.32 and P=.15, respectively). The first and second prompts produced sunscreen material at significantly lower reading levels than the AAD material (6.0, P=.005 and 4.4, P<.001, respectively). After prompting, the melanoma output also reached lower reading levels than the AAD material, with scores of 8.0 (8th grade) after the first prompt (P=.08) and 7.4 (7th grade) after the second prompt (P=.007) (see Table 1).
The AAD material scored an average of 2.82 in accuracy, while the original ChatGPT material scored 2.89. All of this material (42/42, 100%) averaged within the 2-3 range. Initial and secondary prompting resulted in generated material with average scores of 2.63 and 2.62, respectively. Of the 42 materials generated from prompting, 40 (95.2%) averaged within the 2-3 range.
Discussion
The AAD’s sunscreen FAQs and melanoma FAQs had readability scores below the recommended threshold of 80 on the Flesch Reading Ease scale and above the recommended 6th-grade reading level, consistent with a study showing that 27 subungual melanoma websites had poor readability overall, with only 22% having a readability level lower than 7th grade [8].
Taken together, these findings emphasize the need to enhance the readability of dermatology public education information.

Our study demonstrated that ChatGPT may be a solution to this problem. Prompting ChatGPT after the initial inputs improved the readability of the health information relative to the AAD materials, bringing it closer to or within recommended guidelines. Our findings are similar to those of a 2023 study assessing 9 uveitis web pages with an average Flesch-Kincaid Grade Level of 11.0 (SD 1.4); ChatGPT improved the readability, yielding a mean Flesch-Kincaid Grade Level of 8.0 (SD 1.0) [9].
Therefore, using ChatGPT to adapt output to enhance readability might have applicability in dermatology and other medical fields.

Most of the ChatGPT-generated material was rated as accurate to somewhat accurate. However, additional prompting resulted in a slight trend toward lower accuracy, with 2 responses falling below the 2-3 (somewhat accurate to accurate) range. This observation may highlight a potential limitation of ChatGPT in this context. Additionally, only a small number of questions were assessed, and we analyzed ChatGPT-3.5, which includes information only up to September 2021.
In conclusion, ChatGPT could be used to enhance the readability of dermatology health information and lower its reading level toward the 6th-grade level recommended by the AMA. Larger studies are needed to corroborate our data and evaluate the utility of ChatGPT for dermatology public education materials.
Conflicts of Interest
SRL has served as a consultant for Eli Lilly, Ortho Dermatologics, Moberg Pharmaceuticals, and BelleTorus Corporation.
References
1. Weiss B, Blanchard JS, McGee DL, Hart G, Warren B, Burgoon M, et al. Illiteracy among Medicaid recipients and its relationship to health care costs. J Health Care Poor Underserved. 1994;5(2):99-111. [CrossRef] [Medline]
2. Institute of Medicine (US) Committee on Health Literacy; Nielsen-Bohlman L, Panzer AM, Kindig DA. Health Literacy: A Prescription to End Confusion. Washington, DC: National Academies Press; 2004:256-266.
3. Weiss BD. Removing Barriers to Better, Safer Care: Health Literacy and Patient Safety: Help Patients Understand: Manual for Clinicians (2nd ed). Chicago, IL: American Medical Association Foundation; 2007.
4. Daraz L, Morrow AS, Ponce OJ, Farah W, Katabi A, Majzoub A, et al. Readability of online health information: a meta-narrative systematic review. Am J Med Qual. 2018;33(5):487-492. [CrossRef] [Medline]
5. Mondal H, Mondal S, Podder I. Using ChatGPT for writing articles for patients' education for dermatological diseases: a pilot study. Indian Dermatol Online J. 2023;14(4):482-486. [FREE Full text] [CrossRef] [Medline]
6. National Institutes of Health (NIH). Clear & Simple. Washington, DC: US Department of Health & Human Services. URL: https://www.nih.gov/institutes-nih/nih-office-director/office-communications-public-liaison/clear-communication/clear-simple [accessed 2023-06-01]
7. Readable. Jun 2023. URL: https://readability-score.com/ [accessed 2023-02-14]
8. Kang R, Lipner S. Assessment of internet sources on subungual melanoma. Melanoma Res. 2020;30(4):416-419. [CrossRef] [Medline]
9. Kianian R, Sun D, Crowell EL, Tsui E. The use of large language models to generate education materials about uveitis. Ophthalmol Retina. 2024;8(2):195-201. [FREE Full text] [CrossRef] [Medline]
Abbreviations
AAD: American Academy of Dermatology Association
AMA: American Medical Association
Edited by J Solomon, I Brooks; submitted 21.06.23; peer-reviewed by O Tarawneh, H Mondal, S Friedman; comments to author 27.08.23; revised version received 02.01.24; accepted 06.02.24; published 06.03.24.
Copyright © Katie Roster, Rebecca B Kann, Banu Farabi, Christian Gronbeck, Nicholas Brownstone, Shari R Lipner. Originally published in JMIR Dermatology (http://derma.jmir.org), 06.03.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Dermatology, is properly cited. The complete bibliographic information, a link to the original publication on http://derma.jmir.org, as well as this copyright and license information must be included.