The Difficulty of German Information Booklets on Psoriasis and Psoriatic Arthritis: Automated Readability and Vocabulary Analysis

Background: Patients with Psoriasis or Psoriatic Arthritis who seek information online are confronted with numerous educational materials. Literature suggests that only 17.0%-21.4% of (Psoriasis, Psoriatic Arthritis) patients have a good level of knowledge about psoriasis treatment and self-management. A study from 1994 found that English Psoriasis/Psoriatic Arthritis brochures required a reading level between grades 8-12 to be understandable, which was confirmed in a follow-up study 20 years later. As readability of written health-related text material should not exceed the sixth-grade level, Psoriasis/Psoriatic Arthritis material seems to be ill-suited to its target audience. However, no data are available on the readability levels of Psoriasis/Psoriatic Arthritis brochures for German-speaking patients, and both their volume and scope are unclear.
Objective: This study aimed to analyze freely available educational materials for Psoriasis/Psoriatic Arthritis patients written in German, quantifying their difficulty by assessing both the readability and the vocabulary used in the collected brochures.
Methods: Data collection was conducted manually via an internet search engine for Psoriasis/Psoriatic Arthritis–specific material, published as PDF documents. Next, raw text was extracted, and a computer-based readability and vocabulary analysis was performed on each brochure. For the readability analysis, we applied the Flesch Reading Ease (FRE) metric adapted for the German language, and the fourth Vienna formula (WSTF). To assess the lay-friendliness of the vocabulary, an expert level was computed using a specifically trained Support Vector Machine classifier. A two-sided, two-sample Wilcoxon test was applied to test whether the difficulty of brochures differed between pairs of topic groups.
Results: In total, 68 brochures were included for readability assessment, of which 71% (48/68) were published by pharmaceutical companies, 22% (15/68) by nonprofit organizations, and 7% (5/68) by public institutions. The collection was separated into four topic groups: basic information on Psoriasis/Psoriatic Arthritis (G1/G2), lifestyle and behavior with Psoriasis/Psoriatic Arthritis (G3/G4), medication and therapy guidance (G5), and other topics (G6). On average, readability levels were comparatively low, with FRE=31.58 and WSTF=11.84. However, two-thirds of the educational materials (69%; 47/68) achieved a vocabulary score ≤4 (ie, easy, very easy) and were, therefore, suitable for a lay audience. Statistically significant differences between brochure groups G1 and G3 for FRE (P=.0001), WSTF (P=.003), and vocabulary measure (L) (P=.01) exist, as do statistically significant differences for G2 and G4 in terms of FRE (P=.03), WSTF (P=.03) and L (P=.03).
Conclusions: Online Psoriasis/Psoriatic Arthritis patient education materials in German require, on average, a college or university education level. As a result, patients face barriers to understanding the available material, even though the vocabulary used seems appropriate. For this reason, publishers of Psoriasis/Psoriatic Arthritis brochures should carefully revise their educational materials to provide easier and more comprehensible information for patients with lower health literacy levels.
Wiesner et al. JMIR Dermatol 2020 | vol. 3 | iss. 1 | e16095 | http://derma.jmir.org/2020/1/e16095/


Introduction
Overview
Psoriasis (International Classification of Diseases Tenth Edition [ICD-10] code: L40) is one of the most common chronic inflammatory skin disorders in the dermatology field, manifesting as scaly, erythematous plaques. According to Griffiths and Barker [1], "the incidence in white individuals is estimated to be 60 cases per 100 000 head of population per year." Females and males are equally affected by the disease. Furthermore, this skin disease is associated with a form of inflammatory arthritis known as Psoriatic Arthritis (ICD-10: M07*) [2]. Both conditions considerably reduce patients' health-related quality of life [3][4][5][6], and the reduction "is similar to that of other major medical diseases" [7].
The development of Psoriasis and its clinical expression is influenced by several external factors, including smoking, weight, and stressful life events [8]. Moreover, work productivity loss is reported for Psoriatic Arthritis patients with moderate to severe joint symptoms [6].
Self-management plays an important role in coping with the effects of Psoriasis. In this context, it is vital to follow a consistent therapy approach [9]. According to [10], the major reasons for missing treatment were "drinking alcohol, being fed up, forgetfulness, and being too busy." However, patients require not only a certain degree of knowledge to keep their personal adherence level high; psychological support [11] and exchange with other patients can also be valuable to improve self-management [12]. Besides consulting health professionals, Psoriasis patients can also seek (emotional) support and therapy advice from other sufferers, such as in online support communities [13]. Still, Renzi et al reported in a study with 240 Italian patients that [14]: "The level of knowledge about the disease was not as high, with only 17 […]"
Information-seeking Psoriasis/Psoriatic Arthritis patients are offered different forms of health education material, such as printed health booklets. In 1994, Feldman et al investigated the readability of such educational material when provided in English [15]. The authors found that the text material required a US education level between grades 8-12, which was above the recommended grade level of text material for health education [16][17][18][19][20][21]. However, these findings cannot simply be transferred to other languages, such as Italian or German, as education systems and language properties differ substantially.
Another major problem of written patient information is the gap between the language of experts and laypeople. Even with a higher level of education, medical vocabulary, such as concepts of diagnosis and treatment, poses problems for those affected by a disease [22]. Furthermore, the medical terms that health professionals and patients associate with the origin of a disease tend to differ [23][24][25][26][27][28].
To assess the difficulty of written text material, several metrics exist for the English language [29][30][31][32][33]. However, the manual computation of these metrics can be difficult and time-consuming for large document collections and is, therefore, associated with a high demand for human or financial resources. Given the great variety of available Psoriasis/Psoriatic Arthritis brochures on the internet, a manual or semiautomatic approach seems far from practical. In this context, to the best of the authors' knowledge, no study has previously been published for Psoriasis/Psoriatic Arthritis-specific health education material written in the German language that applies machine learning methods and computes readability levels and vocabulary difficulty in a fully automated approach.
This study presents an automated, computer-based readability and vocabulary analysis of 68 patient information brochures on Psoriasis and Psoriatic Arthritis in German. The difficulty assessment of these brochures was conducted by applying a German adaptation [34] of the Flesch Reading Ease (FRE) scale [29], the fourth Vienna formula (German: Wiener Sachtextformel, WSTF) [35], and a vocabulary-oriented method that is based on a Support Vector Machine (SVM) [36].

Related Work
Written or oral patient information should provide scientific evidence on a disease in a way that patients can understand. Individuals must be able to assess the essential chances and risks inherent to available therapeutic strategies and to balance them with their situation in life. In this context, health literacy, according to Ratzan and Parker, describes [37]: "The degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions."
This concept is particularly important as low health literacy is associated with a poorer general health status and increased mortality, especially for higher age groups [38]. To quantify the health literacy level of an individual, the European Health Literacy Survey offers an instrument with a scale ranging from 1 (lowest) to 50 (highest). It was used to compare health literacy levels in different European countries. An analysis by Zok reports an average score of 31.9 for German participants, which was below the European average score (33.8) [39]. In a study from 2016, Schaeffer et al reported that "54.3% of [German study participants] were found to have limited health literacy" (n=2000) [40,41]. These findings support the need for educational materials that meet the capabilities of their readers; that is, those materials must be written at a sufficient readability level. Consequently, expert-centric vocabulary should be avoided as it imposes barriers to patients, hinders understandability of recommended therapy advice, or might lower overall adherence to treatment plans.
In this context, the analysis of health education material plays an important role in text production or for the improvement of existing material. However, several studies found that health education material is often written and published with low readability, which reduces or hinders its understandability for its intended target readers [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56][57]. Different medical subdisciplines or diseases have been the subject of readability assessments. These include, among others: (1) cancer; (2) heart diseases; (3) lung diseases; (4) kidney diseases; (5) ophthalmic conditions; or (6) dermatologic conditions. Many other medical subdisciplines have been assessed, and both the previous list and related literature references should not be considered complete. Instead, the selected studies highlight recent studies in the broad field of readability assessment.
In 2004, Friedman et al analyzed cancer education material from 55 websites [42]. They reported a mean FRE score of 41.6; that is, readability of the content presented was at college-level, which corresponds to a US school level of grade 13+. However, their analysis revealed differences between different types of cancer, as "breast cancer sites were written at easier reading grade levels." A similar study was presented by Basch et al in 2018, where the readability of prostate cancer materials on the internet was assessed using five different metrics [43]. They reported that the "majority of websites had difficult readability" and concluded that a "large majority of information available on the Internet about prostate cancer will not be readable for many individuals." A recent analysis of printed booklets addressing melanoma patients in the German language found that the median FRE was 43 for nine brochures analyzed manually [44]. The authors reported "low readability in at least half of the booklets" and emphasized the need for content and didactic revision of the educational material.
In 2012, Taylor-Clarke et al studied the suitability and readability of written material (n=18) provided in heart failure clinics and available on the internet [45]. In a non-computer-based analysis, the authors used the Fry readability formula and found that readability levels "ranged between 3rd and 15th grade-level," and the average readability level was eighth grade level. Similar results were reported by Kher, Johnson, and Griffith [46] in their study, which included health education material on congestive heart failure from 70 websites. Their primary outcome was that "only 5 out of 70 websites were within the limits of the recommended sixth-grade readability level." The mean FRE score was 48.87.
A recent study on heart failure education via a mobile app [47] analyzed the in-app content with an online readability calculation tool. The authors reported, "although the use of medical terminology in patient educational material is often unavoidable," which results in many polysyllabic medical terms, the "CHF [congestive heart failure] Info App included fewer polysyllabic terms." They calculated a mean of sixth grade reading level for the in-app CHF content.
Other studies investigated the readability of educational material provided for patients with lung diseases or their family members. A study from 2016 included 109 patient-directed online information resources and applied ten different readability metrics [48]. Weiss et al found that only "10 articles (9%) were written below a sixth-grade level," but the "average [FRE] score was 52," ranging from 18 to 78; the grade level ranged from "9.2 (www.cancer.gov) to 15.2 (www.wikipedia.org)" when grouped by parent website. A study by Hansberry et al [49] assessed the readability of educational material on the "health benefits of lung cancer screening," which was intended for the general public, using ten readability instruments. The authors reported that of "80 articles, 62.5% required a high school education to comprehend." In a similar study, Haas et al reviewed 46 websites on lung cancer screening [50]. The overall mean Flesch-Kincaid grade level was 10.6 (SD 2.2). In 2017, Fullmann et al [51] assessed consumer information of 26 chronic obstructive pulmonary disease inhalers from the Health Canada Drug Product Database. They concluded that, while the medication information section was on average "difficult to read" or "hard" (FRE=47.8), the instruction section was "easy" or "fairly easy" (FRE=79.0) to read.
For the field of nephrology, Thomas et al [52] analyzed Wikipedia as a resource for patient education, including 69 publicly available articles. The overall mean FRE reported was 19.4, which corresponds to a deficient level of readability. Moreover, the mean Flesch-Kincaid grade level was 15.1, signaling college-level education was required by readers of Wikipedia. A systematic review by Morony et al [53] included 80 patient education materials on chronic kidney disease from the United States, the United Kingdom, and Australia. When evaluated with the Flesch-Kincaid grade level instrument, "most materials required a minimum of grade 9" reading level. The authors emphasized that "cognitive decline in patients" suffering from the effects of this disease resulted in "lower literacy than the average patient," and content providers should carefully compile text material.
Online ophthalmic patient information was studied by Edmunds et al [54]. They assessed 160 websites, reporting a median FRE score of 52.1. Their analysis classified "83% [..] as being of 'difficult' readability." The authors also reported that "Not-for-profit webpages were of significantly greater length than commercial webpages." A single-institution study evaluated education materials on glaucoma [55]. The authors checked the readability of their institution's handouts and found a 10th-grade Flesch-Kincaid reading level. After "applying guidelines on writing easy-to-understand" material and revising the material, readability had improved to "a 6th-grade reading level," which better suits patients with low health literacy levels.
Tulbert, Snyder, and Brodell [56] compared the readability of "three sources of patient-education material on the internet (WebMD.com, Wikipedia.org, and MedicineOnline.com) […] with materials produced by the American Academy of Dermatology [AAD]". The educational materials found on Wikipedia.org were more difficult to comprehend than AAD and MedicineOnline. Tulbert et al categorized the retrieved pamphlets by several topics. Psoriasis brochures (no differentiation between Psoriasis/Psoriatic Arthritis) were found with a mean FRE of 39.5 for the AAD materials, and a mean FRE of 53.6 for the WebMD resources.
The readability of education materials designed for patients with Psoriasis was studied in 1994 [15]. The authors found that the text material, written in English, required an education level between grades 8-12, significantly above the recommended grade level for health education. In their analysis, the mean FRE score was 52.7. A follow-up study was conducted 20 years later by Smith [57]. The analysis of these brochures in English revealed that revised, newer online resources on Psoriasis provided by three organizations still "fail to meet the desired 6th grade level" [57].

Aims of the Study
The authors decided to focus on brochures available for free on the internet and written in German, targeting patients with Psoriasis (Vulgaris) or Psoriatic Arthritis. In this context, the aim of this study was three-fold: (1) to conduct an analysis of the current situation, that is, the volume and scope of information brochures on Psoriasis/Psoriatic Arthritis for (German-speaking) patients; (2) to quantify the level of readability of the text material and the type of vocabulary used in the identified brochures; and (3) to evaluate whether different types of brochures are better suited for citizens with lower health literacy levels. Therefore, this study can provide a baseline for researchers that want to validate their findings.

Study Design
This study of educational material consisted of two stages. First, to answer aim 1, data extraction was conducted manually using an internet search for documents specifically written for and targeting Psoriasis and Psoriatic Arthritis patients. The retrieval was limited to PDF documents. This file type was chosen as the corresponding documents are easily accessible in electronic format (machine-readable) and can also be distributed in printed format. Generally, these documents are highly structured and proofread by publishing institutions.
The second stage used the health education material collected in stage 1 to conduct a computer-based readability and vocabulary analysis. Both analyses were intended to answer research aims 2 and 3.

Study Setting
Patient information brochures on Psoriasis (Vulgaris) and Psoriatic Arthritis were collected. All booklets had to be freely available on the internet. Print-only booklets or multimedia content were not considered. Documents were eligible for inclusion if they: (1) provided information on Psoriasis and Psoriatic Arthritis for patients; (2) provided information in the German language; and (3) were free to access. If these criteria were not met, then the related documents were excluded from the readability and vocabulary analysis.
For the identification of relevant brochures, the expert term "Psoriasis" was chosen, accompanied by its more layman-friendly German term "Schuppenflechte." The two terms refer to the same concept, and patients in Germany are familiar with both. The German term "Broschüre" (English: brochure) was included to find educational materials suited for patients rather than other types of PDF files, such as drug package inserts or electronic presentation slides by medical professionals. The DuckDuckGo search engine was utilized to search the Web with the following search terms: +Broschüre +Psoriasis filetype:pdf (search terms A), +Broschüre +Schuppenflechte filetype:pdf (search terms B), +Schuppenflechte filetype:pdf (search terms C), and +Psoriasis filetype:pdf (search terms D).
After the elimination of duplicates, two authors screened the titles and the content of the retrieved information brochures in a joint session to check whether the educational material targeted Psoriasis/Psoriatic Arthritis patients. Therefore, false-positive retrieval results were removed during this manual step.

Definition
Readability [58] is a term to describe the properties of written text concerning the readers' competence, motivation, and understanding of a document [59]. It depends on the complexity of a text's structure, the sentence structure, and the vocabulary used.

Flesch Reading Ease Scale
A well-established readability scale for the English language is the Flesch Reading Ease metric [29]. The FRE measures the readability of a text via its average sentence length (ASL) and the average number of syllables per word (ASW). It relies on the assumption that short words or sentences are usually easier to understand than longer ones. However, for this analysis, we applied the modified FRE for the German language by Toni Amstad [34]:

$$\mathrm{FRE}_{\mathrm{German}} = 180 - \mathrm{ASL} - (58.5 \times \mathrm{ASW})$$
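As a minimal sketch, the German FRE can be computed from three raw counts, assuming the commonly cited coefficients of Amstad's adaptation (180 and 58.5). The function and parameter names are illustrative and are not taken from the authors' Java framework; how sentences and syllables are counted is left to the caller.

```python
def fre_german(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """German Flesch Reading Ease, assuming the form 180 - ASL - 58.5 * ASW."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    asl = n_words / n_sentences   # average sentence length (words per sentence)
    asw = n_syllables / n_words   # average number of syllables per word
    return 180.0 - asl - 58.5 * asw


# Hypothetical counts for illustration only: 120 words, 8 sentences, 228 syllables.
example_score = fre_german(n_words=120, n_sentences=8, n_syllables=228)
```

Higher scores indicate easier text, so long sentences and polysyllabic words both lower the result.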

Vienna Formula
In contrast to the FRE, the Vienna formula (WSTF) is not an adaptation of an English-language metric. Instead, it relies on work by Bamberger and Vanacek [35], who analyzed the bases of German text material and derived at least five versions of the Vienna formula for prose and nonfiction text. Typically, the fourth WSTF is used for text analyses. This metric also relies on the ASL and on the percentage of (complex) words with three or more syllables (MS):

$$\mathrm{WSTF}_4 = 0.2656 \times \mathrm{ASL} + 0.2744 \times \mathrm{MS} - 1.693$$
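A minimal sketch of the fourth Vienna formula follows, assuming its commonly cited coefficients (0.2656, 0.2744, and 1.693) and MS expressed as a percentage. The function and parameter names are illustrative only and do not reflect the authors' implementation.

```python
def wstf4(n_words: int, n_sentences: int, n_complex_words: int) -> float:
    """Fourth Vienna formula, assuming 0.2656 * ASL + 0.2744 * MS - 1.693."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    asl = n_words / n_sentences            # average sentence length in words
    ms = 100.0 * n_complex_words / n_words  # percentage of words with >= 3 syllables
    return 0.2656 * asl + 0.2744 * ms - 1.693


# Hypothetical counts for illustration only: 120 words, 8 sentences, 30 complex words.
example_grade = wstf4(n_words=120, n_sentences=8, n_complex_words=30)
```

Unlike the FRE, the result is read as an approximate school grade, so higher values indicate more difficult text.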

Vocabulary Classification
For the German language, average word length or syllable counts are not a good indicator of whether a term/concept is laypeople compatible, which means it can be easily understood by people with an education level of grades 6-7. This is because German grammar allows the creation and use of many compound words (eg, "Hauterkrankung," "Hautunverträglichkeit," "Kontaktallergie"), which are, while lengthy, quite lay-friendly for an average patient. Several machine learning techniques can be leveraged to compensate for the limitations of established readability measures [36,60]. This is why we added the vocabulary-based SVM approach as an extra dimension of text analysis.
In previous work [36], a vocabulary-based computation of an "expert level" using a specially trained SVM for German was presented, which was applied to cancer information brochures [61] and is also applicable to Psoriasis information brochures. To use this pretrained classifier to quantify the vocabulary-based difficulty of medical text material, several preprocessing steps are necessary [62]. As a first step, each text is split into tokens (ie, single word fragments). Second, nonhuman readable markup (eg, XML tags), as well as stop words, are removed (eg, he/she/it). This is important as these kinds of tokens do not influence the difficulty of a text. Next, the remaining tokens are reduced to their stem forms (eg, surgeries becomes surger) to eliminate linguistic variations of the same basic concept. Finally, the text content of a document is transformed into its mathematical representation based on previously selected features, similarly to a study conducted by Keinki et al [63]. In this context, features represent characteristic terms from the medical domain and thereby influence the vocabulary-based difficulty of a text.
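The preprocessing steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the stop-word list, the suffix-stripping rule (a crude stand-in for the Snowball stemmer), and the four-term feature set are all hypothetical and far smaller than anything a trained classifier would use.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"der", "die", "das", "und", "ist", "eine", "er", "sie", "es"}
SUFFIXES = ("ungen", "ung", "en", "er", "e", "n")  # crude stand-in for a real stemmer
FEATURES = ["psoriasis", "haut", "therapi", "entzünd"]  # assumed stemmed feature terms


def strip_markup(text: str) -> str:
    """Remove XML/HTML-like tags."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Split lower-cased text into alphabetic tokens (German letters included)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def crude_stem(token: str) -> str:
    """Strip the first matching suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def vectorize(text: str) -> list[int]:
    """Map a text to term counts over the selected features."""
    tokens = [t for t in tokenize(strip_markup(text)) if t not in STOP_WORDS]
    counts = Counter(crude_stem(t) for t in tokens)
    return [counts[feature] for feature in FEATURES]


vector = vectorize("<p>Die Psoriasis ist eine Entzündung der Haut.</p>")
```

The resulting count vector is the kind of numeric representation a classifier such as an SVM takes as input; the weighting scheme actually used by the pretrained model is not specified here.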
To quantify the degree of "expert-centricity" of the text material, the vocabulary measure (L) ∈ [1,…,10] is defined. It makes use of the SVM classifier above. In this context, higher values of L indicate an academic (medical) background knowledge or working experience in the medical domain is needed; a value of >7 corresponds to a very expert-centric text, a value of 5-6 to a difficult text, a value of 4-5 to a moderate text (laypeople with medical, educational background), a value of 3-4 to an easy text (intermediate level/junior high school), and a value of <3 to a very easy text (elementary level/elementary school).

Difficulty
The aforementioned instruments make use of different scales to express difficulty, either in terms of readability or vocabulary. Therefore, it seems advisable to map these scales to independent classes that express the difficulty much more simply. The mapping used in this study is presented in Table 1.

Computational Processing Steps
Parsing a text document is the process of analyzing its structure and fragments according to the rules of a natural language's grammar. Typically, modern text documents (eg, PDF, DOC, DOCX) include metadata that describes their internal structure or external representation. In this context, text parsers process the descriptive markup structure of such document formats. The primary aim of this process is to extract the raw version of a text without any remaining technical markup which describes structural information. Typically, this includes how a paragraph is oriented, to which section it belongs, if text is formatted bold, if it contains figures or tables, and so on [64] (see chapters 5 and 6 for further details).
Before a parser can extract raw text data, the construction of a document collection is necessary. In the context of this study, all information brochures were downloaded as PDF files. These files were automatically converted to documents in DOCX format and represent the input of our analysis framework. The computational processing steps to compute readability and vocabulary scores for each document follow the workflow depicted in Figure 1. First, document parsers from the Apache Tika framework [65] were applied to extract the actual text content. As a second step, the extracted text was cleaned of disturbance artifacts (eg, different hyphen encoding schemes). Finally, the aforementioned readability and vocabulary metrics were computed for every brochure by a self-implemented analysis framework written in Java, which was previously tested against reference material. For sentence detection, the analysis framework relies on the Apache OpenNLP library [66] and its broadly accepted sentence model for the German language [67]. Liang's hyphenation algorithm [68] was used to estimate syllable counts. For stem form reduction, the Snowball Stemmer, according to Porter, was applied [69]. The analysis was conducted on a Mac OS 10.14.6 64bit computer running Java 11.0.4 (Oracle Corporation, Redwood Shores, California, United States) on August 21, 2019.
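The cleaning step for hyphen-related artifacts can be illustrated with a short sketch. It is an assumption about what such cleaning might involve (removing soft hyphens, unifying hyphen code points, and rejoining words split at line ends), not a description of the authors' Java implementation; the function name and the set of handled characters are hypothetical.

```python
import re
import unicodedata

# Assumed hyphen variants to unify: U+2010 (hyphen) and U+2011 (non-breaking hyphen).
HYPHEN_VARIANTS = "\u2010\u2011"


def clean_extracted_text(raw: str) -> str:
    """Normalize hyphen encodings and rejoin words broken across lines."""
    text = unicodedata.normalize("NFC", raw)
    text = text.replace("\u00ad", "")  # drop invisible soft hyphens
    text = text.translate({ord(ch): "-" for ch in HYPHEN_VARIANTS})
    # Rejoin "erkran-\nkung" -> "erkrankung"; only when a lowercase letter follows,
    # so that genuine compounds such as "Kontakt-\nAllergie" keep their hyphen.
    text = re.sub(r"(\w)-\s*\n\s*([a-zäöüß])", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_extracted_text("Haut\u00aderkran-\nkung und Kontakt\u2010Allergie")
```

The lowercase heuristic is imperfect: a hyphenated compound whose second part is lowercase and happens to wrap at a line end would be merged, which is one way residual artifacts can influence syllable and word counts.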

Statistical Analysis
A two-sided, two-sample Wilcoxon test [70], also known as the Mann-Whitney U test, was applied to test whether the difficulty of brochures differs between two topic groups ($H_0\colon \mu_1 = \mu_2$, $H_1\colon \mu_1 \neq \mu_2$, $\alpha = .05$). If P<.05, $H_0$ is rejected in favor of $H_1$; that is, there is a significant difference in terms of readability between the two groups. The nonparametric U test was chosen as the number of brochures for several topic groups was rather small (n<10), and no normal distribution could be assumed. Data were analyzed with the statistics software R (The R Foundation, Vienna, Austria) version 3.6.1, on a Linux, Ubuntu 18.04 LTS/64bit computer.
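For readers who want to see the mechanics of the test, the following is a minimal sketch of a two-sided Mann-Whitney U test using the normal approximation with tie correction. It is not the R routine used in the study: it omits the continuity correction that R applies by default and never computes exact P values, so results can differ slightly. All names are illustrative.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U statistic of x, two-sided P) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided


# Hypothetical scores for two small groups, for illustration only.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With samples this small, an exact test would normally be preferred; the approximation is shown because it is the fallback when ties prevent an exact computation.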

Principal Findings
The acquisition of Psoriasis/Psoriatic Arthritis brochures was carried out on August 19 and 20, 2019, by two of the authors. Given the search terms and the inclusion criteria, 73 brochures were eligible for inclusion, of which five were excluded because they were either duplicate content or too general (ie, they were unspecific or covered other dermatology topics). The flowchart in Figure 2 depicts the data acquisition process with all details. In total, 68 brochures were included for further readability and vocabulary assessment. While assessing the brochures for eligibility, four categories emerged from the search engine's retrieval results: basic information on the disease (Psoriasis/Psoriatic Arthritis, labeled G1/G2); general advice on coping with Psoriasis/Psoriatic Arthritis in daily life situations, including topics such as stress, diet, smoking, work-life, and traveling (labeled G3/G4); medication and therapy guidance (G5); and other topics (G6).

Sample Characteristics
During the collection, several types of publishers emerged: pharmaceutical company or association, nonprofit organization, and public institution.

Readability Analysis
All brochure groups (G1-G6) were analyzed according to the readability metrics FRE and WSTF, as outlined in the Methods section. The results are presented in Table 2. The distributions for both readability metrics, FRE and the Vienna formula (WSTF), are depicted in Figures 3 and 4.
[Figures 3 and 4: distributions of FRE and WSTF scores; in the color coding, dark red marks the lowest readability (14-15).]

Vocabulary Classification
Overall, the brochures had a mean vocabulary measure (L) of L=3.66. As listed in Table 2, two-thirds of the educational materials (69%; 47/68) achieved a score ≤4 (VE+E) and were therefore suitable for a lay audience. A total of 11/68 booklets (16%) had a score ≥9 and are thus only suitable for an academic readership. For the remaining ten booklets (15%, 10/68), a score between >4 and <9 corresponds to a level suitable for persons with medical knowledge or a strong medical background. The groups G3 and G4 scored the lowest vocabulary measure, with L=1.75 for each. The highest vocabulary measure was found for the booklet group on medication and therapy topics (G5), with L=6.64. The distribution of the classification results over all the brochure groups is depicted in Figure 5.
[Figure 5: distribution of the SVM-based vocabulary classification results; the color scale runs from the lowest expert level (1) to dark red as the highest expert level required (10). SVM: support vector machine.]
A comparison of the topic groups was conducted for the pairs G1/G3 and G2/G4. The results of the corresponding Wilcoxon test for two independent samples are presented in Table 3. Negative values originate from the definition of the FRE metric; that is, lower numbers correspond to a higher difficulty. In addition, due to a high number of ties (in the ranks) for the vocabulary metric (L), an exact computation of CI and P was not possible. Instead, a normal approximation was used by the statistics software R.
The readability findings show that the majority of the collected material is difficult or very difficult (D+VD) to read, as shown by the WSTF (87%; 59/68). The outcome is more apparent when the German adaptation of the FRE scale is applied (100%; 68/68) (Table 2). Thus, educational materials on Psoriasis/Psoriatic Arthritis are not suitable for their intended group of readers. This corresponds to the results of other authors, who also reported that such resources demand high reading levels [73][74][75][76][77].
The vocabulary is also of great relevance for comprehensibility and might be even more decisive than the sentence structure [78]. The vocabulary analysis revealed that two-thirds (69%; 47/68) of the educational materials were well suited for laypeople. This suggests that relatively few medical expert terms were used during text production, or that expert terminology was actively avoided. With the difficulty assessment of 68 Psoriasis/Psoriatic Arthritis brochures, we demonstrated that a pretrained SVM can analyze text material for its vocabulary. The study findings therefore contribute the first dedicated vocabulary analysis related to the use of expert medical terms in patient educational material for Psoriasis and Psoriatic Arthritis.

Limitations
Several limitations apply to the study setting. First, a public search engine was utilized to build the data collection used in this study. In this context, the internal mechanisms used to compute and retrieve information from a search engine's index are not fully transparent. For this reason, some potentially relevant documents might have been missed by our data collection process. The retrieval was also limited to PDF documents. The study design included this file type as the corresponding documents are easily accessible in electronic format (machine-readable), can also be distributed in printed format (these documents are, in general, highly structured and proof-read by publishing institutions), and represent a robust, well-known data format to provide information on (chronic) diseases and related treatment options via the internet.
Second, for this study, we analyzed 68 brochures on Psoriasis/Psoriatic Arthritis published by different types of organizations (see Multimedia Appendix 1). Depending on the motivation of an organization, there might be different aims in terms of content, words used, and selected topics. This might have affected our results, as scientific organizations might have used more complex sentence structures to explain Psoriasis/Psoriatic Arthritis concepts, while pharmaceutical companies might tend towards easier vocabulary and sentence structure.
Third, in the preprocessing phase, the included PDF brochures were automatically converted to documents in DOCX format. As a result, disturbance artifacts, such as different kinds of hyphens or misencoded characters originating from different encoding schemes, may still have been included in the extracted raw text material.
The adapted FRE metric and Vienna formula are mainly computed on the basis of mean sentence length, the mean number of syllables per word, and language-specific weighting factors. However, detecting syllables is not a trivial task for the German language and does not work reliably in some rare circumstances [79]. For this reason, the computed FRE or WSTF scores can be influenced by the aforementioned inaccuracies.
In this context, it should be stressed that this affects all natural language processing analysis tools for German text material.
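To make the dependence on syllable detection concrete, the following minimal Python sketch computes both scores for a short text. It is not the software used in this study. The constants are those commonly cited for the German FRE adaptation (Amstad) and the fourth Vienna formula, which uses the share of words with three or more syllables; they are an assumption here, as this section does not state them. The vowel-group syllable counter and all function names are hypothetical simplifications.

```python
import re

# One syllable per run of vowels: a deliberately crude heuristic (assumption).
# It undercounts hiatus, e.g. "Pso-ri-a-sis" is counted as 3 syllables.
VOWEL_GROUP = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)
WORD = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def count_syllables(word: str) -> int:
    """Estimate the syllable count of a German word from its vowel groups."""
    return max(1, len(VOWEL_GROUP.findall(word)))


def readability_scores(text: str) -> dict:
    """Return the German FRE and fourth Vienna formula (WSTF) scores."""
    # Naive sentence split on terminal punctuation (abbreviations are not handled).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    mean_sentence_length = len(words) / len(sentences)
    mean_syllables_per_word = sum(syllables) / len(words)
    # Percentage of words with three or more syllables.
    pct_long_words = 100.0 * sum(1 for n in syllables if n >= 3) / len(words)

    # Assumed constants: German FRE adaptation (Amstad).
    fre_de = 180.0 - mean_sentence_length - 58.5 * mean_syllables_per_word
    # Assumed constants: fourth Vienna formula.
    wstf4 = 0.2744 * pct_long_words + 0.2656 * mean_sentence_length - 1.693
    return {"fre_de": fre_de, "wstf4": wstf4}


if __name__ == "__main__":
    print(readability_scores("Die Haut ist rot. Sie juckt."))
```

Because every miscounted syllable shifts both the mean number of syllables per word and the share of long words, errors in syllable detection propagate directly into both scores.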
Furthermore, solely computing the readability of educational materials disregards the individual knowledge and motivation of readers [35]. Aspects related to illustration and design were not included in the analysis of this study. Consequently, the suitability of health information cannot be judged exclusively on the basis of its readability or the vocabulary used [35,80]. In this context, the studies by Taylor-Clarke et al and Tuot et al [45,81], among others, have applied methods that go beyond measures of word and sentence lengths, such as the Suitability Assessment of Materials (SAM) instrument, which reflects other aspects of a brochure's appearance that influence the understandability of (health) information and text comprehension.
However, besides requiring manual effort, judging quality criteria with this instrument is a highly subjective task. Moreover, a sufficient number of judges is required to ensure an objective assessment of visual and aesthetic aspects in brochure design, a requirement that is not met by every study in this field. Even more importantly, interjudge reliability must be considered, evaluated, and reported properly. Modern approaches use crowd-sourcing techniques, with which a large number of judges and related assessments can be obtained more easily [82].

Comparison With Prior Work
Previous studies investigated the readability of health education materials on Psoriasis/Psoriatic Arthritis written in the English language [15,57]. Both analyses concluded that the materials failed to "meet the desired 6th grade level" [57]. Although no accepted recommendation exists for German health education material, our findings confirm the low readability of Psoriasis/Psoriatic Arthritis brochures for patients. In contrast to the studies by Feldman et al and Smith, this study contributes the first vocabulary-related assessments of materials originating from the dermatology domain. We found that the vocabulary used in Psoriasis/Psoriatic Arthritis brochures is adequate for laypeople, that is, patients and family members who have no professional background in the health sector. As a secondary outcome, the study provides a broad overview of the published materials in German-speaking countries, listed by publisher and year in Multimedia Appendix 1.
In a previous study [61], Keinki et al analyzed information booklets for German cancer patients. In this particular domain, the authors found a mean vocabulary score of L_SVM=5.09, signaling a higher difficulty for laypeople than in this study (L_SVM=3.66), that is, Psoriasis/Psoriatic Arthritis brochures make use of less complex medical terminology. This difference might be explained by the fact that Psoriasis/Psoriatic Arthritis brochures are mainly (71%; 48/68) produced and published by pharmaceutical companies or related associations. In contrast, cancer booklets follow a stricter evidence-based text production process in Germany [83], that is, patient guidelines and brochures on cancer topics are written or reviewed by medical professionals.

Future Directions
This study analyzed static PDF document content for Psoriasis/Psoriatic Arthritis patients. In future work, we intend to extend our analyses to other types of online resources, including the content of trustworthy health information websites in German and articles in Wikipedia. Such an analysis would allow a comparison with the work of Thomas et al [52] in terms of FRE and grade levels, as those authors reported even lower readability than found in this study.

Conclusions
For 68 German Psoriasis and Psoriatic Arthritis brochures freely available on the internet, the study findings reveal that the readability is low (Figures 3 and 4). Publishing organizations and authors should, therefore, reevaluate existing brochures and reduce sentence complexity. By contrast, our findings suggest that the vocabulary used suits the target audience (Figure 5).
Methods from the field of machine learning can support authors of Psoriasis/Psoriatic Arthritis brochures, as they complement existing readability assessment methodology. For this reason, written patient information should preferably be assessed in terms of both sentence structure and vocabulary, for example, via the SVM-based classifier used for this study. The authors recommend the use of both the sentence dimension and the vocabulary dimension as supportive measures to ensure and provide understandable health education materials, independent of the medical domain.