Published in Vol 3, No 1 (2020): Jan-Dec

The Difficulty of German Information Booklets on Psoriasis and Psoriatic Arthritis: Automated Readability and Vocabulary Analysis


Original Paper

1Department of Medical Informatics, Heilbronn University, Heilbronn, Germany

2Consumer Health Informatics SIG, German Association for Medical Informatics, Biometry & Epidemiology (GMDS eV), Cologne, Germany

3Center for Machine Learning, Heilbronn University, Heilbronn, Germany

4GECKO Institute for Medicine, Informatics & Economics, Heilbronn University, Heilbronn, Germany

*these authors contributed equally

Corresponding Author:

Martin Wiesner, Dipl-Inform Med

Department of Medical Informatics

Heilbronn University

Max-Planck-Str 39

Heilbronn, 74081


Phone: 49 71315046947


Background: Information-seeking Psoriasis or Psoriatic Arthritis patients are confronted with numerous educational materials when searching the internet. The literature suggests that only 17.0%-21.4% of Psoriasis and Psoriatic Arthritis patients have a good level of knowledge about psoriasis treatment and self-management. A study from 1994 found that English-language Psoriasis/Psoriatic Arthritis brochures required a reading level between grades 8 and 12 to be understood, a finding confirmed in a follow-up study 20 years later. As the readability of written health-related text material should not exceed the sixth-grade level, Psoriasis/Psoriatic Arthritis material seems ill-suited to its target audience. However, no data are available on the readability levels of Psoriasis/Psoriatic Arthritis brochures for German-speaking patients, and both their volume and scope are unclear.

Objective: This study aimed to analyze freely available educational materials for Psoriasis/Psoriatic Arthritis patients written in German, quantifying their difficulty by assessing both the readability and the vocabulary used in the collected brochures.

Methods: Data collection was conducted manually via an internet search engine for Psoriasis/Psoriatic Arthritis–specific material, published as PDF documents. Next, raw text was extracted, and a computer-based readability and vocabulary analysis was performed on each brochure. For the readability analysis, we applied the Flesch Reading Ease (FRE) metric adapted for the German language, and the fourth Vienna formula (WSTF). To assess the laymen-friendliness of the vocabulary, the computation of an expert level was conducted using a specifically trained Support Vector Machine classifier. A two-sided, two-sample Wilcoxon test was applied to test whether the difficulty of brochures of pair-wise topic groups was different from each other.

Results: In total, 68 brochures were included for readability assessment, of which 71% (48/68) were published by pharmaceutical companies, 22% (15/68) by nonprofit organizations, and 7% (5/68) by public institutions. The collection was separated into four topic groups: basic information on Psoriasis/Psoriatic Arthritis (G1/G2), lifestyle and behavior with Psoriasis/Psoriatic Arthritis (G3/G4), medication and therapy guidance (G5), and other topics (G6). On average, readability levels were comparatively low, with FRE=31.58 and WSTF=11.84. However, two-thirds of the educational materials (69%; 47/68) achieved a vocabulary score ≤4 (ie, easy or very easy) and were, therefore, suitable for a lay audience. Statistically significant differences exist between brochure groups G1 and G3 for FRE (P=.0001), WSTF (P=.003), and the vocabulary measure L (P=.01), as well as between G2 and G4 for FRE (P=.03), WSTF (P=.03), and L (P=.03).

Conclusions: Online Psoriasis/Psoriatic Arthritis patient education materials in German require, on average, a college or university education level. As a result, patients face barriers to understanding the available material, even though the vocabulary used seems appropriate. For this reason, publishers of Psoriasis/Psoriatic Arthritis brochures should carefully revise their educational materials to provide easier and more comprehensible information for patients with lower health literacy levels.

JMIR Dermatol 2020;3(1):e16095




Psoriasis (International Classification of Diseases Tenth Edition [ICD-10] code: L40) is one of the most common chronic inflammatory skin disorders in the dermatology field, manifesting as scaly, erythematous plaques. According to Griffiths and Barker [1], “the incidence in white individuals is estimated to be 60 cases per 100 000 head of population per year.” Females and males are equally affected by the disease. Furthermore, this skin disease is associated with a form of inflammatory arthritis known as Psoriatic Arthritis (ICD-10: M07*) [2]. Both conditions considerably reduce patients’ health-related quality of life [3-6], which “is similar to that of other major medical diseases” [7].

The development of Psoriasis and its clinical expression is influenced by several external factors, including smoking, weight, and stressful life events [8]. Moreover, work productivity loss is reported for Psoriatic Arthritis patients with moderate to severe joint symptoms [6].

Self-management plays an important role in coping with the effects of Psoriasis. In this context, it is vital to follow a consistent therapy approach [9]. According to [10], the major reasons for missing treatment were “drinking alcohol, being fed up, forgetfulness, and being too busy.” However, patients require not only a certain degree of knowledge to keep their personal adherence level high; psychological support [11] and exchange with other patients can also be valuable for improving self-management [12]. Besides consulting health professionals, Psoriasis patients can also seek (emotional) support and therapy advice from other sufferers, for example, in online support communities [13]. Still, Renzi et al reported in a study with 240 Italian patients that [14]:

The level of knowledge about the disease was not as high, with only 17.0% and 21.4% of patient[s] with [Psoriasis] and [Psoriatic Arthritis], respectively, having a good level of knowledge about psoriasis treatment.

Information-seeking Psoriasis/Psoriatic Arthritis patients are offered different forms of health education material, such as printed health booklets. In 1994, Feldman et al investigated the readability of such educational material when provided in English [15]. The authors found that the text material required a US education level between grades 8-12, which was above the recommended grade level of text material for health education [16-21]. However, these findings cannot be transferred improvidently to other languages, such as Italian or German, as education systems and language properties differ substantially.

Another major problem of written patient information is the gap between the language of experts and laypeople. Even for readers with a higher level of education, medical vocabulary, such as concepts of diagnosis and treatment, poses problems for those affected by a disease [22]. Furthermore, the medical terms associated with the origin of a disease tend to differ between health professionals and patients [23-28].

To assess the difficulty of written text material, several metrics exist for the English language [29-33]. However, the manual computation of these metrics can be difficult and time-consuming for large document collections and is, therefore, associated with a high demand for human or financial resources. Given the great variety of available Psoriasis/Psoriatic Arthritis brochures on the internet, a manual or semiautomatic approach seems far from practical. In this context, to the best of the authors’ knowledge, no study has previously been published for Psoriasis/Psoriatic Arthritis–specific health education material written in the German language that applies machine learning methods and computes readability levels and vocabulary difficulty in a fully automated approach.

This study presents an automated, computer-based readability and vocabulary analysis of 68 patient information brochures on Psoriasis and Psoriatic Arthritis in German. The difficulty assessment of these brochures was conducted by applying a German adaptation of the Flesch-Reading Ease (FRE) [29] scale [34], the fourth Vienna formula (German: Wiener Sachtextformel, WSTF) [35], and a vocabulary-oriented method that is based on a Support Vector Machine (SVM) [36].

Related Work

Written or oral patient information should provide scientific evidence on a disease in a way that patients can understand. Individuals must be able to assess the essential chances and risks inherent to available therapeutic strategies and to balance them against their situation in life. In this context, health literacy, according to Ratzan and Parker, describes [37]:

The degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions.

This concept is particularly important as low health literacy is associated with a poorer general health status and increased mortality, especially for higher age groups [38].

To quantify the health literacy level of an individual, the European Health Literacy Survey offers an instrument with a scale ranging from 1 (lowest) to 50 (highest). It was used to compare health literacy levels in different European countries. An analysis by Zok reports an average score of 31.9 for German participants, which was below the European average score (33.8) [39]. In a study from 2016, Schaeffer et al reported that “54.3% of [German study participants] were found to have limited health literacy” (n=2000) [40,41]. These findings support the need for educational materials that meet the capabilities of their readers; that is, those materials must be written at a sufficient readability level. Consequently, expert-centric vocabulary should be avoided as it imposes barriers to patients, hinders understandability of recommended therapy advice, or might lower overall adherence to treatment plans.

In this context, the analysis of health education material plays an important role in text production or for the improvement of existing material. However, several studies found that health education material is often written and published with low readability, which reduces or hinders its understandability for its intended target readers [42-57]. Different medical subdisciplines or diseases have been the subject of readability assessments. These include, among others: (1) cancer; (2) heart diseases; (3) lung diseases; (4) kidney diseases; (5) ophthalmic conditions; or (6) dermatologic conditions. Many other medical subdisciplines have been assessed, and both the previous list and related literature references should not be considered complete. Instead, the selected studies highlight recent studies in the broad field of readability assessment.

In 2004, Friedman et al analyzed cancer education material from 55 websites [42]. They reported a mean FRE score of 41.6; that is, readability of the content presented was at college-level, which corresponds to a US school level of grade 13+. However, their analysis revealed differences between different types of cancer, as “breast cancer sites were written at easier reading grade levels.” A similar study was presented by Basch et al in 2018, where the readability of prostate cancer materials on the internet was assessed using five different metrics [43]. They reported that the “majority of websites had difficult readability” and concluded that a “large majority of information available on the Internet about prostate cancer will not be readable for many individuals.” A recent analysis of printed booklets addressing melanoma patients in the German language found that the median FRE was 43 for nine brochures analyzed manually [44]. The authors reported “low readability in at least half of the booklets” and emphasized the need for content and didactic revision of the educational material.

In 2012, Taylor-Clarke et al studied the suitability and readability of written material (n=18) provided in heart failure clinics and available on the internet [45]. In a non-computer-based analysis, the authors used the Fry readability formula and found that readability levels “ranged between 3rd and 15th grade-level,” and the average readability level was eighth grade level. Similar results were reported by Kher, Johnson, and Griffith [46] in their study, which included health education material on congestive heart failure from 70 websites. Their primary outcome was that “only 5 out of 70 websites were within the limits of the recommended sixth-grade readability level.” The mean FRE score was 48.87.

A recent study on heart failure education via a mobile app [47] analyzed the in-app content with an online readability calculation tool. The authors reported, “although the use of medical terminology in patient educational material is often unavoidable,” which results in many polysyllabic medical terms, the “CHF [congestive heart failure] Info App included fewer polysyllabic terms.” They calculated a mean of sixth grade reading level for the in-app CHF content.

Other studies investigated the readability of educational material provided for patients with lung diseases or their family members. A study from 2016 included 109 patient-directed online information resources and applied ten different readability metrics [48]. Weiss et al found that only “10 articles (9%) were written below a sixth-grade level,” but the “average [FRE] score was 52,” ranging from 18 to 78; the grade level ranged from 9.2 to 15.2 when grouped by parent website. A study by Hansberry et al [49] assessed the readability of educational material on the “health benefits of lung cancer screening,” which was intended for the general public, using ten readability instruments. The authors reported that of “80 articles, 62.5% required a high school education to comprehend.”

In a similar study, Haas et al reviewed 46 websites on lung cancer screening [50]. The overall mean Flesch-Kincaid grade level was 10.6 (SD 2.2). In 2017, Fullmann et al [51] assessed consumer information for 26 chronic obstructive pulmonary disease inhalers from the Health Canada Drug Product Database. They concluded that, while the medication information section was on average “difficult to read” or “hard” (FRE=47.8), the instruction section was “easy” or “fairly easy” (FRE=79.0) to read.

For the field of nephrology, Thomas et al [52] analyzed Wikipedia as a resource for patient education, including 69 publicly available articles. The overall mean FRE reported was 19.4, which corresponds to a deficient level of readability. Moreover, the mean Flesch-Kincaid grade level was 15.1, signaling that a college-level education was required of Wikipedia readers. A systematic review by Morony et al [53] included 80 patient education materials on chronic kidney disease from the United States, the United Kingdom, and Australia. When evaluated with the Flesch-Kincaid grade level instrument, “most materials required a minimum of grade 9” reading level. The authors emphasized that “cognitive decline in patients” suffering from the effects of this disease resulted in “lower literacy than the average patient,” and content providers should compile text material carefully.

Online ophthalmic patient information was studied by Edmunds et al [54]. They assessed 160 websites, reporting a median FRE score of 52.1. Their analysis rated “83% [..] as being of ‘difficult’ readability.” The authors also reported that “Not-for-profit webpages were of significantly greater length than commercial webpages.” A single-institution study evaluated education materials on glaucoma [55]. The authors checked the readability of their institution’s handouts and found a 10th-grade Flesch-Kincaid reading level. After “applying guidelines on writing easy-to-understand” material and revising it, readability improved to “a 6th-grade reading level,” which better suits patients with low health literacy levels.

Tulbert, Snyder, and Brodell [56] compared the readability of “three sources of patient-education material on the internet […] with materials produced by the American Academy of Dermatology [AAD].” The educational materials found on one of the three websites were more difficult to comprehend than those of the AAD and MedicineOnline. Tulbert et al categorized the retrieved pamphlets by several topics. Psoriasis brochures (no differentiation between Psoriasis and Psoriatic Arthritis) showed a mean FRE of 39.5 for the AAD materials and a mean FRE of 53.6 for the WebMD resources.

The readability of education materials designed for patients with Psoriasis was studied in 1994 [15]. The authors found that the text material, written in English, required an education level between grades 8-12, significantly above the recommended grade level for health education. In their analysis, the mean FRE score was 52.7. A follow-up study was conducted 20 years later by Smith [57]. The analysis of these brochures in English revealed that revised, newer online resources on Psoriasis provided by three organizations still “fail to meet the desired 6th grade level” [57].

Aims of the Study

The authors decided to focus on brochures available for free on the internet and written in German, targeting patients with Psoriasis (Vulgaris) or Psoriatic Arthritis. In this context, the aim of this study was three-fold: (1) to conduct an analysis of the current situation, that is, the volume and scope of information brochures on Psoriasis/Psoriatic Arthritis for (German-speaking) patients; (2) to quantify the level of readability of the text material and the type of vocabulary used in the identified brochures; and (3) to evaluate whether different types of brochures are better suited for citizens with lower health literacy levels. Therefore, this study can provide a baseline for researchers who want to validate their findings.

Study Design

This study of educational material consisted of two stages. First, to answer aim 1, data collection was conducted manually via an internet search for Psoriasis/Psoriatic Arthritis–specific material published as PDF documents. This file type was chosen because the corresponding documents are easily accessible in electronic (machine-readable) format and can also be distributed in print. Generally, these documents are highly structured and proofread by the publishing institutions.

Next, the subsequent stage used the health education material collected in stage 1 and conducted a computer-based readability and vocabulary analysis. Both analyses were intended to answer research aims 2 and 3.

Study Setting

Patient information brochures on Psoriasis (Vulgaris) and Psoriatic Arthritis were collected. All booklets had to be freely available on the internet. Print-only booklets or multimedia content were not considered. Documents were eligible for inclusion if they: (1) provided information on Psoriasis and Psoriatic Arthritis for patients; (2) provided information in the German language; and (3) were free to access. If these criteria were not met, then the related documents were excluded from the readability and vocabulary analysis.

For the identification of relevant brochures, the expert term “Psoriasis” was chosen, accompanied by its more layman-friendly German term “Schuppenflechte.” The two terms refer to the same concept, and patients in Germany are familiar with both. The German term “Broschüre” (English: brochure) was included to find educational materials suited for patients rather than other types of PDF files, such as drug package inserts or electronic presentation slides by medical professionals. The DuckDuckGo search engine was utilized to search the Web with the following search terms: +Broschüre +Psoriasis filetype:pdf (search terms A), +Broschüre +Schuppenflechte filetype:pdf (search terms B), +Schuppenflechte filetype:pdf (search terms C), and +Psoriasis filetype:pdf (search terms D).

After the elimination of duplicates, two authors screened the titles and the content of the retrieved information brochures in a joint session to check whether the educational material targeted Psoriasis/Psoriatic Arthritis patients. Therefore, false-positive retrieval results were removed during this manual step.

Readability Analysis


Readability [58] is a term to describe the properties of written text concerning the readers’ competence, motivation, and understanding of a document [59]. It depends on the complexity of a text’s structure, the sentence structure, and the vocabulary used.

Flesch Reading Ease Scale

A well-established readability scale for the English language is the Flesch Reading Ease metric [29]. The FRE measures the readability of a text via its average sentence length (ASL) and the average number of syllables per word (ASW). It relies on the fact that short words or sentences are usually easier to understand than longer ones. However, for this analysis, we applied the modified FRE for the German language by Toni Amstad [34]: FRE(German) = 180 − ASL − (58.5 × ASW)
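As a minimal sketch, the Amstad adaptation can be computed directly from sentence, word, and syllable counts; the counting itself (sentence detection, hyphenation) is assumed to have been done beforehand, and the function name is our own:

```python
def flesch_reading_ease_german(total_words, total_sentences, total_syllables):
    """Amstad's German adaptation of the Flesch Reading Ease.

    FRE(German) = 180 - ASL - (58.5 * ASW), where ASL is the average
    sentence length in words and ASW the average number of syllables
    per word. Higher scores indicate easier text (0-100 scale).
    """
    asl = total_words / total_sentences
    asw = total_syllables / total_words
    return 180.0 - asl - 58.5 * asw

# Example: 100 words in 6 sentences with 180 syllables in total.
score = flesch_reading_ease_german(100, 6, 180)  # ~58.0, "fairly difficult"
```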

Vienna Formula

In contrast to the FRE, the Vienna formula (WSTF) is not an adaptation of an English-language metric for German. Instead, it relies on work by Bamberger and Vanacek [35], who analyzed the bases of German text material and derived at least five versions of the Vienna formula for prose and nonfiction texts. Typically, the fourth WSTF is used for text analyses. This metric also relies on the ASL and the percentage of (complex) words with three or more syllables (MS): WSTF4 = (0.2656 × ASL) + (0.2744 × MS) − 1.693
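The fourth formula can be sketched in a few lines; as a sanity check, plugging in this study's sample means (ASL=16.4 words per sentence, 33.57% complex words, as reported in the Results) reproduces the reported mean WSTF of 11.84 almost exactly:

```python
def wstf4(avg_sentence_length, pct_complex_words):
    """Fourth Vienna formula (4. Wiener Sachtextformel).

    WSTF4 = (0.2656 * ASL) + (0.2744 * MS) - 1.693, where ASL is the
    average sentence length in words and MS the percentage of words
    with three or more syllables. The result approximates a school
    grade level (roughly 4 = very easy, 15 = very difficult).
    """
    return 0.2656 * avg_sentence_length + 0.2744 * pct_complex_words - 1.693

# Sample means reported in this study: ASL = 16.4, MS = 33.57%.
grade = wstf4(16.4, 33.57)  # ~11.87, close to the reported mean of 11.84
```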

Vocabulary Classification

For the German language, average word length or syllable counts are not a good indicator of whether a term or concept is layperson-friendly, that is, easily understood by people with an education level of grades 6-7. This is because German grammar allows the creation and use of many compound words (eg, “Hauterkrankung,” “Hautunverträglichkeit,” “Kontaktallergie”), which, while lengthy, are quite layman-friendly for an average patient. Several machine learning techniques can be leveraged to compensate for these limitations of established readability measures [36,60]. For this reason, we added the vocabulary-based SVM approach as an extra dimension of text analysis.

In previous work [36], a vocabulary-based computation of an “expert level” using a specially trained SVM for German was presented, which was applied to cancer information brochures [61] and is also applicable to Psoriasis information brochures. To use this pretrained classifier to quantify the vocabulary-based difficulty of medical text material, several preprocessing steps are necessary [62]. As a first step, each text is split into tokens (ie, single word fragments). Second, non–human-readable markup (eg, XML tags) and stop words (eg, he/she/it) are removed. This is important as these kinds of tokens do not influence the difficulty of a text. Next, the remaining tokens are reduced to their stem forms (eg, surgeries becomes surger) to eliminate linguistic variations of the same basic concept. Finally, the text content of a document is transformed into its mathematical representation based on previously selected features, similar to a study conducted by Keinki et al [63]. In this context, features represent characteristic terms from the medical domain and thereby influence the vocabulary-based difficulty of a text.
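The preprocessing chain described above can be sketched as follows. The stop word list, the suffix-stripping stand-in for the Snowball stemmer, and the feature terms are illustrative placeholders, not the study's actual configuration:

```python
import re
from collections import Counter

STOP_WORDS = {"der", "die", "das", "und", "er", "sie", "es", "ein", "eine"}

def naive_stem(token):
    """Crude stand-in for the Snowball stemmer: strip common German suffixes."""
    for suffix in ("ungen", "ung", "heit", "keit", "en", "er", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, and reduce the remaining tokens to stems."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def to_feature_vector(text, feature_terms):
    """Map a document onto counts of preselected (stemmed) feature terms."""
    counts = Counter(preprocess(text))
    return [counts.get(term, 0) for term in feature_terms]
```

The resulting count vectors are what a trained SVM classifier would consume to predict the expert level of a document.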

To quantify the degree of “expert-centricity” of the text material, the vocabulary measure L ∈ [1,…,10] is defined. It makes use of the SVM classifier described above. Higher values of L indicate that academic (medical) background knowledge or working experience in the medical domain is needed: a value of >7 corresponds to a very expert-centric text, a value of 5-6 to a difficult text, a value of 4-5 to a moderate text (laypeople with a medical educational background), a value of 3-4 to an easy text (intermediate level/junior high school), and a value of <3 to a very easy text (elementary level/elementary school).
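Under the assumption that the boundary value 4 belongs to the easy band (consistent with the Results, where scores ≤4 are counted as easy or very easy), the mapping from L to a description can be sketched as:

```python
def vocabulary_label(level):
    """Map the SVM-based vocabulary measure L (1-10) to a description.

    Where the ranges in the text overlap (eg, 4-5 vs 5-6), boundary
    values are resolved here by assumption: <=4 counts as easy,
    >=5 as difficult.
    """
    if level > 7:
        return "very expert-centric"
    if level >= 5:
        return "difficult"
    if level > 4:
        return "moderate"
    if level >= 3:
        return "easy"
    return "very easy"
```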


The aforementioned instruments make use of different scales to express difficulty, either in terms of readability or vocabulary. Therefore, it seems advisable to map these scales to independent classes that express the difficulty much more simply. The mapping used in this study is presented in Table 1.

Table 1. Mapping readability and vocabulary instrument scales to corresponding classes (labels). Adapted according to [61].
Difficulty | FREa ∈ [0,100] | WSTFb ∈ [4,15] | L ∈ [1,10] | Class label
Very difficult to read | [0-29] | [14-15] | 9, 10 | VDc
Difficult to read | [30-49] | [12-14[ | 7, 8 | Dd
Fairly difficult to read | [50-59] | [10-12[ | 6 | D
Average readability | [60-69] | [8-10[ | 5 | Me
Fairly easy to read | [70-79] | [7-8[ | 4 | Ef
Easy to read | [80-89] | [5-7[ | 3 | E
Very easy to read | [90-100] | [4-5[ | 1, 2 | VEg

aFRE: Flesch Reading Ease.

bWSTF: Fourth Vienna Formula (German: Wiener SachTextFormel).

cVD: very difficult.

dD: difficult.

eM: moderate.

fE: easy.

gVE: very easy.
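The mapping in Table 1 can be expressed as simple guard-clause functions; the merged bands (eg, "difficult" and "fairly difficult" both map to D) follow the class-label column:

```python
def fre_class(fre):
    """Class label for a Flesch Reading Ease value (0 = hardest, 100 = easiest)."""
    if fre < 30:
        return "VD"  # very difficult
    if fre < 60:
        return "D"   # difficult / fairly difficult
    if fre < 70:
        return "M"   # moderate
    if fre < 90:
        return "E"   # easy / fairly easy
    return "VE"      # very easy

def wstf_class(wstf):
    """Class label for a fourth Vienna formula value (15 = hardest, 4 = easiest)."""
    if wstf >= 14:
        return "VD"
    if wstf >= 10:
        return "D"
    if wstf >= 8:
        return "M"
    if wstf >= 5:
        return "E"
    return "VE"

# The study's overall means fall into the "difficult" class:
labels = (fre_class(31.58), wstf_class(11.84))  # ("D", "D")
```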

Computational Processing Steps

Parsing a text document is the process of analyzing its structure and fragments according to the rules of a natural language’s grammar. Typically, modern text documents (eg, PDF, DOC, DOCX) include metadata that describes their internal structure or external representation. In this context, text parsers process the descriptive markup structure of such document formats. The primary aim of this process is to extract the raw version of a text without any remaining technical markup that describes structural information. Typically, this includes how a paragraph is oriented, to which section it belongs, whether text is formatted in bold, whether it contains figures or tables, and so on [64] (see chapters 5 and 6 for further details).

Before a parser can extract raw text data, the construction of a document collection is necessary. In the context of this study, all information brochures were downloaded as PDF files. These files were automatically converted to documents in DOCX format and represent the input of our analysis framework. The computational processing steps to compute readability and vocabulary scores for each document follows the workflow depicted in Figure 1.

Figure 1. Workflow of the processing steps and involved software components: (1) text content extraction; (2) a collection of data preparation and cleaning tasks; and (3) computation of the readability and vocabulary metrics. The analysis framework processes PDF, DOC, DOCX as input format and outputs a summary Excel spreadsheet for each document processed. SVM: support vector machine; FRE: Flesch Reading Ease; WSTF: Fourth Vienna Formula (German: Wiener SachTextFormel).

First, document parsers from the Apache Tika framework [65] were applied to extract the actual text content. As a second step, the extracted text was cleaned of disturbance artifacts (eg, different hyphen encoding schemes). Finally, the aforementioned readability and vocabulary metrics were computed for every brochure by a self-implemented analysis framework written in Java, which was previously tested against reference material. For sentence detection, the analysis framework relies on the Apache OpenNLP library [66] and its broadly accepted sentence model for the German language [67]. Liang’s hyphenation algorithm [68] was used to estimate syllable counts. For stem form reduction, the Snowball stemmer, according to Porter, was applied [69]. The analysis was conducted on a Mac OS 10.14.6 64-bit computer running Java 11.0.4 (Oracle Corporation, Redwood Shores, California, United States) on August 21, 2019.
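Liang's algorithm requires language-specific pattern tables; as a hedged, self-contained stand-in, a vowel-cluster heuristic gives a rough German syllable estimate that is sufficient to illustrate how complex words (three or more syllables) are identified:

```python
import re

def estimate_syllables(word):
    """Rough German syllable estimate: count vowel groups.

    This is a simplification of the Liang hyphenation approach used in
    the study; diphthongs count as one group, hiatus is approximated.
    """
    groups = re.findall(r"[aeiouäöüy]+", word.lower())
    return max(1, len(groups))

def is_complex(word):
    """A word counts as complex when it has three or more syllables."""
    return estimate_syllables(word) >= 3

# "Haut" -> 1 syllable; "Schuppenflechte" -> 4 syllables (complex)
```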

Statistical Analysis

A two-sided, two-sample Wilcoxon test [70], also known as the Mann-Whitney U test, was applied to test whether the difficulty of brochures differs between two topic groups (H0: μ1=μ2; H1: μ1≠μ2; α=.05). If P<.05, H1 is accepted; that is, there is a significant difference in readability between the two groups. The nonparametric U test was chosen because the number of brochures in several topic groups was rather small (n<10), and no normal distribution could be assumed. Data were analyzed with the statistics software R (The R Foundation, Vienna, Austria), version 3.6.1, on a Linux Ubuntu 18.04 LTS/64-bit computer.
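The U statistic underlying this test can be sketched in pure Python (average ranks for ties; the smaller of the two U values is reported). The actual analysis, including the two-sided P value, was done with R's wilcox.test:

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller Mann-Whitney U statistic for two samples."""
    pooled = sorted(sample_a + sample_b)

    def avg_rank(value):
        # 1-based average rank of `value` in the pooled, sorted data
        first = pooled.index(value) + 1
        last = first + pooled.count(value) - 1
        return (first + last) / 2.0

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(avg_rank(v) for v in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Completely separated samples yield U = 0; interleaved samples
# yield values closer to n_a * n_b / 2.
u = mann_whitney_u([1, 2, 3], [4, 5, 6])  # 0.0
```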

Principal Findings

The acquisition of Psoriasis/Psoriatic Arthritis brochures was carried out on August 19 and 20, 2019, by two of the authors. Given the search terms and the inclusion criteria, 73 brochures were eligible for inclusion, of which five were identified as either duplicate content or as being too general (ie, they were unspecific or covered other dermatology topics). The flowchart in Figure 2 depicts the data acquisition process with all details.

Figure 2. Data acquisition process with search terms A-D, as defined in the section "Study Setting".

In total, 68 brochures were included for further readability and vocabulary assessment. While assessing the brochures for eligibility, four topic categories emerged from the search engine’s retrieval results: basic information on the disease (Psoriasis/Psoriatic Arthritis, labeled G1/G2); general advice on coping with Psoriasis/Psoriatic Arthritis in daily life situations, including topics such as stress, diet, smoking, work life, and traveling (labeled G3/G4); medication and therapy guidance (G5); and other topics (G6).

Sample Characteristics

During the collection, several types of publishers emerged: pharmaceutical company or association, nonprofit organization, and public institution. Of the 68 brochures, 71% (48/68) were published by pharmaceutical companies or associations, 22% (15/68) by nonprofit organizations, and 7% (5/68) by public institutions. A detailed listing, given in Multimedia Appendix 1, includes the original German document title, publisher and type, and publishing year separated into G1-G6.

The included brochures were analyzed in terms of their linguistic characteristics. The number of sentences per brochure ranged from 45-619 (mean 235; SD 147.40) and the number of words from 579-11,430 (mean 3852; SD 2542.58). On average, 16.4 words were used by brochure authors to form a sentence (SD 3.03; minimum=11.5; maximum=27.7). Complex words (ie, words with ≥3 syllables) ranged from 253-4424 (mean 1284; SD 914.88). The minimal proportion of complex words was 22.85% (995/4354) and the maximum was 46.9% (441/940), with a mean of 33.57% (1284/3852). A complete listing with data on the number of sentences, words, complex words, and syllables is given in Multimedia Appendix 2 per brochure and group (G1-G6).

Readability Analysis

All brochure groups (G1-G6) were analyzed according to the readability metrics FRE and WSTF, as outlined in the Methods section. The results are presented in Table 2. The majority of the booklets are difficult (D) (FRE: 66%, 45/68; WSTF: 74%, 50/68), or very difficult (FRE: 34%, 23/68; WSTF: 13%, 9/68), to read.

In G1, the brochure with the lowest readability was PSO_110, with an FRE value of 19.26, corresponding to the second highest WSTF value of 14.11 (VD). The corresponding Psoriatic Arthritis group G2 showed the lowest FRE value for PSO_210, with FRE=2.71 and WSTF=15 (VD). The third document set (G3) scored higher FRE values, thus signaling higher readability, which is supported by the lower WSTF scores in this group. The corresponding Psoriatic Arthritis group (G4) produced results similar to G1. On average, documents about Psoriasis/Psoriatic Arthritis medication or therapy advice (G5) scored lowest, with PSO_502 being the most difficult in this group (FRE=8.36; WSTF=15; VD). The lowest mean readability levels were FREG5=23.50 and WSTFG5=12.95. The highest readability was achieved for G3, with an FRE of 41.39 and a WSTF of 10.27. For G6, no mean was calculated as the sample size was too small. Several selected text fragments with low or high readability levels can be found in Multimedia Appendix 3.

The distributions for both readability metrics, FRE and the Vienna formula (WSTF), are depicted in Figures 3 and 4.

Figure 3. Distribution of achieved readability values on the Flesch Reading Ease scale. Difficulty is indicated by color, with dark green as the highest readability (90-100) and dark red as the lowest readability (0-10).
Figure 4. Distribution of achieved readability values on the Vienna formula scale. Difficulty is indicated by color, with dark green as the highest readability (4-5) and dark red as the lowest readability (14-15).
Table 2. Listing of readability and vocabulary scores, and associated class labels.
Group and identifier | FREa | WSTFb | Lc | Cd (FREC, WSTFC, SVMe)
G1, Psoriasis, Basic Information (n=20)





















G2, Psoriatic Arthritis, Basic Information (n=15)
















G3, Psoriasis, Stress, Diet, Travelling, Smoking (n=12)













G4, Psoriatic Arthritis, Stress, Diet, Travelling, Smoking (n=8)









G5, Psoriasis and Psoriatic Arthritis, Medication, Therapy (n=11)












G6, Psoriasis and Psoriatic Arthritis, Other Topics (n=2)


Total mean: FRE=31.58; WSTF=11.84; L=3.66

aFRE: Flesch Reading Ease.

bWSTF: Fourth Vienna Formula (German: Wiener SachTextFormel).

cL: vocabulary measure.

dC: class label.

eSVM: support vector machine.

fD: difficult.

gVE: very easy.

hVD: very difficult.

iM: moderate.

jE: easy.

kNot applicable.

Vocabulary Classification

Overall, the brochures had a mean vocabulary measure (L) of L=3.66. As listed in Table 2, two-thirds of the educational materials (69%; 47/68) achieved a score ≤4 (VE+E) and were therefore suitable for a lay audience. A total of 11/68 booklets (16%) had a score ≥9 and are thus only suitable for an academic readership. The remaining 10 booklets (15%; 10/68) scored greater than 4 but less than 9, corresponding to a level suitable for persons with medical knowledge or a strong medical background. The groups G3 and G4 scored the lowest vocabulary measure, with L=1.75 for each. The highest vocabulary measure was found for the booklet group on medication and therapy topics (G5), with L=6.64. The distribution of the classification results over all the brochure groups is depicted in Figure 5.
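The vocabulary measure L itself stems from a pretrained SVM classifier (see the Methods section). As a loose, hypothetical illustration of the underlying idea, scoring text by its share of expert terms, consider this toy lexicon lookup; the term set is invented, and the real classifier operates on a large expert-annotated vocabulary rather than a hand-picked list:

```python
# Illustrative only: the study used a pretrained SVM over a large
# expert-annotated vocabulary, not this invented three-term lexicon.
EXPERT_TERMS = {"plaque", "erythem", "psoriasisarthritis"}

def expert_term_share(words):
    # Fraction of tokens found in the expert lexicon. The real pipeline
    # stems tokens before lookup; here we only lowercase them.
    hits = sum(1 for w in words if w.lower() in EXPERT_TERMS)
    return hits / max(1, len(words))
```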

Figure 5. Distribution of achieved vocabulary values on the SVM classification scale. Difficulty is indicated by color, with dark green as the most layperson-friendly (1) and dark red as the highest expert level required (10). SVM: support vector machine.

A comparison of the topic groups was conducted for the pairs G1/G3 and G2/G4. The results of the corresponding Wilcoxon test for two independent samples are presented in Table 3. Negative values originate from the definition of the FRE metric; that is, lower numbers correspond to a higher difficulty. In addition, due to a high number of ties (in the ranks) for the vocabulary metric (L), an exact computation of CI and P was not possible. Instead, a normal approximation was used by the statistics software R.
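The normal approximation mentioned above can be sketched as follows. This is a simplified, hypothetical reimplementation of the two-sample rank-sum test with tie-corrected variance (without the continuity correction that R's wilcox.test applies by default), not the code used in the study:

```python
import math

def rank_sum_test(x, y):
    # Two-sample Wilcoxon rank-sum test, normal approximation with
    # tie correction (a sketch of the fallback R uses when ties
    # prevent an exact computation).
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranked, tie_sum, i = [], 0.0, 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        t = j - i                      # size of the tie group
        tie_sum += t**3 - t
        ranked.extend((avg_rank, combined[k][1]) for k in range(i, j))
        i = j
    w = sum(r for r, g in ranked if g == 0)  # rank sum of sample x
    n = n1 + n2
    mu = n1 * (n + 1) / 2.0
    sigma2 = n1 * n2 / 12.0 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (w - mu) / math.sqrt(sigma2)
    # Two-sided P value via the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```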

Table 3. Comparison of different brochure groups for difficulty.
Comparison and metric | Difference of means | 95% CI | P value
G1 versus G3

FREa | –8.745 | –14.830 to –2.516 | .001


G2 versus G4

FRE | –7.985 | –14.513 to –1.256 | .03


L | 1.317 | –0.00004 to 2.0 | .03

aFRE: Flesch Reading Ease.

bWSTF: Fourth Vienna Formula (German: Wiener SachTextFormel).

cL: vocabulary measure.

The observed differences between the brochure groups G1 and G3 (Psoriasis) for FRE (P=.001), WSTF (P=.003), and L (P=.01) were statistically significant, as were the FRE (P=.03), WSTF (P=.03), and L (P=.03) of G2 and G4 (Psoriatic Arthritis).

Principal Results

High-quality health information must not only include the best available external evidence, it must also be readable and reflect patients’ preferences [71]. In order to comply with these requirements, the application of easy language is essential [42-50,52-55,57,72].

The readability findings show that the majority of the collected material is difficult or very difficult (D+VD) to read, as shown by the WSTF (87%; 59/68). The outcome is even more pronounced when the German adaptation of the FRE scale is applied (100%; 68/68) (Table 2). Thus, educational materials on Psoriasis/Psoriatic Arthritis are not suitable for their intended group of readers. This corresponds to the results of other authors, who also reported that such resources demand high reading levels [73-77].

The vocabulary is also of great relevance for comprehensibility and might be even more decisive than the sentence structure [78]. The findings of the vocabulary analysis revealed that two-thirds (69%; 47/68) of the educational materials were well suited for laypeople. This suggests that relatively few medical expert terms were used during text production, or that expert terminology was actively avoided. With the difficulty assessment of 68 Psoriasis/Psoriatic Arthritis brochures, we demonstrated that a pretrained SVM can analyze text material for its vocabulary. The study findings therefore contribute the first dedicated vocabulary analysis related to the use of expert medical terms in patient educational material for Psoriasis and Psoriatic Arthritis.


Several limitations apply to the study setting. First, a public search engine was utilized to build the data collection used in this study. The internal mechanisms a search engine uses to compute and retrieve information from its index are not fully transparent; for this reason, some potentially relevant documents might have been missed by our data collection process. The retrieval was also limited to PDF documents. The study design included this file type because the corresponding documents are easily accessible in electronic format (machine-readable), can also be distributed in printed form, are in general highly structured and proofread by the publishing institutions, and represent a robust, well-known data format for providing information on (chronic) diseases and related treatment options via the internet.

Second, for this study, we analyzed 68 brochures on Psoriasis/Psoriatic Arthritis published by different types of organizations (see Multimedia Appendix 1). Depending on the motivation of an organization, there might be different aims in terms of content, words used, and selected topics. This might have affected our results, as scientific organizations might have used more complex sentence structures to explain Psoriasis/Psoriatic Arthritis concepts, while pharmaceutical companies might tend towards easier vocabulary and sentence structure.

Next, in the preprocessing phase, the included PDF brochures were automatically converted to documents in DOCX format. Nevertheless, disturbance artifacts, that is, different kinds of hyphens or misencoded characters originating from different encoding schemes, may still have been present in the extracted raw text material.
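A hypothetical cleanup pass for artifacts of this kind might look as follows; the patterns are illustrative and not part of the study's pipeline:

```python
import re

# Map common dash/hyphen variants to a plain hyphen-minus and drop
# soft hyphens; purely illustrative, not the study's conversion step.
_TRANS = str.maketrans({
    "\u2013": "-",   # en dash -> hyphen-minus
    "\u2014": "-",   # em dash -> hyphen-minus
    "\u2010": "-",   # unicode hyphen -> hyphen-minus
    "\u00ad": "",    # drop soft hyphens
})

def clean_extracted_text(text):
    text = text.translate(_TRANS)
    # Re-join words hyphenated across line breaks ("Schuppen-\nflechte").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse whitespace runs left over from multi-column layouts.
    return re.sub(r"[ \t]+", " ", text)
```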

The adapted FRE metric and Vienna formula are mainly computed on the basis of mean sentence length, the mean number of syllables per word, and language-specific weighting factors. However, detecting syllables is not a trivial task for the German language and does not work reliably in some rare circumstances [79]. For this reason, the computed FRE or WSTF scores can be influenced by the aforementioned inaccuracies. In this context, it should be stressed that this affects all natural language processing analysis tools for German text material.

Furthermore, solely computing the readability of educational materials disregards the individual knowledge and motivation of readers [35]. Aspects related to illustration and design were not included in the analysis of this study. Consequently, the suitability of health information cannot be judged exclusively on the basis of its readability or the vocabulary used [35,80]. In this context, the studies by Taylor-Clarke et al and Tuot et al [45,81], among others, have applied methods that go beyond measures of word and sentence lengths, such as the Suitability Assessment of Materials (SAM) instrument, which reflects other aspects of a brochure's appearance that influence the understandability of (health) information and text comprehension.

However, besides requiring manual effort, judging quality criteria with this instrument is a highly subjective task. Moreover, a sufficient number of judges is required to ensure an objective assessment of visual and aesthetic aspects of brochure design, a requirement not met by every study in this field. Even more importantly, interjudge reliability must be considered, evaluated, and reported properly. Modern approaches use crowd-sourcing techniques, with which a large number of judges and related assessments can be obtained more easily [82].

Comparison With Prior Work

Previous studies investigated the readability of health education materials on Psoriasis/Psoriatic Arthritis written in the English language [15,57]. In both analyses, the outcome was that the materials failed to “meet the desired 6th grade level” [57]. Although no accepted recommendation exists for German health education material, our findings confirm the low readability of Psoriasis/Psoriatic Arthritis brochures for patients. In contrast to the studies by Feldman et al and Smith, this study contributes the first vocabulary-related assessments of materials originating from the dermatology domain. We found that the vocabulary used in Psoriasis/Psoriatic Arthritis brochures is adequate for laypeople, that is, patients and family members who have no professional background in the health sector. A secondary study outcome gives a broad picture of the published materials in German-speaking countries, listed by publisher and year in Multimedia Appendix 1.

In a previous study [61], Keinki et al analyzed information booklets for German cancer patients. In this particular domain, the authors found a mean vocabulary score of L=5.09, signaling a higher difficulty for laypeople than in this study (L=3.66); that is, Psoriasis/Psoriatic Arthritis brochures make use of less complex medical terminology. This difference might be explained by the fact that Psoriasis/Psoriatic Arthritis brochures are mainly (71%; 48/68) produced and published by pharmaceutical companies or related associations. In contrast, cancer booklets follow a stricter evidence-based text production process in Germany [83]; that is, patient guidelines and brochures on cancer topics are written or reviewed by medical professionals.

Future Directions

This study analyzed static PDF document content for Psoriasis/Psoriatic Arthritis patients. In future work, the authors intend to extend their analyses to other types of online resources. This includes the content of trustworthy health information websites in German or articles in Wikipedia. Given such an analysis, a comparison to the work of Thomas et al [52] would be possible in terms of FRE and grade levels, as the authors reported even lower readability than in this study.


For 68 German Psoriasis and Psoriatic Arthritis brochures freely available on the internet, the study findings reveal low readability (Figures 3 and 4); publishing organizations and authors should therefore reevaluate existing brochures and reduce sentence complexity. The vocabulary used, however, suits the target audience (Figure 5).

Methods from the field of machine learning can support authors of Psoriasis/Psoriatic Arthritis brochures, as they complement existing readability assessment methodology. For this reason, the assessment of written patient information should preferably be analyzed in terms of sentence structure and vocabulary, such as via the SVM-based classifier used for this study. The authors recommend the use of both sentence dimension and vocabulary dimension as supportive measures to ensure and provide understandable health education materials, independent of the medical domain.


The authors thank Psoriasis Netz for hosting a well-maintained, curated collection of Psoriasis/Psoriatic Arthritis brochures on their website.

Conflicts of Interest

None declared.

Multimedia Appendix 1

List of extracted Pso / PsA brochures. Publisher types: PC: Pharmaceutical Company or Association; NPO: Non-Profit Organization; PI: Public Institution.

DOCX File , 34 KB

Multimedia Appendix 2

Linguistic characteristics of analyzed Pso / PsA brochures. Se: Sentences, W: Words, CW: Complex Words, W / Se: Words per Sentence, CW / W: relative share of Complex Words (in per cent), Sy: Syllables, Ch: Characters.

DOCX File , 38 KB

Multimedia Appendix 3

Selection of Pso / PsA text fragments (DE/EN).

DOCX File , 18 KB

  1. Griffiths CE, Barker JN. Pathogenesis and clinical features of psoriasis. Lancet 2007 Jul 21;370(9583):263-271. [CrossRef] [Medline]
  2. Gladman DD, Antoni C, Mease P, Clegg DO, Nash P. Psoriatic arthritis: epidemiology, clinical features, course, and outcome. Ann Rheum Dis 2005 Mar;64 Suppl 2:ii14-ii17 [FREE Full text] [CrossRef] [Medline]
  3. Krueger G, Koo J, Lebwohl M, Menter A, Stern RS, Rolstad T. The impact of psoriasis on quality of life: results of a 1998 National Psoriasis Foundation patient-membership survey. Arch Dermatol 2001 Mar;137(3):280-284. [Medline]
  4. Choi J, Koo JY. Quality of life issues in psoriasis. Journal of the American Academy of Dermatology 2003 Aug;49(2):57-61. [CrossRef]
  5. Mease PJ, Menter MA. Quality-of-life issues in psoriasis and psoriatic arthritis: outcome measures and therapies from a dermatological perspective. J Am Acad Dermatol 2006 Apr;54(4):685-704. [CrossRef] [Medline]
  6. Merola JF, Shrom D, Eaton J, Dworkin C, Krebsbach C, Shah-Manek B, et al. Patient Perspective on the Burden of Skin and Joint Symptoms of Psoriatic Arthritis: Results of a Multi-National Patient Survey. Rheumatol Ther 2019 Mar;6(1):33-45 [FREE Full text] [CrossRef] [Medline]
  7. Rapp SR, Feldman SR, Exum M, Fleischer AB, Reboussin DM. Psoriasis causes as much disability as other major medical diseases. Journal of the American Academy of Dermatology 1999 Sep;41(3):401-407. [CrossRef]
  8. Naldi L, Chatenoud L, Linder D, Belloni Fortina A, Peserico A, Virgili AR, et al. Cigarette Smoking, Body Mass Index, and Stressful Life Events as Risk Factors for Psoriasis: Results from an Italian Case–Control Study. Journal of Investigative Dermatology 2005 Jul;125(1):61-67. [CrossRef]
  9. Richards HL, Fortune DG, O'Sullivan TM, Main CJ, Griffiths CE. Patients with psoriasis and their compliance with medication. J Am Acad Dermatol 1999 Oct;41(4):581-583. [Medline]
  10. Zaghloul SS, Goodfield MJD. Objective assessment of compliance with psoriasis treatment. Arch Dermatol 2004 Apr;140(4):408-414. [CrossRef] [Medline]
  11. Fortune DG, Richards HL, Griffiths CE. Psychologic factors in psoriasis: consequences, mechanisms, and interventions. Dermatol Clin 2005 Oct;23(4):681-694. [CrossRef] [Medline]
  12. Seng TK, Nee TS. Group therapy: a useful and supportive treatment for psoriasis patients. Int J Dermatol 1997 Feb;36(2):110-112. [CrossRef] [Medline]
  13. Idriss SZ, Kvedar JC, Watson AJ. The role of online support communities: benefits of expanded social networks to patients with psoriasis. Arch Dermatol 2009 Jan 01;145(1):46-51. [CrossRef] [Medline]
  14. Renzi C, Di PC, Tabolli S. Participation, satisfaction and knowledge level of patients with cutaneous psoriasis or psoriatic arthritis. Clin Exp Dermatol 2011 Dec;36(8):885-888. [CrossRef] [Medline]
  15. Feldman SR, Vanarthos J, Fleischer AB. The readability of patient education materials designed for patients with psoriasis. Journal of the American Academy of Dermatology 1994 Feb;30(2):284-286. [CrossRef]
  16. Doak LG, Doak CC. Patient comprehension profiles: recent findings and strategies. Patient Counselling and Health Education 1980 Jul;2(3):101-106. [CrossRef]
  17. Kirsch IS, Jungeblut A, Jenkins L, Kolstad A. National Center for Education Statistics. Washington, DC: U.S. Department of Education; 2002. Adult Literacy in America: A First Look at the Findings of the National Adult Literacy Survey   URL: [accessed 2020-01-16]
  18. Davis TC, Mayeaux EJ, Fredrickson D, Bocchini JA, Jackson RH, Murphy PW. Reading ability of parents compared with reading level of pediatric patient education materials. Pediatrics 1994 Mar;93(3):460-468. [Medline]
  19. Centers for Disease Control and Prevention. Atlanta, Georgia, United States: U.S. Department of Health and Human Services; 2009 Apr. Simply Put: A guide for creating easy-to-understand materials   URL: [accessed 2020-01-16]
  20. The Nation's Report Card. Washington, DC: U.S. Department of Education; 2012. NAEP - 2012 Long-term Trend: Summary of Major Findings   URL: [accessed 2020-01-16]
  21. U.S. National Library of Medicine. How to Write Easy-to-Read Health Materials. 2017.   URL: [accessed 2020-01-16] [WebCite Cache]
  22. Ownby RL. Influence of vocabulary and sentence complexity and passive voice on the readability of consumer-oriented mental health information on the Internet. AMIA Annu Symp Proc 2005:585-589 [FREE Full text] [Medline]
  23. Bourhis RY, Roth S, MacQueen G. Communication in the hospital setting: a survey of medical and everyday language use amongst patients, nurses and doctors. Soc Sci Med 1989;28(4):339-346. [CrossRef] [Medline]
  24. Hume MA, Kennedy B, Asbury AJ. Patient knowledge of anaesthesia and peri-operative care. Anaesthesia 1994 Aug;49(8):715-718 [FREE Full text] [CrossRef] [Medline]
  25. Chapple A, Campion P, May C. Clinical terminology: anxiety and confusion amongst families undergoing genetic counseling. Patient Educ Couns 1997;32(1-2):81-91. [CrossRef] [Medline]
  26. Koch-Weser S, Dejong W, Rudd RE. Medical word use in clinical encounters. Health Expect 2009 Dec;12(4):371-382 [FREE Full text] [CrossRef] [Medline]
  27. Wittenberg-Lyles E, Goldsmith J, Oliver DP, Demiris G, Kruse RL, Van Stee S. Using medical words with family caregivers. J Palliat Med 2013 Sep;16(9):1135-1139 [FREE Full text] [CrossRef] [Medline]
  28. Wittenberg E, Goldsmith J, Ferrell B, Platt CS. Enhancing Communication Related to Symptom Management Through Plain Language. J Pain Symptom Manage 2015 Nov;50(5):707-711. [CrossRef] [Medline]
  29. Flesch R. A new readability yardstick. J Appl Psychol 1948 Jun;32(3):221-233. [CrossRef] [Medline]
  30. Gunning R. The Technique of Clear Writing. New York City, New York, United States: McGraw Hill; Jun 01, 1968.
  31. Mc Laughlin GH. SMOG Grading-a New Readability Formula. J Read 1969;12(8):639-646.
  32. Coleman M, Liau TL. A computer readability formula designed for machine scoring. Journal of Applied Psychology 1975;60(2):283-284. [CrossRef] [Medline]
  33. Fry E. Fry's Readability Graph: Clarifications, Validity, and Extension to Level 17. J Read 1977;21(3):242-252.
  34. Amstad T. Wie verständlich sind unsere Zeitungen? [How readable are our newspapers?]. Zurich, Switzerland: University of Zurich; 1978.
  35. Bamberger R, Vanecek E. Lesen - Verstehen - Lernen - Schreiben. Die Schwierigkeitsstufen von Texten in deutscher Sprache [Reading - Understanding - Learning - Writing. The difficulty levels of German texts]. Vienna, Austria: Jugend u. Volk Sauerlaender; 1984.
  36. Zowalla R, Wiesner M, Pfeifer D. Automatically Assessing the Expert Degree of Online Health Content using SVMs. Stud Health Technol Inform 2014;202:48-51. [Medline]
  37. Ratzan S, Parker R. Introduction. In: Seldon C, Zorn M, Ratzan S, Parker R. editors. Natl Libr Med Curr Bibliogr Med Health Lit 1st edition. Washington, DC: National Institutes of Health, US Department of Health and Human Services; 2000.
  38. Berkman ND, Sheridan SL, Donahue KE, Halpern DJ, Crotty K. Low health literacy and health outcomes: an updated systematic review. Ann Intern Med 2011 Jul 19;155(2):97-107. [CrossRef] [Medline]
  39. Zok K. Unterschiede bei der Gesundheitskompetenz - Ergebnisse einer bundesweiten Repräsentativ-Umfrage unter gesetzlich Versicherten [Differences of Health Literacy - Results of a nation-wide Representative Survey among Statutory Health Insurees]. WIdO-monitor 2014;11(2):1-12 ISSN: 1614-8444 [FREE Full text]
  40. Schaeffer D, Berens E, Vogt D. Health Literacy in the German Population: Results of a Representative Survey. Dtsch Arztebl Int 2017 Jan 27;114(4):53-60 [FREE Full text] [CrossRef] [Medline]
  41. Schaeffer D, Vogt D, Berens EM, Hurrelmann K. Gesundheitskompetenz der Bevölkerung in Deutschland: Ergebnisbericht [Health Literacy of the Population in Germany: Final Report]. Bielefeld, Germany: University of Bielefeld, Faculty of Health Sciences; 2017.
  42. Friedman DB, Hoffman-Goetz L, Arocha JF. Readability of cancer information on the internet. J Cancer Educ 2004;19(2):117-122. [CrossRef] [Medline]
  43. Basch CH, Ethan D, MacLean SA, Fera J, Garcia P, Basch CE. Readability of Prostate Cancer Information Online: A Cross-Sectional Study. Am J Mens Health 2018 Sep 09;12(5):1665-1669 [FREE Full text] [CrossRef] [Medline]
  44. Brütting J, Reinhardt L, Bergmann M, Schadendorf D, Weber C, Tilgen W, NVKH. Quality, Readability, and Understandability of German Booklets Addressing Melanoma Patients. J Cancer Educ 2019 Aug 7;34(4):760-767. [CrossRef] [Medline]
  45. Taylor-Clarke K, Henry-Okafor Q, Murphy C, Keyes M, Rothman R, Churchwell A, et al. Assessment of commonly available education materials in heart failure clinics. J Cardiovasc Nurs 2012;27(6):485-494 [FREE Full text] [CrossRef] [Medline]
  46. Kher A, Johnson S, Griffith R. Readability Assessment of Online Patient Education Material on Congestive Heart Failure. Adv Prev Med 2017;2017:9780317. [CrossRef] [Medline]
  47. Athilingam P, Jenkins B, Redding BA. Reading Level and Suitability of Congestive Heart Failure (CHF) Education in a Mobile App (CHF Info App): Descriptive Design Study. JMIR Aging 2019 Apr 25;2(1):e12134 [FREE Full text] [CrossRef] [Medline]
  48. Weiss KD, Vargas CR, Ho OA, Chuang DJ, Weiss J, Lee BT. Readability analysis of online resources related to lung cancer. J Surg Res 2016 Nov;206(1):90-97. [CrossRef] [Medline]
  49. Hansberry DR, White MD, D'Angelo M, Prabhu AV, Kamel S, Lakhani P, et al. Lung Cancer Screening Guidelines: How Readable Are Internet-Based Patient Education Resources? AJR Am J Roentgenol 2018 Jul;211(1):W42-W46. [CrossRef] [Medline]
  50. Haas K, Brillante C, Sharp L, Elzokaky AK, Pasquinelli M, Feldman L, et al. Lung cancer screening: assessment of health literacy and readability of online educational resources. BMC Public Health 2018 Dec 07;18(1):1356 [FREE Full text] [CrossRef] [Medline]
  51. Fullmann K, Blackburn DF, Fenton ME, Mansell H. Readability and Suitability of COPD Consumer Information. Can Respir J 2017;2017:2945282-2945288 [FREE Full text] [CrossRef] [Medline]
  52. Thomas GR, Eng L, de Wolff JF, Grover SC. An evaluation of Wikipedia as a resource for patient education in nephrology. Semin Dial 2013;26(2):159-163. [CrossRef] [Medline]
  53. Morony S, Flynn M, McCaffery KJ, Jansen J, Webster AC. Readability of Written Materials for CKD Patients: A Systematic Review. Am J Kidney Dis 2015 Jun;65(6):842-850. [CrossRef] [Medline]
  54. Edmunds MR, Barry RJ, Denniston AK. Readability assessment of online ophthalmic patient information. JAMA Ophthalmol 2013 Dec;131(12):1610-1616. [CrossRef] [Medline]
  55. Williams AM, Muir KW, Rosdahl JA. Readability of patient education materials in ophthalmology: a single-institution study and systematic review. BMC Ophthalmol 2016 Aug 03;16:133 [FREE Full text] [CrossRef] [Medline]
  56. Tulbert BH, Snyder CW, Brodell RT. Readability of Patient-oriented Online Dermatology Resources. J Clin Aesthet Dermatol 2011 Mar;4(3):27-33 [FREE Full text] [Medline]
  57. Smith GP. The readability of patient education materials designed for patients with psoriasis: what have we learned in 20 years? J Am Acad Dermatol 2015 Apr;72(4):737-738. [CrossRef] [Medline]
  58. Klare GR. Assessing Readability. Reading Research Quarterly 1974;10(1):62. [CrossRef]
  59. Klare G. The formative years. In: Zakaluk BL, Samuels SJ, editors. Readability: Its past, present, and future. Newark, Delaware, United States: International Reading Association; 1988:14-34.
  60. Leroy G, Miller T, Rosemblat G, Browne A. A balanced approach to health information evaluation: A vocabulary-based naïve Bayes classifier and readability formulas. J. Am. Soc. Inf. Sci 2008 Jul;59(9):1409-1419. [CrossRef]
  61. Keinki C, Zowalla R, Pobiruchin M, Huebner J, Wiesner M. Computer-Based Readability Testing of Information Booklets for German Cancer Patients. J Cancer Educ 2019 Aug 12;34(4):696-704. [CrossRef] [Medline]
  62. Joachims T. Text categorization with Support Vector Machines: Learning with many relevant features. In: Nédellec C, Rouveirol C, editors. Machine Learning: ECML-98. Berlin, Heidelberg, Germany: Springer; 1998:137-142.
  63. Keinki C, Zowalla R, Wiesner M, Koester MJ, Huebner J. Understandability of Patient Information Booklets for Patients with Cancer. J Cancer Educ 2018 Jun 10;33(3):517-527. [CrossRef] [Medline]
  64. Mattmann CA, Zitting JL. Tika In Action. Shelter Island, New York, United States: Manning Publications; 2020.
  65. Apache Software Foundation. Apache Tika. 2019.   URL: [accessed 2020-01-16]
  66. Apache Software Foundation. Apache OpenNLP. 2019.   URL: [accessed 2020-01-16]
  67. Apache Software Foundation. OpenNLP. 2019.   URL: [accessed 2020-01-16]
  68. Liang F. Word Hy-phen-a-tion by Com-put-er. Stanford, California, United States: Stanford University; 1983.
  69. Porter M. An algorithm for suffix stripping. Program 1980 Mar;14(3):130-137. [CrossRef]
  70. Hollander M, Wolfe D, Chicken E. Nonparametric statistical methods. Third edition. Hoboken, New Jersey, United States: John Wiley and Sons, Inc; 2014.
  71. Hoefert HW, Klotter C. In: Hoefert HW, Flick U, Härter M, editors. Wandel der Patientenrolle. Neue Interaktionsformen im Gesundheitswesen [Change of the Patient role. New Forms of Interaction in the Healthcare System]. Göttingen, Germany: Hogrefe Verlag; 2011.
  72. Fagerlin A, Zikmund-Fisher BJ, Ubel PA. Helping patients decide: ten steps to better risk communication. J Natl Cancer Inst 2011 Oct 05;103(19):1436-1443 [FREE Full text] [CrossRef] [Medline]
  73. Cooley ME, Moriarty H, Berger MS, Selm-Orr D, Coyle B, Short T. Patient literacy and the readability of written cancer educational materials. Oncol Nurs Forum 1995 Oct;22(9):1345-1351. [Medline]
  74. Garcia SF, Hahn EA, Jacobs EA. Addressing low literacy and health literacy in clinical oncology practice. J Support Oncol 2010;8(2):64-69 [FREE Full text] [Medline]
  75. Nicholls S, Hankins M, Hooley C, Smith H. A survey of the quality and accuracy of information leaflets about skin cancer and sun-protective behaviour available from UK general practices and community pharmacies. J Eur Acad Dermatol Venereol 2009 May;23(5):566-569. [CrossRef] [Medline]
  76. Singh J. Reading Grade Level and Readability of Printed Cancer Education Materials. Oncology Nursing Forum 2007 Feb 8;30(5):867-870. [CrossRef]
  77. Weintraub D, Maliski SL, Fink A, Choe S, Litwin MS. Suitability of prostate cancer education materials: applying a standardized assessment tool to currently available materials. Patient Educ Couns 2004 Nov;55(2):275-280. [CrossRef] [Medline]
  78. Hasan M, Kotov A, Carcone A, Dong M, Naar S, Hartlieb KB. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories. J Biomed Inform 2016 Aug;62:21-31 [FREE Full text] [CrossRef] [Medline]
  79. Müller K. Automatic Detection of Syllable Boundaries Combining the Advantages of Treebank and Bracketed Corpora Training. In: Proc 39th Annu Meet Assoc Comput Linguist. Stroudsburg, Pennsylvania: Association for Computational Linguistics; 2001 Presented at: Annual Meeting on Association for Computational Linguistics; July 6-11; Toulouse, France p. 410-417. [CrossRef]
  80. Friedman DB, Hoffman-Goetz L. A systematic review of readability and comprehension instruments used for print and web-based cancer information. Health Educ Behav 2006 Jun 30;33(3):352-373. [CrossRef] [Medline]
  81. Tuot DS, Davis E, Velasquez A, Banerjee T, Powe NR. Assessment of printed patient-educational materials for chronic kidney disease. Am J Nephrol 2013;38(3):184-194 [FREE Full text] [CrossRef] [Medline]
  82. Carvalho A, Dimitrov S, Larson K. How many crowdsourced workers should a requester hire? Ann Math Artif Intell 2016 Jan 6;78(1):45-72. [CrossRef]
  83. Schaefer C, Zowalla R, Wiesner M, Siegert S, Bothe L, Follmann M. Patientenleitlinien in der Onkologie: Zielsetzung, Vorgehen und erste Erfahrungen mit dem Format [Patient guidelines in oncology: objectives, procedures and first experiences with this format]. Z Evid Fortbild Qual Gesundhwes 2015 Jan;109(6):445-451. [CrossRef] [Medline]

AAD: American Academy of Dermatology
ASL: Average Sentence Length
ASW: Average Number of Syllables per Word
CHF: Congestive Heart Failure
FRE: Flesch Reading Ease
ICD-10: International Classification of Diseases, Tenth Revision
L: vocabulary measure
MS: Words with three or More Syllables
SAM: Suitability Assessment of Materials
SVM: support vector machine
WSTF: Fourth Vienna Formula (German: Wiener SachTextFormel)

Edited by G Eysenbach; submitted 02.09.19; peer-reviewed by J Papadakos, J Lander; comments to author 09.11.19; revised version received 17.12.19; accepted 19.12.19; published 11.03.20


©Martin Wiesner, Richard Zowalla, Monika Pobiruchin. Originally published in JMIR Dermatology, 11.03.2020.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Dermatology, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.