JMIR Dermatology (JMIR Dermatol; JDERM), ISSN 2562-0959

JMIR Publications, Toronto, Canada

Article ID: v5i4e38783; PMID: 37632892; DOI: 10.2196/38783

Original Paper

Phenotype Algorithms to Identify Hidradenitis Suppurativa Using Real-World Data: Development and Validation Study

Dellavalle

Robert

Sivesind

Torunn

Gulliver

Susanne

Feldman

Steven

Steingrimsson

Steinn

Jill Hardin, MBA, MS, PhD (1); Gayle Murray, MSW (1); Joel Swerdel, MS, MPH, PhD (1,2)

1 Janssen Research and Development, Titusville, NJ, United States

2 Observational Health Data Sciences and Informatics, New York, NY, United States

ORCID: Jill Hardin, https://orcid.org/0000-0003-2682-2187; Gayle Murray, https://orcid.org/0000-0002-3685-269X; Joel Swerdel, https://orcid.org/0000-0001-9491-2737

Corresponding Author: Jill Hardin, MBA, MS, PhD, Janssen Research and Development, 1125 Trenton-Harbourton Road, Titusville, NJ, 08560, United States. Phone: 1 650 619 8599. Email: jhardi10@its.jnj.com

JMIR Dermatol 2022 (Oct-Dec);5(4):e38783. Published 30.11.2022.

15 4 2022 29 6 2022 9 11 2022 10 11 2022

©Jill Hardin, Gayle Murray, Joel Swerdel. Originally published in JMIR Dermatology (http://derma.jmir.org), 30.11.2022.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Dermatology, is properly cited. The complete bibliographic information, a link to the original publication on http://derma.jmir.org, as well as this copyright and license information must be included.

Background

Hidradenitis suppurativa (HS) is a potentially debilitating, chronic, recurring inflammatory disease. Observational databases provide opportunities to study the epidemiology of HS.

Objective

This study’s objective was to develop phenotype algorithms for HS suitable for epidemiological studies based on a network of observational databases.

Methods

A data-driven approach was used to develop 4 HS algorithms. A literature search identified prior HS algorithms. Standardized databases from the Observational Medical Outcomes Partnership (n=9) were used to develop 2 incident and 2 prevalent HS phenotype algorithms. Two open-source diagnostic tools, CohortDiagnostics and PheValuator, were used to evaluate and generate phenotype performance metric estimates, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value.

Results

We developed 2 prevalent and 2 incident HS algorithms. Validation showed that PPV estimates were highest (mean 86%) for the prevalent HS algorithm requiring at least two HS diagnosis codes. Sensitivity estimates were highest (mean 58%) for the prevalent HS algorithm requiring at least one HS code.

Conclusions

This study illustrates the evaluation process and provides performance metrics for 2 incident and 2 prevalent HS algorithms across 9 observational databases. The use of a rigorous data-driven approach applied to a large number of databases provides confidence that the HS algorithms can correctly identify HS subjects.

Keywords: dermatology; hidradenitis suppurativa; medical dermatology; observational data; phenotype; inflammation; skin disease; epidemiology; algorithm

Introduction

Hidradenitis suppurativa (HS) is a chronic, recurring inflammatory disease of the skin. Clinically, subjects have nodules, draining skin tunnels (ie, sinus tracts), abscesses, and bands of severe scar formation in the intertriginous skin areas, such as the axillary, groin, perianal, perineal, and inframammary regions [1]. Patients with HS suffer from metabolic, psychiatric, and autoimmune disorders [2].

The use of real-world evidence from observational data is valuable for studying the epidemiology, clinical manifestations, and real-world experience of patients with HS. A critical step in using observational data for the study of HS is the development of accurate phenotype algorithms (PAs). A PA is the translation of the case definition of a health condition or phenotype into an executable algorithm based on clinical data elements in a database [3]. Several studies have investigated HS using health care claims, electronic medical records, patient care, and hospitalization databases and have been conducted using data from the United States, Germany, Finland, Taiwan, Korea, England, Canada, and Denmark [2,4-32]. These studies have focused on a range of topics in patients with HS, including the incidence and prevalence of HS in different populations and the associations between HS and autoimmune disorders. Only 5 studies have provided phenotype validation metrics [9,10,16,29,30]; 2 used hospital data [16,29], 4 used a single phenotype requiring at least one code for HS from the International Classification of Diseases, Ninth Revision (ICD-9) [9,10,29,30], and 1 evaluated several phenotypes [16].

The objectives of this study were to develop HS PAs, evaluate their performance, and characterize the resultant HS phenotypes across a network of 9 US and non-US observational databases. This study used a data-driven framework and developed HS PAs for use in observational databases.

Methods

Overview

A literature search was conducted to identify studies that describe the codes and logic used to identify patients with HS in observational databases. This literature search identified 30 articles, which provided a set of diagnosis codes for the identification of HS across vocabularies, including the ICD-9, the International Classification of Diseases, Tenth Revision (ICD-10), and Read codes. Five of the 30 articles included validation metrics. Our study used the Systematized Nomenclature of Medicine (SNOMED) vocabulary to develop the codes. The vocabulary and diagnostic codes used in the published studies and the SNOMED terms are presented in Multimedia Appendix 1. The Observational Health Data Sciences and Informatics (OHDSI) open-source Atlas tool [33] was used to create the HS PAs.

The observational databases used in this study were not created specifically to study HS. The observational data were collected in electronic format during the delivery of health care or for administrative or billing purposes. A network of 9 observational databases was used to develop the PAs: 5 administrative claims databases (4 from the United States and 1 from Japan), 1 US electronic health record (EHR) database, and 3 general practitioner databases (1 each from France, Germany, and Australia). Descriptions and details of each database are shown in Table 1. The databases were transformed to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.3.1) [34] so the PAs could be consistently applied across databases.

Four HS PAs were developed and evaluated in subjects of all ages [35] (Figure 1). The PA “incident 1x” used the first diagnosis code for HS in a subject’s history and required 365 days of prior continuous enrollment (CE) time to qualify for entry into the HS cohort. The date a subject met both criteria was the subject’s index date. The PA “incident 2x” used the first diagnosis code for HS in a subject’s history and required both a second HS diagnosis code within 31 to 365 days and 365 days of prior CE time. The date a subject met all 3 criteria became the subject’s index date. The prevalent PAs (“prevalent 1x” and “prevalent 2x”) were identical to the corresponding incident versions, except that the first HS diagnosis code was not required to be the first time an HS code occurred in a subject’s history, nor was there a requirement for 365 days prior CE.
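The 4 PA definitions above can be sketched in a few lines of Python. This is only an illustration of the entry logic as described in the text, not the Atlas cohort definitions themselves. The `Subject` class, its field names, and the `index_date` function are hypothetical, and the sketch assumes that the index date is the date of the qualifying HS diagnosis code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Subject:
    # Hypothetical record layout, used only for this illustration.
    obs_start: date       # start of continuous enrollment (CE)
    hs_dates: List[date]  # dates of all HS diagnosis codes


def index_date(subject: Subject, incident: bool, require_second: bool) -> Optional[date]:
    """Return the index date under one of the 4 PAs, or None if the subject does not qualify.

    incident=True,  require_second=False -> "incident 1x"
    incident=True,  require_second=True  -> "incident 2x"
    incident=False, require_second=False -> "prevalent 1x"
    incident=False, require_second=True  -> "prevalent 2x"
    """
    dates = sorted(subject.hs_dates)
    # Incident PAs may only enter on the first HS code in the subject's history.
    candidates = dates[:1] if incident else dates
    for first in candidates:
        # Incident PAs require 365 days of prior continuous enrollment.
        if incident and (first - subject.obs_start).days < 365:
            continue
        # 2x PAs require a second HS code 31 to 365 days after the first.
        if require_second and not any(31 <= (later - first).days <= 365 for later in dates):
            continue
        return first
    return None


example = Subject(obs_start=date(2018, 1, 1), hs_dates=[date(2019, 3, 1), date(2019, 6, 1)])
print(index_date(example, incident=True, require_second=True))
```

Under these assumptions, any subject who qualifies for a 2x PA also qualifies for the corresponding 1x PA.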

The OHDSI CohortDiagnostics tool [36] allowed for evaluation and comparison of PAs at a cohort level, providing overall counts, incidence over time, the diagnosis code that allowed the subject into the cohort, cohort overlap, and temporal characterization.

Use of the PheValuator [37] method provided performance metrics, including the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) associated with each PA. PheValuator is a machine learning–based method of assessing PAs. It constructs a predictive model for the disease and calculates the predictive value of having the disease for each subject using the model. Using PheValuator, performance indices of an algorithm are calculated without reviewing medical charts. While algorithm validation results from chart review are considered the “gold standard,” we have compared the results from PheValuator with prior studies using chart review and found excellent agreement between the 2 methods [38]. Four additional PAs from Kim et al [16] were evaluated for comparison.
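To make the idea of calculating performance indices without chart review concrete, the following Python sketch shows one way model-predicted probabilities could stand in for a gold standard. It is an assumption-laden illustration, not the PheValuator implementation: the function name `probabilistic_metrics` is hypothetical, and the sketch assumes that each subject contributes their predicted probability of disease as a fractional "case" and the remainder as a fractional "noncase" to the confusion matrix.

```python
from typing import Dict, Sequence


def probabilistic_metrics(prob: Sequence[float], in_cohort: Sequence[bool]) -> Dict[str, float]:
    """Estimate PA performance from predicted disease probabilities.

    prob[i]      -- model-predicted probability that subject i has the disease
    in_cohort[i] -- True if the phenotype algorithm selected subject i
    """
    pairs = list(zip(prob, in_cohort))
    # Fractional confusion-matrix cells (an assumption of this sketch).
    tp = sum(p for p, selected in pairs if selected)
    fp = sum(1 - p for p, selected in pairs if selected)
    fn = sum(p for p, selected in pairs if not selected)
    tn = sum(1 - p for p, selected in pairs if not selected)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy input: 2 subjects selected by the PA, 2 not selected.
metrics = probabilistic_metrics([0.9, 0.6, 0.3, 0.0], [True, True, False, False])
print(metrics)
```

With the toy input, the 2 selected subjects contribute 1.5 expected true positives and 0.5 expected false positives, so the estimated PPV is 0.75.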

Computer code for PheValuator and CohortDiagnostics and the JSON files for the PAs are available on the authors’ website [39].

Table 1

Description of databases used in the study.

Name	Years	Country	Data type	Clinical visits included	Subjects, n (millions)	Age at first observation, average (years)	Female subjects, %	Length of follow-up, median (years)
IBM MarketScan Commercial Claims and Encounters	2000-2021	United States	Insurance claims	Inpatient/outpatient	157	31	51	1.56
IBM MarketScan Multi-State Medicaid	2006-2020	United States	Insurance claims	Inpatient/outpatient	31	23	56	1.52
IBM MarketScan Medicare Supplemental	2000-2021	United States	Insurance claims	Inpatient/outpatient	10	71	55	2.46
Optum’s de-identified Clinformatics Data Mart Database	2007-2021	United States	Insurance claims	Inpatient/outpatient	71	37	51	1.48
Optum Electronic Health Records	2007-2021	United States	Electronic health records	Inpatient/outpatient	99	37	53	2.63
Japan Medical Data Center	2000-2021	Japan	Insurance claims	Inpatient/outpatient	12	31	49	3.29
IQVIA Disease Analyzer–France	2016-2021	France	General practitioner data	Outpatient	4	37	52	0.9
IQVIA Disease Analyzer–Germany	2011-2021	Germany	General practitioner data with supplemental data from participating specialists	Outpatient	31	43	56	0.5
IQVIA Australian Longitudinal Patient Data	1996-2020	Australia	General practitioner data	Outpatient	5	37	22^a	0.5

^a59% of subjects did not have a designated sex in this study.

Figure 1

Schematics of phenotype algorithms for hidradenitis suppurativa (HS).

Ethics Approval

The use of the IBM and Clinformatics databases was reviewed by the New England Institutional Review Board and was determined to be exempt from broad approval, as this project did not involve human subject research. Patient consent for publication was not required. All patients in the databases were deidentified, and the identities of data contributors were removed.

Results

We examined cohort characteristics of the PAs. These characteristics may be viewed interactively online [40]. For the incident 1x cohort, the number of subjects ranged from 81 in the IQVIA Australian Longitudinal Patient Data (IALPD) database to 170,149 in the IBM MarketScan Commercial Claims and Encounters (CCAE) database. These numbers were as expected based on the relative sizes of the databases, indicating that all codes used were appropriate for each database. The counts were much higher in the US databases than in the non-US databases. The reduction in the number of subjects from the incident 1x PA to the incident 2x PA ranged from about 90% in the IALPD, IQVIA Disease Analyzer–France (IDAF), and IQVIA Disease Analyzer–Germany (IDAG) databases to about 73% in the IBM MarketScan Multi-State Medicaid (MDCD) database. The incident 1x PA identified a higher proportion of female subjects than male subjects, ranging from 51% in the Japan Medical Data Center (JMDC) database to 81% in the MDCD database; the incident 2x PA identified a lower proportion of female subjects in the JMDC database (46%) but a higher proportion in all other databases, ranging from 53% for the IDAG database to 82% for the MDCD database. The overlap in subjects between the incident PAs for each database is shown in Figure 2. By construction, the incident 2x cohort is a subset of the incident 1x cohort.

A comparison of standardized differences between the incident 1x and the incident 2x cohorts for 3 data sets across 5 different time frames is shown in Figure 3. A standardized difference of the mean greater than 0.1 is considered to indicate imbalance [41]. Points closer to the diagonal indicate similar proportions between cohorts; points farther from the diagonal indicate more disparate proportions. The plots compare the diagnosed conditions, prescribed drugs, laboratory measurements, and clinical procedures of the subjects in the incident 1x and incident 2x PA cohorts and illustrate the population differences. The CCAE database showed disparities between the 2 algorithms in the period 31 to 365 days after the index date. Some differences arose from higher proportions of diagnosis codes for HS (50% for incident 2x vs 11% for incident 1x, standardized mean difference [SMD] 0.66) and prescriptions for clindamycin (32% for incident 2x vs 14% for incident 1x, SMD 0.3). Differences were also seen in the MDCD database, which includes more subjects of a lower socioeconomic status: diagnosis codes for HS (70% for incident 2x vs 18% for incident 1x, SMD 0.86) and prescriptions for clindamycin (37% for incident 2x vs 13% for incident 1x, SMD 0.31). Optum’s de-identified Clinformatics Data Mart Database (Clinformatics DOD) showed differences in proportions between the 2 cohorts for diagnosis codes for HS (62% for incident 2x vs 14% for incident 1x, SMD 0.81) and prescriptions for clindamycin (29% for incident 2x vs 13% for incident 1x, SMD 0.29). For the majority of the characteristics in the CCAE, MDCD, and Clinformatics DOD databases, the proportions were similar between the 2 cohorts.
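For readers who want to reproduce this kind of comparison, the sketch below computes a standardized difference for a binary covariate from 2 proportions. The text does not state which formula the tooling used, so the function `smd_binary` and its `pooled` flag are hypothetical. With `pooled=True`, it uses the common convention of dividing by the square root of the average of the 2 variances; with `pooled=False`, it divides by the square root of their sum.

```python
import math


def smd_binary(p1: float, p2: float, pooled: bool = True) -> float:
    """Standardized difference between 2 proportions (binary covariate)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    # pooled=True: average the 2 variances; pooled=False: use their sum.
    denominator = math.sqrt(variance_sum / 2) if pooled else math.sqrt(variance_sum)
    return (p1 - p2) / denominator


# HS diagnosis codes in CCAE: 50% (incident 2x) vs 11% (incident 1x).
print(round(smd_binary(0.50, 0.11, pooled=True), 2))
print(round(smd_binary(0.50, 0.11, pooled=False), 2))
```

For the 50% vs 11% comparison, the unhalved denominator gives a value of about 0.66, which agrees with the SMD reported for the CCAE database. That agreement is an inference from the arithmetic rather than something the text states. Under either variant, the difference is far above the 0.1 threshold for imbalance.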

We examined the incident 2x algorithm for subject characteristics across the databases. We identified a higher proportion of female subjects with HS compared to male subjects. The largest disproportionality was in the MDCD database, in which 82% of the subjects were female. The JMDC database had the lowest disproportionality by sex, with 45% female subjects. An outpatient visit was the most common type of clinical visit for the first diagnosis of HS. Less than 5% of first diagnoses were made during an emergency room visit, with the exception of the MDCD database, for which the proportion was 10%. Examination of the index codes or diagnosis codes that allowed subjects into cohorts showed that the most prevalent code was the diagnosis code of “hidradenitis suppurativa” (SNOMED code 4241223; ICD-10 L73.2) in all databases except the CCAE database, in which the most prevalent code was a diagnosis code of “hidradenitis” (SNOMED code 434119; ICD-9 705.83).

Figure 2

Graphical depiction of the overlap in subjects between the 2 incidence cohorts and the 2 prevalence cohorts. CCAE: IBM MarketScan Commercial Claims and Encounters; Clinformatics DOD: Optum’s de-identified Clinformatics Data Mart Database; IALPD: IQVIA Australian Longitudinal Patient Data; IDAF: IQVIA Disease Analyzer–France; IDAG: IQVIA Disease Analyzer–Germany; JMDC: Japan Medical Data Center; MDCD: IBM MarketScan Multi-State Medicaid; MDCR: IBM MarketScan Medicare Supplemental; Optum EHR: Optum Electronic Health Records.

Figure 3

Comparison of the proportion of subjects in the incident 1x cohort and the incident 2x cohort for 3 selected data sets with different demographic characteristics. Points closer to the diagonal indicate similar proportions between the comparators; points farther from the diagonal indicate more disparate proportions. CCAE: IBM MarketScan Commercial Claims and Encounters; Clinformatics DOD: Optum’s de-identified Clinformatics Data Mart Database; MDCD: IBM MarketScan Multi-State Medicaid.

Incidence rates for HS (for the incident 2x algorithm) from 2015 to 2020 differed between databases. The MDCD database had the highest rate at 23 per 100,000 person-years. The rates in the CCAE, Clinformatics DOD, and Optum EHR databases were approximately 12 per 100,000 person-years. Rates in the IDAG, IDAF, JMDC, and IBM MarketScan Medicare Supplemental (MDCR) databases were 1 per 100,000 person-years. The rate in the IALPD database was undetectable, likely due to the small sample size. The incidence rates peaked in subjects in the 20- to 29-year-old age group. In the MDCD and IDAG databases, the incidence rates in the 30- to 39-year-old age group were similar to those in the 20- to 29-year-old age group and higher than those in the older age groups. Incidence rates in female subjects were generally higher than in male subjects and were highest in the MDCD database at 24 per 100,000 person-years, followed by 11 per 100,000 person-years in the CCAE, Clinformatics DOD, and Optum EHR databases and 1 per 100,000 person-years in the IDAF database. In the MDCR database, the rate in female subjects was equal to the rate in male subjects at 2 per 100,000 person-years.
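As a reminder of how rates of this form are expressed, the short sketch below converts a count of new cases and the accumulated time at risk into a rate per 100,000 person-years. The function name and the input numbers are hypothetical and chosen only for illustration; they are not counts from this study.

```python
def incidence_rate_per_100k(new_cases: int, person_years: float) -> float:
    """Incidence rate expressed per 100,000 person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases * 100_000 / person_years


# Hypothetical example: 46 new cases observed over 200,000 person-years.
print(incidence_rate_per_100k(46, 200_000))
```

A very small database accumulates few person-years, so a rare condition may yield no observed cases at all.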

Performance characteristics for the HS phenotypes assessed using the PheValuator method are presented in Table 2. Due to low subject counts, calculation of performance characteristics for the IDAG, IDAF, IALPD, and JMDC databases was not possible. The mean PPVs were higher in all databases for the PAs requiring a second diagnostic HS code in the 31 to 365 days after the index date. The mean PPVs for the 2 PAs that required a second code were 88% (incident) and 86% (prevalent). These were reduced to 62% (incident) and 59% (prevalent) when only a single diagnosis code for HS was required. The highest sensitivity estimates were in the 2 prevalent cohorts. The mean sensitivities for the 2 prevalent algorithms were 58% (single code required) and 25% (2 codes required). These decreased to 32% (single code required) and 12% (2 codes required) in the incident cohorts. The estimates for mean PPV for the Kim et al [16] PAs increased with the number of HS diagnosis codes required, ranging from 59% (2 codes) to 84% (5 codes). Our results showed a similar trend, but PPV was lower than reported by Kim et al (81% including subjects with 2 HS codes and 97% including subjects with >5 codes).

Table 2

Performance characteristics of the hidradenitis suppurativa phenotypes based on the PheValuator methodology.

Phenotype algorithm/database	Sensitivity (95% CI)	PPV^a (95% CI)	Specificity (95% CI)	NPV^b (95% CI)
Hidradenitis suppurativa incidence
IBM MarketScan Commercial Database	0.380 (0.367-0.393)	0.599 (0.582-0.615)	0.999 (0.999-0.999)	0.998 (0.998-0.998)
Optum’s de-identified Clinformatics Data Mart Database	0.369 (0.358-0.380)	0.603 (0.589-0.617)	0.999 (0.999-0.999)	0.998 (0.997-0.998)
IBM MarketScan Multi-State Medicaid Database	0.311 (0.306-0.317)	0.676 (0.668-0.685)	0.998 (0.998-0.998)	0.990 (0.990-0.990)
IBM MarketScan Medicare Supplemental Database	0.298 (0.277-0.319)	0.444 (0.417-0.472)	1.000 (1.000-1.000)	0.999 (0.999-0.999)
Optum’s de-identified Electronic Health Record dataset	0.279 (0.269-0.289)	0.777 (0.761-0.793)	1.000 (1.000-1.000)	0.997 (0.997-0.997)
Hidradenitis suppurativa incidence with second diagnosis 31 to 365 days after index date
IBM MarketScan Commercial Database	0.151 (0.142-0.161)	0.890 (0.868-0.909)	1.000 (1.000-1.000)	0.998 (0.998-0.998)
Optum’s de-identified Clinformatics Data Mart Database	0.133 (0.126-0.141)	0.882 (0.862-0.900)	1.000 (1.000-1.000)	0.997 (0.996-0.997)
IBM MarketScan Multi-State Medicaid Database	0.115 (0.112-0.119)	0.874 (0.862-0.885)	1.000 (1.000-1.000)	0.987 (0.987-0.987)
IBM MarketScan Medicare Supplemental Database	0.109 (0.095-0.123)	0.830 (0.778-0.874)	1.000 (1.000-1.000)	0.999 (0.999-0.999)
Optum de-identified Electronic Health Record dataset	0.109 (0.102-0.116)	0.948 (0.931-0.962)	1.000 (1.000-1.000)	0.997 (0.996-0.997)
Hidradenitis suppurativa prevalence
IBM MarketScan Commercial Database	0.541 (0.531-0.551)	0.649 (0.639-0.660)	0.999 (0.999-0.999)	0.998 (0.998-0.998)
Optum’s de-identified Clinformatics Data Mart Database	0.666 (0.655-0.677)	0.602 (0.591-0.613)	0.998 (0.998-0.998)	0.999 (0.999-0.999)
IBM MarketScan Multi-State Medicaid Database	0.664 (0.658-0.670)	0.628 (0.621-0.634)	0.995 (0.995-0.995)	0.996 (0.996-0.996)
IBM MarketScan Medicare Supplemental Database	0.442 (0.422-0.462)	0.355 (0.338-0.373)	0.999 (0.999-0.999)	0.999 (0.999-0.999)
Optum de-identified Electronic Health Record dataset	0.632 (0.618-0.647)	0.754 (0.739-0.768)	1.000 (1.000-1.000)	0.999 (0.999-0.999)
Hidradenitis suppurativa prevalence with second diagnosis 31 to 365 days after index date
IBM MarketScan Commercial Database	0.296 (0.285-0.307)	0.874 (0.860-0.887)	1.000 (1.000-1.000)	0.997 (0.997-0.998)
Optum’s de-identified Clinformatics Data Mart Database	0.233 (0.220-0.246)	0.937 (0.920-0.951)	1.000 (1.000-1.000)	0.998 (0.998-0.998)
IBM MarketScan Multi-State Medicaid Database	0.219 (0.203-0.236)	0.732 (0.699-0.764)	1.000 (1.000-1.000)	0.999 (0.999-0.999)
IBM MarketScan Medicare Supplemental Database	0.288 (0.282-0.294)	0.859 (0.851-0.867)	0.999 (0.999-0.999)	0.992 (0.992-0.992)
Optum de-identified Electronic Health Record dataset	0.231 (0.222-0.239)	0.912 (0.900-0.923)	1.000 (1.000-1.000)	0.996 (0.996-0.996)

^aPPV: positive predictive value.

^bNPV: negative predictive value.
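The mean PPVs quoted in the Results can be approximately reproduced from the point estimates in Table 2. The sketch below assumes that each mean is a simple unweighted average across the 5 databases with estimates; the text does not say how the means were calculated. The dictionary name and its keys are introduced here only for illustration.

```python
from statistics import mean

# PPV point estimates copied from Table 2, in table order:
# Commercial, Clinformatics, Medicaid, Medicare Supplemental, Optum EHR.
ppv_by_algorithm = {
    "incident 1x": [0.599, 0.603, 0.676, 0.444, 0.777],
    "incident 2x": [0.890, 0.882, 0.874, 0.830, 0.948],
    "prevalent 1x": [0.649, 0.602, 0.628, 0.355, 0.754],
    "prevalent 2x": [0.874, 0.937, 0.732, 0.859, 0.912],
}

# Unweighted mean across databases (an assumption of this sketch).
mean_ppv = {name: mean(values) for name, values in ppv_by_algorithm.items()}

for name, value in mean_ppv.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption, the averages fall within about 1 percentage point of the means quoted in the text. The table entries are themselves rounded, so exact agreement should not be expected.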

Discussion

Principal Findings

This study sought to develop and determine the accuracy of 4 HS PAs. The 4 PAs included 2 for incidence and 2 for prevalence; in each group, one PA favored sensitivity and the other favored specificity. Use of the PheValuator method allowed for estimation of sensitivity, specificity, PPV, and NPV without manual chart review. While both the incident and prevalent PAs were useful for the exploration of HS in observational databases, the PAs requiring just a single HS diagnosis code had higher sensitivity and lower specificity than the PAs requiring 2 codes. Thus, the choice of algorithm depends on the research question being explored. For example, a more sensitive algorithm would be applicable for safety studies, in which the PA is used to determine HS outcomes and missing possible cases is problematic, whereas a PA with higher specificity would be useful for treatment comparison studies, in which the goal is to ensure that all subjects exposed to a treatment have a high probability of having HS.

A few studies have included validation metrics for HS algorithms for observational databases [9,10,16,29,30]. Kim et al [16] used data available from the Massachusetts General Hospital and reported an increase in PPV with an increasing number of HS diagnosis codes (81% for 2 codes vs 97% for 5 codes). Our study replicated the Kim et al cohorts and found an increase in PPV with the use of 5 or more diagnosis codes compared to the use of at least two HS diagnosis codes (mean 84% for >5 codes vs mean 59% for 2 codes), a trend similar to the published results, although our estimates were lower. In general, our study found higher PPVs than studies that used a single HS diagnosis code [9,10,29,30]. The majority of subjects identified in our study were female, which is similar to findings from other studies [5,6,9,16,31]. A US study that used a cross-sectional design and a large electronic medical records database found an overall prevalence of 24.8% for type 2 diabetes, 71.6% for obesity, and 39.9% for hyperlipidemia among HS subjects [8]. Our study, when restricted to US data and examining covariates 365 days prior to and including the index date, identified type 2 diabetes in 26.5%, obesity in 19.6%, and hyperlipidemia in 26.5% of incident 1x HS subjects. The cross-sectional study was restricted to subjects aged 18 years or older, while our study included all ages, which may help explain the lower proportion of hyperlipidemia observed in our results. It has been reported that administrative databases underreport obesity as a diagnosis and are not an optimal data source for obesity prevalence [42]. This may explain our finding of a lower prevalence of obesity compared to the findings of the cross-sectional study.

Strengths of our study include the use of a rigorous, data-driven approach for generating and evaluating the HS phenotypes across a data network that included 9 databases covering US and non-US countries. Network-based phenotype evaluations greatly strengthen the knowledge base for a given algorithm, because they allow the assessment of the consistency of findings across data types, geographic locations, and time periods. When concordant trends emerge, it increases confidence that the observations are the effect of the PA itself rather than an artifact of a particular data source. The PAs were analyzed using multiple approaches, providing ancillary verification of decisions made in determining the cohort logic. Our study includes several study artifacts, including JSON files for the PAs, computer code, and results for all the analyzed PAs, providing transparency in our interpretation of the results.

There were also several limitations to our study. We used administrative data sets primarily maintained for insurance billing, which are well known to have significant deficits, including coding inaccuracies [43]. In addition, the estimation of performance characteristics using the PheValuator methodology was dependent on the quality of the data in the data set, which can vary substantially [37]. The algorithm validation was performed using a method involving predictive modeling of HS rather than case reviews. Results from PheValuator have been compared to results from previously published validation studies and have demonstrated excellent agreement [38]. This method does have the advantage of using multiple databases to provide a full set of performance metrics, including sensitivity and specificity, which are rarely provided in validation studies using case reviews [37]. The generalizability of our findings to uninsured populations is uncertain, given the insured population that was observed in this study. In the incident PA that defined HS with only a single diagnosis code, it was not possible to determine if any of these were “rule-out” diagnoses. The algorithms presented in this study use codes specific to HS; therefore, jurisdictions and practices that do not use these specific codes and instead use codes for “abscess” or “cyst” would be unable to operationalize these PAs. The study period used for evaluation of the HS algorithms includes the year (2015) when the drug Humira was introduced to treat HS [44]. After that introduction, education on HS increased, and physicians became more likely to use diagnosis codes specifically indicating HS in observational data. Therefore, to avoid temporal bias, researchers should avoid applying these algorithms to data from before 2015.

Conclusions

This study developed and evaluated 4 HS PAs using a rigorous, data-driven approach and generated phenotype performance metrics including sensitivity, specificity, PPV, and NPV. Based on the analyses, we recommend that PAs requiring a single HS diagnosis code be used in studies requiring high sensitivity, while studies requiring high specificity should use PAs requiring 2 HS diagnosis codes. These algorithms will enable researchers to use large observational databases to research HS, which has a high burden of disease. Better evidence is needed, as there are currently clinical knowledge gaps for HS that observational data are well suited to address.

Multimedia Appendix 1

Diagnostic codes.

Abbreviations

CCAE

IBM MarketScan Commercial Claims and Encounters

CE

continuous enrollment

Clinformatics DOD

Optum’s de-identified Clinformatics Data Mart Database

EHR

electronic health record

HS

hidradenitis suppurativa

IALPD

IQVIA Australian Longitudinal Patient Data

ICD-9

International Classification of Diseases, Ninth Revision

ICD-10

International Classification of Diseases, Tenth Revision

IDAF

IQVIA Disease Analyzer–France

IDAG

IQVIA Disease Analyzer–Germany

JMDC

Japan Medical Data Center

MDCD

IBM MarketScan Multi-State Medicaid

MDCR

IBM MarketScan Medicare Supplemental

NPV

negative predictive value

OHDSI

Observational Health Data Sciences and Informatics

OMOP

Observational Medical Outcomes Partnership

PA

phenotype algorithm

PPV

positive predictive value

SMD

standardized mean difference

SNOMED

Systematized Nomenclature of Medicine

Acknowledgments

Manuscript review was provided by Anna Sheahan, PhD. All authors contributed to all aspects of the study (study design and execution, data analysis and interpretation, and writing of the manuscript). This research was funded by Janssen Research and Development, LLC. The data source for this study was a retrospective claims database and thus there are no patient or public contributors.

Data Availability

The data used for this study are proprietary and only available through a licensing data-use agreement process. This process ensures that confidentiality of the data contributors is maintained and that the data are used appropriately. The MarketScan Research Database can be licensed by researchers.

Conflicts of Interest

All authors are employees of Janssen Research and Development, LLC, and may own stock or stock options. The work performed for this study was part of their employment.

References

1. Alikhan, Lynch, Eisen. Hidradenitis suppurativa: a comprehensive review. J Am Acad Dermatol 2009 Apr;60(4):539-61; quiz 562. doi: 10.1016/j.jaad.2008.11.911. PMID: 19293006.

2. Tiri, Jokelainen, Timonen, Tasanen, Huilaja. Somatic and psychiatric comorbidities of hidradenitis suppurativa in children and adolescents. J Am Acad Dermatol 2018 Sep;79(3):514-519. doi: 10.1016/j.jaad.2018.02.067. PMID: 29518461.

3. Overby, Pathak, Gottesman, Haerian, Perotte, Murphy, Bruce, Johnson, Talwalkar, Shen, Ellis, Kullo, Chute, Friedman, Bottinger, Hripcsak, Weng. A collaborative approach to developing an electronic health record phenotyping algorithm for drug-induced liver injury. J Am Med Inform Assoc 2013 Dec 01;20(e2):e243-52. doi: 10.1136/amiajnl-2013-001930. PMID: 23837993. PMCID: PMC3861914.

4. Aletaha, Epstein, Skup, Zueger, Garg, Panaccione. Risk of Developing Additional Immune-Mediated Manifestations: A Retrospective Matched Cohort Study. Adv Ther 2019 Jul 17;36(7):1672-1683. doi: 10.1007/s12325-019-00964-z. PMID: 31102202. PMCID: PMC6824390.

5. Cosmatos, Matcho, Weinstein, Montgomery, Stang. Analysis of patient claims data to determine the prevalence of hidradenitis suppurativa in the United States. J Am Acad Dermatol 2013 Mar;68(3):412-9. doi: 10.1016/j.jaad.2012.07.027. PMID: 22921795.

6. Desai, Shah. High burden of hospital resource utilization in patients with hidradenitis suppurativa in England: a retrospective cohort study using hospital episode statistics. Br J Dermatol 2017 Apr 21;176(4):1048-1055. doi: 10.1111/bjd.14976. PMID: 27534703.

7. Egeberg, Gislason, Hansen. Risk of Major Adverse Cardiovascular Events and All-Cause Mortality in Patients With Hidradenitis Suppurativa. JAMA Dermatol 2016 Apr;152(4):429-34. doi: 10.1001/jamadermatol.2015.6264. PMID: 26885728.

8. Garg, Birabaharan, Strunk. Prevalence of type 2 diabetes mellitus among patients with hidradenitis suppurativa in the United States. J Am Acad Dermatol 2018 Jul;79(1):71-76. doi: 10.1016/j.jaad.2018.01.014. PMID: 29339240.

9. Garg, Kirby, Lavian, Lin, Strunk. Sex- and age-adjusted population analysis of prevalence estimates for hidradenitis suppurativa in the United States. JAMA Dermatol 2017 Aug 01;153(8):760-764. doi: 10.1001/jamadermatol.2017.0201. PMID: 28492923. PMCID: PMC5710402.

10. Garg, Lavian, Lin, Strunk, Alloo. Incidence of hidradenitis suppurativa in the United States: A sex- and age-adjusted population analysis. J Am Acad Dermatol 2017 Jul;77(1):118-122. doi: 10.1016/j.jaad.2017.02.005. PMID: 28285782.

11. Hung, Chiang, Chung, Tsao, Chien, Wang. Increased risk of cardiovascular comorbidities in hidradenitis suppurativa: A nationwide, population-based, cohort study in Taiwan. J Dermatol 2019 Oct;46(10):867-873. doi: 10.1111/1346-8138.15038. PMID: 31389066.

12. Ingram, Jenkins-Jones, Knipe, Morgan, Cannings-John, Piguet. Population-based Clinical Practice Research Datalink study using algorithm modelling to identify the true burden of hidradenitis suppurativa. Br J Dermatol 2018 Apr;178(4):917-924. doi: 10.1111/bjd.16101. PMID: 29094346.

13. Jemec GBE, Guérin A, Kaminsky, Okun, Sundaram. What happens after a single surgical intervention for hidradenitis suppurativa? A retrospective claims-based analysis. J Med Econ 2016 Jul 14;19(7):710-7. doi: 10.3111/13696998.2016.1161636. PMID: 26938967.

14. Jung, Lee, Kim, Chang, Lee, Choi, Won, Lee. Assessment of overall and specific cancer risks in patients with hidradenitis suppurativa. JAMA Dermatol 2020 Aug 01;156(8):844-853. doi: 10.1001/jamadermatol.2020.1422. PMID: 32459291. PMCID: PMC7254443.

15. Khalsa, Liu, Kirby. Increased utilization of emergency department and inpatient care by patients with hidradenitis suppurativa. J Am Acad Dermatol 2015 Oct;73(4):609-14. doi: 10.1016/j.jaad.2015.06.053. PMID: 26190241.

16. Kim, Shlyankevich, Kimball. The validity of the diagnostic code for hidradenitis suppurativa in an electronic database. Br J Dermatol 2014 Aug;171(2):338-42. doi: 10.1111/bjd.13041. PMID: 24712395. PMCID: PMC4219870.

17. Kimball, Sundaram, Gauthier, Guérin A, Pivneva, Singh, Ganguli. The comorbidity burden of hidradenitis suppurativa in the United States: a claims data analysis. Dermatol Ther (Heidelb) 2018 Dec;8(4):557-569. doi: 10.1007/s13555-018-0264-z. PMID: 30306395. PMCID: PMC6261111.

18. Kirby, Miller, Adams, Leslie. Health care utilization patterns and costs for patients with hidradenitis suppurativa. JAMA Dermatol 2014 Sep;150(9):937-44. doi: 10.1001/jamadermatol.2014.691. PMID: 24908260.

19. Kirsten, Petersen, Hagenström, Augustin. Epidemiology of hidradenitis suppurativa in Germany - an observational cohort study based on a multisource approach

J Eur Acad Dermatol Venereol 2020 01 34 1 174 179

10.1111/jdv.15940

31494987

Lee

Kwon

Jung

Kim

Bae

Prevalence and comorbidities associated with hidradenitis suppurativa in Korea: a nationwide population-based study

J Eur Acad Dermatol Venereol 2018 10 32 10 1784 1790

10.1111/jdv.15071

29761904

Marvel

Vlahiotis

Sainski-Nguyen

Willson

Kimball

Disease burden and cost of hidradenitis suppurativa: a retrospective examination of US administrative claims data

BMJ Open 2019 09 30 9 9 e030579

10.1136/bmjopen-2019-030579

31575575

bmjopen-2019-030579

PMC6797383

McMillan

Kathleen

Hidradenitis suppurativa: number of diagnosed patients, demographic characteristics, and treatment patterns in the United States

Am J Epidemiol 2014 06 15 179 12 1477 83

10.1093/aje/kwu078

24812161

kwu078

Mehdizadeh

Rosella

Alavi

Sibbald

Farzanfar

Hazrati

Vernich

Laporte

Bashash

A Canadian population-based cohort to the study cost and burden of surgically resected hidradenitis suppurativa

J Cutan Med Surg 2018 22 3 312 317

10.1177/1203475418763536

29528753

Patel

Rastogi

Singam

Lee

Amin

Silverberg

Association between hidradenitis suppurativa and hospitalization for psychiatric disorders: a cross-sectional analysis of the National Inpatient Sample

Br J Dermatol 2019 08 181 2 275 281

10.1111/bjd.17416

30422314

Pinter

Kokolakis

Rech

Biermann

MHC

Häberle

Benjamin M

Multmeier

Reinhardt

Hidradenitis suppurativa and concurrent psoriasis: comparison of epidemiology, comorbidity profiles, and risk factors

Dermatol Ther (Heidelb) 2020 08 10 4 721 734

10.1007/s13555-020-00401-y

32500484

10.1007/s13555-020-00401-y

PMC7367943

Ramos-Rodriguez

Timerman

Khan

Bonomo

Hunjan

Lemor

The in-hospital burden of hidradenitis suppurativa in patients with inflammatory bowel disease: a decade nationwide analysis from 2004 to 2014

Int J Dermatol 2018 05 57 5 547 552

10.1111/ijd.13932

29431201

Ruan

Chen

Singhal

Lee

Fukudome

Surgical management of hidradenitis suppurativa: procedural trends and risk factors

J Surg Res 2018 09 229 200 207

10.1016/j.jss.2018.04.007

29936991

S0022-4804(18)30246-4

Schneeweiss

Merola

Schneeweiss

Wyss

Rosmarin

Risk of connective tissue disease, morphoea and systemic vasculitis in patients with hidradenitis suppurativa

J Eur Acad Dermatol Venereol 2021 01 35 1 195 202

10.1111/jdv.16728

32531094

Shlyankevich

Chen

Kim

Kimball

Hidradenitis suppurativa is a systemic disease with substantial comorbidity burden: a chart-verified case-control analysis

J Am Acad Dermatol 2014 12 71 6 1144 50

10.1016/j.jaad.2014.09.012

25440440

S0190-9622(14)01912-4

Strunk

Midura

Papagermanos

Alloo

Garg

Validation of a case-finding algorithm for hidradenitis suppurativa using administrative coding from a clinical database

Dermatology 2017 233 1 53 57

10.1159/000468148

28448975

000468148

Tzellos

Yang

Calimlim

Signorovitch

Impact of hidradenitis suppurativa on work loss, indirect costs and income

Br J Dermatol 2019 07 181 1 147 154

10.1111/bjd.17101

30120887

PMC7379487

Wertenteil

Strunk

Garg

Overall and subgroup prevalence of acne vulgaris among patients with hidradenitis suppurativa

J Am Acad Dermatol 2019 05 80 5 1308 1313

10.1016/j.jaad.2018.09.040

30287328

S0190-9622(18)32651-3

Atlas

GitHub 2022-11-16

https://github.com/OHDSI/Atlas

Overhage

Ryan

Reich

Hartzema

Stang

Validation of a common data model for active safety surveillance research

J Am Med Inform Assoc 2012 19 1 54 60

10.1136/amiajnl-2011-000376

22037893

amiajnl-2011-000376

PMC3240764

OHDSI/PhenotypeEvaluations/tree/main/HS/inst/cohorts

GitHub 2022-11-16

https://github.com/OHDSI/PhenotypeEvaluations/tree/main/HS/inst/cohorts

CohortDiagnostics | Introduction

GitHub 2022-11-16

https://github.com/OHDSI/CohortDiagnostics

Swerdel

Hripcsak

Ryan

PheValuator: Development and evaluation of a phenotype algorithm evaluator

J Biomed Inform 2019 09 97 103258

10.1016/j.jbi.2019.103258

31369862

S1532-0464(19)30177-7

PMC7736922

Swerdel

Schuemie

Murray

Ryan

PheValuator 2.0: Methodological improvements for the PheValuator approach to semi-automated phenotype algorithm evaluation

J Biomed Inform 2022 11 135 104177

10.1016/j.jbi.2022.104177

35995107

S1532-0464(22)00188-5

OHDSI/PhenotypeEvaluations/tree/main/HS

GitHub 2022-11-16

https://github.com/OHDSI/PhenotypeEvaluations/tree/main/HS

Cohort Definition

OHDSI Cohort Diagnostics 2022-11-16

https://data.ohdsi.org/HSCohortDiagnostics

Austin

Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research

Commun Stat Simul Comput 2009 04 09 38 6 1228 1234

10.1080/03610910902859574

Martin

Chen

Graham

Quan

Coding of obesity in administrative hospital discharge abstract data: accuracy and impact for future research studies

BMC Health Serv Res 2014 02 13 14 70

10.1186/1472-6963-14-70

24524687

1472-6963-14-70

PMC3996078

Tyree

Lind

Lafferty

Challenges of using medical insurance claims data for utilization analysis

Am J Med Qual 2006 21 4 269 75

10.1177/1062860606288774

16849784

21/4/269

PMC1533763

Humira (adalimumab) | A Biologic Treatment Option

AbbVie Inc 2022-11-16

https://www.humira.com/