Introduction and Background
Selecting the Algorithm
Defining Fitness for Purpose (Interactive Exercise)
Implementing the Algorithm
Extending the Algorithm (Interactive Exercise)
Localizing the Algorithm (Interactive Exercise)
Validating the Algorithm
Optimizing the Algorithm
Wrap Up & Additional Questions
You are trying to build a diabetes research repository at your institution and want to identify patients with any type of diabetes as well as label those with Type 1 or Type 2 diabetes specifically. You decide to start the algorithm selection process with the three algorithms we used to develop the Colorado Diabetes EHR Research Repository (CODER):
Upadhyaya SG et al. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin Proc Innov Qual Outcomes. 2017 Jul;1(1):100–10. PMC6135013
Supplement: Table, Appendices
Schroeder EB et al. Validation of an algorithm for identifying type 1 diabetes in adults based on electronic health record data. Pharmacoepidemiol Drug Saf. 2018 Oct;27(10):1053–9. PMC6028322
Kho AN et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J Am Med Inform Assoc. 2012 Mar;19(2):212–8. PMC3277617
Supplement: Data, Appendix, Tables, PheKB Entry
As a group, pick one study for the rest of the exercise:
Assess the fitness for purpose: What are the phenotype and performance goals of the algorithm? Were they explicitly stated or are you inferring based on the paper context?
Assess the implementability at your institution:
What data types does the algorithm use? Do you have all of those data types available in your institution’s data warehouse?
What types of methods do they use? Do you need any special computational tools (software, servers, etc.) to implement that part of the algorithm?
Is there anything missing from their definition? For example, do they mention using a type of data or algorithmic logic without providing the exact value sets or specific implementation details?
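One way to organize the implementability questions above is as a simple checklist that compares the data types an algorithm needs with what your warehouse holds. The sketch below is illustrative only: the data type names, the `required` and `available` sets, and the `implementability_gaps` helper are all hypothetical placeholders and are not taken from any of the three papers.

```python
# Hypothetical data types; replace with those named in the paper you selected.
required = {"diagnosis_codes", "medications", "lab_results", "clinical_notes"}

# Hypothetical inventory of what a local data warehouse holds.
available = {"diagnosis_codes", "medications", "lab_results", "vitals"}


def implementability_gaps(required, available):
    """Return the required data types that are missing locally, sorted by name."""
    return sorted(required - available)


missing = implementability_gaps(required, available)
print("Missing data types:", missing)
```

Any data type that shows up as missing is a candidate for the localization discussion later in the exercise.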
Using the algorithm you selected in the Fitness for Purpose Exercise, look back at the types of data they used. Do they use specific terminologies? Do they use the most up-to-date version of those terminologies? If not, you will need to update this data type. Identify one or more data types that will need to be updated and start looking at resources that could help you in this process. Try updating at least one data element (e.g., if diagnoses need to be updated from ICD9 → ICD10, try updating a single ICD code).
Resources:
Diagnoses
PheWAS Catalog: Has multiple versions of mappings - v1.2 (ICD9-CM), v1.2b (ICD10-CM) and vX
HCUP Clinical Classifications Software (CCS): ICD10-CM and ICD9-CM
CMS General Equivalence Mappings: the latest version is from 2018
OHDSI has also done much work mapping both ICD9-CM and ICD10-CM to SNOMED codes, but makes it clear that these are unidirectional mappings (ICD9/10-CM → SNOMED) and should not be used to translate between ICD terminologies (i.e., do not try ICD9-CM → SNOMED → ICD10-CM)
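To try updating a single diagnosis code, a lookup built from a direct ICD9-CM → ICD10-CM crosswalk such as the General Equivalence Mappings is one option. The sketch below assumes the whitespace-separated GEM layout (source code, target code, five-digit flag string, with decimal points removed). The two sample rows are included for illustration and should be verified against the actual CMS file; `load_gem` and `icd9_to_icd10` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative rows in the assumed GEM layout: source, target, flags.
# Verify against the CMS file before relying on them.
SAMPLE_GEM_ROWS = """\
25000 E119 10000
25001 E109 10000
"""


def load_gem(text):
    """Build a one-to-many lookup from ICD-9-CM codes to ICD-10-CM candidates."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, target, flags = parts
        # The first flag digit marks an approximate (not exact) match.
        mapping[source].append({"target": target, "approximate": flags[0] == "1"})
    return dict(mapping)


def icd9_to_icd10(code, gem):
    """Return candidate ICD-10-CM codes for one ICD-9-CM code (dots are ignored)."""
    return [row["target"] for row in gem.get(code.replace(".", ""), [])]


gem = load_gem(SAMPLE_GEM_ROWS)
print(icd9_to_icd10("250.00", gem))
```

The lookup returns a list because a single ICD9-CM code can map to several ICD10-CM candidates, and approximate matches usually deserve manual review before they go into a value set.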
Medications
Using the algorithm you selected in the Fitness for Purpose and Extending the Algorithm Exercises, look back at the types of data and terminologies they used. Does your site also use those terminologies? If not, you will need to localize those data types.
Identify the types of data you will need to localize to your institution. Are these the same for everyone in your group? Do some sites need different types of localization?
Are there any types of data that you won’t be able to localize/implement at your institution? Look at the algorithm design: what impact do you think leaving them out will have on the algorithm's performance (e.g., do you think it will make the algorithm more or less specific)?
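To reason about the performance question above, it can help to run a rule with and without one criterion on a handful of labeled records. The sketch below uses entirely made-up toy patients and a hypothetical rule that requires every listed criterion to be present; it does not reproduce any of the three published algorithms.

```python
# Entirely made-up toy patients: three evidence flags plus a chart-review label.
patients = [
    {"dx": True,  "med": True,  "lab": True,  "truth": True},
    {"dx": True,  "med": True,  "lab": False, "truth": True},
    {"dx": True,  "med": False, "lab": True,  "truth": True},
    {"dx": True,  "med": True,  "lab": False, "truth": False},
    {"dx": True,  "med": False, "lab": False, "truth": False},
    {"dx": False, "med": False, "lab": False, "truth": False},
]


def evaluate(patients, criteria):
    """Flag a patient as a case only if every listed criterion is present."""
    tp = fp = tn = fn = 0
    for p in patients:
        predicted = all(p[c] for c in criteria)
        if predicted and p["truth"]:
            tp += 1
        elif predicted and not p["truth"]:
            fp += 1
        elif not predicted and p["truth"]:
            fn += 1
        else:
            tn += 1
    # Assumes the sample contains at least one true case and one true non-case.
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


full = evaluate(patients, ["dx", "med", "lab"])
without_lab = evaluate(patients, ["dx", "med"])
print("All criteria:", full)
print("Lab data unavailable:", without_lab)
```

In this toy data, dropping the lab requirement raises sensitivity and lowers specificity, which is the general direction for a rule that requires all criteria. For a rule that accepts any one of several criteria, losing a data type tends to have the opposite effect, so the answer depends on how the selected algorithm combines its evidence.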
Dr. Wiley is an expert in computational phenotyping and identifying patient populations from EHRs. She has developed and implemented algorithms across a range of traits including diseases (diabetes, intracranial aneurysm), medication-related outcomes (statin-induced myopathy, stable warfarin dose), and social history (tobacco use status) within multiple clinical systems. She co-leads phenotype harmonization efforts for the Population Architecture Using Genomics and Epidemiology (PAGE) consortium and has experience developing systems to support the development and deployment of harmonized phenotype definitions across multiple EHR systems. She also developed a massive open online course (MOOC) on computational phenotyping, available on the Coursera platform, with more than 2,000 enrolled students.
Mr. Rasmussen is a Senior Clinical Research Associate at Northwestern University Feinberg School of Medicine. He has participated in many phenotyping initiatives, including participation within the electronic Medical Records and Genomics (eMERGE) Network and work with the Phenotype Execution and Modeling Architecture (PhEMA) project. Through these efforts, he has studied and published on the portability of phenotyping algorithms, including technical considerations as well as the impact that ambiguous language can have on phenotype interpretation. He has collaborated with Dr. Wiley for several years developing infrastructure for phenotype validation (ReviewR), as well as contributing to a systematic evidence review on EHR-based phenotype algorithms.
David is a clinical data analyst in the Wiley Lab. He has a bachelor's degree in biochemistry from the University of Illinois at Urbana-Champaign. He then served as a project manager and technical specialist at the Center of Excellence for Airport Technology. Since joining the lab, David has built and maintains a high-performance cloud computing platform for the Coursera Clinical Data Science Specialization. He also manages the lab's technical infrastructure and is the primary developer for the ReviewR Shiny App.
Melissa is a clinical data analyst in the Wiley Lab. She earned a bachelor's degree in biology from the University of Colorado, and then pursued a certificate in diagnostic medical sonography. After 10 years in clinical medicine, she returned to CU, where she completed a master's degree in biostatistics with a minor in data science. Since joining the lab, Melissa has been working on projects with the Aneurysm Research Group and the Colorado Center for Personalized Medicine Biobank.