Reusing & Adapting Computational Phenotyping Algorithms

Outline of Topics

Introduction and Background

Selecting the Algorithm

Defining Fitness for Purpose (Interactive Exercise)

Implementing the Algorithm

Extending the Algorithm (Interactive Exercise)

Localizing the Algorithm (Interactive Exercise)

Validating the Algorithm

Optimizing the Algorithm

Wrap Up & Additional Questions

Exercises

Fitness for Purpose

You are trying to build a diabetes research repository at your institution and want to identify patients with any type of diabetes as well as label those with Type 1 or Type 2 diabetes specifically. You decide to start the algorithm selection process with the three algorithms we used to develop the Colorado Diabetes EHR Research Repository (CODER):

As a group, pick one study for the rest of the exercise:

Assess the fitness for purpose: What are the phenotype and performance goals of the algorithm? Were they explicitly stated or are you inferring based on the paper context?
Assess the implementability at your institutions:
1. What data types does the algorithm use? Do you have all of those data types available in your institution’s data warehouse?
2. What types of methods do they use? Do you need any special computational tools (software, servers, etc.) to implement that part of the algorithm?
3. Is there anything missing from their definition? i.e., do they mention using a type of data or algorithmic logic but not actually provide the exact value sets or specific details about the implementation?

Extending the Algorithm

Using the algorithm you selected in the Fitness for Purpose Exercise, look back at the types of data it uses. Does it use specific terminologies? Does it use the most up-to-date versions of those terminologies? If not, you will need to update that data type. Identify one or more data types that need to be updated and start looking at resources that could help you in this process. Try updating at least one data element (e.g., if diagnoses need to be updated from ICD9 → ICD10, try updating a single ICD code).

Resources:

Diagnoses
- PheWAS Catalog: Has multiple versions of mappings - v1.2 (ICD9-CM), v1.2b (ICD10-CM) and vX
- HCUP Clinical Classifications Software (CCS): ICD10-CM and ICD9-CM
- CMS General Equivalence Mappings: 2018 is the latest version
- OHDSI has also done much work mapping both ICD9-CM and ICD10-CM to SNOMED codes, but makes it clear that these are unidirectional mappings (ICD9/10-CM → SNOMED) and should not be used to translate between ICD terminologies (i.e., do not try ICD9-CM → SNOMED → ICD10-CM)
Medications
- By Class: RxClass
- By Drug Name/Active Ingredient: RxNav
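
The single-code update suggested in this exercise can be sketched in Python. This is a minimal illustration, not the method of any of the resources above: the lookup table `TOY_FORWARD_MAP` and the function `update_value_set` are hypothetical names, and the two mapping rows are a hand-typed toy excerpt in the spirit of a forward (ICD9-CM → ICD10-CM) equivalence file. Verify any real mapping against the official file before using it. The code 250.02 is deliberately left out of the toy table to show how unmapped codes end up in a manual review queue.

```python
# Toy forward map (ICD9-CM -> ICD10-CM). ASSUMPTION: these rows are
# illustrative only; check them against the official mapping file.
TOY_FORWARD_MAP = {
    "250.00": ["E11.9"],
    "250.01": ["E10.9"],
}


def normalize(code):
    """Strip whitespace and upper-case a code so lookups are consistent."""
    return code.strip().upper()


def update_value_set(icd9_codes, forward_map):
    """Translate each ICD9-CM code; collect codes that have no mapping.

    Returns (mapped, unmapped): mapped is a dict of source code to a sorted
    list of target codes, unmapped is a list of codes needing manual review.
    """
    mapped = {}
    unmapped = []
    for code in icd9_codes:
        key = normalize(code)
        targets = forward_map.get(key)
        if targets:
            mapped[key] = sorted(targets)
        else:
            unmapped.append(key)
    return mapped, unmapped


mapped, unmapped = update_value_set(["250.00", " 250.01", "250.02"], TOY_FORWARD_MAP)
print("Mapped:", mapped)
print("Needs manual review:", unmapped)
```

Keeping the unmapped codes in a separate list, rather than silently dropping them, makes it easier to see where clinical judgment is needed when one source code has several plausible targets or none at all.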

Localizing the Algorithm

Using the algorithm you selected in the Fitness for Purpose and Extending the Algorithm Exercises, look back at the types of data and terminologies they used. Does your site also use those terminologies? If not, you will need to localize those data types.

Identify the types of data you will need to localize to your institution. Are these the same for everyone in your group? Do some sites have different needs/types of localizations needed?
Are there any types of data that you won’t be able to localize/implement at your institution? Look at the algorithm design - what impact do you think it will have on the algorithm performance (e.g., do you think it will make the algorithm more or less specific?)
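
One way to reason about the performance question above is to run the algorithm logic with and without the data type you cannot localize and compare who gets flagged. The sketch below assumes a hypothetical rule (diagnosis code AND (medication OR abnormal lab)) and four made-up patients; neither comes from the papers used in the exercise. The names `Patient`, `full_rule`, and `rule_without_labs` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    has_dx: bool    # has a qualifying diagnosis code
    has_med: bool   # has a qualifying medication
    has_lab: bool   # has a qualifying abnormal lab value


def full_rule(p):
    """Hypothetical rule as published: dx AND (med OR lab)."""
    return p.has_dx and (p.has_med or p.has_lab)


def rule_without_labs(p):
    """Same rule at a site with no lab data: the lab branch can never fire."""
    return p.has_dx and p.has_med


# Made-up cohort for illustration only.
patients = [
    Patient(has_dx=True, has_med=True, has_lab=True),
    Patient(has_dx=True, has_med=False, has_lab=True),
    Patient(has_dx=True, has_med=True, has_lab=False),
    Patient(has_dx=False, has_med=True, has_lab=True),
]

flagged_full = [i for i, p in enumerate(patients) if full_rule(p)]
flagged_local = [i for i, p in enumerate(patients) if rule_without_labs(p)]
lost = sorted(set(flagged_full) - set(flagged_local))

print("Flagged by full rule:", flagged_full)
print("Flagged without labs:", flagged_local)
print("Lost when labs are unavailable:", lost)
```

In this toy case, removing one branch of an OR makes the rule stricter, so fewer patients are flagged, which would tend to lower sensitivity and possibly raise specificity. The effect reverses if the missing data type feeds an exclusion criterion, so the direction depends on where the data type sits in the algorithm design.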

Instructors & TAs

Laura Wiley, PhD

Dr. Wiley is an expert in computational phenotyping and identifying patient populations from EHRs. She has developed and implemented algorithms across a range of traits including diseases (diabetes, intracranial aneurysm), medication-related outcomes (statin-induced myopathy, stable warfarin dose), and social history (tobacco use status) within multiple clinical systems. She co-leads phenotype harmonization efforts for the Population Architecture Using Genomics and Epidemiology (PAGE) consortium and has experience developing systems to support development and deployment of harmonized phenotype definitions across multiple EHR systems. She also developed a massive open online course (MOOC) on computational phenotyping available on the Coursera platform with more than 2,000 enrolled students.

Luke Rasmussen, MS

Mr. Rasmussen is a Senior Clinical Research Associate at Northwestern University Feinberg School of Medicine. He has participated in many phenotyping initiatives, including participation within the electronic Medical Records and Genomics (eMERGE) Network and work with the Phenotype Execution and Modeling Architecture (PhEMA) project. Through these efforts, he has studied and published on the portability of phenotyping algorithms, including technical considerations as well as the impact that ambiguous language can have on phenotype interpretation. He has collaborated with Dr. Wiley for several years developing infrastructure for phenotype validation (ReviewR), as well as contributing to a systematic evidence review on EHR-based phenotype algorithms.

David Mayer

David is a clinical data analyst in the Wiley Lab. He has a bachelor's degree in biochemistry from the University of Illinois at Urbana-Champaign. He then served as a project manager and technical specialist at the Center of Excellence for Airport Technology. Since joining the lab, David has built and maintains a high-performance cloud computing platform for the Coursera Clinical Data Science Specialization. He also manages the lab's technical infrastructure and is the primary developer for the ReviewR Shiny App.

Melissa Wilson, MS

Melissa is a clinical data analyst in the Wiley Lab. She earned a bachelor's degree in biology from the University of Colorado, and then pursued a certificate in diagnostic medical sonography. After 10 years in clinical medicine, she returned to CU where she completed a master's degree in biostatistics with a minor in data science. Since joining the lab, Melissa has been working on projects with the Aneurysm Research Group and the Colorado Center for Personalized Medicine Biobank.
