Online Education

Clinical Data Science Specialization

Are you interested in how to use data generated by doctors, nurses, and the healthcare system to improve the care of future patients? If so, you may be a future clinical data scientist!

This specialization provides learners with hands-on experience in the use of electronic health records and informatics tools to perform clinical data science. This series of six courses is designed to augment learners' existing skills in statistics and programming by providing examples of specific challenges, tools, and appropriate interpretations of clinical data.

Learn More at

Tutorials and Workshops

Reusing and Adapting Computational Phenotyping Algorithms

Presented at AMIA 2023 Informatics Summit

Laura Wiley and Luke Rasmussen, March 13, 2023

A common challenge in translational research is identifying patient cohorts from electronic health records (EHRs). Computational phenotyping aims to address this problem using algorithms (e.g., rule-based logic, natural language processing [NLP], and/or machine learning) to identify which patients have a particular clinical condition or characteristic. The EHR computational phenotyping community is diverse and builds algorithms for a variety of applications and clinical systems, which has led to a proliferation of algorithms and methodologies. Our recent systematic evidence review identified more than 661 algorithms for 312 distinct clinical conditions published in the literature.1 Paradoxically, the plethora of existing phenotype algorithms reduces their reuse, as it is not always clear to investigators which algorithm is most appropriate for their use case. Furthermore, differences in medical record systems, data warehouses, and technology availability often necessitate modifying an algorithm to implement it in local systems. While informaticians are developing methods to solve these portability challenges, in the interim investigators need a rational approach for reusing algorithms. This instructional workshop will interactively guide learners through a systematic framework and set of best practices for reusing algorithms in a new context.

 Project Homepage

Using R for Healthcare Data Science

Presented at AMIA 2015 Annual Symposium

Vojtech Huser and Laura Wiley, November 15, 2015

The R statistical programming language provides powerful tools to manipulate data and attracts many non-programmers. R offers a unique package management system and powerful data visualization packages. This tutorial will provide an introduction to the language, R installation (free software), and use of RStudio, a free integrated development environment built for R. In the first part, we will cover R solutions for basic challenges facing data scientists, like wrangling, cleaning, and visualizing data in reproducible ways. We will focus on the most recent R packages, such as dplyr (data manipulation), ggplot2 (publication-ready plots), and shiny (interactive web-based reports). In the second part, we will use several case studies (using publicly available data from the International Warfarin Pharmacogenomics Consortium (IWPC), Drugs@FDA, and RxNorm) to demonstrate R in action on biomedical informatics datasets. We will demonstrate how the previously introduced packages for data cleaning and visualization can be applied to a dataset that combines clinical and genomic data and a range of informatics resources. All work will be demonstrated using reproducible reporting tools (e.g., RMarkdown) that combine code and analysis output in a single file (html, docx, or pdf). We will conclude with a summary of the latest trends in the R language, a comparison of R to other languages commonly used for data science (such as Python, Java, Julia, C, or SAS), and a general Q&A session.

 Project Homepage

 GitHub Repository

Overview of ggplot2 and the "Grammar of Graphics"

Nashville Data Science Meeting

Laura Wiley, August 10, 2015

Laura Wiley (PhD Candidate, Vanderbilt) will walk us through an overview of ggplot2 and the "grammar of graphics." The talk will include a high-level overview of ggplot2 and go through some specific visualizations, with an example of how to code them in R.

 GitHub Repository

Using Knitr and RMarkdown for Reproducible Workflows

Nashville R User Group

Laura Wiley, May 12, 2015

 GitHub Repository

Using ggplot2 for Genetics

Vanderbilt Human Genetics Student Association

Laura Wiley, March 20, 2015

 GitHub Repository