Online Education

Clinical Data Science Specialization

Are you interested in how to use data generated by doctors, nurses, and the healthcare system to improve the care of future patients? If so, you may be a future clinical data scientist!

This specialization provides learners with hands-on experience in the use of electronic health records and informatics tools to perform clinical data science. This series of six courses is designed to augment learners' existing skills in statistics and programming with examples of the specific challenges, tools, and appropriate interpretations of clinical data.

Learn More at

Tutorials and Workshops

Using R for Healthcare Data Science

Presented at AMIA 2015 Annual Symposium

Vojtech Huser and Laura Wiley, November 15, 2015

The R statistical programming language provides powerful tools to manipulate data and attracts many non-programmers. R offers a unique package management system and powerful data visualization packages. This tutorial will provide an introduction to the language, R installation (free software), and the use of RStudio, a free integrated development environment built for R. In the first part, we will cover R solutions for basic challenges facing data scientists, such as wrangling, cleaning, and visualizing data in reproducible ways. We will focus on the most recent R packages, such as dplyr (data manipulation), ggplot2 (publication-ready plots), and shiny (interactive web-based reports). In the second part, we will use several case studies (using publicly available data from the International Warfarin Pharmacogenomics Consortium (IWPC), Drugs@FDA, and RxNorm) to demonstrate R in action on biomedical informatics datasets. We will demonstrate how the previously introduced packages for data cleaning and visualization can be applied to a dataset that combines clinical and genomic data and to a range of informatics resources. All work will be demonstrated using reproducible reporting tools (e.g., RMarkdown) that combine code and analysis output in a single file (html, docx, or pdf). We will conclude with a summary of the latest trends in the R language, a comparison of R to other languages commonly used for data science (such as Python, Java, Julia, C, or SAS), and a general Q&A section.

Project Homepage

GitHub Repository

Overview of ggplot2 and the "Grammar of Graphics"

Nashville Data Science Meeting

Laura Wiley, August 10, 2015

Laura Wiley (PhD Candidate, Vanderbilt) will walk us through ggplot and the "grammar of graphics." The talk will include a high-level overview of ggplot, go through some specific visualizations, and show an example of how to code them in R.

GitHub Repository

Using Knitr and RMarkdown for Reproducible Workflows

Nashville R User Group

Laura Wiley, May 12, 2015

GitHub Repository

Using ggplot2 for Genetics

Vanderbilt Human Genetics Student Association

Laura Wiley, March 20, 2015

GitHub Repository