Get Started with covdata • covdata

Loading the Package

The covdata package aims to make data related to the COVID-19 pandemic easily accessible to users of R. Once the package is installed, load it in the usual way:

library(tidyverse) 
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#> ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
#> ✔ tibble  3.1.8      ✔ dplyr   1.0.10
#> ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
#> ✔ readr   2.1.3      ✔ forcats 0.5.2 
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(covdata)
#> 
#> Attaching package: 'covdata'
#> 
#> The following object is masked from 'package:datasets':
#> 
#>     uspop

Loading the package makes a variety of datasets available for use. Because the data are in tibbles, the use of the tidyverse suite of packages is strongly recommended, though it is not required. If use the data objects as dataframes (i.e., without loading the tidyverse packages) you will need to re-encode some variables, most importantly date and date-time columns, for them to behave as expected.

Exploring the Data

Individual datasets are documented on their help pages. The documentation includes details on properly citing the source of the data and, in some cases, the terms of use under which it is provided. The vignettes in the package provide a brief overview of several of the datasets. These include:

For the ECDC COVID-specific data: vignette("ecdc-data")
For US COVID-specific data: vignette("us-data")
For the New York Times COVID-specific data: vignette("new-york-times")
For the global mobility data from Apple: vignette("mobility-data")
For the CoronaNet Project data on policy: vignette("coronanet-data")
For data from the Human Mortality Database and the New York Times on excess mortality: vignette("excess-mortality")

Caveat Emptor

The data are provided as-is. More information about collection methods, scope, limits, and possible sources of error in the data can be found in the documentation provided by their respective sources. Follow the links above, and see the vignettes in the package. The collection and effective reporting of case and mortality data by national governments has technical and political aspects influenced by, amongst other things, the varying capacity of states to test, track and measure events in a timely fashion, the varying definitions, criteria, and methods employed by states in registering cases and deaths, and the role of politics in the exercise of capacity and the reporting of unflattering news. Researchers should take care to familiarize themselves with these issues prior to making strong claims based on these data.

Other Data Sources

Philip Cohen has released several datasets and code for STATA users. These can be found at his website.