We have one pre-print manuscript that describes the performance of the ensemble forecasts through July 2020, as well as a second pre-print that evaluates the predictive performance of the ensemble and dozens of other models through all of 2020.

Overview

Each week, we generate ensemble forecasts of cumulative and incident COVID-19 deaths, incident COVID-19 cases, and incident COVID-19 hospitalizations over the next four weeks by combining the forecasts from a designated model submitted by each team. This is helpful because it gives a sense of the consensus forecast across all teams. Previous work in infectious disease forecasting and other fields has shown that ensemble forecasts are often more accurate than any individual model that contributes to the ensemble. Readers who are more familiar with forecasting methods may also find it helpful to explore forecasts from individual models to obtain a more detailed understanding of the underlying uncertainty and the range of projections generated by models built on different assumptions. We published a medRxiv pre-print in August 2020 describing the performance of the ensemble forecast during the first few months of the pandemic.

Summary of how the ensemble is built

Typically on Monday evening or Tuesday morning, we update the COVID-19 Forecast Hub ensemble forecast using all eligible forecasts submitted in the prior week.

From April 13 through July 21, 2020, the ensemble was created by taking the arithmetic average of each prediction quantile across all eligible models for a given location. Starting on the week of July 28, we instead used the median prediction across all eligible models at each quantile level.
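A minimal sketch of the two combination rules, assuming every model reports predictions at the same shared set of quantile levels; the function name and example values are illustrative, not the Hub's actual implementation:

```python
import numpy as np

def ensemble_quantiles(model_quantiles, method="median"):
    """Combine forecasts for one location and target.

    model_quantiles: array of shape (n_models, n_quantile_levels), where
    each row holds one model's predictions at the shared quantile levels.
    """
    if method == "mean":
        # Rule used April 13 through July 21, 2020: arithmetic average
        # of the predictions at each quantile level.
        return np.mean(model_quantiles, axis=0)
    # Rule used starting the week of July 28, 2020: median prediction
    # across models at each quantile level.
    return np.median(model_quantiles, axis=0)

# Hypothetical example: three models' three-quantile forecasts for one location.
forecasts = np.array([
    [100.0, 150.0, 210.0],
    [120.0, 160.0, 250.0],
    [ 90.0, 140.0, 300.0],
])
print(ensemble_quantiles(forecasts, method="mean"))    # [103.33 150.   253.33]
print(ensemble_quantiles(forecasts, method="median"))  # [100. 150. 250.]
```

Note that the quantile-median is less sensitive to a single outlying model than the quantile-mean, as the upper quantile in this example illustrates.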

We created ensemble forecasts for hospitalizations due to COVID-19 for the first time on the week of December 7, 2020. This is a beta version of the ensemble, and it has not been assessed in detail for accuracy or calibration.

Detailed eligibility criteria

Forecasts submitted by 6pm ET on Monday are guaranteed consideration for inclusion in that week's ensemble, as long as the forecast is dated no earlier than the previous Tuesday.

To be included in the ensemble, a team’s designated model must meet specified inclusion criteria.
We require a full set of 23 quantiles for each of the one- through four-week-ahead forecasts of deaths, a full set of 7 quantiles for each of the one- through four-week-ahead forecasts of cases, and a full set of 7 quantiles for each of the one- through twenty-eight-day-ahead forecasts of hospitalizations (see the Technical README for details).
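A sketch of this completeness check, assuming the quantile levels documented in the Technical README (0.01, 0.025, 0.05, 0.10, …, 0.95, 0.975, 0.99 for deaths; 0.025, 0.10, 0.25, 0.50, 0.75, 0.90, 0.975 for cases and hospitalizations); the function name is hypothetical:

```python
import numpy as np

# Assumed quantile levels, per the Hub's Technical README.
DEATH_LEVELS = np.round(np.concatenate(
    ([0.01, 0.025], np.arange(0.05, 0.951, 0.05), [0.975, 0.99])), 3)  # 23 levels
CASE_HOSP_LEVELS = [0.025, 0.10, 0.25, 0.50, 0.75, 0.90, 0.975]        # 7 levels

def has_full_quantile_set(submitted_levels, target_type):
    """Return True if a forecast for one target (e.g., a given horizon
    and location) includes every required quantile level."""
    required = DEATH_LEVELS if target_type == "death" else CASE_HOSP_LEVELS
    # Round before comparing to avoid floating-point mismatches.
    return set(np.round(required, 3)) <= set(np.round(submitted_levels, 3))
```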

For forecasts of cumulative deaths, we perform two additional checks for internal consistency. By definition, cumulative deaths cannot decrease over time (other than possibly because of revisions to reporting). We therefore require that (1) a team assigns at most a 10% chance that cumulative deaths will decrease in its one-week-ahead forecast, and (2) at each quantile level of the predictive distribution, the predicted value is constant or increasing across forecast horizons. Additionally, models that project case or death values larger than the population of the geographic location are not included. Before the week of July 28, we also visually inspected forecasts to confirm that they were in alignment with the ground truth data; this manual step is no longer part of our weekly ensemble generation process. Details on which models were included in the ensemble each week are available on GitHub.
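As a rough illustration, the two cumulative-death checks might be implemented as follows. This is a hedged sketch, not the Hub's code; it assumes forecasts arrive as a dictionary mapping quantile levels to arrays of predictions at horizons one through four weeks, and it implements check (1) by requiring the 0.10 quantile of the one-week-ahead forecast to be no lower than the last observed count:

```python
import numpy as np

def passes_cumulative_death_checks(quantiles, last_observed):
    """quantiles: dict mapping quantile level -> array of cumulative-death
    predictions at horizons 1 through 4 weeks, for one location.
    last_observed: most recent reported cumulative death count.
    """
    # Check 1: at most a 10% chance that cumulative deaths decrease one
    # week ahead, i.e., the 0.10 quantile of the one-week-ahead forecast
    # must not fall below the last observed value.
    if quantiles[0.10][0] < last_observed:
        return False
    # Check 2: at each quantile level, predictions must be constant or
    # increasing across forecast horizons.
    for preds in quantiles.values():
        if np.any(np.diff(preds) < 0):
            return False
    return True
```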

To be eligible for inclusion in the hospitalization ensemble, individual model forecasts must pass a check for consistency with recently observed data. We have periodically made minor updates to this check since the introduction of the ensemble forecasts for hospitalizations:

  • On the weeks of December 7, 2020 through December 21, 2020, we required that the mean daily point prediction for a given location over the first seven days (e.g., covering Tuesday, December 8 through Monday, December 14 for forecasts submitted December 7) be at least as large as the mean reported daily confirmed hospital admissions for that location over the past 14 days, minus four times the standard deviation of those admissions over the same 14 days. This check was performed separately for each location, but a given model was included in the ensemble for all locations if it passed the check in at least 75% of jurisdictions, and excluded for all locations otherwise.
  • On the weeks of December 28, 2020 and January 4, 2021, we used the check described above, but inclusion was determined separately for each location.
  • Starting on the week of January 11, 2021, the check is based on the mean of the predictive median over the first seven days rather than the mean point prediction; model inclusions are still determined separately for each location. A sketch of this check appears after this list.
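The sketch below illustrates the current form of the check for a single model and location; the names and data structures are assumptions for illustration, not the Hub's code:

```python
import numpy as np

def passes_hospitalization_screen(first_week_medians, reported_admissions):
    """Illustrative version of the screening check used starting the week
    of January 11, 2021.

    first_week_medians: a model's predictive medians for the first seven
    forecast days at one location.
    reported_admissions: reported daily confirmed hospital admissions for
    that location over the past 14 days.
    """
    threshold = (np.mean(reported_admissions)
                 - 4 * np.std(reported_admissions))
    # The model passes for this location if its average predicted median
    # over the first week is not implausibly low relative to recent data.
    return np.mean(first_week_medians[:7]) >= threshold
```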

For all checks described above, daily reported hospital admissions are taken from HealthData.gov.