Welcome to the CalCORVID (California Clustering for Operational Real-time Visualization of Infectious Diseases) dashboard repository. This repository provides sample data, code for an RShiny dashboard displaying spatiotemporal cluster results (CalCORVID), and functions to preprocess your data for display in the dashboard. Although CalCORVID and the accompanying functions can be directly implemented as a final product, we encourage users to customize and build on the provided code base.
🛈 Note This application is designed using the R programming language and is developed in the RStudio integrated development environment (IDE). Download R and RStudio to get started.
⚠️ Warning The current iteration of this dashboard is only designed to display spatiotemporal cluster results from SaTScan software. The user must analyze their data in SaTScan before using this repository.
Repository Home:

- `app.R`: Contains code for the user interface (UI) and server functions.
- `global.R`: Sourced by `app.R` to load libraries, load data, and preprocess data, which improves dashboard processing time.
- `/R/` folder: R scripts with relevant functions, including generating dashboard files, cleaning input data, and running SaTScan within R.
- `/data/` folder: Stores all SaTScan outputs and the data files generated by `global.R` that the dashboard uses.
  - This folder contains files corresponding to two sample datasets: 1) the California vaccination analysis in the displayed dashboard example, and 2) simulated sample data in the `test_data` subfolder for users to get familiar with using the dashboard.
`/R/`:

- `satscan_run_example.R`: Script to run SaTScan using R with the `rsatscan` package. This generates the example data displayed on the dashboard, but is commented out as the results are already provided in the repository. However, this script can be adapted by users who are interested in running SaTScan within R.
- `check_data.R`: Script containing functions that:
  - `clean_data()`: Check if the newest SaTScan outputs are in the correct format.
  - `combine_datasets()`: Merge cluster center (*.col) and location ID (*.gis) files over a specified time frame.
  - `clean_combined_datasets()`: Reformat the combined dataset after merging with Social Vulnerability Index (SVI) calculations for dashboard display.
- `generate_map.R`: Script containing functions that:
  - `generate_svi_vars()`: Calculate average SVI percentiles for the geographic unit of analysis for each cluster.
  - `generate_county_shapes()`: Create a geojson shapefile containing county boundaries for the specified state. This function will also calculate the centroid of each county polygon to display centered county label names on the leaflet map (an illustrative sketch follows this list).
  - `generate_state_coords()`: Generate a CSV file with geographic coordinates to orient the leaflet map to the specified state (more useful if detecting clusters over a larger geographic area).
  - `generate_cluster_coords()`: Generate a CSV file with geographic coordinates to orient the leaflet map to the detected clusters (more useful if detecting clusters over a smaller geographic area).
- `generate_test_data.R`: Script containing the code used to generate simulated test data at the California census tract level.
- `run_test_data.R`: Script to run SaTScan using R with the `rsatscan` package for the simulated test data.
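For orientation, the sketch below shows one common way county boundaries and label centroids could be assembled with the `tigris` and `sf` packages. It is illustrative only and may not match the actual implementation of `generate_county_shapes()`; the output file name is an assumption.

```r
# Illustrative sketch only -- not necessarily how generate_county_shapes() is
# implemented. Assumes the tigris and sf packages are installed.
library(tigris)
library(sf)

state_abbr <- "CA"  # state of analysis

# Download generalized county boundaries for the state as an sf object
county_sf <- tigris::counties(state = state_abbr, cb = TRUE)

# Compute a centroid for each county polygon to anchor the county label
centroids <- sf::st_centroid(sf::st_geometry(county_sf))
county_sf$label_lng <- sf::st_coordinates(centroids)[, 1]
county_sf$label_lat <- sf::st_coordinates(centroids)[, 2]

# Write the boundaries (with label coordinates) to a geojson file
# (file name is hypothetical)
sf::st_write(county_sf, "data/county_boundary/CA_counties.geojson",
             driver = "GeoJSON", delete_dsn = TRUE)
```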
`/data/`:

- `CAvax_combgiscol_fnl.csv`: Final version of the example dataset for testing this dashboard, generated by `clean_combined_datasets()`.
- `coords/`: Folder containing CSV file(s) of centroids generated for the state of analysis using `generate_state_coords()` or `generate_cluster_coords()`.
- `county_boundary/`: Folder containing geojson files of county boundaries for the state of analysis, generated using `generate_county_shapes()`.
- `giscol_files/`: Folder containing merged cluster center (*.col) and location ID (*.gis) files created with `combine_datasets()` and aggregated over a given time period.
- `satscan_output/`: Folder to store the required raw SaTScan outputs (cluster center *.col and location ID *.gis files).
- `svi/`: Folder containing Social Vulnerability Index (SVI) scores for the given geographic unit and state.
- `test_data/`: Folder containing relevant files for the simulated test data (see the "Build and Test" section below) for users to test generating their own dashboard files.
Required files: This dashboard requires the following two files, with the corresponding columns, from your SaTScan output:
- Cluster centers (*.col file): LOC_ID*, LATITUDE, LONGITUDE, RADIUS, START_DATE, END_DATE, OBSERVED, EXPECTED, CLUSTER
- Location IDs (*.gis file): LOC_ID*, CLUSTER
* The LOC_ID variables must be expressed as FIPS codes (census tract or county) or ZIP codes to calculate average Social Vulnerability Index values for the dashboard tooltip.
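As a quick sanity check, the minimal sketch below verifies that a *.col export contains the required columns before it is fed to the dashboard; the file name is illustrative.

```r
# Minimal sketch: confirm a SaTScan *.col export has the columns the dashboard
# expects. The file name below is illustrative.
required_col_cols <- c("LOC_ID", "LATITUDE", "LONGITUDE", "RADIUS", "START_DATE",
                       "END_DATE", "OBSERVED", "EXPECTED", "CLUSTER")

col_file <- read.csv("data/satscan_output/CAvax_col_20240123.csv")

missing <- setdiff(required_col_cols, names(col_file))
if (length(missing) > 0) {
  stop("Missing required columns in *.col file: ", paste(missing, collapse = ", "))
}
```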
If you already have these two output files and corresponding columns, you can skip to the Dashboard Implementation section.
If you do not have these files: Follow the steps below to obtain them.
- Analyze your data using SaTScan software, which is available from the SaTScan website or through the `rsatscan` package. This dashboard is currently designed to display only circular clusters.
- Establish the nomenclature and organization of the result files so that analyses of the same input data are found in the same folder and are easily identifiable. For example, our sample dataset contains vaccination data from California, so we name all of our output files with `CAvax`, and all of our SaTScan outputs can be found in the `/data/satscan_output` folder.
- Save the cluster center and location information results in CSV format, either by checking the "Cluster Information" and "Location Information" output options in the SaTScan software or by saving the `$col` and `$gis` objects from running `rsatscan::satscan` in the same folder. Similarly, you need to establish a nomenclature to distinguish these files in the output folder. We use `_col_` and `_gis_`, resulting in files called `CAvax_col_20240123.csv` and `CAvax_gis_20240123.csv`. If you are saving the files from the SaTScan software, you may need to convert them from text files (.txt).
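The sketch below shows one way this `rsatscan` workflow could look. The parameter file name, SaTScan installation path, and output file names are assumptions; adapt them to your own setup.

```r
# Hedged sketch: run SaTScan from R with rsatscan and save the $col and $gis
# outputs as CSVs. Assumes a parameter file (CAvax.prm) already exists and that
# SaTScan is installed locally; all paths and file names are illustrative.
library(rsatscan)

results <- satscan(
  prmlocation     = "data/satscan_output",       # folder containing CAvax.prm
  prmfilename     = "CAvax",                     # parameter file name without .prm
  sslocation      = "C:/Program Files/SaTScan",  # your SaTScan installation path
  ssbatchfilename = "SaTScanBatch64"
)

# Save cluster centers (*.col) and location IDs (*.gis) using the suggested
# _col_ / _gis_ nomenclature so the dashboard functions can find them.
write.csv(results$col, "data/satscan_output/CAvax_col_20240123.csv", row.names = FALSE)
write.csv(results$gis, "data/satscan_output/CAvax_gis_20240123.csv", row.names = FALSE)
```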
Dashboard Implementation:

- Fork this repository.
- Modify the necessary parameter values, starting from line 35 to line 75 of the `global.R` file:
  - Change any file paths and subfolder names necessary under the "File Paths and Subfolders" section.
  - Provide the nomenclature used for your SaTScan outputs under the "Input Files" section.
  - Provide the desired parameter values for the dashboard functions given in the `R` folder. For example, if your analysis is at the census tract level instead of the ZIP code level, change `level="zcta"` to `level="tract"`.
- After specifying the parameter values, running the `app.R` file will automatically run the provided functions in the `global.R` file in the following order: `clean_data()`, `combine_datasets()`, `generate_svi_vars()`, and `clean_combined_df()`. These functions check whether your files are in the correct format, combine them to create a historical file, calculate Social Vulnerability Index (SVI) averages for each cluster, and clean the dataset for the dashboard display. (A hedged sketch of the parameter block and this call order appears after these steps.) More specifically:
  - `clean_data()`: Checks whether the most recently run dataset is in the correct CSV format.
  - `combine_datasets()`: If the most recent dataset is in the correct format, aggregates it over the historical period specified in the parameters above. The default is `time_value=10` and `time_unit="days"`, so the aggregated dataset will contain results from the past 10 days.
  - `generate_svi_vars()`: Given the geographic level of analysis (`level="zcta"`, `level="tract"`, or `level="county"`) and the state of analysis (`state="CA"`), calculates average Social Vulnerability Index (SVI) percentiles for each cluster.
  - `clean_combined_df()`: Reformats the combined dataset (removing unnecessary columns, renaming columns) containing SVI information for the map and table displays on the RShiny dashboard.
- If this is your first time running `app.R` for a given state (the default is `state="CA"`), the `global.R` file will also run either `generate_state_coords()` or `generate_cluster_coords()` depending on the `zoom_level` specified, which generates coordinates to center the leaflet map on the state of the analysis or on the detected clusters. The default for the CAvax data is set to `zoom_level="state"` because the clusters are spread across a large geographic area. The `global.R` file will also run `generate_county_shapes()`, which generates the county boundaries of the state of analysis. If these files are already generated, they will be read in.
- Deploy the dashboard.
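For orientation, here is a hedged sketch of what the parameter block in `global.R` and the resulting call order might look like. The exact variable names, default values, and function signatures may differ in your copy, so treat this as illustrative only.

```r
# Illustrative sketch only -- the real parameter block lives in global.R
# (roughly lines 35-75); exact names, defaults, and whether values are set as
# variables or passed directly as function arguments may differ.

## File paths and subfolders
satscan_output_folder_name <- "satscan_output"  # folder with raw *.col / *.gis CSVs

## Input files
model <- "CAvax"        # nomenclature prefix used for your SaTScan outputs

## Dashboard function parameters
state      <- "CA"      # state of analysis
level      <- "zcta"    # geographic unit: "zcta", "tract", or "county"
zoom_level <- "state"   # orient the leaflet map to the "state" or to the "cluster"
time_value <- 10        # aggregate results over the past ...
time_unit  <- "days"    # ... 10 days

## Preprocessing order triggered by running app.R (calls shown as comments
## because the argument signatures are not reproduced here):
# clean_data()         - check that the newest SaTScan outputs are formatted correctly
# combine_datasets()   - merge *.col / *.gis files over the specified time window
# generate_svi_vars()  - average SVI percentiles per cluster
# clean_combined_df()  - final reformat for the map and table displays
```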
We suggest forking this repository, cloning it into your local environment, and trying to run the `app.R` and `global.R` files with the provided sample data. After becoming familiar with the structure of the dashboard and the underlying functions, try plugging in your SaTScan results and modifying simpler features like the dashboard theme. We also provide a simulated test dataset, detailed below, for testing purposes.
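If you are new to Shiny, one minimal way to launch the dashboard locally after cloning is shown below; it assumes your working directory is the repository root.

```r
# Assumes the working directory is the root of the cloned repository;
# app.R sources global.R to load and preprocess the data before the UI starts.
# install.packages("shiny")  # if not already installed
shiny::runApp(".")
```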
Build and Test:

- We provide a simulated dataset in the `data/test_data` folder, which contains census tract-level analyses for the state of California over a period of 14 days. Poisson-distributed counts are randomly assigned to each census tract and date combination in the `R/generate_test_data.R` file. This allows us to create case and geography files (`ca_tract_case.csv`, `ca_tract_geo.csv`) for the space-time permutation model. We then use the SaTScan software to run the case and geography files in `R/run_test_data.R` to generate example results. The sample results are in the same `data/test_data` folder, called `CAtest_col_20240221.csv` and `CAtest_gis_20240221.csv`, following the nomenclature guidelines provided in the previous section. (A minimal sketch of this kind of simulation appears at the end of this section.)
- Modify the necessary parameter values, starting from line 35 to line 75 in the `global.R` file:
  - File paths:
    - Change `satscan_output_folder_name = "satscan_output"` to `satscan_output_folder_name = "test_data"`
  - Input files:
    - Change `model <- "CAvax"` to `model <- "CAtest"`
  - Dashboard functions:
    - Change `zoom_level <- "cluster"` to `zoom_level <- "state"`
    - Change `level="zcta"` to `level="tract"`
    - Note: the `time_value` and `time_unit` parameters are relevant only when aggregating cluster results over a given time period. For instance, if you run SaTScan analyses daily, the default setting of `time_value=10` and `time_unit="days"` will aggregate the last 10 days of data to display on the dashboard.
- Run the `app.R` file, which will automatically run the provided functions in the `global.R` file in the following order: `clean_data()`, `combine_datasets()`, `generate_svi_vars()`, and `clean_combined_df()`. These functions are detailed in the previous section. The final dataset the dashboard uses will be output in the main `data/` folder as `CAtest_combgiscol_fnl.csv`.
- Deploy the dashboard.
Since every organization has its own internal requirements, we are unable to provide specific guidance on deploying CalCORVID. However, we have provided resources below based on feedback.
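As referenced in the Build and Test steps above, the sketch below shows one way Poisson-distributed counts could be simulated per census tract and date and written out as case and geography files for the space-time permutation model. The tract IDs, coordinates, and rate are placeholders, and the actual `R/generate_test_data.R` script may differ.

```r
# Illustrative sketch only -- R/generate_test_data.R may differ. Tract IDs,
# coordinates, and the Poisson rate below are placeholders.
set.seed(42)

tracts <- data.frame(
  loc_id = c("06001400100", "06001400200", "06001400300"),  # placeholder FIPS codes
  lat    = c(37.867, 37.848, 37.840),
  long   = c(-122.232, -122.250, -122.264)
)
dates <- seq(as.Date("2024-02-08"), as.Date("2024-02-21"), by = "day")  # 14 days

# Poisson-distributed counts for every tract/date combination
cases <- expand.grid(loc_id = tracts$loc_id, date = dates,
                     stringsAsFactors = FALSE)
cases$count <- rpois(nrow(cases), lambda = 2)

# Case file (location, count, date) and geography file (location, lat, long)
case_file <- cases[, c("loc_id", "count", "date")]
geo_file  <- tracts[, c("loc_id", "lat", "long")]

write.csv(case_file, "data/test_data/ca_tract_case.csv", row.names = FALSE)
write.csv(geo_file,  "data/test_data/ca_tract_geo.csv",  row.names = FALSE)
```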
We can't wait to see what you do with this. Please fork, edit, and send the changes you'd like to see back to us as pull requests. Potential extensions include:
- Incorporating different spatial resolutions in addition to the state-level display
- Including other socioeconomic variables
- Supporting other spatiotemporal clustering methods
- Displaying line lists with each cluster
- Supporting non-circular clusters
- Incorporating a quantitative method to track clusters over time (e.g., Moran's I)
A gift from California with love. “Together, all things are possible.” -- Cesar Chavez