Scripts

The data processing steps are implemented as a set of scripts in the analysis/scripts folder. The following flowchart shows how the scripts and data files relate to one another, leading to the final packaged dataset:

Processing flowchart. Green boxes are data files and gray ellipses are scripts. Text in angle brackets depends on options provided to the script.


The scripts are typically run indirectly via make, but instructions for running them individually can be found at the top of each script.

Data files are stored under analysis/raw, analysis/intermediate, and analysis/out. Most intermediate files are stored in SQLite format so that the viewer app can query them easily.
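Because the intermediate files are ordinary SQLite databases, they can also be inspected outside the viewer app. The following is a minimal sketch using Python's standard sqlite3 module. The helper names (list_tables, preview, make_demo_db) and the "samples" table are hypothetical and are not part of the project; the demo builds a throwaway database in a temporary directory rather than assuming any real file name under analysis/intermediate.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def preview(db_path, table, limit=5):
    """Return up to `limit` rows from `table` as a list of tuples."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the tables that actually exist before building the query.
    if table not in list_tables(db_path):
        raise ValueError(f"no such table: {table}")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            f'SELECT * FROM "{table}" LIMIT ?', (limit,)
        ).fetchall()


def make_demo_db(db_path):
    """Create a small stand-in database (hypothetical schema, demo only)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
        conn.executemany(
            "INSERT INTO samples VALUES (?, ?)",
            [(1, "a"), (2, "b"), (3, "c")],
        )
        conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.sqlite")
        make_demo_db(demo_path)
        print(list_tables(demo_path))
        print(preview(demo_path, "samples", limit=2))
```

To look at a real intermediate file, pass its path to list_tables first to discover which tables it contains, then call preview on one of them.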

Configuration

Many aspects of the data processing are controlled by a collection of CSV files stored under analysis/config. The files are organized like a normalized relational database, as shown below.

Data model of the configuration files. Each arrow is a foreign key relationship. For example, A->B can be read as "each A has a B".

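To make the foreign key idea concrete, the sketch below treats two CSV tables as A and B, where each row of A names a row of B, and reports any rows of A whose reference cannot be resolved. The table contents and the column names (unit_id, id, label) are invented for illustration and do not correspond to actual files under analysis/config; the CSV text is inlined so that the example is self-contained.

```python
import csv
import io

# Hypothetical table A: each variable has a unit (variables -> units).
VARIABLES_CSV = """name,unit_id
height,cm
weight,kg
age,yr
"""

# Hypothetical table B: the units that variables may refer to.
UNITS_CSV = """id,label
cm,centimetres
kg,kilograms
"""


def read_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_references(children, fk_column, parents, pk_column):
    """Return the child rows whose foreign key matches no parent row."""
    known_keys = {row[pk_column] for row in parents}
    return [row for row in children if row[fk_column] not in known_keys]


def join_rows(children, fk_column, parents, pk_column):
    """Pair each child row with the parent row it refers to.

    Child rows with an unresolved reference are skipped.
    """
    by_key = {row[pk_column]: row for row in parents}
    return [
        (child, by_key[child[fk_column]])
        for child in children
        if child[fk_column] in by_key
    ]


variables = read_csv_rows(VARIABLES_CSV)
units = read_csv_rows(UNITS_CSV)

# "age" refers to unit "yr", which is not defined in the units table.
dangling = missing_references(variables, "unit_id", units, "id")
resolved = join_rows(variables, "unit_id", units, "id")
```

A check of this kind is one way to catch a configuration row that points at a missing entry before the processing scripts run; whether the project performs such validation itself is not stated here.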