Adding codelists to a project
Adding a codelist
Your example research template doesn't include any codelists, but the folder structure and text files needed to include them already exist.
Take a look at the codelists/codelists.txt file in the repo. This file is currently empty, but any codelists that you add to your project will appear here.
You can add a codelist from OpenCodelists to your project either by manually editing the codelists.txt file or by using the opensafely codelists add command.
For example, running the following command in your terminal:
opensafely codelists add https://www.opencodelists.org/codelist/opensafely/covid-identification/2020-06-03/
will add the OpenSAFELY COVID Identification codelist to codelists.txt and also download opensafely-covid-identification.csv and add it to your project.
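To make the link between the URL and the line that ends up in codelists.txt more concrete, here is a minimal Python sketch. It assumes that the entry is simply the part of the URL path after /codelist/, with the trailing slash removed. The helper url_to_entry is hypothetical and is not how the opensafely codelists add command is implemented; it only illustrates the assumed correspondence.

```python
from urllib.parse import urlparse

# Assumption: the codelists.txt entry is the URL path after this prefix.
CODELIST_PATH_PREFIX = "/codelist/"


def url_to_entry(url: str) -> str:
    """Convert an OpenCodelists codelist URL into a codelists.txt entry.

    Hypothetical helper for illustration only.
    """
    path = urlparse(url).path
    if not path.startswith(CODELIST_PATH_PREFIX):
        raise ValueError(f"not an OpenCodelists codelist URL: {url}")
    # Drop the prefix and any surrounding slashes, keeping <codelist-id>/<version-id>.
    return path[len(CODELIST_PATH_PREFIX):].strip("/")


if __name__ == "__main__":
    url = "https://www.opencodelists.org/codelist/opensafely/covid-identification/2020-06-03/"
    print(url_to_entry(url))  # prints opensafely/covid-identification/2020-06-03
```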
Manually editing codelists.txt
Each line that you add to the codelists/codelists.txt file follows this structure: a <codelist-id>, followed by /, followed by a <version-id>:
<codelist-id>/<version-id>
The version ID is a sequence of 8 characters. Some codelists may also have a version tag, in the form of a date (YYYY-MM-DD) or a version number (e.g., v1.2), that can be used in place of the version ID.
To find this information on a codelist's page on OpenCodelists, see the orange boxes in the screenshot below.
Add each entry on its own line in the codelists.txt file.
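As an illustration of the <codelist-id>/<version-id> format, the following Python sketch splits each line of codelists.txt at its last slash (the codelist ID itself may contain slashes) and labels the version as one of the forms mentioned above. The function names are hypothetical, and the assumption that version IDs are alphanumeric comes from the examples on this page; other tag formats may exist and are reported as "other tag".

```python
import re
from typing import NamedTuple

# Version formats mentioned in the text; other tag formats may exist.
DATE_TAG = re.compile(r"\d{4}-\d{2}-\d{2}")    # e.g. 2020-05-06
NUMBER_TAG = re.compile(r"v\d+(?:\.\d+)*")     # e.g. v1.2
VERSION_ID = re.compile(r"[0-9A-Za-z]{8}")     # e.g. 58ac196d (assumed alphanumeric)


class Entry(NamedTuple):
    codelist_id: str
    version: str
    kind: str


def parse_entry(line: str) -> Entry:
    """Split one codelists.txt line into <codelist-id> and <version-id>."""
    text = line.strip()
    # The version is everything after the last "/"; the codelist ID keeps the rest.
    codelist_id, sep, version = text.rpartition("/")
    if not sep or not codelist_id or not version:
        raise ValueError(f"expected <codelist-id>/<version-id>, got {line!r}")
    if DATE_TAG.fullmatch(version):
        kind = "date tag"
    elif NUMBER_TAG.fullmatch(version):
        kind = "version-number tag"
    elif VERSION_ID.fullmatch(version):
        kind = "version ID"
    else:
        kind = "other tag"
    return Entry(codelist_id, version, kind)


def parse_codelists_txt(text: str) -> list[Entry]:
    """Parse a whole codelists.txt file: one entry per non-blank line."""
    return [parse_entry(line) for line in text.splitlines() if line.strip()]
```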
The next time you run the command opensafely codelists update in your terminal, the codelists you specified will be downloaded into the codelists/ subfolder of your project automatically, so you don't need to add these files to your project manually.
For example, the codelists.txt file of a project might consist of four lines:
opensafely/aplastic-anaemia/58ac196d
opensafely/asplenia/3ce9e642
opensafely/current-asthma/2020-05-06
primis-covid19-vacc-uptake/bmi_stage/v1.2
After running the command opensafely codelists update, the following four .csv files will be added to your project:
opensafely-aplastic-anaemia.csv
opensafely-asplenia.csv
opensafely-current-asthma.csv
primis-covid19-vacc-uptake-bmi_stage.csv
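The four filenames above follow a simple pattern: the version is dropped and the remaining parts of the entry are joined with hyphens. The Python sketch below encodes that rule. Note that the rule is inferred from these four examples rather than taken from the opensafely tool, and it has not been checked against user-owned codelists; csv_filename is a hypothetical helper.

```python
def csv_filename(entry: str) -> str:
    """Predict the CSV filename for a codelists.txt entry.

    Rule inferred from the examples: drop the version, join the rest with "-".
    """
    *id_parts, _version = entry.strip().split("/")
    if not id_parts:
        raise ValueError(f"expected <codelist-id>/<version-id>, got {entry!r}")
    return "-".join(id_parts) + ".csv"


if __name__ == "__main__":
    for line in [
        "opensafely/aplastic-anaemia/58ac196d",
        "opensafely/asplenia/3ce9e642",
        "opensafely/current-asthma/2020-05-06",
        "primis-covid19-vacc-uptake/bmi_stage/v1.2",
    ]:
        print(f"{line} -> {csv_filename(line)}")
```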
A codelist may be owned by an individual user rather than an organisation. In this case, the entry in codelists.txt follows this structure: user/<username>/<codelist-id>/<version-id>.
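To show how the two kinds of entry differ, the Python sketch below tells them apart by their leading user/ component and their number of parts. It assumes, as in the examples above, that organisation entries have the form <organisation>/<codelist-slug>/<version-id>; codelist_owner, the username alice and the slug my-codelist are hypothetical.

```python
from typing import NamedTuple


class Owner(NamedTuple):
    kind: str  # "user" or "organisation"
    name: str


def codelist_owner(entry: str) -> Owner:
    """Report who owns the codelist named by a codelists.txt entry (sketch)."""
    parts = entry.strip().split("/")
    # user/<username>/<codelist-id>/<version-id>
    if len(parts) == 4 and parts[0] == "user":
        return Owner("user", parts[1])
    # Assumed organisation form: <organisation>/<codelist-slug>/<version-id>
    if len(parts) == 3:
        return Owner("organisation", parts[0])
    raise ValueError(f"unrecognised codelists.txt entry: {entry!r}")
```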
If necessary, during initial development you can even import codelists this way before they are published, provided they have been put "under review" rather than left in "draft" state. However, make sure they are finalised and updated in your study before running it against the real data.
Adding/updating a codelist CSV file
Once you have listed the codelists you need from OpenCodelists in the codelists.txt file, you can download the specified files into the codelists/ folder using the opensafely program by running:
opensafely codelists update
This command should be re-run every time a codelist is added to or removed from the codelists.txt file. Running it automatically generates a file called codelists.json, which contains a dictionary of the codelist files, the URLs they were downloaded from, their download dates, and SHA (secure hash algorithm) hashes of their contents. This file should not be edited manually; however, you will need to add and commit the change and push it to GitHub. If you don't, or if a newer version is available than the one committed, the automated tests will fail with an error message.
Beware that on Windows, if one or more of these codelist files is open, this command won't be able to run; close them first.
If necessary, you can also import CSVs from outside OpenCodelists by manually copying the CSV files into codelists/. However, we recommend uploading these to OpenCodelists and importing them as described above. Note that if you are also using codelists from OpenCodelists, any manually imported codelists should be stored in a local_codelists folder instead, so that they are not overwritten: manual changes to CSV files in codelists/ will be clobbered the next time opensafely codelists update is run.
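Before running an update, it can help to spot CSVs in codelists/ that don't correspond to any entry in codelists.txt, since those are likely to be manually imported files that belong in local_codelists. The Python sketch below does this by predicting each entry's filename (dropping the version and joining the rest with hyphens, a rule inferred from the examples on this page) and reporting any CSV that isn't expected. The function names are hypothetical.

```python
from pathlib import Path


def expected_csv_name(entry: str) -> str:
    """Predicted CSV filename for an entry (rule inferred from the examples)."""
    *id_parts, _version = entry.strip().split("/")
    return "-".join(id_parts) + ".csv"


def unlisted_csvs(codelists_txt: str, csv_names: list[str]) -> list[str]:
    """CSV names that no codelists.txt entry accounts for."""
    expected = {
        expected_csv_name(line) for line in codelists_txt.splitlines() if line.strip()
    }
    return sorted(name for name in csv_names if name not in expected)


def find_unlisted(project_dir: Path) -> list[str]:
    """Apply unlisted_csvs to a project's codelists/ folder."""
    codelists_dir = project_dir / "codelists"
    txt = (codelists_dir / "codelists.txt").read_text()
    return unlisted_csvs(txt, [p.name for p in codelists_dir.glob("*.csv")])
```

Any name returned by find_unlisted(Path(".")) is a candidate to move into local_codelists, assuming the filename rule above holds for your entries.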
For more on using codelists in your study definition, see Working with codelists.