| Title: | Epidemiology Data Dictionaries and Random Data Generators |
|---|---|
| Description: | The 'R4EPIs' project <https://r4epi.github.io/sitrep/> seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from Medecins Sans Frontieres Operational Centre Amsterdam for outbreak scenarios (Acute Jaundice Syndrome, Cholera, Diphtheria, Measles, Meningitis) and surveys (Retrospective mortality and access to care, Malnutrition, Vaccination coverage and Event Based Surveillance) - as described in the following <https://scienceportal.msf.org/assets/standardised-mortality-surveys?utm_source=chatgpt.com>. In addition, a data generator from these dictionaries is provided. It is also possible to read in any Open Data Kit format data dictionary. |
| Authors: | Alexander Spina [aut, cre] (ORCID: <https://orcid.org/0000-0001-8425-1867>), Zhian N. Kamvar [aut] (ORCID: <https://orcid.org/0000-0003-1458-7108>), Lukas Richter [aut], Patrick Keating [aut], Annick Lenglet [ctb], Applied Epi Incorporated [cph], Medecins Sans Frontieres Operational Centre Amsterdam [fnd] |
| Maintainer: | Alexander Spina <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.0 |
| Built: | 2026-05-20 08:48:45 UTC |
| Source: | https://github.com/r4epi/epidict |
Dictionary-based helper for aligning your data to variables used in a script
dict_rename_helper(dictionary, varnames, varnames_type, rmd, copy_to_clipboard = TRUE)
dictionary |
A dataframe of the dictionary which you would like to use. |
varnames |
The name of |
varnames_type |
The name of |
rmd |
Path to the Rmarkdown file which you would like to compare to. |
copy_to_clipboard |
if |
A dplyr command used to rename columns in your data frame according to the dictionary
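To make the renaming step concrete, here is a minimal sketch of what aligning column names to a dictionary amounts to. The data frame, the `mapping` vector and the column names are all hypothetical, and the exact text the helper emits is not shown in this documentation; the `dplyr::rename()` call in the comment only illustrates the general `new_name = old_name` syntax. The executable part uses base R so that it runs without extra packages.

```r
# Toy data with non-standard column names (hypothetical).
dat <- data.frame(
  Age = c(34, 5, 61),
  Sex = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping: names are the dictionary variables,
# values are the matching columns in your own data.
# With dplyr, the equivalent command would be written as
#   dplyr::rename(dat, age_years = Age, sex = Sex)
mapping <- c(age_years = "Age", sex = "Sex")

# Base R equivalent of that rename.
idx <- match(mapping, names(dat))
names(dat)[idx] <- names(mapping)

print(names(dat))
```

After this step the data frame uses the dictionary's variable names, which is what a script written against the dictionary would expect.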
Based on a dictionary generator like msf_dict(),
this function generates a randomized dataset from the values defined in
the dictionaries. The randomized dataset should mimic an Excel
export from DHIS2 or ODK.
gen_data(dictionary, varnames = "name", numcases = 300, org = "MSF")
dictionary |
Specify which dictionary you would like to use. |
varnames |
Specify name of column that contains variable names.
If |
numcases |
Specify the number of cases you want (default is 300) |
org |
Specify the organisation which the dictionary belongs to. Currently, only MSF exists. In the future, dictionaries from WHO and other organizations may become available. |
A data frame with cases in rows and variables in columns. The number of columns will vary from dictionary to dictionary, so please use the dictionary functions to generate a corresponding dictionary.
if (require("dplyr") & require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to
    # human-readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(dictionary = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to filter the data:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
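The idea of generating random data from a dictionary can be illustrated with a small base R sketch. This is not the package's implementation: `toy_dict`, `toy_gen()` and the option codes are hypothetical, and the sketch only covers variables that have a fixed set of options, assuming that each value is drawn uniformly at random.

```r
set.seed(1)

# Hypothetical long-format dictionary: one row per variable-option pair.
toy_dict <- data.frame(
  data_element_shortname = c("sex", "sex", "outcome", "outcome", "outcome"),
  option_code = c("M", "F", "DD", "DC", "LF"),
  stringsAsFactors = FALSE
)

# Draw `numcases` random codes per variable from that variable's options.
toy_gen <- function(dict, numcases = 300) {
  codes_by_var <- split(dict$option_code, dict$data_element_shortname)
  cols <- lapply(codes_by_var, function(codes) {
    sample(codes, size = numcases, replace = TRUE)
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

toy_dat <- toy_gen(toy_dict, numcases = 20)
str(toy_dat)
```

The result has one row per simulated case and one column per dictionary variable, which matches the shape described for the return value of gen_data().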
These functions produce MSF dictionaries based on DHIS2 (for OCA outbreaks) and ODK (for intersectional outbreaks and surveys) data sets, defining the data element name, code, short names, types, and key/value pairs for translating the codes into human-readable format.
msf_dict(dictionary, tibble = TRUE, long = TRUE, compact = TRUE, clean = TRUE)
dictionary |
Specify which dictionary you would like to use.
|
tibble |
If |
long |
If |
compact |
If |
clean |
If |
A data frame (tibble) containing the specified MSF data dictionary.
If long = TRUE, each variable-option pair is represented as a row.
If compact = TRUE, the options are nested as a data frame column named
"options". If long = FALSE, a list is returned with two data frames:
dictionary and options.
See also: read_dict(), gen_data(), matchmaker::match_df()
if (require("dplyr") & require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to
    # human-readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(dictionary = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to filter the data:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
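The three return shapes described above (one row per variable-option pair, options nested in an "options" column, and a list with dictionary and options) can be pictured with a toy example in base R. Everything here is hypothetical: the `type` column, the option values and the object names are made up for illustration, and the sketch does not show how the `long` and `compact` arguments interact inside msf_dict().

```r
# Hypothetical long-format dictionary: one row per variable-option pair.
long <- data.frame(
  data_element_shortname = c("sex", "sex", "fever", "fever"),
  type = c("TEXT", "TEXT", "TEXT", "TEXT"),
  option_code = c("M", "F", "1", "0"),
  option_name = c("Male", "Female", "Yes", "No"),
  stringsAsFactors = FALSE
)

var_cols <- c("data_element_shortname", "type")
opt_cols <- c("option_code", "option_name")

# List shape: one table of variables and one table of options.
as_list <- list(
  dictionary = unique(long[var_cols]),
  options = long[c("data_element_shortname", opt_cols)]
)

# Compact shape: one row per variable, with that variable's options
# stored as a small data frame inside a list column named "options".
compact <- unique(long[var_cols])
rownames(compact) <- NULL
compact$options <- lapply(compact$data_element_shortname, function(v) {
  out <- long[long$data_element_shortname == v, opt_cols]
  rownames(out) <- NULL
  out
})

str(compact, max.level = 1)
print(compact$options[[1]])
```

The long shape is convenient for joins and recoding, while the compact shape keeps one row per variable and is easier to scan by eye.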
Helper for aligning your data to MSF standardised dictionaries and analysis templates.
msf_dict_rename_helper(dictionary, copy_to_clipboard = TRUE)
dictionary |
Specify which MSF dictionary you would like to use.
See |
copy_to_clipboard |
if |
A dplyr command used to rename columns in your data frame according to the dictionary
These functions read dictionaries in ODK and DHIS2 formats and reformat them for recoding datasets into human-readable format.
read_dict(path, sheet, format, tibble = TRUE, long = TRUE, compact = TRUE, clean = TRUE)
path |
Define the path to the .xlsx file where the dictionary is stored |
sheet |
Optional. If your sheets have non-standard names (e.g. using a disease prefix), this can be specified here. |
format |
The format which the dictionary is in. Currently supports "DHIS2" and "ODK". |
tibble |
If |
long |
If |
compact |
If |
clean |
If |
If long = TRUE, returns a tibble of the merged dictionary and
value options. If long = FALSE, returns a list with elements dictionary
and options. If compact = TRUE, options are nested as a column of
data frames under "options".
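Recoding a dataset into human-readable format with a long dictionary comes down to a per-variable lookup from option code to option name. The sketch below does this in base R as a stand-in for matchmaker::match_df(); it is not that function's implementation. The helper `recode_with_dict()`, the toy dictionary and the toy data are hypothetical, and the sketch assumes that values without a matching code should be left unchanged.

```r
# Hypothetical long-format dictionary and coded data.
dict <- data.frame(
  data_element_shortname = c("sex", "sex", "fever", "fever"),
  option_code = c("M", "F", "1", "0"),
  option_name = c("Male", "Female", "Yes", "No"),
  stringsAsFactors = FALSE
)
dat <- data.frame(
  sex = c("F", "M", "F"),
  fever = c("1", "0", "1"),
  stringsAsFactors = FALSE
)

# Recode every column that has entries in the dictionary.
recode_with_dict <- function(dat, dict) {
  for (v in intersect(names(dat), dict$data_element_shortname)) {
    opts <- dict[dict$data_element_shortname == v, ]
    lookup <- setNames(opts$option_name, opts$option_code)
    recoded <- unname(lookup[as.character(dat[[v]])])
    # Keep the original value where no code matches.
    dat[[v]] <- ifelse(is.na(recoded), as.character(dat[[v]]), recoded)
  }
  dat
}

dat_clean <- recode_with_dict(dat, dict)
print(dat_clean)
```

Using the long shape here keeps the lookup simple, since each row already pairs a variable with one code and its label.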