Package 'epikit' reference manual

Title:	Miscellaneous Helper Tools for Epidemiologists
Description:	Contains tools for formatting inline code, renaming redundant columns, aggregating age categories, adding survey weights, finding the earliest date of an event, plotting z-curves, generating population counts and calculating proportions with confidence intervals. This is part of the 'R4Epis' project <https://r4epis.netlify.app/>.
Authors:	Alexander Spina [aut] , Zhian N. Kamvar [aut, cre] , Dirk Schumacher [aut], Kate Doyle [ctb]
Maintainer:	Zhian N. Kamvar <[email protected]>
License:	GPL-3
Version:	0.1.6
Built:	2025-03-06 06:20:05 UTC
Source:	https://github.com/r4epi/epikit

Add a column of cluster survey weights to a data frame.

Description

For use in surveys where you took a sample population out of a larger source population, with a cluster survey design.

Usage

add_weights_cluster(
  x,
  cl,
  eligible,
  interviewed,
  cluster_x = NULL,
  cluster_cl = NULL,
  household_x = NULL,
  household_cl = NULL,
  ignore_cluster = TRUE,
  ignore_household = TRUE,
  surv_weight = "surv_weight",
  surv_weight_ID = "surv_weight_ID"
)
add_weights_cluster(
  x,
  cl,
  eligible,
  interviewed,
  cluster_x = NULL,
  cluster_cl = NULL,
  household_x = NULL,
  household_cl = NULL,
  ignore_cluster = TRUE,
  ignore_household = TRUE,
  surv_weight = "surv_weight",
  surv_weight_ID = "surv_weight_ID"
)

Arguments

`x`	a data frame of survey data
`cl`	a data frame containing a list of clusters and the number of households in each.
`eligible`	the column in `x` which specifies the number of people eligible for being interviewed in that household. (e.g. the total number of children)
`interviewed`	the column in `x` which specifies the number of people actually interviewed in that household.
`cluster_x`	the column in `x` that indicates which cluster rows belong to. Ignored if `ignore_cluster` is TRUE.
`cluster_cl`	the column in `cl` that lists all possible clusters. Ignored if `ignore_cluster` is TRUE.
`household_x`	the column in `x` that indicates a unique household identifier. Ignored if `ignore_household` is TRUE.
`household_cl`	the column in `cl` that lists the number of households per cluster. Ignored if `ignore_household` is TRUE.
`ignore_cluster`	If TRUE (default), set the weight for clusters to be 1. This assumes that your sample was taken in a way which is a close approximation of a simple random sample. Ignores inputs from `cluster_cl` as well as `cluster_x`.
`ignore_household`	If TRUE (default), set the weight for households to be 1. This assumes that your sample of households was takenin a way which is a close approximation of a simple random sample. Ignores inputs from `household_cl` and `household_x`.
`surv_weight`	the name of the new column to store the weights. Defaults to "surv_weight".
`surv_weight_ID`	the name of the new ID column to be created. Defaults to "surv_weight_ID"

Details

Will multiply the inverse chances of a cluster being selected, a household being selected within a cluster, and an individual being selected within a household.

As follows:

((clusters available) / (clusters surveyed)) *
((households in each cluster) / (households surveyed in each cluster)) *
((individuals eligible in each household) / (individuals interviewed))

In the case where both ignore_cluster and ignore_household are TRUE, this will simply be:

1 * 1 * (individuals eligible in each household) / (individuals interviewed)

Author(s)

Alex Spina, Zhian N. Kamvar, Lukas Richter

Examples



# define a fake dataset of survey data
# including household and individual information
x <- data.frame(stringsAsFactors=FALSE,
         cluster = c("Village A", "Village A", "Village A", "Village A",
                     "Village A", "Village B", "Village B", "Village B"),
    household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
      eligible_n = c(6, 6, 6, 6, 6, 3, 3, 3),
      surveyed_n = c(4, 4, 4, 4, 4, 3, 3, 3),
   individual_id = c(1, 2, 3, 4, 4, 1, 2, 3),
         age_grp = c("0-10", "20-30", "30-40", "50-60", "50-60", "20-30",
                     "50-60", "30-40"),
             sex = c("Male", "Female", "Male", "Female", "Female", "Male",
                     "Female", "Female"),
         outcome = c("Y", "Y", "N", "N", "N", "N", "N", "Y")
)

# define a fake dataset of cluster listings
# including cluster names and number of households
cl <- tibble::tribble(
     ~cluster, ~n_houses,
  "Village A",        23,
  "Village B",        42,
  "Village C",        56,
  "Village D",        38
)


# add weights to a cluster sample
# include weights for cluster, household and individual levels
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster,  household_x = household_id,
                    ignore_cluster = FALSE, ignore_household = FALSE)


# add weights to a cluster sample
# ignore weights for cluster and household level (set equal to 1)
# only include weights at individual level
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster,  household_x = household_id,
                    ignore_cluster = TRUE, ignore_household = TRUE)

# define a fake dataset of survey data
# including household and individual information
x <- data.frame(stringsAsFactors=FALSE,
         cluster = c("Village A", "Village A", "Village A", "Village A",
                     "Village A", "Village B", "Village B", "Village B"),
    household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
      eligible_n = c(6, 6, 6, 6, 6, 3, 3, 3),
      surveyed_n = c(4, 4, 4, 4, 4, 3, 3, 3),
   individual_id = c(1, 2, 3, 4, 4, 1, 2, 3),
         age_grp = c("0-10", "20-30", "30-40", "50-60", "50-60", "20-30",
                     "50-60", "30-40"),
             sex = c("Male", "Female", "Male", "Female", "Female", "Male",
                     "Female", "Female"),
         outcome = c("Y", "Y", "N", "N", "N", "N", "N", "Y")
)

# define a fake dataset of cluster listings
# including cluster names and number of households
cl <- tibble::tribble(
     ~cluster, ~n_houses,
  "Village A",        23,
  "Village B",        42,
  "Village C",        56,
  "Village D",        38
)


# add weights to a cluster sample
# include weights for cluster, household and individual levels
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster,  household_x = household_id,
                    ignore_cluster = FALSE, ignore_household = FALSE)


# add weights to a cluster sample
# ignore weights for cluster and household level (set equal to 1)
# only include weights at individual level
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster,  household_x = household_id,
                    ignore_cluster = TRUE, ignore_household = TRUE)

Add a column of stratified survey weights to a data frame. For use in surveys where you took a sample population out of a larger source population, with a simple-random or stratified survey design.

Description

Creates weight based on dividing stratified population counts from the source population by surveyed counts in the sample population.

Usage

add_weights_strata(
  x,
  p,
  ...,
  population = population,
  surv_weight = "surv_weight",
  surv_weight_ID = "surv_weight_ID"
)
add_weights_strata(
  x,
  p,
  ...,
  population = population,
  surv_weight = "surv_weight",
  surv_weight_ID = "surv_weight_ID"
)

Arguments

`x`	a data frame of survey data
`p`	a data frame containing population data for groups in `...`
`...`	shared grouping columns across both `x` and `p`. These are used to match the weights to the correct subset of the population.
`population`	the column in `p` that defines the population numbers
`surv_weight`	the name of the new column to store the weights. Defaults to "surv_weight".
`surv_weight_ID`	the name of the new ID column to be created. Defaults to "surv_weight_ID"

Author(s)

Zhian N. Kamvar Alex Spina Lukas Richter

Examples


# define a fake dataset of survey data
# including household and individual information
x <- data.frame(stringsAsFactors=FALSE,
         cluster = c("Village A", "Village A", "Village A", "Village A",
                     "Village A", "Village B", "Village B", "Village B"),
    household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
     eligibile_n = c(6, 6, 6, 6, 6, 3, 3, 3),
      surveyed_n = c(4, 4, 4, 4, 4, 3, 3, 3),
   individual_id = c(1, 2, 3, 4, 4, 1, 2, 3),
         age_grp = c("0-10", "20-30", "30-40", "50-60", "50-60", "20-30",
                     "50-60", "30-40"),
             sex = c("Male", "Female", "Male", "Female", "Female", "Male",
                     "Female", "Female"),
         outcome = c("Y", "Y", "N", "N", "N", "N", "N", "Y")
)

# define a fake population data set
# including age group, sex, counts and proportions
p <- epikit::gen_population(total = 10000,
  groups = c("0-10", "10-20", "20-30", "30-40", "40-50", "50-60"),
  proportions = c(0.1, 0.2, 0.3, 0.4, 0.2, 0.1))

  # make sure col names match survey dataset
p <- dplyr::rename(p, age_grp = groups, sex = strata, population = n)

# add weights to a stratified simple random sample
# weight based on age group and sex
add_weights_strata(x, p = p, age_grp, sex, population = population)

# define a fake dataset of survey data
# including household and individual information
x <- data.frame(stringsAsFactors=FALSE,
         cluster = c("Village A", "Village A", "Village A", "Village A",
                     "Village A", "Village B", "Village B", "Village B"),
    household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
     eligibile_n = c(6, 6, 6, 6, 6, 3, 3, 3),
      surveyed_n = c(4, 4, 4, 4, 4, 3, 3, 3),
   individual_id = c(1, 2, 3, 4, 4, 1, 2, 3),
         age_grp = c("0-10", "20-30", "30-40", "50-60", "50-60", "20-30",
                     "50-60", "30-40"),
             sex = c("Male", "Female", "Male", "Female", "Female", "Male",
                     "Female", "Female"),
         outcome = c("Y", "Y", "N", "N", "N", "N", "N", "Y")
)

# define a fake population data set
# including age group, sex, counts and proportions
p <- epikit::gen_population(total = 10000,
  groups = c("0-10", "10-20", "20-30", "30-40", "40-50", "50-60"),
  proportions = c(0.1, 0.2, 0.3, 0.4, 0.2, 0.1))

  # make sure col names match survey dataset
p <- dplyr::rename(p, age_grp = groups, sex = strata, population = n)

# add weights to a stratified simple random sample
# weight based on age group and sex
add_weights_strata(x, p = p, age_grp, sex, population = population)

Create an age group variable

Description

Create an age group variable

Usage

age_categories(
  x,
  breakers = NULL,
  lower = 0,
  upper = NULL,
  by = 10,
  separator = "-",
  ceiling = FALSE,
  above.char = "+"
)

group_age_categories(
  dat,
  years = NULL,
  months = NULL,
  weeks = NULL,
  days = NULL,
  one_column = TRUE,
  drop_empty_overlaps = TRUE
)
age_categories(
  x,
  breakers = NULL,
  lower = 0,
  upper = NULL,
  by = 10,
  separator = "-",
  ceiling = FALSE,
  above.char = "+"
)

group_age_categories(
  dat,
  years = NULL,
  months = NULL,
  weeks = NULL,
  days = NULL,
  one_column = TRUE,
  drop_empty_overlaps = TRUE
)

Arguments

`x`	Your age variable
`breakers`	A string. Age category breaks you can define within c(). Alternatively use "lower", "upper" and "by" to set these breaks based on a sequence.
`lower`	A number. The lowest age value you want to consider (default is 0)
`upper`	A number. The highest age value you want to consider
`by`	A number. The number of years you want between groups
`separator`	A character that you want to have between ages in group names. The default is "-" producing e.g. 0-10.
`ceiling`	A TRUE/FALSE variable. Specify whether you would like the highest value in your breakers, or alternatively the upper value specified, to be the endpoint. This would produce the highest group of "70-80" rather than "80+". The default is FALSE (to produce a group of 80+).
`above.char`	Only considered when ceiling == FALSE. A character that you want to have after your highest age group. The default is "+" producing e.g. 80+
`dat`	a data frame with at least one column defining an age category
`years`, `months`, `weeks`, `days`	the bare name of the column defining years, months, weeks, or days (or NULL if the column doesn't exist)
`one_column`	if `TRUE` (default), the categories will be joined into a single column called "age_category" that appends the type of age category used. If `FALSE`, there will be one column with the grouped age categories called "age_category" and a second column indicating age unit called "age_unit".
`drop_empty_overlaps`	if `TRUE`, unused levels are dropped if they have been replaced by a more fine-grained definition and are empty. Practically, this means that the first level for years, months, and weeks are in consideration for being removed via `forcats::fct_drop()`

Value

a factor representing age ranges, open at the upper end of the range.

a data frame

Examples



if (interactive() && require("dplyr") && require("epidict")) {
withAutoprint({
set.seed(50)
dat <- epidict::gen_data("Cholera", n = 100, org = "MSF")
ages <- dat %>%
  select(starts_with("age")) %>%
  mutate(age_years = age_categories(age_years, breakers = c(0, 5, 10, 15, 20))) %>%
  mutate(age_months = age_categories(age_months, breakers = c(0, 5, 10, 15, 20))) %>%
  mutate(age_days = age_categories(age_days, breakers = c(0, 5, 15)))

ages %>%
  group_age_categories(years = age_years, months = age_months, days = age_days) %>%
  pull(age_category) %>%
  table()
})
}
if (interactive() && require("dplyr") && require("epidict")) {
withAutoprint({
set.seed(50)
dat <- epidict::gen_data("Cholera", n = 100, org = "MSF")
ages <- dat %>%
  select(starts_with("age")) %>%
  mutate(age_years = age_categories(age_years, breakers = c(0, 5, 10, 15, 20))) %>%
  mutate(age_months = age_categories(age_months, breakers = c(0, 5, 10, 15, 20))) %>%
  mutate(age_days = age_categories(age_days, breakers = c(0, 5, 15)))

ages %>%
  group_age_categories(years = age_years, months = age_months, days = age_days) %>%
  pull(age_category) %>%
  table()
})
}

create factors from numbers

Description

If the number of unique numbers is five or fewer, then they will simply be converted to factors in order, otherwise, they will be passed to cut and pretty, preserving the lowest value.

Usage

fac_from_num(x)
fac_from_num(x)

Arguments

`x`	a vector of integers or numerics

Value

a factor

Examples

fac_from_num(1:100)
fac_from_num(sample(100, 5))
fac_from_num(1:100)
fac_from_num(sample(100, 5))

Automatically calculate breaks for a number

Description

Automatically calculate breaks for a number

Usage

find_breaks(n, breaks = 4, snap = 1, ceiling = FALSE)
find_breaks(n, breaks = 4, snap = 1, ceiling = FALSE)

Arguments

`n`	a number to calcluate breaks for
`breaks`	the maximum number of segments you want to have
`snap`	the number defining where to snap to the nearest factor
`ceiling`	if `TRUE`, n is included in the breaks

Value

a vector of integers

Examples


# find four breaks from 1 to 100
find_breaks(100)

# find four breaks from 1 to 123, rounding to the nearest 20
find_breaks(123, snap = 20)

# note that there are only three breaks here because of the rounding
find_breaks(123, snap = 25)

# Include the value itself
find_breaks(123, snap = 25, ceiling = TRUE)
# find four breaks from 1 to 100
find_breaks(100)

# find four breaks from 1 to 123, rounding to the nearest 20
find_breaks(123, snap = 20)

# note that there are only three breaks here because of the rounding
find_breaks(123, snap = 25)

# Include the value itself
find_breaks(123, snap = 25, ceiling = TRUE)

Find the first date beyond a cutoff in several columns

Description

This function will find the first date in an orderd series of columns that is either before or after a cutoff date, inclusive.

Usage

find_date_cause(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "start_date",
  datereason = "start_date_reason",
  na_fill = "start"
)

find_start_date(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "start_date",
  datereason = "start_date_reason"
)

find_end_date(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "end_date",
  datereason = "end_date_reason"
)

constrain_dates(i, period_start, period_end, boundary = "both")

assert_positive_timespan(x, date_start, date_end)
find_date_cause(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "start_date",
  datereason = "start_date_reason",
  na_fill = "start"
)

find_start_date(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "start_date",
  datereason = "start_date_reason"
)

find_end_date(
  x,
  ...,
  period_start = NULL,
  period_end = NULL,
  datecol = "end_date",
  datereason = "end_date_reason"
)

constrain_dates(i, period_start, period_end, boundary = "both")

assert_positive_timespan(x, date_start, date_end)

Arguments

`x`	a data frame
`...`	an ordered series of date columns (i.e. the most important date to be considered first).
`period_start`, `period_end`	for the find_ functions, this should be the name of a column in `x` that contains the start/end of the recall period. For `constrain_dates`, this should be a vector of dates.
`datecol`	the name of the new column to contain the dates
`datereason`	the name of the column to contain the name of the column from which the date came.
`na_fill`	one of either "before" or "after" indicating that the new column should only contain dates before or after the cutoff date.
`i`	a vector of dates
`boundary`	one of "both", "start", or "end". Dates outside of the boundary will be set to NA.
`date_start`, `date_end`	column name of a date vector

Examples

d <- data.frame(
  s1 = c(as.Date("2013-01-01") + 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  s2 = c(as.Date("2013-02-01") + 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  s3 = c(as.Date("2013-01-10") - 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  ps = as.Date("2012-12-31"),
  pe = as.Date("2013-01-09")
)
print(dd <- find_date_cause(d, s1, s2, s3, period_start = ps, period_end = pe))
print(bb <- find_date_cause(d, s1, s2, s3, period_start = ps, period_end = pe,
                            na_fill = "end", 
                            datecol = "enddate",
                            datereason = "endcause"))
find_date_cause(d, s3, s2, s1, period_start = ps, period_end = pe)

# works
assert_positive_timespan(dd, start_date, pe)

# returns a warning because the last date isn't later than the start_date
assert_positive_timespan(dd, start_date, s2)


with(d, constrain_dates(s1, ps, pe))
with(d, constrain_dates(s2, ps, pe))
with(d, constrain_dates(s3, ps, pe))

d <- data.frame(
  s1 = c(as.Date("2013-01-01") + 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  s2 = c(as.Date("2013-02-01") + 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  s3 = c(as.Date("2013-01-10") - 0:10, as.Date(c("2012-01-01", "2014-01-01"))),
  ps = as.Date("2012-12-31"),
  pe = as.Date("2013-01-09")
)
print(dd <- find_date_cause(d, s1, s2, s3, period_start = ps, period_end = pe))
print(bb <- find_date_cause(d, s1, s2, s3, period_start = ps, period_end = pe,
                            na_fill = "end", 
                            datecol = "enddate",
                            datereason = "endcause"))
find_date_cause(d, s3, s2, s1, period_start = ps, period_end = pe)

# works
assert_positive_timespan(dd, start_date, pe)

# returns a warning because the last date isn't later than the start_date
assert_positive_timespan(dd, start_date, s2)


with(d, constrain_dates(s1, ps, pe))
with(d, constrain_dates(s2, ps, pe))
with(d, constrain_dates(s3, ps, pe))

Helper to format confidence interval for text

Description

This function is mainly used for placing in the text fields of Rmarkdown reports.

Usage

fmt_ci(
  e = numeric(),
  l = numeric(),
  u = numeric(),
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_pci(
  e = numeric(),
  l = numeric(),
  u = numeric(),
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_pci_df(
  x,
  e = 3,
  l = e + 1,
  u = e + 2,
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_ci_df(
  x,
  e = 3,
  l = e + 1,
  u = e + 2,
  digits = 2,
  percent = TRUE,
  separator = "-"
)
fmt_ci(
  e = numeric(),
  l = numeric(),
  u = numeric(),
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_pci(
  e = numeric(),
  l = numeric(),
  u = numeric(),
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_pci_df(
  x,
  e = 3,
  l = e + 1,
  u = e + 2,
  digits = 2,
  percent = TRUE,
  separator = "-"
)

fmt_ci_df(
  x,
  e = 3,
  l = e + 1,
  u = e + 2,
  digits = 2,
  percent = TRUE,
  separator = "-"
)

Arguments

`e`	the column of the estimate (defaults to the third column). Otherwise, a number
`l`	the column of the lower bound (defaults to the fourth column). Otherwise, a number
`u`	the column of the upper bound (defaults to the fifth column), otherwise, a number
`digits`	the number of digits to show
`percent`	if `TRUE` (default), converts the number to percent, otherwise it's treated as a raw value
`separator`	what to separate lower and upper confidence intervals with, default is "-"
`x`	a data frame

Value

a text string in the format of "e\

Examples


cfr <- data.frame(x = 1, y = 2, est = 0.5, lower = 0.25, upper = 0.75)
fmt_pci_df(cfr)

# If the data starts at a different column, specify a different number
fmt_pci_df(cfr[-1], 2, d = 1)

# It's also possible to provide numbers directly and remove the percent sign.
fmt_ci(pi, pi - runif(1), pi + runif(1), percent = FALSE)
cfr <- data.frame(x = 1, y = 2, est = 0.5, lower = 0.25, upper = 0.75)
fmt_pci_df(cfr)

# If the data starts at a different column, specify a different number
fmt_pci_df(cfr[-1], 2, d = 1)

# It's also possible to provide numbers directly and remove the percent sign.
fmt_ci(pi, pi - runif(1), pi + runif(1), percent = FALSE)

Counts and proportions inline

Description

These functions will give proportions for different variables inline.

Usage

fmt_count(x, ...)
fmt_count(x, ...)

Arguments

`x`	a data frame
`...`	an expression or series of expressions to pass to `dplyr::filter()`

Value

a one-element character vector of the format "n (%)"

Examples


fmt_count(mtcars, cyl > 3, hp < 100)
fmt_count(iris, Species == "virginica")
fmt_count(mtcars, cyl > 3, hp < 100)
fmt_count(iris, Species == "virginica")

Fake spatial data as polygons This function returns a polygon which is split in to regions based on a supplied vector of names

Description

Fake spatial data as polygons This function returns a polygon which is split in to regions based on a supplied vector of names

Usage

gen_polygon(regions)
gen_polygon(regions)

Arguments

regions

A string of names for each region to label the polygon with

References

The coordinates used for the polygon are of Vienna, Austria. based off government data (see metadata)

Generate population counts from estimated population age breakdowns.

Description

This generates based on predefined age groups and proportions, however you could also define these yourself.

Usage

gen_population(
  total_pop = 1000,
  groups = c("0-4", "5-14", "15-29", "30-44", "45+"),
  strata = c("Male", "Female"),
  proportions = c(0.079, 0.134, 0.139, 0.082, 0.066),
  counts = NULL,
  tibble = TRUE
)
gen_population(
  total_pop = 1000,
  groups = c("0-4", "5-14", "15-29", "30-44", "45+"),
  strata = c("Male", "Female"),
  proportions = c(0.079, 0.134, 0.139, 0.082, 0.066),
  counts = NULL,
  tibble = TRUE
)

Arguments

`total_pop`	The overal population count of interest - the default is 1000 people
`groups`	A character vector of groups - the default is set for age groups: c("0-4","5-14","15-29","30-44","45+")
`strata`	A character vector for stratifying groups - the default is set for gender: c("Male", "Female")
`proportions`	A numeric vector specifying the proportions (as decimals) for each group of the total_pop. The default repeats c(0.079, 0.134, 0.139, 0.082, 0.067) for strata. However you can change this manually, make sure to have the length equal to groups times strata (or half thereof). These defaults are based of MSF general emergency intervention standard values.
`counts`	A numeric vector specifying the counts for each group. The default is NULL - as most often proportions above will be used. If is not NULL then total_pop and proportions will be ignored. Make sure the length of this vector is equal to groups times strata (or if it is half then it will repeat for each strata). For reference, the MSF general emergency intervention standard values are c(7945, 13391, 13861, 8138, 6665) based on above groups for a 100,000 person population.
`tibble`	Return data as a tidyverse tibble (default is TRUE)

Examples

# get population counts based on proportion, unstratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = NULL, 
               proportions = c(0.3, 0.2, 0.4, 0.1))

# get population counts based on proportion, stratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               proportions = c(0.3, 0.2, 0.4, 0.1))

# get population counts based on counts, unstratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = NULL, 
               counts = c(20, 10, 30, 40))

# get population counts based on counts, stratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               counts = c(20, 10, 30, 40))

# get population counts based on counts, stratified - type out counts
# for each group and strata
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               counts = c(20, 10, 30, 40, 40, 30, 20, 20))
# get population counts based on proportion, unstratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = NULL, 
               proportions = c(0.3, 0.2, 0.4, 0.1))

# get population counts based on proportion, stratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               proportions = c(0.3, 0.2, 0.4, 0.1))

# get population counts based on counts, unstratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = NULL, 
               counts = c(20, 10, 30, 40))

# get population counts based on counts, stratified
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               counts = c(20, 10, 30, 40))

# get population counts based on counts, stratified - type out counts
# for each group and strata
gen_population(groups = c(1, 2, 3, 4), 
               strata = c("a", "b"), 
               counts = c(20, 10, 30, 40, 40, 30, 20, 20))

Cosmetically relabel all columns that contains a certain pattern

Description

These function are only to be used cosmetically before kable and will likely return a data frame with duplicate names.

Usage

rename_redundant(x, ...)

augment_redundant(x, ...)
rename_redundant(x, ...)

augment_redundant(x, ...)

Arguments

`x`	a data frame
`...`	a series of keys and values to replace columns that match specific patterns.

Details

rename_redundant fully replaces any column names matching the keys
augment_redundant will take a regular expression and rename columns via gsub().

Value

a data frame.

Author(s)

Zhian N. Kamvar

Examples


df <- data.frame(
  x = letters[1:10],
  `a n` = 1:10,
  `a prop` = (1:10) / 10,
  `a deff` = round(pi, 2),
  `b n` = 10:1,
  `b prop` = (10:1) / 10,
  `b deff` = round(pi * 2, 2),
  check.names = FALSE
)
df
print(df <- rename_redundant(df, "%" = "prop", "Design Effect" = "deff"))
print(df <- augment_redundant(df, " (n)" = " n$"))
df <- data.frame(
  x = letters[1:10],
  `a n` = 1:10,
  `a prop` = (1:10) / 10,
  `a deff` = round(pi, 2),
  `b n` = 10:1,
  `b prop` = (10:1) / 10,
  `b deff` = round(pi * 2, 2),
  check.names = FALSE
)
df
print(df <- rename_redundant(df, "%" = "prop", "Design Effect" = "deff"))
print(df <- augment_redundant(df, " (n)" = " n$"))

Unite estimates and confidence intervals

Description

create a character column by combining estimate, lower and upper columns. This is similar to tidyr::unite().

Usage

unite_ci(
  x,
  col = NULL,
  ...,
  remove = TRUE,
  digits = 2,
  m100 = TRUE,
  percent = FALSE,
  ci = FALSE,
  separator = "-"
)

merge_ci_df(x, e = 3, l = e + 1, u = e + 2, digits = 2, separator = "-")

merge_pci_df(x, e = 3, l = e + 1, u = e + 2, digits = 2, separator = "-")
unite_ci(
  x,
  col = NULL,
  ...,
  remove = TRUE,
  digits = 2,
  m100 = TRUE,
  percent = FALSE,
  ci = FALSE,
  separator = "-"
)

merge_ci_df(x, e = 3, l = e + 1, u = e + 2, digits = 2, separator = "-")

merge_pci_df(x, e = 3, l = e + 1, u = e + 2, digits = 2, separator = "-")

Arguments

`x`	a data frame with at least three columns defining an estimate, lower bounds, and upper bounds.
`col`	the quoted name of the replacement column to create
`...`	three columns to bind together in the order of Estimate, Lower, and Upper.
`remove`	if `TRUE` (default), the three columns in `...` will be replaced by `col`
`digits`	the number of digits to retain for the confidence interval.
`m100`	`TRUE` if the result should be multiplied by 100
`percent`	`TRUE` if the result should have a percent symbol added.
`ci`	`TRUE` if the result should include "CI" within the braces (defaults to FALSE)
`separator`	what to separate lower and upper confidence intervals with, default is "-"
`e`	the column of the estimate (defaults to the third column). Otherwise, a number
`l`	the column of the lower bound (defaults to the fourth column). Otherwise, a number
`u`	the column of the upper bound (defaults to the fifth column), otherwise, a number

Value

a modified data frame with merged columns or one additional column representing the estimate and confidence interval

Examples


fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
df  <- data.frame(v = names(coef(fit)), e = coef(fit), confint(fit), row.names = NULL)
names(df) <- c("variable", "estimate", "lower", "upper")
print(df)
unite_ci(df, "slope (CI)", estimate, lower, upper, m100 = FALSE, percent = FALSE)

fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
df  <- data.frame(v = names(coef(fit)), e = coef(fit), confint(fit), row.names = NULL)
names(df) <- c("variable", "estimate", "lower", "upper")
print(df)
unite_ci(df, "slope (CI)", estimate, lower, upper, m100 = FALSE, percent = FALSE)

Create a curve comparing observed Z-scores to the WHO standard.

Description

Create a curve comparing observed Z-scores to the WHO standard.

Usage

zcurve(x, zscore)
zcurve(x, zscore)

Arguments

`x`	a data frame
`zscore`	bare name of a numeric vector containing computed zscores

Value

a ggplot2 object that is customisable via the ggplot2 package.

Examples


library("ggplot2")
set.seed(9)
dat <- data.frame(observed = rnorm(204) + runif(1),
                  skewed   = rnorm(204) + runif(1, 0.5)
                 ) # slightly skewed
zcurve(dat, observed) +
  labs(title = "Weight-for-Height Z-scores") +
  theme_classic()

zcurve(dat, skewed) +
  labs(title = "Weight-for-Height Z-scores") +
  theme_classic()

# Including different groups to facet
dat <- data.frame(
  observed = c(rnorm(204) + runif(1), rnorm(204) + runif(1, 0.5)),
  groups   = rep(c("A", "B"), each = 204),
  treat    = sample(c('up', 'down'), 408, replace = TRUE)
                 )
zcurve(dat, observed) +
  facet_grid(treat~groups)

library("ggplot2")
set.seed(9)
dat <- data.frame(observed = rnorm(204) + runif(1),
                  skewed   = rnorm(204) + runif(1, 0.5)
                 ) # slightly skewed
zcurve(dat, observed) +
  labs(title = "Weight-for-Height Z-scores") +
  theme_classic()

zcurve(dat, skewed) +
  labs(title = "Weight-for-Height Z-scores") +
  theme_classic()

# Including different groups to facet
dat <- data.frame(
  observed = c(rnorm(204) + runif(1), rnorm(204) + runif(1, 0.5)),
  groups   = rep(c("A", "B"), each = 204),
  treat    = sample(c('up', 'down'), 408, replace = TRUE)
                 )
zcurve(dat, observed) +
  facet_grid(treat~groups)

Package 'epikit'

Help Index

Add a column of cluster survey weights to a data frame.

Description

Usage

Arguments

Details

Author(s)

Examples

Add a column of stratified survey weights to a data frame. For use in surveys where you took a sample population out of a larger source population, with a simple-random or stratified survey design.

Description

Usage

Arguments

Author(s)

Examples

Create an age group variable

Description

Usage

Arguments

Value

Examples

create factors from numbers

Description

Usage

Arguments

Value

Examples

Automatically calculate breaks for a number

Description

Usage

Arguments

Value

Examples

Find the first date beyond a cutoff in several columns

Description

Usage

Arguments

Examples

Helper to format confidence interval for text

Description

Usage

Arguments

Value

Examples

Counts and proportions inline

Description

Usage

Arguments

Value

Examples

Fake spatial data as polygons This function returns a polygon which is split in to regions based on a supplied vector of names

Description

Usage

Arguments

References

Generate population counts from estimated population age breakdowns.

Description

Usage

Arguments

Examples

Cosmetically relabel all columns that contains a certain pattern

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Unite estimates and confidence intervals

Description

Usage

Arguments

Value

Examples

Create a curve comparing observed Z-scores to the WHO standard.

Description

Usage

Arguments

Value

Examples