Title: | Visualize Population Pyramids Aggregated by Age |
---|---|
Description: | Provides a quick method for visualizing non-aggregated line-list or aggregated census data stratified by age and one or two categorical variables (e.g. gender and health status) with any number of values. It returns a 'ggplot' object, allowing the user to further customize the output. This package is part of the 'R4Epis' project <https://r4epis.netlify.app/>. |
Authors: | Zhian N. Kamvar [aut, cre], Alex Spina [ctb] |
Maintainer: | Zhian N. Kamvar <[email protected]> |
License: | GPL-3 |
Version: | 0.1.3 |
Built: | 2024-10-09 03:14:06 UTC |
Source: | https://github.com/r4epi/apyramid |
Plot a population pyramid (age-sex) from a dataframe.
age_pyramid( data, age_group = "age_group", split_by = "sex", stack_by = NULL, count = NULL, proportional = FALSE, na.rm = TRUE, show_midpoint = TRUE, vertical_lines = FALSE, horizontal_lines = TRUE, pyramid = TRUE, pal = NULL )
data |
Your dataframe (e.g. linelist) |
age_group |
the name of a column in the data frame that defines the age group categories. Defaults to "age_group" |
split_by |
the name of a column in the data frame that defines the bivariate column. Defaults to "sex". See NOTE |
stack_by |
the name of the column in the data frame to use for shading the bars. Defaults to NULL |
count |
for pre-computed data the name of the column in the data frame for the values of the bars. If this represents proportions, the values should be within [0, 1]. |
proportional |
If TRUE, bars represent proportions rather than counts. Defaults to FALSE |
na.rm |
If TRUE (the default), NA categories are removed |
show_midpoint |
When TRUE (the default), the midpoint of each age group bar is marked |
vertical_lines |
If TRUE, dashed vertical lines are added to help visual interpretation of numbers. Defaults to FALSE (not shown) |
horizontal_lines |
If TRUE (the default), horizontal lines are drawn across the plot |
pyramid |
if TRUE (the default), a bivariate split_by variable is drawn as a population pyramid |
pal |
a color palette function or vector of colors to be passed to ggplot2::scale_fill_manual(). Defaults to NULL |
Note: If the split_by variable is bivariate (e.g. an indicator for a specific symptom), then the result will show up as a pyramid; otherwise, it will be presented as a facetted barplot with empty bars in the background indicating the range of the un-facetted data set. Values of split_by will show up as labels at the top of each facet.
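To anticipate which layout a given column will produce, it can help to count its distinct values before plotting. The sketch below uses a hypothetical helper, `n_split_values()`, which is not part of apyramid, and toy data made up for illustration. It assumes that "bivariate" means exactly two distinct non-missing values; how the package treats missing values when `na.rm = FALSE` is not covered here.

```r
# Hypothetical helper (not part of apyramid): count the distinct,
# non-missing values of a column to anticipate the plot layout.
n_split_values <- function(data, column) {
  length(unique(stats::na.omit(as.character(data[[column]]))))
}

# Toy data, made up for illustration only.
toy <- data.frame(
  sex    = c("Male", "Female", "Female", NA),
  gender = c("Male", "NB", "Female", "Female"),
  stringsAsFactors = FALSE
)

n_split_values(toy, "sex")    # two values: a pyramid would be expected
n_split_values(toy, "gender") # three values: a facetted barplot would be expected
```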
    library(ggplot2)
    library(apyramid)
    old <- theme_set(theme_classic(base_size = 18))

    # with pre-computed data ----------------------------------------------------
    # 2018/2008 US census data by age and gender
    data(us_2018)
    data(us_2008)
    age_pyramid(us_2018, age_group = age, split_by = gender, count = count)
    age_pyramid(us_2008, age_group = age, split_by = gender, count = count)

    # 2018 US census data by age, gender, and insurance status
    data(us_ins_2018)
    age_pyramid(us_ins_2018,
      age_group = age,
      split_by = gender,
      stack_by = insured,
      count = count
    )
    us_ins_2018$prop <- us_ins_2018$percent / 100
    age_pyramid(us_ins_2018,
      age_group = age,
      split_by = gender,
      stack_by = insured,
      count = prop,
      proportional = TRUE
    )

    # from linelist data --------------------------------------------------------
    set.seed(2018 - 01 - 15)
    ages <- cut(sample(80, 150, replace = TRUE),
      breaks = c(0, 5, 10, 30, 90), right = FALSE
    )
    sex <- sample(c("Female", "Male"), 150, replace = TRUE)
    gender <- sex
    gender[sample(5)] <- "NB"
    ill <- sample(c("case", "non-case"), 150, replace = TRUE)
    dat <- data.frame(
      AGE = ages,
      sex = factor(sex, c("Male", "Female")),
      gender = factor(gender, c("Male", "NB", "Female")),
      ill = ill,
      stringsAsFactors = FALSE
    )

    # Create the age pyramid, stratifying by sex
    print(ap <- age_pyramid(dat, age_group = AGE))

    # Create the age pyramid, stratifying by gender, which can include non-binary
    print(apg <- age_pyramid(dat, age_group = AGE, split_by = gender))

    # Remove NA categories with na.rm = TRUE
    dat2 <- dat
    dat2[1, 1] <- NA
    dat2[2, 2] <- NA
    dat2[3, 3] <- NA
    print(ap <- age_pyramid(dat2, age_group = AGE))
    print(ap <- age_pyramid(dat2, age_group = AGE, na.rm = TRUE))

    # Stratify by case definition and customize with ggplot2
    ap <- age_pyramid(dat, age_group = AGE, split_by = ill) +
      theme_bw(base_size = 16) +
      labs(title = "Age groups by case definition")
    print(ap)

    # Stratify by multiple factors
    ap <- age_pyramid(dat,
      age_group = AGE,
      split_by = sex,
      stack_by = ill,
      vertical_lines = TRUE
    ) +
      labs(title = "Age groups by case definition and sex")
    print(ap)

    # Display proportions
    ap <- age_pyramid(dat,
      age_group = AGE,
      split_by = sex,
      stack_by = ill,
      proportional = TRUE,
      vertical_lines = TRUE
    ) +
      labs(title = "Age groups by case definition and sex")
    print(ap)

    # empty group levels will still be displayed
    dat3 <- dat2
    dat3[dat$AGE == "[0,5)", "sex"] <- NA
    age_pyramid(dat3, age_group = AGE)
    theme_set(old)
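The examples above pass either raw line-list rows or census tables that already carry a count column. As a rough sketch of the step in between, the code below aggregates a line list into counts with base R and derives proportions within [0, 1]. The data and the column names `cases` and `prop` are made up for illustration, and the final call is only attempted when apyramid is installed.

```r
# Toy line list: one row per person (values are made up for illustration).
linelist <- data.frame(
  age_group = factor(c("0-4", "0-4", "5-9", "5-9", "5-9", "10+"),
                     levels = c("0-4", "5-9", "10+")),
  sex = factor(c("Female", "Male", "Female", "Female", "Male", "Male"),
               levels = c("Male", "Female"))
)

# Tabulate every age group by sex combination; empty cells get a count of 0.
counts <- as.data.frame(table(linelist), responseName = "cases")

# Convert counts to proportions of the whole table, which lie within [0, 1].
counts$prop <- counts$cases / sum(counts$cases)

print(counts)

# Only attempt the plot when apyramid is installed.
if (requireNamespace("apyramid", quietly = TRUE)) {
  p <- apyramid::age_pyramid(counts,
    age_group = age_group,
    split_by = sex,
    count = cases
  )
  # print(p) would draw the plot
}
```

Aggregating with `table()` keeps empty combinations as explicit zero rows, so every age group still appears in the resulting table.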
All of these tables were read directly from the Excel sources via a custom script located at https://github.com/R4EPI/apyramid/blob/master/scripts/read-us-pyramid.R.
us_2018 us_2008 us_ins_2018 us_ins_2008 us_gen_2018 us_gen_2008
All tables are in long tibble format. There are three columns common to all of the tables:
age [factor] 18 ordered age groups in increments of five years from "<5" to "85+"
gender [factor] 2 reported genders (male, female).
count [integer] Numbers in thousands. Civilian noninstitutionalized and military population.
Below are the specifics of each table beyond the three columns stated above, with names as reported on the US Census website:
us_2018, us_2008: A tibble with 36 rows and 4 columns.
us_2018 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2018/age-sex-composition/2018gender_table1.xls
us_2008 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2008/age-sex-composition/2008gender_table1.xls
Additional columns:
percent [numeric] percent of the total US population rounded to the nearest 0.1%
us_ins_2018, us_ins_2008: A tibble with 72 rows and 5 columns.
us_ins_2018 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2018/age-sex-composition/2018gender_table14.xls
us_ins_2008 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2008/age-sex-composition/2008gender_table29.xls
Additional columns:
insured [factor] Either "Insured" or "Not insured" indicating insured status
percent [numeric] percent of each age and gender category insured rounded to the nearest 0.1%
us_gen_2018, us_gen_2008: A tibble with 108 rows and 5 columns.
us_gen_2018 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2018/age-sex-composition/2018gender_table13.xls
us_gen_2008 source: https://www2.census.gov/programs-surveys/demo/tables/age-and-sex/2008/age-sex-composition/2008gender_table29.xls
Additional columns:
generation [factor] Three categories of generations in the US: First, Second, Third and higher (see note)
percent [numeric] percent of the total US population rounded to the nearest 0.1%
Note: from the US Census Bureau: The foreign born are considered first generation. Natives with at least one foreign-born parent are considered second generation. Natives with two native parents are considered third-and-higher generation.
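The row counts quoted for each table follow from crossing the category levels: 18 age groups by 2 genders, optionally by 2 insurance statuses or 3 generations. The sketch below reproduces that arithmetic with `expand.grid()`. Only the first and last age labels are stated in the text, so the intermediate labels are generic placeholders, and `skeleton()` is a hypothetical helper written for this illustration.

```r
# Placeholder labels: only "<5" and "85+" are stated in the text,
# so the 16 labels in between are generic stand-ins.
age_levels <- c("<5", paste0("group_", 2:17), "85+")
gender_levels <- c("male", "female")

# Hypothetical helper: number of rows in the full crossing of the inputs.
skeleton <- function(...) nrow(expand.grid(..., stringsAsFactors = FALSE))

rows_basic <- skeleton(age = age_levels, gender = gender_levels)
rows_ins <- skeleton(
  age = age_levels, gender = gender_levels,
  insured = c("Insured", "Not insured")
)
rows_gen <- skeleton(
  age = age_levels, gender = gender_levels,
  generation = c("First", "Second", "Third and higher")
)

# 36, 72, and 108 rows respectively
c(basic = rows_basic, insured = rows_ins, generation = rows_gen)
```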
An object of class tbl_df (inherits from tbl, data.frame) with 36 rows and 4 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 72 rows and 5 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 108 rows and 5 columns.
https://census.gov/data/tables/2018/demo/age-and-sex/2018-age-sex-composition.html
https://census.gov/data/tables/2008/demo/age-and-sex/2008-age-sex-composition.html