---
title: "Details of the incidence_fit class"
author: "Zhian N. Kamvar"
date: "`r Sys.Date()`"
output:
   rmarkdown::html_vignette:
     toc: true
     toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Incidence fit class}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width=7, 
  fig.height=5
)
```

This vignette details the structure and construction of the `incidence_fit` and 
`incidence_fit_list` classes, which are produced by the `fit()` and 
`fit_optim_split()` functions, respectively. By the end of this tutorial, you
should be able to construct `incidence_fit` and `incidence_fit_list` objects for
use with your own models.

# Structure of an `incidence_fit` object

An `incidence_fit` object contains three elements:

 - `$model`: The model fit to an `incidence` object. Currently, this represents a 
    log-linear model, but it can be any model. 
 - `$info`: Information derived from the model
   - `r` The growth rate
   - `r.conf` the confidence interval of `r`
   - `pred` a data frame containing the predictions of the model using the true
      dates (`dates`), their numeric version used in the model (`dates.x`), the
      predicted value (`fit`), and the lower (`lwr`) and upper (`upr`) bounds of
      the associated confidence interval.
   - `doubling` the predicted doubling time in days (only if `r` is positive)
   - `doubling.conf` the confidence interval of the doubling time
   - `halving` the predicted halving time in days (only if `r` is negative)
   - `halving.conf` the confidence interval of the halving time
 - `$origin`: the date corresponding to day '0'

Internally, when `fit()` is run, these elements are constructed by
function `incidence:::extract_info()`. First we need to setup data. We will use 
simulated Ebola outbreak data from the *outbreaks* package over weekly intervals
and calculate the fit for the first 20 weeks:

```{r fit_dates}
library(outbreaks)
library(incidence)
dat <- ebola_sim$linelist$date_of_onset
i <- incidence(dat, interval = "week")
i
f <- fit(i[1:20])
f
plot(i, fit = f)
```

As you can see, the `incidence_fit` object has a print method and a plot method.
If you want to access individual elements in the `$info` element, you can use
the `get_info()` function:

```{r get_info}
get_info(f, "r")
get_info(f, "r.conf")
get_info(f, "doubling.conf")
```

This will be important later when we combine several `incidence_fit` objects into
a single `incidence_fit_list`.

# Building an `incidence_fit` object from scratch

The `incidence_fit` object can be constructed from any model from which you can
derive the daily growth rate, doubling/halving times, predictions, and confidence
intervals. The following three steps show roughly how it is done from model 
fitting to construction.

### Step 1: create the model

The default model for `fit()` is a log-linear model on the intervals between 
dates. To fit this model, we will need to create a data frame with the counts
and the midpoints of the intervals:

```{r create_model}
# ensure all dates have at least one incidence
i2 <- i[1:20]
i2 <- i2[apply(get_counts(i2), 1, min) > 0]
df <- as.data.frame(i2, long = TRUE)
df$dates.x <- get_dates(i2, position = "center", count_days = TRUE)
head(df)
lm1 <- stats::lm(log(counts) ~ dates.x, data = df)
lm1
```

If we compare that to the `$model` element produced from `fit()`, we can see that
it is identical:

```{r fit_model}
all.equal(f$model, lm1)
```

### Step 2: creation of the `$info` list:

The `$info` list is created directly from the model itself:

```{r make_info}
r <- stats::coef(lm1)["dates.x"]
r.conf <- stats::confint(lm1, "dates.x", 0.95)
new.data <- data.frame(dates.x = sort(unique(lm1$model$dates.x)))
pred     <- exp(stats::predict(lm1, newdata = new.data, interval = "confidence",
                               level = 0.95))
pred <- cbind.data.frame(new.data, pred)
info_list <- list(
  r = r,
  r.conf = r.conf,
  doubling = log(2) / r,
  doubling.conf = log(2) / r.conf,
  pred = pred
)
info_list
```

### Step 3: combine lists and create object

the last step is to combine everything into a list and create the object.

```{r combine}
origin <- min(get_dates(i2))
info_list$pred$dates <- origin + info_list$pred$dates.x
the_fit <- list(
  lm = lm1,
  info = info_list,
  origin = min(get_dates(i2))
)
class(the_fit) <- "incidence_fit"
the_fit
plot(i, fit = the_fit)
```


# Structure of an `incidence_fit_list` object

There are several reasons for having multiple fits to a single `incidence`
object. One may want to have a separate fit for different groups represented in
the object, or one may want to split the fits at the peak of the epidemic. To
aid in plotting and summarizing the different fits, we've created the 
`incidence_fit_list` class. This class has two defining features:

 - It consists of a named list containing one or more `incidence_fit` objects or
   lists containing `incidence_fit` objects.
 - An attribute called "locations" contains a list whose length is equal to the
   number of `incidence_fit` objects in the object. Each list element contains
   a vector that defines where an `incidence_fit` object is within the 
   `incidence_fit_list`. 

The reason for this structure is because it is sometimes necessary to nest
lists of `incidence_fit` objects within lists. When this happens, accessing
individual elements of the objects cumbersome. To alleviate this, each object
has a distinct path within the named list in the "locations" attribute that 
allows one to access the object directly since R allows you to traverse the
elements of a nested list by subsetting with a vector:

```{r nest}
l <- list(a = list(b = 1, c = 2),d = list(e = list(f = 3, g = 4), h = 5))
str(l)
l[[c("a", "b")]]
l[[c("d", "e", "f")]]
```

## Example: A tale of two fits

The function `fit_optim_split()` attempts to find the optimal split point in an
epicurve, producing an `incidence_fit_list` object in the `$fit` element of the 
returned list:

```{r incidence_fit_list}
fl <- fit_optim_split(i)
fl$fit
plot(i, fit = fl$fit)
```

Here you can see that the object looks very similar to the `incidence_fit` object,
but it has extra information. The first thing you may notice is the fact that
both "doubling" and "halving" are shown. This is because the two fits have 
different signs for the daily growth rate. The second thing you may notice is
the fact that there is something called `attr(x, 'locations')`. This attribute
gives the location of the `incidence_fit` objects within the list. We can
illustrate how this works if we look at the structure of the object:

```{r incidence_fit_list_str}
str(fl$fit, max.level = 2)
```

Internally, all of the methods for `incidence_fit_list` use the 'locations'
attribute to navigate:

```{r incidence_fit_methods}
methods(class = "incidence_fit_list")
```

For example, it's often useful to extract the growth rate for all models at
once. The `get_info()` method allows us to do this easily:

```{r get_info_incidence_fit_list}
get_info(fl$fit, "r")
get_info(fl$fit, "r.conf")
```

Because doubling or halving is determined by whether or not `r` is negative, we
automatically filter out the results that don't make sense, but you can include
them with `na.rm = FALSE`:

```{r get_doubling}
get_info(fl$fit, "doubling.conf")
get_info(fl$fit, "doubling.conf", na.rm = FALSE)
```

## Example: Nested incidence_fit

Above, we showed the example of a basic `incidence_fit_list` class with two
objects representing the fits before and after the peak of an epicurve. However,
it is often useful evaluate fits for different groups separately. Here, we will
construct an incidence object, but define groups by gender:

```{r incidence_by_gender}
gen <- ebola_sim$linelist$gender
ig <- incidence(dat, interval = "week", group = gen)
plot(ig, border = "grey98")
```

Now if we calculate an optimal fit split, we will end up with four different
fits: two for each defined gender.

```{r fit_gender}
fg <- fit_optim_split(ig)
plot(ig, fit = fg$fit, border = "grey98", stack = FALSE)
```

If we look at the fit object, we can see again that it is an `incidence_fit_list`
but this time with four fits defined. 

```{r fit_gender_print}
fg$fit
str(fg$fit, max.level = 3)
```

> Notice that the nested lists themselves are of class `incidence_fit_list`.

Now, even though the fits within nested lists, the 'locations' attributes still
defines where they are within the object so that the `get_info()` function still
operates normally:

```{r get_info_gender}
get_info(fg$fit, "r.conf")
```

If you need to access all the fits easily, a convenience function to flatten the
list is available in `get_fit()`:

```{r get_fit}
str(get_fit(fg$fit), max.level = 2)
```

Because all that defines an `incidence_fit_list` is the class definition and
the 'locations' attribute that defines the positions of the `incidence_fit`
objects within the nesting, then it's also possible to define the output of
`fit_optim_split()` as an `incidence_fit_list` class:

```{r incidence_fit_listify}
print(locs <- attributes(fg$fit)$locations)

for (i in seq_along(locs)) {
	locs[[i]] <- c("fit", locs[[i]])
}
print(locs)
fg.ifl <- fg
attributes(fg.ifl)$locations<- locs
class(fg.ifl) <- "incidence_fit_list"
```

Now when we print the object, we can see that it prints only the information
related to the `incidence_fit_list`:

```{r new_fit_list_print}
fg.ifl
```

But, we still retain all of the extra information in the list:

```{r list_stuff}
str(fg.ifl, max.level = 1)
fg.ifl$split
fg.ifl$df
fg.ifl$plot
```