This function takes the map and its original titer table and performs the type of bootstrapping specified by the method argument. In each bootstrap repeat the chosen procedure is applied, the map is re-optimized or relaxed, and the coordinates of the points in the lowest-stress solution are recorded. See Details for a description of the bootstrapping methods you can apply.

bootstrapMap(
  map,
  method,
  bootstrap_repeats = 1000,
  bootstrap_ags = TRUE,
  bootstrap_sr = TRUE,
  reoptimize = TRUE,
  optimizations_per_repeat = 100,
  ag_noise_sd = 0.7,
  titer_noise_sd = 0.7,
  options = list()
)

Arguments

map

The map object

method

One of "resample", "bayesian" or "noisy" (see details)

bootstrap_repeats

The number of bootstrap repeats to perform

bootstrap_ags

For "resample" and "bayesian" methods, whether to apply bootstrapping across antigens

bootstrap_sr

For "resample" and "bayesian" methods, whether to apply bootstrapping across sera

reoptimize

Whether the whole map should be reoptimized from scratch with each bootstrap run. If FALSE, the map is simply relaxed from its current optimization with each run.

optimizations_per_repeat

When re-optimizing the map from scratch, the number of optimization runs to perform

ag_noise_sd

The standard deviation (on the log titer scale) of measurement noise applied per antigen when using the "noisy" method

titer_noise_sd

The standard deviation (on the log titer scale) of measurement noise applied per titer when using the "noisy" method

options

Map optimizer options, see RacOptimizer.options()

Value

Returns the map object updated with bootstrap information

Details

Bootstrapping methods

"resample": The resample boostrap is the most standard bootstrap method, a random resample of the titer table data is taken with replacement. Depending on your specification, resampling is applied across either individual antigens, individual sera or both antigens and sera. In essence this method tries to let you see how robust the map is to inclusion of particular titer measurements or antigens or sera. Like most bootstrapping techniques it will prove give more reliable results the more antigens and sera you have in your map. It won't work very well for a map of 5 sera and antigens for example, in this case a "noisy" bootstrap may be better.

"bayesian": The bayesian bootstrap is akin to the resampling bootstrap, but rather than explicitly resampling data, weights are assigned to each part of the titer table data according to random draws from a dirichilet distribution. Under this scheme, every data point will play at least some role in making the map, even if only weighted slightly. Sometimes this is helpful, if you know for example that the points in your map are highly dependent upon the presence of a few antigens / sera / titers to achieve reasonable triangulation of point positions and you don't really want to risk removing them completely and ending up with bootstrap runs that are under-constrained, you might want to consider this approach. On the other hand this might be exactly what you don't want and you want to know uncertainty that can be generated when certain subsets of the data are excluded completely, in that case you probably want to stick with the "resample" method.

"noisy": The noisy bootstrap, sometimes termed a smooth bootstrap involved adding normally distributed noise to each observation. The distribution of this noise can be parametrised through the ag_noise_sd and titer_noise_sd arguments. titer_noise_sd refers to the standard deviation (on the log scale) of noise added to each individual titer measurement in the table, while antigen_noise_sd refers to the standard deviation of noise applied to titers for each antigen. The reason for this distinction is that we have noticed with repeat measurements of influenza data there is often both a random noise per titer and a random noise per antigen, i.e. in one repeat titers may all be around one 2-fold higher on average, in addition to unbiased additional titer noise. If you wish to only simulate additional noise per titer and not a per antigen effect, simply set antigen_noise_sd to 0. Note that in order to use this most effectively it is best to have an idea of the amount and type of measurement noise you may expect in your data and set these parameters accordingly.

See also