Title: | Tools to Use and Explore the 'BioTIME' Database |
---|---|
Description: | The 'BioTIME' database was first published in 2018 and has since inspired ideas, questions, projects and research articles. To make it even more accessible, an R package was created. The 'BioTIMEr' package provides tools designed to interact with the 'BioTIME' database. The functions provided include the 'BioTIME' recommended methods for preparing (gridding and rarefaction) time series data, a selection of standard biodiversity metrics (including species richness, numerical abundance and exponential Shannon) alongside examples of how to display change over time. It also includes a sample subset of both the query data and the metadata, the full versions of which are freely available on the 'BioTIME' website <https://biotime.st-andrews.ac.uk/home.php>. |
Authors: | Alban Sagouis [aut, cre] , Faye Moyes [aut] , Inês S. Martins [aut, rev] , Shane A. Blowes [ctb] , Viviana Brambilla [ctb] , Cher F. Y. Chow [ctb] , Ada Fontrodona-Eslava [ctb] , Laura Antão [ctb, rev] , Jonathan M. Chase [fnd] , Maria Dornelas [fnd, cph] , Anne E. Magurran [fnd] , European Research Council grant AdG BioTIME 250189 [fnd], European Research Council grant PoC BioCHANGE 727440 [fnd], European Research Council grant AdG MetaCHANGE 101098020 [fnd], The Leverhulme Centre for Anthropocene Biodiversity grant RC-2018-021 [fnd] |
Maintainer: | Alban Sagouis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.3 |
Built: | 2024-11-09 04:54:19 UTC |
Source: | https://github.com/biotimehub/biotimer |
A subset of data from BioTIME temporal surveys.
BTsubset_data
## 'BTsubset_data' A data frame with 81,084 rows and 17 columns:
Unique BioTIME identifier for record
Double representing the abundance for the record (see metadata for details of ABUNDANCE_TYPE)
Double representing the biomass for the record (see metadata for details of BIOMASS_TYPE)
Unique identifier linking to the species table
Concatenation of variables comprising unique sampling event
Name or identifier of plot field, only used for fixed plots such as forest quadrats
Latitude of record
Longitude of record
Depth or elevation of record if available
Numerical day of record
Numerical value of month for record, i.e. January=1
Year of record
BioTIME study unique identifier
Validated species identifier key
Highest taxonomic resolution of the individual, preferably genus and species
Level of resolution, e.g. 'species' when represented by genus and species
Higher level taxonomic grouping, e.g. Fish
<https://biotime.st-andrews.ac.uk/download.php>
A subset of the metadata from BioTIME
BTsubset_meta
## 'BTsubset_meta' A data frame with 12 rows and 25 columns:
BioTIME study unique identifier
Realm of study location, e.g. Marine
Climate of study location, e.g. Temperate
Habitat of study location, e.g. Rivers
Binary variable indicating if the study is within a protected area
Biome of study location (taken from the WWF biomes, e.g. Temperate broadleaf and mixed forests)
High level taxonomic identity of study species, e.g. Fish
More detailed information on taxonomy, e.g. woody plants
Title of study as identified in original source
A, B or AB to designate abundance only, biomass only or both
Number of unique data points in study, e.g. 10 data points spanning 15 years = 10
First year of study
Last year of study
Central latitude taken from the convex hull around all study coordinates
Central longitude taken from the convex hull around all study coordinates
Number of distinct species in study
Number of distinct samples in study
Number of distinct geographic coordinates in study
Total number of records in study
Grain size in km2, e.g. size of forest plots
Total area of study in km2
Date that the study was added to the database
Type of abundance, e.g. count
Type of biomass, e.g. weight
Concatenation of descriptors comprising the unique sampling event
<https://biotime.st-andrews.ac.uk/download.php>
Calculates a set of standard alpha diversity metrics
getAlphaMetrics(x, measure)
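x |
(data frame) BioTIME data table in the format of the output of the gridding and/or resampling functions |
measure |
(character) name of the abundance or biomass field used for the calculations |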
The function getAlphaMetrics computes nine alpha diversity metrics for a given community data frame, where measure is a character input specifying the abundance or biomass field used for the calculations. For each year of each assemblage time series, getAlphaMetrics calculates the following metrics:
- Species richness (S) as the total number of species in each year with currency > 0.
- Numerical abundance (N) as the total currency (sum) in each year (either total abundance or total biomass).
- Maximum numerical abundance (maxN) as the highest currency value reported in each year.
- Shannon or Shannon–Weaver index is calculated as $H' = -\sum_{i=1}^{S} p_i \log_b p_i$, where $p_i$ is the proportional abundance of species $i$ and $b$ is the base of the logarithm (natural logarithms), while exponential Shannon is given by exp(Shannon).
- Simpson's index is calculated as $1 - \sum_{i=1}^{S} p_i^2$, while Inverse Simpson as $1 / \sum_{i=1}^{S} p_i^2$.
- McNaughton's Dominance is calculated as the sum of the $p_i$ of the two most abundant species.
- Probability of interspecific encounter or PIE is calculated as $\frac{N}{N-1}\left(1 - \sum_{i=1}^{S} p_i^2\right)$.
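The metrics listed above can be worked through by hand for a single assemblage in a single year. The following is a minimal base R sketch, not the package's implementation: the abundance values are hypothetical, and the Simpson and PIE expressions follow the standard definitions, which are assumed here to match what getAlphaMetrics reports.

```r
# Hypothetical abundances of four species in one assemblage and one year.
abundance <- c(a = 10, b = 5, c = 3, d = 2)

# Proportional abundance p_i of each species.
p <- abundance / sum(abundance)

S          <- sum(abundance > 0)                    # species richness
N          <- sum(abundance)                        # total currency
maxN       <- max(abundance)                        # highest single value
Shannon    <- -sum(p * log(p))                      # natural logarithm
expShannon <- exp(Shannon)                          # exponential Shannon
Simpson    <- 1 - sum(p^2)                          # Simpson's index
InvSimpson <- 1 / sum(p^2)                          # inverse Simpson
PIE        <- (N / (N - 1)) * (1 - sum(p^2))        # assumed standard form
DomMc      <- sum(sort(p, decreasing = TRUE)[1:2])  # two most abundant species

print(round(c(S = S, N = N, maxN = maxN, Shannon = Shannon,
              expShannon = expShannon, Simpson = Simpson,
              InvSimpson = InvSimpson, PIE = PIE, DomMc = DomMc), 3))
```

Note that the Shannon line assumes every species has a positive abundance; zero values would need to be dropped before taking the logarithm.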
Note that the input data frame needs to be in the format of the output of the gridding and/or resampling functions, which includes keeping the default BioTIME data column names. If such columns are not found, an error is issued and the computations are halted.
Returns a data frame with results for species richness (S), numerical abundance (N), maximum numerical abundance (maxN), Shannon Index (Shannon), Exponential Shannon (expShannon), Simpson's Index (Simpson), Inverse Simpson (InvSimpson), Probability of interspecific encounter (PIE) and McNaughton's Dominance (DomMc) for each year and assemblageID.
library(BioTIMEr)
# Simulated community data: 8 assemblages, 24 records each
x <- data.frame(
  resamp = 1L,
  YEAR = rep(rep(2010:2015, each = 4), times = 4),
  Species = c(replicate(n = 8L, sample(letters, 24L, replace = FALSE))),
  ABUNDANCE = rpois(24 * 8, 10),
  assemblageID = rep(LETTERS[1L:8L], each = 24)
)
res <- getAlphaMetrics(x, measure = "ABUNDANCE")
Calculates a set of standard beta diversity metrics
getBetaMetrics(x, measure)
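x |
(data frame) BioTIME data table in the format of the output of the gridding and/or resampling functions |
measure |
(character) name of the abundance or biomass field used for the calculations |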
The function getBetaMetrics computes three beta diversity metrics for a given community data frame, where measure is a character input specifying the abundance or biomass field used for the calculations. getBetaMetrics calls the vegdist function, which calculates for each row the following metrics: Jaccard dissimilarity (method = "jaccard"), Morisita-Horn dissimilarity (method = "horn") and Bray-Curtis dissimilarity (method = "bray"). Here, the dissimilarity metrics are calculated against the baseline year of each assemblage time series, i.e. the first year of each time series.
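To make the baseline-year comparison concrete, the sketch below computes the three dissimilarities by hand in base R for one hypothetical assemblage. It does not call vegdist: the helper functions are illustrative, and the Jaccard version uses presence/absence, which is an assumption about how the package configures the call.

```r
# Hypothetical species-by-year abundance matrix for one assemblage:
# rows are years, columns are species.
comm <- rbind(
  "2010" = c(a = 10, b = 5, c = 0),
  "2011" = c(a = 8,  b = 0, c = 2),
  "2012" = c(a = 0,  b = 4, c = 6)
)

# Illustrative helpers: dissimilarity between a baseline year x and a year y.
jaccard <- function(x, y) {
  # presence/absence version (assumption)
  1 - sum(x > 0 & y > 0) / sum(x > 0 | y > 0)
}
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)
morisita_horn <- function(x, y) {
  lambda_x <- sum(x^2) / sum(x)^2
  lambda_y <- sum(y^2) / sum(y)^2
  1 - 2 * sum(x * y) / ((lambda_x + lambda_y) * sum(x) * sum(y))
}

baseline <- comm[1, ]                # first year of the time series
later <- comm[-1, , drop = FALSE]    # every subsequent year

beta <- data.frame(
  YEAR = as.integer(rownames(later)),
  Jaccard = apply(later, 1, jaccard, x = baseline),
  MorisitaHorn = apply(later, 1, morisita_horn, x = baseline),
  BrayCurtis = apply(later, 1, bray_curtis, x = baseline)
)
print(beta)
```

Each row of the result compares one later year with 2010, so a time series of n years yields n - 1 dissimilarity values per metric.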
Note that the input data frame needs to be in the format of the output of the gridding and/or resampling functions, which includes keeping the default BioTIME data column names. If such columns are not found, an error is issued and the computations are halted.
Returns a data.frame with results for Jaccard dissimilarity (JaccardDiss), Morisita-Horn dissimilarity (MorisitaHornDiss), and Bray-Curtis dissimilarity (BrayCurtsDiss) for each year and assemblageID.
library(BioTIMEr)
# Simulated community data: 8 assemblages, 24 records each
x <- data.frame(
  resamp = 1L,
  YEAR = rep(rep(2010:2015, each = 4), times = 4),
  Species = c(replicate(n = 8L, sample(letters, 24L, replace = FALSE))),
  ABUNDANCE = rpois(24 * 8, 10),
  assemblageID = rep(LETTERS[1L:8L], each = 24)
)
res <- getBetaMetrics(x, measure = "ABUNDANCE")
Fits linear regression models to getAlphaMetrics or getBetaMetrics outputs
getLinearRegressions(x, divType, pThreshold = 0.05)
x |
('data.frame') BioTIME data table in the format of the output of getAlphaMetrics or getBetaMetrics |
divType |
('character') string specifying the nature of the metrics in the data; either 'divType = "alpha"' or 'divType = "beta"' are supported |
pThreshold |
('numeric') P-value threshold for statistical significance |
The function 'getLinearRegressions' fits simple linear regression models (see lm for details) to a given output (x) of either the getAlphaMetrics or the getBetaMetrics function. 'divType' needs to be specified in agreement with x. The typical model has the form 'metric ~ year'. Note that assemblages with fewer than 3 time points and/or single-species time series are removed.
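The 'metric ~ year' model can be illustrated with base R alone. This is a sketch under stated assumptions rather than the package's code: the input values are hypothetical, fit_one is an illustrative helper, and the output column names are chosen for readability and may differ from those returned by getLinearRegressions.

```r
# Hypothetical richness values for two assemblages over six years.
metrics <- data.frame(
  assemblageID = rep(c("A", "B"), each = 6),
  YEAR = rep(2010:2015, times = 2),
  S = c(10, 11, 13, 12, 15, 16, 20, 19, 17, 18, 15, 14)
)
pThreshold <- 0.05

# Illustrative helper: fit S ~ YEAR for one assemblage and keep the
# slope, its p-value, a significance flag and the intercept.
fit_one <- function(d) {
  coefs <- summary(lm(S ~ YEAR, data = d))$coefficients
  data.frame(
    assemblageID = d$assemblageID[1],
    metric = "S",
    slope = coefs["YEAR", "Estimate"],
    pvalue = coefs["YEAR", "Pr(>|t|)"],
    significance = as.integer(coefs["YEAR", "Pr(>|t|)"] < pThreshold),
    intercept = coefs["(Intercept)", "Estimate"]
  )
}

# Drop assemblages with fewer than 3 time points, then fit one model each.
groups <- split(metrics, metrics$assemblageID)
groups <- Filter(function(d) length(unique(d$YEAR)) >= 3, groups)
trends <- do.call(rbind, lapply(groups, fit_one))
print(trends)
```

A positive slope indicates that the metric increases over time in that assemblage, and a negative slope that it decreases.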
Returns a single long 'data.frame' with results of linear regressions (slope, p-value, significance, intercept) for each 'assemblageID'.
library(BioTIMEr)
# Simulated community data: 8 assemblages, 6 years, 4 species per year
x <- data.frame(
  resamp = 1L,
  YEAR = rep(rep(2010:2015, each = 4), times = 4),
  Species = c(replicate(n = 8L * 6L, sample(letters[1L:10L], 4L, replace = FALSE))),
  ABUNDANCE = rpois(24 * 8, 10),
  assemblageID = rep(LETTERS[1L:8L], each = 24)
)
alpham <- getAlphaMetrics(x, "ABUNDANCE")
getLinearRegressions(x = alpham, divType = "alpha", pThreshold = 0.01)
betam <- getBetaMetrics(x = x, "ABUNDANCE")
getLinearRegressions(x = betam, divType = "beta")
Grids BioTIME data into a discrete global grid based on the location of the samples (latitude/longitude).
gridding(meta, btf, res = 12, resByData = FALSE)
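meta |
(data frame) long-form metadata for the BioTIME studies |
btf |
(data frame) long-form data from a main BioTIME query |
res |
(numeric) resolution of the global grid, which determines the cell size; defaults to 12 |
resByData |
(logical) if TRUE, the function finds the best res based on the average study extent; defaults to FALSE |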
Each BioTIME study contains distinct samples which were collected with a consistent methodology over time, each with unique coordinates and date. These samples can be fixed plots (i.e. SL or 'single-location' studies, where measures are taken from a set of specific georeferenced sites at any given time) or wide-ranging surveys, transects, tows, and so on (i.e. ML or 'multi-location' studies, where measures are taken from multiple sampling locations over large extents that may or may not align from year to year; see runResampling). gridding is a function designed to deal with the issue of varying spatial extent between studies by using a global grid of hexagonal cells derived from dgconstruct and assigning the individual samples to the cells across the grid based on their latitude and longitude. Specifically, each sample is assigned a combination of study ID and grid cell, resulting in a unique identifier for each assemblage time series within each cell (assemblageID). This allows the integrity of each study and each sample to be maintained, while large-extent studies are split into local time series at the grid cell level.
By default, meta represents a long-form data frame containing the metadata for BioTIME studies and btf is a data frame containing long-form data from a main BioTIME query (see Example). res defines the global grid cell resolution, thus determining the size of the cells (see vignette("dggridR")). res = 12 was found to be the most appropriate value when working on the whole BioTIME database (corresponding to ~96 km2 cell area), but the user can define their own grid resolution (e.g. res = 14) or, when resByData = TRUE, allow the function to find the best res based on the average study extent.
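The way a study is split into cell-level time series can be sketched without the hexagonal grid. The base R example below is a deliberate simplification: it bins hypothetical coordinates into 1-degree squares as a stand-in for the dgconstruct hexagonal cells, and the "_" separator used to build assemblageID is an assumption.

```r
# Hypothetical sample coordinates from two studies.
samples <- data.frame(
  STUDY_ID  = c(10, 10, 10, 18, 18),
  LATITUDE  = c(56.34, 56.36, 57.90, -12.10, -12.12),
  LONGITUDE = c(-2.80, -2.79, -4.10, 130.85, 130.86)
)

# Stand-in for the hexagonal grid: 1-degree square cells, numbered row by row.
cell_size <- 1
lat_bin <- floor((samples$LATITUDE + 90) / cell_size)
lon_bin <- floor((samples$LONGITUDE + 180) / cell_size)
samples$cell <- as.integer(lat_bin * (360 / cell_size) + lon_bin)

# One assemblage per combination of study and cell (separator is assumed).
samples$assemblageID <- paste(samples$STUDY_ID, samples$cell, sep = "_")
print(samples)
```

In this toy data the first two samples of study 10 fall in the same cell and share an assemblageID, while the third sample, taken further away, starts a separate local time series.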
Returns a 'data.frame' with selected columns from the btf and meta data frames, an extra integer column called 'cell', and two character columns called 'StudyMethod' and 'assemblageID' (concatenation of study_ID and cell).
library(BioTIMEr)
gridded_data <- gridding(BTsubset_meta, BTsubset_data)
Takes the output of gridding and applies sample-based rarefaction to standardise the number of samples per year within each cell-level time series (i.e. assemblageID).
resampling(x, measure, resamps = 1L, conservative = FALSE)
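x |
(data frame) BioTIME data table in the format of the output of the gridding function |
measure |
(character) currency to be used during the rarefaction: a single column name or a vector of two or more column names |
resamps |
(integer) number of resampling iterations; defaults to 1L |
conservative |
(logical) if TRUE, the full sample is removed whenever one of its observations has NA in the currency field(s); defaults to FALSE |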
Sample-based rarefaction prevents temporal variation in sampling effort from affecting diversity estimates (see Gotelli N.J., Colwell R.K. 2001 Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters 4(4), 379-391) by selecting an equal number of samples across all years in a time series. resampling counts the number of unique samples taken in each year (sampling effort), identifies the minimum number of samples across all years, and then uses this minimum to randomly resample each year down to that number. With the sampling effort thus standardised between years, standard biodiversity metrics can be calculated based on an equal number of samples (e.g. using getAlphaMetrics, getBetaMetrics).
measure is a character input specifying the chosen currency to be used during the sample-based rarefaction. It can be a single column name or a vector of two or more column names, e.g. for BioTIME, measure = "ABUNDANCE", measure = "BIOMASS" or measure = c("ABUNDANCE", "BIOMASS").
By default, any observations with NA within the currency field(s) are removed. You can choose to remove the full sample where such observations are present by setting conservative to TRUE. resamps can be used to define multiple iterations, effectively creating multiple alternative datasets, as in each iteration different samples will be randomly selected for the years where the number of samples exceeds the minimum.
Note that the function always returns a single data frame: if resamps > 1, the returned data frame is the concatenation of the individual data frames from each iteration, each identified by a unique numerical identifier in 1:resamps.
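The rarefaction procedure described above (count samples per year, find the minimum, resample every year down to it) can be sketched in base R for a single assemblage. This is an illustration, not the package's implementation: the data are hypothetical, and SAMPLE_DESC is assumed to be the column that identifies a sample.

```r
set.seed(1)  # the selection is random; fix the seed for reproducibility

# Hypothetical records for one assemblage: one row per species per sample.
obs <- data.frame(
  YEAR = c(2010, 2010, 2010, 2010, 2011, 2011, 2012, 2012, 2012),
  SAMPLE_DESC = c("s1", "s1", "s2", "s3", "s4", "s4", "s5", "s6", "s6"),
  Species = c("a", "b", "a", "c", "a", "b", "b", "a", "c"),
  ABUNDANCE = c(3, 1, 2, 5, 4, 2, 1, 6, 2)
)

# Sampling effort: the unique samples taken in each year.
samples_by_year <- lapply(split(obs$SAMPLE_DESC, obs$YEAR), unique)
min_effort <- min(lengths(samples_by_year))

# Randomly keep min_effort samples in every year.
kept <- unlist(lapply(samples_by_year, function(s) {
  s[sample.int(length(s), min_effort)]
}))
rarefied <- obs[obs$SAMPLE_DESC %in% kept, ]

# Total currency per species and year in the rarefied time series.
result <- aggregate(ABUNDANCE ~ YEAR + Species, data = rarefied, FUN = sum)
print(result)
```

Here 2011 has a single sample, so every year is reduced to one randomly chosen sample. Repeating the selection with different seeds mimics what resamps > 1 provides.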
Returns a single long-form data.frame containing the total currency or currencies of interest (sum) for each species in each year within each rarefied time series (i.e. assemblageID). An extra integer column called resamp indicates the specific iteration.
library(BioTIMEr)
set.seed(42)
x <- gridding(BTsubset_meta, BTsubset_data)
resampling(x, measure = "BIOMASS")
resampling(x, measure = "ABUNDANCE")
resampling(x, measure = c("ABUNDANCE", "BIOMASS"))
Scale construction for ggplot use
Scale construction for filling in ggplot
scale_color_biotime(palette = "realms", discrete = TRUE, reverse = FALSE, ...)
scale_colour_biotime(palette = "realms", discrete = TRUE, reverse = FALSE, ...)
scale_fill_biotime(palette = "realms", discrete = TRUE, reverse = FALSE, ...)
palette |
One of: 'realms', 'gradient', 'cool', 'warm'. Defaults to 'realms'. |
discrete |
See Details. Defaults to 'TRUE'. |
reverse |
Defaults to 'FALSE'. |
... |
Passed to discrete_scale or scale_color_gradient, depending on discrete |
USAGE NOTE: Remember to change these arguments when plotting colours continuously.
If discrete is TRUE, the function returns a colour palette produced by discrete_scale; if discrete is FALSE, it returns a colour palette produced by scale_color_gradient.
Cher F. Y. Chow