24 HerbVar Glossary
When writing papers using HerbVar data, authors should attempt to use a consistent set of definitions and terms, which can facilitate comparisons of patterns across studies. If one person’s “among-leaf variation” is the same as another person’s “within-plant variation”, this simple mis-match in terminology will hinder communication. The constellation of terms related to variance, variability, and variation can also be source of confusion. We have assembled a set of terms that might commonly be used in HerbVar manuscripts and grants. Many of the terms related to “distributional thinking” are defined in Wetzel et al. (2023) and we recommend referring to that paper for more detailed discussion.
24.1 About HerbVar data
Herbivory: Consumption of living plant tissues by animals (i.e. herbivores or omnivores). Many of the HerbVar data on herbivory are about leaf damage done by insects, rather than large mammals; we recommend that authors clarify this at some point in manuscripts.
Herbivore Damage: Plant tissue removed or directly injured by herbivore feeding. Some authors, especially those focused on agriculture, define damage to mean the growth, fitness, or yield consequences of injury, and reserve injury for the direct consumption effects.
Folivory: Damage to leaves by animals that consume leaves.
Frugivory: Damage to fruits by animals that consume fruits.
Reproductive damage: Damage on any reproductive structure, including flowers, fruits, and seeds.
24.2 Distributional thinking
Variability: The propensity for a process to produce different outcomes. Variability is a feature of underlying processes, whereas variation is a feature of an observed pattern (i.e., data). Well-known drivers of interaction variability in ecology include organismal plasticity, demographic stochasticity, and population dynamics; these factors and more combine to create an underlying probability distribution for a given interaction. When it is not possible to measure repeated outcomes, it may not be possible to measure variability directly.
Variation: The magnitude of differences among units in a system. Variability is a feature of underlying processes, whereas variation is a feature of an observed pattern (i.e., data).
Variance and standard deviation: A common way to quantify variation, the average squared deviation from the mean (or sum of squared deviations divided by (n-1) for samples from populations). Note that the units of the variance are squared, so it is common to work with the square root of the variance, which is the standard deviation.
Coefficient of Variation (CV): The standard deviation divided by the mean. Many data-generating processes result in larger variances when means are larger (e.g., for Poisson processes the variance and mean are equal, by definition). Dividing the standard deviation by the mean helps to “correct” for this relationship, and results in a unitless number so that the CVs can be compared among any data, regardless of the original data units (leaves, area, height, etc.).
The CV is popular, but is problematic when the mean is near 0 or negative (as can happen if data can take negative values as well as positive ones), or the variance is extremely large. A recently proposed alternative is bounded by 0 and 1 and is based on second-order distributional moments. It was proposed by Kvålseth (2017), thus is indicated by KCV, whereas the traditional CV is indicated here by PCV as it was first proposed by Pearson (1897). They are related by Lobry et al. (2023).
Gini Coefficient: The Gini Coefficient is a measure of inequality, rather than variance. It is bounded by 0 and 1, where 0 indicates a perfectly even distribution of a total amount among all individuals, and 1 indicates a perfectly unequal distribution of the total among individuals (all of the total belongs to one individual). Note that because the Gini is defined by a distribution of a total amount, it does not depend on the mean amount per individual.
Median Absolute Difference: The median of the absolute values of the differences between each value and the overall median. This is an alternative to the standard deviation that does not rely on squared values. It is less sensitive than the standard deviation to extreme values.
Mean Absolute Difference: The mean of the absolute values of the differences between each value and the overall mean. Like the median absolute difference, this metric is less sensitive than the standard deviation to large deviations.
Relative Mean Absolute Difference: The mean absolute difference divided by the arithmetic mean. Like the coefficient of variation (CV), this metric relativizes the mean absolute difference by the mean so that the dispersion of populations with different means can be compared. Like the CV, it is a unitless number. The relative mean absolute difference is equivalent to two times the Gini coefficient.
Skewness: A measure of the asymmetry of a distribution. A symmetric distribution, with the same shape of distribution tail in both directions, has a skew of zero. A positive skew indicates a longer tail toward higher values (the median is to the left of the mean). A negative skew indicates a longer tail to the left. In R, the ‘sn’ package can be useful for working with ‘skew-Normal’ distributions, extensions of Normal (Gaussian) distributions.
Kurtosis: A measure of the thickness (“fatness”) of a distribution’s “tails”. When a distribution’s tails are thicker, data are more likely to have a high variance and contain larger outliers.
24.3 Terms specific to the HerbVar data
A strength of the HerbVar dataset is that it contains data for multiple hierarchical scales of organization, from leaves on individual plants, up through populations and species. Patterns in the HerbVar data can be analyzed at multiple scales. Within a manuscript, we recommend people be clear about the scale(s) of analysis, and consistent about which term are used.
Among-leaf variation: This is the smallest scale of HerbVar data, the leaves on a single plant individual. This variation can be quantified using metrics such as the Gini coefficient, the variance, the standard deviation, or a CV. At this scale all of the damage is to a single individual, so it has also been called “within-plant variation”.
Among-plant variation: Each plant has a mean amount of herbivory, which can vary among plants within a population (for most HerbVar data, a population contains 60 individuals). This scale might also be called “within-population variation”.
Among-population variation: For some plant species, participants have contributed data from different populations. For focal species, the HerbVar data include many populations from multiple continents. Each population has a mean amount of herbivory, also known as “within-species” variation. Comparisons among populations are facilitated by having the same sample size for each population.
Among-species variation: For studies looking across species within a family, or at any larger phylogenetic context, one could compare species-level mean amounts of herbivory. This idea can be transferred to higher phylogenetic levels by looking at among-family variation, or to environmental scales of organization by looking at among-habitat variation or among-biome variation.