24 HerbVar Glossary

Modified

January 23, 2026

When writing papers using HerbVar data, authors should attempt to use a consistent set of definitions and terms, which can facilitate comparisons of patterns across studies. If one person’s “among-leaf variation” is the same as another person’s “within-plant variation”, this simple mis-match in terminology will hinder communication. The constellation of terms related to variance, variability, and variation can also be source of confusion. We have assembled a set of terms that might commonly be used in HerbVar manuscripts and grants. Many of the terms related to “distributional thinking” are defined in Wetzel et al. (2023) and we recommend referring to that paper for more detailed discussion.

24.1 About HerbVar data

Herbivory: Consumption of living plant tissues by animals (i.e. herbivores or omnivores). Many of the HerbVar data on herbivory are about leaf damage done by insects, rather than large mammals; we recommend that authors clarify this at some point in manuscripts.

Herbivore Damage: Plant tissue removed or directly injured by herbivore (or omnivore) feeding. Some authors, especially those focused on agriculture, define damage to mean the growth, fitness, or yield consequences of injury, and reserve injury for the direct consumption effects. For consistency with other HerbVar papers, we recommend using ‘damage’.

Folivory: Damage to leaves by animals that consume leaves.

Frugivory: Damage to fruits by animals that consume fruits.

Reproductive damage: Damage to any reproductive structure, including flowers, fruits, and seeds.

24.2 Distributional thinking

Variability: The propensity for a process to produce different outcomes. Variability is a feature of underlying processes, whereas variation is a feature of an observed pattern (i.e., data). Well-known drivers of interaction variability in ecology include organismal plasticity, demographic stochasticity, and population dynamics; these factors and more combine to create an underlying probability distribution for a given interaction. When it is not possible to measure repeated outcomes, it may not be possible to measure variability directly.

Variation: The magnitude of differences among units in a system. Variability is a feature of underlying processes, whereas variation is a feature of an observed pattern (i.e., data).

Variance and standard deviation: The variance is a common way to quantify variation, and is the average squared deviation from the mean (or sum of squared deviations divided by \((n-1)\) for samples from populations). Note that the units of the variance are squared, so it is common to work with the square root of the variance, which is the standard deviation.

Coefficient of Variation (CV): The standard deviation divided by the mean. Many data-generating processes result in larger variances when means are larger (e.g., for Poisson processes the variance and mean are equal, by definition). Dividing the standard deviation by the mean helps to “correct” for this relationship, and results in a unitless number so that the CVs can be compared among any data, regardless of the original data units (leaves, area, height, etc.).

Important consideration regarding the CV

The CV is popular, but is problematic when the mean is near 0 or negative (as can happen if data can take negative values as well as positive ones), or the variance is extremely large. A recently proposed alternative is bounded by 0 and 1 and is based on second-order distributional moments. It was proposed by Kvålseth (2017), thus is indicated by \(^KCV\), whereas the traditional CV is indicated here by \(^PCV\) as it was first proposed by Pearson (1897). As per Lobry et al. (2023), they are related by \(^KCV = \sqrt{\frac{{^PCV}^2}{1 + {^PCV}^2}}\).

Gini Coefficient: The Gini Coefficient is a measure of inequality, rather than variance. It is bounded by 0 and 1, where 0 indicates a perfectly even distribution of a total amount among all individuals, and 1 indicates a perfectly unequal distribution of the total among individuals (i.e., all of the total belongs to one individual). Note that because the Gini is defined by a distribution of a total amount, it does not depend on the mean amount per individual.

Lorenz curve: A curve that represents the inequality of a measurement (i.e., herbivory) in a sample. Individual data points are ordered and plotted as cumulative percent of individuals on the x-axis against the cumulative proportion of the measured variable on the y-axis. The 1:1 line represents the line of equality, where all individuals would have the same amounts of herbivory. The polygon connecting to the origin and the maximum (100% of individuals and herbivory) is the basis for the Gini coefficient and the Lorenz asymmetry coefficient. For an in-depth overview of these indices with ecological examples, see Damgaard and Weiner (2000).

Lorenz asymmetry coefficient: Characterizes the symmetry of the Lorenz curve. A value of 1 represents a symmetric curve, where the vertex is parallel to the line of equality. A value greater than 1 indicates herbivory is concentrated on a few individuals. A value less than 1 indicates that only a few individuals have low herbivory.

Median Absolute Difference: The median of the absolute values of the differences between each value and the overall median. This is an alternative to the standard deviation that does not rely on squared values. It is less sensitive than the standard deviation to extreme values.

Mean Absolute Difference: The mean of the absolute values of the differences between each value and the overall mean. Like the median absolute difference, this metric is less sensitive than the standard deviation to large deviations.

Relative Mean Absolute Difference: The mean absolute difference divided by the arithmetic mean. Like the coefficient of variation (CV), this metric relativizes the mean absolute difference by the mean so that the dispersion of populations with different means can be compared. Like the CV, it is a unitless number. The relative mean absolute difference is equivalent to two times the Gini coefficient.

Skewness: A measure of the asymmetry of a distribution. A symmetric distribution, with the same shape of distribution tail in both directions, has a skew of zero. A positive skew indicates a longer tail toward higher values (i.e., the median is to the left of the mean). A negative skew indicates a longer tail to the left. In R, the sn package (Azzalini 2023) can be useful for working with ‘skew-Normal’ distributions and extensions of Normal (Gaussian) distributions.

Kurtosis: A measure of the thickness (“fatness”) of a distribution’s tails. When a distribution’s tails are thicker, data are more likely to have a high variance and contain larger outliers.

24.3 Terms specific to the HerbVar data

A strength of the HerbVar dataset is that it contains data for multiple hierarchical scales of organization, from leaves on individual plants, up through populations and species. Patterns in the HerbVar data can be analyzed at multiple scales. Within a manuscript, we recommend people be clear about the scale(s) of analysis, and consistent about which terms are used.

Preferred Terminology

For clarity, we recommend using “among-…” for the unit being measured as the descriptor, rather than alternatives such as “within-” or “intra-” at units larger than the focal measurements. For example, if you are measuring damage on plants within a population, the term “among-plant” should be used rather than “within-population”. Ideally, the first mention would say “among-plants within a population”.

Among-leaf variation: This is the smallest scale of HerbVar data, the leaves on a single plant individual. This variation can be quantified using metrics such as the Gini coefficient, the variance, the standard deviation, or a CV. At this scale, all of the damage is to a single individual, so it has also been called “within-plant variation”.

Among-plant variation: Each plant has a mean amount of herbivory (i.e., mean of the herbivore damage on all its leaves), which can vary among plants within a population (for most HerbVar data, a population is represented by a sample of 60 individuals). This scale might also be called “within-population variation” or “intrapopulation variation”, but when the focus is on a plant-level measurement, we recommend using “among-plant variation”.

Among-population variation: For some plant species, participants have contributed data from different populations. For focal species, the HerbVar data include many populations from multiple continents. Each population has a mean amount of herbivory, also known as “within-species” variation. Comparisons among populations are facilitated by having the same sample size for each population.

Among-species variation: For studies looking across species within a family, or at any larger phylogenetic context, one could compare species-level mean amounts of herbivory. This idea can be transferred to higher phylogenetic levels by looking at among-family variation, or to environmental scales of organization by looking at among-habitat variation or among-biome variation.