Sample occurrence data from a virtual prediction surface

Samples n_occ virtual occurrence points from a prediction surface generated by predict() on a nicheR_ellipsoid object. Unlike sample_data(), this function accepts only data frame input and is designed for purely virtual (non-spatial) workflows that involve no raster or geographic coordinates.

Usage

sample_virtual_data(
  n_occ,
  object,
  virtual_prediction = NULL,
  prediction_layer = NULL,
  sampling = "centroid",
  method = "suitability",
  seed = 1,
  verbose = TRUE,
  strict = NULL
)

Arguments

n_occ: Integer. Number of occurrence points to sample.
object: A nicheR_ellipsoid object. Used for context and validation but not directly for sampling — prediction values are taken from virtual_prediction.
virtual_prediction: A data frame containing the prediction surface to sample from, typically the output of predict() on a nicheR_ellipsoid object.
prediction_layer: Character. Name of the column to use as prediction values. Required when virtual_prediction contains multiple prediction columns.
sampling: Character. Sampling strategy. One of "centroid" (default), "edge", or "random". Controls where within the niche points are preferentially drawn from.
method: Character. Weighting method. One of "suitability" (default) or "mahalanobis". Must match the type of values in prediction_layer: suitability values must lie in [0, 1]; Mahalanobis values must be non-negative.
seed: Integer. Random seed for reproducibility. Default is 1.
verbose: Logical. If TRUE (default), prints progress messages.
strict: Logical or NULL. If TRUE, removes NA and zero-valued rows before sampling (recommended with truncated prediction layers). If NULL (default), auto-detected from the layer name and the proportion of zeros and NAs in the prediction values.

Value

A data frame of sampled occurrence points with the same columns as virtual_prediction, minus the internal pred column.

Details

The sampling and method arguments interact to define sampling weights in the same way as sample_data():

sampling = "centroid", method = "suitability": weights proportional to suitability — higher near the niche center.
sampling = "edge", method = "suitability": weights proportional to \(1 - \text{suitability}\) — higher near the niche boundary.
sampling = "centroid", method = "mahalanobis": weights inversely proportional to Mahalanobis distance — higher near the centroid.
sampling = "edge", method = "mahalanobis": weights proportional to Mahalanobis distance — higher near the boundary.
sampling = "random": equal weights regardless of method.
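
The weighting rules above can be sketched in base R. This is an illustration only, not the package's internal code: the helper name sketch_weights is hypothetical, and the 1 / d form used for the inverse Mahalanobis weight is an assumption, since the exact inverse transform is not stated here. The draw at the end samples without replacement, which is also an assumption.

```r
# Hypothetical helper: maps prediction values to sampling weights.
# Not part of nicheR; the inverse-distance form (1 / d) is an assumption.
sketch_weights <- function(pred, sampling = "centroid", method = "suitability") {
  if (sampling == "random") {
    return(rep(1, length(pred)))  # equal weights, method is ignored
  }
  if (method == "suitability") {
    if (sampling == "centroid") pred else 1 - pred
  } else {
    # small offset avoids division by zero exactly at the centroid
    if (sampling == "centroid") 1 / (pred + 1e-9) else pred
  }
}

suit <- c(0.9, 0.5, 0.1)  # made-up suitability values
w_centroid <- sketch_weights(suit, "centroid", "suitability")
w_edge     <- sketch_weights(suit, "edge", "suitability")

# Draw row indices with probability proportional to the weights
# (without replacement here, which is an assumption)
set.seed(1)
idx <- sample(seq_along(suit), size = 2, prob = w_centroid)
```

With sampling = "centroid" the first row (suitability 0.9) is the most likely to be drawn; with sampling = "edge" the ordering of the weights is reversed.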

Auto-detection of strict follows the same logic as sample_data(): it is set to TRUE if the layer name contains "trunc" or if the proportion of zeros or NAs exceeds 25%.
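
A minimal sketch of this rule in base R may help when deciding whether to set strict explicitly. The helper name sketch_detect_strict, the layer name "suitability_trunc", and the column name bio1 are hypothetical, and the sketch assumes zeros and NAs are counted together against the 25% threshold; the package may count them separately.

```r
# Hypothetical helper mirroring the documented rule; not nicheR's own code.
# Assumption: zeros and NAs are pooled when compared with the threshold.
sketch_detect_strict <- function(pred, layer_name, threshold = 0.25) {
  prop_empty <- mean(is.na(pred) | pred == 0)
  grepl("trunc", layer_name, fixed = TRUE) || prop_empty > threshold
}

pred <- c(0, 0, NA, 0.4, 0.8, 0.6, 0.7, 0.9)  # made-up prediction values

# 3 of 8 values are zero or NA, which exceeds the 25% threshold
strict_by_values <- sketch_detect_strict(pred, "suitability")

# Few empty values, but the layer name contains "trunc"
strict_by_name <- sketch_detect_strict(c(0.2, 0.5), "suitability_trunc")

# What strict = TRUE implies: NA and zero-valued rows are dropped
df <- data.frame(bio1 = 1:8, pred = pred)
kept <- df[!is.na(df$pred) & df$pred != 0, ]
```

Passing strict = TRUE or strict = FALSE directly bypasses auto-detection, which is useful when a layer is truncated but its name does not say so.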
