Samples n_occ virtual occurrence points from a prediction surface generated by predict() on a nicheR_ellipsoid object. Unlike sample_data(), this function only accepts data frame input and is designed for purely virtual (non-spatial) workflows where no raster or geographic coordinates are involved.

Usage

sample_virtual_data(
  n_occ,
  object,
  virtual_prediction = NULL,
  prediction_layer = NULL,
  sampling = "centroid",
  method = "suitability",
  seed = 1,
  verbose = TRUE,
  strict = NULL
)

Arguments

n_occ

Integer. Number of occurrence points to sample.

object

A nicheR_ellipsoid object. Used for context and validation only, not for sampling itself: the prediction values are taken from virtual_prediction.

virtual_prediction

A data frame containing the prediction surface to sample from, typically the output of predict() on a nicheR_ellipsoid object.

prediction_layer

Character. Name of the column to use as prediction values. Required when virtual_prediction contains multiple prediction columns.

sampling

Character. Sampling strategy. One of "centroid" (default), "edge", or "random". Controls where within the niche points are preferentially drawn from.

method

Character. Weighting method. One of "suitability" (default) or "mahalanobis". Must match the type of values in prediction_layer: suitability values must lie in [0, 1]; Mahalanobis distances must be non-negative.

seed

Integer. Random seed for reproducibility. Default is 1.

verbose

Logical. If TRUE (default), prints progress messages.

strict

Logical or NULL. If TRUE, removes NA and zero-valued rows before sampling (recommended with truncated prediction layers). If NULL (default), strict is auto-detected from the layer name and the proportion of zeros and NAs in the prediction values (see Details).

Value

A data frame of sampled occurrence points with the same columns as virtual_prediction, minus the internal pred column.

Details

The sampling and method arguments interact to define sampling weights in the same way as sample_data():

  • sampling = "centroid", method = "suitability": weights proportional to suitability — higher near the niche center.

  • sampling = "edge", method = "suitability": weights proportional to \(1 - \text{suitability}\) — higher near the niche boundary.

  • sampling = "centroid", method = "mahalanobis": weights inversely proportional to Mahalanobis distance — higher near the centroid.

  • sampling = "edge", method = "mahalanobis": weights proportional to Mahalanobis distance — higher near the boundary.

  • sampling = "random": equal weights regardless of method.
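The weighting rules above can be sketched in base R. This is an illustration only, not the package's implementation: sketch_weights() is a hypothetical helper, the small eps guard against division by zero is an assumption, and so is drawing rows without replacement. The column names in the toy data frame are made up.

```r
# Hypothetical helper (not part of nicheR): maps prediction values to
# unnormalised sampling weights following the rules listed above.
sketch_weights <- function(pred, sampling = "centroid", method = "suitability") {
  sampling <- match.arg(sampling, c("centroid", "edge", "random"))
  method <- match.arg(method, c("suitability", "mahalanobis"))

  # "random": equal weights regardless of method
  if (sampling == "random") {
    return(rep(1, length(pred)))
  }

  if (method == "suitability") {
    stopifnot(all(pred >= 0 & pred <= 1))
    if (sampling == "centroid") pred else 1 - pred
  } else {
    stopifnot(all(pred >= 0))
    # Assumption: "inversely proportional" taken as 1 / (d + eps),
    # where eps avoids division by zero exactly at the centroid.
    eps <- 1e-6
    if (sampling == "centroid") 1 / (pred + eps) else pred
  }
}

# Toy prediction surface with invented environmental columns
virtual_prediction <- data.frame(
  bio1 = c(10, 12, 14, 16, 18),
  bio12 = c(800, 900, 1000, 1100, 1200),
  suitability = c(0.10, 0.45, 0.95, 0.50, 0.05)
)

# Edge sampling on suitability: weights are 1 - suitability
w <- sketch_weights(virtual_prediction$suitability, "edge", "suitability")

set.seed(1)
# Assumption: rows are drawn without replacement
idx <- sample(nrow(virtual_prediction), size = 3, replace = FALSE, prob = w)
sampled <- virtual_prediction[idx, ]
```

Because sample() normalises prob internally, the weights only need to be non-negative and proportional to the intended sampling probability.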

Auto-detection of strict follows the same logic as sample_data(): it is set to TRUE if the layer name contains "trunc" or if the proportion of zeros or NAs exceeds 25%.
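As a rough sketch of that rule, the check might look like the following in base R. sketch_detect_strict() is a hypothetical name, and two details are assumptions rather than documented behavior: zeros and NAs are counted together toward the 25% threshold, and the "trunc" match is case-sensitive.

```r
# Hypothetical helper (not part of nicheR): decides whether strict
# filtering would be switched on under the rule described above.
sketch_detect_strict <- function(values, layer_name, threshold = 0.25) {
  # Layer name suggests a truncated prediction layer
  name_hit <- grepl("trunc", layer_name, fixed = TRUE)
  # Assumption: zeros and NAs are pooled into a single proportion
  prop_empty <- mean(is.na(values) | values == 0)
  name_hit || prop_empty > threshold
}

values <- c(0, 0, NA, 0.4, 0.8, 0.9, 0.7, 0.6)  # 3 of 8 are zero or NA

strict_by_values <- sketch_detect_strict(values, "suitability")
strict_by_name <- sketch_detect_strict(c(0.2, 0.5, 0.9), "suitability_trunc")
strict_off <- sketch_detect_strict(c(0, 0.2, 0.3, 0.4, 0.5), "suitability")

# With strict = TRUE, NA and zero-valued entries are dropped before sampling
kept <- values[!is.na(values) & values != 0]
```

Here the first call is triggered by the share of zero or NA values (37.5%), the second by the layer name alone, and the third stays off because only 20% of the values are zero.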

See also

sample_data() for spatial sampling from raster or data frame prediction surfaces; sample_biased_data() for bias-weighted sampling.