Sample occurrence data from a bias-weighted prediction surface

Samples n_occ virtual occurrence points using the bias-weighted prediction values directly as sampling probabilities. Unlike sample_data(), it takes no sampling strategy argument: the prediction layer values alone determine where points are drawn, which makes the function suited to simulating realistically biased occurrence records.

Usage

sample_biased_data(n_occ, prediction, prediction_layer = NULL,
                   sampling_mask = NULL, seed = 1, verbose = TRUE,
                   strict = NULL)

Arguments

n_occ: Integer. Number of occurrence points to sample.
prediction: A SpatRaster or data frame containing the bias-weighted prediction surface to sample from.
prediction_layer: Character. Name of the layer or column to use as sampling weights. Required when prediction contains multiple layers or columns.
sampling_mask: A SpatRaster or SpatVector used to restrict sampling to a geographic area. Only supported when prediction is a SpatRaster.
seed: Integer. Random seed for reproducibility. Default is 1.
verbose: Logical. If TRUE (default), prints progress messages.
strict: Logical or NULL. If TRUE, removes NA and zero-valued cells before sampling. If NULL (default), auto-detected from the layer name and the proportion of zeros and NAs in the prediction values.

Value

A data frame of sampled occurrence points with the same columns as the input prediction (minus the internal pred column). If prediction is a SpatRaster, the output includes x and y coordinate columns.

Details

Prediction values are used directly as sampling weights, so they must be non-negative. Higher values mean a higher probability of being sampled and represent areas of stronger bias (e.g., higher detectability or observer effort). By contrast, sample_data() transforms prediction values according to its sampling and method arguments.
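
The idea of using prediction values directly as weights can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the toy data frame, its pred column values, and the choice to sample without replacement are all assumptions made for illustration.

```r
# Hypothetical toy surface: five cells with non-negative bias-weighted values.
cells <- data.frame(
  x    = c(-70.25, -98.08, -86.25, -70.42, -98.58),
  y    = c(8.92, 20.92, 11.92, 19.75, 21.25),
  pred = c(0.25, 0.19, 0.10, 0.00, 0.28)
)

set.seed(1)

# sample() rescales `prob` internally, so the weights need not sum to 1.
# Sampling without replacement is an assumption of this sketch.
idx <- sample(nrow(cells), size = 3, replace = FALSE, prob = cells$pred)

# Keep the drawn cells; the zero-weight cell can never be selected.
occ <- cells[idx, ]
```

Because the weights are only relative, multiplying every value by a constant leaves the sampling probabilities unchanged; only the ratios between cells matter.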

Auto-detection of strict follows the same logic as sample_data(): it is set to TRUE if the layer name contains "trunc" or if the proportion of zeros or NAs exceeds 25%.
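
The rule described above might look roughly like the sketch below. The helper detect_strict() is hypothetical and is not part of the package; it also assumes that zeros and NAs are counted together against the 25% threshold, which the description does not state explicitly.

```r
# Hypothetical helper illustrating the documented rule; not package code.
detect_strict <- function(layer_name, values, threshold = 0.25) {
  # Share of cells that are NA or exactly zero (counted together: an assumption).
  prop_empty <- mean(is.na(values) | values == 0)

  # TRUE if the layer name contains "trunc" or too many cells are empty.
  grepl("trunc", layer_name, fixed = TRUE) || prop_empty > threshold
}

by_name  <- detect_strict("suitability_trunc", c(0.2, 0.5))          # TRUE
by_share <- detect_strict("suitability", c(0, NA, 0.3, 0.4))         # TRUE
neither  <- detect_strict("suitability", c(0.1, 0.2, 0.3, 0.4))      # FALSE
```

Passing strict = TRUE or strict = FALSE explicitly bypasses any such detection, which is the safer choice when a layer name happens to contain "trunc" for unrelated reasons.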

Examples

biased_pred <- terra::rast(system.file("extdata/applied_bias_rast.tif",
                                     package = "nicheR"))

# Sample points from the bias surface (not a probability surface)
occ_biased <- sample_biased_data(n_occ = 100,
                                 prediction = biased_pred,
                                 prediction_layer = "suitability_biased_direct")
#> Starting: sample_biased_data()
#> Done: sampled 100 points from biased prediction layer

head(occ_biased)
#>               x         y suitability_biased_direct
#> 30419 -70.25000  8.916667                0.24892101
#> 12972 -98.08333 20.916667                0.18772951
#> 26003 -86.25000 11.916667                0.10474110
#> 14818 -70.41667 19.750000                0.01907690
#> 12489 -98.58333 21.250000                0.27995270
#> 14063 -76.25000 20.250000                0.02141882