Samples n_occ virtual occurrence points from a suitability or
Mahalanobis distance prediction surface. Supports centroid, edge, and
random sampling strategies, and accepts both raster (SpatRaster)
and data frame inputs.
Usage
sample_data(n_occ, prediction, prediction_layer = NULL,
sampling = "centroid", method = "suitability",
sampling_mask = NULL, seed = 1, strict = NULL,
verbose = TRUE)
Arguments
- n_occ
Integer. Number of occurrence points to sample.
- prediction
A SpatRaster or data frame containing the prediction surface to sample from.
- prediction_layer
Character. Name of the layer or column to use as the prediction values. Required when prediction contains multiple layers or columns.
- sampling
Character. Sampling strategy. One of "centroid" (default), "edge", or "random". Controls where within the niche points are preferentially drawn from.
- method
Character. Weighting method. One of "suitability" (default) or "mahalanobis". Must match the type of values in prediction_layer: suitability values must be in [0, 1], Mahalanobis values must be non-negative.
- sampling_mask
A SpatRaster or SpatVector used to restrict sampling to a geographic area. Only supported when prediction is a SpatRaster.
- seed
Integer. Random seed for reproducibility. Default is 1.
- strict
Logical or NULL. If TRUE, removes NA and zero-valued cells before sampling (recommended with truncated prediction layers). If NULL (default), auto-detected from the layer name and the proportion of zeros and NAs in the prediction values.
- verbose
Logical. If TRUE (default), prints progress messages.
Value
A data frame of sampled occurrence points with the same columns as the
input prediction (minus the internal pred column). If
prediction is a SpatRaster, the output includes x
and y coordinate columns.
Details
The sampling and method arguments interact to define the
probability weights used when drawing points:
- sampling = "centroid", method = "suitability": weights proportional to suitability, so higher near the niche center.
- sampling = "edge", method = "suitability": weights proportional to \(1 - \text{suitability}\), so higher near the niche boundary.
- sampling = "centroid", method = "mahalanobis": weights inversely proportional to Mahalanobis distance, so higher near the centroid.
- sampling = "edge", method = "mahalanobis": weights proportional to Mahalanobis distance, so higher near the boundary.
- sampling = "random": equal weights regardless of method.
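The weighting rules listed above can be sketched in a few lines of base R. This is only an illustration, not the package's internal code: sketch_weights() and its eps argument are hypothetical names, and the small eps offset that guards against division by a zero Mahalanobis distance is an assumption.

```r
# Hypothetical helper (not part of nicheR): turns prediction values into
# sampling probabilities following the sampling/method combinations above.
sketch_weights <- function(pred, sampling = "centroid",
                           method = "suitability", eps = 1e-9) {
  sampling <- match.arg(sampling, c("centroid", "edge", "random"))
  method <- match.arg(method, c("suitability", "mahalanobis"))

  if (sampling == "random") {
    w <- rep(1, length(pred))               # equal weights, method ignored
  } else if (method == "suitability") {
    w <- if (sampling == "centroid") pred else 1 - pred
  } else {
    # eps is an assumed guard against a distance of exactly zero
    w <- if (sampling == "centroid") 1 / (pred + eps) else pred
  }
  w / sum(w)                                # normalise to probabilities
}

suit <- c(0.9, 0.5, 0.1)
w_centroid <- sketch_weights(suit, "centroid", "suitability")
w_edge <- sketch_weights(suit, "edge", "suitability")

# Weighted draw of two row indices without replacement
set.seed(1)
idx <- sample(seq_along(suit), size = 2, prob = w_centroid)
```

With these three suitability values, the centroid weights favour the first cell and the edge weights are the same probabilities in reverse order, which is the contrast the two strategies are meant to produce.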
When strict = NULL, the function auto-detects truncation by checking
whether the layer name contains "trunc" or whether the proportion of
zeros or NAs exceeds 25%.
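The auto-detection rule can be approximated as below. This is a sketch under stated assumptions rather than the function's actual implementation: looks_truncated() is a hypothetical name, the name match is assumed to be case-sensitive, and zeros and NAs are assumed to be counted together against the 25% threshold.

```r
# Hypothetical helper (not part of nicheR): guesses whether a prediction
# layer is truncated, mirroring the rule described above.
looks_truncated <- function(values, layer_name, threshold = 0.25) {
  # Assumption: plain, case-sensitive substring match on the layer name
  name_hit <- grepl("trunc", layer_name, fixed = TRUE)
  # Assumption: zeros and NAs are pooled into one proportion
  prop_empty <- mean(is.na(values) | values == 0)
  name_hit || prop_empty > threshold
}

vals <- c(0, NA, 0.5, 0.7)
by_name <- looks_truncated(c(0.2, 0.4), "suitability_trunc")
by_values <- looks_truncated(vals, "suitability")

# What strict = TRUE is described as doing: drop NA and zero-valued cells
kept <- vals[!is.na(vals) & vals != 0]
```

Passing strict = TRUE or strict = FALSE explicitly bypasses this kind of guess, which is useful when a layer name or zero proportion would otherwise be misleading.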
Examples
pred_df <- utils::read.csv(system.file("extdata/predictions_virt.csv",
package = "nicheR"))
# Centroid strategy: samples cluster near the niche center
occ_centroid <- sample_data(n_occ = 100,
prediction = pred_df,
prediction_layer = "suitability_trunc",
sampling = "centroid",
method = "suitability",
strict = TRUE)
#> Starting: sample_data()
#> Warning: 'prediction' is a data.frame, and it is missing 'x' and 'y', results wont show geographical connections.
#> Done: sampled 100 points.
head(occ_centroid)
#> bio_1 bio_12 Mahalanobis suitability suitability_trunc
#> 258 23.28190 1893.185 0.3281314 0.8486862 0.8486862
#> 287 23.56782 1582.955 0.4795846 0.7867912 0.7867912
#> 917 24.59394 1700.047 0.8959799 0.6389111 0.6389111
#> 983 25.32620 1628.294 2.4530267 0.2933135 0.2933135
#> 601 23.25921 1664.733 0.2347962 0.8892311 0.8892311
#> 187 22.46245 2116.146 2.3148894 0.3142883 0.3142883
# Edge strategy: samples spread toward the niche boundary
occ_edge <- sample_data(n_occ = 100,
prediction = pred_df,
prediction_layer = "suitability_trunc",
sampling = "edge",
method = "suitability",
strict = TRUE)
#> Starting: sample_data()
#> Warning: 'prediction' is a data.frame, and it is missing 'x' and 'y', results wont show geographical connections.
#> Done: sampled 100 points.
# Random strategy: samples distributed uniformly across suitable area
occ_random <- sample_data(n_occ = 100,
prediction = pred_df,
prediction_layer = "suitability_trunc",
sampling = "random")
#> Starting: sample_data()
#> Warning: 'prediction' is a data.frame, and it is missing 'x' and 'y', results wont show geographical connections.
#> Step: auto-detected a likely truncated prediction surface. Setting 'strict = TRUE' and removing NA and zero values. You can override this behavior with the 'strict' argument...
#> Done: sampled 100 points.