Title: | Progressive Bias Correction of Satellite Environmental Data |
---|---|
Description: | Implements a bias correction method that combines Random Forest models with Quantile Mapping to improve the accuracy of satellite-derived environmental datasets. The model corrects biases in meteorological variables, such as precipitation and temperature, by integrating in situ measurements and a Digital Elevation Model (DEM). |
Authors: | Jonnathan Landi [aut, cre, cph] |
Maintainer: | Jonnathan Augusto Landi Bermeo <[email protected]> |
License: | GPL (>=3) |
Version: | 1.3-0 |
Built: | 2025-02-12 02:50:42 UTC |
Source: | https://github.com/jonnathan-landi/rfplus |
This dataset contains daily measurements from several precipitation stations. The first column represents the measurement date, and the following columns correspond to the measurements from each station on that date. The station columns are labeled with unique identifiers for each station, and the number of stations may vary depending on the dataset configuration.
data("BD_Insitu")
A 'data.table' object with station measurements. The dataset includes the following columns:
Date
The measurement date (type Date).
Station_ID_1, Station_ID_2, ...
Measurements from the stations (numeric values). Each column after Date represents the measurements of a precipitation station for the corresponding date. The columns are labeled with unique identifiers (e.g., Station_ID_1, Station_ID_2, etc.) for each station, and the number of stations (columns) may vary.
The data represents daily measurements taken from several precipitation stations. The first column contains the measurement dates, and the following columns represent the measurements of each station on those dates. The number of stations may vary depending on the dataset, and each station is uniquely identified by its column name (e.g., Station_ID_1, Station_ID_2, etc.).
The data was generated for use in the bias correction model for satellite products, RFplus.
data(BD_Insitu)
## You can use str(BD_Insitu) to get a description of the structure
## or view some of the first rows using head(BD_Insitu)
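As a rough illustration of the wide layout described above, the following sketch builds a small synthetic table and reshapes it to one row per date and station using base R only. The station names `Station_ID_1` and `Station_ID_2` follow the placeholder labels used in this page, and all values are invented; the real `BD_Insitu` object is a 'data.table' with its own station identifiers.

```r
# Synthetic stand-in for BD_Insitu: first column Date, then one column per station.
# Station names and values are invented for illustration only.
bd <- data.frame(
  Date = as.Date("2020-01-01") + 0:2,
  Station_ID_1 = c(0.0, 1.2, 3.4),
  Station_ID_2 = c(0.5, 0.0, 2.1)
)

# Every column except Date is treated as a station.
station_cols <- setdiff(names(bd), "Date")

# Stack the station columns into long format: one row per (Date, Station) pair.
bd_long <- data.frame(
  Date    = rep(bd$Date, times = length(station_cols)),
  Station = rep(station_cols, each = nrow(bd)),
  Value   = unlist(bd[station_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(bd_long)
```

The long form is convenient for per-station summaries, while the wide form is the layout this page describes as input to RFplus.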
This dataset contains the coordinates (in UTM format) of several precipitation stations. Each station is uniquely identified by the Cod column, which corresponds to the station identifiers used in the BD_Insitu dataset. The coordinates of each station are provided in two columns: X for the Easting (longitude) and Y for the Northing (latitude).
data("Cords_Insitu")
A 'data.table' object with station coordinates. The dataset includes the following columns:
Cod
The unique identifier for each station. This should correspond to the station columns in the BD_Insitu dataset.
X
The Easting (X-coordinate) of the station in UTM format (numeric).
Y
The Northing (Y-coordinate) of the station in UTM format (numeric).
The data represents the geographic coordinates of precipitation stations used in the analysis. The first column, Cod, contains the unique identifiers of the stations, which should match the column names in the BD_Insitu dataset. The subsequent columns, X and Y, contain the UTM coordinates for each station, representing the station's location on the Earth's surface.
The data was generated for use in the bias correction model for satellite products, RFplus.
data(Cords_Insitu)
## You can use str(Cords_Insitu) to get a description of the structure
## or view some of the first rows using head(Cords_Insitu)
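Because the Cod identifiers are expected to match the station column names in BD_Insitu, a quick consistency check before modelling can catch mismatches early. The sketch below uses synthetic tables; the identifiers `ST_A` and `ST_B`, the coordinates, and the helper variable names are all hypothetical. The same `setdiff()` logic should apply unchanged to the packaged 'data.table' objects.

```r
# Synthetic stand-ins for BD_Insitu and Cords_Insitu; identifiers and
# coordinates are invented for illustration only.
bd <- data.frame(
  Date = as.Date("2020-01-01") + 0:1,
  ST_A = c(0.0, 1.2),
  ST_B = c(0.5, 0.0)
)
cords <- data.frame(
  Cod = c("ST_A", "ST_B"),
  X   = c(721000, 735500),    # Easting
  Y   = c(9680000, 9692500),  # Northing
  stringsAsFactors = FALSE
)

# Station columns are everything except the date column.
station_cols <- setdiff(names(bd), "Date")

# Stations with measurements but no coordinates, and the reverse case.
missing_coords <- setdiff(station_cols, cords$Cod)
missing_data   <- setdiff(cords$Cod, station_cols)

ids_match <- length(missing_coords) == 0 && length(missing_data) == 0
print(ids_match)
```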
Applies a hybrid three-step bias correction approach combining Random Forest predictions, residual correction, and distribution adjustment using quantile mapping methods to correct biases in satellite-derived environmental data.
RFplus(BD_Insitu, Cords_Insitu, Covariates, ...)

## Default S3 method:
RFplus(
  BD_Insitu,
  Cords_Insitu,
  Covariates,
  n_round = NULL,
  wet.day = FALSE,
  ntree = 2000,
  seed = 123,
  training = 1,
  Rain_threshold = 0.1,
  method = c("RQUANT", "QUANT", "none"),
  ratio = 15,
  save_model = FALSE,
  name_save = NULL,
  ...
)

## S3 method for class 'data.table'
RFplus(
  BD_Insitu,
  Cords_Insitu,
  Covariates,
  n_round = NULL,
  wet.day = FALSE,
  ntree = 2000,
  seed = 123,
  training = 1,
  Rain_threshold = 0.1,
  method = c("RQUANT", "QUANT", "none"),
  ratio = 15,
  save_model = FALSE,
  name_save = NULL,
  ...
)
BD_Insitu |
'data.table' containing the ground truth measurements (dependent variable) used to train the RFplus model. Each column represents a ground station, and station identifiers are stored as column names. The class of 'BD_Insitu' must be 'data.table'. Each row represents a time step with measurements of the corresponding station. |
Cords_Insitu |
'data.table' containing metadata for the ground stations. Must include the following columns: - 'Cod': Unique identifier for each ground station. - 'X': Easting of the station in UTM format. - 'Y': Northing of the station in UTM format. |
Covariates |
A list of covariates used as independent variables in the RFplus model. Each covariate should be a 'SpatRaster' object (from the 'terra' package) and can represent satellite-derived weather variables or a Digital Elevation Model (DEM). All covariates should have the same number of layers (bands), except for the DEM, which must have only one layer. |
... |
Additional arguments to pass to the underlying methods (e.g., for model tuning or future extensions). |
n_round |
Numeric indicating the number of decimal places to round the corrected values. If 'n_round' is set to 'NULL', no rounding is applied. |
wet.day |
Numeric value indicating the threshold for wet day correction. Values below this threshold will be set to zero. - 'wet.day = FALSE': No correction is applied (default). - For wet day correction, provide a numeric threshold (e.g., 'wet.day = 0.1'). |
ntree |
Numeric indicating the maximum number of trees to grow in the Random Forest algorithm (default: 2000). Do not set this too low: every input row should be predicted at least a few times, and too few trees may bias the prediction. |
seed |
Integer for setting the random seed to ensure reproducibility of results (default: 123). |
training |
Numeric value between 0 and 1 indicating the proportion of data used for model training. The remaining data are used for validation. For example, 'training = 0.8' means that 80% of the data are used for training and the remaining 20% for validation. If you do not want to perform validation, set 'training = 1' (default). |
Rain_threshold |
Numeric value that defines the precipitation threshold for classifying rainfall events. Precipitation values above this threshold will be considered as rainfall events, while values below it will be considered as no-rain events. This parameter is used to calculate key performance metrics such as the Probability of Detection (POD), False Alarm Rate (FAR), and Critical Success Index (CSI), which help assess the accuracy of rainfall event predictions. Note: This parameter should only be set when 'training' is not equal to 1, as it is needed to calculate the POD, FAR, and CSI metrics. The default value for this parameter is 0.1. |
method |
A character string specifying the quantile mapping method used for distribution adjustment. Options are: - '"RQUANT"': Robust quantile mapping to adjust satellite data distribution to observed data. - '"QUANT"': Standard quantile mapping. - '"none"': No distribution adjustment is applied. Only Random Forest-based bias correction and residual correction are performed. |
ratio |
Integer. Maximum search radius (in kilometers) used to find the nearest station for the quantile mapping adjustment (default: 15 km). |
save_model |
Logical value indicating whether the corrected raster layers should be saved to disk. The default is 'FALSE'. If set to 'TRUE', make sure to set the working directory beforehand using 'setwd(path)' to specify where the files should be saved. |
name_save |
Character string. Base name for the output file (default: NULL, in which case the file is saved as "Model_RFplus.nc"). If you set a different name, do not include the ".nc" extension, as the code appends it internally. |
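To make the role of 'Rain_threshold' more concrete, the sketch below computes POD, FAR, and CSI from a pair of observed and predicted series. It assumes the standard contingency-table definitions (hits, misses, false alarms), with FAR computed as false alarms over all predicted events; the function name `rain_metrics` and the sample values are hypothetical, and the package's internal calculation may differ.

```r
# Categorical rainfall metrics from a 2x2 contingency table.
# Assumption: standard definitions; not taken from the RFplus source code.
rain_metrics <- function(obs, pred, threshold = 0.1) {
  # Values above the threshold count as rainfall events.
  obs_rain  <- obs  > threshold
  pred_rain <- pred > threshold

  hits         <- sum(obs_rain & pred_rain)    # event observed and predicted
  misses       <- sum(obs_rain & !pred_rain)   # event observed, not predicted
  false_alarms <- sum(!obs_rain & pred_rain)   # event predicted, not observed

  c(
    POD = hits / (hits + misses),
    FAR = false_alarms / (hits + false_alarms),
    CSI = hits / (hits + misses + false_alarms)
  )
}

# Invented example series (same units as the threshold).
obs  <- c(0.0, 0.0, 2.5, 4.0, 0.05, 1.0)
pred <- c(0.0, 0.3, 1.8, 0.0, 0.00, 0.7)

metrics <- rain_metrics(obs, pred, threshold = 0.1)
print(metrics)
```

Raising the threshold restricts the metrics to heavier events, which typically changes all three scores.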
The 'RFplus' method implements a three-step approach:
1. **Base Prediction**: A Random Forest model is trained using satellite data and covariates.
2. **Residual Correction**: A second Random Forest model is used to correct the residuals from the base prediction.
3. **Distribution Adjustment**: Quantile mapping (QUANT or RQUANT) is applied to adjust the distribution of satellite data to match the observed data distribution.
The final result combines all three steps, correcting the biases while preserving the outliers, and improving the accuracy of satellite-derived data such as precipitation and temperature.
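The distribution adjustment step can be pictured with a generic empirical quantile mapping, sketched below in base R. This is an illustration of the general technique only, not the package's QUANT or RQUANT implementation; the function names `fit_qm` and `apply_qm`, the number of quantiles, and the simulated series are all assumptions.

```r
# Generic empirical quantile mapping (illustrative; not the RFplus internals).
fit_qm <- function(obs, mod, probs = seq(0, 1, by = 0.01)) {
  # Store matching quantiles of the modelled and observed series.
  list(
    mod_q = quantile(mod, probs, names = FALSE),
    obs_q = quantile(obs, probs, names = FALSE)
  )
}

apply_qm <- function(fit, x) {
  # Interpolate linearly between quantile pairs; rule = 2 clamps values
  # outside the calibration range to the nearest end point.
  approx(fit$mod_q, fit$obs_q, xout = x, rule = 2, ties = mean)$y
}

set.seed(123)
obs <- rgamma(500, shape = 2, scale = 3)              # simulated "station" series
mod <- 0.6 * rgamma(500, shape = 2, scale = 3) + 1    # simulated biased series

fit <- fit_qm(obs, mod)
corrected <- apply_qm(fit, mod)

# Compare medians before and after the adjustment.
print(c(obs = median(obs), mod = median(mod), corrected = median(corrected)))
```

After mapping, the corrected series takes on the observed distribution over the calibration period, which is the effect the third step aims for.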
Returns a list containing two elements:
Ensamble |
A 'SpatRaster' object containing the bias-corrected layers for each time step. The number of layers corresponds to the number of dates for which the correction is applied. This represents the corrected satellite data adjusted for bias. |
Validation |
A data frame or similar object containing the statistical results obtained from the validation process. These statistics assess the performance of the bias correction applied to the satellite data. |
Jonnathan Augusto Landi Bermeo, [email protected]
# Load the libraries
library(terra)
library(data.table)

# Load the data
data("BD_Insitu", package = "RFplus")
data("Cords_Insitu", package = "RFplus")

# Convert to data.table
setDT(BD_Insitu)
setDT(Cords_Insitu)

# Load the covariates
Covariates <- list(
  MSWEP = terra::rast(system.file("extdata/MSWEP.nc", package = "RFplus")),
  CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "RFplus")),
  DEM = terra::rast(system.file("extdata/DEM.nc", package = "RFplus"))
)

# Apply the RFplus bias correction model
RFplus(BD_Insitu, Cords_Insitu, Covariates, n_round = 1, wet.day = 0.1,
       ntree = 2000, seed = 123, training = 1, Rain_threshold = 0.1,
       method = "QUANT", ratio = 15, save_model = FALSE, name_save = NULL)
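The 'n_round = 1' and 'wet.day = 0.1' settings in the call above can be pictured on a plain numeric vector, as in the sketch below. The sample values are invented, and applying the wet-day threshold before rounding is an assumption, since this page does not state the order used internally.

```r
# Invented corrected values standing in for one pixel's time series.
corrected <- c(0.04, 0.26, 3.14159, 0.09, 12.98)

wet_day <- 0.1   # mirrors wet.day = 0.1
n_round <- 1     # mirrors n_round = 1

# Values below the wet-day threshold are set to zero.
corrected[corrected < wet_day] <- 0

# Round to the requested number of decimals; NULL means no rounding.
if (!is.null(n_round)) {
  corrected <- round(corrected, n_round)
}

print(corrected)
```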