We propose the following workflow for processing raw yield monitor data:
To illustrate the functions that process at yield monitor data, we will look at data from a research paper named Prairie strips improve biodiversity and the delivery of multiple ecosystem services from corn–soybean croplands. Let us, specifically, look at the data from 2012.
extd.dir <- system.file("extdata", package = "pacu")
raw.yield <- st_read(file.path(extd.dir, '2012-basswood.shp'), quiet = TRUE)
boundary <- st_read(file.path(extd.dir, 'boundary.shp'), quiet = TRUE)
We can see that the raw yield data has multiple columns. Some of them are more or less relevant depending on how we choose to process the data yield data to make a yield map.
## [1] "ID" "LONGITUDE" "LATITUDE" "FLOW" "TIME"
## [6] "CYCLES" "DISTANCE" "SWATH" "MOISTURE" "STATUS"
## [11] "PASS" "SERIAL" "FIELD" "LOAD" "CROP"
## [16] "GPS" "PDOP" "ALTITUDE" "DRY_BU_AC" "DAY"
## [21] "MONTH" "DAYOFMONTH" "HOUR" "MINUTE" "SECOND"
## [26] "TIMELAPSE" "SPEED" "geometry"
Our main objective in this step of the proposed workflow is to make sure that the data conforms to what we would expect of yield monitor data. For instance, we can examine the raw yield data or the distance between measurements.
The boxplot shows some observations that appear to be abnormally high. These can be a result of sudden speed or direction change, GPS errors, and other sources of unknown variability.
In this step of the proposed workflow, the pa_check_yield() function searches for potential sources of problem in the data to give the user the opportunity of dealing with these before processing the yield data. For instance, when no units are supplied to the pa_yield() function, the function tries to guess the units. Since the pa_check_yield() function outputs the guessed units, the user can supply the units to the pa_yield() function in case the function’s guess is incorrect.
The user can choose to check for potential errors that would occur with all the algorithms available to the function or specify which algorithm should the function target.
## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## --------------------------------
## variable column mean units
## --- --- --- ---
## yield DRY_BU_AC 116.7 bu/ac
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## --------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield 10.25 372 118.553 0 6
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
##
##
## Algorithm: RITAS
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## mass - - -
## flow FLOW 18.80 lb/s
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## angle - - -
## width SWATH 240 in
## distance DISTANCE 147.2 in
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## mass - - - - -
## flow 0.3 40.2 18.4000 0 0
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## angle - - - - -
## width 240 240 240 0 0
## distance 5 304 153 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
## Warning in print.check.yield(x): Column angle not found but can be estimated
## within pa_yield
## Median overlap between harvest polygons is 3.1 %
## If this value seems high, please check the units of distance and swath.
The pa_check_yield() function will search for several sources of problem that could prevent the pa_yield() function from working correctly. Therefore, it is important to deal with the identified issues before processing the yield data.
Let us imagine a scenario in which we want to use the simple algorithm to process the data. To do so, we need a column with the yield data. The pa_yield() and pa_check_yield() functions will try to identify which columns of the input data frame contain the relevant information, and the units if those are not supplied. We can create a toy example in which the yield column has an uncommon name, preventing the functions from identifying it.
toy.example <- raw.yield
names(toy.example) <- gsub('DRY_BU_AC', 'NOT_A_COMMON_NAME', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk
## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## yield - - -
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield - - - - -
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
## Warning in print.check.yield(x): Columns yield and moisture are needed for the
## simple algorithm
To deal with this issue, we can rename the column to a more common name such as “yield” or “YIELD”. Additionally, a valid solution would be to provide the column name to the pa_yield() function directly when processing the data. Here, we rename the column simply to “yield” and we can see that the algorithm is able to identify it once again.
names(toy.example) <- gsub('NOT_A_COMMON_NAME', 'yield', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk
## Field information
## -------------------------
## # of points 4240
## Approx. area (ha) 9.8529
## Approx. area (ac) 24.352
## Latitude 41.539
## Longitude -93.27
## CRS NA
## -------------------------
## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326
##
##
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable column mean units
## --- --- --- ---
## yield yield 116.7 bu/ac
## moisture MOISTURE 17 %
## interval CYCLES 2 s
## -------------------------------
## Checking data values
## ---------------------------------------
## variable min max median NAs extreme
## --- --- --- --- --- ---
## yield 10.25 372 118.553 0 6
## moisture 6.90 22.9 17.3000 0 17
## interval 1 3 2 0 0
## ---------------------------------------
## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd
The pa_yield() function can produce yield maps using two algorithms: simple and ritas. The simple algorithm allows moisture standardization, data cleaning, and smoothing. By default, it does not conduct any areal aggregations. The ritas algorithm (Damiano & Niemi, 2020) follows a constructive framework, in which the steps are: rectangle creation; intersection assignment; tasselation; apportioning; and smoothing.
When algorithm is simple, the function will search for the columns that indicate yield, moisture, and time. Note that, since different crops have a different conversion factors from bushel to pounds, it is important to specify the lbs.per.bushel argument when the US standard system of units is used. In this case, we are producing a yield map for a maize crop, thus, 56 lbs/bushel. Additionally, the function supports the output of the yield map in both U.S. standard and metric unit systems. In this example, we have chosen “metric”.
ymp1 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
lbs.per.bushel = 56,
unit.system = 'metric',
verbose = FALSE)
The “ymp1” object contains information on the yield processing algorithm, smoothing method, conversion factor used, moisture, and a summary of the yield data.
## Yield data processing algorithm: simple
## Smoothing method: none
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 0.635
## 1st Qu. 4.4947
## Median 7.4521
## Mean 7.3107
## 3rd Qu. 10.0796
## Max. 23.054
## -----------------
We can visualize the yield map by plotting it
In the previous map, we can see that the units are “t/ha”. However, a user might want to produce yield maps using US standard units. Additionally, the user can set the moisture level to the market standard moisture. When no moisture adjustment is provided, the function adjusts the moisture level to the average moisture in the data set.
ymp2 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'standard',
moisture.adj = 15.5,
lbs.per.bushel = 56,
verbose = FALSE)
## Yield data processing algorithm: simple
## Smoothing method: none
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 15.5%
## Yield summary (bushel/acre)
## -----------------
## Min. 9.9172
## 1st Qu. 70.247
## Median 116.47
## Mean 114.26
## 3rd Qu. 157.53
## Max. 360.31
## -----------------
We can visualize the yield map in bushels/acre and at a moisture level of 15.5%
In the previous maps we have conducted simple aggregation of yield data and standardization of the moisture content. We can, additionally, add a cleaning step to remove outliers in the raw yield data.
ymp3 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
lbs.per.bushel = 56,
verbose = FALSE)
This cleaning step will remove observations outside of a 3 standard deviation range from the field mean. This will often leave empty spots in the field if no smoothing method is selected.
In cases in which we want to interpolate the date, the function can interpolate the yield map using two prediction methods: inverse distance weighted (IDW) and kriging. The IDW method is deterministic and is much faster than the kriging method. The kriging method is stochastic and the computation time scales with \(O(n^{3})\), where n is the number of observations.
Here, we can produce and interpolated yield map using the IDW method. By default, the function uses a power of 2, but that can be overridden by passing the argument idp to the function.
If the argument grid is not specified, the function will provide smoothed predictions at the same points as the input data. This will provide predictions for any geometries data would have been removed during the cleaning step.
ymp4 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
smooth.method = 'idw',
lbs.per.bushel = 56,
verbose = FALSE)
## Yield data processing algorithm: simple
## Smoothing method: idw
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 0.635
## 1st Qu. 4.4886
## Median 7.4437
## Mean 7.2890
## 3rd Qu. 10.0523
## Max. 17.426
## -----------------
Alternatively, we can interpolate the maps using the Kriging method. As noted earlier, the computational time scales quickly as the number of observations increases. Therefore, we will add the argument “maxdist” to decrease computational time. It should be noted that the value at which the user sets the “maxdist” argument should be determined after examining the variogram.
ymp5 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'simple',
unit.system = 'metric',
clean = TRUE,
clean.sd = 3,
smooth.method = 'krige',
lbs.per.bushel = 56,
verbose = FALSE,
maxdist = 50)
## Yield data processing algorithm: simple
## Smoothing method: krige
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 1.2440
## 1st Qu. 5.1989
## Median 7.6114
## Mean 7.4753
## 3rd Qu. 9.7014
## Max. 13.910
## -----------------
## Variogram model: Mat (Matern)
## Partial sill: 122865.5
## Range: 107.4692
## Kappa: 0.3
We can also investigate the variogram
When algorithm is ritas, the function expects an argument data.columns that indicates crop mass or crop flow, moisture, interval, angle, swath, and distance. Additionally, the function expects an argument data.units to indicate the units of the data columns. When neither of the two previous arguments is provided, the function uses a dictionary of common names to try and search for these columns. Additionally, the function will attempt to guess the units of the columns.
The ritas algorithm is more computationally intensive than the simple algorithm. Some of the steps can be sped up by specifying the argument “cores”. This will allow paralellization of some of the processing steps.
ymp6 <- pa_yield(input = raw.yield,
algorithm = 'ritas',
lbs.per.bushel = 56,
unit.system = 'metric',
verbose = FALSE)
In this case, since the function is able to guess the columns and the units correctly, this would be the same as the chunk below.
ymp6 <- pa_yield(input = raw.yield,
data.columns = c(flow = 'FLOW', moisture = 'MOISTURE', interval = 'CYCLES', width = 'SWATH', distance = 'DISTANCE'),
data.units = c(flow = 'lb/s', moisture = '%', interval = 's', width = 'in', distance = 'in'),
unit.system = 'metric',
algorithm = 'ritas',
verbose = FALSE)
In the previous yield map, we did not conduct any smoothing, so technically we did not complete the ritas algorithm. Following the algorithm’s steps, the best results should be expected by setting smooth.method to “krige”. The function also includes the option to Krige in the log scale when the distribution of the yield data is heavily skewed. In this case, the argument fun should be set to ‘log’.
ymp7 <- pa_yield(input = raw.yield,
boundary = boundary,
algorithm = 'ritas',
smooth.method = 'krige',
unit.system = 'metric',
lbs.per.bushel = 56,
verbose = FALSE,
maxdist = 50)
## Yield data processing algorithm: ritas
## Smoothing method: krige
## Conversion factor: 56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield summary (t/ha)
## -----------------
## Min. 1.6147
## 1st Qu. 6.2176
## Median 9.0567
## Mean 8.8636
## 3rd Qu. 11.634
## Max. 17.456
## -----------------
## Variogram model: Mat (Matern)
## Partial sill: 194472.9
## Range: 197.6451
## Kappa: 0.3