Yield monitor data

Proposed workflow

We propose the following workflow for processing raw yield monitor data:

Read the data in
Compute summary statistics and visualize the data
Check the yield data
Fix potential issues
Process the data
Examine the yield map

To illustrate the functions that process at yield monitor data, we will look at data from a research paper named Prairie strips improve biodiversity and the delivery of multiple ecosystem services from corn–soybean croplands. Let us, specifically, look at the data from 2012.

Reading the data in

extd.dir <- system.file("extdata", package = "pacu")
raw.yield <- st_read(file.path(extd.dir, '2012-basswood.shp'), quiet = TRUE)
boundary <- st_read(file.path(extd.dir, 'boundary.shp'), quiet = TRUE)

We can see that the raw yield data has multiple columns. Some of them are more or less relevant depending on how we choose to process the data yield data to make a yield map.

names(raw.yield)

##  [1] "ID"         "LONGITUDE"  "LATITUDE"   "FLOW"       "TIME"      
##  [6] "CYCLES"     "DISTANCE"   "SWATH"      "MOISTURE"   "STATUS"    
## [11] "PASS"       "SERIAL"     "FIELD"      "LOAD"       "CROP"      
## [16] "GPS"        "PDOP"       "ALTITUDE"   "DRY_BU_AC"  "DAY"       
## [21] "MONTH"      "DAYOFMONTH" "HOUR"       "MINUTE"     "SECOND"    
## [26] "TIMELAPSE"  "SPEED"      "geometry"

Compute summary statistics and visualize the data

Our main objective in this step of the proposed workflow is to make sure that the data conforms to what we would expect of yield monitor data. For instance, we can examine the raw yield data or the distance between measurements.

Examining the raw yield data

cols <- function(n) hcl.colors(n, 'Temps', rev = TRUE)
plot(raw.yield['DRY_BU_AC'], pal = cols)

The boxplot shows some observations that appear to be abnormally high. These can be a result of sudden speed or direction change, GPS errors, and other sources of unknown variability.

boxplot(raw.yield$DRY_BU_AC)

Check the yield monitor data with pa_check_yield()

In this step of the proposed workflow, the pa_check_yield() function searches for potential sources of problem in the data to give the user the opportunity of dealing with these before processing the yield data. For instance, when no units are supplied to the pa_yield() function, the function tries to guess the units. Since the pa_check_yield() function outputs the guessed units, the user can supply the units to the pa_yield() function in case the function’s guess is incorrect.

The user can choose to check for potential errors that would occur with all the algorithms available to the function or specify which algorithm should the function target.

chk <- pa_check_yield(input = raw.yield,
               algorithm = 'all')
chk

## Field information
## -------------------------
## # of points        4240  
## Approx. area (ha)  9.8529
## Approx. area (ac)  24.352
## Latitude           41.539
## Longitude          -93.27
## CRS                NA    
## -------------------------

## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326

## 
## 
## Algorithm: Simple
## Checking column names and units
## --------------------------------
## variable  column     mean  units 
## ---       ---        ---   ---   
## yield     DRY_BU_AC  116.7 bu/ac 
## moisture  MOISTURE   17    %     
## interval  CYCLES     2     s     
## --------------------------------
## Checking data values
## ---------------------------------------
## variable  min  max  median  NAs  extreme 
## ---       ---  ---  ---     ---  ---     
## yield     10.25 372  118.553 0    6       
## moisture  6.90 22.9 17.3000 0    17      
## interval  1    3    2       0    0       
## ---------------------------------------

## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd

## 
## 
## Algorithm: RITAS
## Checking column names and units
## -------------------------------
## variable  column    mean  units 
## ---       ---       ---   ---   
## mass      -         -     -     
## flow      FLOW      18.80 lb/s  
## moisture  MOISTURE  17    %     
## interval  CYCLES    2     s     
## angle     -         -     -     
## width     SWATH     240   in    
## distance  DISTANCE  147.2 in    
## -------------------------------
## Checking data values
## ---------------------------------------
## variable  min  max  median  NAs  extreme 
## ---       ---  ---  ---     ---  ---     
## mass      -    -    -       -    -       
## flow      0.3  40.2 18.4000 0    0       
## moisture  6.90 22.9 17.3000 0    17      
## interval  1    3    2       0    0       
## angle     -    -    -       -    -       
## width     240  240  240     0    0       
## distance  5    304  153     0    0       
## ---------------------------------------

## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd

## Warning in print.check.yield(x): Column angle not found but can be estimated
## within pa_yield

## Median overlap between harvest polygons is 3.1 % 
## If this value seems high, please check the units of distance and swath.

Fix potential issues

The pa_check_yield() function will search for several sources of problem that could prevent the pa_yield() function from working correctly. Therefore, it is important to deal with the identified issues before processing the yield data.

Example: missing yield column

Let us imagine a scenario in which we want to use the simple algorithm to process the data. To do so, we need a column with the yield data. The pa_yield() and pa_check_yield() functions will try to identify which columns of the input data frame contain the relevant information, and the units if those are not supplied. We can create a toy example in which the yield column has an uncommon name, preventing the functions from identifying it.

toy.example <- raw.yield 
names(toy.example) <- gsub('DRY_BU_AC', 'NOT_A_COMMON_NAME', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk

## Field information
## -------------------------
## # of points        4240  
## Approx. area (ha)  9.8529
## Approx. area (ac)  24.352
## Latitude           41.539
## Longitude          -93.27
## CRS                NA    
## -------------------------

## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326

## 
## 
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable  column    mean  units 
## ---       ---       ---   ---   
## yield     -         -     -     
## moisture  MOISTURE  17    %     
## interval  CYCLES    2     s     
## -------------------------------
## Checking data values
## ---------------------------------------
## variable  min  max  median  NAs  extreme 
## ---       ---  ---  ---     ---  ---     
## yield     -    -    -       -    -       
## moisture  6.90 22.9 17.3000 0    17      
## interval  1    3    2       0    0       
## ---------------------------------------

## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd

## Warning in print.check.yield(x): Columns yield and moisture are needed for the
## simple algorithm

To deal with this issue, we can rename the column to a more common name such as “yield” or “YIELD”. Additionally, a valid solution would be to provide the column name to the pa_yield() function directly when processing the data. Here, we rename the column simply to “yield” and we can see that the algorithm is able to identify it once again.

names(toy.example) <- gsub('NOT_A_COMMON_NAME', 'yield', names(toy.example))
chk <- pa_check_yield(toy.example, algorithm = 'simple')
chk

## Field information
## -------------------------
## # of points        4240  
## Approx. area (ha)  9.8529
## Approx. area (ac)  24.352
## Latitude           41.539
## Longitude          -93.27
## CRS                NA    
## -------------------------

## The CRS is missing from the input object. The pa_yield function will default to EPSG:4326

## 
## 
## Algorithm: Simple
## Checking column names and units
## -------------------------------
## variable  column    mean  units 
## ---       ---       ---   ---   
## yield     yield     116.7 bu/ac 
## moisture  MOISTURE  17    %     
## interval  CYCLES    2     s     
## -------------------------------
## Checking data values
## ---------------------------------------
## variable  min  max  median  NAs  extreme 
## ---       ---  ---  ---     ---  ---     
## yield     10.25 372  118.553 0    6       
## moisture  6.90 22.9 17.3000 0    17      
## interval  1    3    2       0    0       
## ---------------------------------------

## Warning in print.check.yield(x): Extreme values identified. These these are
## values outside of the range of the mean ± 3sd

Processing the data and examining the yield maps and diagnostic plots

The pa_yield() function can produce yield maps using two algorithms: simple and ritas. The simple algorithm allows moisture standardization, data cleaning, and smoothing. By default, it does not conduct any areal aggregations. The ritas algorithm (Damiano & Niemi, 2020) follows a constructive framework, in which the steps are: rectangle creation; intersection assignment; tasselation; apportioning; and smoothing.

Simple aggregation of yield data

Initial example

When algorithm is simple, the function will search for the columns that indicate yield, moisture, and time. Note that, since different crops have a different conversion factors from bushel to pounds, it is important to specify the lbs.per.bushel argument when the US standard system of units is used. In this case, we are producing a yield map for a maize crop, thus, 56 lbs/bushel. Additionally, the function supports the output of the yield map in both U.S. standard and metric unit systems. In this example, we have chosen “metric”.

ymp1 <- pa_yield(input = raw.yield,
                 boundary = boundary,
                 algorithm = 'simple',
                 lbs.per.bushel = 56,
                 unit.system = 'metric',
                 verbose = FALSE)

The “ymp1” object contains information on the yield processing algorithm, smoothing method, conversion factor used, moisture, and a summary of the yield data.

ymp1

## Yield data processing algorithm: simple 
## Smoothing method: none 
## Conversion factor:  56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield  summary (t/ha)
## -----------------
## Min.       0.635 
## 1st Qu.    4.4947
## Median     7.4521
## Mean       7.3107
## 3rd Qu.    10.0796
## Max.       23.054
## -----------------

We can visualize the yield map by plotting it

pa_plot(ymp1)

Unit conversion and moisture adjustment

In the previous map, we can see that the units are “t/ha”. However, a user might want to produce yield maps using US standard units. Additionally, the user can set the moisture level to the market standard moisture. When no moisture adjustment is provided, the function adjusts the moisture level to the average moisture in the data set.

ymp2 <- pa_yield(input = raw.yield,
                 boundary = boundary,
                 algorithm = 'simple',
                 unit.system = 'standard',
                 moisture.adj = 15.5,
                 lbs.per.bushel = 56,
                 verbose = FALSE)

ymp2

## Yield data processing algorithm: simple 
## Smoothing method: none 
## Conversion factor:  56 lbs/bushel
## Adj. moisture: 15.5%
## Yield  summary (bushel/acre)
## -----------------
## Min.       9.9172
## 1st Qu.    70.247
## Median     116.47
## Mean       114.26
## 3rd Qu.    157.53
## Max.       360.31
## -----------------

We can visualize the yield map in bushels/acre and at a moisture level of 15.5%

pa_plot(ymp2)

Outlier removal

In the previous maps we have conducted simple aggregation of yield data and standardization of the moisture content. We can, additionally, add a cleaning step to remove outliers in the raw yield data.

ymp3 <- pa_yield(input = raw.yield,
                 boundary = boundary,
                 algorithm = 'simple',
                 unit.system = 'metric',
                 clean = TRUE,
                 clean.sd = 3,
                 lbs.per.bushel = 56,
                 verbose = FALSE)

This cleaning step will remove observations outside of a 3 standard deviation range from the field mean. This will often leave empty spots in the field if no smoothing method is selected.

pa_plot(ymp3)

Inverse distance weighted interpolation

In cases in which we want to interpolate the date, the function can interpolate the yield map using two prediction methods: inverse distance weighted (IDW) and kriging. The IDW method is deterministic and is much faster than the kriging method. The kriging method is stochastic and the computation time scales with \(O(n^{3})\), where n is the number of observations.

Here, we can produce and interpolated yield map using the IDW method. By default, the function uses a power of 2, but that can be overridden by passing the argument idp to the function.

If the argument grid is not specified, the function will provide smoothed predictions at the same points as the input data. This will provide predictions for any geometries data would have been removed during the cleaning step.

ymp4 <- pa_yield(input = raw.yield,
                 boundary = boundary, 
                 algorithm = 'simple',
                 unit.system = 'metric',
                 clean = TRUE,
                 clean.sd = 3,
                 smooth.method = 'idw',
                 lbs.per.bushel = 56,
                 verbose = FALSE)

ymp4

## Yield data processing algorithm: simple 
## Smoothing method: idw 
## Conversion factor:  56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield  summary (t/ha)
## -----------------
## Min.       0.635 
## 1st Qu.    4.4886
## Median     7.4437
## Mean       7.2890
## 3rd Qu.    10.0523
## Max.       17.426
## -----------------

pa_plot(ymp4)

Kriging

Alternatively, we can interpolate the maps using the Kriging method. As noted earlier, the computational time scales quickly as the number of observations increases. Therefore, we will add the argument “maxdist” to decrease computational time. It should be noted that the value at which the user sets the “maxdist” argument should be determined after examining the variogram.

ymp5 <- pa_yield(input = raw.yield,
                 boundary = boundary,
                 algorithm = 'simple',
                 unit.system = 'metric',
                 clean = TRUE,
                 clean.sd = 3,
                 smooth.method = 'krige',
                 lbs.per.bushel = 56,
                 verbose = FALSE,
                 maxdist = 50)

ymp5

## Yield data processing algorithm: simple 
## Smoothing method: krige 
## Conversion factor:  56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield  summary (t/ha)
## -----------------
## Min.       1.2440
## 1st Qu.    5.1989
## Median     7.6114
## Mean       7.4753
## 3rd Qu.    9.7014
## Max.       13.910
## -----------------
## Variogram model: Mat (Matern) 
## Partial sill:  122865.5 
## Range:  107.4692 
## Kappa:  0.3

pa_plot(ymp5, plot.var = 'yield')

We can also investigate the variogram

pa_plot(ymp5, plot.type = 'variogram')

RITAS

Initial example

When algorithm is ritas, the function expects an argument data.columns that indicates crop mass or crop flow, moisture, interval, angle, swath, and distance. Additionally, the function expects an argument data.units to indicate the units of the data columns. When neither of the two previous arguments is provided, the function uses a dictionary of common names to try and search for these columns. Additionally, the function will attempt to guess the units of the columns.

The ritas algorithm is more computationally intensive than the simple algorithm. Some of the steps can be sped up by specifying the argument “cores”. This will allow paralellization of some of the processing steps.

ymp6 <- pa_yield(input = raw.yield,
                 algorithm = 'ritas',
                 lbs.per.bushel = 56,
                 unit.system = 'metric',
                 verbose = FALSE)

In this case, since the function is able to guess the columns and the units correctly, this would be the same as the chunk below.

ymp6 <- pa_yield(input = raw.yield,
                 data.columns = c(flow = 'FLOW', moisture = 'MOISTURE', interval = 'CYCLES', width = 'SWATH', distance = 'DISTANCE'),
                 data.units = c(flow = 'lb/s', moisture = '%', interval = 's', width = 'in', distance = 'in'),
                 unit.system = 'metric',
                 algorithm = 'ritas',
                 verbose = FALSE)

pa_plot(ymp6)

Complete RITAS algorithm

In the previous yield map, we did not conduct any smoothing, so technically we did not complete the ritas algorithm. Following the algorithm’s steps, the best results should be expected by setting smooth.method to “krige”. The function also includes the option to Krige in the log scale when the distribution of the yield data is heavily skewed. In this case, the argument fun should be set to ‘log’.

ymp7 <- pa_yield(input = raw.yield,
                 boundary = boundary,
                 algorithm = 'ritas',
                 smooth.method = 'krige',
                 unit.system = 'metric',
                 lbs.per.bushel = 56,
                 verbose = FALSE,
                 maxdist = 50)

ymp7

## Yield data processing algorithm: ritas 
## Smoothing method: krige 
## Conversion factor:  56 lbs/bushel
## Adj. moisture: 17.01373%
## Yield  summary (t/ha)
## -----------------
## Min.       1.6147
## 1st Qu.    6.2176
## Median     9.0567
## Mean       8.8636
## 3rd Qu.    11.634
## Max.       17.456
## -----------------
## Variogram model: Mat (Matern) 
## Partial sill:  194472.9 
## Range:  197.6451 
## Kappa:  0.3

pa_plot(ymp7)

pacu: Precision Agriculture Computational Utilities - Yield monitor data

Caio dos Santos & Fernando Miguez

2024-12-16

Specific vignettes

Yield monitor data

Proposed workflow

Reading the data in

Compute summary statistics and visualize the data

Examining the raw yield data

Check the yield monitor data with pa_check_yield()

Fix potential issues

Example: missing yield column

Processing the data and examining the yield maps and diagnostic plots

Simple aggregation of yield data

Initial example

Unit conversion and moisture adjustment

Outlier removal

Inverse distance weighted interpolation

Kriging

RITAS

Initial example

Complete RITAS algorithm