GVS R package

Brian Maitner

2024-12-03

Geocoordinate Validation Service

The GVS package is designed to interact with the Geocoordinate Validation Service (GVS) API of the Botanical Information and Ecology Network. The GVS accepts a point of observation (PO; a pair of latitude and longitude coordinates in decimal degrees) and returns the country, state, and county in which the point is located. It also calculates the distance between the PO and six different types of centroids for each of the three political divisions (see Centroid Types) and indicates for each political division if it is small enough for the PO itself to potentially be a centroid (see Distance Thresholds). Finally, the GVS indicates which of the three potential political division centroids is the most likely, if any, based on the threshold parameters MAX_DIST (distance to the actual centroid) and MAX_DIST_REL (relative distance: distance to the actual centroid divided by the distance from the actual centroid to the farthest point in the political division). In addition, the GVS validates the submitted coordinates and reports errors such as non-numeric values and values out of bounds. Valid coordinates that do not join to any political division are flagged as “Point in ocean”.
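As a worked illustration of the relative distance described above, here is a minimal R sketch using made-up numbers. The helper name relative_centroid_distance and the input values are hypothetical; they are not part of the GVS package or API, and the sketch does not attempt to reproduce how the service applies its thresholds.

```r
# Hypothetical helper (not part of the GVS package): the relative distance is
# the distance from the point to the centroid divided by the distance from
# the centroid to the farthest point in the political division.
relative_centroid_distance <- function(dist_km, dist_max_km) {
  stopifnot(is.numeric(dist_km), is.numeric(dist_max_km), all(dist_max_km > 0))
  dist_km / dist_max_km
}

# Made-up example: a point 2 km from a centroid, in a political division
# whose farthest point lies 400 km from that centroid.
rel <- relative_centroid_distance(dist_km = 2, dist_max_km = 400)
print(rel)
```

A small value means the point sits close to the centroid relative to the size of the political division, which is the situation the MAX_DIST_REL threshold is meant to catch.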

Installation

library(devtools)
install_github("EnquistLab/RGVS")

Using the GVS

The GVS takes as input a data frame consisting of two columns: latitude and longitude (in that order). We provide an example dataset, gvs_testfile.

library(GVS)

# Load the sample dataset

gvs_data <- gvs_testfile

# View the structure
head(gvs_data)
##      latitude    longitude
## 1   36.580435    -96.53331
## 2 39.80818224 -91.62289157
## 3          46           25
## 4    52.92755       4.7864
## 5    52.54731     -2.49544
## 6      -23.62       -65.43
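If your own records use different column names or a different column order, the input can be rebuilt to match the two-column layout described above. The following is a minimal sketch: the raw data frame and its column names (site, lon, lat) are made up for illustration, and passing numeric columns to GVS is an assumption rather than something documented here.

```r
# Made-up raw records with extra columns and longitude listed first.
raw <- data.frame(
  site = c("a", "b", "c"),
  lon  = c(-96.53331, 4.7864, -65.43),
  lat  = c(36.580435, 52.92755, -23.62)
)

# Keep only the coordinates, in the order the GVS expects:
# latitude first, then longitude (both in decimal degrees).
my_points <- data.frame(latitude = raw$lat, longitude = raw$lon)
str(my_points)

# With the GVS package installed and network access available, the data
# frame could then be submitted in the same way as the sample dataset:
# my_results <- GVS(occurrence_dataframe = my_points)
```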

To run data through the GVS, we use the function GVS.

gvs_results <- GVS(occurrence_dataframe = gvs_data)

# The resulting output has a lot of information that can be used in data cleaning.

colnames(gvs_results)
##  [1] "id"                                "latitude_verbatim"                
##  [3] "longitude_verbatim"                "latitude"                         
##  [5] "longitude"                         "user_id"                          
##  [7] "gid_0"                             "country"                          
##  [9] "gid_1"                             "state"                            
## [11] "gid_2"                             "county"                           
## [13] "country_cent_dist"                 "country_cent_dist_relative"       
## [15] "country_cent_type"                 "country_cent_dist_max"            
## [17] "is_country_centroid"               "state_cent_dist"                  
## [19] "state_cent_dist_relative"          "state_cent_type"                  
## [21] "state_cent_dist_max"               "is_state_centroid"                
## [23] "county_cent_dist"                  "county_cent_dist_relative"        
## [25] "county_cent_type"                  "county_cent_dist_max"             
## [27] "is_county_centroid"                "subpoly_cent_dist"                
## [29] "subpoly_cent_dist_relative"        "subpoly_cent_type"                
## [31] "subpoly_cent_dist_max"             "is_subpoly_centroid"              
## [33] "centroid_dist_km"                  "centroid_dist_relative"           
## [35] "centroid_type"                     "centroid_dist_max_km"             
## [37] "centroid_poldiv"                   "max_dist"                         
## [39] "max_dist_rel"                      "latlong_err"                      
## [41] "coordinate_decimal_places"         "coordinate_inherent_uncertainty_m"

The contents of some of these columns may not be immediately clear. In this case, we can consult the data dictionary.

dd <- GVS_data_dictionary()

We can now use these metadata to exclude questionable data (or try to fix it). For example, we’ll certainly want to exclude coordinates that are non-numeric or impossible! Depending on our purposes, we may also want to exclude coordinates that correspond to centroids or those that fall into the ocean.
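The filtering described above can be sketched on a small mock data frame. The column names (latlong_err, country, is_country_centroid, is_state_centroid, is_county_centroid) come from the GVS output, but every cell value below is an assumption made for illustration, including the error strings, the use of NA for "no error", and the use of 1 to mark a likely centroid. Check the data dictionary for the actual encodings before applying this to real results.

```r
# Mock stand-in for gvs_results; the values are invented for illustration.
mock_results <- data.frame(
  latitude  = c("36.580435", "abc", "0", "46"),
  longitude = c("-96.53331", "25", "0", "25"),
  country   = c("United States", NA, NA, "Romania"),
  latlong_err = c(NA, "Coordinates non-numeric", "Point in ocean", NA),
  is_country_centroid = c(0, NA, NA, 1),
  is_state_centroid   = c(0, NA, NA, 0),
  is_county_centroid  = c(0, NA, NA, 0),
  stringsAsFactors = FALSE
)

# Rows with any reported coordinate problem
# (assumption: NA or an empty string means no error).
has_error <- !is.na(mock_results$latlong_err) & nzchar(mock_results$latlong_err)

# Rows flagged as a likely centroid at any political level
# (assumption: a value of 1 means flagged).
centroid_cols <- c("is_country_centroid", "is_state_centroid", "is_county_centroid")
is_centroid <- rowSums(mock_results[, centroid_cols] == 1, na.rm = TRUE) > 0

# Keep only rows with no error and no centroid flag.
clean <- mock_results[!has_error & !is_centroid, ]
print(clean)
```

Whether centroid matches or ocean points should be dropped depends on the analysis, so it may be preferable to keep the two logical vectors as separate flags rather than filtering on both at once.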

Additional information about the GVS

In addition to the coordinate resolution, the GVS offers metadata responses describing current code and database versions, and formatted citations. For a quick overview, you can use the function GVS_metadata. If you use the GVS in a publication and need to cite it, proper attribution information is available with GVS_citations.

cites <- GVS_citations()

# The citation column provides BibTeX-format citations that can be pasted into a reference manager (e.g., EndNote, PaperPile, Zotero)

cites$citation
## [1] "@misc{gvs, author = {Boyle, B. L. and Maitner, B. and Park, D. S. and Rethvick, S. Y. B. and Enquist, B. J.}, journal = {Botanical Information and Ecology Network}, title = {Geocoordinate Validation Service}, year = 2024, url = {https://gvs.biendata.org/}} "
## [2] "@misc{gadm, author= {{University of California, Berkeley, Museum of Vertebrate Zoology}}, title = {Global Administrative Areas}, url = {https://gadm.org/}, note = {Accessed May 6, 2022}}"
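Rather than copying the entries by hand, they can be written to a .bib file that most reference managers can import. This is a minimal sketch in base R: the vector bib_entries holds placeholder entries standing in for cites$citation, and the temporary file path is only for demonstration.

```r
# Stand-in for cites$citation: a character vector of BibTeX entries.
# These short entries are placeholders, not the real GVS citations.
bib_entries <- c(
  "@misc{example_a, title = {Placeholder entry A}}",
  "@misc{example_b, title = {Placeholder entry B}}"
)

# Write one entry per line to a .bib file (a temporary file here),
# trimming any stray leading or trailing whitespace first.
bib_path <- tempfile(fileext = ".bib")
writeLines(trimws(bib_entries), con = bib_path)

# Read the file back to confirm what was written.
written <- readLines(bib_path)
print(written)
```

With real results, replacing bib_entries with cites$citation and bib_path with a permanent file name should be all that is needed.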