The GVS package is designed to interact with the
Geocoordinate Validation Service (GVS) API of the Botanical Information and
Ecology Network. The GVS accepts a point of observation (PO; a pair of
latitude, longitude coordinates in decimal degrees) and returns the
country, state and county in which the point is located. It also
calculates the distance between the PO and six different types of
centroids for each of the three political divisions (see Centroid Types)
and indicates, for each political division, whether it is small enough for
the PO itself to potentially be a centroid (see Distance Thresholds).
Finally, the GVS indicates which of the three potential political
division centroids is the most likely, if any, based on the threshold
parameters MAX_DIST (distance to the actual centroid) and MAX_DIST_REL
(relative distance: distance to actual centroid / distance from actual
centroid to the farthest point in the political division). In addition,
the GVS validates the submitted coordinates and reports errors such as
non-numeric values and values out of bounds. Valid coordinates that do
not join to any political division are flagged as “Point in ocean”.
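As a worked illustration of the relative distance defined above, the short R sketch below computes it for made-up numbers. The variable names and values are hypothetical and are not produced by the GVS; the sketch shows only the arithmetic, not how the service combines MAX_DIST and MAX_DIST_REL into a final decision.

```r
# Hypothetical values, chosen only to illustrate the arithmetic.
dist_to_centroid_km  <- 2.5   # distance from the point to the actual centroid
dist_centroid_max_km <- 250   # distance from the centroid to the farthest
                              # point in the political division

# Relative distance as defined in the text:
# distance to actual centroid / distance from centroid to farthest point.
dist_relative <- dist_to_centroid_km / dist_centroid_max_km
dist_relative
```

A small relative distance means the point lies close to the centroid compared with the overall size of the political division.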
The GVS takes as input a dataframe consisting of two columns:
latitude and longitude (in that order). We provide an example dataset,
gvs_testfile.
## latitude longitude
## 1 36.580435 -96.53331
## 2 39.80818224 -91.62289157
## 3 46 25
## 4 52.92755 4.7864
## 5 52.54731 -2.49544
## 6 -23.62 -65.43
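Your own data needs the same two-column layout. The following is a minimal sketch, assuming the coordinates start out in a larger data frame; the object my_records and its column names are hypothetical and only illustrate reordering the columns so that latitude comes first.

```r
# Hypothetical occurrence records with extra columns and longitude first.
my_records <- data.frame(
  record_id = c("a", "b"),
  lon = c(-96.53331, -91.62289157),
  lat = c(36.580435, 39.80818224)
)

# Keep just the coordinates, latitude first and longitude second.
gvs_input <- data.frame(
  latitude  = my_records$lat,
  longitude = my_records$lon
)
gvs_input
```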
To run data through the GVS, we pass a dataframe to the function GVS. Here, gvs_data is a dataframe in the two-column format shown above:
gvs_results <- GVS(occurrence_dataframe = gvs_data)
# The resulting output has a lot of information that can be used in data cleaning.
colnames(gvs_results)
## [1] "id" "latitude_verbatim"
## [3] "longitude_verbatim" "latitude"
## [5] "longitude" "user_id"
## [7] "gid_0" "country"
## [9] "gid_1" "state"
## [11] "gid_2" "county"
## [13] "country_cent_dist" "country_cent_dist_relative"
## [15] "country_cent_type" "country_cent_dist_max"
## [17] "is_country_centroid" "state_cent_dist"
## [19] "state_cent_dist_relative" "state_cent_type"
## [21] "state_cent_dist_max" "is_state_centroid"
## [23] "county_cent_dist" "county_cent_dist_relative"
## [25] "county_cent_type" "county_cent_dist_max"
## [27] "is_county_centroid" "subpoly_cent_dist"
## [29] "subpoly_cent_dist_relative" "subpoly_cent_type"
## [31] "subpoly_cent_dist_max" "is_subpoly_centroid"
## [33] "centroid_dist_km" "centroid_dist_relative"
## [35] "centroid_type" "centroid_dist_max_km"
## [37] "centroid_poldiv" "max_dist"
## [39] "max_dist_rel" "latlong_err"
## [41] "coordinate_decimal_places" "coordinate_inherent_uncertainty_m"
The contents of some of these columns may not be immediately clear. In this case, we can consult the data dictionary.
We can now use these metadata to exclude questionable data (or try to fix it). For example, we’ll certainly want to exclude coordinates that are non-numeric or impossible! Depending on our purposes, we may also want to exclude coordinates that correspond to centroids or those that fall into the ocean.
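One way to apply such filters is sketched below. To keep the example runnable without calling the API, it uses a small mock data frame that borrows column names from the output above. The value encodings (NA or an empty string for "no error", 1/0 for the centroid flags) are assumptions made for illustration, so check the data dictionary for the actual encodings before adapting this.

```r
# Mock results using column names from the GVS output. The encodings
# below are ASSUMED for illustration, not taken from the service.
mock_results <- data.frame(
  id = 1:4,
  latlong_err = c(NA, "Point in ocean", NA, "(hypothetical error text)"),
  is_country_centroid = c(0, 0, 1, 0),
  is_state_centroid   = c(0, 0, 0, 0),
  is_county_centroid  = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Rows with any reported coordinate problem (including points in the ocean).
has_error <- !is.na(mock_results$latlong_err) & mock_results$latlong_err != ""

# Rows flagged as a likely centroid at any political level.
centroid_cols <- c("is_country_centroid", "is_state_centroid",
                   "is_county_centroid")
is_centroid <- rowSums(mock_results[, centroid_cols] == 1, na.rm = TRUE) > 0

# Keep only rows with no reported error and no centroid flag.
clean_results <- mock_results[!has_error & !is_centroid, ]
clean_results$id
```

Keeping the two logical vectors separate makes it easy to drop only one kind of record, for example removing errors but retaining centroids.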
In addition to the coordinate resolution, the GVS offers metadata
responses describing current code and database versions, and formatted
citations. For a quick overview, you can use the function
GVS_metadata. If you use the GVS in a publication and need
to cite it, proper attribution information is available with
GVS_citations.
cites <- GVS_citations()
# The citation column provides BibTeX-format citations that can be pasted into a reference manager (e.g., EndNote, Paperpile, Zotero)
cites$citation
## [1] "@misc{gvs, author = {Boyle, B. L. and Maitner, B. and Park, D. S. and Rethvick, S. Y. B. and Enquist, B. J.}, journal = {Botanical Information and Ecology Network}, title = {Geocoordinate Validation Service}, year = 2024, url = {https://gvs.biendata.org/}} "
## [2] "@misc{gadm, author= {{University of California, Berkeley, Museum of Vertebrate Zoology}}, title = {Global Administrative Areas}, url = {https://gadm.org/}, note = {Accessed May 6, 2022}}"
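Instead of pasting entries by hand, the citations can also be saved to a .bib file for import. The sketch below assumes only that the citation column is a character vector of BibTeX entries, as the output above suggests. The placeholder entries and the file name are hypothetical; in practice you would pass cites$citation.

```r
# Stand-in for cites$citation: a character vector of BibTeX entries.
# These placeholder entries are hypothetical.
citation_entries <- c(
  "@misc{example_a, title = {Example A}} ",
  "@misc{example_b, title = {Example B}}"
)

# Trim stray whitespace and write one entry per line to a .bib file.
bib_file <- file.path(tempdir(), "gvs_references.bib")
writeLines(trimws(citation_entries), bib_file)

# Read the file back to confirm what was written.
readLines(bib_file)
```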