Look up information

Martin Westgate & Dax Kellie

2024-04-09

galah supports two functions to look up information: show_all() and search_all(). The first argument to both functions is the type of information you wish to look up; for example, to see which fields are available to filter a query by, use:

show_all(fields)
## # A tibble: 646 × 3
##    id                  description               type  
##    <chr>               <chr>                     <chr> 
##  1 _nest_parent_       <NA>                      fields
##  2 _nest_path_         <NA>                      fields
##  3 _root_              <NA>                      fields
##  4 abcdTypeStatus      <NA>                      fields
##  5 acceptedNameUsage   Accepted name             fields
##  6 acceptedNameUsageID Accepted name             fields
##  7 accessRights        Access rights             fields
##  8 annotationsDoi      <NA>                      fields
##  9 annotationsUid      Referenced by publication fields
## 10 assertionUserId     Assertions by user        fields
## # ℹ 636 more rows

To search for a specific field, pass a search term as the second argument:

search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields

Here is a list of information types that can be used with show_all() and search_all():

| Category | Information type | Description | Sub-functions |
|---|---|---|---|
| Configuration | atlases | Show what living atlases are available | show_all_atlases(), search_atlases() |
| Configuration | apis | Show what APIs & functions are available for each atlas | show_all_apis(), search_apis() |
| Configuration | reasons | Show what values are acceptable as ‘download reasons’ for a specified atlas | show_all_reasons(), search_reasons() |
| Taxonomy | taxa | Search for one or more taxonomic names | search_taxa() |
| Taxonomy | identifiers | Take a universal identifier and return taxonomic information | search_identifiers() |
| Taxonomy | ranks | Show valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.) | show_all_ranks(), search_ranks() |
| Filters | fields | Show fields that are stored in an atlas | show_all_fields(), search_fields() |
| Filters | assertions | Show results of data quality checks run by each atlas | show_all_assertions(), search_assertions() |
| Group filters | profiles | Show what data quality profiles are available | show_all_profiles(), search_profiles() |
| Group filters | lists | Show what species lists are available | show_all_lists(), search_lists() |
| Data providers | providers | Show which institutions have provided data | show_all_providers(), search_providers() |
| Data providers | collections | Show the specific collections within those institutions | show_all_collections(), search_collections() |
| Data providers | datasets | Show all the data groupings within those collections | show_all_datasets(), search_datasets() |
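Each information type in the table can be passed to show_all() or search_all(), or looked up with its listed sub-function. As a minimal sketch, assuming galah is installed and still pointed at the default atlas (ALA), the two calls below should be interchangeable; the search term "research" is only an illustration.

```r
library(galah)

# Generic form: the information type is the first argument
via_generic <- search_all(reasons, "research")

# Sub-function form listed in the table for the same information type
via_subfunction <- search_reasons("research")

# Both should describe the same set of download reasons
nrow(via_generic) == nrow(via_subfunction)
```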

show_all_ subfunctions

While show_all() is useful in most cases, you can also call the underlying subfunctions directly. Functions with the prefix show_all_ show all possible values of the specified category; for example, show_all_atlases() returns the same information as show_all(atlases).

show_all_atlases()
## # A tibble: 11 × 4
##    region         institution                                                             acronym url                         
##    <chr>          <chr>                                                                   <chr>   <chr>                       
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au      
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityatlas.at
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br        
##  4 Estonia        eElurikkus                                                              <NA>    https://elurikkus.ee        
##  5 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr     
##  6 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org            
##  7 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob.gt   
##  8 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt         
##  9 Spain          GBIF Spain                                                              GBIF.es https://www.gbif.es         
## 10 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversitydata.se 
## 11 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk
show_all_reasons()
## # A tibble: 13 × 2
##       id name                            
##    <int> <chr>                           
##  1     1 biosecurity management/planning 
##  2    11 citizen science                 
##  3     5 collection management           
##  4     0 conservation management/planning
##  5     7 ecological research             
##  6     3 education                       
##  7     2 environmental assessment        
##  8    12 restoration/remediation         
##  9     4 scientific research             
## 10     8 systematic research/taxonomy    
## 11    13 species modelling               
## 12     6 other                           
## 13    10 testing
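The id column is what you supply when a download reason is required. A minimal sketch, assuming the default ALA configuration and that the reason names match the output above (IDs and names may differ between atlases): look up the ID for "scientific research" and set it as the default with galah_config().

```r
library(galah)

# Find the numeric ID of the "scientific research" download reason
reasons <- show_all_reasons()
research_id <- reasons$id[reasons$name == "scientific research"]

# Use that ID as the default download reason for later requests
galah_config(download_reason_id = research_id)
```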

search_ subfunctions

You can also call subfunctions with the search_ prefix to look up information. search_ subfunctions differ from show_all_ subfunctions in that they require a query, and they are useful for finding detailed information that can’t be summarised across the whole atlas.

search_taxa() is an especially useful function in galah. It lets you search for a single taxon or for multiple taxa by name.

search_taxa("reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                                          rank  match_type kingdom  phylum   class    issues 
##   <chr>       <chr>           <chr>                                                                     <chr> <chr>      <chr>    <chr>    <chr>    <chr>  
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue
search_taxa("reptilia", "aves", "mammalia", "pisces")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                                          rank  match_type kingdom  phylum   class    issues 
##   <chr>       <chr>           <chr>                                                                     <chr> <chr>      <chr>    <chr>    <chr>    <chr>  
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue
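Because search_taxa() returns a tibble, you can keep only the columns you need to check how each name was matched. A minimal sketch, assuming the default ALA configuration; the column names are taken from the output above.

```r
library(galah)

# Search for several classes at once; each name is a separate argument
taxa <- search_taxa("reptilia", "aves", "mammalia")

# Keep only the columns needed to check how each name was matched
taxa[, c("search_term", "scientific_name", "rank", "match_type")]
```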

search_identifiers() is the partner function to search_taxa(). If you already know a taxonomic identifier, you can search for the taxon that the identifier belongs to.

search_identifiers("urn:lsid:biodiversity.org.au:afd.taxon:682e1228-5b3c-45ff-833b-550efd40c399")
## # A tibble: 1 × 15
##   search_term                               success scientific_name taxon_concept_id rank  rank_id   lft   rgt match_type kingdom kingdom_id phylum phylum_id class class_id
##   <chr>                                     <lgl>   <chr>           <chr>            <chr>   <int> <int> <int> <chr>      <chr>   <chr>      <chr>  <chr>     <chr> <chr>   
## 1 urn:lsid:biodiversity.org.au:afd.taxon:6… TRUE    REPTILIA        https://biodive… class    3000 33626 36658 taxonIdMa… Animal… https://b… Chord… https://… Rept… https:/…

show_values() & search_values()

Once you have found the field you want, use show_values() to see the information contained within it. For example, to show the values contained in the field basisOfRecord:

search_all(fields, "basisOfRecord") |> show_values()
## ! Search returned 2 matched fields.
## • Showing values for 'basisOfRecord'.
## # A tibble: 9 × 1
##   basisOfRecord      
##   <chr>              
## 1 HUMAN_OBSERVATION  
## 2 PRESERVED_SPECIMEN 
## 3 OBSERVATION        
## 4 OCCURRENCE         
## 5 MACHINE_OBSERVATION
## 6 MATERIAL_SAMPLE    
## 7 LIVING_SPECIMEN    
## 8 MATERIAL_CITATION  
## 9 FOSSIL_SPECIMEN
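search_values() works like show_values() but only returns values that match a query, which helps when a field holds many values. A minimal sketch, assuming search_values() takes the search_all() result and a query string; the query "SPECIMEN" is written in upper case to match the values shown above.

```r
library(galah)

# As with show_values(), only the first matched field (basisOfRecord) is used
specimen_values <- search_all(fields, "basisOfRecord") |>
  search_values("SPECIMEN")

specimen_values
```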

Use these values to build meaningful queries with galah_filter(). For example, to count records of living specimens:

galah_call() |> 
  galah_filter(basisOfRecord == "LIVING_SPECIMEN") |> 
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 126135
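To count records for every value of a field at once, rather than filtering on one value at a time, you can group the query by that field. A minimal sketch, assuming the default ALA configuration; the result should contain one row per basisOfRecord value with its record count.

```r
library(galah)

# One query returns a count for each value of basisOfRecord
counts <- galah_call() |>
  galah_group_by(basisOfRecord) |>
  atlas_counts()

counts
```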

show_values() works for other types of information too, such as data quality profiles. Here it shows the filters that make up the ALA profile:

search_all(profiles, "ALA") |> 
  show_values() |> 
  head()
## • Showing values for 'ALA'.
## # A tibble: 6 × 5
##      id enabled description                                                                                                                              filter displayOrder
##   <int> <lgl>   <chr>                                                                                                                                    <chr>         <int>
## 1    94 TRUE    "Exclude all records where spatial validity is \"false\""                                                                                "-spa…            1
## 2    96 TRUE    "Exclude all records with an assertion that the scientific name provided does not match any of the names lists used by the ALA.  For a … "-ass…            1
## 3    97 TRUE    "Exclude all records with an assertion that the scientific name provided is not structured as a valid scientific name. Also catches ran… "-ass…            2
## 4    98 TRUE    "Exclude all records with an assertion that the name and classification supplied can't be used to choose between 2 homonyms"             "-ass…            3
## 5    99 TRUE    "Exclude all records with an assertion that kingdom provided doesn't match a known kingdom e.g. Animalia, Plantae"                       "-ass…            4
## 6   100 TRUE    "Exclude all records with an assertion that the scientific name provided in the record does not match the expected taxonomic scope of t… "-ass…            5
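To use a profile in a query, its filters can be applied together rather than one by one. A minimal sketch, assuming galah_apply_profile() accepts the profile's short name (ALA, as returned above) unquoted; check the galah reference for the version you have installed.

```r
library(galah)

# Count living specimens that also pass every filter in the ALA profile
profiled <- galah_call() |>
  galah_filter(basisOfRecord == "LIVING_SPECIMEN") |>
  galah_apply_profile(ALA) |>
  atlas_counts()

profiled
```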