Chapter 1 Introduction

1.1 Motivation

Fungaria collections and citizen science observations (e.g., iNaturalist, MushroomObserver) typically contain taxonomic, geographic, and temporal information for each fungal specimen. These records also often contain trait-relevant metadata about the host, habitat, or substrate associated with the collected or observed specimens; thus, they are well-suited for comprehensive investigations into taxonomic, geographic, and temporal patterns of ecological traits. The first step in pursuing these investigations is accessing data and generating a data set. Citizen science platforms and many fungaria have online interfaces for downloading data from their respective databases. This is useful if you are only interested in the specimen records from one particular database; however, if you are interested in maximizing the size of your data set and conducting broad analyses, the best approach is to combine data from all available databases. This can be accomplished through online data aggregators like the Mycology Collections Portal (MyCoPortal) and the Global Biodiversity Information Facility (GBIF). These aggregators allow users to access occurrence data from a wide variety of fungaria and citizen science platforms and automatically combine the data from these different sources into one data set.

Before using occurrence data for trait analyses, a variety of preprocessing steps may be necessary. Occurrence data sets, especially those that are large and temporally and geographically diverse, often contain errors or inconsistencies. For example, taxon names may be outdated (e.g., a taxon has been placed in a new genus), dates may be nonsensical or inconsistently formatted, location names may be misspelled, and GPS coordinates may be nonsensical or inconsistently formatted. Additionally, the web interfaces where the data were obtained offer limited capabilities for selecting trait-relevant records; therefore, occurrence data also need to be processed to identify records associated with the trait of interest. To overcome these issues and help enable comprehensive trait analyses, we created the fungarium package, which contains a suite of functions for handling taxon, date, and location issues as well as for identifying trait-associated records.

To complement these preprocessing tools and further facilitate trait analyses, the fungarium package also contains functions for adding FunGuild data to each record, assigning records to hexagonal grid cells for bounding “box” geographical analyses, visualizing taxonomic patterns of trait data in annotated cladograms, and visualizing geographic patterns of trait data in annotated maps.

Note that many of the fungarium preprocessing tools were created primarily for use with MyCoPortal data, which are in verbatim format (i.e., unaltered from the format in which they were received from the data provider). These functions can be used with GBIF data as well, but be aware that GBIF uses its own processing tools to fix issues within each record (i.e., interpretation). To circumvent GBIF processing and use fungarium processing tools on your data instead, use the “verbatim” data file within the Darwin Core Archive downloaded from GBIF. This file should not contain GBIF interpretations.

1.2 Getting started

1.2.1 Installing fungarium

install.packages("remotes") #install 'remotes' (if not already installed)
remotes::install_github("hjsimpso/fungarium@*release") #install the latest fungarium release

1.2.2 Retrieving MyCoPortal data

Fungal collection/observation data can be retrieved from the MyCoPortal by manually downloading data from the web interface at https://mycoportal.org. Package versions before v2.0.0 also provided mycoportal_tab for retrieving records from within R.

1.2.2.1 mycoportal_tab

mycoportal_tab was removed from the package as of v2.0.0. The function was too difficult to maintain due to the instability of the mycoportal.org webpage layout. If mycoportal.org ever implements a true API, this function may return. MyCoPortal records can still be downloaded manually from mycoportal.org.

1.2.2.2 MyCoPortal web interface

MyCoPortal data sets can be downloaded using the web interface at https://mycoportal.org. Various query parameters (e.g., taxon, country, year) can be used to select records of interest. When selecting download preferences, uncheck “Compressed ZIP file”, then select “Darwin Core” for Structure, “Tab Delimited” for File Format, and “UTF-8 (unicode)” for Character Set. These file settings are necessary for data processing tools in the fungarium package to work properly. Downloaded data sets can be imported into R via read.delim or data.table::fread.
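As a minimal sketch of the import step, the example below writes a tiny tab-delimited file to a temporary location and reads it back with read.delim. The two sample records are invented purely for illustration; the column names are standard Darwin Core terms, but a real MyCoPortal download contains many more columns. The quote = "" and stringsAsFactors = FALSE arguments are a suggested choice for free-text fields, not a documented fungarium requirement.

```r
# Illustrative stand-in for a MyCoPortal download (made-up records).
sample_lines <- c(
  "scientificName\teventDate\tcountry\thabitat",
  "Amanita muscaria\t1998-09-14\tUnited States\tunder Pinus",
  "Trametes versicolor\t2004-10-02\tCanada\ton dead \"hardwood\" log"
)
sample_path <- tempfile(fileext = ".tab")
writeLines(sample_lines, sample_path)

# Read the tab-delimited, UTF-8 file.
# quote = "" keeps quotation marks inside free-text fields (e.g., habitat)
# as literal characters instead of treating them as field delimiters.
records <- read.delim(
  sample_path,
  sep = "\t",
  quote = "",
  fileEncoding = "UTF-8",
  stringsAsFactors = FALSE
)

str(records)
```

For a real download, replace sample_path with the path to the downloaded file. For large files, data.table::fread with sep = "\t", quote = "", and encoding = "UTF-8" is likely to be a faster alternative.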

1.2.3 Retrieving GBIF data

Fungal collection/observation data can be retrieved from GBIF using the web interface at https://www.gbif.org/. To take advantage of fungarium processing tools, download your data sets in Darwin Core Archive format and use the “verbatim” data file within the archive. In some cases, you may also be able to retrieve records from within R using the rgbif package (see the rgbif package documentation for details).
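One way to read the verbatim file without unpacking the whole archive is sketched below. It assumes the archive is a ZIP file containing a tab-delimited file named verbatim.txt, which is how GBIF Darwin Core Archive downloads are typically laid out; check the contents of your own archive to confirm. The helper name read_gbif_verbatim and the example path gbif_download.zip are hypothetical and are not part of fungarium.

```r
# Hypothetical helper (not a fungarium function): read the verbatim records
# from a GBIF Darwin Core Archive without extracting the whole archive.
read_gbif_verbatim <- function(archive_path, member = "verbatim.txt") {
  if (!file.exists(archive_path)) {
    stop("Archive not found: ", archive_path)
  }
  # List the archive contents and confirm the expected file is present.
  contents <- utils::unzip(archive_path, list = TRUE)$Name
  if (!member %in% contents) {
    stop("'", member, "' not found in ", archive_path)
  }
  # unz() opens a connection to a single file inside the ZIP archive;
  # read.delim reads from that connection and closes it when finished.
  utils::read.delim(
    unz(archive_path, member, encoding = "UTF-8"),
    sep = "\t",
    quote = "",
    stringsAsFactors = FALSE
  )
}

# Usage (assumed path; runs only if the file exists).
archive_path <- "gbif_download.zip"
if (file.exists(archive_path)) {
  gbif_verbatim <- read_gbif_verbatim(archive_path)
  str(gbif_verbatim)
}
```

Checking the archive listing first gives a clear error message if the download was made in a format other than Darwin Core Archive and therefore lacks the verbatim file.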