Metadata Standards and RDF

Metadata

Metadata are the data that describe resources of interest. For instance, metadata about an image describe the circumstances under which the image was captured and the intellectual property details associated with it. Metadata about an organism describe its taxonomy and details of its occurrences. An image with rich metadata has value beyond its visual characteristics because its context is known. Full metadata for each image and documented organism are available in human-readable form on the associated web page, which is returned when the resource's HTTP URI is dereferenced in a web browser. Full metadata are also available in machine-readable form as Resource Description Framework (RDF) serialized as XML (RDF/XML).

RDF

RDF is used to describe properties of resources (entities of interest, such as images and organisms) and the relationships between them. For example, the following RDF fragment (in Turtle syntax) describes the tree having the identifier http://bioimages.vanderbilt.edu/vanderbilt/7-314 using four RDF statements (called triples):

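@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix dwcuri: <http://rs.tdwg.org/dwc/iri/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://bioimages.vanderbilt.edu/vanderbilt/7-314>
     rdf:type dcterms:PhysicalResource;
     dwc:establishmentMeans "native"@en;
     dwcuri:inCollection <http://biocol.org/urn:lsid:biocol.org:col:35259>;
     foaf:depiction <http://bioimages.vanderbilt.edu/baskauf/79649>.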

In machine-readable form, these triples state what kind of thing the tree is (a physical resource), how it came to be where it is (it is native), the collection it is part of (the Vanderbilt Arboretum), and that it is depicted by a particular image. The last two statements link the tree's metadata to RDF descriptions of the related resources (the arboretum and the image), in accordance with Linked Data principles. The web page for each image and individual organism contains a link to the RDF metadata for that resource, serialized as RDF/XML.
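To make the subject-predicate-object structure concrete, the sketch below writes the same four statements as Python tuples, expands each prefixed name to its full IRI, and prints the result as N-Triples, a line-per-triple RDF syntax. Prefix names are only local abbreviations; what matters is the namespace IRI each one stands for, and the mappings used here are the published namespaces of these vocabularies to the best of our knowledge. The helper names (`expand`, `to_ntriples`) are illustrative, and the serializer does no literal escaping, which is acceptable only because the one literal here ("native") contains no special characters.

```python
# Minimal sketch: the four triples from the Turtle fragment above, with
# prefixed names expanded to full IRIs and serialized as N-Triples lines.
# Helper names are illustrative; literal escaping is intentionally omitted.

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dwcuri": "http://rs.tdwg.org/dwc/iri/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

TREE = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"

# Each triple is (subject, predicate, object). An object is either an IRI
# (full or prefixed) or a language-tagged literal given as (text, lang).
TRIPLES = [
    (TREE, "rdf:type", "dcterms:PhysicalResource"),
    (TREE, "dwc:establishmentMeans", ("native", "en")),
    (TREE, "dwcuri:inCollection", "http://biocol.org/urn:lsid:biocol.org:col:35259"),
    (TREE, "foaf:depiction", "http://bioimages.vanderbilt.edu/baskauf/79649"),
]


def expand(term: str) -> str:
    """Expand a prefixed name such as 'foaf:depiction' to a full IRI.

    Terms that already begin with http:// or https:// are returned unchanged.
    """
    if term.startswith("http://") or term.startswith("https://"):
        return term
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triple) -> str:
    """Serialize one (s, p, o) tuple as an N-Triples line."""
    s, p, o = triple
    if isinstance(o, tuple):
        text, lang = o
        obj = f'"{text}"@{lang}'
    else:
        obj = f"<{expand(o)}>"
    return f"<{expand(s)}> <{expand(p)}> {obj} ."


if __name__ == "__main__":
    for t in TRIPLES:
        print(to_ntriples(t))
```

Each printed line is one complete statement, which is why RDF from many sources can be merged simply by concatenating triples.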

Standards

It does little good to provide machine-readable metadata about a resource if the terms used to specify its properties are not standardized. Standard metadata terms provide a consistent vocabulary for expressing metadata as RDF. The TDWG standards Darwin Core (DwC) and Audiovisual Core (AC), combined with terms from the Dublin Core vocabulary, provide many of the metadata terms needed to describe images of live organisms and the plants they document. Wherever possible, Bioimages uses terms from these vocabularies to describe resources in the collection.

A poster-sized PowerPoint version of the graph model is available for download (49 KB).

Bioimages Graph Model

The Darwin Core TDWG standard contains many terms to describe data properties of biodiversity resources, but it does not generally provide terms to describe the relationships among different types of resources. The Darwin-SW (DSW = Darwin Semantic Web) ontology expands upon the basic DwC vocabulary by formally defining the relationships among resource classes. Bioimages uses DSW object properties to describe RDF relationships between resources. A significant feature of DSW is that a living organism acts as a node connecting all occurrences derived from the individual as well as one or more taxonomic determinations, as described in Baskauf (2010); this approach is now supported by the addition of the Darwin Core Organism class (dwc:Organism). The Darwin-SW model permits the expression of complex relationships in RDF that would be difficult to model in a simple database table, and the Bioimages graph model is based primarily on it.
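The organism-as-node idea can be sketched with a tiny in-memory graph. Below, the tree from the Turtle example is typed as dwc:Organism and linked to two occurrences and two identifications, and a simple lookup then collects everything attached to the organism node. The property names dsw:hasOccurrence and dsw:hasIdentification, and the namespace http://purl.org/dsw/, reflect Darwin-SW as we understand it and should be checked against the Darwin-SW website; the example.org occurrence and identification IRIs are hypothetical placeholders, not Bioimages identifiers.

```python
# Minimal sketch of the Darwin-SW "organism as node" pattern, using plain
# tuples as triples. ASSUMPTIONS: the DSW namespace and property names below;
# all example.org IRIs are HYPOTHETICAL placeholders.
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"
DSW = "http://purl.org/dsw/"  # assumed Darwin-SW namespace

ORGANISM = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
EX = "http://example.org/"  # hypothetical namespace for illustration

triples = {
    (ORGANISM, RDF_TYPE, DWC + "Organism"),
    # Two occurrences (e.g. visits on different dates) of the same tree.
    (ORGANISM, DSW + "hasOccurrence", EX + "occurrence/1"),
    (ORGANISM, DSW + "hasOccurrence", EX + "occurrence/2"),
    # Two taxonomic determinations of the same tree (e.g. an original and a revision).
    (ORGANISM, DSW + "hasIdentification", EX + "identification/1"),
    (ORGANISM, DSW + "hasIdentification", EX + "identification/2"),
}


def outgoing(graph, subject):
    """Group the objects of all triples with the given subject by predicate."""
    by_predicate = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            by_predicate[p].add(o)
    return by_predicate


links = outgoing(triples, ORGANISM)
occurrences = sorted(links[DSW + "hasOccurrence"])
identifications = sorted(links[DSW + "hasIdentification"])

print("Occurrences:", occurrences)
print("Identifications:", identifications)
```

In a flat table, each additional occurrence or determination would require another row (repeating the organism's data) or another column; in the graph it is simply one more edge from the same organism node.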

Accessing Bioimages RDF

A semantic client (a "machine", i.e. computer software) can acquire Bioimages RDF/XML by dereferencing particular image and organism HTTP URIs. The client may discover a URI via a link external to Bioimages or through the individual organism and image RDF site maps, which are linked from the site's VoID description. The HTML pages are also indexed in a separate site map file; these static web pages carry a subset of the RDF as RDFa embedded in their HTML.
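A client's discover-then-dereference loop might look like the sketch below. It assumes that the RDF site maps use the standard sitemaps.org XML format and that the server returns RDF/XML when the request carries an `Accept: application/rdf+xml` header (HTTP content negotiation), while a browser, which asks for HTML, receives the web page. The sample site map and the function names are hypothetical. The network call is kept in its own function so that the parsing and request-building steps can be checked offline.

```python
# Minimal sketch. ASSUMPTIONS: (1) RDF site maps use the sitemaps.org format;
# (2) the server returns RDF/XML when asked for it via the Accept header.
# Function names and SAMPLE_SITEMAP are HYPOTHETICAL.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical site map content; a real one would be fetched from the URL
# listed in the site's VoID description.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://bioimages.vanderbilt.edu/vanderbilt/7-314</loc></url>
  <url><loc>http://bioimages.vanderbilt.edu/baskauf/79649</loc></url>
</urlset>"""


def uris_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemaps.org-style XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri: str, timeout: float = 30.0) -> bytes:
    """Dereference the URI over the network (urlopen follows HTTP redirects)."""
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Offline demonstration: show which requests would be sent.
    for uri in uris_from_sitemap(SAMPLE_SITEMAP):
        req = rdf_request(uri)
        print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch_rdf` on one of the discovered URIs would return the raw RDF/XML bytes, ready to hand to an RDF parser.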

The entire Bioimages database as RDF (over one million triples) is available in the Bioimages GitHub repository as a compressed RDF/XML file (bioimages-rdf.zip). These triples can also be queried through the Vanderbilt Heard Libraries SPARQL endpoint at https://sparql.vanderbilt.edu/sparql. Describing SPARQL is beyond the scope of this web page, but see the references below for more information.
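As a minimal starting point, the sketch below sends a SELECT query using the SPARQL 1.1 Protocol (an HTTP GET with the query in a `query` URL parameter) and reads the SPARQL 1.1 JSON results format. It assumes the endpoint accepts GET requests and can return `application/sparql-results+json`. The query, which asks for images linked to the tree from the Turtle example via foaf:depiction, and the sample response are illustrative, not real output from the endpoint.

```python
# Minimal sketch of a SPARQL 1.1 Protocol query (HTTP GET) with results in the
# SPARQL 1.1 JSON results format. ASSUMPTION: the endpoint supports both.
# SAMPLE_RESULTS is HYPOTHETICAL, not real endpoint output.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.vanderbilt.edu/sparql"

# Illustrative query: images that depict the tree from the Turtle example.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?image WHERE {
  <http://bioimages.vanderbilt.edu/vanderbilt/7-314> foaf:depiction ?image .
}
LIMIT 10"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Encode the query as the 'query' URL parameter and ask for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}  # unbound vars are omitted
        for b in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> list[dict]:
    """Send the query over the network and return the flattened rows."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))


SAMPLE_RESULTS = {
    "head": {"vars": ["image"]},
    "results": {"bindings": [
        {"image": {"type": "uri",
                   "value": "http://bioimages.vanderbilt.edu/baskauf/79649"}}
    ]},
}

if __name__ == "__main__":
    # Offline demonstration on the hypothetical response.
    print(bindings_to_rows(SAMPLE_RESULTS))
```

Calling `run_query(ENDPOINT, QUERY)` would perform the live request; the same two helpers work with any endpoint that follows the SPARQL 1.1 Protocol.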

References for further information:

RDF Primer (YouTube video about URIs and RDF in a biodiversity context)

Beginner's Guide to RDF (RDF in a biodiversity informatics context)

Dublin Core terms

Darwin Core terms

Audiovisual Core terms

Darwin-SW website

Baskauf, SJ (2010). Organization of biodiversity resources based on the process of their creation and the role of individual organisms as resource relationship nodes. Biodiversity Informatics 7:17-44.

Baskauf, SJ and CO Webb (2016). Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF. Semantic Web Journal 7:629-643. Open access.