Globally Unique Identifiers

Globally unique identifiers (GUIDs) are an integral part of the global biodiversity information network as it is currently envisioned. For users of a global network to know unambiguously which resource they are encountering, the resource must have an identifier that differs from every other identifier in the world. But GUIDs are expected to be more than just globally unique. They are also expected to be persistent: the identifier assigned to a resource should not change over time, and it should be used consistently to refer to that resource. There is also a growing consensus that identifiers should be actionable. An actionable identifier provides a means for a user to use the identifier to discover information about the resource. HTTP URIs are the form of actionable identifier preferred by the Linked Data community. These principles are described in more detail below.

Globally Unique

There are several strategies that can be used to ensure that an identifier is globally unique. One is to create a random number that is so large that the probability of any two identifiers being identical is negligible. This is the approach taken with Universally Unique Identifiers (UUIDs), e.g. 7FF3FEFD-4930-4E79-ACCA-753D6DF7C22F (a 128-bit number written in hexadecimal). Another strategy is to include an Internet domain (or subdomain) name in the identifier, since the rules of the Internet ensure that no two entities can be assigned the same domain name. If a domain name is attached to a locally unique identifier, the combined identifier is therefore globally unique.
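The random-number strategy can be tried out with a few lines of Python. This is only a minimal sketch using the standard library's uuid module; it is not how Bioimages creates its identifiers, and the variable names are illustrative.

```python
import uuid

# uuid4() builds an identifier from random bits; a few fixed bits
# record the UUID version and variant.
identifier = uuid.uuid4()

# Canonical text form: 32 hexadecimal digits in 8-4-4-4-12 groups,
# upper-cased here to match the style of the example in the text.
text_form = str(identifier).upper()
print(text_form)

# The example from the text parses as a valid UUID.
example = uuid.UUID("7FF3FEFD-4930-4E79-ACCA-753D6DF7C22F")
print(example.version)
```

Because the value is random, the first printed line differs on every run, which is exactly the property that makes collisions improbable.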

Bioimages uses the latter approach to construct GUIDs. The subdomain bioimages.vanderbilt.edu is used as a prefix in the GUIDs of all resources on the site. A locally unique identifier is constructed using a "namespace" and catalog number. For example, the namespace ind-hessd is used to group individual plants photographed by Darel Hess and catalog numbers such as e5032 are assigned to individual images. Thus the HTTP URI http://bioimages.vanderbilt.edu/ind-hessd/e5032 uniquely identifies a particular shaggy dwarf morning-glory (Evolvulus nuttallianus) plant that was located in the Couchville Cedar Glade State Natural Area in Tennessee.
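The construction described above can be sketched in Python as follows. The function name build_guid and its validation rules are hypothetical and serve only to illustrate how a domain prefix, a namespace, and a catalog number combine into one identifier; they are not taken from the Bioimages software.

```python
# Subdomain used as the prefix of every Bioimages GUID (from the text).
BASE = "http://bioimages.vanderbilt.edu"


def build_guid(namespace: str, catalog_number: str) -> str:
    """Join the domain prefix with a locally unique identifier.

    The checks below are illustrative assumptions: both parts must be
    non-empty and must not contain a slash, so the result always has
    exactly the form BASE/namespace/catalog_number.
    """
    for part in (namespace, catalog_number):
        if not part or "/" in part:
            raise ValueError(f"invalid identifier component: {part!r}")
    return f"{BASE}/{namespace}/{catalog_number}"


print(build_guid("ind-hessd", "e5032"))
```

The local part only has to be unique within the site; the domain prefix supplies the global uniqueness.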

Persistent

A good GUID should persist. That means that a particular identifier should be stable and associated with the same resource for a very long time. Bioimages URIs will not change over time (in contrast with many URLs that change over a period of months or days).

Actionable

The most common type of actionable GUID is an HTTP URI, recognizable by the "http://" prefix. HTTP URIs include the familiar URLs that are used to retrieve web pages. However, an HTTP URI can identify anything, including a physical thing (such as a person or a plant) or a concept (such as a taxonomic species). So what does it mean for an HTTP URI to be actionable if it can't be used to retrieve a plant using a web browser?

When a human user enters the HTTP URI of a non-information resource in a web browser, the web server knows that the resource itself cannot be returned to the browser. Instead, the server refers the browser to a web page that is about the resource. For example, if the URI http://bioimages.vanderbilt.edu/ind-hessd/e5032 is entered in a browser, the server will not return to the user the plant that is identified by that URI. Rather, the server will refer the browser to the web page http://bioimages.vanderbilt.edu/ind-hessd/e5032.htm, which provides information about the plant. This process of referring a client to an alternative URI that will provide information in an appropriate form for that client is called content negotiation.

After the success of the World Wide Web was demonstrated, it was felt that the next step in the evolution of the Web would be to make it more friendly to computers. This idea has led to the development of a "web of data", the Semantic Web. A machine-readable system for describing resources, called Resource Description Framework (RDF), has been developed to enable the exchange of information among computers. If a machine requests information about a resource in the form of RDF from a server using that resource's HTTP URI, the server will use content negotiation to refer the machine to the URI of an RDF file. For example, a machine requesting information about http://bioimages.vanderbilt.edu/ind-hessd/e5032 will be referred to the URI http://bioimages.vanderbilt.edu/ind-hessd/e5032.rdf (this file may or may not display properly in a web browser, depending on the browser type).
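The decision a server makes during content negotiation can be sketched as a small Python function. This is a simplified illustration, not the Bioimages server code: the function name negotiate is hypothetical, the media type application/rdf+xml is an assumed way for a client to ask for RDF, and quality values in the Accept header are ignored.

```python
RDF_TYPE = "application/rdf+xml"  # assumed media type for an RDF request


def negotiate(resource_uri: str, accept_header: str) -> str:
    """Return the document URI a client would be referred to.

    Simplification: any mention of the RDF media type in the Accept
    header selects the RDF file; everything else gets the web page.
    """
    # Drop parameters such as ";q=0.9" and normalise case.
    requested = [
        item.split(";")[0].strip().lower()
        for item in accept_header.split(",")
    ]
    if RDF_TYPE in requested:
        return resource_uri + ".rdf"
    return resource_uri + ".htm"


plant = "http://bioimages.vanderbilt.edu/ind-hessd/e5032"
print(negotiate(plant, "text/html,application/xhtml+xml"))
print(negotiate(plant, "application/rdf+xml"))
```

The same identifier therefore leads a browser to the .htm page and an RDF-aware client to the .rdf file, while the URI of the plant itself never changes.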

All images and individual organisms in the Bioimages database have been assigned persistent, globally unique identifiers in the form of HTTP URIs. These identifiers redirect through content negotiation to web pages or RDF, depending on the content type requested by the client. The GUIDs and stable web page URLs can be obtained from the web pages about the images and organisms.

For more information

A YouTube video for beginners, entitled "An RDF Primer", includes information about URI identifiers in the context of RDF.

The Beginner's Guide to RDF discusses RDF in the context of biodiversity informatics.

The Global Biodiversity Information Facility (GBIF) has published A Beginner's Guide to Persistent Identifiers.

Biodiversity Information Standards (TDWG) has produced a GUID Applicability Statement standard. Bioimages follows the recommendations of this standard.