Guid-O-Matic 1.0


for specimens

"You can write better software than this..."


Specific instructions for creating GUIDs using Guid-O-Matic

 (under construction)

1. Deciding if you are actually in a position to create GUIDs

GUIDs are intended to be persistent.  That means that you should not create them unless you are serious about keeping the metadata that goes with them available to the public via the Internet for a very long time.  The RAX (RDF And XSLT) method on which Guid-O-Matic is based is designed to require no maintenance beyond what is required to maintain a normal website with static pages.  However, the RDF files must be within a domain that is not likely to disappear at any time in the foreseeable future. 

Examples of situations where domains are likely to persist.  

Subdomains of large institutions.  If you are affiliated with an institution that is relatively large and financially stable, and which has a well-developed IT infrastructure, then you can probably acquire a subdomain of that domain, which you can then use to issue GUIDs.  For example, Vanderbilt University is a large academic institution that has been around for over 100 years.  The subdomain bioimages.vanderbilt.edu can persist because the underlying domain vanderbilt.edu is not likely to disappear.  If information within bioimages.vanderbilt.edu has value to the bioinformatics or educational community, it is likely that some entity at Vanderbilt University will continue to maintain that information even after the initial creator is no longer able to do so. 

Privately controlled domains.  If you have obtained a domain name through a public vendor, you must have a plan for making sure that someone will continue to pay the fee indefinitely.  Again, if you have metadata that is of high value to the bioinformatics community, someone may be willing to take over paying for that domain after you are no longer in charge.  For example, cyberlouisiana.com will ultimately serve data for hundreds of thousands of specimens.  Therefore it is likely that someone will be willing to maintain that domain if funding for the specific project ends.

Examples of situations where domains are not likely to persist.

Domains shared with other organizations.  If your institution uses a URL that falls hierarchically within a domain that is shared with another organization (i.e. you do not control the entire domain or subdomain), it is not safe to assume that you will be able to control the physical presence and stability of those URLs.  For example, www.cas.vanderbilt.edu/bioimages falls hierarchically within a subdomain of Vanderbilt's College of Arts and Sciences.  Even if I control files that fall within the bioimages directory of that subdomain, I cannot ensure that the subdomain will continue to exist and serve files.

"Temporary" privately controlled domains.  If the domain owner is not willing or able to keep the domain stable indefinitely, or if the domain name is not one that someone else could appropriately take over, the GUIDs should be issued under another domain.  For example, stevestuff.com probably isn't a good domain to use.

Do you have unique local identifiers?

Guid-O-Matic assumes that you have some string of characters that uniquely identifies each resource within your collection.  Because of the RAX delivery method, it works well to have a local identifier of the sort collectionCode/catalogNumber (Guid-O-Matic will insert the slash between the two parts for you).  Also, because in the RAX method collectionCode forms the name of a directory and catalogNumber represents a file name, it is probably safest to limit the number of catalog numbers within a collectionCode to fewer than 10 000.  For the same reason, both collectionCode and catalogNumber should contain only characters that can be used in valid file names in Windows.  This generally means alphanumeric characters, the underscore character ("_"), and the dash ("-").  They should not contain blanks or punctuation characters such as < > ! ? . / : ; &.  Although alphabetic characters can be either upper or lower case, many servers will differentiate between the two, and users who manually type the characters into a browser may use the wrong case.  I personally prefer to use only lower case letters.
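
The character rules above can be checked before you build any files.  The following is only a minimal sketch in Python, not part of Guid-O-Matic itself; the function names is_valid_part and make_local_id are hypothetical, and folding everything to lower case reflects the personal preference mentioned above rather than a requirement.

```python
import re

# Assumption: the safe characters are ASCII letters, digits, underscore and dash,
# following the guideline in the text. Blanks and other punctuation are rejected.
_VALID_PART = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_part(part: str) -> bool:
    """Return True if the string is non-empty and uses only safe characters."""
    return _VALID_PART.fullmatch(part) is not None


def make_local_id(collection_code: str, catalog_number: str) -> str:
    """Join the two parts with a slash, as collectionCode/catalogNumber.

    Both parts are lower-cased (a preference, not a rule) so that the
    directory and file names do not depend on letter case.
    """
    parts = (("collectionCode", collection_code), ("catalogNumber", catalog_number))
    for name, value in parts:
        if not is_valid_part(value):
            raise ValueError(f"{name} {value!r} contains characters unsafe for file names")
    return f"{collection_code.lower()}/{catalog_number.lower()}"
```

For example, make_local_id("VDB", "12345") gives "vdb/12345", while a catalog number containing a blank or a period raises a ValueError.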

If your identifier system has many more catalogNumbers than 10 000, there may be ways that you can logically subdivide them.  For example, if you use serial accession numbers, you could place the numbers assigned during a particular year into a collectionCode that is the year.  If you want to use an Index Herbariorum code for your collection code, you can append numeric digits after the code (e.g. vdb000, vdb001,...) and have a systematic means of placing the accession numbers within a collection code.
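
One systematic way to do the second kind of subdivision is to derive the numeric suffix from the accession number itself.  This is a sketch of one possible scheme, not something Guid-O-Matic does for you; the function name subdivide and the bucket size of 5000 are assumptions (any size comfortably below 10 000 fits the guideline above).

```python
from typing import Tuple

# Assumption: 5000 accession numbers per collection code, which keeps each
# directory well under the suggested limit of 10 000 files.
BUCKET_SIZE = 5000


def subdivide(base_code: str, accession_number: int,
              bucket_size: int = BUCKET_SIZE) -> Tuple[str, str]:
    """Return (collectionCode, catalogNumber) for a serial accession number.

    The collection code is the base code plus a zero-padded, three-digit
    bucket index, e.g. vdb000, vdb001, ...
    """
    if accession_number < 0:
        raise ValueError("accession numbers must not be negative")
    bucket = accession_number // bucket_size
    if bucket > 999:
        raise ValueError("accession number too large for a three-digit suffix")
    return f"{base_code}{bucket:03d}", str(accession_number)
```

With these assumptions, subdivide("vdb", 12345) returns ("vdb002", "12345"), so that specimen's file would live in the vdb002 directory.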

If these constraints are too limiting for you, then the RAX delivery method and Guid-O-Matic are probably not for you.  You're on your own!

Do you have metadata?

Although I suppose one could issue GUIDs for resources that did not have metadata, there would be little point in it, since the reason for making GUIDs actionable is to provide a means for users to acquire metadata. 

At a minimum, you will want to have basic specimen metadata available (catalog number, collector information, and taxonomic determinations).  If a specimen is geolocated (i.e. has decimal latitudes and longitudes) Guid-O-Matic creates a link to the location on Google Maps.  If there is an image for a specimen, the image is displayed or linked depending on the resolution.  Both geolocation and images are optional.
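
To illustrate the optional geolocation link, here is a small Python sketch that builds a Google Maps URL from decimal coordinates.  The exact URL that Guid-O-Matic writes is not shown here, so the format below (Google's documented search URL with api=1 and a query of "latitude,longitude") is an assumption, and the function name google_maps_link is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def google_maps_link(lat: Optional[float], lon: Optional[float]) -> Optional[str]:
    """Return a Google Maps search URL, or None if the specimen is not geolocated."""
    if lat is None or lon is None:
        # Geolocation is optional, so a missing coordinate simply means no link.
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    # Assumed URL format; urlencode percent-encodes the comma in the query.
    query = urlencode({"api": 1, "query": f"{lat},{lon}"})
    return "https://www.google.com/maps/search/?" + query
```

A specimen with no coordinates yields None, so the page generator can skip the map link entirely.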

Because Guid-O-Matic is a very simple program, it requires that the metadata fields be ordered in a particular way in the CSV file.  In some cases, it may be possible to configure a database program to output the file in exactly the format that Guid-O-Matic wants.  It is more likely that you will need to manually adjust the output using something like Excel before creating the CSV file.  It is also perfectly fine to just start with an Excel file itself and arrange the cells in the way that Guid-O-Matic wants.  If some standard format emerges, I may create another version to use that, although I'm not really very interested in writing software (remember the Guid-O-Matic motto: "You can write better software than this...").

2. Create the metadata files

In general, here is the format of the two data files that you need to create to make Guid-O-Matic work.