Each six-letter form code is tied to a specific name. Most of the names are English names, and most of those have, or formerly had, some official status, typically from one version or another of the AOU Check-List.
This program will form a code from each name using the rules published in the specification, and detect collisions (cases where application of the standard rules causes two or more names to abbreviate to the same code). If the data files do not provide a disambiguation, the program will print an error message, and the operator must correct the files and re-run the program.
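The collision check described above can be sketched roughly as follows. This is only an illustration, not the program's actual code: the names find_collisions, toy_abbreviate, and the disambiguations mapping are hypothetical, and toy_abbreviate (first six letters, uppercased) is a stand-in that does not implement the rules in the specification. The name "Blue Goosander" is invented to force a collision.

```python
from collections import defaultdict


def toy_abbreviate(name):
    """Stand-in rule for illustration only: first six letters, uppercased.

    This is NOT the abbreviation rule from the specification.
    """
    letters = "".join(ch for ch in name if ch.isalpha())
    return letters.upper()[:6]


def find_collisions(names, abbreviate, disambiguations=None):
    """Return {code: [names]} for every code shared by two or more names.

    `disambiguations` (hypothetical) maps a name to an explicitly assigned
    code, which takes precedence over the rule-derived code.
    """
    disambiguations = disambiguations or {}
    by_code = defaultdict(list)
    for name in names:
        code = disambiguations.get(name, abbreviate(name))
        by_code[code].append(name)
    # Only codes claimed by more than one name are collisions.
    return {code: group for code, group in by_code.items() if len(group) > 1}


names = ["Blue Goose", "Blue Goosander", "Snow Goose"]

# Without a disambiguation, two names map to the same code.
collisions = find_collisions(names, toy_abbreviate)
for code, group in collisions.items():
    print(f"ERROR: code {code} is shared by: {', '.join(group)}")

# Once the operator supplies a disambiguation, the collision goes away.
fixed = find_collisions(names, toy_abbreviate, {"Blue Goosander": "BLUGSR"})
```

Passing the abbreviation rule in as a parameter keeps the collision logic independent of whatever the specification's rules turn out to be.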
Because most checklists and field records enumerate species, each species in the taxonomic tree is assigned a six-letter form code. Form codes are not automatically assigned to higher taxa.
As the standard (.std) file is read and the taxonomic tree is built, the program also builds a table of all the form codes. However, the .std file does not include forms deeper than species level.
The alternate forms (.alt) file enumerates all the form codes that are not derived from standard names. Each line in the alternate forms file is of one of these types:
A subspecific form, defining a form code that applies to only part of a species population. Each such line results in a new node being added to the taxonomic tree as a child of that species.
Because many such forms (for example, color morphs like Blue Goose) do not have scientific names, they are assigned artificial scientific names consisting of a species name followed by one space and a number. Example: Blue Goose may be given the artificial scientific name Chen caerulescens 1.
A higher taxon line, defining a form code that applies to a level higher than species, such as a genus or subfamily.
An equivalence line, showing a deprecated form code and the preferred equivalent. The preferred code may itself be deprecated: form A may be referred to form B, which in turn is referred to form C. Occurrences of form A are then to be treated as references to form C. These reference chains may have even more links. The data file may even contain cycles, for example where A is referred to B, B to C, and C back to A again. The program must detect such cycles rather than go into an infinite loop.
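Following a chain of equivalences while guarding against cycles might look like the sketch below. It assumes the equivalence lines have already been read into a dictionary mapping each deprecated code to its preferred code; the names resolve, preferred, and EquivalenceCycle are hypothetical and are not taken from the program itself.

```python
class EquivalenceCycle(Exception):
    """Raised when a chain of equivalences loops back on itself."""


def resolve(code, preferred):
    """Follow deprecated -> preferred links until a non-deprecated code is found.

    `preferred` maps each deprecated code to its stated equivalent.
    Raises EquivalenceCycle if the chain revisits a code.
    """
    seen = [code]
    while code in preferred:
        code = preferred[code]
        if code in seen:
            # The chain has come back to a code already visited.
            raise EquivalenceCycle(" -> ".join(seen + [code]))
        seen.append(code)
    return code


# A is referred to B, which is referred to C: A resolves to C.
chain = {"A": "B", "B": "C"}
final = resolve("A", chain)

# A -> B -> C -> A is a cycle and must be reported, not followed forever.
cycle = {"A": "B", "B": "C", "C": "A"}
try:
    resolve("A", cycle)
    cycle_message = None
except EquivalenceCycle as err:
    cycle_message = str(err)
```

Recording the visited codes in order makes it possible to report the whole offending chain to the operator, which is more useful than merely stopping.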