If you plan to represent bird taxonomy in a relational database, flat files are a convenient format, because all the major database systems can import them. In a flat file, each record consists of a sequence of fields, most of them of fixed length.
If you run the nombuild program without the -x option, it writes three files:

- The tree file defines all the taxa in the standard forms file, plus all subspecific taxa from the alternate forms file. Its name is the same as the input file's, except that it has the extension .tre. For example, if the input files are aou640.std and aou640.alt, the tree file will be called aou640.tre.
- The abbreviations file defines all the six-letter bird codes. This file has the extension .ab6.
- The collisions file describes every six-letter bird code that is invalid because two or more names would abbreviate to that code. Its extension is .col.

The sections below describe the formats of these product files.
The tree file defines all the different scientific names used in the input. Here is the format of that file:

| Length (characters) | Contents |
|---|---|
| varies | The taxonomic key number. The exact format of this field depends on the content of the ranks file; see Section 7.1.1, “Taxonomic key numbers”. |
| 6 | If this taxon has a standard six-letter bird code, that code appears here; otherwise the field is blank. |
| 1 | For generally accepted forms, this field is blank. If the form is not in the main AOU Check-List, a question mark (?) appears here. |
| 36 | The scientific name of the group to which this form is referred, for example, Junco hyemalis. The field is aligned flush left and padded on the right with spaces. For forms not identified to species, the smallest containing taxon is used, e.g., Aves for “bird sp.” For subspecific forms defined in the alternate names file, this field contains the scientific name with a space and an integer appended. For example, in the line for the standard species Snow Goose, this field will have the value “…” |
| 56 | The English name of the form, aligned flush left and padded on the right with spaces. For multi-word names, the generic part comes first, followed by a comma, one space, and the specific part. Examples: “Dunlin”; “Loon, Red-throated”; “grebe sp.”; “bird sp.”; “bird, large sp.”; “teal, Blue-winged x Cinnamon”; “Junco, (Gray-headed x Slate-colored) Dark-eyed”. |
| varies | At the end of the record is a variable-length field containing the English name, encoded for typesetting with TeX markup. Use this field to get diacritical marks and correct italicization of generic names. |
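Because most fields have fixed widths, a tree-file record can be read with plain string slicing. The following Python sketch is a minimal illustration, not part of nombuild. It assumes that the fields are packed side by side with no separator characters, that each record occupies one line, and that the caller already knows the width of the key field. For a ranks file with a 2-digit order, 2-digit family, 1-digit subfamily, and 2-digit genus, species, and form, that width is 11. The names `TreeRecord` and `parse_tree_line` are hypothetical.

```python
from dataclasses import dataclass

# Widths of the fixed-length fields that follow the key (from the table).
CODE_WIDTH = 6
FLAG_WIDTH = 1
SCI_WIDTH = 36
ENG_WIDTH = 56


@dataclass
class TreeRecord:
    key: str        # taxonomic key number, digits only
    code: str       # six-letter code, or "" when the field is blank
    unlisted: bool  # True when the flag field holds "?"
    sci_name: str   # scientific name with trailing padding removed
    eng_name: str   # English name with trailing padding removed
    tex_name: str   # variable-length, TeX-encoded English name


def parse_tree_line(line: str, key_width: int) -> TreeRecord:
    """Split one .tre record into its fields.

    Assumes (hypothetically) that fields are packed with no separators
    and that key_width, the total number of key digits, is known from
    the ranks file.
    """
    line = line.rstrip("\r\n")
    code_start = key_width
    flag_start = code_start + CODE_WIDTH
    sci_start = flag_start + FLAG_WIDTH
    eng_start = sci_start + SCI_WIDTH
    tex_start = eng_start + ENG_WIDTH
    return TreeRecord(
        key=line[:code_start],
        code=line[code_start:flag_start].strip(),
        unlisted=line[flag_start:sci_start] == "?",
        sci_name=line[sci_start:eng_start].rstrip(),
        eng_name=line[eng_start:tex_start].rstrip(),
        tex_name=line[tex_start:],
    )
```

Computing the start offsets once keeps the slice boundaries in one place, so a different key width only changes the first offset.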
The taxonomic key number can be used to sort records into phylogenetic order, as defined by the AOU Check-List. It contains one or more digits for each rank except the root rank. The number of digits for each rank is determined by the third column of the ranks file.
It is an extremely bad idea to use this number to represent a taxon for any purpose other than sorting. Not only is it spectacularly meaningless out of context, but any change to the input files can change many of the taxonomic key numbers.
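Every key in a given tree file has the same number of digits, so comparing keys as strings gives the same order as comparing them as numbers. This minimal Python sketch assumes that the per-rank digit counts have already been read from the third column of the ranks file; parsing the ranks file itself is not shown. The function names are hypothetical.

```python
def key_width(digits_per_rank: list[int]) -> int:
    """Total key length: the sum of the per-rank digit counts
    (the third column of the ranks file, root rank excluded)."""
    return sum(digits_per_rank)


def phylogenetic_order(lines: list[str], width: int) -> list[str]:
    """Sort tree-file lines by their leading taxonomic key number.

    All keys in one file are fixed-width strings of digits, so a
    plain string comparison matches numeric (phylogenetic) order.
    """
    return sorted(lines, key=lambda line: line[:width])
```

Only the key prefix is used for sorting, in keeping with the advice above not to give the key any other meaning.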
For example, if your ranks file looks like the example given above (2-digit order, 2-digit family, 1-digit subfamily, 2-digit genus, 2-digit species, and 2-digit form), each taxonomic key number would have these components:

- The two-digit serial number of the taxonomic order in which this form is placed, or “00” if the form is not placed in an order (e.g., “bird sp.”).
- The two-digit serial number of the taxonomic family within this order, or “00” for forms not placed within a specific family. Note that the sequence of families starts over at “01” within each order.
- The one-digit serial number of the subfamily within the family, or “0” if the subfamily is unknown.
- The two-digit serial number of the genus within the family, or “00” if the genus is unknown.
- The two-digit serial number of the species within the genus, or “00” if the species is unknown.
- The two-digit serial number of the form within the species, or “00” if the form is unknown.

For example, code daejun (Dark-eyed Junco) might have a taxonomic key number of “21 24 3 47 01 00” (the spaces here are for clarity; they are not actually present in the record). This key would mean that the form is in the 21st order, the 24th family within that order, the 3rd subfamily within that family, the 47th genus within that subfamily, and the 1st species within that genus, and that it is not any particular subform of that species.
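Splitting a key into its per-rank groups only requires the digit counts from the ranks file. This hypothetical Python helper assumes the example layout described above (2, 2, 1, 2, 2, and 2 digits). It also assumes that the key is stored without spaces, as it is in the record.

```python
# Digit counts per rank for the example ranks file:
# order, family, subfamily, genus, species, form.
EXAMPLE_WIDTHS = [2, 2, 1, 2, 2, 2]


def split_key(key: str, widths: list[int]) -> list[str]:
    """Cut a taxonomic key number into one digit group per rank."""
    if len(key) != sum(widths):
        raise ValueError(f"expected {sum(widths)} digits, got {len(key)}")
    parts = []
    pos = 0
    for w in widths:
        parts.append(key[pos:pos + w])
        pos += w
    return parts


print(" ".join(split_key("21243470100", EXAMPLE_WIDTHS)))  # prints 21 24 3 47 01 00
```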
Other forms that are included within Dark-eyed Junco will have keys “21 24 3 47 01 01”, “21 24 3 47 01 02”, and so on. Examples of such forms include races such as Gray-headed Junco, hybrids among the different races (e.g., “Gray-headed × Slate-colored Junco”), and obsolete names (“Northern Junco”).
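All forms within a species share the species' key except for the last digit group, so they can be found by prefix matching. A minimal Python sketch follows. It assumes keys without spaces and a form group that is `form_width` digits long (2 in the example layout). `forms_of_species` is a hypothetical name.

```python
def forms_of_species(species_key: str, all_keys: list[str], form_width: int) -> list[str]:
    """Return the keys of all forms placed within the given species.

    A form's key matches the species key in every digit group except
    the last (form) group, which is nonzero; the species itself has
    all zeroes there.
    """
    prefix = species_key[:-form_width]
    zeros = "0" * form_width
    return [
        k for k in all_keys
        if len(k) == len(species_key)
        and k.startswith(prefix)
        and k[-form_width:] != zeros
    ]
```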
Note that the taxonomic key number can be used to deduce relationships between form codes. For example, to find out which genus a species is in, construct a key number that is the same as the species' key number, but with its species number set to “00”. Continuing the example above, suppose Gray-headed Junco has this key number:
21 24 3 47 01 01
Then we can deduce all the higher ranks by substituting zeroes in the appropriate fields:

| Key number | Containing taxon |
|---|---|
| 21 24 3 47 01 00 | The containing species, Junco hyemalis |
| 21 24 3 47 00 00 | The containing genus, Junco |
| 21 24 3 00 00 00 | The containing subfamily, Emberizinae |
| 21 24 0 00 00 00 | The containing family, Emberizidae |
| 21 00 0 00 00 00 | The containing order, Passeriformes |
| 00 00 0 00 00 00 | The containing class, Aves |
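The zero-substitution procedure can be automated by clearing one digit group at a time, working from the right. This Python sketch is illustrative only. It assumes spaceless keys and the per-rank digit counts from the ranks file. It skips groups that are already zero (for example, a form group of “00” or an unknown subfamily), so each returned key names a distinct containing taxon. `ancestor_keys` is a hypothetical name.

```python
def ancestor_keys(key: str, widths: list[int]) -> list[str]:
    """List the keys of every containing taxon, nearest first.

    Each step replaces one more digit group, working from the right,
    with zeroes; the final all-zero key is the root (class Aves).
    """
    groups = []
    pos = 0
    for w in widths:
        groups.append(key[pos:pos + w])
        pos += w
    result = []
    for i in range(len(groups) - 1, -1, -1):
        if set(groups[i]) == {"0"}:
            continue  # rank already zero (unknown or unplaced): no new ancestor
        groups[i] = "0" * len(groups[i])
        result.append("".join(groups))
    return result


print(ancestor_keys("21243470101", [2, 2, 1, 2, 2, 2]))
# prints ['21243470100', '21243470000', '21243000000', '21240000000', '21000000000', '00000000000']
```

These six keys correspond row by row to the containing taxa listed for Gray-headed Junco.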