Machinery that deals only with the bird code
system lives in a separate module, abbr.py.
The module contains several manifest constants and functions, and one class. The constants and functions are:
ABBR_L
Maximum length of a bird code, 6 in the CBC system.
BLANK_ABBR
A string containing ABBR_L spaces.
RE_ABBR
A regular expression (using Python's standard re module) that describes a valid bird code.
REL_SIMPLE
The relationship code for simple (non-compound) forms, one space.
REL_HYBRID
The single-character relationship code denoting
hybrids, "^".
REL_PAIR
The single-character relationship code denoting a
species pair, "|".
abbreviate(eng)
Abbreviates an English name according to the rules of
the system. The argument is a string containing the name, either
in the usual word order (e.g., "Aztec Thrush") or in “last, first” order (e.g., "Amakihi, Molokai").
engComma(eng)
Given an English name in the customary order, such as “American Robin”, returns it in the inverted form, e.g., “Robin, American”.
engDeComma(eng)
Given an English name in the inverted form, such as “Robin, American”, returns the customary form, e.g., “American Robin”.
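As a rough illustration of the items above, here is a minimal sketch. The constant values follow the descriptions in the list. The two name-inversion helpers are hypothetical stand-ins (note the _sketch suffix), not the module's actual code, and they assume that the last word of a name in customary order is the group name. RE_ABBR and abbreviate() are left out because this section does not state their rules.

```python
# Sketch only: values follow the descriptions above; the real abbr.py may differ.
ABBR_L = 6                      # maximum length of a bird code
BLANK_ABBR = " " * ABBR_L       # a string of ABBR_L spaces
REL_SIMPLE = " "                # simple (non-compound) form
REL_HYBRID = "^"                # hybrid
REL_PAIR = "|"                  # species pair


def eng_comma_sketch(eng):
    """Invert 'American Robin' to 'Robin, American'.

    Assumption: the last space-separated word is the group name.
    """
    first, sep, last = eng.strip().rpartition(" ")
    if not sep:                 # one-word name: nothing to invert
        return eng.strip()
    return f"{last}, {first}"


def eng_de_comma_sketch(eng):
    """Restore 'Robin, American' to 'American Robin'."""
    last, sep, first = eng.partition(",")
    if not sep:                 # no comma: already in customary order
        return eng.strip()
    return f"{first.strip()} {last.strip()}"


print(eng_comma_sketch("American Robin"))
print(eng_de_comma_sketch("Amakihi, Molokai"))
```

Under these assumptions the two helpers are inverses of each other for names with a single comma.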
Representation of Christmas Bird Count data is complicated considerably by the use of what we call compound forms: species pairs (e.g., “Hammond's/Dusky Flycatcher”) and hybrids (e.g., “Baltimore Oriole × Bullock's Oriole”). Also supported is a trailing “?” to indicate that the identification is only a guess.
Here is the interface to the BirdId class,
which represents simple and compound forms and an optional
question mark.
BirdId ( txny, abbr, rel=None, abbr2=None, q=None )
Because BirdId objects
connect bird codes to a firm taxonomic foundation,
you must pass a Txny
object as the first argument to the constructor.
The second argument is a bird code. It can be in
either upper or lower case, and either
variable-length or right-padded with spaces. It
will be stored in normalized form: uppercased and
right-padded with spaces to length ABBR_L.
For single bird identities, omit the remaining
arguments. For compound forms, pass the relationship code as rel
(REL_HYBRID for hybrids, REL_PAIR for species pairs) and the second bird
code in the abbr2 argument.
The q argument should be the string
"?" if the ID is questionable. The
default value is None, meaning that
the ID is not in question.
Here's an example. Suppose txny is your Txny object. This code snippet sets
b1 to a BirdId object representing Ou (a
Hawaiian endemic), and b2
to a BirdId object
representing Indigo × Lazuli Bunting:
b1 = BirdId ( txny, "ou" )
b2 = BirdId ( txny, "lazbun", REL_HYBRID, "indbun" )
This constructor will raise a KeyError exception if any of the abbreviations are
undefined in txny.
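The normalization that the constructor applies to a bird code can be sketched as follows. The function name normalize_code is hypothetical, and the length check is an assumption added for illustration; the text only says that codes are uppercased and right-padded to ABBR_L.

```python
ABBR_L = 6  # maximum bird code length, as stated for the CBC system


def normalize_code(raw):
    """Uppercase a bird code and right-pad it with spaces to ABBR_L.

    Hypothetical helper: the real constructor may validate differently.
    """
    code = raw.rstrip().upper()
    if not code or len(code) > ABBR_L:
        raise ValueError(f"not a plausible bird code: {raw!r}")
    return code.ljust(ABBR_L)


print(repr(normalize_code("ou")))
print(repr(normalize_code("lazbun")))
```

Because variable-length and padded inputs normalize to the same string, two codes can be compared with a plain equality test afterwards.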
.txny
The .txny attribute of a
BirdId object is the
Txny object passed to the
constructor (read-only).
.abbr
The first or only bird code, normalized. A
normalized code is uppercased, and right-padded
with spaces if necessary to length ABBR_L.
.rel
For single forms, this attribute is None. It is set to REL_HYBRID for hybrids, REL_PAIR for species pairs.
.abbr2
For compound forms, this attribute holds the second bird code, normalized.
We stipulate that for any compound BirdId instance B, B.abbr <
B.abbr2, and the constructor swaps the two values if necessary to make
this true. This means that if you're
looking for a specific hybrid or pair, you don't
have to look in two different places. For example, in the object returned by
“b2 = BirdId ( txny, "lazbun", REL_HYBRID, "indbun" )”
b2.abbr would hold the code "indbun" (normalized to "INDBUN"),
and "lazbun" (normalized to "LAZBUN") would be stored in
b2.abbr2.
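The ordering rule can be illustrated with a small sketch. The helper ordered_codes is hypothetical and returns a plain tuple rather than a BirdId; it only shows why a compound form can be looked up under a single key regardless of the order in which its two codes were given.

```python
ABBR_L = 6
REL_HYBRID = "^"


def ordered_codes(abbr, abbr2):
    """Normalize two codes and return them in ascending order."""
    a = abbr.rstrip().upper().ljust(ABBR_L)
    b = abbr2.rstrip().upper().ljust(ABBR_L)
    return (a, b) if a <= b else (b, a)


# A tally keyed on (first code, relationship, second code).
tally = {}
first, second = ordered_codes("lazbun", "indbun")
tally[(first, REL_HYBRID, second)] = 1

# The same hybrid entered in the other order lands on the same key.
first, second = ordered_codes("indbun", "lazbun")
tally[(first, REL_HYBRID, second)] += 1
print(tally)
```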
.q
Has the value "" (the empty string)
if the ID is not in question; "?" if
there is a question about the ID; or "-" if the ID is correct but the form is
not countable under American Birding Association
rules.
.taxon
This attribute contains a Taxon object representing the
smallest taxon that contains this identity. For a
single form, this is the taxon containing the form.
For hybrids and species
pairs, it is the smallest taxon
that is an ancestor of both forms.
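Finding the smallest taxon that contains two forms amounts to a lowest-common-ancestor search. The sketch below uses a hypothetical ToyTaxon class with a parent link; it makes no claim about how the real Taxon or Txny classes are organized.

```python
class ToyTaxon:
    """Hypothetical stand-in for a Taxon: a name plus a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def lineage(self):
        """Return [self, parent, grandparent, ...] up to the root."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def smallest_common_taxon(a, b):
    """Return the lowest taxon whose lineage includes both a and b."""
    a_line = a.lineage()
    for candidate in b.lineage():      # walks upward from b, smallest first
        if any(candidate is t for t in a_line):
            return candidate
    raise ValueError("taxa do not share a root")


family = ToyTaxon("Cardinalidae")
genus = ToyTaxon("Passerina", family)
indigo = ToyTaxon("Indigo Bunting", genus)
lazuli = ToyTaxon("Lazuli Bunting", genus)
print(smallest_common_taxon(indigo, lazuli).name)
```

For a single form the same search returns the form's own taxon, since each lineage starts with the taxon itself.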
.fullAbbr
A string made from self's .abbr attribute, followed by the .rel and .abbr2 attributes
for compound forms only. Codes shorter than ABBR_L have their padding blanks stripped.
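One plausible reading of this attribute is sketched below. The function full_abbr is hypothetical and takes the three attribute values as plain arguments; it assumes that "blank-stripped" means removing the right-hand padding.

```python
def full_abbr(abbr, rel=None, abbr2=None):
    """Build a compact code string from normalized parts.

    Assumption: padding blanks are removed from each code, and the
    relationship code and second code appear only for compound forms.
    """
    if rel is None:
        return abbr.rstrip()
    return abbr.rstrip() + rel + abbr2.rstrip()


print(full_abbr("OU    "))
print(full_abbr("INDBUN", "^", "LAZBUN"))
```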
.engComma()
Returns the English name of self in
inverted order, that is, “last, first”. Examples: "robin, American"; "mallard x teal, blue-winged"; "ibis, glossy?".
.__str__(self)
This method is called when a BirdId object is converted to a
string, implicitly or by explicit use of the
str() function. It
returns the English name as a string. Examples of
its return values: "Nihoa Finch"; "Blue-winged Teal x Cinnamon Teal"; "Dusky Flycatcher / Hammond's Flycatcher".
BirdId.scan ( txny, scan )
This method works with the Scan object, from the author's
personal Python library, to process raw bird codes
while scanning an input file. For more information
on the Scan object, see
the author's library
reference.
This is a static method, a relatively new feature of Python. For more information on Python static methods, see the Python 2.2 quick reference.
The txny argument is a
Txny object providing the
taxonomy system in which the codes are to be
interpreted. The scan
argument is a Scan object
used to scan the input stream containing the bird codes.
This method looks for either a simple code (a single bird code)
or a compound code (a bird code followed by a relationship code
and a second bird code). Examples: "vireo", for “vireo
sp.”; "mallar^amewig", Mallard × American Wigeon; and "dowwoo|haiwoo", Downy or Hairy
Woodpecker.
If the scan object points
at a valid simple or compound code, the scan object is advanced past that
code, and the method returns a new BirdId object representing the code.
If the scan object doesn't
point at a valid code, an error message is sent
to the scan object's error
log, and a ValueError
exception is raised.
This method will recognize a trailing "?" if present.
The method raises KeyError if any
bird codes are undefined.
BirdId.scanFlat ( txny, scan )
This is another static method like BirdId.scan(), but it expects to see
its input in flat file format. Specifically, the
scan object should start
with three fixed fields. The first field has
length ABBR_L and contains
the first or only bird code, left-aligned and
right-padded with spaces. The second field is a
single character and contains the relationship
code: normally blank, but it may contain REL_HYBRID or REL_PAIR for compound codes. The
third field has length ABBR_L and contains the second bird
code when the relationship code is nonblank. The
third field must be blank when the relationship
code is blank.
This method does not support the questionable ID
flag. If any codes are undefined, it raises KeyError.
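The three fixed fields can be picked apart with plain slicing, as in the sketch below. It works on an ordinary string rather than a Scan object, and the function name split_flat and its error messages are hypothetical; only the field layout comes from the description above.

```python
ABBR_L = 6
REL_SIMPLE = " "
REL_HYBRID = "^"
REL_PAIR = "|"


def split_flat(record):
    """Split a flat record into (abbr, rel, abbr2).

    Layout: ABBR_L characters, one relationship character, ABBR_L characters.
    Returns rel and abbr2 as None for simple forms.
    """
    width = 2 * ABBR_L + 1
    if len(record) < width:
        raise ValueError("record is shorter than the three fixed fields")
    abbr = record[:ABBR_L]
    rel = record[ABBR_L]
    abbr2 = record[ABBR_L + 1:width]
    if rel == REL_SIMPLE:
        if abbr2.strip():
            raise ValueError("third field must be blank for a simple form")
        return abbr, None, None
    if rel not in (REL_HYBRID, REL_PAIR):
        raise ValueError(f"unknown relationship code: {rel!r}")
    return abbr, rel, abbr2


print(split_flat("INDBUN^LAZBUN"))
print(split_flat("OU".ljust(ABBR_L) + REL_SIMPLE + " " * ABBR_L))
```

The sketch does not check the codes against a taxonomy, so it has no counterpart to the KeyError described above.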
BirdId.parse ( txny, s )
This static method is also like BirdId.scan(), but is used when the input
is in an ordinary string instead of a Scan object.
This method supports a trailing "?"
for questionable IDs. It will raise KeyError for undefined codes.
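To make the accepted input shape concrete, here is a minimal sketch of a string parser for simple and compound codes with an optional trailing "?". It is not the module's implementation: the pattern assumes that a bird code is one to six letters (the actual RE_ABBR is not given in this section), the function name parse_code_sketch is hypothetical, and no taxonomy lookup is performed, so it raises ValueError for malformed input but never KeyError.

```python
import re

ABBR_L = 6

# Assumed shape: code, optionally a relationship code ("^" or "|") plus a
# second code, optionally a trailing "?". The {1,6} bound mirrors ABBR_L.
CODE_PATTERN = re.compile(
    r"\s*([A-Za-z]{1,6})(?:([\^|])([A-Za-z]{1,6}))?(\?)?\s*$"
)


def parse_code_sketch(s):
    """Return (abbr, rel, abbr2, q) with codes uppercased and padded."""
    m = CODE_PATTERN.match(s)
    if m is None:
        raise ValueError(f"not a valid simple or compound code: {s!r}")
    abbr, rel, abbr2, q = m.groups()
    abbr = abbr.upper().ljust(ABBR_L)
    if abbr2 is not None:
        abbr2 = abbr2.upper().ljust(ABBR_L)
    return abbr, rel, abbr2, q or ""


print(parse_code_sketch("vireo"))
print(parse_code_sketch("mallar^amewig"))
print(parse_code_sketch("dowwoo|haiwoo?"))
```

The sketch leaves the two codes in the order given; swapping them into ascending order is described as the constructor's job.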