John W. Shipman, john@nmt.edu
Zoological Data Processing
507 Fitch Avenue NW
Socorro, NM 87801
(505) 835-0235
Homepage: http://www.nmt.edu/~shipman
The Christmas Bird Count (CBC) censuses, published in the periodicals Audubon Field Notes and American Birds, have been taken since 1900. As a source of long-term population data, they have few peers; however, their utility in printed form is limited.
This document describes a system for representing the data from these censuses in a general computer database form.
Most years' data (1st-62nd and 73rd-90th) were entered by Zoological Data Processing. Data for the 63rd-72nd CBCs were obtained from Carl Bock from old tapes produced by his project at the University of Colorado. Data for the 91st and succeeding CBCs were provided by the National Audubon Society from their publishing operation. There are some minor differences between these sources in the way data were captured; the record layout narratives below note the differences between the years.
This version is limited to the data from the U.S.A. and Canada (including the French islands of St.-Pierre and Miquelon). Inclusion of the data from other countries would require significant modifications to the method of encoding bird names.
The word circle refers to aspects of one count that are generally unchanged from year to year: the count area's latitude and longitude, its geographic name, and so forth. The term count refers to one year's census of that circle.
The term standard refers to those names and coordinates that are used in the computer database. The term as-published refers to the names and coordinates that appear in the published counts.
Actual computer file names, field names, and field contents (character strings as entered and stored) are shown in typewriter type.
The symbol ``_'' is used to represent a space character.
For proper interpretation of the six-letter codes used to describe types of birds, the CBC database depends on an infrastructure of files that describe all the possible bird types from the census file. This system is described in a separate document, `A system for representing taxonomic nomenclature'.
Before we discuss the layout of the database, it is relevant to bring up a sticky problem of organization that affects the design.
One of the most persistent problems in the design of this database is how to treat circles whose centers have shifted slightly one or more times. When the center moves only a minute or two of latitude or longitude, for the purposes of most researchers the censuses from the different centers might as well be treated as from the same location.
However, sometimes even a tiny shift can make a huge difference in the habitat coverage. For example, if a circle containing no open water is shifted so that it includes a reservoir, any conclusions based on sudden increases in waterfowl populations may be unwarranted.
For this reason, I think it would be best if the user of the database has a choice as to whether to lump overlapping circles or not.
An earlier version of this database used the concatenation of the latitude and longitude fields as the link that connected census data to circle and effort data. However, in practice this meant that a number of common operations (such as correcting center coordinates, or deciding whether or not to lump overlapping circles) required a lengthy pass through many megabytes of census data to change the link values.
The solution is to link the files through a single identifier, the countId field, rather than through the coordinates.
We need to identify each count, that is, each set of data for a single circle counted in a particular year. We call this identifier the countId.
The published counts have used two different schemes for identifying the counts within a year:
These identifiers alone are not sufficient to identify a single count. Count #1225 might be Modoc, CA in one year and Malibu, CA in another year.
So, we define an aggregate called the countId field which consists of the CBC number, left-zero-padded to three digits, followed by the circle's identifier.
In the older counts entered by Shipman (and the data from the 63rd-72nd CBCs converted from the Bock project), the countId has the form
yyynnnna
where yyy is the CBC (year) number (with left zero fill), nnnn is the sequence number (also with left zero fill), and the a column is available for letter suffixes (e.g., 127A), but is usually blank.
In data derived from the publisher of the modern counts (Clinchy Associates), the countId has the form
yyyssii_
where yyy is the CBC number, ss is the state or province code, ii is the identifying letter code of the circle, and the final column is blank.
Here are some examples of countId fields:
0760034_ 76th CBC, count #34.
0830947A 83rd CBC, count #947A.
104NMZU_ 104th CBC, circle ZU in state NM.
Note that the countId field can be used to sort circles into their published order (at least within each state).
The countId field, then, uniquely identifies one circle counted in one year. So this field is the key that relates the various files of the database.
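As a minimal sketch of how the two countId layouts might be told apart in code, the following Python function splits a countId into its parts. The names CountId and parse_count_id are illustrative and are not part of the database. The rule that an all-digit middle section indicates the older scheme is an assumption drawn from the examples above.

```python
from typing import NamedTuple, Optional


class CountId(NamedTuple):
    cbc_number: int          # yyy: CBC (year) number
    sequence: Optional[int]  # nnnn: sequence number (older scheme only)
    suffix: str              # a: letter suffix such as "A", or "" if blank
    region: Optional[str]    # ss: state/province code (newer scheme only)
    circle: Optional[str]    # ii: circle letter code (newer scheme only)


def parse_count_id(raw: str) -> CountId:
    """Split an 8-character countId into its parts.

    Accepts either a real space or the "_" placeholder for a blank.
    """
    text = raw.replace("_", " ").ljust(8)
    if len(text) != 8:
        raise ValueError(f"countId must be 8 characters: {raw!r}")
    cbc_number = int(text[:3])
    body = text[3:7]
    suffix = text[7].strip()
    if body.isdigit():
        # Assumed older scheme: yyynnnna
        return CountId(cbc_number, int(body), suffix, None, None)
    # Assumed newer scheme: yyyssii_
    return CountId(cbc_number, None, suffix, body[:2], body[2:])
```

Because the CBC number comes first and is zero-filled, a plain string sort on the raw countId values groups counts by year before any parsing is done.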
The std file serves as the pivot, relating census and effort records to a particular circle name and center. Since the census records are by far the biggest part of the database, omitting the lat-long of the circle center from that database means that we can change the effective location of census records by using a different std file.
We can, for example, have one version of the std file that does not lump circles, no matter how tiny the shift. This version would be good for waterfowl studies, where a tiny shift might drastically change the inclusion of waterfowl habitat. We could have another version that lumps all circles that overlap at least 10%, and another version that lumps all circles that overlap at least 50%. We can even set up special std files that lump all the counts in a particular bioregion.
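The effect of swapping std files can be sketched in a few lines of Python. Everything here is hypothetical: the tuples stand in for census records, the dictionaries stand in for two versions of the std file, and the circle keys and bird numbers are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical census rows: (countId, form code, number of birds).
census = [
    ("0760034 ", "mallar", 120),
    ("0770031 ", "mallar", 450),
]

# Two hypothetical std mappings from countId to a circle key.
# The first keeps shifted centers apart; the second lumps them.
std_unlumped = {"0760034 ": "circle-A", "0770031 ": "circle-B"}
std_lumped = {"0760034 ": "circle-A", "0770031 ": "circle-A"}


def totals_by_circle(census_rows, std):
    """Total birds per (circle, form), resolving circles through std."""
    totals = defaultdict(int)
    for count_id, form, number in census_rows:
        totals[(std[count_id], form)] += number
    return dict(totals)
```

The census rows are never modified; only the mapping passed as std changes, which is the property the design relies on.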
So, here are the six principal files of the CBC database and their relationships:
All the files produced for this project are ``flat files,'' meaning that each field has a fixed size. This form was chosen because it is easy to import into almost any database system.
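Reading a fixed-size record amounts to slicing each line at known column positions. The sketch below assumes a hypothetical three-field layout; the 8-character countId and the six-letter form code follow this document, but the 5-character width of the number field and the sample line are invented for illustration.

```python
def split_fixed(line, layout):
    """Cut one flat-file line into named fields.

    layout is a list of (field name, width) pairs in column order.
    """
    record = {}
    pos = 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


# Hypothetical layout, not the actual census record layout.
LAYOUT = [("countId", 8), ("form", 6), ("number", 5)]

row = split_fixed("0760034 mallar  120", LAYOUT)
```

Field values keep their padding, so numeric fields still need a conversion such as int(row["number"]).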
Every area that has ever been counted will have at least one corresponding circle record in file cir, defining its location. In those cases where the center has been moved one or more times for a distance of more than 1 minute of latitude or longitude, there may be multiple circle records for a count of a given name.
Only the ``standard'' coordinates and names are defined in this file. We have tried to use the most recent names and the most accurate map-checked coordinates whenever possible, but in many cases the coordinates are an estimate or an outright guess. We welcome any additional information that may help us in establishing the true locations of the counts; please forward any such information in writing to the author.
Here are the field sizes, field names, and the descriptions of their contents.
400909036 40° 9' N. Lat., 90° 36' W. Long.
(Rushville, IL)
512518042 51° 25' N. Lat., 179° 18' E. Long.
(Amchitka, AK)
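Judging from the two examples above, the nine digits appear to hold degrees and minutes of latitude (ddmm) followed by degrees and minutes of longitude (dddmm), with longitude measured westward so that values past 180 degrees stand for east longitudes. The following Python sketch decodes the field on that assumption; the function name parse_lat_long is illustrative.

```python
from typing import Tuple


def parse_lat_long(field: str) -> Tuple[float, float]:
    """Decode a 9-digit ddmmdddmm center into decimal degrees.

    Returns (latitude, longitude) with north and east positive.
    """
    if len(field) != 9 or not field.isdigit():
        raise ValueError(f"expected 9 digits, got {field!r}")
    lat = int(field[0:2]) + int(field[2:4]) / 60
    west = int(field[4:7]) + int(field[7:9]) / 60
    # Assumption: longitude is stored as degrees west, so a value
    # greater than 180 wraps around to an east longitude.
    lon = -west if west <= 180 else 360 - west
    return lat, lon
```

Under this reading, the Amchitka value 18042 (180 degrees 42 minutes west) decodes to 179 degrees 18 minutes east, matching the example.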
_ No salt water (blank)
o Open ocean included in circle
e Ocean estuary included in circle (no open ocean)
p Pelagic
_ Ordinary circle, 15-mile diameter (blank)
p Pelagic-only transect
x Odd-shaped and not pelagic-only
ab Alberta
bc British Columbia
mb Manitoba
nb New Brunswick
nf Newfoundland (including Labrador)
nt Northwest Territories
ns Nova Scotia
on Ontario
pe Prince Edward Island
pq Province of Québec
sk Saskatchewan
yt Yukon Territory
The codes are used from left to right as necessary, and the unused
fields are all blanks. For example, if the count is entirely within
one region, rr is used and ss and tt are
blank. If a circle falls within two states,
rr is the primary state code (this determines
which state's section of the listing includes that count),
ss is the secondary state code, and tt is blank.
Here are some examples of encoding the regs field:
mb____ Entirely within Manitoba.
ny____ Entirely within New York.
onny__ Listed with Ontario, also in New York.
iailmo Iowa, Illinois, Missouri (Keokuk).
fr____ St. Pierre et Miquelon Islands (France).
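A short Python sketch of decoding the regs field follows. The function name parse_regs is illustrative; it assumes the field is six characters wide, holding up to three two-letter codes, with blanks shown either as spaces or as the "_" placeholder.

```python
from typing import List


def parse_regs(regs: str) -> List[str]:
    """Return the region codes in a regs field, primary code first."""
    text = regs.replace("_", " ").ljust(6)[:6]
    codes = [text[i:i + 2].strip() for i in range(0, 6, 2)]
    # Unused positions are blank and are dropped.
    return [code for code in codes if code]
```

The first element of the result is the primary code, the one that determines which state's section of the listing includes the count.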
M.A. Management Area
N.M. National Monument
N.P. National Park
N.W.R. National Wildlife Refuge
P.P. Provincial Park
S.P. State Park
W.M.A. Wildlife Management Area
The std file stands between the circle file and the effort and census data, representing our current best guesses about the exact association between bird sightings and locality names and coordinates.
Fields are:
Each effort record corresponds to one year's counting of a circle. Fields:
Weather data is separated from effort data because it is recorded for relatively few years: just the Bock data (63rd-72nd CBCs).
bl Blue
pc Partly Cloudy
mc Moderately Cloudy
tc Totally Cloudy
pf Partly Foggy
mf Moderately Foggy
tf Totally Foggy
no No precipitation
ir Intermittent rain
lr Light rain
mr Moderate rain
hr Heavy rain
is Intermittent snow
ls Light snow
ms Moderate snow
hs Heavy snow
ic Intermittent combination (sleet, freezing rain, hail, snow/rain)
lc Light combination
mc Moderate combination
hc Heavy combination
o Open
mo Mostly open
po Partly open
f Frozen over
Each form of bird (species or not) mentioned in the body of a count results in one body (census) record. Furthermore, for records where multiple genders or age classes are mentioned, each different combination of gender and age is encoded in a separate record.
Warning! One implication of the above paragraph is that applications programs cannot expect there to be only one record for a given form within a given count. For example, there might be three Bald Eagle records within one count: one record each for adults, immatures, and birds of unknown age. Given all the different age and sex and other codes, there may be many! Therefore, applications programs that want to extract species totals must find all matching records, and then total them.
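The totaling step described in the warning can be sketched in Python as follows. The record tuples are hypothetical stand-ins for census records, and the bird numbers are invented; only the form codes mallar and yebloo come from this document.

```python
from collections import defaultdict

# Hypothetical census records: (countId, form, age, sex, number).
records = [
    ("0760034 ", "mallar", "a", "m", 12),
    ("0760034 ", "mallar", "a", "f", 9),
    ("0760034 ", "mallar", " ", " ", 30),
    ("0760034 ", "yebloo", " ", " ", 1),
]


def species_totals(rows):
    """Sum all age/sex records for each (countId, form) pair."""
    totals = defaultdict(int)
    for count_id, form, _age, _sex, number in rows:
        totals[(count_id, form)] += number
    return dict(totals)
```

This sketch ignores hybrids and alternative forms; an application would need to decide separately how to treat records that carry a second form code.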
Fields of the census record are:
x Hybrid of form and altform, e.g., Mallard x Common Pintail
/ Either form or altform, e.g., Hammond's/Dusky Flycatcher.
yebloo Yellow-billed Loon (standard species)
Gavia Gavia sp. (form not identified to species)
raptor raptor sp. (form not identified to species)
amewigxmallar Mallard x American Wigeon (hybrid)
dowwoo/haiwoo Downy / Hairy Woodpecker (two alternatives)
blugoo Blue Goose (subspecific identification)
resfli Red-shafted Flicker (subspecific identification)
For hybrids and pairs of alternatives, the convention is to
place in the form field the code that comes earlier in the
alphabet, and place the other code in the altform field.
This ensures that a given hybrid always has the same code
structure; without this rule, someone looking for Mallard x
American Wigeon hybrids would have to search for two code groups,
amewigxmallar and mallarxamewig.
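The ordering convention can be expressed as a small Python helper. The name canonical_pair is illustrative, and comparing the codes without regard to letter case is an assumption, since some codes (such as Gavia) are capitalized.

```python
def canonical_pair(code_a: str, code_b: str, relation: str) -> str:
    """Build the code group for a hybrid ("x") or alternative ("/").

    The code that sorts first alphabetically becomes the form and
    the other becomes the altform.
    """
    if relation not in ("x", "/"):
        raise ValueError(f"relation must be 'x' or '/': {relation!r}")
    form, altform = sorted((code_a, code_b), key=str.lower)
    return f"{form}{relation}{altform}"
```

Either argument order yields the same code group, so a search needs to look for only one string per hybrid or pair.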
_ Unknown (blank)
a Adult
i Immature
p Female/immature (symbolized by Greek phi)
_ Unknown
m Male
f Female
p Female/immature (phi)
In order to aid in tracking name and center changes (and typographical errors, some of which persist for many years), aspub tracks as closely as possible the actual published center coordinates and circle names used in each year. This file should be proofed carefully against the published circle accounts.
Exception: in cases where the published latitude and longitude have obviously been transposed, the aspub file need not track this typo. In many cases, the coordinates would not physically fit in the fields anyway.
Fields are:
AFB Air Force Base
Co. County
Cos. Counties
Ft. Fort
GMA Game Management Area
I. Island
Jct. Junction
L. Lake
MA Management Area
MBR Migratory Bird Refuge
Mt. Mount
Mtn. Mountain
Mts. Mountains
NF National Forest
NFRA National Forest Recreation Area
NG National Grasslands
NGP National Game Preserve
NHP National Historical Park
NL National Lakeshore
NM National Monument
NP National Park
NR National River
NRA National Recreational Area
NS National Seashore
NWR National Wildlife Refuge
NWSC Naval Weapons Support Center
PP Provincial Park
Pt. Point
R. River
RA Recreational Area
SF State Forest
SFWA State Fish & Wildlife Area
SGA State Game Area
SGP State Game Preserve
SGR State Game Reserve/Refuge
SP State Park
SRA State Recreational Area
SWR State Wildlife Refuge
Twp. Township
WA Wildlife Area
WMA Wildlife Management Area
WR Wildlife Refuge
WS Wildlife Sanctuary
In order that circle records will sort consistently by locality name,
abbreviations should not be used at the beginning of the name. For
example, ``Point Pelee,'' not ``Pt. Pelee;'' ``Mount Olive,'' not
``Mt. Olive.'' The exception to this rule is ``St.'' for
``Saint:'' ``St. Louis,'' not ``Saint Louis.'' Also, words that are
a significant part of a name should not be abbreviated, for example
``Salt Lake City'' instead of ``Salt L. City,'' and ``Rocky Mountain
NP,'' not ``Rocky Mtn. NP.''
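A data-entry check for the leading-abbreviation rule might look like the Python sketch below. The helper name and the choice of which abbreviations to expand are assumptions; the expansions themselves are taken from the abbreviation list above, and ``St.'' is deliberately left out so that it stays abbreviated.

```python
# Abbreviations that should be spelled out when they begin a name.
# "St." is intentionally absent: it remains abbreviated.
LEADING_EXPANSIONS = {
    "Pt.": "Point",
    "Mt.": "Mount",
    "Ft.": "Fort",
    "L.": "Lake",
}


def expand_leading_abbreviation(name: str) -> str:
    """Spell out an abbreviation at the start of a locality name."""
    first, sep, rest = name.partition(" ")
    if first in LEADING_EXPANSIONS:
        return LEADING_EXPANSIONS[first] + sep + rest
    return name
```

This sketch only handles the first word; abbreviations of significant words elsewhere in a name, such as ``Rocky Mtn. NP,'' would still need to be caught by proofreading.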