A regular expression that matches the scientific name on a standard forms line. It will match either “Genus species” or “Genus (Subgenus) species”.
We reuse the rank codes of two of these ranks (Section 6.10, “GENUS_CODE”, and Section 6.12, “SPECIES_CODE”)
as the regular expression group names. We can't use SUBGENUS_CODE as a group name because it contains a
hyphen, which the re package does not allow in group names; so we define a constant SUBGENUS_FIELD for that group.
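The restriction on group names can be checked directly. The following is a minimal sketch: the helper is_valid_group_name is hypothetical, and the value "sub-g" is only an assumed stand-in for a hyphenated code, since the actual value of SUBGENUS_CODE is defined elsewhere.

```python
import re

def is_valid_group_name(name):
    """Return True if the re package accepts `name` as a named-group name.

    Hypothetical helper for illustration only; it is not part of the
    module described here.
    """
    try:
        re.compile(r'(?P<%s>x)' % name)
    except re.error:
        # The re package rejects names that are not valid identifiers.
        return False
    return True

# "sub-g" is an assumed example of a hyphenated code, not the real value.
print(is_valid_group_name("sub-g"))
print(is_valid_group_name("sg"))
```

A hyphenated name is rejected when the pattern is compiled, which is why a separate hyphen-free constant is needed for the group.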
When extracting matched strings from a Match object M, keep in
mind that the subgenus group is optional, so that M.group(SUBGENUS_FIELD)
will return the value None if the subgenus is omitted.
SUBGENUS_FIELD = "sg"
GENUS_SPECIES_RE = re.compile(
    r'(?P<%s>'    # Start group GENUS_CODE
    r'[A-Z]'      # Matches a capital letter
    r'[a-z]+'     # Matches one or more lowercase letters
    r')'          # End group GENUS_CODE
    r'\s+'        # Matches one or more whitespace characters
    r'('          # Start optional group
    r'\('         # Matches '('
    r'(?P<%s>'    # Start group SUBGENUS_FIELD
    r'[A-Z]'      # Matches a capital letter
    r'[a-z]+'     # Matches one or more lowercase letters
    r')'          # End group SUBGENUS_FIELD
    r'\)'         # Matches ')'
    r'\s+'        # Matches one or more whitespace characters
    r')?'         # End optional group
    r'(?P<%s>'    # Start group SPECIES_CODE
    r'[a-z]+'     # Matches one or more lowercase letters
    r')'          # End group SPECIES_CODE
    % (GENUS_CODE, SUBGENUS_FIELD, SPECIES_CODE))
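As an illustration of how the optional subgenus group behaves, here is a minimal, self-contained sketch. The values "g" and "s" for GENUS_CODE and SPECIES_CODE are placeholders assumed for this example (the real values are defined in the sections cited above), and split_name is a hypothetical helper, not part of the module.

```python
import re

GENUS_CODE = "g"         # Assumed placeholder; the real value is defined elsewhere
SPECIES_CODE = "s"       # Assumed placeholder; the real value is defined elsewhere
SUBGENUS_FIELD = "sg"

# Same pattern as above, written compactly.
GENUS_SPECIES_RE = re.compile(
    r'(?P<%s>[A-Z][a-z]+)'               # Genus
    r'\s+'
    r'(\((?P<%s>[A-Z][a-z]+)\)\s+)?'     # Optional "(Subgenus) "
    r'(?P<%s>[a-z]+)'                    # Species
    % (GENUS_CODE, SUBGENUS_FIELD, SPECIES_CODE))

def split_name(text):
    """Return (genus, subgenus, species), or None if text does not match.

    The subgenus element is None when the name has no subgenus.
    """
    m = GENUS_SPECIES_RE.match(text)
    if m is None:
        return None
    return (m.group(GENUS_CODE),
            m.group(SUBGENUS_FIELD),
            m.group(SPECIES_CODE))

print(split_name("Turdus migratorius"))
print(split_name("Aedes (Stegomyia) aegypti"))
```

Callers should test the subgenus value against None before using it as a string.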