The node argument to this static method is
either a para node or some other element that
has the same content. The method returns a new Paragraph instance representing that content.
In general, the content can be any mixture of ordinary text
and text marked up with elements such as genus.
To conform to the representation discussed in Section 19, “class Paragraph: One paragraph of mixed
text”, we have to convert lxml's representation into a sequence of phrases,
where each phrase is “unmarked” (plain text) or
“marked-up”.
In the lxml model, any initial unmarked text
will be found in the .text attribute of the
given node. Any marked-up text in the paragraph
will be found in the .text attribute of the node's child elements. The
.tail attribute of each child element holds any
unmarked text that follows that element's closing tag.
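To make the .text and .tail layout concrete, here is a small sketch. It uses the standard library's xml.etree.ElementTree, which follows the same .text/.tail model as lxml, so that it runs without lxml installed. The sample paragraph is a made-up illustration, not taken from the application's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; element names follow the discussion above.
para = ET.fromstring(
    "<para>The wren <genus>Troglodytes</genus> is small.</para>")

# Initial unmarked text lives on the parent node itself.
print(repr(para.text))

# Each child carries its own marked-up text in .text and the
# unmarked text that follows it in .tail.
for child in para:
    print(child.tag, repr(child.text), repr(child.tail))
```

Here the parent's .text is the text before the first genus element, and the genus element's .tail is the text between its closing tag and the end of the paragraph.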
# - - - P a r a g r a p h . r e a d N o d e
    @staticmethod
    def readNode(node):
        """Convert para-content to a Paragraph instance.
        """
First, we'll create an empty Paragraph
instance. Then, if there is any initial .text,
we add it to that instance as an unmarked phrase.
        #-- 1 --
        # [ result := a new, empty Paragraph ]
        result = Paragraph()
For the method that adds one phrase, see Paragraph-addPhrase.
        #-- 2 --
        # [ if node.text is not None ->
        #     result := result with an unmarked phrase added
        #         containing (node.text)
        #   else -> I ]
        if node.text is not None:
            result.addPhrase(None, node.text)
Next, process the children (if any) in order. For
each child, add its .text as a marked phrase;
then, if there is any .tail text, add that as
an unmarked phrase.
        #-- 3 --
        # [ result := result with content added from children,
        #     if any ]
        for child in node:
            result.addPhrase(child.tag, child.text)
            if child.tail is not None:
                result.addPhrase(None, child.tail)
        #-- 4 --
        return result
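The following sketch shows the method at work on a small paragraph. The Paragraph class here is a hypothetical stand-in that only records phrases as (tag, text) tuples; the real class described in Section 19 may store them differently. As an assumption made for portability, the example parses with xml.etree.ElementTree rather than lxml; the .text and .tail attributes behave the same way in both.

```python
import xml.etree.ElementTree as ET


class Paragraph:
    """Hypothetical stand-in: keeps phrases as (tag, text) tuples."""

    def __init__(self):
        self.phrases = []

    def addPhrase(self, tag, text):
        # tag is None for unmarked text, else the markup element's name.
        self.phrases.append((tag, text))

    @staticmethod
    def readNode(node):
        """Convert para-content to a Paragraph instance."""
        result = Paragraph()
        if node.text is not None:
            result.addPhrase(None, node.text)
        for child in node:
            result.addPhrase(child.tag, child.text)
            if child.tail is not None:
                result.addPhrase(None, child.tail)
        return result


# Made-up sample input with two marked-up phrases.
node = ET.fromstring(
    "<para>See <genus>Corvus</genus> and <genus>Pica</genus>.</para>")
phrases = Paragraph.readNode(node).phrases
for tag, text in phrases:
    print(tag, repr(text))
```

Note that an empty child element such as `<genus/>` has a .text of None, so the method as written would pass None as the text of a marked phrase. Whether that is acceptable depends on how the real addPhrase treats a None text, which this sketch does not attempt to settle.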