Python has extensive features for handling strings of characters. There are two types:
- A str value is a string of zero or more
  8-bit characters. The common characters you see on
  North American keyboards all fit in 8-bit characters. The
  official name for this character set is ASCII, for American Standard Code for Information
  Interchange.
  This character set has one surprising property: all
  capital letters are considered less than all lowercase
  letters, so the string "Z" sorts before the
  string "a".
- A unicode value is a string of zero or
  more 32-bit Unicode characters. The Unicode character
  set covers just about every written language and almost
  every special character ever invented.
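The ordering property mentioned above is easy to check in a short script. The following is only a sketch: the list `words` is made-up sample data, and sorting on a lowercased key is one common way, not the only way, to get an ordering that ignores case. The parenthesized form of print is used so the script also runs on newer Python versions.

```python
# Hypothetical sample data, chosen only to illustrate the ordering.
words = ["apple", "Zebra", "mango", "Banana"]

# The default comparison uses character codes, so capitals come first.
by_code = sorted(words)

# Sorting on a lowercased key ignores the case of each word.
ignoring_case = sorted(words, key=lambda w: w.lower())

print(by_code)
print(ignoring_case)
```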
We'll mainly talk about working with str
values, but most unicode operations are
similar or identical, except that Unicode literals are
preceded with the letter u: for example,
"abc" is type str, but u"abc" is type unicode.
In Python, you can enclose string constants in either
single-quote ('...') or double-quote
("...") characters.
>>> cloneName = 'Clem'
>>> cloneName
'Clem'
>>> print cloneName
Clem
>>> fairName = "Future Fair"
>>> print fairName
Future Fair
>>> fairName
'Future Fair'
When you display a string value in conversational mode,
Python will usually use single-quote characters.
Internally, the values are the same regardless of which
kind of quotes you use. Note also that the print statement shows only the content of a
string, without any quotes around it.
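The same points can be checked in a script rather than in conversational mode. This is a minimal sketch; the variable names are invented for the example. It relies on the built-in functions repr(), which returns the quoted form that conversational mode displays, and str(), which returns the bare content that print shows.

```python
# Both quoting styles produce the same string value.
single = 'Future Fair'
double = "Future Fair"
same = (single == double)

# repr() gives the quoted display form; str() gives the bare content.
shown = repr(single)
printed = str(single)

# Using the other quote style lets a quote character appear unescaped.
phrase = "ain't"

print(same)
print(shown)
print(printed)
```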
To convert an integer (int type) value i
to its
string equivalent, use the function “str(i)”:
>>> str(-497)
'-497'
>>> str(000)
'0'
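A typical reason to convert an integer to a string is to build a message by concatenation, since a str and an int cannot be joined directly with “+”. The sketch below uses made-up values (`count` and `message` are illustrative names only).

```python
# Hypothetical values, used only for illustration.
count = 3

# str() turns the integer into text so it can be concatenated.
message = "Found " + str(count) + " clones"

# The minus sign is part of the resulting string.
negative = str(-497)
digits = len(negative)

print(message)
```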
The inverse operation, converting a string s back into an
integer, is written as “int(s)”:
>>> int("-497")
-497
>>> int("-0")
0
>>> int("012this ain't no number")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: invalid literal for int(): 012this ain't no number
The last example above shows what happens when you try to convert a string that isn't a valid number.
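In a script, you will usually want to catch that ValueError rather than let the program stop. The following is a sketch of one way to do so; `to_int_or_none` is a hypothetical helper written for this example, not a built-in function, and returning None for bad input is just one possible design choice.

```python
def to_int_or_none(text):
    """Return int(text), or None if text is not a valid integer.

    This is a hypothetical helper, not part of Python itself.
    """
    try:
        return int(text)
    except ValueError:
        return None

good = to_int_or_none("-497")
bad = to_int_or_none("012this ain't no number")

# str() and int() are inverses for any integer value.
round_trip = int(str(-497))

print(good)
print(bad)
print(round_trip)
```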
To convert a string s containing a number in base B, use the form “int(s, B)”:
>>> int('0F', 16)
15
>>> int("10101", 2)
21
>>> int("0177776", 8)
65534
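The same conversions can be written in a script. The sketch below repeats the values from the examples above and adds one assumption-free check: a digit that is not legal in the given base makes int() raise ValueError, just as non-numeric text does. The variable names are invented for the example.

```python
# Each call interprets the digits of the string in the given base.
from_hex = int("0F", 16)
from_binary = int("10101", 2)
from_octal = int("0177776", 8)

# The digit 9 does not exist in base 8, so this conversion fails.
try:
    int("19", 8)
    octal_nine_ok = True
except ValueError:
    octal_nine_ok = False

print(from_hex)
print(from_binary)
print(from_octal)
print(octal_nine_ok)
```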
To obtain the 8-bit integer code contained in a one-character
string s, use the
function “ord(s)”. The inverse function, which converts an
integer i to the
character that has code i, is “chr(i)”. The numeric values of each character are
defined by the ASCII character set.
>>> chr( 97 )
'a'
>>> ord("a")
97
>>> chr(65)
'A'
>>> ord('A')
65
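Because ord() and chr() are inverses, you can do arithmetic on character codes and convert the result back to a character. The sketch below illustrates this with `shift_letter`, a hypothetical helper invented for this example; it assumes its argument is a single lowercase letter from "a" to "z".

```python
def shift_letter(letter, offset):
    """Shift a lowercase letter by offset places, wrapping past 'z'.

    Hypothetical helper; assumes letter is one character in 'a'..'z'.
    """
    position = ord(letter) - ord("a")      # 0 for 'a', 25 for 'z'
    return chr(ord("a") + (position + offset) % 26)

# In ASCII, each lowercase letter is 32 above its capital.
case_gap = ord("a") - ord("A")

print(shift_letter("a", 1))
print(shift_letter("z", 1))
print(case_gap)
```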
In addition to the printable characters with codes in the
range from 32 to 126 inclusive, a Python string can
contain any of the other, unprintable special characters
as well. For example, the null
character, whose official name is NUL, is the character whose code is zero.
One way to write such a character is to use this form:
'\xNN'
where NN is
the character's code in hexadecimal (base 16) notation.
>>> chr(0)
'\x00'
>>> ord('\x00')
0
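A hexadecimal escape always stands for exactly one character, and it works for printable characters too. The short sketch below checks both points; the variable names are invented for the example.

```python
# "\x00" is a single character, the NUL character with code 0.
nul = "\x00"
code = ord(nul)

# Hexadecimal 41 is decimal 65, the code for capital A.
letter_a = "\x41"

# A NUL inside a string counts as one character like any other.
with_nul = "ab\x00cd"
length = len(with_nul)

print(code)
print(letter_a)
print(length)
```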
Another special character you may need to deal with is
the newline character, whose
official name is LF (for “line
feed”). Use the special escape
sequence “\n”
to produce this character.
>>> s = "Two-line\nstring."
>>> s
'Two-line\nstring.'
>>> print s
Two-line
string.
As you can see, when a newline character is displayed in
conversational mode, it appears as “\n”, but when you print it, the character
that follows it will appear on the next line. The code
for this character is 10:
>>> ord('\n')
10
>>> chr(10)
'\n'
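Since a newline is an ordinary character inside the string, you can count it, split on it, and join with it. The sketch below uses the standard string methods split(), count(), and join() on the same two-line string; the other variable names are invented for the example.

```python
s = "Two-line\nstring."

# The newline counts as exactly one character.
total_length = len(s)
newline_count = s.count("\n")

# Splitting on the newline gives the individual lines.
lines = s.split("\n")

# Joining the lines with a newline rebuilds the original string.
rejoined = "\n".join(lines)

print(lines)
print(rejoined == s)
```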
Python has several other escape sequences of this kind. The
term “escape sequence” refers to a
convention where a special character, the “escape
character”, changes the meaning of the characters
after it. Python's escape character is backslash (\).
| Input | Code | Name | Meaning |
|---|---|---|---|
| `\b` | 8 | BS | Backspace |
| `\t` | 9 | HT | Tab |
| `\"` | 34 | `"` | Double quote |
| `\'` | 39 | `'` | Single quote |
| `\\` | 92 | `\` | Backslash |
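The codes in the table can be verified with ord(). The following sketch builds a dictionary (the name `escapes` is invented for this example) that maps each escape sequence from the table to its expected code, then checks every entry.

```python
# Each key is a one-character string written with an escape sequence.
escapes = {
    "\b": 8,    # backspace
    "\t": 9,    # tab
    "\"": 34,   # double quote
    "\'": 39,   # single quote
    "\\": 92,   # backslash
}

# Every escape sequence stands for a single character.
all_single = all(len(ch) == 1 for ch in escapes)

# The code of each character matches the table.
all_match = all(ord(ch) == code for ch, code in escapes.items())

print(all_single)
print(all_match)
```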
There is another handy way to get a string that contains newline characters: enclose the string in triple quotes, that is, three single-quote or three double-quote characters at each end.
>>> multi = """This string
...  contains three
...  lines."""
>>> multi
'This string\n contains three\n lines.'
>>> print multi
This string
 contains three
 lines.
>>> s2 = '''
... xyz
... '''
>>> s2
'\nxyz\n'
>>> print s2

xyz

>>>
Notice that in Python's conversational mode, when you
press Enter at the end of a line, and
Python knows that your line is not finished, it displays
a “...” prompt instead of
the usual “>>>”
prompt.
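In a script file there are no prompts, but triple-quoted strings work the same way: each line break inside the quotes becomes one newline character. The sketch below compares a triple-quoted string with the equivalent string written using “\n”; the variable names `explicit`, `same`, and `line_count` are invented for the example.

```python
multi = """This string
contains three
lines."""

# The same value written on one line with explicit newline escapes.
explicit = "This string\ncontains three\nlines."
same = (multi == explicit)
line_count = len(multi.split("\n"))

# A line break right after the opening quotes is part of the string.
s2 = '''
xyz
'''

print(same)
print(line_count)
print(repr(s2))
```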