STRINGS
STRING OBJECTS
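A string object is a sequence of 8-bit characters (bytes with values 0..255),
written between single or double quotes: 'abc' or "abc". A Unicode string is a
sequence of Unicode characters (two or four bytes each, depending on how Python
was built), written with a u prefix: u'abc' or u"abc".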
Quotes of the other kind may occur within a string:
"This is a 'string'. " or 'this is a "string".'
Like other sequence objects, strings can be indexed, sliced and concatenated.
The membership operator in also works for strings; it tests whether a substring occurs.
Python does not know the type "character"; a character is a string with
one element.
>>> s="this is a string"
>>> t='of words'
>>> s+' '+t
'this is a string of words'
>>> len(s)
16
>>> s[:4]
'this'
>>> s[-6:]
'string'
>>> 'string' in s
True
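Since a character is just a string of length one, indexing a string returns another string. A small sketch (the variable names are only illustrative; print() is called with one argument so the code runs under Python 2.7 as well as Python 3):

```python
s = 'this is a string'

first = s[0]               # indexing gives a string of length one
print(repr(first))         # 't'
print(type(first) is str)  # True: there is no separate character type
print(len(first))          # 1

# 'in' looks for substrings, not only for single characters
print('is a' in s)         # True
print('x' in s)            # False
```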
Printable characters (1-byte code)
Characters with decimal codes 32-126 are the printable ASCII characters. Python strings use
an 8-bit code; codes 128-255 are not part of ASCII and are used for special signs and accented
characters. Python just handles bits and bytes; your machine handles display and printing. Don't blame
Python if your characters 128-255 don't appear on the screen or on the printer as you expect
them to. To see how they look on your screen, execute the following Python statements:
>>> for i in range(32,256,8):
...     for j in range(8): print '%5d %2s' % (i+j, chr(i+j)),
...     print
...
It is most likely (at least on Windows) that the character set is interpreted according to the
Windows-1252 code page.
Here are a few special symbols of this code:
chr(128): €   chr(163): £   chr(165): ¥   chr(177): ±
If you need to be sure how non-ASCII characters are represented, use Unicode strings (not covered here).
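To see that the meaning of a byte depends on the code page, the sketch below decodes the four codes listed above with the standard Python codecs 'cp1252' (Windows-1252) and 'latin-1' (ISO 8859-1). It assumes these two code pages are the ones of interest; the decoded values are Unicode strings, printed with %r to avoid console encoding problems.

```python
# Decode single byte values with two different 8-bit code pages to show
# that the same byte can stand for different characters.
codes = [128, 163, 165, 177]

cp1252 = [bytearray([c]).decode('cp1252') for c in codes]
latin1 = [bytearray([c]).decode('latin-1') for c in codes]

for c, a, b in zip(codes, cp1252, latin1):
    # repr() keeps the output ASCII, whatever the console supports
    print('%3d  cp1252=%r  latin-1=%r' % (c, a, b))
```

Only code 128 differs: Windows-1252 puts the euro sign there, while in Latin-1 it is a control character. For codes 160-255 the two code pages agree.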
Non-printable characters
Non-printable and some printable characters can be specified by an
escape sequence, starting with a '\' (backslash). The printable backslash
itself is produced by '\\'.
name                 | symb | escape seq. | decimal | hexadec | octal
alert                | BEL  | \a          | 7       | \x07    | \007
backspace            | BS   | \b          | 8       | \x08    | \010
form feed            | FF   | \f          | 12      | \x0C    | \014
line feed, new line  | LF   | \n          | 10      | \x0A    | \012
carriage return      | CR   | \r          | 13      | \x0D    | \015
horizontal tab       | TAB  | \t          | 9       | \x09    | \011
vertical tab         | VT   | \v          | 11      | \x0B    | \013
single quote         | '    | \'          | 39      | \x27    | \047
double quote         | "    | \"          | 34      | \x22    | \042
backslash            | \    | \\          | 92      | \x5C    | \134
any character        |      | \ooo        | ...     | \x..    | \...
any character        |      | \xhh        | ...     | \x..    | \...
Here \ooo stands for a backslash followed by up to three octal digits, and \xhh
for \x followed by two hexadecimal digits.
STRING FORMATTING
String formatting is described in the Python Library Reference section 2.3.6.2.
Here is a short summary of the most useful features, with examples.
String formatting is used mainly in print operations, to print values such as
integers, floats and strings in a desired format. This is done with the
format operator % in the form
format % value(s)
format is a string containing one or more conversion specifiers;
value(s) is a single value or a tuple of values to be converted
to strings. The number of conversion specifiers in format must equal
the number of items in value(s).
Examples
>>> p=29.90
>>> print 'The price of one item is $%7.2f' % p
The price of one item is $  29.90
>>> n=5
>>> print 'The price of %2d items is $%7.2f' % (n,p*n)
The price of  5 items is $ 149.50
The result of the % operator is an ordinary string, so it can also be stored
and printed later (the print statement converts its values with the built-in
function str, which returns a string unchanged):
>>> s='The price of one item is $%7.2f' % p
>>> print s
The price of one item is $  29.90
In these examples %7.2f and %2d are conversion specifiers.
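The sketch below repeats the two examples as plain expressions, so the formatted strings can be inspected, and shows what happens when the number of values does not match the number of specifiers. format_or_error is a hypothetical helper written only for this illustration.

```python
p = 29.90
n = 5

one = 'The price of one item is $%7.2f' % p              # single value
many = 'The price of %2d items is $%7.2f' % (n, p * n)   # tuple of values
print(repr(one))
print(repr(many))

def format_or_error(fmt, values):
    """Hypothetical helper: format, or report the TypeError on a mismatch."""
    try:
        return fmt % values
    except TypeError as err:
        return 'TypeError: %s' % err

# Two specifiers but only one value in the tuple
print(format_or_error('%d items cost $%.2f', (n,)))
```

A tuple that should be formatted as one value has to be wrapped in a one-element tuple: '%s' % ((1, 2),) gives '(1, 2)'.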
Formatting numbers and strings
The most used form of a conversion specifier is
%[w][.p]t where
w = minimum field width (a longer result is not truncated)
p = precision: nr of digits after the decimal point (e, E, f, F),
    nr of significant digits (g, G) or maximum nr of characters (s)
t = conversion type:
d,i: signed integer, decimal
e,E: floating point, exponential format
f,F: floating point, decimal format
g,G: as e,E if exponent < -4 or exponent >= precision, else as f,F
s: string (converted with str())
c: single character (an integer or a one-character string)
o: integer, octal
u: same as d (obsolete)
x,X: integer, hexadecimal (lowercase or uppercase digits)
The items w and .p are optional; when they are omitted, defaults are used
(no minimum width; precision 6 for e, f and g). Thus %d, %e, %g and %s are also valid specifications.
Special case: %% does not format any value but produces a single %.
Examples (note that a field width that is too small is simply exceeded; the value is never truncated):
>>> x = 123456789
>>> print 'x = %d, x = %5d, x = %12d' % (x,x,x)
x = 123456789, x = 123456789, x =    123456789
>>> y=1.23456
>>> print 'y = %e, y = %f, y = %g' % (y,y,y)
y = 1.234560e+00, y = 1.234560, y = 1.23456
>>> print 'y = %10.3e, y = %10.3f, y = %10.3g' % (y,y,y)
y =  1.235e+00, y =      1.235, y =       1.23
(Older Windows versions of Python print a three-digit exponent, e.g. 1.234560e+000.)
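The same rules can be checked without print. The sketch below also exercises the precision of %s and the switch of %g to exponential notation; it assumes the two-digit exponents printed by current Python versions.

```python
x = 123456789
y = 1.23456

# Width is a minimum: %5d is exceeded, %12d pads on the left
widths = '[%5d] [%12d]' % (x, x)
# Precision: digits after the point (f), significant digits (g), max characters (s)
precisions = '[%10.3f] [%10.3g] [%.3s]' % (y, y, 'abcdef')
# Without a precision, e, f and g use 6
defaults = '[%f] [%g]' % (y, y)
# %g uses exponential notation when exponent < -4 or exponent >= precision
g_switch = '%g %g %g' % (1234567.0, 0.0001234, 0.00001234)

for line in (widths, precisions, defaults, g_switch):
    print(line)
```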
Formatting using a mapping (dictionary)
format % dictionary
The conversion specifier is %(k)[w][.p]t where k
is the mapping key, written without quotes. A dictionary
is a comma-separated list of key: value pairs within curly braces
(see Python Library Reference Manual section 2.3.8).
Examples:
>>> print 'Dear Mr %(who)s, your payment of $%(amt)6.2f is now due.' \
...       % {'who': 'Johnson', 'amt': 29.90}
Dear Mr Johnson, your payment of $ 29.90 is now due.
>>> phones={'John':2358, 'Jack':5731, 'Jill':5329}
>>> print 'The telephone nr of Jack is %(Jack)4d' % phones
The telephone nr of Jack is 5731
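Because each specifier names its own key, keys may appear in any order or more than once, and dictionary entries that are not used are ignored. A sketch using the telephone dictionary from the example (named phones here; lookup is a hypothetical helper):

```python
phones = {'John': 2358, 'Jack': 5731, 'Jill': 5329}

# Order and repetition are free; 'John' is simply not used
listing = 'Jill: %(Jill)d, Jack: %(Jack)d, Jill again: %(Jill)d' % phones

def lookup(name):
    """Hypothetical helper: format one entry, or return None if the key is missing."""
    fmt = '%%(%s)4d' % name   # %% gives a literal %, so this builds e.g. '%(Jack)4d'
    try:
        return fmt % phones
    except KeyError:          # a missing mapping key raises KeyError
        return None

print(listing)
print(lookup('Jack'))
print(lookup('Joe'))
```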
STRING METHODS
S is any string. Each entry shows the method call on S and the type of its result.
S.capitalize() -> string
Returns a copy of the string S with only its first character
capitalized.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.capitalize()
'A line with 6 random capitals'
See also: islower, istitle, isupper, lower, swapcase, title, upper
S.center(width[, fillchar]) -> string
Returns S centered in a string of length width. Padding is
done using the specified fill character (default is a space).
Example:
>>> s='beware'
>>> s.center(len(s)+6,'!')
'!!!beware!!!'
See also: ljust, rjust
S.count(sub[, start[, end]]) -> int
Returns the number of occurrences of substring sub in string
S[start:end]. Optional arguments start and end are
interpreted as in slice notation.
Example:
>>> s='abacadabra'
>>> s.count('a')
5
S.decode([encoding[,errors]]) -> object
Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that is
able to handle UnicodeDecodeErrors.
See the Python documentation of module codecs.
S.encode([encoding[,errors]]) -> object
Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that is able to handle UnicodeEncodeErrors.
See the Python documentation of module codecs.
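A round trip through UTF-8 illustrates both methods and the error schemes named above. The sketch assumes a Unicode string as the starting point and uses the standard codec names 'utf-8' and 'ascii'; it runs under Python 2.7 and Python 3 (where the encoded result has type bytes instead of str).

```python
text = u'caf\xe9'                        # Unicode string: 'cafe' with an accented e

utf8 = text.encode('utf-8')              # 8-bit string b'caf\xc3\xa9'
back = utf8.decode('utf-8')              # decoding restores the Unicode text

# Error schemes for characters that ASCII cannot represent
ascii_replaced = text.encode('ascii', 'replace')  # unencodable character -> '?'
ascii_ignored = utf8.decode('ascii', 'ignore')    # undecodable bytes are dropped

try:
    text.encode('ascii')                 # default scheme 'strict'
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```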
S.endswith(suffix[, start[, end]]) -> bool
Returns True if S ends with the specified suffix, False otherwise.
With optional start, test S beginning at that position.
With optional end, stop comparing S at that position.
Example:
>>> line='some text \n'
>>> line.endswith('\x0A')
True
S.expandtabs([tabsize]) -> string
Returns a copy of S where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
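Each tab advances to the next column that is a multiple of the tab size, so the number of spaces per tab varies. A small sketch with an illustrative string:

```python
row = 'a\tbb\tccc'

default = row.expandtabs()    # tab stops every 8 columns
narrow = row.expandtabs(4)    # tab stops every 4 columns
print(repr(default))
print(repr(narrow))
```

With tab size 4, 'a' is followed by three spaces (to column 4) and 'bb' by two spaces (to column 8).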
S.find(sub [,start [,end]]) -> int
Returns the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Example: (see index example first)
>>> s='Print <all> marked <items> in this <sentence>'
>>> p1=s.find('<')
>>> while p1>=0:
...     p2=s.find('>', p1)
...     print s[p1+1 : p2]
...     p1=s.find('<', p2)
...
all
items
sentence
See also: index, rfind, rindex
S.index(sub [,start [,end]]) -> int
Like S.find() but raises ValueError when the substring is not found.
Example:
>>> s='Print the <first> marked item in this <sentence>'
>>> print s[s.index('<')+1 : s.index('>')]
first
See also: find, rfind, rindex
S.isalnum() -> bool
Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.
Example:
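A quick check on a few sample strings (chosen for illustration) shows that a space, an underscore or an empty string makes isalnum() return False:

```python
samples = ['abc123', 'abc 123', 'x_1', '']
results = [text.isalnum() for text in samples]
print(results)   # [True, False, False, False]
```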
S.isalpha() -> bool
Return True if all characters in S are alphabetic
and there is at least one character in S, False otherwise.
Example:
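Digits and spaces are not alphabetic, and the empty string fails because it has no characters at all. Sample strings are illustrative:

```python
samples = ['Python', 'Python2', 'two words', '']
results = [text.isalpha() for text in samples]
print(results)   # [True, False, False, False]
```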
S.isdigit() -> bool
Return True if all characters in S are digits
and there is at least one character in S, False otherwise.
Example:
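isdigit() accepts only digit characters, so a sign or a decimal point makes it return False. The sketch uses it as a guard before int(); to_int_or_none is a hypothetical helper, not a built-in.

```python
samples = ['2005', '-5', '3.14', '']
results = [text.isdigit() for text in samples]
print(results)   # [True, False, False, False]

def to_int_or_none(text):
    """Hypothetical helper: convert only strings made of digits."""
    # '-5' and '3.14' are rejected because '-' and '.' are not digits
    return int(text) if text.isdigit() else None
```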
S.islower() -> bool
Return True if all cased characters in S are lowercase and there is
at least one cased character in S, False otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.islower()
False
See also: capitalize, istitle, isupper, lower, swapcase, title, upper
S.isspace() -> bool
Returns True if all characters in S are whitespace
and there is at least one character in S, False otherwise.
Example:
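(print decimal codes of all whitespace characters)
>>> for i in range(256):
...     if chr(i).isspace(): print '%4d' % (i),
...
   9   10   11   12   13   32  160
(Whether 160, the no-break space in Windows-1252 and Latin-1, is reported depends on the
platform and locale; otherwise the list ends at 32.)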
S.istitle() -> bool
Returns True if S is a titlecased string and there is at least one
character in S, i.e. uppercase characters may only follow uncased
characters and lowercase characters only cased ones. Return False
otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.istitle()
False
See also: capitalize, islower, isupper, lower, swapcase, title, upper
S.isupper() -> bool
Returns True if all cased characters in S are uppercase and there is
at least one cased character in S, False otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.isupper()
False
See also: capitalize, islower, istitle, lower, swapcase, title, upper
S.join(sequence) -> string
Returns a string which is the concatenation of the strings in the
sequence. The separator between elements is S.
Example:
>>> ', '.join(['jack', 'john', 'mary'])
'jack, john, mary'
S.ljust(width[, fillchar]) -> string
Returns S left justified in a string of length width. Padding is
done using the specified fill character (default is a space).
Example:
>>> 'left'.ljust(10,'-')
'left------'
See also: center, rjust
S.lower() -> string
Returns a copy of the string S converted to lowercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.lower()
'a line with 6 random capitals'
See also: capitalize, islower, istitle, isupper, swapcase, title, upper
S.lstrip([chars]) -> string or unicode
Returns a copy of the string S with leading whitespace removed.
If chars is given and not None, remove leading characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping.
Example:
>>> ' spacious '.lstrip()
'spacious '
>>> 'www.example.com'.lstrip('cmowz.')
'example.com'
S.replace(old, new[, count]) -> string
Returns a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
Example:
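The sketch below (reusing the sample word from the count example) shows the effect of count and of an empty replacement, and that the original string is left unchanged:

```python
s = 'abacadabra'

all_replaced = s.replace('a', 'A')   # every occurrence
first_two = s.replace('a', 'A', 2)   # only the first 2 occurrences
removed = s.replace('ab', '')        # an empty replacement deletes the substring

print(all_replaced)   # AbAcAdAbrA
print(first_two)      # AbAcadabra
print(removed)        # acadra
print(s)              # abacadabra: the original string is unchanged
```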
S.rfind(sub [,start [,end]]) -> int
Return the highest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
Example:
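rfind() is handy for splitting off the last component of a string. The path below is made up for illustration:

```python
path = 'docs/python/strings.txt'   # made-up example path

dot = path.rfind('.')         # 19: last '.', start of the extension
slash = path.rfind('/')       # 11: last '/', end of the directory part
first_slash = path.find('/')  # 4: find() searches from the left instead
name = path[slash + 1:dot]    # 'strings'
missing = path.rfind('\\')    # -1: there is no backslash in the path
```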
See also: find, index, rindex
S.rindex(sub [,start [,end]]) -> int
Like S.rfind() but raise ValueError when the substring is not found.
Example:
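The difference with rfind() is the failure case. In the sketch below, rindex_or_none is a hypothetical helper that turns the ValueError back into a return value:

```python
s = 'abacadabra'

last_a = s.rindex('a')          # 9
last_b = s.rindex('b', 0, 7)    # 1: only s[0:7] == 'abacada' is searched

def rindex_or_none(text, sub):
    """Hypothetical helper: like rfind(), but None instead of -1."""
    try:
        return text.rindex(sub)
    except ValueError:          # rindex() raises instead of returning -1
        return None
```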
See also: find, index, rfind
S.rjust(width[, fillchar]) -> string
Return S right justified in a string of length width. Padding is
done using the specified fill character (default is a space).
Example:
>>> '-->'.rjust(10)
'       -->'
See also: center, ljust
S.rsplit([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the
delimiter string, starting at the end of the string and working
to the front. If maxsplit is given, at most maxsplit splits are
done. If sep is not specified or is None, any whitespace string
is a separator.
Example:
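rsplit() only differs from split() when maxsplit is given: it then splits from the right. A sketch with an illustrative path:

```python
path = 'usr/local/lib/python'

right = path.rsplit('/', 1)   # ['usr/local/lib', 'python']
left = path.split('/', 1)     # ['usr', 'local/lib/python']

# Without maxsplit both methods give the same words
same = 'a b  c'.rsplit() == 'a b  c'.split()   # True
```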
S.rstrip([chars]) -> string or unicode
Return a copy of the string S with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping.
Example:
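The sketch below reuses the line from the endswith example. Note that chars is treated as a set of characters, not as a suffix:

```python
line = 'some text \n'

clean = line.rstrip()            # whitespace (space and newline) removed
no_newline = line.rstrip('\n')   # only '\n' removed, the space stays
trimmed = '12.500'.rstrip('0')   # trailing zeros removed

# Any trailing mix of 'i' and 'p' is removed, not just the string 'ip'
word = 'mississippi'.rstrip('ip')
```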
S.split([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator.
Example: (converts numbers in a string to a NumPy array)
>>> from numpy import array
>>> s='12 23 35 67'
>>> x=s.split()
>>> array(map(int,x))
array([12, 23, 35, 67])
S.splitlines([keepends]) -> list of strings
Return a list of the lines in S, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends
is given and true.
Example:
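splitlines() recognizes '\r\n' as one line boundary, which split('\n') does not. A sketch with an illustrative text:

```python
text = 'first line\nsecond line\r\nthird line'

lines = text.splitlines()        # line breaks removed
kept = text.splitlines(True)     # keepends true: line breaks kept
by_split = text.split('\n')      # leaves the '\r' at the end of line 2
```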
S.startswith(prefix[, start[, end]]) -> bool
Return True if S starts with the specified prefix, False otherwise.
With optional start, test S beginning at that position.
With optional end, stop comparing S at that position.
Example:
>>> line=' S.startswith....'
>>> line.lstrip().startswith('S.')
True
S.strip([chars]) -> string or unicode
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping.
Example:
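strip() works on both ends only; whitespace inside the string is kept. Sample values are illustrative:

```python
field = '   42 \t\n'
value = field.strip()                     # '42'
heading = '*** Strings ***'.strip('* ')   # 'Strings': stars and spaces removed at both ends
inner = '  a b  '.strip()                 # 'a b': the inner space is kept
```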
S.swapcase() -> string
Return a copy of the string S with uppercase characters
converted to lowercase and vice versa.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.swapcase()
'A LInE wITH 6 RaNDOM CApItaLs'
See also: capitalize, islower, istitle, isupper, lower, title, upper
S.title() -> string
Return a titlecased version of S, i.e. words start with uppercase
characters, all remaining cased characters have lowercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.title()
'A Line With 6 Random Capitals'
See also: capitalize, islower, istitle, isupper, lower, swapcase, upper
S.translate(table [,deletechars]) -> string
Return a copy of the string S, where all characters occurring
in the optional argument deletechars are removed, and the
remaining characters have been mapped through the given
translation table, which must be a string of length 256.
Example:
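In Python 2 the 256-character table is normally built with string.maketrans. Python 3 removed that function in favour of str.maketrans (which returns a dictionary) and dropped the deletechars argument of translate. The sketch below picks whichever exists so that it runs under both versions; deleting characters is left out because the two versions do it differently.

```python
try:
    from string import maketrans   # Python 2: builds the 256-character table
except ImportError:                # Python 3: string.maketrans no longer exists
    maketrans = str.maketrans

table = maketrans('lo', '01')      # map 'l' -> '0' and 'o' -> '1'
coded = 'hello world'.translate(table)
print(coded)                       # he001 w1r0d
```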
S.upper() -> string
Return a copy of the string S converted to uppercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.upper()
'A LINE WITH 6 RANDOM CAPITALS'
See also: capitalize, islower, istitle, isupper, lower, swapcase, title
S.zfill(width) -> string
Pad a numeric string S with zeros on the left, to fill a field
of the specified width. The string S is never truncated.
Example:
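zfill() keeps a leading sign in front of the zeros and never shortens a string. Sample values are illustrative:

```python
padded = '42'.zfill(5)         # '00042'
signed = '-42'.zfill(5)        # '-0042': the sign stays in front of the zeros
too_long = '123456'.zfill(3)   # '123456': zfill never truncates
```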
STRING FUNCTIONS
Here are the built-in Python functions relevant for strings. Most of these functions
act on other objects as well (see Python Library Reference section 2.1).
chr(i)
Returns a string of one character whose 8-bit code is the integer i (0..255).
This is the inverse of ord().
>>> print chr(65), chr(97), chr(193), chr(225)
A a Á á
See also: ord, unichr
cmp(s1,s2)
Compares the two strings s1 and s2 and returns the integer -1 if s1 < s2,
0 if s1 == s2 and +1 if s1 > s2. The order is the same as used for sorting.
>>> cmp('Jack', 'Jill')
-1
See also: max, min
dir(s)
Returns a list of the valid attributes of the string object s, including its methods.
eval(s)
The string s is evaluated as a Python expression.
>>> x=1
>>> eval('x+1')
2
exec(s)
The string s is executed as Python code (one or more statements). In Python 2,
exec is a statement; the parenthesized form below also works.
>>> x=1
>>> exec('y=x+1')
>>> y
2
float(s)
Converts a string (or a number) to floating point. The string argument must
contain a possibly signed decimal or floating point number, possibly embedded in whitespace.
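>>> float('0.00000025')
2.4999999999999999e-007
(The displayed value shows that 0.00000025 has no exact binary floating-point
representation; Python 2.7 displays the shortest repr of the same value, 2.5e-07.)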
See also: int, long
int(s)
Converts a string (or number) to a plain integer. The string argument must
contain a possibly signed decimal number representable as a Python integer,
possibly embedded in whitespace. If needed, a long integer is generated.
>>> int(' -125')
-125
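>>> int('123456789123456789')
123456789123456789L
(This output is from a platform with 32-bit plain integers; where plain integers
have 64 bits, the value fits and no L is shown.)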
See also: float, long
len(s)
Returns the length (nr of characters) of the string s.
long(s)
Converts a string (or number) to a long integer. The string argument must
contain a possibly signed number of arbitrary size, possibly embedded in whitespace.
>>> long(' -125')
-125L
See also: float, int
max(s [, args])
With a single argument s, returns the 'largest' character c, i.e., the largest ord(c),
in the string. With more than one argument, returns the largest of the arguments.
>>> max('abacadabra')
'r'
>>> max('abacadabra', 'nonsense')
'nonsense'
See also: cmp, min
min(s [, args])
With a single argument s, returns the 'smallest' character c, i.e., the smallest ord(c),
in the string. With more than one argument, returns the smallest of the arguments.
See also: cmp, max
ord(c)
Given a string of length one, returns an integer [0..255]: the value of the byte
when the argument is an 8-bit string. This is the inverse of chr().
For unicode objects an integer [0..65535] representing the Unicode code point
of the character is returned. This is the inverse of unichr().
>>> print ord('a'), ord('á')
97 225
>>> print ord(u'a'), ord(u'\u2020')
97 8224
See also: chr, unichr
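Because chr() and ord() are inverses, they can be combined for simple arithmetic on character codes. The sketch checks the round trip and builds a Caesar shift; shift is a hypothetical helper that assumes its input contains only lowercase letters a-z.

```python
# chr() and ord() are inverses for every 8-bit code
round_trip = all(ord(chr(i)) == i for i in range(256))

def shift(word, n):
    """Hypothetical helper: rotate lowercase letters a-z by n positions."""
    base = ord('a')
    return ''.join(chr((ord(c) - base + n) % 26 + base) for c in word)

print(round_trip)        # True
print(shift('abc', 1))   # bcd
print(shift('xyz', 3))   # abc
```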
raw_input(prompt)
If the prompt argument is present, it is written to standard output without
a trailing newline. The function then reads a line from input, converts it
to a string (stripping a trailing newline), and returns that.
>>> s = raw_input('-->')
-->Monty Python's Flying Circus
>>> s
"Monty Python's Flying Circus"
repr(object)
Returns a string containing a printable representation of an object.
This is the same value yielded by conversions with reverse quotes (`object`). It is
sometimes useful to be able to access this operation as an ordinary function.
For many types, this function makes an attempt to return a string that
would yield an object with the same value when passed to eval().
>>> print repr(25), repr(1.e4)
25 10000.0
>>> from numpy import array
>>> print repr([1,2,3]), repr(array([1,2,3]))
[1, 2, 3] array([1, 2, 3])
See also: str
sorted(s)
Returns a new sorted list of the items in the iterable s. The optional arguments
cmp, key, and reverse have the same meaning as those for the list.sort() method.
>>> s='aA1'
>>> sorted(s)
['1', 'A', 'a']
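The key and reverse arguments change the order; with key=str.lower the comparison ignores case. A sketch with an illustrative string:

```python
s = 'cBa'

plain = sorted(s)                            # by character code: uppercase before lowercase
folded = sorted(s, key=str.lower)            # case-insensitive order
backwards = ''.join(sorted(s, reverse=True)) # sorted list joined back into a string

print(plain)      # ['B', 'a', 'c']
print(folded)     # ['a', 'B', 'c']
print(backwards)  # caB
```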
str(object)
Returns a string containing a nicely printable representation of an object.
For strings, this returns the string itself. The difference with repr(object)
is that str(object) does not always attempt to return a string that is acceptable
to eval(); its goal is to return a printable string. If no argument is given,
returns the empty string, ''.
>>> print str(25), str(1.e4)
25 10000.0
>>> from numpy import array
>>> print str([1,2,3]), str(array([1,2,3]))
[1, 2, 3] [1 2 3]
See also: repr
unichr(i)
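Returns a Unicode string of one character whose Unicode code point is the integer i
(0..65535; up to 1114111 on wide Unicode builds). This is the inverse of ord() for
Unicode strings.
See also: chr, ord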