
STRINGS


STRING OBJECTS

A string (object) is a sequence of characters (bytes: 0:255), between single or double quotes: 'abc' or "abc", or of unicode characters (two bytes) u'abc' or u"abc". Quotes of the other kind may occur within a string:
  "This is a 'string'. " or 'this is a "string".'
Strings can be sliced and concatenated like other sequence objects. The membership operator in also works for strings.
Python does not know the type "character"; a character is a string with one element.

>>> s="this is a string"
>>> t='of words'
>>> s+' '+t
'this is a string of words'
>>> len(s)
16
>>> s[:4]
'this'
>>> s[-6:]
'string'
>>> 'string' in s
True

Printable characters (1-byte code)

Characters decimal 32-126 are the printable ASCII code characters. Python uses an 8-bit code; characters decimal 128-255 are special signs and accented characters. Python just handles bits and bytes; your machine handles display and printing. Don't blame Python if your characters 128-255 don't appear on the screen or on the printer as you expect them to appear. To see how they look on your screen, execute the following Python statements:

>>> for i in range(32,256,8):
        for j in range(8): print '%5d %2s' % (i+j, chr(i+j)),
        print

It is most likely (at least on Windows) that the character set is interpreted according to the Windows-1252 code.
Here are a few special symbols of this code:
chr(128): €    chr(163): £    chr(165): ¥    chr(177): ±

If you want to be sure of non-ASCII characters, use Unicode (not specified here).

Non-printable characters

Non-printable and some printable characters can be specified by an escape sequence, starting with a '\' (backslash). The printable backslash itself is produced by '\\'.

name                 symb  escape seq.  decimal  hexadec  octal
alert                BEL   \a           7        \x07     \007
backspace            BS    \b           8        \x08     \010
form feed            FF    \f           12       \x0C     \014
line feed, new line  LF    \n           10       \x0A     \012
carriage return      CR    \r           13       \x0D     \015
horizontal tab       TAB   \t           9        \x09     \011
vertical tab         VT    \v           11       \x0B     \013
single quote         '     \'           39       \x27     \047
double quote         "     \"           34       \x22     \042
backslash            \     \\           92       \x5C     \134
any character              \ followed by octal digits, e.g. \101
any character              \x followed by hexadecimal digits, e.g. \x41
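
The table can be checked directly: the escape, hexadecimal and octal notations all denote the same one-character string. The following is a minimal sketch, not part of the original page; the variable names are illustrative, and the syntax is chosen so that it should run on both Python 2 and Python 3.

```python
# Three notations for the line feed character from the table above.
newline_forms = ['\n', '\x0A', '\012']
all_equal = (newline_forms[0] == newline_forms[1] == newline_forms[2])

# Decimal codes of a few escape sequences, as listed in the table.
codes = [ord(c) for c in ['\a', '\t', '\'', '\"', '\\']]

# An escape sequence is written with two or more symbols but is one character.
backslash_len = len('\\')

print(all_equal)       # True
print(codes)           # [7, 9, 39, 34, 92]
print(backslash_len)   # 1
```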


STRING FORMATTING

String formatting is described in the Python Library Reference section 2.3.6.2. Here a short summary is given of the most useful features, with examples.

String formatting is used mainly for print operations to print variables like integers, floats, strings in a desired format. This is accomplished by the format operator % in the form
   format % value(s)
format is a string containing one or more conversion specifiers; value(s) is a single value or a tuple of values that are to be converted to strings. The number of conversion specifiers in format must equal the number of items in value(s).
Examples

>>> p=29.90
>>> print 'The price of one item is $%7.2f' % p
The price of one item is $  29.90
>>> n=5
>>> print 'The price of %2d items is $%7.2f' % (n,p*n)
The price of  5 items is $ 149.50

The print statement uses the built-in function str to generate the printable string:

>>> s=str('The price of one item is $%7.2f' % p)
>>> print s
The price of one item is $  29.90

In these examples %7.2f and %2d are conversion specifiers.

Formatting numbers and strings

The most used form of a conversion specifier is
%[w][.p]t where
     w = minimum total field width
     p = precision (nr of digits after decimal point or total nr of digits)
     t = conversion type:
        d,i: signed integer, decimal
        e,E: floating point exponential format
        f,F: floating point, decimal format
        g,G: as e,E if exp < -4 or exp >= precision, else as f,F
        s: string
        c: single character (int or one-el. string)
        o: unsigned octal
        u: unsigned decimal
        x,X: unsigned hexadecimal
The items w and .p are optional; default values are used, depending on the value. Thus %d, %e, %g, %s are also valid specifications.
Special case: %% does not format any value but prints a single %.
Examples: (note that a specified width that is too small is overridden)

>>> x = 123456789
>>> print 'x = %d, x = %5d, x = %12d' % (x,x,x)
x = 123456789, x = 123456789, x =    123456789
>>> y=1.23456
>>> print 'y = %e, y = %f, y = %g' % (y,y,y)
y = 1.234560e+000, y = 1.234560, y = 1.23456
>>> print 'y = %10.3e, y = %10.3f, y = %10.3g' % (y,y,y)
y = 1.235e+000, y =      1.235, y =       1.23
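
The remaining conversion types and the %% special case can be tried in the same way. This is a small sketch rather than output copied from the original page; it builds the strings with the % operator alone (no print statement needed), the names are illustrative, and the syntax should run on both Python 2 and Python 3.

```python
percent = 'Discount: %d%%' % 15        # %% yields a literal %
padded = '[%8s]' % 'right'             # string right-aligned in 8 columns
too_narrow = '[%2s]' % 'string'        # width too small: it is overridden
chars = '%c%c' % (65, 'b')             # integer code or one-element string
bases = '%d = %x hex = %o octal' % (255, 255, 255)

print(percent)     # Discount: 15%
print(padded)      # [   right]
print(too_narrow)  # [string]
print(chars)       # Ab
print(bases)       # 255 = ff hex = 377 octal
```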

Formatting using a mapping (dictionary)

   format % dictionary
The conversion specifier is %(k)[w][.p]t where k is the mapping key without its quotes. A dictionary is a comma-separated list of key: value pairs within curly braces (see Python Library Reference Manual section 2.3.8).
Examples:

>>> print 'Dear mr %(who)s, your payment of $%(amt)6.2f is now due.'\
% {'who': 'Johnson', 'amt': 29.90}
Dear mr Johnson, your payment of $ 29.90 is now due.
>>> phone={'John':2358, 'Jack':5731, 'Jill':5329}
>>> print 'The telephone nr of Jack is %(Jack)4d' % phone
The telephone nr of Jack is 5731

STRING METHODS

S is any string. The result is the method applied to the string S.

S.capitalize() -> string
Returns a copy of the string S with only its first character capitalized.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.capitalize()
'A line with 6 random capitals'
See also: islower, istitle, isupper, lower, swapcase, title, upper
S.center(width[, fillchar]) -> string
Returns S centered in a string of length width. Padding is done using the specified fill character (default is a space)
Example:
>>> s='beware'
>>> s.center(len(s)+6,'!')
'!!!beware!!!'
See also: ljust, rjust
S.count(sub[, start[, end]]) -> int
Returns the number of occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Example:
>>> s='abacadabra'
>>> s.count('a')
5
S.decode([encoding[,errors]]) -> object
Decodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that decoding errors raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.
Default: no change. See Python doc for module codecs.
S.encode([encoding[,errors]]) -> object
Encodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that is able to handle UnicodeEncodeErrors.
Default: no change. See Python doc for module codecs.
S.endswith(suffix[, start[, end]]) -> bool
Returns True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position.
Example:
>>> line='some text \n'
>>> line.endswith('\x0A')
True
S.expandtabs([tabsize]) -> string
Returns a copy of S where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.
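Example:
The sketch below is an added illustration, not taken from the original page. It assumes nothing beyond the method itself; the variable names are illustrative and the syntax should run on both Python 2 and Python 3.

```python
row = 'a\tbc\tdef'
default_tabs = row.expandtabs()    # tab stops at columns 8, 16, ...
narrow_tabs = row.expandtabs(4)    # tab stops at columns 4, 8, ...

print(repr(default_tabs))   # 'a       bc      def'
print(len(default_tabs))    # 19
print(repr(narrow_tabs))    # 'a   bc  def'
print(len(narrow_tabs))     # 11
```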
S.find(sub [,start [,end]]) -> int
Returns the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Example: (see index example first)
>>> s='Print <all> marked <items> in this <sentence>'
>>> p1=s.find('<')
>>> while p1>=0:
        p2=s.find('>',p1)
        print s[p1+1 : p2]
        p1=s.find('<',p2)

all
items
sentence
See also: index, rfind, rindex
S.index(sub [,start [,end]]) -> int
Like S.find() but raises ValueError when the substring is not found.
Example:
>>> s='Print the <first> marked item in this <sentence>'
>>> print s[s.index('<')+1 : s.index('>')]
first
See also: find, rfind, rindex
S.isalnum() -> bool
Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.
Example:
S.isalpha() -> bool
Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.
Example:
S.isdigit() -> bool
Return True if all characters in S are digits and there is at least one character in S, False otherwise.
Example:
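The following sketch compares isdigit with the two preceding tests, isalnum and isalpha, on a few plain ASCII strings. It is an added illustration (the sample strings are arbitrary) and should run on both Python 2 and Python 3. Note that the empty string fails all three tests.

```python
samples = ['abc', 'abc123', '123', 'a b', '']
# One row per sample: (string, isalnum, isalpha, isdigit)
table = [(s, s.isalnum(), s.isalpha(), s.isdigit()) for s in samples]
for row in table:
    print(row)
# ('abc', True, True, False)
# ('abc123', True, False, False)
# ('123', True, False, True)
# ('a b', False, False, False)
# ('', False, False, False)
```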
S.islower() -> bool
Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.islower()
False
See also: capitalize, istitle, isupper, lower, swapcase, title, upper
S.isspace() -> bool
Returns True if all characters in S are whitespace and there is at least one character in S, False otherwise.
Example: (print decimal codes of all whitespace characters)
>>> for i in range(256):
        if chr(i).isspace(): print '%4d' % (i),

  9   10   11   12   13   32   160
S.istitle() -> bool
Returns True if S is a titlecased string and there is at least one character in S, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.istitle()
False
See also: capitalize, islower, isupper, lower, swapcase, title, upper
S.isupper() -> bool
Returns True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.isupper()
False
See also: capitalize, islower, istitle, lower, swapcase, title, upper
S.join(sequence) -> string
Returns a string which is the concatenation of the strings in the sequence. The separator between elements is S.
Example:
>>> ', '.join(['jack', 'john', 'mary'])
'jack, john, mary'
S.ljust(width[, fillchar]) -> string
Returns S left justified in a string of length width. Padding is done using the specified fill character (default is a space).
Example:
>>> 'left'.ljust(10,'-')
'left------'
See also: center, rjust
S.lower() -> string
Returns a copy of the string S converted to lowercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.lower()
'a line with 6 random capitals'
See also: capitalize, islower, istitle, isupper, swapcase, title, upper
S.lstrip([chars]) -> string or unicode
Returns a copy of the string S with leading whitespace removed. If chars is given and not None, remove leading characters in chars instead. If chars is unicode, S will be converted to unicode before stripping
Example:
>>> '   spacious   '.lstrip()
'spacious   '
>>> 'www.example.com'.lstrip('cmowz.')
'example.com'
S.replace(old, new[, count]) -> string
Returns a copy of string S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
Example:
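A minimal sketch of replace with and without the optional count argument. It is an added illustration with an arbitrary sample string, written so that it should run on both Python 2 and Python 3. The original string is left unchanged, since replace returns a copy.

```python
text = 'spam, spam, eggs and spam'
all_replaced = text.replace('spam', 'ham')      # every occurrence
first_two = text.replace('spam', 'ham', 2)      # only the first two

print(all_replaced)   # ham, ham, eggs and ham
print(first_two)      # ham, ham, eggs and spam
print(text)           # spam, spam, eggs and spam
```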
S.rfind(sub [,start [,end]]) -> int
Returns the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
Example:
See also: find, index, rindex
S.rindex(sub [,start [,end]]) -> int
Like S.rfind() but raises ValueError when the substring is not found.
Example:
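The sketch below shows rfind and rindex side by side: both search from the right, but they differ in how a missing substring is reported. The sample path is hypothetical and only serves as an illustration; the syntax should run on both Python 2 and Python 3.

```python
path = 'home/python/strings.html'       # hypothetical sample string
last_slash = path.rfind('/')            # index of the last '/'
filename = path[last_slash + 1:]        # everything after it
missing = path.rfind('#')               # not found: -1

try:
    path.rindex('#')                    # not found: raises ValueError
    raised = False
except ValueError:
    raised = True

print(last_slash)   # 11
print(filename)     # strings.html
print(missing)      # -1
print(raised)       # True
```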
See also: find, index, rfind
S.rjust(width[, fillchar]) -> string
Return S right justified in a string of length width. Padding is done using the specified fill character (default is a space)
Example:
>>> '-->'.rjust(10)
'       -->'
See also: center, ljust
S.rsplit([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working to the front. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.
Example:
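rsplit differs from split only when maxsplit limits the number of splits. The sketch below is an added illustration with an arbitrary sample string, and should run on both Python 2 and Python 3.

```python
record = 'usr/local/lib/python'
from_right = record.rsplit('/', 1)   # one split, counted from the end
from_left = record.split('/', 1)     # one split, counted from the front
unlimited = record.rsplit('/')       # without maxsplit: same as split

print(from_right)   # ['usr/local/lib', 'python']
print(from_left)    # ['usr', 'local/lib/python']
print(unlimited)    # ['usr', 'local', 'lib', 'python']
```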
S.rstrip([chars]) -> string or unicode
Returns a copy of the string S with trailing whitespace removed. If chars is given and not None, removes trailing characters in chars instead. If chars is unicode, S will be converted to unicode before stripping.
Example:
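A minimal sketch of rstrip, added as an illustration with arbitrary sample strings; it should run on both Python 2 and Python 3. Note that chars is treated as a set of characters, not as a suffix.

```python
line = 'some text \n'
no_newline = line.rstrip()               # trailing blank and newline removed
trimmed = 'value;;;  '.rstrip(' ;')      # any trailing ' ' or ';' removed
leading_kept = '  indented  '.rstrip()   # leading whitespace is untouched

print(repr(no_newline))     # 'some text'
print(repr(trimmed))        # 'value'
print(repr(leading_kept))   # '  indented'
```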
S.split([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.
Example: (converts numbers in string to array; array is assumed here to come from the numpy package)
>>> from numpy import array
>>> s='12 23 35 67'
>>> x=s.split()
>>> array(map(int,x))
array([12, 23, 35, 67])
S.splitlines([keepends]) -> list of strings
Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
Example:
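The sketch below shows splitlines with and without keepends on a string that mixes two line-ending conventions. It is an added illustration, and should run on both Python 2 and Python 3.

```python
text = 'first\nsecond\r\nthird\n'
lines = text.splitlines()         # line breaks dropped
kept = text.splitlines(True)      # keepends true: line breaks retained

print(lines)   # ['first', 'second', 'third']
print(kept)    # ['first\n', 'second\r\n', 'third\n']
```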
S.startswith(prefix[, start[, end]]) -> bool
Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position.
Example:
>>> line=' S.startswith....'
>>> line.lstrip().startswith('S.')
True
S.strip([chars]) -> string or unicode
Returns a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, removes leading and trailing characters in chars instead. If chars is unicode, S will be converted to unicode before stripping.
Example:
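A short sketch of strip, reusing the sample strings of the lstrip entry so the results can be compared. It is an added illustration and should run on both Python 2 and Python 3.

```python
both_sides = '   spacious   '.strip()            # whitespace on both ends
unquoted = '"quoted"'.strip('"')                 # one kind of character
url_core = 'www.example.com'.strip('cmowz.')     # a set of characters

print(repr(both_sides))   # 'spacious'
print(repr(unquoted))     # 'quoted'
print(repr(url_core))     # 'example'
```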
S.swapcase() -> string
Return a copy of the string S with uppercase characters converted to lowercase and vice versa.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.swapcase()
'A LInE wITH 6 RaNDOM CApItaLs'
See also: capitalize, islower, istitle, isupper, lower, title, upper
S.title() -> string
Return a titlecased version of S, i.e. words start with uppercase characters, all remaining cased characters have lowercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.title()
'A Line With 6 Random Capitals'
See also: capitalize, islower, istitle, isupper, lower, swapcase, upper
S.translate(table [,deletechars]) -> string
Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256.
Example:
S.upper() -> string
Return a copy of the string S converted to uppercase.
Example:
>>> 'a liNe With 6 rAndom caPiTAlS'.upper()
'A LINE WITH 6 RANDOM CAPITALS'
See also: capitalize, islower, istitle, isupper, lower, swapcase, title
S.zfill(width) -> string
Pad a numeric string S with zeros on the left, to fill a field of the specified width. The string S is never truncated.
Example:
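A minimal sketch of zfill, added as an illustration; it should run on both Python 2 and Python 3. A leading sign stays in front of the inserted zeros, and a string longer than width is returned unchanged.

```python
padded = '42'.zfill(5)          # zeros added on the left
negative = '-42'.zfill(5)       # sign stays in front of the zeros
too_long = '123456'.zfill(3)    # never truncated

print(padded)     # 00042
print(negative)   # -0042
print(too_long)   # 123456
```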

STRING FUNCTIONS

Here are the built-in Python functions relevant for strings. Most of these functions act on other objects as well (see Python Library Reference section 2.1)

chr(i)
Returns string of one character whose ASCII code is the integer i (0..255). This is the inverse of ord().
>>> print chr(65), chr(97), chr(193), chr(225)
A a Á á
See also: ord, unichr
cmp(s1,s2)
Compares the two strings s1 and s2 and returns integer -1 if s1 < s2, 0 if s1 == s2 and +1 if s1 > s2. The ordering is the same as that used for sorting.
>>> cmp('Jack', 'Jill')
-1
See also: max, min
dir(s)
Returns a list of valid attributes for the string object
eval(s)
The string s is evaluated as a Python expression.
>>> x=1
>>> eval('x+1')
2
exec(s)
The string s is executed as one or more Python statements.
>>> x=1
>>> exec('y=x+1')
>>> y
2
float(s)
Converts a string (or a number) to floating point. The string argument must contain a possibly signed decimal or floating point number, possibly embedded in whitespace.
>>> float('0.00000025')
2.4999999999999999e-007
See also: int, long
int(s)
Converts a string (or number) to a plain integer. The string argument must contain a possibly signed decimal number representable as a Python integer, possibly embedded in whitespace. If needed, a long integer is generated.
>>> int(' -125')
-125
>>> int('123456789123456789')
123456789123456789L
See also: float, long
len(s)
Returns the length (nr of characters) of the string s.
long(s)
Converts a string (or number) to a long integer. The string argument must contain a possibly signed number of arbitrary size, possibly embedded in whitespace.
>>> long(' -125')
-125L
See also: float, int
max(s [, args])
With a single argument s, returns the 'largest' character c, i.e., the largest ord(c), in the string. With more than one argument, returns the largest of the arguments.
>>> max('abacadabra')
'r'
>>> max('abacadabra', 'nonsense')
'nonsense'
See also: cmp, min
min(s)
With a single argument s, returns the 'smallest' character c, i.e., the smallest ord(c), in the string. With more than one argument, returns the smallest of the arguments.
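The sketch below mirrors the max example for min. It is an added illustration, and should run on both Python 2 and Python 3.

```python
word = 'abacadabra'
smallest_char = min(word)               # smallest character in the string
smallest_arg = min(word, 'nonsense')    # smallest of the two strings

print(smallest_char)   # a
print(smallest_arg)    # abacadabra
```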
See also: cmp, max
ord(c)
Given a string of length one, returns an integer [0..255]: the value of the byte when the argument is an 8-bit string. This is the inverse of chr().
For unicode objects an integer [0..65535] representing the Unicode code point of the character is returned. This is the inverse of unichr().
>>> print ord('a'), ord('á')
97 225
>>> print ord(u'a'), ord(u'\u2020')
97 8224
See also: chr, unichr
raw_input(prompt)
If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that.
>>> s = raw_input('-->')
--> Monty Python's Flying Circus
>>> s
"Monty Python's Flying Circus"
repr(object)
Returns a string containing a printable representation of an object. This is the same value yielded by conversions (reverse quotes). It is sometimes useful to be able to access this operation as an ordinary function. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval().
>>> print repr(25), repr(1.e4)
25 10000.0
>>> from numpy import array   # assumption: array comes from the numpy package
>>> print repr([1,2,3]), repr(array([1,2,3]))
[1, 2, 3] array([1, 2, 3])
See also: str
sorted(s)
Returns a new sorted list of the characters in the string s (or, in general, of the items in any iterable). The optional arguments cmp, key, and reverse have the same meaning as those for the list.sort() method.
>>> s='aA1'
>>> sorted(s)
['1', 'A', 'a']
str(object)
Returns a string containing a nicely printable representation of an object. For strings, this returns the string itself. The difference with repr(object) is that str(object) does not always attempt to return a string that is acceptable to eval(); its goal is to return a printable string. If no argument is given, returns the empty string, ''.
>>> print str(25), str(1.e4)
25 10000.0
>>> print str([1,2,3]), str(array([1,2,3]))
[1, 2, 3] [1 2 3]
See also: repr
unichr(i)
Returns the Unicode string of one character whose Unicode code is the integer i. This is the inverse of ord() for Unicode strings.
See also: chr, ord