
Strings in Python

In this part of the Python programming tutorial, we will work with string data in more detail.

Strings are among the most important data types in computer languages. That is why we dedicate a whole chapter to working with strings in Python.

String literals

In Python, string literals can be written with single quotes, double quotes or triple quotes. A triple-quoted string can span several lines without using the escape character.

#!/usr/bin/python

# strings.py

a = "proximity alert"
b = 'evacuation'
c = """
requiem 
for
a 
tower
"""

print a
print b
print c

In our example, we assign three string literals to the variables a, b and c, and we print them to the console.

$ ./strings.py 
proximity alert
evacuation

requiem 
for 
a 
tower
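The following is a minimal sketch showing that the line breaks typed inside a triple-quoted literal become ordinary newline characters in the string. The variable name poem is ours, and the code uses print() with a single argument so that it runs under both Python 2 and Python 3.

```python
# Sketch: a triple-quoted literal stores its line breaks as \n characters.
poem = """requiem
for a tower"""

print(repr(poem))                        # 'requiem\nfor a tower'
print(poem == "requiem\nfor a tower")    # True
```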

If we want to create Unicode strings in Python 2, we add a u or U prefix to the string literal.

#!/usr/bin/python

# unicode.py

text = u'\u041b\u0435\u0432 \u041d\u0438\u043a\u043e\u043b\u0430\
\u0435\u0432\u0438\u0447 \u0422\u043e\u043b\u0441\u0442\u043e\u0439: \n\
\u0410\u043d\u043d\u0430 \u041a\u0430\u0440\u0435\u043d\u0438\u043d\u0430'

print text

In our example, we print "Leo Tolstoy: Anna Karenina" in the Cyrillic alphabet (azbuka).

$ ./unicode.py 
Лев Николаевич Толстой: 
Анна Каренина
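As a small illustration of what a Unicode string holds, the sketch below compares the number of characters with the number of bytes after encoding. UTF-8 is our assumption here, since the chapter does not name an encoding, and the names name and encoded are ours. The u prefix is required in Python 2 and is accepted again since Python 3.3.

```python
# Sketch: character count versus byte count of a Unicode string.
name = u'\u041b\u0435\u0432'      # the three Cyrillic letters of the first name

encoded = name.encode('utf-8')    # assumption: UTF-8 is the target encoding

print(len(name))      # 3 characters
print(len(encoded))   # 6 bytes, two per Cyrillic letter in UTF-8
```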

Escape sequences

When we work with strings, we can use escape sequences. Escape sequences are special character combinations that have a specific meaning when used within a string.

print "   bbb\raaa" # prints aaabbb

The carriage return \r is a control character that returns the cursor to the beginning of the current line. The text that follows it overwrites what was already printed there.

#!/usr/bin/python

# strophe.py

print "Incompatible, it don't matter though\n'cos someone's bound to hear my cry"
print "Speak out if you do\nYou're not easy to find"

The new line \n is a control character that begins a new line of text.

$ ./strophe.py 
Incompatible, it don't matter though
'cos someone's bound to hear my cry
Speak out if you do
You're not easy to find

Next we examine the backspace control character.

print "Python\b\b\booo" # prints Pytooo

The backspace control character \b moves the cursor one character back. In our case, we use three backspace characters to step back over three letters, which the three o characters then overwrite.

print "Towering\tinferno" # prints Towering        inferno

The horizontal tab \t advances the cursor to the next tab stop, which puts a gap between the two pieces of text.
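The effect of control characters depends on the terminal, but inside the string each escape sequence is stored as a single character. A minimal sketch of this, with the variable name tabbed chosen by us:

```python
# Sketch: an escape sequence occupies exactly one character in the string.
tabbed = "Towering\tinferno"

print(len("Towering"))   # 8
print(len("inferno"))    # 7
print(len(tabbed))       # 16, which is 8 + 1 tab character + 7
print(repr(tabbed))      # 'Towering\tinferno'
```

The repr() function shows the escape sequence instead of acting on it, which is handy when we debug strings that contain control characters.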

"Johnie's dog"
'Johnie\'s dog'

Single and double quotes can be nested. If we use only single quotes, we can use the backslash to escape the default meaning of a single quote.

If we prepend an r to the string, we get a raw string. The escape sequences are not interpreted.

#!/usr/bin/python

# raw.py

print r"Another world\n"

$ ./raw.py 
Another world\n

We get the string with the \n sequence printed literally, as a backslash followed by the letter n.
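To make the difference concrete, the sketch below compares the same literal with and without the r prefix. The names cooked and raw are ours.

```python
# Sketch: the same literal with and without the r prefix.
cooked = "Another world\n"    # \n is one newline character
raw = r"Another world\n"      # \n stays a backslash followed by n

print(len(cooked))   # 14
print(len(raw))      # 15
print(raw[-2:])      # \n
```

Raw strings are commonly used for text that contains many backslashes, such as regular expressions or Windows paths.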

Basic operations

Technically, a string is an immutable sequence of characters.

#!/usr/bin/python

# seq.py

val = 'Python'

print val[0:3]
print val[:-1]

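The slice val[0:3] returns the first three characters, and val[:-1] returns everything except the last character. Note that we cannot modify a string in place; a slice is always a new string.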

$ ./seq.py 
Pyt
Pytho
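A minimal sketch of what immutability means in practice: item assignment on a string raises a TypeError, so we build a new string from a slice instead. The name new_val is ours.

```python
# Sketch: strings cannot be changed in place; we build a new one instead.
val = 'Python'

try:
    val[0] = 'J'            # item assignment is not supported for strings
except TypeError:
    print("TypeError raised")

new_val = 'J' + val[1:]     # a new string from a literal and a slice

print(new_val)   # Jython
print(val)       # Python, the original is unchanged
```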

In the next example, we will show string multiplication and concatenation.

#!/usr/bin/python

# strings2.py

print "eagle " * 5

print "eagle " "falcon"

print "eagle " + "and " + "falcon"

The * operator repeats the string n times, in our case five times. Two string literals next to each other are automatically concatenated. We can also use the + operator to concatenate strings explicitly.

$ ./strings2.py 
eagle eagle eagle eagle eagle 
eagle falcon
eagle and falcon

This is the output of the strings2.py script.
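Automatic concatenation applies only to adjacent literals. When a variable is involved, we need the + operator. A short sketch, with the names bird, joined and ruler chosen by us:

```python
# Sketch: implicit concatenation works for literals only.
bird = "eagle "

# Writing  bird "falcon"  would be a syntax error, because bird is a
# variable and not a literal; the + operator is needed here.
joined = bird + "falcon"

ruler = "-" * len(joined)   # repeat a dash once per character

print(joined)   # eagle falcon
print(ruler)    # ------------
```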

We can use the len() function to calculate the length of the string in characters.

#!/usr/bin/python

# eagle.py

var = 'eagle'

print var, "has", len(var), "characters"

In the example, we compute the number of characters in a string variable.

$ ./eagle.py 
eagle has 5 characters

Some programming languages allow strings and numbers to be added implicitly. In Python, this is not possible; we must convert the values explicitly.

#!/usr/bin/python

# strnum.py

print int("12") + 12
print "There are " + str(22) + " oranges."
print float('22.33') + 22.55

We use the built-in int() function to convert a string to an integer, the built-in str() function to convert a number to a string, and the float() function to convert a string to a floating point number.
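The sketch below repeats two of the conversions and also shows what happens when we skip the conversion: adding a string and an integer raises a TypeError. The names total, message and mixed_ok are ours.

```python
# Sketch: explicit conversion between strings and numbers.
total = int("12") + 12                          # string to integer, then add
message = "There are " + str(22) + " oranges."  # number to string, then join

print(total)     # 24
print(message)   # There are 22 oranges.

try:
    "12" + 12                # mixing str and int is not allowed
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)  # False
```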

Operations on strings

There are several useful built-in functions and string methods for working with strings.

#!/usr/bin/python

# letters.py

sentence = "There are 22 apples"

alphas = 0
digits = 0
spaces = 0

for i in sentence:
   if i.isalpha():
      alphas += 1
   if i.isdigit():
      digits += 1
   if i.isspace():
      spaces += 1

print "There are", len(sentence), "characters"
print "There are", alphas, "alphabetic characters"
print "There are", digits, "digits"
print "There are", spaces, "spaces"

In our example, we have a string called sentence. We calculate the total number of characters and the number of alphabetic characters, digits and spaces in the sentence. To do this, we use the len() function and the isalpha(), isdigit() and isspace() string methods.

$ ./letters.py 
There are 19 characters
There are 14 alphabetic characters
There are 2 digits
There are 3 spaces
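The same counts can also be written more compactly. The following is only an alternative sketch, not the form used in the chapter: it feeds a generator expression to the built-in sum() function, adding 1 for every character that passes the test.

```python
# Sketch: counting character classes with sum() and generator expressions.
sentence = "There are 22 apples"

alphas = sum(1 for ch in sentence if ch.isalpha())
digits = sum(1 for ch in sentence if ch.isdigit())
spaces = sum(1 for ch in sentence if ch.isspace())

print(len(sentence))   # 19
print(alphas)          # 14
print(digits)          # 2
print(spaces)          # 3
```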

Next we will play with case distinctions.

#!/usr/bin/python

# case.py

title = "Lose Yourself"

print title.upper()
print title.lower()
print title.title()
print title.swapcase()
print title.capitalize()

Note that these methods do not modify the string. They return a modified copy of the original string.

$ ./case.py 
LOSE YOURSELF
lose yourself
Lose Yourself
lOSE yOURSELF
Lose yourself

In the next example, we will print results of football matches.

#!/usr/bin/python

# teams1.py

print "Ajax Amsterdam" " - " "Inter Milano " "2:3"
print "Real Madrid" " - " "AC Milano " "3:3"
print "Dortmund" " - " "Sparta Praha " "2:1"

We already know that adjacent string literals are concatenated.

$ ./teams1.py 
Ajax Amsterdam - Inter Milano 2:3
Real Madrid - AC Milano 3:3
Dortmund - Sparta Praha 2:1

The output does not look very good. We will change it so that it looks better.

#!/usr/bin/python

# teams2.py

teams = { 
      0: ("Ajax Amsterdam", "Inter Milano"),
      1: ("Real Madrid", "AC Milano"),
      2: ("Dortmund", "Sparta Praha")
}

results = ("2:3", "3:3", "2:1")


for i in teams:
   print teams[i][0].ljust(16) + "-".ljust(5) + \
       teams[i][1].ljust(16) + results[i].ljust(3)

The ljust() method returns a left justified string; the rjust() method returns a right justified string. If the string is shorter than the width that we provided, it is padded with spaces.

$ ./teams2.py 
Ajax Amsterdam  -    Inter Milano    2:3
Real Madrid     -    AC Milano       3:3
Dortmund        -    Sparta Praha    2:1

Now the output looks better.
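The example above uses only ljust(). A minimal sketch of the related methods follows; center() and the optional fill character are not covered in the chapter but belong to the same family of string methods. The variable names are ours.

```python
# Sketch: padding a string to a width of 9 characters in different ways.
word = "eagle"

left = word.ljust(9)          # 'eagle    '
right = word.rjust(9)         # '    eagle'
middle = word.center(9)       # '  eagle  '
dotted = word.rjust(9, '.')   # '....eagle', padded with a custom character

print("[" + left + "]")
print("[" + right + "]")
print("[" + middle + "]")
print("[" + dotted + "]")
```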

String formatting

String formatting, or string interpolation, is the dynamic insertion of values into a string. It is a very handy feature of the Python programming language. To do string interpolation, we use the % operator.

#!/usr/bin/python

# oranges.py

print 'There are %d oranges in the basket' % 32

We use the %d formatting specifier. The d character says that we are expecting an integer. After the string, we put the modulo operator and an argument, in this case an integer value.

$ ./oranges.py 
There are 32 oranges in the basket

If we interpolate more values, we put the arguments into a tuple.

#!/usr/bin/python

# fruits.py

print 'There are %d oranges and %d apples in the basket' % (12, 23)

$ ./fruits.py 
There are 12 oranges and 23 apples in the basket

In the next example, we will interpolate a float and a string value.

#!/usr/bin/python

# height.py

print 'Height: %f %s' % (172.3, 'cm')

The formatting specifier for a float value is %f and for a string %s.

$ ./height.py 
Height: 172.300000 cm

We might not like the fact that the number in the previous example has six decimal places by default. We can control the number of decimal places in the formatting specifier.

#!/usr/bin/python

# height2.py

print 'Height: %.1f %s' % (172.3, 'cm')

The decimal point followed by an integer controls the number of decimal places. In our case, the number will have exactly one decimal place.

$ ./height2.py 
Height: 172.3 cm

The following example shows other formatting options.

#!/usr/bin/python

# various.py

# hexadecimal
print "%x" % 300
print "%#x" % 300

# octal
print "%o" % 300

# scientific
print "%e" % 300000

The first two interpolations work with hexadecimal numbers. The x character will format the number in hexadecimal notation. The # character will add 0x to the hexadecimal number. The o character shows the number in octal format. The e character will show the number in scientific format.

$ ./various.py 
12c
0x12c
454
3.000000e+05
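As a quick check of these notations, the sketch below formats the same values and then parses the results back with int() and float(). The variable names are ours; int() with an explicit base is a standard built-in feature.

```python
# Sketch: formatting numbers in other notations and parsing them back.
number = 300

hex_text = "%x" % number      # '12c'
prefixed = "%#x" % number     # '0x12c'
octal_text = "%o" % number    # '454'
sci_text = "%e" % 300000      # '3.000000e+05'

print(int(hex_text, 16))    # 300
print(int(prefixed, 16))    # 300, the 0x prefix is accepted with base 16
print(int(octal_text, 8))   # 300
print(float(sci_text))      # 300000.0
```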

The next example will print three columns of numbers.

#!/usr/bin/python

# columns1.py

for x in range(1,11):
    print '%d %d %d' % (x, x*x, x*x*x)

The numbers are not aligned into columns, and the output looks terrible.

$ ./columns1.py 
1 1 1
2 4 8
3 9 27
4 16 64
5 25 125
6 36 216
7 49 343
8 64 512
9 81 729
10 100 1000

To correct this, we use the width specifier. The width specifier defines the minimum width of the field. If the value is shorter than the width, it is padded with spaces on the left, so the numbers are right justified.

#!/usr/bin/python

# columns2.py

for x in range(1,11):
    print '%2d %3d %4d' % (x, x*x, x*x*x)

Now the output looks OK. The 2 in %2d says that the first column will be 2 characters wide.

$ ./columns2.py 
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
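The width specifier can be combined with flags and with a precision. The sketch below goes slightly beyond the chapter's examples: the - flag left justifies the value, the 0 flag pads with zeros, and %8.2f sets both a width and the number of decimal places. The sample values and the | separators are ours.

```python
# Sketch: width combined with the - flag, the 0 flag and a precision.
left = '%-4d' % 7          # '7   ', left justified in 4 characters
zeros = '%04d' % 7         # '0007', padded with zeros
price = '%8.2f' % 3.14159  # '    3.14', 8 wide with 2 decimal places

row = '%-4d|%04d|%8.2f|' % (7, 7, 3.14159)

print(row)   # 7   |0007|    3.14|
```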

This chapter was about the string data type in Python.