2.5. Type Str

2.5.1. Type Definition

  • str is an immutable sequence of Unicode characters

Listing 2.17. str Type Definition
data = ''
data = 'Jan Twardowski'

data = """First line
Second line
Third line"""
# 'First line\nSecond line\nThird line'

2.5.2. Type Casting

  • str() converts argument to str

  • print() calls str() on each of its arguments before writing them to the screen

Listing 2.18. str() converts argument to str
str('Moon')                     # 'Moon'
str(1969)                       # '1969'
str(13.37)                      # '13.37'
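The difference between an explicit str() call and the conversion that print() performs on its own can be sketched as follows. The variable names (year, label, failed) are illustrative only and do not come from the original listings.

```python
year = 1969

# str() makes the conversion explicit, so the result can be joined with other text
label = 'Year: ' + str(year)

# print() applies str() to each argument itself, so no explicit cast is needed
print('Year:', year)

# Without the cast, adding str and int is an error
try:
    broken = 'Year: ' + year
except TypeError:
    failed = True
```

The explicit cast is needed only when a value has to become part of another string; for plain printing it is redundant.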

2.5.3. Single and Double Quotes

  • " and ' work the same

  • Choose one and keep consistency in code

  • Python console prefers single quote (') character

  • It matters for doctest, which compares two outputs character by character

  • For multiline strings always use triple double quotes (""") to be consistent with the docstring convention (PEP 257)

Listing 2.19. Python console prefers single quote (')
data = 'My name is José Jiménez'

data
# 'My name is José Jiménez'
Listing 2.20. Python console prefers single quote (')
data = "My name is José Jiménez"

data
# 'My name is José Jiménez'
Listing 2.21. It is better to use double quotes when the text contains apostrophes. The Python console does the same when it echoes a value.
data = 'My name\'s José Jiménez'

data
# "My name's José Jiménez"
Listing 2.22. HTML and XML use double quotes to enclose attribute values, hence it is better to use single quotes for the string.
data = '<a href="http://python.astrotech.io">Python and Machine Learning</a>'

data
# '<a href="http://python.astrotech.io">Python and Machine Learning</a>'
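Listing 2.23. For multiline strings always use triple double quotes to be consistent with the docstring convention PEP 257
data = """My name's "José Jiménez\""""   # the final " must be escaped, otherwise it merges with the closing """
data = '''My name's "José Jiménez"'''    # no escaping is needed inside triple single quotes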
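A short sketch of the points above: the quote style is not part of the value, and the console echo (which is what repr() returns) picks the quote character by itself. The variable names are illustrative.

```python
single = 'Moon'
double = "Moon"

# The quote style only affects how the literal is written, not the value
same_value = single == double

# repr() returns the text that the console shows when echoing a value
plain = repr("My name is José Jiménez")
apostrophe = repr('My name\'s José Jiménez')

print(plain)        # echoed with single quotes
print(apostrophe)   # echoed with double quotes, because the text contains '
```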

2.5.4. Docstring

  • For multiline strings always use triple double quotes (""") to be consistent with the docstring convention (PEP 257)

  • More information in Function Doctest
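A minimal sketch of the difference between a docstring and a multiline str. The function name launch and the variable speech are hypothetical and used only for illustration.

```python
def launch():
    """Start the countdown."""


# The same kind of literal assigned to a variable is an ordinary multiline str
speech = """We choose to go to the Moon!
We choose to go to the Moon in this decade."""

# A docstring is stored in the __doc__ attribute of the object it documents
documentation = launch.__doc__

print(documentation)
print(speech)
```

A triple-quoted literal becomes a docstring only when it is the first statement of a module, function or class.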

Listing 2.24. If assigned to a variable, a triple-quoted literal serves as a multiline str; as the first statement of a module, function or class it is a docstring.
"""
We choose to go to the Moon!
We choose to go to the Moon in this decade and do the other things,
not because they are easy, but because they are hard;
because that goal will serve to organize and measure the best of our energies and skills,
because that challenge is one that we are willing to accept, one we are unwilling to postpone,
and one we intend to win, and the others, too.
"""

2.5.5. Escape Characters

  • \n - New line (ENTER)

  • \t - Horizontal Tab (TAB)

  • \' - Single quote ' (escape in single quoted strings)

  • \" - Double quote " (escape in double quoted strings)

  • \\ - Backslash \ (to indicate that this is not an escape character)

  • More information in Builtin Printing

print('\U0001F680')     # 🚀
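The escape sequences listed above can be checked with a small sketch. Each escape sequence takes several characters to type, but it produces a single character in the resulting str. The variable names are illustrative.

```python
text = 'Name:\tJan\nPath:\tC:\\Users'
print(text)

# Every escape sequence below yields exactly one character
lengths = (len('\n'), len('\t'), len('\\'), len('\''), len('\"'))

# \U followed by eight hex digits yields one Unicode character
rocket = '\U0001F680'

lines = text.splitlines()
```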

2.5.6. Format String

  • String interpolation (variable substitution)

  • Since Python 3.6

  • Used for str concatenation

name = 'José Jiménez'

print(f'My name... {name}')
# My name... José Jiménez

firstname = 'José'
lastname = 'Jiménez'
result = f'My name... {firstname} {lastname}'

print(result)
# My name... José Jiménez
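Besides plain variable names, the braces of an f-string accept any expression. A minimal sketch follows; the variable age and the result names are assumptions added for illustration.

```python
firstname = 'José'
lastname = 'Jiménez'
age = 42

greeting = f'My name... {firstname} {lastname}'

# Any expression is evaluated and converted to str
next_year = f'Next year I will be {age + 1}'

# Double quotes can be used freely inside a single-quoted f-string
quoted = f'My name... "{firstname}"'

print(greeting)
print(next_year)
print(quoted)
```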

2.5.7. Unicode Literal

  • In Python 3 str is Unicode

  • In Python 2 str is Bytes

  • In Python 3 u'...' is only for compatibility with Python 2

u'zażółć gęślą jaźń'
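In Python 3 the u prefix changes nothing, which can be verified with a short sketch (the variable names are illustrative):

```python
legacy = u'zażółć gęślą jaźń'
modern = 'zażółć gęślą jaźń'

same = legacy == modern     # both literals create equal str objects
kind = type(legacy)         # the prefix does not create a separate type

print(same, kind)
```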

2.5.8. Bytes Literal

  • Used while reading from low level devices and drivers

  • Used in sockets and HTTP connections

  • bytes is a sequence of octets (integers between 0 and 255)

  • bytes.decode() converts bytes to Unicode str

  • str.encode() converts str to bytes

'Moon'              # Unicode (in Python 3)
b'Moon'             # Bytes Literal
'Moon'.encode()     # b'Moon'
b'Moon'.decode()    # 'Moon'
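A sketch of the round trip between str and bytes. It assumes the UTF-8 encoding, which is also the default for str.encode() and bytes.decode() in Python 3; the variable names are illustrative.

```python
text = 'zażółć'

data = text.encode('utf-8')         # str -> bytes
restored = data.decode('utf-8')     # bytes -> str

# Non-ASCII characters take more than one octet in UTF-8
sizes = (len(text), len(data))      # (6, 10)

# Iterating over bytes yields integers between 0 and 255
octets = list(b'Moon')              # [77, 111, 111, 110]

print(sizes, octets)
```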

2.5.9. Raw String

  • Escape sequences are not interpreted; backslashes are kept literally

Listing 2.25. In Regular Expressions
r'[a-z0-9]\n'

print(r'C:\Users\Admin\file.txt')
# C:\Users\Admin\file.txt

print('C:\\Users\\Admin\\file.txt')
# C:\Users\Admin\file.txt

print('C:\Users\Admin\file.txt')
# Traceback (most recent call last):
#     ...
# SyntaxError: (unicode error) 'unicodeescape'
#   codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
  • Problem: \Users

  • After \U Python expects a Unicode code point written as eight hexadecimal digits

  • s is not a valid hexadecimal digit
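The effect of the r prefix can be sketched as follows. The sample sentence passed to re.findall() is an assumption made for illustration; the re module is part of the standard library.

```python
import re

normal = '\n'       # one character: a newline
raw = r'\n'         # two characters: a backslash and the letter n
lengths = (len(normal), len(raw))

# A raw string and a string with doubled backslashes hold the same value
path = r'C:\Users\Admin\file.txt'
same = path == 'C:\\Users\\Admin\\file.txt'

# In regular expressions the backslash reaches the re module untouched
numbers = re.findall(r'\d+', 'Apollo 11 landed in 1969')

print(lengths, same, numbers)
```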

2.5.10. Reading Input

  • input() returns str

  • Good practice: add space at the end of prompt

  • Good practice: always .strip() text from user input

  • Good practice: always sanitize values from user prompt

Listing 2.26. The argument of input() is the prompt text, which invites the user to enter specific information. Note the colon and space (": ") at the end; the space separates the user input from the prompt.
name = input('What is your name: ')  # Jan Twardowski<ENTER>

print(name)     # Jan Twardowski
type(name)      # <class 'str'>
Listing 2.27. input() always returns a str. To get a numeric value, type casting to int is needed.
age = input('What is your age: ')  # 42<ENTER>

print(age)      # 42
type(age)       # <class 'str'>

age = int(age)
print(age)      # 42
type(age)       # <class 'int'>
Listing 2.28. Conversion to float handles decimals, which int does not support
age = input('What is your age: ')  # 42.5<ENTER>

age = int(age)      # ValueError: invalid literal for int() with base 10: '42.5'
age = float(age)    # 42.5

print(age)          # 42.5
type(age)           # <class 'float'>
Listing 2.29. Conversion to float cannot handle comma (',') as a decimal separator
age = input('What is your age: ')  # 42,5<ENTER>

age = int(age)      # ValueError: invalid literal for int() with base 10: '42,5'
age = float(age)    # ValueError: could not convert string to float: '42,5'
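One possible way to apply the good practices above is to strip the text and replace the comma before casting. This is a sketch under the assumption that a comma is used only as a decimal separator (never as a thousands separator); the variable raw stands in for the value returned by input().

```python
raw = ' 42,5 '      # stands in for: input('What is your age: ')

# Remove surrounding whitespace, then normalise the decimal separator
cleaned = raw.strip().replace(',', '.')

age = float(cleaned)

print(age)
print(type(age))
```

Text that still is not a number after cleaning raises ValueError, exactly as in the listings above.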

2.5.11. Length

len('hello')
# 5
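len() counts characters (Unicode code points), not bytes. The sketch below illustrates the difference; the variable names are illustrative.

```python
empty = len('')                         # 0
name = len('Jan Twardowski')            # 14, the space counts as a character

rocket = '\U0001F680'
characters = len(rocket)                # 1 character
octets = len(rocket.encode('utf-8'))    # 4 bytes in UTF-8

print(empty, name, characters, octets)
```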

2.5.12. Concatenation

  • f-string formatting is the preferred way to concatenate strings

'a' + 'b'
# 'ab'

'1' + '2'
# '12'

text1 = 'a'
text2 = 'b'

text1 + text2
# 'ab'

a = '1'
b = '2'

a + b
# '12'

'-' * 10                # '----------'
'Beetlejuice' * 3       # 'BeetlejuiceBeetlejuiceBeetlejuice'
'Mua' + 'Ha' * 2        # 'MuaHaHa'
'Mua' + ('Ha'*2)        # 'MuaHaHa'
('Mua'+'Ha') * 2        # 'MuaHaMuaHa'

firstname = 'Jan'
lastname = 'Twardowski'

firstname + lastname
# 'JanTwardowski'

firstname + ' ' + lastname
# 'Jan Twardowski'
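The same result can be built in several ways. The sketch below compares the + operator with an f-string and with str.join(); the result names are illustrative.

```python
firstname = 'Jan'
lastname = 'Twardowski'

added = firstname + ' ' + lastname          # + operator
formatted = f'{firstname} {lastname}'       # f-string
joined = ' '.join([firstname, lastname])    # str.join() with a separator

print(added)
print(formatted)
print(joined)
```

All three expressions produce an equal str; the f-string keeps the final shape of the text visible in one place.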

2.5.13. String Immutability

Listing 2.30. How many strings are there in memory?
firstname = 'Jan'
lastname = 'Twardowski'

firstname + ' ' + lastname
# 'Jan Twardowski'
Listing 2.31. How many strings are there in memory?
firstname = 'Jan'
lastname = 'Twardowski'

f'{firstname} {lastname}'
# 'Jan Twardowski'
Listing 2.32. How many strings are there in memory?
firstname = 'Jan'
lastname = 'Twardowski'
age = 42

'Hello ' + firstname + ' ' + lastname + ' ' + str(age) + '!'
# 'Hello Jan Twardowski 42!'
Listing 2.33. How many strings are there in memory?
firstname = 'Jan'
lastname = 'Twardowski'
age = 42

f'Hello {firstname} {lastname} {age}!'
# 'Hello Jan Twardowski 42!'
Figure 2.2. Define str

Figure 2.3. Define another str with the same value

Figure 2.4. Define another str with different value
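Immutability means that an existing str is never modified; every operation builds a new object. A minimal sketch, with illustrative variable names (step, full, upper, message):

```python
firstname = 'Jan'
lastname = 'Twardowski'

# Chained + creates an intermediate str before the final one
step = firstname + ' '      # 'Jan '
full = step + lastname      # 'Jan Twardowski'

# Methods return a new str and leave the original untouched
upper = full.upper()

# Changing a character in place is not allowed
try:
    full[0] = 'P'
except TypeError as error:
    message = str(error)

print(full)
print(upper)
print(message)
```

This is why the listings above ask how many strings exist in memory: each + in a chain may leave behind an intermediate object, while an f-string builds the result in a single expression.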

2.5.14. Assignments

2.5.14.1. Type String Input

  • Assignment name: Type String Input

  • Last update: 2020-10-01

  • Complexity level: easy

  • Lines of code to write: 3 lines

  • Estimated time of completion: 3 min

  • Solution: solution/type_str_input.py

English
  1. Ask user to input text

  2. Print number of characters

Polish
  1. Poproś użytkownika o wprowadzenie tekstu

  2. Wypisz liczbę znaków

2.5.14.2. Type String Emoticon

  • Assignment name: Type String Emoticon

  • Last update: 2020-10-01

  • Complexity level: easy

  • Lines of code to write: 2 lines

  • Estimated time of completion: 3 min

  • Solution: solution/type_str_emoticon.py

English
  1. Ask user to input name

  2. Print hello NAME EMOTICON, where:

    • NAME is a name read from user

    • EMOTICON is Unicode Codepoint "U0001F642"

Polish
  1. Poproś użytkownika o wprowadzenie imienia

  2. Wypisz hello NAME EMOTICON, gdzie:

    • NAME to imię wczytane od użytkownika

    • EMOTICON to Unicode Codepoint "U0001F642"

The whys and wherefores
  • Variable declaration

  • Print formatting

  • Reading input data from user

2.5.14.3. Type String Quotes

  • Assignment name: Type String Quotes

  • Last update: 2020-10-01

  • Complexity level: easy

  • Lines of code to write: 3 lines

  • Estimated time of completion: 8 min

  • Solution: solution/type_str_quotes.py

English
  1. Ask user to input name

  2. Use f-string formatting to print

  3. Note that the second line starts with a tab

  4. Value NAME in double quotes is a name read from user

  5. Mind the different quotes, apostrophes, tabs and newlines

  6. Use neither space nor enter - use \n and \t instead

  7. Do not use string addition (str + str)

  8. Compare result with "Output" section (see below)

Polish
  1. Poproś użytkownika o wprowadzenie imienia

  2. Do wypisania użyj f-string formatting

  3. Zauważ, że druga linijka zaczyna się od tabulacji

  4. Wartość NAME w podwójnych cudzysłowach to ciąg od użytkownika

  5. Zwróć uwagę na znaki apostrofów, cudzysłowów, tabulacji i nowych linii

  6. Nie używaj spacji ani entera - użyj \n i \t

  7. Nie korzystaj z dodawania stringów (str + str)

  8. Porównaj wyniki z sekcją "Output" (patrz poniżej)

Output
'''My name... "NAME".
    I'm an """astronaut!"""'''
The whys and wherefores
  • Variable declaration

  • Print formatting

  • Reading input data from user