2.5. Type Str

2.5.1. Type Definition

  • str is an immutable sequence of Unicode characters
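
Because str is a sequence, the usual sequence operations apply to it. The following is a minimal sketch; the variable names are illustrative and do not come from the original listings.

```python
# Illustrative names only; 'Jan Twardowski' is the sample value used in this chapter.
data = 'Jan Twardowski'

length = len(data)          # number of characters
first = data[0]             # indexing starts at 0
last_name = data[4:]        # slicing returns a new str
has_jan = 'Jan' in data     # membership test
letters = list(data[:3])    # iterating yields one-character strings

print(length)       # 14
print(first)        # J
print(last_name)    # Twardowski
print(has_jan)      # True
print(letters)      # ['J', 'a', 'n']
```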

Listing 2.14. str Type Definition
data = ''
data = 'Jan Twardowski'
data = "Jan Twardowski"
data = '''Jan Twardowski'''
data = """Jan Twardowski"""

data = """First line
Second line
Third line"""
# 'First line\nSecond line\nThird line'

data = """
    First line
    Second line
    Third line
"""
# '\n    First line\n    Second line\n    Third line\n'
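
The indentation inside a triple-quoted string becomes part of the value. One possible way to remove it, sketched here with textwrap.dedent() from the standard library (the name clean is illustrative):

```python
import textwrap

data = """
    First line
    Second line
    Third line
"""

# dedent() removes the whitespace common to all lines,
# strip() drops the leading and trailing newline.
clean = textwrap.dedent(data).strip()

print(repr(clean))
# 'First line\nSecond line\nThird line'
```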

2.5.2. Type Casting

Listing 2.15. str() converts argument to str
str('Moon')                     # 'Moon'
str('Twardowski\'s Moon.')      # "Twardowski's Moon."
str(1969)                       # '1969'
str(13.37)                      # '13.37'
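
str() accepts other built-in types as well. A short sketch with illustrative values that do not appear in the listing above:

```python
# Sample values chosen for illustration.
values = [True, None, [1, 2], 1.0]

as_text = [str(value) for value in values]
print(as_text)
# ['True', 'None', '[1, 2]', '1.0']

# Converting to str and back to int returns the original number.
year = int(str(1969))
print(year)     # 1969
```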

2.5.3. Single and Double Quotes

  • " and ' work the same

  • Choose one and keep consistency in code

  • Python console prefers single quote (') character

  • It matters for doctest, which compares two outputs character by character

  • For multiline strings, always use triple double quotes (""") to be consistent with the docstring convention in PEP 257

Listing 2.16. Python console prefers single quote (')
data = "We choose to go to the Moon!"

print(repr(data))
# 'We choose to go to the Moon!'

Listing 2.17. It's better to use double quotes when the text contains apostrophes. This is also what the Python console does.
data = 'It\'s Twardowski\'s Moon.'

print(repr(data))
# "It's Twardowski's Moon."

Listing 2.18. HTML and XML use double quotes to enclose attribute values, hence it's better to use single quotes for the string.
data = '<a href="http://python.astrotech.io">Python and Machine Learning</a>'

print(repr(data))
# '<a href="http://python.astrotech.io">Python and Machine Learning</a>'

Listing 2.19. For multiline strings, always use triple double quotes (""") to be consistent with the docstring convention in PEP 257
# A double quote at the very end must be escaped,
# otherwise it merges with the closing delimiter (SyntaxError).
data = """My name's "José Jiménez\""""
data = '''My name's "José Jiménez"'''
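
The quoting style affects only the source code, not the value. A minimal sketch of how repr() (which the console uses to echo values) picks the quotes; the sample strings are illustrative:

```python
plain = "Moon"
apostrophe = "It's Twardowski's Moon."
both = 'It\'s the "Moon".'

print(repr(plain))          # 'Moon'
print(repr(apostrophe))     # "It's Twardowski's Moon."
print(repr(both))           # 'It\'s the "Moon".'

# Both quoting styles produce the same value.
print(plain == 'Moon')      # True
```

This is why doctest output has to be written with the same quotes that repr() produces.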

2.5.4. Docstring

  • For multiline strings, always use triple double quotes (""") to be consistent with the docstring convention in PEP 257

  • More information in Function Doctest
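
A minimal sketch of the difference between a docstring and an ordinary multiline str. The function launch() and the variable speech are hypothetical names used only for illustration:

```python
def launch():
    """Launch the rocket and report the result."""
    return 'Liftoff!'


# Assigned to a variable, the same syntax is an ordinary multiline str.
speech = """We choose to go to the Moon!
We choose to go to the Moon in this decade."""

print(launch.__doc__)       # Launch the rocket and report the result.
print(type(speech))         # <class 'str'>
print(speech.count('\n'))   # 1
```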

Listing 2.20. A triple-quoted string assigned to a variable is an ordinary multiline str; placed as the first statement of a module, function or class, it is a docstring.
"""
We choose to go to the Moon!
We choose to go to the Moon in this decade and do the other things,
not because they are easy, but because they are hard;
because that goal will serve to organize and measure the best of our energies and skills,
because that challenge is one that we are willing to accept, one we are unwilling to postpone,
and one we intend to win, and the others, too.
"""

2.5.5. Escape Characters

  • \r\n - New line used on Windows (CR LF)

  • \n - New line used on Linux, macOS and other *nix systems (LF)

  • \t - Horizontal Tab (TAB)

  • \' - Single quote ' (escape in single quoted strings)

  • \" - Double quote " (escape in double quoted strings)

  • \\ - Backslash \ (a literal backslash, not the start of an escape sequence)

  • More information in Builtin Printing

print('\U0001F680')     # 🚀
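
Each escape sequence from the list above stands for a single character. A short sketch with illustrative sample text:

```python
windows_text = 'First line\r\nSecond line'
unix_text = 'First line\nSecond line'

print(len('\n'))        # 1
print(len('\r\n'))      # 2
print(len('\t'))        # 1
print(len('\\'))        # 1

# splitlines() understands both line ending conventions.
print(windows_text.splitlines())    # ['First line', 'Second line']
print(windows_text.splitlines() == unix_text.splitlines())  # True
```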

2.5.6. Format String

  • String interpolation (variable substitution)

  • Since Python 3.6

  • Used for str concatenation

name = 'José Jiménez'

print(f'My name... {name}')
# My name... José Jiménez

first_name = 'Jan'
last_name = 'Twardowski'

result = f'My name... {first_name} {last_name}'
# 'My name... Jan Twardowski'
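
Besides plain variables, the braces of an f-string can hold expressions and format specifiers. A minimal sketch; the variables age and distance are illustrative values, not part of the original examples:

```python
name = 'Jan Twardowski'
age = 42
distance = 384400.0     # illustrative value

next_year = f'{name} will be {age + 1} next year'
one_decimal = f'{distance:.1f} km'
thousands = f'{distance:,.0f} km'
quoted = f'{name!r}'    # !r applies repr() to the value

print(next_year)    # Jan Twardowski will be 43 next year
print(one_decimal)  # 384400.0 km
print(thousands)    # 384,400 km
print(quoted)       # 'Jan Twardowski'
```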

2.5.7. Unicode Literal

  • In Python 3 str is Unicode

  • In Python 2 str is Bytes

  • In Python 3 u'...' is only for compatibility with Python 2

u'zażółć gęślą jaźń'
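
In Python 3 the u prefix changes nothing, which a short sketch can confirm (the variable names are illustrative):

```python
legacy = u'Moon'
modern = 'Moon'

print(legacy == modern)     # True
print(type(legacy))         # <class 'str'>
```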

2.5.8. Bytes Literal

  • Used while reading from low level devices and drivers

  • Used in sockets and HTTP connections

  • bytes is a sequence of octets (integers between 0 and 255)

  • bytes.decode() conversion to unicode str

  • str.encode() conversion to bytes

b'this is bytes literals'
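
A minimal sketch of the round trip between str and bytes, assuming the UTF-8 encoding (the text above does not name one):

```python
text = 'zażółć'

raw = text.encode('utf-8')          # str -> bytes
print(type(raw))                    # <class 'bytes'>
print(len(text))                    # 6  (characters)
print(len(raw))                     # 10 (octets; non-ASCII letters take two each)

restored = raw.decode('utf-8')      # bytes -> str
print(restored == text)             # True

# Iterating over bytes yields integers between 0 and 255.
print(list(b'Moon'))                # [77, 111, 111, 110]
```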

2.5.9. Raw String

  • Backslash escape sequences are not interpreted; the text is taken literally

Listing 2.21. In Regular Expressions
r'[a-z0-9]\n'
print(r'C:\Users\Admin\file.txt')
# C:\Users\Admin\file.txt

print('C:\Users\Admin\file.txt')
# SyntaxError: (unicode error) 'unicodeescape'
#   codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

  • Problem: \Users

  • After \U Python expects a Unicode code point written as eight hexadecimal digits

  • s is not a hexadecimal digit
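
A raw string and a string with doubled backslashes hold the same value. The sketch below also shows a raw string used as a regular expression pattern; the sample sentence is illustrative:

```python
import re

raw_path = r'C:\Users\Admin\file.txt'
escaped_path = 'C:\\Users\\Admin\\file.txt'

print(raw_path == escaped_path)     # True

# \d must reach the re module unchanged, hence the raw string.
years = re.findall(r'\d{4}', 'Apollo 11 landed on the Moon in 1969')
print(years)                        # ['1969']
```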

2.5.10. Reading Input

  • input() returns str

  • Good practice: add space at the end of prompt

  • Good practice: always .strip() text from user input

  • Good practice: always sanitize values from user prompt

Listing 2.22. input() function argument is prompt text, which "invites" user to enter specific information. Note colon space (": ") at the end. Space is needed to separate user input from prompt.
name = input('What is your name: ')
# What is your name: Jan Twardowski<ENTER>

print(name)     # Jan Twardowski
type(name)      # <class 'str'>

Listing 2.23. input() always returns a str. To get a numeric value, cast the result to int.
age = input('What is your age: ')
# What is your age: 42<ENTER>

print(age)      # 42
type(age)       # <class 'str'>

age = int(age)
print(age)      # 42
type(age)       # <class 'int'>

Listing 2.24. Conversion to float handles decimals, which int does not support
age = input('What is your age: ')
# What is your age: 42.5<ENTER>

age = int(age)      # ValueError: invalid literal for int() with base 10: '42.5'
age = float(age)    # 42.5

print(age)          # 42.5
type(age)           # <class 'float'>

Listing 2.25. Conversion to float cannot handle comma (',') as a decimal separator
age = input('What is your age: ')
# What is your age: 42,5<ENTER>

age = int(age)      # ValueError: invalid literal for int() with base 10: '42,5'
age = float(age)    # ValueError: could not convert string to float: '42,5'
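
One possible way to follow the good practices above is sketched below. parse_number() is a hypothetical helper, not a built-in, and the list answers simulates user input so that the example runs without a prompt. It assumes the comma is only ever a decimal separator, so it would reject text with thousands separators such as '1,234.5'.

```python
def parse_number(text):
    """Hypothetical helper: convert user text to float, accepting ',' or '.'."""
    cleaned = text.strip().replace(',', '.')
    return float(cleaned)


# Simulated answers; in a real program each would come from input().
answers = ['42', ' 42.5 ', '42,5', 'forty-two']

for answer in answers:
    try:
        print(parse_number(answer))
    except ValueError:
        print(f'Not a number: {answer!r}')
# 42.0
# 42.5
# 42.5
# Not a number: 'forty-two'
```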

2.5.11. Concatenation

  • The preferred way to combine strings is f-string formatting

'a' + 'b'
# 'ab'

text1 = 'a'
text2 = 'b'

text1 + text2
# 'ab'

first_name = 'Jan'
last_name = 'Twardowski'

first_name + last_name
# 'JanTwardowski'

first_name + ' ' + last_name
# 'Jan Twardowski'

Listing 2.26. How many strings are there in memory?
first_name = 'Jan'
last_name = 'Twardowski'
age = 42

# How many strings are there in memory?
first_name + last_name

# How many strings are there in memory?
'Hello ' + first_name + ' ' + last_name + ' ' + str(age) + '!'

# How many strings are there in memory?
f'Hello {first_name} {last_name} {age}!'

'-' * 10            # '----------'
'Beetlejuice' * 3   # 'BeetlejuiceBeetlejuiceBeetlejuice'

'Mua' + 'Ha' * 2    # 'MuaHaHa'
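
The three approaches below give the same result. Conceptually, each + builds a new intermediate str, while the f-string and str.join() describe the final text in one expression. This is a sketch for comparison, not a benchmark; the variable names added, formatted and joined are illustrative.

```python
first_name = 'Jan'
last_name = 'Twardowski'
age = 42

added = 'Hello ' + first_name + ' ' + last_name + ' ' + str(age) + '!'
formatted = f'Hello {first_name} {last_name} {age}!'
joined = ' '.join(['Hello', first_name, last_name, str(age)]) + '!'

print(formatted)                        # Hello Jan Twardowski 42!
print(added == formatted == joined)     # True
```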

2.5.12. Assignments

2.5.12.1. Example

English
  • Ask user to input text

  • Print number of characters

Solution
text = input('Type text: ')
length = len(text)

print(length)

2.5.12.2. Emoticon Print

English
  1. Ask user to input name

  2. Print hello NAME EMOTICON, where:

    • NAME is a name read from user

    • EMOTICON is the Unicode code point U+1F642 (written '\U0001F642' in a Python string)

  3. Print length of a name, which was read from user

The whys and wherefores
  • Variable declaration

  • Print formatting

  • Reading input data from user

2.5.12.3. Variables and Types

  • Complexity level: easy

  • Lines of code to write: 3 lines

  • Estimated time of completion: 10 min

  • Solution: solution/type_str_input.py

English
  1. Ask user to input name

  2. To print use f-string formatting

  3. Note that the second line starts with a tab

  4. The value in double quotes is the name read from the user (in the output below, the user typed José Jiménez)

  5. Mind the different quotes, apostrophes, tabs and newlines

  6. Use neither spaces nor Enter for the indentation and the line break; use \t and \n instead

  7. Do not use string addition (str + str)

  8. Compare result with "Output" section (see below)

Output
'''My name... "José Jiménez".
    I'm an """astronaut!"""'''

The whys and wherefores
  • Variable declaration

  • Print formatting

  • Reading input data from user