2.5. Type Str

2.5.1. Definition

  • str is a sequence

    >>> data = ''
    >>> data = 'Jan Twardowski'
    >>> data =  'First line\nSecond line\nThird line'
    >>>
    >>> data = """First line
    ... Second line
    ... Third line"""
    

2.5.2. Type Casting

Builtin function str() converts argument to str

>>> str('Moon')
'Moon'
>>> str(1969)
'1969'
>>> str(1.337)
'1.337'

Builtin function print() before printing on the screen runs str() on its arguments:

>>> print(1969)
1969

2.5.3. Single and Double Quotes

  • " and ' works the same

  • Choose one and keep consistency in code

  • Python console prefers single quote (') character

  • It matters for doctest, which compares two outputs character by character

  • PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters

Python console prefers single quote ('):

>>> data = 'My name is José Jiménez'
>>> data
'My name is José Jiménez'

Python console prefers single quote ('):

>>> data = "My name is José Jiménez"
>>> data
'My name is José Jiménez'

It's better to use double quotes, when text has apostrophes. This is the behavior of Python console:

>>> data = 'My name\'s José Jiménez'
>>> data
"My name's José Jiménez"

HTML and XML uses double quotes to enclose attribute values, hence it's better to use single quotes for the string:

>>> data = '<a href="http://python.astrotech.io">Python and Machine Learning</a>'
>>> data
'<a href="http://python.astrotech.io">Python and Machine Learning</a>'

PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters

>>> data = """My name's \"José Jiménez\""""
>>> data = '''My name\'s "José Jiménez"'''

2.5.4. Docstring

  • PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters

  • More information in Function Doctest

If assigned to variable, it serves as multiline str otherwise it's a docstring:

>>> TEXT = """
... We choose to go to the Moon!
... We choose to go to the Moon in this decade and do the other things,
... not because they are easy, but because they are hard;
... because that goal will serve to organize and measure the best of our energies and skills,
... because that challenge is one that we are willing to accept, one we are unwilling to postpone,
... and one we intend to win, and the others, too.
... """

2.5.5. Escape Characters

  • \n - New line (ENTER)

  • \t - Horizontal Tab (TAB)

  • \' - Single quote ' (escape in single quoted strings)

  • \" - Double quote " (escape in double quoted strings)

  • \\ - Backslash \ (to indicate, that this is not escape char)

  • More information in Builtin Printing

  • https://en.wikipedia.org/wiki/List_of_Unicode_characters

    >>> print('\U0001F680')
    🚀
    
    >>> a = '\U0001F9D1'  # 🧑
    >>> b = '\U0000200D'  # ''
    >>> c = '\U0001F680'  # 🚀
    >>>
    >>> astronaut = a + b + c
    >>> print(astronaut)
    🧑‍🚀
    

2.5.6. Format String

  • String interpolation (variable substitution)

  • Since Python 3.6

  • Used for str concatenation

    >>> name = 'José Jiménez'
    >>>
    >>> print(f'My name... {name}')
    My name... José Jiménez
    
    >>> firstname = 'José'
    >>> lastname = 'Jiménez'
    >>>
    >>> result = f'My name... {firstname} {lastname}'
    >>> print(result)
    My name... José Jiménez
    

2.5.7. Unicode Literal

  • In Python 3 str is Unicode

  • In Python 2 str is Bytes

  • In Python 3 u'...' is only for compatibility with Python 2

    >>> u'zażółć gęślą jaźń'
    'zażółć gęślą jaźń'
    

2.5.8. Bytes Literal

  • Used while reading from low level devices and drivers

  • Used in sockets and HTTP connections

  • bytes is a sequence of octets (integers between 0 and 255)

  • bytes.decode() conversion to unicode str

  • str.encode() conversion to bytes

    >>> data = 'Moon'   # Unicode Literal
    >>> data = u'Moon'  # Unicode Literal
    >>> data = b'Moon'  # Bytes Literal
    
    >>> data = 'Moon'
    >>>
    >>> type(data)
    <class 'str'>
    >>> data.encode()
    b'Moon'
    

    >>>data = b'Moon' >>> >>> type(data) <class 'bytes'> >>> data.decode() 'Moon'

2.5.9. Raw String

  • Escapes does not matters

In Regular Expressions:

>>> r'[a-z0-9]\n'
'[a-z0-9]\\n'
>>> print(r'C:\Users\Admin\file.txt')
C:\Users\Admin\file.txt
>>>
>>> print('C:\\Users\\Admin\\file.txt')
C:\Users\Admin\file.txt
>>>
>>> print('C:\Users\Admin\file.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
  • Problem: \Users

  • after \U... python expects Unicode codepoint in hex i.e. '\U0001F680' which is 🚀 emoticon

  • s is invalid hexadecimal character

  • Only valid characters are 0123456789abcdefABCDEF

2.5.10. Concatenation

  • Preferred string concatenation is using f-string formatting

    >>> 'a' + 'b'
    'ab'
    >>> '1' + '2'
    '12'
    
    >>> a = 'a'
    >>> b = 'b'
    >>>
    >>> a + b
    'ab'
    
    >>> a = '1'
    >>> b = '2'
    >>>
    >>> a + b
    '12'
    
    >>> '*' * 10
    '**********'
    >>> 'Beetlejuice' * 3
    'BeetlejuiceBeetlejuiceBeetlejuice'
    >>> 'Mua' + 'Ha' * 2
    'MuaHaHa'
    >>> 'Mua' + ('Ha'*2)
    'MuaHaHa'
    >>> ('Mua'+'Ha') * 2
    'MuaHaMuaHa'
    
    >>> firstname = 'Jan'
    >>> lastname = 'Twardowski'
    >>>
    >>> firstname + lastname
    'JanTwardowski'
    >>>
    >>> firstname + ' ' + lastname
    'Jan Twardowski'
    

2.5.11. Length

>>> len('hello')
5

2.5.12. Reading Input

  • input() returns str

  • Good practice: add space at the end of prompt

  • Good practice: always .strip() text from user input

  • Good practice: always sanitize values from user prompt

input() function argument is prompt text, which "invites" user to enter specific information. Note colon space (": ") at the end. Space is needed to separate user input from prompt.

Note, that the line input = lambda x: 'Mark Watney' is only for testing purposes (it is called "Stub"), and you should not do that in your programs! This assumes, that user will input str Mark Watney:

>>> # Assume user will input 'Mark Watney' and then hit ENTER key
>>> input = lambda x: 'Mark Watney'  # Don't do this in your code
>>>
>>> name = input('What is your name: ')
>>>
>>> print(name)
Mark Watney
>>> type(name)
<class 'str'>

input() always returns a str. To get numeric value type casting to int is needed.

>>> # Assume user will input '42' and then hit ENTER key
>>> input = lambda x: '42'  # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> print(age)
42
>>> type(age)
<class 'str'>
>>>
>>> age = int(age)
>>> print(age)
42
>>> type(age)
<class 'int'>

Conversion to float handles decimals, which int does not support:

>>> # Assume user will input '42.5' and then hit ENTER key
>>> input = lambda x: '42.5'  # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> age = int(age)
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '42.5'
>>>
>>> age = float(age)
>>> print(age)
42.5
>>> type(age)
<class 'float'>

Conversion to float cannot handle comma (',') as a decimal separator:

>>> # Assume user will input '42,5' and then hit ENTER key
>>> input = lambda x: '42,5'  # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> age = int(age)
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '45,5'
>>>
>>> age = float(age)
Traceback (most recent call last):
ValueError: could not convert string to float: '45,5'

2.5.13. Assignments

Code 2.20. Solution
"""
* Assignment: Type String Input
* Filename: type_str_input.py
* Complexity: easy
* Lines of code: 1 lines
* Time: 2 min

English:
    1. Ask user to input text `NASA` (case sensitive)
    2. Define `result` with text from user

Polish:
    1. Poproś użytkownika o wprowadzenie tekstu `NASA` (wielkość liter ma znaczenie)
    2. Zdefiniuj `result` z tekstem wprowadzonym od użytkownika

Tests:
    >>> type(result)
    <class 'str'>
    >>> result
    'NASA'
"""


Code 2.21. Solution
"""
* Assignment: Type String Emoticon
* Filename: type_str_emoticon.py
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Define `name` with value `Mark Watney`
    2. Print `Hello NAME EMOTICON`, where:
        a. NAME is a name read from user
        b. EMOTICON is Unicode Codepoint "\U0001F642"
    3. Compare result with "Tests" section (see below)

Polish:
    1. Zdefiniuj `name` z wartością `Mark Watney`
    2. Wypisz `Hello NAME EMOTICON`, gdzie:
        a. NAME to imię wczytane od użytkownika
        b. EMOTICON to Unicode Codepoint "\U0001F642"
    3. Porównaj wyniki z sekcją "Tests" (patrz poniżej)

Tests:
    >>> type(result)
    <class 'str'>
    >>> '\U0001F642' in result
    True
    >>> name in result
    True
    >>> result
    'Hello Mark Watney 🙂'
"""

Code 2.22. Solution
"""
* Assignment: Type String Quotes
* Filename: type_str_quotes.py
* Complexity: easy
* Lines of code: 1 lines
* Time: 5 min

English:
    1. Use data from "Given" section (see below)
    2. To print use f-string formatting
    3. Note, that second line starts with tab
    4. Value `NAME` in double quotes is a name read from user
    5. Mind the different quotes, apostrophes, tabs and newlines
    6. Do not use neither space not enter - use `\n` and `\t`
    7. Do not use string addition (`str + str`)
    8. Compare result with "Tests" section (see below)

Polish:
    1. Użyj danych z sekcji "Given" (patrz poniżej)
    2. Do wypisania użyj f-string formatting
    3. Zauważ, że druga linijka zaczyna się od tabulacji
    4. Wartość `NAME` w podwójnych cudzysłowach to ciąg od użytkownika
    5. Zwróć uwagę na znaki apostrofów, cudzysłowów, tabulacji i nowych linii
    6. Nie używaj spacji ani klawisza enter - użyj `\n` i `\t`
    7. Nie korzystaj z dodawania stringów (`str + str`)
    8. Porównaj wyniki z sekcją "Tests" (patrz poniżej)

Tests:
    >>> print(result)  # doctest: +NORMALIZE_WHITESPACE
    '''My name... "José Jiménez".
        I'm an \"\"\"astronaut!\"\"\"'''
"""


# Given
name = 'José Jiménez'