2.5. Type Str
2.5.1. Definition
str is a sequence of characters
>>> data = ''
>>> data = 'Jan Twardowski'
>>> data = 'First line\nSecond line\nThird line'
>>>
>>> data = """First line
... Second line
... Third line"""
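Because str is a sequence, indexing, slicing, len() and the in operator all work on it. Below is a minimal sketch that reuses 'Jan Twardowski' from above. The variable names are only illustrative:

```python
# A str behaves like a read-only sequence of characters
data = 'Jan Twardowski'

first = data[0]        # 'J'
last = data[-1]        # 'i'
part = data[4:9]       # 'Tward'
length = len(data)     # 14
has_w = 'w' in data    # True
letters = [char for char in data[:3]]  # ['J', 'a', 'n']

# Item assignment is not allowed, because str is immutable
try:
    data[0] = 'j'
except TypeError:
    immutable = True
else:
    immutable = False
```

The last part shows that a str cannot be changed in place. To get a different text, you always build a new str.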
2.5.2. Type Casting
Builtin function str() converts its argument to str
>>> str('Moon')
'Moon'
>>> str(1969)
'1969'
>>> str(1.337)
'1.337'
Before printing to the screen, builtin function print() calls str() on its arguments:
>>> print(1969)
1969
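The following sketch shows that print() relies on str(). It uses a hypothetical Astronaut class that defines __str__(). The io.StringIO buffer and contextlib.redirect_stdout are there only to capture the printed text so it can be inspected:

```python
import contextlib
import io


class Astronaut:
    """Hypothetical class used only to show what print() does."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # str(obj) calls this method
        return f'Astronaut {self.name}'


mark = Astronaut('Mark Watney')
text = str(mark)  # 'Astronaut Mark Watney'

# Capture what print() writes to the screen
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(mark, 1969)
printed = buffer.getvalue()  # 'Astronaut Mark Watney 1969\n'
```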
2.5.3. Single and Double Quotes
" and ' work the same
Choose one and use it consistently in your code
Python console prefers the single quote (') character
It matters for doctest, which compares two outputs character by character
PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters
Python console prefers single quote ('):
>>> data = 'My name is José Jiménez'
>>> data
'My name is José Jiménez'
Python console prefers single quote (') even if the str was defined with double quotes:
>>> data = "My name is José Jiménez"
>>> data
'My name is José Jiménez'
When the text contains an apostrophe, it is better to use double quotes. The Python console behaves the same way:
>>> data = 'My name\'s José Jiménez'
>>> data
"My name's José Jiménez"
HTML and XML use double quotes to enclose attribute values, hence it's better to use single quotes for the string:
>>> data = '<a href="http://python.astrotech.io">Python and Machine Learning</a>'
>>> data
'<a href="http://python.astrotech.io">Python and Machine Learning</a>'
PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters
>>> data = """My name's \"José Jiménez\""""
>>> data = '''My name\'s "José Jiménez"'''
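The console displays a str value using repr(). This short sketch shows how repr() picks the quotes based on the content, and checks that the two triple-quoted literals above hold the same value. The sample texts are illustrative:

```python
# repr() is what the Python console shows for a str value
single = repr('Moon')                # console shows: 'Moon'
apostrophe = repr("My name's Mark")  # console shows: "My name's Mark"
html = repr('<a href="x">')          # console shows: '<a href="x">'

# Both triple-quoted literals from above produce the same value
a = """My name's \"José Jiménez\""""
b = '''My name\'s "José Jiménez"'''
same = a == b  # True
```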
2.5.4. Docstring
PEP 257 -- Docstring Conventions: For multiline str always use three double quote (""") characters
More information in Function Doctest
If assigned to a variable, it serves as a multiline str; if it is the first statement in a module, class or function, it is a docstring:
>>> TEXT = """
... We choose to go to the Moon!
... We choose to go to the Moon in this decade and do the other things,
... not because they are easy, but because they are hard;
... because that goal will serve to organize and measure the best of our energies and skills,
... because that challenge is one that we are willing to accept, one we are unwilling to postpone,
... and one we intend to win, and the others, too.
... """
2.5.5. Escape Characters
\n - New line (ENTER)
\t - Horizontal Tab (TAB)
\' - Single quote ' (escape in single quoted strings)
\" - Double quote " (escape in double quoted strings)
\\ - Backslash \ (to indicate that it is not an escape character)
More information in Builtin Printing
https://en.wikipedia.org/wiki/List_of_Unicode_characters
>>> print('\U0001F680')
🚀
>>> a = '\U0001F9D1'  # 🧑
>>> b = '\U0000200D'  # zero width joiner (invisible)
>>> c = '\U0001F680'  # 🚀
>>>
>>> astronaut = a + b + c
>>> print(astronaut)
🧑‍🚀
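Each escape sequence is stored as a single character. Here is a small sketch with illustrative values:

```python
text = 'First\tSecond\nThird'
lines = text.split('\n')        # ['First\tSecond', 'Third']
columns = lines[0].split('\t')  # ['First', 'Second']

quote = 'My name\'s Mark'       # same value as "My name's Mark"
path = 'C:\\Temp'               # escaped backslash, so only one '\' is stored
path_length = len(path)         # 7

# Zero width joiner sequence: three code points, which a supporting font shows as one glyph
astronaut = '\U0001F9D1' + '\u200D' + '\U0001F680'
astronaut_length = len(astronaut)  # 3
```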
2.5.6. Format String
String interpolation (variable substitution)
Since Python 3.6
Used for str concatenation
>>> name = 'José Jiménez'
>>>
>>> print(f'My name... {name}')
My name... José Jiménez
>>> firstname = 'José'
>>> lastname = 'Jiménez'
>>>
>>> result = f'My name... {firstname} {lastname}'
>>> print(result)
My name... José Jiménez
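The braces in an f-string accept any expression. An optional format specification can follow a colon. In this illustrative sketch, height is a made-up value:

```python
firstname = 'José'
lastname = 'Jiménez'
height = 1.8456

# Any expression can go inside the braces
greeting = f'Hello {firstname.upper()} {lastname}'  # 'Hello JOSÉ Jiménez'

# A format spec after the colon controls how a value is displayed
size = f'Height: {height:.2f} m'                    # 'Height: 1.85 m'

# !r uses repr() instead of str()
quoted = f'Name: {firstname!r}'                     # "Name: 'José'"
```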
2.5.7. Unicode Literal
In Python 3 str is Unicode
In Python 2 str is bytes
In Python 3 u'...' is only for compatibility with Python 2
>>> u'zażółć gęślą jaźń'
'zażółć gęślą jaźń'
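In Python 3 the u prefix produces an ordinary str, and len() counts Unicode characters rather than bytes. Here is a minimal check:

```python
plain = 'zażółć gęślą jaźń'
prefixed = u'zażółć gęślą jaźń'  # u prefix is allowed only for Python 2 compatibility

same_value = plain == prefixed      # True
same_type = type(prefixed) is str   # True
characters = len(plain)             # 17, counting characters, not bytes
```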
2.5.8. Bytes Literal
Used while reading from low level devices and drivers
Used in sockets and HTTP connections
bytes is a sequence of octets (integers between 0 and 255)
bytes.decode() - conversion to Unicode str
str.encode() - conversion to bytes
>>> data = 'Moon'  # Unicode Literal
>>> data = u'Moon'  # Unicode Literal
>>> data = b'Moon'  # Bytes Literal
>>> data = 'Moon'
>>>
>>> type(data)
<class 'str'>
>>> data.encode()
b'Moon'
>>> data = b'Moon'
>>>
>>> type(data)
<class 'bytes'>
>>> data.decode()
'Moon'
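Indexing bytes yields integers, and encode()/decode() convert between the two types. This sketch names UTF-8 explicitly, which is also the default encoding for str.encode() and bytes.decode():

```python
data = b'Moon'

first = data[0]       # 77, the integer code of 'M'
octets = list(data)   # [77, 111, 111, 110]

text = 'zażółć'
encoded = text.encode('utf-8')     # str -> bytes
size_in_bytes = len(encoded)       # 10, each Polish letter here takes 2 bytes in UTF-8
decoded = encoded.decode('utf-8')  # bytes -> str
```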
2.5.9. Raw String
Escape sequences are not interpreted; each backslash is kept as is
Used in regular expressions:
>>> r'[a-z0-9]\n'
'[a-z0-9]\\n'
>>> print(r'C:\Users\Admin\file.txt')
C:\Users\Admin\file.txt
>>>
>>> print('C:\\Users\\Admin\\file.txt')
C:\Users\Admin\file.txt
>>>
>>> print('C:\Users\Admin\file.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Problem: in \Users, after \U Python expects a Unicode codepoint written as 8 hexadecimal digits, e.g. '\U0001F680', which is the 🚀 emoticon
The letter s is not a valid hexadecimal character; the only valid characters are 0123456789abcdefABCDEF
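Raw strings matter most when backslashes must reach another tool unchanged, such as the re module. Here is a short sketch with an illustrative sentence:

```python
import re

normal_length = len('\n')  # 1: a single newline character
raw_length = len(r'\n')    # 2: a backslash and the letter n

# \d matches a digit; the raw string passes the backslash to the re module untouched
numbers = re.findall(r'\d+', 'Apollo 11 landed on the Moon in 1969')  # ['11', '1969']

# A raw string and an escaped string can describe the same path
same_path = r'C:\Users\Admin\file.txt' == 'C:\\Users\\Admin\\file.txt'  # True
```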
2.5.10. Concatenation
The preferred way to concatenate strings is f-string formatting
>>> 'a' + 'b'
'ab'
>>> '1' + '2'
'12'
>>> a = 'a'
>>> b = 'b'
>>>
>>> a + b
'ab'
>>> a = '1'
>>> b = '2'
>>>
>>> a + b
'12'
Operator * repeats a str given number of times:
>>> '*' * 10
'**********'
>>> 'Beetlejuice' * 3
'BeetlejuiceBeetlejuiceBeetlejuice'
>>> 'Mua' + 'Ha' * 2
'MuaHaHa'
>>> 'Mua' + ('Ha'*2)
'MuaHaHa'
>>> ('Mua'+'Ha') * 2
'MuaHaMuaHa'
>>> firstname = 'Jan'
>>> lastname = 'Twardowski'
>>>
>>> firstname + lastname
'JanTwardowski'
>>>
>>> firstname + ' ' + lastname
'Jan Twardowski'
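This sketch produces the same result with +, with an f-string and with str.join(). The str.join() method is not covered above. It is shown only as an alternative when there are many pieces:

```python
firstname = 'Jan'
lastname = 'Twardowski'

with_plus = firstname + ' ' + lastname    # 'Jan Twardowski'
with_fstring = f'{firstname} {lastname}'  # 'Jan Twardowski'

# For many pieces, str.join() glues them with a separator in one call
with_join = ' '.join([firstname, lastname])      # 'Jan Twardowski'
csv_line = ','.join(['Mark', 'Watney', '1969'])  # 'Mark,Watney,1969'

# + works only between str values; numbers must be converted first
needs_cast = False
try:
    'Apollo ' + 11
except TypeError:
    needs_cast = True
label = 'Apollo ' + str(11)  # 'Apollo 11'
```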
2.5.11. Length
>>> len('hello')
5
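len() counts every character, including spaces and escape sequences such as \t. Here are a few illustrative values:

```python
empty = len('')                   # 0
name = len('Mark Watney')         # 11, the space counts too
escaped = len('a\tb\n')           # 4, each escape sequence is one character
stripped = len('  Mark  '.strip())  # 4 after removing surrounding spaces
```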
2.5.12. Reading Input
input() returns str
Good practice: add space at the end of prompt
Good practice: always .strip() text from user input
Good practice: always sanitize values from user prompt
The argument of input() is the prompt text, which "invites" the user to enter specific information. Note the colon and space (": ") at the end. The space separates the user's input from the prompt.
Note that the line input = lambda x: 'Mark Watney' is only for testing purposes (it is called a "stub"), and you should not do that in your programs! It assumes that the user will input the str Mark Watney:
>>> # Assume user will input 'Mark Watney' and then hit ENTER key
>>> input = lambda x: 'Mark Watney' # Don't do this in your code
>>>
>>> name = input('What is your name: ')
>>>
>>> print(name)
Mark Watney
>>> type(name)
<class 'str'>
input() always returns a str. To get a numeric value, type casting to int is needed:
>>> # Assume user will input '42' and then hit ENTER key
>>> input = lambda x: '42' # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> print(age)
42
>>> type(age)
<class 'str'>
>>>
>>> age = int(age)
>>> print(age)
42
>>> type(age)
<class 'int'>
Conversion to float handles decimals, which int does not support:
>>> # Assume user will input '42.5' and then hit ENTER key
>>> input = lambda x: '42.5' # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> age = int(age)
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '42.5'
>>>
>>> age = float(age)
>>> print(age)
42.5
>>> type(age)
<class 'float'>
Conversion to float cannot handle comma (',') as a decimal separator:
>>> # Assume user will input '42,5' and then hit ENTER key
>>> input = lambda x: '42,5' # Don't do this in your code
>>>
>>> age = input('What is your age: ')
>>>
>>> age = int(age)
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '42,5'
>>>
>>> age = float(age)
Traceback (most recent call last):
ValueError: could not convert string to float: '42,5'
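Following the good practices listed above (.strip() and sanitizing input), one option is to clean the text before conversion so that both separators are accepted. parse_number() is a hypothetical helper, not part of Python. This is a sketch, not a complete validator:

```python
def parse_number(text: str) -> float:
    """Hypothetical helper: clean raw user input and convert it to float.

    Strips surrounding whitespace and accepts a comma as decimal separator.
    float() still raises ValueError if the cleaned text is not a number.
    """
    cleaned = text.strip().replace(',', '.')
    return float(cleaned)


age = parse_number(' 42,5 \n')  # 42.5
```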
2.5.13. Assignments
"""
* Assignment: Type String Input
* Filename: type_str_input.py
* Complexity: easy
* Lines of code: 1 line
* Time: 2 min
English:
1. Ask user to input text `NASA` (case sensitive)
2. Define `result` with text from user
Tests:
>>> type(result)
<class 'str'>
>>> result
'NASA'
"""
"""
* Assignment: Type String Emoticon
* Filename: type_str_emoticon.py
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min
English:
1. Define `name` with value `Mark Watney`
2. Define `result` with text `Hello NAME EMOTICON`, where:
a. NAME is the value of `name`
b. EMOTICON is Unicode Codepoint "\U0001F642"
3. Compare result with "Tests" section (see below)
Tests:
>>> type(result)
<class 'str'>
>>> '\U0001F642' in result
True
>>> name in result
True
>>> result
'Hello Mark Watney 🙂'
"""
"""
* Assignment: Type String Quotes
* Filename: type_str_quotes.py
* Complexity: easy
* Lines of code: 1 line
* Time: 5 min
English:
1. Use data from "Given" section (see below)
2. Define `result` using f-string formatting
3. Note that the second line starts with a tab
4. The value in double quotes is `name` from the "Given" section
5. Mind the different quotes, apostrophes, tabs and newlines
6. Do not type a tab or press Enter inside the str; use `\n` and `\t` instead
7. Do not use string addition (`str + str`)
8. Compare result with "Tests" section (see below)
Tests:
>>> print(result) # doctest: +NORMALIZE_WHITESPACE
'''My name... "José Jiménez".
I'm an \"\"\"astronaut!\"\"\"'''
"""
# Given
name = 'José Jiménez'