3.6. String Literals¶
3.6.1. Escape Characters¶
\n
- New line (ENTER)\t
- Horizontal Tab (TAB)\'
- Single quote'
(escape in single quoted strings)\"
- Double quote"
(escape in double quoted strings)\\
- Backslash\
(to indicate, that this is not escape char)More information in Builtin Printing
>>> print('Hello\World')
Hello\World
>>> print('Hello\nWorld')
Hello
World
>>> print('Hello\tWorld')
Hello World
3.6.2. Unicode¶
>>> print('\U0001F680')
🚀
>>> a = '\U0001F9D1' # 🧑
>>> b = '\U0000200D' # ''
>>> c = '\U0001F680' # 🚀
>>>
>>> astronaut = a + b + c
>>> print(astronaut)
🧑🚀
3.6.3. Format String¶
String interpolation (variable substitution)
Since Python 3.6
Used for
str
concatenation
>>> name = 'Mark'
>>>
>>> print('Hello {name}')
Hello {name}
>>> name = 'Mark'
>>> print(f'Hello {name}')
Hello Mark
3.6.4. Unicode Literal¶
In Python 3
str
is UnicodeIn Python 2
str
is BytesIn Python 3
u'...'
is only for compatibility with Python 2
>>> u'zażółć gęślą jaźń'
'zażółć gęślą jaźń'
3.6.5. Bytes Literal¶
Used while reading from low level devices and drivers
Used in sockets and HTTP connections
bytes
is a sequence of octets (integers between 0 and 255)bytes.decode()
conversion to unicodestr
str.encode()
conversion tobytes
>>> data = 'Moon' # Unicode Literal
>>> data = u'Moon' # Unicode Literal
>>> data = b'Moon' # Bytes Literal
Encode string from unicode (UTF-8) string to bytes:
>>> data = 'cześć'
>>>
>>> data.encode()
b'cze\xc5\x9b\xc4\x87'
Decode string from bytes to unicode (UTF-8):
>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'
Unicode (UTF-8) is a default encoding. You can also specify different encodings to encode and decode data:
>>> data = 'cześć'
>>>
>>>
>>> data.encode('utf-8')
b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.encode('iso-8859-2')
b'cze\xb6\xe6'
>>>
>>> data.encode('windows-1250')
b'cze\x9c\xe6'
>>>
>>> data.encode('cp1250')
b'cze\x9c\xe6'
3.6.6. Raw String¶
Escapes does not matters
>>> print('Print "\n" to get new line')
Print "
" to get new line
>>> print('Print "\\n" to get new line')
Print "\n" to get new line
3.6.7. Use Case - 0x01¶
Raw-string in Regular Expressions:
>>> '\\b[a-z]+\\b'
'\\b[a-z]+\\b'
>>> r'\b[a-z]+\b'
'\\b[a-z]+\\b'
3.6.8. Use Case - 0x02¶
Raw-string in escaping tab character:
>>> print('C:\watney\temporary.txt')
C:\watney emporary.txt
>>>
>>> print(r'C:\watney\temporary.txt')
C:\watney\temporary.txt
Raw-string in escaping newline character:
>>> print('C:\nasa\myfile.txt')
C:
asa\myfile.txt
>>>
>>> print(r'C:\nasa\myfile.txt')
C:\nasa\myfile.txt
Raw-string in escaping newline and tab character:
>>> print('C:\nasa\temporary.txt')
C:
asa emporary.txt
>>>
>>> print(r'C:\nasa\temporary.txt')
C:\nasa\temporary.txt
3.6.9. Use Case - 0x03¶
There are no problems with escapes in POSIX compliant paths:
>>> path = '/home/mwatney/myfile.txt' # Linux
>>> path = '/User/mwatney/myfile.txt' # macOS
In Windows you can find escape character in paths. In order to avoid problems you can use slashes instead of backslashes:
>>> path = 'c:/Users/mwatney/myfile.txt'
This is not typical for this operating system, therefore hardly anyone does that. Typically users will put paths using slashes, and that's ok, if you are using escaped slashes or raw-strings:
>>> path = 'c:\\Users\\mwatney\\myfile.txt'
>>> path = r'c:\Users\mwatney\myfile.txt'
As soon as you forget about using either of them, the problem occurs:
>>> path = 'c:\Users\mwatney\myfile.txt'
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Problem is with \Users
. After escape sequence \U...
Python expects
hexadecimal Unicode codepoint, i.e. '\U0001F680' which is a rocket 🚀
emoticon. In this example, Python finds letter s
, which is invalid
hexadecimal character and therefore raises an SyntaxError
telling user
that there is an error with decoding bytes. The only valid hexadecimal
numbers are 0123456789abcdefABCDEF
and s
isn't one of them.
3.6.10. Assignments¶
"""
* Assignment: String Literals Emoticon
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min
English:
1. Print `Hello 😀`
2. Run doctests - all must succeed
Polish:
1. Wypisz `Hello 😀`
2. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* 😀 unicode codepoint is `\U0001F600`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> '😀' in result
True
>>> result
'Hello 😀'
"""
# Expected result: 'Hello 😀'
# type: str
result = ...