3.7. Sequence Slice

3.7.1. Rationale

  • Slice arguments must be int (positive, negative or zero)

  • Positive indexes start at 0 and count from the left

  • Negative indexes start at -1 and count from the right
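The rules above can be sketched as follows. This is a minimal illustration; the variable names are made up for the example. Note that None is also accepted, since it is what an omitted argument stands for.

```python
text = 'We choose to go to the Moon!'

# Positive indexes count from the left, starting with 0
first = text[0]          # 'W'

# Negative indexes count from the right, starting with -1
last = text[-1]          # '!'

# None behaves like an omitted argument
head = text[None:2]      # same as text[:2]

# A float is not a valid slice argument
try:
    text[1.0:3.0]
except TypeError as error:
    message = str(error)
```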

3.7.2. Slice Forwards

  • sequence[start:stop]

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[0:2]
'We'
>>> text[:2]
'We'
>>> text[0:9]
'We choose'
>>> text[:9]
'We choose'
>>> text[23:28]
'Moon!'
>>> text[23:]
'Moon!'

3.7.3. Slice Backwards

  • A negative index counts from the end and goes right to left

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:-13]
'We choose to go'
>>> text[:-19]
'We choose'
>>> text[-12:]
'to the Moon!'
>>> text[-5:]
'Moon!'
>>> text[-5:-1]
'Moon'
>>> text[23:-2]
'Moo'
>>>
>>> text[-1:0]
''
>>> text[-2:0]
''
>>> text[-2:2]
''
>>> text[-5:5]
''
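The empty results above follow from how a negative index is resolved: it refers to the same position as len(sequence) plus that index, and a forward slice selects nothing when the resolved start lies to the right of stop. A small sketch with illustrative variable names:

```python
text = 'We choose to go to the Moon!'
length = len(text)                      # 28

# A negative index i points at position len(text) + i
same_tail = text[-5:] == text[length - 5:]
same_head = text[:-13] == text[:length - 13]

# text[-2:2] is resolved as text[26:2]; start is right of stop,
# so with the default step of 1 nothing is selected
resolved = text[length - 2:2]
```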

3.7.4. Step

  • Every n-th element

  • sequence[start:stop:step]

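  • start defaults to 0 (to the last element when step is negative)

  • stop defaults to len(sequence) (to just before the first element when step is negative)

  • step defaults to 1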

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[::1]
'We choose to go to the Moon!'
>>> text[::2]
'W hoet ot h on'
>>> text[::-1]
'!nooM eht ot og ot esoohc eW'
>>> text[::-2]
'!oMeto go soce'
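The defaults can be inspected with the built-in `slice.indices()` method, which resolves omitted arguments for a sequence of a given length. This sketch only shows how the defaults depend on the sign of step; the variable names are illustrative.

```python
text = 'We choose to go to the Moon!'

# slice.indices(length) returns the resolved (start, stop, step)
forward = slice(None, None, 2).indices(len(text))     # (0, 28, 2)
backward = slice(None, None, -1).indices(len(text))   # (27, -1, -1)

# The resolved values are meant for range(), where a stop of -1
# means "one step before index 0"
rebuilt = ''.join(text[i] for i in range(*backward))
```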

3.7.5. Out of Range

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:100]
'We choose to go to the Moon!'
>>>
>>> text[100:]
''
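Slice bounds that fall outside the sequence are clamped to its length, while a single index outside the sequence raises an error. A short sketch of the difference:

```python
text = 'We choose to go to the Moon!'

clamped = text[20:100]        # stop is clamped to len(text)
empty = text[100:200]         # both bounds past the end give ''

try:
    text[100]                 # a single index is not clamped
except IndexError as error:
    message = str(error)
```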

3.7.6. Ordered Sequences

Slicing str:

>>> data = 'abcde'
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'

Slicing tuple:

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')

Slicing list:

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']
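The same syntax applies to other ordered built-in sequences, and the result has the same type as the sliced object. The sketch below uses range and bytes as additional examples; they are not covered in the text above.

```python
numbers = range(10)
evens = numbers[::2]          # a new range object: range(0, 10, 2)

raw = b'abcde'
head = raw[:3]                # a new bytes object: b'abc'
```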

3.7.7. Unordered Sequences

Slicing set is not possible:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable

Slicing frozenset is not possible:

>>> data = frozenset({'a', 'b', 'c', 'd', 'e'})
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'frozenset' object is not subscriptable
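One possible workaround, shown here only as a sketch, is to convert the set to an ordered sequence first. `sorted()` returns a list, so the order is well defined and the result can be sliced.

```python
data = {'a', 'b', 'c', 'd', 'e'}

# sorted() returns a list, which is ordered and can be sliced
first_three = sorted(data)[:3]
```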

3.7.8. Nested Sequences

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]  # doctest: +NORMALIZE_WHITESPACE
[[1, 2, 3],
 [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]
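Chained brackets are evaluated left to right, each one applied to the result of the previous one. The last expression above can be unrolled into steps; the intermediate names are illustrative.

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

rows = data[::2]        # every second row: [[1, 2, 3], [7, 8, 9]]
row = rows[1]           # second of those rows: [7, 8, 9]
tail = row[1:]          # that row without its first element: [8, 9]
```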

3.7.9. Slice All

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:]
'We choose to go to the Moon!'
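On a list, slicing all elements is a common way to make a shallow copy: the outer list is new, but nested objects are still shared. A minimal sketch with illustrative names:

```python
original = [[1, 2, 3], [4, 5, 6]]
duplicate = original[:]

duplicate.append([7, 8, 9])     # affects only the copy
duplicate[0][0] = 100           # inner lists are shared with the original
```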

Used in numpy to get all rows or columns:

>>> import numpy as np
>>>
>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6],
...                  [7, 8, 9]])
...
>>> data[:, 1]
array([2, 5, 8])
>>>
>>> data[1, :]
array([4, 5, 6])

Unfortunately, the comma-separated form does not work on a list, which accepts only an int or a slice:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]
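With a plain list of lists, a column can be selected by slicing or indexing each row in turn. The sketch below uses list comprehensions as one way to do it; the comments name the numpy expressions they roughly correspond to.

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

column = [row[1] for row in data]           # like data[:, 1] in numpy
block = [row[1:] for row in data[:2]]       # like data[:2, 1:] in numpy
```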

Used in pandas to get all rows or columns:

>>> import pandas as pd
>>> pd.set_option('display.max_columns', 10)
>>>
>>>
>>> df = pd.DataFrame([
...     {'A': 1, 'B': 2, 'C': 3},
...     {'A': 4, 'B': 5, 'C': 6},
...     {'A': 7, 'B': 8, 'C': 9}])
>>>
>>> df
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
>>>
>>> df.loc[:, ['A', 'B']]
   A  B
0  1  2
1  4  5
2  7  8
>>>
>>> df.loc[::2, ::2]
   A  C
0  1  3
2  7  9
>>>
>>> df.loc[1, :]
A    4
B    5
C    6
Name: 1, dtype: int64

3.7.10. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'
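Because slice arguments are ordinary int expressions, bounds can be computed from other values. The sketch below selects a fixed-width window and splits the text in half; `start`, `width` and `middle` are hypothetical names chosen for the example.

```python
text = 'We choose to go to the Moon!'
start = 3
width = 6

word = text[start:start + width]        # window of 6 characters from index 3
rest = text[start + width + 1:]         # everything after the window and a space

middle = len(text) // 2                 # integer division keeps the index an int
left, right = text[:middle], text[middle:]
```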

3.7.11. Slice Function

  • slice(start, stop, step) creates a reusable slice object

  • sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

  • None stands for an omitted argument

  • step defaults to None, which behaves like 1

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'
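A slice object is independent of any particular sequence, so it can be given a name and reused. The sketch below uses hypothetical constants `HEAD` and `EVERY_SECOND` to show this.

```python
HEAD = slice(None, 2)
EVERY_SECOND = slice(None, None, 2)

text = 'abcde'
numbers = [10, 20, 30, 40, 50]

text_head = text[HEAD]                  # same as text[:2]
numbers_head = numbers[HEAD]            # same as numbers[:2]
numbers_even = numbers[EVERY_SECOND]    # same as numbers[::2]

# The arguments are available as attributes
bounds = (EVERY_SECOND.start, EVERY_SECOND.stop, EVERY_SECOND.step)
```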

3.7.12. Example

>>> from pprint import pprint
>>>
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]

3.7.13. Assignments

Code 3.14. Solution
"""
* Assignment: Sequence Slice Substr
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Use data from "Given" section (see below)
    2. Use `str.find()` and slicing
    3. Print `TEXT` without text in `REMOVE`
    4. Compare result with "Tests" section (see below)

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, 'Assignment solution must be in `result` instead of ... (Ellipsis)'
    >>> assert type(result) is str, 'Variable `result` has invalid type, should be str'

    >>> result
    'We choose the Moon!'
"""

# Given
TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

result = ...  # str TEXT without REMOVE part

Code 3.15. Solution
"""
* Assignment: Sequence Slice Sequence
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Use data from "Given" section (see below)
    2. Create set `result` with every second element from `a` and `b`
    3. Print `result`
    4. Compare result with "Tests" section (see below)

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, 'Assignment solution must be in `result` instead of ... (Ellipsis)'
    >>> assert type(result) is set, 'Variable `result` has invalid type, should be set'

    >>> result
    {0, 2, 4}
"""

# Given
a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

result = ...  # set with every second element from `a` and `b`

Code 3.16. Solution
"""
* Assignment: Sequence Slice Text
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Use data from "Given" section (see below)
    2. Remove the academic title and military rank from each variable
    3. Also remove whitespace at the beginning and end of the text
    4. Use only slicing to clean the text
    5. Compare result with "Tests" section (see below)

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> assert a is not Ellipsis, 'Assignment solution must be in `a` instead of ... (Ellipsis)'
    >>> assert b is not Ellipsis, 'Assignment solution must be in `b` instead of ... (Ellipsis)'
    >>> assert c is not Ellipsis, 'Assignment solution must be in `c` instead of ... (Ellipsis)'
    >>> assert d is not Ellipsis, 'Assignment solution must be in `d` instead of ... (Ellipsis)'
    >>> assert e is not Ellipsis, 'Assignment solution must be in `e` instead of ... (Ellipsis)'
    >>> assert f is not Ellipsis, 'Assignment solution must be in `f` instead of ... (Ellipsis)'
    >>> assert g is not Ellipsis, 'Assignment solution must be in `g` instead of ... (Ellipsis)'
    >>> assert type(a) is str, 'Variable `a` has invalid type, should be str'
    >>> assert type(b) is str, 'Variable `b` has invalid type, should be str'
    >>> assert type(c) is str, 'Variable `c` has invalid type, should be str'
    >>> assert type(d) is str, 'Variable `d` has invalid type, should be str'
    >>> assert type(e) is str, 'Variable `e` has invalid type, should be str'
    >>> assert type(f) is str, 'Variable `f` has invalid type, should be str'
    >>> assert type(g) is str, 'Variable `g` has invalid type, should be str'

    >>> example
    'Mark Watney'
    >>> a
    'Jan Twardowski'
    >>> b
    'Jan Twardowski'
    >>> c
    'Mark Watney'
    >>> d
    'Melissa Lewis'
    >>> e
    'Ryan Stone'
    >>> f
    'Ryan Stone'
    >>> g
    'Jan Twardowski'
"""

# Given
example = 'lt. Mark Watney, PhD'
a = 'dr hab. inż. Jan Twardowski, prof. AATC'
b = 'gen. pil. Jan Twardowski'
c = 'Mark Watney, PhD'
d = 'lt. col. ret. Melissa Lewis'
e = 'dr n. med. Ryan Stone'
f = 'Ryan Stone, MD-PhD'
g = 'lt. col. Jan Twardowski\t'

example: str = example[4:-5]
a: str  # Jan Twardowski
b: str  # Jan Twardowski
c: str  # Mark Watney
d: str  # Melissa Lewis
e: str  # Ryan Stone
f: str  # Ryan Stone
g: str  # Jan Twardowski

Code 3.17. Solution
"""
* Assignment: Sequence Slice Split
* Complexity: easy
* Lines of code: 6 lines
* Time: 8 min

English:
    1. Use data from "Given" section (see below)
    2. Separate header from data
    3. Write header (first line) to `header` variable
    4. Write data without header to `data` variable
    5. Calculate pivot point: number of records in `data` multiplied by the
    division ratio (see below)
    6. Divide `data` into two lists:
        a. `train`: 60% - training data
        b. `test`: 40% - testing data
    7. Write records of `data` from start to pivot to `train`
    8. Write records of `data` from pivot to end to `test`
    9. Compare result with "Tests" section (see below)

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> assert header is not Ellipsis, 'Assignment solution must be in `header` instead of ... (Ellipsis)'
    >>> assert data is not Ellipsis, 'Assignment solution must be in `data` instead of ... (Ellipsis)'
    >>> assert train is not Ellipsis, 'Assignment solution must be in `train` instead of ... (Ellipsis)'
    >>> assert test is not Ellipsis, 'Assignment solution must be in `test` instead of ... (Ellipsis)'
    >>> assert type(header) is tuple, 'Variable `header` has invalid type, should be tuple'
    >>> assert type(data) is list, 'Variable `data` has invalid type, should be list'
    >>> assert type(train) is list, 'Variable `train` has invalid type, should be list'
    >>> assert type(test) is list, 'Variable `test` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in train), 'All elements in `train` should be tuple'
    >>> assert all(type(x) is tuple for x in test), 'All elements in `test` should be tuple'
    >>> assert header not in train, 'Header should not be in `train`'
    >>> assert header not in test, 'Header should not be in `test`'

    >>> header  # doctest: +NORMALIZE_WHITESPACE
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

    >>> train  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa')]

    >>> test  # doctest: +NORMALIZE_WHITESPACE
    [(7.0, 3.2, 4.7, 1.4, 'versicolor'),
     (7.6, 3.0, 6.6, 2.1, 'virginica'),
     (4.9, 3.0, 1.4, 0.2, 'setosa'),
     (4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

# Given
DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica')]

header = ...  # tuple with row at index 0 from DATA
data = ...  # list of tuple with rows at all the other indexes from DATA
train = ...  # first 60% from data
test = ...  # last 40% from data