3.7. Sequence Slice

3.7.1. Rationale

  • Slice arguments must be int (positive, negative or zero) or omitted; step must not be zero

  • Positive index starts with 0

  • Negative index starts with -1
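
The rules above can be checked with a short sketch. The variable names (first, last, same_element, prefix) are illustrative only and do not come from the original text.

```python
text = 'We choose to go to the Moon!'

# Positive indexes count from the left, starting at 0.
first = text[0]            # 'W'

# Negative indexes count from the right, starting at -1.
last = text[-1]            # '!'

# For a sequence of length n, index -k refers to the same element as n - k.
n = len(text)
same_element = text[-5] == text[n - 5]     # both are 'M'

# A slice bound may also be None, which means "use the default".
prefix = text[None:2]      # same as text[:2], 'We'

print(first, last, same_element, prefix)
```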

3.7.2. Slice Forwards

  • sequence[start:stop]

    >>> text = 'We choose to go to the Moon!'
    >>>
    >>> text[0:2]
    'We'
    >>> text[:2]
    'We'
    >>> text[0:9]
    'We choose'
    >>> text[:9]
    'We choose'
    >>> text[23:28]
    'Moon!'
    >>> text[23:]
    'Moon!'
    

3.7.3. Slice Backwards

  • Negative index starts from the end and goes right to left

    >>> text = 'We choose to go to the Moon!'
    >>>
    >>> text[:-13]
    'We choose to go'
    >>> text[:-19]
    'We choose'
    >>> text[-12:]
    'to the Moon!'
    >>> text[-5:]
    'Moon!'
    >>> text[-5:-1]
    'Moon'
    >>> text[23:-2]
    'Moo'
    >>>
    >>> text[-1:0]
    ''
    >>> text[-2:0]
    ''
    >>> text[-2:2]
    ''
    >>> text[-5:5]
    ''
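
The empty results above follow from how negative bounds are translated before slicing. A minimal sketch, assuming only the text variable from this section:

```python
text = 'We choose to go to the Moon!'
n = len(text)  # 28

# A negative bound -k is interpreted as n - k before slicing.
assert text[-5:] == text[n - 5:] == 'Moon!'
assert text[:-13] == text[:n - 13] == 'We choose to go'

# With the default step of 1, the slice is empty whenever start >= stop.
# text[-5:5] is text[23:5], so nothing is selected.
assert text[-5:5] == text[23:5] == ''
assert text[-1:0] == text[27:0] == ''
```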
    

3.7.4. Step

  • Every n-th element

  • sequence[start:stop:step]

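  • start defaults to 0 (to the last element when step is negative)

  • stop defaults to len(sequence) (to just before the first element when step is negative)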

  • step defaults to 1

    >>> text = 'We choose to go to the Moon!'
    >>>
    >>> text[::1]
    'We choose to go to the Moon!'
    >>> text[::2]
    'W hoet ot h on'
    >>> text[::-1]
    '!nooM eht ot og ot esoohc eW'
    >>> text[::-2]
    '!oMeto go soce'
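
The sketch below spells out the omitted bounds for a positive and a negative step. The names reversed_text and message are illustrative, not part of the original example.

```python
text = 'We choose to go to the Moon!'

# With a positive step the omitted bounds mean "from the first element
# to the end of the sequence".
assert text[::2] == text[0:len(text):2]

# With a negative step the omitted bounds flip: the slice starts at the
# last element and runs past the first one.
reversed_text = text[::-1]
assert reversed_text == text[len(text) - 1::-1]
assert reversed_text == text[-1:None:-1]

# The "past the first element" stop cannot be written as -1,
# because -1 always means the last element.
assert text[27:-1:-1] == ''

# A step of zero is rejected with ValueError.
try:
    text[::0]
except ValueError as error:
    message = str(error)

print(message)
```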
    

3.7.5. Out of Range

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:100]
'We choose to go to the Moon!'
>>>
>>> text[100:]
''
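
Slicing is more forgiving than indexing here. The following sketch contrasts the two; out_of_range is a hypothetical flag used only for the demonstration.

```python
text = 'We choose to go to the Moon!'

# Slice bounds beyond the sequence are clamped to its length.
assert text[:100] == text
assert text[100:] == ''
assert text[-100:2] == 'We'

# A single index is not clamped; it must point at an existing element.
try:
    text[100]
except IndexError:
    out_of_range = True
else:
    out_of_range = False

print(out_of_range)  # True
```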

3.7.6. Ordered Sequences

Slicing str:

>>> data = 'abcde'
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'

Slicing tuple:

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')

Slicing list:

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']
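
Two properties are worth noting in the examples above: the result has the same type as the sliced sequence, and for a list it is a new object. A minimal sketch (head is an illustrative name):

```python
data = ['a', 'b', 'c', 'd', 'e']

# The result has the same type as the sliced sequence.
assert type('abcde'[1:4]) is str
assert type(('a', 'b', 'c')[1:]) is tuple
assert type(data[1:4]) is list

# Slicing a list builds a new list; the original is left untouched.
head = data[:3]
head.append('z')
assert head == ['a', 'b', 'c', 'z']
assert data == ['a', 'b', 'c', 'd', 'e']

# Other built-in sequences, such as range and bytes, support the same syntax.
assert range(10)[2:8:2] == range(2, 8, 2)
assert b'abcde'[::-1] == b'edcba'
```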

3.7.7. Unordered Sequences

Slicing a set is not possible, because a set has no element order:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable

Slicing a frozenset is not possible for the same reason:

>>> data = frozenset({'a', 'b', 'c', 'd', 'e'})
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'frozenset' object is not subscriptable

3.7.8. Nested Sequences

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]  # doctest: +NORMALIZE_WHITESPACE
[[1, 2, 3],
 [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]

3.7.9. Slice All

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:]
'We choose to go to the Moon!'

numpy uses a bare colon to select all rows or all columns:

>>> import numpy as np
>>>
>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6],
...                  [7, 8, 9]])
...
>>> data[:, 1]
array([2, 5, 8])
>>>
>>> data[1, :]
array([4, 5, 6])

Unfortunately, this does not work on a list of lists:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]
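
With plain lists, one common workaround is a list comprehension that visits every row. This is a sketch of that approach, not the only way to do it; column and left are illustrative names.

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# data[:][1] copies the outer list first and then takes row 1,
# so it is just a longer way of writing data[1].
assert data[:][1] == data[1] == [4, 5, 6]

# A column has to be collected row by row.
column = [row[1] for row in data]
print(column)  # [2, 5, 8]

# Slices work the same way: the first two values of every row.
left = [row[:2] for row in data]
print(left)  # [[1, 2], [4, 5], [7, 8]]
```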

pandas uses a bare colon the same way, to select all rows or all columns:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame([
...     {'A': 1, 'B': 2, 'C': 3},
...     {'A': 4, 'B': 5, 'C': 6},
...     {'A': 7, 'B': 8, 'C': 9}])
>>>
>>> df
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
>>>
>>> df.loc[:, ['A', 'B']]
   A  B
0  1  2
1  4  5
2  7  8
>>>
>>> df.loc[::2, ::2]
   A  C
0  1  3
2  7  9
>>>
>>> df.loc[1, :]
A    4
B    5
C    6
Name: 1, dtype: int64

3.7.10. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'
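
Slice bounds do not have to be literals; they can be computed. The sketch below derives them with str.find() and len(). The variable word and the names before and after are assumptions made for illustration.

```python
text = 'We choose to go to the Moon!'
word = 'Moon'

# str.find() returns the index of the first occurrence, or -1 if absent.
first = text.find(word)        # 23
last = first + len(word)       # 27

assert text[first:last] == 'Moon'

# Everything before and after the word.
before = text[:first]          # 'We choose to go to the '
after = text[last:]            # '!'

print(before, after)
```

Because str.find() returns -1 when the word is missing, real code should check the result before using it as a slice bound.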

3.7.11. Slice Function

  • sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

  • Use None for an omitted argument

  • start defaults to 0

  • stop defaults to len(sequence)

  • step defaults to 1

    >>> text = 'We choose to go to the Moon!'
    >>>
    >>> q = slice(23, 27)
    >>> text[q]
    'Moon'
    >>>
    >>> q = slice(None, 9)
    >>> text[q]
    'We choose'
    >>>
    >>> q = slice(23, None)
    >>> text[q]
    'Moon!'
    >>>
    >>> q = slice(23, None, 2)
    >>> text[q]
    'Mo!'
    >>>
    >>> q = slice(None, None, 2)
    >>> text[q]
    'W hoet ot h on'
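
A slice object can be given a name and reused across sequences. This is a small sketch; EVERY_SECOND, WITHOUT_ENDS and the letters_* variables are hypothetical names chosen for the example.

```python
EVERY_SECOND = slice(None, None, 2)   # same as [::2]
WITHOUT_ENDS = slice(1, -1)           # same as [1:-1]

letters_str = 'abcde'
letters_tuple = ('a', 'b', 'c', 'd', 'e')
letters_list = ['a', 'b', 'c', 'd', 'e']

# The same slice object works with every ordered sequence.
assert letters_str[EVERY_SECOND] == 'ace'
assert letters_tuple[EVERY_SECOND] == ('a', 'c', 'e')
assert letters_list[WITHOUT_ENDS] == ['b', 'c', 'd']

# The arguments are available as read-only attributes.
assert (EVERY_SECOND.start, EVERY_SECOND.stop, EVERY_SECOND.step) == (None, None, 2)

# With a single argument, slice(stop) behaves like [:stop].
assert letters_str[slice(3)] == 'abc'
```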
    

3.7.12. Example

>>> from pprint import pprint
>>>
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]
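
To see which rows each slice above picks, the bounds can be expanded into explicit indexes with slice.indices(). The helper selected() is a hypothetical function written for this illustration, and rows is assumed to equal len(DATA) from the example.

```python
rows = 7  # len(DATA) in the example above: one header and six records


def selected(start, stop, step):
    """Return the indexes that sequence[start:stop:step] would pick."""
    return list(range(*slice(start, stop, step).indices(rows)))


print(selected(1, None, None))   # [1, 2, 3, 4, 5, 6]
print(selected(1, None, 2))      # [1, 3, 5]
print(selected(1, None, -2))     # [1]
print(selected(None, 1, -2))     # [6, 4, 2]
print(selected(None, -5, -2))    # [6, 4]
print(selected(1, -5, -2))       # []
```

The last call is empty because a negative step needs start to be greater than stop, and here start is 1 while -5 translates to index 2.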

3.7.13. Assignments

Code 3.13. Solution
"""
* Assignment: Sequence Slice Substr
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Use data from "Given" section (see below)
    2. Use `str.find()` and slicing
    3. Save `TEXT` without the text from `REMOVE` to `result` and print it
    4. Compare result with "Tests" section (see below)

Tests:
    >>> type(result)
    <class 'str'>
    >>> result
    'We choose the Moon!'
"""


# Given
TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '


Code 3.14. Solution
"""
* Assignment: Sequence Slice Sequence
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Use data from "Given" section (see below)
    2. Create set `result` with every second element from `a` and `b`
    3. Print `result`
    4. Compare result with "Tests" section (see below)

Tests:
    >>> type(result)
    <class 'set'>
    >>> result
    {0, 2, 4}
"""


# Given
a = (0, 1, 2, 3)
b = [2, 3, 4, 5]


Code 3.15. Solution
"""
* Assignment: Sequence Slice Text
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Use data from "Given" section (see below)
    2. Remove the academic title and military rank from each variable
    3. Also remove whitespace at the beginning and end of the text
    4. Use only `slice` to clean text
    5. Compare result with "Tests" section (see below)

Tests:
    >>> example
    'Mark Watney'
    >>> a
    'Jan Twardowski'
    >>> b
    'Jan Twardowski'
    >>> c
    'Mark Watney'
    >>> d
    'Melissa Lewis'
    >>> e
    'Ryan Stone'
    >>> f
    'Ryan Stone'
    >>> g
    'Jan Twardowski'
"""


# Given
example = 'lt. Mark Watney, PhD'
a = 'dr hab. inż. Jan Twardowski, prof. AATC'
b = 'gen. pil. Jan Twardowski'
c = 'Mark Watney, PhD'
d = 'lt. col. ret. Melissa Lewis'
e = 'dr n. med. Ryan Stone'
f = 'Ryan Stone, MD-PhD'
g = 'lt. col. Jan Twardowski\t'

example = example[4:-5]


Code 3.16. Solution
"""
* Assignment: Sequence Slice Split
* Complexity: easy
* Lines of code: 6 lines
* Time: 8 min

English:
    1. Use data from "Given" section (see below)
    2. Separate header from data
    3. Write header (first line) to `header` variable
    4. Write data without header to `data` variable
    5. Calculate pivot point: number of records in `data` multiplied by PERCENT (the training share from the division ratio below)
    6. Divide `data` into two lists:
        a. `train`: 60% - training data
        b. `test`: 40% - testing data
    7. Write records from the start of `data` up to the pivot to `train`
    8. Write records from the pivot to the end of `data` to `test`
    9. Compare result with "Tests" section (see below)

Tests:
    >>> type(header)
    <class 'tuple'>
    >>> type(train)
    <class 'list'>
    >>> type(test)
    <class 'list'>
    >>> ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species') not in train
    True
    >>> ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species') not in test
    True
    >>> assert all(type(x) is tuple for x in train)
    >>> assert all(type(x) is tuple for x in test)
    >>> header  # doctest: +NORMALIZE_WHITESPACE
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')
    >>> train  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa')]
    >>> test  # doctest: +NORMALIZE_WHITESPACE
    [(7.0, 3.2, 4.7, 1.4, 'versicolor'),
     (7.6, 3.0, 6.6, 2.1, 'virginica'),
     (4.9, 3.0, 1.4, 0.2, 'setosa'),
     (4.9, 2.5, 4.5, 1.7, 'virginica')]
"""


# Given
DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica')]