# 5.7. Sequence Slice

## 5.7.1. Rationale

• Slice arguments must be integers (positive, negative, or zero) or None

• Positive indexes start at 0 (the first element)

• Negative indexes start at -1 (the last element)
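To see how the two numbering schemes line up, here is a minimal sketch. The helper to_positive is hypothetical (it is not part of Python); it just spells out the rule that a negative index i means len(sequence) + i:

```python
text = 'We choose to go to the Moon!'


def to_positive(index: int, length: int) -> int:
    """Hypothetical helper: map a negative index to its positive equivalent."""
    # Python reads a negative index i as len(sequence) + i
    return index + length if index < 0 else index


print(text[0], text[-1])            # W !
print(to_positive(-1, len(text)))   # 27
print(to_positive(-13, len(text)))  # 15

# every negative index names the same element as its positive equivalent
for i in range(-len(text), 0):
    assert text[i] == text[to_positive(i, len(text))]
```

This is why text[:-13] and text[:15] select the same characters in the examples that follow.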

## 5.7.2. Slice Forwards

• sequence[start:stop] returns the elements from index start up to, but not including, index stop

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[0:2]
'We'
>>> text[:2]
'We'
>>> text[0:9]
'We choose'
>>> text[:9]
'We choose'
>>> text[23:28]
'Moon!'
>>> text[23:]
'Moon!'
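Because stop is excluded, two slices that share the same boundary never overlap and together rebuild the original sequence. A small sketch, where the split point k is an arbitrary value chosen for illustration:

```python
text = 'We choose to go to the Moon!'

k = 9             # arbitrary split point chosen for illustration
left = text[:k]   # indexes 0..8
right = text[k:]  # indexes 9..end

print(repr(left))            # 'We choose'
print(repr(right))           # ' to go to the Moon!'
print(left + right == text)  # True

# for a forward slice within range, the length is stop - start
print(len(text[23:28]))      # 5
```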


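## 5.7.3. Slice Backwards

• Negative indexes count from the end of the sequence, right to left: -1 is the last element, -2 the one before it

• If start does not point before stop (after resolving negative indexes), the slice is empty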

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:-13]
'We choose to go'
>>> text[:-19]
'We choose'
>>> text[-12:]
'to the Moon!'
>>> text[-5:]
'Moon!'
>>> text[-5:-1]
'Moon'
>>> text[23:-2]
'Moo'
>>>
>>> text[-1:0]
''
>>> text[-2:0]
''
>>> text[-2:2]
''
>>> text[-5:5]
''
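The empty results above follow from one rule: for a step of 1, Python first resolves negative indexes, clips both bounds to the sequence, and returns nothing when start is not less than stop. The sketch below restates that rule with two hypothetical helpers (normalize and is_empty_slice are illustration only) and checks it against Python's own slicing:

```python
text = 'We choose to go to the Moon!'


def normalize(index: int, length: int) -> int:
    """Hypothetical helper: resolve a slice bound the way a step-1 slice does."""
    if index < 0:
        index += length                # count negative indexes from the end
    return min(max(index, 0), length)  # clip to the valid range 0..length


def is_empty_slice(start: int, stop: int, length: int) -> bool:
    """Hypothetical helper: predict whether sequence[start:stop] is empty."""
    return normalize(start, length) >= normalize(stop, length)


for start, stop in [(-1, 0), (-2, 2), (-5, 5), (23, -2), (-5, -1)]:
    predicted = is_empty_slice(start, stop, len(text))
    actual = text[start:stop] == ''
    print(start, stop, predicted, actual)
```

The sketch only covers a step of 1; negative steps reverse the comparison.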


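## 5.7.4. Step

• Every n-th element

• sequence[start:stop:step]

• start defaults to 0 (to the last element when step is negative)

• stop defaults to len(sequence) (to just before the first element when step is negative)

• step defaults to 1; a negative step goes backwards, and a step of 0 raises ValueError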

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[::1]
'We choose to go to the Moon!'
>>> text[::2]
'W hoet ot h on'
>>> text[::-1]
'!nooM eht ot og ot esoohc eW'
>>> text[::-2]
'!oMeto go soce'
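A short sketch of what the step does, comparing slices with equivalent loops over indexes, and showing that a zero step is rejected:

```python
text = 'We choose to go to the Moon!'

# a negative step walks backwards, so [::-1] is a reversed copy
print(text[::-1] == ''.join(reversed(text)))  # True

# step 2 keeps the characters at indexes 0, 2, 4, ...
print(text[::2] == ''.join(text[i] for i in range(0, len(text), 2)))  # True

# with a negative step, start must lie to the right of stop
print(text[27:22:-1])  # !nooM

# a step of zero is rejected
try:
    text[::0]
except ValueError as error:
    print(error)  # slice step cannot be zero
```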


## 5.7.5. Out of Range

• Slice bounds beyond the sequence are clipped, so slicing never raises IndexError

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:100]
'We choose to go to the Moon!'
>>>
>>> text[100:]
''
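The clipping applies to slices only. A single index that is out of range still raises IndexError, as this sketch contrasts:

```python
text = 'We choose to go to the Moon!'

print(repr(text[:100]))   # whole text: stop is clipped to len(text)
print(repr(text[100:]))   # '': start is clipped too, leaving nothing
print(repr(text[-100:]))  # whole text: a too-negative start is clipped to 0

# a single index is not clipped and raises instead
try:
    text[100]
except IndexError as error:
    print(error)  # string index out of range
```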


## 5.7.6. Ordered Sequences

Slicing str:

>>> data = 'abcde'
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'


Slicing tuple:

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')


Slicing list:

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']
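In all three cases the slice has the same type as the sequence it came from. A quick sketch that checks this for the str, tuple, and list used above:

```python
samples = ['abcde', ('a', 'b', 'c', 'd', 'e'), ['a', 'b', 'c', 'd', 'e']]

for data in samples:
    part = data[1:4:2]
    # slicing keeps the container type: str -> str, tuple -> tuple, list -> list
    print(type(data).__name__, type(part).__name__, part)
```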


## 5.7.7. Unordered Sequences

Slicing a set is not possible, because a set has no order:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable


Slicing a frozenset is not possible either:

>>> data = frozenset({'a', 'b', 'c', 'd', 'e'})
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'frozenset' object is not subscriptable
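If you need a few items from a set, one common workaround (chosen here for illustration, not the only option) is to impose an order first, for example with sorted(), which returns a list that can be sliced:

```python
data = {'a', 'b', 'c', 'd', 'e'}

# sorted() returns a new list in ascending order, which supports slicing
first_three = sorted(data)[:3]
print(first_three)  # ['a', 'b', 'c']

# list(data)[:3] would also run, but set iteration order is arbitrary,
# so which items it picks is not guaranteed
```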


## 5.7.8. Nested Sequences

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]
[[1, 2, 3], [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]


## 5.7.9. Slice All

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:]
'We choose to go to the Moon!'
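For a list, [:] is a common idiom for a shallow copy: you get a new outer list, but the elements inside are shared with the original. This sketch shows both halves of that statement:

```python
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

shallow = data[:]          # new outer list holding the same inner lists
print(shallow == data)     # True: equal contents
print(shallow is data)     # False: a different list object

shallow.append([10, 11, 12])
print(len(data))           # 3: appending to the copy leaves data unchanged

shallow[0][0] = 100
print(data[0])             # [100, 2, 3]: the inner lists are shared
```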


Selecting a column with data[:, 1] does not work on a list of lists:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]


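However, this syntax is valid for NumPy arrays, and pandas supports it through .iloc and .loc. Note that data[:][1] above returns the second row, not the second column.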
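With plain lists, a column has to be gathered row by row. A minimal pure-Python sketch (no NumPy assumed) showing a comprehension, a transpose with zip(), and a slice applied to every row:

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# take index 1 from every row
column = [row[1] for row in data]
print(column)  # [2, 5, 8]

# zip(*data) transposes rows into columns (as tuples)
columns = list(zip(*data))
print(columns[1])  # (2, 5, 8)

# a slice can still be applied per row, e.g. the last two columns
print([row[1:] for row in data])  # [[2, 3], [5, 6], [8, 9]]
```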

## 5.7.10. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'
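Instead of hard-coding 23 and 28, the bounds can be computed from the text itself. A sketch, assuming the word we want is known in advance ('Moon' here):

```python
text = 'We choose to go to the Moon!'

first = text.find('Moon')  # 23: index where 'Moon' begins (-1 if absent)
last = len(text)           # 28: one past the final index

print(first, last)                       # 23 28
print(text[first:last])                  # Moon!
print(text[first:first + len('Moon')])   # Moon
```

Keep in mind that str.find() returns -1 when the substring is missing, which as a slice bound would silently mean "the last character".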


## 5.7.11. Slice Function

• slice(stop) or slice(start, stop[, step]) creates a slice object

• sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

• None stands for an omitted value, so slice(None, 9) is the same as [:9]

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'
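A slice object keeps its arguments in the start, stop and step attributes, and its indices() method resolves them against a given length. Because it is an ordinary object, it can be stored under a name and reused on any sequence. A short sketch (the name MOON is just an illustration):

```python
text = 'We choose to go to the Moon!'

q = slice(23, None, 2)
print(q.start, q.stop, q.step)   # 23 None 2
print(q.indices(len(text)))      # (23, 28, 2)

# the same slice object works on any sequence
letters = list(text)
print(''.join(letters[q]) == text[q])  # True

# a named slice documents what the bounds mean
MOON = slice(-5, -1)
print(text[MOON])  # Moon
```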


## 5.7.12. Example

>>> from pprint import pprint
>>>
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]


## 5.7.13. Assignments

"""
* Assignment: Sequence Slice Text
* Required: yes
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
1. Remove the academic title and military rank from each variable
2. Also remove whitespace at the beginning and end of the text
3. Use only slicing to clean the text
4. Run doctests - all must succeed

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert a is not Ellipsis, \
... 'Assign result to variable: a'
>>> assert b is not Ellipsis, \
... 'Assign result to variable: b'
>>> assert c is not Ellipsis, \
... 'Assign result to variable: c'
>>> assert d is not Ellipsis, \
... 'Assign result to variable: d'
>>> assert e is not Ellipsis, \
... 'Assign result to variable: e'
>>> assert f is not Ellipsis, \
... 'Assign result to variable: f'
>>> assert g is not Ellipsis, \
... 'Assign result to variable: g'
>>> assert type(a) is str, \
... 'Variable a has invalid type, should be str'
>>> assert type(b) is str, \
... 'Variable b has invalid type, should be str'
>>> assert type(c) is str, \
... 'Variable c has invalid type, should be str'
>>> assert type(d) is str, \
... 'Variable d has invalid type, should be str'
>>> assert type(e) is str, \
... 'Variable e has invalid type, should be str'
>>> assert type(f) is str, \
... 'Variable f has invalid type, should be str'
>>> assert type(g) is str, \
... 'Variable g has invalid type, should be str'

>>> example
'Mark Watney'
>>> a
'Jan Twardowski'
>>> b
'Jan Twardowski'
>>> c
'Mark Watney'
>>> d
'Melissa Lewis'
>>> e
'Ryan Stone'
>>> f
'Ryan Stone'
>>> g
'Jan Twardowski'
"""

example = 'lt. Mark Watney, PhD'
A = 'dr hab. inż. Jan Twardowski, prof. AATC'
B = 'gen. pil. Jan Twardowski'
C = 'Mark Watney, PhD'
D = 'lt. col. ret. Melissa Lewis'
E = 'dr n. med. Ryan Stone'
F = 'Ryan Stone, MD-PhD'
G = 'lt. col. Jan Twardowski\t'

example: str = example[4:-5]

# str: Jan Twardowski
a = ...

# str: Jan Twardowski
b = ...

# str: Mark Watney
c = ...

# str: Melissa Lewis
d = ...

# str: Ryan Stone
e = ...

# str: Ryan Stone
f = ...

# str: Jan Twardowski
g = ...


"""
* Assignment: Sequence Slice Substr
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
1. Use str.find() and slicing
2. Assign TEXT without the text in REMOVE to the result variable
3. Run doctests - all must succeed

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert result is not Ellipsis, \
... 'Assign result to variable: result'
>>> assert type(result) is str, \
... 'Variable result has invalid type, should be str'

>>> result
'We choose the Moon!'
"""

TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

# str: TEXT without REMOVE part
result = ...


"""
* Assignment: Sequence Slice Sequence
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
1. Create a set result containing every second element of a and b
2. Run doctests - all must succeed

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert result is not Ellipsis, \
... 'Assign result to variable: result'
>>> assert type(result) is set, \
... 'Variable result has invalid type, should be set'

>>> result
{0, 2, 4}
"""

a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

# set[int]: with every second element from a and b
result = ...


"""
* Assignment: Sequence Slice Split
* Required: yes
* Complexity: easy
* Lines of code: 6 lines
* Time: 8 min

English:
1. Separate the header from the data
2. Write the header (first line) to the header variable
3. Write the data without the header to the data variable
4. Calculate the pivot point: number of records in data multiplied by PERCENT
   (division ratio below)
5. Divide data into two lists:
   a. train: 60% - training data
   b. test: 40% - testing data
6. From data, write the training records from the start to the pivot
7. From data, write the test records from the pivot to the end
8. Run doctests - all must succeed

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert header is not Ellipsis, \
... 'Assign result to variable: header'

>>> assert data is not Ellipsis, \
... 'Assign result to variable: data'

>>> assert train is not Ellipsis, \
... 'Assign result to variable: train'

>>> assert test is not Ellipsis, \
... 'Assign result to variable: test'

>>> assert type(header) is tuple, \
... 'Variable header has invalid type, should be tuple'

>>> assert type(train) is list, \
... 'Variable train has invalid type, should be list'

>>> assert type(test) is list, \
... 'Variable test has invalid type, should be list'

>>> assert all(type(x) is tuple for x in train), \
... 'All elements in train should be tuple'

>>> assert all(type(x) is tuple for x in test), \
... 'All elements in test should be tuple'

>>> assert header not in train, \
... 'Header should not be in train'

>>> assert header not in test, \
... 'Header should not be in test'

>>> header
('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

>>> train  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa')]

>>> test  # doctest: +NORMALIZE_WHITESPACE
[(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica')]

# tuple[str]: with row at index 0 from DATA