# 3.7. Sequence Slice

## 3.7.1. Rationale

• Slice arguments must be int (positive, negative or zero) or omitted

• Positive indexes start at 0 and count from the beginning

• Negative indexes start at -1 and count from the end
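
The two index conventions above address the same elements. A minimal sketch, reusing the sample sentence from the later sections, shows that a negative index `-n` points at the same position as `len(text) - n`:

```python
text = 'We choose to go to the Moon!'

# Positive indexes count from the left, starting at 0
first = text[0]

# Negative indexes count from the right, starting at -1
last = text[-1]

# A negative index -n refers to the same element as len(text) - n
same = text[-5] == text[len(text) - 5]

print(first, last, same)  # W ! True
```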

## 3.7.2. Slice Forwards

• sequence[start:stop] returns elements from start (inclusive) up to stop (exclusive)

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[0:2]
'We'
>>> text[:2]
'We'
>>> text[0:9]
'We choose'
>>> text[:9]
'We choose'
>>> text[23:28]
'Moon!'
>>> text[23:]
'Moon!'


## 3.7.3. Slice Backwards

• Negative indexes count from the end, going right to left

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:-13]
'We choose to go'
>>> text[:-19]
'We choose'
>>> text[-12:]
'to the Moon!'
>>> text[-5:]
'Moon!'
>>> text[-5:-1]
'Moon'
>>> text[23:-2]
'Moo'
>>>
>>> text[-1:0]
''
>>> text[-2:0]
''
>>> text[-2:2]
''
>>> text[-5:5]
''
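
The empty results above can be explained by resolving the negative values first. The sketch below uses the built-in `slice.indices()` method, which returns the concrete start, stop and step for a given length. It is meant as an illustration of the rule, not as something you need in everyday code:

```python
text = 'We choose to go to the Moon!'

# slice.indices(length) resolves negative values against the given length
start, stop, step = slice(-2, 2).indices(len(text))
print(start, stop, step)  # 26 2 1

# start is already past stop and step is positive, so nothing is selected
result = text[start:stop:step]
print(repr(result))  # ''
```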


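## 3.7.4. Step

• Every n-th element

• sequence[start:stop:step]

• start defaults to 0 (or to the last element when step is negative)

• stop defaults to len(sequence) (or to just before the first element when step is negative)

• step defaults to 1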

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[::1]
'We choose to go to the Moon!'
>>> text[::2]
'W hoet ot h on'
>>> text[::-1]
'!nooM eht ot og ot esoohc eW'
>>> text[::-2]
'!oMeto go soce'
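
The defaults depend on the sign of the step. The following sketch spells the omitted bounds out explicitly and also shows one common pitfall when going backwards. The variable names are only illustrative:

```python
text = 'We choose to go to the Moon!'

# Positive step: the omitted bounds are 0 and len(text)
forward = text[0:len(text):2]
print(forward == text[::2])  # True

# Negative step: the omitted start is the last element
backward = text[-1::-1]
print(backward == text[::-1])  # True

# An explicit stop of -1 means "the last element", not "before index 0",
# so only an omitted stop (or None) reaches index 0 when going backwards
empty = text[-1:-1:-1]
print(repr(empty))  # ''
```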


## 3.7.5. Out of Range

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:100]
'We choose to go to the Moon!'
>>>
>>> text[100:]
''
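
Slicing tolerates out-of-range bounds, while accessing a single element does not. A short sketch of the difference:

```python
text = 'We choose to go to the Moon!'

# Slicing clamps out-of-range bounds to the sequence length
clamped = text[20:100]
print(clamped)  # he Moon!

# Indexing a single element does not clamp; it raises IndexError
try:
    text[100]
except IndexError as error:
    message = str(error)

print(message)  # string index out of range
```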


## 3.7.6. Ordered Sequences

Slicing str:

>>> data = 'abcde'
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'


Slicing tuple:

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')


Slicing list:

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']


## 3.7.7. Unordered Sequences

Slicing set is not possible:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable


Slicing frozenset is not possible:

>>> data = frozenset({'a', 'b', 'c', 'd', 'e'})
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'frozenset' object is not subscriptable
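
One possible workaround is to convert the set to an ordered sequence first. The sketch below assumes that the elements can be compared with each other, so that `sorted()` can put them in order:

```python
data = {'a', 'b', 'c', 'd', 'e'}

# sorted() returns a list, which has a defined order and can be sliced
ordered = sorted(data)
result = ordered[:3]

print(result)  # ['a', 'b', 'c']
```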


## 3.7.8. Nested Sequences

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa')]

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]  # doctest: +NORMALIZE_WHITESPACE
[[1, 2, 3],
[7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]


## 3.7.9. Slice All

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:]
'We choose to go to the Moon!'
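
For a mutable sequence such as list, the same `[:]` expression is commonly used to make a shallow copy. A minimal sketch with a small nested list (the names `data` and `copy` are only illustrative):

```python
data = [[1, 2, 3], [4, 5, 6]]
copy = data[:]

print(copy == data)        # True: same content
print(copy is data)        # False: a new outer list
print(copy[0] is data[0])  # True: inner lists are shared (shallow copy)
```

Because the inner lists are shared, modifying `copy[0]` in place would also be visible through `data[0]`.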


In numpy, a bare colon selects all rows or all columns:

>>> import numpy as np
>>>
>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6],
...                  [7, 8, 9]])
...
>>> data[:, 1]
array([2, 5, 8])
>>>
>>> data[1, :]
array([4, 5, 6])


This does not work on a list of lists, because list does not accept a tuple of indexes:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]
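
With plain lists, a column can still be selected by applying the index or slice to every row. A sketch using a list comprehension, as one possible approach:

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Take the element at index 1 from every row
column = [row[1] for row in data]
print(column)  # [2, 5, 8]

# Slices work the same way: everything from index 1 in every row
tail = [row[1:] for row in data]
print(tail)  # [[2, 3], [5, 6], [8, 9]]
```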


In pandas, a bare colon in .loc selects all rows or all columns:

>>> import pandas as pd
>>> pd.set_option('display.max_columns', 10)
>>>
>>>
>>> df = pd.DataFrame([
...     {'A': 1, 'B': 2, 'C': 3},
...     {'A': 4, 'B': 5, 'C': 6},
...     {'A': 7, 'B': 8, 'C': 9}])
>>>
>>> df
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
>>>
>>> df.loc[:, ['A', 'B']]
   A  B
0  1  2
1  4  5
2  7  8
>>>
>>> df.loc[::2, ::2]
   A  C
0  1  3
2  7  9
>>>
>>> df.loc[1, :]
A    4
B    5
C    6
Name: 1, dtype: int64


## 3.7.10. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'
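
Because slice bounds are ordinary expressions, they can be computed in a loop. The sketch below splits the sentence into fixed-size chunks. The chunk length `size` is a hypothetical value chosen for illustration:

```python
text = 'We choose to go to the Moon!'
size = 7  # hypothetical chunk length

# Each chunk starts at i and stops at i + size
chunks = [text[i:i + size] for i in range(0, len(text), size)]

print(chunks)  # ['We choo', 'se to g', 'o to th', 'e Moon!']
```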


## 3.7.11. Slice Function

• slice(start, stop, step) builds a slice object

• sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

• Pass None for an omitted start or stop

• step defaults to None, which behaves like 1

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'
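
A slice object can be given a name and reused across different sequence types. A small sketch, where the constant names `FIRST_THREE` and `EVERY_SECOND` are made up for this example:

```python
FIRST_THREE = slice(None, 3)
EVERY_SECOND = slice(None, None, 2)

text = 'abcde'
items = ['a', 'b', 'c', 'd', 'e']

head = text[FIRST_THREE]
odd = items[EVERY_SECOND]

print(head)  # abc
print(odd)   # ['a', 'c', 'e']

# The arguments stay available as attributes; omitted ones are None
print(EVERY_SECOND.start, EVERY_SECOND.stop, EVERY_SECOND.step)  # None None 2
```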


## 3.7.12. Example

>>> from pprint import pprint
>>>
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]
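
To see which row indexes the negative-step slices above select, each slice can be resolved against the number of rows. This sketch only works with indexes, so it does not need the DATA list itself. It assumes the 7 rows shown above:

```python
length = 7  # number of rows in DATA above (header plus six records)

queries = [slice(1, None, -2), slice(None, 1, -2),
           slice(None, -5, -2), slice(1, -5, -2)]

# slice.indices() resolves omitted and negative values for a given length
selected = [list(range(*query.indices(length))) for query in queries]

for query, rows in zip(queries, selected):
    print(query, '->', rows)
```

The resolved row indexes are `[1]`, `[6, 4, 2]`, `[6, 4]` and `[]`, which matches the four results printed above.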


## 3.7.13. Assignments

"""
* Assignment: Sequence Slice Substr
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
1. Use data from "Given" section (see below)
2. Use str.find() and slicing
3. Print TEXT without text in REMOVE
4. Compare result with "Tests" section (see below)

Tests:
>>> import sys
>>> sys.tracebacklimit = 0

>>> assert result is not Ellipsis, 'Assignment solution must be in result instead of ... (Ellipsis)'
>>> assert type(result) is str, 'Variable result has invalid type, should be str'

>>> result
'We choose the Moon!'
"""

# Given
TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

result = ...  # str TEXT without REMOVE part


"""
* Assignment: Sequence Slice Sequence
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
1. Use data from "Given" section (see below)
2. Create set result with every second element from a and b
3. Print result
4. Compare result with "Tests" section (see below)

Tests:
>>> import sys
>>> sys.tracebacklimit = 0

>>> assert result is not Ellipsis, 'Assignment solution must be in result instead of ... (Ellipsis)'
>>> assert type(result) is set, 'Variable result has invalid type, should be set'

>>> result
{0, 2, 4}
"""

# Given
a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

result = ...  # set with every second element from a and b


"""
* Assignment: Sequence Slice Text
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
1. Use data from "Given" section (see below)
2. Remove the academic title and military rank from each variable
3. Also remove whitespace at the beginning and end of the text
4. Use only slicing to clean the text
5. Compare result with "Tests" section (see below)

Tests:
>>> import sys
>>> sys.tracebacklimit = 0

>>> assert a is not Ellipsis, 'Assignment solution must be in a instead of ... (Ellipsis)'
>>> assert b is not Ellipsis, 'Assignment solution must be in b instead of ... (Ellipsis)'
>>> assert c is not Ellipsis, 'Assignment solution must be in c instead of ... (Ellipsis)'
>>> assert d is not Ellipsis, 'Assignment solution must be in d instead of ... (Ellipsis)'
>>> assert e is not Ellipsis, 'Assignment solution must be in e instead of ... (Ellipsis)'
>>> assert f is not Ellipsis, 'Assignment solution must be in f instead of ... (Ellipsis)'
>>> assert g is not Ellipsis, 'Assignment solution must be in g instead of ... (Ellipsis)'
>>> assert type(a) is str, 'Variable a has invalid type, should be str'
>>> assert type(b) is str, 'Variable b has invalid type, should be str'
>>> assert type(c) is str, 'Variable c has invalid type, should be str'
>>> assert type(d) is str, 'Variable d has invalid type, should be str'
>>> assert type(e) is str, 'Variable e has invalid type, should be str'
>>> assert type(f) is str, 'Variable f has invalid type, should be str'
>>> assert type(g) is str, 'Variable g has invalid type, should be str'

>>> example
'Mark Watney'
>>> a
'Jan Twardowski'
>>> b
'Jan Twardowski'
>>> c
'Mark Watney'
>>> d
'Melissa Lewis'
>>> e
'Ryan Stone'
>>> f
'Ryan Stone'
>>> g
'Jan Twardowski'
"""

# Given
example = 'lt. Mark Watney, PhD'
a = 'dr hab. inż. Jan Twardowski, prof. AATC'
b = 'gen. pil. Jan Twardowski'
c = 'Mark Watney, PhD'
d = 'lt. col. ret. Melissa Lewis'
e = 'dr n. med. Ryan Stone'
f = 'Ryan Stone, MD-PhD'
g = 'lt. col. Jan Twardowski\t'

example: str = example[4:-5]
a: str  # Jan Twardowski
b: str  # Jan Twardowski
c: str  # Mark Watney
d: str  # Melissa Lewis
e: str  # Ryan Stone
f: str  # Ryan Stone
g: str  # Jan Twardowski


"""
* Assignment: Sequence Slice Split
* Complexity: easy
* Lines of code: 6 lines
* Time: 8 min

English:
1. Use data from "Given" section (see below)
2. Separate header from data
3. Write header (first line) to header variable
4. Write data without header to data variable
5. Calculate pivot point: number of records in data multiplied by the
    split percentage (division ratio below)
6. Divide data into two lists:
    a. train: 60% - training data
    b. test: 40% - testing data
7. From data write training data from start to pivot
8. From data write test data from pivot to end
9. Compare result with "Tests" section (see below)

Tests:
>>> import sys
>>> sys.tracebacklimit = 0

>>> assert header is not Ellipsis, 'Assignment solution must be in header instead of ... (Ellipsis)'
>>> assert data is not Ellipsis, 'Assignment solution must be in data instead of ... (Ellipsis)'
>>> assert train is not Ellipsis, 'Assignment solution must be in train instead of ... (Ellipsis)'
>>> assert test is not Ellipsis, 'Assignment solution must be in test instead of ... (Ellipsis)'
>>> assert type(header) is tuple, 'Variable header has invalid type, should be tuple'
>>> assert type(data) is list, 'Variable data has invalid type, should be list'
>>> assert type(train) is list, 'Variable train has invalid type, should be list'
>>> assert type(test) is list, 'Variable test has invalid type, should be list'
>>> assert all(type(x) is tuple for x in train), 'All elements in train should be tuple'
>>> assert all(type(x) is tuple for x in test), 'All elements in test should be tuple'
>>> assert header not in train, 'Header should not be in train'
>>> assert header not in test, 'Header should not be in test'

>>> header
('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

>>> train  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor'),
(6.3, 2.9, 5.6, 1.8, 'virginica'),
(6.4, 3.2, 4.5, 1.5, 'versicolor'),
(4.7, 3.2, 1.3, 0.2, 'setosa')]

>>> test  # doctest: +NORMALIZE_WHITESPACE
[(7.0, 3.2, 4.7, 1.4, 'versicolor'),
(7.6, 3.0, 6.6, 2.1, 'virginica'),
(4.9, 3.0, 1.4, 0.2, 'setosa'),
(4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

# Given
DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica')]

header = ...  # tuple with row at index 0 from DATA
data = ...  # list of tuple with rows at all the other indexes from DATA
train = ...  # first 60% from data
test = ...  # last 40% from data