4. Slices

4.1. Accessing range of elements

  • Each slice index must be an int (positive, negative, or zero) or be omitted

  • A slice has up to three indexes, written as start:stop:step:

    • start (inclusive)

    • stop (exclusive)

    • step (defaults to 1, must not be zero)
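
The three parts can be sketched together as follows. This is a minimal illustration; the name letters and its value are made up for the example and do not come from the chapter.

```python
letters = 'abcdefgh'  # hypothetical sample data

# start=1 (inclusive), stop=7 (exclusive), step=2
middle = letters[1:7:2]      # 'bdf'

# Omitted parts fall back to defaults: whole sequence, step 1
same = letters[::]           # 'abcdefgh'

# A step of zero is rejected
try:
    letters[::0]
except ValueError as error:
    message = str(error)     # mentions that the step cannot be zero
```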

4.2. Accessing slice from start

text = 'We choose to go to the Moon!'

len(text)
# 28

text[0:2]       # 'We'
text[:2]        # 'We'
text[3:9]       # 'choose'
text[23:28]     # 'Moon!'
text[23:27]     # 'Moon'

4.3. Accessing slice from back

  • Negative indexes count from the end of the sequence, right to left; -1 is the last element

text = 'We choose to go to the Moon!'

text[-5:]       # 'Moon!'
text[-5:-1]     # 'Moon'
text[:-6]       # 'We choose to go to the'

  • Positive and negative indexes can be mixed

text = 'We choose to go to the Moon!'

text[13:-2]     # 'go to the Moo'
text[-5:5]      # '' (start 23 lies after stop 5)
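
One way to read a negative index is as an offset from len(text). The sketch below compares both spellings; the names from_back and from_front are chosen only for this example.

```python
text = 'We choose to go to the Moon!'

# A negative index i refers to the same position as len(text) + i
from_back = text[-5:]
from_front = text[len(text) - 5:]

print(from_back)                 # Moon!
print(from_back == from_front)   # True
```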

4.4. Slicing beyond existing elements

text = 'We choose to go to the Moon!'

text[:100]  # 'We choose to go to the Moon!'
text[100:]  # ''
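
Slicing is more forgiving than indexing here. As a small sketch of the difference (the names clamped and message are illustrative only), a slice bound past the end is clamped, while a single index past the end raises an exception:

```python
text = 'We choose to go to the Moon!'

# Slice bounds are clamped to the sequence length
clamped = text[23:100]       # 'Moon!'

# A single out-of-range index raises an exception instead
try:
    text[100]
except IndexError as error:
    message = str(error)     # 'string index out of range'
```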

4.5. Slicing all elements

text = 'We choose to go to the Moon!'

text[:]               # 'We choose to go to the Moon!'
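
For a mutable sequence such as list, a full slice is commonly used to make a shallow copy. The sketch below uses made-up names (original, alias, duplicate) to show that the copy can change without affecting the source list:

```python
original = [1, 2, 3]

alias = original           # second name for the same list
duplicate = original[:]    # new list with the same elements

duplicate.append(4)

print(original)               # [1, 2, 3]
print(duplicate)              # [1, 2, 3, 4]
print(alias is original)      # True
print(duplicate is original)  # False
```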

4.6. Arithmetic operations on slice indexes

text = 'We choose to go to the Moon!'
first = 23
last = 28

text[first:last]       # 'Moon!'
text[first:last-1]     # 'Moon'
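
Any expression that evaluates to an int can serve as a slice index. As a sketch, the text can be cut in half using integer division; the names half, first_half and second_half are assumptions made for this example. Note that plain division (/) produces a float, which a slice rejects with TypeError.

```python
text = 'We choose to go to the Moon!'

half = len(text) // 2        # 14; integer division keeps the index an int

first_half = text[:half]     # 'We choose to g'
second_half = text[half:]    # 'o to the Moon!'
```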

4.7. Every n-th element

text = 'We choose to go to the Moon!'

text[::2]             # 'W hoet ot h on'

4.7.1. Reversing

text = 'We choose to go to the Moon!'

text[::-1]            # '!nooM eht ot og ot esoohc eW'
text[::-2]            # '!oMeto go soce'
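
A negative step can also be combined with explicit bounds, in which case start has to lie to the right of stop. The sketch below shows this, together with a typical use of [::-1]; the word 'kayak' is just sample data.

```python
text = 'We choose to go to the Moon!'

# With a negative step, start must lie to the right of stop
last_word_reversed = text[27:22:-1]    # '!nooM'

# A common use of [::-1]: checking whether a word is a palindrome
word = 'kayak'
is_palindrome = word == word[::-1]     # True
```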

4.8. Slicing sequences

4.8.1. Slicing str

DATA = 'abcde'

DATA[:3]            # 'abc'
DATA[3:]            # 'de'
DATA[1:4]           # 'bcd'
DATA[::2]           # 'ace'
DATA[::-1]          # 'edcba'

4.8.2. Slicing tuple

DATA = ('a', 'b', 'c', 'd', 'e')

DATA[:3]            # ('a', 'b', 'c')
DATA[3:]            # ('d', 'e')
DATA[1:4]           # ('b', 'c', 'd')
DATA[::2]           # ('a', 'c', 'e')
DATA[::-1]          # ('e', 'd', 'c', 'b', 'a')

4.8.3. Slicing list

  • Slicing works the same as for str

DATA = ['a', 'b', 'c', 'd', 'e']

DATA[:3]            # ['a', 'b', 'c']
DATA[3:]            # ['d', 'e']
DATA[1:4]           # ['b', 'c', 'd']
DATA[::2]           # ['a', 'c', 'e']
DATA[::-1]          # ['e', 'd', 'c', 'b', 'a']

4.8.4. Slicing set

  • Slicing set is not possible, because set elements have no position

DATA = {'a', 'b', 'c', 'd', 'e'}

DATA[:3]
# TypeError: 'set' object is not subscriptable
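
If a few elements of a set are needed anyway, one possible workaround is to convert the set to an ordered sequence first. The sketch below assumes that alphabetical order is acceptable; the names ordered and first_three are illustrative.

```python
DATA = {'a', 'b', 'c', 'd', 'e'}

# A set has no order, so sort it into a list before slicing
ordered = sorted(DATA)

first_three = ordered[:3]    # ['a', 'b', 'c']
```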

4.8.5. Slicing dict

  • Slicing dict is not possible

DATA = {'a': 1, 'b': 2}

DATA[:3]
# TypeError: unhashable type: 'slice'
# Since Python 3.12: KeyError: slice(None, 3, None)
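
A dict is indexed by key, not by position, which is why the slice fails. One possible workaround, sketched below, is to slice a list of its items. The third entry 'c' is added only to make the example more telling, and the names items and first_two are assumptions.

```python
DATA = {'a': 1, 'b': 2, 'c': 3}   # 'c' added for illustration

# dict keeps insertion order (Python 3.7+), so a list of items can be sliced
items = list(DATA.items())

first_two = dict(items[:2])       # {'a': 1, 'b': 2}
```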

4.9. Slicing nested sequences

DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

DATA[::2]
# [
#   [1, 2, 3],
#   [7, 8, 9],
# ]
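
The slice above selects whole rows. To reach inside the rows, each row has to be sliced separately. The following sketch shows two ways of doing that; the list comprehension is only one option and the result names are made up for the example.

```python
DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# The outer slice selects rows
first_two_rows = DATA[:2]                       # [[1, 2, 3], [4, 5, 6]]

# Slicing every row separately selects columns
first_two_columns = [row[:2] for row in DATA]   # [[1, 2], [4, 5], [7, 8]]

# Index a row first, then slice inside it
middle_row_tail = DATA[1][1:]                   # [5, 6]
```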

4.10. Slice function

  • A slice object can be created with slice() and returned from a function

  • A function can, for example, calculate the starting point of a substring

text = 'We choose to go to the Moon!'

between = slice(23, 28)
text[between]
# 'Moon!'
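
A function returning a slice could look like the sketch below. The helper word_slice is hypothetical, written only for this example, and assumes that the word occurs in the text.

```python
def word_slice(text, word):
    """Return a slice covering the first occurrence of word in text.

    Hypothetical helper; assumes word occurs in text.
    """
    start = text.index(word)   # raises ValueError when word is missing
    return slice(start, start + len(word))


text = 'We choose to go to the Moon!'
between = word_slice(text, 'Moon')

print(between)         # slice(23, 27, None)
print(text[between])   # Moon
```

Because the slice is an ordinary object, it can be stored in a variable and reused on any sequence of matching layout.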

4.11. Assignments

4.11.1. Simple collections

English
  1. Create tuple a with digits: 0, 1, 2, 3

  2. Create list b with digits: 2, 3, 4, 5

  3. Create set c with every second element from a and b

  4. Print c

The whys and wherefores
  • Defining and using list, tuple, set

  • Slice data structures

  • Type casting
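
As a hint rather than a solution, the sketch below applies the same idea to toy data (the names letters, more and result are made up). A tuple and a list cannot be added directly, so each slice is cast to set before the two are combined.

```python
letters = ('a', 'b', 'c', 'd')     # toy tuple
more = ['c', 'd', 'e', 'f']        # toy list

# tuple + list raises TypeError, so each slice is cast to set first
result = set(letters[::2]) | set(more[::2])

print(result == {'a', 'c', 'e'})   # True
```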

4.11.2. Split train/test

English
  1. For input data (see below)

  2. Write header (first line) to header variable

  3. Write data without header to data variable

  4. Calculate the pivot point: the number of records in data multiplied by PERCENT (60% for the split below), cast to int

  5. Divide data into two lists:

    • X_train: 60% - training data

    • X_test: 40% - testing data

  6. Write records from the start of data up to the pivot to X_train

  7. Write records from the pivot to the end of data to X_test

Input
INPUT = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

The whys and wherefores
  • Using nested sequences

  • Using slices

  • Type casting

  • Magic Number
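
The pivot technique can be sketched on toy data as shown below. The value PERCENT = 0.6 is an assumption that matches the 60/40 split, and rows, records, train_part and test_part are illustrative names rather than the ones required by the assignment.

```python
PERCENT = 0.6   # assumed value, matching a 60/40 split

rows = [
    ('name', 'score'),   # header
    ('a', 1),
    ('b', 2),
    ('c', 3),
    ('d', 4),
    ('e', 5),
]  # toy data, not the assignment input

header = rows[0]
records = rows[1:]

# Slice indexes must be int, so the float product is cast
pivot = int(len(records) * PERCENT)   # int(3.0) == 3

train_part = records[:pivot]   # [('a', 1), ('b', 2), ('c', 3)]
test_part = records[pivot:]    # [('d', 4), ('e', 5)]
```

Keeping the ratio in a named constant such as PERCENT avoids a magic number buried inside the slice expression.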

4.11.3. Iris dataset

  • Complexity level: medium

  • Lines of code to write: 30 lines

  • Estimated time of completion: 20 min

  • Filename: solution/slice_iris.py

English
  1. For input data (see below)

  2. Use only slices

  3. Extract the measurements to a features list (every row must be a tuple)

  4. Extract the species names (every fifth element) and write them to a labels list

  5. Write the unique species names to a species set

Input
INPUT = (
    5.8, 2.7, 5.1, 1.9, 'virginica',
    5.1, 3.5, 1.4, 0.2, 'setosa',
    5.7, 2.8, 4.1, 1.3, 'versicolor',
    6.3, 2.9, 5.6, 1.8, 'virginica',
    6.4, 3.2, 4.5, 1.5, 'versicolor',
    4.7, 3.2, 1.3, 0.2, 'setosa',
)

Output
features = [
    (5.8, 2.7, 5.1, 1.9),
    (5.1, 3.5, 1.4, 0.2),
    (5.7, 2.8, 4.1, 1.3),
    (6.3, 2.9, 5.6, 1.8),
    (6.4, 3.2, 4.5, 1.5),
    (4.7, 3.2, 1.3, 0.2),
]

labels = [
    'virginica',
    'setosa',
    'versicolor',
    'virginica',
    'versicolor',
    'setosa',
]

species = {
    'versicolor',
    'setosa',
    'virginica',
}

The whys and wherefores
  • Defining and using list, tuple, set

  • Slicing sequences
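
As a hint, the sketch below shows the same technique on a smaller flat tuple in which every record has two numbers and one label. The data and all names are made up for the example; the assignment input has a different record width.

```python
flat = (
    1, 2, 'x',
    3, 4, 'y',
    5, 6, 'x',
)  # toy data: two numbers followed by a label, repeated

# Start at the first label (index 2) and take every third element
labels = list(flat[2::3])    # ['x', 'y', 'x']

# Each record is a fixed-width window of the flat tuple
first_record = flat[0:2]     # (1, 2)
second_record = flat[3:5]    # (3, 4)

unique_labels = set(labels)  # {'x', 'y'}
```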

4.11.4. Slicing text

  • Complexity level: easy

  • Lines of code to write: 8 lines

  • Estimated time of completion: 10 min

  • Filename: solution/slice_text.py

English
  1. For input data (see below)

  2. Expected value is Jana III Sobieskiego

  3. Use only slices to clean each variable

  4. Compare with output data (see below)

Input
a = 'lt. Mark Watney'
b = 'lt. col. Jan Twardowski\t'
c = 'dr hab. inż. Jan Twardowski, prof. LAW'
d = 'gen. pil. Jan Twardowski'
e = 'Mark Watney, PhD'
f = 'lt. col. ret. Melissa Lewis'
g = 'dr n. med. Ryan Stone'
h = 'Ryan Stone, MD-PhD'

The whys and wherefores
  • Variable definition

  • Print formatting

  • Slicing strings

  • Cleaning text input
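
Cleaning with slices comes down to counting the characters to drop on each side. A minimal sketch on a hypothetical value (not one of the input variables) might look like this:

```python
raw = 'prof. Ada Lovelace, PhD'   # hypothetical value, not from the input

# 'prof. ' is 6 characters long, ', PhD' is 5 characters long
name = raw[6:-5]

print(name)   # Ada Lovelace
```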