5. Nested Collections

5.1. list of tuple

5.1.1. Getting elements

DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

DATA[2]     # (7.6, 3.0, 6.6, 2.1, 'virginica')
DATA[2][1]  # 3.0
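Each row is a tuple, so the second index accepts anything a tuple accepts, such as a negative index or a slice. The row also cannot be modified in place. Below is a minimal sketch that reuses the DATA above; the names `row`, `species`, `measurements` and `immutable` are illustrative only:

```python
DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

row = DATA[2]            # the whole third record (a tuple)
species = row[-1]        # last field of the record: 'virginica'
measurements = row[:4]   # first four fields: (7.6, 3.0, 6.6, 2.1)

immutable = False
try:
    DATA[2][1] = 3.1     # tuples are immutable, so item assignment fails
except TypeError:
    immutable = True     # the record is left unchanged
```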

5.1.2. Appending elements

DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

element = (4.9, 2.5, 4.5, 1.7, 'virginica')
DATA.append(element)

DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

DATA.append((4.9, 3.0, 1.4, 0.2, 'setosa'))
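`append()` takes exactly one argument, which is why the literal above needs double parentheses: the outer pair is the call and the inner pair is the tuple. Passing the fields separately raises `TypeError`. A short sketch (the names `last` and `message` are illustrative):

```python
DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

DATA.append((4.9, 3.0, 1.4, 0.2, 'setosa'))   # one argument: a whole tuple
last = DATA[-1]                               # append() always adds at the end

try:
    DATA.append(4.9, 3.0, 1.4, 0.2, 'setosa')  # five arguments instead of one
except TypeError as error:
    message = str(error)                      # the list is not modified
```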

5.1.3. Length

DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

len(DATA)       # 3
len(DATA[2])    # 5
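`len(DATA)` counts records and `len(DATA[i])` counts the fields of one record. The sketch below combines the two to check that every record has the same number of fields. The variable names are illustrative:

```python
DATA = [
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
]

row_lengths = []
for row in DATA:
    row_lengths.append(len(row))        # number of fields in this record

records = len(DATA)                     # number of records: 3
consistent = len(set(row_lengths)) == 1  # True when every record has the same width
```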

5.2. list of dict

5.2.1. Getting elements

DATA = [
    {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'},
    {'measurements': [7.0, 3.2, 4.7, 1.4], 'species': 'versicolor'},
    {'measurements': [7.6, 3.0, 6.6, 2.1], 'species': 'virginica'},
]

DATA[0]                             # {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'}
DATA[0]['measurements']             # [4.7, 3.2, 1.3, 0.2]
DATA[0]['species']                  # 'setosa'

DATA = [
    {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'},
    {'measurements': [7.0, 3.2, 4.7, 1.4], 'species': 'versicolor'},
    {'measurements': [7.6, 3.0, 6.6, 2.1], 'species': 'virginica'},
]

DATA[0]['kind']                     # KeyError: 'kind'
DATA[0].get('kind')                 # None
DATA[0].get('kind', 'n/a')          # 'n/a'
DATA[2].get('measurements')         # [7.6, 3.0, 6.6, 2.1]
DATA[2].get('measurements')[1]      # 3.0
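The choice between square brackets and `.get()` matters when a key might be missing. `[]` raises `KeyError`, while `.get()` returns `None` or the supplied default. A small sketch (the names `record`, `missing`, `default`, `fallback` and `second` are illustrative):

```python
DATA = [
    {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'},
    {'measurements': [7.0, 3.2, 4.7, 1.4], 'species': 'versicolor'},
    {'measurements': [7.6, 3.0, 6.6, 2.1], 'species': 'virginica'},
]

record = DATA[0]

try:
    record['kind']                    # missing key with []: raises KeyError
except KeyError as error:
    missing = error.args[0]           # the key that was not found: 'kind'

default = record.get('kind')          # missing key with .get(): None
fallback = record.get('kind', 'n/a')  # missing key with a default: 'n/a'
second = record['measurements'][1]    # dict key first, then list index: 3.2
```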

5.2.2. Length

DATA = [
    {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'},
    {'measurements': [7.0, 3.2, 4.7, 1.4], 'species': 'versicolor'},
    {'measurements': [7.6, 3.0, 6.6, 2.1], 'species': 'virginica'},
]

len(DATA)                     # 3
len(DATA[0])                  # 2
len(DATA[1])                  # 2
len(DATA[1]['species'])       # 10
len(DATA[1]['measurements'])  # 4
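`len()` on a dict counts its keys, and `len()` on a str counts its characters. That is why `len(DATA[1]['species'])` is 10: `'versicolor'` has ten letters. The sketch below makes both counts explicit; the names are illustrative:

```python
DATA = [
    {'measurements': [4.7, 3.2, 1.3, 0.2], 'species': 'setosa'},
    {'measurements': [7.0, 3.2, 4.7, 1.4], 'species': 'versicolor'},
    {'measurements': [7.6, 3.0, 6.6, 2.1], 'species': 'virginica'},
]

record = DATA[1]
keys = list(record)            # iterating a dict yields its keys, in insertion order
species = record['species']    # 'versicolor'
```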

5.3. list of list

  • Multidimensional lists

my_list = [[4.7, 3.2], [1.3, 0.2]]

my_list = [
    [4.7, 3.2],
    [1.3, 0.2]]

5.3.1. Readability counts

DATA = [[1,2,3],[4,5,6],[7,8,9]]
DATA = [[1,2,3], [4,5,6], [7,8,9]]
DATA = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
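All four spellings above build the same value. Spaces and line breaks inside brackets only affect how easily the code is read. A quick check (the variable names are illustrative):

```python
compact = [[1,2,3],[4,5,6],[7,8,9]]
spaced = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
multiline = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],   # a trailing comma is allowed after the last row
]

same = compact == spaced == multiline   # True: layout does not change the value
```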

5.3.2. Getting elements

DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

DATA[0][0]  # 1
DATA[0][2]  # 3
DATA[2][1]  # 8
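`DATA[i][j]` reads as "row `i`, then column `j`". Splitting the chained index into two steps shows this, and a loop over the rows collects a whole column. A sketch (the names `row`, `value` and `column` are illustrative):

```python
DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

row = DATA[2]      # third row: [7, 8, 9]
value = row[1]     # second column of that row: 8

column = []
for r in DATA:
    column.append(r[1])   # second column of every row: [2, 5, 8]
```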

5.3.3. Length

DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

len(DATA)     # 3
len(DATA[2])  # 3
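`len(DATA)` counts only the rows. To count every number in the grid, sum the length of each row. A sketch (the name `total` is illustrative):

```python
DATA = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

total = 0
for row in DATA:
    total += len(row)   # 3 + 3 + 3
```

For a rectangular grid like this one, the result equals rows times columns.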

5.4. Mixed types

5.4.1. Getting elements

DATA = [
    [1, 2, 3],
    (4, 5, 6),
    {7, 8, 9},
    {'species': 'virginica', 'measurements': [7.6, 3.0, 6.6, 2.1]}
]

DATA[1][2]                # 6
DATA[3]['species']        # 'virginica'
DATA[3].get('species')    # 'virginica'
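The example skips `DATA[2]` for a reason: it is a set, and sets have no order and no indices, so subscripting one raises `TypeError`. Membership tests still work. A sketch (the names `message` and `contains` are illustrative):

```python
DATA = [
    [1, 2, 3],
    (4, 5, 6),
    {7, 8, 9},
    {'species': 'virginica', 'measurements': [7.6, 3.0, 6.6, 2.1]},
]

try:
    DATA[2][0]                # DATA[2] is a set, which cannot be indexed
except TypeError as error:
    message = str(error)      # "'set' object is not subscriptable"

contains = 8 in DATA[2]       # membership test on a set: True
```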

5.4.2. Length

DATA = [
    [1, 2, 3],
    (4, 5, 6),
    {7, 8, 9},
    {'species': 'virginica', 'measurements': [7.6, 3.0, 6.6, 2.1]}
]

len(DATA)                     # 4
len(DATA[0])                  # 3
len(DATA[3])                  # 2
len(DATA[3]['measurements'])  # 4
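`len()` works on every element of this mixed list, including the set `DATA[2]`, which the listing above leaves out. A sketch (the name `lengths` is illustrative):

```python
DATA = [
    [1, 2, 3],
    (4, 5, 6),
    {7, 8, 9},
    {'species': 'virginica', 'measurements': [7.6, 3.0, 6.6, 2.1]},
]

lengths = []
for element in DATA:
    lengths.append(len(element))   # list, tuple, set and dict all support len()
```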

5.5. Assignments

5.5.1. Split train/test

  • Filename: sequences_split_train_test.py

  • Lines of code to write: 6 lines

  • Estimated time of completion: 15 min

Listing 5.1. Iris Dataset
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

  1. Use the Iris dataset from Listing 5.1.

  2. Save the header (the first line) to a variable

  3. Save the data without the header to another variable

  4. Calculate the split point: the number of data records (without the header) multiplied by the percentage

  5. Split the dataset into two lists in the following proportions:

    • X_train - training data - 60%

    • X_test - test data - 40%

  6. From the data without the header, save the records from the beginning up to the split point as training data

  7. From the data without the header, save the records from the split point to the end as test data

The whys and wherefores
  • Processing complex data types

  • Using slices

  • Type conversion

  • Magic Number

Hint
  • selected = DATA[1:]
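The assignment steps can be sketched as follows. This is one possible approach, not the course's reference solution. The function name `split_train_test` and the constant `TRAIN_RATIO` are hypothetical. To keep the sketch short, only the header and the first five records of Listing 5.1 are reproduced.

```python
# Hypothetical named constant, used instead of a bare "magic number" 0.6
TRAIN_RATIO = 0.6


def split_train_test(data, ratio=TRAIN_RATIO):
    """Split a header-first dataset into (header, train, test)."""
    header = data[0]                   # first row holds the column names
    records = data[1:]                 # everything after the header
    split = int(len(records) * ratio)  # int(): slice indices must be integers
    train = records[:split]            # from the beginning up to the split point
    test = records[split:]             # from the split point to the end
    return header, train, test


DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
]

header, X_train, X_test = split_train_test(DATA)
```

Listing 5.1 has 21 records, and `int(21 * 0.6)` evaluates to 12. So with the full dataset, 12 records go to `X_train` and the remaining 9 to `X_test`.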