7.5. Loop over Dict

7.5.1. Rationale

  • Since Python 3.7: dict preserves insertion order (guaranteed by the language)

  • Before Python 3.7: dict order was not guaranteed by the language
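
A minimal sketch of what the ordering guarantee means in practice, assuming Python 3.7 or newer. The `measurements` and `order` names are illustrative only:

```python
# Keys are inserted in a deliberately non-alphabetical order.
measurements = {}
measurements['Sepal width'] = 3.5
measurements['Petal length'] = 1.4
measurements['Sepal length'] = 5.1

# Iteration follows insertion order, not alphabetical order.
order = list(measurements)
print(order)

# Updating an existing key keeps its original position.
measurements['Sepal width'] = 3.6
print(list(measurements))
```

Both print calls should show the keys in the order in which they were first inserted.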

7.5.2. Iterate

  • By default, iterating over a dict yields its keys

  • Suggested variable name: key

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> for key in DATA:
...     print(key)
Sepal length
Sepal width
Petal length
Petal width
Species

7.5.3. Iterate Keys

  • Suggested variable name: key

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.keys())
['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>>
>>> for key in DATA.keys():
...     print(key)
Sepal length
Sepal width
Petal length
Petal width
Species

7.5.4. Iterate Values

  • Suggested variable name: value

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.values())
[5.1, 3.5, 1.4, 0.2, 'setosa']
>>>
>>> for value in DATA.values():
...     print(value)
5.1
3.5
1.4
0.2
setosa

7.5.5. Iterate Key-Value Pairs

  • Suggested variable name: key, value

Get each (key, value) pair from dict items:

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.items())  # doctest: +NORMALIZE_WHITESPACE
[('Sepal length', 5.1),
 ('Sepal width', 3.5),
 ('Petal length', 1.4),
 ('Petal width', 0.2),
 ('Species', 'setosa')]
>>>
>>> for key, value in DATA.items():
...     print(key, '->', value)
Sepal length -> 5.1
Sepal width -> 3.5
Petal length -> 1.4
Petal width -> 0.2
Species -> setosa
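
As a further illustration, the sketch below uses `.items()` to keep only the numeric measurements and skip the species label. The `numeric` name and the `isinstance(value, float)` check are assumptions made for this example, not part of the original material:

```python
DATA = {'Sepal length': 5.1,
        'Sepal width': 3.5,
        'Petal length': 1.4,
        'Petal width': 0.2,
        'Species': 'setosa'}

numeric = {}

for key, value in DATA.items():
    # Keep float measurements only; the species name is a str.
    if isinstance(value, float):
        numeric[key] = value

print(numeric)
```

Having both `key` and `value` available in the loop body is what makes this kind of filtering straightforward.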

7.5.6. List of Dicts

Iterating over a list of dicts:

>>> DATA = [{'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
...         {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
...         {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'}]
>>>
>>> for row in DATA:
...     sepal_length = row['Sepal length']
...     species = row['Species']
...     print(f'{species} -> {sepal_length}')
setosa -> 5.1
versicolor -> 5.7
virginica -> 6.3
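
When every field of every row is needed, the loop over the list can be combined with a nested loop over `.items()`. This is a minimal sketch on a shortened dataset; the `pairs` name is hypothetical:

```python
DATA = [{'Sepal length': 5.1, 'Species': 'setosa'},
        {'Sepal length': 5.7, 'Species': 'versicolor'}]

pairs = []

for row in DATA:                      # outer loop: one dict per row
    for key, value in row.items():    # inner loop: key-value pairs of that row
        pairs.append((key, value))
        print(key, '->', value)
```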

7.5.7. Generate with Range

  • range(len(...)) generates indexes used to read both lists

  • More Pythonic way is to use zip()

  • Avoid range(len(...)) when the index itself is not needed

Create dict from two lists:

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for i in range(len(header)):
...     key = header[i]
...     value = data[i]
...     result[key] = value
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}

7.5.8. Generate with Enumerate

  • enumerate() yields (index, element) pairs

  • _ is a regular variable name (not a special syntax)

  • _ by convention is used when a variable will not be referenced

Create dict from two lists:

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for i, key in enumerate(header):
...     result[key] = data[i]
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}
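
The `_` convention from the notes at the top of this section can be sketched as follows. Here the key is not needed, so it is bound to `_`; the `values` and `last_key` names are illustrative only:

```python
DATA = {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'}

values = []

# The key is ignored, so by convention it is named `_`.
for _, value in DATA.items():
    values.append(value)

print(values)

# `_` is an ordinary name: after the loop it still holds the last key.
last_key = _
print(last_key)
```

In a case this simple, iterating over `DATA.values()` would avoid the unused name altogether.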

7.5.9. Generate with Zip

  • zip()

  • The most Pythonic way

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for key, value in zip(header, data):
...     result[key] = value
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}

Using dict() with zip() directly, without a loop:

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = dict(zip(header, data))
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}
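
One caveat worth knowing: `zip()` stops at the shortest input, so a length mismatch between the two lists is silently ignored. The sketch below uses a deliberately shortened `data` list. The `strict=True` argument is available only in Python 3.10 and newer, hence the version check:

```python
import sys

header = ['Sepal length', 'Sepal width', 'Species']
data = [5.1, 3.5]  # one value is missing on purpose

# zip() stops at the shortest input, so 'Species' is silently dropped.
result = dict(zip(header, data))
print(result)

# Python 3.10+ only: strict=True raises ValueError on a length mismatch.
if sys.version_info >= (3, 10):
    try:
        dict(zip(header, data, strict=True))
    except ValueError as error:
        print('Length mismatch:', error)
```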

7.5.10. Assignments

Code 7.16. Solution
"""
* Assignment: Loop Dict To Dict
* Complexity: easy
* Lines of code: 3 lines
* Time: 8 min

English:
    1. Use data from "Given" section (see below)
    2. Convert to `result: dict[str, str]`
    3. Compare result with "Tests" section (see below)

Polish:
    1. Użyj danych z sekcji "Given" (patrz poniżej)
    2. Przekonwertuj do `result: dict[str, str]`
    3. Porównaj wyniki z sekcją "Tests" (patrz poniżej)

Tests:
    >>> type(result)
    <class 'dict'>
    >>> result  # doctest: +NORMALIZE_WHITESPACE
    {'Doctorate': '6',
     'Prof-school': '6',
     'Masters': '5',
     'Bachelor': '5',
     'Engineer': '5',
     'HS-grad': '4',
     'Junior High': '3',
     'Primary School': '2',
     'Kindergarten': '1'}
"""

# Given
DATA = {
    6: ['Doctorate', 'Prof-school'],
    5: ['Masters', 'Bachelor', 'Engineer'],
    4: ['HS-grad'],
    3: ['Junior High'],
    2: ['Primary School'],
    1: ['Kindergarten'],
}

result = ...  # dict[str,str]: converted DATA. Note values are str not int!

Code 7.17. Solution
"""
* Assignment: Loop Dict To List
* Complexity: medium
* Lines of code: 4 lines
* Time: 5 min

English:
    1. Use data from "Given" section (see below)
    2. Define `result: list[dict]`:
        a. key - name from the header
        b. value - measurement or species
    3. Compare result with "Tests" section (see below)

Polish:
    1. Użyj danych z sekcji "Given" (patrz poniżej)
    2. Zdefiniuj `result: list[dict]`:
        a. klucz - nazwa z nagłówka
        b. wartość - wyniki pomiarów lub gatunek
    3. Porównaj wyniki z sekcją "Tests" (patrz poniżej)

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> type(result)
    <class 'list'>

    >>> assert all(type(x) is dict for x in result)

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'Sepal length': 5.8, 'Sepal width': 2.7, 'Petal length': 5.1, 'Petal width': 1.9, 'Species': 'virginica'},
     {'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
     {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
     {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'},
     {'Sepal length': 6.4, 'Sepal width': 3.2, 'Petal length': 4.5, 'Petal width': 1.5, 'Species': 'versicolor'},
     {'Sepal length': 4.7, 'Sepal width': 3.2, 'Petal length': 1.3, 'Petal width': 0.2, 'Species': 'setosa'}]
"""

# Given
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

result = ...  # list[dict]: with converted DATA

Code 7.18. Solution
"""
* Assignment: Loop Dict Label Encoder
* Complexity: hard
* Lines of code: 9 lines
* Time: 13 min

English:
    1. Use data from "Given" section (see below)
    2. Define:
        a. `features: list[tuple]` - measurements
        b. `labels: list[int]` - species
        c. `label_encoder: dict[int, str]`
            dictionary with encoded (as numbers) species names
    3. Separate header from data
    4. To encode and decode `labels` (species) we need
        `label_encoder: dict[int, str]`:
        a. key - id (incremented integer value)
        b. value - species name
    5. `label_encoder` must be generated from `DATA`
    6. For each row add appropriate data to `features`, `labels` and
        `label_encoder`
    7. Print `features`, `labels` and `label_encoder`
    8. Compare result with "Tests" section (see below)

Polish:
    1. Użyj danych z sekcji "Given" (patrz poniżej)
    2. Zdefiniuj:
        a. `features: list[tuple]` - pomiary
        b. `labels: list[int]` - gatunki
        c. `label_encoder: dict[int, str]`
            słownik zakodowanych (jako cyfry) nazw gatunków
    3. Odseparuj nagłówek od danych
    4. Aby móc zakodować i odkodować `labels` (gatunki) potrzebujesz
        `label_encoder: dict[int, str]`:
        a. key - identyfikator (kolejna liczba całkowita)
        b. value - nazwa gatunku
    5. `label_encoder` musi być wygenerowany z `DATA`
    6. Dla każdego wiersza dodawaj odpowiednie dane do
        `features`, `labels` i `label_encoder`
    7. Wypisz `features`, `labels` i `label_encoder`
    8. Porównaj wyniki z sekcją "Tests" (patrz poniżej)

Hints:
    * Reversed lookup dict

Tests:
    >>> import sys
    >>> sys.tracebacklimit = 0

    >>> assert type(features) is list
    >>> assert type(labels) is list
    >>> assert type(label_encoder) is dict
    >>> assert all(type(x) is tuple for x in features)
    >>> assert all(type(x) is int for x in labels)
    >>> assert all(type(x) is int for x in label_encoder.keys())
    >>> assert all(type(x) is str for x in label_encoder.values())

    >>> features  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9),
     (5.1, 3.5, 1.4, 0.2),
     (5.7, 2.8, 4.1, 1.3),
     (6.3, 2.9, 5.6, 1.8),
     (6.4, 3.2, 4.5, 1.5),
     (4.7, 3.2, 1.3, 0.2)]
    >>> labels
    [0, 1, 2, 0, 2, 1]
    >>> label_encoder  # doctest: +NORMALIZE_WHITESPACE
    {0: 'virginica',
     1: 'setosa',
     2: 'versicolor'}
"""

# Given
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

features = ...  # list[tuple]: values from column 0-3 from DATA without header
labels = ...  # list[int]: encoded species ids for column 4 from DATA without header
label_encoder = ...  # dict[int,str]: lookup dict generated from species names