12.2. Comprehension List

12.2.1. Syntax

Short syntax:

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]

Long syntax (a generator expression passed to `list()`):

>>> list(x for x in range(0,5))
[0, 1, 2, 3, 4]
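A list comprehension builds the same list that an explicit `for` loop with `append()` would build. The sketch below compares the three forms; the variable names `loop_result`, `comp_result` and `gen_result` exist only for this illustration:

```python
# Build the list step by step with an explicit loop
loop_result = []
for x in range(0, 5):
    loop_result.append(x)

# The same list built with the short syntax (list comprehension)
comp_result = [x for x in range(0, 5)]

# And with the long syntax (generator expression passed to list())
gen_result = list(x for x in range(0, 5))

print(loop_result == comp_result == gen_result)  # True
```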

12.2.2. Microbenchmark

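Measured in IPython/Jupyter with the `%%timeit` cell magic, which is not available in the plain Python REPL. Timings depend on the machine and Python version; here the comprehension is slightly faster, although the difference is smaller than one standard deviation.

>>> %%timeit -r 1000 -n 1000
... result = []
... for x in range(0,5):
...     result.append(x)
...
457 ns ± 69.4 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
>>>
>>> %%timeit -r 1000 -n 1000
... result = [x for x in range(0,5)]
...
411 ns ± 76.6 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)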
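Outside IPython, a similar comparison can be sketched with the standard library `timeit` module. The repeat and loop counts below are arbitrary choices, and the printed numbers will differ between machines:

```python
import timeit

# Statement 1: build the list with an explicit loop
LOOP = """
result = []
for x in range(0, 5):
    result.append(x)
"""

# Statement 2: build the same list with a comprehension
COMPREHENSION = "result = [x for x in range(0, 5)]"

NUMBER = 100_000  # executions of the statement per run

# timeit.repeat() returns a list with one total time (in seconds) per run
loop_times = timeit.repeat(LOOP, repeat=5, number=NUMBER)
comp_times = timeit.repeat(COMPREHENSION, repeat=5, number=NUMBER)

# Take the fastest run (least disturbed by other processes), in ns per execution
loop_ns = min(loop_times) / NUMBER * 1e9
comp_ns = min(comp_times) / NUMBER * 1e9

print(f'loop:          {loop_ns:.0f} ns per loop')
print(f'comprehension: {comp_ns:.0f} ns per loop')
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, because slower runs are usually caused by interference from other processes, not by the code itself.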

12.2.3. Manipulate Numbers

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]
>>>
>>> [x+10 for x in range(0,5)]
[10, 11, 12, 13, 14]

>>> [x*x for x in range(1,5)]
[1, 4, 9, 16]
>>>
>>> [x*(x+1) for x in range(1,5)]
[2, 6, 12, 20]

>>> [x**2 for x in range(0,5)]
[0, 1, 4, 9, 16]
>>>
>>> [x**3 for x in range(0,5)]
[0, 1, 8, 27, 64]
>>>
>>> [2**x for x in range(0,5)]
[1, 2, 4, 8, 16]
>>>
>>> [3**x for x in range(0,5)]
[1, 3, 9, 27, 81]

>>> [1/x for x in range(0,5)]
Traceback (most recent call last):
ZeroDivisionError: division by zero
>>>
>>> [1/x for x in range(1,5)]
[1.0, 0.5, 0.3333333333333333, 0.25]
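The expression part of a comprehension can be any expression, including a call to your own function. The sketch below uses a hypothetical helper `reciprocal()` and also shows what the `ZeroDivisionError` above means in practice: when the expression fails for one element, the whole comprehension fails and no partial list is assigned.

```python
def reciprocal(x):
    """Hypothetical helper: return 1/x, raises ZeroDivisionError for x == 0."""
    return 1 / x

# Plain arithmetic and a function call both work as the expression
squares = [x * x for x in range(1, 5)]
reciprocals = [reciprocal(x) for x in range(1, 5)]

# The exception interrupts the comprehension, so `result` keeps its old value
result = None
try:
    result = [reciprocal(x) for x in range(0, 5)]
except ZeroDivisionError as error:
    print('failed:', error)

print(squares)  # [1, 4, 9, 16]
print(result)   # None
```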

12.2.4. Manipulate Strings

>>> DATA = ['a', 'b', 'c']
>>>
>>> ','.join(DATA)
'a,b,c'
>>>
>>> ','.join(x for x in DATA)
'a,b,c'
>>>
>>> ','.join(x.upper() for x in DATA)
'A,B,C'
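The same pattern works on words: split a string, transform each piece in a comprehension (here a generator expression passed to `str.join()`), and join the pieces back. A small sketch with made-up text:

```python
text = 'hello world from python'

# Capitalize every word, then glue the words back together with spaces
title = ' '.join(word.capitalize() for word in text.split())

# Take the first letter of every word, uppercased, with no separator
initials = ''.join(word[0].upper() for word in text.split())

print(title)     # Hello World From Python
print(initials)  # HWFP
```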

12.2.5. Type Conversion

>>> DATA = [1, 2, 3]
>>>
>>> [float(x) for x in DATA]
[1.0, 2.0, 3.0]

Method `str.join()` requires every element of the iterable to be a string. If the data contains other types, such as `int` in the following example, the method raises `TypeError`. You can convert those values to strings with a comprehension.

>>> DATA = [1, 2, 3]
>>>
>>> ','.join(DATA)
Traceback (most recent call last):
TypeError: sequence item 0: expected str instance, int found
>>>
>>> ','.join(str(x) for x in DATA)
'1,2,3'
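Both conversions are often needed together, for example when reading one line of comma-separated numbers and writing it back. A sketch of such a round trip, assuming a line in the same format as the iris rows used below:

```python
line = '5.8,2.7,5.1,1.9'

# str -> float: split the text into fields and convert each field
values = [float(x) for x in line.split(',')]

# float -> str: convert each number back to str before joining
text = ','.join(str(x) for x in values)

print(values)  # [5.8, 2.7, 5.1, 1.9]
print(text)    # 5.8,2.7,5.1,1.9
```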

12.2.6. Slice Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> [row for row in DATA]  
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
 (5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> [row for row in DATA[1:]]  
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
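Note that `[row for row in DATA[1:]]` does not transform the rows: it builds a new list with the same row objects that the slice `DATA[1:]` already returns. A sketch with a shortened copy of `DATA`:

```python
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
]

# The comprehension only copies each row reference into a new list...
rows_comprehension = [row for row in DATA[1:]]

# ...so the slice alone gives an equal list
rows_slice = DATA[1:]

print(rows_comprehension == rows_slice)  # True
print(rows_comprehension is rows_slice)  # False: two separate lists
print(rows_comprehension[0] is DATA[1])  # True: the rows are shared, not copied
```

The comprehension becomes useful once the expression does something with `row`, as in the following sections.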

12.2.7. Slice Data in Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> [row[-1] for row in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
>>>
>>> [row[0:4] for row in DATA[1:]]  
[(5.8, 2.7, 5.1, 1.9),
 (5.1, 3.5, 1.4, 0.2),
 (5.7, 2.8, 4.1, 1.3),
 (6.3, 2.9, 5.6, 1.8),
 (6.4, 3.2, 4.5, 1.5),
 (4.7, 3.2, 1.3, 0.2)]
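Indexes and slices can also pick a subset of columns, or combine single columns into a new tuple. A sketch with a shortened `DATA`:

```python
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
]

# Keep only the sepal columns (indexes 0 and 1) of each data row
sepals = [row[0:2] for row in DATA[1:]]

# Pick petal columns and the species by index and build a new tuple
petals_with_species = [(row[2], row[3], row[-1]) for row in DATA[1:]]

print(sepals)
# [(5.8, 2.7), (5.1, 3.5), (5.7, 2.8)]
print(petals_with_species)
# [(5.1, 1.9, 'virginica'), (1.4, 0.2, 'setosa'), (4.1, 1.3, 'versicolor')]
```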

12.2.8. Unpack Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> # *X collects the four features into a list, y takes the species
>>> [X for *X, y in DATA[1:]]
[[5.8, 2.7, 5.1, 1.9],
 [5.1, 3.5, 1.4, 0.2],
 [5.7, 2.8, 4.1, 1.3],
 [6.3, 2.9, 5.6, 1.8],
 [6.4, 3.2, 4.5, 1.5],
 [4.7, 3.2, 1.3, 0.2]]
>>>
>>> [y for *X, y in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
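Each row can also be unpacked into one variable per column, so the expression reads like the header. A sketch with a shortened `DATA`; the short names `sl`, `sw`, `pl`, `pw` and `s` are only illustrative abbreviations of the column names:

```python
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
]

# One name per column; the number of names must match the row length
petals = [(pl, pw) for sl, sw, pl, pw, s in DATA[1:]]

# By convention `_` marks values that are unpacked but not used
species = [s for _, _, _, _, s in DATA[1:]]

print(petals)   # [(5.1, 1.9), (1.4, 0.2), (4.1, 1.3)]
print(species)  # ['virginica', 'setosa', 'versicolor']
```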

12.2.9. Use Case - 0x01

  • Increment

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]

12.2.10. Use Case - 0x02

  • Decrement

>>> [x-1 for x in range(0,5)]
[-1, 0, 1, 2, 3]

12.2.11. Use Case - 0x03

  • Sum

>>> sum(x for x in range(0,5))
10
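The result can be checked by hand: 0 + 1 + 2 + 3 + 4 = 10, which also matches the closed formula n(n-1)/2 for the numbers 0 to n-1. A short sketch comparing these forms:

```python
n = 5

# Sum of a generator expression: 0 + 1 + 2 + 3 + 4
total = sum(x for x in range(0, n))

# sum() accepts any iterable, so the range can be passed directly
total_range = sum(range(0, n))

# Closed-form check: 0 + 1 + ... + (n-1) == n * (n-1) / 2
total_formula = n * (n - 1) // 2

print(total, total_range, total_formula)  # 10 10 10
```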

12.2.12. Use Case - 0x04

  • Even or Odd

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]
>>> [x % 2 == 0 for x in range(0,5)]
[True, False, True, False, True]
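Instead of `True`/`False`, a conditional expression (`A if condition else B`) can produce a label for each number, and because `True` counts as 1 in arithmetic, `sum()` can count the even numbers. A sketch:

```python
numbers = range(0, 5)

# The conditional expression picks one of two values per element
labels = ['even' if x % 2 == 0 else 'odd' for x in numbers]

# True counts as 1 and False as 0, so this counts the even numbers
even_count = sum(x % 2 == 0 for x in numbers)

print(labels)      # ['even', 'odd', 'even', 'odd', 'even']
print(even_count)  # 3
```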

12.2.13. Assignments

Code 12.2. Solution
"""
* Assignment: Comprehension List Translate
* Type: class assignment
* Complexity: easy
* Lines of code: 1 line
* Time: 3 min

English:
    1. Use a list comprehension to iterate over `DATA`
    2. If a letter is a key in `PL`, use its converted value instead
    3. Join the letters into the string `result`
    4. Run doctests - all must succeed

Hints:
    * `str.join()`
    * `dict.get()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is str

    >>> result
    'zazolc gesla jazn'
"""

PL = {
    'ą': 'a',
    'ć': 'c',
    'ę': 'e',
    'ł': 'l',
    'ń': 'n',
    'ó': 'o',
    'ś': 's',
    'ż': 'z',
    'ź': 'z',
}

DATA = 'zażółć gęślą jaźń'

# DATA with substituted PL diacritic chars to ASCII letters
# type: str
result = ...
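The two hints can be tried out on unrelated data first. A sketch, assuming a made-up `LEET` mapping that is not part of the assignment: `dict.get(key, default)` returns the default when the key is missing, so characters without a mapping pass through unchanged.

```python
LEET = {'a': '4', 'e': '3', 'o': '0'}  # hypothetical mapping for illustration

print(LEET.get('e', 'e'))  # 3 (key found)
print(LEET.get('x', 'x'))  # x (key missing, default returned)

text = 'hello world'

# Look up every character, fall back to the character itself, then join
converted = ''.join(LEET.get(char, char) for char in text)

print(converted)  # h3ll0 w0rld
```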

Code 12.3. Solution
"""
* Assignment: Comprehension List Split
* Type: homework
* Complexity: medium
* Lines of code: 4 lines
* Time: 8 min

English:
    1. Using List Comprehension split `DATA` into:
        a. `features_train: list[tuple]` - features from the first 60% of rows in `DATA`
        b. `features_test: list[tuple]` - features from the last 40% of rows in `DATA`
        c. `labels_train: list[str]` - labels from the first 60% of rows in `DATA`
        d. `labels_test: list[str]` - labels from the last 40% of rows in `DATA`
    2. In order to do so, calculate the pivot point:
        a. number of data rows (without the header) times the given ratio (60% = 0.6)
        b. remember that slice indices must be `int`, not `float`
        c. for example: if the dataset has 10 rows, then 6 rows will be used
           for training and 4 rows for testing
    3. Run doctests - all must succeed

Hints:
    * `iterable[:split]`
    * `iterable[split:]`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert type(features_train) is list, \
    ... 'make sure features_train is a list'

    >>> assert type(features_test) is list, \
    ... 'make sure features_test is a list'

    >>> assert type(labels_train) is list, \
    ... 'make sure labels_train is a list'

    >>> assert type(labels_test) is list, \
    ... 'make sure labels_test is a list'

    >>> assert all(type(x) is tuple for x in features_train), \
    ... 'all elements in features_train should be tuple'

    >>> assert all(type(x) is tuple for x in features_test), \
    ... 'all elements in features_test should be tuple'

    >>> assert all(type(x) is str for x in labels_train), \
    ... 'all elements in labels_train should be str'

    >>> assert all(type(x) is str for x in labels_test), \
    ... 'all elements in labels_test should be str'

    >>> pprint(features_train)
    [(5.8, 2.7, 5.1, 1.9),
     (5.1, 3.5, 1.4, 0.2),
     (5.7, 2.8, 4.1, 1.3),
     (6.3, 2.9, 5.6, 1.8),
     (6.4, 3.2, 4.5, 1.5),
     (4.7, 3.2, 1.3, 0.2)]

    >>> pprint(features_test)
    [(7.0, 3.2, 4.7, 1.4),
     (7.6, 3.0, 6.6, 2.1),
     (4.9, 3.0, 1.4, 0.2),
     (4.9, 2.5, 4.5, 1.7)]

    >>> pprint(labels_train)
    ['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']

    >>> pprint(labels_test)
    ['versicolor', 'virginica', 'setosa', 'virginica']
"""

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

ratio = 0.6
header = DATA[0]
rows = DATA[1:]
split = int(len(rows) * ratio)