7.6. Comprehensions

7.6.1. Loop Information Recap

Listing 116. Iterative approach to applying function to elements
output = []

for x in range(0, 5):
    output.append(x+10)

print(output)
# [10, 11, 12, 13, 14]

7.6.2. Comprehensions Syntax

output = [<RETURN> for <VARIABLE> in <ITERABLE>]
[x for x in (0,1,2,3,4)]
# [0, 1, 2, 3, 4]

[x for x in range(0,5)]
# [0, 1, 2, 3, 4]

[x**2 for x in range(0,5)]
# [0, 1, 4, 9, 16]

7.6.3. Generator expressions vs. Comprehensions

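  • A comprehension is evaluated eagerly: the whole collection is built immediately

  • A generator expression is evaluated lazily: each element is computed only when it is requested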

list(x for x in range(0,5))        # [0, 1, 2, 3, 4]
[x for x in range(0,5)]            # [0, 1, 2, 3, 4]

set(x for x in range(0,5))         # {0, 1, 2, 3, 4}
{x for x in range(0,5)}            # {0, 1, 2, 3, 4}

dict((x,x) for x in range(0,5))    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
{x: x for x in range(0,5)}         # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

tuple(x for x in range(0,5))       # (0, 1, 2, 3, 4)
(x for x in range(0,5))            # <generator object <genexpr> at 0x118c1aed0>

all(x for x in range(0,5))         # False
any(x for x in range(0,5))         # True
sum(x for x in range(0,5))         # 10
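
The difference between eager and lazy evaluation can be observed by recording when the element function is actually called. The following is a minimal sketch; the helper traced() and the list calls are hypothetical names introduced only for this illustration.

```python
calls = []  # hypothetical log of the arguments seen by traced()


def traced(x):
    """Record the call, then return x + 10."""
    calls.append(x)
    return x + 10


# List comprehension: every element is computed immediately
eager = [traced(x) for x in range(0, 5)]
calls_after_list = list(calls)
# [0, 1, 2, 3, 4]

calls.clear()

# Generator expression: no element is computed at creation time
lazy = (traced(x) for x in range(0, 5))
calls_after_genexpr = list(calls)
# []

# Elements are computed one at a time, on demand
first = next(lazy)
calls_after_next = list(calls)
# first == 10, calls_after_next == [0]
```

Laziness is why a generator expression can be passed to all(), any() or sum() without first building a full list in memory.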

7.6.4. Simple usage

7.6.4.1. List Comprehension

Listing 117. list Comprehension approach to applying function to elements
[x+10 for x in range(0, 5)]
# [10, 11, 12, 13, 14]

list(x+10 for x in range(0,5))
# [10, 11, 12, 13, 14]

7.6.4.2. Set Comprehension

Listing 118. set Comprehension approach to applying function to elements
{x+10 for x in range(0, 5)}
# {10, 11, 12, 13, 14}

set(x+10 for x in range(0, 5))
# {10, 11, 12, 13, 14}

7.6.4.3. Dict Comprehension

Listing 119. dict Comprehension approach to applying function to values
{x: x+10 for x in range(0, 5)}
# {0:10, 1:11, 2:12, 3:13, 4:14}

dict((x,x+10) for x in range(0,5))
# {0:10, 1:11, 2:12, 3:13, 4:14}
Listing 120. dict Comprehension approach to applying function to keys
{x+10: x for x in range(0, 5)}
# {10:0, 11:1, 12:2, 13:3, 14:4}

dict((x+10,x) for x in range(0,5))
# {10:0, 11:1, 12:2, 13:3, 14:4}
Listing 121. dict Comprehension approach to applying function to keys and values
{x+10: x+10 for x in range(0, 5)}
# {10:10, 11:11, 12:12, 13:13, 14:14}

dict((x+10, x+10) for x in range(0,5))
# {10:10, 11:11, 12:12, 13:13, 14:14}

7.6.4.4. Tuple Comprehension?!

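  • There is no tuple comprehension syntax: parentheses around a comprehension create a generator expression

  • To get a tuple, pass the generator expression to tuple()

  • More in chapter Generators

Listing 122. Tuple built from a generator expression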
tuple(x for x in range(0,5))
# (0, 1, 2, 3, 4)
Listing 123. Generator Expression
(x+10 for x in range(0, 5))
# <generator object <genexpr> at 0x11eaef570>
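
One practical consequence of getting a generator instead of a tuple is that a generator can be consumed only once. The sketch below illustrates this; the variable names are chosen for this example only.

```python
numbers = (x + 10 for x in range(0, 5))

first_pass = tuple(numbers)
# (10, 11, 12, 13, 14)

second_pass = tuple(numbers)
# ()  -- the generator is already exhausted

# A tuple built once can be read as many times as needed
stored = tuple(x + 10 for x in range(0, 5))
total = sum(stored)
largest = max(stored)
# total == 60, largest == 14
```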

7.6.5. Conditional Comprehension

Listing 124. Iterative approach to applying function to selected elements
output = []

for x in range(0, 5):
    if x % 2 == 0:
        output.append(x)

print(output)
# [0, 2, 4]
Listing 125. list Comprehensions approach to applying function to selected elements
[x for x in range(0, 5) if x % 2 == 0]
# [0, 2, 4]
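
The trailing if filters elements out. It is easy to confuse with a conditional expression placed in the output part, which keeps every element and only changes its value. A short sketch of both forms, with variable names chosen for illustration:

```python
# Trailing `if` filters: odd numbers are dropped
filtered = [x for x in range(0, 5) if x % 2 == 0]
# [0, 2, 4]

# Conditional expression transforms: every element is kept
replaced = [x if x % 2 == 0 else None for x in range(0, 5)]
# [0, None, 2, None, 4]

# Both forms can be combined in one comprehension
labelled = ['even' if x % 2 == 0 else 'odd'
            for x in range(0, 5)
                if x > 0]
# ['odd', 'even', 'odd', 'even']
```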

7.6.5.1. Filtering dict items

DATA = [
    {'first_name': 'Иван', 'last_name': 'Иванович', 'agency': 'Roscosmos'},
    {'first_name': 'Jose', 'last_name': 'Jimenez', 'agency': 'NASA'},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'agency': 'NASA'},
    {'first_name': 'Alex', 'last_name': 'Vogel', 'agency': 'ESA'},
    {'first_name': 'Mark', 'last_name': 'Watney', 'agency': 'NASA'},
]

nasa_astronauts = [(astro['first_name'], astro['last_name'])
                    for astro in DATA
                        if astro['agency'] == 'NASA']

print(nasa_astronauts)
# [
#   ('Jose', 'Jimenez'),
#   ('Melissa', 'Lewis'),
#   ('Mark', 'Watney')
# ]
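
The same filtering pattern works for set and dict comprehensions, and for the items of a single dict. The sketch below reuses DATA from the listing above; the result names are assumptions made for this example.

```python
DATA = [
    {'first_name': 'Иван', 'last_name': 'Иванович', 'agency': 'Roscosmos'},
    {'first_name': 'Jose', 'last_name': 'Jimenez', 'agency': 'NASA'},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'agency': 'NASA'},
    {'first_name': 'Alex', 'last_name': 'Vogel', 'agency': 'ESA'},
    {'first_name': 'Mark', 'last_name': 'Watney', 'agency': 'NASA'},
]

# Set comprehension: unique agency names
agencies = {astro['agency'] for astro in DATA}
# {'Roscosmos', 'NASA', 'ESA'} (set order is arbitrary)

# Dict comprehension with a filter: last name -> first name, NASA only
nasa_by_last_name = {astro['last_name']: astro['first_name']
                     for astro in DATA
                         if astro['agency'] == 'NASA'}
# {'Jimenez': 'Jose', 'Lewis': 'Melissa', 'Watney': 'Mark'}

# Filtering the items of a single dict: keep only the name fields
names_only = {key: value
              for key, value in DATA[0].items()
                  if key.endswith('_name')}
# {'first_name': 'Иван', 'last_name': 'Иванович'}
```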

7.6.6. Applying function

Listing 126. Applying function to each output element
[float(x) for x in range(0, 5)]
# [0.0, 1.0, 2.0, 3.0, 4.0]

[float(x) for x in range(0, 5) if x % 2 == 0]
# [0.0, 2.0, 4.0]
Listing 127. Applying function to each output element
[pow(2, x) for x in range(0, 5)]
# [1, 2, 4, 8, 16]

[pow(2, x) for x in range(0, 5) if x % 2 == 0]
# [1, 4, 16]
[pow(2, x)
    for x in range(0, 5)
        if x % 2 == 0
]
# [1, 4, 16]

7.6.7. Examples

7.6.7.1. Filtering results

Listing 128. Using list comprehension for result filtering
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]

[features for *features,label in DATA if label == 'setosa']
# [
#   [5.1, 3.5, 1.4, 0.2],
#   [4.7, 3.2, 1.3, 0.2],
# ]

[features
 for *features, label in DATA
    if label == 'setosa']
# [
#   [5.1, 3.5, 1.4, 0.2],
#   [4.7, 3.2, 1.3, 0.2],
# ]

[f for *f,l in DATA if l == 'setosa']
# [
#   [5.1, 3.5, 1.4, 0.2],
#   [4.7, 3.2, 1.3, 0.2],
# ]

[X for *X,y in DATA if y == 'setosa']
# [
#   [5.1, 3.5, 1.4, 0.2],
#   [4.7, 3.2, 1.3, 0.2],
# ]

7.6.7.2. Filtering with complex expressions

Listing 129. Using list comprehension for result filtering with more complex expression
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]


def is_setosa(species):
    return species == 'setosa'


[X for *X,y in DATA if is_setosa(y)]
# [
#   [5.1, 3.5, 1.4, 0.2],
#   [4.7, 3.2, 1.3, 0.2],
# ]
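
Moving the condition into a function pays off when the test involves more than one field. The sketch below is an assumption-laden illustration: the predicate is_large_versicolor() and its threshold min_petal_length are hypothetical and not part of the original listing.

```python
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]


def is_large_versicolor(row, min_petal_length=4.5):
    """Hypothetical predicate: versicolor rows with a long enough petal."""
    *features, species = row
    petal_length = features[2]
    return species == 'versicolor' and petal_length >= min_petal_length


# DATA[1:] skips the header row
result = [row for row in DATA[1:] if is_large_versicolor(row)]
# [
#   (6.4, 3.2, 4.5, 1.5, 'versicolor'),
#   (7.0, 3.2, 4.7, 1.4, 'versicolor'),
# ]
```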

7.6.7.3. Quick parsing lines

Listing 130. Quick parsing lines
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = []

for row in DATA:
    row = row.split(',')
    output.append(row)

print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]
Listing 131. Quick parsing lines
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = [row.split(',') for row in DATA]

print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]
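
After splitting, all fields are still strings. One possible next step, sketched below, converts the measurements to float while leaving the species name untouched. It assumes that every field except the last one is numeric.

```python
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

# The inner generator expression splits each line lazily;
# star-unpacking separates the measurements from the species name
output = [
    (*[float(value) for value in values], species)
    for *values, species in (row.split(',') for row in DATA)
]

print(output)
# [
#   (5.8, 2.7, 5.1, 1.9, 'virginica'),
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (5.7, 2.8, 4.1, 1.3, 'versicolor')
# ]
```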

7.6.7.4. Reversing dict keys with values

Listing 132. Reversing dict keys with values
DATA = {'a': 1, 'b': 2}

list(DATA.items())
# [
#    ('a', 1),
#    ('b', 2),
# ]

[(k,v) for k,v in DATA.items()]
# [
#    ('a', 1),
#    ('b', 2),
# ]

[(v,k) for k,v in DATA.items()]
# [
#    (1, 'a'),
#    (2, 'b'),
# ]
Listing 133. Reversing dict keys with values
DATA = {'a': 1, 'b': 2}

{v:k for k,v in DATA.items()}
# {1:'a', 2:'b'}
Listing 134. Value collision while reversing dict
DATA = {'a': 1, 'b': 2, 'c': 2}

{v:k for k,v in DATA.items()}
# {1:'a', 2:'c'}
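
In the collision above the later key silently overwrites the earlier one. If every key has to survive, one option is to collect the keys for each value into a list. This is only a sketch: it rescans the dict for every value, which is acceptable for small data but slow for large dicts.

```python
DATA = {'a': 1, 'b': 2, 'c': 2}

# For each value, gather all keys that map to it
grouped = {
    value: [k for k, v in DATA.items() if v == value]
    for value in DATA.values()
}

print(grouped)
# {1: ['a'], 2: ['b', 'c']}
```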

7.6.7.5. Nested

INPUT = {
    6: ['Doctorate', 'Prof-school'],
    5: ['Masters', 'Bachelor', 'Engineer'],
    4: ['HS-grad'],
    3: ['Junior High'],
    2: ['Primary School'],
    1: ['Kindergarten'],
}

OUTPUT = {education: str(key)
          for key, names in INPUT.items()
             for education in names}

print(OUTPUT)
# {
#   'Doctorate': '6',
#   'Prof-school': '6',
#   'Masters': '5',
#   'Bachelor': '5',
#   'Engineer': '5',
#   'HS-grad': '4',
#   'Junior High': '3',
#   'Primary School': '2',
#   'Kindergarten': '1'
# }
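
The for clauses of a nested comprehension are read in the same order as the equivalent nested loops: the first for is the outer loop. The sketch below shows both forms on a shortened version of INPUT; the names from_loops and from_comprehension are used only for this comparison.

```python
INPUT = {
    6: ['Doctorate', 'Prof-school'],
    5: ['Masters', 'Bachelor', 'Engineer'],
}

# Nested loops: outer loop over dict items, inner loop over names
from_loops = {}
for key, names in INPUT.items():
    for education in names:
        from_loops[education] = str(key)

# Nested comprehension: same clauses, same order
from_comprehension = {education: str(key)
                      for key, names in INPUT.items()
                         for education in names}

same = from_loops == from_comprehension
# True
```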

7.6.8. Assignments

7.6.8.1. Split train/test

English
  1. For given data structure INPUT: List[tuple] (see below)

  2. Separate header from data

  3. Calculate pivot point: length of data times given percent, cast to int

  4. Using List Comprehension split data into:

    • features: List[tuple] - list of measurements (each measurement row is a tuple)

    • labels: List[str] - list of species names

  5. Split those data structures in the following proportion:

    • features_train: List[tuple] - features to train - 60%

    • features_test: List[tuple] - features to test - 40%

    • labels_train: List[str] - labels to train - 60%

    • labels_test: List[str] - labels to test - 40%

  6. Create OUTPUT: Tuple[list, list, list, list] with features (training and test) and labels (training and test)

  7. Print OUTPUT

  8. Compare results with "Output" section below


Input
INPUT = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]
Output
from typing import List, Tuple


features_train: List[tuple]
# [(5.8, 2.7, 5.1, 1.9), (5.1, 3.5, 1.4, 0.2), (5.7, 2.8, 4.1, 1.3),
#  (6.3, 2.9, 5.6, 1.8), (6.4, 3.2, 4.5, 1.5), (4.7, 3.2, 1.3, 0.2),
#  (7.0, 3.2, 4.7, 1.4), (7.6, 3.0, 6.6, 2.1), (4.9, 3.0, 1.4, 0.2),
#  (4.9, 2.5, 4.5, 1.7), (7.1, 3.0, 5.9, 2.1), (4.6, 3.4, 1.4, 0.3)]

features_test: List[tuple]
# [(5.4, 3.9, 1.7, 0.4), (5.7, 2.8, 4.5, 1.3), (5.0, 3.6, 1.4, 0.3),
#  (5.5, 2.3, 4.0, 1.3), (6.5, 3.0, 5.8, 2.2), (6.5, 2.8, 4.6, 1.5),
#  (6.3, 3.3, 6.0, 2.5), (6.9, 3.1, 4.9, 1.5), (4.6, 3.1, 1.5, 0.2)]

labels_train: List[str]
# ['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor',
#  'setosa', 'versicolor', 'virginica', 'setosa', 'virginica',
#  'virginica', 'setosa']

labels_test: List[str]
# ['setosa', 'versicolor', 'setosa', 'versicolor', 'virginica',
#  'versicolor', 'virginica', 'versicolor', 'setosa']

OUTPUT: Tuple[list, list, list, list]
# ([(5.8, 2.7, 5.1, 1.9), (5.1, 3.5, 1.4, 0.2), (5.7, 2.8, 4.1, 1.3),
#   (6.3, 2.9, 5.6, 1.8), (6.4, 3.2, 4.5, 1.5), (4.7, 3.2, 1.3, 0.2),
#   (7.0, 3.2, 4.7, 1.4), (7.6, 3.0, 6.6, 2.1), (4.9, 3.0, 1.4, 0.2),
#   (4.9, 2.5, 4.5, 1.7), (7.1, 3.0, 5.9, 2.1), (4.6, 3.4, 1.4, 0.3)],
#
#  [(5.4, 3.9, 1.7, 0.4), (5.7, 2.8, 4.5, 1.3), (5.0, 3.6, 1.4, 0.3),
#   (5.5, 2.3, 4.0, 1.3), (6.5, 3.0, 5.8, 2.2), (6.5, 2.8, 4.6, 1.5),
#   (6.3, 3.3, 6.0, 2.5), (6.9, 3.1, 4.9, 1.5), (4.6, 3.1, 1.5, 0.2)],
#
#  ['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor',
#   'setosa', 'versicolor', 'virginica', 'setosa', 'virginica',
#   'virginica', 'setosa'],
#
#  ['setosa', 'versicolor', 'setosa', 'versicolor', 'virginica',
#   'versicolor', 'virginica', 'versicolor', 'setosa'])
The whys and wherefores
  • Iterating over nested data structures

  • Using slices

  • Type casting

  • List comprehension

  • Magic Number
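
The pivot and slicing steps can be sketched on a miniature dataset. Everything below is an illustration under stated assumptions: the rows are hypothetical (not the assignment INPUT), and RATIO is a name chosen here to avoid a magic number in the code.

```python
RATIO = 0.6  # named constant instead of a magic number

# Hypothetical miniature dataset: one header row and five data rows
rows = [
    ('x', 'y', 'label'),
    (1.0, 2.0, 'a'),
    (3.0, 4.0, 'b'),
    (5.0, 6.0, 'a'),
    (7.0, 8.0, 'b'),
    (9.0, 0.0, 'a'),
]

# Separate the header from the data
header, *data = rows

# Pivot must be an int to be usable as a slice index
pivot = int(len(data) * RATIO)

# Star-unpacking yields a list, so cast each row of values to a tuple
features = [tuple(values) for *values, label in data]
labels = [label for *values, label in data]

features_train, features_test = features[:pivot], features[pivot:]
labels_train, labels_test = labels[:pivot], labels[pivot:]
```

With five data rows and a ratio of 0.6 the pivot is 3, so three rows go to the training lists and two to the test lists.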