7. Comprehensions

7.1. Simple usage

7.1.1. Traditional

Listing 7.1. Iterative approach to applying function to elements
numbers = []

for x in range(0, 5):
    numbers.append(x+10)

# numbers = [10, 11, 12, 13, 14]

7.1.2. List Comprehension

Listing 7.2. list Comprehension approach to applying function to elements
numbers = [x+10 for x in range(0, 5)]
# [10, 11, 12, 13, 14]

7.1.3. Set Comprehension

Listing 7.3. set Comprehension approach to applying function to elements
numbers = {x+10 for x in range(0, 5)}
# {10, 11, 12, 13, 14}

7.1.4. Dict Comprehension

Listing 7.4. dict Comprehension approach to applying function to elements
numbers = {x: x+10 for x in range(0, 5)}
# {0:10, 1:11, 2:12, 3:13, 4:14}
Listing 7.5. dict Comprehension approach to applying function to elements
numbers = {x+10: x for x in range(0, 5)}
# {10:0, 11:1, 12:2, 13:3, 14:4}
Listing 7.6. dict Comprehension approach to applying function to elements
numbers = {x+10: x+10 for x in range(0, 5)}
# {10:10, 11:11, 12:12, 13:13, 14:14}

7.1.5. Tuple Comprehension?!

Listing 7.7. Generator Expression approach to applying function to elements
numbers = (x+10 for x in range(0, 5))
# <generator object <genexpr> at 0x11eaef570>
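Parentheses do not create a tuple here; they create a generator expression, which computes its values lazily and can be consumed only once. The short sketch below illustrates this and shows how to obtain a real tuple by passing the expression to tuple(). The names first_pass, second_pass and numbers_tuple are only illustrative.

```python
numbers = (x+10 for x in range(0, 5))

# A generator yields its values on demand and is exhausted after one pass
first_pass = list(numbers)
second_pass = list(numbers)

print(first_pass)
# [10, 11, 12, 13, 14]

print(second_pass)
# []

# To get an actual tuple, pass the generator expression to tuple()
numbers_tuple = tuple(x+10 for x in range(0, 5))

print(numbers_tuple)
# (10, 11, 12, 13, 14)
```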

7.2. Conditional Comprehension

7.2.1. Traditional

Listing 7.8. Iterative approach to applying function to selected elements
even_numbers = []

for x in range(0, 10):
    if x % 2 == 0:
        even_numbers.append(x)

print(even_numbers)
# [0, 2, 4, 6, 8]

7.2.2. Comprehensions

Listing 7.9. list Comprehensions approach to applying function to selected elements
even_numbers = [x for x in range(0, 10) if x % 2 == 0]

print(even_numbers)
# [0, 2, 4, 6, 8]
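The trailing if clause is not limited to lists; it works the same way in set and dict comprehensions. It should not be confused with a conditional expression placed before the for, which keeps every element and only changes the produced value. A minimal sketch of both forms (the names even_squares and labels are illustrative):

```python
# Filter clause in a dict comprehension: only even keys are kept
even_squares = {x: x**2 for x in range(0, 10) if x % 2 == 0}

print(even_squares)
# {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Conditional expression: every element is kept, only its value differs
labels = ['even' if x % 2 == 0 else 'odd' for x in range(0, 5)]

print(labels)
# ['even', 'odd', 'even', 'odd', 'even']
```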

7.3. Why?

7.3.1. Filtering results

Listing 7.10. Using list comprehension for result filtering
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]

measurements = [record for record in DATA if record[4] == 'setosa']
# [
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (4.7, 3.2, 1.3, 0.2, 'setosa')
# ]
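Filtering on the species column works even with the header row present, because comparing two strings is always allowed. Filtering on a numeric column is different: comparing the header text with a number using > raises TypeError. One possible approach, sketched below on a shortened copy of the data, is to separate the header first. The 6.0 threshold and the names header, rows and long_sepals are assumptions made for this example.

```python
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]

# Separate the header from the records before comparing numbers
header, *rows = DATA

# Keep only records whose sepal length (column 0) exceeds the threshold
long_sepals = [record for record in rows if record[0] > 6.0]

print(long_sepals)
# [(6.3, 2.9, 5.6, 1.8, 'virginica'), (7.0, 3.2, 4.7, 1.4, 'versicolor')]
```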

7.3.2. Filtering with complex expressions

Listing 7.11. Using list comprehension for result filtering with more complex expression
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]


def is_setosa(record):
    return record[4] == 'setosa'


measurements = [record for record in DATA if is_setosa(record)]
# [
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (4.7, 3.2, 1.3, 0.2, 'setosa')
# ]

7.3.3. Reversing dict keys with values

Listing 7.12. Reversing dict keys with values
DATA = {'a': 1, 'b': 2}

DATA.items()
# dict_items([
#    ('a', 1),
#    ('b', 2),
# ])
Listing 7.13. Reversing dict keys with values
DATA = {'a': 1, 'b': 2}

{value: key for key, value in DATA.items()}
# {1:'a', 2:'b'}
Listing 7.14. Reversing dict keys with values
DATA = {'a': 1, 'b': 2}

{v: k for k, v in DATA.items()}
# {1: 'a', 2: 'b'}

7.3.4. Value collision while reversing dict

Listing 7.15. Value collision while reversing dict
DATA = {'a': 1, 'b': 2, 'c': 2}

{v: k for k, v in DATA.items()}
# {1: 'a', 2: 'c'}
# Key 'b' is lost: both 'b' and 'c' map to 2, and the later pair overwrites the earlier one
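When a dict is reversed, keys that share a value collide and only the last one survives. If all of them need to be preserved, one option is to map each value to a list of its original keys. This is a minimal sketch rather than the only way to do it; the name reversed_data is illustrative.

```python
DATA = {'a': 1, 'b': 2, 'c': 2}

# For every value, collect all keys that pointed to it
reversed_data = {
    value: [k for k, v in DATA.items() if v == value]
    for value in DATA.values()
}

print(reversed_data)
# {1: ['a'], 2: ['b', 'c']}
```

The inner comprehension scans the whole dict once for each value, which is fine for small data; for larger dicts a plain loop that appends to lists would avoid the repeated scans.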

7.3.5. Quick parsing lines

Listing 7.16. Quick parsing lines
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = []

for row in DATA:
    row = row.split(',')
    output.append(row)


print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]
Listing 7.17. Quick parsing lines
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = [row.split(',') for row in DATA]

print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]
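After split(',') every field is still a str, including the measurements. If numeric values are needed, the conversion can be done inside the comprehension with a small helper. The sketch below assumes that the last field is always the species name and all preceding fields are numbers; parse is a hypothetical helper introduced for this example.

```python
DATA = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]


def parse(line):
    # Assumption: the last field is the species, all others are numeric
    *features, species = line.split(',')
    return tuple(float(x) for x in features) + (species,)


output = [parse(row) for row in DATA]

print(output)
# [
#   (5.8, 2.7, 5.1, 1.9, 'virginica'),
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (5.7, 2.8, 4.1, 1.3, 'versicolor')
# ]
```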

7.3.6. Applying function to each output element

Listing 7.18. Applying function to each output element
numbers = [float(x) for x in range(0, 10)]
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
Listing 7.19. Applying function to each selected output element
numbers = [float(x) for x in range(0, 10) if x % 2 == 0]
# [0.0, 2.0, 4.0, 6.0, 8.0]

7.3.7. Returning nested objects

Listing 7.20. Returning nested objects
def get_tuple(number):
    return number, number+10

[get_tuple(x) for x in range(0, 5)]
# [
#   (0, 10),
#   (1, 11),
#   (2, 12),
#   (3, 13),
#   (4, 14)
# ]
Listing 7.21. Returning nested objects
def get_dict(number):
    if number % 2 == 0:
        return {'number': number, 'status': 'even'}
    else:
        return {'number': number, 'status': 'odd'}


[get_dict(x) for x in range(0, 5)]
# [
#    {'number': 0, 'status': 'even'},
#    {'number': 1, 'status': 'odd'},
#    {'number': 2, 'status': 'even'},
#    {'number': 3, 'status': 'odd'},
#    {'number': 4, 'status': 'even'},
# ]

7.4. Advanced usage for Comprehensions and Generators

Note

More in chapter Generators and Comprehensions

7.5. Assignments

7.5.1. Split train/test

  • Filename: comprehension_split_train_test.py

  • Lines of code to write: 8 lines

  • Estimated time of completion: 15 min

  • Input data: Listing 7.22.

    Listing 7.22. Split train/test data
    DATA = [
        ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica'),
        (7.1, 3.0, 5.9, 2.1, 'virginica'),
        (4.6, 3.4, 1.4, 0.3, 'setosa'),
        (5.4, 3.9, 1.7, 0.4, 'setosa'),
        (5.7, 2.8, 4.5, 1.3, 'versicolor'),
        (5.0, 3.6, 1.4, 0.3, 'setosa'),
        (5.5, 2.3, 4.0, 1.3, 'versicolor'),
        (6.5, 3.0, 5.8, 2.2, 'virginica'),
        (6.5, 2.8, 4.6, 1.5, 'versicolor'),
        (6.3, 3.3, 6.0, 2.5, 'virginica'),
        (6.9, 3.1, 4.9, 1.5, 'versicolor'),
        (4.6, 3.1, 1.5, 0.2, 'setosa'),
    ]
    
  1. Given the Iris dataset from Listing 7.22.:

  2. Save the header (the first line) to a variable

  3. Save the data without the header to another variable

  4. Calculate the split point: the number of data records (without the header) multiplied by the percentage

  5. Using a list comprehension, split the data into:

    • X: List[Tuple[float]] - features

    • y: List[str] - labels

  6. Split the dataset into lists in the following proportions:

    • X_train: List[Tuple[float]] - features for training - 60%

    • X_test: List[Tuple[float]] - features for testing - 40%

    • y_train: List[str] - labels for training - 60%

    • y_test: List[str] - labels for testing - 40%

  7. Create result: Tuple[list, list, list, list] containing all features and labels

  8. Print result to the screen

The whys and wherefores
  • Ability to process complex data types

  • Using slices

  • Type conversion

  • Magic Number