9. Comprehensions

9.1. Simple usage

9.1.1. Traditional

numbers = []

for x in range(0, 5):
    numbers.append(x+10)

# numbers = [10, 11, 12, 13, 14]

9.1.2. List Comprehension

[x+10 for x in range(0, 5)]
# [10, 11, 12, 13, 14]

9.1.3. Set Comprehension

{x+10 for x in range(0, 5)}
# {10, 11, 12, 13, 14}

9.1.4. Dict Comprehension

{x: x+10 for x in range(0, 5)}
# {0:10, 1:11, 2:12, 3:13, 4:14}
{x+10: x for x in range(0, 5)}
# {10:0, 11:1, 12:2, 13:3, 14:4}
{x+10: x+10 for x in range(0, 5)}
# {10:10, 11:11, 12:12, 13:13, 14:14}

9.1.5. Tuple Comprehension?!

(x+10 for x in range(0, 5))
# <generator object <genexpr> at 0x11eaef570>
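
Parentheses around a comprehension do not build a tuple; they create a generator expression, which produces values lazily and can be consumed only once. A minimal sketch of how to get an actual tuple follows; the variable names `gen`, `as_tuple`, `first_pass` and `second_pass` are illustrative and not part of the original listing.

```python
# A generator expression: values are produced lazily, on demand
gen = (x+10 for x in range(0, 5))

# Passing a generator expression to tuple() materializes the values
as_tuple = tuple(x+10 for x in range(0, 5))
print(as_tuple)
# (10, 11, 12, 13, 14)

# A generator can be consumed only once
first_pass = list(gen)
second_pass = list(gen)

print(first_pass)
# [10, 11, 12, 13, 14]

print(second_pass)
# []
```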

9.2. Conditional Comprehension

9.2.1. Traditional

even_numbers = []

for x in range(0, 10):
    if x % 2 == 0:
        even_numbers.append(x)

print(even_numbers)
# [0, 2, 4, 6, 8]

9.2.2. Comprehensions

even_numbers = [x for x in range(0, 10) if x % 2 == 0]

print(even_numbers)
# [0, 2, 4, 6, 8]

9.3. Why?

9.3.1. Filtering results

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]

[record for record in DATA if record[4] == 'setosa']
# [
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (4.7, 3.2, 1.3, 0.2, 'setosa')
# ]

9.3.2. Filtering with complex expressions

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
]


def is_setosa(record):
    # The comparison already evaluates to True or False
    return record[4] == 'setosa'


[record for record in DATA if is_setosa(record)]
# [
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (4.7, 3.2, 1.3, 0.2, 'setosa')
# ]
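
Moving the condition into a function pays off once the test has more than one part. The sketch below is an illustration only: the helper `is_large_virginica` and the 6.0 threshold are assumptions, not part of the original material. It also skips the header row, whose fields are strings and cannot be compared with a number.

```python
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
]


def is_large_virginica(record):
    # Hypothetical helper: combines several checks in one place
    sepal_length = record[0]
    species = record[4]

    # Skip the header row, where the first field is a string
    if not isinstance(sepal_length, float):
        return False

    return species == 'virginica' and sepal_length > 6.0


result = [record for record in DATA if is_large_virginica(record)]

print(result)
# [(6.3, 2.9, 5.6, 1.8, 'virginica')]
```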

9.3.3. Applying function to each output element

[float(x) for x in range(0, 10)]
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

[float(x) for x in range(0, 10) if x % 2 == 0]
# [0.0, 2.0, 4.0, 6.0, 8.0]

9.3.4. Returning nested objects

def get_tuple(number):
    return number, number+10

[get_tuple(x) for x in range(0, 5)]
# [
#   (0, 10),
#   (1, 11),
#   (2, 12),
#   (3, 13),
#   (4, 14)
# ]


def get_dict(number):
    if number % 2 == 0:
        return {'number': number, 'status': 'even'}
    else:
        return {'number': number, 'status': 'odd'}


[get_dict(x) for x in range(0, 5)]
# [
#    {'number': 0, 'status': 'even'},
#    {'number': 1, 'status': 'odd'},
#    {'number': 2, 'status': 'even'},
#    {'number': 3, 'status': 'odd'},
#    {'number': 4, 'status': 'even'},
# ]

9.3.5. Reversing dict keys with values

DATA = {'a': 1, 'b': 2}

DATA.items()
# dict_items([('a', 1), ('b', 2)])

DATA = {'a': 1, 'b': 2}

{value: key for key, value in DATA.items()}
# {1: 'a', 2: 'b'}

DATA = {'a': 1, 'b': 2}

{v: k for k, v in DATA.items()}
# {1: 'a', 2: 'b'}

9.3.6. Value collision while reversing dict

DATA = {'a': 1, 'b': 2, 'c': 2}

{v: k for k, v in DATA.items()}
# {1: 'a', 2: 'c'}
# Keys 'b' and 'c' share the value 2, so the later pair overwrites the earlier one
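
When the values are not unique, one way to avoid losing keys is to collect every key that shares a value into a list. This is only a sketch of one possible approach; the name `grouped` is illustrative, and the inner comprehension rescans the dict for each value, which is acceptable for small data.

```python
DATA = {'a': 1, 'b': 2, 'c': 2}

# For each distinct value, gather all keys that map to it
grouped = {
    value: [k for k, v in DATA.items() if v == value]
    for value in sorted(set(DATA.values()))
}

print(grouped)
# {1: ['a'], 2: ['b', 'c']}
```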

9.3.7. Quick parsing lines

FILE = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = []

for line in FILE:
    line = line.split(',')
    output.append(line)


print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]

FILE = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]

output = [line.split(',') for line in FILE]

print(output)
# [
#   ['5.8', '2.7', '5.1', '1.9', 'virginica'],
#   ['5.1', '3.5', '1.4', '0.2', 'setosa'],
#   ['5.7', '2.8', '4.1', '1.3', 'versicolor']
# ]
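
After splitting, every field is still a string. A possible next step, sketched below, converts the measurements to `float` while leaving the species name untouched. The helper `parse` is hypothetical and assumes that the last field of each line is always the species.

```python
FILE = [
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
]


def parse(line):
    # Hypothetical helper: the last field is the species, the rest are numbers
    *measurements, species = line.split(',')
    return tuple(float(x) for x in measurements) + (species,)


output = [parse(line) for line in FILE]

print(output)
# [
#   (5.8, 2.7, 5.1, 1.9, 'virginica'),
#   (5.1, 3.5, 1.4, 0.2, 'setosa'),
#   (5.7, 2.8, 4.1, 1.3, 'versicolor')
# ]
```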

9.4. Advanced usage for Comprehensions and Generators

Note

More in chapter Generators and Comprehensions

9.5. Assignments

9.5.1. Split train/test

  • Filename: comprahension_split_train_test.py
  • Lines of code to write: 6 lines
  • Estimated time of completion: 15 min
  1. Given the Iris dataset from the listing below:

    DATA = [
        ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica'),
        (7.1, 3.0, 5.9, 2.1, 'virginica'),
        (4.6, 3.4, 1.4, 0.3, 'setosa'),
        (5.4, 3.9, 1.7, 0.4, 'setosa'),
        (5.7, 2.8, 4.5, 1.3, 'versicolor'),
        (5.0, 3.6, 1.4, 0.3, 'setosa'),
        (5.5, 2.3, 4.0, 1.3, 'versicolor'),
        (6.5, 3.0, 5.8, 2.2, 'virginica'),
        (6.5, 2.8, 4.6, 1.5, 'versicolor'),
        (6.3, 3.3, 6.0, 2.5, 'virginica'),
        (6.9, 3.1, 4.9, 1.5, 'versicolor'),
        (4.6, 3.1, 1.5, 0.2, 'setosa'),
    ]
    
  2. Save the header (the first row) to a variable

  3. Save the data without the header to another variable

  4. Calculate the split point: the number of data records (excluding the header) multiplied by the percentage

  5. Using a list comprehension, split the data into:

    • X: List[Tuple[float]] - features
    • y: List[str] - labels
  6. Split the dataset into lists in the following proportions:

    • X_train: List[Tuple[float]] - features for training - 60%
    • X_test: List[Tuple[float]] - features for testing - 40%
    • y_train: List[str] - labels for training - 60%
    • y_test: List[str] - labels for testing - 40%
  7. Create result: Tuple[list, list, list, list] containing all the features and labels

  8. Print result to the screen

The whys and wherefores:
 
  • Ability to process complex data types
  • Using slices
  • Type conversion
  • Magic Number