# 2. Generators and Comprehensions

## 2.1. Lazy evaluation

• Code does not execute instantly

• Sometimes code is not executed at all!
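
The two points above can be illustrated with a minimal sketch. The `countdown` function is a hypothetical example, not part of the original material: its body does not start running when the function is called, only when the first value is requested.

```python
def countdown(start):
    # Hypothetical generator function, used only for illustration
    print('generator body started')
    while start > 0:
        yield start
        start -= 1


numbers = countdown(3)      # nothing is printed: the body has not run yet
first = next(numbers)       # prints 'generator body started', returns 3
rest = list(numbers)        # runs the remaining iterations: [2, 1]
```

If `next()` or a loop is never applied to `numbers`, the function body is never executed at all.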

### 2.1.1. Declaring generators

Listing 362. This will not generate any numbers!
```python
# range() accepts only integers: the float 1E30 raises TypeError, so 10**30 is used
range(0, 10**30)
```
Listing 363. This will only create a lazy `range` object, but not produce its values!
```python
numbers = range(0, 10**30)

print(numbers)
# range(0, 1000000000000000000000000000000)
```
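
Strictly speaking, `range` is a lazy sequence rather than a generator, although it behaves lazily in the same way. The sketch below (variable names are illustrative) shows that the object stays small and can answer indexing and membership questions without producing all of its values.

```python
import sys

numbers = range(0, 10**30)

# The object stores only start, stop and step, so it stays small
size_in_bytes = sys.getsizeof(numbers)

# Indexing and membership tests are computed arithmetically,
# without iterating over the values
item = numbers[10**20]
found = 10**29 in numbers
```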

### 2.1.2. Getting values from generator

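• Get all values at once (not very efficient, because every value is stored in memory)

```python
# A small range is used here: 10**30 values would never fit in memory
numbers = range(0, 5)
list(numbers)
# [0, 1, 2, 3, 4]
```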
• The next number is calculated on every loop iteration; the previous number is not kept and the next one is not yet known

```python
for i in range(0, 10**30):
    print(i)
```
• Will print only three numbers, not 1,000,000,000,000,000,000,000,000,000,000

```python
for i in range(0, 10**30):
    if i == 3:
        break

    print(i)

# 0
# 1
# 2
```
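
As an alternative to breaking out of the loop by hand, the standard library function `itertools.islice` can take the first few values from any iterable. This is a sketch of the same idea, not something the original listing uses.

```python
from itertools import islice

# Take the first three values; the rest of the range is never produced
first_three = list(islice(range(0, 10**30), 3))
# [0, 1, 2]
```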

## 2.2. Generator expressions vs. Comprehensions

### 2.2.1. Comprehensions

• Executes instantly

```python
list(x for x in range(0, 5))        # [0, 1, 2, 3, 4]
[x for x in range(0, 5)]            # [0, 1, 2, 3, 4]
```
```python
set(x for x in range(0, 5))         # {0, 1, 2, 3, 4}
{x for x in range(0, 5)}            # {0, 1, 2, 3, 4}
```
```python
{x: x for x in range(0, 5)}         # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```
```python
tuple(x for x in range(0, 5))       # (0, 1, 2, 3, 4)
(x for x in range(0, 5))            # <generator object <genexpr> at 0x1197032a0>
```
```python
all(x for x in range(0, 5))         # False
any(x for x in range(0, 5))         # True
sum(x for x in range(0, 5))         # 10
```

### 2.2.2. Generator Expressions

• Lazy evaluation

```python
(x*x for x in range(0, 30) if x % 2)
# <generator object <genexpr> at 0x1197032a0>
```
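
To see the lazy evaluation at work, values can be pulled from the generator expression one at a time with the built-in `next()`. This is a small sketch; the variable names are illustrative.

```python
squares = (x*x for x in range(0, 30) if x % 2)

first = next(squares)       # 1  (0 is skipped, because 0 % 2 is falsy)
second = next(squares)      # 9
remaining = sum(squares)    # 4485, the sum of the squares not yet consumed
```

Each call computes exactly one value; nothing is calculated ahead of time.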

### 2.2.3. What is the difference?

• Execute immediately and assign the resulting list

```python
numbers = [x**2 for x in range(0, 30) if x % 2 == 0]

print(numbers)
# [0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784]

print(numbers)
# [0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784]
```
• Create a generator object and assign a reference to it (do not execute)

```python
numbers = (x**2 for x in range(0, 30) if x % 2 == 0)

print(numbers)
# <generator object <genexpr> at 0x11af5a570>

print(list(numbers))
# [0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784]

# The generator is exhausted after the first pass
print(list(numbers))
# []
```

### 2.2.4. Which one is better?

• Comprehensions - when the values are used more than once

• Generators - when the values are used only once (for example as the iterable of a loop)
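
The trade-off can be sketched by comparing memory use and reusability. The numbers reported by `sys.getsizeof()` depend on the Python build, so only their relation is meaningful here; the variable names are illustrative.

```python
import sys

as_list = [x**2 for x in range(0, 100_000)]
as_generator = (x**2 for x in range(0, 100_000))

list_size = sys.getsizeof(as_list)              # grows with the number of items
generator_size = sys.getsizeof(as_generator)    # small, independent of the item count

# A list can be traversed many times
total_first = sum(as_list)
total_second = sum(as_list)

# A generator can be traversed once; the second pass finds nothing left
generator_first = sum(as_generator)
generator_second = sum(as_generator)            # 0
```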

## 2.3. Returning nested objects

Listing 364. Returning nested objects
```python
def my_function(number):
    return number, number+10

[my_function(x) for x in range(0, 5)]
# [
#   (0, 10),
#   (1, 11),
#   (2, 12),
#   (3, 13),
#   (4, 14)
# ]
```
Listing 365. Returning nested objects
```python
def my_function(number):
    if number % 2 == 0:
        return {'number': number, 'status': 'even'}
    else:
        return {'number': number, 'status': 'odd'}

[my_function(x) for x in range(0, 5)]
# [
#    {'number': 0, 'status': 'even'},
#    {'number': 1, 'status': 'odd'},
#    {'number': 2, 'status': 'even'},
#    {'number': 3, 'status': 'odd'},
#    {'number': 4, 'status': 'even'},
# ]
```

### 2.3.1. Nested Comprehensions

```python
DATA = [
    {'last_name': 'Jiménez'},
    {'first_name': 'Mark', 'last_name': 'Watney'},
    {'first_name': 'Иван'},
    {'first_name': 'Jan', 'last_name': 'Twardowski', 'born': 1961},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'first_step': 1969},
]

fieldnames = set()
fieldnames.update(key for record in DATA for key in record.keys())
```
```python
DATA = [
    {'last_name': 'Jiménez'},
    {'first_name': 'Mark', 'last_name': 'Watney'},
    {'first_name': 'Иван'},
    {'first_name': 'Jan', 'last_name': 'Twardowski', 'born': 1961},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'first_step': 1969},
]

fieldnames = set()
fieldnames.update(
    key
    for record in DATA
    for key in record.keys()
)
```
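
The order of the `for` clauses in a nested comprehension can be confusing at first. The sketch below, using a shortened copy of the data, shows that they read left to right in the same order as the equivalent nested loops.

```python
DATA = [
    {'last_name': 'Jiménez'},
    {'first_name': 'Mark', 'last_name': 'Watney'},
    {'first_name': 'Jan', 'last_name': 'Twardowski', 'born': 1961},
]

# Comprehension form: the outer loop comes first, the inner loop second
from_comprehension = {key for record in DATA for key in record.keys()}

# Equivalent explicit loops
from_loops = set()
for record in DATA:
    for key in record.keys():
        from_loops.add(key)
```

Both variables hold the same set of three field names: `first_name`, `last_name` and `born`.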

## 2.4. `yield` Keyword

```python
# ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]
```
```python
def get_species(species):
    output = []

    for record in DATA:
        if record[4] == species:
            output.append(record)

    return output

data = get_species('setosa')

print(data)
# [(5.1, 3.5, 1.4, 0.2, 'setosa'),
#  (4.9, 3.0, 1.4, 0.2, 'setosa'),
#  (5.4, 3.9, 1.7, 0.4, 'setosa'),
#  (4.6, 3.4, 1.4, 0.3, 'setosa')]

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')
# (4.6, 3.4, 1.4, 0.3, 'setosa')
```
```python
def get_species(species):
    for record in DATA:
        if record[4] == species:
            yield record

data = get_species('setosa')

print(data)
# <generator object get_species at 0x11af257c8>

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')
# (4.6, 3.4, 1.4, 0.3, 'setosa')
```
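
What the `for` loop does with a generator can be sketched by calling `next()` by hand. The example uses a shortened copy of the data; when no more records match, the generator raises `StopIteration`, which a `for` loop handles silently.

```python
DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
]

def get_species(species):
    for record in DATA:
        if record[4] == species:
            yield record    # pause here until the next value is requested

data = get_species('setosa')

first = next(data)      # (5.1, 3.5, 1.4, 0.2, 'setosa')
second = next(data)     # (4.9, 3.0, 1.4, 0.2, 'setosa')

try:
    next(data)          # no more matching records
    exhausted = False
except StopIteration:
    exhausted = True
```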

## 2.5. Example

### 2.5.1. Filtering `list` items

```python
DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

setosa = [x for x in DATA if x[4] == 'setosa']
print(setosa)
```

### 2.5.2. Filtering `dict` items

```python
DATA = [
    {'first_name': 'Иван', 'last_name': 'Иванович', 'agency': 'Roscosmos'},
    {'first_name': 'Jose', 'last_name': 'Jimenez', 'agency': 'NASA'},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'agency': 'NASA'},
    {'first_name': 'Alex', 'last_name': 'Vogel', 'agency': 'ESA'},
    {'first_name': 'Mark', 'last_name': 'Watney', 'agency': 'NASA'},
]

nasa_astronauts = [(x['first_name'], x['last_name'])
                   for x in DATA if x['agency'] == 'NASA']
# [
#   ('Jose', 'Jimenez'),
#   ('Melissa', 'Lewis'),
#   ('Mark', 'Watney')
# ]
```

### 2.5.3. Swapping `dict` keys with values

```python
data = {'first_name': 'Иван', 'last_name': 'Иванович'}

{v: k for k, v in data.items()}
# {'Иван': 'first_name', 'Иванович': 'last_name'}
```
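
One caveat is worth a short sketch: values that become keys must be hashable, and duplicated values collapse into a single key, with the last one winning. The `nickname` field below is a made-up example.

```python
data = {'first_name': 'Mark', 'nickname': 'Mark', 'last_name': 'Watney'}

swapped = {v: k for k, v in data.items()}
# {'Mark': 'nickname', 'Watney': 'last_name'}
```

The original dictionary has three items, but the swapped one has only two.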

### 2.5.4. Applying functions

```python
[float(x) for x in range(0, 5) if x % 2 == 0]
# [0.0, 2.0, 4.0]
```
```python
def is_even(x):
    return x % 2 == 0

[float(x) for x in range(0, 5) if is_even(x)]
# [0.0, 2.0, 4.0]
```

Listing 366. Clean Code in generator
```python
DATA = {'username': 'Иван Иванович', 'agency': 'Roscosmos'}

def asd(x):
    # Replaces every occurrence of 'Иван', also inside 'Иванович'
    return x.replace('Иван', 'Ivan')

out = {
    value: asd(value)
    for key, value in DATA.items()
}
# {'Иван Иванович': 'Ivan Ivanович', 'Roscosmos': 'Roscosmos'}

# Compare the value (not the key) with the agency name
out = ['CCCP' if v == 'Roscosmos' else 'USA'
       for k, v in DATA.items()
       if k == 'agency']
print(out)
# ['CCCP']
```
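```python
DATA = [
    {'last_name': 'Jiménez'},
    {'first_name': 'Mark', 'last_name': 'Watney'},
    {'first_name': 'Иван'},
    {'first_name': 'Jan', 'last_name': 'Twardowski', 'born': 1961},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'first_step': 1969},
]

# asd() calls str.replace(), so non-string values such as 1961 are skipped
[asd(value)
 for record in DATA
 for key, value in record.items()
 if isinstance(value, str)]
# ['Jiménez', 'Mark', 'Watney', 'Ivan', 'Jan', 'Twardowski', 'Melissa', 'Lewis']
```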
```python
DATA = [
    {'first_name': 'Иван', 'last_name': 'Иванович', 'agency': 'Roscosmos'},
    {'first_name': 'Jose', 'last_name': 'Jimenez', 'agency': 'NASA'},
    {'first_name': 'Melissa', 'last_name': 'Lewis', 'agency': 'NASA'},
    {'first_name': 'Alex', 'last_name': 'Vogel', 'agency': 'ESA'},
    {'first_name': 'Mark', 'last_name': 'Watney', 'agency': 'NASA'},
]

nasa_astronauts = [
    (astronaut['first_name'], astronaut['last_name'])
    for astronaut in DATA
    if astronaut['agency'] == 'NASA'
]
# [
#   ('Jose', 'Jimenez'),
#   ('Melissa', 'Lewis'),
#   ('Mark', 'Watney')
# ]
```

## 2.7. Assignments

### 2.7.1. Generators vs. Comprehensions - iris

1. Copy the data into the file "iris.csv"

2. Read the data, skipping the header

3. Write a function that returns all measurements for a given species

4. The species will be passed to the function as a `str`

5. Implement the solution using a regular function

6. Implement the solution using a generator and the `yield` keyword

7. Compare the results of both solutions using `sys.getsizeof()`

The whys and wherefores
• Using generators

• Consuming lazily evaluated data

• Comparing the sizes of data structures

• Parsing a file

• Filtering content on the fly

### 2.7.2. Generators vs. Comprehensions - passwd

• Complexity level: easy

• Lines of code to write: 40 lines

• Estimated time of completion: 20 min

1. Write a program that reads the file from Listing 367

2. Filter the lines so that they contain no comments (lines starting with `#`) and no empty lines

3. Filter the lines to extract system accounts: users whose UID (third field) is lower than 1000

4. Return a list of the system users' logins

5. Implement the solution using a regular function

6. Implement the solution using a generator and the `yield` keyword

7. Compare the results of both solutions using `sys.getsizeof()`

8. Why are the differences so small?

9. What will happen when the amount of data grows?

The whys and wherefores
• Using generators

• Consuming lazily evaluated data

• Comparing the sizes of data structures

• Parsing a file

• Filtering content on the fly

Listing 367. `/etc/passwd` sample file
```text
##
# User Database
#   - User name
#   - User ID number (UID)
#   - User's group ID number (GID)
#   - Full name of the user (GECOS)
#   - User home directory
##

root:x:0:0:root:/root:/bin/bash
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
peck:x:1000:1000:Max Peck:/home/peck:/bin/bash
jimenez:x:1001:1001:José Jiménez:/home/jimenez:/bin/bash
ivanovic:x:1002:1002:Ivan Иванович:/home/ivanovic:/bin/bash
```
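
One possible starting point for the passwd assignment is sketched below. It is only an illustration under stated assumptions: the `PASSWD` string is a shortened in-memory stand-in for the file, and the function names are hypothetical. A full solution would read the lines from the file instead.

```python
import sys

# Assumption: shortened in-memory stand-in for the file from Listing 367
PASSWD = """\
##
# User Database
##

root:x:0:0:root:/root:/bin/bash
halt:x:7:0:halt:/sbin:/sbin/halt
peck:x:1000:1000:Max Peck:/home/peck:/bin/bash
"""

def is_system_account(line):
    # Skip empty lines and comments, then check the UID (third field)
    line = line.strip()
    if not line or line.startswith('#'):
        return False
    return int(line.split(':')[2]) < 1000

def system_logins_list(lines):
    # Regular function: builds the whole result before returning it
    result = []
    for line in lines:
        if is_system_account(line):
            result.append(line.split(':')[0])
    return result

def system_logins_generator(lines):
    # Generator: hands out one login at a time
    for line in lines:
        if is_system_account(line):
            yield line.split(':')[0]

lines = PASSWD.splitlines()
as_list = system_logins_list(lines)
as_generator = system_logins_generator(lines)

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)

logins = list(as_generator)     # ['root', 'halt']
```

Note that `sys.getsizeof()` reports the size of the container object itself, not of the strings it refers to, which is worth keeping in mind when answering the last two questions.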