5.11. Generators

5.11.1. Recap

  • Comprehensions are evaluated immediately

  • Generators are evaluated lazily

>>> data = [x for x in range(0,5)]
>>>
>>> print(data)
[0, 1, 2, 3, 4]
>>> list(data)
[0, 1, 2, 3, 4]
>>> data = (x for x in range(0,5))
>>>
>>> print(data)  # doctest: +ELLIPSIS
<generator object <genexpr> at 0x...>
>>> list(data)
[0, 1, 2, 3, 4]
>>> _ = list(x for x in range(0,5))      # list built from a generator expression
>>> _ = tuple(x for x in range(0,5))     # tuple built from a generator expression
>>> _ = set(x for x in range(0,5))       # set built from a generator expression
>>> _ = dict((x,x) for x in range(0,5))  # dict built from a generator expression
>>> _ = [x for x in range(0,5)]          # list comprehension
>>> _ = (x for x in range(0,5))          # generator expression
>>> _ = {x for x in range(0,5)}          # set comprehension
>>> _ = {x:x for x in range(0,5)}        # dict comprehension

5.11.2. Rationale

Generators:

  • Lazily evaluated

  • Sometimes code is executed partially or not executed at all!

  • Use when you need each result only once (for example in a loop)

  • Cannot be rolled back or reset

  • Forgets previous results

  • Knows only the current result

  • Does not know the next result

  • Exhausted once iterated to the end

  • If you need all values immediately, there is no point in using a generator

Comprehension:

  • Evaluated immediately

  • Stored in memory until no references to the result remain (for example after del, or at the end of the program)

  • Use when you need the values more than once
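
The difference between the two lists above can be sketched in a few lines. This is a minimal illustration; the names squares_gen and squares_list are made up for the example.

```python
# A generator expression can be consumed only once.
squares_gen = (x ** 2 for x in range(0, 5))
first_pass = list(squares_gen)    # consumes every value
second_pass = list(squares_gen)   # nothing left: the generator is exhausted

# A list comprehension keeps its values, so it can be reused.
squares_list = [x ** 2 for x in range(0, 5)]
first_sum = sum(squares_list)
second_sum = sum(squares_list)

print(first_pass)             # [0, 1, 4, 9, 16]
print(second_pass)            # []
print(first_sum, second_sum)  # 30 30
```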

5.11.3. Yield Keyword

>>> def myfunc():
...     yield 'a'
...     yield 'b'
...     yield 'c'
>>>
>>>
>>> result = myfunc()
>>>
>>> result  # doctest: +ELLIPSIS
<generator object myfunc at 0x...>
>>>
>>> next(result)
'a'
>>> next(result)
'b'
>>> next(result)
'c'
>>> next(result)
Traceback (most recent call last):
StopIteration
>>> def myfunc():
...     yield [x for x in range(0,5)]
...     yield [x for x in range(5,10)]
...     yield [x for x in range(10,15)]
>>>
>>>
>>> data = myfunc()
>>>
>>> next(data)
[0, 1, 2, 3, 4]
>>> next(data)
[5, 6, 7, 8, 9]
>>> next(data)
[10, 11, 12, 13, 14]
>>> next(data)
Traceback (most recent call last):
StopIteration
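
To see that the body of a generator function runs only as far as the next yield, one can record side effects in a list. The sketch below assumes a hypothetical steps() function and a module-level log list, both introduced only for this illustration.

```python
log = []  # records which parts of the generator body have run


def steps():
    log.append('before first yield')
    yield 'a'
    log.append('between yields')
    yield 'b'
    log.append('after last yield')


gen = steps()
log_after_create = list(log)   # calling steps() runs none of the body

first = next(gen)
log_after_first = list(log)    # body ran up to the first yield

second = next(gen)
log_after_second = list(log)   # body ran up to the second yield

last = next(gen, None)         # default value instead of StopIteration
log_after_end = list(log)      # the rest of the body has now run
```

If the caller stops after the first next(), the code after the first yield is never executed.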

5.11.4. Generator Function

Function:

>>> def even(data):
...     result = []
...     for x in data:
...         if x % 2 == 0:
...             result.append(x)
...     return result
>>>
>>>
>>> DATA = [0, 1, 2, 3, 4, 5]
>>>
>>> result = even(DATA)
>>>
>>> print(result)
[0, 2, 4]

Generator:

>>> def even(data):
...     for x in data:
...         if x % 2 == 0:
...             yield x
>>>
>>>
>>> DATA = [0, 1, 2, 3, 4, 5]
>>>
>>> result = even(DATA)
>>>
>>> print(result)  # doctest: +ELLIPSIS
<generator object even at 0x...>
>>> list(result)
[0, 2, 4]
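
One consequence of the generator version is that it can filter an endless input, which the list-building version cannot do because it would never return. A minimal sketch, assuming itertools.count as the endless source and itertools.islice to take only the first five results:

```python
from itertools import count, islice


def even(data):
    for x in data:
        if x % 2 == 0:
            yield x


# count() yields 0, 1, 2, ... without end; islice stops after 5 results.
first_five = list(islice(even(count()), 5))

print(first_five)  # [0, 2, 4, 6, 8]
```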

5.11.5. Generator Filter

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> def get_values(species):
...     result = []
...     for row in DATA:
...         if row[4] == species:
...             result.append(row)
...     return result
>>>
>>>
>>> data = get_values('setosa')
>>>
>>> print(data)
[(5.1, 3.5, 1.4, 0.2, 'setosa'), (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> for row in data:
...     print(row)
(5.1, 3.5, 1.4, 0.2, 'setosa')
(4.7, 3.2, 1.3, 0.2, 'setosa')
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> def get_values(species):
...     for row in DATA:
...         if row[4] == species:
...             yield row
>>>
>>>
>>> data = get_values('setosa')
>>>
>>> print(data)  # doctest: +ELLIPSIS
<generator object get_values at 0x...>
>>>
>>> for row in data:
...     print(row)
(5.1, 3.5, 1.4, 0.2, 'setosa')
(4.7, 3.2, 1.3, 0.2, 'setosa')
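
For a simple filter like this one, a generator expression is a possible alternative to yield. The sketch below is one way to write it, not the only one; note that get_values here is an ordinary function that returns a generator, so inspect.isgeneratorfunction() would report False for it.

```python
DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def get_values(species):
    # Returns a lazy generator; no row is checked until iteration starts.
    return (row for row in DATA if row[4] == species)


matches = list(get_values('setosa'))

# next() with a default fetches only the first match and stops early.
first_match = next(get_values('setosa'), None)
missing = next(get_values('unknown'), None)

print(first_match)  # (5.1, 3.5, 1.4, 0.2, 'setosa')
print(missing)      # None
```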

5.11.6. Itertools

  • Learn more at https://docs.python.org/library/itertools.html

  • More information in Itertools

  • from itertools import *

  • count(start=0, step=1)

  • cycle(iterable)

  • repeat(object[, times])

  • accumulate(iterable[, func, *, initial=None])

  • chain(*iterables)

  • compress(data, selectors)

  • islice(iterable, start, stop[, step])

  • starmap(function, iterable)

  • product(*iterables, repeat=1)

  • permutations(iterable, r=None)

  • combinations(iterable, r)

  • combinations_with_replacement(iterable, r)

  • groupby(iterable, key=None)
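
A few of the functions listed above in action. This is a short sketch with made-up inputs; every result is wrapped in list() only to make the lazy iterators visible, and islice() is used to cut the endless ones.

```python
from itertools import (accumulate, chain, combinations, compress, count,
                       cycle, groupby, islice, product, repeat)

counted = list(islice(count(start=10, step=5), 4))  # [10, 15, 20, 25]
cycled = list(islice(cycle('ab'), 5))               # ['a', 'b', 'a', 'b', 'a']
repeated = list(repeat('x', 3))                     # ['x', 'x', 'x']
running_total = list(accumulate([1, 2, 3, 4]))      # [1, 3, 6, 10]
chained = list(chain([0, 1], [10, 11]))             # [0, 1, 10, 11]
selected = list(compress('abcd', [1, 0, 1, 0]))     # ['a', 'c']
pairs = list(product([0, 1], repeat=2))             # [(0, 0), (0, 1), (1, 0), (1, 1)]
combos = list(combinations('abc', 2))               # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# groupby groups only consecutive equal items, so input is usually sorted first.
grouped = [(key, list(group)) for key, group in groupby('aabbbc')]
# [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c'])]
```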

5.11.7. Memory Footprint

  • sys.getsizeof(obj) returns the size of obj in bytes

  • sys.getsizeof(obj) calls the obj.__sizeof__() method

  • sys.getsizeof(obj) adds garbage collector overhead if obj is managed by the garbage collector

>>> from sys import getsizeof
>>>
>>>
>>> gen1 = (x for x in range(0,1))
>>> gen10 = (x for x in range(0,10))
>>> gen100 = (x for x in range(0,100))
>>> gen1000 = (x for x in range(0,1000))
>>>
>>> getsizeof(gen1)
112
>>>
>>> getsizeof(gen10)
112
>>>
>>> getsizeof(gen100)
112
>>>
>>> getsizeof(gen1000)
112
>>> from sys import getsizeof
>>>
>>>
>>> com1 = [x for x in range(0,1)]
>>> com10 = [x for x in range(0,10)]
>>> com100 = [x for x in range(0,100)]
>>> com1000 = [x for x in range(0,1000)]
>>>
>>>
>>> getsizeof(com1)
88
>>>
>>> getsizeof(com10)
184
>>>
>>> getsizeof(com100)
920
>>>
>>> getsizeof(com1000)
8856
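
The exact byte counts above depend on the Python version and platform, so the sketch below compares sizes instead of hard-coding them. It also illustrates that getsizeof() is shallow: for a list it counts the list object itself, not the elements it refers to. The deep_estimate name is an assumption of this example, and the figure it produces is only a rough upper bound, since CPython may share small integer objects.

```python
from sys import getsizeof

small_gen = (x for x in range(0, 10))
large_gen = (x for x in range(0, 100_000))
small_list = [x for x in range(0, 10)]
large_list = [x for x in range(0, 100_000)]

# A generator's size does not depend on how many values it will produce.
same_gen_size = getsizeof(small_gen) == getsizeof(large_gen)

# A list grows with the number of stored references.
list_grows = getsizeof(large_list) > getsizeof(small_list)

# Shallow size of the list versus a rough estimate including its elements.
shallow = getsizeof(large_list)
deep_estimate = shallow + sum(getsizeof(x) for x in large_list)

print(same_gen_size, list_grows, deep_estimate > shallow)
```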

5.11.8. Inspection

>>> from inspect import isgenerator
>>>
>>>
>>> a = [x for x in range(0,5)]
>>> b = (x for x in range(0,5))
>>>
>>> isgenerator(a)
False
>>> isgenerator(b)
True
>>> from inspect import isgenerator
>>>
>>>
>>> data = range(0, 10)
>>>
>>> isgenerator(data)
False
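
inspect.isgenerator() checks for a generator object, not for the function that creates it, and not for iterators in general. A short sketch of the related checks, using a hypothetical numbers() function:

```python
from collections.abc import Iterator
from inspect import isgenerator, isgeneratorfunction


def numbers():
    yield 1


func_is_genfunc = isgeneratorfunction(numbers)   # True: the function contains yield
func_is_gen = isgenerator(numbers)               # False: a function is not a generator object
call_is_gen = isgenerator(numbers())             # True: calling it returns a generator

# Every generator is an iterator, but not every iterator is a generator.
gen_is_iterator = isinstance(numbers(), Iterator)       # True
list_iter_is_iterator = isinstance(iter([1, 2]), Iterator)  # True
list_iter_is_gen = isgenerator(iter([1, 2]))             # False

# range is a lazy sequence, neither an iterator nor a generator.
range_is_iterator = isinstance(range(0, 10), Iterator)   # False
```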

5.11.9. Introspection

>>> data = (x for x in range(0,10))
>>>
>>>
>>> next(data)
0
>>>
>>> data.gi_code  # doctest: +ELLIPSIS
<code object <genexpr> at 0x..., file "<...>", line 1>
>>>
>>> data.gi_running
False
>>>
>>> data.gi_frame  # doctest: +ELLIPSIS
<frame at 0x..., file '<...>', line 1, code <genexpr>>
>>>
>>> data.gi_frame.f_locals  # doctest: +ELLIPSIS
{'.0': <range_iterator object at 0x...>, 'x': 0}
>>>
>>> data.gi_frame.f_code  # doctest: +ELLIPSIS
<code object <genexpr> at 0x...0, file "<...>", line 1>
>>>
>>> data.gi_frame.f_lineno
1
>>>
>>> data.gi_frame.f_lasti
8
>>>
>>> data.gi_yieldfrom
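
Besides the gi_* attributes, the standard library offers inspect.getgeneratorstate(), which reports the lifecycle state of a generator as a string. A minimal sketch:

```python
from inspect import getgeneratorstate

data = (x for x in range(0, 2))

state_created = getgeneratorstate(data)     # not started yet
next(data)
state_suspended = getgeneratorstate(data)   # paused at a yield
list(data)                                  # consume the remaining values
state_closed = getgeneratorstate(data)      # finished

print(state_created, state_suspended, state_closed)
# GEN_CREATED GEN_SUSPENDED GEN_CLOSED
```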

5.11.10. Multiple Yields

>>> def run():
...     for x in range(0, 3):
...         yield x
...     for y in range(10, 13):
...         yield y
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
10
>>> next(result)
11
>>> next(result)
12
>>> next(result)
Traceback (most recent call last):
StopIteration

5.11.11. Yield From

  • Since Python 3.3: PEP 380 -- Syntax for Delegating to a Subgenerator

  • Helps with refactoring generators

  • Useful for large generators which can be split into smaller ones

  • Delegates iteration to another generator or iterable

  • When the delegating generator is closed (GeneratorExit), the subgenerator is closed as well

  • The value of the yield from expression is the first argument to the StopIteration exception raised by the iterator when it terminates

  • return expr in a generator causes StopIteration(expr) to be raised upon exit from the generator

>>> def generator1():
...     for x in range(0, 3):
...         yield x
>>>
>>> def generator2():
...     for x in range(10, 13):
...         yield x
>>>
>>> def run():
...     yield from generator1()
...     yield from generator2()
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
10
>>> next(result)
11
>>> next(result)
12
>>> next(result)
Traceback (most recent call last):
StopIteration

The code is equivalent to itertools.chain():

>>> from itertools import chain
>>>
>>>
>>> def generator1():
...     for x in range(0, 3):
...         yield x
>>>
>>> def generator2():
...     for x in range(10, 13):
...         yield x
>>>
>>> def run():
...     for x in chain(generator1(), generator2()):
...         yield x
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> list(result)
[0, 1, 2, 10, 11, 12]

yield from accepts any iterable, for example a list returned by an ordinary function:

>>> def worker():
...     return [1, 2, 3]
>>>
>>> def run():
...     yield from worker()
>>>
>>>
>>> result = run()
>>>
>>> next(result)
1
>>> next(result)
2
>>> next(result)
3
>>> next(result)
Traceback (most recent call last):
StopIteration
>>> def worker():
...     return [x for x in range(0,3)]
>>>
>>> def run():
...     yield from worker()
>>>
>>>
>>> result = run()
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
Traceback (most recent call last):
StopIteration

yield from with sequences:

>>> def run():
...     yield from [0, 1, 2]
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
Traceback (most recent call last):
StopIteration

yield from with comprehensions:

>>> def run():
...     yield from [x for x in range(0,3)]
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
Traceback (most recent call last):
StopIteration

yield from with generator expressions:

>>> def run():
...     yield from (x for x in range(0,3))
>>>
>>>
>>> result = run()
>>>
>>> type(result)
<class 'generator'>
>>>
>>> next(result)
0
>>> next(result)
1
>>> next(result)
2
>>> next(result)
Traceback (most recent call last):
StopIteration
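
The examples above use yield from only to pass values through. As the bullets at the start of this section note, the yield from expression also has a value: whatever the subgenerator returns. The sketch below assumes a hypothetical subtotal() subgenerator to show both sides of that mechanism.

```python
def subtotal(values):
    total = 0
    for value in values:
        yield value
        total += value
    return total  # raised to the caller as StopIteration(total)


def run():
    # yield from passes the values through and evaluates to the return value.
    first = yield from subtotal([1, 2, 3])
    second = yield from subtotal([10, 20])
    yield f'totals: {first}, {second}'


result = list(run())
print(result)  # [1, 2, 3, 10, 20, 'totals: 6, 30']

# Without yield from, the return value is found on the exception itself.
gen = subtotal([5])
next(gen)
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value

print(returned)  # 5
```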

5.11.12. Send

  • The .send() method passes a value into the generator

  • data = yield receives the sent value

  • A newly created generator must first be advanced to its first yield by sending None (or by calling next())

  • Sending any other value to a just-started generator raises TypeError

>>> def run():
...     while True:
...         data = yield
...         print(f'Processing {data}')
>>>
>>>
>>> worker = run()
>>>
>>> type(worker)
<class 'generator'>
>>>
>>> worker.send('hello')
Traceback (most recent call last):
TypeError: can't send non-None value to a just-started generator
>>>
>>> worker.send(None)
>>> worker.send(0)
Processing 0
>>> worker.send(1)
Processing 1
>>> worker.send(2)
Processing 2
>>> worker.send('Mark Watney')
Processing Mark Watney
>>> def run():
...     while True:
...         data = yield
...         print(f'Processing {data}')
>>>
>>>
>>> worker = run()
>>> worker.send(None)
>>>
>>> for x in range(0,3):
...     worker.send(x)
Processing 0
Processing 1
Processing 2
>>> def worker():
...     while True:
...         data = yield
...         print(f'Processing {data}')
>>>
>>> def run(gen):
...     gen.send(None)
...     while True:
...         x = yield
...         gen.send(x)
>>>
>>>
>>> result = run(worker())
>>> result.send(None)
>>>
>>> for x in range(0,3):
...     result.send(x)
Processing 0
Processing 1
Processing 2
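
The workers above only receive values. A single yield can also hand a result back to the caller of send(). The running-average generator below is a common illustration of this pattern; the averager name and its variables are assumptions of this sketch.

```python
def averager():
    total = 0
    count = 0
    average = None
    while True:
        # yield hands `average` to the caller and waits for the next sent value.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)            # advance to the first yield; same as avg.send(None)

a = avg.send(10)     # 10.0
b = avg.send(20)     # 15.0
c = avg.send(30)     # 20.0
avg.close()          # stop the endless loop inside the generator

print(a, b, c)       # 10.0 15.0 20.0
```

Each send() both delivers a value and returns whatever the generator yields next, so the caller gets the updated average immediately.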

5.11.13. Conclusion

  • A function that contains the yield keyword is a generator function; calling it returns a generator.

  • It is useful when a function produces a large amount of data, because the values are delivered one at a time instead of all at once.

  • Values can be sent into a generator with its send() method.

  • The yield from statement delegates to a sub-iterator, such as another generator.

  • Source: https://www.askpython.com/python/python-yield-examples

5.11.14. Assignments

Code 5.63. Solution
"""
* Assignment: Idioms Generator Iris
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Write a filter for `DATA` that returns `features` for the given `species`
    2. Implement solution using function
    3. Implement solution using generator and `yield` keyword
    4. Compare results of both using `sys.getsizeof()`
    5. What happens if the input data is bigger?
    6. Note that different Python versions give slightly different
       `getsizeof` values for the generator and the function result:
        a. 112 for generator in Python 3.9
        b. 112 for generator in Python 3.8
        c. 120 for generator in Python 3.7
    7. Run doctests - all must succeed

Polish:
    1. Napisz filtr dla `DATA` zwracający `features` dla danego gatunku `species`
    2. Zaimplementuj rozwiązanie wykorzystując funkcję
    3. Zaimplementuj rozwiązanie wykorzystując generator i słowo kluczowe `yield`
    4. Porównaj wyniki obu używając `sys.getsizeof()`
    5. Co się stanie, gdy ilość danych będzie większa?
    6. Zwróć uwagę, że w zależności od wersji Python wartości getsizeof
       dla funkcji i generatora mogą się nieznacznie różnić:
        a. 112 dla generator w Python 3.9
        b. 112 dla generator w Python 3.8
        c. 120 dla generator w Python 3.7
    7. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from sys import getsizeof
    >>> from inspect import isfunction, isgeneratorfunction

    >>> assert isfunction(function)
    >>> assert isgeneratorfunction(generator)

    >>> list(function(DATA, 'setosa'))
    [[5.1, 3.5, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]
    >>> list(generator(DATA, 'setosa'))
    [[5.1, 3.5, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]

    >>> getsizeof(function(DATA, 'setosa'))  # Python 3.8: 88
    88
    >>> getsizeof(function(DATA*10, 'setosa'))  # Python 3.8: 256
    248
    >>> getsizeof(function(DATA*100, 'setosa'))  # Python 3.8: 1664
    1656
    >>> getsizeof(generator(DATA, 'setosa'))
    112
    >>> getsizeof(generator(DATA*10, 'setosa'))
    112
    >>> getsizeof(generator(DATA*100, 'setosa'))
    112
"""

DATA = [(5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def function(data: list, species: str):
    ...


def generator(data: list, species: str):
    ...


Code 5.64. Solution
"""
* Assignment: Idioms Generator Passwd
* Complexity: medium
* Lines of code: 10 lines
* Time: 8 min

English:
    1. Split `DATA` by lines and then by colon `:`
    2. Extract system accounts (users whose UID [third field] is less than 1000)
    3. Return list of system account logins
    4. Implement solution using function
    5. Implement solution using generator and `yield` keyword
    6. Compare results of both using `sys.getsizeof()`
    7. Run doctests - all must succeed

Polish:
    1. Podziel `DATA` po liniach a następnie po dwukropku `:`
    2. Wyciągnij konta systemowe (użytkownicy z UID (trzecie pole) mniejszym niż 1000)
    3. Zwróć listę loginów użytkowników systemowych
    4. Zaimplementuj rozwiązanie wykorzystując funkcję
    5. Zaimplementuj rozwiązanie wykorzystując generator i słowo kluczowe `yield`
    6. Porównaj wyniki obu używając `sys.getsizeof()`
    7. Uruchom doctesty - wszystkie muszą się powieść

Hint:
    * `str.splitlines()`
    * `str.strip()`
    * `str.split()`
    * `bool(0) is False`
    * `bool('0') is True`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from sys import getsizeof
    >>> from inspect import isfunction, isgeneratorfunction

    >>> assert isfunction(function)
    >>> assert isgeneratorfunction(generator)
    >>> fun = function(DATA)
    >>> gen = generator(DATA)

    >>> list(fun)
    ['root', 'bin', 'daemon', 'adm', 'shutdown', 'halt', 'nobody', 'sshd']
    >>> list(gen)
    ['root', 'bin', 'daemon', 'adm', 'shutdown', 'halt', 'nobody', 'sshd']

    >>> getsizeof(fun)
    120
    >>> getsizeof(gen)
    112
"""

DATA = """root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
jimenez:x:1001:1001:José Jiménez:/home/jimenez:/bin/bash
ivanovic:x:1002:1002:Иван Иванович:/home/ivanovic:/bin/bash
lewis:x:1003:1002:Melissa Lewis:/home/ivanovic:/bin/bash"""


def function(data: str):
    ...


def generator(data: str):
    ...