5.9. Generators

5.9.1. Recap

  • Comprehensions are evaluated immediately (eagerly)

  • Generators are evaluated lazily

>>> data = [x for x in range(0,5)]
>>>
>>> print(data)
[0, 1, 2, 3, 4]
>>> list(data)
[0, 1, 2, 3, 4]

>>> data = (x for x in range(0,5))
>>>
>>> print(data)  # doctest: +ELLIPSIS
<generator object <genexpr> at 0x...>
>>> list(data)
[0, 1, 2, 3, 4]

>>> _ = list(x for x in range(0,5))      # generator expression consumed by list()
>>> _ = tuple(x for x in range(0,5))     # generator expression consumed by tuple()
>>> _ = set(x for x in range(0,5))       # generator expression consumed by set()
>>> _ = dict((x,x) for x in range(0,5))  # generator expression consumed by dict()
>>> _ = [x for x in range(0,5)]          # list comprehension
>>> _ = (x for x in range(0,5))          # generator expression
>>> _ = {x for x in range(0,5)}          # set comprehension
>>> _ = {x:x for x in range(0,5)}        # dict comprehension

5.9.2. Rationale

  • Creating a generator only builds the generator object and binds a name to it; no code runs yet

  • A comprehension's result stays in memory for as long as it is referenced

  • A generator is exhausted once it has been consumed

  • Comprehensions: use when values are needed more than once

  • Generators: use when values are needed only once (for example, as a loop iterator)

  • A generator calculates the next value on each loop iteration

  • A generator forgets the previous value

  • A generator does not know the next value in advance

  • Code does not execute instantly

  • Sometimes code is not executed at all!

  • If you need values evaluated instantly, there is no point in using generators
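
The points above can be checked with a small experiment. This is only a sketch: the helper `loud_square()` and the list `calls` are hypothetical names introduced for this illustration, and they record when the body actually runs.

```python
calls = []  # hypothetical log of arguments that were actually evaluated


def loud_square(x):
    """Record the call, then return the square of x."""
    calls.append(x)
    return x ** 2


# Comprehension: all values are computed immediately
eager = [loud_square(x) for x in range(0, 3)]
calls_after_eager = list(calls)
# [0, 1, 2]

calls.clear()

# Generator expression: nothing is computed at creation time
lazy = (loud_square(x) for x in range(0, 3))
calls_after_create = list(calls)
# []

first = next(lazy)              # computes only the first value
calls_after_next = list(calls)
# [0]

rest = list(lazy)               # consumes the remaining values
# [1, 4]

again = list(lazy)              # the generator is exhausted now
# []
```

If `next(lazy)` had never been called, `loud_square()` would not have run at all for the generator version.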

5.9.3. Generator Function

DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def get_values(species):
    result = []
    for row in DATA:
        if row[4] == species:
            result.append(row)
    return result


data = get_values('setosa')

print(data)
# [(5.1, 3.5, 1.4, 0.2, 'setosa'), (4.7, 3.2, 1.3, 0.2, 'setosa')]

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.7, 3.2, 1.3, 0.2, 'setosa')


# Same filter written as a generator function: `yield` replaces the result list
DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def get_values(species):
    for row in DATA:
        if row[4] == species:
            yield row


data = get_values('setosa')

print(data)
# <generator object get_values at 0x103632820>

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.7, 3.2, 1.3, 0.2, 'setosa')
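
How the `yield` version behaves step by step can be seen by driving it manually with `next()`. This is a minimal sketch, assuming a shortened `DATA` (header row dropped, three rows only) so that the block is self-contained.

```python
# Shortened DATA, an assumption made only to keep the example small
DATA = [(5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def get_values(species):
    for row in DATA:
        if row[4] == species:
            yield row


data = get_values('setosa')   # the function body has not started running yet

first = next(data)            # runs until the first `yield`
# (5.1, 3.5, 1.4, 0.2, 'setosa')

second = next(data)           # resumes after the first `yield`
# (4.7, 3.2, 1.3, 0.2, 'setosa')

try:
    next(data)
    finished = False
except StopIteration:
    finished = True           # no more rows; a for loop ends here silently
```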

5.9.4. Itertools

from itertools import *

count(start=0, step=1)
cycle(iterable)
repeat(object[, times])
accumulate(iterable[, func, *, initial=None])
chain(*iterables)
compress(data, selectors)
islice(iterable, start, stop[, step])
starmap(function, iterable)
product(*iterables, repeat=1)
permutations(iterable, r=None)
combinations(iterable, r)
combinations_with_replacement(iterable, r)
groupby(iterable, key=None)
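
A few of the functions listed above in action. This is a short sketch with made-up input values; every function returns a lazy iterator, so `list()` is used only to display the results.

```python
from itertools import accumulate, chain, count, cycle, islice, product

# count() is infinite, so take a finite slice of it with islice()
evens = list(islice(count(start=0, step=2), 5))
# [0, 2, 4, 6, 8]

# cycle() repeats the iterable forever
colors = list(islice(cycle(['red', 'green']), 5))
# ['red', 'green', 'red', 'green', 'red']

# chain() joins iterables without building an intermediate list
joined = list(chain([1, 2], (3, 4), 'ab'))
# [1, 2, 3, 4, 'a', 'b']

# accumulate() yields running totals by default
totals = list(accumulate([1, 2, 3, 4]))
# [1, 3, 6, 10]

# product() yields the Cartesian product of its arguments
pairs = list(product('ab', [0, 1]))
# [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```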

5.9.5. Memory Footprint

  • sys.getsizeof(obj) returns the size of obj in bytes

  • sys.getsizeof(obj) calls the obj.__sizeof__() method

  • sys.getsizeof(obj) adds garbage collector overhead if obj is managed by the garbage collector

from sys import getsizeof


gen1 = (x for x in range(0,1))
gen10 = (x for x in range(0,10))
gen100 = (x for x in range(0,100))
gen1000 = (x for x in range(0,1000))

getsizeof(gen1)
# 112

getsizeof(gen10)
# 112

getsizeof(gen100)
# 112

getsizeof(gen1000)
# 112

from sys import getsizeof


com1 = [x for x in range(0,1)]
com10 = [x for x in range(0,10)]
com100 = [x for x in range(0,100)]
com1000 = [x for x in range(0,1000)]


getsizeof(com1)
# 88

getsizeof(com10)
# 184

getsizeof(com100)
# 920

getsizeof(com1000)
# 8856
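
The exact byte counts above depend on the Python version and platform, so the sketch below compares sizes instead of hard-coding them. It assumes CPython. Note also that `getsizeof()` is shallow: for a list it measures the list object with its array of references, not the elements themselves.

```python
from sys import getsizeof

gen_small = (x for x in range(0, 10))
gen_large = (x for x in range(0, 100_000))

com_small = [x for x in range(0, 10)]
com_large = [x for x in range(0, 100_000)]

# Generator size does not depend on how many values it will produce
same_size = getsizeof(gen_small) == getsizeof(gen_large)
# True

# List size grows with the number of stored references
list_grows = getsizeof(com_large) > getsizeof(com_small)
# True

# Shallow size of the list versus the list plus its elements
shallow = getsizeof(com_small)
deep = shallow + sum(getsizeof(x) for x in com_small)
```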

5.9.6. Inspection

from inspect import isgenerator

a = [x for x in range(0,5)]
b = (x for x in range(0,5))

isgenerator(a)
# False

isgenerator(b)
# True

from inspect import isgenerator

data = range(0, 10)

isgenerator(data)
# False
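
The `range` result above is `False` because `range` is a lazy sequence, not a generator. The sketch below separates three related ideas: generator functions, generator objects and iterators. The function `numbers()` is a hypothetical example introduced only for this illustration.

```python
from collections.abc import Iterator
from inspect import isgenerator, isgeneratorfunction


def numbers():
    """Hypothetical generator function used only for this illustration."""
    yield 1
    yield 2


gen = numbers()
data = range(0, 10)

is_genfunc = isgeneratorfunction(numbers)
# True, the function body contains `yield`

func_is_gen = isgenerator(numbers)
# False, the function itself is not a generator object

call_is_gen = isgenerator(gen)
# True, calling it returned a generator object

range_is_iterator = isinstance(data, Iterator)
# False, range has no __next__() method

gen_is_iterator = isinstance(gen, Iterator)
# True, every generator is an iterator
```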

5.9.7. Introspection

data = (x for x in range(0,10))

next(data)
# 0

data.gi_code
# <code object <genexpr> at 0x11fc4dc90, file "<input>", line 1>

data.gi_running
# False

data.gi_yieldfrom
# None

data.gi_frame
# <frame at 0x7f93a1723200, file '<input>', line 1, code <genexpr>>

data.gi_frame.f_locals
# {'.0': <range_iterator object at 0x11fc4c840>, 'x': 0}

data.gi_frame.f_code
# <code object <genexpr> at 0x11fc4dc90, file "<input>", line 1>

data.gi_frame.f_lineno
# 1

data.gi_frame.f_lasti
# 8
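
Besides the `gi_*` attributes, the standard library offers `inspect.getgeneratorstate()`, which reports where a generator is in its life cycle. A minimal sketch with a two-element generator:

```python
from inspect import getgeneratorstate

data = (x for x in range(0, 2))

state_created = getgeneratorstate(data)
# 'GEN_CREATED'

next(data)
state_suspended = getgeneratorstate(data)
# 'GEN_SUSPENDED'

list(data)   # consume the remaining values
state_closed = getgeneratorstate(data)
# 'GEN_CLOSED'

frame_after = data.gi_frame
# None
```

Once the generator is exhausted, its frame is released, which is why `gi_frame` becomes `None`.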

5.9.8. Assignments

Code 5.57. Solution
"""
* Assignment: Function Generator Chain
* Filename: function_generators_chain.py
* Complexity: easy
* Lines of code: 10 lines
* Time: 8 min

English:
    1. Use a generator expression to create `result`
    2. In the generator, use `range()` to get numbers from 1 to 33 (inclusive) that are divisible by 3
    3. Use `filter()` to get odd numbers from `result`
    4. Use `map()` to cube all numbers in `result`
    5. Set `result` to the arithmetic mean of `result`
    6. Compare the result with the "Tests" section (see below)

Hints:
    * cast to `list()` to expand the generator before calculating the mean
    * `mean = sum(...) / len(...)`

Tests:
    >>> result
    11502.0
"""


# Given
def odd(x):
    return x % 2


def cube(x):
    return x ** 3


result: float


Code 5.58. Solution
"""
* Assignment: Function Generator Iris
* Filename: function_generator_iris.py
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Use the code from the "Given" section (see below)
    2. Write a filter for `DATA` that returns `features` for a given `species`
    3. Implement the solution using a function
    4. Implement the solution using a generator and the `yield` keyword
    5. Compare the results of both using `sys.getsizeof()`
    6. What happens when the input data gets bigger?
    7. Note that different Python versions give slightly
       different `getsizeof` values for the generator and the function result:
        a. 112 for generator in Python 3.9
        b. 112 for generator in Python 3.8
        c. 120 for generator in Python 3.7
    8. Compare the result with the "Tests" section (see below)

Tests:
    >>> from sys import getsizeof
    >>> from inspect import isfunction, isgeneratorfunction
    >>> assert isfunction(function)
    >>> assert isgeneratorfunction(generator)

    >>> list(function(DATA, 'setosa'))
    [[5.1, 3.5, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]
    >>> list(generator(DATA, 'setosa'))
    [[5.1, 3.5, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]

    >>> getsizeof(function(DATA, 'setosa'))
    88
    >>> getsizeof(function(DATA*10, 'setosa'))
    248
    >>> getsizeof(function(DATA*100, 'setosa'))
    1656
    >>> getsizeof(generator(DATA, 'setosa'))
    112
    >>> getsizeof(generator(DATA*10, 'setosa'))
    112
    >>> getsizeof(generator(DATA*100, 'setosa'))
    112
"""


# Given
DATA = [(5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa')]


def function(data: list, species: str):
    ...


def generator(data: list, species: str):
    ...


Code 5.59. Solution
"""
* Assignment: Function Generator Passwd
* Filename: function_generator_passwd.py
* Complexity: medium
* Lines of code: 10 lines
* Time: 8 min

English:
    1. Use the code from the "Given" section (see below)
    2. Split `DATA` by lines and then by colon `:`
    3. Extract system accounts (users whose UID [third field] is less than 1000)
    4. Return a list of system account logins
    5. Implement the solution using a function
    6. Implement the solution using a generator and the `yield` keyword
    7. Compare the results of both using `sys.getsizeof()`
    8. Compare the result with the "Tests" section (see below)

Tests:
    >>> from sys import getsizeof
    >>> from inspect import isfunction, isgeneratorfunction
    >>> assert isfunction(function)
    >>> assert isgeneratorfunction(generator)
    >>> fun = function(DATA)
    >>> gen = generator(DATA)
    >>> list(fun)
    ['root', 'bin', 'daemon', 'adm', 'shutdown', 'halt', 'nobody', 'sshd']
    >>> list(gen)
    ['root', 'bin', 'daemon', 'adm', 'shutdown', 'halt', 'nobody', 'sshd']
    >>> getsizeof(fun)
    120
    >>> getsizeof(gen)
    112
"""


# Given
DATA = """root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
jimenez:x:1001:1001:José Jiménez:/home/jimenez:/bin/bash
ivanovic:x:1002:1002:Иван Иванович:/home/ivanovic:/bin/bash
lewis:x:1003:1002:Melissa Lewis:/home/ivanovic:/bin/bash"""


def function(data: str):
    ...


def generator(data: str):
    ...