2.5. Generators

2.5.1. Generator expressions vs. Comprehensions

list(x for x in range(0,5))        # [0, 1, 2, 3, 4]
[x for x in range(0,5)]            # [0, 1, 2, 3, 4]

set(x for x in range(0,5))         # {0, 1, 2, 3, 4}
{x for x in range(0,5)}            # {0, 1, 2, 3, 4}

dict((x,x) for x in range(0,5))    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
{x: x for x in range(0,5)}         # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

tuple(x for x in range(0,5))       # (0, 1, 2, 3, 4)
(x for x in range(0,5))            # <generator object <genexpr> at 0x118c1aed0>

all(x for x in range(0,5))         # False
any(x for x in range(0,5))         # True
sum(x for x in range(0,5))         # 10

2.5.1.1. What is the difference?

  • Comprehensions are evaluated immediately

  • Generators are evaluated lazily

  • A generator expression only creates a generator object and binds a name to it (nothing is computed yet)

  • The result of a comprehension stays in memory for as long as something references it

  • A generator is exhausted after a single pass and cannot be restarted

a = [x for x in range(0, 5)]

print(a)
# [0, 1, 2, 3, 4]

print(a)
# [0, 1, 2, 3, 4]

a = (x for x in range(0, 5))

print(a)
# <generator object <genexpr> at 0x111e7acd0>

print(list(a))
# [0, 1, 2, 3, 4]

print(list(a))
# []
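When the values are needed more than once, one option is to build a fresh generator for every pass. The sketch below assumes a hypothetical factory function `make_numbers`, which is not part of the original examples.

```python
def make_numbers():
    """Hypothetical factory: return a brand-new generator on every call."""
    return (x for x in range(0, 5))


first_pass = list(make_numbers())
second_pass = list(make_numbers())

print(first_pass)
# [0, 1, 2, 3, 4]

print(second_pass)
# [0, 1, 2, 3, 4]
```

Each call creates an independent generator object, so exhausting one does not affect the next.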

2.5.2. Lazy evaluation

  • Code does not execute immediately

  • Sometimes code is not executed at all!
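A small sketch can make the laziness visible. The helper `noisy_square` and the `calls` list are hypothetical names introduced only for this illustration; they record when the code actually runs.

```python
calls = []  # records every argument the helper was actually called with


def noisy_square(x):
    """Hypothetical helper: remember that we ran, then square x."""
    calls.append(x)
    return x * x


lazy = (noisy_square(x) for x in range(0, 5))
print(calls)
# []  (nothing has been computed yet)

first = next(lazy)
print(first, calls)
# 0 [0]  (only the first value was computed)

unused = (noisy_square(x) for x in range(100, 105))
del unused
print(calls)
# [0]  (the second generator was never consumed, so its body never ran)
```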

2.5.2.1. Declaring generators

Listing 2.113. Declaring generators does not generate any numbers
a = (x for x in range(0,5))
b = (x for x in range(0,5))
c = (x for x in range(0,5))

Listing 2.114. This only creates a generator object; it does not evaluate it
a = (x for x in range(0,5))

print(a)
# <generator object <genexpr> at 0x11cb45950>

2.5.2.2. Evaluating generator instantly

  • Not very efficient: the whole sequence ends up in memory anyway

  • If you need values evaluated instantly, there is no point in using generators

a = (x for x in range(0,5))

list(a)
# [0, 1, 2, 3, 4]

2.5.2.3. Evaluate generator iteratively

  • Generator will calculate next number for every loop iteration

  • Forgets previous number

  • Doesn't know the next number

a = (x for x in range(0,5))

for i in a:
    print(i)
# 0
# 1
# 2
# 3
# 4

2.5.2.4. Halting and resuming iteration

  • The first loop consumes only four numbers (0 to 3), then stops

  • The generator remembers its position, so the next loop resumes where the previous one stopped

Listing 2.115. A comprehension generates the whole sequence immediately and then iterates over it. The list stays in memory for as long as it is referenced
numbers = [x for x in range(0, 10)]

for x in numbers:
   print(x)
   if x == 3:
       break
# 0
# 1
# 2
# 3

for x in numbers:
   print(x)
   if x == 6:
       break
# 0
# 1
# 2
# 3
# 4
# 5
# 6

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Listing 2.116. A generator produces numbers on demand as the iteration proceeds
numbers = (x for x in range(0, 10))

for x in numbers:
   print(x)
   if x == 3:
       break
# 0
# 1
# 2
# 3

for x in numbers:
   print(x)
   if x == 6:
       break
# 4
# 5
# 6

list(numbers)
# [7, 8, 9]

list(numbers)
# []
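The same halt-and-resume behaviour can be sketched without `break`, using `itertools.islice` from the standard library to take a fixed number of items. Using `islice` here is an alternative chosen for illustration, not the approach from the listings above.

```python
from itertools import islice

numbers = (x for x in range(0, 10))

head = list(islice(numbers, 4))    # consume exactly four items
print(head)
# [0, 1, 2, 3]

middle = list(islice(numbers, 3))  # resume where the previous slice stopped
print(middle)
# [4, 5, 6]

rest = list(numbers)               # whatever is left
print(rest)
# [7, 8, 9]
```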

2.5.2.5. Which one is better?

  • Comprehensions - when the values are used more than once

  • Generators - when the values are used only once (for example, as the iterable of a for loop)

2.5.3. yield Keyword

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    result = []
    for row in DATA:
        if row[4] == species:
            result.append(row)
    return result


data = get_species('setosa')

print(data)
# [(5.1, 3.5, 1.4, 0.2, 'setosa'),
#  (4.9, 3.0, 1.4, 0.2, 'setosa'),
#  (5.4, 3.9, 1.7, 0.4, 'setosa')]

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    for row in DATA:
        if row[4] == species:
            yield row

data = get_species('setosa')

print(data)
# <generator object get_species at 0x11af257c8>

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')
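To see that a function containing `yield` pauses at each `yield` and resumes on the next request, the following sketch logs what the body does. The `countdown` function and the `events` list are hypothetical names used only for this illustration.

```python
events = []  # hypothetical log of what the generator body has done so far


def countdown(start):
    """Hypothetical generator function: yield start, start - 1, ..., 1."""
    events.append('started')
    while start > 0:
        yield start
        events.append(f'resumed after {start}')
        start -= 1
    events.append('finished')


gen = countdown(2)
print(events)
# []  (calling the function does not run its body)

print(next(gen), events)
# 2 ['started']

print(next(gen), events)
# 1 ['started', 'resumed after 2']

print(list(gen), events)
# [] ['started', 'resumed after 2', 'resumed after 1', 'finished']
```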

2.5.4. Built-in generators

2.5.4.1. zip()

Listing 2.117. zip() syntax
zip(<sequence>, <sequence>, ...)

header = ['a', 'b', 'c']
data = [1, 2, 3]

zip(header, data)
# <zip object at 0x11cf54b90>

list(zip(header, data))
# [('a', 1), ('b', 2), ('c', 3)]

tuple(zip(header, data))
# (('a', 1), ('b', 2), ('c', 3))

dict(zip(header, data))
# {'a': 1, 'b': 2, 'c': 3}

header = ['a', 'b', 'c']
data = [1, 2, 3]
row = [77,88,99]

[(h,d,r) for h,d,r in zip(header, data, row)]
# [('a', 1, 77), ('b', 2, 88), ('c', 3, 99)]
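One detail worth illustrating: `zip()` stops at the shortest input. The sketch below contrasts it with `itertools.zip_longest` from the standard library, which pads the missing values. The shortened `data` list is an assumption made for this example.

```python
from itertools import zip_longest

header = ['a', 'b', 'c']
data = [1, 2]  # deliberately one element shorter than header

short = list(zip(header, data))
print(short)
# [('a', 1), ('b', 2)]

padded = list(zip_longest(header, data, fillvalue=None))
print(padded)
# [('a', 1), ('b', 2), ('c', None)]
```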

2.5.4.2. map()

Listing 2.118. map() syntax
map(<callable>, <sequence>)

data = [1, 2, 3]

list(map(float, data))
# [1.0, 2.0, 3.0]

map(float, [1, 2, 3])
# <map object at 0x11d15a190>

list(map(float, [1, 2, 3]))
# [1.0, 2.0, 3.0]

tuple(map(float, [1, 2, 3]))
# (1.0, 2.0, 3.0)
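`map()` also accepts more than one input, in which case the callable receives one argument from each. A minimal sketch, assuming two equally long lists `a` and `b` that are not part of the original examples:

```python
from operator import add

a = [1, 2, 3]
b = [10, 20, 30]

sums = list(map(add, a, b))
print(sums)
# [11, 22, 33]

# The same result written as a generator expression over zip()
sums_genexpr = list(x + y for x, y in zip(a, b))
print(sums_genexpr)
# [11, 22, 33]
```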

2.5.4.3. filter()

Listing 2.119. filter() syntax
filter(<callable>, <sequence>)

Listing 2.120. Show only even numbers
data = [1, 2, 3, 4, 5, 6]

list(filter(lambda x: x % 2 == 0, data))
# [2, 4, 6]

data = [1, 2, 3, 4, 5, 6]

def is_even(x):
    return x % 2 == 0

filter(is_even, data)
# <filter object at 0x11d182990>

list(filter(is_even, data))
# [2, 4, 6]

data = [1, 2, 3, 4, 5, 6]

def is_even(x):
    if x % 2 == 0:
        return True
    else:
        return False

filter(is_even, data)
# <filter object at 0x11d182990>

list(filter(is_even, data))
# [2, 4, 6]

Listing 2.121. filter() example
DATA = [
    {'name': 'Jan Twardowski', 'age': 21},
    {'name': 'Mark Watney', 'age': 25},
    {'name': 'Melissa Lewis', 'age': 18},
]

def is_adult(person):
    if person['age'] >= 21:
        return True
    else:
        return False


result = filter(is_adult, DATA)
print(list(result))
# [
#   {'name': 'Jan Twardowski', 'age': 21},
#   {'name': 'Mark Watney', 'age': 25},
# ]
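Two related forms may be worth knowing. Passing `None` as the callable keeps only truthy items, and a generator expression with an `if` clause gives the same result as `filter()`. The sample lists below are assumptions made for this sketch.

```python
data = [0, 1, '', 'a', None, [], [2]]

truthy = list(filter(None, data))  # None means: keep items that are truthy
print(truthy)
# [1, 'a', [2]]

numbers = [1, 2, 3, 4, 5, 6]

even_filter = list(filter(lambda x: x % 2 == 0, numbers))
even_genexpr = list(x for x in numbers if x % 2 == 0)
print(even_filter == even_genexpr)
# True
```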

2.5.4.4. enumerate()

Listing 2.122. enumerate() syntax
enumerate(<sequence>)

data = ['a', 'b', 'c']

list(enumerate(data))
# [(0, 'a'), (1, 'b'), (2, 'c')]

dict(enumerate(data))
# {0: 'a', 1: 'b', 2: 'c'}

dict((v,k) for k,v in enumerate(data))
# {'a': 0, 'b': 1, 'c': 2}

{v:k for k,v in enumerate(data, start=5)}
# {'a': 5, 'b': 6, 'c': 7}

header = ['a', 'b', 'c']
data = [1, 2, 3]
result = {}

for i, _ in enumerate(header):
    key = header[i]
    value = data[i]
    result[key] = value

print(result)
# {'a': 1, 'b': 2, 'c': 3}

header = ['a', 'b', 'c']
data = [1, 2, 3]
result = {}

# Use a separate loop variable so the header list is not overwritten
for i, key in enumerate(header):
    result[key] = data[i]

print(result)
# {'a': 1, 'b': 2, 'c': 3}
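Like the other built-ins in this section, `enumerate()` is lazy and produces its pairs on demand. A short sketch, assuming a hypothetical list `lines` and human-friendly numbering that starts at 1:

```python
lines = ['first', 'second', 'third']

numbered = enumerate(lines, start=1)
print(next(numbered))
# (1, 'first')

# The remaining pairs are produced only when the comprehension asks for them
remaining = [f'{number}: {text}' for number, text in numbered]
print(remaining)
# ['2: second', '3: third']
```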

2.5.5. Generator as Iterator

a = (x for x in range(0,3))

next(a)
# 0

next(a)
# 1

next(a)
# 2

next(a)
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
# StopIteration

data = (x for x in range(0,3))

for a in data:
    print(a)

# is analogous to:
try:
    i = iter(data)

    a = next(i)
    print(a)

    a = next(i)
    print(a)

    a = next(i)
    print(a)

    a = next(i)
    print(a)

    a = next(i)
    print(a)
except StopIteration:
    pass
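The unrolled version above can also be sketched as a loop that works for any number of items. This is an approximation of what a for loop does, written out by hand; the `seen` list is a hypothetical name used to collect the values.

```python
data = (x for x in range(0, 3))
seen = []

iterator = iter(data)
while True:
    try:
        a = next(iterator)
    except StopIteration:
        break  # the iterator is exhausted, so leave the loop
    seen.append(a)

print(seen)
# [0, 1, 2]

# next() accepts a default that is returned instead of raising StopIteration
print(next(iterator, 'empty'))
# empty
```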

2.5.6. Is Generator

import inspect

a = [x for x in range(0,5)]
b = (x for x in range(0,5))

inspect.isgenerator(a)
# False

inspect.isgenerator(b)
# True

import inspect

data = range(0, 10)

inspect.isgenerator(data)
# False
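A few related checks from the standard library can help tell the cases apart. In this sketch `get_numbers` is a hypothetical generator function; note that objects such as `zip` are lazy iterators, yet `inspect.isgenerator()` does not report them as generators.

```python
import inspect
from collections.abc import Generator, Iterator


def get_numbers():
    """Hypothetical generator function used only for the checks below."""
    yield 1


gen = get_numbers()
pairs = zip([1], [2])

checks = {
    'function is a generator function': inspect.isgeneratorfunction(get_numbers),
    'function is a generator': inspect.isgenerator(get_numbers),
    'result is a generator': inspect.isgenerator(gen),
    'result is a Generator (ABC)': isinstance(gen, Generator),
    'zip object is an Iterator': isinstance(pairs, Iterator),
    'zip object is a generator': inspect.isgenerator(pairs),
    'range is an Iterator': isinstance(range(0, 10), Iterator),
}

for name, value in checks.items():
    print(name, value)
# function is a generator function True
# function is a generator False
# result is a generator True
# result is a Generator (ABC) True
# zip object is an Iterator True
# zip object is a generator False
# range is an Iterator False
```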

2.5.7. Introspection

a = (x for x in range(0,10))

next(a)
# 0

a.gi_code
# <code object <genexpr> at 0x11fc4dc90, file "<input>", line 1>

a.gi_running
# False

a.gi_yieldfrom

a.gi_frame
# <frame at 0x7f93a1723200, file '<input>', line 1, code <genexpr>>

a.gi_frame.f_locals
# {'.0': <range_iterator object at 0x11fc4c840>, 'x': 0}

a.gi_frame.f_code
# <code object <genexpr> at 0x11fc4dc90, file "<input>", line 1>

a.gi_frame.f_lineno
# 1

a.gi_frame.f_lasti
# 8
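Besides the `gi_*` attributes, the standard library offers `inspect.getgeneratorstate()`, which reports the state as a string. A minimal sketch of the states a generator expression passes through:

```python
import inspect

a = (x for x in range(0, 2))

created = inspect.getgeneratorstate(a)
print(created)
# GEN_CREATED

next(a)
suspended = inspect.getgeneratorstate(a)
print(suspended)
# GEN_SUSPENDED

list(a)  # exhaust the generator
closed = inspect.getgeneratorstate(a)
print(closed)
# GEN_CLOSED

print(a.gi_frame)
# None  (the frame is released once the generator has finished)
```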

2.5.8. Memory Size

  • sys.getsizeof(object) returns the size of an object in bytes

  • sys.getsizeof(object) calls the object's __sizeof__ method

  • sys.getsizeof(object) adds an additional garbage collector overhead if the object is managed by the garbage collector

  • More info: https://stackoverflow.com/a/30316760

import sys


genexpr = (x for x in range(0,10))
listcomp = [x for x in range(0,10)]

sys.getsizeof(genexpr)
# 112

sys.getsizeof(listcomp)
# 184
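The exact byte counts depend on the Python version and platform, so the sketch below compares sizes instead of printing them. It assumes a hypothetical helper `make_genexpr` so that both generators come from the same code. Keep in mind that `sys.getsizeof()` on a list counts the list object itself, not the elements it refers to.

```python
import sys


def make_genexpr(n):
    """Hypothetical helper: return a generator expression over range(0, n)."""
    return (x for x in range(0, n))


small_gen = sys.getsizeof(make_genexpr(10))
large_gen = sys.getsizeof(make_genexpr(1_000_000))

small_list = sys.getsizeof([x for x in range(0, 10)])
large_list = sys.getsizeof([x for x in range(0, 100_000)])

print(small_gen == large_gen)
# True  (the generator object does not grow with the amount of data)

print(small_list < large_list)
# True  (the list grows with the number of elements)
```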

2.5.9. Assignments

2.5.9.1. Function Generator Iris

English
  1. Use code from "Input" section (see below)

  2. Download data/iris.csv and save as iris.csv

  3. Iterate over file lines

  4. Read header from first line

  5. Create function which returns all features for given species

  6. Species will be passed as an str argument to the function

  7. Implement solution using function

  8. Implement solution using generator and yield keyword

  9. Compare results of both using sys.getsizeof()

  10. What happens if the input data is bigger?

  11. Compare result with "Output" section (see below)

Input
import sys

FILE = r'iris.csv'


def function(species):
    raise NotImplementedError

def generator(species):
    raise NotImplementedError


fun = function('setosa')
gen = generator('setosa')

print('Function', sys.getsizeof(fun))
print('Generator', sys.getsizeof(gen))

Output
Function 520
Generator 112

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content

2.5.9.2. Function Generator Passwd

English
  1. Download data/etc-passwd.txt and save as etc-passwd.txt

  2. Iterate over the lines and filter out comments, empty lines, etc.

  3. Extract system accounts (users whose UID, the third field, is less than 1000)

  4. Return list of system account logins

  5. Implement solution using function

  6. Implement solution using generator and yield keyword

  7. Compare results of both using sys.getsizeof()

  8. Compare result with "Output" section (see below)

Output
Function 120
Generator 112

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content