2.4. Generators

2.4.1. Generator expressions vs. Comprehensions

list(x for x in range(0,5))        # [0, 1, 2, 3, 4]
[x for x in range(0,5)]            # [0, 1, 2, 3, 4]

set(x for x in range(0,5))         # {0, 1, 2, 3, 4}
{x for x in range(0,5)}            # {0, 1, 2, 3, 4}

dict((x,x) for x in range(0,5))    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
{x: x for x in range(0,5)}         # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

tuple(x for x in range(0,5))       # (0, 1, 2, 3, 4)
(x for x in range(0,5))            # <generator object <genexpr> at 0x118c1aed0>

all(x for x in range(0,5))         # False
any(x for x in range(0,5))         # True
sum(x for x in range(0,5))         # 10

2.4.1.1. What is the difference?

  • Comprehensions are evaluated immediately

  • Generator expressions are evaluated lazily

  • Creating a generator only builds a generator object and binds a name to it (nothing is executed yet)

  • The result of a comprehension stays in memory for as long as something references it

  • A generator can be consumed only once; after that it is exhausted

a = [x for x in range(0, 5)]

print(a)
# [0, 1, 2, 3, 4]

print(a)
# [0, 1, 2, 3, 4]

a = (x for x in range(0, 5))

print(a)
# <generator object <genexpr> at 0x111e7acd0>

print(list(a))
# [0, 1, 2, 3, 4]

print(list(a))
# []
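The difference can also be observed by comparing the size of both objects. The sketch below assumes that sys.getsizeof() is a good enough measure: it reports only the size of the container object itself, and the exact byte counts vary between Python versions and platforms, so they are not shown here.

```python
import sys

# A list comprehension stores every item up front.
as_list = [x for x in range(0, 10_000)]

# A generator expression stores only its own state, not the items.
as_generator = (x for x in range(0, 10_000))

print(sys.getsizeof(as_list) > sys.getsizeof(as_generator))
# True

# The list can be traversed many times.
print(sum(as_list), sum(as_list))
# 49995000 49995000

# The generator is exhausted after the first pass.
print(sum(as_generator), sum(as_generator))
# 49995000 0
```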

2.4.2. Lazy evaluation

  • Code does not execute immediately

  • Sometimes code is not executed at all!
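A minimal sketch of both points. The function square() and the list calls are hypothetical names introduced only for this illustration; the list records when the function really runs.

```python
calls = []

def square(x):
    # Record every call so we can see when the work actually happens.
    calls.append(x)
    return x * x

# Comprehension: all three calls happen right away.
eager = [square(x) for x in range(0, 3)]
print(calls)
# [0, 1, 2]

# Generator expression: nothing has been called yet.
calls.clear()
lazy = (square(x) for x in range(0, 3))
print(calls)
# []

# Only asking for a value triggers a single call.
first = next(lazy)
print(first, calls)
# 0 [0]
```

If lazy is never consumed any further, square(1) and square(2) are never executed at all.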

2.4.2.1. Declaring generators

Listing 452. This will not generate any numbers!
a = (x for x in range(0,5))
b = (x for x in range(0,5))
c = (x for x in range(0,5))

Listing 453. This only creates a generator expression; it does not evaluate it!
a = (x for x in range(0,5))

print(a)
# <generator object <genexpr> at 0x11cb45950>

2.4.2.2. Evaluating generator instantly

  • Not very efficient

  • If you need values evaluated instantly, there is no point in using generators

a = (x for x in range(0,5))

list(a)
# [0, 1, 2, 3, 4]

2.4.2.3. Evaluate generator iteratively

  • The generator calculates the next number on each loop iteration

  • It does not keep the previous numbers

  • It does not compute the next number in advance

a = (x for x in range(0,5))

for i in a:
    print(i)
# 0
# 1
# 2
# 3
# 4

2.4.2.4. Halting and resuming iteration

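  • Breaking out of a loop halts the generator; the remaining numbers are not generated yet

  • The next loop over the same generator resumes where the previous one stopped

  • A list created by a comprehension always starts again from the beginning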

Listing 454. A comprehension generates the whole sequence immediately and then iterates over it. The list stays in memory for as long as it is referenced
numbers = [x for x in range(0, 10)]

for x in numbers:
   print(x)
   if x == 3:
       break
# 0
# 1
# 2
# 3

for x in numbers:
   print(x)
   if x == 6:
       break
# 0
# 1
# 2
# 3
# 4
# 5
# 6

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Listing 455. A generator produces numbers on demand, as the iteration proceeds
numbers = (x for x in range(0, 10))

for x in numbers:
   print(x)
   if x == 3:
       break
# 0
# 1
# 2
# 3

for x in numbers:
   print(x)
   if x == 6:
       break
# 4
# 5
# 6

list(numbers)
# [7, 8, 9]

list(numbers)
# []

2.4.2.5. Which one is better?

  • Comprehensions - Using values more than once

  • Generators - Using values once (for example in the loop iterator)
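A short sketch of this rule of thumb. The variable names are chosen only for this illustration.

```python
# Generator: fine when the values are consumed exactly once.
total = sum(x * x for x in range(0, 5))
print(total)
# 30

# Generator used twice: the second consumer sees nothing.
squares = (x * x for x in range(0, 5))
print(sum(squares), max(squares, default=None))
# 30 None

# Comprehension: the values are still there for the second pass.
squares = [x * x for x in range(0, 5)]
print(sum(squares), max(squares, default=None))
# 30 16
```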

2.4.3. yield Keyword

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    output = []
    for row in DATA:
        if row[4] == species:
            output.append(row)
    return output


data = get_species('setosa')

print(data)
# [(5.1, 3.5, 1.4, 0.2, 'setosa'),
#  (4.9, 3.0, 1.4, 0.2, 'setosa'),
#  (5.4, 3.9, 1.7, 0.4, 'setosa')]

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    for row in DATA:
        if row[4] == species:
            yield row

data = get_species('setosa')

print(data)
# <generator object get_species at 0x11af257c8>

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')
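The sketch below shows how a function with yield pauses at every yield and resumes on the next request. The function count_to() and the list log are hypothetical names used only for this illustration.

```python
log = []

def count_to(limit):
    # Execution pauses at every yield and resumes on the next request.
    for number in range(1, limit + 1):
        log.append(f'yielding {number}')
        yield number
    log.append('finished')

# Calling the function only creates the generator; the body has not run.
gen = count_to(2)
print(log)
# []

print(next(gen), log)
# 1 ['yielding 1']

print(next(gen), log)
# 2 ['yielding 1', 'yielding 2']

# Consuming the rest runs the body to its end.
print(list(gen), log)
# [] ['yielding 1', 'yielding 2', 'finished']
```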

2.4.4. Built-in generators

header = ['a', 'b', 'c']
data = [1, 2, 3]
output = {}

for i, _ in enumerate(header):
    key = header[i]
    value = data[i]
    output[key] = value

print(output)
# {'a': 1, 'b': 2, 'c': 3}

2.4.4.1. zip()

Listing 456. zip() syntax
zip(<sequence>, <sequence>, ...)

header = ['a', 'b', 'c']
data = [1, 2, 3]

zip(header, data)
# <zip object at 0x11cf54b90>

list(zip(header, data))
# [('a', 1), ('b', 2), ('c', 3)]

dict(zip(header, data))
# {'a': 1, 'b': 2, 'c': 3}

tuple(zip(header, data))
# (('a', 1), ('b', 2), ('c', 3))

header = ['a', 'b', 'c']
data = [1, 2, 3]
row = [77,88,99]

[(k,v,r) for k,v,r in zip(header, data, row)]
# [('a', 1, 77), ('b', 2, 88), ('c', 3, 99)]
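Note that zip() stops at the shortest input. A minimal sketch of that behavior, with itertools.zip_longest() from the standard library shown as one possible alternative when missing values should be padded instead of dropped:

```python
from itertools import zip_longest

header = ['a', 'b', 'c']
data = [1, 2]

# zip() silently drops 'c', because data has only two items.
pairs = list(zip(header, data))
print(pairs)
# [('a', 1), ('b', 2)]

# zip_longest() pads the shorter input with fillvalue.
padded = list(zip_longest(header, data, fillvalue=None))
print(padded)
# [('a', 1), ('b', 2), ('c', None)]
```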

2.4.4.2. map()

Listing 457. map() syntax
map(<callable>, <sequence>)

data = [1, 2, 3]

list(map(float, data))
# [1.0, 2.0, 3.0]

map(float, [1, 2, 3])
# <map object at 0x11d15a190>

list(map(float, [1, 2, 3]))
# [1.0, 2.0, 3.0]

tuple(map(float, [1, 2, 3]))
# (1.0, 2.0, 3.0)
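A few more uses of map(), as a sketch. The sample values are made up for this illustration.

```python
data = ['1', '2', '3']

# map() with a built-in callable.
as_ints = list(map(int, data))
print(as_ints)
# [1, 2, 3]

# The same result written as a generator expression.
same = list(int(x) for x in data)
print(same == as_ints)
# True

# With several sequences, the callable receives one item from each.
powers = list(map(pow, [2, 3, 4], [3, 2, 1]))
print(powers)
# [8, 9, 4]

# A map object is lazy and can be consumed only once.
lazy = map(int, data)
print(list(lazy), list(lazy))
# [1, 2, 3] []
```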

2.4.4.3. filter()

Listing 458. filter() syntax
filter(<callable>, <sequence>)

Listing 459. Show only even numbers
data = [1, 2, 3, 4, 5, 6]

list(filter(lambda x: x % 2 == 0, data))
# [2, 4, 6]

def is_even(x):
    return x % 2 == 0

filter(is_even, data)
# <filter object at 0x11d182990>

list(filter(is_even, data))
# [2, 4, 6]
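A sketch of two related details: a filter object is exhausted after one pass, and passing None as the callable keeps only truthy items. The list mixed is a made-up sample for this illustration.

```python
data = [1, 2, 3, 4, 5, 6]

# A filter object is lazy and can be consumed only once.
evens = filter(lambda x: x % 2 == 0, data)
first_pass = list(evens)
second_pass = list(evens)
print(first_pass, second_pass)
# [2, 4, 6] []

# The same filtering written as a comprehension.
print([x for x in data if x % 2 == 0])
# [2, 4, 6]

# With None as the callable, only truthy items are kept.
mixed = [0, 1, '', 'a', None, [], [0]]
truthy = list(filter(None, mixed))
print(truthy)
# [1, 'a', [0]]
```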

2.4.4.4. enumerate()

Listing 460. enumerate() syntax
enumerate(<sequence>)

header = ['a', 'b', 'c']

list(enumerate(header))
# [(0, 'a'), (1, 'b'), (2, 'c')]

dict(enumerate(header))
# {0: 'a', 1: 'b', 2: 'c'}
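enumerate() also accepts a start argument, and the index it yields can be used to read from a second sequence. A minimal sketch:

```python
header = ['a', 'b', 'c']
data = [1, 2, 3]

# Counting can begin at any number.
numbered = list(enumerate(header, start=1))
print(numbered)
# [(1, 'a'), (2, 'b'), (3, 'c')]

# Unpack both the index and the item in the loop header.
output = {}
for i, key in enumerate(header):
    output[key] = data[i]

print(output)
# {'a': 1, 'b': 2, 'c': 3}
```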

2.4.5. Generator as Iterator

a = (x for x in range(0,3))

next(a)
# 0

next(a)
# 1

next(a)
# 2

next(a)
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
# StopIteration
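A sketch of two ways to deal with an exhausted generator: passing a default value to next(), or catching StopIteration. The string 'empty' and the list messages are arbitrary choices for this illustration.

```python
a = (x for x in range(0, 2))

# A generator is its own iterator.
print(iter(a) is a)
# True

# With a default, next() does not raise when the generator is exhausted.
values = [next(a, 'empty'), next(a, 'empty'), next(a, 'empty')]
print(values)
# [0, 1, 'empty']

# Alternatively, catch StopIteration explicitly.
b = (x for x in range(0, 1))
next(b)

messages = []
try:
    next(b)
except StopIteration:
    messages.append('generator is exhausted')

print(messages)
# ['generator is exhausted']
```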

2.4.6. Assignments

2.4.6.1. Generators vs. Comprehensions - iris

  • Complexity level: medium

  • Lines of code to write: 40 lines

  • Estimated time of completion: 15 min

  • Solution: solution/generator_iris.py

English
  1. Download data/iris.csv and save as generator_iris.csv

  2. Read the data, skipping the header

  3. Create a function which returns all measurements for a given species

  4. The species will be passed to the function as a str argument

  5. Implement the solution using a normal function

  6. Implement the solution using a generator and the yield keyword

  7. Compare the results of both using sys.getsizeof()

  8. What will happen if the input data is bigger?

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content

Hint
import sys

# function_filter and generator_filter are the two implementations from steps 5 and 6
fun = function_filter('setosa')
gen = generator_filter('setosa')

print('Function', sys.getsizeof(fun))
print('Generator', sys.getsizeof(gen))

2.4.6.2. Generators vs. Comprehensions - passwd

English
  1. Download data/hosts.txt and save as hosts.txt

  2. Iterating over the lines, filter out comments, empty lines, etc.

  3. Extract system accounts (users whose UID, the third field, is less than 1000)

  4. Return a list of system account logins

  5. Implement the solution using a normal function

  6. Implement the solution using a generator and the yield keyword

  7. Compare the results of both using sys.getsizeof()

  8. What will happen if the input data is bigger?

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content