2.4. Generators

2.4.1. Generator expressions vs. Comprehensions

list(x for x in range(0,5))        # [0, 1, 2, 3, 4]
[x for x in range(0,5)]            # [0, 1, 2, 3, 4]

set(x for x in range(0,5))         # {0, 1, 2, 3, 4}
{x for x in range(0,5)}            # {0, 1, 2, 3, 4}

dict((x,x) for x in range(0,5))    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
{x: x for x in range(0,5)}         # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

tuple(x for x in range(0,5))       # (0, 1, 2, 3, 4)
(x for x in range(0,5))            # <generator object <genexpr> at 0x118c1aed0>

all(x for x in range(0,5))         # False
any(x for x in range(0,5))         # True
sum(x for x in range(0,5))         # 10

2.4.1.1. What is the difference?

  • Comprehensions are evaluated immediately

  • Generators are lazily evaluated

  • Declaring a generator only creates a generator object and binds it to a name (nothing is computed yet)

  • A comprehension's result stays in memory for as long as something references it

  • A generator is exhausted once it has been consumed and cannot be restarted

a = [x for x in range(0, 5)]

print(a)
# [0, 1, 2, 3, 4]

print(a)
# [0, 1, 2, 3, 4]

a = (x for x in range(0, 5))

print(a)
# <generator object <genexpr> at 0x111e7acd0>

print(list(a))
# [0, 1, 2, 3, 4]

print(list(a))
# []
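You can check the memory difference described above with sys.getsizeof(). Here is a minimal sketch. The exact byte counts depend on the Python version and platform, so only their relative size is meaningful. Note that sys.getsizeof() is shallow: for a list it counts the list object and its internal array of references, not the integers themselves.

```python
import sys

# A list comprehension builds all 100_000 elements up front
numbers_list = [x for x in range(0, 100_000)]

# A generator expression only stores its current state
numbers_gen = (x for x in range(0, 100_000))

list_size = sys.getsizeof(numbers_list)  # grows with the number of elements
gen_size = sys.getsizeof(numbers_gen)    # small, does not grow with the range length

print('List:', list_size)
print('Generator:', gen_size)
```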

2.4.2. Lazy evaluation

  • Code is not executed immediately

  • Sometimes code is not executed at all!

2.4.2.1. Declaring generators

Listing 2.57. This will not generate any numbers!
a = (x for x in range(0,5))
b = (x for x in range(0,5))
c = (x for x in range(0,5))
Listing 2.58. This only creates a generator object; it does not evaluate it!
a = (x for x in range(0,5))

print(a)
# <generator object <genexpr> at 0x11cb45950>
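The following sketch shows that nothing runs until a value is requested. The helper square() is hypothetical. It records every call in a list so you can see exactly when work happens.

```python
calls = []

def square(x):
    """Hypothetical helper that records each call before returning x squared."""
    calls.append(x)
    return x * x

gen = (square(x) for x in range(0, 5))
print(calls)  # [] (declaring the generator called square() zero times)

first = next(gen)
print(first)  # 0
print(calls)  # [0] (only the requested value was computed)
```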

2.4.2.2. Evaluating a generator immediately

  • Not very efficient

  • If you need values evaluated instantly, there is no point in using generators

a = (x for x in range(0,5))

list(a)
# [0, 1, 2, 3, 4]

2.4.2.3. Evaluating a generator iteratively

  • The generator computes the next number on every loop iteration

  • It does not keep the previous number

  • It does not know the next number until that number is requested

a = (x for x in range(0,5))

for i in a:
    print(i)
# 0
# 1
# 2
# 3
# 4
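A generator computes values one at a time, so it does not even need a known end. The sketch below uses itertools.count(), which counts up forever, together with itertools.islice(), which takes only the first few values.

```python
from itertools import count, islice

# count() never stops on its own; the generator squares each number on demand
squares = (x * x for x in count())

# islice() requests only the first five values, so the infinite sequence is never built
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]
```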

2.4.2.4. Halting and resuming iteration

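  • A loop can stop early with break; here the first loop stops after 0, 1, 2, 3

  • A list restarts from the first element in every loop, while a generator resumes where it stopped and forgets the values it has already produced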

Listing 2.59. A comprehension generates the whole sequence immediately, and every loop iterates over it from the beginning. The list stays in memory for as long as it is referenced
numbers = [x for x in range(0, 10)]

for x in numbers:
    print(x)
    if x == 3:
        break
# 0
# 1
# 2
# 3

for x in numbers:
    print(x)
    if x == 6:
        break
# 0
# 1
# 2
# 3
# 4
# 5
# 6

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

list(numbers)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Listing 2.60. A generator produces numbers as they are requested, so each loop resumes where the previous one stopped
numbers = (x for x in range(0, 10))

for x in numbers:
    print(x)
    if x == 3:
        break
# 0
# 1
# 2
# 3

for x in numbers:
    print(x)
    if x == 6:
        break
# 4
# 5
# 6

list(numbers)
# [7, 8, 9]

list(numbers)
# []
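You can get the same halt-and-resume behaviour without break by using itertools.islice(). It takes a fixed number of values from an iterator and leaves the rest untouched. A minimal sketch that mirrors Listing 2.60:

```python
from itertools import islice

numbers = (x for x in range(0, 10))

first_batch = list(islice(numbers, 4))   # [0, 1, 2, 3]
second_batch = list(islice(numbers, 3))  # [4, 5, 6] (resumes where the first batch stopped)
rest = list(numbers)                     # [7, 8, 9]
empty = list(numbers)                    # [] (the generator is exhausted)
```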

2.4.2.5. Which one is better?

  • Comprehensions: when you use the values more than once

  • Generators: when you use the values only once (for example, in a single loop)
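This rule of thumb matters when you need the same values twice, for example to compute both a total and a maximum. A minimal sketch:

```python
# A generator can be consumed only once
gen = (x for x in range(0, 5))
gen_total = sum(gen)                  # 10
gen_largest = max(gen, default=None)  # None (sum() already consumed every value)

# A list can be iterated as many times as needed
lst = [x for x in range(0, 5)]
lst_total = sum(lst)                  # 10
lst_largest = max(lst)                # 4
```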

2.4.3. yield Keyword

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    result = []
    for row in DATA:
        if row[4] == species:
            result.append(row)
    return result


data = get_species('setosa')

print(data)
# [(5.1, 3.5, 1.4, 0.2, 'setosa'),
#  (4.9, 3.0, 1.4, 0.2, 'setosa'),
#  (5.4, 3.9, 1.7, 0.4, 'setosa')]

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')

DATA = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

def get_species(species):
    for row in DATA:
        if row[4] == species:
            yield row

data = get_species('setosa')

print(data)
# <generator object get_species at 0x11af257c8>

for row in data:
    print(row)
# (5.1, 3.5, 1.4, 0.2, 'setosa')
# (4.9, 3.0, 1.4, 0.2, 'setosa')
# (5.4, 3.9, 1.7, 0.4, 'setosa')
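A function containing yield pauses at each yield and resumes from the same point on the next request. The hypothetical countdown() below appends to a log list so you can see the order of execution. A minimal sketch:

```python
log = []

def countdown(n):
    """Hypothetical generator that logs each step of its execution."""
    log.append('start')              # runs on the first next(), not when countdown() is called
    while n > 0:
        yield n                      # pause here and hand n to the caller
        log.append(f'resumed after {n}')
        n -= 1
    log.append('done')               # runs when a value is requested after the last one

gen = countdown(2)
log_after_call = list(log)           # [] (the body has not started yet)

first = next(gen)                    # 2
second = next(gen)                   # 1
rest = list(gen)                     # [] (no values left, but the body runs to the end)
print(log)
# ['start', 'resumed after 2', 'resumed after 1', 'done']
```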

2.4.4. Built-in generators

2.4.4.1. zip()

Listing 2.61. zip() syntax
zip(<sequence>, <sequence>, ...)
header = ['a', 'b', 'c']
data = [1, 2, 3]

zip(header, data)
# <zip object at 0x11cf54b90>

list(zip(header, data))
# [('a', 1), ('b', 2), ('c', 3)]

tuple(zip(header, data))
# (('a', 1), ('b', 2), ('c', 3))

dict(zip(header, data))
# {'a': 1, 'b': 2, 'c': 3}

header = ['a', 'b', 'c']
data = [1, 2, 3]
row = [77, 88, 99]

[(h, d, r) for h, d, r in zip(header, data, row)]
# [('a', 1, 77), ('b', 2, 88), ('c', 3, 99)]
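zip() stops as soon as the shortest input is exhausted and silently drops any leftover items. If the leftovers matter, itertools.zip_longest() pads the shorter inputs instead (with None by default). A minimal sketch:

```python
from itertools import zip_longest

header = ['a', 'b', 'c']
data = [1, 2]

short = list(zip(header, data))           # [('a', 1), ('b', 2)] ('c' is dropped)
padded = list(zip_longest(header, data))  # [('a', 1), ('b', 2), ('c', None)]
```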

2.4.4.2. map()

Listing 2.62. map() syntax
map(<callable>, <sequence>)
data = [1, 2, 3]

list(map(float, data))
# [1.0, 2.0, 3.0]

map(float, [1, 2, 3])
# <map object at 0x11d15a190>

list(map(float, [1, 2, 3]))
# [1.0, 2.0, 3.0]

tuple(map(float, [1, 2, 3]))
# (1.0, 2.0, 3.0)
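A map() call can also be written as a generator expression. When map() receives several iterables, the callable gets one item from each. A minimal sketch using the built-in pow():

```python
data = [1, 2, 3]

# map() and an equivalent generator expression give the same values
with_map = list(map(float, data))
with_genexpr = list(float(x) for x in data)

# With several iterables, the callable receives one item from each
powers = list(map(pow, [2, 3, 4], [3, 2, 1]))  # [8, 9, 4]
```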

2.4.4.3. filter()

Listing 2.63. filter() syntax
filter(<callable>, <sequence>)
Listing 2.64. Show only even numbers
data = [1, 2, 3, 4, 5, 6]

list(filter(lambda x: x % 2 == 0, data))
# [2, 4, 6]

data = [1, 2, 3, 4, 5, 6]

def is_even(x):
    return x % 2 == 0

filter(is_even, data)
# <filter object at 0x11d182990>

list(filter(is_even, data))
# [2, 4, 6]

data = [1, 2, 3, 4, 5, 6]

def is_even(x):
    if x % 2 == 0:
        return True
    else:
        return False

filter(is_even, data)
# <filter object at 0x11d182990>

list(filter(is_even, data))
# [2, 4, 6]
Listing 2.65. filter() example
DATA = [
    {'name': 'Jan Twardowski', 'age': 21},
    {'name': 'Mark Watney', 'age': 25},
    {'name': 'Melissa Lewis', 'age': 18},
]

def is_adult(person):
    if person['age'] >= 21:
        return True
    else:
        return False


result = filter(is_adult, DATA)
print(list(result))
# [
#   {'name': 'Jan Twardowski', 'age': 21},
#   {'name': 'Mark Watney', 'age': 25},
# ]
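If you pass None as the callable, filter() keeps only the truthy items. Any filter() call can also be written as a generator expression with an if clause. The sketch below repeats the DATA from Listing 2.65 so it is self-contained.

```python
DATA = [
    {'name': 'Jan Twardowski', 'age': 21},
    {'name': 'Mark Watney', 'age': 25},
    {'name': 'Melissa Lewis', 'age': 18},
]

# filter(None, ...) keeps only truthy items
truthy = list(filter(None, [0, 1, '', 'a', None, [], [2]]))  # [1, 'a', [2]]

# The same adult filter as a generator expression
adults = list(person for person in DATA if person['age'] >= 21)
names = [person['name'] for person in adults]  # ['Jan Twardowski', 'Mark Watney']
```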

2.4.4.4. enumerate()

Listing 2.66. enumerate() syntax
enumerate(<sequence>)
data = ['a', 'b', 'c']

list(enumerate(data))
# [(0, 'a'), (1, 'b'), (2, 'c')]

dict(enumerate(data))
# {0: 'a', 1: 'b', 2: 'c'}

dict((v, k) for k, v in enumerate(data))
# {'a': 0, 'b': 1, 'c': 2}

{v: k for k, v in enumerate(data, start=5)}
# {'a': 5, 'b': 6, 'c': 7}

header = ['a', 'b', 'c']
data = [1, 2, 3]
result = {}

for i, _ in enumerate(header):
    key = header[i]
    value = data[i]
    result[key] = value

print(result)
# {'a': 1, 'b': 2, 'c': 3}

header = ['a', 'b', 'c']
data = [1, 2, 3]
result = {}

# Use a separate loop variable so the `header` list is not overwritten
for i, key in enumerate(header):
    result[key] = data[i]

print(result)
# {'a': 1, 'b': 2, 'c': 3}
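enumerate() is also lazy, and start= is handy for human-friendly numbering such as line numbers. A minimal sketch with a hypothetical list of lines:

```python
lines = ['first', 'second', 'third']

# enumerate() returns a lazy iterator of (index, item) pairs
pairs = enumerate(lines, start=1)
first_pair = next(pairs)  # (1, 'first')

# The remaining pairs continue from where next() stopped
numbered = [f'{number}: {line}' for number, line in pairs]  # ['2: second', '3: third']
```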

2.4.5. Generator as Iterator

a = (x for x in range(0,3))

next(a)
# 0

next(a)
# 1

next(a)
# 2

next(a)
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
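# StopIteration

data = (x for x in range(0, 3))

for a in data:
    print(a)
# 0
# 1
# 2

# is analogous to (on a fresh generator, because the loop above exhausted `data`):
data = (x for x in range(0, 3))

try:
    i = iter(data)

    a = next(i)
    print(a)

    a = next(i)
    print(a)

    a = next(i)
    print(a)

    a = next(i)  # no values left: raises StopIteration, which ends the loop
    print(a)
except StopIteration:
    pass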
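Any object with __iter__() and __next__() methods follows the same protocol. The hypothetical CountUp class below is a hand-written iterator that behaves like (x for x in range(0, stop)). A generator gives you this machinery for free. A minimal sketch:

```python
class CountUp:
    """Hypothetical iterator class that behaves like (x for x in range(0, stop))."""

    def __init__(self, stop):
        self.current = 0
        self.stop = stop

    def __iter__(self):
        # An iterator returns itself from iter()
        return self

    def __next__(self):
        # Signal the end of iteration the same way a generator does
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


counter = CountUp(3)
values = list(counter)          # [0, 1, 2]
leftover = list(counter)        # [] (exhausted, just like a generator)

gen = (x for x in range(0, 3))
same_object = iter(gen) is gen  # True: a generator is its own iterator
```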

2.4.6. Assignments

2.4.6.1. Function Generators vs. Comprehensions

  1. Use the code from the "Input" section (see below)

  2. Download data/iris.csv and save it as iris.csv

  3. Read the data, skipping the header

  4. Create a function that returns all measurements for a given species

  5. The species will be passed to the function as a str argument

  6. Implement the solution using a regular function

  7. Implement the solution using a generator and the yield keyword

  8. Compare the sizes of both results using sys.getsizeof()

  9. What happens when the input data is bigger?

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content

Input
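import sys

# function_filter() and generator_filter() are the two functions you write in this assignment
with open('iris.csv') as file:
    data = file.read()

fun = function_filter(data, 'setosa')
gen = generator_filter(data, 'setosa')

print('Function', sys.getsizeof(fun))
print('Generator', sys.getsizeof(gen))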

2.4.6.2. Function Generators vs. Comprehensions - passwd

  1. Download data/hosts.txt and save it as hosts.txt

  2. Iterate over the lines, filtering out comments, empty lines, etc.

  3. Extract system accounts (users whose UID, the third field, is less than 1000)

  4. Return a list of system account logins

  5. Implement the solution using a regular function

  6. Implement the solution using a generator and the yield keyword

  7. Compare the sizes of both results using sys.getsizeof()

  8. What happens when the input data is bigger?

The whys and wherefores
  • Using generators

  • Unpacking lazily evaluated code

  • Comparing size of objects

  • Parsing CSV file

  • Filtering file content