2. Micro-benchmarking

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil” – Donald Knuth

2.1. Evaluation

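  • Start each run in a fresh Python process

  • Clear memory before each run

  • Use the same input data for every run

  • Keep the start conditions identical: CPU load, RAM usage, disk I/O (iostat)

  • Exclude Python interpreter startup time from the measurement

  • Verify that the code under test does what you intend to measure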
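Part of this checklist can be approximated from inside Python. The following is a minimal sketch, not a complete benchmarking setup: the statement and data are illustrative assumptions. It relies on two documented behaviours of the timeit module: setup code is excluded from the measured time, and garbage collection is turned off during timing by default.

```python
from timeit import repeat

# Setup runs once per repetition and is excluded from the timing,
# so every repetition starts from the same data.
setup = "data = list(range(1000))"
stmt = "total = sum(data)"  # illustrative statement, not from the text above

# Five independent repetitions of 10,000 executions each.
durations = repeat(stmt, setup, repeat=5, number=10_000)

# The minimum is the value least disturbed by other processes on the machine.
best = min(durations)
print(f"best of {len(durations)}: {best / 10_000 * 1e9:.1f} ns per loop")
```

Taking the minimum rather than the mean follows the recommendation in the timeit documentation: higher values usually reflect interference from other processes, not variation in Python's speed.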

2.2. timeit

2.2.1. Programmatic use

Code Listing 2.63. Timeit simple statement
from timeit import timeit


setup = 'name="Jose Jimenez"'
stmt = 'out = f"My name... {name}"'

duration = timeit(stmt, setup, number=10000)

print(duration)
# 0.0005737080000000061

Code Listing 2.64. Timeit multiple statements with setup code
from timeit import timeit


setup = """
first_name = 'José'
last_name = 'Jiménez'
"""

TEST = dict()
TEST[0] = 'name = f"{first_name} {last_name}"'
TEST[1] = 'name = "{0} {1}".format(first_name, last_name)'
TEST[2] = 'name = first_name + " " + last_name'
TEST[3] = 'name = " ".join([first_name, last_name])'


for stmt in TEST.values():
    duration = timeit(stmt, setup, number=10000)
    print(f'{duration:.5}\t{stmt}')

# 0.00071559    name = f"{first_name} {last_name}"
# 0.0026514     name = "{0} {1}".format(first_name, last_name)
# 0.001015      name = first_name + " " + last_name
# 0.0013494     name = " ".join([first_name, last_name])

Code Listing 2.65. Timeit with globals()
from timeit import timeit


def factorial(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)


duration = timeit(
    stmt='factorial(500); factorial(400); factorial(450)',
    globals=globals(),
    number=10000,
)

duration = round(duration, 6)

print(f'factorial time: {duration} seconds')
# factorial time: 2.845382 seconds
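Besides statement strings, timeit also accepts a zero-argument callable, and Timer.autorange() picks the number of loops automatically (it keeps increasing the count until the total time reaches at least 0.2 seconds). The sketch below assumes Python 3.6 or newer; the argument 100 is an arbitrary choice for illustration.

```python
from timeit import Timer


def factorial(n: int) -> int:
    return 1 if n == 0 else n * factorial(n - 1)


# A callable needs no globals= argument: it already carries its own scope.
timer = Timer(lambda: factorial(100))

# autorange() returns (number_of_loops, total_time_in_seconds).
loops, total = timer.autorange()
print(f"{loops} loops, {total / loops * 1e6:.2f} usec per loop")
```

Note that timing a callable includes the overhead of one extra function call per loop, so results are not directly comparable with timings of statement strings.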

2.2.2. Console use

Code Listing 2.66. Timeit
python3 -m timeit -n100000 -r100 --setup='name="Jose Jimenez"' 'out = f"My name... {name}"'
# 100000 loops, best of 100: 55.9 nsec per loop

python3 -m timeit -n100000 -r100 --setup='name="Jose Jimenez"' 'out = "My name... {name}".format(name=name)'
# 100000 loops, best of 100: 327 nsec per loop

python3 -m timeit -n100000 -r100 --setup='name="Jose Jimenez"' 'out = "My name... %s" % name'
# 100000 loops, best of 100: 124 nsec per loop

Command-line options:

-n N, --number=N
how many times to execute ‘statement’

-r N, --repeat=N
how many times to repeat the timer (default 5)

-s S, --setup=S
statement to be executed once initially (default pass)

-p, --process
measure process time, not wallclock time, using time.process_time() instead of time.perf_counter(), which is the default

-u, --unit=U
specify a time unit for timer output; can select nsec, usec, msec, or sec

-v, --verbose
print raw timing results; repeat for more digits precision

-h, --help
print a short usage message and exit
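The -p option has a programmatic counterpart: the timer argument of timeit(). A minimal sketch, reusing the statement from the console examples, that measures the same code with both clocks:

```python
import time
from timeit import timeit

setup = 'name = "Jose Jimenez"'
stmt = 'out = f"My name... {name}"'

# Default timer: time.perf_counter (wall-clock time).
wall = timeit(stmt, setup, number=100_000)

# Equivalent of -p: CPU time of the current process only.
cpu = timeit(stmt, setup, timer=time.process_time, number=100_000)

print(f"wall clock: {wall:.6f} s")
print(f"process:    {cpu:.6f} s")
```

Process time excludes time spent sleeping or waiting for other processes, so the two values can differ noticeably on a loaded machine.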

2.3. Use cases

2.3.1. Setup

DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]

2.3.2. Statements

  • Runtime measured with the Jupyter %%timeit cell magic

Code Listing 2.67. Code 1
%%timeit

fieldnames = set()

for row in DATA:
    fieldnames.update(row.keys())

# 1.53 µs ± 8.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Code Listing 2.68. Code 2
%%timeit

fieldnames = set(key for record in DATA for key in record.keys())

# 2.03 µs ± 49.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Code Listing 2.69. Code 3 (Is it correct?!)
%%timeit

fieldnames = set()
fieldnames.add(key
    for record in DATA
       for key in record.keys())

# 431 ns ± 5.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Code Listing 2.70. Code 4
%%timeit

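fieldnames = set()
# Caution: update() receives an iterable of tuples, so each whole tuple of
# keys becomes one set element; the individual keys are not added.
fieldnames.update(tuple(x.keys()) for x in DATA)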

# 2.11 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Code Listing 2.71. Code 5
%%timeit

fieldnames = list()

for row in DATA:
    for key in row.keys():
        fieldnames.append(key)

fieldnames = set(fieldnames)

# 2.43 µs ± 63.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Code Listing 2.72. Code 6
%%timeit

fieldnames = set()

for row in DATA:
    for key in row.keys():
        fieldnames.add(key)

Code Listing 2.73. Code 7
%%timeit

fieldnames = list()

for row in DATA:
    for key in row.keys():
        if key not in fieldnames:
            fieldnames.append(key)
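No timings are given for Code 6 and Code 7. Outside Jupyter, they can be measured with the timeit module directly. The sketch below wraps both snippets in functions; the function names are hypothetical and chosen only for this illustration. No expected output is shown because timings depend on the machine.

```python
from timeit import timeit

DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]


def with_set_add():
    """Code 6: add every key to a set."""
    fieldnames = set()
    for row in DATA:
        for key in row.keys():
            fieldnames.add(key)
    return fieldnames


def with_list_membership():
    """Code 7: append a key to a list only if it is not there yet."""
    fieldnames = list()
    for row in DATA:
        for key in row.keys():
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames


NUMBER = 100_000

for func in (with_set_add, with_list_membership):
    duration = timeit(func, number=NUMBER)
    print(f'{duration / NUMBER * 1e6:.3f} usec per loop\t{func.__name__}')
```

The two variants are not interchangeable: Code 7 returns a list that preserves the order in which keys first appear, while Code 6 returns an unordered set.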

2.4. Summary

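  • Code 3 adds the generator object itself to the set, not the keys it would yield. The generator is never iterated, which is why the snippet looks so fast, and the resulting set is wrong.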
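A timing says nothing about correctness, so it is worth comparing the results before comparing the speed. The following sketch rebuilds the results of Code 1, Code 3 and Code 4 with the DATA from the setup section and inspects what each set actually contains. The variable names are chosen for this illustration.

```python
from types import GeneratorType

DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]

expected = {'Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'}

# Code 1: update() adds every key of every row.
code_1 = set()
for row in DATA:
    code_1.update(row.keys())

# Code 3: add() stores the generator object; it is never iterated.
code_3 = set()
code_3.add(key for record in DATA for key in record.keys())

# Code 4: update() iterates over the generator, whose items are tuples.
code_4 = set()
code_4.update(tuple(x.keys()) for x in DATA)

print(code_1 == expected)                               # True
print(len(code_3))                                      # 1
print(isinstance(next(iter(code_3)), GeneratorType))    # True
print(all(isinstance(item, tuple) for item in code_4))  # True
```

Code 4 is affected by a similar problem: it collects whole tuples of keys instead of individual field names, so its timing is not comparable with Code 1 or Code 2 either.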