1. Matplotlib

1.1. Glossary

agg
cairo
eps
pdf
png
ps
svg
raster graphics
vector graphics

1.2. What is matplotlib

Pyplot's state-machine environment behaves similarly to MATLAB and should be most familiar to users with MATLAB experience.

1.3. Installing and using matplotlib

$ pip install matplotlib
import matplotlib.pyplot as plt

1.3.1. Embedding matplotlib charts in Jupyter

  • %matplotlib inline

1.3.2. Running matplotlib in PyCharm

  • Scientific Mode

1.3.3. Running matplotlib in standalone scripts

  • Scale

  • Export to image

  • Reposition

  • Other options

x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

plt.show()

x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

plt.savefig('my_file.png')

1.3.5. pandas and matplotlib

  • All plotting functions expect np.array or np.ma.masked_array as input

  • Classes that are 'array-like' such as pandas data objects and np.matrix may or may not work as intended

  • It is best to convert these to np.array objects prior to plotting

  • Convert a pandas.DataFrame:

    a = pandas.DataFrame(np.random.rand(4,5), columns = list('abcde'))
    a_asndarray = a.values
    
  • Convert a np.matrix:

    b = np.matrix([[1,2],[3,4]])
    b_asarray = np.asarray(b)
    

1.3.6. Opening files

  • with open('filename.csv') - context manager

  • numpy.loadtxt('filename.csv', delimiter=',', unpack=True)

  • csv.DictReader()
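
The options listed above can be compared on a tiny data set. The following is a minimal sketch, not a definitive recipe: the CSV content and the column names ``x`` and ``y`` are made up for illustration, and ``io.StringIO`` stands in for a real ``open('filename.csv')`` call.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content; io.StringIO stands in for open('filename.csv')
CSV_TEXT = "x,y\n1,4\n2,5\n3,6\n"

# csv.DictReader yields one dict per data row, keyed by the header line
with io.StringIO(CSV_TEXT) as file:
    rows = list(csv.DictReader(file))

x_from_csv = [float(row['x']) for row in rows]
y_from_csv = [float(row['y']) for row in rows]

# numpy.loadtxt with unpack=True returns one array per column;
# skiprows=1 skips the header line
with io.StringIO(CSV_TEXT) as file:
    x, y = np.loadtxt(file, delimiter=',', skiprows=1, unpack=True)

print(x_from_csv, y_from_csv)
print(x, y)
```

Both approaches produce sequences that can be passed directly to ``plt.plot(x, y)``.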

import pandas as pd

url = 'https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv'
columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
species = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}

data = pd.read_csv(url, skiprows=1, names=columns)

# Replace numeric Species codes with species names
data['Species'] = data['Species'].replace(species)

# Shuffle rows and reset the index
data = data.sample(frac=1).reset_index(drop=True)
#      Sepal length  Sepal width     ...      Petal width     Species
# 0             5.0          2.0     ...              1.0  versicolor
# 1             6.4          2.7     ...              1.9   virginica
# 2             5.6          3.0     ...              1.5  versicolor
# 3             5.7          2.6     ...              1.0  versicolor
# 4             6.4          3.1     ...              1.8   virginica
# 5             4.6          3.6     ...              0.2      setosa
# 6             5.9          3.0     ...              1.5  versicolor

1.3.7. Backends

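========  ==============  ==========================================================================
Renderer  Filetypes       Description
========  ==============  ==========================================================================
AGG       png             raster graphics -- high quality images using the Anti-Grain Geometry engine
PS        ps eps          vector graphics -- Postscript output
PDF       pdf             vector graphics -- Portable Document Format
SVG       svg             vector graphics -- Scalable Vector Graphics
Cairo     png ps pdf svg  raster graphics and vector graphics -- using the Cairo graphics library
========  ==============  ==========================================================================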
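
As a rough illustration of the renderers above, the sketch below selects the non-interactive AGG backend and writes one figure in a raster format (png) and two vector formats (svg, pdf). The in-memory buffers are an assumption made to keep the example free of side effects; a file name such as ``'my_file.svg'`` would work the same way.

```python
import io

import matplotlib
matplotlib.use('Agg')  # choose a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# Save the same figure in several formats; matplotlib picks the
# matching renderer (AGG for png, SVG for svg, PDF for pdf)
outputs = {}
for fmt in ('png', 'svg', 'pdf'):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    outputs[fmt] = buffer.getvalue()

plt.close(fig)
print({fmt: len(data) for fmt, data in outputs.items()})
```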

1.4. How to understand charts?

1.4.1. Figure anatomy

../_images/matplotlib-figure-anatomy.png

Figure 71. Figure Anatomy

1.4.2. Axes

  • A given figure can contain many Axes, but a given Axes object can only be in one Figure

  • Data limits can be controlled via set_xlim() and set_ylim() methods

  • Each Axes has a title (set via set_title()), an x-label (set via set_xlabel()), and a y-label (set via set_ylabel())
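
The points above can be tried out with a short sketch. It assumes a figure with two stacked Axes; the names ``ax_top`` and ``ax_bottom`` and the sample data are illustrative only.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

# One Figure holding two Axes, stacked vertically
fig, (ax_top, ax_bottom) = plt.subplots(2, 1)

ax_top.plot([1, 2, 3], [4, 5, 6])
ax_top.set_title('Top axes')
ax_top.set_xlabel('x')
ax_top.set_ylabel('y')

# Data limits are set per Axes
ax_top.set_xlim(0, 4)
ax_top.set_ylim(3, 7)

ax_bottom.plot([1, 2, 3], [6, 5, 4])
ax_bottom.set_title('Bottom axes')

print(len(fig.axes), ax_top.get_xlim(), ax_top.get_title())
```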

1.4.3. Axis

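  • An Axis is the number-line-like object that sets the plot limits and generates the ticks and tick labels

  • Tick positions can be restricted to integer values

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator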

x = np.linspace(0, 2, 100)

ax = plt.figure().gca()  # ``gca`` - get current axes

ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')

ax.xaxis.set_major_locator(MaxNLocator(integer=True))

1.4.4. Artist

  • Everything you can see on the figure is an artist (even the Figure, Axes, and Axis objects)

  • This includes Text objects, Line2D objects, collection objects, Patch objects, etc

  • Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another
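
One way to check these statements is to inspect the objects that a simple plot creates. This is a minimal sketch with made-up data; it only relies on the ``Artist`` base class and the ``axes`` attribute of a line.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D
from matplotlib.text import Text

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [4, 5, 6])   # ax.plot returns a list of Line2D
title = ax.set_title('Artists')         # set_title returns a Text object

# Figure, Axes, Axis, Line2D and Text all derive from Artist
artists = [fig, ax, ax.xaxis, line, title]
for artist in artists:
    print(type(artist).__name__, isinstance(artist, Artist))

# The line is tied to the Axes it was plotted into
print(line.axes is ax)
```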

1.5. Simple examples

1.5.1. Power functions

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.title('Power functions')
plt.xlabel('x')
plt.ylabel('y')

plt.legend()
plt.show()
../_images/matplotlib-exponentials.png

Figure 72. Power functions

1.5.2. Sine wave

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()
../_images/matplotlib-sin-wave.png

Figure 73. Sine wave

1.5.3. Multiple lines on one chart

import numpy as np
import matplotlib.pyplot as plt

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()
../_images/matplotlib-multiple.png

Figure 74. Multiple lines on one chart

1.6. Labels and Legend

1.6.1. Axis naming

x = [1,2,3]
y = [4,5,6]

plt.xlabel('X axis')
plt.ylabel('Y axis')

plt.plot(x, y)
plt.show()

1.6.2. Title

x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart')

plt.plot(x, y)
plt.show()

x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart\nSecond line')

plt.plot(x, y)
plt.show()

1.6.3. Legend

  • Good practice: always have labels

x1 = [1,2,3]
y1 = [4,5,6]

x2 = [1,2,3]
y2 = [10,11,12]

plt.plot(x1, y1, label='first line')
plt.plot(x2, y2, label='second line')

plt.legend()
plt.show()

1.6.4. Colors

  • first color name letter (not recommended):

    • r - red

    • g - green

    • b - blue

    • c - cyan

    • m - magenta

    • y - yellow

    • k - black

    • w - white

  • color names (X11/CSS4):

  • hexadecimal code (RGB or RGBA):

    • #FF0000 - red

    • #00FF00 - green

    • #0000FF - blue

    • #FF000033 - semi-transparent red

  • tuple (RGB or RGBA):

    • (0.1, 0.2, 0.5)

    • (0.1, 0.2, 0.5, 0.3)

plt.bar(x1, y1, label='Bars 1', color='blue')
plt.bar(x2, y2, label='Bars 2', color='red')
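
To see that the notations above describe the same colors, they can be converted to RGBA tuples with ``matplotlib.colors.to_rgba``. This is a small sketch; the chosen color specifications are just examples.

```python
from matplotlib.colors import to_rgba

# Four ways of writing opaque red: letter, name, hex code, RGB tuple
specs = ['r', 'red', '#FF0000', (1.0, 0.0, 0.0)]
rgba_values = [to_rgba(spec) for spec in specs]

# Hex code with an alpha channel: 0x33 / 255 = 0.2
semi_transparent_red = to_rgba('#FF000033')

# The single letter 'k' stands for black
black = to_rgba('k')

for spec, rgba in zip(specs, rgba_values):
    print(spec, '->', rgba)
print(semi_transparent_red, black)
```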

1.6.5. Line styles

../_images/matplotlib-line-style.png

Figure 75. Line styles

plt.plot(x, y, color="red", linestyle='--')

1.6.6. fmt parameters

=========  =====================
Character  Description
=========  =====================
-          solid line style
--         dashed line style
-.         dash-dot line style
:          dotted line style
.          point marker
,          pixel marker
o          circle marker
v          triangle_down marker
^          triangle_up marker
<          triangle_left marker
>          triangle_right marker
1          tri_down marker
2          tri_up marker
3          tri_left marker
4          tri_right marker
s          square marker
p          pentagon marker
*          star marker
h          hexagon1 marker
H          hexagon2 marker
+          plus marker
x          x marker
D          diamond marker
d          thin_diamond marker
|          vline marker
_          hline marker
=========  =====================
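
A format string combines a color letter with the characters from the table. The sketch below, with made-up data, shows how two such strings are interpreted and that the keyword form gives the same line style.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1, 2, 3]
y = [4, 5, 6]

# 'r--' = red, dashed line, no marker
dashed, = ax.plot(x, y, 'r--')

# 'g^' = green triangle_up markers, no connecting line
triangles, = ax.plot(x, y, 'g^')

# The same dashed red line written with explicit keyword arguments
explicit, = ax.plot(x, y, color='r', linestyle='--')

for line in (dashed, triangles, explicit):
    print(line.get_color(), line.get_linestyle(), line.get_marker())
```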

1.6.7. Line2D parameters

======================  ==================================================
Property                Value Type
======================  ==================================================
alpha                   float
animated                [True | False]
antialiased or aa       [True | False]
clip_box                a matplotlib.transforms.Bbox instance
clip_on                 [True | False]
clip_path               a Path instance and a Transform instance, a Patch
color or c              any matplotlib color
contains                the hit testing function
dash_capstyle           ['butt' | 'round' | 'projecting']
dash_joinstyle          ['miter' | 'round' | 'bevel']
dashes                  sequence of on/off ink in points
data                    (np.array xdata, np.array ydata)
figure                  a matplotlib.figure.Figure instance
label                   any string
linestyle or ls         [ '-' | '--' | '-.' | ':' | 'steps' | ...]
linewidth or lw         float value in points
lod                     [True | False]
marker                  [ '+' | ',' | '.' | '1' | '2' | '3' | '4' ]
markeredgecolor or mec  any matplotlib color
markeredgewidth or mew  float value in points
markerfacecolor or mfc  any matplotlib color
markersize or ms        float
markevery               [ None | integer | (startind, stride) ]
picker                  used in interactive line selection
pickradius              the line pick selection radius
solid_capstyle          ['butt' | 'round' | 'projecting']
solid_joinstyle         ['miter' | 'round' | 'bevel']
transform               a matplotlib.transforms.Transform instance
visible                 [True | False]
xdata                   np.array
ydata                   np.array
zorder                  any number
======================  ==================================================
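
The properties in the table can be set in three ways: as keyword arguments to ``plot``, through ``set_*`` methods on the line, or with ``plt.setp``. The following sketch uses a handful of them; the data and the chosen values are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# 1. Keyword arguments at plot time
line, = ax.plot([1, 2, 3], [4, 5, 6], label='data', linewidth=2, marker='o')

# 2. Setter methods on the Line2D object
line.set_linestyle('--')
line.set_alpha(0.5)

# 3. plt.setp sets several properties in one call
plt.setp(line, color='red', markersize=8)

print(line.get_linewidth(), line.get_linestyle(), line.get_alpha(),
      line.get_color(), line.get_markersize(), line.get_label())
```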

1.7. Basic customizations

  • pyplot creates a Figure object implicitly on the first plotting call

  • assign the Figure to a variable explicitly when you need to customize it

fig = plt.figure()

1.7.1. Size

Local:

plt.figure(figsize=(3,4))

Global:

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20,10)

import matplotlib

matplotlib.rc('figure', figsize=(20,10))
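
The difference between the local and the global setting can be checked with ``Figure.get_size_inches()``. In this sketch ``plt.rc_context`` is used instead of assigning to ``plt.rcParams`` directly; that is a choice made here so the changed default is undone automatically, not something the text above requires.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

# Local: the size applies to this figure only
fig_local = plt.figure(figsize=(3, 4))

# Global: every figure created while the setting is active uses it;
# rc_context restores the previous defaults on exit
with plt.rc_context({'figure.figsize': (20, 10)}):
    fig_global = plt.figure()

# Back to the previous default size
fig_default = plt.figure()

for fig in (fig_local, fig_global, fig_default):
    print(fig.get_size_inches())
```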

1.7.2. Font

  • 'serif'

  • 'sans-serif'

  • 'cursive'

  • 'fantasy'

  • 'monospace'

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc('font', family='serif', weight='bold', size=8)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

plt.plot(x, y)
plt.grid(True)
plt.show()

1.7.3. Subplots

fig = plt.figure()

ax1 = plt.subplot2grid(shape=(1,1), loc=(0,0)) # ``loc`` = location of the axes within the grid

plt.subplots_adjust(left=0.09, bottom=0.16)  # set margins; ``left`` must be smaller than ``right`` (default 0.9)
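
With a grid larger than 1x1, ``subplot2grid`` can place axes that span several cells. The layout below (one wide axes above two smaller ones) is an assumed example, not a layout taken from the text.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure()

# 2x2 grid: the first axes spans both columns of the top row
ax_wide = plt.subplot2grid(shape=(2, 2), loc=(0, 0), colspan=2)
ax_left = plt.subplot2grid(shape=(2, 2), loc=(1, 0))
ax_right = plt.subplot2grid(shape=(2, 2), loc=(1, 1))

ax_wide.plot([1, 2, 3], [4, 5, 6])
ax_left.plot([1, 2, 3], [6, 5, 4])
ax_right.plot([1, 2, 3], [5, 5, 5])

# Leave some vertical room between the rows
plt.subplots_adjust(hspace=0.4)

print(len(fig.axes))
```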

1.8. Additional info

1.8.1. Label rotation

Listing 494. Label rotation
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 6]
labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']

plt.plot(x, y, 'ro')

# Use the names from ``labels`` as tick labels; the rotation can be given in degrees or with keywords.
plt.xticks(x, labels, rotation=45)

# Pad margins so that markers don't get clipped by the axes
plt.margins(0.2)

# Tweak spacing to prevent clipping of tick-labels
plt.subplots_adjust(bottom=0.15)
plt.show()
../_images/matplotlib-tick-rotation.png

Figure 76. Label rotation

1.8.2. Grid

Listing 495. Grid Simple
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
plt.plot(x, y)


plt.grid(True)
plt.show()
../_images/matplotlib-grid-simple.png

Figure 77. Grid Simple

Listing 496. Grid Extra
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Major ticks every 20, minor ticks every 5
major_ticks = np.arange(0, 101, 20)
minor_ticks = np.arange(0, 101, 5)

ax.set_xticks(major_ticks)
ax.set_xticks(minor_ticks, minor=True)
ax.set_yticks(major_ticks)
ax.set_yticks(minor_ticks, minor=True)

# And a corresponding grid
ax.grid(which='both')

# Or if you want different settings for the grids:
ax.grid(which='minor', alpha=0.2)
ax.grid(which='major', alpha=0.5)

plt.show()
../_images/matplotlib-grid-extra.png

Figure 78. Grid Extra

1.8.3. Trend line

Listing 497. Trend line
import matplotlib.pyplot as plt
import numpy as np

x = [1, 3, 5, 7, 9]
y = [2, 3, 4, 3, 4]

# plot the data itself
plt.plot(x, y, label="data")

# calculate the trend line (a simple linear fit)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)

plt.plot(x, p(x), color="red", linestyle='--', label="trend")

# the line equation:
a = z[0]
b = z[1]
print(f"y = {a:.6}x + ({b:.6})")

# a parabolic fit would be:
# z = np.polyfit(x, y, 2)

plt.legend()
plt.show()
../_images/matplotlib-trendline.png

Figure 79. Trend line

1.8.4. Error bars

Listing 498. Error bars
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
e = [0.5, 1., 1.5, 2.]

plt.errorbar(x, y, yerr=e, fmt='o')
plt.show()
../_images/matplotlib-plt-errorbar.png

Figure 80. Error bars

1.8.5. Colorbar

Listing 499. Colorbar
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris


iris = load_iris()

# The indices of the features that we are plotting
x_index = 0
y_index = 1

# this formatter will label the ``colorbar`` with the correct target names
formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])

plt.figure(figsize=(5, 4))
plt.scatter(iris.data[:, x_index], iris.data[:, y_index], c=iris.target)
plt.colorbar(ticks=[0, 1, 2], format=formatter)

plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])

plt.tight_layout()
plt.show()
../_images/matplotlib-colorbar.png

Figure 81. Colorbar

1.8.6. Changing colors

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.spines['bottom'].set_color('#dddddd')
ax.spines['top'].set_color('#dddddd')
ax.spines['right'].set_color('red')
ax.spines['left'].set_color('red')
ax.tick_params(axis='x', colors='red')
ax.tick_params(axis='y', colors='red')
ax.yaxis.label.set_color('red')
ax.xaxis.label.set_color('red')
ax.title.set_color('red')

1.9. Working with multiple figures and axes

import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)
plt.subplot(211)
plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
plt.show()
../_images/matplotlib-plt-subplot.png

Figure 82. Working with multiple figures and axes

1.10. Working with text

import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data, normalized to a probability density
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)


plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
../_images/matplotlib-plt-hist-text.png

Figure 83. Working with text

1.10.1. Using mathematical expressions in text

plt.title(r'$\sigma_i=15$')

1.10.2. Annotating text

import numpy as np
import matplotlib.pyplot as plt

ax = plt.subplot(111)

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)
line, = plt.plot(t, s, lw=2)

plt.annotate('local max', xy=(2, 1), xytext=(3, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05),
            )

plt.ylim(-2,2)
plt.show()
../_images/matplotlib-plt-annotate.png

Figure 84. Annotating text

1.11. Logarithmic and other nonlinear axes

plt.xscale('log')

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter  # useful for `logit` scale


# Fixing random state for reproducibility
np.random.seed(19680801)

# make up some data in the interval ]0, 1[
y = np.random.normal(loc=0.5, scale=0.4, size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

# plot with various axes scales
plt.figure(1)

# linear
plt.subplot(221)
plt.plot(x, y)
plt.yscale('linear')
plt.title('linear')
plt.grid(True)


# log
plt.subplot(222)
plt.plot(x, y)
plt.yscale('log')
plt.title('log')
plt.grid(True)


# symmetric log
plt.subplot(223)
plt.plot(x, y - y.mean())
plt.yscale('symlog', linthresh=0.01)
plt.title('symlog')
plt.grid(True)

# logit
plt.subplot(224)
plt.plot(x, y)
plt.yscale('logit')
plt.title('logit')
plt.grid(True)
# Format the minor tick labels of the y-axis into empty strings with
# `NullFormatter`, to avoid cluttering the axis with too many labels.
plt.gca().yaxis.set_minor_formatter(NullFormatter())
# Adjust the subplot layout, because the logit one may take more space
# than usual, due to y-tick labels like "1 - 10^{-3}"
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25,
                    wspace=0.35)

plt.show()
../_images/matplotlib-plt-scale.png

Figure 85. Logarithmic and other nonlinear axes

1.12. plt.plot() vs ax.plot()

fig = plt.figure()
plt.plot(data)
fig.show()

  1. plt.plot() takes the current figure and axes (creating new ones if none exist) and plots into them:

    line = plt.plot(data)

  2. Stating the axes explicitly gives the same result as above:

    ax = plt.axes()
    line = ax.plot(data)

  3. Using ax.plot(...) is required when you want to plot into multiple axes (possibly in one figure), for example when using subplots. The code below explicitly creates a new figure, so nothing is added to a previous one, and explicitly creates a new axes with the given rectangle; the rest is the same as in 2:

    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    line = ax.plot(data)

    A possible problem with figure.add_axes is that it may add a new axes object to the figure that overlays the first one (or others). This happens if the requested size does not match the existing ones.
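
The two styles can be put side by side in a short sketch. The ``data`` list is hypothetical sample data, and ``plt.subplots`` is used here as one common way to obtain several axes in one figure.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt

data = [1, 4, 9, 16]  # hypothetical sample data

# Implicit style: pyplot plots into the current axes of the current figure
plt.figure()
implicit_line, = plt.plot(data)
implicit_axes = plt.gca()  # the axes that plt.plot just used

# Explicit style: each call names the axes it draws into,
# which is what makes plotting into several axes unambiguous
fig, axes = plt.subplots(1, 2)
left_line, = axes[0].plot(data)
right_line, = axes[1].plot(data[::-1])

print(implicit_line.axes is implicit_axes)
print(left_line.axes is axes[0], right_line.axes is axes[1])
```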

1.13. Assignment

1.13.1. Trigonometry

  1. For x in the range from 0.0 to 1.0, sampled every 0.01, plot the sin and cos functions of 2 * np.pi * x

  2. Create two separate figures:

    • Each waveform on its own subplot

    • Both waveforms on a single plot

  3. The subplots must be stacked one above the other

  4. Title each plot with the name of its trigonometric function

  5. Set the y-axis label text to "Function value"

  6. Color the x tick labels of the sin plot red

  7. Color the label of the cos plot green

  8. Show a grid on both plots

  9. Draw the second figure with both functions overlaid on a single plot

Hint
  • np.sin()

  • np.cos()

1.13.2. Iris scatter

  1. Download the data from the URL given above

  2. For each species:

  3. Visualize the relationship between sepal_length and sepal_width as a scatter plot using matplotlib

  4. Each species should have a different color

Hint
  • pd.DataFrame.groupby()

1.13.3. Random points

  • Complexity level: medium

  • Lines of code to write: 15 lines

  • Estimated time of completion: 20 min

  • Filename: solution/random_points.py

  1. Generate 100 random points:

    • Gaussian distribution with mean 0

    • standard deviation equal to 0.2

  2. The points must be drawn around two chosen points (A = (0, 1), B = (2, 4)).

  3. The function must pass the doctest

def random_point(center, std: float = 0.2):
    """
    >>> random.seed(1); random_point((0,0), std=0.2)
    (0.2576369506310926, 0.2898891217399542)

    >>> random.seed(1); random_point((0,0))
    (0.2576369506310926, 0.2898891217399542)

    >>> random.seed(1); random_point((2,5), std=10)
    (14.881847531554628, 19.494456086997708)

    >>> random.seed(1); random_point((2,5), std=(0.1, 12))
    (2.1288184753155464, 22.393347304397253)
    """
    pass
  1. Plot these points on a chart (you can use plt.axis('equal') so that both axes use the same scale).

  2. Draw point A and the points generated around it in red

  3. Draw point B and the points generated around it in blue

  4. For this purpose you can write a function plot_point(point, color) that takes a point (a two-element tuple or list whose first element is the x coordinate and second the y coordinate) and a color, and adds this point to the currently active figure.

  5. Using the function written in the exercise above, compute the distance from each point to points A and B

  6. Classify the points based on this distance

    • if a point is closer to point A, it belongs to set A

    • if it is closer to point B, it belongs to set B

  7. Draw a new chart in which:

    • points from set A are drawn in red,

    • points from set B are drawn in blue.

  8. Are the two charts the same?

  9. What happens if we increase the standard deviation used when generating the points?

  10. Or if we move points A and B closer together?

Hints
  • the color='red' argument of plt.plot