1. Matplotlib

1.1. Glossary

agg
cairo
eps
pdf
png
ps
svg
raster graphics
vector graphics

1.2. What is matplotlib

Pyplot’s state-machine environment behaves similarly to MATLAB and should be most familiar to users with MATLAB experience.

1.3. Installing and using matplotlib

$ pip install matplotlib
import matplotlib.pyplot as plt

1.3.1. Embedding matplotlib charts in Jupyter

  • %matplotlib inline

1.3.2. Running matplotlib in PyCharm

  • Scientific Mode

1.3.3. Running matplotlib in standalone scripts

  • Scale

  • Export to image

  • Reposition

  • Other options

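import matplotlib.pyplot as plt

x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

# Show the chart in an interactive window
plt.show()

import matplotlib.pyplot as plt

x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

# Write the chart to an image file instead of displaying it
plt.savefig('my_file.png')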

1.3.5. pandas and matplotlib

  • All plotting functions expect np.array or np.ma.masked_array as input

  • Classes that are ‘array-like’ such as pandas data objects and np.matrix may or may not work as intended

  • It is best to convert these to np.array objects prior to plotting

  • Convert a pandas.DataFrame:

    a = pandas.DataFrame(np.random.rand(4,5), columns = list('abcde'))
    a_asndarray = a.values
    
  • Convert a np.matrix:

    b = np.matrix([[1,2],[3,4]])
    b_asarray = np.asarray(b)
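
The conversion advice above can be sketched end to end as follows. This is a minimal example, assuming pandas and NumPy are installed; the data frame, its column names, and the choice of the non-interactive Agg backend are illustrative assumptions rather than part of the original notes.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data frame; the column names are illustrative only
frame = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Convert each column to a plain np.ndarray before plotting
x = frame['a'].to_numpy()
y = frame['b'].to_numpy()

fig, ax = plt.subplots()
ax.plot(x, y)
```

Converting explicitly keeps the plotting call independent of how a particular pandas version exposes its array interface.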
    

1.3.6. Opening files

  • with open('filename.csv') - context manager

  • numpy.loadtxt('filename.csv', delimiter=',', unpack=True)

  • csv.DictReader()

import pandas as pd

url = 'https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv'
columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
species = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}

data = pd.read_csv(url, skiprows=1, names=columns)

# Replace the numeric Species codes with species names
data['Species'] = data['Species'].replace(species)

# Shuffle the rows and reset the index
data = data.sample(frac=1).reset_index(drop=True)
#      Sepal length  Sepal width     ...      Petal width     Species
# 0             5.0          2.0     ...              1.0  versicolor
# 1             6.4          2.7     ...              1.9   virginica
# 2             5.6          3.0     ...              1.5  versicolor
# 3             5.7          2.6     ...              1.0  versicolor
# 4             6.4          3.1     ...              1.8   virginica
# 5             4.6          3.6     ...              0.2      setosa
# 6             5.9          3.0     ...              1.5  versicolor
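
The first options in the list above (a file object, csv.DictReader() and numpy.loadtxt()) can be sketched without touching the disk. The CSV content below is hypothetical, and io.StringIO stands in for the file object that `with open('filename.csv') as file:` would provide.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content, held in memory so the sketch needs no file on disk
CSV_TEXT = "x,y\n1,4\n2,5\n3,6\n"

# csv.DictReader maps each row to a dict keyed by the header line
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
x_from_csv = [int(row['x']) for row in rows]

# numpy.loadtxt with unpack=True returns one array per column;
# skiprows=1 skips the header line
x, y = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',', skiprows=1, unpack=True)
```

Note that csv.DictReader yields strings, so the values need an explicit conversion, whereas numpy.loadtxt returns floating-point arrays directly.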

1.3.7. Backends

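========  ==============  ===========================================================================
Renderer  Filetypes       Description
========  ==============  ===========================================================================
AGG       png             raster graphics: high-quality images using the Anti-Grain Geometry engine
PS        ps eps          vector graphics: PostScript output
PDF       pdf             vector graphics: Portable Document Format
SVG       svg             vector graphics: Scalable Vector Graphics
Cairo     png ps pdf svg  raster graphics and vector graphics: using the Cairo graphics library
========  ==============  ===========================================================================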
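
As a rough illustration of the renderers listed above, the sketch below saves one figure in three formats and keeps the bytes in memory. The in-memory buffers and the explicit Agg backend are assumptions made so that the example runs without a display or a writable directory.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: headless raster backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# The requested format decides which renderer produces the output
outputs = {}
for fmt in ('png', 'pdf', 'svg'):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    outputs[fmt] = buffer.getvalue()
```

When saving to a file name instead of a buffer, the format is normally inferred from the extension, as in plt.savefig('my_file.png').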

1.4. How to understand charts?

1.4.1. Figure anatomy

../_images/matplotlib-figure-anatomy.png

Fig. 1.7. Figure Anatomy

1.4.2. Axes

  • A given figure can contain many Axes, but a given Axes object can only be in one Figure

  • Data limits can be controlled via set_xlim() and set_ylim() methods

  • Each Axes has a title (set via set_title()), an x-label (set via set_xlabel()), and a y-label (set via set_ylabel())
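
The points above can be sketched as follows: one Figure holding two Axes, with limits, title and labels set on one of them. The variable names and the Agg backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# One Figure containing two Axes stacked vertically
fig, (ax_top, ax_bottom) = plt.subplots(2, 1)

ax_top.plot([1, 2, 3], [4, 5, 6])

# Data limits
ax_top.set_xlim(0, 5)
ax_top.set_ylim(0, 10)

# Title and axis labels belong to the Axes, not to the Figure
ax_top.set_title('Top axes')
ax_top.set_xlabel('x')
ax_top.set_ylabel('y')
```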

1.4.3. Axis

  • These are the number-line-like objects

  • Axis ticks can be restricted to integer values

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

x = np.linspace(0, 2, 100)

ax = plt.figure().gca()  # ``gca`` - get current axes

ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')

ax.xaxis.set_major_locator(MaxNLocator(integer=True))

1.4.4. Artist

  • Everything you can see on the figure is an artist (even the Figure, Axes, and Axis objects)

  • This includes Text objects, Line2D objects, collection objects, Patch objects, etc

  • Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another
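
A small sketch of the statements above: it checks that the Figure, the Axes, an Axis, a line and a title are all Artist instances, and that a line knows the single Axes it belongs to. The names used here are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [4, 5, 6])
title = ax.set_title('Artists everywhere')

# Figure, Axes, Axis, Line2D and Text all derive from Artist
artists = [fig, ax, ax.xaxis, line, title]
all_artists = all(isinstance(item, Artist) for item in artists)

# A Line2D remembers the one Axes it is tied to
owner = line.axes
```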

1.5. Simple examples

1.5.1. Power functions

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.title('Power functions')
plt.xlabel('x')
plt.ylabel('y')

plt.legend()
plt.show()
../_images/matplotlib-exponentials.png

Fig. 1.8. Power functions

1.5.2. Sine wave

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()
../_images/matplotlib-sin-wave.png

Fig. 1.9. Sine wave

1.5.3. Multiple lines on one chart

import numpy as np
import matplotlib.pyplot as plt

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()
../_images/matplotlib-multiple.png

Fig. 1.10. Multiple lines on one chart

1.6. Labels and Legend

1.6.1. Axis naming

x = [1,2,3]
y = [4,5,6]

plt.xlabel('X axis')
plt.ylabel('Y axis')

plt.plot(x, y)
plt.show()

1.6.2. Title

x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart')

plt.plot(x, y)
plt.show()

# A newline character splits the title across two lines
x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart\nSecond line')

plt.plot(x, y)
plt.show()

1.6.3. Legend

  • Good practice: always have labels

x1 = [1,2,3]
y1 = [4,5,6]

x2 = [1,2,3]
y2 = [10,11,12]

plt.plot(x1, y1, label='first line')
plt.plot(x2, y2, label='second line')

plt.legend()
plt.show()

1.6.4. Colors

  • single-letter color codes (not recommended):

    • r - red

    • g - green

    • b - blue

    • c - cyan

    • m - magenta

    • y - yellow

    • k - black

    • w - white

  • color names (X11/CSS4):

  • hexadecimal code (RGB or RGBA):

    • #FF0000 - red

    • #00FF00 - green

    • #0000FF - blue

    • #FF000033 - semi-transparent red

  • tuple (RGB or RGBA):

    • (0.1, 0.2, 0.5)

    • (0.1, 0.2, 0.5, 0.3)

plt.bar(x1, y1, label='Bars 1', color='blue')
plt.bar(x2, y2, label='Bars 2', color='red')
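
The different color notations listed above can be compared with matplotlib.colors.to_rgba(), which normalises any accepted color specification to an RGBA tuple. The sketch below is only an illustration; the variable names are made up.

```python
from matplotlib.colors import to_rgba

# Four spellings of the same color, normalised to RGBA tuples
by_letter = to_rgba('r')
by_name = to_rgba('red')
by_hex = to_rgba('#FF0000')
by_tuple = to_rgba((1.0, 0.0, 0.0))

# An eight-digit hex code carries the alpha channel in its last two digits
semi_transparent = to_rgba('#FF000033')

# The single letter 'k' stands for black
black = to_rgba('k')
```

Be aware that not every single-letter code matches the color name it resembles, which is one reason the full names or hex codes are the safer choice.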

1.6.5. Line styles

../_images/matplotlib-line-style.png

Fig. 1.11. Line styles

plt.plot(x, y, color="red", linestyle='--')

1.6.6. fmt parameters

=========  =====================
Character  Description
=========  =====================
'-'        solid line style
'--'       dashed line style
'-.'       dash-dot line style
':'        dotted line style
'.'        point marker
','        pixel marker
'o'        circle marker
'v'        triangle_down marker
'^'        triangle_up marker
'<'        triangle_left marker
'>'        triangle_right marker
'1'        tri_down marker
'2'        tri_up marker
'3'        tri_left marker
'4'        tri_right marker
's'        square marker
'p'        pentagon marker
'*'        star marker
'h'        hexagon1 marker
'H'        hexagon2 marker
'+'        plus marker
'x'        x marker
'D'        diamond marker
'd'        thin_diamond marker
'|'        vline marker
'_'        hline marker
=========  =====================
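
A format string combines a color letter with characters from the table above. The sketch below plots three lines with different fmt strings and keeps the resulting Line2D objects so their properties can be inspected; the data values are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [0, 1, 2, 3]

# Each part of the format string (color, marker, line style) is optional
dashed, = ax.plot(x, [0, 1, 2, 3], 'r--')    # red dashed line, no marker
squares, = ax.plot(x, [0, 1, 4, 9], 'bs')    # blue squares, no connecting line
both, = ax.plot(x, [0, 1, 8, 27], 'g^-.')    # green triangles joined by dash-dot
```

When a marker is given without a line style, as in 'bs', the points are drawn without a connecting line.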

1.6.7. Line2D parameters

======================  ==================================================
Property                Value Type
======================  ==================================================
alpha                   float
animated                [True | False]
antialiased or aa       [True | False]
clip_box                a matplotlib.transforms.Bbox instance
clip_on                 [True | False]
clip_path               a Path instance and a Transform instance, a Patch
color or c              any matplotlib color
contains                the hit testing function
dash_capstyle           ['butt' | 'round' | 'projecting']
dash_joinstyle          ['miter' | 'round' | 'bevel']
dashes                  sequence of on/off ink in points
data                    (np.array xdata, np.array ydata)
figure                  a matplotlib.figure.Figure instance
label                   any string
linestyle or ls         [ '-' | '--' | '-.' | ':' | 'steps' | …]
linewidth or lw         float value in points
lod                     [True | False]
marker                  [ '+' | ',' | '.' | '1' | '2' | '3' | '4' ]
markeredgecolor or mec  any matplotlib color
markeredgewidth or mew  float value in points
markerfacecolor or mfc  any matplotlib color
markersize or ms        float
markevery               [ None | integer | (startind, stride) ]
picker                  used in interactive line selection
pickradius              the line pick selection radius
solid_capstyle          ['butt' | 'round' | 'projecting']
solid_joinstyle         ['miter' | 'round' | 'bevel']
transform               a matplotlib.transforms.Transform instance
visible                 [True | False]
xdata                   np.array
ydata                   np.array
zorder                  any number
======================  ==================================================
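
A few of the properties in the table can be set as shown below. This is only a sketch with arbitrary values: properties may be passed to plot() as keyword arguments (by full name or alias) or changed later through setter methods or plt.setp().

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Keyword arguments, using the short aliases lw, ls and ms
line, = ax.plot([1, 2, 3], [4, 5, 6], lw=2.5, ls='--', marker='o', ms=8)

# The same properties can be changed after the line has been created
line.set_alpha(0.5)
plt.setp(line, color='green', zorder=3)
```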

1.7. Basic customizations

  • figure object is implied

  • explicit assignment is needed when customizing

fig = plt.figure()

1.7.1. Size

Local:

plt.figure(figsize=(3,4))

Global:

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20,10)

or, equivalently:

import matplotlib

matplotlib.rc('figure', figsize=(20,10))
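
When a global size is needed only for part of a script, plt.rc_context() offers a middle ground between the local and global settings. The sketch below is an illustration under that assumption; the sizes are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Local: applies to this figure only
small = plt.figure(figsize=(3, 4))

# Scoped global: rc_context restores the previous defaults on exit
with plt.rc_context({'figure.figsize': (20, 10)}):
    large = plt.figure()

# Created after the block, so it uses the original default size again
after = plt.figure()
```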

1.7.2. Font

  • 'serif'

  • 'sans-serif'

  • 'cursive'

  • 'fantasy'

  • 'monospace'

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc('font', family='serif', weight='bold', size=8)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

plt.plot(x, y)
plt.grid(True)
plt.show()

1.7.3. Subplots

fig = plt.figure()

ax1 = plt.subplot2grid(shape=(1,1), loc=(0,0))  # ``loc`` - location of the axes within the grid

plt.subplots_adjust(left=0.09, bottom=0.16)  # set margins as fractions of the figure size
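
A slightly larger grid shows what shape, loc and the margins do. This is a sketch with an assumed 2x2 layout; colspan is a standard subplot2grid() parameter, while the variable names are made up.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# A 2x2 grid in which the first Axes spans the whole top row
fig = plt.figure()
ax_wide = plt.subplot2grid(shape=(2, 2), loc=(0, 0), colspan=2)
ax_left = plt.subplot2grid(shape=(2, 2), loc=(1, 0))
ax_right = plt.subplot2grid(shape=(2, 2), loc=(1, 1))

# Margins are fractions of the figure size; left must be smaller than right
fig.subplots_adjust(left=0.1, right=0.95, bottom=0.16, hspace=0.4)
```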

1.8. Additional info

1.8.1. Label rotation

Code Listing 1.50. Label rotation
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 6]
labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']

plt.plot(x, y, 'ro')

# Use the text labels as tick labels and rotate them;
# the rotation can be given in degrees or with keywords.
plt.xticks(x, labels, rotation=45)

# Pad margins so that markers don't get clipped by the axes
plt.margins(0.2)

# Tweak spacing to prevent clipping of tick-labels
plt.subplots_adjust(bottom=0.15)
plt.show()
../_images/matplotlib-tick-rotation.png

Fig. 1.12. Label rotation

1.8.2. Grid

Code Listing 1.51. Grid Simple
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
plt.plot(x, y)


plt.grid(True)
plt.show()
../_images/matplotlib-grid-simple.png

Fig. 1.13. Grid Simple

Code Listing 1.52. Grid Extra
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Major ticks every 20, minor ticks every 5
major_ticks = np.arange(0, 101, 20)
minor_ticks = np.arange(0, 101, 5)

ax.set_xticks(major_ticks)
ax.set_xticks(minor_ticks, minor=True)
ax.set_yticks(major_ticks)
ax.set_yticks(minor_ticks, minor=True)

# And a corresponding grid
ax.grid(which='both')

# Or if you want different settings for the grids:
ax.grid(which='minor', alpha=0.2)
ax.grid(which='major', alpha=0.5)

plt.show()
../_images/matplotlib-grid-extra.png

Fig. 1.14. Grid Extra

1.8.3. Trend line

Code Listing 1.53. Trend line
import matplotlib.pyplot as plt
import numpy as np

x = [1, 3, 5, 7, 9]
y = [2, 3, 4, 3, 4]

# plot the data itself
plt.plot(x, y, label="data")

# calculate the trend line (a first-degree least-squares fit)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)

plt.plot(x, p(x), color="red", linestyle='--')

# the line equation:
a = z[0]
b = z[1]
print(f"y = {a:.6}x + ({b:.6})")

plt.show()

# a parabolic fit would be:
# z = np.polyfit(x, y, 2)
../_images/matplotlib-trendline.png

Fig. 1.15. Trend line

1.8.4. Error bars

Code Listing 1.54. Error bars
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
e = [0.5, 1., 1.5, 2.]

plt.errorbar(x, y, yerr=e, fmt='o')
plt.show()
../_images/matplotlib-plt-errorbar.png

Fig. 1.16. Error bars

1.8.5. Colorbar

Code Listing 1.55. Colorbar
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris


iris = load_iris()

# The indices of the features that we are plotting
x_index = 0
y_index = 1

# this formatter will label the colorbar with the correct target names
formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])

plt.figure(figsize=(5, 4))
plt.scatter(iris.data[:, x_index], iris.data[:, y_index], c=iris.target)
plt.colorbar(ticks=[0, 1, 2], format=formatter)

plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])

plt.tight_layout()
plt.show()
../_images/matplotlib-colorbar.png

Fig. 1.17. Colorbar

1.8.6. Changing colors

ax.spines['bottom'].set_color('#dddddd')
ax.spines['top'].set_color('#dddddd')
ax.spines['right'].set_color('red')
ax.spines['left'].set_color('red')
ax.tick_params(axis='x', colors='red')
ax.tick_params(axis='y', colors='red')
ax.yaxis.label.set_color('red')
ax.xaxis.label.set_color('red')
ax.title.set_color('red')
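
The calls above operate on an existing Axes. A self-contained sketch might look as follows; the plotted data and the label texts are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_title('Styled axes')
ax.set_xlabel('x')

# Recolor the frame, the ticks and the text independently
for side in ('left', 'right'):
    ax.spines[side].set_color('red')
ax.tick_params(axis='x', colors='red')
ax.xaxis.label.set_color('red')
ax.title.set_color('red')
```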

1.9. Working with multiple figures and axes

import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)
plt.subplot(211)
plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
plt.show()
../_images/matplotlib-plt-subplot.png

Fig. 1.18. Working with multiple figures and axes

1.10. Working with text

import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)


plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
../_images/matplotlib-plt-hist-text.png

Fig. 1.19. Working with text

1.10.1. Using mathematical expressions in text

plt.title(r'$\sigma_i=15$')

1.10.2. Annotating text

import numpy as np
import matplotlib.pyplot as plt

ax = plt.subplot(111)

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)
line, = plt.plot(t, s, lw=2)

plt.annotate('local max', xy=(2, 1), xytext=(3, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05),
            )

plt.ylim(-2,2)
plt.show()
../_images/matplotlib-plt-annotate.png

Fig. 1.20. Annotating text

1.11. Logarithmic and other nonlinear axes

plt.xscale('log')

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter  # useful for `logit` scale


# Fixing random state for reproducibility
np.random.seed(19680801)

# make up some data in the interval ]0, 1[
y = np.random.normal(loc=0.5, scale=0.4, size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

# plot with various axes scales
plt.figure(1)

# linear
plt.subplot(221)
plt.plot(x, y)
plt.yscale('linear')
plt.title('linear')
plt.grid(True)


# log
plt.subplot(222)
plt.plot(x, y)
plt.yscale('log')
plt.title('log')
plt.grid(True)


# symmetric log
plt.subplot(223)
plt.plot(x, y - y.mean())
plt.yscale('symlog', linthresh=0.01)
plt.title('symlog')
plt.grid(True)

# logit
plt.subplot(224)
plt.plot(x, y)
plt.yscale('logit')
plt.title('logit')
plt.grid(True)
# Format the minor tick labels of the y-axis into empty strings with
# `NullFormatter`, to avoid cumbering the axis with too many labels.
plt.gca().yaxis.set_minor_formatter(NullFormatter())
# Adjust the subplot layout, because the logit one may take more space
# than usual, due to y-tick labels like "1 - 10^{-3}"
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25,
                    wspace=0.35)

plt.show()
../_images/matplotlib-plt-scale.png

Fig. 1.21. Logarithmic and other nonlinear axes

1.12. plt.plot() vs ax.plot()

fig = plt.figure()
plt.plot(data)
fig.show()

  1. plt.plot() takes the current figure and axes (creating new ones if none exist) and plots into them:

    line = plt.plot(data)
    
  2. Stating the axes explicitly gives the same result. Using ax.plot(...) is a must if you want to plot into multiple axes (possibly in one figure), for example when using subplots:

    ax = plt.axes()
    line = ax.plot(data)
    
  3. Creating the figure explicitly guarantees that nothing is added to a previous one. fig.add_axes() then creates a new axes with the given rectangle, and the rest is the same as in 2:

    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    line = ax.plot(data)
    

    A possible problem with figure.add_axes() is that it may add a new axes object to the figure that overlays the first one (or others). This happens if the requested size does not match the existing ones.
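
The relationship between the two styles can be checked directly. In the sketch below, `data` is a hypothetical list of sample values; the point is only that plt.plot() draws on whatever Axes is current, while ax.plot() names its target.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend
import matplotlib.pyplot as plt

data = [1, 4, 9, 16]  # hypothetical sample values

fig, ax = plt.subplots()

# The implicit call draws on the current Axes, which is `ax` here
implicit_lines = plt.plot(data)
same_axes = plt.gca() is ax

# The explicit call names its target, so it works with any number of Axes
explicit_lines = ax.plot(data)
```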

1.13. Assignment

1.13.1. Trigonometry

  • Filename: matplotlib_trigonometry.py

  • Lines of code to write: 15 lines

  • Estimated time of completion: 20 min

  1. For x in the range 0.0 to 1.0, sampled every 0.01, plot the sin and cos functions of 2 * np.pi * x

  2. Create two separate figures:

    • Each function on its own subplot

    • Both functions on a single plot

  3. The subplots are to be stacked one above the other

  4. Title each plot with the name of its trigonometric function

  5. Set the y-axis label text to “Function value”

  6. Color the x tick labels of the sin plot red

  7. Color the label of the cos plot green

  8. Show the grid on both plots

  9. Draw the second figure with both functions overlaid on a single plot

Hint
  • np.sin()

  • np.cos()

1.13.2. Iris scatter

  1. Download the data from the URL given above

  2. For each species:

  3. Visualize the relationship between sepal_length and sepal_width as a scatter plot using matplotlib

  4. Each species should have a different color

Hint
  • DataFrame.groupby()