1. Matplotlib

1.1. Glossary

agg
cairo
eps
pdf
png
ps
svg
raster graphics
vector graphics

1.2. What is matplotlib

Pyplot’s state-machine environment behaves similarly to MATLAB and should be most familiar to users with MATLAB experience.

1.3. Installing and using matplotlib

$ pip install matplotlib
import matplotlib.pyplot as plt

1.3.1. Embedding matplotlib charts in Jupyter

  • %matplotlib inline

1.3.2. Running matplotlib in PyCharm

  • Scientific Mode

1.3.3. Running matplotlib in standalone scripts

  • Scale
  • Export to image
  • Reposition
  • Other options
x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

plt.show()
x = [1,2,3]
y = [4,5,6]

plt.plot(x, y)

plt.savefig('my_file.png')

1.3.5. pandas and matplotlib

  • All plotting functions expect np.array or np.ma.masked_array as input

  • Classes that are ‘array-like’ such as pandas data objects and np.matrix may or may not work as intended

  • It is best to convert these to np.array objects prior to plotting

  • Convert a pandas.DataFrame:

    a = pandas.DataFrame(np.random.rand(4,5), columns = list('abcde'))
    a_asndarray = a.values
    
  • Convert a np.matrix:

    b = np.matrix([[1,2],[3,4]])
    b_asarray = np.asarray(b)
    

1.3.6. Opening files

  • with open('filename.csv') - context manager
  • numpy.loadtxt('filename.csv', delimiter=',', unpack=True)
  • csv.DictReader()
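
As a rough illustration of the three approaches listed above, the sketch below parses the same small CSV text with each of them. The CSV content, the column names x and y, and the use of io.StringIO in place of a real file are assumptions made so that the example runs without a file on disk; with a real file, pass its name to open() or np.loadtxt() instead.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content, used instead of a real 'filename.csv'
CSV_TEXT = "1,4\n2,5\n3,6\n"

# 1. Context manager plus manual parsing (io.StringIO stands in for open())
with io.StringIO(CSV_TEXT) as file:
    rows = [line.strip().split(',') for line in file]
x_manual = [float(row[0]) for row in rows]
y_manual = [float(row[1]) for row in rows]

# 2. numpy.loadtxt: unpack=True returns one array per column
x_np, y_np = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',', unpack=True)

# 3. csv.DictReader: column names are given explicitly via fieldnames
reader = csv.DictReader(io.StringIO(CSV_TEXT), fieldnames=['x', 'y'])
records = [{key: float(value) for key, value in row.items()} for row in reader]
```

The resulting x and y sequences can be passed directly to plt.plot(x, y).
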
import pandas as pd

url = 'https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv'
columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
species = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}

data = pd.read_csv(url, skiprows=1, names=columns)

# Replace numeric Species codes with species names
data['Species'] = data['Species'].replace(species)

# Shuffle rows and reset the index
data = data.sample(frac=1).reset_index(drop=True)
#      Sepal length  Sepal width     ...      Petal width     Species
# 0             5.0          2.0     ...              1.0  versicolor
# 1             6.4          2.7     ...              1.9   virginica
# 2             5.6          3.0     ...              1.5  versicolor
# 3             5.7          2.6     ...              1.0  versicolor
# 4             6.4          3.1     ...              1.8   virginica
# 5             4.6          3.6     ...              0.2      setosa
# 6             5.9          3.0     ...              1.5  versicolor

1.3.7. Backends

Renderer  Filetypes          Description
AGG       png                raster graphics: high quality images using the Anti-Grain Geometry engine
PS        ps, eps            vector graphics: PostScript output
PDF       pdf                vector graphics: Portable Document Format
SVG       svg                vector graphics: Scalable Vector Graphics
Cairo     png, ps, pdf, svg  raster graphics and vector graphics: using the Cairo graphics library
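
To make the renderer table more concrete, here is a minimal sketch that selects the non-interactive Agg backend and writes one figure in a raster format (png) and two vector formats (pdf, svg). Saving into in-memory buffers rather than files is an assumption made to keep the example self-contained; matplotlib picks the matching renderer for each requested format.

```python
import io

import matplotlib
matplotlib.use('Agg')  # select a non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# Save the same figure in several formats into in-memory buffers
outputs = {}
for filetype in ('png', 'pdf', 'svg'):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=filetype)
    outputs[filetype] = buffer.getvalue()

plt.close(fig)
```

In a script you would normally pass a filename such as 'my_file.svg' to savefig(); the format is then inferred from the extension.
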

1.4. How to understand charts?

1.4.1. Figure anatomy

../_images/matplotlib-figure-anatomy.png

Fig. 1.4. Figure Anatomy

1.4.2. Axes

  • A given figure can contain many Axes, but a given Axes object can only be in one Figure
  • Data limits can be controlled via set_xlim() and set_ylim() methods
  • Each Axes has a title (set via set_title()), an x-label (set via set_xlabel()), and a y-label (set via set_ylabel())
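
The points above can be sketched as follows: one Figure holding two Axes, with limits, title and labels set through the Axes methods. The data values and the variable names ax_top and ax_bottom are illustrative, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, (ax_top, ax_bottom) = plt.subplots(2, 1)  # one Figure, two Axes

ax_top.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax_top.set_xlim(0, 5)    # data limits on the x axis
ax_top.set_ylim(0, 10)   # data limits on the y axis
ax_top.set_title('Top axes')
ax_top.set_xlabel('x')
ax_top.set_ylabel('x squared')

ax_bottom.plot([0, 1, 2, 3], [3, 2, 1, 0])
ax_bottom.set_title('Bottom axes')

fig.tight_layout()  # keep titles and labels from overlapping
```
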

1.4.3. Axis

  • These are the number-line-like objects
  • Tick locations can be restricted to integers
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

x = np.linspace(0, 2, 100)

ax = plt.figure().gca()  # gca: get current axes

ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')

ax.xaxis.set_major_locator(MaxNLocator(integer=True))

1.4.4. Artist

  • Everything you can see on the figure is an artist (even the Figure, Axes, and Axis objects)
  • This includes Text objects, Line2D objects, collection objects, Patch objects, etc
  • Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another
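
A small sketch may help to see this hierarchy in practice. It checks that the Figure, Axes, Axis, a Line2D and a Text object are all Artist instances, and that the line knows which Axes it belongs to. The plotted values and the label text are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [4, 5, 6])
text = ax.text(2, 5, 'a label')

# Figure, Axes, Axis, Line2D and Text are all Artist subclasses
artists = [fig, ax, ax.xaxis, line, text]
all_artists = all(isinstance(obj, Artist) for obj in artists)

# A Line2D created by ax.plot() is tied to that Axes
owner = line.axes
```
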

1.5. Simple examples

1.5.1. Polynomial functions

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.title('Polynomial functions')
plt.xlabel('x')
plt.ylabel('y')

plt.legend()
plt.show()
../_images/matplotlib-exponentials.png

Fig. 1.5. Polynomial functions

1.5.2. Sine wave

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()
../_images/matplotlib-sin-wave.png

Fig. 1.6. Sine wave

1.5.3. Multiple lines on one chart

import numpy as np
import matplotlib.pyplot as plt

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()
../_images/matplotlib-multiple.png

Fig. 1.7. Multiple lines on one chart

1.6. Labels and Legend

1.6.1. Axis naming

x = [1,2,3]
y = [4,5,6]

plt.xlabel('X axis')
plt.ylabel('Y axis')

plt.plot(x, y)
plt.show()

1.6.2. Title

x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart')

plt.plot(x, y)
plt.show()
x = [1,2,3]
y = [4,5,6]

plt.title('This is my chart\nSecond line')

plt.plot(x, y)
plt.show()

1.6.3. Legend

  • Good practice: always have labels
x1 = [1,2,3]
y1 = [4,5,6]

x2 = [1,2,3]
y2 = [10,11,12]

plt.plot(x1, y1, label='first line')
plt.plot(x2, y2, label='second line')

plt.legend()
plt.show()

1.6.4. Colors

  • single-letter abbreviation (not recommended):

    • r - red
    • g - green
    • b - blue
    • c - cyan
    • m - magenta
    • y - yellow
    • k - black
    • w - white
  • color names (X11/CSS4):

  • hexadecimal code (RGB or RGBA):

    • #FF0000 - red
    • #00FF00 - green
    • #0000FF - blue
    • #FF000033 - semi-transparent red
  • tuple (RGB or RGBA):

    • (0.1, 0.2, 0.5)
    • (0.1, 0.2, 0.5, 0.3)
plt.bar(x1, y1, label='Bars 1', color='blue')
plt.bar(x2, y2, label='Bars 2', color='red')
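
To compare the notations listed above, the following sketch converts each of them with matplotlib.colors.to_rgba(), which returns an RGBA tuple with components between 0 and 1. The variable names are illustrative only.

```python
from matplotlib import colors

# Each notation resolves to an RGBA tuple with components in the 0-1 range
by_letter = colors.to_rgba('k')              # single-letter abbreviation
by_name = colors.to_rgba('red')              # X11/CSS4 color name
by_hex = colors.to_rgba('#FF0000')           # hexadecimal RGB
by_hex_alpha = colors.to_rgba('#FF000033')   # hexadecimal RGBA
by_tuple = colors.to_rgba((0.1, 0.2, 0.5))   # RGB tuple, alpha defaults to 1
by_tuple_alpha = colors.to_rgba((0.1, 0.2, 0.5, 0.3))  # RGBA tuple
```

Any of these values can be passed as the color argument of plt.plot() or plt.bar().
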

1.6.5. Line styles

../_images/matplotlib-line-style.png

Fig. 1.8. Line styles

plt.plot(x, y, color="red", linestyle='--')

1.6.6. fmt parameters

Character  Description
-          solid line style
--         dashed line style
-.         dash-dot line style
:          dotted line style
.          point marker
,          pixel marker
o          circle marker
v          triangle_down marker
^          triangle_up marker
<          triangle_left marker
>          triangle_right marker
1          tri_down marker
2          tri_up marker
3          tri_left marker
4          tri_right marker
s          square marker
p          pentagon marker
*          star marker
h          hexagon1 marker
H          hexagon2 marker
+          plus marker
x          x marker
D          diamond marker
d          thin_diamond marker
|          vline marker
_          hline marker
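
The characters in the table are combined with a color letter into a single fmt string. The sketch below, using arbitrary data, shows three such strings and how they map to the marker and line style of the resulting Line2D objects. When a fmt string gives a marker but no line style, no connecting line is drawn.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

x = [1, 2, 3]
fig, ax = plt.subplots()

# A fmt string combines color, marker and line style; each part is optional
dashed_red, = ax.plot(x, [1, 2, 3], 'r--')      # red dashed line, no markers
green_triangles, = ax.plot(x, [2, 3, 4], 'g^')  # green triangle_up markers, no line
black_mixed, = ax.plot(x, [3, 4, 5], 'ko-.')    # black circles joined by a dash-dot line
```
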

1.6.7. Line2D parameters

Property                Value Type
alpha                   float
animated                [True | False]
antialiased or aa       [True | False]
clip_box                a matplotlib.transforms.Bbox instance
clip_on                 [True | False]
clip_path               a Path instance and a Transform instance, a Patch
color or c              any matplotlib color
contains                the hit testing function
dash_capstyle           ['butt' | 'round' | 'projecting']
dash_joinstyle          ['miter' | 'round' | 'bevel']
dashes                  sequence of on/off ink in points
data                    (np.array xdata, np.array ydata)
figure                  a matplotlib.figure.Figure instance
label                   any string
linestyle or ls         [ '-' | '--' | '-.' | ':' | 'steps' | …]
linewidth or lw         float value in points
lod                     [True | False]
marker                  [ '+' | ',' | '.' | '1' | '2' | '3' | '4' ]
markeredgecolor or mec  any matplotlib color
markeredgewidth or mew  float value in points
markerfacecolor or mfc  any matplotlib color
markersize or ms        float
markevery               [ None | integer | (startind, stride) ]
picker                  used in interactive line selection
pickradius              the line pick selection radius
solid_capstyle          ['butt' | 'round' | 'projecting']
solid_joinstyle         ['miter' | 'round' | 'bevel']
transform               a matplotlib.transforms.Transform instance
visible                 [True | False]
xdata                   np.array
ydata                   np.array
zorder                  any number
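
As a minimal sketch of how these properties are used, the example below sets a few of them in three ways: as keyword arguments to plot(), through set_<property>() methods, and in bulk with plt.setp(). The chosen values are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Properties can be passed as keyword arguments to plot() ...
line, = ax.plot([1, 2, 3], [4, 5, 6], linewidth=2, marker='o', label='data')

# ... changed later through set_<property>() methods ...
line.set_linestyle(':')
line.set_alpha(0.5)

# ... or set in bulk with plt.setp(), which also accepts the short aliases
plt.setp(line, c='red', ms=8, zorder=3)
```
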

1.7. Basic customizations

  • figure object is implied
  • explicit assignment is needed when customizing
fig = plt.figure()

1.7.1. Size

Local:

plt.figure(figsize=(3,4))

Global:

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20,10)

or, equivalently:

import matplotlib

matplotlib.rc('figure', figsize=(20,10))
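
The difference between the local and the global setting can be checked with a short sketch. It uses plt.rc_context(), which applies an rcParams change only inside a with block, as one possible way to avoid changing the defaults for the whole program. The figure names small and large are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Local: only this figure is 3 x 4 inches
small = plt.figure(figsize=(3, 4))
small_size = tuple(small.get_size_inches())

# Scoped default: rc_context restores the previous rcParams on exit
with plt.rc_context({'figure.figsize': (20, 10)}):
    large = plt.figure()
    large_size = tuple(large.get_size_inches())

# The pixel size of a saved raster image is figsize * dpi
width_px, height_px = small.get_size_inches() * small.dpi
plt.close('all')
```
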

1.7.2. Font

  • 'serif'
  • 'sans-serif'
  • 'cursive'
  • 'fantasy'
  • 'monospace'
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc('font', family='serif', weight='bold', size=8)

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]

plt.plot(x, y)
plt.grid(True)
plt.show()

1.7.3. Subplots

fig = plt.figure()

ax1 = plt.subplot2grid(shape=(1,1), loc=(0,0))  # loc: (row, column) of the axes within the grid

plt.subplots_adjust(left=0.09, bottom=0.16)  # set margins (fractions of the figure size)
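
A grid with more than one cell shows what shape and loc are for. The following sketch, with arbitrary data and illustrative variable names, places one wide axes over two narrow ones in a 2 x 2 grid and then adjusts the margins.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure()

# 2 x 2 grid: one wide axes on top, two narrow ones below
ax_wide = plt.subplot2grid(shape=(2, 2), loc=(0, 0), colspan=2)
ax_left = plt.subplot2grid(shape=(2, 2), loc=(1, 0))
ax_right = plt.subplot2grid(shape=(2, 2), loc=(1, 1))

ax_wide.plot([1, 2, 3], [1, 4, 9])
ax_left.plot([1, 2, 3], [3, 2, 1])
ax_right.plot([1, 2, 3], [2, 2, 2])

# Margins are fractions of the figure size; left must stay below right
plt.subplots_adjust(left=0.1, bottom=0.15, hspace=0.4)
```
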

1.8. Additional info

1.8.1. Label rotation

Code Listing 1.52. Label rotation
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 6]
labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']

plt.plot(x, y, 'ro')

# Set the tick labels and rotate them; rotation is given in degrees or as a keyword such as 'vertical'.
plt.xticks(x, labels, rotation=45)

# Pad margins so that markers don't get clipped by the axes
plt.margins(0.2)

# Tweak spacing to prevent clipping of tick-labels
plt.subplots_adjust(bottom=0.15)
plt.show()
../_images/matplotlib-tick-rotation.png

Fig. 1.9. Label rotation

1.8.2. Grid

Code Listing 1.53. Grid Simple
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
plt.plot(x, y)


plt.grid(True)
plt.show()
../_images/matplotlib-grid-simple.png

Fig. 1.10. Grid Simple

Code Listing 1.54. Grid Extra
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Major ticks every 20, minor ticks every 5
major_ticks = np.arange(0, 101, 20)
minor_ticks = np.arange(0, 101, 5)

ax.set_xticks(major_ticks)
ax.set_xticks(minor_ticks, minor=True)
ax.set_yticks(major_ticks)
ax.set_yticks(minor_ticks, minor=True)

# And a corresponding grid
ax.grid(which='both')

# Or if you want different settings for the grids:
ax.grid(which='minor', alpha=0.2)
ax.grid(which='major', alpha=0.5)

plt.show()
../_images/matplotlib-grid-extra.png

Fig. 1.11. Grid Extra

1.8.3. Trend line

Code Listing 1.55. Trend line
import matplotlib.pyplot as plt
import numpy as np

x = [1, 3, 5, 7, 9]
y = [2, 3, 4, 3, 4]

# plot the data itself
plt.plot(x, y, label="data")

# calculate the trend line (a first-degree least-squares polynomial fit)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)

plt.plot(x, p(x), color="red", linestyle='--', label="trend")

# the line equation:
a = z[0]
b = z[1]
print(f"y = {a:.6}x + ({b:.6})")

# a parabolic fit would be:
# z = np.polyfit(x, y, 2)

plt.legend()
plt.show()
../_images/matplotlib-trendline.png

Fig. 1.12. Trend line

1.8.4. Error bars

Code Listing 1.56. Error bars
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
e = [0.5, 1., 1.5, 2.]

plt.errorbar(x, y, yerr=e, fmt='o')
plt.show()
../_images/matplotlib-plt-errorbar.png

Fig. 1.13. Error bars

1.8.5. Colorbar

Code Listing 1.57. Colorbar
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris


iris = load_iris()

# The indices of the features that we are plotting
x_index = 0
y_index = 1

# this formatter will label the colorbar with the correct target names
formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])

plt.figure(figsize=(5, 4))
plt.scatter(iris.data[:, x_index], iris.data[:, y_index], c=iris.target)
plt.colorbar(ticks=[0, 1, 2], format=formatter)

plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])

plt.tight_layout()
plt.show()
../_images/matplotlib-colorbar.png

Fig. 1.14. Colorbar

1.8.6. Changing colors

ax.spines['bottom'].set_color('#dddddd')
ax.spines['top'].set_color('#dddddd')
ax.spines['right'].set_color('red')
ax.spines['left'].set_color('red')
ax.tick_params(axis='x', colors='red')
ax.tick_params(axis='y', colors='red')
ax.yaxis.label.set_color('red')
ax.xaxis.label.set_color('red')
ax.title.set_color('red')

1.9. Working with multiple figures and axes

import numpy as np
import matplotlib.pyplot as plt

def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

plt.figure(1)
plt.subplot(211)
plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k')

plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), 'r--')
plt.show()
../_images/matplotlib-plt-subplot.png

Fig. 1.15. Working with multiple figures and axes

1.10. Working with text

import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)


plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
../_images/matplotlib-plt-hist-text.png

Fig. 1.16. Working with text

1.10.1. Using mathematical expressions in text

plt.title(r'$\sigma_i=15$')

1.10.2. Annotating text

import numpy as np
import matplotlib.pyplot as plt

ax = plt.subplot(111)

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)
line, = plt.plot(t, s, lw=2)

plt.annotate('local max', xy=(2, 1), xytext=(3, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05),
            )

plt.ylim(-2,2)
plt.show()
../_images/matplotlib-plt-annotate.png

Fig. 1.17. Annotating text

1.11. Logarithmic and other nonlinear axes

plt.xscale('log')
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter  # useful for `logit` scale


# Fixing random state for reproducibility
np.random.seed(19680801)

# make up some data in the interval ]0, 1[
y = np.random.normal(loc=0.5, scale=0.4, size=1000)
y = y[(y > 0) & (y < 1)]
y.sort()
x = np.arange(len(y))

# plot with various axes scales
plt.figure(1)

# linear
plt.subplot(221)
plt.plot(x, y)
plt.yscale('linear')
plt.title('linear')
plt.grid(True)


# log
plt.subplot(222)
plt.plot(x, y)
plt.yscale('log')
plt.title('log')
plt.grid(True)


# symmetric log
plt.subplot(223)
plt.plot(x, y - y.mean())
plt.yscale('symlog', linthresh=0.01)
plt.title('symlog')
plt.grid(True)

# logit
plt.subplot(224)
plt.plot(x, y)
plt.yscale('logit')
plt.title('logit')
plt.grid(True)
# Format the minor tick labels of the y-axis into empty strings with
# `NullFormatter`, to avoid cumbering the axis with too many labels.
plt.gca().yaxis.set_minor_formatter(NullFormatter())
# Adjust the subplot layout, because the logit one may take more space
# than usual, due to y-tick labels like "1 - 10^{-3}"
plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25,
                    wspace=0.35)

plt.show()
../_images/matplotlib-plt-scale.png

Fig. 1.18. Logarithmic and other nonlinear axes

1.12. plt.plot() vs ax.plot()

fig = plt.figure()
plt.plot(data)
fig.show()
  1. plt.plot() takes the current figure and axes (creating new ones if none exist) and plots into them:

    line = plt.plot(data)

  2. Creating the axes explicitly gives the same result, but states which axes the plot goes to. Using ax.plot(...) is a must if you want to plot into multiple axes (possibly in one figure), for example when using subplots:

    ax = plt.axes()
    line = ax.plot(data)

  3. Creating the figure explicitly guarantees that nothing is added to a previous one. fig.add_axes() then explicitly creates a new axes with the given rectangle [left, bottom, width, height]; the rest is the same as in 2:

    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    line = ax.plot(data)

    A possible problem with fig.add_axes() is that it may add a new axes object to the figure that overlays the first one (or others). This happens if the requested rectangle does not match the existing ones.
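
The three variants can be compared in one runnable sketch. The list data is a hypothetical stand-in for the data variable used above. The sketch checks that plt.plot() draws on the same axes that plt.subplots() returned, and that fig.add_axes() adds a second axes on top of the first.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

data = [1, 3, 2, 4]  # hypothetical data

fig, ax = plt.subplots()

# The implicit call draws on the current axes, which here is ax
plt.plot(data)
same_axes = plt.gca() is ax

# The explicit call names its target, so it keeps working with several axes
ax.plot(data[::-1])

# add_axes always creates another Axes; this one covers the whole figure
overlay = fig.add_axes([0, 0, 1, 1])
```
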

1.13. Assignment

1.13.1. Trigonometry

  1. For x in the range 0.0 to 1.0, sampled every 0.01, plot the sin and cos functions of 2 * np.pi * x

  2. Create two separate figures:

    • Each function on its own subplot
    • Both functions on a single plot
  3. The subplots should be stacked one above the other

  4. Title each chart with the name of its trigonometric function

  5. Set the y-axis label text to “Function value”

  6. Color the x tick labels of the sin chart red

  7. Color the label for cos green

  8. Show a grid on both charts

  9. Draw the second figure with both functions overlaid on a single plot

Hint:
  • np.sin()
  • np.cos()

1.13.2. Iris scatter

  1. https://raw.githubusercontent.com/AstroMatt/book-python/master/data-vizualization/data/iris.csv
  2. Download the data from the URL given above
  3. For each species:
  4. Visualize the relation of sepal_length to sepal_width as a scatter plot using matplotlib
  5. Each species should have a different color
Hint:
  • pd.DataFrame.groupby()