5.22. DataFrame Plotting¶

../../_images/matplotlib-figure-anatomy.png

5.22.1. Plot kinds¶

line - Line Plot
bar - Vertical Bar Plot
barh - Horizontal Bar Plot
hist - Histogram
box - Boxplot
density, kde - Kernel Density Estimation Plot
area - Area Plot
pie - Pie Plot
scatter - Scatter Plot
hexbin - Hexbin Plot
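
Each kind listed above can also be reached as a method on the `plot` accessor. The following is a minimal sketch, assuming a small made-up frame named `demo` (not part of the original material) instead of the iris data, and the non-interactive `Agg` backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Small made-up frame standing in for the iris data (assumption).
demo = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

ax_kind = demo.plot(kind="line")  # kind passed as a keyword argument
ax_method = demo.plot.line()      # same plot through the accessor method

# Both calls draw one line per numeric column on a fresh figure.
n_kind = len(ax_kind.get_lines())
n_method = len(ax_method.get_lines())
```

Both spellings should produce the same chart; the keyword form is convenient when the kind is stored in a variable.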

5.22.2. Parameters¶

Table 5.13. Parameters¶
Parameter	Default value
x	`None`
y	`None`
kind	`line`
ax	`None`
subplots	`False`
sharex	`None`
sharey	`False`
layout	`None`
figsize	`None`
use_index	`True`
title	`None`
grid	`None`
legend	`True`
style	`None`
logx	`False`
logy	`False`
loglog	`False`
xticks	`None`
yticks	`None`
xlim	`None`
ylim	`None`
rot	`None`
fontsize	`None`
colormap	`None`
table	`False`
yerr	`None`
xerr	`None`
secondary_y	`False`
sort_columns	`False`
xlabel	`None`
ylabel	`None`

Table 5.14. Parameters¶
Parameter	Type	Default	Description
`data`	Series or DataFrame	None	The object for which the method is called
`x`	label or position	None	Only used if data is a DataFrame
`y`	label, position or list of label, positions	None	Allows plotting of one column versus another. Only used if data is a DataFrame.
`kind`	str	`line`	`line`, `bar`, `barh`, `hist`, `box`, `kde`, `density`, `area`, `pie`, `scatter`, `hexbin`
`figsize`	tuple	None	(width, height) in inches
`use_index`	bool	True	Use index as ticks for x axis
`title`	str or list	None	Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.
`grid`	bool	None	(matlab style default) Axis grid lines
`legend`	bool or 'reverse'	True	Place legend on axis subplots
`style`	list or dict	None	matplotlib line style per column
`logx`	bool or 'sym'	False	Use log scaling or symlog scaling on x axis
`logy`	bool or 'sym'	False	Use log scaling or symlog scaling on y axis
`loglog`	bool or 'sym'	False	Use log scaling or symlog scaling on both x and y axes
`xticks`	sequence	None	Values to use for the xticks
`yticks`	sequence	None	Values to use for the yticks
`xlim`	2-tuple/list	None	Limits of the x axis
`ylim`	2-tuple/list	None	Limits of the y axis
`rot`	int	None	Rotation for ticks (xticks for vertical, yticks for horizontal plots)
`fontsize`	int	None	Font size for xticks and yticks
`colormap`	str or matplotlib colormap object	None	Colormap to select colors from. If string, load colormap with that name from matplotlib.
`colorbar`	bool	None	If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots)
`position`	float	0.5 (center)	Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end).
`table`	bool, Series or DataFrame	False	If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib's default layout. If a Series or DataFrame is passed, use passed data to draw a table.
`yerr`	DataFrame, Series, array-like, dict or str	None	Values for the vertical error bars.
`xerr`	DataFrame, Series, array-like, dict or str	None	Values for the horizontal error bars; accepts the same forms as yerr.
`mark_right`	bool	True	When using a secondary_y axis, automatically mark the column labels with "(right)" in the legend.
`**kwds`	keywords	None	Options to pass to matplotlib plotting method.
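
To see how a few of these parameters end up on the chart, here is a minimal sketch. It assumes a made-up frame named `demo` with strictly positive values (needed for log scaling) and the `Agg` backend; neither comes from the original material:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up positive values (assumption); log scaling needs values > 0.
demo = pd.DataFrame({"a": [1, 10, 100, 1000], "b": [2, 20, 200, 2000]})

ax = demo.plot(
    kind="line",
    figsize=(8, 3),  # (width, height) in inches
    title="Demo",    # printed above the plot
    grid=True,       # draw axis grid lines
    logy=True,       # logarithmic y axis
    xlim=(0, 3),     # limits of the x axis
)

# The returned matplotlib Axes reflects the arguments.
title = ax.get_title()
yscale = ax.get_yscale()
width, height = ax.figure.get_size_inches()
xlim = tuple(ax.get_xlim())
```

Inspecting the returned Axes this way is a quick check that an argument was understood as intended.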

5.22.3. Setup¶

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>>
>>> DATA = 'https://python3.info/_static/iris-clean.csv'
>>>
>>> df = pd.read_csv(DATA)

5.22.4. Line Plot¶

Default line plot, one line per numeric column:

>>> plot = df.plot(kind='line')
>>> plt.show()  

../../_images/pandas-dataframe-plot-line.png — Figure 5.18. Line Plot¶

>>> plot = df.plot(kind='line', subplots=True)
>>> plt.show()  

../../_images/pandas-dataframe-plot-line-subplots.png — Figure 5.19. Line Plot with Subplots¶

>>> plot = df.plot(kind='line',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True)
>>> plt.show()  

../../_images/pandas-dataframe-plot-line-layout.png — Figure 5.20. Line Plot with Subplots and Layout¶
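
With `subplots=True` the call returns an array of Axes rather than a single one, which matters when you want to adjust individual panels afterwards. A minimal sketch, assuming a made-up four-column frame named `demo` and the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up frame with four numeric columns (assumption).
demo = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [3, 2, 1],
    "c": [2, 2, 2],
    "d": [1, 3, 1],
})

# Without layout: a flat array, one Axes per column.
flat = demo.plot(kind="line", subplots=True)

# With layout=(2, 2): the same Axes arranged as a 2x2 grid.
grid = demo.plot(kind="line", subplots=True, layout=(2, 2),
                 sharex=True, sharey=True)

# Each panel can then be customized individually.
for ax in grid.flat:
    ax.set_ylabel("cm")

flat_shape = flat.shape
grid_shape = grid.shape
```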

5.22.5. Vertical Bar Plot¶

>>> plot = df.plot(kind='bar', subplots=True, layout=(2,2))
>>> plt.show()  

../../_images/pandas-dataframe-plot-bar.png — Figure 5.21. Vertical Bar Plot¶

5.22.6. Horizontal Bar Plot¶

>>> plot = df.plot(kind='barh',
...                title='Iris',
...                ylabel='centimeters',
...                xlabel='iris',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True,
...                legend=True,  # accepts only bool or 'reverse'
...                grid=True,
...                figsize=(10,10))
>>> plt.show()  

../../_images/pandas-dataframe-plot-barh.png — Figure 5.22. Horizontal Bar Plot¶

5.22.7. Histogram¶

>>> plot = df.plot(kind='hist',
...                rwidth=0.8,
...                xlabel='centimeters',
...                title='Iris Dimensions Frequency')
>>> plt.show()  

../../_images/pandas-dataframe-plot-hist.png — Figure 5.23. Histogram¶

>>> plot = df.plot(kind='hist',
...                rwidth=0.8,
...                xlabel='centimeters',
...                title='Iris Dimensions Frequency',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True)
>>> plt.show()  

../../_images/pandas-dataframe-plot-hist-layout.png — Figure 5.24. Histogram¶

>>> plot = df.hist()
>>> plt.show()  

>>> plot = df['sepal_length'].hist(bins=3,
...                                rwidth=0.8,
...                                legend=None,
...                                grid=False)
>>>
>>> _ = plot.xaxis.set_ticks(ticks=[4.9, 6.1, 7.3],
...                          labels=['small', 'medium', 'large'])
>>> plt.show()  

../../_images/pandas-dataframe-plot-hist-categories.png — Figure 5.26. Visualization using hist¶
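
The tick positions 4.9, 6.1 and 7.3 used above look like the centers of the three bins. The sketch below reproduces them with NumPy, assuming that `sepal_length` ranges from 4.3 to 7.9 as in the classic iris data; the sample values are made up and only their minimum and maximum matter:

```python
import numpy as np

# Made-up sample (assumption); only min=4.3 and max=7.9 affect the edges.
values = np.array([4.3, 5.0, 5.8, 6.4, 7.9])

# Three equal-width bins between the minimum and the maximum.
edges = np.histogram_bin_edges(values, bins=3)

# The center of each bin is the midpoint of its two edges.
centers = (edges[:-1] + edges[1:]) / 2
```

Computing the centers instead of hard-coding them keeps the labels aligned if the data or the number of bins changes.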

5.22.8. Boxplot¶

>>> plot = df.plot(kind='box')
>>> plt.show()  

../../_images/pandas-dataframe-plot-box.png — Figure 5.27. Boxplot¶

>>> plot = df.plot(kind='box',
...                subplots=True,
...                layout=(2,2),
...                sharex=False,
...                sharey=False)
>>>
>>> plt.show()  

../../_images/pandas-dataframe-plot-box-layout.png — Figure 5.28. Boxplot with layout¶

5.22.9. Kernel Density Estimation Plot¶

`kind='density'` and `kind='kde'` are aliases; both draw a Kernel Density Estimation plot. Computing the estimate requires SciPy to be installed.

>>> plot = df.plot(kind='density')
>>> plt.show()  

../../_images/pandas-dataframe-plot-density.png — Figure 5.29. Kernel Density Estimation Plot¶

>>> plot = df.plot(kind='density',
...                subplots=True,
...                layout=(2,2),
...                sharex=False)
>>> plt.subplots_adjust(hspace=0.5, wspace=0.5)  # margins between charts
>>> plt.show()  

../../_images/pandas-dataframe-plot-density-margin.png — Figure 5.30. Density plot with margins¶

5.22.10. Area Plot¶

>>> plot = df.plot(kind='area')
>>> plt.show()  

../../_images/pandas-dataframe-plot-area.png — Figure 5.31. Area Plot¶

../../_images/pandas-dataframe-plot-cumulative-flow-diagram.png — Figure 5.32. Cumulative Flow Diagram in Atlassian Jira¶

5.22.11. Pie Plot¶

The `colors` argument of the pie plot accepts Matplotlib color names [1].

../../_images/matplotlib-colors.png — Figure 5.33. List of Matplotlib color names [1]¶

>>> data = pd.cut(df['sepal_length'],
...               bins=[3, 5, 7, np.inf],
...               labels=['small', 'medium', 'large'],
...               include_lowest=True).value_counts()
>>>
>>> plot = data.plot(kind='pie',
...                  autopct='%1.0f%%',
...                  colors=['plum', 'violet', 'magenta'],
...                  explode=[0.1, 0, 0],
...                  shadow=True,
...                  startangle=-215,
...                  xlabel=None,
...                  ylabel=None,
...                  title='sepal_length\nsmall: 3.0 to 5.0\nmedium: 5.0 to 7.0\nlarge: 7.0 to inf',
...                  figsize=(10,10))
>>>
>>> plt.show()  

../../_images/pandas-dataframe-plot-pie.png — Figure 5.34. Pie Plot¶
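
How `pd.cut` assigns values to the `small`, `medium` and `large` groups is easiest to see on a handful of numbers. A minimal sketch with made-up lengths (an assumption, chosen to sit on and around the bin edges):

```python
import numpy as np
import pandas as pd

# Made-up lengths on and around the bin edges (assumption).
lengths = pd.Series([4.3, 5.0, 5.1, 6.9, 7.0, 7.1, 7.9])

groups = pd.cut(lengths,
                bins=[3, 5, 7, np.inf],
                labels=['small', 'medium', 'large'],
                include_lowest=True)

# Bins are closed on the right: 5.0 is 'small' and 7.0 is 'medium'.
assigned = groups.tolist()

# sort=False keeps the category order instead of ordering by count.
counts = groups.value_counts(sort=False)
```

Note that a plain `value_counts()` orders the groups by count, so positional arguments such as `colors` and `explode` then follow that order rather than the label order.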

5.22.12. Scatter Plot¶

>>> plot = df.plot(kind='scatter', x='sepal_length', y='sepal_width')
>>> plt.show()  

../../_images/pandas-dataframe-plot-scatter-sepal.png — Figure 5.35. Scatter plot: sepal_length vs sepal_width¶

>>> plot = df.plot(kind='scatter', x='petal_length', y='petal_width')
>>> plt.show()  

../../_images/pandas-dataframe-plot-scatter-petal.png — Figure 5.36. Scatter plot: petal_length vs petal_width¶

>>> data = df.replace({'setosa': 0,
...                    'virginica': 1,
...                    'versicolor': 2})
>>>
>>> plot = data.plot(kind='scatter',
...                  x='sepal_length',
...                  y='sepal_width',
...                  colormap='viridis',
...                  c='species')
>>> plt.show()  

../../_images/pandas-dataframe-plot-scatter-viridis.png — Figure 5.37. Scatter plot using viridis colormap¶
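
The `c` argument needs numbers, which is why the species names are replaced with codes first. As an alternative sketch, `Series.map` restricts the conversion to a single column; the frame `demo` and its values below are made up, and the `Agg` backend is assumed so that no display is required:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up rows standing in for the iris data (assumption).
demo = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 5.9, 4.9],
    "sepal_width": [3.5, 3.3, 3.0, 3.0],
    "species": ["setosa", "virginica", "versicolor", "setosa"],
})

# Translate each species name to a numeric code, in one column only.
mapping = {"setosa": 0, "virginica": 1, "versicolor": 2}
demo["species_code"] = demo["species"].map(mapping)

ax = demo.plot(kind="scatter",
               x="sepal_length",
               y="sepal_width",
               c="species_code",  # numeric column drives the point color
               colormap="viridis")

codes = demo["species_code"].tolist()
```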

5.22.13. Hexbin Plot¶

>>> plot = df.plot(kind='hexbin', x='petal_length', y='petal_width')
>>> plt.show()  

../../_images/pandas-dataframe-plot-hexbin.png — Figure 5.38. Hexbin Plot¶

5.22.14. Scatter matrix¶

In pandas 0.20 the plotting module moved from `pandas.tools.plotting` to `pandas.plotting`.
Versions up to 0.19 do not have the `pandas.plotting` module.

>>> from pandas.plotting import scatter_matrix
>>>
>>> plot = scatter_matrix(df)
>>> plt.show()  

../../_images/pandas-dataframe-plot-scattermatrix.png — Figure 5.39. Scatter Matrix¶

>>> data = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
>>> colors = df['species'].replace({'setosa': 0, 'virginica': 1, 'versicolor': 2})  # colors must be numerical
>>>
>>> plot = scatter_matrix(data, c=colors)
>>> plt.show()  

../../_images/pandas-dataframe-plot-scattermatrix-colors.png — Figure 5.40. Scatter Matrix with colors¶
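
`scatter_matrix` returns a square array of Axes, one row and one column per numeric variable, which can be used to tweak single panels. A minimal sketch, assuming random made-up data in a frame named `demo` and the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up random data with three numeric columns (assumption).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])

# Histograms on the diagonal, pairwise scatter plots elsewhere.
axes = scatter_matrix(demo, diagonal="hist")

shape = axes.shape  # one row and one column per variable
```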

5.22.15. Actinograms¶

../../_images/pandas-dataframe-actinogram-1.png

../../_images/pandas-dataframe-actinogram-2.png

5.22.16. Further Reading¶

5.22.17. References¶

5.22.18. Assignments¶

Code 5.109. Solution¶

"""
* Assignment: DataFrame Plot
* Complexity: medium
* Lines of code: 15 lines
* Time: 21 min

English:
    1. Read data from `DATA` as `df: pd.DataFrame`
    2. Select `Luminance` sheet
    3. Parse column with dates
    4. Select desired date and location, then resample by hour
    5. Display chart (line) with activity hours in "Sleeping Quarters upper" location
    6. Active is when `Luminance` is not zero
    7. Easy: for day 2019-09-28
    8. Advanced: for each day, as subplots
    9. Run doctests - all must succeed

Polish:
    1. Wczytaj dane z `DATA` jako `df: pd.DataFrame`
    2. Wybierz arkusz `Luminance`
    3. Sparsuj kolumny z datami
    4. Wybierz pożądaną datę i lokację, następnie próbkuj co godzinę
    5. Aktywność jest gdy `Luminance` jest różna od zera
    6. Wyświetl wykres (line) z godzinami aktywności w dla lokacji "Sleeping Quarters upper"
    7. Łatwe: dla dnia 2019-09-28
    8. Zaawansowane: dla wszystkich dni, jako subplot
    9. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `pd.Series.apply(np.sign)` :ref:`Numpy signum`
    * `pd.Series.resample('H').sum()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> pd.set_option('display.width', 500)
    >>> pd.set_option('display.max_columns', 10)
    >>> pd.set_option('display.max_rows', 10)

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is pd.Series, \
    'Variable `result` must be a `pd.Series` type'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    datetime
    2019-09-28 00:00:00+00:00    1
    2019-09-28 01:00:00+00:00    1
    2019-09-28 02:00:00+00:00    1
    2019-09-28 03:00:00+00:00    1
    2019-09-28 04:00:00+00:00    0
                                ..
    2019-09-28 19:00:00+00:00    1
    2019-09-28 20:00:00+00:00    1
    2019-09-28 21:00:00+00:00    1
    2019-09-28 22:00:00+00:00    1
    2019-09-28 23:00:00+00:00    1
    Freq: H, Name: value, Length: 24, dtype: int64
"""

import numpy as np
import pandas as pd


DATA = 'https://python3.info/_static/sensors-optima.xlsx'
WHERE = 'Sleeping Quarters upper'
WHEN = '2019-09-28'

# type: pd.Series
result = ...