4.20. DataFrame Plotting

4.20.1. Plot kinds
- line: Line Plot
- bar: Vertical Bar Plot
- barh: Horizontal Bar Plot
- hist: Histogram
- box: Boxplot
- density, kde: Kernel Density Estimation Plot
- area: Area Plot
- pie: Pie Plot
- scatter: Scatter Plot
- hexbin: Hexbin Plot
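You can request each kind in two ways: pass it as `kind=` or call the matching `df.plot.<kind>()` accessor method. Below is a minimal sketch on a small hypothetical DataFrame (the columns `a` and `b` are made up). It assumes the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: two columns of positive numbers
df = pd.DataFrame({
    'a': np.arange(1, 11),
    'b': np.arange(10, 0, -1),
})

# The kind= keyword and the matching accessor method draw the same chart
ax_bar_kind = df.plot(kind='bar')
ax_bar_method = df.plot.bar()

# 'scatter' and 'hexbin' plot one column against another, so x and y are required
ax_scatter = df.plot(kind='scatter', x='a', y='b')
ax_hexbin = df.plot.hexbin(x='a', y='b', gridsize=5)

# 'pie' on a DataFrame draws a single column, selected with y=...
ax_pie = df.plot(kind='pie', y='a')

plt.close('all')  # release the figures
```

Calling `scatter` or `hexbin` without `x` and `y` raises a `ValueError`. So does `pie` on a DataFrame without `y=` or `subplots=True`.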
4.20.2. Parameters
Parameter | Type | Default | Description |
---|---|---|---|
data | Series or DataFrame | None | The object for which the method is called |
x | label or position | None | Only used if data is a DataFrame |
y | label, position or list of label, positions | None | Allows plotting of one column versus another. Only used if data is a DataFrame |
kind | str | 'line' | The kind of plot to produce (see Plot kinds) |
ax | matplotlib axes object | None | Axes to draw on; if None, a new figure is created |
subplots | bool | False | Make separate subplots for each column |
sharex | bool | True if ax is None else False | If subplots is True, share the x axis and hide some x axis labels |
sharey | bool | False | If subplots is True, share the y axis and hide some y axis labels |
layout | tuple | None | (rows, columns) for the layout of subplots |
figsize | tuple | None | (width, height) in inches |
use_index | bool | True | Use index as ticks for x axis |
title | str or list | None | Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot |
grid | bool | None (matlab style default) | Axis grid lines |
legend | bool or 'reverse' | None | Place legend on axis subplots |
style | list or dict | None | matplotlib line style per column |
logx | bool or 'sym' | False | Use log scaling or symlog scaling on x axis |
logy | bool or 'sym' | False | Use log scaling or symlog scaling on y axis |
loglog | bool or 'sym' | False | Use log scaling or symlog scaling on both x and y axes |
xticks | sequence | None | Values to use for the xticks |
yticks | sequence | None | Values to use for the yticks |
xlim | 2-tuple/list | None | Set the x limits of the current axes |
ylim | 2-tuple/list | None | Set the y limits of the current axes |
rot | int | None | Rotation for ticks (xticks for vertical, yticks for horizontal plots) |
fontsize | int | None | Font size for xticks and yticks |
colormap | str or matplotlib colormap object | None | Colormap to select colors from. If a string, load the colormap with that name from matplotlib |
colorbar | bool | None | If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots) |
position | float | 0.5 (center) | Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end) |
table | bool, Series or DataFrame | False | If True, draw a table using the data in the DataFrame; the data is transposed to meet matplotlib's default layout. If a Series or DataFrame is passed, use the passed data to draw the table |
yerr | DataFrame, Series, array-like, dict or str | None | Error bars for the y values |
xerr | DataFrame, Series, array-like, dict or str | None | Error bars for the x values; accepts the same inputs as yerr |
secondary_y | bool or sequence | False | Whether to plot on the secondary y axis; if a list or tuple, which columns to plot on the secondary y axis |
mark_right | bool | True | When using a secondary_y axis, automatically mark the column labels with "(right)" in the legend |
sort_columns | bool | False | Sort column names to determine plot ordering (deprecated since pandas 1.5) |
xlabel | label | None | Label for the x axis; by default the index name, or the x column name for planar plots |
ylabel | label | None | Label for the y axis; by default no label, or the y column name for planar plots |
**kwargs | keywords | None | Options to pass to the matplotlib plotting method |
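The sketch below combines several of these parameters in a single call, using hypothetical data. It assumes a reasonably recent pandas (1.2 or newer) because of the list form of `title` and the `xlabel`/`ylabel` keywords.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: a linear and a quadratic series
df = pd.DataFrame({'linear': np.arange(1, 11), 'square': np.arange(1, 11) ** 2})

# One subplot per column, arranged as 1 row x 2 columns
axes = df.plot(
    subplots=True,
    layout=(1, 2),
    figsize=(8, 3),              # (width, height) in inches
    title=['Linear', 'Square'],  # one title per subplot
    grid=True,
    legend=False,
)

# A single axes with a logarithmic y axis, rotated tick labels and custom axis labels
ax = df.plot(logy=True, rot=45, fontsize=8, xlabel='step', ylabel='value')

plt.close('all')
```

With `subplots=True` and a `layout`, the result is an array of axes with the layout's shape rather than a single axes object.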
4.20.3. Prepare Data
import pandas as pd
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'
df = pd.read_csv(DATA)
df.columns = [
'Sepal length',
'Sepal width',
'Petal length',
'Petal width',
'Species'
]
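The `Species` column holds strings. `DataFrame.plot` draws only numeric columns, so that column is skipped automatically. The sketch below uses a three-row stand-in for the iris data with sample values chosen for illustration, so no download is needed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the iris frame (sample values for illustration)
df = pd.DataFrame({
    'Sepal length': [5.1, 7.0, 6.3],
    'Sepal width': [3.5, 3.2, 3.3],
    'Petal length': [1.4, 4.7, 6.0],
    'Petal width': [0.2, 1.4, 2.5],
    'Species': ['setosa', 'versicolor', 'virginica'],
})

ax = df.plot()  # kind='line' by default
labels = [line.get_label() for line in ax.get_lines()]
print(labels)   # only the four numeric columns get a line
plt.close('all')
```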
4.20.4. Generate Plot
4.20.10. Kernel Density Estimation Plot
# 'density' is an alias of 'kde': both calls draw the same plot
df.plot(kind='density')
df.plot(kind='kde')

Figure 4.21. Kernel Density Estimation Plot
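pandas computes this curve with `scipy.stats.gaussian_kde`, so SciPy must be installed for `kind='density'` or `kind='kde'`. To show what the curve represents, here is a NumPy-only sketch of a one-dimensional Gaussian KDE. It assumes Scott's rule for the bandwidth, which is the `gaussian_kde` default. `gaussian_kde_1d` is a hypothetical helper, not a pandas function.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`.

    Bandwidth follows Scott's rule, h = std * n ** (-1/5), which is the
    default of scipy.stats.gaussian_kde for one-dimensional data.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample
    u = (grid[:, None] - samples[None, :]) / h
    # Standard normal kernel, summed over samples and scaled by 1 / (n * h)
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * h)


# Hypothetical sepal-length-like sample
samples = [4.9, 5.0, 5.1, 5.8, 6.3, 6.7, 7.0]
grid = np.linspace(3, 9, 601)
density = gaussian_kde_1d(samples, grid)

# A density integrates to approximately 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother curve. In pandas you can tune it with the `bw_method` keyword of `df.plot.kde()`.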
4.20.11. Area Plot
df.plot(kind='area')

Figure 4.22. Area Plot

Figure 4.23. Cumulative Flow Diagram in Atlassian Jira
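Area plots are stacked by default (`stacked=True`). Stacking is what makes them suitable for a cumulative flow diagram like the Jira one: the top edge of the stack is the total count. The sketch below uses hypothetical issue counts. When stacking, pandas requires each column to be either all positive or all negative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical issue counts per workflow state, one row per day
flow = pd.DataFrame(
    {'Done': [0, 2, 5, 9], 'In progress': [3, 4, 4, 3], 'To do': [10, 7, 4, 1]},
    index=pd.Index([1, 2, 3, 4], name='Day'),
)

ax = flow.plot.area()                        # stacked=True is the default
ax_overlap = flow.plot.area(stacked=False)   # areas overlap instead of adding up

# With stacking, the line on top of the last area is the row total
top_edge = ax.get_lines()[-1].get_ydata()
plt.close('all')
```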
4.20.15. Other
4.20.16. Hist
import matplotlib.pyplot as plt
import pandas as pd
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'
df = pd.read_csv(DATA)
df.hist()
plt.show()

Figure 4.27. Visualization using hist
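`df.hist()` draws one histogram per numeric column on a grid of subplots and returns a 2-D array of axes. The sketch below uses hypothetical random data shaped like the iris frame to illustrate the `bins`, `layout` and `figsize` parameters.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical frame: 4 numeric columns and 1 text column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)),
                  columns=['Sepal length', 'Sepal width', 'Petal length', 'Petal width'])
df['Species'] = 'setosa'

# One histogram per numeric column with 10 bins each; the text column is skipped
axes = df.hist(bins=10, layout=(2, 2), figsize=(8, 6))
plt.close('all')
```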
4.20.17. Density
import matplotlib.pyplot as plt
import pandas as pd
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'
df = pd.read_csv(DATA)
df.plot(kind='density', subplots=True, layout=(2,2), sharex=False)
plt.show()

Figure 4.28. Visualization using density
4.20.18. Box
import matplotlib.pyplot as plt
import pandas as pd
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'
df = pd.read_csv(DATA)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()

Figure 4.29. Visualization using box
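The box spans the first to the third quartile. With matplotlib's default `whis=1.5`, the whiskers reach the most extreme points that lie within 1.5 IQR of the box, and any point beyond that is drawn as an outlier. Here is a worked example on a hypothetical series with one extreme value.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with one extreme value
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], name='value')

# Quartiles (linear interpolation, as np.percentile uses) and interquartile range
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (matplotlib's default whis=1.5)
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
low_whisker = s[s >= low_fence].min()
high_whisker = s[s <= high_fence].max()
outliers = s[(s < low_fence) | (s > high_fence)]

ax = s.plot.box()  # matplotlib draws the box from these same statistics
plt.close('all')
```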
4.20.19. Scatter matrix
In pandas 0.20 the plotting module was moved from pandas.tools.plotting to pandas.plotting. In pandas 0.19 and earlier, the pandas.plotting module does not exist.
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'
df = pd.read_csv(DATA)
scatter_matrix(df)
plt.show()

Figure 4.30. Visualization using scatter matrix
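`scatter_matrix` keeps only the numeric columns and forwards extra keyword arguments to matplotlib's `scatter`. This lets you colour the points by a label column. The sketch below uses hypothetical random data, and the colour mapping is an arbitrary choice.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical iris-like frame: 4 numeric columns plus a text label
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 4)),
                  columns=['Sepal length', 'Sepal width', 'Petal length', 'Petal width'])
df['Species'] = np.repeat(['setosa', 'versicolor', 'virginica'], 10)

# Colour each point by its species (mapping chosen for illustration)
colors = df['Species'].map({'setosa': 'tab:blue',
                            'versicolor': 'tab:orange',
                            'virginica': 'tab:green'})

axes = scatter_matrix(
    df,                   # only the numeric columns are plotted
    c=colors.to_numpy(),  # forwarded to matplotlib's scatter
    diagonal='hist',      # histograms on the diagonal; 'kde' needs SciPy
    alpha=0.8,
    figsize=(8, 8),
)
plt.close('all')
```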
4.20.21. Further Reading
4.20.22. Assignments
"""
* Assignment: DataFrame Plot
* Complexity: medium
* Lines of code: 15 lines
* Time: 21 min
English:
1. Use data from "Given" section (see below)
2. Read data from `DATA` as `df: pd.DataFrame`
3. Select the `Luminance` sheet
4. Parse column with dates
5. Select desired date and location, then resample by hour
6. Display chart (line) with activity hours in "Sleeping Quarters upper" location
7. Active is when `Luminance` is not zero
8. Easy: for day 2019-09-28
9. Advanced: for each day, as subplots
Hints:
* `pd.Series.apply(np.sign)` (see the Numpy signum section)
* `pd.Series.resample('H').sum()`
Tests:
>>> type(result) is pd.Series
True
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 10)
>>> result # doctest: +NORMALIZE_WHITESPACE
datetime
2019-09-28 00:00:00+00:00 1
2019-09-28 01:00:00+00:00 1
2019-09-28 02:00:00+00:00 1
2019-09-28 03:00:00+00:00 1
2019-09-28 04:00:00+00:00 0
..
2019-09-28 19:00:00+00:00 1
2019-09-28 20:00:00+00:00 1
2019-09-28 21:00:00+00:00 1
2019-09-28 22:00:00+00:00 1
2019-09-28 23:00:00+00:00 1
Freq: H, Name: value, Length: 24, dtype: int64
"""
# Given
import numpy as np
import pandas as pd
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/xlsx/sensors-optima.xlsx'
WHERE = 'Sleeping Quarters upper'
WHEN = '2019-09-28'
result = ...