4.20. DataFrame Plotting

../../_images/matplotlib-figure-anatomy1.png

4.20.1. Plot kinds

  • line - Line Plot

  • bar - Vertical Bar Plot

  • barh - Horizontal Bar Plot

  • hist - Histogram

  • box - Boxplot

  • density, kde - Kernel Density Estimation Plot

  • area - Area Plot

  • pie - Pie Plot

  • scatter - Scatter Plot

  • hexbin - Hexbin Plot

4.20.2. Parameters

Table 4.10. Parameters

Parameter     Default value
x             None
y             None
kind          line
ax            None
subplots      False
sharex        None
sharey        False
layout        None
figsize       None
use_index     True
title         None
grid          None
legend        True
style         None
logx          False
logy          False
loglog        False
xticks        None
yticks        None
xlim          None
ylim          None
rot           None
fontsize      None
colormap      None
table         False
yerr          None
xerr          None
secondary_y   False
sort_columns  False
xlabel        None
ylabel        None
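As a minimal sketch of overriding a few of these defaults, the example below plots one column against another on a logarithmic y axis. The DataFrame and its column names (t, value) are made up for illustration and do not come from the chapter's dataset.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import pandas as pd

# Hypothetical data: a value that grows tenfold at every step
df = pd.DataFrame({'t': [0, 1, 2, 3, 4], 'value': [1, 10, 100, 1000, 10000]})

ax = df.plot(
    x='t',            # column used for the x axis
    y='value',        # column plotted against x
    kind='line',      # the default kind, written out explicitly
    figsize=(8, 4),   # (width, height) in inches
    title='Growth',
    grid=True,
    logy=True,        # logarithmic scale on the y axis
)
```

The call returns a matplotlib Axes object, so further tweaks can be made with the usual matplotlib methods.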

Table 4.11. Parameters

Parameter   Type                                         Default       Description
data        Series or DataFrame                          None          The object for which the method is called
x           label or position                            None          Only used if data is a DataFrame
y           label, position or list of label, positions  None          Allows plotting of one column versus another. Only used if data is a DataFrame.
kind        str                                          line          One of: line, bar, barh, hist, box, kde, density, area, pie, scatter, hexbin
figsize     tuple                                        None          (width, height) in inches
use_index   bool                                         True          Use index as ticks for x axis
title       str or list                                  None          Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.
grid        bool                                         None          Axis grid lines (None means the MATLAB-style default)
legend      bool or 'reverse'                            True          Place legend on axis subplots
style       list or dict                                 None          matplotlib line style per column
logx        bool or 'sym'                                False         Use log scaling or symlog scaling on x axis
logy        bool or 'sym'                                False         Use log scaling or symlog scaling on y axis
loglog      bool or 'sym'                                False         Use log scaling or symlog scaling on both x and y axes
xticks      sequence                                     None          Values to use for the xticks
yticks      sequence                                     None          Values to use for the yticks
xlim        2-tuple/list                                 None          Limits (min, max) of the x axis
ylim        2-tuple/list                                 None          Limits (min, max) of the y axis
rot         int                                          None          Rotation for ticks (xticks for vertical, yticks for horizontal plots)
fontsize    int                                          None          Font size for xticks and yticks
colormap    str or matplotlib colormap object            None          Colormap to select colors from. If string, load colormap with that name from matplotlib.
colorbar    bool                                         None          If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots)
position    float                                        0.5 (center)  Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end).
table       bool, Series or DataFrame                    False         If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib's default layout. If a Series or DataFrame is passed, use passed data to draw a table.
yerr        DataFrame, Series, array-like, dict or str   None          Error values for vertical (y) error bars
xerr        DataFrame, Series, array-like, dict or str   None          Error values for horizontal (x) error bars; accepts the same forms as yerr
mark_right  bool                                         True          When using a secondary_y axis, automatically mark the column labels with "(right)" in the legend.
**kwds      keywords                                     None          Options to pass to matplotlib plotting method.
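The interaction between subplots, layout and a list-valued title can be sketched as follows. The four columns and their titles are hypothetical; the only assumption is that the number of titles matches the number of plotted columns.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import pandas as pd

# Hypothetical data with four numeric columns
df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [4, 3, 2, 1],
    'c': [2, 2, 3, 3],
    'd': [1, 3, 1, 3],
})
titles = ['Column a', 'Column b', 'Column c', 'Column d']

axes = df.plot(
    subplots=True,    # one subplot per column
    layout=(2, 2),    # arrange the subplots in 2 rows and 2 columns
    sharex=False,     # every subplot keeps its own x axis
    title=titles,     # one title per subplot, in column order
    legend=False,
    figsize=(8, 6),
)
```

With subplots=True and a layout, the return value is a 2 x 2 array of Axes rather than a single Axes object.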

4.20.3. Prepare Data

import pandas as pd


DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'

df = pd.read_csv(DATA)
df.columns = [
    'Sepal length',
    'Sepal width',
    'Petal length',
    'Petal width',
    'Species'
]
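When the URL is not reachable, the same column layout can be reproduced offline. The sketch below is an assumption about the file's shape (a header row followed by four measurements and a species name); it reads a few sample rows from an in-memory string and replaces the header in a single read_csv call.

```python
import io
import pandas as pd

# Sample rows only; the header names of the real file are an assumption
CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

COLUMNS = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']

# header=0 discards the file's own header row, names= supplies the new labels
df = pd.read_csv(io.StringIO(CSV), header=0, names=COLUMNS)
```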

4.20.4. Generate Plot

4.20.4.1. Line Plot

  • line is the default kind, so both calls below produce the same plot

df.plot()
df.plot(kind='line')
../../_images/pandas-dataframe-plot-line.png

Figure 4.19. Line Plot

4.20.4.2. Vertical Bar Plot

df.plot(kind='bar')
../../_images/pandas-dataframe-plot-bar.png

Figure 4.20. Vertical Bar Plot

4.20.4.3. Horizontal Bar Plot

df.plot(kind='barh')
../../_images/pandas-dataframe-plot-barh.png

Figure 4.21. Horizontal Bar Plot

4.20.4.4. Histogram

df.plot(kind='hist')
../../_images/pandas-dataframe-plot-hist.png

Figure 4.22. Histogram

4.20.4.5. Boxplot

df.plot(kind='box')
../../_images/pandas-dataframe-plot-box.png

Figure 4.23. Boxplot

4.20.4.6. Kernel Density Estimation Plot

df.plot(kind='density')
df.plot(kind='kde')
../../_images/pandas-dataframe-plot-density.png

Figure 4.24. Kernel Density Estimation Plot

4.20.4.7. Area Plot

df.plot(kind='area')
../../_images/pandas-dataframe-plot-area.png

Figure 4.25. Area Plot

../../_images/pandas-dataframe-plot-cumulative-flow-diagram.png

Figure 4.26. Cumulative Flow Diagram in Atlassian Jira
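A cumulative flow diagram is a stacked area plot of how many items sit in each state over time. The sketch below uses made-up ticket counts and hypothetical state names; it is not derived from the Jira figure.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import pandas as pd

# Made-up number of tickets in each state at the end of each day
flow = pd.DataFrame(
    {
        'Done': [0, 1, 3, 6, 8],
        'In Progress': [1, 3, 3, 2, 2],
        'To Do': [9, 7, 6, 5, 4],
    },
    index=pd.RangeIndex(1, 6, name='Day'),
)

# stacked=True (the default for area plots) piles the columns on top of each other
ax = flow.plot(kind='area', stacked=True, title='Cumulative flow (made-up data)')
```

Because the areas are stacked, the top edge of the plot follows the total number of tickets per day.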

4.20.4.8. Pie Plot

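# pie needs a single column (y=...) or subplots=True; here: number of rows per species
df['Species'].value_counts().plot(kind='pie')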
pandas/dataframe/img/pandas-dataframe-plot-pie.png

Figure 4.27. Pie Plot

4.20.4.9. Scatter Plot

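# scatter requires both x and y; any two numeric columns can be used
df.plot(kind='scatter', x='Sepal length', y='Sepal width')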
pandas/dataframe/img/pandas-dataframe-plot-scatter.png

Figure 4.28. Scatter Plot
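Points can also be colored by category. The sketch below assumes the Species column holds the three iris species names; the palette dictionary is a hypothetical helper, and the DataFrame holds only a few sample rows so that the example runs offline.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import pandas as pd

# A few sample rows with the same column names as in the chapter
df = pd.DataFrame({
    'Sepal length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'Sepal width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'Species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# Hypothetical mapping from species name to a matplotlib color name
palette = {'setosa': 'tab:blue', 'versicolor': 'tab:orange', 'virginica': 'tab:green'}

ax = df.plot(
    kind='scatter',
    x='Sepal length',
    y='Sepal width',
    c=df['Species'].map(palette),  # one color per row
)
```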

4.20.4.10. Hexbin Plot

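# hexbin requires both x and y; any two numeric columns can be used
df.plot(kind='hexbin', x='Sepal length', y='Sepal width')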
pandas/dataframe/img/pandas-dataframe-plot-hexbin.png

Figure 4.29. Hexbin Plot
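Hexbin plots are most useful when there are too many points for a scatter plot. The sketch below uses randomly generated data (not the iris dataset) and sets gridsize, which controls the number of hexagons along the x direction.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import numpy as np
import pandas as pd

# Randomly generated points, purely for illustration
rng = np.random.default_rng(0)
points = pd.DataFrame({'x': rng.normal(size=500), 'y': rng.normal(size=500)})

# A smaller gridsize gives fewer, larger hexagons
ax = points.plot(kind='hexbin', x='x', y='y', gridsize=15)
```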

4.20.5. Other

4.20.5.1. Hist

import matplotlib.pyplot as plt
import pandas as pd


DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'

df = pd.read_csv(DATA)
df.hist()
plt.show()
../../_images/pandas-dataframe-plot-hist.png

Figure 4.30. Visualization using hist

4.20.5.2. Density

import matplotlib.pyplot as plt
import pandas as pd


DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'

df = pd.read_csv(DATA)
df.plot(kind='density', subplots=True, layout=(2,2), sharex=False)
plt.show()
../../_images/pandas-dataframe-plot-density2.png

Figure 4.31. Visualization using density

4.20.5.3. Box

import matplotlib.pyplot as plt
import pandas as pd


DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'

df = pd.read_csv(DATA)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
../../_images/pandas-dataframe-plot-box2.png

Figure 4.32. Visualization using box

4.20.5.4. Scatter matrix

  • In pandas 0.20 the plotting module was moved from pandas.tools.plotting to pandas.plotting

  • In version 0.19 and earlier, pandas.plotting did not exist

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix


DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-clean.csv'

df = pd.read_csv(DATA)
scatter_matrix(df)
plt.show()
../../_images/pandas-dataframe-plot-scatter-matrix.png

Figure 4.33. Visualization using scatter matrix
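scatter_matrix accepts a few options of its own. The sketch below runs on randomly generated data with hypothetical column names and shows that non-numeric columns are left out of the grid.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to open a window
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Randomly generated data, purely for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=['a', 'b', 'c'])
df['label'] = 'x'  # non-numeric column, ignored by scatter_matrix

axes = scatter_matrix(
    df,
    alpha=0.5,        # transparency of the points
    figsize=(6, 6),   # (width, height) in inches
    diagonal='hist',  # histograms on the diagonal
)
```

The result is a square array of Axes with one row and one column per numeric column of the DataFrame.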

4.20.6. Actinograms

../../_images/pandas-dataframe-actinogram-1.png
../../_images/pandas-dataframe-actinogram-2.png

4.20.8. Assignments

4.20.8.1. DataFrame Plot

  • Assignment: DataFrame Plot

  • Last update: 2020-10-01

  • Complexity level: medium

  • Lines of code to write: 15 lines

  • Estimated time of completion: 21 min

  • Filename: solution/df_plot.py

English:
  1. Use data from "Given" section (see below)

  2. Read data from DATA as sensors: pd.DataFrame

  3. Select the Luminance sheet

  4. Parse the columns with dates

  5. Select the desired date and location, then resample by hour

  6. Display a line chart with activity hours in the "Sleeping Quarters upper" location

  7. Activity is when Luminance is not zero

  8. Easy: for day 2019-09-28

  9. Advanced: for each day, as subplots

Given:
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/xlsx/sensors-optima.xlsx'
WHERE = 'Sleeping Quarters upper'
WHEN = '2019-09-28'
Hints:
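  • pd.Series.apply(np.sign) - signum: maps positive values to 1, zero to 0 and negative values to -1

  • pd.Series.resample('H').sum() - sum of the values in each hourly bin (newer pandas versions prefer the lowercase alias 'h')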
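How the two hints combine can be sketched on made-up readings. The values below are not taken from the sensors file, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up luminance readings taken every 10 minutes (not from the real file)
index = pd.date_range('2019-09-28 00:00', periods=12, freq='10min')
luminance = pd.Series([0, 0, 5, 7, 0, 0, 3, 3, 3, 0, 0, 9], index=index)

# Signum turns every non-zero reading into 1, giving a 0/1 activity flag
active = luminance.apply(np.sign)

# Summing the flags per hour counts the active samples in that hour
hourly = active.resample('h').sum()
print(hourly)
```

With these readings the hourly totals are 2 for the first hour and 4 for the second.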