5.30. DataFrame Mapping

import pandas as pd
import numpy as np
np.random.seed(0)

df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

df
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 1999-12-31  1.867558 -0.977278  0.950088 -0.151357
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.30.1. Map

  • .map() works element-wise on a Series

df['Morning'].map(lambda value: round(value, 2))
# 1999-12-30    1.76
# 1999-12-31    1.87
# 2000-01-01   -0.10
# 2000-01-02    0.76
# 2000-01-03    1.49
# 2000-01-04   -2.55
# 2000-01-05    2.27
# Freq: D, Name: Morning, dtype: float64
df['Morning'].map(int)
# 1999-12-30    1
# 1999-12-31    1
# 2000-01-01    0
# 2000-01-02    0
# 2000-01-03    1
# 2000-01-04   -2
# 2000-01-05    2
# Freq: D, Name: Morning, dtype: int64

5.30.2. Apply

  • .apply() works element-wise on a Series and on a row / column basis on a DataFrame

df['Morning'].apply(int)
# 1999-12-30    1
# 1999-12-31    1
# 2000-01-01    0
# 2000-01-02    0
# 2000-01-03    1
# 2000-01-04   -2
# 2000-01-05    2
# Freq: D, Name: Morning, dtype: int64
df['Morning'].apply(lambda value: round(value, 2))
# 1999-12-30    1.76
# 1999-12-31    1.87
# 2000-01-01   -0.10
# 2000-01-02    0.76
# 2000-01-03    1.49
# 2000-01-04   -2.55
# 2000-01-05    2.27
# Freq: D, Name: Morning, dtype: float64
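
The examples above call .apply() on a single column. A minimal sketch of the row / column behaviour on a whole DataFrame follows, assuming the same df as at the top of this chapter. The names column_range and row_range are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

# axis=0 (the default): the function receives each column as a Series
column_range = df.apply(lambda column: column.max() - column.min())

# axis=1: the function receives each row as a Series
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(column_range)
print(row_range)
```

column_range has one entry per column, row_range has one entry per date.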

5.30.3. Applymap

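  • .applymap() works element-wise on a DataFrame

  • Since pandas 2.1 .applymap() is deprecated in favour of DataFrame.map(), which does the same thing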
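
This section has no example, so here is a minimal sketch, assuming the same df as at the top of this chapter. Because .applymap() was deprecated in pandas 2.1 in favour of DataFrame.map(), the sketch picks whichever method the installed version provides. The helper name elementwise is illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

# Use DataFrame.map() where it exists (pandas >= 2.1), else fall back to applymap()
elementwise = df.map if hasattr(df, 'map') else df.applymap

# The lambda is called once for every single cell of the DataFrame
rounded = elementwise(lambda value: round(value, 2))

print(rounded)
```

The result has the same index and columns as df, with each cell rounded to two decimal places.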

5.30.4. Summary

First major difference: DEFINITION

  • map is defined on Series ONLY (pandas 2.1 added DataFrame.map, but only as the new name for applymap)

  • applymap is defined on DataFrames ONLY

  • apply is defined on BOTH

Second major difference: ARGUMENT TYPE

  • map accepts dicts, Series, or callables

  • applymap and apply accept callables only
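
The three argument types accepted by map can be sketched as follows. The Series grades and the lookup table are made-up illustration data, not part of the chapter's dataset.

```python
import pandas as pd

grades = pd.Series([1, 2, 3, 2])

# dict: keys are the values to look up, dict values are the results
by_dict = grades.map({1: 'a', 2: 'b', 3: 'c'})

# Series: the index plays the role of the keys
lookup = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
by_series = grades.map(lookup)

# callable: called once per element
by_callable = grades.map(lambda value: 'abc'[int(value) - 1])

print(by_dict.tolist())
print(by_series.tolist())
print(by_callable.tolist())
```

All three calls produce the same letters here. Only the way the mapping is expressed differs.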

Third major difference: BEHAVIOR

  • map is elementwise for Series

  • applymap is elementwise for DataFrames

  • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
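
How the return value of apply depends on the function can be sketched on a small DataFrame. The frame and the names totals and centered are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# The function returns one scalar per column -> the result is a Series
totals = df.apply(lambda column: column.sum())

# The function returns a Series per column -> the result is a DataFrame
centered = df.apply(lambda column: column - column.mean())

print(type(totals).__name__)    # Series
print(type(centered).__name__)  # DataFrame
```

The same method therefore aggregates or transforms, depending only on what the passed function returns.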

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))

  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))

  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))

Footnotes:

  • When passed a dictionary/Series, map looks up each element among the keys of that dictionary (or the index of that Series). Elements that have no matching key become NaN in the output.

  • applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.

  • map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.

  • Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
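
The first footnote can be illustrated with a short sketch. The Series codes and the 'unknown' default are made up for illustration.

```python
import pandas as pd

codes = pd.Series([1, 2, 3, 4])

# 4 is not a key of the dict, so it becomes NaN in the result
letters = codes.map({1: 'a', 2: 'b', 3: 'c'})

# One possible way to replace the NaN values with a default
with_default = letters.fillna('unknown')

print(letters.isna().tolist())
print(with_default.tolist())
```

If unmatched elements should keep their original value instead, letters.fillna(codes) is one option.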

[Figure: pd-mapping.png]

5.30.5. Assignments

5.30.5.1. DataFrame Mapping

English
  1. Download data/phones.csv

  2. Pass parse_dates=['date'] when reading the file

  3. Split the datetime column into two separate columns: date and time

  4. Use a lambda

Hint
  • .apply()

  • .applymap()

  • df[['A', 'B']]
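
One possible approach to the split can be sketched on a tiny stand-in frame. The two timestamps below are made up and only imitate a date column that was already parsed with parse_dates=['date']. The real data/phones.csv is not used here.

```python
import pandas as pd

# Hypothetical stand-in for the parsed 'date' column of data/phones.csv
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-01-01 12:30:00', '2000-01-02 08:15:00']),
})

# Keep a reference to the original timestamps before overwriting the column
timestamps = df['date']

# Each element is a pandas Timestamp, which has .time() and .date() methods
df['time'] = timestamps.apply(lambda value: value.time())
df['date'] = timestamps.apply(lambda value: value.date())

print(df)
```

Both new columns hold plain Python datetime.date and datetime.time objects, so their dtype is object.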