5.23. DataFrame Loc

  • loc - uses fancy indexing

  • iloc - only index numbers

  • df.loc - start and stop are included!!

  • df.iloc - behaves like Python slices

../../_images/pandas-iat-iloc.png
import pandas as pd
import numpy as np
np.random.seed(0)

df = pd.DataFrame(
    columns = ['Morning', 'Noon', 'Evening', 'Midnight'],
    index = pd.date_range('1999-12-30', periods=7),
    data = np.random.randn(7, 4))

df
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 1999-12-31  1.867558 -0.977278  0.950088 -0.151357
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.23.1. All Values in Row

5.23.1.1. Single row

  • Returns the row as a pd.Series

df.loc['2000-01-01']
# Morning    -0.103219
# Noon        0.410599
# Evening     0.144044
# Midnight    1.454274
# Name: 2000-01-01 00:00:00, dtype: float64

5.23.1.2. Range of rows

  • Returns the rows as a pd.DataFrame

df.loc['2000-01-02':'2000-01-04']
#              Morning      Noon   Evening  Midnight
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165

5.23.1.3. Range of dates

df.loc['2000-01']
#              Morning      Noon   Evening  Midnight
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

df.loc['1999']
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 1999-12-31  1.867558 -0.977278  0.950088 -0.151357

5.23.2. Values in Selected Columns

5.23.2.1. Single row and single column

df.loc['2000-01-05', 'Morning']
# 2.2697546239876076

5.23.2.2. Range of rows and single column

  • Note that both the start and stop of the slice are included

df.loc['1999-12-31':'2000-01-02', 'Noon']
# 1999-12-31   -0.977278
# 2000-01-01    0.410599
# 2000-01-02    0.121675
# Freq: D, Name: Noon, dtype: float64

5.23.2.3. Range of rows and single column

  • For Numeric Index, Range Index and String Index works without conversion

  • For Datetime index, conversion to pd.Timestamp() is needed

df.loc[['2000-01-02','2000-01-04'], 'Noon']
# KeyError: "None of [Index(['2000-01-02', '2000-01-04'], dtype='object')] are in the [index]"
date1 = pd.Timestamp('2000-01-02')
date2 = pd.Timestamp('2000-01-05')

df.loc[[date1,date2], 'Noon']
# 2000-01-02    0.121675
# 2000-01-05   -1.454366
# Name: Noon, dtype: float64

5.23.2.4. Single row and selected columns

df.loc['2000-01-05', ['Noon', 'Midnight']]
# Noon       -1.454366
# Midnight   -0.187184
# Name: 2000-01-05 00:00:00, dtype: float64

5.23.2.5. Single row and column range

df.loc['2000-01-05', 'Noon':'Midnight']
# Noon       -1.454366
# Evening     0.045759
# Midnight   -0.187184
# Name: 2000-01-05 00:00:00, dtype: float64

5.23.3. Fancy Indexing

5.23.3.1. Boolean list with the same length as the row axis

  • Print row for given index is True

df.loc[[True, False, True, False, False, False, True]]
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.23.3.2. Conditional that returns a boolean Series

df.loc[df['Morning'] < 0]
#              Morning      Noon   Evening  Midnight
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165

5.23.3.3. Conditional that returns a boolean Series with column labels specified

df.loc[df['Morning'] < 0, 'Evening']
# 2000-01-01    0.144044
# 2000-01-04    0.864436
# Freq: 3D, Name: Evening, dtype: float64
df.loc[df['Morning'] < 0, ['Morning', 'Evening']]
#              Morning   Evening
# 2000-01-01 -0.103219  0.144044
# 2000-01-04 -2.552990  0.864436
where = df['Morning'] < 0

df.loc[where, ['Morning', 'Evening']]
#              Morning   Evening
# 2000-01-01 -0.103219  0.144044
# 2000-01-04 -2.552990  0.864436
where = df['Morning'] < 0
select = ['Morning', 'Evening']

df.loc[where, select]
#              Morning   Evening
# 2000-01-01 -0.103219  0.144044
# 2000-01-04 -2.552990  0.864436

5.23.4. Callable

5.23.4.1. Filtering with callable

def morning_below_zero(df):
    return df['Morning'] < 0

df.loc[morning_below_zero]
#                  Morning      Noon   Evening  Midnight
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165
df.loc[lambda df: df['Morning'] < 0]
#              Morning      Noon   Evening  Midnight
# 2000-01-01 -0.103219  0.410599  0.144044  1.454274
# 2000-01-04 -2.552990  0.653619  0.864436 -0.742165

5.23.5. Setting Values

5.23.5.1. Set value for all items matching the list of labels

df.loc[df['Morning'] < 0, 'Evening'] = np.inf
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 1999-12-31  1.867558 -0.977278  0.950088 -0.151357
# 2000-01-01 -0.103219  0.410599       inf  1.454274
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619       inf -0.742165
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.23.5.2. Set value for an entire row

df.loc['2000-01-01'] = np.nan
#              Morning      Noon   Evening  Midnight
# 1999-12-30  1.764052  0.400157  0.978738  2.240893
# 1999-12-31  1.867558 -0.977278  0.950088 -0.151357
# 2000-01-01       NaN       NaN       NaN       NaN
# 2000-01-02  0.761038  0.121675  0.443863  0.333674
# 2000-01-03  1.494079 -0.205158  0.313068 -0.854096
# 2000-01-04 -2.552990  0.653619       inf -0.742165
# 2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.23.5.3. Set value for an entire column

df.loc[:, 'Evening'] = 0.0
#              Morning      Noon  Evening  Midnight
# 1999-12-30  1.764052  0.400157      0.0  2.240893
# 1999-12-31  1.867558 -0.977278      0.0 -0.151357
# 2000-01-01       NaN       NaN      0.0       NaN
# 2000-01-02  0.761038  0.121675      0.0  0.333674
# 2000-01-03  1.494079 -0.205158      0.0 -0.854096
# 2000-01-04 -2.552990  0.653619      0.0 -0.742165
# 2000-01-05  2.269755 -1.454366      0.0 -0.187184

5.23.5.4. Set value for rows matching callable condition

df[df < 0] = -np.inf
df
#              Morning      Noon  Evening  Midnight
# 1999-12-30  1.764052  0.400157      0.0  2.240893
# 1999-12-31  1.867558      -inf      0.0      -inf
# 2000-01-01       NaN       NaN      0.0       NaN
# 2000-01-02  0.761038  0.121675      0.0  0.333674
# 2000-01-03  1.494079      -inf      0.0      -inf
# 2000-01-04      -inf  0.653619      0.0      -inf
# 2000-01-05  2.269755      -inf      0.0      -inf

5.23.6. Assignments

Todo

Create assignments