5.5. DataFrame Getitem

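  • .at[] - takes a (row label, column label) pair and returns a single value

  • .loc[] - takes row and column labels; also accepts slices, lists and boolean masks

  • .iat[] - takes a (row position, column position) pair of integers and returns a single value

  • .iloc[] - takes integer row and column positions; also accepts slices and lists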
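
The four accessors above can be compared on one cell. This is a minimal sketch on a hypothetical two-by-two frame (the name small and its labels are made up for illustration and are not part of this chapter's data):

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the four accessors.
small = pd.DataFrame(
    data=[[1.0, 2.0], [3.0, 4.0]],
    index=['a', 'b'],
    columns=['x', 'y'])

by_label_scalar = small.at['b', 'y']    # single cell, by row and column label
by_label = small.loc['b', 'y']          # same cell; .loc also accepts slices and lists
by_position_scalar = small.iat[1, 1]    # single cell, by integer position
by_position = small.iloc[1, 1]          # same cell; .iloc also accepts slices and lists

print(by_label_scalar, by_label, by_position_scalar, by_position)
```

All four expressions address the bottom-right cell. .at[] and .iat[] are limited to one scalar, while .loc[] and .iloc[] can also return a Series or a DataFrame.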

5.5.1. Setup

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(
...     columns=['Morning', 'Noon', 'Evening', 'Midnight'],
...     index=pd.date_range('1999-12-30', periods=7),
...     data=np.random.randn(7, 4))
>>>
>>> df
             Morning      Noon   Evening  Midnight
1999-12-30  1.764052  0.400157  0.978738  2.240893
1999-12-31  1.867558 -0.977278  0.950088 -0.151357
2000-01-01 -0.103219  0.410599  0.144044  1.454274
2000-01-02  0.761038  0.121675  0.443863  0.333674
2000-01-03  1.494079 -0.205158  0.313068 -0.854096
2000-01-04 -2.552990  0.653619  0.864436 -0.742165
2000-01-05  2.269755 -1.454366  0.045759 -0.187184

5.5.2. Columns

Single column:

>>> df.Morning
1999-12-30    1.764052
1999-12-31    1.867558
2000-01-01   -0.103219
2000-01-02    0.761038
2000-01-03    1.494079
2000-01-04   -2.552990
2000-01-05    2.269755
Freq: D, Name: Morning, dtype: float64
>>> df['Morning']
1999-12-30    1.764052
1999-12-31    1.867558
2000-01-01   -0.103219
2000-01-02    0.761038
2000-01-03    1.494079
2000-01-04   -2.552990
2000-01-05    2.269755
Freq: D, Name: Morning, dtype: float64
>>> df.loc[:, 'Morning']
1999-12-30    1.764052
1999-12-31    1.867558
2000-01-01   -0.103219
2000-01-02    0.761038
2000-01-03    1.494079
2000-01-04   -2.552990
2000-01-05    2.269755
Freq: D, Name: Morning, dtype: float64
>>> df.iloc[:, 0]
1999-12-30    1.764052
1999-12-31    1.867558
2000-01-01   -0.103219
2000-01-02    0.761038
2000-01-03    1.494079
2000-01-04   -2.552990
2000-01-05    2.269755
Freq: D, Name: Morning, dtype: float64

Multiple columns:

>>> df[['Morning', 'Evening']]
             Morning   Evening
1999-12-30  1.764052  0.978738
1999-12-31  1.867558  0.950088
2000-01-01 -0.103219  0.144044
2000-01-02  0.761038  0.443863
2000-01-03  1.494079  0.313068
2000-01-04 -2.552990  0.864436
2000-01-05  2.269755  0.045759
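
A list of columns can also be passed to .loc[] (by label) or .iloc[] (by position). The sketch below rebuilds the frame from the setup section so that it runs on its own; the variable names by_brackets, by_loc, by_iloc and same are made up for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

by_brackets = df[['Morning', 'Evening']]        # list of column labels
by_loc = df.loc[:, ['Morning', 'Evening']]      # all rows, same labels
by_iloc = df.iloc[:, [0, 2]]                    # all rows, positions 0 and 2

# All three spellings select the same two columns.
same = by_brackets.equals(by_loc) and by_brackets.equals(by_iloc)
print(same)
```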

5.5.3. Rows

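Plain square brackets select columns, so df['2000-01-05'] looks for a column named '2000-01-05' and raises KeyError. Use .loc[] to select rows by label; on a DatetimeIndex a partial date string such as '2000-01' or '1999' selects every row in that period: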

>>> df['2000-01-05']
Traceback (most recent call last):
KeyError: '2000-01-05'
>>> df.loc['2000-01-05']
Morning     2.269755
Noon       -1.454366
Evening     0.045759
Midnight   -0.187184
Name: 2000-01-05 00:00:00, dtype: float64
>>> df.loc['2000-01']
             Morning      Noon   Evening  Midnight
2000-01-01 -0.103219  0.410599  0.144044  1.454274
2000-01-02  0.761038  0.121675  0.443863  0.333674
2000-01-03  1.494079 -0.205158  0.313068 -0.854096
2000-01-04 -2.552990  0.653619  0.864436 -0.742165
2000-01-05  2.269755 -1.454366  0.045759 -0.187184
>>> df.loc['1999']
             Morning      Noon   Evening  Midnight
1999-12-30  1.764052  0.400157  0.978738  2.240893
1999-12-31  1.867558 -0.977278  0.950088 -0.151357
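
Rows can also be selected with a label slice or by integer position. The following is a minimal sketch that rebuilds the frame from the setup section; the names window, first_row and last_two are made up for illustration. Note that a .loc[] label slice includes both endpoints, while an .iloc[] slice excludes the stop position:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

# Label slice: both 1999-12-31 and 2000-01-02 are included.
window = df.loc['1999-12-31':'2000-01-02']

# Positional selection: first row as a Series, last two rows as a DataFrame.
first_row = df.iloc[0]
last_two = df.iloc[-2:]

print(len(window), first_row.name, len(last_two))
```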

5.5.4. Columns by Position

>>> df.iloc[:, 1]
1999-12-30    0.400157
1999-12-31   -0.977278
2000-01-01    0.410599
2000-01-02    0.121675
2000-01-03   -0.205158
2000-01-04    0.653619
2000-01-05   -1.454366
Freq: D, Name: Noon, dtype: float64
>>> df.iloc[:, [1,2]]
                Noon   Evening
1999-12-30  0.400157  0.978738
1999-12-31 -0.977278  0.950088
2000-01-01  0.410599  0.144044
2000-01-02  0.121675  0.443863
2000-01-03 -0.205158  0.313068
2000-01-04  0.653619  0.864436
2000-01-05 -1.454366  0.045759
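
Positional column selection also accepts slices and negative positions. A minimal sketch, again rebuilding the frame from the setup section (the names by_list, by_slice and last_column are made up for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4))

by_list = df.iloc[:, [1, 2]]    # explicit positions 1 and 2
by_slice = df.iloc[:, 1:3]      # slice; the stop position 3 is excluded
last_column = df.iloc[:, -1]    # negative positions count from the end

print(by_list.equals(by_slice), last_column.name)
```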