4.5. DataFrame Getitem¶
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(
columns = ['Morning', 'Noon', 'Evening', 'Midnight'],
index = pd.date_range('1999-12-30', periods=7),
data = np.random.randn(7, 4))
df
# Morning Noon Evening Midnight
# 1999-12-30 1.764052 0.400157 0.978738 2.240893
# 1999-12-31 1.867558 -0.977278 0.950088 -0.151357
# 2000-01-01 -0.103219 0.410599 0.144044 1.454274
# 2000-01-02 0.761038 0.121675 0.443863 0.333674
# 2000-01-03 1.494079 -0.205158 0.313068 -0.854096
# 2000-01-04 -2.552990 0.653619 0.864436 -0.742165
# 2000-01-05 2.269755 -1.454366 0.045759 -0.187184
4.5.1. Rationale¶
df.at[]
df.loc[]
df.iat[]
df.iloc[]
iat
andiloc
takesint
as argumentat
andloc
takesstr
as argument
4.5.2. Columns¶
Single Column:
df.Morning
# 1999-12-30 1.764052
# 1999-12-31 1.867558
# 2000-01-01 -0.103219
# 2000-01-02 0.761038
# 2000-01-03 1.494079
# 2000-01-04 -2.552990
# 2000-01-05 2.269755
# Freq: D, Name: Morning, dtype: float64
df['Morning']
# 1999-12-30 1.764052
# 1999-12-31 1.867558
# 2000-01-01 -0.103219
# 2000-01-02 0.761038
# 2000-01-03 1.494079
# 2000-01-04 -2.552990
# 2000-01-05 2.269755
# Freq: D, Name: Morning, dtype: float64
df.loc[:, 'Morning']
# 1999-12-30 1.764052
# 1999-12-31 1.867558
# 2000-01-01 -0.103219
# 2000-01-02 0.761038
# 2000-01-03 1.494079
# 2000-01-04 -2.552990
# 2000-01-05 2.269755
# Freq: D, Name: Morning, dtype: float64
df.iloc[:, 0]
# 1999-12-30 1.764052
# 1999-12-31 1.867558
# 2000-01-01 -0.103219
# 2000-01-02 0.761038
# 2000-01-03 1.494079
# 2000-01-04 -2.552990
# 2000-01-05 2.269755
# Freq: D, Name: Morning, dtype: float64
Multiple columns:
df[['Morning', 'Evening']]
# Morning Evening
# 1999-12-30 1.764052 0.978738
# 1999-12-31 1.867558 0.950088
# 2000-01-01 -0.103219 0.144044
# 2000-01-02 0.761038 0.443863
# 2000-01-03 1.494079 0.313068
# 2000-01-04 -2.552990 0.864436
# 2000-01-05 2.269755 0.045759
4.5.3. Rows¶
df['2000-01-05']
will imply to take column with name 2000-01-05
, hence KeyError
:
df['2000-01-05']
# Traceback (most recent call last):
# KeyError: '2000-01-05'
df.loc['2000-01-05']
# Morning 2.269755
# Noon -1.454366
# Evening 0.045759
# Midnight -0.187184
# Name: 2000-01-05 00:00:00, dtype: float64
df.loc['2000-01']
# Morning Noon Evening Midnight
# 2000-01-01 -0.103219 0.410599 0.144044 1.454274
# 2000-01-02 0.761038 0.121675 0.443863 0.333674
# 2000-01-03 1.494079 -0.205158 0.313068 -0.854096
# 2000-01-04 -2.552990 0.653619 0.864436 -0.742165
# 2000-01-05 2.269755 -1.454366 0.045759 -0.187184
df.loc['1999']
# Morning Noon Evening Midnight
# 1999-12-30 1.764052 0.400157 0.978738 2.240893
# 1999-12-31 1.867558 -0.977278 0.950088 -0.151357
4.5.4. Columns by Index¶
df.iloc[:, 1]
# 1999-12-30 0.400157
# 1999-12-31 -0.977278
# 2000-01-01 0.410599
# 2000-01-02 0.121675
# 2000-01-03 -0.205158
# 2000-01-04 0.653619
# 2000-01-05 -1.454366
# Freq: D, Name: Noon, dtype: float64
df.iloc[:, [1,2]]
# Noon Evening
# 1999-12-30 0.400157 0.978738
# 1999-12-31 -0.977278 0.950088
# 2000-01-01 0.410599 0.144044
# 2000-01-02 0.121675 0.443863
# 2000-01-03 -0.205158 0.313068
# 2000-01-04 0.653619 0.864436
# 2000-01-05 -1.454366 0.045759