3.6. Series Slicing
3.6.1. SetUp
>>> import pandas as pd
3.6.2. Numeric Index
>>> s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
>>> s
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
>>> s[:2]
0 1.0
1 2.0
dtype: float64
>>> s[2:]
2 3.0
3 4.0
4 5.0
dtype: float64
>>> s[1:-2]
1 2.0
2 3.0
dtype: float64
>>> s[::2]
0 1.0
2 3.0
4 5.0
dtype: float64
>>> s[1::2]
1 2.0
3 4.0
dtype: float64
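On a default integer index, two kinds of slicing are easy to confuse. Plain `s[i:j]` and `s.iloc[i:j]` select by position and exclude the stop, like a Python list. `s.loc[i:j]` treats `i` and `j` as labels and includes the stop. A minimal sketch, assuming a recent pandas (1.x or 2.x):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # default RangeIndex 0..4

# Positional slicing: position 3 is excluded, like a Python list
by_position = s.iloc[1:3]
plain = s[1:3]

# Label slicing: labels 1, 2 and 3 are all included
by_label = s.loc[1:3]

print(by_position.tolist())  # [2.0, 3.0]
print(plain.tolist())        # [2.0, 3.0]
print(by_label.tolist())     # [2.0, 3.0, 4.0]
```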
3.6.3. String Index
When slicing by string labels, both the lower and the upper bound are inclusive!
A series with a string index can still be sliced by integer position.
>>> s = pd.Series(
... data = [1.0, 2.0, 3.0, 4.0, 5.0],
... index = ['a', 'b', 'c', 'd', 'e'])
>>> s
a 1.0
b 2.0
c 3.0
d 4.0
e 5.0
dtype: float64
>>> s['a':'d']
a 1.0
b 2.0
c 3.0
d 4.0
dtype: float64
>>> s['a':'d':2]
a 1.0
c 3.0
dtype: float64
>>> s['a':'d':'b']
Traceback (most recent call last):
TypeError: '>=' not supported between instances of 'str' and 'int'
>>> s['d':'a']
Series([], dtype: float64)
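The same label slices can be written with `.loc`, which makes the label-based intent explicit. The sketch below, assuming the five-element series above, compares an inclusive label slice with the equivalent positional slice (whose stop must be one position past `'d'`) and shows that reversed bounds need a negative step:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# Label slice: both 'a' and 'd' are included
by_label = s.loc['a':'d']

# Equivalent positional slice: stop is exclusive, so it must be 4, not 3
by_position = s.iloc[0:4]

# Reversed bounds with the default (positive) step select nothing...
empty = s.loc['d':'a']

# ...but with step -1 the slice walks backwards, still including both bounds
backwards = s.loc['d':'a':-1]

print(by_label.index.tolist())   # ['a', 'b', 'c', 'd']
print(len(empty))                # 0
print(backwards.index.tolist())  # ['d', 'c', 'b', 'a']
```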
>>> s[:2]
a 1.0
b 2.0
dtype: float64
>>> s[2:]
c 3.0
d 4.0
e 5.0
dtype: float64
>>> s[1:-2]
b 2.0
c 3.0
dtype: float64
>>> s[::2]
a 1.0
c 3.0
e 5.0
dtype: float64
>>> s[1::2]
b 2.0
d 4.0
dtype: float64
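Because these integer slices select by position, they are equivalent to `.iloc`. When you know a label but need a positional window around it, `Index.get_loc` converts the label to its position. A minimal sketch, assuming a unique index:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# Integer slices on a string index select by position, exactly like .iloc
same = s.iloc[1:-2].tolist() == s[1:-2].tolist()

# Convert a label to its position, then take one element on each side
pos = s.index.get_loc('c')
window = s.iloc[pos - 1:pos + 2]

print(same)                   # True
print(pos)                    # 2
print(window.index.tolist())  # ['b', 'c', 'd']
```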
>>> s = pd.Series(
... data = [1.0, 2.0, 3.0, 4.0, 5.0],
... index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee'])
>>>
>>> s
aaa 1.0
bbb 2.0
ccc 3.0
ddd 4.0
eee 5.0
dtype: float64
>>>
>>> s['a':'b']
aaa 1.0
dtype: float64
>>>
>>> s['a':'c']
aaa 1.0
bbb 2.0
dtype: float64
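These results follow from lexicographic string comparison: `'a' <= 'aaa'` and `'bbb' <= 'c'`, but `'ccc' > 'c'`. On a sorted index the slice therefore selects the same rows as a boolean mask over the same range. Bounds that do not exist in the index can only be located when the index is sorted. The `unsorted` series below is a made-up example showing that pandas raises `KeyError` otherwise:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'])

# Python compares strings character by character
print('a' <= 'aaa', 'bbb' <= 'c', 'ccc' <= 'c')  # True True False

# On a sorted index the label slice equals a boolean mask over the same range
sliced = s['a':'c']
masked = s[(s.index >= 'a') & (s.index <= 'c')]
print(sliced.index.tolist() == masked.index.tolist())  # True

# On an unsorted index, bounds missing from the index cannot be located
unsorted = pd.Series([1.0, 2.0, 3.0], index=['ccc', 'aaa', 'bbb'])
try:
    unsorted.loc['a':'c']
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```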
3.6.4. Date Index
>>> s = pd.Series(
... data = [1.0, 2.0, 3.0, 4.0, 5.0],
... index = pd.date_range('1999-12-30', periods=5))
>>>
>>> s
1999-12-30 1.0
1999-12-31 2.0
2000-01-01 3.0
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s['2000-01-02':'2000-01-04']
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s['1999-12-30':'2000-01-04':2]
1999-12-30 1.0
2000-01-01 3.0
2000-01-03 5.0
Freq: 2D, dtype: float64
>>> s['1999-12-30':'2000-01-04':-1]
Series([], Freq: -1D, dtype: float64)
>>> s['2000-01-04':'1999-12-30':-1]
2000-01-03 5.0
2000-01-02 4.0
2000-01-01 3.0
1999-12-31 2.0
1999-12-30 1.0
Freq: -1D, dtype: float64
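String bounds on a DatetimeIndex are parsed into timestamps, so the label slices above can equally be written with `pd.Timestamp` objects. A sketch comparing the two forms, followed by the reversed slice, which needs a negative step and bounds given from latest to earliest:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('1999-12-30', periods=5))

# String bounds are converted to timestamps; the end bound need not exist
by_string = s.loc['2000-01-02':'2000-01-04']
by_timestamp = s.loc[pd.Timestamp('2000-01-02'):pd.Timestamp('2000-01-04')]
print(by_string.tolist() == by_timestamp.tolist())  # True

# Reversed slice: latest bound first, negative step
backwards = s.loc['2000-01-04':'1999-12-30':-1]
print(backwards.tolist())  # [5.0, 4.0, 3.0, 2.0, 1.0]
```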
>>> s[:'1999']
1999-12-30 1.0
1999-12-31 2.0
Freq: D, dtype: float64
>>> s['2000':]
2000-01-01 3.0
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s[:'1999-12']
1999-12-30 1.0
1999-12-31 2.0
Freq: D, dtype: float64
>>> s['2000-01':]
2000-01-01 3.0
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s[:'2000-01-02']
1999-12-30 1.0
1999-12-31 2.0
2000-01-01 3.0
2000-01-02 4.0
Freq: D, dtype: float64
>>> s['2000-01-02':]
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s['1999-12':'1999-12']
1999-12-30 1.0
1999-12-31 2.0
Freq: D, dtype: float64
>>> s['2000-01':'2000-01-05']
2000-01-01 3.0
2000-01-02 4.0
2000-01-03 5.0
Freq: D, dtype: float64
>>> s[:'2000-01-05':2]
1999-12-30 1.0
2000-01-01 3.0
2000-01-03 5.0
Freq: 2D, dtype: float64
>>> s[:'2000-01-03':-1]
2000-01-03 5.0
Freq: -1D, dtype: float64
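A partial date string such as `'1999'` or `'2000-01'` stands for the whole period it names: as a lower bound it means the start of that period, as an upper bound its end. The sketch below checks this against explicit filters on the index's `year` and `month` attributes. The filters are only for comparison; this is not a claim about how pandas implements partial string indexing:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('1999-12-30', periods=5))

# Upper bound '1999' covers everything up to the end of 1999
up_to_1999 = s.loc[:'1999']
print(up_to_1999.tolist() == s[s.index.year <= 1999].tolist())  # True

# A partial string on its own (not a slice) selects the whole period
january = s.loc['2000-01']
manual = s[(s.index.year == 2000) & (s.index.month == 1)]
print(january.tolist() == manual.tolist())  # True
print(january.tolist())  # [3.0, 4.0, 5.0]
```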
Although the index is a DatetimeIndex, integer slices are interpreted as positions, so you can still slice the series like a list.
>>> s[1:3]
1999-12-31 2.0
2000-01-01 3.0
Freq: D, dtype: float64
>>>
>>> s[:3]
1999-12-30 1.0
1999-12-31 2.0
2000-01-01 3.0
Freq: D, dtype: float64
>>>
>>> s[:3:2]
1999-12-30 1.0
2000-01-01 3.0
Freq: 2D, dtype: float64
>>>
>>> s[::-1]
2000-01-03 5.0
2000-01-02 4.0
2000-01-01 3.0
1999-12-31 2.0
1999-12-30 1.0
Freq: -1D, dtype: float64
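The positional slices above are equivalent to `.iloc`, which states the intent explicitly. Labels and positions can also be combined: `Index.get_loc` returns the integer position of a timestamp, which can then be used in a positional slice. A minimal sketch, assuming a unique, sorted DatetimeIndex:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('1999-12-30', periods=5))

# Explicit positional slicing, same rows as s[1:3] and s[::-1]
print(s.iloc[1:3].tolist())   # [2.0, 3.0]
print(s.iloc[::-1].tolist())  # [5.0, 4.0, 3.0, 2.0, 1.0]

# Position of a given date, then the two rows starting there
start = s.index.get_loc(pd.Timestamp('2000-01-01'))
print(start)  # 2
print(s.iloc[start:start + 2].tolist())  # [3.0, 4.0]
```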
3.6.5. Assignments
"""
* Assignment: Series Slice Datetime
* Complexity: easy
* Lines of code: 5 lines
* Time: 3 min
English:
1. Set random seed to zero
2. Create `s: pd.Series` with 100 random numbers from the standard normal distribution
3. Use consecutive days starting from 2000-01-01 as the series index
4. Define `result: pd.Series` with the values for dates from 2000-02-14 to the end of February 2000 (inclusive)
5. Run doctests - all must succeed
Hints:
* `np.random.randn()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 10)
>>> assert result is not Ellipsis, \
... 'Assign result to variable: `result`'
>>> assert type(result) is pd.Series, \
... 'Variable `result` has invalid type, should be `pd.Series`'
>>> result # doctest: +NORMALIZE_WHITESPACE
2000-02-14 -0.509652
2000-02-15 -0.438074
2000-02-16 -1.252795
2000-02-17 0.777490
2000-02-18 -1.613898
...
2000-02-25 0.428332
2000-02-26 0.066517
2000-02-27 0.302472
2000-02-28 -0.634322
2000-02-29 -0.362741
Freq: D, Length: 16, dtype: float64
"""
import pandas as pd
import numpy as np
np.random.seed(0)
NUMBER = 100
result = ...  # type: pd.Series
"""
* Assignment: Slicing Slice Str
* Complexity: easy
* Lines of code: 10 lines
* Time: 13 min
English:
1. Create `pd.Series` with 26 random integers in range `[10, 100)`
2. Use the lowercase ASCII letters (`ASCII_LOWERCASE: str`) as the index
3. Find the middle letter of the alphabet
4. Define `result: pd.Series` with 3 elements before and 3 elements after the middle letter, including the middle letter itself
5. Run doctests - all must succeed
Hints:
* `np.random.randint(..., ..., size=...)`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
... 'Assign result to variable: `result`'
>>> assert type(result) is pd.Series, \
... 'Variable `result` has invalid type, should be `pd.Series`'
>>> result
j 97
k 80
l 98
m 98
n 22
o 68
p 75
dtype: int64
"""
from statistics import median_low
import pandas as pd
import numpy as np
np.random.seed(0)
ASCII_LOWERCASE = 'abcdefghijklmnopqrstuvwxyz'
result = ...  # type: pd.Series