3.4. Series Sample
3.4.1. Setup
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> s = pd.Series(
...     data=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
...     index=pd.date_range(start='1999-12-28', periods=10))
>>> s
1999-12-28 0
1999-12-29 1
1999-12-30 2
1999-12-31 3
2000-01-01 4
2000-01-02 5
2000-01-03 6
2000-01-04 7
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
3.4.2. Head
>>> s.head(2)
1999-12-28 0
1999-12-29 1
Freq: D, dtype: int64
>>> s.head(n=1)
1999-12-28 0
Freq: D, dtype: int64
3.4.3. Tail
>>> s.tail(2)
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
>>> s.tail(n=1)
2000-01-06 9
Freq: D, dtype: int64
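Both methods are positional. `head(n)` behaves like `s.iloc[:n]` and `tail(n)` behaves like `s.iloc[-n:]`. As a result, a negative `n` keeps everything except the last (or first) `|n|` rows.

The small sketch below checks this against the same Series. The Series is rebuilt so the snippet runs on its own, and `data=range(10)` is simply a shorter spelling of the list used above.

```python
import pandas as pd

# Rebuild the 10-day Series used above
s = pd.Series(
    data=range(10),
    index=pd.date_range(start='1999-12-28', periods=10),
)

# head(n) and tail(n) select by position, like iloc slicing
assert s.head(2).equals(s.iloc[:2])
assert s.tail(2).equals(s.iloc[-2:])

# A negative n drops rows from the opposite end instead
all_but_last_two = s.head(-2)   # 1999-12-28 .. 2000-01-04
all_but_first_two = s.tail(-2)  # 1999-12-30 .. 2000-01-06
```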
3.4.4. First
>>> s.first('Y')
1999-12-28 0
1999-12-29 1
1999-12-30 2
1999-12-31 3
Freq: D, dtype: int64
>>> s.first('M')
1999-12-28 0
1999-12-29 1
1999-12-30 2
1999-12-31 3
Freq: D, dtype: int64
>>> s.first('D')
1999-12-28 0
Freq: D, dtype: int64
>>> s.first('3D')
1999-12-28 0
1999-12-29 1
1999-12-30 2
Freq: D, dtype: int64
>>> s.first('W')
1999-12-28 0
1999-12-29 1
1999-12-30 2
1999-12-31 3
2000-01-01 4
2000-01-02 5
Freq: D, dtype: int64
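`Series.first` works only with a `DatetimeIndex`. It was deprecated in pandas 2.1, which suggests filtering with a mask and `.loc` instead.

The sketch below shows that replacement on the same Series and reproduces the `'3D'` and `'W'` outputs above:

- **Fixed-length offset such as `'3D'`:** keep the rows strictly before the first date plus 3 days.
- **Calendar offset such as `'W'` (weeks ending on Sunday):** roll the first date forward to the next week end, then slice up to it inclusively.

The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(
    data=range(10),
    index=pd.date_range(start='1999-12-28', periods=10),
)

start = s.index[0]  # 1999-12-28, a Tuesday

# Like s.first('3D'): rows strictly before start + 3 days
first_3d = s[s.index < start + pd.Timedelta(days=3)]

# Like s.first('W'): roll forward to the next Sunday (the 'W' week end)
# and keep everything up to and including that date
week_end = start + pd.offsets.Week(weekday=6)
first_w = s.loc[:week_end]
```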
3.4.5. Last
>>> s.last('Y')
2000-01-01 4
2000-01-02 5
2000-01-03 6
2000-01-04 7
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
>>> s.last('M')
2000-01-01 4
2000-01-02 5
2000-01-03 6
2000-01-04 7
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
>>> s.last('D')
2000-01-06 9
Freq: D, dtype: int64
>>> s.last('2D')
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
>>> s.last('W')
2000-01-03 6
2000-01-04 7
2000-01-05 8
2000-01-06 9
Freq: D, dtype: int64
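`Series.last` shares the `DatetimeIndex` requirement and the pandas 2.1 deprecation of `first`. It counts backwards from the last date and keeps the rows strictly after the resulting cut-off. A sketch of the mask-based equivalent, assuming the same Series and reproducing the `'2D'` and `'W'` outputs above:

```python
import pandas as pd

s = pd.Series(
    data=range(10),
    index=pd.date_range(start='1999-12-28', periods=10),
)

end = s.index[-1]  # 2000-01-06, a Thursday

# Like s.last('2D'): rows strictly after end - 2 days
last_2d = s[s.index > end - pd.Timedelta(days=2)]

# Like s.last('W'): roll back to the previous Sunday (the 'W' week end)
# and keep the rows strictly after it
week_start = end - pd.offsets.Week(weekday=6)
last_w = s[s.index > week_start]
```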
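3.4.6. Sample
Draw `n` random elements, or a fraction `frac` of all elements, without or with repetition (`replace=True`). The fraction is a proportion of the Series length: `1/4` is 25%, `.05` is 5%, `0.5` is 50%, and `1.0` is 100%.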
>>> s.sample()
1999-12-30 2
Freq: D, dtype: int64
>>> s.sample(2)
1999-12-31 3
2000-01-02 5
Freq: 2D, dtype: int64
>>> s.sample(n=2, replace=True)
2000-01-04 7
2000-01-04 7
dtype: int64
>>> s.sample(frac=1/4)
1999-12-30 2
1999-12-31 3
Freq: D, dtype: int64
>>> s.sample(frac=0.5)
2000-01-01 4
1999-12-29 1
2000-01-04 7
2000-01-05 8
1999-12-30 2
dtype: int64
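The outputs above come from the global NumPy random state. They only repeat when the calls run in exactly that order after `np.random.seed(0)`. Passing `random_state` instead makes a single call reproducible.

The sketch below also shows how `frac` becomes a row count:

- pandas rounds `frac * len(s)` with Python's `round()`, which rounds halves to even. So `frac=1/4` on 10 rows gives 2 rows, and `frac=1.25` gives 12.
- A `frac` above 1 is accepted only together with `replace=True`.

```python
import pandas as pd

s = pd.Series(
    data=range(10),
    index=pd.date_range(start='1999-12-28', periods=10),
)

# random_state fixes the generator for one call, so the draw repeats
a = s.sample(n=3, random_state=0)
b = s.sample(n=3, random_state=0)

# round(0.25 * 10) == round(2.5) == 2 (round half to even)
quarter = s.sample(frac=1/4, random_state=0)

# frac > 1 needs replace=True; round(1.25 * 10) == round(12.5) == 12
oversampled = s.sample(frac=1.25, replace=True, random_state=0)

# Without replacement each label appears at most once
no_repeats = s.sample(frac=1.0, random_state=0)
```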
3.4.7. Reset Index
>>> s.sample(frac=1.0).reset_index()
index 0
0 2000-01-02 5
1 1999-12-30 2
2 2000-01-04 7
3 2000-01-01 4
4 1999-12-29 1
5 1999-12-28 0
6 2000-01-03 6
7 2000-01-05 8
8 2000-01-06 9
9 1999-12-31 3
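Sampling with `frac=1.0` returns every element once in random order, which is a common way to shuffle a Series. `reset_index()` then moves the shuffled dates into a column named `index`.

A sketch of two related options, assuming the same Series:

- `name=` labels the values column, which is otherwise `0`, as above.
- `drop=True` discards the dates and returns a Series renumbered from 0.

```python
import pandas as pd

s = pd.Series(
    data=range(10),
    index=pd.date_range(start='1999-12-28', periods=10),
)

# A full-size sample without replacement is a shuffle
shuffled = s.sample(frac=1.0, random_state=0)

# Dates go to an 'index' column; name= labels the values column
df = shuffled.reset_index(name='value')

# drop=True throws the dates away and keeps a Series with a 0..n-1 index
renumbered = shuffled.reset_index(drop=True)
```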
3.4.8. Assignments
"""
* Assignment: Series Sample
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min
English:
1. Set random seed to zero
2. Create a `pd.Series` with 100 random numbers from the standard normal distribution
3. Use consecutive days starting from 2000-01-01 as the Series index
4. Assign to the `result` dict:
a. `head` - the first element of the series,
b. `tail` - the last 5 elements of the series,
c. `first` - the first two weeks of the series,
d. `last` - the last month of the series,
e. `sample_n` - three random elements,
f. `sample_frac` - 125% of the elements, sampled at random with replacement.
5. Run doctests - all must succeed
Hints:
* `np.random.seed(0)`
* `np.random.randn(n)`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is dict, \
'Variable `result` has invalid type, should be `dict`'
>>> assert all(x is not Ellipsis for x in result.values()), \
'Assign a value to every key in `result`'
>>> result # doctest: +NORMALIZE_WHITESPACE
{'head': 2000-01-01 1.764052
Freq: D, dtype: float64, 'tail': 2000-04-05 0.706573
2000-04-06 0.010500
2000-04-07 1.785870
2000-04-08 0.126912
2000-04-09 0.401989
Freq: D, dtype: float64, 'first': 2000-01-01 1.764052
2000-01-02 0.400157
2000-01-03 0.978738
2000-01-04 2.240893
2000-01-05 1.867558
2000-01-06 -0.977278
2000-01-07 0.950088
2000-01-08 -0.151357
2000-01-09 -0.103219
Freq: D, dtype: float64, 'last': 2000-04-01 1.222445
2000-04-02 0.208275
2000-04-03 0.976639
2000-04-04 0.356366
2000-04-05 0.706573
2000-04-06 0.010500
2000-04-07 1.785870
2000-04-08 0.126912
2000-04-09 0.401989
Freq: D, dtype: float64, 'sample_n': 2000-01-20 -0.854096
2000-01-07 0.950088
2000-02-15 -0.438074
dtype: float64, 'sample_frac': 2000-03-07 -1.630198
2000-04-01 1.222445
2000-03-26 1.895889
2000-02-09 -0.302303
2000-02-09 -0.302303
...
2000-01-08 -0.151357
2000-03-21 -1.165150
2000-01-23 0.864436
2000-03-20 0.056165
2000-03-30 1.054452
Length: 125, dtype: float64}
"""
import pandas as pd
import numpy as np
np.random.seed(0)
result: dict[str, pd.Series] = {
'head': ...,
'tail': ...,
'first': ...,
'last': ...,
'sample_n': ...,
'sample_frac': ...,
}