5.1. DataFrame Create
* `pd.DataFrame(list[dict])` - each dict is one row
* `pd.DataFrame(dict[str,list])` - each key is a column name, each list holds that column's values
5.1.1. Setup
>>> import pandas as pd
>>> import numpy as np
5.1.2. Create from List of Dicts
>>> pd.DataFrame([
...     {'A': 1.0, 'B': 2.0},
...     {'A': 3.0, 'B': 4.0},
... ])
     A    B
0  1.0  2.0
1  3.0  4.0
>>> pd.DataFrame([
...     {'A': 1.0, 'B': 2.0},
...     {'B': 3.0, 'C': 4.0},
... ])
     A    B    C
0  1.0  2.0  NaN
1  NaN  3.0  4.0
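When the dicts do not share the same keys, the columns are the union of all keys and the gaps are filled with NaN, as in the output above. A minimal sketch of how those gaps could be inspected (the names `rows` and `df` are illustrative, not part of the original example):

```python
import pandas as pd

# Rows with different keys: 'C' is missing in the first, 'A' in the second
rows = [
    {'A': 1.0, 'B': 2.0},
    {'B': 3.0, 'C': 4.0},
]
df = pd.DataFrame(rows)

# Columns are the union of all keys found in the rows
print(list(df.columns))

# Boolean mask marking the cells that were filled with NaN
print(df.isna())

# Number of missing values per column
print(df.isna().sum())
```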
>>> pd.DataFrame([
...     {'firstname': 'Mark', 'lastname': 'Watney'},
...     {'firstname': 'Melissa', 'lastname': 'Lewis'},
...     {'firstname': 'Rick', 'lastname': 'Martinez'},
...     {'firstname': 'Alex', 'lastname': 'Vogel'},
... ])
  firstname  lastname
0      Mark    Watney
1   Melissa     Lewis
2      Rick  Martinez
3      Alex     Vogel
5.1.3. Create from Dict
>>> pd.DataFrame({
...     'A': ['a', 'b', 'c'],
...     'B': [1.0, 2.0, 3.0],
...     'C': [1, 2, 3],
... })
   A    B  C
0  a  1.0  1
1  b  2.0  2
2  c  3.0  3
>>> pd.DataFrame({
...     'firstname': ['Mark', 'Melissa', 'Rick', 'Alex'],
...     'lastname': ['Watney', 'Lewis', 'Martinez', 'Vogel'],
... })
  firstname  lastname
0      Mark    Watney
1   Melissa     Lewis
2      Rick  Martinez
3      Alex     Vogel
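The list-of-dicts and dict-of-lists layouts describe the same table, once by rows and once by columns. A short sketch comparing the two, assuming a shortened two-row version of the data above (the names `rows`, `columns`, `by_rows` and `by_columns` are illustrative):

```python
import pandas as pd

# Row-oriented input: one dict per row
rows = [
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]

# Column-oriented input: one list per column
columns = {
    'firstname': ['Mark', 'Melissa'],
    'lastname': ['Watney', 'Lewis'],
}

by_rows = pd.DataFrame(rows)
by_columns = pd.DataFrame(columns)

# Both layouts produce an equal DataFrame
print(by_rows.equals(by_columns))

# A DataFrame can be converted back to either layout
print(by_rows.to_dict(orient='records'))
print(by_rows.to_dict(orient='list'))
```

Which layout to choose usually depends on how the data arrives: records read one at a time fit the row layout, while data already grouped per field fits the column layout.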
5.1.4. Create from NDArray
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(np.random.randn(7, 4))
>>>
>>> df
          0         1         2         3
0  1.764052  0.400157  0.978738  2.240893
1  1.867558 -0.977278  0.950088 -0.151357
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
4  1.494079 -0.205158  0.313068 -0.854096
5 -2.552990  0.653619  0.864436 -0.742165
6  2.269755 -1.454366  0.045759 -0.187184
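An array carries no labels, so both rows and columns are numbered from 0, as in the output above. A minimal sketch of supplying labels through the `columns` and `index` parameters, using a small array of consecutive integers instead of random data so the result is easy to follow (the labels and variable names are illustrative):

```python
import numpy as np
import pandas as pd

# 3x4 array of consecutive integers, used here instead of random data
data = np.arange(12).reshape(3, 4)

# Without labels, rows and columns are numbered from 0
unlabeled = pd.DataFrame(data)
print(list(unlabeled.columns))

# Column and row labels can be passed explicitly (labels are illustrative)
labeled = pd.DataFrame(
    data,
    columns=['A', 'B', 'C', 'D'],
    index=['x', 'y', 'z'],
)
print(labeled)
```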
5.1.5. Use Case - 0x01
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> pd.DataFrame({
...     'A': 1.,
...     'B': pd.Timestamp('1961-04-12'),
...     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
...     'D': np.array([3] * 4, dtype='int32'),
...     'E': pd.Categorical(['test', 'train', 'test', 'train']),
...     'F': 'foo',
...     'G': [1, 2, 3, 4],
... })
     A          B    C  D      E    F  G
0  1.0 1961-04-12  1.0  3   test  foo  1
1  1.0 1961-04-12  1.0  3  train  foo  2
2  1.0 1961-04-12  1.0  3   test  foo  3
3  1.0 1961-04-12  1.0  3  train  foo  4
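In this use case the scalar values (`A`, `B`, `F`) are repeated in every row to match the length of the other columns, and each column keeps its own dtype. A sketch of how this could be checked with `DataFrame.dtypes`; the exact dtype names reported for the timestamp and string columns may differ between pandas versions, so no output is shown here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # Scalars: repeated in every row
    'A': 1.,
    'B': pd.Timestamp('1961-04-12'),
    'F': 'foo',
    # Sequences of length 4: they determine the number of rows
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'G': [1, 2, 3, 4],
})

# One dtype per column
print(df.dtypes)

# The scalar columns were expanded to four rows
print(len(df))
```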
5.1.6. Use Case - 0x02
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(
...     columns=['Morning', 'Noon', 'Evening', 'Midnight'],
...     index=pd.date_range('1999-12-30', periods=7),
...     data=np.random.randn(7, 4),
... )
>>> df
             Morning      Noon   Evening  Midnight
1999-12-30  1.764052  0.400157  0.978738  2.240893
1999-12-31  1.867558 -0.977278  0.950088 -0.151357
2000-01-01 -0.103219  0.410599  0.144044  1.454274
2000-01-02  0.761038  0.121675  0.443863  0.333674
2000-01-03  1.494079 -0.205158  0.313068 -0.854096
2000-01-04 -2.552990  0.653619  0.864436 -0.742165
2000-01-05  2.269755 -1.454366  0.045759 -0.187184
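Because the index is built with `pd.date_range`, rows are labeled by date rather than by position. A brief sketch, rebuilding the same frame with the same seed, of reading a row and a single cell by label with `.loc` (the choice of date and column is only an illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # same seed as above, so the values match the table

df = pd.DataFrame(
    columns=['Morning', 'Noon', 'Evening', 'Midnight'],
    index=pd.date_range('1999-12-30', periods=7),
    data=np.random.randn(7, 4),
)

# One row, selected by its date label
print(df.loc['2000-01-01'])

# One cell, selected by date label and column name
print(df.loc['2000-01-01', 'Noon'])
```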
5.1.7. Assignments
"""
* Assignment: DataFrame Create
* Complexity: easy
* Lines of code: 5 lines
* Time: 3 min
English:
1. Create `result: pd.DataFrame` for input data
2. Name columns: `Crew`, `Role`, `Astronaut`
3. Run doctests - all must succeed
Hints:
* Use selection with `alt` key in your IDE
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 10)
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is pd.DataFrame, \
'Variable `result` must be a `pd.DataFrame` type'
>>> result # doctest: +NORMALIZE_WHITESPACE
Crew Role Astronaut
0 Prime CDR Neil Armstrong
1 Prime LMP Buzz Aldrin
2 Prime CMP Michael Collins
3 Backup CDR James Lovell
4 Backup LMP William Anders
5 Backup CMP Fred Haise
"""
import pandas as pd
"""
"Prime", "CDR", "Neil Armstrong"
"Prime", "LMP", "Buzz Aldrin"
"Prime", "CMP", "Michael Collins"
"Backup", "CDR", "James Lovell"
"Backup", "LMP", "William Anders"
"Backup", "CMP", "Fred Haise"
"""
# type: pd.DataFrame
result = ...