7.2. CSV Format Read
7.2.1. Assignments
"""
* Assignment: CSV Format ReadString
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Do not convert numeric values to `float`, leave them as `str`
3. Run doctests - all must succeed
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
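One possible approach, shown only as a sketch: split the text into lines with `str.splitlines()`, then split each line on commas and wrap the fields in a `tuple`. The list comprehension is a choice made for this illustration; a plain `for` loop with `list.append()` works equally well.

```python
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

# Each line becomes one tuple of strings; no numeric conversion is done.
result = [tuple(line.strip().split(','))
          for line in DATA.splitlines()]
```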
"""
* Assignment: CSV Format ReadSwitch
* Complexity: easy
* Lines of code: 6 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Substitute last element (class label) with value from `LABEL_ENCODER`
3. Run doctests - all must succeed
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
* `dict.get()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,0
5.1,3.5,1.4,0.2,1
5.7,2.8,4.1,1.3,2"""
LABEL_ENCODER = {
'0': 'virginica',
'1': 'setosa',
'2': 'versicolor'}
# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
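A sketch of how the substitution might be done, assuming that labels missing from `LABEL_ENCODER` (such as the `species` column name in the first line) should be left unchanged. The star-unpacking of each row is a convenience used here, not something the assignment requires.

```python
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,0
5.1,3.5,1.4,0.2,1
5.7,2.8,4.1,1.3,2"""

LABEL_ENCODER = {
    '0': 'virginica',
    '1': 'setosa',
    '2': 'versicolor'}

result = []
for line in DATA.splitlines():
    # Separate the measurements from the last field (the class label).
    *values, label = line.strip().split(',')
    # dict.get() with a default keeps unknown labels, e.g. the header's 'species'.
    label = LABEL_ENCODER.get(label, label)
    result.append(tuple(values + [label]))
```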
"""
* Assignment: CSV Format ReadLabelEncoder
* Complexity: medium
* Lines of code: 10 lines
* Time: 13 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Generate `LABEL_ENCODER: dict[int,str]` from the class names in `header: list[str]` (the first line of `DATA`)
3. Substitute last element (class label) with value from `LABEL_ENCODER`
4. Run doctests - all must succeed
Hints:
* `dict(enumerate())`
* `str.strip()`
* `str.split()`
* `dict.get()`
* `int()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""
# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
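The following sketch assumes that the first two fields of the first line (`3` and `4`) are counts rather than class names, and that the class names start at the third field. The names `nrows`, `nvalues` and `class_names` are hypothetical, chosen only for this illustration.

```python
DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

header, *rows = DATA.splitlines()

# Assumption: the header holds two counts followed by the class names.
nrows, nvalues, *class_names = header.strip().split(',')

# enumerate() numbers the class names from 0, giving {0: 'setosa', ...}.
LABEL_ENCODER = dict(enumerate(class_names))

result = []
for line in rows:
    *values, label = line.strip().split(',')
    # Labels in the data are text, so cast to int before the lookup.
    species = LABEL_ENCODER.get(int(label))
    result.append(tuple(values + [species]))
```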
"""
* Assignment: CSV Format ReadTypeCast
* Complexity: easy
* Lines of code: 9 lines
* Time: 8 min
English:
1. Convert `DATA` to `result: list[tuple]`
2. Convert numeric values to `float`, leave the header and species names as `str`
3. Run doctests - all must succeed
Hints:
* `str.strip()`
* `str.split()`
* `map()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
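A minimal sketch of the type cast, assuming that the first line is a header to be kept as text and that the last field of every other line is the only non-numeric value:

```python
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

header, *rows = DATA.splitlines()

# The header row stays as a tuple of strings.
result = [tuple(header.strip().split(','))]

for line in rows:
    *values, species = line.strip().split(',')
    # map() applies float to every measurement; species stays a string.
    values = map(float, values)
    result.append(tuple(values) + (species,))
```

If a data line could contain a non-numeric measurement, `float()` would raise `ValueError`; this sketch does not handle that case.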
"""
* Assignment: CSV Format ReadFixedHeader
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[dict]`
2. Use `HEADER` as dict keys
3. Do not convert numeric values to `float`, leave them as `str`
4. Run doctests - all must succeed
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
* `dict(zip())`
* `list.append()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is dict for x in result), \
'All rows in `result` should be dict'
>>> result # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
'petal_width': '1.9', 'species': 'virginica'},
{'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
'petal_width': '0.2', 'species': 'setosa'},
{'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
'petal_width': '1.3', 'species': 'versicolor'}]
"""
DATA = """5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
HEADER = [
'sepal_length',
'sepal_width',
'petal_length',
'petal_width',
'species',
]
# Rows from `DATA` as dicts, with `HEADER` as keys
# type: list[dict[str,str]]
result = ...
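The `dict(zip())` hint can be sketched as follows: `zip()` pairs each name in `HEADER` with the field at the same position, and `dict()` turns those pairs into one row. This is an illustration rather than the only valid solution.

```python
DATA = """5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

HEADER = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species',
]

result = []
for line in DATA.splitlines():
    values = line.strip().split(',')
    # Pair column names with values; all values stay as strings.
    result.append(dict(zip(HEADER, values)))
```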
"""
* Assignment: CSV Format ReadGenerateHeader
* Complexity: easy
* Lines of code: 7 lines
* Time: 8 min
English:
1. Generate `header: list[str]` from the first line of `DATA`
2. Convert `DATA` to `result: list[dict]`
3. Use `header` as keys
4. Do not convert numeric values to `float`, leave them as `str`
5. Run doctests - all must succeed
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
* `dict(zip())`
* `list.append()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> result = list(result) # expand map object
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is dict for x in result), \
'All rows in `result` should be dict'
>>> result # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
'petal_width': '1.9', 'species': 'virginica'},
{'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
'petal_width': '0.2', 'species': 'setosa'},
{'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
'petal_width': '1.3', 'species': 'versicolor'}]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# Rows from `DATA` as dicts, with the first line of `DATA` as keys
# type: list[dict]
result = ...
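As a sketch, the header can be taken from the first line with star-unpacking and then reused as keys for every remaining line. The variable name `rows` is introduced here for illustration only.

```python
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

# The first line holds the column names, the rest are data rows.
header, *rows = DATA.splitlines()
header = header.strip().split(',')

result = []
for line in rows:
    values = line.strip().split(',')
    # Values are left as strings, as the assignment requires.
    result.append(dict(zip(header, values)))
```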