8.2. CSV Format Read¶

8.2.1. Assignments¶

Code 8.13. Solution¶

"""
* Assignment: CSV Format ReadString
* Complexity: easy
* Lines of code: 1 lines
* Time: 5 min

English:
    1. Convert `DATA` to `result: list[tuple[str]]`
    2. Do not convert numeric values to `float`, leave them as `str`
    3. Run doctests - all must succeed

Polish:
    1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
    2. Nie konwertuj wartości numerycznych do `float`, zostaw jako `str`
    3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.splitlines()`
    * `str.strip()`
    * `str.split()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     ('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...

Code 8.14. Solution¶

"""
* Assignment: CSV Format ReadSwitch
* Complexity: easy
* Lines of code: 6 lines
* Time: 5 min

English:
    1. Convert `DATA` to `result: list[tuple[str]]`
    2. Substitute last element (class label) with value from `LABEL_ENCODER`
    3. Run doctests - all must succeed

Polish:
    1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
    2. Podmień ostatni element (etykietę klasową) z wartością z `LABEL_ENCODER`
    3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.splitlines()`
    * `str.strip()`
    * `str.split()`
    * `dict.get()`
    * `list() + list()`
    * `list.append()`
    * `tuple()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     ('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,0
5.1,3.5,1.4,0.2,1
5.7,2.8,4.1,1.3,2"""

LABEL_ENCODER = {
    '0': 'virginica',
    '1': 'setosa',
    '2': 'versicolor',
}

# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...

Code 8.15. Solution¶

"""
* Assignment: CSV Format ReadLabelEncoder
* Complexity: medium
* Lines of code: 10 lines
* Time: 13 min

English:
    1. Convert `DATA` to `result: list[tuple[str]]`
    2. Generate `LABEL_ENCODER: dict[int,str]` from `header: list[str]`
    3. Substitute last element (class label) with value from `LABEL_ENCODER`
    4. Run doctests - all must succeed

Polish:
    1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
    2. Wygeneruj `LABEL_ENCODER: dict[int,str]` z `header: list[str]`
    3. Podmień ostatni element (etykietę klasową) z wartością z `LABEL_ENCODER`
    4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `dict(enumerate())`
    * `str.strip()`
    * `str.split()`
    * `dict.get()`
    * `int()`
    * `list() + list()`
    * `list.append()`
    * `tuple()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""

DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...

Code 8.16. Solution¶

"""
* Assignment: CSV Format ReadTypeCast
* Complexity: easy
* Lines of code: 9 lines
* Time: 8 min

English:
    1. Convert `DATA` to `result: list[tuple[str]]`
    2. Convert numeric values to `float`
    3. Run doctests - all must succeed

Polish:
    1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
    2. Przekonwertuj wartości numeryczne do `float`
    3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.strip()`
    * `str.split()`
    * `map()`
    * `list() + list()`
    * `list.append()`
    * `tuple()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     (5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor')]
"""

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...

Code 8.17. Solution¶

"""
* Assignment: CSV Format ReadFixedHeader
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min

English:
    1. Convert `DATA` to `result: list[dict]`
    2. Use `HEADER` as dict keys
    3. Do not convert numeric values to `float`, leave them as `str`
    4. Run doctests - all must succeed

Polish:
    1. Przekonwertuj `DATA` to `result: list[dict]`
    2. Użyj `HEADER` jako kluczy dictów
    3. Nie konwertuj wartości numeryczne do `float`, pozostaw je jako `str`
    4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.splitlines()`
    * `str.strip()`
    * `str.split()`
    * `dict(zip())`
    * `list.append()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is dict for x in result), \
    'All rows in `result` should be dict'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
      'petal_width': '1.9', 'species': 'virginica'},
     {'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
      'petal_width': '0.2', 'species': 'setosa'},
     {'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
      'petal_width': '1.3', 'species': 'versicolor'}]
"""

DATA = """5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

HEADER = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species',
]

# Replace keys with `HEADER`
# type: list[dict[str,str]]
result = ...

Code 8.18. Solution¶

"""
* Assignment: CSV Format ReadGenerateHeader
* Complexity: easy
* Lines of code: 7 lines
* Time: 8 min

English:
    1. Generate `header: list[str]` from first line `DATA`
    2. Convert `DATA` to `result: list[dict]`
    3. Use `header` as keys
    4. Do not convert numeric values to `float`, leave them as `str`
    5. Run doctests - all must succeed

Polish:
    1. Wygeneruj `header: list[str]` z pierwszej linii `DATA`
    2. Przekonwertuj `DATA` to `result: list[dict]`
    3. Użyj nagłówka jako kluczy
    4. Nie konwertuj wartości numeryczne do `float`, pozostaw je jako `str`
    5. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.strip()`
    * `str.split()`
    * `map()`
    * `list() + list()`
    * `list.append()`
    * `tuple()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)  # expand map object
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is dict for x in result), \
    'All rows in `result` should be dict'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
      'petal_width': '1.9', 'species': 'virginica'},
     {'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
      'petal_width': '0.2', 'species': 'setosa'},
     {'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
      'petal_width': '1.3', 'species': 'versicolor'}]
"""

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""

# replace fieldnames with `FIELDNAMES`
# type: list[dict]
result = ...