8.5. CSV Reader

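  • Reads a CSV file row by row; each row is a list[str], so list(reader) gives list[list[str]]

  • csv.reader()

  • Encoding is set in open(), not in csv.reader(); pass encoding='utf-8' explicitly, because the open() default can depend on the platform locale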
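
The object returned by csv.reader() is an iterator, so rows are parsed only as they are consumed and can be consumed only once. A minimal sketch of that behavior, assuming an in-memory io.StringIO in place of a real file (the names `buffer`, `header`, `rows` and `leftover` are illustrative only):

```python
import csv
import io

# In-memory stand-in for an open text file (illustrative only)
buffer = io.StringIO('a,b,c\n1,2,3\n4,5,6\n')

reader = csv.reader(buffer)

header = next(reader)      # consumes only the first row
rows = list(reader)        # consumes the remaining rows
leftover = list(reader)    # the reader is exhausted by now

print(header)    # ['a', 'b', 'c']
print(rows)      # [['1', '2', '3'], ['4', '5', '6']]
print(leftover)  # []
```

Every field is a str, including the numeric ones; any type conversion has to be done by the caller.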

8.5.1. SetUp

>>> import csv
>>> from pprint import pprint
>>> from pathlib import Path

8.5.2. Minimal

  • Default mode of open() is mode='r' (read, text)

Data:

sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor

SetUp:

>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... """
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)

Usage:

>>> with open('/tmp/myfile.csv') as file:
...     reader = csv.reader(file)
...     result = list(reader)
>>>
>>>
>>> pprint(result)
[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
 ['5.8', '2.7', '5.1', '1.9', 'virginica'],
 ['5.1', '3.5', '1.4', '0.2', 'setosa'],
 ['5.7', '2.8', '4.1', '1.3', 'versicolor']]
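
A common follow-up is to separate the header from the data rows and to turn each row into a tuple. The sketch below is one possible way to do it, not the only one. It assumes a temporary directory instead of /tmp, and it opens the file with newline='', which the csv module documentation recommends for file objects. The names `path`, `header` and `rows` are illustrative.

```python
import csv
import tempfile
from pathlib import Path

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
"""

# Temporary directory, so the sketch does not depend on /tmp existing
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'myfile.csv'
    path.write_text(DATA, encoding='utf-8')

    # newline='' lets the csv module handle line endings itself
    with open(path, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.reader(file)
        header = next(reader)                  # first row only
        rows = [tuple(row) for row in reader]  # remaining rows as tuples

print(header)
print(rows)
```

Because the reader is consumed inside the with block, `header` and `rows` remain usable after the file is closed.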

8.5.3. Parametrized

Data:

"sepal_length";"sepal_width";"petal_length";"petal_width";"species"
"5.8";"2.7";"5.1";"1.9";"virginica"
"5.1";"3.5";"1.4";"0.2";"setosa"
"5.7";"2.8";"4.1";"1.3";"versicolor"

SetUp:

>>> DATA = '''"sepal_length";"sepal_width";"petal_length";"petal_width";"species"
... "5.8";"2.7";"5.1";"1.9";"virginica"
... "5.1";"3.5";"1.4";"0.2";"setosa"
... "5.7";"2.8";"4.1";"1.3";"versicolor"
... '''
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)

Usage:

>>> with open('/tmp/myfile.csv', mode='r', encoding='utf-8') as file:
...     reader = csv.reader(file, quotechar='"', delimiter=';', quoting=csv.QUOTE_ALL)
...     result = list(reader)
>>>
>>>
>>> pprint(result)
[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
 ['5.8', '2.7', '5.1', '1.9', 'virginica'],
 ['5.1', '3.5', '1.4', '0.2', 'setosa'],
 ['5.7', '2.8', '4.1', '1.3', 'versicolor']]
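
To see why the parameters matter, the sketch below parses the same kind of line with a mismatched and with a matching delimiter, and then with quoting=csv.QUOTE_NONNUMERIC, which makes the reader convert unquoted fields to float. It assumes in-memory io.StringIO data instead of a file; the sample lines and the names `wrong`, `right` and `typed` are made up for illustration.

```python
import csv
import io

LINE = '5.8;2.7;virginica\n'

# Default delimiter is ',' so the whole line stays a single field
wrong = next(csv.reader(io.StringIO(LINE)))

# A matching delimiter splits the line into three fields
right = next(csv.reader(io.StringIO(LINE), delimiter=';'))

# QUOTE_NONNUMERIC: unquoted fields are converted to float,
# quoted fields stay str
MIXED = '5.8;2.7;"virginica"\n'
typed = next(csv.reader(io.StringIO(MIXED), delimiter=';',
                        quoting=csv.QUOTE_NONNUMERIC))

print(wrong)  # ['5.8;2.7;virginica']
print(right)  # ['5.8', '2.7', 'virginica']
print(typed)  # [5.8, 2.7, 'virginica']
```

Note that QUOTE_NONNUMERIC raises ValueError when an unquoted field is not a valid number, so it fits only files where every text field is quoted.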

8.5.4. Assignments

Code 8.25. Solution
"""
* Assignment: CSV Reader Syntax
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min

English:
    1. Using `csv.reader()` read data from `FILE`
    2. Define `result: list[tuple]` with converted data
    3. Use Unix `\n` line terminator
    4. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     ('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]

    >>> remove(FILE)
"""

import csv


FILE = r'_temporary.csv'

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""


with open(FILE, mode='w') as file:
    file.write(DATA)

# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...


Code 8.26. Solution
"""
* Assignment: CSV Reader Substitute
* Complexity: easy
* Lines of code: 6 lines
* Time: 5 min

English:
    1. Using `csv.reader()` read data from `FILE`
    2. Define `result: list[tuple]` with converted data
    3. Look up the species name in the `SPECIES` dictionary (its keys are `int`, values read from the file are `str`)
    4. Use Unix `\n` line terminator
    5. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]

    >>> remove(FILE)
"""

import csv


FILE = r'_temporary.csv'

DATA = """5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

SPECIES = {
    0: 'setosa',
    1: 'virginica',
    2: 'versicolor'}

with open(FILE, mode='w') as file:
    file.write(DATA)

# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...


Code 8.27. Solution
"""
* Assignment: CSV Reader Enumerate
* Complexity: medium
* Lines of code: 8 lines
* Time: 8 min

English:
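    1. Using `csv.reader()` read data from `FILE`
    2. The first row holds two counts followed by the species names
    3. Decode the last column of each data row to the species name at that position (use `enumerate()`)
    4. Define `result: list[tuple]` with converted data
    5. Use Unix `\n` line terminator
    6. Run doctests - all must succeed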

Hint:
    * `dict(OrderedDict)` applies only to `csv.DictReader` on Python before 3.8; `csv.reader()` returns plain lists

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('5.8', '2.7', '5.1', '1.9', 'virginica'),
     ('5.1', '3.5', '1.4', '0.2', 'setosa'),
     ('5.7', '2.8', '4.1', '1.3', 'versicolor')]

    >>> remove(FILE)
"""

import csv


FILE = r'_temporary.csv'

DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

with open(FILE, mode='w') as file:
    file.write(DATA)

# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...

Code 8.28. Solution
"""
* Assignment: CSV Reader TypeCast
* Complexity: medium
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Using `csv.reader()` read data from `FILE`
    2. Define `result: list[tuple]` with converted data
    3. Convert numeric values to `float`; leave the header and species as `str`
    4. Use Unix `\n` line terminator
    5. Run doctests - all must succeed

Hint:
    * `dict(OrderedDict)` applies only to `csv.DictReader` on Python before 3.8; `csv.reader()` returns plain lists

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     (5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor')]

    >>> remove(FILE)
"""

import csv


FILE = r'_temporary.csv'

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""


with open(FILE, mode='w') as file:
    file.write(DATA)


# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...