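8.7. CSV DictReader
csv.DictReader() reads a CSV file and maps each row to a dict keyed by the header fields
Collecting the rows with list() gives list[dict]; all values are str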
8.7.1. SetUp
>>> import csv
>>> from pathlib import Path
>>> from pprint import pprint
8.7.2. Minimal
Data:
sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
SetUp:
>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... """
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)
Usage:
>>> with open('/tmp/myfile.csv') as file:
...     reader = csv.DictReader(file)
...     result = list(reader)
>>>
>>> pprint(result, sort_dicts=False)
[{'sepal_length': '5.8',
  'sepal_width': '2.7',
  'petal_length': '5.1',
  'petal_width': '1.9',
  'species': 'virginica'},
 {'sepal_length': '5.1',
  'sepal_width': '3.5',
  'petal_length': '1.4',
  'petal_width': '0.2',
  'species': 'setosa'},
 {'sepal_length': '5.7',
  'sepal_width': '2.8',
  'petal_length': '4.1',
  'petal_width': '1.3',
  'species': 'versicolor'}]
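csv.DictReader is an iterator: it yields one dict per row, and list() is what collects them. The following is a minimal sketch of that behavior, assuming an in-memory io.StringIO stands in for the file on disk (this substitution is not part of the example above). It also reads the reader's fieldnames attribute, which holds the column names taken from the first line.

```python
import csv
import io

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
"""

# Assumption: io.StringIO replaces the file opened in the example above
with io.StringIO(DATA) as file:
    reader = csv.DictReader(file)
    header = reader.fieldnames  # accessing it reads the header line
    # Rows are produced one at a time while iterating
    species = [row['species'] for row in reader]

print(header)
print(species)
```

Iterating row by row avoids holding the whole file in memory when only part of each row is needed.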
8.7.3. Parametrized
Data:
"sepal_length";"sepal_width";"petal_length";"petal_width";"species"
"5.8";"2.7";"5.1";"1.9";"virginica"
"5.1";"3.5";"1.4";"0.2";"setosa"
"5.7";"2.8";"4.1";"1.3";"versicolor"
SetUp:
>>> DATA = '''"sepal_length";"sepal_width";"petal_length";"petal_width";"species"
... "5.8";"2.7";"5.1";"1.9";"virginica"
... "5.1";"3.5";"1.4";"0.2";"setosa"
... "5.7";"2.8";"4.1";"1.3";"versicolor"
... '''
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)
Usage:
>>> with open('/tmp/myfile.csv', mode='r', encoding='utf-8') as file:
...     reader = csv.DictReader(file, quotechar='"', delimiter=';', quoting=csv.QUOTE_ALL)
...     result = list(reader)
>>>
>>> pprint(result, sort_dicts=False)
[{'sepal_length': '5.8',
  'sepal_width': '2.7',
  'petal_length': '5.1',
  'petal_width': '1.9',
  'species': 'virginica'},
 {'sepal_length': '5.1',
  'sepal_width': '3.5',
  'petal_length': '1.4',
  'petal_width': '0.2',
  'species': 'setosa'},
 {'sepal_length': '5.7',
  'sepal_width': '2.8',
  'petal_length': '4.1',
  'petal_width': '1.3',
  'species': 'versicolor'}]
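It may help to see which of these parameters actually change how the data is read. In the csv module, quotechar='"' is the default, and csv.QUOTE_ALL is an instruction for writers; only delimiter=';' is required to parse this data. The sketch below compares both variants. The parse() helper and the use of io.StringIO instead of a file are assumptions made for illustration.

```python
import csv
import io

DATA = '''"sepal_length";"sepal_width";"petal_length";"petal_width";"species"
"5.8";"2.7";"5.1";"1.9";"virginica"
"5.1";"3.5";"1.4";"0.2";"setosa"
"5.7";"2.8";"4.1";"1.3";"versicolor"
'''


def parse(**options):
    # Hypothetical helper: io.StringIO stands in for the file on disk
    with io.StringIO(DATA) as file:
        return list(csv.DictReader(file, **options))


explicit = parse(delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL)
implicit = parse(delimiter=';')  # rely on the default quotechar and quoting

print(explicit == implicit)
```

Passing the parameters explicitly, as in the example above, still documents the expected file format for the reader of the code.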
8.7.4. Custom Header
Read data from a CSV file using csv.DictReader() while supplying custom column names through the fieldnames parameter.
Note that when fieldnames is given, the first line (typically a header) is treated like normal data.
Therefore we skip it using old_header = file.readline():
Data:
sl,sw,pl,pw,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
SetUp:
>>> DATA = """sl,sw,pl,pw,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... """
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)
Usage:
>>> FIELDNAMES = [
... 'sepal_length',
... 'sepal_width',
... 'petal_length',
... 'petal_width',
... 'species',
... ]
>>>
>>> with open('/tmp/myfile.csv') as file:
...     old_header = file.readline()  # skip the first line (old header)
...     reader = csv.DictReader(file, fieldnames=FIELDNAMES)
...     result = list(reader)
>>>
>>> pprint(result, sort_dicts=False)
[{'sepal_length': '5.8',
  'sepal_width': '2.7',
  'petal_length': '5.1',
  'petal_width': '1.9',
  'species': 'virginica'},
 {'sepal_length': '5.1',
  'sepal_width': '3.5',
  'petal_length': '1.4',
  'petal_width': '0.2',
  'species': 'setosa'},
 {'sepal_length': '5.7',
  'sepal_width': '2.8',
  'petal_length': '4.1',
  'petal_width': '1.3',
  'species': 'versicolor'}]
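An alternative way to drop the old header is to consume the first row from the reader itself with next(). This is a sketch of that variant, not the approach used above; io.StringIO is assumed in place of the file. Because fieldnames is given, the old header comes back as an ordinary row mapping the new names to the old ones.

```python
import csv
import io

DATA = """sl,sw,pl,pw,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
"""

FIELDNAMES = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species',
]

# Assumption: io.StringIO replaces the file on disk
with io.StringIO(DATA) as file:
    reader = csv.DictReader(file, fieldnames=FIELDNAMES)
    old_header = next(reader)  # first row: new names mapped to old names
    result = list(reader)      # remaining rows are the actual data

print(old_header)
print(len(result))
```

Skipping through the reader rather than file.readline() means the header is parsed with the same delimiter and quoting rules as the data rows.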
8.7.5. Use Case - 0x01
sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
>>> import csv
>>> from pathlib import Path
>>> from pprint import pprint
>>>
>>>
>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... """
>>>
>>> _ = Path('/tmp/myfile.csv').write_text(DATA)
>>>
>>>
>>> def clean(row: dict) -> dict:
...     return {
...         'sepal_length': float(row['sepal_length']),
...         'sepal_width': float(row['sepal_width']),
...         'petal_length': float(row['petal_length']),
...         'petal_width': float(row['petal_width']),
...         'species': row['species']
...     }
>>>
>>>
>>> with open('/tmp/myfile.csv') as file:
...     reader = csv.DictReader(file)
...     result = map(clean, reader)
...     result = list(result)
>>>
>>> pprint(result, sort_dicts=False)
[{'sepal_length': 5.8,
  'sepal_width': 2.7,
  'petal_length': 5.1,
  'petal_width': 1.9,
  'species': 'virginica'},
 {'sepal_length': 5.1,
  'sepal_width': 3.5,
  'petal_length': 1.4,
  'petal_width': 0.2,
  'species': 'setosa'},
 {'sepal_length': 5.7,
  'sepal_width': 2.8,
  'petal_length': 4.1,
  'petal_width': 1.3,
  'species': 'versicolor'}]
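When the column names are not known in advance, the conversion can be written without listing each key. The sketch below is one possible generalization, not the method used above: the to_number() helper is hypothetical and simply tries float() on every value, keeping the original string when conversion fails. Note that float() also accepts strings such as 'nan' or 'inf', so this approach is only suitable when text columns cannot contain them.

```python
import csv
import io

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
"""


def to_number(value: str):
    # Hypothetical helper: convert to float when possible, else keep the text
    try:
        return float(value)
    except ValueError:
        return value


def clean(row: dict) -> dict:
    # Apply the conversion to every column, whatever its name
    return {key: to_number(value) for key, value in row.items()}


# Assumption: io.StringIO replaces the file on disk
with io.StringIO(DATA) as file:
    reader = csv.DictReader(file)
    result = [clean(row) for row in reader]

print(result[0])
```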
8.7.6. Assignments
"""
* Assignment: CSV DictReader Iris
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min
English:
1. Using `csv.DictReader` read the `FILE` content
2. Use explicit `encoding`, `delimiter` and `quotechar`
3. Replace column names with `FIELDNAMES`
4. Skip the first line (header)
5. Add rows to `result: list[dict]`
6. Run doctests - all must succeed
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> from os import remove
>>> remove(FILE)
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is dict for x in result), \
'All rows in `result` should be dict'
>>> result # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
'petal_width': '1.9', 'species': 'virginica'},
{'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
'petal_width': '0.2', 'species': 'setosa'},
{'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
'petal_width': '1.3', 'species': 'versicolor'}]
"""
import csv
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
FIELDNAMES = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species',
]
FILE = r'_temporary.csv'
with open(FILE, mode='w') as file:
    file.write(DATA)
# Using `csv.DictReader` read the `FILE` content
# type: list[dict]
result = ...