3.1. Serialization CSV

3.1.1. Dialects

import csv

csv.list_dialects()
# ['excel', 'excel-tab', 'unix']
  • quoting options:

    • csv.QUOTE_ALL (safest)

    • csv.QUOTE_MINIMAL (best)

    • csv.QUOTE_NONE

    • csv.QUOTE_NONNUMERIC

  • quotechar options:

    • ' - apostrophe

    • " - quote char (best)

    • | - pipe

    • None - no delimeter

  • lineterminator options:

    • \r\n - New line on Windows

    • \n - New line on *nix

    • *nix operating systems: Linux, macOS, BSD and other POSIX compliant OSes (excluding Windows)

  • encoding options:

    • utf-8 - international standard (should be always used!)

    • iso-8859-1 - ISO standard for Western Europe and USA

    • iso-8859-2 - ISO standard for Central Europe (including Poland)

    • cp1250 or windows-1250 - Polish encoding on Windows

    • cp1251 or windows-1251 - Russian encoding on Windows

    • cp1252 or windows-1252 - Western European encoding on Windows

    • ASCII - ASCII characters only

  • Microsoft Excel 2016 uses:

    • quotechar='"'

    • delimiter=','

    • lineterminator='\n'

    • encoding='...' - depends on Windows version and settings typically windows-*

3.1.2. Reader Object

Listing 3.57. Read data from CSV file using csv.reader()
import csv

FILE = r'/tmp/csv-reader.csv'
# sepal_length,sepal_width,petal_length,petal_width,species
# 5.4,3.9,1.3,0.4,setosa
# 5.9,3.0,5.1,1.8,virginica
# 6.0,3.4,4.5,1.6,versicolor


with open(FILE) as file:
    result = csv.reader(file)

    for line in result:
        print(line)

# ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
# ['5.4', '3.9', '1.3', '0.4', 'setosa']
# ['5.9', '3.0', '5.1', '1.8', 'virginica']
# ['6.0', '3.4', '4.5', '1.6', 'versicolor']

3.1.3. Writer Object

Listing 3.58. Writing data to CSV file using csv.writer()
import csv

FILE = r'/tmp/csv-writer.csv'

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
]

with open(FILE, mode='w') as file:
    result = csv.writer(file)
    result.writerows(DATA)


# Sepal length,Sepal width,Petal length,Petal width,Species
# 5.8,2.7,5.1,1.9,virginica
# 5.1,3.5,1.4,0.2,setosa
# 5.7,2.8,4.1,1.3,versicolor

3.1.4. DictReader

Listing 3.59. Read data from CSV file using csv.DictReader()
import csv

FILE = r'/tmp/csv-dictreader.csv'
# sepal_length,sepal_width,petal_length,petal_width,species
# 5.4,3.9,1.3,0.4,setosa
# 5.9,3.0,5.1,1.8,virginica
# 6.0,3.4,4.5,1.6,versicolor


with open(FILE) as file:
    result = csv.DictReader(file)

    for line in result:
        print(line)

# {'sepal_length': '5.4', 'sepal_width': '3.9', 'petal_length': '1.3', 'petal_width': '0.4', 'species': 'setosa'}
# {'sepal_length': '5.9', 'sepal_width': '3.0', 'petal_length': '5.1', 'petal_width': '1.8', 'species': 'virginica'}
# {'sepal_length': '6.0', 'sepal_width': '3.4', 'petal_length': '4.5', 'petal_width': '1.6', 'species': 'versicolor'}
Listing 3.60. Read data from CSV file using csv.DictReader()
import csv

FILE = r'/tmp/csv-dictreader.csv'
# 'sepal_length';'sepal_width';'petal_length';'petal_width';'species'
# '5,4';'3,9';'1,3';'0,4';'setosa'
# '5,9';'3,0';'5,1';'1,8';'virginica'
# '6,0';'3,4';'4,5';'1,6';'versicolor'


def isnumeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def clean(line):
    return {key: float(v) if isnumeric(v) else v
            for key, value in line.items()
            if (v := value.replace(',', '.'))}


with open(FILE) as file:
    result = csv.DictReader(file, delimiter=';', quotechar="'")

    for line in result:
        print(clean(line))


# {'sepal_length': 5.4, 'sepal_width': 3.9, 'petal_length': 1.3, 'petal_width': 0.4, 'species': 'setosa'}
# {'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'species': 'virginica'}
# {'sepal_length': 6.0, 'sepal_width': 3.4, 'petal_length': 4.5, 'petal_width': 1.6, 'species': 'versicolor'}
Listing 3.61. Read data from CSV file using csv.DictReader(). While giving custom names note, that first line (typically a header) will be treated like normal data. Therefore we skip it using header = file.readline()
import csv

FILE = r'/tmp/csv-dictreader.csv'
# sepal_length,sepal_width,petal_length,petal_width,species
# 5.4,3.9,1.3,0.4,setosa
# 5.9,3.0,5.1,1.8,virginica
# 6.0,3.4,4.5,1.6,versicolor

FIELDNAMES = [
    'Sepal Length',
    'Sepal Width',
    'Petal Length',
    'Petal Width',
    'Species',
]


with open(FILE) as file:
    result = csv.DictReader(file, fieldnames=FIELDNAMES, delimiter=',')
    file.readline()  # skip first line

    for line in result:
        print(line)

# {'Sepal Length': '5.4', 'Sepal Width': '3.9', 'Petal Length': '1.3', 'Petal Width': '0.4', 'Species': 'setosa'}
# {'Sepal Length': '5.9', 'Sepal Width': '3.0', 'Petal Length': '5.1', 'Petal Width': '1.8', 'Species': 'virginica'}
# {'Sepal Length': '6.0', 'Sepal Width': '3.4', 'Petal Length': '4.5', 'Petal Width': '1.6', 'Species': 'versicolor'}

3.1.5. DictWriter

  • Remember to add mode='w' to open() function

  • Default encoding is encoding='utf-8'

import csv

FILE = r'/tmp/csv-dictwriter.csv'

DATA = [
    {'Sepal Length': 5.4, 'Sepal Width': 3.9, 'Petal Length': 1.3, 'Petal Width': 0.4, 'Species': 'setosa'},
    {'Sepal Length': 5.9, 'Sepal Width': 3.0, 'Petal Length': 5.1, 'Petal Width': 1.8, 'Species': 'virginica'},
    {'Sepal Length': 6.0, 'Sepal Width': 3.4, 'Petal Length': 4.5, 'Petal Width': 1.6, 'Species': 'versicolor'},
]

header = DATA[0].keys()

with open(FILE, mode='w') as file:
    result = csv.DictWriter(file, fieldnames=header)
    result.writeheader()
    result.writerows(DATA)


# Sepal Length,Sepal Width,Petal Length,Petal Width,Species
# 5.4,3.9,1.3,0.4,setosa
# 5.9,3.0,5.1,1.8,virginica
# 6.0,3.4,4.5,1.6,versicolor
Listing 3.62. Write data to CSV file using csv.DictWriter()
import csv

FILE = r'/tmp/csv-dictwriter.csv'

DATA = [
    {'sepal_length': 5.4, 'sepal_width': 3.9, 'petal_length': 1.3, 'petal_width': 0.4, 'species': 'setosa'},
    {'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'species': 'virginica'},
    {'sepal_length': 6.0, 'sepal_width': 3.4, 'petal_length': 4.5, 'petal_width': 1.6, 'species': 'versicolor'},
]

FIELDNAMES = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species'
]

with open(FILE, mode='w', encoding='utf-8') as file:
    result = csv.DictWriter(
        f=file,
        fieldnames=FIELDNAMES,
        delimiter=',',
        quotechar='"',
        quoting=csv.QUOTE_ALL,
        lineterminator='\n')

    result.writeheader()
    result.writerows(DATA)

# "sepal_length","sepal_width","petal_length","petal_width","species"
# "5.4","3.9","1.3","0.4","setosa"
# "5.9","3.0","5.1","1.8","virginica"
# "6.0","3.4","4.5","1.6","versicolor"

3.1.6. Parsing Non-CSV Files

Listing 3.63. Parsing /etc/passwd file with csv.DictReader()
import csv


FILE = r'/tmp/etc-passwd.txt'
# root:x:0:0:root:/root:/bin/bash
# watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
# jimenez:x:1001:1001:José Jiménez:/home/jimenez:/bin/bash
# twardowski:x:1002:1002:Jan Twardowski:/home/twardowski:/bin/bash

with open(FILE) as file:
    result = csv.DictReader(
        file,
        fieldnames=['username', 'password', 'uid', 'gid', 'full_name', 'home', 'shell'],
        delimiter=':',
        lineterminator='\n',
        quoting=csv.QUOTE_NONE)

    for line in result:
        print(line)

# {'username': 'root', 'password': 'x', 'uid': '0',...}
# {'username': 'watney', 'password': 'x', 'uid': '1000',...}
# {'username': 'jimenez', 'password': 'x', 'uid': '1001',...}
# {'username': 'twardowski', 'password': 'x', 'uid': '1002',...}
Listing 3.64. Parsing Java properties file with csv.DictReader()
import csv


FILE = r'/tmp/sonar-project.properties'
# sonar.projectKey=habitatOS
# sonar.projectName=habitatOS
# sonar.language=py
# sonar.sourceEncoding=UTF-8
# sonar.verbose=true

with open(FILE) as file:
    result = csv.DictReader(
        file,
        fieldnames=['property', 'value'],
        delimiter='=',
        lineterminator='\n',
        quoting=csv.QUOTE_NONE)

    for line in result:
        print(line)

# {'property': 'sonar.projectKey', 'value': 'habitatOS'}
# {'property': 'sonar.projectName', 'value': 'habitatOS'}
# {'property': 'sonar.language', 'value': 'py'}
# {'property': 'sonar.sourceEncoding', 'value': 'UTF-8'}
# {'property': 'sonar.verbose', 'value': 'true'}

3.1.7. Good Practices

  • Always specify:

    • delimiter=',' to csv.DictReader() object

    • quotechar='"' to csv.DictReader() object

    • quoting=csv.QUOTE_ALL to csv.DictReader() object

    • lineterminator='\n' to csv.DictReader() object

    • encoding='utf-8' to open() function (especially when working with Microsoft Excel)

3.1.8. Assignments

3.1.8.1. Serialization CSV DictReader

English
  1. Use data from "Input" section (see below)

  2. Download data/iris.csv file and save as iris.csv in your script folder

  3. Using csv.DictReader read the content

  4. Use explicit encoding, delimiter and quotechar

  5. Replace column names to FIELDNAMES

  6. Skip the first line (header)

  7. Print rows with data

  8. Compare result with "Output" section (see below)

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Pobierz plik data/iris.csv i zapisz go jako iris.csv w katalogu ze skryptami

  3. Korzystając z csv.DictReader wczytaj zawartość pliku

  4. Podaj jawnie encoding, delimiter oraz quotechar

  5. Podmień nazwy kolumn na FIELDNAMES

  6. Pomiń pierwszą linię (nagłówek)

  7. Wypisz wiersze z danymi

  8. Porównaj wyniki z sekcją "Output" (patrz poniżej)

Input
FIELDNAMES = [
    'Sepal Length',
    'Sepal Width',
    'Petal Length',
    'Petal Width',
    'Species',
]
Output
{'Sepal Length': '5.4', 'Sepal Width': '3.9', 'Petal Length': '1.3', 'Petal Width': '0.4', 'Species': 'setosa'}
{'Sepal Length': '5.9', 'Sepal Width': '3.0', 'Petal Length': '5.1', 'Petal Width': '1.8', 'Species': 'virginica'}
{'Sepal Length': '6.0', 'Sepal Width': '3.4', 'Petal Length': '4.5', 'Petal Width': '1.6', 'Species': 'versicolor'}
...

3.1.8.2. Serialization CSV DictWriter

English
  1. Use data from "Input" section (see below)

  2. Using csv.DictWriter() save DATA to file

  3. Open file in your spreadsheet program like Microsoft Excel / Libre Office / Numbers etc.

  4. Open file in simple in your IDE and simple text editor (like Notepad, vim, gedit)

  5. Compare result with "Output" section (see below)

  6. Non functional requirements:

    • All fields must be enclosed by double quote " character

    • Use , to separate columns

    • Use utf-8 encoding

    • Use Unix \n newline

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Za pomocą csv.DictWriter() zapisz DATA do pliku

  3. Spróbuj otworzyć plik w arkuszu kalkulacyjnym tj. Microsoft Excel / Libre Office / Numbers itp

  4. Spróbuj otworzyć plik w IDE i prostym edytorze tekstu tj. Notepad, vim lub gedit

  5. Porównaj wyniki z sekcją "Output" (patrz poniżej)

  6. Wymagania niefunkcjonalne:

    • Wszystkie pola muszą być otoczone znakiem cudzysłowu "

    • Użyj , do oddzielenia kolumn

    • Użyj kodowania utf-8

    • Użyj zakończenia linii Unix \n

Input
DATA = [
    {'firstname': 'Jan',  'lastname': 'Twardowski'},
    {'firstname': 'José', 'lastname': 'Jiménez'},
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Ivan', 'lastname': 'Ivanovic'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]
Output
"firstname","lastname"
"Jan","Twardowski"
"José","Jiménez"
"Mark","Watney"
"Ivan","Ivanovic"
"Melissa","Lewis"

3.1.8.3. Serialization CSV List of Tuples

English
  1. Use data from "Input" section (see below)

  2. Using csv.DictWriter() save DATA to file

  3. Compare result with "Output" section (see below)

  4. Non functional requirements:

    • Do not use quotes in output CSV file

    • Use , to separate columns

    • Use utf-8 encoding

    • Use Unix \n newline

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Za pomocą csv.DictWriter() zapisz DATA do pliku

  3. Porównaj wyniki z sekcją "Output" (patrz poniżej)

  4. Wymagania niefunkcjonalne:

    • Nie używaj cudzysłowów w wynikowym pliku CSV

    • Użyj , do oddzielenia kolumn

    • Użyj kodowania utf-8

    • Użyj zakończenia linii Unix \n

Input
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]
Output
Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
7.6,3.0,6.6,2.1,virginica
4.9,3.0,1.4,0.2,setosa
4.9,2.5,4.5,1.7,virginica
7.1,3.0,5.9,2.1,virginica
4.6,3.4,1.4,0.3,setosa
5.4,3.9,1.7,0.4,setosa
5.7,2.8,4.5,1.3,versicolor
5.0,3.6,1.4,0.3,setosa
5.5,2.3,4.0,1.3,versicolor
6.5,3.0,5.8,2.2,virginica
6.5,2.8,4.6,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
6.9,3.1,4.9,1.5,versicolor
4.6,3.1,1.5,0.2,setosa

3.1.8.4. Serialization CSV Schemaless

English
  1. Use data from "Input" section (see below)

  2. Using csv.DictWriter() write variable schema data to CSV file

  3. fieldnames must be automatically generated from DATA

  4. fieldnames must always be in the same order

  5. Compare result with "Output" section (see below)

  6. Non functional requirements:

    • All fields must be enclosed by double quote " character

    • Use ; to separate columns

    • Use utf-8 encoding

    • Use Unix \n newline

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Za pomocą csv.DictWriter() zapisz do pliku CSV dane o zmiennej strukturze

  3. fieldnames musi być generowane automatycznie na podstawie DATA

  4. fieldnames ma być zawsze w takiej samej kolejności

  5. Porównaj wyniki z sekcją "Output" (patrz poniżej)

  6. Wymagania niefunkcjonalne:

    • Wszystkie pola muszą być otoczone znakiem cudzysłowu "

    • Użyj , do oddzielenia kolumn

    • Użyj kodowania utf-8

    • Użyj zakończenia linii Unix \n

The whys and wherefores
  • Ability to use csv module to write data

  • Ability to iterate over nested data structures

  • Dynamically generate data structures from other

Input
DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Sepal length': 5.0, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]
Output
Table 3.3. Output

Petal length

Petal width

Sepal length

Sepal width

Species

5.1

3.5

setosa

4.1

1.3

versicolor

1.8

6.3

virginica

0.2

5.0

setosa

4.1

2.8

versicolor

1.8

2.9

virginica

3.1.8.5. Serialization CSV Objects

English
  1. Use data from "Input" section (see below)

  2. Using csv.DictWriter() save data to CSV file

  3. Non functional requirements:

    • All fields must be enclosed by double quote " character

    • Use , to separate columns

    • Use utf-8 encoding

    • Use Unix \n newline

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Za pomocą csv.DictWriter() zapisz dane do pliku CSV

  3. Wymagania niefunkcjonalne:

    • Wszystkie pola muszą być otoczone znakiem cudzysłowu "

    • Użyj , do oddzielenia kolumn

    • Użyj kodowania utf-8

    • Użyj zakończenia linii Unix \n

Input
class Iris:
    def __init__(self, sepal_length, sepal_width,
                 petal_length, petal_width, species):

        self.sepal_length = sepal_length
        self.sepal_width = sepal_width
        self.petal_length = petal_length
        self.petal_width = petal_width
        self.species = species


DATA = [
    Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
    Iris(5.8, 2.7, 5.1, 1.9, 'virginica'),
    Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
    Iris(5.7, 2.8, 4.1, 1.3, 'versicolor'),
    Iris(6.3, 2.9, 5.6, 1.8, 'virginica'),
    Iris(6.4, 3.2, 4.5, 1.5, 'versicolor'),
]

3.1.8.6. Serialization CSV Relations

English
  1. Use data from "Input" section (see below)

  2. Using csv.DictWriter() save contacts from addressbook to CSV file

  3. How to write relations to CSV file (contact has many addresses)?

  4. Recreate object structure from CSV file

  5. Non functional requirements:

    • All fields must be enclosed by double quote " character

    • Use ; to separate columns

    • Use utf-8 encoding

    • Use Unix \n newline

Polish
  1. Użyj danych z sekcji "Input" (patrz poniżej)

  2. Za pomocą csv.DictWriter() zapisz kontakty z książki adresowej w pliku

  3. Jak zapisać w CSV dane relacyjne (kontakt ma wiele adresów)?

  4. Odtwórz strukturę obiektów na podstawie danych odczytanych z pliku

  5. Wymagania niefunkcjonalne:

    • Wszystkie pola muszą być otoczone znakiem cudzysłowu "

    • Użyj ; do oddzielenia kolumn

    • Użyj kodowania utf-8

    • Użyj zakończenia linii Unix \n

Input
class Contact:
     def __init__(self, firstname, lastname, addresses=()):
         self.firstname = firstname
         self.lastname = lastname
         self.addresses = addresses


 class Address:
     def __init__(self, location, city):
         self.location = location
         self.city = city


 DATA = [
     Contact(firstname='Jan', lastname='Twardowski', addresses=(
         Address(location='Johnson Space Center', city='Houston, TX'),
         Address(location='Kennedy Space Center', city='Merritt Island, FL'),
         Address(location='Jet Propulsion Laboratory', city='Pasadena, CA'),
     )),
     Contact(firstname='Mark', lastname='Watney'),
     Contact(firstname='Melissa', lastname='Lewis', addresses=()),
 ]