5.1. CSV Standard

5.1.1. Rationale

  • CSV - Comma Separated Values

  • CSV - Character Separated Values

CSV file with mixed values (numeric and strings). First line is a header:

SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.4, 3.9, 1.3, 0.4, setosa
5.9, 3.0, 5.1, 1.8, virginica
6.0, 3.4, 4.5, 1.6, versicolor
7.3, 2.9, 6.3, 1.8, virginica
5.6, 2.5, 3.9, 1.1, versicolor
5.4, 3.9, 1.3, 0.4, setosa

5.1.2. Variants

CSV file with numeric values:

5.4, 3.9, 1.3, 0.4, 0
5.9, 3.0, 5.1, 1.8, 1
6.0, 3.4, 4.5, 1.6, 2

CSV file with text values. First line is a header:

Firstname, Lastname
Mark, Watney
Jan, Twardowski
Melissa, Lewis
Alex, Vogel
150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
7.0,3.2,4.7,1.4,1
6.3,3.3,6.0,2.5,2
5.8,2.7,5.1,1.9,2
4.9,3.0,1.4,0.2,0
6.4,3.2,4.5,1.5,1

5.1.3. Delimiter

delimiter=',':

SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor

delimiter=';':

SepalLength;SepalWidth;PetalLength;PetalWidth;Species
5.8;2.7;5.1;1.9;virginica
5.1;3.5;1.4;0.2;setosa
5.7;2.8;4.1;1.3;versicolor
SepalLength;SepalWidth;PetalLength;PetalWidth;Species
5,8;2,7;5,1;1,9;virginica
5,1;3,5;1,4;0,2;setosa
5,7;2,8;4,1;1,3;versicolor

delimiter=':':

SepalLength:SepalWidth:PetalLength:PetalWidth:Species
5.8:2.7:5.1:1.9;virginica
5.1:3.5:1.4:0.2;setosa
5.7:2.8:4.1:1.3;versicolor
root:x:0:0:root:/root:/bin/bash
watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
jimenez:x:1001:1001:José Jiménez:/home/jimenez:/bin/bash
ivanovic:x:1002:1002:Иван Иванович:/home/ivanovic:/bin/bash
lewis:x:1003:1002:Melissa Lewis:/home/ivanovic:/bin/bash

delimiter='|':

SepalLength|SepalWidth|PetalLength|PetalWidth|Species
5.8|2.7|5.1|1.9|virginica
5.1|3.5|1.4|0.2|setosa
5.7|2.8|4.1|1.3|versicolor
Firstname | Lastname | Role
----------|----------|----------
Mark      | Watney   | Botanist
Melissa   | Lewis    | Commander
Rick      | Martinez | Pilot
| Firstname | Lastname | Role      |
|-----------|----------|-----------|
| Mark      | Watney   | Botanist  |
| Melissa   | Lewis    | Commander |
| Rick      | Martinez | Pilot     |

delimiter='\t':

Sepal length        Sepal width     Petal length    Petal width     Species
5.8 2.7     5.1     1.9     virginica
5.1 3.5     1.4     0.2     setosa
5.7 2.8     4.1     1.3     versicolor

5.1.4. Quotechar

  • " - quote char (best)

  • ' - apostrophe

quotechar='"':

"Sepal length","Sepal width","Petal length","Petal width","Species"
"5.8","2.7","5.1","1.9","virginica"
"5.1","3.5","1.4","0.2","setosa"
"5.7","2.8","4.1","1.3","versicolor"

quotechar="'":

'Sepal length','Sepal width','Petal length','Petal width','Species'
'5.8','2.7','5.1','1.9','virginica'
'5.1','3.5','1.4','0.2','setosa'
'5.7','2.8','4.1','1.3','versicolor'

quotechar='|':

|Sepal length|,|Sepal width|,|Petal length|,|Petal width|,|Species|
|5.8|,|2.7|,|5.1|,|1.9|,|virginica|
|5.1|,|3.5|,|1.4|,|0.2|,|setosa|
|5.7|,|2.8|,|4.1|,|1.3|,|versicolor|

quotechar='/':

/Sepal length/,/Sepal width/,/Petal length/,/Petal width/,/Species/
/5.8/,/2.7/,/5.1/,/1.9/,/virginica/
/5.1/,/3.5/,/1.4/,/0.2/,/setosa/
/5.7/,/2.8/,/4.1/,/1.3/,/versicolor/

5.1.5. Quoting

  • csv.QUOTE_ALL (safest)

  • csv.QUOTE_MINIMAL

  • csv.QUOTE_NONE

  • csv.QUOTE_NONNUMERIC

quoting=csv.QUOTE_ALL:

"Sepal length","Sepal width","Petal length","Petal width","Species"
"5.8","2.7","5.1","1.9","virginica"
"5.1","3.5","1.4","0.2","setosa"
"5.7","2.8","4.1","1.3","versicolor"

quoting=csv.QUOTE_MINIMAL:

Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor

quoting=csv.QUOTE_NONE:

Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor

quoting=csv.QUOTE_NONNUMERIC:

"Sepal length","Sepal width","Petal length","Petal width","Species"
5.8,2.7,5.1,1.9,"virginica"
5.1,3.5,1.4,0.2,"setosa"
5.7,2.8,4.1,1.3,"versicolor"

5.1.6. Lineterminator

  • \r\n - New line on Windows

  • \n - New line on *nix

  • *nix operating systems: Linux, macOS, BSD and other POSIX compliant OSes (excluding Windows)

5.1.7. Encoding

  • utf-8 - international standard (should be always used!)

  • iso-8859-1 - ISO standard for Western Europe and USA

  • iso-8859-2 - ISO standard for Central Europe (including Poland)

  • cp1250 or windows-1250 - Polish encoding on Windows

  • cp1251 or windows-1251 - Russian encoding on Windows

  • cp1252 or windows-1252 - Western European encoding on Windows

  • ASCII - ASCII characters only

with open(FILE, encoding='utf-8') as file:
    ...

5.1.8. Dialects

import csv

csv.list_dialects()
# ['excel', 'excel-tab', 'unix']
  • Microsoft Excel 2016-2020:

    • quoting=csv.QUOTE_MINIMAL

    • quotechar='"'

    • delimiter=','

    • lineterminator='\n'

    • encoding='...' - depends on Windows version and settings typically windows-*

  • Microsoft Excel macOS:

    • quoting=csv.QUOTE_MINIMAL

    • quotechar='"'

    • delimiter=','

    • lineterminator='\r\n'

    • encoding='utf-8'

  • Microsoft export options:

../../_images/csv-standard-dialects.png
$ file utf8.csv
utf8.csv: CSV text

$ cat utf8.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,zażółć gęślą jaźń
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file standard.csv
standard.csv: CSV text

$ cat standard.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,za_?__ g__l_ ja__
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file dos.csv
dos.csv: CSV text

$ cat dos.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,za_?__ g__l_ ja__
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file macintosh.csv
macintosh.csv: Non-ISO extended-ASCII text, with CR line terminators

$ cat macintosh.csv
,,"21,5",Some; Comment

5.1.9. Good Practices

Always specify:

  • delimiter=',' to csv.DictReader() object

  • quotechar='"' to csv.DictReader() object

  • quoting=csv.QUOTE_ALL to csv.DictReader() object

  • lineterminator='\n' to csv.DictReader() object

  • encoding='utf-8' to open() function (especially when working with Microsoft Excel)