9.1. CSV About
CSV - Comma/Character Separated Values
No binding CSV standard (RFC 4180 is informational only), just good practice
Flat file (2D) without relations
Relations have to be flattened (serialization, additional columns, etc.)
Typically the first line (header) contains the column names
Rarely, the first line describes the structure instead (nrows, ncols)
Internationalization: encoding
Localization: decimal separator, thousands separator, date format
Parameters: delimiter, quotechar, quoting, lineterminator, dialect
Example CSV file:
SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
6.3, 2.9, 5.6, 1.8, virginica
6.4, 3.2, 4.5, 1.5, versicolor
4.7, 3.2, 1.3, 0.2, setosa
7.0, 3.2, 4.7, 1.4, versicolor
7.6, 3.0, 6.6, 2.1, virginica
4.9, 3.0, 1.4, 0.2, setosa
4.9, 2.5, 4.5, 1.7, virginica
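A file like the one above can be read with the standard library `csv` module. The snippet below is a minimal sketch, not the only way to do it: it assumes the data is held in a string (shortened here to three rows) and relies on `skipinitialspace=True` to drop the space after each comma.

```python
import csv
import io

# Sample taken from the example above, shortened to three rows
DATA = """SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
"""

# skipinitialspace=True ignores the space that follows each delimiter
reader = csv.DictReader(io.StringIO(DATA), delimiter=',', skipinitialspace=True)
rows = list(reader)

print(reader.fieldnames)
print(rows[0])
```

All values come back as `str`; converting them to `float` is a separate step.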
9.1.1. Header
File without header:
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
6.3, 2.9, 5.6, 1.8, virginica
6.4, 3.2, 4.5, 1.5, versicolor
4.7, 3.2, 1.3, 0.2, setosa
7.0, 3.2, 4.7, 1.4, versicolor
7.6, 3.0, 6.6, 2.1, virginica
4.9, 3.0, 1.4, 0.2, setosa
4.9, 2.5, 4.5, 1.7, virginica
First line is a header:
SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
6.3, 2.9, 5.6, 1.8, virginica
6.4, 3.2, 4.5, 1.5, versicolor
4.7, 3.2, 1.3, 0.2, setosa
7.0, 3.2, 4.7, 1.4, versicolor
7.6, 3.0, 6.6, 2.1, virginica
4.9, 3.0, 1.4, 0.2, setosa
4.9, 2.5, 4.5, 1.7, virginica
First line is a structure: number of rows (nrows) and columns (ncols):
10, 5
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
6.3, 2.9, 5.6, 1.8, virginica
6.4, 3.2, 4.5, 1.5, versicolor
4.7, 3.2, 1.3, 0.2, setosa
7.0, 3.2, 4.7, 1.4, versicolor
7.6, 3.0, 6.6, 2.1, virginica
4.9, 3.0, 1.4, 0.2, setosa
4.9, 2.5, 4.5, 1.7, virginica
First line is a structure: number of rows (nrows) and features (nfeatures), followed by label_encoder values for label column:
10, 4, virginica, setosa, versicolor
5.8, 2.7, 5.1, 1.9, 0
5.1, 3.5, 1.4, 0.2, 1
5.7, 2.8, 4.1, 1.3, 2
6.3, 2.9, 5.6, 1.8, 0
6.4, 3.2, 4.5, 1.5, 2
4.7, 3.2, 1.3, 0.2, 1
7.0, 3.2, 4.7, 1.4, 2
7.6, 3.0, 6.6, 2.1, 0
4.9, 3.0, 1.4, 0.2, 1
4.9, 2.5, 4.5, 1.7, 0
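The header variants above need slightly different handling when reading. The sketch below covers two of them under stated assumptions: a file without a header, where the column names (`FIELDNAMES`, an illustrative name) are supplied by hand, and a file whose first line holds `nrows, ncols`, which is consumed first and then used to check the shape of the data.

```python
import csv
import io

# Variant 1: no header, so column names are supplied by hand
NO_HEADER = """5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
"""
FIELDNAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
records = list(csv.DictReader(io.StringIO(NO_HEADER),
                              fieldnames=FIELDNAMES,
                              skipinitialspace=True))

# Variant 2: the first line holds the structure (nrows, ncols)
WITH_SHAPE = """2, 5
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
"""
reader = csv.reader(io.StringIO(WITH_SHAPE), skipinitialspace=True)
nrows, ncols = map(int, next(reader))      # structure line, not data
rows = [tuple(row) for row in reader]

# Compare the declared shape with what was actually read
shape_ok = len(rows) == nrows and all(len(row) == ncols for row in rows)
print(records[0]['Species'], nrows, ncols, shape_ok)
```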
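9.1.2. Delimiter
The csv module expects the delimiter to be exactly 1 character long
delimiter=', ' (comma and space; the csv module rejects it, use delimiter=',' with skipinitialspace=True instead):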
SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
delimiter=',':
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
delimiter=';':
SepalLength;SepalWidth;PetalLength;PetalWidth;Species
5.8;2.7;5.1;1.9;virginica
5.1;3.5;1.4;0.2;setosa
5.7;2.8;4.1;1.3;versicolor
delimiter=':':
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
nobody:x:99:99:Nobody:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash
lewis:x:1001:1001:Melissa Lewis:/home/lewis:/bin/bash
martinez:x:1002:1002:Rick Martinez:/home/martinez:/bin/bash
delimiter='|':
| Firstname | Lastname | Role |
|-----------|----------|-----------|
| Mark | Watney | Botanist |
| Melissa | Lewis | Commander |
| Rick | Martinez | Pilot |
delimiter='\t':
SepalLength SepalWidth PetalLength PetalWidth Species
5.8 2.7 5.1 1.9 virginica
5.1 3.5 1.4 0.2 setosa
5.7 2.8 4.1 1.3 versicolor
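The delimiters above map directly onto the `delimiter` parameter of `csv.reader`. The following sketch uses shortened copies of the samples and also shows what happens with a two-character delimiter; the variable names are illustrative only.

```python
import csv

# csv.reader accepts any iterable of lines, not only file objects
passwd = [
    'root:x:0:0:root:/root:/bin/bash',
    'watney:x:1000:1000:Mark Watney:/home/watney:/bin/bash',
]
users = list(csv.reader(passwd, delimiter=':'))

tsv = ['SepalLength\tSepalWidth\tSpecies', '5.8\t2.7\tvirginica']
table = list(csv.reader(tsv, delimiter='\t'))

# A two-character delimiter such as ', ' raises TypeError
message = None
try:
    csv.reader(['5.8, 2.7'], delimiter=', ')
except TypeError as error:
    message = str(error)

# delimiter=',' plus skipinitialspace=True handles the ', ' layout
spaced = list(csv.reader(['5.8, 2.7, virginica'],
                         delimiter=',', skipinitialspace=True))

print(users[1][4], table[1], message, spaced)
```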
9.1.3. Quotechar
" - quote char (best)
' - apostrophe
quotechar='"':
"SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"
"5.8", "2.7", "5.1", "1.9", "virginica"
"5.1", "3.5", "1.4", "0.2", "setosa"
"5.7", "2.8", "4.1", "1.3", "versicolor"
quotechar="'":
'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'
'5.8', '2.7', '5.1', '1.9', 'virginica'
'5.1', '3.5', '1.4', '0.2', 'setosa'
'5.7', '2.8', '4.1', '1.3', 'versicolor'
quotechar='|':
|SepalLength|, |SepalWidth|, |PetalLength|, |PetalWidth|, |Species|
|5.8|, |2.7|, |5.1|, |1.9|, |virginica|
|5.1|, |3.5|, |1.4|, |0.2|, |setosa|
|5.7|, |2.8|, |4.1|, |1.3|, |versicolor|
quotechar='/':
/SepalLength/, /SepalWidth/, /PetalLength/, /PetalWidth/, /Species/
/5.8/, /2.7/, /5.1/, /1.9/, /virginica/
/5.1/, /3.5/, /1.4/, /0.2/, /setosa/
/5.7/, /2.8/, /4.1/, /1.3/, /versicolor/
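To see how `quotechar` changes the output, the sketch below writes the same row with three different quote characters. `to_csv` is a hypothetical helper written for this example; `quoting=csv.QUOTE_ALL` is used so that every field gets quoted. Note that `csv.writer` emits no space after the delimiter, unlike the listings above.

```python
import csv
import io


def to_csv(row, quotechar):
    """Write a single row to a string, quoting every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quotechar=quotechar,
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(row)
    return buffer.getvalue()


double = to_csv(['5.8', 'virginica'], quotechar='"')
single = to_csv(['5.8', 'virginica'], quotechar="'")
pipe = to_csv(['5.8', 'virginica'], quotechar='|')

# The reader must be given the same quotechar to undo the quoting
parsed = list(csv.reader(["'Some, comment','x'"], quotechar="'"))

print(double, single, pipe, parsed)
```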
9.1.4. Quoting
csv.QUOTE_ALL (safest)
csv.QUOTE_MINIMAL
csv.QUOTE_NONE
csv.QUOTE_NONNUMERIC
quoting=csv.QUOTE_ALL:
"SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"
"5.8", "2.7", "5.1", "1.9", "virginica"
"5.1", "3.5", "1.4", "0.2", "setosa"
"5.7", "2.8", "4.1", "1.3", "versicolor"
quoting=csv.QUOTE_MINIMAL:
SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
quoting=csv.QUOTE_NONE:
SepalLength, SepalWidth, PetalLength, PetalWidth, Species
5.8, 2.7, 5.1, 1.9, virginica
5.1, 3.5, 1.4, 0.2, setosa
5.7, 2.8, 4.1, 1.3, versicolor
quoting=csv.QUOTE_NONNUMERIC:
"SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"
5.8, 2.7, 5.1, 1.9, "virginica"
5.1, 3.5, 1.4, 0.2, "setosa"
5.7, 2.8, 4.1, 1.3, "versicolor"
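The four quoting modes can be compared by writing one row with each of them. This is a small sketch with an illustrative row and a hypothetical helper `write_row`; it also shows that `csv.QUOTE_NONNUMERIC` affects reading, because unquoted fields are converted to `float`.

```python
import csv
import io

ROW = [5.8, 2.7, 'virginica']


def write_row(row, quoting):
    """Write a single row to a string using the given quoting mode."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator='\n').writerow(row)
    return buffer.getvalue()


quote_all = write_row(ROW, csv.QUOTE_ALL)
quote_minimal = write_row(ROW, csv.QUOTE_MINIMAL)
quote_none = write_row(ROW, csv.QUOTE_NONE)
quote_nonnumeric = write_row(ROW, csv.QUOTE_NONNUMERIC)

# When reading with QUOTE_NONNUMERIC, unquoted fields become floats
typed = list(csv.reader(['5.8,2.7,"virginica"'], quoting=csv.QUOTE_NONNUMERIC))

print(quote_all, quote_minimal, quote_none, quote_nonnumeric, typed)
```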
9.1.5. Lineterminator
\r\n - New line on Windows
\n - New line on *nix
*nix operating systems: Linux, macOS, BSD and other POSIX compliant OSes (excluding Windows)
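`csv.writer` uses `\r\n` as its default line terminator on every platform, so a *nix-style file has to be requested explicitly. A minimal sketch, with `write` as a hypothetical helper and an in-memory buffer standing in for a file:

```python
import csv
import io

ROWS = [['a', 'b'], [1, 2]]


def write(rows, **options):
    """Write rows to a string without any newline translation."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, **options).writerows(rows)
    return buffer.getvalue()


windows = write(ROWS)                       # default lineterminator
unix = write(ROWS, lineterminator='\n')

print(repr(windows))
print(repr(unix))
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=''` so that the terminator chosen here is not translated again by the file object.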
9.1.6. Decimal Separator
0.1 - Decimal point
0,1 - Decimal comma
Decimal point:
SepalLength; SepalWidth; PetalLength; PetalWidth; Species
5.8; 2.7; 5.1; 1.9; virginica
5.1; 3.5; 1.4; 0.2; setosa
5.7; 2.8; 4.1; 1.3; versicolor
Decimal comma:
SepalLength; SepalWidth; PetalLength; PetalWidth; Species
5,8; 2,7; 5,1; 1,9; virginica
5,1; 3,5; 1,4; 0,2; setosa
5,7; 2,8; 4,1; 1,3; versicolor
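The `csv` module does not interpret numbers, so a decimal comma has to be converted by hand before calling `float()`. The sketch below assumes semicolon-delimited rows like the ones above; `to_float` is a hypothetical helper.

```python
import csv

DATA = [
    '5,8; 2,7; 5,1; 1,9; virginica',
    '5,1; 3,5; 1,4; 0,2; setosa',
]


def to_float(value):
    """Convert a number written with a decimal comma to float."""
    return float(value.replace(',', '.'))


result = []
for *values, species in csv.reader(DATA, delimiter=';', skipinitialspace=True):
    result.append((*map(to_float, values), species))

print(result)
```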
9.1.7. Thousands Separator
1000000 - None
1'000'000 - Apostrophe
1 000 000 - Space, the internationally recommended thousands separator
1.000.000 - Period, used in many non-English speaking countries
1,000,000 - Comma, used in most English-speaking countries
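`int()` does not understand any of these separators, so they have to be stripped before conversion. A minimal sketch, assuming the separator used in the file is known in advance (`parse_int` and `SAMPLES` are illustrative names):

```python
# Text as it might appear in a file, mapped to the separator it uses
SAMPLES = {
    "1000000": None,
    "1'000'000": "'",
    "1 000 000": " ",
    "1.000.000": ".",
    "1,000,000": ",",
}


def parse_int(text, separator=None):
    """Remove the thousands separator (if any) and convert to int."""
    if separator:
        text = text.replace(separator, '')
    return int(text)


parsed = [parse_int(text, separator) for text, separator in SAMPLES.items()]

# Going the other way: the format mini-language can insert a comma
with_comma = f'{1000000:,}'
with_space = with_comma.replace(',', ' ')

print(parsed, with_comma, with_space)
```

The separator cannot be guessed reliably from the text alone: `1.000` could be one thousand or the number one with three decimal places.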
9.1.8. Date and Time
>>> date = '1961-04-12'
>>> date = '12.4.1961'
>>> date = '12.04.1961'
>>> date = '12-04-1961'
>>> date = '12/04/1961'
>>> date = '4/12/61'
>>> date = '4.12.1961'
>>> date = 'Apr 12, 1961'
>>> date = 'Apr 12th, 1961'
>>> time = '12:00:00'
>>> time = '12:00'
>>> time = '12:00 pm'
>>> duration = '04:30:00'
>>> duration = '4h 30m'
>>> duration = '4 hours 30 minutes'
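Each of these notations needs its own format string when parsed with `datetime.strptime`. The sketch below covers a few of them; the mapping `FORMATS` is an assumption about which field is the day and which is the month, something the text alone cannot tell.

```python
from datetime import datetime, timedelta

# Assumed interpretation of each notation (day-first vs month-first)
FORMATS = {
    '1961-04-12': '%Y-%m-%d',
    '12.4.1961': '%d.%m.%Y',
    '12.04.1961': '%d.%m.%Y',
    '12-04-1961': '%d-%m-%Y',
    '12/04/1961': '%d/%m/%Y',
    '4.12.1961': '%m.%d.%Y',
}
dates = {text: datetime.strptime(text, fmt).date()
         for text, fmt in FORMATS.items()}

# Two-digit years are ambiguous: %y maps 00-68 to 2000-2068
ambiguous = datetime.strptime('4/12/61', '%m/%d/%y')

# Durations are not points in time, so they become timedelta objects
hours, minutes, seconds = map(int, '04:30:00'.split(':'))
duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(dates, ambiguous.year, duration)
```

The two-digit year is the most dangerous case here: `'4/12/61'` is parsed as the year 2061, not 1961.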
9.1.9. Encoding
utf-8 - international standard (should always be used!)
iso-8859-1 - ISO standard for Western Europe and USA
iso-8859-2 - ISO standard for Central Europe (including Poland)
cp1250 or windows-1250 - Central European encoding on Windows
cp1251 or windows-1251 - Eastern European encoding on Windows
cp1252 or windows-1252 - Western European encoding on Windows
ASCII - ASCII characters only
with open(FILE, encoding='utf-8') as file:
    ...
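Why the encoding matters becomes visible once the same text is encoded in different ways. The following sketch works on bytes in memory rather than on a file, using a Polish pangram as sample text:

```python
TEXT = 'zażółć gęślą jaźń'

utf8 = TEXT.encode('utf-8')        # Polish letters take 2 bytes each
cp1250 = TEXT.encode('cp1250')     # every character takes 1 byte

# Reading cp1250 bytes as utf-8 fails loudly
try:
    cp1250.decode('utf-8')
    mismatch = None
except UnicodeDecodeError as error:
    mismatch = error.reason

# Reading utf-8 bytes as cp1250 fails silently and yields garbage
mojibake = utf8.decode('cp1250')

# ASCII cannot represent Polish letters at all
ascii_lossy = TEXT.encode('ascii', errors='replace')

print(len(utf8), len(cp1250), mismatch, mojibake, ascii_lossy)
```

The silent case is the more harmful one, because the file opens without any error and the damage is noticed only later.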
9.1.10. Dialects
import csv
csv.list_dialects()
# ['excel', 'excel-tab', 'unix']
Microsoft Excel 2016-2020:
quoting=csv.QUOTE_MINIMAL
quotechar='"'
delimiter=',' or delimiter=';' depending on Windows locale decimal separator
lineterminator='\r\n'
encoding='...' - depends on Windows locale, typically windows-*
Microsoft Excel macOS:
quoting=csv.QUOTE_MINIMAL
quotechar='"'
delimiter=','
lineterminator='\r\n'
encoding='utf-8'
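A dialect is a named bundle of these parameters. The sketch below inspects two built-in dialects and registers a custom one for semicolon-delimited Excel exports; the name `excel-semicolon` is made up for this example and is not part of the `csv` module.

```python
import csv
import io

excel = csv.get_dialect('excel')
unix = csv.get_dialect('unix')

# Hypothetical dialect name, mirroring the Excel settings listed above
csv.register_dialect(
    'excel-semicolon',
    delimiter=';',
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator='\r\n',
)

buffer = io.StringIO(newline='')
writer = csv.writer(buffer, dialect='excel-semicolon')
writer.writerow(['Melissa', 'Lewis', '21,5', 'Some; Comment'])
line = buffer.getvalue()

print(repr(excel.lineterminator), repr(unix.lineterminator), repr(line))
```

With `csv.QUOTE_MINIMAL` only the field containing the delimiter is quoted, so the decimal comma in `21,5` passes through untouched.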
Microsoft export options:

$ file utf8.csv
utf8.csv: CSV text
$ cat utf8.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,zażółć gęślą jaźń
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file standard.csv
standard.csv: CSV text
$ cat standard.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,za_?__ g__l_ ja__
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file dos.csv
dos.csv: CSV text
$ cat dos.csv
Firstname,Lastname,Age,Comment
Mark,Watney,21,za_?__ g__l_ ja__
Melissa,Lewis,21.5,"Some, comment"
,,"21,5",Some; Comment
$ file macintosh.csv
macintosh.csv: Non-ISO extended-ASCII text, with CR line terminators
$ cat macintosh.csv
,,"21,5",Some; Comment
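The Macintosh export uses bare CR (`\r`) line terminators, which is the likely reason `cat` appears to show only the last row. Python can still split such content. The sketch below uses an in-memory string (`RAW`, an illustrative name) with two of the data rows from the listings above instead of the real file:

```python
import csv

# CR-terminated content, as reported by `file macintosh.csv`
RAW = ('Firstname,Lastname,Age,Comment\r'
       'Melissa,Lewis,21.5,"Some, comment"\r'
       ',,"21,5",Some; Comment\r')

lines = RAW.splitlines()           # splits on \r, \n and \r\n alike
rows = list(csv.reader(lines, delimiter=',', quotechar='"'))

for row in rows:
    print(row)
```

The quoted fields keep their commas, while the unquoted semicolon in the last field is ordinary text because the delimiter is a comma.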
9.1.11. Good Practices
Always specify:
delimiter=',' to csv.DictReader() object
quotechar='"' to csv.DictReader() object
quoting=csv.QUOTE_ALL to csv.DictReader() object
lineterminator='\n' to csv.DictReader() object
encoding='utf-8' to open() function (especially when working with Microsoft Excel)
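Put together, the recommendations above might look like the sketch below. It writes a small file and reads it back with the same explicit options. The file name and the `OPTIONS` dictionary are illustrative, and `newline=''` is added to `open()` as the `csv` documentation recommends.

```python
import csv
import tempfile
from pathlib import Path

DATA = [
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]

# The same explicit options are shared by the writer and the reader
OPTIONS = dict(delimiter=',', quotechar='"',
               quoting=csv.QUOTE_ALL, lineterminator='\n')

path = Path(tempfile.mkdtemp()) / 'astronauts.csv'   # hypothetical file name

with open(path, mode='w', encoding='utf-8', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['firstname', 'lastname'], **OPTIONS)
    writer.writeheader()
    writer.writerows(DATA)

with open(path, encoding='utf-8', newline='') as file:
    result = list(csv.DictReader(file, **OPTIONS))

content = path.read_text(encoding='utf-8')
print(content)
print(result)
```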
9.1.12. Assignments
"""
* Assignment: CSV Format ReadString
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Do not convert numeric values to `float`, leave them as `str`
3. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
2. Nie konwertuj wartości numerycznych do `float`, zostaw jako `str`
3. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
"""
* Assignment: CSV Format ReadSwitch
* Complexity: easy
* Lines of code: 6 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Substitute last element (class label) with value from `LABEL_ENCODER`
3. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
2. Podmień ostatni element (etykietę klasową) z wartością z `LABEL_ENCODER`
3. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
* `dict.get()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,0
5.1,3.5,1.4,0.2,1
5.7,2.8,4.1,1.3,2"""
LABEL_ENCODER = {
    '0': 'virginica',
    '1': 'setosa',
    '2': 'versicolor'}
# data from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
"""
* Assignment: CSV Format ReadLabelEncoder
* Complexity: medium
* Lines of code: 10 lines
* Time: 13 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Generate `LABEL_ENCODER: dict[int,str]` from `header: list[str]`
3. Substitute last element (class label) with value from `LABEL_ENCODER`
4. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
2. Wygeneruj `LABEL_ENCODER: dict[int,str]` z `header: list[str]`
3. Podmień ostatni element (etykietę klasową) z wartością z `LABEL_ENCODER`
4. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `dict(enumerate())`
* `str.strip()`
* `str.split()`
* `dict.get()`
* `int()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('5.8', '2.7', '5.1', '1.9', 'virginica'),
('5.1', '3.5', '1.4', '0.2', 'setosa'),
('5.7', '2.8', '4.1', '1.3', 'versicolor')]
"""
DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""
# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
"""
* Assignment: CSV Format ReadTypeCast
* Complexity: easy
* Lines of code: 9 lines
* Time: 8 min
English:
1. Convert `DATA` to `result: list[tuple[str]]`
2. Convert numeric values to `float`
3. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` to `result: list[tuple[str]]`
2. Przekonwertuj wartości numeryczne do `float`
3. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `str.strip()`
* `str.split()`
* `map()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is tuple for x in result), \
'All rows in `result` should be tuple'
>>> result # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
(5.8, 2.7, 5.1, 1.9, 'virginica'),
(5.1, 3.5, 1.4, 0.2, 'setosa'),
(5.7, 2.8, 4.1, 1.3, 'versicolor')]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# values from file (note the list[tuple] format!)
# type: list[tuple]
result = ...
"""
* Assignment: CSV Format ReadFixedHeader
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min
English:
1. Convert `DATA` to `result: list[dict]`
2. Use `HEADER` as dict keys
3. Do not convert numeric values to `float`, leave them as `str`
4. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` to `result: list[dict]`
2. Użyj `HEADER` jako kluczy dictów
3. Nie konwertuj wartości numeryczne do `float`, pozostaw je jako `str`
4. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `str.splitlines()`
* `str.strip()`
* `str.split()`
* `dict(zip())`
* `list.append()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is dict for x in result), \
'All rows in `result` should be dict'
>>> result # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
'petal_width': '1.9', 'species': 'virginica'},
{'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
'petal_width': '0.2', 'species': 'setosa'},
{'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
'petal_width': '1.3', 'species': 'versicolor'}]
"""
DATA = """5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
HEADER = [
    'sepal_length',
    'sepal_width',
    'petal_length',
    'petal_width',
    'species',
]
# Replace keys with `HEADER`
# type: list[dict[str,str]]
result = ...
"""
* Assignment: CSV Format ReadGenerateHeader
* Complexity: easy
* Lines of code: 7 lines
* Time: 8 min
English:
1. Generate `header: list[str]` from first line `DATA`
2. Convert `DATA` to `result: list[dict]`
3. Use `header` as keys
4. Do not convert numeric values to `float`, leave them as `str`
5. Run doctests - all must succeed
Polish:
1. Wygeneruj `header: list[str]` z pierwszej linii `DATA`
2. Przekonwertuj `DATA` to `result: list[dict]`
3. Użyj nagłówka jako kluczy
4. Nie konwertuj wartości numeryczne do `float`, pozostaw je jako `str`
5. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `str.strip()`
* `str.split()`
* `map()`
* `list() + list()`
* `list.append()`
* `tuple()`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is list, \
'Variable `result` has invalid type, should be list'
>>> assert all(type(x) is dict for x in result), \
'All rows in `result` should be dict'
>>> result # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': '5.8', 'sepal_width': '2.7', 'petal_length': '5.1',
'petal_width': '1.9', 'species': 'virginica'},
{'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4',
'petal_width': '0.2', 'species': 'setosa'},
{'sepal_length': '5.7', 'sepal_width': '2.8', 'petal_length': '4.1',
'petal_width': '1.3', 'species': 'versicolor'}]
"""
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor"""
# rows as dicts, with keys taken from the first line of `DATA`
# type: list[dict]
result = ...
"""
* Assignment: CSV Format WriteListDict
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min
English:
1. Convert `DATA` to CSV as `result: str`:
a. do not add header
b. firstname - first field
c. lastname - second field
2. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: None
c. Quoting: None
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` do CSV jako `result: str`:
a. nie dodawaj nagłówka
b. imię - pierwsze pole
c. nazwisko - drugie pole
2. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: None
c. Quoting: None
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Uruchom doctesty - wszystkie muszą się powieść
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result) # doctest: +NORMALIZE_WHITESPACE
Pan,Twardowski
Rick,Martinez
Mark,Watney
Ivan,Ivanovic
Melissa,Lewis
<BLANKLINE>
"""
DATA = [
    {'firstname': 'Pan', 'lastname': 'Twardowski'},
    {'firstname': 'Rick', 'lastname': 'Martinez'},
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Ivan', 'lastname': 'Ivanovic'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]
# multiline string with `firstname,lastname` pairs
# type: str
result = ...
"""
* Assignment: CSV Format WriteFixed
* Complexity: medium
* Lines of code: 5 lines
* Time: 5 min
English:
1. Convert `DATA` to CSV as `result: str`:
a. add header
b. firstname - first field
c. lastname - second field
2. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: `"`
c. Quoting: always
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Run doctests - all must succeed
Polish:
1. Przekonwertuj `DATA` do CSV jako `result: str`:
a. dodaj nagłówek
b. imię - pierwsze pole
c. nazwisko - drugie pole
2. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: `"`
c. Quoting: zawsze
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Uruchom doctesty - wszystkie muszą się powieść
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result) # doctest: +NORMALIZE_WHITESPACE
"firstname","lastname"
"Pan","Twardowski"
"Rick","Martinez"
"Mark","Watney"
"Ivan","Ivanovic"
"Melissa","Lewis"
<BLANKLINE>
"""
DATA = [
    {'firstname': 'Pan', 'lastname': 'Twardowski'},
    {'firstname': 'Rick', 'lastname': 'Martinez'},
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Ivan', 'lastname': 'Ivanovic'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]
# multiline string with header and `"firstname","lastname"` pairs
# type: str
result = ...
"""
* Assignment: CSV Format WriteSchemaless
* Complexity: hard
* Lines of code: 13 lines
* Time: 13 min
English:
1. Define `header: str` with sorted list of unique keys from `DATA`
2. `header` must be automatically generated from `DATA`
3. Iterate over `DATA` and extract values for each header column
4. Define `result: str` with header and matching values
5. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: `"`
c. Quoting: always
d. Delimiter: `,`
e. Lineseparator: `\n`
f. Sort `fieldnames`
6. Run doctests - all must succeed
Polish:
1. Zdefiniuj `header: str` z posortowaną listą unikalnych kluczy z `DATA`
2. `header` musi być generowany automatycznie z `DATA`
3. Iteruj po `DATA` i wyciągnij wartości dla każdej kolumny z nagłówka
4. Zdefiniuj `result: str` z nagłówkiem i pasującymi wartościami
5. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: `"`
c. Quoting: zawsze
d. Delimiter: `,`
e. Lineseparator: `\n`
f. Posortuj `fieldnames`
6. Uruchom doctesty - wszystkie muszą się powieść
Hint:
* sorted()
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result)
"Petal length","Petal width","Sepal length","Sepal width","Species"
"","","5.1","3.5","setosa"
"4.1","1.3","","","versicolor"
"","1.8","6.3","","virginica"
"","0.2","5.0","","setosa"
"4.1","","","2.8","versicolor"
"","1.8","","2.9","virginica"
<BLANKLINE>
"""
DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Sepal length': 5.0, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]
# header has unique keys from DATA, row values match header columns
# type: str
result = ...
"""
* Assignment: CSV Format WriteListTuple
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min
English:
1. Define `result: str` with `DATA` converted to CSV format
2. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: None
c. Quoting: never
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Run doctests - all must succeed
Polish:
1. Zdefiniuj `result: str` z `DATA` przekonwertowaną do formatu CSV
2. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: None
c. Quoting: nigdy
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Uruchom doctesty - wszystkie muszą się powieść
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result)
SepalLength,SepalWidth,PetalLength,PetalWidth,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
7.6,3.0,6.6,2.1,virginica
4.9,3.0,1.4,0.2,setosa
<BLANKLINE>
"""
DATA = [
    ('SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa')]
# DATA converted to CSV format
# type: str
result = ...
"""
* Assignment: CSV Format WriteListDict
* Complexity: medium
* Lines of code: 7 lines
* Time: 8 min
English:
1. Define `result: str` with `DATA` converted to CSV format
2. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: None
c. Quoting: never
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Run doctests - all must succeed
Polish:
1. Zdefiniuj `result: str` z `DATA` przekonwertowaną do formatu CSV
2. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: None
c. Quoting: nigdy
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `vars(obj)`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result)
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
<BLANKLINE>
"""
DATA = [{'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4,
         'petal_width': 0.2, 'species': 'setosa'},
        {'sepal_length': 5.8, 'sepal_width': 2.7, 'petal_length': 5.1,
         'petal_width': 1.9, 'species': 'virginica'},
        {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4,
         'petal_width': 0.2, 'species': 'setosa'},
        {'sepal_length': 5.7, 'sepal_width': 2.8, 'petal_length': 4.1,
         'petal_width': 1.3, 'species': 'versicolor'},
        {'sepal_length': 6.3, 'sepal_width': 2.9, 'petal_length': 5.6,
         'petal_width': 1.8, 'species': 'virginica'},
        {'sepal_length': 6.4, 'sepal_width': 3.2, 'petal_length': 4.5,
         'petal_width': 1.5, 'species': 'versicolor'}]
# DATA converted to CSV format
# type: str
result = ...
"""
* Assignment: CSV Format WriteObjects
* Complexity: medium
* Lines of code: 7 lines
* Time: 8 min
English:
1. Define `result: str` with `DATA` converted to CSV format
2. Non-functional requirements:
a. Do not use `import` and any module
b. Quotechar: None
c. Quoting: never
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Run doctests - all must succeed
Polish:
1. Zdefiniuj `result: str` z `DATA` przekonwertowaną do formatu CSV
2. Wymagania niefunkcjonalne:
a. Nie używaj `import` ani żadnych modułów
b. Quotechar: None
c. Quoting: nigdy
d. Delimiter: `,`
e. Lineseparator: `\n`
3. Uruchom doctesty - wszystkie muszą się powieść
Hints:
* `vars(obj)`
Tests:
>>> import sys; sys.tracebacklimit = 0
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> print(result)
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
<BLANKLINE>
"""
class Iris:
    def __init__(self, sepal_length, sepal_width,
                 petal_length, petal_width, species):
        self.sepal_length = sepal_length
        self.sepal_width = sepal_width
        self.petal_length = petal_length
        self.petal_width = petal_width
        self.species = species
DATA = [Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
        Iris(5.8, 2.7, 5.1, 1.9, 'virginica'),
        Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
        Iris(5.7, 2.8, 4.1, 1.3, 'versicolor'),
        Iris(6.3, 2.9, 5.6, 1.8, 'virginica'),
        Iris(6.4, 3.2, 4.5, 1.5, 'versicolor')]
# DATA converted to CSV format
# type: str
result = ...