13.8. File Read

Works with both relative and absolute paths
Fails when the directory containing the file cannot be accessed
Fails when the file cannot be accessed
Uses a context manager
The mode parameter of the open() function is optional (defaults to mode='r', which is the same as mode='rt')
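
The failure cases listed above surface as exceptions. A minimal sketch, assuming a throwaway temporary directory; the helper name read_or_none and the file names are hypothetical and only serve the illustration:

```python
import tempfile
from pathlib import Path


def read_or_none(path):
    """Return the file content, or None when the file cannot be read."""
    try:
        with open(path, mode='rt') as file:
            return file.read()
    except FileNotFoundError:
        # Raised when the file (or a directory on its path) does not exist
        return None
    except PermissionError:
        # Raised when the operating system denies access
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    existing = Path(tmpdir) / 'hello.txt'    # hypothetical file name
    missing = Path(tmpdir) / 'missing.txt'   # never created on purpose
    existing.write_text('hello')

    found = read_or_none(existing)
    not_found = read_or_none(missing)
```

Whether returning None is appropriate depends on the caller; letting the exception propagate is often the better choice.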

13.8.1. SetUp

>>> from pathlib import Path
>>> Path('/tmp/myfile.txt').unlink(missing_ok=True)
>>> Path('/tmp/myfile.txt').touch()
>>>
>>>
>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>> with open('/tmp/myfile.txt', mode='w') as file:
...     _ = file.write(DATA)

13.8.2. Read From File

Always remember to close the file

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> file = open(FILE)
>>> data = file.read()
>>> file.close()
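
If read() raises an exception, the close() call in the example above is never reached. One way to guard against that without a context manager is try/finally. A minimal sketch, assuming a temporary file created only for this illustration:

```python
import os
import tempfile

# Throwaway file, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, mode='w') as file:
    file.write('hello world')

file = open(path)
try:
    data = file.read()
finally:
    # Runs whether or not read() raised, so the file never stays open
    file.close()

is_closed = file.closed
os.remove(path)
```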

13.8.3. Read Using Context Manager

Context managers use the with ... as ...: syntax
The file is closed automatically when the block exits (on dedent)
Using a context manager is best practice
More information in Protocol Context Manager

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.read()
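
The automatic close can be observed through the closed attribute of the file object. A minimal sketch, assuming a temporary file created only for this illustration; the RuntimeError is simulated to show that the file is closed on errors as well:

```python
import os
import tempfile

# Throwaway file, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, mode='w') as file:
    file.write('hello world')

with open(path) as file:
    data = file.read()
    open_inside = not file.closed   # still open inside the block

closed_after = file.closed          # closed as soon as the block exits

# The file is also closed when the block exits because of an exception
try:
    with open(path) as file:
        raise RuntimeError('simulated failure')
except RuntimeError:
    closed_after_error = file.closed

os.remove(path)
```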

13.8.4. Read File at Once

Note that the whole file must fit into memory

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.read()
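
When a file may be too large for a single read(), one option is to call read(size) repeatedly. In text mode it returns at most size characters and an empty string at the end of the file. A minimal sketch, assuming a temporary file and a deliberately tiny CHUNK_SIZE chosen only to make the chunks visible:

```python
import os
import tempfile

CHUNK_SIZE = 8  # hypothetical value; real code would use a much larger size
TEXT = 'hello world, hello file'

# Throwaway file, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, mode='w') as file:
    file.write(TEXT)

chunks = []
with open(path) as file:
    # read() returns '' at end of file, which ends the loop
    while chunk := file.read(CHUNK_SIZE):
        chunks.append(chunk)

os.remove(path)
```

Here the chunks are collected in a list only so they can be inspected; processing each chunk inside the loop is what keeps memory use low.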

13.8.5. Read File as List of Lines

Note that the whole file must fit into memory

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     data = file.readlines()

Read selected lines from file (indexes 1 to 29, which skips the first line):

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     lines = file.readlines()[1:30]

Read selected lines (indexes 1 to 29) and strip whitespace from each line:

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     for line in file.readlines()[1:30]:
...         line = line.strip()

Read whole file and split by lines, separate header from content:

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     lines = file.readlines()
...     header = lines[0]
...     content = lines[1:]
...
...     for line in content:
...         line = line.strip()
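
readlines() keeps the trailing newline of every line, which is why the examples above call strip(). A minimal sketch of the difference, assuming a temporary file with made-up content; str.splitlines() is shown as one alternative that drops the line endings:

```python
import os
import tempfile

# Throwaway file, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, mode='w') as file:
    file.write('first\nsecond\nthird\n')

with open(path) as file:
    with_newlines = file.readlines()

with open(path) as file:
    without_newlines = file.read().splitlines()

os.remove(path)
# with_newlines    == ['first\n', 'second\n', 'third\n']
# without_newlines == ['first', 'second', 'third']
```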

13.8.6. Reading File as Generator

Iterate over the file object to read it line by line
In these examples, file is an iterator that yields one line at a time, so the whole file does not have to fit into memory

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     for line in file:
...         line = line.strip()

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE) as file:
...     header = file.readline()
...
...     for line in file:
...         line = line.strip()
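
The slice readlines()[1:30] shown earlier loads the whole file before selecting lines. A lazy variant can be sketched with itertools.islice, assuming a temporary file with 100 made-up lines:

```python
import os
import tempfile
from itertools import islice

# Throwaway file with 100 numbered lines, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, mode='w') as file:
    file.writelines(f'line {i}\n' for i in range(100))

with open(path) as file:
    # islice(file, 1, 30) yields the lines with indexes 1..29, then stops iterating
    selected = [line.strip() for line in islice(file, 1, 30)]

os.remove(path)
```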

13.8.7. Examples

>>> FILE = r'/tmp/myfile.txt'
... # sepal_length,sepal_width,petal_length,petal_width,species
... # 5.8,2.7,5.1,1.9,virginica
... # 5.1,3.5,1.4,0.2,setosa
... # 5.7,2.8,4.1,1.3,versicolor
... # 6.3,2.9,5.6,1.8,virginica
... # 6.4,3.2,4.5,1.5,versicolor
... # 4.7,3.2,1.3,0.2,setosa
>>>
>>>
>>> result = []
>>>
>>> with open(FILE) as file:
...     header = file.readline().strip().split(',')
...
...     for line in file:
...         line = line.strip().split(',')
...         values = [float(x) for x in line[0:4]]
...         species = line[4]
...         row = values + [species]
...         pairs = zip(header, row)
...         result.append(dict(pairs))
>>>
>>> result  # doctest: +NORMALIZE_WHITESPACE
[{'sepal_length': 5.8, 'sepal_width': 2.7, 'petal_length': 5.1, 'petal_width': 1.9, 'species': 'virginica'},
 {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'species': 'setosa'},
 {'sepal_length': 5.7, 'sepal_width': 2.8, 'petal_length': 4.1, 'petal_width': 1.3, 'species': 'versicolor'},
 {'sepal_length': 6.3, 'sepal_width': 2.9, 'petal_length': 5.6, 'petal_width': 1.8, 'species': 'virginica'},
 {'sepal_length': 6.4, 'sepal_width': 3.2, 'petal_length': 4.5, 'petal_width': 1.5, 'species': 'versicolor'},
 {'sepal_length': 4.7, 'sepal_width': 3.2, 'petal_length': 1.3, 'petal_width': 0.2, 'species': 'setosa'}]
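
The example above parses the file by hand. For comparison only, the same shape of result can be sketched with csv.DictReader from the standard library; this is a different technique than the one used above, and the temporary file and the shortened DATA are assumptions made for the illustration:

```python
import csv
import os
import tempfile

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
"""

# Throwaway file, created only for this sketch
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, mode='w') as file:
    file.write(DATA)

result = []

# newline='' is what the csv module documentation recommends for files it reads
with open(path, newline='') as file:
    for row in csv.DictReader(file):
        # DictReader yields every value as str, so numbers are converted by hand
        converted = {key: (value if key == 'species' else float(value))
                     for key, value in row.items()}
        result.append(converted)

os.remove(path)
```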

13.8.8. StringIO

>>> from io import StringIO
>>>
>>>
>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>>
>>> with StringIO(DATA) as file:
...     result = file.readline()
...
>>> result
'sepal_length,sepal_width,petal_length,petal_width,species\n'

>>> from io import StringIO
>>>
>>>
>>> DATA = """sepal_length,sepal_width,petal_length,petal_width,species
... 5.8,2.7,5.1,1.9,virginica
... 5.1,3.5,1.4,0.2,setosa
... 5.7,2.8,4.1,1.3,versicolor
... 6.3,2.9,5.6,1.8,virginica
... 6.4,3.2,4.5,1.5,versicolor
... 4.7,3.2,1.3,0.2,setosa
... """
>>>
>>>
>>> file = StringIO(DATA)
>>>
>>> file.read(50)
'sepal_length,sepal_width,petal_length,petal_width,'
>>> file.seek(0)
0
>>> file.readline()
'sepal_length,sepal_width,petal_length,petal_width,species\n'
>>> file.close()
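
Because StringIO offers the same reading methods as an open text file, it can stand in for a real file when trying out code. A minimal sketch; the function name read_header and the sample data are hypothetical:

```python
from io import StringIO


def read_header(file):
    """Return column names from the first line of an open text file."""
    return file.readline().strip().split(',')


DATA = 'a,b,c\n1,2,3\n'

with StringIO(DATA) as file:
    header = read_header(file)
    rest = file.read()   # reading continues where readline() stopped

# header == ['a', 'b', 'c']
# rest   == '1,2,3\n'
```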

13.8.9. Use Case - 0x01

>>> DATA = """A,B,C,red,green,blue
... 1,2,3,0
... 4,5,6,1
... 7,8,9,2"""
>>>
>>> data = DATA.splitlines()
>>> header = data[0]
>>> lines = data[1:]
>>> colors = header.strip().split(',')[3:]
>>> colors = dict(enumerate(colors))
>>> result = []
>>>
>>> for line in lines:
...     line = line.strip().split(',')
...     *numbers, color = map(int, line)
...     line = numbers + [colors.get(color)]
...     result.append(tuple(line))
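
The use case above replaces the last number in each row with the color name found at that position in the header. The same logic can be sketched as a function, with the resulting value shown in a comment; the name decode_rows is hypothetical:

```python
DATA = """A,B,C,red,green,blue
1,2,3,0
4,5,6,1
7,8,9,2"""


def decode_rows(text):
    """Map the trailing color index of each row to a color name from the header."""
    header, *lines = text.splitlines()
    # Columns after the third one are color names: {0: 'red', 1: 'green', 2: 'blue'}
    colors = dict(enumerate(header.strip().split(',')[3:]))
    result = []
    for line in lines:
        *numbers, color = map(int, line.strip().split(','))
        result.append((*numbers, colors.get(color)))
    return result


result = decode_rows(DATA)
# result == [(1, 2, 3, 'red'), (4, 5, 6, 'green'), (7, 8, 9, 'blue')]
```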

13.8.10. Assignments

Code 13.7. Solution

"""
* Assignment: File Read Str
* Type: class assignment
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: str`
    2. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: str`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'hello world'
"""

FILE = '_temporary.txt'
DATA = 'hello world'

with open(FILE, mode='wt') as file:
    file.write(DATA)

# Read `FILE` to `result: str`
# type: str
result = ...

Code 13.8. Solution

"""
* Assignment: File Read Multiline
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: list[str]`
    2. Remove whitespaces
    3. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: list[str]`
    2. Usuń białe znaki
    3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'First line\\nSecond line\\nThird line\\n'
"""

FILE = '_temporary.txt'

DATA = """First line
Second line
Third line
"""

with open(FILE, mode='wt') as file:
    file.write(DATA)

# Read `FILE` to `result: list[str]`
# Remove whitespaces
# type: str
result = ...

Code 13.9. Solution

"""
* Assignment: File Read List[str]
* Type: class assignment
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: list[str]`
    2. Remove whitespaces
    3. Split line by comma
    4. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: list[str]`
    2. Usuń białe znaki
    3. Podziel linię po przecinku
    4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * `str.strip()`
    * `str.split()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is str for x in result), \
    'All rows in `result` should be str'

    >>> result
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
"""

FILE = '_temporary.txt'
DATA = 'sepal_length,sepal_width,petal_length,petal_width,species'

with open(FILE, mode='wt') as file:
    file.write(DATA)

# Read `FILE` to `result: list[str]`
# Remove whitespaces
# Split line by comma
# type: list[str]
result = ...

Code 13.10. Solution

"""
* Assignment: File Read Multiline
* Type: class assignment
* Complexity: easy
* Lines of code: 6 lines
* Time: 3 min

English:
    1. Read `FILE` to `result: tuple`
    2. Remove whitespaces
    3. Split line by comma
    4. Convert numeric values to float
    5. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: tuple`
    2. Usuń białe znaki
    3. Podziel linię po przecinku
    4. Przekonwertuj wartości numeryczne do float
    5. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * Comprehension
    * `str.strip()`
    * `str.split()`
    * `float()`
    * `tuple()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is tuple, \
    'Variable `result` has invalid type, should be tuple'
    >>> assert all(type(x) in (float, str) for x in result), \
    'All rows in `result` should be float or str'

    >>> result
    (5.1, 3.5, 1.4, 0.2, 'setosa')
"""

FILE = '_temporary.txt'
DATA = (5.1, 3.5, 1.4, 0.2, 'setosa')
data = ','.join(str(x) for x in DATA) + '\n'

with open(FILE, mode='wt') as file:
    file.write(data)

# Read `FILE` to `result: tuple`
# Remove whitespaces
# Split line by comma
# Convert numeric values to float
# type: tuple[float, float, float, float, str]
result = ...

Code 13.11. Solution

"""
* Assignment: File Read CSV
* Type: class assignment
* Complexity: easy
* Lines of code: 15 lines
* Time: 8 min

English:
    1. Read `FILE` to `result: list[tuple]`
    2. Remove whitespaces
    3. Split line by comma
    4. Convert numeric values to float
    5. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: list[tuple]`
    2. Usuń białe znaki
    3. Podziel linię po przecinku
    4. Przekonwertuj wartości numeryczne do float
    5. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `with`
    * `open()`
    * `str.split()`
    * `str.strip()`
    * Comprehension
    * `float()`
    * `(1,2,3) + ('abc',)`
    * `list.append()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in result), \
    'All rows in `result` should be tuple'

    >>> pprint(result)
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     (5.4, 3.9, 1.3, 0.4, 'setosa'),
     (5.9, 3.0, 5.1, 1.8, 'virginica'),
     (6.0, 3.4, 4.5, 1.6, 'versicolor'),
     (7.3, 2.9, 6.3, 1.8, 'virginica'),
     (5.6, 2.5, 3.9, 1.1, 'versicolor'),
     (5.4, 3.9, 1.3, 0.4, 'setosa')]
"""

FILE = '_temporary.csv'

DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
7.3,2.9,6.3,1.8,virginica
5.6,2.5,3.9,1.1,versicolor
5.4,3.9,1.3,0.4,setosa
"""

with open(FILE, mode='w') as file:
    file.write(DATA)

# Read `FILE` to `result: list[tuple]`
# Remove whitespaces
# Split line by comma
# Convert numeric values to float
# type: list[tuple]
result = ...

Code 13.12. Solution

"""
* Assignment: File Read CleanFile
* Type: homework
* Complexity: medium
* Lines of code: 10 lines
* Time: 8 min

English:
    1. Read `FILE` to `result: dict`:
        a. key: str - IP address
        b. value: list[str] - list of hosts
    2. Run doctests - all must succeed

Polish:
    1. Wczytaj `FILE` do `result: dict`:
        a. klucz: str - adres IP
        b. wartość: list[str] - lista hostów
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `str.split()`
    * `str.strip()`
    * `with`
    * `open()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint
    >>> from os import remove; remove(FILE)

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'
    >>> assert all(type(x) is str for x in result.keys()), \
    'All keys in `result` should be str'
    >>> assert all(type(x) is list for x in result.values()), \
    'All values in `result` should be list'

    >>> pprint(result, sort_dicts=False)
    {'127.0.0.1': ['localhost'],
     '10.13.37.1': ['nasa.gov', 'esa.int'],
     '255.255.255.255': ['broadcasthost'],
     '::1': ['localhost']}
"""

FILE = '_temporary.txt'

DATA = """127.0.0.1       localhost
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost
"""

with open(FILE, mode='w') as file:
    file.write(DATA)

# Read `FILE` to `result: dict`:
# - key: str - IP address
# - value: list[str] - list of hosts
# Example {'10.13.37.1': ['nasa.gov', 'esa.int'], ...}
# type: dict[str,list[str]]
result = ...