8.2. Files

8.2.1. Path

8.2.1.1. Absolute path

  • paths on Linux, macOS, BSD and other POSIX-compliant OSes use / as the separator

  • paths on Windows use \ as the separator

Listing 162. Windows and POSIX absolute paths
FILE = 'C:\\Temp\\iris.csv'
FILE = r'C:\Temp\iris.csv'

FILE = '/tmp/iris.csv'
FILE = r'/tmp/iris.csv'
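
The two Windows spellings in the listing above denote the same string. The following minimal sketch checks that, and shows what can go wrong without the r prefix. The path C:\temp\new.csv is a hypothetical example, chosen because \t and \n are valid escape sequences.

```python
# Escaped backslashes and a raw string describe the same path
escaped = 'C:\\Temp\\iris.csv'
raw = r'C:\Temp\iris.csv'
same = escaped == raw

# Hypothetical path: without r'...' the \t and \n turn into a tab and a newline
broken = 'C:\temp\new.csv'
intact = r'C:\temp\new.csv'

print(same)
print(len(broken), len(intact))
```

The broken variant is two characters shorter, because each escape sequence collapses into a single control character.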

8.2.1.2. Relative path

  • . - Current directory

  • .. - Parent directory

Listing 163. Relative paths
FILE = r'iris.csv'
FILE = r'./iris.csv'

FILE = r'tmp/iris.csv'
FILE = r'./tmp/iris.csv'

FILE = r'../iris.csv'
FILE = r'../tmp/iris.csv'

FILE = r'../../iris.csv'
FILE = r'../../tmp/iris.csv'
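
To illustrate how . and .. are interpreted, the sketch below resolves a few relative paths against a base directory. BASE and the resolve() helper are hypothetical names used only for this example. posixpath is used instead of os.path so that the result is the same on every operating system.

```python
import posixpath

BASE = '/home/python'   # hypothetical current directory


def resolve(relative):
    """Join a relative path onto BASE and collapse '.' and '..' segments."""
    return posixpath.normpath(posixpath.join(BASE, relative))


print(resolve('iris.csv'))            # /home/python/iris.csv
print(resolve('./tmp/iris.csv'))      # /home/python/tmp/iris.csv
print(resolve('../iris.csv'))         # /home/iris.csv
print(resolve('../../tmp/iris.csv'))  # /tmp/iris.csv
```

Note that normpath() works purely on the text of the path; it does not check whether the file exists and does not follow symbolic links.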

8.2.1.3. Make absolute from relative path

Listing 164. Make absolute from relative path
from os.path import dirname, join

__file__
# /home/python/my_script.py

dirname(__file__)
# /home/python

join(dirname(__file__), 'iris.csv')
# /home/python/iris.csv
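
The same result can be obtained with the standard pathlib module. This is a sketch of an alternative, not the approach used in the listing above. The script location is a hypothetical stand-in for __file__; in a real script one would more likely write Path(__file__).parent.

```python
from pathlib import PurePosixPath

# Hypothetical script location, standing in for __file__
script = PurePosixPath('/home/python/my_script.py')

directory = script.parent            # directory containing the script
data_file = directory / 'iris.csv'   # the / operator joins path segments

print(directory)   # /home/python
print(data_file)   # /home/python/iris.csv
```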

8.2.2. Read from file

  • Works with both relative and absolute paths

  • Fails when the directory containing the file cannot be accessed

  • Fails when the file cannot be accessed

  • Uses a context manager

  • The mode parameter of the open() function is optional (defaults to mode='r')

  • Reading access modes:

    • mode='rt' - read in text mode (same as mode='r')

    • mode='rb' - read in binary mode

    • mode='r' - read in text mode (default)
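
The difference between text and binary mode is the type of the data returned. The sketch below writes a throwaway file in a temporary directory (the file name sample.txt is hypothetical) and reads it back in both modes.

```python
import os
import tempfile

# Use a temporary directory so the example does not depend on /tmp/iris.csv
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'sample.txt')  # hypothetical file name

    with open(path, mode='w', encoding='utf-8') as file:
        file.write('5.1,3.5\n')

    with open(path, mode='rt', encoding='utf-8') as file:
        as_text = file.read()     # text mode returns str

    with open(path, mode='rb') as file:
        as_bytes = file.read()    # binary mode returns bytes

print(type(as_text), type(as_bytes))
```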

Listing 165. Reading file line by line
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    for line in file:
        print(line)

Listing 166. Read the whole file as text into the content variable
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    content = file.read()

Listing 167. Read the file as a list of lines
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    lines = file.readlines()

Listing 168. Read selected lines from the file (indices 1-29, i.e. lines 2-30, skipping the header)
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    lines = file.readlines()[1:30]

Listing 169. Iterate over selected lines from the file (indices 1-29, i.e. lines 2-30, skipping the header)
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    for line in file.readlines()[1:30]:
        print(line)

Listing 170. Read the whole file as lines and separate the header from the content
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    header, *content = file.readlines()

    for line in content:
        print(line)

Listing 171. Read the header, then iterate lazily over the remaining lines
FILE = r'/tmp/iris.csv'

with open(FILE) as file:
    header = file.readline()

    for line in file:
        print(line)
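
Every line read from a file keeps its trailing newline, so print(line) in the listings above emits an empty line after each row. A minimal sketch of stripping it, using a hypothetical two-line sample written to a temporary directory:

```python
import os
import tempfile

rows = []

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'iris.csv')

    # Hypothetical sample standing in for the real data file
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('sepal_length,species\n5.1,setosa\n')

    with open(path, encoding='utf-8') as file:
        header = file.readline().rstrip('\n')
        for line in file:
            rows.append(line.rstrip('\n'))  # drop the trailing newline

print(header)
print(rows)
```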

8.2.3. Writing to file

  • Works with both relative and absolute paths

  • Fails when the directory containing the file cannot be accessed

  • Creates the file if it does not exist

  • Truncates the file before writing

  • The mode parameter of the open() function is required

  • Writing modes:

    • mode='wt' - write in text mode

    • mode='wb' - write in binary mode

    • mode='w' - write in text mode (same as mode='wt')

Listing 172. Writing to file
FILE = r'/tmp/iris.csv'

with open(FILE, mode='w') as file:
    file.write('hello')
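
The truncating behaviour can be observed by opening the same file twice in write mode. The sketch below assumes a throwaway file in a temporary directory (the name example.txt is hypothetical); only the data from the second open() survives.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'example.txt')  # hypothetical file name

    with open(path, mode='w', encoding='utf-8') as file:
        file.write('first\n')

    # Opening with mode='w' again discards the previous content
    with open(path, mode='w', encoding='utf-8') as file:
        file.writelines(['second\n', 'third\n'])

    with open(path, encoding='utf-8') as file:
        content = file.read()

print(repr(content))
```

Neither write() nor writelines() adds line terminators, so each \n has to be included explicitly.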

8.2.4. Appending to file

  • Works with both relative and absolute paths

  • Fails when the directory containing the file cannot be accessed

  • Creates the file if it does not exist

  • Appends to the end of the file

  • The mode parameter of the open() function is required

  • Appending modes:

    • mode='at' - append in text mode

    • mode='ab' - append in binary mode

    • mode='a' - append in text mode (same as mode='at')

Listing 173. Appending to file
FILE = r'/tmp/iris.csv'

with open(FILE, mode='a') as file:
    file.write('hello')
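
In contrast to mode='w', repeated opens with mode='a' accumulate data. A minimal sketch, again using a hypothetical file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'log.txt')  # hypothetical file name

    # The first open creates the file, the second one adds to its end
    with open(path, mode='a', encoding='utf-8') as file:
        file.write('first\n')

    with open(path, mode='a', encoding='utf-8') as file:
        file.write('second\n')

    with open(path, encoding='utf-8') as file:
        content = file.read()

print(repr(content))
```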

8.2.5. Encoding

  • utf-8 - international standard (should always be used!)

  • iso-8859-1 - ISO standard for Western Europe and USA

  • iso-8859-2 - ISO standard for Central Europe (including Poland)

  • cp1250 or windows-1250 - Central European encoding on Windows (including Polish)

  • cp1251 or windows-1251 - Cyrillic encoding on Windows (including Russian)

  • cp1252 or windows-1252 - Western European encoding on Windows

  • ASCII - ASCII characters only

FILE = r'/tmp/example.txt'

with open(FILE, mode='w', encoding='utf-8') as file:
    file.write('Иван Иванович')

with open(FILE, encoding='utf-8') as file:
    print(file.read())
# Иван Иванович

FILE = r'/tmp/example.txt'

with open(FILE, mode='w', encoding='cp1250') as file:
    file.write('Иван Иванович')
# Traceback (most recent call last):
#   ...
# UnicodeEncodeError: 'charmap' codec can't encode characters in
# position 0-3: character maps to <undefined>

FILE = r'/tmp/example.txt'

with open(FILE, mode='w', encoding='utf-8') as file:
    file.write('Иван Иванович')

with open(FILE, encoding='cp1250') as file:
    print(file.read())
# Traceback (most recent call last):
#   ...
# UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1: character maps to <undefined>
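
The errors above follow from how each encoding maps characters to bytes. The sketch below works on in-memory strings only, with no files involved. It shows that utf-8 stores each Cyrillic letter in two bytes, the second of which for И is 0x98, which is the byte cp1250 cannot decode.

```python
text = 'Иван'

utf8_bytes = text.encode('utf-8')      # two bytes per Cyrillic letter
cp1251_bytes = text.encode('cp1251')   # one byte per Cyrillic letter

# cp1250 has no Cyrillic letters, so encoding fails
try:
    text.encode('cp1250')
    cp1250_ok = True
except UnicodeEncodeError:
    cp1250_ok = False

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising
lenient = utf8_bytes.decode('cp1250', errors='replace')

print(len(utf8_bytes), len(cp1251_bytes), cp1250_ok)
print(utf8_bytes[:2])
```

Using errors='replace' hides the problem rather than solving it; reading the file with the encoding it was written in remains the correct fix.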

8.2.6. Exception handling

Listing 174. Exception handling while accessing files
FILE = r'/tmp/example.txt'

try:
    with open(FILE) as file:
        print(file.read())

except FileNotFoundError:
    print('File does not exist')

except PermissionError:
    print('Permission denied')
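
The same pattern can be wrapped in a function that returns None when reading fails. This is a sketch only: read_text() is a hypothetical helper name, and the extra IsADirectoryError branch is an addition not present in the listing above.

```python
import os
import tempfile


def read_text(path):
    """Return the file content, or None when the file cannot be read."""
    try:
        with open(path, encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        print('File does not exist')
    except PermissionError:
        print('Permission denied')
    except IsADirectoryError:
        print('Path points to a directory')
    return None


with tempfile.TemporaryDirectory() as directory:
    missing = read_text(os.path.join(directory, 'no-such-file.txt'))
    folder = read_text(directory)

print(missing, folder)
```

All three exceptions are subclasses of OSError, so a single except OSError clause would also work when the distinction does not matter.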

8.2.7. Good Engineering Practices

  • Never hardcode paths inline; define them once in FILE or a similarly named variable

  • FILE should be a constant

  • Write FILE as a raw string r'...'

  • Always pass encoding='utf-8' explicitly

  • Use a context manager (the with keyword)
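
A minimal sketch combining these practices. FILENAME, ENCODING, save() and load() are hypothetical names introduced for illustration, and the file lives in a temporary directory so that the example leaves nothing behind.

```python
import os
import tempfile

FILENAME = r'example.txt'   # hypothetical name, defined once as a constant
ENCODING = 'utf-8'


def save(path, text):
    """Write text to path, replacing any previous content."""
    with open(path, mode='w', encoding=ENCODING) as file:
        file.write(text)


def load(path):
    """Return the whole content of path as text."""
    with open(path, encoding=ENCODING) as file:
        return file.read()


with tempfile.TemporaryDirectory() as directory:
    FILE = os.path.join(directory, FILENAME)
    save(FILE, 'Иван Иванович')
    result = load(FILE)

print(result)
```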

8.2.8. Assignments

8.2.8.1. Example

  • Complexity level: easy

  • Lines of code to write: 5 lines

  • Estimated time of completion: 5 min

  • Solution: solution/file_example.py

English
  1. Using input() ask user for a file path

  2. Print file content

  3. Handle the exception for a nonexistent file

  4. Handle the exception for insufficient permissions

Polish
  1. Używając input() zapytaj użytkownika o ścieżkę do pliku

  2. Wypisz zawartość pliku

  3. Obsłuż wyjątek dla nieistniejącego pliku

  4. Obsłuż wyjątek dla braku wystarczających uprawnień

Solution
filename = input('Type filename: ')

try:
    with open(filename) as file:
        print(file.read())

except FileNotFoundError:
    print('Sorry, file not found')

except PermissionError:
    print('Sorry, not permitted')

8.2.8.2. Save to CSV file

  • Complexity level: easy

  • Lines of code to write: 5 lines

  • Estimated time of completion: 10 min

  • Solution: solution/file_write.py

English
  1. For given data structure INPUT: List[tuple] (see below)

  2. Separate header from data

  3. Write data to file: iris.csv

  4. First line in file must be a header

  5. Use a comma (,) as the separator

  6. Use utf-8 encoding and \n as the line terminator

Polish
  1. Dana jest struktura danych INPUT: List[tuple] (patrz sekcja input)

  2. Odseparuj nagłówek od danych

  3. Zapisz dane do pliku: iris.csv

  4. Pierwszą linią w pliku musi być nagłówek

  5. Użyj przecinka (,) jako separatora

  6. Użyj kodowania utf-8 i \n jako koniec linii

Input
INPUT = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

8.2.8.3. Parsing simple CSV file

English
  1. Download data/iris.csv and save it as iris.csv

  2. Define:

    • features: List[tuple] - list of measurements (each row is a tuple)

    • labels: List[str] - list of species names

  3. Read file and for each line:

    1. Strip leading and trailing whitespace

    2. Split the line on commas (,)

    3. Append measurements to features

    4. Append species name to labels

  4. Print features and labels

Polish
  1. Ściągnij data/iris.csv i zapisz jako iris.csv

  2. Zdefiniuj:

    • features: List[tuple] - lista pomiarów (każdy wiersz to tuple)

    • labels: List[str] - lista nazw gatunków

  3. Zaczytaj plik i dla każdej linii:

    1. Usuń białe znaki

    2. Podziel linię po przecinku ,

    3. Dodaj pomiary do features

    4. Dodaj gatunek do labels

  4. Wyświetl features i labels

The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences

8.2.8.4. /etc/hosts - parsing to dict

English
  1. Using file.write(), save the input data from the listing below to the file hosts-simple.txt

  2. Read the file and for each line:

    1. Skip the line if it contains only whitespace (str.isspace())

    2. Remove leading and trailing whitespace

    3. Split the line on whitespace

    4. Separate the IP address from the hostnames

    5. Append the IP address and hostnames to output

  3. Merge hostnames for the same IP

Polish
  1. Używając file.write() zapisz dane wejściowe z listingu poniżej do pliku hosts-simple.txt

  2. Zaczytaj plik i dla każdej linii:

    1. Pomiń linię, jeżeli zawiera tylko białe znaki (str.isspace())

    2. Usuń białe znaki na początku i końcu linii

    3. Podziel linię po białych znakach

    4. Odseparuj adres IP i nazwy hostów

    5. Dodaj adres IP i nazwy hostów do output

  3. Scal nazwy hostów dla tego samego IP

Input
INPUT = """
127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
"""
Output
output: Dict[str, List[str]] = {
    '127.0.0.1': ['localhost', 'astromatt'],
    '10.13.37.1': ['nasa.gov', 'esa.int', 'roscosmos.ru'],
    '255.255.255.255': ['broadcasthost'],
    '::1': ['localhost'],
}
The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences

Hints
  • str.isspace()

  • str.split()
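
The two hinted methods behave as sketched below. The sample strings are hypothetical and only illustrate the methods; they are not a solution to the assignment.

```python
blank = '   \n'
empty = ''
entry = '10.13.37.1      nasa.gov esa.int\n'   # hypothetical sample line

print(blank.isspace())   # True: only whitespace characters
print(empty.isspace())   # False: an empty string has no characters at all
print(entry.isspace())   # False

# split() without arguments splits on any run of whitespace
# and ignores leading and trailing whitespace
fields = entry.split()
print(fields)            # ['10.13.37.1', 'nasa.gov', 'esa.int']

# Star unpacking separates the first field from the rest
first, *rest = fields
print(first, rest)
```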

8.2.8.5. /etc/hosts - parsing to List[dict]

English
  1. Using file.write(), save the input data from the listing below to the file hosts-advanced.txt

  2. Read the file and for each line:

    1. Skip the line if it is empty, contains only whitespace, or starts with a comment #

    2. Remove leading and trailing whitespace

    3. Split the line on whitespace

    4. Separate the IP address from the hostnames

    5. Use a one-line if (conditional expression) to check whether a dot . is in the IP address

    6. If a dot is present, the protocol is IPv4; otherwise it is IPv6

    7. Append the IP address and hostnames to output

  3. Merge hostnames for the same IP

  4. output must be a list of dicts (List[dict])

Polish
  1. Używając file.write() zapisz dane wejściowe z listingu poniżej do pliku hosts-advanced.txt

  2. Przeczytaj plik i dla każdej linii:

    1. Pomiń linię jeżeli jest pusta, jest białym znakiem lub zaczyna się od komentarza #

    2. Usuń białe znaki na początku i końcu linii

    3. Podziel linię po białych znakach

    4. Odseparuj adres IP i nazwy hostów

    5. Wykorzystaj jednolinijkowego if do sprawdzenia czy jest kropka . w adresie IP

    6. Jeżeli jest obecna to protokół jest IPv4, w przeciwnym przypadku IPv6

    7. Dodaj adres IP i nazwy hostów do output

  3. Scal nazwy hostów dla tego samego IP

  4. output ma być listą dictów (List[dict])

Input
INPUT = """
##
# ``/etc/hosts`` structure:
#   - IPv4 or IPv6
#   - Hostnames
 ##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
"""
Output
output: List[Dict[str, Union[str, Set[str]]]] = [
    {'ip': '127.0.0.1', 'protocol': 'ipv4', 'hostnames': {'localhost', 'astromatt'}},
    {'ip': '10.13.37.1', 'protocol': 'ipv4', 'hostnames': {'nasa.gov', 'esa.int', 'roscosmos.ru'}},
    {'ip': '255.255.255.255', 'protocol': 'ipv4', 'hostnames': {'broadcasthost'}},
    {'ip': '::1', 'protocol': 'ipv6', 'hostnames': {'localhost'}}
]
The whys and wherefores
  • Reading and parsing a file

  • Irregular configuration files (the structure may vary)

  • Filtering elements

  • Using loops and conditional statements

  • Parsing strings

  • Working with paths in the operating system

Hints
  • str.split()

  • str.isspace()

  • value = True if ... else False
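
The last hint refers to a conditional expression (a one-line if). A minimal sketch with a hypothetical temperature value, unrelated to the assignment data:

```python
temperature = 25   # hypothetical value

# Conditional expression: <value if true> if <condition> else <value if false>
is_warm = True if temperature > 20 else False
label = 'warm' if temperature > 20 else 'cold'

print(is_warm, label)
```

When both branches are just True and False, the condition itself (temperature > 20) already yields the same boolean; the conditional expression is most useful when the branches are other values, as in label.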