8.2. Files

8.2.1. Path

8.2.1.1. Absolute path

  • paths on Linux, macOS, BSD and other POSIX-compliant OSes use / as the separator

  • paths on Windows use \ as the separator

Listing 154. Windows paths
FILE = 'C:\\Temp\\iris.csv'
FILE = r'C:\Temp\iris.csv'
Listing 155. POSIX path
FILE = '/tmp/iris.csv'
FILE = r'/tmp/iris.csv'
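
The difference between the two separator conventions can be explored without touching the file system. The following is a minimal sketch using PureWindowsPath and PurePosixPath from the standard-library pathlib module; the variable names are illustrative and the paths are the ones from the listings above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings, so this runs on any operating system.
windows_file = PureWindowsPath(r'C:\Temp\iris.csv')
posix_file = PurePosixPath('/tmp/iris.csv')

print(windows_file.parts)   # components split on the backslash
print(posix_file.parts)     # components split on the forward slash

# PureWindowsPath also accepts forward slashes and treats them as separators.
same_file = PureWindowsPath('C:/Temp/iris.csv')
print(same_file == windows_file)
```

Because Windows accepts both separators, forward slashes are often the more portable choice when a path has to be written by hand.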

8.2.1.2. Relative path

  • . - Current directory

  • .. - Parent directory

Listing 156. File in the same directory
FILE = r'iris.csv'
FILE = r'./iris.csv'
Listing 157. File in the child directory
FILE = r'tmp/iris.csv'
FILE = r'./tmp/iris.csv'
Listing 158. File in parent directory
FILE = r'../iris.csv'
FILE = r'../tmp/iris.csv'
Listing 159. File two directories up
FILE = r'../../iris.csv'
FILE = r'../../tmp/iris.csv'
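
To see what . and .. resolve to, a relative path can be joined onto a starting directory and normalized. This is a small sketch using the standard-library posixpath module so that the result is the same on every platform; CURRENT_DIR and resolve() are hypothetical names chosen for illustration.

```python
import posixpath

# Hypothetical working directory, used only for illustration.
CURRENT_DIR = '/home/python/project'


def resolve(relative):
    """Join a relative path onto CURRENT_DIR and collapse '.' and '..'."""
    return posixpath.normpath(posixpath.join(CURRENT_DIR, relative))


print(resolve('./iris.csv'))          # stays in the current directory
print(resolve('../iris.csv'))         # goes one directory up
print(resolve('../../tmp/iris.csv'))  # goes two up, then into tmp
```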

8.2.1.3. Make absolute from relative path

Listing 160. Make absolute from relative path
from os.path import dirname, join


__file__
# /home/python/my_script.py

dirname(__file__)
# /home/python

join(dirname(__file__), 'iris.csv')
# /home/python/iris.csv
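
The same idea can be expressed with pathlib. The sketch below assumes a hypothetical helper named sibling_path(); in a real script the first argument would typically be __file__, while here the example path from the listing above is passed explicitly so the snippet can run anywhere.

```python
from pathlib import Path


def sibling_path(script_path, filename):
    """Return the absolute path of `filename` placed next to `script_path`."""
    return Path(script_path).resolve().parent / filename


# In a real script you would pass __file__ as the first argument.
FILE = sibling_path('/home/python/my_script.py', 'iris.csv')
print(FILE)
```

Building the path from the script location keeps it valid no matter which directory the script is started from.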

8.2.2. Read from file

  • Works with both relative and absolute path

  • Fails when directory with file cannot be accessed

  • Fails when file cannot be accessed

  • Uses context manager

  • mode parameter to open() function is optional (defaults to mode='r')

  • Reading access modes:

    • mode='rt' - read in text mode (default)

    • mode='rb' - read in binary mode

    • mode='r' - read in text mode

Listing 161. Reading file line by line
with open(r'/tmp/iris.csv') as file:
    for line in file:
        print(line)
Listing 162. Read whole file as a text to content variable
with open(r'/tmp/iris.csv') as file:
    content = file.read()
Listing 163. Reading file as list with lines
with open(r'/tmp/iris.csv') as file:
    lines = file.readlines()
Listing 164. Read selected lines from file (indices 1-29, i.e. lines 2-30)
with open(r'/tmp/iris.csv') as file:
    lines = file.readlines()[1:30]
Listing 165. Iterate over selected lines from file (indices 1-29, i.e. lines 2-30)
with open(r'/tmp/iris.csv') as file:
    for line in file.readlines()[1:30]:
        print(line)
Listing 166. Read whole file and split by lines, separate header from content
with open(r'/tmp/iris.csv') as file:
    header, *content = file.readlines()

    for line in content:
        print(line)
Listing 167. Read header, then iterate lazily over the remaining lines
with open(r'/tmp/iris.csv') as file:
    header = file.readline()

    for line in file:
        print(line)
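
Each line read from a text file keeps its trailing newline, which is why print(line) produces an empty line after every row. The sketch below writes a few made-up rows (SAMPLE is an assumption, not the real iris.csv) to a temporary file, then reads the header and the remaining rows with the newline removed.

```python
import os
import tempfile

# Hypothetical sample rows written to a temporary file for the demo.
SAMPLE = 'sepal_length,species\n5.1,setosa\n7.0,versicolor\n'

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'iris.csv')

    with open(path, mode='w', encoding='utf-8') as file:
        file.write(SAMPLE)

    with open(path, encoding='utf-8') as file:
        # readline() consumes the first line only
        header = file.readline().rstrip('\n')
        # iterating continues from the second line onwards
        rows = [line.rstrip('\n') for line in file]

print(header)
print(rows)
```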

8.2.3. Writing to file

  • Works with both relative and absolute path

  • Fails when directory with file cannot be accessed

  • Creates the file if it does not exist

  • Truncates the file before writing

  • mode parameter to open() function is required

  • Writing modes:

    • mode='wt' - write in text mode

    • mode='wb' - write in binary mode

    • mode='w' - write in text mode

Listing 168. Writing to file
with open(r'/tmp/iris.csv', mode='w') as file:
    file.write('hello')
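
The truncating behaviour of mode='w' is easy to overlook. As a minimal sketch, the following writes to the same temporary file twice (the file name example.txt is arbitrary) and then reads it back; only the second write should survive.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'example.txt')

    # First write creates the file.
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('first version\n')

    # Second write truncates the file before writing.
    with open(path, mode='w', encoding='utf-8') as file:
        file.write('second\n')

    with open(path, encoding='utf-8') as file:
        content = file.read()

print(repr(content))
```

Note that write() does not add a newline on its own, so it has to be included in the string when it is wanted.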

8.2.4. Appending to file

  • Works with both relative and absolute path

  • Fails when directory with file cannot be accessed

  • Creates the file if it does not exist

  • Appends to the end of the file

  • mode parameter to open() function is required

  • Appending modes:

    • mode='at' - append in text mode

    • mode='ab' - append in binary mode

    • mode='a' - append in text mode

Listing 169. Appending to file
with open(r'/tmp/iris.csv', mode='a') as file:
    file.write('hello')
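
For comparison with mode='w', here is a small sketch of mode='a' on a temporary file (the file name log.txt and the entries are made up). The first open creates the file, and every later open adds to the end without removing what is already there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'log.txt')

    # The file does not exist yet; the first append creates it.
    for entry in ('hello', 'world'):
        with open(path, mode='a', encoding='utf-8') as file:
            file.write(entry + '\n')

    with open(path, encoding='utf-8') as file:
        lines = file.read().splitlines()

print(lines)
```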

8.2.5. Encoding

  • utf-8 - international standard (should always be used!)

  • iso-8859-1 - ISO standard for Western Europe and USA

  • iso-8859-2 - ISO standard for Central Europe (including Poland)

  • cp1250 or windows-1250 - Central European encoding on Windows (including Polish)

  • cp1251 or windows-1251 - Cyrillic encoding on Windows (including Russian)

  • cp1252 or windows-1252 - Western European encoding on Windows

  • ASCII - ASCII characters only

with open(r'/tmp/example.txt', mode='w', encoding='utf-8') as file:
    file.write('Иван Иванович')

with open(r'/tmp/example.txt', encoding='utf-8') as file:
    print(file.read())
# Иван Иванович
with open(r'/tmp/example.txt', mode='w', encoding='cp1250') as file:
    file.write('Иван Иванович')
# Traceback (most recent call last):
#   ...
# UnicodeEncodeError: 'charmap' codec can't encode characters in
# position 0-3: character maps to <undefined>
with open(r'/tmp/example.txt', mode='w', encoding='utf-8') as file:
    file.write('Иван Иванович')

with open(r'/tmp/example.txt', encoding='cp1250') as file:
    print(file.read())
# Traceback (most recent call last):
#   ...
# UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1: character maps to <undefined>
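
The errors above come from the mismatch between characters and bytes. The sketch below works on bytes directly rather than on a file, which is an illustrative simplification: it encodes the same text to UTF-8, decodes it correctly, and then decodes it with the wrong codec using errors='replace', so undecodable bytes become the replacement character instead of raising. The open() function accepts an errors parameter with the same meaning.

```python
text = 'Иван Иванович'

# Each Cyrillic letter takes two bytes in UTF-8, the space takes one.
raw = text.encode('utf-8')
print(len(text), len(raw))

# Decoding with the encoding used for writing restores the text.
roundtrip = raw.decode('utf-8')
print(roundtrip == text)

# Decoding with a different codec yields garbage; errors='replace'
# substitutes U+FFFD for bytes that cp1250 cannot map.
garbled = raw.decode('cp1250', errors='replace')
print(garbled)
```

Replacing errors hides data loss, so it is probably best reserved for diagnostics; specifying the correct encoding remains the actual fix.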

8.2.6. Exception handling

Listing 170. Exception handling while accessing files
try:
    with open(r'/tmp/iris.csv') as file:
        print(file.read())

except FileNotFoundError:
    print('File does not exist')

except PermissionError:
    print('Permission denied')
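
The same pattern can be wrapped in a function so that callers receive either the content or None. This is a sketch only: read_text() is a hypothetical helper, and the IsADirectoryError branch is an addition that does not appear in the listing above (on some platforms opening a directory raises PermissionError instead).

```python
import os
import tempfile


def read_text(path):
    """Return the file content, or None when the file cannot be read."""
    try:
        with open(path, encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        print('File does not exist')
    except PermissionError:
        print('Permission denied')
    except IsADirectoryError:
        print('Path points to a directory')
    return None


with tempfile.TemporaryDirectory() as directory:
    existing = os.path.join(directory, 'iris.csv')
    with open(existing, mode='w', encoding='utf-8') as file:
        file.write('hello')

    found_result = read_text(existing)
    missing_result = read_text(os.path.join(directory, 'missing.csv'))
    directory_result = read_text(directory)

print(found_result, missing_result, directory_result)
```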

8.2.7. Good Engineering Practices

  • Never hardcode paths

  • FILE should be a constant

  • Write FILE as a raw string r'...'

  • Always specify encoding='utf-8'

  • Use a context manager (the with keyword)

8.2.8. Assignments

8.2.8.1. Example

  • Complexity level: easy

  • Lines of code to write: 5 lines

  • Estimated time of completion: 5 min

  • Filename: solution/file_example.py

English
  1. Using input(), ask the user for a file path

  2. Print the file content

  3. Handle the exception for a non-existent file

  4. Handle the exception for insufficient permissions

Solution
filename = input('Type filename: ')

try:
    with open(filename) as file:
        print(file.read())

except FileNotFoundError:
    print('Sorry, file not found')

except PermissionError:
    print('Sorry, not permitted')

8.2.8.2. Parsing simple CSV file

English
  1. Download data/iris.csv and save it as iris.csv

  2. Define:

    • features: List[tuple] - list of measurements (each row is a tuple)

    • labels: List[str] - list of species names

  3. For each line in file:

    1. Remove leading and trailing whitespace

    2. Split line by comma ,

    3. Append measurements to features

    4. Append species name to labels

  4. Print features and labels

The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences
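
One possible approach to the steps listed in this assignment is sketched below. The rows in DATA are made up and assume a layout of four measurements followed by the species name, with no header line; converting the measurements to float is also an assumption. With the real file, the loop would iterate over an opened iris.csv instead of DATA.splitlines().

```python
# Hypothetical rows in the assumed layout: four measurements, then species.
DATA = """5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

features = []
labels = []

for line in DATA.splitlines():
    line = line.strip()

    # Ignore blank lines, e.g. a trailing newline at the end of the file.
    if not line:
        continue

    # Everything but the last field is a measurement.
    *measurements, species = line.split(',')
    features.append(tuple(float(value) for value in measurements))
    labels.append(species)

print(features)
print(labels)
```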

8.2.8.3. /etc/hosts - parsing to dict

English
  1. Copy input data from listing below and save to file hosts.txt

  2. For each line in file:

    1. Remove leading and trailing whitespace

    2. Split line by whitespace

    3. Separate the IP address from the hostnames

    4. Append the IP address and hostnames to OUTPUT

  3. Merge hostnames for the same IP

Input
127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
Output
OUTPUT: Dict[str, List[str]] = {
    '127.0.0.1': ['localhost', 'astromatt'],
    '10.13.37.1': ['nasa.gov', 'esa.int', 'roscosmos.ru'],
    '255.255.255.255': ['broadcasthost'],
    '::1': ['localhost'],
}
The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences
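
A minimal sketch of one way to approach this assignment follows. To stay self-contained it keeps the input in a string named DATA instead of reading hosts.txt; with a real file the loop would iterate over the opened file object. Using dict.setdefault() for the merge is a design choice of this sketch, not a requirement of the task.

```python
from typing import Dict, List

DATA = """127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
"""

OUTPUT: Dict[str, List[str]] = {}

for line in DATA.splitlines():
    # split() without arguments splits on any run of whitespace
    ip, *hostnames = line.strip().split()

    # setdefault() creates the list on first sight of an IP,
    # extend() merges hostnames for repeated IPs
    OUTPUT.setdefault(ip, []).extend(hostnames)

print(OUTPUT)
```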

8.2.8.4. /etc/hosts - parsing to List[dict]

English
  1. Copy input data from listing below and save to file hosts.txt

  2. Also copy the comments and empty lines

  3. For each line in file:

    1. Skip the line if it is empty, contains only whitespace or starts with the comment character #

    2. Remove leading and trailing whitespace

    3. Split line by whitespace

    4. Separate the IP address from the hostnames

    5. Use a one-line if (conditional expression) to check whether a dot . is in the IP address

    6. If a dot is present then the protocol is IPv4, otherwise IPv6

    7. Append the IP address and hostnames to OUTPUT

  4. Merge hostnames for the same IP

  5. OUTPUT must be a list of dicts (List[dict])

Input
##
# ``/etc/hosts`` structure:
#   - IPv4 or IPv6
#   - Hostnames
##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
Output
OUTPUT: List[Dict[str, Union[str, Set[str]]]] = [
    {'ip': '127.0.0.1', 'protocol': 'ipv4', 'hostnames': {'localhost', 'astromatt'}},
    {'ip': '10.13.37.1', 'protocol': 'ipv4', 'hostnames': {'nasa.gov', 'esa.int', 'roscosmos.ru'}},
    {'ip': '255.255.255.255', 'protocol': 'ipv4', 'hostnames': {'broadcasthost'}},
    {'ip': '::1', 'protocol': 'ipv6', 'hostnames': {'localhost'}}
]
The whys and wherefores
  • Reading and parsing a file

  • Irregular configuration files (the structure may vary)

  • Filtering elements

  • Using loops and conditional statements

  • Parsing strings

  • Working with paths in the operating system

Hints
  • str.isspace()

  • value = True if ... else False
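
The hints can be combined into a sketch like the one below. It is one possible approach, not the reference solution: the input is kept in a string named DATA rather than read from hosts.txt, and the intermediate dict named merged is a hypothetical helper used to merge hostnames per IP before converting to a list.

```python
DATA = """##
# ``/etc/hosts`` structure:
#   - IPv4 or IPv6
#   - Hostnames
##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
"""

# Records keyed by IP, so repeated addresses end up in the same record.
merged = {}

for line in DATA.splitlines():
    line = line.strip()

    # Skip empty lines, whitespace-only lines and comments.
    if not line or line.startswith('#'):
        continue

    ip, *hostnames = line.split()
    protocol = 'ipv4' if '.' in ip else 'ipv6'

    record = merged.setdefault(
        ip, {'ip': ip, 'protocol': protocol, 'hostnames': set()})
    record['hostnames'].update(hostnames)

# Dicts keep insertion order, so the list follows the order in the input.
OUTPUT = list(merged.values())
print(OUTPUT)
```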