7.5. File Read

7.5.1. Rationale

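  • Works with both relative and absolute paths

  • Fails with an OSError subclass, such as FileNotFoundError or PermissionError, when the directory containing the file cannot be accessed

  • Fails the same way when the file itself cannot be accessed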

  • Uses context manager

  • mode parameter to open() function is optional (defaults to mode='rt')
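
The points above can be sketched as follows. This is a minimal illustration, not a definitive pattern: the helper name read_text and the temporary demo file are assumptions introduced here so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def read_text(path):
    """Return the file content, or None when the file cannot be read."""
    try:
        # mode='rt' is spelled out here, although it is the default
        with open(path, mode='rt') as file:
            return file.read()
    except FileNotFoundError:
        print(f'No such file or directory: {path}')
    except PermissionError:
        print(f'Permission denied: {path}')
    return None


# Hypothetical demo file in a temporary directory (not part of the original text)
directory = Path(tempfile.mkdtemp())
existing = directory / 'myfile.txt'
existing.write_text('hello world\n')

content = read_text(existing)                    # absolute path, file exists
missing = read_text(directory / 'missing.txt')   # file does not exist
```

Catching the specific exceptions keeps other errors visible instead of silently hiding them.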

7.5.2. Read From File

  • Always remember to close file

FILE = r'/tmp/myfile.txt'

file = open(FILE)
data = file.read()
file.close()
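
If read() raises an exception, the close() call above is never reached. One possible way to guard against that without a context manager is try/finally. The sketch below assumes a temporary demo file created on the spot, so that it does not depend on /tmp/myfile.txt existing.

```python
import os
import tempfile

# Hypothetical demo file; the section itself uses /tmp/myfile.txt
FILE = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

file = open(FILE, mode='w')
file.write('hello world')
file.close()

file = open(FILE)
try:
    data = file.read()
finally:
    # Runs whether or not read() raised an exception
    file.close()

print(file.closed)
```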

7.5.3. Read Using Context Manager

  • Context managers use with ... as ...: syntax

  • It closes file automatically upon block exit (dedent)

  • Using context manager is best practice

  • More information in Context Managers

FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    data = file.read()
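
The automatic close can be observed through the file.closed attribute. The following sketch assumes a temporary demo file and a simulated error. It shows that the file is closed on normal block exit and also when an exception leaves the block.

```python
import os
import tempfile

# Hypothetical demo file, created only for this example
FILE = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(FILE, mode='w') as file:
    file.write('hello world')

with open(FILE) as file:
    data = file.read()
    inside = file.closed      # still open inside the block

after = file.closed           # closed right after the block

try:
    with open(FILE) as file:
        raise RuntimeError('simulated failure')
except RuntimeError:
    # The context manager closed the file before the exception propagated
    closed_after_error = file.closed
```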

7.5.4. Read File at Once

  • Note that the whole file must fit into memory

FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    data = file.read()
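
When a file may be too large to fit into memory, one alternative is to pass a size to file.read() and process the content in chunks. This is only a sketch: CHUNK_SIZE is an arbitrary value chosen for illustration, and the demo file is generated on the spot.

```python
import os
import tempfile

# Hypothetical demo file with 10,000 characters
FILE = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(FILE, mode='w') as file:
    file.write('a' * 10_000)

CHUNK_SIZE = 4096  # arbitrary, chosen for illustration
total_chars = 0

with open(FILE) as file:
    while True:
        chunk = file.read(CHUNK_SIZE)  # at most CHUNK_SIZE characters
        if not chunk:                  # empty string means end of file
            break
        total_chars += len(chunk)

print(total_chars)
```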

7.5.5. Read File as List of Lines

  • Note that the whole file must fit into memory

FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    data = file.readlines()

Listing 7.1. Read selected lines from file (slice [1:30] skips the first line and returns lines 2-30)
FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    lines = file.readlines()[1:30]

Listing 7.2. Print selected lines from file (slice [1:30] skips the first line and returns lines 2-30)
FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    for line in file.readlines()[1:30]:
        print(line)
Listing 7.3. Read whole file and split by lines, separate header from content
FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    header, *content = file.readlines()

    for line in content:
        print(line)
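
Lines returned by readlines() keep their trailing newline character, which is why print(line) in the listings above shows an empty line after each row. A minimal sketch of two ways to drop it, assuming a small demo file created for this example:

```python
import os
import tempfile

# Hypothetical demo file with three lines
FILE = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(FILE, mode='w') as file:
    file.write('first\nsecond\nthird\n')

with open(FILE) as file:
    raw = file.readlines()                 # newline characters are kept

with open(FILE) as file:
    stripped = [line.rstrip('\n') for line in file.readlines()]

with open(FILE) as file:
    split = file.read().splitlines()       # splits and drops newlines

print(raw)
print(stripped)
```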

7.5.6. Reading File as Generator

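  • Iterate over the file object to read lines lazily, one at a time

  • In these examples, file behaves like a generator: it is an iterator that yields lines on demand, so the whole file does not have to fit into memory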

FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    for line in file:
        print(line)

FILE = r'/tmp/myfile.txt'

with open(FILE) as file:
    header = file.readline()

    for line in file:
        print(line)
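
Because the file object is its own iterator, it can be combined with next() and itertools.islice() to read only selected lines without loading the rest. The sketch below assumes a generated demo file with 100 lines.

```python
import os
import tempfile
from itertools import islice

# Hypothetical demo file with 100 numbered lines
FILE = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(FILE, mode='w') as file:
    for number in range(100):
        file.write(f'line {number}\n')

with open(FILE) as file:
    is_own_iterator = iter(file) is file   # the file object is its own iterator
    header = next(file)                    # consumes the first line only
    # Take the next three lines; the remaining lines are never read here
    selected = [line.rstrip('\n') for line in islice(file, 0, 3)]

print(selected)
```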

7.5.7. Examples

def isnumeric(x):
    try:
        float(x)
        return True
    except ValueError:
        return False


def clean(line):
    line = line.strip().split(',')
    line = map(lambda x: float(x) if isnumeric(x) else x, line)
    return tuple(line)


with open(FILE) as file:
    header = clean(file.readline())

    for line in file:
        line = clean(line)
        print(line)
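
total = 0

with open(FILE) as file:
    for line in file:
        # Each line is a str, so convert numeric fields before summing
        values = line.strip().split(',')
        total += sum(float(x) for x in values if isnumeric(x))

print(total)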
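
# Average of row means over consecutive, non-overlapping windows of rows
window = 10
tmp = []
moving_average = []

with open(FILE) as file:
    for line in file:
        line = line.strip().split(',')
        values = [float(x) for x in line if isnumeric(x)]

        if not values:
            continue  # header or line without numeric fields

        tmp.append(sum(values) / len(values))

        if len(tmp) == window:
            moving_average.append(sum(tmp) / len(tmp))
            tmp = []

print(moving_average)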

7.5.8. Assignments

7.5.8.1. File Read Str

  • Complexity level: easy

  • Lines of code to write: 3 lines

  • Estimated time of completion: 3 min

  • Solution: solution/file_read_str.py

English
  1. Use data from "Input" section (see below)

  2. Write DATA to file FILE

  3. Read FILE to result: str

  4. Print result

  5. Compare result with "Output" section (see below)

Input
FILE = r'file_write_hello.txt'
DATA = 'hello world'
Output
result: str
# hello world

7.5.8.2. File Read Multiline

English
  1. Use data from "Input" section (see below)

  2. Write DATA to file FILE

  3. Read FILE to result: List[str]

  4. Print result

  5. Compare result with "Output" section (see below)

Input
FILE = r'file_write_hello.txt'
DATA = 'sepal_length\nsepal_width\npetal_length\npetal_width\nspecies\n'
Output
result: List[str]
# ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

7.5.8.3. File Read CSV

  • Complexity level: easy

  • Lines of code to write: 15 lines

  • Estimated time of completion: 8 min

  • Solution: solution/file_read_csv.py

English
  1. Use data from "Input" section (see below)

  2. Write DATA to file FILE

  3. Read FILE

  4. Separate header from data

  5. Write header (first line) to header

  6. Read file and for each line:

    • Strip whitespace

    • Split line by comma ,

    • Convert measurements to Tuple[float]

    • Append measurements to features

    • Append species name to labels

  7. Print header, features and labels

  8. Compare result with "Output" section (see below)

Input
FILE = r'file_read_csv.csv'
DATA = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
7.3,2.9,6.3,1.8,virginica
5.6,2.5,3.9,1.1,versicolor
5.4,3.9,1.3,0.4,setosa
5.5,2.6,4.4,1.2,versicolor
5.7,2.9,4.2,1.3,versicolor
4.9,3.1,1.5,0.1,setosa
6.7,2.5,5.8,1.8,virginica
6.5,3.0,5.2,2.0,virginica
5.1,3.3,1.7,0.5,setosa
"""

header = []
features = []
labels = []
Output
header: List[str]
# ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

features: List[Tuple[float]]
# [(5.4, 3.9, 1.3, 0.4), (5.9, 3.0, 5.1, 1.8), (6.0, 3.4, 4.5, 1.6),
#  (7.3, 2.9, 6.3, 1.8), (5.6, 2.5, 3.9, 1.1), (5.4, 3.9, 1.3, 0.4),
#  (5.5, 2.6, 4.4, 1.2), (5.7, 2.9, 4.2, 1.3), (4.9, 3.1, 1.5, 0.1), ...]

labels: List[str]
# ['setosa', 'virginica', 'versicolor', 'virginica', 'versicolor',
#  'setosa', 'versicolor', 'versicolor', 'setosa', 'virginica',
#  'virginica', 'setosa']
The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences

Hint
  • tuple(float(x) for x in X)
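
The hint can be tried on a single row before it is applied to the whole file. This is a sketch for one hard-coded line taken from the "Input" section, not a full solution. The names measurements and species are chosen here for illustration.

```python
line = '5.4,3.9,1.3,0.4,setosa\n'

# Last field is the species name, all preceding fields are measurements
*measurements, species = line.strip().split(',')
features = tuple(float(x) for x in measurements)

print(features)
print(species)
```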

7.5.8.4. File Read Parsing Dict

English
  1. Use data from "Input" section (see below)

  2. Write DATA to file FILE

  3. Read FILE and for each line:

    • Remove leading and trailing whitespace

    • Skip line if it is empty

    • Split line by whitespace

    • Separate IP address and hostnames

    • Append IP address and hostnames to result

  4. Merge hostnames for the same IP

  5. Compare result with "Output" section (see below)

Input
FILE = r'file_read_parsing_dict.txt'
DATA = """
127.0.0.1       localhost
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost"""
Output
result: dict
# {'127.0.0.1': ['localhost'],
#  '10.13.37.1': ['nasa.gov', 'esa.int', 'roscosmos.ru'],
#  '255.255.255.255': ['broadcasthost'],
#  '::1': ['localhost']}
The whys and wherefores
  • Reading file

  • Iterating over lines in file

  • String methods

  • Working with nested sequences

Hint
  • str.isspace()

  • str.split()
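
A short sketch of how the two hinted methods behave, using one line from the "Input" section. Note that str.isspace() returns False for an empty string, so an empty line needs a separate check.

```python
line = '10.13.37.1      nasa.gov esa.int roscosmos.ru\n'

# split() without arguments splits on any run of whitespace
ip, *hostnames = line.split()

blank = '   \n'.isspace()   # only whitespace characters
empty = ''.isspace()        # empty string is not considered whitespace

print(ip, hostnames)
```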

7.5.8.5. File Read Parsing List of Dicts

English
  1. Use data from "Input" section (see below)

  2. Using file.write(), save the input data from the listing below to the file hosts-advanced.txt

  3. Read file and for each line:

    • Skip line if it is empty, contains only whitespace or starts with a comment #

    • Remove leading and trailing whitespace

    • Split line by whitespace

    • Separate IP address and hostnames

    • Use a one-line if (conditional expression) to check whether a dot . is in the IP address

    • If a dot is present, the protocol is IPv4, otherwise IPv6

    • Append IP address and hostnames to result

  4. Merge hostnames for the same IP

  5. result must be list of dicts (List[dict])

  6. Compare result with "Output" section (see below)

Input
DATA = """
##
# ``/etc/hosts`` structure:
#   - IPv4 or IPv6
#   - Hostnames
 ##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int roscosmos.ru
255.255.255.255 broadcasthost
::1             localhost
"""
Output
result: List[dict]
# [{'ip': '127.0.0.1', 'protocol': 'ipv4', 'hostnames': {'localhost', 'astromatt'}},
#  {'ip': '10.13.37.1', 'protocol': 'ipv4', 'hostnames': {'nasa.gov', 'esa.int', 'roscosmos.ru'}},
#  {'ip': '255.255.255.255', 'protocol': 'ipv4', 'hostnames': {'broadcasthost'}},
#  {'ip': '::1', 'protocol': 'ipv6', 'hostnames': {'localhost'}}]
The whys and wherefores
  • Reading and parsing a file

  • Irregular configuration files (structure may vary)

  • Filtering elements

  • Using loops and conditional statements

  • Parsing strings

  • Working with paths in the operating system

Hints
  • str.split()

  • str.isspace()

  • value = True if ... else False
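
The last hint refers to a conditional expression (one-line if). A minimal sketch of the protocol check described in the assignment follows. The function name protocol_of is hypothetical and introduced only for this example.

```python
def protocol_of(ip):
    """Return 'ipv4' when the address contains a dot, otherwise 'ipv6'."""
    return 'ipv4' if '.' in ip else 'ipv6'


first = protocol_of('127.0.0.1')
second = protocol_of('::1')

print(first, second)
```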