3.4. Data Import and Export

3.4.1. np.loadtxt()

import numpy as np


url = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/numerical-analysis/numpy/data/iris.csv'

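# The default delimiter is whitespace, so the header line is read as one non-numeric field
a = np.loadtxt(url)
# ValueError: could not convert string to float: 'sepal_length,sepal_width,petal_length,petal_width,species'

# Skipping the header is not enough: each row is still read as a single field
a = np.loadtxt(url, skiprows=1)
# ValueError: could not convert string to float: '5.4,3.9,1.3,0.4,setosa'

# Rows are now split on commas, but the species column is text
a = np.loadtxt(url, skiprows=1, delimiter=',')
# ValueError: could not convert string to float: 'setosa'

# Reading only the four numeric columns succeeds
a = np.loadtxt(url, skiprows=1, delimiter=',', usecols=(0,1,2,3))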
# array([[5.4, 3.9, 1.3, 0.4],
#        [5.9, 3. , 5.1, 1.8],
#        [6. , 3.4, 4.5, 1.6],
#        [7.3, 2.9, 6.3, 1.8],
#        [5.6, 2.5, 3.9, 1.1],
#        ...,
# ])

# max_rows=3 stops after the first three data rows
a = np.loadtxt(url, skiprows=1, max_rows=3, delimiter=',', usecols=(0,1,2,3))
# array([[5.4, 3.9, 1.3, 0.4],
#        [5.9, 3. , 5.1, 1.8],
#        [6. , 3.4, 4.5, 1.6]])
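
The same parameters can be tried without network access. The sketch below is only an illustration: `CSV_TEXT` is a hypothetical inline sample that mimics the layout of iris.csv, and `io.StringIO` stands in for the remote file.

```python
import io

import numpy as np

# Hypothetical inline sample that mimics the layout of iris.csv
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
"""

# skiprows=1 drops the header, delimiter=',' splits on commas,
# usecols keeps only the four numeric columns
features = np.loadtxt(io.StringIO(CSV_TEXT), skiprows=1,
                      delimiter=',', usecols=(0, 1, 2, 3))

# A second pass reads only the text column, as strings
species = np.loadtxt(io.StringIO(CSV_TEXT), skiprows=1,
                     delimiter=',', usecols=4, dtype=str)

print(features.shape)
print(species)
```

Reading the numeric and text columns in separate passes keeps `features` a plain float array instead of forcing a single string dtype on every column.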

3.4.2. np.savetxt()

3.4.2.1. int

import numpy as np


a = np.array([[1,2,3],
              [4,5,6]])

# The default fmt='%.18e' writes every value in scientific notation
np.savetxt('/tmp/filename.csv', a, delimiter=',')
# 1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00
# 4.000000000000000000e+00,5.000000000000000000e+00,6.000000000000000000e+00

# fmt='%d' writes the same array as plain integers
np.savetxt('/tmp/filename.csv', a, delimiter=',', fmt='%d')
# 1,2,3
# 4,5,6

3.4.2.2. float

import numpy as np


a = np.array([[5.4, 3.9, 1.3, 0.4],
              [5.9, 3. , 5.1, 1.8],
              [6. , 3.4, 4.5, 1.6],
              [7.3, 2.9, 6.3, 1.8],
              [5.6, 2.5, 3.9, 1.1]])

np.savetxt('/tmp/filename.csv', a, delimiter=',', fmt='%.2f')
# 5.40,3.90,1.30,0.40
# 5.90,3.00,5.10,1.80
# 6.00,3.40,4.50,1.60
# 7.30,2.90,6.30,1.80
# 5.60,2.50,3.90,1.10
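
A file written with np.savetxt() can be read back with np.loadtxt(). The following is a minimal sketch of such a round trip, assuming a temporary directory and a hypothetical file name `iris-sample.csv`; it also uses the `header` and `comments` parameters of np.savetxt() to write a column header.

```python
import os
import tempfile

import numpy as np

a = np.array([[5.4, 3.9, 1.3, 0.4],
              [5.9, 3.0, 5.1, 1.8]])

# Temporary path chosen for illustration only
path = os.path.join(tempfile.mkdtemp(), 'iris-sample.csv')

# comments='' stops savetxt from prefixing the header line with '# '
np.savetxt(path, a, delimiter=',', fmt='%.1f',
           header='sepal_length,sepal_width,petal_length,petal_width',
           comments='')

# Inspect the raw text that was written
with open(path) as file:
    lines = file.read().splitlines()

# Read the numbers back, skipping the header line
restored = np.loadtxt(path, delimiter=',', skiprows=1)

print(lines[0])
print(restored)
```

Note that `fmt` controls how many digits reach the file, so a round trip is exact only when the chosen format keeps all significant digits of the data.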

3.4.3. Other

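  • np.load() - loads arrays or pickled objects from .npy, .npz, or pickled files (np.loads() is deprecated in favor of pickle.loads())

  • np.fromstring() - creates a 1-D array from text data in a string

  • np.fromregex() - creates an array from a text file using regular-expression parsing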

  • np.genfromtxt() - Load data with missing values handled as specified

  • scipy.io.loadmat() - reads MATLAB data files
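
Two of the functions above can be sketched briefly. This is only an illustration: the temporary file name `array.npy` and the inline CSV text with an empty field are hypothetical. np.save() is used here as the writing counterpart of np.load().

```python
import io
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Binary round trip through NumPy's .npy format
path = os.path.join(tempfile.mkdtemp(), 'array.npy')
np.save(path, a)
b = np.load(path)

# Hypothetical CSV text with one empty field in the second row
text = "1.0,2.0,3.0\n4.0,,6.0\n"

# By default the empty field becomes nan
c = np.genfromtxt(io.StringIO(text), delimiter=',')

# filling_values replaces the empty field with a chosen value
d = np.genfromtxt(io.StringIO(text), delimiter=',', filling_values=0)

print(b)
print(c)
print(d)
```

Unlike np.loadtxt(), which raises an error on the empty field, np.genfromtxt() keeps the row and marks or fills the missing value.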

3.4.4. Assignments

3.4.4.1. Load Dirty CSV

English
  1. Load the text from the URL given as input (see below)

  2. Read the first line with dtype=str and save it as header: ndarray

  3. Read the remaining lines with dtype=float and save them as data: ndarray

  4. Slice the Iris species names from header and save the result as species: ndarray

  5. In data, separate the measurements from the species numbers (last column)

  6. Save the measurements as features: ndarray of type float

  7. Save the species numbers as labels: ndarray of type int

  8. Print species, labels and features

Input
https://raw.githubusercontent.com/AstroMatt/book-python/master/numerical-analysis/numpy/data/iris-dirty.csv
Output
species: ndarray
# array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

features: ndarray
# array([[5.4, 3.9, 1.3, 0.4],
#        [5.9, 3. , 5.1, 1.8],
#        [6. , 3.4, 4.5, 1.6],
#        [7.3, 2.9, 6.3, 1.8],
#        ...
#        [6.8, 3.2, 5.9, 2.3]])

labels: ndarray
# array([0, 2, 1, 2, ..., 0, 2, 2, 2])
Hint
  • np.loadtxt(..., dtype=str)

  • header[2:]

  • ndarray.astype(int)

  • data[:, :-1]

  • data[:, -1]
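
The hints above can be combined as in the sketch below. It does not download the input file: `TEXT` is a hypothetical stand-in whose layout (two leading header fields followed by species names, then numeric rows ending with a species number) is an assumption based on the hints and the expected output.

```python
import io

import numpy as np

# Hypothetical stand-in for iris-dirty.csv; the real file layout is assumed
TEXT = """3,4,setosa,versicolor,virginica
5.4,3.9,1.3,0.4,0
5.9,3.0,5.1,1.8,2
6.0,3.4,4.5,1.6,1
"""

# First line only, kept as strings
header = np.loadtxt(io.StringIO(TEXT), delimiter=',', max_rows=1, dtype=str)

# Remaining lines as floats
data = np.loadtxt(io.StringIO(TEXT), delimiter=',', skiprows=1, dtype=float)

species = header[2:]               # drop the two leading header fields
features = data[:, :-1]            # every column except the last
labels = data[:, -1].astype(int)   # last column as integer species numbers

print(species)
print(labels)
print(features)
```

With the real input, the `io.StringIO(TEXT)` arguments would be replaced by the URL.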