4.2. Data Import and Export
4.2.1. np.loadtxt()
import numpy as np
url = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris.csv'
# Default settings: the header line cannot be parsed as numbers
a = np.loadtxt(url)
# Traceback (most recent call last):
# ValueError: could not convert string to float: 'sepal_length,sepal_width,petal_length,petal_width,species'
# Skip the header; the default delimiter is whitespace, so each row is still one string
a = np.loadtxt(url, skiprows=1)
# Traceback (most recent call last):
# ValueError: could not convert string to float: '5.4,3.9,1.3,0.4,setosa'
# Split on commas; the species column still contains text
a = np.loadtxt(url, skiprows=1, delimiter=',')
# Traceback (most recent call last):
# ValueError: could not convert string to float: 'setosa'
# Select only the four numeric columns
a = np.loadtxt(url, skiprows=1, delimiter=',', usecols=(0,1,2,3))
# array([[5.4, 3.9, 1.3, 0.4],
#        [5.9, 3. , 5.1, 1.8],
#        [6. , 3.4, 4.5, 1.6],
#        [7.3, 2.9, 6.3, 1.8],
#        [5.6, 2.5, 3.9, 1.1],
#        ...,
#        ])
# Read only the first three data rows
a = np.loadtxt(url, skiprows=1, max_rows=3, delimiter=',', usecols=(0,1,2,3))
# array([[5.4, 3.9, 1.3, 0.4],
#        [5.9, 3. , 5.1, 1.8],
#        [6. , 3.4, 4.5, 1.6]])
# Read the header row as strings
a = np.loadtxt(url, max_rows=1, delimiter=',', dtype=str, usecols=(0,1,2,3))
# array(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='<U12')
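The same options can be tried without network access. A minimal sketch, assuming a small in-memory CSV: the `CSV` text and the `io.StringIO` wrapper are illustrative stand-ins for the Iris file and are not part of the original example.

```python
import io
import numpy as np

# Hypothetical three-row sample in the same layout as iris.csv
CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.4,3.9,1.3,0.4,setosa
5.9,3.0,5.1,1.8,virginica
6.0,3.4,4.5,1.6,versicolor
"""

# Numeric columns: skip the header row, split on commas, drop the text column
features = np.loadtxt(io.StringIO(CSV), skiprows=1, delimiter=',', usecols=(0, 1, 2, 3))

# Header row: read a single row as strings
header = np.loadtxt(io.StringIO(CSV), max_rows=1, delimiter=',', dtype=str, usecols=(0, 1, 2, 3))

# Text column: read the species names as strings
species = np.loadtxt(io.StringIO(CSV), skiprows=1, delimiter=',', dtype=str, usecols=4)

print(features.shape)
print(header)
print(species)
```

Because `np.loadtxt()` returns a single homogeneous array, numeric and text columns are read in separate calls here.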
4.2.2. np.savetxt()
4.2.3. int
import numpy as np
a = np.array([[1,2,3],
              [4,5,6]])
# The default fmt='%.18e' writes every value in scientific notation
np.savetxt('/tmp/filename.csv', a, delimiter=',')
# 1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00
# 4.000000000000000000e+00,5.000000000000000000e+00,6.000000000000000000e+00
# fmt='%d' writes the values as integers
np.savetxt('/tmp/filename.csv', a, delimiter=',', fmt='%d')
# 1,2,3
# 4,5,6
4.2.4. float
import numpy as np
a = np.array([[5.4, 3.9, 1.3, 0.4],
              [5.9, 3. , 5.1, 1.8],
              [6. , 3.4, 4.5, 1.6],
              [7.3, 2.9, 6.3, 1.8],
              [5.6, 2.5, 3.9, 1.1]])
np.savetxt('/tmp/filename.csv', a, delimiter=',')
# 5.400000000000000355e+00,3.899999999999999911e+00,1.300000000000000044e+00,4.000000000000000222e-01
# 5.900000000000000355e+00,3.000000000000000000e+00,5.099999999999999645e+00,1.800000000000000044e+00
# 6.000000000000000000e+00,3.399999999999999911e+00,4.500000000000000000e+00,1.600000000000000089e+00
# 7.299999999999999822e+00,2.899999999999999911e+00,6.299999999999999822e+00,1.800000000000000044e+00
# 5.599999999999999645e+00,2.500000000000000000e+00,3.899999999999999911e+00,1.100000000000000089e+00
np.savetxt('/tmp/filename.csv', a, delimiter=',', fmt='%.1f')
# 5.4,3.9,1.3,0.4
# 5.9,3.0,5.1,1.8
# 6.0,3.4,4.5,1.6
# 7.3,2.9,6.3,1.8
# 5.6,2.5,3.9,1.1
np.savetxt('/tmp/filename.csv', a, delimiter=',', fmt='%.2f')
# 5.40,3.90,1.30,0.40
# 5.90,3.00,5.10,1.80
# 6.00,3.40,4.50,1.60
# 7.30,2.90,6.30,1.80
# 5.60,2.50,3.90,1.10
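A file written with `fmt` can be read back with `np.loadtxt()`. A minimal sketch, assuming a temporary file and a header line that are not in the original example; `header` and `comments` are standard `np.savetxt()` parameters.

```python
import os
import tempfile
import numpy as np

a = np.array([[5.4, 3.9, 1.3, 0.4],
              [5.9, 3.0, 5.1, 1.8]])

# Hypothetical location: a fresh temporary directory instead of a fixed /tmp path
path = os.path.join(tempfile.mkdtemp(), 'iris.csv')

# header= adds a first line; comments='' suppresses the default '# ' prefix
np.savetxt(path, a, delimiter=',', fmt='%.1f',
           header='sepal_length,sepal_width,petal_length,petal_width',
           comments='')

# Inspect the raw text that was written
with open(path) as file:
    lines = file.read().splitlines()

# Read the numbers back, skipping the header line
b = np.loadtxt(path, delimiter=',', skiprows=1)

print(lines)
print(b)
```

One decimal place is enough here because the sample values have only one; a coarser `fmt` than the data requires would round the values on disk.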
4.2.5. Other
Method | Data Type | Format | Description
---|---|---|---
np.savetxt() | Text | .csv | Save in text format, such as CSV
np.save() | Binary | .npy | Save in NumPy native format
np.savez() | Binary | .npz | Save multiple arrays to native format
np.savez_compressed() | Compressed | .npz | Save multiple arrays to compressed native format
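The native formats can be sketched as a round trip. This is a minimal example, assuming a temporary directory and the hypothetical file names `single`, `many`, and `packed`; the array names `a` and `b` are also only illustrative.

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([5.4, 3.9, 1.3, 0.4])

# Hypothetical location for the example files
directory = tempfile.mkdtemp()

# np.save() stores one array; '.npy' is appended when the name lacks it
np.save(os.path.join(directory, 'single'), a)
single = np.load(os.path.join(directory, 'single.npy'))

# np.savez() stores several arrays; keyword names become the archive keys
np.savez(os.path.join(directory, 'many'), a=a, b=b)
with np.load(os.path.join(directory, 'many.npz')) as archive:
    keys = sorted(archive.files)
    restored_b = archive['b']

# np.savez_compressed() has the same interface and compresses the archive
np.savez_compressed(os.path.join(directory, 'packed'), a=a, b=b)
with np.load(os.path.join(directory, 'packed.npz')) as archive:
    restored_a = archive['a']

print(keys)
```

Unlike the text format, the native formats keep the dtype and shape, so no `delimiter`, `fmt`, or `usecols` arguments are needed when loading.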
Method | Data Type | Description
---|---|---
np.loadtxt() | Text | Load data from text file such as CSV
np.load() | Binary | Load data from .npy or .npz file
np.fromfile() | Binary | Load binary data from file
np.fromstring() | Text | Load data from string
np.fromregex() | Text | Load data from file using regex to parse
np.genfromtxt() | Text | Load data with missing values handled as specified
scipy.io.loadmat() | Binary | Reads MATLAB data files
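Handling of missing values can be illustrated with `np.genfromtxt()`. A minimal sketch, assuming a hypothetical in-memory CSV with two empty fields; `skip_header` and `filling_values` are standard `np.genfromtxt()` parameters.

```python
import io
import numpy as np

# Hypothetical sample with two empty fields
CSV = """sepal_length,sepal_width,petal_length,petal_width
5.4,3.9,,0.4
5.9,,5.1,1.8
6.0,3.4,4.5,1.6
"""

# Empty fields become NaN by default
with_nan = np.genfromtxt(io.StringIO(CSV), delimiter=',', skip_header=1)

# filling_values replaces the missing entries with a chosen value
filled = np.genfromtxt(io.StringIO(CSV), delimiter=',', skip_header=1, filling_values=0.0)

print(with_nan)
print(filled)
```

Note that `np.genfromtxt()` uses `skip_header` where `np.loadtxt()` uses `skiprows`.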
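import numpy as np
# Read the second column (index 1), skipping the header row
data = np.loadtxt('_temporary.csv', delimiter=',', usecols=1, skiprows=1, dtype=np.float16)
# Boolean masks splitting the values into three non-overlapping ranges
small = (data < 1)
medium = (data >= 1) & (data < 2)
large = (data >= 2)
# np.save() appends the '.npy' extension, e.g. /tmp/small.npy
np.save('/tmp/small', data[small])
np.save('/tmp/medium', data[medium])
np.save('/tmp/large', data[large])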
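Boolean masks of this kind can be checked on a small array before anything is saved. A minimal sketch, assuming hypothetical sample values and half-open bins (below 1, from 1 up to but not including 2, and 2 or above):

```python
import numpy as np

# Hypothetical sample values standing in for the loaded column
data = np.array([0.2, 0.4, 1.0, 1.5, 1.9, 2.0, 2.5])

small = data < 1
medium = (data >= 1) & (data < 2)
large = data >= 2

# Each value should fall into exactly one bin
counts = small.astype(int) + medium.astype(int) + large.astype(int)

print(data[small])
print(data[medium])
print(data[large])
```

If any entry of `counts` differs from 1, the bins either overlap or leave a gap at a boundary value.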
4.2.6. Assignments
"""
* Assignment: NumPy Loadtxt
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min
English:
1. Use data from "Given" section (see below)
2. Load text from `DATA`
3. From the first line select Iris species names and save as str to `species: np.ndarray`
4. For other lines:
    a. Read columns with data and save as float to `features: np.ndarray`
    b. Read last column with species numbers and save as `int` to `labels: np.ndarray`
5. Compare result with "Tests" section (see below)
Tests:
>>> type(species) is np.ndarray
True
>>> type(features) is np.ndarray
True
>>> type(labels) is np.ndarray
True
>>> species
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> len(features)
151
>>> features[:3]
array([[5.4, 3.9, 1.3, 0.4],
       [5.9, 3. , 5.1, 1.8],
       [6. , 3.4, 4.5, 1.6]])
>>> features[-3:]
array([[4.9, 2.5, 4.5, 1.7],
       [6.3, 2.8, 5.1, 1.5],
       [6.8, 3.2, 5.9, 2.3]])
>>> labels
array([0, 2, 1, 2, 1, 0, 1, 1, 0, 2, 2, 0, 0, 2, 2, 1, 2, 2, 2, 1, 0, 1,
       1, 0, 0, 0, 2, 2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2, 1, 1, 1, 2, 2,
       0, 1, 1, 1, 1, 1, 2, 0, 2, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 2, 0, 0,
       0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 1, 2, 2, 1, 0, 2, 1, 0, 1, 0, 2, 1,
       0, 2, 0, 2, 1, 0, 2, 1, 1, 0, 0, 1, 2, 2, 2, 1, 0, 1, 1, 1, 2, 2,
       0, 2, 2, 0, 2, 1, 2, 0, 0, 1, 0, 2, 0, 2, 1, 2, 2, 2, 1, 0, 2, 1,
       0, 0, 2, 0, 2, 1, 1, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2, 2, 2])
"""
# Given
import numpy as np
DATA = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/_data/csv/iris-dirty.csv'
species = ...
features = ...
labels = ...