5.3. Import and Export Data

5.3.1. Import data

  • File paths works also with URLs

  • SQL functions uses SQLAlchemy, which supports many RDBMS

import pandas as pd


# Important
pd.read_csv()
pd.read_excel()
pd.read_html()
pd.read_json()
pd.read_sql()        # Read SQL query or database table into a DataFrame

# Others
pd.read_clipboard()
pd.read_feather()
pd.read_fwf()
pd.read_gbq()
pd.read_hdf()
pd.read_msgpack()
pd.read_parquet()
pd.read_pickle()
pd.read_sas()
pd.read_spss()
pd.read_sql_query()  # Read SQL query into a DataFrame
pd.read_sql_table()  # Read SQL database table into a DataFrame
pd.read_stata()
pd.read_table()

5.3.2. Export data

  • File paths works also with URLs

  • SQL functions uses SQLAlchemy, which supports many RDBMS

import pandas as pd


df = pd.DataFrame()

# Important
df.to_csv()
df.to_dict()
df.to_excel()
df.to_json()
df.to_sql()

# Other
df.to_clipboard()
df.to_dense()
df.to_feather()
df.to_gbq()
df.to_hdf()
df.to_html()
df.to_latex()
df.to_msgpack()
df.to_numpy()
df.to_parquet()
df.to_period()
df.to_pickle()
df.to_records()
df.to_sparse()
df.to_stata()
df.to_string()
df.to_timestamp()
df.to_xarray()

5.3.3. Usage

import pandas as pd

url = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/numerical-analysis/pandas/data/iris.csv'

df = pd.read_csv(url)

df.head(3)
#    sepal_length  sepal_width  petal_length  petal_width     species
# 0           5.4          3.9           1.3          0.4      setosa
# 1           5.9          3.0           5.1          1.8   virginica
# 2           6.0          3.4           4.5          1.6  versicolor
import pandas as pd

url = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/numerical-analysis/pandas/data/iris-dirty.csv'

df = pd.read_csv(url)
df.head(3)
#      150    4  setosa  versicolor  virginica
# 0    5.4  3.9     1.3         0.4          0
# 1    5.9  3.0     5.1         1.8          2
# 2    6.0  3.4     4.5         1.6          1

df = pd.read_csv(url, skiprows=1, names=['sepal_length', 'sepal_width',
                                         'petal_length', 'petal_width', 'species'])
df.head(3)
#    sepal_length  sepal_width  petal_length  petal_width  species
# 0           5.4          3.9           1.3          0.4        0
# 1           5.9          3.0           5.1          1.8        2
# 2           6.0          3.4           4.5          1.6        1

df['species'].replace({
    0: 'setosa',
    1: 'versicolor',
    2: 'virginica',
}, inplace=True)
#    sepal_length  sepal_width  petal_length  petal_width  species
# 0           5.4          3.9           1.3          0.4        setosa
# 1           5.9          3.0           5.1          1.8        virginica
# 2           6.0          3.4           4.5          1.6        versicolor

5.3.4. Assignments

5.3.4.1. Read

  • Complexity level: easy

  • Lines of code to write: 5 lines

  • Estimated time of completion: 5 min

  • Filename: solution/pandas_read.py

English
  1. Read data from given url (see below) to DataFrame

  2. Use provided column names

  3. Read labels from the first row

  4. Replace data in label column with values extracted above

  5. Print DataFrame

  6. Print first 5 and last 10 rows from DataFrame

Polish
  1. Wczytaj dane z danego url (patrz poniżej) do DataFrame

  2. Użyj podanych nazw kolumn

  3. Wczytaj nazwy labeli z pierwszego wiersza

  4. Podmień dane w kolumnie label na wartości wyciągnięte powyżej

  5. Wypisz pierwsze 5 i ostatnie 10 wierszy z DataFrame

Input
url = 'https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/breast_cancer.csv'

column_names = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
                'mean smoothness', 'mean compactness', 'mean concavity',
                'mean concave points', 'mean symmetry', 'mean fractal dimension',
                'radius error', 'texture error', 'perimeter error', 'area error',
                'smoothness error', 'compactness error', 'concavity error',
                'concave points error', 'symmetry error',
                'fractal dimension error', 'worst radius', 'worst texture',
                'worst perimeter', 'worst area', 'worst smoothness',
                'worst compactness', 'worst concavity', 'worst concave points',
                'worst symmetry', 'worst fractal dimension', 'label']
The whys and wherefores
  • Read Pandas DataFrame

Hint
  • pd.read_csv(url, nrows=0).columns

  • df['label'].replace({'from': 'to'})