2.1. Jupyter

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

2.1.1. Install

$ pip install jupyter

2.1.2. Run

$ jupyter-notebook
[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: /Users/catherine
[I 08:58:24.417 NotebookApp] 0 active kernels
[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 08:58:24.417 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
$ jupyter-notebook filename.ipynb

2.1.3. Using

  • Add code

  • Run code

  • Modify code and run it again

  • Autocomplete (Tab)

  • Cell type (Code, Markdown, Raw NBConvert); Markdown cells can also contain LaTeX math

2.1.4. Shortcut keys

2.1.4.1. Indent

  • Tab

  • Shift + Tab

2.1.4.2. Comment Code

  • Ctrl + /

2.1.4.3. Run

  • Shift + Enter

2.1.5. Cells

2.1.5.1. Insert Below/Above Cells

2.1.5.2. Add, Delete Cells

2.1.5.3. Cut, Copy, Paste Cells

2.1.5.4. Move Up/Down Cells

2.1.5.5. Merge, Split Cells

2.1.6. Run

2.1.6.1. Run Cell

  • Shift + Enter

2.1.6.2. Run All (above/below)

2.1.6.3. Clear Output

2.1.7. Magic commands

2.1.9. Functions

2.1.9.1. Checkpoints

2.1.9.2. Download

2.1.9.3. Trust Notebook

2.1.9.4. Close and Halt

2.1.10. Performance and profiling

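  • %timeit - time a single statement (line magic)

  • %%timeit - time the whole cell; must be the first line of the cell (cell magic)

  • %%timeit -n 1000 -r 7 - run the cell 1000 times per repetition (-n) and repeat the measurement 7 times (-r)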
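Outside IPython the standard-library timeit module gives roughly the same measurement as %%timeit -n 1000 -r 7: number corresponds to -n and repeat to -r. A minimal sketch; the timed statement is an arbitrary example chosen for illustration, and IPython formats its own report differently.

```python
import statistics
import timeit

# Arbitrary statement to time (an assumption for illustration only)
STMT = "sum(x * x for x in range(1000))"

# number=1000 ~ `-n 1000` (loops per run), repeat=7 ~ `-r 7` (runs)
run_totals = timeit.repeat(STMT, number=1000, repeat=7)

# Each entry is the total time of one run; divide by the loop count
per_loop = [total / 1000 for total in run_totals]

mean = statistics.mean(per_loop)
stdev = statistics.stdev(per_loop)
print(f"{mean * 1e6:.2f} us +/- {stdev * 1e6:.2f} us per loop "
      f"(mean +/- std. dev. of 7 runs, 1000 loops each)")
```

Dividing by the loop count matters because timeit.repeat returns the total time of each run, not the time of a single execution.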

2.1.11. Markdown

2.1.11.1. Unordered lists

* first element
* second element
* third element
- first element
- second element
- third element

2.1.11.2. Ordered lists

1. first element
1. second element
1. third element

2.1.11.3. Headers

# Header level 1
## Header level 2
### Header level 3
#### Header level 4
##### Header level 5
###### Header level 6

2.1.11.4. Formatting

*italic*
**bold**

2.1.11.5. Inline code

`class`

2.1.11.6. Code blocks

```python
name = 'Jose Jimenez'
print(f'My name... {name}')
```

2.1.11.7. Tables

| id | first_name | last_name |    agency |
|----|:-----------|:---------:|----------:|
| 1  | José       |  Jiménez  |      NASA |
| 2  | Иван       |  Иванович | Roscosmos |
| 3  | Mark       |   Watney  |      NASA |
| 4  | Alex       |   Vogel   |      NASA |

2.1.12. Embedding objects

2.1.12.1. LaTeX

  • %%latex

%%latex

$$c = \sqrt{a^2 + b^2}$$
%%latex

$$\int_{x=0}^{x=\infty} x^\pi dx$$
%%latex

\begin{equation}
H \leftarrow 60 + \frac{30(B-R)}{V_{max}-V_{min}}, \quad \text{if } V_{max} = G
\end{equation}
from IPython.display import display, Math

# Fourier transform of f(x); the exponent must contain the variable x
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i k x} dx'))

2.1.12.2. Matplotlib charts

%matplotlib inline
import math
import random
from matplotlib import pyplot as plt

# Noisy sine wave: 628 points with step 0.01 (x from 0.00 to 6.27)
x1 = [x*0.01 for x in range(0, 628)]
y1 = [math.sin(x) + random.gauss(0, 0.1) for x in x1]
plt.plot(x1, y1)

# Cosine sampled every 0.5 (13 points, x from 0.0 to 6.0), drawn as dots joined by lines
x2 = [x*0.5 for x in range(0, 13)]
y2 = [math.cos(x) for x in x2]
plt.plot(x2, y2, 'o-')

plt.show()

2.1.12.3. HTML

from IPython.display import HTML

HTML("We can <i>generate</i> <code>html</code> code <b>directly</b>!")

2.1.12.4. JavaScript

from IPython.display import Javascript

Javascript("alert('It is JavaScript!')")

2.1.12.5. Image

2.1.12.6. YouTube

from IPython.display import YouTubeVideo

YouTubeVideo("wupToqz1e2g")

2.1.13. Workflow

$ pip install pandas
import pandas as pd


FILE = 'https://raw.githubusercontent.com/scikit-learn/scikit-learn/master/sklearn/datasets/data/iris.csv'

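# iris.csv starts with a summary line (150,4,setosa,...) and has no header row:
# skip the summary line and read every remaining line as data
df = pd.read_csv(FILE, skiprows=1, header=None)

df.head(5)
#      0    1    2    3  4
# 0  5.1  3.5  1.4  0.2  0
# 1  4.9  3.0  1.4  0.2  0
# 2  4.7  3.2  1.3  0.2  0
# 3  4.6  3.1  1.5  0.2  0
# 4  5.0  3.6  1.4  0.2  0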

df.columns = [
    'Sepal length',
    'Sepal width',
    'Petal length',
    'Petal width',
    'Species'
]

df.head(5)
#    Sepal length  Sepal width  Petal length  Petal width  Species
# 0           5.1          3.5           1.4          0.2        0
# 1           4.9          3.0           1.4          0.2        0
# 2           4.7          3.2           1.3          0.2        0
# 3           4.6          3.1           1.5          0.2        0
# 4           5.0          3.6           1.4          0.2        0

df.tail(3)
#      Sepal length  Sepal width  Petal length  Petal width  Species
# 147           6.5          3.0           5.2          2.0        2
# 148           6.2          3.4           5.4          2.3        2
# 149           5.9          3.0           5.1          1.8        2

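# Assign the result back: replace(..., inplace=True) on df['Species'] is chained
# assignment, which pandas warns about and which stops working with Copy-on-Write
df['Species'] = df['Species'].replace({
    0: 'setosa',
    1: 'versicolor',
    2: 'virginica'
})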

# Shuffle the rows (the order is random, so your output will differ)
df = df.sample(frac=1.0)
#      Sepal length  Sepal width  Petal length  Petal width     Species
# 121           5.6          2.8           4.9          2.0   virginica
# 10            5.4          3.7           1.5          0.2      setosa
# 55            5.7          2.8           4.5          1.3  versicolor
# 47            4.6          3.2           1.4          0.2      setosa
# 3             4.6          3.1           1.5          0.2      setosa
# ...

# Renumber the rows 0..149; drop=True discards the old index instead of adding it as a column
df = df.reset_index(drop=True)
#      Sepal length  Sepal width     ...      Petal width     Species
# 0             5.6          2.8     ...              2.0   virginica
# 1             5.4          3.7     ...              0.2      setosa
# 2             5.7          2.8     ...              1.3  versicolor
# 3             4.6          3.2     ...              0.2      setosa
# 4             4.6          3.1     ...              0.2      setosa
# ...

df.describe()
#        Sepal length  Sepal width  Petal length  Petal width
# count    150.000000   150.000000    150.000000   150.000000
# mean       5.843333     3.057333      3.758000     1.199333
# std        0.828066     0.435866      1.765298     0.762238
# min        4.300000     2.000000      1.000000     0.100000
# 25%        5.100000     2.800000      1.600000     0.300000
# 50%        5.800000     3.000000      4.350000     1.300000
# 75%        6.400000     3.300000      5.100000     1.800000
# max        7.900000     4.400000      6.900000     2.500000
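The same steps can be tried offline on a few inline rows. A minimal sketch, assuming pandas is installed; the five rows below (taken from the iris data, in the same headerless layout as the data lines of iris.csv) stand in for the downloaded file, and random_state=0 is added only to make the shuffle repeatable.

```python
import io

import pandas as pd

# Five rows in the layout of iris.csv data lines:
# four measurements followed by an integer species code (no header line)
CSV = """5.1,3.5,1.4,0.2,0
7.0,3.2,4.7,1.4,1
6.3,3.3,6.0,2.5,2
4.9,3.0,1.4,0.2,0
6.4,3.2,4.5,1.5,1
"""

# header=None keeps the first line as data instead of using it as column names
df = pd.read_csv(io.StringIO(CSV), header=None)
df.columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']

# Map integer codes to names by assigning the result back to the column
df['Species'] = df['Species'].replace({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Shuffle reproducibly, then renumber the rows 0..n-1
df = df.sample(frac=1.0, random_state=0).reset_index(drop=True)

print(df)
print(df.describe())  # summarizes only the numeric columns
```

Chaining sample() and reset_index() keeps the shuffle and the renumbering in one expression, so the result is always assigned back to df.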

2.1.13.1. Hist

import matplotlib.pyplot as plt
import pandas as pd


INPUT = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/serialization/data/iris.csv'

df = pd.read_csv(INPUT)
df.hist()
plt.show()

Figure 70. Visualization using hist

2.1.13.2. Density

import matplotlib.pyplot as plt
import pandas as pd


INPUT = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/serialization/data/iris.csv'


df = pd.read_csv(INPUT)
df.plot(kind='density', subplots=True, layout=(2,2), sharex=False)
plt.show()

Figure 71. Visualization using density

2.1.13.3. Box

import matplotlib.pyplot as plt
import pandas as pd


INPUT = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/serialization/data/iris.csv'


df = pd.read_csv(INPUT)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()

Figure 72. Visualization using box

2.1.13.4. Scatter matrix

  • In pandas 0.20 the plotting module was moved from pandas.tools.plotting to pandas.plotting

  • In version 0.19 and earlier, the pandas.plotting module did not exist

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix


INPUT = 'https://raw.githubusercontent.com/AstroMatt/book-python/master/serialization/data/iris.csv'


df = pd.read_csv(INPUT)
scatter_matrix(df)
plt.show()

Figure 73. Visualization using scatter matrix

2.1.13.5. Descriptive statistics

Table 86. Descriptive statistics

| Function | Description                                                        |
|----------|--------------------------------------------------------------------|
| count    | Number of non-null observations                                    |
| sum      | Sum of values                                                      |
| mean     | Mean of values                                                     |
| mad      | Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0) |
| median   | Arithmetic median of values                                        |
| min      | Minimum                                                            |
| max      | Maximum                                                            |
| mode     | Mode                                                               |
| abs      | Absolute value                                                     |
| prod     | Product of values                                                  |
| std      | Unbiased standard deviation                                        |
| var      | Unbiased variance                                                  |
| sem      | Unbiased standard error of the mean                                |
| skew     | Unbiased skewness (3rd moment)                                     |
| kurt     | Unbiased kurtosis (4th moment)                                     |
| quantile | Sample quantile (value at %)                                       |
| cumsum   | Cumulative sum                                                     |
| cumprod  | Cumulative product                                                 |
| cummax   | Cumulative maximum                                                 |
| cummin   | Cumulative minimum                                                 |
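Most functions in the table are methods of pandas.Series and pandas.DataFrame. A minimal sketch on a small made-up Series, written so that it also runs on pandas 2.x; because mad no longer exists there, the mean absolute deviation is computed by hand.

```python
import pandas as pd

# Small made-up sample; values chosen so the results are easy to check by hand
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

print(s.count())            # number of non-null observations
print(s.sum())              # sum of values
print(s.mean())             # mean of values
print(s.median())           # median of values
print(s.mode().tolist())    # mode(s); mode() returns a Series
print(s.var())              # variance with ddof=1
print(s.std())              # standard deviation with ddof=1
print(s.quantile(0.25))     # value at 25%
print(s.cumsum().tolist())  # running total

# Replacement for the removed Series.mad(): mean absolute deviation from the mean
mad = (s - s.mean()).abs().mean()
print(mad)
```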

2.1.14. Execute terminal commands

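  • ! runs the rest of the line as a shell command

  • !pwd

  • !ls

  • The output of a command can be captured in a variable and processed in Python:

    dirs = !ls

    for file in dirs:
        if "1_" in file:
            print(file)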
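The ! syntax works only inside IPython. In a plain Python script, a rough standard-library equivalent of the dirs = !ls example might look like this; names_containing is a hypothetical helper introduced for illustration.

```python
import os


def names_containing(names, fragment):
    """Hypothetical helper: keep only the names that contain `fragment`."""
    # `fragment in name` is the idiomatic form of `name.find(fragment) >= 0`
    return [name for name in names if fragment in name]


# os.listdir() plays the role of `!ls`; unlike `ls` it also lists hidden
# entries (names starting with ".") and returns them in arbitrary order
dirs = sorted(os.listdir("."))

for file in names_containing(dirs, "1_"):
    print(file)
```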
    

2.1.15. Output to different formats

File -> Download as:

  • Notebook (.ipynb)

  • Python (.py)

  • HTML (.html)

  • Reveal.js Slides (.html)

  • Markdown (.md)

  • reST (.rst)

  • LaTeX (.tex)

  • PDF via LaTeX (.pdf)
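An .ipynb file is plain JSON in the nbformat 4 layout: a top-level cells list in which every cell has a cell_type and a source. To illustrate roughly what "Download as Python" does, here is a minimal sketch that keeps code cells and turns Markdown cells into comments. notebook_to_script is a hypothetical helper; it does far less than nbconvert (no magics handling, no outputs, raw cells skipped).

```python
import json


def notebook_to_script(notebook):
    """Hypothetical helper: join code cells and turn Markdown cells into comments."""
    parts = []
    for cell in notebook["cells"]:
        # `source` may be a single string or a list of lines
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            parts.append(text)
        elif cell["cell_type"] == "markdown":
            parts.append("\n".join("# " + line for line in text.splitlines()))
        # raw cells are skipped in this sketch
    return "\n\n".join(parts) + "\n"


# A tiny notebook in nbformat 4 layout; a real one would come from
# json.load() on an .ipynb file
notebook = json.loads("""{
  "nbformat": 4, "nbformat_minor": 4, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title\\n", "Some text"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["x = 2\\n", "print(x * 21)"]}
  ]
}""")

script = notebook_to_script(notebook)
print(script)
```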

2.1.15.1. Generate HTML

$ jupyter nbconvert --to html --template basic mynotebook.ipynb

2.1.15.2. Slides

View -> Cell Toolbar -> Slideshow

Listing 637. The first run may generate a config file and exit with an error; if so, run the command again
$ jupyter nbconvert filename.ipynb --to slides --post serve

2.1.15.3. GitHub Pages with Jupyter Slides

$ git submodule add https://github.com/hakimel/reveal.js.git reveal.js

$ jupyter nbconvert --to slides index.ipynb --reveal-prefix=reveal.js

$ jupyter nbconvert --to slides index.ipynb --reveal-prefix=reveal.js \
    --SlidesExporter.reveal_theme=serif \
    --SlidesExporter.reveal_scroll=True \
    --SlidesExporter.reveal_transition=none

2.1.16. Assignments

2.1.16.1. Basic usage

  1. Create a Jupyter notebook named jupyter_first.ipynb

  2. Add text describing the following steps

  3. Add three different Code Cells

  4. Run a Code Cell together with all the cells above it

  5. Add a Code Cell that shows the execution time of the statements

  6. Add a Code Cell that displays a plot of the sin() function inline

2.1.16.2. Slides

  1. Convert the previous notebook to slides and start the presentation in a browser