3.11. Series Statistics

import pandas as pd
import numpy as np

s = pd.Series(
    data=[1.0, 2.0, 3.0, np.nan, 5.0],
    index=['a', 'b', 'c', 'd', 'e'])

s
# a    1.0
# b    2.0
# c    3.0
# d    NaN
# e    5.0
# dtype: float64

3.11.1. Count

  • Series.count() - Number of non-null observations

len(s)          # 5
s.size          # 5
s.count()       # 4
s.nunique()     # 4
s.value_counts()    # NaN is excluded by default; the order of tied counts may differ between pandas versions
# 5.0    1
# 3.0    1
# 2.0    1
# 1.0    1
# dtype: int64
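
The counting helpers above differ mainly in how they treat missing values. A minimal sketch, re-creating the example series so that it runs on its own (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

total = len(s)                  # every element, NaN included
non_null = s.count()            # NaN excluded
missing = s.isna().sum()        # number of NaN values

# dropna=False makes NaN count as a value of its own
unique_with_nan = s.nunique(dropna=False)
counts_with_nan = s.value_counts(dropna=False)
```

Here `total` equals `non_null + missing`, which is a quick sanity check when auditing a column for gaps.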

3.11.2. Sum

  • Series.sum() - Sum of values

  • Series.cumsum() - Cumulative sum

s.sum()
# 11.0

s.cumsum()
# a     1.0
# b     3.0
# c     6.0
# d     NaN
# e    11.0
# dtype: float64
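
By default both methods skip NaN, which is why the total is 11.0 and the running sum resumes after 'd'. A short sketch of the `skipna` and `min_count` parameters, which change that behaviour (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

total = s.sum()                           # NaN skipped
strict_total = s.sum(skipna=False)        # NaN propagates to the result
guarded_total = s.sum(min_count=5)        # NaN, only 4 valid values are present

# The sum of an empty series is 0.0 unless min_count says otherwise
empty_total = pd.Series([], dtype='float64').sum()

# With skipna=False the running sum stays NaN from the first gap onward
strict_running = s.cumsum(skipna=False)
```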

3.11.3. Product

  • Series.prod() - Product of values

  • Series.cumprod() - Cumulative product

s.prod()
# 30.0

s.cumprod()
# a     1.0
# b     2.0
# c     6.0
# d     NaN
# e    30.0
# dtype: float64

3.11.4. Extremes

  • Series.min() - Minimum value

  • Series.idxmin() - Index label of minimum value (Float, Int, Object, Datetime, Index)

  • Series.argmin() - Integer position of minimum value

  • Series.cummin() - Cumulative minimum

  • Series.max() - Maximum value

  • Series.idxmax() - Index label of maximum value (Float, Int, Object, Datetime, Index)

  • Series.argmax() - Integer position of maximum value

  • Series.cummax() - Cumulative maximum

Code 3.165. Minimum, index of minimum and cumulative minimum
s.min()
# 1.0

s.idxmin()
# 'a'

s.argmin()
# 0

s.cummin()
# a    1.0
# b    1.0
# c    1.0
# d    NaN
# e    1.0
# dtype: float64

Code 3.166. Maximum, index of maximum and cumulative maximum
s.max()
# 5.0

s.idxmax()
# 'e'

s.argmax()
# 4

s.cummax()
# a    1.0
# b    2.0
# c    3.0
# d    NaN
# e    5.0
# dtype: float64
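
The difference between `idxmin()` and `argmin()` matters as soon as the index is not a plain range. A small sketch, using the same series, of how each result is used to fetch the element (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

min_label = s.idxmin()        # index label of the smallest value
min_position = s.argmin()     # integer position of the smallest value

# Labels go with .loc, positions go with .iloc; both reach the same element
value_by_label = s.loc[min_label]
value_by_position = s.iloc[min_position]

max_label = s.idxmax()
max_position = s.argmax()
```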

3.11.5. Average

Code 3.167. Arithmetic mean of values
s.mean()
# 2.75

Code 3.168. Median of values
s.median()
# 2.5

Code 3.169. Mode
s.mode()
# 0    1.0
# 1    2.0
# 2    3.0
# 3    5.0
# dtype: float64
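
Every value in `s` occurs once, so `mode()` returns all of them. The sketch below uses two small made-up series (not part of the original example) to show a tie between two modes and how an outlier moves the mean but not the median:

```python
import pandas as pd

# Hypothetical data: 2 and 3 both occur twice
tied = pd.Series([1, 2, 2, 3, 3])
modes = tied.mode()             # always a Series, sorted, one row per mode

# Hypothetical data with a single outlier
skewed = pd.Series([1.0, 2.0, 3.0, 100.0])
outlier_mean = skewed.mean()        # pulled towards the outlier
outlier_median = skewed.median()    # unaffected by the size of the outlier
```

Because `mode()` always returns a Series, use `.iloc[0]` when a single value is needed.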

Code 3.170. Rolling Average
s.rolling(window=2).mean()
# a    NaN
# b    1.5
# c    2.5
# d    NaN
# e    NaN
# dtype: float64

Figure 3.18. Rolling Average
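
In the rolling example, a window yields NaN whenever it holds fewer valid values than `min_periods`, which defaults to the window size. A minimal sketch of relaxing that requirement (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# Default: both values in the window must be present
strict = s.rolling(window=2).mean()

# min_periods=1: one valid value in the window is enough
relaxed = s.rolling(window=2, min_periods=1).mean()
```

With `min_periods=1` the windows ending at 'd' and 'e' each contain one valid value, so they return 3.0 and 5.0 instead of NaN.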

3.11.6. Distribution

Code 3.171. Absolute value
s.abs()
# a    1.0
# b    2.0
# c    3.0
# d    NaN
# e    5.0
# dtype: float64

Code 3.172. Standard deviation
s.std()
# 1.707825127659933

Figure 3.19. Standard Deviation

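Code 3.173. Mean absolute deviation
# Series.mad() was deprecated in pandas 1.5 and removed in pandas 2.0;
# the expression below computes the same value.
(s - s.mean()).abs().mean()
# 1.25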

Code 3.174. Standard Error of the Mean (SEM)
s.sem()
# 0.8539125638299665

Figure 3.20. Standard Error of the Mean (SEM)
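
The standard deviation and the standard error are closely related: pandas uses the sample formula (`ddof=1`) by default, and the SEM is the sample standard deviation divided by the square root of the number of non-null values. A sketch that checks these relations on the example series (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

n = s.count()                        # 4 non-null observations

sample_std = s.std()                 # ddof=1, divides by n - 1
population_std = s.std(ddof=0)       # divides by n
variance = s.var()                   # square of the sample standard deviation

sem_manual = sample_std / np.sqrt(n)  # should match s.sem()
```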

Code 3.175. Skewness (3rd moment)
s.skew()

Figure 3.21. Skewness

Code 3.176. Kurtosis (4th moment). pandas returns excess kurtosis (Fisher's definition), so a normal distribution scores 0.
s.kurt()

Figure 3.22. Kurtosis
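
The sign of these two statistics is usually more informative than the exact value. The sketch below uses two small made-up series (not part of the original example) to show a symmetric sample next to one with a long right tail:

```python
import pandas as pd

# Hypothetical data: evenly spaced, symmetric around 3
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical data: one large value creates a long right tail
right_tailed = pd.Series([1.0, 1.0, 2.0, 2.0, 10.0])

symmetric_skew = symmetric.skew()        # about 0 for symmetric data
right_skew = right_tailed.skew()         # positive for a right tail
left_skew = (-right_tailed).skew()       # mirroring the data flips the sign

flat_kurt = symmetric.kurt()             # negative: flatter than a normal distribution
peaked_kurt = right_tailed.kurt()        # positive: heavier tails than a normal distribution
```

Note that `skew()` needs at least 3 values and `kurt()` at least 4; with fewer, pandas returns NaN.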

Code 3.177. Sample quantile (the value below which a given fraction of the data falls). A percentile is a quantile expressed in percent.
s.quantile(.3)
# 1.9

s.quantile([.25, .5, .75])
# 0.25    1.75
# 0.50    2.50
# 0.75    3.50
# dtype: float64
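
The value 1.9 comes from linear interpolation: NaN is dropped, the remaining values are sorted to [1, 2, 3, 5], and the 0.3 quantile sits at position 0.3 * (4 - 1) = 0.9, between 1.0 and 2.0. A sketch of how the `interpolation` parameter changes the result (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

linear = s.quantile(0.3)                              # default, interpolates between neighbours
lower = s.quantile(0.3, interpolation='lower')        # neighbour below the position
higher = s.quantile(0.3, interpolation='higher')      # neighbour above the position
midpoint = s.quantile(0.3, interpolation='midpoint')  # average of the two neighbours
```

`'lower'` and `'higher'` are useful when the result must be a value that actually occurs in the data.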

Code 3.178. Variance
s.var()
# 2.9166666666666665

Code 3.179. Correlation Coefficient
s.corr(s)
# 1.0

Figure 3.23. Correlation Coefficient
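
Correlating a series with itself always gives 1.0, so it says little about the method. A sketch with two derived series (made up for illustration) shows the two extremes of the default Pearson coefficient; pairs in which either value is NaN are ignored:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# Hypothetical companions: an increasing and a decreasing linear function of s
scaled = s * 2 + 1
mirrored = -s

positive = s.corr(scaled)       # perfectly linear, same direction
negative = s.corr(mirrored)     # perfectly linear, opposite direction
```

Both results may differ from exactly 1.0 and -1.0 by floating-point rounding, so compare them with a tolerance.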

3.11.7. Describe

s.describe()
# count    4.000000
# mean     2.750000
# std      1.707825
# min      1.000000
# 25%      1.750000
# 50%      2.500000
# 75%      3.500000
# max      5.000000
# dtype: float64
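
Each row of the summary matches one of the methods shown earlier, and the `percentiles` parameter selects which quantiles are reported. A minimal sketch (the variable name `summary` is illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan, 5.0], index=['a', 'b', 'c', 'd', 'e'])

# Request the 10th, 50th and 90th percentiles instead of the quartiles
summary = s.describe(percentiles=[0.1, 0.5, 0.9])

# The result is itself a Series, so single statistics are read by label
tenth = summary['10%']
ninetieth = summary['90%']
```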

3.11.8. Assignments

Todo

Create assignments