Python Pandas: Resample Time Series


In [24]:
import pandas as pd
import numpy as np
In [25]:
df = pd.DataFrame()

df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
df.index = pd.date_range('1/1/2014', periods=100, freq='H')

df.head()
Out[25]:
german_army allied_army
2014-01-01 00:00:00 28755 33938
2014-01-01 01:00:00 25176 28631
2014-01-01 02:00:00 23261 39685
2014-01-01 03:00:00 28686 27756
2014-01-01 04:00:00 24588 25681

Truncate the dataframe

In [26]:
df.truncate(before='1/2/2014', after='1/3/2014')
Out[26]:
german_army allied_army
2014-01-02 00:00:00 26401 20189
2014-01-02 01:00:00 29958 23934
2014-01-02 02:00:00 24492 39075
2014-01-02 03:00:00 25707 39262
2014-01-02 04:00:00 27129 35961
2014-01-02 05:00:00 27903 25418
2014-01-02 06:00:00 20409 25163
2014-01-02 07:00:00 25736 34794
2014-01-02 08:00:00 24057 27209
2014-01-02 09:00:00 26875 33402
2014-01-02 10:00:00 23963 38575
2014-01-02 11:00:00 27506 31859
2014-01-02 12:00:00 23564 25750
2014-01-02 13:00:00 27958 24365
2014-01-02 14:00:00 24915 38866
2014-01-02 15:00:00 23538 33820
2014-01-02 16:00:00 23361 30080
2014-01-02 17:00:00 27284 22922
2014-01-02 18:00:00 24176 32155
2014-01-02 19:00:00 23924 27763
2014-01-02 20:00:00 23111 32343
2014-01-02 21:00:00 20348 28907
2014-01-02 22:00:00 27136 38634
2014-01-02 23:00:00 28649 29950
2014-01-03 00:00:00 21292 26395

Set the dataframe's index

In [28]:
df.index = df.index + pd.DateOffset(months=4, days=5)
df.head()
Out[28]:
german_army allied_army
2014-05-06 00:00:00 28755 33938
2014-05-06 01:00:00 25176 28631
2014-05-06 02:00:00 23261 39685
2014-05-06 03:00:00 28686 27756
2014-05-06 04:00:00 24588 25681

Lead a variable 1 hour

In [29]:
df.shift(1).head()
Out[29]:
german_army allied_army
2014-05-06 00:00:00 NaN NaN
2014-05-06 01:00:00 28755.0 33938.0
2014-05-06 02:00:00 25176.0 28631.0
2014-05-06 03:00:00 23261.0 39685.0
2014-05-06 04:00:00 28686.0 27756.0

Lag a variable 1 hour

In [30]:
df.shift(-1).tail()
Out[30]:
german_army allied_army
2014-05-09 23:00:00 26903.0 39144.0
2014-05-10 00:00:00 27576.0 39759.0
2014-05-10 01:00:00 25232.0 35246.0
2014-05-10 02:00:00 23391.0 21044.0
2014-05-10 03:00:00 NaN NaN

Grouping Options

There are many options for grouping. You can learn more about them in Pandas's timeseries docs, however, I have also listed them below for your convience.

Value Description
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter endfrequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
BH business hour frequency
H hourly frequency
T minutely frequency
S secondly frequency
L milliseonds
U microseconds
N nanosecondsa

Aggregation

Similar to the aggregating API, groupby API, and the window functions API, a Resampler can be selectively (resampled)[http://pandas.pydata.org/pandas-docs/stable/timeseries.html#aggregation].

Aggregate into days by taking the last value of each day's worth of hourly observation

In [36]:
df.resample('D').last()
Out[36]:
german_army allied_army
2014-05-06 28214 32110
2014-05-07 28649 29950
2014-05-08 28379 32600
2014-05-09 26752 22379
2014-05-10 23391 21044

Aggregate into days by taking the first, last, highest, and lowest value of each day's worth of hourly observation

In [37]:
df.resample('D').ohlc()
Out[37]:
german_army allied_army
open high low close open high low close
2014-05-06 28755 29206 20037 28214 33938 39955 23417 32110
2014-05-07 26401 29958 20348 28649 20189 39262 20189 29950
2014-05-08 21292 29786 20296 28379 26395 38197 20404 32600
2014-05-09 25764 29952 20738 26752 22613 39695 20189 22379
2014-05-10 26903 27576 23391 23391 39144 39759 21044 21044