In [24]:

import pandas as pd
import numpy as np

In [25]:

df = pd.DataFrame()

df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
df.index = pd.date_range('1/1/2014', periods=100, freq='H')

df.head()

Out[25]:

	german_army	allied_army
2014-01-01 00:00:00	28755	33938
2014-01-01 01:00:00	25176	28631
2014-01-01 02:00:00	23261	39685
2014-01-01 03:00:00	28686	27756
2014-01-01 04:00:00	24588	25681

Truncate the dataframe¶

In [26]:

df.truncate(before='1/2/2014', after='1/3/2014')

Out[26]:

	german_army	allied_army
2014-01-02 00:00:00	26401	20189
2014-01-02 01:00:00	29958	23934
2014-01-02 02:00:00	24492	39075
2014-01-02 03:00:00	25707	39262
2014-01-02 04:00:00	27129	35961
2014-01-02 05:00:00	27903	25418
2014-01-02 06:00:00	20409	25163
2014-01-02 07:00:00	25736	34794
2014-01-02 08:00:00	24057	27209
2014-01-02 09:00:00	26875	33402
2014-01-02 10:00:00	23963	38575
2014-01-02 11:00:00	27506	31859
2014-01-02 12:00:00	23564	25750
2014-01-02 13:00:00	27958	24365
2014-01-02 14:00:00	24915	38866
2014-01-02 15:00:00	23538	33820
2014-01-02 16:00:00	23361	30080
2014-01-02 17:00:00	27284	22922
2014-01-02 18:00:00	24176	32155
2014-01-02 19:00:00	23924	27763
2014-01-02 20:00:00	23111	32343
2014-01-02 21:00:00	20348	28907
2014-01-02 22:00:00	27136	38634
2014-01-02 23:00:00	28649	29950
2014-01-03 00:00:00	21292	26395

Set the dataframe's index¶

In [28]:

df.index = df.index + pd.DateOffset(months=4, days=5)
df.head()

Out[28]:

	german_army	allied_army
2014-05-06 00:00:00	28755	33938
2014-05-06 01:00:00	25176	28631
2014-05-06 02:00:00	23261	39685
2014-05-06 03:00:00	28686	27756
2014-05-06 04:00:00	24588	25681

Lead a variable 1 hour¶

In [29]:

df.shift(1).head()

Out[29]:

	german_army	allied_army
2014-05-06 00:00:00	NaN	NaN
2014-05-06 01:00:00	28755.0	33938.0
2014-05-06 02:00:00	25176.0	28631.0
2014-05-06 03:00:00	23261.0	39685.0
2014-05-06 04:00:00	28686.0	27756.0

Lag a variable 1 hour¶

In [30]:

df.shift(-1).tail()

Out[30]:

	german_army	allied_army
2014-05-09 23:00:00	26903.0	39144.0
2014-05-10 00:00:00	27576.0	39759.0
2014-05-10 01:00:00	25232.0	35246.0
2014-05-10 02:00:00	23391.0	21044.0
2014-05-10 03:00:00	NaN	NaN

Grouping Options¶

There are many options for grouping. You can learn more about them in Pandas's timeseries docs, however, I have also listed them below for your convience.

Value	Description
B	business day frequency
C	custom business day frequency (experimental)
D	calendar day frequency
W	weekly frequency
M	month end frequency
BM	business month end frequency
CBM	custom business month end frequency
MS	month start frequency
BMS	business month start frequency
CBMS	custom business month start frequency
Q	quarter end frequency
BQ	business quarter endfrequency
QS	quarter start frequency
BQS	business quarter start frequency
A	year end frequency
BA	business year end frequency
AS	year start frequency
BAS	business year start frequency
BH	business hour frequency
H	hourly frequency
T	minutely frequency
S	secondly frequency
L	milliseonds
U	microseconds
N	nanosecondsa

Aggregation¶

Similar to the aggregating API, groupby API, and the window functions API, a Resampler can be selectively (resampled)[http://pandas.pydata.org/pandas-docs/stable/timeseries.html#aggregation].

Aggregate into days by taking the last value of each day's worth of hourly observation¶

In [36]:

df.resample('D').last()

Out[36]:

	german_army	allied_army
2014-05-06	28214	32110
2014-05-07	28649	29950
2014-05-08	28379	32600
2014-05-09	26752	22379
2014-05-10	23391	21044

Aggregate into days by taking the first, last, highest, and lowest value of each day's worth of hourly observation¶

In [37]:

df.resample('D').ohlc()

Out[37]:

	german_army				allied_army
	open	high	low	close	open	high	low	close
2014-05-06	28755	29206	20037	28214	33938	39955	23417	32110
2014-05-07	26401	29958	20348	28649	20189	39262	20189	29950
2014-05-08	21292	29786	20296	28379	26395	38197	20404	32600
2014-05-09	25764	29952	20738	26752	22613	39695	20189	22379
2014-05-10	26903	27576	23391	23391	39144	39759	21044	21044

M Hendra Herviawan

Python Pandas: Resample Time Series