- Sun 01 May 2016
- Data Science
- M Hendra Herviawan
- #Data Wrangling, #Time Series, #Python
In [24]:
import pandas as pd
import numpy as np
In [25]:
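# Build 100 observations of two randomly generated troop counts
df = pd.DataFrame()
df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
# Index the rows by hour, starting at midnight on 1 January 2014
df.index = pd.date_range('1/1/2014', periods=100, freq='H')
df.head()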
Out[25]:
Truncate the dataframe¶
In [26]:
df.truncate(before='1/2/2014', after='1/3/2014')
Out[26]:
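To make the behaviour of truncate concrete, here is a minimal sketch on a stand-in frame. The name demo and its values are my own assumptions, not the data above. It shows that both the before and after bounds are kept, and that a bare date is read as midnight of that day.

```python
import numpy as np
import pandas as pd

# Stand-in hourly frame (hypothetical values, 100 rows like the one above).
# 'h' is the hourly alias; older pandas versions also accept 'H'.
idx = pd.date_range('2014-01-01', periods=100, freq='h')
demo = pd.DataFrame({'german_army': np.arange(100)}, index=idx)

# Keep rows from 2014-01-02 00:00 through 2014-01-03 00:00, both ends included.
window = demo.truncate(before='2014-01-02', after='2014-01-03')

print(window.index.min(), window.index.max(), len(window))
```

Because the upper bound is midnight on 3 January, the result holds the 24 hours of 2 January plus that single midnight row.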
Shift the dataframe's index by 4 months and 5 days¶
In [28]:
df.index = df.index + pd.DateOffset(months=4, days=5)
df.head()
Out[28]:
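A small sketch of what adding a DateOffset does to a DatetimeIndex. The three-row index here is an assumption for illustration. Every timestamp moves forward by four calendar months and five days, while the values in the frame stay attached to the same rows.

```python
import pandas as pd

# Three hourly timestamps starting at midnight on 1 January 2014.
idx = pd.date_range('2014-01-01', periods=3, freq='h')

# Calendar-aware shift: 1 January + 4 months = 1 May, then + 5 days = 6 May.
shifted = idx + pd.DateOffset(months=4, days=5)

print(shifted)
```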
Lag a variable 1 hour¶
In [29]:
df.shift(1).head()
Out[29]:
Lead a variable 1 hour¶
In [30]:
df.shift(-1).tail()
Out[30]:
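It is easy to mix up the direction of shift, so here is a tiny sketch with made-up values. A positive period pushes values later in time, so each row shows the previous hour's value (a lag). A negative period pulls values earlier, so each row shows the next hour's value (a lead).

```python
import pandas as pd

# Four hourly observations with easy-to-follow values (hypothetical data).
idx = pd.date_range('2014-01-01', periods=4, freq='h')
s = pd.Series([10, 20, 30, 40], index=idx, name='german_army')

lagged = s.shift(1)   # each row holds the value from one hour earlier
led = s.shift(-1)     # each row holds the value from one hour later

# Side-by-side view; the row that has no neighbour to borrow from becomes NaN.
compare = pd.DataFrame({'original': s, 'lag_1h': lagged, 'lead_1h': led})
print(compare)
```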
Grouping Options¶
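There are many options for grouping. Each one is a frequency alias that you pass to functions such as resample or date_range. You can learn more about them in the pandas time series docs, but I have also listed them below for your convenience. Note that pandas 2.2 and later rename several of these aliases, for example ME instead of M and h instead of H.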
Value | Description |
---|---|
B | business day frequency |
C | custom business day frequency (experimental) |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A | year end frequency |
BA | business year end frequency |
AS | year start frequency |
BAS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T | minutely frequency |
S | secondly frequency |
L | milliseconds |
U | microseconds |
N | nanoseconds |
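
As a rough sketch of how these aliases are used, the snippet below passes a few of them to resample and date_range. The frame demo and its values are assumptions for illustration. I stick to aliases (D, B, MS) whose spelling has not changed between pandas versions.

```python
import numpy as np
import pandas as pd

# Stand-in hourly data covering 1 January 2014 00:00 to 5 January 2014 03:00.
idx = pd.date_range('2014-01-01', periods=100, freq='h')
demo = pd.DataFrame({'allied_army': np.arange(100)}, index=idx)

# 'D': group the hourly rows into calendar days and average each group.
daily = demo.resample('D').mean()

# 'B': generate business days only (weekends skipped, holidays are not).
business_days = pd.date_range('2014-01-01', periods=5, freq='B')

# 'MS': generate the first day of each month.
month_starts = pd.date_range('2014-01-01', periods=3, freq='MS')

print(daily)
print(business_days)
print(month_starts)
```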
Aggregation¶
Similar to the aggregating API, groupby API, and the window functions API, a Resampler can be selectively [resampled](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#aggregation): you can pick individual columns and apply a different aggregation function to each.
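Here is a minimal sketch of what selective resampling can look like. The frame demo and the choice of max and sum are my own assumptions for illustration. The Resampler object is indexed by column name like a groupby, and agg takes a dict that maps each column to its own function.

```python
import numpy as np
import pandas as pd

# Two days of hourly stand-in data (hypothetical values).
idx = pd.date_range('2014-01-01', periods=48, freq='h')
demo = pd.DataFrame(
    {'german_army': np.arange(48), 'allied_army': np.arange(48) * 2},
    index=idx,
)

resampler = demo.resample('D')

# Select a single column before aggregating.
german_daily_mean = resampler['german_army'].mean()

# Apply a different aggregation to each column in one call.
mixed = resampler.agg({'german_army': 'max', 'allied_army': 'sum'})

print(german_daily_mean)
print(mixed)
```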
Aggregate into days by taking the last value of each day's worth of hourly observations¶
In [36]:
df.resample('D').last()
Out[36]:
Aggregate into days by taking the first, last, highest, and lowest value of each day's worth of hourly observations¶
In [37]:
df.resample('D').ohlc()
Out[37]:
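To show how the ohlc output is laid out, here is a short sketch on stand-in data. The frame demo and its values are assumptions. On a DataFrame, each original column gets four sub-columns (open, high, low, close), which correspond to the first, max, min, and last value of each day.

```python
import numpy as np
import pandas as pd

# Two days of hourly stand-in data with values that go up and down.
idx = pd.date_range('2014-01-01', periods=48, freq='h')
demo = pd.DataFrame({'german_army': (np.arange(48) * 7) % 11}, index=idx)

bars = demo.resample('D').ohlc()

# The columns are two-level: (original column, open/high/low/close).
german_bars = bars['german_army']
print(german_bars)

# Cross-check against the individual aggregations.
daily = demo['german_army'].resample('D')
print((german_bars['open'] == daily.first()).all())
print((german_bars['close'] == daily.last()).all())
```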