- Sun 01 May 2016
- Data Science
- M Hendra Herviawan
- #Data Wrangling, #Time Series, #Python
In [24]:
import pandas as pd
import numpy as np
In [25]:
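# Build a frame with two random integer series, one value per hour
df = pd.DataFrame()
df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
# 100 hourly timestamps starting 1 January 2014 (recent pandas releases spell 'H' as 'h')
df.index = pd.date_range('1/1/2014', periods=100, freq='H')
df.head()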
Out[25]:
Truncate the dataframe
In [26]:
df.truncate(before='1/2/2014', after='1/3/2014')
Out[26]:
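One detail worth checking for yourself is how `truncate` treats its bounds. The sketch below uses a small hypothetical frame (`demo`, with a made-up `value` column) rather than the random data above, and compares `truncate` with partial-string slicing through `.loc`. The hourly step is passed as a `pd.Timedelta` so the example does not depend on how your pandas version spells the hourly alias.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: 100 hourly rows holding the values 0..99.
idx = pd.date_range("2014-01-01", periods=100, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"value": np.arange(100)}, index=idx)

# truncate() reads each bound as an exact timestamp (midnight) and keeps both ends.
truncated = demo.truncate(before="2014-01-02", after="2014-01-03")

# Partial-string slicing with .loc treats "2014-01-03" as the whole day.
sliced = demo.loc["2014-01-02":"2014-01-03"]

print(truncated.index[0], truncated.index[-1], len(truncated))
print(sliced.index[0], sliced.index[-1], len(sliced))
```

So `truncate` stops at midnight on 3 January, while the `.loc` slice runs through 23:00 on that day.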
Shift the dataframe's index forward 4 months and 5 days
In [28]:
df.index = df.index + pd.DateOffset(months=4, days=5)
df.head()
Out[28]:
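To see what `pd.DateOffset` does to individual timestamps, here is a minimal sketch on a two-element index (the timestamps are made up for illustration). `DateOffset` does calendar arithmetic, so adding months keeps the day of the month where it can and clips to the end of the month where it cannot.

```python
import pandas as pd

# Two hypothetical timestamps, chosen only for illustration.
idx = pd.DatetimeIndex(["2014-01-01 00:00", "2014-01-31 12:00"])

# Calendar-aware shift: 4 months and 5 days later, time of day unchanged.
shifted = idx + pd.DateOffset(months=4, days=5)
print(shifted)

# Month arithmetic clips to the last valid day: 31 January + 1 month is 28 February.
clipped = pd.Timestamp("2014-01-31") + pd.DateOffset(months=1)
print(clipped)
```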
Lag a variable 1 hour
In [29]:
df.shift(1).head()
Out[29]:
Lead a variable 1 hour
In [30]:
df.shift(-1).tail()
Out[30]:
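The direction of `shift` is easy to mix up, so here is a small sketch on a three-value series (the numbers are made up). A positive shift moves values to later timestamps, so each row shows the previous hour's value (a lag). A negative shift moves values to earlier timestamps, so each row shows the next hour's value (a lead).

```python
import pandas as pd

# Hypothetical three-hour series.
idx = pd.date_range("2014-01-01", periods=3, freq=pd.Timedelta(hours=1))
s = pd.Series([10, 20, 30], index=idx)

lagged = s.shift(1)   # each row holds the value from one hour earlier
led = s.shift(-1)     # each row holds the value from one hour later

print(pd.DataFrame({"original": s, "lag_1h": lagged, "lead_1h": led}))
```

The row that has no earlier (or later) neighbour is filled with `NaN`, which is why the shifted columns become floats.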
Grouping Options
There are many options for grouping. You can learn more about them in the pandas time series docs; I have also listed them below for your convenience.
| Value | Description |
|---|---|
| B | business day frequency |
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| BM | business month end frequency |
| CBM | custom business month end frequency |
| MS | month start frequency |
| BMS | business month start frequency |
| CBMS | custom business month start frequency |
| Q | quarter end frequency |
| BQ | business quarter end frequency |
| QS | quarter start frequency |
| BQS | business quarter start frequency |
| A | year end frequency |
| BA | business year end frequency |
| AS | year start frequency |
| BAS | business year start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T | minutely frequency |
| S | secondly frequency |
| L | milliseconds |
| U | microseconds |
| N | nanoseconds |
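A quick way to get a feel for these aliases is to pass them to `pd.date_range` and look at the timestamps that come back. The sketch below tries four of them; the start dates are arbitrary. Note that recent pandas releases have deprecated some of the older spellings in the table (for example `H` in favor of `h`, and `M` in favor of `ME`), so check the docs for the version you have installed.

```python
import pandas as pd

# "D": every calendar day.
daily = pd.date_range("2014-01-03", periods=3, freq="D")

# "B": business days only, so the weekend after Friday 3 January is skipped.
business = pd.date_range("2014-01-03", periods=3, freq="B")

# "W": weekly, anchored on Sundays by default.
weekly = pd.date_range("2014-01-01", periods=3, freq="W")

# "MS": month start; a mid-month start date rolls forward to the next first of the month.
month_start = pd.date_range("2014-01-15", periods=3, freq="MS")

for name, rng in [("D", daily), ("B", business), ("W", weekly), ("MS", month_start)]:
    print(name, list(rng.strftime("%Y-%m-%d")))
```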
Aggregation
As with the aggregating API, the groupby API, and the window functions API, a Resampler can be [selectively resampled](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#aggregation): you can pick individual columns and apply a different aggregation function to each.
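As a rough sketch of what selective resampling looks like in practice, the example below builds a small deterministic frame (the values are made up, not the random data used above), applies a different function to each column with `agg`, and then applies several functions to a single column.

```python
import numpy as np
import pandas as pd

# Hypothetical two-day hourly frame with predictable values.
idx = pd.date_range("2014-01-01", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame(
    {"german_army": np.arange(48), "allied_army": np.arange(48) * 2},
    index=idx,
)

# A different aggregation per column.
per_column = demo.resample("D").agg({"german_army": "mean", "allied_army": "max"})

# Several aggregations on one selected column.
one_column = demo.resample("D")["german_army"].agg(["min", "max"])

print(per_column)
print(one_column)
```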
Aggregate into days by taking the last value of each day's worth of hourly observations
In [36]:
df.resample('D').last()
Out[36]:
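Because the data above is random, it is hard to verify the result by eye. Here is the same call on a hypothetical frame whose values simply count the hours, so the last value of each day is easy to predict.

```python
import numpy as np
import pandas as pd

# Hypothetical two-day hourly frame: the value is the hour counter 0..47.
idx = pd.date_range("2014-01-01", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"german_army": np.arange(48)}, index=idx)

# One row per calendar day, labelled by the day, holding that day's final observation.
last_per_day = demo.resample("D").last()
print(last_per_day)
```

Each day is labelled with its midnight timestamp, but the value comes from the 23:00 observation.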
Aggregate into days by taking the first, last, highest, and lowest value of each day's worth of hourly observations
In [37]:
df.resample('D').ohlc()
Out[37]:
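The result of `ohlc()` on a DataFrame has two levels of columns: the original column name on top, and `open`, `high`, `low`, `close` underneath. A minimal sketch with made-up values, sampled every six hours to keep it short:

```python
import pandas as pd

# Hypothetical frame: four observations per day over two days.
idx = pd.date_range("2014-01-01", periods=8, freq=pd.Timedelta(hours=6))
demo = pd.DataFrame({"german_army": [5, 9, 2, 7, 4, 1, 8, 3]}, index=idx)

daily = demo.resample("D").ohlc()
print(daily)

# Selecting the top-level column leaves a plain open/high/low/close frame.
german = daily["german_army"]
print(german.columns.tolist())
```

Here `open` and `close` are the first and last observations of the day, and `high` and `low` are the daily maximum and minimum.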