I have an irregular time series of events (posts) using xts
, and I want to calculate the number of events that occur over a rolling weekly window (or biweekly, or 3 day, etc). The data looks like this:
postid
2010-08-04 22:28:07 867
2010-08-04 23:31:12 891
2010-08-04 23:58:05 901
2010-08-05 08:35:50 991
2010-08-05 13:28:02 1085
2010-08-05 14:14:47 1114
2010-08-05 14:21:46 1117
2010-08-05 15:46:24 1151
2010-08-05 16:25:29 1174
2010-08-05 23:19:29 1268
2010-08-06 12:15:42 1384
2010-08-06 15:22:06 1403
2010-08-07 10:25:49 1550
2010-08-07 18:58:16 1596
2010-08-07 21:15:44 1608
which should produce something like
nposts
2010-08-05 00:00:00 10
2010-08-06 00:00:00 9
2010-08-07 00:00:00 5
for a 2-day window. I have looked into rollapply
, apply.rolling
from PerformanceAnalytics
, etc, and they all assume regular time series data. I tried changing all of the times to just the day the the post occurred and using something like ddply
to group on each day, which gets me close. However, a user might not post every day, so the time series will still be irregular. I could fill in the gaps with 0s, but that might inflate my data a lot and it's already quite large.
What should I do?
See Question&Answers more detail:os