Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Something like

sliding = function(df, n, f)
    ldply(1:(nrow(df) - n + 1), function(k)
        f(df[k:(k + n - 1), ])
    )

That would be used like

> df
  n         a
1 1 0.8021891
2 2 0.9446330
...

> sliding(df, 2, function(df) with(df,
+     data.frame(n = n[1], a = a[1], b = sum(n - a))
+ ))
  n         a        b
1 1 0.8021891 1.253178
...

Except straight inside ddply, so that I could get the nice syntactic sugar that comes with it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
298 views
Welcome To Ask or Share your Answers For Others

1 Answer

Since there hasn't been an answer posted to this question, I thought I'd put one up to make the case that there is actually an even better way to go about this type of problem - one that can also be potentially thousands of times faster. (If this isn't helpful, please let me know, but I thought it would be better than nothing here)

Whenever I hear "moving average" or "sliding window", FFT convolution immediately pops into my mind. This is because it can handle these types of problems in an extremely efficient manner. Since all the "sliding" is done behind the scenes, I think that it also has all the syntactic beauty you could ever ask for.

(The following code is available in one file at https://gist.github.com/1320175)

We start by simulating some data (I'm using integers here for simplicity, but of course you don't need to).

require(plyr)
set.seed(12345)

n = 10
n.sum = 2
a = sample.int(10, n, replace=T)

df = data.frame(n=1:n, a)
> df
    n  a
1   1  8
2   2  9
3   3  8
4   4  9
5   5  5
6   6  2
7   7  4
8   8  6
9   9  8
10 10 10

Now, we will precompute n-a all in one go.

n.minus.a = with(df, n - a)

Next, define a kernel k that, when convolved with our input n.minus.a, will do the summing (or averaging/smoothing/whatever else) to our data.

k = rep(0, n)
k[1:n.sum] = 1

With everything set up, we can define a function to do this convolution efficiently in the frequency domain via fft().

myConv <- function(x, k){
  Fx  = fft(x)
  Fk  = fft(k)
  Fxk = Fx * Fk
  xk  = fft(Fxk, inverse=T)
  (Re(xk) / n)[-(1:(n.sum-1))]
}

The syntax to execute this is nice and simple:

> myConv(n.minus.a, k)
[1] -14 -12 -10  -5   4   7   5   3   1

All this also happens under the hood when you use the convolve() convenience function in R.

> convolve(n.minus.a, k)[1:(length(n.minus.a)-n.sum+1)]
[1] -14 -12 -10  -5   4   7   5   3   1

We now compare this to the manual method to show that the results are all equivalent:

> sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a))))
  n a   b
1 1 8 -14
2 2 9 -12
3 3 8 -10
4 4 9  -5
5 5 5   4
6 6 2   7
7 7 4   5
8 8 6   3
9 9 8   1

Finally, we will make n=10^4 and test all these methods for speed:

> system.time(myConv(n.minus.a, k))
   user  system elapsed 
  0.002   0.000   0.002 
> system.time(convolve(n.minus.a, k, type='circ')[1:(length(n.minus.a)-n.sum+1)])
   user  system elapsed 
  0.002   0.000   0.002 
> system.time(sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a)))))
   user  system elapsed 
  7.944   0.018   7.962 

The FFT methods return almost instantaneously, and, even with this rough timing, are almost 4000 times faster than the manual method.

Of course not every sort of sliding problem can be pigeon-holed into this paradigm, but for numerical problems like this one using sum() (and also means, weighted averages, etc.) it works perfectly. At any rate, it is usually well worth it to at least google a bit to see if there is a filter kernel available that will do the trick for a given probelm. Good luck!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...