I'm trying to use the na.approx() function from the zoo library (in conjunction with xts) to interpolate missing values in repeated-measures data for multiple individuals with multiple measurements.
Sample data...
event.date <- c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28", "2013-03-07",
"2014-02-13", "2010-06-11", "2010-09-10", "2011-05-13", "2012-03-28",
"2013-03-07", "2014-02-13")
variable <- c("neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd",
"wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd")
value <- c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420, 1.0520, 1.0665, 1.0760,
1.0870, NA, 1.0550)
## Bind into a data frame
df <- data.frame(event.date, variable, value)
rm(event.date, variable, value)
## Convert date
df$event.date <- as.Date(df$event.date)
## Load libraries
library(magrittr)
library(xts)
library(zoo)
I can interpolate one missing data point for a single outcome for a given person using xts() and na.approx()...
## Subset one variable
wbody <- subset(df, variable == "wbody.bmd")
## order/index and then interpolate
xts(wbody$value, wbody$event.date) %>%
na.approx()
2010-06-11 1.052000
2010-09-10 1.066500
2011-05-13 1.076000
2012-03-28 1.087000
2013-03-07 1.070977
2014-02-13 1.055000
Having a matrix-like xts object returned is not ideal, but I can work around that. The main problem is that I have multiple outcomes for multiple people. I thought, perhaps naively, that since this is a split-apply-combine problem I could use dplyr to achieve it in the following manner...
## Load library
library(dplyr)
## group and then arrange the data (to ensure dates are correct)
df %>%
group_by(variable) %>%
arrange(variable, event.date) %>%
xts(.$value, .$event.date) %>%
na.approx()
Error in xts(., .$value, .$event.date) : order.by requires an appropriate time-based object
It seems that dplyr doesn't play well with xts/zoo. I've spent a couple of hours searching for tutorials or examples on how to interpolate missing data points in R, but all I've found are single-case examples. So far I've been unable to find anything on how to do this for multiple sites for multiple people. (I realise I could reduce it to a multiple-people problem by reshaping my data to wide, but that still wouldn't solve the problem I'm encountering.)
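For reference, here is a minimal sketch of the kind of grouped interpolation I am after. It assumes that na.approx() can be called on a plain numeric vector inside dplyr's mutate(), with the dates supplied through its x argument and na.rm = FALSE so that each group keeps its original length. The names df.interp and value.interp are made up for illustration, and this is only one possible approach, not necessarily the recommended one.

```r
library(dplyr)
library(zoo)

## Rebuild the sample data so the sketch is self-contained
df <- data.frame(
  event.date = as.Date(c("2010-05-25", "2010-09-10", "2011-05-13",
                         "2012-03-28", "2013-03-07", "2014-02-13",
                         "2010-06-11", "2010-09-10", "2011-05-13",
                         "2012-03-28", "2013-03-07", "2014-02-13")),
  variable   = rep(c("neck.bmd", "wbody.bmd"), each = 6),
  value      = c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420,
                 1.0520, 1.0665, 1.0760, 1.0870, NA, 1.0550)
)

## Interpolate within each variable, using the dates as the x positions.
## na.rm = FALSE keeps the result the same length as the group
## (assumption: any leading/trailing NA in a group would stay NA).
df.interp <- df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  mutate(value.interp = na.approx(value, x = event.date, na.rm = FALSE)) %>%
  ungroup()

print(as.data.frame(df.interp))
```

With real data the grouping would presumably also need to include a person identifier, e.g. group_by(id, variable), where id is a hypothetical column not present in the sample above.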
Any thoughts/advice/insights on how to proceed would be greatly appreciated.
Thanks
EDIT: Clarified that some functions come from the zoo package.