I've built an lm model without using the data= parameter:
m1 <- lm( mdldvlp.trim$y ~ gc.pc$scores[,1] + gc.pc$scores[,2] + gc.pc$scores[,3] +
gc.pc$scores[,4] + gc.pc$scores[,5] + gc.pc$scores[,6] + predict(gc.tA))
Now I'd like to predict from m1 using newdata, so I've tried naming the columns of my new data.frame to match the variables used in the lm() call above.
With newComps as my new gc.pc scores (which, like the gc.tA predictions, were computed from the new data.frame without any issues), I've built newD and tried each of the following ways of naming its columns:
newD <- data.frame( newComps[1:100,1:6] ,
predict(gc.tA , newdata = mdldvlp[1:100,predKept]))
names(newD) <- names(m1$coefficients)[-1]
names(newD) <- names(m1$model)[-1]
names(newD) <- c( "gc.pc$scores[, 1]" , "gc.pc$scores[, 2]" , "gc.pc$scores[, 3]" ,
"gc.pc$scores[, 4]" , "gc.pc$scores[, 5]" , "gc.pc$scores[, 6]" ,
"predict(gc.tA)" )
names(newD) <- c( "gc.pc$scores[,1]" , "gc.pc$scores[,2]" , "gc.pc$scores[,3]" ,
"gc.pc$scores[,4]" , "gc.pc$scores[,5]" , "gc.pc$scores[,6]" ,
"predict(gc.tA)" )
Unfortunately, predict.lm does not accept any of the naming strategies above. It returns the dreaded newdata warning, along with the predictions for the original data.frame that built m1:
Warning message:
'newdata' had 100 rows but variable(s) found have 1414 rows
How should I name the newD columns to make the predict call work? Thanks.
The code below recreates the issue:
require(rpart)
set.seed(123)
X <- matrix(runif(200) , 20 , 10)
gc.pc <- princomp(X)
y <- runif(20)
mdldvlp.trim <- data.frame(y,X)
names(mdldvlp.trim) <- c("y",paste("x",1:10,sep=""))
predKept <- paste("x",1:10,sep="")
gc.tA <- rpart( y ~ . , data = mdldvlp.trim)
m1 <- lm( mdldvlp.trim$y ~ gc.pc$scores[,1] + gc.pc$scores[,2] + gc.pc$scores[,3] +
gc.pc$scores[,4] + gc.pc$scores[,5] + gc.pc$scores[,6] + predict(gc.tA))
mdldvlp <- data.frame(matrix(runif(2000) , 200 , 10))
names(mdldvlp) <- predKept
newComps <- predict( gc.pc , newdata=mdldvlp )
newD <- data.frame( newComps[1:100,1:6] ,
predict(gc.tA , newdata = mdldvlp[1:100,predKept]))
# enter newD naming strategy here
predict( m1 , newdata=newD )
4/20 Follow up:
Thanks all for your answers. I understand that things would be easier if I first created a data.frame with properly named predictors. My question is: if the model frame does indeed evaluate to a data frame with variables named gc.pc$scores[,1] etc., then why won't the naming 'strategies' used above work with predict.lm? In other words, does lm really build its model frame with gc.pc$scores[,1] and so on? If it did, wouldn't the renaming strategies above work in predict.lm?
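For comparison, here is a minimal sketch of the data.frame approach mentioned above, built on the same reproducible example. The names trainD, m2, PC1 to PC6, tA and p are made up for illustration and appear nowhere in the original model. This is one way it could be set up, not necessarily what the answers proposed.

```r
library(rpart)

# Same setup as the reproducible example above
set.seed(123)
X <- matrix(runif(200), 20, 10)
gc.pc <- princomp(X)
y <- runif(20)
mdldvlp.trim <- data.frame(y, X)
names(mdldvlp.trim) <- c("y", paste("x", 1:10, sep = ""))
predKept <- paste("x", 1:10, sep = "")
gc.tA <- rpart(y ~ ., data = mdldvlp.trim)

# Training frame with plain column names (PC1..PC6 and tA are hypothetical)
trainD <- data.frame(y = mdldvlp.trim$y,
                     gc.pc$scores[, 1:6],
                     tA = predict(gc.tA))
names(trainD)[2:7] <- paste("PC", 1:6, sep = "")

# Fit with data= so the formula refers to column names only
m2 <- lm(y ~ ., data = trainD)

# New data, scored with the same princomp and rpart objects
mdldvlp <- data.frame(matrix(runif(2000), 200, 10))
names(mdldvlp) <- predKept
newComps <- predict(gc.pc, newdata = mdldvlp)

# New frame whose column names match the predictors in trainD
newD <- data.frame(newComps[1:100, 1:6],
                   tA = predict(gc.tA, newdata = mdldvlp[1:100, predKept]))
names(newD)[1:6] <- paste("PC", 1:6, sep = "")

# One prediction per row of newD
p <- predict(m2, newdata = newD)
```

Because the formula in m2 only refers to columns of trainD, predict() can look the same names up in newD instead of falling back to objects in the workspace.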