A similar question was asked however the link in the answer points to random forest example, it doesn't seem to work in my case.
Here is an example what I'm trying to do:
gbmGrid <- expand.grid(interaction.depth = c(5, 9),
n.trees = (1:3)*200,
shrinkage = c(0.05, 0.1))
fitControl <- trainControl(
method = "cv",
number = 3,
classProbs = TRUE)
gbmFit <- train(strong~.-Id-PlayerName, data = train[1:10000,],
method = "gbm",
trControl = fitControl,
verbose = TRUE,
tuneGrid = gbmGrid)
gbmFit
Everything goes fine, I get the best parameters. Now if I do the prediction:
predictStrong = predict(gbmFit, newdata=train[11000:50000,])
I get a binary vector of predictions, which is good:
[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...
However when I try to get probabilities, I get an error:
predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")
Error in `[.data.frame`(out, , obsLevels, drop = FALSE) :
undefined columns selected
Where seems to be the problem?
Additional info:
traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")
Versions:
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
caret version: 6.0-29
EDIT:
I've seen this topic as well and I don't get an error about variable names, although I have couple of variable names with underscores, which I assume it's valid, as I use make.names
and get the same names as the original.
colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
See Question&Answers more detail:os