This issue is related to Pipe '.' dot causes trouble in glm call.
purrr:map is wonderful for subgroup analysis and/or model comparison. However, when using glm
, the call is messed up and causing issues, e.g. when computing pseudo-R2s. The reason is that update
doesn't work with the ugly call
, and thus pscl::pR2
cannot compute the log-likelihood of the base model.
pacman::p_load(tidyverse)
#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default
#nest data, students and non-students
Default_nested = Default %>% group_by(student) %>% nest
#fit glms
formul= default ~income+balance
glms = Default_nested %>%
mutate(model=map(data,glm,formula=formul,family='binomial'))
#pscl::pR2 throwing error
pacman::p_load(pscl)
glms %>% mutate(pr2=map(model,pR2))
Now we can take a look at the first submodel. The call looks strange (formula=..1) even though formula contains the right formula.
> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
default balance income
<fct> <dbl> <dbl>
1 No 730. 44362.
What is the cleanest way to be able to use pscl::pR2 when you have many (more than 2 in this example) glm objects in your tibble?
Edit:
Overview of solution strategies:
(A) "fix" the glm object, so that update
can be applied to it:
glms %>% mutate(model = map(model,function(x){x$call = call2("glm",formula=x$formula,data=quote(Default),family='binomial');x})) %>%
mutate(pr2=map(model,pR2)) %>% unnest(pr2)
This 'runs', however, the computed R2 is off. So this solution strategy is probably a dead-end.
(B) Write a wrapper for `glm, as proposed by Artem. This should work fine. Downside: the calls look ugly.
I expanded on Artem's proposed solution to create glm3
.
glm3 <- function(formula,data,family) {
eval(rlang::expr( glm(!!rlang::enexpr(data),
formula=!!formula,
family=!!family ) ))}
glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) )
glms3 %>% unnest(pr2)
(C) In this particular case (pseudo R2s), simply write a better pseudo-r2 function. Since it's probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm
function.
psr2glm=function(glmobj){
L.base=
logLik(
glm(formula = reformulate('1',gsub( " .*$", "", deparse(glmobj$formula) )),
data=glmobj$data,
family = glmobj$family))
n=length(glmobj$residuals)
L.full=logLik(glmobj)
D.full <- -2 * L.full
D.base <- -2 * L.base
G2 <- -2 * (L.base - L.full)
return(data.frame(McFadden = 1-L.full/L.base,
CoxSnell = 1 - exp(-G2/n),
Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))))
}
It works:
glms = Default_nested %>%
mutate(model=map(data,glm,formula=formul,family='binomial'))
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)
I consider proposing changes to DescTools:::PseudoR2, however, I first need to check if the solution is general.
The key to this idea is to skip update
and instead directly call glm
. All required information pieces are within the glm object, even within purrr::map.
Nice side effect of using psr2glm: unnest's output looks nice.
(D) Change either glm
or update
. Given that the glm object actually contains all necessary information, one could consider the observed behavior a bug. So it should be fixed in base R.