I’ll not repeat all the code from that previous post.1 I have a data.frame that has a character column models with the formulas I want. I want the fitted results of that in another column of the same data.frame.
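The idea looks something like this sketch, with mtcars and three made-up formulas standing in for the real data and model set from the previous post:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the real model set; the actual data and
# formulas (vor, pc1, pc2, ...) are in the previous post's code.
results <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  # map() fits each formula and returns a list, so fitted becomes a
  # list column holding one lm object per row
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)))
```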
That seems to have worked! Let's see …
Yes! All the models are in there. Getting them out of the data.frame is a bit awkward. First I had to extract the relevant column of the data.frame, and then pull out a piece of the list. Looking at results is also painful, but turn it into a tbl and …
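The extraction dance is roughly this (a standalone sketch; mtcars and the formulas are stand-ins for the post's real data):

```r
library(dplyr)
library(purrr)

results <- data.frame(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp"),
                      stringsAsFactors = FALSE) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)))

# Two steps: pull out the list column, then index into the list.
m1 <- results$fitted[[1]]
coef(m1)

# Printing the raw data.frame spews whole lm objects; a tibble prints
# compactly instead (as_tibble() is the modern successor to tbl_df()).
as_tibble(results)
```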
That’s quite beautiful! It is interesting that the fitted column is really a list, but tbl_df() shows it as (chr).
OK, now I want to use broom::glance() to get the AIC, etc.
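With the list column in place, glance() can be mapped over the fitted models inside mutate(). A sketch, again with mtcars standing in for the real data:

```r
library(dplyr)
library(purrr)
library(broom)

results <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         # glance() returns a one-row data.frame per model, so
         # summaries becomes a list column of data.frames
         summaries = map(fitted, glance))

results$summaries[[1]]  # one row: r.squared, logLik, AIC, BIC, ...
```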
Hmmm. OK, so now summaries is a column of data.frames. mutate() might not be quite the right way to do this. I was hoping for several independent columns. In hindsight what I get is obvious, but a bit awkward to work with. Let's see …
So that’s embarrassing. But now tidyr to the rescue with unnest().
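unnest() turns the column of one-row data.frames into ordinary columns, one row per model. A self-contained sketch of that step (mtcars formulas are stand-ins):

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

results <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         summaries = map(fitted, glance))

# Drop the lm objects and expand the fit statistics into real columns.
tab <- results %>%
  select(models, summaries) %>%
  unnest(summaries) %>%
  arrange(AIC)
```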
So that’s pretty good. The top 4 models are all virtually identical, then there’s a huge leap to the rest of the set. Let’s see what else broom has for us.
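broom::tidy() is the per-coefficient counterpart to glance(), and it maps into a list column the same way. A sketch under the same mtcars stand-in:

```r
library(dplyr)
library(purrr)
library(broom)

results <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         # tidy() gives one row per coefficient in each model
         estimates = map(fitted, tidy))

results$estimates[[2]]  # term, estimate, std.error, statistic, p.value
```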
That looks pretty good … calculating model averaged parameters is the next step. What I need is a data.frame with one row per coefficient per model, and the weight for that model.
OK, that doesn’t work to do it all at once because the number of elements in each of the nested columns isn’t the same. I think I have to do it in stages. First I’ll unnest the summaries and calculate the model weights.
This is basically the output of aictab(). But, there’s more. I’ve got all the estimates and their standard errors in there. If I unnest that, group_by(term) and then summarize …
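That step might be sketched like this (self-contained, with mtcars standing in for the real model set):

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

results2 <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         estimates = map(fitted, tidy),
         summaries = map(fitted, glance)) %>%
  select(models, estimates, summaries) %>%
  unnest(summaries) %>%
  select(models, estimates, AIC) %>%
  mutate(delta = AIC - min(AIC),
         w = exp(-delta / 2) / sum(exp(-delta / 2)))

# One row per coefficient per model, then a weighted average per term.
avgcoefs <- results2 %>%
  unnest(estimates) %>%
  group_by(term) %>%
  summarize(avgCoef = sum(w * estimate),
            totalw  = sum(w))
```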
OK, there’s a problem. Not all of the terms appear in every model. This is apparent because the total weight associated with each term is less than 1 for everything besides the intercept.2 So now I have two choices. Normalize the averaged coefficients by the total weight for that coefficient, or assume that those coefficients are zero in the models where they’re missing. I prefer the second option because it honestly reflects the knowledge of the parameter in the set. I believe there’s a function for that.
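tidyr's complete() does exactly this: it expands the data to every model × term combination, and the fill argument sets the new estimates to zero. A sketch (mtcars as stand-in; here "hp" is missing from one model the way pc1 was missing from some of the real set):

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

results2 <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         estimates = map(fitted, tidy),
         summaries = map(fitted, glance)) %>%
  select(models, estimates, summaries) %>%
  unnest(summaries) %>%
  select(models, estimates, AIC) %>%
  mutate(delta = AIC - min(AIC),
         w = exp(-delta / 2) / sum(exp(-delta / 2)))

filled <- results2 %>%
  unnest(estimates) %>%
  complete(models, term, fill = list(estimate = 0, std.error = 0))
# Every model now has a row for every term, with estimate = 0 where the
# term was absent -- but columns not named in fill (w, AIC, ...) are NA
# in those new rows, which causes trouble below.
```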
Huh. That’s exactly the same result as before. That makes sense: for the weighted average, each term that is now 0 just contributes 0 to the sum, so nothing changes there. I think it does matter for the averaged standard error, though. Now I’ve another little problem to figure out. The model averaged variance of a parameter includes a term with the difference between the model averaged coefficient and the coefficient conditional on the specific model. So I need to take the model averaged coefficient from above and stick it back into the data.frame with one row per term per model. I can use left_join() for that.
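The join step might look like this sketch (self-contained, mtcars as stand-in). The variance form assumed here is the common unconditional one, avgVar = Σ wᵢ (seᵢ² + (βᵢ − avgCoef)²), which is why each row needs the term's averaged coefficient sitting next to the model-specific one:

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

results2 <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         estimates = map(fitted, tidy),
         summaries = map(fitted, glance)) %>%
  select(models, estimates, summaries) %>%
  unnest(summaries) %>%
  select(models, estimates, AIC) %>%
  mutate(delta = AIC - min(AIC),
         w = exp(-delta / 2) / sum(exp(-delta / 2)))

avgcoefs <- results2 %>%
  unnest(estimates) %>%
  group_by(term) %>%
  summarize(avgCoef = sum(w * estimate), totalw = sum(w))

# Stick the averaged coefficient back in: one row per term per model,
# each carrying its term's avgCoef.
allcoefs <- results2 %>%
  unnest(estimates) %>%
  left_join(avgcoefs, by = "term")
</imports>
```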
OK then. That sucks. Extensive mucking around in the middle of the chain above reveals the problem: when I do complete(), the value of any unspecified column, like w for example, ends up missing. So when I do the final sum to get the model averaged variance, the result is missing too. I can’t just set w = 0 in complete(), because I actually need to include the non-zero between-model variance component. I think I need to do another join in the middle of the pipe to pull in the model weights from results2. What I want is for that operation to replace the column w in the data.frame.
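Putting it all together, the final pipe might look like this sketch (self-contained, with mtcars standing in for the real data; note the second join re-supplies the weights that complete() left as NA):

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

results2 <- tibble(models = c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ hp")) %>%
  mutate(fitted = map(models, ~ lm(as.formula(.x), data = mtcars)),
         estimates = map(fitted, tidy),
         summaries = map(fitted, glance)) %>%
  select(models, estimates, summaries) %>%
  unnest(summaries) %>%
  select(models, estimates, AIC) %>%
  mutate(delta = AIC - min(AIC),
         w = exp(-delta / 2) / sum(exp(-delta / 2)))

avgcoefs <- results2 %>%
  unnest(estimates) %>%
  group_by(term) %>%
  summarize(avgCoef = sum(w * estimate), totalw = sum(w))

modavg <- results2 %>%
  unnest(estimates) %>%
  complete(models, term, fill = list(estimate = 0, std.error = 0)) %>%
  # both sides have a column w, so this join makes w.x (the NA-filled
  # leftover) and w.y (the real per-model weight we want)
  left_join(select(results2, models, w), by = "models") %>%
  left_join(avgcoefs, by = "term") %>%
  group_by(term) %>%
  summarize(avgCoef = first(avgCoef),
            totalw  = first(totalw),
            avgVar  = sum(w.y * (std.error^2 + (estimate - avgCoef)^2)),
            avgSE   = sqrt(avgVar))
```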
term          avgCoef   totalw   avgVar    avgSE
(Intercept)    0.6881   1.0000   0.0461   0.2146
pc1            0.0009   0.7309   0.0000   0.0011
pc1:pc2        0.0000   0.1820   0.0000   0.0000
pc2           -0.0088   1.0000   0.0000   0.0022
vor            0.2625   1.0000   0.0013   0.0364
vor:pc1       -0.0002   0.5578   0.0000   0.0002
vor:pc2        0.0014   0.9984   0.0000   0.0004
Excellent! I had to use w.y as the name of the weight because the 2nd left_join() creates two columns called w.x and w.y, since the name is the same in both input data.frames. That’s OK, I think.
I was wondering if this is really better than my old style using a list of formulas, a list of fitted models, and package AICcmodavg. Part of the reason the above looks so awful is that I tried a lot of things that didn’t work, and I left the code in there for the sake of honesty! There’s another reason too: modavg() won’t do the type of averaging that I prefer. It also only does a single parameter at a time, and in the case of this model set it would flat out refuse to do what I’ve done here because of the interaction terms. Using the approach I’ve just tried here, all the terms get done, and I don’t have to think about keeping things lined up properly.
I don’t feel I’ve fully grasped how to effectively use a list column in a data.frame yet, but this was a huge step in the right direction!
All the code for this post, including that not shown, can be found here. ↩
This is sometimes called the variable importance weight, and is used as an indicator of how important that particular variable is. ↩