-
It's best to think of both of these as "case-level". The difference is whether you dichotomize lesions into a binary variable (present or not) or leave them as a count variable (number of lesions). I'll discuss model form in a minute.

Let's first discuss your predictors. You have described 3 predictor variables: case, type, and location. You need to decide whether to model each of these as a fixed-effects predictor or as a random-effects predictor. To decide, ask yourself: are you interested in the specific values of the variables themselves (e.g., these cases, these locations), or do you want to treat these values as samples from a broader population and generalize to that population (e.g., to the population of potential cases, or to the population of potential locations)? Another way to think about this: do you want to take extreme values for one of the cases/locations/types at face value, or do you want to regularize them a bit and pull them toward the overall mean (this is often reasonable)? If you want to generalize to a broader population or regularize values, model the variable as a random grouping factor. Otherwise, model it as a fixed factor.

For example, you could model case as a random factor but type and location as fixed factors, or you could model all 3 as random factors. Both of these formulations treat the three types of grouping factors as distinct (but correlated): cases may have predispositions toward more lesions generally, but not predispositions to specific types or locations of lesions. If you want to consider predispositions toward specific types/locations across cases, you can add an interaction between case and type/location to your grouping structure. In that formulation, I would leave in the direct effects of location and type, so that these factors can have main effects in addition to their individual case-level effects.
You could consider dropping those.

Now, turning to your question about model family. I would argue the most appropriate form is your (2): modeling the number of lesions, which may include zero. For this approach, you likely want to choose a family that (1) reflects a count variable, (2) is flexible about the mean and variance for the lesion counts, and (3) also flexibly models the absence of any lesions. For this, I would recommend a zero-inflated negative binomial model, as it has all of these features. glmmTMB can fit this family of models, with random effects for the main count portion of the model and, via `ziformula`, for the zero-inflation portion as well. If you prefer a Bayesian approach, brms can also model the zero-inflation with random effects.

Note that you can use this model to answer your first question (what predicts the presence of any lesions versus none), and it does so better than a binomial model because the binomial model treats any number of lesions greater than zero as the same.
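As a rough sketch of what such specifications might look like in glmmTMB: the column names (`lesions`, `case`, `type`, `location`) and the simulated data below are assumptions for illustration, not the original poster's data, and the formulas are only one plausible reading of the grouping structures described above. The zero-inflation part is kept intercept-only here for simplicity.

```r
library(glmmTMB)

# Hypothetical column names: lesions (count), case, type, location.

# Case as a random grouping factor; type and location as fixed factors
f_case_random <- lesions ~ type + location + (1 | case)

# All three as random grouping factors
f_all_random <- lesions ~ 1 + (1 | case) + (1 | type) + (1 | location)

# Case-specific predispositions toward particular types/locations,
# keeping the direct (main) effects of type and location
f_interaction <- lesions ~ type + location +
  (1 | case) + (1 | case:type) + (1 | case:location)

# Simulated long-format data: one row per case x type x location
set.seed(42)
d <- expand.grid(
  case     = factor(1:40),
  type     = factor(paste0("type", 1:3)),
  location = factor(paste0("region", 1:3))
)
case_effect <- rnorm(40, sd = 0.5)                 # case-level predisposition
mu <- exp(0.5 + case_effect[as.integer(d$case)])   # expected count per row
structural_zero <- rbinom(nrow(d), 1, 0.3)         # source of excess zeros
d$lesions <- ifelse(structural_zero == 1, 0,
                    rnbinom(nrow(d), mu = mu, size = 2))

# Zero-inflated negative binomial fit with an intercept-only zero-inflation part
fit <- glmmTMB(
  f_case_random,
  ziformula = ~ 1,
  family    = nbinom2,
  data      = d
)
summary(fit)
```

Swapping `f_case_random` for one of the other formulas changes only the grouping structure; the family and zero-inflation arguments stay the same.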
-
Thank you so much! So would the suggested model address the differences shown in the following tables? The first table is the proportion of cases with a particular lesion type/location, while the second is the proportion of lesions of a certain type/location. I think this is why I was thinking of splitting the analyses into case-level vs lesion-level, but I may just be confusing myself.
-
The issue is that the first table treats a case with 1 lesion and a case with 10 lesions identically. The model I suggest can make that comparison, but it doesn't assume they are identical the way a binomial model would.
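A tiny illustration of that information loss, using made-up counts for three hypothetical cases:

```r
# Hypothetical lesion counts for three cases
lesions <- c(caseA = 0, caseB = 1, caseC = 10)

# Dichotomizing to "any lesion present" (the binomial-model outcome)
present <- lesions > 0

# Cases B and C are indistinguishable once dichotomized,
# even though their counts differ by a factor of 10
present
lesions
```

A count model keeps the 1 versus 10 distinction while still being able to describe zero versus non-zero.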
-
That makes sense! Thank you! One additional question: What are the pros and cons of including the random effect in the zero-inflation? Is there a rule of thumb for when you should?
-
The same arguments apply as I laid out for the mean function.
-
Just to clarify: negative binomial models can themselves be used to model zero counts; zero-inflated models model excess zeros, that is, when you have more zeros than are expected from the NB distribution alone (:
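To make the "excess zeros" idea concrete, here is a small sketch with arbitrary, assumed parameter values (`mu`, `size`, and `pi_zero` are illustrative, not estimates from any data):

```r
# Assumed negative binomial parameters (mean and dispersion)
mu   <- 3
size <- 1

# Proportion of zeros the NB distribution alone already implies
p0_nb <- dnbinom(0, mu = mu, size = size)

# Zero-inflated NB: a fraction pi_zero of observations are structural zeros,
# and the remainder follow the NB distribution (which can also yield zeros)
pi_zero <- 0.2
p0_zinb <- pi_zero + (1 - pi_zero) * p0_nb

c(nb = p0_nb, zinb = p0_zinb)
```

If the observed share of zeros in the data is close to `p0_nb`, a plain negative binomial may be enough; a clearly larger share is what the zero-inflation component is meant to absorb.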
-
I have converted this issue into a discussion; that seemed more appropriate to me.
-
Thanks! I was just thinking a bit about an extension to this analysis: I'm interested in whether the number of lesions of particular types and locations predicts age at death and disease duration. I was thinking about adding age, disease duration, and interactions of these two variables with lesion type and location to the model we discussed above to answer this question. However, it seemed this would make the model much more complicated, and I was hitting some convergence issues. If I just use age and disease duration as predictors, but not location and lesion type, I do get a significant relationship with age, but I am curious whether this is driven by particular locations or lesion types. In addition, would the mixed model approach answer the reverse question, whether age and disease duration predict lesion number at particular locations and of particular types, rather than the question of interest? I therefore thought that I could run a separate analysis with age and disease duration as outcome variables instead, but I wasn't sure how to deal with predictors that would be clustered.
-
Dear All,
I have collected data on the presence and number of lesions of 1) different types and 2) within different locations in a patient cohort. I looked at the following summary (http://htmlpreview.github.io/?https://github.com/strengejacke/mixed-models-snippets/blob/master/overview_modelling_packages.html) to come up with an approach and wanted to get some feedback. Ultimately, I want to perform the following:
1. Case-level approach to look for differences in the proportion of cases with lesions at the different locations (region 1, 2, and 3) and of the different lesion types (type 1, 2, and 3). To accomplish this, I thought I would fit a glmer (family=binomial) model, since the data are binary (lesion vs not) at the nested levels of location and lesion type for each case. Would you agree with this method?
2. Lesion-level approach using the data I have on the number of lesions (rather than just presence) to look for differences in the proportion of lesions that are found at particular locations (region 1, 2, and 3) and are of particular types (type 1, 2, and 3). To accomplish this, I thought I would fit a glmmTMB(ziformula, family=beta_family/betabinomial) model on the proportion of total lesions identified for each case that fall into the different categories (i.e., locations and types), since the data are proportions, include 0 and 1, and are nested within each case. Would this be the best method?
As always, thank you for your insights and help! Do not hesitate to let me know if I need to clarify my aims or the type of data any further.