CV-TMLE vs TMLE #91

Open
olivierlabayle opened this issue Sep 6, 2023 · 0 comments

Hello,

I am following the tutorial and trying to look at the difference between CV-TMLE and TMLE with the perinatal dataset.

perinatal.csv

To keep things simple, I use only a glm as the model for both the propensity score and the outcome mean. I am surprised to see that the output is exactly the same for both procedures. The CV-TMLE run complains about the glm not being "CV-aware", which might be the reason, but I don't understand why that should be the case. My understanding of CV-TMLE is that:

  • The dataset is split into V folds.
  • The glm models (for both A and Y) are fitted on each training split, so there are V instances of each glm, each trained on a different split.
  • The targeting step is pooled over the predictions of the V glm model pairs on their respective validation sets.
  • The final estimate is the average of the estimates across validation folds.
  • The influence curve is evaluated on the validation samples (I am not entirely sure whether it is pooled across validation samples or whether multiple variance estimates are made and averaged).
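
To make the steps above concrete, here is a minimal base-R sketch of the procedure as I understand it. It is only an illustration under my own assumptions: it uses simulated data rather than the perinatal dataset, every name in it (`fold`, `g1`, `QA`, `eps`, ...) is made up for the example and does not correspond to tmle3 internals, and it pools the influence curve across validation samples, which is just one of the two options I am unsure about.

```r
# Sketch of CV-TMLE for the ATE with plain glms, on SIMULATED data.
# All object names are illustrative assumptions, not tmle3 internals.
set.seed(1)
n <- 500
V <- 5
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + W))
dat <- data.frame(W = W, A = A, Y = Y)

# 1. Split the dataset into V folds (equal sizes here: n / V observations each)
fold <- sample(rep(seq_len(V), length.out = n))

# 2. Fit both glms on each training split and predict on the held-out fold
g1 <- QA <- Q1 <- Q0 <- numeric(n)
for (v in seq_len(V)) {
  idx   <- fold == v
  train <- dat[!idx, ]
  valid <- dat[idx, ]
  g_fit <- glm(A ~ W, family = binomial(), data = train)      # propensity score
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)  # outcome mean
  g1[idx] <- predict(g_fit, newdata = valid, type = "response")
  QA[idx] <- predict(Q_fit, newdata = valid, type = "response")
  Q1[idx] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  Q0[idx] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}

# 3. A single targeting step, pooled over all out-of-fold predictions
H   <- dat$A / g1 - (1 - dat$A) / (1 - g1)  # clever covariate
tgt <- data.frame(Y = dat$Y, H = H, off = qlogis(QA))
eps <- unname(coef(glm(Y ~ -1 + H + offset(off), family = binomial(), data = tgt)))
QA_star <- plogis(qlogis(QA) + eps * H)
Q1_star <- plogis(qlogis(Q1) + eps / g1)
Q0_star <- plogis(qlogis(Q0) - eps / (1 - g1))

# 4. Estimate: average of the fold-specific plug-in estimates
ate_by_fold <- mean(tapply(Q1_star - Q0_star, fold, mean))
ate         <- mean(Q1_star - Q0_star)  # identical when folds have equal size

# 5. Influence curve pooled across validation samples (one possible choice)
ic <- H * (dat$Y - QA_star) + (Q1_star - Q0_star) - ate
se <- sqrt(var(ic) / n)
```

With glms fitted this way, the out-of-fold predictions differ from the full-sample fit, so I would expect the result to differ, at least slightly, from a TMLE that reuses the same data for the initial fit and the targeting step.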

As I understand it, we could have used a Super Learner instead of a glm, which would have resulted in another, nested cross-validation procedure, but Super Learning is not a requirement of CV-TMLE. The code to reproduce is below: you can tweak learner_list to switch to a super learner, in which case the two procedures return different outputs and no "CV-aware" complaint is raised.

I would appreciate some clarification on the procedure and why this is happening! Thanks!

library(data.table)
library(tmle3)
library(sl3)

data = read.csv("perinatal.csv")

node_list <- list(
  W = c(
    "apgar1", "apgar5", "gagebrth", "mage", "meducyrs", "sexn"
  ),
  A = "parity01",
  Y = "haz01"
)

glm = Lrnr_glm$new()
lrn_mean = Lrnr_mean$new()
sl <- Lrnr_sl$new(learners = Stack$new(glm, lrn_mean), metalearner = Lrnr_nnls$new())

learner_list <- list(A = glm, Y = glm)
# learner_list = list(A=sl, Y = sl)

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

tmle_task <- ate_spec$make_tmle_task(data, node_list)
initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)


targeted_likelihood_cv <- Targeted_Likelihood$new(initial_likelihood)

targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )

tmle_params_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_cv)
tmle_params_no_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_no_cv)

tmle_no_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_no_cv, tmle_params_no_cv,
  targeted_likelihood_no_cv$updater
)
tmle_no_cv
# -0.1855909

tmle_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_cv, tmle_params_cv,
  targeted_likelihood_cv$updater
)
tmle_cv
# -0.1855909