CV-TMLE vs TMLE #91

olivierlabayle · 2023-09-06T11:34:47Z

Hello,

I am following the tutorial and trying to look at the difference between CV-TMLE and TMLE with the perinatal dataset.

To keep things simple, I only use a glm as the model for both the propensity score and the outcome mean. I am surprised to see that the output is exactly the same for both procedures. CV-TMLE warns that the glm is not "CV-aware", which might be the reason, but I don't understand why that should be the case. My understanding of CV-TMLE is that:

1. The dataset should be split into V folds.
2. The glm models (for both A and Y) should be fitted on each split, so we should have V instances of each glm, each trained on a different training set.
3. The targeting step is pooled over the predictions of the V glm model pairs on their respective validation sets.
4. The final estimate is the average of the estimates across validation folds.
5. The influence curve is used for inference (I am not entirely sure whether it is pooled across validation samples or whether multiple variance estimates are made and averaged).
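
To make the steps above concrete, here is a minimal base-R sketch of the procedure as I understand it. It is not tmle3's internal implementation: the simulated data, the variable names (`W`, `A`, `Y`, `fold`) and the choice of a pooled influence-curve variance are all assumptions made for illustration only.

```r
# Sketch of CV-TMLE for the ATE with plain glm fits (simulated data, not perinatal.csv).
set.seed(1)
n <- 500
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + 0.7 * W))
dat <- data.frame(W = W, A = A, Y = Y)

# Step 1: split the data into V folds.
V <- 5
fold <- sample(rep(seq_len(V), length.out = n))

# Step 2: fit both glms on each training set, predict on the held-out fold.
g1 <- Q1 <- Q0 <- numeric(n)
for (v in seq_len(V)) {
  idx <- fold == v
  train <- dat[!idx, ]
  valid <- dat[idx, ]
  g_fit <- glm(A ~ W, family = binomial(), data = train)
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)
  g1[idx] <- predict(g_fit, newdata = valid, type = "response")
  Q1[idx] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  Q0[idx] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}

# Step 3: one pooled fluctuation over all out-of-fold predictions.
QA <- ifelse(dat$A == 1, Q1, Q0)
H <- dat$A / g1 - (1 - dat$A) / (1 - g1)  # clever covariate
eps <- unname(coef(glm(dat$Y ~ -1 + H + offset(qlogis(QA)), family = binomial())))
Q1s <- plogis(qlogis(Q1) + eps / g1)
Q0s <- plogis(qlogis(Q0) - eps / (1 - g1))
QAs <- ifelse(dat$A == 1, Q1s, Q0s)

# Step 4: with equal-sized folds, the pooled mean equals the average of fold estimates.
psi <- mean(Q1s - Q0s)
psi_by_fold <- tapply(Q1s - Q0s, fold, mean)

# Step 5 (assumption): variance from the influence curve pooled over validation samples.
ic <- H * (dat$Y - QAs) + (Q1s - Q0s) - psi
se <- sd(ic) / sqrt(n)
c(estimate = psi, se = se)
```

In this sketch the only difference from a plain TMLE is that `g1`, `Q1` and `Q0` are out-of-fold predictions rather than predictions from a single fit on the full data, which is why I expected the two estimates to differ at least slightly.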

As I understand it, we could have used a Super Learner instead of a GLM, which would have resulted in another nested cross-validation procedure, but Super Learning is not a requirement of CV-TMLE. The code to reproduce is below: you can tweak the learner_list to switch to a super learner, in which case two different outputs are returned and no "CV-aware" warning is raised.

I would appreciate some clarification on the procedure and why this is happening! Thanks!

library(data.table)
library(tmle3)
library(sl3)

data = read.csv("perinatal.csv")

node_list <- list(
  W = c(
    "apgar1", "apgar5", "gagebrth", "mage", "meducyrs", "sexn"
  ),
  A = "parity01",
  Y = "haz01"
)

glm = Lrnr_glm$new()
lrn_mean = Lrnr_mean$new()
sl <- Lrnr_sl$new(learners = Stack$new(glm, lrn_mean), metalearner = Lrnr_nnls$new())

learner_list <- list(A = glm, Y = glm)
# learner_list = list(A=sl, Y = sl)

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

tmle_task <- ate_spec$make_tmle_task(data, node_list)
initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)


targeted_likelihood_cv <- Targeted_Likelihood$new(initial_likelihood)

targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )

tmle_params_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_cv)
tmle_params_no_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_no_cv)

tmle_no_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_no_cv, tmle_params_no_cv,
  targeted_likelihood_no_cv$updater
)
tmle_no_cv
# -0.1855909

tmle_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_cv, tmle_params_cv,
  targeted_likelihood_cv$updater
)
tmle_cv
# -0.1855909
