ATT/ATC incorrectly updating A #66
Comments
Also, as a heads-up: the data structure/node_list built from the cpp data has virtually no signal between W and either A or Y, so all the estimates are pretty much Lrnr_mean. I would recommend making a simulated dataset of sufficient complexity.
I was under the impression that updates were always fit using the observed outcomes and the corresponding likelihood values (so here, the observed A and P(A=a|W)). The update is then applied to both the observed and the counterfactual P(A=1|W). Is that not the case for the ATT?
If you want to update both the observed and counterfactual likelihoods (and not just P(A=1|W)), then you need to use a different submodel. The logistic submodel is really only for updating E[A|W] or E[Y|W], although this of course translates into an update for P(A=0|W) = 1 - P(A=1|W). The score of A is something like HA * (A - P(A=1|W)), which means we can update/target P(A=1|W) with a logistic submodel. But currently we are updating as P(A=a|W)* = plogis(qlogis(P(A=a|W)) + eps * HA), where the offset now depends on the outcome A, which isn't what we want. We wouldn't use type = "likelihood" for the Y node, and for the same reason we don't want to for the A node. If we would like to continue representing the A node as a true likelihood and not a conditional mean then
ATT and ATC are missing the W component of the EIF. Also, I think we can avoid the need for iterative targeting by estimating the parameter with the empirical mean over (A,W). (I think Nima used this trick for the shift intervention as well.)
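For reference, the three components mentioned in these comments can be written out as below. This is the standard textbook decomposition of the ATT efficient influence function, not something read off the package's code, and the notation is introduced here as an assumption: g(W) = P(A=1|W), \bar{Q}(a,W) = E[Y|A=a,W], p = P(A=1), and \psi is the ATT.

```latex
\begin{aligned}
D_Y(O) &= \left[\frac{A}{p} - \frac{(1-A)\,g(W)}{p\,\{1-g(W)\}}\right]\{Y - \bar{Q}(A,W)\},\\
D_A(O) &= \underbrace{\frac{\bar{Q}(1,W) - \bar{Q}(0,W) - \psi}{p}}_{H_A(W)}\,\{A - g(W)\},\\
D_W(O) &= \frac{g(W)}{p}\,\{\bar{Q}(1,W) - \bar{Q}(0,W) - \psi\},\\
D_A(O) + D_W(O) &= \frac{A}{p}\,\{\bar{Q}(1,W) - \bar{Q}(0,W) - \psi\}.
\end{aligned}
```

Under this reading, D_A has the form HA * (A - P(A=1|W)) described above, and D_W is the missing W component. The last line suggests one way to interpret the empirical-mean remark: if \psi is estimated as the average of \bar{Q}(1,W) - \bar{Q}(0,W) over the observations with A = 1, the empirical mean of D_A + D_W is zero by construction.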
The ATT parameter targets both the A node and the Y node. However, in the spec, the likelihood factor for "A" is of type "likelihood" while the likelihood factor for "Y" is of type "mean". As a result, when targeting "A" with the logistic submodel, the observed likelihood P(A|W) is (incorrectly) updated rather than the conditional mean P(A=1|W): epsilon is fit using a logistic submodel with offset logit P(A|W). This will generally give an incorrect update that does not solve the necessary score equation.

The current ATT test is very simple and does not detect the difference between using P(A=1|W) and P(A|W). With a more complex simulation, you can see that the current approach does not necessarily increase the likelihood relative to the initial fit, nor does it solve the A-specific score equation as well as the correct method does. With either of the two approaches, I couldn't get the estimates to match the classic tmle results very well, so this might be worth looking into.
This simulation code is what I used:
library(simcausal)
library(data.table)

# W and W1 are baseline covariates; only W1 drives treatment assignment
D <- DAG.empty()
D <- D +
  node("W", distr = "runif", min = -0.8, max = 0.8) +
  node("W1", distr = "runif", min = -1, max = 1) +
  node("A", distr = "rbinom", size = 1, prob = plogis(W1)) +
  # g1 stores the true propensity score P(A = 1 | W1)
  node("g1", distr = "rconst", const = plogis(W1)) +
  node("Y", distr = "rbinom", size = 1, prob = plogis(-1.5 + 1 + A + W - W1 / 2)) +
  # EY1 stores the true conditional mean of Y under A = 1
  node("EY1", distr = "rconst", const = plogis(-1.5 + 1 + 1 + W - W1 / 2))
setD <- set.DAG(D)
data <- sim(setD, n = 1000)
data <- as.data.table(data)
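The difference between the two offsets can be illustrated with a minimal base-R sketch that does not depend on the package. Everything here is an assumption made for illustration: the data-generating process, the deliberately crude initial fit `g1_init`, and the stand-in clever covariate `H_A` (simply W, not the actual ATT covariate). How P(A=1|W) is recovered from the updated observed likelihood in the second variant is also a guess at the behaviour described in this issue, not the package's code.

```r
set.seed(1)
n <- 2000
W <- runif(n, -1, 1)
A <- rbinom(n, 1, plogis(1 + W))

# Crude initial estimate of P(A = 1 | W) that ignores W (assumption)
g1_init <- rep(mean(A), n)
# Stand-in clever covariate for the A node (hypothetical)
H_A <- W

# Variant 1: offset is logit P(A = 1 | W), i.e. the conditional mean of A
fit_mean <- glm(A ~ -1 + H_A + offset(qlogis(g1_init)), family = binomial())
eps_mean <- coef(fit_mean)[["H_A"]]
g1_star <- plogis(qlogis(g1_init) + eps_mean * H_A)
score_mean <- mean(H_A * (A - g1_star))

# Variant 2: offset is logit P(A = a | W), so it depends on the observed A
gA_init <- ifelse(A == 1, g1_init, 1 - g1_init)
fit_lik <- glm(A ~ -1 + H_A + offset(qlogis(gA_init)), family = binomial())
eps_lik <- coef(fit_lik)[["H_A"]]
gA_star <- plogis(qlogis(gA_init) + eps_lik * H_A)
# Implied P(A = 1 | W) after updating the observed likelihood (assumption)
g1_from_lik <- ifelse(A == 1, gA_star, 1 - gA_star)
score_lik <- mean(H_A * (A - g1_from_lik))

# Empirical mean of the A-specific score under each variant
print(c(mean_submodel = score_mean, likelihood_submodel = score_lik))
```

In the first variant the fitted epsilon is the maximum likelihood estimate of a logistic regression whose score is exactly the empirical mean of H_A * (A - P(A=1|W)), so that mean is driven to zero up to numerical tolerance. In the second variant the regression solves a different equation, so there is no reason for the A-specific score to vanish.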