Graph structure repair based on data #162

Open
cthoyt opened this issue Aug 23, 2023 · 4 comments

@cthoyt
Member

cthoyt commented Aug 23, 2023

This takes prior knowledge of the network in the form of a DAG or ADMG, together with the available data (observational and/or interventional), and repairs the network structure based on that data. The goal is to make sure that the conditional independencies implied by the data agree with the conditional independencies implied by the network. Here are the steps:

  1. Use a conditional independence test to find all of the tests that fail on the data.
  2. For each failed test between two variables, such as $V_i$ and $V_j$, add a bi-directed edge between them.
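
The issue does not fix which test to use in step 1. As one illustration only, assuming roughly linear-Gaussian data, a partial-correlation test with Fisher's z-transform is a common choice. The helper `fisher_z_ci_test` and its `alpha` threshold are hypothetical names for this sketch, not part of y0.

```python
import math

import numpy as np


def fisher_z_ci_test(data: np.ndarray, i: int, j: int, cond: list[int], alpha: float = 0.05) -> bool:
    """Hypothetical helper: True if column i looks independent of column j given columns cond.

    Assumes roughly linear-Gaussian data (partial correlation + Fisher's z-transform).
    """
    n = data.shape[0]
    cols = [i, j, *cond]
    # Partial correlation of i and j given cond, read off the inverse correlation matrix.
    precision = np.linalg.pinv(np.corrcoef(data[:, cols], rowvar=False))
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * math.atanh(r)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value > alpha  # False means the data reject the independence


# Usage: simulate a linear chain X -> M -> Y (columns 0, 1, 2).
rng = np.random.default_rng(0)
n_samples = 5000
x = rng.normal(size=n_samples)
m = x + rng.normal(size=n_samples)
y = m + rng.normal(size=n_samples)
data = np.column_stack([x, m, y])
marginal = fisher_z_ci_test(data, 0, 2, [], alpha=1e-6)  # tests X independent of Y
given_m = fisher_z_ci_test(data, 0, 2, [1], alpha=1e-6)  # tests X independent of Y given M
```

In this sketch a "failed" test is one that returns `False`, meaning the data reject an independence.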

Now we have a repaired network with additional bi-directed edges. If the prior knowledge graph was an ADMG, we now have a new ADMG with additional bi-directed edges. If the prior knowledge graph was a DAG, it is now converted to an ADMG.
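
A minimal sketch of step 2, assuming edges are plain Python tuples and the failed tests have already been reduced to variable pairs. The helper `repair_admg` is hypothetical. Bi-directed edges are kept as unordered `frozenset` pairs, so several failed tests on the same pair, in either order, add only one edge.

```python
from collections.abc import Hashable, Iterable

Edge = tuple[Hashable, Hashable]


def repair_admg(
    directed: Iterable[Edge],
    bidirected: Iterable[Edge],
    failed_pairs: Iterable[Edge],
) -> tuple[set[Edge], set[frozenset]]:
    """Hypothetical helper: add one bi-directed edge per failed (V_i, V_j) pair."""
    # Unordered pairs: (A, B) and (B, A) denote the same bi-directed edge.
    repaired = {frozenset(edge) for edge in bidirected}
    for v_i, v_j in failed_pairs:
        repaired.add(frozenset((v_i, v_j)))
    # Directed edges from the prior knowledge graph are kept unchanged.
    return set(directed), repaired


# Usage: a DAG prior (no bi-directed edges) with two failed tests on the same pair.
dag_edges = [("V1", "V2"), ("V2", "V3")]
new_directed, new_bidirected = repair_admg(dag_edges, [], [("V1", "V3"), ("V3", "V1")])
```

Passing an empty `bidirected` collection corresponds to a DAG prior; once any pair fails, the result carries bi-directed edges and is therefore an ADMG.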

srtaheri changed the title from "Graph simplification based on data" to "Graph structure repair based on data" on Aug 23, 2023
@srtaheri
Collaborator

srtaheri commented Aug 23, 2023

Here is an example. Assume that this is the prior knowledge network in the form of an ADMG:

[Screenshot: the prior knowledge ADMG]
Now assume that the data-based conditional independence test between $Z_1$ and $Z_2$ given $M_1$ failed. In addition, the conditional independence test between $R_2$ and $R_3$ given some set of variables failed, and so did the test between $Y$ and $R_3$ given some other set of variables. Hence we add bi-directed edges between ($Z_1$, $Z_2$), ($R_2$, $R_3$), and ($Y$, $R_3$), as follows:

[Screenshot: the repaired ADMG with the added bi-directed edges]

The above graph is the repaired ADMG.

cthoyt added a commit that referenced this issue Aug 23, 2023
Skeleton for #162
@cthoyt
Member Author

cthoyt commented Aug 24, 2023

It's not clear to me why you would test $Z_1$ and $Z_2$ given $M_1$ and then infer that there should be a bi-directed edge between ($Z_1$, $Z_2$). What is special about $M_1$ in this case? Can you please make this a bit more of an algorithmic description?

Here are all of the conditional independencies for this graph, calculated with `y0.algorithm.conditional_independencies.get_conditional_independencies()`:

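| left | right | conditions |
|------|-------|------------|
| M1 | R1 | M2 |
| M1 | R2 | M2 |
| M1 | R3 | M2 |
| M1 | Y | M2, X |
| M1 | Z1 | X |
| M1 | Z2 | X |
| M1 | Z3 | X |
| M2 | R2 | R1 |
| M2 | R3 | R2 |
| M2 | X | M1 |
| M2 | Z1 | M1 |
| M2 | Z2 | M1 |
| M2 | Z3 | M1 |
| R1 | R3 | R2 |
| R1 | X | M2 |
| R1 | Y | M2, R3 |
| R1 | Z1 | M2 |
| R1 | Z2 | M2 |
| R1 | Z3 | M2 |
| R2 | X | M2 |
| R2 | Y | M2, R3 |
| R2 | Z1 | M2 |
| R2 | Z2 | M2 |
| R2 | Z3 | M2 |
| R3 | X | M2 |
| R3 | Z1 | M2 |
| R3 | Z2 | M2 |
| R3 | Z3 | M2 |
| X | Y | M2, Z3 |
| X | Z2 | Z1 |
| X | Z3 | Z2 |
| Y | Z1 | M2, Z3 |
| Y | Z2 | M2, Z3 |
| Z1 | Z3 | Z2 |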
Code used to generate the table:

```python
from y0.graph import NxMixedGraph
from y0.dsl import Z1, Z2, Z3, Variable, X, Y
from y0.algorithm.conditional_independencies import get_conditional_independencies
import pandas as pd
from tabulate import tabulate

# Define the remaining variables used in this graph
R1, R2, R3 = (Variable(f"R{i + 1}") for i in range(3))
M1, M2 = (Variable(f"M{i + 1}") for i in range(2))


def main():
    # Prior knowledge graph: directed edges plus the bi-directed edge X <-> Z1,
    # which is passed via the `undirected` argument
    graph = NxMixedGraph.from_edges(
        directed=[
            (Z1, X),
            (Z1, Z2),
            (Z2, Z3),
            (Z3, Y),
            (X, M1),
            (M1, M2),
            (M2, R1),
            (M2, Y),
            (R1, R2),
            (R2, R3),
            (R3, Y),
        ],
        undirected=[(X, Z1)],
    )

    # Enumerate the conditional independencies implied by the graph
    cis = get_conditional_independencies(graph)
    # One row per judgement marked as separated: left, right, comma-joined conditions
    df = pd.DataFrame(
        sorted(
            (
                conditional_independency.left,
                conditional_independency.right,
                ", ".join(sorted(v.name for v in conditional_independency.conditions)),
            )
            for conditional_independency in cis
            if conditional_independency.separated
        ),
        columns=["left", "right", "conditions"],
    )
    # Print as a GitHub-flavored Markdown table
    print(tabulate(df, headers=df.columns, tablefmt="github", showindex=False))


if __name__ == "__main__":
    main()
```
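
If step 1 is read as "test each independence that the prior graph implies", the rows of the table above are exactly the tests to run. The thread does not settle this reading, so the sketch below is one interpretation only. `failed_pairs` and `toy_test` are hypothetical; `toy_test` merely stands in for a real data-based test and pretends that a single independence is rejected.

```python
from collections.abc import Callable, Iterable

# (left, right, conditions) rows, copied from part of the table above.
Row = tuple[str, str, tuple[str, ...]]
IMPLIED: list[Row] = [
    ("M1", "R1", ("M2",)),
    ("R2", "Y", ("M2", "R3")),
    ("X", "Z2", ("Z1",)),
    ("Z1", "Z3", ("Z2",)),
]


def failed_pairs(
    implied: Iterable[Row],
    ci_test: Callable[[str, str, tuple[str, ...]], bool],
) -> set[frozenset[str]]:
    """Hypothetical helper: run ci_test on each implied independence, collect rejected pairs."""
    failed: set[frozenset[str]] = set()
    for left, right, conditions in implied:
        # ci_test returns True when the data support left _||_ right | conditions.
        if not ci_test(left, right, conditions):
            failed.add(frozenset((left, right)))
    return failed


def toy_test(left: str, right: str, conditions: tuple[str, ...]) -> bool:
    # Hypothetical stand-in: pretend the data reject only R2 _||_ Y | {M2, R3}.
    return {left, right} != {"R2", "Y"}


result = failed_pairs(IMPLIED, toy_test)
```

Each pair in `result` would then receive a bi-directed edge in step 2.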

@cthoyt
Member Author

cthoyt commented Aug 25, 2023

So the test can be conditioned on no variables or on any combination of variables.
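
Allowing the empty set or any combination of the other variables as the conditioning set amounts to enumerating a power set for each pair. A minimal sketch, with the hypothetical helper `conditioning_sets`:

```python
from collections.abc import Iterator, Sequence
from itertools import combinations


def conditioning_sets(variables: Sequence[str], left: str, right: str) -> Iterator[tuple[str, ...]]:
    """Hypothetical helper: yield the empty set and every combination of the other variables."""
    rest = [v for v in variables if v not in (left, right)]
    for size in range(len(rest) + 1):  # size 0 yields the empty conditioning set
        yield from combinations(rest, size)


sets_for_z1_z2 = list(conditioning_sets(["Z1", "Z2", "M1", "X"], "Z1", "Z2"))
```

The count doubles with every extra variable: for the 10-variable graph above, each of the 45 pairs has $2^8 = 256$ candidate conditioning sets, or 11,520 tests in total.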

@cthoyt
Member Author

cthoyt commented Aug 25, 2023
