Add ability to record "original_object" and "original_subject" to every association #249

Open
DnlRKorn opened this issue Aug 1, 2024 · 1 comment

DnlRKorn commented Aug 1, 2024

MonarchKG records the following properties on each of its associations.

[Image: MonarchKG association properties]

This should be fairly trivial to implement by changing the following lines:
https://github.com/RobokopU24/ORION/blob/8d8b643284e70e23c6bb5e2bb48425c9bc949ee4/Common/loader_interface.py#L28-31
becomes:

    def __init__(self, test_mode: bool = False, audit_mode: bool = False, source_data_dir: str = None):
        """Initialize with the option to run in testing mode."""
        self.test_mode: bool = test_mode
        self.audit_mode: bool = audit_mode

and

https://github.com/RobokopU24/ORION/blob/8d8b643284e70e23c6bb5e2bb48425c9bc949ee4/Common/kgx_file_writer.py#L138-144
becomes:

    def write_kgx_edge(self, edge: kgxedge):
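        # Copy so the audit fields are not added to the caller's edge object.
        edge_properties = dict(edge.properties) if edge.properties else {}
        if self.audit_mode:  # assumes the writer is also given an audit_mode attribute
            edge_properties["original_object"] = edge.objectid
            edge_properties["original_subject"] = edge.subjectid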
        self.write_edge(subject_id=edge.subjectid,
                        object_id=edge.objectid,
                        predicate=edge.predicate,
                        primary_knowledge_source=edge.primary_knowledge_source,
                        aggregator_knowledge_sources=edge.aggregator_knowledge_sources,
                        edge_properties=edge_properties)

EvanDietzMorris commented Aug 12, 2024

Is the idea that the "original" ids are just pre-normalization, or is this something coming from the source upstream?

If the former, it might make sense to add them during the normalization phase, and that could easily be incorporated into the NormalizationScheme, which would let us easily specify in Graph Specs whether we want them or not.
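
A minimal sketch of what adding the ids during normalization could look like. This is not ORION code: `NormalizationOptions`, its `keep_original_ids` flag, `normalize_edge`, and the `SRC:`/`NORM:` identifiers are all hypothetical stand-ins, and the real NormalizationScheme may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass
class NormalizationOptions:
    """Hypothetical stand-in for a normalization config; only the flag is shown."""
    keep_original_ids: bool = False


def normalize_edge(edge: dict, node_id_map: dict, options: NormalizationOptions) -> dict:
    """Return a copy of the edge with subject/object mapped to normalized ids.

    Ids missing from node_id_map are left unchanged in this sketch.
    """
    normalized = dict(edge)
    normalized["subject"] = node_id_map.get(edge["subject"], edge["subject"])
    normalized["object"] = node_id_map.get(edge["object"], edge["object"])
    if options.keep_original_ids:
        # The pre-normalization ids are known for certain at this point.
        normalized["original_subject"] = edge["subject"]
        normalized["original_object"] = edge["object"]
    return normalized


# Made-up identifiers, for illustration only.
id_map = {"SRC:123": "NORM:456"}
edge = {"subject": "SRC:123", "predicate": "biolink:related_to", "object": "SRC:999"}

with_ids = normalize_edge(edge, id_map, NormalizationOptions(keep_original_ids=True))
without_ids = normalize_edge(edge, id_map, NormalizationOptions())
```

Because the flag lives with the normalization settings in this sketch, whether the original ids are kept would be a per-graph choice rather than a property of the file writer.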

I worry about altering the kgx file writer for this purpose with a mode-based switch like that, because, for example, someone might use that write_kgx_edge function on post-normalized nodes without realizing what it would do, creating bogus original ids.
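
To make that concern concrete, here is a small sketch. `SketchWriter` is a hypothetical, stripped-down writer and not ORION's actual file writer; it only mimics the proposed audit-mode behavior, and the `NORM:` ids are made up.

```python
class SketchWriter:
    """Hypothetical minimal writer that mimics the proposed audit_mode switch."""

    def __init__(self, audit_mode: bool = False):
        self.audit_mode = audit_mode
        self.written = []  # edges collected in memory instead of written to a file

    def write_kgx_edge(self, subject_id: str, object_id: str, predicate: str, properties=None):
        props = dict(properties or {})
        if self.audit_mode:
            # The writer cannot tell whether these ids were already normalized.
            props["original_subject"] = subject_id
            props["original_object"] = object_id
        self.written.append(
            {"subject": subject_id, "object": object_id, "predicate": predicate, **props}
        )


writer = SketchWriter(audit_mode=True)
# The caller passes ids that have already been normalized.
writer.write_kgx_edge("NORM:456", "NORM:789", "biolink:related_to")
bogus = writer.written[0]
```

In this sketch `bogus["original_subject"]` ends up holding the normalized id, so the "original" fields are silently wrong whenever the writer is used after normalization.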

We used to have original ids on every edge, and in many cases they can be helpful for quicker troubleshooting, but we removed them when we started saving normalization maps for every run. We could possibly just implement this for every edge again and not worry about a mode or configuration. What do you think, @cbizon?
