Data QC: workflow_execution_has_input_data_objects #176

Open
mbthornton-lbl opened this issue May 16, 2024 · 5 comments

@mbthornton-lbl

15 mags_activity_set

{
            "_id" : ObjectId("649b0052ec087f6bbab34903"),
            "id" : "nmdc:wfmag-11-t8gc3c64.1",
            "has_input" : "nmdc:dobj-11-y2f0gn35"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34903"),
            "id" : "nmdc:wfmag-11-t8gc3c64.1",
            "has_input" : "nmdc:dobj-11-achfhn33"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34903"),
            "id" : "nmdc:wfmag-11-t8gc3c64.1",
            "has_input" : "nmdc:dobj-11-vt4jr220"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34907"),
            "id" : "nmdc:wfmag-11-0gwm7d87.1",
            "has_input" : "nmdc:dobj-11-9a9pn310"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34907"),
            "id" : "nmdc:wfmag-11-0gwm7d87.1",
            "has_input" : "nmdc:dobj-11-dpnhb305"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34907"),
            "id" : "nmdc:wfmag-11-0gwm7d87.1",
            "has_input" : "nmdc:dobj-11-1fkqcg08"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34941"),
            "id" : "nmdc:wfmag-11-dchy6q29.1",
            "has_input" : "nmdc:dobj-11-1wzar939"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34941"),
            "id" : "nmdc:wfmag-11-dchy6q29.1",
            "has_input" : "nmdc:dobj-11-fg28a080"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34941"),
            "id" : "nmdc:wfmag-11-dchy6q29.1",
            "has_input" : "nmdc:dobj-11-s4hp2x64"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34942"),
            "id" : "nmdc:wfmag-11-8s9xk838.1",
            "has_input" : "nmdc:dobj-11-kr8ev105"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34942"),
            "id" : "nmdc:wfmag-11-8s9xk838.1",
            "has_input" : "nmdc:dobj-11-gxgpbv06"
        },
        {
            "_id" : ObjectId("649b0052ec087f6bbab34942"),
            "id" : "nmdc:wfmag-11-8s9xk838.1",
            "has_input" : "nmdc:dobj-11-whq9ph06"
        },
        {
            "_id" : ObjectId("649b0054ec087f6bbab34a76"),
            "id" : "nmdc:wfmag-11-29carm07.1",
            "has_input" : "nmdc:dobj-11-t0wzq938"
        },
        {
            "_id" : ObjectId("649b0054ec087f6bbab34a76"),
            "id" : "nmdc:wfmag-11-29carm07.1",
            "has_input" : "nmdc:dobj-11-cvcxxr53"
        },
        {
            "_id" : ObjectId("649b0054ec087f6bbab34a76"),
            "id" : "nmdc:wfmag-11-29carm07.1",
            "has_input" : "nmdc:dobj-11-qkzjq615"
        }

7 metagenome_assembly_set

{
            "_id" : ObjectId("649b005f2ca5ee4adb13a1a0"),
            "id" : "nmdc:wfmgas-11-k8x7bf78.1",
            "has_input" : "nmdc:dobj-11-6appe696"
        },
        {
            "_id" : ObjectId("649b005f2ca5ee4adb13a1df"),
            "id" : "nmdc:wfmgas-11-r8dyz821.1",
            "has_input" : "nmdc:dobj-11-xbb2mp07"
        },
        {
            "_id" : ObjectId("649b005f2ca5ee4adb13a1e2"),
            "id" : "nmdc:wfmgas-11-6pbs6265.1",
            "has_input" : "nmdc:dobj-11-axnsyw61"
        },
        {
            "_id" : ObjectId("649b005f2ca5ee4adb13a29d"),
            "id" : "nmdc:wfmgas-11-yh8n1e65.1",
            "has_input" : "nmdc:dobj-11-nz4neg35"
        },
        {
            "_id" : ObjectId("649b005f2ca5ee4adb13a302"),
            "id" : "nmdc:wfmgas-11-kk38rr46.1",
            "has_input" : "nmdc:dobj-11-aetz2k16"
        },
        {
            "_id" : ObjectId("649b005f2ca5ee4adb13a306"),
            "id" : "nmdc:wfmgas-11-10vezr29.1",
            "has_input" : "nmdc:dobj-11-h673kw16"
        }

5 metagenome_annotation_activity_set

{
            "_id" : ObjectId("649b005bbf2caae0415efba8"),
            "id" : "nmdc:wfmgan-11-c516q834.1",
            "has_input" : "nmdc:dobj-11-dpnhb305"
        },
        {
            "_id" : ObjectId("649b005bbf2caae0415efba9"),
            "id" : "nmdc:wfmgan-11-yzp9eq74.1",
            "has_input" : "nmdc:dobj-11-achfhn33"
        },
        {
            "_id" : ObjectId("649b005bbf2caae0415efbef"),
            "id" : "nmdc:wfmgan-11-fmymf551.1",
            "has_input" : "nmdc:dobj-11-fg28a080"
        },
        {
            "_id" : ObjectId("649b005bbf2caae0415efbf0"),
            "id" : "nmdc:wfmgan-11-3nkefn97.1",
            "has_input" : "nmdc:dobj-11-gxgpbv06"
        },
        {
            "_id" : ObjectId("649b005dbf2caae0415efd26"),
            "id" : "nmdc:wfmgan-11-w1d6gy98.1",
            "has_input" : "nmdc:dobj-11-cvcxxr53"
        }

1 metatranscriptome_activity_set

{
            "_id" : ObjectId("65affcdc07a1abea27b3fa1f"),
            "id" : "nmdc:wfmt-11-y9cf0x90.1",
            "has_input" : "nmdc:dobj-11-xwqq5x15"
        }

6 read_based_taxonomy_analysis_set

        {
            "_id" : ObjectId("649b009bff710ae353f8ccdc"),
            "id" : "nmdc:wfrbt-12-zfccjh43.1",
            "has_input" : "nmdc:dobj-11-6bdffd27"
        },
        {
            "_id" : ObjectId("649b009bff710ae353f8cd18"),
            "id" : "nmdc:wfrbt-12-3qany071.1",
            "has_input" : "nmdc:dobj-11-xbb2mp07"
        },
        {
            "_id" : ObjectId("649b009bff710ae353f8cd1f"),
            "id" : "nmdc:wfrbt-12-5p0p4731.1",
            "has_input" : "nmdc:dobj-11-6appe696"
        },
        {
            "_id" : ObjectId("649b009bff710ae353f8cd25"),
            "id" : "nmdc:wfrbt-12-arwnpf44.1",
            "has_input" : "nmdc:dobj-11-axnsyw61"
        },
        {
            "_id" : ObjectId("649b009cff710ae353f8d16e"),
            "id" : "nmdc:wfrbt-11-dkzdrn42.1",
            "has_input" : "nmdc:dobj-11-nz4neg35"
        },
        {
            "_id" : ObjectId("649b009cff710ae353f8d1cf"),
            "id" : "nmdc:wfrbt-11-z5a29n65.1",
            "has_input" : "nmdc:dobj-11-aetz2k16"
        }
@aclum

aclum commented May 17, 2024

Not a blocker. None of these are for studies we are re-IDing; most come from records ingested from JGI. We can look at this once re-IDing is complete.

@aclum

aclum commented Jul 25, 2024

We should be able to use nmdc_automation/run_process/run_workflows.py to generate records for these.
These are all from GROW and Bioscales, which are still on the file system:
/global/cfs/cdirs/m3408/aim2/dev/bioscales_mapping/grow_analysis_projects/ and /global/cfs/cdirs/m3408/aim2/dev/bioscales_mapping/bioscales_analysis_projects/

@eecavanna

eecavanna commented Oct 25, 2024

For my own future reference and ease of copy/pasting, here's a list of the 33 ids in the "has_input" fields above (wrapped in quotes and delimited by commas):

"nmdc:dobj-11-y2f0gn35",
"nmdc:dobj-11-achfhn33",
"nmdc:dobj-11-vt4jr220",
"nmdc:dobj-11-9a9pn310",
"nmdc:dobj-11-dpnhb305",
"nmdc:dobj-11-1fkqcg08",
"nmdc:dobj-11-1wzar939",
"nmdc:dobj-11-fg28a080",
"nmdc:dobj-11-s4hp2x64",
"nmdc:dobj-11-kr8ev105",
"nmdc:dobj-11-gxgpbv06",
"nmdc:dobj-11-whq9ph06",
"nmdc:dobj-11-t0wzq938",
"nmdc:dobj-11-cvcxxr53",
"nmdc:dobj-11-qkzjq615",
"nmdc:dobj-11-6appe696",
"nmdc:dobj-11-xbb2mp07",
"nmdc:dobj-11-axnsyw61",
"nmdc:dobj-11-nz4neg35",
"nmdc:dobj-11-aetz2k16",
"nmdc:dobj-11-h673kw16",
"nmdc:dobj-11-dpnhb305",
"nmdc:dobj-11-achfhn33",
"nmdc:dobj-11-fg28a080",
"nmdc:dobj-11-gxgpbv06",
"nmdc:dobj-11-cvcxxr53",
"nmdc:dobj-11-xwqq5x15",
"nmdc:dobj-11-6bdffd27",
"nmdc:dobj-11-xbb2mp07",
"nmdc:dobj-11-6appe696",
"nmdc:dobj-11-axnsyw61",
"nmdc:dobj-11-nz4neg35",
"nmdc:dobj-11-aetz2k16",

That list contains only 23 distinct ids (the other 10 are repeats). Here's a table showing the number of times each id occurs in the list.

# Occurrences id
2 nmdc:dobj-11-6appe696
2 nmdc:dobj-11-achfhn33
2 nmdc:dobj-11-aetz2k16
2 nmdc:dobj-11-axnsyw61
2 nmdc:dobj-11-cvcxxr53
2 nmdc:dobj-11-dpnhb305
2 nmdc:dobj-11-fg28a080
2 nmdc:dobj-11-gxgpbv06
2 nmdc:dobj-11-nz4neg35
2 nmdc:dobj-11-xbb2mp07
1 nmdc:dobj-11-1fkqcg08
1 nmdc:dobj-11-1wzar939
1 nmdc:dobj-11-6bdffd27
1 nmdc:dobj-11-9a9pn310
1 nmdc:dobj-11-h673kw16
1 nmdc:dobj-11-kr8ev105
1 nmdc:dobj-11-qkzjq615
1 nmdc:dobj-11-s4hp2x64
1 nmdc:dobj-11-t0wzq938
1 nmdc:dobj-11-vt4jr220
1 nmdc:dobj-11-whq9ph06
1 nmdc:dobj-11-xwqq5x15
1 nmdc:dobj-11-y2f0gn35

Here are all 23 distinct ids:

"nmdc:dobj-11-6appe696",
"nmdc:dobj-11-achfhn33",
"nmdc:dobj-11-aetz2k16",
"nmdc:dobj-11-axnsyw61",
"nmdc:dobj-11-cvcxxr53",
"nmdc:dobj-11-dpnhb305",
"nmdc:dobj-11-fg28a080",
"nmdc:dobj-11-gxgpbv06",
"nmdc:dobj-11-nz4neg35",
"nmdc:dobj-11-xbb2mp07",
"nmdc:dobj-11-1fkqcg08",
"nmdc:dobj-11-1wzar939",
"nmdc:dobj-11-6bdffd27",
"nmdc:dobj-11-9a9pn310",
"nmdc:dobj-11-h673kw16",
"nmdc:dobj-11-kr8ev105",
"nmdc:dobj-11-qkzjq615",
"nmdc:dobj-11-s4hp2x64",
"nmdc:dobj-11-t0wzq938",
"nmdc:dobj-11-vt4jr220",
"nmdc:dobj-11-whq9ph06",
"nmdc:dobj-11-xwqq5x15",
"nmdc:dobj-11-y2f0gn35",
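
The tally above can be reproduced with a few lines of Python. This is a minimal sketch, not the method actually used for the comment: the id suffixes are copied from the "has_input" fields in the issue description, and the variable names are made up for illustration.

```python
from collections import Counter

# Suffixes of the 33 "has_input" values in the issue description, in order.
# Every id there shares the prefix "nmdc:dobj-11-".
suffixes = """
y2f0gn35 achfhn33 vt4jr220 9a9pn310 dpnhb305 1fkqcg08 1wzar939 fg28a080
s4hp2x64 kr8ev105 gxgpbv06 whq9ph06 t0wzq938 cvcxxr53 qkzjq615
6appe696 xbb2mp07 axnsyw61 nz4neg35 aetz2k16 h673kw16
dpnhb305 achfhn33 fg28a080 gxgpbv06 cvcxxr53
xwqq5x15
6bdffd27 xbb2mp07 6appe696 axnsyw61 nz4neg35 aetz2k16
""".split()

has_input_ids = [f"nmdc:dobj-11-{suffix}" for suffix in suffixes]

# Count how many times each id occurs.
counts = Counter(has_input_ids)
distinct_ids = sorted(counts)
repeat_count = len(has_input_ids) - len(distinct_ids)

# Print most frequent first, then alphabetically within the same count.
for dobj_id, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(n, dobj_id)
```

Here `len(has_input_ids)` is the total number of references, `len(distinct_ids)` is the number of distinct data object ids, and `repeat_count` is the difference between the two.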

@eecavanna

There may be a typo in the issue description.

7 metagenome_assembly_set

There are only 6 JSON objects in that list.

@eecavanna

Regarding the 1 metatranscriptome_activity_set document listed above:

{
    "_id" : ObjectId("65affcdc07a1abea27b3fa1f"),
    "id" : "nmdc:wfmt-11-y9cf0x90.1",
    "has_input" : "nmdc:dobj-11-xwqq5x15"
}

As of today, October 26, 2024, the database (i.e. any of the schema-described collections) no longer contains a reference to a document whose id is nmdc:dobj-11-xwqq5x15. Also, there is no document whose id is nmdc:wfmt-11-y9cf0x90.1.

Based on those two things, I think this particular violation is obsolete.
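
For anyone repeating this kind of check on the other ids, here is a rough sketch of the two lookups described above (does any document have this id, and does any document refer to it). It is not the procedure I actually ran: it works on an in-memory snapshot shaped as `{collection_name: [documents]}`, the function names are hypothetical, and the sample ids are placeholders rather than real NMDC records.

```python
from typing import Any, Iterator

Document = dict[str, Any]
Snapshot = dict[str, list[Document]]


def iter_values(value: Any) -> Iterator[Any]:
    """Yield every scalar nested anywhere inside a document value."""
    if isinstance(value, dict):
        for inner in value.values():
            yield from iter_values(inner)
    elif isinstance(value, list):
        for inner in value:
            yield from iter_values(inner)
    else:
        yield value


def find_documents_by_id(snapshot: Snapshot, target_id: str) -> list[str]:
    """Return the names of collections holding a document whose id is target_id."""
    return [
        name
        for name, docs in snapshot.items()
        if any(doc.get("id") == target_id for doc in docs)
    ]


def find_references(snapshot: Snapshot, target_id: str) -> list[tuple[str, Any, str]]:
    """Return (collection, referring document id, field) for each field that mentions target_id."""
    hits = []
    for name, docs in snapshot.items():
        for doc in docs:
            for field, value in doc.items():
                if field in ("id", "_id"):
                    continue  # a document's own identifiers are not references
                if target_id in iter_values(value):
                    hits.append((name, doc.get("id"), field))
    return hits


# Placeholder data for illustration only (not real NMDC documents).
snapshot: Snapshot = {
    "workflow_execution_set": [
        {"id": "example:wf-1", "has_input": ["example:dobj-1"]},
    ],
    "data_object_set": [],
}
```

In this placeholder snapshot, `find_references(snapshot, "example:dobj-1")` reports the workflow execution that points at the data object, while `find_documents_by_id(snapshot, "example:dobj-1")` comes back empty, which is the shape of the violation this issue tracks. When both calls come back empty, the violation is obsolete in the sense described above.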
