Duplicate nodes in pdb.v4 #253

Open
dosumis opened this issue Mar 22, 2022 · 2 comments

Comments

@dosumis
Member

dosumis commented Mar 22, 2022

Does anyone know why we now have two nodes with the same IRI in pdb.v4? This query returns both:

MATCH (c)
WHERE c.iri = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721"
RETURN c

There is only one node for this in the old pdb.

v4 has:

<id>: 545644
description: The sum of all cells at the intersection between the expression patterns of P{R74G01-GAL4.DBD} and P{R41G07-p65.AD}.
iri: http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721
label: P{R74G01-GAL4.DBD} ∩ P{R41G07-p65.AD} expression pattern
short_form: VFBexp_FBtp0122237FBtp0118721
synonyms: SS00796
uniqueFacets: Expression_pattern,Split

<id>: 178844
curie: VFBexp:FBtp0122237FBtp0118721
description: The sum of all cells at the intersection between the expression patterns of P{R74G01-GAL4.DBD} and P{R41G07-p65.AD}.
has_exact_synonym: {"annotations":{},"value":"GMR_SS00796"},{"annotations":{},"value":"SS00796"},{"annotations":{},"value":"JRC_SS00796"}
iri: http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721
label: P{R74G01-GAL4.DBD} ∩ P{R41G07-p65.AD} expression pattern
label_rdfs: P{R74G01-GAL4.DBD} ∩ P{R41G07-p65.AD} expression pattern
qsl: P1R74G014GAL44DBD1_8_P1R41G074p654AD1_expression_pattern
short_form: VFBexp_FBtp0122237FBtp0118721
sl: P1R74G014GAL44DBD1_8_P1R41G074p654AD1_expression_pattern
uniqueFacets: Expression_pattern

KB has:

<id>: 462066
description: The sum of all cells at the intersection between the expression patterns of P{R74G01-GAL4.DBD} and P{R41G07-p65.AD}.
iri: http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721
label: P{R74G01-GAL4.DBD} ∩ P{R41G07-p65.AD} expression pattern
short_form: VFBexp_FBtp0122237FBtp0118721
synonyms: SS00796,GMR_SS00796,JRC_SS00796

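
To check whether this IRI is an isolated case, the same test can be run over every IRI at once. A minimal sketch, assuming node records have already been fetched as dicts with the internal `<id>` stored under an `id` key; `find_duplicate_iris` is a hypothetical helper, and the Cypher string performs the same grouping server-side (`id()` returns the internal id shown as `<id>` in the browser).

```python
from collections import defaultdict

# Cypher aggregation doing the same grouping inside the database
# (a sketch; run it with whatever client is normally used).
DUPLICATE_IRI_QUERY = """
MATCH (n)
WHERE n.iri IS NOT NULL
WITH n.iri AS iri, collect(id(n)) AS node_ids
WHERE size(node_ids) > 1
RETURN iri, node_ids
"""


def find_duplicate_iris(nodes):
    """Group node records by 'iri' and keep only IRIs shared by >1 node.

    `nodes` is assumed to be an iterable of dicts with an 'id' key
    (the Neo4j internal <id>) plus node properties such as 'iri'.
    """
    by_iri = defaultdict(list)
    for node in nodes:
        iri = node.get("iri")
        if iri is not None:
            by_iri[iri].append(node["id"])
    return {iri: ids for iri, ids in by_iri.items() if len(ids) > 1}


IRI = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721"
nodes = [
    {"id": 545644, "iri": IRI},  # the two v4 nodes reported above
    {"id": 178844, "iri": IRI},
]
print(find_duplicate_iris(nodes))
```
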
@dosumis
Member Author

dosumis commented Mar 22, 2022

The simpler node looks like it was introduced during sideloading of expression data, which would fit with a failure to merge.
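
Which of the two v4 nodes is the "simpler" one can be read off by comparing their property sets. A minimal sketch: the values are copied from the v4 dump as displayed in the browser (list-valued properties are approximated as plain strings or lists), and `diff_properties` is a hypothetical helper.

```python
IRI = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721"
LABEL = "P{R74G01-GAL4.DBD} ∩ P{R41G07-p65.AD} expression pattern"
DESCRIPTION = ("The sum of all cells at the intersection between the expression "
               "patterns of P{R74G01-GAL4.DBD} and P{R41G07-p65.AD}.")
SHORT_FORM = "VFBexp_FBtp0122237FBtp0118721"
SL = "P1R74G014GAL44DBD1_8_P1R41G074p654AD1_expression_pattern"

# Properties that both v4 nodes carry with identical values.
shared = {"description": DESCRIPTION, "iri": IRI, "label": LABEL,
          "short_form": SHORT_FORM}

node_545644 = {**shared, "synonyms": "SS00796",
               "uniqueFacets": "Expression_pattern,Split"}
node_178844 = {**shared, "curie": "VFBexp:FBtp0122237FBtp0118721",
               "has_exact_synonym": ["GMR_SS00796", "SS00796", "JRC_SS00796"],
               "label_rdfs": LABEL, "qsl": SL, "sl": SL,
               "uniqueFacets": "Expression_pattern"}


def diff_properties(a, b):
    """Return (keys only in a, keys only in b, shared keys whose values differ)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


print(diff_properties(node_545644, node_178844))
```

Node 545644 has only `synonyms` plus the extra `Split` facet, while 178844 has the curie, `has_exact_synonym` and the `sl`/`qsl` search labels, which is consistent with 545644 being the sideloaded node.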

Code: feature_tools.FeatureMover.gen_split_ep_feat has:

self.ni.add_node(labels=['Class'],
                 IRI=iri,
                 attribute_dict=ad)

https://github.com/VirtualFlyBrain/VFB_neo4j/blob/master/src/uk/ac/ebi/vfb/neo4j/flybase2neo/feature_tools.py#L413

which leads to:

statement = "MERGE (n:%s { iri: '%s' }) set n.short_form = '%s'" ...

https://github.com/VirtualFlyBrain/VFB_neo4j/blob/master/src/uk/ac/ebi/vfb/neo4j/KB_tools.py#L497

So MERGE should work fine as long as the target node has the :Class Neo4j label and the IRIs match. From inspection of the DBs, this seems to be the case.
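
The matching rule can be made concrete with a toy, in-memory model of the statement in KB_tools. It is a sketch, not Neo4j: `ToyGraph` is hypothetical, it assumes the three `%s` placeholders are filled with the label, the iri and the short_form in that order, and the elided remainder of the real statement is not modelled. MERGE matches on the whole pattern (labels and properties together), so an existing node is reused only when both conditions hold.

```python
IRI = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721"
SHORT_FORM = "VFBexp_FBtp0122237FBtp0118721"


class ToyGraph:
    """In-memory stand-in for a graph database, used only to illustrate
    how MERGE decides between reusing and creating a node."""

    def __init__(self):
        self.nodes = []  # each node: {"labels": set_of_labels, "props": dict}

    def merge(self, label, iri, short_form):
        """Mimic: MERGE (n:<label> { iri: '<iri>' }) SET n.short_form = '<short_form>'

        A node is reused only if it has `label` AND an exactly equal iri;
        otherwise a new node is created.
        """
        for node in self.nodes:
            if label in node["labels"] and node["props"].get("iri") == iri:
                break
        else:
            node = {"labels": {label}, "props": {"iri": iri}}
            self.nodes.append(node)
        node["props"]["short_form"] = short_form
        return node


# Case 1: the existing node has :Class and the same iri, so it is reused.
labelled = ToyGraph()
labelled.nodes.append({"labels": {"Class"}, "props": {"iri": IRI}})
labelled.merge("Class", IRI, SHORT_FORM)
print(len(labelled.nodes))  # 1

# Case 2 (hypothetical): same iri but no :Class label, so a second node appears.
unlabelled = ToyGraph()
unlabelled.nodes.append({"labels": set(), "props": {"iri": IRI}})
unlabelled.merge("Class", IRI, SHORT_FORM)
print(len(unlabelled.nodes))  # 2
```
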

Testing merge behavior against pdb-dev (also on v4):

MERGE (c:Class {iri: 'http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp011872'} ) SET c.fu = 'bar'

Adds yet another class. But queries between these classes look broken:

[screenshot not preserved]

Very confusing. Could there be some character encoding issue or indexing bug?
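
One way to test the encoding hypothesis is to compare the raw `iri` values character by character rather than by eye. A minimal sketch: `explain_difference` is a hypothetical helper, and it assumes the raw strings have been exported from the database. Applied to the IRI from the original report and the one in the pdb-dev test above, it shows the test IRI ends `...FBtp011872`, one character shorter than `...FBtp0118721`. On its own that would make MERGE create a new node, which may explain the extra class in the test, though not the original duplicate, whose IRIs display identically.

```python
import unicodedata

ORIGINAL = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp0118721"  # from the report
TESTED = "http://virtualflybrain.org/reports/VFBexp_FBtp0122237FBtp011872"     # from the MERGE test


def explain_difference(a, b):
    """List the ways two visually similar strings differ (empty list if identical)."""
    if a == b:
        return []
    reasons = []
    if len(a) != len(b):
        reasons.append(f"length {len(a)} vs {len(b)}")
    if a.strip() == b.strip():
        reasons.append("equal apart from leading/trailing whitespace")
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        reasons.append("equal after Unicode NFC normalisation")
    # First index where the strings disagree (or where the shorter one ends).
    first = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y),
                 min(len(a), len(b)))
    reasons.append(f"first difference at index {first}: {a[first:]!r} vs {b[first:]!r}")
    return reasons


for reason in explain_difference(ORIGINAL, TESTED):
    print(reason)
```

For these two strings it reports a length of 64 vs 63 and a first difference at index 63 (`'1'` vs `''`).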

@Robbie1977
Contributor

Robbie1977 commented Apr 5, 2022

There is no duplicate at http://pdbl.p2.virtualflybrain.org/browser/, but the duplicate is present in http://pdbsl.p2.virtualflybrain.org/browser/, so it is introduced after the generic pipeline has loaded, as part of the sideloading.

I've initially ruled out the first step (https://github.com/VirtualFlyBrain/pipeline/blob/pipeline2/process.sh), as nothing like what we are looking for is added:

[rancher@parsley jenkins-LoadPDB2-175]$ cat add_refs_for_anat.out | grep VFBexp
Processing chunk of 40 of 90 starting with: b'OPTIONAL MATCH (s:Class { short_form:\'VFBexp_FBtp0084107\' }) OPTIONAL MATCH (o:Individual { short_form:\'Unattributed\' }) FOREACH (a IN CASE WHEN s IS NOT NULL THEN [s] ELSE [] END | FOREACH (b IN CASE WHEN o IS NOT NULL THEN [o] ELSE [] END | MERGE (a)-[re:has_reference]->(b) SET re.type = \'Annotation\' SET re.typ = "syn" SET re.scope = "has_exact_synonym" SET re.value = [\'Erm-GAL4 expression pattern\'] SET re.label = \'has_reference\' SET re.short_form = \'references\' SET re.iri = \'http://purl.org/dc/terms/references\' )) RETURN { `VFBexp_FBtp0084107`: count(s), `Unattributed`: count(o) } as match_count'
Processing chunk of 43 of 90 starting with: b'OPTIONAL MATCH (s:Class { short_form:\'VFBexp_FBtp0061640\' }) OPTIONAL MATCH (o:Individual { short_form:\'Unattributed\' }) FOREACH (a IN CASE WHEN s IS NOT NULL THEN [s] ELSE [] END | FOREACH (b IN CASE WHEN o IS NOT NULL THEN [o] ELSE [] END | MERGE (a)-[re:has_reference]->(b) SET re.type = \'Annotation\' SET re.typ = "syn" SET re.scope = "has_exact_synonym" SET re.value = [\'Ktl-GAL4 expression pattern\'] SET re.label = \'has_reference\' SET re.short_form = \'references\' SET re.iri = \'http://purl.org/dc/terms/references\' )) RETURN { `VFBexp_FBtp0061640`: count(s), `Unattributed`: count(o) } as match_count'
Processing chunk of 44 of 90 starting with: b'OPTIONAL MATCH (s:Class { short_form:\'VFBexp_FBtp0060060\' }) OPTIONAL MATCH (o:Individual { short_form:\'Unattributed\' }) FOREACH (a IN CASE WHEN s IS NOT NULL THEN [s] ELSE [] END | FOREACH (b IN CASE WHEN o IS NOT NULL THEN [o] ELSE [] END | MERGE (a)-[re:has_reference]->(b) SET re.type = \'Annotation\' SET re.typ = "syn" SET re.scope = "has_exact_synonym" SET re.value = [\'P{GMR40H02-GAL4} expression pattern\'] SET re.label = \'has_reference\' SET re.short_form = \'references\' SET re.iri = \'http://purl.org/dc/terms/references\' )) RETURN { `VFBexp_FBtp0060060`: count(s), `Unattributed`: count(o) } as match_count'
Processing chunk of 50 of 90 starting with: b'OPTIONAL MATCH (s:Class { short_form:\'VFBexp_FBtp0062535\' }) OPTIONAL MATCH (o:Individual { short_form:\'Unattributed\' }) FOREACH (a IN CASE WHEN s IS NOT NULL THEN [s] ELSE [] END | FOREACH (b IN CASE WHEN o IS NOT NULL THEN [o] ELSE [] END | MERGE (a)-[re:has_reference]->(b) SET re.type = \'Annotation\' SET re.typ = "syn" SET re.scope = "has_exact_synonym" SET re.value = [\'P{GMR73C07-GAL4} expression pattern\'] SET re.label = \'has_reference\' SET re.short_form = \'references\' SET re.iri = \'http://purl.org/dc/terms/references\' )) RETURN { `VFBexp_FBtp0062535`: count(s), `Unattributed`: count(o) } as match_count'
Processing chunk of 65 of 90 starting with: b'OPTIONAL MATCH (s:Class { short_form:\'VFBexp_FBtp0122331FBtp0118760\' }) OPTIONAL MATCH (o:Individual { short_form:\'Unattributed\' }) FOREACH (a IN CASE WHEN s IS NOT NULL THEN [s] ELSE [] END | FOREACH (b IN CASE WHEN o IS NOT NULL THEN [o] ELSE [] END | MERGE (a)-[re:has_reference]->(b) SET re.type = \'Annotation\' SET re.typ = "syn" SET re.scope = "has_exact_synonym" SET re.value = [\'LH1614\'] SET re.label = \'has_reference\' SET re.short_form = \'references\' SET re.iri = \'http://purl.org/dc/terms/references\' )) RETURN { `VFBexp_FBtp0122331FBtp0118760`: count(s), `Unattributed`: count(o) } as match_count'
[rancher@parsley jenkins-LoadPDB2-175]$ cat expand_xrefs.out | grep VFBexp
[rancher@parsley jenkins-LoadPDB2-175]$ cat import_pub_data.out | grep VFBexp

Starting on the next stages...
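
The same check can be repeated for each remaining stage by looking for the specific short_form rather than any `VFBexp` string. A minimal sketch: `vfbexp_short_forms` and `stages_touching` are hypothetical helpers, the log file names are the ones in the output above, the log directory is an assumption, and the regex assumes the `short_form:\'...\'` shape seen in add_refs_for_anat.out. A stage that writes the node by `iri` instead would need a second pattern.

```python
import re
from pathlib import Path

# Matches short_form:'VFBexp_...' in the logged Cypher; in the .out files the
# quotes appear backslash-escaped (short_form:\'...\'), so the backslash is optional.
SHORT_FORM_RE = re.compile(r"short_form:\\?'(VFBexp_\w+)\\?'")


def vfbexp_short_forms(log_text):
    """Return the set of VFBexp_* short_forms referenced in one stage's log."""
    return set(SHORT_FORM_RE.findall(log_text))


def stages_touching(short_form, log_dir, log_names):
    """Return the names of the stage logs in log_dir that reference short_form."""
    hits = []
    for name in log_names:
        path = Path(log_dir) / name
        if path.exists() and short_form in vfbexp_short_forms(path.read_text(errors="replace")):
            hits.append(name)
    return hits


# Stage logs named in the output above; "." as the directory is an assumption.
logs = ["add_refs_for_anat.out", "expand_xrefs.out", "import_pub_data.out"]
print(stages_touching("VFBexp_FBtp0122237FBtp0118721", ".", logs))
```
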
