Error in stats generation: duplicate node IDs in bco #202

Open
caufieldjh opened this issue Jan 30, 2023 · 0 comments
Labels
bug Something isn't working

caufieldjh commented Jan 30, 2023

Describe the bug

During stats generation, this error is raised:

03:16:33  Downloading apollo_sv, version 2022-11-25 from KG-OBO: kg-obo/apollo_sv/2022-11-25/apollo_sv_kgx_tsv.tar.gz
03:16:33  Downloading apollo_sv, version v2023-01-10 from KG-OBO: kg-obo/apollo_sv/v2023-01-10/apollo_sv_kgx_tsv.tar.gz
03:16:33  Downloading apollo_sv, version v4.1.1. from KG-OBO: kg-obo/apollo_sv/v4.1.1./graph.tar.gz
03:16:33  Downloading aro, version 05-10-2021-09-37 from KG-OBO: kg-obo/aro/05-10-2021-09-37/aro_kgx_tsv.tar.gz
03:16:33  Downloading aro, version 12-09-2022-11-38 from KG-OBO: kg-obo/aro/12-09-2022-11-38/aro_kgx_tsv.tar.gz
03:16:33  Downloading bco, version 2020-03-27 from KG-OBO: kg-obo/bco/2020-03-27/bco_kgx_tsv.tar.gz
03:16:33  Downloading bco, version 2021-11-14 from KG-OBO: kg-obo/bco/2021-11-14/bco_kgx_tsv.tar.gz
03:16:33  Encountered unresolvable error while generating stats: <class 'ValueError'> - Duplicated values found while building the vocabulary!
03:16:33  Specifically the duplicated values are:
03:16:33  ["BCO:0000003", "BCO:0000016", "BCO:0000025", "BCO:0000031", "BCO:0000032", "BCO:0000042", "BCO:0000044", "BCO:0000046", "BCO:0000075", "BCO:0000080", "BCO:0000081"].
03:16:33  The number of duplicates found is 11, as the length of the reverse map is 716 and the length of the map is 705.

As a result, the stats file is not generated.

Version

71001b2

Additional context

The error occurs when loading the KGX TSV into grape, but the root cause is unclear. One possibility is that some other node is being renamed to use the proper BCO: CURIE prefix, which would leave a duplicate entry in the nodelist.
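
To check whether the duplicates are already present in the nodelist before it reaches grape, something like the following could be run against the nodes file. This is only a minimal sketch: it assumes the nodes TSV is tab-separated with a header row containing an `id` column (as in the KGX TSV format), and the helper name `find_duplicate_ids` and the sample rows are made up for illustration, not taken from the real bco file.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_ids(lines: Iterable[str], id_column: str = "id") -> Dict[str, int]:
    """Return {node_id: count} for every ID that appears more than once.

    `lines` can be an open file handle or any iterable of TSV lines.
    Assumes a header row that includes `id_column`.
    """
    # QUOTE_NONE treats quote characters as plain data, so stray quotes
    # in names or descriptions do not merge rows together.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row[id_column] for row in reader)
    return {node_id: n for node_id, n in counts.items() if n > 1}


# Hypothetical rows for illustration only. The third row stands in for a
# node that was renamed into the BCO: prefix and now collides with an
# existing ID.
sample = [
    "id\tcategory\tname",
    "BCO:0000003\tbiolink:NamedThing\texample A",
    "BCO:0000016\tbiolink:NamedThing\texample B",
    "BCO:0000003\tbiolink:NamedThing\texample A (renamed node)",
]
duplicates = find_duplicate_ids(sample)
print(duplicates)  # {'BCO:0000003': 2}
```

For the real data, an open handle on the extracted nodes file from bco_kgx_tsv.tar.gz can be passed in place of `sample`. If the 11 IDs from the log show up here, the duplicates are in the KGX TSV itself rather than being introduced during loading.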

caufieldjh added the bug label Jan 30, 2023