Problematic .tsv processing #62

Open
suciokhan opened this issue Aug 29, 2022 · 4 comments

Comments

@suciokhan

suciokhan commented Aug 29, 2022

When trying to ingest from .tsv files using Loader 1.4.1 on Ubuntu 20.04, I receive the following error:

[open_alex_0::5] ERROR com.vaticle.typedb.osi.loader.loader - async-writer-4: [THW07] Invalid Thing Write: Attempted to assign a key ',' of type 'id' that had been taken by another 'researcher'.

However, I've reviewed the .tsv and confirmed there are no comma values in this column; all values are open_alex identifiers, which are URLs starting with https.

In my TypeDB config.json file, the separator is set to tab, and Loader successfully ingests hundreds of thousands of rows with this setting:

"separator": "\t",

Below is a screenshot confirming, using Python and Pandas, that there are no commas in the id column.

image
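Since the screenshot is not reproduced here, a check of this kind can be sketched roughly as follows. This is not the code from the screenshot: it uses Python's standard csv module rather than Pandas, and the function name, the `id` column name, and the `https` prefix test are assumptions based on the description above. It flags rows whose id does not look like a URL, and rows whose field count differs from the header.

```python
import csv


def find_suspect_ids(path, id_column="id", delimiter="\t", expected_prefix="https"):
    """Return (line_number, reason, value) tuples for rows that look wrong.

    Hypothetical helper: the column name and prefix are assumptions.
    """
    suspects = []
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as plain data, so only the
        # delimiter splits fields and every physical line is one row.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        header = next(reader)
        id_index = header.index(id_column)
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            if len(row) != len(header):
                suspects.append((line_number, "field count mismatch", delimiter.join(row)))
            elif not row[id_index].startswith(expected_prefix):
                suspects.append((line_number, "unexpected id", row[id_index]))
    return suspects
```

An empty result for a file means every row has the same number of fields as the header and an id starting with the expected prefix, at least under this plain tab-splitting interpretation of the file.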

I considered whether the header might be the problem, since the load fails on the second .tsv file it processes and there is one record in the database with a comma as its id.
image

However, according to the TypeDB progress updates, it does not fail until it has processed more than 600,000 rows.
image
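One way to probe why a failure might only appear deep into a file is to compare how many records a parser sees with and without quote handling. The sketch below assumes, without confirmation, that a stray double quote in some field could cause a quote-aware parser to merge tabs and newlines into one field; whether Loader's parser behaves this way is not established in this thread. The function names are hypothetical.

```python
import csv


def count_records(path, delimiter="\t", honor_quotes=True):
    """Count the records a csv.reader yields for the file at `path`."""
    quoting = csv.QUOTE_MINIMAL if honor_quotes else csv.QUOTE_NONE
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=quoting)
        return sum(1 for _ in reader)


def quote_sensitivity(path, delimiter="\t"):
    """Return (records_with_quote_handling, records_without_quote_handling)."""
    with_quotes = count_records(path, delimiter, honor_quotes=True)
    without_quotes = count_records(path, delimiter, honor_quotes=False)
    return with_quotes, without_quotes
```

If the two counts differ, at least one field contains a quote character that changes how rows are split, which would be worth locating before ruling out the data as the cause.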

@flyingsilverfin
Member

So it does sound like the data is corrupt somehow. Have you managed to track down the duplicate ','?

@suciokhan
Author

There are no comma values for the id column in the source data.

@hkuich
Member

hkuich commented Aug 31, 2022 via email

@suciokhan
Author

Sure, I will send you a link to the two files I was having trouble with.
