Improve CSV parser to stream file to handle very large volume of data #7594
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7594      +/-   ##
==========================================
- Coverage   67.56%   67.52%   -0.04%
==========================================
  Files         567      570       +3
  Lines       69946    69996      +50
  Branches     5937     5927      -10
==========================================
+ Hits        47257    47266       +9
- Misses      22689    22730      +41
  filename: `${workId}.json`,
  mimetype: 'application/json',
};
await uploadToStorage(context, applicantUser, 'import/pending', file, { entity });
So it will upload an analyst workbench file with a bundle for the lines currently being parsed? Does it append to the file or replace it? From what I read, it looks like it will replace it.
Really good point, I need to dig into this.
I have an issue with the counts, so I have some doubts about the data-splitting part. I tested with all French cities from https://www.data.gouv.fr/fr/datasets/villes-de-france/.
My CSV file has a header and 39146 lines:
wc -l cities.csv
39146 cities.csv
But the import screen shows 32759 expected lines.
And at the end, in the OpenCTI UI under Data > Entities, I have 32759 entities (when I'm expecting 39145). I only got 1 error, on one line of the CSV.
I'm adding the CSV mapper that I used:
So I think that the split into chunks of CSV_MAX_BUNDLE_SIZE_GENERATION is somehow not correct.
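To make the expectation concrete, here is a minimal sketch of a batching step that conserves the line count. It is not the connector's code: `BundleBatcher` and its methods are hypothetical names, and the bundle size of 1000 used in the note below is only an illustrative value, not the real CSV_MAX_BUNDLE_SIZE_GENERATION. The point is that the final partial bundle has to be flushed at end of stream, and that the number of records emitted should equal the number of data lines read.

```typescript
// Hypothetical batcher, written for illustration only (not the connector's code).
// It groups streamed records into bundles of at most `maxSize` items.
class BundleBatcher<T> {
  private current: T[] = [];

  private emitted = 0;

  constructor(private readonly maxSize: number) {
    if (maxSize < 1) {
      throw new Error('maxSize must be at least 1');
    }
  }

  // Adds one parsed record. Returns a full bundle when the limit is reached,
  // otherwise undefined.
  push(record: T): T[] | undefined {
    this.current.push(record);
    if (this.current.length < this.maxSize) {
      return undefined;
    }
    return this.take();
  }

  // Must be called once at end of stream; otherwise the last partial
  // bundle is silently dropped.
  flush(): T[] | undefined {
    return this.current.length > 0 ? this.take() : undefined;
  }

  // Total number of records handed out in bundles so far.
  emittedCount(): number {
    return this.emitted;
  }

  private take(): T[] {
    const bundle = this.current;
    this.current = [];
    this.emitted += bundle.length;
    return bundle;
  }
}
```

With 39145 data lines and an assumed bundle size of 1000, this would give 39 full bundles plus one final bundle of 145, and `emittedCount()` would return 39145. If the split itself conserves the count like this, the gap has to come from somewhere else (for example deduplication on the platform side).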
opencti-platform/opencti-graphql/src/connector/importCsv/importCsv-connector.ts
Actually there are some duplicates, but I'm still missing 6 cities. I will look into it.
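For reference, one way to reconcile the numbers is to count the distinct rows in the file before comparing with the UI. The sketch below is only an assumption about how that check could be done: `countDuplicates` is a hypothetical helper, and the key function (here a trimmed, lower-cased name in the usage note) is a guess, since the real deduplication key depends on the CSV mapper and the entity type.

```typescript
// Hypothetical helper for reconciling import counts (illustration only).
interface DuplicateReport {
  total: number; // number of data rows examined
  distinct: number; // number of distinct keys
  duplicates: number; // rows whose key was already seen
}

function countDuplicates<T>(rows: T[], keyOf: (row: T) => string): DuplicateReport {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  const seen: Record<string, boolean> = Object.create(null);
  let distinct = 0;
  for (let i = 0; i < rows.length; i += 1) {
    const key = keyOf(rows[i]);
    if (!seen[key]) {
      seen[key] = true;
      distinct += 1;
    }
  }
  return { total: rows.length, distinct, duplicates: rows.length - distinct };
}
```

Called as `countDuplicates(rows, (r) => r.name.trim().toLowerCase())`, the `distinct` value is what should be compared with the entity count in the UI; any remaining difference beyond the known error line would be the rows that are really missing.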
f8359b7 to 558e6c1
@richard-julien FYI, if it's fine with you, I'm going to update this branch and review it in the following days/weeks as part of the #7400 feature.
Ok great! Thanks.
See #7589