pgloader 3.6.7 doesn't continue on fatal errors (ex. non whitespace after quoted data) contrary to pgloader 3.4.1 #1604

Open
Kamal-learner-24 opened this issue Aug 16, 2024 · 1 comment

Comments

@Kamal-learner-24

Hello everyone,

Let me explain our problem.

We recently migrated our solution from Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1 to Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7.

On the old system (Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1), when I load a CSV file of 478,894 lines (14 of which contain errors) with a .LOAD command, pgloader 3.4.1 loads 478,880 lines. It runs as expected and continues loading when it encounters these errors.

On the new system (Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7), when I load the same CSV file with the same .LOAD command, pgloader 3.6.7 loads only 183,569 lines. It does not run as expected and seems to stop loading when it encounters these errors.

Here is the .LOAD command:
LOAD CSV
FROM /inputs/data/F024
WITH ENCODING UTF8 (
user_id [null if blanks], user_name_first [null if blanks], user_name_last [null if blanks]
)
INTO postgresql:///db_rec_dv?cpy.cpy_cso_user_base(user_id, user_name_first, user_name_last)
WITH truncate
, fields optionally enclosed by '"'
, fields terminated by ','
, prefetch rows = 50000
SET client_encoding to 'utf8'
,work_mem to '512MB'
,standard_conforming_strings to 'on'
;

Here is the error I get:
2024-08-08T13:47:09.233005+01:00 ERROR non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b
2024-08-08T13:47:09.233005+01:00 FATAL non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b

Here is an extract of the failing line (a double quote is missing):
"11","Colyneߌڢ,"Test"

Thank you for your help

Best regards,

Kamal

@svantevonerichsen6906
Collaborator

Yes, sorry, but I think you are relying on buggy behaviour, and the bug in question was fixed six years ago. I'd suggest fixing the data errors in the CSV files. If I read that right, the CSV reader tells you the faulty lines (LINE-IDX).
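
For anyone who wants to locate such lines before running the load, here is a minimal Python sketch of a pre-check. It is not pgloader's parser, only a rough approximation of the "non whitespace after quoted data" rule, and it assumes the settings from the .LOAD command above (fields terminated by ',' and optionally enclosed by '"'), a doubled quote as the escape inside quoted fields, and records that never span several physical lines. The function names `first_bad_column` and `scan_lines` are made up for this example.

```python
def first_bad_column(line, delimiter=",", quote='"'):
    """Return the 0-based column of the first problem in `line`, or None.

    A problem is either a character that follows a closing quote without
    being whitespace or the delimiter, or a quoted field that never closes
    (reported at len(line)).
    """
    i, n = 0, len(line)
    while i < n:
        # Skip whitespace in front of the field.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == quote:
            i += 1
            # Scan to the closing quote; a doubled quote is an escaped quote.
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        i += 2
                        continue
                    break
                i += 1
            if i >= n:
                return n  # quoted field never closed on this line
            i += 1  # step past the closing quote
            while i < n and line[i] in " \t":
                i += 1
            if i < n and line[i] != delimiter:
                return i  # non-whitespace after quoted data
        else:
            # Unquoted field: read up to the next delimiter.
            while i < n and line[i] != delimiter:
                i += 1
        i += 1  # step past the delimiter
    return None


def scan_lines(lines, delimiter=",", quote='"'):
    """Return (1-based line number, 0-based column) for each suspicious line."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        col = first_bad_column(raw.rstrip("\r\n"), delimiter, quote)
        if col is not None:
            problems.append((lineno, col))
    return problems
```

To use it, open the CSV file in text mode with the right encoding (UTF-8 here) and pass the file object to `scan_lines`. The reported lines can then be corrected or moved to a reject file before pgloader runs. The line numbers are physical line numbers in the file and may not match the LINE-IDX values printed by pgloader.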
