Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Postgres Table Name with Hypen #3215

Closed
dkindlund opened this issue Sep 18, 2024 · 12 comments
Closed

[BUG] - Postgres Table Name with Hypen #3215

dkindlund opened this issue Sep 18, 2024 · 12 comments
Labels
question Further information is requested

Comments

@dkindlund
Copy link
Contributor

dkindlund commented Sep 18, 2024

Describe the bug
I have a multi-page PDF that I have uploaded into the Flowise Document Store. After successfully splitting this PDF into chunks, I try to finally perform an upsert so that the Document Store can create the corresponding vector records -- one vector per chunk. However, after clicking the Upsert button, I get this Oh snap! error message:

Status: 500
Error: documentStoreServices.insertIntoVectorStore - Error: documentStoreServices._insertIntoVectorStoreWorkerThread - QueryFailedError: syntax error at or near "-"

So there is that bug AND the other bug is that this error message appears to NOT be properly word-wrapped, as the error message appears truncated in the browser. I had to go into the HTML output in order to see the full error message:

image

To Reproduce
Steps to reproduce the behavior:

  1. Upload a large PDF into the Document Store
  2. Configure Embeddings, Postgres, and Postgres Record Manager settings (make sure your Table Name values contain at least one DASH in them, like foo-bar)
  3. Click Upsert button
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
image
image
image

Setup

  • Installation [e.g. docker, npx flowise start, pnpm start]
    AWS App Runner
  • Flowise Version [e.g. 1.2.11]
    2.0.7
  • OS: [e.g. macOS, Windows, Linux]
    Linux
  • Browser [e.g. chrome, safari]
    Chrome

Additional context
To be crystal clear, when I click the Upsert button, it appears Flowise successfully creates the user_guides_rm-v1 Record Manager Table Schema, but I don't see any corresponding records inserted into that Postgres table:

flowise-data=> \d user_guides_rm-v1
                     Table "public.user_guides_rm-v1"
   Column   |       Type       | Collation | Nullable |      Default
------------+------------------+-----------+----------+-------------------
 uuid       | uuid             |           | not null | gen_random_uuid()
 key        | text             |           | not null |
 namespace  | text             |           | not null |
 updated_at | double precision |           | not null |
 group_id   | text             |           |          |
Indexes:
    "user_guides_rm-v1_pkey" PRIMARY KEY, btree (uuid)
    "group_id_index" btree (group_id)
    "key_index" btree (key)
    "namespace_index" btree (namespace)
    "updated_at_index" btree (updated_at)
    "user_guides_rm-v1_key_namespace_key" UNIQUE CONSTRAINT, btree (key, namespace)

flowise-data=> select * from "user_guides_rm-v1" ;
 uuid | key | namespace | updated_at | group_id
------+-----+-----------+------------+----------
(0 rows)

So the Upsert process never gets to the point of creating the user_guides_vec-v1 Vector table, at all.

Lastly, when I created the document chunks from the PDF, I used a standard character splitter of chunk size 1000 with 0 chunk overlap and specified a custom separator of "\n":

image

@HenryHengZJ and @vinodkiran -- any idea on how to resolve this? Thanks in advance!

@dkindlund
Copy link
Contributor Author

Note: I upgraded to v2.1.1 and the bug is still present.

@dkindlund
Copy link
Contributor Author

I also tried specifying NO character text splitter -- same bug persists.

@dkindlund
Copy link
Contributor Author

Tried using different PDF files -- same bug persists.

@dkindlund
Copy link
Contributor Author

Tried using CSV files -- same bug persists.

@dkindlund
Copy link
Contributor Author

Tried using plain text -- same bug persists.

@dkindlund
Copy link
Contributor Author

@HenryHengZJ are the container images posted to Docker hub stripped of all debugging? The reason I ask, is because I tried specifying DEBUG=true and LOG_LEVEL=debug to get more information on this issue, but I'm not seeing any additional errors in the application logs.

@dkindlund
Copy link
Contributor Author

Also, it seems the uploaded files are stored inside the Flowise container's disk with some assumption that the files should permanently persist. Instead of using container disk for this, is there any chance it could be stored in the PostgresDB? That way, if a Flowise container restarts and there's no permanent storage, this feature will still work correctly.

Right now, if you use this feature and then restart a Flowise container (without a permanent disk), then these Document Store settings appear corrupted.

@dkindlund
Copy link
Contributor Author

I was able to replicate the bug completely outside of DocumentStore just by creating a legacy chatflow that uses the same document loader nodes.

From that, I found this stacktrace:

09-18-2024 02:49:03 PM 2024-09-18 18:49:03 [ERROR]: [server]: Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:489:19)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)

@dkindlund
Copy link
Contributor Author

Looks like there was an earlier stacktrace:

09-18-2024 02:49:03 PM Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at Postgres_VectorStores.upsert (/usr/local/lib/node_modules/flowise/node_modules/flowise-components/dist/nodes/vectorstores/Postgres/Postgres.js:98:27)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:415:37)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)
09-18-2024 02:49:03 PM 2024-09-18 18:49:03 [ERROR]: [server]: Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:489:19)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)

@dkindlund
Copy link
Contributor Author

Okay, progress! @HenryHengZJ, I think the root cause had to do with the fact that I specified a PostgreSQL table name that includes a dash - in the name.

Specifically, this bug triggers with the table name of: user_guides_vec-v1
This bug does NOT trigger if the table name is just: user_guides_vec (for example)

@dkindlund
Copy link
Contributor Author

So there's one of two solutions to this:

  1. Either add sanity check logic on all Table Name inputs to throw an error whenever a table name contains a dash.
    OR
  2. Figure out and fix why a dash in the table name is causing problems in the first place.

@HenryHengZJ
Copy link
Contributor

From a quick search, having hypen is not the normal naming convention, usually its small letter with underscore. Hypen is a reserved character, but strange thing is that we do escape using double quotes, not sure why it didnt work

@HenryHengZJ HenryHengZJ changed the title [BUG] - DocumentStore Upsert Issues [BUG] - Postgres Table Name with Hypen Sep 19, 2024
@HenryHengZJ HenryHengZJ added the question Further information is requested label Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants