Unrecognized type should be Object not String #595

Open
oldshensheep opened this issue May 20, 2024 · 1 comment

Comments

@oldshensheep

Describe the bug

To Reproduce
Install pgvector for Postgres: https://github.com/pgvector/pgvector
Create the table:

create table if not exists document_embeddings_test
(
    id         serial8 primary key,
    embedding  vector(2) not null,
    content    text         not null,
    metadata   jsonb        not null,
    created_at timestamp    not null default now()
);

Run this code:

My.DocumentEmbeddingsTest.builder("[0.2,0.3]", "abc", "{}").build();

or

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });

Both fail with:

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: column "embedding" is of type vector but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

Expected behavior
There is a workaround, similar to the one for jsonb: pass a String and cast it to vector.

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?::vector,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });

Or, better, without casting (not currently possible):

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, new float[]{0.1F, 0.2F}, "abc", "{}");
    });

Using raw JDBC:

      pstmt.setObject(1, new float[]{0.1F, 0.2F});
      // Alternative, using pgvector-java: https://github.com/pgvector/pgvector-java
      PGvector pGvector = new PGvector(new float[]{0.1F, 0.2F});
      pstmt.setObject(1, pGvector);

The Problem
Converting a String to a vector works for small vectors, but we often use vectors of size 1024 or larger. We then have to convert a float[1024] to a String and pass it to the database, which must parse the String back into a vector. That round trip is time-consuming.
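
To make the extra work concrete, here is a minimal sketch of the client-side half of that round trip: building the text literal (such as "[0.2,0.3]") from a float[]. The class and method names (VectorLiteral, toLiteral) are made up for illustration and are not part of Manifold or pgvector-java.

```java
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
final class VectorLiteral {
    private VectorLiteral() {}

    /** Formats a float array as a pgvector-style text literal, e.g. "[0.2,0.3]". */
    static String toLiteral(float[] values) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float v : values) {
            // One float-to-text conversion per element; the server has to parse each one back.
            joiner.add(Float.toString(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        float[] embedding = new float[1024];
        for (int i = 0; i < embedding.length; i++) {
            embedding[i] = i / 1024f;
        }
        String literal = toLiteral(embedding);
        System.out.println(literal.length() + " characters for " + embedding.length + " floats");
    }
}
```

Every element is formatted on the client and parsed again on the server, so the cost grows with the vector size. Passing the float[] (or a driver-level object) directly would avoid both steps.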

Manually maintaining mappings from non-JDBC types to Java types can be an endless task. It's better to allow users to implement these mappings.
What I propose is that an unrecognized type should default to Object, not String, as converting a String to another type can be a performance issue. Additionally, there should be a way for users to map these types themselves.
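
A rough sketch of what I mean by user-supplied mappings, assuming a simple registry keyed by the database type name. ColumnTypeRegistry, register and resolve are hypothetical names, not an existing Manifold API; the only point is the Object fallback for anything unregistered.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, for illustration only; not part of Manifold.
final class ColumnTypeRegistry {
    private final Map<String, Class<?>> byTypeName = new ConcurrentHashMap<>();

    /** Lets a user declare which Java type a database type name should map to. */
    void register(String sqlTypeName, Class<?> javaType) {
        byTypeName.put(normalize(sqlTypeName), javaType);
    }

    /** Returns the registered Java type, or Object (not String) when the type is unrecognized. */
    Class<?> resolve(String sqlTypeName) {
        return byTypeName.getOrDefault(normalize(sqlTypeName), Object.class);
    }

    private static String normalize(String sqlTypeName) {
        return sqlTypeName.toLowerCase(Locale.ROOT);
    }
}
```

With something like this, a user could call register("vector", float[].class) once, and any type nobody has registered would still be passed through as Object rather than being forced into a String.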

Desktop (please complete the following information):

  • OS Type & Version:
  • Java/JDK version: 21
  • IDE version (IntelliJ IDEA or Android Studio): IntelliJ IDEA
  • Manifold version: 2024.1.16
  • Manifold IntelliJ plugin version: 2024.1.4

Additional context
I understand that ValueAccessor is for mapping, and I want to implement it myself.

public Class<?> getJavaType( BaseElement elem )
{
  return getClassForColumnClassName( elem.getColumnClassName(), Object.class );
}

It invokes getColumnClassName to get the Java type, and the PostgreSQL JDBC driver implements this method. Is this a problem in the PostgreSQL JDBC driver?

https://github.com/pgjdbc/pgjdbc/blob/450488c142fdc368cab54e8257407603acc18c4f/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSetMetaData.java#L440

For some reason I can't step into getColumnClassName while debugging.
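
For reference, this is roughly how I read the getJavaType snippet above: resolve the class name reported by the driver and fall back to a default when it can't be loaded. This is only a guess at the shape of such a helper, not Manifold's actual implementation; ColumnClassResolver and classFor are made-up names.

```java
// Hypothetical sketch, for illustration only; not Manifold's implementation.
final class ColumnClassResolver {
    private ColumnClassResolver() {}

    /** Loads the class named by the driver, or returns the fallback if it is missing or unloadable. */
    static Class<?> classFor(String className, Class<?> fallback) {
        if (className == null || className.isEmpty()) {
            return fallback;
        }
        try {
            // Load without initializing; only the Class object is needed for mapping.
            return Class.forName(className, false, ColumnClassResolver.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return fallback;
        }
    }
}
```

Note that a fallback like this only applies when the name can't be resolved. If the driver reports java.lang.String for the vector column, the result is String regardless of the Object.class default.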


@rsmckinney
Member

IIRC postgresql JDBC driver defaults to String for many special types. Using IntelliJ debugger you can "force" step into the driver code, it will be decompiled.

Anyhow, I'm going to make this work in a couple of ways. First, if the JDBC driver supports it, I'll make Object work as the Java type. Additionally, if pgvector-java is in use, I'll make sure PGvector is the Java type and that it is integrated.
