Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support sparsevec in weighted vector search function #328

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

dqii
Copy link
Contributor

@dqii dqii commented Aug 16, 2024

No description provided.

Copy link

github-actions bot commented Aug 16, 2024

Benchmarks

metric old new pct change
recall (after create) 0.952 0.952 -
recall (after insert) 0.000 0.000 -
select tps 26694.845 26981.450 +1.07%
select bulk(100) tps 36.678 35.175 -4.10%
select latency (ms) 0.705 ± 1.491𝜎 0.684 ± 1.477𝜎 -2.98%
select bulk(100) latency (ms) 855.498 ± 112.767𝜎 878.168 ± 117.118𝜎 +2.65%
create latency (ms) 391018.657 407151.075 +4.13%
insert tps 441.719 440.408 -0.30%
insert bulk(100) tps 4.753 4.560 -4.05%
insert latency (ms) 71.822 ± 16.181𝜎 72.047 ± 14.955𝜎 +0.31%
insert bulk(100) latency (ms) 6629.397 ± 96.700𝜎 6844.803 ± 257.844𝜎 +3.25%
disk usage (bytes) 8192008192.000 8192008192.000 -

@dqii dqii force-pushed the di/weighted-vector-search branch 9 times, most recently from 863cdc9 to 5e89295 Compare August 18, 2024 02:09
@dqii dqii force-pushed the di/weighted-vector-search branch from 5e89295 to b689c2d Compare August 18, 2024 09:01
Copy link

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

@dqii dqii marked this pull request as ready for review August 18, 2024 09:35
@dqii dqii requested review from Ngalstyan4 and var77 August 18, 2024 09:35
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This pull request adds support for sparsevec in weighted vector search functions, focusing on enhancing the functionality and flexibility of vector operations in the Lantern project. Here are the key points:

  1. Introduced new overloads for weighted_vector_search functions to handle combinations of vector and sparsevec types in sql/lantern.sql and sql/updates/0.3.1--0.3.2.sql.

  2. Added a new small_world_sparsevec.sql file with sample data for testing sparse vector functionality alongside dense vectors.

  3. Updated test/sql/weighted_search.sql to include conditional tests for sparsevec, ensuring compatibility with systems that may not have the type available.

  4. Modified docker/Dockerfile.dev to upgrade pgvector to version 0.6.1 and simplify the development environment setup process.

  5. Enhanced input validation, error handling, and vector type casting in the weighted vector search functions to improve robustness and flexibility.

6 file(s) reviewed, 24 comment(s)
Edit PR Review Bot Settings

@@ -31,7 +31,7 @@ RUN gem install pg -- --with-pg-include=/usr/local/pgsql/include/ --with-pg-lib=
# hack to make sure postgres user has write access to externally mounted volumes
RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared

RUN cd /root/postgresql-15.5/contrib && make install -j
RUN cd /root/postgresql-15.5/contrib && make install
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Removed '-j' flag from make install. This might slow down the build process.

@@ -55,7 +55,7 @@ COPY . .
RUN sudo rm -rf build \
&& mkdir build \
&& cd build \
&& cmake -DCMAKE_BUILD_TYPE=Debug .. \
&& cmake .. \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Removed '-DCMAKE_BUILD_TYPE=Debug'. This changes the build type from Debug to default (usually Release). Ensure this is intentional and won't affect development workflows.

@@ -65,7 +65,7 @@ RUN git clone https://github.com/lanterndata/benchmark build/benchmark \
&& pip install -r external/requirements.txt

# Install perf
RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
RUN sudo apt update && sudo apt install -y linux-tools-common linux-tools-generic
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Removed 'linux-tools-uname -r'. This might cause issues if specific kernel version tools are needed.

@@ -1,5 +1,5 @@
ARG VERSION=15
ARG PGVECTOR_VERSION=0.5.1
ARG PGVECTOR_VERSION=0.6.1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Upgraded pgvector from 0.5.1 to 0.6.1. Verify compatibility with the rest of the system.

@@ -667,6 +667,7 @@ CREATE OR REPLACE FUNCTION _lantern_internal.maybe_setup_weighted_vector_search(
$weighted_vector_search$
DECLARE
pgvector_exists boolean;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Added check for pgvector sparsevec type existence

Comment on lines +9 to +16
('000', TRUE, '[0,0,0]', '{}/3'),
('001', TRUE, '[0,0,1]', '{3:1}/3'),
('010', FALSE, '[0,1,0]' , '{2:1}/3'),
('011', TRUE, '[0,1,1]', '{2:1,3:1}/3'),
('100', FALSE, '[1,0,0]', '{1:1}/3'),
('101', FALSE, '[1,0,1]', '{1:1,3:1}/3'),
('110', FALSE, '[1,1,0]', '{1:1,2:1}/3'),
('111', TRUE, '[1,1,1]', '{1:1,2:1,3:1}/3');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Inserted test data covering all 3-bit combinations, with corresponding sparse vectors

Comment on lines 123 to 128
-- Check if the type 'sparsevec' exists and store the result in a variable
SELECT EXISTS (
SELECT 1
FROM pg_type
WHERE typname = 'sparsevec'
) AS exists_sparsevec \gset
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Added check for 'sparsevec' type existence

Comment on lines 131 to 145
\if :exists_sparsevec
\echo 'The sparsevec type exists. Running commands...'
\ir utils/small_world_sparsevec.sql
SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset
SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset
SELECT
id,
0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist
FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5,
w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec,
w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector
);
\else
\echo 'The sparsevec type does not exist. Skipping commands...'
\endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Introduced conditional execution for sparsevec tests

Comment on lines 136 to 142
SELECT
id,
0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as dist
FROM lantern.weighted_vector_search(CAST(NULL as "small_world"), exact => false, ef => 5,
w1=> 0.9, col1=>'s'::text, vec1=>:'s3'::sparsevec,
w2=> 0.1, col2=>'v'::text, vec2=>:'v3'::vector
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: New test case for weighted vector search with sparsevec and vector types

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do something like this?

round(cast(0.9 * (s <-> :'s3'::sparsevec) + 0.1 * (v <-> :'v3'::vector) as numeric), 2) as dist

On arm processors the high precision floats seems to slightly differ from x86 ones which makes tests to fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed^

Comment on lines 134 to 135
SELECT '{1:0.4,2:0.3,3:0.2}/3' AS s3 \gset
SELECT '[-0.5,-0.1,-0.3]' AS v3 \gset
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using more descriptive variable names than 's3' and 'v3'

@@ -121,6 +120,30 @@ SELECT count(*)
w3=> 0.52, col3=>'v_real'::text, vec3=>:'v444'::vector
);

-- Check if the type 'sparsevec' exists and store the result in a variable
SELECT EXISTS (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we follow the current pattern and not do this in the test itself?

Maybe, see how we filter out pgvector or lantern extras tests when relevant extensions are not installed and do something similar here

@@ -31,7 +31,7 @@ RUN gem install pg -- --with-pg-include=/usr/local/pgsql/include/ --with-pg-lib=
# hack to make sure postgres user has write access to externally mounted volumes
RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared

RUN cd /root/postgresql-15.5/contrib && make install -j
RUN cd /root/postgresql-15.5/contrib && make install
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this?

Copy link
Contributor Author

@dqii dqii Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

440.5 /usr/bin/install -c -m 755  postgres_fdw.so '/usr/local/pgsql/lib/postgres_fdw.so'
440.5 /usr/bin/install -c -m 644 ./postgres_fdw.control '/usr/local/pgsql/share/extension/'
440.6 /usr/bin/install -c -m 644 ./postgres_fdw--1.0.sql ./postgres_fdw--1.0--1.1.sql  '/usr/local/pgsql/share/extension/'
440.6 make[1]: Leaving directory '/root/postgresql-15.5/contrib/postgres_fdw'
------
Dockerfile.dev:34
--------------------
  32 |     RUN mkdir /lantern_shared && chown postgres:postgres /lantern_shared
  33 |     
  34 | >>> RUN cd /root/postgresql-15.5/contrib && make install -j
  35 |     
  36 |     # allow non-root users to install in the container to make it easier to run update-tests
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /root/postgresql-15.5/contrib && make install -j" did not complete successfully: exit code: 2

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/ojyc3i59rx9nve3a7qeqpdid7
diqi@Dis-Laptop lantern % 

The error I get with -j

@@ -55,7 +55,7 @@ COPY . .
RUN sudo rm -rf build \
&& mkdir build \
&& cd build \
&& cmake -DCMAKE_BUILD_TYPE=Debug .. \
&& cmake .. \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this?

The dockerfile is meant for development and I often use GDB from inside it, so having a build with symbols is often good.

Though, I usually attach a folder from host and rebuild the DB for debugging so not a big deal but would still like to know why the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build did not succeed for me without it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants