VStream is a distributed streaming vector search system with the following features:
- Vector search on streaming data
- Dynamic data partitioning
- Hierarchical storage mechanism
- Vector compression
- Hot-cold separation
Building VStream requires:

- Linux
- gcc >= 11
- cmake >= 3.10
- Java 8
- Maven >= 3.8.6

To build, run:

```bash
bash build.sh
```
After building, make sure to do the following on every machine in your Flink cluster:

- Put `build/java/librocksdbjni-shared.so` in a directory on `LD_LIBRARY_PATH` and rename it to `librocksdbjni.so`.
- Copy `build/java/rocksdbjni_classes.jar` to Flink's `lib` directory.
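The two steps above can be sketched as a short shell script. It is demonstrated here against a scratch directory so it is safe to dry-run; `FLINK_HOME` and `LIB_DIR` are assumptions you should point at the real Flink install and a real `LD_LIBRARY_PATH` directory on each machine:

```shell
# Simulate the build output and target directories in a scratch area;
# replace SCRATCH-based paths with real ones on your cluster.
SCRATCH=$(mktemp -d)
mkdir -p "$SCRATCH/build/java" "$SCRATCH/flink/lib" "$SCRATCH/lib"
touch "$SCRATCH/build/java/librocksdbjni-shared.so" \
      "$SCRATCH/build/java/rocksdbjni_classes.jar"

FLINK_HOME="$SCRATCH/flink"   # assumption: your Flink installation directory
LIB_DIR="$SCRATCH/lib"        # assumption: a directory on LD_LIBRARY_PATH

# Step 1: copy the JNI library under the name the loader expects.
cp "$SCRATCH/build/java/librocksdbjni-shared.so" "$LIB_DIR/librocksdbjni.so"

# Step 2: make the RocksDB Java classes visible to Flink.
cp "$SCRATCH/build/java/rocksdbjni_classes.jar" "$FLINK_HOME/lib/"

ls "$LIB_DIR" "$FLINK_HOME/lib"
```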
Before running the experiments, upload the vector dataset to HDFS. The dataset is expected to be in SIFT format. Run the helper class with `-h` to see the upload options:

```bash
java -cp ./build/flink-frontend/vstream-1.1.jar cn.edu.zju.daily.util.DataUploader -h
```
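Since the uploader expects SIFT-format vectors, it can be useful to sanity-check a dataset before uploading. The sketch below parses the standard `.fvecs` layout (per vector: a little-endian int32 dimension followed by that many float32 values); `FvecsReader` is an illustrative helper, not part of VStream:

```java
import java.io.*;
import java.nio.*;
import java.nio.file.*;
import java.util.*;

// Illustrative reader for the SIFT .fvecs layout: each vector is stored
// as a little-endian 4-byte dimension followed by dim 4-byte floats.
public class FvecsReader {
    public static List<float[]> read(Path path) throws IOException {
        List<float[]> vectors = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            byte[] dimBuf = new byte[4];
            while (in.read(dimBuf) == 4) {
                int dim = ByteBuffer.wrap(dimBuf)
                        .order(ByteOrder.LITTLE_ENDIAN).getInt();
                byte[] body = new byte[dim * 4];
                in.readFully(body);
                float[] v = new float[dim];
                ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN)
                        .asFloatBuffer().get(v);
                vectors.add(v);
            }
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny two-vector .fvecs file, then read it back.
        Path tmp = Files.createTempFile("demo", ".fvecs");
        ByteBuffer buf = ByteBuffer.allocate(2 * (4 + 2 * 4))
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(2).putFloat(1.0f).putFloat(2.0f);
        buf.putInt(2).putFloat(3.0f).putFloat(4.0f);
        Files.write(tmp, buf.array());

        List<float[]> vs = read(tmp);
        System.out.println(vs.size() + " vectors, dim " + vs.get(0).length);
        // prints "2 vectors, dim 2"
    }
}
```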
An example configuration file is provided at `flink-frontend/src/main/resources/params.yaml`; it contains runtime parameters for HDFS, RocksDB, the HNSW index, and the Flink job.
Run the experiment pipeline by submitting the Flink job:

```bash
flink run -c cn.edu.zju.daily.VStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>
```

where `<params.yaml>` is the path to your configuration file.
This repo also contains a baseline solution using Milvus 2.3. To run the baseline, start a Milvus 2.3 cluster, fill in the Milvus connection information in the configuration file, and run:

```bash
flink run -c cn.edu.zju.daily.MilvusStreamSearchJob ./build/flink-frontend/vstream-1.1.jar <params.yaml>
```
This system uses code from RocksDB and Apache Flink, both licensed under the Apache License 2.0.