Merge pull request #14 from lerouxrgd/ngt-2
NGT 2
lerouxrgd authored Aug 25, 2023
2 parents 53b3e1a + fcfd3de commit 45d9b95
Showing 18 changed files with 2,266 additions and 557 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/ci.yaml
@@ -17,11 +17,15 @@ jobs:
matrix:
os: [ubuntu-latest]
feature:
- default
- shared_mem
- large_data
- shared_mem,large_data
- quantized
- quantized,qg_optim
- large_data,shared_mem
- large_data,quantized
- static
- static,quantized
- static,quantized,qg_optim
- static,shared_mem,large_data
steps:
- uses: actions/checkout@v3
16 changes: 11 additions & 5 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "ngt"
version = "0.4.5"
version = "0.5.0"
authors = ["Romain Leroux <[email protected]>"]
edition = "2021"
description = "Rust wrappers for NGT nearest neighbor search."
@@ -11,17 +11,23 @@ license = "Apache-2.0"
readme = "README.md"

[dependencies]
ngt-sys = { path = "ngt-sys", version = "1.14.8-static" }
num_enum = "0.5"
openmp-sys = { version="1.2.3", features=["static"] }
half = "2"
ngt-sys = { path = "ngt-sys", version = "2.1.2" }
num_enum = "0.7"
scopeguard = "1"

[dev-dependencies]
rand = "0.8"
rayon = "1"
tempfile = "3"

[features]
default = []
static = ["ngt-sys/static"]
shared_mem = ["ngt-sys/shared_mem"]
large_data = ["ngt-sys/large_data"]
quantized = ["ngt-sys/quantized"]
qg_optim = ["quantized", "ngt-sys/qg_optim"]

[package.metadata.docs.rs]
features = ["quantized"]
rustdoc-args = ["--cfg", "docsrs"]
108 changes: 42 additions & 66 deletions README.md
@@ -1,79 +1,55 @@
# ngt-rs &emsp; [![Latest Version]][crates.io] [![Latest Doc]][docs.rs]
# ngt-rs

[Latest Version]: https://img.shields.io/crates/v/ngt.svg
[crates.io]: https://crates.io/crates/ngt
[Latest Doc]: https://docs.rs/ngt/badge.svg
[docs.rs]: https://docs.rs/ngt
[![crate]][crate-ngt] [![doc]][doc-ngt]

Rust wrappers for [NGT][], which provides high-speed approximate nearest neighbor
searches against a large volume of data.

Building NGT requires `CMake`. By default `ngt-rs` will be built dynamically, which
means that you'll need to make the build artifact `libngt.so` available to your final
binary. You'll also need to have `OpenMP` installed on the system where it will run. If
you want to build `ngt-rs` statically, then use the `static` Cargo feature, note that in
this case `OpenMP` will be disabled when building NGT.

Furthermore, NGT's shared memory and large dataset features are available through Cargo
features `shared_mem` and `large_data` respectively.

## Usage

Defining the properties of a new index:
[crate]: https://img.shields.io/crates/v/ngt.svg
[crate-ngt]: https://crates.io/crates/ngt
[doc]: https://docs.rs/ngt/badge.svg
[doc-ngt]: https://docs.rs/ngt

```rust
use ngt::{Properties, DistanceType, ObjectType};

// Default properties with vectors of dimension 3
let prop = Properties::dimension(3)?;

// Or customize values (here are the defaults)
let prop = Properties::dimension(3)?
.creation_edge_size(10)?
.search_edge_size(40)?
.object_type(ObjectType::Float)?
.distance_type(DistanceType::L2)?;
```

Creating/Opening an index and using it:

```rust
use ngt::{Index, Properties, EPSILON};
Rust wrappers for [NGT][], which provides high-speed approximate nearest neighbor
searches against a large volume of data in a high-dimensional vector space (from several
tens to several thousands of dimensions). The vector data can be `f32`, `u8`, or [f16][].

// Create a new index
let prop = Properties::dimension(3)?;
let index = Index::create("target/path/to/index/dir", prop)?;
This crate provides the following indexes:
* [`NgtIndex`][index-ngt]: Graph and tree based index[^1]
* [`QgIndex`][index-qg]: Quantized graph based index[^2]
* [`QbgIndex`][index-qbg]: Quantized blob graph based index

// Open an existing index
let mut index = Index::open("target/path/to/index/dir")?;
Both quantized indexes are available through the `quantized` Cargo feature. Note that
they rely on `BLAS` and `LAPACK`, which therefore have to be installed locally.
Furthermore, `QgIndex` performance can be [improved][qg-optim] by using the `qg_optim`
Cargo feature.

// Insert two vectors and get their id
let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];
let id1 = index.insert(vec1)?;
let id2 = index.insert(vec2)?;
The `NgtIndex` default implementation is an ANNG. It can be optimized[^3] or converted
to an ONNG through the [`optim`][ngt-optim] module.

// Actually build the index (not yet persisted on disk)
// This is required in order to be able to search vectors
index.build(2)?;
By default `ngt-rs` will be built dynamically, which requires `CMake` to build NGT. This
means that you'll have to make the build artifact `libngt.so` available to your final
binary (see an example in the [CI][ngt-ci]). However, the `static` feature will build and
link NGT statically. Note that `OpenMP` will also be linked statically. If the
`quantized` feature is used, then the `BLAS` and `LAPACK` libraries will also be linked
statically.

// Perform a vector search (with 1 result)
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id1);
assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);
NGT's [shared memory][ngt-sharedmem] and [large dataset][ngt-largedata] features are
available through the Cargo features `shared_mem` and `large_data` respectively.

// Remove a vector and check that it is not present anymore
index.remove(id1)?;
let res = index.get_vec(id1);
assert!(matches!(res, Result::Err(_)));
[^1]: [Graph and tree based method explanation][ngt-desc]

// Verify that now our search result is different
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id2);
assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);
[^2]: [Quantized graph based method explanation][qg-desc]

// Persist index on disk
index.persist()?;
```
[^3]: [NGT index optimizations in Python][ngt-optim-py]

[ngt]: https://github.com/yahoojapan/NGT
[ngt-desc]: https://opensource.com/article/19/10/ngt-open-source-library
[ngt-sharedmem]: https://github.com/yahoojapan/NGT#shared-memory-use
[ngt-largedata]: https://github.com/yahoojapan/NGT#large-scale-data-use
[ngt-ci]: https://github.com/lerouxrgd/ngt-rs/blob/master/.github/workflows/ci.yaml
[ngt-optim]: https://docs.rs/ngt/latest/ngt/optim/index.html
[ngt-optim-py]: https://github.com/yahoojapan/NGT/wiki/Optimization-Examples-Using-Python
[qg-desc]: https://medium.com/@masajiro.iwasaki/fusion-of-graph-based-indexing-and-product-quantization-for-ann-search-7d1f0336d0d0
[qg-optim]: https://github.com/yahoojapan/NGT#build-parameters-1
[f16]: https://docs.rs/half/latest/half/struct.f16.html
[index-ngt]: https://docs.rs/ngt/latest/ngt/#usage
[index-qg]: https://docs.rs/ngt/latest/ngt/qg/
[index-qbg]: https://docs.rs/ngt/latest/ngt/qbg/
6 changes: 4 additions & 2 deletions ngt-sys/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "ngt-sys"
version = "1.14.8-static"
version = "2.1.2"
authors = ["Romain Leroux <[email protected]>"]
edition = "2021"
links = "ngt"
@@ -18,4 +18,6 @@ cpp_build = { version = "0.5", optional = true }
[features]
static = ["dep:cpp_build"]
shared_mem = []
large_data = []
large_data = []
quantized = []
qg_optim = []
2 changes: 1 addition & 1 deletion ngt-sys/NGT
Submodule NGT updated 87 files
+28 −9 CMakeLists.txt
+126 −31 README-jp.md
+135 −35 README.md
+1 −1 VERSION
+4 −3 bin/CMakeLists.txt
+1 −1 bin/ngt/CMakeLists.txt
+2 −0 bin/ngt/ngt.cpp
+0 −13 bin/ngtq/CMakeLists.txt
+0 −122 bin/ngtq/README-jp.md
+0 −124 bin/ngtq/README.md
+0 −30 bin/ngtq/ngtq.cpp
+0 −15 bin/ngtqg/CMakeLists.txt
+0 −138 bin/ngtqg/README.md
+13 −0 bin/qbg/CMakeLists.txt
+287 −0 bin/qbg/README.md
+2 −2 bin/qbg/qbg.cpp
+13 −13 lib/NGT/ArrayFile.h
+7 −4 lib/NGT/CMakeLists.txt
+388 −9 lib/NGT/Capi.cpp
+60 −0 lib/NGT/Capi.h
+233 −80 lib/NGT/Clustering.h
+56 −20 lib/NGT/Command.cpp
+2 −1 lib/NGT/Command.h
+316 −120 lib/NGT/Common.h
+82 −77 lib/NGT/Graph.cpp
+22 −16 lib/NGT/Graph.h
+11 −9 lib/NGT/GraphOptimizer.h
+44 −44 lib/NGT/GraphReconstructor.h
+1 −1 lib/NGT/HashBasedBooleanSet.h
+95 −74 lib/NGT/Index.cpp
+149 −57 lib/NGT/Index.h
+13 −13 lib/NGT/MmapManager.cpp
+20 −20 lib/NGT/MmapManager.h
+30 −30 lib/NGT/MmapManagerDefs.h
+22 −22 lib/NGT/MmapManagerImpl.hpp
+653 −16 lib/NGT/NGTQ/Capi.cpp
+167 −100 lib/NGT/NGTQ/Capi.h
+641 −0 lib/NGT/NGTQ/HierarchicalKmeans.cpp
+1,312 −0 lib/NGT/NGTQ/HierarchicalKmeans.h
+687 −0 lib/NGT/NGTQ/Matrix.h
+0 −612 lib/NGT/NGTQ/NGTQCommand.h
+0 −296 lib/NGT/NGTQ/NGTQGCommand.cpp
+0 −117 lib/NGT/NGTQ/NGTQGCommand.h
+616 −0 lib/NGT/NGTQ/ObjectFile.h
+596 −0 lib/NGT/NGTQ/Optimizer.cpp
+395 −0 lib/NGT/NGTQ/Optimizer.h
+1,652 −0 lib/NGT/NGTQ/QbgCli.cpp
+134 −0 lib/NGT/NGTQ/QbgCli.h
+1,756 −0 lib/NGT/NGTQ/QuantizedBlobGraph.h
+80 −0 lib/NGT/NGTQ/QuantizedGraph.cpp
+204 −49 lib/NGT/NGTQ/QuantizedGraph.h
+2,766 −612 lib/NGT/NGTQ/Quantizer.h
+14 −1 lib/NGT/Node.cpp
+11 −11 lib/NGT/Node.h
+23 −16 lib/NGT/ObjectRepository.h
+120 −49 lib/NGT/ObjectSpace.h
+38 −18 lib/NGT/ObjectSpaceRepository.h
+54 −54 lib/NGT/Optimizer.h
+28 −28 lib/NGT/PrimitiveComparator.h
+4 −4 lib/NGT/SharedMemoryAllocator.cpp
+9 −9 lib/NGT/SharedMemoryAllocator.h
+2 −2 lib/NGT/Thread.h
+7 −8 lib/NGT/Tree.cpp
+3 −3 lib/NGT/Tree.h
+17 −17 lib/NGT/Version.cpp
+12 −1 lib/NGT/defines.h.in
+3,202 −3,202 lib/NGT/half.hpp
+1 −32 python/README-jp.md
+58 −12 python/README-ngtpy-jp.md
+21 −16 python/README-ngtpy.md
+2 −32 python/README.md
+71 −10 python/setup.py
+570 −28 python/src/ngtpy.cpp
+5 −0 samples/CMakeLists.txt
+5 −6 samples/jaccard-sparse/jaccard-sparse.cpp
+9 −0 samples/qbg-capi/CMakeLists.txt
+138 −0 samples/qbg-capi/qbg-capi.cpp
+9 −0 samples/qg-capi/CMakeLists.txt
+148 −0 samples/qg-capi/qg-capi.cpp
+9 −0 samples/qg-l2-float/CMakeLists.txt
+123 −0 samples/qg-l2-float/qg-l2-float.cpp
+ tests/ann-benchmarks-results/fashion-mnist-784-euclidean.png
+ tests/ann-benchmarks-results/gist-960-euclidean.png
+ tests/ann-benchmarks-results/glove-100-angular.png
+ tests/ann-benchmarks-results/glove-25-angular.png
+ tests/ann-benchmarks-results/nytimes-256-angular.png
+ tests/ann-benchmarks-results/sift-128-euclidean.png
47 changes: 33 additions & 14 deletions ngt-sys/build.rs
@@ -5,35 +5,54 @@ fn main() {
let out_dir = env::var("OUT_DIR").unwrap();

let mut config = cmake::Config::new("NGT");

if env::var("CARGO_FEATURE_SHARED_MEM").is_ok() {
config.define("NGT_SHARED_MEMORY_ALLOCATOR", "ON");
}

if env::var("CARGO_FEATURE_LARGE_DATA").is_ok() {
config.define("NGT_LARGE_DATASET", "ON");
}

if env::var("CARGO_FEATURE_QUANTIZED").is_err() {
config.define("NGT_QBG_DISABLED", "ON");
} else {
config.define("CMAKE_BUILD_TYPE", "Release");
if env::var("CARGO_FEATURE_QG_OPTIM").is_ok() {
config.define("NGTQG_NO_ROTATION", "ON");
config.define("NGTQG_ZERO_GLOBAL", "ON");
}
}
let dst = config.build();

#[cfg(feature = "static")]
cpp_build::Config::new()
.include(format!("{}/lib", out_dir))
.build("src/lib.rs");

println!("cargo:rustc-link-search=native={}/lib", dst.display());
#[cfg(feature = "static")]
println!("cargo:rustc-link-lib=static=ngt");
#[cfg(not(feature = "static"))]
println!("cargo:rustc-link-lib=dylib=ngt");
{
println!("cargo:rustc-link-lib=dylib=ngt");
}
#[cfg(feature = "static")]
{
cpp_build::Config::new()
.include(format!("{}/lib", out_dir))
.build("src/lib.rs");
println!("cargo:rustc-link-lib=static=ngt");
println!("cargo:rustc-link-lib=gomp");

if env::var("CARGO_FEATURE_QUANTIZED").is_ok() {
println!("cargo:rustc-link-lib=blas");
println!("cargo:rustc-link-lib=lapack");
}
}

let capi_header = if cfg!(feature = "quantized") {
format!("{}/include/NGT/NGTQ/Capi.h", dst.display())
} else {
format!("{}/include/NGT/Capi.h", dst.display())
};

let out_path = PathBuf::from(out_dir);
let bindings = bindgen::Builder::default()
.clang_arg(format!("-I{}/include", dst.display()))
.header(format!("{}/include/NGT/NGTQ/Capi.h", dst.display()))
.header(capi_header)
.generate()
.expect("Unable to generate bindings");

let out_path = PathBuf::from(out_dir);
bindings
.write_to_file(out_path.join("bindings.rs"))
.expect("Couldn't write bindings");
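
Review note: the feature handling in this `build.rs` hunk can be summarized as a pure function from the enabled Cargo features to the CMake defines and link directives. The sketch below is only an illustration of the branches visible in the diff. The helper names `has`, `cmake_defines`, and `link_libs` are hypothetical, and the feature set is passed as a slice instead of being read from the `CARGO_FEATURE_*` environment variables, so that the logic can be checked in isolation.

```rust
/// Returns true if `name` is among the enabled features.
/// (Hypothetical helper; the real script checks `CARGO_FEATURE_*` variables.)
fn has(features: &[&str], name: &str) -> bool {
    features.iter().any(|f| *f == name)
}

/// CMake defines chosen for a feature set, following the branches in the diff.
fn cmake_defines(features: &[&str]) -> Vec<(&'static str, &'static str)> {
    let mut defines = Vec::new();
    if has(features, "shared_mem") {
        defines.push(("NGT_SHARED_MEMORY_ALLOCATOR", "ON"));
    }
    if has(features, "large_data") {
        defines.push(("NGT_LARGE_DATASET", "ON"));
    }
    if !has(features, "quantized") {
        // Without `quantized`, the QBG/QG parts of NGT are switched off.
        defines.push(("NGT_QBG_DISABLED", "ON"));
    } else {
        defines.push(("CMAKE_BUILD_TYPE", "Release"));
        // `qg_optim` only has an effect when `quantized` is enabled.
        if has(features, "qg_optim") {
            defines.push(("NGTQG_NO_ROTATION", "ON"));
            defines.push(("NGTQG_ZERO_GLOBAL", "ON"));
        }
    }
    defines
}

/// Values that would follow `cargo:rustc-link-lib=` for a feature set.
fn link_libs(features: &[&str]) -> Vec<&'static str> {
    if !has(features, "static") {
        return vec!["dylib=ngt"];
    }
    let mut libs = vec!["static=ngt", "gomp"];
    if has(features, "quantized") {
        libs.extend(["blas", "lapack"]);
    }
    libs
}
```

For example, `cmake_defines(&["quantized", "qg_optim"])` yields the `Release` build type plus both `NGTQG_*` defines, while an empty feature set yields only `NGT_QBG_DISABLED`.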
50 changes: 44 additions & 6 deletions src/error.rs
@@ -3,8 +3,6 @@ use std::fmt;

use ngt_sys as sys;

use crate::properties::{DistanceType, ObjectType};

pub type Result<T> = std::result::Result<T, Error>;

#[derive(Debug)]
@@ -25,6 +23,12 @@ pub(crate) fn make_err(err: sys::NGTError) -> Error {
Error(err_msg)
}

impl From<String> for Error {
fn from(err: String) -> Self {
Self(err)
}
}

impl From<std::io::Error> for Error {
fn from(source: std::io::Error) -> Self {
Self(source.to_string())
@@ -43,14 +47,48 @@ impl From<std::ffi::NulError> for Error {
}
}

impl From<num_enum::TryFromPrimitiveError<ObjectType>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<ObjectType>) -> Self {
impl From<std::ffi::IntoStringError> for Error {
fn from(source: std::ffi::IntoStringError) -> Self {
Self(source.to_string())
}
}

impl From<num_enum::TryFromPrimitiveError<crate::NgtObject>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::NgtObject>) -> Self {
Self(source.to_string())
}
}

impl From<num_enum::TryFromPrimitiveError<crate::NgtDistance>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::NgtDistance>) -> Self {
Self(source.to_string())
}
}

#[cfg(feature = "quantized")]
impl From<num_enum::TryFromPrimitiveError<crate::qg::QgObject>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::qg::QgObject>) -> Self {
Self(source.to_string())
}
}

#[cfg(feature = "quantized")]
impl From<num_enum::TryFromPrimitiveError<crate::qg::QgDistance>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::qg::QgDistance>) -> Self {
Self(source.to_string())
}
}

#[cfg(feature = "quantized")]
impl From<num_enum::TryFromPrimitiveError<crate::qbg::QbgObject>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::qbg::QbgObject>) -> Self {
Self(source.to_string())
}
}

impl From<num_enum::TryFromPrimitiveError<DistanceType>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<DistanceType>) -> Self {
#[cfg(feature = "quantized")]
impl From<num_enum::TryFromPrimitiveError<crate::qbg::QbgDistance>> for Error {
fn from(source: num_enum::TryFromPrimitiveError<crate::qbg::QbgDistance>) -> Self {
Self(source.to_string())
}
}
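
Review note: the `From` impls added in this file all follow the same pattern, converting a source error into the crate's string-backed `Error` so that `?` can be used across error types. The sketch below reproduces that pattern with standard-library types only. The `Error` struct here is a stand-in for the crate's own type, and `to_c_path` is a hypothetical helper that does not appear in the diff.

```rust
use std::ffi::CString;
use std::fmt;

/// Stand-in for the crate's string-backed error type.
#[derive(Debug)]
pub struct Error(pub String);

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for Error {}

// Same shape as the `From<String>` impl added in the diff.
impl From<String> for Error {
    fn from(err: String) -> Self {
        Self(err)
    }
}

// Same shape as the existing `From<std::ffi::NulError>` impl.
impl From<std::ffi::NulError> for Error {
    fn from(source: std::ffi::NulError) -> Self {
        Self(source.to_string())
    }
}

/// Hypothetical helper: converts a path to a C string for an FFI call.
fn to_c_path(path: &str) -> std::result::Result<CString, Error> {
    if path.is_empty() {
        // `From<String>` lets an ad hoc message become an `Error` via `.into()`.
        return Err(String::from("empty path").into());
    }
    // `From<NulError>` lets `?` convert the failure automatically.
    Ok(CString::new(path)?)
}
```

With these conversions in place, a caller can mix custom messages and library errors in one function without explicit `map_err` calls.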