Update doc and improve QG setup
lerouxrgd committed Aug 25, 2023
1 parent 182747c commit fcfd3de
Showing 13 changed files with 302 additions and 189 deletions.
1 change: 0 additions & 1 deletion .github/workflows/ci.yaml
Expand Up @@ -17,7 +17,6 @@ jobs:
matrix:
os: [ubuntu-latest]
feature:
- default
- shared_mem
- large_data
- quantized
Expand Down
7 changes: 5 additions & 2 deletions Cargo.toml
Expand Up @@ -13,7 +13,7 @@ readme = "README.md"
[dependencies]
half = "2"
ngt-sys = { path = "ngt-sys", version = "2.1.2" }
num_enum = "0.5"
num_enum = "0.7"
scopeguard = "1"

[dev-dependencies]
Expand All @@ -22,9 +22,12 @@ rayon = "1"
tempfile = "3"

[features]
default = ["quantized", "qg_optim"] # TODO: should not be default
static = ["ngt-sys/static"]
shared_mem = ["ngt-sys/shared_mem"]
large_data = ["ngt-sys/large_data"]
quantized = ["ngt-sys/quantized"]
qg_optim = ["quantized", "ngt-sys/qg_optim"]

[package.metadata.docs.rs]
features = ["quantized"]
rustdoc-args = ["--cfg", "docsrs"]
110 changes: 31 additions & 79 deletions README.md
@@ -1,103 +1,55 @@
# ngt-rs   [![Latest Version]][crates.io] [![Latest Doc]][docs.rs]
# ngt-rs

[Latest Version]: https://img.shields.io/crates/v/ngt.svg
[crates.io]: https://crates.io/crates/ngt
[Latest Doc]: https://docs.rs/ngt/badge.svg
[docs.rs]: https://docs.rs/ngt
[![crate]][crate-ngt] [![doc]][doc-ngt]

[crate]: https://img.shields.io/crates/v/ngt.svg
[crate-ngt]: https://crates.io/crates/ngt
[doc]: https://docs.rs/ngt/badge.svg
[doc-ngt]: https://docs.rs/ngt

Rust wrappers for [NGT][], which provides high-speed approximate nearest neighbor
searches against a large volume of data in high dimensional vector data space (several
ten to several thousand dimensions).
ten to several thousand dimensions). The vector data can be `f32`, `u8`, or [f16][].

This crate provides the following indexes:
* `NgtIndex`: Graph and tree-based index[^1]
* `QgIndex`: Quantized graph-based index[^2]
* `QbgIndex`: Quantized blob graph-based index
* [`NgtIndex`][index-ngt]: Graph and tree based index[^1]
* [`QgIndex`][index-qg]: Quantized graph based index[^2]
* [`QbgIndex`][index-qbg]: Quantized blob graph based index

Both quantized indexes are available through the `quantized` Cargo feature. Note that
they rely on `BLAS` and `LAPACK` which thus have to be installed locally. The CPU
running the code must also support `AVX2` instructions. Furthermore, `QgIndex`
performances can be [improved][qg-optim] by using the `qg_optim` Cargo feature.
they rely on `BLAS` and `LAPACK`, which therefore have to be installed locally. Furthermore,
`QgIndex` performance can be [improved][qg-optim] by using the `qg_optim` Cargo
feature.

The `NgtIndex` default implementation is an ANNG, it can be optimized[^3] or converted
The `NgtIndex` default implementation is an ANNG. It can be optimized[^3] or converted
to an ONNG through the [`optim`][ngt-optim] module.

By default `ngt-rs` will be built dynamically, which requires `CMake` to build NGT. This
means that you'll have to make the build artifact `libngt.so` available to your final
binary (see an example in the [CI][ngt-ci]).

However the `static` feature will build and link NGT statically. Note that `OpenMP` will
also be linked statically. If the `quantized` feature is used, then `BLAS` and `LAPACK`
libraries will also be linked statically.

Finally, NGT's [shared memory][ngt-sharedmem] and [large dataset][ngt-largedata]
features are available through the features `shared_mem` and `large_data` respectively.

## Usage

Defining the properties of a new index:

```rust,ignore
use ngt::{NgtProperties, NgtDistance};
// Default properties with vectors of dimension 3
let prop = NgtProperties::<f32>::dimension(3)?;
// Or customize values (here are the defaults)
let prop = NgtProperties::<f32>::dimension(3)?
.creation_edge_size(10)?
.search_edge_size(40)?
.distance_type(NgtDistance::L2)?;
```
binary (see an example in the [CI][ngt-ci]). However, the `static` feature will build and
link NGT statically. Note that `OpenMP` will also be linked statically. If the
`quantized` feature is used, then `BLAS` and `LAPACK` libraries will also be linked
statically.

Creating/Opening an index and using it:
NGT's [shared memory][ngt-sharedmem] and [large dataset][ngt-largedata] features are
available through the Cargo features `shared_mem` and `large_data` respectively.

```rust,ignore
use ngt::{NgtIndex, NgtProperties, EPSILON};
[^1]: [Graph and tree based method explanation][ngt-desc]

// Create a new index
let prop = NgtProperties::dimension(3)?;
let index: NgtIndex<f32> = NgtIndex::create("target/path/to/index/dir", prop)?;
[^2]: [Quantized graph based method explanation][qg-desc]

// Open an existing index
let mut index = NgtIndex::open("target/path/to/index/dir")?;
// Insert two vectors and get their id
let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];
let id1 = index.insert(vec1)?;
let id2 = index.insert(vec2)?;
// Build the index in RAM (not yet persisted on disk)
// This is required in order to be able to search vectors
index.build(2)?;
// Perform a vector search (with 1 result)
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id1);
assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);
// Remove a vector and check that it is not present anymore
index.remove(id1)?;
let res = index.get_vec(id1);
assert!(res.is_err());
// Verify that now our search result is different
let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id2);
assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);
// Persist index on disk
index.persist()?;
```
[^3]: [NGT index optimizations in Python][ngt-optim-py]

[ngt]: https://github.com/yahoojapan/NGT
[ngt-desc]: https://opensource.com/article/19/10/ngt-open-source-library
[ngt-sharedmem]: https://github.com/yahoojapan/NGT#shared-memory-use
[ngt-largedata]: https://github.com/yahoojapan/NGT#large-scale-data-use
[ngt-ci]: https://github.com/lerouxrgd/ngt-rs/blob/master/.github/workflows/ci.yaml
[ngt-optim]: https://docs.rs/ngt/latest/ngt/optim/index.html
[ngt-optim-py]: https://github.com/yahoojapan/NGT/wiki/Optimization-Examples-Using-Python
[qg-desc]: https://medium.com/@masajiro.iwasaki/fusion-of-graph-based-indexing-and-product-quantization-for-ann-search-7d1f0336d0d0
[qg-optim]: https://github.com/yahoojapan/NGT#build-parameters-1

[^1]: https://opensource.com/article/19/10/ngt-open-source-library
[^2]: https://medium.com/@masajiro.iwasaki/fusion-of-graph-based-indexing-and-product-quantization-for-ann-search-7d1f0336d0d0
[^3]: https://github.com/yahoojapan/NGT/wiki/Optimization-Examples-Using-Python
[f16]: https://docs.rs/half/latest/half/struct.f16.html
[index-ngt]: https://docs.rs/ngt/latest/ngt/#usage
[index-qg]: https://docs.rs/ngt/latest/ngt/qg/
[index-qbg]: https://docs.rs/ngt/latest/ngt/qbg/
6 changes: 6 additions & 0 deletions src/error.rs
Expand Up @@ -23,6 +23,12 @@ pub(crate) fn make_err(err: sys::NGTError) -> Error {
Error(err_msg)
}

impl From<String> for Error {
fn from(err: String) -> Self {
Self(err)
}
}

impl From<std::io::Error> for Error {
fn from(source: std::io::Error) -> Self {
Self(source.to_string())
Expand Down
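A minimal sketch of what the new `From<String>` impl enables: with it, the `?` operator and `.into()` can turn a plain `String` error into the crate's `Error`. The `Error` newtype below is a stand-in that mirrors the tuple-struct shape visible in this hunk; `parse_dimension` and `checked_dimension` are hypothetical helpers for illustration, not part of ngt-rs.

```rust
use std::fmt;

// Stand-in for the crate's error type: a newtype around a message string.
#[derive(Debug, PartialEq)]
pub struct Error(pub String);

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for Error {}

// Same shape as the impl added in this commit.
impl From<String> for Error {
    fn from(err: String) -> Self {
        Self(err)
    }
}

// Hypothetical helper that reports failures as a plain `String`.
fn parse_dimension(raw: &str) -> Result<usize, String> {
    raw.trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid dimension {:?}: {}", raw, e))
}

// `?` converts the `String` error into `Error` through `From<String>`.
fn checked_dimension(raw: &str) -> Result<usize, Error> {
    let dim = parse_dimension(raw)?;
    if dim == 0 {
        // `.into()` relies on the same conversion.
        return Err(format!("dimension must be positive, got {}", dim).into());
    }
    Ok(dim)
}
```

Without the `From<String>` impl, both the `?` and the `.into()` above would need an explicit `map_err(Error)`.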
73 changes: 71 additions & 2 deletions src/lib.rs
@@ -1,7 +1,77 @@
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
#![doc = include_str!("../README.md")]
//!
//! # Usage
//!
//! Graph and tree based index (NGT Index)
//!
//! ## Defining the properties of a new NGT index:
//!
//! ```rust
//! # fn main() -> Result<(), ngt::Error> {
//! use ngt::{NgtProperties, NgtDistance};
//!
//! // Default properties with vectors of dimension 3
//! let prop = NgtProperties::<f32>::dimension(3)?;
//!
//! // Or customize values (here are the defaults)
//! let prop = NgtProperties::<f32>::dimension(3)?
//! .creation_edge_size(10)?
//! .search_edge_size(40)?
//! .distance_type(NgtDistance::L2)?;
//!
//! # Ok(())
//! # }
//! ```
//!
//! ## Creating/Opening an NGT index and using it:
//!
//! ```rust
//! # fn main() -> Result<(), ngt::Error> {
//! use ngt::{NgtIndex, NgtProperties};
//!
//! // Create a new index
//! let prop = NgtProperties::dimension(3)?;
//! let index: NgtIndex<f32> = NgtIndex::create("target/path/to/ngt_index/dir", prop)?;
//!
//! // Open an existing index
//! let mut index = NgtIndex::open("target/path/to/ngt_index/dir")?;
//!
//! // Insert two vectors and get their id
//! let vec1 = vec![1.0, 2.0, 3.0];
//! let vec2 = vec![4.0, 5.0, 6.0];
//! let id1 = index.insert(vec1)?;
//! let id2 = index.insert(vec2)?;
//!
//! // Build the index in RAM (not yet persisted on disk)
//! // This is required in order to be able to search vectors
//! index.build(2)?;
//!
//! // Perform a vector search (with 1 result)
//! let res = index.search(&vec![1.1, 2.1, 3.1], 1, ngt::EPSILON)?;
//! assert_eq!(res[0].id, id1);
//! assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);
//!
//! // Remove a vector and check that it is not present anymore
//! index.remove(id1)?;
//! let res = index.get_vec(id1);
//! assert!(res.is_err());
//!
//! // Verify that now our search result is different
//! let res = index.search(&vec![1.1, 2.1, 3.1], 1, ngt::EPSILON)?;
//! assert_eq!(res[0].id, id2);
//! assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);
//!
//! // Persist index on disk
//! index.persist()?;
//!
//! # std::fs::remove_dir_all("target/path/to/ngt_index/dir").unwrap();
//! # Ok(())
//! # }
//! ```

#[cfg(all(feature = "quantized", feature = "shared_mem"))]
compile_error!("only one of ['quantized', 'shared_mem'] can be enabled");
compile_error!(r#"only one of ["quantized", "shared_mem"] can be enabled"#);

mod error;
mod ngt;
Expand All @@ -23,5 +93,4 @@ pub const EPSILON: f32 = 0.1;
pub use crate::error::{Error, Result};
pub use crate::ngt::{optim, NgtDistance, NgtIndex, NgtObject, NgtProperties};

#[doc(inline)]
pub use half;
4 changes: 2 additions & 2 deletions src/ngt/index.rs
Expand Up @@ -321,7 +321,7 @@ where
}

let results = Vec::from_raw_parts(
results as *mut f32,
results,
self.prop.dimension as usize,
self.prop.dimension as usize,
);
Expand Down Expand Up @@ -353,7 +353,7 @@ where
}

let results = Vec::from_raw_parts(
results as *mut u8,
results,
self.prop.dimension as usize,
self.prop.dimension as usize,
);
Expand Down
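Both hunks drop the `as *mut f32` / `as *mut u8` casts, presumably because `results` already has the element pointer type, so `Vec::from_raw_parts` can infer `T` from it. A small sketch of that inference, independent of NGT: `rebuild` is a hypothetical helper that takes a `Vec` apart and reassembles it from a typed pointer without any cast.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: decompose a `Vec<T>` and rebuild it from its raw parts.
fn rebuild<T>(v: Vec<T>) -> Vec<T> {
    // Keep the original Vec from freeing the buffer we are about to reuse.
    let mut v = ManuallyDrop::new(v);
    let ptr: *mut T = v.as_mut_ptr();
    let (len, cap) = (v.len(), v.capacity());

    // SAFETY: `ptr`, `len` and `cap` come from a live `Vec<T>` allocated by the
    // global allocator, and the original is never dropped, so ownership of the
    // buffer moves to the new Vec exactly once.
    // No `as *mut T` cast is needed: `T` is inferred from the pointer type.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

The sketch only covers the type inference; whether a buffer returned by a C library satisfies the allocator requirements of `Vec::from_raw_parts` is a separate question that this example does not address.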
65 changes: 0 additions & 65 deletions src/ngt/mod.rs
@@ -1,68 +1,3 @@
//! Defining the properties of a new index:
//!
//! ```rust
//! # fn main() -> Result<(), ngt::Error> {
//! use ngt::{NgtProperties, NgtDistance};
//!
//! // Default properties with vectors of dimension 3
//! let prop = NgtProperties::<f32>::dimension(3)?;
//!
//! // Or customize values (here are the defaults)
//! let prop = NgtProperties::<f32>::dimension(3)?
//! .creation_edge_size(10)?
//! .search_edge_size(40)?
//! .distance_type(NgtDistance::L2)?;
//!
//! # Ok(())
//! # }
//! ```
//!
//! Creating/Opening an index and using it:
//!
//! ```rust
//! # fn main() -> Result<(), ngt::Error> {
//! use ngt::{NgtIndex, NgtProperties, EPSILON};
//!
//! // Create a new index
//! let prop = NgtProperties::dimension(3)?;
//! let index: NgtIndex<f32> = NgtIndex::create("target/path/to/index/dir", prop)?;
//!
//! // Open an existing index
//! let mut index = NgtIndex::open("target/path/to/index/dir")?;
//!
//! // Insert two vectors and get their id
//! let vec1 = vec![1.0, 2.0, 3.0];
//! let vec2 = vec![4.0, 5.0, 6.0];
//! let id1 = index.insert(vec1)?;
//! let id2 = index.insert(vec2)?;
//!
//! // Build the index in RAM (not yet persisted on disk)
//! // This is required in order to be able to search vectors
//! index.build(2)?;
//!
//! // Perform a vector search (with 1 result)
//! let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
//! assert_eq!(res[0].id, id1);
//! assert_eq!(index.get_vec(id1)?, vec![1.0, 2.0, 3.0]);
//!
//! // Remove a vector and check that it is not present anymore
//! index.remove(id1)?;
//! let res = index.get_vec(id1);
//! assert!(res.is_err());
//!
//! // Verify that now our search result is different
//! let res = index.search(&vec![1.1, 2.1, 3.1], 1, EPSILON)?;
//! assert_eq!(res[0].id, id2);
//! assert_eq!(index.get_vec(id2)?, vec![4.0, 5.0, 6.0]);
//!
//! // Persist index on disk
//! index.persist()?;
//!
//! # std::fs::remove_dir_all("target/path/to/index/dir").unwrap();
//! # Ok(())
//! # }
//! ```

mod index;
pub mod optim;
mod properties;
Expand Down
6 changes: 5 additions & 1 deletion src/ngt/optim.rs
@@ -1,3 +1,7 @@
#![cfg_attr(feature = "shared_mem", allow(unused_imports))]

//! Functions aimed at optimizing [`NgtIndex`](NgtIndex)

use std::ffi::CString;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
Expand Down Expand Up @@ -92,7 +96,7 @@ pub fn refine_anng<T: NgtObjectType>(
/// [`optimize_anng_edges_number`](optimize_anng_edges_number).
///
/// If more performance is needed, a larger `creation_edge_size` can be set through
/// [`Properties`](crate::Properties::creation_edge_size) at ANNG index
/// [`Properties`](crate::NgtProperties::creation_edge_size) at ANNG index
/// [`create`](NgtIndex::create) time.
///
/// Important [`GraphOptimParams`](GraphOptimParams) parameters are `nb_outgoing` edges
Expand Down