Implement distance metric selection #35

pierd · 2023-02-09T19:39:54Z

Addresses #20

Changes

separate crate distance-metrics with Euclid and Cosine metrics
adds distance_metric field to Config and DistanceMetric enum class
adds benchmarks for metric implementations and for building HNSW from Python
turns FloatArray into a generic struct taking metric as a parameter
introduces HnswWithMetric and HnswMapWithMetric intermediate enums to hide generic parameters from the classes exposed to Python and to have static dispatch internally

Benchmark

The only noticeable change (regression) is from using the original SIMD implementation to using the qdrant one. Time spent in Python integration benchmark grows ~12% (175 msec -> 196 msec) which roughly corresponds the growth in time spent in distance metrics microbenchmark +19% (21 ns -> 25 ns).

Note that these benchmarks are very simple and most likely not reflect real world cases. It would be interesting to see the results for aarch64 since previously we didn't have SIMD implementation for it.

Distance metrics

Distance metrics microbenchmark results (on AMD Ryzen 9 5900HS 3.30 GHz):

test legacy               ... bench:          21 ns/iter (+/- 0)
test non_simd             ... bench:         183 ns/iter (+/- 2)
test simple<CosineMetric> ... bench:          22 ns/iter (+/- 0)
test simple<EuclidMetric> ... bench:          25 ns/iter (+/- 0)

Python integration

version	result
original (before any changes)	`10 loops, best of 5: 175 msec per loop`
after refactoring for enum dispatch to different metrics (but still using the same distance metric implementation)	`10 loops, best of 5: 175 msec per loop`
after switching to qdrant's Euclid metric implementation	`10 loops, best of 5: 196 msec per loop`
after adding Cosine metric	`10 loops, best of 5: 196 msec per loop`

Known issues

Since HnswWithMetric and HnswMapWithMetric are introduced, the serialization format has changed. Objects serialized with the previous version won't work with this one.

distance-metrics/src/simple.rs

distance-metrics/src/simple_avx.rs

distance-metrics/src/simple_neon.rs

distance-metrics/src/simple_sse.rs

djc

Hey, nice work! I did a first round of review, but there's a lot here to digest. I'm most curious about the benchmarking results, I'm guessing the interposition of Metric in its current state would cause performance to regress substantially.

FYI, we usually like to rebase-merge atomic commits. For example, the way I would submit this would probably be to do the refactoring first, then add the other SIMD implementations (one commit per architecture) for the euclid stuff, then add one new metric per commit. No need to do that now if you prefer to keep it like this (since doing it retroactively can be a bit of a pain), although it might still be nice to squash all the follow-up commits into the first one.

djc · 2023-02-10T09:00:40Z

Cargo.toml

@@ -1,5 +1,5 @@
 [workspace]
-members = ["instant-distance", "instant-distance-py"]
+members = ["distance-metrics", "instant-distance", "instant-distance-py"]


Curious what your motivation for adding a new crate for this is?

Separate mostly because I couldn't get benchmark to access instant-distance-py internals and the metrics felt to specific for instant-distance crate.

I gave this another try. There are 2 things that prevent me from adding benches directly to instant-distance-py:

crate type is cdylib - that one is easy we could have 2 types (cdylib and regular lib),

the name is instance-distance, just like the other crate - AFAIK it makes it impossible to use both crates at the same time.

Okay, let's just change the crate name to instant-distance-py in the Rust metadata (but should verify that the Python name will still be instant-distance), and then if we add the extra type we wouldn't need the extra crate anymore, right?

It should be possible if we add a tiny python wrapper module. That's to force maturin to build wheel for mixed rust/python project. Without the wrapper maturin uses lib.name as top level module name.

djc · 2023-02-10T09:03:59Z

distance-metrics/benches/all.rs

+    simple::<CosineMetric>
+);
+
+fn legacy(bench: &mut Bencher) {


Would be curious to see some benchmarking results in this PR for the legacy approach vs your new implementations!

The most recent results:

test legacy ... bench: 21 ns/iter (+/- 0) test non_simd ... bench: 183 ns/iter (+/- 2) test simple<CosineMetric> ... bench: 22 ns/iter (+/- 0) test simple<EuclidMetric> ... bench: 25 ns/iter (+/- 0)

So I think legacy is euclid except it doesn't bother taking the square root, right? If it's still 4ns faster than Euclid, we should maybe expose it under a name other than EuclidMetric, as I think we'd want to stick to it (since the difference between the square root and the squared value doesn't usually matter for our use case?).

Did you run any tests to show that the platform-specific implementations have the same result (or close to it) as the non-SIMD implementation?

Yes, the legacy is euclid but with square root. I believe we don't care about having squared values. If I drop the square root calculation then they are almost the same (2-5% difference):

test legacy ... bench: 54 ns/iter (+/- 6) test metric<CosineMetric> ... bench: 56 ns/iter (+/- 11) test metric<EuclidMetric> ... bench: 55 ns/iter (+/- 6) test non_simd ... bench: 441 ns/iter (+/- 40)

(the numbers are higher because I'm on battery and the CPU is throttled)

We could use that since it doesn't have dimensions limitation. Another idea would be to have the current metric (as for example Euclid300) for the use case of 300 dimensions.

distance-metrics/src/legacy.rs

distance-metrics/src/lib.rs

distance-metrics/src/metric.rs

instant-distance-py/src/lib.rs

djc · 2023-02-10T09:19:44Z

instant-distance-py/src/lib.rs

+#[derive(Deserialize, Serialize)]
+enum HnswMapWithMetric {
+    Euclid(instant_distance::HnswMap<FloatArray<EuclidMetric>, MapValue>),
+    Cosine(instant_distance::HnswMap<FloatArray<CosineMetric>, MapValue>),
+    DotProduct(instant_distance::HnswMap<FloatArray<DotProductMetric>, MapValue>),
+}
+
+// Helper macro for dispatching to inner implementation
+macro_rules! impl_for_each_hnsw_with_metric {
+    ($type:ident, $instance:expr, $inner:ident, $($tokens:tt)+) => {
+        match $instance {
+            $type::Euclid($inner) => {
+                $($tokens)+
+            }
+            $type::Cosine($inner) => {
+                $($tokens)+
+            }
+            $type::DotProduct($inner) => {
+                $($tokens)+
+            }
+        }
+    };


Style nit: let's move this down below the Hnsw.

Since it's a macro it has to be defined before the first use (in HnswMap::search).

instant-distance-py/src/lib.rs

instant-distance-py/test/test.py

instant-distance-py/src/lib.rs

pierd · 2023-02-11T06:58:15Z

Hey, nice work! I did a first round of review, but there's a lot here to digest. I'm most curious about the benchmarking results, I'm guessing the interposition of Metric in its current state would cause performance to regress substantially.

FYI, we usually like to rebase-merge atomic commits. For example, the way I would submit this would probably be to do the refactoring first, then add the other SIMD implementations (one commit per architecture) for the euclid stuff, then add one new metric per commit. No need to do that now if you prefer to keep it like this (since doing it retroactively can be a bit of a pain), although it might still be nice to squash all the follow-up commits into the first one.

Yeah, I think I'll do as you say and split it in more meaningful commits. I'm used to squash merging the whole PR branch so it lands in main/master as a single commit.

One thing I did that makes this PR look bigger than it is, I tried to use as much of qdrant implementation with minimal changes but after second thought it will fit better if I just use the SIMD implementations and rework the rest.

djc · 2023-02-17T09:49:32Z

instant-distance-py/src/lib.rs

+// Helper macro for dispatching to inner implementation
+macro_rules! impl_for_each_hnsw_with_metric {
+    ($type:ident, $instance:expr, $inner:ident, $($tokens:tt)+) => {
+        match $instance {
+            $type::Euclid($inner) => {
+                $($tokens)+
+            }
+            $type::Cosine($inner) => {
+                $($tokens)+
+            }
+        }
+    };
+}


Style nit: even though we have to define the macro here before using it, I'd like to move the DistanceMetric closer to the bottom of the module (probably below MapValue?), to fit it in with the top-down order.

djc · 2023-02-17T09:51:20Z

instant-distance-py/src/lib.rs

+    inner: HnswMapWithMetric,
+}
+
+#[derive(Deserialize, Serialize)]


Style: in order to keep the struct HnswMap close to the impl below, I'd prefer to move the HnswMapWithMetric below the HnswMap impl block (above the Hnsw).

djc · 2023-02-17T09:51:54Z

instant-distance-py/src/lib.rs

+    inner: HnswWithMetric,
+}
+
+#[derive(Deserialize, Serialize)]


djc · 2023-02-17T09:54:04Z

Cargo.toml

@@ -1,5 +1,5 @@
 [workspace]
-members = ["instant-distance", "instant-distance-py"]
+members = ["distance-metrics", "instant-distance", "instant-distance-py"]


Okay, let's just change the crate name to instant-distance-py in the Rust metadata (but should verify that the Python name will still be instant-distance), and then if we add the extra type we wouldn't need the extra crate anymore, right?

djc · 2023-02-17T09:59:25Z

distance-metrics/benches/all.rs

+    simple::<CosineMetric>
+);
+
+fn legacy(bench: &mut Bencher) {


So I think legacy is euclid except it doesn't bother taking the square root, right? If it's still 4ns faster than Euclid, we should maybe expose it under a name other than EuclidMetric, as I think we'd want to stick to it (since the difference between the square root and the squared value doesn't usually matter for our use case?).

Did you run any tests to show that the platform-specific implementations have the same result (or close to it) as the non-SIMD implementation?

djc · 2023-02-17T10:06:50Z

instant-distance-py/src/lib.rs

 #[derive(Clone, Deserialize, Serialize)]
-struct FloatArray(#[serde(with = "BigArray")] [f32; DIMENSIONS]);
+struct FloatArray<M> {


Feels to me like the change here from parametrizing the metric is separate from moving from a fixed-size array to a Vec, so I'd like those changes to be in separate commits.

Also, once this is changed to hold a Vec, should probably also change the name, since it's no longer actually an array? (Maybe FloatVector?)

And in particular, I'm curious about the performance impact of the addtional allocations (and pointer indirection/cache) of using a Vec here over an array. For that we probably need more of a macro-benchmark, do you have a data set you've tried for this?

👍 I'll change the name.

I don't have any real dataset, I've been using random numbers so far. I could potentially use the dataset from translate.py example or maybe something form ann-benchmark (although from what I've seen those are usually quite big).

djc · 2023-02-17T10:14:56Z

instant-distance-py/src/lib.rs

+            .iter()?
+            .map(|val| val.and_then(|v| v.extract::<f32>()))
+            .collect::<Result<_, _>>()?;
+        Ok(Self::from(array))


If you ditch the array binding in favor of yielding Self::from(...collect()) directly, you don't need the type annotation.

There might be too many steps for rustc to figure this out. I still need type annotation on collect:

Ok(Self::from( value .iter()? .map(|val| val.and_then(|v| v.extract::<f32>())) .collect::<Result<Vec<f32>, _>>()?, ))

If I remove even Vec<f32> part then it fails to compile:

error[E0277]: the trait bound `FloatVector<M>: From<()>` is not satisfied --> instant-distance-py/src/lib.rs:431:13 | 430 | Ok(Self::from( | ---------- required by a bound introduced by this call 431 | / value 432 | | .iter()? 433 | | .map(|val| val.and_then(|v| v.extract::<f32>())) 434 | | .collect::<Result<_, _>>()?, | |___________________________________________^ the trait `From<()>` is not implemented for `FloatVector<M>` | = help: the trait `From<Vec<f32>>` is implemented for `FloatVector<M>` error[E0277]: a value of type `()` cannot be built from an iterator over elements of type `f32` --> instant-distance-py/src/lib.rs:431:13 | 431 | / value 432 | | .iter()? 433 | | .map(|val| val.and_then(|v| v.extract::<f32>())) | |________________________________________________________________^ value of type `()` cannot be built from `std::iter::Iterator<Item=f32>` 434 | .collect::<Result<_, _>>()?, | ------- required by a bound introduced by this call | = help: the trait `FromIterator<f32>` is not implemented for `()` = help: the trait `FromIterator<()>` is implemented for `()` = note: required for `Result<(), PyErr>` to implement `FromIterator<Result<f32, PyErr>>` note: required by a bound in `collect` --> /home/kuba/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:1832:19 | 1832 | fn collect<B: FromIterator<Self::Item>>(self) -> B | ^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `collect` For more information about this error, try `rustc --explain E0277`. error: could not compile `instant-distance-py` due to 2 previous errors

I get the same error if I remove the whole type annotation.

djc · 2023-02-17T10:17:41Z

instant-distance-py/src/lib.rs

    }
 }

-impl Point for FloatArray {


So this commit still mixes the adding/changing of a lot of code with moving it around. Usually I'd request a separate commit that just refactors the code so that the implementation here is moved into its new place (introducing the Metric trait) and the code around it is left more or less unchanged, so that it's easier to review these things separately. So then I'd probably end up with:

Implement Metric trait and parametrize FloatArray with it

Add new Metric types and impls

Change FloatArray to FloatVector

djc · 2023-02-17T10:20:52Z

distance-metrics/src/simd_avx.rs

+#[cfg(test)]
+mod tests {
+    #[test]
+    fn test_spaces_avx() {
+        use super::*;
+        use crate::*;
+
+        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
+            let v1: Vec<f32> = vec![
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                26., 27., 28., 29., 30., 31.,
+            ];
+            let v2: Vec<f32> = vec![
+                40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
+                56., 57., 58., 59., 60., 61.,
+            ];
+
+            let euclid_simd = unsafe { euclid_distance_avx(&v1, &v2) };
+            let euclid = euclid_distance(&v1, &v2);
+            assert_eq!(euclid_simd, euclid);
+
+            let dot_simd = unsafe { dot_similarity_avx(&v1, &v2) };
+            let dot = dot_similarity(&v1, &v2);
+            assert_eq!(dot_simd, dot);
+
+            let mut v1 = v1;
+            let mut v1_copy = v1.clone();
+            unsafe { cosine_preprocess_avx(&mut v1) };
+            cosine_preprocess(&mut v1_copy);
+            assert_eq!(v1, v1_copy);
+        } else {
+            println!("avx test skipped");


So it would be nice to split the addition of test code between the commits that add the test code (or alternatively, add it separately at the end).

djc · 2023-02-17T10:23:06Z

instant-distance-py/src/lib.rs

+#[derive(Copy, Clone)]
+enum DistanceMetric {
+    Euclid,
+    Cosine,


Given that we don't actually have a Cosine implementation at this point, we should not introduce this variant here until the commit where we add the backing implementation.

pierd commented Feb 9, 2023

View reviewed changes

distance-metrics/src/simple.rs Outdated Show resolved Hide resolved

pierd commented Feb 9, 2023

View reviewed changes

distance-metrics/src/simple_avx.rs Outdated Show resolved Hide resolved

pierd commented Feb 9, 2023

View reviewed changes

distance-metrics/src/simple_neon.rs Outdated Show resolved Hide resolved

pierd commented Feb 9, 2023

View reviewed changes

distance-metrics/src/simple_sse.rs Outdated Show resolved Hide resolved

hartshorne requested a review from djc February 9, 2023 19:49

djc reviewed Feb 10, 2023

View reviewed changes

pierd force-pushed the distance-metrics branch 2 times, most recently from 264dd95 to dcb69c0 Compare February 15, 2023 21:00

pierd added 3 commits February 16, 2023 21:37

Refactor to prepare for multiple distance metrics

bca31ad

Replace Euclid metric implementation

ac42574

Add Cosine distance metric

9d8c83f

pierd force-pushed the distance-metrics branch from dcb69c0 to 9d8c83f Compare February 16, 2023 20:57

djc reviewed Feb 17, 2023

View reviewed changes

djc mentioned this pull request Feb 21, 2023

Generalize SIMD distance implementation to n-length vectors #20

Open

djc mentioned this pull request May 8, 2023

Generalize SIMD distance implementation to n-length vectors #38

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement distance metric selection #35

Implement distance metric selection #35

pierd commented Feb 9, 2023 •

edited

Loading

djc left a comment

djc Feb 10, 2023

pierd Feb 11, 2023

pierd Feb 15, 2023

djc Feb 17, 2023

pierd Feb 17, 2023

djc Feb 10, 2023

pierd Feb 15, 2023

djc Feb 17, 2023

pierd Feb 17, 2023

djc Feb 10, 2023

pierd Feb 15, 2023

pierd commented Feb 11, 2023

djc Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

pierd Feb 17, 2023

djc Feb 17, 2023

pierd Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

djc Feb 17, 2023

Implement distance metric selection #35

Are you sure you want to change the base?

Implement distance metric selection #35

Conversation

pierd commented Feb 9, 2023 • edited Loading

Changes

Benchmark

Distance metrics

Python integration

Known issues

djc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pierd commented Feb 11, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pierd commented Feb 9, 2023 •

edited

Loading