-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement distance metric selection #35
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, nice work! I did a first round of review, but there's a lot here to digest. I'm most curious about the benchmarking results, I'm guessing the interposition of Metric
in its current state would cause performance to regress substantially.
FYI, we usually like to rebase-merge atomic commits. For example, the way I would submit this would probably be to do the refactoring first, then add the other SIMD implementations (one commit per architecture) for the euclid stuff, then add one new metric per commit. No need to do that now if you prefer to keep it like this (since doing it retroactively can be a bit of a pain), although it might still be nice to squash all the follow-up commits into the first one.
@@ -1,5 +1,5 @@ | |||
[workspace] | |||
members = ["instant-distance", "instant-distance-py"] | |||
members = ["distance-metrics", "instant-distance", "instant-distance-py"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what your motivation for adding a new crate for this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate mostly because I couldn't get benchmark to access instant-distance-py
internals and the metrics felt to specific for instant-distance
crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this another try. There are 2 things that prevent me from adding benches directly to instant-distance-py
:
- crate type is
cdylib
- that one is easy we could have 2 types (cdylib
and regularlib
), - the name is
instance-distance
, just like the other crate - AFAIK it makes it impossible to use both crates at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's just change the crate name to instant-distance-py in the Rust metadata (but should verify that the Python name will still be instant-distance), and then if we add the extra type we wouldn't need the extra crate anymore, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible if we add a tiny python wrapper module. That's to force maturin to build wheel for mixed rust/python project. Without the wrapper maturin uses lib.name as top level module name.
simple::<CosineMetric> | ||
); | ||
|
||
fn legacy(bench: &mut Bencher) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be curious to see some benchmarking results in this PR for the legacy approach vs your new implementations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most recent results:
test legacy ... bench: 21 ns/iter (+/- 0)
test non_simd ... bench: 183 ns/iter (+/- 2)
test simple<CosineMetric> ... bench: 22 ns/iter (+/- 0)
test simple<EuclidMetric> ... bench: 25 ns/iter (+/- 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think legacy
is euclid except it doesn't bother taking the square root, right? If it's still 4ns faster than Euclid
, we should maybe expose it under a name other than EuclidMetric
, as I think we'd want to stick to it (since the difference between the square root and the squared value doesn't usually matter for our use case?).
Did you run any tests to show that the platform-specific implementations have the same result (or close to it) as the non-SIMD implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the legacy
is euclid but with square root. I believe we don't care about having squared values. If I drop the square root calculation then they are almost the same (2-5% difference):
test legacy ... bench: 54 ns/iter (+/- 6)
test metric<CosineMetric> ... bench: 56 ns/iter (+/- 11)
test metric<EuclidMetric> ... bench: 55 ns/iter (+/- 6)
test non_simd ... bench: 441 ns/iter (+/- 40)
(the numbers are higher because I'm on battery and the CPU is throttled)
We could use that since it doesn't have dimensions limitation. Another idea would be to have the current metric (as for example Euclid300
) for the use case of 300 dimensions.
instant-distance-py/src/lib.rs
Outdated
#[derive(Deserialize, Serialize)] | ||
enum HnswMapWithMetric { | ||
Euclid(instant_distance::HnswMap<FloatArray<EuclidMetric>, MapValue>), | ||
Cosine(instant_distance::HnswMap<FloatArray<CosineMetric>, MapValue>), | ||
DotProduct(instant_distance::HnswMap<FloatArray<DotProductMetric>, MapValue>), | ||
} | ||
|
||
// Helper macro for dispatching to inner implementation | ||
macro_rules! impl_for_each_hnsw_with_metric { | ||
($type:ident, $instance:expr, $inner:ident, $($tokens:tt)+) => { | ||
match $instance { | ||
$type::Euclid($inner) => { | ||
$($tokens)+ | ||
} | ||
$type::Cosine($inner) => { | ||
$($tokens)+ | ||
} | ||
$type::DotProduct($inner) => { | ||
$($tokens)+ | ||
} | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: let's move this down below the Hnsw
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's a macro it has to be defined before the first use (in HnswMap::search
).
Yeah, I think I'll do as you say and split it in more meaningful commits. I'm used to squash merging the whole PR branch so it lands in main/master as a single commit. One thing I did that makes this PR look bigger than it is, I tried to use as much of qdrant implementation with minimal changes but after second thought it will fit better if I just use the SIMD implementations and rework the rest. |
264dd95
to
dcb69c0
Compare
dcb69c0
to
9d8c83f
Compare
// Helper macro for dispatching to inner implementation | ||
macro_rules! impl_for_each_hnsw_with_metric { | ||
($type:ident, $instance:expr, $inner:ident, $($tokens:tt)+) => { | ||
match $instance { | ||
$type::Euclid($inner) => { | ||
$($tokens)+ | ||
} | ||
$type::Cosine($inner) => { | ||
$($tokens)+ | ||
} | ||
} | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: even though we have to define the macro here before using it, I'd like to move the DistanceMetric
closer to the bottom of the module (probably below MapValue
?), to fit it in with the top-down order.
inner: HnswMapWithMetric, | ||
} | ||
|
||
#[derive(Deserialize, Serialize)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: in order to keep the struct HnswMap
close to the impl
below, I'd prefer to move the HnswMapWithMetric
below the HnswMap
impl
block (above the Hnsw
).
inner: HnswWithMetric, | ||
} | ||
|
||
#[derive(Deserialize, Serialize)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@@ -1,5 +1,5 @@ | |||
[workspace] | |||
members = ["instant-distance", "instant-distance-py"] | |||
members = ["distance-metrics", "instant-distance", "instant-distance-py"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's just change the crate name to instant-distance-py in the Rust metadata (but should verify that the Python name will still be instant-distance), and then if we add the extra type we wouldn't need the extra crate anymore, right?
simple::<CosineMetric> | ||
); | ||
|
||
fn legacy(bench: &mut Bencher) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think legacy
is euclid except it doesn't bother taking the square root, right? If it's still 4ns faster than Euclid
, we should maybe expose it under a name other than EuclidMetric
, as I think we'd want to stick to it (since the difference between the square root and the squared value doesn't usually matter for our use case?).
Did you run any tests to show that the platform-specific implementations have the same result (or close to it) as the non-SIMD implementation?
#[derive(Clone, Deserialize, Serialize)] | ||
struct FloatArray(#[serde(with = "BigArray")] [f32; DIMENSIONS]); | ||
struct FloatArray<M> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels to me like the change here from parametrizing the metric is separate from moving from a fixed-size array to a Vec
, so I'd like those changes to be in separate commits.
Also, once this is changed to hold a Vec
, should probably also change the name, since it's no longer actually an array? (Maybe FloatVector
?)
And in particular, I'm curious about the performance impact of the addtional allocations (and pointer indirection/cache) of using a Vec
here over an array. For that we probably need more of a macro-benchmark, do you have a data set you've tried for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'll change the name.
I don't have any real dataset, I've been using random numbers so far. I could potentially use the dataset from translate.py
example or maybe something form ann-benchmark (although from what I've seen those are usually quite big).
.iter()? | ||
.map(|val| val.and_then(|v| v.extract::<f32>())) | ||
.collect::<Result<_, _>>()?; | ||
Ok(Self::from(array)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you ditch the array
binding in favor of yielding Self::from(...collect())
directly, you don't need the type annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be too many steps for rustc to figure this out. I still need type annotation on collect
:
Ok(Self::from(
value
.iter()?
.map(|val| val.and_then(|v| v.extract::<f32>()))
.collect::<Result<Vec<f32>, _>>()?,
))
If I remove even Vec<f32>
part then it fails to compile:
error[E0277]: the trait bound `FloatVector<M>: From<()>` is not satisfied
--> instant-distance-py/src/lib.rs:431:13
|
430 | Ok(Self::from(
| ---------- required by a bound introduced by this call
431 | / value
432 | | .iter()?
433 | | .map(|val| val.and_then(|v| v.extract::<f32>()))
434 | | .collect::<Result<_, _>>()?,
| |___________________________________________^ the trait `From<()>` is not implemented for `FloatVector<M>`
|
= help: the trait `From<Vec<f32>>` is implemented for `FloatVector<M>`
error[E0277]: a value of type `()` cannot be built from an iterator over elements of type `f32`
--> instant-distance-py/src/lib.rs:431:13
|
431 | / value
432 | | .iter()?
433 | | .map(|val| val.and_then(|v| v.extract::<f32>()))
| |________________________________________________________________^ value of type `()` cannot be built from `std::iter::Iterator<Item=f32>`
434 | .collect::<Result<_, _>>()?,
| ------- required by a bound introduced by this call
|
= help: the trait `FromIterator<f32>` is not implemented for `()`
= help: the trait `FromIterator<()>` is implemented for `()`
= note: required for `Result<(), PyErr>` to implement `FromIterator<Result<f32, PyErr>>`
note: required by a bound in `collect`
--> /home/kuba/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:1832:19
|
1832 | fn collect<B: FromIterator<Self::Item>>(self) -> B
| ^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `collect`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `instant-distance-py` due to 2 previous errors
I get the same error if I remove the whole type annotation.
} | ||
} | ||
|
||
impl Point for FloatArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this commit still mixes the adding/changing of a lot of code with moving it around. Usually I'd request a separate commit that just refactors the code so that the implementation here is moved into its new place (introducing the Metric
trait) and the code around it is left more or less unchanged, so that it's easier to review these things separately. So then I'd probably end up with:
- Implement
Metric
trait and parametrizeFloatArray
with it - Add new
Metric
types and impls - Change
FloatArray
toFloatVector
#[cfg(test)] | ||
mod tests { | ||
#[test] | ||
fn test_spaces_avx() { | ||
use super::*; | ||
use crate::*; | ||
|
||
if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") { | ||
let v1: Vec<f32> = vec![ | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
26., 27., 28., 29., 30., 31., | ||
]; | ||
let v2: Vec<f32> = vec![ | ||
40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., | ||
56., 57., 58., 59., 60., 61., | ||
]; | ||
|
||
let euclid_simd = unsafe { euclid_distance_avx(&v1, &v2) }; | ||
let euclid = euclid_distance(&v1, &v2); | ||
assert_eq!(euclid_simd, euclid); | ||
|
||
let dot_simd = unsafe { dot_similarity_avx(&v1, &v2) }; | ||
let dot = dot_similarity(&v1, &v2); | ||
assert_eq!(dot_simd, dot); | ||
|
||
let mut v1 = v1; | ||
let mut v1_copy = v1.clone(); | ||
unsafe { cosine_preprocess_avx(&mut v1) }; | ||
cosine_preprocess(&mut v1_copy); | ||
assert_eq!(v1, v1_copy); | ||
} else { | ||
println!("avx test skipped"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it would be nice to split the addition of test code between the commits that add the test code (or alternatively, add it separately at the end).
#[derive(Copy, Clone)] | ||
enum DistanceMetric { | ||
Euclid, | ||
Cosine, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we don't actually have a Cosine
implementation at this point, we should not introduce this variant here until the commit where we add the backing implementation.
Addresses #20
Changes
distance-metrics
with Euclid and Cosine metricsdistance_metric
field toConfig
andDistanceMetric
enum classFloatArray
into a generic struct taking metric as a parameterHnswWithMetric
andHnswMapWithMetric
intermediate enums to hide generic parameters from the classes exposed to Python and to have static dispatch internallyBenchmark
The only noticeable change (regression) is from using the original SIMD implementation to using the qdrant one. Time spent in Python integration benchmark grows ~12% (175 msec -> 196 msec) which roughly corresponds the growth in time spent in distance metrics microbenchmark +19% (21 ns -> 25 ns).
Note that these benchmarks are very simple and most likely not reflect real world cases. It would be interesting to see the results for
aarch64
since previously we didn't have SIMD implementation for it.Distance metrics
Distance metrics microbenchmark results (on AMD Ryzen 9 5900HS 3.30 GHz):
Python integration
10 loops, best of 5: 175 msec per loop
10 loops, best of 5: 175 msec per loop
10 loops, best of 5: 196 msec per loop
10 loops, best of 5: 196 msec per loop
Known issues
HnswWithMetric
andHnswMapWithMetric
are introduced, the serialization format has changed. Objects serialized with the previous version won't work with this one.