-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Vector Search Nested benchmark #584
Support Vector Search Nested benchmark #584
Conversation
Signed-off-by: Finn Roblin <[email protected]>
Signed-off-by: Finn Roblin <[email protected]>
Signed-off-by: Finn Roblin <[email protected]>
osbenchmark/workload/params.py
Outdated
@@ -884,6 +884,7 @@ class VectorDataSetPartitionParamSource(ParamSource): | |||
def __init__(self, workload, params, context: Context, **kwargs): | |||
super().__init__(workload, params, **kwargs) | |||
self.field_name: str = parse_string_parameter("field", params) | |||
self.is_nested = "." in self.field_name # in base class because used for both bulk ingest and queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make "." as constant
@@ -1155,6 +1156,16 @@ def _build_vector_search_query_body(self, vector, efficient_filter=None) -> dict | |||
query.update({ | |||
"filter": efficient_filter, | |||
}) | |||
|
|||
if self.is_nested: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if more than one level nested is identified, lets raise an exception and add TODO to fix that later
Signed-off-by: Finn Roblin <[email protected]>
…ct#546 Signed-off-by: Finn Roblin <[email protected]>
Signed-off-by: Finn Roblin <[email protected]>
osbenchmark/workload/params.py
Outdated
@@ -884,6 +884,8 @@ class VectorDataSetPartitionParamSource(ParamSource): | |||
def __init__(self, workload, params, context: Context, **kwargs): | |||
super().__init__(workload, params, **kwargs) | |||
self.field_name: str = parse_string_parameter("field", params) | |||
self.NESTED_FIELD_SEPARATOR = "." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it constant
osbenchmark/workload/params.py
Outdated
"knn": { | ||
self.field_name: query, | ||
}, | ||
} | ||
|
||
if self.is_nested: | ||
outer_field_name, _inner_field_name = self.get_split_fields() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outer_field_name, _inner_field_name = self.get_split_fields() | |
outer_field_name, _ = self.get_split_fields() |
@finnroblin looks good, please resolve conflict and rebase your change. |
Signed-off-by: Finn Roblin <[email protected]>
LGTM. Thanks |
Description
This change supports nested field benchmarks for vector search. Previously k-NN maintainers would need to use the deprecated perf tool to benchmark nested fields.
Issues Resolved
This PR is part of the effort to port perf tool benchmarks to OSB.
Testing
Unit tests + tested E2E with modified nested sift euclidian dataset (workload PR forthcoming)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.