Hybrid Search
In addition to vectors, Milvus supports scalar data types such as Booleans, integers, and floating-point numbers. A collection in Milvus can hold multiple fields to accommodate different data features or properties. Milvus is a flexible vector database that pairs scalar filtering with powerful vector similarity search.
A hybrid search is a vector similarity search in which a boolean expression filters the scalar fields, so that only entities satisfying the expression are returned as results.
For example:
In Python
import random
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
# Connect to server
connections.connect("default", host='localhost', port='19530')
# Create a collection
collection_name = "test_collection_search"
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection(collection_name, schema, using='default', shards_num=2)
# Insert some random data
data = [
    [i for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
collection.insert(data)
collection.num_entities
# Load collection to memory
collection.load()
# Conduct a similarity search with an expression filtering ID column
search_param = {
    "data": [[1.0, 1.0]],
    "anns_field": "films",
    "param": {"metric_type": "L2"},
    "limit": 2,
    "expr": "film_id in [2,4,6,8]"
}
res = collection.search(**search_param)
# Check results
hits = res[0]
print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
In Node.js
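import { MilvusClient } from "@zilliz/milvus2-sdk-node";
// DataType is used below; this import path is an assumption and depends on the SDK version.
import { DataType } from "@zilliz/milvus2-sdk-node/dist/milvus/types/Common";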
const milvusClient = new MilvusClient("localhost:19530");
// Prepare a test collection
const COLLECTION_NAME = "test_collection_search";
// Wait for the collection to be created before inserting into it
await milvusClient.collectionManager.createCollection({
  collection_name: COLLECTION_NAME,
  fields: [
    {
      name: "films",
      description: "vector field",
      data_type: DataType.FloatVector,
      type_params: {
        dim: "2",
      },
    },
    {
      name: "film_id",
      data_type: DataType.Int64,
      autoID: false,
      is_primary_key: true,
      description: "",
    },
  ],
});
// Insert some random data
let id = 1;
const entities = Array.from({ length: 10 }, () => ({
  films: Array.from({ length: 2 }, () => Math.random() * 10),
  film_id: id++,
}));
// insert() belongs to dataManager, like search() below
await milvusClient.dataManager.insert({
  collection_name: COLLECTION_NAME,
  fields_data: entities,
});
// Load collection to memory & conduct a search with boolean expression
await milvusClient.collectionManager.loadCollection({
  collection_name: COLLECTION_NAME,
});
await milvusClient.dataManager.search({
  collection_name: COLLECTION_NAME,
  // partition_names: [],
  expr: "film_id in [1,4,6,8]",
  vectors: [entities[0].films],
  search_params: {
    anns_field: "films",
    topk: "4",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: 101, // DataType.FloatVector is 101 (100 is DataType.BinaryVector)
});
// search result will be like:{ status: { error_code: 'Success', reason: '' }, results: [ { score: 0, id: '1' }, { score: 9.266796112060547, id: '4' }, { score: 28.263811111450195, id: '8' }, { score: 41.055686950683594, id: '6' } ]}
Milvus conducts scalar filtering by evaluating predicate expressions. A predicate expression, when evaluated against an entity, returns either TRUE or FALSE.
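The effect of a predicate on a similarity search can be pictured with a plain-Python sketch that does not call Milvus at all. The in-memory `rows`, the `hybrid_search` helper, and the choice of squared Euclidean distance are assumptions made for illustration; how Milvus combines filtering and vector search internally may differ.

```python
# Hypothetical in-memory rows standing in for a Milvus collection.
rows = [
    {"film_id": 1, "films": [0.0, 0.0]},
    {"film_id": 2, "films": [0.9, 1.0]},
    {"film_id": 3, "films": [1.0, 1.0]},
    {"film_id": 4, "films": [0.5, 0.5]},
    {"film_id": 6, "films": [3.0, 3.0]},
]


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hybrid_search(rows, query, predicate, limit):
    """Keep rows for which the predicate is true, then rank them by distance."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2_squared(r["films"], query))
    return [(r["film_id"], l2_squared(r["films"], query)) for r in candidates[:limit]]


# Same filter as the example above: film_id in [2, 4, 6, 8]
result = hybrid_search(
    rows,
    query=[1.0, 1.0],
    predicate=lambda r: r["film_id"] in [2, 4, 6, 8],
    limit=2,
)
print(result)
```

Note that row 3 matches the query vector exactly but is still absent from the result, because its `film_id` does not satisfy the predicate.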
The following EBNF grammar rules describe boolean expressions:
Expr = LogicalExpr | NIL;
LogicalExpr = LogicalExpr BinaryLogicalOp LogicalExpr
              | UnaryLogicalOp LogicalExpr
              | "(" LogicalExpr ")"
              | SingleExpr;
BinaryLogicalOp = "&&" | "and" | "||" | "or";
UnaryLogicalOp = "not";
SingleExpr = TermExpr | CompareExpr;
TermExpr = IDENTIFIER "in" ConstantArray;
Constant = INTEGER | FLOAT;
ConstantExpr = Constant
               | ConstantExpr BinaryArithOp ConstantExpr
               | UnaryArithOp ConstantExpr;
ConstantArray = "[" ConstantExpr { "," ConstantExpr } "]";
UnaryArithOp = "+" | "-";
BinaryArithOp = "+" | "-" | "*" | "/" | "%" | "**";
CompareExpr = IDENTIFIER CmpOp IDENTIFIER
              | IDENTIFIER CmpOp ConstantExpr
              | ConstantExpr CmpOp IDENTIFIER
              | ConstantExpr CmpOpRestricted IDENTIFIER CmpOpRestricted ConstantExpr;
CmpOpRestricted = "<" | "<=";
CmpOp = ">" | ">=" | "<" | "<=" | "==" | "!=";
The following table lists the description of each symbol mentioned in the above Boolean expression rules:
Notation | Description |
---|---|
= | Definition. |
, | Concatenation. |
; | Termination. |
| | Alternation. |
{...} | Repetition. |
(...) | Grouping. |
NIL | Empty. The expression can be an empty string. |
INTEGER | Integers such as 1, 2, 3. |
FLOAT | Float numbers such as 1.0, 2.0. |
Constant | Integers or float numbers. |
IDENTIFIER | Identifier. In Milvus, the IDENTIFIER represents the field name. |
LogicalOp | A LogicalOp is a logical operator that supports combining more than one relational operation in one comparison. Returned value of a LogicalOp is either TRUE (1) or FALSE (0). There are two types of LogicalOps, including BinaryLogicalOps and UnaryLogicalOps. |
UnaryLogicalOp | UnaryLogicalOp refers to the unary logical operator "not". |
BinaryLogicalOp | Binary logical operators that perform actions on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
ArithmeticOp | An ArithmeticOp, namely an arithmetic operator, performs mathematical operations such as addition and subtraction on operands. |
UnaryArithOp | A UnaryArithOp is an arithmetic operator that performs an operation on a single operand. The negative UnaryArithOp changes a positive expression into a negative one, or the other way round. |
BinaryArithOp | A BinaryArithOp, namely a binary operator, performs operations on two operands. In a complex expression with two or more operands, the order of evaluation depends on precedence rules. |
CmpOp | CmpOp is a relational operator that performs actions on two operands. |
CmpOpRestricted | CmpOpRestricted is restricted to "Less than" and "Less than or equal". |
ConstantExpr | ConstantExpr can be a Constant, a BinaryArithOp on two ConstantExprs, or a UnaryArithOp on a single ConstantExpr. It is defined recursively. |
ConstantArray | ConstantArray is wrapped by square brackets, and ConstantExpr can be repeated in the square brackets. ConstantArray must include at least one ConstantExpr. |
TermExpr | TermExpr is used to check whether the value of an IDENTIFIER appears in a ConstantArray. TermExpr is represented by "in". |
CompareExpr | A CompareExpr, namely a comparison expression, can be a relational operation on two IDENTIFIERs, a relational operation on one IDENTIFIER and one ConstantExpr, or a ternary operation on two ConstantExprs and one IDENTIFIER. |
SingleExpr | SingleExpr, namely single expression, can be either a TermExpr or a CompareExpr. |
LogicalExpr | A LogicalExpr can be a BinaryLogicalOp on two LogicalExprs, or a UnaryLogicalOp on a single LogicalExpr, or a LogicalExpr grouped within parentheses, or a SingleExpr. The LogicalExpr is defined recursively. |
Expr | Expr, an abbreviation meaning expression, can be LogicalExpr or NIL. |
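To make the grammar concrete, the following is a minimal sketch of an evaluator that checks an expression string against a single row. It is not the Milvus parser: it leans on Python's `ast` module, so it accepts a superset of the grammar and follows Python's own precedence and associativity rules, which may differ from those of Milvus in corner cases. The `evaluate` and `_eval` names are hypothetical helpers introduced here.

```python
import ast
import operator

# Arithmetic and comparison operators from the grammar, mapped to Python callables.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_CMP_OPS = {
    ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
    ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.In: lambda value, array: value in array,
}


def evaluate(expr, row):
    """Evaluate a boolean expression string against one row (a dict of fields)."""
    source = expr.strip().replace("&&", " and ").replace("||", " or ")
    if source == "":  # NIL: the empty expression filters nothing out
        return True
    return bool(_eval(ast.parse(source, mode="eval").body, row))


def _eval(node, row):
    if isinstance(node, ast.BoolOp):  # BinaryLogicalOp
        values = [_eval(v, row) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp):  # UnaryLogicalOp and UnaryArithOp
        operand = _eval(node.operand, row)
        if isinstance(node.op, ast.Not):
            return not operand
        if isinstance(node.op, ast.USub):
            return -operand
        if isinstance(node.op, ast.UAdd):
            return +operand
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:  # BinaryArithOp
        return _BIN_OPS[type(node.op)](_eval(node.left, row), _eval(node.right, row))
    if isinstance(node, ast.Compare):  # CompareExpr and TermExpr, including a < x <= b
        left = _eval(node.left, row)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _CMP_OPS:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _eval(comparator, row)
            if not _CMP_OPS[type(op)](left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.List):  # ConstantArray
        return [_eval(element, row) for element in node.elts]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value  # Constant
    if isinstance(node, ast.Name):  # IDENTIFIER: look up the field value
        return row[node.id]
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


row = {"film_id": 4}
print(evaluate("film_id in [2, 4, 6, 8]", row))
print(evaluate("1 < film_id <= 4 && film_id != 3", row))
```

Each branch of `_eval` corresponds to one rule in the table above, which makes it easy to see which rule a given expression exercises.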
Logical operators perform a comparison between two expressions.
Symbol | Operation | Example | Description |
---|---|---|---|
'and' && | and | expr1 && expr2 | True if both expr1 and expr2 are true. |
'or' || | or | expr1 || expr2 | True if either expr1 or expr2 is true. |
Binary arithmetic operators contain two operands and can perform basic arithmetic operations and return the corresponding result.
Symbol | Operation | Example | Description |
---|---|---|---|
+ | Addition | a + b | Add the two operands. |
- | Subtraction | a - b | Subtract the second operand from the first operand. |
* | Multiplication | a * b | Multiply the two operands. |
/ | Division | a / b | Divide the first operand by the second operand. |
** | Power | a ** b | Raise the first operand to the power of the second operand. |
% | Modulo | a % b | Divide the first operand by the second operand and yield the remainder portion. |
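In the grammar, arithmetic operators apply to constants, so they typically appear inside a ConstantArray or on the constant side of a comparison. The sketch below assembles such expression strings from Python values; `term_expr`, `range_expr`, and `all_of` are hypothetical helpers, not part of any Milvus SDK, and the resulting string is what would be passed as the `expr` argument of a search.

```python
def term_expr(field, values):
    """Build a TermExpr such as 'film_id in [2, 4, 6, 8]'."""
    if not values:
        # The grammar requires at least one ConstantExpr in a ConstantArray.
        raise ValueError("ConstantArray must contain at least one value")
    return f"{field} in [{', '.join(str(v) for v in values)}]"


def range_expr(field, low, high):
    """Build the ternary CompareExpr 'low <= field < high'."""
    return f"{low} <= {field} < {high}"


def all_of(*exprs):
    """Join expressions with '&&', parenthesizing each one to keep grouping explicit."""
    return " && ".join(f"({e})" for e in exprs)


# Constants may themselves be arithmetic expressions, written here as strings.
expr = all_of(
    term_expr("film_id", [2, "2 * 2", "2 ** 3"]),
    range_expr("film_id", 0, "10 % 6"),
)
print(expr)
```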
Relational operators use symbols to check for equality, inequality, or relative order between two expressions.
Symbol | Operation | Example | Description |
---|---|---|---|
< | Less than | a < b | True if a is less than b. |
> | Greater than | a > b | True if a is greater than b. |
== | Equal | a == b | True if a is equal to b. |
!= | Not equal | a != b | True if a is not equal to b. |
<= | Less than or equal | a <= b | True if a is less than or equal to b. |
>= | Greater than or equal | a >= b | True if a is greater than or equal to b. |
The following table lists the precedence and associativity of operators. Operators are listed top to bottom, in descending precedence.
Precedence | Operator | Description | Associativity |
---|---|---|---|
1 | + - | UnaryArithOp | Left-to-right |
2 | not | UnaryLogicalOp | Right-to-left |
3 | ** | BinaryArithOp | Left-to-right |
4 | * / % | BinaryArithOp | Left-to-right |
5 | + - | BinaryArithOp | Left-to-right |
6 | < <= > >= | CmpOp | Left-to-right |
7 | == != | CmpOp | Left-to-right |
8 | && and | BinaryLogicalOp | Left-to-right |
9 | || or | BinaryLogicalOp | Left-to-right |
- Expressions are normally evaluated from left to right. In a complex expression, operators are applied one at a time, in an order determined by their precedence.
- If an expression contains two or more operators with the same precedence, the operator to the left is evaluated first.
- When a lower precedence operation should be processed first, it should be enclosed within parentheses.
- Parentheses can be nested within expressions. Innermost parenthetical expressions are evaluated first.
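A short worked example shows how parentheses change the outcome. It uses plain Python, whose `and`/`or` and `*`/`+` operators have the same relative precedence as rows 8 and 9 and rows 4 and 5 of the table; other rows (for example unary operators and `**`) are not guaranteed to behave identically in Python, so they are left out of this sketch.

```python
a, b, c = True, False, False

# "and" binds tighter than "or", so this is read as: a or (b and c)
without_parens = a or b and c
# Parentheses force the lower-precedence "or" to be evaluated first.
with_parens = (a or b) and c
print(without_parens, with_parens)

# Multiplication binds tighter than addition: 2 + (3 * 4)
product_first = 2 + 3 * 4
# Parentheses force the addition first: (2 + 3) * 4
sum_first = (2 + 3) * 4
print(product_first, sum_first)
```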