The Noise query language is an expressive example-based syntax for finding documents, formatting and returning specific information in the documents, performing relevancy scoring, ordering and aggregations.
All queries have a find
clause followed by an example based query syntax. It is a combination of expressions that consist of three parts: The key to query, an operator and the value to match.
This query will return the _id
of every document with a {"foo": "bar",...}
find {foo: == "bar"}
This query will match all documents in the index and return their _id
.
find {}
To match on multiple fields, or even nested fields, simply construct the same json structure in query form.
Match on two fields:
find {foo: == "bar", fizz: == "buzz"}
Match on fields, one nested within another:
find {foo: == "bar", fizz: {fazz: == "buzz"}}
~=
is the full text match operator. Use it find a word in a text field.
find {body: ~= "word"}
Put multiple words in the quoted string to find a phrase in the field.
find {body: ~= "a multi word sentence"}
To find words that are within a specified distance of each other, put the the maximum word distance in the operator. This example will return results where each word is with the 50 words of the others.
find {body: ~50= "bitcoin gold price"}
Noise supports the following comparison operators:
Operator | Description | Types |
---|---|---|
== |
Equality | Strings, Numbers, true, false, null |
> |
Less Than | Numbers |
< |
Greater Than | Numbers |
>= |
Less Than or Equal | Numbers |
<= |
Greater Than or Equal | Numbers |
Noise does not do type conversions of datatypes. Strings only compare with strings, number only compare with numbers, etc.
GeoJSON geometry objects are indexed automatically for geospatial queries. Currently only bounding box queries are supported. The input geometries are simplified to bounding boxes. The operator is &&
(the similar to PostGIS) followed by a bounding box, which is a 4 element array where the coordinate order is [West, South, East, North
] (or you could say lower-left corner followed by upper-right corner).
find {geometry: && [-180, -90, 180, 90]}
Let's say you have document like this with text in an array:
{"foo": ["bar", "baz"]}
To find element with value "baz"
in the array, use syntax like this:
find {foo:[ == "baz"]}
If objects are nested in array, like this:
{"foo": [{"fiz": "bar"}, {"fiz": "baz"}]}
To find a {"fiz": "baz"}
in the array, use syntax like this:
find {foo: [{fiz: == "baz"}]}
Noise has full support for boolean logic using &&
(logical AND) and ||
(logical OR) operators and nesting logic with parens.
The comma ,
in objects is actually the same as the &&
operator. They can be used interchangeably for which ever is more readable.
Find a doc with "foo"
or "bar"
in the body
:
find {body: ~= "foo" || body: ~= "bar"}
Find a doc that has "foo"
or "bar"
and has "baz"
or "biz"
in the body
:
find {(body: ~= "foo" || body: ~= "bar") &&
(body: ~= "baz" || body: ~= "biz")}
The fields can be nested as well. Find a doc where either the nested field fiz
contains either "baz"
or "biz"
.
find {foo: {fiz: ~= "baz" || fiz: ~= "biz"}}
Use the !
(logical NOT) to exclude matching criteria.
Find docs where foo
has value "bar"
and fab
does not have value "baz"
:
find {foo: == "bar", fab: != "baz"}
You can use logical not with parentheses to negate everything enclosed. This example finds docs where foo
has value "bar"
and fab
does not have value "baz"
or "biz"
':
find {foo: == "bar", !(fab: == "baz" || fab: == "biz")}
You cannot have every clause be negated. Query need at least one non-negated clauses.
Illegal:
find {foo: !~= "bar" && foo: !~= "baz"}
Illegal:
find {!(foo: ~= "bar" && foo: ~= "baz"})
Also double negation is not allowed.
Illegal:
find {foo ~= "waz" && !(foo: ~= "bar" && foo: !~= "baz"})
To return results in a particular order, use the order clause.
This will order results ascending based on the contents of the baz
field:
find {foo: == "bar"}
order .baz
If baz
doesn't existing, null
be the value used for ordering.
This will order baz
descending:
find {foo: == "bar"}
order .baz
This will order baz
ascending:
find {foo: == "bar"}
order .baz asc
This will order baz
ascending with default value of 1
if no baz
value exists:
find {foo: == "bar"}
order .baz default=1 asc
This will order baz
ascending, for values of baz
that are the same, those results are now ordered as biz
ascending.
find {foo: == "bar"}
order .baz asc, .biz desc
Relevancy scoring uses a combination boolean model and Term Frequency/Inverse Document Frequency (TF/IDF) scoring system, very similar to Lucene and Elastic Search. The details of the scoring model is beyond the scope of the document.
To return results in relevancy score order (most relevant first), simply use the order clause with the score()
function.
find {subject: ~= "hammer" || body: ~= "hammer"}
order score() desc
But if you want matches in subject
fields to score higher than in body
fields, you can boost the score with the ^
operator. It is a multiplier of the scores of associated clauses.
This boosts subject
matches by 2x:
find {subject: ~= "hammer"^2 || body: ~= "hammer"}
order score() desc
You can also boost everything in parenthesis or objects or arrays:
find {(subject: ~= "hammer" || subject: ~= "nails")^2 ||
body: ~= "hammer" || body: ~= "nails"}
order score() desc
Another way to express the same thing:
find {subject: ~= "hammer" || subject: ~= "nails"}^2 ||
{body: ~= "hammer" || body: ~= "nails"}
order score() desc
The return clause is how data or scoring is returned to the client. You can extract the whole document, a single field, multiple fields, and perform aggregations.
For this section these examples the following document will be used:
{
"_id": "example",
"foo": "bar",
"baz": {"biz": "bar"},
"faz": [
{"fiz": 213},
{"biz": 5463},
{"biz": 73}
]
}
A leading dot indicates the root of the document. To return the whole document, place a single dot in return clause.
This will return the whole document for each document found.
find
{foo: == "bar"}
return
.
// [{
// "_id": "example",
// "foo": "bar",
// "baz": {"biz": "bar"},
// "faz": [
// {"fiz": 213},
// {"biz": 5463},
// {"biz": 73}
// ]
// }]
To return a specific field, place the field name after the dot:
find {foo: == "bar"}
return .baz
// [{"biz": "bar"}]
To return a nested field, use another dot:
find {foo: == "bar"}
return .baz.biz
// ["bar"]
To return an array element, use the array notation:
find {foo: == "bar"}
return .faz[1]
// [{"biz": 5463}]
To return an object field nested in the array, add a dot after the array notation:
find {foo: == "bar"}
return .faz[1].biz
// [5463]
To return multiple values, embed the return paths in other JSON structures.
For each match this example returns 2 values inside an array:
find {foo: == "bar"}
return [.baz, .faz]
// [[
// {"biz": "bar"},
// [{"fiz": 213}, {"biz": 5463}, {"biz": 73}]
// ]]
For each match this example return 2 values inside an object:
find {foo: == "bar"}
return {baz: .baz, faz: .faz}
// [{
// "baz": {"biz": "bar"},
// "faz": [{"fiz": 213}, {"biz": 5463}, {"biz": 73}]
// }]
Sometimes you'll want to return a field that doesn't exist on a matching document. When that happens, null
is returned.
If you'd like a different value to be returned, use the default=<json>
option, like this:
find {foo: == "bar"}
return .hammer default=0
// [0]
Each returned value can have a default as well.
find {foo: == "bar"}
return {baz: .baz default=0, hammer: .hammer default=1}
// [{
// "baz": {"biz": "bar"},
// "hammer": 1
// }]
If want to return a nested field inside an array, but for each object in the array, use the []
with no index.
This will return each biz field as an array of values:
find {foo: == "bar"}
return .faz[].biz
// [[5463, 73]]
If you are searching for nested values or objects nested in arrays, and you want to return only the match objects, use the bind syntax before the array in the query. The bound value is always an array, as multiple elements might match.
Say you have a document like this:
{
"_id": "a",
"foo": [
{"fiz": "bar", "val": 4}, {"fiz": "baz", "val": 7}
],
"bar": [
{"fiz": "baz", "val": 9}
]
}
You want to return the object where {"fiz": "bar", ...}
(but not the others), use you a bind variable (var::[...]
), like this:
find {foo: x::[{fiz: == "bar"}]}
return x
// [[{"fiz": "bar", "val": 4}]]
If instead you want to return the val
field, add the .val
to the bind variable like this:
find {foo: x::[{fiz: == "bar"}]}
return x.val
// [[4]]
You can have any number of bind variables:
find {foo: x::[{fiz: == "bar"}], foo: y::[{fiz: == "baz"}]}
return [x.val, y.val]
// [[[4], [7]]]
The same query as the previous one, but returning an object:
find {foo: x::[{fiz: == "bar"}], foo: y::[{fiz: == "baz"}]}
return {x: x.val, y: y.val}
// [{"x": [4], "y": [7]}]
You can reuse bind variables in different clauses and they'll be combined:
find {foo: x::[{fiz: == "baz"}] || bar: x::[{fiz: == "baz"}]}
return {x: x.val}
// [{"x": [7, 9]}]
To limit the number of results, use a limit clause at the end of the query.
This limits the results to the first 10 found:
find {foo: == "bar"}
return .baz
limit 10
Noise includes ways to group rows together and aggregate values.
Values you want to group together use group(...)
function in the return
clause.
For values that are grouped together you can then perform aggregations on other values and return that aggregation. If a group function is used, all other fields must also be grouped or aggregated.
The aggregation functions available are:
function | Description |
---|---|
array(...) |
Returns all values in the group as values in an array. |
array_flat(...) |
Returns all values in the group as values in an array. However if an array is encountered it extracts all the values inside the array (and further nested arrays) and returns them as a singe flat array |
avg(...) |
Averages numeric values in the group. If numeric values are in arrays, it extracts the values from the arrays. Even if arrays are nested in arrays, it extracts through all levels of nested arrays and averages them. |
count() |
Returns the count of the grouped rows for each grouping. |
concat(... [sep="..."]) |
Returns all the strings in the group as a single concatenated string. Other value types are ignored. Use the optional sep="..." to specify a separator between string values. |
max(...) |
Returns the maximum value in the group. See type ordering below to see how different types are considered. |
max_array(...) |
Returns the maximum value in the group, if array is encountered the values inside the array are extracted and considered. |
min(...) |
Returns the minimum value in the group. See type ordering below to see how different types are considered. |
min_array(...) |
Returns the minimum value in the group, if array is encountered the values inside the array are extracted and considered. |
sum(...) |
Sums numeric values in the group. If numeric values are in arrays, it extracts the values from the arrays. Even if arrays are nested in arrays, it extracts through all levels of nested arrays and sums them. |
To perform grouping and/or aggregate, each field returned will need either a grouping or a aggregate function. It's an error it on some returned fields but not others.
Groupings are are ordered first on the leftmost group(...)
function, then on the next leftmost, etc.
You do not need to use group(...)
to perform aggregates. If you have no group(...)
defined, then all rows are aggregated into a single row.
The ordering of types for max(...)
and min(...)
is as follows:
null < false < true < number < string < array < object
Let's say we have documents like this:
{"foo":"group1", "baz": "a", "bar": 1}
{"foo":"group1", "baz": "b", "bar": 2}
{"foo":"group1", "baz": "c", "bar": 3}
{"foo":"group1", "baz": "a", "bar": 1}
{"foo":"group1", "baz": "b", "bar": 2}
{"foo":"group1", "baz": "c", "bar": 3}
{"foo":"group1", "baz": "a", "bar": 1}
{"foo":"group1", "baz": "b", "bar": 2}
{"foo":"group1", "baz": "c", "bar": 3}
{"foo":"group1", "baz": "a", "bar": 1}
{"foo":"group1", "baz": "b", "bar": 2}
{"foo":"group1", "baz": "c", "bar": 3}
{"foo":"group2", "baz": "a", "bar": "a"}
{"foo":"group2", "baz": "a", "bar": "b"}
{"foo":"group2", "baz": "b", "bar": "a"}
{"foo":"group2", "baz": "b", "bar": "b"}
{"foo":"group2", "baz": "a", "bar": "a"}
{"foo":"group2", "baz": "a", "bar": "c"}
{"foo":"group2", "baz": "b", "bar": "d"}
{"foo":"group2", "baz": "b", "bar": "e"}
{"foo":"group2", "baz": "a", "bar": "f"}
{"foo":"group3", "baz": "a", "bar": "a"}
("foo":"group3", "bar": "b"}
{"foo":"group3", "baz": "b", "bar": "a"}
{"foo":"group3", "baz": "b", "bar": "b"}
{"foo":"group3", "baz": "a", "bar": "a"}
{"foo":"group3", "baz": "a" }
{"foo":"group3", "baz": "b", "bar": "d"}
{"foo":"group3", "baz": "b", "bar": "e"}
{"foo":"group3", "baz": "a", "bar": "f"}
Query:
find {foo: == "group1"}
return {baz: group(.baz), count: count()}
Results:
{"baz":"a","bar":4}
{"baz":"b","bar":4}
{"baz":"c","bar":4}
Query:
find {foo: == "group1"}
return {baz: group(.baz), bar: sum(.bar)}
Results:
{"baz":"a","bar":4}
{"baz":"b","bar":8}
{"baz":"c","bar":12}
Query:
find {foo: == "group1"}
return {avg: avg(.bar)}
Results:
{"bar":2}
Query:
find {foo: == "group1"}
return {baz: group(.baz), concat: concat(.baz sep="|")}
Results:
{"baz":"a","concat":"a|a|a|a"}
{"baz":"b","concat":"b|b|b|b"}
{"baz":"c","concat":"c|c|c|c"}
Query:
find {foo: == "group1"}
return {max: max(.bar)}
Results:
{"max":3}
Query:
find {foo: == "group1"}
return {max: max(.baz)}
Results:
{"max":"c"}
Query:
find {foo: == "group1"}
return {min: min(.bar)}
Results:
{"min":1}
Query:
find {foo: == "group2"}
return [group(.baz order=asc), group(.bar order=desc), count()]
Results:
["a","f",1]
["a","c",1]
["a","b",1]
["a","a",2]
["b","e",1]
["b","d",1]
["b","b",1]
["b","a",1]
Query:
find {foo: =="group2"}
return [group(.baz order=asc) default="a", group(.bar order=desc) default="c", count()];
Results:
["a","f",1]
["a","c",1]
["a","b",1]
["a","a",2]
["b","e",1]
["b","d",1]
["b","b",1]
["b","a",1]
When performing aggregations on arrays, some functions will extract values out of the arrays (and arrays nested in arrays).
We have documents like this:
{"foo":"array1", "baz": ["a","b",["c","d",["e"]]]}
{"foo":"array1", "baz": ["f","g",["h","i"],"j"]}
{"foo":"array2", "baz": [1,2,[3,4,[5]]]}
{"foo":"array2", "baz": [6,7,[8,9],10]};
Query:
find {foo: == "array1"}
return array(.baz)
Results:
[["f","g",["h","i"],"j"],["a","b",["c","d",["e"]]]]
Query:
find {foo: == "array1"}
return array_flat(.baz)
Results:
["f","g","h","i","j","a","b","c","d","e"]
Query:
find {foo: == "array1"}
return max(.baz)
Results:
["f","g",["h","i"],"j"]
Query:
find {foo: == "array1"}
return max_array(.baz)
Results:
"j"
Query:
find {foo: == "array1"}
return min_array(.baz)
Results:
"a"
Query:
find {foo: =="array2"}
return avg(.baz)
Results:
5.5
Query:
find {foo: =="array2"}
return sum(.baz)
Results:
55