Commit

Merge branch 'ubi-docs-consolidation' of https://github.com/o19s/docu…
…mentation-website into ubi-docs-consolidation
RasonJ committed May 31, 2024
2 parents c9edc86 + 6f08485 commit 6626bb1
Showing 4 changed files with 117 additions and 110 deletions.
2 changes: 1 addition & 1 deletion _search-plugins/ubi/data-structures.md
@@ -8,7 +8,7 @@ nav_order: 7

# Sample client data structures
The client data structures can be used to create events that follow the [UBI event schema specification](https://github.com/o19s/opensearch-ubi),
which is describedin further detail [here]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/).
which is described in further detail [here]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/).

The developer provides an implementation for the following functions:
- `getClientId()`
1 change: 1 addition & 0 deletions _search-plugins/ubi/documentation.md
@@ -15,6 +15,7 @@ to improve search relevance and user experience.

## Quick start

We need a Quick Start!!!

## UBI store

4 changes: 4 additions & 0 deletions _search-plugins/ubi/query_id.md
@@ -21,6 +21,8 @@ If the user interacts with an *object* (book, product, document) that was return
This mermaid source is converted into an png under
.../images/ubi/query_id.png

<!-- vale off -->

```mermaid
%%{init: {'theme':'base',
'themeVariables': {
@@ -73,6 +75,8 @@ sequenceDiagram
```
{% endcomment %}

<!-- vale on -->

<!-- vale off -->
# The *Cute Things Animal Rescue*
<!-- vale on -->
220 changes: 111 additions & 109 deletions _search-plugins/ubi/schemas.md
@@ -11,10 +11,12 @@ nav_order: 7
## Key ID's
UBI is not functional unless the links between the following fields are consistently maintained within your UBI-enabled application:

- [`client_id`](#client_id) represents a unique user with their client application.
- [`client_id`](#client_id) represents a unique user with their client application.
- [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as `epc`, `isbn`, `ssn`, `handle`.
<!-- vale off -->
- [`object_id_field`](#object_id) tells us the type of `object_id`, i.e. the actual labels: "epc", "isbn", "ssn", or "handle" for each `object_id`.
- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s (_hits_) that the query returned.
<!-- vale on -->
- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s (_hits_) that the query returned.
- [`action_name`](#action_name), though not technically an *id*, tells us the exact user action (such as `click`, `add_to_cart`, `watch`, `view`, or `purchase`) that was taken (or not) with a given `object_id`.

To summarize: the `query_id` signals the beginning of a `client_id`'s *Search Journey* every time a user queries the search index, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with.
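
As a rough illustration of how these IDs travel together, the following Python sketch builds one event that links a user action to a query result. The helper name `make_event`, the sample values, and the use of epoch seconds for `timestamp` are assumptions made for this example; only the field names come from the schema described on this page.

```python
import time


def make_event(client_id, query_id, action_name, object_id, object_id_field):
    """Build a minimal UBI-style event (hypothetical helper, not part of UBI)."""
    return {
        "client_id": client_id,        # who acted
        "query_id": query_id,          # which search the action belongs to
        "action_name": action_name,    # what the user did
        "timestamp": int(time.time()), # assumption: UNIX epoch seconds
        "event_attributes": {
            "object": {
                "object_id": object_id,              # which result was acted on
                "object_id_field": object_id_field,  # what kind of ID that is
            }
        },
    }


# Sample values are invented for illustration.
event = make_event("client-42", "query-abc", "add_to_cart", "9780143127741", "isbn")
```

If any one of the three links (`client_id`, `query_id`, `object_id`) is dropped or changed between the search and the event, the event can no longer be traced back to the search that produced it.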
@@ -29,14 +31,14 @@ To summarize: the `query_id` signals the beginning of a `client_id`'s *Search Jo
- **Search Client**: in charge of searching, and then receiving *objects* from some document index in OpenSearch.
(1, 2, *5* and 7, in following sections)
- **User Behavior Insights** plugin: if activated in the `ext.ubi` stanza of the search request, manages the **UBI Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client** so that events can be linked to this query.
(3, 4 and *5*, in following sections)
(3, 4 and *5*, in following sections)
- **objects**: are whatever items the user is searching for with the queries. Activating UBI involves mapping your real-world objects (using their `isbn`, `ssn`) to the [`object_id`](#object_id) fields in the schemas.
- The **Search Client**, if separate from the **UBI Client**, forwards the indexed [`query_id`](#query_id) to the **UBI Client**.
&ensp; *Note:* We break out the roles of *search* and *UBI event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
&ensp;(6, following section)
&ensp; *Note:* We break out the roles of *search* and *UBI event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.
&ensp;(6, following section)
- The **UBI Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** and passed back to the **UBI Client**
- If the **UBI Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` [`action_name`](#action_name) and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*.
(8 and 9, following section)
(8 and 9, following section)
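
The hand-off described in this list can be sketched with an in-memory stand-in for the cluster. Everything here is hypothetical: `FakeCluster`, `FakeUbiPlugin`, the naive title matching, and the placement of `query_id` under `ext.ubi` in the response are assumptions made for illustration, not the plugin's actual API.

```python
import uuid


class FakeUbiPlugin:
    """In-memory stand-in for the User Behavior Insights plugin (hypothetical)."""

    def __init__(self):
        self.ubi_queries = {}

    def record_query(self, dsl, object_ids):
        # Steps 3 and 4: store the query with its hits, hand back a query_id.
        query_id = str(uuid.uuid4())
        self.ubi_queries[query_id] = {
            "query": dsl,
            "query_response_objects_ids": list(object_ids),
        }
        return query_id


class FakeCluster:
    """In-memory stand-in for an OpenSearch cluster (hypothetical)."""

    def __init__(self, docs):
        self.docs = docs  # object_id -> title
        self.ubi = FakeUbiPlugin()
        self.ubi_events = []

    def search(self, request):
        term = request["query"]["match"]["title"].lower()
        hits = [oid for oid, title in self.docs.items() if term in title.lower()]
        response = {"hits": hits}
        # UBI only takes part when the request carries an ext.ubi stanza.
        if "ubi" in request.get("ext", {}):
            query_id = self.ubi.record_query(request["query"], hits)
            response["ext"] = {"ubi": {"query_id": query_id}}  # assumed shape
        return response


cluster = FakeCluster({"isbn-1": "Cute Cats", "isbn-2": "Cute Dogs", "isbn-3": "Tax Law"})

# Steps 1, 2 and 5: search with UBI activated and read back the query_id.
response = cluster.search({"query": {"match": {"title": "cute"}}, "ext": {"ubi": {}}})
query_id = response["ext"]["ubi"]["query_id"]

# Steps 8 and 9: the user clicks a result; the event carries the same query_id.
clicked = response["hits"][0]
cluster.ubi_events.append({
    "client_id": "client-42",
    "query_id": query_id,
    "action_name": "click",
    "event_attributes": {"object": {"object_id": clicked}},
})
```

The point of the sketch is the last step: because the event reuses the `query_id` returned with the search results, the clicked `object_id` can later be matched against the hits stored for that query.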


<img src="{{site.url}}{{site.baseurl}}/images/ubi/ubi-schema-interactions_legend.png" />
@@ -51,62 +53,62 @@ The mermaid source is converted into an png under
graph LR
style L fill:none,stroke-dasharray: 5 5
subgraph L["`*Legend*`"]
style ss height:150px
subgraph ss["Standard Search"]
direction LR
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
subgraph ubi-leg["UBI data flow"]
direction LR
ln2a[ ].->|"`**UBI interaction**`"|ln2b[ ];
style ln1c fill:red
ln1c[ ]-->|<span style="font-family:Courier New">query_id</span> flow|ln1d[ ];
end
style ss height:150px
subgraph ss["Standard Search"]
direction LR
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
subgraph ubi-leg["UBI data flow"]
direction LR
ln2a[ ].->|"`**UBI interaction**`"|ln2b[ ];
style ln1c fill:red
ln1c[ ]-->|<span style="font-family:Courier New">query_id</span> flow|ln1d[ ];
end
end
linkStyle 0 stroke-width:2px,stroke:#0A1CCF
linkStyle 2 stroke-width:2px,stroke:red
```
```mermaid
%%{init: {
"flowchart": {"htmlLabels": false},
"flowchart": {"htmlLabels": false},
}
}
}%%
graph TB
User--1) <i>raw search string</i>-->Search;
User--1) <i>raw search string</i>-->Search;
Search--2) <i>search string</i>-->Docs
style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
subgraph OS[OpenSearch Cluster fa:fa-database]
style E stroke-width:1px,stroke:red
E[(&emsp;<b>UBI Events</b>&emsp;)]
style Docs stroke-width:1px,stroke:#0A1CCF
style Q stroke-width:1px,stroke:red
Docs[(Document Index)] -."3) {<i>DSL</i>...} & [<i>object_id's</i>,...]".-> Q[(&emsp;<b>UBI Queries</b>&emsp;)];
Q -.4) <span style="font-family:Courier New">query_id</span>.-> Docs ;
style E stroke-width:1px,stroke:red
E[(&emsp;<b>UBI Events</b>&emsp;)]
style Docs stroke-width:1px,stroke:#0A1CCF
style Q stroke-width:1px,stroke:red
Docs[(Document Index)] -."3) {<i>DSL</i>...} & [<i>object_id's</i>,...]".-> Q[(&emsp;<b>UBI Queries</b>&emsp;)];

Q -.4) <span style="font-family:Courier New">query_id</span>.-> Docs ;
end
Docs -- "5) <i>return</i> both <span style="font-family:Courier New">query_id</span> & [<i>objects</i>,...]" --->Search ;
Search-.6) <span style="font-family:Courier New">query_id</span>.->U;
Search-.6) <span style="font-family:Courier New">query_id</span>.->U;
Search --7) [<i>results</i>, ...]--> User
style *client-side* stroke-width:1px, stroke:#D35400
subgraph "`*client-side*`"
style User stroke-width:4px, stroke:#EC636
User["`**User**`" fa:fa-user]
App
Search
U
style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px
subgraph App[&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;UserApp fa:fa-store]
style Search stroke-width:2px, stroke:#0A1CCF
Search(&emsp;Search Client&emsp;)
style U stroke-width:1px,stroke:red
U(&emsp;<b>UBI Client</b>&emsp;)
end
style User stroke-width:4px, stroke:#EC636
User["`**User**`" fa:fa-user]
App
Search
U
style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px
subgraph App[&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;UserApp fa:fa-store]
style Search stroke-width:2px, stroke:#0A1CCF
Search(&emsp;Search Client&emsp;)
style U stroke-width:1px,stroke:red
U(&emsp;<b>UBI Client</b>&emsp;)
end
end
User -.8) <i>selects</i> <span style="font-family:Courier New">object_id:123</span>.->U;
@@ -121,107 +123,107 @@ linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red
There are 2 separate stores for UBI:
### 1) **UBI queries**
All underlying query information and results ([`object_id`](#object_id)'s) are stored in the **UBI Queries** store, and remains largely invisible in the background.
The only obvious difference will be in the `ubi` stanze of the json response, *which could cause index bloat if one forgets that this is enabled*.
The only obvious difference will be in the `ubi` stanza of the JSON response, *which could cause index bloat if one forgets that this is enabled*.

**UBI Queries** [schema](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources/queries-mapping.json):
Since UBI manages the **UBI Queries** store, the developer should never have to write directly to this store (except for importing data).

- `timestamp` (events & queries)
- `timestamp` (events and queries)
&ensp; A UNIX timestamp of when the query was received

- `query_id` (events & queries)
&ensp; A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.
- `query_id` (events and queries)
&ensp; A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`.

- `client_id` (events)
&ensp; A user/client ID provided by the client application

- `query_response_objects_ids` (queries)
&ensp; This is an array of the `object_id`'s. This *could* be the same id as the `_id` but is meant to be the externally valid id of document/item/product.
- `query_response_objects_ids` (queries)
&ensp; This is an array of the `object_id`'s. This *could* be the same id as the `_id` but is meant to be the externally valid id of document/item/product.
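
For orientation only, here is a Python sketch of what one record in the **UBI Queries** store might look like, using the fields listed above. UBI manages this store itself, so the `make_query_record` helper is hypothetical; the UUID-based `query_id` and the epoch-seconds `timestamp` are assumptions as well.

```python
import time
import uuid


def make_query_record(object_ids, client_id=None, query_id=None):
    """Assemble a UBI Queries-style record (hypothetical helper)."""
    return {
        "timestamp": int(time.time()),             # assumption: epoch seconds
        "query_id": query_id or str(uuid.uuid4()), # client-provided or generated
        "client_id": client_id,
        "query_response_objects_ids": list(object_ids),
    }


# The same search issued twice yields two records with different query_ids.
first = make_query_record(["isbn-1", "isbn-2"], client_id="client-42")
second = make_query_record(["isbn-1", "isbn-2"], client_id="client-42")
```

Generating a fresh `query_id` per request, rather than per distinct query text, is what lets each search journey be analyzed separately even when users repeat the same search.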



### 2) **UBI events**
This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information.
Since this schema is dynamic, the developer can add any new fields and structures (such as *user* information, *geo-location* information) at index time that are not in the current **UBI Events** [schema](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources/events-mapping.json):
- `application`
<p id="application">
<p id="application">

&ensp; (size 100) - name of the application tracking UBI events (e.g. `amazon-shop`, `ABC-microservice`)
&ensp; (size 100) - name of the application tracking UBI events (e.g. `amazon-shop`, `ABC-microservice`)
- `action_name`
<p id="action_name">
<p id="action_name">

&ensp; (size 100) - any name you want to call your event such as `click`, `watch`, `purchase`, and `add_to_cart`, but one could map these to any common *JavaScript* events, or debugging events.
_TODO: How to formalize? A list of standard ones and then custom ones._
&ensp; (size 100) - any name you want to call your event such as `click`, `watch`, `purchase`, and `add_to_cart`, but one could map these to any common *JavaScript* events, or debugging events.
_TODO: How to formalize? A list of standard ones and then custom ones._

- `query_id`
<p id="query_id">
- `query_id`
<p id="query_id">

&ensp; (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**.
&ensp; (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**.

The `client_id` must be consistent in both the **UBI Queries** and **UBI Events** stores.

- `timestamp`:
&ensp; UTC-based, UNIX epoch time.
&ensp; UTC-based, UNIX epoch time.

- `message_type`
&ensp; (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything else useful such as `QUERY` or `CONVERSION`.
Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_
- `message_type`

&ensp; (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything else useful such as `QUERY` or `CONVERSION`.
Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_

- `message`
&ensp; (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect an informational or debug type text for this field, but a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching on.
- `message`

&ensp; (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect an informational or debug type text for this field, but a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching on.

`event_attributes` has dynamic mapping, meaning if events are indexed with many custom fields, the index could bloat quickly with many new fields.
{: .warning}
{: .warning}

- `event_attributes`'s structure that describes any important context about the event. Within it, it has 2 primary structures `position` and `object`, as well as being extensible to add anymore relevant, custom, information about the event can be stored such as timing informaiton, individual user or session information, etc.
- `event_attributes`'s structure that describes any important context about the event. Within it, it has 2 primary structures `position` and `object`, as well as being extensible to add anymore relevant, custom, information about the event can be stored such as timing informaiton, individual user or session information, etc.


The two primary structures in the `event_attributes`:
- **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, ....
The two primary structures in the `event_attributes`:
- **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, ....


- `event_attributes.position.ordinal`

- `event_attributes.position.ordinal`

&ensp; tracks the *n*th item within a list that a user could select, click (i.e. selecting the 3rd element could be event{`onClick, results[4]`})
&ensp; tracks the *n*th item within a list that a user could select, click (i.e. selecting the 3rd element could be event{`onClick, results[4]`})


- `event_attributes.position.{x,y}`
&ensp; tracks x and y values, that the client defines
- `event_attributes.position.{x,y}`

&ensp; tracks x and y values, that the client defines

- `event_attributes.position.page_depth`
&ensp; tracks page depth of results
- `event_attributes.position.page_depth`

&ensp; tracks page depth of results

- `event_attributes.position.scroll_depth`
&ensp; tracks scroll depth of page results
- `event_attributes.position.scroll_depth`

&ensp; tracks scroll depth of page results

- `event_attributes.position.trail`
&ensp; text field for tracking the path/trail that a user took to get to this location
<p id="object_id">
- `event_attributes.position.trail`

&ensp; text field for tracking the path/trail that a user took to get to this location

<p id="object_id">

- **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post).
The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object:
- `event_attributes.object.internal_id` is a unique id that OpenSearch can use to internally to index the object, think the `_id` field in the indexes.
- `event_attributes.object.object_id`
&ensp; is the id that a user could look up amd find the object instance within the **document corpus**. Examples include: `ssn`, `isbn`, `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`.
Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id`

- `event_attributes.object.object_id_field`
&ensp; indicates the type/class of object _and_ the ID field of the search index.

- `event_attributes.object.description`
&ensp; optional description of the object

- `event_attributes.object.object_detail`
&ensp; optional text for further data object details
- *extensible fields*: any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use-case.
{: .warning}
- **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post).
The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object:

- `event_attributes.object.internal_id` is a unique id that OpenSearch can use to internally to index the object, think the `_id` field in the indexes.
- `event_attributes.object.object_id`
&ensp; is the id that a user could look up and find the object instance within the **document corpus**. Examples include: `ssn`, `isbn`, `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`.
Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id`

- `event_attributes.object.object_id_field`

&ensp; indicates the type/class of object _and_ the ID field of the search index.

- `event_attributes.object.description`

&ensp; optional description of the object


- `event_attributes.object.object_detail`

&ensp; optional text for further data object details
- *extensible fields*: any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use-case.
{: .warning}
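
A client might want to check events before indexing them. The following Python sketch reads the "(size N)" notes above as maximum string lengths, which is an assumption, and the `check_event` helper and the sample event values are invented for this example.

```python
# Assumed limits, taken from the "(size N)" notes in the field list above.
MAX_LENGTHS = {
    "application": 100,
    "action_name": 100,
    "query_id": 100,
    "message_type": 100,
    "message": 256,
}


def check_event(event):
    """Return a list of problems found in an event (hypothetical helper)."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        value = event.get(field)
        if value is not None and len(str(value)) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    return problems


event = {
    "application": "amazon-shop",
    "action_name": "click",
    "query_id": "query-abc",
    "client_id": "client-42",
    "message_type": "INFO",
    "message": "user clicked the third result",
    "event_attributes": {
        "position": {"ordinal": 3, "x": 120, "y": 480, "page_depth": 1},
        "object": {
            "object_id": "9780143127741",
            "object_id_field": "isbn",
            "description": "paperback edition",
        },
    },
}

problems = check_event(event)
too_long = check_event({**event, "message": "x" * 300})
```

The check deliberately ignores `event_attributes`, since that structure is dynamically mapped and open to custom fields; keeping custom fields few and consistent is what limits the index growth the warning above describes.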
