From 6f084852b46b8252d1fd3a3219acb9467bef90bf Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Thu, 30 May 2024 18:25:31 -0400 Subject: [PATCH] chase some more vale warnings. --- _search-plugins/ubi/data-structures.md | 2 +- _search-plugins/ubi/documentation.md | 1 + _search-plugins/ubi/query_id.md | 4 + _search-plugins/ubi/schemas.md | 220 +++++++++++++------------ 4 files changed, 117 insertions(+), 110 deletions(-) diff --git a/_search-plugins/ubi/data-structures.md b/_search-plugins/ubi/data-structures.md index 04c8a463c4..a534101d02 100644 --- a/_search-plugins/ubi/data-structures.md +++ b/_search-plugins/ubi/data-structures.md @@ -8,7 +8,7 @@ nav_order: 7 # Sample client data structures The client data structures can be used to create events that follow the [UBI event schema specification](https://github.com/o19s/opensearch-ubi), -which is describedin further detail [here]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). +which is described in further detail [here]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). The developer provides an implementation for the following functions: - `getClientId()` diff --git a/_search-plugins/ubi/documentation.md b/_search-plugins/ubi/documentation.md index 32c5b25f2b..f0fdad964e 100644 --- a/_search-plugins/ubi/documentation.md +++ b/_search-plugins/ubi/documentation.md @@ -15,6 +15,7 @@ to improve search relevance and user experience. ## Quick start +We need a Quick Start!!! 
## UBI store diff --git a/_search-plugins/ubi/query_id.md b/_search-plugins/ubi/query_id.md index 9a8796b39e..738a741635 100644 --- a/_search-plugins/ubi/query_id.md +++ b/_search-plugins/ubi/query_id.md @@ -21,6 +21,8 @@ If the user interacts with an *object* (book, product, document) that was return This mermaid source is converted into an png under .../images/ubi/query_id.png + + ```mermaid %%{init: {'theme':'base', 'themeVariables': { @@ -73,6 +75,8 @@ sequenceDiagram ``` {% endcomment %} + + # The *Cute Things Animal Rescue* diff --git a/_search-plugins/ubi/schemas.md b/_search-plugins/ubi/schemas.md index efda65539c..41a7f204eb 100644 --- a/_search-plugins/ubi/schemas.md +++ b/_search-plugins/ubi/schemas.md @@ -11,10 +11,12 @@ nav_order: 7 ## Key ID's UBI is not functional unless the links between the following fields are consistently maintained within your UBI-enabled application: -- [`client_id`](#client_id) represents a unique user with their client application. +- [`client_id`](#client_id) represents a unique user with their client application. - [`object_id`](#object_id) represents an id for whatever item the user is searching for, such as `epc`, `isbn`, `ssn`, `handle`. + - [`object_id_field`](#object_id) tells us the type of `object_id`, i.e. the actual labels: "epc", "isbn", "ssn", or "handle" for each `object_id`. -- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s (_hits_) that the query returned. + +- [`query_id`](#query_id) is a unique id for the raw query language executed and the resultant `object_id`'s (_hits_) that the query returned. - [`action_name`](#action_name), though not technically an *id*, the `action_name` tells us what exact user action (such as `click` or `add_to_cart`, `watch`, `view`, `purchase`) that was taken (or not) with a given `object_id`. 
To summarize: the `query_id` signals the beginning of a `client_id`'s *Search Journey* every time a user queries the search index, the `action_name` tells us how the user is interacting with the query results within the application, and [`event_attributes.object.object_id`](#object_id) is referring to the precise query result that the user interacts with. @@ -29,14 +31,14 @@ To summarize: the `query_id` signals the beginning of a `client_id`'s *Search Jo - **Search Client**: in charge of searching, and then recieving *objects* from some document index in OpenSearch. (1, 2, *5* and 7, in following sections) - **User Behavior Insights** plugin: if activated in the `ext.ubi` stanza of the search request, manages the **UBI Queries** store in the background, indexing each underlying, technical, DSL, index query with a unique [`query_id`](#query_id) along with all returned resultant [`object_id`](#object_id)'s, and then passing the `query_id` back to the **Search Client** so that events can be linked to this query. - (3, 4 and *5*, in following sections) + (3, 4 and *5*, in following sections) - **objects**: are whatever items the user is searching for with the queries. Activating UBI involves mapping your real-world objects (using it's `isbn`, `ssn`) to the [`object_id`](#object_id) fields in the schemas. - The **Search Client**, if separate from the **UBI Client**, forwards the indexed [`query_id`](#query_id) to the **UBI Client**. -   *Note:* We break out the roles of *search* and *UBI event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. -  (6, following section) +   *Note:* We break out the roles of *search* and *UBI event indexing* here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing. 
+  (6, following section) - The **UBI Client** then indexes all user events with this [`query_id`](#query_id) until a new search is performed, and a new `query_id` is generated by **User Behavior Insights** and passed back to the **UBI Client** - If the **UBI Client** interacts with a result *object*, such as `onClick`, that [`object_id`](#object_id), `onClick` [`action_name`](#action_name) and `query_id` are all indexed together, signalling the causal link between the *search* and the *object*. - (8 and 9, following section) + (8 and 9, following section) @@ -51,62 +53,62 @@ The mermaid source is converted into an png under graph LR style L fill:none,stroke-dasharray: 5 5 subgraph L["`*Legend*`"] - style ss height:150px - subgraph ss["Standard Search"] - direction LR - - style ln1a fill:blue - ln1a[ ]--->ln1b[ ]; - end - subgraph ubi-leg["UBI data flow"] - direction LR - - ln2a[ ].->|"`**UBI interaction**`"|ln2b[ ]; - style ln1c fill:red - ln1c[ ]-->|query_id flow|ln1d[ ]; - end + style ss height:150px + subgraph ss["Standard Search"] + direction LR + + style ln1a fill:blue + ln1a[ ]--->ln1b[ ]; + end + subgraph ubi-leg["UBI data flow"] + direction LR + + ln2a[ ].->|"`**UBI interaction**`"|ln2b[ ]; + style ln1c fill:red + ln1c[ ]-->|query_id flow|ln1d[ ]; + end end linkStyle 0 stroke-width:2px,stroke:#0A1CCF linkStyle 2 stroke-width:2px,stroke:red ``` ```mermaid %%{init: { - "flowchart": {"htmlLabels": false}, + "flowchart": {"htmlLabels": false}, - } + } }%% graph TB -User--1) raw search string-->Search; +User--1) raw search string-->Search; Search--2) search string-->Docs style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5 subgraph OS[OpenSearch Cluster fa:fa-database] - style E stroke-width:1px,stroke:red - E[( UBI Events )] - style Docs stroke-width:1px,stroke:#0A1CCF - style Q stroke-width:1px,stroke:red - Docs[(Document Index)] -."3) {DSL...} & [object_id's,...]".-> Q[( UBI Queries )]; - Q -.4) query_id.-> Docs ; + style E 
stroke-width:1px,stroke:red + E[( UBI Events )] + style Docs stroke-width:1px,stroke:#0A1CCF + style Q stroke-width:1px,stroke:red + Docs[(Document Index)] -."3) {DSL...} & [object_id's,...]".-> Q[( UBI Queries )]; + Q -.4) query_id.-> Docs ; end Docs -- "5) return both query_id & [objects,...]" --->Search ; -Search-.6) query_id.->U; +Search-.6) query_id.->U; Search --7) [results, ...]--> User style *client-side* stroke-width:1px, stroke:#D35400 subgraph "`*client-side*`" - style User stroke-width:4px, stroke:#EC636 - User["`**User**`" fa:fa-user] - App - Search - U - style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px - subgraph App[       UserApp fa:fa-store] - style Search stroke-width:2px, stroke:#0A1CCF - Search( Search Client ) - style U stroke-width:1px,stroke:red - U( UBI Client ) - end + style User stroke-width:4px, stroke:#EC636 + User["`**User**`" fa:fa-user] + App + Search + U + style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px + subgraph App[       UserApp fa:fa-store] + style Search stroke-width:2px, stroke:#0A1CCF + Search( Search Client ) + style U stroke-width:1px,stroke:red + U( UBI Client ) + end end User -.8) selects object_id:123.->U; @@ -121,22 +123,22 @@ linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red There are 2 separate stores for UBI: ### 1) **UBI queries** All underlying query information and results ([`object_id`](#object_id)'s) are stored in the **UBI Queries** store, and remains largely invisible in the background. -The only obvious difference will be in the `ubi` stanze of the json response, *which could cause index bloat if one forgets that this is enabled*. +The only obvious difference will be in the `ubi` stanza of the JSON response, *which could cause index bloat if one forgets that this is enabled*. 
**UBI Queries** [schema](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources/queries-mapping.json): Since UBI manages the **UBI Queries** store, the developer should never have to write directly to this store (except for importing data). -- `timestamp` (events & queries) +- `timestamp` (events and queries)   A UNIX timestamp of when the query was received -- `query_id` (events & queries) -   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times would generate different `query_id`. +- `query_id` (events and queries) +   A unique ID of the query provided by the client or generated automatically. The same query text issued multiple times generates a different `query_id` each time. - `client_id` (events)   A user/client ID provided by the client application -- `query_response_objects_ids` (queries) -   This is an array of the `object_id`'s. This *could* be the same id as the `_id` but is meant to be the externally valid id of document/item/product. +- `query_response_objects_ids` (queries) +   An array of the `object_id` values that the query returned. Each value *could* be the same as the document's `_id`, but is meant to be the externally valid ID of the document, item, or product. @@ -144,84 +146,84 @@ Since UBI manages the **UBI Queries** store, the developer should never have to This is the event store that the client side directly indexes events to, linking the event [`action_name`](#action_name), [`object_id`](#object_id)'s and [`query_id`](#query_id)'s together with any other important event information. Since this schema is dynamic, the developer can add any new fields and structures (such as *user* information, *geo-location* information) at index time that are not in the current **UBI Events** [schema](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources/events-mapping.json): - `application` -

+

-   (size 100) - name of the application tracking UBI events (e.g. `amazon-shop`, `ABC-microservice`) +   (size 100) - name of the application tracking UBI events (e.g. `amazon-shop`, `ABC-microservice`) - `action_name` -

+

-   (size 100) - any name you want to call your event such as `click`, `watch`, `purchase`, and `add_to_cart`, but one could map these to any common *JavaScript* events, or debugging events. -_TODO: How to formalize? A list of standard ones and then custom ones._ +   (size 100) - any name you choose for your event, such as `click`, `watch`, `purchase`, or `add_to_cart`; these names could also map to common *JavaScript* events or to debugging events. +_TODO: How to formalize? A list of standard ones and then custom ones._ -- `query_id` -

+- `query_id` +

-   (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**. - +   (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**. + The `client_id` must be consistent in both the **UBI Queries** and **UBI Events** stores. - `timestamp`: -   UTC-based, UNIX epoch time. +   UTC-based, UNIX epoch time. -- `message_type` - -   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything else useful such as `QUERY` or `CONVERSION`. - Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_ +- `message_type` + +   (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything else useful such as `QUERY` or `CONVERSION`. + Can be used to group `action_name` together in logical bins. _Thinking this should be backend logic in analysis_ -- `message` - -   (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect an informational or debug type text for this field, but a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching on. +- `message` + +   (size 256) - optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect informational or debug text in this field, whereas with a `message_type` of `QUERY`, the text would more likely describe what the user is searching for. `event_attributes` has dynamic mapping, meaning if events are indexed with many custom fields, the index could bloat quickly with many new fields. -{: .warning} +{: .warning} -- `event_attributes`'s structure that describes any important context about the event.
Within it, it has 2 primary structures `position` and `object`, as well as being extensible to add anymore relevant, custom, information about the event can be stored such as timing informaiton, individual user or session information, etc. +- `event_attributes`: a structure that describes any important context about the event. It has two primary structures, `position` and `object`, and is extensible, so any other relevant custom information about the event, such as timing information or individual user or session information, can be stored in it. - The two primary structures in the `event_attributes`: - - **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, .... + The two primary structures in the `event_attributes`: + - **`event_attributes.position`** - structure that contains information on the location of the event origin, such as screen *x,y* coordinates, or the *n-th* object out of 10 results, .... + + - `event_attributes.position.ordinal` - - `event_attributes.position.ordinal` - -   tracks the *n*th item within a list that a user could select, click (i.e. selecting the 3rd element could be event{`onClick, results[4]`}) +   tracks the *n*th item within a list that a user could select or click (for example, selecting the 3rd element of a zero-indexed result list could be recorded as event{`onClick, results[2]`})
- - `event_attributes.position.{x,y}` - -   tracks x and y values, that the client defines + - `event_attributes.position.{x,y}` + +   tracks x and y values that the client defines - - `event_attributes.position.page_depth` - -   tracks page depth of results + - `event_attributes.position.page_depth` + +   tracks page depth of results - - `event_attributes.position.scroll_depth` - -   tracks scroll depth of page results + - `event_attributes.position.scroll_depth` + +   tracks scroll depth of page results - - `event_attributes.position.trail` - -   text field for tracking the path/trail that a user took to get to this location - -

+ - `event_attributes.position.trail` + +   text field for tracking the path/trail that a user took to get to this location + +

- - **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (i.e.: a book, a product, a post). - The `object` structure has two ways to refer to the object, with `object_id` being the id that links prior queries to this object: - - - `event_attributes.object.internal_id` is a unique id that OpenSearch can use to internally to index the object, think the `_id` field in the indexes. - - `event_attributes.object.object_id` -   is the id that a user could look up amd find the object instance within the **document corpus**. Examples include: `ssn`, `isbn`, `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need SKU level as the `object_id`. - Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id` - - - `event_attributes.object.object_id_field` - -   indicates the type/class of object _and_ the ID field of the search index. - - - `event_attributes.object.description` - -   optional description of the object - - - - `event_attributes.object.object_detail` - -   optional text for further data object details - - - *extensible fields*: any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use-case. -{: .warning} + - **`event_attributes.object`**, which contains identifying information of the object returned from the query that the user interacts with (for example, a book, a product, or a post). + The `object` structure has two ways to refer to the object, with `object_id` being the ID that links prior queries to this object: + + - `event_attributes.object.internal_id` is a unique ID that OpenSearch uses internally to index the object; think of the `_id` field in the index. + - `event_attributes.object.object_id` +   is the ID that a user could look up to find the object instance within the **document corpus**. Examples include `ssn`, `isbn`, and `ean`.
Variants need to be incorporated in the `object_id`, so for a red t-shirt, the `object_id` would need to be at the SKU level. + Initializing UBI requires mapping from the **Document Index**'s primary key to this `object_id`. + + - `event_attributes.object.object_id_field` + +   indicates the type/class of object _and_ the ID field of the search index. + + - `event_attributes.object.description` + +   optional description of the object + + + - `event_attributes.object.object_detail` + +   optional text for further data object details + + - *extensible fields*: any other fields indexed under `object` dynamically expand this schema for that use case. +{: .warning}
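The events schema described in this patch can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the plugin's or any client library's API. The helper names `make_event` and `acted_on_returned_objects`, the sample values, and the use of epoch seconds for `timestamp` are all hypothetical. The sketch builds one event with the documented fields and enforces the size limits quoted in the field descriptions. It then shows how `query_id` and `object_id` tie an event back to the `query_response_objects_ids` stored in **UBI Queries**.

```python
import time
from typing import Any, Dict, List, Optional

# Maximum lengths quoted in the UBI Events field descriptions above.
FIELD_LIMITS = {
    "application": 100,
    "action_name": 100,
    "query_id": 100,
    "message_type": 100,
    "message": 256,
}


def make_event(
    application: str,
    action_name: str,
    query_id: str,
    client_id: str,
    object_id: str,
    object_id_field: str,
    ordinal: Optional[int] = None,
    message_type: str = "INFO",
    message: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: build one UBI event document as a plain dict."""
    event: Dict[str, Any] = {
        "application": application,
        "action_name": action_name,
        "query_id": query_id,  # links this event to the stored query
        "client_id": client_id,
        # UNIX epoch seconds; the unit the real mapping expects is assumed.
        "timestamp": int(time.time()),
        "message_type": message_type,
        "message": message,
        "event_attributes": {
            "position": {"ordinal": ordinal},
            "object": {
                "object_id": object_id,  # externally valid ID, e.g. an ISBN
                "object_id_field": object_id_field,  # e.g. "isbn"
            },
        },
    }
    # Reject values longer than the documented field sizes.
    for field, limit in FIELD_LIMITS.items():
        if len(event[field]) > limit:
            raise ValueError(f"{field} exceeds {limit} characters")
    return event


def acted_on_returned_objects(
    query: Dict[str, Any], events: List[Dict[str, Any]]
) -> List[str]:
    """Hypothetical join: object_ids that the query returned and that an
    event carrying the same query_id acted on."""
    returned = set(query["query_response_objects_ids"])
    hits = []
    for event in events:
        obj_id = event["event_attributes"]["object"]["object_id"]
        if event["query_id"] == query["query_id"] and obj_id in returned:
            hits.append(obj_id)
    return hits


if __name__ == "__main__":
    # Sample data, invented for illustration.
    query = {
        "query_id": "q-1",
        "client_id": "c-42",
        "query_response_objects_ids": ["isbn-1", "isbn-2", "isbn-3"],
    }
    click = make_event("book-shop", "click", "q-1", "c-42", "isbn-2", "isbn", ordinal=1)
    print(acted_on_returned_objects(query, [click]))  # ['isbn-2']
```

Every event carries the `query_id`, and that is what makes the join possible. Without it, a click on an object cannot be tied back to the search that returned it.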