Skip to content

Latest commit

 

History

History
408 lines (348 loc) · 20.8 KB

conversation-api.adoc

File metadata and controls

408 lines (348 loc) · 20.8 KB

Conversational web service

Through Smarti’s conversational RESTful-API it can be seamlessly integrated into any Chat-based communication platform.

Conversation Data Model

Conversations in Samrti are based on the following JSON model

{
  "id": "5a1403a535887bbdfc94b531",
  "meta": {
    "status": "New",
    "lastMessageAnalyzed": 4,
    "channel_id": [
      "lp3aiunsij2664ft"
    ]
    /* any meta fields */
  },
  "user": {
    "id": "be4df619-f273-4d15-b3fb-6fa91b08a71a",
    "displayName": "Anonymous User"
  },
  "messages": [
    {
      "id": "{message-id}",
      "time": 1509619512000,
      "origin": "User",
      "content": "{message-content}",
      "user": {
        "id": "hkxkyrwf",
        "displayName": "Anonymous User"
      },
      "votes": 0,
      "metadata": {},
      "private": false
    }
  ],
  "context": {
    "contextType": "rocket.chat",
    "domain": "loadtest",
    "environment": {
      /* any environment variable */
    }
  },
  "lastModified": 1511261254233
}
  • id: Each conversation has an id that is assigned by Smarti

  • meta: meta information about the message.

    • Smarti allows for any meta fields to be added.

    • the channel_id was used up to v0.6.* to store the ID of the channel. While the mapping of conversation to channel is no longer managed by Smarti starting with version 0.7.0 this field SHOULD still be used in case an integrator wants to store such mappings within Smarti.

  • user: The user that started this conversation.

    • only the id is required

  • messages: The messages of the conversation in the order as sent

    • id: the id of the message (typically the id as provided by the messaging system). Smarti requires this id to be unique within the conversation.

    • time: the time when the message was sent

    • origin: the origin of the message (User, Agent)

    • content: the textual content of the message

    • user: information about the user of this message.

      • only the id is required

    • votes: allows to set the votes for this message

    • metadata: allows for additional metadata to be stored with this message

    • private: if true this is a private message. Private messages will not be available for Conversation Search in Smarti.

  • context: Contextual information about the message.

    • environment: allows for additional contextual information about the environment

  • lastModified: managed by Smarti this field holds the last modification date of this Conversation in Smarti

Tokens

Information extracted by Smarti are represented by Tokens and later combined in Templates.

{
  "tokens": [
    {
      "messageIdx": 0,
      "start": 0,
      "end": 8,
      "origin": "System",
      "state": "Suggested",
      "value": "Physiker",
      "type": "Keyword",
      "hints": [
        "interestingTerm"
      ],
      "confidence": 0.7794291
    }
  ]
}
  • The messageIdx is the index of the message within the conversation. start and end are offsets of the mention within the message. -1 is allowed for both messageIdx, start and end in cases where a token has no mention in the conversation (e.g. a classification result based on the conversation as a whole would use -1 for messageIdx, start and end; a classification of the first message would use messageIdx=0 and -1 for start and end).

  • The origin specifies who created the Token. Possible values are System, Agent. The value System is reserved for Tokens created by Smarti. Other values are for manual annotations.

  • The state allows for user feedback on Token level. Suggested is reserved for Tokens created by the Smarti processing pipeline. Users can Confirmed and Rejected Tokens. Manually created Tokens should use Confirmed.

  • The type defines the kind of the Token. The type also specifies the data type of the value

    • Date: An extracted date/time. This type uses a JSON object with { "date": "2016-09-01T16:30:00.000Z", "grain": "minute"} as value.

      • date holds the extracted date/time value

      • grain specifies the granularity of the extracted value - millisecond, second, minute, hour, day, week, month and year are allowed values.

    • Topic: Used for any kind of classification. The value is the identifier of the topic

    • Place, Organization, Person, Product and Entity: Extracted named entities of different kinds. Entity shall be used if the named entity is not of any of the more specific kinds.

    • Term: A term of a terminology that was mentioned in the conversation

    • Keyword: Keywords identified in the conversation. Keywords are important words and phrases within the conversation. In contrast to Terms Keywords are not NOT part of a controlled vocabulary.

    • Attribute: Attributions the can modify the meaning of other tokens (e.g. the attribute "romantic" can modify the meaning of the term "restaurant").

    • Other: type to be used for Tokens that do not fit any of the above. The real type should by stored as a hint

  • hint: Hints allow to store additional information for Tokens. Examples are more specific type information; roles of Tokens (e.g. if a Place is the origin or target of a travel) …​

  • confidence: the confidence of the Token in the range [0..1]

Templates

An abstraction over single Tokens are Templates. Templates structure extracted information as required for a specific intend. Examples are things like travel planing, location based recommendations, route planing but also more abstract things like information retrieval or related content recommendation

{
  "templates": [
    {
      "type": "related.conversation",
      "slots": [
        {
          "role": "Term",
          "tokenType": null,
          "required": false,
          "tokenIndex": 36
        }
      ],
      "queries": [],
      "confidence": 0.75306565
    }
  ]
}
  • type: the type refers to the definition this template was build based.

  • slots: this are slots of the template.

    • role Templates define different roles for Tokens. Some roles may be multi valued. In this case multiple slots will have the same role.

    • tokenType: if present the referenced Token MUST BE of the specified type

    • required: if this slot is required for the Template to be valid. Required slots are always included in the template. If a required Slot can not be filled with a Token the tokenIndex is set to -1

    • tokenIndex: the index of the token within the tokens array (-1 means unassigned).

  • confidence: the confidence for this template. The confidence is typically used in combination with Intend classification. e.g. given a classification for the intend and templates representing intends the confidence will be set to the confidence of the intend to be present.

Query

Queries are most specific extraction result of Smarti. Queries are used to retrieve information from a service and are build based on a template. Because of that query is also a sub-element of template.

{
  "queries": [
    {
      "creator": "queryBuilder%3Aconversationmlt%3Arelated-conversations",
      "displayTitle": "Flug Salzburg -> Berlin (01.09)",
      "inlineResultSupport": false,
      "state": "Suggested",
      "url": "{}",
      "confidence": 0.75306565
    }
  ]
}

Common fields supported by all queries include:

  • The creator identifies the component/configuration that created this query.

  • The displayTitle is intended to be used for provide a human readable representation of this query.

  • If inlineResultSupport is true the creator supports server-side execution of the query.

  • The state can be used for user feedback. Every query will start with Suggested. Users can Confirmed and Rejected queries.

  • The url representing this query. If server side execution is supported this might not be present.

  • The confidence specifies how well the service is suited to search for information of the template (e.g. bahn.de is very suitable for a travel planing template so a query for Bahn.de would get a high confidence. One could also create a Google Query for travel planing, but results would be less precise so this query should get a lower confidence).

In addition to those fields queries can provide additional information. Those are specific to the creator of the query.

Working with Smarti Analysis

Smarti processes ongoing Conversations. Those analysis results are available via

or when providing a callback URI asynchronously on most CRUD requests for conversations and messages.

{
  "conversation" : "5a7befcb27c2da49d629dcab",
  "date" : 1518082127519,
  "tokens" : [ /* Tokens */ ],
  "templates" : [ /* Templates and Queries for Templates */]
}
  • conversation: The id of the analysed conversation

  • date: the modification date of the conversation - the version of the conversation used for the analysis

  • tokens: Information extracted form the conversation. Important as template.slots refer to token indexes

  • templates: Templates represent a structured view onto the extracted information. The type of the template defines the slots it supports and also what queries one can expect for the tempalte. The LATCH template is an example of such a template intended to be used for information retrieval.

  • queries: Queries are built based on templates and are typically used to retrieve information related to the conversation from services. A single template can have multiple queries from different creators (e.g. the LATCH template can have multiple queries for different configured search indexes).

Note
Queries are NOT executed during processing of the conversation. The client is responsible for execution (typically based on a user interaction).

When inlineResultSupport is true a query can be executed by Smarti. In such cases the url is often not defined - meaning that the query can only be executed by Smarti. To execute a query via smarti the following service has to be used:

where:

  • {templateIndex} - is the index of the "template" array returned by the /conversation/{conversationId}/template request (e.g. 0 for the first template in the original response)

  • {creatorName} - the value of the creator attribute of the executed query (e.g. the value of template[0].queries[0].creator when executing the first query of the first template)

Note
In a future version it is planed to change referencing of Messages, Tokens and Templates from an index to an ID based model. This will have affects to this webservice.

The response format is not normalised and fully specific for the query type. Server side execution is e.g. used for the related conversation queries.

For queries where inlineResultSupport is false the client needs to execute the query. Typically this can be done by using the url attribute. However specific queries might provide the information for query execution in additional fields.

Common Templates and Query Provider

The related conversation template allows to find conversation similar to the current one. Two query provider do use this template:

  1. Similar Conversation: This searches for similar conversation based on the textual content of the current conversation. It does NOT use extracted tokens but rather uses the content of the messages to search for similar conversations. Starting with v0.7.0 only the content of recent messages is used as context. Up to v0.6.* the content of all messages in the conversation was used.

  2. Related Conversation Search: This uses extracted Keywords and Terms to search for related conversations. The search for related conversations uses the conversation/search service (part of the conversation API).

The following listing shows the structure use by this template and queries

{
    "type" : "related.conversation",
    "slots" : [ {
      "role" : "Keyword",
      "tokenType" : "Keyword",
      "required" : false,
      "tokenIndex" : {token-idx}
    }, {
      "role" : "Term",
      "tokenType" : null,
      "required" : false,
      "tokenIndex" : {token-idx}
    } ],
    "state" : "Suggested",
    "queries" : [ {
      "_class" : "io.redlink.smarti.query.conversation.ConversationMltQuery",
      "displayTitle" : "conversationmlt",
      "confidence" : 0.55,
      "url" : null,
      "creator" : "queryBuilder:conversationmlt:conversationmlt",
      "inlineResultSupport" : true,
      "created" : 1518423401234,
      "state" : "Suggested",
      "content" : "{textual-context}"
    }, {
      "_class" : "io.redlink.smarti.query.conversation.ConversationSearchQuery",
      "creator" : "queryBuilder:conversationsearch:conversationsearch",
      "displayTitle" : "conversationsearch",
      "confidence" : 0.6,
      "url" : "/conversation/search",
      "inlineResultSupport" : false,
      "created" : 1518423401234,
      "state" : "Suggested",
      "defaults" : {
        "ctx.before" : 1,
        "rows" : 3,
        "ctx.after" : 2,
        "fl" : "id,message_id,meta_channel_id,user_id,time,message,type"
      },
      "keywords" : [ "{keyword-1}", "{keyword-1}", ".." ],
      "terms" : [ "{term-1}", "{term-2}", ".." ],
      "queryParams" : [ ],
      "filterQueries" : [ "completed:true", "-conversation_id:5a7befcb27c2da49d629dcab", "-meta_support_area:*" ]
    } ]
  }

Related Conversation Template:

  • The related.conversation template has two roles - Keyword and Term. Both allow for multiple assignments.

    • {token-idx} is the index of the token referenced by the slot of the template. -1 indicates an unassigend slot.

    • the tokenType defines the required type of the token. For slots with the role Keyword the token type will be Keyword. For role Term slots this will be null. The actual type of the token needs to be obtained from the referenced Token itself.

Similar Conversation Query:

  • Similar Conversation are provided by the ConversationMltQuery query provider

  • The {textual-context} is the context used to search for similar conversations

  • The url is null as this query is server executed. This means that the GET conversation/{conversationId}/template/{templateIndex}/result/{creator} service needs to be used to obtain results.

Related Conversation Search Query:

  • Related Conversation are provided by the ConversationSearchQuery query provider

  • The url is the to /conversation/search the path the the conversation search service part of the conversation webservice API

  • The keywords and terms are suggested search parameters for the conversation search

    • they can be parsed by using the text parameter. In this case prefix and infix searches will be used to the exact match (with different boosts)

    • optionally thy can also be parsed via the Solr q parameter. This allows for more control over the actual search. See Solr Documentation for more information and options

    • the defaults are additional parameter to be parsed to the conversation search service. Those are based on the clients conviguration and should be included unchanged when calling the conversation search service.

    • filterQueries are expected to be parsed as is via the Solr fq parameter. They are also calculated based on the clients configuration and expected to be included unchanged on calles to the conversation search service. NOTE: that skipping those filter does NOT allow users to retrieve information they do not have acces to as all security relevant restrictions are applied by the conversation search service based on the authentication information of the call. So removing some/all of those filter does only affect the quality of results.

LATCH Information Retrieval

LATCH stands for Location, Alphabet, Time, Category and Hierarchy and represents a well established model for information retrieval.

The LATCH templates collects information extracted from the conversation and assigns them to the the different dimensions of the Template (e.g. extracted tokens with the type Time will be assigned to the time dimension; extracted tokens with the type Place will be assigned to the location dimension …​).

Based on this template a generic query builder for Apache Solr endpoints is implemented. This query builder can be configured for Solr search endpoints so that information related to the current conversation can be suggested to the user.

{
    "type" : "ir_latch",
    "slots" : [ {
      "role" : "{role}",
      "tokenType" : {type},
      "required" : false,
      "tokenIndex" : {token-idx}
    } ],
    "probability" : 0.0,
    "state" : "Suggested",
    "queries" : [ {
      "_class" : "io.redlink.smarti.query.solr.SolrSearchQuery",
      "creator" : "queryBuilder:solrsearch:config-name-1",
      "resultConfig" : {
        "mappings" : {
          "source" : "source_field",
          "title" : "title_field",
          "description" : "desc_field",
          "type" : "type_field",
          "doctype" : null,
          "link" : "link_field",
          "date" : "date_field",
          "thumb" : "thumb_field"
        },
        "numOfRows" : 10
      },
      "defaults" : { },
      "displayTitle" : "config-title-1",
      "confidence" : 0.8,
      "url" : "https://search.host-1.org/query?{solr-query-parameter}",
      "inlineResultSupport" : false,
      "created" : 1518433563606,
      "state" : "Suggested",
      "queryParams" : [ "address_cityplz_de:\"Salzburg\"^0.8478922", "context_de:\"Tätowieren\"^0.39709732", "title_de_sv:\"Tätowieren\"^1.1912919" ],
      "filterQueries" : [ ]
    }, {
      "_class" : "io.redlink.smarti.query.solr.SolrSearchQuery",
      "creator" : "queryBuilder:solrsearch:config-name-2",
      "resultConfig" : {
        "mappings" : {
          "source" : null,
          "title" : "title",
          "description" : "description",
          "type" : "type",
          "doctype" : null,
          "link" : null,
          "date" : "release_date",
          "thumb" : null
        },
        "numOfRows" : 5
      },
      "defaults" : {
        "fq" : "structure:(type_1 OR type_2)"
      },
      "displayTitle" : "Stadt Salzburg",
      "confidence" : 0.8,
      "url" : "https://www.host-2.com/search/select?fq=structure%3A(type_1%20OR%20type_2)&{solr-query-parameter}",
      "inlineResultSupport" : false,
      "created" : 1518433563606,
      "state" : "Suggested",
      "queryParams" : [ "content:\"Tätowieren\"^0.39709732", "title:\"Tätowieren\"^1.1912919" ],
      "filterQueries" : [ ]
    } ]
  }

LATCH Information Retrieval

  • the type of this template is ir_latch

  • it supports the following roles: location, alphabet, time, category and hierarchy

    • all roles are optional and allow for multiple assignments.

    • some Roles require specific types: locationPlace, timeDate, categoryTopic

Solr Search Query

  • generic query provider that can be configured for a specific Solr Endpont

  • the resultConfig is copied from the configuration and defines the Solr field names to be used for the visualisation of results. The numOfRows is the number of results per page.

  • the url represents the Solr request representing the data provided by the query provider. If the client wants to modify the query it needs to strip the query part from the URL and create the query parameter to reflect the intended modifications

  • defaults are query parameters to be included in the request. They originate form the client configuration.

    • the second example shows how a fq (filter query) is configuerd in the defaults. NOTE that this is also present as query paramter in the url

  • the queryParams represent parts of the Solr query (q parameter) generated for the extracted tokens and the solr fields as defined by the configuration.

    • in the first example the config-name-1 has mappings for locations and full text search. Because of that the extracted token Salzburg with the type Place is used as query parameter for the location name field address_cityplz_de. The token Tätowieren with the type Keyword is used for full text search. First in the title field title_de_sv with a boost of 1.1912919 and second in the full text field context_de with a lower boost of 0.39709732

    • the second example config-name-2 hos no field to search for locations. Because of this only Tätowieren is included in the query parameters

  • filterQueries are additional filters set by the query provider based on the conversation (e.g. based on detected categories). Values need to be parsed as fq parameters in the Solr query.