Onboard basic sentiment analysis with defaults #350

Merged
merged 3 commits into from
Sep 9, 2024


ohltyler
Member

@ohltyler ohltyler commented Sep 6, 2024

Description

Adds a new sentiment analysis preset, with defaults for ingest/search pipelines and index configurations. Loosely based on this documented example: https://opensearch.org/docs/latest/search-plugins/search-pipelines/ml-inference-search-request/#example-externally-hosted-model

This use case is intended for a specialized sentiment analysis model (or an LLM with a tuned prompt) that takes in text and returns a sentiment/category (generally Positive/Neutral/Negative). One basic example is storing and analyzing website reviews. This preset does two things:

  1. Take in a document with text, then process the text with an ML ingest processor to generate a sentiment and store it in a label field on the document.
  2. Search using plaintext, with an ML search request processor that generates a sentiment and replaces the label field's value in the request, so that only results with the matching sentiment are returned.
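
The two steps above could be sketched as the following pipeline bodies. This is a minimal illustration, not the preset JSON added in this PR: the helper names, the `review` text field, and the model's `inputs`/`response` keys are assumptions (the latter two borrowed from the linked docs example), and a real model may expose different input and output names.

```python
# Hypothetical helpers; they only build request bodies and do not call OpenSearch.

def build_ingest_pipeline(model_id, text_field="review", label_field="label"):
    """Body for PUT /_ingest/pipeline/<name>: label each document at ingest time."""
    return {
        "description": "Store the model's sentiment in a label field",
        "processors": [
            {
                "ml_inference": {
                    "model_id": model_id,
                    # model input name -> document field holding the text
                    "input_map": [{"inputs": text_field}],
                    # new document field -> model output name (assumed "response")
                    "output_map": [{label_field: "response"}],
                }
            }
        ],
    }


def build_search_pipeline(model_id, label_field="label"):
    """Body for PUT /_search/pipeline/<name>: rewrite the term value in the request."""
    query_path = f"query.term.{label_field}.value"
    return {
        "description": "Replace the plaintext term value with its sentiment",
        "request_processors": [
            {
                "ml_inference": {
                    "model_id": model_id,
                    # model input name -> path in the search request to read
                    "input_map": [{"inputs": query_path}],
                    # path in the search request to overwrite -> model output name
                    "output_map": [{query_path: "response"}],
                }
            }
        ],
    }
```

With these in place, a term query on `label` carrying the plaintext "happy with my purchase" would, assuming the model returns `Positive`, be rewritten to match only documents whose stored label is `Positive`.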

Overall, this use case could be tuned and enhanced in many ways. Users may want to persist more than just a label. For example, one reasonable use case is a hybrid search over some text's vector, its sentiment/label, and its plaintext, trying out different weights in the hybrid query.
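
As a rough sketch of that hybrid idea (not something this PR implements), the query and its weighting pipeline might look like the following. The field names (`review`, `label`, `review_embedding`), the helper names, and the choice of `min_max` normalization with `arithmetic_mean` combination are all assumptions made for illustration.

```python
# Hypothetical helpers; they only build request bodies and do not call OpenSearch.

def build_hybrid_query(text, label, vector, text_field="review",
                       label_field="label", vector_field="review_embedding", k=10):
    """Hybrid query over the text's vector, its sentiment label, and its plaintext."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"knn": {vector_field: {"vector": vector, "k": k}}},
                    {"term": {label_field: {"value": label}}},
                    {"match": {text_field: {"query": text}}},
                ]
            }
        }
    }


def build_normalization_pipeline(weights):
    """Search pipeline body that weights the sub-queries in the order given above."""
    # Assumption: one weight per sub-query, summing to 1.0.
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights are expected to sum to 1.0")
    return {
        "description": "Weighted score combination for the hybrid query",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": list(weights)},
                    },
                }
            }
        ],
    }
```

Trying out different weights then amounts to recreating the pipeline with a different list, for example `[0.5, 0.2, 0.3]` versus `[0.2, 0.6, 0.2]`, and comparing the results.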

More details:

  • adds quick-configure presets and form inputs for sentiment analysis
  • adds logic in quick configure modal to inject quick configure values into the config (the "label" field)
  • adds metadata defaults and a new preset JSON resource for this use case
  • adds default query inputs for vector search use cases (query.term.${text_field}.value) for the ML models. This may be tuned later, depending on the default queries or on changes to the query editing experience.
  • removes the noisy toast when editing transforms, since the transform now runs automatically instead of requiring explicit user input
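
To illustrate how a default such as `query.term.${text_field}.value` could be resolved once a field name is chosen in the quick-configure form, here is a small sketch using Python's standard `string.Template`. It is not the plugin's actual implementation (which lives in the dashboards code); the function name and the `review` field are hypothetical.

```python
from string import Template

# Default query path from the PR description, with a ${text_field} placeholder.
DEFAULT_QUERY_PATH = Template("query.term.${text_field}.value")


def resolve_query_path(text_field: str) -> str:
    """Fill the placeholder with the field name chosen by the user."""
    return DEFAULT_QUERY_PATH.substitute(text_field=text_field)


# The resolved path can then be used as a default value in a processor's input map.
default_input_map = [{"inputs": resolve_query_path("review")}]
```

Here `resolve_query_path("review")` yields `query.term.review.value`, the kind of value that lets the search request processor work without further user input.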

Demo video showing a basic use case with a SageMaker sentiment analysis model. It also shows the default values set in the ML search request processor for a vector search use case. Note that when using all defaults, no further input is needed on this search request processor.

screen-capture.14.webm

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tyler Ohlsen <[email protected]>
@ohltyler ohltyler marked this pull request as ready for review September 9, 2024 16:44
@ohltyler ohltyler merged commit 9b644de into opensearch-project:main Sep 9, 2024
6 checks passed
@ohltyler ohltyler deleted the sentiment-analysis branch September 9, 2024 16:57
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 9, 2024
Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit 9b644de)
ohltyler added a commit that referenced this pull request Sep 9, 2024
Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit 9b644de)

Co-authored-by: Tyler Ohlsen <[email protected]>