Add partial support of advanced input transformations (ingest) #220

Merged: 10 commits, Jul 18, 2024

ohltyler
Member

@ohltyler ohltyler commented Jul 17, 2024

Description

This PR begins the work of extending the ML inference processor form inputs to support an advanced transformation flow. Its scope is to set up the base form changes to the input map / output map inputs, the base transformation modals for configuring advanced input / output flows, and partial support for input transformations in the context of ingest. More specifically, this PR:

  • finishes removing UI fields (help text / help link) from the workflow config for ML inference processors and embeds them directly in the new MLProcessorInputs component
  • changes the ML processor inputs to use a specialized MLProcessorInputs component instead of the generic ConfigFieldList. This is required because, for this particular processor, we need to maintain more specific and complex state that is unique to the processor type. We still keep a parent/generic component with a switch that selects the appropriate processor UX component based on the processor type. The main idea is to move from components being flexible on a per-field basis to being flexible on a per-processor-type basis. This allows easy onboarding of the new processor types we will eventually add.
  • introduces the InputTransformModal with a partial implementation: specifically, the ability to dynamically fetch the input schema. Future PRs will integrate with JSONPath and the ML interfaces to determine the output schema.
  • introduces the stubbed transform modals for configuring complex outputs
  • onboards the bulk API and updates the default document ingestion to take an array instead of an individual doc. This expands the capability and flexibility for users when trying to test/ingest with a set of documents
  • onboards the _simulate ingest pipeline API for executing proposed ingest pipeline configurations when users are configuring advanced inputs. For example, given some pipeline that looks like Processor A -> Processor B -> Processor C, suppose a user wants to configure Processor D at the end. They go into the advanced view to configure an input transformation. To fetch the transformed document data up to Processor D, this PR adds logic to 1/ create a temporary pipeline configuration containing Processor A -> Processor B -> Processor C, 2/ collect the currently configured documents in the form, 3/ run _simulate, and 4/ display the transformed doc results. See the demo for a visual explanation
  • updates/adds global interfaces, including for the expected input/output of the simulate pipeline API, as these can get complicated and error-prone, and to make the code cleaner & more readable
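
For illustration, the simulate step described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: every type and function name here (ProcessorConfig, SimulateRequest, buildSimulateRequest, extractTransformedDocs) is hypothetical. Only the general request/response shape of the _simulate ingest pipeline API (a `pipeline` with `processors`, plus `docs` with `_source`) is assumed.

```typescript
// Hypothetical names throughout; only the overall request/response shape
// is meant to mirror the _simulate ingest pipeline API.

// One processor entry, e.g. { ml_inference: { model_id: '...' } }
type ProcessorConfig = Record<string, Record<string, unknown>>;

interface SimulateDoc {
  _index?: string;
  _id?: string;
  _source: Record<string, unknown>;
}

interface SimulateRequest {
  pipeline: { processors: ProcessorConfig[] };
  docs: SimulateDoc[];
}

interface SimulateResponse {
  docs: Array<{
    doc?: SimulateDoc;
    error?: Record<string, unknown>;
  }>;
}

// Build a temporary pipeline containing only the processors that precede
// the one being configured (index targetIdx), plus the form's documents.
function buildSimulateRequest(
  processors: ProcessorConfig[],
  targetIdx: number,
  sourceDocs: Array<Record<string, unknown>>
): SimulateRequest {
  if (targetIdx < 0 || targetIdx > processors.length) {
    throw new RangeError(`targetIdx ${targetIdx} is out of range`);
  }
  return {
    pipeline: { processors: processors.slice(0, targetIdx) },
    docs: sourceDocs.map((source) => ({ _source: source })),
  };
}

// Pull the transformed documents out of a simulate response.
function extractTransformedDocs(
  response: SimulateResponse
): Array<Record<string, unknown>> {
  return response.docs.map((result, i) => {
    if (result.error || !result.doc) {
      throw new Error(`Simulation failed for document ${i}`);
    }
    return result.doc._source;
  });
}

// Usage: configuring Processor D (index 3) simulates A -> B -> C only.
const exampleProcessors: ProcessorConfig[] = [
  { processor_a: {} },
  { processor_b: {} },
  { processor_c: {} },
  { processor_d: {} },
];
const exampleRequest = buildSimulateRequest(exampleProcessors, 3, [
  { hello: 'world' },
]);
```

When there are no preceding processors (targetIdx of 0), the expected input is simply the list of documents, so a caller could skip the API call entirely in that case.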

The demo video below shows the advanced flow, including the expected input to each processor. For the first/initial processor, the expected input is simply the list of documents. For the second processor, the expected input is a transformed version of the document, containing the embedding values. Internally, this creates an ingest pipeline containing all of the preceding processors and executes _simulate against the provided documents. Note that a single input_map transform is configured to map the hello field to the model's expected input field.
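
To make the input_map transform concrete, here is a rough sketch of how one entry could resolve against a document. It assumes each input_map entry is keyed by the model's input field with the document field as the value, and it only handles plain top-level field names (no JSONPath). The function name applyInputMap and the model field text_docs are hypothetical; only the hello field comes from the demo.

```typescript
// Assumed shape: each entry maps a model input field (key) to a
// document field (value). Names here are illustrative only.
type InputMapEntry = Record<string, string>;

// Resolve an input_map against one document, producing the model input.
function applyInputMap(
  doc: Record<string, unknown>,
  inputMap: InputMapEntry[]
): Record<string, unknown> {
  const modelInput: Record<string, unknown> = {};
  for (const entry of inputMap) {
    for (const modelField of Object.keys(entry)) {
      const docField = entry[modelField];
      if (!(docField in doc)) {
        throw new Error(`Document has no field "${docField}"`);
      }
      modelInput[modelField] = doc[docField];
    }
  }
  return modelInput;
}

// Map the document's hello field to a hypothetical text_docs model input.
const exampleModelInput = applyInputMap({ hello: 'world' }, [
  { text_docs: 'hello' },
]);
// exampleModelInput is { text_docs: 'world' }
```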

screen-capture.4.webm

Additional note: there is no defined UX for this yet, so it is subject to change, but the core functionality will remain the same.

Issues Resolved

Makes progress on #23

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler ohltyler marked this pull request as draft July 17, 2024 20:07
@ohltyler ohltyler changed the title Add partial support of advanced input transformations [WIP] Add partial support of advanced input transformations Jul 17, 2024
@ohltyler ohltyler changed the title [WIP] Add partial support of advanced input transformations [WIP] Add partial support of advanced input transformations (ingest) Jul 17, 2024
@ohltyler ohltyler changed the title [WIP] Add partial support of advanced input transformations (ingest) Add partial support of advanced input transformations (ingest) Jul 17, 2024
@ohltyler ohltyler marked this pull request as ready for review July 17, 2024 21:51
@ohltyler
Member Author

The mend failure can be ignored: the check occasionally fails to run and gets marked as failed. No new deps are added in this PR.

@ohltyler ohltyler merged commit 62d18e4 into opensearch-project:main Jul 18, 2024
5 of 6 checks passed
@ohltyler ohltyler deleted the transform-flow branch July 18, 2024 20:38
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 18, 2024
ohltyler added a commit that referenced this pull request Jul 18, 2024
…#221)

Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit 62d18e4)

Co-authored-by: Tyler Ohlsen <[email protected]>