Add partial support of advanced input transformations (ingest) #220

ohltyler · 2024-07-17T20:06:31Z

Description

This PR begins extending the ML inference processor form inputs to support an advanced transformation flow. Its scope is to set up the base form changes to the input map / output map inputs, add the base transformation modals for configuring advanced input / output flows, and add partial support for input transformations in the context of ingest. More specifically:

Finishes removing UI fields (help text / help link) from the workflow config for ML inference processors and embeds them directly in the new MLProcessorInputs component.
Changes the ML processor inputs to use a specialized MLProcessorInputs component instead of the generic ConfigFieldList. This is required because we need to maintain more specific and complex state that is unique to this processor type. We still keep a generic parent component with a switch that selects the appropriate processor UX component based on the processor type. The main idea is to move from components that are flexible on a per-field basis to components that are flexible on a per-processor-type basis, which makes it easy to onboard the new processor types we will eventually add.
Introduces the InputTransformModal with a partial implementation: specifically, the ability to dynamically fetch the input schema. Future PRs will integrate with JSONPath and the ML interfaces to determine the output schema.
Introduces the stubbed transform modals for configuring complex outputs.
Onboards the bulk API and updates the default document ingestion to take an array of documents instead of an individual doc. This gives users more capability and flexibility when testing/ingesting with a set of documents.
Onboards the _simulate ingest pipeline API for executing proposed ingest pipeline configurations while users are configuring advanced inputs. For example, given a pipeline that looks like Processor A -> Processor B -> Processor C, suppose a user wants to configure Processor D at the end. They go into the advanced view to configure an input transformation. To fetch the transformed document data up to Processor D, this PR adds logic to 1/ create a temporary pipeline configuration containing Processor A -> Processor B -> Processor C, 2/ collect the currently-configured documents in the form, 3/ run _simulate, and 4/ display the transformed doc results. See the demo for a visual explanation.
Updates/adds global interfaces, including ones for the expected input/output of the simulate pipeline API, since these can get complicated and error-prone. This also makes the code cleaner and more readable.
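
As a rough illustration of the bulk and _simulate items above, here is a minimal sketch of how the request bodies could be assembled. It is not the code from this PR: the type names (ProcessorConfig, IngestDoc, SimulateRequestBody) and the helpers buildSimulateBody and buildBulkBody are hypothetical, and only the general request shapes of the _simulate and bulk APIs are assumed.

```typescript
// Hypothetical types for illustration; the PR's actual interfaces may differ.
type ProcessorConfig = Record<string, unknown>;
type IngestDoc = Record<string, unknown>;

interface SimulateRequestBody {
  pipeline: { processors: ProcessorConfig[] };
  docs: { _source: IngestDoc }[];
}

// Build a _simulate request body containing only the processors that
// precede the processor being configured (e.g. A, B, C when configuring D).
function buildSimulateBody(
  processors: ProcessorConfig[],
  targetIndex: number,
  docs: IngestDoc[]
): SimulateRequestBody {
  return {
    pipeline: { processors: processors.slice(0, targetIndex) },
    docs: docs.map((doc) => ({ _source: doc })),
  };
}

// Build a newline-delimited bulk body: one "index" action line followed by
// one source line per document, terminated by a trailing newline.
function buildBulkBody(indexName: string, docs: IngestDoc[]): string {
  const action = JSON.stringify({ index: { _index: indexName } });
  return docs.map((doc) => action + '\n' + JSON.stringify(doc)).join('\n') + '\n';
}

// Usage: configuring Processor D (index 3) simulates only A -> B -> C.
const pipelineProcessors: ProcessorConfig[] = [{ a: {} }, { b: {} }, { c: {} }, { d: {} }];
const simulateBody = buildSimulateBody(pipelineProcessors, 3, [{ hello: 'world' }]);
console.log(JSON.stringify(simulateBody));
```

Slicing by the target processor's position means the first processor in a pipeline gets an empty temporary pipeline, so its expected input is simply the original documents.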

Demo video showing the advanced flow. The expected input to each processor is shown. For the first/initial processor, the expected input is simply the list of documents. For the second processor, the expected input is a transformed version of each document, containing the embedding values. Internally, this creates an ingest pipeline containing all of the preceding processors and executes _simulate against the provided documents. Note that a single input_map transform is made to map the hello field to the model's expected input field.
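
To make the input_map step in the demo concrete, the sketch below shows one way such a mapping could look and how it relates a document field to a model input field. The model ID and the model input field name (inputs) are hypothetical placeholders, the config shape is an assumption based on the ML inference processor's input_map convention (model input field as key, document field as value), and applyInputMap is an illustrative helper that only resolves top-level fields, not JSONPath.

```typescript
// Assumed mapping shape: model input field name -> document field name.
type FieldMap = Record<string, string>;

interface MLInferenceProcessorConfig {
  ml_inference: {
    model_id: string;
    input_map: FieldMap[];
  };
}

const exampleProcessor: MLInferenceProcessorConfig = {
  ml_inference: {
    model_id: 'my-model-id', // hypothetical placeholder
    input_map: [{ inputs: 'hello' }], // "inputs" is a hypothetical model field
  },
};

// Illustrative helper: pick the mapped document fields and rename them to
// the model's input field names. Only top-level fields are handled.
function applyInputMap(
  doc: Record<string, unknown>,
  inputMap: FieldMap
): Record<string, unknown> {
  const modelInput: Record<string, unknown> = {};
  Object.keys(inputMap).forEach((modelField) => {
    modelInput[modelField] = doc[inputMap[modelField]];
  });
  return modelInput;
}

const modelInput = applyInputMap(
  { hello: 'world', other: 1 },
  exampleProcessor.ml_inference.input_map[0]
);
console.log(JSON.stringify(modelInput)); // prints {"inputs":"world"}
```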

screen-capture.4.webm

Additional note: there is no defined UX for this yet, so the UI is subject to change, but the core functionality will remain the same.

Issues Resolved

Makes progress on #23

Check List

Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ohltyler · 2024-07-17T21:54:51Z

The Mend check failure can be ignored: the check occasionally fails to run and is then marked as failed. No new dependencies are added in this PR.

ohltyler added 7 commits July 16, 2024 10:36

Add ML processor inputs component; add simple/advanced toggle; remove…

4a840ff

… unnecessary data from ml processor ui config; Signed-off-by: Tyler Ohlsen <[email protected]>

Set up baseline modals; set up stub for fetching ingest input

24afc14

Signed-off-by: Tyler Ohlsen <[email protected]>

Set up conversion fns

0083b1a

Signed-off-by: Tyler Ohlsen <[email protected]>

Get valid simulate API JSON body complete

fbff5d7

Signed-off-by: Tyler Ohlsen <[email protected]>

Onboard simulate pipeline API; integrate and get working in UI

e4acea0

Signed-off-by: Tyler Ohlsen <[email protected]>

Add bulk api; refactor ingest to use bulk api; support multi-doc simu…

042f3d3

…lation; Signed-off-by: Tyler Ohlsen <[email protected]>

Update defaults/wording around ingest docs being an array

b4abd93

Signed-off-by: Tyler Ohlsen <[email protected]>

ohltyler added backport 2.x rapid advanced transform labels Jul 17, 2024

ohltyler requested review from dbwiddis, owaiskazi19, joshpalis, amitgalitz and jackiehanyang as code owners July 17, 2024 20:06

ohltyler marked this pull request as draft July 17, 2024 20:07

ohltyler changed the title ~~Add partial support of advanced input transformations~~ [WIP] Add partial support of advanced input transformations Jul 17, 2024

ohltyler changed the title ~~[WIP] Add partial support of advanced input transformations~~ [WIP] Add partial support of advanced input transformations (ingest) Jul 17, 2024

ohltyler added 3 commits July 17, 2024 13:22

Refactor JSON array into standalone field type / validation / defaults

f728c18

Signed-off-by: Tyler Ohlsen <[email protected]>

fix multi-processor editing preceding processor bug

8e5f6fd

Signed-off-by: Tyler Ohlsen <[email protected]>

Remove radio; other minor visual updates

b1797c8

Signed-off-by: Tyler Ohlsen <[email protected]>

ohltyler changed the title ~~[WIP] Add partial support of advanced input transformations (ingest)~~ Add partial support of advanced input transformations (ingest) Jul 17, 2024

ohltyler marked this pull request as ready for review July 17, 2024 21:51

dbwiddis approved these changes Jul 18, 2024

minalsha approved these changes Jul 18, 2024

ohltyler merged commit 62d18e4 into opensearch-project:main Jul 18, 2024
5 of 6 checks passed

ohltyler deleted the transform-flow branch July 18, 2024 20:38

opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 18, 2024

Add partial support of advanced input transformations (ingest) (#220)

b42437d

Signed-off-by: Tyler Ohlsen <[email protected]> (cherry picked from commit 62d18e4)

opensearch-trigger-bot bot mentioned this pull request Jul 18, 2024

[Backport 2.x] Add partial support of advanced input transformations (ingest) #221

Merged
