Re-Work of Our Pipeline Logic for Cycles #8339

silvanocerza · 2024-09-06T11:55:09Z

silvanocerza
Sep 6, 2024
Maintainer

With Haystack 2.0, our pipelines, which also allow for loops and cycles, became a lot more flexible than they were with Haystack 1.x.

However, we also noticed in the last few weeks that the current logic can have some issues when it comes to validation, as well as some unpredictable behaviour when there are cycles in the pipeline.

Overall, the problems we noticed occurred in incredibly complex pipeline architectures, where components may run when they are not supposed to, the pipeline may return partial results, etc.

So, we’ve worked on re-architecting the underlying pipeline logic to make sure pipelines with cycles run as expected no matter the architectural complexity of the pipeline.

What now?

Our re-work is not yet released because we need your help!

All our internal tests of the core logic are passing. We’re really confident in the robustness of the rework, but given the wide range of possible ways to build a Pipeline, we can’t know whether we missed some small corner case.

That’s why we ask you, our community, for some help testing this out.

To test this new logic, install directly from the subgraphs branch with the following command:

pip install git+https://github.com/deepset-ai/haystack.git@subgraphs

This is based on the latest v2.5.1 release and has the same set of features apart from the different Pipeline.run() logic.

Once released, we also plan to write a nice blog post to explain the inner workings of this rework.

🙏 Please test out our re-work, try out all your looping pipeline ideas, and let us know what you think and whether you find any problems! We will keep this discussion open until this re-work is merged.

vblagoje · 2024-09-06T14:10:56Z

vblagoje
Sep 6, 2024
Maintainer

A great starting point to better understand cycles—and a fascinating example of cycles in action—is the ReAct pipeline.

👉 Check it out here: ReAct Haystack Notebook

0 replies

prosto · 2024-09-16T11:42:17Z

prosto
Sep 16, 2024

Hi,

By looking at the documentation for pipelines (https://docs.haystack.deepset.ai/docs/pipelines), it is not evident what exactly triggers a component in the pipeline. I have seen people asking questions about default values, pipeline inputs, and connected inputs. So, in terms of cycles: what if all nodes are connected in a loop, all nodes receive pipeline inputs, and all nodes have some non-connected inputs with default values? I know it's hypothetical, but if we plan to have robust pipeline execution logic, I believe it should be based on rules, so that it is easy to answer where such a pipeline will start its execution and how exactly it works. It should be possible to grasp the logic just by looking at the pipeline diagram, without having to test it to find out how it works or read the implementation (e.g. running queues vs. waiting queues). I think good documentation with deterministic rules would be helpful.

1 reply

silvanocerza Sep 17, 2024
Maintainer Author

We plan to provide a better explanation of how the new Pipeline logic will work. I agree that the current documentation is not the best explanation of the current state of the Pipeline, though. We had planned to write more detailed documentation but decided to put that on hold because of the above rework.

I completely agree that one should be able to understand how a Pipeline will run just by looking at it. My goal with this rework is exactly this: having an easily predictable way to understand the execution flow of a Pipeline, without knowing all the nitty-gritty of the internals.

That's not easy at all, though. The main problem with our Pipelines is cycles. If you remove those completely, you have a graph that is easy to sort and traverse, and thus execute. This new approach is really a best effort.
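
To illustrate why a graph without cycles is easy to execute, here is a minimal sketch using only Python's standard library. It is not Haystack's implementation; the component names and the dependency dictionaries are hypothetical and only stand in for a small pipeline. It shows that a topological sort gives a valid run order for an acyclic graph and has no answer once a feedback edge is added.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical component names; each key maps to the components it depends on.
acyclic = {
    "prompt_builder": {"retriever"},
    "llm": {"prompt_builder"},
    "answer_builder": {"llm"},
}

# Without cycles, a topological sort yields a valid execution order.
order = list(TopologicalSorter(acyclic).static_order())
print(order)  # ['retriever', 'prompt_builder', 'llm', 'answer_builder']

# A feedback edge (answer_builder feeding the retriever) creates a cycle,
# and a plain topological sort can no longer produce an order.
cyclic = {**acyclic, "retriever": {"answer_builder"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

With a cycle there is no single "first" component implied by the graph alone, so any execution logic has to add rules of its own on top of the graph structure.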

Covering every use case is basically impossible. I test the Pipeline with the most common use cases, but predicting or knowing all possible graph combinations and testing them is not feasible. The problem space is immense, and I have no doubt that we'll discover use cases in the future that we can't support.
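
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model that is not how Haystack represents pipelines: components are plain labeled nodes, and each ordered pair of distinct nodes either has a connection or not, ignoring input/output sockets and types. The function name is hypothetical.

```python
def count_connection_patterns(n: int) -> int:
    """Number of directed graphs on n labeled nodes without self-loops.

    There are n * (n - 1) ordered pairs of distinct nodes, and each pair
    is either connected or not, giving 2 ** (n * (n - 1)) patterns.
    """
    return 2 ** (n * (n - 1))


print(count_connection_patterns(3))  # 64
print(count_connection_patterns(5))  # 1048576
```

Even under this simplified model the count grows far too quickly to enumerate, and real pipelines add sockets, defaults, and types on top.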

I don't think we'll support every possible combination, though I think this approach solves most of the common problems.

prosto · 2024-10-04T10:17:07Z

prosto
Oct 4, 2024

Hi @silvanocerza,

I am trying to test the following pipeline with a loop, https://colab.research.google.com/drive/1YHJ8-NgGdtP3yRQXfyXSSKeAWZ2iYb3Y?usp=sharing, and got confused about the order of execution. In particular, I am not sure that "sum_1" should run twice. Can you please take a look? Thanks

0 replies
