Scalable Deployment of OpenWhisk 2.0.0 #5510
I just want to ask you a couple of questions first.
Hey @style95, thanks for taking this up.

"How did you deploy OpenWhisk-2.0.0? Are you able to invoke an action?" This question is actually part of my doubts: if I want to deploy OpenWhisk 2.0.0 in a multi-node setting in a scalable manner, how should I go about it? Is Ansible the way to do that, or is there some way to use Kubernetes for this?

"What is your concurrency limit and userMemory assigned to invokers?" (Where those settings live in the openwhisk-deploy-kube values is sketched after the next paragraph.)

Some more updates from the benchmarking I did in the meantime:
So I think warm containers not being reused across different actions is what's causing my workflows to not scale. I saw invoker-tag-based scheduling in the docs, and that is a temporary fix for my use case, but it is in 2.0.0 and not 1.0.0. The bigger concern is my limited understanding of warm-container reuse across different actions. Where do I get more information about this? Is this the way warm-container reuse is intentionally supposed to work?
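For reference on the invoker settings question above: in an openwhisk-deploy-kube setup those knobs are exposed through the helm values. A rough sketch, assuming a recent chart layout (the exact key names vary between chart versions, so treat this as indicative):

```yaml
# mycluster.yaml (fragment) passed to the openwhisk-deploy-kube helm chart.
# Key names follow a recent chart layout and may differ between versions.
whisk:
  containerPool:
    userMemory: "4096m"           # memory budget each invoker can hand out to user containers
  limits:
    actionsInvokesConcurrent: 100 # max concurrent activations per namespace
    actions:
      memory:
        max: "512m"               # per-action memory ceiling
      concurrency:
        max: 1                    # activations a single container may handle at once
```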
You're experiencing the hot-spotting / container-swapping problem of the best-effort 1.0.0 scheduling algorithm. If your container pool is full and no containers exist for Action B, you need to evict an Action A container in order to cold start an Action B container. You will find that performance should be significantly better on OpenWhisk 2.0 with the new scheduler for the traffic pattern you're trying to test.
I think @bdoyle0182's comment hits the root of the confusion. OpenWhisk will never reuse a container that was running action A to now run action B, even if A and B are actions of the same user that use the same runtime+memory combination. There is a related concept of a stem cell container (the default configuration is here: https://github.com/apache/openwhisk/blob/master/ansible/files/runtimes.json#L44-L56), where, if there is unused capacity, the system tries to hide container creation latency by keeping a few unused containers for popular runtime+memory combinations up and running, into which it can inject the code for a function on its first invocation. But once the code is injected, these containers are bound to a specific function and will never be used for anything else.
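For illustration, a stem-cell entry in that runtimes.json has roughly the following shape (a simplified sketch; field names such as initialCount have changed across releases, so treat it as indicative rather than exact):

```json
{
  "kind": "nodejs:14",
  "image": { "prefix": "openwhisk", "name": "action-nodejs-v14", "tag": "nightly" },
  "stemCells": [
    { "initialCount": 2, "memory": "256 MB" }
  ]
}
```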
Indeed @dgrove-oss, the explanation by @bdoyle0182 made the underlying problem in my benchmarking technique and warm-container reuse much clearer to me. Thanks a lot @bdoyle0182. Circling back to my main question: is setting up OpenWhisk 2.0.0 on a Kubernetes cluster for benchmarking a good way forward? Or are there any other well-tested, scalable ways to do a multi-node deployment of the whole stack? I've some experience with Ansible but haven't used it with a goal of multi-node clustering. Since I've realized that OpenWhisk 2.0.0 comes with a lot of improvements and is worth spending time on instead of writing hackish fixes into version 1 for my use cases, I am trying to get the helm chart to support 2.0.0, as this should help others looking to run the latest version too.
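For anyone else following along, the usual entry point for a Kubernetes deployment today is the openwhisk-deploy-kube helm chart (still based on the 1.0.0-era images until it gets updated). A minimal install sketch, where owdev and mycluster.yaml are placeholder names and the port value depends on the ingress configuration chosen:

```bash
git clone https://github.com/apache/openwhisk-deploy-kube.git
cd openwhisk-deploy-kube

# mycluster.yaml carries cluster-specific overrides (ingress, invoker memory, limits).
helm install owdev ./helm/openwhisk -n openwhisk --create-namespace -f mycluster.yaml

# Once the pods are ready, point the wsk CLI at the exposed API host.
wsk property set --apihost <api-host>:31001 --auth <guest-auth-key>
```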
Sorry to be slow; I thought I had responded but actually hadn't. I'll have to defer to others (@style95 @bdoyle0182) to comment about how they are deploying OpenWhisk 2.0. From a community standpoint, I think it would be great if we could get the helm chart updated. It's unfortunately a couple of years out of date, so it may take some effort to update version dependencies, etc.
Yes, that’s my long-overdue task. I’ve just started looking into it but haven’t completed it yet. I see some areas for improvement, but I’m still struggling to set up my own Kubernetes environment first. Since I’m involved in this project purely out of personal interest, it’s taking more time than I expected. |
Hey @style95, I've started working on it. Planning to update and do sanity testing for the non-OpenWhisk images first, like Redis, Kafka, etc. Then I'll move on to the controller, invoker, and other OpenWhisk images. Kinda stuck with paper deadlines, but expecting to make a PR soon and then iterate over it.
@singhsegv Great! |
I am very confused about how OpenWhisk 2.0.0 is meant to be deployed for a scalable benchmarking setting. I need some help from the maintainers to understand what I am missing, since I've spent a large amount of time on this and am still missing some key pieces.
Context
We are using OpenWhisk for a research project where workflows (sequential as well as fork/join) are to be deployed and benchmarked at 1/4/8 RPS, etc., for long periods of time. This is to compare private-cloud FaaS against public-cloud FaaS.
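To make the intended load concrete, here is a minimal open-loop sketch of fixed-RPS driving with the wsk CLI (the action name and parameter are placeholders; a real harness would also collect activation records and latencies):

```bash
#!/usr/bin/env bash
# Fire RPS asynchronous invocations per second for DURATION seconds.
# "myWorkflow" stands in for a deployed action or composition.
RPS=4
DURATION=60

for ((t = 0; t < DURATION; t++)); do
  for ((i = 0; i < RPS; i++)); do
    # Non-blocking invoke; the returned activation IDs can be fetched later for timing data.
    wsk action invoke myWorkflow --param runId "${t}-${i}" &
  done
  sleep 1
done
wait
```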
Current Infrastructure Setting
We have an in-house cluster with around 10 VMs running on different nodes, with 50 vCPUs and around 200 GB of memory. Since I am new to this, I initially followed https://github.com/apache/openwhisk-deploy-kube to deploy it, and along with OpenWhisk Composer, I was able to get the workflows running with a lot of small fixes and changes.
Problems with Current Infrastructure
Some action invocations fail at the /init step, and I am unable to debug why that is happening.
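One way to poke at /init in isolation is to run a runtime image directly and drive the container protocol by hand. A rough sketch (the image name and payloads illustrate the documented /init and /run interface, not my exact setup):

```bash
# Start a Node.js runtime image locally (image/tag are illustrative).
docker run -d --name owdebug -p 8080:8080 openwhisk/action-nodejs-v14

# /init injects the function code into the container...
curl -s -X POST http://localhost:8080/init -H 'Content-Type: application/json' -d '{
  "value": {
    "name": "hello",
    "main": "main",
    "binary": false,
    "code": "function main(args) { return { greeting: \"hello \" + (args.name || \"world\") }; }"
  }
}'

# ...and /run invokes it with parameters.
curl -s -X POST http://localhost:8080/run -H 'Content-Type: application/json' \
  -d '{"value": {"name": "OpenWhisk"}}'

docker rm -f owdebug
```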
Main Doubts about scaling
@style95 @dgrove-oss Since you both have been active in the community and have answered some of my previous queries too, any help on this will be very much appreciated.
We are planning to go all in with OpenWhisk for our research and are planning to contribute some good changes back to the community relating to FaaS at the edge and improving communication times in FaaS. But since none of us have infrastructure as our strong suit, getting over these initial hiccups is becoming a blocker for us. So I'm looking forward to some help, thanks :).