-
@btovar Do you have a transaction log of the NDCMS run? Maybe it can give me some insights into the workflow.
-
Yes, sending by other means.
-
Also, I was thinking about your idea of sending N tasks and killing those that exhaust their resources. The problem with that approach is that the resources are not really partitioned in the machine. Say I have tasks that use 2 cores, but I don't know that, and I have a worker with 24 cores. If I send 24 tasks to the worker, then the resource monitor is going to tell me that the tasks used less than 2 cores, and if I'm really unlucky, it's going to tell me they are fine using 1 core. Any stats from that run are suspect, because the wall time is going to be at least twice what it should be (and even more given all the context switches required).
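To make the numbers concrete, here is a minimal sketch of why the monitored usage is misleading on an oversubscribed worker. The figures are hypothetical, not measurements from the NDCMS run:

```python
# Hypothetical numbers: 24 tasks that each really need 2 cores,
# all packed onto a single 24-core worker at once.

worker_cores = 24
tasks = 24
cores_needed_per_task = 2      # true requirement, unknown to the scheduler

# With every task running at once, each task only gets its fair share:
effective_cores_per_task = worker_cores / tasks          # 1.0 core

# The resource monitor reports what the task *used*, not what it needed,
# so it sees ~1 core per task and the measurement looks "fine".
reported_cores = min(cores_needed_per_task, effective_cores_per_task)

# Wall time inflates by at least the oversubscription factor
# (ignoring the extra cost of the context switches):
slowdown = cores_needed_per_task / effective_cores_per_task

print(f"reported cores per task: {reported_cores}")    # 1.0
print(f"wall-time inflation:     >= {slowdown}x")       # >= 2.0x
```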
-
I just found this paper. Basically, the worker keeps the maximum resource consumption of a task over a time window (the most recent 5 minutes, for example) and readjusts that task's resource limit on the fly. Sounds pretty promising; we'll only have to be careful about when spikes happen.
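As a rough illustration of that idea (not the paper's actual algorithm), a sliding-window peak tracker could look like the sketch below; the window length and headroom factor are assumptions:

```python
import time
from collections import deque

class SlidingPeak:
    """Track peak resource usage over a sliding time window."""

    def __init__(self, window_seconds=300):          # e.g. most recent 5 minutes
        self.window = window_seconds
        self.samples = deque()                       # (timestamp, usage) pairs

    def record(self, usage, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, usage))
        # Drop samples that fell out of the window.
        while self.samples and self.samples[0][0] < now - self.window:
            self.samples.popleft()

    def peak(self):
        return max((u for _, u in self.samples), default=0)

# Example: readjust a memory limit on the fly, with 25% headroom for spikes.
mem = SlidingPeak(window_seconds=300)
for t, usage_mb in [(0, 800), (60, 1200), (120, 950)]:   # hypothetical samples
    mem.record(usage_mb, now=t)
new_limit_mb = int(mem.peak() * 1.25)                    # 1500
print(new_limit_mb)
```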
-
**Motivation**
Tasks in topcoffea may run for 20 seconds or for more than 20 minutes. Tasks are usually submitted clustered together: for example, in the default NDCMS run that processes all the needed data, all the short-running tasks are submitted first, and then all the long-running tasks. This causes runs to finish slowly as the long tail of tasks completes.
**Solutions that are not likely to work**
One immediate thought may be to separate these tasks into two categories. However, in this particular case, such a change does not improve performance, and is in fact counterproductive, because:
Another solution would be to randomize the waiting list of tasks. This helps pack the tasks better (short- and long-running together), but reduces throughput, as some long-running tasks will occupy whole workers from the start.
**Proposed solution**
When a task is running using the whole worker, it may receive an updated maximum resource allocation to use. This update is constructed from tasks that finished after the target task was submitted. The update also uses a uniform partition of the worker's resources (i.e., allocating half the cores means allocating half the memory). With this change, a resource allocation can always be revised down, and revised up only if the worker has room for it. Thus, more tasks can be fitted into a worker as soon as we know something about the size of the tasks.
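A minimal sketch of what such an update could look like, assuming a uniform partition of cores, memory, and disk; the function names and worker numbers are illustrative and not the existing Work Queue API:

```python
def uniform_allocation(worker, cores):
    """Scale memory and disk proportionally to the requested cores."""
    fraction = cores / worker["cores"]
    return {
        "cores":  cores,
        "memory": int(worker["memory"] * fraction),
        "disk":   int(worker["disk"]   * fraction),
    }

def revised_allocation(current, proposed, cores_free):
    """Revise down unconditionally; revise up only if the worker has room."""
    if proposed["cores"] <= current["cores"]:
        return proposed
    if proposed["cores"] - current["cores"] <= cores_free:
        return proposed
    return current

worker = {"cores": 24, "memory": 48000, "disk": 200000}   # MB, hypothetical
current  = uniform_allocation(worker, 24)   # task started with the whole worker
proposed = uniform_allocation(worker, 4)    # finished tasks suggest 4 cores is enough
print(revised_allocation(current, proposed, cores_free=0))
# -> {'cores': 4, 'memory': 8000, 'disk': 33333}
```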
For the particular case of topEFT, this change by itself won't help. It must be combined with some other strategy that allows for better packing, such as randomizing the waiting list.
**Necessary changes**
`resource_monitor`: allow for resource limit updates while a task runs (via files? sockets?).
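One possible shape of the file-based variant, purely as an illustration; the file name, JSON format, and polling scheme are assumptions, not the existing resource_monitor interface:

```python
import json
import os

LIMITS_FILE = "limits.json"          # hypothetical path shared with the monitor

def write_limits(cores, memory_mb, disk_mb):
    """Worker side: publish a new limit atomically via rename."""
    tmp = LIMITS_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"cores": cores, "memory": memory_mb, "disk": disk_mb}, f)
    os.replace(tmp, LIMITS_FILE)     # atomic on POSIX, so readers never see a partial file

def read_limits(last_mtime):
    """Monitor side: re-read the file only when its modification time changes."""
    try:
        mtime = os.path.getmtime(LIMITS_FILE)
    except FileNotFoundError:
        return None, last_mtime
    if mtime == last_mtime:
        return None, last_mtime
    with open(LIMITS_FILE) as f:
        return json.load(f), mtime
```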