Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add named pools #45

Merged
merged 5 commits into from
Sep 30, 2021
Merged

Add named pools #45

merged 5 commits into from
Sep 30, 2021

Conversation

Sudha247
Copy link
Contributor

Adds an optional argument name to setup pool. When the argument is provided, the pool is stored in an existing collection of pools. The name can be used to retrieve the pool later.

lib/task.ml Outdated Show resolved Hide resolved
lib/task.ml Outdated Show resolved Hide resolved
lib/task.ml Outdated Show resolved Hide resolved
lib/task.mli Outdated Show resolved Hide resolved
@Sudha247
Copy link
Contributor Author

Now includes get_num_domains which returns the number of domains in a given pool.

test/test_task.ml Outdated Show resolved Hide resolved
lib/task.ml Outdated
let _ = match name with
| Some x ->
Mutex.lock named_pool_mutex;
Hashtbl.add named_pools x p;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta-comment: I started thinking about the possibility of exceptions from Hashtbl.add. While Hashtbl.add does not raise any synchronous exceptions, there is of course the possibility of asynchronous exceptions. For example, add performs an allocation, which triggers a GC, which runs a finaliser which raises an exception. Because of the exception, the mutex would not be unlocked.

It would be useful to have a with_mutex primitive in Mutex module:

let with_mutex m f v =
  Mutex.lock m;
  begin try f v with e -> 
    Mutex.unlock m;
    raise e
  end;
  Mutex.unlock m

which will guard against any exceptional behaviour. CC @ctk21 @sadiqj.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems not unreasonable to me


let named_pool_mutex = Mutex.create ()

let setup_pool ?name ~num_additional_domains () =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse my ignorance, but why do we need to add the unit () arg to the end of this one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the case when we don't include the unit value argument:

# let setup_pool ?name ~num_additional_domains = ();;
Warning 16 [unerasable-optional-argument]: this optional argument cannot be erased.
val setup_pool : ?name:'a -> num_additional_domains:'b -> unit = <fun>
# setup_pool ~num_additional_domains:5;;
- : ?name:'a -> unit = <fun>

vs when we have it:

# let setup_pool ?name ~num_additional_domains () = ();;
val setup_pool : ?name:'a -> num_additional_domains:'b -> unit -> unit =
  <fun>
# setup_pool ~num_additional_domains:5 ();;
- : unit = ()

https://stackoverflow.com/a/1667891 has more information on this issue.

Copy link
Contributor

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one thing we might want to think about for the named pool lifecycle. I see additions for creating pools, but how do things get removed from the named_pools structure?

You probably want the shutting down and the removal from the named_pools to be easy to use as well.

@kayceesrk
Copy link
Contributor

You probably want the shutting down and the removal from the named_pools to be easy to use as well.

Good point. I have some ideas on this. Let me prototype this quickly.

@kayceesrk
Copy link
Contributor

@Sudha247 See Sudha247#1.

@sadiqj
Copy link

sadiqj commented Sep 29, 2021

This is good.

My only suggestion is that it probably would be good to have a way of iterating the named pools? In a large application you might want to figure out what is keeping a bunch of domains around.

(Is it also worth tracking the anonymous pools we've created? Could this actually be a use for ephemerons?)

@kayceesrk
Copy link
Contributor

kayceesrk commented Sep 30, 2021

Thanks @sadiqj! Iterating over pools is useful. But we only have two operations that we can possibly reasonably do in the iteration -- get_num_domains and teardown_pool. The latter may possibly be useful to bring down all the pools. There may be other operations that we may want to do on a pool. May I suggest we track this request in a separate issue so that we can land the named pool feature.

(Is it also worth tracking the anonymous pools we've created? Could this actually be a use for ephemerons?)

I don't think you need ephemerons. Weak hash sets should be sufficient, right? https://ocaml.org/api/Weak.html#1_Weakhashsets

@kayceesrk
Copy link
Contributor

Created issue #46 to track the request to iterate over all the pools. I'm merging this PR.

@gasche
Copy link
Contributor

gasche commented Oct 18, 2022

Hi there,

I'm currently looking at the code of Domainslib again, and I wonder what the use-case for lookup_pool is? It is not explained in the present PR, nor in the current documentation of Domainslib. I must say that it is not clear to me that this feature is necessary: it is easy to implement on the side in a separate library, and I am not sure that addressing global pools by strings is a future-proof design.

This is briefly discussed by @Sudha247 in ocsigen/lwt#860 (comment). What i understand from the discussion is that domains are an inherently global, non-modular resource, and we collectively are not sure yet how to deal with that. This affects anyone using domains and in particular all Domainslib users. But, as a non-expoert, I don't see how the lookup_pool function improves the situation ? It's of course fine to iterate with an okay solution before we find a better solution, but I think it would be important to articulate the design and motivation for the current solution.

@Sudha247
Copy link
Contributor Author

Pool is essentially a global resource, possibly shared between multiple modules in the same application. Some applications also create multiple pools; to give a simple example, one pool to run I/O operations, another one for rest of the computations etc. In situations such as these, the pools will then have to be threaded through as an argument in the various modules and functions. lookup_pool gives a way to avoid that, by letting one obtain a pool with an identifier, in this case a string. This can indeed be implemented outside the library. We thought enough users of domainslib might run into this, and it might be worth having it in the library itself.

@gasche
Copy link
Contributor

gasche commented Oct 19, 2022

I think it would be easy for applications to have their own global state to do this. In particular, they probably can define a global variable for each pool they care about, and then refer to the pools using the variables, which is safer than using strings. (No risk of typos, etc.)

Do you have an example of an application using lookup_pool? I haven't been able to find anything on Github.

@Sudha247
Copy link
Contributor Author

IIRC @paracetamolo mentioned using it in plonk, not sure if the code is available on gitlab.

@kayceesrk
Copy link
Contributor

kayceesrk commented Oct 20, 2022

The idea of lookup_pool came from @sadiqj's suggestions about how Java manages thread pools. Perhaps @sadiqj will have a reference?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants