diff --git a/data/planet/gallium/rethinking-ocaml-abstract-signatures.md b/data/planet/gallium/rethinking-ocaml-abstract-signatures.md new file mode 100644 index 0000000000..2dde2ea3cc --- /dev/null +++ b/data/planet/gallium/rethinking-ocaml-abstract-signatures.md @@ -0,0 +1,555 @@ +--- +title: Rethinking OCaml abstract signatures +description: +url: http://cambium.inria.fr/blog/rethinking-ocaml-abstract-signautres +date: 2023-12-01T08:00:00-00:00 +preview_image: +featured: +authors: +- GaGallium +source: +--- + + + + +
Abstract module types are one of the less understood features of the +OCaml module system. They have been one of the obstacles in the on-going +effort to specify, and eventually redesign, the module system.
+In this blog post, I (Clément Blaudeau) present an explanation of +what are those abstract module types, and propose a slightly restricted +version that might be easier to understand and specify while remaining +pretty expressive.
+ + + + +For the past 2 years, I’ve been working on building a new +specification for the OCaml module system based on Fω. The goal, besides +the theoretical interest, is to eventually redo (the module part of) the +typechecker with this approach, which would have several benefits:
+Yet, a key aspect in OCaml development culture is to ensure backward +compatibility. Therefore, the new Fω approach I’ve been building should +not only subsumes the current typechecker in normal use cases, but +actually support all of the features of the module system. For +long, abstract signatures (also called abstract module +types) were believed to be, at least, problematic for Fω. +Hopefully, we found out that a slightly restricted version of the +feature was encodable in Fω, and, in passing, made the semantics of +abstract signatures much simpler. Thus, only one question remains: does +this restricted form actually covers all use cases, i.e., is the +restriction backward compatible ?
+Here, we aim at presenting the current state of abstract signatures +and our proposed simplification purely from an OCaml user point +of view, not from the theoretical one. We welcome any feedback, +specifically, use cases or potential use cases that significantly differ +from our examples.
+We start by introducing abstract signatures through examples. Then, +we present the current state of abstract signatures in OCaml: we explain +the syntactic approach and the issues associated with it. We argue that +it has surprising behaviors and, in its current unrestricted +form, it is actually too powerful for its own good. Then, we +propose a restriction to make the system predicative which, by +decreasing its expressiveness, actually makes it more usable. (Our +actual proposal is given in 3.3). We finish by other aspect related to +usability (syntax, inference).
+The art of modularity is all about controlling abstraction and
+interfaces. ML languages offer this control via a module
+system, which contains a signature language to describe
+interfaces. Signatures contain declarations (fields): values
+val x : t
, types type t = int
, modules
+module X : S
, and module types
+module type T = S
. Type and module type declarations can
+also be abstract type t
, module type T
, which
+serves both to hide implementation details via sealing and to
+have polymorphic interfaces, using functors.
Here, we focus on the construct module type T
, called
+abstract module type or abstract signature. We start
+with examples adapted from this
+forum discussion.
Let’s consider the following scenario. Two modules providing an
+implementation of UDP (UDP1
and UDP2
) are
+developed with different design trade-offs. They both implement a
+signature with basic send and receive operations. Then, functors are
+added as layers on top: taking a udp library as input, they return
+another udp library as an output.
Reliable
adds sequence numbers to the packets and
+re-sends missing packets;CongestionControl
tracks the rate of missing packets to
+adapt the throughput to network congestion situations;Encryption
encrypts the content of all messages.A project might need different combinations of the basic libraries
+and functors, while requiring that all combinations use
+encryption. To enforce this, the solution is to use the module-level
+sealing of abstract signatures. In practice, the signature of
+the whole library containing implementations and functors
+UDPLib
(typically, its .mli
file) is rewritten
+to abstract all interfaces except for the output of the
+Encryption
functor.
module type UDPLib = sig
+ module type UNSAFE
+
+ module UDP1 : UNSAFE
+ module UDP2 : UNSAFE
+
+ module Reliable : UNSAFE -> UNSAFE
+ module CongestionControl : UNSAFE -> UNSAFE
+
+ module Encryption : UNSAFE ->
+ sig val send : string -> unit (* ... *) end
+end
+
Just as type abstraction, signature abstraction can be used to
+enforce certain code patterns: users of UDPLib
will only be
+able to use the content of modules after calling the
+Encryption
functor, and yet they have the freedom to choose
+between different implementations and features:
module UDPKeyHandshake = Encryption(Reliable(UDP1))
+module UDPVideoStream = Encryption(CongestionControl(UDP2))
+(* etc *)
+
Another use is to introduce polymorphism at the module level. Just as
+polymorphic functions can be used to factor code, module-level
+polymorphic functors can be used to factor module expressions. If a code
+happens to often feature functor applications of the form
+Hashtbl.Make(F(X))
or Set.Make(F(X))
, one can
+define the MakeApply
functor as follows:
(* Factorizing common expressions *)
+module type Type = sig module type T end
+module MakeApply
+ (A:Type) (X: A.T)
+ (B:Type) (F: A.T -> B.T)
+ (C:Type) (H: sig module Make : B.T -> C.T end) = H.Make(F(X))
+
Downstream the code is rewritten into
+MakeApply(...)(X)(...)(F)(...)(Set)
or
+MakeApply(...)(X)(...)(F)(...)(Hashtbl)
Right now, the
+verbosity of such example would probably be a deal-breaker. We address
+this aspect at the end. Ignoring the verbosity, this can be useful for
+maintenance: by channeling all applications through
+MakeApply
, only one place needs to be updated if the arity
+or order of arguments is changed. Similarly, if several functors expect
+a constant argument containing – for instance – global
+variables, a ApplyGv
functor can be defined to always
+provide the right second argument, which can even latter be hidden away
+to the user of ApplyGv
:
(* Constant argument *)
+module Gv : GlobalVars
+module ApplyGv (Y : sig module type A module type B end)
+ (F : Y.A -> GlobalVars -> Y.B)(X : Y.A) = F(X)(Gv)
+
Downstream, code featuring F(X)(GlobalVars)
is rewritten
+into ApplyGv(...)(F)(X)
Then, the programmer can hide the
+GlobalVars
module while letting users use
+ApplyGv
, ensuring that global variables are not modified in
+uncontrolled ways by certain part of the program.
Finally, polymorphism can also be used by a developer to prevent
+unwanted dependencies on implementation details. If the body of a
+functor uses an argument with a submodule X
, but actually
+does not depend on the content of S
, abstracting it is a
+“good practice”.
module F (Arg : sig ... module X : S ... end) =
+ ... (* polymorphism is not enforced *)
+
+module F' (Y: sig module type S end)
+ (Arg : sig ... module X : Y.S ... end ) =
+ ... (* polymorphism is enforced *)
+
Fundamentally, these example are not surprising for developers that +are used to rely on abstraction to protect invariants and factor code. +Their specificity lies in the fact that there are at the module level, +and therefore require projects with a certain size and a strong emphasis +on modularity to be justified.
+The challenge for understanding (and implementing) abstract
+signatures lies more in the meaning of the module-level
+polymorphism that they offer than the module level sealing, the
+latter being pretty straightforward. More specifically, the crux lies in
+the meaning of the instantiation of an abstract signature
+variable A
by some other signature S
, that
+happens when a polymorphic functor is applied. OCaml follows an
+unrestricted syntactical approach: A
can be
+instantiated by any (well-formed) signature S
. During
+instantiation, all occurrences of A
are just replaced by
+S
; finally, the resulting signature is
+re-interpreted—as if it were written as is by
+the user.
However, this syntactical rewriting interferes with the +variant interpretation of signatures, which can lead to +surprising behaviors. We discuss this aspect first. The +unrestricted aspect leads to the (infamous) Type : +Type issue which has some theoretical consequences. We finish this +section by mentioning other—more technical—issues.
+The first key issue of this approach comes from the fact that +signatures in OCaml have a variant interpretation: abstract fields (1) +have a different meaning (sealing or polymorphism) depending on whether +they occur in positive or negative positions, and (2) abstract fields +open new scopes, i.e. duplicating an abstract type field introduces two +different abstract types. Overall, OCaml signatures can be thought of as +having implicit quantifiers: using a signature in positive or +negative position changes its implicit quantifiers (from existential to +universal) while duplicating a signature duplicates the quantifiers (and +therefore introduces new incompatible abstract types).
+Therefore, when instantiating an abstract signature with a signature
+that has abstract fields, the user must be aware of this, and mentally
+infer the meaning of the resulting signature. To illustrate how it can
+be confusing, let’s revisit the first motivating example and let’s
+assume that the developer actually want to expose part of the interface
+of the raw UDP
libraries. One might be tempted to
+instantiate UNSAFE
with something along the following
+lines:
module type UDPLib_expose = sig
+ include UDPLib with module type UNSAFE =
+ sig
+ module type CORE_UNSAFE
+ module Unsafe : CORE_UNSAFE (* this part remains abstract *)
+ module Safe : sig ... end (* this part is exposed *)
+ end
+end
+
This returns :
+module type UDPLib_expose = sig
+ module type UNSAFE =
+ sig
+ module type CORE_UNSAFE
+ module Unsafe : CORE_UNSAFE
+ module Safe : sig ... end
+ end
+ module UDP1 : UNSAFE
+ module UDP2 : UNSAFE
+ module Reliable : UNSAFE -> UNSAFE
+ module CongestionControl : UNSAFE -> UNSAFE
+ module Encryption : UNSAFE -> sig val send : string -> unit (* ... *) end
+end
+
However, the syntactical rewriting and reinterpretation of this
+signature in the negative positions produces a counter-intuitive result.
+For instance, if we expand the signature of the argument for the functor
+Reliable
(for instance) we see:
module Reliable :
+sig
+ module type CORE_UNSAFE
+ module Unsafe : CORE_UNSAFE
+ module Safe : sig ... end
+end -> UNSAFE
+
This means that the functor actually has to be polymorphic
+in the underlying implementation of CORE_UNSAFE
, rather
+than using the internal details, which has the opposite
+meaning as before. If the user wants to hide a shared
+unsafe core, accessible to the functor when they were defined and then
+abstracted away, the following pattern may be used instead:
module type UDPLib_expose' = sig
+ module type CORE_UNSAFE
+ include UDPLib with module type UNSAFE = sig
+ module type CORE_UNSAFE = CORE_UNSAFE
+ module Unsafe : CORE_UNSAFE
+ module Safe : sig ... end
+ end
+end
+
Doing so, the instantiated signature does not contain abstract fields +and therefore its variant reinterpretation will not introduce unwanted +polymorphism. This observation is at the core of the proposal of this +post.
+Type : Type
and
+impredicativityAbstract module types are impredicative: a signature containing an +abstract signature can be instantiated by itself. One can trick the +subtyping algorithm into an infinite loop of instantiating an abstract +signature by itself, as shown by Andreas +Rosseberg, adapting an example from Harper and Lillibridge +(POPL ’94). This also allows type-checking of (non-terminating) +programs with an absurd type, as shown by the encoding of the Girard’s +paradox done by Leo +White.
+The current implementation of the typechecker does not handle +abstract signatures correctly in some scenarios. It’s unclear if they +are just bugs or pose theoretical challenges.
+Inside a functor, module aliases are disallowed between the parameter +and the body (for soundness reasons, due to coercive subtyping). +However, this check can be bypassed by using an abstract signature that +is then instantiated with an alias. If we try to use it to produce a +functor that exports its argument as an alias, the typechecker crashes. +This is discussed in #11441
+(* crashes the typechecker in current OCaml *)
+module F (Type : sig module type T end)(Y : Type.T) = Y
+
+module Crash (Y : sig end) =
+ F(struct module type T = sig module X = Y end end)
+
The use of abstract signatures clashes with applicativity of +functors, as discussed in #12204.
+Another known issue is that the typechecker can abstract a signature +when it contains unreachable type fields (types pointing to anonymous +modules). This can lead to the production of invalid signatures +: signatures that are refused by the typechecker when re-entered back +in.
+module F (Y: sig type t end) =
+ struct
+ module type A = sig
+ type t = Y.t (* this will force the abstraction of all of A *)
+ type u
+ end
+ module X : A = struct type t = Y.t type u = int end
+ type u = X.u
+ end
+
+module Test = F(struct type t end)
+
+(* returns *)
+module Test : sig module type A module X : A type u = X.u end
+
Here, the type field type u = X.u
is invalid as
+X
has an abstract signature (and therefore, no fields).
In this section we explore solutions for fixing the issues of the +current approach. The core criticism we make of the OCaml approach is +that it is actually too expressive for its own good. Abstract +signatures are impredicative: they can be instantiated by +themselves. Having impredicative instantiation with variant +reinterpretation is hard to track for the user and interacts in very +subtle ways with other features of the module system, slowing down its +development—and breaking its theoretical properties. To address this, we +take the opposite stance and propose to make the system actually +predicative: we restrict the set of signatures that can be used +to instantiate an abstract signature. This also indirectly addresses the +complexity of the variant reinterpretation.
+We start with the simplest solution where instantiation of abstract +signatures is restricted to signatures containing no abstract fields. +Then, we propose to relax this restriction and allow for signatures that +contain abstract type fields (but no abstract module types), +which we call simple signatures. This will requires us to +briefly discuss the need for module-level sharing.
+In this section we focus on the theoretical aspects, but present them +informally with examples. The practical aspects, notably syntax and +inference, are discussed in the next section.
+One might wonder why abstract types and abstract signatures +syntactically resembles one another and yet, the latter is much more +complex than the former. The key lies in the fact that abstract types +can only be instantiated by concrete type expressions, without +free variables. Informally, this:
+sig
+ type t
+ val x : t
+ val f : t -> t
+end with type t = (int * 'a)
+
is not allowed, notably because (1) the scope of the abstract type
+variable 'a
is unclear, (2) values of type t
,
+like x
, would be ill-typed.
Therefore, a first solution is to require abstract signatures +to be instantiated only by concrete signatures, i.e. signatures with no +abstract fields (neither types nor module types). This +circumvents the clash between the rewriting and variant reinterpretation +of abstract fields (by disallowing them).
+This is simple and sound but prevents some valid uses of abstract
+types: in the first example, UNSAFE
could not be
+instantiated with abstract type fields, forcing UDP1
and
+UDP2
to have the same type definitions.
If we want to relax the no-abstraction proposal, some abstract fields +will be allowed when instantiating signatures. Then, the question of +what sharing (i.e., type equalities) should be kept between +different occurrences of the abstract fields arises.
+In OCaml signatures, sharing between two modules is usually expressed +at the core-level by rewriting the fields of the signature of +the second module to refer to their counterpart in the first one. This +cannot be done with abstract signatures, as they have no fields. +Instead, the language needs module-level sharing, which in OCaml is very +restricted. Indeed, it provides a form of module aliases (only for +submodules, not at the top-level of a signature), but aliasing between a +functor body and its parameter is not allowed—while it is typically the +use-case for abstract signatures in polymorphic functors. Consider the +following code:
+(* Code *)
+module F1 (Y: sig module type A module X : A end) = Y.X
+module F2 (Y: sig module type A module X : A end) = (Y.X : Y.A)
+
Currently, the typechecker cannot distinguish between the two and +returns the same signature, while we would expect the first one to keep +the sharing between the parameter and the body.
+(* Currently, both are given the same type: *)
+module F1 (Y: sig module type A module X : A end) : A
+module F2 (Y: sig module type A module X : A end) : A
+
As an example, we can consider the argument for the functors:
+module Y = struct
+ module type A = sig type t end
+ module X = struct type t = int end
+end
+
+module Test1 = F1(Y)
+module Test2 = F2(Y)
+
This returns :
+module Test1 : sig type t end
+module Test2 : sig type t end
+
While we would expect :
+module Test1 : sig type t = int end
+module Test2 : sig type t end
+
Two possible extensions would help tackle this issue.
+A recently proposed experimentation, named lazy
+strengthening, extends the signature language with an operator
+S with P
, where S
is a signature and
+P
a module path. It is interpreted as S
+strengthened by P
, i.e. S
in which all
+abstract fields are rewritten to point to their counterpart in
+P
. Initially considered for performance reasons, it would
+allow for tracking of type equalities when using abstract
+signatures.
(* Lazy strengthening would keep type equalities: *)
+module F1 (Y: sig module type A module X : A end) = Y.A with Y.X
+
A more involved solution is the use of an extension of aliasing
+called transparent ascription, where both the alias
+and the signature are stored in the signature. The signature
+language would be extended with an operator (= P < S)
.
+The technical implications of this choice are beyond the scope of this
+discussion.
(* Transparent ascription would keep module equalities: *)
+module F1 (Y: sig module type A module X : A end) : (= Y.X < Y.A)
+
Maintaining a predicative approach, we propose to restrict +instantiation only by simple signatures, i.e., signatures that +may contain abstract type fields, but no abstract module types. +This reintroduces the need to express module-level sharing and the +mental gymnastic of variant re-interpretation of abstract type fields. +However, it guarantees that all modules sharing the same abstract +signature will also share the same structure (same fields) after +instantiation, and can only differ in their type fields. We believe this +makes for a good compromise.
+One might wonder how restrictive is this proposal. Specifically, if +we consider a simple polymorphic functor as:
+module Apply (Y : sig module type A end) (F : Y.A -> Y.A)(X : Y.A) = F(X)
+
The following partial application would be rejected:
+(* Rejected as A would be instantiated by `sig module type B module X : B -> B end` *)
+module Apply' = Apply(struct module type A = sig module type B module X : B -> B end end)
+
However, this could be circumvented by eta-expanding, thus +expliciting module type parameters, and instantiating only a simple +signature:
+(* Accepted as A is instantiated by a signature with no abstract fields *)
+module Apply'' = functor (Y:sig module type B end) ->
+ Apply(struct module type A = sig module type B = Y.B module X : B -> B end end)
+
Concrete and simple signatures can be seen as the first two levels of +the predicative approach for types declarations. There are no more +levels for type declarations, as types cannot be partially +abstract (see 3.1). Could it be useful to add even more +expressivity and authorize instantiation by a signature containing again +an abstract module type field (which would need to be restricted with a +level system like universes)? We have found no example where +this was useful. Besides, it would add a great layer of complexity.
+A key aspect of abstract module types that reduces their usability is +the verbosity of the syntax. Rather than having to pass signature as +part of a module argument to a polymorphic functor, using a separate +notation for module type parameters could be more concise. In practice, +abstract signature arguments could be indicated by using brackets +instead of parenthesis, and interleaved with normal module arguments, as +in this example:
+(* At definition *)
+module MakeApply
+ [A] (X:A)
+ [B] (F: A -> B)
+ [C] (H : sig module Make : B -> C end)
+ = H.Make(F(X))
+
+module ApplyGv
+ [A] [B] (F:A -> GlobalVars -> B) (X:A)
+ = F(X)(Gv)
+
+(* At the call site *)
+module M1 = MakeApply
+ [T] (X)
+ [Hashtbl.HashedType] (F)
+ [Hashtbl.S] (Hashtbl)
+
+module M2 = ApplyGv [A] [B] (F) (X)
+
Technically, this is not just syntactic sugar for anonymous +parameters due to the fact that OCaml relies on names for applicativity +of functors.
+Following up on the previous point, usability of abstract signatures +could even be improved with some form of inference at call sites. +Further work is needed to understand to what extend this could be +done.
+We have presented the feature of abstract signatures in +OCaml. After showing use cases via examples, we explained the issues +associated with the unrestricted syntactical approach. Then, we propose +a new specification: simple abstract signatures. In addition to +making the behavior of abstract signatures much more predictable for the +user, this approach can be fully formalized by translation into Fω +(extended with predicative kinds).
+As stated above, our goal here was both to sum up the current state +and our proposal, but also to gather feedback from users or potential +users. In particular, we want to see if it can indeed cover all use +cases, and if we missed other usability problems.
+ +