Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: stable memory block names for globals #220

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

jeremie-koenig
Copy link
Contributor

These changes introduce a more structured namespace for memory blocks: a block id can either be a global ident (new), or a dynamically allocated positive (as before). Using this approach, we can ensure that global environments always assign the same block name to a global identifier id, so that for all ge1, ge2 we have:

Genv.find_symbol ge1 id = Some b1 ->
Genv.find_symbol ge2 id = Some b2 ->
b1 = b2.

Rationale

In all work around compositional semantics, we need to ensure that the memory blocks for globals are named consistently across modules. For instance,

  • In CompCertX, we force programs to follow a predetermined structure so that the same identifiers will always be mapped to the same block ids.
  • Compositional CompCert assumes that the global environments of linked modules are consistent, arguing that we can always add external definitions to ensure that's the case.

Our changes should make it possible to eliminate these hacks/assumptions.

Status

We have updated all of CompCert v3.2 to use this new naming scheme.

We plan to use these changes in our own ongoing work on compositional compilation, and we may tweak them further over the coming weeks. However, since these changes have a broader potential applicability and we're hoping for them to be merged into CompCert at some point, we want to collect some feedback as we go forward with that plan.

Notes:

  • The new namespace is specified as an abstract Module Type, making it easy to explore richer namespace structures in the future.
  • In Initializers.v, it is no longer necessary for constval/transl_init to abuse Vptr to store the global identifier of referenced variables, since it is now possible to use a pointer value that makes sense.
  • Since global pointers are preserved by compilation, it is now possible to use val directly in events, as demonstrated by our globmem-eventval branch, which defines eventval := val.
  • With the uniform block naming convention, it should be possible to define a linker for global environments corresponding to the existing linker for programs, and to relate new notions of semantic composition to the already implemented syntactic composition.

Known remaining issue:

  • Our implementation of BMap using EMap may be problematic for the performance of the interpreter. This can be easily fixed with some more work.
  • Since block names for globals are predetermined, it might be advantageous to replace find_symbol with a simpler has_symbol boolean predicate (for example), or remove it entirely.

@xavierleroy
Copy link
Contributor

Thanks for the proposal. I haven't had time yet to study the diff but will do soon. @andrew-appel, @jhjourdan, @robbertkrebbers, any opinions on this proposal?

@andrew-appel
Copy link

We will discuss this in the VST group and get back to you.

@andrew-appel
Copy link

Today we (@andrew-appel, @lennartberinger, @scuellar, @jeremie-koenig, @pwilke) discussed this by phone. We agree that the proposal addresses part of the problem, but does not address the problem of consistent (across compilation units) mapping of ASCII strings ("names") to AST.ident. We believe that very possibly this pull request could be a useful part of the solution to that larger problem, but we are still studying and discussing it.

@jeremie-koenig
Copy link
Contributor Author

jeremie-koenig commented Feb 15, 2018

One thing @pwilke and I did not think to mention this during the call: in this pull request, the ASCII names <-> AST.ident mapping does in fact show up, because the interpreter needs to be able to print out pointers:

  • In BlockNames.v we use an axiom ident_to_string: ident -> string to implement a Block.to_string : block -> string function;
  • In extraction.v we realize this axiom as (fun i -> Camlcoq.coqstring_of_camlstring (Camlcoq.extern_atom i));
  • In Interp.ml we use Block.to_string in the definition of format_value.

This allows us to print out a global pointer &globvar[12] as <glob:globvar+48>, and a dynamic one &localvar as <dyn:123+0> (say). Previously both would have been printed out using numerical block-ids. We could move the definition of ident_to_string to a more central place if we want to make that a more general feature or use it in other ways.


This does not really address the problem mentioned by @andrew-appel, which I understand as: What if I want to use strings that never went through the parser, therefore are not in the map? This happens anytime you use CompCert as a library rather than a self-contained compiler: in VST, when you use the output of clightgen; likewise in CertiKOS, our "source-ish" code is Clight/Asm ASTs in .v files.

VST/CertiKOS use things like:

Definition foo : ident := 1%positive.
Definition bar : ident := 2%positive.
...
Definition myast : function := ... foo ... bar ...

Instead, we would want to use something like:

Definition foo : ident := ident_of_string "foo".
Definition bar : ident := ident_of_string "bar",

or perhaps even declare ident_of_string as a coercion and use strings directly when defining myast. This way we would never again need to mess around with potentially inconsistent tables of names <-> ident mappings in VST and CertiKOS, and there would never be a problem putting together modules that have been developped independently.


I believe we can accomplish this in the following way:

In the Coq part of CompCert, we axiomatize ident_to_string and string_to_ident as a predefined, but unspecified bijection between strings and ident's. Such a mapping does exists, but any particular one will be inefficient in general (the strings we do happen to use in a given case will correpond to sparsely distributed, large numbers).

But since we're free to use any such mapping, in the ML part, we can simply implement string_to_ident using a stateful process that allocates small numbers on-demand (as is already done). The key thing here is that although string_to_ident will be effectful, this can never be observed from the outside. In effect, we progressively "discover" the specific mapping we want to use.

Using this approach, we don't need to deal with the statefulness of the table in Coq, but still get the benefits. I believe implementing this can be done independently of our changes to the block namespace.

@jeremie-koenig
Copy link
Contributor Author

Here's a proof-of-concept CertiKOS@85d2b87

In particular, use the name [AST.string_of_ident] and declare it in the
same places, so that a merge would be straightforward.
@jeremie-koenig
Copy link
Contributor Author

At this point I thought I'd write some kind of "diff walk-through" to facilitate any digging people would want to do.

The main changes occur in:

  • common/AST.v declares the string_of_ident parameter (realized together with string_of_positive with Caml code in extraction.v). This is needed in BlockNames.v to print out block names as strings.
  • common/BlockNames.v is a new file which defines the interface and implementation of the new block namespace. It includes an empirical eauto-based blomega tactic that can discharge most of the inequalities about blocks that xomega used to handle.
  • common/Values.v now defines block := Block.t, rather than positive.
  • common/Memtype.v is modified to reflect changes to the memory model interface:
    • There is a new invariant that Block.init <= nextblock m.
    • There is a new operation alloc_at used for allocating global blocks by name.
    • inject_neutral no longer needs a threshold (Block.init is always what we want).
  • common/Memory.v implements those changes.
    • Note that memory states now use BMap rather than PMap. In the most recent version BMap is implemented using IMap through an embedding of the new block type into positive, meaning the performance implications for the interpreter should be minimal.
  • common/Globalenvs.v is somewhat simplified:
    • genv_next is no longer necessary (the constant Block.init now plays this role)
    • genv_defs is now indexed by AST.ident values rather than block
    • the remaining maps and invariants in Genv.t are no longer necessary (they can be simple definitions and lemmas based on genv_defs and block operations); only genv_public remains
    • init_mem is modified to use alloc_at

Less remarkable changes are:

  • common/Maps.v: make sure the MAP interface has everything needed of BMap, and make sure EMap and IMap implement it.
  • cfrontend/Initializers(proof).v has one hack pleasantly removed, whereby ident values would be stored as block in pointers (now stored as Block.glob i).
  • driver/Interp.ml now uses block_compare in state comparisons, and Block.to_string to print out pointers.

The rest of the patch consists of replacing positive operators, predicates, lemmas, and tactics by their new block equivalents --- a set of numerous but fairly trivial changes. The ~550 new lines introduced by the patch mainly are BlockNames.v and the new Mem.alloc_at.

xavierleroy pushed a commit that referenced this pull request Feb 20, 2020
We can get linker errors for addresses of the form "symbol + offset"
where "symbol" is in the small data area and "offset" is large enough
to overflow the relative displacement from the SDA base register.

To avoid this, this commit enriches `C2C.atom_is_small_data`,
which is the implementation of `Asm.symbol_is_small_data` in the PPC port,
with a check that the offset is within the bounds of the symbol.
If it is not, `Asm.symbol_is_small_data` returns `false` and Asmgen produces
an absolute addressing instead of a SDA-relative addressing.

To implement the check, we record the sizes of symbols in the atom table,
just like we already record their alignments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants