Skip to content

Commit

Permalink
Merge pull request #2 from codex-semantics-library/switch_log2
Browse files Browse the repository at this point in the history
Switch log2 from zarith to __builtins_clz
  • Loading branch information
dlesbre authored Apr 30, 2024
2 parents d62ec23 + b63d36b commit 5b06e5e
Show file tree
Hide file tree
Showing 11 changed files with 429 additions and 233 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ocaml.yml
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,8 @@ jobs:
run: echo "ref=$(echo ${GITHUB_REF#refs/*/})" >> ${GITHUB_OUTPUT}

- uses: actions/upload-artifact@v4
if: github.event_name == 'push' &&
( github.ref == 'refs/heads/main' || startsWith(github.ref,'refs/tags') )
with:
name: doc-${{ steps.vars.outputs.ref }}
path: _build/default/_doc/_html/patricia-tree/
Expand Down
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,16 @@
# Unreleased

- Patricia Tree now support using negative keys. Tree are built using the bitwise representation
of integer, meaning they effectively use an unsigned order. Negative keys are
considered bigger than positive keys, `0` is the minimal number and `-1` the maximal one.
- Renamed `min_binding`, `max_binding`, `pop_minimum`, `pop_maximum`, `min_elt`
and `max_elt` to `unsigned_min_binding`, `unsigned_max_binding`,
`pop_unsigned_minimum`, `pop_unsigned_maximum`, `unsigned_min_elt`
and `unsigned_max_elt` respectively, to clarify that these functions consider
negative numbers as larger than positive ones.
- Fixed a bug where NodeWithId wasn't incrementing ids properly
- `zarith` is no longer a dependency, used GCC's `__builtin_clz` as a faster
method of finding an integer's highest bit.

# v0.9.0 - 2024-04-18

Expand Down
15 changes: 9 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,7 @@ dune build @doc
and the same convention for order of arguments. This should allow switching to
and from Patricia Tree with minimal effort.
- The functor parameters (`KEY` module) requires an injective `to_int : t -> int`
function instead of a `compare` function. `to_int` should be fast, injective,
and only return positive integers.
function instead of a `compare` function. `to_int` should be fast and injective.
This works well with [hash-consed](https://en.wikipedia.org/wiki/Hash_consing) types.
- The Patricia Tree representation is stable, contrary to maps, inserting nodes
in any order will return the same shape.
Expand All @@ -77,13 +76,18 @@ dune build @doc
for the general one)

- Since our Patricia Tree use big-endian order on keys, the maps and sets are
sorted in increasing order of keys. We only support positive integer keys.
sorted in increasing **unsigned order** of keys.
This means negative keys are sorted above positive keys, with `-1` being the
largest possible key, and `0` the smallest.
This also avoids a bug in Okasaki's paper discussed in [*QuickChecking Patricia Trees*](https://www.cs.tufts.edu/comp/150FP/archive/jan-midtgaard/qc-patricia.pdf)
by Jan Mitgaard.

It also affects functions like `unsigned_min_binding` and `pop_unsigned_minimum`. They will return the smallest
positive integer of both positive and negative keys are present; and not the smallest negative, as one might expect.
- Supports generic maps and sets: a `'m map` that maps `'k key` to `('k, 'm) value`.
This is especially useful when using [GADTs](https://v2.ocaml.org/manual/gadts-tutorial.html) for the type of keys. This is also sometimes called a dependent map.
- Allows easy and fast operations across different types of maps and set (e.g.
an intersection between a map and a set), since all sets and maps, no matter their key type, are really positive integer sets or maps.
- Allows easy and fast operations across different types of maps and set
which have the same type of keys (e.g. an intersection between a map and a set).
- Multiple choices for internal representation (`NODE`), which allows for efficient
storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also
be extended to store size information in nodes if needed.
Expand Down Expand Up @@ -294,7 +298,6 @@ These are smaller and closer to OCaml's built-in Map and Set, however:
- These libraries work with older version of OCaml (`>= 4.05` I believe), whereas
ours requires OCaml `>= 4.14` (for the new interface of `Ephemeron` used in
`WeakNode`).
- Our keys are limited to positive integers.

### dmap

Expand Down
8 changes: 5 additions & 3 deletions dune
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,10 @@
(library
(name PatriciaTree)
(public_name patricia-tree)
(libraries zarith)
(modules PatriciaTree))
(modules PatriciaTree)
(foreign_stubs
(language c)
(names int_builtins)))

(documentation
(package patricia-tree))
Expand All @@ -36,5 +38,5 @@
(libraries qcheck-core))
(preprocess
(pps ppx_inline_test))
(libraries PatriciaTree zarith qcheck-core)
(libraries PatriciaTree qcheck-core)
(modules PatriciaTreeTest))
2 changes: 0 additions & 2 deletions dune-project
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,6 @@
(depends
(ocaml
(>= 4.14))
(zarith
(>= "1.13"))
dune
(qcheck-core
(and
Expand Down
22 changes: 14 additions & 8 deletions index.mld
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,13 @@ dune build @doc
{1 Features}

{ul
{li Similar to OCaml's [Map] and [Set], using the same function names when possible
{li Similar to OCaml's {{: https://ocaml.org/api/Map.S.html}[Map]} and {{: https://ocaml.org/api/Set.S.html}[Set]},
using the same function names when possible
and the same convention for order of arguments. This should allow switching to
and from Patricia Tree with minimal effort.}
{li The functor parameters ({!PatriciaTree.KEY} module) requires an injective [to_int : t -> int]
function instead of a [compare] function. {!PatriciaTree.KEY.to_int} should be fast,
injective, and only return positive integers.
and injective.
This works well with {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} types.}
{li The Patricia Tree representation is stable, contrary to maps, inserting nodes
in any order will return the same shape.
Expand All @@ -61,16 +62,22 @@ dune build @doc
[idempotent_inter] for the efficient version and [nonidempotent_inter_no_share]
for the general one)}
{li Since our Patricia Tree use big-endian order on keys, the maps and sets are
sorted in increasing order of keys. We only support positive integer keys.
sorted in increasing {b {{!PatriciaTree.unsigned_lt}unsigned order}} of keys.
This means negative keys are sorted above positive keys, with [-1] being the
largest possible key, and [0] the smallest.
This also avoids a bug in Okasaki's paper discussed in
{{: https://www.cs.tufts.edu/comp/150FP/archive/jan-midtgaard/qc-patricia.pdf}{i QuickChecking Patricia Trees}}
by Jan Mitgaard.}
by Jan Mitgaard.

It also affects functions like {{!PatriciaTree.BASE_MAP.unsigned_min_binding}[unsigned_min_binding]}
and {{!PatriciaTree.BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum}. They will return the smallest
positive integer of both positive and negative keys are present; and not the smallest negative,
as one might expect.}
{li Supports generic maps and sets: a ['m map] that maps ['k key] to [('k, 'm) value].
This is especially useful when using {{: https://v2.ocaml.org/manual/gadts-tutorial.html}GADTs}
for the type of keys. This is also sometimes called a dependent map.}
{li Allows easy and fast operations across different types of maps and set (e.g.
an intersection between a map and a set), since all sets and maps, no matter their key type,
are really positive integer sets or maps.}
{li Allows easy and fast operations across different types of maps and set
which have the same type of keys (e.g. an intersection between a map and a set).}
{li Multiple choices for internal representation ({!PatriciaTree.NODE}), which allows for efficient
storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also
be extended to store size information in nodes if needed.}
Expand Down Expand Up @@ -292,7 +299,6 @@ These are smaller and closer to OCaml's built-in [Map] and [Set], however:
- These libraries work with older version of OCaml ([>= 4.05] I believe), whereas
ours requires OCaml [>= 4.14] (for the new interface of [Ephemeron] used in
{!PatriciaTree.WeakNode}).
- Our keys are limited to positive integers.

{2 dmap}

Expand Down
65 changes: 65 additions & 0 deletions int_builtins.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
/**************************************************************************/
/* This file is part of the Codex semantics library. */
/* */
/* Copyright (C) 2013-2024 */
/* CEA (Commissariat à l'énergie atomique et aux énergies */
/* alternatives) */
/* */
/* you can redistribute it and/or modify it under the terms of the GNU */
/* Lesser General Public License as published by the Free Software */
/* Foundation, version 2.1. */
/* */
/* It is distributed in the hope that it will be useful, */
/* but WITHOUT ANY WARRANTY; without even the implied warranty of */
/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */
/* GNU Lesser General Public License for more details. */
/* */
/* See the GNU Lesser General Public License version 2.1 */
/* for more details (enclosed in the file LICENSE). */
/* */
/**************************************************************************/

#define CAML_NAME_SPACE
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>

#ifdef _MSC_VER
#include <intrin.h>
#endif

__attribute__((__always_inline__))
static inline uintnat clz(uintnat v){
/* Note: on a 64 bit platform, GCC's _builtin_clz will perform a 32
bit operation (even if the argument has type int). We have to use
_builtin_clzll instead. */
#if __GNUC__
#ifdef ARCH_SIXTYFOUR
return __builtin_clzll(v);
#else
return __builtin_clz(v)
#endif
#endif
#ifdef _MSC_VER
int res = 0;
#ifdef ARCH_SIXTYFOUR
_BitScanReverse64(&res,v);
#else
_BitScanReverse(&res,v);
#endif
return res;
#endif
}

/**************** Highest bit ****************/

CAMLprim uintnat caml_int_builtin_highest_bit (value i){
/* printf("Highest bit In C: %x %x %x %x\n", */
/* i, i >> 1, 62-clz(i), 1 << (62 - clz(i))); */
/* fflush(stdout); */
return ((uintnat) 1 << (8*sizeof(value) - 2 - clz(i)));
}

CAMLprim value caml_int_builtin_highest_bit_byte (value i){
return Val_int(caml_int_builtin_highest_bit(i));
}
1 change: 0 additions & 1 deletion patricia-tree.opam
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@ bug-reports:
"https://github.com/codex-semantics-library/patricia-tree/issues"
depends: [
"ocaml" {>= "4.14"}
"zarith" {>= "1.13"}
"dune" {>= "2.7"}
"qcheck-core" {>= "0.21.2" & with-test}
"ppx_inline_test" {>= "v0.16.0" & with-test}
Expand Down
Loading

0 comments on commit 5b06e5e

Please sign in to comment.