Insist on the positive integer key requirement
dlesbre committed Apr 17, 2024
1 parent a8186bb commit ab57e7e
Showing 3 changed files with 40 additions and 22 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -46,8 +46,9 @@ dune install
  and the same convention for order of arguments. This should allow switching to
  and from Patricia Tree with minimal effort.
  - The functor parameters (`KEY` module) requires an injective `to_int : t -> int`
- function instead of a `compare` function. `to_int` should be fast and injective,
- this works well with [hash-consed](https://en.wikipedia.org/wiki/Hash_consing) types.
+ function instead of a `compare` function. `to_int` should be fast, injective,
+ and only return positive integers.
+ This works well with [hash-consed](https://en.wikipedia.org/wiki/Hash_consing) types.
  - The Patricia Tree representation is stable, contrary to maps, inserting nodes
  in any order will return the same shape.
  This allows different versions of a map to share more subtrees in memory, and
@@ -69,7 +70,7 @@ dune install
  - Supports generic maps and sets: a `'m map` that maps `'k key` to `('k, 'm) value`.
  This is especially useful when using [GADTs](https://v2.ocaml.org/manual/gadts-tutorial.html) for the type of keys. This is also sometimes called a dependent map.
  - Allows easy and fast operations across different types of maps and set (e.g.
- an intersection between a map and a set), since all sets and maps, no matter their key type, are really integer sets or maps.
+ an intersection between a map and a set), since all sets and maps, no matter their key type, are really positive integer sets or maps.
  - Multiple choices for internal representation (`NODE`), which allows for efficient
  storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also
  be extended to store size information in nodes if needed.
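
Reviewer note: the `to_int` requirement stated in this README change (fast, injective, positive) can be met with a creation-time counter, which is the approach the `patriciaTree.mli` comment in this commit describes. The sketch below is a minimal illustration under stated assumptions, not code from this repository: `Symbol`, `fresh`, and `name` are hypothetical names, and the `KEY` signature is copied locally so the snippet compiles without the library.

```ocaml
(* Local copy of the KEY signature from this commit, so the sketch
   compiles without the PatriciaTree library installed. *)
module type KEY = sig
  type t
  val to_int : t -> int
end

(* Hypothetical key type: each value receives a unique id at creation. *)
module Symbol : sig
  include KEY
  val fresh : string -> t
  val name : t -> string
end = struct
  type t = { id : int; name : string }

  (* The counter is incremented before use, so ids are 1, 2, 3, ... *)
  let counter = ref 0

  let fresh name =
    incr counter;
    { id = !counter; name }

  let name s = s.name

  (* Fast (a field read), injective (ids are never reused),
     and strictly positive. *)
  let to_int s = s.id
end
```

With the library installed, a module such as `Symbol` would presumably be passed wherever a `KEY` argument is expected; the functor names themselves are not shown in this diff.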
38 changes: 20 additions & 18 deletions index.mld
@@ -30,41 +30,43 @@ dune install

  {1 Features}

- - Similar to OCaml's [Map] and [Set], using the same function names when possible
+ {ul
+ {li Similar to OCaml's [Map] and [Set], using the same function names when possible}
  and the same convention for order of arguments. This should allow switching to
  and from Patricia Tree with minimal effort.
- - The functor parameters ({!PatriciaTree.KEY} module) requires an injective [to_int : t -> int]
- function instead of a [compare] function. {!PatriciaTree.KEY.to_int} should be fast and injective,
- this works well with {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} types.
- - The Patricia Tree representation is stable, contrary to maps, inserting nodes
+ {li The functor parameters ({!PatriciaTree.KEY} module) requires an injective [to_int : t -> int]
+ function instead of a [compare] function. {!PatriciaTree.KEY.to_int} should be fast,
+ injective, and only return positive integers.
+ This works well with {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} types.}
+ {li The Patricia Tree representation is stable, contrary to maps, inserting nodes
  in any order will return the same shape.
  This allows different versions of a map to share more subtrees in memory, and
  the operations over two maps to benefit from this sharing. The functions in
- this library attempt to **maximally preserve sharing and benefit from sharing**,
+ this library attempt to {b maximally preserve sharing and benefit from sharing},
  allowing very important improvements in complexity and running time when
  combining maps or sets is a frequent operation.

  To do so, these functions often have extra requirements on their argument
  (e.g. [inter f m1 m2] can be optimized by not inspecting common subtrees when
  [f] is idempotent). To avoid accidental errors, they are renamed (e.g. to
  [idempotent_inter] for the efficient version and [nonidempotent_inter_no_share]
- for the general one)
-
- - Since our Patricia Tree use big-endian order on keys, the maps and sets are
+ for the general one)}
+ {li Since our Patricia Tree use big-endian order on keys, the maps and sets are
  sorted in increasing order of keys. We only support positive integer keys.
  This also avoids a bug in Okasaki's paper discussed in
  {{: https://www.cs.tufts.edu/comp/150FP/archive/jan-midtgaard/qc-patricia.pdf}{i QuickChecking Patricia Trees}}
- by Jan Mitgaard.
- - Supports generic maps and sets: a ['m map] that maps ['k key] to [('k, 'm) value].
+ by Jan Mitgaard.}
+ {li Supports generic maps and sets: a ['m map] that maps ['k key] to [('k, 'm) value].
  This is especially useful when using {{: https://v2.ocaml.org/manual/gadts-tutorial.html}GADTs}
- for the type of keys. This is also sometimes called a dependent map.
- - Allows easy and fast operations across different types of maps and set (e.g.
- an intersection between a map and a set), since all sets and maps, no matter their key type, are really integer sets or maps.
- - Multiple choices for internal representation ({!PatriciaTree.NODE}), which allows for efficient
+ for the type of keys. This is also sometimes called a dependent map.}
+ {li Allows easy and fast operations across different types of maps and set (e.g.
+ an intersection between a map and a set), since all sets and maps, no matter their key type,
+ are really positive integer sets or maps.}
+ {li Multiple choices for internal representation ({!PatriciaTree.NODE}), which allows for efficient
  storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also
- be extended to store size information in nodes if needed.
- - Exposes a common interface ({!type:PatriciaTree.NODE.view}) to allow users to write their own pattern
- matching on the tree structure without depending on the {!PatriciaTree.NODE} being used.
+ be extended to store size information in nodes if needed.}
+ {li Exposes a common interface ({!type:PatriciaTree.NODE.view}) to allow users to write their own pattern
+ matching on the tree structure without depending on the {!PatriciaTree.NODE} being used.}}

  {1 Quick overview}

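
Reviewer note: the statement that big-endian order yields keys sorted in increasing order, combined with the positive-key restriction, can be illustrated without the library. The sketch below assumes that a big-endian tree visits keys in the order of their bit patterns read as unsigned machine words; that assumption is mine and is not stated in this diff. `unsigned_compare` and `sort_unsigned` are hypothetical helper names, not part of PatriciaTree.

```ocaml
(* Compare two ints as unsigned machine words. Flipping the sign bit
   maps unsigned order onto OCaml's built-in signed order. *)
let unsigned_compare (a : int) (b : int) : int =
  compare (a lxor min_int) (b lxor min_int)

(* Sort keys the way a most-significant-bit-first traversal would,
   under the assumption stated in the note above. *)
let sort_unsigned (keys : int list) : int list =
  List.sort unsigned_compare keys

let () =
  (* Positive keys only: unsigned order and signed order agree. *)
  assert (sort_unsigned [ 3; 1; 2 ] = List.sort compare [ 3; 1; 2 ]);
  (* A negative key has its top bit set, so it lands after every
     positive key and the result is no longer increasing. *)
  assert (sort_unsigned [ 3; -1; 1 ] <> List.sort compare [ 3; -1; 1 ])
```

Under that assumption, restricting `to_int` to positive integers is what keeps the traversal order identical to the usual integer order.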
17 changes: 16 additions & 1 deletion patriciaTree.mli
@@ -856,11 +856,14 @@ end
  (** The signature of keys when they are all of the same type. *)
  module type KEY = sig
    type t
+   (** The type of keys *)
+
    (** A unique identifier for values of the type. Usually, we use a
        fresh counter that is increased to give a unique id to each
        object. Correctness of the operations requires that different
-       values in a tree correspond to different integers. *)
+       values in a tree correspond to different integers.
+       Must be injective, return only positive values, and ideally fast *)
    val to_int: t -> int
  end

@@ -872,8 +875,20 @@ type (_, _) cmp = Eq : ('a, 'a) cmp | Diff : ('a, 'b) cmp
  (** The signature of heterogeneous keys. *)
  module type HETEROGENEOUS_KEY = sig
    type 'key t
+   (** The type of generic/heterogeneous keys *)
+
+
    val to_int : 'key t -> int
+   (** A unique identifier for values of the type. Usually, we use a
+       fresh counter that is increased to give a unique id to each
+       object. Correctness of the operations requires that different
+       values in a tree correspond to different integers.
+       Must be injective, return only positive values, and ideally fast *)
+
    val polyeq : 'a t -> 'b t -> ('a, 'b) cmp
+   (** Polymorphic equality function used to compare our keys.
+       It should satisfy [(to_int a) = (to_int b) ==> polyeq a b = Eq] *)
  end


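
Reviewer note: the `HETEROGENEOUS_KEY` contract documented here, namely that `to_int a = to_int b` implies `polyeq a b = Eq`, is easier to check against a concrete key type. The sketch below is a hypothetical example and does not come from this repository: the `var` GADT, its constructors, and the `Var` module are invented for illustration, and the `cmp` type is copied locally so the snippet stands alone. It assumes constructor payloads are non-negative.

```ocaml
(* Local copy of the equality witness from patriciaTree.mli. *)
type (_, _) cmp = Eq : ('a, 'a) cmp | Diff : ('a, 'b) cmp

(* Hypothetical GADT of typed variables: the type index records the
   type of value associated with the key. Payloads are assumed >= 0. *)
type _ var =
  | Int_var : int -> int var
  | Bool_var : int -> bool var

module Var = struct
  type 'key t = 'key var

  (* Interleave the two constructors so that distinct keys get distinct
     ids: Int_var n -> 2n + 1 and Bool_var n -> 2n + 2. Both are
     strictly positive when n >= 0. *)
  let to_int : type a. a t -> int = function
    | Int_var n -> (2 * n) + 1
    | Bool_var n -> (2 * n) + 2

  (* Returns Eq only when both keys use the same constructor and the
     same payload, which is exactly when to_int agrees. *)
  let polyeq : type a b. a t -> b t -> (a, b) cmp =
   fun a b ->
    match (a, b) with
    | Int_var x, Int_var y -> if x = y then Eq else Diff
    | Bool_var x, Bool_var y -> if x = y then Eq else Diff
    | Int_var _, Bool_var _ -> Diff
    | Bool_var _, Int_var _ -> Diff
end
```

Because odd ids belong to `Int_var` and even ids to `Bool_var`, two keys with equal ids necessarily share a constructor and a payload, so `polyeq` returns `Eq` for them as the documentation requires.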