diff --git a/doc/signatures.md b/doc/signatures.md index 8279622..cc5fd32 100644 --- a/doc/signatures.md +++ b/doc/signatures.md @@ -2,10 +2,10 @@ Signatures ========== It often occurs in coding that we need a library, a set of -functionally. Often there are several algorithms that could provide -this functionality. However, the code that uses it, either doesn't +functionalities. Often there are several algorithms that could provide +each of these functionalities. However, the code that uses it either doesn't care about the individual algorithm or wishes to delegate choosing -that algorithm to some higher level. Lets take the concrete example of +that algorithm to some higher level. Let's take the concrete example of dictionaries. A dictionary provides the ability to access a value via a key (other things as well but primarily this). There are may ways to implement a dictionary. Just a few are: @@ -17,13 +17,13 @@ implement a dictionary. Just a few are: * Many, many more .... Each of these approaches has their own performance characteristics, -memory footprints etc. For example, a table of size n with open -addressing has no collisions and holds up to n elements, with a single -comparison for successful lookup, and a table of size n with chaining -and k keys has the minimum max(0, k-n) collisions and O(1 + k/n) +memory footprints, etc. For example, a table of size $n$ with open +addressing has no collisions and holds up to $n$ elements, with a single +comparison for successful lookup, and a table of size $n$ with chaining +and $k$ keys has the minimum $\max(0, k-n)$ collisions and $\mathcal{O}(1 + k/n)$ comparisons for lookup. While for skip lists the performance characteristics are about as good as that of randomly-built binary -search trees - namely (O log n). So the choice of which to select +search trees - namely $\mathcal{O}(\log n)$. So the choice of which to select depends very much on memory available, insert/read characteristics, etc.
So delegating the choice to a single point in your code is a very good idea. Unfortunately, in Erlang that's so easy to do at the moment. @@ -39,17 +39,20 @@ directly. There are a few ways you can approximate it. One way is to pass the Module name to the calling functions along with the data that it is going to be called on. - :::erlang - add(ModuleToUse, Key, Value, DictData) -> - ModuleToUse:add(Key, Value, DictData). +```erlang +add(ModuleToUse, Key, Value, DictData) -> + ModuleToUse:add(Key, Value, DictData). +``` This works, and you can vary how you want to pass the data. For example, you could easily use a tuple to contain the data. That is, you could pass in `{ModuleToUse, DictData}` and that would make it a bit cleaner. - :::erlang - add(Key, Value, {ModuleToUse, DictData}) -> - ModuleToUse:add(Key, Value, DictData). + +```erlang +add(Key, Value, {ModuleToUse, DictData}) -> + ModuleToUse:add(Key, Value, DictData). +``` Either way, there are a few problems with this approach. One of the biggest is that you lose code locality, by looking at this bit of code @@ -75,9 +78,10 @@ name. So what we actually want to do is something mole like this: - :::erlang - add(Key, Value, DictData) -> - dictionary:add(Key, Value, DictData). +```erlang +add(Key, Value, DictData) -> + dictionary:add(Key, Value, DictData). +``` Doing this we retain the locality. We can easily look up the `dictionary` Module. We immediately have a good idea what a @@ -97,24 +101,25 @@ a [Behaviour](http://metajack.im/2008/10/29/custom-behaviors-in-erlang/) for our functionality. To continue our example we will define a Behaviour for dictionaries. That Behaviour looks like this: - :::erlang - -module(ec_dictionary). +```erlang +-module(ec_dictionary). - -export([behaviour_info/1]). +-export([behaviour_info/1]). 
- behaviour_info(callbacks) -> - [{new, 0}, - {has_key, 2}, - {get, 2}, - {add, 3}, - {remove, 2}, - {has_value, 2}, - {size, 1}, - {to_list, 1}, - {from_list, 1}, - {keys, 1}]; - behaviour_info(_) -> - undefined. +behaviour_info(callbacks) -> + [{new, 0}, + {has_key, 2}, + {get, 2}, + {add, 3}, + {remove, 2}, + {has_value, 2}, + {size, 1}, + {to_list, 1}, + {from_list, 1}, + {keys, 1}]; +behaviour_info(_) -> + undefined. +``` So we have our Behaviour now. Unfortunately, this doesn't give us much @@ -124,14 +129,15 @@ dictionaries in an abstract way in our code. To do that we need to add a bit of functionality. We do that by actually implementing our own behaviour, starting with `new/1`. - :::erlang - %% @doc create a new dictionary object from the specified module. The - %% module should implement the dictionary behaviour. - %% - %% @param ModuleName The module name. - -spec new(module()) -> dictionary(_K, _V). - new(ModuleName) when is_atom(ModuleName) -> - #dict_t{callback = ModuleName, data = ModuleName:new()}. +```erlang +%% @doc create a new dictionary object from the specified module. The +%% module should implement the dictionary behaviour. +%% +%% @param ModuleName The module name. +-spec new(module()) -> dictionary(_K, _V). +new(ModuleName) when is_atom(ModuleName) -> + #dict_t{callback = ModuleName, data = ModuleName:new()}. +``` This code creates a new dictionary for us. Or to be more specific it actually creates a new dictionary Signature record, that will be used @@ -148,16 +154,17 @@ dictionary and another that just retrieves data. The first we will look at is the one that updates the dictionary by adding a value. - :::erlang - %% @doc add a new value to the existing dictionary. Return a new - %% dictionary containing the value. - %% - %% @param Dict the dictionary object to add too - %% @param Key the key to add - %% @param Value the value to add - -spec add(key(K), value(V), dictionary(K, V)) -> dictionary(K, V). 
- add(Key, Value, #dict_t{callback = Mod, data = Data} = Dict) -> - Dict#dict_t{data = Mod:add(Key, Value, Data)}. +```erlang +%% @doc add a new value to the existing dictionary. Return a new +%% dictionary containing the value. +%% +%% @param Dict the dictionary object to add to +%% @param Key the key to add +%% @param Value the value to add +-spec add(key(K), value(V), dictionary(K, V)) -> dictionary(K, V). +add(Key, Value, #dict_t{callback = Mod, data = Data} = Dict) -> + Dict#dict_t{data = Mod:add(Key, Value, Data)}. +``` There are two key things here. @@ -173,16 +180,17 @@ implementation to do the work itself. Now lets do a data retrieval function. In this case, the `get` function of the dictionary Signature. - :::erlang - %% @doc given a key return that key from the dictionary. If the key is - %% not found throw a 'not_found' exception. - %% - %% @param Dict The dictionary object to return the value from - %% @param Key The key requested - %% @throws not_found when the key does not exist - -spec get(key(K), dictionary(K, V)) -> value(V). - get(Key, #dict_t{callback = Mod, data = Data}) -> - Mod:get(Key, Data). +```erlang +%% @doc given a key return the value for that key from the dictionary. If the key is +%% not found throw a 'not_found' exception. +%% +%% @param Dict The dictionary object to return the value from +%% @param Key The key requested +%% @throws not_found when the key does not exist +-spec get(key(K), dictionary(K, V)) -> value(V). +get(Key, #dict_t{callback = Mod, data = Data}) -> + Mod:get(Key, Data). +``` In this case, you can see a very similar approach to deconstructing the dict record. We still need to pull out the callback module and the @@ -226,29 +234,30 @@ purpose is to help a preexisting module implement the Behaviour defined by a Signature. A good example of this in our current example is the [erlware_commons/ec_dict](https://github.com/ericbmerritt/erlware_commons/blob/types/src/ec_dict.erl) -module.
It implements the ec_dictionary Behaviour, but all the +module. It implements the `ec_dictionary` Behaviour, but all the functionality is provided by the [stdlib/dict](http://www.erlang.org/doc/man/dict.html) module itself. Let's take a look at one example to see how this is done. We will take a look at one of the functions we have already seen. The -`get` function in ec_dictionary doesn't have quite the same -semantics as any of the functions in the dict module. So a bit of -translation needs to be done. We do that in the ec_dict module `get` function. - - :::erlang - -spec get(ec_dictionary:key(K), Object::dictionary(K, V)) -> - ec_dictionary:value(V). - get(Key, Data) -> - case dict:find(Key, Data) of - {ok, Value} -> - Value; - error -> - throw(not_found) - end. - -So the ec_dict module's purpose for existence is to help the -preexisting dict module implement the Behaviour defined by the +`get` function in `ec_dictionary` doesn't have quite the same +semantics as any of the functions in the `dict` module. So a bit of +translation needs to be done. We do that in the `ec_dict:get/2` function. + +```erlang +-spec get(ec_dictionary:key(K), Object::dictionary(K, V)) -> + ec_dictionary:value(V). +get(Key, Data) -> + case dict:find(Key, Data) of + {ok, Value} -> + Value; + error -> + throw(not_found) + end. +``` + +So the `ec_dict` module's purpose for existence is to help the +preexisting `dict` module implement the Behaviour defined by the Signature. @@ -267,15 +276,16 @@ create a couple of functions that create dictionaries for each type we want to test. The first we want to time is the Signature Wrapper, so `dict` vs `ec_dict` called as a Signature. - :::erlang - create_dict() -> +```erlang +create_dict() -> lists:foldl(fun(El, Dict) -> - dict:store(El, El, Dict) - end, dict:new(), - lists:seq(1,100)). + dict:store(El, El, Dict) + end, dict:new(), + lists:seq(1,100)). 
+``` The only thing we do here is create a sequence of numbers 1 to 100, -and then add each of those to the dict as an entry. We aren't too +and then add each of those to the `dict` as an entry. We aren't too worried about replicating real data in the dictionary. We care about timing the function call overhead of Signatures, not the performance of the dictionaries themselves. @@ -283,35 +293,37 @@ of the dictionaries themselves. We need to create a similar function for our Signature based dictionary `ec_dict`. - :::erlang - create_dictionary(Type) -> +```erlang +create_dictionary(Type) -> lists:foldl(fun(El, Dict) -> - ec_dictionary:add(El, El, Dict) - end, - ec_dictionary:new(Type), - lists:seq(1,100)). + ec_dictionary:add(El, El, Dict) + end, + ec_dictionary:new(Type), + lists:seq(1,100)). +``` Here we actually create everything using the Signature. So we don't need one function for each type. We can have one function that can create anything that implements the Signature. That is the magic of -Signatures. Otherwise, this does the exact same thing as the dict -`create_dict/1`. +Signatures. Otherwise, this does the exact same thing as the dictionary +given by `create_dict/0`. We are going to use two function calls in our timing. One that updates data and one that returns data, just to get good coverage. For our dictionaries we are going to use the `size` function as well as the `add` function. - :::erlang - time_direct_vs_signature_dict() -> - io:format("Timing dict~n"), - Dict = create_dict(), - test_avg(fun() -> - dict:size(dict:store(some_key, some_value, Dict)) - end, - 1000000), - io:format("Timing ec_dict implementation of ec_dictionary~n"), - time_dict_type(ec_dict). +```erlang +time_direct_vs_signature_dict() -> + io:format("Timing dict~n"), + Dict = create_dict(), + test_avg(fun() -> + dict:size(dict:store(some_key, some_value, Dict)) + end, + 1000000), + io:format("Timing ec_dict implementation of ec_dictionary~n"), + time_dict_type(ec_dict). 
+``` The `test_avg` function runs the provided function the number of times specified in the second argument and collects timing information. We @@ -323,18 +335,19 @@ we don't have to hard code the calls for the Signature implementations. Lets take a look at the `time_dict_type` function. - :::erlang - time_dict_type(Type) -> - io:format("Testing ~p~n", [Type]), - Dict = create_dictionary(Type), - test_avg(fun() -> - ec_dictionary:size(ec_dictionary:add(some_key, some_value, Dict)) - end, - 1000000). +```erlang +time_dict_type(Type) -> + io:format("Testing ~p~n", [Type]), + Dict = create_dictionary(Type), + test_avg(fun() -> + ec_dictionary:size(ec_dictionary:add(some_key, some_value, Dict)) + end, + 1000000). +``` As you can see we take the type as an argument (we need it for `dict` creation) and call our create function. Then we run the same timings -that we did for ec dict. In this case though, the type of dictionary +that we did for `ec_dict`. In this case though, the type of dictionary is never specified, we only ever call ec_dictionary, so this test will work for anything that implements that Signature. @@ -343,24 +356,25 @@ work for anything that implements that Signature. So we have our tests, what was the result. Well on my laptop this is what it looked like. - :::sh - Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false] - - Eshell V5.8.2 (abort with ^G) - - 1> ec_timing:time_direct_vs_signature_dict(). - Timing dict - Range: 2 - 5621 mics - Median: 3 mics - Average: 3 mics - Timing ec_dict implementation of ec_dictionary - Testing ec_dict - Range: 3 - 6097 mics - Median: 3 mics - Average: 4 mics - 2> - -So for the direct dict call, we average about 3 mics per call, while +```sh +Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false] + +Eshell V5.8.2 (abort with ^G) + +1> ec_timing:time_direct_vs_signature_dict(). 
+Timing dict +Range: 2 - 5621 mics +Median: 3 mics +Average: 3 mics +Timing ec_dict implementation of ec_dictionary +Testing ec_dict +Range: 3 - 6097 mics +Median: 3 mics +Average: 4 mics +2> +``` + +So for the direct `dict` call, we average about 3 mics per call, while for the Signature Wrapper we average around 4. That's a 25% cost for Signature Wrappers in this example, for a very small number of calls. Depending on what you are doing that is going to be greater or @@ -373,30 +387,32 @@ Signature, but it is not a Signature Wrapper. It is a native implementation of the Signature. To use `ec_rbdict` directly we have to create a creation helper just like we did for dict. - :::erlang - create_rbdict() -> +```erlang +create_rbdict() -> lists:foldl(fun(El, Dict) -> - ec_rbdict:add(El, El, Dict) - end, ec_rbdict:new(), - lists:seq(1,100)). + ec_rbdict:add(El, El, Dict) + end, ec_rbdict:new(), + lists:seq(1,100)). +``` This is exactly the same as `create_dict` with the exception that dict is replaced by `ec_rbdict`. The timing function itself looks very similar as well. Again notice that we have to hard code the concrete name for the concrete -implementation, but we don't for the ec_dictionary test. - - :::erlang - time_direct_vs_signature_rbdict() -> - io:format("Timing rbdict~n"), - Dict = create_rbdict(), - test_avg(fun() -> - ec_rbdict:size(ec_rbdict:add(some_key, some_value, Dict)) - end, - 1000000), - io:format("Timing ec_dict implementation of ec_dictionary~n"), - time_dict_type(ec_rbdict). +implementation, but we don't for the `ec_dictionary` test. + +```erlang +time_direct_vs_signature_rbdict() -> + io:format("Timing rbdict~n"), + Dict = create_rbdict(), + test_avg(fun() -> + ec_rbdict:size(ec_rbdict:add(some_key, some_value, Dict)) + end, + 1000000), + io:format("Timing ec_dict implementation of ec_dictionary~n"), + time_dict_type(ec_rbdict). +``` And there we have our test. What do the results look like? 
@@ -406,22 +422,23 @@ The main thing we are timing here is the additional cost of the dictionary Signature itself. Keep that in mind as we look at the results. - :::sh - Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false] - - Eshell V5.8.2 (abort with ^G) - - 1> ec_timing:time_direct_vs_signature_rbdict(). - Timing rbdict - Range: 6 - 15070 mics - Median: 7 mics - Average: 7 mics - Timing ec_dict implementation of ec_dictionary - Testing ec_rbdict - Range: 6 - 6013 mics - Median: 7 mics - Average: 7 mics - 2> +```sh +Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false] + +Eshell V5.8.2 (abort with ^G) + +1> ec_timing:time_direct_vs_signature_rbdict(). +Timing rbdict +Range: 6 - 15070 mics +Median: 7 mics +Average: 7 mics +Timing ec_dict implementation of ec_dictionary +Testing ec_rbdict +Range: 6 - 6013 mics +Median: 7 mics +Average: 7 mics +2> +``` So no difference it time. Well the reality is that there is a difference in timing, there must be, but we don't have enough