A kd-tree is a data structure to store multidimensional points in such a way, that it is easy and fast to repeatedly find nearest neighbours.
For further explanations see http://en.wikipedia.org/wiki/Kd-tree.
For loading this package you follow the general SciSmalltalk README and set the repo variable to the local copy of this directory (eg something like repo := '/home/user/SciSmalltalk/Math-KDTree'.). Then you can load it with:
Gofer new
repository: (MCFileTreeRepository new directory:
(FileDirectory on: repo));
package: 'Math-KDTree';
load.
You construct a kdtree using #withAll:
kd:=KDTree withAll: aCollectionOfVectors.
where a Vector can be any collection of numbers that understands #at:
You can then find the nearest neighbours of anotherVector in aCollectionOfVectors with #nnSearch:i:
kd nnSearch: anotherVector i: numberOfNearestNeighbours.
Let me make a simple application just as an example how it can be used.
"let's say we have some simple 1-dimensional data"
sine:= (1 to: 50 by: 0.2)collect: [:i|i sin ].
"let's blur that a little bit"
dirtysine :=sine collect: [:i|i +Float random-0.5].
"now suppose we want to clean up the mess by partititioning that mess and then averaging the most similar partitions.
ok, first we calc the partition"
partitionLength:=30.
partition := (1 to: (dirtysine size+1 -partitionLength))collect:
[:i|( dirtysine copyFrom: i to:( i+partitionLength-1)) ].
"then we make a kdtree so that it does not take too long to find the most similar parts"
tree:=KDTree withAll: partition.
"say we want to average 30 partitions"
numberOfNeighbours :=30.
"then we go through all partitions, find its neighbours and calc the average for each of them"
averagedPartition :=(1 to: partition size)collect: [:i| |neighbours|
neighbours :=(tree nnSearch: (partition at: i) i: numberOfNeighbours) reduce: [ :a :b | a + b ].
neighbours / numberOfNeighbours].
"what we have now with 'averagedPartition' is an averaged version of 'partition' .
we can now reconstruct a somewhat smoother version of the original mess by undoing the partition thing,
iow reversing the calculation of the partition we did in the first part"
reconstruction :=Array new:(sine size).
(1 to: sine size)do:[:i| |n sum|
n:=0.
sum:=0.
1 to: averagedPartition first size do:[:k|
(i between: k and: (averagedPartition size+k-1)) ifTrue:
[n:=n+1.sum:=sum+((averagedPartition at:(i+1-k))at:k)] ].
reconstruction at:i put: (sum/n)].
"ok, lets calc the errors"
((sine - dirtysine) squared sum / sine size)sqrt . " 0.2901228130544897"
((sine - reconstruction) squared sum / sine size)sqrt . " 0.07493816828762763"
"well, it seems the code did its thing"