Skip to content

Latest commit

 

History

History
59 lines (55 loc) · 2.9 KB

README-Math-KDTREE.md

File metadata and controls

59 lines (55 loc) · 2.9 KB

A kd-tree is a data structure to store multidimensional points in such a way, that it is easy and fast to repeatedly find nearest neighbours.
For further explanations see http://en.wikipedia.org/wiki/Kd-tree.

For loading this package you follow the general SciSmalltalk README and set the repo variable to the local copy of this directory (eg something like repo := '/home/user/SciSmalltalk/Math-KDTree'.). Then you can load it with:

Gofer new
    repository: (MCFileTreeRepository new directory: 
                    (FileDirectory on: repo));
    package: 'Math-KDTree';
    load.

You construct a kdtree using #withAll:

kd:=KDTree withAll: aCollectionOfVectors.

where a Vector can be any collection of numbers that understands #at:

You can then find the nearest neighbours of anotherVector in aCollectionOfVectors with #nnSearch:i:

kd nnSearch: anotherVector i: numberOfNearestNeighbours.

Let me make a simple application just as an example how it can be used.

"let's say we have some simple 1-dimensional data"  
sine:= (1 to: 50 by: 0.2)collect: [:i|i sin ].  
"let's blur that a little bit"  
dirtysine :=sine collect: [:i|i +Float random-0.5].  
"now suppose we want to clean up the mess by partititioning that mess and then averaging the most similar partitions.  
ok, first we calc the partition"  
partitionLength:=30.   
partition := (1 to: (dirtysine size+1 -partitionLength))collect: 
	[:i|( dirtysine copyFrom: i to:( i+partitionLength-1)) ].  
"then we make a kdtree so that it does not take too long to find the most similar parts"  
tree:=KDTree withAll: partition.   
"say we want to average 30 partitions"  
numberOfNeighbours :=30.  
"then we go through all partitions, find its neighbours and calc the average for each of them"  
averagedPartition :=(1 to: partition size)collect: [:i| |neighbours|   
	neighbours :=(tree nnSearch: (partition at: i) i: numberOfNeighbours)  reduce: [ :a :b | a + b ].  
	neighbours / numberOfNeighbours].  
"what we have now with 'averagedPartition' is an averaged version of 'partition' .  
we can now reconstruct a somewhat smoother version of the original mess by undoing the partition thing, 
iow reversing the calculation of the partition we did in the first part"  
reconstruction :=Array new:(sine size).  
(1 to: sine size)do:[:i| |n sum|  
	n:=0.  
	sum:=0.  
	1 to: averagedPartition first size do:[:k|  
		(i between: k and: (averagedPartition size+k-1)) ifTrue:  
			[n:=n+1.sum:=sum+((averagedPartition at:(i+1-k))at:k)] ].  
	reconstruction at:i put: (sum/n)].  
"ok, lets calc the errors"  
((sine - dirtysine) squared sum / sine size)sqrt . " 0.2901228130544897"  
((sine - reconstruction) squared sum / sine size)sqrt . " 0.07493816828762763"  
"well, it seems the code did its thing"

plot plot