Go-Trie is small library implementing trie datastructure for searching large quantities of strings in text with additional skipping optimization. It extracts information about words similar to KMP algorithm. This information is then used to determine how many characters we can skip ahead.
It is not suitable for use-cases with small dictionary (or single word searches). Naive implementations might be much better in those cases! Always measure!
import "github.com/MartinKuzma/go-trie"
- Case sensitive.
- Functions:
Contains
,Find
,HasPrefix
,WordsWithPrefix
,SortedWords
,FromJson
,ToJson
- Trie can be exported into JSON format and parsed from it, thus saving us from building and optimization steps.
- Nodes in trie are not saved in map due to performance reasons, binary search is performed instead.
- Designed to be immutable after it has been created. This ensures thread-safety and less complexity.
textToSearch := `Lorem ipsum ...`
// Create optimized trie
myTrie := trie.NewTrie().
WithWords(substrings...).
Optimize(true).
Build()
myTrie.Find(textToSearch, func(result trie.SearchResult) {
// Do something useful with result
fmt.Printf("Found %s at position %d\n", result.Word, result.Position)
})
if myTrie.IsContained(textToSearch) {
// Atleast one of the words in our trie is present.
}
These results were achieved by using database with 3000 words. Naive implementation uses strings.Index
and strings.Contains
functions. While the results represent huge performance advantage, we can easily find situations where naive implementation is much faster.
Always use measurements that fit your data and use-case.
Trie initialization included
cpu: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
BenchmarkNaiveFind-4 90 13292769 ns/op 81896 B/op 12 allocs/op
BenchmarkTrieFind-4 2292 504501 ns/op 82282 B/op 20 allocs/op
BenchmarkNaiveContained-4 4428 250002 ns/op 0 B/op 0 allocs/op
BenchmarkTrieIsContained-4 6020218 201.2 ns/op 0 B/op 0 allocs/op
Trie initialization not included
cpu: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
BenchmarkNaiveFind-4 82 13285727 ns/op 81896 B/op 12 allocs/op
BenchmarkTrieFind-4 2392 514459 ns/op 81896 B/op 12 allocs/op
BenchmarkNaiveContained-4 4500 249894 ns/op 0 B/op 0 allocs/op
BenchmarkTrieIsContained-4 6042928 198.6 ns/op 0 B/op 0 allocs/op
- Improve performance of optimization algorithm
- Provide better examples and tests
- Implement naive version for small-sized database
- Find out what constitues as a small database
- Find out good hashing function for searching children