[DNM] memdb: replace the current implementation with ART(adaptive radix tree) #1400

you06 · 2024-07-29T11:45:45Z

This PR introduces ART (adaptive radix tree) as a faster replacement for the current memdb. In micro-benchmarks, this implementation outperforms the current memdb in every case: it is faster in a single thread and uses less total CPU.

The implementation is inspired by plar/go-adaptive-radix-tree, with some additional work:

Support a memory arena, which reduces allocation cost in a garbage-collected language.
Support vlog and cascading transactions (a.k.a. staging/release/cleanup).
Support iterator.
Support tracking memory usage.

In short, the ART has the same interface as the current memdb.

…a faster replacement of current memdb. Signed-off-by: you06 <[email protected]>

Signed-off-by: you06 <[email protected]>

ekexium · 2024-07-30T07:43:57Z

internal/unionstore/art/node.go

+	}
+}
+
+func (n4 *node4) init() {


Can we somehow reduce duplicated code in all inits?
Or can we consider using generics to reduce more duplicated code? Maybe a benchmark is needed.

ekexium · 2024-07-30T07:44:17Z

internal/unionstore/art/node.go

+	return a.getNode256(n.addr)
+}
+
+func (an artNode) matchDeep(a *artAllocator, key Key, depth uint32) uint32 /* mismatch index*/ {


Should we unify all receiver types to use pointers?

ekexium · 2024-07-30T07:44:51Z

internal/unionstore/art/node.go

+	for idx := 0; idx < int(n16.nodeNum); idx++ {
+		if n16.keys[idx] == c {
+			return idx, n16.children[idx]
+		}
+	}


IIRC the paper uses SIMD or binary search here. We could benchmark the linear search against a binary search.
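To make the comparison concrete, here is a minimal sketch of the two lookups side by side. The `node16` below is a simplified, hypothetical stand-in (only `nodeNum` and `keys` are kept), and the binary variant assumes the used keys are kept in ascending order, which this thread does not confirm for the PR's node16.

```go
package main

import (
	"fmt"
	"sort"
)

// node16 is a simplified, hypothetical stand-in for the PR's node16:
// only the fields needed for the lookup are kept.
type node16 struct {
	nodeNum uint8
	keys    [16]byte
}

// findLinear scans the used keys in order, like the loop under review.
// It returns the index of c, or -1 if c is not present.
func (n *node16) findLinear(c byte) int {
	for idx := 0; idx < int(n.nodeNum); idx++ {
		if n.keys[idx] == c {
			return idx
		}
	}
	return -1
}

// findBinary assumes keys[:nodeNum] is sorted in ascending order.
// It returns the index of c, or -1 if c is not present.
func (n *node16) findBinary(c byte) int {
	used := n.keys[:n.nodeNum]
	idx := sort.Search(len(used), func(i int) bool { return used[i] >= c })
	if idx < len(used) && used[idx] == c {
		return idx
	}
	return -1
}

func main() {
	n := &node16{nodeNum: 5, keys: [16]byte{'a', 'c', 'f', 'k', 'z'}}
	fmt.Println(n.findLinear('k'), n.findBinary('k'))
	fmt.Println(n.findLinear('b'), n.findBinary('b'))
}
```

With at most 16 keys, the linear scan may well be competitive, so both variants would need to go through a Go benchmark before choosing one.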

ekexium · 2024-07-30T07:45:07Z

go.mod

@@ -59,3 +60,5 @@ require (
 	gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )
+
+replace github.com/plar/go-adaptive-radix-tree => github.com/you06/go-adaptive-radix-tree v0.0.0-20240523051018-0278e8bfcd2b


Do we also need to review this? Shall we consider merging the changes upstream?

No need to review this; I import it only for benchmark tests. My changes in 0278e8bfcd2b are only for the special usage in p-dml. As for the memory arena implementation, I don't think the author would accept it, since the impact of the change is too large (all pointers are replaced by our self-defined addresses), and it actually sacrifices read performance by introducing the vlog.

The memory arena without vlog may be useful for them, but I don't have any performance data for it. What about discussing with them to see whether they would like to adopt that improvement, which is unrelated to our implementation in client-go? (I think client-go's requirements are quite unique.)

ekexium · 2024-07-30T07:45:22Z

internal/unionstore/art_memdb.go

@@ -0,0 +1,265 @@
+package unionstore


Is this file still needed?

There are some benchmark comparisons between plar/go-adaptive-radix-tree and ours in memdb_bench_test.go; this file should be removed later.

ekexium · 2024-07-30T07:46:32Z

internal/unionstore/art/art.go

+}
+
+// ARTCheckpoint is the checkpoint of memory DB.
+type ARTCheckpoint struct {


Maybe this can be merged with MemDBCheckPoint.

ekexium · 2024-07-30T07:46:48Z

internal/unionstore/art/art.go

+			keptFlags := lf.getKeyFlags()
+			keptFlags = keptFlags.AndPersistent()
+			if keptFlags == 0 {
+				lf.markDelete()


Why do we only mark it as deleted instead of freeing the node as well?

Freeing the node requires changing its parent, and I'm not inclined to store the parent address in the leaf (which consumes more memory).

Note that deleting a node in the original memdb does not actually release any space (see comment); its only benefit is to limit the height of the tree, which is not an issue for ART.

So I think marking it as deleted at least won't make things worse.

As the memdb comment says, I have not implemented leaf reuse for now, because leaves vary in length, which makes reuse difficult.

ekexium · 2024-07-30T07:47:08Z

internal/unionstore/art/node.go

+	node4size   = 80
+	node16size  = 236
+	node48size  = 888
+	node256size = 3096


Would it be better to replace it with node256size = unsafe.Sizeof(node256{})?

ekexium · 2024-07-30T07:47:39Z

internal/unionstore/art/node.go

+}
+
+func (n4 *node4) prevPresentIdx(start int) int {
+	mask := uint8(1<<(start+1) - 1) // e.g. start=3 => 0b000...0001111


Nit: It might be clearer to add another pair of parentheses
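For reference, a sketch of what the extra parentheses would look like. In Go the shift operator binds tighter than subtraction, so both forms compute the same mask; the parentheses only make the order explicit. The function name `lowMask` is hypothetical.

```go
package main

import "fmt"

// lowMask returns a mask with bits 0..start set. The outer parentheses
// around the shift make it explicit that the shift is evaluated before
// the subtraction; Go's precedence rules already guarantee this.
func lowMask(start int) uint8 {
	return uint8((1 << (start + 1)) - 1)
}

func main() {
	fmt.Printf("%08b\n", lowMask(3))
	fmt.Printf("%08b\n", lowMask(0))
}
```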

ekexium · 2024-07-30T07:47:55Z

internal/unionstore/art/node.go

+	return 8 - zeros - 1
+}
+
+func (n16 *node16) init() {


Can we somehow reduce duplicated code in all inits?
Or can we consider using generics to reduce more duplicated code? Maybe a benchmark is needed.

Signed-off-by: you06 <[email protected]>

ekexium · 2024-08-02T02:47:18Z

internal/unionstore/art/arena.go

+)
+
+const (
+	alignMask = 1<<32 - 8 // 29 bit 1 and 3 bit 0.


Suggested change

alignMask = 1<<32 - 8 // 29 bit 1 and 3 bit 0.

alignMask = 0xFFFFFFF8 // 29 bits of 1 and 3 bits of 0

Would this be better for compatibility with 32-bit machines?

I copied it from memdb's arena.
My understanding is that because the address is a 32-bit integer, a 32-bit mask is enough.

My point is: will it overflow on a 32-bit machine?

However, modifying the type would necessitate changes in numerous locations where int is currently used. In my opinion, we should avoid using int in these places... I'm comfortable with simply declaring that we don't support 32-bit systems. Just let it panic.

My point is: will it overflow on a 32-bit machine?

The maxBlockSize is set to 128MB (128 << 20), which is far below the maximum uint32. The offset of an address is less than maxBlockSize, so it won't overflow.

We can declare that it doesn't support 32-bit systems, but I still want to use uint32 addresses, since that reduces the memory used for addresses. Is const alignMask uint32 = 0xFFFFFFF8 what you mean by avoiding int?
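A minimal sketch of the typed-constant variant discussed above. The helper `alignUp` and the way the mask is applied are assumptions for illustration; the thread only shows the constant itself.

```go
package main

import "fmt"

// alignMask clears the low three bits (29 bits of 1 and 3 bits of 0).
// Giving it an explicit uint32 type keeps it valid on platforms where
// int is only 32 bits wide.
const alignMask uint32 = 0xFFFFFFF8

// alignUp is a hypothetical helper that rounds off up to the next
// multiple of 8. It assumes off+7 does not wrap around, which holds
// for offsets inside a 128MB block.
func alignUp(off uint32) uint32 {
	return (off + 7) & alignMask
}

func main() {
	for _, off := range []uint32{0, 1, 8, 9, 128 << 20} {
		fmt.Println(off, "->", alignUp(off))
	}
}
```

Because every operand is `uint32`, the expression never depends on the width of `int`.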

ekexium · 2024-08-02T02:48:35Z

internal/unionstore/art/arena.go

+	off uint32
+}
+
+func (addr nodeAddr) isNull() bool {


The function can be optimized as return addr == nullAddr || addr.idx == math.MaxUint32 || addr.off == math.MaxUint32
That way it can be inlined and has fewer branches. I've seen notable time spent in it in the original memdb.

What an inline trick.

ekexium · 2024-08-02T03:29:45Z

internal/unionstore/art/art_iterator.go

+	it.idxes = it.idxes[:0]
+}
+
+type ArtMemKeyHandle struct {


It should not be declared in art_iterator.go. MemKeyHandle can be used here.
Even if the vanilla memdb is removed, I think some structures can be shared.

Using MemKeyHandle would actually lead to cyclic imports for now.

ArtMemKeyHandle is the same as MemKeyHandle because the memory arena is the same, but I think they can differ for different MemBuffer implementations.

The best choice would be an associated type, but associated types are not supported. Maybe try a generic type in memBufferMutations if we want it to support all MemBuffer implementations.


ekexium · 2024-08-02T05:39:41Z

internal/unionstore/art/arena.go

+		return tombstone
+	}
+	valOff := hdrOff - valLen
+	return block[valOff:hdrOff:hdrOff]


I know this line is copied from memdbArena. I'm just wondering why its capacity is specified as hdrOff.

See the "Full slice expressions" section in Slice expressions: the capacity is set to max - low (hdrOff - valOff in our code).
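A short sketch of the effect of the three-index form. The block contents and the helper name `valueSlice` are made up for the example; only the `block[valOff:hdrOff:hdrOff]` expression comes from the diff.

```go
package main

import "fmt"

// valueSlice returns the value bytes with capacity capped at hdrOff,
// so len and cap are both hdrOff - valOff.
func valueSlice(block []byte, valOff, hdrOff int) []byte {
	return block[valOff:hdrOff:hdrOff]
}

func main() {
	// Hypothetical block layout: five value bytes followed by other data.
	block := []byte("valueHDRnext")
	v := valueSlice(block, 0, 5)
	fmt.Println(len(v), cap(v))

	// Because cap == len, append must allocate a new backing array
	// instead of overwriting the bytes that follow the value in block.
	v = append(v, '!')
	fmt.Println(string(v), string(block))
}
```

With a plain `block[valOff:hdrOff]`, the same `append` would have written into the bytes that follow the value inside the block.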


Signed-off-by: you06 <[email protected]>

cfzjywxk · 2024-08-21T03:47:36Z

@you06
Please change it to a draft PR or add a [DNM] flag to the title.

you06 added 2 commits July 29, 2024 20:29

Add an art(from plar/go-adaptive-radix-tree) with staging support as …

86b9e7d

…a faster replacement of current memdb. Signed-off-by: you06 <[email protected]>

clean code

98cc66b

Signed-off-by: you06 <[email protected]>

ti-chi-bot bot added the dco-signoff: yes Indicates the PR's author has signed the dco. label Jul 29, 2024

skip slow race test

4c6c5f7

Signed-off-by: you06 <[email protected]>

ekexium self-requested a review July 30, 2024 02:00

ekexium reviewed Jul 30, 2024


you06 added 8 commits July 30, 2024 22:34

opt minimum

3fe84bd

Signed-off-by: you06 <[email protected]>

make longestCommonPrefix a common func

8b06ac0

Signed-off-by: you06 <[email protected]>

remove unnecessary check & use unsafe.Sizeof

b5b791f

Signed-off-by: you06 <[email protected]>

remove unnecessary check in minimum

f8eecf2

Signed-off-by: you06 <[email protected]>

remove recursive func usage for minimum

15affcc

Signed-off-by: you06 <[email protected]>

remove unused present for n4 and n16 & enlarge in-node prefix

9bcaf42

Signed-off-by: you06 <[email protected]>

Merge branch 'master' into staging-art

6d1aa8b

fast search for n256

90e7e39

Signed-off-by: you06 <[email protected]>

ekexium reviewed Aug 2, 2024


you06 added 7 commits August 2, 2024 16:50

address comment

be7f324

Signed-off-by: you06 <[email protected]>

use uint32 address & wip design doc

30dd323

Signed-off-by: you06 <[email protected]>

add node section

ce55f0c

Signed-off-by: you06 <[email protected]>

fix lint

9f5d150

Signed-off-by: you06 <[email protected]>

manually inline critical path

5584048

Signed-off-by: you06 <[email protected]>

refine code

f2e8c08

Signed-off-by: you06 <[email protected]>

remove art lib & add tests

bded7db

Signed-off-by: you06 <[email protected]>

you06 changed the title ~~memdb: replace the current implementation with ART(adaptive radix tree)~~ [DNM] memdb: replace the current implementation with ART(adaptive radix tree) Aug 21, 2024

you06 marked this pull request as draft August 21, 2024 05:34

ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 21, 2024
