Attempt to reduce memory footprint of compaction #627
Signed-off-by: Ganesh Vernekar <[email protected]>
…d -> new series id) map. Signed-off-by: Ganesh Vernekar <[email protected]>
Results are a little disappointing.
Memory profile for this PR: compact_opt_PR.zip
Memory profile for master: compact_opt_master.zip
I managed to make it better, but it has a failing test in vertical compaction. Will push after I fix it.
This is the profile after the latest commit (getting rid of …).
A minute improvement after the last commit.
This zip contains memory and CPU profiles for both master and this PR while running the above benchmarks (the block creation was done independently and …).
…mory. Signed-off-by: Ganesh Vernekar <[email protected]>
While trying to reduce allocs, I instead managed to get B/op lower than before over the last few commits.
Still digging to reduce allocs further.
Found the culprit for increased allocs: https://github.com/prometheus/tsdb/blob/c1c39ed2e7900fb056445c9818754e3254518a49/compact.go#L870 (this line alone)
With the latest commit (re-using the …), … With that came more ….
I will be looking into reducing more allocs if possible.
This reduces the allocs of WriteLabelIndex significantly. Signed-off-by: Ganesh Vernekar <[email protected]>
Good news in the last 2 commits. Allocs reduced significantly and the diff for allocs is negative now. EDIT: Numbers updated after fixing the race.
More reduction in allocs in the last few commits!
This last commit f8ddc03 shed off about 2-3s of compaction time from a total of 15-17s before. The change was to avoid the repeated binary search on the …. Now it compares like this with the current master.
These benchmark results are on top of the recently merged optimizations and do not relate directly to the numbers in this comment #627 (comment) (those were w.r.t. master without any of the optimizations).
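The sentence above is truncated, so it is unclear exactly what was being binary-searched. As a generic, hypothetical sketch of the optimization it describes: when lookups into a sorted slice themselves arrive in ascending order, a forward-only cursor replaces a per-lookup `sort.Search` and turns m lookups from O(m log n) into a single O(n + m) pass. None of the names below come from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// lookupBinary finds the insertion index of v in the sorted slice s
// with a fresh binary search on every call: O(log n) per lookup.
func lookupBinary(s []uint64, v uint64) int {
	return sort.Search(len(s), func(i int) bool { return s[i] >= v })
}

// cursor exploits the fact that queries arrive in ascending order:
// it only ever moves forward, so a whole pass of m sorted queries
// costs O(n + m) instead of O(m log n).
type cursor struct {
	s   []uint64
	pos int
}

func (c *cursor) lookup(v uint64) int {
	for c.pos < len(c.s) && c.s[c.pos] < v {
		c.pos++
	}
	return c.pos
}

func main() {
	sorted := []uint64{2, 5, 9, 14, 20}
	queries := []uint64{5, 9, 20} // ascending, as during a compaction pass

	c := &cursor{s: sorted}
	for _, q := range queries {
		// Both strategies agree on the result; only the cost differs.
		fmt.Println(lookupBinary(sorted, q) == c.lookup(q))
	}
}
```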
First pass, makes sense overall. I want to see if we can simplify things further.
compact.go
Outdated
return errors.Wrap(err, "write postings")
postingBuf := make([]uint64, 0, 1000000)
var bigEndianPost index.Postings = index.NewBigEndianPostings(nil)
var listPost = index.NewListPostings(nil).(*index.ListPostings)
Can we make NewListPostings return a ListPostings?
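The suggestion is to have the constructor return the concrete type instead of the Postings interface, so the caller's type assertion (`index.NewListPostings(nil).(*index.ListPostings)`) becomes unnecessary. A minimal sketch of the idea, using a simplified stand-in for tsdb's index package rather than its real definitions:

```go
package main

import "fmt"

// Postings mirrors the tsdb index.Postings interface (simplified).
type Postings interface {
	Next() bool
	At() uint64
}

// ListPostings iterates an in-memory slice of series references.
type ListPostings struct {
	list []uint64
	cur  uint64
}

// NewListPostings returns the concrete *ListPostings rather than the
// Postings interface, so callers that want to Reset and reuse the same
// allocation don't need a type assertion.
func NewListPostings(list []uint64) *ListPostings {
	return &ListPostings{list: list}
}

func (p *ListPostings) Next() bool {
	if len(p.list) == 0 {
		return false
	}
	p.cur = p.list[0]
	p.list = p.list[1:]
	return true
}

func (p *ListPostings) At() uint64 { return p.cur }

// Reset lets a single ListPostings be reused across postings lists.
func (p *ListPostings) Reset(list []uint64) {
	p.list = list
	p.cur = 0
}

func main() {
	lp := NewListPostings([]uint64{1, 3, 5})
	var _ Postings = lp // the concrete type still satisfies the interface
	for lp.Next() {
		fmt.Println(lp.At())
	}
}
```

Returning the concrete type costs nothing for interface-based callers, since *ListPostings still implements Postings implicitly.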
head.go
Outdated
@@ -1349,6 +1361,7 @@ func (h *Head) getOrCreateWithID(id, hash uint64, lset labels.Labels) (*memSerie
h.metrics.series.Inc()
We're tracking this through NumSeries(). Can we just drop the gauge and use gaugeFunc()?
Will do. Also, I have another PR for NumSeries() (#656) if that would help in reviewing. Do you want it in a separate PR or inside this PR itself?
almost lgtm :) Will take another pass after suggestions are fixed before I approve though.
While running prombench with this PR, compaction for the head block was failing with the following error.
I am not able to replicate it yet. Compaction from head is working fine in a synthetic test.
Able to re-create it by just running Prometheus on my local machine! Will dig in more.
This fixes a series out-of-order bug when compacting from an on-disk block: postings used to contain invalid series references, because even deleted series were present in seriesMap with bogus series references. Signed-off-by: Ganesh Vernekar <[email protected]>
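A hypothetical sketch of the fix described in this commit message (all names are illustrative, not the PR's actual code): when remapping old series IDs to new ones during compaction, deleted series must be skipped entirely, otherwise the map hands out IDs for series that no longer exist and the rewritten postings reference bogus series, which later surfaces as out-of-order series.

```go
package main

import "fmt"

// buildSeriesMap assigns new, monotonically increasing IDs only to
// series that still exist. Including deleted series (as the buggy
// version effectively did) would leave bogus references in the map.
func buildSeriesMap(allRefs []uint64, deleted map[uint64]bool) map[uint64]uint64 {
	seriesMap := make(map[uint64]uint64, len(allRefs))
	var next uint64
	for _, ref := range allRefs { // allRefs is sorted by old series ID
		if deleted[ref] {
			continue // skip: no new ID, so no bogus reference can leak out
		}
		seriesMap[ref] = next
		next++
	}
	return seriesMap
}

func main() {
	refs := []uint64{10, 11, 12, 13}
	deleted := map[uint64]bool{11: true}
	// 10, 12, 13 get the new IDs 0, 1, 2; 11 gets no entry at all.
	fmt.Println(buildSeriesMap(refs, deleted))
}
```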
The previous way of merging remapped postings did not scale well: I was keeping a sorted list of postings and adding new postings to it while keeping it sorted, which was slow. During the prombench test, compaction of on-disk blocks got stuck at this point and allocations also shot up (there were ~13-14M series in total across all the blocks being compacted). Now I gather the postings list from one index reader at a time and merge the sorted lists in the old-fashioned way. I tested this on the indices I had downloaded from prombench, and they finally don't get stuck here. Below are the updated (synthetic) benchmark results, with 2 blocks, same as all the benchmarks in this PR thread.
This is for 3 blocks, an indication that with more blocks the time gets even worse:
I will test this on prombench now.
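The "old fashioned way" of merging two already-sorted postings lists can be sketched like this (a hypothetical illustration, not the PR's code): a single linear pass per reader is O(total), whereas inserting each posting into a sorted slice one at a time, as the earlier approach did, is O(n) per insert and collapses at ~13-14M series.

```go
package main

import "fmt"

// mergeSorted merges two already-sorted postings lists in one linear
// pass, deduplicating references that appear in both inputs.
func mergeSorted(a, b []uint64) []uint64 {
	out := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case b[j] < a[i]:
			out = append(out, b[j])
			j++
		default: // same ref in both blocks; keep one copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	// Fold in one (sorted) index reader's postings at a time.
	merged := []uint64{}
	for _, reader := range [][]uint64{{1, 4, 7}, {2, 4, 9}, {3, 8}} {
		merged = mergeSorted(merged, reader)
	}
	fmt.Println(merged) // all refs, sorted, duplicates collapsed
}
```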
Closing it for now. Analysis of this PR vs master is in this doc https://docs.google.com/document/d/1ZzZ8sslkA5LPshr9gqz-MmRpPmyY19jFDz-AKrkwpVw/edit?usp=sharing.
Thanks for that work guys, I think we learnt a lot from this despite the negative outcome of this particular experiment!
What does this PR do?
I will do any more cleanup needed and post the benchmark results tomorrow. Hoping for some good news.
Update: This PR ended up having many more memory-allocation savings in many places than just the above. This comment summarizes it all (hopefully).
UPDATE 2: As this PR got very complicated to review, it has been broken down into multiple PRs (more to come soon). For now, these are the child PRs which originate from this PR:
writeOffsetTable
keys in ReadOffsetTable
WriteChunks and writeHash
stringTuples.Swap
NumSeries() method for BlockReader interface