Add estimated row batch size in bytes to state #230

floriecai · 2021-01-13T00:12:57Z

Related: #226

Similar PR: #228

Add an estimate of the number of bytes in each row batch and send it with the progress callback.

shuhaowu · 2021-01-13T14:07:45Z

cursor.go

@@ -142,7 +141,6 @@ func (c *Cursor) Each(f func(*RowBatch) error) error {
 		tx.Rollback()

 		c.lastSuccessfulPaginationKey = paginationKeypos
-		c.rowsExamined += uint64(batch.Size())


oh yeah this wasn't used... :whistling:..

row_batch.go

shuhaowu · 2021-01-13T14:09:22Z

test/go/row_batch_test.go

+
+	s := batch.EstimateByteSize()
+
+	fmt.Printf("%d", s)


Maybe we should actually assert something?

Removed since I'm already testing in the callback.
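For reference, a minimal sketch of what an assertion on the estimate could look like. The `rowBatch` type, its `values` layout, and the choice to skip rows that fail to marshal are assumptions made for illustration; they are not taken from the ghostferry code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rowBatch is a hypothetical stand-in for ghostferry's RowBatch;
// only the values field matters for this sketch.
type rowBatch struct {
	values [][]interface{}
}

// estimateByteSize sums the length of each row's JSON encoding.
// Skipping rows that fail to marshal is an assumption of this sketch.
func (b *rowBatch) estimateByteSize() uint64 {
	var total uint64
	for _, row := range b.values {
		encoded, err := json.Marshal(row)
		if err != nil {
			continue
		}
		total += uint64(len(encoded))
	}
	return total
}

func main() {
	batch := &rowBatch{values: [][]interface{}{
		{1, "a"},   // encodes as [1,"a"], 7 bytes
		{2, "bcd"}, // encodes as [2,"bcd"], 9 bytes
	}}
	fmt.Println(batch.estimateByteSize()) // prints 16
}
```

With known inputs like these, a test can compare the result against a hand-computed total instead of only printing it.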

fjordan · 2021-01-13T17:58:09Z

row_batch.go

+func (e *RowBatch) EstimateByteSize() uint64 {
+	var total int
+	for _, v := range e.values {
+		size, err := json.Marshal(v)


I'm worried about the overhead of the json marshaling here. Have you run any benchmarks to see how much additional CPU this will take?

Also, instead of the json marshaling, have we considered unsafe.Sizeof() or reflect.Type.Size()? I'm not familiar with the risks of using the unsafe package, however. Something to consider.

unsafe.Sizeof (and reflect.Type.Size, which reports the same thing) gives you the size of the value's fixed-size header, not the data it points to. So regardless of string length, the size is the same. Same with uint64, it's always going to be 8. So it's not quite the same, since we want to know the byte length of the data itself.

And with some benchmarking, the json.Marshal seems harmless 👇

BenchmarkSize-12        1000000000    0.000454 ns/op    0 allocs/op    # 2.8MB file
BenchmarkJSON-12        1000000000    0.00634 ns/op     0 allocs/op    # 2.8MB file
BenchmarkSizeSmall-12   1000000000    0.000077 ns/op    0 allocs/op    # 200KB file
BenchmarkJSONSmall-12   1000000000    0.000429 ns/op    0 allocs/op    # 200KB file

Yeah, you'd have to traverse into the pointer and things get ugly quickly (although json.Marshal is technically doing it).

Thinking out loud here, but with minimal performance hit we could technically wrap mysql.writeExecutePacket and take the len of the returned data after filtering for INSERTs.

Or no we can't, it doesn't return the data, just an error.

Manan007224 · 2021-02-05T22:51:04Z

There are some performance issues with the estimation of the row batch size in bytes, noted in #240. I have added a flag to turn the row batch size estimation on or off, which can be set in the ghostferry config.

cc @shuhaowu @tiwilliam

Manan007224 · 2021-02-09T07:02:59Z

batch_writer.go

+				bytesWrittenForThisBatch = batch.EstimateByteSize()
+			}
+			w.StateTracker.UpdateLastSuccessfulPaginationKey(batch.TableSchema().String(), endPaginationKeypos,
+				RowStats{NumBytes: bytesWrittenForThisBatch, NumRows: uint64(batch.Size())})


This behaviour seems a bit sketchy. If someone has turned EnableRowBatchSize off, they'll see NumBytes as 0, which could cause confusion. I don't think there's much we can do here other than documenting this behaviour.
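A minimal sketch of the behaviour being described. `RowStats`, `NumBytes`, `NumRows` and the flag name come from the diff and the comment above. The `statsForBatch` helper and the stand-in estimator are hypothetical and only illustrate the gating.

```go
package main

import "fmt"

// RowStats mirrors the fields used in the diff above.
type RowStats struct {
	NumBytes uint64
	NumRows  uint64
}

// statsForBatch is a hypothetical helper: it only calls the estimator
// when the flag is on, so NumBytes stays 0 when estimation is disabled.
func statsForBatch(enableRowBatchSize bool, numRows int, estimate func() uint64) RowStats {
	var bytesWritten uint64
	if enableRowBatchSize {
		bytesWritten = estimate()
	}
	return RowStats{NumBytes: bytesWritten, NumRows: uint64(numRows)}
}

func main() {
	estimate := func() uint64 { return 2048 } // stand-in for batch.EstimateByteSize()

	fmt.Printf("%+v\n", statsForBatch(true, 10, estimate))  // {NumBytes:2048 NumRows:10}
	fmt.Printf("%+v\n", statsForBatch(false, 10, estimate)) // {NumBytes:0 NumRows:10}
}
```

A consumer of the callback cannot tell a disabled estimate from a batch that really wrote 0 bytes, which is why the flag's effect needs to be documented.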

floriecai requested review from shuhaowu and fjordan January 13, 2021 00:39

shuhaowu reviewed Jan 13, 2021

fjordan reviewed Jan 13, 2021

floriecai force-pushed the estimate-row-batch-bytes branch from 765fd22 to 3fdea62 Compare January 13, 2021 22:12

floriecai requested review from shuhaowu and fjordan January 13, 2021 22:14

floriecai force-pushed the estimate-row-batch-bytes branch from 7508eb4 to 5e3db8a Compare January 13, 2021 22:49

Manan007224 self-assigned this Jan 19, 2021

Manan007224 mentioned this pull request Jan 21, 2021

estimated row size in bytes performance issues #240

Open

Manan007224 force-pushed the estimate-row-batch-bytes branch 2 times, most recently from a62be4d to 19e5524 Compare February 5, 2021 22:08

Manan007224 requested a review from tiwilliam February 5, 2021 22:51

Manan007224 reviewed Feb 9, 2021

Manan007224 force-pushed the estimate-row-batch-bytes branch from 465f7ce to 1ae01dc Compare February 9, 2021 07:10

shuhaowu approved these changes Mar 25, 2021

floriecai and others added 2 commits March 25, 2021 09:20

Add estimated row batch size in bytes to state

d187ce1

added a flag to turn off rowbatch size estimation

9c372e2

changes in config debug tests fix tests added go tests modifications

Manan007224 force-pushed the estimate-row-batch-bytes branch from 1ae01dc to 7b4d802 Compare March 25, 2021 16:24

add documentation for enableRowBatchSize flag

6d14b90

update cc

Manan007224 force-pushed the estimate-row-batch-bytes branch from 7b4d802 to 6d14b90 Compare March 25, 2021 16:43

Manan007224 merged commit cfe5638 into master Mar 26, 2021

shuhaowu deleted the estimate-row-batch-bytes branch April 28, 2021 15:55
