Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Snappy compressor with Snappy compatible version of S2 algorithm #276

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sylwiaszunejko
Copy link
Collaborator

Fixes: #142

@dkropachev
Copy link
Collaborator

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod
changeing go version in root go.mod needs to be done in separate minor update.

@sylwiaszunejko
Copy link
Collaborator Author

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

@dkropachev
Copy link
Collaborator

dkropachev commented Sep 26, 2024

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

Btw, on that topic, we need to have benchmark tests for both compressors.

@sylwiaszunejko
Copy link
Collaborator Author

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one

Btw, on that topic, we need to have benchmark tests for both compressors.

I agree

@sylwiaszunejko sylwiaszunejko self-assigned this Sep 26, 2024
@@ -14,7 +15,7 @@ func TestSnappyCompressor(t *testing.T) {
}

str := "My Test String"
//Test Encoding
//Test Encoding with Snappy
expected := snappy.Encode(nil, []byte(str))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this test does. It still tests with the original Snappy compressor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted some test to check if it is indeed compatible but this test is not a good way of doing that, and I agree that S2 probably has such basic tests

compressor_test.go Outdated Show resolved Hide resolved
expected = s2.EncodeSnappy(nil, []byte(str))
if res, err := c.Encode([]byte(str)); err != nil {
t.Fatalf("failed to encode '%v' with error %v", str, err)
} else if bytes.Compare(expected, res) != 0 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we verifying that S2 can indeed produce a Snappy compatible output, or a byte to byte compatible output? It seems the latter, which isn't clear is what S2 can do actually. I think what you need to test is can you decode it.

(But I also think it's not needed. S2 probably has such basic tests).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test doesn't make sens I removed it

go.mod Outdated Show resolved Hide resolved
@mykaul
Copy link

mykaul commented Sep 26, 2024

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one

We cannot add a new type that Scylla server won't understand. And if we wish to have snappy and S2-snappy - I don't fully see the point. They are either compatible and therefore replaceable, or not.

Btw, on that topic, we need to have benchmark tests for both compressors.

I agree

We must. There's absolutely no value other than improved performance. So if it doesn't buy us anything, I don't see a reason to do it.
(I could see a future reason - it's a good basis to continue and add ZSTD...)

@dkropachev
Copy link
Collaborator

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one

  1. We can't merge it as is, because of changes in go.mod
  2. But we woud be able to merge it, if you move s2 to a package with a separate go.mod and keeping old snappy as default.
  3. Later when we move to newer golang version we would switch to s2 be default and remove go.mod from it.

@sylwiaszunejko
Copy link
Collaborator Author

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one

1. We can't merge it as is, because of changes in `go.mod`

2. But we woud be able to merge it, if you move s2 to a package with a separate `go.mod` and keeping old snappy as default.

3. Later when we move to newer golang version we would switch to s2 be default and remove `go.mod` from it.

@mykaul @dkropachev That's probably not the best solution but I changed s2 version from 1.17.10 to 1.17.9 and now running go mod tidy doesn't require golang version change, could it be acceptable to use not the latest version of s2?

@dkropachev
Copy link
Collaborator

@sylwiaszunejko, let's move compressors out to a separate package with separate go.mod changeing go version in root go.mod needs to be done in separate minor update.

That would be easy to do if we would want to add new compressor type that uses S2, we could have it in separate module like LZ4Compressor. But I am not sure about moving SnappyCompressor to the separate module, wouldn't it be also a breaking change?

We better leave snappy where it is and as default, but on next minor release, updating golang we can move it out and make s2 default.

So your recommendation is to not merge this PR right now? Or to change it to add new type of compressor in separate module that uses S2 and leave snappy as it is? Because if I understand @mykaul correctly in the issue discussion, he wanted to edit existing SnappyCompressor not to add a new one

1. We can't merge it as is, because of changes in `go.mod`

2. But we woud be able to merge it, if you move s2 to a package with a separate `go.mod` and keeping old snappy as default.

3. Later when we move to newer golang version we would switch to s2 be default and remove `go.mod` from it.

@mykaul @dkropachev That's probably not the best solution but I changed s2 version from 1.17.10 to 1.17.9 and now running go mod tidy doesn't require golang version change, could it be acceptable to use not the latest version of s2?

Great news, sure. Let's have a perfromance tests to confirm it is better and switch to it.

@mykaul
Copy link

mykaul commented Sep 29, 2024

klauspost/compress#1008 probably updated Golang. We should also update, but certainly not as part of this PR.
The diff between 1.17.10 and 1.17.9 (klauspost/compress@v1.17.9...v1.17.10 ) doesn't seem substantial to me.

@sylwiaszunejko
Copy link
Collaborator Author

I tried to test the performance with this code:

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"time"

	"github.com/gocql/gocql"
)

// Generate random text payload of a given size in bytes
func generateTextPayload(size int) []byte {
	letters := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
	payload := make([]byte, size)
	charsetLen := big.NewInt(int64(len(letters)))

	// Fill the payload with random characters from the charset
	for i := 0; i < size; i++ {
		randomIndex, err := rand.Int(rand.Reader, charsetLen)
		if err != nil {
			return nil
		}
		payload[i] = letters[randomIndex.Int64()]
	}

	return payload
}

// Generate random binary payload of a given size in bytes
func generateBinaryPayload(size int) []byte {
	payload := make([]byte, size)
	rand.Read(payload) // Fill the slice with random binary data
	return payload
}

// Configure the cluster and set the compressor
func setupClusterWithCompressor(compressor gocql.Compressor) *gocql.ClusterConfig {
	cluster := gocql.NewCluster("127.0.0.2")
	cluster.Keyspace = "example"
	cluster.Compressor = compressor
	return cluster
}

// Insert a payload into the database
func insertPayload(session *gocql.Session, id gocql.UUID, payload []byte) error {
	start := time.Now()
	err := session.Query(`INSERT INTO test_payload (id, data) VALUES (?, ?)`, id, payload).Exec()
	elapsed := time.Since(start)
	fmt.Printf("Insert Time: %s\n", elapsed)
	return err
}

// Retrieve a payload from the database
func retrievePayload(session *gocql.Session, id gocql.UUID) ([]byte, error) {
	var payload []byte
	start := time.Now()
	err := session.Query(`SELECT data FROM test_payload WHERE id = ?`, id).Scan(&payload)
	elapsed := time.Since(start)
	fmt.Printf("Retrieve Time: %s\n", elapsed)
	return payload, err
}

func testCompressorPerformance(session *gocql.Session, compressorName string, payload []byte) {
	id := gocql.TimeUUID()

	// Insert payload
	err := insertPayload(session, id, payload)
	if err != nil {
		fmt.Printf("Error inserting payload: %v", err)
	}

	// Retrieve payload
	retrievedPayload, err := retrievePayload(session, id)
	if err != nil {
		fmt.Printf("Error retrieving payload: %v", err)
	}

	fmt.Printf("Compressor: %s, Payload Size: %d, Retrieved Size: %d\n", compressorName, len(payload), len(retrievedPayload))
}

func main() {
	cluster := setupClusterWithCompressor(gocql.SnappyCompressor{})
	session, err := cluster.CreateSession()
	if err != nil {
		fmt.Printf("Unable to connect to cluster: %v\n", err)
	}

	binaryPayload := generateBinaryPayload(1024)
	testCompressorPerformance(session, "snappy-s2", binaryPayload)

	binaryPayload2 := generateBinaryPayload(1024 * 50)
	testCompressorPerformance(session, "snappy-s2", binaryPayload2)

	binaryPayload3 := generateBinaryPayload(1024 * 500)
	testCompressorPerformance(session, "snappy-s2", binaryPayload3)

	textPayload := generateTextPayload(1024)
	testCompressorPerformance(session, "snappy-s2", textPayload)

	textPayload2 := generateTextPayload(1024 * 50)
	testCompressorPerformance(session, "snappy-s2", textPayload2)

	textPayload3 := generateTextPayload(1024 * 500)
	testCompressorPerformance(session, "snappy-s2", textPayload3)

	session.Close()

	cluster = setupClusterWithCompressor(gocql.SnappyCompressor_old{})
	session, err = cluster.CreateSession()
	if err != nil {
		fmt.Printf("Unable to connect to cluster: %v\n", err)
	}

	binaryPayload = generateBinaryPayload(1024)
	testCompressorPerformance(session, "snappy", binaryPayload)

	binaryPayload2 = generateBinaryPayload(1024 * 50)
	testCompressorPerformance(session, "snappy", binaryPayload2)

	binaryPayload3 = generateBinaryPayload(1024 * 500)
	testCompressorPerformance(session, "snappy", binaryPayload3)

	textPayload = generateTextPayload(1024)
	testCompressorPerformance(session, "snappy", textPayload)

	textPayload2 = generateTextPayload(1024 * 50)
	testCompressorPerformance(session, "snappy", textPayload2)

	textPayload3 = generateTextPayload(1024 * 500)
	testCompressorPerformance(session, "snappy", textPayload3)
}

Unfortunately there is no significant difference between these two, for some payloads one is better, for some the other, example results:

Insert Time: 5.131291ms
Retrieve Time: 3.222042ms
Compressor: snappy-s2, Payload Size: 1024, Retrieved Size: 1024
Insert Time: 2.467388ms
Retrieve Time: 3.829467ms
Compressor: snappy-s2, Payload Size: 51200, Retrieved Size: 51200
Insert Time: 4.7324ms
Retrieve Time: 4.10411ms
Compressor: snappy-s2, Payload Size: 512000, Retrieved Size: 512000
Insert Time: 1.936204ms
Retrieve Time: 1.352703ms
Compressor: snappy-s2, Payload Size: 1024, Retrieved Size: 1024
Insert Time: 2.570928ms
Retrieve Time: 2.250566ms
Compressor: snappy-s2, Payload Size: 51200, Retrieved Size: 51200
Insert Time: 3.443636ms
Retrieve Time: 2.63757ms
Compressor: snappy-s2, Payload Size: 512000, Retrieved Size: 512000


Insert Time: 2.903622ms
Retrieve Time: 2.492422ms
Compressor: snappy, Payload Size: 1024, Retrieved Size: 1024
Insert Time: 2.760623ms
Retrieve Time: 2.426934ms
Compressor: snappy, Payload Size: 51200, Retrieved Size: 51200
Insert Time: 4.191973ms
Retrieve Time: 4.482372ms
Compressor: snappy, Payload Size: 512000, Retrieved Size: 512000
Insert Time: 2.175619ms
Retrieve Time: 1.287564ms
Compressor: snappy, Payload Size: 1024, Retrieved Size: 1024
Insert Time: 2.206364ms
Retrieve Time: 1.681896ms
Compressor: snappy, Payload Size: 51200, Retrieved Size: 51200
Insert Time: 3.002807ms
Retrieve Time: 3.414728ms
Compressor: snappy, Payload Size: 512000, Retrieved Size: 512000

Maybe you have any suggestion around testing, maybe I could do something different/better?

@mykaul
Copy link

mykaul commented Oct 1, 2024

  1. Are the compression numbers the same?
  2. Memory consumption? CPU consumption?

However, if there's no real change, I would be happy to move - just because I feel the original Snappy is not being actively worked on, and S2 seem to have more movement.

@sylwiaszunejko
Copy link
Collaborator Author

  1. Are the compression numbers the same?

not sure how to check that

  1. Memory consumption? CPU consumption?

I have checked memory consumption and it looks similar for both compressors, I am not sure how to compare cpu consumption, tried to use cpu profiling but with not much success yet.

@mykaul
Copy link

mykaul commented Oct 2, 2024

  1. Are the compression numbers the same?

not sure how to check that

size of the packets.

  1. Memory consumption? CPU consumption?

I have checked memory consumption and it looks similar for both compressors, I am not sure how to compare cpu consumption, tried to use cpu profiling but with not much success yet.

Yes, that may be more challenging, if we don't measure it well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace Snappy compressor with https://github.com/klauspost/compress/tree/master/s2#s2-compression
3 participants