-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transition away from Java serialization for storing state on disk or in Zookeeper #625
base: master
Are you sure you want to change the base?
Conversation
config and nimbus currently use the core Java serialization to store StormTopology instances. This commit will change this to use Thrift serialization instead. StormTopology is a Thrift struct so this basically involves finding all the places we call `Utils/serialize` and `Utils/deserialize` and replace them with a call to a method that serializes with Thrift instead.
config.clj and nimbus serialize Storm configuration using the default Java implementation. This commit will phase out Java default serialization in favor of Clojure default serialization. There are obviously many reasons to phase out Java serialization, but the main rationale is that Java serialization will complain about version mismatch even if the change is semantically backward compatible.
TSerializer is not threadsafe. In `Utils` we instantiate a static final Tserializer, but this can (and will) cause odd bugs if we start calling `serialize()` in different threads. Thus, every time we call `Utils/serializeTopology`, we create a new TSerializer. Another way to do this would be to lock it, which performance may or may not merit.
Serialization of configuration is handled in config, but it is not different from a generic method for Clojure form serialization. This commit will move this method to utils so that we can use it for other things, like serialization in cluster.
cluster.clj uses the stock Java serialization implementation. There are obviously many reason to not use standard Java serialization, but our main motivation is that Java will complain about serialized state when the versions don't match even if they're semantically backwards compatible.
Supervisor currently just uses stock jvm serialization to communicate LocalState. This is undesirable for many reasons, so this commit will introduce a serialization interface which makes code cleaner, letting us specify which type of serializer to use without populating Supervisor with unnecessary boxing/unboxing behavior, or LocalState with too much knowledge about what's happening in the Supervisor. Also this commit will introduce a basic implementation sketch for a serializer for LocalState (though it will just use the jvm serialization at this point).
To build the serialization interface in Java, we need to put these constants in a Java file. This commit will put them in Constants.java
@mrflip @hausdorff If so I'm +1. |
@@ -230,7 +230,7 @@ | |||
(assignment-info [this storm-id callback] | |||
(when callback | |||
(swap! assignment-info-callback assoc storm-id callback)) | |||
(maybe-deserialize (get-data cluster-state (assignment-path storm-id) (not-nil? callback))) | |||
(maybe-deserialize-clj-bytes (get-data cluster-state (assignment-path storm-id) (not-nil? callback))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we renaming these functions? If we stick the original func name, we will have less code to worry about. No?
Hey, cool -- good to see this finally getting resolved. I'll be happy to take a look at this in a couple hours, after I get off work. |
@hausdorff @mrflip |
This is all @hausdorff's work -- it addresses #419 and #525, and supersedes #497
I've rebased his commits into the current master, and validated that the tests pass.
Hopefully this helps grease the path for the PR to go in...