-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18950: shaded avro jar #4854
base: trunk
Are you sure you want to change the base?
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Can the Avro version be upgraded to |
@tejaswini-imply I raised https://issues.apache.org/jira/browse/HADOOP-18921. This change in this PR has no support from the Hadoop committers so it is very unlikely to proceed. |
b67612b
to
d36d805
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
0137b04
to
35086eb
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
b44c83a
to
3358357
Compare
replace org.apache.avro in generated source code use temp version of hadoop-shaded-avro Update JobQueueChangeEvent.java more avro related code that needs changes more exclusions use thirdparty avro staging jar use thirdparty release Update pom.xml Update pom.xml
💔 -1 overall
This message was automatically generated. |
@ayushtkn @slfan1989 @steveloughran Is this a change we could put in 3.4.0-RC3? Avro is a bit like Protobuf in that Hadoop has generated classes using both frameworks and needs runtime jars that match the version of the tool used to generate the classes. I think a switch to using the shaded Avro version is something that should probably only be done in a minor release like 3.4.0 and not in a patch release like 3.4.1. The broken tests in the last run are down to protobuf (HADOOP-19090). |
💔 -1 overall
This message was automatically generated. |
let's do it in 3.4.1 after a 1.3.0 release, and make the "we've tuned the packaging" a key change along with "we've fixed the bits steve broke". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this could be trouble, yes.
Here is what I propose
- We differentiate internal uses from those of @public classes.
- public stuff stays on unshaded, maybe move to a later version
- internal moves to shaded; this may require us to copy stuff into an internal private package
- cut the unshade avro as an export off hadoop-common; document this
- and change SerializationFactory to use the shaded lib; if people want legacy avro, declare it explicitly.
Provided we can get annotations to coexist and classes to instantiate without their dependencies on annotation scans (we could check this in hadoop-release with sub-modules wit different imports) we could pull this off.
@@ -27,7 +27,7 @@ | |||
import java.util.Optional; | |||
import java.util.regex.Pattern; | |||
|
|||
import org.apache.avro.reflect.Stringable; | |||
import org.apache.hadoop.thirdparty.avro.reflect.Stringable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be dangerous, as we are saying that a public class can no longer be serialised through Avro.
Do you think it will be possible for us to retain the unshaded annotation as well as adding the new one? And still have everything to work without Avro on the CP?
@@ -21,7 +21,7 @@ | |||
import java.io.Closeable; | |||
import java.io.IOException; | |||
|
|||
import org.apache.avro.file.SeekableInput; | |||
import org.apache.hadoop.thirdparty.avro.file.SeekableInput; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this is a public class we don't use internally.
Should we actually deprecate it? I don't know what uses it?
I don't know enough about Avro to hack it to work with shaded and non-shaded annotations. I thought all we cared about was how to support the internal Hadoop code and its internal use of Avro. If we need to support users who want to do their own Avro serialization of Hadoop classes, then I think we should abandon this PR. I think it would be far easier to just upgrade the actual Avro jars that Hadoop uses and give up on shading it. |
let's start this as a [DISCUSS] I'd certainly want our internals to be hidden and upgraded; for the other bits we have to choose how safest to update them |
Description of PR
HADOOP-18950 - Use hadoop-shaded-avro (see apache/hadoop-thirdparty#24 and apache/hadoop-thirdparty#21)
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?