I think it would be a good idea for commits to be properly tagged with the Hive versions they target. Currently, the two main references are "master" and "branch-3.4.0". Both branches change over time, so rebuilding a specific state is difficult because the target moves with every new commit. Adding tags/releases named after Hive versions would bring some clarity and consistency to building directly from the repository, because a build could check out a specific tag. We would then know that each tag targets a specific Hive version.
Think of it this way. In a CI/CD pipeline, you want to reference the Hive version being built, so that you can clone the Hive repository at that version, patch it, and build it:
hive_version=2.3.7
# clone the release tag of that specific Hive version
git clone --depth 1 -b "rel/release-${hive_version}" https://github.com/apache/hive.git /build/hive
In the case above, I would need to clone master and apply the patch linked in the README to Hive. However, if I want to build Hive 3.x to match the EMR container, it's not clear how to do this from the page or from the releases. The branch is named branch-3.4.0; does that mean Hive 3.4.0, or something else? How can I trust that master will always work with 2.3.7 and that branch-3.4.0 will match Hive 3.1.x?
Ideally, there would be either a one-to-one mapping, where this repository's tags exactly match Hive releases, or, if the changes are generic enough, one tag per major Hive release.
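To make the two options concrete, a CI job could prefer an exact tag and fall back to a major-version tag. This is a minimal sketch under stated assumptions: the tag names "hive-2.3.7" and "hive-3.x" are a hypothetical naming convention, not tags the repository publishes today, and the demo runs in a throwaway local repository so it needs no network access.

```shell
#!/usr/bin/env sh
# Sketch: resolve which tag of this repository to build for a given Hive version.
# ASSUMPTION: "hive-<x.y.z>" (exact) and "hive-<major>.x" (generic) are a
# hypothetical tag naming convention; the repository does not define them today.
set -eu

resolve_tag() {
  local hive_version="$1"
  local major="${hive_version%%.*}"   # "2.3.7" -> "2"
  # Prefer a one-to-one tag for the exact Hive release.
  if git rev-parse -q --verify "refs/tags/hive-${hive_version}" >/dev/null; then
    echo "hive-${hive_version}"
  # Otherwise fall back to a tag covering the whole major Hive version.
  elif git rev-parse -q --verify "refs/tags/hive-${major}.x" >/dev/null; then
    echo "hive-${major}.x"
  else
    echo "no tag found for Hive ${hive_version}" >&2
    return 1
  fi
}

# Demo in a throwaway local repository.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
  commit -q --allow-empty -m "initial"
git tag hive-2.3.7   # hypothetical exact-match tag
git tag hive-3.x     # hypothetical major-version tag

exact="$(resolve_tag 2.3.7)"
generic="$(resolve_tag 3.1.2)"
echo "$exact $generic"   # prints: hive-2.3.7 hive-3.x
```

The exact tag wins when it exists, so the generic major-version tag only serves as a fallback for Hive releases that were never tagged individually.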
The problem is that if I reference branch-3.4.0 of this repository in CI/CD, there is no way to guarantee that the build will still work in a year, when this branch may have been merged into master, updated for compatibility with a newer version, or deleted. Tagging specific commits on master as releases would at least avoid this issue. Building against branches feels like the wrong approach for automated builds IMO, since branches are in practice used for developing features rather than for referencing stable releases.
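The difference can be shown in a throwaway local repository: after a new commit lands, the branch points somewhere new, while the tag still resolves to the commit that was originally tested. A minimal sketch; "branch-3.4.0" mirrors the existing branch name, but the "hive-3.1.2" tag is hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: a tag keeps pointing at the same commit while a branch keeps moving.
# ASSUMPTION: the tag name "hive-3.1.2" is hypothetical and for illustration only.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
# Helper that supplies a throwaway identity and disables signing for the demo.
g() { git -c user.name=demo -c user.email=demo@example.com \
          -c commit.gpgsign=false -c tag.gpgSign=false "$@"; }

g commit -q --allow-empty -m "support Hive 3.1.x"
git branch -M branch-3.4.0   # rename the current branch to the moving dev branch
# Annotated release tag on the commit that was actually built and tested.
g tag -a hive-3.1.2 -m "Built and tested against Hive 3.1.2"
tagged_commit="$(git rev-parse HEAD)"

# Later work lands on the branch: the branch moves, the tag does not.
g commit -q --allow-empty -m "bump compatibility to a newer Hive"
branch_commit="$(git rev-parse branch-3.4.0)"
tag_commit="$(git rev-parse 'hive-3.1.2^{commit}')"   # peel the annotated tag

echo "branch-3.4.0 -> $(git rev-parse --short branch-3.4.0)"
echo "hive-3.1.2   -> $(git rev-parse --short 'hive-3.1.2^{commit}')"
```

In a pipeline, passing such a tag to git clone -b behaves like the Hive clone above: the same tag always yields the same source, no matter what happens to the branch later.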
Curious to hear anyone's opinion on this; just my two cents. I've faced a lot of confusion and wasted a lot of time trying to get this repo to work correctly in our EKS cluster using Spark on K8s and Glue/Lake Formation.