Improvement: Use tagged releases to target specific Hive releases #63

Open
berglh opened this issue Jul 5, 2022 · 0 comments

I think commits should be tagged to show which Hive version they target. Currently, the two main references are "master" and "branch-3.4.0". Branches move with every new commit, so rebuilding a specific state is difficult because the target keeps changing. Adding Hive version-based tags/releases would at least give some clarity and consistency when building directly from the repository: we could check out a specific tag and know that it targets a specific Hive version.

Think of it this way. In a CI/CD pipeline, you would want to reference the Hive version being built, so you can clone the Hive repository to patch and build:

hive_version=2.3.7
# clone the specific version of Hive
git clone --depth 1 -b "rel/release-${hive_version}" https://github.com/apache/hive.git /build/hive

In the case above, I would need to clone master of this repository and apply the patch linked in the README to Hive. However, if I want to build Hive 3.x to match the EMR container, it's not clear how to do this from the repository page or from the releases. The branch is named 3.4.0; is this Hive 3.4.0 or something else? How can I trust that master will always work with 2.3.7 and that branch-3.4.0 will match Hive 3.1.x?

Ideally, there would be a one-to-one mapping, where the tags in this repository exactly match the Hive releases, or, if the changes are generic enough, perhaps a tag per major Hive release.

The problem is, if I reference branch-3.4.0 of this repository in CI/CD, there is no way I can guarantee that the build will work again in a year, when this branch may have been merged into master, updated for compatibility with a new version, or deleted. At least tagging specific commits on master as a release would avoid this issue. Referencing branches to build against feels like the wrong solution for automated builds IMO, as in practice they are used for developing features rather than for referencing stable releases.
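
To illustrate the difference, here is a minimal sketch using a throwaway local repository rather than this one. The tag name hive-2.3.7 is a hypothetical naming scheme, not something that exists in this repository today. It shows that a tag keeps resolving to the commit it was created on, while the branch tip moves with each new commit:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway local repository standing in for the real one (assumption: git is installed).
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.email "ci@example.com"
git config user.name "CI"
git config commit.gpgsign false

# First commit: the state that was verified against Hive 2.3.7.
echo "patch for Hive 2.3.7" > patch.txt
git add patch.txt
git commit -q -m "Support Hive 2.3.7"
branch="$(git rev-parse --abbrev-ref HEAD)"   # default branch name depends on git config
git tag hive-2.3.7                            # hypothetical tag naming scheme
tagged_sha="$(git rev-parse HEAD)"

# Later commit: the branch moves on.
echo "changes for a newer Hive" >> patch.txt
git commit -q -am "Update for a newer Hive"
branch_sha="$(git rev-parse "$branch")"

# The tag still resolves to the original commit; the branch does not.
tag_sha="$(git rev-parse 'hive-2.3.7^{commit}')"
echo "tag:    $tag_sha"
echo "branch: $branch_sha"
```

If tags like this existed, a pipeline could pin to one the same way the Hive clone above does, since git clone -b accepts tag names as well as branch names.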

Curious to hear anyone's opinion on this. Just my two cents: I've faced a lot of confusion and wasted time trying to get this repo to work correctly in our EKS cluster, using Spark on K8s with Glue/Lake Formation.
