Docker image for contributors #72

Open
Triamus opened this issue Jan 23, 2023 · 11 comments

Comments

@Triamus
Contributor

Triamus commented Jan 23, 2023

A ready-to-use Docker image for project contributors, as an alternative to a local Python environment managed with Poetry. The image would probably need to include things like:

  • base image (e.g. Ubuntu)
  • (Py)Spark
  • Delta Lake
  • Python libraries
  • environment variables
  • etc.

I briefly looked for examples in other OSS repos but didn't find any. So maybe this isn't useful to most contributors, or there are other reasons not to have it, though none come to mind at the moment.

@MrPowers
Owner

@Triamus - thanks for adding this.

Anyone in the community can feel free to grab this.

@souvik-databricks
Collaborator

@MrPowers I have built these kinds of Docker images in the past. Mind if I take a stab at this?

@Triamus
Contributor Author

Triamus commented Jan 24, 2023

@souvik-databricks you probably know this already, but here are a few things I've already researched.

Ideally, any image would build on top of those efforts, but I don't know the timeline. The Jira discussion mentions Spark 3.4.

@Triamus
Contributor Author

Triamus commented Jan 24, 2023

@souvik-databricks I'd be happy to test things out if needed.

@MrPowers
Owner

@souvik-databricks - yea, sure, go for it!

I think there are some Delta Lake Docker images around. Let me take a look.

@MrPowers
Owner

Actually, looks like @Triamus has already provided the link, here it is: GitHub/delta-io/delta-docs: quickstart_docker

@danielbeach
Collaborator

danielbeach commented Mar 2, 2023

@MrPowers I have a local branch ready to go with Docker and docker-compose support for mack, if you want it. It runs the unit tests inside the container and also includes instructions for dropping into the container for development. The container has Spark (spark-3.3.2), Delta (delta-core_2.12:2.2.0), and everything else needed to develop and test.
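For reference, a Delta-enabled Spark session in such a container typically needs the settings below. This is a minimal sketch, assuming Spark 3.3.x built for Scala 2.12 and Delta Lake 2.2.0 (the versions mentioned above); the helper name `delta_spark_conf` is hypothetical and not part of mack or the proposed branch.

```python
from __future__ import annotations

# Assumed versions, matching the container described above (Spark 3.3.2, delta-core_2.12:2.2.0).
DELTA_VERSION = "2.2.0"
SCALA_VERSION = "2.12"


def delta_spark_conf(delta_version: str = DELTA_VERSION,
                     scala_version: str = SCALA_VERSION) -> dict[str, str]:
    """Hypothetical helper: Spark config entries needed to run Delta Lake."""
    return {
        # Maven coordinate that Spark downloads when the session starts
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        # Registers Delta's SQL extensions (e.g. MERGE INTO, VACUUM)
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        # Makes the default catalog Delta-aware
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


if __name__ == "__main__":
    # Print the entries as spark-submit / pyspark command-line flags
    for key, value in delta_spark_conf().items():
        print(f"--conf {key}={value}")
```

In PySpark, each entry can instead be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. Pinning the versions in one place keeps the image and the test configuration in sync.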

@MrPowers
Owner

MrPowers commented Mar 2, 2023

@danielbeach - yea, that sounds great. Any chance you could send a PR? I'll be happy to test, document in the README, and market. Thank you!

@danielbeach
Collaborator

@MrPowers I tried to push a PR, but I need access.

@MrPowers
Owner

MrPowers commented Mar 2, 2023

@danielbeach - sent you an invite to collab on the repo ;)

@Triamus
Contributor Author

Triamus commented Mar 3, 2023

When I opened this issue, I mentioned that I hadn't found good OSS examples of a reproducible local dev setup for a project's contributors. By coincidence, I came across a talk from PyData Global 2022, recently uploaded to YouTube, on exactly that topic by one of the core Airflow devs. It turns out that Airflow has invested heavily in what they call the Breeze environment, which covers everything from local development and testing to deployment. It would certainly be overengineering for mack at this point, but it has some nice insights and ideas to draw inspiration from. I'm leaving the talk and the Airflow Breeze docs here for future reference.

From the docs:

Airflow Breeze is an easy-to-use development and test environment using Docker Compose. The environment is available for local use and is also used in Airflow's CI tests. We call it Airflow Breeze as It's a Breeze to contribute to Airflow. The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow are described in CONTRIBUTING.rst.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants