[SPARK-49402][PYTHON] Fix Binder integration in PySpark documentation
### What changes were proposed in this pull request?

This PR proposes to fix the Binder integration by using a `Dockerfile` directly.

### Why are the changes needed?

Binder integration is currently broken (https://mybinder.org/v2/gh/apache/spark/bb7846dd487?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb):

![Screenshot 2024-08-27 at 2 04 35 PM](https://github.com/user-attachments/assets/29222fc2-7cc6-43fa-8e04-a65c8384c4d5)

This seems to be related to the size of the repository (jupyterhub/mybinder.org-deploy#3074).

I tried every other approach, but none worked except using a `Dockerfile`.
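A note for context (not part of the commit message): as I understand repo2docker, the builder behind mybinder.org, a `binder/Dockerfile` takes precedence and the other config files (`apt.txt`, `requirements.txt`, `postBuild`) are no longer applied automatically, which is why `postBuild` is invoked explicitly from the `Dockerfile` below. A quick local sanity check, assuming `jupyter-repo2docker` is installed, might look like:

```bash
# Sketch only: build the repository the way Binder would, from the repo root.
# With binder/Dockerfile present, repo2docker builds from it directly instead
# of generating an image from apt.txt/requirements.txt/postBuild.
pip install jupyter-repo2docker
repo2docker --no-run .
```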

### Does this PR introduce _any_ user-facing change?

Yes. This should recover the Binder integration.

### How was this patch tested?

Manually tested within my fork:

https://mybinder.org/v2/gh/HyukjinKwon/spark/binder-test1?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47883 from HyukjinKwon/binder-test1.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 9fc1e05)
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon committed Aug 27, 2024
1 parent 1eb558c commit ed2d028
Showing 3 changed files with 44 additions and 3 deletions.
43 changes: 43 additions & 0 deletions binder/Dockerfile
@@ -0,0 +1,43 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM python:3.10-slim
# install the notebook package
RUN pip install --no-cache notebook jupyterlab

# create user with a home directory
ARG NB_USER
ARG NB_UID
ENV USER ${NB_USER}
ENV HOME /home/${NB_USER}

RUN adduser --disabled-password \
    --gecos "Default user" \
    --uid ${NB_UID} \
    ${NB_USER}
WORKDIR ${HOME}
USER ${USER}

# Make sure the contents of our repo are in ${HOME}
COPY . ${HOME}
USER root
RUN chown -R ${NB_UID} ${HOME}
RUN apt-get update && apt-get install -y openjdk-8-jre git coreutils
USER ${NB_USER}

RUN binder/postBuild
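A hedged usage sketch, not taken from the PR: Binder (via repo2docker) builds this image with `NB_USER` and `NB_UID` supplied as build arguments, so something close to the following, using the conventional values `jovyan` and `1000`, approximates that build locally:

```bash
# Sketch only: approximate the Binder build from the repository root.
# The NB_USER/NB_UID values are the usual Binder defaults, not taken from this PR.
docker build -f binder/Dockerfile \
  --build-arg NB_USER=jovyan \
  --build-arg NB_UID=1000 \
  -t pyspark-binder-test .

# Then launch a notebook server roughly the way Binder would:
docker run -p 8888:8888 pyspark-binder-test jupyter lab --ip=0.0.0.0
```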

2 changes: 0 additions & 2 deletions binder/apt.txt

This file was deleted.

2 changes: 1 addition & 1 deletion binder/postBuild
100644 → 100755
@@ -21,7 +21,7 @@
# Jupyter notebook.

VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-TAG=$(git describe --tags --exact-match 2>/dev/null)
+TAG=$(git describe --tags --exact-match 2> /dev/null || true)

# If a commit is tagged, exactly specified version of pyspark should be installed to avoid
# a kind of accident that an old version of pyspark is installed in the live notebook environment.
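A hedged sketch of how these values are used further down in `binder/postBuild`, reconstructed from the comment above rather than quoted from the file (the variable name `PYSPARK_SPEC` is mine): `git describe --tags --exact-match` exits non-zero when the checked-out commit is not tagged, so the added `|| true` lets the assignment succeed with an empty `TAG` instead of surfacing a failure (relevant, for example, if the script runs under `set -e` inside the new `RUN binder/postBuild` step).

```bash
# Sketch only, not the literal file contents.
VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
TAG=$(git describe --tags --exact-match 2> /dev/null || true)

if [[ -n "$TAG" ]]; then
    PYSPARK_SPEC="pyspark==${VERSION}"   # tagged release: pin the exact version
else
    PYSPARK_SPEC="pyspark<=${VERSION}"   # untagged commit: cap at the in-tree version
fi
pip install "$PYSPARK_SPEC"
```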
