GH-41430: [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (apache#41455)

### Rationale for this change

This makes the documentation easier to maintain.

### What changes are included in this PR?

* Install sphinxcontrib-mermaid (a minimal configuration sketch follows this list)
* Install Chromium so that SVG images can be generated from `.mmd` sources
* Use Debian instead of Ubuntu for building the docs because Ubuntu provides Chromium only via snap
* Build the documents as a normal user instead of root because Mermaid requires an additional `--no-sandbox` argument when run as root
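
A minimal sketch of how sphinxcontrib-mermaid is typically enabled. The extension and option names come from sphinxcontrib-mermaid's documentation; the exact values used in Arrow's `docs/source/conf.py` may differ:

```python
# docs/source/conf.py (illustrative fragment, not the exact Arrow configuration)
extensions = [
    # ... existing Sphinx extensions ...
    "sphinxcontrib.mermaid",  # renders ``.. mermaid::`` directives at build time
]

# Render diagrams to SVG via mermaid-cli (mmdc) instead of embedding raw
# Mermaid markup; mmdc drives a headless Chromium through Puppeteer, which is
# why the docs image installs Chromium and sets PUPPETEER_EXECUTABLE_PATH.
mermaid_output_format = "svg"
mermaid_cmd = "mmdc"
```

With this in place, diagrams are written as `.. mermaid::` directives (optionally referencing `.mmd` files) and rendered as part of the normal Sphinx build rather than through a separate image-generation step.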

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#41430

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
kou authored and vibhatha committed May 25, 2024
1 parent f65f29c commit 6f7e73d
Showing 37 changed files with 210 additions and 135 deletions.
13 changes: 7 additions & 6 deletions .github/workflows/docs.yml
@@ -32,12 +32,12 @@ env:
jobs:

complete:
name: AMD64 Ubuntu 22.04 Complete Documentation
name: AMD64 Debian 12 Complete Documentation
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
timeout-minutes: 150
env:
UBUNTU: "22.04"
JDK: 17
steps:
- name: Checkout Arrow
uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
@@ -50,8 +50,8 @@ jobs:
uses: actions/cache@13aacd865c20de90d75de3b17ebe84f7a17d57d2 # v4.0.0
with:
path: .docker
key: ubuntu-docs-${{ hashFiles('cpp/**') }}
restore-keys: ubuntu-docs-
key: debian-docs-${{ hashFiles('cpp/**') }}
restore-keys: debian-docs-
- name: Setup Python
uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5.1.0
with:
@@ -62,7 +62,8 @@ jobs:
env:
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}
run: archery docker run ubuntu-docs
JDK: 17
run: archery docker run debian-docs
- name: Docker Push
if: >-
success() &&
@@ -73,4 +74,4 @@ jobs:
ARCHERY_DOCKER_USER: ${{ secrets.DOCKERHUB_USER }}
ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}
continue-on-error: true
run: archery docker push ubuntu-docs
run: archery docker push debian-docs
2 changes: 1 addition & 1 deletion .github/workflows/docs_light.yml
@@ -31,7 +31,7 @@ on:

permissions:
contents: read

env:
ARCHERY_DEBUG: 1
ARCHERY_USE_DOCKER_CLI: 1
1 change: 1 addition & 0 deletions ci/conda_env_sphinx.txt
@@ -28,6 +28,7 @@ sphinx-design
sphinx-copybutton
sphinx-lint
sphinxcontrib-jquery
sphinxcontrib-mermaid
sphinx==6.2
# Requirement for doctest-cython
# Needs upper pin of 0.3.0, see:
60 changes: 39 additions & 21 deletions ci/docker/linux-apt-docs.dockerfile
@@ -21,18 +21,34 @@ FROM ${base}
ARG r=4.4
ARG jdk=8

# See R install instructions at https://cloud.r-project.org/bin/linux/ubuntu/
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

# See R install instructions at https://cloud.r-project.org/bin/linux/
RUN apt-get update -y && \
apt-get install -y \
dirmngr \
apt-transport-https \
software-properties-common && \
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \
add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu '$(lsb_release -cs)'-cran40/' && \
dirmngr \
gpg \
lsb-release && \
gpg --keyserver keyserver.ubuntu.com \
--recv-key 95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7 && \
gpg --export 95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7 | \
gpg --no-default-keyring \
--keyring /usr/share/keyrings/cran.gpg \
--import - && \
echo "deb [signed-by=/usr/share/keyrings/cran.gpg] https://cloud.r-project.org/bin/linux/$(lsb_release -is | tr 'A-Z' 'a-z') $(lsb_release -cs)-cran40/" | \
tee /etc/apt/sources.list.d/cran.list && \
if [ -f /etc/apt/sources.list.d/debian.sources ]; then \
sed -i \
-e 's/main$/main contrib non-free non-free-firmware/g' \
/etc/apt/sources.list.d/debian.sources; \
fi && \
apt-get update -y && \
apt-get install -y --no-install-recommends \
autoconf-archive \
automake \
chromium \
chromium-sandbox \
curl \
doxygen \
gi-docgen \
@@ -48,16 +64,21 @@ RUN apt-get update -y && \
libxml2-dev \
meson \
ninja-build \
nodejs \
npm \
nvidia-cuda-toolkit \
openjdk-${jdk}-jdk-headless \
pandoc \
r-recommended=${r}* \
r-base=${r}* \
rsync \
ruby-dev \
sudo \
wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
rm -rf /var/lib/apt/lists/* && \
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
npm install -g yarn @mermaid-js/mermaid-cli

ENV JAVA_HOME=/usr/lib/jvm/java-${jdk}-openjdk-amd64

@@ -68,20 +89,6 @@ RUN /arrow/ci/scripts/util_download_apache.sh \
ENV PATH=/opt/apache-maven-${maven}/bin:$PATH
RUN mvn -version

ARG node=16
RUN apt-get purge -y npm && \
apt-get autoremove -y --purge && \
wget -q -O - https://deb.nodesource.com/setup_${node}.x | bash - && \
apt-get install -y nodejs && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
npm install -g yarn

COPY docs/requirements.txt /arrow/docs/
RUN python3 -m venv ${ARROW_PYTHON_VENV} && \
. ${ARROW_PYTHON_VENV}/bin/activate && \
pip install -r arrow/docs/requirements.txt

COPY c_glib/Gemfile /arrow/c_glib/
RUN gem install --no-document bundler && \
bundle install --gemfile /arrow/c_glib/Gemfile
@@ -98,6 +105,17 @@ COPY r/DESCRIPTION /arrow/r/
RUN /arrow/ci/scripts/r_deps.sh /arrow && \
R -e "install.packages('pkgdown')"

RUN useradd --user-group --create-home --groups audio,video arrow
RUN echo "arrow ALL=(ALL:ALL) NOPASSWD:ALL" | \
EDITOR=tee visudo -f /etc/sudoers.d/arrow
USER arrow

COPY docs/requirements.txt /arrow/docs/
RUN sudo chown -R arrow: ${ARROW_PYTHON_VENV} && \
python3 -m venv ${ARROW_PYTHON_VENV} && \
. ${ARROW_PYTHON_VENV}/bin/activate && \
pip install -r arrow/docs/requirements.txt

ENV ARROW_ACERO=ON \
ARROW_AZURE=OFF \
ARROW_BUILD_STATIC=OFF \
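
A hypothetical smoke test (not part of this PR) for the image built from this Dockerfile: it checks that mermaid-cli (`mmdc`), installed globally via npm above, can render a `.mmd` diagram to SVG using the system Chromium selected through `PUPPETEER_EXECUTABLE_PATH`:

```python
import os
import subprocess
import tempfile

# Minimal Mermaid source to render.
diagram = "graph TD; A[.mmd source] --> B[SVG generated at doc build time];"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.mmd")
    out = os.path.join(tmp, "example.svg")
    with open(src, "w") as f:
        f.write(diagram)

    # /usr/bin/chromium matches the ENV set at the top of the Dockerfile.
    env = dict(os.environ, PUPPETEER_EXECUTABLE_PATH="/usr/bin/chromium")
    subprocess.run(["mmdc", "-i", src, "-o", out], env=env, check=True)

    assert os.path.getsize(out) > 0, "mmdc produced an empty SVG"
```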
13 changes: 9 additions & 4 deletions ci/scripts/cpp_build.sh
@@ -229,12 +229,17 @@ find . -name "*.o" -delete
popd

if [ -x "$(command -v ldconfig)" ]; then
ldconfig ${ARROW_HOME}/${CMAKE_INSTALL_LIBDIR:-lib}
if [ -x "$(command -v sudo)" ]; then
SUDO=sudo
else
SUDO=
fi
${SUDO} ldconfig ${ARROW_HOME}/${CMAKE_INSTALL_LIBDIR:-lib}
fi

if [ "${ARROW_USE_CCACHE}" == "ON" ]; then
echo -e "===\n=== ccache statistics after build\n==="
ccache -sv 2>/dev/null || ccache -s
echo -e "===\n=== ccache statistics after build\n==="
ccache -sv 2>/dev/null || ccache -s
fi

if command -v sccache &> /dev/null; then
@@ -244,6 +249,6 @@ fi

if [ "${BUILD_DOCS_CPP}" == "ON" ]; then
pushd ${source_dir}/apidoc
doxygen
OUTPUT_DIRECTORY=${build_dir}/apidoc doxygen
popd
fi
2 changes: 2 additions & 0 deletions ci/scripts/integration_arrow.sh
@@ -40,6 +40,8 @@ if [ "${ARROW_INTEGRATION_JAVA}" == "ON" ]; then
pip install jpype1
fi

export ARROW_BUILD_ROOT=${build_dir}

# Get more detailed context on crashes
export PYTHONFAULTHANDLER=1

13 changes: 11 additions & 2 deletions ci/scripts/java_build.sh
@@ -75,7 +75,16 @@ fi
# Use `2 * ncores` threads
mvn="${mvn} -T 2C"

pushd ${source_dir}
# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround. We copy
# all needed files to the build directory from the source directory
# and build in the build directory.
mkdir -p ${build_dir}
rm -rf ${build_dir}/format
cp -aL ${arrow_dir}/format ${build_dir}/
rm -rf ${build_dir}/java
cp -aL ${source_dir} ${build_dir}/
pushd ${build_dir}/java

if [ "${ARROW_JAVA_SHADE_FLATBUFFERS}" == "ON" ]; then
mvn="${mvn} -Pshade-flatbuffers"
@@ -95,7 +104,7 @@ if [ "${BUILD_DOCS_JAVA}" == "ON" ]; then
# HTTP pooling is turned of to avoid download issues https://issues.apache.org/jira/browse/ARROW-11633
mkdir -p ${build_dir}/docs/java/reference
${mvn} -Dcheckstyle.skip=true -Dhttp.keepAlive=false -Dmaven.wagon.http.pool=false clean install site
rsync -a ${arrow_dir}/java/target/site/apidocs/ ${build_dir}/docs/java/reference
rsync -a target/site/apidocs/ ${build_dir}/docs/java/reference
fi

popd
4 changes: 2 additions & 2 deletions ci/scripts/java_cdata_integration.sh
@@ -20,9 +20,9 @@
set -ex

arrow_dir=${1}
export ARROW_SOURCE_DIR=${arrow_dir}
build_dir=${2}

pushd ${arrow_dir}/java/c/src/test/python
pushd ${build_dir}/java/c/src/test/python

python integration_tests.py

19 changes: 14 additions & 5 deletions ci/scripts/js_build.sh
@@ -25,7 +25,16 @@ build_dir=${2}

: ${BUILD_DOCS_JS:=OFF}

pushd ${source_dir}
# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround. We copy
# all needed files to the build directory from the source directory
# and build in the build directory.
rm -rf ${build_dir}/js
mkdir -p ${build_dir}
cp -aL ${arrow_dir}/LICENSE.txt ${build_dir}/
cp -aL ${arrow_dir}/NOTICE.txt ${build_dir}/
cp -aL ${source_dir} ${build_dir}/js
pushd ${build_dir}/js

yarn --immutable
yarn lint:ci
@@ -34,18 +43,18 @@ yarn build
if [ "${BUILD_DOCS_JS}" == "ON" ]; then
# If apache or upstream are defined use those as remote.
# Otherwise use origin which could be a fork on PRs.
if [ "$(git config --get remote.apache.url)" == "[email protected]:apache/arrow.git" ]; then
if [ "$(git -C ${arrow_dir} config --get remote.apache.url)" == "[email protected]:apache/arrow.git" ]; then
yarn doc --gitRemote apache
elif [[ "$(git config --get remote.upstream.url)" =~ "https://github.com/apache/arrow" ]]; then
elif [[ "$(git -C ${arrow_dir}config --get remote.upstream.url)" =~ "https://github.com/apache/arrow" ]]; then
yarn doc --gitRemote upstream
elif [[ "$(basename -s .git $(git config --get remote.origin.url))" == "arrow" ]]; then
elif [[ "$(basename -s .git $(git -C ${arrow_dir} config --get remote.origin.url))" == "arrow" ]]; then
yarn doc
else
echo "Failed to build docs because the remote is not set correctly. Please set the origin or upstream remote to https://github.com/apache/arrow.git or the apache remote to [email protected]:apache/arrow.git."
exit 0
fi
mkdir -p ${build_dir}/docs/js
rsync -a ${arrow_dir}/js/doc/ ${build_dir}/docs/js
rsync -a doc/ ${build_dir}/docs/js
fi

popd
3 changes: 2 additions & 1 deletion ci/scripts/js_test.sh
@@ -20,8 +20,9 @@
set -ex

source_dir=${1}/js
build_dir=${2}/js

pushd ${source_dir}
pushd ${build_dir}

yarn lint
yarn test
33 changes: 29 additions & 4 deletions ci/scripts/python_build.sh
@@ -78,17 +78,42 @@ export PYARROW_PARALLEL=${n_jobs}
export CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}

pushd ${source_dir}
# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround. We copy
# all needed files to the build directory from the source directory
# and build in the build directory.
rm -rf ${python_build_dir}
cp -aL ${source_dir} ${python_build_dir}
pushd ${python_build_dir}
# - Cannot call setup.py as it may install in the wrong directory
# on Debian/Ubuntu (ARROW-15243).
# - Cannot use build isolation as we want to use specific dependency versions
# (e.g. Numpy, Pandas) on some CI jobs.
${PYTHON:-python} -m pip install --no-deps --no-build-isolation -vv .
# Remove build artifacts from source directory
find build/ -user root -delete
popd

if [ "${BUILD_DOCS_PYTHON}" == "ON" ]; then
# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround.
#
# Copy docs/source because the "autosummary_generate = True"
# configuration generates files to docs/source/python/generated/.
rm -rf ${python_build_dir}/docs/source
mkdir -p ${python_build_dir}/docs
cp -a ${arrow_dir}/docs/source ${python_build_dir}/docs/
rm -rf ${python_build_dir}/format
cp -a ${arrow_dir}/format ${python_build_dir}/
rm -rf ${python_build_dir}/cpp/examples
mkdir -p ${python_build_dir}/cpp
cp -a ${arrow_dir}/cpp/examples ${python_build_dir}/cpp/
rm -rf ${python_build_dir}/ci
cp -a ${arrow_dir}/ci/ ${python_build_dir}/
ncpus=$(python -c "import os; print(os.cpu_count())")
sphinx-build -b html -j ${ncpus} ${arrow_dir}/docs/source ${build_dir}/docs
export ARROW_CPP_DOXYGEN_XML=${build_dir}/cpp/apidoc/xml
pushd ${build_dir}
sphinx-build \
-b html \
${python_build_dir}/docs/source \
${build_dir}/docs
popd
fi
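
A rough Python rendering (paths are illustrative, not taken from CI) of what the new `BUILD_DOCS_PYTHON` branch does: run `sphinx-build` against the copied `docs/source` tree with `ARROW_CPP_DOXYGEN_XML` pointing at the Doxygen XML under the build directory, as exported above:

```python
import os
import subprocess

build_dir = "/build"                      # hypothetical
python_build_dir = f"{build_dir}/python"  # hypothetical copy of python/

env = dict(os.environ)
env["ARROW_CPP_DOXYGEN_XML"] = f"{build_dir}/cpp/apidoc/xml"

subprocess.run(
    [
        "sphinx-build",
        "-b", "html",
        f"{python_build_dir}/docs/source",  # source tree copied into the build dir
        f"{build_dir}/docs",                # HTML output
    ],
    cwd=build_dir,
    env=env,
    check=True,
)
```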
20 changes: 17 additions & 3 deletions ci/scripts/r_build.sh
@@ -24,15 +24,29 @@ build_dir=${2}

: ${BUILD_DOCS_R:=OFF}

pushd ${source_dir}
# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround. We copy
# all needed files to the build directory from the source directory
# and build in the build directory.
rm -rf ${build_dir}/r
cp -aL ${source_dir} ${build_dir}/r
pushd ${build_dir}/r

# build first so that any stray compiled files in r/src are ignored
${R_BIN} CMD build .
${R_BIN} CMD INSTALL ${INSTALL_ARGS} arrow*.tar.gz
if [ -x "$(command -v sudo)" ]; then
SUDO=sudo
else
SUDO=
fi
${SUDO} \
env \
PKG_CONFIG_PATH=${ARROW_HOME}/lib/pkgconfig:${PKG_CONFIG_PATH} \
${R_BIN} CMD INSTALL ${INSTALL_ARGS} arrow*.tar.gz

if [ "${BUILD_DOCS_R}" == "ON" ]; then
${R_BIN} -e "pkgdown::build_site(install = FALSE)"
rsync -a ${source_dir}/docs/ ${build_dir}/docs/r
rsync -a docs/ ${build_dir}/docs/r
fi

popd
4 changes: 4 additions & 0 deletions dev/archery/archery/docker/core.py
@@ -371,6 +371,10 @@ def run(self, service_name, command=None, *, env=None, volumes=None,
v = "{}:{}".format(v['source'], v['target'])
args.extend(['-v', v])

# append capabilities from the compose conf
for c in service.get('cap_add', []):
args.extend([f'--cap-add={c}'])

# infer whether an interactive shell is desired or not
if command in ['cmd.exe', 'bash', 'sh', 'powershell']:
args.append('-it')
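
A tiny illustration (not from the Arrow sources; `SYS_ADMIN` is just an example value) of what the new `cap_add` handling in archery's Docker runner does: capabilities declared for a service in the compose config are forwarded to `docker run` as `--cap-add` flags.

```python
service = {"cap_add": ["SYS_ADMIN"]}  # hypothetical compose service entry

args = []
for c in service.get("cap_add", []):
    args.extend([f"--cap-add={c}"])

assert args == ["--cap-add=SYS_ADMIN"]
```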