Add CI for checking for broken links manually, weekly and in PRs #6

Workflow file for this run

name: Check URLs
on:
  workflow_dispatch:
  schedule:
    - cron: '17 5 * * 0' # 5:17 AM UTC every Sunday
  pull_request:
    branches: [ main ]
env:
  ignore_url_patterns: |
    http://localhost:4000
    https://preview.bssw.io
    https://github.com/<your-github-handle>
  ignore_file_patterns: |
    docs/
    images/
    utils/
    Events/
jobs:
  check-urls:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip --no-cache-dir --disable-pip-version-check install --upgrade pip
          python -m pip --no-cache-dir --disable-pip-version-check install linkchecker
      - name: Reformat environment variables
        id: setup_vars
        run: |
          # Join the newline-separated patterns with commas and drop the trailing ',,'
          tmp=$(echo "${{ env.ignore_url_patterns }}" | tr '\n' ',' | sed -e 's/,,$//')
          echo "ignore_url_patterns=$tmp" >> "$GITHUB_OUTPUT"
          tmp=$(echo "${{ env.ignore_file_patterns }}" | tr '\n' ',' | sed -e 's/,,$//')
          echo "ignore_file_patterns=$tmp" >> "$GITHUB_OUTPUT"
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Get Changed Files (for PRs)
        if: ${{ github.event_name == 'pull_request' }}
        id: changed-files
        uses: tj-actions/changed-files@v42
        with:
          separator: ','
      - name: Generate lists of files to check and ignore
        id: file_list
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo "files=${{ steps.changed-files.outputs.all_changed_files }}" >> "$GITHUB_OUTPUT"
            echo "ignore_file_patterns=" >> "$GITHUB_OUTPUT"
          else
            echo "files=" >> "$GITHUB_OUTPUT"
            echo "ignore_file_patterns=${{ steps.setup_vars.outputs.ignore_file_patterns }}" >> "$GITHUB_OUTPUT"
          fi
      #exclude_patterns: ${{ steps.setup_vars.outputs.ignore_url_patterns }}
      - name: Check URLs in selected files
        run: |
          which linkchecker
          # Both lists are comma-separated, so split them on ',' before looping.
          IFS=',' read -ra files <<< "${{ steps.file_list.outputs.files }}"
          IFS=',' read -ra ignore_files <<< "${{ steps.file_list.outputs.ignore_file_patterns }}"
          for f in "${files[@]}"; do
            for ef in "${ignore_files[@]}"; do
              if [ "$ef" == "$f" ]; then
                continue 2 # ignore this file
              fi
            done
            echo "$f"
          done
#
# Description:
#
# Triggers in one of three ways: 1) manually, 2) on a weekly schedule (Sundays at
# 5:17 AM UTC), or 3) on a pull request.
#
# Stores the ignore patterns in env. variables and then reformats them (because
# they contain newlines) into single-line, comma-separated strings that can be
# passed as inputs to other actions.
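#
# A minimal illustration of that reformatting, assuming a bash shell (the sample
# patterns are two of the ignore_file_patterns entries; the extra '\n' mimics the
# newline that `echo` appends):
#
#   printf 'docs/\nimages/\n\n' | tr '\n' ',' | sed -e 's/,,$//'
#   # prints: docs/,images/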
#
# For PRs, uses the changed-files action to get the list of changed files and passes it
# to urlchecker via the `include_files` param. The ignore file patterns are also set to
# an empty string for PRs because we think URLs anywhere in PRs should be checked.
#
# For scheduled or manual triggers, relies on the fact that an empty `include_files`
# param causes urlchecker to process *all* files that match the `file_type` param but
# do not match any `exclude_files` patterns. These exclude patterns work more or less
# like file globs, so specifying the initial part of a file (path) name is sufficient
# to ignore the file.
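#
# The shell loop in the last step compares file names exactly. A hypothetical sketch
# of prefix matching there (an assumption, not something this workflow does; `f` and
# `ef` are that loop's variables, and bash is assumed):
#
#   case "$f" in
#     "$ef"*) continue 2 ;; # $f starts with the ignore pattern $ef
#   esac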
#
# We include Events in the file patterns to ignore because, of all the content we host,
# we suspect Event URLs are the most likely to go stale quickly **and** because
# URL validity matters only during the short window before the event.
# That said, we don't want to ignore Events in PRs, and we do not, as described above.
#