Implement pre-commit hooks
Signed-off-by: droctothorpe <[email protected]>
droctothorpe committed Jul 30, 2024
1 parent 736c814 commit ebebb46
Showing 61 changed files with 242 additions and 174 deletions.
14 changes: 14 additions & 0 deletions .github/workflows/pre-commit.yaml
@@ -0,0 +1,14 @@
name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - uses: pre-commit/[email protected]
9 changes: 0 additions & 9 deletions .github/workflows/test-python.yaml
@@ -26,15 +26,6 @@ jobs:
        with:
          python-version: ${{ matrix.python-version }}

      # TODO (andreyvelich): We need to replace this action with script to do
      # linting and formatting for Training Operator SDK.
      - name: Check Python code with Black
        uses: psf/black@stable
        with:
          version: 24.2.0
          options: --check --exclude '/*kubeflow_org_v1*|__init__.py|api_client.py|configuration.py|exceptions.py|rest.py'
          src: sdk/

      - name: Install dependencies
        run: |
          pip install pytest python-dateutil urllib3 kubernetes
44 changes: 44 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,44 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: check-json
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/pycqa/isort
    rev: 5.11.5
    hooks:
      - id: isort
        name: isort
        entry: isort --profile google
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
        files: sdk/.*
        exclude: |
          (?x)^(
            /*kubeflow_org_v1*|
            __init__.py|
            api_client.py|
            configuration.py|
            exceptions.py|
            rest.py
          )$
exclude: |
  (?x)^(
    pkg/apis/kubeflow.org/v1/openapi_generated.go|
    pkg/apis/kubeflow.org/v1/zz_.*|
    pkg/client/.*|
    test_job/apis/test_job/v1/.*generated.*.go|
    test_job/client/.*|
    sdk/python/kubeflow/training/[^/]*.py|
    sdk/python/kubeflow/training/models/.*|
    sdk/python/test/.*|
    docs/api/.*|
    sdk/python/docs/.*|
    sdk/python/.openapi-generator/VERSION|
    sdk/python/kubeflow/__init__.py
  )$
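With a config like the one above in place, hooks can also be exercised locally by id before pushing; a minimal sketch, assuming `pre-commit` is already installed:

```sh
# Run a single hook from .pre-commit-config.yaml against the whole repository
pre-commit run black --all-files

# Run every configured hook against every file (what the CI workflow checks)
pre-commit run --all-files
```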
40 changes: 28 additions & 12 deletions docs/development/developer_guide.md
@@ -5,7 +5,7 @@ Kubeflow Training Operator is currently at v1.
## Requirements

- [Go](https://golang.org/) (1.22 or later)
- [Docker](https://docs.docker.com/)
- [Docker](https://docs.docker.com/) (20.10 or later)
- [Docker Buildx](https://docs.docker.com/build/buildx/) (0.8.0 or later)
- [Python](https://www.python.org/) (3.11 or later)
@@ -49,12 +49,12 @@ First, you need to run a Kubernetes cluster locally. We recommend [Kind](https:/

You can create a `kind` cluster by running
```sh
kind create cluster
```
This will add the new cluster to your Kubernetes config file.

After creating the cluster, you can check the nodes with the command below, which should show the kind-control-plane node.
```sh
kubectl get nodes
```
The output should look something like below:
@@ -74,9 +74,9 @@ Then we can patch it with the latest operator image.
```sh
kubectl patch -n kubeflow deployments training-operator --type json -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "kubeflow/training-operator:latest"}]'
```
Then we can run the job with the following command.

```sh
kubectl apply -f https://raw.githubusercontent.com/kubeflow/training-operator/master/examples/pytorch/simple.yaml
```
And we can see the output of the job from the logs, which may take some time to produce but should look something like below.
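The job logs referenced here can be fetched with the same label selector used later in this guide; a minimal sketch:

```sh
kubectl logs -n kubeflow -l training.kubeflow.org/job-name=pytorch-simple
```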
@@ -116,10 +116,10 @@ make docker-build IMG=my-username/training-operator:my-pr-01
```
You can swap `my-username/training-operator:my-pr-01` with whatever you would like.

## Load docker image
```sh
kind load docker-image my-username/training-operator:my-pr-01
```

## Modify operator image with new one

@@ -129,8 +129,8 @@ kustomize edit set image my-username/training-operator=my-username/training-oper
```
Update the `newTag` key in `./manifests/overlays/standalone/kustomization.yaml` with the new image.

Deploy the operator with:
```sh
kubectl apply -k ./manifests/overlays/standalone
```
And now we can submit jobs to the operator.
@@ -140,7 +140,7 @@ kubectl apply -f https://raw.githubusercontent.com/kubeflow/training-operator/ma
```
You should be able to see a pod for your training operator running in your namespace using
```
kubectl logs -n kubeflow -l training.kubeflow.org/job-name=pytorch-simple
```
## Go version

@@ -187,3 +187,19 @@ sdk/python/kubeflow/training/api
```sh
black --check --exclude '/*kubeflow_org_v1*|__init__.py|api_client.py|configuration.py|exceptions.py|rest.py' sdk/
```

### pre-commit

Make sure to install [pre-commit](https://pre-commit.com/) (`pip install
pre-commit`) and run `pre-commit install` from the root of the repository at
least once before creating git commits.

The pre-commit [hooks](../.pre-commit-config.yaml) ensure code quality and
consistency. They are executed in CI. PRs that fail to comply with the hooks
will not be able to pass the corresponding CI gate. The hooks are only executed
against staged files unless you run `pre-commit run --all`, in which case,
they'll be executed against every file in the repository.
Specific programmatically generated files listed in the `exclude` field in
[.pre-commit-config.yaml](../.pre-commit-config.yaml) are deliberately excluded
from the hooks.
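For reference, a typical first-time setup and manual run might look like the following sketch (it assumes Python and pip are available on your PATH):

```sh
pip install pre-commit      # install the pre-commit tool
pre-commit install          # register the git hook in this clone
pre-commit run --all-files  # optionally run every hook against the entire repository
```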
1 change: 0 additions & 1 deletion docs/diagrams/tfjob_k8s_resources.svg
3 changes: 2 additions & 1 deletion docs/release/changelog.py
@@ -1,6 +1,7 @@
from github import Github
import argparse

from github import Github

REPO_NAME = "kubeflow/training-operator"
CHANGELOG_FILE = "CHANGELOG.md"

1 change: 0 additions & 1 deletion examples/paddlepaddle/simple-gpu.yaml
@@ -33,4 +33,3 @@ spec:
- name: dshm
emptyDir:
medium: Memory

8 changes: 4 additions & 4 deletions examples/pytorch/README.md
@@ -1,5 +1,5 @@
## Installation & deployment tips
1. You need to configure your node to utilize GPU. This can be done the following way:
* Install [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker)
* Connect to your MasterNode and set nvidia as the default run in `/etc/docker/daemon.json`:
```
@@ -13,11 +13,11 @@
}
}
```
* After that deploy nvidia-daemon to kubernetes:
```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
```

2. NVIDIA GPUs can now be consumed via container level resource requirements using the resource name nvidia.com/gpu:
```
resources:
1 change: 0 additions & 1 deletion examples/pytorch/elastic/echo/echo.yaml
@@ -26,4 +26,3 @@ spec:
- torch.distributed.run
- --rdzv_backend=c10d
- ./echo.py

2 changes: 1 addition & 1 deletion examples/pytorch/elastic/etcd.yaml
@@ -71,4 +71,4 @@ spec:
protocol: TCP
targetPort: 2380
selector:
etcd_node: etcd-server
2 changes: 1 addition & 1 deletion examples/pytorch/elastic/imagenet/.dockerignore
@@ -1 +1 @@
data
14 changes: 7 additions & 7 deletions examples/pytorch/elastic/imagenet/imagenet.py
@@ -44,30 +44,30 @@
"""

import argparse
from contextlib import contextmanager
from datetime import timedelta
import io
import os
import shutil
import time
from contextlib import contextmanager
from datetime import timedelta
from typing import List, Tuple

import numpy
import torch
import torch.distributed as dist
from torch.distributed.elastic.multiprocessing.errors import record
from torch.distributed.elastic.utils.data import ElasticDistributedSampler
import torch.nn as nn
import torch.nn.parallel
from torch.nn.parallel import DistributedDataParallel
import torch.optim
from torch.optim import SGD
import torch.utils.data
from torch.utils.data import DataLoader
import torch.utils.data.distributed
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
from torch.distributed.elastic.multiprocessing.errors import record
from torch.distributed.elastic.utils.data import ElasticDistributedSampler
from torch.nn.parallel import DistributedDataParallel
from torch.optim import SGD
from torch.utils.data import DataLoader

model_names = sorted(
name
@@ -696,4 +696,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
6 changes: 3 additions & 3 deletions examples/pytorch/mnist/Makefile
@@ -19,7 +19,7 @@ IMG = gcr.io/kubeflow-examples/pytorch-dist-mnist
PUBLIC = gcr.io/kubeflow-examples/pytorch-dist-mnist
DIR := ${CURDIR}

# List any changed files.
CHANGED_FILES := $(shell git diff-files --relative=examples/dist-mnist)

ifeq ($(strip $(CHANGED_FILES)),)
@@ -43,9 +43,9 @@ build:
# Build but don't attach the latest tag. This allows manual testing/inspection of the image
# first.
push: build
gcloud docker -- push $(IMG):$(TAG)
@echo Pushed $(IMG) with :$(TAG) tags

push-latest: push
gcloud container images add-tag --quiet $(IMG):$(TAG) $(IMG):latest --verbosity=info
echo created $(IMG):latest
5 changes: 3 additions & 2 deletions examples/pytorch/mnist/mnist.py
@@ -3,14 +3,15 @@
import argparse
import os

from tensorboardX import SummaryWriter
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tensorboardX import SummaryWriter
from torch.utils.data import DistributedSampler
from torchvision import datasets, transforms
from torchvision import datasets
from torchvision import transforms


class Net(nn.Module):
2 changes: 1 addition & 1 deletion examples/pytorch/mnist/v1/pytorch_job_mnist_gloo.yaml
@@ -22,7 +22,7 @@ spec:
restartPolicy: OnFailure
template:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
args: ["--backend", "gloo"]
8 changes: 4 additions & 4 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_mpi.yaml
@@ -14,19 +14,19 @@ spec:
image: kubeflow/pytorch-dist-mnist:latest
args: ["--backend", "mpi"]
# Comment out the below resources to use the CPU.
resources:
limits:
nvidia.com/gpu: 1
Worker:
replicas: 1
restartPolicy: OnFailure
template:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
args: ["--backend", "mpi"]
# Comment out the below resources to use the CPU.
resources:
limits:
nvidia.com/gpu: 1
6 changes: 3 additions & 3 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_nccl.yaml
@@ -13,18 +13,18 @@ spec:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
args: ["--backend", "nccl"]
resources:
limits:
nvidia.com/gpu: 1
Worker:
replicas: 1
restartPolicy: OnFailure
template:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
args: ["--backend", "nccl"]
resources:
limits:
nvidia.com/gpu: 1
2 changes: 1 addition & 1 deletion examples/pytorch/smoke-dist/README.md
@@ -1,4 +1,4 @@
### Distributed send/recv e2e test

This folder contains a Dockerfile and a distributed send/recv test.

8 changes: 4 additions & 4 deletions examples/pytorch/smoke-dist/dist_sendrecv.py
@@ -33,19 +33,19 @@ def init_processes(fn, backend='gloo'):

def main():
logging.info("Torch version: %s", torch.__version__)

port = os.environ.get("MASTER_PORT", "{}")
logging.info("MASTER_PORT: %s", port)

addr = os.environ.get("MASTER_ADDR", "{}")
logging.info("MASTER_ADDR: %s", addr)

world_size = os.environ.get("WORLD_SIZE", "{}")
logging.info("WORLD_SIZE: %s", world_size)

rank = os.environ.get("RANK", "{}")
logging.info("RANK: %s", rank)

init_processes(run)


2 changes: 1 addition & 1 deletion examples/tensorflow/dist-mnist/README.md
@@ -20,4 +20,4 @@ docker build -f Dockerfile.ppc64le -t kubeflow123/tf-dist-mnist-test:1.0 ./
```
kubectl create -f ./tf_job_mnist.yaml
```
* If on ppc64le, please update tf_job_mnist.yaml to use the ppc64le image first.