Merging master branch into sept_release branch (#1229)
* Updating issue template for adding issue SLO (#1216)

* adding bug SLO

* adding bug SLO

* Fixing comment

* Integration tests --key-file flag and GOOGLE_APPLICATION_CREDENTIALS env with admin permission tests (#1167)

* updating go version

* empty commit

* local commit

* local changes

* local changes

* local changes

* adding key file tests

* testing

* testing

* testing

* testing

* local changes

* local changes

* local changes

* local changes

* testing

* testing

* testing

* testing

* testing

* adding test for admin creds

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* testing

* formatting

* testing defer statement

* testing defer statement for deleting credentials

* adding comment

* testing with error

* testing with error

* testing with error

* removing testing statement

* adding testbucket and mntdir in command

* adding comment

* updating bucket name

* updating bucket name

* removing unnecessary changes

* removing unnecessary changes

* removing unnecessary changes

* formatting

* conflict

* adding error handling

* testing

* small fix

* removing creds tests from implicit and explicit dir tests

* testing

* testing

* testing

* testing

* removing testing statement

* adding creds tests in operations back

* Testing

* Testing

* Testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* create service account key testing

* adding remaining changes

* adding remaining changes

* adding remaining changes

* testing service account

* testing service account

* testing service account

* adding comments

* removing unnecessary changes

* formatting

* testing

* testing

* testing

* testing

* removing without key file tests

* small fix

* formalizing for reuse

* small fix

* removing unnecessary changes

* formatting

* updating comment

* updating comment

* updating comment

* fixing comments

* adding comment

* testing

* testing

* adding condition for when the service account already exists

* adding condition for when the service account already exists

* testing time

* running tests only for operations

* Changing the number of epochs based on previous observation (#1222)

* Adding rpm digest while creating rpm package (#1215)

* removing defer as not working properly in for loop (#1223)

---------

Co-authored-by: Tulsishah <[email protected]>
Co-authored-by: Prince Kumar <[email protected]>
3 people committed Feb 29, 2024
1 parent aac55f2 commit 5a032b3
Showing 5 changed files with 174 additions and 0 deletions.
95 changes: 95 additions & 0 deletions perfmetrics/scripts/ml_tests/pytorch/dino/setup_container.sh
@@ -0,0 +1,95 @@
#!/bin/bash

wget -O go_tar.tar.gz https://go.dev/dl/go1.20.4.linux-amd64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go_tar.tar.gz
export PATH=$PATH:/usr/local/go/bin

# Clone and build the gcsfuse master branch.
git clone https://github.com/GoogleCloudPlatform/gcsfuse.git
cd gcsfuse
go build .
cd -

# Create a directory for gcsfuse logs
mkdir run_artifacts/gcsfuse_logs

echo "Mounting GCSFuse..."
nohup /pytorch_dino/gcsfuse/gcsfuse --foreground --type-cache-ttl=1728000s \
  --stat-cache-ttl=1728000s \
  --stat-cache-capacity=1320000 \
  --stackdriver-export-interval=60s \
  --implicit-dirs \
  --max-conns-per-host=100 \
  --debug_fuse \
  --debug_gcs \
  --log-file run_artifacts/gcsfuse.log \
  --log-format text \
  gcsfuse-ml-data gcsfuse_data > "run_artifacts/gcsfuse.out" 2> "run_artifacts/gcsfuse.err" &

# Update the pytorch library code to bypass the kernel-cache
echo "Updating the pytorch library code to bypass the kernel-cache..."
echo "
def pil_loader(path: str) -> Image.Image:
    fd = os.open(path, os.O_DIRECT)
    f = os.fdopen(fd, \"rb\")
    img = Image.open(f)
    rgb_img = img.convert(\"RGB\")
    f.close()
    return rgb_img
" > bypassed_code.py

folder_file="/opt/conda/lib/python3.7/site-packages/torchvision/datasets/folder.py"
x=$(grep -n "def pil_loader(path: str) -> Image.Image:" $folder_file | cut -f1 -d ':')
y=$(grep -n "def accimage_loader(path: str) -> Any:" $folder_file | cut -f1 -d ':')
y=$((y - 2))
lines="$x,$y"
sed -i "$lines"'d' $folder_file
sed -i "$x"'r bypassed_code.py' $folder_file

# Work around a caching issue that occurs when the model is first run with 8
# processes per node (nproc_per_node): download the model once in a
# single-process environment.
python -c 'import torch;torch.hub.list("facebookresearch/xcit:main")'

ARTIFACTS_BUCKET_PATH="gs://gcsfuse-ml-tests-logs/ci_artifacts/pytorch/dino"
echo "Update status file"
echo "RUNNING" > status.txt
gsutil cp status.txt $ARTIFACTS_BUCKET_PATH/

echo "Update start time file"
echo $(date +"%s") > start_time.txt
gsutil cp start_time.txt $ARTIFACTS_BUCKET_PATH/

(
  set +e
  # Run the PyTorch DINO model.
  # It must run in the foreground to keep the container running.
  echo "Running the pytorch dino model..."
  experiment=dino_experiment
  python3 -m torch.distributed.launch \
    --nproc_per_node=2 dino/main_dino.py \
    --arch vit_small \
    --num_workers 20 \
    --data_path gcsfuse_data/imagenet/ILSVRC/Data/CLS-LOC/train/ \
    --output_dir "./run_artifacts/$experiment" \
    --norm_last_layer False \
    --use_fp16 False \
    --clip_grad 0 \
    --epochs 80 \
    --global_crops_scale 0.25 1.0 \
    --local_crops_number 10 \
    --local_crops_scale 0.05 0.25 \
    --teacher_temp 0.07 \
    --warmup_teacher_temp_epochs 30 \
    --min_lr 0.00001
  if [ $? -eq 0 ];
  then
    echo "Pytorch dino model completed the training successfully!"
    echo "COMPLETE" > status.txt
  else
    echo "Pytorch dino model training failed!"
    echo "ERROR" > status.txt
  fi
)

gsutil cp status.txt $ARTIFACTS_BUCKET_PATH/
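The `pil_loader` patch above swaps out a function by line range: `grep -n | cut` finds the first line of `pil_loader` and the line of the next function, `sed 'M,Nd'` deletes the old body, and `sed 'Mr file'` reads the replacement in. The sketch below replays those steps on a hypothetical toy `folder.py` in a temp directory so the effect can be inspected without touching a real torchvision install; the file contents are stand-ins, and GNU `sed -i` is assumed, as in the script itself.

```shell
#!/bin/bash
# Sketch: replay the grep/cut/sed function swap on a hypothetical toy file.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
target="$workdir/folder.py"
replacement="$workdir/bypassed_code.py"

# Toy stand-in for folder.py: two functions separated by two blank lines.
cat > "$target" <<'EOF'
import os


def pil_loader(path):
    with open(path, "rb") as f:
        return f.read()


def accimage_loader(path):
    return path
EOF

# Replacement, padded with a blank line on each side like the echo "..." output.
cat > "$replacement" <<'EOF'

def pil_loader(path):
    fd = os.open(path, os.O_DIRECT)
    with os.fdopen(fd, "rb") as f:
        return f.read()

EOF

# Line numbers of the function to replace and of the function after it.
x=$(grep -n "def pil_loader(path):" "$target" | cut -f1 -d ':')
y=$(grep -n "def accimage_loader(path):" "$target" | cut -f1 -d ':')
# Stop two lines early so one blank separator line survives the delete.
y=$((y - 2))

# Delete lines x..y, then read the replacement in after the line that now
# occupies position x (the surviving blank line).
sed -i "${x},${y}d" "$target"
sed -i "${x}r $replacement" "$target"
cat "$target"
```

Subtracting 2 from the next function's line number leaves exactly one of the two separator lines behind, which gives `sed`'s `r` command a stable anchor directly above the following function.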
@@ -0,0 +1,17 @@
# Copyright 2023 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

KEY_FILE_PATH=$1
SERVICE_ACCOUNT=$2
gcloud iam service-accounts keys create $KEY_FILE_PATH --iam-account=$SERVICE_ACCOUNT
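The key-file script above takes its two positional arguments straight from `$1` and `$2`. As a sketch only, the same `gcloud iam service-accounts keys create` call can be wrapped in a function that rejects a wrong argument count and, under a hypothetical `DRY_RUN=1` switch (not part of the original script), prints the command instead of running it. The function name `create_key_file` is likewise illustrative.

```shell
#!/bin/bash
# Sketch: key creation with an argument check and a hypothetical DRY_RUN switch.
set -euo pipefail

create_key_file() {
  if [ "$#" -ne 2 ]; then
    echo "usage: create_key_file KEY_FILE_PATH SERVICE_ACCOUNT" >&2
    return 2
  fi
  local key_file_path=$1
  local service_account=$2
  # Build the command as an array so each value stays one argument.
  local cmd=(gcloud iam service-accounts keys create "$key_file_path"
    "--iam-account=$service_account")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of calling gcloud.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 create_key_file /tmp/key.json sa@my-project.iam.gserviceaccount.com` prints the `gcloud` invocation without creating a key.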
@@ -0,0 +1,22 @@
# Copyright 2023 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

SERVICE_ACCOUNT=$1
SERVICE_ACCOUNT_ID=$2
# Delete the service account if it already exists.
gcloud iam service-accounts delete $SERVICE_ACCOUNT_ID
if [ $? -eq 1 ]; then
echo "Service account does not exist."
fi
gcloud iam service-accounts create $SERVICE_ACCOUNT --description="$SERVICE_ACCOUNT" --display-name="$SERVICE_ACCOUNT"
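The script above treats exit status 1 from `gcloud iam service-accounts delete` as "the account does not exist", although that status can also come from other failures. A minimal sketch of an alternative, assuming the second argument is the account's email or unique ID as `gcloud` accepts: probe with `gcloud iam service-accounts describe` first and delete only on success, passing the global `--quiet` flag to disable the confirmation prompt. The function name `recreate_service_account` is hypothetical.

```shell
#!/bin/bash
# Sketch: recreate a test service account, deleting it only if describe finds it.
set -euo pipefail

recreate_service_account() {
  local service_account=$1     # short account name used for create
  local service_account_id=$2  # email or unique ID of the existing account
  if gcloud iam service-accounts describe "$service_account_id" >/dev/null 2>&1; then
    # --quiet disables the interactive confirmation prompt.
    gcloud iam service-accounts delete "$service_account_id" --quiet
  else
    echo "Service account does not exist."
  fi
  gcloud iam service-accounts create "$service_account" \
    --description="$service_account" --display-name="$service_account"
}
```

Because `gcloud` is resolved at call time, a shell function named `gcloud` can stand in for the real CLI when exercising this logic locally.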
@@ -0,0 +1,20 @@
# Copyright 2023 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Grant the service account the given role (PERMISSION) on the test bucket.
TEST_BUCKET=$1
SERVICE_ACCOUNT=$2
PERMISSION=$3

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT:$PERMISSION gs://$TEST_BUCKET
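The same grant can also be expressed with `gcloud storage buckets add-iam-policy-binding`, which takes the member and role as separate flags. The sketch below is an alternative, not what the commit uses. It assumes PERMISSION is either a short Cloud Storage role name such as `objectViewer` or `admin`, expanded here to `roles/storage.<name>`, or an already fully qualified `roles/...` value; the function name `grant_bucket_role` is hypothetical.

```shell
#!/bin/bash
# Sketch: bucket-level role grant via gcloud storage instead of gsutil iam ch.
set -euo pipefail

grant_bucket_role() {
  if [ "$#" -ne 3 ]; then
    echo "usage: grant_bucket_role TEST_BUCKET SERVICE_ACCOUNT PERMISSION" >&2
    return 2
  fi
  local bucket=$1 service_account=$2 permission=$3
  local role=$permission
  # Assumption: short names map to roles/storage.<name>; full roles pass through.
  case "$role" in
    roles/*) ;;
    *) role="roles/storage.$role" ;;
  esac
  gcloud storage buckets add-iam-policy-binding "gs://$bucket" \
    --member="serviceAccount:$service_account" \
    --role="$role"
}
```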
@@ -0,0 +1,20 @@
# Copyright 2023 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Clean up after testing: revoke the service account's credentials, delete the
# service account, and remove its key file.
SERVICE_ACCOUNT=$1
KEY_FILE=$2
gcloud auth revoke $SERVICE_ACCOUNT
gcloud iam service-accounts delete $SERVICE_ACCOUNT
rm $KEY_FILE
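Several squashed commits above deal with making credential cleanup run reliably. One way to get that in shell, sketched here under the assumption that cleanup is driven from a bash test script, is to keep going when an individual `gcloud` step fails and to register the cleanup on an `EXIT` trap so it also runs when the test exits early. The function name `cleanup_creds` is hypothetical.

```shell
#!/bin/bash
# Sketch: credential cleanup that keeps going if a gcloud step fails.
set -uo pipefail

cleanup_creds() {
  local service_account=$1
  local key_file=$2
  gcloud auth revoke "$service_account" || echo "revoke failed for $service_account" >&2
  # --quiet disables the interactive confirmation prompt.
  gcloud iam service-accounts delete "$service_account" --quiet \
    || echo "delete failed for $service_account" >&2
  # -f: do not fail if the key file is already gone.
  rm -f "$key_file"
}
```

A test driver could then register it once with `trap 'cleanup_creds "$SERVICE_ACCOUNT" "$KEY_FILE"' EXIT`, so the key file is removed even if an assertion aborts the run.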
