From 72195c090ec41b878a9e55100e3b21af6e3ee3de Mon Sep 17 00:00:00 2001
From: Anna
Date: Tue, 27 Feb 2024 09:45:05 -0800
Subject: [PATCH] Link back to watchdog (#3052)

---
 examples/checkpoint_autoresume.ipynb | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/examples/checkpoint_autoresume.ipynb b/examples/checkpoint_autoresume.ipynb
index 7b8de078bc..cff6cbd6c2 100644
--- a/examples/checkpoint_autoresume.ipynb
+++ b/examples/checkpoint_autoresume.ipynb
@@ -10,6 +10,9 @@
  "\n",
  "We've put together this tutorial to demonstrate this feature in action and how you can activate it through the Composer trainer.\n",
  "\n",
+ "**🐕 Autoresume via Watchdog**: Composer autoresumption works best when coupled with automated node failure detection and retries on the MosaicML platform. \n",
+ "See our [platform docs page](https://docs.mosaicml.com/projects/mcli/en/latest/training/watchdog.html) on enabling this feature for your runs\n",
+ "\n",
  "### Recommended Background\n",
  "\n",
  "This tutorial assumes you are familiar with the Composer trainer basics and its [saving/checkpointing features][checkpointing]. \n",