diff --git a/docs/ai-testbed/cerebras/files/Trust_ctl.png b/docs/ai-testbed/cerebras/files/Trust_ctl.png
new file mode 100644
index 000000000..dd551d3c0
Binary files /dev/null and b/docs/ai-testbed/cerebras/files/Trust_ctl.png differ
diff --git a/docs/ai-testbed/cerebras/files/grafana_ctl.png b/docs/ai-testbed/cerebras/files/grafana_ctl.png
new file mode 100644
index 000000000..bc045a8bd
Binary files /dev/null and b/docs/ai-testbed/cerebras/files/grafana_ctl.png differ
diff --git a/docs/ai-testbed/cerebras/job-queuing-and-submission.md b/docs/ai-testbed/cerebras/job-queuing-and-submission.md
index efafe454d..9de47d109 100644
--- a/docs/ai-testbed/cerebras/job-queuing-and-submission.md
+++ b/docs/ai-testbed/cerebras/job-queuing-and-submission.md
@@ -13,6 +13,7 @@ NAME AGE DURATION PHASE SYSTEMS USER LABEL
 wsjob-thjj8zticwsylhppkbmjqe 13s 1s RUNNING cer-cs2-01 username name=unet_pt https://grafana.cerebras1.lab.alcf.anl.gov/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-thjj8zticwsylhppkbmjqe&from=1691705374000&to=now
 (venv_pt) $
 ```
+To view the Grafana dashboard for a job, follow the instructions at [Grafana WsJob Dashboard for Cerebras jobs](./miscellaneous.md#grafana-wsjob-dashboard-for-cerebras-jobs).
 Jobs can be canceled as shown:
diff --git a/docs/ai-testbed/cerebras/miscellaneous.md b/docs/ai-testbed/cerebras/miscellaneous.md
index 2a8f323bc..d7a0751d3 100644
--- a/docs/ai-testbed/cerebras/miscellaneous.md
+++ b/docs/ai-testbed/cerebras/miscellaneous.md
@@ -5,6 +5,48 @@ Cerebras documentation for porting code to run on a Cerebras CS-2 system:
 [Ways to port your model](https://docs.cerebras.net/en/latest/wsc/port/index.html)
+## Grafana WsJob Dashboard for Cerebras jobs
+A Grafana dashboard supports visualizing, querying, and exploring the CS-2 system’s metrics, and provides access to system logs and traces.
+See the Cerebras documentation for the [Job Information Dashboard](https://docs.cerebras.net/en/latest/wsc/getting-started/grafana.html#wsjob-dashboard).
+
+Here is a summary (tested to work on Ubuntu and macOS)
+
+On your work machine with a web browser, e.g. your laptop,
+edit `/etc/hosts`, using your editor of choice:
+```console
+sudo nano /etc/hosts
+```
+Add this line:
+```console
+127.0.0.1 grafana.cerebras1.lab.alcf.anl.gov
+```
+Save and exit the editor.
+
+Download the Grafana certificate present on the Cerebras node at `/opt/cerebras/certs/grafana_tls.crt` to your local machine. To add this certificate to your browser keychain:
+
+1. In Chrome, go to Settings->Privacy and security->Security->Manage device certificates.
+2. Select System under "System Keychains" on the left-hand side of your screen. Also select the "Certificate" tab.
+3. Drag and drop the downloaded certificate. Once it is added, it is visible as "lab.alcf.anl.gov".
+    ![Grafana certificate listed in the System keychain](files/grafana_ctl.png)
+4. Select the certificate, and ensure that the "Trust" section is set to "Always Trust".
+    ![Certificate Trust section set to Always Trust](files/Trust_ctl.png)
+
+
+On your work machine with a web browser, e.g. your laptop,
+tunnel the Grafana HTTPS port on the Cerebras Grafana host through to localhost:
+```console
+ssh -L 8443:grafana.cerebras1.lab.alcf.anl.gov:443 arnoldw@cer-login-03.ai.alcf.anl.gov
+```
+
+Point a browser at Grafana (tested with Firefox and Chrome/Brave).
+Open the browser to a job's Grafana URL shown by `csctl get jobs`, adding `:8443` to the hostname, e.g.
+```console
+https://grafana.cerebras1.lab.alcf.anl.gov:8443/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-49b7uuojdelvtrcxu3cwbw&from=1684859330000&to=now
+```
+
+Log in to the dashboard with user `admin` and password `prom-operator`.
+
+
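The URL rewrite in the last step (adding `:8443` to the hostname) can be scripted. The following is a minimal sketch, assuming the Grafana hostname is exactly `grafana.cerebras1.lab.alcf.anl.gov` and the tunnel listens on local port 8443 as in the `ssh -L` command above. `tunnel_url` is a hypothetical helper name chosen for illustration; it is not part of `csctl`.

```shell
#!/bin/sh
# Hypothetical helper (not part of csctl): rewrite a job dashboard URL
# printed by `csctl get jobs` so it goes through the local SSH tunnel.
# Assumes the tunnel was opened on local port 8443.
tunnel_url() {
    # Insert ":8443" right after the hostname; path and query are left untouched.
    printf '%s\n' "$1" | sed 's#^https://grafana\.cerebras1\.lab\.alcf\.anl\.gov/#https://grafana.cerebras1.lab.alcf.anl.gov:8443/#'
}

# Usage: pass the URL exactly as shown in the `csctl get jobs` output.
tunnel_url 'https://grafana.cerebras1.lab.alcf.anl.gov/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-49b7uuojdelvtrcxu3cwbw&from=1684859330000&to=now'
```

The URL is single-quoted because it contains `&`, which an unquoted shell argument would treat as a background operator. A URL that does not start with the expected hostname is printed unchanged.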