diff --git a/docs/admin/ondemand.md b/docs/admin/ondemand.md index 8a9c29b..05b6dee 100644 --- a/docs/admin/ondemand.md +++ b/docs/admin/ondemand.md @@ -41,6 +41,7 @@ After that, you can `git clone` apps to your OOD app developer environment locat batch_connect: basic: script_wrapper: | + source /etc/profile module purge %s set_host: host=$(facter fqdn) @@ -103,18 +104,59 @@ openondemand::conf: ``` support_ticket: - data: - support_ticket: - email: - from: "noreply@%{trusted.domain}" - to: hpc-help@ucdavis.edu + data: + support_ticket: + email: + from: "noreply@%{trusted.domain}" + to: hpc-help@ucdavis.edu ``` More about `ondemand.d`, `openondemand::confs`, and their function and format can be found here: and here: + +## Admin Troublshooting + +```mermaid +sequenceDiagram + user->>apache: `/var/log/apache2/error.log` + apache->>CAS: `/var/cache/apache2/mod_auth_cas/` + CAS->>apache: return + apache->>pun: `/var/log/apache2/$fqdn_error.log` + pun->>dashboard: `/var/log/ondemand-nginx/$user/error.log` + dashboard->>oodapp: `$home/ondemand/data/sys/dashboard/batch_connect/sys/$app/output/$session_id/output.log` + oodapp->>user: render +``` + +1. Apache + To start, all users who navigate to the ondemand website first encounter the apache server. Any errors encountered at this step will be in the log(s) at `/var/log/apache2/error.log` + +2. CAS + Apache then redirects the users to CAS for authentication. You can `grep -r $user /var/cache/apache2/mod_auth_cas/` to check if users have been authed to CAS and a cookie has been set. + +3. Apache part deux + CAS brings us back to apache and here apache runs all sorts of OOD Lua hooks. Any errors encountered at this step will be in the l +og(s) at `/var/log/apache2/$fqdn_error.log` + +4. The PUN (Per User Nginx) session + Apache then starts an NginX server as the user and most things like the main dashboard, submitting jobs, running apps, etc happen here in the PUN. Any errors encountered at this step will be in the logs at `/var/log/ondemand-nginx/$user/error.log`. You can also see what might be happening here by running commands like `ps aux | grep $USER` to see the users PUN, or `ps aux | grep -i nginx` to see all the PUNs. From the ondemand web UI theres an option to "Restart Web Server" which essentially kills and restarts the users PUN. + +5. /pun/sys/dashboard + The dashboard is mostly covered in section 4, but just wanted to denote that apache then redirects us here after the PUN has been started where users can do everything else. At this step OOD will warn you about things like "Home Directory Not Found" and such. If you get this far I'd recommend you troubleshoot issues with users' home dir, NASii, and free space: `df | grep $HOME`, `du -sh $HOME`, `journalctl -u autofs`, and umount stuff. Check that `$HOME/ondemand` exists perhaps. + +6. OOD Apps + When users start an app like JuyterLab or a VNC desktop the job is submitted by the users' PUN and here OOD copies and renders (with ERB) the global app template from `/var/www/ood/apps/sys//template/*` to `$HOME/ondemand/data/sys/dashboard/batch_connect/sys//(output)/`. Any errors encountered at this step will be in `$HOME/ondemand/data/sys/dashboard/batch_connect/sys//(output)//*.log`. + + ## OOD FQDNs + ### Farm -#### Production: `ondemand.vm.farm.hpc.ucdavis.edu` -#### Dev: `ondemand-dev.vm.farm.hpc.ucdavis.edu` +#### Production: `ondemand.farm.hpc.ucdavis.edu` +#### Dev: `dood.vm.farm.hpc.ucdavis.edu` + +### Franklin +#### Production: `ondemand.franklin.hpc.ucdavis.edu` + +### Hive +#### Production: `ondemand.hive.hpc.ucdavis.edu`