From f6cc295aef1dbc1e90019a9cbba40b054d33ae84 Mon Sep 17 00:00:00 2001 From: Silvio Rizzi <31414702+srizzi88@users.noreply.github.com> Date: Thu, 31 Oct 2024 10:53:24 -0500 Subject: [PATCH 1/6] fixed typo (#510) Co-authored-by: Silvio Rizzi --- docs/sophia/visualization/paraview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sophia/visualization/paraview.md b/docs/sophia/visualization/paraview.md index d4007cb01..569bbcff3 100644 --- a/docs/sophia/visualization/paraview.md +++ b/docs/sophia/visualization/paraview.md @@ -67,7 +67,7 @@ There are a number of parameters that you must enter manually here: **SSH executable:** the name of your ssh command. It may be different on Windows depending on the ssh client installed (i.e putty) -**Remote machine:** leave this value at polaris.alcf.anl.gov +**Remote machine:** leave this value at sophia.alcf.anl.gov **Username:** your ALCF user name From cfc239ba7f792b8150f00ba94506f268d58d2f7f Mon Sep 17 00:00:00 2001 From: Riccardo Balin Date: Thu, 31 Oct 2024 20:16:49 +0000 Subject: [PATCH 2/6] Updated LibTorch docs for Aurora --- .../data-science/frameworks/libtorch.md | 295 +++++++++++------- 1 file changed, 174 insertions(+), 121 deletions(-) diff --git a/docs/aurora/data-science/frameworks/libtorch.md b/docs/aurora/data-science/frameworks/libtorch.md index 3da31b7b7..133f3219e 100644 --- a/docs/aurora/data-science/frameworks/libtorch.md +++ b/docs/aurora/data-science/frameworks/libtorch.md @@ -9,32 +9,105 @@ During compilation, Intel optimizations will be activated automatically once the ## Environment Setup To use LibTorch on Aurora, load the ML frameworks module +```bash +module load frameworks/2024.2.1_u1 ``` -module use /soft/modulefiles -module load frameworks/2023.12.15.001 -``` -which will also load the consistent oneAPI SDK and `cmake`. +which will also load the consistent oneAPI SDK (version 2024.2) and `cmake`. ## Torch and IPEX libraries With the ML frameworks module loaded as shown above, run -``` +```bash python -c 'import torch; print(torch.__path__[0])' python -c 'import torch;print(torch.utils.cmake_prefix_path)' ``` to find the path to the Torch libraries, include files, and CMake files. 
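As an optional sanity check before building any C++ code (not part of the original instructions, and it assumes the bundled IPEX registers the `torch.xpu` backend), you can confirm on a compute node that the Python-level stack sees the XPU devices:
```bash
# Hypothetical check: verify that torch and IPEX import cleanly and report the XPU device count
python -c 'import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__, torch.xpu.device_count())'
```
If the device count is zero, the command was most likely run on a login node or the frameworks module did not load cleanly.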
For the path to the IPEX dynamic library, run -``` +```bash python -c 'import torch; print(torch.__path__[0].replace("torch","intel_extension_for_pytorch"))' ``` +## Linking LibTorch and IPEX Libraries + +When using the CMake build system, LibTorch and IPEX libraries can be linked to an example C++ application using the following `CMakeLists.txt` file +```bash +cmake_minimum_required(VERSION 3.5 FATAL_ERROR) +cmake_policy(SET CMP0074 NEW) +project(project-name) + +find_package(Torch REQUIRED) +set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed") +set(TORCH_LIBS ${TORCH_LIBRARIES}) + +find_library(IPEX_LIB intel-ext-pt-gpu PATHS ${INTEL_EXTENSION_FOR_PYTORCH_PATH}/lib NO_DEFAULT_PATH REQUIRED) +set(TORCH_LIBS ${TORCH_LIBS} ${IPEX_LIB}) +include_directories(SYSTEM ${INTEL_EXTENSION_FOR_PYTORCH_PATH}/include) + +add_executable(exe main.cpp) +target_link_libraries(exe ${TORCH_LIBS}) + +set_property(TARGET exe PROPERTY CXX_STANDARD 17) +``` + +and configuring the build with +``` +cmake \ + -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \ + -DINTEL_EXTENSION_FOR_PYTORCH_PATH=`python -c 'import torch; print(torch.__path__[0].replace("torch","intel_extension_for_pytorch"))'` \ + ./ +make +``` + + +## Device Introspection + +Similarly to PyTorch, LibTorch provides API to perform instrospection on the devices available on the system. +The simple code below shows how to check if XPU devices are available, how many are present, and how to loop through them to discover some properties. + +```bash +#include +#include + +int main(int argc, const char* argv[]) +{ + torch::DeviceType device; + int num_devices = 0; + if (torch::xpu::is_available()) { + std::cout << "XPU devices detected" << std::endl; + device = torch::kXPU; + + num_devices = torch::xpu::device_count(); + std::cout << "Number of XPU devices: " << num_devices << std::endl; + + + for (int i = 0; i < num_devices; ++i) { + c10::xpu::set_device(i); + std::cout << "Device " << i << ":" << std::endl; + //std::string device_name = c10::xpu::get_device_name(); + //std::cout << "Device " << i << ": " << device_name << std::endl; + + c10::xpu::DeviceProp device_prop{}; + c10::xpu::get_device_properties(&device_prop, i); + std::cout << " Name: " << device_prop.name << std::endl; + std::cout << " Total memory: " << device_prop.global_mem_size / (1024 * 1024) << " MB" << std::endl; + } + } else { + device = torch::kCPU; + std::cout << "No XPU devices detected, setting device to CPU" << std::endl; + } + + return 0; +} +``` + ## Model Inferencing Using the Torch API -This example shows how to perform inference on the ResNet50 model using only the LibTorch API. -First, get a jit-traced version of the model running `resnet50_trace.py` below. + +This example shows how to perform inference with the ResNet50 model using LibTorch. +First, get a jit-traced version of the model executing `python resnet50_trace.py` (shwn below) on a compute node. 
``` import torch import torchvision @@ -58,81 +131,53 @@ print(f"Inference time: {toc-tic}") torch.jit.save(model_jit, f"resnet50_jit.pt") ``` -Then, use the source code in `inference-example.cpp` -``` +Then, build `inference-example.cpp` (shown below) +```bash #include #include -#include int main(int argc, const char* argv[]) { - torch::jit::script::Module model; - try { - model = torch::jit::load(argv[1]); - std::cout << "Loaded the model\n"; - } - catch (const c10::Error& e) { - std::cerr << "error loading the model\n"; - return -1; - } - // Upload model to GPU - model.to(torch::Device(torch::kXPU)); - std::cout << "Model offloaded to GPU\n\n"; - - auto options = torch::TensorOptions() + torch::jit::script::Module model; + try { + model = torch::jit::load(argv[1]); + std::cout << "Loaded the model\n"; + } + catch (const c10::Error& e) { + std::cerr << "error loading the model\n"; + return -1; + } + + model.to(torch::Device(torch::kXPU)); + std::cout << "Model offloaded to GPU\n\n"; + + auto options = torch::TensorOptions() .dtype(torch::kFloat32) .device(torch::kXPU); - torch::Tensor input_tensor = torch::rand({1,3,224,224}, options); - assert(input_tensor.dtype() == torch::kFloat32); - assert(input_tensor.device().type() == torch::kXPU); - std::cout << "Created the input tesor on GPU\n"; + torch::Tensor input_tensor = torch::rand({1,3,224,224}, options); + assert(input_tensor.dtype() == torch::kFloat32); + assert(input_tensor.device().type() == torch::kXPU); + std::cout << "Created the input tesor on GPU\n"; - torch::Tensor output = model.forward({input_tensor}).toTensor(); - std::cout << "Performed inference\n\n"; + torch::Tensor output = model.forward({input_tensor}).toTensor(); + std::cout << "Performed inference\n\n"; - std::cout << "Predicted tensor is : \n"; - std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n'; + std::cout << "Slice of predicted tensor is : \n"; + std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n'; - return 0; + return 0; } ``` -and the `CMakeLists.txt` file - -``` -cmake_minimum_required(VERSION 3.5 FATAL_ERROR) -cmake_policy(SET CMP0074 NEW) -project(inference-example) - -find_package(Torch REQUIRED) -set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed") - -add_executable(inference-example inference-example.cpp) -target_link_libraries(inference-example "${TORCH_LIBRARIES}" "${INTEL_EXTENSION_FOR_PYTORCH_PATH}/lib/libintel-ext-pt-gpu.so") - -set_property(TARGET inference-example PROPERTY CXX_STANDARD 17) -``` - -to build the inference example. - -Finally, execute the `doConfig.sh` script below -``` -#!/bin/bash - -cmake \ - -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \ - -DINTEL_EXTENSION_FOR_PYTORCH_PATH=`python -c 'import torch; print(torch.__path__[0].replace("torch","intel_extension_for_pytorch"))'` \ - ./ +and execute it with `./inference-example ./resnet50_jit.pt`. -make -./inference-example ./resnet50_jit.pt -``` ## LibTorch Interoperability with SYCL Pipelines -The LibTorch API can be integrated with data pilelines using SYCL to offload and operate on the input and output data on the Intel Max 1550 GPU. -The code below extends the above example of performing inference with the ResNet50 model by first generating the input data on the CPU, then offloading it to the GPU with SYCL, and finally passing the device pointer to LibTorch for inference. 
+ +The LibTorch API can be integrated with data pilelines using SYCL to operate on input and output data already offloaded to the Intel Max 1550 GPU. +The code below extends the above example of performing inference with the ResNet50 model by first generating the input data on the CPU, then offloading it to the GPU with SYCL, and finally passing the device pointer to LibTorch for inference using `torch::from_blob()`, which create a Torch tensor from a data pointer with zero-copy. The source code for `inference-example.cpp` is modified as follows -``` +```bash #include #include #include @@ -143,78 +188,86 @@ const int N_BATCH = 1; const int N_CHANNELS = 3; const int N_PIXELS = 224; const int INPUTS_SIZE = N_BATCH*N_CHANNELS*N_PIXELS*N_PIXELS; +const int OUTPUTS_SIZE = N_BATCH*N_CHANNELS; int main(int argc, const char* argv[]) { - torch::jit::script::Module model; - try { - model = torch::jit::load(argv[1]); - std::cout << "Loaded the model\n"; - } - catch (const c10::Error& e) { - std::cerr << "error loading the model\n"; - return -1; - } - // Upload model to GPU - model.to(torch::Device(torch::kXPU)); - std::cout << "Model offloaded to GPU\n\n"; - - // Create the input data on the host - std::vector inputs(INPUTS_SIZE); - srand(12345); - for (int i=0; i (rand()) / static_cast (RAND_MAX); - } - std::cout << "Generated input data on the host \n\n"; - - // Move input data to the device with SYCL - sycl::queue Q(sycl::gpu_selector_v); - std::cout << "SYCL running on " + torch::jit::script::Module model; + try { + model = torch::jit::load(argv[1]); + std::cout << "Loaded the model\n"; + } + catch (const c10::Error& e) { + std::cerr << "error loading the model\n"; + return -1; + } + + model.to(torch::Device(torch::kXPU)); + std::cout << "Model offloaded to GPU\n\n"; + + // Create the input data on the host + std::vector inputs(INPUTS_SIZE); + srand(12345); + for (int i=0; i (rand()) / static_cast (RAND_MAX); + } + std::cout << "Generated input data on the host \n\n"; + + // Move input data to the device with SYCL + sycl::queue Q(sycl::gpu_selector_v); + std::cout << "SYCL running on " << Q.get_device().get_info() << "\n\n"; - float *d_inputs = sycl::malloc_device(INPUTS_SIZE, Q); - Q.memcpy((void *) d_inputs, (void *) inputs.data(), INPUTS_SIZE*sizeof(float)); - Q.wait(); - - // Convert input array to Torch tensor - auto options = torch::TensorOptions() + float *d_inputs = sycl::malloc_device(INPUTS_SIZE, Q); + Q.memcpy((void *) d_inputs, (void *) inputs.data(), INPUTS_SIZE*sizeof(float)); + Q.wait(); + + // Pre-allocate the output array on device and fill with a number + double *d_outputs = sycl::malloc_device(OUTPUTS_SIZE, Q); + Q.submit([&](sycl::handler &cgh) { + cgh.parallel_for(OUTPUTS_SIZE, [=](sycl::id<1> idx) { + d_outputs[idx] = 1.2345; + }); + }); + Q.wait(); + std::cout << "Offloaded input data to the GPU \n\n"; + + // Convert input array to Torch tensor + auto options = torch::TensorOptions() .dtype(torch::kFloat32) .device(torch::kXPU); - torch::Tensor input_tensor = at::from_blob(d_inputs, {N_BATCH,N_CHANNELS,N_PIXELS,N_PIXELS}, - nullptr, at::device(torch::kXPU).dtype(torch::kFloat32), - torch::kXPU) - .to(torch::kXPU); - assert(input_tensor.dtype() == torch::kFloat32); - assert(input_tensor.device().type() == torch::kXPU); - std::cout << "Created the input tesor on GPU\n"; - - // Perform inference - torch::Tensor output = model.forward({input_tensor}).toTensor(); - std::cout << "Performed inference\n\n"; - - std::cout << "Predicted tensor is : \n"; - std::cout << 
output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n'; - - return 0; + torch::Tensor input_tensor = torch::from_blob( + d_inputs, + {N_BATCH,N_CHANNELS,N_PIXELS,N_PIXELS}, + options); + assert(input_tensor.dtype() == torch::kFloat32); + assert(input_tensor.device().type() == torch::kXPU); + std::cout << "Created the input Torch tesor on GPU\n\n"; + + // Perform inference + torch::NoGradGuard no_grad; // equivalent to "with torch.no_grad():" in PyTorch + torch::Tensor output = model.forward({input_tensor}).toTensor(); + std::cout << "Performed inference\n\n"; + + // Copy the output Torch tensor to the SYCL pointer + auto output_tensor_ptr = output.contiguous().data_ptr(); + Q.memcpy((void *) d_outputs, (void *) output_tensor_ptr, OUTPUTS_SIZE*sizeof(double)); + Q.wait(); + std::cout << "Copied output Torch tensor to SYCL pointer\n"; + + return 0; } ``` -and the CMake commands also change to include -``` -#!/bin/bash - +Note that an additional C++ flag is needed in this case, as shown below in the `cmake` command +```bash cmake \ -DCMAKE_CXX_FLAGS="-std=c++17 -fsycl" \ -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \ -DINTEL_EXTENSION_FOR_PYTORCH_PATH=`python -c 'import torch; print(torch.__path__[0].replace("torch","intel_extension_for_pytorch"))'` \ ./ - -make -./inference-example ./resnet50_jit.pt ``` -## Known Issues -* The LibTorch introspection API that are available for CUDA devices, such as `torch::cuda::is_available()`, are still under development for Intel Max 1550 GPU. From b6f712c5df461e67423b6c3f9845c499d5c9f2fa Mon Sep 17 00:00:00 2001 From: jfrancis-anl Date: Thu, 31 Oct 2024 15:44:45 -0500 Subject: [PATCH 3/6] Added known issues --- docs/aurora/running-jobs-aurora.md | 2 +- docs/polaris/known-issues.md | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/aurora/running-jobs-aurora.md b/docs/aurora/running-jobs-aurora.md index 5109d7b77..aba76ae3b 100644 --- a/docs/aurora/running-jobs-aurora.md +++ b/docs/aurora/running-jobs-aurora.md @@ -11,7 +11,7 @@ There is a single routing queue in place called `EarlyAppAccess` which submits t ### Submitting a job -Note: Jobs should be submitted only from your allocated project directory and not from your home directory. +Note: Jobs should be submitted only from your allocated project directory and not from your home directory or from `/soft/modulefiles`. Submitting an interactive job from `/soft/modulefiles` will result in your job ending abruptly. For example, a one-node interactive job can be requested for 30 minutes with the following command, where `[your_ProjectName]` is replaced with an appropriate project name. diff --git a/docs/polaris/known-issues.md b/docs/polaris/known-issues.md index eb321279d..45ea7207e 100644 --- a/docs/polaris/known-issues.md +++ b/docs/polaris/known-issues.md @@ -33,3 +33,6 @@ This is a collection of known issues that have been encountered on Polaris. Docu 3. `-rw------- (600) id_rsa` 4. `-rw-r--r-- (644) id_rsa.pub` 3. Copy the contents of your `.ssh/id_rsa.pub` file to `.ssh/authorized_keys`. + 4. If you do not have the files mentioned above, you will need to create them. + 1. 
You can generate an `id_rsa` file with the following command: `ssh-keygen -t rsa` + From af0abdd90be303fe1e094aaddb0137cdec2ef723 Mon Sep 17 00:00:00 2001 From: balin Date: Thu, 31 Oct 2024 21:16:36 +0000 Subject: [PATCH 4/6] Added LibTorch docs for Polaris --- .../data-science/frameworks/libtorch.md | 7 +- .../frameworks/libtorch.md | 149 ++++++++++++++++++ mkdocs.yml | 1 + 3 files changed, 152 insertions(+), 5 deletions(-) create mode 100644 docs/polaris/data-science-workflows/frameworks/libtorch.md diff --git a/docs/aurora/data-science/frameworks/libtorch.md b/docs/aurora/data-science/frameworks/libtorch.md index 133f3219e..c8cd22e68 100644 --- a/docs/aurora/data-science/frameworks/libtorch.md +++ b/docs/aurora/data-science/frameworks/libtorch.md @@ -82,12 +82,9 @@ int main(int argc, const char* argv[]) num_devices = torch::xpu::device_count(); std::cout << "Number of XPU devices: " << num_devices << std::endl; - for (int i = 0; i < num_devices; ++i) { c10::xpu::set_device(i); std::cout << "Device " << i << ":" << std::endl; - //std::string device_name = c10::xpu::get_device_name(); - //std::cout << "Device " << i << ": " << device_name << std::endl; c10::xpu::DeviceProp device_prop{}; c10::xpu::get_device_properties(&device_prop, i); @@ -107,8 +104,8 @@ int main(int argc, const char* argv[]) ## Model Inferencing Using the Torch API This example shows how to perform inference with the ResNet50 model using LibTorch. -First, get a jit-traced version of the model executing `python resnet50_trace.py` (shwn below) on a compute node. -``` +First, get a jit-traced version of the model executing `python resnet50_trace.py` (shown below) on a compute node. +```bash import torch import torchvision import intel_extension_for_pytorch as ipex diff --git a/docs/polaris/data-science-workflows/frameworks/libtorch.md b/docs/polaris/data-science-workflows/frameworks/libtorch.md new file mode 100644 index 000000000..9ae999256 --- /dev/null +++ b/docs/polaris/data-science-workflows/frameworks/libtorch.md @@ -0,0 +1,149 @@ +# LibTorch C++ Library + +LibTorch is a C++ library for Torch, with many of the API that are available in PyTorch. Users can find more information on the [PyTorch documentation](https://pytorch.org/cppdocs/installing.html). +This is useful to integrate the Torch ML framework into traditional HPC simulation codes and therefore enable training and inferecing of ML models. +During compilation, Intel optimizations will be activated automatically once the IPEX dynamic library is linked. + + +## Environment Setup + +To use LibTorch on Polaris, load the ML frameworks module +```bash +module use /soft/modulefiles +module load conda/2024-04-29 +conda activate +``` +which will also load, `PrgEnv-gnu/8.5.0` and `cmake`. + + +## Torch Libraries + +With the ML frameworks module loaded as shown above, run +```bash +python -c 'import torch; print(torch.__path__[0])' +python -c 'import torch;print(torch.utils.cmake_prefix_path)' +``` +to find the path to the Torch libraries, include files, and CMake files. 
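As an optional check (not part of the original instructions), it can be useful to confirm that this Torch build was compiled with CUDA support before linking against it; the device count may be reported as zero on a login node:
```bash
# Hypothetical check: report the Torch version, the CUDA version it was built against, and the visible GPUs
python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.device_count())'
```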
+ + +## Linking the Torch Libraries + +When using the CMake build system, the LibTorch libraries can be linked to an example C++ application using the following `CMakeLists.txt` file +```bash +cmake_minimum_required(VERSION 3.5 FATAL_ERROR) +cmake_policy(SET CMP0074 NEW) +project(project-name) + +find_package(Torch REQUIRED) +set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed") +set(TORCH_LIBS ${TORCH_LIBRARIES}) + +add_executable(exe main.cpp) +target_link_libraries(exe ${TORCH_LIBS}) + +set_property(TARGET exe PROPERTY CXX_STANDARD 17) +``` + +and configuring the build with +``` +cmake \ + -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \ + ./ +make +``` + + +## Device Introspection + +Similarly to PyTorch, LibTorch provides API to perform instrospection on the devices available on the system. +The simple code below shows how to check if CUDA devices are available, how many are present, and how to loop through them to discover some properties. + +```bash +#include + +int main(int argc, const char* argv[]) +{ + torch::DeviceType device; + int num_devices = 0; + if (torch::cuda::is_available()) { + std::cout << "CUDA devices detected" << std::endl; + device = torch::kCUDA; + + num_devices = torch::cuda::device_count(); + std::cout << "Number of CUDA devices: " << num_devices << std::endl; + } else { + device = torch::kCPU; + std::cout << "No CUDA devices detected, setting device to CPU" << std::endl; + } + + return 0; +} +``` + + +## Model Inferencing Using the Torch API + +This example shows how to perform inference with the ResNet50 model using LibTorch. +First, get a jit-traced version of the model executing `python resnet50_trace.py` (shown below) on a compute node. +```bash +import torch +import torchvision +from time import perf_counter + +device = 'cuda' + +model = torchvision.models.resnet50() +model.to(device) +model.eval() + +dummy_input = torch.rand(1, 3, 224, 224).to(device) + +model_jit = torch.jit.trace(model, dummy_input) +tic = perf_counter() +predictions = model_jit(dummy_input) +toc = perf_counter() +print(f"Inference time: {toc-tic}") + +torch.jit.save(model_jit, f"resnet50_jit.pt") +``` + +Then, build `inference-example.cpp` (shown below) +```bash +#include +#include + +int main(int argc, const char* argv[]) { + torch::jit::script::Module model; + try { + model = torch::jit::load(argv[1]); + std::cout << "Loaded the model\n"; + } + catch (const c10::Error& e) { + std::cerr << "error loading the model\n"; + return -1; + } + + model.to(torch::Device(torch::kCUDA)); + std::cout << "Model offloaded to GPU\n\n"; + + auto options = torch::TensorOptions() + .dtype(torch::kFloat32) + .device(torch::kCUDA); + torch::Tensor input_tensor = torch::rand({1,3,224,224}, options); + assert(input_tensor.dtype() == torch::kFloat32); + assert(input_tensor.device().type() == torch::kCUDA); + std::cout << "Created the input tesor on GPU\n"; + + torch::Tensor output = model.forward({input_tensor}).toTensor(); + std::cout << "Performed inference\n\n"; + + std::cout << "Slice of predicted tensor is : \n"; + std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\n'; + + return 0; +} +``` + +and execute it with `./inference-example ./resnet50_jit.pt`. 
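Putting the pieces together, a possible end-to-end session on a Polaris compute node could look like the sketch below. It assumes the `CMakeLists.txt` from the linking section above has been adapted so that the target is named `inference-example` and is built from `inference-example.cpp`; the out-of-source `build` directory is likewise only an illustrative choice.
```bash
# Environment (same modules as above)
module use /soft/modulefiles
module load conda/2024-04-29
conda activate

# Trace the model, then configure, build, and run the C++ example
python resnet50_trace.py        # writes resnet50_jit.pt

mkdir -p build && cd build
cmake \
    -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \
    ../
make
./inference-example ../resnet50_jit.pt
```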
+ + diff --git a/mkdocs.yml b/mkdocs.yml index dd4d1d98a..f19acb197 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -78,6 +78,7 @@ nav: - PyTorch: polaris/data-science-workflows/frameworks/pytorch.md - Jax: polaris/data-science-workflows/frameworks/jax.md - DeepSpeed: polaris/data-science-workflows/frameworks/deepspeed.md + - LibTorch: polaris/data-science-workflows/frameworks/libtorch.md - Applications: - Megatron-DeepSpeed: polaris/data-science-workflows/applications/megatron-deepspeed.md - gpt-neox: polaris/data-science-workflows/applications/gpt-neox.md From a174961975157b9c4a12bc84f6a2d6c6c761b03d Mon Sep 17 00:00:00 2001 From: rickybalin Date: Thu, 31 Oct 2024 15:40:32 -0600 Subject: [PATCH 5/6] fixed some typos --- docs/aurora/data-science/frameworks/libtorch.md | 8 ++++---- .../polaris/data-science-workflows/frameworks/libtorch.md | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/aurora/data-science/frameworks/libtorch.md b/docs/aurora/data-science/frameworks/libtorch.md index c8cd22e68..3ccf158f3 100644 --- a/docs/aurora/data-science/frameworks/libtorch.md +++ b/docs/aurora/data-science/frameworks/libtorch.md @@ -67,7 +67,7 @@ make Similarly to PyTorch, LibTorch provides API to perform instrospection on the devices available on the system. The simple code below shows how to check if XPU devices are available, how many are present, and how to loop through them to discover some properties. -```bash +```c++ #include #include @@ -105,7 +105,7 @@ int main(int argc, const char* argv[]) This example shows how to perform inference with the ResNet50 model using LibTorch. First, get a jit-traced version of the model executing `python resnet50_trace.py` (shown below) on a compute node. -```bash +```python import torch import torchvision import intel_extension_for_pytorch as ipex @@ -129,7 +129,7 @@ torch.jit.save(model_jit, f"resnet50_jit.pt") ``` Then, build `inference-example.cpp` (shown below) -```bash +```c++ #include #include @@ -174,7 +174,7 @@ The LibTorch API can be integrated with data pilelines using SYCL to operate on The code below extends the above example of performing inference with the ResNet50 model by first generating the input data on the CPU, then offloading it to the GPU with SYCL, and finally passing the device pointer to LibTorch for inference using `torch::from_blob()`, which create a Torch tensor from a data pointer with zero-copy. The source code for `inference-example.cpp` is modified as follows -```bash +```c++ #include #include #include diff --git a/docs/polaris/data-science-workflows/frameworks/libtorch.md b/docs/polaris/data-science-workflows/frameworks/libtorch.md index 9ae999256..f8959b8bd 100644 --- a/docs/polaris/data-science-workflows/frameworks/libtorch.md +++ b/docs/polaris/data-science-workflows/frameworks/libtorch.md @@ -13,7 +13,7 @@ module use /soft/modulefiles module load conda/2024-04-29 conda activate ``` -which will also load, `PrgEnv-gnu/8.5.0` and `cmake`. +which will also loads `PrgEnv-gnu/8.5.0` and `cmake`. ## Torch Libraries @@ -58,7 +58,7 @@ make Similarly to PyTorch, LibTorch provides API to perform instrospection on the devices available on the system. The simple code below shows how to check if CUDA devices are available, how many are present, and how to loop through them to discover some properties. -```bash +```c++ #include int main(int argc, const char* argv[]) @@ -85,7 +85,7 @@ int main(int argc, const char* argv[]) This example shows how to perform inference with the ResNet50 model using LibTorch. 
First, get a jit-traced version of the model executing `python resnet50_trace.py` (shown below) on a compute node. -```bash +```python import torch import torchvision from time import perf_counter @@ -108,7 +108,7 @@ torch.jit.save(model_jit, f"resnet50_jit.pt") ``` Then, build `inference-example.cpp` (shown below) -```bash +```c++ #include #include From 2ba3ca9e39c0485cf17acbaafd811472b45d55a6 Mon Sep 17 00:00:00 2001 From: jfrancis-anl Date: Fri, 1 Nov 2024 13:57:24 -0500 Subject: [PATCH 6/6] Tweaked formatting --- docs/polaris/known-issues.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/docs/polaris/known-issues.md b/docs/polaris/known-issues.md index 45ea7207e..7b5cb2539 100644 --- a/docs/polaris/known-issues.md +++ b/docs/polaris/known-issues.md @@ -11,13 +11,10 @@ This is a collection of known issues that have been encountered on Polaris. Docu ## Compiling & Running Applications 1. If your job fails to start with an `RPC launch` message like below, please forward the complete messages to [support@alcf.anl.gov](mailto:support@alcf.anl.gov). - ```bash launch failed on x3104c0s1b0n0: Couldn't forward RPC launch(ab751d77-e80a-4c54-b1c2-4e881f7e8c90) to child x3104c0s31b0n0.hsn.cm.polaris.alcf.anl.gov: Resource temporarily unavailable ``` - 2. The message below is an XALT-related warning that can be ignored when running `apptainer`. For other commands, please forward the complete message to [support@alcf.anl.gov](mailto:support@alcf.anl.gov) so we are aware of your use case. - ```bash ERROR: ld.so: object '/soft/xalt/3.0.2-202408282050/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ``` @@ -32,7 +29,7 @@ This is a collection of known issues that have been encountered on Polaris. Docu 2. `-rw-r--r-- (644) config` 3. `-rw------- (600) id_rsa` 4. `-rw-r--r-- (644) id_rsa.pub` - 3. Copy the contents of your `.ssh/id_rsa.pub` file to `.ssh/authorized_keys`. - 4. If you do not have the files mentioned above, you will need to create them. + 3. If you do not have the files mentioned above, you will need to create them. 1. You can generate an `id_rsa` file with the following command: `ssh-keygen -t rsa` + 4. Copy the contents of your `.ssh/id_rsa.pub` file to `.ssh/authorized_keys`.
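For reference, the key-restoration steps listed above can be carried out as a single sequence on a Polaris login node. This is only a sketch: it assumes the default `id_rsa`/`id_rsa.pub` file names and that the files listed above are the only ones that need fixing.
```bash
cd ~/.ssh
ssh-keygen -t rsa                  # only if id_rsa/id_rsa.pub are missing; accept the default file name
chmod 700 ~/.ssh                   # standard directory permissions
chmod 644 config id_rsa.pub        # -rw-r--r-- (644), if these files exist
chmod 600 id_rsa                   # -rw------- (600)
cat id_rsa.pub >> authorized_keys  # copy the public key into authorized_keys
```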