This is an interactive visualization tool for TaskVine, a task scheduler that runs large workflows efficiently on HPC clusters. It lets users organize and visualize the results recorded in TaskVine log files.
Install the required packages via conda:
conda install -y flask pandas tqdm bitarray python-graphviz
The process of generating a report from a log directory involves three steps:
-
Organize the logs directory properly. All log entries should be stored in the logs directory. By default, the TaskVine manager writes its logs to a directory named vine-logs located under vine-run-info/most-recent, though most-recent can be replaced by any other entry within vine-run-info. Copy most-recent (or any other entry from vine-run-info) into the logs directory. The repository includes an example entry that can be used out of the box:
logs
└── test_example
    └── vine-logs
        ├── debug
        ├── performance
        ├── taskgraph
        ├── transactions
        └── workflow.json
Typically, vine-logs contains at least debug, performance, taskgraph, and transactions. TaskVine normally also produces a workflow.json file, but this tool does not use it.
-
Once the logs are properly arranged, you can produce the intermediate CSV and graph files for visualization.
We separate data generation from visualization because processing the logs is usually compute-intensive, while rendering the visualization is relatively fast. Once generated, the data can be reused across many visualization sessions.
The first step is to generate a set of CSV files that capture most aspects of a run, including how tasks are distributed among workers, the execution time of each task, and how many tasks are running concurrently over time.
To generate the CSV files, use:
python generate_csv.py logs/[log_name]
For example:
python generate_csv.py logs/test_example
Optionally, you can also generate the task graph to examine the relationships between tasks in more detail. This is a separate step because it can take a long time for workflows with hundreds of thousands of tasks.
To generate the graph files, use:
python generate_graph.py logs/[log_name]
For example:
python generate_graph.py logs/test_example/
-
Once all the data is generated, run the following command to start the web server that hosts the visualization:
python app.py
The app detects every entry under logs, so you can switch between entries to explore and compare runs.
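Copying a run entry into logs, as described in the first step, can be scripted. The following is a minimal sketch, not part of this repository: the helper name copy_run_to_logs and the check for the four log files listed above are our own assumptions, and the run directory you pass in (for example vine-run-info/most-recent) depends on where your TaskVine manager ran.

```python
import shutil
from pathlib import Path

# Log files the README lists as typically present under vine-logs.
# Checking for them is our own (hypothetical) sanity check.
EXPECTED_LOGS = ("debug", "performance", "taskgraph", "transactions")


def copy_run_to_logs(run_dir, logs_root, name):
    """Copy a TaskVine run entry (e.g. vine-run-info/most-recent) to logs_root/name.

    Hypothetical helper: returns the destination path, raises
    FileNotFoundError if the entry does not look like a TaskVine run,
    and FileExistsError if logs_root/name already exists.
    """
    src = Path(run_dir)
    vine_logs = src / "vine-logs"
    if not vine_logs.is_dir():
        raise FileNotFoundError(f"no vine-logs directory under {src}")
    missing = [f for f in EXPECTED_LOGS if not (vine_logs / f).exists()]
    if missing:
        raise FileNotFoundError(f"{vine_logs} is missing: {', '.join(missing)}")
    dest = Path(logs_root) / name
    # copytree follows a top-level symlink such as most-recent, creates
    # logs_root if needed, and refuses to overwrite an existing destination.
    shutil.copytree(src, dest)
    return dest
```

For example, copy_run_to_logs("vine-run-info/most-recent", "logs", "my_run") would create logs/my_run/vine-logs, which can then be passed to python generate_csv.py logs/my_run.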
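One of the quantities the CSV step reports is how many tasks are running concurrently over time. As an illustration only (generate_csv.py may compute this differently, and the function name and the (start, end) input format are hypothetical), such a curve can be derived from per-task start and end timestamps with a sweep over start and end events:

```python
from typing import Iterable, List, Tuple


def concurrency_timeline(
    intervals: Iterable[Tuple[float, float]],
) -> List[Tuple[float, int]]:
    """Return (time, running_tasks) points from hypothetical (start, end) pairs."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a task starts: one more running
        events.append((end, -1))    # a task ends: one fewer running
    # Sort by time; at equal timestamps (t, -1) sorts before (t, 1).
    events.sort()
    timeline: List[Tuple[float, int]] = []
    running = 0
    for t, delta in events:
        running += delta
        if timeline and timeline[-1][0] == t:
            # Keep only the final count for each timestamp.
            timeline[-1] = (t, running)
        else:
            timeline.append((t, running))
    return timeline
```

With the hypothetical intervals [(0, 10), (2, 5), (5, 8), (9, 12)], concurrency_timeline returns [(0, 1), (2, 2), (5, 2), (8, 1), (9, 2), (10, 1), (12, 0)]. Because only the final count per timestamp is kept, a task ending at 5 and another starting at 5 yield the single point (5, 2).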
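When several runs are stored under logs, regenerating their data one by one gets tedious. Below is a hypothetical driver (build_commands and run_all are our own helpers, not part of the tool) that invokes generate_csv.py and, optionally, generate_graph.py exactly as in the commands given earlier, once for every directory under logs. It assumes it is launched from the repository root, where both scripts live.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(logs_root="logs", with_graph=False) -> List[List[str]]:
    """Build the script invocations for every entry (subdirectory) of logs_root."""
    commands = []
    entries = sorted(p for p in Path(logs_root).iterdir() if p.is_dir())
    for entry in entries:
        # Same command as in the README, using the current interpreter.
        commands.append([sys.executable, "generate_csv.py", str(entry)])
        if with_graph:
            # Optional and potentially slow for very large workflows.
            commands.append([sys.executable, "generate_graph.py", str(entry)])
    return commands


def run_all(logs_root="logs", with_graph=False) -> None:
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in build_commands(logs_root, with_graph):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_all() regenerates the CSV files for every entry; run_all(with_graph=True) also builds the task graphs. Keeping command construction separate from execution makes it easy to inspect what would run before launching a long job.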
Here are some quick demonstrations of the visualization page. Note that these demos use a larger-scale example than the one included in this repository.
Optionally, we also provide more lightweight visualizations using matplotlib; take a look at