This tool provides an interactive way to visualize logs generated by TaskVine.

cooperative-computing-lab/taskvine-report-tool

This is an interactive visualization tool for TaskVine, a task scheduler that lets large workflows run efficiently on HPC clusters. The tool helps users organize and visualize the results recorded in TaskVine log files.

Quick Install

Install the required packages with conda:

conda install -y flask pandas tqdm bitarray python-graphviz
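After installing, it can be useful to confirm that the packages are visible to the Python interpreter you plan to use. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for the conda packages above (`python-graphviz` is imported as `graphviz`).

```python
from importlib.util import find_spec

# Import names for the conda packages listed above; python-graphviz is
# imported as "graphviz".
REQUIRED_MODULES = ("flask", "pandas", "tqdm", "bitarray", "graphviz")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


# An empty list means every required module was found.
print(missing_modules())
```

If the printed list is not empty, the named modules are not installed in the active environment.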

Usage Instructions

The process of generating a report from a log directory involves three steps:

  1. Organize the logs directory properly. All log entries should be stored in the logs directory.

    By default, when running TaskVine, the manager generates a log directory named vine-logs located under vine-run-info/most-recent, though most-recent can be replaced by any entry within vine-run-info.

    Copy most-recent (or any other entry from vine-run-info) into the logs directory.

    There is an example in the repository which can be used out of the box:

    logs
    └── test_example
        └── vine-logs
            ├── debug
            ├── performance
            ├── taskgraph
            ├── transactions
            └── workflow.json
    

    Typically, log files under vine-logs include at least debug, performance, taskgraph, and transactions.

    A workflow.json file is also normally produced by TaskVine, but it won't be used by this tool.

  2. Once the logs are properly arranged, you can produce the intermediate CSV and graph files for visualization.

    The reason we split this process is that data processing is usually compute-intensive, while visualization alone is relatively fast.

    By generating the data once, it can be reused multiple times.

    The first step is to generate the CSV files, which capture most aspects of a run, including how tasks are distributed among workers, the execution time of each task, and how many tasks run concurrently at different times.

    To generate the CSV files, use:

    python generate_csv.py logs/[log_name]
    

    For example:

    python generate_csv.py logs/test_example
    

    Additionally, users can optionally generate the task graph to further examine the relationships between tasks. We separate this process because it can take a significant amount of time if there are hundreds of thousands of tasks.

    To generate the graph files, use:

    python generate_graph.py logs/[log_name]
    

    For example:

    python generate_graph.py logs/test_example/
    
  3. Once all the data is generated, run the following command to open a port for online visualization:

    python app.py
    

    All entries under logs are detected, so you can switch between them to explore and compare runs.
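The three steps above can also be scripted. The sketch below is an illustration under stated assumptions, not code from this repository: the helper names (`stage_run`, `missing_logs`, `report_commands`) are hypothetical, and the expected file names come from the example layout in step 1. It copies one vine-run-info entry into logs, reports which expected log files are absent, and builds the commands from steps 2 and 3.

```python
import shutil
from pathlib import Path

# File names taken from the example layout in step 1; workflow.json is
# omitted because the tool does not use it.
EXPECTED_LOGS = ("debug", "performance", "taskgraph", "transactions")


def stage_run(run_entry, logs_root, log_name):
    """Copy one vine-run-info entry to logs_root/log_name and return the new path."""
    source = Path(run_entry).resolve()  # follow the entry in case it is a symlink
    target = Path(logs_root) / log_name
    shutil.copytree(source, target)  # raises FileExistsError if target exists
    return target


def missing_logs(log_dir):
    """Return the expected log files that are absent from log_dir/vine-logs."""
    vine_logs = Path(log_dir) / "vine-logs"
    return [name for name in EXPECTED_LOGS if not (vine_logs / name).exists()]


def report_commands(log_name, with_graph=False):
    """Build the commands from steps 2 and 3 as argument lists."""
    commands = [["python", "generate_csv.py", f"logs/{log_name}"]]
    if with_graph:
        # The graph step is optional and can be slow for very large runs.
        commands.append(["python", "generate_graph.py", f"logs/{log_name}"])
    commands.append(["python", "app.py"])
    return commands
```

For instance, `stage_run("vine-run-info/most-recent", "logs", "my_run")` would create logs/my_run (the name `my_run` is only a placeholder). Each list returned by `report_commands("my_run", with_graph=True)` can then be passed to `subprocess.run(cmd, check=True)` from the repository root. Returning the commands as data instead of running them keeps the sketch free of side effects until you choose to execute it.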

Examples

Here are some quick demonstrations of the visualization page. Note that this demo uses a larger-scale example than the one included in this repository.

screenshot_1

screenshot_2

screenshot_3

screenshot_4

screenshot_5

screenshot_6

screenshot_7

We also provide more lightweight visualizations using matplotlib; take a look at pyplot.
