FEAT: add conversion functions for TES and WES to WRROC #5

Karanjot786 · 2024-06-15T15:26:09Z

Description

This PR adds the implementation of the convert_tes_to_wrroc and convert_wes_to_wrroc functions. These functions are responsible for converting TES (Task Execution Service) and WES (Workflow Execution Service) tasks and runs into WRROC (Workflow Run RO-Crate) format.

Changes Included

convert_tes_to_wrroc function: Converts TES tasks to WRROC format by mapping TES properties to WRROC properties.
convert_wes_to_wrroc function: Converts WES runs to WRROC format by mapping WES properties to WRROC properties.
Helper functions:
- convert_to_iso8601: Converts timestamps from RFC 3339 format to ISO 8601 format.
- convert_status: Maps TES and WES task statuses to appropriate CreateAction statuses in WRROC.

Summary by Sourcery

This pull request adds conversion functions for transforming TES tasks and WES runs into WRROC format, along with helper functions for timestamp conversion and status mapping.

New Features:
- Introduced convert_tes_to_wrroc function to convert TES tasks to WRROC format.
- Introduced convert_wes_to_wrroc function to convert WES runs to WRROC format.
Enhancements:
- Added helper function convert_to_iso8601 to convert timestamps from RFC 3339 format to ISO 8601 format.
- Added helper function convert_status to map TES and WES task statuses to appropriate CreateAction statuses in WRROC.

sourcery-ai · 2024-06-15T15:26:15Z

Reviewer's Guide by Sourcery

This pull request introduces two new functions, convert_tes_to_wrroc and convert_wes_to_wrroc, which convert TES (Task Execution Service) tasks and WES (Workflow Execution Service) runs into WRROC (Workflow Run RO-Crate) format. Additionally, helper functions convert_to_iso8601 and convert_status are implemented to assist with timestamp and status conversions, respectively.

File-Level Changes

Files	Changes
`src/converter.py`	Introduced functions for converting TES and WES data to WRROC format, along with necessary helper functions for timestamp and status conversions.

Tips

Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
You can change your review settings at any time by accessing your dashboard:
- Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
- Change the review language;
You can always contact us if you have any questions or feedback.

sourcery-ai

Hey @Karanjot786 - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟡 General issues: 7 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.}

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+                "@id": tes_data["id"],
+                "name": tes_data.get("name", ""),
+                "description": tes_data.get("description", ""),
+                "instrument": tes_data["executors"][0]["image"],


issue (bug_risk): Potential IndexError if 'executors' list is empty

Consider adding a check to ensure 'executors' list is not empty before accessing the first element to avoid potential IndexError.

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+                "name": tes_data.get("name", ""),
+                "description": tes_data.get("description", ""),
+                "instrument": tes_data["executors"][0]["image"],
+                "object": [{"@id": input["url"], "name": input["path"]} for input in tes_data["inputs"]],


issue (bug_risk): Potential KeyError if 'url' or 'path' keys are missing in 'inputs'

Consider using the .get() method or adding a check to ensure 'url' and 'path' keys exist in each input dictionary to avoid potential KeyError.

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+                "description": tes_data.get("description", ""),
+                "instrument": tes_data["executors"][0]["image"],
+                "object": [{"@id": input["url"], "name": input["path"]} for input in tes_data["inputs"]],
+                "result": [{"@id": output["url"], "name": output["path"]} for output in tes_data["outputs"]],


issue (bug_risk): Potential KeyError if 'url' or 'path' keys are missing in 'outputs'

Consider using the .get() method or adding a check to ensure 'url' and 'path' keys exist in each output dictionary to avoid potential KeyError.

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+                "@id": wes_data["run_id"],
+                "name": wes_data["run_log"].get("name", ""),
+                "description": wes_data["run_log"].get("description", ""),
+                "object": [{"@id": input["location"], "name": input["name"]} for input in wes_data["request"]["workflow_params"]],


issue (bug_risk): Potential KeyError if 'location' or 'name' keys are missing in 'workflow_params'

Consider using the .get() method or adding a check to ensure 'location' and 'name' keys exist in each workflow_params dictionary to avoid potential KeyError.

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+                "name": wes_data["run_log"].get("name", ""),
+                "description": wes_data["run_log"].get("description", ""),
+                "object": [{"@id": input["location"], "name": input["name"]} for input in wes_data["request"]["workflow_params"]],
+                "result": [{"@id": output["location"], "name": output["name"]} for output in wes_data["run_log"]["outputs"]],


issue (bug_risk): Potential KeyError if 'location' or 'name' keys are missing in 'outputs'

Consider using the .get() method or adding a check to ensure 'location' and 'name' keys exist in each output dictionary to avoid potential KeyError.

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+
+def convert_to_iso8601(timestamp):
+    # Assuming timestamp is in RFC 3339 format, convert to ISO 8601
+    return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").isoformat() + "Z"


issue: Assumption about timestamp format

The function assumes the timestamp is always in RFC 3339 format. Consider adding error handling for cases where the timestamp format is different.

@Karanjot786, please check this out, verify if the format is the WES and TES standard, if yes, add error handling and if not, add support for other formats, and error handling.

From the example I tested here, it doesn't seem to be the standard format

sourcery-ai · 2024-06-15T15:30:04Z

src/converter.py

+            "end_time": "2024-01-01T01:00:00Z"
+        }
+    }
+    wrroc = convert_wes_to_wrroc(wes_run_example)


suggestion: Example usage in production code

Consider moving the example usage to a separate script or a test file to keep the production code clean.

Suggested change

wrroc = convert_wes_to_wrroc(wes_run_example)

# src/converter.py

if __name__ == "__main__":

wrroc = convert_wes_to_wrroc(wes_run_example)

print(wrroc)

SalihuDickson · 2024-06-18T23:51:04Z

src/converter.py

+
+def convert_to_iso8601(timestamp):
+    # Assuming timestamp is in RFC 3339 format, convert to ISO 8601
+    return datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").isoformat() + "Z"


@Karanjot786, please check this out, verify if the format is the WES and TES standard, if yes, add error handling and if not, add support for other formats, and error handling.

From the example I tested here, it doesn't seem to be the standard format

SalihuDickson · 2024-06-18T23:51:13Z

src/converter.py

You need to add some basic error handling to the conversion, and it seems you're checking for the wes output in the wrong object, it's not supposed to be in the run_logs.

In the mapping document, createAction.instrument is supposed to be a 1 to 1 match for the executors.image field but it only takes one identifier and the executor image can have multiple inputs. @uniqueg, was this done on purpose?

I'm not sure if I understand what you mean. There is only one image per executor in TES, but there may be multiple executors. We need to consider this in the mapping.

@uniqueg, yeah while there can be multiple executors per task, it seems the createAction.instrument field only takes one identifier as explained here. So not sure how they would be a 1 to 1 match between the executors.image (since there can be multiple of these in one task) and the createAction.instruments field. I might be wrong though, just wanted to see if there might be some context that I am missing.

Karanjot786 · 2024-06-19T04:45:58Z

Hi @SalihuDickson ,

Thank you for the feedback. I'll verify the WES and TES formats against their standards and update the conversion functions accordingly. I'll also ensure to add error handling and support for other formats if needed. Regarding the WES output, I'll correct the object reference to avoid looking in the wrong place. Additionally, I'll review the mapping for CreateAction.instrument to ensure it correctly handles multiple executor images.

I'll make the necessary adjustments and resubmit the PR for review.

…sion functions

Karanjot786 · 2024-06-19T06:23:06Z

Hi @SalihuDickson ,

I have made the following updates based on your feedback:

Verified and Corrected Format Handling:
- Ensured that the WES and TES data formats are correctly handled according to their respective standards.
- Added checks to ensure the executors list in TES data is not empty before accessing its first element.
Added Error Handling:
- Implemented error handling for missing keys, invalid formats, and other potential exceptions in both convert_tes_to_wrroc and convert_wes_to_wrroc functions.
- Used the .get() method to safely access keys in inputs and outputs for both TES and WES data, preventing potential KeyError issues.
Corrected Mapping for CreateAction.instrument:
- Ensured that the CreateAction.instrument field correctly handles multiple executor images as needed.
Enhanced Timestamp Conversion:
- Added error handling in the convert_to_iso8601 function to manage different timestamp formats, assuming RFC 3339 format and converting to ISO 8601.

Please review the changes and let me know if there are any further adjustments required.

SalihuDickson · 2024-06-19T07:37:45Z

@Karanjot786, thank you for getting to this so quickly. However these corrections may be making this PR unnecessarily complex. So I think you should revert the most recent commit, so we can merge and work on these changes on the next one. I think you should also separate the WES and TES conversions into different PRs so we can add a little more specificity.

What I want you to consider for the next PR:

First I think we have 3 types of data when working with error handling, when you're dealing with user defined data,

data that must be present and must be of a certain type (or range of types) if not, an error should be thrown
data is optional but should be of a certain type (or range of types) if present or an error should be thrown
And data that is recommended but not mandatory (could have a default value), this should show a warning

I think you need to figure out which applies to each field and write your error handling accordingly, so that we can avoid cryptic errors and gracefully handle these simple error scenarios. Something you can also do is collect all the errors in a list and then throw them all at once to ensure the user sees everything that is wrong with their data without having to rerun the program over and over.

We will also need to consider some best practices for error handling like having one function that all errors are passed into, to ensure uniform error handling and to constants for similar errors. This is quite a lot to consider, please feel free to ask me about anything you may be confused about : )

SalihuDickson · 2024-06-19T07:42:35Z

And you are currently looking for the output field in the run_logs but according to the WES specification, the output field isn't supposed to be in the run_logs. Please look at it here

Karanjot786 · 2024-06-19T11:51:39Z

Hi @SalihuDickson ,

Considering the feedback and to simplify the review process, may I cancel the current PR and create two separate PRs? One for convert_tes_to_wrroc and another for convert_wes_to_wrroc, each in different files. This approach will allow us to address any issues with more specificity and clarity.

Please let me know if this works for you.

SalihuDickson · 2024-06-19T16:27:34Z

Yes @Karanjot786, that is fine.

uniqueg

Thanks, @Karanjot786, but I think we are getting ahead of ourselves here. There are a couple of issues that I think need to be addressed before coding:

Make sure to properly set up the repo first, including packaging, dependency management, linting, static code/type checks, (API) documentation and a CI pipeline before adding any code. Have a look at how @JaeAeich and @Rahuljagwani are doing this.
For the actual code, please prepare an issue to discuss the design of the software. We want to write well organized code that can be extended and easily read, understood and maintained - not just a single script with a bunch of functions. So, please open an issue and think of an OOP-based architecture to organize your code (again, please have a look at what others are doing for inspiration) - then describe it in that issue. Think about abstractions. Think how you want to represent TES and WES request payloads and responses, validate them etc. Same thing for ROC.
Once we have the code design, break down the work into small work packages - perhaps you want to start with your abstraction layer, then do the TES to WRROC, then the WRROC to TES, then the WES to WRROC, then the WRROC to WES etc.
Make sure that each work package (1 work package = 1 PR, roughly) does not become too big. But make sure that all code you add is tested (provide unit tests covering all your code! - think about integration tests as well) and documented.

You can leave this PR open as a draft PR to copy some code over from later on, perhaps, but I don't think this will be merged.

uniqueg · 2024-06-20T12:50:36Z

src/converter.py

@@ -0,0 +1,173 @@
+import datetime


Please follow the contributor guidelines carefully. For Python these include:

Include Google style docstrings on all levels (package, module, class, method and function)

Provide type hints for all function/method signatures (arguments and return value) and global vars, and ideally for local vars as well

uniqueg · 2024-06-20T12:52:26Z

src/converter.py

I'm not sure if I understand what you mean. There is only one image per executor in TES, but there may be multiple executors. We need to consider this in the mapping.

uniqueg · 2024-06-20T13:08:25Z

src/converter.py

+import json
+
+def convert_tes_to_wrroc(tes_data):
+    try:


You should wrap only the code that you expect to raise particular errors in try/except - not just a bunch of commands and catch some generic errors. There is no added benefit of catching, say, an indexing error and then just say that there was an indexing error - the regular error and traceback will have all of that info already - and more.

uniqueg · 2024-06-20T13:10:37Z

src/converter.py

+        if not tes_data or not isinstance(tes_data, dict):
+            raise ValueError("Invalid TES data provided")
+
+        if not tes_data.get("executors"):
+            raise ValueError("TES data missing 'executors' list or 'executors' list is empty")


This is insufficient to validate the TES response data. It's not like any dict with a executors key would be a valid TES response. You need to define models for TES and WES, e.g., with Pydantic. Please discuss this with @JaeAeich - he is already writing models for TES (you could reuse them and add the models for WES).

uniqueg · 2024-06-20T13:13:06Z

src/converter.py

+        wrroc = {
+            "@context": "https://w3id.org/ro/crate/1.1/context",
+            "@graph": [
+                {
+                    "@type": "Dataset",
+                    "conformsTo": "https://w3id.org/ro/wfrun/process/0.4"
+                },
+                {
+                    "@type": "CreateAction",
+                    "@id": tes_data.get("id"),
+                    "name": tes_data.get("name", ""),
+                    "description": tes_data.get("description", ""),
+                    "instrument": tes_data["executors"][0].get("image"),
+                    "object": [{"@id": input.get("url"), "name": input.get("path")} for input in tes_data.get("inputs", []) if input.get("url") and input.get("path")],
+                    "result": [{"@id": output.get("url"), "name": output.get("path")} for output in tes_data.get("outputs", []) if output.get("url") and output.get("path")],
+                    "startTime": convert_to_iso8601(tes_data["logs"][0].get("start_time")),
+                    "endTime": convert_to_iso8601(tes_data["logs"][0].get("end_time")),
+                    "actionStatus": convert_status(tes_data.get("state")),
+                },
+                {
+                    "@type": "SoftwareApplication",
+                    "@id": tes_data["executors"][0].get("image"),
+                    "name": tes_data.get("name", "")
+                }
+            ]
+        }


See if you can model that with Pydantic. The members whose names start with @ might give you problems, but probably you could get around it with the use of aliases (https://docs.pydantic.dev/latest/api/fields/).

uniqueg · 2024-06-20T13:14:29Z

src/converter.py

+    except KeyError as e:
+        raise ValueError(f"Missing expected key: {e}")
+    except IndexError as e:
+        raise ValueError(f"Missing expected index: {e}")
+    except Exception as e:
+        raise ValueError(f"An error occurred during TES to WRROC conversion: {e}")


None of these are helpful... Think of a proper error handling strategy. Look at other code, read some articles etc.

JaeAeich · 2024-06-20T14:51:39Z

Make sure to properly set up the repo first, including packaging, dependency management, linting, static code/type checks, (API) documentation and a CI pipeline before adding any code.

Hey please consider linting and checks for non code files, since this this is a new proj we wouldn't end up with the prob I have with implementing it with TESK (ie huge changes). Consider looking at this PR, you don't necessarily need to implement pre-commit hook as that can hinder with DX, but the .pre-commit-config.yaml in that PR is has pretty good set of rules which you can put in CI.

Karanjot786 and others added 6 commits May 30, 2024 06:59

add .gitignore

d965357

Merge branch 'elixir-cloud-aai:main' into main

c92f437

docs: add initial README.md draft

a1bfee7

Merge branch 'elixir-cloud-aai:main' into main

35a0b1e

build: add Dockerfile

3023c76

FEAT: add conversion functions for TES and WES to WRROC

2550a5e

sourcery-ai bot reviewed Jun 15, 2024

View reviewed changes

Karanjot786 enabled auto-merge (rebase) June 15, 2024 15:35

Karanjot786 disabled auto-merge June 15, 2024 15:35

SalihuDickson requested changes Jun 18, 2024

View reviewed changes

FIX: Add error handling and correct WES/TES format handling in conver…

272fe98

…sion functions

uniqueg requested changes Jun 20, 2024

View reviewed changes

Karanjot786 closed this by deleting the head repository Jun 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: add conversion functions for TES and WES to WRROC #5

FEAT: add conversion functions for TES and WES to WRROC #5

Karanjot786 commented Jun 15, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Jun 15, 2024 •

edited

Loading

sourcery-ai bot left a comment

sourcery-ai bot Jun 15, 2024

sourcery-ai bot Jun 15, 2024

sourcery-ai bot Jun 15, 2024

sourcery-ai bot Jun 15, 2024

sourcery-ai bot Jun 15, 2024

sourcery-ai bot Jun 15, 2024

SalihuDickson Jun 18, 2024

sourcery-ai bot Jun 15, 2024

SalihuDickson Jun 18, 2024

SalihuDickson Jun 18, 2024

uniqueg Jun 20, 2024

SalihuDickson Jun 20, 2024

Karanjot786 commented Jun 19, 2024

Karanjot786 commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

Karanjot786 commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

uniqueg left a comment

uniqueg Jun 20, 2024

uniqueg Jun 20, 2024

uniqueg Jun 20, 2024

uniqueg Jun 20, 2024

uniqueg Jun 20, 2024

uniqueg Jun 20, 2024

JaeAeich commented Jun 20, 2024 •

edited

Loading

-    wrroc = convert_wes_to_wrroc(wes_run_example)
+# src/converter.py
+if __name__ == "__main__":
+    wrroc = convert_wes_to_wrroc(wes_run_example)
+    print(wrroc)

FEAT: add conversion functions for TES and WES to WRROC #5

FEAT: add conversion functions for TES and WES to WRROC #5

Conversation

Karanjot786 commented Jun 15, 2024 • edited by sourcery-ai bot Loading

Description

Changes Included

Summary by Sourcery

sourcery-ai bot commented Jun 15, 2024 • edited Loading

Reviewer's Guide by Sourcery

File-Level Changes

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sourcery-ai bot Jun 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Karanjot786 commented Jun 19, 2024

Karanjot786 commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

Karanjot786 commented Jun 19, 2024

SalihuDickson commented Jun 19, 2024

uniqueg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JaeAeich commented Jun 20, 2024 • edited Loading

Karanjot786 commented Jun 15, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Jun 15, 2024 •

edited

Loading

JaeAeich commented Jun 20, 2024 •

edited

Loading