Documenting "inspect" and context awareness udf.rst #617
base: master
Hi, I made a quick description of how to use the "inspect" function to print within UDFs. Moreover, I tried to give a brief explanation of how users can provide their own local side variables to be used within the UDF.
Nice, thanks!
Hi, thanks for taking the time to contribute! About the part on passing through the context: that is indeed a valid topic to document better. Also see the discussion at #520
Yes, indeed it is better to integrate into the existing docs.
Good point, it's not easy to find succinct examples of properly using context in a UDF. We have some unit test coverage in the VITO backend on this, for example at https://github.com/Open-EO/openeo-geopyspark-driver/blob/cdd731ce6d684eba894beff7c8ac78266ddf12b0/tests/test_api_result.py#L718-L889, but that's probably a bit cryptic. I think there are two use cases to document:
    udf = openeo.UDF(
        "...",
        context={"factor": 12.34},
    )
    cube = cube.apply(udf)

    udf = openeo.UDF(
        "...",
        context={"from_parameter": "context"},
    )
    cube = cube.apply(udf, context={"factor": 12.34})

Both of these patterns have their usefulness. The first is simpler to reason about. The second is the approach to take when the context comes from "higher up", e.g. UDP parameters.
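To make the difference between the two patterns a bit more concrete, here is a rough, plain-Python model of how a `{"from_parameter": "context"}` reference could be resolved against the context of the parent process. Note that `resolve_udf_context` is a hypothetical helper written only for this illustration; it is not part of the openeo client or of any backend, and real backends may handle this differently.

```python
def resolve_udf_context(udf_context, parent_context=None):
    """Return the context the UDF would end up receiving (illustrative model only)."""
    # Pattern 2: the UDF context is a reference to a parameter of the parent process.
    if isinstance(udf_context, dict) and set(udf_context) == {"from_parameter"}:
        if udf_context["from_parameter"] != "context":
            raise ValueError("this sketch only models the 'context' parameter")
        return parent_context
    # Pattern 1: the UDF context is a literal value and is used as-is.
    return udf_context


# Pattern 1: context given directly on the UDF, nothing on the parent.
pattern_1 = resolve_udf_context({"factor": 12.34})

# Pattern 2: context given on the parent, the UDF only refers to it.
pattern_2 = resolve_udf_context(
    {"from_parameter": "context"},
    parent_context={"factor": 12.34},
)

print(pattern_1, pattern_2)
```

In this model both patterns hand the same dictionary to the UDF; the only difference is where the values are supplied.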
That being said, I think the python client should make it simpler to get that second usage pattern right. I made a ticket for that: |
Co-authored-by: Stefaan Lippens <[email protected]>
Added proper description, shorter lines and passing "context" to parent udf
some more notes
@@ -317,7 +317,58 @@ To invoke a UDF like this, the apply_neighborhood method is most suitable:
        {'dimension': 'y', 'value': 128, 'unit': 'px'}
    ], overlap=[])

Inspecting variables within UDF
This section is still redundant given the existing "Logging from a UDF" section at
openeo-python-client/docs/udf.rst
Lines 616 to 645 in 21922f1
Logging from a UDF
=====================

From time to time, when things are not working as expected,
you may want to log some additional debug information from your UDF, inspect the data that is being processed,
or log warnings.
This can be done using the :py:class:`~openeo.udf.debug.inspect()` function.

For example: to discover the shape of the data cube chunk that you receive in your UDF function:

.. code-block:: python
    :caption: Sample UDF code with ``inspect()`` logging
    :emphasize-lines: 1, 5

    from openeo.udf import inspect
    import xarray

    def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
        inspect(data=[cube.shape], message="UDF logging shape of my cube")
        cube.values = 0.0001 * cube.values
        return cube

After the batch job is finished (or failed), you can find this information in the logs of the batch job.
For example (as explained at :ref:`batch-job-logs`),
use :py:class:`BatchJob.logs() <openeo.rest.job.BatchJob.logs>` in a Jupyter notebook session
to retrieve and filter the logs interactively:

.. image:: _static/images/udf/logging_arrayshape.png

Which reveals in this example a chunking shape of ``[3, 256, 256]``.
I'd propose to finetune the existing docs if that is necessary
Passing user defined variables to UDF
========================================

In order to pass variables and values that are used throughout the user side of script, these need to be put in the `context` dictionary.
used throughout the user side of script
I'm not sure what you mean here, e.g. with "user side". And what "script" are you referring to? The script that builds the process graph, or the UDF script?
By "user side" I mean the script where the user runs the UDF.
I'm still a bit confused what you mean:
where the user runs the UDF.
the user does not run the UDF user side, it's the backend that executes the UDF backend-side
Sorry, the script where the user defines the UDF script.
========================================

In order to pass variables and values that are used throughout the user side of script, these need to be put in the `context` dictionary.
Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
- Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
+ Once these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
variables are defined within `context` dictionary
This is a bit confusing, because in your example you define them in a `user_variable` dictionary.
In the example above, the user stores a preferred value of `0.0001` in the `user_variable` dictionary,
which can be passed to the UDF and used by the function.
Later, this value is accessed by calling `context["factor"]` within the UDF.
The parent UDF is called with the user's custom dictionary with `.apply(udf, context = user_variable)`.
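As a side note on the `context["factor"]` lookup: a plain `context["factor"]` fails when no context was passed at all. A small defensive helper inside the UDF could look roughly like the sketch below. The name `get_factor` and the default of `1.0` are assumptions made for this illustration, not something the docs or the client prescribe.

```python
def get_factor(context, default=1.0):
    """Read "factor" from the UDF context, tolerating a missing or empty context."""
    # `context` may be None or {} when the caller did not pass anything.
    value = (context or {}).get("factor", default)
    return float(value)


# Inside the UDF this would be used along the lines of:
#     cube.values = get_factor(context) * cube.values
print(get_factor({"factor": 0.0001}))
print(get_factor(None))
```

Whether a silent default is desirable depends on the use case; raising an error on a missing key can be the safer choice.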
FYI: The UDF is not the parent here, `apply` is the parent.
The hierarchical flow is `apply` -> `run_udf` -> your UDF.
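For reference, that nesting can be sketched as a plain dictionary that approximates the process graph fragment the client builds for `.apply(udf, context=...)`. The node ids (`loadcollection1`, `runudf1`) and the exact set of arguments are assumptions for illustration; the real output of the client may differ in detail.

```python
import json

# Parent process: `apply`, which owns the user-provided context.
apply_node = {
    "process_id": "apply",
    "arguments": {
        "data": {"from_node": "loadcollection1"},  # hypothetical upstream node id
        "context": {"factor": 0.0001},
        # Child process graph: `run_udf`, which wraps the actual UDF code.
        "process": {
            "process_graph": {
                "runudf1": {
                    "process_id": "run_udf",
                    "arguments": {
                        "data": {"from_parameter": "x"},
                        "udf": "...",  # UDF source code, elided here
                        "runtime": "Python",
                        # Reference to the `context` parameter of the parent `apply`.
                        "context": {"from_parameter": "context"},
                    },
                    "result": True,
                }
            }
        },
    },
}

child = apply_node["arguments"]["process"]["process_graph"]["runudf1"]
print(apply_node["process_id"], "->", child["process_id"], "-> your UDF")
print(json.dumps(child["arguments"]["context"]))
```

The `context` on `run_udf` only points at the `context` argument of `apply`, which is why the values have to be supplied on the `apply` call.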