Documenting "inspect" and context awareness udf.rst #617

Open
wants to merge 4 commits into
base: master
51 changes: 51 additions & 0 deletions docs/udf.rst
Expand Up @@ -317,7 +317,58 @@ To invoke a UDF like this, the apply_neighborhood method is most suitable:
{'dimension': 'y', 'value': 128, 'unit': 'px'}
], overlap=[])

Inspecting variables within UDF
Member

This section is still redundant given the existing "Logging from a UDF" section at

Logging from a UDF
=====================

From time to time, when things are not working as expected,
you may want to log some additional debug information from your UDF, inspect the data that is being processed,
or log warnings.
This can be done using the :py:class:`~openeo.udf.debug.inspect()` function.

For example: to discover the shape of the data cube chunk that you receive in your UDF function:

.. code-block:: python
    :caption: Sample UDF code with ``inspect()`` logging
    :emphasize-lines: 1, 5

    from openeo.udf import inspect
    import xarray

    def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
        inspect(data=[cube.shape], message="UDF logging shape of my cube")
        cube.values = 0.0001 * cube.values
        return cube

After the batch job is finished (or failed), you can find this information in the logs of the batch job.
For example (as explained at :ref:`batch-job-logs`),
use :py:class:`BatchJob.logs() <openeo.rest.job.BatchJob.logs>` in a Jupyter notebook session
to retrieve and filter the logs interactively:

.. image:: _static/images/udf/logging_arrayshape.png

Which reveals in this example a chunking shape of ``[3, 256, 256]``.

I'd propose to fine-tune the existing docs if that is necessary

========================================

To print and inspect variables inside a UDF, use the ``inspect(data=[], message="")`` function.
It writes the supplied data, together with the given message, to the job logs.

.. code-block:: python
    :linenos:
    :caption: ``Inspecting UDFs``
    :emphasize-lines: 8

    # Create a UDF object from inline source code.
    udf = openeo.UDF("""
    from openeo.udf import inspect
    import xarray

    def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
        cube.values = 0.0001 * cube.values
        inspect(data=[type(cube.values)], message="The type of cube.values")
        return cube
    """)

In the above example, the ``inspect`` function is used to log the type of ``cube.values``. Once the job logs are opened in the Web Editor, the result appears
under the supplied message. In this case, the log shows ``Data: <class 'numpy.ndarray'>``.

Passing user defined variables to UDF
========================================

In order to pass variables and values that are used throughout the user side of script, these need to be put in the `context` dictionary.
Member

used throughout the user side of script

I'm not sure what you mean here, e.g. with "user side". And what "script" are you referring to? The script that builds the process graph, or the UDF script?

Author

from the user side, what I mean is the script where the user runs the UDF.

Member

I'm still a bit confused what you mean:

where the user runs the UDF.

the user does not run the UDF user side, it's the backend that executes the UDF backend-side

Author

Sorry, the script where the user defines the UDF script.

Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
Member

Suggested change
- Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
+ Once these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.

variables are defined within context dictionary

This is a bit confusing, because in your example you define them in a user_variable dictionary

See the example below:

.. code-block:: python
    :linenos:
    :caption: ``Passing user defined values``
    :emphasize-lines: 8

    # Create a UDF object from inline source code.
    udf = openeo.UDF("""
    import xarray

    def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
        cube.values = context["factor"] * cube.values  # Accessing the value stored in the context dictionary by the "factor" key.
        return cube
    """, context={"from_parameter": "context"})  # the UDF is now context aware

    user_variable = {"factor": 0.0001}
    cube = cube.apply(udf, context=user_variable)

In the example above, the user stores a preferred value of ``0.0001`` in the ``user_variable`` dictionary,
which can be passed to the UDF and used by the function.
Inside the UDF, this value is read back with ``context["factor"]``.
The parent UDF is called with the user's custom dictionary with `.apply(udf, context = user_variable)`.
Member

FYI: The UDF is not the parent here, apply is the parent.

the hierarchical flow is apply -> run_udf -> your UDF
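
To make that flow concrete, here is a minimal hand-written sketch of the nesting, using only the Python standard library. It is an illustration under stated assumptions, not output captured from the client or the backend's actual resolution logic: the node names (`apply1`, `runudf1`, `loadcollection1`), the `udf_code` placeholder string and the `resolve_udf_context` helper are all hypothetical. The point is only that the user's dictionary sits on `apply`, and `run_udf` picks it up through `{"from_parameter": "context"}`.

```python
import json

# Hypothetical UDF source, kept as a plain string for illustration only.
udf_code = "def apply_datacube(cube, context): ..."

# Simplified, hand-written process graph (node names are made up).
process_graph = {
    "apply1": {
        "process_id": "apply",
        "arguments": {
            "data": {"from_node": "loadcollection1"},
            # The user's dictionary is attached to the parent process, apply.
            "context": {"factor": 0.0001},
            "process": {
                "process_graph": {
                    "runudf1": {
                        "process_id": "run_udf",
                        "arguments": {
                            "data": {"from_parameter": "x"},
                            "udf": udf_code,
                            "runtime": "Python",
                            # Forward apply's context into the UDF.
                            "context": {"from_parameter": "context"},
                        },
                        "result": True,
                    }
                }
            },
        },
    }
}


def resolve_udf_context(graph: dict) -> dict:
    """Toy resolver: return the context that run_udf would receive from apply."""
    apply_args = graph["apply1"]["arguments"]
    run_udf_args = apply_args["process"]["process_graph"]["runudf1"]["arguments"]
    ref = run_udf_args.get("context", {})
    # A {"from_parameter": "context"} reference points at the parent's context.
    if isinstance(ref, dict) and ref.get("from_parameter") == "context":
        return apply_args.get("context", {})
    return ref


udf_context = resolve_udf_context(process_graph)
print(json.dumps(udf_context))
```

In this toy model, dropping the `"context": {"from_parameter": "context"}` entry from `run_udf` leaves the UDF with an empty context, even though `apply` still carries the dictionary.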


Example: ``apply_dimension`` with a UDF
========================================