apply_neighborhood gives error when time dimension is missing #305

Open

EmileSonneveld (Contributor) opened this issue Jun 12, 2024 · 0 comments
When the time dimension has been removed (here via reduce_dimension over "t"), apply_neighborhood fails with the following error:

OpenEO batch job failed: java.lang.ClassCastException: class geotrellis.layer.SpatialKey cannot be cast to class geotrellis.layer.SpaceTimeKey (geotrellis.layer.SpatialKey and geotrellis.layer.SpaceTimeKey are in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @7c4d1c7b)
Traceback (most recent call last):
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1375, in <module>
    main(sys.argv)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1040, in main
    run_driver()
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1011, in run_driver
    run_job(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1104, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1581, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1613, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 721, in apply_neighborhood
    return data_cube.apply_neighborhood(process=process, size=size, overlap=overlap, env=env, context=context)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 1077, in apply_neighborhood
    retiled_collection = self._apply_to_levels_geotrellis_rdd(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 113, in _apply_to_levels_geotrellis_rdd
    pyramid = Pyramid({
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 114, in <dictcomp>
    k: self._create_tilelayer(func(l.srdd.rdd(), k), l.layer_type if target_type==None else target_type , k)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 1078, in <lambda>
    lambda rdd, level: jvm.org.openeo.geotrellis.OpenEOProcesses().retile(rdd, sizeX, sizeY, overlap_x, overlap_y))
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1746.retile.
: java.lang.ClassCastException: class geotrellis.layer.SpatialKey cannot be cast to class geotrellis.layer.SpaceTimeKey (geotrellis.layer.SpatialKey and geotrellis.layer.SpaceTimeKey are in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @7c4d1c7b)
	at geotrellis.util.package$withGetComponentMethods.getComponent(package.scala:35)
	at org.openeo.geotrellis.OpenEOProcesses.filterNegativeSpatialKeys(OpenEOProcesses.scala:763)
	at org.openeo.geotrellis.OpenEOProcesses.retile(OpenEOProcesses.scala:921)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)

Adding an empty time dimension does not seem to fix it.
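For reference, "adding an empty time dimension" in the UDF output (as hinted at in the commented-out code at the end of the UDF below) can be sketched as follows; the shapes and the placeholder date are illustrative, and as noted above this did not resolve the server-side ClassCastException, which occurs before the UDF output is handled:

```python
import numpy as np
import xarray as xr

# Build a bands/y/x cube shaped like the UDF output (shapes are illustrative).
output_np = np.zeros((1, 256, 256), dtype=np.float32)
output_data = xr.DataArray(output_np, dims=["bands", "y", "x"])

# Re-add a time dimension with a single placeholder date,
# mirroring the commented-out hint in the UDF.
output_data = output_data.expand_dims(dim={"t": ["2000-01-01"]})

print(output_data.dims)   # ('t', 'bands', 'y', 'x')
print(output_data.shape)  # (1, 1, 256, 256)
```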

Process graph:
{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B04"
        ],
        "id": "SENTINEL2_L2A",
        "properties": {
          "eo:cloud_cover": {
            "process_graph": {
              "lte1": {
                "process_id": "lte",
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": 20
                },
                "result": true
              }
            }
          }
        },
        "spatial_extent": {
          "west": 8.908,
          "south": 53.791,
          "east": 8.96,
          "north": 54.016
        },
        "temporal_extent": [
          "2022-10-01",
          "2022-12-01"
        ]
      }
    },
    "reducedimension1": {
      "process_id": "reduce_dimension",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "dimension": "t",
        "reducer": {
          "process_graph": {
            "min1": {
              "process_id": "min",
              "arguments": {
                "data": {
                  "from_parameter": "data"
                }
              },
              "result": true
            }
          }
        }
      }
    },
    "applyneighborhood1": {
      "process_id": "apply_neighborhood",
      "arguments": {
        "data": {
          "from_node": "reducedimension1"
        },
        "overlap": [
          {
            "dimension": "x",
            "value": 32,
            "unit": "px"
          },
          {
            "dimension": "y",
            "value": 32,
            "unit": "px"
          }
        ],
        "process": {
          "process_graph": {
            "runudf1": {
              "process_id": "run_udf",
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "runtime": "Python",
                "udf": "import functools\nimport sys\nfrom openeo.udf import XarrayDataCube\nfrom typing import Dict\nimport xarray as xr\nfrom openeo.udf.debug import inspect\n\n# The onnx_deps folder contains the extracted contents of the dependencies archive provided in the job options\nsys.path.insert(0, \"onnx_deps\") \nimport onnxruntime as ort\n\[email protected]_cache(maxsize=5)\ndef _load_ort_session(model_name: str):\n    \"\"\"\n    Loads an ONNX model from the onnx_models folder and returns an ONNX runtime session.\n\n    Extracting the model loading code into a separate function allows us to cache the loaded model.\n    This prevents the model from being loaded for every chunk of data that is processed, but only once per executor.\n\n    Should you have to download the model from a remote location, you can add the download code here, and cache the model.\n\n    Make sure that the arguments of the method you add the @functools.lru_cache decorator to are hashable.\n    Be careful with using this decorator for class methods, as the self argument is not hashable. In that case, you can use a static method.\n\n    More information on this functool can be found here: https://docs.python.org/3/library/functools.html#functools.lru_cache\n    \"\"\"\n    # the onnx_models folder contains the content of the model archive provided in the job options\n    return ort.InferenceSession(f\"onnx_models/{model_name}\") \n\ndef apply_datacube(cube: xr.DataArray, context: Dict) -> xr.DataArray:\n    inspect(cube)\n    cube = cube.transpose(\"bands\", \"y\", \"x\")  # Make sure the data is in the correct order\n\n    input_np = cube.values  # Only perform inference for the first date and get the numpy array\n    input_np = input_np.reshape(1,1,256,256)  # Neural network expects shape (1, 1, 256, 256)\n    inspect(input_np)\n    # Load the model\n    ort_session = _load_ort_session(\"test_model.onnx\") #name of the model in the archive\n\n    # Perform inference\n    ort_inputs = {ort_session.get_inputs()[0].name: input_np}\n    ort_outputs = ort_session.run(None, ort_inputs)\n    output_np = ort_outputs[0].reshape(1,256,256)  # Reshape the output to the expected shape\n\n    # Convert the output back to an xarray DataArray\n    output_data = xr.DataArray(\n        output_np,\n        dims=[\"bands\", \"y\", \"x\"],\n    ) \n    inspect(output_data)\n    # The input data did not contain a time dimension in this case, so we do not add it to the output. \n    # If the input data contains a time dimension, you should add it here:\n    # output_data = output_data.expand_dims(\n    #     dim={\n    #         \"t\": [\"2000-01-01\"], \n    #     },\n    # )\n\n\n    return output_data\n"
              },
              "result": true
            }
          }
        },
        "size": [
          {
            "dimension": "x",
            "value": 256,
            "unit": "px"
          },
          {
            "dimension": "y",
            "value": 256,
            "unit": "px"
          }
        ]
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "applyneighborhood1"
        },
        "format": "netCDF",
        "options": {}
      },
      "result": true
    }
  }
}
jdries added a commit that referenced this issue Jun 12, 2024