More complex output

xarray-contrib · Jun 16, 2023 · commit 1291a5c
1 parent a425211

Showing 2 changed files with 206 additions and 9 deletions.
diff --git a/advanced/apply_ufunc/complex-output-numpy.ipynb b/advanced/apply_ufunc/complex-output-numpy.ipynb
@@ -10,10 +10,16 @@
    "source": [
     "# Handling complex output\n",
     "\n",
+    "We've seen how to use `apply_ufunc` to handle relatively simple functions that transform every element, or reduce along a single dimension.\n",
+    "\n",
+    "This lesson will show you how to handle cases where the output is more complex in two ways:\n",
+    "1. Adding a new dimension, by specifying `output_core_dims`\n",
+    "1. Changing the size of an existing dimension, by specifying `exclude_dims`\n",
+    "\n",
     "\n",
     "## Introduction\n",
     "\n",
-    "A good example is numpy's 1D interpolate function :py:func:`numpy.interp`:\n",
+    "A good example of a function that returns relatively complex output is numpy's 1D interpolate function `numpy.interp`:\n",
     "\n",
     "```\n",
     "    Signature: np.interp(x, xp, fp, left=None, right=None, period=None)\n",
@@ -24,7 +30,17 @@
     "    with given discrete data points (`xp`, `fp`), evaluated at `x`.\n",
     "```\n",
     "\n",
-    "This function expects a 1D array as input, and returns a 1D array as output.\n"
+    "This function expects a 1D array as input, and returns a 1D array as output.\n",
+    "\n",
+    "\n",
+    "```{tip} Exercise\n",
+    "How many core dimensions does `numpy.interp` handle?\n",
+    "```\n",
+    "```{tip} Solution\n",
+    ":class: dropdown\n",
+    "\n",
+    "One.\n",
+    "```\n"
    ]
   },
   {
@@ -55,7 +71,29 @@
     "user_expressions": []
    },
    "source": [
-    "## Adding a new dimension"
+    "## Adding a new dimension\n",
+    "\n",
+    "1D interpolation changes the size of the input along a single dimension.\n",
+    "\n",
+    "Logically, we can think of this as removing the old dimension and adding a new dimension.\n",
+    "\n",
+    "We provide this information to `apply_ufunc` using the `output_core_dims` keyword argument:\n",
+    "\n",
+    "```\n",
+    "   output_core_dims : List[tuple], optional\n",
+    "        List of the same length as the number of output arguments from\n",
+    "        ``func``, giving the list of core dimensions on each output that were\n",
+    "        not broadcast on the inputs. By default, we assume that ``func``\n",
+    "        outputs exactly one array, with axes corresponding to each broadcast\n",
+    "        dimension.\n",
+    "\n",
+    "        Core dimensions are assumed to appear as the last dimensions of each\n",
+    "        output in the provided order.\n",
+    "```\n",
+    "\n",
+    "For `interp` we expect one returned output with one new core dimension that we will call `\"lat_interp\"`.\n",
+    "\n",
+    "Specify this using `output_core_dims=[[\"lat_interp\"]]`."
    ]
   },
   {
@@ -79,6 +117,40 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c0a5b8d4-729e-4d0e-b284-4751c5edc37c",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "````{tip} Exercise\n",
+    "Apply the following function using `apply_ufunc`. It adds a new dimension to the input array; let's call it `newdim`. Specify the new dimension using `output_core_dims`.\n",
+    "\n",
+    "```python\n",
+    "def add_new_dim(array):\n",
+    "    return np.expand_dims(array, axis=0)\n",
+    "```\n",
+    "````\n",
+    "````{tip} Solution\n",
+    ":class: dropdown\n",
+    "\n",
+    "```python\n",
+    "def add_new_dim(array):\n",
+    "    return np.expand_dims(array, axis=0)\n",
+    "\n",
+    "\n",
+    "xr.apply_ufunc(\n",
+    "    add_new_dim,\n",
+    "    air,\n",
+    "    input_core_dims=[[\"lat\"]],\n",
+    "    output_core_dims=[[\"newdim\", \"lat\"]],\n",
+    ")\n",
+    "```\n",
+    "````"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "7767a63d-20c5-4c2d-8bf0-3b26bc2b336f",
@@ -87,7 +159,11 @@
     "user_expressions": []
    },
    "source": [
-    "## Dimensions that change size"
+    "## Dimensions that change size\n",
+    "\n",
+    "Imagine that you want the output to keep the same dimension name `\"lat\"`, i.e. applying `np.interp` changes the size of the `\"lat\"` dimension.\n",
+    "\n",
+    "We get an error if we specify `\"lat\"` in `output_core_dims`:"
    ]
   },
   {
@@ -101,7 +177,8 @@
    },
    "outputs": [],
    "source": [
-    "%xmode minimal\n",
+    "# minimize error message\n",
+    "%xmode Minimal\n",
     "\n",
     "newlat = np.linspace(15, 75, 100)\n",
     "\n",
@@ -112,7 +189,34 @@
     "    air,\n",
     "    input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
     "    output_core_dims=[[\"lat\"]],\n",
-    ")"
+    ")\n",
+    "\n",
+    "%xmode Context"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5276692d-0e1d-498a-8d60-e08a4d8b9d3a",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "As the error message points out,\n",
+    "```\n",
+    "Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size.\n",
+    "```\n",
+    "\n",
+    "Looking at the docstring, we need to pass `exclude_dims` as a `set`:\n",
+    "\n",
+    "```\n",
+    "exclude_dims : set, optional\n",
+    "        Core dimensions on the inputs to exclude from alignment and\n",
+    "        broadcasting entirely. Any input coordinates along these dimensions\n",
+    "        will be dropped. Each excluded dimension must also appear in\n",
+    "        ``input_core_dims`` for at least one argument. Only dimensions listed\n",
+    "        here are allowed to change size between input and output objects.\n",
+    "```\n"
    ]
   },
   {
@@ -145,7 +249,96 @@
     "user_expressions": []
    },
    "source": [
-    "## TODO: Returning multiple variables"
+    "## Returning multiple variables\n",
+    "\n",
+    "Another common, but more complex, case is to handle multiple outputs returned by the function.\n",
+    "\n",
+    "As an example, we will write a function that returns the minimum and maximum values along the last axis of the array.\n",
+    "\n",
+    "We will work with a 2D array and apply the function `minmax` along the `\"lat\"` dimension:\n",
+    "```python\n",
+    "def minmax(array):\n",
+    "    return array.min(axis=-1), array.max(axis=-1)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c7accd69-fece-46be-852b-0cb7c432197a",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "def minmax(array):\n",
+    "    return array.min(axis=-1), array.max(axis=-1)\n",
+    "\n",
+    "\n",
+    "air2d = xr.tutorial.load_dataset(\"air_temperature\").air.isel(time=0)\n",
+    "air2d"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c99f6a5e-f977-4828-9418-202d93d0acda",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "By default, Xarray assumes one array is returned by the applied function.\n",
+    "\n",
+    "Here we have two returned arrays, and the input core dimension `\"lat\"` is removed (or reduced over).\n",
+    "\n",
+    "So we provide `output_core_dims=[[], []]`, i.e. an empty list of core dimensions for each of the two returned arrays."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "204c649e-18d4-403a-9366-c46caaaefb52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "minda, maxda = xr.apply_ufunc(\n",
+    "    minmax,\n",
+    "    air2d,\n",
+    "    input_core_dims=[[\"lat\"]],\n",
+    "    output_core_dims=[[], []],\n",
+    ")\n",
+    "minda"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b9b023e8-5ca4-436a-bdfb-3ce35f6ea712",
+   "metadata": {
+    "tags": [],
+    "user_expressions": []
+   },
+   "source": [
+    "````{tip} Exercise\n",
+    "\n",
+    "We presented the concept of \"core dimensions\" as the \"smallest unit of data the function could handle.\" Do you understand how the following use of `apply_ufunc` generalizes to an array with any number of dimensions? Try it with the full 3D array `air3d = xr.tutorial.load_dataset(\"air_temperature\").air` in place of `air2d`.\n",
+    "\n",
+    "```python\n",
+    "minda, maxda = xr.apply_ufunc(\n",
+    "    minmax,\n",
+    "    air2d,\n",
+    "    input_core_dims=[[\"lat\"]],\n",
+    "    output_core_dims=[[], []],\n",
+    ")\n",
+    "```\n",
+    "````\n",
+    "\n",
+    "````{tip} Solution\n",
+    ":class: dropdown\n",
+    "\n",
+    "We always want `minmax` to compute the minimum and maximum along the `\"lat\"` dimension, regardless of how many dimensions the input has, so we specify `input_core_dims=[[\"lat\"]]`. The outputs do not contain the `\"lat\"` dimension, but there are two of them, so we pass an empty list `[]` for each returned array: `output_core_dims=[[], []]`. Every other dimension (`\"lon\"`, plus `\"time\"` for `air3d`) is a broadcast dimension; `apply_ufunc` moves the core dimension `\"lat\"` to the last axis, so `axis=-1` inside `minmax` always refers to `\"lat\"`.\n",
+    "\n",
+    "````"
    ]
   }
  ],
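
The three patterns that `complex-output-numpy.ipynb` introduces (a new core dimension via `output_core_dims`, a resized dimension via `exclude_dims`, and multiple returned arrays) can be tried without downloading the tutorial dataset. The following is a minimal sketch, not the notebook's own code. It assumes a hypothetical synthetic `air1d` profile on an ascending `"lat"` coordinate (`np.interp` expects increasing `xp`) and a random, hypothetical `air3d` array on `("time", "lat", "lon")`, both standing in for `xr.tutorial.load_dataset("air_temperature").air`. The flag `size_change_rejected` is illustrative only.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in data: a linear 1D profile on an ascending "lat" axis
# (np.interp assumes xp is increasing).
lat = np.linspace(15.0, 75.0, 25)
air1d = xr.DataArray(280.0 + 0.5 * lat, dims="lat", coords={"lat": lat}, name="air")
newlat = np.linspace(15, 75, 100)

# 1. Adding a new dimension: the interpolated result gets a new core dim "lat_interp".
interped = xr.apply_ufunc(
    np.interp,  # np.interp(x, xp, fp)
    newlat,  # x: a plain NumPy array, passed through unchanged
    air1d.lat,  # xp
    air1d,  # fp
    input_core_dims=[["lat"], ["lat"], ["lat"]],  # one entry per input
    output_core_dims=[["lat_interp"]],  # one list per output
)

# 2. Keeping the name "lat" while its size changes fails without exclude_dims...
try:
    xr.apply_ufunc(
        np.interp,
        newlat,
        air1d.lat,
        air1d,
        input_core_dims=[["lat"], ["lat"], ["lat"]],
        output_core_dims=[["lat"]],
    )
    size_change_rejected = False
except ValueError:
    size_change_rejected = True

# ...and succeeds once "lat" is listed in exclude_dims (which must be a set).
same_name = xr.apply_ufunc(
    np.interp,
    newlat,
    air1d.lat,
    air1d,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims={"lat"},
)
# Input coordinates along excluded dims are dropped, so attach the new latitudes.
same_name = same_name.assign_coords(lat=newlat)


# 3. Returning multiple variables from a function that reduces the last axis.
def minmax(array):
    return array.min(axis=-1), array.max(axis=-1)


rng = np.random.default_rng(0)
air3d = xr.DataArray(
    rng.normal(280.0, 10.0, size=(4, 25, 3)),
    dims=("time", "lat", "lon"),
    coords={"lat": lat},
)
minda, maxda = xr.apply_ufunc(
    minmax,
    air3d,
    input_core_dims=[["lat"]],  # "lat" is moved to the last axis before the call
    output_core_dims=[[], []],  # two outputs, neither keeps a core dimension
)

print(interped.dims, same_name.sizes["lat"], minda.dims)
```

In the second call the sketch reattaches `newlat` with `assign_coords`, because the `exclude_dims` docstring quoted in the notebook states that input coordinates along excluded dimensions are dropped.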

diff --git a/advanced/apply_ufunc/simple_numpy_apply_ufunc.ipynb b/advanced/apply_ufunc/simple_numpy_apply_ufunc.ipynb
@@ -353,7 +353,9 @@
     "applied function returned data with unexpected number of dimensions. Received 2 dimension(s) but expected 3 dimensions with names: ('time', 'lat', 'lon')\n",
     "```\n",
     "\n",
-    "means that while `np.mean` did indeed reduce one dimension, we did not tell `apply_ufunc` that this would happen. That is, we need to specify the core dimensions on the input."
+    "means that while `np.mean` did indeed reduce one dimension, we did not tell `apply_ufunc` that this would happen. That is, we need to specify the core dimensions on the input.\n",
+    "\n",
+    "Do that by passing a list of dimension names for each input object. For this function we have one input, `ds`, with a single core dimension `\"time\"`, so we have `input_core_dims=[[\"time\"]]`."
    ]
   },
   {
@@ -372,7 +374,9 @@
     "    ds,\n",
     "    # specify core dimensions as a list of lists\n",
     "    # here 'time' is the core dimension on `ds`\n",
-    "    input_core_dims=[[\"time\"]],\n",
+    "    input_core_dims=[\n",
+    "        [\"time\"],  # core dimension for ds\n",
+    "    ],\n",
     "    kwargs={\"axis\": 0},\n",
     ")"
    ]