More complex output
dcherian committed Jun 16, 2023
1 parent a425211 commit 1291a5c
Showing 2 changed files with 206 additions and 9 deletions.
207 changes: 200 additions & 7 deletions advanced/apply_ufunc/complex-output-numpy.ipynb
Expand Up @@ -10,10 +10,16 @@
"source": [
"# Handling complex output\n",
"\n",
"We've seen how to use `apply_ufunc` to handle relatively simple functions that transform every element, or reduce along a single dimension.\n",
"\n",
"This lesson will show you how to handle cases where the output is more complex in two ways:\n",
"1. Handle adding a new dimension by speicfying `output_core_dims`\n",
"1. Handling the change in size of an existing dimension by specifying `exclude_dims`\n",
"\n",
"\n",
"## Introduction\n",
"\n",
"A good example is numpy's 1D interpolate function :py:func:`numpy.interp`:\n",
"A good example of a function that returns relatively complex output is numpy's 1D interpolate function `numpy.interp`:\n",
"\n",
"```\n",
" Signature: np.interp(x, xp, fp, left=None, right=None, period=None)\n",
Expand All @@ -24,7 +30,17 @@
" with given discrete data points (`xp`, `fp`), evaluated at `x`.\n",
"```\n",
"\n",
"This function expects a 1D array as input, and returns a 1D array as output.\n"
"This function expects a 1D array as input, and returns a 1D array as output.\n",
"\n",
"\n",
"```{tip}Exercise\n",
"How many core dimensions does `numpy.interp` handle?\n",
"```\n",
"```{tip}Solution\n",
":class:dropdown\n",
"\n",
"One.\n",
"```\n"
]
},
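A minimal NumPy-only sketch of the single core dimension discussed above. The arrays `xp`, `fp`, and `x` are made-up illustrative values, not data from the notebook; the point is only that `np.interp` takes 1D input and returns a 1D result whose length follows `x`, not `xp`.

```python
import numpy as np

# Hypothetical sample points (xp, fp) and query points x, for illustration only.
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([0.0, 10.0, 20.0])
x = np.linspace(0.0, 2.0, 5)

# 1D in, 1D out: the output has as many elements as x.
y = np.interp(x, xp, fp)
print(xp.shape, y.shape)
```

Because the output length is set by `x`, wrapping this function with `apply_ufunc` needs extra information about the output dimension, which is what the rest of the lesson covers.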
{
Expand Down Expand Up @@ -55,7 +71,29 @@
"user_expressions": []
},
"source": [
"## Adding a new dimension"
"## Adding a new dimension\n",
"\n",
"1D interpolation transforms the size of the input along a single dimension.\n",
"\n",
"Logically, we can think of this as removing the old dimension and adding a new dimension.\n",
"\n",
"We provide this information to `apply_ufunc` using the `output_core_dims` keyword argument\n",
"\n",
"```\n",
" output_core_dims : List[tuple], optional\n",
" List of the same length as the number of output arguments from\n",
" ``func``, giving the list of core dimensions on each output that were\n",
" not broadcast on the inputs. By default, we assume that ``func``\n",
" outputs exactly one array, with axes corresponding to each broadcast\n",
" dimension.\n",
"\n",
" Core dimensions are assumed to appear as the last dimensions of each\n",
" output in the provided order.\n",
"```\n",
"\n",
"For `interp` we expect one returned output with one new core dimension that we will call `\"lat_interp\"`.\n",
"\n",
"Specify this using `output_core_dims=[[\"lat_interp\"]]`"
]
},
{
Expand All @@ -79,6 +117,40 @@
")"
]
},
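The following is a sketch of the `output_core_dims` usage described above, run on a small synthetic 1D array instead of the tutorial dataset so that it needs no download. The values in `air` are invented stand-ins, and naming the first input's core dimension `"lat_interp"` is an assumption made here for readability (that input is a plain NumPy array, so it carries no named dimensions of its own).

```python
import numpy as np
import xarray as xr

# Synthetic 1D stand-in for the tutorial's `air` (values are made up).
lat = np.linspace(15, 75, 25)
air = xr.DataArray(np.cos(np.deg2rad(lat)), dims="lat", coords={"lat": lat})

newlat = np.linspace(15, 75, 100)

interped = xr.apply_ufunc(
    np.interp,  # function to apply
    newlat,     # x: plain NumPy array
    air.lat,    # xp
    air,        # fp
    # one entry per input; core dims are passed to the function as the last axis
    input_core_dims=[["lat_interp"], ["lat"], ["lat"]],
    # one entry per output: a single array with one new core dimension
    output_core_dims=[["lat_interp"]],
)
print(interped.dims, interped.shape)
```

The result has the new `"lat_interp"` dimension of length 100 and no `"lat"` dimension, matching the "remove the old dimension, add a new one" picture.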
{
"cell_type": "markdown",
"id": "c0a5b8d4-729e-4d0e-b284-4751c5edc37c",
"metadata": {
"tags": [],
"user_expressions": []
},
"source": [
"```{tip}Exercise\n",
"\n",
"\n",
"Apply the following function using `apply_ufunc`. It adds a new dimension to the input array, let's call it `newdim`. Specify the new dimension using `output_core_dims`\n",
"\n",
"```python\n",
"def add_new_dim(array):\n",
" return np.expand_dims(array, axis=0)\n",
"```\n",
"```{tip}Solution\n",
":class: dropdown\n",
"\n",
"```{code-cell} python\n",
"def add_new_dim(array):\n",
" return np.expand_dims(array, axis=0)\n",
"\n",
"\n",
"xr.apply_ufunc(\n",
" add_new_dim,\n",
" air,\n",
" input_core_dims=[[\"lat\"]],\n",
" output_core_dims=[[\"newdim\", \"lat\"]],\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "7767a63d-20c5-4c2d-8bf0-3b26bc2b336f",
Expand All @@ -87,7 +159,11 @@
"user_expressions": []
},
"source": [
"## Dimensions that change size"
"## Dimensions that change size\n",
"\n",
"Imagine that you want the output to have the same dimension name `\"lat\"` i.e. applying`np.interp` changes the size of the `\"lat\"` dimension.\n",
"\n",
"We get an a error if we specify `\"lat\"` in `output_core_dims`"
]
},
{
Expand All @@ -101,7 +177,8 @@
},
"outputs": [],
"source": [
"%xmode minimal\n",
"# minimize error message\n",
"%xmode Minimal\n",
"\n",
"newlat = np.linspace(15, 75, 100)\n",
"\n",
Expand All @@ -112,7 +189,34 @@
" air,\n",
" input_core_dims=[[\"lat\"], [\"lat\"], [\"lat\"]],\n",
" output_core_dims=[[\"lat\"]],\n",
")"
")\n",
"\n",
"%xmode Context"
]
},
{
"cell_type": "markdown",
"id": "5276692d-0e1d-498a-8d60-e08a4d8b9d3a",
"metadata": {
"tags": [],
"user_expressions": []
},
"source": [
"As the error message points out,\n",
"```\n",
"Only dimensions specified in ``exclude_dims`` with xarray.apply_ufunc are allowed to change size.\n",
"```\n",
"\n",
"Looking at the docstring we need to specify `exclude_dims` as a \"set\":\n",
"\n",
"```\n",
"exclude_dims : set, optional\n",
" Core dimensions on the inputs to exclude from alignment and\n",
" broadcasting entirely. Any input coordinates along these dimensions\n",
" will be dropped. Each excluded dimension must also appear in\n",
" ``input_core_dims`` for at least one argument. Only dimensions listed\n",
" here are allowed to change size between input and output objects.\n",
"```\n"
]
},
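As a sketch of the `exclude_dims` docstring quoted above, the call below keeps the name `"lat"` on the output while letting its size change from 25 to 100. It uses a small synthetic array with made-up values rather than the tutorial dataset. Since the docstring says coordinates along excluded dimensions are dropped, the last step attaches the new latitude values by hand; doing so with `assign_coords` is one reasonable option, not necessarily what the notebook does.

```python
import numpy as np
import xarray as xr

# Synthetic 1D stand-in for the tutorial's `air` (values are made up).
lat = np.linspace(15, 75, 25)
air = xr.DataArray(np.cos(np.deg2rad(lat)), dims="lat", coords={"lat": lat})

newlat = np.linspace(15, 75, 100)

interped = xr.apply_ufunc(
    np.interp,
    newlat,
    air.lat,
    air,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims={"lat"},  # must be a set; "lat" is now allowed to change size
)

# The old "lat" coordinate was dropped, so attach the new one explicitly.
interped = interped.assign_coords(lat=newlat)
print(interped.sizes)
```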
{
Expand Down Expand Up @@ -145,7 +249,96 @@
"user_expressions": []
},
"source": [
"## TODO: Returning multiple variables"
"## Returning multiple variables\n",
"\n",
"Another common, but more complex, case is to handle multiple outputs returned by the function.\n",
"\n",
"As an example we will write a function that returns the minimum and maximum value along the last axis of the array.\n",
"\n",
"We will work with a 2D array, and apply the function `minmax` along the `\"lat\"` dimension:\n",
"```python\n",
"def minmax(array):\n",
" return array.min(axis=-1), array.max(axis=-1)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7accd69-fece-46be-852b-0cb7c432197a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def minmax(array):\n",
" return array.min(axis=-1), array.max(axis=-1)\n",
"\n",
"\n",
"air2d = xr.tutorial.load_dataset(\"air_temperature\").air.isel(time=0)\n",
"air2d"
]
},
{
"cell_type": "markdown",
"id": "c99f6a5e-f977-4828-9418-202d93d0acda",
"metadata": {
"tags": [],
"user_expressions": []
},
"source": [
"By default, Xarray assumes one array is returned by the applied function.\n",
"\n",
"Here we have two returned arrays, and the input core dimension `\"lat\"` is removed (or reduced over).\n",
"\n",
"So we provide `output_core_dims=[[], []]` i.e. an empty list of core dimensions for each of the two returned arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "204c649e-18d4-403a-9366-c46caaaefb52",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"minda, maxda = xr.apply_ufunc(\n",
" minmax,\n",
" air2d,\n",
" input_core_dims=[[\"lat\"]],\n",
" output_core_dims=[[], []],\n",
")\n",
"minda"
]
},
{
"cell_type": "markdown",
"id": "b9b023e8-5ca4-436a-bdfb-3ce35f6ea712",
"metadata": {
"tags": [],
"user_expressions": []
},
"source": [
"```{tip}Exercise\n",
"\n",
"We presented the concept of \"core dimensions\" as the \"smallest unit of data the function could handle.\" Do you understand how the following use of `apply_ufunc` generalizes to an array with more than one dimension? Try it with `air3d = xr.tutorial.load_dataset(\"air_temperature\").air.isel(time=0)`\n",
"\n",
"```{code-cell} python\n",
"minda, maxda = xr.apply_ufunc(\n",
" minmax,\n",
" air2d,\n",
" input_core_dims=[[\"lat\"]],\n",
" output_core_dims=[[],[]],\n",
")\n",
"```\n",
"\n",
"```{tip}Solution\n",
":class:dropdown\n",
"\n",
"We want to use `minmax` to compute the minimum and maximum along the \"lat\" dimension always, regardless of how many dimensions are on the input. So we specify `input_core_dims=[[\"lat\"]]`. The output does not contain the \"lat\" dimension, but we expect two returned variables. So we pass an empty list `[]` for each returned array, so `output_core_dims=[[], []]`\n",
"\n",
"```"
]
}
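To illustrate how the same call generalizes, here is a sketch that applies `minmax` to a small synthetic 3D array. The array named `air3d` below is a made-up stand-in with dimensions `("time", "lat", "lon")`, not the tutorial dataset. Only `"lat"` is declared as a core dimension, so `apply_ufunc` loops over the remaining dimensions automatically.

```python
import numpy as np
import xarray as xr


def minmax(array):
    # Reduce over the last axis, which is where apply_ufunc places core dims.
    return array.min(axis=-1), array.max(axis=-1)


# Synthetic 3D stand-in (values are made up).
air3d = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
)

minda, maxda = xr.apply_ufunc(
    minmax,
    air3d,
    input_core_dims=[["lat"]],
    output_core_dims=[[], []],  # two outputs, neither has a core dimension
)
print(minda.dims, maxda.dims)
```

Both outputs keep `"time"` and `"lon"` and lose `"lat"`, exactly as in the 2D case.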
],
Expand Down
8 changes: 6 additions & 2 deletions advanced/apply_ufunc/simple_numpy_apply_ufunc.ipynb
Expand Up @@ -353,7 +353,9 @@
"applied function returned data with unexpected number of dimensions. Received 2 dimension(s) but expected 3 dimensions with names: ('time', 'lat', 'lon')\n",
"```\n",
"\n",
"means that while `np.mean` did indeed reduce one dimension, we did not tell `apply_ufunc` that this would happen. That is, we need to specify the core dimensions on the input."
"means that while `np.mean` did indeed reduce one dimension, we did not tell `apply_ufunc` that this would happen. That is, we need to specify the core dimensions on the input.\n",
"\n",
"Do that by passing a list of dimension names for each input object. For this function we have one input : `ds` and with a single core dimension `\"time\"` so we have `input_core_dims=[[\"time\"]]`"
]
},
{
Expand All @@ -372,7 +374,9 @@
" ds,\n",
" # specify core dimensions as a list of lists\n",
" # here 'time' is the core dimension on `ds`\n",
" input_core_dims=[[\"time\"]],\n",
" input_core_dims=[\n",
" [\"time\"], # core dimension for ds\n",
" ],\n",
" kwargs={\"axis\": 0},\n",
")"
]
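A self-contained sketch of the `input_core_dims` pattern on a synthetic `DataArray` (made-up values, used here in place of `ds`). One detail in this sketch is a deliberate difference from the cell above: it passes `axis=-1` rather than `axis=0`, on the understanding that `apply_ufunc` moves core dimensions to the last axis before calling the function.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the tutorial data (values are made up).
da = xr.DataArray(
    np.arange(12.0).reshape(3, 2, 2),
    dims=("time", "lat", "lon"),
)

result = xr.apply_ufunc(
    np.mean,
    da,
    input_core_dims=[
        ["time"],  # core dimension for da
    ],
    kwargs={"axis": -1},  # core dims are moved to the end, so reduce the last axis
)
print(result.dims)
```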
Expand Down
