[SPARK-48766][PYTHON] Document the behavior difference of `extraction` between `element_at` and `try_element_at`

### What changes were proposed in this pull request?
Document the behavior difference of `extraction` between `element_at` and `try_element_at`

### Why are the changes needed?
When the function `try_element_at` was introduced in Spark 3.5, its `extraction` handling was unintentionally inconsistent with that of `element_at`, which causes confusion: `element_at` treats a string `extraction` as a literal, while `try_element_at` treats it as a column name.

This PR documents this behavior difference (I don't think we can fix it, since that would be a breaking change):
```
In [1]: from pyspark.sql import functions as sf

In [2]: df = spark.createDataFrame([({"a": 1.0, "b": 2.0}, "a")], ['data', 'b'])

In [3]: df.select(sf.try_element_at(df.data, 'b')).show()
+-----------------------+
|try_element_at(data, b)|
+-----------------------+
|                    1.0|
+-----------------------+

In [4]: df.select(sf.element_at(df.data, 'b')).show()
+-------------------+
|element_at(data, b)|
+-------------------+
|                2.0|
+-------------------+
```
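
As an aside (not part of this commit), either behavior can be requested explicitly by passing a `Column` instead of a bare string: `sf.lit(...)` forces the literal-key interpretation and `sf.col(...)` forces the column-name interpretation, for both functions. A minimal sketch, with the expected values in the comments inferred from the semantics above:
```
In [5]: df.select(sf.try_element_at(df.data, sf.lit('b'))).show()  # literal key 'b' -> 2.0

In [6]: df.select(sf.element_at(df.data, sf.col('b'))).show()      # key read from column 'b', i.e. 'a' -> 1.0
```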

### Does this PR introduce _any_ user-facing change?
Doc changes only.

### How was this patch tested?
CI; added doctests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47161 from zhengruifeng/doc_element_at_extraction.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
zhengruifeng committed Jul 1, 2024
1 parent 5c29d8d commit 5ac7c9b
Showing 1 changed file with 36 additions and 0 deletions.
python/pyspark/sql/functions/builtin.py
@@ -14098,10 +14098,13 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column:
Notes
-----
The position is not zero based, but 1 based index.
If extraction is a string, :meth:`element_at` treats it as a literal string,
while :meth:`try_element_at` treats it as a column name.

See Also
--------
:meth:`get`
:meth:`try_element_at`

Examples
--------
@@ -14148,6 +14151,17 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column:
+-------------------+
| NULL|
+-------------------+

Example 5: Getting a value from a map using a literal string as the key

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0}, "a")], ['data', 'b'])
>>> df.select(sf.element_at(df.data, 'b')).show()
+-------------------+
|element_at(data, b)|
+-------------------+
| 2.0|
+-------------------+
"""
return _invoke_function_over_columns("element_at", col, lit(extraction))

@@ -14172,6 +14186,17 @@ def try_element_at(col: "ColumnOrName", extraction: "ColumnOrName") -> Column:
extraction :
index to check for in array or key to check for in map

Notes
-----
The position is not zero based, but 1 based index.
If extraction is a string, :meth:`try_element_at` treats it as a column name,
while :meth:`element_at` treats it as a literal string.

See Also
--------
:meth:`get`
:meth:`element_at`

Examples
--------
Example 1: Getting the first element of an array
@@ -14228,6 +14253,17 @@ def try_element_at(col: "ColumnOrName", extraction: "ColumnOrName") -> Column:
+-----------------------+
| NULL|
+-----------------------+

Example 6: Getting a value from a map using a column name as the key

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0}, "a")], ['data', 'b'])
>>> df.select(sf.try_element_at(df.data, 'b')).show()
+-----------------------+
|try_element_at(data, b)|
+-----------------------+
| 1.0|
+-----------------------+
"""
return _invoke_function_over_columns("try_element_at", col, extraction)
