This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Fixes for examples in the User Guide (#357)
* Fix last example in User Guide / Expressions / Window functions

* Fix example for asof join with tolerance

* Fix typos in Transformations / Pivot and Time Series

* Clarify Lazy API / Query plan
aglebov authored Jul 11, 2023
1 parent d5decdd commit ed42ca4
Showing 6 changed files with 33 additions and 20 deletions.
4 changes: 2 additions & 2 deletions docs/src/python/user-guide/expressions/window.py
@@ -65,12 +65,12 @@
 out = df.sort("Type 1").select(
     pl.col("Type 1").head(3).over("Type 1", mapping_strategy="explode"),
     pl.col("Name")
-    .sort_by(pl.col("Speed"))
+    .sort_by(pl.col("Speed"), descending=True)
     .head(3)
     .over("Type 1", mapping_strategy="explode")
     .alias("fastest/group"),
     pl.col("Name")
-    .sort_by(pl.col("Attack"))
+    .sort_by(pl.col("Attack"), descending=True)
     .head(3)
     .over("Type 1", mapping_strategy="explode")
     .alias("strongest/group"),
4 changes: 3 additions & 1 deletion docs/src/python/user-guide/transformations/joins.py
@@ -143,6 +143,8 @@
 # --8<-- [end:asof]

 # --8<-- [start:asof2]
-df_asof_tolerance_join = df_trades.join_asof(df_quotes, on="time", by="stock")
+df_asof_tolerance_join = df_trades.join_asof(
+    df_quotes, on="time", by="stock", tolerance="1m"
+)
 print(df_asof_tolerance_join)
 # --8<-- [end:asof2]
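The fixed `asof2` snippet now actually passes `tolerance="1m"` (one minute, in Polars' duration-string syntax), so a trade only picks up a quote that is at most one minute older. The plain-Python sketch below models that behaviour under stated assumptions: a backward search (the default asof strategy), matching only within the same `by` key, and an inclusive tolerance bound. The `asof_join` helper and the sample rows are hypothetical; this is not how Polars implements the join.

```python
from datetime import datetime, timedelta
from typing import Optional


def asof_join(trades, quotes, tolerance: Optional[timedelta] = None):
    """Hypothetical backward asof join on "time", matched within "stock".

    For each trade, take the latest quote for the same stock whose time is
    <= the trade time. If `tolerance` is set, discard matches older than it.
    """
    out = []
    for trade in trades:
        candidates = [
            q for q in quotes
            if q["stock"] == trade["stock"] and q["time"] <= trade["time"]
        ]
        match = max(candidates, key=lambda q: q["time"], default=None)
        if match is not None and tolerance is not None:
            # Assumed inclusive bound: a gap exactly equal to `tolerance` still matches.
            if trade["time"] - match["time"] > tolerance:
                match = None
        out.append({**trade, "bid": match["bid"] if match else None})
    return out


def t(hh, mm):
    # Hypothetical timestamps on an arbitrary trading day.
    return datetime(2020, 1, 1, hh, mm)


trades = [{"time": t(9, 1), "stock": "A1"}, {"time": t(9, 6), "stock": "A1"}]
quotes = [
    {"time": t(9, 0), "stock": "A1", "bid": 100},
    {"time": t(9, 2), "stock": "A1", "bid": 101},
]

print([r["bid"] for r in asof_join(trades, quotes)])  # [100, 101]
print([r["bid"] for r in asof_join(trades, quotes, timedelta(minutes=1))])  # [100, None]
```

Without a tolerance, the 09:06 trade is matched to the 09:02 quote even though it is four minutes stale; with a one-minute tolerance that match is dropped, which is the difference the corrected example is meant to show.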
4 changes: 2 additions & 2 deletions docs/src/rust/user-guide/expressions/window.rs
@@ -101,14 +101,14 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     .over(["Type 1"])
     .flatten(),
 col("Name")
-    .sort_by(["Speed"], [false])
+    .sort_by(["Speed"], [true])
     .head(Some(3))
     .list()
     .over(["Type 1"])
     .flatten()
     .alias("fastest/group"),
 col("Name")
-    .sort_by(["Attack"], [false])
+    .sort_by(["Attack"], [true])
     .head(Some(3))
     .list()
     .over(["Type 1"])
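In the Rust version of the example, the second argument of `sort_by` is a list of descending flags, one per sort column, which is why `[false]` becomes `[true]` to mirror `descending=True` in Python. The sketch below models that per-key flag behaviour in plain Python by composing stable sorts, least significant key first. The `sort_by_flags` helper is hypothetical and only reproduces the ordering, not Polars' implementation; the three rows use the Pokémon's base Speed and Attack stats.

```python
from operator import itemgetter


def sort_by_flags(rows, keys, descending):
    """Hypothetical multi-key sort with one descending flag per key.

    Python's sort is stable (also with reverse=True), so sorting by the
    least significant key first and the most significant key last gives
    a lexicographic ordering over all keys.
    """
    if len(keys) != len(descending):
        raise ValueError("need exactly one descending flag per key")
    result = list(rows)
    for key, desc in reversed(list(zip(keys, descending))):
        result.sort(key=itemgetter(key), reverse=desc)
    return result


pokemon = [
    {"Name": "Slowpoke", "Speed": 15, "Attack": 65},
    {"Name": "Jolteon", "Speed": 130, "Attack": 65},
    {"Name": "Rapidash", "Speed": 105, "Attack": 100},
]

# Analogous to .sort_by(["Speed"], [true]).head(Some(3)) within one group.
fastest = sort_by_flags(pokemon, ["Speed"], [True])[:3]
print([p["Name"] for p in fastest])  # ['Jolteon', 'Rapidash', 'Slowpoke']

# Two keys: Attack descending, ties broken by Name ascending.
strongest = sort_by_flags(pokemon, ["Attack", "Name"], [True, False])
print([p["Name"] for p in strongest])  # ['Rapidash', 'Jolteon', 'Slowpoke']
```

With the old `[false]` flag the same call would put Slowpoke first, so `head(3)` would return the slowest rather than the fastest entries.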
6 changes: 3 additions & 3 deletions docs/user-guide/expressions/window.md
@@ -82,9 +82,9 @@ For more exercise, below are some window functions for us to compute:

 - sort all pokemon by type
 - select the first `3` pokemon per type as `"Type 1"`
-- sort the pokemon within a type by speed and select the first `3` as `"fastest/group"`
-- sort the pokemon within a type by attack and select the first `3` as `"strongest/group"`
-- sort the pokemon by name within a type and select the first `3` as `"sorted_by_alphabet"`
+- sort the pokemon within a type by speed in descending order and select the first `3` as `"fastest/group"`
+- sort the pokemon within a type by attack in descending order and select the first `3` as `"strongest/group"`
+- sort the pokemon within a type by name and select the first `3` as `"sorted_by_alphabet"`

 {{code_block('user-guide/expressions/window','examples',['over','implode'])}}

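The updated bullets all follow one pattern: within each `"Type 1"` group, sort by some column and keep the first three names. With `mapping_strategy="explode"`, each group's three values come out as consecutive rows, which is presumably why the frame is sorted by `"Type 1"` first, so that the exploded results of the different expressions line up group by group. The plain-Python sketch below models that computation. The `top3_per_type` helper and the placeholder rows (`f1`, `g1`, ...) are hypothetical, and the sketch does not reflect how Polars evaluates window expressions.

```python
from collections import defaultdict
from operator import itemgetter


def top3_per_type(rows, key, descending):
    """Hypothetical helper: per "Type 1" group, sort by `key` and keep 3 names.

    Groups are visited in sorted "Type 1" order and their results are
    concatenated, mimicking the row layout of an exploded window result.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["Type 1"]].append(row)
    out = []
    for type_1 in sorted(groups):
        ranked = sorted(groups[type_1], key=itemgetter(key), reverse=descending)
        out.extend(r["Name"] for r in ranked[:3])
    return out


# Placeholder data: four "Grass" and three "Fire" entries.
pokemon = [
    {"Name": "g1", "Type 1": "Grass", "Speed": 45, "Attack": 49},
    {"Name": "g2", "Type 1": "Grass", "Speed": 60, "Attack": 62},
    {"Name": "g3", "Type 1": "Grass", "Speed": 80, "Attack": 82},
    {"Name": "g4", "Type 1": "Grass", "Speed": 30, "Attack": 100},
    {"Name": "f1", "Type 1": "Fire", "Speed": 65, "Attack": 52},
    {"Name": "f2", "Type 1": "Fire", "Speed": 80, "Attack": 64},
    {"Name": "f3", "Type 1": "Fire", "Speed": 100, "Attack": 84},
]

print(top3_per_type(pokemon, "Speed", descending=True))    # "fastest/group"
print(top3_per_type(pokemon, "Attack", descending=True))   # "strongest/group"
print(top3_per_type(pokemon, "Name", descending=False))    # "sorted_by_alphabet"
```

Passing `descending=False` for `"Speed"` returns `g4` (the slowest Grass entry) among the results, which is the mislabelled "fastest" output that the fix removes.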
33 changes: 22 additions & 11 deletions docs/user-guide/lazy/query_plan.md
@@ -7,21 +7,28 @@ For any lazy query `Polars` has both:

 We can understand both the non-optimized and optimized query plans with visualization and by printing them as text.

-## Non-optimized query plan
-
-### Graphviz visualization
-
-First we visualize the non-optimized plan by setting `optimized=False`.
-
-{{code_block('user-guide/lazy/query_plan','plan',['show_graph'])}}
-
 <div style="display:none">
 ```python exec="on" result="text" session="user-guide/lazy/query_plan"
 --8<-- "python/user-guide/lazy/query_plan.py:setup"
---8<-- "python/user-guide/lazy/query_plan.py:plan"
 ```
 </div>

+Below we consider the following query:
+
+{{code_block('user-guide/lazy/query_plan','plan',[])}}
+
+```python exec="on" session="user-guide/lazy/query_plan"
+--8<-- "python/user-guide/lazy/query_plan.py:plan"
+```
+
+## Non-optimized query plan
+
+### Graphviz visualization
+
+First we visualise the non-optimized plan by setting `optimized=False`.
+
+{{code_block('user-guide/lazy/query_plan','showplan',['show_graph'])}}
+
 ```python exec="on" session="user-guide/lazy/query_plan"
 --8<-- "python/user-guide/lazy/query_plan.py:createplan"
 ```
@@ -36,7 +43,11 @@ The query plan visualization should be read from bottom to top. In the visualiza

 We can also print the non-optimized plan with `explain(optimized=False)`

-{{code_block('user-guide/lazy/query_plan','plan',['explain'])}}
+{{code_block('user-guide/lazy/query_plan','describe',['explain'])}}
+
+```python exec="on" session="user-guide/lazy/query_plan"
+--8<-- "python/user-guide/lazy/query_plan.py:describe"
+```

 ```text
 FILTER [(col("comment_karma")) > (0)] FROM WITH_COLUMNS:
@@ -82,4 +93,4 @@ The optimized plan is to:
 - apply the filter on the `comment_karma` column while the CSV is being read line-by-line
 - transform the `name` column to uppercase

-In this case the query optimizer has identified that the `filter` can be applied while the CSV is read from disk rather than writing the whole file to disk and then applying it. This optimization is called *Predicate Pushdown*.
+In this case the query optimizer has identified that the `filter` can be applied while the CSV is read from disk rather than reading the whole file into memory and then applying the filter. This optimization is called *Predicate Pushdown*.
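The corrected sentence contrasts two execution orders, and they can be modelled in plain Python with the standard `csv` module. The non-optimized version reads every row into memory, uppercases `name`, and only then filters on `comment_karma > 0`. The pushed-down version tests the predicate as each row is parsed, so rejected rows are never kept. The CSV text and both helper functions are hypothetical, and this only models the idea; Polars' optimizer and CSV reader work differently internally.

```python
import csv
import io

# Hypothetical sample data with the two columns used in the query.
CSV_TEXT = """name,comment_karma
alice,10
bob,-3
carol,0
dave,7
"""


def without_pushdown(source):
    """Mirror the non-optimized plan: read everything, transform, then filter."""
    rows = list(csv.DictReader(source))  # whole file held in memory
    for row in rows:
        row["name"] = row["name"].upper()  # WITH_COLUMNS step
    kept = [r["name"] for r in rows if int(r["comment_karma"]) > 0]  # FILTER step
    return kept, len(rows)


def with_pushdown(source):
    """Mirror the optimized plan: filter while parsing, then transform."""
    kept_rows = []
    for row in csv.DictReader(source):  # rows are parsed one at a time
        if int(row["comment_karma"]) > 0:  # predicate applied during the scan
            kept_rows.append(row)
    return [r["name"].upper() for r in kept_rows], len(kept_rows)


# Each call returns (result, number of rows held in memory).
print(without_pushdown(io.StringIO(CSV_TEXT)))  # (['ALICE', 'DAVE'], 4)
print(with_pushdown(io.StringIO(CSV_TEXT)))     # (['ALICE', 'DAVE'], 2)
```

Both orders produce the same result, but the pushed-down version holds only the two matching rows instead of all four, which is the saving the optimizer is after.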
2 changes: 1 addition & 1 deletion docs/user-guide/transformations/time-series/rolling.md
@@ -121,7 +121,7 @@ The rolling groupby, `groupby_rolling`, is another entrance to the `groupby` con
 not fixed by a parameter `every` and `period`. In a rolling groupby the windows are not fixed at all! They are determined
 by the values in the `index_column`.

-So imagine having a time column with the values `{2021-01-06, 20210-01-10}` and a `period="5d"` this would create the following
+So imagine having a time column with the values `{2021-01-06, 2021-01-10}` and a `period="5d"` this would create the following
 windows:

 ```text
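With the corrected dates, the rolling example becomes easy to check by hand: each window ends at one value of the index column and reaches back one `period`. The plain-Python sketch below computes those windows, assuming half-open intervals `(t - period, t]`, taken here to match the right-closed default of `groupby_rolling`. The `closed` and `offset` parameters of your Polars version may change the boundaries. The `rolling_windows` helper is hypothetical.

```python
from datetime import date, timedelta


def rolling_windows(index, period):
    """Hypothetical helper: one window per index value t, covering (t - period, t]."""
    windows = []
    for t in index:
        start = t - period
        members = [v for v in index if start < v <= t]
        windows.append((start, t, members))
    return windows


index = [date(2021, 1, 6), date(2021, 1, 10)]
for start, end, members in rolling_windows(index, timedelta(days=5)):
    print(f"({start}, {end}] -> {[str(m) for m in members]}")
# (2021-01-01, 2021-01-06] -> ['2021-01-06']
# (2021-01-05, 2021-01-10] -> ['2021-01-06', '2021-01-10']
```

Because the windows are anchored on the data rather than on a fixed `every` grid, the second window also contains 2021-01-06, which falls within five days before 2021-01-10.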
