This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Add a note on reusing LazyFrames #370

Merged 3 commits on Jul 18, 2023
Changes from 2 commits
4 changes: 4 additions & 0 deletions docs/user-guide/lazy/execution.md
@@ -38,6 +38,10 @@ Above we see that from the 10 million rows there are 14,029 rows that match our

With the default `collect` method Polars processes all of your data as one batch. This means that all the data has to fit into your available memory at the point of peak memory usage in your query.

!!! warning "Reusing `LazyFrame` objects"

    Remember that a `LazyFrame` is a query plan, i.e. a promise of computation, and does no implicit caching. This means that every time you reuse it in a separate downstream query, it is computed all over again. If you define an operation on a `LazyFrame` that doesn't maintain row order (such as a `groupby`), the order may also change every time it is run. To avoid this, pass `maintain_order=True` to such operations.

### Execution on larger-than-memory data

If your data requires more memory than you have available, Polars may be able to process the data in batches using *streaming* mode. To use streaming mode, pass the `streaming=True` argument to `collect`.