This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Commit

Add a note on reusing LazyFrames
avimallu committed Jul 11, 2023
1 parent 422cdb6 commit 3cdc299
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/user-guide/lazy/execution.md
@@ -38,6 +38,10 @@ Above we see that from the 10 million rows there are 14,029 rows that match our

With the default `collect` method Polars processes all of your data as one batch. This means that all the data has to fit into your available memory at the point of peak memory usage in your query.

!!! warning "Reusing `LazyFrame` objects"

Remember that a `LazyFrame` is a query plan, i.e. a promise of computation, not a materialized result. This means that every time you reuse it in a separate downstream query, the whole plan is computed all over again. If the plan contains an operation that doesn't maintain row order (such as a `groupby`), the row order may also differ every time it is run. To avoid this, pass `maintain_order=True` to such operations.

### Execution on larger-than-memory data

If your data requires more memory than you have available, Polars may be able to process the data in batches using *streaming* mode. To use streaming mode you simply pass the `streaming=True` argument to `collect`.