Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): add plot namespace (which defers to hvplot) #13238

Merged
merged 22 commits into from
Dec 31, 2023

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Dec 24, 2023

Much simpler version of #13149

The idea to just defer to hvplot (much like how read_database_uri, as far as I understand, defers to connectorx)

Demo: (note: these plots are interactive, I took a screenshot with my mouse over one of them to show the popup that appears)

image

Screenshot 2023-12-24 105136

image

Doing .collect within a plotting function sounds potentially like a footgun, and a risk that people re-trigger the same expensive IO function over and over again? I'd suggest, at least to start, to only have .plot on DataFrame, so people are expected to collect before plotting

@MarcoGorelli MarcoGorelli force-pushed the hvplot-backend branch 2 times, most recently from 203b007 to fe06c3a Compare December 24, 2023 10:43
@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Dec 24, 2023

@hoxbro @maximlt @jbednar @MarcSkovMadsen I think you're all involved in hvplot (apologies if not - your names are ones I've come across in related discussions)? If so, would be curious to hear your thoughts on this, whether it seems maintainable, and if there's any risks Polars should be aware of.

hvplot has been friendly and welcoming to Polars so I hope this could take the collaboration further 🤝

(I'm aware it's xmas eve, so sorry for writing today - I don't expect a reply, I've just got a long train ride to visit family and so figured I'd try this out)

@ritchie46
Copy link
Member

For some people, Christmas comes early this year.

@MarcSkovMadsen
Copy link

But if plotting requires running some operations for example if using 'by' argument, making boxplot, histogram or using datashader, would you not want the plotting library to control when things are collected?

(I'm not the expert)

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Dec 24, 2023

I'd suggest moving incrementally:

  1. start with DataFrame.plot (based on comments/reactions so far, fairly uncontroversial?)
  2. then, discuss LazyFrame.plot later, as I think that's more complicated (see semi/anti join #5508 (comment) for an example of how it's possible to collect-shoot yourself in the foot). I'd like to see at least some discussion of collect-footguns in the context of plotting before adding LazyFrame.plot. But let's save that for a future PR / discussion if it's OK

@ritchie46
Copy link
Member

then, discuss LazyFrame.plot later

Yes.. that's a big no-no for me.

@jbednar
Copy link

jbednar commented Dec 27, 2023

This sounds great! I can't think of any risks for doing this; hvPlot is part of our daily workflows for Pandas and Dask DataFrames already, and I'm very pleased for Polars to be joining this list. I don't have experience with using Polars myself, but .collect sounds like it might be conceptually similar to Dask .compute()? For Dask we keep it lazy all the way through to plotting so that tools like Datashader can do the actual rendering in a fully distributed fashion, with chunks of each plot rendered to an image or array in parallel and only the final rendered chunk needing to cross the wire out of each remote worker. Datashader doesn't currently support Polars, but I'd hope it will some day, and so it would be good to choose an approach here that anticipates having that work smoothly.

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Dec 27, 2023

Datashader doesn't currently support Polars, but I'd hope it will some day, and so it would be good to choose an approach here that anticipates having that work smoothly.

Hope so too! Looks like there's already an open issue holoviz/datashader#1199

Adding DataFrame.plot now doesn't preclude adding some LazyFrame.plot solution later on, possibly with some safeguards (e.g. a "I know what I'm doing" kind of flag)

In the meantime, I think this would already be an improvement:

  • the plots look good by default
  • as far as I can tell, hvplot is a well-maintained project
  • users would easily be able to discover how to produce high-quality plots during interactive development

It's fairly common to hear user Polars users say they wish they could quickly add .plot to the end of a cell to see how something looks, like they did with pandas. Except this would be a lot better, because of how good hvplot is!

I'll mark as ready-for-review once CI is green

@MarcoGorelli MarcoGorelli marked this pull request as ready for review December 27, 2023 15:42
@MarcoGorelli MarcoGorelli changed the title RFC feat(python): add plot namespace (which defers to hvplot) feat(python): add plot namespace (which defers to hvplot) Dec 27, 2023
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Dec 27, 2023
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure here looks great!

I only wonder where the limits are - are all data types supported, for example? What happens if we want to plot some dataframes with Decimal columns? What kind of error do I get if I try to plot an empty DataFrame / the column I am trying to plot is missing?

I would appreciate some additional tests to showcase this.

py-polars/tests/unit/namespaces/test_plot.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli marked this pull request as draft December 28, 2023 12:28
@@ -392,7 +392,7 @@ def __get__( # type: ignore[override]
return self.fget( # type: ignore[misc]
instance if isinstance(instance, cls) else cls
)
except AttributeError:
except (AttributeError, ImportError):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding this to avoid an importerror if building the docs without hvplot installed. I don't think hvplot should be required for building docs, and it's probably not worth showing plots in the docs themselves

@MarcoGorelli MarcoGorelli marked this pull request as ready for review December 28, 2023 16:50
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good - I am just a bit unsure about relying on the library internals like this. Feels like this would break easily when new hvplot versions are released. It would be nice to get some feedback from the hvplot people.

However, I think this version is probably good enough to be released and get some user feedback. It's probably stable enough on the short term and we can improve as we go along.

py-polars/polars/dataframe/frame.py Outdated Show resolved Hide resolved
# private
not name.startswith("_")
# `.plot` not available on Expr
and name != "plot"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be required - plot is not an empty function so it's skipped. Everything runs fine when I comment this out - am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you try building the docs?

Copy link
Member

@stinodego stinodego Dec 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm no, indeed it fails on building the docs. But the problem isn't that .plot is not available on Expr. The problem is that it calls hvPlotTabularPolars which imports polars, which doesn't exist while building the docs.

I'm not sure of the best solution here. I would ideally want to avoid hardcoding random stuff here as the functionality is already complex enough. But I'm not sure what the best solution would be 🤔

I think we should probably update sphinx_accessor in some way to avoid this from happening. Looking into it as we speak...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but my understanding is that expr_dispatch is only needed for methods available on Expr

you're right about complexity, I just couldn't think of a solution which doesn't add any. maybe there's a more generic solution which would mean that this isn't necessary

cc @alexander-beedie in case you have ideas here

@hoxbro
Copy link

hoxbro commented Dec 31, 2023

I have answered the questions and the changes LGTM.

I would make one suggestion of adding a plot accessor to the LazyFrame, with an error or warning which says plotting is not supported for LazyFrame, please call .collect() beforehand.

@MarcoGorelli
Copy link
Collaborator Author

Thanks for your reviews!

Sure, I've switched to using hvplot.post_patch - that does indeed look better! Glad you're open to a "sister PR" in hvPlot, I'll mark holoviz/hvplot#1243 as ready-for-review if/when this is accepted after the release

I'm not sure about adding a function which only raises for LazyFrame - any function which is defined is going to tab-complete for users, and if something only raises then I think I'd prefer to keep it out of the API (e.g. pivot is only defined for DataFrame)

@hoxbro
Copy link

hoxbro commented Dec 31, 2023

I'm not sure about adding a function which only raises for LazyFrame - any function which is defined is going to tab-complete for users, and if something only raises then I think I'd prefer to keep it out of the API (e.g. pivot is only defined for DataFrame)

That seems like a fair point. My thought was more helping users. An example could be if they started with a DataFrame, did some work, and plotted, and after that, they changed it to a LazyFrame, they will get a non-obvious AttributeError. I don't know if that is too niche of an example. For the autocomplete, at least in IPython/Jupyter, it is possible to exclude it with the __dir__ method.

It was only a suggestion and I trust your judgment 🙂

py-polars/pyproject.toml Outdated Show resolved Hide resolved
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's just the one loose end around the interaction between Sphinx and expr_dispatch. We can try to solve it now or keep it as a TODO. Otherwise, this looks good and I think it's a great addition!

@ritchie46
Copy link
Member

Al right! Let's get this in. Thanks a lot @MarcoGorelli! Curious how this is received.

@ritchie46 ritchie46 merged commit 2341372 into pola-rs:main Dec 31, 2023
17 checks passed
@stinodego stinodego added the highlight Highlight this PR in the changelog label Dec 31, 2023
@alonme
Copy link
Contributor

alonme commented Jan 2, 2024

Might be worth adding Polars in this section
https://github.com/holoviz/hvplot?tab=readme-ov-file#hvplot-works-with-the-tools-you-know-and-love

@MarcoGorelli
Copy link
Collaborator Author

can you add this an issue to hvplot please?

@MarcSkovMadsen
Copy link

MarcSkovMadsen commented Jan 2, 2024

can you add this an issue to hvplot please?

Done. Please verify this is what you suggest @alonme holoviz/hvplot#1244

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature highlight Highlight this PR in the changelog python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants