-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40359: Implement Butler.get for matplotlib generated png files #877
base: main
Are you sure you want to change the base?
Conversation
7a59242
to
4013c4e
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #877 +/- ##
=======================================
Coverage 87.72% 87.72%
=======================================
Files 274 274
Lines 36096 36110 +14
Branches 7549 7550 +1
=======================================
+ Hits 31665 31678 +13
- Misses 3259 3260 +1
Partials 1172 1172
☔ View full report in Codecov by Sentry. |
4013c4e
to
0078c8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Is it impossible for the Figure->numpy conversion to work?
I need to think about how easy it would be to add in-memory datastore tests like we do in test_parquet.py. As things stand I think that will break because of the dummy converter not being real.
@@ -27,6 +27,8 @@ | |||
import unittest | |||
from random import Random | |||
|
|||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a protected import since numpy is not butler requirement (although effectively it is since astropy won't work without it).
|
||
fig = butler.get(ref) | ||
# Ensure that the result is a figure | ||
self.assertTrue(isinstance(fig, pyplot.Figure)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertTrue(isinstance(fig, pyplot.Figure)) | |
self.assertIsinstance(fig, pyplot.Figure) |
# Ensure that the result is a figure | ||
self.assertTrue(isinstance(fig, pyplot.Figure)) | ||
image = butler.get(ref, storageClass="NumpyArray") | ||
self.assertTrue(isinstance(image, np.ndarray)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertTrue(isinstance(image, np.ndarray)) | |
self.assertIsinstance(image, np.ndarray) |
return fig | ||
|
||
@staticmethod | ||
def dummyCovnerter(cls: np.ndarray) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name has a typo in it. Also see comment about naming I make above.
I am confused by the signature. Surely this should take a plt.Figure
and return an np.ndarray
? Is there no way to do that conversion? It is going to be problematic in some contexts (eg in-memory datastore) if this method is not implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is just what the name says, a dummy converter. This is what I was trying to get across in slack: In some cases we want to just load the png file directly, although a numpy array is fine. Yes, it is possible to convert a Figure to a numpy array, but it's messy. You basically have to create a dummy file, save the figure to the dummy file, then read the dummy file as a numpy array. 👎
I did the cursory stack overflow search, and that was the recommended solution. I also looked at the matplotlib codebase a bit and it does indeed look like it would be non-trivial to try and convert the figure to a numpy array directly. This is probably why the method wasn't implemented in the first place.
The other solution that I thought of is that we could define a PNG
store class, which would be an IoBytes
instance, and convert plots and arrays to and from the PNG
class. In other words
@staticmethod
def fig_to_png(fig):
buf = IoBytes()
fig.savefig(buf)
return buf
@staticmethod
def png_to_array(png):
return plt.imread(png)
@staticmethod
def png_to_fig(png):
arr = png_to_array(png)
fig = plt.figure()
plt.imshow(arr)
return fig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely untested but can something like this work?
def fig_to_np(fig: pyplot.figure.Figure) -> np.ndarray:
buf = io.BytesIO()
fig.savefig(buf, format="png")
return plt.imread(buf, format="png")
?
|
||
def _writeFile(self, inMemoryDataset: Any) -> None: | ||
# docstring inherited from FileFormatter._writeFile | ||
inMemoryDataset.savefig(self.fileDescriptor.location.path) | ||
|
||
@staticmethod | ||
def fromArray(cls: np.ndarray) -> plt.Figure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be a static method in the formatter class. It's okay for this to be a private function in the file that is not exported and only referenced from the yaml file. The first parameter should not be named cls
, it should be array
or something.
@@ -38,8 +41,22 @@ class MatplotlibFormatter(FileFormatter): | |||
|
|||
def _readFile(self, path: str, pytype: type[Any] | None = None) -> Any: | |||
# docstring inherited from FileFormatter._readFile | |||
raise NotImplementedError(f"matplotlib figures cannot be read by the butler; path is {path}") | |||
return plt.imread(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is returning something unexpected (and effectively backwards from expectations) please add a comment saying that it's not returning a figure.
The other thing that I'll add is that in general you don't want to load a saved figure as a figure, as what you're actually loading is a png file that doesn't have information about the axes, size of the figure, etc. So the dummy figure that is loaded is rarely (if ever) what the user actually desires. |
Yes, I understand that what you get back isn't what you put in, and that is a bit of a problem in butler in general. The problem here though is that if an in-memory datastore is involved behind the scenes everything is going to break because you put a Figure and then you do a butler.get() and all it has is a Figure -- you get back what you put in but if you ask for the NumpyArray version by specifying a different storage class it won't work and will fall over because there is no functioning converter to even attempt it. In the general case you can't tell what the datastore is doing. If the converter has to use a BytesIO buffer to pretend to write a file to disk and then call imread on that buffer with format="png" then that would help a lot. |
There is code in the wild that does: ref = butler.registry.findDataset(plot_name, detector=detector)
print ("Plot number", plot_name)
uri = butler.getURI(ref)
display(Image(data=uri.read())) so we should consider adding a |
I like the idea of using storage class conversions to deal with this. |
Checklist
doc/changes