Add save() and load() method to solution. #52

orionarcher · 2021-12-06T23:12:17Z

This important functionality is currently missing. As it stands, users have to rerun their analysis every time they want to get their results back.

There should be methods to serialize and load a solution object in a single file.

I believe we would need to save:

solute indexes
solvent names and indexes
rdf data
cutoff radii
solvation_data dataframe

That should be sufficient to reconstruct the Solution.


hmacdope · 2021-12-07T02:54:55Z

AFAIK there are 2 main options. The first one (and in my opinion best) is to pickle the Solution class in its entirety, allowing the whole class state to be reconstructed from disk.

I am no expert at this; perhaps @MDAnalysis/coredevs will know more, but see the following link: https://docs.python.org/3/library/pickle.html.
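For reference, a minimal sketch of what the pickle route could look like. `SolutionStandIn`, `save_pickle` and `load_pickle` are hypothetical names made up for illustration; the real Solution holds far more state (including a Universe), and whether all of it pickles cleanly is not something this sketch tests.

```python
import pickle
import tempfile
from pathlib import Path


class SolutionStandIn:
    """Hypothetical stand-in for the real Solution class."""

    def __init__(self, solute_indexes, radii):
        self.solute_indexes = solute_indexes
        self.radii = radii


def save_pickle(obj, path):
    # Dump the whole object, attributes and all, into one binary file.
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    # Only unpickle files you trust: loading can execute arbitrary code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "solution.pkl"
    save_pickle(SolutionStandIn([0, 1, 2], {"water": 3.5}), path)
    restored = load_pickle(path)
```

The class definition has to be importable when the file is loaded, which is part of why pickles are tied to the code that produced them.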

The other option is to parse the data to and from a set of JSON files or similar. I am less in favour of this, as it is fiddly and requires some introspection into class state, which is a bit complicated. It may also separate things into multiple files, which is a lot less clean. On the plus side, the files would be human readable, but I think the downsides outweigh the positives.

orbeckst · 2021-12-07T15:45:22Z

Pickle is quick but not a good format for data. It can happen that you can’t process a pickle file with a different version of Python IIRC.

Results such as RDFs should be in a good data format anyway. CSV (compressed) is the lowest common denominator.
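As a rough illustration of the compressed CSV idea, here is a sketch using only the standard library. The function names and the column names `r_angstrom` and `g_r` are assumptions for the example, not anything defined in the package.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def write_rdf_csv(path, bins, rdf):
    # "wt" opens the gzip stream in text mode; newline="" is what csv expects.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["r_angstrom", "g_r"])  # hypothetical column names
        writer.writerows(zip(bins, rdf))


def read_rdf_csv(path):
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        rows = [(float(r), float(g)) for r, g in reader]
    return [r for r, _ in rows], [g for _, g in rows]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rdf_water.csv.gz"
    write_rdf_csv(path, [0.5, 1.0, 1.5], [0.0, 1.2, 0.9])
    loaded_bins, loaded_rdf = read_rdf_csv(path)
```

Any tool that reads gzip and CSV can open the result, which is the "lowest common denominator" appeal.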

HDF5 is quite flexible and widely used but it is a heavy dependency.

Overall, I would spend some time figuring out how your workflow should work out. It’s often cleaner to have data producers and data analyzers and reduce coupling between the two.

orionarcher · 2021-12-08T17:01:30Z

If we save the output of expensive operations like solvation_data and rdf_data, it would be cheap to recalculate everything else in the load() function. However, Solution also saves a copy of the Universe. Is there an established way to serialize a Universe? I could save links to the files that created the Universe, but that adds unnecessary complexity.

Maybe as @orbeckst suggests it's best to decouple the data production and analysis and not bother with implementing load. Instead, I could write a save_data function that saves a JSON of important statistics but does not save object state.

I favor JSON because it fits with the other infrastructure I use, but that's my personal bias. CSV would likely be more space efficient for the DataFrames.
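Something like the following is what I have in mind for save_data. This is only a sketch: the signature and the keys in `stats` are placeholders, and the numbers are invented for the example.

```python
import json
import tempfile
from pathlib import Path


def save_data(statistics, path):
    """Write a dict of summary statistics to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(statistics, handle, indent=2, sort_keys=True)


# Placeholder statistics; only JSON-friendly types (dicts, lists, numbers, strings).
stats = {
    "solvent_counts": {"water": 4.2, "ion": 0.8},
    "radii": {"water": 3.5, "ion": 2.9},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "solution_data.json"
    save_data(stats, path)
    reloaded = json.loads(path.read_text(encoding="utf-8"))
```

Nothing here stores object state, so there is no load() to keep in sync with the class.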

hmacdope · 2021-12-14T06:02:09Z

As you suggest, perhaps the best initial target is to implement saving functions for analyses in simple, easy-to-use formats.

JSON or CSV is fine, but as @orbeckst says perhaps CSV is the lowest common denominator.

If we were to dump state and make it loadable, I would favour PyHDF5, as everything can be contained in a single space-efficient file. However, writing load() is not an easy job, as it needs to mirror the behaviour of __init__() very closely; otherwise you will end up with unexpectedly None data all over the shop.
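One way to keep a loader in step with __init__() is to have it call __init__() rather than set attributes by hand. A minimal sketch under that assumption follows; `SolutionSketch`, `to_json` and `from_json` are hypothetical, and JSON stands in for HDF5 to keep the example dependency-free.

```python
import json


class SolutionSketch:
    """Hypothetical, simplified stand-in; not the real Solution class."""

    def __init__(self, solute_indexes, radii, rdf_data=None):
        self.solute_indexes = list(solute_indexes)
        self.radii = dict(radii)
        self.rdf_data = rdf_data if rdf_data is not None else {}
        # Derived attribute: cheap to recompute, so it is never written to disk.
        self.n_solute = len(self.solute_indexes)

    def to_json(self):
        # Store only the arguments __init__ needs, keyed by parameter name.
        state = {
            "solute_indexes": self.solute_indexes,
            "radii": self.radii,
            "rdf_data": self.rdf_data,
        }
        return json.dumps(state)

    @classmethod
    def from_json(cls, text):
        # Routing through __init__ means every attribute it sets also exists
        # on the loaded object, so nothing is silently left as None.
        return cls(**json.loads(text))


original = SolutionSketch([0, 1], {"water": 3.5}, {"water": [0.0, 1.1]})
loaded = SolutionSketch.from_json(original.to_json())
```

The trade-off is that anything __init__() computes gets recomputed on load, so the expensive results have to be accepted as arguments, as `rdf_data` is here.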

The coupling with Universe is also quite complicated. You would likely need the Universe ingredients in separate files AFAIK, resulting in 3 requirements: "topology.tpr", "trajectory.h5md", solution_state.hdf5. I think we can move this kind of complexity down the track, possibly?

orionarcher · 2022-01-20T00:32:10Z

Agreed @hmacdope. I just moved this off the v0.2 roadmap. I'll make a new issue that specifically identifies creating a save_data method.

When we return to this later, I think your points are spot on.

hmacdope · 2022-01-20T03:39:22Z

Sounds good. :)

orionarcher added the enhancement (New feature or request) and core labels on Dec 6, 2021

orionarcher mentioned this issue Dec 21, 2021

implement basic save and load functionality #53

Closed

5 tasks

orionarcher mentioned this issue Jan 20, 2022

Create a save_data method for solution #57

Closed
