-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement custom load function for collect_results #421
implement custom load function for collect_results #421
Conversation
This adds an optional wrapper around the loading of data when being processed by collect_results. This is useful in case the data being loaded from file needs to be modified before being processed by collect_results. The wrapper can be passed to collect_results using the new kwarg `load_function_wrapper`. For example, consider you have a simulation result that is saved to disc as a struct. When loaded from disc it will be as a one-element dict containing that struct. The struct may contain many fields that should form the columns of the DataFrame. In that case you'd want to convert the struct to a dictionary after being loaded but before being processed by collect_results. The `load_function_wrapper` provides such a possibility. By default `load_function_wrapper=nothing` and the default methods will be used to load data (so `wload` and `jld2.jldopen`).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #421 +/- ##
==========================================
- Coverage 90.47% 89.59% -0.88%
==========================================
Files 8 8
Lines 787 788 +1
==========================================
- Hits 712 706 -6
- Misses 75 82 +7 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, besides the minor changes for the source code, you should also add a test in the test suite using exactly the situation you describe in the docstring as well.
src/result_collection.jl
Outdated
if isnothing(load_function_wrapper) | ||
data = wload(filename) | ||
else | ||
data = load_function_wrapper(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using nothing
here? Why don't we have the keyword itself be the function? Make load_function = wload
and then use load_function
throughout the source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take a look at the pull request I mention why underneath the code snippet. Probably should have included it farther up in the pull request.
But I had difficulties getting this to work with to_data_row for the specialized JLD2 case. In that case it doesn't expect wload (as far as I could tell, I'm not an expert on JLD2 though) so that as a default doesn't work. But I am happy to refactor this PR if there is a better or more idiomatic way to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I don't understand the problem. Please explain it again. The way I see the code right now, if load_function
is nothing, it becomes wload
. Otherwise, it becomes a user-provided function. So... load_function
can be by default wload
. That's what it becomes anyways. So I am overall confused because in the current PR wload
is the default but you say it doesn't work as the default.
The quote you quoted: what does "this" mean in its first sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I don't understand the problem. Please explain it again. The way I see the code right now, if
load_function
is nothing, it becomeswload
. Otherwise, it becomes a user-provided function. So...load_function
can be by defaultwload
. That's what it becomes anyways. So I am overall confused because in the current PRwload
is the default but you say it doesn't work as the default.The quote you quoted: what does "this" mean in its first sentence?
Yeah sorry for the confusion, I probably made this more complicated than it needed to be. I've pushed some new changes based on your advice. Thanks for your patience. Please take a look and let me know what you think.
src/result_collection.jl
Outdated
@@ -50,6 +50,7 @@ See also [`collect_results`](@ref). | |||
* `black_list = [:gitcommit, :gitpatch, :script]`: List of keys not to include from result-file. | |||
* `special_list = []`: List of additional (derived) key-value pairs | |||
to put in `df` as explained below. | |||
* `load_function_wrapper = nothing`: Wrapper around calls to loading the data. This is useful in case you need to do some conversion to your data before `collect_results` processes it. For example, if you are loading a struct, it will be stored as a one-element dictionary on disc. When processing it with `collect_results` you might want to convert the struct to a dict so that the fields of the struct become the elements of the dictionary. One way to do this would be to pass `load_function_wrapper = (d) -> struct2dict(d["key_name"])`. Note, this assumes the session you are loading the struct into is aware of the struct type and that the struct definition has not changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think load_function_wrapper
is unecessarily complicated. There is no reason to talk about wrappers here. We can keep it simple and say the truth: this is just the loading function. We can call it load_function
.
Thanks - I've updated the pull request. Also just realized my autoformat in vscode made tons of changes to Edit: I just realized some of the tests are not passing. I did run the tests on my PC before pushing this and I got 589 of 589 passing (see below) so I'll have to look into whats breaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late. Don't worry about windows failures we have a problem that we don't know how to solve and CI fails on Windows.
There is a small change necessary: there is no reason to use an anonumous wrapper.
There is also a larger change necessary: please revert all formatting changes. Probably simpler to start a new branch/pr or reset this one to its original state and then do the couple of lines changes necessary.
src/result_collection.jl
Outdated
@@ -50,6 +50,7 @@ See also [`collect_results`](@ref). | |||
* `black_list = [:gitcommit, :gitpatch, :script]`: List of keys not to include from result-file. | |||
* `special_list = []`: List of additional (derived) key-value pairs | |||
to put in `df` as explained below. | |||
* `load_function = (filename) -> wload(filename)`: function for loading data from file. This is useful in the event you have data saved as a struct. When loaded from file it will be as a one-element `Dict` which is not what you want passed to the dataframe. Instead, you'd rather have the fields of the struct to be available as columns of the dataframe. In that case you can use this function to ensure the struct is converted to a dict before being processed by `collect_results!`. For example, `load_function = (filename) -> struct2dict(wload(filename)["my_key"])`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `load_function = (filename) -> wload(filename)`: function for loading data from file. This is useful in the event you have data saved as a struct. When loaded from file it will be as a one-element `Dict` which is not what you want passed to the dataframe. Instead, you'd rather have the fields of the struct to be available as columns of the dataframe. In that case you can use this function to ensure the struct is converted to a dict before being processed by `collect_results!`. For example, `load_function = (filename) -> struct2dict(wload(filename)["my_key"])`. | |
* `load_function = wload`: function for loading data from file. This is useful in the event you have data saved as a struct. When loaded from file it will be as a one-element `Dict` which is not what you want passed to the dataframe. Instead, you'd rather have the fields of the struct to be available as columns of the dataframe. In that case you can use this function to ensure the struct is converted to a dict before being processed by `collect_results!`. For example, `load_function = (filename) -> struct2dict(wload(filename)["my_key"])`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wload
is already a function. Wrapping it in an anonymous function has no benefit.
src/result_collection.jl
Outdated
newfile=false, # keyword only for defining collect_results without ! | ||
rinclude=[r""], | ||
rexclude=[r"^\b$"], | ||
load_function=(filename) -> wload(filename), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load_function=(filename) -> wload(filename), | |
load_function = wload, |
The purpose of this commit is to incorpoate comments from pull request - removes the anonymous function wrapper, except from `to_data_row` which requires the default be an anonymous function so that the "r" parameter can be fixed in the call. - reverts all the autoformatting As before, a test was added to `update_results_tests.jl`. All tests passed, 589 of 589.
Thanks - I've updated the pull request per your comments. Edit: So I just realized the formatting did not revert as expected. Here is what I attempted and thought would work.
It seems the changes in whitespace were ignored though (as the current pull request still has the formatting changes) so I have more to look into here. I will try to get this sorted in the new few days. Edit 2: I think I have found my answer here. Quoting from the docs
|
So I've prepared a new branch without the whitespace changes. Would it just be easier to close this PR and open a new one with the cleaned up branch? |
sorry for the late reply!!! Yes, a new PR is easiest! |
The changes in this branch are a follow up from a previous pull request based on commit 6e6ff07 in PR JuliaDynamics#421. In that PR there were issues with whitespace changes inadvertantly coming from the autoformatter in vscode. Reverting the whitespace only changes proved to be more difficult than anticicpated. So to resolve this, this branch was created and a new PR will be created from it. The whitespace issues are gone but all the feedback and changes from the original PR are retained. The commit makes the following changes. - add the `load_function` kwarg to `collect_results`. This allows customizing how data is loaded from file before being processed into a dataframe by `collect_results`. - add a test to `update_result_tests.jl` - update docstring of `collect_results` - increase package version to 2.16.0 - update `CHANGELOG.md` All tests passed, 589 of 589.
The changes in this branch are a follow up from a previous pull request based on commit 6e6ff07 in PR #421. In that PR there were issues with whitespace changes inadvertantly coming from the autoformatter in vscode. Reverting the whitespace only changes proved to be more difficult than anticicpated. So to resolve this, this branch was created and a new PR will be created from it. The whitespace issues are gone but all the feedback and changes from the original PR are retained. The commit makes the following changes. - add the `load_function` kwarg to `collect_results`. This allows customizing how data is loaded from file before being processed into a dataframe by `collect_results`. - add a test to `update_result_tests.jl` - update docstring of `collect_results` - increase package version to 2.16.0 - update `CHANGELOG.md` All tests passed, 589 of 589.
Here is my first pass at implementing #420
This adds an optional wrapper around the loading of data when being processed by
collect_results
. This is useful in case the data being loaded from file needs to be modified before being processed bycollect_results
. The wrapper can be passed tocollect_results
using the new kwargload_function_wrapper
.For example, consider you have a simulation result that is saved to disc as a struct. When loaded from disc it will be as a one-element dict containing that struct. The struct may contain many fields that should form the columns of the DataFrame. In that case you'd want to convert the struct to a dictionary after being loaded but before being processed by
collect_results
. Theload_function_wrapper
provides such a possibility.By default
load_function_wrapper=nothing
and the default methods will be used to load data (sowload
andjld2.jldopen
).Here is an example usage.
All 587 tests passed.
I did try to implement something along the lines of the following.
But I had difficulties getting this to work with
to_data_row
for the specialized JLD2 case. In that case it doesn't expectwload
(as far as I could tell, I'm not an expert on JLD2 though) so that as a default doesn't work. But I am happy to refactor this PR if there is a better or more idiomatic way to achieve this.