-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema evolution based on ROOT and Reflex dictionaries #472
Conversation
Easier this way to get the current version information into the whole system
Remove two step registration again
… non-versioned legacy data
@tmadlener - this is the current status w/ Reflex based schema evolution. It contains a debug output. The problem is that the value of |
@tmadlener - seems I never reported back in the tracker. so here for the records. Newer ROOT versions do not suffer from this issue any more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working for me locally with ROOT v6.28/06 and as far as the schema evolution part is concerned seems to be complete (TBC at tomorrows meeting). I have left a few inline comments for cleanup / nitpicking.
As we have already discussed in private handling multiple old versions will be done in a separate PR. From a technical point of view that should "just be another loop" over all the older versions.
The roundtrip tests that dump the model from a written file are currently failing because the generation of the dumped model does not generate any schema evolution code, but the diffing against the original model has schema evolution code now:
92: + diff -ru /home/tmadlener/work/AIDASoft/podio/build/tests/root_io/example_frame.root.dumped_datamodel/datamodel /home/tmadlener/work/AIDASoft/podio/tests/datamodel
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithArrayv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithNamespacev2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithUserInitv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: NamespaceStructv2.h
This leads me to a point on the naming and file layout of the generated code, currently we will have a
FooData
inFooData.h
FoovXData
inFoovXData.h
I think this could be cleaner if we had a vX
namespace and potentially a vX
subfolder to put "old" headers into. (The latter could also make the roundtrip tests slightly easier to fix).
Finally, I think we should consider splitting this PR into two:
- One that only has the refactoring of the jinja templates that happens in the early commits here which leads to quite a bit of unrelated noise here. We would then also avoid having these mixed up with the "interesting" changes when we squash in the end.
- The actual changes for schema evolution.
python/podio_class_generator.py
Outdated
self.root_schema_component_names.add(name + self.old_schema_version) | ||
|
||
def _replaceComponentInPaths(self, oldname, newname, paths): | ||
"""Replace component in paths""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To which paths is this referring to? I suppose it is the include paths for older components?
Can we make the paths
a return value here instead of an in-out parameter and simply re-assign to the (input) paths on the calling site?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is generic, but it is used for include paths in this case. I would not like dealing with return values as it is changing values within an existing list. And I cannot guarantee without code checking that a reference to the list isn't "cached" elsewhere already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I missed the if
in the loop. I thought all elements were touched, but they aren't. Then we leave it like this.
Thanks for checking. I was addressing most of the points now. Concerning the split into two PRs. The part you want to have in a separate PR is probably commit b65c6ee I could cherry pick that one for a new PR if that is what you want. When it comes to namespaces and directory structure for the old data - most of the work will actually be in CMake, and not in the code generation part. Do we want to do it in a separate PR? |
Yes exactly. Plus this one (to not break things again) 6b8fe15
I think parts of it would also need addressing here because the include paths and file names are part of code generation. But we can also do all of that in a separate PR where we clean up things. We will have at least one other PR in any case to handle several old versions. |
One other observation: The key4hep-nightlies seem to work for the schema evolution part, but they only come with root 6.28/04, so maybe we don't even need the latest version? |
Make the schema evolution datamodel the new one and use the original one to write old data. This allows the roundtrip tests to work without additional work because the additional schema evolution code is not used anywhere for that.
Any 6.28 is enough. 6.24 is not. And 6.26 I did not try. |
Ah alright. Then I simply misunderstood "latest release". 6.26 did not work for me. |
@tmadlener - if you are fine, can you review and approve? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Can you confirm whether the comments I marked are really left overs from earlier? The Error message can be addressed in a follow up PR as well, just wanted to have it on a "list" somewhere.
iorule.code = f'{iorule.target} = onfile.{schema_change.member_name_old};' | ||
self.root_schema_iorules.add(iorule) | ||
else: | ||
raise NotImplementedError("Schema evolution for this type not yet implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially confusing, without further information on what "type" is.
Co-authored-by: Thomas Madlener <[email protected]>
Agreed. Type needs clarification. The rest was a leftover |
* Add SchemaEvolution singleton to hold evolution functions * Inject type information into collection buffers * Inject current schema version into buffers from buffer factory * Require registration of each evolution function * Create schema_evolution test subdirectory and build old datamodel * creating components and datatypes for explicit schema evolution * add code generation for reflex schema evolution * Rearrange schema evolution tests to not interfere with others * Move function implementations into .cc files for Components Co-authored-by: Thomas Madlener <[email protected]>
BEGINRELEASENOTES
ENDRELEASENOTES
Schema evolution based on old Reflex dictionary approach.