Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema evolution based on ROOT and Reflex dictionaries #472

Merged
merged 34 commits into from
Sep 13, 2023
Merged

Conversation

hegner
Copy link
Collaborator

@hegner hegner commented Sep 1, 2023

BEGINRELEASENOTES

  • Allow comparison of data schemata across versions
  • Provide syntax to clarify user intentions in schema evolution
  • Provide schema evolution implementation based on ROOT backend
  • Include infrastructure for future support for schema evolution in other backends
  • Documentation for schema evolution functionality

ENDRELEASENOTES

Schema evolution based on old Reflex dictionary approach.

@hegner
Copy link
Collaborator Author

hegner commented Sep 1, 2023

@tmadlener - this is the current status w/ Reflex based schema evolution. It contains a debug output. The problem is that the value of y_old read in is already claimed to be zero. Opening with bare ROOT the value however is correct.
If playing with not-evolved members like x we get the proper number. Thus my assumption is that there is no proper read before this custom code kicks in. Still - this code crashes if the original file does not contain an y_old leaf. So I would assume another ROOT bug behind this.

@hegner
Copy link
Collaborator Author

hegner commented Sep 8, 2023

@tmadlener - seems I never reported back in the tracker. so here for the records. Newer ROOT versions do not suffer from this issue any more.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working for me locally with ROOT v6.28/06 and as far as the schema evolution part is concerned seems to be complete (TBC at tomorrows meeting). I have left a few inline comments for cleanup / nitpicking.

As we have already discussed in private handling multiple old versions will be done in a separate PR. From a technical point of view that should "just be another loop" over all the older versions.

The roundtrip tests that dump the model from a written file are currently failing because the generation of the dumped model does not generate any schema evolution code, but the diffing against the original model has schema evolution code now:

92: + diff -ru /home/tmadlener/work/AIDASoft/podio/build/tests/root_io/example_frame.root.dumped_datamodel/datamodel /home/tmadlener/work/AIDASoft/podio/tests/datamodel
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithArrayv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithNamespacev2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithUserInitv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: NamespaceStructv2.h

This leads me to a point on the naming and file layout of the generated code, currently we will have a

  • FooData in FooData.h
  • FoovXData in FoovXData.h

I think this could be cleaner if we had a vX namespace and potentially a vX subfolder to put "old" headers into. (The latter could also make the roundtrip tests slightly easier to fix).

Finally, I think we should consider splitting this PR into two:

  • One that only has the refactoring of the jinja templates that happens in the early commits here which leads to quite a bit of unrelated noise here. We would then also avoid having these mixed up with the "interesting" changes when we squash in the end.
  • The actual changes for schema evolution.

python/podio_class_generator.py Outdated Show resolved Hide resolved
self.root_schema_component_names.add(name + self.old_schema_version)

def _replaceComponentInPaths(self, oldname, newname, paths):
"""Replace component in paths"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To which paths is this referring to? I suppose it is the include paths for older components?

Can we make the paths a return value here instead of an in-out parameter and simply re-assign to the (input) paths on the calling site?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is generic, but it is used for include paths in this case. I would not like dealing with return values as it is changing values within an existing list. And I cannot guarantee without code checking that a reference to the list isn't "cached" elsewhere already.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I missed the if in the loop. I thought all elements were touched, but they aren't. Then we leave it like this.

python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_class_generator.py Outdated Show resolved Hide resolved
python/templates/Collection.cc.jinja2 Show resolved Hide resolved
@hegner
Copy link
Collaborator Author

hegner commented Sep 11, 2023

Thanks for checking. I was addressing most of the points now.

Concerning the split into two PRs. The part you want to have in a separate PR is probably commit b65c6ee

I could cherry pick that one for a new PR if that is what you want.

When it comes to namespaces and directory structure for the old data - most of the work will actually be in CMake, and not in the code generation part. Do we want to do it in a separate PR?

@tmadlener
Copy link
Collaborator

tmadlener commented Sep 12, 2023

Concerning the split into two PRs. The part you want to have in a separate PR is probably commit b65c6ee

Yes exactly. Plus this one (to not break things again) 6b8fe15

When it comes to namespaces and directory structure for the old data - most of the work will actually be in CMake, and not in the code generation part. Do we want to do it in a separate PR?

I think parts of it would also need addressing here because the include paths and file names are part of code generation. But we can also do all of that in a separate PR where we clean up things. We will have at least one other PR in any case to handle several old versions.

@tmadlener
Copy link
Collaborator

One other observation: The key4hep-nightlies seem to work for the schema evolution part, but they only come with root 6.28/04, so maybe we don't even need the latest version?

python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_class_generator.py Outdated Show resolved Hide resolved
python/podio_schema_evolution.py Show resolved Hide resolved
tmadlener and others added 3 commits September 12, 2023 11:04
Make the schema evolution datamodel the new one and use the original one
to write old data. This allows the roundtrip tests to work without
additional work because the additional schema evolution code is not used
anywhere for that.
@hegner
Copy link
Collaborator Author

hegner commented Sep 12, 2023

One other observation: The key4hep-nightlies seem to work for the schema evolution part, but they only come with root 6.28/04, so maybe we don't even need the latest version?

Any 6.28 is enough. 6.24 is not. And 6.26 I did not try.

@hegner hegner changed the title [WIP ]Schema reflex Schema evolution based on ROOT and Reflex dictionaries Sep 12, 2023
@tmadlener
Copy link
Collaborator

Ah alright. Then I simply misunderstood "latest release". 6.26 did not work for me.

@hegner
Copy link
Collaborator Author

hegner commented Sep 13, 2023

@tmadlener - if you are fine, can you review and approve?

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you confirm whether the comments I marked are really left overs from earlier? The Error message can be addressed in a follow up PR as well, just wanted to have it on a "list" somewhere.

python/podio_class_generator.py Outdated Show resolved Hide resolved
iorule.code = f'{iorule.target} = onfile.{schema_change.member_name_old};'
self.root_schema_iorules.add(iorule)
else:
raise NotImplementedError("Schema evolution for this type not yet implemented")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is potentially confusing, without further information on what "type" is.

@hegner
Copy link
Collaborator Author

hegner commented Sep 13, 2023

Looks good. Can you confirm whether the comments I marked are really left overs from earlier? The Error message can be addressed in a follow up PR as well, just wanted to have it on a "list" somewhere.

Agreed. Type needs clarification. The rest was a leftover

@hegner hegner merged commit b00fd75 into master Sep 13, 2023
17 checks passed
Ananya2003Gupta pushed a commit to Ananya2003Gupta/podio that referenced this pull request Sep 26, 2023
* Add SchemaEvolution singleton to hold evolution functions

* Inject type information into collection buffers

* Inject current schema version into buffers from buffer factory

* Require registration of each evolution function

* Create schema_evolution test subdirectory and build old datamodel

* creating components and datatypes for explicit schema evolution

* add code generation for reflex schema evolution

* Rearrange schema evolution tests to not interfere with others

* Move function implementations into .cc files for Components

Co-authored-by: Thomas Madlener <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants