Schema evolution based on ROOT and Reflex dictionaries #472

hegner · 2023-09-01T08:44:40Z

BEGINRELEASENOTES

Allow comparison of data schemata across versions
Provide syntax to clarify user intentions in schema evolution
Provide schema evolution implementation based on ROOT backend
Include infrastructure for future support for schema evolution in other backends
Documentation for schema evolution functionality

ENDRELEASENOTES

Schema evolution based on old Reflex dictionary approach.

Easier this way to get the current version information into the whole system

Remove two step registration again

hegner · 2023-09-01T08:49:47Z

@tmadlener - this is the current status with Reflex-based schema evolution. It still contains debug output. The problem is that the value of y_old is already reported as zero when it is read in, whereas opening the file with bare ROOT shows the correct value.
If we play with non-evolved members like x, we get the proper number. My assumption is therefore that no proper read has happened before this custom code kicks in. Still, this code crashes if the original file does not contain a y_old leaf, so I would assume there is another ROOT bug behind this.

hegner · 2023-09-08T13:45:01Z

@tmadlener - it seems I never reported back in the tracker, so here it is for the record: newer ROOT versions no longer suffer from this issue.

tmadlener

This is working for me locally with ROOT v6.28/06 and, as far as the schema evolution part is concerned, seems to be complete (TBC at tomorrow's meeting). I have left a few inline comments for cleanup / nitpicking.

As we have already discussed in private, handling multiple old versions will be done in a separate PR. From a technical point of view that should "just be another loop" over all the older versions.
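To make the "just another loop" remark concrete, here is a minimal sketch of comparing one old schema against the current one and then repeating that for several old versions. Everything in it is an assumption for illustration: the function names, the plain `dict` representation of members (name to type), and the change labels are hypothetical and are not taken from podio's schema evolution code.

```python
def compare_members(old_members, new_members):
    """Return a list of (kind, member, detail) tuples describing the differences."""
    changes = []
    for name, old_type in old_members.items():
        if name not in new_members:
            changes.append(("dropped", name, old_type))
        elif new_members[name] != old_type:
            changes.append(("type_changed", name, f"{old_type} -> {new_members[name]}"))
    for name, new_type in new_members.items():
        if name not in old_members:
            changes.append(("added", name, new_type))
    return changes


def compare_all_versions(old_versions, current_members):
    """The 'other loop': run the same comparison for every older schema version."""
    return {
        version: compare_members(members, current_members)
        for version, members in old_versions.items()
    }


# Hypothetical member lists for two old versions of one datatype
old_versions = {
    1: {"x": "int", "energy": "float"},
    2: {"x": "int", "energy": "double"},
}
current = {"x": "int", "energy": "double", "t": "int"}
changes_per_version = compare_all_versions(old_versions, current)
```

Note that a purely structural comparison like this reports a renamed member as one dropped and one added member; telling a rename apart from that needs extra user input, which is what the "syntax to clarify user intentions" item in the release notes is about.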

The roundtrip tests that dump the model from a written file are currently failing because the generation of the dumped model does not produce any schema evolution code, whereas the original model that it is diffed against now has schema evolution code:

92: + diff -ru /home/tmadlener/work/AIDASoft/podio/build/tests/root_io/example_frame.root.dumped_datamodel/datamodel /home/tmadlener/work/AIDASoft/podio/tests/datamodel
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithArrayv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithNamespacev2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: ExampleWithUserInitv2Data.h
92: Only in /home/tmadlener/work/AIDASoft/podio/tests/datamodel: NamespaceStructv2.h

This leads me to a point on the naming and file layout of the generated code. Currently we will have

FooData in FooData.h
FoovXData in FoovXData.h

I think this could be cleaner if we had a vX namespace and potentially a vX subfolder to put "old" headers into. (The latter could also make the roundtrip tests slightly easier to fix).

Finally, I think we should consider splitting this PR into two:

* One that only contains the refactoring of the jinja templates from the early commits, which adds quite a bit of unrelated noise here. We would then also avoid having these mixed up with the "interesting" changes when we squash in the end.
* The actual changes for schema evolution.

tmadlener · 2023-09-11T14:41:03Z

python/podio_class_generator.py

+      self.root_schema_component_names.add(name + self.old_schema_version)
+
+  def _replaceComponentInPaths(self, oldname, newname, paths):
+    """Replace component in paths"""


Which paths does this refer to? I suppose it is the include paths for older components?

Can we make the paths a return value here instead of an in-out parameter and simply re-assign to the (input) paths on the calling site?

It is generic, but in this case it is used for include paths. I would rather not deal with return values, since this changes values within an existing list, and I cannot guarantee without checking the code that a reference to the list isn't already "cached" elsewhere.

Ah sorry, I missed the if in the loop. I thought all elements were touched, but they aren't. Then we leave it like this.
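The trade-off discussed in this thread (mutating the list in place versus returning a new one) can be illustrated with a small sketch. The function below is a hypothetical stand-in, not the body of `_replaceComponentInPaths`: the matching rule (a path ending in `/<oldname>.h`) and the example paths are assumptions chosen for illustration.

```python
def replace_component_in_paths(oldname, newname, paths):
    """Rewrite matching entries of `paths` in place; other entries stay untouched."""
    old_suffix = f"/{oldname}.h"
    for index, path in enumerate(paths):
        # Only some elements match, which is the `if` mentioned in the thread
        if path.endswith(old_suffix):
            paths[index] = path[: -len(f"{oldname}.h")] + f"{newname}.h"


includes = ["datamodel/NamespaceStruct.h", "datamodel/ExampleHit.h"]
cached_elsewhere = includes  # a second reference to the same list object

replace_component_in_paths("NamespaceStruct", "NamespaceStructv2", includes)
```

Because the list object itself is modified, `cached_elsewhere` sees the updated include path as well. A variant that returned a new list would leave any such previously stored reference pointing at the stale paths, which is the concern raised above.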

hegner · 2023-09-11T20:48:32Z

Thanks for checking. I have addressed most of the points now.

Concerning the split into two PRs. The part you want to have in a separate PR is probably commit b65c6ee

I could cherry pick that one for a new PR if that is what you want.

When it comes to namespaces and directory structure for the old data - most of the work will actually be in CMake, and not in the code generation part. Do we want to do it in a separate PR?

tmadlener · 2023-09-12T06:48:47Z

> Concerning the split into two PRs. The part you want to have in a separate PR is probably commit b65c6ee

Yes exactly. Plus this one (to not break things again) 6b8fe15

> When it comes to namespaces and directory structure for the old data - most of the work will actually be in CMake, and not in the code generation part. Do we want to do it in a separate PR?

I think parts of it would also need addressing here because the include paths and file names are part of code generation. But we can also do all of that in a separate PR where we clean up things. We will have at least one other PR in any case to handle several old versions.

tmadlener · 2023-09-12T06:59:46Z

One other observation: The key4hep-nightlies seem to work for the schema evolution part, but they only come with ROOT 6.28/04, so maybe we don't even need the latest version?

Make the schema evolution datamodel the new one and use the original one to write old data. This allows the roundtrip tests to work without additional work because the additional schema evolution code is not used anywhere for that.

hegner · 2023-09-12T21:52:52Z

> One other observation: The key4hep-nightlies seem to work for the schema evolution part, but they only come with root 6.28/04, so maybe we don't even need the latest version?

Any 6.28 release is sufficient; 6.24 is not. I did not try 6.26.

tmadlener · 2023-09-13T07:02:43Z

Ah alright. Then I simply misunderstood "latest release". 6.26 did not work for me.

hegner · 2023-09-13T12:56:29Z

@tmadlener - if you are fine, can you review and approve?

tmadlener

Looks good. Can you confirm whether the comments I marked are really leftovers from earlier? The error message can be addressed in a follow-up PR as well, I just wanted to have it on a "list" somewhere.

tmadlener · 2023-09-13T13:24:06Z

python/podio_class_generator.py

+          iorule.code = f'{iorule.target} = onfile.{schema_change.member_name_old};'
+          self.root_schema_iorules.add(iorule)
+        else:
+          raise NotImplementedError("Schema evolution for this type not yet implemented")


This is potentially confusing, without further information on what "type" is.
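One way to make such an error less confusing is to name the kind of schema change that has no code generation yet. The following is a minimal sketch under stated assumptions: the classes `RenamedMember` and `UnsupportedChange` and the function `make_iorule_code` are hypothetical illustrations, not podio's actual types or the wording that was eventually adopted. Only the shape of the generated snippet (`target = onfile.<old name>;`) follows the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenamedMember:
    """Hypothetical record of a member renamed between schema versions."""
    member_name_old: str
    member_name_new: str


@dataclass(frozen=True)
class UnsupportedChange:
    """Hypothetical stand-in for any change without generated evolution code."""
    description: str


def make_iorule_code(schema_change):
    """Return the C++ snippet for a read rule, or raise a descriptive error."""
    if isinstance(schema_change, RenamedMember):
        # Same shape as in the diff: assign the on-file value to the new member
        return f"{schema_change.member_name_new} = onfile.{schema_change.member_name_old};"
    # Spell out which kind of change is unsupported instead of saying "this type"
    raise NotImplementedError(
        f"Schema evolution for changes of type {type(schema_change).__name__} "
        f"is not yet implemented: {schema_change}"
    )
```

With this, a renamed member such as `y_old` to `y` yields the assignment snippet, while any other change reports its own class name in the exception text.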

hegner · 2023-09-13T14:00:26Z

> Looks good. Can you confirm whether the comments I marked are really left overs from earlier? The Error message can be addressed in a follow up PR as well, just wanted to have it on a "list" somewhere.

Agreed. Type needs clarification. The rest was a leftover

* Add SchemaEvolution singleton to hold evolution functions
* Inject type information into collection buffers
* Inject current schema version into buffers from buffer factory
* Require registration of each evolution function
* Create schema_evolution test subdirectory and build old datamodel
* creating components and datatypes for explicit schema evolution
* add code generation for reflex schema evolution
* Rearrange schema evolution tests to not interfere with others
* Move function implementations into .cc files for Components

Co-authored-by: Thomas Madlener <[email protected]>

tmadlener and others added 19 commits June 15, 2023 11:17

Remove unused fields

0b57a4d

Add SchemaEvolution singleton to hold evolution functions

074a0e5

Inject type information into collection buffers

df470ed

Inject current schema version into buffers from buffer factory

7fee6f2

[wip] Start populating SchemaEvolution registry

2ba52f8

[wip] Split registration into two steps

8942b3d

Easier this way to get the current version information into the whole system

[wip] Require registration of each evolution function

60f24d3

Remove two step registration again

[clang-tidy] Mark inputs as const& for now

64d148c

Create schema_evolution test subdirectory and build old datamodel

9e006fd

Add first simple tests for "trivial" schema evolution

14c2010

Fix test environment and typo

3ccd157

Add failing test for renamed member variables

557f347

Merge branch 'master' into schema-evol-library

77fec67

move Collection::createBuffers template into macro

b65c6ee

creating components and datatypes for explicit schema evolution

2807c55

add more schema evolution code generation

20e30d0

bump of schema version for testing. version 1 is already reserved for non-versioned legacy data

5b41643

add missing schema evolution pieces; prepare for ioread rules in reflex dicts

afa0e41

add code generation for reflex schema evolution

39a42b6

hegner requested a review from tmadlener September 1, 2023 08:44

hegner and others added 6 commits September 11, 2023 09:33

Merge branch 'master' into schema-reflex

09455c0

Update SchemaEvolution.cc

28ca566

Update Collection.cc.jinja2

fab33e8

disable currently unused schema evolution parts

178b520

address static code checker warnings

905b473

Fix bug re-introduced in merging master

6b8fe15

tmadlener reviewed Sep 11, 2023

addressing PR comments

51462d9

andresailer reviewed Sep 12, 2023

tmadlener and others added 3 commits September 12, 2023 11:04

Add a test for a float to double migration

36d06e6

addressing review comments and code checker

0caa209

hegner mentioned this pull request Sep 12, 2023

Introduce a createBuffers macro for the jinja template #480

Merged

hegner changed the title ~~[WIP ]Schema reflex~~ Schema evolution based on ROOT and Reflex dictionaries Sep 12, 2023

hegner mentioned this pull request Sep 12, 2023

[WIP] Moving PODIO to using rootcling instead of reflex #460

Closed

tmadlener added 4 commits September 13, 2023 15:14

Reduce unnecessary template instantiations

6921613

Fix preprocessor directives

5cbfec5

Move function implementations into .cc files for Components

10ec62e

Merge branch 'master' into schema-reflex

f632556

tmadlener reviewed Sep 13, 2023

Update python/podio_class_generator.py

1b425c8

Co-authored-by: Thomas Madlener <[email protected]>

tmadlener approved these changes Sep 13, 2023

hegner merged commit b00fd75 into master Sep 13, 2023
17 checks passed

hegner mentioned this pull request Sep 15, 2023

Clarify error message in case of not implemented schema changes #483

Merged
