
[on hold] RNTuple example #245

Draft: wants to merge 4 commits into develop

@bernhardmgruber (Member) commented on May 17, 2021

This is an example of using LLAMA with data loaded from an RNTuple via the ROOT data analysis framework.

At the moment, a LLAMA view is created to hold all events stored in the RNTuple file, excluding subarrays. This data contains 1095 columns.

The example compiles in 2 min, needs ~2 GiB of memory at runtime, and copies the data from RNTuple to LLAMA in 26 s.

@bernhardmgruber changed the title from "HEP RNTuple example" to "RNTuple example" on May 17, 2021
@bernhardmgruber bernhardmgruber marked this pull request as ready for review May 17, 2021 23:52
@bussmann commented:

Can you put these numbers in perspective?

@bernhardmgruber (Member, Author) commented:

> Can you put these numbers in perspective?

As a quick answer, I can only provide you with experience and guesses:

  • A compilation time of 2 min is long for a single source file; other LLAMA examples compile in ~10 s. LLAMA has scalability issues with very large records, although I have tried hard to improve the situation in recent days with 2 PRs addressing #240 (Improve compile time).
  • The RNTuple file is 2.5 GiB on disk (compressed). I read the events, which are roughly 2/3 of the data, i.e. about 1.7 GiB compressed. If that becomes 2 GiB uncompressed in RAM, that sounds reasonable; I guess the compression is not that strong.
  • Reading that data in 26 s seems slow, but it runs in WSL on my notebook, and I have not yet made any attempt to optimize it.
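The estimates above can be made explicit with a back-of-the-envelope calculation. This assumes, as the reply only guesses, that the ~2 GiB runtime footprint consists essentially of the event data, and that the 2/3 share refers to the compressed size on disk:

$$
\begin{aligned}
\text{compressed events} &\approx \tfrac{2}{3} \times 2.5\,\text{GiB} \approx 1.67\,\text{GiB} \\
\text{compression ratio} &\approx 2\,\text{GiB} / 1.67\,\text{GiB} \approx 1.2 \\
\text{copy throughput} &\approx 2048\,\text{MiB} / 26\,\text{s} \approx 79\,\text{MiB/s}
\end{aligned}
$$

A ratio of about 1.2 fits the guess that the compression is weak, and roughly 79 MiB/s is the effective rate of the current, unoptimized copy.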

* allow dynamic field types in the record dimension
* add specializations to most of the core functions
* add llama::dynamic to signal a dynamic array member in a RecordCoord
* extend VirtualRecord to allow holding dynamic indices
* extend blobNrAndOffset to allow for additional dynamic indices
* add OffsetTable mapping
* add customization allowing to dump OffsetTable mappings
* add a few unit tests
2 participants