CLI/merge: Avoid duplication of identical schemas #982

Merged: 3 commits, Sep 28, 2023

@achim-k achim-k commented Sep 28, 2023

Public-Facing Changes

CLI/merge: Avoid duplication of identical schemas

Description

  • Removes some unused members of the McapMerger struct.
  • Avoids duplicating identical schemas when merging MCAP files. Two schemas are considered identical if the MD5 hash over their name, encoding, and data is equal.
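
For readers unfamiliar with the approach, the hash-based deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the `schema`, `deduper`, `hashSchema`, and `add` names are hypothetical, and the sketch assumes the three fields are simply fed to the hasher one after another.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// schema is a hypothetical stand-in for an MCAP schema record.
type schema struct {
	Name     string
	Encoding string
	Data     []byte
}

// hashSchema returns a hex-encoded MD5 digest over name, encoding, and data.
// Assumption: the fields are written back to back, without separators.
func hashSchema(s schema) string {
	h := md5.New()
	h.Write([]byte(s.Name))
	h.Write([]byte(s.Encoding))
	h.Write(s.Data)
	return hex.EncodeToString(h.Sum(nil))
}

// deduper hands out output schema IDs and reuses the ID of any schema
// whose hash has been seen before.
type deduper struct {
	idByHash map[string]uint16
	nextID   uint16
}

func newDeduper() *deduper {
	return &deduper{idByHash: make(map[string]uint16), nextID: 1}
}

// add returns the output ID for s and reports whether s still has to be
// written to the output (true) or is a duplicate of an earlier schema (false).
func (d *deduper) add(s schema) (uint16, bool) {
	key := hashSchema(s)
	if id, ok := d.idByHash[key]; ok {
		return id, false
	}
	id := d.nextID
	d.idByHash[key] = id
	d.nextID++
	return id, true
}

func main() {
	d := newDeduper()
	inputs := []schema{
		{Name: "Pose", Encoding: "ros1msg", Data: []byte("float64 x")},
		{Name: "Pose", Encoding: "ros1msg", Data: []byte("float64 x")}, // identical to the first
		{Name: "Pose", Encoding: "jsonschema", Data: []byte("{}")},     // same name, different encoding
	}
	for _, s := range inputs {
		id, isNew := d.add(s)
		fmt.Printf("schema %q (%s): id=%d new=%v\n", s.Name, s.Encoding, id, isNew)
	}
}
```

One design note on the sketch: because the fields are concatenated without a delimiter, two different (name, encoding) splits of the same bytes would hash identically. A length prefix or separator per field would rule that out if it matters for a given use.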

Comment on lines +172 to +177
// Reset struct members
m.schemaIdByHash = make(map[string]uint16)
m.schemaIDs = make(map[schemaID]uint16)
m.channelIDs = make(map[channelID]uint16)
m.nextChannelID = 1
m.nextSchemaID = 1
Contributor Author

Here I have to reset schemaIdByHash because we might have a new writer to which the schemas haven't been written yet.
In general, I don't fully understand the purpose of the mcapMerger struct. Wouldn't a single function be sufficient? I don't think users are supposed to call mcapMerger.addSchema themselves? But maybe I don't fully understand Go's concept of private members yet.

Contributor

I think this could definitely be written as a single function. There are no users of mcapMerger since it's private. I don't remember the details of mcapMerger, but I'm pretty sure it only exists to assist with whatever remapping of channel/schema IDs is required, and you could write that either way.

Contributor

are you doing this resetting because you anticipate mergeInputs will be called multiple times?

Contributor

^^ this seems reasonable to me (though I don't think it actually will be called that way), as would writing a single function -- maybe that's why you were asking.

Patch looks good to me generally, thanks!

Contributor Author

> are you doing this resetting because you anticipate mergeInputs will be called multiple times?

It's called multiple times in the tests. The reset was required to make them pass.
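
To illustrate why stale state breaks a repeated call, here is a minimal sketch under simplified assumptions. The `merger` type, its `reset` method, and the `resetFirst` flag are hypothetical and exist only for the demonstration; schemas are represented by their hash strings, and each call is assumed to write to a fresh output.

```go
package main

import "fmt"

// merger is a hypothetical, simplified stand-in for the CLI's merger struct.
type merger struct {
	schemaIDByHash map[string]uint16
	nextSchemaID   uint16
}

func newMerger() *merger {
	m := &merger{}
	m.reset()
	return m
}

// reset clears all per-output state so the next merge starts from scratch.
func (m *merger) reset() {
	m.schemaIDByHash = make(map[string]uint16)
	m.nextSchemaID = 1
}

// mergeInputs returns the schema hashes that get written to a fresh output.
// resetFirst only exists here to show what happens without the reset.
func (m *merger) mergeInputs(schemaHashes []string, resetFirst bool) []string {
	if resetFirst {
		m.reset()
	}
	var written []string
	for _, h := range schemaHashes {
		if _, seen := m.schemaIDByHash[h]; seen {
			continue // treated as already written to this output
		}
		m.schemaIDByHash[h] = m.nextSchemaID
		m.nextSchemaID++
		written = append(written, h)
	}
	return written
}

func main() {
	m := newMerger()
	first := m.mergeInputs([]string{"h1", "h1", "h2"}, true)
	stale := m.mergeInputs([]string{"h1"}, false) // second output, state not reset
	fresh := m.mergeInputs([]string{"h1"}, true)  // second output, state reset
	fmt.Println(len(first), len(stale), len(fresh))
}
```

Without the reset, the second call believes "h1" was already written, so the new output would be missing that schema. Writing the merge as a single function with local maps would avoid the problem in the same way, since the state could not outlive one call.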

@achim-k achim-k requested a review from wkalt September 28, 2023 13:44
@achim-k achim-k merged commit 04d820a into main Sep 28, 2023
25 checks passed
@achim-k achim-k deleted the achim/avoid-schema-duplication-cli-merge branch September 28, 2023 14:08
pezy pushed a commit to pezy/mcap that referenced this pull request Jan 11, 2024