Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final naming tweaks re Tranche/Partition (DynamoStore Index Tranches should be Partitions) #201

Closed
bartelink opened this issue Jan 31, 2023 · 9 comments · Fixed by #203
Closed

Comments

@bartelink
Copy link
Collaborator

bartelink commented Jan 31, 2023

I picked Tranche as a relatively obscure term so nobody would bring any preconceptions about what a "partition" might be (or might be limited to being). (Back when it was Cosmos and Kafka only, I originally used the term PartitionId and went to the trouble of doing a rename as the Feed features came into being)

But "what is a tranche" seems to be a perpetual question so I may be swung into doing the legwork to rename it (there are probably hundreds of usages of the term spread around the place).

Given Propulsion is about to turn 3.0 (it's rc.2 now), this is the only time it can ever happen.

Recent discussion that prompted this with @jeffijoe https://discord.com/channels/514783899440775168/1002835415239233548/1069633398131413022
and @oskardudycz #200 (comment)

This glossary entry in #200 would go away: https://github.com/jet/propulsion/blob/doc/DOCUMENTATION.md#tranches

  • Though I would have to have a paragraph and/or lots of callouts saying "when we say partition, don't necessary asume it's an actual partition - it's any slicing of the input including sharding the space based on a hash of the key"

The term shard was also suggested

@jeffijoe
Copy link

jeffijoe commented Jan 31, 2023

It's a tough one because once you explain the reasoning behind it, I'll agree that Tranche fits the bill and I can understand the ambiguity that Partition brings, but it also goes back to the DDD concept of context, and in the Propulsion context, partition means XYZ.

With that said, I am still torn about it myself, and to be honest I might even consider stealing Tranche for the same reasons (if not just to sound smarter than I am 😎 )

@bartelink
Copy link
Collaborator Author

Lots of useful input regarding this in the Discord. Working on it in #200 but the key decisions are:

  • Tranches stays
  • DynamoStore Index Tranches should actually be Partitions
  • Propulsion docs will map all the terms early on in a glossary/preamble

@bartelink bartelink changed the title Consider rename Tranche -> Partition Final naming tweaks re Tranche/Partition (DynamoStore Index Tranches should be Partitions) Feb 1, 2023
@bartelink
Copy link
Collaborator Author

#203 resolves this

  • usages of tranche on the consumer and checkpointing side make sense at the consumer end - you could have multiple tranches over the same source partition (see Reactions and Projections docs #200)
  • but for the DynamoStore Index structure, the ability to split the index is far more correctly termed partition.
    Thanks for the nudges @oskardudycz @jeffijoe

@nordfjord So, going forward the event the code will emit from the next RC onward is now Started2 with "partition" instead of "tranche"

@nordfjord
Copy link
Contributor

the ability to split the index is far more correctly termed partition

It's technically also a tranche 😅.

RC onward is now Started2 with "partition" instead of "tranche"

Given this index format is kind of private maybe it's worth avoiding the breaking change and doing

type Event =
  | Started of { [<JsonPropertyName "tranche">] partition: AppendsPartitionId }

I'm not running a DynamoStore in production though and as such don't have skin in the game.

@bartelink
Copy link
Collaborator Author

bartelink commented Feb 3, 2023

It's technically also a tranche 😅.

Technically yes.

Technically the Reader does indeed map partitionId directly to a tranche, but as noted in the docs, there's scope for it to surface multiple tranches from the partition and then use that to enable distributed processing.

Seriously though: for future cases where I build systems that split pools, I will call entitle the divisions/shards "partitions" and identify those by PartitionIds.


Hm, maybe not the craziest idea - it's definitely ugly and cryptic.
But it's not a breaking change as such (except for you) - as long as you dont deploy an older indexer lambda over a new one
I did ponder adding making tranche and partition option fields and then taking them with an Option.orElse ... Option.get...

@bartelink
Copy link
Collaborator Author

Actually could transparently map by doing a JsonIsomorphism from { partition: Pid option; tranche: Pid option ; pos } to { partition: Pid option; pos }

@nordfjord
Copy link
Contributor

Making tranche optional would be a breaking change. So I’d just make a breaking change at that point

@bartelink
Copy link
Collaborator Author

I am saying that I'd make read of { tranche = t; pos = p} map to { partition = t; pos = p} internally. New writes would write { partitition = r; pos = p}.
But the only readers that wont be able to cope with stuff where its in a partition field are:

  • Lambdas <= rc.2
  • your logic

@bartelink
Copy link
Collaborator Author

f93ec00
I could make the updated version write a tranche too, but chose to let it write null

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants