Problem
When running trin before storage hits the allocation limit, we currently run at a 100% radius. If the steady-state radius for a particular storage allocation will land at 3%, then 97% of the stored content we receive early on (in the period before the radius starts to shrink) will end up being deleted. During that period, 97% of the network transfer, CPU for verification, and storage I/O is wasted.
Proposal
Pre-shrink the radius
We can't know the size of the network with arbitrary precision ahead of time, but we can bake a rough guess into the binary. Even a rough guess can massively shrink the wasted data.
Asymmetry in the guess
The effect of incorrectly estimating the radius is highly asymmetrical:
- don't pre-shrink the radius enough, and ... we're still much better off than before
- pre-shrink the radius too much, and you never store as much as you are willing to store (without us adding a more complicated feature to bump the radius back up)
So we probably want to take our radius estimate and double it. We still get a large benefit, with only a small downside when our baked-in numbers go stale over time (assuming it's quite rare for the network's total storage to shrink).
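As a minimal sketch of the extrapolation (the U256 radius representation and all names here are assumptions for illustration, not trin's actual internals), the starting radius can be derived from the configured storage limit and a baked-in network-size estimate, with the 2x buffer applied:

```rust
use ethereum_types::U256; // assumed dependency; trin's real radius type may differ

/// Baked-in rough estimate of total data stored across the network, in bytes.
/// This is a guess that will go stale, which is why the buffer below exists.
const ESTIMATED_TOTAL_NETWORK_BYTES: u64 = 1_750_000_000_000; // ~1.75 TB

/// Doubling the estimate means an underestimated network size costs us some
/// wasted transfer, while an overestimate never silently caps our storage.
const RADIUS_BUFFER_FACTOR: u64 = 2;

/// Estimate a starting radius as a fraction of the full 2^256 keyspace.
fn initial_radius(storage_limit_bytes: u64) -> U256 {
    // fraction = storage_limit / estimated_total, applied to the keyspace,
    // then doubled as the safety buffer.
    let per_byte = U256::MAX / U256::from(ESTIMATED_TOTAL_NETWORK_BYTES);
    per_byte
        .checked_mul(U256::from(
            storage_limit_bytes.saturating_mul(RADIUS_BUFFER_FACTOR),
        ))
        // Overflow means "store everything", i.e. cap at a 100% radius.
        .unwrap_or(U256::MAX)
}
```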
Real example
Our current fleet of history nodes at steady state shows a ~2% radius with 35 GB allocated each, which implies ~1.75 TB of total network data (35 GB / 0.02).
If you launch a fresh trin instance for history with 17.5 GB of space, we can estimate that the radius will end up at 1%. With a 2x buffer, we would pre-shrink the radius to 2%. That means we waste ~1% of total resources during the period before the radius shrinks below 100%, compared to ~99% under the status quo. A 99x gain is good. The less storage you allocate, the more there is to gain from this change, and low-storage users are probably the folks who care most about performance.
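Plugging these numbers into the sketch above (again, assumed names):

```rust
fn main() {
    // 35 GB at a ~2% radius implies 35 GB / 0.02 = 1.75 TB of network data.
    // A fresh 17.5 GB node should settle near 17.5 GB / 1.75 TB = 1%,
    // so the 2x buffer gives a ~2% starting radius instead of 100%.
    let radius = initial_radius(17_500_000_000);
    println!("starting radius: {radius} (~2% of the 2^256 keyspace)");
}
```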
History approaches the correct radius relatively quickly in practice, but state takes a while and is more resource-intensive. So state is probably where we will feel this benefit the most.
This is another one that will probably cause hive test failures. A way around this would be to make the value configurable at the CLI, so that hive tests can effectively disable it. It seems potentially as simple as something like --history.total_network_data=600_000, which would let us take the storage limit the user provided (or the default) and extrapolate the appropriate starting radius. For hive, we might also want to introduce something like --history.max_allowed_radius=auto as the default, meaning the value is extrapolated from the total network size. Hive could then pass --history.max_allowed_radius=100 to keep the current behavior of starting the radius at 100%.
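A minimal sketch of what that flag surface could look like with clap (the flag names and defaults come from this comment and are hypothetical, not existing trin options):

```rust
use clap::Parser;

/// Hypothetical flags sketched from this comment; trin's real argument
/// handling is more involved and these options do not exist yet.
#[derive(Parser, Debug)]
struct HistoryRadiusConfig {
    /// Rough estimate of the total data stored by the history network,
    /// used together with the local storage limit to extrapolate a
    /// starting radius. (Unit left unspecified here, as in the comment.)
    #[arg(long = "history.total_network_data", default_value_t = 600_000)]
    total_network_data: u64,

    /// Starting-radius cap as a percentage, or "auto" to extrapolate it
    /// from total_network_data. Hive would pass 100 to keep the current
    /// start-at-100% behavior.
    #[arg(long = "history.max_allowed_radius", default_value = "auto")]
    max_allowed_radius: String,
}

fn main() {
    let config = HistoryRadiusConfig::parse();
    println!("{config:?}");
}
```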
Note that this approach suffers from network growth making the estimate less accurate over time, but in practice my guess is that this will be a non-issue. We could choose a more complex metric to account for it, like --history.network_growth_rate, but I would be surprised if the complexity of such an approach were worth it. We'll be making releases regularly, and we can update the baked-in estimate as we go.