FIP-6: Flexible Storage #98

varunsrin · 2023-06-19T08:39:45Z

varunsrin
Jun 19, 2023
Maintainer

FIP: Flexible Storage

Title: Flexible Storage
Type: Implementation FIP
Authors: @cassie, @horsefacts, @v

Problem

Users on Farcaster can only store a fixed number of messages after which older messages are expired. Limited storage, along with restricted signups, has kept the size of the network from growing indefinitely during testnet. This has been important to make running a Hub practical, since each Hub must store a copy of every user’s data.

When mainnet launches, signups will become permissionless and inexpensive, which creates a vector for unbounded growth. This will lead to a few problems:

At the application level, free storage incentivizes low quality, high volume content like spam and airdrop farming.
At the protocol level, unlimited storage creates vectors for denial of service attacks, resource wasting, and other malicious behavior.
At the infrastructure level, unbounded overconsumption makes Hubs more expensive and accelerates storage growth, which incentivizes centralization.

Message storage space is a common resource which is rivalrous. Users would prefer not to have limits on what they can do while Hub operators would prefer to only store useful content. In the absence of guardrails, we should expect overconsumption by users and eventually, a tragedy of the commons where Hub operators try to implement controls to block certain users.

We need a better system to manage and allocate storage between users and hubs that is:

Simple — can be built quickly, understood easily and upgraded without trouble.
Permissionless — should let users join without invitations or needing approval.
Value Maximizing — discourage low quality or spammy content by adding a storage cost.
Flexible — gives users the ability to acquire more storage.
Low Friction — should require as little gas as possible and let apps easily onboard users.

Specification

We propose a system that imposes a limit on the size of a Hub and divides it into equal shares called units. Users can rent units of storage by paying a yearly fee to a contract which allows them to store messages on Hubs.

A Hub’s capacity is divided into n equal units which is the total supply available for users. Acquiring a unit of storage costs a price p and is performed by making a transaction to a smart contract, which emits an event. Hubs monitor these events and increase or decrease a user’s storage limits. If a user acquires a unit, their CRDT limits on the Hub increase by:

5,000 messages for Casts
2,500 for Reactions
2,500 for Links
50 for Verifications
50 for UserData

These limits are chosen to cover the 99th percentile usage for casts and links, and the 90th percentile usage for reactions over the course of the year. Casts and Links provide more long term value to the network than reactions, so we set those limits higher than that of Reactions. See the appendices for more details.
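As a rough illustration of how unit counts translate into per-store limits, the sketch below multiplies the per-unit allowances listed above by the number of units a user holds. The dictionary keys and the function name are illustrative assumptions, not identifiers from any Hub implementation.

```python
# Per-unit allowances taken from the list above. The key names are
# illustrative and are not taken from any Hub implementation.
PER_UNIT_LIMITS = {
    "casts": 5_000,
    "reactions": 2_500,
    "links": 2_500,
    "verifications": 50,
    "user_data": 50,
}


def limits_for_units(units: int) -> dict[str, int]:
    """Return the message limit per store for a user holding `units` units."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return {store: per_unit * units for store, per_unit in PER_UNIT_LIMITS.items()}


if __name__ == "__main__":
    # A user with two units gets double every per-unit allowance.
    print(limits_for_units(2))
```

A user with zero units ends up with a limit of zero in every store under this sketch, which matches the idea that an fid without storage cannot write to Hubs.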

A unit of storage may take up to s bytes on disk which depends on the type of messages in each CRDT and the limits assigned to them. Hubs must ensure that they reserve at least s*n bytes of storage for messages from users. A Hub that does not allocate enough space will not be able to synchronize the network and may be disconnected by its peers.

The storage parameters n, p, d and s can be modified as part of a protocol release, which happens every 6 weeks. Parameters may be changed for the following reasons:

p may be increased if it is not preventing spam or decreased if it hinders onboarding.
n may be increased if ≥ 25% of s*n is in use by the network.
s may be increased or decreased if new message types are added, existing types are extended or if limits are changed to reflect common usage patterns.

The system is intended to last for a year and be manually tuned for that period. Afterwards, we may choose to replace it with a more dynamic system or continue with the current implementation.

Storage Contract

The storage contract lets users pay rent to acquire storage and manages the price and supply of storage. It’s deployed on the same L2 as the Farcaster Identity Registry, and has the following functionality:

It keeps the price p, total supply n and deprecation timestamp d in contract storage, and initializes them on deployment:
1. p is set to 500_000_000 (5 USDC or $5)
2. n is set to 50,000
3. d is set to block.timestamp + 365 days
Users can call rent to acquire storage units which:
- Calculates the price in ETH from USD using a Chainlink price oracle.
- Checks that the payment is correct and returns excess payment (or reverts)
- Emits an event with block.timestamp, number of purchased units (n), and the fid.
- Can be called by anyone, allowing any party to pay for a user’s storage.
- Can be called in a single tx while registering an fid, via either contract or a bundler.
Users can call batchRent to acquire multiple units of storage for multiple fids, following the same logic as rent.
Users cannot call rent or batchRent if block.timestamp > d.
Admins can call functions to change p, d, and n at any time.
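The rent flow above can be sketched off-chain as follows. This is a simplified Python model, not the contract itself: it assumes that p and the oracle's ETH/USD answer are both 8-decimal fixed-point integers (which is what would make 500_000_000 equal $5; the text above does not state the precision), and it assumes the supply cap n is checked on every rent call. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

WEI_PER_ETH = 10**18


@dataclass
class StorageContractSketch:
    """Simplified model of the rent logic; all names are hypothetical."""

    usd_unit_price: int          # p, assumed to be 8-decimal fixed point
    max_units: int               # n
    deprecation_timestamp: int   # d
    rented_units: int = 0
    events: list = field(default_factory=list)

    def price_in_wei(self, units: int, eth_usd_price: int) -> int:
        """Convert the USD cost of `units` to wei.

        `eth_usd_price` is assumed to use the same 8-decimal scale as p,
        so the decimals cancel out. The result is rounded up.
        """
        usd_total = self.usd_unit_price * units
        return -((-usd_total * WEI_PER_ETH) // eth_usd_price)

    def rent(self, fid: int, units: int, payment_wei: int,
             eth_usd_price: int, now: int) -> int:
        """Rent `units` for `fid` and return the excess payment to refund."""
        if now > self.deprecation_timestamp:
            raise RuntimeError("contract is deprecated")
        # Assumption: the supply cap is enforced when renting.
        if self.rented_units + units > self.max_units:
            raise RuntimeError("exceeds available supply")
        cost = self.price_in_wei(units, eth_usd_price)
        if payment_wei < cost:
            raise RuntimeError("insufficient payment")
        self.rented_units += units
        # Mirrors the emitted event: timestamp, purchased units, fid.
        self.events.append((now, units, fid))
        return payment_wei - cost


if __name__ == "__main__":
    contract = StorageContractSketch(500_000_000, 50_000, deprecation_timestamp=1_000)
    eth_usd = 2_000 * 10**8  # hypothetical oracle answer: $2,000 per ETH
    print(contract.price_in_wei(1, eth_usd))  # cost of one unit in wei
```

Because the payer is just the caller, the same call covers the case where an app or any third party pays for a user's fid.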

Hubs

Hubs must be updated to allow dynamic limits for each user within each CRDT type. They must also monitor the contract and update the limits as storage is purchased and expires.

Hubs must monitor the storage contract for Rent events and:
- Store a storage count per user in rocks db, along with the expiration timestamp
- Increase a user’s limits when a new event is noticed
- Decrease a user’s limits after 365 days plus a 30-day grace period have passed.
Every CRDT must be aware of each user’s personal storage limits and prune accordingly.
- If the limits are reduced, a prune job must be run immediately.
Every CRDT must no longer implement time-based limits.
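The bookkeeping described above can be sketched as follows: each Rent event contributes its units until 365 days plus the 30-day grace period have elapsed since the event's timestamp. This is a minimal in-memory illustration; the `RentEvent` type and `active_units` function are hypothetical names, and a real Hub would persist this state rather than recompute it from a list.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
RENTAL_PERIOD = 365 * SECONDS_PER_DAY
GRACE_PERIOD = 30 * SECONDS_PER_DAY


@dataclass(frozen=True)
class RentEvent:
    """Hypothetical in-memory view of a Rent event seen by a Hub."""

    fid: int
    units: int
    timestamp: int  # block.timestamp carried by the event


def active_units(events: list[RentEvent], fid: int, now: int) -> int:
    """Sum the units for `fid` whose rental plus grace period has not elapsed."""
    return sum(
        event.units
        for event in events
        if event.fid == fid
        and now < event.timestamp + RENTAL_PERIOD + GRACE_PERIOD
    )


if __name__ == "__main__":
    day = SECONDS_PER_DAY
    log = [RentEvent(fid=1, units=2, timestamp=0),
           RentEvent(fid=1, units=1, timestamp=100 * day)]
    # Unit counts at day 200, day 396 and day 500.
    print([active_units(log, 1, d * day) for d in (200, 396, 500)])
```

Whenever the value returned for a user drops, the Hub would recompute that user's per-store limits and trigger the prune job.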

APIs

Hubs must also expose the following API, which allows a caller to determine the storage limits for a user.

rpc GetCurrentStorageLimitsByFid(FidRequest) returns (StorageLimitsResponse);

message StorageLimitsResponse {
  repeated StorageLimit limits = 1;
}

message StorageLimit {
  string store_type = 1;
  uint64 limit = 2;
}

Rationale

Can we prevent overconsumption by keeping the network invite-only?

Staying invite-only makes it harder for developers to onboard users and gatekeeps access to the network. This was necessary to bootstrap the network when it was being developed, but is becoming undesirable as the network grows. A permissionless system is a more level playing field for everyone.

Can we prevent overconsumption by kicking out the spammers?

People may have different perspectives on what spam is and making such decisions at the network level introduces a vector for censorship. A better approach is to reduce spam by adding fees and leave the final filtering steps to applications which can experiment with different, dynamic heuristics to identify useful content to surface.

Why is a unit set to a specific number of messages and not bytes?

Keeping track of bytes is much harder than keeping track of the number of messages and is less intuitive for users. A total message count is easier for end users to understand and model their behavior around.

Why is the price for a unit of storage $5/year and not more or less?

A “best guess” was made to select a price that was easy to communicate and was large enough to prevent overconsumption of storage. Networks like ENS and domain names have pricing models that are similar or in the same order of magnitude. We don’t yet know if this is the right number and may change it over time.

Why denominate in USDC instead of ETH?

Pricing in USDC makes it easier for end users - they pay a predictable fee every year that does not change based on the price of Ethereum, which is currently much more volatile.

Why is the storage price fixed instead of market-based?

A fixed price system manually tuned by admins is simpler and less likely to have bugs. It helps us ship something quickly that we can develop as we observe user behavior. It is dependent on a price feed which adds some risk, but this is acceptable when the amounts are small. The right long term approach is a dynamic pricing approach with a GDA or VRGDA. This can maximize value captured while minimizing manual intervention. The main downside is that it is complex, takes a while to tune correctly and may result in “winner’s curse” during periods of high demand.

Who sets the storage price?

The price can be set by a multi-sig currently controlled by Dan & Varun. The price will be adjusted upwards if it is not effective enough at reducing spam and will be adjusted downwards if it is creating too much friction for user onboarding. For the first year, these decisions will be made by the team based on qualitative factors.

Where do the storage fees go?

Fees are expected to be minimal in the first year (~$10-20k) and are mostly for spam prevention. They will be collected in a multi-sig controlled by Dan & Varun. If they reach a significant amount ($100k+) we will consider spending them in ways that benefit the protocol.

Why rent storage instead of buying it permanently?

Storage units could be issued as transferable ERC-20s, which users can buy and re-sell when they no longer need them. While this creates a marketplace for storage, it creates more UX complexity, increases gas costs and makes it more difficult for app developers. It may also lead to hoarding and inefficient allocation, where users may lose or forget about tokens but Hubs can’t tell this and must keep reserving space for these “dead” tokens.

Release

Hubs start listening to a temporary storage contract as part of the v1.4 release. Storage limits are tracked, but not enforced.
During mainnet migration, a final version of the contracts will be deployed.
Hubs will start enforcing storage limits after a grace period of one week.
As part of the migration to the new system, existing users will get two units of storage free for their first year.

Appendix A: Prior Work

Appendix B: Usage Patterns

The following data was collected from three groups of users: Group 1 (who signed up in 2021/2022), Group 2 (who signed up in 2022/2023) and Group 3 (who signed up in 2023). Data is obtained from Warpcast, which keeps an archive of pruned and revoked data, allowing us to get a broader picture of how many messages users generate over time.

	Group 1 (fids < 5000)	Group 2 (5000 ≤ fids < 10,000)	Group 3 (fids ≥ 10,000)
Casts
Mean	180.96	48.53	18.60
Median	16.0	4.0	1.0
90th Percentile	281.80	74.0	24.0
95th Percentile	639.0	160.60	47.0
99th Percentile	3229.40	769.48	224.36
99.9th Percentile	13838.29	2835.29	2378.64
Reactions
Mean	455.04	105.24	30.64
Median	34.0	7.0	3.0
90th Percentile	652.20	143.0	51.0
95th Percentile	1546.60	318.0	107.0
99th Percentile	9929.40	1582.62	469.80
99.9th Percentile	24491.21	10818.61	2624.78
Links
Mean	142.14	82.36	59.34
Median	67.0	52.0	51.0
90th Percentile	292.80	145.60	95.0
95th Percentile	507.80	236.30	138.0
99th Percentile	1497.04	808.82	408.48
99.9th Percentile	2505.38	2624.26	1813.94

Appendix C: Message Sizes

The following shows the maximum possible size of each message type allowed by the protobuf format and the average size observed on Hubs as of Jul 21st, 2023.

	Avg Size (B)	Max Size (B)
Casts	273	1344
Reactions	171	356
Links	153	160
Verification	263	266
User Data	177	448

The following table shows the projected size of Hubs under three different scenarios for different numbers of storage slots:

1. If usage stays at average sizes and average message count.
2. If usage stays at average sizes but all storage slots are used.
3. If a spammer tries to fill every message to the max size possible (theoretical max).

We also multiply all estimates by a 2.42x overhead factor and a 5x buffer factor. The overhead accounts for indexes, events, sync tries and other consumption of storage. The buffer is simply to protect against underestimations and can be relaxed in the future.

We expect realistic usage to fall somewhere between (1) and (2), likely closer to (1). The sizes are represented below in TB:

Slots	Scenario 1	Scenario 2	Scenario 3
10,000	0.00	0.24	0.89
100,000	0.04	2.42	8.85
1,000,000	0.44	24.18	88.54
10,000,000	4.35	241.78	885.42
100,000,000	43.55	2417.77	8854.20
1,000,000,000	435.45	24177.73	88542.01
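The Scenario 2 and Scenario 3 columns can be reproduced from the per-unit limits in the Specification and the message sizes above. The sketch below assumes that "TB" in the table means 2^40 bytes, since that is the unit under which the arithmetic matches the published values. Scenario 1 is not reproduced because it depends on average message counts per user, which are not listed in this appendix. All names are illustrative.

```python
# Per-unit message limits from the Specification section.
LIMITS = {"casts": 5_000, "reactions": 2_500, "links": 2_500,
          "verifications": 50, "user_data": 50}

# Average and maximum message sizes in bytes, from the table above.
AVG_SIZE = {"casts": 273, "reactions": 171, "links": 153,
            "verifications": 263, "user_data": 177}
MAX_SIZE = {"casts": 1344, "reactions": 356, "links": 160,
            "verifications": 266, "user_data": 448}

OVERHEAD_FACTOR = 2.42  # indexes, events, sync tries, etc.
BUFFER_FACTOR = 5       # safety margin against underestimation
BYTES_PER_TB = 2**40    # assumption: the table reports binary terabytes


def projected_tb(slots: int, sizes: dict[str, int]) -> float:
    """Projected Hub size in TB if every slot is filled at the given sizes."""
    bytes_per_slot = sum(LIMITS[store] * sizes[store] for store in LIMITS)
    return slots * bytes_per_slot * OVERHEAD_FACTOR * BUFFER_FACTOR / BYTES_PER_TB


if __name__ == "__main__":
    for slots in (10_000, 100_000, 1_000_000):
        scenario_2 = projected_tb(slots, AVG_SIZE)
        scenario_3 = projected_tb(slots, MAX_SIZE)
        print(f"{slots:>9,}  {scenario_2:8.2f}  {scenario_3:8.2f}")
```

Under these assumptions a single full unit works out to about 2.2 MB of raw message data at average sizes and about 8.0 MB at maximum sizes, before the overhead and buffer factors are applied.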

michaelhly · 2023-06-19T22:34:09Z

michaelhly
Jun 19, 2023

Do we have to impose storage restrictions on the user level? Are other solutions possible?

Gmail became the most popular email client b/c it provided users with so much storage (compared to other email clients) that users didn't have to think about it.

@kcchu mentioned that [this] problem could be solved later and by others. Perhaps better solution(s) could be found later without imposing storage limits on users, such as:

applications purchasing storage units on behalf of users
sophisticated methods of message pruning
standardize an optional content archiving mechanism
dropping unwanted users

How does addressing this problem today help Farcaster clients/applications find product-market fit faster?

8 replies

michaelhly Jun 22, 2023

The Ivory app (some Mastodon client) lets new users sign up for read-only access but users have to pay for a subscription to be able to write.

Can storage be flexible in a way where users can sign up for an fid but cannot write to hubs?

varunsrin Jun 23, 2023
Maintainer Author

yes, it works that way right now

michaelhly Jun 25, 2023

I'm a fan of this, and this is entirely possible with this design. Apps can pay for storage on behalf of users by calling the rent function and assigning the storage to the user's fid.

I don't think apps should be paying "retail" prices for storage units (or FIDs). What are your thoughts on apps paying the protocol a yearly fee to receive "bulk" discounts?

My Rationale:

Exchanges don't charge the same maker-taker fee they charge "retail" traders to liquidity providers. In fact, most exchanges offer "rebates" to market makers who bring liquidity to the platform.
Suppliers charge grocery/department stores cheaper "bulk" prices and then grocery/department sell to "retail" at a mark up — protocol : supplier :: apps : department/grocery store.

varunsrin Jun 25, 2023
Maintainer Author

if it's cheaper to register via an app, everyone will register via an app, and therefore the price will be whatever the app price is.

so the idea of a separate retail price doesn't work, there is likely only one price and we need to set it correctly.

qbig Jul 24, 2023

So this is intended more for preventing spam than for encouraging more people to host hubs? It seems more hubs would not scale the network capacity unless some kind of sharding is introduced.

musnit · 2023-07-06T16:02:57Z

musnit
Jul 6, 2023

If cheaper/free signups are desired, could be nice to have different "tiers" of storage payment, such that if there is spare unused free space, it can be claimed with a lower-tier purchase:

highest tier is simple, $5 or so as described with guaranteed storage until deprecation as described
second tier is free, but always dropped if first tier needs the space
second tier storage is fundamentally optional for hubs, though ideally there is some out-of-protocol social consensus around how it works so that most hubs still store the same stuff in second tier storage
- for example, some shared out-of-protocol bot/sybil-mitigation techniques, like a semi-decentralized set of oracles doing offchain proof-of-human or captcha or bootstrapping from centralized bot-protection systems like facebook/twitter (eg: https://www.brightid.org/)

There is a risk that this idea makes it more likely for a free-rider problem to emerge. As long as hubs all retain strong social consensus that the highest tier takes priority and won't fork that out, then they should co-ordinate to disable and drop second tier support more heavily if the network as a whole has too much of a free-rider problem.

2 replies

varunsrin Jul 10, 2023
Maintainer Author

I think the right way to solve this is by reducing the price of storage, which is a simple way to change the demand curve.

Multi-tier storage increases complexity in a bunch of areas - hub sync has to change dramatically, users and developers cannot rely on all hubs behaving the same way etc.

musnit Jul 10, 2023

yeh it's probably best to reconsider the multi-tier ideas when closer to building a multi-context / multi-network farcaster as it's related to Sandboxing and Storage Management across networks

dimal97 · 2023-07-24T21:13:07Z

dimal97
Jul 24, 2023

Can storage be proportionate to user’s account quality?

By default everyone gets low quota, if your account is popular among others (follows, likes, recasts) quota increases.

Should storage fees be distributed among hub operators?

1 reply

varunsrin Jul 24, 2023
Maintainer Author

for both, it might be possible to do them retroactively in the future.

but any kind of automated reward system for user quality or hub availability is easily gameable.

shotaronowhere · 2023-07-24T22:37:25Z

shotaronowhere
Jul 24, 2023

generally I favor local over global state.

'light hubs' as I suggested to greg at the farcaster meetup in paris would solve dos vectors.

use the web of trust as rate limiters with power users (highly connected nodes in the social graph), hosting infrastructure for their communities (3 degrees of separation for example from direct follows).

social media isn't a strictly financialized adversarial environment. there are non-financial incentives as well. I'm willing to bet you that most people would be willing to share idle PC hosting time to support their direct communities (people they follow, groups they are a part of).

side comment: I think lens is making a mistake over financializing their platform

maybe there's a bootstrapping phase where new users need to find content from global state, but as the network evolves, a rich web of trust will form, at which point anti dos mechanisms which emphasize local state and local trust could work well.

Global state could still benefit from tapping into the web of trust. Perhaps farcaster power users who are willing to pay $5 for extra storage can lend that storage to their local web of trust.

7 replies

vrypan Sep 6, 2023

What if the hub operator could override (upwards) the limits for specific users? So, the storage contract may report that user X has one storage unit, but the hub operator can override this value for this specific user and set it to 100.

This would allow hub operators to offer an archive for some users (their own account, friends and family, community), but the rest of the hubs will follow what the storage contract says. This could also be a way to monetise hubs directly (similar to IPFS pinning in a way).

I can also think of use cases where some hub operators have a specific interest in archiving specific accounts for their own reasons. For example, keeping a full archive of politicians and public figure accounts for ever, even if the accounts do not pay a rent.

varunsrin Sep 6, 2023
Maintainer Author

We've explicitly chosen not to do federation and to take the approach of requiring all hubs to have the same global state. Federation makes sync much harder to perform correctly, makes it easier to censor users and violate sufficient decentralization and makes it harder for users who want to remove data from the network.

If the goal is to have an archive, you can simply stream data out from hubs and back it up somewhere for posterity.

vrypan Sep 7, 2023

Varun, I'm not talking about a federated model. I like the current model. Here is how I would imagine it:

Each hub works exactly as described by FIP-6.
Hubs allow operators to manually increase the storage limits of users (some, all, this is up to the hub operator)
When syncing with other hubs, the additional records are not reported. So the hub has them, but they will not appear during sync.
If a client/app connects to the specific hub, it has access to the extra records. (This will require changes to the Get* methods to include an option like fullArchive=True or maybe a different set of methods).

I think that the current approach is good at this point in FC's lifecycle, but making casts ephemeral will be a limitation at some point. There are many important cases where users need to feel that there is a preservation mechanism. We were discussing all the great use cases of FIP-2 yesterday. I think that most of these cases will not work if we tell users that their blog comments or bookmarks will disappear in a year.

You could argue that preservation may be implemented outside the protocol, however there is value in offering it as part of it.

I understand that what I'm proposing is outside the scope of this FIP. But I think it could be the object of a separate FIP. (I'd be happy to detail and submit one.) It could even be a FIP that Hubble or most hubs do not implement.

varunsrin Sep 7, 2023
Maintainer Author

> You could argue that preservation may be implemented outside the protocol, however there is value in offering it as part of it.

if a user wants to preserve data, they can just pay for storage ($5/year right now) and increase the storage limits and nothing is expired. so this is specifically about hub operators being able to choose what data should get retained for longer durations, which in some cases may be against the desires of the user. it's an interesting idea, but not one that is on the top of our priority stack relative to other fips.

> When syncing with other hubs, the additional records are not reported. So the hub has them, but they will not appear during sync.

> But I think it could be the object of a separate FIP. (I'd be happy to detail and submit one.)

If you want to push this forward you'll probably have to write the code to show that it works. Writing the FIP is the easy part, there are lots of reasons why this may not be practical or even possible to implement with the way the sync trie, data layer and gossip work today.

vrypan Sep 7, 2023

All fair points. I'm not trying to convince the team to work in some specific direction, I'm mostly trying to seed some ideas and share concerns that may (or may not!) be valuable down the road.

I started a separate discussion around fc archiving, and maybe others share better ideas than mine on the topic. But I believe that it's not a topic to be ignored.

deepu · 2023-07-25T06:54:11Z

deepu
Jul 25, 2023

This sounds reasonable.

But wonder if it's better to have a 1$ account with 1/5th of the current limits for people to try out before upgrading to the 5$ account.

My concern comes from 2 points :-

1. Mostly apps will create these on behalf of users on the backend. Imagine signing up 100k users to the app. Going by usual app retention rates, only around 10-20% will be active after a week. The app has to spend $500k on signing up all users but ends up retaining only 20% of them. The rest won't even occupy space.
2. For apps targeting developing countries (e.g. India) 5$ is still a huge amount per person, especially if apps end up losing the majority anyway. 1$ will allow apps from developing countries to onboard more people, and will be affordable even if the majority are lost to retention.

2 replies

varunsrin Jul 25, 2023
Maintainer Author

the goal is to hit the right balance between making it hard for spammers to create accounts while allowing legitimate users to sign up.

tbh, i don't think we know what the right number is. it could be $0.10 or $1 or $10. plan is to start here, see the reaction and modify as we go.

longer term, we'd want a smarter pricing contract that actually auto-adjusts based on demand

deepu Jul 25, 2023

Understood.

As long as we keep correcting based on feedback from real world usage, we should be good.
Let's see how the experiment goes.

Hmac512 · 2023-08-02T16:05:11Z

Hmac512
Aug 2, 2023

What about the case where you want to setup a shared umbrella of storage for a number of users to use?

A Telegram group chat or Github discussions forum are examples. On Discord you can boost servers to give it more functionality/storage.

You can setup a FID for that community, and then that FID can be given storage by anyone.

The problem is messages intended for a group/community are signed by an individual member, and not the overarching group.

This can be solved by having the group set up a server that will sign messages on top of a base message from a user. The downside of this is that a group needs to set up a dedicated server w/ a private key that can sign messages.

1 reply

NetWalker108 Aug 25, 2023

> The downside of this is that a group needs to set up a dedicated server w/ a private key that can sign messages.

Wouldn't it be the group leader or server manager(that the group trusts) taking on this role in this case? If so then it'll look something similar to Mastodon but please confirm if I'm missing something w.r.t. the specific problem.

BlankerL · 2023-08-24T12:53:40Z

BlankerL
Aug 24, 2023

The solution sounds quite reasonable for the short term.

However, we will face a scenario where a user pays $10 to rent 2 units of storage and only pays $5 the next year. How are we going to prune the user's excess data: on a First-In-First-Out basis (delete the earlier messages and keep later ones), or should each application developer provide an interface for users to delete them one by one?

Also, currently, the requirement is that "each Hub must store a copy of every user’s data."

Will we find a solution for this in the future? The capacity for the total number of casts is still restricted by the Hub's minimum disk space, which restricts the growth of the Farcaster ecosystem in the long term.
