
Adding mechanisms to gracefully handle "map is full" error #34

Open
ncloudioj opened this issue Jun 7, 2018 · 12 comments

@ncloudioj
Member

Each LMDB store has a predefined size (10MB by default). If a store runs out of free space because:

  • the store was filled with data, or
  • some orphaned transactions prevented LMDB from reclaiming unused pages,

then all subsequent inserts will be rejected by LMDB with a MDB_MAP_FULL error.

We will have to provide some bailout mechanisms for this particular issue to avoid write downtime.

  • Resize the store. This requires that all users terminate their transactions (read/write) first; also, once the size is increased, there is no way to shrink it short of creating a new environment and copying all the data over.
  • Let LMDB reclaim the unused pages. We need to ensure that no orphan is holding locks in LMDB's reader table. LMDB has an API (mdb_reader_check) to clean up those zombie transactions; it looks like this API is not exposed by lmdb-rs, so we may have to add it upstream first.
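For concreteness, a rough sketch of what those two bailout paths might look like at the lmdb layer. This assumes the lmdb crate's `Environment::set_map_size` and raw-handle accessor plus the `mdb_reader_check` binding from lmdb-sys; it is illustrative, not a proposed implementation.

```rust
// Sketch only: clearing stale readers and growing the map at the lmdb layer.
use std::os::raw::c_int;

use lmdb::Environment;

fn recover_space(env: &Environment, new_size: usize) -> lmdb::Result<()> {
    // Ask LMDB to clear zombie entries from its reader lock table so the pages
    // they pin can be reclaimed. lmdb-rs doesn't wrap this call, so it goes
    // through the -sys bindings directly.
    let mut dead: c_int = 0;
    unsafe {
        lmdb_sys::mdb_reader_check(env.env(), &mut dead);
    }

    // Grow the map. All read/write transactions must be finished before this
    // call, and the only way to shrink later is to copy into a fresh environment.
    env.set_map_size(new_size)
}
```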
@zen0wu

zen0wu commented Oct 13, 2018

On a related matter, it seems that mdb_env_set_mapsize should be exposed to change the default size. I went back to the lower-level library because of this limitation. Not sure if there's an existing issue for that.

@mykmelez
Contributor

@shivawu You can change the map size using rkv. See #82 for tests that demonstrate how to do that.
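For anyone landing here later, a minimal sketch of what that might look like; the `set_map_size` method name is inferred from #82 and may not match the current rkv API exactly.

```rust
// Sketch only: growing the map through rkv. The `set_map_size` name is an
// assumption based on the tests added in #82; the new size is illustrative.
use std::path::Path;

use rkv::{Rkv, StoreError};

fn grow_map(path: &Path) -> Result<(), StoreError> {
    let env = Rkv::new(path)?;
    // Resizing requires that no read or write transactions are open on `env`.
    env.set_map_size(100 * 1024 * 1024) // grow to 100 MiB
}
```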

@zen0wu

zen0wu commented Oct 21, 2018

Ah, I see, somehow I missed it. Thanks for pointing that out. Probably the local docs don't work that well for me. Also, by the way, the online docs are missing for the project (https://docs.rs/rkv/0.5.1/rkv/). Is that expected?

@mykmelez
Contributor

Ah, I see, somehow I missed it. Thanks for pointing that out. Probably the local docs don't work that well for me. Also, by the way, the online docs are missing for the project (https://docs.rs/rkv/0.5.1/rkv/). Is that expected?

Not expected, but known. It's #81, where you can read all the details (tl;dr: docs.rs is using an outdated rustc compiler).

@ncloudioj
Member Author

Another way for us to deal with MAP_FULL is to check the space usage of the environment via its stats API (which reports the max map size, total pages in use, total pages in the freelist, etc.), and then increase the map size if the disk usage goes over a predefined high watermark.

Specifically, we can do this either by a dedicated maintenance worker, or by the consumers themselves. Either way, some coordination is required since resizing can only be done on a free environment (only one writer without any readers).

This preventive resizing appears preferable to resize-when-you-hit-map-full in two respects:

  • Write transactions are less likely to be interrupted
  • It is easier to coordinate, since preparing the resize condition is hard when there are active readers/writers
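A rough sketch of that watermark check against the environment stats; the 80% threshold and the doubling policy are illustrative only, and `last_pgno` is used as a crude proxy for pages in use.

```rust
// Sketch only: preventive resize driven by the environment stats.
use lmdb::Environment;

fn maybe_grow(env: &Environment) -> lmdb::Result<()> {
    let info = env.info()?;
    let stat = env.stat()?;

    let page_size = stat.page_size() as usize;
    // Rough proxy for space in use; it ignores pages sitting in the freelist.
    let used = (info.last_pgno() + 1) * page_size;
    let capacity = info.map_size();

    // Grow once usage crosses the 80% high watermark. The caller must ensure
    // the environment is quiescent (one writer, no readers) at this point.
    if used * 10 >= capacity * 8 {
        env.set_map_size(capacity * 2)?;
    }
    Ok(())
}
```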

@mykmelez What do you think?

@mykmelez
Contributor

@ncloudioj It seems helpful to periodically check stats and preemptively increase the map size if the database is approaching its limit (as well as to periodically call mdb_reader_check to clear stale readers). And that'll make it less likely for a write to fail with MAP_FULL. But it seems like we'll still occasionally encounter that error, so we'll still need to handle it somehow.

Perhaps kvstore (or the consumer) can catch the MAP_FULL error and trigger a resize (and stale reader check) operation at that point, after which it can retry the write?

That operation could be identical to the one we run periodically. And perhaps it could be invoked with a flag to indicate that it's being invoked manually and should definitely increase the map size, even if the environment doesn't look too full, since a large-enough write could fail in an environment that is mostly empty.

As for coordination, I suppose that whatever is invoking the resize operation could await existing readers while clearing stale ones and blocking new ones, then resize once existing readers are cleared, and finally unblock the new readers (after which it can return control to whoever called it—kvstore or the consumer—to retry the write).
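Something like the following, perhaps, where `run_maintenance` is a hypothetical stand-in for the shared resize + stale-reader-check operation.

```rust
// Sketch only: catch MAP_FULL, run the maintenance pass with a forced size
// bump, then retry the write once.
use lmdb::Environment;

fn run_maintenance(env: &Environment, force_grow: bool) -> lmdb::Result<()> {
    // The periodic stale-reader check and watermark/forced resize would live here.
    let _ = (env, force_grow);
    Ok(())
}

fn put_with_retry<F>(env: &Environment, mut write: F) -> lmdb::Result<()>
where
    F: FnMut(&Environment) -> lmdb::Result<()>,
{
    match write(env) {
        // A large enough write can hit MAP_FULL even in a mostly empty map,
        // so force a size increase rather than relying on the watermark.
        Err(lmdb::Error::MapFull) => {
            run_maintenance(env, /* force_grow */ true)?;
            write(env)
        }
        other => other,
    }
}
```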

@ncloudioj
Member Author

Agreed. Looks like kvstore is the perfect candidate to do this bookkeeping. Once this particular error gets handled properly, it'll be a big step forward for us to roll out rkv in production at scale!

Speaking of maintenance, LMDB also has a set of APIs for cloning the environment (as mentioned in #12), which could be useful for avoiding the fragmentation problem (like VACUUM). A frequently updated store can consume many more pages than a fresh clone.

With a stale reader collector, a resize utility, and a live copy utility, I believe we'd have a solid maintenance lineup for kvstore.
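For the live-copy piece, a sketch of how the compacting copy might be driven through the raw bindings, assuming lmdb-sys exposes `mdb_env_copy2`/`MDB_CP_COMPACT` (which LMDB's C API provides).

```rust
// Sketch only: a VACUUM-style compacting copy via the raw LMDB bindings.
use std::ffi::CString;
use std::path::Path;

use lmdb::Environment;

fn compact_copy(env: &Environment, dest: &Path) -> Result<(), String> {
    // mdb_env_copy2 with MDB_CP_COMPACT writes a compacted copy (no freelist
    // pages) to `dest`; swapping the copy in afterwards is what actually
    // reclaims the fragmented space.
    let c_dest = CString::new(dest.to_str().ok_or("non-UTF-8 path")?)
        .map_err(|e| e.to_string())?;
    let rc = unsafe {
        lmdb_sys::mdb_env_copy2(env.env(), c_dest.as_ptr(), lmdb_sys::MDB_CP_COMPACT)
    };
    if rc == 0 {
        Ok(())
    } else {
        Err(format!("mdb_env_copy2 failed with code {}", rc))
    }
}
```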

@mykmelez
Contributor

This morning it occurred to me that we might implement this in rkv instead of kvstore, in which case rkv consumers more generally would gain the benefit of this automatic resizing. And doing that would be consistent with the goal of rkv to provide a "simple, humane" interface to LMDB. Unsure if there's any reason not to implement this in rkv, but I'll think about it some more.

@ncloudioj Do you have any thoughts about implementing this in kvstore vs. rkv?

@ncloudioj
Member Author

ncloudioj commented Feb 15, 2019

My reasoning was based on the assumption that the consumer knows better than rkv when and how to do the resize :)

Though I agree with you that letting rkv handle it for the consumers would be nice in most cases. If we were to implement it in rkv, I think we also need to consider the following cases:

  • rkv consumers might want to handle MAP_FULL themselves. For example, they might want a fixed-size store, or might not want it to grow without bound. Perhaps leave auto-resize as an option?
  • Off the top of my head, coordinating readers and writers for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end their underlying transactions. I just realized that kvstore actually faces the same challenge, though. The key question is how we can efficiently send this need-to-resize message to all of the rkv readers and writers.

@mykmelez
Contributor

It occurred to me recently that a reason to do this in rkv rather than kvstore is that even in Firefox there are consumers who are using rkv directly, such as the one in bug 1429796.

rkv consumers might want to handle MAP_FULL themselves. For example, they might want a fixed-size store, or might not want it to grow without bound. Perhaps leave auto-resize as an option?

As with some other issues, like #109, a consumer might want more control over their interaction with LMDB under certain circumstances while appreciating the value of rkv's higher-level abstractions in other cases. And there's a tension between supporting those use cases and keeping the API simple for the common case.

I do want to support those use cases, and I agree that it should be possible for an rkv consumer to "opt-out" of auto-resize. I would just want to make sure we do so in a way that imposes the least cognitive burden on consumers in the common case.

Off the top of my head, coordinating readers and writers for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end their underlying transactions. I just realized that kvstore actually faces the same challenge, though. The key question is how we can efficiently send this need-to-resize message to all of the rkv readers and writers.

I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.
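Purely as a conceptual sketch (not anything rkv does today), in a single process that tracking could be as simple as gating readers and resizes on a shared lock:

```rust
// Conceptual sketch only: a pending resize waits for active readers to finish
// and holds off new ones. Single-process only.
use std::sync::{Arc, RwLock};

use lmdb::Environment;

#[derive(Clone)]
struct TrackedEnv {
    inner: Arc<RwLock<Environment>>,
}

impl TrackedEnv {
    // Readers take the shared side; the guard's lifetime marks the reader as
    // "active" for as long as the caller holds it.
    fn with_reader<T>(&self, f: impl FnOnce(&Environment) -> T) -> T {
        let env = self.inner.read().unwrap();
        f(&env)
    }

    // A resize takes the exclusive side: it waits for existing readers to drop
    // their guards and blocks new ones until the resize completes.
    fn resize(&self, new_size: usize) -> lmdb::Result<()> {
        let env = self.inner.write().unwrap();
        env.set_map_size(new_size)
    }
}
```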

@ncloudioj
Member Author

Indeed, there are pros and cons to both mechanisms. It looks like equipping rkv with auto-resize works better in more cases at the moment. So yes, I am convinced; let's get started with that design.

To keep the initial work to a reasonable scope, I'd also like to withdraw my prior suggestion of making auto-resize optional. We can revisit that feature if we find it useful in the future.

I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.

Yes, it's definitely doable in a single-process scenario, but it will be more challenging when active readers reside in different processes. Perhaps we can have an internal db (e.g. __meta__) to store all the meta information in order to facilitate the coordination.

With this change, readers could also be blocked by the resize coordination, though this should be easier to handle since we can return an error (such as RESIZE_IN_PROGRESS) from the reader constructor.
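A tiny, hypothetical sketch of that reader-constructor behavior (none of these names exist in rkv):

```rust
// Hypothetical sketch only: surface a RESIZE_IN_PROGRESS error from the reader
// constructor instead of blocking; the flag and error variant are made up.
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug)]
enum StoreError {
    ResizeInProgress,
}

struct Store {
    resize_pending: AtomicBool,
}

struct Reader<'s> {
    _store: &'s Store,
}

impl Store {
    fn read(&self) -> Result<Reader<'_>, StoreError> {
        // Refuse to start a new read transaction while a resize is pending;
        // callers can back off and retry.
        if self.resize_pending.load(Ordering::Acquire) {
            return Err(StoreError::ResizeInProgress);
        }
        Ok(Reader { _store: self })
    }
}
```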

@mykmelez
Contributor

mykmelez commented Mar 25, 2019

Note https://www.openldap.org/its/index.cgi/Software%20Bugs?id=8975, which fixes a (rare) error/crash when calling mdb_env_set_mapsize() on Windows if MDB_WRITEMAP is set. It's already been landed on the mdb.RE/0.9 branch, but a new version of LMDB hasn't been released with the fix yet.
