Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support 64-bit file offsets on systems with 32-bit off_t #2032

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

inglorion
Copy link
Contributor

@inglorion inglorion commented May 8, 2023

Adds large file support (64-bit file positions) to the existing Nix
API when targetting systems that have separate 32-bit and 64-bit
APIs.

On many platforms that support 64-bit file positions, this support is
present in the standard I/O functions. On others, there have
historically been two APIs: the standard API, which limits file size to
2**31-1 bytes, and a separate API that suffixes function names with "64"
and supports 64-bit file sizes. This "transitional interface for 64-bit
file offsets" is not present on every platform. As a result, using the
*64 API will make programs non-portable, whereas using the standard API
will result in programs that cannot handle large file offsets on some
systems, even if those systems actually do have a way to handle
large file offsets.

This change enables 64-bit file offsets on the following platforms
where the standard API only provides 32-bit offsets:

  • 32-bit Linux with glibc.
  • 32-bit Android.

To support large files with a consistent API across platforms, this
change makes Nix functions call the 64-bit capable equivalents where
necessary. Other C libraries may not provide the *64 API, so we continue
to call the standard functions on those.

Broadly, the change consists of 5 parts:

  1. Uses of libc::off_t have are replaced by i64. This provides a
    consistent API across platforms and allows 64-bit offsets to
    be used even when libc::off_t is 32-bit. This is similar to
    std::fs, which uses u64 for file positions. Nix uses i64
    because off_t is a signed type. Into is used where appropriate
    so that existing code that uses libc::off_t will continue to
    work without changes.

  2. A largefile_fn macro that is used to select the large-file-capable
    version of a function. E.g. largefile_fn![pwrite] is equivalent
    to libc::pwrite64 on glibc/Linux and plain libc::pwrite on
    systems that don't have or need libc::pwrite64.

  3. A new require_largefile macro that is used to skip tests that
    require libc::off_t to be larger than 32 bits.

  4. Changes to fallocate, ftruncate, lseek, mmap, open, openat,
    posix_fadvise, posix_fallocate, pread, preadv, pwrite, pwritev,
    sendfile, and truncate, making them support large files.

  5. A set of test_*_largefile tests to verify that each of the
    previously mentioned functions now works with files whose size
    requires more than 32 bits to represent.

A few functions are still limited to 32-bit file sizes after this
change. This includes the aio* functions, the stat functions, and
mkstemp(). Adding large file support to those requires a bit more
work than simply calling a different function and is therefore left
for later.

@inglorion
Copy link
Contributor Author

The unused_mut complaints in src/sys/socket/mod.rs exist before this PR. I've sent PR #2033 to fix those.

@inglorion
Copy link
Contributor Author

#2033 was merged. I've rebased this PR on top of that. I expect it will pass the checks now. I've made a bunch of commits addressing check failures - would you like to squash those before review?

@inglorion
Copy link
Contributor Author

Does anyone have thoughts on this? @asomers ?

@inglorion
Copy link
Contributor Author

Any comments?

@asomers
Copy link
Member

asomers commented Jun 29, 2023

This PR will break backwards compatibility for callers who pass a libc::off_t to Nix, right? What about making these functions generic, like this, and always calling the 64-bit syscall?

fn pread<Off: Into<libc::off_t>>(f: RawFd, buf: &mut [u8], offset: Off> {...}

@inglorion
Copy link
Contributor Author

This PR will break backwards compatibility for callers who pass a libc::off_t to Nix, right? What about making these functions generic, like this, and always calling the 64-bit syscall?

fn pread<Off: Into<libc::off_t>>(f: RawFd, buf: &mut [u8], offset: Off> {...}

Assuming you meant "Into<off_t>" (that is, accept any type (including libc::off_t) that can be converted into a nix::off_t).

I'll see how that goes.

@inglorion inglorion force-pushed the largefile branch 8 times, most recently from 4d35ffa to 10d3fea Compare July 1, 2023 00:54
@inglorion
Copy link
Contributor Author

Ok, uclibc is interesting. It can be compiled so that file offsets are always 64-bit (__USE_FILE_OFFSET64) and it can be compiled to provide the *64() functions and off64_t (__USE_LARGEFILE64). However, it does not apply these options to preadv() and pwritev(). Instead, preadv() and pwritev() always take 64-bit offsets (regardless of __USE_FILE_OFFSET64), and preadv64() and pwritev64() are never provided (regardless of __USE_LARGEFILE64).

My original plan was to focus on making nix work with large files when using glibc and leave everything else untouched as much as possible. After having tried a couple of variations of the code as it relates to uclibc, it's still in that spirit. I have changed the comments to make it clear that "uclibc doesn't use off_t" is specific to preadv and pwritev, but other than that, the code does the same thing it did before when using uclibc. This means uclibc on 32-bit Linux is still using 32-bit file offsets - something we may want to address in a follow-up.

@inglorion
Copy link
Contributor Author

inglorion commented Jul 1, 2023

I've made most of the relevant functions use <Off: Into<off_t>> so that they can accept either a nix::off_t or a libc::off_t. This should increase compatibility with existing code.

There are, however, still a few cases where a change to calling code is required:

  1. Functions that return an off_t are still switching from potentially 32-bit libc::off_t to nix::off_t.
  2. Same for functions that take a &mut off_t (e.g. sendfile).

I also ran into a situation with mmap. I can add Off: Into<off_t> easily enough, and that will work fine for call sites that don't pass any type parameters. But there are a number of cases in the tests where it's being called as mmap(...). This is because None is being passed as the optional f, and we need to tell the compiler what type the None is. If I change the number of type parameters, such call sites will have to be updated. I'm not really happy with how that looks, so I've left mmap alone for now. I suspect that, in practice, the offset being passed to mmap is usually a literal 0, but, of course, I don't really know all the code that's out there.

@asomers
Copy link
Member

asomers commented Jul 1, 2023

This PR will break backwards compatibility for callers who pass a libc::off_t to Nix, right? What about making these functions generic, like this, and always calling the 64-bit syscall?

fn pread<Off: Into<libc::off_t>>(f: RawFd, buf: &mut [u8], offset: Off> {...}

Assuming you meant "Into<off_t>" (that is, accept any type (including libc::off_t) that can be converted into a nix::off_t).

I'll see how that goes.

Actually, I meant Into<u64>. Because if we go this route, I think we can get rid of nix::off_t. Can we?

@inglorion inglorion force-pushed the largefile branch 2 times, most recently from 03b0935 to 28ff751 Compare July 2, 2023 02:07
@inglorion
Copy link
Contributor Author

Status update, since I've been called away to other duties for a while:

u64 is going to be a bit of a challenge, because off_t is signed. This means, among other things, that code that passes in a libc::off_t will fail to compile if we switch to u64. It does have the nice property that it's also what std uses (std::fs::Metadata::len is u64, and std::io::Seek::stream_position returns Result, to name a few examples).

Alternatively, we could do i64. I made some changes to see what that would look like, but was called away to other duties. I've uploaded what I have, which doesn't yet pass tests on all targets, but should give an idea. Mostly, it simplifies things compared to having some nix-specific offset type, which I like.

The biggest difference between this and the previous version (which used nix::off_t) is that i64 is always 64-bit, whereas nix::off_t as I had coded it was 64-bit on most targets, but could be a different size on some - for example, on 32-bit Linux where the C library is not glibc. This is what many of the check failures are about: we're calling a libc function which expects a 32-bit off_t, but we're giving it an i64. This can be fixed; I just haven't gotten around to it.

In the repo where I'm doing the development, I have a function to_off_t which converts the offset passed into a function into the type that the libc function requires and returns Err(Errno::EOVERFLOW) if this does not succeed. Here is an example of how it is used. This has the same effect as lseek returning EOVERFLOW, which it is documented to do if the file position cannot be represented in an off_t.

Advantages of this approach:

  • Consistent API: file offsets are represented as i64 everywhere.
  • We can call 64-bit capable functions where they exist and fail with an already-documented error code when trying to use file offsets that the underlying implementation does not support.
  • Thanks to Into, we can continue to accept both libc::off_t values as input (both 32-bit and 64-bit).

Disadvantages:

  • Because we're using i64 everywhere, code that currently expects 32-bit off_t return values or use 32-bit off_t references will fail to compile. This may not be a huge problem, but it is an API change.

@asomers
Copy link
Member

asomers commented Jul 6, 2023

Yes, I wasn't thinking about signedness. i64 is definitely better than u64. The important questions are:

  • Are there any platforms where off_t is 32-bits and don't also have 64-bit functions like lseek64? If so, we shouldn't use Into<i64> on those platforms. It's better to fail at compile time than runtime.
  • Functions that return off_t and that also have 64-bit equivalents, like lseek, can be modified to return a generic type. Are there any that can't?

@inglorion
Copy link
Contributor Author

Yes, I wasn't thinking about signedness. i64 is definitely better than u64. The important questions are:

  • Are there any platforms where off_t is 32-bits and don't also have 64-bit functions like lseek64? If so, we shouldn't use Into<i64> on those platforms. It's better to fail at compile time than runtime.

The BSDs have 64-bit off_t even on 32-bit architectures. So does Haiku if I read the source correctly. The only platform I know of that has 32-bit off_t is 32-bit Linux. I think you can compile a libc for Linux that has 32-bit off_t and doesn't provide the *64() functions. Do we want to support such a configuration? I think this already currently doesn't work, because, for example, we provide lseek64 on Linux regardless of word size.

@anacrolix
Copy link

I've discovered this issue compiling for arm-unknown-linux-gnueabi. In particular, using the fcntl API assumes the 32bit version of flock inside FcntlArg::F_SETLK and friends, and I repeated this same problem in #2300 (noting that this doesn't block merging that, the fix will be the same for all APIs).

@inglorion inglorion force-pushed the largefile branch 4 times, most recently from 3966cf7 to eeaed43 Compare July 25, 2024 13:37
@inglorion inglorion force-pushed the largefile branch 3 times, most recently from 979b487 to dc82452 Compare July 25, 2024 16:36
@inglorion
Copy link
Contributor Author

It seems that 32-bit Android, too, uses 32-bit file offsets. Let's see if I can make 64-bit offsets work there, too.

@inglorion
Copy link
Contributor Author

Per https://android.googlesource.com/platform/bionic/+/main/docs/32-bit-abi.md, Android has the *64 functions available, so I think the same change for glibc will work for Android, too.

@inglorion inglorion force-pushed the largefile branch 9 times, most recently from 3db46c2 to 117eb6a Compare July 30, 2024 10:42
@inglorion inglorion changed the title support 64-bit file offsets on 32-bit glibc/Linux support 64-bit file offsets on systems with 32-bit off_t Jul 30, 2024
Adds large file support (64-bit file positions) to the existing Nix
API when targetting systems that have separate 32-bit and 64-bit
APIs.

On many platforms that support 64-bit file positions, this support is
present in the standard I/O functions. On others, there have
historically been two APIs: the standard API, which limits file size to
2**31-1 bytes, and a separate API that suffixes function names with "64"
and supports 64-bit file sizes. This "transitional interface for 64-bit
file offsets" is not present on every platform. As a result, using the
*64 API will make programs non-portable, whereas using the standard API
will result in programs that cannot handle large file offsets on some
systems, even if those systems actually do have a way to handle
large file offsets.

This change enables 64-bit file offsets on the following platforms
where the standard API only provides 32-bit offsets:

 - 32-bit Linux with glibc.
 - 32-bit Android.

To support large files with a consistent API across platforms, this
change makes Nix functions call the 64-bit capable equivalents where
necessary. Other C libraries may not provide the *64 API, so we continue
to call the standard functions on those.

Broadly, the change consists of 5 parts:

 1. Uses of libc::off_t have are replaced by i64. This provides a
    consistent API across platforms and allows 64-bit offsets to
    be used even when libc::off_t is 32-bit. This is similar to
    std::fs, which uses u64 for file positions. Nix uses i64
    because off_t is a signed type. Into<i64> is used where appropriate
    so that existing code that uses libc::off_t will continue to
    work without changes.

 2. A largefile_fn macro that is used to select the large-file-capable
    version of a function. E.g. largefile_fn![pwrite] is equivalent
    to libc::pwrite64 on glibc/Linux and plain libc::pwrite on
    systems that don't have or need libc::pwrite64.

 3. A new require_largefile macro that is used to skip tests that
    require libc::off_t to be larger than 32 bits.

 4. Changes to fallocate, ftruncate, lseek, mmap, open, openat,
    posix_fadvise, posix_fallocate, pread, preadv, pwrite, pwritev,
    sendfile, and truncate, making them support large files.

 5. A set of test_*_largefile tests to verify that each of the
    previously mentioned functions now works with files whose size
    requires more than 32 bits to represent.

A few functions are still limited to 32-bit file sizes after this
change. This includes the aio* functions, the *stat* functions, and
mkstemp(). Adding large file support to those requires a bit more
work than simply calling a different function and is therefore left
for later.
@inglorion
Copy link
Contributor Author

@asomers, are you happy to take another look at this? As we discussed, I've changed to using i64 for offsets. I've also extended the change to cover Android, which on 32-bit also has 32-bit off_t and the *64 largefile API. With the most recent run, all checks pass.

@SteveLauC
Copy link
Member

Thanks for the detailed write-up, I haven't taken a closer look at the patch code, but the idea generally looks good to me.

From the libc crate source code, those xxx64() symbols are only present on Linux/Android/AIX/Hurd, so adding them for Linux glibc/Andoird as the first step looks good to me.

I think you can compile a libc for Linux that has 32-bit off_t and doesn't provide the *64() functions. Do we want to support such a configuration? I think this already currently doesn't work, because, for example, we provide lseek64 on Linux regardless of word size.

We don't need to support that combination, Nix, as a consumer of the libc crate, should follow and trust it to some extent. Since lseek64() is added to the libc crate under that case, we did it as well.

I hope I can review this PR this weekend.

@inglorion
Copy link
Contributor Author

@SteveLauC, thank you for taking a look! Did you get a chance this past weekend?

@SteveLauC
Copy link
Member

Did you get a chance this past weekend?

No, I was quite down during the last 3 weeks, which makes me can barely handle open-source stuff, will review this today:)

@SteveLauC
Copy link
Member

The libc crate wants to deal with this problem in version 1.0, which would bring huge breaking changes to its code base, Nix is a big consumer of the libc crate, so I am kinda worried that if we handle this before the libc crate takes further action, it would be a big refactor for us as well when we switch to libc 1.0.

Though I would say libc 1.0 would land in the near future, LFS is hard since it differs from the platforms and varies with different C pre-processor configs even in the same platform.

Ref:

  1. libc 1.0 tracking issue: Tracking issue for libc 1.0 rust-lang/libc#3248
  2. For Linux/glibc

@inglorion
Copy link
Contributor Author

@SteveLauC, thank you for taking a look and adding the pointers to the libc crate's proposed future direction.

I do appreciate not wanting to do a big refactor in the future. On the other hand, I don't expect that this will turn out to be particularly painful, for a couple of reasons:

  1. There is already an API with 64-bit file offsets. This is the API you get today on many systems, including for example FreeBSD, 64-bit Linux, and musl. The transitional API (with off64_t, pwrite64, etc.) is the same apart from the "64" in the names. Thus, it seems exceedingly likely that libc will continue to provide this API.

  2. The most likely scenarios seem: (a) libc will make file offsets 64 bits everywhere (i.e. pwrite will accept 64-bit offsets also on 32-bit systems), (b) libc will make the "64" names available everywhere (e.g. "pwrite64" will also be available on FreeBSD), or (c) libc will continue to have 32-bit offsets on some systems and a separate 64-bit API on some systems. All of these scenarios are easily adapted to (basically, we update the largefile_fn macro or remove the largefile_fn macro as appropriate).

  3. Even if the libc crate changes in a way that breaks API compatibility, if nix has already standardized on a 64-bit API that is the same across platforms, we might actually able to shield users of nix from that libc change.

The biggest risk I see is that nix's API will look a bit different from libc's. For example, you could imagine the libc crate deciding on the approach that calling a libc crate function should always call the function of the same name in the C library (i.e. pwrite would call pwrite, not pwrite64). This is different from what this PR does, which is always to call the 64-bit variant (whether that has 64 in the name or not). Arguably, if libc and nix take different paths here, that might cause some confusion.

Having said the above, I think it does make sense for nix to always use the 64-bit functions, whether or not the libc crate ends up taking the same approach. After all, nix aims to provide a friendly wrapper and to unify what can be unified. I would say having pwrite always mean the 64-bit pwrite is consistent with that. It makes code written against nix more portable, because the offsets are always 64 bits. Moreover, it avoids the footgun of having nicely named, available on all platforms functions that will suddenly fail if your files grow beyond a certain size, and having to remember to add "64" to the names where available. Given that nix aims to be a friendly wrapper around libc, I would argue that the right thing for it to do is to use the 64-bit libc functions where available.

In summary, I think standardizing on 64-bit file offsets is the right thing to do for nix, and I think we don't have to wait for the libc crate to make that move.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants