Use filefrag -sv to fingerprint COW Reflink extents #88
Comments
Since I've started adding BTW, the Here's an example of using filefrag's FIEMAP output to find differences between two reflinked files. It skips over the headings, filters out cols 1,7,8 and removes any non-digit chars before piping that into |
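A rough Python sketch of the same idea (not the commenter's original command): skip the headings, keep only the numeric extent columns of each filefrag -v row, and report the rows that appear in only one of the two outputs. The regular expression and the function names extent_rows and differing_extents are illustrative assumptions, and the row layout is assumed to match what e2fsprogs filefrag -v prints.

```python
import re
from typing import List, Tuple

# One extent row of `filefrag -v` (assumed layout), e.g.
#    0:        0..     255:      34816..     35071:    256:             last,eof
# Captures logical start/end, physical start/end and length; the leading
# extent index and the trailing expected/flags columns are ignored.
EXTENT_ROW = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)

Extent = Tuple[int, int, int, int, int]


def extent_rows(filefrag_output: str) -> List[Extent]:
    """Return the numeric extent columns, skipping headings and summary lines."""
    rows = []
    for line in filefrag_output.splitlines():
        match = EXTENT_ROW.match(line)
        if match:
            rows.append(tuple(int(group) for group in match.groups()))
    return rows


def differing_extents(output_a: str, output_b: str) -> List[Extent]:
    """Extents that appear in exactly one of the two outputs, sorted."""
    return sorted(set(extent_rows(output_a)) ^ set(extent_rows(output_b)))
```

An empty result means the two outputs list the same extents; as discussed further down the thread, that is only meaningful when both files live on the same file system.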
So a couple of things about the device number. Currently FIEMAP does not return the device number. Fetching the device number from stat is going to be highly misleading, since (a) btrfs can support multiple devices, and (b) btrfs can support multiple subvolumes. What btrfs returns in stat is not necessarily the physical device when multiple subvolumes are involved, and when there are multiple devices involved, and a file can span multiple devices, returning a single "device" number is going to be misleading / wrong. I have no objections to adding a mode where filefrag returns the contents of the returned struct fiemap_extent from the FIEMAP ioctl in some kind of easily parsable format, whether that's CSV, TSV, or JSON. But I don't believe in trying to make up block device numbers by using stat and then returning that as the physical location for all of the fiemap extents, because while that might work for ext4, it most certainly will not always be correct for btrfs. If you want the physical block device for each fiemap extent, please contact the btrfs kernel developers and get them to extend the FIEMAP ioctl. There are reserved fields in the struct fiemap_extent that could potentially be used for the block device number. But that needs to go into the linux kernel upstream first. And then if filefrag returns information which is "wrong", it will be because it returns exactly what the kernel has returned in struct fiemap_extent, and once again, I will refer complainers to the btrfs kernel developers. |
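For illustration only, here is a minimal Python sketch of what "the contents of struct fiemap_extent in a parsable format" could look like when the ioctl is called directly rather than through filefrag. It is not a filefrag feature. The ioctl number, the FIEMAP_FLAG_SYNC value and the two struct layouts are transcribed from linux/fs.h and linux/fiemap.h as I understand them and should be checked against your own headers; the function names and the 512-extent buffer size are arbitrary assumptions.

```python
import fcntl
import json
import struct

# Assumed values, transcribed from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # sync the file first, like filefrag -s
FIEMAP_MAX_OFFSET = 2**64 - 1     # map the whole file

# struct fiemap header (without fm_extents[]): 32 bytes.
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3]: 56 bytes.
FIEMAP_EXTENT = struct.Struct("=QQQ2QI3I")


def unpack_extents(buf: bytes) -> list:
    """Decode the extents the kernel wrote after the fiemap header."""
    mapped = FIEMAP_HDR.unpack_from(buf)[3]          # fm_mapped_extents
    extents = []
    for i in range(mapped):
        offset = FIEMAP_HDR.size + i * FIEMAP_EXTENT.size
        fields = FIEMAP_EXTENT.unpack_from(buf, offset)
        extents.append({
            "fe_logical": fields[0],
            "fe_physical": fields[1],
            "fe_length": fields[2],
            "fe_flags": fields[5],
        })
    return extents


def fiemap(path: str, max_extents: int = 512) -> list:
    """Call FS_IOC_FIEMAP on path; returns at most max_extents extents."""
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXTENT.size)
    FIEMAP_HDR.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, FIEMAP_FLAG_SYNC,
                         0, max_extents, 0)
    with open(path, "rb") as fh:
        fcntl.ioctl(fh.fileno(), FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    return unpack_extents(bytes(buf))


def fiemap_json(path: str) -> str:
    """Same data as fiemap(), serialised as JSON."""
    return json.dumps(fiemap(path), indent=2)
```

The output contains only what the kernel returned, so it carries no device number, and a file with more than max_extents extents would need repeated calls that this sketch does not attempt.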
@tytso As per my comment in #84, some use cases (like the one in this issue) don't require actual physical addresses and device refs. They only need some consistent addressing scheme, even if it's virtual, to show commonalities. (The situation would be similar for an XFS filesystem that is on a raid layer.) So my comments here are not a complaint, but more of a recognition that @jrw can still accomplish their task using As for the stat device numbers, this will be different for two files that are on different filesystems. |
@tytso The idea I'm looking for with a device number is something to disambiguate files on different filesystems. I don't know what that might be for btrfs. My original thought was devno would be appropriate, but I can see that's wrong. |
If you're trying to figure out whether two files belong to the same file system, why not just fetch st_dev via "stat -c %d ", as opposed to asking for a change in filefrag? Again, this may be misleading in some circumstances, because of tricks and games that btrfs is playing with st_dev and the fact that there is not a device specifier in FIEMAP. But these are issues you need to take up with the btrfs developers; they are not under my control. I don't believe you can count on filefrag -sv being identical between two reflink files in all cases, because of the fact that btrfs supports a file system that spans multiple devices. How do you know whether block 42 is on device A or device B? It might work "most" of the time, but again, I don't control btrfs, so please take it up with the btrfs developers. |
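The st_dev check suggested above can be sketched in a few lines of Python; os.stat() exposes the same value that stat -c %d prints. The function name same_st_dev is an assumption, and the caveat from the comment still applies: on btrfs, equal or unequal st_dev values are not proof that two files do or do not share a file system.

```python
import os


def same_st_dev(path_a: str, path_b: str) -> bool:
    """True when stat() reports the same st_dev for both paths.

    This is a hint only: btrfs may report different st_dev values for
    files in different subvolumes of one file system.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A caller would run this before comparing any extent listings and skip the comparison when it returns False.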
I am currently using stat to get the st_dev and I will continue to do so if there's nothing that can be done accurately at the level of filefrag. Also, I hear your point that (maybe?) the filefrag information is not (cannot be) totally accurate due to block numbering across multiple devices. However, I think that the actual change I'm requesting (remove the filename from the output when there is only one file) would not be hard or out-of-line for filefrag to implement. But, if the identification of identical reflink files cannot really be achieved with filefrag output (due to the points you've mentioned), then maybe it's not really a priority. In any case, I did want to make you aware of this attempt to use the information from filefrag -sv to identify identical reflink files. Also, it would obviously be nice/useful to have some kind of utility directly from the btrfs/xfs developers which could be used to identify these identical reflink files. I don't know how to implement it there (so I couldn't make a PR), but I can raise an issue with them. Thanks for your time! |
Submitted https://bugzilla.kernel.org/show_bug.cgi?id=217068 to btrfs devs (hopefully). |
It's not that the information printed by filefrag -v is inaccurate; it's just that it is incomplete if your goal is to determine whether two reflinked files are pointing at the same set of blocks. You're trying to use filefrag in a way that it wasn't originally intended; for that matter, the FIEMAP ioctl wasn't intended for this use. My complaint about the patch is that changing filefrag -v to print the st_dev is (a) no different from your using stat(1) to get the st_dev, and (b) might mislead people into thinking that this is actually a reliable way of determining that two files are identical. I won't object to a patch which adds an option for filefrag -v to print information in a machine-parsable format (although I wonder if using perl and h2ph to directly call the fiemap ioctl might be an easier path forward for you). It's still not going to be reliable for all btrfs setups, though. If you know that you aren't using any btrfs subvolumes and multiple devices for your btrfs file system, it'll probably work fine --- but if you share it on stackexchange or some such, and someone uses your tool in a way that you don't expect, then it's not doing them a service. |
See my new PR #132 which omits printing st_dev. |
I think there's been a misapprehension about the Btrfs data from FIEMAP. It represents an internally consistent virtual address space, regardless of whether or not the fs contains multiple physical devices. If the misapprehension is on my part, I'd really like to know why for instance the dev numbers are necessary in order to be accurate.
Honestly, it's a low-level OS feature designed to provide information. If it has enough block address info to determine fragmentation, it also has enough to determine if two files have the same composition. |
Here's the tell for me: No one is objecting to the quality of data (for this use case) when it's XFS on top of RAID. No one is saying only specific types of block layers will produce consistent FIEMAP block addresses. |
The question for this use case is whether an extent has a unique block address across multiple devices. You are implying in Btrfs it doesn't, or that the ioctl somehow leaves out a part of the address.
I believe this to be untrue. FIEMAP data describes the data locations of the file within the context of whatever block layer the fs is using. In Btrfs' case an internal raid-like block layer is used and the need for device numbers is obviated. You won't acknowledge this. Also, you single out Btrfs and don't warn against "incomplete" data when using filefrag on top of mdraid. Seems like a blanket caveat against anything raid-like would be in order.
This appears to be spillover of the Linux developer controversy about Btrfs subvolume inode numbers, and implying something that isn't true. Extents aren't inodes and Btrfs is not hocus pocus. If Btrfs devs like @osandov say that multiple devices are accounted for in their virtual block addresses then I tend to believe them. At this point I just need to know if |
In the XFS raid case, the numbers reported by FIEMAP are the logical block numbers for the raid device --- that is, they are suitable for use by things like the GRUB bootloader installation program to open /dev/mdXX, seek to a particular offset, and write the bootloader to the correct location on disk. In the case of BTRFS, there is no /dev/mdXX RAID device, because btrfs subsumes the RAID layer. And so if there is a need to directly write to the file system using a physical block number, you need to know which device to open, and which physical block offset it refers to. Finally, FIEMAP does not define some vague, mystical "virtual block numbers". It explicitly talks about physical offsets:
So, riddle me this. If btrfs supports multiple disks, and fe_physical is the physical offset in bytes for the start of the extent from the beginning of the disk, exactly how does btrfs fill in the fe_physical field? Remember, in btrfs disks can be added and removed, so there is no /dev/mdXX RAID device that you can open to get a physical offset with respect to the RAID device (which is how it works when you use XFS on top of MD RAID, or on top of LVM). |
By the way. I suggest you meditate on [1] and read these two articles [2][3]. BTRFS has functionality which is like RAID, but it is not traditional RAID the way LVM and MD block devices are structured. In particular, I suggest you take a look at how btrfs chunk IDs are used and named (via UUID) and how they map to btrfs stripes, which have their own device and physical offsets. There is no such thing as a virtual lba in btrfs; I'm quite convinced you don't know what you are talking about. At this point, I suggest that you ask that a BTRFS developer, such as Josef Bacik or Chris Mason, talk to me directly. I generally see them at least once or twice a year, at events such as the Linux File Systems, Storage, and MM workshop, as well as the Linux Plumbers events, and we chat about file system issues when we meet. [1] https://btrfs.wiki.kernel.org/index.php/Data_Structures |
All this says is that the working definition of "physical" changes depending on the context. Insisting on strict interpretation of those field labels means the "physical" XFS data is also "wrong" without mdraid; Btrfs has internal raid so the "physical" data is "wrong" without Btrfs. RAID is shown as a formal Btrfs concept numerous times on the Data Structures page you linked. For my purposes, only a unique (physical or virtual) address for each extent is required. You questioned how Btrfs could even calculate a logical address and I linked to an example from someone who claims to be a Btrfs developer in kernel.org issues (which is not exactly stackexchange). From the on-disk format reference (emphasis mine):
Citing data structures:
This corroborates what osandov stated about Btrfs using a logical address system to span multiple devices as well as the fact that the provided code displays them. Of course, this program is mainly concerned with adding device numbers to the picture, so it's overkill for my application. There is nothing to suggest that logical addresses are localized instead of global, or that they are not unique keys. Meanwhile, I'm reading repeatedly in this issue about the needs of boot loaders, de-fraggers and the "need to directly write to the file system using a physical block number"... none of which concern the use case of determining data identity. Of course I am fine with Btrfs devs chiming in. |
I would like to suggest an enhancement to the filefrag utility. Here's the scenario I'm envisioning:
I'm using btrfs and would like to determine if two files are identical COW reflinks of one another. So, I use
Each output looks like:
If I could remove the mentions of FILE1 on the 2nd line and last line, then I could directly compare the values of $FILE1_extents and $FILE2_extents. I believe that if the two files are on the same file system, then the two values will be identical IFF the data for the two files are identical reflinks of each other.
Here is a possible enhancement which would make that reliable:
1. Add the device number to the output
2. Omit the file name when exactly one file is specified on the command line, similar to GNU grep -h
I have created pull request #87 for this enhancement
Note: the case where there is no data block (e.g. inlined data) also has to be handled. That would be a more complicated change to the filefrag code, requiring adding an option. I handle this by grepping for inline|unknown_loc|delalloc to know which files cannot be reflinked.
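The workflow described in this issue can be sketched in Python as a post-processing step on captured filefrag -sv output, with no change to filefrag itself. The function name comparable_extents is hypothetical, and the two line formats it rewrites ("File size of NAME is ..." and "NAME: N extents found") are assumptions about what filefrag prints. Given the caveats raised in the comments, equal results are at best a hint for files on the same file system, not proof.

```python
import re
from typing import Optional

# Flags from the note above that mark data which cannot be compared this way.
UNSHAREABLE = re.compile(r"inline|unknown_loc|delalloc")


def comparable_extents(filefrag_output: str, name: str) -> Optional[str]:
    """Return the output with the file name removed, or None if unusable.

    name must be the file name exactly as it was passed to filefrag.
    """
    size_prefix = f"File size of {name} is "
    summary_prefix = f"{name}: "
    cleaned = []
    for line in filefrag_output.strip().splitlines():
        if line.startswith(size_prefix):
            # 2nd line: keep the size, drop the name.
            line = "File size is " + line[len(size_prefix):]
        elif line.startswith(summary_prefix) and line.endswith("found"):
            # Last line: keep only "N extents found".
            line = line[len(summary_prefix):]
        cleaned.append(line)
    text = "\n".join(cleaned)
    # Checked after the name is stripped so a file called "inline" is fine.
    if UNSHAREABLE.search(text):
        return None
    return text
```

Two files are then candidates for being identical reflinks when both calls return the same non-None string.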