___
/\__\ _____ ___
/:/ / /::\ \ /\__\
/:/ / /:/\:\ \ /:/ /
/:/ / ___ /:/ \:\__\ /:/__/
/:/__/ /\__\ /:/__/ \:|__| /::\ \
\:\ \ /:/ / \:\ \ /:/ / /:/\:\ \
\:\ /:/ / \:\ /:/ / \/__\:\ \
\:\/:/ / \:\/:/ / \:\ \
\::/ / \::/ / \:\__\
\/__/ mp \/__/ isk \/__/ ree
You make backups from your macOS disk, right? But how can you check that your important stuff got copied correctly? Using diff
in the terminal like this
diff -r FS1 FS2
gives you so many errors that the command is impossible to use. This is caused by a few separate problems, but the main one is: diff
reports symlinks with non-existing target as errors, but for a disk compare we only want to know whether the target paths in the links are the same, not whether the targets exist.
cmpdisktree to the rescue! This command line tool compares filesystems ("disks") in a sensible way for backup check. It checks symlinks for same target path and excludes by default some system directories. It is mainly designed for macOS disks but via some options it can be tweaked for other purposes.
cmpdisktree takes two directories as command line parameters. These are often a system disk on one volume and its backup on another. That's why I call them filesystems FS1 and FS2. cmpdisktree compares theses filesystems in two phases, the Traversal Phase and the Compare Phase.
- The Traversal Phase
-
In the first phase cmpdisktree goes through the directory tree of FS1 by recursively looking at files, directories and symlinks in each directory. Files which exist in FS1 and FS2 are then marked for later compare of the content while directories and symlinks are compared straight away: directories for the same entries and symlinks for that they point to the same (existing or non-existing) target.
- The Content Compare Phase
-
In this second phase cmpdisktree compares the content of the files in FS1 and FS2 which were marked during the traversal phase. The comparing is done in a "non-shallow" way, byte for byte. Deending on the size of the filesystems this can take a long time, so phase 2 can be disabled (option
--traversal-only
) if a compare of the directory structure seems to be good enough.
In the traversal phase cmpdisktree applies a list of exclusion patterns. If a directory or file matches one of the exclusions it will not be compared, so differences are not reported. This list is adapted from Carbon Copy Cloner's exclusion list. It is heavily tailored towards macOS system disks. You can completely swiych this off via --clear-std-exclusions
or add more exclusions (like for .DS_Store
files) via --live-fs-exclusions
. For details see below.
The compare phase can be used without traversal phase and alternatively be controlled by an external file (option --traverse-from-list
). This is quite a unique option which disables the traversal phase; instead the paths of the files to compare are read from a text file. This makes sense when from very large disks only a portion of files (e.g. a random sample) should be compared. The sample is provided via the external file.
For details about other ways to customise the cmpdisktree's behaviour here the help message:
Usage: cmpdisktree.py [OPTIONS] FS1 FS2
Compare the directories FS1 and FS2 as macOS disk structures
Errors are reported to a file (default 'cmp-err.log')
Options:
-v, --verbose Print debug output
-q, --quiet No informational output
-i, --report-identical Report identical files to file (default:
'cmp-ok.log')
-1, --traversal-only Only traverse FSs (Phase 1). Don't compare
file contents
-t, --traverse-from-list PATH Path of file with list of relative paths for
traversal
-c, --clear-std-exclusions Don't apply the standard exclusions for macOS
disk files systems (i.e. compare everything)
-l, --live-fs-exclusions Add exclusions for live filesystems (e.g.
boot volumes)
-m, --ignore-missing-in-FS1 Ignore when a file from FS2 doesn't exist in
FS1 (used for boot backups where FS1 is the
live disk)
-r, --relative-fs-top Allow relative filesystem top (used when
applying the exclusions)
-o, --output-path PATH Output path for report file.
--version Show the version and exit.
--help Show this message and exit.
--output-path PATH
:-
Set where the output should go:
If the path of a file is given, use this file as error log file and write (if applicable) the OK log tocmp-ok.log
in the same directory.
if the path of a directory is given, writecmp-err.log
andcmp-ok.log
in this directory. --relative-fs-top
:-
Normally exclusion patterns which match only at the beginning of a path name have to start with that pattern as expected. This option widens the match to the middle of a path name as well. This is useful if you want to check on boot filesystems which you have copied deeper into a file system.
--live-fs-exclusions
:-
Use additional exclusions which ignore files like .DS_Store. This helps to compare "life" filesystems as the same even if these files have changed (e.g. by looking at the file structure in Finder). This adds some various (experimental) cache exclusions as well.
--traverse-from-list PATH
:-
Do not traverse the filesystem; instead use the list of relative paths given in a text file (one path per line). The paths are relative to the starting directories
FS1
/FS2
.
Note: Only paths to normal files are compares for same content. Directories are only checked for existence in FS1 and FS2; with symlinks is only checked that they point to the same target.
Version 0.2
-
Option
--traverse-from-list
is implemented. -
Minor improvement to exclusion list, etc.
Version 0.1
- Compare while reporting progress bar
- Option
--clear-std-exclusions
to compare all files in the filesystems.
Thanks to:
-
Mike Bombich for Carbon Copy Cloner1 and its exclusion list.
-
Kent Nassen and Lennert Stock for the ASCII art characters.
-
Armin Ronacher and contributors for click.
-
Casper da Costa-Luis and contributors for tqdm.
Footnotes
-
Backup tools like Carbon Copy Cloner gave me the reason to develop cmpdisktree in the first place. ↩