
Performance Numbers ‐ v0.10


Here are some sample performance numbers for Duperemove v0.10. See this page for Duperemove v0.09 numbers.

With this page I want to provide some example performance numbers so users have an idea of what to expect when they run duperemove on non-trivial data sets.

The following tests were run on a Dell Precision T3610 workstation with a copy of /home from my workstation rsynced to a fresh btrfs partition. You can find more information about the hardware and software setup here.

The version of duperemove used here is v0.10beta4.

There are 1,146,972 files in the data set (about 700 gigabytes of data). Of those files, duperemove finds 128,433 to be candidates for deduplication. The data itself is a very mixed set of documents (source code, papers, etc.) and media files (ISO images, music, movies, books).

The first two tests measure the performance of the file hashing and extent-finding steps independently of each other. Finally, we do a full combined run with dedupe to get a more realistic test.

File scan / hash performance

weyoun2:~ # time duperemove -hr --io-threads=16 --write-hashes=/root/slash-home-pre-dedupe.dup /btrfs/ &> duperemove.log
real    22m29.311s
user    3m2.211s
sys     3m52.568s
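
The hash file written here can also be reviewed on its own before committing to a dedupe pass. As a rough sketch (using the same hash file path as above): when -d is omitted, duperemove only reports the duplicates it finds and performs no deduplication, so something like the following would just print the candidates:

duperemove -h --read-hashes=/root/slash-home-pre-dedupe.dup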

Dedupe performance

weyoun2:~ # time duperemove -dvh --read-hashes=/root/slash-home-pre-dedupe.dup --io-threads=16 &> duperemove.log
real    19m9.587s
user    9m17.129s
sys     10m30.541s

Full run

We reboot so this run starts with no disk cache present. The numbers above just break down the first two steps for informational purposes; this run is representative of what a user would actually experience running duperemove against this data set. I saved the output to a file to check for errors.

weyoun2:~ # time duperemove -drh --hashfile=test.dup /btrfs/ &> full_run.txt
real    45m32.311s
user    14m15.198s
sys     13m24.554s
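
To check the saved output for problems, a quick scan of the log file is usually enough. A minimal sketch, assuming problems show up with the usual "error" or "fail" wording (the exact messages depend on the duperemove version):

grep -iE 'error|fail' full_run.txt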

Note: this should be rerun with the '--io-threads=16' argument.
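
For reference, such a rerun would just add --io-threads=16 to the full-run command above (a sketch, not a measured run; all flags are the ones used earlier on this page):

time duperemove -drh --io-threads=16 --hashfile=test.dup /btrfs/ &> full_run.txt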

So, for this run, duperemove took about 45 minutes to hash and dedupe around 700 gigabytes of data.

Differences from v0.09

A couple of things immediately stand out. v0.10 has slightly better selection criteria for considering files, hence the smaller number of files chosen as candidates here. This translates into lower memory overhead, as v0.09 needlessly kept many non-candidates in memory.

Also, user time for the hashing phase has dropped dramatically as a result of using murmur3 for hashing.

The multi-threaded dedupe stage also shows here: we now complete the entire dedupe stage in the time it takes v0.09 just to assemble extents from the duplicated blocks.