Performance Numbers ‐ v0.10
Here are some sample performance numbers for Duperemove v0.10. See this page for the Duperemove v0.09 numbers.
The goal of this page is to give users an idea of what to expect when they run duperemove on non-trivial data sets.
The following tests were run on a Dell Precision T3610, using a copy of /home from my workstation rsynced to a fresh btrfs partition. You can find more information about the hardware and software setup here.
The version of duperemove used here is v0.10beta4.
There are 1146972 files in the data set (about 700 gigabytes of data). Of those, duperemove finds 128433 to be candidates for deduplication. The data itself is a very mixed set of documents (source code, papers, etc.) and media files (ISO images, music, movies, books).
The first two tests measure performance of the file hash and extent finding steps independent of each other. Finally we do a full combined run with dedupe to get a more realistic test.
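Briefly, the options used in these runs (see the duperemove man page for full details): -r recurses into subdirectories, -h prints numbers in human-readable form, -v turns on verbose output, -d actually submits the duplicate extents for deduplication (without it duperemove only reports what it found), --io-threads sets the number of I/O threads used by the hashing and dedupe stages, and --write-hashes/--read-hashes save and reload the computed hashes so the two phases can be run and timed separately.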
weyoun2:~ # time duperemove -hr --io-threads=16 --write-hashes=/root/slash-home-pre-dedupe.dup /btrfs/ &> duperemove.log
real 22m29.311s
user 3m2.211s
sys 3m52.568s
weyoun2:~ # time duperemove -dvh --read-hashes=/root/slash-home-pre-dedupe.dup --io-threads=16 &> duperemove.log
real 19m9.587s
user 9m17.129s
sys 10m30.541s
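One thing worth noting from these two runs: the hashing pass spends about 22.5 minutes of wall time but only around 7 minutes of combined user and sys CPU time, which suggests it is largely I/O bound on this setup, whereas the dedupe pass accumulates roughly as much CPU time as wall time.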
The numbers so far just break down the first two steps for informational purposes. For the full run we reboot so that no disk cache is present; this is representative of what a user would actually experience running duperemove against this data set. I saved the output to a file so I could check it for errors.
weyoun2:~ # time duperemove -drh --hashfile=test.dup /btrfs/ &> full_run.txt
real 45m32.311s
user 14m15.198s
sys 13m24.554s
Note: this should be rerun with the '--io-threads=16' argument.
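For reference, such a rerun would look something like the following (the hashfile and log names here are just placeholders):
time duperemove -drh --io-threads=16 --hashfile=test2.dup /btrfs/ &> full_run2.txt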
So, for this run, duperemove took about 45 minutes to hash and dedupe around 700 gigabytes of data.
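Taking those figures at face value, that works out to roughly 700 GB / 45.5 minutes ≈ 15 GB per minute, or on the order of 250 MB/s of combined hash-and-dedupe throughput.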
A couple of things immediately stand out. First, v0.10 has slightly better selection criteria for considering files, hence the smaller number of candidates chosen here. This translates into lower memory overhead, as v0.09 needlessly kept many non-candidates in memory.
Also, user time for the hashing phase has dropped dramatically as a result of switching to murmur3 for the hashing.
The multi-threaded dedupe stage also shows its worth here: we now complete the entire dedupe stage in the time it took v0.09 just to assemble extents from the duped blocks.