
Performance Numbers ‐ v0.13

jack edited this page Nov 4, 2023 · 7 revisions

The procedure

The page cache is dropped between tests.

Datasets are re-duplicated (restored to their original, non-deduplicated state) between deduplication tests.
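The procedure above can be sketched as a small Python harness. This is a minimal sketch, not the author's script: it assumes the timings were collected with GNU time (`/usr/bin/time -v`), whose report uses the same field names as the table headers, and that dropping the page cache means writing to `/proc/sys/vm/drop_caches` as root. The helper names (`drop_page_cache`, `build_command`, `run_case`) are hypothetical, deleting the hashfile before a "first run" is an assumption, and the re-duplication step is not shown because this page does not say how it was done.

```python
import os
import subprocess


def drop_page_cache() -> None:
    """Flush dirty data, then drop clean caches (requires root).

    Writing "3" drops the page cache plus reclaimable dentries and inodes;
    writing "1" would drop only the page cache.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def build_command(data_dir: str, hashfile: str, dedupe: bool, partial: bool) -> list[str]:
    """Return a GNU time-wrapped duperemove invocation like those in the tables."""
    cmd = ["/usr/bin/time", "-v", "duperemove", "-rh", data_dir, f"--hashfile={hashfile}"]
    if dedupe:
        cmd.append("-d")  # perform deduplication instead of only reporting duplicates
    if partial:
        cmd.append("--dedupe-option=partial")  # enable partial lookup, spelled as on this page
    return cmd


def run_case(data_dir: str, hashfile: str, *, dedupe: bool, partial: bool,
             fresh: bool, dry_run: bool = True) -> list[str]:
    """Run one benchmark case, or by default only print the command."""
    cmd = build_command(data_dir, hashfile, dedupe, partial)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    if fresh and os.path.exists(hashfile):
        os.remove(hashfile)  # assumption: a "first run" starts without a hashfile
    drop_page_cache()
    subprocess.run(cmd, check=True)  # GNU time prints its report on stderr
    return cmd


# Dry run of the "without partial lookup" sequence for one dataset
# (re-duplicating data/ before the -d run is not shown).
run_case("data/", "/tmp/hashes", dedupe=False, partial=False, fresh=True)
run_case("data/", "/tmp/hashes", dedupe=True, partial=False, fresh=True)
run_case("data/", "/tmp/hashes", dedupe=False, partial=False, fresh=False)
```

Dry-run mode is the default so the sketch can be inspected without root privileges or an installed duperemove.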

The numbers

Small dataset: 17GB, 38916 regular files, average file size 457KB

Without partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes | 105.52 | 44.42 | 0:43.76 | 140592 | 16MB |
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes -d | 105.06 | 51.19 | 0:46.81 | 135036 | 16MB |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes | 0.97 | 1.57 | 0:03.12 | 32604 | 16MB |

With partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 1655.76 | 3308.71 | 16:13.41 | 178836 | 35MB |
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial -d | 1252.24 | 2525.61 | 12:27.74 | 161648 | 28MB |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 1.33 | 1.76 | 0:03.91 | 62744 | 28MB |
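The cost of partial lookup on the small dataset is easier to see once the elapsed column is converted to seconds. This is a minimal sketch; `parse_elapsed` is a hypothetical helper that assumes only the two formats named in the column header (`h:mm:ss` and `m:ss`, with possibly fractional seconds).

```python
def parse_elapsed(text: str) -> float:
    """Convert an "h:mm:ss" or "m:ss" elapsed string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)  # shift previous fields up one unit
    return seconds


# Small dataset (17GB), elapsed times copied from the tables above:
# (without partial lookup, with partial lookup)
runs = {
    "first run, no dedupe": ("0:43.76", "16:13.41"),
    "first run, with -d": ("0:46.81", "12:27.74"),
}

for label, (plain, partial) in runs.items():
    ratio = parse_elapsed(partial) / parse_elapsed(plain)
    print(f"{label}: partial lookup took {ratio:.1f}x longer")
```

For these rows this gives about 22.2x for the scan-only first run and about 16.0x for the first run with `-d`.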

Larger dataset (VM images): 48GB, 6 regular files, average file size 8.1GB

Without partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes | 5.68 | 18.25 | 0:18.32 | 56600 | 2.3MB |
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes -d | 5.81 | 26.97 | 0:33.21 | 58740 | 2.3MB |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes | 0.05 | 0.02 | 0:00.07 | 7680 | 2.3MB |

With partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 9.42 | 21.21 | 0:20.87 | 88080 | 43MB |
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes -d --dedupe-option=partial | 9.75 | 36.36 | 1:26.28 | 88628 | 43MB |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 0.00 | 0.01 | 0:00.02 | 6764 | 43MB |
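One rough reading of the VM-image numbers subtracts the elapsed time of the first run without `-d` from the first run with `-d`, treating the difference as the cost of the deduplication step. This is an approximation that assumes the scan and hash phase costs the same in both runs; the page does not report per-phase timings, and `dedupe_overhead` is a hypothetical helper.

```python
# Elapsed wall-clock seconds for the 48GB VM-image dataset, from the tables above.
# 1:26.28 converts to 60 + 26.28 = 86.28 s.
elapsed = {
    "without partial": {"scan": 18.32, "scan+dedupe": 33.21},
    "with partial": {"scan": 20.87, "scan+dedupe": 86.28},
}


def dedupe_overhead(times: dict) -> float:
    """Approximate extra wall-clock time attributable to the -d step."""
    return times["scan+dedupe"] - times["scan"]


for mode, times in elapsed.items():
    print(f"{mode}: ~{dedupe_overhead(times):.2f} s spent deduplicating")
```

This comes out to roughly 15 s without partial lookup and roughly 65 s with it, while the scan-only runs differ by only about 2.5 s.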

Largest dataset: 1.1TB, 387930 regular files, average file size 3.3MB

Without partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes | 1810.24 | 1251.56 | 34:12.38 | 477208 | 172MB |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes | 9.94 | 17.84 | 0:33.02 | 299016 | 172MB |

With partial lookup

The run was stopped at 14% progress, so the numbers below cover only that portion of the run.

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 11359.23 | 23423.60 | 1:55:54 | 559952 | 105MB |
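The interrupted partial-lookup run can be put in context by comparing what it had already consumed with the complete run without partial lookup. This sketch only divides the reported values; it deliberately does not extrapolate to 100%, since the page gives no reason to assume progress was linear. The variable names are illustrative.

```python
# Largest dataset (1.1TB), values copied from the two tables above.
# Without partial lookup: complete first run.
plain_elapsed_s = 34 * 60 + 12.38            # 34:12.38 -> 2052.38 s
plain_cpu_s = 1810.24 + 1251.56              # user + system = 3061.80 s
# With partial lookup: stopped at 14% progress.
partial_elapsed_s = 1 * 3600 + 55 * 60 + 54  # 1:55:54 -> 6954 s
partial_cpu_s = 11359.23 + 23423.60          # user + system = 34782.83 s

print(f"wall time: {partial_elapsed_s / plain_elapsed_s:.1f}x the complete non-partial run")
print(f"CPU time:  {partial_cpu_s / plain_cpu_s:.1f}x the complete non-partial run")
```

At the point it was stopped, the partial-lookup run had already used about 3.4x the wall time and about 11.4x the CPU time of the full run without partial lookup.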

Dataset of many identical small files: 3.9GB, 1000000 regular files, average file size 1KB

Without partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run without deduplication | duperemove -rh data/ --hashfile=/tmp/hashes | 6189.95 | 3485.18 | 2:10:10 | 801312 | 283M |
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes -d | 2611.09 | 991.01 | 26:08.32 | 596204 | 166M |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes | 14.32 | 28.71 | 0:46.59 | 355712 | 166M |

With partial lookup

| Run | Command | User time (seconds) | System time (seconds) | Elapsed (wall clock) time (h:mm:ss or m:ss) | Maximum resident set size (kbytes) | Hashfile size |
|---|---|---|---|---|---|---|
| First run with deduplication | duperemove -rh data/ --hashfile=/tmp/hashes -d --dedupe-option=partial | 3676.44 | 4204.47 | 46:27.87 | 601780 | 166M |
| Second run without changes | duperemove -rh data/ --hashfile=/tmp/hashes --dedupe-option=partial | 14.59 | 27.26 | 0:45.68 | 376344 | 166M |
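The tables report user, system and elapsed time separately; dividing CPU time by wall time gives the average number of busy CPUs, the same quantity GNU time reports as "Percent of CPU this job got" (100% is one fully busy CPU, and the test machine has 12 hardware threads). This sketch applies that formula to the small-files runs; the elapsed strings were converted to seconds by hand, and `cpu_percent` is a hypothetical helper.

```python
# (user s, system s, elapsed s) for the 1,000,000-small-files dataset, from the tables above.
# Elapsed conversions: 2:10:10 -> 7810 s, 26:08.32 -> 1568.32 s, 46:27.87 -> 2787.87 s.
runs = {
    "first run, no dedupe": (6189.95, 3485.18, 7810.0),
    "first run, -d": (2611.09, 991.01, 1568.32),
    "first run, -d, partial": (3676.44, 4204.47, 2787.87),
}


def cpu_percent(user: float, system: float, elapsed: float) -> float:
    """Average CPU utilization; 100% corresponds to one fully busy CPU."""
    return 100.0 * (user + system) / elapsed


for label, (user, system, elapsed) in runs.items():
    print(f"{label}: {cpu_percent(user, system, elapsed):.0f}% CPU")
```

This works out to roughly 124%, 230% and 283%, i.e. between about 1.2 and 2.8 busy CPUs on average.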

Hardware details

Kernel/Distribution

3.84 [jack:~] lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux trixie/sid
Release:	n/a
Codename:	trixie
3.93 [jack:~] uname -a
Linux debian 6.4.0-4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.4.13-1 (2023-08-31) x86_64 GNU/Linux

CPU (AMD Ryzen 5 3600, 6 cores / 12 threads, 3.60GHz)

3.93 [jack:~] cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 113
model name	: AMD Ryzen 5 3600 6-Core Processor
stepping	: 0
microcode	: 0x8701021
cpu MHz		: 2794.327
cache size	: 512 KB
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 6
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips	: 7186.05
TLB size	: 3072 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

    <12 CPUs in total; redundant info removed>

Memory (32GB RAM 2666MHz C16)

  3.96 [jack:~] cat /proc/meminfo | head -10
  MemTotal:       32741508 kB
  MemFree:          952968 kB
  MemAvailable:   28113416 kB
  Buffers:           18360 kB
  Cached:         26950188 kB
  SwapCached:            0 kB
  Active:          4939996 kB
  Inactive:       25399956 kB
  Active(anon):    2798580 kB
  Inactive(anon):   772904 kB

Disk

Micron 9300 SSD (MTFDHAL3T8TDP), with ~1TB of data on an XFS partition