readme for 2.0 update
fix(?) PIL import for pyinstaller
preview improvements
PJDude committed May 23, 2024
1 parent d9b3525 commit 7782a67
Showing 5 changed files with 130 additions and 56 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -3,7 +3,7 @@
A cross-platform GUI utility for finding duplicated files, delete or link them to save space.

## Features:
- Scanning for duplicate files in **multiple designated folders** (up to 8). Optional "Cross paths" mode
- Scanning for duplicate files in **multiple designated folders** (up to 8). Optional **"Cross paths"** display mode
- Optional **command line parameters** to start scanning immediately or integrate **Dude** with your favorite file manager
- Two **synchronized** panels:
- groups of duplicates
@@ -14,7 +14,13 @@ A cross-platform GUI utility for finding duplicated files, delete or link them t
- Support for **regular expressions** or **glob** expressions syntax
- Searching for duplicates based on the **hash** of the file content. Different filenames or extensions do not affect the search results
- Works on **Linux** and **Windows**


💥 Major news in 2.x version:
- **Images similarity mode**, with caching, sensitivity parameters and rotated images detection
- **preview window** for images and text files
- **"Same directory"** display mode

## Why another anti-duplicate application ?
- Because you need to see the context of removed files, and use such an application clearly, safely and easily.

@@ -72,6 +78,8 @@ dude --help
## Portability
**Dude** writes log files, configuration and cache files in runtime. Default location for these files is **dude.data** folder, created next to the **dude executable** file. If there are no write access rights to such folder, platform-specific folders are used for cache, settings and logs (provided by **appdirs** module). You can use --appdirs command line switch to force that behavior even when **dude.data** is accessible.
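The fallback described in the Portability section could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `pick_data_dir`, `exe_dir`, `fallback_dir` and `force_fallback` are hypothetical names, and the fallback path is passed in by the caller (in the real application it would come from the **appdirs** module).

```python
import os
import tempfile


def pick_data_dir(exe_dir, fallback_dir, force_fallback=False):
    """Return the folder to use for logs, configuration and cache files.

    Hypothetical helper: prefers a 'dude.data' folder next to the
    executable and falls back to a platform-specific folder supplied
    by the caller when that folder cannot be written to.
    """
    portable_dir = os.path.join(exe_dir, "dude.data")
    if not force_fallback:  # mirrors the intent of the --appdirs switch
        try:
            os.makedirs(portable_dir, exist_ok=True)
            # Probe by really creating a file; permission bits alone can mislead.
            with tempfile.TemporaryFile(dir=portable_dir):
                pass
            return portable_dir
        except OSError:
            pass  # no write access, use the fallback below
    os.makedirs(fallback_dir, exist_ok=True)
    return fallback_dir
```

Probing with a real temporary file is a design choice of this sketch; the README only states that write access decides which location is used.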

## Dude is BIG 💥
Well, unfortunately, the 2.x version has a much larger distribution package than v1. This is mainly because of the necessity of importing the [NumPy](https://numpy.org/) and [SciPy](https://scipy.org/) packages for image hashing and clustering. I apologize for the inconvenience.

## Technical information
- Scanning process analyzes selected paths and groups files with the same size. **Dude** compares files by the calculated **SHA1** hash of file content. CRC calculation is done in separate threads for every identified device (drive). Number of active threads is limited by available CPU cores. Aborting of CRC calculation gives only partial results - not all files may be identified as duplicates. Restarted scanning process will use cached data. The CRC is always calculated based on the entire contents of the file.
@@ -85,6 +93,7 @@ dude --help
- ***Soft links*** to **directories** are skipped during the scanning process. ***Soft links*** to **files** are ignored during scanning. Both appear in the bottom "folders" pane.
- ***Hard links*** (files with stat.st_nlink>1) currently are ignored during the scanning process and will not be identified as duplicates (within the same inode obviously, as with other inodes). No action can be performed on them. They will only appear in the bottom "folders" pane. This may change in the future versions.
- the "delete" action moves files to **Recycle Bin / Trash** or deletes them permanently according to option settings.
- 💥 Image similarity mode is based on the libraries: [PIL](https://python-pillow.org/), [ImageHash](https://pypi.org/project/ImageHash/), and the [DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) data clustering algorithm from [scikit-learn](https://scikit-learn.org/stable/index.html). For maximum performance, image hashing utilizes all available CPU cores with multiple threads and the DBSCAN algorithm implementation is multi-threaded internally. Key parameters of clustering are available to set on the scan dialog.
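The size-then-hash approach from the first bullet above can be illustrated with a small standard-library sketch. It is an assumption-laden simplification: `sha1_of_file` and `find_duplicates` are hypothetical names, everything runs in a single thread, and the link handling, per-device threading and caching described in the list are left out.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the entire file content in chunks to keep memory use flat."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return groups (sorted lists) of paths whose content is identical."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[sha1_of_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means that files with a unique size are never read at all, which is where most of the time is saved on large trees.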

###### Manual build (linux):
```
@@ -111,3 +120,5 @@ python ./src/dude.py

## Licensing
- **dude** is licensed under **[MIT license](./LICENSE)**

### Check out my [homepage](https://github.com/PJDude) for other projects.
2 changes: 1 addition & 1 deletion scripts/pyinstaller.run.bat
@@ -21,7 +21,7 @@

@echo.
@echo running-pyinstaller-stage_dude
pyinstaller --version-file=version.pi.dude.txt --noconfirm --clean --add-data="distro.info.txt:." --add-data="version.txt;." --add-data="../LICENSE;." --icon=icon.ico --distpath=%OUTDIR% --windowed --contents-directory=internal --name dude --additional-hooks-dir=. --collect-binaries tkinterdnd2 dude.py || exit /b 2
pyinstaller --version-file=version.pi.dude.txt --noconfirm --clean --add-data="distro.info.txt:." --add-data="version.txt;." --add-data="../LICENSE;." --icon=icon.ico --distpath=%OUTDIR% --windowed --contents-directory=internal --name dude --additional-hooks-dir=. --collect-binaries tkinterdnd2 --hidden-import="PIL._tkinter_finder" dude.py || exit /b 2

@echo.
@echo running-pyinstaller-dudecmd
2 changes: 1 addition & 1 deletion scripts/pyinstaller.run.sh
@@ -21,7 +21,7 @@ echo pyinstaller `pyinstaller --version` >> distro.info.txt

echo ''
echo running-pyinstaller-stage_dude
pyinstaller --strip --noconfirm --noconsole --clean --add-data="distro.info.txt:." --add-data="version.txt:." --add-data="../LICENSE:." --contents-directory=internal --distpath=$outdir --additional-hooks-dir=. --collect-binaries tkinterdnd2 ./dude.py
pyinstaller --strip --noconfirm --noconsole --clean --add-data="distro.info.txt:." --add-data="version.txt:." --add-data="../LICENSE:." --contents-directory=internal --distpath=$outdir --additional-hooks-dir=. --collect-binaries tkinterdnd2 --hidden-import='PIL._tkinter_finder' ./dude.py

echo ''
echo packing
7 changes: 2 additions & 5 deletions src/core.py
@@ -51,11 +51,11 @@

from send2trash import send2trash

from numpy import array as numpy_array
from PIL.Image import open as image_open,new as image_new, alpha_composite as image_alpha_composite
from imagehash import average_hash,phash,dhash,whash
from imagehash import average_hash,phash,dhash

from sklearn.cluster import DBSCAN
from numpy import array as numpy_array

DELETE=0
SOFTLINK=1
@@ -629,9 +629,6 @@ def my_hash_combo(file,hash_size):
for hash_row in dhash(file,hash_size).hash:
seq_hash_extend(hash_row)

#whash
#colorhash(file).hash.tolist()

return tuple(seq_hash)

for index_tuple,fullpath in sorted(source_dict.items(), key = lambda x : x[0][6], reverse=True):
162 changes: 114 additions & 48 deletions src/dude.py
@@ -59,9 +59,11 @@
from os import sep,stat,scandir,readlink,rmdir,system,getcwd,name as os_name,environ as os_environ
from gc import disable as gc_disable, enable as gc_enable,collect as gc_collect,set_threshold as gc_set_threshold, get_threshold as gc_get_threshold

from os.path import abspath,normpath,dirname,join as path_join,isfile as path_isfile,split as path_split,exists as path_exists,isdir
from os.path import abspath,normpath,dirname,join as path_join,isfile as path_isfile,split as path_split,exists as path_exists,isdir, splitext as path_splitext

from PIL import Image, ImageTk
from PIL.ImageTk import PhotoImage as ImageTk_PhotoImage
from PIL.Image import NEAREST,BILINEAR,open as image_open

windows = bool(os_name=='nt')

@@ -360,20 +362,18 @@ def handle_sigint(self):
self.action_abort=True

def preview_yscrollcommand(self,v1,v2):

if v1=='0.0' and v2=='1.0':
self.preview_canvas_vbar.grid_forget()
self.preview_text_vbar.grid_forget()
else:
self.preview_canvas_vbar.set(v1,v2)
self.preview_canvas_vbar.grid(row=0,column=1,sticky='ns')
self.preview_text_vbar.set(v1,v2)
self.preview_text_vbar.grid(row=0,column=1,sticky='ns')

def preview_xscrollcommand(self,v1,v2):

if v1=='0.0' and v2=='1.0':
self.preview_canvas_hbar.grid_forget()
self.preview_text_hbar.grid_forget()
else:
self.preview_canvas_hbar.set(v1,v2)
self.preview_canvas_hbar.grid(row=1,column=0,sticky='we')
self.preview_text_hbar.set(v1,v2)
self.preview_text_hbar.grid(row=1,column=0,sticky='we')

def __init__(self,cwd,paths_to_add=None,exclude=None,exclude_regexp=None,norun=None,images_mode_tuple=None):
images,ihash,idivergence,rotations = images_mode_tuple if images_mode_tuple else (False,0,0,False)
@@ -426,44 +426,42 @@ def __init__(self,cwd,paths_to_add=None,exclude=None,exclude_regexp=None,norun=N
####################################
self.preview = preview = Toplevel(self_main,takefocus=False)
preview_bind = preview.bind
preview.minsize(200,50)
preview.minsize(200,200)
preview.title('DUDE - Preview')

preview.withdraw()
preview.update()
preview.protocol("WM_DELETE_WINDOW", lambda : self.hide_preview())
preview_bind('<Escape>', lambda event : self.hide_preview() )

preview_frame=Frame(preview)
#preview_frame_parent=SFrame(preview,'red')
#preview_frame=preview_frame_parent.frame()
preview_frame_txt=self.preview_frame_txt=Frame(preview)

preview_bind('F11', lambda event : self.hide_preview() )
preview_bind('<FocusIn>', lambda event : self.preview_focusin() )
preview_bind('<Configure>', lambda event : self.preview_conf() )
preview_bind('<Configure>', self.preview_conf)

####################################
preview_frame.grid_columnconfigure(0, weight=1)
preview_frame.grid_rowconfigure(0, weight=1)

self.preview_canvas = Canvas(preview_frame)
self.preview_canvas.grid(row=0,column=0,sticky='news')
preview_frame_txt.grid_columnconfigure(0, weight=1)
preview_frame_txt.grid_rowconfigure(0, weight=1)

self.preview_canvas_image=self.preview_canvas.create_image(0, 0, anchor="nw")
self.preview_canvas.config(scrollregion=self.preview_canvas.bbox('all'))
self.preview_text = Text(preview_frame_txt, bg='white',relief='groove',bd=2,wrap='none')
self.preview_text.grid(row=0,column=0,sticky='news')

self.preview_canvas_vbar = Scrollbar(preview_frame, orient='vertical', command=self.preview_canvas.yview)
self.preview_canvas_vbar.grid(row=0,column=1,sticky='ns')
self.preview_text_vbar = Scrollbar(preview_frame_txt, orient='vertical', command=self.preview_text.yview)
self.preview_text_vbar.grid(row=0,column=1,sticky='ns')

self.preview_canvas_hbar = Scrollbar(preview_frame, orient='horizontal', command=self.preview_canvas.xview)
self.preview_canvas_hbar.grid(row=1,column=0,sticky='we')
self.preview_text_hbar = Scrollbar(preview_frame_txt, orient='horizontal', command=self.preview_text.xview)
self.preview_text_hbar.grid(row=1,column=0,sticky='we')

self.preview_canvas.config(yscrollcommand=self.preview_yscrollcommand, xscrollcommand=self.preview_xscrollcommand)
self.preview_text.config(yscrollcommand=self.preview_yscrollcommand, xscrollcommand=self.preview_xscrollcommand)

####################################
self.preview_label_txt=Label(preview,relief='groove',bd=2,anchor='w')
self.preview_label_txt.pack(fill='x',side='top',anchor='nw',padx=1,pady=1)
preview_frame.pack(fill='both',side='top',anchor="nw",expand=1)
self.preview_label_img=Label(preview,bd=2,anchor='nw')

self.preview_label_txt.pack(fill='x',side='top',anchor='nw')
self.preview_label_img.pack(fill='both',side='top',anchor='nw')
preview_frame_txt.pack(fill='both',side='top',anchor="nw",expand=1)

self.main_update = self_main.update
self.main_update()
@@ -553,7 +551,7 @@ def __init__(self,cwd,paths_to_add=None,exclude=None,exclude_regexp=None,norun=N

bg_color = self.bg_color = style.lookup('TFrame', 'background')
preview.configure(bg=bg_color)
preview_frame.configure(bg=bg_color)
self.preview_frame_txt.configure(bg=bg_color)

style.theme_use("dummy")
style_map = style.map
@@ -884,6 +882,9 @@ def self_folder_tree_yview(*args):
self.selected={}
self.selected[self.groups_tree]=None
self.selected[self.folder_tree]=None

self.sel_full_path_to_file=None

#######################################################################
#scan dialog

@@ -2593,11 +2594,31 @@ def tree_on_mouse_button_press(self,event,toggle=False):
preview_photo_image_limit=64
preview_photo_image_list=[]
preview_shown=False
preview_size=(1,1)

def preview_conf(self):
def preview_conf(self,event):
if self.preview_shown:
#print('preview_conf',event)

self.cfg.set('preview',str(self.preview.geometry()),section='geometry')

new_preview_size = (event.width,event.height)
new_preview_size = (event.width,event.height)

if self.preview_size!=new_preview_size:
self.txt_label_heigh = self.preview_label_txt.winfo_height()

#print('preview_conf - real')
self.preview_size=new_preview_size

self.preview_photo_image_cache={}
self.preview_photo_image_list=[]

self.update_preview()
else:
#print('preview_conf - skipped')
pass
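
Reviewer note: the new `preview_conf` combines two ideas, ignoring `<Configure>` events that do not change the window size and flushing the bounded image cache when the size does change. A Tk-free sketch of that logic is shown below; `PreviewCache` and its method names are hypothetical and only loosely mirror `preview_photo_image_cache`, `preview_photo_image_list` and `preview_photo_image_limit` from the diff.

```python
class PreviewCache:
    """Bounded FIFO cache that is flushed when the target size changes."""

    def __init__(self, limit=64):
        self.limit = limit
        self.size = (1, 1)   # last seen (width, height)
        self.items = {}      # key -> cached value
        self.order = []      # keys in insertion order, oldest first

    def on_configure(self, width, height):
        """Return True when the size really changed and the cache was flushed."""
        new_size = (width, height)
        if new_size == self.size:
            return False     # same size, e.g. the window was only moved
        self.size = new_size
        self.items.clear()   # scaled previews no longer fit the window
        self.order.clear()
        return True

    def put(self, key, value):
        """Store a value and evict the oldest entry once over the limit."""
        if key not in self.items:
            self.order.append(key)
        self.items[key] = value
        if len(self.order) > self.limit:
            del self.items[self.order.pop(0)]

    def get(self, key):
        return self.items.get(key)
```

Keeping the flush inside the size check matters because cached thumbnails are scaled to the window; flushing on every event would make the cache pointless.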

def preview_focusin(self):
self.main.focus_set()
self.sel_tree.focus_set()
@@ -2606,44 +2627,88 @@ def show_preview(self):
self.preview_shown=True

self_preview = self.preview
if cfg_geometry:=self.cfg_get('preview','100x200',section='geometry'):
if cfg_geometry:=self.cfg_get('preview','200x200',section='geometry'):
self_preview.geometry(cfg_geometry)

self.update_preview()
self_preview.deiconify()

self_preview.lift()
self_preview.attributes('-topmost',True)
self_preview.after_idle(self_preview.attributes,'-topmost',False)

self.main.focus_set()
self.sel_tree.focus_set()

text_extensions = ('.txt','.bat','.sh','.md','.html','.py','.cpp','.h','.ini','.tcl','.xml','.url')
pic_extensions = ('.jpeg','.jpg','.jp2','.jpx','.j2k','.png','.bmp','.dds','.dib','.eps','.gif','.tga','.tiff','.tif','.webp','.xbm')
def update_preview(self):
if self.preview_shown:
path = self.sel_full_path_to_file

if path:
try:
if path not in self.preview_photo_image_cache:
im1 = Image.open(path)
head,ext = path_splitext(path)

width = im1.width
height = im1.height
if ext.lower() in self.text_extensions:
self.preview_label_img.pack_forget()
try:

size = (width // 4, height // 4)
with open(path, 'r') as file:
self.preview_text.delete(1.0, 'end')
self.preview_text.insert('end', file.read())

self.preview_photo_image_cache[path]=(ImageTk.PhotoImage(im1.resize(size)),f'{width} x {height}' )
self.preview_photo_image_list.append(path)
if len(self.preview_photo_image_list)>self.preview_photo_image_limit:
del self.preview_photo_image_cache[self.preview_photo_image_list.pop(0)]
self.preview_label_txt.configure(text=path)

self.preview_canvas.itemconfig(self.preview_canvas_image, image=self.preview_photo_image_cache[path][0])
self.preview_canvas.config(scrollregion=self.preview_canvas.bbox('all'))
except Exception as e:
self.preview_label_txt.configure(text=str(e))
else:
self.preview_frame_txt.pack(fill='both',expand=1)

elif ext.lower() in self.pic_extensions:
self.preview_frame_txt.pack_forget()

try:
self_preview_photo_image_cache = self.preview_photo_image_cache
if path not in self_preview_photo_image_cache:
im1 = Image.open(path)

preview_size_width,preview_size_height = self.preview_size
#print(f'{preview_size_width=},{preview_size_height=}')

height = im1.height
ratio_y = height/(preview_size_height-self.txt_label_heigh)

width = im1.width
ratio_x = width/preview_size_width

#print(f'{width=},{height=}')

biggest_ratio = max(ratio_x,ratio_y,1)
#print(f'{biggest_ratio=}')

size = ( int (width/biggest_ratio), int(height/biggest_ratio))

self_preview_photo_image_cache[path]=(ImageTk_PhotoImage(im1.resize(size,BILINEAR)),f'{width} x {height}',round(biggest_ratio,2) )
self_preview_photo_image_list = self.preview_photo_image_list
self_preview_photo_image_list.append(path)
if len(self_preview_photo_image_list)>self.preview_photo_image_limit:
del self_preview_photo_image_cache[self_preview_photo_image_list.pop(0)]

self.preview_label_img.configure(image=self_preview_photo_image_cache[path][0])
self.preview_label_txt.configure(text=self_preview_photo_image_cache[path][1] + f' (factor: {self_preview_photo_image_cache[path][2]})')
except Exception as e:
self.preview_label_txt.configure(text=str(e))
else:
self.preview_label_img.pack(fill='both',expand=1)

else:
self.preview_frame_txt.pack_forget()
self.preview_label_img.pack_forget()
self.preview_label_txt.configure(text='')

self.preview_label_txt.configure(text=self.preview_photo_image_cache[path][1])
except Exception as e:
self.preview_canvas.itemconfig(self.preview_canvas_image, image='')
self.preview_label_txt.configure(text=str(e))
else:
self.preview_canvas.itemconfig(self.preview_canvas_image, image='')
self.preview_frame_txt.pack_forget()
self.preview_label_img.pack_forget()
self.preview_label_txt.configure(text='')
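
Reviewer note: the image branch of `update_preview` shrinks a picture so that it fits the preview window but never enlarges it (`max(ratio_x,ratio_y,1)`). The arithmetic can be checked in isolation with the sketch below. `fit_size` is a hypothetical helper, and the `max(..., 1)` guards against zero or negative dimensions are an addition of this sketch, not part of the commit.

```python
def fit_size(image_size, window_size, label_height=0):
    """Return ((width, height), factor) for an image scaled to fit a window.

    The factor is never below 1, so small images keep their original size.
    """
    width, height = image_size
    win_width, win_height = window_size

    # Space left for the image once the text label on top is accounted for.
    avail_height = max(win_height - label_height, 1)
    avail_width = max(win_width, 1)

    ratio_x = width / avail_width
    ratio_y = height / avail_height
    ratio = max(ratio_x, ratio_y, 1)  # the larger ratio decides, never upscale

    scaled = (max(int(width / ratio), 1), max(int(height / ratio), 1))
    return scaled, round(ratio, 2)
```

For example, `fit_size((800, 600), (400, 400), label_height=100)` gives a scaled size of `(400, 300)` with a factor of `2.0`, since both the width and the remaining height allow exactly half of the original.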

def hide_preview(self):
@@ -3201,6 +3266,7 @@ def scan(self):
self.status('Scanning...')
self.cfg.write()

self.hide_preview()
dude_core.reset()
self.status_path_configure(text='')
self.groups_show()
