Caution
🛑 DO NOT USE LTC - There is unresolved security issue, see #10
This is a simple program to remove old thumbnails from pict-rs and lemmy.
It will periodically check the lemmy database for posts that are older than given amount of months and instruct pict-rs to drop the thumbnail for that post.
This program requires connection to the lemmy postgres database and pict-rs HTTP service. The expected deployment is as container/service alongside the pict-rs and lemmy postgres services.
Edit the lemmy docker-compose.yml
to include this service:
services:
# ....
cleaner:
image: ghcr.io/wereii/lemmy-thumbnail-cleaner:v0.1.3
#restart: unless-stopped
environment:
- RUST_LOG=info
- INSTANCE_HOST=https://your_instance_host.here/
- POSTGRES_DSN=postgresql://user:password@postgres/lemmy
- PICTRS_HOST=pict-rs:8080
- PICTRS_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#- THUMBNAIL_MIN_AGE_MONTHS=3
#- CHECK_INTERVAL=300
#- QUERY_LIMIT=100
Pict-rs also needs to be configured with api key (PICTRS__SERVER__API_KEY
), otherwise the endpoint required for this cleaner is not accessible!
-
INSTANCE_HOST
- The "root url" of your lemmy instance. Forlemmy.world
this would look likehttps://lemmy.world/
.This is used to determine if a thumbnail of a post is local to this instance (and thus can be deleted).
-
POSTGRES_DSN
- The URI to the lemmy postgres database. Must be full postgres DSN and must specify the lemmy database (usually/lemmy
). -
PICTRS_API_KEY
- The API key for the pict-rs service. -
PICTRS_HOST
- The host of the pict-rs service, this is just ip/hostname, optionally port (ex.pict-rs:8080
).
-
RUST_LOG
- Controls logging level,debug
will also print the thumbnail ids being deleted (lots of lines).
Without this the default level iswarn
. -
THUMBNAIL_MIN_AGE_MONTHS
- The minimum age of a thumbnail in months before it is considered for deletion.
Default is3
months. -
CHECK_INTERVAL
- The interval in seconds the program sleeps between checks.
Default is300
. The main use is to give other services breathing room and not keep hitting them constantly with requests.- Setting this to
0
will make the program run once and then exit.
- Setting this to
-
QUERY_LIMIT
- The maximum number of posts to get from postgres in one query and thus the maximum number of thumbnails to delete in one check interval.
Default is100
.- Increasing this has direct impact on postgres as it has to return more rows.
- It will also increase memory usage of the cleaner as it keeps the rows in memory until processed (though this shouldn't have too big of an impact).
-
DELETE_ON_NOT_FOUND
- If set totrue
the cleaner will "unlink" the thumbnail from the post even if pict-rs returns 404 (not found) when trying to delete it (basically on 404 it assumes the image does not exist already). Default isfalse
.- Warning: This can leave "dead" thumbnails in pict-rs that are not associated with any post!
- You should only set this if you are sure that the thumbnail can't be anywhere else (e.g. in some other pict-rs instance).
The CHECK_INTERVAL
and QUERY_LIMIT
is what controls how demanding the cleaner is on the database and pict-rs.
You should tweak it to fit the performance of your infrastructure.
When there is a lot (10k+) that can be cleaned up you should reduce the CHECK_INTERVAL
(5-15s) and then
increase QUERY_LIMIT
(~500) to speed up the process.
Keep in mind the program is intentionally single-threaded so increasing QUERY_LIMIT
too much will keep the program
continually hitting
both pict-rs and postgres for longer.
Once there is less (hundreds) you can increase the CHECK_INTERVAL
to hours or days as there won't be that much new
thumbnails old enough (but that depends on your traffic).
I would personally expect this to run once or twice a day at that point, with query limit of around the 300.
When the bucket lifecycle is configured to Keep Only Last Version
, the old versions are not deleted
immediately but hidden instead and deleted after 24h.
So don't be surprised if the bucket size doesn't change immediately.
- My instance of 2 MAU, running for about half a year had about 95k files and 12G of data before running the cleaner.
- After running the cleaner, fully removing all older then 1 month (and after waiting for almost 2 days) the b2 bucket usage dropped to 57k files and 7.9G
My rust is rusty so there might be some issues with the code.
I have tested this on my own instance and it works as expected but
USE AT YOUR OWN RISK