ncdu (NCurses Disk Usage) is a great utility with an ncurses
interface that allows browsing through directories and check their disk
usage (like the du
command). It first walks through a directory and
then allows browsing the cached result.
Newer versions (≥1.9) of ncdu have a feature allowing you to make
a JSON export file on a remote machine (-o
option) and then
browse directories locally (-f
option).
On some old machines there may be an old version of ncdu without that option or there may be no ncdu at all and it may be expensive to build ncdu for every one of them or for some reason you cannot get static binaries for a specific platform. The ncdu-export tool is a workaround - it generates an export file compatible with ncdu and requires only Python 2.6 or Python 3.2 (or newer) on the remote machine.
Below, there is also a script based on the find
command only (without
using Python).
Note, however, that there are static binaries for x86, x86_64 and ARM available directly from the ncdu's homepage, so tools from this repository may be useful only if you cannot use those static binaries.
Currently the scripts' output is not identical to ncdu's output but should work well enough.
Example:
1. Copy the script to the remote host
$ scp ncdu-export remote-host:
2A. Pipe meta-data via ssh to a local file:
$ ssh remote-host ./ncdu-export -p / > files.json
2B. Collect meta-data on the remote host and then download it:
$ ssh remote-host
$ ./ncdu-export -p / > files.json
^D
$ scp remote-host:files.json .
3. Analyze the data
$ ncdu -f files.json
If you get the /usr/bin/env: ‘python’: No such file or directory
message you need to call a chosen version of python explicitly in one of
the following ways:
python2 ncdu-export
(if using Python 2),python3 ncdu-export
(if using Python 3),- fix the first line (hashbang) of the script according to your environment.
Since version 0.8.0 ncdu-export
encodes names using UTF-8 by default.
Use the -a
switch to output ASCII. The default changed because at the
time of writing this the original ncdu
≥ 2.5 based on Zig does not
accept JSON with names encoded in ASCII. See #6 for details.
Tools described below are prepared for filenames containing unusual characters
like newlines. They support -
as the FILE's name so you can use them with
pipes.
Sometimes one can have a need to automatically filter meta-data dumped using
the ncdu or ncdu-export
tools. Those dumps can be quite big, hundreds of
megabytes. One can process those dumps with jq, but:
- using jq in non-stream mode can consume a lot of RAM,
- getting directory name from this kind of dump may be quite complicated (I
don't like my own example with
walk
), - I didn't find a way to process ncdu's output in jq's stream mode and
using methods like
fromstream(1|truncate_stream(inputs))
; I suppose it's because contrary to most formats used in jq's usage examples ncdu's format is not flat (it's an array of arrays of maps).
This set of tools can be used to flatten ncdu's output, make it easy to process using jq and then optionally unflatten it back again. These tools depend on the ijson Python library using the YAJL2 library underneath. Those libraries work on streams and parse JSON incrementally so it's possible to convert huge dumps without consuming all the RAM.
The yajl2_cffi
backend is chosen automatically (if available). It's faster
than the pure Python backend. During experiments it reduced the conversion time
by as much as 40%.
Example of filtering files modified before 2018-01-01:
$ ./ncdu-export -mp a-directory > files.json
$ ./flatten.py files.json > files-flat.json
$ export ts=$(date -d 2018-01-01 +%s)
Rebrowsing in ncdu:
$ jq -c 'select(.mtime < (env.ts | tonumber))' < files-flat.json > files-flat-before2018.json
$ ./unflatten.py files-flat-before2018.json > files-before2018.json
$ ncdu -f files-before2018.json
Putting files in an archive and removing them:
$ jq -j 'select(.mtime < (env.ts | tonumber) and .type == "file") | .dirs + "/" + .name + "\u0000"' < files-flat.json > files-flat-before2018.txt
$ tar cvzf archive.tgz --null -T files-flat-before2018.txt --remove-files
There is also a script that allows you to produce a meta-data dump just using
the find
command on a remote host (without using ncdu nor Python at all)
and then process it locally to regenerate the ncdu-compatible JSON format. It
works thanks to find's printf
action (available in the Linux version, not the
BusyBox one).
$ ./find.sh a-directory > find-export.txt
$ ./find2flat.py find-export.txt > find-flat-export.json
$ ./unflatten.py find-flat-export.json > find-export.json
$ ncdu -f find-export.json
or
$ ./find.sh ~/projects/ | ./find2flat.py - | ./unflatten.py - | ncdu -f -
.------------.
.---------------| filesystem |
| '------------'
| |
| | ncdu -o / ncdu-export
| v
| .------. .---------.
| find.sh | ncdu | ncdu -f | ncdu |
| | JSON |-------->| preview |
| '------' '---------'
| | ^
| flatten.py | | unflatten.py
v v |
.--------. .------.
| find | find2flat.py | flat |<---. jq filtering
| output |------------->| JSON |----'
'--------' '------'
|
| jq
v
.-----------. .---------.
| tar | tar -T | tar |
| file list |------->| archive |
'-----------' '---------'