Script to assist conversion of n5 datasets to paintera-friendly formats, as specified here.
Releases can be download for Ubuntu and MacOS from the Github Releases
This conversion tool currently supports any number of datasets (raw or label) with a single (global) block size, and will output to a single N5 group in a paintera-compatible format.
By default, spark will run locally:
paintera-convert to-paintera [...]
paintera-convert
can also convert paintera label sources to scalar label datasets:
paintera-convert extract-to-scalar [...]
extract-to-scalar
will the highest resolution scale level of a Paintera dataset as a scalar uint64
Dataset. This is useful for using Paintera painted labels (and assignments) in downstream processing, e.g. classifier training. Optionally, the fragment-segment-assignment
can be considered and additional assignments can be added. See extract-to-scalar --help
for more details.
Deprecated
paintera-conversion-helper is available on conda on the hanslovsky
channel:
conda install -cconda-forge -c hanslovsky paintera-conversion-helper
If necessary, you can install openjdk
and maven
from conda-forge
:
conda install -c conda-forge maven openjdk
Alternatively, paintera-conversion-helper
can be installed from PyPI through pip:
pip install paintera-conversion-helper
To compile the conversion helper into a jar, simply run
mvn -Denforcer.skip=true clean package
To run locally build a fat jar including Spark:
mvn -Denforcer.skip=true -PfatWithSpark clean package
To run on the Janelia cluster build a fat jar without Spark:
mvn -Denforcer.skip=true -Pfat clean package
To convert the raw
and neuron_ids
datasets of sample A of the cremi challenge into Paintera format with mipmaps on Linux, assuming that you downloaded the data into $HOME/Downloads
, run:
paintera-convert to-paintera \
--scale 2,2,1 2,2,1 2,2,1 2 2 \
--reverse-array-attributes \
--output-container=paintera-converted.n5 \
--container=sample_A_20160501.hdf \
-d volumes/raw \
--target-dataset=volumes/raw2 \
--dataset-scale 3,3,1 3,3,1 2 2 \
--dataset-resolution 4,4,40.0 \
-d volumes/labels/neuron_ids
$ paintera-convert to-paintera --help
Usage: paintera-convert to-paintera [[--block-size=X,Y,Z|U] [--scale=X,Y,Z|U...] [--scale=X,Y,Z|U...]...
[--downsample-block-sizes=X,Y,Z|U...] [--downsample-block-sizes=X,Y,Z|U...]...
[--reverse-array-attributes] [--resolution=X,Y,Z|U] [--offset=X,Y,Z|U] [-m=N...]
[-m=N...]... [--label-block-lookup-n5-block-size=N]
[--winner-takes-all-downsampling]] ([--container=CONTAINER] [] (-d=DATASET
[--target-dataset=TARGET_DATASET] [[--type=TYPE] ])...)... [--overwrite-existing]
[--help] --output-container=OUTPUT_CONTAINER [--spark-master=<sparkMaster>]
Options:
--block-size=X,Y,Z|U Use --container-block-size and --dataset-block-size for container and dataset specific
block sizes, respectively.
--scale=X,Y,Z|U... Relative downsampling factors for each level in the format x,y,z, where x,y,z are
integers. Single integers u are interpreted as u,u,u.
Use --container-scale and --dataset-scale for container and dataset specific scales,
respectively.
--downsample-block-sizes=X,Y,Z|U...
Use --container-downsample-block-sizes and --dataset-downsample-block-sizes for container
and dataset specific block sizes, respectively.
--reverse-array-attributes
Reverse array attributes like resolution and offset, i.e. [x, y, z] -> [z, y, x].
Use --container-reverse-array-attributes and --dataset-reverse-array-attributes for
container and dataset specific setting, respectively.
--resolution=X,Y,Z|U Specify resolution (overrides attributes of input datasets, if any).
Use --container-resolution and --dataset-resolution for container and dataset specific
resolution, respectively.
--offset=X,Y,Z|U Specify offset (overrides attributes of input datasets, if any).
Use --container-offset and --dataset-offset for container and dataset specific resolution,
respectively.
-m, --max-num-entries=N... Limit number of entries for non-scalar label types by N. If N is negative, do not limit
number of entries. If fewer values than the number of down-sampling layers are
provided, the missing values are copied from the last available entry. If none are
provided, default to -1 for all levels.
Use --container-max-num-entries and --dataset-max-num-entries for container and dataset
specific settings, respectively.
--label-block-lookup-n5-block-size=N
Set the block size for the N5 container for the label-block-lookup.
Use --container-label-block-lookup-n5-block-size and
--dataset-label-block-lookup-n5-block-size for container and dataset specific settings,
respectively.
--winner-takes-all-downsampling
Use scalar label type with winner-takes-all downsampling.
Use --container-winner-takes-all-downsampling and --dataset-winner-takes-all-downsampling
for container and dataset specific settings, respectively.
--overwrite-existing
--output-container=OUTPUT_CONTAINER
--spark-master=<sparkMaster>
--container=CONTAINER
-d, --dataset=DATASET
--target-dataset=TARGET_DATASET
--type=TYPE
--help
Clone the repository with submodules:
git clone --recursive https://github.com/saalfeldlab/paintera-conversion-helper.git
If you have already cloned the repository, run this after cloning to fetch the submodules:
git submodule update --init --recursive
Then, run the following script to build the package:
./build-for-cluster.py
For submitting a job to the Janelia cluster you can use the following script:
startup-scripts/spark-janelia/convert.py <number of cluster nodes> <other parameters>
The first parameter is the number of cluster nodes to use (for example, 5), and the rest is the same parameters as in the paintera-convert
command.