This project is a multiprocess Python 3.8+ module that loads DICOM datasets stored in AWS HealthImaging (AHI) into memory or exports them to the file system.
This module can be installed with the Python pip utility:
- Clone this repository:
git clone https://github.com/aws-samples/healthlake-imaging-to-dicom-python-module.git
- Open a terminal in the cloned folder.
- Run the following command to install the module with pip:
pip install .
To use this module, import the AHItoDICOM class and instantiate the helper:
from AHItoDICOMInterface.AHItoDICOM import AHItoDICOM
helper = AHItoDICOM(AHI_endpoint=AHIEndpoint, fetcher_process_count=fetcher_count, dicomizer_process_count=dicomizer_count)
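The fetcher_count and dicomizer_count variables in the snippet above are not defined there. The parameter table below documents the defaults as 4 x and 1 x the number of cores; a minimal sketch of computing those same values explicitly, so they can be tuned, might look like this. The helper name default_process_counts and its per-core arguments are hypothetical and not part of the module.

```python
import os
from typing import Tuple


def default_process_counts(fetcher_per_core: int = 4, dicomizer_per_core: int = 1) -> Tuple[int, int]:
    """Return (fetcher_count, dicomizer_count) scaled by the number of CPU cores.

    Hypothetical helper: mirrors the defaults documented for AHItoDICOM
    (4 x cores fetcher processes, 1 x cores DICOMizer processes).
    """
    # os.cpu_count() may return None when the count cannot be determined; fall back to 1.
    cores = os.cpu_count() or 1
    return max(1, fetcher_per_core * cores), max(1, dicomizer_per_core * cores)


fetcher_count, dicomizer_count = default_process_counts()
# The values can then be passed to the helper, for example:
# helper = AHItoDICOM(AHI_endpoint=None, fetcher_process_count=fetcher_count,
#                     dicomizer_process_count=dicomizer_count)
```

Assuming the module relies on Python's multiprocessing, creating the helper inside an `if __name__ == "__main__":` block is a sensible precaution: under the spawn start method (the default on Windows, and on macOS since Python 3.8) child processes re-import the main module.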
Once the helper is instantiated, call the DICOMizeImageSet() function to load the DICOM data of an ImageSet from AHI into memory as a list of pydicom datasets:
instances = helper.DICOMizeImageSet(datastore_id=datastoreId , image_set_id=imageSetId)
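This README does not state in which order DICOMizeImageSet() returns the instances. A minimal sketch of putting them in slice order by the DICOM InstanceNumber attribute before further processing follows; sort_by_instance_number is a hypothetical helper, and instances without a usable InstanceNumber are placed last in their original order.

```python
from typing import Any, List, Sequence, Tuple


def sort_by_instance_number(instances: Sequence[Any]) -> List[Any]:
    """Return the datasets ordered by InstanceNumber (hypothetical helper).

    Works on pydicom Dataset objects or any object exposing an
    InstanceNumber attribute.
    """
    def key(ds: Any) -> Tuple[int, int]:
        value = getattr(ds, "InstanceNumber", None)
        if value is None or value == "":
            return (1, 0)  # missing value: sort after numbered instances
        try:
            # InstanceNumber has VR IS (integer string); int() normalizes it.
            return (0, int(value))
        except (TypeError, ValueError):
            return (1, 0)  # unparsable value: treat as missing

    # sorted() is stable, so unnumbered instances keep their relative order.
    return sorted(instances, key=key)


# Usage, with `instances` as returned by helper.DICOMizeImageSet(...):
# instances = sort_by_instance_number(instances)
```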
Function | Description |
---|---|
AHItoDICOM(aws_access_key: str = None, aws_secret_key: str = None, AHI_endpoint: str = None, fetcher_process_count: int = None, dicomizer_process_count: int = None) | Instantiates the helper. All parameters are optional. aws_access_key and aws_secret_key: can be used if no default credentials are configured for the AWS client, or if the code runs in an environment that does not support IAM profiles. AHI_endpoint: only useful to AWS employees; other users should leave this value set to None. fetcher_process_count: the number of fetcher processes started to fetch and decompress the frames. Defaults to 4 x the number of cores. dicomizer_process_count: the number of DICOMizer processes started to create the pydicom datasets. Defaults to 1 x the number of cores. |
DICOMizeImageSet(datastore_id: str, image_set_id: str) | Loads the pydicom datasets of an ImageSet into memory. datastore_id: the AHI datastore where the ImageSet is stored. image_set_id: the AHI ImageSet Id of the requested image collection. |
DICOMizeByStudyInstanceUID(datastore_id: str, study_instance_uid: str) | Loads the pydicom datasets of a DICOM study into memory. datastore_id: the AHI datastore where the ImageSets are stored. study_instance_uid: the DICOM StudyInstanceUID of the study to export. |
getImageSetToSeriesUIDMap(datastore_id: str, study_instance_uid: str) | Returns an array of series descriptors for the given study, each associated with its ImageSetId. Useful for deciding which series to load into memory later. datastore_id: the AHI datastore where the ImageSets are stored. study_instance_uid: the StudyInstanceUID of the DICOM study. Returns an array of series descriptors like this: [{'SeriesNumber': '1', 'Modality': 'CT', 'SeriesDescription': 'CT series for liver tumor from nii 014', 'SeriesInstanceUID': '1.2.826.0.1.3680043.2.1125.1.34918616334750294149839565085991567'}] |
saveAsDICOM(ds: Dataset, destination: str) | Saves an in-memory DICOM object to the file system. ds: the pydicom dataset representing the instance, typically one element of the list returned by DICOMizeImageSet() or DICOMizeByStudyInstanceUID(). destination: the file path where the DICOM P10 file is written. |
saveAsPngPIL(ds: Dataset, destination: str) | Saves a PNG representation of the pixel data of one instance to the file system. ds: the pydicom dataset representing the instance, typically one element of the list returned by DICOMizeImageSet() or DICOMizeByStudyInstanceUID(). destination: the file path where the PNG file is written. |
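getImageSetToSeriesUIDMap() is described above as useful for deciding which series to load. A minimal sketch of that selection step filters the returned descriptors and collects the ImageSetIds to pass to DICOMizeImageSet(). It assumes each descriptor carries the ImageSetId, Modality and InstanceCount keys, as in the sample output of example/main.py shown later in this README; select_image_sets and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_image_sets(series_map: Iterable[Dict], modality: Optional[str] = None,
                      min_instances: int = 0) -> List[str]:
    """Return the ImageSetIds whose series match the filters (hypothetical helper).

    Assumes descriptors shaped like the example output, e.g.
    {'ImageSetId': '...', 'Modality': 'CT', 'InstanceCount': 222, ...}.
    """
    selected: List[str] = []
    for series in series_map:
        if modality is not None and series.get("Modality") != modality:
            continue
        if int(series.get("InstanceCount", 0)) < min_instances:
            continue
        image_set_id = series.get("ImageSetId")
        # Keep each id once, preserving order, in case descriptors share an ImageSetId.
        if image_set_id and image_set_id not in selected:
            selected.append(image_set_id)
    return selected


# Usage (helper, datastoreId and studyInstanceUID as in the example):
# series_map = helper.getImageSetToSeriesUIDMap(datastore_id=datastoreId, study_instance_uid=studyInstanceUID)
# for image_set_id in select_image_sets(series_map, modality="CT", min_instances=100):
#     instances = helper.DICOMizeImageSet(datastore_id=datastoreId, image_set_id=image_set_id)
```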
The file example/main.py demonstrates how to use the functions described above. To use it, modify the datastoreId, imageSetId and studyInstanceUID variables in the main function. You can also experiment with the fetcher_count and dicomizer_count parameters to improve performance. Below is an example run in an environment where the AWS CLI is configured with an IAM user and us-east-2 is selected as the default region:
$ python3 main.py
Getting ImageSet JSON metadata object.
5
Listing ImageSets and Series info by StudyInstanceUID
[{'ImageSetId': '0aaf9a3b6405bd6d393876806034b1c0', 'SeriesNumber': '3', 'Modality': 'CT', 'SeriesDescription': 'KneeHR 1.0 B60s', 'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128320349', 'InstanceCount': 74}, {'ImageSetId': '81bfc6aa3416912056e95188ab74870b', 'SeriesNumber': '2', 'Modality': 'CT', 'SeriesDescription': 'KneeHR 3.0 B60s', 'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128221126', 'InstanceCount': 222}]
DICOMizing by StudyInstanceUID
DICOMizebyStudyInstanceUID
0aaf9a3b6405bd6d393876806034b1c0
81bfc6aa3416912056e95188ab74870b
DICOMizing by ImageSetID
222 DICOMized in 3.3336379528045654.
Exporting images of the ImageSet in png format.
Exporting images of the ImageSet in DICOM P10 format.
After the example code returns, the out folder contains folders named after the StudyInstanceUID of the exported ImageSet. The folder prefixed with dcm_ holds the DICOM P10 files of the ImageSet, and the folder prefixed with png_ holds PNG representations of its images.
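A similar on-disk layout can be produced from your own code with saveAsDICOM() and saveAsPngPIL(). The sketch below assumes both are methods of the helper instance (like DICOMizeImageSet()), that the folders are named dcm_<StudyInstanceUID> and png_<StudyInstanceUID> under out, and that files are named after each instance's SOPInstanceUID. These naming choices and the export_instances function itself are hypothetical and may differ from what example/main.py does.

```python
from pathlib import Path
from typing import Any, Iterable, List


def export_instances(helper: Any, instances: Iterable[Any], out_root: str = "out") -> List[Path]:
    """Write each dataset as DICOM P10 and PNG (hypothetical helper).

    Assumes `helper` exposes saveAsDICOM(ds, destination) and
    saveAsPngPIL(ds, destination) as listed in the function table.
    """
    written: List[Path] = []
    for ds in instances:
        study_uid = str(ds.StudyInstanceUID)
        sop_uid = str(ds.SOPInstanceUID)
        dcm_dir = Path(out_root) / f"dcm_{study_uid}"
        png_dir = Path(out_root) / f"png_{study_uid}"
        # Create the per-study folders up front; the save functions may not do it.
        dcm_dir.mkdir(parents=True, exist_ok=True)
        png_dir.mkdir(parents=True, exist_ok=True)
        dcm_path = dcm_dir / f"{sop_uid}.dcm"
        png_path = png_dir / f"{sop_uid}.png"
        helper.saveAsDICOM(ds=ds, destination=str(dcm_path))
        helper.saveAsPngPIL(ds=ds, destination=str(png_path))
        written.extend([dcm_path, png_path])
    return written
```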
This package can be used in Amazon SageMaker by adding the following code to the first two cells of a SageMaker notebook:
%%sh
# Install the Python packages (the %%sh cell magic must be the first line of the cell)
pip install --upgrade pip --quiet
pip install boto3 botocore awscliv2 AHItoDICOMInterface --upgrade --quiet
# Second cell: restart the kernel so that the new package versions (including awscliv2) are taken into account.
import IPython
IPython.Application.instance().kernel.do_shutdown(True)  # automatically restarts the kernel
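After the restart, one way to confirm which package versions the new kernel sees is importlib.metadata, part of the standard library since Python 3.8. The sketch below assumes the distribution names are the same as those passed to pip above; installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Run in a new cell once the kernel has restarted.
print(installed_versions(["boto3", "botocore", "awscliv2", "AHItoDICOMInterface"]))
```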
An example SageMaker Jupyter notebook that uses this module is available in the example folder of this repository: jupyter-sagemaker-example.ipynb.