aws-samples/healthlake-imaging-to-dicom-python-module

AWS HealthImaging DICOM Exporter module

This project is a multi-process Python 3.8+ module that loads DICOM datasets stored in AWS HealthImaging into memory or exports them to the file system.

Getting started

This module can be installed with the python pip utility.

  1. Clone this repository:
    git clone https://github.com/aws-samples/healthlake-imaging-to-dicom-python-module.git
  2. In your terminal, change to the cloned folder.
  3. Execute the command below to install the module via pip :
    pip install .

How to use this module

To use this module you need to import the AHItoDICOM class and instantiate the AHItoDICOM helper:

    from AHItoDICOMInterface.AHItoDICOM import AHItoDICOM

    helper = AHItoDICOM( AHI_endpoint= AHIEndpoint , fetcher_process_count=fetcher_count , dicomizer_process_count=dicomizer_count)

Once the helper is instantiated, you can call the DICOMizeImageSet() function to export DICOM data from AHI into memory, as an array of pydicom datasets.

    instances = helper.DICOMizeImageSet(datastore_id=datastoreId , image_set_id=imageSetId)
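Since the datasets are produced by several processes, it may be useful to order them before further processing. The sketch below sorts the returned array by InstanceNumber. It assumes each dataset exposes the standard DICOM InstanceNumber attribute; the helper name sort_by_instance_number is hypothetical and not part of the module, and the module may already return the instances in a suitable order.

```python
def sort_by_instance_number(instances):
    """Return the datasets ordered by InstanceNumber (hypothetical helper).

    Datasets without a usable InstanceNumber are placed at the end.
    """
    def sort_key(ds):
        number = getattr(ds, "InstanceNumber", None)
        if number is None or number == "":
            return (True, 0)
        return (False, int(number))

    return sorted(instances, key=sort_key)
```

With the call above, this could be used as `instances = sort_by_instance_number(instances)`.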

Available functions

Function Description
AHItoDICOM(
aws_access_key : str = None,
aws_secret_key : str = None ,
AHI_endpoint : str = None,
fetcher_process_count : int = None,
dicomizer_process_count : int = None )
Use to instantiate the helper. All parameters are optional.

aws_access_key and aws_secret_key : Can be used if no default credentials are configured in the AWS client, or if the code runs in an environment that does not support IAM profiles.
AHI_endpoint : Only useful to AWS employees. Other users should leave this value set to None.
fetcher_process_count : The number of fetcher processes to instantiate to fetch and decompress the frames. By default the module creates 4 x the number of cores.
dicomizer_process_count : The number of DICOMizer processes to instantiate to create the pydicom datasets. By default the module creates 1 x the number of cores.
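As a rough illustration of the defaults described above, the sketch below derives both process counts from the number of CPU cores. The function default_process_counts is hypothetical and not part of the module; it only mirrors the documented 4 x and 1 x multipliers and may differ from what the module does internally.

```python
import os


def default_process_counts(cores=None):
    """Return (fetcher_count, dicomizer_count) per the documented defaults.

    Hypothetical helper for illustration; not part of AHItoDICOMInterface.
    """
    if cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cores = os.cpu_count() or 1
    fetcher_count = 4 * cores    # documented default: 4 x the number of cores
    dicomizer_count = 1 * cores  # documented default: 1 x the number of cores
    return fetcher_count, dicomizer_count


fetcher_count, dicomizer_count = default_process_counts()
print(fetcher_count, dicomizer_count)
```

These values can serve as a starting point for fetcher_process_count and dicomizer_process_count when tuning them explicitly.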
DICOMizeImageSet(datastore_id: str, image_set_id: str)
Use to request the pydicom datasets of an ImageSet to be loaded in memory.

datastore_id : The AHI datastore where the ImageSet is stored.
image_set_id : The AHI ImageSet Id of the image collection requested.
DICOMizeByStudyInstanceUID(datastore_id: str, study_instance_uid: str)
Use to request the pydicom datasets of a study to be loaded in memory.

datastore_id : The AHI datastore where the ImageSet is stored.
study_instance_uid : The DICOM study instance uid of the Study to export.
getImageSetToSeriesUIDMap(datastore_id: str, study_instance_uid: str)
Returns an array of the series descriptors for the given study, associated with their ImageSetIds. Can be useful to decide which series to load in memory later.

datastore_id : The AHI datastore where the ImageSet is stored.
study_instance_uid : The study instance UID of the DICOM study.

Returns an array of series descriptors like this :
[{'SeriesNumber': '1', 'Modality': 'CT', 'SeriesDescription': 'CT series for liver tumor from nii 014', 'SeriesInstanceUID': '1.2.826.0.1.3680043.2.1125.1.34918616334750294149839565085991567'}]
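To decide which series to load, the descriptors can be filtered before calling DICOMizeImageSet(). The sketch below is a minimal example, assuming each descriptor carries the ImageSetId and InstanceCount keys seen in the example output later in this README. The function select_image_set_ids is hypothetical and not part of the module.

```python
def select_image_set_ids(series_map, modality, max_instances=None):
    """Return the ImageSetIds of the descriptors matching the criteria.

    Hypothetical helper: keeps descriptors with the given Modality and,
    optionally, at most max_instances instances.
    """
    selected = []
    for entry in series_map:
        if entry.get("Modality") != modality:
            continue
        if max_instances is not None and entry.get("InstanceCount", 0) > max_instances:
            continue
        if "ImageSetId" in entry:
            selected.append(entry["ImageSetId"])
    return selected


# Descriptors copied from the example output of example/main.py.
series_map = [
    {'ImageSetId': '0aaf9a3b6405bd6d393876806034b1c0', 'SeriesNumber': '3',
     'Modality': 'CT', 'SeriesDescription': 'KneeHR  1.0  B60s',
     'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128320349',
     'InstanceCount': 74},
    {'ImageSetId': '81bfc6aa3416912056e95188ab74870b', 'SeriesNumber': '2',
     'Modality': 'CT', 'SeriesDescription': 'KneeHR  3.0  B60s',
     'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128221126',
     'InstanceCount': 222},
]

small_ct_series = select_image_set_ids(series_map, "CT", max_instances=100)
print(small_ct_series)
```

Each returned id can then be passed as image_set_id to DICOMizeImageSet().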
saveAsDICOM(ds: Dataset,
destination : str)
Saves the in-memory DICOM object to the filesystem destination.

ds : The pydicom dataset representing the instance. Typically one element of the array returned by DICOMizeImageSet().
destination : The file path where to store the DICOM P10 file.
saveAsPngPIL(ds: Dataset,
destination : str)
Saves a PNG representation of the pixel data of one instance to the filesystem.

ds : The pydicom dataset representing the instance. Typically one element of the array returned by DICOMizeImageSet().
destination : The file path where to store the PNG file.
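The two save functions can be combined to export a whole array of instances. The sketch below is one possible way to do it, assuming the helper exposes saveAsDICOM(ds, destination) and saveAsPngPIL(ds, destination) as documented above and that each dataset carries StudyInstanceUID and SOPInstanceUID attributes. The function export_instances and the file naming scheme are choices of this sketch, not part of the module.

```python
import os


def export_instances(helper, instances, out_dir="out"):
    """Save each dataset as a DICOM P10 file and as a PNG; return the paths.

    Hypothetical helper. Files are written to dcm_<StudyInstanceUID> and
    png_<StudyInstanceUID> folders below out_dir, named by SOPInstanceUID.
    """
    written = []
    for ds in instances:
        dcm_dir = os.path.join(out_dir, f"dcm_{ds.StudyInstanceUID}")
        png_dir = os.path.join(out_dir, f"png_{ds.StudyInstanceUID}")
        # Create the destination folders if they do not exist yet.
        os.makedirs(dcm_dir, exist_ok=True)
        os.makedirs(png_dir, exist_ok=True)
        dcm_path = os.path.join(dcm_dir, f"{ds.SOPInstanceUID}.dcm")
        png_path = os.path.join(png_dir, f"{ds.SOPInstanceUID}.png")
        helper.saveAsDICOM(ds, dcm_path)
        helper.saveAsPngPIL(ds, png_path)
        written.append((dcm_path, png_path))
    return written
```

Called as `export_instances(helper, instances)`, this writes one .dcm and one .png file per instance.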

Code Example

The file example/main.py demonstrates how to use the functions described above. To use it, modify the datastoreId, imageSetId and studyInstanceUID variables in the main function. You can also experiment with the fetcher_count and dicomizer_count parameters for better performance. Below is an example of running it in an environment where the AWS CLI was configured with an IAM user and us-east-2 selected as the default region :

$ python3 main.py
Getting ImageSet JSON metadata object.
5
Listing ImageSets and Series info by StudyInstanceUID
[{'ImageSetId': '0aaf9a3b6405bd6d393876806034b1c0', 'SeriesNumber': '3', 'Modality': 'CT', 'SeriesDescription': 'KneeHR  1.0  B60s', 'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128320349', 'InstanceCount': 74}, {'ImageSetId': '81bfc6aa3416912056e95188ab74870b', 'SeriesNumber': '2', 'Modality': 'CT', 'SeriesDescription': 'KneeHR  3.0  B60s', 'SeriesInstanceUID': '1.3.6.1.4.1.19291.2.1.2.1140133144321975855136128221126', 'InstanceCount': 222}]
DICOMizing by StudyInstanceUID
DICOMizebyStudyInstanceUID
0aaf9a3b6405bd6d393876806034b1c0
81bfc6aa3416912056e95188ab74870b
DICOMizing by ImageSetID
222 DICOMized in 3.3336379528045654.
Exporting images of the ImageSet in png format.
Exporting images of the ImageSet in DICOM P10 format.

After the example code has returned, the out folder contains folders named with the StudyInstanceUID of the exported ImageSet. The folder prefixed with dcm_ holds the DICOM P10 files for the ImageSet. The folder prefixed with png_ holds PNG image representations of the ImageSet.
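To experiment with the fetcher_count and dicomizer_count parameters, the load time can be measured for several combinations. The sketch below is one way to do this. The function time_configurations and the make_helper factory are hypothetical; the only module call it relies on is DICOMizeImageSet(datastore_id=..., image_set_id=...) as documented above, and it does not handle any cleanup the helper may need between runs.

```python
import time


def time_configurations(make_helper, datastore_id, image_set_id, configurations):
    """Time DICOMizeImageSet() for several (fetcher, dicomizer) count pairs.

    make_helper is a caller-supplied factory (hypothetical) returning a
    helper object for the given counts. Returns rows of
    (fetcher_count, dicomizer_count, instance_count, seconds), fastest first.
    """
    results = []
    for fetcher_count, dicomizer_count in configurations:
        helper = make_helper(fetcher_count, dicomizer_count)
        start = time.perf_counter()
        instances = helper.DICOMizeImageSet(
            datastore_id=datastore_id, image_set_id=image_set_id
        )
        elapsed = time.perf_counter() - start
        results.append((fetcher_count, dicomizer_count, len(instances), elapsed))
    return sorted(results, key=lambda row: row[3])
```

With the module installed, the factory could be `lambda f, d: AHItoDICOM(fetcher_process_count=f, dicomizer_process_count=d)`, and the first row of the result is the fastest configuration measured.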

Using this module in Amazon SageMaker

This package can be used in Amazon SageMaker by adding the following code to the first two cells of the SageMaker notebook:

Cell 1

%%sh
#Install the python packages
pip install --upgrade pip --quiet
pip install boto3 botocore awscliv2 AHItoDICOMInterface --upgrade --quiet

Cell 2

#Restart the kernel to take the new version of awscliv2 into account.
import IPython
IPython.Application.instance().kernel.do_shutdown(True) #automatically restarts kernel

An example of a SageMaker Jupyter notebook using this module is available in the example folder of this repository : jupyter-sagemaker-example.ipynb