
DICOM Import Process Steps

michaelkain edited this page Oct 28, 2022 · 15 revisions

Overview

We assume here that each import targets one specific study; cross-study imports are not covered. We also assume that a multi-patient import uses only one study card, which means that all patients to import come from the same machine and therefore share the same study card.

A. Import preparation

  1. Modality Selection
  2. Series Selection
  3. Data Download/Copy
  4. Preview Images
  5. Check studycard compatibility (read information from the first DICOM file of the first selected series)
  6. Calculate subjectIdentifier + hash values (pseudonymus mode or not)
  7. Check subject existence
  8. Create entities (subject, rel_subject_study, examination)
  9. Pseudonymization

B. Import

  1. Translate instances to images and read information from the first DICOM file of each series
  2. Calculate datasets from DICOM series (images -> datasets)
  3. Conversion DICOM to NIfTI (datasets)
  4. Create entities (acquisition, protocol, dataset, ...)
  5. Apply studycard
  6. Copy NIfTI files (into persistent disk space, BIDS structure in the best case)
  7. Send DICOM files to backup PACS
  8. Clean import

Import parameters: 6 IDs are needed to complete one DICOM import (1 patient + 1 study + 1..n series):

  1. StudyId
  2. StudyCardId
  3. SubjectId (== patient DICOM)
  4. RelSubjectStudyId (put subject into one study)
  5. ExaminationId (== study DICOM)
  6. ConverterId (not necessary if StudyCardId is linked to one specific converter)
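As a minimal sketch, the six import parameters could be grouped into one value object. The class and attribute names below are hypothetical and chosen only for illustration; they are not taken from the Shanoir-NG code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportParameters:
    """Hypothetical container for the six IDs listed above."""
    study_id: int
    study_card_id: int
    subject_id: int                      # corresponds to the DICOM patient
    rel_subject_study_id: int            # puts the subject into one study
    examination_id: int                  # corresponds to the DICOM study
    converter_id: Optional[int] = None   # may be omitted if the study card is linked to one converter


# Example: an import where the converter is implied by the study card.
params = ImportParameters(
    study_id=1,
    study_card_id=2,
    subject_id=3,
    rel_subject_study_id=4,
    examination_id=5,
)
```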

C. Mass data import

Mass data import in this context means importing multiple patients (DICOM) with multiple studies (DICOM) and multiple series (DICOM) using the same study and studycard. For a mass data import, studycard compatibility has to be checked for each series of the import.

Import step: A.1. Modality Selection

The user can select a DICOM imaging modality for the current import. The following selections shall be possible in the future and are therefore already prepared in this spec:

  • All modalities (no restriction)
  • MR (default for the moment)
  • CT
  • PT
  • Spectroscopy

The selected DICOM imaging modality filters which series the user can select afterwards and which results are displayed. With MR as the default, only MR series are searched in the PACS (DICOM query with filter "MR") and displayed, and only MR series are shown when reading a DICOMDIR. "All" means that no filter is applied and the user sees all results. Attention: given the amount of data in a typical PACS, an unfiltered search can be very time-consuming, so use it carefully.

The current implementations of ShanoirUploader and the Web GUI of Shanoir-NG implicitly filter by MR. This will change in the near future to extend the usage of Shanoir-NG.
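The filtering behaviour can be illustrated with a small sketch. It assumes that each series found is represented as a plain dictionary carrying the DICOM Modality value; the function name and the dictionary layout are assumptions made for this example only.

```python
from typing import Iterable, List, Optional


def filter_series_by_modality(series: Iterable[dict],
                              modality: Optional[str] = "MR") -> List[dict]:
    """Keep only series of the given modality; None stands for 'All modalities'."""
    if modality is None:
        return list(series)          # no restriction: the user sees all results
    return [s for s in series if s.get("Modality") == modality]


# Example data (hypothetical): one MR series and one CT series.
found = [
    {"SeriesDescription": "T1 MPRAGE", "Modality": "MR"},
    {"SeriesDescription": "Thorax", "Modality": "CT"},
]

mr_only = filter_series_by_modality(found)          # default filter: MR
everything = filter_series_by_modality(found, None)  # "All modalities"
```

In a real PACS query the filter would be sent with the query itself instead of being applied afterwards, which is what keeps a filtered search fast.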

Import step: A.2. Series Selection

  • Query PACS and select series (ShUp + Web GUI). PACS queries are done using PatientRootLevel or StudyRootLevel.
  • Open folder with DICOMDIR (ShUp) or upload ZIP file with DICOMDIR (Web GUI) and select series
  • Open folder without DICOMDIR (ShUp) or upload ZIP file with only DICOM files (Web GUI) and select series
  • Upload Bruker (MR) ZIP (Web GUI, to come in ShUp), convert Bruker to DICOM and select series

The current implementation only allows selecting multiple series of one patient, not multi-patient data. We still have to think about how to manage multi-patient data.

Import step: A.3. Data Download/Copy

In this step, for an import from a PACS, Shanoir-NG actually downloads the images from the PACS. Before this step the images are not accessible, since they have been downloaded neither to ShUp nor to the Shanoir-NG server. For an import from a folder, only the images of the selected series are copied from the CD/DVD/folder for further processing. One folder per series is created on disk, tmp_folder_for_this_import/SERIES/{serieId}/, containing the DICOM files. This shall be handled by the Shanoir Exchange Format (SEF), writing the DICOM files to /sourcedata/sub1-toto/SERIES/{serieId}/.

The DICOM instances attached to the series are used to download/copy the files, as they contain the file names.
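The per-series folder layout described above can be sketched as follows. This is only an illustration for the folder import case: the function name is hypothetical, and the file names are assumed to be already known from the DICOM instances of the series.

```python
import shutil
from pathlib import Path
from typing import Iterable


def copy_series_files(import_root: Path, series_id: str,
                      files: Iterable[Path]) -> Path:
    """Copy the DICOM files of one series into <import_root>/SERIES/<series_id>/."""
    target = import_root / "SERIES" / series_id
    target.mkdir(parents=True, exist_ok=True)   # one folder per series
    for source in files:
        shutil.copy2(source, target / source.name)
    return target
```

Calling `copy_series_files(Path("tmp_folder_for_this_import"), "1.2.3", [Path("IM0001")])` would place the file under `tmp_folder_for_this_import/SERIES/1.2.3/IM0001`.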

Import step: A.4. Preview Images

This step will be implemented in the future. The image preview will be unified and will only be possible after the images have been downloaded/copied. Today the Web GUI offers a preview for folder/ZIP file imports but not for PACS imports. This looks odd but is logical, as the PACS images have not been downloaded at that point. ShUp does not offer an image preview today; users can use the PACS interface for this purpose.

Import step: A.5. Check studycard compatibility

For all imports of DICOM files (with ShanoirUploader or with the Web GUI, in both cases from PACS or from ZIP), the compatibility of the StudyCard has to be checked.

Three information sources/models form the basis for calculating that compatibility:

  1. Import from PACS: DIMSE or query & retrieve information. When we query the PACS on its different levels (patient, study, series, instance), we can ask for additional information and try to use it to check the compatibility. A tree is constructed combining the information received on the different levels.
  2. Import from ZIP (CD/DVD): DICOMDIR information. Normally a CD/DVD contains a DICOMDIR with information we can try to use to check the compatibility. The DICOMDIR is a structured tree as well.
  3. The DICOM file(s) themselves (after download from PACS or copy from CD/DVD).

Three fields are used to check the compatibility: Manufacturer (0008,0070), ManufacturerModelName (0008,1090) and DeviceSerialNumber (0018,1000). To allow a match, all three fields are configured in Shanoir: the StudyCard references an acquisition equipment, and the acquisition equipment holds all three fields. Furthermore, each studycard references one center.

Unfortunately our tests show that, for example, the DeviceSerialNumber is neither returned by the PACS (source 1) nor available in the DICOMDIR (source 2). To calculate the compatibility precisely using all three fields, we therefore need access to the DICOM files themselves. Normally all three fields should be present in the DICOM files; this is not guaranteed, but quite probable.

As a consequence, after the selection of the series in either ShUp or the Web GUI, we have to download (import from PACS) or copy (import from ZIP) all DICOM files to get access to them, and then read the first DICOM file of each series to access the three fields. For an import from PACS (ShUp or Web GUI), downloading all DICOM files can take some time, which is the downside of this solution. Since we may in the future also want to offer an import based only on a folder containing DICOM files (no DICOMDIR), we consider the current solution (always download or copy first) the most stable.

For ShUp the idea is to download one patient (or, later, multiple patients with their series) and to change the button to "download". ShUp then starts to download all DICOM files for one or multiple patients and adds one row per patient to the "current downloads/imports" table. When the download for one patient is finished, an import (or send) button appears, which opens the ImportDialog (clinical context window), where the user can choose study, studycard, subject and examination.

For the Web GUI a synchronous solution can be imagined at first: after selecting the series, the user waits until all DICOM files have been downloaded and then continues the import with the clinical context. This can be extended to an asynchronous solution, similar to the one in ShUp, where the user is notified when the download is finished and can complete the import starting from an import history overview table. One more advantage of this solution (downloading or copying before, not after, the user has entered the clinical context) is that the compatibility check is already completed during the import preparation, and not only when the backend tries to import the data.

We consider the compatibility valid only if all 3 fields match. In the table below, 1 == present and 0 == missing:

| DicomTag | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Manufacturer | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| ManufacturerModelName | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| DeviceSerialNumber | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| Result compatibility check | Warning | Check if valid | Warning | Warning | Warning | Warning | Warning | Warning |
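The rule that only a match on all three fields counts as compatible can be sketched as follows. The function name, the dictionary-based inputs and the exact string comparison are assumptions made for this example; how Shanoir-NG normalises and compares the values is not specified here.

```python
from typing import Mapping, Optional

# The three DICOM fields used for the check.
FIELDS = ("Manufacturer", "ManufacturerModelName", "DeviceSerialNumber")


def check_compatibility(dicom: Mapping[str, Optional[str]],
                        equipment: Mapping[str, str]) -> str:
    """Return 'compatible' only if all three fields are present and match."""
    for name in FIELDS:
        value = dicom.get(name)
        # A missing/empty field or a differing value leads to a warning.
        if not value or value.strip() != equipment.get(name):
            return "warning"
    return "compatible"


# Hypothetical acquisition equipment referenced by a studycard.
equipment = {
    "Manufacturer": "ACME",
    "ManufacturerModelName": "Scanner X",
    "DeviceSerialNumber": "12345",
}
full_match = check_compatibility(dict(equipment), equipment)
missing_serial = check_compatibility(
    {"Manufacturer": "ACME", "ManufacturerModelName": "Scanner X"}, equipment)
```

A warning does not stop the import; it only tells the user that the selected studycard could not be confirmed.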

The warning shows a message saying that at least one field is missing or does not match. If all 3 fields match, a compatibility signal/message is displayed. Furthermore, if one or more studycards are found compatible, they are automatically preselected, and so is the study with this or these studycard(s). If multiple studies or studycards are compatible, the first study and the first studycard are always preselected.

In case of a warning it should still be possible to finish the import. As DummyRuns are sometimes not directly available, the creation of a StudyCard can be delayed, but we have to encourage people to import their data anyway. In this case the already imported (unclean) data can be corrected manually to have the right sequence name; in Shanoir-NG, reapplying a studycard is the right solution for this.

In any case, during all imports we display the content of 6 fields to the user:

  1. InstitutionName (0008,0082)
  2. InstitutionAddress (0008,0081)
  3. StationName (0008,1010)
  4. Manufacturer (0008,0070)
  5. ManufacturerModelName (0008,1090)
  6. DeviceSerialNumber (0018,1000)

If one of the fields is not present in the DICOM files, the user will see it directly. Furthermore, InstitutionName and InstitutionAddress should help the user link the current import to the right center.

How does this impact mass data imports, e.g. an import of 5 patients?

Import step: A.6. Calculate subjectIdentifier

During the import a hash value, called subjectIdentifier, is calculated for each subject. This is a one-way hash: it does not allow recovering the subject's identity, but it does allow recognizing the same subject again. Two algorithms are currently used, producing hash values of different lengths:

  • Hash 1 (pseudonymus mode, also called double hash) = SHA256(hashP1(firstName)||hashP1(birthName)||hashP(birthDate)), where «SHA256» is the standard hash function of the SHA-2 family that produces a 256-bit hash, «hashP1» is the first hash of Pseudonymus, and «||» means concatenation. String length: 64 chars. This hash requires an external software library called pseudonymus, which is owned by OFSEP.

  • Hash 2 (not pseudonymus mode, also called single hash, Neurinfo) = SHA(firstName||lastName||birthDate) if isAnonymised=false, or SHA(commonName||birthDate) if isAnonymised=true. String length: 14 chars.
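The structure of Hash 1 (a SHA-256 over the concatenation of three already hashed values) can be sketched as below. This is explicitly not the Pseudonymus algorithm: the inner hash is replaced by a placeholder, and the UTF-8 encoding and the absence of any input normalisation are assumptions of this example. Real subjectIdentifier values can therefore not be reproduced with it.

```python
import hashlib


def placeholder_inner_hash(value: str) -> str:
    """Stand-in for the Pseudonymus inner hash, which is NOT reproduced here."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def double_hash(first_name: str, birth_name: str, birth_date: str) -> str:
    """SHA-256 over the concatenation (||) of the three inner hashes."""
    concatenated = (placeholder_inner_hash(first_name)
                    + placeholder_inner_hash(birth_name)
                    + placeholder_inner_hash(birth_date))
    return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()


# Hypothetical input values for illustration only.
identifier = double_hash("Jane", "Doe", "19840312")
```

The hexadecimal digest of SHA-256 always has 64 characters, which is consistent with the string length given for Hash 1. The same input always yields the same identifier, which is what makes it usable to recognize a subject again.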

Import step: A.7. Check subject existence

While importing DICOM files for one or multiple patients, Shanoir-NG checks whether the patient currently being imported already exists in the database. For this purpose it uses the hash value calculated in the previous step, the subjectIdentifier.

The hash is used to search the database for an existing subject with the same hash value/subjectIdentifier.

| Case Number | 1. Case | 2. Case | 3. Case | 4. Case |
| --- | --- | --- | --- | --- |
| Found with H1/2 | No | Yes | Yes | Yes |
| In study selected for import | - | Yes | No | No |
| User memberOf study of existing subject | - | Yes | Yes | No |
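The four cases of the table can be expressed as a small decision function. This is a sketch with hypothetical names; the combination "found in the selected study, but user not member of it" does not appear in the table and is treated as case 2 here by assumption.

```python
def classify_subject_case(found: bool,
                          in_selected_study: bool = False,
                          user_is_member: bool = False) -> int:
    """Map the three table rows to the case number 1..4."""
    if not found:
        return 1   # no subject with this subjectIdentifier exists
    if in_selected_study:
        return 2   # subject already belongs to the study selected for import
    if user_is_member:
        return 3   # subject exists in another study the user is a member of
    return 4       # subject exists in a study the user is not a member of
```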

Depending on the hash used (1 or 2), Shanoir-NG behaves differently. As Hash 1 is much more precise, two different behaviours are implemented for the four cases:

  • Hash 1 (pseudonymus mode), current behaviour for each case:
  1. Case: Preselect creation of a new subject and block selection of other subjects
  2. Case: Preselect this subject and block selection of other subjects
  3. Case: Preselect this subject, add it to the current study and block selection of other subjects
  4. Case: Preselect this subject, add it to the current study and block selection of other subjects

In the case of Hash 1 the importing users, normally using ShanoirUploader, verify before the import that all credentials are valid, regardless of what is in the DICOM files.

  • Hash 2 (not pseudonymus mode, Neurinfo), current behaviour for each case (see -1-):
  1. Case: The user chooses between creating a new subject and selecting an existing subject
  2. Case: The user chooses between creating a new subject and selecting an existing subject. During creation he cannot create the same subject twice: he is informed of the existing subject. The existing subject is NOT preselected; the user has to choose it (which is not ideal).
  3. + 4. Case: The user chooses between creating a new subject and selecting an existing subject. During creation he cannot create the same subject twice: he is informed of the existing subject. The existing subject is NOT preselected; the user has to choose it (which is not ideal). Attention: today cases 3 and 4 force the user into an error: he can neither select the existing subject from another study nor create a new subject, so whatever he does is wrong.

In the case of Hash 2 the importing users, normally using the Web GUI, verify before the import that all credentials are valid, regardless of what is in the DICOM files.

-1- Assuming that an error has been made while creating the DICOM files, but the importing user knows better.

Import step: A.8. Create entities (subject, rel_subject_study, examination)

To prepare the import, some entities are created that are required for the actual import to work.

6.1 Create subject

Each subject in Shanoir-NG has a unique common name, which is used to find and store the subject. During the import, Shanoir uses the subject identifier (see step A.6) to look for an existing subject. If no existing subject is found, a new subject has to be created during the import. Two "modes" exist for allocating the common name.

6.1.1 Manual

Using the GUI of ShUp or the Web GUI, the user enters a common name of his own choice and checks whether this common name already exists.

6.1.2 Auto-increment

The user cannot choose a common name. Shanoir-NG allocates common names with the following pattern: "0010001", where 001 is the centerId and 0001 is the id in the Shanoir-NG db for this subject. Shanoir-NG uses the studyCardId selected for the import to get the id of the center the data comes from. The centerId is then turned into a three-digit string: e.g. "001" if centerId == 1, or "012" if centerId == 12. Using this string, all subjects of one study are searched in the db and the subject with the highest number for the subjectId is selected. This number, e.g. "0001", is then incremented, and the result (here 0010002) is used as the common name.
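The auto-increment allocation can be sketched as follows. The function name is hypothetical, the existing common names of the study are assumed to be available as a list of strings, and starting at "0001" when no subject of the center exists yet is an assumption of this example.

```python
from typing import Iterable


def next_common_name(center_id: int, existing_names: Iterable[str]) -> str:
    """Build the next common name: 3-digit center prefix + 4-digit counter."""
    prefix = f"{center_id:03d}"      # e.g. 1 -> "001", 12 -> "012"
    numbers = [
        int(name[3:])
        for name in existing_names
        if len(name) == 7 and name.startswith(prefix) and name[3:].isdigit()
    ]
    # Take the highest existing number for this center and count up by one.
    return f"{prefix}{max(numbers, default=0) + 1:04d}"


first = next_common_name(12, [])                                  # "0120001"
following = next_common_name(1, ["0010001"])                      # "0010002"
mixed = next_common_name(1, ["0010001", "0120005", "0010003"])    # "0010004"
```

Names that do not follow the pattern (for example manually entered common names) are ignored by this sketch. What happens after 9999 subjects per center is not specified in the text and is not handled here.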

Import step: A.9. Pseudonymization

@TODO: each study will require a configuration of an anonymization profile to be used.

DICOM files are anonymised in Shanoir. The anonymization module (shanoir-ng-anonymization) anonymises DICOM tags according to the Excel file /shanoir-ng-anonymization/src/main/resources/anonymization.xlsx. We decided to use an Excel file so that it can be viewed, extended and edited easily by everybody.

Two steps are done in shanoir-ng-anonymization:

  1. Anonymise all tags according to Excel (input param: list of DICOM files + profile name)
  2. Insert subjectIdentifier hash into Tag.PatientName and Tag.PatientID and set Tag.PatientBirthDate to the first day of the year of birth
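The second step can be illustrated with a sketch that works on a plain dictionary of tag values instead of real DICOM files. The function name and the dictionary representation are assumptions of this example; the birth date is assumed to be in the DICOM DA format YYYYMMDD.

```python
def apply_subject_identifier(tags: dict, subject_identifier: str) -> dict:
    """Return a copy of the tags with the identity fields replaced."""
    result = dict(tags)
    # The hash replaces both the patient name and the patient ID.
    result["PatientName"] = subject_identifier
    result["PatientID"] = subject_identifier
    birth_date = tags.get("PatientBirthDate", "")
    if len(birth_date) >= 4 and birth_date[:4].isdigit():
        # Keep only the year: set the date to the first day of that year.
        result["PatientBirthDate"] = birth_date[:4] + "0101"
    return result


# Hypothetical tag values for illustration only.
original = {"PatientName": "DOE^JANE", "PatientID": "4711",
            "PatientBirthDate": "19840312"}
anonymised = apply_subject_identifier(original, "ab12cd34")
```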

The anonymization in Shanoir is based on Supplement 142 of the DICOM standard, Clinical Trials De-identification (see the Excel file for details and a comparison). See: DICOM standard, Supplement 142.

Different profiles are available (columns 1 and 2 list the tags):

  • Basic Profile
  • MR Profile
  • Shanoir Profile (default)
  • OFSEP Profile (check if similar to Shanoir)