race labels for MIMIC-CXR ? #39

robintibor · 2021-09-20T20:18:45Z

Hi,

I was wondering how to obtain the race labels for MIMIC-CXR.

I do have access to https://physionet.org/content/mimic-cxr/2.0.0/ and https://physionet.org/content/mimic-cxr-jpg/2.0.0/ but could not find where the white/asian/black labels come from.

For example, how do you create the modified_viewposition_race_4-race-ethnicity_60-10-30_split_with_gender_age_ver_b.csv that you use in the training code?

Thanks for any help,
Best,
Robin


blackboxradiology · 2021-09-20T20:41:45Z

Hi Robin,

Race labels can be found here, under the core directory, in the admissions dataset. From there you can join its subject_id with the CXR subject_id.
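A minimal sketch of the join described above, not the maintainers' actual code. It assumes the admissions table has `subject_id` and `ethnicity` columns (as in the snippet later in this thread) and that the CXR metadata has a `subject_id` column; the function name `attach_ethnicity`, the `dicom_id` column, and the toy frames are illustrative stand-ins for the real CSV files.

```python
import pandas as pd


def attach_ethnicity(cxr_df: pd.DataFrame, admissions_df: pd.DataFrame) -> pd.DataFrame:
    """Left-join the ethnicity labels from admissions onto the CXR metadata."""
    # Admissions has one row per hospital stay, so deduplicate the label pairs first.
    labels = admissions_df[["subject_id", "ethnicity"]].drop_duplicates()
    # A left join keeps every image row; subjects without an admission get NaN.
    return cxr_df.merge(labels, on="subject_id", how="left")


# Toy data standing in for the real files (hypothetical values).
cxr_df = pd.DataFrame({"subject_id": [1, 1, 2, 3], "dicom_id": ["a", "b", "c", "d"]})
admissions_df = pd.DataFrame(
    {"subject_id": [1, 1, 2], "ethnicity": ["WHITE", "WHITE", "ASIAN"]}
)
labeled = attach_ethnicity(cxr_df, admissions_df)
print(labeled)
```

Note that a subject recorded with more than one ethnicity value would produce duplicated image rows after this join, so it is worth checking the row count against the original metadata.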

Let us know if we can help with anything else!

robintibor · 2021-09-20T21:04:06Z

Ah, amazing, thanks, that clears it up! Another question: am I understanding correctly that there is some code that preprocesses MIMIC-CXR and is not in this repo? That is, one cannot just follow:

Fork/Download the GitHub repository.

Fetch the data from the data URLs for open-source datasets and drop them in the data folder.

Run the corresponding training code and save the trained model in the models folder.

for MIMIC-CXR, because https://github.com/Emory-HITI/AI-Vengers/blob/cbdf593b0d852e3078abbc72cf92aad03496511d/training_code/CXR_training/MIMIC/MIMIC_resnet34_race_detection_2021_06_29.ipynb starts from some dataframe that you have created with some code that is not in this repo?

blackboxradiology · 2021-09-20T21:27:53Z

That's correct. At the moment you would have to join the csv dataframes and make your own train-val-test splits, like what we did with modified_viewposition_race_4-race-ethnicity_60-10-30_split_with_gender_age_ver_b.csv
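As a rough illustration of making such a split, here is a sketch that assigns whole subjects to train, validation, and test sets. The 60/10/30 fractions are taken from the CSV file name; splitting at the subject level, the `split` column name, the function name `split_by_subject`, and the seed are assumptions, not the procedure actually used for that file.

```python
import numpy as np
import pandas as pd


def split_by_subject(df: pd.DataFrame, fractions=(0.6, 0.1, 0.3), seed=0) -> pd.DataFrame:
    """Add a 'split' column so that all images of a subject share one split."""
    subjects = df["subject_id"].unique()
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subjects)  # shuffled copy of the subject ids

    n_train = int(round(fractions[0] * len(shuffled)))
    n_val = int(round(fractions[1] * len(shuffled)))
    n_test = len(shuffled) - n_train - n_val  # remainder goes to the test set

    names = ["train"] * n_train + ["val"] * n_val + ["test"] * n_test
    split_of = dict(zip(shuffled, names))

    out = df.copy()
    out["split"] = out["subject_id"].map(split_of)
    return out


# Toy data: 10 subjects with 2 images each (hypothetical values).
toy_df = pd.DataFrame({"subject_id": np.repeat(np.arange(10), 2)})
with_split = split_by_subject(toy_df)
print(with_split.drop_duplicates("subject_id")["split"].value_counts())
```

Splitting by subject rather than by image avoids having the same patient in both the training and test sets.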

robintibor · 2021-09-21T11:29:27Z

I see.
One more question that came up:
Did you try to handle subjects with multiple values for ethnicity in any way? For example, the following code shows that 168 subjects were entered as both BLACK/AFRICAN AMERICAN and WHITE, and 2489 subjects as both OTHER and WHITE:

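import os
import pandas as pd

# mimic_folder: path to the directory containing admissions.csv
admissions_df = pd.read_csv(os.path.join(mimic_folder, 'admissions.csv'))
# One row per distinct (subject, ethnicity) pair
ethnicity_df = admissions_df.loc[:, ['subject_id', 'ethnicity']].drop_duplicates()

# Subjects that appear with more than one ethnicity value
v = ethnicity_df.subject_id.value_counts()
subject_id_more_than_once = v.index[v.gt(1)]

ambiguous_ethnicity_df = ethnicity_df[ethnicity_df.subject_id.isin(subject_id_more_than_once)]

# Count each combination of ethnicity values per subject, e.g. "OTHER_WHITE"
grouped = ambiguous_ethnicity_df.groupby('subject_id')
grouped.aggregate(lambda x: "_".join(sorted(x))).ethnicity.value_counts()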

blackboxradiology · 2021-09-21T11:59:40Z

Wow! Great catch! As far as I know, we were unaware of this multiple-ethnicity problem. I will look into this and test using these changes. I suspect it could improve performance by reducing noise from mislabeled patients.
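One possible way to reduce that noise, sketched here purely as an assumption and not as what was eventually done in this repo, is to keep only subjects with exactly one distinct ethnicity value. The function name `drop_ambiguous_subjects` and the toy data are hypothetical.

```python
import pandas as pd


def drop_ambiguous_subjects(admissions_df: pd.DataFrame) -> pd.DataFrame:
    """Return one (subject_id, ethnicity) row per subject with a single label."""
    pairs = admissions_df[["subject_id", "ethnicity"]].drop_duplicates()
    # Number of distinct non-missing ethnicity values recorded for each subject.
    n_labels = pairs.groupby("subject_id")["ethnicity"].nunique()
    unambiguous_ids = n_labels.index[n_labels == 1]
    return pairs[pairs["subject_id"].isin(unambiguous_ids)].reset_index(drop=True)


# Toy data: subject 2 was entered as both WHITE and OTHER (hypothetical values).
toy_admissions = pd.DataFrame(
    {
        "subject_id": [1, 1, 2, 2, 3],
        "ethnicity": ["WHITE", "WHITE", "WHITE", "OTHER", "ASIAN"],
    }
)
clean = drop_ambiguous_subjects(toy_admissions)
print(clean)
```

Excluding these subjects is only one choice; resolving them by some rule, such as the most frequent value, would be another, and either would need to be validated against the training results.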

Thank you!
