Please note: This notebook uses open access data
In this demo we will review how to import MIDRC imaging data, how to convert CT scan images from dicom (dcm) formats to png and jpeg formats, and how to view these CT scan images. This demo will also show how to extract file and patient metadata from the header of dicom (dcm) files.
Import pydicom, Pillow (PIL), and dicom_csv, as well as pandas, numpy, and the standard-library modules os and subprocess. If any of the third-party packages are not already installed in your workspace, uncomment and run the relevant install commands below:
# Uncomment any of the lines below to install packages required by the imports in the following cells
#!pip install gen3 --user
#!pip install numpy --upgrade
#!pip install pydicom --upgrade
#!pip install pillow --upgrade
#!pip install dicom-csv --upgrade
import pydicom
import numpy as np
from PIL import Image
import pandas as pd
import os
from dicom_csv import join_tree
import subprocess
Note: The "gen3" commands use the Gen3 SDK "drs-pull" function, which runs at the user's command line. See the detailed documentation to learn more about how to access data using the Gen3 SDK: https://github.com/uc-cdis/gen3sdk-python/blob/master/docs/howto/drsDownloading.md
Users may see errors or warnings if a file's metadata is incomplete, even though the file may still have downloaded. Check for the files in your current working directory.
Users will need to change the path to their "--auth" credentials file for each drs-pull command. Credentials are available at https://data.midrc.org/identity in the form of an API key file.
cred = "/Users/christopher/Downloads/midrc-credentials.json" # change this file path
object_ids = ['dg.MD1R/ea669b5e-ae51-40ba-b375-ed23a9cd1855',
'dg.MD1R/a745ed98-0cb9-4537-826b-13b2e354e8bb',
'dg.MD1R/e604979a-c71b-4ec6-b8a0-959837b86384',
'dg.MD1R/b5cee98d-46ff-4438-aa00-90727a383340',
'dg.MD1R/8a5a5579-7925-432d-a614-3ed208f1c182',
'dg.MD1R/33034812-47f3-4c0e-b60b-fa7a2a04ecda',
'dg.MD1R/5ca987c5-c660-4785-a67d-a3424cc8ec6e',
'dg.MD1R/44148117-1858-49ef-b30f-d239abfaff80',
'dg.MD1R/9ea205e8-a774-4318-a323-95eadda9bc5c',
'dg.MD1R/09ece36f-a0fa-48e8-8fc2-62110eaae570']
for object_id in object_ids:
    cmd = "gen3 --auth {} --endpoint data.midrc.org drs-pull object {}".format(cred,object_id)
    display(cmd)
    subprocess.run(cmd, shell=True, capture_output=True)
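The loop above builds each command as a single shell string, which may break if the credentials path contains spaces, and it discards the captured output. The following is a minimal sketch of an alternative, assuming the same `gen3 ... drs-pull object` invocation shown above: it builds the command as an argument list and returns the exit status. The helper names `build_drs_pull_cmd` and `pull_object` are hypothetical and not part of the Gen3 SDK.

```python
import subprocess

def build_drs_pull_cmd(cred_path, object_id, endpoint="data.midrc.org"):
    """Return the drs-pull command as an argument list (no shell quoting needed)."""
    return ["gen3", "--auth", cred_path, "--endpoint", endpoint,
            "drs-pull", "object", object_id]

def pull_object(cred_path, object_id):
    """Run drs-pull for one object and return (exit code, captured stderr).

    Raises FileNotFoundError if the gen3 executable is not on the PATH.
    """
    result = subprocess.run(build_drs_pull_cmd(cred_path, object_id),
                            capture_output=True, text=True)
    return result.returncode, result.stderr

# Possible usage with the variables defined above:
# for object_id in object_ids:
#     code, err = pull_object(cred, object_id)
#     if code != 0:
#         print(object_id, "exited with", code, err)
```

A nonzero exit code does not necessarily mean the file is missing, so it is still worth checking the working directory as described above.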
All 10 data objects are now stored under the folder 'COVID-19-NY-SBU'
!ls -l COVID-19-NY-SBU
image_path = 'COVID-19-NY-SBU/A034518/12-31-1900-CT ABD PELVIS(WITH CHEST IMAGES) W IV CON-21869/4.000000-Lung 1.0 CE-04129/1-273.dcm'
image_path
Read the dcm image using the relative file path.
ds = pydicom.dcmread(image_path)
Get the image's pixel array as floating-point values.
new_image = ds.pixel_array.astype(float)
new_image
Scale the pixel values to the range 0-255 and convert the array to 8-bit unsigned integers (uint8).
scaled_image = (np.maximum(new_image, 0) / new_image.max()) * 255.0
scaled_image = np.uint8(scaled_image)
scaled_image
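To make the scaling step concrete, here is a small sketch on a synthetic array rather than a real scan. It follows the same approach as the cell above (clip negatives to zero, divide by the maximum, multiply by 255) and adds a guard for an all-zero image, which would otherwise divide by zero. The function name `scale_to_uint8` and the demo values are illustrative assumptions.

```python
import numpy as np

def scale_to_uint8(pixels):
    """Clip negatives to 0, scale so the maximum maps to 255, and cast to uint8."""
    pixels = np.asarray(pixels, dtype=float)
    clipped = np.maximum(pixels, 0)
    peak = clipped.max()
    if peak == 0:
        # Nothing above zero: return a black image instead of dividing by zero.
        return np.zeros(pixels.shape, dtype=np.uint8)
    return (clipped / peak * 255.0).astype(np.uint8)

# Synthetic 2x2 "image" with one negative value.
demo = np.array([[-100, 0], [500, 1000]])
scaled_demo = scale_to_uint8(demo)
print(scaled_demo)
```

Note that the cast truncates rather than rounds, so a value halfway up the range maps to 127, not 128.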
Use Pillow's Image module to convert the array to an image and display it.
final_image = Image.fromarray(scaled_image)
print(type(final_image))
final_image
Convert images from dcm format to jpeg and png formats and save each converted image in the original image's folder.
def view_dicom_image(image_path):
    # Read the DICOM file and return it as an 8-bit grayscale Pillow image.
    ds = pydicom.dcmread(image_path)
    new_image = ds.pixel_array.astype(float)
    scaled_image = np.uint8((np.maximum(new_image, 0) / new_image.max()) * 255.0)
    final_image = Image.fromarray(scaled_image)
    return final_image

def dcm_to_png(image_path):
    # Save a PNG copy next to the original .dcm file.
    final_image = view_dicom_image(image_path)
    final_image.save(os.path.splitext(image_path)[0] + '.png')

def dcm_to_jpeg(image_path):
    # Save a JPEG copy next to the original .dcm file.
    final_image = view_dicom_image(image_path)
    final_image.save(os.path.splitext(image_path)[0] + '.jpg')
Convert dicom image to png and save.
image_path = 'COVID-19-NY-SBU/A117394/10-08-1900-CT ABD AND PELVIS WITH IV CONT-39755/9.000000-CTA 0.5 CE-40834/1-0163.dcm'
dcm_to_png(image_path)
Convert dicom image to jpg and save.
image_path = 'COVID-19-NY-SBU/A587516/04-22-1901-CT CHEST WO IV CONT-40216/2.000000-Body 5.0-01241/1-16.dcm'
dcm_to_jpeg(image_path)
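The conversions above are applied to one hand-typed path at a time. As a sketch of how every downloaded file could be located instead, the hypothetical helper below walks a folder tree with `os.walk` and collects all paths ending in `.dcm` (matched case-insensitively, which is an assumption about how files might be named).

```python
import os

def find_dcm_files(base_folder):
    """Return the sorted paths of all .dcm files found under base_folder."""
    matches = []
    for root, _dirs, files in os.walk(base_folder):
        for name in files:
            if name.lower().endswith(".dcm"):
                matches.append(os.path.join(root, name))
    return sorted(matches)

# Possible usage with the conversion functions defined above:
# for path in find_dcm_files('COVID-19-NY-SBU'):
#     dcm_to_png(path)
```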
Display a few dicom images.
image_path = 'COVID-19-NY-SBU/A546520/12-30-1900-CT CHEST PULMONARY ANGIO WITH IV CON-13804/11.000000-CTA 3.000 CE-95792/1-119.dcm'
view_dicom_image(image_path)
image_path = 'COVID-19-NY-SBU/A770557/12-19-1900-CT CHEST WO IV CONT-97223/5.000000-Lung 1.0-84269/1-127.dcm'
view_dicom_image(image_path)
image_path = 'COVID-19-NY-SBU/A770557/12-19-1900-CT CHEST WO IV CONT-97223/7.000000-Body 3.000-78395/1-53.dcm'
view_dicom_image(image_path)
The following function will extract the file and patient metadata from the header of each dicom (.dcm) file within a given folder and place the collected metadata into a pandas dataframe.
def extract_metadata(base_folder):
    df = pd.DataFrame()
    file_folders = os.listdir(path = base_folder)
    for folder in file_folders:
        path = base_folder + '/' + folder
        meta = join_tree(path, verbose=2)
        df = pd.concat([df, meta])
    return df
base_folder = 'COVID-19-NY-SBU'
metadata = extract_metadata(base_folder)
metadata
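For a single file, selected header fields can also be read directly from a pydicom dataset without building the full dataframe. The sketch below assumes only that the dataset object offers a dict-style `.get(keyword, default)` method, which pydicom's `Dataset` provides; the helper name `header_summary` and the default keyword list are illustrative choices.

```python
def header_summary(ds, keywords=("PatientID", "PatientSex", "PatientAge",
                                 "BodyPartExamined", "Modality")):
    """Collect selected header values; elements missing from the header map to None."""
    return {keyword: ds.get(keyword, None) for keyword in keywords}

# Possible usage with pydicom (stop_before_pixels skips reading the image data):
# ds = pydicom.dcmread(image_path, stop_before_pixels=True)
# header_summary(ds)
```

Returning None for absent elements avoids an AttributeError on files whose headers are incomplete.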
Included in this metadata are important pieces of file and patient data, such as the body part examined, the patient's sex, the patient's age, etc.
metadata.columns[40:60]
metadata.BodyPartExamined
metadata.PatientSex
metadata.PatientAge
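The DICOM standard stores ages as Age Strings: three digits followed by a unit letter (D, W, M, or Y), for example "045Y". A minimal sketch for converting such values to approximate years follows, assuming the PatientAge column uses that standard format; values in de-identified data may deviate, in which case the hypothetical helper `dicom_age_to_years` returns None. The day, week, and month conversions are approximations.

```python
import re

# Approximate length of each DICOM age unit, expressed in years.
_AGE_UNITS_IN_YEARS = {"D": 1 / 365.25, "W": 7 / 365.25, "M": 1 / 12, "Y": 1.0}

def dicom_age_to_years(age_string):
    """Convert a DICOM Age String such as '045Y' to years, or None if it does not parse."""
    if not isinstance(age_string, str):
        return None  # e.g. NaN for a missing value in a dataframe column
    match = re.fullmatch(r"(\d{3})([DWMY])", age_string.strip())
    if match is None:
        return None
    return int(match.group(1)) * _AGE_UNITS_IN_YEARS[match.group(2)]

# Possible usage with the dataframe built above:
# metadata.PatientAge.map(dicom_age_to_years)
```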