Extracting data from a DICOM file - Wiley Online Library

Extracting data from a DICOM file

William R. Riddle a) and David R. Pickens
Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232-2675

(Received 27 May 2004; revised 27 January 2005; accepted for publication 9 March 2005; published 13 May 2005)

DICOM v3.0 is a vendor-independent standard for digital medical images that describes a file format and network protocol for the exchange of images between computer systems. When simply viewing a DICOM file, it is not necessary for the user to understand the details of the entire DICOM standard. However, understanding parts of the standard is essential when DICOM files are read and processed by a user-generated program. This paper offers an overview of information a user needs to write a program for extracting the image and acquisition parameters from a DICOM file. © 2005 American Association of Physicists in Medicine. [DOI: 10.1118/1.1916183]

Key words: DICOM, file meta information, byte order

I. INTRODUCTION

Digital medical images were initially stored in formats defined by the companies that manufactured the equipment, which often created problems in multivendor imaging departments. In 1985, the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) attempted to address these incompatibility issues with a vendor-independent standard format for medical images. A revised version of this standard, called ACR-NEMA v2.0, was published in 1988. However, the 1988 standard only defined file formats for use in point-to-point connections. In 1993, ACR and NEMA completed another revision, called DICOM v3.0 (Digital Imaging and Communications in Medicine), that could be used in a network environment. The DICOM v3.0 standard, which encompasses both communication protocols and file formats, provides a foundation for intersystem transfer of digital medical images (Ref. 1). A DICOM file contains the image, its size and format, acquisition parameters, equipment description, and patient information.
When simply viewing the image in a DICOM file, there are several software packages that can be used, such as ImageJ (Ref. 2). While it is not necessary for the user to understand the DICOM standard when using one of these packages, understanding parts of the standard is essential when writing software that reads DICOM images for additional processing. Examples are software to calculate liver fat fractions (Ref. 3) or diffusion tensors (Ref. 4) from magnetic resonance images, where the specifics of the image acquisition are important for processing the images.

II. DICOM TRANSFER SYNTAXES

A DICOM file contains a File Meta Information section at the beginning of the file followed by multiple data elements. Each data element contains four fields: a data element tag, a value representation (VR) field, a value length (VL) field, and a value field (Ref. 5). Note that each of these fields contains an even number of bytes. There are three basic transfer syntaxes specified in the DICOM standard: explicit VR little endian, explicit VR big endian, and implicit VR little endian. To decrease the size of files, DICOM also allows for the compression of pixel data [JPEG (Joint Photographic Experts Group) or RLE (run length encoding)] with explicit VR little endian transfer syntax for nonpixel data.

Little endian is a form of byte ordering where multiple-byte binary values are stored with the least significant byte encoded first. With big-endian encoding, multiple-byte binary values are stored with the most significant byte first. Value representation types specified with the DICOM standard are listed in Table I.

A. File meta information

The file meta information is a mandatory header at the beginning of every DICOM file (Ref. 6). After a file preamble of 128 bytes (0–127), byte 128 contains “D,” byte 129 contains “I,” byte 130 contains “C,” and byte 131 contains “M.” Starting at byte 132, the data elements are specified in the explicit VR little endian transfer syntax with a group number of “0002.” One of the data elements (0002,0010) contains the Transfer Syntax UID (unique identifier), which specifies how the data elements following the file meta information are encoded. Table II lists the unique identifiers used to specify the DICOM transfer syntax (Ref. 7).

B. Data elements

The data element tag, which is printed as two four-digit hexadecimal numbers, contains a 2-byte integer group number and a 2-byte integer element number. With the explicit VR transfer syntaxes and value representations of OB, OW, OF, SQ, UT, and UN, each data element contains a 2-byte group number, a 2-byte element number, a 2-byte value representation field, a 2-byte word that is not currently being used, a 4-byte value length (VL) field, and a value field containing VL bytes. With the explicit VR transfer syntaxes and value representations other than OB, OW, OF, SQ, UT, and UN, each data element contains a 2-byte group number, a 2-byte element number, a 2-byte VR field, a 2-byte VL field, and a value field of VL bytes. With the implicit VR little endian transfer syntax, each data element contains a 2-byte group number, a 2-byte element number, a 4-byte VL field, and a value field of VL bytes. In this syntax the VR is not stored in the file. It is defined for each data element tag by the DICOM standard (Ref. 8), so the user must provide a file that defines the VR for each data element tag. Figure 1 illustrates the data formats of the three basic transfer syntaxes.

TABLE I. DICOM value representation types (Ref. 5).

VR name   Meaning                  Data format
AE        Application entity       Character
AS        Age string               Character
AT        Attribute tag            Two 2-byte integers
CS        Code string              Character
DA        Date                     Eight characters
DS        Decimal string           Character
DT        Date time                Character
FL        Floating point single    4-byte floating point
FD        Floating point double    8-byte floating point
IS        Integer string           Character
LO        Long string              Character
LT        Long text                Character
OB        Other byte string        1-byte integers
OF        Other float string       4-byte floating point
OW        Other word string        2-byte integers
PN        Person name              Character
SH        Short string             Character
SL        Signed long              4-byte integer
SQ        Sequence of items        Unknown
SS        Signed short             2-byte integer
ST        Short text               Character
TM        Time                     Character
UI        Unique identifier        Character
UL        Unsigned long            4-byte integer
UN        Unknown                  Unknown
US        Unsigned short           2-byte integer
UT        Unlimited text           Character

FIG. 1. Structures of DICOM data elements (Ref. 5).
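The three element layouts described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `read_element_header` and the constant `LONG_VRS` are hypothetical, and the set of VRs that use the 4-byte length form (OB, OW, OF, SQ, UT, UN) follows one reading of PS 3.5 and should be checked against the edition in use.

```python
import struct

# VRs whose explicit VR header has 2 unused bytes and a 4-byte length
# (assumption based on PS 3.5; verify against the edition in use).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_element_header(buf, offset, explicit_vr=True, little_endian=True):
    """Return (group, element, vr, value_length, value_offset)."""
    e = "<" if little_endian else ">"
    group, element = struct.unpack_from(e + "HH", buf, offset)
    if not explicit_vr:
        # Implicit VR: 4-byte tag followed by a 4-byte value length.
        (vl,) = struct.unpack_from(e + "I", buf, offset + 4)
        return group, element, None, vl, offset + 8
    vr = bytes(buf[offset + 4:offset + 6])
    if vr in LONG_VRS:
        # 2-byte VR, 2 unused bytes, 4-byte value length.
        (vl,) = struct.unpack_from(e + "I", buf, offset + 8)
        return group, element, vr.decode("ascii"), vl, offset + 12
    # 2-byte VR followed by a 2-byte value length.
    (vl,) = struct.unpack_from(e + "H", buf, offset + 6)
    return group, element, vr.decode("ascii"), vl, offset + 8


# Rows (0028,0010), VR US, length 2, explicit VR little endian.
sample = bytes.fromhex("28001000" "5553" "0200" "0001")
print(read_element_header(sample, 0))  # prints (40, 16, 'US', 2, 8)
```

The returned `value_offset` plus `value_length` gives the offset of the next data element, which is what allows a reader to step through the file.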

C. Value field contents

The value field contains single-byte ASCII characters with the AE, AS, CS, DA, DS, DT, IS, LO, LT, PN, SH, ST, TM, UI, and UT data types; single-byte integers with the OB data type; 2-byte integers with the AT, OW, SS, and US data types; 4-byte integers with the SL and UL data types; 4-byte floating point numbers with the FL and OF data types; and 8-byte floating point numbers with the FD data type. The UN data type contains a string of bytes where the encoding of the contents is unknown (Ref. 9).
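One way to turn this mapping into code is a small table from VR to a `struct` format character. The sketch below is illustrative only: `BINARY_VRS` and `decode_value` are hypothetical names, and the stripping of a trailing space or NUL from character values assumes the usual padding to an even length.

```python
import struct

# struct format character and item size for the binary VRs listed above.
BINARY_VRS = {
    "OB": ("B", 1),
    "AT": ("H", 2), "OW": ("H", 2), "SS": ("h", 2), "US": ("H", 2),
    "SL": ("i", 4), "UL": ("I", 4),
    "FL": ("f", 4), "OF": ("f", 4),
    "FD": ("d", 8),
}


def decode_value(vr, raw, little_endian=True):
    """Decode a value field; UN and SQ values are returned as raw bytes."""
    if vr in BINARY_VRS:
        code, size = BINARY_VRS[vr]
        count = len(raw) // size
        e = "<" if little_endian else ">"
        return list(struct.unpack(e + str(count) + code, raw[:count * size]))
    if vr in ("UN", "SQ"):
        return bytes(raw)
    # Character VRs: drop the space or NUL used to pad to an even length.
    return bytes(raw).decode("ascii", errors="replace").rstrip(" \x00")


print(decode_value("US", b"\x00\x01"))  # prints [256]
print(decode_value("DS", b"2.5 "))      # prints 2.5
```

Binary values are returned as lists because a value field may hold more than one item, for example the two 2-byte integers of an AT value.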

TABLE II. DICOM unique identifiers (UID) for transfer syntax (Ref. 7).

Value                      Name
1.2.840.10008.1.2          Implicit VR little endian
1.2.840.10008.1.2.1        Explicit VR little endian
1.2.840.10008.1.2.1.99     Deflated explicit VR little endian
1.2.840.10008.1.2.2        Explicit VR big endian
1.2.840.10008.1.2.4.50     JPEG baseline (process 1)
1.2.840.10008.1.2.4.51     JPEG extended (processes 2 and 4)
1.2.840.10008.1.2.4.57     JPEG lossless, nonhierarchical (process 14)
1.2.840.10008.1.2.4.70     JPEG lossless, nonhierarchical [process 14 (selection value 1)]
1.2.840.10008.1.2.4.80     JPEG-LS lossless
1.2.840.10008.1.2.4.81     JPEG-LS lossy (near-lossless)
1.2.840.10008.1.2.4.90     JPEG 2000 (lossless only)
1.2.840.10008.1.2.4.91     JPEG 2000
1.2.840.10008.1.2.5        RLE lossless
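A reader program typically maps the Transfer Syntax UID to the two properties it needs for decoding: whether the VR is explicit and which byte order is used. The following sketch does this for the UIDs of Table II. The names `BASIC_SYNTAXES` and `describe_transfer_syntax` are hypothetical, and the deflated syntax is deliberately rejected because this sketch does not decompress the data set.

```python
# Basic transfer syntaxes from Table II: UID -> (explicit VR?, little endian?).
BASIC_SYNTAXES = {
    "1.2.840.10008.1.2": (False, True),    # implicit VR little endian
    "1.2.840.10008.1.2.1": (True, True),   # explicit VR little endian
    "1.2.840.10008.1.2.2": (True, False),  # explicit VR big endian
}


def describe_transfer_syntax(uid):
    """Return (explicit_vr, little_endian, pixel_data_compressed)."""
    uid = uid.rstrip(" \x00")  # UI values may carry a padding byte
    if uid in BASIC_SYNTAXES:
        explicit, little = BASIC_SYNTAXES[uid]
        return explicit, little, False
    if uid.startswith("1.2.840.10008.1.2.4.") or uid == "1.2.840.10008.1.2.5":
        # JPEG family or RLE: compressed pixel data, with nonpixel data in
        # explicit VR little endian as described in Sec. II.
        return True, True, True
    raise ValueError("unsupported transfer syntax: " + uid)


print(describe_transfer_syntax("1.2.840.10008.1.2.2"))  # prints (True, False, False)
```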

Medical Physics, Vol. 32, No. 6, June 2005

FIG. 2. Block diagram for decoding file meta information on a big-endian computer after reading the file into a byte array. Numbers inside parentheses indicate the offset in the byte array. ST is a pointer to the offset in the byte array. GL0002 is the group length of the “0002” group. SW4 is a routine that reverses the byte order of a four byte word before converting it to an integer.
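The decoding of the file meta information can be sketched as follows. This is a loose Python rendering of the steps described in the text, not a transcription of Fig. 2: the function name `read_file_meta` is hypothetical, the `"<"` prefix of `struct` takes the place of the SW4 routine so the code does not depend on the host byte order, and the set of VRs with a 4-byte length is an assumption based on PS 3.5.

```python
import struct

# VRs that use the 4-byte value length form (assumption based on PS 3.5).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_file_meta(buf):
    """Return (transfer_syntax_uid, offset of the first element after group 0002)."""
    if len(buf) < 144 or buf[128:132] != b"DICM":
        raise ValueError("not a DICOM file: 'DICM' missing at bytes 128-131")
    # Group length (0002,0000): tag, VR "UL", 2-byte length, 4-byte value.
    group, element = struct.unpack_from("<HH", buf, 132)
    if (group, element) != (0x0002, 0x0000) or buf[136:138] != b"UL":
        raise ValueError("group length (0002,0000) not found at byte 132")
    (gl0002,) = struct.unpack_from("<I", buf, 140)
    end = 144 + gl0002  # the group length counts the bytes after its own value
    st = 144
    transfer_syntax = None
    while st < end:
        tag = struct.unpack_from("<HH", buf, st)
        vr = bytes(buf[st + 4:st + 6])
        if vr in LONG_VRS:
            (vl,) = struct.unpack_from("<I", buf, st + 8)
            st += 12
        else:
            (vl,) = struct.unpack_from("<H", buf, st + 6)
            st += 8
        if tag == (0x0002, 0x0010):
            text = bytes(buf[st:st + vl]).decode("ascii")
            transfer_syntax = text.rstrip(" \x00")
        st += vl
    return transfer_syntax, end
```

A caller would read the file with `open(path, "rb").read()`, pass the bytes to `read_file_meta`, and then decode the remaining elements from the returned offset using the returned transfer syntax.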


FIG. 3. Block diagram for decoding explicit VR little endian data elements on a big-endian computer after reading the file into a byte array. Numbers inside parentheses indicate the offset in the byte array. ST is a pointer to the offset in the byte array. SW2 is a routine that reverses the byte order of a two-byte word before converting it to an integer. SW4 is a routine that reverses the byte order of a four-byte word before converting it to an integer.

FIG. 4. Block diagram for decoding explicit VR big endian data elements on a big-endian computer after reading the file into a byte array. Numbers inside parentheses indicate the offset in the byte array. ST is a pointer to the offset in the byte array. S2 is a routine that reads a two-byte integer. S4 is a routine that reads a four-byte integer. This decoding process is identical to the process shown in Fig. 3, except that there is no byte order reversal required.
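The helper routines named in the captions of Figs. 3 and 4 are small. A possible Python version is shown below, written with an explicit byte reversal to mirror the captions. The lowercase function names are illustrative, and the routines return unsigned integers, which is an assumption since the captions do not state signedness. Because `int.from_bytes` takes the byte order as an argument, the result does not depend on the computer running the code.

```python
def sw2(buf, st):
    """Reverse the two bytes at offset st, then read them as an unsigned integer."""
    return int.from_bytes(bytes(buf[st:st + 2])[::-1], "big")


def sw4(buf, st):
    """Reverse the four bytes at offset st, then read them as an unsigned integer."""
    return int.from_bytes(bytes(buf[st:st + 4])[::-1], "big")


def s2(buf, st):
    """Read two bytes at offset st as a big-endian unsigned integer (no reversal)."""
    return int.from_bytes(bytes(buf[st:st + 2]), "big")


def s4(buf, st):
    """Read four bytes at offset st as a big-endian unsigned integer (no reversal)."""
    return int.from_bytes(bytes(buf[st:st + 4]), "big")


# Element number 0010 is stored as bytes 10 00 in little endian
# and as bytes 00 10 in big endian.
print(hex(sw2(bytes.fromhex("1000"), 0)), hex(s2(bytes.fromhex("0010"), 0)))
# prints 0x10 0x10
```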

D. SQ data type

With the SQ data type, each item is a nested DICOM dataset. The value field consists of a sequence of zero or more items, each containing a set of data elements. A value length of −1 (hex FFFFFFFF) indicates an undefined value length. There can be three different structures of an SQ data element: (1) explicit value length encapsulating items of explicit value length; (2) undefined value length encapsulating items of explicit value length; and (3) undefined value length encapsulating items with both explicit and undefined value length. Examples of these three types of SQ data elements can be found in Table 7.5-1, Table 7.5-2, and Table 7.5-5 of the DICOM standard (Ref. 10).

When the SQ data element has an explicit value length encapsulating items of explicit value length, each item in the sequence starts with an item tag (FFFE,E000). When the SQ data element has an undefined value length encapsulating items of explicit value length, each item in the sequence starts with the item tag (FFFE,E000) and the end of the SQ data element is indicated by the sequence delimitation tag (FFFE,E0DD). When the SQ data element has an undefined value length encapsulating items with both explicit and undefined value length, each item in the sequence starts with the item tag (FFFE,E000), the end of an item with undefined length is indicated by the item delimitation tag (FFFE,E00D), and the end of the SQ data element is indicated by the sequence delimitation tag (FFFE,E0DD). The item tag, item delimitation tag, and sequence delimitation tag are in the implicit VR little endian transfer syntax. The value length of (FFFE,E000) may be positive, 0, or −1. The value length of (FFFE,E00D) and (FFFE,E0DD) is always zero.
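The three SQ structures can be handled by one routine that locates the byte range of each item. The sketch below is a simplified illustration under a stated assumption: the elements nested inside the items are taken to be implicit VR little endian, so every header is a 4-byte tag followed by a 4-byte length. The names `read_items` and `skip_dataset` are hypothetical.

```python
import struct

ITEM = (0xFFFE, 0xE000)
ITEM_DELIM = (0xFFFE, 0xE00D)
SEQ_DELIM = (0xFFFE, 0xE0DD)
UNDEFINED = 0xFFFFFFFF  # value length of -1


def _tag_and_length(buf, pos):
    group, element, vl = struct.unpack_from("<HHI", buf, pos)
    return (group, element), vl, pos + 8


def skip_dataset(buf, pos):
    """Step over elements until the item delimitation tag; return the offset after it."""
    while True:
        tag, vl, pos = _tag_and_length(buf, pos)
        if tag == ITEM_DELIM:
            return pos
        if vl == UNDEFINED:
            # A nested sequence of undefined length: recurse to find its end.
            _, pos = read_items(buf, pos, vl)
        else:
            pos += vl


def read_items(buf, pos, vl):
    """Return ([(item_start, item_end), ...], offset after the SQ value field)."""
    items = []
    end = None if vl == UNDEFINED else pos + vl
    while end is None or pos < end:
        tag, item_vl, pos = _tag_and_length(buf, pos)
        if tag == SEQ_DELIM:
            break
        if tag != ITEM:
            raise ValueError("expected item tag at offset %d" % (pos - 8))
        if item_vl == UNDEFINED:
            after = skip_dataset(buf, pos)
            items.append((pos, after - 8))  # exclude the 8-byte delimiter
            pos = after
        else:
            items.append((pos, pos + item_vl))
            pos += item_vl
    return items, pos
```

Here `pos` is the offset of the first byte of the SQ value field and `vl` is the value length read from the SQ data element header. Each returned pair brackets one nested dataset, which can then be decoded with the same element reader used for the top level.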

E. Byte order

The byte order of the group number, the element number, the value length, and the value fields with data types of AT, FD, FL, OF, OW, SL, SS, UL, and US must be reversed when reading little-endian encoding on a big-endian computer or when reading big-endian encoding on a little-endian computer. Little-endian encoding is used in computers manufactured by Digital Equipment Corporation and Intel. Big-endian encoding is used in computers manufactured by Hewlett Packard (PA-RISC), IBM (RS/6000), Motorola, Silicon Graphics, and Sun Microsystems.

III. READING A DICOM FILE

The DICOM file can be evaluated by reading it into a byte array with a binary read, then using a pointer to index through the byte array. The first step is to evaluate the file meta information to ensure that the file is a DICOM file and to determine the transfer syntax for the data elements after the file meta information. Since the data elements are located sequentially and have different lengths, each data element tag and value length must be decoded before the next element can be decoded.

FIG. 5. Block diagram for decoding implicit VR little endian data elements on a big-endian computer after reading the file into a byte array. Numbers inside parentheses indicate the offset in the byte array. ST is a pointer to the offset in the byte array. SW2 is a routine that reverses the byte order of a two-byte word before converting it to an integer. SW4 is a routine that reverses the byte order of a four-byte word before converting it to an integer.
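The sequential walk through the byte array can be written as a generator that advances a pointer by each decoded header and value length. The sketch below is a minimal illustration with hypothetical names (`iter_elements`, `LONG_VRS`). It stops at the first element of undefined length rather than handling sequence delimiters, and its set of VRs with a 4-byte length is an assumption based on PS 3.5.

```python
import struct

# VRs that use the 4-byte value length form (assumption based on PS 3.5).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}
UNDEFINED = 0xFFFFFFFF


def iter_elements(buf, st, explicit_vr=True, little_endian=True):
    """Yield ((group, element), vr, value_bytes) starting at offset st."""
    e = "<" if little_endian else ">"
    while st + 8 <= len(buf):
        tag = struct.unpack_from(e + "HH", buf, st)
        vr = None
        if not explicit_vr:
            (vl,) = struct.unpack_from(e + "I", buf, st + 4)
            st += 8
        else:
            raw_vr = bytes(buf[st + 4:st + 6])
            if raw_vr in LONG_VRS:
                (vl,) = struct.unpack_from(e + "I", buf, st + 8)
                st += 12
            else:
                (vl,) = struct.unpack_from(e + "H", buf, st + 6)
                st += 8
            vr = raw_vr.decode("ascii", errors="replace")
        if vl == UNDEFINED:
            # Undefined length needs the delimiter handling of Sec. II D;
            # this sketch simply stops here.
            return
        yield tag, vr, bytes(buf[st:st + vl])
        st += vl  # the next element starts right after this value


# Two explicit VR little endian elements: Rows = 256 and Columns = 128.
data = (struct.pack("<HH2sHH", 0x0028, 0x0010, b"US", 2, 256)
        + struct.pack("<HH2sHH", 0x0028, 0x0011, b"US", 2, 128))
for tag, vr, value in iter_elements(data, 0):
    print("(%04X,%04X)" % tag, vr, int.from_bytes(value, "little"))
```

The loop prints one line per element, and a program that needs only selected tags can compare `tag` against its list and ignore the rest.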

TABLE III. Selected DICOM tags for liver fat fractions.

Tag            VR    Name
(0002,0000)    UL    Group length
(0002,0010)    UI    Transfer syntax UID
(0008,0020)    DA    Study date
(0008,0030)    TM    Study time
(0010,0010)    PN    Patient’s name
(0010,0020)    LO    Patient ID
(0018,0020)    CS    Scanning sequence
(0018,0050)    DS    Slice thickness
(0018,0080)    DS    Repetition time
(0018,0081)    DS    Echo time
(0018,0083)    DS    Number of averages
(0018,0088)    DS    Spacing between slices
(0018,0089)    IS    Number of phase encoding steps
(0018,1310)    US    Acquisition matrix
(0018,1314)    DS    Flip angle
(0020,0032)    DS    Image position (patient)
(0028,0010)    US    Rows
(0028,0011)    US    Columns
(0028,0030)    DS    Pixel spacing
(7FE0,0010)    OW    Pixel data
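A table such as Table III can double as the tag-to-VR dictionary that an implicit VR reader needs, and as the list of parameters to compare between two images. The sketch below shows one hypothetical arrangement. The names `TABLE_III`, `MUST_MATCH`, `lookup_vr`, and `mismatched_parameters` are illustrative, only a subset of the table is included, and the choice of which parameters must agree is an assumption drawn from the liver fat fraction example in the text.

```python
# Tag -> (VR, name) for a subset of Table III. A complete implicit VR reader
# would load the full data dictionary instead.
TABLE_III = {
    (0x0018, 0x0050): ("DS", "Slice thickness"),
    (0x0018, 0x0080): ("DS", "Repetition time"),
    (0x0018, 0x0081): ("DS", "Echo time"),
    (0x0018, 0x1314): ("DS", "Flip angle"),
    (0x0028, 0x0010): ("US", "Rows"),
    (0x0028, 0x0011): ("US", "Columns"),
}

# Parameters that should agree between the two images (assumed list).
MUST_MATCH = ["Rows", "Columns", "Repetition time", "Slice thickness", "Flip angle"]


def lookup_vr(tag):
    """Return the VR for a tag, or 'UN' when the tag is not in the table."""
    return TABLE_III.get(tag, ("UN", "Unknown"))[0]


def mismatched_parameters(image_a, image_b):
    """Return the names of parameters that differ between two decoded images.

    image_a and image_b are dicts that map a parameter name to its decoded value.
    """
    return [name for name in MUST_MATCH if image_a.get(name) != image_b.get(name)]


first = {"Rows": 256, "Columns": 256, "Repetition time": "150",
         "Slice thickness": "8", "Flip angle": "70", "Echo time": "2.1"}
second = dict(first, **{"Echo time": "4.2", "Flip angle": "60"})
print(mismatched_parameters(first, second))  # prints ['Flip angle']
```

The sample values in `first` and `second` are made up for the demonstration and do not come from a real acquisition.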


The data elements in the file meta information start at byte 132 with the group length of the file meta information (0002,0000). Figure 2 shows a block diagram for decoding file meta information on a big-endian computer. The encoding of the data elements after the file meta information is specified with the transfer syntax UID. Figure 3 shows a block diagram for decoding explicit VR little endian data elements on a big-endian computer, Fig. 4 shows a block diagram for decoding explicit VR big endian data elements on a big-endian computer, and Fig. 5 shows a block diagram for decoding implicit VR little endian data elements on a big-endian computer.

The value fields of user-selected data element tags are evaluated and the information saved for further processing. For example, when evaluating liver fat fractions with two MR images, a user’s program must determine the number of rows and columns in the pixel data. The program can also extract other DICOM data elements that identify the patient, make sure that the selected images have the appropriate echo times, and determine that the selected images have the same repetition time, slice thickness, flip angle, and slice location (see Table III).

To test the validity of user-generated programs, DICOM files with implicit VR little endian, explicit VR little endian, and explicit VR big endian encoding can be obtained from the website http://www.barre.nom.fr/medical/samples/ (accessed on 22 September 2004). Examples containing the SQ data type and JPEG compression can be obtained at the website http://www.leadtools.com/SDK/Medical/DICOM/ltdc19.htm (accessed on 22 September 2004).

IV. CONCLUSIONS

Understanding the coding scheme and format of DICOM images will allow experienced programmers to decode the files in order to extract the substantial amount of information recorded in the data elements. The correct extraction of needed parameters is important for many post-collection tasks.
This paper has presented a summary view of the DICOM standard so that someone new to the standard can quickly become familiar with the basics of the DICOM header.

a) Corresponding author: William R. Riddle, Ph.D., R-1311, MCN, Vanderbilt University Medical Center, Nashville, Tennessee 37232-2675. Telephone: (615)322-2432; fax: (615)322-3764; electronic mail: [email protected]

1. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM),” Rosslyn, VA, PS 3.1-2003–3.16-2003, 2003. Website at http://medical.nema.org/dicom/2003.html. Accessed on 22 September 2004.

2. Image Processing and Analysis in Java; website at http://rsb.info.nih.gov/ij/

3. M. H. Fishbein, K. G. Gardner, C. J. Potter, P. Schmalbrock, and M. A. Smith, “Introduction of fast MR imaging in the assessment of hepatic steatosis,” Magn. Reson. Imaging 15, 287–293 (1997).

4. P. J. Basser, J. Mattiello, and D. LeBihan, “MR diffusion tensor spectroscopy and imaging,” Biophys. J. 66, 259–267 (1994).

5. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 5: Data structures and encoding,” Rosslyn, VA, PS 3.5-2003, pp. 22–28, 2003. Website: http://medical.nema.org/dicom/2003/03_05PU.PDF. Accessed on 22 September 2001.

6. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 10: Media storage and file format for media information,” Rosslyn, VA, PS 3.10-2003, pp. 21–22, 2003. Website: http://medical.nema.org/dicom/2003/03_10PU.PDF. Accessed on 22 September 2004.

7. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 6: Data dictionary,” Rosslyn, VA, PS 3.6-2003, pp. 77–78, 2003. Website: http://medical.nema.org/dicom/2003/03_06PU.PDF. Accessed on 22 September 2004.

8. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 6: Data dictionary,” Rosslyn, VA, PS 3.6-2003, pp. 9–75, 2003. Website: http://medical.nema.org/dicom/2003/03_06PU.PDF. Accessed on 22 September 2004.

9. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 5: Data structures and encoding,” Rosslyn, VA, PS 3.5-2003, p. 29, 2003. Website: http://medical.nema.org/dicom/2003/03_05PU.PDF. Accessed on 22 September 2004.

10. National Electrical Manufacturers Association (NEMA), “Digital imaging and communications in medicine (DICOM), Part 5: Data structures and encoding,” Rosslyn, VA, PS 3.5-2003, pp. 39–40, 2003. Website: http://medical.nema.org/dicom/2003/03_05PU.PDF. Accessed on 22 September 2004.
