

Institute of Physics Publishing: Physics in Medicine and Biology

Phys. Med. Biol. 50 (2005) 5743–5757

doi:10.1088/0031-9155/50/23/023

Evaluation of a software package for automated quality assessment of contrast detail images—comparison with subjective visual assessment

A Pascoal 1,2, C P Lawinski 2, I Honey 2 and P Blake 2

1 Medical Engineering and Physics, King's College London, Faraday Building, Denmark Hill, London SE5 8RX, UK
2 KCARE, King's Centre for Assessment of Radiological Equipment, King's College Hospital, Faraday Building, Denmark Hill, London SE5 8RX, UK

E-mail: [email protected]

Received 14 June 2005, in final form 13 October 2005
Published 24 November 2005
Online at stacks.iop.org/PMB/50/5743

Abstract

Contrast detail analysis is commonly used to assess the image quality (IQ) associated with diagnostic imaging systems. Applications include routine assessment of equipment performance and optimization studies. Most frequently, the evaluation of contrast detail images involves human observers visually detecting the threshold contrast detail combinations in the image. However, the subjective nature of human perception and the variations in the decision threshold limit the minimum image quality variations that can be detected reliably. Objective methods of assessing image quality, such as automated scoring, have the potential to overcome these limitations. A software package (CDRAD analyser) developed for automated scoring of images produced with the CDRAD test object was evaluated. Its performance in assessing absolute and relative IQ was compared with that of an average observer. Results show that the software does not mimic the absolute performance of the average observer. The software proved more sensitive and was able to detect smaller low-contrast variations. The observer's performance was superior to the software's in the detection of smaller details. Both scoring methods showed frequent agreement in the detection of image quality variations resulting from changes in kVp and detector KERMA, which indicates the potential of the CDRAD analyser software for the assessment of relative IQ.

1. Introduction

The use of digital imaging technologies is becoming very common in imaging departments. The routine assessment and control of image quality (IQ), both technical and clinical, is a fundamental task associated with good practice.


At present, various methods can be used to assess the technical IQ of diagnostic imaging systems (Martin et al 1999). Some are indirect and based on the measurement of purely quantitative, objective parameters related to detector performance (e.g. detective quantum efficiency). The tasks involved in these methods are complex and not always easy to implement on a routine basis in a hospital environment. Additionally, these methods do not consider the effects of scattered radiation and image processing, both fundamental elements of the imaging chain.

An alternative for the assessment of technical IQ is the use of contrast detail test objects, which provide a quantitative measure of IQ in terms of two fundamental parameters: low-contrast and small-detail detectability. Such test-object methods most frequently consider the overall imaging chain, including the human observer. Although they cannot be used to directly predict clinical image quality, owing to the simplicity of the models compared with the complexity of real anatomic structures, they provide useful information on threshold contrast detail detectability and equipment performance. The use of contrast detail images is a practical approach primarily adopted for routine quality control (QC). However, various authors have also reported its extended use in optimization studies (Honey 2004, Hamer et al 2003) and in comparisons of equipment performance (Rong et al 2001, Chotas and Ravin 2001).

One commercially available contrast detail test object is CDRAD (Artinis Medical Systems), which can be used to produce images containing circular details with 15 gradually varying contrast levels. The method originally proposed by the manufacturer for the evaluation of CDRAD images is subjective and based on human perception and decision criteria. It involves several observers individually identifying the just visible details (threshold details) for each detail diameter presented in the image. The final average score provides quantitative information on perceived image quality, combining information on low-contrast and small-detail visibility.

Human decision criteria are a fundamental element to include in the imaging chain when assessing IQ, given their crucial role in the medical diagnosis process. However, their subjective nature is a large contributor to the potential variability in the result. A study of the variations associated with contrast detail analysis based on human perception was reported by Cohen et al (1984). In their study involving four observers, standard errors between 12% and 18% in threshold contrast were found, with the higher variations obtained for the lower contrast levels. The limited human perception of very low contrast levels has been discussed elsewhere (Swensson and Judy 1981). Cohen et al also showed that the decision threshold for the same observer changes (intra-observer variability), reporting variations of up to 12%. This study showed how contrast detail IQ analysis is affected by variations in human perceptibility and decision criteria. In order to keep these variations within acceptable levels, the averaging of scores from several observers is recommended.

The previous discussion highlights that the variability in the scoring poses a fundamental limit to the minimum IQ variations that can be reliably detected using contrast detail analysis and human decision criteria. Objective methods for the assessment of IQ based on measurements of image data (e.g.
signal-to-noise ratio) are not affected by human perception. Consequently, they do not suffer from the associated variations and are potentially more reliable and reproducible (Liu and Shaw 2004). Studies that would potentially benefit from an objective rather than a subjective methodology for the assessment of IQ are those aimed at detecting drifts in equipment performance (as in routine QC) and studies involving the optimization of technical parameters (e.g. kVp) known to affect image quality.

Methods for the automatic quantitative assessment of the IQ of digital images produced with contrast detail test objects have been reported (Kwan et al 2003), but not many have been developed into commercially available tools.


Figure 1. (a) CDRAD contrast-detail test object composed of a Plexiglas plate (26.5 cm × 26.5 cm × 1.0 cm) with holes drilled in it; (b) radiographic image of CDRAD containing 225 details (only a fraction is visible).

The CDRAD analyser is a recently developed (April 2004) software tool dedicated to the automatic quantitative analysis of the quality of digital images produced with the CDRAD test object (Thijssen et al 1998). The detection method is based on measurements of image signal and statistical analysis. The manufacturer claims the tool simplifies and speeds up the process of image scoring.

The primary aim of this study was to evaluate the CDRAD analyser and compare its variability, usability and performance in the assessment of absolute and relative IQ with those of the human observer. A further aim was to investigate the suitability of this tool for the optimization of image quality of digital imaging systems and for comparative studies of equipment performance.

2. Materials and equipment

2.1. Contrast-detail test object (CDRAD)

The CDRAD type 2.0 test object (Instrumentele Dienst, Nijmegen) consists of a square Plexiglas plate containing drilled holes of varying depths (0.3–8.0 mm) and diameters (0.3–8.0 mm), as illustrated in figure 1(a). A radiograph of CDRAD (figure 1(b)) displays the image of the 225 circular holes in the phantom (referred to throughout this paper as details) arranged in a lead-engraved grid with 15 columns and 15 rows. Each row displays 15 details of identical diameter and varying contrast level, resulting from the gradually varying hole depths in the test object. Each column displays 15 details with identical contrast level and varying diameter. The first three rows contain only one detail per square (figure 1(b)), while the remaining 12 rows contain two identical details per square (same contrast level and diameter): one detail is located in the centre of the square and the second in a randomly chosen corner. A four-alternative forced choice (4-AFC) methodology used to score the second (corner) detail allows verification of the true visual detection of the object. CDRAD can be imaged in combination with a polymethyl methacrylate (PMMA) block of appropriate thickness to provide attenuation and scatter, simulating the attenuation of the anatomical region under investigation (Thijssen et al 1998). A sketch of the resulting detail grid is given below.
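For concreteness, the following sketch builds the contrast-detail grid just described: 15 rows of constant diameter, 15 columns of constant depth, and a randomly placed corner detail for the 4-AFC check in rows 4 to 15. The text gives only the range of depths and diameters (0.3–8.0 mm); the geometric spacing assumed below is illustrative, not the manufacturer's specification.

```python
import random
import numpy as np

N = 15
# 15 depth/diameter values spanning 0.3-8.0 mm (assumed geometric spacing).
values_mm = np.geomspace(0.3, 8.0, N)

details = []
for row, diameter in enumerate(values_mm):    # each row: one diameter
    for col, depth in enumerate(values_mm):   # each column: one depth (contrast)
        d = {"row": row, "col": col,
             "diameter_mm": float(diameter), "depth_mm": float(depth)}
        if row >= 3:
            # Rows 4-15 hold a second, identical detail in a random corner,
            # used for the 4-alternative forced choice (4-AFC) check.
            d["corner"] = random.choice(["NW", "NE", "SW", "SE"])
        details.append(d)

assert len(details) == 225  # the 225 details shown in figure 1(b)
```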


Figure 2. (a) Automatic detection of the grid outline from the user-selected locations of the four corners; (b) measurement of the average signal (greyscale value) within the detail and in the surrounding background.

2.2. Imaging systems

Two digital imaging systems were used to produce images of CDRAD. The first was an Agfa Compact CR reader with an MD40 image plate (IP) using a BaSrFBr:Eu phosphor; the IP was exposed in a vertical bucky incorporating a moving anti-scatter grid. The second was a flat panel detector (FPD) system (Vertix FD, Siemens, Forchheim, Germany) consisting of an x-ray tube and generator (Optilix 150/30/50 C, Polydoros LX50) and an indirect-conversion CsI/a-Si FPD mounted on a vertical stand. A focused grid was integrated with the detector.

2.3. CDRAD analyser software

The CDRAD analyser software (version 1.0) was developed for the quantitative analysis of images produced with the CDRAD test object. Detail detection is based on measurements of the signal resulting from the energy absorbed in the image detector within the circular area of the detail and in its surrounding background. The automated measurement involves the determination of the exact position of the matrix containing the 225 details. To achieve this, the software requires input of the focus-to-detector distance used in the image acquisition. It also requires the identification of the positions of the four corners that define the lead grid engraved in the CDRAD phantom (figure 2(a)). This localization can be performed automatically or by the user and is a task of major importance, since it determines the exact location of the region of interest (ROI) in which the signal is measured. The signal in the object (µobj) is taken as the average greyscale value in a specified area within the circular detail, and the signal in the background (µbkg) is obtained within an identical area in the surrounding background, as illustrated in figure 2(b). The software also calculates the standard deviation associated with both measured signals. With the exception of the details in the first three rows of the CDRAD image, the decision on detail detectability based on the signal measured in the central object is complemented by information obtained from the 4-AFC test performed on the second detail, randomly located in one of the four corners.
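A minimal numpy sketch of the ROI measurement illustrated in figure 2(b) follows. The circular mask and the choice of background offset are assumptions made for illustration; the CDRAD analyser's actual ROI geometry is not documented in the text.

```python
import numpy as np

def roi_stats(image, cx, cy, radius):
    """Mean and standard deviation of grey values in a circular ROI."""
    ys, xs = np.indices(image.shape)
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    return image[mask].mean(), image[mask].std(ddof=1)

def detail_signals(image, cx, cy, radius):
    # Signal within the detail (mu_obj) ...
    mu_obj, sd_obj = roi_stats(image, cx, cy, radius)
    # ... and within an identical area in the surrounding background
    # (mu_bkg); the offset of three radii is an arbitrary choice here.
    mu_bkg, sd_bkg = roi_stats(image, cx + 3 * radius, cy, radius)
    return (mu_obj, sd_obj), (mu_bkg, sd_bkg)
```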


The software applies a Welch–Satterthwaite test (Student t-test with Welch correction) to the measured signals to decide whether or not the signal in the detail is equal to the signal in the surrounding background plus an a priori difference of means (APD). If the difference in the two signals proves statistically significant at a specified significance level, the detail is detected; otherwise it is not. The significance level associated with the statistical test is selected by the user and can assume any value between 0 and 0.5; the default value is 1 × 10−8. The APD is defined by the user prior to the image analysis and is set relative to the image bit depth. This parameter was included to allow a valid comparison of automated scores obtained from images stored with different bit depths. The same APD value should be used whenever all the images compared have the same bit depth; otherwise, the relation between the APDs for the images being compared can be derived from equation (1):

APD_image1 = APD_image2 · 2^((bit-depth image1) − (bit-depth image2)).    (1)
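The decision rule can be sketched from the ROI summary statistics using scipy. Whether the tool applies a one- or two-sided test, and how it counts pixels, is not stated, so the two-sided formulation and the pixel-count arguments below are assumptions.

```python
from scipy import stats

def apd_rescaled(apd_image2, bits_image1, bits_image2):
    """Equation (1): relate APD values across images of different bit depth."""
    return apd_image2 * 2 ** (bits_image1 - bits_image2)

def detail_detected(mu_obj, sd_obj, n_obj, mu_bkg, sd_bkg, n_bkg,
                    apd=0.0, alpha=1e-8):
    """Welch-corrected t-test on the two ROI signals.

    Null hypothesis (assumed formulation): the detail signal equals the
    background signal plus the a priori difference of means (APD). The
    detail is scored as detected when the null is rejected at level alpha.
    """
    result = stats.ttest_ind_from_stats(
        mu_obj, sd_obj, n_obj,
        mu_bkg + apd, sd_bkg, n_bkg,
        equal_var=False)  # Welch (Satterthwaite) correction
    return result.pvalue < alpha
```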

Once the grid position has been identified and the settings entered by the user, the image is analysed and the results are displayed. The output includes a diagram with dots representing all details where a significant difference between the signals in the detail and the background was found (figure 3(a)). A second diagram shows the scores after correction using the four nearest neighbours scheme proposed in the CDRAD manual (figure 3(b)). A graph ('contrast detail curve') displaying the threshold corrected scores (i.e. the minimum diameter detected for each hole depth) is also provided. In addition, the software calculates an inverse image quality figure (IQFinv) proposed by the manufacturer (Thijssen et al 1989) using equation (2), where h_i refers to the hole depth of column i and D_i,threshold is the minimum (threshold) diameter detected for hole-depth column i:

IQFinv = 100 / Σ_{i=1}^{15} (h_i · D_i,threshold).    (2)

An html file with an output report is generated; it contains information on the settings selected, the coordinates of the grid localization and orientation, the results of the statistical tests for each detail and all the information contained in the DICOM header of the image. The output diagrams and the 'contrast detail curve' can only be exported as picture files. A direct transcription of equation (2) is sketched below.
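Equation (2) transcribes directly; the helper below assumes the 15 threshold diameters have already been read off the corrected score diagram.

```python
def iqf_inv(hole_depths_mm, threshold_diameters_mm):
    """Inverse image quality figure of equation (2).

    hole_depths_mm: the depth h_i of each of the 15 columns.
    threshold_diameters_mm: the minimum detected diameter D_i,threshold
    in each column. A larger IQFinv indicates better image quality.
    """
    assert len(hole_depths_mm) == len(threshold_diameters_mm) == 15
    return 100.0 / sum(h * d for h, d in
                       zip(hole_depths_mm, threshold_diameters_mm))
```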


Figure 3. Diagrams displayed by the software showing the results of the automated scoring: (a) diagram with dots representing all details where a significant difference between the signals in the detail and the background was detected; (b) diagram with dots representing the corrected scores obtained from the application of the four nearest neighbours correction scheme proposed in the CDRAD manual; and (c) the 'contrast detail curve' and IQFinv.

3. Experimental methods

3.1. Set-up and image acquisition

The CDRAD test object was exposed in the AP view between two blocks of PMMA (4 cm + 5 cm) in contact with the front face of the detector assembly. A focus-to-detector distance of 180 cm was set and the anti-scatter grid provided with each system was used. Images were produced with both the CR and FPD systems at 70, 90 and 125 kVp. The mAs settings were adjusted to obtain a nominal air KERMA (kinetic energy released per unit mass) value of 4 µGy at the detector surface at all kVps. At 90 kVp, images were also acquired aiming at 2 and 8 µGy. For all exposures, three replicate images were produced, as recommended in the CDRAD manual. Additionally, for one specific exposure (125 kVp, 4 µGy with the FPD) a total of 15 replicates were acquired to further investigate intra-sample variability. All images were acquired using a non-clinical acquisition mode and no post-processing was applied. The images were transferred in DICOM format onto a review workstation for automated and visual scoring.

3.2. Automated assessment of IQ

The automated scoring comprised displaying each image in the CDRAD analyser software and manually defining the positions of the four corners delineating the grid. Settings entered by the user included the source-to-detector distance (180 cm) and the 'a priori difference of means', for which the value 0 was selected. All images were scored at five significance levels (1 × 10−3, 1 × 10−5, 1 × 10−8 (default), 1 × 10−11 and 1 × 10−15) representative of the overall range, and the scoring process was repeated twice. The significance level of 0.05, typically adopted as a threshold below which the null hypothesis can be rejected, was also investigated.

3.3. Visual assessment of IQ

All images were scored in digital format on a high-resolution workstation (SMM 21125P Siemens, 1280 × 1600 pixels) by three observers using a protocol adapted from the CDRAD manual. The protocol was restrictive on the magnification level to be used (2×) but allowed the contrast and brightness settings to be adjusted at preference. The scoring consisted of each observer identifying and recording the threshold visible thickness for each detail diameter. Additionally, the observer indicated the position of the corner detail whenever it existed. The observer scores were corrected using the four nearest neighbours method, as proposed in the CDRAD manual; a sketch of one possible form of this correction is given below.
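The paper references, but does not reproduce, the CDRAD manual's four nearest neighbours scheme. The sketch below implements one commonly used reading of it, namely that a positive score is kept only when at least two of the four orthogonal neighbours are also positive; the threshold of two, and the treatment of edge cells as having absent neighbours, are assumptions.

```python
import numpy as np

def nearest_neighbour_correction(scores, min_neighbours=2):
    """Correct a 15 x 15 boolean score matrix (True = detail detected)."""
    scores = np.asarray(scores, dtype=bool)
    padded = np.pad(scores, 1, constant_values=False)  # edges: no neighbour
    neighbours = (padded[:-2, 1:-1].astype(int)   # neighbour above
                  + padded[2:, 1:-1]              # below
                  + padded[1:-1, :-2]             # left
                  + padded[1:-1, 2:])             # right
    # Keep a detection only if enough of its neighbours were also detected.
    return scores & (neighbours >= min_neighbours)
```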


Figure 4. The standard error was adopted as a measure of intra-observer variability and intra-software variability. It was calculated from three repeated scorings of CDRAD images using the visual (observers) and automated (software) methods. (Plot of standard error (%), 0–40%, against hole depth (mm), 0.1–10.0, for Observer A, Observer B and the software.)


Figure 5. The standard error was adopted as a measure of intra-sample variability (software scoring) and was calculated from the 10 different combinations produced for each group of n replicates. The graph shows the variability for both the individual scores and the IQFinv. The variability for the scores showed dependence on the hole depth, decreasing with increasing depth (i.e. increasing contrast). The results presented in the graph correspond to the average value over all hole depths and the maximum value obtained for each group of n replicates averaged.

All images were rescored by two observers in order to investigate intra-observer variability.

4. Results and discussion

4.1. Variability

4.1.1. Intra-software variability versus intra-observer variability. The repeated automated scorings of each image returned exactly the same values for all images (figure 4). The results for the visual scoring were dependent on hole depth, and a maximum intra-observer variability of 20% was obtained for both observers. Larger intra-observer variability was noted for smaller hole depths (i.e. lower contrast). The intra-observer variability, averaged over all depths, was 9% and 10% for observers A and B, respectively. The inter-observer variability was also calculated and was lower than 15%.

4.1.2. Intra-sample variability (software analysis). The results in figure 5 show a clear decrease in the standard error with an increasing number of averaged replicates, indicating that the sensitivity of the automated scoring for reliably detecting variations in IQ depends strongly on the number of replicate images averaged. One way to reproduce this analysis is sketched below.
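This is a sketch of the figure 5 analysis, assuming the 10 combinations per group size are drawn at random from the 15 replicates and that the standard error is expressed relative to the mean; neither detail is spelled out in the text.

```python
import random
import numpy as np

def intra_sample_se(iqf_values, n, n_groups=10, seed=0):
    """Relative standard error of mean IQFinv over groups of n replicates.

    iqf_values: IQFinv score of each replicate image (15 in this study).
    Draws n_groups distinct groups of n replicates (assumes n_groups does
    not exceed the number of possible combinations), averages the score
    within each group, and returns the spread across groups as a
    percentage of the overall mean.
    """
    rng = random.Random(seed)
    groups = set()
    while len(groups) < n_groups:
        groups.add(tuple(sorted(rng.sample(range(len(iqf_values)), n))))
    means = [np.mean([iqf_values[i] for i in g]) for g in groups]
    return 100.0 * np.std(means, ddof=1) / np.mean(means)
```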


The results presented correspond to the variability averaged over all detail depths (scores-avrg) and also to the maximum value detected for each group of n replicates (scores-max). Larger variability was obtained for the individual scores for each hole than for the overall IQFinv. The variability of the individual scores was dependent on hole depth and was smaller for larger depths (i.e. higher-contrast details).

Variations in x-ray fluence in consecutive exposures will result in small variations in the signal obtained for replicate images. However, modern high-frequency x-ray generators show good output repeatability, and therefore any differences in x-ray fluence at the detector are likely to be caused by the statistical fluctuations associated with the random nature of x-ray production. Measurements of signal and noise previously undertaken on various replicate images acquired with the FPD used in this study revealed small variations in image signal and noise properties (