1995, The British Journal of Radiology, 68, 165-174

Radiotherapy portal verification: an observer study

1R BISSETT, MD, FRCPC, 2S BOYKO, RTT, 2K LESZCZYNSKI, PhD, 2S COSBY, MSc, 2P DUNSCOMBE, PhD and 3N LIGHTFOOT, PhD

Departments of 1Radiation Oncology and 2Physics, and 3Epidemiology Research Unit, Northeastern Ontario Regional Cancer Centre, 41 Ramsey Lake Road, Sudbury, Ontario, Canada P3E 5J1

Abstract

In many radiotherapy facilities radiotherapy portal verification is currently a subjective process based on the visual comparison of a treatment or portal image with a prescription or simulation image. The reliability of this process is unknown. We describe here a study in which 16 observers (oncologists, physicists and therapists) independently evaluated the geometric accuracy of 530 treatment fields on 45 patients. The treatment images were acquired by the BEAMVIEW™ on-line portal imaging system (Siemens Medical Laboratories, Concord, CA, USA). Illustrative examples of the large variation in observers' assessments of the same field are given. The kappa statistic is used to evaluate the degree of agreement between observers and between on-line (at the treatment unit) and off-line (in a quiet viewing room) assessments. The best interobserver agreement was between the four oncologists contributing to the study although this level of agreement was rated only as "fair". Comparison of on-line and off-line decisions made by therapists exhibited "poor" agreement. This study has provided statistical confirmation of the suspicions of many workers in the field of radiotherapy portal verification, viz that the subjective evaluation of field accuracy is unreliable. However, the degree of unreliability is surprisingly large. The inconsistencies between observers documented in this study need to be clearly acknowledged in the development of protocols for the clinical use of on-line portal imaging systems. Acceptable reliability in radiotherapy portal verification will only be achieved when subjective decision making is eliminated.

That on-line portal imaging (OPI) has the potential to result in real benefits to patients receiving radiotherapy must now be beyond dispute. Not only is it, in principle, possible to irradiate routinely the prescribed target volume with greater accuracy than was previously achievable but also, with greater confidence in the geometric precision of radiotherapy, it may be feasible to reduce the margin that is the difference between the tumour and target volumes as specified by the ICRU [1]. The routine practical realization of these potential advantages to the patient, however, is limited by the operator's ability to verify a treatment portal reliably and accurately in real time or to identify any remedial action necessary. Progress is being made in the area of automated registration of image pairs [2-5] and if these efforts are successful this limitation will ultimately be removed. However, at the present time almost all OPI systems in clinical use provide information for a rather subjective decision making process by the operator or oncologist (see for example [6, 7]). In this report, we describe a study designed to identify the reliability of the subjective observer process and some potentially dependent factors.

Received 2 February 1994, in revised form 16 May 1994, and accepted 13 June 1994. Presented in part at the 34th Annual Meeting of ASTRO, San Diego, 1992.

Methods and materials

The on-line portal images used in this study were acquired during routine patient treatment with the 6 MV photon beam of a Mevatron MX-2 linear accelerator (Siemens Medical Laboratories, Concord, CA, USA). This accelerator was equipped with a BEAMVIEW™ on-line portal imaging system [8] and used to treat a variety of sites. The geometrical component of a treatment prescription was generally contained in a simulation radiograph produced on a Mevasim S therapy simulator (Siemens Medical Laboratories). Part of our clinical protocol required the therapist operating the unit to designate each BEAMVIEW™ imaged field as "acceptable" or "unacceptable" by comparison with the reference prescription image (simulator film) mounted on a viewing box adjacent to the BEAMVIEW™ monitor. As there are no generally accepted criteria to distinguish an acceptable portal from an unacceptable portal, this decision had to be left to the discretion of the therapist. The comparison was strictly visual with no measurements having been taken from the images.

The major part of the study reported here was carried out off-line and in a non-clinical area. It is based on a set of 530 BEAMVIEW™ images of treatments of 45 different patients with the following distribution: 20 breast, 10 lung, four head and neck, four brain, two pelvis, two mantle, two abdomen and one extremity. One of the patients had not been simulated prior to treatment and the first treatment portal film, reviewed and approved by the oncologist, served as the reference prescription image in this instance. For the remaining 44 patients conventional simulation films were used as the reference.

Digital portal images, without any form of off-line post-processing, were compared visually with the reference images mounted on an adjacent light box and the accuracy of the treated field was evaluated independently by each of the 16 observers participating in the study. The observers were four radiation oncologists, eight radiation therapists (including two dosimetrists working in the area of treatment planning), and four medical physicists. Only one of the oncologists and four of the therapists had prior experience in viewing and evaluating on-line portal images. The evaluation was performed using an off-line BEAMVIEW™ computer and software system together with a standard lightbox (for prescription radiographs). These were located in a viewing room, away from the treatment areas, with controlled lighting conditions. For optimal observer performance, each viewing session was kept to 1 h in length and the entire image data set was reviewed over a number of sessions. There was no time limit for the evaluation of these images. Images for a given patient could be viewed in any order during a session and at multiple times when desired. At the initial session each observer was given standardized instructions, was invited to become familiar with the brightness and contrast controls on the BEAMVIEW™ monitor and practised evaluating field accuracy for two preselected patients. The results of these practice runs were not included in the analysis.

The observers assessed the degree of conformity between the treated and prescribed fields for each portal image by rating on a scale comprising five categories, viz "definitely acceptable", "probably acceptable", "borderline" (or "cannot decide"), "probably unacceptable" and "definitely unacceptable". If a response fell into the last two categories, observers were asked to describe in their own words why the treated field was unacceptable. This scale was more gradual than the two category scale ("acceptable" or "unacceptable") used for on-line evaluations, so that more subtle differences, observed in off-line evaluations without the pressure of ongoing treatments, could be recorded and analysed. No guidance was offered as to the types or sizes of errors to be included in the various categories as there are no generally accepted criteria.

The analysis of agreement between individual observers and within observer groups was carried out using kappa statistics [9, 10]. For measuring interobserver agreement the kappa value (\kappa) is calculated as the observed proportion of agreements in ratings by the two observers, p_o, corrected for the expected proportion of agreements by chance, p_e. The kappa has a value of 1 when the agreement is perfect:

\kappa = \frac{p_o - p_e}{1 - p_e}    (1)

In general, kappa values of 0.8-1 indicate very good agreement, 0.6-0.8 good, 0.4-0.6 moderate, 0.2-0.4 fair, and below 0.2 poor agreement [9]. For measuring overall agreements within observer groups and portal image categories, somewhat modified definitions of kappa were used [10]. Detailed formulae for kappa calculations are given in Appendix 1.

Results

Before proceeding to the presentation of kappa results describing agreement between the 16 observers, a few illuminating examples of evaluations of individual treatments will be provided. The examples illustrate the extremes in interobserver agreement as well as in acceptability of portals. The portal image of a large mantle field together with the distribution of observers' assessments of field acceptability is shown in Figure 1. In situations where the imager is too small to capture the whole radiation field, complete evaluation is not possible and this leads to the range of responses depicted in Figure 1b. However, some important details of the field, such as the lung shields in this particular case, may be verifiable even with a restricted field of view.

The digital portal image of a lung field, in which the blocks intended to shield the left hilum and upper spinal cord are placed on the wrong side of the field, appears in Figure 2a. Such an error would be classified as "gross" by most, if not all, workers in the field. Figure 2b is a histogram of observers' evaluations of the radiation field. One observer failed to identify such an obvious error.

On one occasion a conventional portal film was exposed simultaneously with the acquisition of a BEAMVIEW™ image. This was accomplished by placing a conventional portal film in a film cassette holder mounted on the BEAMVIEW™ detector, allowing acquisition of the on-line image while the film was being exposed. The portal film image was used as a reference for the BEAMVIEW™ image. In Figure 3, the digital image and the observer responses are shown. Again one observer had made a completely inaccurate judgement of this field. The distribution of the remaining observers' evaluations between "probably" and "definitely acceptable" is considered to reflect a reluctance to make a strong commitment by some participants. This reluctance was a feature that was consistently apparent throughout the study. Some observers had a clear preference for selecting one of the two "definite" categories while others produced responses clustered around "borderline".

In order to compare the responses between observer pairs, tables of joint frequencies of classification were formed from the numbers of portal images classified to one category by the first observer in a pair and to another category by the second observer. An example is given in Table I, which shows the comparison between the responses obtained from two oncologist observers. These two observers showed the best degree of agreement, with the calculated κ equal to 0.41 ± 0.03. The complete analysis of agreement between pairs of the 16 observers is shown in Figure 4.
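As a concrete illustration of how a joint frequency table like Table I and a linearly weighted kappa can be obtained, the sketch below cross-tabulates two observers' ratings in Python. The rating lists are hypothetical placeholders and the use of pandas and scikit-learn is an assumption for illustration only, not the software used in the study.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

CATEGORIES = ["definitely unacceptable", "probably unacceptable", "borderline",
              "probably acceptable", "definitely acceptable"]

# Hypothetical ratings of the same five portal images by two observers.
obs1 = ["definitely acceptable", "probably acceptable", "borderline",
        "definitely acceptable", "probably unacceptable"]
obs2 = ["definitely acceptable", "definitely acceptable", "probably acceptable",
        "definitely acceptable", "borderline"]

# Joint frequency table (rows: observer 1, columns: observer 2), as in Table I.
print(pd.crosstab(pd.Series(obs1, name="Observer 1"),
                  pd.Series(obs2, name="Observer 2")))

# Linearly weighted kappa, treating the five categories as ordered.
print(cohen_kappa_score(obs1, obs2, labels=CATEGORIES, weights="linear"))
```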


Figure 1. (a) The BEAMVIEW™ image of a mantle field together with (b) the distribution of observers' assessments of its acceptability. In this and subsequent histograms the following abbreviations apply: D.U., definitely unacceptable; P.U., probably unacceptable; B., borderline; P.A., probably acceptable; and D.A., definitely acceptable.

Figure 2. (a) The BEAMVIEW™ image of a lung field with the hilar and spinal cord shielding placed on the wrong side together with (b) the distribution of observers' assessments of its acceptability.


Figure 3. (a) The BEAMVIEW™ image of a whole cranial field together with (b) the distribution of observers' assessments of its acceptability.

Table I. Comparison of assessments of 530 on-line portal images between two oncologist observers

                                            Oncologist 2
Oncologist 1               Definitely     Probably       Borderline   Probably     Definitely
                           unacceptable   unacceptable                acceptable   acceptable
Definitely unacceptable    5              5              1            1            0
Probably unacceptable      4              7              7            12           5
Borderline                 0              4              13           25           3
Probably acceptable        0              6              16           37           39
Definitely acceptable      1              1              8            114          215

Each bar in the graphs represents the degree of agreement between two observers. As per Landis and Koch [11], a value of kappa less than 0.2 is classified as "poor" agreement; between 0.2 and 0.4 is considered "fair" agreement, and from 0.4 to 0.6 the agreement is "moderate". The key conclusions to be drawn from the data of Figure 4 are: (i) agreement between the four radiation oncologists would be appropriately described as "fair"; (ii) agreement between therapists fell into both the "fair" and "poor" categories; and (iii) agreement between oncologists and therapists fell almost equally into the "fair" and "poor" categories. Further analysis of the observer data showed that the oncologists were more definite in their evaluations of treatment fields. Their responses tended to fall into one of the two extreme categories ("definitely acceptable" or "definitely unacceptable") more often than observers with other professional training (55% "definitely acceptable" or "definitely unacceptable" for oncologists versus 47% for other observers, p < 0.0001).

In Table II, strengths of agreements within observer groups and within portal image evaluation categories are shown. The intragroup agreement among the oncologist observers is by far the strongest, although it can still only be described as "fair". Another clear tendency is that for each group agreement is best for the "definitely unacceptable" category, which indicates that clear localization errors are fairly consistently recognized in on-line portal images. Evans et al [12] have found the same for qualitative analyses of on-line images.

For the off-line evaluation of imaged fields, observers had a choice of one of five responses. In order to compare off-line with on-line evaluation, in which case only "acceptable" or "unacceptable" responses were possible with our clinical protocol, it was necessary to amalgamate off-line responses. The kappa values presented in Figure 5 quantify the agreement between on-line and off-line field evaluations when the off-line categories "definitely unacceptable" and "probably unacceptable" are jointly classed as "unacceptable" and the other three off-line categories are classed as "acceptable". The "poor" agreement between field evaluation made in the clinical environment and that made under more ideal viewing conditions is apparent.
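A hedged sketch of this amalgamation step follows; the category strings match the scale described in the Methods, but the rating lists and the scikit-learn call are illustrative assumptions rather than the study's own software.

```python
from sklearn.metrics import cohen_kappa_score

# Map the five off-line categories onto the two-category on-line scale.
AMALGAMATE = {
    "definitely unacceptable": "unacceptable",
    "probably unacceptable":   "unacceptable",
    "borderline":              "acceptable",
    "probably acceptable":     "acceptable",
    "definitely acceptable":   "acceptable",
}

def online_offline_kappa(online_ratings, offline_ratings):
    """Unweighted kappa between on-line (two-category) and amalgamated off-line ratings."""
    collapsed = [AMALGAMATE[r] for r in offline_ratings]
    return cohen_kappa_score(online_ratings, collapsed)

# Hypothetical example: five fields rated on-line and later off-line.
online = ["acceptable", "acceptable", "unacceptable", "acceptable", "acceptable"]
offline = ["probably acceptable", "borderline", "probably unacceptable",
           "definitely acceptable", "definitely unacceptable"]
print(f"kappa = {online_offline_kappa(online, offline):.2f}")
```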


Figure 4. (a)-(f) Kappa scores for interobserver agreement. The panels compare pairs of observers within and between professional groups (oncologist to oncologist, oncologist to physicist, therapist to therapist and physicist to physicist, among others); the vertical axis is the kappa score, banded from "poor" to "very good", and the horizontal axis indexes the individual pairwise comparisons.

Table II. Kappa results and standard errors for intragroup and intracategory agreements

                      Overall         Kappa for
Observer group        kappa           Definitely     Probably       Borderline     Probably       Definitely
                                      unacceptable   unacceptable                  acceptable     acceptable
All 16 observers      0.096±0.0025    0.257±0.004    0.089±0.004    0.213±0.004    0.015±0.004    0.114±0.004
4 oncologists         0.217±0.012     0.413±0.018    0.106±0.018    0.397±0.018    0.042±0.018    0.297±0.018
8 therapists          0.065±0.005     0.245±0.008    0.116±0.008    0.161±0.008    -0.005±0.008   0.053±0.008
4 physicists          0.018±0.012     0.192±0.018    -0.021±0.018   0.036±0.018    0.007±0.018    0.005±0.018


Discussion and conclusion

The absence of accepted criteria to be used in the verification of a treated field results in the very variable agreement between pairs of observers' evaluations reflected in the statistics of Figure 4. It is noteworthy that a higher degree of agreement exists between radiation oncologists' evaluations of the acceptability of a treatment field, but the degree of agreement is still suboptimal. Potential reasons for this include: (1) oncologists may judge field placement accuracy by differing criteria; (2) the oncologist may be more familiar with treatment of some tumour sites than others; or (3) subjective assessment may be too difficult a task to be optimized.

At its current state of development, true on-line use of portal imaging requires the treatment field evaluation to be performed by a therapist at the control console of the linear accelerator. If intervention is regarded as permissible following evaluation, the radiation oncologist must be prepared to delegate some authority to the treatment therapist. In view of the relatively poor agreement between oncologists and therapists (Figure 4), it is recommended that this delegation of authority only occurs under conditions where quantitative measures of field placement error have been made and definite rules for treatment interruption have been established. Identifying these conditions should form a major component of OPI clinical protocol development as long as treatment field verification and rectification rely on a subjective comparison of two images.

The poor agreement between decisions made on-line and off-line (Figure 5) constitutes further grounds for concern. These data suggest that, under the pressure of on-going clinical treatments, decisions concerning treatment field accuracy are even less likely to agree with the oncologists' evaluation than those made under more ideal viewing conditions.

Figure 5. Kappa scores for on-line and off-line observer agreement (vertical axis: kappa, banded from "poor" to "very good"; horizontal axis: off-line observer ID).

The off-line observer study described above generated some additional information concerning field accuracy and OPI use. For those treatment fields which were assigned to one of the two "unacceptable" categories, observers were requested to indicate in their own words where the error lay. In approximately half the cases, the error was considered to relate to added shielding and, in the other half, to the position and/or orientation of the patient with respect to the treatment machine.

At the conclusion of the oncologists' viewing sessions they were asked if, in the light of images they had seen, they would recommend any changes in clinical practices. Two principal comments were received. It was strongly recommended that double exposure on-line images be obtained at least for the first image acquired in a course of treatment. This recommendation has been implemented. It was also suggested that reproducibility would be increased by fixing, for a course of treatment, added shielding to the shadow tray rather than positioning the shielding blocks at each treatment as is currently done in our centre for some treatment sites. The impact of adopting this suggestion on our current practice is being evaluated.

Significant progress in the effective clinical utilization of OPI clearly requires the elimination of the variability reflected in the statistics of Figures 4 and 5 and Table II whilst not impairing patient throughput. Computer assisted image registration studies [2-5] will ultimately result in numbers being associated both with the degree of conformity between prescribed and treated fields and with the adjustments required to improve this conformity. This, however, will not be sufficient. It will also be necessary to eliminate, as much as possible, the subjective contribution of the on-line portal image observer.

Several authors have developed techniques for reducing the subjective component of the comparison. These techniques include both fully automated anatomical feature extraction [3, 4] and interactive anatomical feature extraction, where the observer uses a computer mouse or other input device to mark anatomical points or curves [5, 11, 13, 14] on digitized simulator and portal images, which then allows registration of the two images and determination of rotation and translations between them. Although the interactive techniques combined with computer assisted registration yield more reproducible results than a completely subjective visual approach, some interobserver variability still occurs and it is affected by such factors as observer experience, clinical versus controlled viewing conditions and the absolute magnitude of the rotations and translations in the image set [5, 14]. Aside from the residual interobserver variability, interactive techniques are not well suited to on-line comparison if the intention is to allow adjustment of patient position to minimize field placement errors during a particular fraction of radiotherapy. The time required for contouring of the anatomical features (of the order of 45-60 s) precludes the use of such techniques for this purpose.

Fully automated techniques employ computer algorithms for anatomical feature extraction and image registration. Gilhuijs and van Herk's [3] chamfer matching technique appears to be closest to being clinically useful for on-line registration. It requires only 3 s of computation on a 486 based computer to complete the image registration of simulator and portal images. At the time of Gilhuijs and van Herk's publication, however, detailed clinical studies of this technique had not been undertaken.

Clinical decision making to judge field placement accuracy using portal images can be regarded as a two step process, whether it is done subjectively or objectively. The first step is a determination of the magnitude of the field placement errors and the second step is a decision as to whether the magnitude of the errors is sufficient to warrant interruption of treatment and repositioning of the patient. This second step has traditionally been the responsibility of the radiation oncologist and it has therefore been done off-line. As stated above, the ideal situation is to have a decision made on-line. It is not practical for a radiation oncologist to view all patients' portal images as they are being treated. An alternative would be to use artificial intelligence (AI) to aid in the decision making process. Very little has been written on the use of AI in radiotherapy imaging, but expert systems of this nature have been tested in diagnostic radiology settings [15-17] and in radiotherapy planning [18]. Either a rule based expert system [18] or a neural network expert system [19] could be applicable to decision making in radiotherapy field placement accuracy. The difficulty with a rule based system is that the rules which oncologists use to judge whether field placement accuracy is adequate are unknown and may not be evident even to the oncologists. A neural network uses "features" of the images and oncologists' decisions about the images to "learn" to make decisions. We propose to develop and test a neural network expert system under clinical conditions that will use field placement error data obtained from simulator and portal images that have been registered using a chamfer matching algorithm. As a part of this work we will attempt to establish whether disagreement between oncologists is, in essence, due to random errors or if it is due to systematic differences in decision making.
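Purely as an illustration of the kind of decision aid proposed here (none of the feature names, example values or network settings below come from the study), a small feed-forward network could, in principle, map registered field placement errors to an accept or intervene recommendation:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: field placement errors measured after image
# registration (lateral shift in mm, longitudinal shift in mm, in-plane
# rotation in degrees) together with an oncologist's accept (1) / intervene (0)
# decision for each imaged field.
X = np.array([[1.0, 0.5, 0.3], [8.0, 2.0, 1.5], [0.2, 0.1, 0.0],
              [5.5, 6.0, 2.0], [2.0, 1.0, 0.5], [9.0, 0.5, 4.0]])
y = np.array([1, 0, 1, 0, 1, 0])

# A small feed-forward network standing in for the proposed expert system.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)

# Recommendation for a newly imaged field with a 3 mm / 1 mm shift and
# a 0.5 degree rotation.
print(clf.predict([[3.0, 1.0, 0.5]]))
```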

Acknowledgments

The active and enthusiastic participation of the observers and the therapists on the MX-2 accelerator is acknowledged with gratitude. Vahid Ghorbani, a Harold Johns Summer Student of the OCTRF, provided invaluable assistance in the collection and analysis of the observer study data. Finally, the staff of Siemens Medical Laboratories, Concord, California, USA have willingly responded to requests for support and assistance which have occurred frequently throughout the duration of this project.

References

1. INTERNATIONAL COMMISSION ON RADIATION UNITS AND MEASUREMENTS, Dose Specification for Reporting External Beam Therapy with Photons and Electrons (ICRU Report 29) (Bethesda, MD, USA) (1978).
2. LESZCZYNSKI, K, LOOSE, S and DUNSCOMBE, P, A comparative study of methods for the registration of pairs of radiation fields, Phys. Med. Biol., 38, 1493-1502 (1993).
3. GILHUIJS, K G A and VAN HERK, M, Automatic on-line inspection of patient set up in radiation therapy using digital portal images, Med. Phys., 20, 667-677 (1993).
4. JONES, S M and BOYER, A L, Investigation of an FFT-based correlation technique for verification of radiation treatment set-up, Med. Phys., 18, 1116-1125 (1991).
5. MICHALSKI, J, WONG, J, BOSCH, W ET AL, An evaluation of two methods of anatomical alignment of radiotherapy portals, Int. J. Radiat. Oncol. Biol. Phys., 27, 1199-1206 (1993).
6. DE NEVE, W, VAN DEN HEUVEL, F, COGHE, M ET AL, Interactive use of on-line portal imaging in pelvic radiation, Int. J. Radiat. Oncol. Biol. Phys., 25, 517-524 (1993).
7. REINSTEIN, L E, PAI, S and MEEK, A G, Assessment of geometric treatment accuracy using time-lapse display of electronic portal images, Int. J. Radiat. Oncol. Biol. Phys., 22, 1139-1146 (1992).
8. LESZCZYNSKI, K, SHALEV, S and COSBY, S, A digital video system for on-line portal verification, Proc. SPIE Medical Imaging IV (Newport Beach) (1990).
9. ALTMAN, D G, Practical Statistics for Medical Research (Chapman and Hall, London), 396-439 (1991).
10. FLEISS, J L, Statistical methods for rates and proportions (John Wiley & Sons, New York), 213-235 (1981).
11. LANDIS, J R and KOCH, G G, The measurement of observer agreement for categorical data, Biometrics, 33, 159-174 (1977).
12. EVANS, P M, GILDERSLEVE, J Q, MORTON, E J ET AL, Image comparison techniques for use with megavoltage imaging systems, Br. J. Radiol., 65, 701-709 (1992).
13. BALTER, J M, PELIZZARI, C A and CHEN, G T, Correlation of projection radiographs in radiation therapy using open curve segments and points, Med. Phys., 19, 329-334 (1992).
14. BIJHOLD, J, VAN HERK, M, VIJLBRIEF, R and LEBESQUE, J V, Fast evaluation of patient set-up during radiotherapy by aligning features in portal and simulator images, Phys. Med. Biol., 36, 1665-1679 (1991).


15. MACMAHON, H, DOI, K, CHAN, H P ET AL, Computer-aided diagnosis in chest radiology, J. Thorac. Imaging, 5, 67-76 (1990).
16. PREUL, M C, COLLINS, D L, FEINDEL, W and ARNOLD, D L, Discrimination of human intracranial tumors in vivo using 1H MR spectroscopic imaging with metabolic classification in feature space, Proc. Ann. Meeting Am. Assoc. Cancer Res., 34, 1383 (1993).
17. GOLDBERG, V, MANDUCA, A, EWERT, D L ET AL, Improvement in specificity of ultrasonography for diagnosis of breast tumors by means of artificial intelligence, Med. Phys., 19, 1475-1481 (1992).
18. KALET, I J and PALUSZYNSKI, W, Knowledge-based computer systems for radiotherapy planning, Am. J. Clin. Oncol., 13, 344-351 (1990).
19. WU, Y, GIGER, M L, DOI, K ET AL, Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer, Radiology, 187, 81-87 (1993).

Appendix 1

Derivations of the kappa statistics can be found in [9] and [10]. This appendix is restricted to presenting the basic formulae employed in the course of our study.

Calculation of kappa for two observers

Let N be the number of subjects that were classified by two independent observers into k different categories. From the observers' responses one can form a k x k table of joint classification frequencies:

                Observer 2
Observer 1      Category 1   ...   Category k
Category 1      f_{11}       ...   f_{1k}
...             ...          ...   ...
Category k      f_{k1}       ...   f_{kk}

where f_{ij} is the frequency of joint classification, or the number of subjects classified to the ith category by the first observer and to the jth category by the second observer. These frequencies divided by N can be used as estimates for the appropriate probabilities of joint classification, p_{ij}:

p_{ij} = f_{ij} / N    (A1)

Furthermore, similarly normalized sums of rows and columns of the joint frequency table give us estimates of the overall probabilities of classification to each category by each of the observers independently from the other:

p_{i.} = \sum_{j=1}^{k} p_{ij}    (A2a)

p_{.j} = \sum_{i=1}^{k} p_{ij}    (A2b)

In the above formulae p_{i.} is the overall probability of classification to the ith category by the first observer, and p_{.j} is the overall probability of classification to the jth category by the second observer. The observed proportional agreement on all categories, p_o, can be calculated as the normalized sum of all classifications where the two observers agreed:

p_o = \frac{1}{N} \sum_{i=1}^{k} f_{ii}    (A3)

The expected proportion of purely coincidental agreements, p_e, can be calculated as the sum of products of independent probabilities:

p_e = \sum_{i=1}^{k} p_{i.} \times p_{.i}    (A4)

From these two the value of kappa (\kappa) is derived:

\kappa = \frac{p_o - p_e}{1 - p_e}    (A5)
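A minimal sketch of equations (A1)-(A5) in code, assuming the joint classifications have already been tallied into a k x k array; the function name and the NumPy representation are illustrative choices, not part of the original analysis.

```python
import numpy as np

def kappa_from_table(f):
    """Unweighted kappa (equations A1-A5) from a k x k joint frequency table,
    with rows indexed by observer 1 and columns by observer 2."""
    f = np.asarray(f, dtype=float)
    N = f.sum()
    p = f / N                        # p_ij                  (A1)
    p_i = p.sum(axis=1)              # row marginals p_i.    (A2a)
    p_j = p.sum(axis=0)              # column marginals p_.j (A2b)
    p_o = np.trace(p)                # observed agreement    (A3)
    p_e = np.sum(p_i * p_j)          # chance agreement      (A4)
    return (p_o - p_e) / (1 - p_e)   # kappa                 (A5)

# Example with a hypothetical 2 x 2 table of 30 jointly rated subjects.
print(kappa_from_table([[10, 2], [3, 15]]))
```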

In the above formulation, only perfect agreements contribute to p_o. However, when categories are ordered, one can include quantification (weighting) of the degree of agreement in the kappa calculations. A perfect agreement should receive a weighting of 1, while discrepancies in classifications by the two observers by one, two and up to (k - 1) categories result in gradually decreasing weightings. In our analysis we employed linearly decreasing weights, w_{ij}, as in [9]:

w_{ij} = 1 - \frac{|i - j|}{k - 1}    (A6)

The weighted observed, p_{wo}, and expected, p_{we}, proportions of agreement are calculated as:

p_{wo} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij} f_{ij}, \qquad p_{we} = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij} \, p_{i.} \, p_{.j}    (A7)

The weighted kappa, \kappa_w, is given by:

\kappa_w = \frac{p_{wo} - p_{we}}{1 - p_{we}}    (A8)
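The same joint frequency table can be used for the weighted kappa. The sketch below applies the linear weights of equation (A6) and then equations (A7) and (A8); it is an illustrative implementation under those assumptions, not the software used in the study.

```python
import numpy as np

def weighted_kappa_from_table(f):
    """Linearly weighted kappa (equations A6-A8) from a k x k joint frequency table."""
    f = np.asarray(f, dtype=float)
    k = f.shape[0]
    N = f.sum()
    p = f / N
    p_i = p.sum(axis=1)                     # p_i.
    p_j = p.sum(axis=0)                     # p_.j
    i, j = np.indices((k, k))
    w = 1.0 - np.abs(i - j) / (k - 1)       # linear weights      (A6)
    p_wo = np.sum(w * p)                    # weighted observed   (A7)
    p_we = np.sum(w * np.outer(p_i, p_j))   # weighted expected   (A7)
    return (p_wo - p_we) / (1 - p_we)       # weighted kappa      (A8)
```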

The standard error of the weighted kappa, SE(\kappa_w), can be calculated according to the formula given by Fleiss [10]:

SE(\kappa_w) = \frac{1}{(1 - p_{we}) \sqrt{N}} \sqrt{ \sum_{i=1}^{k} \sum_{j=1}^{k} p_{ij} \left[ w_{ij} - (\bar{w}_{i.} + \bar{w}_{.j})(1 - \kappa_w) \right]^2 - \left[ \kappa_w - p_{we} (1 - \kappa_w) \right]^2 }    (A9)

where

\bar{w}_{i.} = \sum_{j=1}^{k} p_{.j} w_{ij}    (A10)

and

\bar{w}_{.j} = \sum_{i=1}^{k} p_{i.} w_{ij}    (A11)

Calculation of kappa for more than two observers

For the case of more than two observers, Fleiss [10] added new definitions of kappa values to the ones calculated for observer pairs. These new kappas quantify the agreement within observer groups and within classification categories. Assuming m to be the number of observers in a group (or alternatively the number of classifications per subject), and N and k to be the numbers of subjects and categories, respectively, the kappa for the jth category, \kappa_j, is defined as follows:

\kappa_j = 1 - \frac{\sum_{i=1}^{N} x_{ij} (m - x_{ij})}{N m (m-1) p_j q_j}    (A12)

where x_{ij} is the number of classifications of the ith subject to the jth category and p_j is the proportion of all classifications that went to the jth category:

p_j = \frac{\sum_{i=1}^{N} x_{ij}}{N m}    (A13)

and q_j is the complement of p_j, i.e. q_j = 1 - p_j.

The overall kappa for the m observers is then given as:

\kappa = 1 - \frac{N m^2 - \sum_{i=1}^{N} \sum_{j=1}^{k} x_{ij}^2}{N m (m-1) \sum_{j=1}^{k} p_j q_j}    (A14)
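A sketch of equations (A12)-(A14), assuming the classifications have been arranged as an N x k matrix whose entry x[i, j] counts how many of the m observers assigned subject i to category j; this data layout and the function name are assumptions made for the illustration.

```python
import numpy as np

def fleiss_kappas(x):
    """Per-category kappas (A12) and overall kappa (A14) for m observers.
    Every row of x must sum to m, and every category is assumed to be used
    at least once (otherwise p_j * q_j would be zero)."""
    x = np.asarray(x, dtype=float)
    N, k = x.shape
    m = x[0].sum()
    p = x.sum(axis=0) / (N * m)                                              # p_j  (A13)
    q = 1.0 - p                                                              # q_j
    kappa_j = 1.0 - (x * (m - x)).sum(axis=0) / (N * m * (m - 1) * p * q)    # (A12)
    kappa = 1.0 - (N * m**2 - (x**2).sum()) / (N * m * (m - 1) * (p * q).sum())  # (A14)
    return kappa_j, kappa
```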

The standard errors of the intracategory and intraobserver group kappas, SE(\kappa_j) and SE(\kappa), can be estimated using the formulae given by Fleiss [10]:

SE(\kappa_j) = \sqrt{ \frac{2}{N m (m-1)} }    (A15)

SE(\kappa) = \frac{\sqrt{2}}{\sum_{j=1}^{k} p_j q_j \sqrt{N m (m-1)}} \sqrt{ \left( \sum_{j=1}^{k} p_j q_j \right)^2 - \sum_{j=1}^{k} p_j q_j (q_j - p_j) }    (A16)
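Equation (A15) depends only on the number of subjects and the number of observers. Plugging in the values from this study (N = 530 portal images, with groups of 16, 8 and 4 observers) appears to reproduce the per-category standard errors quoted in Table II; the short check below is illustrative only.

```python
from math import sqrt

N = 530                       # portal images evaluated
for m in (16, 8, 4):          # all observers; therapists; oncologists or physicists
    se = sqrt(2.0 / (N * m * (m - 1)))    # equation (A15)
    print(f"m = {m:2d}: SE(kappa_j) = {se:.4f}")
# Prints approximately 0.0040, 0.0082 and 0.0177, matching the +/-0.004,
# +/-0.008 and +/-0.018 quoted for the category kappas in Table II.
```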
