1995, The British Journal of Radiology, 68, 1087-1089
How reliable are ultrasound measurements of renal length in adults?
1 M J ABLETT, MRCP, FRCR, 1 A COULTHARD, FRCS, FRCR, 1 R E J LEE, BSc, FRCR, 1 D L RICHARDSON, FRCR, 1 T BELLAS, DCR, DMU, 1 J P OWEN, DMRD, FRCR, 2 M J KEIR, BSc, PhD and 3 T J BUTLER, BA, MSc
Departments of 1 Radiology, 2 Medical Physics and 3 Medical Statistics, Royal Victoria Infirmary, Queen Victoria Road, Newcastle upon Tyne NE1 4LP, UK
Abstract
Ultrasound assessment of patients with renal impairment commonly includes measurement of bipolar renal length. Reduction in length is considered to indicate chronic renal disease and is a factor in deciding whether to proceed to renal biopsy. To date, no published data are available on interobserver and intraobserver variation in sonographic renal length measurement in adults. Bilateral renal lengths were measured in 20 adult subjects with no history of renal disease by three experienced operators on two separate occasions. Limits of agreement for replicate measurements by each ultrasonographer and for replicate measurements by each pair of ultrasonographers were determined. Values of repeatability (a measure of intraobserver variation) and reproducibility (a measure of interobserver variation) were calculated for all renal length measurements, and for right and left renal lengths separately. Results indicate that replicate renal length measurements differ by less than 1.85 cm in 95% of cases, that the magnitude of variation is similar whether measurements are made by a single ultrasonographer or by different ultrasonographers, and that it is similar for right and left renal length measurements. This suggests that sonographic bipolar renal length measurements in normal adult kidneys are reasonably reliable. In diseased kidneys, however, in which identification of the renal poles is difficult, interobserver and intraobserver variation may be much greater.
Ultrasound measurement of renal length is frequently used as an indicator of the chronicity of renal disease, with a value of 9 cm or less considered to indicate irreversible disease [1]. It is also an important factor in the decision to undertake renal biopsy, as knowledge of the histology of shrunken kidneys in chronic renal failure is frequently unhelpful in subsequent treatment and the complication rate following biopsy is increased in shrunken kidneys [2]. It is therefore important that sonographic renal length measurements are consistent, both for replicate measurements by a single ultrasonographer and for measurements by different ultrasonographers. Several studies [3-5] have assessed intraobserver and interobserver variations in sonographic renal length measurements in children, but we are unaware of any such studies in adults. The purpose of this study was to evaluate the reliability of sonographic renal length measurements in healthy adult subjects.
Received 13 March 1995 and in revised form 11 May 1995, accepted 22 May 1995. Address correspondence to Dr J P Owen.
Vol. 68, No. 814
Materials and methods
20 adult subjects with no history of renal disease were examined by three experienced ultrasonographers. For each case the same ultrasound scanner (Acuson XP128, Acuson Ltd, Middlesex) and 3.5 MHz sector transducer were used. Renal length was measured bilaterally using electronic calipers, the ultrasonographers using their usual examination and measurement techniques. All subjects were examined on two separate occasions by each ultrasonographer, with sufficient time between the examinations to ensure that the ultrasonographers were not biased by knowledge of their previous results.
Limits of agreement for replicate measurements by each ultrasonographer and each pair of ultrasonographers were calculated using the method of Bland and Altman [6]. Using components of variation analysis, values of repeatability and reproducibility were calculated for all renal length measurements and for right and left renal length measurements separately. In the context of this study, repeatability is the value that exceeds the difference between replicate renal length measurements made by the same observer in 95% of cases, whereas reproducibility is the value that exceeds the difference between replicate renal length measurements made by two observers in 95% of cases.
Results
A total of 240 renal length measurements were made in 20 subjects, six males and 14 females. The subjects' heights were 1.74-1.86 m (mean 1.81 m) for males and 1.56-1.81 m (mean 1.65 m) for females. Subjects' weights were 62-88 kg (mean 74 kg) for males and 52-75 kg (mean 59 kg) for females.
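The definition of repeatability given under Materials and methods can be illustrated with a minimal sketch. It assumes normally distributed measurement error and only two replicates per kidney by a single observer, so that the within-subject SD is estimated from paired differences and the repeatability is 1.96 × √2 × that SD. This is a simplified stand-in, not the components of variation analysis used in the study, whose details are not given here. The function names and the replicate values are hypothetical.

```python
from math import sqrt


def within_subject_sd(pairs):
    """Within-subject SD from pairs of replicate measurements.

    With two replicates per subject, the within-subject variance is
    estimated as sum(d**2) / (2 * n), where d is the paired difference.
    """
    n = len(pairs)
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))


def repeatability_coefficient(s_w, z=1.96):
    """Value expected to exceed the absolute difference between two
    replicate measurements in about 95% of cases (assumes normal errors).

    The difference of two replicates has SD sqrt(2) * s_w.
    """
    return z * sqrt(2) * s_w


# Hypothetical replicate renal lengths in cm (NOT data from the study).
pairs = [(11.0, 11.4), (10.5, 10.1), (12.0, 12.0), (11.2, 10.8)]
s_w = within_subject_sd(pairs)
print(f"within-subject SD: {s_w:.2f} cm")
print(f"repeatability:     {repeatability_coefficient(s_w):.2f} cm")
```

Reproducibility could be sketched in the same way by pairing measurements from two different observers, on the further assumption that the between-observer component is absorbed into the paired differences.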
Table I. Limits of agreement for renal length measurements
Observer           Mean difference (cm)   SD of differences (cm)   Limits of agreement (cm)
Observer 1         -0.10                  0.48                     -1.06 to 0.86
Observer 2         -0.02                  0.56                     -1.14 to 1.10
Observer 3         -0.03                  0.71                     -1.45 to 1.39
Observer 1 vs 2    -0.20                  0.53                     -1.26 to 0.86
Observer 2 vs 3    -0.41                  0.72                     -1.85 to 1.03
Observer 1 vs 3    -0.21                  0.64                     -1.49 to 1.07
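The limits of agreement in Table I can be checked with a short sketch. It assumes the Bland and Altman form, mean difference ± 2 SD of the differences, and the tabulated limits are consistent with that multiplier. The function names are illustrative, and the raw per-subject differences are not available here, so the check works from the reported summary values only.

```python
from statistics import mean, stdev


def limits_of_agreement(differences, k=2.0):
    """Return (mean difference, SD, lower limit, upper limit).

    Uses the sample SD (n - 1 denominator) and mean +/- k * SD.
    """
    d_bar = mean(differences)
    sd = stdev(differences)
    return d_bar, sd, d_bar - k * sd, d_bar + k * sd


def limits_from_summary(mean_diff, sd, k=2.0):
    """Limits of agreement from a reported mean difference and SD,
    rounded to two decimals to match the precision of Table I."""
    return round(mean_diff - k * sd, 2), round(mean_diff + k * sd, 2)


# (mean difference, SD, lower limit, upper limit), all in cm, from Table I.
TABLE_I = {
    "Observer 1": (-0.10, 0.48, -1.06, 0.86),
    "Observer 2": (-0.02, 0.56, -1.14, 1.10),
    "Observer 3": (-0.03, 0.71, -1.45, 1.39),
    "Observer 1 vs 2": (-0.20, 0.53, -1.26, 0.86),
    "Observer 2 vs 3": (-0.41, 0.72, -1.85, 1.03),
    "Observer 1 vs 3": (-0.21, 0.64, -1.49, 1.07),
}

for label, (m, sd, lower, upper) in TABLE_I.items():
    matches = limits_from_summary(m, sd) == (lower, upper)
    print(f"{label}: recomputed {limits_from_summary(m, sd)}, matches table: {matches}")
```

With raw data, `limits_of_agreement` would be applied to the list of differences between replicate measurements for one observer, or between the two observers in a pair.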
Limits of agreement for each ultrasonographer were similar (Table I) and did not exceed ±1.45 cm. Figure 1 shows the distribution of the differences in replicate length measurements for each ultrasonographer. The value of repeatability for all renal length measurements was 1.21 cm, with no difference between right and left renal length measurements.
Limits of agreement for each pair of ultrasonographers did not exceed ±1.85 cm (Table I). Figure 2 shows the distribution of the differences in replicate length measurements for each pair of ultrasonographers. The value of reproducibility was 1.45 cm, with no significant difference between the values for right and left renal length measurements. For all three ultrasonographers, variations in replicate renal length measurements did not exceed 1.85 cm in 95% of cases, and there was no significant difference in the size of these variations for right and left renal length measurements.
Discussion
It has been shown that, in normal adult kidneys, replicate sonographic renal length measurements differ by no more than about 1 to 1.85 cm in 95% of cases, irrespective of whether the measurements are performed by the same or by different ultrasonographers.
Comparison with earlier studies is complicated by various factors. All previous studies involved paediatric subjects with significantly smaller kidneys, and included kidneys with known pathology. The methods of comparison also varied considerably. The study of Schlesinger et al [3] seems to yield broadly similar results for interobserver and intraobserver variation. Hederstrom and Forsberg [5] reported only interexamination results, which appear at first sight to be better than ours. However, the basis of their assessment was to determine the percentage of repeated measurements falling within a range of, for example, ±1 cm. It may be that the absolute reproducibility of measurements of children's kidneys is smaller than that of adults' kidneys. Sargent and Wilson [4] used a variety of methods for their comparison, including the one employed here, but comparison is made difficult by some doubts about the nature of their detailed calculations. We note that the abnormal kidneys they studied were considerably larger than the normal kidneys and that some of the mean intraobserver and interobserver differences they quote are negative. This implies that they used actual (signed) differences between measurements rather than absolute differences, and indeed their variability seems correspondingly small.
Figure 1. Bland-Altman plot for intraobserver variation of renal length measurements. (Plot not reproduced. Horizontal axis: mean length (cm). Legend: U/S 1, U/S 2, U/S 3.)
The British Journal of Radiology, October 1995
Figure 2. Bland-Altman plot for interobserver variation of the mean of each observer's renal length measurements. (Plot not reproduced. Horizontal axis: mean length (cm). Legend: U/S1 vs U/S2, U/S1 vs U/S3, U/S2 vs U/S3.)
It is concluded that sonographic renal length measurements in normal adult kidneys are relatively consistent when made by either the same or by different ultrasonographers, with variations of the order of 1-2 cm in 95% of cases and with no significant difference in these variations for right and left renal length measurements. However, the magnitude of the variations is sufficiently large to suggest caution in interpreting sonographic renal length measurements if a fixed normal range is used. The lower limit of normal is commonly taken to be 9 cm [1], but results for two of our subjects showed a renal length measurement less than this in one of six replicate measurements. This suggests that a small proportion of normal-sized kidneys could be misclassified as small if this method is used. We would advocate caution in extrapolating our findings in normal kidneys to the evaluation of diseased kidneys, in which identification of the renal poles is more difficult. In such circumstances interobserver and intraobserver variations are likely to be much greater.
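The caution about a fixed 9 cm lower limit can be made concrete with a rough sketch. It assumes that a single measurement is normally distributed about the true renal length with a known measurement SD, which is a modelling assumption and not a result of this study. The function name, the SD of 0.5 cm and the true lengths are all hypothetical.

```python
from statistics import NormalDist


def prob_below_threshold(true_length_cm, measurement_sd_cm, threshold_cm=9.0):
    """Probability that one measurement falls below the threshold,
    assuming normally distributed measurement error (an assumption)."""
    return NormalDist(mu=true_length_cm, sigma=measurement_sd_cm).cdf(threshold_cm)


# Hypothetical true lengths (cm) and measurement SD (cm); NOT study data.
for true_length in (9.5, 10.0, 11.0):
    p = prob_below_threshold(true_length, measurement_sd_cm=0.5)
    print(f"true length {true_length:.1f} cm: P(measured < 9 cm) = {p:.3f}")
```

Under these assumptions the chance of a spurious sub-threshold reading falls quickly as the true length moves away from the cut-off, so the risk of misclassification is concentrated in kidneys only slightly longer than 9 cm.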
References
1. ROGER, S D, BEALE, A M, CATTELL, W R and WEBB, J A W, What is the value of measuring renal parenchymal thickness before renal biopsy? Clin. Radiol., 49, 45-49 (1994).
2. MADAIO, M P, Renal biopsy, Kidney Int., 38, 529-543 (1990).
3. SCHLESINGER, A E, HERNANDEZ, R J, ZERIN, J M ET AL, Interobserver and intraobserver variations in sonographic renal length measurements in children, AJR, 156, 1029-1032 (1991).
4. SARGENT, M A and WILSON, B P M, Observer variability in the sonographic measurement of renal length in childhood, Clin. Radiol., 46, 344-347 (1992).
5. HEDERSTROM, E and FORSBERG, L, Accuracy of repeated kidney size estimation by ultrasonography and urography in children, Acta Radiol. Diagn., 26, 603-607 (1985).
6. BLAND, J M and ALTMAN, D G, Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, 1, 307-310 (1986).
Acknowledgments
We would like to thank the volunteers for participating in the study and Mrs J T Stoddart of the University of Newcastle upon Tyne for typing the manuscript.