Letter to the Editor

Automated Sleep Scoring with Human Supervision Adds Value Compared with Human Scoring Alone: A Reply to Zammit G. K. Insufficient evidence for the use of automated and semi-automated scoring of polysomnographic recordings. SLEEP 2008;31:449-50.

Vladimir Svetnik, PhD1; Junshui Ma, PhD1; Keith A. Soper, PhD1; Scott Doran, PhD2; John J. Renger, PhD2; Steve Deacon, PhD3; Ken S. Koblan, PhD4

1Merck Research Laboratories, Biometrics Research, Rahway, NJ; 2Merck Research Laboratories, Sleep & Schizophrenia, West Point, PA; 3H. Lundbeck A/S, International Clinical Research, Valby, Denmark, and Lundbeck Ltd, UK; 4Merck Research Laboratories, Basic Research, Rahway, NJ

Data presented in our paper1 (1) addressed comparisons of fully automated and semi-automated approaches to evaluating sleep measures from a single clinical trial of an established insomnia treatment; (2) examined the extent to which human supervision was effective in enhancing the quality of semi-automated scoring; and (3) provided a comparison between two automated and semi-automated scoring systems applied to the same clinical data set. This work is relevant to understanding the efficacy of human effort in improving the accuracy of fully automated scoring.

In his Letter to the Editor, Zammit incorrectly contends that, in our paper, automated scoring techniques were evaluated using data "processed under controlled conditions in a core PSG laboratory."2 In fact, we used copies of the actual recordings made at the sites of data collection. They came from all patients completing the clinical study protocol and were sent to the automated and semi-automated scoring laboratories without preprocessing.

Variability of manual scoring is an issue that has been addressed in a substantial body of work, some of which was referenced in our paper [references 3-4, 6-7, 11-12].1 We compared our results with this previous work to the extent possible, and we also focused on comparisons of scoring methods with respect to the derived values of clinical endpoints. The Cohen's Kappa values of 0.79-0.87 from page 124 of Silber et al.3 cited by Zammit are for intra-rater, not inter-rater, variability. Danker-Hopfe et al. [reference 26 in Silber et al.3] reported a Kappa value of 0.68 in a study using 5 sleep stages. Our results were for 6 sleep stages and showed that agreement between automated scoring performed at various sites, with differing automated scoring systems and partial review procedures, was Kappa 0.67. When the same automated scoring approaches were used, but the scores were reviewed by different scorers with differing protocols, Kappa increased to 0.74 (cf. Table 1 of our paper1).

Zammit points to the low agreement of automated scoring in Stage 1, presumably in comparison with strict manual scoring. However, in Silber et al.3 one finds that "the inter-rater variability of Stage 1 sleep is in the fair range of Cohen Kappa, 0.21-0.4" and that, due to this low agreement, the rules for its scoring "needed reassessment." We thank Zammit for reminding readers that the Standards4 require that "automatically scored tracings be reviewed on an epoch-by-epoch basis,"2 and we do not disagree with this. We do disagree that this requirement "renders automated scoring unnecessary."2 Our results suggest that epoch-by-epoch review of automated scores by different reviewers yields higher reproducibility than epoch-by-epoch manual scoring and may thus offer an improvement in technique for the larger field of sleep medicine to consider as it evolves. Reluctance to investigate techniques that may facilitate or improve scoring seems counterproductive to the advancement of this field.

We hope that our work is a step toward evaluating technologies that can examine clinical EEG data more rapidly and objectively than human scoring alone. We agree that more work is needed to better understand the benefits of changing current approaches, and this is a small step in that endeavor. We thank Zammit for sharing his opinions and look forward to a further exchange of ideas as the field progresses.
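For readers who wish to see how the epoch-by-epoch agreement statistic discussed above is obtained, the following is a minimal illustrative sketch in Python of Cohen's Kappa computed from two scorers' stage assignments. It is an illustration only, not the analysis code used in our paper, and the two hypnogram fragments shown are hypothetical.

from collections import Counter

def cohens_kappa(scorer_a, scorer_b):
    # Cohen's Kappa for two equal-length sequences of epoch-by-epoch stage labels.
    n = len(scorer_a)
    assert n == len(scorer_b) and n > 0
    # Observed agreement: fraction of epochs given the same stage by both scorers.
    p_obs = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Chance agreement, derived from each scorer's marginal stage frequencies.
    freq_a, freq_b = Counter(scorer_a), Counter(scorer_b)
    p_exp = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical 30-second-epoch hypnogram fragments from two scorers (6-stage scheme: W, 1, 2, 3, 4, R).
scorer_a = ["W", "W", "1", "2", "2", "2", "3", "3", "R", "R"]
scorer_b = ["W", "1", "1", "2", "2", "2", "3", "4", "R", "R"]
print(round(cohens_kappa(scorer_a, scorer_b), 2))  # 0.75 for these hypothetical fragments

The same calculation, applied epoch by epoch over full recordings, underlies the Kappa values of 0.67 and 0.74 quoted above.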


Disclosure Statement

All authors, with the exception of Dr. Deacon, are employees of Merck. Dr. Deacon was an employee of H. Lundbeck A/S when the paper by Svetnik et al., published in November 2007, was written. He is currently an employee of ONO Pharma, London, UK.


Submitted for publication February, 2008
Accepted for publication February, 2008

Address correspondence to: Vladimir Svetnik, Merck & Co., Inc., RY 33300, 126 East Lincoln Ave., Rahway, NJ 07065; Tel: (732) 594-5544; Fax: (732) 594-1565; E-mail: [email protected]



References

1. Svetnik V, Ma J, Soper KA, Doran S, Renger JJ, Deacon S, Koblan KS. Evaluation of automated and semi-automated scoring of polysomnographic recordings from a clinical trial using zolpidem in the treatment of insomnia. Sleep 2007;30:1562-74.
2. Zammit GK. Insufficient evidence for the use of automated and semi-automated scoring of polysomnographic recordings. Sleep 2008;31:449-50.
3. Silber MH, Ancoli-Israel S, Bonnet MH, Chokroverty S, Grigg-Damberger MM, Hirshkowitz M, Kapen S, Keenan SA, Kryger MH, Penzel T, Pressman MR, Iber C. The visual scoring of sleep in adults. J Clin Sleep Med 2007;3:121-31.
4. American Academy of Sleep Medicine. AASM Standards for Accreditation of Sleep Disorders Centers. Westchester, IL, 2007.
