STRUCTURAL CLASSIFICATION OF GLAUCOMATOUS OPTIC NEUROPATHY

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Michael Duane Twa, B.A., O.D., M.S.

*****

The Ohio State University
2006

Dissertation Committee:

Mark A. Bullimore, Adviser
Srinivasan Parthasarathy
P. Ewen King-Smith
Chris A. Johnson

Approved by

Adviser
Graduate Program in Vision Science

ABSTRACT

Glaucoma is a leading cause of blindness. Quantitative methods of imaging the optic nerve head (e.g. confocal scanning laser tomography) are increasingly used to diagnose glaucomatous optic neuropathy and monitor its progression, yet there is considerable controversy about how to interpret and make best use of this structural information. In this research, machine learning methods are proposed and evaluated as alternatives to current methods of disease classification. First, multiple mathematical modeling methods such as radial polynomials, wavelet analysis, and B-spline fitting were used to reconstruct topographic descriptions of the optic nerve head and peripapillary region. Next, features derived from these models were extracted and used as classification features for automated decision tree induction. Decision tree classification performance was compared with conventional techniques such as expert grading of stereographic photos, Moorfields Regression Analysis, and visual field-based standards for the cross-sectional identification of glaucomatous optic neuropathy. Pseudozernike polynomial modeling methods provided the most compact and faithful representation of these structural data, albeit at considerably greater computational expense when compared to wavelet and B-spline modeling methods. The pseudozernike-based classifier had the greatest area under the receiver-operating characteristic (ROC) curve: 85%, compared to 73% and 71% for the wavelet and B-spline-based classification models, respectively. These results show that automated analysis of optic nerve head structural features can identify glaucomatous optic neuropathy in very good agreement with expert assessments of stereographic disc photos. Moreover, these quantitative methods can improve the standardization and agreement of these assessments. Extensions of these methods may provide alternative ways to evaluate structural and functional disease relationships in glaucoma.


For Jeanette who has shared this journey


ACKNOWLEDGMENTS

This effort would amount to very little were it not for the support and encouragement of some selfless individuals who have invested some portion of their time and their careers for my benefit. First, I must acknowledge my family, who have willingly shared this adventure that is my career. In retrospect, I hope this chapter in our lives will be a source of encouragement to set high goals and not to fear change, but to embrace change and uncertainty as opportunity.

It is with deep admiration and gratitude that I thank my adviser and friend, Professor Mark Bullimore. His generous concern and sincere consideration for my family enabled me to pursue this graduate career. I have grown considerably in the shadow of his intellect, insightful perspective, and tenacity. Mentors teach on so many levels and it has been an honor to learn from Dr. Bullimore what it means to be a good teacher, what it takes to serve your profession, and how to earn the respect of your peers. Although my contributions will differ, they will be tempered by his mentoring and hopefully reflect his influence in a way that will bring honor.

For much of my graduate education at Ohio State I have fancied myself a computer scientist. I am sincerely grateful to Professor Srinivasan Parthasarathy for not quashing those dreams. Instead, he provided direction and a productive fountain of ideas. It is through Dr. Parthasarathy's generous collaboration that I have succeeded in spanning two academic disciplines—computer and vision science—while maintaining the quality needed to make meaningful contributions in both fields. I am grateful to Dr. Parthasarathy for expanding the dimensions of my graduate education and having the courage and enthusiasm to mentor me for these past several years. His generosity and foresight have helped successfully bridge these two academic disciplines and defined new challenges for the integration of information sciences and clinical decision making.

None of this work would have been possible without the generous collaboration of Dr. Chris Johnson and his associates at the Devers Eye Institute in Portland, Oregon, especially Mrs. Cindy Blachly. Dr. Johnson kindly shared more than nine years of longitudinal data that is the foundation of this analysis. As a clinician, I appreciate the time and effort required to accumulate this valuable data. Dr. Johnson's pioneering work in glaucoma and visual field assessment has not only shaped the way clinicians care for patients now, but will continue to do so for many years to come. Likewise, my hope is that my career in research will have a positive impact on the care that others receive. Dr. Johnson has also given me an entirely different perspective on science. He has taught me to take a longer view of the scientific horizon—that studies in glaucoma can take years, or even decades, to show results. His career has been a model of collaborative successes for many and it has been my good fortune to associate with him.

I am indebted to Dr. P. Ewen King-Smith as well for his thoughtful comments throughout my education and his considerate, but ever-critical eye. Dr. King-Smith is the consummate gentleman and scholar. Thanks to Michael Sinai of Heidelberg Engineering for technical support.


Finally, I must acknowledge Professor Karla Zadnik. There are few people who inspire others as well as she does. Dr. Zadnik's inspiration comes not only from words, but deeds as well. By example, she inspires one with the desire to lead. Her career transition from clinician to researcher was an example and a source of moral support throughout my graduate school experience. I have never felt that I had a single adviser during my education. I benefited from a team of mentors, each lending their unique skill and perspective when needed. Dr. Zadnik helped to create the supportive culture and cooperative community at the College in which I have thrived. I am grateful to have been the beneficiary of her talent and leadership.


VITA

March 12, 1963 . . . . . . . Born - Springfield, Ohio
1986 . . . . . . . . . . . . B.A. Biology, University of California, San Diego
1988 . . . . . . . . . . . . B.S. Physiological Optics, University of California, Berkeley
1990 . . . . . . . . . . . . O.D. Clinical Optometry, University of California, Berkeley
1992-1996 . . . . . . . . . Optometrist, Department of Ophthalmology, University of California, San Diego
1996-2001 . . . . . . . . . Senior Optometrist, Department of Ophthalmology, University of California, San Diego
1996-2001 . . . . . . . . . Clinical Research Director, Department of Ophthalmology, University of California, San Diego
1998-2001 . . . . . . . . . Clinical Instructor of Ophthalmology, Department of Ophthalmology, University of California, San Diego
2001-2002 . . . . . . . . . Graduate Research Associate, College of Optometry, The Ohio State University
2002 . . . . . . . . . . . . M.S. Vision Science, The Ohio State University
2001-2003 . . . . . . . . . Clinical Instructor of Optometry, College of Optometry, The Ohio State University
2002-2005 . . . . . . . . . Postdoctoral Fellow, College of Optometry, The Ohio State University

2002-2005 . . . . . . . . . William C. Ezell Fellowship, American Optometric Foundation
2002-2005 . . . . . . . . . NIH Clinical Research Fellowship, College of Optometry, The Ohio State University
2005-present . . . . . . . Senior Research Associate, College of Optometry, The Ohio State University

PUBLICATIONS

Research Publications

Liu JH, Kripke DF, Hoffman RE, Twa MD, Loving RT, Rex KM, Gupta N, Weinreb RN. Nocturnal elevation of intraocular pressure in young adults. Invest Ophthalmol Vis Sci 1998; 39:2707-2712.

Liu JH, Kripke DF, Twa MD, Hoffman RE, Mansberger SL, Rex KM, Girkin CA, Weinreb RN. Twenty-four-hour pattern of intraocular pressure in the aging population. Invest Ophthalmol Vis Sci 1999; 40:2912-2917.

Butler FK, Jr., White E, Twa M. Hyperoxic myopia in a closed-circuit mixed-gas scuba diver. Undersea Hyperb Med 1999; 26:41-45.

Liu JH, Kripke DF, Hoffman RE, Twa MD, Loving RT, Rex KM, Lee BL, Mansberger SL, Weinreb RN. Elevation of human intraocular pressure at night under moderate illumination. Invest Ophthalmol Vis Sci 1999; 40:2439-2442.

Tran DB, Zadok D, Carpenter M, Korn TS, Twa M, Schanzlin DJ. Intraocular pressure measurement in patients with intrastromal corneal ring segments. J Refract Surg 1999; 15:441-443.

Twa MD, Karpecki PM, King BJ, Linn SH, Durrie DS, Schanzlin DJ. One-year results from the phase III investigation of the KeraVision Intacs. J Am Optom Assoc 1999; 70:515-524.

Zadok D, Tran DB, Twa M, Carpenter M, Schanzlin DJ. Pneumotonometry versus Goldmann tonometry after laser in situ keratomileusis for myopia. J Cataract Refract Surg 1999; 25:1344-1348.

Ruckhofer J, Twa MD, Schanzlin DJ. Clinical characteristics of lamellar channel deposits after implantation of Intacs. J Cataract Refract Surg 2000; 26:1473-1479.

Suiter BG, Twa MD, Ruckhofer J, Schanzlin DJ. A comparison of visual acuity, predictability, and visual function outcomes after intracorneal ring segments and laser in situ keratomileusis. Trans Am Ophthalmol Soc 2000; 98:51-55; discussion 55-57.

Twa MD, Hurst TJ, Walker JG, Waring GO, Schanzlin DJ. Diurnal stability of refraction after implantation with intracorneal ring segments. J Cataract Refract Surg 2000; 26:516-523.

Twa MD, Ruckhofer J, Schanzlin DJ. Surgically induced astigmatism after implantation of Intacs intrastromal corneal ring segments. J Cataract Refract Surg 2001; 27:411-415.

Huang ET, Twa MD, Schanzlin DJ, Van Hoesen KB, Hill M, Langdorf MI. Refractive change in response to acute hyperbaric stress in refractive surgery patients. J Cataract Refract Surg 2002; 28:1575-1580.

Kaido TJ, Kash RL, Sasnett MW, Twa M, Marcellino G, Schanzlin D. Cytotoxic and mutagenic action of 193-nm and 213-nm laser radiation. J Refract Surg 2002; 18:529-534.

Liu JH, Kripke DF, Twa MD, Gokhale PA, Jones EI, Park EH, Meehan JE, Weinreb RN. Twenty-four-hour pattern of intraocular pressure in young adults with moderate to severe myopia. Invest Ophthalmol Vis Sci 2002; 43:2351-2355.

Ruckhofer J, Stoiber J, Twa MD, Grabner G. Correction of astigmatism with short arc-length intrastromal corneal ring segments: preliminary results. Ophthalmology 2003; 110:516-524.

Twa MD, Ruckhofer J, Kash RL, Costello M, Schanzlin DJ. Histologic evaluation of corneal stroma in rabbits after intrastromal corneal ring implantation. Cornea 2003; 22:146-152.

Twa MD, Parthasarathy S, Raasch TW, Bullimore MA. Decision tree classification of spatial data patterns from videokeratography using Zernike polynomials. In: Proceedings of the Third SIAM International Conference on Data Mining. Ed. Barbara D, Kamath C. San Francisco, CA. 2003; 3-12.

Twa MD, Bailey MD, Hayes J, Bullimore MA. Estimation of pupil size with digital photography. J Cataract Refract Surg 2004; 30:381-389.

Twa MD, Kash RL, Costello M, Schanzlin DJ. Morphologic characteristics of lamellar channel deposits in the human eye: a case report. Cornea 2004; 23:412-420.

Twa MD, Nichols JJ, Joslin CE, Kollbaum PS, Edrington TB, Bullimore MA, Mitchell GL, Cruickshanks KJ, Schanzlin DJ. Characteristics of corneal ectasia after LASIK for myopia. Cornea 2004; 23:447-457.

Twa MD, Roberts C, Mahmoud AM, Chang JS, Jr. Response of the posterior corneal surface to laser in situ keratomileusis for myopia. J Cataract Refract Surg 2005; 31:61-71.

Bailey MD, Twa MD, Mitchell GL, Dhaliwal DK, Jones LA, McMahon TT. The repeatability of autorefraction and axial length measurements after LASIK. J Cataract Refract Surg 2005; 31:1025-1034.

Nichols JJ, Twa MD, Mitchell GL. Sensitivity of the National Eye Institute Refractive Error Quality of Life instrument to refractive surgery outcomes. J Cataract Refract Surg 2005; 31:2313-2318.

Twa MD, Lembach RG, Bullimore MA, Roberts C. A prospective randomized clinical trial of laser in situ keratomileusis using two different lasers. Am J Ophthalmol 2005; 140:173-183.

Marsolo K, Twa M, Bullimore M, Parthasarathy S. A model-based approach to visualizing classification decisions for patient diagnosis. In: AIME-05: 10th Conference on Artificial Intelligence in Medicine, Aberdeen, UK; Springer-Verlag 2005; 473-483.

Twa MD, Parthasarathy S, Roberts C, Mahmoud AM, Raasch TW, Bullimore MA. Automated decision tree classification of corneal shape. Optom Vis Sci 2005; 82:1038-1046.

Marsolo K, Twa M, Parthasarathy S, Bullimore M. Classification of biomedical data through model-based spatial averaging. In: BIBE-05: Proceedings of the 5th IEEE Symposium on Bioinformatics and Bioengineering, Minneapolis, MN; IEEE Computer Society 2005; 49-56.

Instructional Publications

Twa MD, Moreira S. Astigmatism and toric contact lenses. In: Mannis MJ, Zadnik K, et al., eds., Contact Lenses in Ophthalmic Practice. New York: Springer-Verlag, 2003; 90-108.

Twa MD, Coral-Ghanem C, Barth B. Corneal topography and contact lenses. In: Mannis MJ, Zadnik K, et al., eds., Contact Lenses in Ophthalmic Practice. New York: Springer-Verlag, 2003; 37-56.

FIELDS OF STUDY

Major Field: Vision Science

Studies in:
Vision Science (Prof. Mark A. Bullimore)
Computer Science and Engineering (Prof. Srinivasan Parthasarathy)


TABLE OF CONTENTS

Page

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . v
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . xviii

Chapters:

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 1

2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   2.1 Structure—Function Relationships in Glaucoma . . . . . . . 4
   2.2 Glaucomatous Optic Neuropathy . . . . . . . . . . . . . . . 5
   2.3 Evolution of HRT Data Analysis . . . . . . . . . . . . . . 7
       2.3.1 Traditional Metrics . . . . . . . . . . . . . . . . . 8
       2.3.2 Moorfields Regression Analysis . . . . . . . . . . . 9
       2.3.3 Linear Discriminant Functions . . . . . . . . . . . . 9
       2.3.4 Applications of Machine Learning and Data Mining . . 10

3. Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
   3.1 Sample Description . . . . . . . . . . . . . . . . . . . . 12
   3.2 Confocal scanning laser ophthalmoscopy . . . . . . . . . . 17
   3.3 Data Pre-processing . . . . . . . . . . . . . . . . . . . . 20
       3.3.1 Boundary Conditions . . . . . . . . . . . . . . . . . 20
       3.3.2 Data Centration . . . . . . . . . . . . . . . . . . . 21

4. Structural Modeling Methods . . . . . . . . . . . . . . . . . . 28
   4.1 Radial Polynomials . . . . . . . . . . . . . . . . . . . . 29
       4.1.1 Zernike Polynomials . . . . . . . . . . . . . . . . . 29
       4.1.2 Pseudozernike Polynomials . . . . . . . . . . . . . . 32
   4.2 B-splines . . . . . . . . . . . . . . . . . . . . . . . . . 34
   4.3 Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 36
       4.3.1 Hilbert Space-Filling Curves . . . . . . . . . . . . 43
       4.3.2 Wavelet Basis Functions . . . . . . . . . . . . . . . 44

5. Structural Modeling Results . . . . . . . . . . . . . . . . . . 48
   5.1 Modeling Speed vs. Complexity . . . . . . . . . . . . . . . 48
   5.2 Modeling Accuracy vs. Complexity . . . . . . . . . . . . . 50

6. Structural Classification Methods . . . . . . . . . . . . . . . 60
   6.1 Classification Standards . . . . . . . . . . . . . . . . . 60
       6.1.1 Stereoscopic Disc Photography Grading . . . . . . . . 62
       6.1.2 Moorfields Regression Analysis . . . . . . . . . . . 63
       6.1.3 Visual Field Assessments . . . . . . . . . . . . . . 66
   6.2 Decision Tree Classification . . . . . . . . . . . . . . . 68
       6.2.1 Classification Experiment Methods . . . . . . . . . . 72
   6.3 Feature Selection . . . . . . . . . . . . . . . . . . . . . 74

7. Structural Classification Results . . . . . . . . . . . . . . . 76
   7.1 Comparisons of Classification Methods . . . . . . . . . . . 76
   7.2 Decision Tree Classification . . . . . . . . . . . . . . . 80
       7.2.1 Tree Induction . . . . . . . . . . . . . . . . . . . 80
       7.2.2 Performance Comparisons . . . . . . . . . . . . . . . 81
       7.2.3 Model Selection . . . . . . . . . . . . . . . . . . . 85
   7.3 Visualization of Results . . . . . . . . . . . . . . . . . 87
       7.3.1 Pseudozernike Decision Tree . . . . . . . . . . . . . 87
       7.3.2 Coiflet Wavelet Decision Tree . . . . . . . . . . . . 89
       7.3.3 B-Spline Decision Tree . . . . . . . . . . . . . . . 90
       7.3.4 Two Dimensional Model Representations . . . . . . . . 92
   7.4 Alternative Gold Standards . . . . . . . . . . . . . . . . 97
       7.4.1 Moorfields Regression Analysis . . . . . . . . . . . 97
       7.4.2 Visual Field-based Classification . . . . . . . . . . 99

8. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 102
   8.1 Discussion of Modeling Results . . . . . . . . . . . . . . 102
   8.2 Discussion of Classification Results . . . . . . . . . . . 107
   8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . 112

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

LIST OF TABLES

Table                                                              Page

3.1 Inclusion Criteria . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Exclusion Criteria . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Clinical Testing . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 HRT-1 Raw Data Export File Format . . . . . . . . . . . . . . 19
4.1 Number of Radial Polynomial Coefficients by Order . . . . . . 34
4.2 Number of Wavelet Coefficients by Family and Order . . . . . . 46
5.1 One-Way Repeated Measures Analysis of Variance for Residual Model Error by Model Type . . . 54
5.2 Comparisons of Residual Model RMS Error . . . . . . . . . . . 54
6.1 Classification Sample Characteristics . . . . . . . . . . . . 61
6.2 Age Comparison of Stratified Training and Testing Partitions . . . 62
6.3 Classification Criteria for Glaucomatous Optic Neuropathy . . 67
7.1 Comparison of Classification Assignments by Stereographic Disc Photography (Photo Class) and Moorfields Regression Analysis (MRA) . . . 77
7.2 Comparison of Classification Assignments by Stereographic Disc Photography (Photo Class) and Glaucoma Hemifield Test (GHT) from Visual Field Testing; χ² = 85.95, p < .001 . . . 79
7.3 Qualitative Ranking of the Area Underneath the Receiver Operating Characteristic Curve . . . 81
7.4 Performance Comparisons of Training and Test Data Partitions . . . 86

LIST OF FIGURES

Figure                                                             Page

2.1  HRT-1 Stereometric Analysis of the Optic Nerve Head . . . . . 8
3.1  Subject Accountability and Exclusions . . . . . . . . . . . . 13
3.2  Comparison of Visual Field Sensitivity Stratified by Categories Assigned from Stereographic Disc Photos; GON = Glaucomatous Optic Neuropathy . . . 14
3.3  HRT-1 Sampling Resolution . . . . . . . . . . . . . . . . . . 18
3.4  Strategies for Handling Boundary Artifacts . . . . . . . . . 22
3.5  Data Centration Failure . . . . . . . . . . . . . . . . . . . 24
3.6  Data Centration Success . . . . . . . . . . . . . . . . . . . 25
3.7  Frequency Histogram for the Maximum Radius of the Data Array . . . 27
4.1  First 15 Geometric Modes of the Zernike Polynomials . . . . . 31
4.2  Multi-resolution Illustration of B-Spline Modeling . . . . . 35
4.3  One Dimensional Fourier Signal Analysis . . . . . . . . . . . 37
4.4  Two Dimensional Gabor Filter . . . . . . . . . . . . . . . . 38
4.5  Wavelet Analysis Concepts for a One Dimensional Signal . . . 39
4.6  Wavelet Decomposition of a Two Dimensional Signal . . . . . . 41
4.7  Hilbert Space Filling Curves: (a) 1st order; (b) 2nd order; (c) 4th order . . . 44
4.8  Wavelet Basis Functions . . . . . . . . . . . . . . . . . . . 45
5.1  Computational Time as a Function of the Number of Radial Polynomial Terms . . . 49
5.2  Computational Time (Mean ± SD) vs. Model Complexity . . . . . 50
5.3  Residual Model Error of Radial Polynomials as a Function of Model Complexity . . . 51
5.4  Residual Model Error as a Function of Model Complexity . . . 52
5.5  Distribution of Residual Model Error by Model Type . . . . . 55
5.6  Graphical Examples of Model Residual Errors . . . . . . . . . 57
5.7  Resolution Comparison of Zernike and Pseudozernike Modeling . . . 58
5.8  Comparison of Wavelet Reconstruction Errors . . . . . . . . . 59
6.1  Moorfields Regression Analysis . . . . . . . . . . . . . . . 65
6.2  Example Decision Tree . . . . . . . . . . . . . . . . . . . . 69
6.3  Illustration of Decision Tree Logic and Entropy Reduction . . 71
6.4  Illustration of 10-fold Cross-Validation . . . . . . . . . . 73
7.1  Comparison of ROC Curve Area for Zernike and Pseudozernike Models . . . 83
7.2  Comparison of ROC Curve Area as a Function of Model Complexity . . . 83
7.3  Classification Model Accuracy as a Function of Model Complexity . . . 85
7.4  Comparisons of ROC Curves for Optimal Classification Models . . . 86
7.5  Pseudozernike Decision Tree Classification Model . . . . . . 88
7.6  Pseudozernike Modes from the Decision Tree Classifier . . . . 89
7.7  Wavelet-based Decision Tree Classification Model . . . . . . 90
7.8  B-spline Decision Tree Classification Model . . . . . . . . . 91
7.9  Discriminant Features from Optimal Classification Models . . 92
7.10 Visualization of Median Pseudozernike Model Features . . . . 94
7.11 Median Wavelet Models for Each Category . . . . . . . . . . . 95
7.12 Median B-Spline Models for Each Category . . . . . . . . . . 96
7.13 Classification Performance Based on Moorfields Regression Analysis . . . 98
7.14 Classification Accuracy Based on Visual Field Standards . . . 100
7.15 ROC Curve Performance for Visual Field-based Classification . . . 100

CHAPTER 1

INTRODUCTION

Glaucoma is a leading cause of blindness [1]. In 2002, Prevent Blindness America and the National Eye Institute—National Institutes of Health estimated that nearly 2.2 million people 40 years or older had primary open angle glaucoma in the United States [2]. The diagnosis of glaucoma is frequently established through a combination of tests of visual function (visual field sensitivity) and observations of anatomical structure (the optic nerve head and retinal ganglion cell nerve fibers). Recent imaging developments such as scanning laser polarimetry, optical coherence tomography, and in vivo confocal scanning laser ophthalmoscopy have improved the clinical practitioner's ability to quantify structural features of the optic nerve head, to detect physiological signs of glaucomatous optic neuropathy earlier, and to describe those signs quantitatively in greater detail. These imaging capabilities present an opportunity to refine our understanding of the relationship between anatomical structural features of the optic nerve and the loss of visual function associated with glaucoma. The challenge for clinicians and researchers—and the primary focus of this research—is to make the most meaningful use of this structural information.

The thesis of this dissertation is that statistical learning methods provide an objective and quantitative means to analyze topographic structural features of the optic nerve head that facilitate early detection of glaucomatous optic neuropathy. In this research, alternative methods of quantifying and evaluating the topographic data from confocal scanning laser ophthalmoscopy are evaluated. Novel methods of modeling the structural data are proposed that permit classification based upon structural features derived directly from the data. This approach is then contrasted with existing clinical practices such as classification of disease by stereo disc photography [3, 4], classification by regression analysis [5, 6, 7], and classification by visual field sensitivity criteria [8, 9]. In summary, interpretation of modern optic nerve head imaging methods presents some new challenges. The evolution of computational modeling and machine learning techniques offers alternative methods to quantify and analyze these data. Application of these methods of analysis may offer better accuracy, greater consistency, and more objectivity when making diagnostic assessments.

This dissertation is organized as follows. Chapter 2 introduces methods previously used to examine, document, and classify the structure of the optic nerve head and delineates between glaucoma defined by glaucomatous optic neuropathy and glaucomatous field loss. Additional introduction is provided on the motivation to use machine learning and data mining methods for this application, as well as some discussion of previous related work in vision science. Chapter 3 describes the sources of data used in this research. This chapter also contains an introduction to confocal scanning laser ophthalmoscopy and the methods used to produce the raw numeric data that are the basis for the modeling and classification experiments. Chapter 4 describes the details of the methods used to model the structural features of the optic nerve head. The results of these structural modeling experiments are provided in Chapter 5. In Chapter 6 the methods of feature extraction and pattern classification are provided, and the results of the classification experiments are given in Chapter 7. Chapter 8 summarizes the results of this research, discusses the implications and potential applications, and provides some discussion of possible future extensions of this work.


CHAPTER 2

BACKGROUND

2.1 Structure—Function Relationships in Glaucoma

The discordance between observable structural signs of glaucomatous optic neuropathy and measurable loss of visual function is well recognized [8, 9, 10, 11, 12]. Although there are clearly exceptions to the rule, there is evidence to suggest that structural signs of optic neuropathy precede measurable loss of visual field sensitivity [13, 14, 15]. These findings may be supported in theory by either selective loss of large axons or non-selective axonal loss models of glaucoma [16, 17]. Nevertheless, either structural or functional signs of glaucoma may occur first, and progressive change in either domain has been sufficient to diagnose glaucoma in previous studies [18, 19, 13]. This enigma of structure and function in glaucoma has led to a delineation between the loss of vision in glaucoma and the structural signs of disease, now commonly referred to as glaucomatous optic neuropathy, which may be present with or without measurable loss of visual function. This research is intended to advance the understanding of the clinically observable structural features of glaucomatous optic neuropathy and, by doing so, reach toward a better understanding of how to relate these two dimensions of the disease.

2.2 Glaucomatous Optic Neuropathy

Glaucomatous optic neuropathy is characterized by thinning of the neuroretinal rim at the optic nerve head, deepening and enlargement of the optic cup (especially vertically), sectoral retinal nerve fiber layer defects, hemorrhages at the optic disc, and other signs [20]. Since progression is a hallmark of glaucomatous optic neuropathy in uncontrolled disease, documenting the appearance of the optic nerve and surrounding peripapillary nerve fiber layer is an important aspect of disease diagnosis and management.

Historically, clinical practitioners have associated the features listed above with glaucoma and have recorded their observations graphically with schematic drawings. Still a mainstay of clinical practice [20], these drawings are intended to help practitioners judge change and thereby determine the need for and effects of medical and surgical interventions. More recently, photographic documentation of the optic nerve head has supplemented these schematic drawings. Stereo disc photography is the current standard of care in the management of glaucoma. Modern stereoscopic photography of the optic disc offers examiners a highly magnified view of the optic nerve that preserves depth, texture, contour, and gradient information. This view is further improved by stability, as it is uninterrupted by eye movements or blinking, and, when performed properly, offers optimal illumination. The ability of a trained expert examiner to observe, compare, and judge stereo disc photographs as normal, abnormal, stable, or progressing is a highly developed skill that is irreplaceable.


Nonetheless, there are two important limitations of stereo disc photography. First, the images provide geometric, spatial, and structural information that is difficult to summarize with text in a medical record. Second, this information is inherently qualitative and therefore difficult to analyze quantitatively. Expert examiners can benefit from analytical tools that provide quantifiable information to support clinical decision making, which may reduce subjective bias and improve precision.

Recent advances in biomedical imaging offer quantitative imaging alternatives for the management of glaucoma. Three methods have become important clinical supplements to stereo disc photography: optical coherence tomography, scanning laser polarimetry, and confocal scanning laser ophthalmoscopy. Each method offers a different approach to quantitative imaging. Optical coherence tomography provides cross-sectional detail of both retinal and peripapillary tissue with nearly histological resolution [21]. Scanning laser polarimetry provides structural details regarding the density of retinal ganglion cell axons in the peripapillary region through measurements based upon the birefringent properties of tissue in this region [22]. Confocal scanning laser ophthalmoscopy provides a topographic surface map of structures in the peripapillary region. While there are differences, some of the information captured by stereoscopic disc photography is also captured with these quantitative imaging methods. Much of the summary information that clinicians may derive from evaluation of stereo disc photographs, e.g. neuroretinal rim thickness or physiological cup dimensions, can also be derived from a confocal scanning laser ophthalmoscopy examination. The advantage of the Heidelberg Retina Tomograph (HRT) confocal scanning laser ophthalmoscopy evaluation is that each of these parameters is quantifiable.

There are features related to detection of glaucomatous optic neuropathy that are not as easily detected from a confocal scanning laser ophthalmoscopy analysis, e.g. optic disc hemorrhage or undercutting of the cup at the disc margins. These features are more easily detected from the color and contour information available in disc photographs. Each of these imaging methods provides distinctly different information that is complementary to existing diagnostic information. The advantage of all of these methods is that they provide quantitative information that augments results of conventional diagnostic tests. Yet, this additional information creates new challenges as well. More information and new ways of looking at the structure of the optic nerve create a dilemma of what to do with this extra information. How should one interpret this additional information in the context of existing data? Do these data provide superior information that deserves priority consideration in clinical decision making? In short, clinicians are increasingly facing the classical dilemma of information theory—faced with an abundance of data, what is the best method to capture the most relevant information in the most efficient manner?

2.3 Evolution of HRT Data Analysis

Several metrics have been developed to assist practitioners with interpretation of HRT examination results. These include a form of multivariate linear regression (the Moorfields Regression Analysis), linear discriminant functions, and, more recently, implementations of machine learning classifiers [23, 24].


Figure 2.1: HRT-1 Stereometric Analysis of the Optic Nerve Head

2.3.1 Traditional Metrics

The first available metrics from the Heidelberg Retina Tomograph (HRT) confocal scanning laser ophthalmoscope were indices computed to quantify commonly estimated clinical observations such as the cup-to-disc ratio, cup shape, and neural rim thickness (Figure 2.1). Other parameters were computed to provide practitioners new ways to consider their clinical observations. Additional data enabled additional analysis. Cross-sectional classification of glaucoma from newly derived HRT parameters is still an active area of clinical research [25, 26, 27, 28].


2.3.2 Moorfields Regression Analysis

Garway-Heath and colleagues developed the Moorfields Regression Analysis, a multivariate statistical method that discriminates between individual test examination results and a normative comparison group [5]. This method is based on age and neuroretinal rim thickness and is described in greater detail in the Methods section of the classification experiments in Chapter 6. In essence, the authors have extended the quantitative analysis capabilities of practitioners by providing statistical modeling routines to compare individual examination results with a normative database. Comparisons of this method with evaluation and classification based on stereo disc photography have been described by several investigators [5, 6, 7, 28].
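
Although the details of the Moorfields Regression Analysis are deferred to Chapter 6, the general idea of a normative regression classifier can be illustrated with a brief sketch. The predictors, transform, and prediction limit below are illustrative assumptions and do not reproduce the published Moorfields model.

```python
import numpy as np

def fit_normative_classifier(rim, disc_area, age, z=1.96):
    """Illustrative sketch only: regress a neuroretinal rim measure on disc
    area and age in a normal reference sample, then flag test eyes whose rim
    measure falls below the lower prediction limit. The published Moorfields
    Regression Analysis differs in its transforms, sectors, and limits.
    """
    X = np.column_stack([np.ones_like(disc_area), disc_area, age])
    beta, *_ = np.linalg.lstsq(X, rim, rcond=None)
    resid_sd = np.std(rim - X @ beta, ddof=X.shape[1])

    def classify(new_disc_area, new_age, new_rim):
        predicted = beta[0] + beta[1] * new_disc_area + beta[2] * new_age
        lower = predicted - z * resid_sd      # crude lower prediction bound
        return "outside normal limits" if new_rim < lower else "within normal limits"

    return classify
```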

2.3.3 Linear Discriminant Functions

Other investigators have developed additional multivariate linear regression methods to model relationships between structural features of the optic nerve and the diagnosis of glaucoma [25, 29, 28]. Their results are encouraging, but a thorough independent validation of initial findings suggests that improvement is possible and desirable [30]. In much of this previous work, researchers have evaluated quantifiable structural features derived from these modern imaging methods against disease defined by functional visual loss as the classification standard. This is an attempt to use independent information as a classification standard. Nevertheless, since structural signs of disease usually precede measurable visual loss, classification standards based on visual field results will have reduced sensitivity in early disease. The more difficult and relevant challenge is to identify the earliest signs of structural damage, before visual loss is measurable.

In this case, the appropriate classification standard for disease should not be visual field performance using standard automated threshold perimetry. Medeiros and colleagues recently argued in support of this perspective, suggesting that progressive structural change is a better definition of early disease [31]. Regardless of whether this definition represents an acceptable clinical standard for the diagnosis of glaucoma, the reliability of these new imaging methods for detecting structural change is an appropriate benchmark for comparisons of the quality of information provided. This applies both to the fidelity of the measurements and to any analytical routines provided to the users of these devices.

2.3.4 Applications of Machine Learning and Data Mining

The use of complementary information to make the diagnosis of glaucoma and monitor disease progression is not new. Assessment of the functional damage caused by uncontrolled glaucoma via visual field testing is standard practice, and the integration of these functional measures with structural data is a familiar challenge to practitioners. Some strict definitions of glaucoma require not only structural evidence of disease, but a measurable loss of visual function as well [18, 19]. It is debatable whether a progressive optic neuropathy in the absence of visual field loss can be diagnosed as glaucoma. Likewise, a diagnosis of glaucoma with characteristic progressive field loss in the absence of structural signs of disease is also equivocal [18, 19]. It is important to appreciate the distinction between the types of information under consideration. Visual field tests measure different information, and the reliability of this test information is dependent upon entirely different factors, e.g. patient attention and cooperation, among others.


While the structure and function of the optic nerve fibers are related in some way, the complexity of this relationship remains obscure. Machine learning methods are well suited to this type of complex and likely non-linear relationship. In one of the first applications of this approach, Sample and colleagues used a neural network to discriminate between normal and glaucomatous visual field data [32]. Most published work to date using machine learning classifiers with HRT data consists of using native machine-derived indices as features for classification models. More recently, Bowd and colleagues have begun working with local sector-based summary information [24], as opposed to the global summary indices from the scanning laser polarimeter. This approach is another step toward the use of raw structural information as the basis for disease classification.

In summary, these new quantitative imaging methods have created new research opportunities. There are opportunities to apply image and signal analysis techniques from the fields of electrical engineering and computational sciences to facilitate data representation. There are currently few investigators pursuing this approach, and it is a challenging and promising direction [33, 34]. There is also a natural bridge between computer and information sciences and the diagnostic analysis of these data. This is an active area of research, with investigators applying tools from several domains including information theory and machine learning methods [24, 35]. The contributions of this research are to extend data representation methods using techniques from image processing and to use structural features derived from these models as unbiased input parameters for novel methods of classification.

CHAPTER 3

METHODS

3.1 Sample Description

The data used in this research come from a larger longitudinal dataset acquired prospectively for the purpose of investigating disease progression in glaucoma. This research is conducted under the review and approval of the Institutional Review Boards of Legacy Health Systems of Portland, Oregon and The Ohio State University. A diagram accounting for the subjects of this study is provided in Figure 3.1. The total available sample was 356 subjects. All subjects were recruited by the research staff from the community in the Portland, Oregon metropolitan region and were examined by experienced research staff at Legacy Health Systems' Discoveries in Sight clinical facilities. Both eyes of each subject were examined over the course of this study for a total possible sample size of 712 eyes. These 356 subjects are divided into two groups that are referred to throughout this thesis as the study and the comparison groups. These group assignments, however, were not the basis for disease classification labels. The eyes of each subject, regardless of group designation, were subsequently classified as normal or glaucomatous optic neuropathy in a separate analysis described in Chapter 6. The data from one eye of each patient were randomly selected for the modeling experiments described in Chapter 4 (n = 276), and the fellow eyes were used for the classification experiments described in Chapter 6 (n = 275).

Figure 3.1: Subject Accountability and Exclusions

The study group included both eyes from 246 subjects (492 possible eyes) and was the primary cohort for the prospective study. The inclusion criteria for this cohort were generally defined as subjects previously diagnosed with early stage glaucoma, or those suspected of having a greater than normal risk to develop glaucoma. Subjects with either suspected or early glaucoma were included if glaucomatous optic neuropathy was suspected based on inspection of the optic nerve head (e.g. cup-disc ratio asymmetry > 0.2, possible neuroretinal rim notching or narrowing, and disc hemorrhage), with or without ocular hypertension. For the purpose of this study, ocular hypertension was defined as an untreated intraocular pressure ≥ 22 mm Hg. In addition to the above criteria, subjects were required to have at least one of the additional risk factors listed in Table 3.1. The exclusion criteria for these subjects are shown in Table 3.2. A comparison of visual field performance—Mean Deviation values from standard threshold automated perimetry—for these subjects is shown in Figure 3.2.


Figure 3.2: Comparison of Visual Field Sensitivity Stratified by Categories Assigned from Stereographic Disc Photos; GON = Glaucomatous Optic Neuropathy

This comparison demonstrates the similarity of these two groups and confirms that the glaucomatous disease process, if present, is at the earliest stages. The comparison group comprised both eyes from 110 subjects (220 possible eyes). This group was designed to provide baseline information on the variability of normal subjects on the full complement of clinical tests performed on subjects in the study group. In addition to the exclusion criteria identified in Table 3.2, subjects in the comparison group were also required to meet additional conditions to ensure that they were at very low risk for glaucoma. These additional criteria included: normal anterior and posterior segment examination findings by slit-lamp and direct ophthalmoscopy; no history of ocular abnormalities, injuries, or surgeries (other than successful cataract surgery); no known family history of glaucoma; no diabetes or other systemic diseases known to affect vision; no use of medications known to affect vision; a minimum corrected visual acuity of 20/40; IOP < 22 mm Hg; refractive error ≤ −6.00 D sphere with ≤ −2.00 D cylinder; and reliable visual field test results. Both eyes of each comparison subject had to satisfy the above conditions to be eligible.

Item
- History of glaucoma
- History of migraine
- Raynaud's syndrome or vasospasm
- African-American ancestry
- Age > 70 years
- History of systemic hypertension
- Diet-controlled diabetes
- Visual field loss (abnormal GHT or PSD, p < .05%)

GHT = Glaucoma hemifield test; PSD = Pattern standard deviation

Table 3.1: Inclusion Criteria

Item
- Other previous or current ocular pathology
- Previous ocular surgery (except successful cataract surgery)
- Prior neurological surgery or disease
- Visual acuity worse than 20/40 in either eye
- Myopic refractive error > −6 D
- Diabetes requiring medication
- Mean deviation on full-threshold SAP 24-2 program > −6 dB

SAP = standard automated threshold perimetry

Table 3.2: Exclusion Criteria


Test
- Risk Factor Assessment
- Keratometry
- Visual acuity
- Refraction
- Intraocular pressure (Goldmann IOP)
- Corneal thickness
- Static automated threshold perimetry (SAP)
- Short-wavelength automated perimetry (SWAP)
- Frequency doubling perimetry (FDT)
- Multifocal visually evoked potential (VEP)
- Confocal scanning laser ophthalmoscopy (HRT)
- Stereo disc photography

Table 3.3: Clinical Testing

The clinical tests performed on both groups of subjects are listed in Table 3.3. In addition to these entry criteria required for participation in the longitudinal cohort study, there were additional constraints added for the purposes of this research. First, because this research is primarily focused on modeling the structural features of the optic nerve head, subjects were required to have a full complement of tests to allow measurement and classification of these features, including confocal scanning laser ophthalmoscopy (HRT), stereo disc photography, and standard automated perimetry (SAP).


If any of these clinical tests were missing or acquired more than 30 days apart [10], the subject's data were excluded from further consideration. To ensure that the HRT data evaluated were of high quality, each individual examination was reviewed. HRT data were excluded for poor scan quality in 9 eyes. These HRT exclusion criteria are explained in detail below, after some introduction to the relevant technical and procedural details of the confocal scanning laser ophthalmoscopy examination.
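
For illustration, the 30-day acquisition-window rule can be expressed as a simple date check. The test labels and dates below are hypothetical and do not correspond to fields in the study database; this is a minimal sketch of the rule, not the original eligibility software.

```python
from datetime import date

REQUIRED_TESTS = ("HRT", "stereo photos", "SAP")   # tests required for inclusion

def within_window(exam_dates, max_days=30):
    """Return True when all required tests were acquired within 30 days of
    one another, per the eligibility rule described above."""
    days = [exam_dates[test].toordinal() for test in REQUIRED_TESTS]
    return (max(days) - min(days)) <= max_days

# Example: a record whose tests span 12 days satisfies the criterion.
eligible = within_window({"HRT": date(2001, 3, 1),
                          "stereo photos": date(2001, 3, 5),
                          "SAP": date(2001, 3, 13)})
```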

3.2 Confocal scanning laser ophthalmoscopy

This research is based upon the quantitative data derived from the Heidelberg Retina Tomograph (HRT-1) confocal scanning laser ophthalmoscope (Heidelberg Engineering, GmbH, Dossenheim, Germany). These tomographic examinations were performed by an experienced ophthalmic technician. Each examination consists of a series of 32 en face planar sections within a 10° × 10° field. Each section is captured as a 256 × 256 pixel array with a lateral resolution of approximately 10 µm/pixel and a longitudinal (depth) resolution between sections of approximately 60 µm (Figure 3.3). Corneal curvature measurements were used to correct images for magnification errors, and corrective lenses were used when astigmatism was > 1 D. Each eye was imaged a minimum of three times during a single session. The quality of each image sequence was evaluated by the technician. The best scans were determined as those with the highest signal-to-noise ratio, best lateral and vertical centration, and best focus. The three best scans were then combined to produce a mean image. These mean images were computed by aligning the individual exams using the native alignment algorithms of the HRT software (version 3.04).


Figure 3.3: HRT-1 Sampling Resolution

After the mean image was computed, the standard deviation and 95% confidence interval of the data were calculated to estimate the precision of the data captured. As mentioned above, mean examinations were excluded if the mean exam quality was poor; more specifically, data were excluded if the calculated standard deviation of the mean exam was greater than 50 µm [15, 36]. Subsequently, these trained technicians drew contour lines outlining the disc margin while simultaneously viewing a copy of stereoscopic photographs of the optic disc [23]. While the data used in these modeling experiments did not depend upon the user-drawn contour lines, the classification results described later in this dissertation were compared with class assignments from the Moorfields Regression Analysis, which is dependent upon these user-drawn contour lines. The raw tomographic elevation data were exported from the HRT database using experimental data export functions. The HRT viewing software version used in these experiments was 1.4.1.5. The examination files were exported in a structured binary file format. Each exported file begins with a 512-byte header containing the patient and examination information fields listed in Table 3.4.


Field label    Field Description
version        software version (i.e. v3.04)
Lname          Patient last name
Fname          Patient first name
sex            Patient sex (♂ or ♀)
DOB            Patient date of birth
patID          Patient ID number
eye            Examined eye (R or L)
exam date      Examination date
focus          Scan focus in diopters
depth          Scan depth in mm
operator       Operator's initials
scale x        Pixel width in µm
scale y        Pixel height in µm

Table 3.4: HRT-1 Raw Data Export File Format

The remainder of the file contains 131,072 bytes of topographic data formatted as a 256 × 256 array of signed short integers (2 bytes/pixel) that represent the height values in microns. Invalid height values (e.g. missing data at the edges of an exam) are represented by a value of −32,768. The height values of the mean examination data are mapped in a relative and tilted coordinate system. The remaining data are a block of 65,536 bytes of pixel intensity values from the reflectance image, formatted as a 256 × 256 array of unsigned bytes with values ranging from 0 (black) to 255 (white).
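
For illustration, the file layout described above can be read with a few lines of Python. This is a minimal sketch based only on the byte counts given here; the little-endian byte order and the decision to leave the header unparsed are assumptions rather than documented details of the export format.

```python
import numpy as np

HEADER_BYTES = 512      # patient and examination fields (Table 3.4)
GRID = 256              # both data blocks are 256 x 256 arrays
MISSING = -32768        # sentinel for invalid height values

def read_hrt_export(path):
    """Read one exported HRT-1 mean examination into numpy arrays."""
    with open(path, "rb") as f:
        header = f.read(HEADER_BYTES)
        heights = np.frombuffer(f.read(2 * GRID * GRID), dtype="<i2")
        reflectance = np.frombuffer(f.read(GRID * GRID), dtype=np.uint8)

    topo = heights.reshape(GRID, GRID).astype(float)
    topo[topo == MISSING] = np.nan        # mark invalid heights as missing
    refl = reflectance.reshape(GRID, GRID).copy()
    return header, topo, refl
```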


3.3 Data Pre-processing

3.3.1 Boundary Conditions

The mean topographic examination data are not immediately useful. The exported data often have several attributes that, if ignored, would interfere with subsequent analysis. For example, many individual examinations are not perfectly centered relative to other individual examinations in the same exam sequence. When these non-aligned examinations are combined to produce a mean examination, this results in regions where the data do overlap and a mean elevation value can be computed. It also results in regions where mutual data do not exist and no mean can be computed. This alignment failure is handled by the HRT system software by substitution of a missing value placeholder, e.g. −32,768. As a consequence, the mean image may be padded with missing values at the border, and the width of this border is related to the degree of misalignment in the examination sequence.

There are several possible strategies for handling these border artifacts. Some of the most elegant solutions come from image processing applications, where image registration and boundary artifacts are commonly encountered. Some of the most frequently used solutions to this boundary problem are (1) missing value substitution (e.g. −32,768), (2) border value replication, and (3) mirror symmetric border replication. Examples of each of these strategies are shown in Figure 3.4. Note that the source data frame in Figure 3.4(a) has dimensions of 256 × 256 pixels. These data were embedded in a padded array with dimensions of 512 × 512 pixels; however, the final cropped data array had maximum dimensions of only 180 × 180 pixels. Consequently, the chance of including boundary artifacts in the final data array was small and related to the size of any initial missing border values as well as the magnitude of translation needed to center the final image.


One of the goals of this research is to model the raw data and then determine the fidelity of these models. Therefore, boundary solutions that preserve the mean and variability of the existing data values are less likely to be biased than solutions that replicate a single scalar value for all missing points [37]. The boundary solution used in this research was a mirror symmetric boundary (Figure 3.4(d)).
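
The three border strategies correspond directly to standard array padding modes. The following is a minimal sketch, assuming the topography is held as a floating-point array with missing values marked as NaN; it is an illustration of the strategies, not the original processing code.

```python
import numpy as np

def pad_topography(topo, pad=128, strategy="symmetric"):
    """Embed a 256 x 256 topography in a larger array using one of the three
    border strategies discussed above; pad=128 grows the array to 512 x 512.
    """
    if strategy == "missing":        # (1) missing value substitution
        return np.pad(topo, pad, mode="constant", constant_values=np.nan)
    if strategy == "replicate":      # (2) border value replication
        return np.pad(topo, pad, mode="edge")
    if strategy == "symmetric":      # (3) mirror symmetric border replication
        return np.pad(topo, pad, mode="symmetric")
    raise ValueError(f"unknown strategy: {strategy}")
```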

3.3.2 Data Centration

In addition to misalignment of individual examinations from the series that comprise the mean examination, it is also possible for the entire sequence to be biased laterally or vertically by the examiner. As a result, centering the mean exam sequence can create boundary artifacts where none were present on the original examination, or enlarge existing boundary artifacts. Deciding the correct point of centration is crucial and may influence the results of any cross-sectional or longitudinal analysis of the data. There are two logical reference points: the optic disc and the optic cup. Either of these choices is arguably correct in the context of the objectives of this research. Furthermore, the selection of either implies some limitations, and reliable location of either reference point by automated methods can be a challenge. The physiological structure of the optic nerve head is extremely variable between individuals and differs in diameter, elliptical regularity, concavity (or convexity), and tilt, among other attributes. This variability makes automated location of either the cup or disc center extremely challenging. Recent work by Chrastek and colleagues describes an image processing strategy for automated location of the optic disc and cup [33].



Figure 3.4: Strategies for Handling Boundary Artifacts (a) Source data; (b) Missing value substitution; (c) Border replication; (d) Mirror symmetric border replication


Their modest success in a fairly homogeneous patient population attests to the difficulty of this problem. Centering the analysis on the optic disc is an especially good choice if the analysis objective is longitudinal change within an individual. Since the optic disc size is less variable for an individual over time, reliable location of the disc center between exams will facilitate detection of longitudinal progression. For a cross-sectional analysis, selection of the optic cup as a central reference point may be preferred because it is more easily located and sequential registration of data from the same subject is not necessary. The location of the physiological cup center may vary relative to the disc center between individuals. For example, images aligned on the optic cup may be skewed toward the temporal retina if the optic disc is tilted. Eyes with relatively small, flat, or even elevated optic discs may be less skewed. The location of the cup center may also vary over time, e.g. with cup enlargement in glaucomatous optic neuropathy. Nonetheless, when present, the cup is more easily located and may also represent the most relevant point of interest with respect to pathophysiological change in glaucomatous optic neuropathy.

A first approximation of the center of the optic cup was determined by locating the two dimensional mean of the data array of elevation values. This strategy performed equally well for subjects with physiologic cupping and for subjects with elevated disc features. In effect, this strategy locates the absolute value of the central mode of a two dimensional Gaussian distribution fit to the data. Next, the width of this distribution of elevation values was estimated by computing the two dimensional standard deviation of the elevation data array. Then, a threshold of [1.5 × (2-D mean)] + (1.0 × SD) of these elevation data was calculated to further refine a region of interest. All data points that fell within this region were used to compute the centered x and y coordinates by taking the median of the x and y locations of the values within this restricted region.


Figure 3.5: Data Centration Failure (a) Source data; (b) Failed centration; (c) Reprocessed data

Each resulting centered image was reviewed to assess the adequacy of centration. The criterion used to determine adequate centration was the difference between either the vertical or the horizontal cup boundary and the array border. If this difference was ≤ 10 pixels, the image centration was judged adequate. In some instances this strategy did not locate an acceptable centration point (Figure 3.5), and these cases were generally easily identified by inspection. In these cases the centration criteria were modified on a case-by-case basis until the centration criterion was achieved. There were a total of 43 of 561 cases (8%) that required this additional processing, and the majority of these instances were adequately centered by modifying the mean and standard deviation centration criteria to 1.0 × (2-D mean). These centered data arrays became the input for the next phase of data pre-processing.
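
One possible reading of this centration heuristic is sketched below. The use of absolute elevation values and the fall-back when the thresholded region is empty are assumptions rather than details taken from the original analysis software.

```python
import numpy as np

def approximate_cup_center(topo, mean_weight=1.5, sd_weight=1.0):
    """Approximate the cup center of an HRT elevation array.

    Sketch of the procedure described above: threshold the absolute elevation
    values at mean_weight * |2-D mean| + sd_weight * SD and take the median
    row and column of the pixels that fall inside that region.
    """
    valid = ~np.isnan(topo)
    mu = abs(topo[valid].mean())            # two-dimensional mean elevation
    sd = topo[valid].std()                  # two-dimensional standard deviation
    threshold = mean_weight * mu + sd_weight * sd
    rows, cols = np.where(valid & (np.abs(topo) >= threshold))
    if rows.size == 0:                      # fall back to the geometric center
        return topo.shape[0] // 2, topo.shape[1] // 2
    return int(np.median(rows)), int(np.median(cols))
```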


Figure 3.6: Data Centration Success (a) Source data; (b) Centered and cropped Cartesian coordinate data; (c) Centered and cropped polar coordinate data

Translation of the data array to a newly centered location results in a border of missing values in the original data array. This potential problem was managed by embedding the original data array in a larger empty array that was seeded with missing values to preserve the integrity of the original data. This larger intermediate array was then padded with mirror-reflected boundaries and cropped to 256 × 256, the original dimensions of the source data array. The result of these operations is a data structure that is equal in dimensions to the original data array, well centered on a common reference point, and padded with mirror-reflected borders wherever missing data are encountered. An example of this processed data is shown in Figure 3.6. If the original data capture was poorly centered on the optic disc, it is possible that some of the border data were imputed, and the amount of imputed data would be related to the centration error. To further reduce the influence of missing or imputed data at the borders, the final array was cropped in Cartesian coordinates to a 180 × 180 pixel array.


In polar coordinates the data were cropped to a radius of 90 pixels. The selection of this 90 pixel dimension was based upon an optimization of the relationship between the maximal diameter of the anatomical dimensions of the optic disc and physiological cup versus minimal inclusion of boundary artifacts. With a magnification factor of approximately 10 µm/pixel, this 90 pixel radius encompasses an area of roughly 2.5 mm² in polar coordinates and 3.2 mm² for the 180 × 180 pixel Cartesian crop. With an average optic nerve head area between 1.8 and 2.7 mm² [38, 39], this would include the nerve head as well as some of the immediate peripapillary area in most subjects. The number of cases possibly affected by border artifacts due to this 90 pixel radius was evaluated for all eyes in the data set. The frequency distribution of the maximum radius of data that did not include boundary artifacts is shown in Figure 3.7. The number of cases with possible border artifacts is indicated by the left tail of this distribution where the bin values are less than 90 pixels. In the majority of instances where boundary artifacts did occur, the data were re-centered and this resulted in a maximal radius that was greater than 90 pixels. In a few remaining instances (7 eyes) the border artifacts persisted and a small portion of the mirror symmetric boundaries were included in the centered data array. No records were excluded due to these centration failures. These centered and cropped arrays in Cartesian and polar coordinate form represent the final pre-processed data used in this research.
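A compact NumPy sketch of the re-centering, mirror-reflection padding, and cropping steps is shown below. It assumes a square NaN-free 256 × 256 input and expresses the shift-then-mirror-pad logic in one step; it is an illustrative approximation, not the MatLab implementation used in this research.

import numpy as np

def recenter_with_mirror_border(elev, center, crop=180):
    # Shift the array so that `center` (row, col) moves to the array center,
    # fill the border exposed by the shift with a mirror reflection of the
    # data, and crop to the central region used for modeling.
    n = elev.shape[0]                      # assumes a square array, e.g. 256
    dr = n // 2 - center[0]
    dc = n // 2 - center[1]
    pad = max(abs(dr), abs(dc), 1)
    big = np.pad(elev, pad, mode="symmetric")
    shifted = big[pad - dr:pad - dr + n, pad - dc:pad - dc + n]
    lo, hi = n // 2 - crop // 2, n // 2 + crop // 2
    return shifted[lo:hi, lo:hi]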


Figure 3.7: Frequency Histogram for the Maximum Radius of the Data Array


CHAPTER 4

STRUCTURAL MODELING METHODS

This chapter describes the methods used to model the structural features of the optic nerve head. The objectives of these modeling experiments are to derive models of the raw data that will achieve both feature reduction and faithful representation of the data. The goal is to preserve the detailed features of the data as well as any spatial correlation within the data, while reducing the total amount of information needed to capture these relevant details. Multiple candidate models are evaluated including: orthogonal radial polynomials, B-spline methods, and wavelet transformations. The data used for these experiments were randomly sampled from one eye of each subject who met the study criteria (n = 551). The final number of eyes included in this sample was 276 (Figure 3.1). The HRT data array from each of these records was modeled using each of the candidate models with varying complexity to evaluate the compromises between residual model error, model complexity, and computational time.


4.1 Radial Polynomials

There are several orthogonal polynomial functions that have been used to model data for the purpose of information extraction including Legendre, Zernike, Pseudozernike, and Chebyshev. Several of these have been used for pattern extraction in image analysis [40]. Originally derived for the purpose of describing optical aberrations in phase-contrast microscopy, Zernike polynomials have been particularly useful in physical and physiological optics [41]. Nonetheless, their mathematical properties, orthogonality and orthonormality, have led others to use Zernike polynomials for a wide range of problems [42, 43, 44].

4.1.1 Zernike Polynomials

Zernike polynomials are an infinite series of orthogonal, circular polynomial functions with a well-established relationship to optical function [45]. Advantages of using Zernike polynomials over other orthogonal polynomials to capture two dimensional signal features include relative insensitivity to noisy signals and low information redundancy [40]. Webb first suggested their use for modeling the ocular surfaces [46]. Others have validated the accuracy of using Zernike polynomials to describe the corneal surface and have refined their use [47]. Vision scientists frequently use Zernike polynomials because they provide an accurate model of the ocular surfaces and optical wavefront aberrations [44]. In previous work, collaborators and I have used these polynomial functions to model corneal surface features [48, 49, 50, 51]. This work describes one of the first applications of modeling structures of the optic nerve head using these functions.


Zernike polynomials are series of geometric modes that are both orthogonal and orthonormal over the unit circle. Their orthogonality is defined by the following relation:

\[
\int_{0}^{2\pi}\!\!\int_{0}^{1} Z_n^m(\rho,\theta)\, Z_{n'}^{m'}(\rho,\theta)\, \rho \, d\rho \, d\theta \;=\; \frac{\pi}{2(n+1)}\,\delta_{nn'}\,\delta_{mm'} \tag{4.1}
\]

where δ is the Kronecker delta function. Their computation results in a series of linearly independent circular geometric modes that are orthonormal with respect to the inner product given above. Zernike polynomials are composed of three elements: a normalization coefficient, a radial polynomial component and a sinusoidal azimuthal component [45]. The general form for Zernike polynomials is given by:

\[
Z_n^m(\rho,\theta) \;=\;
\begin{cases}
\sqrt{2(n+1)}\; ZR_n^m(\rho)\cos(m\theta) & \text{for } m > 0\\[4pt]
\sqrt{2(n+1)}\; ZR_n^{|m|}(\rho)\sin(|m|\theta) & \text{for } m < 0\\[4pt]
\sqrt{n+1}\; ZR_n^0(\rho) & \text{for } m = 0
\end{cases} \tag{4.2}
\]

where n is the radial polynomial order and m represents azimuthal frequency. The normalization coefficient is given by the square root term preceding the radial and azimuthal components. The radial component of the Zernike polynomial, the second portion of the general formula, is defined as:

\[
ZR_n^m(\rho) \;=\; \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s\,(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\;\rho^{\,n-2s} \tag{4.3}
\]

There are restrictions on the combinations of n and m that will yield valid Zernike polynomials. The value of n is either a positive integer or zero. For any given value of n, m can only take the values −n, −n + 2, −n + 4, . . . , n. In other words, n − |m| must be even and |m| ≤ n. Any invalid combination results in a radial polynomial component of zero. These geometric modes increase in complexity with increasing polynomial order, n.

Figure 4.1: First 15 Geometric Modes of the Zernike Polynomials

To provide some intuition, the first 15 geometric modes of the Zernike polynomials are shown in Figure 4.1. Surface complexity increases by row (polynomial order), and the radial location of the sinusoidal surface deviations trends toward the periphery of the unit circle with increasing azimuthal magnitude, e.g. away from the central column of Figure 4.1. The key variables that influence the fidelity of a surface modeled with Zernike polynomials are (1) the number of terms, or coefficients, in the model, (2) the diameter over which the data will be normalized to the unit circle, (3) the sampling of the data, e.g. polar, Cartesian grid, or random, and (4) the total number of data points available for fitting. To construct a model of the HRT data using Zernike polynomials it is important to consider how many terms are sufficient to faithfully model the data. As additional terms are included, the residual error of the model will decrease asymptotically [48].


Conversely, the computational expense will increase in a manner that is highly dependent upon the methods of polynomial computation. Using current implementations, colleagues and I have shown that the rise in computational time is nearly linear as additional coefficient terms are included [51]. Adding additional terms to the model also increases the risk of producing a model that has overfit the data. This issue has been previously addressed by Iskander and others for modeling the corneal surface [47, 52]. Iskander applied a bootstrap method to minimize the mean square error for a Zernike polynomial model of normal and keratoconic corneal surfaces to determine the number of polynomial terms needed to faithfully represent the corneal surface [47]. Colleagues and I have extended this work to thoroughly address this issue with regard to modeling the corneal surface [48, 49, 50, 51]. Nevertheless, there are important differences between fitting videokeratography data (7,000 data points over a 10 mm surface with polar sampling) and fitting the HRT-1 data (65,000 data points over 1,500 µm with Cartesian grid sampling). An analysis of Zernike polynomial model error as a function of model complexity is performed in this research and results are shown in Chapter 5.
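Equations 4.2 and 4.3 translate directly into code. The following Python sketch evaluates a single Zernike mode at sampled polar coordinates; it is a direct transcription of the formulas above for illustration, not the optimized MatLab routines referenced in this section.

import math
import numpy as np

def zernike_radial(n, m, rho):
    # Radial component ZR_n^m(rho) from Eq. (4.3).
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        out += (num / den) * rho ** (n - 2 * s)
    return out

def zernike(n, m, rho, theta):
    # Normalized Zernike mode Z_n^m(rho, theta) from Eq. (4.2); invalid
    # (n, m) combinations return zero, as noted above.
    if abs(m) > n or (n - abs(m)) % 2:
        return np.zeros_like(rho, dtype=float)
    radial = zernike_radial(n, m, rho)
    if m > 0:
        return math.sqrt(2 * (n + 1)) * radial * np.cos(m * theta)
    if m < 0:
        return math.sqrt(2 * (n + 1)) * radial * np.sin(abs(m) * theta)
    return math.sqrt(n + 1) * radial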

4.1.2 Pseudozernike Polynomials

Pseudozernike polynomials are another family of orthogonal polynomial functions first described by Bhatia and Wolf [53]. Like Zernike polynomials, these functions are an infinite series of circular modes that are orthogonal over the normalized unit circle. Pseudozernike moments are an improvement on the original Zernike moments, providing better noise resistance and less information redundancy [40].


The general form for pseudozernike polynomials is given by:

\[
PZ_n^m(\rho,\theta) \;=\;
\begin{cases}
\sqrt{2(n+1)}\; PR_n^m(\rho)\cos(m\theta) & \text{for even } n,\ m \neq 0\\[4pt]
\sqrt{2(n+1)}\; PR_n^m(\rho)\sin(m\theta) & \text{for odd } n,\ m \neq 0\\[4pt]
\sqrt{n+1}\; PR_n^m(\rho) & \text{for } m = 0
\end{cases} \tag{4.4}
\]

Similarly, the radial portion of the pseudozernike polynomial is defined as:

\[
PR_n^m(\rho) \;=\; \sum_{s=0}^{n-|m|} \frac{(-1)^s\,(2n+1-s)!}{s!\,(n-|m|-s)!\,(n+|m|+1-s)!}\;\rho^{\,n-s} \tag{4.5}
\]

and is related to the radial component of the Zernike series in the following manner:

\[
\rho\, PR_n^m(\rho^2) \;=\; ZR_{2n+1}^{2m+1}(\rho) \tag{4.6}
\]

One significant difference between pseudozernike and Zernike polynomials is that valid pseudozernike polynomials are not constrained by the value of m other than |m| ≤ n. This results in the generation of (n + 1)² linearly independent polynomials for a given degree n, as opposed to the ½(n + 1)(n + 2) polynomials that are generated for a Zernike polynomial of the same degree. It is common to refer to Zernike or pseudozernike polynomials of a certain order (n). Increasing orders are comprised of increasing numbers of coefficients. The number of coefficients for each order (10th through 21st order) are shown in Table 4.1. Since the series are orthogonal, all of the lower order coefficients are contained within higher order transformations. For each of the radial polynomial functions, Zernike and pseudozernike, custom routines were written in MatLab (v. R2006a) to fit the Cartesian grid arrays of data. These data were first cropped to a radius of 90 pixels and then converted to polar coordinate form. These regularly sampled polar coordinate points were then fit with varying numbers of terms (32, 64, 128, and 256) over a maximum radius of 900 µm.

Order            10    11    12    13    14    15    16    17    18    19    20    21
Zernike          66    78    91   105   120   136   153   171   190   210   231   253
Pseudozernike   121   144   169   196   225   256   289   324   361   400   441   484

Table 4.1: Number of Radial Polynomial Coefficients by Order

The resulting coefficients from all left eyes were reflected to represent the data of a right eye. This conversion compensates for the anatomical mirror symmetry that exists between right and left eyes. The time required to calculate the polynomial coefficients and the residual model error were computed and recorded.
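The fitting step itself reduces to ordinary least squares once the modes are evaluated at the resampled (ρ, θ) locations. A hedged NumPy sketch of that generic approach is shown below; the zernike helper is the one sketched in Section 4.1.1, and the example mode list (through the 10th radial order) is illustrative rather than the configuration used in the study.

import numpy as np

def fit_radial_polynomial(elev, rho, theta, modes):
    # elev, rho, theta: flattened vectors of the polar-resampled data, with
    # rho normalized to the 900 um fitting radius. `modes` is a list of
    # (n, m) index pairs; zernike() is the sketch from Section 4.1.1.
    A = np.column_stack([zernike(n, m, rho, theta) for n, m in modes])
    coeffs, *_ = np.linalg.lstsq(A, elev, rcond=None)
    rms = np.sqrt(np.mean((A @ coeffs - elev) ** 2))
    return coeffs, rms

# Example: all valid (n, m) pairs through the 10th radial order (66 terms).
modes = [(n, m) for n in range(11) for m in range(-n, n + 1, 2)]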

4.2 B-splines

B-splines were also evaluated as an alternative to the Zernike and pseudozernike radial polynomials. Spline functions are piecewise polynomial functions with variable properties that may be specified by the user, e.g. a continuous second derivative, to translate sparse or irregularly sampled data to a continuous function. This ability to represent sampled data as a continuous function exposes one of the more powerful attributes of spline functions: they are multi-scalar [54]. Scaling spline functions is relatively trivial. Values at intermediate or unsampled locations can be explicitly determined because the piecewise polynomial function provides a continuous solution between sampled intervals. For example, the source data array shown in Figure 4.2 a is represented by the full complement of 65,000 data points, whereas the downscaled version of the same data, Figure 4.2 b, demonstrates how the global features are well preserved with a fraction of the data (65,536 features reduced to 1,024).


Figure 4.2: Multi-resolution Illustration of B-Spline Modeling (a) Source data; (b) Data downsampled 64:1; (c) 2-D source data; (d) 2-D downsampled data

Similarly, the two-dimensional spline function can be evaluated at this reduced number of sampled points to upscale the array while preserving the underlying global (more coarse) description of the surface features. The objective of this downsampling strategy is to reduce the feature space required to represent the original data.

Two-dimensional B-spline surfaces of varying complexity were used to fit the raw HRT data arrays to evaluate the trade-offs between model complexity, fidelity, and computational expense. Cubic B-splines were computed using the Spline toolbox (v. 3.3) of MatLab (v. R2006a) to model the data with piecewise polynomial functions that had a continuous second derivative. Downsampling the original data was performed to represent the data more compactly. The data were downsampled in increments intended to facilitate fair comparisons with results obtained from wavelet and radial polynomial representations of the same data. The number of spline coefficients evaluated in these experiments were 64, 144, 256, and 1024. The data computed were an array of two-dimensional spline coefficients, computational time, and residual model error. As with the radial polynomial representations, data from left eyes were reflected about the central vertical meridian to compensate for anatomical mirror symmetry.
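The MatLab Spline toolbox code is not reproduced here, but the downsample-and-refit idea can be sketched with SciPy's bicubic spline routines. The subsample size and array dimensions below are illustrative assumptions, not the exact knot placement used in the study.

import numpy as np
from scipy.interpolate import RectBivariateSpline

def bspline_downsample_rms(elev, n_knots=16):
    # Fit a bicubic spline through an n_knots x n_knots subsample of a square
    # elevation array (roughly n_knots**2 coefficients, e.g. 16 x 16 = 256),
    # then evaluate it back on the full grid to estimate the residual error.
    n = elev.shape[0]
    idx = np.linspace(0, n - 1, n_knots).round().astype(int)
    coarse = elev[np.ix_(idx, idx)]
    spline = RectBivariateSpline(idx, idx, coarse, kx=3, ky=3)
    full = spline(np.arange(n), np.arange(n))
    rms = np.sqrt(np.mean((full - elev) ** 2))
    return coarse.ravel(), rms      # coefficient-like features and fit error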

4.3 Wavelets

Wavelet analysis of signals is a powerful approach that has been applied to a very wide range of problems in one dimensional (temporal) and two dimensional (spatial) signal analysis [55]. The most useful insight into the concepts of wavelet signal analysis comes from Fourier analysis. In Fourier analysis a signal is fit by decomposing the signal into fundamental sinusoidal frequencies that can be recombined by scale to represent the elements of the source signal (Figure 4.3). Figure 4.3 a shows a one dimensional profile of the elevation data from the two dimensional array of HRT data. Figure 4.3 b shows the corresponding one dimensional plot of the power spectrum as a function of arbitrary frequency units. This approach is computationally efficient and highly useful.


Figure 4.3: One Dimensional Fourier Signal Analysis (a) Vector representation of optic nerve head elevation (signal); (b) Power spectrum of signal

Nonetheless, many natural signals have discontinuities that occur at a specific time, such as disease onset or surgical intervention. This is analogous to a discrete spatial location for two dimensional signals, such as the margin of the optic cup. Fourier analysis is ill-suited for capturing such discrete time (or spatially) dependent phenomena, e.g. it is impossible to tell when or where an event begins or ends by inspection of the frequency spectrum. Dennis Gabor attempted to address this limitation of Fourier analysis by adding a discrete window to the frequency component of the sinusoidal signal [56]. Gabor's solution is a signal filter produced by the product of a Gaussian envelope and a sinusoid (Figure 4.4). This approach can help resolve the onset or end of a one-dimensional temporal signal or the spatial localization of a two-dimensional signal. Nevertheless, this approach has a limited ability to resolve temporal or spatial components of a signal that are related to the size of the signal filter aperture window [57]. Temporal or spatial signals that are greater or less than the defined signal envelope will remain ill-defined.


Figure 4.4: Two Dimensional Gabor Filter (a) Sinusoidal signal; (b) Gaussian window; (c) Gabor filter window is the product of A × B

What is required to address these issues of uncertainty are (1) fundamental waveforms that have a discrete time envelope, (2) variable scale (frequency), and (3) variable location. Wavelet analysis provides such a solution. In wavelet analysis, a fundamental waveform known as the mother wavelet is repetitively scaled to achieve the best estimation of the signal. This is analogous to breaking the signal into constitutive sinusoidal frequencies as in Fourier analysis. A major difference between these two forms of analysis is that the fundamental waveform in wavelet analysis is not only scaled to achieve the best fit, but is also translated in increments of the wavelet size along the length of the signal (Figure 4.5). The Daubechies 3rd order mother wavelet is shown in the top panel of Figure 4.5. For any given scale the mother wavelet is incrementally shifted by the width of the wavelet down the length of the signal. At each location, the correlation between the wavelet and the original signal is calculated. These values are the wavelet coefficients. In brief, the iterative steps of wavelet signal analysis are (1) comparison of the wavelet with a portion of the starting segment of the signal, (2) calculation of the correlation between the wavelet waveform and the signal segment, and (3) translation of the wavelet to subsequent positions along the signal.

Figure 4.5: Wavelet Analysis Concepts for a One Dimensional Signal Mother wavelet shifting (top panel) and scaling (bottom panel)

This process is repeated until the entire length of the signal has been analyzed. Next, an alternative scale (stretched or compressed wavelet) is selected and this process is again repeated until all possible scales have been evaluated. There are several families of wavelets that are commonly used. The most familiar are Haar, Daubechies, Coiflets, and Symlets. Each family has some unique attributes that may make one family more suitable than another for a specific application. In general, the main characteristics of a wavelet basis function to consider are (1) symmetry, (2) smoothness, (3) orthogonality, and (4) support length. Symmetric wavelets will yield signal transforms that are also symmetric, a desirable property for signal reconstruction.


The smoothness of a wavelet filter is related to the filter length as well as the number of times that the wavelet function can be differentiated. Smooth wavelets are better suited to smooth signal reconstruction. Orthogonality helps to reduce the amount of information required to represent the signal, and support length refers to the minimum number of data points required to produce the wavelet signal. Wavelet analysis produces two sets of coefficients known as approximation coefficients, a downsampled estimation of the best global fit of the wavelet to the original signal, and the detail coefficients. The detail coefficients represent the difference between the wavelet approximation and the original signal. These detail coefficients describe the local (high frequency) details of the signal. Detail coefficients may be combined with the approximation coefficients to reconstruct the original signal. Together, these two sets of coefficients provide a compact representation of the data that can be efficiently transformed from wavelet coefficient space to an original signal representation. The advantages of wavelet analysis are compact signal representation and very fast computational speed. Wavelet analysis is widely used for two-dimensional signal evaluation as well, most notably in image analysis. As an example, JPEG-2000 standards for image compression specify the inclusion of wavelet analysis for its ability to faithfully and compactly represent signals. The product of a two dimensional wavelet signal analysis is itself an array that has four parts. The original signal is decomposed into an array of horizontal, vertical and diagonal detail coefficients as well as a reduced-scale approximation of the original signal. An example of this four-part array is shown for a 1st-level Haar representation in Figure 4.6.


Figure 4.6: Wavelet Decomposition of a Two Dimensional Signal (a) Four panel image coefficient arrays, pixel intensity is proportional to coefficient magnitude; (b) Coefficient array legends: A = approximation coefficients, H = horizontal detail coefficients, V = vertical detail coefficients, D = diagonal detail coefficients

The upper left panel represents the 1st-level approximation coefficients. In the upper right panel the magnitude of the horizontal detail coefficients from the original image is represented as brighter pixel values. The lower left panel shows the detail coefficients for the vertical components and the lower right panel shows the diagonal detail coefficients. These three detail coefficient arrays capture the magnitude of salient features for each directional component of the image. The resulting arrays of coefficients do not yield data that can be easily reduced to a vector of features or directly incorporated into machine learning classifiers.


Most applications of two-dimensional wavelet analysis emphasize the signal compression and reconstruction features of this approach. The use of wavelets for the purpose of two dimensional feature recognition and extraction are far fewer. In fact the use of wavelets for both feature reduction and pattern recognition in two dimensional space is somewhat rare. Research from information retrieval systems that use wavelet-based machine learning and pattern recognition strategies seem to provide the most parallel perspective on the use of wavelets as applied in this research [58, 59]. Preserving the correlated structure of the spatial information contained in the data array is an important and difficult challenge. Two basic strategies were considered in this research. A first approach was to disregard the spatial context of the data features derived from the wavelet fit and consider the wavelet coefficients as features irrespective of their spatial context and without intent to reconstruct the source data from the fitted wavelet coefficients. While this may be a suitable strategy for classification, it is less useful when attempting to evaluate model fidelity. The second approach considered was to preserve the spatial address by using a 2-D wavelet decomposition and extract the coefficients from the resulting 2-D array. Unfortunately, this would not result in any meaningful feature reduction—a major goal for subsequent classification experiments that follow. The strategy that was adopted was to use Hilbert Space filling curves to sample the 2-D data and reduce the data to a single dimension. This approach is explained in detail below. The result is a vector of data with spatial addressing preserved. As a result, the data may be reduced to a single dimension, analyzed by performing a one dimensional wavelets analysis, and the fitted data reconstructed to compare the fit of the model with the original signal.


The MatLab Wavelet toolbox (v. 3.0.4) in MatLab (v. R2006a) was used to perform the wavelet analysis of this research. Three different wavelet families were evaluated and these are introduced in detail below in Section 4.3.2. For each wavelet family the approximation coefficients were used as the features for signal representation. The number of coefficients selected was determined by the level of wavelet decomposition. When using Haar wavelets, each additional level of signal decomposition results in a dyadic reduction in the number of approximation coefficients, e.g. a 4th level Haar decomposition of an original signal with 16,384 elements (128 × 128) results in 1,024 approximation coefficients, while a 6th level decomposition produces 256 coefficients.
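The dyadic relationship between decomposition level and the number of approximation coefficients is easy to verify with a wavelet library. The brief PyWavelets sketch below is a stand-in for the MatLab Wavelet toolbox and uses a random signal as a placeholder for the Hilbert-ordered HRT data; it reproduces the coefficient counts listed later in Table 4.2.

import numpy as np
import pywt

signal = np.random.randn(128 * 128)   # stand-in for one Hilbert-ordered HRT record

for family in ("haar", "db3", "coif3"):
    for level in (4, 6, 7, 8):
        approx = pywt.wavedec(signal, family, level=level)[0]
        print(family, level, len(approx))
# Haar yields 1024, 256, 128, and 64 approximation coefficients at levels
# 4, 6, 7, and 8; db3 and coif3 yield slightly more because of their longer support.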

4.3.1 Hilbert Space-Filling Curves

A space-filling curve is a continuous map of data from the unit interval onto a unit square. These curves provide a means of translating or remapping the data in either direction, from the two-dimensional unit square to the one-dimensional unit vector [60]. In this case the Hilbert space-filling curve is used as a method to reduce the two-dimensional HRT data array to a one-dimensional vector of elevation data points. There are several well-defined space-filling curves suitable for the purposes of this research. Hilbert curves were selected as a first implementation of these concepts because of their simplicity. Other possible space-filling curve candidates include Peano, Sierpiński, and Heighway Dragon curves [60]. In each case, the sampling order of the unit square is determined by the path of the curve.


Figure 4.7: Hilbert Space Filling Curves (a) 1st order, (b) 2nd order; (c) 4th order

This curve path is non-intersecting and self-avoiding so that the interval entirely fills or contains all possible points within the unit square. The Hilbert curve progresses from the lower left quadrant of the unit square in clockwise fashion to end at the lower right quadrant. This sampling route is shown graphically in Figure 4.7, where a first order curve sampling a 2 × 2 grid would visit each quadrant in a u-shaped pattern. As curve order is increased the fractal nature of the Hilbert space-filling curve becomes apparent, yet the general progression remains counterclockwise from quadrant III to quadrant IV. First (4 points), second (16 points) and fourth order (256 points) Hilbert curves are shown in Figure 4.7.
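A standard iterative construction of the Hilbert mapping is sketched below in Python. The orientation and starting corner of this textbook version may differ from the curves drawn in Figure 4.7, so it should be read as an illustration of the index-to-coordinate mapping rather than the exact implementation used here.

def hilbert_d2xy(order, d):
    # Map a distance d along a Hilbert curve of the given order to (x, y)
    # coordinates on a 2**order x 2**order grid (standard iterative form).
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:                  # rotate the quadrant when needed
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(arr, order):
    # Reorder a 2**order x 2**order array into a locality-preserving 1-D list.
    return [arr[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(4 ** order))]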

4.3.2 Wavelet Basis Functions

The Haar wavelet family has several useful properties that include orthogonality, symmetry, and compact support. The support length of the Haar wavelet is the shortest of all wavelets and is equal to one. Thus, as an upper limit, it is possible that there are as many Haar wavelet coefficients as there are data points in the original signal. The Haar mother wavelet is a step function and is depicted in Figure 4.8 a. It is one of the few wavelets that is symmetric in shape.


Figure 4.8: Wavelet Basis Functions (a) Haar wavelet; (b) Daubechies wavelet, 3rd order; (c) Coiflet wavelet, 3rd order

The Daubechies family of wavelets is another series of compact basis functions that are orthonormal. The length of Daubechies wavelets is defined as 2N − 1, where N is the wavelet order. The third order Daubechies wavelet, depicted in Figure 4.8 b, has a support length of 5. This longer support translates to additional wavelet coefficients for a given order compared to the Haar wavelet (Table 4.2). It is also notable that the Daubechies wavelet is asymmetric. The sharpness and asymmetry of this wavelet profile may be beneficial for detection of signal discontinuities. Nevertheless, the optimal wavelet basis function properties for the analysis of these signals are untested. A third wavelet basis function evaluated in these studies is the Coiflet family (Figure 4.8 c). The Coiflet wavelets differ from the Daubechies in several ways, yet share some of the most useful fundamental mathematical properties. Like Daubechies wavelets, Coiflets are relatively compact orthonormal wavelets. Their support length is defined for each order N as 6N − 1. The third order Coiflet is the least compact of the wavelet functions evaluated here (Table 4.2). Although still relatively compact, their longer support resulted in the most wavelet coefficients for any given order of signal decomposition.

Wavelet Family    8th    7th    6th    4th
Haar               64    128    256   1024
Daubechies         68    132    260   1028
Coiflet            80    144    272   1039

Table 4.2: Number of Wavelet Coefficients by Family and Order

Unlike Daubechies wavelets, Coiflets are much more symmetrical. The implications of this property are evaluated in the results comparing the different wavelet families and in the discussion of results. The question considered is whether or not a difference in wavelet symmetry or support length results in better signal representation for these data. To perform the wavelet analysis, the two dimensional array of HRT elevation data was first transformed to a one dimensional signal using the Hilbert space-filling curves as described above. These one dimensional data were then fit with a Haar wavelet of varying order to yield the wavelet coefficients. The number of coefficients produced for each order of each wavelet family are given in Table 4.2. The signal approximation coefficients were then saved as a feature vector along with the computation time and residual error. As before, data from left eyes were reflected along the central vertical meridian to represent mirror symmetric right eyes, before the data were combined for analysis. After fitting these data, the process was run in reverse to remap the one dimensional wavelet coefficients back to the unit square. It was then possible to compare


the wavelet transformation of the data to the original values and compute the residual error. In summary, each of these signal reconstruction methods (radial polynomials, B-splines, and wavelets) was used to model the original HRT elevation data array, but in very different ways. These signal representations were evaluated at several different levels of resolution with respect to the fidelity of the model and the computational time required to generate the model. Specifically, the residual model error was evaluated as a function of model complexity (the number of features required to construct the model). Similarly, the computational time required to generate the model was determined as a function of model complexity. The results of these comparisons are provided in the following chapter on modeling results, Chapter 5.
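Putting the pieces together, the evaluation loop described above can be sketched as follows. PyWavelets stands in for the MatLab toolbox, hilbert_d2xy is the mapping sketched in Section 4.3.1, and the 128 × 128 input size is an assumption, so this is an illustrative sketch rather than the study's code.

import numpy as np
import pywt

def wavelet_model_rms(elev, wavelet="haar", level=6):
    # Hilbert-order a 128 x 128 elevation array, keep only the approximation
    # coefficients at the requested level, reconstruct, remap to the unit
    # square, and report the residual RMS error.
    order = 7                                          # 2**7 = 128
    path = [hilbert_d2xy(order, d) for d in range(4 ** order)]
    signal = np.array([elev[y, x] for x, y in path])
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    recon = pywt.waverec(coeffs, wavelet)[: signal.size]
    model = np.empty_like(elev, dtype=float)
    for value, (x, y) in zip(recon, path):
        model[y, x] = value
    return np.sqrt(np.mean((model - elev) ** 2))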


CHAPTER 5

STRUCTURAL MODELING RESULTS

The HRT data arrays were modeled with each of the methods described in Chapter 4. Wherever possible and practical, the modeling conditions were matched for each of the candidate models. In some cases this was not practical, e.g. as the results that follow will show, computational times for the radial polynomial expansion become quite large with diminishing improvements in modeling accuracy. The purpose of these modeling experiments was to evaluate the trade-off between model complexity, computational cost (time), and model fidelity. The goal was to develop a parsimonious model with low residual error that could be rapidly computed.

5.1 Modeling Speed vs. Complexity

The radial polynomials, both Zernike and pseudozernike, are computationally expensive [40]. Colleagues and I have optimized MatLab routines to compute these polynomials and have achieved nearly linear performance for both radial polynomial functions. A plot of computational time as a function of the number of radial polynomial terms is shown in Figure 5.1. The routines for computation of the polynomial coefficients are nearly identical for these two radial polynomials.


Figure 5.1: Computational Time as a Function of the Number of Radial Polynomial Terms

This resulted in time-versus-complexity functions that were indistinguishable and that are therefore plotted as a single function in Figure 5.1. The time-versus-complexity scale is very different across the methods of modeling considered. Increasing complexity (additional coefficients) increases the computational iterations for both the radial polynomial functions and the B-spline surface representation. Conversely, a larger number of wavelet coefficients was produced by fewer iterations. The differences in these time scales for each modeling method are best illustrated in separate plots. In Figure 5.2 a, time is plotted on a logarithmic scale because of the enormous difference between the time required to compute the radial polynomial coefficients and the B-spline coefficients. The time required to compute the Zernike and pseudozernike coefficients was on the order of tens of seconds, compared to the tenths of seconds required for the B-spline coefficients.


Figure 5.2: Computational time (Mean ±SD) v. Model Complexity (a) Radial polynomial and B-spline surface models; (b) Wavelet surface models

The time required to compute the wavelet coefficients was independent of the number of coefficients computed (Figure 5.2 b). The time required to compute 1,024 wavelet coefficients was on the order of hundredths of seconds and differed little by wavelet family. Wavelet coefficients were clearly the most computationally efficient means of generating model features regardless of model complexity.

5.2 Modeling Accuracy vs. Complexity

Model accuracy was determined as the residual error calculated as the difference between the original HRT elevation data array and a computed model of the data. The root mean square error or RMS was used as a summary statistic to compare the models to one another. In each model, greater complexity was associated with lower residual error as expected.


Figure 5.3: Residual Model Error of Radial Polynomials as a Function of Model Complexity

The first two models compared were the Zernike and pseudozernike radial polynomials. The pseudozernike polynomials are reported by Iskander and others to be more robust to noise and have better fit characteristics when compared with Zernike polynomials [61, 40]. Figure 5.3 shows the difference in residual RMS error between the two radial polynomial models. For any given number of model coefficients, the pseudozernike polynomials had consistently lower residual error (approximately 1-5µm), which was significantly less than the Zernike polynomial model with 256 coefficients (paired t-test p < .05). A more thorough statistical comparison of all modeling was performed as part of a repeated measures analysis of variance described in greater detail below. As reported by Iskander and others, the decaying residual error function of the pseudozernike model was smoother and less erratic when compared to the more step-like decay function of the Zernike polynomials [61, 40]. The regions of higher 51

Figure 5.4: Residual Model Error as a Function of Model Complexity Z/PZ = Zernike/pseudozernike; Spline = B-spline; Coif3 = 3rd order Coiflet; Db3 = 3rd order Daubechies

slope—greater reduction in residual model error—were associated with inclusion of additional terms near the center of the pyramidal sequence of geometric modes. These centrally weighted terms corresponded to paraboloid and higher-ordered centrally symmetric terms (Figure 4.1). Similar evaluations of each of the remaining candidate models were performed and a full side-by-side comparison of residual model error as a function of the number of coefficients (model complexity) for each of the models is shown in Figure 5.4. These results show that the residual error of the radial polynomial models were consistently less than the residual error of the B-spline and wavelet models. To


achieve a residual error of approximately 30 µm required 256 radial polynomial coefficients, while greater than 1,000 wavelet or B-spline coefficients were needed to achieve the same level of residual error. In another comparison, the distribution of residual RMS error as a function of model type is shown for all models with equal complexity (256 coefficients). The raw data are shown in Figure 5.5 a. The box plot representations of these same data are provided in Figure 5.5 b. The bar in the middle of each box represents the median value and the box contains the middle 50% of the total distribution from the 25th to the 75th percentile. The whiskers of the box plot extend 1.5 times the interquartile range and any data beyond this limit are plotted as individual points in the figure. The mean residual error of these modeling methods was compared using a one-way repeated measures analysis of variance design. Here the 6 different models (Zernike, pseudozernike, three wavelet models and B-spline) were evaluated for all 276 eyes. The ANOVA table results are shown in Table 5.1. This analysis shows that there were significant differences between the different modeling methods evaluated that were greater than the observed differences between subjects, even after adjustments for repeated measurements. A test of assumptions showed that the residual errors of the linear model were normally distributed. These analysis of variance test results show that there was a significant difference between the different models evaluated and suggest that further pairwise comparisons of the individual mean values are warranted. A t-test was used to compare the differences in the mean residual error between the different modeling methods. These pairwise comparisons were adjusted for multiple comparisons using Bonferroni's method (Table 5.2).
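For readers working in Python, an approximate equivalent of this analysis can be assembled from statsmodels and SciPy as sketched below. Note that AnovaRM does not apply the Greenhouse-Geisser correction reported in Table 5.1, and the column names are hypothetical.

import pandas as pd
from itertools import combinations
from scipy import stats
from statsmodels.stats.anova import AnovaRM

def compare_models(df):
    # df: long-format table with one row per eye-by-model combination and
    # columns "eye", "model", and "rms" (hypothetical names).
    print(AnovaRM(df, depvar="rms", subject="eye", within=["model"]).fit())
    wide = df.pivot(index="eye", columns="model", values="rms")
    pairs = list(combinations(wide.columns, 2))
    for a, b in pairs:                       # Bonferroni-adjusted paired t-tests
        t, p = stats.ttest_rel(wide[a], wide[b])
        print(a, b, round(wide[a].mean() - wide[b].mean(), 1), min(p * len(pairs), 1.0))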

N = 1656; R² = 0.92; Adjusted R² = 0.91

Source of Variation     SS        df     MS       F         Prob > F
Subjects                415237    275    1509     44.37     < .0001
Model                   151495    5      30299    890.26    < .0001

Greenhouse-Geisser epsilon = 0.31; adjusted F: p < .0001
SS = sums of squares; df = degrees of freedom; MS = mean square

Table 5.1: One-Way Repeated Measures Analysis of Variance for Residual Model Error by Model Type

Model    Spline             Z                 PZ                Haar            Db3
Z        −25.5 (< 0.001)
PZ       −26.6 (< 0.001)    −2.1 (1.00)
Haar     −9.4 (< 0.001)     15.1 (< 0.001)    17.1 (< 0.001)
Db3      −7.6 (< 0.001)     16.9 (< 0.001)    19.0 (< 0.001)    1.9 (1.00)
Coif3    −9.8 (< 0.001)     14.7 (< 0.001)    16.7 (< 0.001)    −0.4 (1.00)     −2.2 (1.00)

Row mean − column mean (µm) with Bonferroni-adjusted p values in parentheses; Z = Zernike; PZ = pseudozernike; wavelet families: Haar, Coiflet (Coif3), and Daubechies (Db3).

Table 5.2: Comparisons of Residual Model RMS Error. Table cell values represent the mean difference between the respective row and column combinations, with the associated p-value in parentheses.


Figure 5.5: Distribution of Residual Model Error by Model Type (a) Individual data; (b) Box plot distributions: Coif3 = 3rd order Coiflet, Db3 = 3rd order Daubechies, PZ = pseudozernike, Z = Zernike, Spline = B-spline. All models have 256 coefficients

These comparisons show that these wavelet methods were all similar to each other, but were distinguishable from the B-spline models and the radial polynomial models. Although the residual RMS error of the two radial polynomial models were similar to one another, they were significantly less than the wavelet and spline modeling methods, Table 5.2. These numeric summaries do not provide much intuition regarding the source of modeling errors or their spatial distribution. Graphical evaluation of the radial polynomial functions provide additional information regarding the characteristics of the residual error in each of these different modeling methods. A question arising from inspection of the distributions of residual error by modeling method (Figure 5.5) is the identity of the individual outliers in each distribution. Are these outliers the


same individual records in each case? To put it another way, is greater residual error related more to the characteristics of the individual, or to the modeling method? To address this question, the 10 cases with the greatest residual RMS error were sampled from each modeling distribution representing the top 4% (10/276) of each distribution. This sample captures all of the outliers depicted in the tails of the boxplots of Figure 5.5 b. A total of 17 different eyes formed the entire set of outliers. Five of these outliers were common to 5 or more of the 6 modeling methods. An example of one such case is shown in Figure 5.6 a-c. By inspection, each of these five cases represents an eye with a large optic disc and greater than average physiological cupping, e.g. Figure 5.6 a-c. Selected other cases are shown in Figure 5.6 d-i. One explanation for these greater residual errors for the radial polynomial and spline modeling methods is that both are sensitive to boundary values. Large elevation gradients near the boundaries of eyes with large optic discs and deep physiologic cupping artificially inflates estimates of the boundary coefficients. The outliers for both of the radial polynomial models were many of the same eyes; 8 of the 10 eyes were common to both sets. The greatest difference between these two radial polynomial functions was the amount of structural detail that could be captured. In almost every instance the structural details revealed by fitting the data with the pseudozernike polynomials was greater than the structure revealed by the Zernike polynomial models (Figure 5.7). Wavelet models seemed less affected by these boundary conditions, but have larger errors in regions of high gradients (Figure 5.6 g-i). From the overall model comparisons in Figure 5.4, it does not appear that the wavelet models differ very much from one another. Nevertheless, the different wavelet basis functions were 56


Figure 5.6: Graphical Examples of Model Residual Errors (a) Raw elevation data; (b) B-spline fit (256 coefficients); (c) B-spline residual error; (d) Raw elevation data; (e) Pseudozernike fit (256 coefficients); (f) Pseudozernike residual error; (g) Raw elevation data; (h) Db3 Wavelet fit (256 coefficients); (i) Wavelet residual error



Figure 5.7: Resolution Comparison of Zernike and Pseudozernike Modeling (a) Source data; (b) Zernike polynomial representation (256 coefficients); (c) Pseudozernike polynomial representation (256 coefficients)

compared to determine if there were discernible differences in the structure of the residual errors for each wavelet type. An example comparing the Daubechies Db3 wavelet to the Haar wavelet model of the same data is shown in Figure 5.8. While the spatial distribution model errors appear quite similar—greater errors in the regions of highest gradient—the Haar wavelet errors are more discrete whereas the 3rd order Daubechies wavelet has a smoother profile. These subtle differences are more visible in Figure 5.8 d and e. In summary, the radial polynomial models provided the most compact data transformation, however, these Zernike and pseudozernike coefficients were also the most time consuming to compute. The primary advantage of the various multi-resolution methods evaluated, B-spline and wavelet representations, was computational speed at the expense of compactness.



Figure 5.8: Comparison of Wavelet Reconstruction Errors (a) Raw elevation data; (b) Db3 wavelet approximation (256 coefficients); (c) Haar wavelet approximation (256 coefficients); (d) Db3 residual error; (e) Haar residual error


CHAPTER 6

STRUCTURAL CLASSIFICATION METHODS

6.1 Classification Standards

Three methods were used to assign classification labels to the data. Two of these criteria identify glaucomatous optic neuropathy. These were assignments based on expert review of stereo disc photographs and Moorfields Regression Analysis. The third criterion used to assign class labels to the records of this dataset was derived from analysis of standard threshold automated perimetry testing. A statistical comparison of class assignments for each of these classification standards was performed using Cochran's Q test statistic [62]. The results of these comparisons are presented in the following chapter. The sample used for these classification experiments was drawn from the total pool of subjects who had the full complement of quality test data available (HRT, disc photographs, and visual field) performed within a 30 day time window. This total group size was 551 eyes. Of these 551 eyes, one eye was selected at random from each subject for a total final sample size of 275 eyes (Figure 3.1). The data used for these classification experiments come from the fellow eyes of the subjects used for the structural modeling experiments.

Sampled    Variable    N      Mean (µm)    SD (µm)    Median (µm)
Yes        HRT SD      275    16           8          14
Yes        HRT CI      275    48           38         40
No         HRT SD      276    16           7          14
No         HRT CI      276    45           19         41

Table 6.1: Classification Sample Characteristics. HRT SD is the standard deviation of the mean HRT data array for an individual exam; HRT CI is the 95% confidence interval of this same mean data array. Table cell values summarize the bias and distribution of exam variability for each group.

Of the total 275 eyes, 75% (206 eyes) were used for classification model training and validation. Model training and validation consisted of multiple experiments with repeated sampling and performance estimation using 10-fold cross-validation methods. The remaining 25% (69 eyes) were reserved as an independent test set to evaluate the performance of optimized classification models that were the product of the model training and validation experiments. This final test set was selected at random from the total 275 subject pool and was stratified so that class representation in this test set was similar to the training partition. The age of the total 275 sampled subjects (mean ± SD) was 57 ± 13 years. These subjects were 59% female (164/275) and 53% (146/275) of the eyes from these subjects were right eyes. The majority class assignment from expert evaluation of stereoscopic disc photography was 58% (161/275) normal and 42% glaucomatous optic neuropathy. A summary comparison of the sampled and unsampled data with respect to the quality of the HRT data (SD and 95% CI of the mean) for the mean examinations evaluated is provided in Table 6.1. The data were separated into two groups, training (75%) and testing (25%), that were stratified for proportional class representation.


                        Normal                  GON
            Gender      N     Mean Age (SD)     N     Mean Age (SD)
Training    F           73    51±11             49    59±10
Training    M           48    55±15             36    56±12
Testing     F           21    54±11             19    55±11
Testing     M           21    58±14              8    64±14

SD = standard deviation; GON = glaucomatous optic neuropathy by stereographic disc photography assessment.

Table 6.2: Age Comparison of Stratified Training and Testing Partitions. Data partitions were stratified for proportional class representation.

This resulted in subsamples with similar age and gender characteristics (Table 6.2).

6.1.1 Stereoscopic Disc Photography Grading

Stereographic disc photography images were captured within 30 days of the HRT and visual field evaluations. Images were captured using a simultaneous stereoscopic camera (3-Dx, NIDEK Co., Ltd.). Photographs were evaluated according to protocols previously described by Johnson and colleagues [9]. Three fellowship-trained glaucoma specialists who were masked to the subject's identification and group affiliation (study or comparison groups) evaluated photographs to assign a classification of either normal or glaucomatous optic neuropathy. The evaluation criteria used to make these assignments were based on stereoscopic photographic evidence only. The following features were evaluated: photographic clarity and stereopsis, disc hemorrhages, sectoral nerve fiber bundle defects, excavation of the optic nerve head, thinning of the neural retinal rim globally or locally (notching), and cup-to-disc ratio [9].

If record classification labels assigned by the first two examiners did not agree, a third masked expert examiner adjudicated the final class assignment.

6.1.2 Moorfields Regression Analysis

Moorfields Regression Analysis is a multivariate linear regression of summary indices from the standard analysis of the HRT data. This method of analysis was developed by Garway-Heath and colleagues, who have published a description and validation of their methods [5, 6]. Some of the key regression variables include age, optic disc size, and neuroretinal rim area. These values are either supplied by the examiner (e.g. age) or computed by the HRT software as a product of the examination after the user identifies the boundaries of the optic disc. A normative database of 112 subjects provides the basis for individual comparisons. For an individual test evaluation, the Moorfields Regression Analysis compares global features such as the total neuroretinal rim area as well as rim area by disc sectors to these normative data. In addition to numeric assessments, a final categorical classification is assigned as either within normal limits, borderline, or outside normal limits. Results are displayed both graphically and numerically to augment any other information that the examiner elects to consider as part of an individual diagnostic decision. An example of the Moorfields Regression Analysis report is shown in Figure 6.1. The primary features of the Moorfields Regression Analysis are the patient's age, the log of the total neuroretinal rim area, and the neuroretinal rim area by sector.


The Moorfields Regression Analysis report can be broken into three sections: (1) the graphical summary overlaid on the anatomical image, (2) the graphical numeric summary, and (3) the tabular numeric summary. These regions of the report are indicated in Figure 6.1 by sections A-C respectively. The left side of Section A in Figure 6.1 shows a color-coded plot of the neuroretinal rim (outermost colors in the grayscale figure, green and blue in the color figure). The physiological cup is indicated as the darker region at the center of the circular disc (center-most region in the grayscale figure, red in the color figure). The right-hand panel of Section A in Figure 6.1 shows the optic disc divided into six sectors. Each individual sector is analyzed separately and each sector is individually classified as within normal limits, borderline, or outside normal limits. Categorical class assignments for each sector are indicated by the following symbols: a check mark for within normal limits, an exclamation mark for borderline, and a cross for outside normal limits.

Section B of Figure 6.1 contains two parts, an upper graphical section and a lower categorical classification section. The upper graphical section has seven bar plots that correspond to the six individual neuroretinal rim sectors in the top-right panel of Section A, as well as a global assessment. The left-most bar plot is the global assessment of the neuroretinal rim area. From top to bottom, there are several lines that indicate (1) the age-based normal predicted neuroretinal rim area, (2) the lower limit of the 95th percentile, the lower limit of the 99th percentile, and the lower limit of the 99.9th percentile. The height of the green bar (lighter color in grayscale) indicates the individual's test results, where a higher green bar is better. Results that are above the lower 99th percentile are marked with the check-mark symbol. Neuroretinal rim area that is lower than the 99th percentile, but above the 99.9th percentile, is classified as borderline and indicated with the exclamation-mark symbol. Neuroretinal rim area less than the 99.9th percentile is indicated as outside normal limits with the cross symbol.

Figure 6.1: Moorfields Regression Analysis


Below this series of bar charts is a categorical classification assignment based on selective weighting of the global and sectoral rim area results. The classification assignment is outside normal limits for the report shown in Figure 6.1. The bottom section of Figure 6.1, Section C, shows a tabular numeric summary of the same information shown graphically in the upper sections of this same figure. In this research, the final categorical class assignments from the Moorfields Regression Analysis were used to assign classification labels to the dataset. The three categorical class assignments were recoded as a binary variable. Glaucomatous optic neuropathy was assigned to test results that were either outside normal limits or borderline, and normal was assigned to test results that were classified as within normal limits by Moorfields Regression Analysis.

6.1.3 Visual Field Assessments

The visual field data were selected from the same cross-sectional exam date window (e.g. ≤ 30 days from the HRT and stereo disc photo examinations). The Visual field examination strategy was a full-threshold or SITA-standard 24-2 pattern that tests within the central 30◦ of the visual field. The methods used to evaluate the visual field test results for each patient were based upon methods originally described by Johnson and colleagues [8, 9]. In addition to the cross-sectional exam data, the next available test—approximately 1 year later in almost every instance—was used for confirmation of any visual field defects discovered according to the methods described by Johnson and colleagues [8]. In their original work, Johnson and colleagues developed analysis routines intended to match the algorithms of the Humphrey automated threshold perimeter.


Classification Method            Criteria
Disc Photography                 Abnormal by expert review
MRA                              Outside normal limits or borderline
Visual Field (GHT)               Outside normal limits (confirmed)
Visual Field (GHT clusters)      Two abnormal GHT clusters (< 0.5%, confirmed)

MRA = Moorfields regression analysis; GHT = Glaucoma hemifield test

Table 6.3: Classification Criteria for Glaucomatous Optic Neuropathy

The calculated output from their Visual Basic implementation was compared against the results of the native Humphrey perimeter software routines to validate their methods. For this research, these routines were rewritten in MatLab (v. R2006a) and the output was compared with results from the original Visual Basic routines for a subset of 32 patients sampled to represent the spectrum of possible visual field results. The result was identical output, with minor probability discrepancies near the tails of the normal distribution, e.g. a 2% probability computed in the Visual Basic routines was computed as 3% in 2 cases and appeared to be due to floating point computational accuracy. Using the output from these routines, the criteria selected to classify records as glaucomatous by visual field test results were (1) a Glaucoma Hemifield Test (GHT) result that was outside normal limits, or (2) two clusters of the GHT with a probability of < 0.5%. Either of these criteria had to be met on both exams (cross-sectional evaluation and subsequent testing) and the abnormal clusters had to be in the same locations on both exams. In summary, the final classification dataset consists of class labels assigned by the four different criteria listed in Table 6.3.


6.2 Decision Tree Classification

A decision tree is a hierarchical collection of conditional rules that are used to assign categorical labels to the records of a dataset. Decision trees consist of nodes that specify a particular attribute of the data, branches that represent a test of each attribute value, and leaves that correspond to the terminal nodes containing records that are labeled with class assignments (Figure 6.2). In Figure 6.2, the circular nodes represent specific attributes (e.g. individual polynomial coefficients or the RMS error value). The square blocks represent leaf nodes where class labels are assigned as the majority class for all cases described by the path leading to the leaf node. The logic of the decision tree is illustrated by classifying a single case. Begin at the top of the tree, select the relevant attributes (e.g. C0, C60, etc.), then choose the branch path corresponding to the attribute value for the individual record. Continue to descend the tree until a terminal node is reached. When all records have been sorted, label the records in each leaf node with the majority class. The classic decision tree induction algorithm, C4.5, described by Ross Quinlan in 1986 was used in this study [63, 64]. The C4.5 algorithm is designed to minimize entropy. In the context of information theory, entropy is mathematically defined as a measure of data homogeneity and is defined by the following equation [65]:

\[
\mathrm{Entropy}(x) = -\sum_{i=1}^{n} p(i)\,\log_2 p(i) \tag{6.1}
\]

The entropy of the condition (x) for a sample with n possible outcomes is the sum, over all possible outcomes, of the probability of outcome i multiplied by the logarithm of the inverse of that probability.


Figure 6.2: Example Decision Tree Circular nodes represent attributes (polynomial coefficients), branches are labeled with split criteria values, and leaf nodes (boxes) are labeled with categorical class assignments, e.g. G = glaucoma, N = normal

69

For decision trees, entropy refers to the homogeneity of the class labels for a collection of records in a dataset. Computing the entropy of a dataset provides a statistical criterion to determine whether and where a group of records should be divided to construct the decision tree. In practice, the C4.5 algorithm uses a non-parametric criterion known as the information gain ratio, defined by the following relation:

\[
\mathrm{IGR}(S, a) = E(S) - \sum_{v \in \mathrm{values}(a)} \frac{|S_{v}|}{|S|}\, E(S_{v}) \qquad (6.2)
\]

where IGR is the information gain ratio and E(S) is the entropy of S. The information gain ratio compares the weighted entropy that would result from dividing the data as planned with the entropy before dividing the data. This approach provides a mathematically stable criterion for determining the optimal split [65]. Conceptually, entropy is minimal when a set of records all have the same categorical class label. If a collection of records with heterogeneous class labels can be divided into two or more groups, each with more homogeneous class representation than the original combined group, then the result of this split is a reduction in entropy and an increase in the information gain ratio (Figure 6.3).
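The entropy reduction that appears in Equation 6.2 can be sketched for a single candidate split of a numeric attribute. The example below is illustrative only (it is not the Weka/C4.5 code used in this research, and the coefficient values and labels are hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction (the summation term of Eq. 6.2) from splitting a
    numeric attribute at a threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical coefficient values and class labels
c0 = [120, 150, 175, 190, 210, 240]
cls = ['N', 'N', 'N', 'G', 'G', 'G']
print(information_gain(c0, cls, threshold=181))  # 1.0: a perfectly informative split
```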

Figure 6.3: Illustration of Decision Tree Logic and Entropy Reduction. Splitting the data along attribute y results in the two more homogeneous subgroups depicted at right.

It is possible to iteratively compute the entropy and information gain that would result from splitting the dataset at each possible value of an attribute. Moreover, it is computationally feasible to calculate the reduction in entropy that would result from splitting the data at every possible value of every available attribute. In practice, these computations are optimized, reducing the search space required to determine the optimal subset of attributes and associated values at which to divide the data. The result is an efficient non-parametric recursive partitioning algorithm that produces a hierarchical collection of classification rules. These rules are easily interpreted as a series of conditional tests of attributes that ultimately result in categorical classification assignments. The C4.5 execution strategy is to greedily overfit the data and subsequently prune the resulting complex tree. A user-defined criterion known as the confidence factor determines how much pruning is performed. This setting is used to eliminate the lower branches of the decision tree based on the statistical confidence that the additional splits (details) provide useful information and are not merely overfitting the data. In this context, smaller confidence values result in additional pruning. Conceptually, decision trees are routinely used by clinical practitioners as diagnostic tools in medical decision making. A simple example is the use of patient symptoms such as polydipsia and polyuria, along with diagnostic test results, e.g. fasting blood glucose and hemoglobin A1c levels, to establish a diagnosis of diabetes. These logical rules could be derived from accumulated evidence and practice experience. The derivation of decision tree classification models can also be automated and the resulting rules based on more objective and quantitative statistical criteria.
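Returning to the induction procedure, the exhaustive split search described above can be sketched as follows. This is an illustration only—C4.5 applies additional optimizations and the gain-ratio correction—and the two-attribute example data are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(records, labels):
    """records: list of equal-length feature vectors; labels: class labels.
    Returns (attribute index, threshold, entropy reduction) of the best binary split."""
    base = entropy(labels)
    best = (None, None, 0.0)
    for a in range(len(records[0])):
        values = sorted({rec[a] for rec in records})
        # Candidate thresholds: midpoints between successive observed values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for rec, lab in zip(records, labels) if rec[a] <= t]
            right = [lab for rec, lab in zip(records, labels) if rec[a] > t]
            w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - w > best[2]:
                best = (a, t, base - w)
    return best

# Hypothetical two-attribute example
recs = [(120, 0.4), (150, 0.9), (190, 0.5), (240, 0.8)]
labs = ['N', 'N', 'G', 'G']
print(best_split(recs, labs))  # attribute 0, threshold 170.0, gain 1.0
```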

6.2.1 Classification Experiment Methods

The classification experiments conducted in this research were performed using the Weka data mining suite (v. 3.5.2; http://www.cs.waikato.ac.nz/ml/weka/, accessed 6/29/2006) from the University of Waikato, New Zealand [66]. This software implements Release 8, the last public release of the C4.5 algorithm. Decision trees were first induced using the default settings of the algorithm, which include a minimum of 2 objects per leaf node and a confidence threshold of 0.25; these values determine whether a resulting tree structure is pruned and by how much. The data were divided into ten folds stratified by class representation. Nine of the ten partitions were used to train the decision tree, while the reserved partition was used to estimate the performance of the resulting model (Figure 6.4). The reserved partition was then swapped with an alternate partition from the original training set, another decision tree was trained on this new collection of data, and its performance was estimated from the newly reserved validation set. This procedure was repeated until all ten partitions had served as the validation set. Decision tree classification performance was reported as the average of these ten iterations. This procedure is known as 10-fold cross-validation and is widely accepted for providing realistic estimates of classification model performance [67].

Figure 6.4: Illustration of 10-fold Cross-Validation. The total sample (left) is divided into two subsets: 75% for training and validation and 25% for final testing. The training set is further partitioned into 10 subsets—9 for training and 1 for validation.

The optimal classification model for each dataset (radial polynomials, wavelets, or splines) was determined by iterating the procedures described above over a range of possible values for each of the decision tree modeling parameters (e.g. confidence thresholds or minimum number of objects). These experiments were repeated multiple times, and performance measures such as model specificity, sensitivity, accuracy, and area underneath the ROC curve were calculated as the mean of these multiple iterations. Classification performance measures were then compared using t-tests corrected for multiple comparisons using the methods described by Nadeau and Bengio [68].
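The following Python sketch mirrors this cross-validation and parameter-sweep design using scikit-learn's CART-style decision tree as a stand-in for Weka's C4.5/J48. The two algorithms differ in their splitting and pruning details, so this illustrates the experimental procedure rather than the software actually used, and the feature matrix X and labels y are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(275, 64))      # placeholder for 64 model coefficients per eye
y = rng.integers(0, 2, size=275)    # placeholder class labels (0 = normal, 1 = GON)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Sweep the minimum number of objects per leaf, analogous to Weka's -M option
for min_leaf in (2, 5, 10, 20, 40):
    tree = DecisionTreeClassifier(criterion='entropy', min_samples_leaf=min_leaf)
    auc = cross_val_score(tree, X, y, cv=cv, scoring='roc_auc')
    acc = cross_val_score(tree, X, y, cv=cv, scoring='accuracy')
    print(f"min_leaf={min_leaf:2d}  AUC={auc.mean():.2f}  accuracy={acc.mean():.2f}")
```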

6.3 Feature Selection

Classification features were selected based on the structural modeling results presented in Chapter 5. The number of attributes from the previous modeling experiments varied by model. The number of features derived from the radial polynomials was 32, 64, 128, or 256. The number of features derived from the spline and wavelet models was approximately 64, 128, 256, or 1024 for each of the surface models. The total number of coefficients varied slightly due to the sampling requirements of each of the different methods. In each case, all available coefficients were used as attributes to build the classification models. In addition to the model coefficients, the RMS residual error of the model was included as a feature in the classification experiments. As before, optimal classification models were determined from experiments in which decision tree parameters were varied for each of the data representations. Once the optimal classification model was determined for each of the data representations (e.g. radial polynomials or wavelets), these optimal classification models were then compared to one another to determine the final best classification model for every method of data representation.
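As a sketch of how the classification attributes can be assembled (the arrays below are placeholders standing in for the modeling output of Chapter 5), each record is simply the vector of model coefficients with the RMS residual error appended as one additional feature:

```python
import numpy as np

n_eyes, n_coeffs = 275, 64
coeffs = np.zeros((n_eyes, n_coeffs))   # placeholder: fitted model coefficients per eye
rms = np.zeros((n_eyes, 1))             # placeholder: RMS residual error of each fit

# Feature matrix: all coefficients plus the RMS error as the final attribute
X = np.hstack([coeffs, rms])
feature_names = [f"C{i}" for i in range(n_coeffs)] + ["RMS"]
print(X.shape, feature_names[-3:])      # (275, 65) ['C62', 'C63', 'RMS']
```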

In Chapter 7 the classification model results are compared for each of the data modeling methods (e.g. wavelet, radial polynomials, and spline). Representative classification tree structures are shown for each of the experiments and classification attributes are recombined to create a visualization of features that contributed to the classification label assignments.


CHAPTER 7

STRUCTURAL CLASSIFICATION RESULTS

7.1 Comparisons of Classification Methods

Classification of early glaucoma and glaucomatous optic neuropathy is challenging under the best of circumstances. This task is more easily accomplished by observing progressive change in both structural and functional disease features over serial examinations. Correct classification of early disease based on a single cross-sectional examination, with structural features as the only basis for class assignment, is a far more challenging task. The methods available for this classification task are expert evaluation of stereographic disc photos and Moorfields Regression Analysis. While different, these two methods are both intended to identify structural features of glaucomatous optic neuropathy, and they provide the most appropriate standards for comparison with the structural classification models developed in this research. The concordance between these two accepted standards of structural classification of glaucomatous optic neuropathy was statistically evaluated using Cochran's Q test statistic. Cochran's Q statistic is a test of the equality of proportions for two or more

                 MRA Class
Photo Class      ONL    Borderline    WNL    Total
GON               40            35     39      114
Normal             7            11    143      161
Total             47            46    182      275

Table 7.1: Comparison of Classification Assignments by Stereographic Disc Photography (Photo Class) and Moorfields Regression Analysis (MRA). WNL = within normal limits, ONL = outside normal limits; Borderline and ONL categories were combined to create binary MRA class assignments; χ2 = 7.74, p = .008

matched samples. This test statistic has a chi-squared distribution. Cochran's Q is equivalent to McNemar's test when only two proportions are evaluated. Comparison of the classification labels assigned by expert review of stereographic disc photos with the class labels assigned by the Moorfields Regression Analysis showed that these two methods of classification were not equivalent. A 2 × 2 table showing the classification distributions by each method is given in Table 7.1. These results show that there were a total of 57 disagreements between the two classification methods—39 false negative and 18 (7 + 11) false positive cases. The majority of these disagreements (68%, or 39/57 eyes) were instances where eyes were classified as having glaucomatous optic neuropathy by stereographic disc photography but were classified as within normal limits by Moorfields Regression Analysis. The remaining 32% (18/57 eyes) of disagreements between these two methods were instances where eyes were labeled as glaucomatous optic neuropathy by Moorfields Regression Analysis, but were assigned to the normal class by stereo disc photography examination. Of these 18 discrepancies, 7 were outside normal limits and 11 were borderline by the original class labels of the Moorfields Regression Analysis. As a result, class assignment differences between these two methods of classification were not associated with either of the two classes that were combined to create the binary glaucomatous optic neuropathy classification label. The distribution of Moorfields Regression Analysis class labels among the eyes classified with glaucomatous optic neuropathy by stereo disc photography was as follows: 31% (35 eyes) borderline, 35% (40 eyes) outside normal limits, and 34% (39 eyes) within normal limits (Table 7.1). Using stereo disc photography as the standard, Moorfields Regression Analysis had poor sensitivity: 66% (75 of 114 eyes) were correctly identified as having glaucomatous optic neuropathy. Using this same disc photography standard, Moorfields Regression Analysis had good specificity: 89% (143 of 161 eyes) were correctly identified as normal. Conversely, using the binary Moorfields Regression Analysis class labels as the standard, the specificity of expert grading of stereo disc photography was 79% (143/182 eyes) and its sensitivity was 81% (75/93 eyes), resulting in more balanced classification performance overall. The two visual field-based classification methods were similar to one another; Cochran's Q statistic with one degree of freedom was χ2 = 2.25, p = .21. As expected in this population of high-risk and early glaucoma patients, there were far fewer eyes labeled with glaucomatous optic neuropathy by disc photo assessments that were also classified as abnormal by visual field test criteria. In Table 7.2, the number of cases classified as glaucomatous optic neuropathy by expert grading of stereo disc photographs is compared to the classification labels assigned by the Glaucoma Hemifield visual field test results. Using stereographic

                 GHT Class
Photo Class      Abnormal    Normal    Total
GON                    14       100      114
Normal                  5       156      161
Total                  19       256      275

Table 7.2: Comparison of Classification Assignments by Stereographic Disc Photography (Photo Class) and Glaucoma Hemifield Test (GHT) from Visual Field Testing; χ2 = 85.95, p < .001

disc photo grading as the standard, the Glaucoma Hemifield Test (GHT) had nearly perfect specificity (97%, or 156/161 eyes), but very low sensitivity (12%, or 14/114 eyes). This large discrepancy between stereographic disc photo grading and glaucoma defined by confirmed visual field test criteria emphasizes the difference between glaucomatous optic neuropathy defined by structural criteria alone and glaucoma defined by measurable loss of visual function. Of the 105 disagreements between these two methods, 100 were cases where the eye was classified as glaucomatous optic neuropathy by disc photography grading and normal by GHT classification. These results were similar to the comparison of stereographic disc photography classification with the other visual field-based classification criterion—two abnormal GHT clusters confirmed on serial examinations. The only difference was six additional cases where the visual field class labels agreed with the positive classification of disease by disc photography grading.
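The agreement statistics reported above can be reproduced directly from the counts in Tables 7.1 and 7.2. The sketch below uses McNemar's χ² (the two-method special case of Cochran's Q, as noted earlier), computed without a continuity correction:

```python
def mcnemar_chi2(b, c):
    """McNemar's chi-squared (no continuity correction) from the two
    discordant cell counts of a paired 2x2 table; equivalent to Cochran's Q
    when only two classification methods are compared."""
    return (b - c) ** 2 / (b + c)

def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity of one method against a reference standard."""
    return tp / (tp + fn), tn / (tn + fp)

# Table 7.1: photo grading vs. binary MRA.  Discordant cells: 39 and 18.
print(round(mcnemar_chi2(39, 18), 2))                      # 7.74, as reported
# MRA against the photo-grading standard: TP=75, FN=39, TN=143, FP=18
print([round(x, 2) for x in sens_spec(75, 39, 143, 18)])   # [0.66, 0.89]

# Table 7.2: photo grading vs. GHT.  Discordant cells: 100 and 5.
print(round(mcnemar_chi2(100, 5), 2))                      # 85.95, as reported
```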


7.2 Decision Tree Classification

A series of experiments was run for each of the data modeling methods, e.g. the radial polynomial, wavelet, and spline data representations. For each modeling method, multiple resolutions of each model were considered; e.g. the number of pseudozernike model coefficients ranged from 32 to 256. An optimal classification model was selected for each data modeling method at every resolution level by evaluating the model accuracy and the area underneath the ROC curve. The C4.5 user-defined modeling options, such as the confidence level and the minimum number of objects allowed per leaf node, were also varied to determine the best possible classification model. Each experiment was repeated 10 times with repeated sampling. Classification performance metrics such as accuracy and area underneath the ROC curve were estimated for each individual run using 10-fold cross-validation. The final training model performance metrics were computed as the average performance of these multiple iterations. The optimal classifier was selected for each data representation method (e.g. radial polynomials or wavelets) and final model performance was then tested on unseen data partitions using the optimal classification models.

7.2.1 Tree Induction

Although the confidence parameter and pruning strategies were manipulated in an attempt to affect decision tree structure and classification performance, the user-selectable variable that had the greatest impact was the minimum number of instances permitted in a leaf node. Decision trees of decreasing complexity were generated by increasing the minimum number of objects allowed at each leaf node before the

Area Underneath ROC Curve    Qualitative Rank
.90 – 1.0                    excellent
.80 – .89                    very good
.70 – .79                    fair
.60 – .69                    poor
.50 – .59                    fail

Table 7.3: Qualitative Ranking of the Area Underneath the Receiver Operating Characteristic Curve

node was split into increasingly smaller subsets. By increasing the minimum number of cases allowed at each leaf node, the resulting trees became increasingly compact, e.g. fewer attributes and fewer leaf nodes. The minimum number of instances allowed for each leaf node was varied from 2 to 40, which reduced the number of leaf nodes from 16 to 2, respectively. In effect, the minimum number of instances per node was inversely related to model complexity.
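This inverse relationship between the minimum leaf size and tree complexity is easy to demonstrate with any recursive-partitioning implementation; the short sketch below uses scikit-learn's decision tree on synthetic data as a stand-in for the Weka J48 trees actually induced:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(275, 64))                               # placeholder feature matrix
y = (X[:, 0] + 0.5 * rng.normal(size=275) > 0).astype(int)   # weakly separable labels

for min_leaf in (2, 5, 10, 20, 40):
    tree = DecisionTreeClassifier(criterion='entropy',
                                  min_samples_leaf=min_leaf).fit(X, y)
    print(f"min objects per leaf = {min_leaf:2d}  ->  leaves = {tree.get_n_leaves()}")
```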

7.2.2 Performance Comparisons

The performance of the induced decision tree classifiers for each method of data representation was compared in two ways. First, the classifiers were compared using the area underneath the receiver operating characteristic (ROC) curve. Second, the accuracy of the classifiers, defined as the total number of correct classifications—either true-positive or true-negative class assignments—was compared. The categorical ranks assigned in Table 7.3 may be used as a rough guide to interpret the discriminative quality of the reported area underneath the ROC curve.
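Both summary measures can be computed from a classifier's scores on a held-out partition. The sketch below is illustrative only (hypothetical labels and scores) and maps the resulting ROC curve area onto the qualitative ranking of Table 7.3:

```python
from sklearn.metrics import roc_auc_score, accuracy_score

def qualitative_rank(auc):
    """Map an ROC curve area to the qualitative ranking of Table 7.3."""
    if auc >= 0.90: return 'excellent'
    if auc >= 0.80: return 'very good'
    if auc >= 0.70: return 'fair'
    if auc >= 0.60: return 'poor'
    return 'fail'

# Hypothetical labels (1 = GON) and classifier scores for one validation fold
y_true  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.2, 0.6, 0.4, 0.5, 0.8, 0.7, 0.9, 0.3]
y_pred  = [int(s >= 0.5) for s in y_score]

auc = roc_auc_score(y_true, y_score)
print(auc, qualitative_rank(auc), accuracy_score(y_true, y_pred))
```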


ROC Curve Comparisons

A comparison of the two radial polynomial data modeling methods resulted in similar functions for both Zernike and pseudozernike polynomials (Figure 7.1). In Figure 7.1, the ROC curve area is shown for each radial polynomial function as model complexity is decreased. Each series plotted represents a classification model derived from a different number of polynomial coefficients. Regardless of which radial polynomial was used, decision tree classification performance generally improved as the minimum number of objects per node was increased. This trend was generally slower to emerge when the classification model included additional features (e.g. more polynomial coefficients), and the maximum performance was also generally reduced. In comparison, the pseudozernike-based classification models resulted in greater area underneath the ROC curve, with a maximum area of 84%. These comparisons were repeated for each of the candidate models: Daubechies 3rd order wavelet, 3rd order Coiflet wavelet, Haar wavelet, and B-splines. These classification models were optimized for each number of coefficients considered (e.g. 64, 128, 256, and 1,024). After each of these individual model comparisons was complete, the single best set of features for each model (e.g. pseudozernike-64 coefficients, B-spline-256 coefficients, and Coiflet-144 coefficients) were compared to one another (Figure 7.2). The ROC curves for each model were again compared as a function of the surrogate for model complexity—minimum number of objects per node—to determine the optimal model. In this comparison, the pseudozernike model with 64 coefficients resulted in the classification model with the overall greatest area underneath the ROC curve at 85% (Figure 7.2). The Coiflet wavelet model had

Figure 7.1: Comparison of ROC Curve Area for Zernike and Pseudozernike Models. (a) Zernike models; (b) pseudozernike models. Legends list the number of features (coefficients) for each model, e.g. PZ-64 = pseudozernike model with 64 coefficients.

Figure 7.2: Comparison of ROC Curve Area as a Function of Model Complexity. PZ-64 = pseudozernike model with 64 coefficients; Coif3-144 = third-order Coiflet with 144 coefficients; Spline-256 = B-spline model with 256 coefficients.

the second greatest area underneath the ROC curve (73%), followed by the B-spline model (71%). These models ranged from 10 attributes and 17 leaf nodes to a decision stump—a simple binary split based on a single attribute resulting in two leaf nodes. The more conventional means of controlling model complexity is to modify the confidence factor, where a smaller number results in greater decision tree pruning and simpler classification models. Iterative modifications of the confidence factor did not result in substantially different classification trees when compared with modifications to the minimum number of objects per node; e.g. many of the same features were represented in both sets of trees and the tree structures were similar. Furthermore, the response to changes in the confidence factor was non-linear and therefore highly sensitive to small changes over a small portion of the range evaluated and insensitive over a broader region. In contrast, changes to the minimum number of objects per node had a more linear response, simplifying the decision tree structure.

Classification Accuracy

While maximizing the area underneath the ROC curve is a good way to estimate classifier performance, it is not the only useful criterion for model selection. The calculated area underneath an ROC curve may obscure useful details about how the sensitivity and specificity of a classification method vary over the full range of threshold values. It is possible for several ROC curves, each with a very different shape, to result in the same calculated area. Alternatively, one may wish to maximize classification performance within a particular region of the ROC curve to selectively emphasize sensitivity or specificity. Comparisons of classification accuracy for the

Figure 7.3: Classification Model Accuracy as a Function of Model Complexity. PZ-64 = pseudozernike model with 64 coefficients; Coif3-144 = third-order Coiflet with 144 coefficients; Spline-256 = B-spline model with 256 coefficients.

optimal subset of each candidate model is shown in Figure 7.3. The accuracy of the classification models is related to the ROC curve area and will be maximal when the ROC curve area is greatest unless the number of instances in either category is near zero.

7.2.3 Model Selection

Optimal models for each method were selected based upon cross-validation estimates of model performance from the experimental results described above. These optimal models were then applied to unseen partitions of the data to provide final estimates of classification performance. The area underneath the ROC curves and the classification accuracy for each of the optimal models are compared with their corresponding training estimates of model performance in Table 7.4 and in Figure 7.4. In general, estimates of both the area

Figure 7.4: Comparisons of ROC Curves for Optimal Classification Models. PZ-64 = pseudozernike model with 64 coefficients; Coif3-144 = third-order Coiflet with 144 coefficients; Spline-256 = B-spline model with 256 coefficients.

                      Training                 Testing
Model                 ROC Area    Accuracy     ROC Area    Accuracy
Pseudozernike-64      .84 (.15)   .82 (.15)    .85         .80
Coiflet-144           .79 (.16)   .78 (.16)    .73         .77
B-Spline-256          .77 (.19)   .73 (.17)    .71         .68

Table 7.4: Performance Comparisons of Training and Test Data Partitions. Standard deviations for the training data partitions are shown in parentheses.

underneath the ROC curves and accuracy agreed well with the results from the reserved test partitions of the data. While results with the test partitions of the data were slightly lower in most cases, they were within the reported standard deviations of results from the training partitions of the data. After adjustments for multiple comparisons, these differences in performance were not statistically significant.

7.3 Visualization of Results

The most direct method of visualizing the structure of a decision tree is to diagram the rules as a hierarchical structure, as shown earlier (Figure 6.2). An ideal tree would be relatively sparse, with only a few rules required to accurately classify any record. Furthermore, an ideal classification tree would generalize well and perform similarly on both training and unseen test data. In practice, there is often a trade-off between classification tree complexity and model performance, where more complex trees fit the training data better but generalize poorly. Nevertheless, the classifiers created in this research are compact and performed well on both training and testing partitions of the data. In the following sections, traditional hierarchical graphs are presented to visualize the classification rules for the three best decision trees. Alternatives to these traditional graphs that emphasize the spatial location of, and anatomical correlation to, these classification rules are also presented.

7.3.1 Pseudozernike Decision Tree

A total of 3 features were needed to construct the most accurate classification model of all those considered in this research. The hierarchical tree representation

Figure 7.5: Pseudozernike Decision Tree Classification Model. The decision tree was induced from 64 coefficients; tree attributes are labeled with pseudozernike coefficient numbers C0–C63, and branches are labeled with attribute split values. Terminal nodes are labeled with the categorical class labels assigned to all instances that satisfy the preceding rules. Leaf nodes also show the corresponding number of correct/incorrect classifications for each node, e.g. C0 = Normal (112 correct/16 incorrect).

of these features is shown in Figure 7.5. These individual features have some interpretable meaning and their geometric modes are represented in Figure 7.6 a-c.

The first geometric mode, pseudozernike mode 0 (Figure 7.6 a), is a plane that represents the mean surface height of the fitted data. From the hierarchical tree representation above, this feature was most important for discriminating between normal eyes and eyes labeled with glaucomatous optic neuropathy by expert evaluation of stereographic disc photographs. A mean surface height less than 181 µm was the single best rule, which correctly identified 79% (96/121) of all normal eyes. Two additional features helped to further separate eyes with glaucomatous optic neuropathy from the remaining normal eyes. The other geometric modes that

Figure 7.6: Pseudozernike Modes from the Decision Tree Classifier. (a) Pseudozernike mode 0; (b) pseudozernike mode C60; and (c) pseudozernike mode C62.

contributed to the decision tree are represented by the shapes in Figure 7.6 b and c. These features have no immediately recognizable anatomical structural correlation. Nevertheless, it is possible to speculate on what information they contributed to classification decisions. Each complex surface has a rather large gradient in the mid-peripheral region that may correspond well with the steeper gradients often seen at the cup margin in eyes with glaucomatous optic neuropathy. Further evaluation of the spatial features that differ between these two categorical classes follows in the Section on Two Dimensional Model Representation below.

7.3.2 Coiflet Wavelet Decision Tree

The Coiflet 3rd order wavelet model with 144 coefficients produced the most accurate classification model of all the candidate wavelet feature representations considered. Like the pseudozernike decision tree classifier, the best Coiflet-based decision tree classifier used only three model coefficients. The structure of this tree is diagrammed in Figure 7.7.

Figure 7.7: Wavelet-based Decision Tree Classification Model. The tree was induced from 144 coefficients. Tree attributes are labeled with wavelet coefficient numbers (C133, C13, and C69), and branches are labeled with attribute split values. Leaf nodes are labeled with the categorical class labels assigned to all instances that satisfy the preceding rules. Leaf nodes also show the corresponding number of correct/incorrect classifications at each node, e.g. C13 = Glaucoma (50 correct/2 incorrect).

The individual coefficients that define this classification tree do not describe global anatomical structural features as the pseudozernike coefficients do. Instead, these coefficients capture local details of features that differ between the two categorical classes considered. The details of this analysis are more fully explored in the following section on two dimensional model representations. Without the context of anatomical landmarks, there is little intuition to suggest the clinical importance of these features identified in the wavelet-based classification tree.

7.3.3 B-Spline Decision Tree

The spline-based decision tree classifier is similar in structure to the pseudozernike decision tree (Figure 7.8). As with the pseudozernike and wavelet-based decision trees, three spline coefficients were sufficient to produce the best classification

Figure 7.8: B-spline Decision Tree Classification Model. The tree was induced from 256 B-spline coefficients. Tree attributes are labeled with spline coefficient numbers 1-256, and branches are labeled with attribute split values. Terminal nodes are labeled with the categorical class labels assigned to all instances that satisfy the preceding rules. Leaf nodes also show the corresponding number of correct/incorrect classifications at each node, e.g. C141 = Normal (95 correct/10 incorrect).

tree. Like the wavelet-based classifier, the attributes of the spline-based decision tree represent local features. These local differences, combined with greater variability, resulted in lower classification performance by any standard when compared to the other classification models evaluated. From Figure 7.9 c, the coefficients that were selected as attributes by decision tree induction are located in the upper- and lower-right quadrants. The anatomical correspondence of these coefficient locations is discussed in the following section.

Figure 7.9: Discriminant Features from Optimal Classification Models. (a) Pseudozernike; (b) Coiflet wavelet; (c) B-spline.

7.3.4 Two Dimensional Model Representations

As an alternative to a hierarchical diagram of decision rules, the results of these experiments may be visualized graphically as two-dimensional figures to more intuitively show the correspondence between anatomical structure and the discriminating features identified by decision tree classification. This method of visualization is achieved by constructing an image from the subset of features identified as the discriminating elements from the decision tree induction. These features are then combined in proportions determined from the value of the splitting attributes of the decision tree. Elements that do not contribute to classification are set to zero to accentuate the contrast between the classification tree attributes and any surrounding structures (Figure 7.9).

Pseudozernike Model Visualization

As stated previously, the discriminating features of the pseudozernike classification tree do not suggest any readily identifiable correlation to anatomical structure

(Figure 7.9 a). One possible interpretation of this feature set is that it captures some of the gradient differences between normal and glaucomatous optic neuropathy in the mid-peripheral regions. Another way to visualize differences between these two categorical classes is to compare the median representation of each class. A figure representing each class was constructed by setting the 64 elements of the pseudozernike feature vector to the median values for each of the two classes (Figure 7.10). The difference between the two median class representations (Figure 7.10 a and c) is greater maximum depression, greater total area of depression, and greater slope in the glaucomatous optic neuropathy group compared to the normal group. When these median images are combined with the corresponding coefficients from the decision tree, it is possible to show more specifically how cases differed by category. With the median image coefficients that correspond to the decision tree attributes set to the splitting criteria values, it is possible to show how cases that fell outside the category differed from the median of the category. The median normal image combined with the decision tree attributes shows that borderline normal cases had a larger region of depression with greater maximal depth (Figure 7.10 b). There is also a more subtle region of localized depression in the superior nasal quadrant compared with the inferior margin. The differences between the median glaucomatous optic neuropathy category and the borderline cases created from the addition of the decision tree attributes are shown in Figures 7.10 c and d. The median glaucomatous optic neuropathy data differ from the borderline features by having greater maximum depth, greater total area of depression, and localized depression in the superior and inferior nasal regions.
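A sketch of this visualization step is given below. It assumes a surface reconstruction routine (here called reconstruct_surface and left as a placeholder, since the actual pseudozernike basis evaluation is described in Chapter 5); the class counts follow Table 7.1, and the split values for C60 and C62 are hypothetical placeholders (only the 181 µm split on C0 is reported in the text):

```python
import numpy as np

def reconstruct_surface(coeffs):
    """Placeholder for the pseudozernike surface reconstruction of Chapter 5:
    would return the topographic surface generated by a 64-element coefficient vector."""
    raise NotImplementedError

# coeff_matrix: rows are eyes, columns are the 64 pseudozernike coefficients (placeholder data)
coeff_matrix = np.zeros((275, 64))
labels = np.repeat([0, 1], [161, 114])            # 0 = normal, 1 = GON (counts from Table 7.1)

# Median representation of each class (Figure 7.10 a and c)
median_normal = np.median(coeff_matrix[labels == 0], axis=0)
median_gon = np.median(coeff_matrix[labels == 1], axis=0)

# Discriminant-feature image (Figure 7.9 a): keep only the decision tree attributes,
# set every other element to zero, and weight the kept terms by their split values.
tree_attributes = {0: 181.0, 60: 0.0, 62: 0.0}    # hypothetical coefficient: split value pairs
discriminant = np.zeros(64)
for index, split_value in tree_attributes.items():
    discriminant[index] = split_value

# Borderline-case image (Figure 7.10 b): the median class vector with the decision tree
# coefficients replaced by their splitting criteria values.
borderline_normal = median_normal.copy()
for index, split_value in tree_attributes.items():
    borderline_normal[index] = split_value

# surface = reconstruct_surface(borderline_normal)  # would yield the rendered image
```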

Figure 7.10: Visualization of Median Pseudozernike Model Features. (a) Pseudozernike representation of the median normal category; (b) pseudozernike representation of borderline normal features; (c) representation of median glaucomatous optic neuropathy category features; (d) borderline glaucomatous optic neuropathy features.

Figure 7.11: Median Wavelet Models for Each Category. (a) Median normal classification features; (b) median glaucomatous optic neuropathy classification features; (c) median normal model with decision tree classification features added; note the superior nasal depression.

Wavelet Model Visualization

Unlike the pseudozernike polynomials, the wavelet models of the data do not capture the global features in the same manner. By design, and as a result of the specific implementation of wavelets in this research, the Coiflet-based decision tree captures local structural differences between these two categorical classes. Similar to the pseudozernike representations of the median category, a wavelet-based median surface was computed for each patient category. These median surfaces are shown in Figures 7.11 a and b. When the features of the median normal surface are combined with the attributes of the wavelet classification tree (Figure 7.9 b), a superior nasal notch created by the C69 coefficient is evident (Figure 7.11 c). As with the pseudozernike model, the median image shows how these two categorical groups differed.

Figure 7.12: Median B-Spline Models for Each Category. (a) Median normal classification features; (b) median glaucomatous optic neuropathy classification features.

B-Spline Model Visualization

Visualizing the B-spline model in a similar manner, the median normal surface again differs from the median representation of glaucomatous optic neuropathy by having a smaller area of depression with a shallower maximum value (Figure 7.12). The spline classification model features that delineate between normal and glaucomatous optic neuropathy were shown previously in Figure 7.9 c. The first feature selected in the hierarchical model lies in the lower-right quadrant (inferior-nasal), as does the third model attribute, which is more peripheral. The second most important model attribute lies in the upper-right (superior-nasal) quadrant. There is some consistency among the different data modeling methods. In each of the classification models, local clusters of points located at the margin of the steepest data gradient were associated with a greater likelihood of classification as glaucomatous optic neuropathy. There was a difference in location between the wavelet and B-spline methods. The location that made the most significant contribution (higher in the hierarchical decision tree) to glaucoma classification in the B-spline decision tree was a cluster of data points in the inferior-nasal region. The most important location in the wavelet-based decision tree was a superior-nasal cluster of data points. These findings have validity in the context of clinical practice and are consistent with recent reports by others investigating patients with emerging glaucoma [13, 69].

7.4 Alternative Gold Standards

The accuracy of this decision tree classification method is limited by the validity of the gold standard. Up to this point, the gold standard used has been expert grading of stereographic disc photographs. In the following sections, two alternative gold standards are considered: Moorfields Regression Analysis and visual field-based classification assignments.

7.4.1 Moorfields Regression Analysis

Using analysis methods that were otherwise similar, binary classification labels from the Moorfields Regression Analysis were used in place of the classification labels from stereographic disc photography grading. For each data modeling method—radial polynomials, wavelets and B-splines—multiple models were evaluated to determine the best combination of model complexity and model performance. As before, the performance measures considered were area underneath the ROC curve and total accuracy. As shown earlier in Table 7.1, using stereographic disc photography grading for comparison, Moorfields Regression Analysis was specific, but not

Figure 7.13: Classification Performance Based on Moorfields Regression Analysis. (a) Area underneath the ROC curve as a function of minimum number of objects per node (inversely related to model complexity); (b) accuracy as a function of model complexity. PZ-64 = pseudozernike model, 64 coefficients; Coif-144 = coiflet wavelet model, 144 coefficients; Spline-256 = spline model, 256 coefficients.

sensitive. Thus, a greater proportion of normal cases were correctly identified compared to the number of correctly identified cases of glaucomatous optic neuropathy. As a result, the best possible models using Moorfields Regression Analysis as the gold standard were similarly constrained. This resulted in generally lower maximum performance than with the photo grading standard by either criterion—ROC curve area or accuracy. A plot of the area underneath the ROC curve as a function of model complexity is shown for all three of the data modeling methods in Figure 7.13 a. Comparing these functions with the plot of ROC curve area using stereographic photo grading (Figure 7.2), the maximum area was lower for all three of the methods. Unlike with the disc photo grading standard, the spline-based modeling method performed best. Both the pseudozernike and B-spline methods had better performance as model complexity decreased. Classification accuracy as a function of model complexity was also reduced compared to the best performance with photo grading as the gold standard (Figure 7.13 b). Again, the B-spline model performed best, at 72% total correct classifications for the best classification model. The maximum accuracy of the pseudozernike and wavelet models was near 66%. The specificity of the best B-spline model was 78% (103/132 eyes). As expected from the results comparing the photography and Moorfields Regression Analysis gold standards, the sensitivity of the best spline-based decision tree classifier was poor at 49% (36/74 eyes). In summary, the best decision tree classifier based on the Moorfields Regression Analysis had good specificity, but poor sensitivity. The greatest area underneath the ROC curve resulted from a spline-based model.

7.4.2 Visual Field-based Classification

There were two visual field-based classification standards used: the Glaucoma Hemifield Test (GHT) and two abnormal GHT clusters. Both of these gold-standard classification criteria resulted in similar decision tree performance. If accuracy is the only performance measure considered, both criteria resulted in a maximal number of correct classifications of nearly 90% (Figure 7.14). Judging by accuracy alone, this is excellent performance. When classification performance was measured by the area underneath the ROC curve, however, there were no acceptable decision tree classifiers. The decision tree classifier with the greatest area underneath the ROC curve was a wavelet-based decision tree with an ROC curve area just under 60% (Figure 7.15). All of the candidate models

Figure 7.14: Classification Accuracy Based on Visual Field Standards. (a) Glaucoma Hemifield Test (GHT) classification standard; classification accuracy plotted as a function of minimum number of objects per node (inversely related to model complexity). (b) Two abnormal GHT clusters classification standard; classification accuracy plotted as a function of model complexity. PZ-64 = pseudozernike model, 64 coefficients; Coif-144 = coiflet wavelet model, 144 coefficients; Spline-256 = spline model, 256 coefficients.

Figure 7.15: ROC Curve Performance for Visual Field-based Classification. (a) Area underneath the ROC curve using the GHT as the classification standard; ROC curve area is plotted as a function of minimum number of objects per node (inversely related to model complexity). (b) Area underneath the ROC curve using two abnormal GHT clusters as the classification standard. PZ-64 = pseudozernike model, 64 coefficients; Coif-144 = coiflet wavelet model, 144 coefficients; Spline-256 = spline model, 256 coefficients.

(pseudozernike, wavelet and spline) plateaued with non-discriminating classification performance—an ROC curve area of 50%—as model complexity was reduced. This result emphasizes the benefit of comparing the area underneath the ROC curve as a summary measure of performance as opposed to classification accuracy alone. Each of these summary measures describes a different aspect of classification performance. Since the patients evaluated in this study were selected to have either early glaucoma or higher risk for glaucoma, there were relatively few cases identified as having both glaucomatous optic neuropathy and visual field defects by either visual field-based classification standard. As a result, it is possible to have excellent classification accuracy by simply categorizing all cases as normal. Thus, the total number of correct classifications is high even though the number of correctly identified cases of glaucomatous optic neuropathy is zero. In this case, complete failure to identify one class—those with glaucomatous optic neuropathy—is accompanied by perfect labeling of normal cases. The result is excellent accuracy, but poor individual class discrimination. In summary, the visual field classification standard is inappropriate for this study of early and suspect cases of glaucoma. Structural classification standards such as Moorfields Regression Analysis and expert grading of stereographic disc photography are more appropriate class labeling standards in this population of patients with high risk and emerging disease. If the objective of this study were to use structural information to detect glaucoma, then visual field criteria would be the preferred classification standard. Since the objective of this research is to use quantitative structural data to correctly identify specific features associated with early signs of glaucomatous optic neuropathy, photo grading and Moorfields Regression Analysis are arguably appropriate.
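A small numeric illustration of this point, using the marginal totals of Table 7.2 (19 of 275 eyes abnormal by the GHT standard): a classifier that labels every eye as normal achieves high accuracy but zero sensitivity and a chance-level ROC area.

```python
n_total, n_abnormal = 275, 19          # marginal totals from Table 7.2
n_normal = n_total - n_abnormal

accuracy = n_normal / n_total          # label everything "normal"
sensitivity = 0 / n_abnormal           # no abnormal eye is ever identified
specificity = n_normal / n_normal      # every normal eye is labeled correctly
roc_area = 0.5                         # a constant classifier has no discrimination

print(f"accuracy = {accuracy:.2f}, sensitivity = {sensitivity:.0%}, "
      f"specificity = {specificity:.0%}, ROC area = {roc_area}")
# accuracy = 0.93, sensitivity = 0%, specificity = 100%, ROC area = 0.5
```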

CHAPTER 8

CONCLUSION

The central challenge of this dissertation was to evaluate the position asserted in the thesis—that statistical learning methods provide an objective and quantitative means to analyze topographic structural features of the optic nerve head that facilitate early detection of glaucomatous optic neuropathy. Meeting this challenge has led to several incremental advances. First, this research demonstrates several viable approaches to modeling tomographic representations of the optic nerve head region that permit quantitative representations of the structural features and expose them for the purpose of classification. Second, these features were successfully used to classify glaucomatous optic neuropathy at very early stages, prior to the emergence of visual field deficits. Finally, these methods permit visualization of the basis for individual classifications, thereby enhancing the interpretability of these methods.

8.1 Discussion of Modeling Results

The radial polynomials, specifically pseudozernike modeling, provided the most accurate representation of the optic nerve head elevation data. The radial polynomial methods capture the global features best and by design tend to smooth local features. The locations of the residual errors observed were not random and appeared

to cluster in regions of highest data gradients. The total magnitude of the residual errors was also associated with class assignment and was higher among patients classified with glaucomatous optic neuropathy. These errors may be avoided to some degree by fitting a more complex polynomial model to the data. At some point this will lead to diminishing returns in one of two ways. First, adding additional terms comes at the expense of computational speed; more polynomial terms require more time to compute. Keeping the computational times short increases the possibility of developing a useful clinical implementation of these methods. Second, at some point additional polynomial terms will not improve the fit of the model, but will contribute additional noise. The asymptotic decline in RMS error helps to illustrate this relationship. When terms in the model begin to describe noise rather than actual structural features, these model terms also become less useful as classification features for decision tree induction. This point is demonstrated by comparing the different levels of complexity of the pseudozernike data models. The decision tree induced from the 64-term pseudozernike model had better classification performance than the one induced from the 256-term model. Additional model complexity did lower the residual RMS error of the model, but this did not improve the resulting decision tree classification performance. There was a reduction in the time required to compute equivalent models using wavelet and B-spline methods. There are two ways to consider model equivalence. First, the models were compared with respect to the number of features, e.g. 256 Zernike polynomial coefficients and 256 wavelet coefficients. Second, they were compared with respect to the residual RMS error. The pseudozernike-based models had significantly less residual RMS model error when compared with wavelet and

B-spline models with an equivalent number of features. To generate models that were comparable with respect to residual model RMS error, a 64-element pseudozernike model (50 µm) was similar to a wavelet or spline-based model with 256 features. Nevertheless, the computational time was very different between the three modeling methods—approximately 5 seconds for the pseudozernike model and 0.05 seconds for the spline and wavelet methods. The distribution of the RMS error was similar for all of the modeling methods. The greatest residual errors, regardless of modeling method, were located in regions of steepest elevation gradients. Furthermore, these were higher among eyes classified with glaucomatous optic neuropathy than eyes classified as normal. Although the magnitude of residual RMS model error seems intuitively useful, the magnitude of the model error alone was not selected as one of the most discriminating features during decision tree induction. It is possible that combining the magnitude of RMS error with some additional information regarding the spatial distribution of this error may prove more useful than the total magnitude alone. One other reason for the association of RMS error with class assignment is related to the boundary problem of fitting radial polynomial and spline models. In each of these modeling methods, larger elevation gradients near the boundary will tend to influence the quality of fit near the periphery. Eyes with a larger area of cupping would therefore be more likely to have greater RMS error in these peripheral locations, which would contribute to greater overall RMS error. In this study, eyes classified as having glaucomatous optic neuropathy did have greater cup area, a structural feature long associated with glaucomatous optic neuropathy. Nevertheless, steep mid-peripheral elevation


gradients contributed more to RMS error with larger optic nerves than did errors from boundary locations. By design, the wavelet and B-spline modeling methods implemented here capture local detail information rather than global features, albeit as a downsampled representation of local details. Nevertheless, these wavelet-based modeling methods identified several locally relevant structural features at the superior and inferior nasal disc margins that were associated with glaucomatous optic neuropathy in this patient population identified as being at greater than normal risk for glaucoma. There are limitations to the wavelet implementation methods used here. A wavelet signal decomposition results in approximation and detail coefficients. The methods implemented in this research do not use the detail coefficients. This effectively eliminated information related to the gradient and other high-frequency details that these results suggest provide useful features for discriminating these two patient categories. Modification of these data modeling methods to incorporate this additional information may yield better features that should improve classification results as well. The use of Hilbert space-filling curves presents another limitation. This method of reducing the dimensionality of the data preserved the spatial address, but not the spatial context, of the original two-dimensional data array. As a result, this method was useful for constructing visualizations of the results, but limited the fidelity of the wavelet models and the classification performance based upon them. Alternative strategies that could improve the performance of wavelet-based modeling and subsequent classification include the use of two-dimensional wavelets with spatial clustering [70], or the use of alternatives to the Hilbert space-filling curves that better preserve the

spatial context. Alternative sampling strategies have been proposed by others that adaptively sample based on the spatial context of local structure in the data and could be applied to this problem [71]. The classification task performed in this research was to classify cross-sectional data. If longitudinal disease classification is the objective, then wavelet methods may have a natural advantage over radial polynomial moments despite their apparent disadvantages as implemented here. Wavelets are, by design, intended for time-dependent signal analysis and are ideally suited for detecting change in longitudinal signals. These modeling methods may be easily extended to include analysis of results from automated threshold visual field examinations. The sensitivity of the visual field is sometimes plotted as a two-dimensional threshold surface. When the data are organized in this manner, their features are easily subjected to a similar analysis. This approach may have advantages over point-wise linear regression methods, which ignore the influence of spatially adjacent information, and could show how clusters of visual field defects evolve over time. In summary, these data modeling methods describe the structural features well. The radial polynomial methods offer the best balance between compactness and fidelity. Unfortunately, they are also the most computationally intensive. Nevertheless, as the classification results show, the number of features required to construct the optimal classifier is reasonably practical to compute. While the radial polynomial modeling methods offer a good balance between representation of global and local feature details, the spline and wavelet modeling methods as implemented here


selectively emphasize the detail components. Alternative implementations may improve the utility of these multi-resolution methods. The approach to modeling features described in this research offers some advantages over current methods of HRT data analysis. First, by eliminating the need for users to identify the margin of the optic disc, these methods require no user input; as a result, this approach is free from any errors that could be related to erroneous identification of the disc margin. Second, these modeling methods are free from user- or software-imposed model constraints. The model parameters describe topographic features of the optic disc and peripapillary region. They are not constrained to traditional univariate summary indices such as cup-to-disc area, neural rim thickness, etc. Swindale and colleagues have described methods for classification of structural features of the optic nerve in eyes with glaucomatous visual field damage [72]. Their approach constrained the data to a Gaussian profile, which clearly did not apply to many of the topographic profiles of the optic nerve heads evaluated in this research. The approach described in this research allows a greater level of flexibility in modeling the data, which may provide better data representations for classification.

8.2 Discussion of Classification Results

The subjects evaluated in these studies were recruited to represent patients with greater than average risk of developing glaucoma as well as early stages of the disease spectrum. As a consequence, diagnostic classification of these individual cases is, by design, a difficult challenge. The objective of using machine learning methods to classify structural features associated with glaucomatous optic neuropathy is twofold. First, these methods provide an objective and quantitative means of

challenging existing clinical conventions. Second, this approach was intended to provide an alternative perspective on the structural features relevant to the diagnosis of early disease. The product of this effort is a quantitative basis for classification that provides an objective standard, thereby facilitating consistency among clinical investigators. The best decision tree classification performance was the result of tree induction from the 64-element pseudozernike polynomial data representation. The resulting tree had an area underneath the ROC curve of 85%, which was associated with a specificity of 90% and a sensitivity of 70%. High specificity is a reasonable objective and was more easily achieved than high sensitivity. This result is similar to previous research on emerging glaucoma [8, 9, 4, 73]. It is more difficult to achieve high sensitivity in emerging glaucoma when classification is performed as a cross-sectional exercise. As suggested by Medeiros and colleagues, this is a more difficult problem than classification based on longitudinal observations [31]. One reason that high specificity may be easier to achieve is that, by design, those with disease represented a fairly homogeneous group. Consequently, they were more likely to share some structural characteristics than the patients with whom they were contrasted. Conversely, eyes that were classified in the normal category were not at all a homogeneous group. For example, eyes in the normal category had a mean (± SD) elevation of −74 ± 136 µm, while the glaucomatous optic neuropathy class had a mean elevation of −254 ± 126 µm. This higher average elevation makes it easier to label eyes in the normal category, but the overlapping distributions over most of the range of elevations make it difficult to separate these two categories on this basis alone.

A question that arises from review of these results is whether or not one could improve upon them by incorporating ensemble machine learning methods. The two most common ensemble methods that may be combined with decision trees are boosting and bagging [74]. The motive behind the use of these ensemble methods is to reduce misclassification errors or to improve the accuracy of performance estimates. The boosting strategy is an iterative learning method that attempts to improve at every iteration by constructing additional classification trees that concentrate on the marginal instances that were erroneously classified in the previous iteration. Alternatively, bagging is based upon sampling with replacement—a strategy that stabilizes estimates of performance by repeated sampling. Common to both of these approaches is the construction of multiple classifiers upon which final performance is based. Although the results are not presented in this dissertation, implementing these methods did not improve classification performance above that of the conventional methods presented here. Moreover, their use comes at a considerable expense to the interpretability of the results. For example, there is presently no way to visualize the multiple dimensions of an aggregate of 10 individual decision trees. The decision trees induced from the wavelet and spline data models provided some additional information regarding local features that distinguish between the two patient categories considered here. These models suggest that the superior nasal and temporal margins of the disc provide the most useful local details to separate these categories. This is in agreement with the findings of the Ocular Hypertension Treatment Study [13, 69] and earlier findings dismissed by Wollstein and colleagues [6]. Identification of the superior nasal nerve fiber bundles presents a challenge for correlation of structural features to measures of visual field sensitivity. Presently,

visual field test strategies evaluate a very limited region of the temporal visual field. These findings suggest that alternative test strategies that sample a broader spatial representation of the visual field may be useful. Identifying emerging glaucoma is not the same task as identifying moderate to advanced disease. It is likely that the classification models that would result from decision tree induction of data from eyes with measurable visual field loss would differ from those generated from this population, and this is a good question to address in future research. An important criticism of this research is the validity of the classification standard—expert grading of stereographic disc photos. In most cases an independent classification standard is preferred. If the research objective is to identify structural features of disease, then visual field performance is typically the basis for disease classification. It is arguable whether this information is truly independent. An alternative view is that this complementary data merely represents more severe pathology along the disease continuum. Indeed, Figure 3.2 nicely demonstrates that the subjects considered in this research do not satisfy most accepted visual field-based definitions of glaucoma. At most, these subjects would be considered at risk for disease. In light of recent studies, it may be better to consider the visual field classification data as an indicator of disease severity [31], rather than a sensitive test for early detection in patients with high risk for glaucoma. The results of this research agree with this assertion. In this case, the objective was to identify topographic structural features in the optic nerve head region that were associated with glaucomatous optic neuropathy, not glaucoma. Since glaucomatous optic neuropathy typically precedes measurable

Since glaucomatous optic neuropathy typically precedes measurable visual field loss, visual field-based classification standards are a relatively indiscriminate basis for identifying ocular hypertensive glaucoma suspects, as there will be poor agreement between structural classification standards of glaucomatous optic neuropathy and established visual field defects. This argument is supported by the results comparing visual field class assignments with classification by photo grading in Table 7.2. The point of this research was to detect structural signs of glaucomatous optic neuropathy, which suggests that established methods of detecting structural changes associated with glaucomatous optic neuropathy, e.g. stereographic disc photo grading and Moorfields Regression Analysis, are appropriate classification standards. It should be clear that the objectives of this research are not the same as detecting perimetrically defined glaucoma.

Differences between the bases for classification of glaucomatous optic neuropathy by stereographic disc photo evaluation and by topographic HRT evaluation should also be emphasized. The protocol for classification of glaucomatous optic neuropathy by stereo disc photography relied primarily on features that were neither accessible from, nor analyzed in, the topographic HRT data. These features included the size of the optic disc, disc hemorrhage, vertical elongation of the cup, and any other feature derived from identification of the optic disc, including neural rim thickness.

A limitation that remains unaddressed in this analysis is that suspicious optic nerve head morphology was an acceptable subject entry criterion. This may bias the results in favor of the features identified as distinguishing the two patient categories in this research: larger cup area, greater cup volume, and localized superior nasal and inferior temporal height. Independent confirmation in a patient population where disc appearance was not an entry criterion could address this limitation.

Nevertheless, results from the Diagnostic Innovations in Glaucoma Study, an ancillary study to the Ocular Hypertension Treatment Study, demonstrated that several baseline features (e.g. cup-disc area, mean cup depth, rim area, cup volume, and Moorfields Regression Analysis) were associated with the development of glaucoma even when normal disc appearance was an enrollment criterion for the study [14].

8.3 Future Work

This approach to classification may be easily adapted to integrate additional information, e.g. both structural and functional disease elements, interocular asymmetry, demographic data, or comorbidities. In this way it would be possible to develop a better model of risk factors for glaucoma that considers not only clinical observations but other individual factors as well. Predictive modeling is another extension of these classification and modeling experiments. If reliable features associated with increased risk of disease can be identified, then it is also possible to integrate this information into quantitative models that assign a calculated level of individual risk for disease. These objectives will be the focus of future work.

Integration of additional information is a potentially productive vein that should be explored further. Decision tree methods are ideally suited to sifting through numerous candidate features to identify those that provide the most useful information for class discrimination. There is no requirement that the data be of a similar type; structural features may be easily combined with functional measures or with demographic data, as sketched below.
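As an illustration only, and not an analysis performed in this research, the following sketch shows how mixed feature types could be presented to a single decision tree. It is written in Python with pandas and scikit-learn, and every column name and value is hypothetical.

    # Hypothetical example: structural, functional, demographic, and asymmetry
    # features in one decision tree. None of these values are study data.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    features = pd.DataFrame({
        "cup_volume_mm3":     [0.10, 0.45, 0.08, 0.52, 0.12, 0.48],  # structural
        "mean_deviation_db":  [-0.5, -2.8, 0.3, -3.5, 0.1, -2.2],    # functional (visual field)
        "age_years":          [55, 68, 47, 72, 51, 66],              # demographic
        "rim_area_asymmetry": [0.05, 0.30, 0.02, 0.35, 0.04, 0.28],  # interocular asymmetry
    })
    labels = [0, 1, 0, 1, 0, 1]   # 0 = normal, 1 = glaucomatous optic neuropathy

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(features, labels)
    print(export_text(tree, feature_names=list(features.columns)))

Because the tree chooses its own splitting variables and thresholds, features with different units and scales can be combined without rescaling, which is one reason decision tree induction is attractive for this kind of heterogeneous data.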

One aspect of glaucomatous optic neuropathy that was not explored with these methods, and that may add power to the analysis, is inclusion of structural information from the fellow eye. Others have shown that structural asymmetry is a useful predictor of glaucoma, and these measures are already incorporated into current versions of the HRT software [75]. Su and Fan have described methods for inducing decision trees from correlated data, and this approach may permit the use of correlated measurements from the fellow eye, allowing structural asymmetry to be considered as a predictor of disease [76].

A limitation of this research is that the demographics of the patient population evaluated are fairly homogeneous. Others have shown that anatomical characteristics differ among ethnic groups, e.g. the optic discs of African Americans are larger and the risk of glaucoma in this demographic group is greater [13, 69]. Other ethnic patient groups should be evaluated to assess the validity of these findings in populations with known anatomical differences. Because race may merely be a surrogate for larger optic discs, additional studies with subjects matched for disc size and demographics would help to address these questions.

There are several possible ways to extend these research methods to permit longitudinal analysis. As mentioned above, wavelets offer advantages for longitudinal analysis because these signal analysis methods are optimized for this purpose; an illustrative sketch follows this paragraph. Challenges with the current implementation have been identified that should be addressed to strengthen this approach, such as including detail coefficients and additional spatial information. Nevertheless, wavelets offer clear advantages for longitudinal analysis.
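The following is a minimal sketch of what such a longitudinal comparison might look like, written in Python with the PyWavelets library on simulated surfaces rather than HRT topographies; the wavelet family, decomposition level, and change measure are all assumptions made for illustration.

    # Illustrative only: simulated baseline and follow-up surfaces, not study data.
    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    baseline = rng.normal(scale=0.05, size=(128, 128))   # placeholder topographic height map
    followup = baseline.copy()
    followup[40:60, 70:90] -= 0.2                        # simulated localized depression

    def approximation_coeffs(surface, wavelet="db4", level=3):
        # 2-D discrete wavelet decomposition; only the coarse approximation is kept here,
        # although detail coefficients could also be compared between visits.
        coeffs = pywt.wavedec2(surface, wavelet=wavelet, level=level)
        return coeffs[0]

    change = approximation_coeffs(followup) - approximation_coeffs(baseline)
    print("largest absolute coefficient change:", float(np.abs(change).max()))

In principle, coefficient differences exceeding test-retest variability could flag regions of the optic nerve head for closer review, although establishing such a threshold would itself require longitudinal data.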


Others have described methods that use geometric moments (e.g. radial polynomials) to detect motion in video sequences for video compression applications [43]. It may be possible to modify this approach to permit detection of longitudinal changes in structural features of the optic nerve head.

In conclusion, this research provides incremental advances in structural modeling of the optic nerve head and in the classification of glaucomatous optic neuropathy. First, it provides quantitative and objective methods of classification that warrant further investigation in different demographic populations and at other levels of disease severity. Most importantly, these methods should be evaluated with longitudinal data to determine whether they are sensitive to change in structural features over time. Second, these methods should be evaluated further to determine whether functional measures can be combined with these structural data analysis methods to provide additional insight into the enigmatic relationship between the structural signs of glaucomatous optic neuropathy and functional loss of vision in glaucoma.


BIBLIOGRAPHY

[1] HA Quigley. Number of people with glaucoma worldwide. Br J Ophthalmol, 80(5):389–393, 1996.
[2] DS Friedman. Vision Problems in the U.S.: Prevalence of Adult Vision Impairment and Age-Related Eye Disease in America. Technical report, Prevent Blindness America, National Eye Institute, 2002.
[3] RA Hitchings and CA Wheeler. The optic disc in glaucoma. IV: Optic disc evaluation in the ocular hypertensive patient. Br J Ophthalmol, 64(4):232–239, 1980.
[4] CA Johnson, JL Keltner, MA Krohn, and GL Portney. Photogrammetry of the optic disc in glaucoma and ocular hypertension with simultaneous stereo photography. Invest Ophthalmol Vis Sci, 18(12):1252–1263, 1979.
[5] DF Garway-Heath and RA Hitchings. Quantitative evaluation of the optic nerve head in early glaucoma. Br J Ophthalmol, 82(4):352–361, 1998.
[6] G Wollstein, DF Garway-Heath, and RA Hitchings. Identification of early glaucoma cases with the scanning laser ophthalmoscope. Ophthalmology, 105(8):1557–1563, 1998.
[7] S Miglior, M Guareschi, E Albe, S Gomarasca, M Vavassori, and N Orzalesi. Detection of glaucomatous visual field changes using the Moorfields regression analysis of the Heidelberg retina tomograph. Am J Ophthalmol, 136(1):26–33, 2003.
[8] CA Johnson, PA Sample, GA Cioffi, JR Liebmann, and RN Weinreb. Structure and function evaluation (SAFE): I. Criteria for glaucomatous visual field loss using standard automated perimetry (SAP) and short wavelength automated perimetry (SWAP). Am J Ophthalmol, 134(2):177–185, 2002.
[9] CA Johnson, PA Sample, LM Zangwill, CG Vasile, GA Cioffi, JR Liebmann, and RN Weinreb. Structure and function evaluation (SAFE): II. Comparison of optic disk and visual field characteristics. Am J Ophthalmol, 135(2):148–154, 2003.

[10] SK Gardiner, CA Johnson, and GA Cioffi. Evaluation of the structure-function relationship in glaucoma. Invest Ophthalmol Vis Sci, 46(10):3712–3717, 2005.
[11] AJ Kwartz, DB Henson, RA Harper, AF Spencer, and D McLeod. The effectiveness of the Heidelberg Retina Tomograph and laser diagnostic glaucoma scanning system (GDx) in detecting and monitoring glaucoma. Health Technol Assess, 9(46):1–148, 2005.
[12] RS Anderson. The psychophysics of glaucoma: improving the structure/function relationship. Prog Retin Eye Res, 25(1):79–97, 2006.
[13] MO Gordon, JA Beiser, JD Brandt, DK Heuer, EJ Higginbotham, CA Johnson, JL Keltner, JP Miller, RK Parrish 2nd, MR Wilson, and MA Kass. The Ocular Hypertension Treatment Study: baseline factors that predict the onset of primary open-angle glaucoma. Arch Ophthalmol, 120(6):714–720; discussion 829–830, 2002.
[14] LM Zangwill, RN Weinreb, JA Beiser, CC Berry, GA Cioffi, AL Coleman, G Trick, JM Liebmann, JD Brandt, JR Piltz-Seymour, KA Dirkes, S Vega, MA Kass, and MO Gordon. Baseline topographic optic disc measurements are associated with the development of primary open-angle glaucoma: the Confocal Scanning Laser Ophthalmoscopy Ancillary Study to the Ocular Hypertension Treatment Study. Arch Ophthalmol, 123(9):1188–1197, 2005.
[15] C Bowd, LM Zangwill, FA Medeiros, J Hao, K Chan, TW Lee, TJ Sejnowski, MH Goldbaum, PA Sample, JG Crowston, and RN Weinreb. Confocal scanning laser ophthalmoscopy classifiers and stereophotograph evaluation for prediction of visual field abnormalities in glaucoma-suspect eyes. Invest Ophthalmol Vis Sci, 45(7):2255–2262, 2004.
[16] LA Kerrigan-Baumrind, HA Quigley, ME Pease, DF Kerrigan, and RS Mitchell. Number of ganglion cells in glaucoma eyes compared with threshold visual field tests in the same persons. Invest Ophthalmol Vis Sci, 41(3):741–748, 2000.
[17] C Johnson. Selective versus nonselective losses in glaucoma. J Glaucoma, 3:S32–S44, 1994.
[18] R Bathija, N Gupta, L Zangwill, and RN Weinreb. Changing definition of glaucoma. J Glaucoma, 7(3):165–169, 1998.
[19] BL Lee, R Bathija, and RN Weinreb. The definition of normal-tension glaucoma. J Glaucoma, 7(6):366–371, 1998.
[20] JC Morrison and IP Pollack. Glaucoma: science and practice. Thieme Medical Publishers, New York, 2003.

[21] A Fercher, W Drexler, C Hitzenberger, and T Lasser. Optical coherence tomography: principles and applications. Rep Prog Phys, 66(2):239–303, 2003.
[22] RN Weinreb, S Shakiba, and L Zangwill. Scanning laser polarimetry to measure the nerve fiber layer of normal and glaucomatous eyes. Am J Ophthalmol, 119(5):627–636, 1995.
[23] LM Zangwill, K Chan, C Bowd, J Hao, TW Lee, RN Weinreb, TJ Sejnowski, and MH Goldbaum. Heidelberg retina tomograph measurements of the optic disc and parapapillary retina for detecting glaucoma analyzed by machine learning classifiers. Invest Ophthalmol Vis Sci, 45(9):3144–3151, 2004.
[24] C Bowd, FA Medeiros, Z Zhang, LM Zangwill, J Hao, TW Lee, TJ Sejnowski, RN Weinreb, and MH Goldbaum. Relevance vector machine and support vector machine classifier analysis of scanning laser polarimetry retinal nerve fiber layer measurements. Invest Ophthalmol Vis Sci, 46(4):1322–1329, 2005.
[25] FS Mikelberg, C Parfitt, N Swindale, S Graham, S Drance, and R Gosine. Ability of the Heidelberg retina tomograph to detect early glaucomatous visual field loss. J Glaucoma, 4(4):242–247, 1995.
[26] M Iester, DC Broadway, FS Mikelberg, and SM Drance. A comparison of healthy, ocular hypertensive, and glaucomatous optic disc topographic parameters. J Glaucoma, 6(6):363–370, 1997.
[27] M Iester, FS Mikelberg, and SM Drance. The effect of optic disc size on diagnostic precision with the Heidelberg retina tomograph. Ophthalmology, 104(3):545–548, 1997.
[28] R Bathija, L Zangwill, CC Berry, PA Sample, and RN Weinreb. Detection of early glaucomatous structural damage with confocal scanning laser tomography. J Glaucoma, 7(2):121–127, 1998.
[29] RO Burk, K Rohrschneider, H Noack, and HE Volcker. [Volumetric analysis of the optic papilla using laser scanning tomography. Parameter definition and comparison of glaucoma and control papilla]. Klin Monatsbl Augenheilkd, 198(6):522–529, 1991.
[30] BA Ford, PH Artes, TA McCormick, MT Nicolela, RP LeBlanc, and BC Chauhan. Comparison of data analysis tools for detection of glaucoma with the Heidelberg Retina Tomograph. Ophthalmology, 110(6):1145–1150, 2003.
[31] FA Medeiros, LM Zangwill, C Bowd, PA Sample, and RN Weinreb. Use of progressive glaucomatous optic disk change as the reference standard for evaluation of diagnostic tests in glaucoma. Am J Ophthalmol, 139(6):1010–1018, 2005.

[32] PA Sample, MH Goldbaum, K Chan, C Boden, TW Lee, C Vasile, AG Boehm, T Sejnowski, CA Johnson, and RN Weinreb. Using machine learning classifiers to identify glaucomatous change earlier in standard visual fields. Invest Ophthalmol Vis Sci, 43(8):2660–2665, 2002.
[33] R Chrastek, M Wolf, K Donath, H Niemann, D Paulus, T Hothorn, B Lausen, R Lammer, CY Mardin, and G Michelson. Automated segmentation of the optic nerve head for diagnosis of glaucoma. Med Image Anal, 9(4):297–314, 2005.
[34] EA Essock, MJ Sinai, C Bowd, LM Zangwill, and RN Weinreb. Fourier analysis of optical coherence tomography and scanning laser polarimetry retinal nerve fiber layer measurements in the diagnosis of glaucoma. Arch Ophthalmol, 121(9):1238–1245, 2003.
[35] MH Goldbaum, PA Sample, K Chan, J Williams, TW Lee, E Blumenthal, CA Girkin, LM Zangwill, C Bowd, T Sejnowski, and RN Weinreb. Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry. Invest Ophthalmol Vis Sci, 43(1):162–169, 2002.
[36] RN Weinreb, AW Dreher, and JF Bille. Quantitative assessment of the optic nerve head with the laser tomographic scanner. Int Ophthalmol, 13(1-2):25–29, 1989.
[37] Y Liu, X Zhou, and W-Y Ma. Extracting texture features from arbitrary-shaped regions for image retrieval. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, ICME 2004, pages 1891–1894, Taipei, Taiwan, 2004. IEEE.
[38] HA Quigley, AE Brown, JD Morrison, and SM Drance. The size and shape of the optic disc in normal human eyes. Arch Ophthalmol, 108(1):51–57, 1990.
[39] JB Jonas, GC Gusek, and GO Naumann. Optic disc, cup and neuroretinal rim size, configuration and correlations in normal eyes. Invest Ophthalmol Vis Sci, 29(7):1151–1158, 1988.
[40] CH Teh and RT Chin. On image analysis by the methods of moments. IEEE T Pattern Anal, 10(4):496–513, 1988.
[41] F Zernike. Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode. Physica, 1:689–704, 1934.
[42] DH Hoekman and C Varekamp. Observation of tropical rain forest trees by airborne high-resolution interferometric radar. IEEE Transactions on Geoscience and Remote Sensing, 39(3):584–594, 2001.

[43] J Shutler and MS Nixon. Zernike velocity moments for sequence-based description of moving features. Image and Vision Computing, 24(4):343–356, 2006.
[44] LN Thibos, A Bradley, and X Hong. A statistical model of the aberration structure of normal, well-corrected eyes. Ophthalmic Physiol Opt, 22(5):427–433, 2002.
[45] LN Thibos, RA Applegate, JT Schwiegerling, and R Webb. Standards for reporting the optical aberrations of eyes. J Refract Surg, 18(5):S652–660, 2002.
[46] RH Webb. Zernike polynomial description of ophthalmic surfaces. JOSA: Ophthalmic and visual optics topical meeting, pages 38–41, 1992.
[47] DR Iskander, MJ Collins, and B Davis. Optimal modeling of corneal surfaces with Zernike polynomials. IEEE Trans Biomed Eng, 48(1):87–95, 2001.
[48] MD Twa, S Parthasarathy, TW Raasch, and MA Bullimore. Decision tree classification of spatial data patterns from videokeratography using Zernike polynomials. In D Barbara and C Kamath, editors, 3rd Annual SIAM International Conference on Data Mining, pages 3–12. Society for Industrial and Applied Mathematics, 2003.
[49] MD Twa, S Parthasarathy, C Roberts, AM Mahmoud, TW Raasch, and MA Bullimore. Automated decision tree classification of corneal shape. Optom Vis Sci, 82(12):1038–1046, 2005.
[50] KA Marsolo, MD Twa, MA Bullimore, and S Parthasarathy. A Model-Based Approach to Visualizing Classification Decisions for Patient Diagnosis. In S Miksch, J Hunter, and E Keravnou, editors, Artificial Intelligence in Medicine: 10th Conference on Artificial Intelligence in Medicine, AIME 05, pages 473–483. Springer-Verlag, 2005.
[51] K Marsolo, M Twa, S Parthasarathy, and MA Bullimore. Spatial modeling and classification of corneal shape. IEEE Trans Inf Technol Biomed, in press, 2006.
[52] TO Salmon. Corneal contribution to the wavefront aberration of the eye. PhD dissertation, Indiana University, 1999.
[53] AB Bhatia and E Wolf. On the circle polynomials of Zernike and related orthogonal sets. Proc Cambridge Philosoph Soc, 50:40–48, 1954.
[54] TV Pham and AWM Smeulders. Sparse Representation for Coarse and Fine Object Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):555–567, 2006.
[55] I Daubechies. Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia, Pa., 1992.
[56] D Gabor. Theory of communication. J Inst Elect Eng, 93:429–457, 1946.
[57] VC Chen and H Ling. Joint time-frequency analysis for radar signal and image processing. IEEE Signal Processing Magazine, 16(2):81–93, 1999.
[58] MN Do and M Vetterli. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Transactions on Image Processing, 11(2):146–158, 2002.
[59] J Li and RM Gray. Context-based multiscale classification of document images using wavelet coefficient distributions. IEEE Transactions on Image Processing, 9(9):1604–1616, 2000.

[55] I Daubechies. Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia, Pa., 1992. [56] D Gabor. Theory of communication. J Inst Elect Eng, 93:429–457, 1946. [57] VC Chen and H Ling. Joint time-frequency analysis for radar signal and image processing. Signal Processing Magazine, IEEE, 16(2):81–93, 1999. [58] MN Do and M Vetterli. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Transactions on Image Processing, 11(2):146–158, 2002. [59] J Li and RM Gray. Context-based multiscale classification of document images using wavelet coefficient distributions. IEEE Transactions on Image Processing, 9(9):1604–1616, 2000. [60] H Sagan. Space-filling curves. Springer-Verlag, New York, 1994. [61] DR Iskander, MR Morelande, MJ Collins, and B Davis. Modeling of corneal surfaces with radial polynomials. IEEE Trans Biomed Eng, 49(4):320–328, 2002. [62] WG Cochran. The Comparison of Percentages in Matched Samples. Biometrika, 37(3-4):256–266, 1950. [63] JR Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986. [64] JR Quinlan. C4.5 : programs for machine learning. The Morgan Kaufmann series in machine learning. Morgan Kaufmann Publishers, San Mateo, Calif., 1993. [65] TM Mitchell. Machine learning. McGraw-Hill, New York, 1997. [66] IH Witten and E Frank. Data mining : practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco, CA, 2000. [67] R Kohavi. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In International Joint Conference on Artificial Intelligence, pages 1137–1145, 1995. [68] C Nadeau and Y Bengio. Inference for the generalization error. Machine Learning, 52(3):239–281, 2003. [69] LM Zangwill, RN Weinreb, CC Berry, AR Smith, KA Dirkes, AL Coleman, JR Piltz-Seymour, JM Liebmann, GA Cioffi, G Trick, JD Brandt, MO Gordon, and MA Kass. Racial differences in optic disc topography: baseline results 120

from the confocal scanning laser ophthalmoscopy ancillary study to the ocular hypertension treatment study. Arch Ophthalmol, 122(1):22–28, 2004.
[70] S Pittner and SV Kamarthi. Feature extraction from wavelet coefficients for pattern recognition tasks. IEEE Trans Pattern Anal Mach Intell, 21(1):83–88, 1999.
[71] R Dafner, D Cohen-Or, and Y Matias. Context-based space filling curves. Computer Graphics Forum, 19(3):209–218, 2000.
[72] NV Swindale, G Stjepanovic, A Chin, and FS Mikelberg. Automated analysis of normal and glaucomatous optic nerve head topography images. Invest Ophthalmol Vis Sci, 41(7):1730–1742, 2000.
[73] MF Armaly. The optic cup in the normal eye. I. Cup width, depth, vessel displacement, ocular tension and outflow facility. Am J Ophthalmol, 68(3):401–407, 1969.
[74] JR Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725–730, 1996.
[75] P Harasymowycz, B Davis, G Xu, J Myers, A Bayer, and GL Spaeth. The use of RADAAR (ratio of rim area to disc area asymmetry) in detecting glaucoma and its severity. Can J Ophthalmol, 39(3):240–244, 2004.
[76] X Su and J Fan. Multivariate survival trees: a maximum likelihood approach based on frailty models. Biometrics, 60(1):93–99, 2004.
