Data-driven Quantification and Classification of Diffusion and Perfusion Magnetic Resonance Data
Dissertation submitted for the doctoral degree of the Faculty of Applied Sciences of the Albert-Ludwigs-Universität Freiburg im Breisgau
Submitted by Susanne Schnell from Greifswald, 20 May 2010
Dean: Prof. Dr. B. Nebel
Examination committee: Prof. Dr. Hannelore Bast (chair), JunProf. Dr. Olaf Ronneberger (assessor), Prof. Dr. H. Burkhardt (supervisor), PD Dr. Michael Markl (examiner)
Date of the defence: 26.11.2010
Contents

German summary: Data-driven quantification and classification of diffusion and perfusion magnetic resonance data
    Classification of HARDI data into tissue types using an SVM
        Introduction
        Materials and methods
        Results
        Discussion and outlook
    Prediction of chronic stroke outcome using multi-modal MRI data
        Introduction
        Materials and methods
        Results and discussion
    Discussion and outlook
Introduction
1 Fundamentals of Magnetic Resonance Imaging and physiological background
    1.1. Anatomy and functions of the human brain
    1.2. MRI in general
    1.3. Functional applications of MR imaging
        1.3.1. Diffusion and its properties
        1.3.2. Diffusion weighted MR imaging
        1.3.3. Dw-MRI of the human brain
        1.3.4. Perfusion Weighted Imaging
        1.3.5. Diffusion and perfusion MR imaging in acute stroke
        1.3.6. State of the art of prediction algorithms for stroke outcome
2 Background to pattern recognition and supervised learning
    2.1 Supervised learning
    2.2 The support vector machine
    2.3 Impact on SVM classification using imbalanced data
    2.4 Logistic regression as classifier
    2.5 Data scaling and normalisation
3 Classification of HARDI data into tissue types using a SVM
    3.1 Introduction
    3.2 Outline of the classification task
    3.3 Simulated data sets
    In vivo HARDI measurements and pre-processing
    3.4 Features and processing of HARDI data
    3.5 Classification results and comparison with other methods
        3.6.1. Simulations
        3.6.2. In vivo results
    3.6 Discussion and outlook
4 Prediction of stroke outcome using multi-modal acute stroke MRI data
    4.1 Introduction
    4.2 Materials and methods
        4.2.1. MRI data acquisition
        4.2.2. Processing of the acute stroke data
        4.2.3. Training and testing data sets for the classifiers
        4.2.4. Adjustment and usage of classifiers
    4.3 Results
        4.3.1. Statistical analysis of the performance of the stroke outcome prediction algorithms
        4.3.2. Data and feature analysis of the nine patients
        4.3.3. Evaluation of the developed stroke lesion outcome prediction algorithms
    4.4 Discussion and outlook
5 Summary
6 Discussion and outlook
Index
    Abbreviations
    List of figures
    List of tables
    Cited literature
Own publications
    Journal papers
    Conference abstracts
Acknowledgements
German Summary: Data-driven Quantification and Classification of Diffusion and Perfusion Magnetic Resonance Data

The work presented here is structured in two main parts, which share the functional application of MR imaging and its neurological significance. The parts are treated separately below. Both address a classification problem on the basis of a model-free, or data-driven, approach, since the hypothesis is that the currently derived models simplify so strongly that important information is lost. In the first part, a solution was developed for classifying high angular resolution diffusion-weighted MRI data into all image components at the microscopic level without requiring an additional anatomical measurement. Specifically, white matter is divided into two microscopic subclasses (crossing and parallel fibre bundles). This information could be useful, for example, for so-called fibre tracking if it were carried out in two steps: the regions containing parallel fibres could be reconstructed with a simple and fast streamline algorithm, whereas the regions containing crossings could be targeted and reconstructed separately with a more expensive but more accurate algorithm. The second part of the thesis is a project of great importance for clinical routine: it aims to support the fast decision on the treatment method in ischemic stroke and gives a deep insight into current methods and possibilities for predicting the final stroke lesion.
Classification of HARDI data into tissue types using an SVM

Introduction

High angular resolution diffusion-weighted imaging (HARDI) data can be used to classify brain tissue types at the microscopic level with the help of the support vector classifier (Schnell et al. 2009). In this approach the image components can be assigned to six classes: grey matter (GM); the microscopic substructures of white matter, namely parallel fibre bundles (PF) and crossing fibre bundles (CF); the partial volume between white and grey matter (PV); background noise including artefact regions (BN); and cerebrospinal fluid (CSF). In diffusion tensor applications, anatomical T1-weighted MPRAGE (Magnetization Prepared Rapid Gradient Echo) images are typically coregistered with the HARDI images. The user segments the coregistered data into only three classes, GM, white matter (WM) and CSF, in order to obtain masks for further processing. However, this approach is prone to many errors, because coregistering two different image modalities (HARDI and T1 MPRAGE) is difficult: the HARDI scan is acquired with a fast single-shot echo planar imaging pulse sequence and shows very specific image artefacts that do not occur in the T1 MPRAGE. A solution for segmenting the diffusion-weighted MRI image itself is therefore needed, and it could be extended to six classes instead of the typical three when the features are extracted from the HARDI data with a data-driven method.
Materials and methods

A multi-class segmentation of the microstructures on the basis of HARDI images was achieved with a support vector machine (SVM), which originates from the field of statistical learning. The SVM requires properties derived from the data as input (the so-called feature vector). In this work this was the rotation-invariant representation of the spherical harmonic decomposition of the HARDI signal. With this information the SVM was trained to find the function that separates the classes. The SVM was tested systematically on simulated data and then applied to six in vivo data sets. Because training on in vivo data requires knowledge of the true content of every volume element (voxel), it had to be carried out in two steps. First, only the classes PF and CF were determined with the help of the simulated data (SNR = 10), and the result was masked with the segmented WM (known from the coregistered T1 MPRAGE; registration and segmentation in SPM5 (2005)). Because GM and CSF were segmented from the T1 MPRAGE at the same time, the true voxel contents were also known for these classes. Using this labelled training data set, various parameter sets of the SVM were tested. With a corresponding independent test data set, the classification results could be evaluated and also compared with the model approach, which uses the linear, planar and spherical Westin coefficients (Westin et al. 1997).
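The idea of a rotation-invariant feature vector built from a spherical harmonic decomposition can be illustrated with a small sketch. The summary does not state which invariants were used; a common choice, assumed here, is the energy per degree l, that is, the sum over m of |c_lm|^2. The function names and the coefficient values below are hypothetical. The check only covers rotations about the z-axis, under which each coefficient picks up a phase factor; a general rotation mixes the orders m within one degree through a unitary matrix, which also leaves the per-degree energy unchanged.

```python
import cmath


def rotation_invariant_features(coeffs):
    """Return the per-degree energies sum_m |c_lm|^2, ordered by degree l.

    coeffs maps (l, m) -> complex spherical harmonic coefficient.
    """
    energy = {}
    for (l, m), c in coeffs.items():
        if abs(m) > l:
            raise ValueError(f"invalid order m={m} for degree l={l}")
        energy[l] = energy.get(l, 0.0) + abs(c) ** 2
    return [energy[l] for l in sorted(energy)]


def rotate_about_z(coeffs, alpha):
    """Rotate about the z-axis: each c_lm is multiplied by a phase exp(-i*m*alpha)."""
    return {(l, m): c * cmath.exp(-1j * m * alpha) for (l, m), c in coeffs.items()}


# Hypothetical coefficients; only even degrees, as for an antipodally
# symmetric diffusion signal.
example = {
    (0, 0): 1.0 + 0j,
    (2, -2): 0.1 - 0.2j,
    (2, 0): 0.5 + 0j,
    (2, 2): 0.1 + 0.2j,
    (4, 0): 0.05 + 0j,
}

features = rotation_invariant_features(example)
rotated_features = rotation_invariant_features(rotate_about_z(example, 0.7))
print(features)
print(rotated_features)
```

A vector of this kind, one value per degree, could then serve as the input of a classifier, since it does not change when the fibre configuration in a voxel is merely oriented differently.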
Results

Classification of the simulated data with different noise levels showed that, even at a low SNR of five, the chosen SVM algorithm has high sensitivity and specificity as well as high robustness. This was in contrast to the model approach based on the Westin coefficients. Processing the in vivo data showed that here too, in contrast to the thresholding method with the Westin coefficients, the data-driven method is suitable for differentiating between the WM structures PF and CF, and that it is also sensitive enough to additionally separate the class CF from GM (see Figure 3.3, Figure 3.4 and Figure 3.5). The thresholding method with the Westin coefficients achieved lower sensitivities and specificities for the differentiation between all classes. In addition, the SVM segmentation results were compared with the results of segmenting the T1 image with SPM, and good agreement was found (see Table 6). The in vivo results also showed a dependence on the acquired diffusion-encoding (DE) directions, above all for the detection of the two WM classes. To demonstrate feasibility, the method was tested on six independent volunteers, which gave very similar results for all six classes.
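The model-based comparison and its evaluation can be sketched as follows. This is a minimal illustration, not the procedure of the thesis: it assumes the trace-normalised form of the linear, planar and spherical coefficients (other normalisations exist), and the thresholds `cl_min` and `cp_min`, the mapping of planar tensors to the CF class, and all eigenvalues and labels are hypothetical.

```python
def westin_coefficients(l1, l2, l3):
    """Linear, planar and spherical coefficients from tensor eigenvalues.

    Assumes the trace-normalised variant, for which cl + cp + cs = 1.
    """
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    trace = l1 + l2 + l3
    if trace <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cl = (l1 - l2) / trace
    cp = 2.0 * (l2 - l3) / trace
    cs = 3.0 * l3 / trace
    return cl, cp, cs


def threshold_label(eigenvalues, cl_min=0.4, cp_min=0.3):
    """Hypothetical thresholding rule: linear -> PF, planar -> CF, else other."""
    cl, cp, _ = westin_coefficients(*eigenvalues)
    if cl >= cl_min:
        return "PF"
    if cp >= cp_min:
        return "CF"
    return "other"


def sensitivity_specificity(truth, predicted, positive):
    """One-versus-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Hypothetical eigenvalues (in 10^-3 mm^2/s) and hypothetical true labels.
voxels = [(1.7, 0.3, 0.3), (1.2, 1.0, 0.2), (0.8, 0.8, 0.8), (1.0, 0.6, 0.5)]
truth = ["PF", "CF", "other", "CF"]
predicted = [threshold_label(v) for v in voxels]
cf_sensitivity, cf_specificity = sensitivity_specificity(truth, predicted, "CF")
print(predicted, cf_sensitivity, cf_specificity)
```

The last voxel shows the kind of case in which a fixed threshold can fail: a crossing that yields a nearly isotropic tensor is labelled "other", which lowers the sensitivity for CF.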
Discussion and outlook

This new data-driven approach enables a fully automatic segmentation on the basis of HARDI data without requiring an additional T1 MPRAGE and without subjective intervention by an expert. That is, once the training data set has been set up and optimised, any future HARDI image measured with the same parameters can be segmented into the six image classes, even into substructures at the microscopic level below the image resolution, as is required for the WM subclasses of parallel and crossing fibre bundles. This was shown with robust results in six in vivo test data sets. The segmentation results could be used as a priori knowledge for fibre tracking algorithms and thereby improve their performance. This additional information gained from the data would also be of great use for other clinical or diagnostic applications of diffusion-weighted MRI. This application showed that information hidden in the data, which is often reduced too strongly by assumed models, can be recovered with a data-driven approach, admittedly at the cost of high dimensionality, which affects computation times.
Prediction of chronic stroke outcome using multi-modal MRI data

Introduction

The treatment decision in acute non-haemorrhagic stroke depends critically on weighing the risk of infarct growth against the risk of bleeding (Garcia 1984), and it has to be made quickly. To support the physician's decision, prediction algorithms have been developed that assign each voxel to one of two classes: healthy or chronically affected. These algorithms currently use logistic regression with supervised learning (Wu et al. 2001). They rely on acutely measured MRI data, specifically diffusion- and perfusion-weighted MRI data (dw-MRI, pw-MRI). Perfusion values are estimated from the pw data with the help of models. However, these derived values are often inaccurate, which raised the question of whether the existing problems can be avoided if these error-prone attempts at quantification are omitted and a purely data-driven approach is used for this medical question instead. In this work a data-driven solution for predicting the chronic stroke lesion size is presented in order to find out whether the existing methods contain enough information for this task. A data-driven approach for selecting the features from the perfusion data is presented (also called the model-free approach below) and applied both with the standard prediction method, logistic regression, and with the SVM classification introduced here.
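Voxel-wise outcome prediction with logistic regression can be sketched in a few lines. This is only an illustration of the principle: the two features (a relative ADC and a relative MTT value), the toy training data, the learning rate and the function names are assumptions, and plain gradient descent stands in for whatever fitting procedure the cited work used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - learning_rate * g / n for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a voxel ends up in the chronic lesion."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training voxels: [relative ADC, relative MTT];
# label 1 = chronically affected, 0 = healthy.
X = [[0.40, 1.8], [0.50, 1.6], [0.45, 2.0], [1.00, 1.0], [0.95, 1.1], [1.05, 0.9]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
p_affected = predict_proba(w, b, [0.42, 1.9])
p_healthy = predict_proba(w, b, [1.00, 1.0])
print(p_affected, p_healthy)
```

Thresholding the predicted probability, for example at 0.5, turns the output into the binary decision between healthy and chronically affected tissue for each voxel.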
Materials and methods

Nine acute stroke data sets were used for predicting the final stroke lesion. Two time points were of interest for this task. The multi-modal acute stroke data (dw-MRI, pw-MRI, T2-FLAIR, T2*) were measured as soon as possible after stroke onset (time point 1, 1 to 6 hours after onset) and formed the data corpus for the prediction algorithm. The T2-FLAIR measurement acquired between two weeks and one month after the stroke (time point 2) formed the basis for defining the chronic lesion. All acquired and coregistered data were processed further to achieve a better representation of the features. Two different approaches were investigated for this binary classification problem: the model approach and the model-free approach. For this reason two different data sets were created, each for training and testing. For the model approach the data were assembled as proposed in the literature: the cerebral blood flow (CBF), the cerebral blood volume (CBV), the mean transit time (MTT), the apparent diffusion coefficient (ADC), the diffusion-weighted image of the three main directions (DWI), the measurement without diffusion weighting acquired during the DWI measurement (b0 image), the T2-FLAIR image and the T2* image. This composition of the data results in 8 features (Wu et al. 2001). For the model-free approach the perfusion time series was processed into a feature representation. An automatic method was developed that extracts the part of the perfusion signal containing the bolus passage of the contrast agent up to signal recovery.
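How the bolus passage might be cut out of a perfusion time series can be sketched with a simple baseline-threshold heuristic. The summary does not describe the actual method, so everything below is an assumption made for illustration: the function name, the number of baseline samples `n_baseline`, the factor `k` and the example signal are all hypothetical.

```python
from statistics import mean, pstdev


def extract_bolus_window(signal, n_baseline=5, k=3.0):
    """Return (start, end) indices of the bolus passage, or None if none is found.

    Hypothetical heuristic: the passage starts when the signal first drops
    more than k baseline standard deviations below the pre-contrast baseline
    and ends when it has recovered above that threshold after the minimum.
    """
    if len(signal) <= n_baseline:
        raise ValueError("signal is too short for the requested baseline")
    baseline = mean(signal[:n_baseline])
    noise = pstdev(signal[:n_baseline])
    threshold = baseline - k * noise
    below = [i for i in range(n_baseline, len(signal)) if signal[i] < threshold]
    if not below:
        return None
    start = below[0]
    trough = min(range(start, len(signal)), key=lambda i: signal[i])
    end = len(signal) - 1  # fallback if the signal never recovers
    for i in range(trough, len(signal)):
        if signal[i] >= threshold:
            end = i
            break
    return start, end


# Hypothetical signal of one voxel: flat baseline, signal drop during the
# bolus passage, then recovery.
signal = [100, 101, 99, 100, 100, 99, 80, 55, 40, 50, 70, 88, 97, 99, 100]
window = extract_bolus_window(signal)
start, end = window
bolus_part = signal[start:end + 1]
print(window, bolus_part)
```

In measured data the signal often does not return fully to its pre-contrast level, so a recovery criterion relative to the depth of the minimum may be more robust than the fixed threshold used in this sketch.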
Results and discussion

After the two feature sets had been created, prediction was carried out with logistic regression and with the SVM. The model-free approach did not produce improved results compared with the model approach (see Figure 4.19 and Figure 4.20). Although the results looked very similar, the model-free approach took considerably longer because its high data load entails a much higher computational effort. Logistic regression could not process the amount of data of the model-free approach at all. Neither of the tested classifiers was superior to the other, although each preferred one or the other feature representation (different scalings, or with or without histogram adjustment). Only the long computation time of the model-free approach already mentioned makes the model approach with logistic regression the better method for this prediction task.
In addition, it could be shown that the simplest image post-processing methods can improve the prediction results dramatically. Nevertheless, the data-driven approach revealed that the data corpus itself may not contain the information needed to perform this task robustly and reliably. Further work must go into the derivation of the model used to determine the perfusion parameters, but the most important improvement has to come from the methods for measuring cerebral blood flow. For clinicians it is important to know that both pw-MRI and dw-MRI are important tools for the early diagnosis of ischemic stroke. When these data are used to predict stroke outcome, however, the measurements are inaccurate, either because the image resolution is too low (for pw-MRI the slice thickness is about 5 mm, which leads to major partial volume problems) or because the desired parameters are measured only indirectly. Greater accuracy would require longer measurements, but new methods for measuring cerebral perfusion would also be of great benefit.
Discussion and outlook

Two data-driven evaluations with a pattern recognition background were explored in this doctoral thesis. In the first application, the classification of HARDI data into six image classes, a high classification accuracy was achieved. The main reason for this good performance was that the HARDI data were well resolved both spatially and angularly, owing to the large number of measured diffusion directions. In addition, all classes were well balanced, which made it easier for the SVM to differentiate between them. Because this approach is able to differentiate white matter structures, a promising extension would be the classification of different lesion stages in multiple sclerosis (MS) using HARDI data and other modalities. Predicting the course of particular lesions would also be a conceivable application.

The second part of the thesis dealt with predicting the chronic stroke lesion from acutely measured multi-modal MRI data and was a very laborious and difficult task. Several difficulties would have to be solved, including the very unbalanced class sizes, the low spatial resolution of pw-MRI and its poor SNR, and the signal cancellation in voxels containing blood.
A further difficulty lay in the large variance in the patient data due to motion artefacts, differing stroke severity, individual perfusion situations and so on. Coregistering the chronic T2-FLAIR measurement with the acute pw data in order to evaluate the results is also not very accurate: first, large anatomical changes occur as the tissue heals (for example, the swelling subsides or the inflammation resolves), and second, different image modalities acquired at different image resolutions have to be coregistered. All these points make accurate prediction and evaluation of the chronic stroke lesion very difficult. Improvements in the dw-MRI measurement can be achieved simply by increasing the number of diffusion-encoding directions (for example to 12 spatial directions), but at the expense of measurement time. Even this small step could provide considerably more information about the underlying destruction of the WM tissue. Measuring cerebral perfusion, however, would require not only improved image resolution (at least the same as dw-MRI) but also a new method for determining the arterial input function accurately. Typically, models or a reduced data description are used in data processing and pattern recognition in order to reduce computation times and thereby improve accuracy. This thesis showed that such models do not always represent the truth and may give a wrong description or even suppress important information. It also showed that a data-driven approach can reveal that measurements do not actually provide enough information to be suitable for a particular task.
In conclusion, data-driven methods are a very good tool for testing whether further improvements and research are necessary so that the underlying processes are better understood.
Introduction

Magnetic Resonance Imaging (MRI) of the human body enables spatially resolved measurements of tissue properties. In addition to anatomical MRI, physiological functions can also be measured when the time dimension is added. The two functional MRI methods of interest for this thesis are diffusion-weighted and perfusion-weighted MRI (dw- and pw-MRI). In dw-MRI the molecular movement of water is measured along several directions in order to determine the tissue-specific diffusion orientation. Quantifying the diffusion orientation is of special interest in fibrous tissue such as the neuronal fibres of the brain, where it allows the detection of damage (e.g. stroke) or shifted pathways (e.g. tumour resections). The other functional method of interest, pw-MRI, measures how the blood perfusion of the tissue changes over time after the administration of a contrast agent. From this, tissue-specific and diagnostically valuable parameters such as cerebral blood flow or cerebral blood volume can be derived. Data processing of both dw- and pw-MRI data currently relies on derived models that try to explain the underlying physiology and its relation to MR physics. Unfortunately, these models are very often ill-posed or lack information. The main focus of this thesis lies on retrieving at least as much information from the original data without applying a model, while still finding ways to reduce the data load. This is done in order to test whether the models in use omit information or describe the data insufficiently. Data-driven methods are combined with supervised classification algorithms from the area of pattern recognition. The thesis contains two parts, which have in common the functional application of MR imaging and their neurological significance.
Both are classification problems addressed with a model-free approach, since our hypothesis states that the currently derived models oversimplify so much that important information is omitted. The first part of the thesis deals with the classification of microscopic tissue classes using high angular resolution diffusion imaging (HARDI) data (Schnell et al. 2009). It can be shown that segmentation of these data reveals not only the classical tissue classes such as grey matter (GM), white matter (WM) and cerebro-spinal fluid (CSF), but also microscopic substructures of white matter: parallel and crossing fibre bundles. Usually, in diffusion tensor applications of HARDI data, the user segments a coregistered T1-weighted MPRAGE (Magnetization Prepared Rapid Gradient Echo) scan into GM, WM and CSF in order to gain masks for further processing. This T1 MPRAGE is usually acquired during the same MR imaging session. This solution is, however, prone to many errors, since coregistering the two different image modalities (HARDI and T1 MPRAGE) is difficult. The diffusion scan is typically acquired using a fast single-shot echo planar imaging (EPI) pulse sequence, which shows very specific image artefacts that do not occur in T1 MPRAGE scans. Therefore a solution for segmenting the diffusion-weighted MRI images is urgently needed. It can even be extended to six rather than three classes (GM, CSF, background noise including image artefacts, partial volume between grey and white matter, and the two WM subclasses parallel and crossing fibre bundles (PF and CF)) when the features are extracted from the HARDI data in a data-driven way. The division of white matter into two subclasses could be very helpful for the reconstruction of neuronal fibres (fibre tracking), for which HARDI data are typically used. Fibre tracking could then be performed in two steps: the regions containing parallel fibre bundles can be tracked with an easy and fast streamline tracking algorithm, whereas areas containing crossing fibre bundles could be targeted and reconstructed separately using, for example, the Gibbs tracking algorithm (Kreher et al. 2008).

The second part of the thesis is a project of high importance for clinical routine in the treatment of stroke patients. A new data-driven method was developed, which uses original acute stroke data for the prediction of the final chronically damaged lesion. After patients arrive in the hospital, the doctors have to decide very quickly whether a clot-busting drug should be administered. In order to support this decision-making, stroke lesion outcome prediction algorithms were developed. The data source in this thesis concentrates on data from MR: the acute diffusion- and perfusion-weighted MRI.
Using these acute MRI data, a data-driven method for stroke outcome prediction was developed with two different classifiers and then compared to a standard model-dependent approach.

The dissertation is structured as follows: first, a general introduction covering the topics of anatomy (section 1.1), MRI (sections 1.2 and 1.3) and MRI of stroke (section 1.3.5) is given. Afterwards, classification methods in the context of computation theory are introduced (chapter 2). Based on this background, the following chapters cover the thesis topics: data-driven classification of HARDI data (chapter 3) and stroke outcome prediction using supervised learning approaches (chapter 4). At the end of the thesis both applications are summarised and discussed with regard to usability and future applications (chapters 5 and 6).
Chapter 1: Fundamentals of Magnetic Resonance Imaging 10 _____________________________________________________________________________
1 Fundamentals of Magnetic Resonance Imaging and physiological background
Magnetic Resonance Imaging (MRI) is a non-invasive medical imaging technique, which allows visualising the human body in high detail for either morphology or function. This first chapter gives an introduction to this imaging technology as well as to the anatomy of the human brain and the pathology of ischemic stroke. The biological and medical background is necessary in order to understand the imaging methods, but also for understanding the practical importance of the presented work. In addition, two applications for imaging function are explained in detail (diffusion- and perfusion-weighted MRI), which form the data corpus for this thesis.
1.1. Anatomy and functions of the human brain
The human brain is a complex network of neuronal cells, which exchange information by conducting electrical potentials. A neuronal cell is composed of the cell body, the dendrites and a long axon (see Figure 1.1B). In a healthy human the neuronal cells are not distributed arbitrarily, but are mainly located at the surface of the brain, which is called the cortex or grey matter (GM) (see Figure 1.1A). The neuronal cells form clusters, which are divided by their function, allowing the mapping of a topology specified by their task. Neighbouring neuronal cells are connected via the dendrites and exchange information using this connection. In addition, information can be sent to neuronal cells that are farther away and not in the same cortical area by sending an impulse along the long axon. The axons of the neuronal cells are located in the inner part of the brain below the cortex, and this tissue type is called white matter (WM) according to its colour (see Figure 1.1A). If the connection between two areas is pronounced, several axons form an axon bundle. The cell wall of an axon forms a long tube, which keeps its diameter along its length and does not branch. This tube is surrounded by the myelin sheath, which is created by glial cells (oligodendrocytes in the central nervous system; Figure 1.1B shows the Schwann cells, which take this role in peripheral nerves). The myelin sheath is necessary for the fast conduction of impulses and is thick in comparison to the axon itself. This sheath forms a barrier around the axon through which no water can diffuse, resulting in low diffusion perpendicular to the axon.
Figure 1.1: A) Coronal cut through the brain showing in dark grey the different cell types and layers of the grey matter and in light grey the white matter, which is composed of neuronal fibre bundles (figure is modified, original figure from: www.brainmaps.org). B) An exemplary neuronal fibre is shown (modified figure, original figure from: http://en.wikipedia.org/wiki/Neuron)
Besides the GM and WM there is the cerebro-spinal fluid (CSF), a fluid of the brain and spine that contains hardly any cells. The CSF protects the CNS against mechanical deformation and serves for fast pressure equalisation in the fluid system. The CSF can be found between brain and skull but is also connected with some inner CSF “containers”, called the ventricles (see Figure 1.1A and Figure 1.2B) (Reiche et al. 2003). The brain itself is encapsulated in the skull and is strongly perfused by many arteries and veins. Figure 1.2A shows an MR angiography of the brain blood supply and Figure 1.2B a schematic drawing of the CSF system in the brain. The blood supply is necessary in order to bring oxygen and glucose to the brain and to remove the metabolites and carbon dioxide. Since the human brain is an organ with a very high basal metabolism (it uses a fifth of the total oxygen intake), the blood supply is subject to special conditions. The neuronal cells cannot sufficiently cover their energy need in anaerobic situations. This is why there are several “security” systems, which make sure the brain is supplied with enough oxygen and substrates. Four big arteries assure the arterial inflow to the brain, two on each side of the neck: in front the carotid artery and in the back the vertebral artery. The outflow is carried out by the dural venous sinuses (venous channels located between the brain and the dura mater, a layer of the meninges surrounding the brain), which show special properties in comparison to normal veins. They receive blood from internal and external veins of the brain, receive CSF from the subarachnoid space, and ultimately empty into the internal jugular vein.
Figure 1.2: A) A maximum intensity map of a brain MR angiography showing the brain arteries (example image from one of our stroke patient data sets). B) A schematic drawing, which emphasises the CSF ventricles and flow in the brain (source: http://www.trejos.com/Trejos/BrainCSF.jpg)
In order to get a feeling for perfusion parameters measured with MRI, in the following some quantitative numbers are given. The blood volume per 100 ml brain substance is 4 ml at rest. The normal blood flow in the brain tissue is between 40 and 50 ml blood per 100 g tissue per minute. In GM the blood flow is much higher (up to 90 ml / 100 g / min) than in WM (about 25 ml / 100 g / min). A decrease to half of this could still be compensated with, amongst other things, higher oxygen consumption. But a decrease to 20 ml / 100 g / min would lead to neurological deficits, at first reversible. However, a perfusion below 15 ml / 100 g / min results in gradual cell destruction within minutes to hours. Less than 10 ml / 100 g / min cannot be tolerated by the nerve cells and amounts to cell destruction within eight to ten minutes. (Edvinsson et al. 1993)
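The flow ranges quoted above can be turned into a small lookup, which may help when reading perfusion maps later on. This is only an illustrative sketch: the function name and the treatment of the intervals between the quoted values (for example 15 to 20 ml / 100 g / min) are assumptions, not part of the cited physiology.

```python
def classify_cbf(cbf):
    """Map a cerebral blood flow value (ml / 100 g / min) to the ranges quoted in the text.

    The interval boundaries between the quoted values are an assumption of this sketch.
    """
    if cbf < 0:
        raise ValueError("blood flow cannot be negative")
    if cbf < 10:
        return "cell destruction within eight to ten minutes"
    if cbf < 15:
        return "gradual cell destruction within minutes to hours"
    if cbf <= 20:
        return "neurological deficits, at first reversible"
    return "normal or still compensated"


for value in (45, 25, 18, 12, 5):
    print(value, "ml / 100 g / min ->", classify_cbf(value))
```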
1.2. MRI in general
Magnetic resonance imaging (MRI) exploits the fact that the 1H nucleus can be manipulated by magnetic fields. An MR scanner is composed of three basic parts:
⇒ The magnet: It produces a strong homogeneous magnetic field. This magnetic field cannot be changed during the measurement.
⇒ The receiving and transmitting radio-frequency (RF) coils: The RF coils send high-frequency (or radio-frequency) electromagnetic waves, the so-called RF pulses, into the person or object to be examined. The coherently precessing nuclear magnetisation in turn induces a current in the receiving coil. This current is detected and evaluated later on.
⇒ The gradient coils: These coils produce a magnetic field that varies linearly along an arbitrary direction and is superimposed on the permanent main magnetic field during the measurement. This linear variation is described by a constant gradient vector. With the aid of gradients switched in different directions it is possible to perform spatial encoding in all spatial dimensions.
The timing and order of the gradient switching and of the RF pulses will be called “pulse sequence” or “sequence” in the following. The choice of the pulse sequence determines the resulting contrast in MR imaging and is one of the main features of this technology. With specific pulse sequences it is possible to influence the image properties or to image specific physical processes with spatial and temporal resolution. (Haacke et al. 1999; Liang and Lauterbur 2000; Vlaardingerbroek and den Boer 2003; Bernstein et al. 2004)
1.3. Functional applications of MR Imaging
Magnetic resonance imaging enables imaging of human function with high temporal resolution. This thesis will discuss two different methods of imaging brain function. The first functional MRI contrast of interest is diffusion-weighted MRI (dw-MRI). Dw-MRI is sensitised to motion and allows the in vivo determination of the diffusion coefficient of water molecules as a function of the direction of the diffusion movement. In fibrous tissue, such as the neuronal fibres in the white matter of the human brain, there is a high directional dependency of the diffusion coefficient. This phenomenon provides the basis for MRI-based methods for the reconstruction of neuronal fibre paths, but also allows the detection of brain abnormalities in diseases such as acute stroke or multiple sclerosis. The second MRI contrast of focus in this thesis is perfusion-weighted MRI (pw-MRI), which images the blood perfusion of tissue. This is done with the help of a contrast agent working as a tracer in order to image the bolus passage through the tissue, which results in concentration time curves and derived perfusion parameters.
In the next sections the physical background of these two techniques will be introduced. The anisotropy of diffusion occurring in white matter brain tissue will be addressed. Then the measurement of diffusion with MRI will be explained in conjunction with the properties of the tissue and the diffusion-weighted signal. This method enables the measurement of the diffusion coefficient along one direction, and when this is repeated with changing direction encoding parameters, the spatial distribution of the diffusion coefficient can be sampled. In addition, the method of perfusion imaging and the determination of the perfusion estimates are explained in detail, also addressing pitfalls and artefacts. Finally, both approaches are introduced in the context of stroke imaging and its pathological interpretation.
1.3.1. Diffusion and its properties
The process of diffusion involves free Brownian molecular motion. Diffusion can be modified by the surrounding microarchitecture, and depends on the types of boundaries and the extent of motion restriction. These details are discussed below, with the final part of this section giving a short introduction to the diffusion measurement in the human brain using MRI and to the models typically used for approximating the diffusion distributions in the human brain.
1.3.1.1. Free diffusion
Each molecule in a fluid moves along its own random trajectory. This behaviour of the molecules is known as Brownian motion. By labelling molecules at the time t = 0, one can examine their spatial density distribution after a time Δ. The position of the centre of mass stays constant independently of the time Δ, whereas the variance of the density distribution increases with increasing time. (Einstein 1956)
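The two statements above (constant centre of mass, growing variance) can be checked with a small one-dimensional random-walk simulation. This is a minimal sketch, not the method used in this thesis: it assumes Einstein's relation that the variance equals 2DΔ, a diffusion coefficient of D = 3×10⁻⁹ m²/s for free water at body temperature, and an arbitrarily chosen diffusion time of 50 ms; all function and variable names are hypothetical.

```python
import math
import random


def simulate_free_diffusion(d_coeff, diffusion_time, n_molecules=5000, n_steps=50, seed=0):
    """Return the final 1D positions (m) of molecules that all start at the origin."""
    rng = random.Random(seed)
    dt = diffusion_time / n_steps
    # Each time step adds an independent Gaussian displacement with variance 2*D*dt.
    step_sigma = math.sqrt(2.0 * d_coeff * dt)
    positions = []
    for _ in range(n_molecules):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sigma)
        positions.append(x)
    return positions


D = 3e-9      # m^2/s, assumed value for free water at 37 degrees Celsius
DELTA = 0.05  # s, assumed diffusion time

positions = simulate_free_diffusion(D, DELTA)
mean = sum(positions) / len(positions)
variance = sum((x - mean) ** 2 for x in positions) / len(positions)
expected_variance = 2.0 * D * DELTA

print("mean position (m):", mean)
print("simulated variance / (2 D Delta):", variance / expected_variance)
```

The mean stays close to the starting position, while the ratio of simulated to expected variance is close to one, up to the statistical error of the finite number of simulated molecules.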
The article of Einstein (1956) investigates the temperature dependency of this motion and shows that the density distribution of the molecules can be described by a normal (or Gaussian) distribution N(µ,δ). The expectation µ corresponds to the starting position, and the variance δ² = 2DΔ is proportional to the diffusion time Δ and to the diffusion coefficient D, which describes the mobility of the molecules in the medium. This process is also called Wiener process or molecular diffusion. The diffusion coefficient, with the unit m²/s, depends primarily on the temperature and viscosity of the medium. The diffusion coefficient of free water at a temperature of 37 °C is about D = 3×10⁻⁹ m²/s. For unrestricted diffusion with diffusion coefficient D, the density distribution after a diffusion time Δ follows the normal distribution around the starting position r0:
$$ p(r \mid \Delta) = \frac{1}{\sqrt{4\pi\Delta D}} \exp\left( -\frac{(r - r_0)^2}{4\Delta D} \right) \tag{1.1} $$
1.3.1.2. Diffusion in complex media
The normal distribution described above only occurs for free diffusion in a still and homogeneous fluid. When accounting for the finite extent of the container and a sufficiently long diffusion time, the density distribution can no longer be described by a normal distribution. The molecules cannot cross the borders of the container (the distance between the borders is described by the variable r), but are reflected. In this case the form of the density distribution changes with the diffusion time Δ. For small diffusion times (2DΔ λ2 > λ3, the principal diffusion direction can be determined (Hagmann et al. 2006). This information is used for illustration, e.g. in direction-dependent colour maps, or for fibre tracking (reconstruction of neuronal fibres). If the eigenvalues are significantly different from each other, diffusion is said to be anisotropic. If all the eigenvalues are approximately equal, diffusion is isotropic and may be represented as a sphere. The relationship between the eigenvalues reflects the characteristics of diffusion. To describe the shape of diffusion with a scalar value, fractional anisotropy (FA) is most often used, which is calculated from the eigenvalues of the DT. FA is computed by comparing each eigenvalue with the mean of all the eigenvalues (λ̄), as in the following equation (Hagmann et al. 2006):
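$$ \mathrm{FA} = \sqrt{\frac{3}{2}} \, \frac{\sqrt{(\lambda_1 - \bar{\lambda})^2 + (\lambda_2 - \bar{\lambda})^2 + (\lambda_3 - \bar{\lambda})^2}}{\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}} \tag{1.8} $$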
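Equation (1.8) translates directly into a few lines of code. The following sketch assumes that the three eigenvalues of the diffusion tensor are already available; the function name and the example eigenvalues are purely illustrative.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three diffusion tensor eigenvalues, Eq. (1.8)."""
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        # No diffusion at all: define FA as zero instead of dividing by zero.
        return 0.0
    return math.sqrt(1.5 * numerator / denominator)


# Hypothetical eigenvalues in mm^2/s for a strongly anisotropic voxel.
print(fractional_anisotropy(1.7e-3, 0.3e-3, 0.2e-3))
# Isotropic diffusion gives FA = 0, diffusion along a single axis gives FA = 1.
print(fractional_anisotropy(1.0, 1.0, 1.0), fractional_anisotropy(1.0, 0.0, 0.0))
```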
Westin et al. formulated coefficients which estimate how spherical (cs), planar (cp) or linear (cl) the geometric shape of the tensor is (hereafter called the Westin coefficients) (Westin et al. 1997). The sum of all three Westin coefficients is one.
$$ c_s = \frac{3\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3} \tag{1.9} $$
$$ c_p = \frac{2(\lambda_2 - \lambda_3)}{\lambda_1 + \lambda_2 + \lambda_3} \tag{1.10} $$
$$ c_l = \frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} \tag{1.11} $$
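The Westin coefficients of equations (1.9) to (1.11) can be sketched in the same way. The function below is an illustration with hypothetical names; it sorts the eigenvalues itself, so that the ordering λ1 ≥ λ2 ≥ λ3 assumed by the equations always holds.

```python
def westin_coefficients(eigenvalues):
    """Return (cl, cp, cs) for three non-negative diffusion tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)  # enforce l1 >= l2 >= l3
    trace = l1 + l2 + l3
    if trace <= 0.0:
        raise ValueError("the sum of the eigenvalues must be positive")
    cl = (l1 - l2) / trace        # linear,    Eq. (1.11)
    cp = 2.0 * (l2 - l3) / trace  # planar,    Eq. (1.10)
    cs = 3.0 * l3 / trace         # spherical, Eq. (1.9)
    return cl, cp, cs


cl, cp, cs = westin_coefficients([1.7e-3, 0.3e-3, 0.2e-3])
print("cl, cp, cs:", cl, cp, cs, "sum:", cl + cp + cs)
```

The printed sum illustrates that the three coefficients add up to one for any valid set of eigenvalues.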
Additional methods for the measurement of the diffusion anisotropy have evolved, which use much more complicated setups. With special methods such as q-space or diffusion spectrum imaging (DSI) it is theoretically possible to sample the density distribution along all spatial directions by varying the b-value at constant diffusion time (Cohen and Assaf 2002). For this, the gradients should be very short and strong in order to achieve a correct sampling. This is not very practical and not possible in clinical environments, and is therefore not described further in the following. The interested reader is referred to the review article by Cohen and Assaf (2002).
It was shown that for many applications it is necessary to know the exact diffusion distribution. For this, the ADC is determined for several spatial directions, which is often called a HARDI measurement (High Angular Resolution Diffusion Imaging). In this approach it is assumed that the underlying diffusion is directionally dependent but produces mono-exponential signal characteristics. With an increasing number of directions the number of measurements also increases, and with it the measurement time. Hence, a compromise between quality and measurement time has to be found. Normally, the spherical diffusion distribution is determined by models (see chapter 3). If the model used has few degrees of freedom, the long measurement time can barely be justified by the increased quality of the model. This is the reason why a broad range of direction schemes is applied in the literature, ranging from 6 to over 100 directions. Various articles suggest direction-encoding (DE) schemes (Conturo et al. 1996; Jones et al. 1999; Skare et al. 2000). When choosing a DE scheme one has to take care that the directions are evenly distributed on a sphere, in order to reduce directional bias. Skare et al. (2000) show that the determination of the fibre orientation is least sensitive to noise if the matrix produced by the gradient directions has a low condition number.
When measuring HARDI in the human brain, the adjustment of the b-value is also an important issue. Since the b-value is a measure of the sensitivity of the sequence to diffusion, a high b-value should give the most accurate results. Unfortunately, a high b-value also means low signal, resulting in a low signal-to-noise ratio (SNR). As a rule of thumb, the b-value should be in the region of 1.1/D, where D is the expected diffusion coefficient in the examined tissue (Neil 1997). The b-value can be adjusted with the gradient strength, the gradient duration or the diffusion time (see equation (1.3)).
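The rule of thumb b ≈ 1.1/D can be illustrated with a short calculation. The sketch below assumes a mono-exponential signal decay S/S0 = exp(−bD), in line with the mono-exponential assumption of the HARDI approach, and an expected diffusion coefficient of 0.8×10⁻³ mm²/s, which is an assumed order of magnitude for brain tissue; the function names are hypothetical.

```python
import math


def rule_of_thumb_b_value(adc):
    """b-value (s/mm^2) suggested by the rule of thumb b = 1.1 / D, with D in mm^2/s."""
    return 1.1 / adc


def relative_signal(b_value, adc):
    """Remaining signal fraction S/S0 for a mono-exponential decay (assumed model)."""
    return math.exp(-b_value * adc)


adc = 0.8e-3  # mm^2/s, assumed expected diffusion coefficient
b = rule_of_thumb_b_value(adc)
print("suggested b-value (s/mm^2):", b)
print("remaining signal fraction:", relative_signal(b, adc))
```

At the suggested b-value roughly one third of the unweighted signal remains, which shows the compromise between diffusion sensitivity and SNR described above.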
In the following, a short overview is given of how a diffusion-weighted MRI measurement can be influenced:
1. Influence of the diffusion time: As already explained in section 1.3.1.2, in the case of restricted diffusion the diffusion coefficient depends on the chosen diffusion time Δ. For small diffusion times the signal decay is similar to that of free diffusion, but if the mean displacement within Δ is larger than the distance to the borders, the signal is not attenuated any further. In the case of free diffusion, the logarithm of the signal depends linearly on the diffusion time, so that the diffusion coefficient is independent of the diffusion time. With the measurement parameters typically applied in vivo, the influence of the extra-cellular fluid is stronger than that of the intra-cellular fluid. Hence a hindered rather than a restricted diffusion process is observed, and the signal attenuation slows down with increasing diffusion time. Typically, a diffusion time is chosen which enables a good direction-dependent contrast in the diffusion measurement. For an ADC of 0.8×10⁻³ mm²/s and a mean cell diameter of 10 µm, a diffusion time of 10 ms is necessary in order to sufficiently detect anisotropy in the diffusion distribution (Neil 1997). The typical echo time of the applied dw-MRI sequences for the acquisition of several slices covering the whole brain is in the range of TE = 70 ... 120 ms.
2. Influence of the gradients: By changing the gradient amplitude, the b-value can be influenced independently of the diffusion time, and the density distribution does not change. With increasing b-value and constant diffusion time, the signal in a logarithmic plot decreases linearly at first, but then changes into a non-linear decrease, as the signal attenuation curve clearly flattens at higher b-values (see the review article by Maier et al. (2004)). Originally this effect was ascribed to the interaction of intra- and extra-cellular diffusion. This relationship could not be confirmed by many studies (Niendorf et al. 1996; Clark and Le Bihan 2000; Lee and Springer 2003; Schwarcz et al. 2004; Ababneh et al. 2005). The actual underlying mechanism is still unknown and a matter of discussion (Kiselev and Il'yasov 2007). In order to stay in the mono-exponential regime in HARDI measurements, the b-value should be chosen from the linear range. The typically applied b-value lies within the range of 800 to 1300 s/mm².
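As a rough plausibility check for the diffusion time discussed under item 1, the mean displacement can be estimated from the one-dimensional relation δ² = 2DΔ of section 1.3.1.1. The ADC of 0.8×10⁻³ mm²/s used here is an assumed typical order of magnitude for brain tissue:

$$ \sqrt{2 D \Delta} = \sqrt{2 \cdot 0.8\times10^{-3}\,\mathrm{mm^2/s} \cdot 0.01\,\mathrm{s}} = \sqrt{1.6\times10^{-5}\,\mathrm{mm^2}} = 4\times10^{-3}\,\mathrm{mm} = 4\,\mu\mathrm{m} $$

This is of the same order as the mean cell diameter of 10 µm, so within a diffusion time of about 10 ms the water molecules begin to probe the cell boundaries.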
1.3.4. Perfusion Weighted Imaging
Another MRI method for imaging function in the brain is perfusion-weighted imaging (pw-MRI). In pw-MRI a contrast agent is injected into a vein of the arm. This contrast agent causes signal attenuation, and its passage through the vessels of the brain is observed over time using a fast MRI sequence. The resulting images provide information about the blood perfusion in the brain. This method is in clinical use for acute stroke patients in order to clarify the position and size of the stroke as well as a possible undersupply with blood. Comparing the undersupplied region with the acute diffusion lesion can determine whether thrombolytic (or clot-busting) treatment of the patient is indicated, which could possibly improve the patient’s situation, but also carries a high risk as it can cause bleeding (see section 1.3.5). In the following paragraphs the physiological background and details about pw-MRI will be given. The application of pw-MRI in acute stroke imaging will be emphasised, and its importance and unsolved problems will be discussed.
1.3.4.1. Measuring perfusion with MRI and the perfusion estimates
Pw-MRI by dynamic susceptibility contrast (DSC) imaging utilises very rapid imaging to capture the first passage of an intravenously injected paramagnetic contrast agent. This is typically done with an EPI (echo-planar imaging) sequence with a repetition time (TR) of 1.5 seconds or shorter. With this setup about 15-20 slices can be acquired with an in-plane resolution of 1.5 to 2 mm and a slice thickness of 5-6 mm, resulting in good brain coverage. By kinetic analysis of these data, haemodynamic indices, namely cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT), can be derived (Belliveau et al. 1991; Rosen et al. 1991; Rosen et al. 1991). In order to do this, several assumptions need to hold true:
1. The tracer must be mixed with the blood and must not impede flow;
2. All of the tracer that enters the vascular system must exit and must not be lost within the system due to blood-brain barrier breakdowns.
The paramagnetic contrast agent causes a microscopic susceptibility difference between the vessel and the surrounding tissue. This causes the spins in the area to dephase (loss of coherence of spin precession) due to additional T2* relaxation and results in a net signal loss. A linear relationship between the tissue concentration of the contrast agent Ct and the change in T2 relaxation rate (ΔR2) is assumed (Østergaard 2005):
$$ \Delta R_2(t) \propto C_t(t) \tag{1.12} $$
This assumption was confirmed by simulations and indirect in vivo measurements (Simonsen et al. 1999). The signal intensity depends on the changes in the longitudinal and transversal relaxation rates (ΔR1 and ΔR2) in an exponential manner (Østergaard 2005):
$$ S(t) = S(t_0) \left( 1 - e^{-TR \cdot \Delta R_1(t)} \right) \cdot e^{-TE \cdot \Delta R_2(t)} \tag{1.13} $$
Assuming that R1 remains constant yields the relation between concentration and signal intensity (Østergaard 2005):
$$ C_t(t) = -\frac{k}{TE} \cdot \log\left( \frac{S(t)}{S(t_0)} \right) \tag{1.14} $$
This linear relationship may not hold for all ranges of contrast agent concentrations or for all tissues, due to the complex physics of MR signal formation in perfused tissues. The variable k, here assumed to be a constant, is in reality tissue specific, but could so far only be determined for blood in in vitro experiments. The assumption of a constant k may cause large errors, resulting in an overestimation of the perfusion estimates (Kiselev 2001).
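The conversion of equation (1.14) can be sketched as follows. This is a minimal illustration, not the processing pipeline of this thesis: the constant k is set to one (so that concentrations are in arbitrary units), the baseline S(t0) is estimated as the mean of the first samples before bolus arrival, and the signal values and function names are hypothetical.

```python
import math


def baseline_signal(signal_curve, n_baseline):
    """Estimate S(t0) as the mean of the first n_baseline samples (before bolus arrival)."""
    return sum(signal_curve[:n_baseline]) / n_baseline


def concentration_from_signal(signal, baseline, echo_time, k=1.0):
    """Contrast agent concentration from the measured signal, Eq. (1.14).

    k is the proportionality constant; it is set to 1 here as an assumption.
    """
    if signal <= 0.0 or baseline <= 0.0:
        raise ValueError("signal and baseline must be positive")
    return -k / echo_time * math.log(signal / baseline)


TE = 0.03  # s, assumed echo time
signal_curve = [100.0, 101.0, 99.0, 80.0, 60.0, 75.0, 90.0, 98.0]  # synthetic values
s0 = baseline_signal(signal_curve, n_baseline=3)
concentration_curve = [concentration_from_signal(s, s0, TE) for s in signal_curve]
print([round(c, 2) for c in concentration_curve])
```

A signal drop below the baseline gives a positive concentration, and the largest signal loss corresponds to the peak of the concentration time curve.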
The arterial and the total tissue concentration are detected as a function of time during a single transit (Østergaard 2005). With this, the CBV can be determined from the ratio of the areas under the tissue and arterial concentration time curves (Stewart 1893):
$$ \mathrm{CBV} = \frac{\int_{-\infty}^{\infty} C_t(\tau)\, d\tau}{\int_{-\infty}^{\infty} C_a(\tau)\, d\tau} \tag{1.15} $$
This is achieved using the concentrations from (1.14). The arterial input function (AIF) Ca(t) is time dependent, and the tissue concentration time curve becomes the convolution of the AIF with the impulse response CBF·R(t), where R(t) is the residue function, which measures the fraction of tracer present in the vasculature at time t after injection (Østergaard 2005):
$$ C_t(t) = \mathrm{CBF} \cdot \left( C_a(t) \otimes R(t) \right) \tag{1.16} $$
CBF can be derived from this function by deconvolution of the impulse response, meaning that CBF·R(t) is fitted from the experimentally determined arterial and tissue concentration time curves. With the definition of CBF and CBV above, the central volume theorem (Stewart 1893) states that the relationship between these and the tissue flow Ft is (Østergaard 2005):
$$ \mathrm{MTT} = \frac{\mathrm{CBV}}{F_t} = \frac{\mathrm{CBV}}{\mathrm{CBF}} \tag{1.17} $$
It can be seen that in order to derive CBF, CBV and MTT it is necessary to have a representative AIF from the perfusion data set. The AIF should be measured in a voxel placed within the feeding artery supplying the tissue voxel of interest (Zierler 1962). However, the manual selection of a representative voxel, even by an experienced operator, can introduce a strong bias into the parameter calculation, as can be seen in Mouridsen et al. (2006). Therefore, it is necessary to use a robust and reproducible method for automatic AIF selection. Mouridsen et al. (2006) introduced cluster analysis for the selection of the AIF. The authors first calculated the concentration curves Ct(t) for each voxel and discarded those with a low area under the curve and those with strong fluctuation artefacts. Then a k-means cluster analysis was used in order to partition the concentration curves into k clusters, so that curves belonging to the same cluster exhibit similar shape features. The cluster whose mean curve has the lowest first moment is selected. Within this selected cluster the k-means clustering is applied again in order to find the final mean curve with the lowest first moment (Mouridsen et al. 2006). In the following this method will be called “Mouridsen-model”.
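How equations (1.15) to (1.17) fit together can be sketched on synthetic, noise-free curves. The example below is only an illustration under several assumptions: the AIF shape, the exponential residue function, the sampling interval and all names are hypothetical, and the deconvolution is a plain forward substitution of the discretised convolution, which works for noise-free data only and is not the regularised deconvolution needed for measured curves.

```python
import math


def simulate_tissue_curve(aif, residue, cbf, dt):
    """Discrete form of Eq. (1.16): Ct = CBF * (Ca convolved with R)."""
    return [cbf * dt * sum(aif[j] * residue[i - j] for j in range(i + 1))
            for i in range(len(aif))]


def deconvolve(aif, tissue, dt):
    """Recover k(t) = CBF * R(t) by forward substitution (noise-free data only)."""
    k = []
    for i in range(len(tissue)):
        acc = sum(aif[i - j] * k[j] for j in range(i))
        k.append((tissue[i] / dt - acc) / aif[0])
    return k


def perfusion_estimates(aif, tissue, dt):
    cbv = sum(tissue) / sum(aif)            # Eq. (1.15); dt cancels in the ratio
    cbf = max(deconvolve(aif, tissue, dt))  # peak of CBF * R(t), since R(0) = 1
    mtt = cbv / cbf                         # Eq. (1.17)
    return cbf, cbv, mtt


dt, n = 0.1, 600                                  # assumed sampling: 0.1 s over 60 s
times = [(i + 1) * dt for i in range(n)]
aif = [t * math.exp(-t / 1.5) for t in times]     # assumed gamma-like AIF shape
true_cbf, true_mtt = 0.01, 4.0                    # arbitrary units and seconds
residue = [math.exp(-i * dt / true_mtt) for i in range(n)]  # assumed R(t)
tissue = simulate_tissue_curve(aif, residue, true_cbf, dt)

cbf, cbv, mtt = perfusion_estimates(aif, tissue, dt)
print("CBF:", cbf, "CBV:", cbv, "MTT (s):", mtt)
```

The recovered MTT deviates slightly from the simulated value because the integrals are replaced by sums over discrete samples. With measured, noisy curves this direct inversion would amplify the noise strongly, so a regularised deconvolution is required in practice.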
Another method of interest in the scope of this thesis is the recently introduced method by Gall et al. (2009). The authors proposed a data-driven method for the extraction of the passage of an injected contrast agent, which is based on an assumed injection function (typically the injection function is neglected). With this injection function, an approximation of the impulse response of the vascular transport is computed by deconvolution. This approximated impulse response is characterised by enhanced minima between subsequent bolus passages, allowing a better separation. The first bolus passage is then obtained by convolution of the injection function with the first peak of the impulse response. The AIF is automatically selected according to expected shapes. This method will be called “Gall-model” in the following. Once the AIF is determined, the CBF, CBV and MTT parameter maps can be calculated.
The above-mentioned models try to estimate the perfusion parameters from the perfusion-weighted MR images and entail a number of problems. In real experiments, noise makes the deconvolution of equation (1.16) for the determination of CBF an ill-posed problem. This means that wildly different solutions for the impulse response can result in similar fits to the measured tissue concentration curves. Another problem, already mentioned, is the assumption of linearity in equation (1.14), which results in an overestimation of the perfusion estimates in specific tissues (Kiselev 2001). This is especially critical for the determination of MTT, which requires CBF and CBV in identical units. This poses practical problems, since the conversion of signal intensities into tracer concentrations is not always linear, as the constant in (1.14) may differ between blood and the various tissue types (Kiselev 2001).
In addition, because of the required high temporal resolution and the inherently limited spatial resolution of MRI relative to the vessel size, concentration time curves of single vessels without partial volume effects are difficult to obtain. The AIF measurement is therefore compromised by the limited spatial resolution of pw-MR images: the combination of small arterial radii with large voxels leads to partial volume effects, where the voxel used for the AIF measurement is contaminated with tissue. Depending on the specific voxel composition and structure, different AIF signals are measured. This effect is illustrated in Figure 1.6, taken from Kjolby et al. (2009).
Figure 1.6: Resulting differing AIFs (simulation from (Kjolby et al. 2009)) depending on position of the voxel relative to an artery with a radius of 1 mm. The main magnetic field B0 is parallel to the z-axis. a) Four exemplary voxels located outside the artery (represented by four different colours). b) The same colours are used to illustrate the resulting AIFs (time resolved signal). The black signal represents the case without partial volume, but only blood (p = 0) and the orange plot represents the case without partial volume, but only tissue (p = 1).
In addition to this partial volume problem, the method of perfusion measurement itself presents another set of problems. In a measurement using gadolinium (Gd) as contrast agent and a T2*-weighted EPI sequence, the signal amplitude and phase depend strongly on the echo time and the Gd concentration (van Osch et al. 2003). For typical echo times such as 30 ms, the signal amplitude in human blood tends to be close to zero, meaning that a signal void is observed in arteries. In addition, the orientation of the artery relative to the main magnetic field B0 is of importance, as the artery is assumed to lie parallel to it. Unfortunately this assumption does not hold, as the AIF is typically measured in the middle cerebral artery (MCA), which lies perpendicular to B0. When partial volume effects are present, it is impossible to predict the signal evolution for a contrast agent passing through a vessel that is not parallel to the main magnetic field. This is due to the fact that the contrast agent produces changes of the magnetic field inside and outside the vessel, resulting in an additional change of the signal amplitude and phase in voxels surrounding the artery (van Osch et al. 2003). This signal void is the reason why, in a realistic experiment, the voxel for determining the AIF is never taken from inside the artery, but from close by (see also Figure 1.6). The signal voids and unpredictable signal changes due to the contrast agent impede the determination of the AIF and add to the above-mentioned problem of partial volume effects.
1.3.5. Diffusion and perfusion MR imaging in acute stroke
The syndrome of stroke consists of the abrupt development of a focal neurologic deficit, which originates either from the occlusion of a cerebral vessel (usually arterial) or from the spontaneous rupture of an intracranial artery with consequent haemorrhage into the brain parenchyma or the subarachnoid space (Garcia 1984). The two different origins define two types of stroke: ischemic stroke and brain haemorrhage. Retrospective analyses of large groups of stroke patients have demonstrated that the vast majority (75%) of all strokes are ischemic (Garcia 1984). In order to initiate the correct therapy it is important to identify whether or not the stroke is ischemic and to estimate the time of stroke onset. In case of ischemic stroke, usually a thrombolytic treatment is performed using the clot-busting drug tissue plasminogen activator (tPA). Administration of tPA carries the risk of bleeding in case of haemorrhage or old infarctions, leading to severe complications or patient death (Thomalla et al. 2006). Thomalla et al. (2006) showed that MR imaging (dw- and pw-MRI) is the method of choice for the identification of the stroke type and of those patients with tissue at risk, an area that is likely to benefit from thrombolysis within a specific time window. The authors state that the future of acute stroke therapy may not be based on time, but more likely on an individually tailored procedure based on imaging findings.
The tissue at risk is referred to as the ischemic penumbra in the literature. The ischemic penumbra is defined as follows: a dual ischemic threshold is described for the neuronal function; the threshold for the release of K+ is clearly lower than the one needed for complete electrical failure. These observations support the concept that in the ischemic penumbra (see Figure 1.7) the neurons may remain structurally viable, but are functionally inactive.
In the case of treatment this affected region is expected to recover over time (Garcia 1984). The ischemic penumbra shows reduced cerebral blood flow (CBF) in comparison to healthy tissue. The irreversibly damaged ischemic region is called the ischemic core and shows an even lower CBF than the penumbra. In the case of a severe or an old ischemic infarction, the core has evolved from the penumbra and thrombolytic treatment comes too late: the tissue is already irreversibly destroyed and the risk of bleeding is too high. In addition to these two tissue types of stroke lesions, a third type was identified: the so-called oligaemia (see Figure 1.7). It exhibits CBF below the normal range but above the threshold for penumbra, does not show neuronal failure, and is not at risk of infarction but recovers independently of treatment (Baron 1999).
Symptoms of acute ischemic stroke might be mimicked by a number of other conditions such as haemorrhage itself, migraines, seizures, functional and metabolic disorders and also vasogenic oedema syndromes. Therefore, it is important to visualise and verify that an ischemic lesion is indeed the cause of the clinical symptoms before therapy is initiated. Unfortunately, patients with acute stroke are more likely to be confused and uncooperative and may therefore be difficult to image safely with MRI. However, diffusion-weighted and perfusion-weighted MR imaging are up-to-date tools for a rapid diagnostic test that excludes primary intra-cerebral haemorrhage and reliably identifies signs of ischemia very early after stroke onset (Yoneda et al. 1999). DWI measured in three directions (from which ADC, trace and MD are estimated) was found to be significantly better than T2-FLAIR (T2-weighted sequence with fluid attenuation inversion recovery) and T2*-weighted MRI, and much better than computed tomography (CT), for detecting an ischemic focus when imaging early after stroke onset (Thomalla et al. 2006). Even small ischemic lesions show up clearly, while on CT scans these may still appear normal. This is explained as follows: cerebral ischemia arises from a reduction in the delivery of oxygen and nutrients to brain tissue due to obstructed blood flow. During acute cerebral ischemia, the rapid failure of the high-energy metabolism and the associated ionic pumps leads to the migration of sodium and calcium into the cell. The subsequent influx of osmotically obligated water results in cellular swelling (cytotoxic oedema) and a decrease in the extracellular volume fraction.
Decreases in the ADC of brain water have been shown to coincide with the onset of acute cerebral oedema, and this relationship allows the extent of the ischemic territory to be visualised as a hyper-intense region in a dw-MR image (Sotak 2002). The correlation of perfusion parameters (specifically CBF) with dw-MRI reveals that the ischemic core region enlarges when adjacent, formerly penumbral, areas undergo irreversible deterioration during the initial hours of vascular occlusion. At the same time, the residual penumbra becomes restricted to the periphery of the ischemic territory, and its fate may depend critically upon early therapeutic intervention (Back 1998). Therefore, determining the size of the penumbra is of great importance once the patient reaches the hospital. The idea that the so-called DWI/PWI mismatch (see Table 1) identifies the ischemic penumbra (Fisher et al. 1995), and that PWI can identify brain regions with blood flow below the level of ischemia but still above the level of permanent damage, makes MRI valuable in acute stroke imaging for fast diagnosis and treatment decisions.
There are three possible mismatch scenarios (see Table 1) which can be observed in acute stroke. First, the lesion appears smaller on dw-MRI than on pw-MRI. This is typically observed in large-vessel strokes. The region that shows both diffusion and perfusion abnormalities is thought to represent irreversibly infarcted tissue, while the region that shows only perfusion abnormalities and has normal diffusion likely represents viable ischemic tissue, i.e. the penumbra and oligaemia. Second, the lesion has the same size on dw-MRI and pw-MRI. This occurs when the tissue is irreversibly infarcted and there is no penumbra. Third, the lesion appears larger on dw-MRI than on pw-MRI or is seen only on dw-MRI and not on pw-MRI. These findings are usually associated with early reperfusion of ischemic tissue, and the size of the lesion on dw-MRI does not usually change substantially over time (Srinivasan et al. 2006). The mismatch concept and the three lesion tissue types are illustrated in the following table and figure (Table 1, Figure 1.7).

Table 1: DWI/PWI mismatch and its interpretation
DWI/PWI mismatch | Interpretation
PWI > DWI | DWI = core (not salvageable)
PWI = DWI | No penumbra, no salvageable tissue
DWI > PWI | Salvageable at-risk tissue

Figure 1.7: Ischemic stroke lesion divided into its affected regions: oligaemia, penumbra and core.
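The three mismatch scenarios described in the text can be told apart by comparing the two lesion sizes. The following is a minimal sketch only: the function name, the use of lesion volumes in ml and the relative `tolerance` within which both lesions count as equal are assumptions made for illustration and are not taken from the cited literature. It names the scenario and makes no treatment decision.

```python
def mismatch_scenario(dwi_volume_ml, pwi_volume_ml, tolerance=0.2):
    """Name the DWI/PWI scenario for two lesion volumes (hypothetical helper).

    tolerance is an assumed relative margin within which the two
    volumes are treated as equal; it is not a value from the text.
    """
    if dwi_volume_ml < 0 or pwi_volume_ml < 0:
        raise ValueError("lesion volumes must be non-negative")
    larger = max(dwi_volume_ml, pwi_volume_ml)
    # Equal sizes (within the assumed tolerance) form the second scenario.
    if larger == 0 or abs(pwi_volume_ml - dwi_volume_ml) <= tolerance * larger:
        return "PWI = DWI"
    # Otherwise the larger lesion decides between the first and third scenario.
    return "PWI > DWI" if pwi_volume_ml > dwi_volume_ml else "DWI > PWI"
```

For example, `mismatch_scenario(20, 60)` falls into the first scenario (perfusion lesion larger than diffusion lesion).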
Unfortunately, this DWI/PWI mismatch model does not hold in all ischemic stroke cases, especially in the case of PWI > DWI. The model is still of great use for fast decision making, but it oversimplifies the metabolic and electrophysiological processes involved (Siemonsen et al. 2008). As a result, the PWI lesion overestimates the real penumbra and cannot reliably discriminate between benign oligaemia and penumbra (Siemonsen et al. 2008). A typical scanning protocol for acute stroke includes an axial dw-MRI sequence, a pw-MRI sequence, a time-of-flight magnetic resonance angiography (MRA) of the intracranial arteries, a T2-weighted sequence with fluid attenuation inversion recovery to attenuate the CSF signal (T2-FLAIR) and a T2*-weighted sequence for the exclusion of intracranial haemorrhage. Table 2 lists the effects of acute ischemic stroke on voxels representing specific tissue types in several MR image modalities.
Table 2: List of typical values derived from a multi-modal acute stroke protocol for several tissue types of healthy and ischemic voxels. Values were found in several publications (Rohl et al. 2001; Rose et al. 2001).

Tissue | Parameter | Value / effect in image
Normal GM | ADC | ~ 0.8
Normal GM | CBF / ml/100 g/min | 40 – 100
Normal GM | CBV / ml/100 g | ~ 4
Normal GM | MTT / s | 6 – 12
Normal WM | ADC | ~ 0.6 – 0.75
Normal WM | CBF / ml/100 g/min | 15 – 30
Normal WM | CBV / ml/100 g | ~ 3 – 13*
Normal WM | MTT / s | 12 – 27
Ischemic oligaemia | ADC | 10 – 20 % lower than normal
Ischemic oligaemia | CBF / ml/100 g/min | 20 – 70
Ischemic oligaemia | CBV / ml/100 g | ~ 5
Ischemic oligaemia | MTT / s | 16 – 17
Ischemic penumbra | ADC | 10 – 20 % lower than normal
Ischemic penumbra | CBF / ml/100 g/min | 12 – 20
Ischemic penumbra | CBV / ml/100 g | > 2
Ischemic penumbra | MTT / s | 145 % of normal hemisphere
Ischemic core | ADC | 40 – 50 % lower than normal
Ischemic core | CBF / ml/100 g/min | 0 – 12
Ischemic core | CBV / ml/100 g |
0 and represent the support vectors. In summary, the projection of the data from the input space to the feature space is performed by kernel functions such as polynomials or Gaussian functions. During the training process the parameters of these kernels are determined. The result is the derived model for the
decision boundary. Once the SVM is trained, it only remains to determine on which side of the decision boundary a given test pattern x lies and to assign the corresponding class label y, i.e. the class of x is assigned to be sign(w · x + b) (Burges 1998). In order to evaluate the SVM results, the precision-recall (PR) curve and/or the receiver operating characteristic (ROC) curve can be plotted (see section 4.3.1). For this, several pairs of sensitivity and specificity, or of precision and recall, have to be obtained. This is also useful for finding the optimal parameter settings and can be done, for example, by incrementally changing the weights for one class or the distance b. Before doing so, however, the optimal penalty variable C (equation (2.9)) has to be found, which is done by increasing C stepwise. The prediction with the highest accuracy then identifies the optimal setting of C. In this way the SVM can be optimised, resulting in the best trade-off with good generalisation ability. The SVM was originally developed for a two-class problem: binary classification. Binary classification is well suited to the problem of distinguishing healthy from chronically damaged tissue, as required in our problem of stroke outcome prediction. However, many problems exist in which multiple classes have to be distinguished, as in the classification problem introduced in chapter 3 of this thesis, where six tissue classes have to be defined within DT images. For this, the SVM can be extended using two common approaches: one-versus-one and one-versus-rest. The one-versus-rest approach divides the decision of an N-class problem into N two-class cases. The one-versus-one approach constructs an SVM for each pair of classes, resulting in N(N−1)/2 SVMs.
When the one-versus-one approach is applied to a test point, each pairwise classification increments the counter of the winning class, and at the end the point is labelled with the class that has received the most votes. The one-versus-rest approach has the disadvantage that its performance can be compromised by unbalanced training data sets (Gualtieri and Cromp 1999). In contrast, the one-versus-one approach is computationally more expensive, since more SVMs have to be computed (Burges 1998).
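The one-versus-one voting scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in this thesis: each pairwise classifier is reduced to a linear decision sign(w · x + b), and the class names, weights and the one-dimensional toy problem are hypothetical. Ties are resolved in favour of the class listed first.

```python
from itertools import combinations

def linear_decision(w, b, x):
    """Binary decision sign(w . x + b): +1 or -1 (a score of 0 is mapped to +1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def one_versus_one_predict(pairwise, classes, x):
    """Majority vote over all N(N-1)/2 pairwise classifiers.

    pairwise maps a class pair (a, b) to (w, bias); an output of +1
    votes for a, an output of -1 votes for b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        w, bias = pairwise[(a, b)]
        winner = a if linear_decision(w, bias, x) == 1 else b
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])

# Hypothetical 1-D toy problem: three classes separated at x = 1 and x = 3.
classes = ["A", "B", "C"]
pairwise = {
    ("A", "B"): ([-1.0], 1.0),  # votes A for x < 1
    ("A", "C"): ([-1.0], 2.0),  # votes A for x < 2
    ("B", "C"): ([-1.0], 3.0),  # votes B for x < 3
}
```

With three classes, three pairwise classifiers are needed, in agreement with N(N−1)/2.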
Chapter 2: Background to pattern recognition and supervised learning 42 _____________________________________________________________________________
2.3 Impact on SVM classification using imbalanced data

Application areas such as medical diagnosis have highly skewed data sets with a very small number of positive instances, which are hard to classify correctly but nevertheless important to detect. Classifiers generally perform poorly on imbalanced data sets because they are designed to generalise from training data and to output the simplest hypothesis that best fits the data. The creation of the labels for the training data set states a hypothesis following the rules of Bayesian statistics. The starting point for Bayesian analysis is a prior distribution over the set of hypotheses, describing the learner’s prior belief in the likelihood of a particular hypothesis generating the data. The set of prior probabilities specifies the distribution of examples of the various classes in the data and may differ from the distribution observed in the data. All learning systems have to make some prior assumption of a Bayesian type, often called the learning bias (Schölkopf and Smola 2002). Therefore, the training data sets have to be created with care in order to reduce the learning bias as much as possible. With imbalanced data, the simplest hypothesis is often the one that classifies almost all instances as negative. Another factor is that making the classifier too specific may make it too sensitive to noise and more prone to learning an erroneous hypothesis. A popular approach to solving these problems is to bias the classifier so that it pays more attention to the positive instances. With the SVM this can be done, for instance, by increasing the penalty C associated with misclassifying the positive class relative to that of the negative class. Another approach is to pre-process the data by oversampling the minority class or undersampling the majority class in order to create a balanced training data set (Akbani et al. 2004).
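The pre-processing step of undersampling the majority class can be illustrated as follows. This is a minimal sketch assuming purely random undersampling; the function name, the `ratio` parameter and the toy data are hypothetical and do not reproduce the exact procedure of Akbani et al. (2004).

```python
import random

def undersample_majority(samples, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class samples until
    n_majority <= ratio * n_minority (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(rng.sample(majority, keep_n) + minority)
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Toy data with a 1:50 imbalance (2 positive, 100 negative samples).
X = [[float(i)] for i in range(102)]
y = [1, 1] + [-1] * 100
X_bal, y_bal = undersample_majority(X, y, majority_label=-1)
```

All minority samples are kept, so no information about the rare class is discarded; only the abundant class is thinned out.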
There are several reasons why the SVM in particular loses performance with imbalanced data sets (Akbani et al. 2004):

1. Positive points lie further from the ideal boundary: The imbalance in the training data ratio means that the positive examples may lie further away from the “ideal” boundary than the negative instances. For example, if one draws n random numbers between 1 and 100 from a uniform distribution, the chance of drawing a number close to 100 increases with n, even though the expected mean of the draws is independent of n. As a result of this phenomenon, the SVM learns a boundary that is too close to and skewed towards the positive examples (Wu and Chang 2003). This means that the learning bias towards the negative class increases.

2. Weakness of soft margins: Mathematically, it can be seen from equations (2.9) and (2.13) that minimising the first term on the right-hand side, ||w||²/2, is equivalent to maximising the margin γ, while minimising the second term, C Σξ, minimises the associated error. The constant C specifies the trade-off that is tolerated between maximising the margin and minimising the error. If C is not very large, the SVM learns to classify everything as the negative class, since this makes the margin larger while causing zero cumulative error on the abundant negative examples. The price is a small amount of cumulative error on the few positive examples. This explains why the SVM fails completely in situations with a high degree of imbalance. One way to combat this is to increase the trade-off constant C+ associated with the positive examples (Veropoulos et al. 1999), as explained in section 2.2.

3. Imbalanced support vector ratio: Another source of boundary skew according to Wu and Chang (2003) is the imbalanced support vector ratio. As the training data become more imbalanced, the ratio between the positive and negative support vectors also becomes more imbalanced. Wu and Chang hypothesise that, as a result of this imbalance, the neighbourhood of a test instance close to the boundary is more likely to be dominated by negative support vectors, and hence the decision function is more likely to classify a boundary point as negative. However, Akbani et al. (2004) pointed out that, because of the conditions in equation (2.14), the sum of the α’s associated with the positive support vectors must be equal to the sum of the α’s associated with the negative support vectors. Since in an imbalanced setting there are fewer positive support vectors, and correspondingly fewer α’s, each positive support vector’s α must on average be larger than a negative support vector’s α. These α’s act as weights in the final decision function (see equation (2.12)), so the higher α’s of the positive support vectors give them a higher weight, which offsets the effect of the support vector imbalance to some extent. This is why the SVM does not perform too badly compared to other machine learning algorithms for moderately skewed data sets (Akbani et al. 2004).

In the application of the SVM to the prediction of stroke outcome in this thesis, the data imbalance is in the range of 1:50 to 1:100 (see section 4.3.1). In order to improve the prediction results of the SVM, the penalty term C and the weights “w” were adjusted incrementally (in order to find the best setting, see section 2.2) and, in addition, the negative class was undersampled in the training data set, similar to the approach of Akbani et al. (2004).
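The weakness of soft margins (point 2) can be made concrete with a small numerical sketch. It evaluates the soft-margin objective ||w||²/2 + C+ Σξ (positives) + C− Σξ (negatives) for two hand-picked candidate solutions on a one-dimensional toy data set. The data, the candidate solutions and the values of C are assumptions chosen for illustration; no optimisation is performed.

```python
def soft_margin_objective(w, b, X, y, c_pos, c_neg):
    """||w||^2 / 2 + C+ * (slack of positives) + C- * (slack of negatives)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack = max(0.0, 1.0 - yi * score)  # slack variable xi_i
        penalty += (c_pos if yi == 1 else c_neg) * slack
    return margin_term + penalty

# Toy data: 100 negatives at x = 0 and 2 positives at x = 1 (imbalance 1:50).
X = [[0.0]] * 100 + [[1.0]] * 2
y = [-1] * 100 + [1] * 2

all_negative = ([0.0], -1.0)  # w = 0: every point is classified as negative
separating = ([2.0], -1.0)    # separates both classes with zero slack

# Equal, small penalty C = 0.1: the trivial solution has the lower objective.
trivial_small_c = soft_margin_objective(*all_negative, X, y, 0.1, 0.1)
separating_small_c = soft_margin_objective(*separating, X, y, 0.1, 0.1)

# Larger penalty C+ = 10 on the positives only: the separating solution wins.
trivial_weighted = soft_margin_objective(*all_negative, X, y, 10.0, 0.1)
separating_weighted = soft_margin_objective(*separating, X, y, 10.0, 0.1)
```

With the small common C, classifying everything as negative is cheaper than separating the classes; raising only C+ reverses the ordering, which is the remedy attributed to Veropoulos et al. (1999) in the text.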
2.4 Logistic regression as classifier

There are several possibilities for using regression as a classifier. Using linear regression as a classifier means ignoring the fact that the target output is binary (e.g., 0/1) rather than a continuous variable. A linear regression function

\[ f(x; \beta) = \beta_0 + \beta_1 x_1 + \dots + \beta_d x_d + \varepsilon \tag{2.15} \]

is therefore estimated based on the available data.
Geometrically this corresponds to fitting a hyperplane to the given points. The resulting regression function can then be used to classify any new (test) example x according to label = 1 if f(x; β̂) > 0.5, and label = 0 otherwise. f(x; β̂) = 0.5 therefore defines a linear decision boundary that partitions the input space into two class-specific regions (half spaces).

Another widely used regression method is the already mentioned logistic regression using the general linear model (see section 0 and (Wu et al. 2001)). Logistic regression is used for predicting the probability of occurrence of an event by fitting data to a logistic curve. It is a generalised linear model used for binomial regression. An explanation of logistic regression begins with an explanation of the logistic function (see equation (1.18)). The "input" is η(x) and the "output" is P. The logistic function is useful because it can take as an input any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1. The variable η(x) represents the exposure to some set of risk factors, while P represents the probability of a particular outcome, given that set of risk factors. The variable η(x) is a measure of the total contribution of all the risk factors used in the model and is known as the “logit”. It is defined in equation (1.19), where α is called the "intercept" and β1, β2, and so on, are called the "regression coefficients" of x1, x2, and so on, respectively. The intercept is the value of η(x) when the value of all risk factors is zero (i.e., the value of η(x) in someone with no risk factors). Each of the regression coefficients describes the size of the contribution of its risk factor. A positive regression coefficient means that the risk factor increases the probability of the outcome, while a negative regression coefficient means that the risk factor decreases the probability of that outcome; a large regression coefficient means that the risk factor strongly influences the probability of that outcome, while a near-zero regression coefficient means that the risk factor has little influence on the probability of that outcome. Logistic regression is a useful way of describing the relationship between one or more risk factors and an outcome, expressed as a probability, which has only two possible values, such as infarction and healthy tissue (Wu et al. 2001).
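The mapping from risk factors to a probability can be sketched as follows. Equations (1.18) and (1.19) are not reproduced in this section, so the sketch assumes the standard forms P = 1/(1 + exp(−η)) and η(x) = α + β1 x1 + ... + βd xd; the function names and the example coefficients are hypothetical.

```python
import math

def logit(x, intercept, coefficients):
    """eta(x) = alpha + beta_1 * x_1 + ... + beta_d * x_d."""
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))

def logistic(eta):
    """P = 1 / (1 + exp(-eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predict_probability(x, intercept, coefficients):
    """Probability of the outcome given the risk factors x."""
    return logistic(logit(x, intercept, coefficients))

# Hypothetical model with one harmful and one protective risk factor.
p_no_risk = predict_probability([0.0, 0.0], intercept=0.0, coefficients=[2.0, -1.0])
p_harmful = predict_probability([1.0, 0.0], intercept=0.0, coefficients=[2.0, -1.0])
p_protective = predict_probability([0.0, 1.0], intercept=0.0, coefficients=[2.0, -1.0])
```

A positive coefficient raises the probability above the no-risk value and a negative coefficient lowers it, as described in the text.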
2.5 Data scaling and normalisation

The goal of each classification task is a classifier that can differentiate the classes in question on data sets that are independently drawn and identically distributed. This is a standard assumption in learning theory; data generated in this way are commonly referred to as iid. Therefore, the data need to be translated into features which show the same histogram behaviour, with offsets in the data distribution eliminated. In addition, to achieve a uniform data distribution between data sets, a normalisation or data scaling has to be performed. Equation (2.16) shows how a feature vector x can be distorted into x′ by a scaling factor β and an offset α, both of which need to be eliminated between data sets:

\[ x' = \beta \cdot x + \alpha \begin{pmatrix} 1 \\ 1 \\ \vdots \end{pmatrix} \tag{2.16} \]

In some cases this might mean just a normalisation of the signal S with a defined initial signal value S0 (SN = S/S0), or a scaling procedure that brings the data into the same value range, which is sometimes necessary if the feature signals differ greatly. There are several scaling options. Two options are implemented in the latest SVM library (libSVMtl (Ronneberger 2004)) used in this thesis for the classification of HARDI data: the “minmax” approach, which scales each feature such that the minimum becomes -1 and the maximum +1, and the “stddev” approach, which scales each feature such that the mean becomes 0 and the standard deviation becomes 1. For the second part of the thesis, the prediction of the final stroke lesion outcome, the original SVM library (libSVM (Chang and Lin 2001)) was used, as it was easier to integrate into MATLAB code. In this case all scaling and data normalisation was done manually, with the methods from the literature serving as a starting point. Wu et al. (2001) normalised the dw- and pw-MRI data by dividing by the mean of the outlined regions to produce “relative” values (rT2, rADC, rDWI, rCBF, rCBV, rMTT). Newer approaches (Wu et al. 2006) normalise with respect to mean values measured in normal contra-lateral WM (see section 4.2.3 for our normalisation procedures).

While working on the problem of stroke outcome prediction it was found that, despite normalisation and scaling, the data histograms of the features differed greatly (except for ADC). Therefore, a histogram matching procedure was performed. Histogram matching, or histogram equalisation, maps a data histogram onto a given other histogram and is a computationally fast and simple method with a large effect in classification applications (see Figure 2.3). Specifically, existing mismatches between data distributions (resulting from differences in data acquisition such as different operators, different injection times of the contrast agent, different physiology etc.) are transformed away so that the distribution matches that of a given data set (Molau et al. 2001).
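Returning to the two scaling options named earlier in this section (“minmax” and “stddev”), the following sketch shows what each does to a single feature. It is not the libSVMtl code: the function names are hypothetical, the population standard deviation (division by n) is an assumption, and mapping a constant feature to zero is a choice made here for illustration.

```python
def minmax_scale(values):
    """Map values linearly so that the minimum becomes -1 and the maximum +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature (assumed convention)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def stddev_scale(values):
    """Shift and scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    if var == 0:
        return [0.0 for _ in values]  # constant feature (assumed convention)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

feature = [2.0, 4.0, 6.0]
scaled_minmax = minmax_scale(feature)  # [-1.0, 0.0, 1.0]
scaled_stddev = stddev_scale(feature)
```

Both functions remove a factor β and an offset α of the kind shown in equation (2.16), since the results do not change when `feature` is replaced by β · feature + α with β > 0.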
Figure 2.3: Histogram matching. The distribution of the test data set is transformed so that it matches the cumulative histogram of the training data set (figure copied from (Molau et al. 2001)).
In our case this means that a test data set is matched with the training data set. Assuming that the underlying effect causing the mismatch is an independent effect on the different features, each feature space dimension can be normalised independently of the others (Molau et al. 2001).
This is done as follows: for each feature dimension the distribution, i.e. the histogram p(x), is computed for the test data set and for the training data set. A cumulative histogram is derived:

\[ P(x) = \int_{-\infty}^{x} dx'\, p(x') \tag{2.17} \]
and the test data set is transformed to the training data set distribution (see Figure 2.3). Each test value x_t is replaced with the value x̃_t that corresponds to the same point in the cumulative training data histogram (P(x_t) = P(x̃_t), where the left-hand side is evaluated on the test data histogram and the right-hand side on the training data histogram). Under the iid assumption (independent and identically distributed), histogram matching can account for scaling (factor β in equation (2.16)), shifting (offset α in equation (2.16)) or any other kind of distortion of each feature space dimension, except for possible feature space rotations. How rotational invariance is achieved for the direction-dependent HARDI data without using the dense originally measured data is described in detail in section 3.4.
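The matching step for one feature dimension can be sketched with empirical cumulative histograms. This is a minimal illustration, not the implementation used in this thesis: it works on sorted sample values instead of binned histograms, and the function name and toy data are hypothetical.

```python
from bisect import bisect_right

def match_histogram(test_values, train_values):
    """Replace each test value x_t by the training value whose cumulative
    histogram position is the same as that of x_t in the test data."""
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n_test, n_train = len(test_sorted), len(train_sorted)
    matched = []
    for x in test_values:
        rank = bisect_right(test_sorted, x)  # P_test(x_t) = rank / n_test
        # Smallest k with k / n_train >= rank / n_test (integer ceiling).
        k = -(-rank * n_train // n_test)
        matched.append(train_sorted[k - 1])
    return matched

# Toy example: the test feature is a scaled and shifted copy of the training feature.
train = [0.0, 1.0, 2.0, 3.0]
test = [2.0 * v + 5.0 for v in train]
matched = match_histogram(test, train)
```

In this toy case the matched values reproduce the training values, which shows how a factor β and an offset α as in equation (2.16) are removed.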
Chapter 3: Classification of HARDI data 48 ___________________________________________________________________________
3 Classification of HARDI data into tissue types using a SVM

3.1 Introduction

Diffusion-weighted MRI, and in particular measurements of diffusion anisotropy, provide biologically relevant information about the tissue microstructure (see section 1.3.1.2). A special focus of interest for research and clinical application of dw-MRI is the investigation of the brain WM structure. Such measurements allow the reconstruction of the neuronal fibre architecture in WM, the visualisation of fibre tracts and the examination of morphological connectivity between different cortical and sub-cortical regions. Data acquisition is typically performed using the HARDI approach introduced by Tuch et al. (1999) (see section 1.3.3). This method consists of the application of diffusion encoding (DE) gradients in a large number of non-collinear directions. With, for instance, 64 DE gradient directions the spatially non-Gaussian diffusion behaviour of water in white matter regions with heterogeneous fibre orientations can be resolved. HARDI has therefore evolved into the basis for many post-processing approaches for resolving the spatial structure of neuronal fibre bundles in WM. Specifically, it would be advantageous to distinguish between parallel (PF) and crossing (CF) fibre bundles. The existing methods for inferring multiple fibre bundle populations from diffusion data can be classified into two groups (Behrens et al. 2007): model-dependent methods for the estimation of the underlying diffusion profile, and model-free methods based on the inherent structure of the diffusion profile itself. The generic model-based method is Diffusion Tensor Imaging (DTI) (Basser et al. 1994), which was the first method used as a basis for the reconstruction of neuronal fibres, i.e. fibre tracking. The diffusion tensor (DT) represents the ADC and can be explained as the averaging of all water magnetisation vectors in a voxel when applying DE gradients in several spatial directions.
From the DT, rotation-invariant anisotropy measures, such as the fractional anisotropy (FA), can be derived (see section 1.3.3). The main drawback of DTI is that it can only reveal a single fibre orientation in each voxel and fails in voxels containing complex tissue architecture with more than one significant fibre orientation. One segmentation procedure based on the DT model applies a supervised clustering procedure with a collection of DTI metrics in regions of interest for the segmentation of GM, WM and CSF (Hasan and Narayana 2006). In this method, the contrast of FA maps between CSF, WM and GM was used, based on the “principal diffusivity indices”. The CSF was segmented using its
high diffusivity and low anisotropy properties. However, since this method is based on DTI, no further classification of the WM subclasses PF and CF was possible. An approach that combines model-dependent and model-free methods for the differentiation of parallel and crossing fibre bundles based on HARDI (see section 1.3.3) and DTI was described by Kreher et al. (2005). In this approach a multi-diffusion tensor model was introduced, which contains one anisotropic and one isotropic diffusion tensor in order to model the tissue structures. In each voxel it is decided separately which of the two models is more appropriate for describing the underlying diffusion and therefore more suitable for the detection of crossing fibre bundles. The first model-free method, which used spherical harmonics to describe the diffusion profile acquired with HARDI, was reported by Frank (2002). Spherical harmonics (SHs) are functions that behave under rotation in a way similar to the behaviour of Fourier expansions under translation: a rotation corresponds to a simple matrix operation. They are therefore very well suited for SVM applications, in which a rotation-invariant description of features is needed. SHs are described in spherical polar coordinates (polar angle θ and azimuth angle φ). Every function that takes the directions θ and φ as its arguments can be expanded into spherical harmonics. The signal S can be described with spherical harmonics as follows (Webster and Szegö 1930; Arfken and Weber 1985):

\[ S(\theta,\varphi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{+n} a_n^m \, Y_n^m(\theta,\varphi) \tag{2.18} \]
where Y_n^m is the spherical harmonic of order n (all integers n ≥ 0) and m is the azimuthal separation constant or degree (all integers m with |m| ≤ n). The coefficients a_n^m are expressed as:

\[ a_n^m = \int_{\varphi=0}^{2\pi} \int_{\theta=0}^{\pi} Y_n^{m*}(\theta,\varphi)\, S(\theta,\varphi)\, \sin\theta \, d\theta \, d\varphi \tag{2.19} \]
with Y* being the complex conjugate of Y. The expansion in equation (2.18) can be terminated at some order n. The higher the order n, the more complex the deviation from the spherical shape (n = 0) that can be described, but the sampling theorem dictates that more directions must then be measured (see section 3.4 and (Yeo 2005)). In Frank’s approach (2002), isotropic diffusion occurring in water or CSF is described by zero-order spherical harmonics, diffusion along parallel fibres by second-order spherical harmonics, and diffusion in the multiple-fibre case is approximated by the fourth order. The odd orders describe asymmetric components and therefore represent imaging artefacts and noise. By using a ratio of high-order to low-order spherical harmonic coefficients, Frank presented a method for differentiating between PF and CF, which is, however, subject to limitations. The results could include misclassifications, especially for WM regions containing multiple crossings. These regions appeared isotropic, similar to GM voxels. Differentiation between GM, CSF and background noise was thus not feasible with exclusive use of the spherical harmonic description. Descoteaux et al. (2006) extended the model in order to distinguish between isotropic, one-fibre and multi-fibre diffusion. This procedure is very promising, but automatic full image segmentation was not possible, since CF was still often misclassified as GM or noise. Alexander et al. (2002) described a method for the modelling and detection of non-Gaussian diffusion profiles that also uses spherical harmonics, but up to an order of eight, providing a sequence of models of increasing complexity. A statistical test was performed in order to find the simplest of the models that adequately described the data. This method was applied in a human experiment and seemed to classify isotropic (GM) and anisotropic Gaussian (WM) regions correctly as order zero and order two, respectively. It was found that on average five percent of the profiles in voxels within the brain were classified as order four or above (anisotropic non-Gaussian), which, given the anatomy of the brain, seems too low a percentage. The method was validated by characterising its performance using synthetic data. It was not described how accurately GM was differentiated from CF. Behrens et al. (2007) reviewed several recent model-free techniques. In the data shown (HARDI data in 60 DE directions) a third fibre bundle orientation could not be detected.
The authors supposed that a detection of more than two orientations would be possible if more diffusion directions were acquired at a higher b-value. Simulations performed in (Behrens et al. 2007) suggest that, in order to resolve an orthogonal three-fibre bundle system robustly, data with b-values above 4000 s/mm² have to be acquired. As will be shown below in the Theory and Methods section, it is necessary to acquire more than 60 DE directions in order to fulfil the sampling theorem for spherical harmonics of order four and above (Yeo 2005). In addition, the review by Alexander (2005) showed with noisy data synthesised from isotropic test functions that most methods generate spurious angular structure. This may explain why strong angular structures are incorrectly detected even in many GM and CSF voxels. Moreover, for a realistic tissue description the existing models are rather complex and often include ill-defined parameters not adequately supported by the measurement data. All presented methods, including the model-free methods, showed that full image segmentation of microstructures and of image background was not possible. A differentiation between voxels containing PF or CF, or between GM, CF and background noise, is difficult, since this information is usually derived from some measure of diffusion anisotropy. Many publications outline methods which show potential for performing this differentiation, but so far an evaluation of these methods for fibre crossings has not been reported.
3.2 Outline of the Classification task
Based on the state of the art described above, a new data-driven analysis of multidirectional diffusion-weighted MRI data is suggested, which may provide unique fingerprints for different types of tissue and image components. In addition to using a model-free approach, methods developed in the field of pattern recognition are employed. In the present case, an attempt is made to distinguish six classes: grey matter (GM), the two white matter (WM) subclasses CF and PF, partial volume (a mixture of GM and WM), cerebrospinal fluid (CSF), and background noise together with image artefacts (hereafter referred to as noise). First, the underlying diffusion profile per voxel of the HARDI data is described using the rotational invariants of the spherical harmonic decomposition. Then a Support Vector Machine (SVM), a computer algorithm for statistical learning which has already demonstrated robust performance in other applications (Nattkemper 2004; Quddus et al. 2005), is used for classification (Cristianini and John 2000). The SVM is trained with the labelled image features in order to find the function that separates the classes. Afterwards the SVM is systematically tested with simulated data and then applied to six in vivo data sets.
3.3 Simulated Data Sets
A quantitative evaluation of the performance of the SVM for the task in question requires data for which the correct class of every voxel is known. Therefore, artificial data sets were created. Diffusion-weighted images were simulated based on a two-compartment model for 81, 61 and 31 DE directions evenly distributed on a sphere, where the signal S is a combination of two underlying compartments, using equation (1.6):
$$S = S_0 w_1 \exp(-b D_1) + S_0 w_2 \exp(-b D_2) \tag{2.20}$$
with S0 being the signal without diffusion weighting, w the weight of the compartment, b the diffusion weighting matrix (or so called b-matrix) and D the diffusion tensor (the DT).
The template (equation (2.20)) was used for the simulation of GM, PF and CF. Each voxel was simulated to include: (i) an FA value (resulting from a specific D), (ii) two crossing fibre bundles with a varying angle between them (rotation of D) and (iii) relative weights w of the two combined signals, where 0.5 means that both bundles have the same strength. The FA and the mean diffusivity were equal for both bundles. The mean diffusivity was constant for each tissue type: 0.39 × 10⁻³ mm²/s in GM and 0.79 × 10⁻³ mm²/s in PF and CF. The remaining three parameters (FA, relative weight and angle) were varied independently within each class. For each class 520 voxels were simulated. Table 3 shows the parameter combinations representing each tissue class. Note that crossing fibre bundles are defined by an angle of ≥ 30 degrees.

Table 3: Parameter combinations for each tissue class. Each range in FA is divided evenly into 8 bins. Relative weights are divided evenly into 2 bins for PF and 5 bins for CF. For PF the angle range is divided evenly into 5 bins and for CF into 13 bins.

Tissue   FA            Relative weights      Angle / degree
GM       0 – 0.15      0/1                   0
PF       0.55 – 0.9    0/1 and 0.1/0.9       0 – 20
CF       0.2 – 0.5     0.2/0.8 – 0.5/0.5     30 – 90
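The template of equation (2.20) can be illustrated with a short Python sketch. It is not the simulation code used in this work. It assumes axially symmetric tensors, a scalar b-value combined with a unit DE direction g (so that the exponent becomes b·gᵀDg), a first bundle along the x-axis and a second bundle rotated within the x-y plane. All function names are hypothetical, and b = 1000 s/mm2 is used only as an example value.

```python
import math

def axial_eigenvalues(fa, md):
    """Eigenvalues (parallel, perpendicular) of an axially symmetric tensor
    with the given fractional anisotropy and mean diffusivity."""
    a = fa / math.sqrt(3.0 - 2.0 * fa * fa)
    return md * (1.0 + 2.0 * a), md * (1.0 - a)

def apparent_diffusivity(g, e, lam_par, lam_perp):
    """g^T D g for an axially symmetric tensor D with main axis e (unit vectors)."""
    cos_angle = sum(gi * ei for gi, ei in zip(g, e))
    return lam_perp + (lam_par - lam_perp) * cos_angle ** 2

def two_compartment_signal(g, b, s0, w1, fa, md, angle_deg):
    """Equation (2.20) with w2 = 1 - w1 and equal FA and MD in both bundles."""
    lam_par, lam_perp = axial_eigenvalues(fa, md)
    phi = math.radians(angle_deg)
    e1 = (1.0, 0.0, 0.0)                       # assumed orientation of bundle 1
    e2 = (math.cos(phi), math.sin(phi), 0.0)   # bundle 2, rotated by the crossing angle
    d1 = apparent_diffusivity(g, e1, lam_par, lam_perp)
    d2 = apparent_diffusivity(g, e2, lam_par, lam_perp)
    return s0 * (w1 * math.exp(-b * d1) + (1.0 - w1) * math.exp(-b * d2))

# Example: one CF voxel from Table 3 (FA 0.5, equal weights, 90 degree crossing),
# measured along the x-axis with mean diffusivity in mm^2/s and b in s/mm^2.
signal = two_compartment_signal(g=(1.0, 0.0, 0.0), b=1000.0, s0=1.0,
                                w1=0.5, fa=0.5, md=0.79e-3, angle_deg=90.0)
```

Looping this function over all DE directions and over the FA, weight and angle bins of Table 3 would give one simulated signal vector per voxel.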
Six training data sets with several signal to noise ratios (SNR) were created by adding Gaussian white noise: SNR = ∞, 100, 60, 30, 10, 5. The SNR was determined in all simulated tissues and directions by taking the median of the signal and dividing it by the standard deviation of the absolute value of the added noise. In literature the SNR is often determined in the b = 0 image. According to this definition the SNR0 in the corresponding b = 0 image would be: ∞, 179, 104, 52, 18, 9. For each noise condition corresponding test data sets were created, which were identical except that the main orientation of the fibres was randomly rotated about the z-axis and that for each noise condition newly generated noise was added.
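Because the SNR definition used here differs from the common b = 0 convention, a small sketch may help. The Python snippet below adds zero-mean Gaussian noise to a list of simulated signals and evaluates the SNR as the median signal divided by the standard deviation of the absolute noise values. The helper names, the use of the noise-free signal for the median, the population standard deviation and the fixed random seed are assumptions made for illustration.

```python
import random
import statistics

def add_gaussian_noise(signals, sigma, rng):
    """Return the noisy signals together with the noise samples that were added."""
    noise = [rng.gauss(0.0, sigma) for _ in signals]
    noisy = [s + n for s, n in zip(signals, noise)]
    return noisy, noise

def snr_median_over_abs_noise(signals, noise):
    """Median signal divided by the standard deviation of |noise|."""
    return statistics.median(signals) / statistics.pstdev([abs(n) for n in noise])

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
signals = [0.5] * 10000                # hypothetical noise-free signal values
noisy, noise = add_gaussian_noise(signals, sigma=0.05, rng=rng)
snr = snr_median_over_abs_noise(signals, noise)
```

For zero-mean Gaussian noise with standard deviation σ, the standard deviation of the absolute values is σ·sqrt(1 − 2/π) ≈ 0.60σ, so an SNR computed this way is larger than the plain ratio of signal to σ.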
In vivo HARDI measurements and pre-processing
The following scanning protocol was performed with six healthy male volunteers (mean age: 29.3, standard deviation (SD) age: 3.1, all with about the same level of education) forming the basis for one training and five test data sets. The five additional test data sets were measured in order to avoid biases occurring when the same data is used for training and testing (Gottrup et
al. 2005). The six in vivo HARDI data sets were acquired on a 3.0 T MRI scanner (Magnetom Trio Tim, Siemens Medical Systems, Erlangen, Germany) with 81 DE directions. The scanner was equipped with a high performance gradient system capable of a maximal gradient strength of 40 mT/m. Images were acquired with a standard circularly polarised eight channel head coil. An effective b-value of 1000 s/mm2 was used for each of the 81 DE directions, which were evenly distributed on a sphere. Eleven additional measurements evenly distributed throughout the scan were acquired without diffusion weighting (b = 0 s/mm2). A total of 92 scans with 69 slices were obtained using a diffusion sensitive spin echo EPI sequence with TR = 11000 ms, TE = 94 ms, voxel size = 2 x 2 x 2 mm3, matrix = 104 x 104. During reconstruction, scans were corrected for motion and distortion artefacts based on a reference measurement (Zaitsev et al. 2006). In order to investigate the effects of varying DE directions two additional HARDI data sets per subject were acquired having the same imaging parameters except for the DE directions. One data set was acquired with 61 DE directions and eight images with b = 0 s/mm2, and the second was measured with 31 DE directions and four images with b = 0 s/mm2, all being evenly distributed on a sphere. It was chosen to reduce the number of acquired non-diffusion weighted data sets relative to the reduced number of DE directions in order to keep the ratio between zero and high b-value images the same. This ensures a similar contrast and SNR (Jones et al. 1999). All diffusion schemes used for the in vivo measurements were the same as for the simulations. 
In addition, a 3D T1-weighted magnetisation prepared ultrafast gradient-echo sequence (T1-weighted MPRAGE) measurement was acquired for all six subjects with the following parameters: matrix size (x, y, z) = 256 x 256 x 160, TR = 2200 ms, TI = 1100 ms, TE = 2.15 ms, flip angle = 12°, bandwidth = 200 Hz/pixel, voxel size = 1 x 1 x 1 mm3. The post-processing steps for the T1-image were performed with the MATLAB-based (The MathWorks 2007) SPM5 (Functional Imaging Laboratories 2005). The T1-image was coregistered to the b0-image with the default SPM5 parameters except for the reslicing method, for which 4th degree B-spline interpolation was used. Afterwards, segmentation into GM, WM and CSF was performed using the default SPM5 parameter settings and template probability maps. The segmentation results are saved as probability maps (values between 0 and 1).
3.4 Features and Processing of HARDI data
As stated above, a prerequisite of tissue classification using the SVM is an adequate selection of image features. For this, the shape of the diffusion profile in each voxel was chosen, which can be described by spherical harmonics (SHs) (Frank 2002). The properties derived from the HARDI data, which make up the feature vector, are required to be rotation invariant, which is why the SHs are so well suited to the problem. The spherical harmonic coefficients a change depending on the orientation of the spherical harmonic in space, meaning that spherical harmonics are rotation covariant. In the following, an approach is introduced that describes the shape of the diffusion profile per voxel by a rotation invariant description of the spherical harmonics (the absolute value of the SH), obtained by simple matrix operations:
$$f_n = \sum_{m=-n}^{n} \left| a_n^m \right|^2 \tag{2.21}$$
By using equation (2.19) and applying the spherical harmonics addition theorem on the spherical harmonic (Webster and Szegö 1930; Arfken and Weber 1985) one obtains:
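$$\begin{aligned}
f_n &= \int_{v', v} dv' \, dv \cdot S(v') \cdot P_n(\langle v', v \rangle) \cdot S(v) &\quad (2.22) \\
    &\approx \sum_{v', v} S(v') \cdot P_n(\langle v', v \rangle) \cdot S(v) &\quad (2.23)
\end{aligned}$$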
with f_n being the rotation invariant number for each order n and P_n being the Legendre polynomial of order n, as it appears in the addition theorem. The variable S describes the diffusion-weighted signal normalised with the non-diffusion weighted signal. Each unit vector v represents a DE direction, and the angle brackets denote the scalar product. A sufficient description of a shape by spherical harmonics can only be achieved if enough sampling points are available. Yeo (2005) found a sufficient but not necessary condition describing the sampling theorem for spherical harmonics to be B data points:
$$B = (2n)^2 \tag{2.24}$$
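The discrete sum of equation (2.23) and the sampling bound of equation (2.24) can be sketched in plain Python as follows. This is an illustration under simple assumptions, not the implementation used in this work: P_n is evaluated as the Legendre polynomial of the scalar product via the standard three-term recurrence, no normalisation constant is applied, and the three directions and signal values in the example are hypothetical.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def rotation_invariant_feature(n, directions, signals):
    """Discrete double sum of equation (2.23) for order n."""
    total = 0.0
    for v1, s1 in zip(directions, signals):
        for v2, s2 in zip(directions, signals):
            dot = sum(a * b for a, b in zip(v1, v2))   # scalar product <v', v>
            total += s1 * legendre(n, dot) * s2
    return total

def sufficient_samples(n):
    """Sampling bound B = (2n)^2 of equation (2.24)."""
    return (2 * n) ** 2

# Hypothetical toy profile: three orthogonal DE directions with normalised signals.
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
signals = [0.3, 0.8, 0.8]
features = [rotation_invariant_feature(n, directions, signals) for n in (0, 2, 4)]

# Which of the acquired schemes reach the bound for order 4?
meets_bound = {n_de: n_de >= sufficient_samples(4) for n_de in (81, 61, 31)}
```

Because the sum depends on the directions only through their mutual scalar products, rotating all DE directions together leaves each f_n unchanged, which is the property required of the feature vector.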
However, Yeo considered a function of spherical harmonic order n to be under-sampled when fewer than B samples are taken. This means that for a spherical harmonic function of 4th order, 64 data points are usually necessary for representation.
The SVM C++ library used in this application is libSVMtl (Ronneberger 2004), which has a number of internal algorithmic options. Several options for separating the classes were trialled. The kernel function used in most applications in the literature, the radial basis function (RBF), was also the most accurate for our application (see section 3.5). In addition, the best classification results were achieved when the features were scaled. Two automatic scaling options are possible: the “minmax” approach and the “stddev” approach (see section 2.5). The two multi-class comparison methods (one-versus-one and one-versus-rest) were also explored.
Classification using a supervised learning technique like the SVM requires the selection and labelling of representative voxels for a training data set. This can easily be done for the simulated data sets. However, true labelling of an in vivo data set requires knowledge of the contents of each voxel, which is impossible. In addition, investigations of other complex problems (Mouridsen et al. 2006) showed that there is considerable operator bias when experts are used for anatomical assessment. Consequently, the labelling of the first in vivo training data set was performed in two steps (Figure 3.1). First, a simulated data set with an SNR of 10 is used for training.
The simulated data was used as model for the classification of the future in vivo training data set, but here only the division of WM into CF and PF was of interest, since the separation of CF from GM is not possible using spherical harmonics shape description exclusively (see section 3.1).
Figure 3.1: Procedure for labelling the training in vivo data set and training the SVM in order to classify all image contents of a second in vivo DWI data set.
Second, WM, GM and CSF masks are created from the SPM5 segmentation results of the T1-image. The information gathered in these two steps was combined in order to obtain a labelled in vivo training data set: the CF regions were masked using the WM masks from the T1 segmentation, and the same was done with the PF voxels. The GM and CSF labels from the SPM5 segmentation were used as final labels. The combination of GM, WM and CSF resulted in a brain mask, with its inverse being the background noise including ghosting and chemical shift artefacts. This labelled data set would be used as the basis for future classification of all new human HARDI examples having the same DE scheme. The voxels selected for the training data sets were derived from regions with definite knowledge about the tissue structure.
In summary, for the image component classification task the HARDI data were transformed to yield rotation invariant features, the training data sets were labelled into multiple classes by incorporating a priori knowledge derived from SPM5, and the SVM was trained with these data. Then several algorithmic options were tested, and the classification results were evaluated with the help of a corresponding test data set. The SVM classification of the in vivo data sets produces a fully segmented HARDI data set. This was compared with segmentation using the linear, planar, and spherical coefficients of Westin et al. (1997).
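The two-step labelling can be summarised per voxel as in the following Python sketch. It is a simplified reading of the procedure, not the code used here: the function name is hypothetical, the probability threshold of 0.5 for turning the SPM5 maps into masks is an assumption, and the partial volume class is left out.

```python
def label_training_voxel(p_gm, p_wm, p_csf, svm_wm_label, threshold=0.5):
    """Combine SPM5 tissue probabilities with the PF/CF decision of the SVM
    trained on simulated data. Returns 'PF', 'CF', 'GM', 'CSF' or 'noise'."""
    if svm_wm_label not in ("PF", "CF"):
        raise ValueError("svm_wm_label must be 'PF' or 'CF'")
    if p_wm > threshold:       # inside the WM mask: keep the PF/CF decision
        return svm_wm_label
    if p_gm > threshold:       # GM label taken directly from SPM5
        return "GM"
    if p_csf > threshold:      # CSF label taken directly from SPM5
        return "CSF"
    return "noise"             # outside the brain mask

# Hypothetical voxel: mostly white matter, classified as crossing fibres.
label = label_training_voxel(p_gm=0.1, p_wm=0.8, p_csf=0.1, svm_wm_label="CF")
```

With a threshold of 0.5 at most one of the three probabilities can exceed it when they sum to at most one, so the order of the checks does not affect the result.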
3.5 Classification Results and Comparison with other methods
The SVM may be used with various settings as mentioned above, for example one-versus-one or one-versus-rest multi-class comparison, together with a wide range of possible kernels. The combinations of settings listed in Table 4 were explored.

Table 4: All tried SVM settings such as kernel function, multi-class comparison and scaling

Kernel function   Multi-class    Scaling
RBF               one-vs-one     –
RBF               one-vs-rest    –
RBF               one-vs-one     minmax
RBF               one-vs-rest    minmax
RBF               one-vs-one     stddev
RBF               one-vs-rest    stddev
polynomial        one-vs-one     minmax
linear            one-vs-one     minmax
histintersect     one-vs-one     minmax
sigmoid           one-vs-one     minmax
Due to limitations of space, the results of every possible combination are not shown. In general, the best accuracy was obtained using the radial basis function as kernel together with feature scaling (using the cross-correlation functionality of the libSVMtl). Only the results of one combination (radial basis function, plus “stddev” feature scaling, plus one-versus-rest multi-class comparison) are presented for both simulated and in vivo data.
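As a rough illustration of what the two scaling options do, the following Python sketch rescales one feature column. It reflects the usual meaning of such names and is not libSVMtl's implementation: the function names, the target range [0, 1] for “minmax” and the use of the population standard deviation for “stddev” are assumptions. The scaling parameters are taken from the training column and then applied to any other column.

```python
import statistics

def minmax_scale(train_column, column):
    """Map values so that the training minimum becomes 0 and the maximum 1."""
    lo, hi = min(train_column), max(train_column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]

def stddev_scale(train_column, column):
    """Subtract the training mean and divide by the training standard deviation."""
    mean = statistics.mean(train_column)
    sd = statistics.pstdev(train_column)
    return [(x - mean) / sd if sd else 0.0 for x in column]

# Hypothetical values of one feature in three training voxels.
train = [2.0, 4.0, 6.0]
scaled_minmax = minmax_scale(train, train)
scaled_stddev = stddev_scale(train, train)
```

One plausible reason why scaling helps is that features of different magnitude would otherwise contribute very unequally to the distances evaluated by the RBF kernel.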
3.6.1. Simulations
The dependency of the quality of classification on the number of DE directions and noise, for the SVM with the simulated data sets, is shown in Table 5. From the eigenvalues of the DT, the Westin coefficients cl, cp and cs, which describe the geometric shape of the diffusion tensor (Westin et al. 1997), can be derived (see section 1.3.3). Table 5 shows the results obtained for PF, CF and GM in each specific data set after manual thresholding to achieve the best segmentation. It is evident that for both methods the higher the SNR and the higher the number of DE directions, the better the detection accuracy. A non-trivial finding is that for the SVM the accuracy remains very high even for low SNR and poor sampling. All three image components were detected with the SVM without error in sensitivity in the data sets without noise, except for a few voxels in the detection of PF. The SVM was least sensitive (71 %) in the detection of PF at the lowest SNR and smallest number of DE directions. The SVM was least specific (87 %) in the detection of CF. In contrast, the Westin coefficients thresholding shows much lower sensitivity and specificity. The lowest sensitivity was 47 % in the detection of GM. The lowest specificity was 10 % in the detection of CFs with an SNR = 10.

Table 5: The segmentation accuracy in percentage (true positive results out of 520 possible (sensitivity), false positive results out of 1040 possible (1-specificity)) of the SVM classification and thresholding of the linear (cl), planar (cp) and spherical (cs) coefficients of the simulated data with all possible DE directions. The accuracy is shown as a function of SNR.
SVM
SNR   Accuracy           CF                     PF                     GM
                         DE81   DE61   DE31     DE81   DE61   DE31     DE81   DE61   DE31
∞     sensitivity        100    100    100      100    98.5   100      100    100    100
      (1-specificity)    0      0.77   0        0      0      0        0      0      0
100   sensitivity        100    99.8   100      99.8   100    100      100    100    100
      (1-specificity)    0.1    0      0        0      0.1    0        0      0      0
60    sensitivity        100    99.2   99.4     99.8   100    100      100    100    100
      (1-specificity)    0.1    0      0        0      0.38   0.29     0      0      0
30    sensitivity        98.9   99.4   99.8     99.8   100    100      100    100    100
      (1-specificity)    0.1    0      0        0      0.29   0.1      0      0      0
10    sensitivity        99.2   99.2   97.5     97.9   98.7   95.8     100    100    99.8
      (1-specificity)    1.06   0.67   2.12     0.38   0.38   1.15     0      0      0.19
5     sensitivity        96     92.5   81.0     92.69  90.9   71.4     99.4   96.9   87.9
      (1-specificity)    3.85   5.87   12.98    1.63   1.44   7.79     0.48   1.44   9.13

cp, cl, cs
SNR   Accuracy           CF (cp)                PF (cl)                GM (cs)
                         DE81   DE61   DE31     DE81   DE61   DE31     DE81   DE61   DE31
∞     sensitivity        100    100    100      100    100    100      100    100    100
      (1-specificity)    0      0      0        0      0      0        0      0      0
100   sensitivity        82.5   97.7   80.2     100    100    100      100    97.7   97.9
      (1-specificity)    0.5    3.5    4.9      0      0      0        4.6    3.5    6.3
60    sensitivity        83.7   97.3   76.2     100    100    100      99.6   97.3   84.2
      (1-specificity)    11     8.4    18.6     0      0      0        12.2   8.4    6.0
30    sensitivity        76.7   96.5   78.8     100    100    99.6     94.6   96.5   82.5
      (1-specificity)    33.8   17.7   53.8     0      0      0.2      10.9   17.7   16.2
10    sensitivity        93.7   76.5   76.2     97.9   92.7   97.3     85.8   76.5   46.7
      (1-specificity)    90.3   39.2   68.4     0.3    0.1    1.3      40.2   39.2   33.8
5     sensitivity        78.5   70.8   44.2     84.0   72.7   95.8     67.1   70.8   55.4
      (1-specificity)    80.0   56.3   41.2     4.2    7      2.1      51.9   56.3   57.6
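The two accuracy measures reported in Table 5 can be computed from true and predicted class labels as in the Python sketch below. The function name is hypothetical, and the label lists in the example are made-up counts for three classes of 520 voxels each, so that 520 positives and 1040 negatives are available per class as in the table caption.

```python
def sensitivity_and_false_positive_rate(true_labels, predicted_labels, target):
    """Return (sensitivity, 1 - specificity) in percent for one target class."""
    tp = fn = fp = tn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == target:
            if p == target:
                tp += 1        # target voxel found
            else:
                fn += 1        # target voxel missed
        else:
            if p == target:
                fp += 1        # other voxel wrongly assigned to the target class
            else:
                tn += 1
    return 100.0 * tp / (tp + fn), 100.0 * fp / (fp + tn)

# Hypothetical classification outcome for 3 x 520 simulated voxels.
true = ["CF"] * 520 + ["PF"] * 520 + ["GM"] * 520
pred = (["CF"] * 499 + ["PF"] * 21        # CF voxels
        + ["PF"] * 500 + ["CF"] * 20      # PF voxels
        + ["GM"] * 500 + ["CF"] * 20)     # GM voxels
sens_cf, fpr_cf = sensitivity_and_false_positive_rate(true, pred, "CF")
```

Each class is evaluated against the two remaining classes, which is why the false positives are counted out of 1040 voxels.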
The following Figure 3.2 demonstrates the classification accuracy for the detection of CFs for 81 DE directions. It is obvious that the lower the SNR, the lower the accuracy of classification. Even at a low SNR = 10 the results are in good agreement with the ground truth.
Figure 3.2: Percentage voxels detected as crossings depending on SNR and angle (81 DE directions). The ground truth is zero below 30°.
3.6.2. In vivo results
The results for the segmentation of the in vivo data using the SVM are shown in the figures below. All following figures in this chapter (Figures 3.3 – 3.7) show the same coronal slice from one test subject for the GM, WM and CSF SPM5 segmentation results, respectively. For reasons of space, only the results of one SVM option are presented in the following (using “stddev” feature scaling and one-vs-rest multi-class comparison). The “gold standard” SPM5 is shown in grey scale. Segmentation results are overlaid in transparent red. All following anatomical regions were identified with a standard atlas (Nieuwenhuys et al. 2008).
3.6.2.1. Classification of grey matter
Figure 3.3 shows grey matter voxels classified by the two different methods: SVM classification and thresholding of the Westin coefficient cs (spherical coefficient, see section 1.3.3 and equation (1.9)). The Westin coefficient thresholding (here 0.8 < cs < 0.9) shows many false positive results, especially in noise regions and at the border of the lateral ventricle and parts of the third ventricle containing CSF (yellow circle in Figure 3.3b). The SVM classification yielded some false positive results in background noise; however, the majority of the classified voxels are in agreement with the SPM5 probability map. Although the thalamus is defined as GM in SPM5, the population of cells in this region is, in reality, a mixture of grey matter and white matter, interwoven with many neuronal fibres. This may explain why SVM and cs thresholding are not in agreement with SPM5 (green circle in Figure 3.3a).
Figure 3.3: SPM5 segmentation map of grey matter overlaid with the classification results of recognised grey matter in transparent red (the first test data set is shown). a) GM classified with SVM, the thalamus is encircled in green; b) Westin coefficients thresholding (0.8 < cS < 0.9), false positive CSF regions are encircled in yellow.
3.6.2.2. Classification of white matter
In Figure 3.4 all voxels recognised to be part of WM using SVM classification or Westin coefficients (see section 1.3.3 and equations (1.10) and (1.11)) thresholding (CF => 0.17 < cp < 1 and PF => 0.25 < cl < 1) are illustrated.
Figure 3.4: White matter map overlaid with classified parallel and crossing fibres (the first test data set is shown). a) PF classified with SVM, the Pons is encircled in red, the blue arrow points to the fornix; b) PF classification by Westin coefficients thresholding (cl > 0.25), the Pons is encircled in red, the blue arrow points to the fornix; c) CF classified with SVM, the thalamus is encircled in green; d) CF classification by Westin coefficients thresholding (cp > 0.17), the thalamus is encircled in green.
No quantitative evaluation of this data is possible; however, the majority of the white matter voxels are expected to be classified as CFs, since two fibre bundles separated by a relatively low angle of 30° were defined to be crossing fibre bundles. The two images on top represent the detected PF and the two images below the detected CFs. Westin coefficients thresholding has lower specificity than the SVM method, showing a greater number of false positives in the regions with background noise (Figure 3.4b and d). A lower number of parallel fibres were detected with Westin coefficients in the cortico-spinal tract close to the Pons (red circle in Figure 3.4a and b). In addition, Westin thresholding could not detect PFs of the fornix (blue arrow in Figure 3.4a and b), which, in contrast, were recognised with the SVM. It is encouraging that voxels in the region of the Pons were classified as both PF and CF, since it is known that here both descending PFs and many CFs are present together. Both methods detected CFs in the thalamus (green circle in Figure 3.4c and d), which is, as mentioned above, a region that contains grey and white matter. As with GM classification, the SVM WM classification gave some false positive voxels in regions with noise, but most voxels agree with the probability map. In general, Westin coefficients and SVM classification showed good agreement in voxels with high anisotropy such as the corpus callosum or the cortico-spinal tract above the Pons.
3.6.2.3. Classification of CSF
In Figure 3.5 the results for the CSF classification are shown. As was mentioned above, finding a threshold for differentiation of GM and CSF with Westin coefficients thresholding is very difficult (GM: 0.8 < cs < 0.9, CSF: 0.95 < cs < 1). Therefore, in Figure 3.5b false negative voxels occur, where in Figure 3.3b they would be false positives. Note that the SVM also classifies voxels outside the brain, which could be CSF or other fluids like blood. In addition, the SVM tends to overestimate the number of CSF voxels, whereas Westin thresholding, with a threshold that separates CSF from GM well, tends to underestimate the number of CSF voxels.
Figure 3.5: CSF map overlaid with recognised CSF (the first test data set is shown). a) Classification with SVM; b) classification by Westin coefficients thresholding (cs = 0.95 – 1).
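The thresholding used for comparison can be sketched as follows. The thresholds are the ones quoted in this chapter. The coefficient formulas are an assumption, since equations (1.9) to (1.11) are not reproduced here: the sketch uses the common trace-normalised form of the Westin coefficients, and a different normalisation would shift the values. The function names and the example eigenvalues are hypothetical.

```python
def westin_coefficients(l1, l2, l3):
    """Linear, planar and spherical coefficients (cl, cp, cs) for eigenvalues
    sorted as l1 >= l2 >= l3 > 0, normalised by the trace (assumed form)."""
    trace = l1 + l2 + l3
    return (l1 - l2) / trace, 2.0 * (l2 - l3) / trace, 3.0 * l3 / trace

def threshold_labels(cl, cp, cs):
    """Apply the thresholds quoted in this chapter; a voxel may receive
    several labels or none, because each threshold is tested independently."""
    labels = set()
    if 0.25 < cl < 1.0:
        labels.add("PF")
    if 0.17 < cp < 1.0:
        labels.add("CF")
    if 0.8 < cs < 0.9:
        labels.add("GM")
    if 0.95 < cs < 1.0:
        labels.add("CSF")
    return labels

# Hypothetical eigenvalues (arbitrary units) of a moderately anisotropic tensor.
cl, cp, cs = westin_coefficients(1.0, 0.5, 0.1)
labels = threshold_labels(cl, cp, cs)
```

Because every threshold is evaluated on its own, a single voxel can pass both the cl and the cp test, whereas a multi-class SVM assigns exactly one label per voxel.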
3.6.2.4. Classification of image noise and artefacts
In Figure 3.6 the classified image noise, including background noise and image artefacts, is illustrated. Here, only the SVM classification results are shown, since it was impossible to find this class by Westin thresholding. The results are overlaid on mean diffusivity maps for better contrast. Some voxels recognised by the SVM lie in regions of high vessel pulsation or areas sensitive to image artefacts (blue circle in Figure 3.6a). This is a correct classification, since image artefacts were defined to be part of the noise class.
Figure 3.6: Mean diffusivity maps overlaid with the recognised noise (background noise and image artefacts). a) Classification with SVM; and b) combination and inversion of GM, WM and CSF SPM5 segmentation results.
3.6.2.5. Comparison of sensitivity and specificity for SVM and Westin coefficients
The SVM and the thresholding of the Westin coefficients results were compared with the “gold standard” SPM5 segmentation of the T1-image. As stated above, segmentation algorithms like SPM5 can only give probabilistic values for the tissue type. Only probability values above p = 0.5 were considered for the following validation. This comparison of the SVM and Westin coefficients with SPM5 is problematic: since standard segmentation algorithms only divide the brain into GM, WM and CSF, the classification results had to be joined in order to be comparable with SPM5. Therefore, the voxels recognised as PF and CF were combined to represent WM. The voxels recognised as GM and partial volume were combined to represent GM, although partial volume could contain GM and WM. The background noise was compared with a mask that was created with the help of the segmented GM, WM and CSF from the T1-image and contained all voxels outside the brain. The CSF segmentation results were directly comparable. Table 6 gives an overview of the validation of the SVM segmentation for all three data set options (81, 61 and 31 DE directions). The average value and standard deviation for all test data sets are shown.

Table 6: Comparison of sensitivity and specificity results in percentage (SVM classification and the linear, planar and spherical coefficients thresholding) of all data sets (81, 61 and 31 DE directions) in relation to T1-image segmentation using SPM5. The results of the five test data sets were averaged.
                     SVM                                            Westin coefficients cl, cp, cs
Tissue class         Sensitivity ± SD    (1 – specificity) ± SD     Sensitivity ± SD    (1 – specificity) ± SD

81 DE directions
WM (CF + PF)         86.35 ± 2.03        2.29 ± 0.36                60.27 ± 2.89        17.37 ± 2.22
GM (GM + partial)    56.35 ± 6.02        2.19 ± 1.72                49.31 ± 10.28       14.60 ± 5.40
CSF                  73.31 ± 10.19       8.46 ± 1.45                44.35 ± 17.16       5.40 ± 4.85
noise                92.08 ± 3.42        16.61 ± 7.76               –                   –

61 DE directions
WM (CF + PF)         71.92 ± 2.62        0.85 ± 0.24                61.83 ± 2.02        18.74 ± 0.85
GM (GM + partial)    48.39 ± 14.74       1.42 ± 0.87                50.70 ± 3.40        13.22 ± 1.30
CSF                  73.07 ± 12.70       8.29 ± 1.27                38.76 ± 27.16       4.09 ± 4.93
noise                94.62 ± 2.43        28.96 ± 16.63              –                   –

31 DE directions
WM (CF + PF)         77.14 ± 1.67        1.42 ± 0.27                64.47 ± 3.96        17.52 ± 1.56
GM (GM + partial)    53.97 ± 8.09        1.59 ± 0.79                52.97 ± 7.92        12.37 ± 3.45
CSF                  72.88 ± 13.38       9.61 ± 2.01                28.15 ± 15.87       2.69 ± 4.64
noise                92.39 ± 2.73        55.57 ± 9.56               –                   –
The SVM classified all classes with a sensitivity above 70 %, except for GM. The data acquired in 81 DE directions gave the best accuracy for the classification of WM. The specificities for GM and WM are above 90 %, much higher than with Westin thresholding. The sensitivity for the classification of noise and artefacts is above 90 %, although the specificity drops with a decreasing number of DE directions. Westin coefficients thresholding shows much lower sensitivities than the SVM, except for GM. Decreasing the number of DE directions makes a difference only for the data set simulated in 31 DE directions with an SNR = 5, where the sensitivity drops below 45 % for WM. The specificity for the detection of CSF is slightly better than for the SVM.
3.6 Discussion and Outlook
A new automated method for the separation of parallel and crossing fibre bundles in the brain white matter using HARDI data and an SVM algorithm is presented. With this method, each voxel of a data set was identified without additional anatomical scans or expert knowledge. A rotation invariant data representation of the features was used as input for the SVM. After feature extraction the classification procedure was trained and systematically tested using simulated data sets with several noise levels. It could be shown that even for a very low SNR of 5 the chosen SVM algorithms gave a very high sensitivity and specificity as well as robustness in the presence of noise with simulated data sets, in contrast to Westin coefficients thresholding.
The in vivo HARDI data sets for the classification of fibre crossings were obtained in a clinically acceptable time. T1-weighted MPRAGE images were used solely to identify brain regions for the training data set. Once this was done, the MPRAGE images were no longer needed for classification. This means that, if the presented procedure for classification of HARDI data using the SVM were further developed and brought into routine use, MPRAGE images would no longer need to be acquired. The problem of creating a training data set was solved by combining SVM classification results for CF and PF from simulated data sets with T1 segmentation results for grey matter, white matter, CSF and background noise. The selection of representative voxels for the training data set may need several refinement steps. This dependence on the training data is a drawback of using a supervised learning algorithm, but once the training data set has been optimised, the subsequent steps cannot be biased by users. This means that optimising the application requires some effort in selecting the most representative training data.
The project also attempted the classification of partial volume voxels containing grey matter and white matter. This was found to be highly dependent on the chosen multi-class comparison algorithm. An example is shown in Figure 3.7. Here, the performance of the two multi-class comparison methods one-vs-one and one-vs-rest is shown for three tissue classes. The results differ most in the area of the thalamus, which is a region not clearly specifiable as GM or WM (blue circle in Figure 3.7b and e). This area was classified as partial volume with the one-vs-rest approach. With the one-vs-one approach those voxels appear to be CFs. In general, the one-versus-rest approach detected areas of partial volume robustly. But when looking closer at the partial volume and CF detection with this latter method, some regions that should be crossings are detected as partial volume as well (green circle in Figure 3.7b).
Figure 3.7: SPM segmentation results of GM (a – d) and WM (e – f) overlaid with SVM classification results in transparent red. The two multi-class comparison algorithms of the SVM are contrasted: on the left side the one-vs-rest approach is shown and on the right one-vs-one: a) – b) partial volume, c) – d) GM, and e) – f) CF.
In addition, it was found that with a different SVM combination, better GM sensitivity was obtained, but at the cost of lower detection sensitivity for other classes (data not shown). In summary, this means that there is a trade-off between the accuracy of partial volume, crossing fibre bundle and grey matter detection. This shows that the choice of the classification algorithm depends on the application, since it strongly influences the results.
In our comparison of the SVM with Westin coefficients thresholding, it was shown that our method can differentiate between the WM structures CF and PF, and is also sensitive to the separation of CF from GM (cf. Figure 3.3 – Figure 3.6). Except for the class CSF, Westin coefficients thresholding showed lower sensitivities and specificities for the differentiation of all classes. Note that the optimal Westin thresholding resulted in erroneous double labelling, i.e. many voxels were detected as both PF and CF, whereas the SVM can only give one label for each voxel. There are several explanations for this difference. First, the Westin coefficients are based on the diffusion tensor model, which is correlated with spherical harmonics of order two only. Second, the SVM surveys several features in parallel in a high dimensional space, meaning that class specific feature combinations are taken into account.
In addition, the SVM segmentation results were compared with the coregistered SPM segmentation of the T1-image, and good agreement was found. The segmentation in SPM5 was defined as “gold standard”, but as already described a probability threshold had to be chosen for the validation, which introduces a subjective element to the analysis. Also, coregistration with an automated algorithm always carries a risk of misregistration. The registration results were carefully inspected in order to ensure that this was not the case.
There is also a discretisation effect when coregistering an image with finer resolution to a coarse image leading to inaccuracies especially at tissue borders. In order to avoid such effects the T1-MPRAGE images were acquired in the same session and orientation as the HARDI data, but such effects can never be totally avoided. Furthermore, the results obtained from the SVM for the CSF segmentation include voxels between brain and skull, which is not shown in the SPM5 segmentation, since this typical segmentation procedure is based on a-priori known probability maps. This means that the voxels defined as false positives in regions not shown in SPM5 segmentation might be correct in reality. A proper verification of the results remains a challenge especially for the parallel and crossing fibre bundles, which is a commonly recognised problem in diffusion MR imaging. Also, there are several partial volume combinations in each voxel possible such as GM-CSF, WM-CSF, WM-GM, GM-vessel and WM-vessel, which were not considered here. Only the GM and WM
partial volume was taken into account. The above-mentioned misregistration and erroneous segmentation emphasise the strength of our method, which does not require registration or segmentation. With the presented method no additional anatomical scans are required, except for the creation of an initial training data set. However, for this one training data set it is important to select the training voxels carefully, choosing only voxels where the user is sure about the underlying structure. The dependence of the detection accuracy on the number of DE directions was also investigated. There was only a small difference between the results obtained with the simulated data sets of 61 and 81 DE directions, but there was a sharp decrease in accuracy when using 31 DE directions. It would be interesting to investigate the effect of using an even higher number of DE directions than 81. In the in vivo results an effect of the number of DE directions can only be found for the two WM classes. This agrees with sampling theory, which states that for the order n = 4 the number of DE directions required is ≥ 64, from equation (2.24). Several assumptions were made in our simulations. First, only a one and two fibre bundle model was used as the underlying anatomy. Second, the minimum angle between fibre bundles of 30° is an arbitrary threshold for crossings and can be seen as the definition of the boundary between parallel and crossing fibre bundles. This threshold enables the simulation of a so-called fibre bundle branching or fanning situation, which is the reason why so many fanning regions were detected as crossing regions. As a proof of principle, our method was tested on five independent subjects and showed very similar results for the classification of CF, PF, GM and partial volume. Future work will test our algorithm on a larger number of subjects.
In the literature there has been no method reported where the main image components were recognised just by using HARDI data. Some authors pointed out the difficulty of differentiating between CF, GM, and noise. In this new approach, one way of solving this problem could be choosing a combination of published model-free methods with a supervised learning technique. Though the results still show false positive voxels and validation is difficult, a segmentation procedure was obtained that performed well. Our method can provide a priori knowledge for increasing the performance of fibre tracking algorithms. After initial masking e.g. using the SVM classification results in order to define the tracking area only to be in WM, one could for example divide the fibre tracking into two cases: the parallel fibre bundles could be tracked with an easy and fast deterministic tracking
algorithm and the crossing and fanning regions could be tracked with a computationally expensive, but robust and reliable method such as Gibbs tracking (Kreher et al. 2008). In order to validate the developed algorithm, future work should concentrate on quantification processes. This would be important not only for the algorithm developed in this thesis, but also for other HARDI/DTI applications where statements about tissue structure are made. One could, for example, image animal brains and follow this directly with histology. Though registration of imaging data with histological sections is a new problem in itself, such an investigation could bring deeper insight into quantification options. Another application would be the automatic recognition of pathologies, for example the prognosis for WM neuronal fibre bundles destroyed after stroke, using a HARDI measurement in the acute stroke phase. One could train the SVM with these early-stage HARDI scans (similar to the approach presented in chapter 4), which could be labelled with the help of coregistered HARDI data acquired from the same patient later, in the chronic stroke phase. In these late HARDI scans, permanent destruction of white matter fibre bundles can be determined with expert knowledge. The SVM classification of any HARDI data acquired after acute stroke with the same imaging parameters may then be used to predict the location of permanently destroyed WM regions.
Chapter 4: Prediction of stroke outcome 70 _____________________________________________________________________________
4 Prediction of stroke outcome using multi-modal acute stroke MRI data

4.1 Introduction

The decision on thrombolytic treatment in acute non-haemorrhagic stroke crucially depends on weighing the risk of infarct growth against the risk of bleeding (see section 1.3.5) (Garcia 1984). This has motivated the development of algorithms for stroke outcome prediction. State-of-the-art prediction algorithms use logistic regression approaches in a supervised learning setup (see section 0 and (Wu et al. 2001)). Such prediction algorithms rely on acute MRI data, especially dw- and pw-MRI. From the pw-MRI data, perfusion estimates are determined using models (see section 1.3.4.1). As described in section 1.3.4.1, the parameters derived from pw-MRI measurements are often inaccurate. This raises the question of whether some of the problems can be avoided by omitting the error-prone quantification step and finding purely data-driven solutions to the actual medical question, or whether the measurement of perfusion with standard MRI methods is itself the source of many of the problems. The goal of this chapter is to find a data-driven solution for predicting the stroke lesion outcome and to find out whether or not the existing methods contain enough information for this task. A purely data-driven method for stroke prediction is introduced and applied with the standard prediction method, logistic regression using the general linear model. In addition, a new classifier is introduced in this context: the SVM.
4.2 Materials and Methods

4.2.1. MRI data acquisition

The MRI data were acquired within the framework of a European project, the “I-KNOW” project (I-Know 2006) (project title: “Integrating Information from Molecule to Man: Knowledge Discovery Accelerates Drug Development and Personalized Treatment in Acute Stroke”). Here, data from the Institut National de la Santé et de la Recherche Médicale in Lyon were used, where the data were acquired on a 1.5 T Siemens Avanto scanner. The scanning protocol followed the study guidelines determined within the I-KNOW project framework. Out of fourteen patients, five were not used in most of the following evaluation steps, as they only had very small acute and final lesions. The small lesion size seems to be responsible for the poor outcome prediction performance in those data sets. On the other hand, the treatment decision for these patients is already clear and a prediction of stroke outcome is not necessary. Therefore, only the nine remaining patients are presented in detail in the following sections. Six of the patients were male and three were female, with an average age of 68 years (SD = 13.2 years). More patient details are given in section 4.3.2. All patients had an infarction in the right hemisphere (shown on the left in the figures, following radiological convention). For the prediction task, two scanning time points are of interest. The multi-modal acute MRI scans acquired as soon as possible after stroke onset (time point 1, 1 – 6 hours post stroke onset) form the data basis for the prediction algorithm. The T2-FLAIR scan acquired between two weeks and one month post stroke (time point 2) is the basis for the definition of the chronic lesion size and location. Apart from other scans such as ToF (Time of Flight, an angiography MRI) and a T1 MPRAGE, which were acquired only for clinical examination by the radiologist, the following data were acquired at time point 1:
⇒ Diffusion-weighted MRI:
Spin-echo EPI sequence with diffusion sensitive gradients in 3 directions
TE = 87 ms, TR = 6400 ms, flip angle = 90°, matrix = 128 x 128, 40 slices, slice thickness = 3 mm, in-plane resolution = 1.87 x 1.88 mm²
ADC, DWI and b = 0 maps were automatically created online by the scanner software
⇒ Perfusion-weighted MRI:
Spin-echo EPI sequence, volumes were measured repeatedly 60 times (1 min 33 seconds)
TE = 30 ms, TR = 1540 ms, flip angle = 90°, matrix = 128 x 128, 20 slices, slice thickness = 6 mm, in-plane resolution = 1.88 x 1.88 mm²
Perfusion maps (CBF, CBV and MTT) were calculated using the “I-KNOW” software with the implemented “Mouridsen-model” (Mouridsen et al. 2006)
For comparison reasons the perfusion maps were also calculated using the first bolus extraction method by Gall et al. (2009)
⇒ T2-FLAIR:
This is a high-resolution T2-weighted spin-echo sequence, which suppresses CSF. The CSF suppression is done using an inversion recovery pre-pulse at a specific inversion time TI, which inverts the longitudinal magnetisation vector. To obtain CSF suppression, the TI corresponds with the longitudinal relaxation time of water. (Vlaardingerbroek and den Boer 2003)
TE = 109 ms, TR = 8690 ms, TI = 2500 ms, flip angle = 150°, matrix = 256 x 224, 24 slices, slice thickness = 5 mm, in-plane resolution = 0.94 x 0.94 mm²
⇒ T2*:
With help of the T2*-weighted MRI using a gradient echo sequence with an echo time close to the T2* relaxation time (T2* decay time is the transverse relaxation time T2 in tissue and is 2 to 3 times as fast as the pure T2) the diagnosis whether or not haemorrhage is present can be given. (Vlaardingerbroek and den Boer 2003)
TE = 26 ms, TR = 784 ms, flip angle = 20°, matrix = 192 x 256, 24 slices, slice thickness = 6 mm, in-plane resolution = 0.90 x 0.90 mm²
At time point 2 a T2-FLAIR was acquired using the same scanning protocol as done at time point 1.
4.2.2. Processing of the acute stroke data

The additional T2-FLAIR MRI was acquired in the chronic state, 14 days to one month after stroke, in order to be able to train the algorithm for stroke prediction while the processing pipeline was being developed. On this T2-FLAIR scan, four independent and experienced radiologists outlined the stroke lesion. The final lesion was then defined as the 2-common volume, meaning the volume where at least two readers agreed. The T2-FLAIR from time point 2 was segmented into GM, WM and CSF using the MATLAB-based (The MathWorks 2007) automated image registration software SPM8b (Friston et al. 2009). As mentioned in section 0, the segmentation results were saved as probabilistic values between 0 and 1. The T2-FLAIR, the outlined lesion mask and the segmentation results from time point 2 were coregistered to the acute perfusion time series. Additionally, the other scans acquired in the acute stroke state had to be coregistered to the perfusion scans. The perfusion scans were chosen as the coregistration target because these images have the lowest quality but a high information value. After coregistration, the resulting lesion masks were visually inspected and corrected if needed. The diffusion (ADC, DWI and b0-image), T2-FLAIR, T2* and follow-up data were spatially coregistered with SPM8b (Friston et al. 2009). All data were coregistered to the same dimensions (128 x 128 x 20), orientation and coordinates as the perfusion images, using normalised mutual information as the objective function. The images were then written into the coregistered space using 4th-degree B-spline interpolation. The pre-processing with SPM8 follows the same procedure for the model-free and the model approach and is illustrated in Figure 4.1.
Figure 4.1: SPM pre-processing pipeline
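The consensus rule for the final lesion described in section 4.2.2 (the volume where at least two of the four readers agree) can be illustrated with a minimal Python sketch. The function name `consensus_mask` and the toy one-dimensional masks are assumptions made for illustration only; the thesis does not state how the rule was implemented.

```python
def consensus_mask(reader_masks, min_agreement=2):
    """Return 1 for voxels marked as lesion by at least `min_agreement` readers.

    `reader_masks` is a list of equally long 0/1 lists (flattened binary masks).
    """
    n_voxels = len(reader_masks[0])
    if any(len(mask) != n_voxels for mask in reader_masks):
        raise ValueError("all reader masks must have the same number of voxels")
    # Count, voxel by voxel, how many readers outlined the voxel as lesion.
    votes = [sum(mask[i] for mask in reader_masks) for i in range(n_voxels)]
    return [1 if v >= min_agreement else 0 for v in votes]


# Toy example: four readers, five voxels (hypothetical data).
readers = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
final_lesion = consensus_mask(readers)
```

In this toy case the fourth voxel is marked by one reader only and is therefore excluded from the final lesion.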
4.2.3. Training and testing data sets for the classifiers

All acquired and coregistered data had to be processed further into a feature representation that the classifier can distinguish. Two different approaches were investigated for this binary classification problem within the scope of the thesis: the model and the model-free approach. Therefore, two different types of training and testing data were created. For the model approach, the data sets were composed as suggested in the literature: CBF, CBV, MTT, ADC, DWI, b0-image, T2-FLAIR and T2* images, resulting in 8 features. Each feature was normalised using the mean signal within a WM or CSF mask derived from the coregistered T2-FLAIR (the initial signal S0 used for the normalisation of each feature is given in Table 7). Three versions were created for testing the prediction performance of the model approach. The first version used the data described above, with normalisation, as features. Here the perfusion maps were calculated using the “Mouridsen-model” (Mouridsen et al. 2006). For the second version, the above features were “minmax” scaled in addition to the normalisation. The third version is composed of the same data sets, except that the perfusion parameters were calculated with a new approach formulated by Gall et al. (2009). This was done in order to test what effect different perfusion parameter calculation methods, and the level of error within the calculated parameters, have on the prediction accuracy.

For the model-free approach, the perfusion time series had to be processed into a feature representation. The whole perfusion time series was normalised with an initial signal value S0, derived by averaging the first few points of the perfusion time series, before the contrast agent bolus had time to pass through and where the images did not show any signal attenuation. For use as features, the data were processed as follows. All voxels of the whole volume were averaged while the time domain was kept, resulting in a mean signal changing with time after contrast agent administration (meanPerf). Afterwards, the signal was interpolated in order to obtain a better time resolution: the signal curve was fitted using B-spline interpolation and resampled with four additional time points (t_ext) between each pair of original points (extmeanPerf). Using this finely resolved mean curve, the beginning of the bolus was found by searching for the zero crossing (cross) after calculating weighted neighbouring signal differences (diff). This point was taken as the start point for each voxel in the original perfusion data, with the signal interpolated in case the start of the bolus lies between the originally measured data points. From this starting point (t_start), 43 additional time points (an arbitrary number), covering the whole bolus passage until signal recovery, form the features derived from the perfusion data and represent the signal S_b of the extracted bolus. The following pseudo-code describes the procedure in detail.

Pseudo-code for the extraction of the bolus passage

Initialise:
    weight(u) = [-2, -1, 0, 1, 2]
    t_ext = t_orig / 5
    meanPerf(t_orig) = mean(S)
    extmeanPerf(x_ext, t_ext) = interpol(meanPerf(x_orig, t_orig))
    diff(1,2) = 0
then find start of bolus:
    for m = 3 : x_ext(end)
        if x_ext(m) [...] -0.5 AND diff < 0.5)
    t_start = find(first(cross(t_ext > 8 · TR)))
extract bolus for each voxel:
    if t_start ≠ any(t_orig)
        t_new = [t_start + TR : t_start + 43]
        S_b(t_new) = interpol(S(t_orig) / S0(t_orig) → t_new)
    else
        S_b(t_orig) = [S(t_start) / S0(t_orig) : S(t_start + 43)]
    end
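The bolus extraction described above can be approximated by the following Python sketch. It is not the implementation used in the thesis and rests on several assumptions: linear interpolation stands in for the B-spline interpolation, and because the onset criterion is only partly legible in the pseudo-code, the sketch substitutes a simple rule (the first upsampled sample after 8 · TR at which the weighted difference falls below a negative tolerance). All function names are hypothetical.

```python
def upsample_linear(y, factor=5):
    """Insert factor-1 linearly interpolated samples between neighbouring points
    (a stand-in for the B-spline interpolation used in the thesis)."""
    out = []
    for k in range(len(y) - 1):
        for j in range(factor):
            out.append(y[k] + (y[k + 1] - y[k]) * j / factor)
    out.append(y[-1])
    return out


def bolus_start(mean_perf, factor=5, tol=0.5, skip=8):
    """Index (on the upsampled grid) of the assumed bolus onset.

    Assumption: onset = first sample later than skip*TR whose weighted
    neighbouring difference drops below -tol (signal starts to fall).
    """
    ext = upsample_linear(mean_perf, factor)
    weight = [-2, -1, 0, 1, 2]
    for m in range(2, len(ext) - 2):
        diff = sum(w * ext[m + u - 2] for u, w in enumerate(weight))
        if m > skip * factor and diff < -tol:
            return m
    return None


def extract_bolus(signal, s0, m_start, factor=5, n_points=44):
    """Sample one voxel at m_start and 43 further TR steps, normalised by S0."""
    ext = upsample_linear(signal, factor)
    idx = [m_start + k * factor for k in range(n_points)]
    if idx[-1] >= len(ext):
        raise ValueError("time series too short for the requested bolus window")
    return [ext[i] / s0 for i in idx]


# Synthetic mean curve with 60 time points: flat baseline, then a signal dip.
mean_curve = [100.0] * 12 + [90.0, 70.0, 50.0, 70.0, 90.0] + [100.0] * 43
start = bolus_start(mean_curve)
features = extract_bolus(mean_curve, s0=100.0, m_start=start)
```

Working on the upsampled grid handles both branches of the pseudo-code uniformly: if the onset coincides with a measured time point, the sampled values are the measured ones; otherwise they are interpolated.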
Finding the bolus start point in the mean perfusion time series ensures that the individual shift of the bolus start due to pathological or anatomical reasons is still preserved in each voxel. Each of the 44 perfusion time points used made up one feature. In order to bring them into about the same signal range as the other features (between -1 or 0 and 1), an additional factor (here: 1/10) was applied. All remaining features were derived from the other MR modalities: the trace image (DWI), the ADC image, the b0-image from the DWI measurement, the acute T2-FLAIR measurement and the T2*-image. All features except ADC were normalised and then scaled using the “minmax” scaling algorithm (see section 2.5). For ADC no scaling or normalisation was performed, as it is already a normalised quantity. The normalisation depended on the feature; Table 7 shows from which tissue region the mean signal S0 was taken for the normalisation SN = S/S0. The probabilistic WM and CSF maps used to select the regions of interest in which the mean signal S0 is determined were segmented from the T2-FLAIR and then coregistered.

Table 7: The basis of the derived initial signal S0 for the normalisation of each feature is shown for the model and the model-free approach.
Model features | Normalisation basis (S0)
CBF | Mean(WM mask > 0.5)
CBV | Mean(WM mask > 0.5)
MTT | Mean(WM mask > 0.5)
b0-image | Mean(CSF mask > 0.5)
ADC | No normalisation
DWI | Mean(WM mask > 0.5)
T2-FLAIR | Mean(WM mask > 0.3)
T2* | Mean(CSF mask > 0.3)

Model-free features | Normalisation basis (S0)
pw time points | Mean(S0), Factor 1/10
DWI | Mean(WM mask > 0.5)
ADC | No normalisation
b0-image | Mean(CSF mask > 0.5)
T2-FLAIR | Mean(WM mask > 0.3)
T2* | Mean(CSF mask > 0.3)
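The normalisation SN = S/S0 listed in Table 7, followed by “minmax” scaling, can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis code: images and probability maps are flattened lists, the function names are hypothetical, and the target range [0, 1] for the scaling is assumed, since the definition in section 2.5 is not reproduced here.

```python
def normalise_by_mask(image, prob_mask, threshold):
    """S_N = S / S0, with S0 the mean signal of voxels where prob_mask > threshold."""
    roi = [s for s, p in zip(image, prob_mask) if p > threshold]
    if not roi:
        raise ValueError("mask selects no voxels at this threshold")
    s0 = sum(roi) / len(roi)
    return [s / s0 for s in image]


def minmax_scale(values, lo=0.0, hi=1.0):
    """Linearly map values to [lo, hi] (assumed form of the 'minmax' scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Toy example: DWI-like feature normalised with Mean(WM mask > 0.5).
dwi = [3.0, 9.0, 6.0, 12.0]          # hypothetical signal values
wm_prob = [0.9, 0.8, 0.2, 0.1]       # hypothetical WM probability map
dwi_norm = normalise_by_mask(dwi, wm_prob, threshold=0.5)
dwi_scaled = minmax_scale(dwi_norm)
```

Here S0 is the mean of the two voxels with WM probability above 0.5, i.e. 6.0, so the normalised values are 0.5, 1.5, 1.0 and 2.0 before scaling.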
While working with the model-free features, it was found that the histograms of the features differed strongly despite normalisation and scaling, except for ADC. Therefore, before prediction was performed, all data sets were histogram matched with one arbitrarily chosen data set (see section 2.5), so that all data sets showed the same histogram. This was done using the MATLAB functions “imhist” and “histeq”. Initially, it was tested using cross-correlation which SVM setup fits the problem best. Three options were compared: a one-class SVM (training with a subset containing only one class in order to be able to test whether or not a test point lies outside), a multi-class SVM (3-class problem: healthy, lesion and recovered voxels) and a two-class SVM (healthy and lesion voxels). The best results were obtained with the two-class SVM setup. As mentioned before, the labelling of the voxels into the two classes “infarcted” (value 1) and “non-infarcted” (value -1) was done with the help of the follow-up T2-FLAIR image. The acute ADC map and the acute T2-FLAIR images were also inspected. Four independent radiologists outlined the final lesion, and the resulting lesion was pooled from the consensus of at least two readers. Care was taken to avoid including regions demonstrating chronic changes on the T2 contrast, such as old stroke lesions or periventricular (close to the ventricle) white matter abnormalities. In the test data sets, all voxels outside the lesions were considered “normal” voxels, though this might include errors, as the boundary is not always clearly definable and such voxels might also contain partial volumes of infarction.

For the training data sets in the model-free case, the selection of “normal” voxels (meaning no lesion) was done in two ways. In the first option, the selection was limited to the contralateral hemisphere (opposite the hemisphere with the stroke lesion) in slices that showed no evidence of infarction. The voxels were randomly selected from a preselected area in order to obtain normal voxels of each tissue type: background, CSF, grey matter and white matter. In the second option, all “normal” voxels were selected by hand, so that each tissue type was represented with roughly the same number of voxels. Here, voxels were also included which appear affected in the acute data, but heal during the following weeks until the follow-up scan is performed. In both options the numbers of lesion voxels and non-lesion voxels were the same, resulting in a balanced training data set. In the case of the model-dependent approach the complete data set was used, since logistic regression does not need much computation time, meaning that an imbalanced training data set was used here. In addition, it was tested whether or not the composition of the training data set has an effect on the prediction result. Therefore, the logistic regression classifier was also trained with manually selected data subsets containing the same number of lesion and non-lesion voxels, i.e. with a balanced training data set.

In order to achieve a high generalisation ability of the classifier, the training data sets were composed from several patients; this was done for both methods, model and model-free. For this, the training voxels of the patients were combined using the leave-one-out (LOO) method, meaning that all data sets except one were combined (the data set used for testing was left out).
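The leave-one-out composition of the training data across patients can be sketched in Python as below. The generator name `leave_one_out` is hypothetical, and only the patient IDs are taken from this chapter; in practice each ID would stand for that patient's labelled voxel set.

```python
def leave_one_out(patient_ids):
    """Yield (training_ids, test_id): each patient is held out exactly once."""
    for held_out in patient_ids:
        training = [p for p in patient_ids if p != held_out]
        yield training, held_out


# Patient IDs of the nine patients analysed in this chapter.
patients = ["009", "043", "048", "084", "153", "170", "171", "190", "193"]
folds = list(leave_one_out(patients))

for training, test in folds:
    # A classifier would be trained on the pooled voxels of `training`
    # and evaluated on the voxels of `test`.
    assert test not in training
```

With nine patients this gives nine folds, each training on the pooled data of eight patients.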
4.2.4. Adjustment and usage of classifiers

Logistic Regression

The logistic regression was implemented in the MATLAB environment (The MathWorks 2007) using the functions “glmfit” and “glmval” provided in the Statistics Toolbox. The coefficients of the GLM are estimated (see section 0) with the aid of a training data set, which is composed of several patient examples using LOO. After estimation of the coefficients, the prediction can be performed on a test data set. The result is a probabilistic value P between 0 and 1, which is then assigned to 1 (lesion) or -1 (healthy) by thresholding. In order to obtain the ROC and PR curve pairs, the threshold is changed stepwise over the interval [0 … 1].
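The threshold sweep over the predicted probabilities can be sketched in Python as follows. This is a minimal illustration, not the MATLAB code of the thesis. The step size of 0.01, the use of `>=` at the threshold, and the convention of reporting a precision of 1.0 when no voxel is predicted as lesion are assumptions; the function names are hypothetical.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN for labels in {1, -1} at a probability threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else -1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == -1:
            fp += 1
        elif predicted == -1 and y == -1:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sweep_thresholds(probs, labels, step=0.01):
    """Return (threshold, sensitivity, specificity, precision) for each step."""
    n_steps = int(round(1.0 / step))
    points = []
    for i in range(n_steps + 1):
        thr = i * step
        tp, fp, tn, fn = confusion_counts(probs, labels, thr)
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        prec = tp / (tp + fp) if (tp + fp) else 1.0  # assumed convention
        points.append((thr, sens, spec, prec))
    return points


# Toy example: four voxels with predicted probabilities and true labels.
probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, -1, 1, -1]
points = sweep_thresholds(probs, labels)
```

Plotting sensitivity against 1 - specificity from `points` gives the ROC curve, and precision against sensitivity gives the PR curve.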
SVM

In order to find the optimal parameter setting for the prediction task using the SVM, parameters were systematically changed during training, resulting in several sensitivity-specificity and precision-recall sets (see section 2.2 for definitions of the following parameters). The kernel of choice for the prediction task was the RBF kernel (see sections 2.2 and 3.4). The systematic search for optimal parameters always followed the same procedure. First, the penalty variable C was changed in the interval [1 … 10] in order to account for the highly imbalanced data sets (see section 2.3). The next parameter to be changed was the weight w, which was varied while keeping C at the value that resulted in the best accuracy. The weight was changed in the interval [0.1 … 10] for class 1 (lesion) and in the interval [0.05 … 1.05] for class -1 (healthy). After the optimal weight was determined, the final ROC and PR curve pairs were obtained by shifting the hyperplane within a range of ±1 around the position determined automatically by the model with optimal C and w.
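The sequential parameter search (first C, then the class weights at the best C) can be outlined in Python as below. This is only a schematic sketch: `score` is a hypothetical callback that would train an RBF-kernel SVM with the given parameters and return its accuracy, the grid spacing within the stated intervals is assumed, and searching both class weights jointly is an assumption, since the text does not say whether they were varied together or one after the other.

```python
def sequential_search(score, c_values, w_lesion_values, w_healthy_values):
    """Pick the best C with unit class weights, then the best weight pair at that C."""
    best_c = max(c_values, key=lambda c: score(c, 1.0, 1.0))
    weight_pairs = [(wl, wh) for wl in w_lesion_values for wh in w_healthy_values]
    best_w = max(weight_pairs, key=lambda w: score(best_c, w[0], w[1]))
    return best_c, best_w


def toy_score(c, w_lesion, w_healthy):
    """Stand-in for 'train SVM and return accuracy' (purely illustrative)."""
    return -((c - 4) ** 2) - (w_lesion - 2.0) ** 2 - (w_healthy - 0.55) ** 2


c_grid = list(range(1, 11))                 # C in [1 ... 10], assumed step 1
w_lesion_grid = [0.1, 1.0, 2.0, 5.0, 10.0]  # class 1, assumed grid in [0.1 ... 10]
w_healthy_grid = [0.05, 0.55, 1.05]         # class -1, assumed grid in [0.05 ... 1.05]

best_c, best_w = sequential_search(toy_score, c_grid, w_lesion_grid, w_healthy_grid)
```

A sequential search of this kind evaluates far fewer parameter combinations than a full grid over C and both weights, at the price of possibly missing interactions between C and w.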
4.3 Results

The results are presented in three parts. First, it is discussed how the prediction algorithms can be evaluated and how a comparison with the literature is possible. In the second part, the results of the investigation of the data and features are shown, and in the third part the results of the evaluation of our developed prediction methods are presented.
4.3.1. Statistical Analysis of the performance of the stroke outcome prediction algorithms

For the evaluation of the accuracy of the prediction algorithms, the labels for infarcted and healthy regions have to be determined by applying the developed approach to several test data sets. The performance of an algorithm can be evaluated by its ability to accurately discriminate infarcted from healthy regions. By comparing the predicted maps with the lesions demonstrated on coregistered follow-up T2-FLAIR images, the number of voxels predicted to develop infarction which did develop infarction (true positives = TP) and the number which did not develop infarction (false positives = FP) were tabulated. Also, the true negatives (TN, the number of voxels predicted not to develop an infarction which remained healthy) and the false negatives (FN, negatively predicted voxels which developed infarction) were determined. From these counts the confusion matrix can be created:

Table 8: Confusion Matrix

Hypothesised class | True class: Positive | True class: Negative
Yes | True Positives | False Positives
No | False Negatives | True Negatives
and the algorithm’s sensitivity or recall can be calculated:

\[ \text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN} \tag{4.1} \]

as well as the specificity:

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{4.2} \]
In order to evaluate the performance of the algorithm, one option is to generate the receiver operating characteristic (ROC) curves from these measures for each tested algorithm (Zweig and Campbell 1993). This is done by producing many sensitivity-specificity pairs and plotting the sensitivity (or recall) against the specificity (often also 1-specificity). For the GLM algorithms, the probability cut-offs for classifying tissue as infarcted were varied from 0 to 1 in small increments (e.g. 0.01). The area under the ROC curve (AUC) has been shown to represent the probability that an image will be correctly ranked normal or abnormal and is therefore used to assess the performance of diagnostic systems (Bradley 1997; Wu et al. 2001). The AUC can be evaluated using trapezoidal integration (Wu et al. 2001). The AUCs for the different algorithms can then be compared by statistical tests (Mouridsen et al. 2006; Christensen et al. 2009; Jonsdottir et al. 2009). ROC curves and their derived values are used in most applications for stroke outcome prediction (Landgrebe et al. 2006). When using imbalanced data sets with many negative examples, plotting the ROC curve, or evaluating only the specificity and sensitivity, gives an incorrect impression of the performance of prediction algorithms in the context of stroke. Therefore, an additional statistical analysis should be used for the evaluation of prediction algorithms, one which does not depend on the specificity or the number of negative examples. For the so-called precision-recall curves, the precision (equation (4.3)) and recall (equation (4.1)) have to be calculated and plotted (Landgrebe et al. 2006).

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{4.3} \]
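Equations (4.1) to (4.3) and the trapezoidal integration of a curve can be written as a short Python sketch. The function names and the example counts are hypothetical and serve only to illustrate the formulas.

```python
def sensitivity(tp, fn):
    """Equation (4.1): TP / (TP + FN), identical to recall."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Equation (4.2): TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Equation (4.3): TP / (TP + FP)."""
    return tp / (tp + fp)


def trapezoid_auc(xs, ys):
    """Area under a curve given as points (x, y), by trapezoidal integration."""
    pts = sorted(zip(xs, ys))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical confusion counts for one probability cut-off.
tp, fn, tn, fp = 80, 20, 900, 100
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
prec = precision(tp, fp)

# Hypothetical three-point ROC curve: x = 1 - specificity, y = sensitivity.
auc = trapezoid_auc([0.0, 0.1, 1.0], [0.0, 0.8, 1.0])
```

In this example the specificity is high (0.9) although fewer than half of the voxels predicted as lesion are correct (precision of about 0.44), which is the kind of discrepancy the precision-recall analysis is meant to expose.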
Here too, the AUC can be calculated in order to compare data sets or algorithms. However, since the ROC curves are also assembled from the specificity, it has to be considered that algorithms can only be compared when exactly the same test data were used, or when the test data were composed in exactly the same way. In order to show the dependence of the ROC curve on the ratio of positive and negative examples, several examinations were performed. For this, one of the acute multi-modal MRI stroke patient data sets was used to create two data subsets. The first subset consists of the complete imbalanced data set containing 5880 lesion voxels and 321800 non-lesion voxels (ratio: 1/55). In the second data set the imbalance was reduced by cropping background noise, resulting in a lesion/non-lesion ratio of 1/20. In addition, two different feature setups were used (see section 4.2.3). The first feature setup was composed using the model approach with CBF, CBV, MTT, DWI, ADC, b0-image, T2-FLAIR and T2* as basis (Wu et al. 2001). In the second feature setup the model-free approach was applied, without the derived perfusion parameters, in addition to DWI, ADC, b0-image, T2-FLAIR and T2*. The prediction was performed with logistic regression in a general linear model setup, where a supervised learning approach was used to calculate the coefficients for the stroke outcome prediction (see section 0). The performance was then evaluated using PR and ROC curves, based on the algorithm’s ability to accurately discriminate regions with infarction from healthy regions.
Figure 4.2: The ROC on the left shows the comparison of the prediction results by using the standard features and the two data subsets. The ROC on the right shows the comparison of the new feature setup applied to the two data subsets.
As can be seen in Figure 4.2, the resulting ROC curves differ depending on the data set composition, but also when the composition of the features is changed. In contrast, the PR curves stay the same when only the ratio of lesion to non-lesion voxels is changed (Figure 4.3).
Figure 4.3: The PR curve on the left shows the comparison of the prediction results by using the standard features and the two data subsets. The PR curve on the right shows the comparison of the new feature setup applied to the two data subsets.
When the features given to the algorithm were changed, the different performances of the two methods could already be seen in the respective confusion matrices. In the ROC curves, the difference between the two methods appears similar to the difference caused merely by changing the data imbalance ratio. However, the PR curves show that the two methods actually perform differently.
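The effect of cropping background voxels on the two kinds of measures can be made concrete with a small worked example in Python. The voxel totals (5880 lesion voxels, 321800 non-lesion voxels, and a reduced ratio of 1/20) are taken from the text above; the TP and FP counts are hypothetical, and the example assumes that every cropped background voxel was a true negative.

```python
lesion_voxels = 5880
negatives_full = 321800                  # complete data set (ratio about 1/55)
negatives_cropped = 20 * lesion_voxels   # ratio 1/20 after cropping background

# Hypothetical classifier output at one fixed threshold.
tp = 4000
fn = lesion_voxels - tp
fp = 10000

# Assumption: cropping removes only correctly rejected background voxels,
# so TP, FN and FP are unchanged and only TN shrinks.
tn_full = negatives_full - fp
tn_cropped = negatives_cropped - fp

recall = tp / (tp + fn)
precision = tp / (tp + fp)
specificity_full = tn_full / (tn_full + fp)
specificity_cropped = tn_cropped / (tn_cropped + fp)

print(f"recall      = {recall:.3f} (both subsets)")
print(f"precision   = {precision:.3f} (both subsets)")
print(f"specificity = {specificity_full:.3f} (full) vs {specificity_cropped:.3f} (cropped)")
```

Under this assumption the ROC point moves because the specificity drops from about 0.97 to about 0.91, while the PR point does not move at all, which matches the behaviour seen in Figures 4.2 and 4.3.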
4.3.2. Data and feature analysis of the nine patients

In order to get an overview of the nine patient data sets, the data were analysed with respect to lesion size, histogram behaviour and bolus differences. The following table lists the chronic lesion sizes and demographics of the nine patients:

Table 9: Lesion sizes in the chronic state (patient 3 had an old infarction on the other hemisphere, resulting in more chronic lesion voxels, given in parentheses) and demographics of the nine patients (the patient IDs are given in parentheses)
Patient Number (ID number) | Age / years | Gender | Weight / kg | Lesion size / voxels (plus old infarction)
1 (009) | 78 | male | 75 | 2018
2 (043) | 66 | male | N/K | 809
3 (048) | 70 | male | 85 | 3433 (3689)*
4 (084) | 78 | female | 85 | 655
5 (153) | 35 | male | 100 | 9099
6 (170) | 66 | male | 90 | 1887
7 (171) | 74 | female | 60 | 2116
8 (190) | 66 | female | 86 | 267
9 (193) | 75 | male | 75 | 5880
Mean | 67.6 | − | 82 | 2907 (2936)*
SD | 13.2 | − | 12 | 2880 (2887)*

* The numbers in parentheses include both new and old lesions for patient 3, and the corresponding mean and SD across all patients.
Patient 3 (ID 048) had an old infarction on the opposite hemisphere; if this region is returned as a prediction of chronic tissue, it can be counted as correct. This patient was difficult to evaluate and the results have to be taken with caution. Nevertheless, this patient was one of the five patients (patients 3, 4, 5, 6 and 7) that gave usable results when using model-free features with SVM classification. Classification of the remaining four patients essentially favoured the non-lesion class. All nine patients could be evaluated with the model approach using logistic regression. Only two patients (patients 1 and 2) could be evaluated using the model-free approach with logistic regression analysis (see next chapter).
The acute mismatch scenario (see section 1.3.5) for each patient is evaluated in Table 10. Additionally, the numbers of lesion voxels visible in PWI and in DWI, as well as of those damaged in both PWI and DWI (DWI-PWI core), are listed.

Table 10: Number of disrupted voxels in the acute PWI, DWI and DWI-PWI core, as well as the number of voxels disrupted in DWI or PWI (DWI ≠ PWI). This information is evaluated in order to define the mismatch scenario for each patient. To allow the results to be interpreted in connection with the mismatch concept, the final stroke lesion size is also listed.
Patient No   PWI voxels   DWI voxels   DWI-PWI core voxels   DWI ≠ PWI voxels   Final infarct size / voxels   Scenario
1 (009)      572          1065         292                   1053               2018                          DWI > PWI
2 (043)      1786         1373         651                   1857               809                           PWI > DWI
3 (048)      6921         2928         2559                  4731               3433                          PWI > DWI
4 (084)      4154         181          81                    4173               655                           PWI ≠ DWI
5 (153)      8129         3509         2418                  6802               9099                          PWI > DWI
6 (170)      4198         1187         781                   3823               1887                          PWI > DWI
7 (171)      9662         616          329                   9620               2116                          PWI > DWI
8 (190)      −            217          217                   217                267                           DWI > PWI
9 (193)      2471         5213         1970                  3552               5880                          DWI > PWI
Mean         4737         1810         1033                  3981               2907                          PWI > DWI
SD           3215         1713         997                   2912               2880
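The voxel counts in Table 10 can be illustrated with a minimal sketch, assuming that the acute PWI and DWI lesions are available as flat boolean masks of equal length. The function name `mismatch_counts` and the toy masks are hypothetical and not part of the original processing. The sketch treats the "DWI ≠ PWI" column as the voxels disrupted in exactly one of the two modalities, an assumption that agrees with, for example, patient 1 (572 + 1065 − 2 · 292 = 1053).

```python
def mismatch_counts(pwi_mask, dwi_mask):
    """Count voxels per acute region from two flat boolean masks (assumed layout)."""
    if len(pwi_mask) != len(dwi_mask):
        raise ValueError("masks must have the same number of voxels")
    pairs = list(zip(pwi_mask, dwi_mask))
    return {
        "pwi": sum(1 for p, d in pairs if p),           # disrupted in PWI
        "dwi": sum(1 for p, d in pairs if d),           # disrupted in DWI
        "core": sum(1 for p, d in pairs if p and d),    # disrupted in both (core)
        "differ": sum(1 for p, d in pairs if p != d),   # disrupted in exactly one
    }


# Toy masks with six voxels (hypothetical values, not patient data).
pwi = [True, True, True, False, False, False]
dwi = [False, True, True, True, False, False]
counts = mismatch_counts(pwi, dwi)
print(counts)
```

By construction, the "differ" count equals PWI + DWI − 2 · core, which gives a quick consistency check for such a table.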
The most common scenario was PWI > DWI, which is also the most difficult scenario for stroke prediction (and the one occurring most often in general) (see section 1.3.5). Following the definition of the mismatch concept, patients 1, 8 and 9 with DWI > PWI should have salvageable tissue. As the final infarct sizes show, this concept does not seem to be valid in those patients. The PWI > DWI scenario is usually interpreted to mean that the PWI region is potentially salvageable, whereas the core remains damaged. Again, the concept does not hold in this patient subset, since most of the PWI disruption became chronic lesion as well. The fact that PWI lesion voxels became chronic is especially surprising, since tissue is said to be salvageable if it is spontaneously reperfused. This should have been the case for all patients, since they received thrombolytic treatment shortly after their arrival in the hospital, i.e. < 6 hours post stroke. The data sets best suited for successful training were those in which the final infarct size was smaller than the initial PWI deficit but greater than the initial DWI lesion (patients 3, 6, 7 and 9). These patients showed different mismatch scenarios, but had in common that the final lesion strongly depended on the acute lesions in PWI, in DWI and in their intersection (PWI AND DWI).
In Table 11, the final lesion outlined in the T2-FLAIR at time point 2 is evaluated with regard to its spatial overlap with the acute MRI measurements.

Table 11: Final lesion size (outlined in the T2-FLAIR at time point 2 and then coregistered) and spatial overlap with acute DWI, PWI, DWI-PWI core and acute T2-FLAIR. This overview gives an impression of how well the coregistration worked and to what extent the mismatch model is valid.
Patient No   % of acute DWI   % of acute PWI   % of acute DWI-PWI core   % of acute T2-FLAIR
1 (009)      76.24            68.88            90.41                     −
2 (043)      27.97            17.92            39.17                     66.41
3 (048)      79.47            42.99            85.35                     63.54
4 (084)      0.00             1.95             0.00                      0.00
5 (153)      98.75            59.16            99.04                     98.81
6 (170)      62.09            37.37            80.15                     −
7 (171)      97.08            14.62            99.39                     −
8 (190)      72.35            −                72.35                     96.08
9 (193)      82.87            90.93            95.35                     −
Mean         74.60            42.73            82.65                     81.21
SD           22.42            30.16            19.93                     18.82
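How an overlap percentage of the kind listed in Table 11 might be computed is sketched below. The sketch assumes that each value is the share of the acute region's voxels that lie inside the final lesion; this reading is an assumption, not a definition given in the text. The function name `overlap_percent` and the toy masks are hypothetical.

```python
def overlap_percent(acute_mask, final_mask):
    """Percentage of acute-region voxels that also belong to the final lesion.

    Returns None if the acute mask is empty, i.e. there is no acute region
    to compare against.
    """
    acute_total = sum(1 for a in acute_mask if a)
    if acute_total == 0:
        return None
    shared = sum(1 for a, f in zip(acute_mask, final_mask) if a and f)
    return 100.0 * shared / acute_total


# Toy masks (hypothetical): 4 acute voxels, 3 of them inside the final lesion.
acute = [1, 1, 1, 1, 0, 0]
final = [0, 1, 1, 1, 1, 0]
example_overlap = overlap_percent(acute, final)
print(example_overlap)
```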
The table makes clear that patient 4 shows no agreement between acute and chronic lesion. This data set was only usable in the model-free set-up with the SVM; logistic regression did not give any usable results for it. All other data sets had a high overlap with the acute DWI lesion, and if the lesion was old enough to be visible in the acute T2-FLAIR, an average overlap of 81 % with the final lesion was found there as well. For the so-called core, defined as the region with a disruption in both DWI and PWI, the average overlap with the final lesion was 82.7 %, whereas for the acute pw-MRI measurement it was only 42.7 %. Patient 4 was therefore omitted from most of the following evaluations as well.

Figure 4.4 shows an example of the mean perfusion time series at the original sampling rate, the same series at the increased sampling rate obtained by interpolation, the starting point of the bolus found by the illustrated difference function (in green) and the resulting bolus finally used as features.
Figure 4.4: Mean perfusion time series (in red) and the difference signal (in green, see section 4.2.3) used for defining the start point of the bolus. The blue function shows the mean perfusion time series with increased time resolution using b-spline interpolation. The black signal represents the mean resulting extracted bolus. The red circle delineates the extracted start point of the bolus.
The extracted boli of the five patients used for model-free SVM prediction are plotted in the following graph (Figure 4.5). It can be seen that the starting point was detected correctly in all cases, but that the signal intensities differ despite normalisation. Another difference is easily recognised: the duration of signal recovery and the bolus width differ. Despite these differences, the prediction of stroke outcome worked well for these five patients and is presented here.
Figure 4.5: Mean normalised boli of 5 patients (patients 3, 4, 5, 6, 7) used for model-free SVM prediction. It can be seen that the bolus starting point is the same in all cases and that the main difference lies in the signal intensities and the duration of signal recovery.
Since the signal intensities differed strongly, not only for the perfusion time series but also for the other features, histogram matching had to be performed (see sections 2.5 and 4.2.3). For the model-free approach, the histograms of the features are compared for five patients without and with histogram matching in Figure 4.6 and Figure 4.7. Only voxels inside a derived brain mask were considered for the histograms.
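The two intensity operations named here, min-max scaling and histogram matching, can be illustrated with a minimal, generic sketch. It assumes a simple quantile-mapping variant of histogram matching and is not necessarily the procedure of sections 2.5 and 4.2.3. The function names and the toy values are hypothetical.

```python
from bisect import bisect_right


def minmax_scale(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def match_histogram(source, reference):
    """Map each source value to the reference value at the same empirical quantile."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for v in source:
        # Empirical CDF position of v within the source values, in (0, 1].
        q = bisect_right(src_sorted, v) / n_src
        # Index of the corresponding quantile in the sorted reference values.
        idx = min(n_ref - 1, max(0, int(round(q * n_ref)) - 1))
        matched.append(ref_sorted[idx])
    return matched


# Toy intensities (hypothetical): the source takes over the reference
# distribution while keeping the rank order of its own voxels.
source = [30, 10, 40, 20]
reference = [0.0, 0.1, 0.5, 1.0]
matched = match_histogram(source, reference)
scaled = minmax_scale([2, 4, 6])
print(matched, scaled)
```

Quantile mapping was chosen for the sketch because it needs no binning parameters; a histogram-based implementation with a fixed number of bins would behave similarly on dense data.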
Figure 4.6: Histograms of the model-free features (without image background) of 5 patients (minmax scaling but no histogram matching): a) perfusion signal of time point 10, b) DWI, c) ADC (in arbitrary units from scanner vendor), d) b0-image and e) T2-FLAIR
Figure 4.7: Histograms of the model-free features (without image background, features with minmax scaling and histogram matching) of 5 patients: a) perfusion signal of time point 10, b) DWI, c) ADC (in arbitrary units from scanner vendor), d) b0-image and e) T2-FLAIR
Only the ADCs do not show similar histograms; they were not matched, since the ADC is a representative value of diffusion (displayed in arbitrary units from scanner vendor). The histograms of the features of all nine patients are shown in Figure 4.8 for the model approach. Here, no histogram matching had to be performed, since the histograms of the features with the most influence (dw- and pw-derived values) were already very similar.
Figure 4.8: Histograms of the normalised features for the model approach of all nine patients. a) CBF, b) CBV, c) MTT, d) b0-image, e) ADC (in arbitrary units from scanner vendor), f) DWI, g) T2-FLAIR
In order to clarify the whole pipeline of feature processing the following Figure 4.9 summarises the previous paragraphs for the model-free features and Figure 4.10 for the model-dependent features.
Figure 4.9: Flowchart of the processing pipeline of the model-free features, here it is shown for normalisation, histogram matching and scaling. The prefix “c” means coregistered, “n” means normalised, “h” stands for histogram matching and “s” for scaled.
Figure 4.10: Flowchart of the processing pipeline of the model dependent features, here it is shown for normalisation and scaling. The prefix “c” means coregistered, “n” means normalised and “s” means scaled.
4.3.3. Evaluation of the developed stroke lesion outcome prediction algorithms

On the following pages, the two methods “model” and “model-free” are compared, as well as the performance of the two classifiers, logistic regression and SVM. The test data sets were composed of the complete data sets without cropping the background. First, the model approach is discussed, followed by the model-free approach and then a comparison of the two.
Model approach

For the model approach, the eight patients mentioned (patients 1, 2, 3, 5, 6, 7, 8 and 9) were evaluated, first using the approach by Mouridsen et al. (2006). Training was performed using the LOO combined complete data sets (meaning four complete patient data sets were combined, but the one on which was tested was left out). In addition, it was tested whether the accuracy increases when only a careful selection of voxels is used for training, as suggested by Jonsdottir et al. (2009). Jonsdottir et al. suggested training with 50% healthy and 50% lesion voxels, where the healthy training examples include voxels that were damaged in the acute state but recovered. In other words, the authors suggested undersampling the negative training examples. In addition, each tissue type was represented with an equal number of voxels (GM, WM, CSF and background noise). Figure 4.11 shows the comparison of the prediction accuracies when training with the LOO combined complete data sets or with the LOO combined selected sub-data sets (results shown as examples for four patients). It can be seen that the best accuracies are achieved when taking the complete data sets. However, this only becomes clear when looking at the PR curves. The ROC curves are very similar, and in patient 1 the sub-data sets even show a slightly larger area under the curve.
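The idea of undersampling the negative training examples to a 50:50 ratio can be sketched as follows. This is a simplified illustration, assuming voxel labels of 1 (lesion) and 0 (healthy); it draws the negatives uniformly at random and does not reproduce the additional selection by tissue type or by recovered voxels described above. The function name `undersample_negatives` and the fixed seed are hypothetical choices.

```python
import random


def undersample_negatives(labels, seed=0):
    """Return sorted voxel indices: all positives plus equally many random negatives."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    return sorted(positives + kept)


# Toy labels (hypothetical): 2 lesion voxels among 10.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
selected = undersample_negatives(labels)
print(selected)
```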
Figure 4.11: Comparison of the prediction accuracy when training with the complete combined data sets (blue) and combined sub-data sets (red). Panels: patients 1, 3, 6 and 9.
One of the aims of the analysis was to find out how sensitively the prediction reacts to a change of the model for the selection of the AIF. Therefore, the second method for the calculation of the perfusion parameters, by Gall et al. (2009), was used. The comparison of the “Mouridsen-model” with the “Gall-model” using logistic regression as predictor is presented in Figures 4.12 to 4.14, again in the form of PR and ROC curves. Here, the results for all eight patients are shown.
Figure 4.12: PR curves and ROCs of three patients (patients 1 – 3), comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Figure 4.13: PR curves and ROCs of three patients (patients 5 – 7), comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Figure 4.14: PR curves and ROCs of the last two patients (patients 8 and 9), comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Table 12 presents the results of the investigation into whether the prediction performance depends on the final lesion size. For each data set, the maximum of the product of precision and recall over all acquired precision and recall pairs was determined and listed. It can be seen that once the lesion is big enough to be used in the prediction task, the accuracy does not depend strictly on the lesion size. However, data sets with bigger lesions tend to be easier to predict.

Table 12: Investigation of the dependence of prediction performance on final lesion size.
Sorting (best performing data set)   Precision · Recall   Lesion size   Patient No
1                                    0.331                3433 (3689)   3 (048)
2                                    0.220                5880          9 (193)
3                                    0.179                2116          7 (171)
4                                    0.178                1887          6 (170)
5                                    0.118                2015          1 (009)
6                                    0.107                9099          5 (153)
7                                    0.095                809           2 (043)
8                                    0.091                267           8 (190)
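The selection of the maximum product of precision and recall can be sketched as follows, assuming the PR curve is available as a list of (precision, recall) pairs. The function name `best_pr_product` and the example pairs are hypothetical and do not correspond to any patient in Table 12.

```python
def best_pr_product(pr_pairs):
    """Return (product, precision, recall) for the pair maximising precision * recall."""
    best = max(pr_pairs, key=lambda pr: pr[0] * pr[1])
    return best[0] * best[1], best[0], best[1]


# Hypothetical PR pairs along a curve, from strict to lenient cut-off.
pairs = [(0.9, 0.1), (0.6, 0.5), (0.3, 0.8), (0.1, 1.0)]
product, precision, recall = best_pr_product(pairs)
print(product, precision, recall)
```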
In Figure 4.15, the prediction results for different training data set compositions are shown for four exemplary data sets (patients 1, 2, 3 and 5) when changing the number of combined training data sets. For this, three training setups were used:
1. All 14 patient data sets, resulting in 13 patients for training (when leaving out the one on which the algorithm is tested, LOO),
2. The nine patient data sets (with the 5 patients having very small lesions excluded), resulting in 8 patients for training (when using LOO),
3. The four training data sets which gave the best prediction accuracy (patients 1, 3, 6 and 9), resulting in three patient data sets for training (when using LOO).
The combination of only 3 patients already seems to contain enough information for successful training and already yields the best possible prediction result. This result cannot be improved by adding more patients; in fact, the opposite is true if the examples are not representative. Including patients with small lesions (and therefore a higher chance of recovery) in the training data set introduces a bias towards predicting recovery (as the blue curves in Figure 4.15 indicate).
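The leave-one-out (LOO) composition of the training sets can be sketched in a few lines. The function name `leave_one_out` is hypothetical; the patient numbers in the example are those of the third setup.

```python
def leave_one_out(patient_ids):
    """Yield (training_ids, test_id) pairs, leaving each patient out once."""
    for held_out in patient_ids:
        training = [p for p in patient_ids if p != held_out]
        yield training, held_out


# Third setup: four patients, so each model is trained on three of them.
splits = list(leave_one_out([1, 3, 6, 9]))
for training, test in splits:
    print("train on", training, "test on", test)
```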
Figure 4.15: Comparison of different numbers of patients combined in the training data set, shown as examples for four patients (patients 1, 2, 3 and 5). The prediction results for the training data sets with 13 combined data sets (including the ones which did not perform) are shown in blue, 7 or 8 patients in red and 3 patients in green.
Another interesting aspect of the investigation is the consistency of the results with the mismatch scenario mentioned in sections 1.3.5 and 4.3.2. In order to determine how the prediction result depends on the dw- or pw-MRI measurement, it was evaluated how much of the finally detected lesion was present in the acute DWI or PWI (combined MTT, CBF and CBV masks) or in the region where damage was acutely present in both DWI and PWI. The results are presented in Table 13.

Although open to debate, it is necessary to give some expression of accuracy for lesion outcome prediction. In the context of these highly imbalanced data, the accuracy was determined by taking those parameters for logistic regression that gave the best product of precision and recall (PR). The detected lesion voxels were then compared with the chronic T2-FLAIR, and the detection accuracy was expressed as a percentage; these values are also listed in Table 13. Unfortunately, this accuracy number is very low, being a trade-off between precision and recall. It is clear that the accuracy, expressed in percent of the lesion outlined in the T2-FLAIR at time point 2, could be much higher when sensitivity is neglected.

Table 13: Evaluation of the detection accuracy at the best precision/recall trade-off. The dependence of the prediction result on the acute dw- and pw-MRI is also shown.
Patient No   acute DWI lesion / %   acute PWI lesion / %   acute DWI-PWI core / %   acute T2-FLAIR lesion / %   prediction accuracy final lesion / %
1            35.59                  31.64                  59.25                    −                           17.74
2            31.39                  20.77                  54.07                    100.00                      30.53
3            63.36                  32.83                  68.97                    67.71                       56.13 (53.37)*
5            71.84                  37.09                  74.81                    39.29                       39.29
6            46.08                  17.17                  62.23                    −                           34.13
7            76.62                  13.52                  93.62                    −                           45.84
8            36.41                  −                      36.41                    49.02                       26.97
9            56.22                  62.40                  69.85                    −                           50.46
Mean         52.19                  30.77                  64.90                    64.01                       37.64 (37.29)*
Std          17.40                  16.47                  16.61                    26.74                       13.08

* The number in parentheses applies if new and old lesions are combined (patient 3); for the mean, the combined value is used in the average across all patients.
It can be seen that the prediction result overlaps most with the acute DWI-PWI core (voxels disrupted in both PWI and DWI). If a disruption was present in the acute T2-FLAIR, the prediction result depends on it as well. The dependence of the predicted lesion on DWI is about twice as high as on PWI.
Model-free approach

In order to find out whether the SVM or logistic regression is the better classifier for the prediction task, both classifiers were compared using the model-free data sets (Figure 4.16).
Figure 4.16: PR and ROC curves for two exemplary patients (patients 1 and 2) comparing the two classifiers SVM (blue) and logistic regression (red). The results for the Mouridsen-model approach are also shown (green). The performance of the SVM was very poor for patient 1, whereas logistic regression shows results similar to those of the model. For the second patient, the model-free logistic regression performed better than the model and the SVM.
The logistic regression classifier could not predict any non-lesion voxels (all voxels were predicted to be 1, meaning chronic lesion) except in two patients (patient 1 and patient 2). In these two patients the prediction results are very similar to the SVM. Patient 2 showed slightly better performance with logistic regression than the SVM, but only in the range of low sensitivity.
Figures 4.17 and 4.18 show the PR and ROC curves of the five best performing patients when predicting stroke outcome using the SVM with the model-free approach (blue plots). The features were “minmax” scaled and histogram matched; the results are shown in comparison with logistic regression using the Mouridsen-model (red plots).
Figure 4.17: PR curves (left) and ROCs (right) of three patients (patients 3, 5 and 6) comparing the model-free approach using the SVM (blue) with the “Mouridsen model” approach using logistic regression (red).
Figure 4.18: PR curves (left) and ROCs (right) of two additional patients (patients 7 and 2) comparing the model-free approach using the SVM (blue) with the “Mouridsen model” approach using logistic regression (red).
For the model-free approach, a careful selection of the voxels for the training data set had to be performed, since taking the whole data set for training and also combining the data sets with LOO would result in a far too long computation time (up to a couple of days). The method of choice was the undersampling of the negative training examples. Since the SVM is very sensitive to imbalanced data sets and otherwise prefers the class with more examples (see section 2.3 and Akbani et al. 2004), undersampling also increases the prediction accuracy. With the realistic setup for training the SVM, no lesion would be predicted, or the negative class would be strongly preferred. An exemplary prediction result of the model-free SVM approach is shown for patient 7 in Figure 4.19 as a region of interest (in opaque red) overlaid on the chronic coregistered T2-FLAIR image.
Figure 4.19: Coregistered T2-FLAIR at time point 2 with the overlaid prediction result (in opaque red) of the SVM using no model (patient 7). Two exemplary slices are shown (slices 10 and 14). The final lesion is outlined with a green line.
The stroke lesion appears in lighter grey (surrounded with green line) than the surrounding tissue. The voxels coloured in red mostly lie in the stroke region, when judging visually. Some of the false positive detected lesions lie in areas of artefacts or in CSF. In comparison, the same patient is used for the illustration of the prediction result using the “Mouridsen-model” approach (Figure 4.20).
Figure 4.20: Coregistered T2-FLAIR at time point 2 with the overlaid prediction result (in opaque red) of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.19). The final lesion is outlined with a green line.
The visual inspection of Figure 4.19 and Figure 4.20 gives the same impression as the PR curves: the model approach with logistic regression shows slightly better results. Again, it can be seen that most false positive voxels lie in areas of artefacts or pulsation. For the next figure (Figure 4.21), further post-processing was performed on the same results as shown in Figure 4.20, in order to show how the prediction results could be further improved by reducing the false positive voxels. This was done on the logistic regression results, but could be performed on any results.
Figure 4.21: Further post-processed prediction results (in opaque red) overlaid on the coregistered T2-FLAIR at time point 2 of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.20). The final lesion is outlined with a green line.
The post-processing was performed as follows. The ischemic hemisphere was determined by counting the detected lesion voxels on each side of the brain and selecting the hemisphere with the most detected voxels. All detected voxels on the other side were deleted. In the case of patient 3 (which was used in this example) this would also delete those voxels on the other hemisphere which were correctly detected as lesion but belong to an old infarction. In addition, only those lesion voxels were kept which lay in a cluster of at least eight lesion voxels within a kernel of 3x9 around each lesion voxel. With this procedure, the sensitivity decreased slightly from 45.8% to 40.9%, and the specificity increased slightly from 99.5% to 99.9%. The main difference can be seen in the precision, which increased from 39.1% to 70.2%. These numbers are summarised in a confusion matrix in Table 14. For this patient, the target number of true positive voxels is 2116 (TP + FN) and the target number of true negative voxels is 325564 (FP + TN).
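The two post-processing steps can be sketched on a single 2D slice as follows. Several details are assumptions made only for illustration: the slice is a list of rows of 0/1 detections, the hemispheres are separated at the central image column, the 3x9 kernel is read as 3 rows by 9 columns centred on the voxel, and the voxel itself is included in the count. The function names are hypothetical.

```python
def keep_major_hemisphere(mask):
    """Zero out detections on the image half with fewer detected voxels."""
    mid = len(mask[0]) // 2  # assumed midline: central image column
    left = sum(sum(row[:mid]) for row in mask)
    right = sum(sum(row[mid:]) for row in mask)
    if left >= right:
        return [row[:mid] + [0] * (len(row) - mid) for row in mask]
    return [[0] * mid + row[mid:] for row in mask]


def filter_small_clusters(mask, min_count=8, half_rows=1, half_cols=4):
    """Keep a detection only if its 3x9 window holds at least min_count detections."""
    n_rows, n_cols = len(mask), len(mask[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c]:
                continue
            count = 0  # detections in the window, including the voxel itself
            for rr in range(max(0, r - half_rows), min(n_rows, r + half_rows + 1)):
                for cc in range(max(0, c - half_cols), min(n_cols, c + half_cols + 1)):
                    count += mask[rr][cc]
            if count >= min_count:
                result[r][c] = 1
    return result


# Toy slice (hypothetical): a 3x4 block on the left, one stray detection
# on the left and one on the right.
toy = [[0] * 20 for _ in range(5)]
for r in (1, 2, 3):
    for c in (2, 3, 4, 5):
        toy[r][c] = 1
toy[0][18] = 1
toy[4][9] = 1
one_side = keep_major_hemisphere(toy)
cleaned = filter_small_clusters(one_side)
print(sum(map(sum, toy)), sum(map(sum, one_side)), sum(map(sum, cleaned)))
```

In the toy slice, the first step removes the stray detection on the minor side and the second step removes the isolated detection on the major side, while the compact block is kept.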
Table 14: Confusion matrix of the further post-processed classification results using logistic regression. The numbers without post-processing are given in parentheses (the same results as in Figure 4.20 and Figure 4.21).
                           True class: lesion    True class: non-lesion
Hypothesised: lesion       TP  866 (970)         FP  367 (1509)            Precision  70.2% (39.1%)
Hypothesised: non-lesion   FN  1250 (1146)       TN  325197 (324055)
                           Sensitivity           Specificity
                           40.9% (45.8%)         99.9% (99.5%)
In order to achieve both high precision and high sensitivity, data sets with higher numbers of false positive voxels could be used as input for the post-processing instead. An example of this is shown in Figure 4.22.
Figure 4.22: Further post-processed prediction results (in opaque red) overlaid on the coregistered T2-FLAIR at time point 2 of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.19). Here the cut-off of the prediction result was chosen to have higher sensitivity but low precision (meaning many false positives) in order to show the effect of postprocessing. The final lesion is outlined with a green line.
Table 15 shows the confusion matrix with the resulting sensitivity, specificity and precision. The value in parentheses gives the original value, and the value before it the result after post-processing. Here, only those lesion voxels were kept that were located on the major hemisphere and were clustered with at least ten other lesion voxels within a 3x9 kernel.
Table 15: Confusion matrix of the classification results using logistic regression with further post-processing, using a data set with many FPs but higher TPs. The original numbers are given in parentheses (the same results as in Figure 4.22).
                           True class: lesion    True class: non-lesion
Hypothesised: lesion       TP  1261 (1440)       FP  1722 (5975)           Precision  42.3% (18.8%)
Hypothesised: non-lesion   FN  855 (676)         TN  323842 (319589)
                           Sensitivity           Specificity
                           59.6% (68.1%)         99.5% (98.2%)
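The measures reported with the confusion matrix follow directly from its four cells. As a minimal sketch, the post-processed values of Table 15 are inserted below; the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity (recall), specificity and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true lesion voxels that were found
        "specificity": tn / (tn + fp),  # share of healthy voxels classified as healthy
        "precision": tp / (tp + fp),    # share of detected voxels that are true lesion
    }


# Post-processed counts from Table 15.
metrics = confusion_metrics(tp=1261, fp=1722, fn=855, tn=323842)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts, the sketch reproduces the percentages listed in the table for the post-processed result.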
Further post-processing of the predicted lesion leads to higher precision by reducing the number of false positive lesion voxels. With such post-processing, the precision can be raised (39.1% → 42.3%) at an operating point where the sensitivity is considerably higher than at a comparable precision without post-processing (45.8% → 59.6%) (compare Table 13 and Table 15).
4.4 Discussion and Outlook

The prediction of stroke outcome using dw- and pw-MRI is a very difficult task, as the most important features are based on the perfusion measurement, which is acquired with very low resolution. In addition, the underlying physiological responses in stroke are not clear. From the perfusion-weighted MRI measurement, parameters such as CBF, CBV and MTT are typically derived on the supposition of a model (see section 1.3.4.1). Deriving the perfusion parameters depends on the determined AIF, for which several theories exist. All published theories are still under discussion in the literature, and it has been shown that when the AIF is determined by hand, the values differ depending on the operator (Mouridsen et al. 2006). Here, two methods which determine the AIF automatically were compared (Mouridsen et al. (2006) and Gall et al. (2009)). It could be shown that automatic methods also differ considerably when it comes to the prediction task. Therefore, the interest was focussed on the optimisation of the training data sets (complete or sub-data sets) and of the features given to the predictors, by applying scaling, normalisation and histogram matching or by using a different model for the extraction of the AIF. In addition, a new model-free approach was applied and tested. The intention was to find out whether information is omitted by using a model and whether this information can be gained from the data by using a model-free approach. Furthermore, two classifiers were tested as predictors in order to clarify whether the predictor or the features are responsible for the accuracy of the results. It was found that the logistic regression method is not only much faster (a couple of seconds versus 30 minutes to an hour), but also performs slightly better than the SVM. This can be explained by the inability of the SVM to deal with highly imbalanced data sets (see section 2.3). Still, when taking the optimal trade-off between precision and recall, logistic regression did not totally outperform the SVM (it was only better in the area of low sensitivity). In addition, it could not deal with the model-free data, where only two out of nine patients showed any usable prediction results. But since the results of the model-free approach using the SVM were very similar to the results of the model approach with logistic regression, this problem with logistic regression is negligible.
How to determine the optimal parameter setting for the prediction algorithms

Another difficulty in evaluating the performance of stroke outcome prediction algorithms is the question: is it more important for the clinician to have high precision, high specificity or high sensitivity? The answer can be found by examining the possible scenarios.
⇒ High sensitivity and low precision means that many true lesion voxels were found, but also a lot of false positives. The clinician might then conclude that the lesion will not recover and is perhaps already too old, with too high a risk of haemorrhage, and therefore decide against thrombolytic treatment. In this case the decision would be wrong and would indeed result in no recovery, although treatment would have helped.
⇒ The second scenario, with high precision and low sensitivity, means that some of the lesion voxels are found (remember that there are not many true positives) and most of the voxels are true negatives, but unfortunately a correspondingly large number of lesion voxels are detected as healthy. This result would give the impression that the recovery is very good, far better than it really would be. Here the clinician could decide to give thrombolytic treatment, since a very small chronic lesion is predicted, hinting that the infarction is not very old and maybe has a shallow progression. This decision could again be wrong and result in unwanted bleeding, since in reality the infarction might have been very serious.
⇒ The third scenario would be high specificity and low sensitivity. Here, the algorithm detects most voxels as healthy and almost none as lesion, a scenario similar to scenario two, resulting in a similar medical decision.

That is to say, the best decision can be made when sensitivity and precision are traded off against each other; for example, one could use the product (precision · recall), which should be as high as possible. When comparing the presented results, one should therefore look at the highest value of this product, and it can easily be seen that all methods (model and model-free, logistic regression and SVM) perform quite similarly, as this point should be close to the angle bisector between the abscissa and the ordinate. The largest difference between the algorithms lies in the precision at low sensitivity. This means that the algorithms tend to predict too many false negative lesion voxels, as they prefer the class with more examples, especially the model-dependent approach with logistic regression.
Too small lesion sizes

The results of the model approach also show that a certain minimum lesion size is necessary for increased accuracy (results were better for lesions of at least 1000 voxels), but that the accuracy does not increase further with lesion size. When the lesion is too small (smaller than 200 voxels), the prediction of stroke outcome does not work effectively. However, such patients are not necessarily in need of outcome prediction, as they have a lower risk of haemorrhage and thrombolysis is normally performed anyway (Dijkhuizen et al. 2001).
Data variability
Looking at the boli extracted from the perfusion time series of the five patients, a high variability could be seen in signal strength, bolus duration and bolus height. Some of these problems could be solved by histogram matching, but a high variability between patients remains due to different underlying kinds of stroke, varying cardiac output, additional diseases, different stroke locations, differing age and gender, and also smoking or medication.
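The idea of histogram matching mentioned above can be sketched as a simple quantile mapping: each value of a test data set is replaced by the training value that has the same cumulative rank. This is only a minimal illustration with hypothetical names and toy values; it is not claimed to be the implementation used for the results in this thesis.

```python
from bisect import bisect_right


def histogram_match(test_values, train_values):
    """Map each test value to the training value at the same cumulative rank."""
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n_test, n_train = len(test_sorted), len(train_sorted)
    matched = []
    for v in test_values:
        # Rank of v in the test data (number of test values <= v), between 1 and n_test.
        rank = bisect_right(test_sorted, v)
        # Index of the training value with the same cumulative fraction (ceiling division).
        idx = (rank * n_train + n_test - 1) // n_test - 1
        matched.append(train_sorted[idx])
    return matched


# Toy data (assumption): both sets differ strongly in their intensity scale.
train = [10, 20, 30, 40, 50]
test = [1.0, 3.0, 2.0, 5.0, 4.0]
matched = histogram_match(test, train)
print(matched)
```

After the mapping, the test values keep their order but follow the distribution of the training data, which removes global differences in signal scale while leaving patient-specific differences in ordering untouched.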
Complete or sub-data sets for training the logistic regression
In contrast to the literature, the results show that logistic regression should not be trained on a manually selected training data set (Jonsdottir et al. 2009); rather, it showed higher accuracy when the whole original data sets were used for training. One explanation why this has not come into focus in the literature before is that other publications explored only the ROC curve (and the resulting AUC), which, in the case of imbalanced data, often leads to misinterpretation of the results (see section 4.3.1). A data set is imbalanced here in the sense that there are far fewer lesion voxels than healthy ones. The results of the precision-recall curves show that the logistic regression algorithm prefers a realistic data set-up (using the complete imbalanced data set) for training.
Problem of imbalanced data sets
As mentioned before, a major difficulty of the data composition for this prediction task is that the data are highly imbalanced. It could be shown in the results section 4.3.1 that ROC curves can be very useful when the numbers of positive and negative examples in the test data set are about equal. However, in the context of lesion outcome prediction the number of healthy voxels (negative examples) is much higher than the number of voxels with infarction (positive examples), which always results in a specificity close to 1. This effect leads to a high AUC, indicating high algorithm performance, although in reality the performance could be poor. In the light of the present results, using ROC curves hinders the comparison between different stroke prediction algorithms where the data source is not identical, since in reality a set-up with equal prevalence of infarcted and healthy voxels does not exist and the ratio between them always varies. Only a few of the voxels to be classified are abnormal, with the ratio of abnormal to healthy voxels typically in the range of 1:50 to 1:100. In conclusion, in medical applications the majority of the examples are very often negative, and specificity will then always be high as long as the classifier does not predict too many positives. For this reason, ROC curves should only be plotted in addition to PR curves to evaluate predictors. In general, all three measures (precision, recall and specificity) or the entire confusion matrix should be presented if the appropriate measures for the distribution of the data set are not known. This problem leads to further difficulties: not only do classifiers strongly prefer balanced set-ups and tend to favour the larger class, but the evaluation of accuracy is also difficult, as accuracy is usually expressed as a probability in relation to random.
Random is typically set at 50% for balanced data, but this does not hold for imbalanced data and differs for each patient. For example, for a patient with 5000 lesion voxels and 322680 healthy voxels, random would lie at a detection accuracy of about 1.55% (the percentage of lesion voxels relative to non-lesion voxels) of the final lesion. The results therefore appear worse than for balanced classification problems, but in reality the performance is good, being much higher than random. As discussed before, specificity and sensitivity are normally good measures of classification performance. However, this is not the case for imbalanced data, as specificity depends on the number of true negatives. Using precision and sensitivity, in turn, only makes sense as long as the predictor is well suited to the problem, meaning that not too many false positives are predicted. It seems that the SVM is less able to deal with the problem of imbalance, and therefore its performance was less accurate than that of the logistic regression approach.
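The effect of class imbalance on the performance measures can be made concrete with a small worked example. The voxel counts are those of the example patient quoted above; the classifier behaviour (60% of the lesion voxels detected, about 2% of the healthy voxels mislabelled) is purely hypothetical and serves only to show how specificity and precision diverge.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity (recall), specificity and precision from the confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


n_lesion, n_healthy = 5000, 322680  # voxel counts from the example patient

# Hypothetical classifier (assumption, not a result of this thesis).
tp = 3000                 # 60% of the lesion voxels are detected
fn = n_lesion - tp
fp = 6454                 # about 2% of the healthy voxels are called lesion
tn = n_healthy - fp

metrics = confusion_metrics(tp, fp, fn, tn)
lesion_ratio = n_lesion / n_healthy  # lesion voxels relative to healthy voxels

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"lesion/healthy ratio: {lesion_ratio:.4f}")
```

Although only a small fraction of the healthy voxels is misclassified, these false positives outnumber the true positives, so the specificity stays close to 1 while the precision drops to roughly one third. This is why a ROC curve alone can look favourable for such data.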
Comparison of two models for deriving the AIF
Additionally, it was tested whether using a different model affects the prediction results. The “Gall-model” did not perform as well as the “Mouridsen-model”, showing that the prediction algorithm is very sensitive to the method of AIF detection. This result does not mean that one model is more correct than the other; it only means that the “Gall-model” omits information necessary for outcome prediction. Since no other published method presents a better solution for AIF detection, the “Mouridsen-model” seems best suited to this task. Another hint that the “Mouridsen-model” retains the information needed for the prediction task is that its performance is very similar to that of the model-free approach.
Agreement with mismatch scenario
The patient data were evaluated in the context of the mismatch scenarios. For both cases present, PWI > DWI and DWI > PWI, the mismatch concept is not valid when comparing the acutely outlined lesions with those outlined on the chronic T2-FLAIR scan at time point 2. Nevertheless, a rather high proportion of permanently damaged voxels was found in the DWI-PWI core and the acute T2-FLAIR lesion (both about 80%). The missing 20% could be due to coregistration errors and anatomical changes such as swelling (which subsides during healing) or the breakdown of destroyed tissue. The highest agreement of the predicted lesion is found with the DWI-PWI core (≈ 65 %) and the acute T2-FLAIR lesion (≈ 64 %), whereby the visible disruption in the acute T2-FLAIR hints that the stroke onset was more than 6 hours before the first time point. More than 50 % of the predicted lesion voxels agree with the acute DWI lesion. The agreement of the predicted lesion with the DWI, PWI, DWI-PWI core and acute T2-FLAIR lesions is thus about 20 % lower than that of the outlined chronic T2-FLAIR lesion. It is therefore peculiar that the agreement between the chronic T2-FLAIR lesion and the predicted lesion is only about 38%. Again, misregistration could be the reason for this discrepancy, as the prediction result is in the original space of the acute perfusion scan. Altogether, the accuracy numbers of the prediction results presented in this thesis have to be taken with care. The numbers are given for those parameter settings which gave the highest value of the product (precision · recall), which is the optimum for obtaining few false positives while still obtaining many true positives. This means that a better detection accuracy of up to 100 % (high sensitivity) is possible, but at the cost of a much lower precision caused by the high number of false positives.
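An agreement percentage of the kind quoted above can be computed as the share of predicted lesion voxels that also lie inside a reference lesion. The following sketch assumes this definition (overlap relative to the predicted lesion) and uses hypothetical voxel coordinates; the exact definition and the data used in this thesis may differ.

```python
def agreement(predicted, reference):
    """Fraction of predicted lesion voxels that also lie in the reference lesion."""
    predicted, reference = set(predicted), set(reference)
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


# Toy voxel coordinates (x, y, slice); purely illustrative.
predicted = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (20, 3, 7)}
acute_dwi = {(10, 12, 5), (10, 13, 5), (9, 12, 5)}

share = agreement(predicted, acute_dwi)
print(f"{100 * share:.0f} % of the predicted voxels lie in the reference lesion")
```

Because both masks must be in the same space for this comparison, any misregistration between the acute and chronic scans directly lowers the computed agreement.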
Outlook
The investigation of the different stroke outcome prediction methods with existing clinical data showed that:
a) the acquisition of the perfusion data needs further improvement, and
b) the reduction of data load by deriving perfusion parameters needs further improvement.
The comparison of the two methods for deriving the AIF showed that working on a better model has high potential and that research in this area should be continued. The “Gall-model” lacks information necessary for the prediction task. The model-free approach using the SVM did not show better results than the “Mouridsen-model”, but made clear that the current imaging methods used in stroke protocols need improvement and that especially the MRI methods for measuring brain blood perfusion need to be revolutionised. As already mentioned in the background section (1.3.4), several problems accompany the measurement of brain perfusion using a contrast agent. An explanation for the very similar performance of the model-free and model approaches could lie in the imperfect measurement of perfusion (see section 1.3.4.1 and van Osch et al. (2003)). Since the mentioned signal void is already present in the raw data, the model-free approach also lacks the information necessary for high-quality prediction. It may be that the data source itself cannot provide enough information for the prediction task, as it does not describe the tissue perfusion changes sufficiently. The sequences used for stroke imaging in clinics usually focus on being fast and on indicating the lesion location. They can give an idea of whether the patient shows high recovery potential, but in order to be usable for prediction, the spatial resolution and the signal amplitude in arteries need improvement (see section 1.3.4). Given these limitations, the success of prediction is still surprisingly good, being well above random. Therefore, a major focus of future work should be the development of fast MRI methods that contain the necessary information for prognosis.
In order to further improve the prediction performance, one could accept more false positive voxels and leave the final decision on discarding false positives to the radiologist. In medical applications it is generally more acceptable to have more false positives if the results are subsequently re-screened by medical personnel. This would be a semi-automatic solution; up to now only a fully automatic solution has been examined. In addition, one could apply further post-processing to the prediction results, as shown by way of example in Figure 4.21 and Figure 4.22. Here we first detected the brain hemisphere of the lesion and then accepted voxels as lesion only if they belonged to a cluster of a specific size. One could put further effort into the post-processing and delete voxels in areas of typical EPI-related artefacts as well as in regions typically affected by pulsation. Lesion voxels could be connected with neighbouring detected lesion voxels within a specific radius in order to avoid holes. Also, voxels too far from the main core could be omitted. Such post-processing methods would be easy to implement and could have a large effect, as could already be seen in section 4.3.3, Tables 14 and 15, and Figures 4.21 and 4.22.
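The two post-processing steps described above (restriction to the lesion hemisphere and removal of small clusters) can be sketched as follows. Several points are assumptions made only for this illustration: the lesion hemisphere is taken to be the side of a given midline that contains more predicted voxels, clusters are defined by 6-connectivity, and all names, coordinates and the size threshold are hypothetical.

```python
from collections import deque

# Face neighbours of a voxel (6-connectivity, an assumption of this sketch).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def clusters(voxels):
    """Group voxel coordinates into 6-connected clusters by breadth-first search."""
    remaining = set(voxels)
    found = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        found.append(cluster)
    return found


def postprocess(voxels, midline_x, min_cluster_size):
    """Keep only sufficiently large clusters in the hemisphere with more predicted voxels."""
    left = {v for v in voxels if v[0] < midline_x}
    right = set(voxels) - left
    hemisphere = left if len(left) >= len(right) else right
    kept = set()
    for cluster in clusters(hemisphere):
        if len(cluster) >= min_cluster_size:
            kept |= cluster
    return kept


# Toy prediction: one cluster of four voxels, one isolated voxel and two
# voxels in the opposite hemisphere (all coordinates are illustrative).
predicted = {(2, 2, 0), (2, 3, 0), (3, 3, 0), (3, 3, 1), (1, 8, 0), (12, 5, 0), (12, 6, 0)}
kept = postprocess(predicted, midline_x=8, min_cluster_size=3)
print(sorted(kept))
```

In the toy example the isolated voxel and the two voxels in the opposite hemisphere are discarded, which mirrors how such a filter removes scattered false positives while keeping the connected lesion.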
Chapter 5: Summary 111 _____________________________________________________________________________
5 Summary
The work presented in this PhD thesis applies the model-free extraction of information from functional MR data via statistical learning of data components. This has been applied in a novel fashion to the classification of high angular resolution diffusion data and to the clinical prediction of stroke outcome. The first part of the thesis concentrates on the classification of high angular resolution diffusion imaging (HARDI) in vivo data using a model-free approach. This is achieved by using a Support Vector Machine (SVM) algorithm taken from the field of supervised statistical learning. Six classes of image components are determined: grey matter, parallel neuronal fibre bundles in white matter, crossing neuronal fibre bundles in white matter, partial volume between white and grey matter, background noise and cerebrospinal fluid. The SVM requires properties derived from the data as input, the so-called feature vector, which should be rotation invariant. For our application we derive such a description from the spherical harmonic decomposition of the HARDI signal. With this information the SVM is trained in order to find the function separating the classes. The SVM is systematically tested with simulated data and then applied to six in vivo data sets. This new approach is data-driven and enables fully automatic HARDI data segmentation without a T1 MPRAGE scan or subjective expert intervention. The image could even be subdivided into microscopic substructures below image resolution by assigning the voxels of a region containing white matter to the class of parallel or crossing fibre bundles. This was demonstrated on five in vivo test data sets, giving robust results. The segmentation results could be used as a priori knowledge for increasing the performance of fibre tracking as well as for other clinical and diagnostic applications of diffusion-weighted imaging (DWI).
With this application it could be shown that data-driven approaches, despite the disadvantage of high data dimensionality, can gather more information from such data than is usually done, information which models might omit. The second part of the thesis is an example of a data-driven approach in a clinical setting. Here, data-driven features were again extracted and two classification algorithms were adapted and applied in order to predict the outcome of stroke from acutely acquired MR data. The prediction results obtained with the model-free approach were compared with those of the model approach. It could not be shown that data-driven (model-free) methods give better results. The model-free approaches showed results similar to the model-dependent approaches, except that computation time was greatly increased due to the very high data load and the computational effort of the SVM. Neither of the tested classifiers was superior to the other; each performed better for one or the other patient example or feature representation (scaling or histogram matching). Only the long computation time of the SVM makes logistic regression the better classifier for this prediction task. However, the data-driven approach revealed that the data corpus itself may lack the information necessary for robust and reliable stroke outcome prediction. Further work has to be put into the derivation of the models for determining the perfusion parameters, but most importantly into the improvement of measurement methods for brain blood perfusion. For clinicians it is important to know that pw- and dw-MRI are very useful tools for the early diagnosis of ischemic stroke. When used for stroke outcome prediction, however, the measurements are inaccurate, either due to a lack of resolution (in pw-MRI the slices are about 5 mm thick, resulting in large partial volume effects) or due to the indirect measurement of the desired parameters. For better accuracy one would not only need to measure longer, but would also need new methods for measuring brain perfusion. Regarding data-driven post-processing methods themselves, the conclusion is that they are a very useful tool for testing whether derived models omit important information and for finding out whether further improvement and research are needed in order to understand the underlying methods.
Chapter 6: Discussion and Outlook 113 _____________________________________________________________________________
6 Discussion and Outlook
Two data-driven evaluations in a pattern recognition set-up were explored within the scope of this thesis. While the first application, the classification of HARDI data into the six image classes, resulted in high classification accuracy, the task of predicting chronic stroke lesion outcome was much more cumbersome and difficult to perform. The main reason for the difference in performance is the better data description of the HARDI measurement along with the better image resolution. In addition, all image classes were well balanced, resulting in a much better differentiation ability of the SVM. There were many additional issues to be faced in the stroke outcome prediction task, including highly imbalanced classes, low-resolution pw-MRI data with low SNR, and signal void in voxels containing blood. Another difficulty lies in the high variability between patients due to motion artefacts, different stroke severity, individual blood circulation and the spatial location of the occlusions. Also, the coregistration of the chronic T2-FLAIR measurement (defined as the ground truth) with the acute pw-MRI for evaluating the results is not very accurate, due to anatomical changes resulting from tissue recovery (such as reduced swelling) and the much worse SNR and spatial resolution of the pw-MRI. All these points make the data highly variable and an accurate prediction of chronic stroke lesion outcome very difficult. Typically, in post-processing and pattern recognition, data are reduced using models and data descriptions in order to achieve lower computing times and improved accuracy. It could be shown that such a reduction does not always represent the truth and that the models are often ill-described and omit important information.
Additionally, it could be shown that the use of data-driven methods may reveal that measurements, here perfusion-weighted MRI, do not actually contain enough information for a specific task (here the prediction of stroke outcome) and should first be developed further. The presented methods revealed that the database does not contain the information necessary to give reliable and robust results. As mentioned above, the nature of the measurement itself is prone to error, but physiological variability could be another contributor to the unreliability. One could improve the dw-MRI in the acute protocol by measuring along 12 DE directions, at the cost of scan time; this might give significantly more information about the underlying damage to white matter regions. The measurement of brain perfusion needs not only better image resolution (at least the same as the dw-MRI) but also entirely new methods in order to measure the AIF accurately.
Owing to its ability to differentiate white matter structures, a future application of tissue classification using HARDI data and other image modalities could be the differentiation of various multiple sclerosis (MS) lesion types or of the progress of a lesion, possibly leading to the prediction of MS lesion evolution. Since MS is a disease highly dependent on gender and treatment, such a prediction could be performed with regard to gender, treatment and lesion type. This would combine and expand both methods (introduced in chapters 3 and 4) developed for HARDI data segmentation and stroke outcome prediction, but in the context of MS. For example, one could apply pattern recognition methods able to detect and classify different types of pathological modifications (using data-driven methods similar to those applied in chapter 3), starting with the early stage of the disease: normal-appearing GM and WM, inflammatory acute demyelinating areas and chronic neurodegeneration. Using this information, a method could be developed for predicting the disease course and the recovery potential of the affected areas using supervised learning theory (similar to chapter 4). With this, one could study gender-related differences in the development of the WM pathology and optimise the prediction algorithm to be gender specific. Since MS is a chronic inflammatory demyelinating disease of the central nervous system with a large heterogeneity in clinical course and response to therapy (Kornek and Lassmann 2003), which is not yet understood and not yet curable, non-invasive investigations and post-processing methods would be of high importance. Routine clinical brain MRI represents the only non-invasive imaging method for MS diagnosis and follow-up, but suffers from poor specificity and poor correlation with the clinical status of the patient.
Valid, reliable and sensitive MRI methods and additional MRI-based post-processing algorithms are urgently needed for the characterisation of subclinical early pathological tissue modifications in MS. These would both enhance our understanding of MS-related tissue damage and might form the basis for the prediction of disease evolution, ultimately facilitating the development of efficient therapeutic approaches.
Index i _____________________________________________________________________________
Index
Abbreviations
a            spherical harmonics coefficients
α            scaling factor of the individual feature vector or the parameter in the optimisation process of the SVM
ADC          apparent diffusion coefficient
AIF          arterial input function
b            distance of optimal hyperplane to origin
β            shifting/offset of the individual feature vector
b0-image     T2-weighted image without diffusion weighting within the diffusion-weighted MRI
C            penalty variable for SVM classification
CBF          cerebral blood flow
CBV          cerebral blood volume
CF           crossing fibre bundle
cl           linear coefficient derived from the diffusion tensor (Westin coefficient)
cp           planar coefficient derived from the diffusion tensor (Westin coefficient)
cs           spherical coefficient derived from the diffusion tensor (Westin coefficient)
CSF          cerebro-spinal fluid
CT           computer tomography
DE           diffusion-encoding
DWI          diffusion-weighted imaging, also dw-MRI
EPI          echo planar imaging
FA           fractional anisotropy
FLAIR        fluid attenuation inversion recovery
FN           false negatives
FP           false positives
GLM          general linear model
GM           gray matter
H1, H2       hyperplane 1, hyperplane 2
HARDI        high angular resolution diffusion imaging
LOO          leave-one-out
MD           mean diffusivity
minmax       data scaling method using the mean, minimum and maximum
MTT          mean transit time
MRI          magnetic resonance imaging
n.a.         not applicable
N/K          not known
P(x)         histogram/distribution of feature x
PF           parallel fibre bundle
PR           precision and recall
PWI          perfusion-weighted imaging, also pw-MRI
R1           longitudinal relaxation
R2           transversal relaxation
RBF          radial basis function
ROC          receiver operating characteristic
SD           standard deviation
SH           spherical harmonics
SNR          signal to noise ratio
SPM5, SPM8   statistical parametric mapping software version 5 or 8
stddev       data scaling method using standard deviation
SVM          support vector machine
T1           longitudinal relaxation time
T2           transversal relaxation time
T2*          transversal relaxation time of tissue
TE           echo time
TI           inversion recovery time
TN           true negatives
TP           true positives
TR           repetition time
WM           white matter
x            feature vector matrix
y            label vector
Index iii _____________________________________________________________________________
List of Figures Figure 1.1: A) Coronal cut through the brain showing in dark grey the different cell types and layers of the grey matter and in light grey the white matter, which is composed by neuronal fibre bundles (figure is modified, original figure from: www.brainmaps.org). B) An exemplary neuronal fibre is shown (modified figure, original figure from: http://en.wikipedia.org/wiki/Neuron).................................................................................... 11 Figure 1.2: A) A maximum intensity map of a brain MR angiography showing the brain arteries (example image from one of our stroke patient data sets). B) A schematic drawing is shown, which emphasises on the CSF ventricles and flow in the brain (source: http://www.trejos.com/Trejos/BrainCSF.jpg)....................................................................... 12 Figure 1.3: The Steijskal-Tanner pulsed bipolar gradient scheme. The degree of diffusion sensitisation is given by the gradient duration (δ) and strength (height) of the sensitising pulsed-gradients (G), and the time interval between the two pulsed-gradients (Δ).............. 16 Figure 1.4: Principle of the Steijskal-Tanner sequence and two in-vivo images with and without diffusion weighting. a. The two diffusion sensitive gradients enable signal attenuation if diffusion occurs along the pulsed-gradient direction. b. Two images without (left) and with (right) diffusion weighting. A strong signal reduction caused by diffusion can be seen, especially in regions containing CSF a. ................................................................................ 17 Figure 1.5: Schematic drawing of a tensor in the coordinate system x, y, z with λ1,λ2, λ3 being the eigenvalues along the eigenvectors of the tensor............................................................ 21 Figure 1.6: Resulting differing AIFs (simulation from (Kjolby et al. 2009)) depending on position of the voxel relative to an artery with a radius of 1 mm. 
The main magnetic field B0 is parallel to the z-axis. a) Four exemplary voxels located outside the artery (represented by four different colours). b) The same colours are used to illustrate the resulting AIFs (time resolved signal). The black signal represents the case without partial volume, but only blood (p = 0) and the orange plot represents the case without partial volume, but only tissue (p = 1). .......................................................................................................................................... 28 Figure 1.7: Ischemic stroke lesion divided in its affected regions: oligaemia, penumbra and core. ............................................................................................................................................... 31 Figure 2.1: SVM classification procedure using training and test data sets. ................................ 37 Figure 2.2: Example training data with two classes (black and hollow dots). The support vectors are encircled. In case of separable date the black dot defining the slack variable ξ would not exist. The case of inseparable data is illustrated by one black dot being on the wrong side of H1 and outside the margin. The distance to H1 defines the error. (figure slightly modified, original from (Burges 1998)). ............................................................................................... 39 Figure 2.3: Histogram matching. The distribution of the test data set is transformed so that it matches the cumulative histogram of the training data set (figure copied from (Molau et al. 2001)). ................................................................................................................................... 46 Figure 3.1: Procedure for labelling the training in vivo data set and training the SVM in order to classify all image contents of a second in vivo DWI data set. .............................................. 
56 Figure 3.2: Percentage voxels detected as crossings depending on SNR and angle (81 DE directions). The ground truth is zero below 30°. .................................................................. 59 Figure 3.3: SPM5 segmentation map of grey matter overlaid with the classification results of recognised grey matter in transparent red (the first test data set is shown). a) GM classified with SVM, the thalamus is encircled in green; b) Westin coefficients thresholding (0.8 < cS < 0.9), false positive CSF regions are encircled in yellow. .................................................. 60
Index iv _____________________________________________________________________________ Figure 3.4: White matter map overlaid with classified parallel and crossing fibres (the first test data set is shown). a) PF classified with SVM, the Pons is encircled in red, the blue arrow points to the fornix; b) PF classification by Westin coefficients thresholding (cl > 0.25), the Pons is encircled in red, the blue arrow points to the fornix; c) CF classified with SVM, the thalamus is encircled in green; d) CF classification by Westin coefficients thresholding (cp > 0.17), the thalamus is encircled in green...............................................................................61 Figure 3.5: CSF map overlaid with recognised CSF (the first test data set is shown). a) Classification with SVM; b) classification by Westin coefficients thresholding (cs = 0.95 – 1)............................................................................................................................................62 Figure 3.6: Mean diffusivity maps overlaid with the recognised noise (background noise and image artefacts). a) Classification with SVM; and b) combination and inversion of GM, WM and CSF SPM5 segmentation results............................................................................63 Figure 3.7: SPM segmentation results of GM (a – d) and WM (e – f) overlaid with SVM classification results in transparent red. The two multi-class comparison algorithms of the SVM are contrasted: on the left side the one-vs-rest approach is shown and on the right onevs-one: a) – b) partial volume, c) – d) GM, and e) – f) CF. ..................................................66 Figure 4.1: SPM pre-processing pipeline......................................................................................73 Figure 4.2: The ROC on the left shows the comparison of the prediction results by using the standard features and the two data subsets. 
The ROC on the right shows the comparison of the new feature setup applied to the two data subsets...........................................................81 Figure 4.3: The PR curve on the shows the comparison of the prediction results by using the standard features and the two data subsets. The PR curve on the right shows the comparison of the new feature setup applied to the two data subsets. .....................................................81 Figure 4.4: Mean perfusion time series (in red) and the difference signal (in green, see section 4.2.3) used for defining the start point of the bolus. The blue function shows the mean perfusion time series with increased time resolution using b-spline interpolation. The black signal represents the mean resulting extracted bolus. The red circle delineates the extracted start point of the bolus...........................................................................................................85 Figure 4.5: Mean normalised boli of 5 patients (patients 3, 4, 5, 6, 7) used for model-free SVM prediction. It can be seen the bolus starting point is the same in all cases and the main difference lies in signal intensities and duration of signal recovery. ....................................86 Figure 4.6: Histograms of the model-free features (without image background) of 5 patients (minmax scaling but no histogram matching): a) perfusion signal of time point 10, b) DWI, c) ADC (in arbitrary units from scanner vendor), d) b0-image and e) T2-FLAIR................87 Figure 4.7: Histograms of the model-free features (without image background, features with minmax scaling and histogram matching) of 5 patients: a) perfusion signal of time point 10, b) DWI, c) ADC (in arbitrary units from scanner vendor), d) b0-image and e) T2-FLAIR..87 Figure 4.8: Histograms of the normalised features for the model approach of all nine patients. 
a) CBF, b) CBV, c) MTT, d) b0-image, e) ADC (in arbitrary units from scanner vendor), f) DWI, g) T2-FLAIR................................................................................................................88 Figure 4.9: Flowchart of the processing pipeline of the model-free features, here it is shown for normalisation, histogram matching and scaling. The prefix “c” means coregistered, “n” means normalised, “h” stands for histogram matching and “s” for scaled. ..........................89 Figure 4.10: Flowchart of the processing pipeline of the model dependent features, here it is shown for normalisation and scaling. The prefix “c” means coregistered, “n” means normalised and “s” means scaled. .........................................................................................89 Figure 4.11: Comparison of the prediction accuracy when training with the complete combined data sets (blue) and combined sub-data sets (red).................................................................91
Figure 4.12: PR curves and ROCs of three patients (patients 1 – 3), comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Figure 4.13: PR curves and ROCs of three patients (patients 5 – 7), comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Figure 4.14: PR curves and ROCs of the last two patients (patients 8 and 9) comparing the “Gall-model” (in blue) with the “Mouridsen-model” (in red).
Figure 4.15: Comparison of changed numbers of patients combined in the training data set, shown as an example for four patients (patients 1, 2, 3, 5). The prediction results for the training data sets with 13 combined data sets (also the ones which did not perform) are shown in blue, 7 or 8 patients in red and 3 patients in green.
Figure 4.16: PR and ROC curves for two exemplary patients (patients 1 and 2) comparing the two classifiers SVM (blue) and logistic regression (red). The results for the Mouridsen-model approach are also shown (green). The performance of the SVM was very poor for patient 1, but the logistic regression shows results similar to those of the model. For the second patient the model-free logistic regression performed better than the model and the SVM.
Figure 4.17: PR curves (left) and ROCs (right) of three patients comparing model-free using the SVM (blue) with the “Mouridsen model” approach using logistic regression (red).
Figure 4.18: PR curves (left) and ROCs (right) of additional two patients comparing model-free using the SVM (blue) with the “Mouridsen model” approach using logistic regression (red).
Figure 4.19: Coregistered T2-FLAIR at time point 2 with the overlaid prediction result (in opaque red) of the SVM using no model (patient 7). Two exemplary slices are shown (slices 10 and 14). The final lesion is outlined with a green line.
Figure 4.20: Coregistered T2-FLAIR at time point 2 with the overlaid prediction result (in opaque red) of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.19). The final lesion is outlined with a green line.
Figure 4.21: Further post-processed prediction results (in opaque red) overlaid on the coregistered T2-FLAIR at time point 2 of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.20). The final lesion is outlined with a green line.
Figure 4.22: Further post-processed prediction results (in opaque red) overlaid on the coregistered T2-FLAIR at time point 2 of the logistic regression using the “Mouridsen-model” (patient 7, same slices as in Figure 4.19). Here the cut-off of the prediction result was chosen to have higher sensitivity but low precision (meaning many false positives) in order to show the effect of post-processing. The final lesion is outlined with a green line.
List of Tables
Table 1: DWI/PWI mismatch and its interpretation.
Table 2: List of typical values derived from multi-modal acute stroke protocol for several tissue types of healthy and ischemic voxels. Values were found in several publications (Rohl et al. 2001; Rose et al. 2001).
Table 3: Parameter combinations for each tissue class. Each range in FA is divided evenly into 8 bins. Relative weights are divided evenly into 2 bins for PF and 5 bins for CF. For PF the angle range is divided evenly into 5 bins and for CF into 13 bins.
Table 4: All tried SVM settings such as kernel function
Table 5: The segmentation accuracy in percentage (true positive results out of 520 possible (sensitivity), false positive results out of 1040 possible (1-specificity)) of the SVM classification and thresholding of the linear (cl), planar (cp) and spherical (cs) coefficients of the simulated data with all possible DE directions. The accuracy is shown as a function of SNR.
Table 6: Comparison of sensitivity and specificity results in percentage (SVM classification and the linear, planar and spherical coefficients thresholding) of all data sets (81, 61 and 31 DE directions) in relation to T1-image segmentation using SPM5. The results of the five test data sets were averaged.
Table 7: The basis of the derived initial signal S0 for the normalisation of each feature is shown for model and model-free.
Table 8: Confusion Matrix
Table 9: Lesion sizes of the chronic state (patient 3 had an old infarction on the other hemisphere resulting in more chronic lesion voxels written in parenthesis) and demographics of the nine patients (the patient IDs are written in parenthesis).
Table 10: Number of disrupted voxels in the acute PWI, DWI and DWI-PWI core, as well as those voxels disrupted in DWI or PWI. This information is evaluated in order to define the mismatch scenario for each patient. In order to be able to interpret the results in connection with the mismatch concept, the final stroke lesion size is also listed.
Table 11: Final lesion size (outlined in the T2-FLAIR at time point 2 and then coregistered) and spatial overlap with acute DWI, PWI, DWI-PWI core and acute T2-FLAIR. This overview gives an impression of how well coregistration worked and how valid the mismatch model is.
Table 12: Investigation of dependence of prediction performance on final lesion size.
Table 13: Evaluation of the detection accuracy at the best precision/recall trade-off. Also the dependence of the prediction result on the acute dw- and pw-MRI is shown.
Table 14: Confusion matrix of further post-processed classification results using logistic regression. The numbers without post-processing are written in parenthesis (the same results as in Figure 4.20 and Figure 4.21).
Table 15: Confusion matrix of the classification results using logistic regression with further post-processing using a data set with many FPs but higher TPs. The original numbers are written in parenthesis (the same results as in Figure 4.22).
Cited Literature
Ababneh, Z., H. Beloeil, et al. (2005). "Biexponential parameterization of diffusion and T2 relaxation decay curves in a rat muscle edema model: decay curve components and water compartments." Magn Reson Med 54(3): 524-31. Akbani, R., S. Kwek, et al. (2004). Applying support vector machines to imbalanced datasets. Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, Springer Berlin / Heidelberg. Alexander, D. C. (2005). "Multiple-Fiber Reconstruction Algorithms for Diffusion MRI." Annals of New York Academy of Sciences 1064: 113-133. Alexander, D. C., G. J. Barker, et al. (2002). "Detection and modeling of non-Gaussian apparent diffusion coefficient profiles in human brain data." Magnetic Resonance in Medicine 48(2): 331-40. Arfken, G. B. and H. J. Weber (1985). "Spherical Harmonics" and "Integrals of the Products of Three Spherical Harmonics." Mathematical Methods For Physicists. Orlando, USA, Elsevier Publishing Company, Academic Press: 680-685 and 698-700. Back, T. (1998). "Pathophysiology of the ischemic penumbra--revision of a concept." Cell Mol Neurobiol 18(6): 621-38. Baron, J. C. (1999). "Mapping the ischaemic penumbra with PET: implications for acute stroke treatment." Cerebrovasc Dis 9(4): 193-201. Basser, P. J., J. Mattiello, et al. (1994). "MR diffusion tensor spectroscopy and imaging." Biophysical Journal 66(1): 259-67. Behrens, T. E. J., H. J. Berg, et al. (2007). "Probabilistic diffusion tractography with multiple fibre orientations: What can we gain?" NeuroImage 34: 144-155. Belliveau, J. W., D. N. Kennedy, Jr., et al. (1991). "Functional mapping of the human visual cortex by magnetic resonance imaging." Science 254(5032): 716-9. Bernstein, M., K. King, et al. (2004). Handbook of MRI Pulse Sequences, Elsevier Academic Press. Bradley, A. P. (1997). "The use of the area under the ROC curve in the evaluation of machine learning algorithms." Pattern Recognition 30(7): 1145-1159. Burges, C. J. C. (1998).
A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery. Boston, USA, Kluwer Academic Publishers. 2: 121-167. Calamante, F. (2005). Artifacts and pitfalls in perfusion MR imaging. Clinical MR Neuroimaging, Diffusion, Perfusion and Spectroscopy. J. H. Gillard, A. D. Waldman and P. B. Barker. Cambridge, Cambridge University Press: 141 - 160. Chang, C.-C. and C.-J. Lin. (2001). "LIBSVM : a library for support vector machines." from http://www.csie.ntu.edu.tw/~cjlin/libsvm. Christensen, S., K. Mouridsen, et al. (2009). "Comparison of 10 perfusion MRI parameters in 97 sub-6-hour stroke patients using voxel-based receiver operating characteristics analysis." Stroke 40(6): 2055-61. Clark, C. A. and D. Le Bihan (2000). "Water diffusion compartmentation and anisotropy at high b values in the human brain." Magn Reson Med 44(6): 852-9. Cohen, Y. and Y. Assaf (2002). "High b-value q-space analyzed diffusion-weighted MRS and MRI in neuronal tissues - a technical review." NMR in Biomedicine 15(7-8): 516-542. Conturo, T. E., R. C. McKinstry, et al. (1996). "Encoding of anisotropic diffusion with tetrahedral gradients: a general mathematical diffusion formalism and experimental results." Magn Reson Med 35(3): 399-412. Cortes, C. and V. Vapnik (1995). "Support Vector Networks." Machine Learning 20: 273-297. Cristianini, N. and S.-T. John (2000). An introduction to support Vector Machines: and other kernel-based learning methods, Cambridge University Press. Descoteaux, M., E. Angelino, et al. (2006). "Apparent Diffusion Coefficients from High Angular Resolution Diffusion Imaging: Estimation and Applications." Magnetic Resonance in Medicine 56: 395-410. Dijkhuizen, R. M., M. Asahi, et al. (2001). "Delayed rt-PA treatment in a rat embolic stroke model: diagnosis and prognosis of ischemic injury and hemorrhagic transformation with magnetic resonance imaging." J Cereb Blood Flow Metab 21(8): 964-71. Edvinsson, L., E. MacKenzie, et al. (1993). 
Cerebral blood flow and metabolism. New York, Raven. Einstein, A. (1956). Investigations on the theory of the Brownian movement. [New York], Dover Publications. Fisher, M., J. W. Prichard, et al. (1995). "New magnetic resonance techniques for acute ischemic stroke." Jama 274(11): 908-11. Frank, L. R. (2002). "Characterization of Anisotropy in High Angular Resolution Diffusion-Weighted MRI." Magnetic Resonance in Medicine 47: 1083–1099.
Friston, K., J. Ashburner, et al. (2009). "Statistical Parametric Mapping." 2009, from http://www.fil.ion.ucl.ac.uk/spm/. Gall, P., I. Mader, et al. (2009). "Extraction of the first bolus passage in dynamic susceptibility contrast perfusion measurements." Magma 22(4): 241-9. Garcia, J. H. (1984). "Experimental ischemic stroke: a review." Stroke 15(1): 5-14. Gottrup, C., K. Thomsen, et al. (2005). "Applying instance-based techniques to prediction of final outcome in acute stroke." Artificial Intelligence in Medicine 33(3): 223-236. Gualtieri, J. A. and R. F. Cromp (1999). Support Vector Machines for Hyperspectral Remote Sensing Classification. SPIE. Haacke, E. M., R. W. Brown, et al. (1999). Magnetic Resonance Imaging. Physical Principles and Sequence Design. New York, Wiley-Liss (John Wiley & Sons). Hagmann, P., L. Jonasson, et al. (2006). "Understanding diffusion MR imaging techniques: from scalar diffusion-weighted imaging to diffusion tensor imaging and beyond." Radiographics 26 Suppl 1: S205-23. Hasan, K. M. and P. A. Narayana (2006). "Retrospective Measurement of the Diffusion Tensor Eigenvalues From Diffusion Anisotropy and Mean Diffusivity in DTI." Magnetic Resonance in Medicine 56: 130-137. Jones, D. K., M. A. Horsfield, et al. (1999). "Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging." Magnetic Resonance in Medicine 42(3): 515-525. Jonsdottir, K. Y., L. Ostergaard, et al. (2009). "Predicting tissue outcome from acute stroke magnetic resonance imaging: improving model performance by optimal sampling of training data." Stroke 40(9): 3006-11. Kiselev, V. G. (2001). "On the theoretical basis of perfusion measurements by dynamic susceptibility contrast MRI." Magn Reson Med 46(6): 1113-22. Kiselev, V. G. and K. A. Il'yasov (2007). "Is the "biexponential diffusion" biexponential?" Magn Reson Med 57(3): 464-9. Kjolby, B. F., I. K. Mikkelsen, et al. (2009).
"Analysis of partial volume effects on arterial input functions using gradient echo: a simulation study." Magn Reson Med 61(6): 1300-9. Kornek, B. and H. Lassmann (2003). "Neuropathology of multiple sclerosis-new concepts." Brain Res Bull 61(3): 321-6. Kreher, B. W., I. Mader, et al. (2008). "Gibbs tracking: A novel approach for the reconstruction of neuronal pathways." Magnetic Resonance in Medicine 60(4): 953-963. Kreher, B. W., J. F. Schneider, et al. (2005). "Multitensor Approach for Analysis and Tracking of Complex Fiber Configurations." Magnetic Resonance in Medicine 54: 1216-1225. Landgrebe, T. C. W., P. Paclik, et al. (2006). Precision-recall operating characteristic (P-ROC) curves in imprecise environments. Proceedings of the 18th International Conference on Pattern Recognition - Volume 04, IEEE Computer Society. Le Bihan, D., E. Breton, et al. (1986). "MR imaging of intravoxel incoherent motions: application to diffusion and perfusion in neurologic disorders." Radiology 161(2): 401-7. Lee, J. H. and C. S. Springer, Jr. (2003). "Effects of equilibrium exchange on diffusion-weighted NMR signals: the diffusigraphic "shutter-speed"." Magn Reson Med 49(3): 450-8. Liang, Z.-P. and P. Lauterbur (2000). Principles of Magnetic Resonance Imaging, IEEE Press. Maier, S. E., S. Vajapeyam, et al. (2004). "Biexponential diffusion tensor analysis of human brain diffusion data." Magn Reson Med 51(2): 321-30. Molau, S., M. Pitz, et al. (2001). Histogram Based Normalization In The Acoustic Feature Space. Automatic Speech Recognition and Understanding Workshop, Proc.ASRU2001, Madonna di Campiglio. Mouridsen, K., S. Christensen, et al. (2006). "Automatic Selection of Arterial Input Function Using Cluster Analysis." Magnetic Resonance in Medicine 55: 524–531. Nattkemper, T. W. (2004). "Multivariate Image Analysis in Biomedicine - A Methodological Review." Journal of Biomedical Informatics 37(5): 380 - 391. Neil, J. J. (1997).
"Measurement of water motion (apparent diffusion) in biological systems." Magn. Reson.: Educ. J. 9(6): 385-401. Niendorf, T., R. M. Dijkhuizen, et al. (1996). "Biexponential diffusion attenuation in various states of brain tissue: implications for diffusion-weighted imaging." Magn Reson Med 36(6): 847-57. Nieuwenhuys, R., J. Voogd, et al. (2008). The Human Central Nervous System. Berlin Heidelberg New York, Springer. Østergaard, L. (2005). Cerebral perfusion imaging by exogenous contrast agent. Clinical MR Neuroimaging, Diffusion, Perfusion and Spectroscopy. J. H. Gillard, A. D. Waldman and P. B. Barker. Cambridge, Cambridge University Press: 109 - 118.
Østergaard, L., J. Baron, et al. (2006). Integrating Information from Molecule to Man: Knowledge Discovery Accelerates Drug Development and Personalized Treatment in Acute Stroke. Ostergaard, L., K. Y. Jonsdottir, et al. (2009). "Predicting tissue outcome in stroke: new approaches." Curr Opin Neurol 22(1): 54-9. Pfeuffer, J., W. Dreher, et al. (1998). "Water signal attenuation in diffusion-weighted 1H NMR experiments during cerebral ischemia: influence of intracellular restrictions, extracellular tortuosity, and exchange." Magnetic Resonance Imaging 16(9): 1023-1032. Quddus, A., P. Fieguth, et al. (2005). Adaboost and Support Vector Machines for White Matter Lesion Segmentation in MR Images. IEEE Engineering in Medicine and Biology, Shanghai, China. Reiche, D., M. Bindig, et al. (2003, July 2003). "Roche Lexikon, Medizin." 5th edition. Retrieved 15.10.2009, from http://www.roche.de/lexikon/index.htm?userInput=Suche%20im%20Roche%20Lexikon&loc=www.roche.de. Rohl, L., L. Ostergaard, et al. (2001). "Viability thresholds of ischemic penumbra of hyperacute stroke defined by perfusion-weighted MRI and apparent diffusion coefficient." Stroke 32(5): 1140-6. Ronneberger, O. (2004). "libSVMtl." Albert-Ludwigs University Freiburg, Germany. from http://lmb.informatik.uni-freiburg.de/lmbsoft/libsvmtl/download.en.html. Rose, S. E., J. B. Chalk, et al. (2001). "MRI based diffusion and perfusion predictive model to estimate stroke evolution." Magn Reson Imaging 19(8): 1043-53. Rosen, B. R., J. W. Belliveau, et al. (1991). "Susceptibility contrast imaging of cerebral blood volume: human experience." Magn Reson Med 22(2): 293-9; discussion 300-3. Rosen, B. R., J. W. Belliveau, et al. (1991). "Contrast agents and cerebral hemodynamics." Magn Reson Med 19(2): 285-92. Schnell, S., D. Saur, et al. (2009). "Fully automated classification of HARDI in vivo data using a support vector machine."
NeuroImage 46(3): 642-651. Schölkopf, B. and A. J. Smola (2002). Learning with Kernels. Cambridge, Massachusetts (USA), MIT Press. Schwarcz, A., P. Bogner, et al. (2004). "The existence of biexponential signal decay in magnetic resonance diffusion-weighted imaging appears to be independent of compartmentalization." Magn Reson Med 51(2): 278-85. Siemonsen, S., T. Fitting, et al. (2008). "T2' imaging predicts infarct growth beyond the acute diffusion-weighted imaging lesion in acute stroke." Radiology 248(3): 979-86. Simonsen, C. Z., L. Ostergaard, et al. (1999). "CBF and CBV measurements by USPIO bolus tracking: reproducibility and comparison with Gd-based values." J Magn Reson Imaging 9(2): 342-7. Skare, S., M. Hedehus, et al. (2000). "Condition Number as a Measure of Noise Performance of Diffusion Tensor Data Acquisition Schemes with MRI." Journal of Magnetic Resonance 147(2): 340-352. Sotak, C. H. (2002). "The role of diffusion tensor imaging in the evaluation of ischemic brain injury - a review." NMR Biomed 15(7-8): 561-9. SPM5. (2005). "Functional Imaging Laboratories, Methods Group." from http://www.fil.ion.ucl.ac.uk/spm/. Srinivasan, A., M. Goyal, et al. (2006). "State-of-the-art imaging of acute stroke." Radiographics 26 Suppl 1: S75-95. Stejskal, E. O. and J. E. Tanner (1965). "Spin Diffusion Measurements: Spin Echoes in the Presence of a Time-Dependent Field Gradient." The Journal of Chemical Physics 42(1): 288-292. Stewart, G. N. (1893). "Researches on the Circulation Time in Organs and on the Influences which affect it." The Journal of Physiology 15(1-2): 1-89. The MathWorks, I. (2007). MATLAB. Natick, MA (USA). Thomalla, G., C. Schwark, et al. (2006). "Outcome and symptomatic bleeding complications of intravenous thrombolysis within 6 hours in MRI-selected stroke patients: comparison of a German multicenter study with the pooled data of ATLANTIS, ECASS, and NINDS tPA trials." Stroke 37(3): 852-8. Tuch, D. S., R. M. Weisskoff, et al. (1999).
High Angular Resolution Diffusion Imaging of the Human Brain. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Philadelphia, USA. van Osch, M. J., E. J. Vonken, et al. (2003). "Measuring the arterial input function with gradient echo sequences." Magn Reson Med 49(6): 1067-76. Veropoulos, K., C. Campbell, et al. (1999). Controlling the Sensitivity of Support Vector Machines. Proceedings of the International Joint Conference on AI. Vlaardingerbroek, M. T. and J. A. den Boer (2003). Magnetization Preparation, a T1 Preparation Pulse: Inversion Recovery. Magnetic Resonance Imaging, Theory and Practice. Berlin, Springer-Verlag Berlin: 89.
Vlaardingerbroek, M. T. and J. A. den Boer (2003). System Architecture. Magnetic Resonance Imaging, Theory and Practice. Berlin, Springer-Verlag Berlin: 21-51. Webster, A. and G. Szegö (1930). Leipzig and Berlin, Germany, Teubner. Westin, C.-F., S. Peled, et al. (1997). Geometrical diffusion measures for MRI from tensor basis analysis. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Vancouver, Canada. Wu, G. and E. Y. Chang (2003). Class-Boundary Alignment for Imbalanced Dataset Learning. In ICML 2003 Workshop on Learning from Imbalanced Data Sets II, Washington, DC (USA). Wu, O., S. Christensen, et al. (2006). "Characterizing physiological heterogeneity of infarction risk in acute human ischaemic stroke using MRI." Brain 129(Pt 9): 2384-93. Wu, O., W. J. Koroshetz, et al. (2001). "Predicting tissue outcome in acute human cerebral ischemia using combined diffusion- and perfusion-weighted MR imaging." Stroke 32(4): 933-42. Yeo, B. T. T. (2005). "Computing Spherical Transform and Convolution on the 2-Sphere." 1-8. Yoneda, Y., K. Tokui, et al. (1999). "Diffusion-weighted magnetic resonance imaging: detection of ischemic injury 39 minutes after onset in a stroke patient." Ann Neurol 45(6): 794-7. Zaitsev, M., J. Hennig, et al. (2006). Geometric Distortions Applied to Diffusion Tensor Imaging. International Annual Meeting of the International Society for Magnetic Resonance in Medicine, Seattle, USA. Zierler, K. (1962). "Theoretical basis of indicator-dilution methods for measuring flow and volume." Circ Res 10: 393 - 407. Zweig, M. H. and G. Campbell (1993). "Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine [published erratum appears in Clin Chem 1993 Aug;39(8):1589]." Clin Chem 39(4): 561-577.
Own Publications
Journal Papers
Schnell S, Kiselev VG, Umarova R, Saur D, Klærke Mikkelsen I, Ribe L, Mouridsen K, Ostergaard L, Nighoghossian N, Cho TH, Burkhardt H, and Reisert M. What can we gain from perfusion and diffusion MRI for the task of stroke outcome prediction?, in preparation
Vry M-S, Saur D, Schnell S, Umarova R, Glauche V, Rijntjes M, Hamzei F, and Weiller C, Decomposing motor cognition: evidence for a ventral fiber pathway in motor imagery, in preparation
Suchan J, Umarova R, Schnell S, Himmelbach M, Weiller C, Karnath H, and Saur D. Fiber pathways connecting cortical areas relevant for spatial orienting and exploration, submitted to NeuroImage, March 2010
Harsan LA, Paul D, Schnell S, Kreher BW, Hennig J, Staiger JF, and v. Elverfeldt D. In Vivo Diffusion Tensor Magnetic Resonance Imaging and Fiber Tracking of the Mouse Brain. NMR Biomed. 2010 Mar 8. [Epub ahead of print]
Rüsch N, Bracht T, Kreher BW, Schnell S, Glauche V, Il'yasov KA, Ebert D, Lieb K, Hennig J, Saur D, van Elst LT. Reduced interhemispheric structural connectivity between anterior cingulate cortices in borderline personality disorder, Psychiatry Res. 2010 Feb 28;181(2):151-4. Epub 2010 Jan 15
Saur D, Schelter B, Schnell S, Kratochvil D, Kupper H, Kellmeyer P, Kummerer D, Kloppel S, Glauche V, Lange R, Mader W, Feess D, Timmer J, Weiller C, Combining functional and anatomical connectivity reveals brain networks for auditory language comprehension. [epub] Neuroimage, 2009.
Bracht T, Tuscher O, Schnell S, Kreher BW, Rusch N, Glauche V, Lieb K, Ebert D, Il'yasov K.A., Hennig J, Weiller C, van Elst LT, Saur D, Extraction of prefronto-amygdalar pathways by combining probability maps. Psychiatry Res, 2009. 174(3): p. 217-22.
Schnell S, Saur D, Kreher BW, Hennig J, Burkhardt H, Kiselev VG. Fully automated classification of HARDI in vivo data using a support vector machine. NeuroImage 46(3): 642-651 (2009)
Umarova RM, Saur D, Schnell S, Kaller CP, Vry M-S, Glauche V, Rijntjes M, Hennig J, Kiselev VG, Weiller C, Structural Connectivity for Visuospatial Attention: Significance of Ventral Pathways, Cerebral Cortex (2009)
Saur D, Kreher BW, Schnell S, Kümmerer D, Kellmeyer P, Vry MS, Umarova R, Musso M, Glauche V, Abel S, Huber W, Rijntjes M, Hennig J, Weiller C., Ventral and dorsal pathways for language. Proc Natl Acad Sci U S A 105(46): 18035-40 (2008)
Kreher BW, Schnell S, Mader I, Il'yasov K, Hennig J, Kiselev VG, Saur D, Connecting and merging fibres: Pathway extraction by combining probability maps. NeuroImage 43(1): 81-89 (2008)
Framme C, Alt C, Schnell S, Sherwood M, Brinkmann R, Lin CP, Selective Targeting of the Retinal Pigment Epithelium in Rabbit Eyes with a Scanning Laser Beam, IOVS, Vol. 48, No. 4 (April 2007)
Alt C, Framme C, Schnell S, Lee H, Brinkmann R, Lin CP, Selective targeting of the retinal pigment epithelium using an acousto-optic laser scanner, Journal of Biomedical Optics 10(6), 064014 (November/December 2005)
Framme C, Alt C, Schnell S, Brinkmann R, Lin CP, Selective RPE laser treatment with a scanned cw laser beam in rabbits, Ophthalmologe. May;102(5):491-6 (2005)
Conference Abstracts
Schnell S, McMahon K, Heath S, Van Hees S, Holmes A, De Zubicaray G, and Copland D. Tracking the arcuate fasciculus in patients with aphasia. Submitted to Academy of Aphasia 2010
Schnell S, Mader I, Saur D, Mouridsen K, Umarova R, Kümmerer D, Burkhardt H, and Kiselev VG. Estimation of Tissue at Risk of Infarction using a Support Vector Machine on multimodal Stroke MRI Data. Proceedings of the 17th Annual Meeting ISMRM 2009, Hawaii/USA
Schnell S, Kreher BW, Hennig J, Burkhardt H, Kiselev VG. Automatic Classification of Human Brain Constituents including Crossing Fibres using HARDI and a Support Vector Machine. Proceedings of the 14th Annual Meeting of the OHBM 2008, Melbourne/Australia
Umarova R, Saur D, Schnell S, Kreher BW, Vry M, Glauche V, Weiller C. Pathways for visual-spatial attention. Proceedings of the 14th Annual Meeting of the OHBM 2008, Melbourne/Australia
Schnell S, Kreher BW, Hennig J, Burkhardt H, Kiselev VG. Recognition of grey matter and parallel versus crossing fibre bundles within white matter using HARDI data and a supervised learning algorithm. Proceedings of the 16th Annual Meeting ISMRM 2008, Toronto/Canada, #568
Saur D, Schnell S, Kreher B, Kuepper H, Kuemmerer D, Abel S, Umarova R, Weiller C. fMRI-guided tractography of language processing streams in the healthy brain: Relevance to recovery of aphasia. Annual meeting of the OHBM in Chicago, Neuroimage 2007, p 53
Schnell S, Kreher BW, Hennig J, Il'yasov KA. Investigation of the impact of noise on standard fibre tracking algorithms. Proceedings of the Joint Annual Meeting ISMRM-ESMRMB 2007, Berlin/Germany, #1561
Kreher BW, Schnell S, Hennig J, Il'yasov KA. DTI&FiberTools: DTI Calculation, Fiber-Tracking, and Combined Evaluation Consolidated in a Complete Toolbox. Proceedings of the 12th Annual Meeting of the Organization for Human Brain Mapping, Florence, Italy, 2007
Alt C, Framme C, Schnell S, Schuele G, Brinkmann R, Lin CP. In vivo and in vitro selective targeting of the retinal pigment epithelium using a laser-scanning device. 12th Conference on Ophthalmic Technologies, JAN 19-20, 2002 San Jose, CA. Ophthalmic Technologies XII, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Volume: 4611, Pages: 59-63, 2002
Lin CP, Alt C, Schnell S, Framme C, Schuele G, Brinkmann R. Selective targeting of the retinal pigment epithelium in vivo using a laser scanner. Annual Meeting of the Association-for-Research-in-Vision-and-Ophthalmology, MAY 05-10, 2002 FT Lauderdale, Florida, Investigative Ophthalmology & Visual Science, Volume: 43, Pages: U595-U595, Suppl. 1, Meeting Abstract: 2533, MAY 2002
Acknowledgements
I would like to extend my gratitude to my supervisors Prof. Dr. Jürgen Hennig and Dr. Valerij Kiselev for giving me the opportunity to work on my thesis at the Medical Physics department of the University Hospital in Freiburg. To my first thesis advisor Prof. Dr. Hans Burkhardt, who fully supported all my intentions, I offer my lasting appreciation. For his intellectual assistance and general support I express a warm thank you to my second thesis advisor PD Dr. Michael Markl. I wish to express my special thanks to Dr. Marco Reisert for his support at the end of my thesis regarding statistical learning methods and for his great help in taking over many ongoing side projects that were not part of my thesis. I owe special gratitude to Dr. Björn Kreher for his advice on many delicate issues. A very warm thank you goes to my colleagues and friends Dr. Jochen Leupold and Dr. Katie McMahon from the Centre of Advanced Imaging at the University of Queensland in Brisbane (Australia) for their very valuable last-minute help and fruitful discussions. I would like to extend particular gratitude to Ramona Lorenz, Dr. Laura Harsan, Jelena Bock, and Stefanie Schwenk for their great support and considerable help. They always motivated me and gave me great moral support during difficult periods, which gave me the necessary staying power to finish the thesis. I also wish to thank the whole Medical Physics department for the warm welcome and the great atmosphere I enjoyed during my stay, especially Laurence Haller. This work is dedicated to my closest family and friends, who helped me manage all the problems and overcome the difficulties encountered while working on this thesis.