A STUDY OF ITERATIVE METHODS IN IMAGE RECONSTRUCTION AND APPLICATIONS

Jointly supervised Ph.D. Thesis in Mathematics
Dottorato di Ricerca in Matematica e Applicazioni
Dipartimento di Matematica, Università degli Studi di Genova, via Dodecaneso 35, 16146 Genova, Italy, http://www.dima.unige.it/
Doctorat de Recherche en Mathématiques et Applications
Laboratoire A. H. Fizeau UMR 6525, Université de Nice Sophia-Antipolis, Parc Valrose 06108, Nice cedex 02, France, http://fizeau.oca.eu/

Submitted by Federico Benvenuto
[email protected] [email protected]
Date of submission: December 2009

Advisor: Prof. Mario Bertero, DISI, Università degli Studi di Genova, via Dodecaneso 35, 16146 Genova, http://www.disi.unige.it, [email protected]
Co-Advisor: André Ferrari, Laboratoire A. H. Fizeau UMR 6525, Université de Nice Sophia-Antipolis, Parc Valrose 06108, Nice cedex 02, http://fizeau.oca.eu, [email protected]
Ext. Reviewers:
Christine De Mol, Department of Mathematics, Université Libre de Bruxelles, Campus Plaine CP 217, Boulevard du Triomphe, 1050 Bruxelles, http://www.ulb.ac.be, [email protected]
Jean-François Giovannelli, Equipe Signal et Image, LAPS / IMS, 351 Cours de la Libération, 33405 Talence cedex, http://www.ims-bordeaux.fr, [email protected]
Acknowledgements

I would like to express my gratitude to my supervisor Prof. Mario Bertero for giving me the opportunity to work in a very interesting area, and for his support and guidance throughout my graduate studies at the Department of Computer and Information Science. I am grateful to the whole staff of the Signal Processing Group at the Fizeau Laboratory at Nice University, especially to my co-advisor Prof. André Ferrari, Celine Theys and Marcel Carbillet, for their helpful scientific and human support. I would like to thank Prof. Henri Lanteri for his kindness and hospitality, and for providing me with many suggestions, important advice and constant encouragement. Special thanks go to Pierre-Marie, Luca, Cristina and Hamad for their emotional support during the time I spent in Nice. Moreover, a big thanks goes to the PhD fellows of the Visual Computing Laboratory, Gabriele Desiderà and Andrea La Camera, for the time spent together and for the invaluable help and advice they gave me on many occasions. Many thanks go to Maurizio Filippone and Andrea Zunino for providing stimulating ideas and useful discussions. I also thank Riccardo Zanella of the University of Modena for his great collaboration. Finally, thanks to Patrizia and to my family for putting up with me all this time!
Summary

In this thesis, several iterative algorithms for image reconstruction are studied from a mathematical point of view. They arise from a statistical approach to the reconstruction problem, based on the hypothesis that the observed images, i.e. the data, are realizations of random variables whose expected value is unknown. In the cases addressed in the thesis, we consider variables with Gaussian or Poisson statistics, as well as variables that are the sum of a Gaussian and a Poisson variable. While the data are the outcome of a statistical process, their expected value depends on the object to be reconstructed through a relation that describes the formation process within the acquisition system. The object to be reconstructed is viewed as a set of parameters, and the relation expressing the expected value of the data as a function of these parameters is usually called the model. In this thesis, except for two cases in which it is nonlinear, the model is a convolution between the parameters and an image representing the impulse response of the system, called the Point Spread Function (PSF). Assuming that the statistics of the data are known, their probability distribution can be written explicitly by assuming that the expected value is expressed by the model. Hence, the method for finding a solution to the image reconstruction problem consists in maximizing this probability as a function of the parameters. The reconstruction problem is thus formulated as a Maximum Likelihood (ML) problem. Finally, the ML function is rewritten by composition with the negative logarithm, yielding a convex optimization problem if the model is linear, and a non-convex one otherwise. In the case of Gaussian noise one recovers the least-squares problem, while new problems arise in the Poisson case and in the mixed Gaussian-Poisson case.
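In symbols (the notation here is assumed for illustration, not fixed by this summary: A the imaging matrix, x the object parameters, y the observed data), the negative log-likelihood functionals for the two single-noise cases take, up to additive constants, the classical forms:

```latex
% Gaussian noise: least-squares functional
J_{\mathrm{G}}(x) \;=\; \tfrac{1}{2}\,\lVert A x - y \rVert_2^2
% Poisson noise: generalized Kullback-Leibler divergence (up to constants)
J_{\mathrm{P}}(x) \;=\; \sum_i \big[ (A x)_i - y_i \log (A x)_i \big]
```

The mixed Gaussian-Poisson case leads to a more involved functional, involving an infinite series, without a comparably simple closed form; it is discussed in chapters 3 and 6.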
Computing the set of parameters solving the ML problem requires an iterative process, given its nonlinear nature. But the main difficulty in computing the solution is that it is completely meaningless, due to the combination of two factors: the ill-conditioning of the model and the random nature of the data. Faced with this problem, iterative algorithms have the merit of approximating the solution at each step, and therefore of approaching the ML solution, which as noted is "wrong", little by little. This makes it possible to stop before the effects of noise propagation appear in the solution. Stopping the iterations is one way of dealing with noise propagation, and belongs to the so-called regularization methods. In the thesis we also describe and use the regularization methods arising from the Bayesian, or "a posteriori", approach. These follow a different route from stopping the iterations: they modify the ML function by introducing a second probability distribution directly on the set of parameters. The resulting problem is called "maximum a posteriori" estimation, and the solution depends on a so-called regularization parameter. Moreover, iterative methods lend themselves very well to the introduction of constraints on the solution. In the thesis we consider objects exhaustively described by arrays of numbers whose elements are all positive, so that a nonnegativity constraint on the parameters can be introduced into the iterative search for the solution. The objects reconstructed in the cases treated in the thesis satisfy this constraint. For minimizing the functionals, first-order methods, i.e. methods requiring only the computation of the gradient of the functional, are extremely useful and easy to implement for problems with a large number of parameters. Hence, in the thesis the attention is particularly devoted to this class of methods. Such methods have proven to be able to reconstruct images and other signals, and to be powerful tools for enforcing constraints, derived from prior knowledge, on the solution.
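The semi-convergence behavior that motivates early stopping can be seen on a deliberately artificial toy problem: a diagonal operator with decaying singular values and a hand-picked perturbation on the worst component (the operator, sizes and perturbation below are invented for illustration, not taken from the thesis):

```python
import numpy as np

# Toy diagonal "imaging operator" with rapidly decaying singular values
s = np.array([1.0, 0.1, 0.01])
A = np.diag(s)
x_true = np.ones(3)
e = np.array([0.0, 0.0, 0.05])       # perturbation on the worst component
y = A @ x_true + e                   # "noisy" data

tau = 1.0                            # step size < 2 / ||A||^2 = 2
x = np.zeros(3)
errors = []
for k in range(10000):
    x = x + tau * A.T @ (y - A @ x)  # Landweber iteration
    errors.append(np.linalg.norm(x - x_true))

k_best = int(np.argmin(errors))
# Semi-convergence: the reconstruction error first decreases, reaches a
# minimum, then grows as the amplified perturbation e/s is fitted.
```

Stopping at `k_best` acts as a regularizer: the iterates fit the well-conditioned components first, and the noise on the small singular values propagates into the solution only in later iterations.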
In the thesis we investigated the application of these methods to several problems: from astronomical images coming from a ground-based telescope, to "magnetic images" coming from geomagnetic prospection in archaeological surveys. We also dealt with images of different characteristics: from images of diffuse objects to images of sparse objects or objects with sharp edges. In chapter 2, the image formation process and the noise sources in the acquisition process are described. The image reconstruction problem is then formulated in a statistical setting. Finally, the well-posedness and ill-conditioning of the discrete reconstruction problem are briefly described. Chapter 3 describes the classical statistical approach to image reconstruction, mainly for three types of noise: Gaussian, Poisson, and a mixture of the two. The ML approach to image reconstruction is described, together with the conditions for the existence of solutions of the problem. Regularization is then introduced in two different forms: by stopping the iterations before convergence to the ML solution, and by reformulating the problem in a Bayesian setting, using a priori probability distributions as regularizing functions. Chapter 4 introduces the main gradient-type algorithms for image reconstruction when, in the search for the solution, the set of parameters is constrained to nonnegative values. A general technique, called the Split Gradient Method (SGM), is presented, which allows one to derive, in a very direct way, all the gradient-type algorithms appearing in the thesis, including the regularized Bayesian cases. Finally, the Scaled Gradient Projection (SGP) method is introduced. This method considerably improves the performance of the first-order algorithms provided by SGM. Chapter 5 discusses two reconstruction examples with Gaussian noise on the data. The first highlights the performance improvements provided by the SGP method in the reconstruction of simulated astronomical images. In the second, after describing the formation system of the so-called "magnetic images", we present the results of various regularized reconstructions provided by algorithms adapted to this case through the SGM. Chapter 6 compares three algorithms for the reconstruction of astronomical images when the noise on the data is a mixture of Poisson and Gaussian noise. The first algorithm can be obtained through SGM, while the others are two different approximations of it. Chapter 7 considers "blind" reconstructions. In the first part we return to the geomagnetic problem, discussed in chapter 5, when the depth parameter appearing in the Impulse Response Function (IRF), the geomagnetic analogue of the PSF, is unknown. Inversion results are shown for a simulated case and for a real case. In the second part, we discuss an example in which the data are simulated astronomical images and both the PSF and the data are measured, so that both are perturbed by Poisson noise. Chapter 8 suggests some possible further developments of the research.
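The core of the SGM mentioned above is a decomposition of the negative gradient into a difference of two nonnegative terms, which yields a multiplicative, nonnegativity-preserving update. A minimal NumPy sketch for the Poisson (Kullback-Leibler) functional, where the update reduces to the classical Richardson-Lucy iteration (the matrix and data below are invented for illustration):

```python
import numpy as np

def sgm_poisson(A, y, n_iter=500):
    """Split Gradient Method sketch for the Poisson (KL) functional.
    Writing -grad J(x) = U(x) - V(x) with U, V >= 0, the multiplicative
    update x <- x * U(x) / V(x) preserves nonnegativity; for Poisson data
    it coincides with the Richardson-Lucy iteration."""
    x = np.ones(A.shape[1])
    V = A.T @ np.ones(A.shape[0])      # V(x) = A^T 1 (constant here)
    for _ in range(n_iter):
        U = A.T @ (y / (A @ x))        # U(x) = A^T (y / A x)
        x = x * U / V
    return x

# Tiny sanity check on a well-conditioned, noise-free system
A = np.array([[0.8, 0.2], [0.3, 0.7]])
x_true = np.array([2.0, 3.0])
y = A @ x_true                         # noise-free data
x_rec = sgm_poisson(A, y, n_iter=500)
```

With noise-free data and a well-conditioned matrix the iteration recovers the exact object; with real noisy data the iterations would be stopped early, as discussed above.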
Summary of contributions

This work makes the following original contributions.
- In chapter 3 we studied the existence of solutions of the ML problem applied to the reconstruction of images perturbed by a mixture of Gaussian and Poisson noise, and we proved conditions for the existence of a solution of the ML problem.
- In chapter 3 we proposed two definitions of the semi-convergence property for curves or sequences in finite dimension. In both cases the definition of semi-convergence encodes the behavior of a curve or a sequence with respect to a given point. Moreover, we carried out a thorough discussion of the semi-convergence of the Landweber method. We showed by means of an example that this method does not necessarily have the semi-convergence property. Furthermore, we showed that the expected value of the reconstruction error, with respect to the noise statistics and to a particular class of eigenvalue distributions (whose elements approximate the actual eigenvalue distribution of an imaging system), is semi-convergent with respect to the "true" solution.
- In chapter 4 we give an approximation of the standard algorithm for image reconstruction in the case of a mixture of Gaussian and Poisson noise. This approximation gives a finer result than the one proposed in the literature, as discussed in chapter 6. The mathematical details of this approximation are described in the appendix.
- In chapter 5, we applied the SGP method for image reconstruction to an ill-conditioned problem with data perturbed by Gaussian noise. We performed several numerical simulations verifying the efficiency of the SGP method, and we compared the algorithms derived from the SGP method with the classical first-order algorithms.
- In chapter 7 we proposed a "semi-blind" deconvolution method suited to the specific case under study, in which the IRF depends on one parameter. Simulation results show that this method estimates the parameter effectively.
- In chapter 7 we proposed a "myopic" deconvolution for the reconstruction of astronomical images, useful when the PSF is measured just as the data are, so that only a noise-perturbed PSF is available. Starting from the joint probability of the two random variables, the data and the measured PSF, this method leads to the minimization of a separately convex functional. We compared it with the classical methods and showed the greater effectiveness of the "myopic" method.

During this work, three collaborations on the topic of image reconstruction continued with the following Laboratories and Departments:
- Laboratoire Fizeau, Université de Nice Sophia-Antipolis
- Dipartimento di Matematica Pura e Applicata, Università di Modena e Reggio Emilia
- Dipartimento per lo Studio del Territorio e delle sue Risorse, Università degli Studi di Genova
List of publications
- The study of an iterative method for the reconstruction of images corrupted by Poisson and Gaussian noise, F. Benvenuto, A. La Camera, C. Theys, A. Ferrari, H. Lantéri and M. Bertero, 2008 Inverse Problems Vol 24.
- Iterative deconvolution and semi-blind deconvolution methods in magnetic archaeological prospection, A. Zunino, F. Benvenuto, E. Armadillo, M. Bertero and E. Bozzo, 2009 GEOPHYSICS Vol 74.
- Méthode algorithmique de minimisation de fonctions d'écart entre champs de données. Application à la reconstruction d'images astrophysiques, H. Lantéri, C. Theys, F. Benvenuto, D. Mary, Colloque GRETSI 2009.
- Gradient projection approaches for optimization problems in image deblurring and denoising, S. Bonettini, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, European Signal Processing Conference 2009.
- Accelerated gradient methods for nonnegative least-squares image deblurring, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, 2010 Inverse Problems Vol 26.
- Joint Blind Deconvolution, F. Benvenuto, A. Ferrari, Submitted to Inverse Problems.
Summary

In this thesis, we study from a mathematical point of view various iterative algorithms for image reconstruction. They result from a statistical approach to reconstruction problems, based on the hypothesis that the observed images, i.e. the data, are realizations of random variables whose mean value is unknown. In the cases considered in this thesis, we deal with Gaussian or Poisson variables, or with variables involving the sum of a Gaussian process and a Poisson process. Although the data are the outcome of a statistical process, their mean value nevertheless depends on the object to be reconstructed through a relation that describes the formation process within the acquisition system. The object to be reconstructed is considered as a set of parameters, and the relation expressing the mean value of the data as a function of these parameters is regarded as the model. In our work, apart from two cases in which it is nonlinear, the model expresses a convolution between the parameters and the impulse response of the system, called the Point Spread Function (PSF). If we assume that the statistics of the data are known, their probability distribution can be written explicitly, supposing that the mean value is expressed by the model. Consequently, the method for obtaining the solution of the image reconstruction problem consists in maximizing this probability as a function of the parameters. The reconstruction problem is thus formulated as a Maximum Likelihood (ML) problem.

Assuming statistical independence of the realizations in the different pixels of the image, the likelihood of the whole image is written as a product of the probability laws in the individual pixels; composition with the negative logarithm allows the problem to be reformulated as the minimization of a functional. Given the linear model, this functional is convex in the cases studied here. In the case of additive Gaussian noise, one recovers the least-squares problem, while other functionals are obtained in the Poisson case and in the mixed Gaussian-Poisson case. To actually compute the set of parameters solving the ML problem, an iterative process must be used, given its nonlinear nature. However, the main difficulty in this computation lies in the fact that the solution is meaningless, owing to the combination of two factors: the ill-conditioning of the model and the random nature of the data. Faced with this problem, iterative algorithms have the particular feature of approaching the solution at each step, and consequently of reaching the erroneous ML solution progressively. This makes it possible to stop the reconstruction process before the effects of noise propagation appear in the solution. Stopping the iterations is a way of dealing with the noise propagation problem and, as such, belongs to the regularization methods. In the present work, we also use the regularization methods originating from a Bayesian, or a posteriori, approach, which are an alternative to stopping the iterations. In this context, the likelihood function is modified by introducing an a priori probability law acting directly on the set of parameters. The problem then becomes that of maximum a posteriori estimation, and the solution depends on an additional factor, the regularization factor. Furthermore, iterative methods lend themselves perfectly to the introduction of constraints on the solution.

In this dissertation, we consider that the unknown objects are exhaustively described by arrays of positive numerical values, so that a nonnegativity constraint on the parameters can be introduced during the iterative resolution. The objects reconstructed in this work satisfy these constraints. To minimize the convex functionals considered, first-order methods, i.e. those that require only the gradient of the functional, are extremely useful and simple to implement for problems with a large number of parameters; this is why this thesis focuses particularly on this class of methods. These methods have proven effective for image and signal reconstruction, as well as capable of taking into account constraints expressing prior knowledge on the solution. In the thesis, we developed the application of these methods to various types of problems: from astronomical images provided by ground-based telescopes, to magnetic images arising from geomagnetic prospection in an archaeological context. In addition, we treated images corresponding to objects with very different characteristics: diffuse objects, sparse objects, or objects with sharp edges.

In chapter 2, we describe the image formation process and the noise sources in the acquisition process; the image reconstruction problem is then formulated in a statistical setting; finally, we briefly describe the fact that the discrete reconstruction problem is well-posed but ill-conditioned. In chapter 3, we describe the classical statistical approach to the reconstruction problem, considering three types of measurement noise: additive Gaussian, Poisson, or a combination of the two. We describe the ML approach and the conditions for the existence of a solution of the problem. We then introduce regularization in two different forms: either by stopping the iterations before convergence to the ML solution, or by reformulating the problem in a Bayesian setting, introducing an a priori probability law that leads to the addition of a regularizing functional. In chapter 4, we introduce the main gradient-type algorithms when a nonnegativity constraint on the parameters is taken into account. We present a general method called the Split Gradient Method (SGM), which allows one to construct, in a very direct way, all the gradient-type algorithms used in this thesis, whether regularized or not. We also introduce the Scaled Gradient Projection (SGP) method, which considerably improves the speed of the algorithms built with the SGM. In chapter 5 we analyze two reconstruction examples where the data are corrupted by additive Gaussian noise. In the first example, we highlight the improvements obtained with the SGP method for simulated astrophysical images.

In the second example, after describing the formation system of the magnetic images, we present the results of various reconstructions obtained with SGM, with regularization functions adapted to this particular problem. In chapter 6 we compare algorithms for the reconstruction of astronomical images when the data are corrupted by a noise regarded as the mixture of a Poisson noise, arising from photo-conversion, and an additive Gaussian noise, arising from the read-out of the detector (CCD). The first algorithm is obtained by the SGM method, while the following ones are two approximations of it that improve computational efficiency. In chapter 7, we study blind-type reconstructions. In a first part, we return to the geomagnetic prospection problem considered in chapter 5, when the depth parameter appearing in the impulse response function (IRF), the geomagnetic analogue of the PSF, is unknown. We first present the results of an inversion process in a simulated case and in a real case. In the second part we discuss the results obtained for simulated astronomical images, for which both the PSF and the image are the result of measurements and are therefore both corrupted by Poisson-type noise. Chapter 8 suggests some directions for developing the work presented.
Summary of contributions

The original contributions of this work are summarized below.
- In chapter 3, we studied the existence of solutions of the ML problem applied to the reconstruction of images perturbed by a mixture of Gaussian and Poisson noise, and we established the conditions for the existence of such solutions.
- We also proposed two definitions of the semi-convergence property for curves or sequences in finite dimension. In both cases, the definition of semi-convergence describes the behavior of a curve or a sequence with respect to a given point. Moreover, we developed a thorough discussion of the semi-convergence of the Landweber method. We showed by means of an example that this method does not necessarily have the semi-convergence property. Furthermore, through an example, we showed that, given the noise statistics and for a particular class of eigenvalue distributions (whose elements approximate the true eigenvalue distribution of the imaging system), the mean value of the reconstruction error is semi-convergent with respect to the true solution.
- In chapter 4, we give an approximation of the standard algorithm for image reconstruction in the case of combined Gaussian-Poisson noise. This approximation gives a more accurate result than the one given in the literature, as discussed in chapter 6. The computational details of this approximation are described in an appendix.
- In chapter 5, we applied the SGP method to image reconstruction for an ill-conditioned problem with data corrupted by additive Gaussian noise. We performed various numerical simulations to verify the efficiency of the SGP method; moreover, we compared the algorithms derived from the SGP method with the classical first-order algorithms.
- In chapter 7, we proposed a "semi-blind" deconvolution method adapted to the specific case studied, in which the IRF depends on one parameter. Simulation results show that this method estimates the parameter effectively.
- In the same chapter, we proposed a "myopic" deconvolution method for the reconstruction of astronomical images, particularly useful when the PSF, like the data, is the result of a measurement and is consequently corrupted by noise. Starting from the joint probability of the random variables given by the data and the measured PSF, this method leads to the minimization of a separately convex functional; we compared it with the classical methods and showed the superiority of the "myopic" method.

During this work, we developed collaborations on the topic of image reconstruction with the following Laboratories:
- Laboratoire Fizeau, Université de Nice Sophia-Antipolis
- Dipartimento di Matematica Pura e Applicata, Università di Modena e Reggio Emilia
- Dipartimento per lo Studio del Territorio e delle sue Risorse, Università degli Studi di Genova
List of publications
- The study of an iterative method for the reconstruction of images corrupted by Poisson and Gaussian noise, F. Benvenuto, A. La Camera, C. Theys, A. Ferrari, H. Lantéri and M. Bertero, 2008 Inverse Problems Vol 24.
- Iterative deconvolution and semi-blind deconvolution methods in magnetic archaeological prospection, A. Zunino, F. Benvenuto, E. Armadillo, M. Bertero and E. Bozzo, 2009 GEOPHYSICS Vol 74.
- Méthode algorithmique de minimisation de fonctions d'écart entre champs de données. Application à la reconstruction d'images astrophysiques, H. Lantéri, C. Theys, F. Benvenuto, D. Mary, Colloque GRETSI 2009.
- Gradient projection approaches for optimization problems in image deblurring and denoising, S. Bonettini, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, European Signal Processing Conference 2009.
- Accelerated gradient methods for nonnegative least-squares image deblurring, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, 2010 Inverse Problems Vol 26.
- Joint Blind Deconvolution, F. Benvenuto, A. Ferrari, Submitted to Inverse Problems.
Contents

Chapter 1  Introduction . . . 3
  1.1 Summary of contributions . . . 4
  1.2 Summary of publications . . . 5
Chapter 2  Mathematical modeling of image reconstruction . . . 7
  2.1 Imaging system . . . 7
    2.1.1 Model of image formation . . . 8
    2.1.2 Model of image acquisition . . . 11
  2.2 Formulation of the image reconstruction problem . . . 15
    2.2.1 Ill-posed problem: the continuous model . . . 17
    2.2.2 Ill-conditioned problem: the discrete model . . . 18
    2.2.3 The combined effect of ill-conditioning and noise . . . 19
Chapter 3  Statistical approach to image reconstruction . . . 21
  3.1 Maximum likelihood formulation . . . 21
    3.1.1 Gaussian case . . . 22
    3.1.2 Poisson case . . . 22
    3.1.3 Gaussian-Poisson case . . . 23
    3.1.4 Existence of solutions of the ML problem . . . 24
  3.2 Regularization by early stopping of iterative algorithms . . . 27
    3.2.1 On the semi-convergence of the Landweber algorithm . . . 29
  3.3 Regularization in a Bayesian setting . . . 34
    3.3.1 Priors . . . 37
Chapter 4  Gradient-type algorithms for nonnegative image reconstruction . . . 39
  4.1 The case of Gaussian noise . . . 40
    4.1.1 The projected Landweber method . . . 42
    4.1.2 The iterative space reconstruction algorithm . . . 43
  4.2 The case of Poisson noise . . . 44
    4.2.1 Richardson-Lucy algorithm . . . 45
  4.3 The case of Gaussian-Poisson noise . . . 46
    4.3.1 An efficient approximation of the Gaussian-Poisson minimization algorithm . . . 47
  4.4 The Split Gradient Method . . . 48
    4.4.1 Maximum likelihood estimates . . . 51
    4.4.2 Maximum a posteriori estimates . . . 53
  4.5 Efficient gradient methods . . . 54
    4.5.1 Acceleration of PL, ISRA and RL algorithm . . . 58
Chapter 5  Image reconstruction in the case of Gaussian noise . . . 59
  5.1 Efficient algorithms in a simulated astronomical example . . . 59
    5.1.1 Acceleration of the basic algorithms . . . 61
    5.1.2 Numerical experiments . . . 62
  5.2 Regularized gradient algorithms in a geomagnetic example . . . 67
    5.2.1 Formulation of the problem . . . 69
    5.2.2 The single layer model . . . 71
    5.2.3 Performance of the regularized algorithms . . . 76
Chapter 6  Image reconstruction in the case of Gaussian-Poisson noise . . . 81
  6.1 The case of Gaussian-Poisson noise . . . 81
    6.1.1 Application to image deconvolution . . . 82
    6.1.2 Computer implementation . . . 82
Chapter 7  Point spread function estimation . . . 89
  7.1 Semi-blind deconvolution . . . 90
    7.1.1 Depth and thickness errors in IRF estimation . . . 91
    7.1.2 Semi-blind deconvolution in a real case . . . 92
  7.2 Myopic deconvolution . . . 93
    7.2.1 Formulation of the joint deconvolution approach . . . 95
    7.2.2 The iterative algorithm . . . 99
    7.2.3 Numerical experiments . . . 101
Chapter 8  Perspectives . . . 109
  8.1 SGP algorithm provided by the SGM scaling in the regularized approach . . . 109
  8.2 Non linear magnetic model . . . 110
  8.3 A non convex regularized parameter-free functional . . . 111
Appendix A  Appendices . . . 115
  A.1 Approximation of the Gaussian-Poisson minimization algorithm . . . 115
List of Figures . . . 119
List of Tables . . . 123
Bibliography . . . 125
1 Introduction
In this thesis, we investigate a set of algorithms for image reconstruction. In particular, we study a class of first-order methods for the minimization of convex functionals derived from a statistical approach to the image reconstruction problem. First-order methods have proven to be able to accurately reconstruct images and other signals, making them powerful tools also for incorporating different kinds of prior knowledge on the solution. We investigate the application of these methods to different problems: from astronomical images coming from a ground-based telescope, to "magnetic images" coming from geomagnetic prospection in archaeological surveys. We also deal with images with different features: from images of diffuse objects to images of sparse or sharp-edged objects.

In chapter 2, the image formation process is described, as well as the sources of degradation in image formation and acquisition. Moreover, we formulate the image reconstruction problem. Finally, we briefly describe the well-posedness and ill-conditioning of the discrete image reconstruction problem. Chapter 3 deals with the classical statistical approach to image reconstruction, mainly considering three types of noise: Gaussian, Poisson, and a mixture of the first two. We describe the Maximum Likelihood (ML) approach to image reconstruction, and the conditions for the existence of a solution of the ML problem. Next, we introduce regularization in two ways: by early stopping of the iterations, and in a Bayesian setting for which several priors are presented. Chapter 4 introduces the main gradient-type algorithms for image reconstruction when a nonnegativity constraint on the solution is imposed. We present a technique, called the Split Gradient Method (SGM), which allows us to find, in a very direct way, all the gradient-type algorithms which we use. Moreover, the SGM provides a suitable algorithm for every regularizing prior we adopt.
Finally, the Scaled Gradient Projection (SGP) method is presented; this method considerably increases the performance of first-order algorithms. In chapter 5, we show two examples of reconstruction dealing with Gaussian noise. The first presents the improvements in performance obtained with the SGP method in the reconstruction of simulated astronomical images. The second describes the formation system of what we call "magnetic images" and presents the results of the regularized reconstruction given by suitable algorithms adapted to this case by means of the SGM. Chapter 6 presents a comparison of three algorithms for the reconstruction of astronomical images when the noise on the data is a mixture of Poisson and Gaussian noise. The first algorithm can be derived with the SGM, while the other two are different approximations of the first one. In chapter 7 we consider blind-type reconstructions. In the first part we re-investigate the geomagnetic problem, discussed in chapter 5, when the depth parameter appearing in the Impulse Response Function (IRF) is unknown. We show inversion results in a simulated case and also in a real case. In the second part, we discuss a simulated astronomical example in which both the PSF and the data are measured, so that both are affected by Poisson noise. Chapter 8 suggests possible further developments of our research.
1.1 Summary of contributions

This work has led to the following original contributions.

- In chapter 3 we study the existence of ML solutions to the reconstruction problem for images corrupted by a mixture of Gaussian and Poisson noise, and we give conditions for the existence of a solution of the ML problem.

- In chapter 3, two formal definitions of the semi-convergence property are given in a finite-dimensional space. The first concerns the semi-convergence of curves with respect to a point; the second concerns the semi-convergence of sequences with respect to a point. Moreover, the semi-convergence property of the Landweber algorithm is discussed in more depth: we give a toy example in which the Landweber algorithm does not have the semi-convergence property, and we show that the expected value of the reconstruction error, with respect to the noise statistics and to a particular class of eigenvalue distributions (which approximate the real distribution of the eigenvalues of an imaging system), is indeed semi-convergent with respect to the "true" solution.

- In chapter 4 we give an approximation of the standard algorithm for ML image reconstruction in the case of a mixture of Gaussian and Poisson noise. This approximation is finer than the one proposed in the literature, as discussed in chapter 6; mathematical details can be found in the appendix.

- In chapter 5, we apply the SGP method to image reconstruction in the case of an ill-conditioned problem with data affected by Gaussian noise. We performed several numerical simulations verifying the efficiency of the SGP method, and we also compared SGP with the classical algorithms.

- In chapter 7 we propose a semi-blind deconvolution suitable for the specific case under study, in which the IRF depends on one parameter. Simulation results show that this method efficiently estimates the parameter.

- In chapter 7 we propose a myopic deconvolution for astronomical image reconstruction, useful when the PSF, like the data, is measured, so that only a noisy PSF is available. Starting from the joint probability of the two random variables, the data and the measured PSF, this method leads to the minimization of a separately convex functional. A comparison with classical methods shows the efficiency of the myopic approach.

During this work, three main collaborations on image reconstruction were carried out with:

- Laboratoire Fizeau, Université de Nice Sophia-Antipolis
- Dipartimento di Matematica Pura e Applicata, Università di Modena e Reggio Emilia
- Dipartimento per lo Studio del Territorio e delle sue Risorse, Università degli Studi di Genova
1.2 Summary of publications

- The study of an iterative method for the reconstruction of images corrupted by Poisson and Gaussian noise, F. Benvenuto, A. La Camera, C. Theys, A. Ferrari, H. Lantéri and M. Bertero, 2008, Inverse Problems, vol. 24.
- Iterative deconvolution and semi-blind deconvolution methods in magnetic archaeological prospecting, A. Zunino, F. Benvenuto, E. Armadillo, M. Bertero and E. Bozzo, 2009, Geophysics, vol. 74.
- Méthode algorithmique de minimisation de fonctions d'écart entre champs de données. Application à la reconstruction d'images astrophysiques, H. Lantéri, C. Theys, F. Benvenuto, D. Mary, Colloque GRETSI 2009.
- Gradient projection approaches for optimization problems in image deblurring and denoising, S. Bonettini, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, European Signal Processing Conference 2009.
- Accelerated gradient methods for nonnegative least-squares image deblurring, F. Benvenuto, R. Zanella, L. Zanni and M. Bertero, 2010, Inverse Problems, vol. 26.
- Joint Blind Deconvolution, F. Benvenuto, A. Ferrari, submitted to Inverse Problems.
2 Mathematical modeling of image reconstruction
2.1 Imaging system
An image is a signal carrying information about a physical object which is not directly observable. In general the information consists of a degraded representation of the original object, and one can roughly distinguish two sources of degradation: the process of image formation and the process of image recording. The degradation due to the process of image formation is usually called blurring and is a sort of band-limiting of the object. In the case of aerial photographs, for instance, the blurring is due to relative motion between the camera and the ground, to aberrations of the optical components of the camera and, finally, to atmospheric turbulence. The degradation introduced by the recording process is usually called noise and is due to measurement errors, counting errors, etc. Blurring is a deterministic process and, in most cases, one has a sufficiently accurate mathematical model for its description. On the other hand, noise is a statistical process, so that the noise affecting a particular image is not known; one can, at most, assume knowledge of the statistical properties of the process. In this chapter we first discuss blurring and noise in general terms and then provide several examples of noise. These examples provide models that are useful for testing the methods discussed in this thesis.

An imaging system consists, in general, of two parts.

i) The first is an apparatus (formed by physical components such as mirrors, lenses, sources, collimators, etc.) able to transform the radiation (microwaves, photons, X-rays, γ-rays, ultrasound, etc.) emitted or transmitted by the sample to be imaged (in the following called the object) into a detectable radiation containing useful information about the spatial properties of the object.

ii) The second is a detector providing measured values of the incoming radiation; this is the part of the system introducing sampling and noise.
2.1.1 Model of image formation

We consider an object with coordinates s', where s' can be a 2D or 3D variable. For our purposes, the object can be a light source which spreads out light, or it can reflect the light coming from another source. In both cases we define its intensity as a function f(s'), where f can be a vector-valued function if the object emits polychromatic light. We observe this light with an optical system, i.e. a device consisting of optical components (lenses, mirrors, beam splitters, etc.) which collects the light coming from the object to form an image in a plane, the image plane.

Example 1. An example of an optical device in astronomy is a telescope consisting of a Cassegrain reflector. It is a combination of a (large) concave circular mirror, the primary, and a convex circular mirror, the secondary, situated above the primary but below its focal plane. The primary reflects the light coming from the sky toward its focal plane but, before reaching this plane, the light is reflected back by the secondary. Both mirrors are aligned in such a way that they have the same optical axis, and the primary contains a hole in its center, thus permitting the light from the secondary to reach a detector located in its focal plane, below the primary.

Every optical device transforms the light coming from a point light source into a given image. In the rigorous mathematical language of distributions, we can state that the system transforms a delta function, which represents a point light source f(s') = δ(s'), into a fixed distribution $h_{s'}(s)$. This transformation describes the effects of diffraction, aberrations, etc., and returns the image of the point light source, after passing through the optical system, as a function of s. For this reason the function $h_{s'}(s) = h(s, s')$ is called the Point Spread Function (PSF); in general it also depends on the location s' of the point source.
Under the approximation of Fourier optics, if the light emitted by the source is spatially incoherent, then the relationship between the intensity of the object f(s') and that of the image g(s) is given by a linear integral operator, which describes the (linear) superposition of the images $h_{s'}(s)$ of the point sources at every location s' [31], i.e.

$$ g(s) = \int h(s, s') \, f(s') \, ds' + b \,, \qquad (2.1) $$

where b is a known constant signal called the background. In principle, the PSF can be obtained by solving the direct problem associated with the imaging process. In the case of an optical system, for instance, this means computing the propagation of light from a point source in the object domain to a point of the image domain through the elements of the system (lenses, mirrors, etc.). If no exact or approximate solution of the direct problem is available, the PSF can possibly be measured by generating a point source, moving it in the object domain, and detecting the images produced by the system for the various positions of the source. In many imaging systems the PSF is invariant with respect to translations in the following sense: the image h(s, s') of a point source located at s' is the translation by s' of the image h(s, 0) of a point source located at the origin of the object plane, i.e.
h(s, s') = h(s − s', 0). It follows that h(s, s') is a function of the difference s − s', and we write h(s − s') instead of h(s, s'). Such an imaging system is called space invariant, and the corresponding PSF is also called space invariant. A space-invariant PSF can be determined by detecting the image of a single point source, for instance located at the center of the object domain. If the imaging system is not space invariant, its PSF is said to be space variant; equation 2.1 characterizes the general space-variant imaging system. Most imaging systems, however, are, if not exactly, at least approximately space invariant, so that $h_{s'}(s) = h(s - s')$ and

$$ g(s) = \int h(s - s') \, f(s') \, ds' + b \,. \qquad (2.2) $$

Equation 2.2 characterizes the so-called space-invariant imaging system.
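The discrete analogue of the space-invariant model of equation 2.2 can be sketched numerically. The snippet below is a minimal Python illustration, not taken from the thesis: the Gaussian PSF and the point-source object are arbitrary choices, and the convolution is made cyclic so that it can be computed with the FFT.

```python
import numpy as np

# Sketch of the space-invariant model g = h * f + b of equation 2.2,
# discretized on a periodic N x N grid so the convolution is cyclic and
# can be computed with the FFT. The Gaussian PSF and the point-source
# object are illustrative choices.
N = 64
yy, xx = np.mgrid[0:N, 0:N]

# Centered Gaussian PSF, normalized to unit sum (flux-preserving).
psf = np.exp(-((xx - N // 2) ** 2 + (yy - N // 2) ** 2) / (2.0 * 3.0 ** 2))
psf /= psf.sum()
psf = np.fft.ifftshift(psf)  # move the PSF center to pixel (0, 0)

# Point-source object plus a constant background b.
f = np.zeros((N, N))
f[20, 20] = 100.0
b = 1.0

# Convolution theorem: g = F^{-1}( F(h) F(f) ) + b.
g = np.real(np.fft.ifft2(np.fft.fft2(psf) * np.fft.fft2(f))) + b
```

Since the PSF sums to one, the cyclic convolution preserves the total flux of the object, and the blurred image peaks at the location of the point source.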
Figure 2.1: The Airy pattern and the adaptive optics (AO) PSF.
In figure 2.1 we show the PSF of an ideal pupil, the so-called Airy pattern [31], due to diffraction effects. It is, for instance, the PSF of a Cassegrain telescope in the absence of aberrations of the mirrors, obstructions due to mechanical supports, and effects of atmospheric turbulence. Different kinds of aberrations introduce, in general, modifications of the ideal PSF. These can be due to aberrations in the optical components of the instrument. In the case of telescopes, however, the main difficulties derive from atmospheric turbulence, whose effect is to produce an effective band considerably smaller than the band of the optical instrument. In modern ground-based telescopes, a technology called adaptive optics (AO) makes it possible to compensate, at least partially, for the effect of atmospheric turbulence [26] and to achieve a resolution close to the diffraction limit.
2.1.1.1 The loss of information due to an optical system
Equation 2.1 can be viewed, in the language of functional analysis, as a linear operator between function spaces. Assuming that the object space O and the image space I are Hilbert spaces, we write

$$ H : \mathcal{O} \longrightarrow \mathcal{I} \,, \qquad f(s') \longmapsto g(s) \,. \qquad (2.3) $$
The imaging system may or may not transmit complete information about the object. From a mathematical point of view, this is captured by the properties of the linear operator H: it transmits complete information if and only if it is invertible, i.e.

1. ker(H) = {0}
2. im(H) = I

When the first condition fails, there exist at least two distinct objects, say f1 and f2, such that H f1 = g and H f2 = g. Since the operator H is linear, we obtain H(f1 − f2) = 0, and therefore f = f1 − f2 is a non-trivial solution of the homogeneous equation H f = 0. Conversely, if this equation has a non-trivial solution f and f1 is a solution of equation 2.1, i.e. H f1 = g, then f2 = f1 + f is also a solution of the same equation, because H f2 = H f1 + H f = H f1 = g. We see that the solution of the equation H f = g is unique if and only if the equation H f = 0 has only the solution f = 0. When non-trivial solutions exist, they constitute a linear subspace ker H, called the null space of the operator H, or also the space of the invisible objects. In order to establish the existence of invisible objects, it is convenient to write the equation H f = 0 in terms of Fourier transforms. We obtain

$$ \hat h(\omega) \, \hat f(\omega) = 0 \,, $$

where ω is the frequency variable, which has the same dimension as s (i.e. it is a 2D or 3D vector variable), and $\hat f(\omega)$ denotes the Fourier transform (FT) of f(s).

Remark 1. The Fourier transform $\hat h(\omega)$ of h(s) is called the transfer function (TF) of the imaging system. These functions play an important role in understanding the behavior of the system. The PSF provides the response of the system to a point source wherever it is located. The TF, on the other hand, tells us how a signal of a fixed frequency is propagated through the linear system, so that the image formation process can be viewed as a sort of frequency filtering.
In the Fourier domain, the existence of non-trivial solutions of this equation can be read in terms of the support of $\hat h(\omega)$. If the support of $\hat h(\omega)$ coincides with the whole frequency space, then $\hat f(\omega) = 0$. This is the case, for instance, of linear motion blur, out-of-focus blur, atmospheric turbulence blur, and near-field acoustic holography. In all these cases the solution of the restoration problem is unique. On the other hand, in the case of diffraction-limited imaging systems and in the case of far-field acoustic holography, the support B of $\hat h(\omega)$ is a bounded subset of the frequency space, and therefore invisible objects exist; they consist only of high frequencies.

Remark 2. According to a definition which is now classical, a mathematical problem is said to be well-posed in the sense of Hadamard if it satisfies the following conditions:
1) the solution of the problem is unique
Figure 2.2: The modulation transfer function (MTF) of the AO PSF.
2) the solution exists for any data
3) the solution depends continuously on the data.
When H is linear, as in our case, well-posedness simply means that H is invertible with continuous inverse; indeed, the Closed Graph Theorem implies that 3) follows from 2). If one of the three conditions above is not satisfied, the problem is said to be ill-posed. Therefore an ill-posed problem is a problem whose solution is not unique, and/or does not exist for any data, and/or does not depend continuously on the data. In general, problem 2.3 is ill-posed. Nevertheless, in the next section we will see that in practice we deal with the discretized version of equation 2.2, and the peculiar feature of the discrete version is that it can be well-posed in the sense specified above.
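The invisible objects of a band-limited system can be exhibited numerically. The sketch below is a toy Python illustration (an ideal low-pass transfer function, not a real instrument response): an object whose spectrum lies entirely outside the band B is mapped to a numerically zero image.

```python
import numpy as np

# Invisible objects for a band-limited TF: an ideal low-pass filter
# (a toy model, not a real instrument response) annihilates any object
# whose spectrum lies entirely outside its band B.
N = 128
freq = np.fft.fftfreq(N)
wy, wx = np.meshgrid(freq, freq, indexing="ij")
h_hat = (np.sqrt(wx ** 2 + wy ** 2) <= 0.1).astype(float)  # band B

# Object made of a single high frequency (51/128 ~ 0.4 > 0.1),
# chosen on the DFT grid so there is no spectral leakage.
xx = np.arange(N)
f_invisible = np.cos(2 * np.pi * 51 * xx / N) * np.ones((N, N))

g = np.real(np.fft.ifft2(h_hat * np.fft.fft2(f_invisible)))
```

Here the object is far from zero, yet its image vanishes to machine precision: it belongs to the null space of the discrete operator.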
2.1.2 Model of image acquisition

In order to measure the blurred image of $f^{(0)}(s)$, a recording system is placed in the image domain (for instance, an array of detectors). For simplicity we assume that the output of this system is proportional to $g^{(0)}(s)$ (and we set the proportionality constant equal to 1 in the following). However, the effect of the recording process is the addition of a noise contribution, so that the recorded image, denoted by g(s) and called the noisy image, is given by

$$ g(s) = \int h(s, s') \, f^{(0)}(s') \, ds' + b + \eta(s) \,. \qquad (2.4) $$
The noise term η(s) in equation 2.4 is in general a realization of a random process. This realization is not known in practice; one knows, at most, statistical properties of the random process, such as its mean value, variance, etc. We may also know, of course, whether it is additive or multiplicative, correlated or uncorrelated, Gaussian or Poisson, etc. These properties, when known, should be used in the treatment of the problem. The amount of noise present in a discrete image is measured by the so-called signal-to-noise ratio (SNR), defined by

$$ \mathrm{SNR} = 10 \, \log_{10} \frac{\text{variance of the image}}{\text{variance of the noise}} \quad (\mathrm{dB}) \qquad (2.5) $$
Figure 2.3: The geometric representation of the image formation process (object space O, operator H, exact image $h * f^{(0)} + b$, noise η, data g).
and can be estimated by

$$ \mathrm{SNR} = 10 \, \log_{10} \frac{\|g\|^2}{\|w\|^2} \quad (\mathrm{dB}) \,, \qquad (2.6) $$

the norm being the Euclidean one associated with the scalar product. When the SNR is of the order of 40 or 50 dB, the noise is not visible in the image and the effect of the blur is dominant. In figure 2.3 we give a geometric representation in which the two processes, imaging and detection, are clearly separated. In the case of a space-invariant system, equation 2.4 becomes

$$ g(s) = \int h(s - s') \, f^{(0)}(s') \, ds' + b + \eta(s) \qquad (2.7) $$
or also

$$ g = h * f^{(0)} + b + \eta \,, $$

so that, in terms of Fourier transforms, equation 2.7 becomes

$$ \hat g(\omega) = \hat h(\omega) \, \hat f^{(0)}(\omega) + b \, \delta(\omega) + \hat\eta(\omega) \,. $$

We will say that an imaging system is band-limited if its PSF is band-limited. The band B of the PSF will also be called the band of the imaging system. In the following we will give examples of blurring which can be modeled by means of a convolution integral as in equation 2.7. Some of them are examples of band-limited blurring; in the other cases, the MTF tends to zero at large spatial frequencies, so that the corresponding imaging systems are approximately band-limited. Real images, however, are discrete, and therefore a function g(s) of two variables is replaced by a vector denoted by $y = \{y_i\}_{i \in S}$, where i is, in general, a multi-index, i.e. a pair of indices in the 2D case (astronomy) or a triple of indices in the 3D case (microscopy). The original object $f^{(0)}(s')$ can likewise be replaced by a vector denoted by $x^{(0)} = \{x^{(0)}_j\}_{j \in R}$. The two sets of index values, R and S, may coincide but they
may also be different. The cardinalities #S and #R are, respectively, m and n, i.e. the number of data and the number of unknowns of the image reconstruction problem. For example, in many astronomical cases the 2D images are square, so that m = n = N × N, where N is the number of samples of g(s) or $f^{(0)}(s')$ along a single side of the image. Finally, the convolution integral can be replaced by a cyclic convolution with a matrix H, which represents a discretized version of the integral operator 2.2. This approximation is reasonable if the PSF is significantly different from zero over a domain much smaller than the image domain, so that periodicity effects are negligible. Usually, in astronomical image deblurring, the matrix H describes the effect of the optical components and, possibly, of the atmosphere. In most applications it is a Toeplitz or a circulant matrix, i.e. $H_{ij} = H_{i-j}$. We assume that it satisfies the following conditions:

$$ H_{ij} \geq 0 \,; \qquad \sum_{i \in S} H_{ij} > 0 \,, \ \forall j \in R \,; \qquad \sum_{j \in R} H_{ij} = 1 \,, \ \forall i \in S \,. \qquad (2.8) $$
In other words, we assume that each row or column contains at least one non-zero element, and that the PSF is normalized. By discretizing equation 2.7, in the case of 2D images, we obtain the following equation

$$ y_i = \sum_{j=1}^{n} H_{ij} \, x_j^{(0)} + b + w_i \,, \qquad (2.9) $$

where $w = \{w_i\}_{i \in S}$ is the discretized version of η(s). We can equivalently write equation 2.9 in matrix form, i.e.

$$ y = H x^{(0)} + b + w \,. \qquad (2.10) $$

Remark 3. We recall the relationship between the PSF of equation 2.7 and the cyclic PSF of equation 2.9: if we discretize h(s) by using, for instance, equispaced sampling points, then the cyclic PSF H of equation 2.10 is obtained by applying the shift operation to the array formed by the samples of h(s).
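The discrete model of equation 2.10 can be made concrete in one dimension. The Python sketch below (the triangular PSF, noise level, and object are arbitrary illustrative choices) builds the circulant matrix H of a normalized PSF, generates data according to equation 2.9, and also evaluates the SNR estimate of equation 2.6.

```python
import numpy as np

# 1D sketch of the discrete model y = H x^(0) + b + w (equation 2.10):
# H is the circulant matrix H_ij = h_{(i-j) mod n} of a normalized PSF.
# The triangular PSF and noise level are arbitrary illustrative choices.
n = 16
h = np.zeros(n)
h[:3] = [0.25, 0.5, 0.25]                # nonnegative, sums to 1

# Column j of H is the PSF cyclically shifted by j, so H @ x is the
# cyclic convolution h * x.
H = np.stack([np.roll(h, j) for j in range(n)], axis=1)

rng = np.random.default_rng(1)
x0 = rng.uniform(0.0, 10.0, n)           # "true" object
b = 2.0
w = rng.normal(0.0, 0.1, n)              # Gaussian noise realization
y = H @ x0 + b + w

# SNR estimate of equation 2.6: 10 log10(||g||^2 / ||w||^2).
snr_db = 10.0 * np.log10(np.sum((y - w) ** 2) / np.sum(w ** 2))
```

One can check that H satisfies the conditions 2.8 (nonnegative entries, positive column sums, unit row sums) and that H @ x0 coincides with the cyclic convolution computed through the DFT.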
We can therefore summarize the general features of the data as follows.

i) Data are discrete, and the discretization is decided not by the mathematician but by the physicist or engineer who designed the imaging system; in general, sampling theorems are taken into account in the design and, very often, data are oversampled.

ii) Data are realizations of random variables, as a consequence of the noise introduced by the detection system. Therefore, a component $y_i$ of the data vector is the realization of a random variable $Y_i$. We will denote by $Y = \{Y_i\}_{i \in S}$ the vector-valued random variable corresponding to the data.

As for the detection system, we already remarked that it introduces sampling and noise, and that noise is a random process, so that the detected values are realizations of random variables. Therefore a model of this system requires a model of the noise, i.e. a model of its probability density (a function or a distribution). This density depends
on the object x, and therefore we denote it by $p_Y(y; x)$. The following assumptions are, in general, accepted as reasonable.

– The random variables $Y_i$ and $Y_l$ associated with different elements of the detector are statistically independent, so that we can write

$$ p_Y(y; x) = \prod_{i=1}^{m} p_{Y_i}(y_i; x) \,. $$
– The expected value of $Y_i$ is given by the exact value of the incoming radiation, so that we have

$$ E\{Y\} = \int y \, p_Y(y; x) \, dy = H x + b \,. \qquad (2.11) $$

2.1.2.1 Gaussian noise
The first example is provided by the so-called additive white Gaussian noise. In this case Y is given by

$$ Y = H x + b + E \,, $$

where b is a background and E is a vector-valued random variable with statistically independent components, all having the same Gaussian distribution with expected value 0 and variance σ², so that

$$ p_E(e) = \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^{m} \exp\left( -\frac{1}{2\sigma^2} \|e\|^2 \right) \,, $$

where ‖·‖ denotes the usual 2-norm. Therefore the statistical model for the detected data is given by

$$ p_Y(y; x) = \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^{m} \exp\left( -\frac{1}{2\sigma^2} \|y - (Hx + b)\|^2 \right) \,. $$

2.1.2.2 Poisson noise
The second example is the so-called Poisson noise, which describes, in general, the noise affecting counting processes (it is sometimes also called "photon noise"). In such a case each $Y_i$ is a Poisson random variable with expected value given by equation 2.11,

$$ Y_i \sim \mathrm{Poisson}\{(Hx + b)_i\} \,, $$

so that its probability density is a distribution supported on the set of non-negative integers (each $y_i$ is a non-negative integer). We have

$$ p_Y(y; x) = \prod_{i=1}^{m} \frac{e^{-(Hx+b)_i} \, (Hx+b)_i^{y_i}}{y_i!} \,. \qquad (2.12) $$
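The defining property of Poisson data, namely that mean and variance both equal the exact intensity of equation 2.11, is easy to check by simulation. The Python sketch below uses a flat exact image of intensity 50 as a toy stand-in for Hx + b.

```python
import numpy as np

# Poisson ("photon") noise: each pixel of the data is an independent
# Poisson variable whose mean is the exact intensity (Hx + b)_i, per
# equations 2.11-2.12. A flat exact image of intensity 50 is a toy choice.
rng = np.random.default_rng(2)
exact = np.full(1000, 50.0)                 # stands in for Hx + b
y = rng.poisson(exact)                      # one noisy realization

# For Poisson data, mean and variance coincide with the exact intensity.
samples = rng.poisson(np.full((2000, 1000), 50.0))
empirical_mean = float(samples.mean())
empirical_var = float(samples.var())
```

Both empirical moments come out close to 50, and every realization is a non-negative integer, as the density 2.12 requires.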
2.1.2.3 Gaussian-Poisson noise
The third example is the so-called Gauss-Poisson noise, a more refined model of the noise affecting data detected by a charge-coupled-device (CCD) camera, as described in [67]. It is given by

$$ Y = Z + E \,, $$

where Z is a Poisson process and E is an additive white Gaussian noise as in the previous examples. In such a case the probability density is given by the convolution of the two respective probability densities, i.e.

$$ p_Y(y; x) = \prod_{i=1}^{m} \left( \sum_{l=0}^{+\infty} \frac{e^{-(Hx+b)_i} \, (Hx+b)_i^{l}}{l!} \, \frac{e^{-\frac{1}{2\sigma^2}(y_i - l)^2}}{\sqrt{2\pi}\,\sigma} \right) \,. $$
2.2 Formulation of the image reconstruction problem

In rather broad terms, the problem of image reconstruction can now be formulated as a deconvolution problem, that is, the problem of estimating the unknown object $f^{(0)}$, or its discrete version $x^{(0)}$, given the blurred and noisy image g(s), or its discrete version y. Moreover, it must be assumed that the PSF h(s), or the cyclic PSF H, is also known. For simplicity, in the remainder of this section we assume that the background b is equal to 0.

When we deal with an inverse problem in a continuous framework, the main drawback is ill-posedness. It implies that the solution of the problem may not exist or, when it does exist (as in the case of discrete problems), may be completely deprived of physical meaning as a consequence of error propagation and amplification from the data to the solution. This is essentially a consequence of the fact that the system does not transmit complete information about the object, as we already mentioned in remark 2. The first consequence of this situation is that one must reject the possibility of finding the exact solution of the problem and look for approximate solutions, i.e. objects which approximately reproduce the noisy images. This is the first basic point in the treatment of ill-posed problems.

Remark 4. Note that, from an ontological point of view, this approach does not reject the concept of an exact solution, but only the possibility of finding it. Other statistical approaches consider the set of feasible solutions according to some model (for example, using the Minimum Description Length) and, in doing so, reject the concept of an exact solution altogether.

However, another consequence of ill-posedness is that the set of approximate solutions is too broad: it contains not only physically acceptable objects but also objects which are too large and wildly oscillating. The second basic point is therefore that one must use the knowledge of additional properties of the unknown object (constraints) to select, from the set of approximate solutions, those which are physically meaningful. This is the role played by so-called a priori information.
Remark 5. In some cases one does not know the PSF, or one only knows that it belongs to a class of functions depending on a certain number of parameters. For instance, one may know that the blur is due to uniform motion but not the parameters of the motion. In the chapter dedicated to the applications we analyze a geomagnetic problem in which the PSF depends on one parameter, and we see how so-called blind methods can be useful in solving this problem. This is only a particular case of a more general problem, known as image identification (equivalent names: blur identification, image blur identification, a posteriori restoration, blind deconvolution), which consists in the attempt to estimate both h(s) and $f^{(0)}(s)$ from the knowledge of g(s) alone. It should be obvious that solving such a problem requires a lot of additional information about the two unknown functions h(s) and $f^{(0)}(s)$. This problem is considered in this thesis in the context of two applications: in the geomagnetic field, and in the astronomical field, where we assume that, instead of the true PSF, a noisy approximation of the PSF itself is known.

Therefore, even in the absence of noise, one can find difficulties in solving the problem. In the following we start by considering the inverse problem in the approximation η → 0. In equation 2.7, as well as in equation 2.9, both $f^{(0)}$ and η are unknown, but if the SNR, as defined in equation 2.5, is sufficiently large, it may seem reasonable at first sight to neglect the noise term in these equations. If we make this approximation, then we can formulate the problem of image restoration as that of solving the linear equation

$$ H f = g \qquad (2.13) $$

or, in the discrete case, as that of solving the linear algebraic system

$$ H x = y \,. \qquad (2.14) $$

As concerns the discrete equation 2.9, we also expect the solution x to provide an approximation of the discrete object $x^{(0)}$. If we use the FT, equation 2.13 becomes rather trivial, since we get

$$ \hat h(\omega) \, \hat f(\omega) = \hat g(\omega) \,. \qquad (2.15) $$

Analogously, by means of the DFT, from equation 2.9 with $h_{i-j} = H_{ij}$, we have

$$ \hat h_i \, \hat x_i = \hat y_i \,. \qquad (2.16) $$

As concerns the discretized problem of equation 2.9, the equation H x = 0 implies $\hat h_i \hat x_i = 0$, and therefore the solution of the problem of equation 2.9 is unique if and only if $\hat h_i \neq 0$ for every value of i.

Remark 6. It is important to notice that, even if problem 2.9 is a discrete version of problem 2.7, the uniqueness of the solution of 2.9 is not directly related to the uniqueness of the solution of equation 2.7. Indeed, when the PSF h(s) is discretized and the DFT of the discrete PSF is computed along the grid described in the previous paragraph, as a consequence of discretization errors all values of the discrete TF $\hat h_i$ can be different from zero even if h(s) is band-limited.
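Remark 6 can be checked numerically. The Python sketch below uses a squared-sinc profile as a toy 1D stand-in for a band-limited PSF (its continuous TF is a triangle vanishing for |ω| above the band edge): after sampling and truncation, the discrete TF values outside the continuous band come out small but nonzero.

```python
import numpy as np

# Illustration of remark 6: discretize a band-limited PSF (squared sinc,
# whose continuous TF is a triangle supported in |f| <= 0.25; a toy 1D
# stand-in for a diffraction-limited PSF) and inspect its DFT. Because of
# truncation/discretization errors, the discrete TF values h^_i outside
# the continuous band are nonzero, so the discrete problem has a unique
# solution even though the continuous one does not.
n = 128
t = np.arange(n) - n // 2
h = np.sinc(0.25 * t) ** 2          # band-limited in the continuum
h /= h.sum()
h_hat = np.fft.fft(np.fft.ifftshift(h))

out_of_band = np.abs(h_hat[np.abs(np.fft.fftfreq(n)) > 0.25])
```

The out-of-band values are orders of magnitude below the zero-frequency value, yet none of them is exactly zero.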
2.2.1 Ill-posed problem: the continuous model

In this section we investigate the existence of the solution of the image restoration problem in the case where uniqueness holds. Indeed, as shown in the previous section, the uniqueness of the solution can be investigated independently of its existence. We consider first the case of equation 2.7 and then that of its discrete version, equation 2.9.

If the support of $\hat h$ is the whole space, equation 2.15 implies that the FT of the solution of the problem is given by

$$ \hat f(\omega) = \frac{\hat g(\omega)}{\hat h(\omega)} \,. \qquad (2.17) $$

However, the solution of the problem exists if and only if the right-hand side of equation 2.17 defines the FT of a function. In order to investigate this point, let us recall the model of image formation discussed in section 2.1.1. According to this model, the FT of the image is given by

$$ \hat g(\omega) = \hat h(\omega) \, \hat f^{(0)}(\omega) + \hat\eta(\omega) \,, \qquad (2.18) $$

where $\hat f^{(0)}(\omega)$ is the FT of the true object and $\hat\eta(\omega)$ is the FT of the noise contribution. If we substitute equation 2.18 into equation 2.17, we get

$$ \hat f(\omega) = \hat f^{(0)}(\omega) + \frac{\hat\eta(\omega)}{\hat h(\omega)} \,. \qquad (2.19) $$
We conclude that the function $\hat f(\omega)$ is the sum of the FT of the true object $\hat f^{(0)}(\omega)$ and a term which comes from the inversion of the noise contribution. This second term may be responsible for the non-existence of the solution, i.e. of the inverse FT of $\hat f(\omega)$. We first consider the case where $\hat h(\omega)$ is zero for some values of ω (this situation applies, for instance, to linear motion blur and to out-of-focus blur). In this case the FT of the noise term is in general not zero at the frequencies where the TF is zero, because the noise is a process independent of the imaging process. As a consequence, in equation 2.19 we have division by zero, and $\hat f(\omega)$ has singularities at the zeros of $\hat h(\omega)$. This fact implies that the inverse FT of $\hat f(\omega)$ may not exist, and therefore equation 2.7 may not have a solution.

Moreover, even if $\hat h(\omega)$ is nowhere zero, it tends to zero when |ω| → ∞. Since the behavior of $\hat\eta(\omega)$ for large values of ω is not related to that of $\hat h(\omega)$, the ratio $\hat\eta(\omega)/\hat h(\omega)$ may not tend to zero. Depending on the relationship between the high-frequency behavior of the noise and that of the TF, this ratio may tend to infinity, to a constant, or to zero; possibly it may have no limit at all. The typical situation is that the inverse FT of this ratio does not exist, and therefore no solution of the problem exists. The previous discussion can be synthesized in a precise mathematical form as follows: the solution of equation 2.7 exists and is square-integrable if and only if the image g(s) satisfies the condition

$$ \int \left| \frac{\hat g(\omega)}{\hat h(\omega)} \right|^2 d\omega < \infty \,; \qquad (2.20) $$
in this case the solution $f^\dagger$ is given by

$$f^\dagger(s) = \frac{1}{(2\pi)^m} \int \frac{\hat{g}(\omega)}{\hat{h}(\omega)} \exp(i\, s \cdot \omega)\, d\omega \qquad (2.21)$$
Since condition 2.20 is not satisfied by arbitrary square-integrable functions and since, in general, noisy images do not satisfy this condition, the solution of equation 2.7 may not exist. As concerns the continuous dependence of the solution on the data, it is sufficient to observe that, if we consider, for instance, the object $f_0(s) = A_0 \exp(i\, s \cdot \omega_0)$, with a sufficiently large frequency $\omega_0$, then its image is given by $g_0(s) = A_0\, \hat{h}(\omega_0) \exp(i\, s \cdot \omega_0)$, and therefore its amplitude can be very small even if the amplitude $A_0$ of the object is very large (for instance, one can take $A_0 = |\hat{h}(\omega_0)|^{-1/2}$).
2.2.2 Ill-conditioned problem: the discrete model

The previous discussion makes clear that both the non-existence of the solution and the lack of uniqueness discussed in the previous section are due to the fact that the imaging system does not transmit complete information about the Fourier transform of the object at certain frequencies. This lack of information is a very important point which must never be forgotten. To this purpose one should always remember a very incisive statement of Lanczos: "a lack of information cannot be remedied by any mathematical trickery". Indeed, the methods developed for solving ill-posed problems are not based on mathematical trickery but rather on a reformulation of the problem based on the use of additional information on the object. This additional information compensates, in some way, for the lack of information due to the imaging system. This point will be further discussed when we deal with regularization. We conclude that, when uniqueness holds true, the problem of image restoration is ill-posed because the second and third conditions for well-posedness are not satisfied.

Consider now the discrete equation 2.9. In this case the situation is quite different. We know that the solution is unique if and only if $\hat{h}_i \neq 0$ for all values of $i$. If this condition is satisfied, from equation 2.16 we get

$$\hat{x}_i = \frac{\hat{y}_i}{\hat{h}_i} \qquad (2.22)$$

and, by taking the inverse DFT of this equation, we obtain

$$x^\dagger_j = \frac{1}{m} \sum_{k=1}^{m} \frac{\hat{y}_k}{\hat{h}_k} \exp\left( i\, \frac{2\pi}{m}\, k \cdot j \right) \qquad (2.23)$$
where we have used the index $k$ since $i$ denotes here the imaginary unit. This solution exists for any noisy image $y$ and also depends continuously on the data. Therefore we conclude that, when uniqueness holds true, the discrete problem of equation 2.9 is well-posed. The procedure outlined above is usually called inverse filtering in the literature on image restoration. We also observe that equation 2.23 can be written in terms of the inverse of the cyclic matrix $H$ associated with the PSF $h$:

$$x^\dagger = H^{-1} y \ . \qquad (2.24)$$
2.2 Formulation of the image reconstruction problem
The matrix $H^{-1}$ is the cyclic matrix generated by the vector

$$(H^{-1})_j = \frac{1}{m} \sum_{k=1}^{m} \frac{1}{\hat{h}_k} \exp\left( i\, \frac{2\pi}{m}\, k \cdot j \right) \ . \qquad (2.25)$$
It follows that equation 2.24 can be written as a cyclic convolution

$$x^\dagger_j = \sum_{k=1}^{m} (H^{-1})_{j-k}\, y_k \qquad (2.26)$$
A puzzling implication of the analysis performed in this section is that, even if equation 2.7 is ill-posed because, in general, its solution does not exist, its discrete version, equation 2.9, is well-posed because it has a unique solution which depends continuously on the data. In the next section we show that this solution is, in general, unacceptable from the physical point of view because it is completely corrupted by noise. To this purpose we must investigate error propagation from the data to the solution.
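As a numerical illustration of the inverse filtering procedure of equation 2.23, the following minimal sketch (hypothetical values, not taken from the thesis) performs the DFT division for a 4-point cyclic PSF whose DFT has nearly vanishing coefficients: noise-free data are recovered exactly, while a small perturbation of the data is strongly amplified.

```python
import cmath

def dft(v):
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(v):
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def cyclic_convolve(h, x):
    n = len(x)
    return [sum(h[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def inverse_filter(y, h):
    # equation 2.23: divide the DFTs component by component and transform back
    return [v.real for v in idft([a / b for a, b in zip(dft(y), dft(h))])]

h = [0.251, 0.25, 0.25, 0.249]   # hypothetical cyclic PSF; its DFT has tiny coefficients
x = [1.0, 4.0, 2.0, 3.0]         # hypothetical object
y = cyclic_convolve(h, x)
print(inverse_filter(y, h))       # ~ x: inverse filtering is exact on noise-free data
noisy = [yi + e for yi, e in zip(y, [0.01, -0.01, 0.02, 0.0])]
print(inverse_filter(noisy, h))   # far from x: the small perturbation is amplified
```

The amplification factor is governed by the smallest modulus among the DFT coefficients of the PSF, in agreement with the discussion of the next section.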
2.2.3 The combined effect of ill-conditioning and noise

Let us consider a (small) variation $\delta y$ of the discrete image $y$. The corresponding variation of the solution, $\delta x^\dagger$, is given by

$$\delta x^\dagger = H^{-1}\, \delta y \qquad (2.27)$$

thanks to the linearity of equation 2.24. If we use the bound $\|H^{-1}\| \leq 1/\hat{h}_{min}$, where $\hat{h}_{min}$ is the minimum modulus of the eigenvalues of $H$, we have

$$\|\delta x^\dagger\| \leq \frac{1}{\hat{h}_{min}}\, \|\delta y\| \qquad (2.28)$$

On the other hand, from equation 2.9 and the bound $\|H\| \leq \hat{h}_{max}$, where $\hat{h}_{max}$ is the maximum modulus of the eigenvalues, we get

$$\|y\| \leq \hat{h}_{max}\, \|x^\dagger\| \qquad (2.29)$$

By combining the two bounds we obtain

$$\frac{\|\delta x^\dagger\|}{\|x^\dagger\|} \leq \frac{\hat{h}_{max}}{\hat{h}_{min}}\, \frac{\|\delta y\|}{\|y\|} \qquad (2.30)$$

The quantity

$$\alpha = \frac{\hat{h}_{max}}{\hat{h}_{min}} \qquad (2.31)$$
is called the condition number of the problem and is the quantity which controls error propagation from the data to the solution. A problem with a large condition number is called ill-conditioned while a problem whose condition number is close to one is called well-conditioned.
Figure 2.4: The fundamental drawback of the image reconstruction problem: the inverse solution of the model is wrong!
Remark 7. We remark that there exist some ill-conditioned problems, such as 3D magnetic inversion, for which, by restricting the problem domain to certain subsets of variables, the problem turns out to be better conditioned than on other subsets. In particular, there exist some subsets of the domain on which the problem is well-conditioned. Consequently the condition number, which is a global index, is not sufficient to correctly describe the complexity of the problem. It is preferable to compute the sensitivity, defined as the Jacobian of the forward operator. This quantity points out the local stability of the results of an inverse problem and can also be applied when the problem is nonlinear.

The previous analysis implies that when discretizing an ill-posed problem we usually get a well-posed but ill-conditioned problem and, in particular, a very ill-conditioned one if the discretization is very accurate. If the discretization is rough, the discrete problem can be moderately ill-conditioned and, possibly, nearly well-conditioned if the discretization is very rough. It should be obvious that a large condition number implies numerical instability: for instance, if $\alpha = 10^6$, a relative error on the data of the order of $10^{-6}$ may imply an error of 100% on the solution. Therefore we see that continuous dependence of the solution on the data is necessary but not sufficient to guarantee numerical stability. In the case of synthetic data, when the undistorted object $x_0$ is available, a measure of the improvement introduced by the process of restoration is given by the so-called mean square error improvement factor (MSEIF), defined by
$$MSEIF = 20 \log_{10} \frac{\|y - x_0\|}{\|x^* - x_0\|} \ \mathrm{(dB)}$$

where $y$ is the noisy image and $x^*$ is the restored image.
3 Statistical approach to image reconstruction
Let us assume that we have a complete model in the sense specified in the previous chapter and that we have a detected image $y$ (for simplicity, we do not introduce specific notation for the detected image), i.e. a realization of the random variable $Y$. The problem of image reconstruction is to find an estimate $\bar{x}$ of the unknown object corresponding to the image $y$. The trivial approach would be to look for a solution of the linear equation $Hx + b = y$, but, as we know, this approach is in general not successful since the matrix $H$ is ill-conditioned. The fact that the most frequently used algorithm in tomography, namely the filtered back-projection, derives precisely from the solution of the linear equation is an exception to this rule. Information about the statistical properties of the data suggests looking for statistical approaches to the problem.
3.1 Maximum likelihood formulation

Since we assume to know the probability density $p_Y(y; x)$ of the data and since, in this density, the unknown object appears as a set of unknown parameters, at first glance the problem of image reconstruction appears as a classic problem of parameter estimation. The standard approach is then the so-called maximum likelihood (ML) estimation. In our specific application, for a given detected image $y$, it consists in introducing the likelihood function defined by

$$L^Y_y(x) = p_Y(y; x) \ ;$$

clearly this is a function of $x$ only, since $y$ is given and is just the detected image. Then the ML estimate of the unknown object is any object $x^*$ that maximizes the likelihood function

$$x^* = \arg\max_{x \in \mathbb{R}^n} L^Y_y(x) \ . \qquad (3.1)$$
It is obvious that this definition is meaningful if the likelihood function has maximum points.
In our applications the likelihood function is the product of a very large number of factors, so that it is convenient to take the logarithm of this function; moreover, if we consider the negative logarithm (the so-called neglog), the maximization problem is transformed into a minimization one. Therefore we introduce the functional

$$J_y(x) = -A \ln L^Y_y(x) + B \ , \qquad (3.2)$$

where $A, B$ are suitable constants that can be introduced in order to simplify the expression of the functional. Since the neglog function is strictly decreasing, the problem of equation 3.1 is equivalent to the following one

$$x^* = \arg\min_{x \in \mathbb{R}^n} J_y(x) \ . \qquad (3.3)$$
We reconsider now the three examples of the previous section.
3.1.1 Gaussian case

In the case of additive white Gaussian noise, by a suitable choice of the constants $A, B$, we obtain

$$J_y(x) = \|Hx + b - y\|^2 \ , \qquad (3.4)$$

and therefore the ML approach coincides with the well-known least-squares (LS) approach. It is also well known that the functional of equation 3.4 is convex, nonnegative, locally bounded and coercive (strictly convex if and only if the equation $Hx = 0$ has only the solution $x = 0$). Moreover it always has global minima, i.e. the LS problem always has a solution; but, in the case of image reconstruction, this problem is ill-conditioned, since it is equivalent to the solution of the Euler equation

$$H^T H\, x = H^T (y - b) \ ,$$

and the condition number of the matrix $H$ can be very large. Indeed, the continuous version of this problem is ill-posed (the matrix $H$ comes from the discretization of an integral operator, very often a compact one) and this ill-posed problem is the starting point of the so-called Tikhonov regularization theory (see, for instance, [70, 27]). Therefore, this theory is based on the tacit assumption that the noise affecting the data is additive and Gaussian. We describe the Tikhonov regularization prior in more detail in section 3.3.1.
3.1.2 Poisson case

In the case of Poisson noise, if we introduce the so-called Kullback-Leibler (KL) divergence of a vector $z$ from a vector $y$, defined by

$$D_{KL}(y, z) = \sum_{i=1}^{m} \left\{ y_i \ln \frac{y_i}{z_i} + z_i - y_i \right\} \ , \qquad (3.5)$$
then, with a suitable choice of the constants $A, B$, the functional $J_y(x)$ is given by

$$J_y(x) = D_{KL}(y; Hx + b) = \sum_{i=1}^{m} \left\{ y_i \ln \frac{y_i}{(Hx + b)_i} + (Hx + b)_i - y_i \right\} \ . \qquad (3.6)$$
It is quite natural to take the non-negative orthant as the domain of this functional. Moreover, it is well-known that it is convex, and strictly convex if the equation Hx = 0 has only the solution x = 0 [65], non-negative and locally bounded. Therefore it has global minima.
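Equation 3.5 translates directly into code. A minimal sketch (hypothetical values; the vectors are assumed componentwise positive):

```python
import math

def kl_divergence(y, z):
    # D_KL(y, z) = sum_i { y_i ln(y_i / z_i) + z_i - y_i }   (equation 3.5)
    # y and z are assumed componentwise positive
    return sum(yi * math.log(yi / zi) + zi - yi for yi, zi in zip(y, z))

y = [3.0, 5.0, 2.0]
print(kl_divergence(y, y))                      # 0.0: the divergence vanishes for z = y
print(kl_divergence(y, [2.0, 6.0, 2.5]) > 0.0)  # True: it is nonnegative
```

The linear terms $z_i - y_i$ make the divergence nonnegative even when the two vectors do not have the same 1-norm, which is the relevant situation when $z = Hx + b$.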
3.1.3 Gaussian-Poisson case

In the case of Gauss-Poisson noise, the functional $J_y(x)$ is given by

$$J_y(x) = -\sum_{i=1}^{m} \ln p_i(x; y) \qquad (3.7)$$

where

$$p_i(x; y) = \sum_{l=0}^{+\infty} \frac{e^{-(Hx+b)_i}\, (Hx+b)_i^l}{l!}\; e^{-\frac{1}{2\sigma^2}(l - y_i)^2} \ ,$$

as we described in section 2.1.2. This depends on the fact that, by assuming statistical independence between different pixels, $y$ is a realization of a random vector $Y$ whose probability density is the convolution of a product of $n = \#(S)$ independent Poisson distributions, with expected values $(Hx+b)_i,\ i \in S$, with a product of $n$ independent Gaussian processes, all with expected value $0$ and variance $\sigma^2$. Thanks to the distributivity of convolution over functions of separable variables, the probability density of $Y$ is given by (the object $x$ is treated as a set of unknown parameters while $b$ and $\sigma^2$ are assumed to be known)

$$p_Y(y; x) = \prod_{i \in S} \left( \sum_{l=0}^{+\infty} \frac{e^{-(Hx+b)_i}\, (Hx+b)_i^l}{l!}\; \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - l)^2} \right) \ . \qquad (3.8)$$

Then the likelihood is the function of $x$ obtained by letting $y$ be the detected image, $L^Y_y(x) = p_Y(y; x)$, and the ML approach maximizes this function of $x$. As usual, the maximization problem can be equivalently stated as a minimization problem by considering the neglog of the likelihood. More precisely, we set

$$J_y(x) = -\log L^Y_y(x) - \frac{n}{2} \log(2\pi\sigma^2) \qquad (3.9)$$
$$= -\sum_{i \in S} \log \left\{ \sum_{l=0}^{+\infty} \frac{e^{-(Hx+b)_i}\, (Hx+b)_i^l}{l!}\; e^{-\frac{1}{2\sigma^2}(l - y_i)^2} \right\}$$
$$= \sum_{i \in S} \left\{ (Hx+b)_i - \log \sum_{l=0}^{+\infty} \frac{(Hx+b)_i^l}{l!}\; e^{-\frac{1}{2\sigma^2}(l - y_i)^2} \right\} \ .$$
As far as we know, no result is available about the ill-posedness of this minimization problem.
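The functional of equation 3.9 can be evaluated numerically by truncating the series, whose terms decay factorially. The sketch below (hypothetical values, with $\sigma = 1$) also illustrates its non-negativity and its convexity along a scalar section, properties proved in section 3.1.4.

```python
import math

def neglog_pg(hxb, y, sigma, lmax=60):
    # J_y(x) = sum_i { (Hx+b)_i - log sum_l (Hx+b)_i^l / l! * exp(-(l - y_i)^2 / (2 sigma^2)) }
    # hxb holds the components (Hx+b)_i; the series is truncated at lmax terms
    total = 0.0
    for t, yi in zip(hxb, y):
        s = sum(t ** l / math.factorial(l) * math.exp(-(l - yi) ** 2 / (2 * sigma ** 2))
                for l in range(lmax))
        total += t - math.log(s)
    return total

# Non-negativity: the inner series is bounded by e^t, so each addend is >= 0
print(neglog_pg([4.0, 2.5], [5.0, 3.0], 1.0) >= 0.0)   # True
# Midpoint convexity along one component of Hx+b
lhs = neglog_pg([3.0], [5.0], 1.0) + neglog_pg([5.0], [5.0], 1.0)
print(lhs >= 2 * neglog_pg([4.0], [5.0], 1.0))          # True
```

For large values of $(Hx+b)_i$ a careful implementation would rescale the terms to avoid overflow; the truncation level used here is only adequate for the small illustrative values.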
3.1.4 Existence of solutions of the ML problem

We know that the existence of solutions of the ML problem is not always guaranteed. In the case of Gaussian or Poisson noise, simple convexity considerations lead to the existence and uniqueness of solutions, but for the mixture of these two noises this has to be proved. Hence, in this section we investigate the problem of the existence of solutions of the ML problem in the case of Poisson-Gaussian noise. We first prove the existence of solutions of the ML problem by investigating the properties of the neglog of the likelihood function. Indeed, we will prove that this functional is also convex (strictly convex if the equation $Hx = 0$ has the unique solution $x = 0$), nonnegative and locally bounded. Therefore it also has global minima on any convex subspace of its domain. We first prove the concavity of the likelihood function $L^Y_y(x)$, which is a direct consequence of the following simple lemma, for which no reference could be found in the classical estimation theory literature.

Lemma 1. Consider the data model $Y = Z + E$, where $Z$ and $E$ are two independent random vectors: $Z$ denotes the "signal of interest", which depends on the set of unknown parameters $x$, and $E$ denotes an additive noise with known parameters. If the likelihood function of the model $Z$ is concave (resp. strictly concave), then the likelihood of the noisy model $Y$ is also concave (resp. strictly concave).

Proof. Define $p_Z(z; x)$ and $p_E(e)$ to be the probability density functions (pdf) of $Z \in \mathcal{Z}$ (support of $Z$) and $E$, respectively. Since the two vectors are independent, the pdf of $Y$ is

$$p_Y(y; x) = \int p_Z(u; x)\, p_E(y - u)\, du \ . \qquad (3.10)$$

Consequently the likelihoods $L^Y_y(x)$ and $L^Z_y(x)$ of the "noisy" and "noiseless" models for a given observed vector $y$ are related by

$$L^Y_y(x) = \int L^Z_u(x)\, p_E(y - u)\, du \ . \qquad (3.11)$$

Since $p_E(e) \geq 0$, it can be easily checked that if, for all $u$, $x_1$, $x_2$ and $t \in (0,1)$, $L^Z_u(t x_1 + (1-t) x_2) \geq t\, L^Z_u(x_1) + (1-t)\, L^Z_u(x_2)$, then equation 3.11 implies that the same result holds for $L^Y_y(x)$.

Assume now that, for all $u$, $L^Z_u(x)$ is strictly concave. If $E$ is continuous, there necessarily exists a nonempty open subset $\mathcal{U} \subset \mathcal{Z}$ where, for all $u \in \mathcal{U}$, $p_E(y - u) > 0$ and $L^Y_y(x) = \int_{\mathcal{U}} L^Z_u(x)\, p_E(y - u)\, du$. Reasoning analogous to that given above proves that the strict concavity of $L^Z_u(x)$ implies the strict concavity of $L^Y_y(x)$.

Proposition 1. The function $J_y(x)$, defined in equation 3.9, is convex on its domain and also on the closed and convex set $C$ of the non-negative vectors. It is strictly convex if the equation $Hx = 0$ has the unique solution $x = 0$.

Proof. Lemma 1 relates the concavity of $L^Y_y(x)$ to the concavity of $L^Z_y(x)$, or equivalently the convexity of $J_y(x)$ to the convexity of the neglog-likelihood of the signal of interest
$Z$. The proof then follows from standard results of image reconstruction for Poisson noise, which state that the corresponding neglog-likelihood is convex, and strictly convex if the null space of $H$ contains only the null element [65].

Proposition 2. The function $J_y(x)$ is nonnegative and locally bounded on its domain and also on $C$. Therefore it has minima and all minima are global. The minimum is unique if the equation $Hx = 0$ has the unique solution $x = 0$.

Proof. The non-negativity follows from the second expression in equation 3.9, since the series is bounded by 1. The local boundedness of the function follows from the convergence of the series for any $x$. Then the existence of global minima follows from the previous Proposition.

The previous results demonstrate the existence of solutions of the ML problem. It is possible to gain more insight into the properties of the functional $J_y(x)$, and hence of the minimization algorithm investigated in the next section, by introducing the following functions

$$p(s; t) = \sum_{l=0}^{+\infty} \frac{s^l}{l!}\; e^{-\frac{1}{2\sigma^2}(l-t)^2} \qquad (3.12)$$

$$q(s; t) = \sum_{l=1}^{+\infty} \frac{s^{l-1}}{(l-1)!}\; e^{-\frac{1}{2\sigma^2}(l-t)^2} = \sum_{l=0}^{+\infty} \frac{s^l}{l!}\; e^{-\frac{1}{2\sigma^2}(l+1-t)^2} \qquad (3.13)$$

$$r(s; t) = \sum_{l=2}^{+\infty} \frac{s^{l-2}}{(l-2)!}\; e^{-\frac{1}{2\sigma^2}(l-t)^2} = \sum_{l=1}^{+\infty} \frac{s^{l-1}}{(l-1)!}\; e^{-\frac{1}{2\sigma^2}(l+1-t)^2} \qquad (3.14)$$

which are defined for any non-negative value of $s$ and any real value of $t$. The functions $q(s; t)$ and $r(s; t)$ are, respectively, the first and second partial derivative of $p(s; t)$ with respect to $s$. If we introduce the vectors $p(x)$, $q(x)$ and $r(x)$, whose components are given by

$$p_i(x) = p\big((Hx+b)_i,\, y_i\big) \ , \quad q_i(x) = q\big((Hx+b)_i,\, y_i\big) \ , \quad r_i(x) = r\big((Hx+b)_i,\, y_i\big) \ ; \quad i \in S \ , \qquad (3.15)$$

and the weighting vector $h > 0$ with components

$$h_j = \sum_{i \in S} H_{i,j} \ ; \quad j \in R \ , \qquad (3.16)$$

then the gradient and the Hessian of $J_y(x)$ are given by

$$\nabla J_y(x) = h - H^T \frac{q(x)}{p(x)} \ , \quad \nabla^2 J_y(x) = H^T D(x) H \ , \quad D(x) = \mathrm{diag}\left\{ \frac{q^2(x) - p(x)\, r(x)}{p^2(x)} \right\} \ , \qquad (3.17)$$

where the product and quotient of two vectors are defined component by component (Hadamard product and quotient).
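These definitions can be checked numerically. The following sketch (with $\sigma = 1$ and hypothetical values of $s$ and $t$) verifies that $q$ is the first partial derivative of $p$ with respect to $s$, and that the quotient appearing in $D(x)$ is positive, in agreement with Lemma 2 below:

```python
import math

def p(s, t, sigma=1.0, lmax=80):
    # equation 3.12, truncated at lmax terms
    return sum(s ** l / math.factorial(l) * math.exp(-(l - t) ** 2 / (2 * sigma ** 2))
               for l in range(lmax))

def q(s, t, sigma=1.0, lmax=80):
    # equation 3.13: first partial derivative of p with respect to s
    return sum(s ** (l - 1) / math.factorial(l - 1) * math.exp(-(l - t) ** 2 / (2 * sigma ** 2))
               for l in range(1, lmax))

def r(s, t, sigma=1.0, lmax=80):
    # equation 3.14: second partial derivative of p with respect to s
    return sum(s ** (l - 2) / math.factorial(l - 2) * math.exp(-(l - t) ** 2 / (2 * sigma ** 2))
               for l in range(2, lmax))

s, t, eps = 3.0, 4.0, 1e-6
fd = (p(s + eps, t) - p(s - eps, t)) / (2 * eps)      # central finite difference
print(abs(q(s, t) - fd) < 1e-4)                        # True: q = dp/ds
print(q(s, t) ** 2 - p(s, t) * r(s, t) > 0.0)          # True: the Lemma 2 quantity is positive
```

The positivity of $q^2 - p\,r$ is exactly what makes the diagonal matrix $D(x)$ positive, so that the Hessian $H^T D(x) H$ is positive semidefinite.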
The previous results on the convexity of $J_y(x)$ imply that the quotient appearing in the expression of the Hessian is nonnegative. In the next Lemma we prove that it is always positive, thus proving that convexity and strict convexity depend only on the properties of the imaging matrix $H$.

Lemma 2. For any $t$ and $s > 0$, we have

$$q^2(s; t) - p(s; t)\, r(s; t) > 0 \ ,$$

where the functions $p, q, r$ are defined in equations 3.12-3.14.

Proof. We set

$$\xi_t(l) = \exp\left\{ -\frac{1}{2\sigma^2}\, (l - t)^2 \right\} \ ,$$

and observe that, for any $t$,

$$\xi_t(l+1)\, \xi_t(k) - \xi_t(l)\, \xi_t(k+1) > 0 \ , \quad k > l \ . \qquad (3.18)$$

Then, if we write $q^2(s; t)$ as the product of the two (equivalent) series given in equation 3.13, use the expression of $p(s; t)$ given in equation 3.12, and express $r(s; t)$ as the second series given in equation 3.14, we get

$$q^2(s; t) - p(s; t)\, r(s; t) = \sum_{k=0}^{+\infty} \sum_{l=0}^{+\infty} \frac{s^{k+l-1}}{k!\, l!}\; k \left[ \xi_t(l+1)\, \xi_t(k) - \xi_t(l)\, \xi_t(k+1) \right] \ .$$

If we split the sum with respect to $l$ into a sum from $0$ to $k$ and a sum from $k$ to $+\infty$ (note that the terms with $l = k$ are zero), then, by exchanging indexes in the second sum and collecting the two terms, we have

$$q^2(s; t) - p(s; t)\, r(s; t) = \sum_{k=0}^{+\infty} \sum_{l=0}^{k} \frac{s^{k+l-1}}{k!\, l!}\; (k - l) \left[ \xi_t(l+1)\, \xi_t(k) - \xi_t(l)\, \xi_t(k+1) \right] \ ,$$
and therefore the series is positive, because of inequality 3.18.

Proposition 3. The function $J_y(x)$ is strictly convex if and only if the equation $Hx = 0$ has the unique solution $x = 0$.

Proof. The proof follows from the previous lemma and equation 3.17.

The existence of solutions of the ML problem does not imply that we have obtained sensible estimates of the unknown object. It should also be proved that these solutions are stable with respect to noise fluctuations. We have evidence that this property does not hold true, as follows from the analysis of two related noise models. It is well known that least-squares solutions, which coincide with the ML solutions in the case of additive Gaussian noise, are widely oscillating as a consequence of noise propagation. This effect is partially reduced, but not suppressed, if one introduces the additional constraints of
non-negativity and flux conservation (a constraint on the 1-norm of the solution). These constraints are automatically satisfied by the solutions of the ML problem in the case of Poisson noise and zero background [65]. However, these solutions are affected by the so-called checkerboard effect, which is a consequence of noise propagation [57]: they are zero in a large number of pixels and take large values in the others. The noise model considered in this section is intermediate between the previous ones and therefore we expect the ML solutions to have a similar behavior. This conjecture is supported by numerical results obtained with the iterative method investigated in the following chapter.

Remark 8. The previous paragraphs demonstrate that, in the case of image reconstruction, ML problems are ill-posed or ill-conditioned. This means that one is not interested in computing the minimum points $x^*$ of the functionals corresponding to the different noise models, because they do not provide sensible estimates $\bar{x}$ of the unknown object. However, as is known, the ML approach deserves an accurate analysis. The previous remark only implies that one must be very careful in applying to these problems methods derived from optimization theory. In particular, in our opinion, very efficient methods, such as second-order methods, pointing directly to a minimum, can be dangerous. On the other hand, numerical experience (and, in some cases, also theoretical results) demonstrates that first-order methods can provide acceptable (regularized) solutions by early stopping. In the framework of regularization theory, the study of iterative methods with such a property (we only mention the Landweber, steepest descent and conjugate gradient methods) is a widely investigated topic.
3.2 Regularization by early stopping of iterative algorithms

The iterative methods used to solve the ML problem usually consist of recursive formulas which converge to the ML solution. Each iteration improves the approximation of the ML solution, but, as we discussed above, the ML solution is physically wrong. Numerical experience confirms that, up to a certain level of approximation of the ML solution, the approximate solution is physically acceptable, while beyond this level the corruption due to noise propagation prevails more and more. This property is called semi-convergence and it is well known to researchers as an empirical property of the algorithms. In the next paragraph we will give a formal definition of semi-convergence and we will see that this property does not always hold true. However, at least in the case of the Landweber algorithm, by taking into account the noise statistics and the distribution of the eigenvalues of the PSF, we will prove that the expected value of the reconstruction error is semi-convergent.

In the following we recall the Landweber algorithm. It is the standard method when the noise is distributed as a white Gaussian random variable of given variance $\sigma^2$. It is the gradient-type algorithm applied to the least-squares function described in equation 3.4. For simplicity, and without loss of generality, in the rest of this section we assume that $b = 0$. The Landweber iterative method is well known in the literature. One of its most common applications is the estimation of the solution of an ill-conditioned
Figure 3.1: The geometric representation of a convergent algorithm which is semi-convergent with respect to the "true" solution $x^{(0)}$.
problem. The estimation takes place iteratively. The number of iterations plays the role of a regularization parameter. It is experimentally known that the method has a semi-convergent behavior with respect to the "true" solution of the problem: up to a certain number of iterations the reconstruction error, computed in the $\ell^2$ norm, decreases, while beyond this number of iterations it increases. In this work we show a very simple example in which the reconstruction error does not have this behavior. Moreover, we show that this semi-convergence property is an "expected property" with respect to the Gaussian statistics of the noise on the data and to the eigenvalue distribution of an ill-conditioned image reconstruction problem. We are particularly interested in the Landweber method because it is the unique algorithm for which a closed form is known. In general, the Landweber algorithm with step-length 1 is given by the recursive formula

$$x^{k+1} = \left( I - H^T H \right) x^k + H^T y$$

If one looks for the relation between $x^k$ and a fixed initialization $x^0$, by the composition rule one gets

$$x^k = \left( I - H^T H \right)^k x^0 + \sum_{j=0}^{k-1} \left( I - H^T H \right)^j H^T y \qquad (3.19)$$

Equation 3.19 is the coordinate expression of a discrete path $\tilde\gamma : k \in \mathbb{Z}_+ \to \mathbb{R}^n$. When $(H^T H)^{-1}$ exists, by the geometric progression we have

$$x^k = \left( I - H^T H \right)^k x^0 + \left[ I - \left( I - H^T H \right)^k \right] (H^T H)^{-1} H^T y \qquad (3.20)$$

Equation 3.20 is the coordinate expression of another path $\gamma : k \in \mathbb{R}_+ \to \mathbb{R}^n$. The relation $\gamma(k) = \tilde\gamma(k)$ when $k \in \mathbb{Z}_+$ holds by construction.
Remark 9. Expressed in the form of equation 3.20, one can easily note two fundamental things. The first is that we can see the iteration parameter as a real parameter. Consequently, equation 3.20 is the coordinate expression of a path (or trajectory) which joins
two points: $x^0$ and $x^\infty = (H^T H)^{-1} H^T y$. Convergence is obvious. The second is that, if $H = \lambda I$ with $\lambda > 0$, then this path would be a line segment (if $\lambda = 1$ the path degenerates to a single point); hence, in general, the curvature of the path depends on the forward system matrix $H$. Moreover, we can write the algorithm in the frequency domain, where it takes the form

$$\hat{x}^t_i = \left( 1 - |\hat{h}_i|^2 \right)^t \hat{x}^0_i + \left[ 1 - \left( 1 - |\hat{h}_i|^2 \right)^t \right] \frac{\hat{y}_i}{\hat{h}_i} \qquad (3.21)$$

where the $|\hat{h}_i|$ are the singular values of the forward operator $H$, and hence the $|\hat{h}_i|^2$ are the eigenvalues of the operator $H^T H$. In the frequency space, equation 3.21 is the coordinate expression of the path $\hat\gamma : t \in \mathbb{R}_+ \to \mathbb{R}^n$, that is, the Fourier transform of $\gamma$. Restricted to each component, it is the parametrization of the line segment $(\hat{x}^0_i,\ \hat{y}_i/\hat{h}_i)$.

Remark 10. Two different line segments $\hat{x}^t_i$ and $\hat{x}^t_j$ with different eigenvalues $|\hat{h}_i|^2$ and $|\hat{h}_j|^2$ are covered by the algorithm with different velocities, depending on the respective eigenvalue, i.e.

$$\frac{|\hat{x}^t_i - \hat{x}^0_i|}{\left| \hat{y}_i/\hat{h}_i - \hat{x}^0_i \right|} < \frac{|\hat{x}^t_j - \hat{x}^0_j|}{\left| \hat{y}_j/\hat{h}_j - \hat{x}^0_j \right|} \quad \text{if } |\hat{h}_i|^2 < |\hat{h}_j|^2$$

for all $t > 0$.
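For a cyclic matrix $H$, both the recursion and the closed form of equation 3.21 can be evaluated in the Fourier domain. The following sketch (hypothetical 4-point PSF and object, noise-free data) checks that the iterates and the closed form coincide:

```python
import cmath

def dft(v):
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

# hypothetical 4-point circulant PSF and object (illustrative values)
h = [0.5, 0.3, 0.1, 0.1]
x_true = [1.0, 4.0, 2.0, 3.0]
hh = dft(h)
yh = [a * b for a, b in zip(hh, dft(x_true))]    # noise-free data, Fourier domain

# Landweber in the Fourier domain: multiplying by H (circulant) multiplies the
# spectrum by hh, and multiplying by H^T multiplies it by conj(hh)
k = 40
xk = [0j] * len(h)                                # initialization x^0 = 0
for _ in range(k):
    xk = [xi + hi.conjugate() * (yi - hi * xi) for xi, hi, yi in zip(xk, hh, yh)]

# closed form (3.21) with x^0 = 0
closed = [(1 - (1 - abs(hi) ** 2) ** k) * yi / hi for hi, yi in zip(hh, yh)]
print(max(abs(a - b) for a, b in zip(xk, closed)))  # ~ 0, agreement to machine precision
```

The closed form makes it possible to evaluate the trajectory at an arbitrary real "iteration" $t$, which is exactly what the analysis of the next paragraph exploits.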
3.2.1 On the semi-convergence of the Landweber algorithm

We can theoretically compute the reconstruction error as a distance between the object $x$ and the $t$-th estimate of the algorithm. This computation is purely theoretical, since in a real case we obviously cannot access the object $x$; but it leads us to some qualitative results, which are the goal of this paragraph. In general, a reconstruction error can be defined as the pull-back on the curve $\gamma$, $E = \gamma^*(d)$, of a distance $d$ defined on the entire space $\mathbb{R}^n$. There are many ways to compute the distance between two points in $\mathbb{R}^n$; for our theoretical aim we restrict our attention to the Euclidean distance. By the Parseval equality we can compute the reconstruction error as

$$E(t) = \|x^t - x\|^2 = \frac{1}{n}\, \|\hat{x}^t - \hat{x}\|^2 \ .$$

From equation 3.21 we can easily compute the difference $\hat{x}^t_i - \hat{x}_i$, that is

$$\hat{x}^t_i - \hat{x}_i = \left( 1 - |\hat{h}_i|^2 \right)^t \left[ (\hat{x}^0_i - \hat{x}_i) - \frac{\hat{w}_i}{\hat{h}_i} \right] + \frac{\hat{w}_i}{\hat{h}_i}$$
Hence, the Euclidean distance between $x^t$ and $x$ is

$$E(t) = \frac{1}{n} \sum_i \left\{ \left( 1 - |\hat{h}_i|^2 \right)^{2t} \left[ |\hat{x}^0_i - \hat{x}_i|^2 - 2\,\Re\!\left( (\hat{x}^0_i - \hat{x}_i)\, \frac{\hat{w}_i^*}{\hat{h}_i^*} \right) + \frac{|\hat{w}_i|^2}{|\hat{h}_i|^2} \right] + 2 \left( 1 - |\hat{h}_i|^2 \right)^{t} \left[ \Re\!\left( (\hat{x}^0_i - \hat{x}_i)\, \frac{\hat{w}_i^*}{\hat{h}_i^*} \right) - \frac{|\hat{w}_i|^2}{|\hat{h}_i|^2} \right] + \frac{|\hat{w}_i|^2}{|\hat{h}_i|^2} \right\} \qquad (3.22)$$

The reconstruction error 3.22 is thus the sum, over the frequencies, of three addends which can be convex or concave functions of $t$, according to the sign of the random variable $\hat{w}_i$.

As already mentioned, when solving a set of linear ill-posed equations by an iterative method, typically the iterates first improve, while at later stages the influence of the noise becomes more and more noticeable. This phenomenon can be expressed in terms of the reconstruction error by requiring that it is a decreasing function of $t$ until a certain value (optimal for the reconstruction), and an increasing function of $t$ afterwards. Since no definition of the semi-convergence of a sequence or of an algorithm can be found in the literature, we need to introduce the concept of semi-convergence for an iterative scheme.

Definition 1. Let $N$ be a normed space. We say that a sequence $\tilde\gamma$ in $N$ is semi-convergent with respect to a point $x \in N$ when there exists a value $\bar{k} \in \mathbb{Z}_+$ such that $D(\tilde\gamma(k), x) < D(\tilde\gamma(k-1), x)$ for all $k \leq \bar{k}$ and $D(\tilde\gamma(k), x) > D(\tilde\gamma(k-1), x)$ for all $k > \bar{k}$.

Analogously we define the semi-convergence of a curve with respect to a point:

Definition 2. Let $N$ be a normed space. We say that a curve $\gamma$ in $N$ is semi-convergent with respect to a point $x \in N$ when there exists a value $\bar{t} \in \mathbb{R}_+$ such that

$$\frac{d}{dt}\, D(\gamma(t), x) < 0 \ \text{ for all } t < \bar{t} \quad \text{and} \quad \frac{d}{dt}\, D(\gamma(t), x) > 0 \ \text{ for all } t > \bar{t} \ .$$

Usually, the algorithms being convergent, the reconstruction error tends to a limit value when $t$ tends to infinity. For the Landweber algorithm this limit value is

$$\lim_{t \to \infty} \frac{1}{n}\, \|\hat{x}^t - \hat{x}\|^2 = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{w}_i|^2}{|\hat{h}_i|^2} \qquad (3.23)$$

because, since $0 < |\hat{h}_i|^2 \leq 1$ for each $i \in \{1, \ldots, n\}$, we have

$$\lim_{t \to \infty} \left( 1 - |\hat{h}_i|^2 \right)^t = 0 \ .$$
In the following we give an example showing that the Landweber method may not have a semi-convergent behavior.
Example 2. A case of non-semi-convergent behavior. We show a case in which the Landweber method does not have the semi-convergence property. We consider a system defined by equation 2.9. We suppose $n = 2$, $[x^{(0)}_1, x^{(0)}_2] = [17.5,\ 277.5]$, and the matrix

$$H = \begin{bmatrix} 0.055 & 0.045 \\ 0.045 & 0.055 \end{bmatrix} \ .$$

By taking $[w_1, w_2] = [5.0,\ 1.5]$, thanks to equation 2.9 we have $[y_1, y_2] = [18.45,\ 17.55]$. In this case we can explicitly compute the non-semi-convergent reconstruction error, as shown in figure 3.2.
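Example 2 can be reproduced directly. The following sketch runs the Landweber recursion on the $2 \times 2$ system and traces the reconstruction error at a few arbitrary iteration counts, so that the curve of figure 3.2 can be inspected point by point:

```python
import math

H = [[0.055, 0.045], [0.045, 0.055]]
x_true = [17.5, 277.5]
w = [5.0, 1.5]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

y = [hx + wi for hx, wi in zip(matvec(H, x_true), w)]
print(y)  # ~ [18.45, 17.55], as in the example

def error_at(k):
    # Landweber with step-length 1 from x^0 = 0: x^{k+1} = x^k + H^T (y - H x^k);
    # H is symmetric here, so H^T = H
    x = [0.0, 0.0]
    for _ in range(k):
        r = [yi - hxi for yi, hxi in zip(y, matvec(H, x))]
        x = [xi + gi for xi, gi in zip(x, matvec(H, r))]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_true)))

# trace the l2 reconstruction error E(k) at a few (arbitrary) iteration counts
for k in (0, 200, 100000):
    print(k, error_at(k))
```

Because the two eigenvalues of $H$ (0.1 and 0.01) differ by an order of magnitude, the two frequency components of the error evolve on very different time scales, which is the mechanism behind the behavior discussed in the example.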
Figure 3.2: The reconstruction error as a function of the number of iterations.

This toy example shows that the semi-convergent behavior is not an intrinsic property of the Landweber algorithm: it also depends on the specific realization of the noise and on the eigenvalues of the matrix $H$. Nevertheless, in the experimental cases this property is generally verified. Indeed, we will show what we call the tendency of the Landweber algorithm to semi-converge with respect to the generalized solution $x^\dagger$. A specific circumstance mainly contributes to this tendency, namely the distribution of the moduli of the eigenvalues of the operator $H^T H$. In the previous example we have chosen two different eigenvalues. But in general, in an image formation system, the moduli of the eigenvalues have a very particular distribution: the majority of the eigenvalues are very small and only a small fraction of them is significantly larger than zero. This depends on the fact that an image formation system is typically characterized by a loss of information about the input signal, in particular about the higher frequencies. An example of this kind of distribution is shown in figure 3.3. But let us proceed in an orderly fashion. First we prove the following.
Proposition 4. Any frequency component of the expected reconstruction error with respect to the white Gaussian noise $w$ of variance $\sigma^2$ is given by

$$\frac{1}{n}\, E_w\!\left( |\hat{x}^t_i - \hat{x}_i|^2 \right) = \frac{|\hat{x}^0_i - \hat{x}_i|^2}{n} \left( 1 - |\hat{h}_i|^2 \right)^{2t} + \frac{\sigma^2}{|\hat{h}_i|^2} \left( 1 - |\hat{h}_i|^2 \right)^{2t} - 2\, \frac{\sigma^2}{|\hat{h}_i|^2} \left( 1 - |\hat{h}_i|^2 \right)^{t} + \frac{\sigma^2}{|\hat{h}_i|^2} \qquad (3.24)$$

Proof. Since $w$ is an $n$-dimensional white Gaussian random vector of variance $\sigma^2$, its Fourier transform $\hat{w}$ is a white Gaussian random vector of variance $n\sigma^2$; hence $E(\hat{w}_i) = 0$ and $E(\hat{w}_i \hat{w}_j^*) = n\sigma^2 \delta_{ij}$ for every $i, j \in \{1, \ldots, n\}$. By applying these rules to any frequency component of equation 3.22 we obtain the thesis.

As concerns the behavior of the expected reconstruction error as a function of $t$, we know that for $t \to \infty$ the function tends to a constant value; precisely, thanks to equation 3.23,

$$\lim_{t \to \infty} E_w\!\left( \frac{1}{n}\, \|\hat{x}^t - \hat{x}\|^2 \right) = \sigma^2 \sum_{i=1}^{n} \frac{1}{|\hat{h}_i|^2} \qquad (3.25)$$

Moreover, if $x^0 \neq x$, the function strongly decreases near $t = 0$, since

$$\left. \frac{d}{dt}\, E_w\!\left( \frac{1}{n}\, \|\hat{x}^t - \hat{x}\|^2 \right) \right|_{t=0} = 2 \sum_{i=1}^{n} \log\!\left( 1 - |\hat{h}_i|^2 \right) \frac{|\hat{x}^0_i - \hat{x}_i|^2}{n} < 0 \ .$$
In general, without any additional hypothesis on the eigenvalues of the system, nothing can be said about the behavior of this function between 0 and ∞, i.e. whether it is increasing or decreasing. Indeed, the expected reconstruction error is the sum of n terms, each depending on |ĥ_i|^2. Each of these terms has a unique minimum, whose location depends on the value of |ĥ_i|^2. Hence, if the eigenvalues belong to two circles around the origin of the complex plane, only two different moduli of the eigenvalues are available. Then the expected reconstruction error would be the sum of two sets of functions, the first with minima located at a common abscissa t_0 and the second at another abscissa t_1. The resulting sum can have two minima, and so it would not be semi-convergent. This situation is summarized by example 2.

Remark 11. Usually, in an image formation system, the distribution of the moduli |ĥ_i|^2 of the eigenvalues is a decreasing function with support in (0, 1). In a continuous setup we can consider the following family of distributions

F^c_{|ĥ|^2}(x) = −((c + 1)^2 / c^2) x^{1/c} log x
for any c > 1 which, when c is large enough, is a good approximation of the distribution of the |ĥ_i|^2.

Figure 3.3: The distribution of the modulus of the eigenvalues F^c_{|ĥ|^2} for varying values of c. As c increases, the maximum of the distribution increases and its position shifts toward zero. Among this family of distributions, by taking c large enough, we can find distributions corresponding to ill-conditioned image formation systems.

Proposition 5. The expected value with respect to |ĥ|^2 of the expected reconstruction error E_w(‖x^t − x‖^2) is

E_{|ĥ|^2} E_w(‖x^t − x‖^2) = ((c + 1)^2 / c^2) [ ‖x^0 − x‖^2 R(2t + 1, 1 + 1/c) + nσ^2 R(2t + 1, 1/c) − 2nσ^2 R(t + 1, 1/c) + nσ^2 c^2 ] ,   (3.26)

where the function R is defined as

R(a, b) = B(a, b) [H_{a+b−1} − H_{b−1}] ,

B being the beta function and H the (generalized) harmonic number.
Proof. The expected values of the t-dependent factors appearing in the terms of the sum of 3.24 are given by

E_{|ĥ|^2}[(1 − |ĥ|^2)^{2t}] = ∫_0^1 (1 − x)^{2t} F^c_{|ĥ|^2}(x) dx = ((c + 1)^2 / c^2) R(2t + 1, 1 + 1/c) ,

E_{|ĥ|^2}[(1 − |ĥ|^2)^{2t} / |ĥ|^2] = ∫_0^1 ((1 − x)^{2t} / x) F^c_{|ĥ|^2}(x) dx = ((c + 1)^2 / c^2) R(2t + 1, 1/c) ,

E_{|ĥ|^2}[(1 − |ĥ|^2)^t / |ĥ|^2] = ∫_0^1 ((1 − x)^t / x) F^c_{|ĥ|^2}(x) dx = ((c + 1)^2 / c^2) R(t + 1, 1/c) ,

E_{|ĥ|^2}[1 / |ĥ|^2] = ∫_0^1 (1/x) F^c_{|ĥ|^2}(x) dx = (c + 1)^2 ,

respectively. The expected value with respect to |ĥ|^2 of any term E_w|x̂_i^t − x̂_i|^2 of the expected reconstruction error is
E_{|ĥ|^2}( E_w |x̂_i^t − x̂_i|^2 ) = ((c + 1)^2 / c^2) [ (|x̂_i^0 − x̂_i|^2 / n) R(2t + 1, 1 + 1/c) + σ^2 R(2t + 1, 1/c) − 2σ^2 R(t + 1, 1/c) + σ^2 c^2 ] .
Hence, since

E_{|ĥ|^2}[ E_w(‖x^t − x‖^2) ] = E_{|ĥ|^2}[ E_w Σ_{i=1}^n |x̂_i^t − x̂_i|^2 ] = Σ_{i=1}^n E_{|ĥ|^2}( E_w |x̂_i^t − x̂_i|^2 ) ,

the thesis holds true.

One can numerically verify that the function 3.26 is semi-convergent for all values of c and ‖x^0 − x‖^2. Figure 3.4 shows the behavior of this function for some values of its parameters.
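The semi-convergence of 3.26 can indeed be checked numerically. The sketch below implements R(a, b) = B(a, b)[H_{a+b−1} − H_{b−1}] through the log-gamma and digamma functions (using H_z = ψ(z + 1) + γ; the digamma routine is a standard recurrence-plus-asymptotic-series implementation, not part of the thesis) and scans the right-hand side of 3.26 over t for hypothetical values of n, σ^2 and ‖x^0 − x‖^2.

```python
import math

GAMMA = 0.5772156649015329          # Euler-Mascheroni constant

def digamma(x):
    # Recurrence pushes the argument above 6, then an asymptotic series.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def H(z):
    """Generalized harmonic number H_z = psi(z + 1) + gamma."""
    return digamma(z + 1.0) + GAMMA

def B(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def R(a, b):
    return B(a, b) * (H(a + b - 1.0) - H(b - 1.0))

def expected_error(t, c, n, sigma2, d0):
    """Right-hand side of equation 3.26, with d0 = ||x0 - x||^2."""
    k = (c + 1.0) ** 2 / c ** 2
    return k * (d0 * R(2 * t + 1, 1 + 1.0 / c)
                + n * sigma2 * R(2 * t + 1, 1.0 / c)
                - 2 * n * sigma2 * R(t + 1, 1.0 / c)
                + n * sigma2 * c ** 2)

c, n, sigma2, d0 = 4.0, 256, 1e-2, 50.0      # hypothetical values
errs = [expected_error(t, c, n, sigma2, d0) for t in range(400)]
t_best = min(range(len(errs)), key=lambda t: errs[t])
print(t_best, errs[0], errs[t_best], errs[-1])
```

At t = 0 the noise terms cancel exactly and the value reduces to ‖x^0 − x‖^2, which gives a built-in consistency check of the implementation.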
3.3 Regularization in a Bayesian setting

Remark 8 is not surprising in the framework of inverse problems theory. Indeed, it is generally accepted that, if the formulation of the problem does not use some additional information on the object, then the resulting problem is ill-posed. This is what happens in the maximum likelihood approach, because we only use information about the noise. The additional information may consist, for instance, of prescribed bounds on the solution and/or its derivatives up to a certain order (in general not greater than two). These
Figure 3.4: Upper plot: the function E_{|ĥ|^2} E_w(‖x^t − x‖^2) with ‖x^0 − x‖^2 = 50 and c = 1, 2, 4, 8, 16, 32. Lower plot: the same function with ‖x^0 − x‖^2 = 10000 and c = 1, 5, 25, 5^3, 5^4, 5^5.
prescribed bounds can be introduced in the problem as additional constraints in the variational formulation provided by ML. However, in this chapter we adopt a completely probabilistic approach, called the Bayesian approach, where the additional information is given in the form of statistical properties of the object. In other words, one assumes that the unknown object x is also a realization of a (vector-valued) random variable X. Then a different interpretation of the probability density p_Y(y; x) is introduced: it is considered as the conditional probability density of Y when the random variable X assumes the value x

p_Y(y; x) = p_Y(y | X = x) .

For simplicity we will write p_Y(y|x). Then additional information on the unknown object x is introduced by providing the probability density of X, the so-called prior, which will be denoted by p_X(x). The most frequently used priors are of the Gibbs type, i.e. they have the following form

p_X(x) = (1/Z) e^{−µΩ(x)} ,   (3.27)
where Z is a normalization constant, µ is a positive parameter (a hyper-parameter in the statistical language, a regularization parameter in the language of regularization theory), while Ω(x) is a functional, possibly convex. The previous assumptions imply that the joint probability density of the random variables X, Y is given by

p_{XY}(x, y) = p_Y(y|x) p_X(x) .

If we introduce the marginal probability density of Y

p_Y(y) = ∫ p_{XY}(x, y) dx ,

from Bayes' formula we obtain the conditional probability density of X for a given value y of Y

p_X(x|y) = p_{XY}(x, y) / p_Y(y) = p_Y(y|x) p_X(x) / p_Y(y) .

If in this equation we insert the detected value y of the image, we obtain the a posteriori probability density of X

P_y^X(x) = p_X(x|y) = L_y^Y(x) p_X(x) / p_Y(y) .   (3.28)
Then a maximum a posteriori (MAP) estimate of the unknown object is defined as any object x* that maximizes the a posteriori probability density

x* = argmax_{x ∈ R^n} P_y^X(x) .
As in the case of the likelihood, it is convenient to consider the neglog function of P_y^X(x). If we assume a Gibbs prior as that given in equation 3.27 and take into account the definition of equation 3.2, we can introduce the following functional

J_µ(x) = −A ln P_y^X(x) + B − A ln Z − A ln p_Y(y)   (3.29)
       = J_0(x; y) + µ J_R(x) ,

where J_R(x) = A Ω(x). This notation is introduced because the functional coming from the Gibbs prior is conceived as a regularization functional. Therefore the MAP estimates are also given by

x* = argmin_{x ∈ R^n} J_µ(x) ,   (3.30)
and again one must look for the minimum points satisfying the non-negativity constraint. We conclude by remarking that it is not obvious that a minimum point x* of J_µ(x) is a sensible estimate x̄ of the unknown object. In fact, in this formulation we have a free parameter µ (which, by analogy with regularization theory, we will call the regularization parameter). In classical regularization theory an extensive literature exists on the problem of the optimal choice of this parameter [27] but, as far as we know, this problem has not yet been thoroughly investigated in the more general framework provided by Bayesian regularization. In chapter 8 we introduce a further approach to address this problem.
3.3.1 Priors

Many priors have been proposed in the literature. In this section we provide an overview of the most commonly used ones. For the first two we give the probability distribution in addition to the regularizing functional. For the others it is sufficient to give the functional, since the related probability distributions are easy to compute and less interesting for what follows.
3.3.1.1 Tikhonov regularization

The most widely used prior is the one related to Tikhonov regularization, i.e.

p_X(x) = (µ/π)^{n/2} e^{−µ‖x − x̃‖_2^2} ,   (3.31)

where x̃ is a given object whose features are similar to those we expect in the solution thanks to our "a priori" knowledge. This probability distribution enforces smoothness of the MAP estimate according to the value of the parameter µ; indeed, the term regularization parameter derives from this fact. This prior leads to the following regularization functional, in the sense of equation 3.29

J_R(x) = ‖x − x̃‖_2^2 .   (3.32)

3.3.1.2 Sparsity regularization
To enforce sparsity of the ML solution, we can use the L1 prior, i.e.

p_X(x) = (µ/2)^n e^{−µ‖x − x̃‖_1} ,   (3.33)

where the symbol ‖·‖_1 indicates the 1-norm. Again, µ controls the level of sparsity of the MAP estimate, and the corresponding regularization functional is

J_R(x) = ‖x − x̃‖_1 .   (3.34)
Very often no knowledge is available on the object x ˜, and so one assumes x ˜ = 0.
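Both priors admit elementary MAP estimates in the pure denoising case H = I, with a Gaussian neglog-likelihood J_0(x; y) = (1/2)‖x − y‖^2 and x̃ = 0 (a sketch under these assumptions; the pairing of µ with the noise variance is left out): the Tikhonov prior shrinks every component by a constant factor, while the L1 prior soft-thresholds, setting small components exactly to zero.

```python
def map_tikhonov(y, mu):
    # argmin_x 0.5*(x - y)**2 + mu*x**2  gives  x = y / (1 + 2*mu)
    return [v / (1.0 + 2.0 * mu) for v in y]

def map_l1(y, mu):
    # argmin_x 0.5*(x - y)**2 + mu*|x|  gives soft-thresholding:
    # x = sign(y) * max(|y| - mu, 0)
    return [(abs(v) - mu) * (1.0 if v > 0 else -1.0) if abs(v) > mu else 0.0
            for v in y]

y = [3.0, 0.4, -2.0, 0.05]          # hypothetical noisy data
xt = map_tikhonov(y, 0.5)            # every component shrunk
xs = map_l1(y, 0.5)                  # small components set to zero: sparsity
print(xt)
print(xs)
```

The soft-thresholding formula makes the sparsity-enforcing role of the L1 prior explicit: components below the threshold µ are annihilated, the others are shifted toward zero by µ.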
3.3.1.3 Entropy regularization

We consider regularization in terms of the Kullback-Leibler divergence of a reference vector x̄ from the unknown vector x

J_R(x) = Σ_{j=1}^n ( x_j ln(x_j / x̄_j) + x̄_j − x_j ) .
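A direct implementation of this functional (with the usual convention 0 · ln 0 = 0) illustrates its divergence-like behavior: it vanishes when x = x̄ and is positive otherwise. The numerical values below are hypothetical.

```python
import math

def kl_reg(x, xbar):
    """J_R(x) = sum_j x_j*ln(x_j/xbar_j) + xbar_j - x_j, with 0*ln 0 = 0."""
    total = 0.0
    for xj, bj in zip(x, xbar):
        total += (xj * math.log(xj / bj) if xj > 0 else 0.0) + bj - xj
    return total

xbar = [1.0, 2.0, 3.0]
d_same = kl_reg(xbar, xbar)            # 0: the reference vector itself
d_other = kl_reg([2.0, 1.0, 3.0], xbar)  # positive for any other vector
print(d_same, d_other)
```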
3.3.1.4 Edge-preserving regularization

We show a generalization to multi-dimensional spaces of the class of edge-preserving functionals given in [75, 76]. First, we define two shift functions

Δ_k^− x_j = x_{{j_1, …, j_k − 1, …, j_p}} ,   (3.35)

Δ_k^+ x_j = x_{{j_1, …, j_k + 1, …, j_p}} ,   (3.36)

where the multi-index j has been written as j = {j_1, …, j_p}. For example, when x is an image, p = 2. This class of edge-preserving functionals depends on a nonnegative, nondecreasing and continuously differentiable function ψ, which we discuss below. It can be written as

J_R(x) = Σ_{j=1}^n ψ(‖D_j x‖^2) ,   (3.37)
where

D_j x = ( (D_1 x)_j , … , (D_p x)_j ) ,   (3.38)

and

(D_k x)_j = x_j − Δ_k^− x_j ,   (3.39)

for j = 1, …, n and k = 1, …, p.

Remark 12. Since D_k is a linear operator on R^n for every k, one can consider the matrix associated with it in the canonical basis. With a slight abuse of notation we denote these matrices by D_k as well. One can easily verify that

(D_k)^T x = x − Δ_k^+ x .   (3.40)
Examples for ψ are provided by the following functions, where T is a thresholding parameter which is necessary to avoid the singularity of the gradient at the origin.
– Total Variation regularization

ψ(t) = √(t + T^2) − T .   (3.41)

From the behavior of the function for large t, we see that we obtain a regularization by means of the 1-norm of the first differences.
– Huber regularization

ψ(t) = t/T for t ≤ T^2 ;  ψ(t) = 2√t − T for t > T^2 .

The behavior of this function for large t is similar to that of the previous one.
– Geman & McClure regularization

ψ(t) = T t / (t + T^2) .   (3.42)

We remark that, while the functionals corresponding to the previous two functions are convex, the one corresponding to this function is not.
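The shift operators and the three choices of ψ just introduced can be sketched for the two-dimensional case p = 2. Periodic shifts are assumed here for simplicity (the thesis does not fix a boundary convention in this passage), and the test images are hypothetical.

```python
import math

def shifted(img, k, d):
    """Periodic shift Delta_k: index j_k -> j_k + d (k = 0 rows, k = 1 cols)."""
    n0, n1 = len(img), len(img[0])
    if k == 0:
        return [[img[(i + d) % n0][j] for j in range(n1)] for i in range(n0)]
    return [[img[i][(j + d) % n1] for j in range(n1)] for i in range(n0)]

def J_R(img, psi):
    """Equation 3.37: sum_j psi(||D_j x||^2) with (D_k x)_j = x_j - Delta_k^- x_j."""
    d0 = shifted(img, 0, -1)        # Delta_1^- x
    d1 = shifted(img, 1, -1)        # Delta_2^- x
    total = 0.0
    for i in range(len(img)):
        for j in range(len(img[0])):
            g2 = (img[i][j] - d0[i][j]) ** 2 + (img[i][j] - d1[i][j]) ** 2
            total += psi(g2)
    return total

T = 0.1
tv    = lambda t: math.sqrt(t + T * T) - T                     # (3.41)
huber = lambda t: t / T if t <= T * T else 2 * math.sqrt(t) - T
gmc   = lambda t: T * t / (t + T * T)                          # (3.42)

flat = [[5.0] * 4 for _ in range(4)]            # constant image
edge = [[0.0] * 2 + [5.0] * 2 for _ in range(4)]  # image with a vertical edge
print([J_R(flat, f) for f in (tv, huber, gmc)])   # all ~0 on a constant image
print([J_R(edge, f) for f in (tv, huber, gmc)])   # positive across the edge
```

All three functions satisfy ψ(0) = 0, so a constant image has zero penalty; only the jumps across the edge contribute.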
4 Gradient-type algorithms for nonnegative image reconstruction
In addition to the regularized approaches, given by a stopping criterion or by a Bayesian framework, certain kinds of inverse problems are particularly suited to the introduction of specific constraints on the solution. Image reconstruction is exactly one of these cases. Indeed, having described the images as vectors in R^n, one can restrict the space of the feasible solutions to some subset which represents the totality of the images having some particular common features. This subset can also be non-convex, but usually it is convex. This depends on the kind of constraints we want to use and also on the fact that the techniques of non-convex optimization are computationally expensive. The non-negativity constraint is particularly suited for image deblurring, since the images are supposed to be nonnegative. The corresponding subset, the so-called nonnegative orthant, is a convex subset of R^n usually denoted by R^n_+. The addition of non-negativity to a ML problem does not provide regularization, even if, as far as we know, a thorough investigation of the ill-posedness of the resulting constrained ML problem has still to be done. Moreover, we will see with an example, in the case of Gaussian noise, that the non-negativity constraint enforces sparsity of the solutions rather than regularization. In this chapter we review several iterative methods introduced to minimize the neglog of the constrained ML functional. When the functional is convex these methods converge to nonnegative ML solutions. They have been proposed for different applications and, for some of them, it is known from numerical practice that the "semi-convergence" property holds true. For all of these "constrained" algorithms, early stopping of the iterations provides "regularized" solutions. After their presentation, we give a description of them in terms of scaled gradient methods. We will see that, in general, all of these methods can be viewed as scaled and projected gradient algorithms.
Each one of these algorithms is defined by a recursive formula that depends on the
gradient of the neglog of the ML functional, which in its turn depends on the probability distribution, and hence on the statistics, of the noise. In the following, when we deal with Gaussian noise, we will review the projected Landweber (PL) method and the iterative image space reconstruction algorithm (ISRA). In the case of Poisson noise we have the Richardson-Lucy (RL) algorithm, while in the mixed case, Poisson together with Gaussian noise, we study the algorithm which derives from the Expectation-Maximization (EM) approach and has been introduced in [67]. Even if they work well in many instances, they are not frequently used in practice because, in general, they require a large number of iterations before providing a sensible solution. Nevertheless, we will show in this chapter that these methods, once organized as scaled and projected gradient methods, can increase their efficiency by applying special acceleration techniques that have been recently developed in the area of gradient methods. In particular, we propose the application of efficient step-length selection rules and line-search strategies. In the following chapter we also show results concerning the performance of the accelerated versions of the PL and ISRA algorithms. We evaluate their behavior in comparison with recent scaled gradient projection (SGP) methods for image deblurring. Numerical experiments demonstrate that the accelerated methods still exhibit the semi-convergence property, with a considerable gain both in the number of iterations and in the computational time.
4.1 The case of Gaussian noise

When the noise on the data is assumed to be Gaussian, the maximum likelihood approach to image deblurring leads to the minimization of the so-called least-squares (LS) functional, which is an ill-posed problem. However, the situation may change if one considers a constrained LS problem

minimize J_g(f) = (1/2) ‖Hf − g‖_X^2   sub. to f ∈ Ω ,   (4.1)
where Ω is a closed and convex set. For instance, if Ω is compact and H is injective, then there exists a unique solution that depends continuously on the data g [40]. In many instances the most natural constraint is non-negativity, and the set of nonnegative functions is closed and convex in L^2 but not compact. If H(Ω), the image of Ω, is not closed, then the constrained LS problem is still ill-posed: a solution may not exist if a noisy g is not in H(Ω). We are interested in the discrete version of the problem, i.e. H is a matrix, x, y are vectors/arrays and X is the l^2 space; furthermore, we assume that the feasible region Ω is the nonnegative orthant. In this case, problem 4.1 reduces to the following
convex optimization problem

minimize J_y(x) = (1/2) ‖Hx − y‖^2   sub. to x ≥ 0 ,   (4.2)
where ‖·‖ denotes the usual l^2 norm of vectors/arrays. Then a solution always exists, and it is also unique if the null space of H contains only the null element (N(H) = {0}), a property that, in general, is satisfied as a consequence of the approximation errors in the computation of the matrix H. However, if H is ill-conditioned, we expect this solution to be completely deprived of physical meaning. In [3] arguments are given suggesting that a nonnegative LS solution is a night-sky reconstruction, i.e. it consists of a set of bright points over a black background. Of course, such an effect is a consequence of the noise perturbing the data. In order to verify the arguments of [3], we study the LS solution in the case of a test problem generated from the 256 × 256 image of the nebula NGC5979 shown in the upper-left panel of figure 4.1. This image is convolved with the PSF described in section 5.1 and the result is corrupted with additive white Gaussian noise. The resulting blurred and noisy image (upper-right panel of figure 4.1) is deconvolved with an iterative algorithm converging to the minimizer of the LS problem 4.2 (uniqueness holds true since the discrete Fourier transform of the PSF is never zero); the minimizer is shown in the lower-left panel of figure 4.1. The result is just a night-sky reconstruction, since it is a "sparse" object, consisting of bright spots over a nearly black background. We believe that the distribution of the bright spots depends on the realization of the noise, but we did not check this point. Moreover, in the lower-right panel of figure 4.1 we show the normalized residual, as defined in section 5.1. It looks like a map of correlated noise, and this means that the minimizer is a solution of the problem that is not acceptable even from the statistical point of view [61].
In addition, we remark that the flux of the object (which, in the case of a nonnegative object, coincides with its l^1 norm) and that of the reconstruction differ by only one unit in the fourth digit, i.e. an l^1-constraint is in practice satisfied by the minimizer. As is known, such a constraint enforces sparsity of the solution. This remark also suggests that nonnegative LS solutions may be reliable in the case of sparse objects, such as a star cluster in astronomy, while they are not in the case of diffuse objects. Iterative methods have been proposed for the computation of constrained LS solutions: the projected Landweber (PL) method [25], in the case of problem 4.1 with Ω a general closed and convex set, and the iterative space reconstruction algorithm (ISRA) [22], in the case of problem 4.2. Numerical experiments on the reconstruction of diffuse objects demonstrate that both methods exhibit the semi-convergence property [57, 27, 5]: the iterations first provide better and better approximations of the true object, but after a certain point they turn to the worse, due to increased noise propagation. Therefore, in practice, regularization is obtained by a suitable stopping of the iterations. These numerical results are not supported by theory since, as far as we know, there is no proof that these iterative methods have regularization properties.
Figure 4.1: Upper panels: the object (left) and the corresponding blurred and noisy image (right). Lower panels: the reconstruction provided by the non-negatively constrained minimum of the LS functional (log scale; left) and the corresponding normalized residual (right).
4.1.1 The projected Landweber method

The projected Landweber (PL) method is investigated in [25] in the general case of problem 4.1, where X is a Hilbert space. If we denote by P_Ω the projection onto Ω, the PL iteration is defined by

x^{k+1} = P_Ω[ x^k + α(H^T y − H^T H x^k) ] ,   (4.3)

where α is a fixed step-length in the descent direction

−∇J_y(x) = H^T y − H^T H x .   (4.4)
In [25] (Theorem 3.2) it is proved that, if y ∈ H(Ω) and α satisfies the condition 0 < α < 2/‖H‖^2, then the PL iterates converge to a solution of problem 4.1. The ISRA iteration for problem 4.2 is multiplicative, x^{k+1} = x^k (H^T y)/(H^T H x^k), the products and quotients being componentwise. If H^T y > 0, by remarking that property 2.8 implies Hx > 0 for any x > 0, it is easy to prove by induction that, for any given x^0 > 0, all the x^k are strictly positive, and therefore the algorithm is well defined. Under these conditions, as proved in [23], the sequence {x^k} is asymptotically convergent to solutions of problem 4.2, i.e. the limit of each convergent subsequence of {x^k} is a nonnegative LS solution x*. If the solution is unique, then the sequence is convergent. We remark that the condition H^T y > 0 may not be satisfied, because the basic assumption underlying the approach is that the noise corrupting the data is additive, white and Gaussian, with zero expected value. Therefore y, as well as H^T y, can have negative
components. A similar situation arises if a constant background is superposed on the image or if the expected value of the noise is positive. Indeed, these quantities must be subtracted from the data, otherwise the non-negativity constraint is not active; also in this case the subtracted data can take negative values. However, it is possible to avoid this difficulty with a simple modification of the algorithm. Let b be a positive constant such that H^T y + b > 0; then we consider the following modified ISRA

x^{k+1} = x^k (H^T y + b)/(H^T H x^k + b) .   (4.8)

It is easy to show, by a simple modification of De Pierro's proof, that the new sequence {x^k} is also asymptotically convergent to nonnegative LS solutions. As already remarked, numerical experience demonstrates that the sequence of the x^k has the property of semi-convergence, so that regularization can be obtained by early stopping of the iterations. In the absence of ad hoc stopping rules, one can use also in this case the discrepancy principle 4.6.
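Iteration 4.8 can be sketched analogously (toy H, y and b, not from the thesis); the update is purely multiplicative, so all iterates stay strictly positive whenever x^0 > 0 and H^T y + b > 0.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def isra(H, y, b, iters):
    """Modified ISRA, equation 4.8, with componentwise multiply/divide."""
    Ht = [list(col) for col in zip(*H)]
    num = [v + b for v in matvec(Ht, y)]          # H^T y + b, fixed
    x = [1.0] * len(H[0])                          # positive starting point
    for _ in range(iters):
        den = [v + b for v in matvec(Ht, matvec(H, x))]  # H^T H x^k + b
        x = [xi * nu / de for xi, nu, de in zip(x, num, den)]
    return x

H = [[1.0, 0.5], [0.0, 1.0]]
y = [2.0, 1.0]
x = isra(H, y, b=0.1, iters=2000)
print(x)   # converges to the nonnegative LS solution [1.5, 1.0]
```

At a positive fixed point the numerator and denominator must agree, i.e. H^T y = H^T H x, so the limit is an LS solution, in line with the convergence statement above.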
4.2 The case of Poisson noise

When the noise on the data is of Poisson type, the ML approach to image deblurring leads to the minimization of the so-called Kullback-Leibler (KL) divergence, which is an ill-posed problem. However, the situation may change if one considers a constrained KL minimization

minimize J̃_g(f) = ∫ [ g(s) log( g(s)/(Hf(s) + b) ) + (Hf(s) + b) − g(s) ] ds   sub. to f ∈ Ω ,   (4.9)

where Ω is a compact and convex set and s is an integration variable, which can be two or more dimensional. As in the Gaussian case, H is injective. The properties of this functional and its minimization are investigated in [55, 56, 54]. In particular, in [54] an example is given where the functional does not have a minimum in the classical sense, hence proving the ill-posedness of this minimization problem. As a consequence, we should expect that noise strongly affects the minima of the discrete problem. Indeed this is the case, and the specific effect of the noise in this problem is known as the checkerboard effect, since many components of the minima are zero. Dealing with the image reconstruction problem, and hence with the discrete version of problem 4.9, H is a matrix and x, y are vectors/arrays. The feasible region Ω is the nonnegative orthant. In this case, problem 4.9 reduces to the following convex optimization problem

minimize J_y(x) = Σ_{i=1}^m [ y_i ln( y_i/(Hx + b)_i ) + (Hx + b)_i − y_i ]   sub. to x ≥ 0 .   (4.10)
As in the Gaussian case, a solution always exists, and is also unique if the null space of H contains only the null element (N (H) = {0}). In general, this property is satisfied as a consequence of the approximation errors in the computation of the matrix H. However, if H is ill-conditioned, we expect that this solution is completely deprived of physical meaning.
4.2.1 Richardson-Lucy algorithm

The iterative algorithm most frequently used in image deconvolution with Poisson data was proposed independently by Richardson [63] and Lucy [53]. It is generally known as the Richardson-Lucy (RL) algorithm in applications to astronomy and microscopy. In the paper by Shepp & Vardi [65] the same algorithm was re-obtained by considering the maximization of the likelihood function of equation 2.12 and applying a general approach known in statistics as the expectation maximization (EM) method [24]. For this reason, in emission tomography, the acronym EM (sometimes ML-EM) is used for denoting this specific algorithm. A first application to microscopy was investigated in [35, 37], while application to astronomy was stimulated by the restoration of HST images [77, 33]. Here we give the algorithm in the form proposed in [59, 67] for taking into account background emission. In fact, in the presence of a background, the non-negativity constraint is active only when using this modified form of the iteration; otherwise, ringing effects may appear in the reconstructed image [46]. If we assume that the PSF satisfies the normalization condition of equation 2.8, then, for k = 0, 1, … we have

x^{(k+1)} = x^{(k)} H^T [ y / (H x^{(k)} + b) ] ,   (4.11)
the iteration being, in general, initialized with a constant array/cube. In such a case all the x^{(k)} are strictly positive. In the case b = 0, all the x^{(k)} satisfy the condition

Σ_i x_i^{(k)} = Σ_i y_i ,   (4.12)
and this property is a key point in all the convergence proofs we know [74, 45, 56, 39, 38] (see also [57]); since it is not satisfied when b ≠ 0, a convergence proof seems to be lacking in this case. Notice that in the continuous case the algorithm takes the same form, with the obvious changes in the interpretation of the symbols, if the integral kernel h(s) satisfies a normalization condition similar to that of equation 2.8, the sum being replaced by an integral. Moreover, in the case b = 0, the iterates satisfy condition 4.12, again with the sum replaced by an integral. In this case, the continuous algorithm is investigated in [55]-[54], where it is proved that, if the algorithm converges to x*, then x* is a minimizer of 4.9. Moreover, J̃_y(x^{(k)}) is decreasing with k. However, the first convergence result is proved in [62], where it is shown that, if the equation Hf = g has a nonnegative solution x*, i.e. if the data are in the range of the operator H, then the iteration converges to x* with respect to the weak topology in Lebesgue spaces.
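For b = 0 the flux-conservation property 4.12 holds from the first iterate on, and it is easy to check numerically. The sketch below uses a hypothetical 3 × 3 PSF matrix whose columns sum to one (the normalization condition 2.8) and hypothetical data y.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rl_step(H, x, y, b=0.0):
    """One Richardson-Lucy iteration, equation 4.11."""
    Ht = [list(col) for col in zip(*H)]
    ratio = [yi / (hi + b) for yi, hi in zip(y, matvec(H, x))]
    return [xi * c for xi, c in zip(x, matvec(Ht, ratio))]

# Columns sum to one (normalization condition 2.8).
H = [[0.7, 0.2, 0.0],
     [0.3, 0.6, 0.3],
     [0.0, 0.2, 0.7]]
y = [4.0, 7.0, 2.0]
x = [1.0, 1.0, 1.0]          # constant, strictly positive initialization
for _ in range(5):
    x = rl_step(H, x, y)
print(sum(x), sum(y))        # equal fluxes when b = 0
```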
It is known, from numerical practice, that the RL algorithm has regularization properties [5, 57] in the case of the reconstruction of non-sparse objects: sensible solutions can be obtained by an early stopping of the iteration, i.e. the iteration must not be pushed to convergence and the algorithm is semi-convergent. Therefore, the problem arises of finding appropriate stopping rules. In [62] a first analysis of some regularization properties of the algorithm is also given.
4.3 The case of Gaussian-Poisson noise

In [67] Snyder et al. investigated the ML approach to the deconvolution of images acquired by a charge-coupled-device camera and proved that the iterative method proposed in [52] by Llacer and Nunez can be derived from the Expectation-Maximization method of Dempster et al. [24] for the solution of ML problems. In this section we show that the iterative method proposed by the above mentioned authors is a scaled gradient method for the constrained minimization of this function on the closed and convex cone of the non-negative vectors and that, if it is convergent, the limit is a solution of the constrained ML problem. Moreover, by studying the asymptotic behavior in the regime of a high number of photon counts, we find an approximation that, as proved by numerical experiments, works well for any number of photons, thus providing an efficient implementation of the algorithm. The model of y_i, discussed in [67], is
y_i = y_i^{(obj)} + y_i^{(back)} + y_i^{(ron)} ,   (4.13)

where y_i^{(obj)} is the number of photoelectrons arising from the object radiation and is described by a Poisson process with expected value (Hx)_i; y_i^{(back)} is the number of background photoelectrons (including external and internal background, dark current, and bias), also described by a Poisson process, with expected value b_i; and y_i^{(ron)} is the so-called amplifier read-out noise, described by an additive Gaussian process with expected value r and variance σ^2. For simplicity, we assume r = 0, since it is always possible to shift the data by a suitable constant (without changing their statistics) to satisfy this condition. In the model of image formation defined by equation 4.13, the values of an image are not necessarily positive; negative values can occur as a consequence of the additive Gaussian noise when the background is zero or sufficiently small. However, the goal of image reconstruction is to produce a non-negative estimate of the unknown object. Therefore, as we already remarked, the problem is the minimization of J_y(x) on the closed and convex cone C of the non-negative vectors. Thanks to the results of the previous chapter, this problem has a solution. Then, the likelihood is the function of x obtained by letting y be the detected image, L_y^Y(x) = p_Y(y; x), and the ML approach maximizes this function of x. As usual, the
maximization problem can be equivalently stated as a minimization problem by considering the negative-log (neglog) of the likelihood. More precisely, we set

J_y(x) = Σ_{i∈S} { (Hx + b)_i − log Σ_{n=0}^{+∞} ((Hx + b)_i^n / n!) e^{−(n − y_i)^2/(2σ^2)} } ,   (4.14)
the domain of the functional being the convex and closed cone of the nonnegative vectors. An iterative algorithm for the solution of the problem is derived in [67] using the Expectation-Maximization (EM) method of Dempster et al. [24] for the solution of ML problems. Regularized versions of the same algorithm are proposed and tested in [46]. The algorithm has the form

x^{(k+1)} = (x^{(k)} / h) H^T [ q(x^{(k)}) / p(x^{(k)}) ] ,   (4.15)
where the functions q and p are defined by equations 3.12 and 3.13, respectively. Moreover, we find an approximation that greatly reduces the computational cost of the algorithm and that, even if derived by means of asymptotic expansions, can be applied in all practical cases.

Remark 13. In all our numerical experiments we always found convergence of the algorithm and, thanks to the previous result, we can conclude that the limit is a ML solution. However, we also know that these solutions are corrupted by noise propagation, so that they are not reliable estimates of the unknown object. Our numerical experience demonstrates that also the algorithm 4.26 has the semi-convergence property. Therefore, early stopping of the iterations can be an easy and simple way of obtaining "regularized" solutions. This conclusion does not mean that it is unnecessary to consider more refined regularization methods.
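The series appearing in equation 4.14 is, up to the constant factors that 4.14 drops, the density of the Gauss-Poisson model. Restoring the Poisson factor e^{−λ} and the Gaussian normalization, the density must integrate to one in y, which gives a quick sanity check for a truncated implementation (the truncation index, λ and σ^2 below are hypothetical choices).

```python
import math

def gp_density(y, lam, sigma2, nmax=100):
    """p(y) = sum_n e^{-lam} lam^n / n! * N(y; n, sigma2), truncated at nmax."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    total = 0.0
    log_pn = -lam                      # log of e^{-lam} lam^0 / 0!
    for n in range(nmax + 1):
        total += math.exp(log_pn) * norm * math.exp(-(y - n) ** 2 / (2.0 * sigma2))
        log_pn += math.log(lam) - math.log(n + 1)   # advance the Poisson term
    return total

lam, sigma2 = 10.0, 4.0
# Integrate over y on a wide grid: the total mass should be close to 1.
step = 0.05
mass = step * sum(gp_density(-20.0 + k * step, lam, sigma2)
                  for k in range(int(80.0 / step)))
print(mass)
```

Keeping the Poisson factor in log form, as above, avoids overflow of λ^n/n! for large truncation indices — the same numerical issue that makes evaluating the quotient of the two series in 4.15 expensive in practice.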
4.3.1 An efficient approximation of the Gaussian-Poisson minimization algorithm

In many cases the best result can only be obtained with a large number of iterations. Since the cost of one iteration of equation 4.15 is high (one has to compute the quotient of two series), the computational burden can become excessive. For this reason, it is important to have an approximate but sufficiently accurate estimate of the quotient of equation 4.15. Two different approaches were proposed [66, 68]. The first consists in adding σ^2 to the data, so that the new data are affected by additive Gaussian noise with expected value σ^2, and in approximating the Gaussian distribution with a Poisson distribution. The final result consists in modifying the data and background terms, so that the iterative algorithm is basically the RLM with the above mentioned modifications

x^{(k+1)} = (x^{(k)} / h) H^T [ (y + σ^2) / (H x^{(k)} + b + σ^2) ] .   (4.16)
The second approach is based on a saddle-point approximation of the mixed Gauss-Poisson distribution. Its implementation is not as simple as the Poisson approximation,
and it is also computationally more expensive. Moreover, a numerical validation shows that, for an increasing number of iterations, the two approximations provide essentially the same results [68]. We propose a different approximation that, in principle, applies only to the pixels where y_i is sufficiently large (for instance, y_i > 30). It is based on approximating the Poisson distribution with a Gaussian one and, although its derivation is lengthy (see the Appendix), the final result is quite simple. Moreover, a numerical implementation has demonstrated that it works well also for small and negative values of y_i, and that it is more accurate than the approximation provided by equation 4.16. This point will be discussed in chapter 6. The approximation is obtained from asymptotic approximations of the functions p(s; t) and q(s; t), defined in equations 3.12-3.13, for large values of t, with s satisfying the condition |s − t| ≤ c√t; these are derived in the Appendix. From equations A.9 and A.11 we have

q(s; t) / p(s; t) = exp[ −(1 + 2(s − t)) / (2(s + σ^2)) ] [ 1 + O(1/√t) ] .

Then, from equation 3.15, in a pixel where y_i is large, we get

q_i(x) / p_i(x) = exp[ −(1 + 2((Hx + b)_i − y_i)) / (2((Hx + b)_i + σ^2)) ] [ 1 + O(1/√y_i) ] ,   (4.17)

and so the approximated algorithm takes the form

x^{(k+1)} = (x^{(k)} / h) H^T exp[ −(1 + 2((H x^{(k)} + b) − y)) / (2((H x^{(k)} + b) + σ^2)) ] .   (4.18)

The unexpected and interesting result provided by our numerical experiments is that the approximation provided by this equation can be used everywhere, thus reducing the computational burden of the method in a significant way. Indeed, in such a case, one iteration has approximately the same computational cost as one RLM iteration, as defined in equation 4.16. Let us also remark that, if (Hx + b)_i and y_i are large, then, taking into account that the exponent in equation 4.17 is of the order of y_i^{−1/2}, the first-order Taylor expansion of the exponential provides

exp[ −(1 + 2((Hx + b)_i − y_i)) / (2((Hx + b)_i + σ^2)) ] ≃ (y_i + σ^2) / ((Hx + b)_i + σ^2) ,

and therefore the two approximations coincide.
4.4 The Split Gradient Method

The formulations discussed in the previous sections lead to the following general problem:

    minimize  J(x)   subject to  x \ge 0 ,    (4.19)
where J(x) can be equal to the neglog-likelihood function J_y(x), in the case of the ML problem, or to the sum of J_y(x) and the neglog-prior function J_R(x) weighted by a regularization parameter \mu, in the case of MAP estimation. As follows from the examples discussed above, we can assume that both are convex, so that we have a convex minimization problem.

As already remarked, the ML problem cannot be treated as a standard optimization problem, because it is ill-posed and we do not want to reach the minimum; on the other hand, reaching the minimum is just what we want in the case of the MAP problem. Therefore it seems necessary to use different methods in the two cases and, in fact, this is what is usually done. In this section we discuss an idea that can provide a unified approach to both problems.

Since J(x) is convex, all its minima are global. Then the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for a point x^* to be a minimum of J(x):

    x^* \, \nabla J(x^*) = 0 , \quad x^* \ge 0 , \quad \nabla J(x^*) \ge 0 .

Let us consider now the following decomposition of the gradient [47, 46]:

    -\nabla J(x) = U(x) - V(x) ; \quad U(x) \ge 0 , \; V(x) > 0 .

It is obvious that such a decomposition always exists but is not unique. Different choices of the vectors U, V can be used, and this non-uniqueness may be an advantage in some cases. However, the applicability of the approach relies on the fact that, in all the models introduced for image reconstruction, a natural decomposition of the gradient of this kind can be found, with explicit expressions of U, V. This point will become clear from the inspection of the examples discussed in the following.

Assuming that we have selected a decomposition of the gradient in the previous form, we can write the first KKT condition as a fixed point equation:

    x^* = T(x^*) , \quad \text{with} \quad T(x) = x \, \frac{U(x)}{V(x)} .

The operator T(\cdot) is well defined, since V(x) > 0. Moreover, it is continuous if the functional J(x) is continuously differentiable, as assumed, because, in such a case, it is possible to choose continuous functions U, V. By applying the method of successive approximations we get the following iterative algorithm:

- give x^{(0)} > 0;
- given x^{(k)}, compute

    x^{(k+1)} = x^{(k)} \, \frac{U(x^{(k)})}{V(x^{(k)})} .    (4.20)
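The fixed-point iteration above can be sketched generically, passing the split (U, V) as callables; the function name `sgm` and the least-squares example below are our own illustrative choices:

```python
import numpy as np

def sgm(U, V, x0, n_iter=200):
    """Successive approximations for the fixed point x = x * U(x)/V(x) (eq. 4.20).

    U, V: callables implementing a gradient split -grad J = U - V,
    with U(x) >= 0 and V(x) > 0 on the positive orthant.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x * U(x) / V(x)   # multiplicative update: iterates stay nonnegative
    return x

# Example split for the least-squares functional J(x) = ||Ax - y||^2 / 2:
# -grad J = A^T y - A^T A x, i.e. U(x) = A^T y, V(x) = A^T A x.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
y = A @ np.array([1.0, 2.0])
x = sgm(lambda x: A.T @ y, lambda x: A.T @ A @ x, np.ones(2))
```

For this small well-posed example the iteration converges to the nonnegative least-squares solution; in the ill-posed imaging setting one would instead stop the iteration early, as discussed above.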
About the convergence of the algorithm nothing can be said at this stage of the analysis, since the operator T(\cdot) is not in general a contraction. However, we can point out some interesting features suggesting that it deserves further consideration. The first is that all the iterates are automatically non-negative. The second property of the algorithm of equation 4.20 is contained in the following Proposition.

Proposition 6. If the sequence of iterates {x^{(k)}} converges to x^* and if U(x) > 0 for any x > 0, then x^* solves (4.19).

Proof. It is sufficient to prove that x^* satisfies the KKT conditions. The first one is satisfied because, thanks to the continuity of T(\cdot), x^* is a fixed point of T(\cdot). Moreover, the assumption U(x) > 0 for any x > 0 implies that all the iterates x^{(k)} are strictly positive if x^{(0)} > 0, as one easily proves by induction. It follows that x^* \ge 0. Therefore we only have to check that the third KKT condition is also satisfied. It is certainly satisfied for all values of the index such that x^*_j > 0, because it follows from the first condition. It is also satisfied if x^*_j = 0 and U_j(x^*) = 0 because, in such a case, the corresponding component of the gradient is strictly positive. Then, let us assume that it is not satisfied for a value of the index such that x^*_j = 0, U_j(x^*) > 0, i.e. let us assume that

    x^*_j = 0 , \quad \frac{U_j(x^*)}{V_j(x^*)} > 1 .

It follows that there exists k_0 such that, for any k \ge k_0, we have

    \frac{U_j(x^{(k)})}{V_j(x^{(k)})} > 1 .

Since all the iterates are strictly positive, we get x_j^{(k+1)} > x_j^{(k)}, in contradiction with the assumption that the limit of x_j^{(k)} is zero.
The last remark is that the algorithm is a scaled-gradient method, with step-size 1, since it can be written in the following form:

    x^{(k+1)} = x^{(k)} - S_k \nabla J(x^{(k)}) ,    (4.21)

where

    S_k = \mathrm{diag}\left\{ \frac{x_j^{(k)}}{V_j(x^{(k)})} \right\} .    (4.22)
Remark 14. It is important to remark that, in [47, 46], the algorithm is presented as a descent method with a step-size selection. Indeed, it is written in the following form:

    x^{(k+1)} = x^{(k)} + \lambda_k \, \frac{x^{(k)}}{V(x^{(k)})} \left\{ U(x^{(k)}) - V(x^{(k)}) \right\} ,    (4.23)

and the step-size \lambda_k > 0 is chosen in the following way. First an upper bound \lambda_k^{(0)} is determined in order to ensure that x^{(k+1)} \ge 0. This is obtained by looking at the values of j such that x_j^{(k)} > 0 and [\nabla J(x^{(k)})]_j > 0. If we denote by I_+ the set of these index values, then it is easy to see that

    \lambda_k^{(0)} = \min_{j \in I_+} \left\{ \frac{V_j(x^{(k)})}{V_j(x^{(k)}) - U_j(x^{(k)})} \right\} \ge 1 .

Next, the step-size \lambda_k is optimized by a line search in the interval (0, \lambda_k^{(0)}], using, for instance, the Armijo rule. In such a way convergence of the method is ensured.
4.4.1 Maximum likelihood estimates

In table 4.1 we give possible choices of the functions U_y(x), V_y(x) associated to the functionals J_y(x) for the three noise models discussed in the previous chapters. It is obvious that other acceptable choices can be obtained by adding, for instance, a suitable constant to both functions.

    noise model       U_y(x)                      V_y(x)
    Gauss             2 H^T y                     2 (H^T H x + b)
    Poisson           H^T ( y / (Hx + b) )        h
    Gauss-Poisson     H^T ( q(x;y) / p(x;y) )     h

Table 4.1: The functions U_y, V_y for the three noise models. The functions p, q of the third line are defined respectively in equations 3.12 and 3.13, while h is defined in equation 3.16.

The interesting point is that, if we particularize the general algorithm of equation 4.20 to the three noise models, we obtain three well-known algorithms proposed for image reconstruction. Indeed, in the case of Gaussian noise we obtain

    x^{(k+1)} = x^{(k)} \, \frac{H^T y}{H^T H x^{(k)} + b} ,    (4.24)

and this is the image space reconstruction algorithm (ISRA), introduced in [22], whose asymptotic convergence is proved in [23]. More precisely, the original algorithm is the one with b = 0, but the proof of convergence can be easily extended to the case b \neq 0. In the case of Poisson noise we obtain

    x^{(k+1)} = \frac{x^{(k)}}{h} \, H^T \frac{y}{H x^{(k)} + b} ,    (4.25)
and this is the expectation maximization (EM) algorithm proposed in [65], known as the Richardson-Lucy (RL) algorithm in image deconvolution [63, 53]. More precisely, equation 4.25 is the modified version of the algorithm, introduced in [67], which takes into account background emission. Finally, in the case of Gauss-Poisson noise, by a suitable splitting of the gradient, we obtain again the algorithm proposed by Llacer and Núñez [52] as well as by Snyder et al. [67]:

- choose x^{(0)} > 0;
- given x^{(k)}, compute

    x^{(k+1)} = \frac{x^{(k)}}{h} \, H^T \frac{q(x^{(k)})}{p(x^{(k)})} ,    (4.26)

where the functions p, q are defined respectively in equations 3.12 and 3.13. In this way, the derivation of this algorithm is quite simple. It is sufficient to take

    U_y(x) = H^T \frac{q(x^{(k)};y)}{p(x^{(k)};y)} , \quad V_y(x) = h .

The following partial result can be easily proved.

Proposition 7. If the sequence of iterates is convergent, then the limit satisfies the KKT conditions and therefore is a constrained minimum of the functional J_y(x).
Proof. Let us denote by x^* the limit of the sequence {x^{(k)}}_{k=1}^{\infty}. Since the operator T is continuous, it is evident that the limit satisfies the first KKT condition. Moreover, since all the iterates are strictly positive, as already remarked, it also follows that the limit is non-negative. Therefore, we only have to prove that the third KKT condition is satisfied. This follows from the previous result in the pixels where x^*_j > 0, so we must only consider the pixels where x^*_j = 0. Let us assume that the condition is not satisfied in one of these pixels, i.e. we assume that, for a given j,

    x^*_j = 0 , \quad \frac{1}{h_j} \left( H^T \frac{q(x^*)}{p(x^*)} \right)_j > 1 .

It follows that there exists k_0 such that, for any k \ge k_0,

    \frac{1}{h_j} \left( H^T \frac{q(x^{(k)})}{p(x^{(k)})} \right)_j > 1 .

Then, since all the iterates are strictly positive, it also follows that, for all these values of k, x_j^{(k+1)} > x_j^{(k)}, in contradiction with the assumption that the limit is zero.
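For image deconvolution, where H is a convolution with a normalized PSF (so that h = H^T 1 = 1), one iteration of the Poisson update of equation 4.25 can be sketched with FFTs. The function name `rl_step` and the 2-D periodic-boundary setting are illustrative assumptions of this sketch, not prescriptions of the text:

```python
import numpy as np

def rl_step(x, y, psf_fft, b=0.0):
    """One Richardson-Lucy iteration with background b (eq. 4.25).

    Assumes periodic boundary conditions and a PSF normalized so that
    h = H^T 1 = 1, in which case the 1/h factor drops out.
    """
    Hx = np.real(np.fft.ifft2(np.fft.fft2(x) * psf_fft)) + b             # H x + b
    ratio = y / Hx                                                       # y / (H x + b)
    # H^T is correlation with the PSF, i.e. multiplication by conj(psf_fft)
    corr = np.real(np.fft.ifft2(np.fft.fft2(ratio) * np.conj(psf_fft)))
    return x * corr
```

For a delta PSF this reduces to x <- x * y/(x + b), so with b = 0 a single step from a flat start already returns y.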
4.4.2 Maximum a posteriori estimates

In the case of a regularized functional, i.e. when J(x) = J_\mu(x), the general algorithm of equation 4.20 takes the following form:

    x^{(k+1)} = x^{(k)} \, \frac{U_\mu(x^{(k)})}{V_\mu(x^{(k)})} = x^{(k)} \, \frac{U_y(x^{(k)}) + \mu U_R(x^{(k)})}{V_y(x^{(k)}) + \mu V_R(x^{(k)})} ,    (4.27)

where U_y(x), V_y(x) come from the likelihood while U_R(x), V_R(x) come from the prior. Here we only remark that this algorithm has a very simple structure (linear fractional dependence of the iterates on the regularization parameter \mu), suggesting an implementation based on an auxiliary function that can be called for different kinds of noise and regularization. In this way one can have a general algorithm working for all known kinds of likelihoods and priors. Unfortunately, a convergence analysis is lacking.

Now we give some examples of the U and V functions for different kinds of regularization functionals. For the most widely used Tikhonov regularization, recalling the notations of section 3.3.1, we have

    U_R(x) = 0 , \quad V_R(x) = x - \tilde{x} ,

while for the sparsity regularization we have

    U_R(x) = 0 , \quad V_R(x) = \mathrm{sign}(x) .

For the entropy regularization one knows that the gradient is given by

    -\nabla J_R(x) = \ln x - \ln \bar{x} = \ln \frac{x}{\bar{x}} ,

where \ln x denotes the vector/array obtained by taking the logarithm of x component by component, or pixel by pixel. We remark that the gradient is not bounded at the origin.
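The linear fractional structure of the iteration makes the implementation naturally modular: the likelihood and prior splittings can be passed as independent callables. A minimal sketch, with names of our own choosing:

```python
import numpy as np

def sgm_map_step(x, Uy, Vy, UR, VR, mu):
    """One split-gradient MAP step (eq. 4.27):
    x <- x * (Uy(x) + mu*UR(x)) / (Vy(x) + mu*VR(x)).
    Setting mu = 0 recovers the pure ML iteration (eq. 4.20)."""
    return x * (Uy(x) + mu * UR(x)) / (Vy(x) + mu * VR(x))
```

The same routine thus serves any combination of noise model (through Uy, Vy) and regularization (through UR, VR), which is exactly the implementation advantage remarked above.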
4.4.2.1 Edge-preserving regularization

Again recalling the notations of section 3.3.1, we consider the shift functions (equation 3.36) and the associated discrete derivatives of a vector (equation 3.38). The gradient of the functional 3.37 is given by

    \nabla J_R(x) = \sum_{k=1}^{p} (D_k)^T \left[ \psi'(x) \, D_k(x) \right] ,

where (D_k)^T is the transpose of D_k, \{\psi'(x)\}_j = \psi'(x_j), and the product between \psi'(x) and D_k(x) is to be intended element-wise. Moreover, (D_k)^T is briefly discussed in remark 12.
The gradient of equation 3.37 with respect to x leads to a general splitting of J_R in the case of an edge-preserving functional. On the nonnegative orthant, we can choose the following U_R, V_R functions:

    U_R(x) = \sum_{k=1}^{p} \left[ \psi'(x)\, x + \Delta_k^+ \psi'(x)\, \Delta_k^- x \right] ,

    V_R(x) = \sum_{k=1}^{p} \left[ \Delta_k^+ \psi'(x)\, x + \psi'(x)\, \Delta_k^- x \right] .

In the following we give the form of \psi' in the cases discussed in section 3.3.1. As above, T is a thresholding parameter.

- Total variation regularization:

    \psi'(t) = \frac{1}{\sqrt{t + T^2}} .

  From the behavior of the function for large t, we see that we obtain a regularization by means of the 1-norm of the first difference.

- Huber regularization:

    \psi'(t) = \frac{1}{T} , \; t \le T^2 ; \qquad \psi'(t) = \frac{1}{\sqrt{t}} , \; t > T^2 .

  The behavior of this function for large t is similar to that of the previous function.

- Geman & McClure regularization:

    \psi'(t) = \frac{T^3}{(t + T^2)^2} .

  We remark that, while the functionals corresponding to the previous functions are convex, the one corresponding to this function is not.
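The three choices of \psi' can be written down directly; note that the Huber variant is continuous at t = T^2 (both branches give 1/T there), while the total variation and Geman & McClure functions both take the value 1/T at t = 0. The helper names below are ours:

```python
import numpy as np

def psi_prime_tv(t, T):
    """Total variation: psi'(t) = 1/sqrt(t + T^2)."""
    return 1.0 / np.sqrt(t + T**2)

def psi_prime_huber(t, T):
    """Huber: psi'(t) = 1/T for t <= T^2, 1/sqrt(t) for t > T^2."""
    t = np.asarray(t, dtype=float)
    # np.maximum inside the sqrt avoids evaluating 1/sqrt(t) at small t
    return np.where(t <= T**2, 1.0 / T, 1.0 / np.sqrt(np.maximum(t, T**2)))

def psi_prime_gm(t, T):
    """Geman & McClure: psi'(t) = T^3/(t + T^2)^2 (non-convex functional)."""
    return T**3 / (t + T**2)**2
```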
4.5 Efficient gradient methods

A very significant improvement is obtained by means of a recently proposed method [13], based on the scaling of the gradient suggested by the SGM algorithm. This approach is called the scaled gradient projection (SGP) method and, in this case, a convergence proof is available. Moreover, it can be applied to all the iterative algorithms based on a descent direction obtained with a suitable scaling of the gradient. In general, it can be used for the minimization of a continuously differentiable function J(x) over a closed and convex set \Omega; problem 4.19 is exactly of this form when \Omega is the nonnegative orthant. To this purpose, we recall the definition of feasible vector taken from Bertsekas [7]. Given a feasible vector x, i.e. a vector x \in \Omega, a feasible direction at x is a vector z such that x + \lambda z \in \Omega for sufficiently small \lambda > 0. For instance, the negative scaled gradient of equation 4.21 is a descent direction that is also a feasible direction.
If x^{(k)} is the current iterate then, as proved in [13], for any given positive step-length \alpha_k and any positive definite scaling matrix S_k,

    d^{(k)} = P_{\Omega,S_k} \left\{ x^{(k)} - \alpha_k S_k \nabla J(x^{(k)}) \right\} - x^{(k)}

is a feasible and descent direction, the projection being on \Omega in the metric induced by the scalar product (S_k^{-1} x, z). Finally, the new iterate is obtained by means of a line search along the descent direction:

    x^{(k+1)} = x^{(k)} + \lambda d^{(k)} ,    (4.28)

based, for instance, on the Armijo rule [7]. Convergence of the algorithm is proved for any selection of the step-lengths and scalings in positive compact sets. Acceleration is obtained by using suitable Barzilai-Borwein rules [4] for step-length selection and suitable scaling.

Algorithm SGP: Scaled Gradient Projection Method

1. Initialization. Let \alpha_{min}, \alpha_{max} \in R be such that 0 < \alpha_{min} < \alpha_{max}, \beta, \gamma \in (0,1), and let M be a positive integer. Set x^{(0)} \in C, S_0 \in S, \alpha_0 \in [\alpha_{min}, \alpha_{max}].

For k = 0, 1, 2, ...

2. Projection. Compute the descent direction

    d^{(k)} = P_{C,S_k^{-1}} ( x^{(k)} - \alpha_k S_k \nabla J(x^{(k)}) ) - x^{(k)} .

3. Line-search. Set \lambda_k = 1 and \bar{J} = \max_{0 \le j \le \min\{k,M-1\}} J(x^{(k-j)}).
   While J(x^{(k)} + \lambda_k d^{(k)}) > \bar{J} + \gamma \lambda_k \nabla J(x^{(k)})^T d^{(k)}: set \lambda_k = \beta \lambda_k.
   Set x^{(k+1)} = x^{(k)} + \lambda_k d^{(k)}.

4. Update. Define S_{k+1} \in S and \alpha_{k+1} \in [\alpha_{min}, \alpha_{max}].

end

The following notations are useful to go deeper into the subject. For a given vector x \in R^n, we denote by \|x\|_S the norm induced by the n \times n symmetric positive definite matrix S, that is, \|x\|_S = \sqrt{x^T S x}. Furthermore, for some given positive scalars c_1 and c_2, let S be the set of the n \times n symmetric positive definite matrices S such that

    c_1 \|x\|^2 \le x^T S x \le c_2 \|x\|^2 , \quad \forall x \in R^n .    (4.29)

Finally, we denote by P_{C,S}(x) the projection of x \in R^n over C in the norm \| \cdot \|_S, that is,

    P_{C,S}(x) = \arg\min_{z \in C} \|z - x\|_S = \arg\min_{z \in C} \left( \frac{1}{2} z^T S z - z^T S x \right) .    (4.30)
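With a diagonal scaling and C equal to the nonnegative orthant, the projection in step 2 reduces to componentwise clipping, and the whole scheme can be sketched compactly. The code below is an illustrative simplification of Algorithm SGP, not the reference implementation: it uses a fixed BB1-like step-length update (eq. 4.33), a monotone line search (M = 1), and function names of our own:

```python
import numpy as np

def sgp(J, grad, scaling, x0, n_iter=100, alpha0=1.3,
        alpha_min=1e-3, alpha_max=1e5, beta=0.4, gamma=1e-4):
    """Scaled gradient projection on the nonnegative orthant (sketch).

    scaling(x): diagonal of S_k as a positive array; for diagonal S the
    projection in the S^{-1}-metric is componentwise clipping at 0.
    """
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    alpha = alpha0
    g = grad(x)
    for _ in range(n_iter):
        S = scaling(x)
        d = np.maximum(x - alpha * S * g, 0.0) - x       # step 2: feasible descent direction
        lam = 1.0
        for _ in range(30):                              # step 3: Armijo backtracking (M = 1)
            if J(x + lam * d) <= J(x) + gamma * lam * (g @ d):
                break
            lam *= beta
        x_new = x + lam * d
        g_new = grad(x_new)
        s, z = x_new - x, g_new - g
        denom = s @ (z / S)                              # step 4: BB1 rule in the scaled metric
        alpha = np.clip((s @ (s / S**2)) / denom, alpha_min, alpha_max) if denom > 0 else alpha_max
        x, g = x_new, g_new
    return x
```

With the identity scaling this is the GP method discussed later; passing the SGM scaling of equation 4.22 (suitably thresholded) gives SGP proper.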
The main steps of the SGP method are stated in Algorithm SGP. Several reasons make this approach appealing for solving problem (3.28). First of all, it is very simple: it belongs to the class of standard scaled gradient methods [7], with variable step-length \alpha_k and a non-monotone line-search strategy [10]. Secondly, due to the special constraints of the problem and to appropriate choices of S_k, the projection operation in step 2 can be computationally inexpensive. Finally, the iterative scheme can achieve a good convergence rate by exploiting the effective step-length selection rules recently proposed in the literature.

For the sake of completeness, we report some important details on the SGP implementation evaluated in this work. The choice of the scaling matrix S_k must avoid introducing significant computational costs and, in particular, it must keep the projection P_{C,S_k^{-1}}(\cdot) in step 2 computationally inexpensive. This can be done, for example, by using a diagonal scaling, so that the projection is obtained by solving a separable quadratic program for which efficient linear-time solvers can be used [20, 44]. According to the considerations of the previous section, we choose the following modification of the scaling matrix defined in (4.22):

    S_k = \mathrm{diag}\left\{ \max\left\{ c_1 , \frac{x_j^{(k)}}{V_j(x^{(k)})} \right\} \right\} ,    (4.31)

where c_1 > 0 is a prefixed threshold. Therefore (4.29) is satisfied, c_1 being just this threshold and c_2 = c/\nu, with c the flux constant and \nu > 0 such that

    \nu = \min_j \, \min_{x \in C} \{ V_j(x) \} .    (4.32)
The line-search step of the SGP consists in a non-monotone strategy that uses successive reductions of \lambda_k to make J(x^{(k+1)}) lower than the maximum of the objective function on the last M iterations [10, 32]. Of course, if M = 1, then the strategy reduces to the standard Armijo rule. The updating rule for the step-length \alpha_k is crucial for improving the convergence rate of the scheme; we use special step-length selections derived from the two Barzilai-Borwein (BB) rules [4], as usually done in many effective gradient methods [20, 21, 28, 64, 79]. In the case of scaled gradient methods, by proceeding as for the derivation of the BB rules, we can regard the matrix B(\alpha_k) = (\alpha_k S_k)^{-1} as an approximation of the Hessian \nabla^2 J(x^{(k)}) and force a quasi-Newton property on B(\alpha_k):

    \alpha_k^{BB1} = \arg\min_{\alpha \in R} \| B(\alpha) s^{(k-1)} - z^{(k-1)} \|

or

    \alpha_k^{BB2} = \arg\min_{\alpha \in R} \| s^{(k-1)} - B(\alpha)^{-1} z^{(k-1)} \| ,

where s^{(k-1)} = x^{(k)} - x^{(k-1)} and z^{(k-1)} = \nabla J(x^{(k)}) - \nabla J(x^{(k-1)}); in this way, the following step-lengths are obtained:

    \alpha_k^{BB1} = \frac{ s^{(k-1)T} S_k^{-1} S_k^{-1} s^{(k-1)} }{ s^{(k-1)T} S_k^{-1} z^{(k-1)} }    (4.33)
4.5 Efficient gradient methods
and
T
αBB2 = k
s(k−1) Sk z (k−1) T
z (k−1) Sk Sk z (k−1)
.
(4.34)
At this point, following [28, 79], we exploit an adaptive alternation of the values provided by equations 4.33 and 4.34. More precisely, the selection algorithm works as follows. We fix an interval [\alpha_{min}, \alpha_{max}] and choose a value of \alpha_0 in this interval. We also fix \tau_1 \in (0,1) and a nonnegative integer M_\alpha. Next, for k = 1, 2, ..., if s^{(k-1)T} z^{(k-1)} \le 0, then \alpha_k = \alpha_{max}; else we compute

    \alpha_k^1 = \max\{ \alpha_{min} , \min\{ \alpha_k^{BB1} , \alpha_{max} \} \} ,
    \alpha_k^2 = \max\{ \alpha_{min} , \min\{ \alpha_k^{BB2} , \alpha_{max} \} \} .

If \alpha_k^2 / \alpha_k^1 \le \tau_k, then we set

    \alpha_k = \min\{ \alpha_j^2 , \; j = \max\{1, k - M_\alpha\}, ..., k \} ; \quad \tau_{k+1} = 0.9\, \tau_k ;

else we set

    \alpha_k = \alpha_k^1 ; \quad \tau_{k+1} = 1.1\, \tau_k .

Unlike the rules in [28] and [79], the above strategy updates the threshold \tau_k at each iteration; in our experience, this makes the choice of \tau_1 less important for the performance of the method and reduces the drawbacks due to the use of the same step-length rule in too many consecutive iterations.

A convergence analysis of the SGP method is carried out in [13] for the general case of the minimization of differentiable functions on closed convex sets. This analysis is based on several well-known technical results on gradient projection type methods [7, 10, 11] and is not included in the present discussion. Here, we simply recall that, when algorithm SGP is applied to problem (3.28), the following proposition may be derived from [13].

Proposition 8. Let {x^{(k)}} be the sequence generated by applying algorithm SGP to problem (3.28). Every accumulation point x^* of {x^{(k)}} is a constrained stationary point, that is,

    \nabla J(x^*)^T (x - x^*) \ge 0 \quad \forall x \in C .
If J(x) is a convex function, then every accumulation point of {x^{(k)}} is a solution of problem (3.28).

Concerning the acceleration approach based on the use of the Armijo rule along the feasible direction, we first remark that, since \Omega is the nonnegative orthant, it is not necessary to modify the projection operator as done in [13], where the projection is performed according to the norm induced by S_k^{-1}, i.e. \|x\|_{S_k^{-1}} = \sqrt{x^T S_k^{-1} x}. Then the feasible direction is given by

    z^{(k)} = P_+ \left[ x^{(k)} + \alpha_k S_k (H^T y - H^T H x^{(k)}) \right] - x^{(k)} .

In [13] it is proved that this is also a descent direction, and therefore the current iterate can be updated as in equation 4.28, with \lambda_k determined by means of the Armijo rule [7].
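The adaptive BB1/BB2 alternation described above can be sketched as a small routine; the function name `next_steplength` and the mutable history list are our own conventions, with the diagonal of S_k passed as an array:

```python
import numpy as np

def next_steplength(s, z, Sdiag, bb2_hist, tau,
                    alpha_min=1e-3, alpha_max=1e5, M_alpha=2):
    """Adaptive alternation of the scaled BB rules (eqs. 4.33-4.34).

    s = x^{(k)} - x^{(k-1)}, z = gradient difference, Sdiag = diagonal of S_k.
    Returns the new step-length and the updated threshold tau.
    """
    if s @ z <= 0:
        return alpha_max, tau
    bb1 = (s @ (s / Sdiag**2)) / (s @ (z / Sdiag))      # eq. 4.33
    bb2 = (s @ (Sdiag * z)) / (z @ (Sdiag**2 * z))      # eq. 4.34
    a1 = np.clip(bb1, alpha_min, alpha_max)
    a2 = np.clip(bb2, alpha_min, alpha_max)
    bb2_hist.append(a2)
    if a2 / a1 <= tau:
        return min(bb2_hist[-M_alpha:]), 0.9 * tau      # min of recent BB2 values
    return a1, 1.1 * tau                                # otherwise take BB1
```

With the identity scaling the two rules reduce to the classical BB step-lengths s^T s / s^T z and s^T z / z^T z.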
4.5.1 Acceleration of the PL, ISRA and RL algorithms

Improved versions of PL and ISRA can be obtained by exploiting the SGP accelerating strategies. In this framework, the distinctive feature between these two methods is the scaling matrix. Both methods use diagonal scaling matrices, which ensure that the scaled gradient direction is a descent direction for the functional. The acceleration of the PL method starts from the identity scaling matrix S_k = S(x^{(k)}) = I, and so this accelerated algorithm is also called Gradient Projection (GP), the scaling being irrelevant.

Remark 15. In the case of image deblurring we have, in particular, that the imaging matrix H is a circulant matrix, and hence the diagonal of the matrix H^T H satisfies

    (H^T H)_{ii} = \sum_{j=1}^{n} (H_{ji})^2 = \mathrm{const} \quad \forall i \in \{1, ..., n\} .    (4.35)

This means that the Hessian of the LS functional has a constant diagonal and, in this way, GP can be reinterpreted as a projected quasi-Newton method for a constrained LS problem, with a very efficient step-length selection strategy.

On the other hand, the acceleration of ISRA starts from the scaling matrix

    S(x^{(k)}) = \frac{x^{(k)}}{H^T H x^{(k)} + b} .

This matrix then has to be thresholded according to equations 4.31 and 4.32. Finally, in the case of Poisson noise, the acceleration of the RL algorithm starts from the scaling matrix

    S(x^{(k)}) = \frac{x^{(k)}}{h} ,

where h is defined by equation 3.16 and, usually, in an image deblurring case, it is a constant vector.

Remark 16. Another advantage of coupling SGM with SGP techniques is the availability of convergent and very efficient algorithms also for the regularized functionals, whatever the choice of the regularizing functional. Following the idea in equation 4.27, one can compute the regularized scaling matrices according to the formula

    S(x^{(k)}) = \frac{x^{(k)}}{V_y(x^{(k)}) + \mu V_R(x^{(k)})} .

This remark is the fundamental idea of the first section of chapter 8.
5 Image reconstruction in the case of Gaussian noise
In this chapter we consider two different applications of the algorithms presented in the previous chapter, for the reconstruction of images when the noise on the data is Gaussian. Our purposes are mainly two. First, we want to test the algorithms introduced in the previous chapter in the case of image deconvolution, by considering a few test problems; moreover, we introduce some other commonly used algorithms in order to compare their performance with that of the accelerated algorithms proposed in the previous chapter. In particular, we will use a PSF whose corresponding modular transfer function (MTF), i.e. the modulus of its (discrete) Fourier transform, simulates that of a ground-based telescope. Secondly, we deal with a problem of archaeological magnetic prospection. In such a case, most targets may be simply modeled by a single layer of constant depth and thickness. Under this assumption, the recovery of the magnetization distribution of the buried layer from magnetic surface measurements is an ill-posed problem, described by a Fredholm equation of the first kind, that requires stabilization techniques to be solved. In analogy with image reconstruction theory, by formulating the problem as a deconvolution, we can consider the solution showing the resolved subsoil features as the focused version of the blurred and noisy magnetic raw-data image. Exploiting the image deconvolution tools, we apply two iterative reconstruction methods, the projected Landweber method and a modified ISRA. We use different regularizing functionals in order to inject a priori information into the optimization problem, showing that edge-preserving, total variation-like functionals give better results.
5.1 Efficient algorithms in a simulated astronomical example

We compare the algorithms introduced in the previous sections in the case of image deconvolution, by assuming that \Omega is the nonnegative orthant. In such a case H_{i,j} = h_{i-j}, where i, j are multi-indexes ranging over the domain of the image, an N \times N array. The array h_i is the point spread function (PSF) of the imaging system, and we assume that it satisfies the positivity and normalization conditions of equations 2.8.

We consider a few test problems. The PSF used in our numerical experiments is shown in figure 5.1 (left panel), together with the corresponding modular transfer function (MTF), i.e. the modulus of its (discrete) Fourier transform (right panel). The MTF is visualized using a log scale. The PSF simulates that of a ground-based telescope (in the right panel of figure 5.1 the band of the telescope is evident, as well as the out-of-band noise affecting the PSF) and can be downloaded from http://www.mathcs.emory.edu/~nagy/RestoreTools/.
Figure 5.1: The PSF used in our numerical experiments (left panel) and the corresponding OTF (right panel).
We remark that the minimum value of the MTF is not zero, even if it is quite small, about 5.82 \times 10^{-13}. Therefore the minimum of the corresponding constrained LS problem is unique, even if it is not an acceptable solution (see section 2.2 and figure 4.1).
Figure 5.2: The satellite (left panel) and the corresponding blurred and noisy image (right panel).
We consider two objects: an image of the nebula Ngc5979, already shown in the upper-left panel of figure 4.1, and the frequently used satellite image, shown in the left panel of figure 5.2. Both objects have values ranging approximately from 0 to 255, but they have quite different features: the nebula is a diffuse and smooth object, while the satellite has sharp edges and a rather complex structure. Both objects are convolved with the PSF shown in figure 5.1, and the results are perturbed with additive white Gaussian noise with zero expected value and two different variances: \sigma^2 = 1 and \sigma^2 = 5. The blurred and noisy images (\sigma^2 = 5) are shown in figure 4.1 (upper-right panel) and in figure 5.2 (right panel).
5.1.1 Acceleration of the basic algorithms

More classical improved versions of PL and ISRA can be obtained by exploiting some well-known accelerating strategies widely used in the area of first-order optimization methods. To introduce them, we recall the definition of projection arc taken from Bertsekas [7]. The projection arc associated to x is the set of vectors defined by

    x(\mu) = P_+ \left[ x - \mu \nabla J_y(x) \right] , \quad \mu > 0 ,    (5.1)

where P_+ is the projection on the nonnegative orthant, as already defined in the previous chapter. This definition can be extended to the case where the gradient is replaced by a scaled gradient, with a positive definite scaling matrix S. Then the projection arc is the set of vectors given by

    x_S(\mu) = P_+ \left[ x - \mu S \nabla J_y(x) \right] , \quad \mu > 0 .    (5.2)

Now we summarize the methods that will be compared with the accelerated gradient projection methods presented in the previous chapter.
5.1.1.1 PL combined with the Armijo rule along the projection arc

The simplest way of accelerating PL is to update the current iterate x^k by exploiting a variable step-length obtained from an Armijo rule along the projection arc x^k(\mu), defined by equation 5.1 with x replaced by x^k. Therefore the iteration takes the following form:

    x^{k+1} = P_+ \left[ x^k + \mu^k (H^T y - H^T H x^k) \right] ,    (5.3)

with \mu^k chosen according to the following rule: having fixed the constants \bar{\mu} > 0 and T, \gamma \in (0,1), then \mu^k = T^{m_k} \bar{\mu}, where m_k is the first nonnegative integer m for which

    J_y(x^k) - J_y(x^k(T^m \bar{\mu})) \ge \gamma \nabla J_y(x^k)^T \left( x^k - x^k(T^m \bar{\mu}) \right) ,    (5.4)

the gradient being given in equation 3.17. This algorithm will be called the projection arc (PA) algorithm. A convergence proof is given, for instance, in [43].

In PA, the search for \mu^k uses a fixed initial guess \bar{\mu}, the same at each iteration, and, for this reason, the procedure may be inefficient in some cases. To overcome this drawback, the more sophisticated line-search strategy introduced in [51] can be used. At iteration k, the previous step-length \mu^{k-1} is assumed as the initial trial in the search for the suited \mu^k. If this value does not satisfy the sufficient decrease condition 5.4 at iteration k, then the step-length is decreased by multiplying by T until the condition is satisfied. On the other hand, if the sufficient decrease condition is satisfied by \mu^{k-1}, then the step-length is increased by dividing by T until a value is reached such that the condition is not satisfied; the preceding value is used as the new step-length. We denote this algorithm as PA_1.
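One PA iteration can be sketched as follows for a generic imaging matrix H stored as a dense array; `pa_step` and `max_backtracks` are names of our own choosing:

```python
import numpy as np

def pa_step(x, H, y, mu_bar=6.0, T=0.4, gamma=1e-4, max_backtracks=50):
    """One PL iteration with the Armijo rule along the projection arc (eqs. 5.3-5.4)."""
    r = H @ x - y
    g = H.T @ r                          # gradient of the LS functional (eq. 3.17)
    Jx = 0.5 * np.sum(r**2)
    mu = mu_bar
    for _ in range(max_backtracks):
        x_mu = np.maximum(x - mu * g, 0.0)             # point on the projection arc (5.1)
        J_mu = 0.5 * np.sum((H @ x_mu - y)**2)
        if Jx - J_mu >= gamma * (g @ (x - x_mu)):      # sufficient decrease (5.4)
            return x_mu
        mu *= T                                        # reduce the step-length
    return x
```

The PA_1 variant would simply replace the fixed initial guess `mu_bar` by the step-length accepted at the previous iteration, as described above.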
5.1.1.2 ISRA combined with the Armijo rule along the projection arc

As already observed in chapter 4, ISRA can be viewed as a special scaled gradient method that exploits the direction

    d^k = - \frac{x^k}{H^T H x^k + b} \, \nabla J_y(x^k) .

This scaled gradient direction is feasible since, for any \mu \in (0,1], x^{k+1} = x^k + \mu d^k is nonnegative; more precisely, the non-negativity of x^{k+1} holds true for all \mu smaller than some \mu^k_{max} \ge 1, and, in [47], a moderate acceleration of ISRA is obtained by searching for a suited \mu (applying, for instance, the Armijo rule) in the interval (0, \mu^k_{max}).

The extension of the acceleration technique used in the PA algorithms to ISRA is obvious. Given the current iterate x^k and the scaling matrix S_k, the corresponding projection arc is given by equation 5.2, with x replaced by x^k and S replaced by S_k. Then, the iteration of this generalized ISRA is given by

    x^{k+1} = P_+ \left[ x^k + \mu^k S_k (H^T y - H^T H x^k) \right] ,    (5.5)

where P_+ is the projection on the nonnegative orthant and \mu^k is chosen according to the Armijo rule as in equation 5.4. This algorithm will be called the scaled projection arc (SPA) algorithm, and the proof of its convergence can be found in [43]. Moreover, as in the case of PL, we can use the line-search approach suggested in [51]; the corresponding algorithm will be called SPA_1.

As for the choice of the scaling matrix S_k, it can be derived from the diagonal scaling used in equation 4.21. However, for ensuring properties useful in the convergence proofs (see [7, 43, 13]), the following slightly modified form of the diagonal scaling is used:

    S_k = \mathrm{diag}\left\{ \min\left\{ L , \max\left\{ \frac{1}{L} , \frac{x^k}{H^T H x^k + b} \right\} \right\} \right\} ,

where L > 1 is an appropriate threshold (in our simulations we take L = 10^{10}). It is obvious that such a modification is irrelevant in practice.
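The thresholded diagonal scaling amounts to a single clipping operation; the helper name below is ours:

```python
import numpy as np

def spa_scaling(x, HtHx_b, L=1e10):
    """Diagonal of S_k = min{L, max{1/L, x_j / (H^T H x + b)_j}} for SPA."""
    return np.clip(x / HtHx_b, 1.0 / L, L)
```

In practice the clipping almost never activates, which is why the modification is numerically irrelevant while still guaranteeing the bounds needed by the convergence proofs.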
5.1.2 Numerical experiments

We have six accelerated algorithms at our disposal: GP, PA and PA_1 as accelerations of the PL method, and SGP, SPA and SPA_1 as accelerations of ISRA. Since all the algorithms converge to the unique LS solution, and have the semiconvergence property, we consider two stopping rules. The first consists in choosing the value of k corresponding to the minimum of the relative reconstruction error, defined as follows in terms of the \ell_2 norm:

    \rho_k = \frac{ \| x^k - \bar{x} \| }{ \| \bar{x} \| } ,    (5.6)
where \bar{x} is the original object.

Table 5.1: Reconstruction of the nebula Ngc5979 in the case \sigma^2 = 1. Relative reconstruction error and number of iterations are given for the two stopping rules (minimum and discrepancy).

                          minimum                            discrepancy
             rel. err.   iter   iter/sec   time (sec)     rel. err.   iter
    PL         8.06 %    1134     485       2.34 sec        8.95 %     464
    PA         8.05 %     777     166       4.67 sec        8.93 %     318
    PA_1       8.04 %     211     106       2.00 sec        8.60 %     124
    GP         8.08 %     155     228       0.68 sec        8.78 %      73
    ISRA       7.50 %    1800     409       4.40 sec        8.21 %     806
    SPA        7.96 %     568     178       3.20 sec        8.29 %     335
    SPA_1      8.93 %     472      98       4.81 sec        9.05 %     344
    SGP        7.51 %     109     163       0.67 sec        8.13 %      65
The second is the discrepancy principle defined in 4.6. Moreover, if \bar{x}^* is a generic solution provided by one of these stopping rules, for verifying its statistical significance [61] we compute the normalized residual defined by

    R^* = \frac{1}{\sigma} \left( H \bar{x}^* - y \right) .    (5.7)
Finally, a careful parameter tuning for our simulations suggests the following parameter settings:

- PL: \mu = 1.8;
- PA: \bar{\mu} = 6.0;
- PA_1: \bar{\mu} = 3.0 in the first iteration;
- GP: \mu_0 = 1.3, \mu_{min} = 10^{-3}, \mu_{max} = 10^5, M_\mu = 2, \tau_1 = 0.15;
- SPA: \bar{\mu} = 6.0, L = 10^5;
- SPA_1: \bar{\mu} = 3.0 in the first iteration, L = 10^5;
- SGP: \mu_0 = 1.3, \mu_{min} = 10^{-3}, \mu_{max} = 10^5, M_\mu = 2, \tau_1 = 0.15, L = 10^5.
Table 5.2: Reconstruction of the nebula Ngc5979 in the case \sigma^2 = 5. Relative reconstruction error and number of iterations are given for the two stopping rules.

                          minimum                            discrepancy
             rel. err.   iter   iter/sec   time (sec)     rel. err.   iter
    PL        10.03 %     488     440       1.11 sec       11.13 %     193
    PA        10.01 %     333     161       2.07 sec       11.08 %     135
    PA_1       9.96 %     103      17       6.14 sec       10.64 %      54
    GP        10.02 %      79     214       0.37 sec       10.88 %      41
    ISRA       9.36 %     743     383       1.94 sec       10.24 %     317
    SPA        9.95 %     219     171       1.28 sec       10.22 %     139
    SPA_1     10.87 %     206      99       2.08 sec       10.99 %     161
    SGP        9.43 %      54     142       0.38 sec       10.05 %      35
The other parameters used in the line-search strategies are T = 0.4 and γ = 10⁻⁴. Furthermore, for all the algorithms x0 is the constant image with the same flux as the detected image. In table 5.1 and table 5.2 we report the results obtained in the case of the nebula for the two noise levels. From these results we observe that GP and SGP outperform all the other methods, both in number of iterations and in computing time. Moreover, their reconstruction errors are fully comparable with those provided by the other methods. The last two columns of tables 5.1 and 5.2 refer to the results obtained with the discrepancy principle. We remark that it provides a smaller number of iterations and a higher reconstruction error, but the corresponding reconstructions are still acceptable. In figure 5.3 we show the best reconstructions, i.e. those corresponding to the minimum of the error (5.6), obtained by means of SGP for the two noise levels, together with the corresponding normalized residuals. In table 5.3 and table 5.4 we report the results obtained in the case of the satellite for the two noise levels, while in figure 5.4 we show the best reconstructions obtained by means of SGP, with the corresponding normalized residuals. In this case SGP clearly outperforms the other methods. However, in the case σ² = 1, the normalized residual shows a few artifacts, indicating that this reconstruction is presumably not the best one from the statistical point of view. In any case, similar artifacts are also present in the residuals of the reconstructions provided by the other methods. We remark that the scaled methods provide reconstructions with an error that is about 2% higher than that of the non-scaled methods. Therefore, for completeness, we show in figure 5.5 the reconstructions and the residuals obtained by means of GP.

Figure 5.3: Upper panels: the best reconstruction of the nebula in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals.

A visual inspection seems to confirm that these reconstructions are better than those provided by SGP. Moreover, the residuals do not exhibit the artifacts observed in figure 5.4. This effect of the scaling requires further investigation. We also performed some comparisons with the well-known Modified Residual Norm Steepest Descent algorithm [2]: in the case of the nebula it exhibits a very poor convergence rate with respect to GP and SGP, while it behaves similarly to GP in the case of the satellite. Thus, SGP seems clearly faster than the residual-norm steepest descent approach, as already observed in the case of the deconvolution of images corrupted by Poisson noise [13]. In order to assess the computational complexity of SGP, we study the reconstruction time in relation to the size of the test images. We start with the 256 × 256 objects of the previous tests and expand, by zero padding, their Fourier transform to a 512 × 512 array. The same procedure is applied to the PSF (note that, in these simulations, we use an ideal PSF, i.e. an Airy pattern, which provides a smaller reconstruction error for a given noise level). While the new PSF is already normalized to 1 like the previous one, we multiply the new object by 4 so that its mean pixel value is approximately the same as that of the original object. As a consequence, if we convolve the new object with the new PSF and perturb the result with additive Gaussian noise with the same σ², we obtain a noise level very close to that of the smaller images. The same procedure is
Table 5.3: Reconstruction of the satellite in the case σ² = 1. Relative reconstruction error and number of iterations are given for the two stopping rules.

          minimum stopping rule                        discrepancy
          rel. err.   iter    iter/sec   time (sec)   rel. err.   iter
PL        27.83 %    10307     468       22.02        30.35 %     2355
PA        27.83 %     3600     260       13.86        30.34 %      920
PA_1      27.83 %     3166     121       26.12        30.34 %      964
GP        27.82 %     1702     213        8.00        30.18 %      416
ISRA      29.95 %     5248     403       13.01        30.21 %     3466
SPA       30.23 %     1348     182        7.41        30.41 %      961
SPA_1     29.99 %     1215     100       12.17        30.12 %      916
SGP       29.98 %      238     146        1.63        30.10 %      170
Table 5.4: Reconstruction of the satellite in the case σ² = 5. Relative reconstruction error and number of iterations are given for the two stopping rules.

          minimum stopping rule                        discrepancy
          rel. err.   iter    iter/sec   time (sec)   rel. err.   iter
PL        31.24 %     3813     415        9.18        34.19 %      886
PA        31.24 %     1400     246        5.69        34.17 %      377
PA_1      31.23 %     1316     126       10.42        34.18 %      406
GP        31.21 %      806     228        3.54        33.86 %       99
ISRA      32.84 %     2369     419        5.65        33.73 %     1280
SPA       33.18 %      630     174        3.62        33.94 %      363
SPA_1     32.90 %      581     101        5.78        33.47 %      373
SGP       32.82 %      124     146        0.85        33.65 %       78
repeated to obtain 1024 × 1024 and 2048 × 2048 images. In this way, for each object, we have four test images with different sizes and approximately the same noise level. Next, we determine the optimal number of iterations for the 256 × 256 images by minimizing the relative reconstruction error (5.6), and we reconstruct the larger images using this number of iterations: therefore the computational time of each reconstruction depends only on the computational time of a single iteration.

Figure 5.4: Upper panels: the best SGP reconstruction of the satellite in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals.

The results are reported in table 5.5. We note a slight decrease of the reconstruction error when the size of the images increases, even if this does not correspond to a significant improvement of the reconstruction. By comparing the computational times for the different sizes, it appears that the complexity of SGP is essentially related to the complexity of the FFT algorithm.
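The zero-padding procedure used above to build the larger test images can be sketched as follows (a sketch for even-sized square images; `fourier_zero_pad` is a hypothetical helper, not the thesis code):

```python
import numpy as np

def fourier_zero_pad(img):
    """Double the sampling of an even-sized N x N image by zero-padding
    its centered Fourier transform to 2N x 2N. The DC coefficient is
    unchanged, so the total flux is preserved, while the mean pixel
    value drops by a factor 4."""
    n = img.shape[0]
    F = np.fft.fftshift(np.fft.fft2(img))
    P = np.zeros((2 * n, 2 * n), dtype=complex)
    P[n // 2 : n // 2 + n, n // 2 : n // 2 + n] = F
    return np.fft.ifft2(np.fft.ifftshift(P)).real
```

Since the DC coefficient is unchanged, the padded PSF keeps its normalization to 1, while the object is multiplied by 4 afterwards to restore its mean pixel value, e.g. `4 * fourier_zero_pad(obj)`.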
5.2 Regularized gradient algorithms in a geomagnetic example

We now introduce the second example of this chapter. The problem we face is one of the hardest in potential field prospection: obtaining information about the depth of the sources causing the measured anomalies. In some cases the aim of the survey is oriented more towards delineating the horizontal position, extent and edges of the sources than towards their depth. For example, the lineaments extrapolated from magnetic maps are employed for small-scale regional geological interpretation. Another interesting case is the use of magnetics in archaeological exploration [14, 34]. In fact
Figure 5.5: Upper panels: the best GP reconstruction of the satellite in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals.
magnetic prospection is routinely employed in archaeological site surveying to map buried stone or brick foundation structures, roads or graves, and to outline the locations of kilns, hearths and ferric objects [78]. The archaeological interpretation of magnetic data requires the detection of the signal in the background noise, and the reconstruction of horizontal images of the subsoil features, showing location, shape and dimensions of the archaeological targets in a form easily readable also by non-geophysical operators. To achieve these goals, additional data processing is generally needed, with emphasis on image enhancement and data filtering or inversion [41, 71]. Treating the buried bodies as a set of vertical-sided rectangular prisms is a technique frequently used in two- and three-dimensional modeling of the archaeological structures buried in the subsoil. In a simplified approach (the single layer), the prisms can be supposed to be buried at equal depths [72, 1] and characterized by homogeneous magnetization of induced type or, in the case of permanent magnetization, along a known direction. This single layer model is effective in many situations, such as foundation structures, defense walls, roads and graves. In other cases, when the permanent magnetization is relevant but unknown (as for buried kilns or ferric bodies), target individuation can usually be achieved without further image enhancement [14].
Table 5.5: Behavior of the SGP algorithm applied to test images with different sizes. The same number of iterations is used for a given test object and a given noise level. An ideal PSF is used in these simulations.

nebula                σ² = 1, iter. = 58        σ² = 5, iter. = 12
dim.                  rel. err.   time (sec)    rel. err.   time (sec)
256²                   6.33        0.37          7.12        0.08
512²                   6.20        1.90          6.84        0.40
1024²                  6.20        8.86          6.98        1.86
2048²                  6.18       36.90          6.78        7.78

satellite             σ² = 1, iter. = 379       σ² = 5, iter. = 58
dim.                  rel. err.   time (sec)    rel. err.   time (sec)
256²                  26.18        2.29         28.55        0.36
512²                  25.83       12.44         27.42        1.86
1024²                 25.55       59.74         27.46        8.57
2048²                 25.51      250.20         26.83       36.51
5.2.1 Formulation of the problem

The quantity usually measured in magnetic surveys is the modulus of the total geomagnetic field, |T| (figure 5.6). Subtracting the modulus of the Earth geomagnetic field B0 (known from the IGRF, International Geomagnetic Reference Field), we obtain |T| − |B0|, which, when |B| ≪ |B0|, is a first-order approximation of the projection of the anomaly field in the direction of the Earth field, B · (B0/|B0|):

    |T| − |B0| = (|B0|² + 2 B0 · B + |B|²)^(1/2) − |B0|
               = (B0 · B)/|B0| + O(|B|²/|B0|) ,    (5.8)

having applied the first-order Taylor formula of the square root [12]. The relation between the projection of the anomalous magnetic field B(r) on the Earth field B0 (supposed constant within the survey area), with r = (r¹, r², r³), and the magnetization m(s) of the sources, located in the domain R0 ⊂ R³, with s = (s¹, s², s³),
Figure 5.6: The relation between the projection of the anomalous magnetic field B(r) on the Earth field B0, with r = (r¹, r², r³), and the difference between the measured geomagnetic field modulus |T| and the IGRF |B0|.

is described by [50]:

    B(r) = − ∫_{R0} ∇(r) ∇(s) (1/|r − s|) · m(s) dv .    (5.9)

The integrand of the above equation can be written in matrix form

    K(r − s) · m(s) = [ K11 K12 K13 ; K21 K22 K23 ; K31 K32 K33 ] (m1, m2, m3)ᵀ ,    (5.10)

where

    Kij(r − s) = ∂²/(∂rⁱ ∂sʲ) (1/|r − s|) .    (5.11)
We assume that the magnetization is parallel to the inducing field, i.e.

    m(s) = m(s) v ,    (5.12)

with v = B0/|B0| and m the intensity of magnetization. By projecting B(r) on B0 we have

    vᵀ · B(r) = ∫_{R0} vᵀ · K(r − s) · m(s) dv
              = ∫_{R0} vᵀ · K(r − s) · v m(s) dv
              = ∫_{R0} h3D(r − s) m(s) dv .    (5.13)

Hence, we have a 3D convolution [9]:

    y(r) = ∫_{R0} h3D(r − s) m(s) dv ,    (5.14)

where y(r) = vᵀ · B(r).
As shown in figure 5.6, what we actually measure is |T|, and from it we obtain |T| − |B0|, a first-order approximation of the projection of the anomaly onto the geomagnetic field direction, which we can compute through the forward model of equation 5.14 for a given m(s).
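The accuracy of the first-order approximation (5.8) is easy to check numerically; the field values below are illustrative placeholders, not survey data:

```python
import numpy as np

# a hypothetical Earth field of ~45000 nT and a small anomaly of a few nT
B0 = np.array([0.0, 20000.0, 40000.0])
B = np.array([3.0, -2.0, 5.0])

exact = np.linalg.norm(B0 + B) - np.linalg.norm(B0)   # |T| - |B0|
approx = B0 @ B / np.linalg.norm(B0)                  # first-order term of (5.8)

# the discrepancy is of order |B|^2 / |B0|, well below 0.001 nT here
assert abs(exact - approx) < 1e-3
```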
5.2.2 The single layer model

As said above, the forward magnetic model is a linearized system y = H m, where m is the intensity of magnetization, y is the projection of the anomalous magnetic field, approximately measured at the surface points, and H is the linear operator defined by the kernel h3D(r − s) [49]. We adopt a right-handed reference system. The single layer model [72] is based on the following assumptions:
– The sources belong to only one layer, with top at s³₀ and bottom at s³₀ + ξ. We refer to s³₀ as the depth and to ξ as the thickness of the layer.
– The intensity of magnetization does not depend on s³ inside the layer and is zero outside, so that the intensity function is x(s¹, s²) := m(s¹, s², s³₀).
– All the measures are taken on the plane s³ = 0.
Under these assumptions equation 5.14 becomes

    y(r¹, r²) := g(r¹, r², 0) = ∫∫ x(s¹, s²) h(r¹ − s¹, r² − s²) ds² ds¹ ,    (5.15)

where

    h(r¹ − s¹, r² − s²) = ∫_{s³₀}^{s³₀+ξ} h3D(r¹ − s¹, r² − s², −s³) ds³    (5.16)
The estimation of x(s1 , s2 ) from given values y(r 1 , r 2 ) is a deconvolution problem. Major computational advantages of the convolution approach are the reduction of the size of the arrays and the use of the FFT algorithm (reducing from O(N 2 ) to O(N log(N )) the computational complexity), increasing considerably the processing efficiency.
5.2.2.1 Discretization

The discretization is carried out by replacing the continuous functions with N × M arrays sampled on a regular grid. Given a single layer of homogeneous vertical-sided rectangular prisms buried at equal depth s³₀ and centered at positions (s¹ₘ, s²ₙ), where

    s¹ₘ = s¹₀ + m∆s¹ ,   m = 0, …, M − 1   (∆s¹ sample interval along the x-axis),
    s²ₙ = s²₀ + n∆s² ,   n = 0, …, N − 1   (∆s² sample interval along the y-axis),
Figure 5.7: The magnetic IRF.

the total magnetic anomaly field y(s¹ᵢ, s²ⱼ), measured at the surface point (s¹ᵢ, s²ⱼ), is given by the discrete convolution product

    y(s¹ᵢ, s²ⱼ) = Σ_{m=1}^{M} Σ_{n=1}^{N} h(s¹ᵢ − s¹ₘ, s²ⱼ − s²ₙ) x(s¹ₘ, s²ₙ) + ε(s¹ᵢ, s²ⱼ) ,    (5.18)
where the unknown of the problem, x(s¹ₘ, s²ₙ), is the intensity of magnetization of each homogeneous prism centered at (s¹ₘ, s²ₙ), and h(s¹ᵢ − s¹ₘ, s²ⱼ − s²ₙ) is the discrete impulse response function, i.e. the total magnetic field produced at the point (s¹ᵢ, s²ⱼ) by a prism of unitary magnetization centered at the point (s¹ₘ, s²ₙ) and buried at depth s³₀. In figure 5.7 we show the IRF corresponding to a prism at (N/2, M/2). The term ε represents additive noise, here defined as instrumental and cultural noise plus the effect of deviations from the simple model assumptions. Traditionally, the inverse problem expressed by equation 5.18 has been solved in the frequency domain by the "apparent susceptibility mapping" method, or in the space domain by two-dimensional deconvolution. The first approach [73] consists in the reduction to the pole of the Fourier transform of the total field, downward continuation, and recovery of the distribution of apparent magnetization by division by the "shape factor" of the prism. The second consists in the convolution of the total field with an optimum filter operator derived from the kernel [72], or in a Markov chain Monte Carlo approach [1]. In both cases some low-pass filtering must be applied to suppress high-frequency noise. The major drawbacks of the traditional methods arise from the fact that, in practice, the single layer model can only be a rough approximation of the actual distribution of magnetization in the subsoil. The effects due to a possibly variable depth and magnetization of the archaeological targets, their geometrical complexity and the presence of other magnetization contrasts (e.g. "geological" noise) should also be taken into account. Moreover, the required a priori estimates of depth and thickness of the sources may be biased. All these effects may produce anomalous and complex magnetization distributions in solutions based on the simplified assumptions.
Nevertheless, the single layer model is effective, simple, robust and easily applicable also in the field, since it does not require a great amount of computing resources. In the following we discuss an alternative approach to the above-mentioned traditional methods, based on the similarities of the single layer model with the generic image
restoration problem, in order to take advantage of the image restoration tools and to simplify the whole procedure.
5.2.2.2 Understanding "magnetic images"
With the assumption of a single layer of constant depth and thickness, the magnetic problem becomes a 2D problem in which we have to obtain the "true" image of the buried prisms, i.e. the distribution of the intensity of magnetization, from the blurred image given by the measurements of the total magnetic field on the terrain surface. In other words, we have to determine the distribution of the intensity of magnetization in a single layer of prisms at constant depth. Equation 5.18 is simply the discretization of a Fredholm integral equation of the first kind, with the addition of the (unknown) noise term. This is exactly the model used to describe a space-invariant image formation system, where x(s¹, s²) represents the input signal, y(s¹, s²) the output signal, that is, the image; h, commonly known as the point spread function (PSF), represents the impulse response function, and ε(s¹, s²) represents the additive noise. In this framework, the ill-conditioned geophysical problem stated in equation 5.18 is analogous to the problem of restoring an image x from data y blurred (by the operator h) and corrupted by noise (the additive term ε due to the recording process), a common task in digital image processing [6]. The comparison is summarized in table 5.6.

Space-invariant image formation system     Single layer potential field prospection
Data image                                 Measured magnetic field
Unknown object (pixels)                    Unknown susceptibility sources (prisms)
PSF                                        IRF
Estimation of the PSF                      Estimation of the depth of the sources
Poisson noise                              Gaussian noise

Table 5.6: Comparison between archaeological prospection and the image formation model

In magnetic prospection, the measured magnetic field is acquired by a magnetometer (typically proton-precession or flux-gate), which produces the so-called read-out noise, described by an additive Gaussian distribution. The "geological" noise cannot be modeled, since it depends on the object that we want to reconstruct, and handling it is an interpretative task. Within the single layer assumption, the IRF is space-invariant on every horizontal plane, changing only with depth. In our case the knowledge of the IRF is equivalent to the knowledge of the depth of the layer. The number of magnetic data is often smaller than the desired number of prisms which compose the
single layer of unknown susceptibility sources; but, in order to use the FFT, we have to match the number of data and unknowns. So we divide the single layer into a number of prisms equal to the number of magnetic data, and we sample the IRF accordingly.
5.2.2.3 IRF-adapted classical algorithms
Exploiting image deconvolution tools, we solve this problem using iterative algorithms derived from the ML approach described in chapter 4. These algorithms reconstruct the solution iteratively, starting from the lower frequencies and adding the higher ones step by step. Knowing the variance of the noise, we can stop the iterations, for example according to the discrepancy principle 4.6, before the high frequencies corresponding to the noise amplification are added to the solution. Moreover, these algorithms are very flexible tools, giving us the opportunity to add "a priori" constraints on the solutions: this can be achieved by means of suitable regularized functionals 3.3.1. To perform the inversion, we focus on two different iterative methods adapted to an IRF which, unlike a classical astronomical PSF, can take negative values. As discussed above (section 3.1.1), the ML approach, in the case of additive Gaussian noise, is equivalent to the minimization of the least-squares functional. In a sense, this functional Jy(x) measures the goodness of fit of the model predictions to the actual data. The iterative method we use to minimize the functional Jy(x) on the positive cone x ≥ 0 is the projected Landweber method (section 4.1.1). On the other hand, we derive and use a version of ISRA (equation 4.7) for the case in which no positivity of h is available. ISRA requires both non-negative data and a non-negative IRF, so we must modify the algorithm in order to adapt it to our case. To this purpose, we start by writing the KKT conditions for the constrained minima of the convex functional 4.2:

    x ∇Jy(x) = 0 ,   x ≥ 0 ,   ∇Jy(x) ≥ 0 ,

where the multiplication of arrays is in the Hadamard sense. According to the SGM, we look for a decomposition of the gradient of the form

    ∇Jy(x) = Vy(x) − Uy(x) ,

where Vy(x) > 0 and Uy(x) ≥ 0; Uy and Vy are not uniquely defined. The first KKT condition can then be written as a fixed-point equation:

    x = x Uy(x) / Vy(x) .    (5.19)
In the magnetic case we deal with an IRF that can have negative values, so, following the SGM, we modify ISRA to deal with negative values in both the data and the IRF. First we define

    h₊ = max{0, h} ,   h₋ = max{0, −h} ,
    [h* ∗ y]₊ = max{0, h* ∗ y} ,   [h* ∗ y]₋ = max{0, −(h* ∗ y)} ;    (5.20)
then, inserting h = h₊ − h₋ and h* ∗ y = [h* ∗ y]₊ − [h* ∗ y]₋, the gradient of the LS functional becomes

    ∇Jy(x) = ( [h* ∗ y]₋ + h*₊ ∗ h₊ ∗ x + h*₋ ∗ h₋ ∗ x ) − ( [h* ∗ y]₊ + h*₊ ∗ h₋ ∗ x + h*₋ ∗ h₊ ∗ x ) .    (5.21)

Applying the gradient decomposition described previously, we have

    Uy(x) = [h* ∗ y]₊ + h*₊ ∗ h₋ ∗ x + h*₋ ∗ h₊ ∗ x ,    (5.22)
    Vy(x) = [h* ∗ y]₋ + h*₊ ∗ h₊ ∗ x + h*₋ ∗ h₋ ∗ x .    (5.23)

Next we use the successive approximation relation (equation 5.19) to obtain the modified ISRA algorithm:

    x^(k+1) = x^(k) Uy(x^(k)) / Vy(x^(k)) .    (5.24)

Each iteration of this fixed-point scheme can be computed in the frequency domain using the convolution theorem, hence exploiting the speed of the FFT.
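A compact sketch of the modified ISRA iteration, with all convolutions computed in the frequency domain (the periodic boundary, the initialization and the iteration count are illustrative assumptions, not prescribed by the derivation):

```python
import numpy as np

F, iF = np.fft.fft2, np.fft.ifft2

def modified_isra(y, h, n_iter=200, eps=1e-12):
    """ISRA adapted to data y and IRF h that may take negative values.
    The positive/negative parts of h and of h^* * y follow eq. (5.20);
    the multiplicative update implements eqs. (5.22)-(5.24)."""
    Hp, Hm = F(np.maximum(h, 0.0)), F(np.maximum(-h, 0.0))
    # adjoint convolution h^* * y via the conjugate transfer function
    hty = iF(np.conj(F(h)) * F(y)).real
    hty_p, hty_m = np.maximum(hty, 0.0), np.maximum(-hty, 0.0)
    x = np.full(y.shape, max(y.mean(), eps))      # positive initial guess
    for _ in range(n_iter):
        Fx = F(x)
        U = hty_p + iF((np.conj(Hp) * Hm + np.conj(Hm) * Hp) * Fx).real
        V = hty_m + iF((np.conj(Hp) * Hp + np.conj(Hm) * Hm) * Fx).real
        x = x * U / (V + eps)                     # eq. (5.24), x stays >= 0
    return x
```

When h ≥ 0 and y ≥ 0, the terms h₋ and [h* ∗ y]₋ vanish and the update reduces to the classical ISRA.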
5.2.2.4 Use of regularized algorithms
In many archaeological surveys we can assume that what we look for is similar to the plan of a building, so we can use an edge-preserving regularizing functional. A general form of edge-preserving functional is given by equation 3.37, such as the so-called Total Variation (TV), for which ψ is given by equation 3.41. Other choices of ψ lead to other edge-preserving functionals [75]. For example, the Geman-McClure prior (equation 3.42), applied to the squared norm of the gradient ||D_{m,n} x||² (as defined in equation 3.38), leads to another suitable functional, already developed in a geophysical framework by [60] and called the Minimum Gradient Support (MGS). In general, for any function ψ, the gradient of JR takes the form

    ∇JR(x) = Σ_{m,n} { [ψ′(s¹ₘ₋₁, s²ₙ) + ψ′(s¹ₘ, s²ₙ₋₁) + 2ψ′(s¹ₘ, s²ₙ)] x(s¹ₘ, s²ₙ)
              − ψ′(s¹ₘ₋₁, s²ₙ) x(s¹ₘ₋₁, s²ₙ) − ψ′(s¹ₘ, s²ₙ₋₁) x(s¹ₘ, s²ₙ₋₁)
              − ψ′(s¹ₘ, s²ₙ) [x(s¹ₘ₊₁, s²ₙ) + x(s¹ₘ, s²ₙ₊₁)] } .
The last two edge-preserving methods allow the inversion procedure to preserve the blocky structures present in the images, leading to an optimal solution when such structures are present in the subsoil, as occurs for most archaeological targets.
5.2.2.5 Regularized projected Landweber algorithm
The regularized PLM takes the form

    x^(k+1) = P_C ( x^(k) − τ [ ∇Jy(x^(k)) + µ ∇JR(x^(k)) ] ) .
In the case of an edge-preserving functional, the iterative implementation of the algorithm requires the computation of the step length τ. We adopt an inexact line search, minimizing

    φ(τ) = J( x^(k) − τ ∇J(x^(k)) )

for τ > 0 [75]. We apply a dichotomic process to locate the zero of φ′(τ), and hence to find the optimal τ.
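The dichotomic step-length search can be sketched as follows (`grad` stands for the gradient of the full functional Jy + µJR; the bracketing strategy is an illustrative choice, not the thesis code):

```python
import numpy as np

def plm_step(x, grad, tau0=1.0, n_bisect=60):
    """One regularized projected-Landweber step with a dichotomic
    (bisection) search for the step length tau along -grad(x).
    phi(tau) = J(x - tau g) is minimized by locating the zero of
    phi'(tau) = -grad(x - tau g) . g, which is negative at tau = 0."""
    g = grad(x)
    dphi = lambda t: -np.sum(grad(x - t * g) * g)
    lo, hi = 0.0, tau0
    while dphi(hi) < 0 and hi < 1e6:      # bracket the zero of phi'
        hi *= 2.0
    for _ in range(n_bisect):             # dichotomic refinement
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return np.maximum(x - tau * g, 0.0)   # projection on the positive cone
```

For a quadratic functional the search recovers the exact minimizing step; in general it gives an inexact but monotone step.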
5.2.2.6 Regularized modified ISRA
The regularized modified ISRA takes the form

    x^(k+1) = x^(k) ( U(x^(k)) + µ Ur(x^(k)) ) / ( V(x^(k)) + µ Vr(x^(k)) ) ,

in which Ur and Vr result from applying the split gradient method to JR. For example, the splitting of the gradient for the TV functional is as follows:

    Ur(x) = Σ_{m,n} { ψ′(s¹ₘ, s²ₙ) [x(s¹ₘ₊₁, s²ₙ) + x(s¹ₘ, s²ₙ₊₁)]
              + ψ′(s¹ₘ₋₁, s²ₙ) x(s¹ₘ₋₁, s²ₙ) + ψ′(s¹ₘ, s²ₙ₋₁) x(s¹ₘ, s²ₙ₋₁) } ,

    Vr(x) = Σ_{m,n} x(s¹ₘ, s²ₙ) [ψ′(s¹ₘ₋₁, s²ₙ) + ψ′(s¹ₘ, s²ₙ₋₁) + 2ψ′(s¹ₘ, s²ₙ)] .

5.2.2.7 Estimate of µ and stopping criterion
The choice of the regularization parameter µ is crucial to obtain a solution that is physically admissible. In Tikhonov regularization there is, for each given image to deblur, an optimum µ for which the solution has minimum distance from the "true" image x(s¹, s²). The problem arises from the fact that we should know x(s¹, s²) in order to compute the optimal regularization parameter. The parameter µ is a weight that balances the minimization of the discrepancy against that of the JR functional. In the simulations, we choose µ, and stop the iterations, so that the reconstruction error defined in equation 5.25 reaches a minimum value during the iterative process.
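In the simulation setting, where the true image is available, this selection of µ reduces to a grid search on the relative reconstruction error; a sketch with hypothetical names (`reconstruct` runs the regularized algorithm for a given µ):

```python
import numpy as np

def select_mu(reconstruct, x_true, mu_grid):
    """Pick the regularization parameter minimizing the relative
    reconstruction error; possible only in simulations, where the
    true image x_true is known."""
    errors = [np.linalg.norm(reconstruct(mu) - x_true) / np.linalg.norm(x_true)
              for mu in mu_grid]
    best = int(np.argmin(errors))
    return mu_grid[best], errors[best]
```

In real applications, where x_true is unknown, one must resort to criteria such as the discrepancy principle instead.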
5.2.3 Performance of the regularized algorithms

The inversion scheme discussed in the previous section has been tested by means of synthetic models. We performed our numerical experiments using both ISRA and the projected Landweber method. The aims of the numerical analysis are:
- to test the inversion scheme, comparing the performances of the two iterative methods;
- to compare the performances of the stabilizers;
Figure 5.8: From left to right: the simulated object in the subsoil, and the measured magnetic anomaly.
- to test the robustness of the method (using a semi-blind deconvolution approach) when the estimates of depth and thickness of the targets are strongly biased.

We simulated a buried archaeological target that may represent a set of partially collapsed walls. The object is 2 meters thick and buried in the subsoil at a depth of 2 meters (top of the object). The object extends for about 10 × 15 meters, as shown in the left panel of figure 5.8. We divide the subsoil into a set of homogeneous vertical-sided prisms, considering only a single layer of cells, as discussed in the previous sections. The survey area covers 32.125 × 32.125 m and the sampling spacing in both the x and y directions is 0.125 m, resulting in a 256 × 256 grid. The sampling points are located over the centers of the buried prisms. The total magnetic field anomaly and the point spread functions have been computed following [8], who developed an analytical relation for prism-shaped bodies. All simulations are performed with in-house code implemented in Python using the software package SciPy [42].
5.2.3.1 Inversion tests with exact IRF
We first describe the IRF used in our simulations (figure 5.7). Assuming the North direction parallel to the x axis, we compute a discrete IRF corresponding to a geomagnetic field with inclination and declination of 60 and 0 North degrees, respectively, with sampling points on the same xy grid as the object. Notice that x is given in A/m, as an intensity of magnetization, and the resulting IRF is expressed in nT, the unit of magnetic anomaly intensity. The conclusions derived under these hypotheses hold true also for any other IRF. The magnetic field is obtained by convolving the simulated object (left panel of figure 5.8) with the IRF. Next, the magnetic field (expressed in A m⁻¹) is perturbed by additive white Gaussian noise with σ = 0.001, corresponding to about a 1 nT error in the measurement, a value that is reasonable for modeling the read-out noise in the acquisition process. In the right panel of figure 5.8 we show the blurred and noisy image. We point out that the blurred and noisy image takes values ranging from −9.5 to 24.9 nT and has negative values in 57% of the pixels. First, we perform the inversion by means of the non-regularized PLM and the non-
regularized ISRA algorithms. The relative reconstruction error is defined by

    ρ^(k) = ||x − x^(k)||₂ / ||x||₂ ,    (5.25)
where x is the true object, ||·||₂ denotes the Euclidean norm, and k is the iteration number. Results are shown in figure 5.9.
Figure 5.9: From left to right: the PLM reconstruction and the ISRA reconstruction.
For the regularized algorithms, the first point is the choice of the parameters µ and T. We fix T = 10⁻¹² and search for the value of µ minimizing the relative reconstruction error. In table 5.7 we give the reconstruction error for different values of µ; the best value obtained is about 38%. The left panels of figures 5.10, 5.11 and 5.12 show the reconstructions provided by the regularized PLM, and the right panels show those provided by the regularized ISRA.
Figure 5.10: The reconstruction of the magnetic object with Tikhonov regularization. The regularized PLM on the left and the regularized ISRA on the right.
Figure 5.11: The reconstruction of the magnetic object with TV regularization. The regularized PLM on the left and the regularized ISRA on the right.
                             ISRA                     PLM
             µ               Error (%)   Iterations   Error (%)   Iterations
Not reg.     —               46.48        2603        40.39        4386
TIK           1.0 × 10⁻⁴     44.05       17477        40.39        5441
              5.0 × 10⁻⁴     40.88       14896        40.98       >20000
             10.0 × 10⁻⁴     47.59        1102        40.39        4468
TV            1.0 × 10⁻⁶     44.95        5472        40.03        2248
              5.0 × 10⁻⁶     39.80       23576        38.98        3612
              7.0 × 10⁻⁶     39.52       24856        38.62        4262
             10.0 × 10⁻⁶     39.60       15938        38.29        5172
MGS          10.0 × 10⁻⁶     46.48        2617        40.35        1921
              1.0 × 10⁻⁶     47.50        2650        40.35        1923
              0.1 × 10⁻⁶     46.47        2604        40.35        1923
Table 5.7: Optimal values of the regularization parameter µ for regularized PLM and ISRA in the case of T = 10⁻¹².
Figure 5.12: The reconstruction of the magnetic object with MGS regularization. The regularized PLM on the left and the regularized ISRA on the right.
The best reconstructions of PLM and modified ISRA are similar. The solutions provided by the non-regularized algorithms, which exploit the semiconvergence property, do not fit the true solution well. In fact, the structure is too sharp and the values of magnetization are not very close to those of the true object. Better reconstructions can be
obtained using the regularized algorithms. The regularized inversions exhibit a smaller relative reconstruction error than the non-regularized ones. However, both methods are quite slow with respect to other algorithms (for example, conjugate gradient), but in this case speed is not an issue: in real applications we have to stop the iterations at a certain point, and the fact that the solution varies little from iteration to iteration simplifies the process. The Tikhonov reconstruction is similar to the non-regularized one, but the continuity of the structures is better enhanced. The TV functional clearly demonstrates the best performance, both in shape similarity and in magnetization values. In particular, TV shows an enhancement of the blocky structures that characterize the buried object. In fact, comparing the recovered objects, it is clear that the one obtained by TV is the closest to the shape of the "true" object. The MGS functional in this case does not demonstrate a particular enhancement of the solution compared with the Tikhonov one.
6 Image reconstruction in the case of Gaussian-Poisson noise
Noise corrupting data acquired with a charge-coupled device (CCD) camera is modeled as an additive Poisson-Gaussian mixture, with the Poisson component representing the cumulative counts of object-dependent photoelectrons, object-independent photoelectrons, bias electrons and thermo-electrons, and the Gaussian component representing the read-out noise. One approach for compensating for blurring, while accounting for the statistical properties of the noise encountered in CCD cameras, is based on the method of ML estimation, as discussed in [67]. In this chapter we compare the performance of the algorithm and of its asymptotic approximation, developed in chapter 4, with that of the algorithm proposed in [68], which is based on approximating the Gaussian noise with Poisson noise. The utility of that approach was shown on the reconstruction of images of the Hubble Space Telescope. Dealing with image deconvolution, we also extend the method to take into account boundary effects and multiple images of the same object. The approximation proposed in this work is tested on a few numerical examples. This problem deserves further investigation because it can be important in the deconvolution of images of faint objects provided by next-generation ground-based telescopes, which will be characterized by large collecting areas and advanced adaptive optics.
6.1 The case of Gaussian-Poisson noise CCD cameras are used for acquiring images both in microscopy and astronomy. They are preceded by optical elements (lenses or mirrors), which limit resolution and introduce aberrations; moreover, in the case of ground-based telescopes, additional significant aberrations are introduced by the atmospheric turbulence, even when its effect is partially corrected by adaptive optics (AO). As a consequence, the acquired images must be processed by deblurring methods. If a ML approach is adopted, a study of the noise introduced by a CCD camera is required. This problem was considered by researchers
working on the restoration of the images of the Hubble Space Telescope at the beginning of the nineties [52, 66, 68], and an accurate statistical model is described, for instance, in [67]. Now we consider the specific problem of image deconvolution. We modify the algorithm to compensate for possible boundary effects; moreover, we extend the method to the problem of multiple image deconvolution. Then we discuss the implementation of the method and its validation on a few numerical examples.
6.1.1 Application to image deconvolution
A quite natural application of the previous approach is the deconvolution of 3D images in microscopy or of 2D images in astronomy. We adopt the notation introduced in section 2.1.2. In these applications it is usually assumed that R = S and that the imaging matrix H is defined by a convolution product Hx = h ∗ x, where h is the so-called point spread function (PSF) of the imaging system. If the object x is surrounded by a uniform background and is completely contained within the field of view (FoV) of the imaging system, then a useful approximation consists in extending image, PSF and object periodically outside the FoV (periodic boundary conditions), so that the convolution product can be easily and efficiently computed by means of the FFT algorithm. In such a case it is also assumed that the PSF is normalized (see equation 2.8), so that we have $\sum_{i \in S} h_i = 1$ (see equation 3.16). This condition implies that
$$\sum_{i \in S} (Hx)_i = \sum_{j \in S} x_j \; .$$
The physical interpretation is that the total number of photons in the object (also called the total flux) coincides with the total number of photons in the computed image. Finally, we point out that one iteration of the algorithm defined by equation 4.26, with the quotient computed by means of the leading term of the asymptotic expansion of equation 4.17, requires the computation of four FFTs, and therefore has approximately the same computational cost as one RL iteration.
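Under periodic boundary conditions, the convolution and the flux-conservation identity above can be checked directly. The following sketch (illustrative names, NumPy assumed; not the thesis implementation) computes h ∗ x via the FFT and verifies that the total flux is preserved when the PSF is normalized.

```python
import numpy as np

def fft_convolve(x, h):
    """Periodic (circular) convolution Hx = h * x via FFT.

    Assumes h is sampled on the same grid as x and normalized so
    that h.sum() == 1 (equation 2.8).
    """
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h)))

# Toy check of flux conservation: the sum of the blurred image
# equals the total flux of the object when sum(h) == 1.
rng = np.random.default_rng(0)
x = rng.uniform(size=(32, 32))
h = rng.uniform(size=(32, 32))
h /= h.sum()                      # normalize the PSF
y = fft_convolve(x, h)
assert np.isclose(y.sum(), x.sum())
```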
6.1.2 Computer implementation
The iterative algorithms discussed in chapter 4 have the following general structure
$$x^{(k+1)} = \frac{x^{(k)}}{\bar{h}} \, H^T C(x^{(k)}) \; ,$$
where $\bar{h} = \sum_i h_i$ is the normalization constant of the PSF (equal to 1 under equation 2.8), and C(x) is a positive vector/array that takes one of the following possible forms:
$$C_i(x) = \frac{\displaystyle\sum_{n=0}^{+\infty} \frac{(Hx+b)_i^n}{n!} \, e^{-\frac{1}{2\sigma^2}(n+1-y_i)^2}}{\displaystyle\sum_{n=0}^{+\infty} \frac{(Hx+b)_i^n}{n!} \, e^{-\frac{1}{2\sigma^2}(n-y_i)^2}} \; ; \qquad (6.1)$$
$$C_i^{(1)}(x) = \exp\left\{ - \frac{1 + 2\,((Hx+b)_i - y_i)}{2\,((Hx+b)_i + \sigma^2)} \right\} \; ; \qquad (6.2)$$
$$C_i^{(2)}(x) = \frac{y_i + \sigma^2}{(Hx+b)_i + \sigma^2} \; . \qquad (6.3)$$
$C_i^{(1)}(x)$ and $C_i^{(2)}(x)$ are the two different approximations of $C_i(x)$ that will be compared in this chapter. Concerning the implementation of $C_i(x)$, we take into account the strong decay of the Gaussian factor, so the two series are computed using a number of terms given by $[y_i + 4\sigma]$. The implementation of the two other algorithms is obvious. To compare the three algorithms, we select examples where the condition $y_i \le 30$ is satisfied in the vast majority of pixels. We performed our numerical experiments both in the case of single-image deconvolution and in the case of multiple-image deconvolution, the latter being considered mainly in view of a possible application to the deconvolution of the images from the LINC-NIRVANA interferometer of LBT. Since we obtained similar results in both cases, we report only those obtained in the first one. We also considered several different astronomical objects and several different PSFs. We show only two among the many examples we have investigated; the conclusions that can be derived from these two examples apply to all the others. All our experiments are performed using the software package AIRY (Astronomical Image Restoration in interferometrY) - http://dirac.disi.unige.it/ - [19], which has been developed for the simulation and subsequent deconvolution of LINC-NIRVANA images; obviously, it can also be used for images of a single-mirror telescope. First of all we point out that our numerical experiments are not intended to simulate real astronomical observations, but only to check and compare the numerical accuracy of the different approximations mentioned above. For this reason we do not investigate the effect of the background, which is relevant in infrared observations, nor the effect of the analog-to-digital conversion. As a consequence, we consider images which can take negative values.
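As an illustration of the three C-arrays, the following sketch (hypothetical helper names; not the AIRY implementation) evaluates the series of equation 6.1 truncated after about y_i + 4σ terms, together with the approximations of equations 6.2 and 6.3, for given pixel values m = (Hx + b)_i; it assumes m > 0.

```python
import numpy as np

def C_exact(m, y, sigma, b=0.0):
    """Exact ratio C_i(x) of equation 6.1, computed term by term.

    m stands for the pixel values of Hx (with b added); the series
    are truncated after roughly y_i + 4*sigma terms, exploiting the
    fast decay of the Gaussian factor, as described in the text.
    Assumes m + b > 0.
    """
    m = np.atleast_1d(np.asarray(m, dtype=float) + b)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.empty_like(m)
    for i, (mi, yi) in enumerate(zip(m, y)):
        n_max = max(int(yi + 4 * sigma), 10)
        n = np.arange(n_max + 1)
        # log of the Poisson weight m^n / n! for numerical stability
        logw = n * np.log(mi) - np.cumsum(np.log(np.maximum(n, 1)))
        num = np.sum(np.exp(logw - (n + 1 - yi) ** 2 / (2 * sigma**2)))
        den = np.sum(np.exp(logw - (n - yi) ** 2 / (2 * sigma**2)))
        out[i] = num / den
    return out

def C1(m, y, sigma, b=0.0):
    """Asymptotic approximation of equation 6.2."""
    m = np.asarray(m, dtype=float) + b
    return np.exp(-(1 + 2 * (m - y)) / (2 * (m + sigma**2)))

def C2(m, y, sigma, b=0.0):
    """Poisson-type approximation of equation 6.3."""
    m = np.asarray(m, dtype=float) + b
    return (y + sigma**2) / (m + sigma**2)
```

For moderate pixel values the three quantities are close to 1, in line with the mean values reported in tables 6.1 and 6.3.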
The PSF used in our experiments has been generated using the software package CAOS (Code for Adaptive Optics Systems), and is therefore a simulation of an AO PSF, with a Strehl ratio of about 20%. The picture of this PSF, 256×256, is given in figure 6.1 (left panel), together with the corresponding modulation transfer function (MTF), i.e. the modulus of the Fourier transform of the PSF. Using a suitable gray scale, the circular bandwidth arising from the geometry of the mirror is visible, even if the high-frequency components are attenuated as a consequence of the low AO correction we are assuming. The images are obtained by convolving the simulated object with this PSF; no background is added. Next they are perturbed with Poisson noise (from photon counting) and additive Gaussian noise with σ = 10, a value that is reasonable for the read-out noise of existing scientific cameras; it could take higher values in real images if they are obtained by co-adding several frames. We report the results obtained for two different kinds of astronomical objects: the first is a star cluster consisting of 100 stars, with an angular distribution, with respect to the centroid of the cluster, described by a Gaussian function; the second is the image of the galaxy NGC6946, as provided by the prime-focus camera of LBT - http://medusa.as.arizona.edu/lbto/astronomical - and reduced in size to a 256 × 256 array.

Figure 6.1: Left panel: the PSF used in our simulation (log-scale). Right panel: the corresponding MTF (0.25 power-scale).

Figure 6.2: Upper panels: the 128 × 128 central part of the star cluster (left), and the 256 × 256 blurry image of the full cluster (right); the white square indicates the central part corresponding to the left panel. Lower panels: the galaxy NGC6946 (left) and the corresponding blurry image (right).

The integrated magnitude of the star cluster is set to 12.5, with magnitudes of the stars ranging approximately from 17 to 18; the numerical values of the object range from 0 to 1.80 × 10^4. The integrated magnitude of the galaxy is also set to 12.5, but the numerical values of the object range from 0 to 120. Figure 6.2 shows the two objects with the corresponding images. In the case of the star cluster, the image takes negative values in 31% of the pixels and values smaller than
Table 6.1: Mean value (MV) and standard deviation (SD) of the C-arrays, in the case of the star cluster, for different numbers of iterations.

Nr.It.   Param.   C        C(1)     C(2)
100      MV       0.9903   0.9903   0.9885
         SD       0.0956   0.0956   0.0962
1000     MV       0.9947   0.9947   0.9947
         SD       0.0935   0.0935   0.0939
10000    MV       0.9944   0.9943   0.9944
         SD       0.0931   0.0931   0.0935
30 in 83% of the pixels. Moreover, the minimum and maximum values are −35.4 (about 3σ) and 457.0. For the galaxy, the corresponding percentages are 21% and 75%, respectively, and the minimum and maximum values are −34.5 and 113.7. Therefore, in the vast majority of the pixels, the conditions for the validity of the asymptotic approximation derived in the Appendix are not satisfied. The algorithm behaves differently in the two cases: the convergence is slow in the case of the star cluster, while it is much faster in the case of the galaxy (a diffuse object), a behavior similar to that of the RLM algorithm. Indeed, the three different approximations of the array C(x) do not lead to different convergence behaviors. We first analyze and compare the behavior of the arrays of equations 6.1-6.3 as a function of the number of iterations. In the case of the cluster, we computed 10000 iterations, and we checked the results at 100, 1000 and 10000 iterations. Table 6.1 gives, for these iterations, the mean value and the standard deviation of the pixel values of each one of the three arrays. They take values close to 1, with deviations of the order of 10%. No significant difference is found between these parameters of the three arrays; moreover, increasing the number of iterations does not significantly modify the situation. Differences appear if we look at the arrays C − C(1) and C − C(2), i.e. the deviations of the approximations with respect to the "correct" array C. These results, reported in table 6.2, indicate that the approximation derived in this work, even though it should be valid only for large values of y_i, provides more accurate values than the approximation obtained by replacing the Gaussian distribution of the read-out noise with a Poisson distribution: the mean value is smaller by a factor of about 100 for k = 100 and 1000, while the standard deviation is smaller by a factor of about 10 for k = 1000 and 10000.
Note that good reconstructions are already obtained for k = 2000. Similar results are obtained in the case of the galaxy, reported in table 6.3 and table 6.4. The best restoration is reached after a number of iterations ranging from 40 to 50; therefore, we checked the arrays only at 50 and 100 iterations (at 100 iterations the checkerboard effect is already evident). Again, the approximation proposed in this work is better than the Poisson approximation.
Table 6.2: Mean value (MV) and standard deviation (SD) of the differences C − C(1) and C − C(2), in the case of the star cluster, for different numbers of iterations.

Nr.It.   Param.   C − C(1)        C − C(2)
100      MV       −2.80 × 10^−5    1.78 × 10^−3
         SD        1.17 × 10^−3    6.21 × 10^−3
1000     MV        7.29 × 10^−7   −4.05 × 10^−5
         SD        7.23 × 10^−4    5.89 × 10^−3
10000    MV        3.28 × 10^−6   −2.32 × 10^−5
         SD        6.95 × 10^−4    5.85 × 10^−3
Table 6.3: Mean value (MV) and standard deviation (SD) of the C-arrays, in the case of the galaxy, for different numbers of iterations.

Nr.It.   Param.   C        C(1)     C(2)
50       MV       0.9957   0.9957   0.9936
         SD       0.0922   0.0924   0.0924
100      MV       0.9975   0.9975   0.9968
         SD       0.0920   0.0920   0.0921
In the case of the star cluster, we pushed the iterations up to 10000 but, if we look at the magnitudes (fluxes) of the reconstructed stars, the results do not change after 2000 iterations. If we subtract the reconstruction obtained with the approximation C(1)(x) from that obtained with C(x), we find an array with values ranging from −69.9 to 42.3; the mean value of the array is 4.9 × 10^−4 and the standard deviation 1.4. On the other hand, if we subtract the reconstruction obtained with the approximation provided by C(2)(x), we obtain an array with values ranging from −54.0 to 70.9; the mean value is −3.4 × 10^−3 and the standard deviation 1.6. In conclusion, in this case, the reconstruction provided by C(1) is comparable with that provided by C(2). The upper-left panel of figure 6.3 shows the reconstruction of the cluster (represented with a log scale) obtained, after 2000 iterations, with the algorithm implementing the computation of the series. If we compare this result with the upper-left panel of figure 6.2, we recognize a few artifacts, but the estimation of the magnitudes of the stars is quite satisfactory, except in those cases where two stars are too close to be resolved by the algorithm. Moreover, to check the statistical significance of our results, we compute the normalized residuals defined by
$$R_i(\bar{x}) = \frac{y_i - (H\bar{x})_i}{\sqrt{(H\bar{x})_i + \sigma^2}} \; ,$$
Table 6.4: Mean value (MV) and standard deviation (SD) of the differences C − C(1) and C − C(2), in the case of the galaxy, for different numbers of iterations.

Nr.It.   Param.   C − C(1)        C − C(2)
50       MV       2.14 × 10^−6    2.19 × 10^−3
         SD       8.86 × 10^−4    6.39 × 10^−3
100      MV       1.23 × 10^−6    7.49 × 10^−4
         SD       8.54 × 10^−4    5.57 × 10^−3
where x̄ denotes the reconstructed object. If the reconstructions are statistically correct, these arrays should be free of artifacts and their values should have a Gaussian distribution with zero expected value and variance 1. All these conditions are satisfied: the map of the normalized residuals is shown in the lower-left panel of figure 6.3 (the results are quite similar for the three approximations); moreover, the expected values for the three reconstructions are about −0.055 while the variances are about 0.995. In the case of the galaxy, we also check the accuracy of the reconstruction by computing, at each iteration, both the relative r.m.s. error defined by equation 5.25 and the Kullback-Leibler (KL) divergence (or I-divergence) of x(k) from x,
$$\delta(k) = \sum_{j \in S} \left\{ x_j \ln \frac{x_j}{x_j^{(k)}} + x_j^{(k)} - x_j \right\} \; .$$
In this case the convergence is quite fast and, for this reason, we compute only 100 iterations. Indeed, the minimum value of ρ(k) is reached after 45 iterations for the first two methods and after 51 for the third one. In all cases, the relative r.m.s. error is of the order of 35%. Moreover, δ(k) also has a minimum, occurring at a slightly larger value of k, ranging from 60 to 70 according to the algorithm used; the minimum value is of the order of 2.2 × 10^5. The reconstructions look similar to those obtained by stopping the iterations at the minimum of ρ(k). As in the case of the star cluster, if we subtract the reconstruction obtained with the approximation C(1)(x) from that obtained with C(x), we find an array with values ranging from −0.95 to 0.45; the mean value of the array is 8.8 × 10^−4 and the standard deviation 5.7 × 10^−2. On the other hand, if we subtract the reconstruction obtained with the approximation provided by C(2)(x), we obtain an array with values ranging from −1.64 to 7.84; the mean value is −0.24 and the standard deviation 0.49. Therefore, in this case, the approximation provided by C(1) is more accurate than the other one. The reconstruction provided by the first method is shown in the upper-right panel of figure 6.3. By comparing it with the lower-left panel of figure 6.2, we see that important details are lost due to the large noise affecting the data. Finally, we also checked the statistical accuracy of the results by looking at the normalized residuals. The expected value is around −0.043 for the first two methods and around −0.066 for the third one, while the
Figure 6.3: Left: the reconstruction of the central part of the cluster (top) and the map of the corresponding normalized residuals (bottom). Right: the reconstruction of the galaxy (top) and the corresponding residuals (bottom).
variance is 0.993 for all of them. All the residuals are free of artifacts and the map of those provided by the first method is shown in the lower-right panel of figure 6.3.
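The two diagnostic quantities used above, the normalized residuals and the generalized KL divergence δ(k), can be sketched as follows (illustrative function names; the symbols match the formulas in the text).

```python
import numpy as np

def normalized_residuals(y, Hx_bar, sigma):
    """Normalized residuals R_i of a reconstruction x_bar:
    (y_i - (H x_bar)_i) / sqrt((H x_bar)_i + sigma^2).

    For a statistically correct reconstruction these should be
    approximately N(0, 1) distributed and free of artifacts.
    """
    return (y - Hx_bar) / np.sqrt(Hx_bar + sigma**2)

def kl_divergence(x, xk, eps=1e-12):
    """Generalized Kullback-Leibler divergence delta(k) of x^(k) from x:
    sum_j { x_j ln(x_j / x_j^(k)) + x_j^(k) - x_j }.

    eps guards the logarithm at zero-valued pixels.
    """
    x = np.asarray(x, dtype=float)
    xk = np.asarray(xk, dtype=float)
    return np.sum(x * np.log((x + eps) / (xk + eps)) + xk - x)
```

By convexity, kl_divergence is non-negative and vanishes only when x^(k) = x, which is why its minimum over k can serve as a stopping indicator in simulations.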
7 Point spread function estimation
In this chapter we face two different problems linked together by the fact that the point spread function is unknown. Problems of this kind are well known in the literature and fall under the general heading of blind deconvolution. A standard technique for solving a blind reconstruction problem is to add some "a priori" information about the PSF. This information can be in the form of constraints, or can consist of a model of the PSF. In the literature these methods are called blind, semi-blind or myopic deconvolution, according to the specific knowledge we have of the PSF. In the first application of this chapter we consider again the problem, treated in chapter 5, of the reconstruction of the susceptibility in the subsoil given the magnetic anomaly. As we said in chapter 5, under the assumption that buried structures may be modeled by a single layer of constant depth and thickness, the interpretation of magnetic data over 2D arrays can be assimilated to a problem of image deconvolution: the raw data represent the blurred and noisy image to be focused, and the magnetization distribution in the subsoil is recovered by 2D inversion. The problem is ill-conditioned, and the presence of noise in the measurements requires regularization techniques to obtain stable and accurate inversion results. Here, through numerical simulations, we apply the semi-blind deconvolution method we propose and verify its robustness with respect to errors in the magnetic IRF, as normally occur in real cases. We also perform a set of simulations in which we suppose that the depth of the single layer is unknown; in this case we know a model of the IRF, since it depends analytically on the unknown depth. We perform a quantitative analysis by computing the reconstruction error in all the cases we have investigated. In this way we assess the quality of this semi-blind method, based on the knowledge of a model of the IRF.
Next we apply the semi-blind deconvolution to a real dataset, showing the effectiveness of this approach. Moreover, we present a second reconstruction problem in which the PSF is partially known; more precisely, it is observed, as is the image to retrieve. Hence the PSF is corrupted by Poisson noise, and a myopic deconvolution process is required for estimating the unknown object. We propose a new statistical model which incorporates the presence of noise both on the image of the object to retrieve and on the "measured" PSF. This technique also takes into account the non-negativity constraints on the solution and on the PSF. As usual, deconvolution results are presented for simulated data, and a comparison between the classical algorithms and the proposed one is given in this chapter. This method can also be extended to the case where different measures of the PSF, with different sizes, are available.
7.1 Semi-blind deconvolution
In the deconvolution methods applied in section 5.2 we assumed exact knowledge of the IRF. However, in practical applications, both the depth and the thickness of the layer are unknown. If the magnetization is approximately constant within the layer, then the IRF depends mainly on s30, as shown by equation 5.16, and this is the most relevant parameter. By fixing ξ we have a one-parameter family of IRFs, and the deconvolution problem becomes a semi-blind one [58]. Hence, in this paragraph we consider h depending on the parameter s30 and we write h(s30) in place of h(s1, s2; s30). We propose a semi-blind iterative deconvolution that is a modification of the iterative blind deconvolution introduced in [5] and is based on regularized versions of PLM. Preliminarily we compute a discrete set of "admissible" IRFs, corresponding to a given set of depth values,
$$A = \{ h(s_{30}) \mid s_{30} = s_{3\min}, \ldots, s_{3\max} \} \; ,$$
where s3min and s3max are the minimum and maximum values of the depth s30. Each external iteration of the semi-blind approach then consists of two steps; a scheme is given in figure 7.1. The first step, represented by the upper rectangle, is as follows. Let us denote by xj, hj the output of iteration j. This pair enters the "Object BOX"; the deconvolution method implemented in this box is the iterative PLM with TV regularization. Keeping hj as IRF and xj as initialization, we perform one inner iteration, obtaining an update xj+1 of the object. Then the pair xj+1, hj enters the "IRF BOX", where the iterative method corresponding to PLM with Tikhonov regularization is implemented. Here the roles of object and IRF are exchanged, so that xj+1 is kept fixed and the deconvolution is applied to the IRF, using hj as initial guess. In this box five iterations are performed, and the output of the first step is the pair xj+1, h̃j+1. The second step of the iteration, represented by the lower rectangle of figure 7.1, is as follows. If j ≢ 0 (mod 200), no operation is performed and we set hj+1 = h̃j+1. If j ≡ 0 (mod 200), we take as hj+1 the IRF in the set A with minimum distance from h̃j+1, i.e.
$$h_{j+1} = \arg\min_{h \in A} \left\| h - \tilde{h}_{j+1} \right\|^2 \; . \qquad (7.1)$$
In other words, the projection becomes effective every 200 iterations.

As for the initialization of the procedure, we take as h0 the IRF corresponding to an estimate of s30, and x0 = 0. The numbers of internal iterations of the two boxes have been optimized by means of numerical simulations, while the number of global iterations is determined by the minimal error (in simulations) or by the discrepancy principle.
Figure 7.1: Schematic representation of the semi-blind deconvolution. Step 1: the pair {xj, hj} passes through the Object BOX and then the IRF BOX (five inner iterations). Step 2: if j ≡ 0 (mod 200), the updated IRF is projected onto the admissible set A; the output is the pair {xj+1, hj+1}.
The algorithm guarantees that the IRF is physically meaningful and moreover it automatically gives an estimate of the depth of the layer.
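The outer loop of figure 7.1 can be sketched as follows; object_step and irf_step are placeholders for the inner regularized-PLM updates (one TV-regularized step on the object, five Tikhonov-regularized steps on the IRF), which are not reproduced here.

```python
import numpy as np

def project_on_A(h_tilde, A):
    """Equation 7.1: pick the admissible IRF in the discrete family A
    closest (in Euclidean norm) to the current estimate h_tilde."""
    dists = [np.linalg.norm(h - h_tilde) for h in A]
    return A[int(np.argmin(dists))]

def semi_blind(y, A, h0, object_step, irf_step, n_outer=1000, period=200):
    """Outer loop of the semi-blind scheme of figure 7.1.

    object_step(x, h, y) and irf_step(h, x, y) stand for the inner
    regularized-PLM updates; the projection onto A is applied only
    every `period` iterations, as described in the text.
    """
    x, h = np.zeros_like(y), h0
    for j in range(1, n_outer + 1):
        x = object_step(x, h, y)        # Object BOX: update x, h fixed
        h = irf_step(h, x, y)           # IRF BOX: update h, x fixed
        if j % period == 0:             # projection every `period` steps
            h = project_on_A(h, A)
    return x, h
```

Because A is a one-parameter family indexed by the depth s30, the projected IRF also yields an estimate of the layer depth as a by-product.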
7.1.1 Depth and thickness errors in IRF estimation
In the single-layer approximation of the magnetic problem, the IRF depends on the thickness and the depth of the object. In this section we show that the use of the semi-blind deconvolution method damps the reconstruction errors due to a rough estimate of the IRF. The "true" object is the same used in the exact-IRF simulation (figure 5.8). We start from an estimate of the IRF in the context of an archaeological survey: the estimated IRF comes from a 2 m thick single layer at 3 m depth. The class A consists of IRFs corresponding to different depths of the top of the layer, one every 0.25 m from 0.5 to 4.0 m. This guarantees that the IRF used in the deconvolution of the unknown object has a form very close to that of an actual IRF. Figure 7.2 shows the result obtained using µ = 10^−4 for the IRF, µ = 5 × 10^−5 for the object, and T = 10^−12 for both. The relative error continuously decreases up to 50000 iterations, when we decide to stop.
Figure 7.2: Result of a semi-blind deconvolution in a simulated case.
The IRF selected by the algorithm is the one given by a 2 m thick and 2 m deep layer, corresponding to the IRF used to generate the synthetic data. So, in this case, the method is capable of retrieving the "true" IRF, enhancing the inversion results and reducing the relative reconstruction error. The resulting deconvolved image (figure 7.2) is quite similar, in shape and in terms of boundary detection, to the results obtained with the exact IRF, while the magnetizations are not correct. This suggests that, if the aim of the inversion is mainly to trace the position and edges of the buried objects, then the single-layer model works well even in the case of a wrong estimate of depth or thickness.
7.1.2 Semi-blind deconvolution in a real case
Finally, we applied the inversion methodologies discussed above to a real dataset. The area investigated is located in Burnum, Croatia, and covers an extent of 50 × 50 m; the sample spacing is 0.20 m. We adopt a semi-blind deconvolution approach because, as discussed before, we do not know the IRF exactly, but have only an approximate estimate. Supposing that the solution belongs to a single layer 1 m thick, we consider the IRF given by the layer at a depth of 3.0 m. Again we use the PLM both for the deconvolution of the IRF (regularized with Tikhonov) and for the object (regularized with TV), performing 5 deconvolution steps on the IRF for every deconvolution step on the object f. The class of admissible IRFs is given by a set calculated for depths ranging from 0.5 to 5.0 m, every 0.25 m. We choose µ = 10^−4 for the IRF, µ = 10^−3 for f, and T = 10^−12 for both. Obviously, in this case it is not possible to calculate the relative error, so the iterations are stopped by means of the discrepancy principle (equation 4.6), considering σ = 3.0 nT: we stop the iterative process when the discrepancy becomes smaller than the value corresponding to the estimated standard deviation of the noise in the acquired data. The algorithm chooses the IRF corresponding to a layer buried at 1.75 m (top of the layer) after 418 iterations. The resulting deconvolved image shows a set of linear structures that fit well in the case of Burnum, where buried walls have been found. This image clearly enhances and delineates the structures only slightly visible in the acquired dataset, confirming the simulation experiments. The recovered IRF gives a depth of the top of the layer of 1.75 m, a
Figure 7.3: On the left, the real dataset of Burnum; on the right, the result of semi-blind deconvolution regularized with TV.
plausible value for Burnum, but we point out that the estimated thickness (1 m) is not exact, so the depth of the layer can be biased.
7.2 Myopic deconvolution
In astronomical image reconstruction one usually supposes that the PSF, which describes the effects of the optical elements (lenses or mirrors that limit resolution and introduce aberrations) and, if any, the background, is known [6]. Nevertheless, in most cases the PSF is unknown or poorly known. If the system is approximately space-invariant, such a function can be estimated from the observation of one (or more) single stars visible in the neighborhood of the object to retrieve. However, sufficiently bright stars are not always visible close to the object; in these cases, supposing that we can observe at least one single star of low intensity, the measurement of the PSF becomes particularly noisy. With such a PSF the linear model is only a rough approximation, and the usual deconvolution approach is put to a severe test. Accordingly, the image reconstruction problem is more complicated, since the PSF must be estimated as well as the object. This problem is well known in astronomical image reconstruction, and it is called "myopic" deconvolution, as opposed to "blind" deconvolution, since some knowledge of the PSF, even if imprecise, is available [15, 69]. It was considered in particular by researchers working on the restoration of images observed with ground-based telescopes, hence when the PSF also models the aberrations due to the presence of the atmosphere. In the literature there are works in which the restoration is performed without any information on the PSF, as in [29], and other works, for example [18] and [16], in which a solution is proposed in a Gaussian statistical framework by adding regularizing functionals which take into account the a priori knowledge of the PSF and, if any, of the object. In this work we present a new deconvolution method based on the knowledge of an image of the object and an image of the PSF, both corrupted by Poisson noise, without any prior information on the PSF or on the object.
After passing through the optical elements (lenses or mirrors), the light beams are
stored in pixels by charge-coupled-device (CCD) cameras. As a consequence, each image is acquired as a table of numbers, and we denote by y = {yi}i∈S the detected image, where i is a pair of indexes which denotes the pixels and n = #S is the total number of pixels. Moreover, we denote by h = {hi}i∈S the PSF describing the effect of the optical components and, possibly, the acquisition distortion due to the atmosphere; by x = {xi}i∈S the unknown object to be retrieved; and by b^y the (known) constant background. In astronomical image reconstruction, the image formation model is
$$y_i = \mathcal{P}\left[ (h * x + b^y)_i \right] \; ,$$
where ∗ is the convolution product, and P(ξ) represents a Poisson random variable with mean and variance equal to ξ. These n variables are assumed to be statistically independent. The elements hi are nonnegative and their sum is 1. More generally, such a discrete convolution product can be represented as a row-by-column product between a block-circulant matrix and a vector, that is (Hx)i = (Xh)i = (h ∗ x)i, where H = circ(h) and X = circ(x) are the block-circulant matrices associated with the vectors h and x, respectively. In terms of the block-circulant matrix, the above-mentioned properties of h become the properties already described in equation 2.8. These considerations will be useful in the following. In order to reconstruct the object x from its image y, the main difficulty is the estimation of the PSF. Usually h can be estimated from a single star extracted from y itself or from another image. In this case, the following relation holds for h:
$$k_i = \mathcal{P}\left[ (\lambda \, h + b^k)_i \right] \; , \qquad (7.2)$$
where k is the image of the observed star and λ is its "intensity" over the background b^k. A common estimate of h consists in subtracting the background b^k from k, clipping the negative values to 0, and finally normalizing the result. We formalize this procedure by defining the set A of admissible PSFs as
$$A = \left\{ h \;\middle|\; h_i \ge 0 \;\; \forall i \in S \, , \;\; \sum_{i \in S} h_i = 1 \right\} \; ,$$
and a projection operator on this set,
$$\left[ P_A(\xi) \right]_i = \frac{\max(\xi_i, 0)}{\sum_{i \in S} \max(\xi_i, 0)} \; . \qquad (7.3)$$
This is not the usual metric projection of ξ on A, even if it is widely adopted by astronomers. Moreover, P_A(k − b^k) is an estimate of h, but statistically it is not the best choice. Indeed, the maximum likelihood estimation (MLE) of h in equation 7.2, constrained to the admissible set A, leads to the minimization of
$$J_h(h) = \sum_{i} \left\{ k_i \ln \frac{k_i}{(\lambda h + b^k)_i} + (\lambda h + b^k)_i - k_i \right\} + \sum_i \mu_i h_i + \nu \left( \sum_i h_i - 1 \right) \; ,$$
where µi and ν are the Lagrange multipliers associated with the unilateral and bilateral constraints, respectively. The minimum exists because of the convexity of the unconstrained functional and the convexity of A. It has to satisfy the KKT conditions, which imply that, for each i, either h_i = 0 or
$$1 - \frac{k_i}{(\lambda h + b^k)_i} - \sum_{j} h_j \left( 1 - \frac{k_j}{(\lambda h + b^k)_j} \right) = 0 \; .$$
One can verify numerically that these conditions are not satisfied if we set h = P_A(k − b^k), and so the projection P_A(k − b^k) does not give the solution of the constrained MLE problem.
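The projection of equation 7.3 and the data-fidelity part of the negative log-likelihood J_h can be sketched as follows (illustrative names, Lagrange terms omitted); with these two pieces one can check numerically, as the text claims, that P_A(k − b^k) is not the constrained ML estimate.

```python
import numpy as np

def P_A(xi):
    """Equation 7.3: clip negatives to zero and normalize to unit sum.
    Not the metric projection onto A, but the estimate commonly
    adopted by astronomers."""
    xi = np.maximum(np.asarray(xi, dtype=float), 0.0)
    return xi / xi.sum()

def J_h(h, k, lam, bk, eps=1e-12):
    """Data-fidelity part of the functional J_h(h): the generalized
    KL divergence between the star image k and lambda*h + bk
    (equation 7.2), without the Lagrange terms."""
    m = lam * np.asarray(h, dtype=float) + bk
    k = np.asarray(k, dtype=float)
    return np.sum(k * np.log((k + eps) / (m + eps)) + m - k)
```

For example, evaluating J_h at P_A(k − b^k) and at small feasible perturbations of it typically reveals directions of decrease, confirming that the projection is not the constrained minimizer.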
7.2.1 Formulation of the joint deconvolution approach

To retrieve the object which has generated y we propose an alternative approach. It consists in considering the image formation model composed of the two equations relative to y and k, i.e.

y_i = P[ (Hx + b^y)_i ]
k_i = P[ (λ h + b^k)_i ] .   (7.4)
Clearly, the unknowns of this system are the object x, the PSF h and the star intensity λ. Moreover, the presence of the products Hx and λh makes this model nonlinear.
We assume that the different Poisson variables are independent of each other. This assumption is in agreement with two different ways of measuring the PSF. The first is when the PSF is measured after or before the data, and hence two different images are stored. The second is when, as noted above, the image of the PSF is extracted from the data, and the region of the data which we want to reconstruct does not overlap the extracted PSF region. Then the ML method, applied to the joint probability density function of the system 7.4, leads to minimizing the functional

J(x, h, λ) = Σ_i [ y_i ln( y_i / (Hx + b^y)_i ) + (Hx + b^y)_i − y_i ]
           + Σ_i [ k_i ln( k_i / (λ h + b^k)_i ) + (λ h + b^k)_i − k_i ] .   (7.5)

This functional is the sum of two generalized Kullback-Leibler divergences, and it provides a measure of the discrepancy between the pair of detected images (y, k) and the pair of computed images (h ∗ x + b^y, λ h + b^k) associated with (x, h, λ). It is defined on an open set D which contains the closed and convex cone C = {(x, h, λ) | x, h, λ ≥ 0} of the non-negative vectors. Now we prove some results concerning the geometry of the functional 7.5 and the existence of solutions of the MLE problem both on D and C.

Proposition 9. The functional 7.5 is bounded from below by 0 on D and is coercive, i.e. it tends to infinity when any of the variables x_i, h_i and/or λ tends to infinity.
Point spread function estimation
Proof. Both facts depend on the properties of the function ψ(q) = p ( ln(p) − ln(q) ) + q − p. It is non-negative for all p, q > 0, and this proves the boundedness from below. Moreover, the limit lim_{q→∞} ψ(q) = ∞ proves the coercivity of J(x, h, λ).
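These properties (non-negativity, with value 0 attained at the exact data) can be checked numerically. A minimal 1-D sketch of the functional 7.5, assuming FFT-based cyclic convolution and the convention 0 ln 0 = 0; the array sizes and values are hypothetical:

```python
import numpy as np

def conv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def gen_kl(p, q):
    """Generalized KL divergence: sum_i [ p ln(p/q) + q - p ], with 0 ln 0 = 0."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])) + np.sum(q - p))

def J(x, h, lam, y, k, by, bk):
    """The functional of eq. 7.5."""
    return gen_kl(y, conv(h, x) + by) + gen_kl(k, lam * h + bk)

rng = np.random.default_rng(1)
n = 16
h = rng.random(n); h /= h.sum()
x = 10.0 * rng.random(n)
lam, by, bk = 100.0, 2.0, 1.0
y = conv(h, x) + by           # exact, noise-free data
k = lam * h + bk

assert np.isclose(J(x, h, lam, y, k, by, bk), 0.0)   # trivial absolute minimum
assert J(x + 1.0, h, lam, y, k, by, bk) > 0.0        # J is bounded below by 0
```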
Thanks to boundedness from below and coercivity, the functional 7.5 admits at least one absolute minimum, so the existence of solutions of the ML problem on D is guaranteed. The ML problem also admits solutions on C, because J is continuous and C is closed. Clearly, the absolute minimum on D is attained when (x, h, λ) is such that

y = Hx + b^y  and  k = λ h + b^k ,   (7.6)

and hence the value of J(x, h, λ) is 0. If H is circulant, a simple condition guarantees the uniqueness of this trivial solution except for a multiplicative factor. To show it we need to introduce a definition.

Definition 3. We say that an image x is a full frequency (FF) image if each component of its Fourier transform is non-zero.

A characterization of this definition in terms of the block circulant matrix associated with the image is the following: an image x is a FF image if and only if its block circulant matrix X = circ(x) is non-singular. Moreover, an image that is not FF has very particular features. For a characterization of non-FF images in the one-dimensional case, see [30].

Proposition 10. If k and y are FF images, the solution of the ML problem in D is unique except for a multiplicative factor.

Proof. The proof is an obvious consequence of the minimum conditions 7.6 and of the characterization of FF images in terms of block circulant matrices. We define a map ψ_γ : D → D such that ψ_γ(x, h, λ) = (γx, (1/γ)h, γλ), for every γ > 0. If (x*, h*, λ*) is a solution of the problem 7.4 in D, ψ_γ(x*, h*, λ*) is another solution of the same problem.

We remark that, under the hypothesis of Proposition 10, there is only one solution of the ML problem in D which satisfies the normalization condition on the PSF. Indeed, this condition implies that λ = Σ_i (k_i − b^k_i).
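The FF property of Definition 3 is straightforward to test on data via the DFT; a 1-D sketch (the two test images are hypothetical):

```python
import numpy as np

def is_full_frequency(x, tol=1e-12):
    """x is a FF image iff every component of its DFT is non-zero
    (equivalently, iff circ(x) is non-singular)."""
    return bool(np.all(np.abs(np.fft.fft(x)) > tol))

n = 32
delta = np.zeros(n); delta[0] = 1.0     # DFT identically 1: FF
constant = np.full(n, 3.0)              # only the zero frequency survives: not FF

assert is_full_frequency(delta)
assert not is_full_frequency(constant)
```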
The presence of noise on k and y gives rise to two important facts. Firstly, k and y are in general FF images, and this property can be easily verified since k and y are measured data. Secondly, the images x* and h* which form the minimizer (x*, h*, λ*) of J have in general several negative components. Hence, these solutions of the ML problem do not belong to the nonnegative cone C. Now we show that J, even though it has absolute minima, is not convex.
Lemma 3. Let a, b and c be vectors, and let A and B be the circulant matrices associated with a and b, respectively. The following relations hold true:

∇_a ( c / (Ba) ) = −B^T diag( c / (Ba)² ) ,   (7.7a)
∇_a (Ab) = ∇_a (Ba) = B^T .   (7.7b)

Proof. The proof is an obvious consequence of direct computation. For the second relation we need to take into account the symmetry of the product between a circulant matrix and a vector.

Proposition 11. The Hessian H_J(x, h, λ) of the functional 7.5 can be written as the block matrix
H_J(x, h, λ) =
[ H^T [z²/y]_d H ,              H^T [z²/y]_d X + (1 − Z) ,        0 ;
  X^T [z²/y]_d H + (1 − Z)^T ,  X^T [z²/y]_d X + λ² [w²/k]_d ,    1 − (b^k/k) w² ;
  0 ,                           ( 1 − (b^k/k) w² )^T ,            Σ_i h_i² w_i² / k_i ] ,

where z_i = y_i / (Hx + b^y)_i, w_i = k_i / (λh + b^k)_i, Z is the circulant matrix associated with z, and [v]_d indicates the diagonal matrix generated by the vector v.
Proof. We begin by computing the first derivatives. We have

∇_x J = H^T (1 − z) ,   (7.8a)
∇_h J = X^T (1 − z) + λ (1 − w) ,   (7.8b)
∇_λ J = Σ_i h_i (1 − w)_i .   (7.8c)

From 7.7a we have

∇_xx J = H^T [z²/y]_d H  and  ∇_hh J = X^T [z²/y]_d X + λ² [w²/k]_d .

To compute the cross derivatives we apply the product rule, and we take into account the relation Hx = Xh. From 7.7b we have

∇_hx J = H^T [z²/y]_d X + (1 − Z)^T ,

where we have applied equation 7.7a to the first term and equation 7.7b to the second one. All the other computations are analogous.

Proposition 12. The functional 7.5 is separately convex in x, h and λ but is not jointly convex in these variables.
Proof. It is an obvious consequence of Proposition 11. In fact, the blocks on the diagonal of H_J are positive semidefinite quadratic forms for all (x, h, λ), while the nonzero off-diagonal blocks are not definite.

From the analysis so far, the functional 7.5 admits minima but is not convex. Now we show that if there are stationary points in D other than the trivial minimizer 7.6, they are formed by images that are not FF.

Proposition 13. Let k be a FF image. All the stationary points (x*, h*, λ*) in D of the functional 7.5 belong to the set defined by the intersection of the two hyper-surfaces of equations

Σ_i y_i / (Hx + b^y)_i = N  and  Σ_i [ (Hx + b^y)_i − y_i ] = 0 .

Moreover, except for the trivial minimizer 7.6, they all contain at least one non-FF image.

Proof. The stationary points are given by the vanishing of the gradients 7.8. We suppose first that h is a FF image. In this case, the block circulant matrix H^T is invertible and hence the gradient 7.8a vanishes only if Hx + b^y = y. Consequently, the vanishing of 7.8b implies that

h = (k − b^k) / λ ,

and this is consistent with the assumption, since k is a FF image. The vanishing of 7.8c is then satisfied automatically. This proves that there is only one stationary point with h a FF image. Now, we suppose that h is not a FF image. By computing the scalar product between 1 and the vanishing gradient 7.8a we obtain

0 = ⟨1, ∇_x J⟩ = ⟨1, H^T ( 1 − y/(Hx + b^y) )⟩ = ⟨H1, 1 − y/(Hx + b^y)⟩ ,

and the first formula of the statement follows. Moreover, we consider a constant vector c such that Hc = b^y; it exists since H is circulant and b^y is constant. By computing the scalar product between x + c and the vanishing gradient 7.8a we obtain

0 = ⟨x + c, ∇_x J⟩ = ⟨x + c, H^T ( 1 − y/(Hx + b^y) )⟩ = ⟨Hx + b^y, 1 − y/(Hx + b^y)⟩ ,

and the second formula of the statement follows.

All these theoretical results concern the stationary points of the functional 7.5 on its domain D. In the case where the functional 7.5 is constrained to C, Proposition 9 ensures that an absolute minimum exists. A minimizer (x*, h*, λ*) of the constrained functional can be an unconstrained stationary point in the interior of the cone C, or a point which
belongs to the frontier of C. It necessarily satisfies the KKT conditions that, in this particular case, can be written as

0 = x* · (H*)^T [ 1 − y / (H* x* + b^y) ]
0 = h* · { (X*)^T [ 1 − y / (H* x* + b^y) ] + λ* [ 1 − k / (λ* h* + b^k) ] }
0 = λ* Σ_i h*_i [ 1 − k_i / (λ* h* + b^k)_i ]
0 ≤ x*, h*, λ* .   (7.9)
Since the functional constrained to C admits an absolute minimum, system 7.9 has to be satisfied by some point in C. In the next section we propose an iterative algorithm to find the points which satisfy equations 7.9. We base the construction of this algorithm on the Split Gradient Method (SGM) [47], obtaining a non-negative estimate of the solution of problem 7.4.
7.2.2 The iterative algorithm

Thanks to the results of the previous section, the minimization of 7.5 on the non-negative cone C has at least one solution, and this solution has to satisfy the system 7.9. These conditions can be expressed as fixed point equations through the use of the SGM. This method consists in splitting the gradient ∇J in two parts, i.e.

∇J(x, h, λ) = V(x, h, λ) − U(x, h, λ) ,   (7.10)

where V(x, h, λ) is positive and U(x, h, λ) is non-negative. The split is not unique, but any such split will suffice. In our split, U and V take the following form

U(x, h, λ) = ( H^T [ y / (Hx + b^y) ] ,  X^T [ y / (Hx + b^y) ] + λ k / (λh + b^k) ,  Σ_i h_i k_i / (λh + b^k)_i ) ,

V(x, h, λ) = ( H^T 1 ,  X^T 1 + λ ,  Σ_i h_i ) ,

where both U and V are composed of three blocks which are respectively the splits of the differentiation with respect to x, h and λ, and which we briefly name U_x, U_h, U_λ and V_x, V_h, V_λ. In general, thanks to equation 7.10, we can write system 7.9 in the following equivalent way

(x, h, λ) = T(x, h, λ) ,   x, h, λ ≥ 0 ,

where T is the operator
T(x, h, λ) = ( x U_x / V_x ,  h U_h / V_h ,  λ U_λ / V_λ ) .
Finally, by applying the method of successive approximations to the fixed points of the operator T, we obtain the Joint Myopic (JM) algorithm

x^(j+1) = [ x^(j) / (H_(j)^T 1) ] H_(j)^T [ y / (H_(j) x^(j) + b^y) ]

h^(j+1) = [ h^(j) / (X_(j)^T 1 + λ^(j)) ] { X_(j)^T [ y / (H_(j) x^(j) + b^y) ] + λ^(j) k / (λ^(j) h^(j) + b^k) }

λ^(j+1) = [ λ^(j) / ⟨h^(j), 1⟩ ] ⟨ h^(j) , k / (λ^(j) h^(j) + b^k) ⟩ .

Since the operator T is not a contraction, the convergence of this algorithm cannot be deduced from the general theorems of fixed point theory.

Remark 17. If the backgrounds b^y and b^k are zero, the estimates (x, h, λ) generated by the algorithm satisfy the following relations:

⟨x^(j+1), 1⟩ = ⟨y, 1⟩ / ⟨h^(j), 1⟩
⟨h^(j+1), 1⟩ = ⟨y + k, 1⟩ / ( ⟨x^(j), 1⟩ + λ^(j) )
λ^(j+1) = ⟨k, 1⟩ / ⟨h^(j), 1⟩ .

If we initialize with an h of flux 1, and with an x having the same flux as y, i.e.

⟨h^(0), 1⟩ = 1 ,  ⟨x^(0), 1⟩ = ⟨y, 1⟩ ,

a simple proof by induction, based on the previous equations, shows that ⟨x^(j), 1⟩ = ⟨y, 1⟩ and λ^(j) = ⟨k, 1⟩ for every iteration j.

The algorithm can be initialized with the vector (x^(0), h^(0), λ^(0)), where x^(0) is a constant image with flux equal to that of y, h^(0) is given by the projection P_A(k − b^k) as in equation 7.3, and λ^(0) = ⟨k − b^k, 1⟩. Numerical experiments reveal that the algorithm is rather sensitive to a rough initialization of the PSF. Indeed, when we initialize with a constant PSF (which is not a FF image), we cannot get a satisfactory estimate of x and h.

Remark 18. The algorithm is a scaled-gradient method, with step-size µ = 1, since it can be written in the following form
x^(j+1) = x^(j) − µ S^x_(j) ∇_x J( x^(j), h^(j), λ^(j) )
h^(j+1) = h^(j) − µ S^h_(j) ∇_h J( x^(j), h^(j), λ^(j) )
λ^(j+1) = λ^(j) − µ S^λ_(j) ∇_λ J( x^(j), h^(j), λ^(j) )

where

S^x_(j) = [ x^(j) / V_x(x^(j), h^(j), λ^(j)) ]_d ,  S^h_(j) = [ h^(j) / V_h(x^(j), h^(j), λ^(j)) ]_d ,  S^λ_(j) = λ^(j) / V_λ(x^(j), h^(j), λ^(j)) .
With this remark we can obtain the convergence of the algorithm by a simple application of the Armijo rule to the step-size µ. This additive form is common to all the algorithms derived from the SGM, but this specific problem has an intrinsic characteristic that makes the observation very useful. Indeed, this algorithm retrieves both x and h simultaneously. Hence we can consider two different step-sizes: µ_x for the reconstruction of x, and µ_h for the reconstruction of h. If we take µ_x and µ_h smaller than 1, the images automatically remain non-negative at each iteration.
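The multiplicative JM update above can be sketched in one dimension with FFT-based cyclic convolutions (the function and variable names are ours, not from the text). With zero backgrounds and the initialization of Remark 17, the flux relations can be checked directly after one step:

```python
import numpy as np

def conv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def corr(a, b):
    """Multiplication by circ(a)^T, i.e. cyclic correlation."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def jm_step(x, h, lam, y, k, by, bk):
    """One multiplicative JM iteration."""
    z = y / (conv(h, x) + by)            # ratio for the image equation
    w = k / (lam * h + bk)               # ratio for the PSF equation
    x_new = x / h.sum() * corr(h, z)     # H^T 1 = (sum_i h_i) 1 for circulant H
    h_new = h / (x.sum() + lam) * (corr(x, z) + lam * w)
    lam_new = lam / h.sum() * np.dot(h, w)
    return x_new, h_new, lam_new

rng = np.random.default_rng(2)
n = 16
h_true = rng.random(n) + 0.1; h_true /= h_true.sum()
x_true = 10.0 * rng.random(n) + 1.0
lam_true = 100.0
y = conv(h_true, x_true)                 # noise-free data, zero backgrounds
k = lam_true * h_true

# initialization of Remark 17
x0 = np.full(n, y.sum() / n)
h0 = np.full(n, 1.0 / n)
lam0 = k.sum()
x1, h1, lam1 = jm_step(x0, h0, lam0, y, k, 0.0, 0.0)

assert np.isclose(x1.sum(), y.sum())     # <x^(j), 1> = <y, 1>
assert np.isclose(h1.sum(), 1.0)         # <h^(j), 1> = 1
assert np.isclose(lam1, k.sum())         # lambda^(j) = <k, 1>
```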
7.2.3 Numerical experiments

The two classical algorithms for reconstructing images in the case of Poisson noise are Richardson-Lucy (RL) [63, 53] and the Blind Deconvolution (BD) based on RL [36]. The first requires the knowledge of the PSF, and provides an estimate of the object x for a fixed h. In the case b^y = 0 it is convergent, as proved in [74]. It provides quite good estimates of x even if one uses an approximate PSF. If a measure of h is available, the usual way to obtain an estimate of the PSF is that described in the introduction. With the previous notation, the RL algorithm becomes

x^(j+1) = x^(j) H^T [ y / (H x^(j) + b^y) ] ,

where h = P_A(k − b^k). The standard initialization for the object is a constant value, i.e. x^(0) = (1/n) ⟨y − b^y, 1⟩. The second algorithm does not require knowledge of the PSF and, like the JM deconvolution, estimates both x and h. One can easily see that BD is a particular case of JM. Indeed, we can write this algorithm as

x^(j+1) = [ x^(j) / (H_(j)^T 1) ] H_(j)^T [ y / (H_(j) x^(j) + b^y) ]

h^(j+1) = [ h^(j) / (X_(j)^T 1) ] X_(j)^T [ y / (H_(j) x^(j) + b^y) ]
and hence it has the same form as the JM algorithm when one does not have a measure of the PSF, i.e. when k = 0. Hence, the JM algorithm differs from BD only in the estimation of λ and in the term depending on k in the estimation of h, i.e.

λ^(j) k / (λ^(j) h^(j) + b^k) .   (7.11)
This "correction term" will be discussed below. The BD algorithm is usually implemented by performing a given number of iterations, say n_x, on x with h fixed, followed by a given number of iterations, say n_h, on h with x fixed. [48] prove that it is convergent when n_x = n_h = 1. One can verify that, when n_x = n_h, this approach yields negligible differences in the results with respect to the implementation used in this work. Moreover, one can remark that the parameters n_x and n_h play exactly the same role as µ_x and µ_h, respectively, when the JM algorithm is expressed in its additive form.

We performed our numerical experiments with all three algorithms: JM, RL and BD. These experiments are intended to simulate, as far as possible, real astronomical observations, in order to check and compare the numerical accuracy of the new JM method with respect to the classical RL and BD. For this reason we consider all the components that have an effect, such as the background, which is relevant in infrared observations. All our experiments are performed using an in-house code written in Python, which has been developed for these simulated image deconvolutions. The key parameter for a quantitative comparison is the reconstruction error, which we compute in a different way for different kinds of images. Indeed, we report the results obtained for two astronomical objects: a diffuse object, the galaxy NGC5979, known as the Crab nebula and reduced in size to a 256 × 256 array; and a sparse image, a star cluster consisting of 20 stars, with a distribution described by a Gaussian function with respect to the centroid of the cluster. We have considered these two cases to give an idea of the generality of the JM method, which, like the RL or BD algorithms, can be applied to any kind of image. By contrast, one can find several works about myopic deconvolution in the literature, but its applicability is sometimes limited to sparse objects [29].
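For reference, the RL and BD updates can be sketched with the same 1-D FFT machinery used above (hypothetical arrays; function names are ours). With exact noise-free data both updates leave the true solution fixed, which gives a simple sanity check:

```python
import numpy as np

def conv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def corr(a, b):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def rl_step(x, h, y, by):
    """One RL iteration for a PSF normalized to unit sum (so H^T 1 = 1)."""
    return x * corr(h, y / (conv(h, x) + by))

def bd_step(x, h, y, by):
    """One BD iteration (cf. the update equations above)."""
    z = y / (conv(h, x) + by)
    return x / h.sum() * corr(h, z), h / x.sum() * corr(x, z)

rng = np.random.default_rng(3)
n = 16
h = rng.random(n) + 0.1; h /= h.sum()
x = 10.0 * rng.random(n) + 1.0
by = 2.0
y = conv(h, x) + by                      # exact data

assert np.allclose(rl_step(x, h, y, by), x)        # fixed point of RL
x1, h1 = bd_step(x, h, y, by)
assert np.allclose(x1, x) and np.allclose(h1, h)   # fixed point of BD
```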
The reconstruction error is computed at each iteration as the root mean square (RMS) error (as already defined in equation 5.25) between the object and the reconstructed image in the case of diffuse images, and as the arithmetic mean of the RMS errors computed in 3 × 3 neighborhoods of every single star in the case of sparse images. We also considered several different PSFs. Since we obtained similar results in all cases, we report only those obtained with a specific PSF, and we show only two among the many examples we have investigated. Indeed, the conclusions that can be derived from these two examples also apply to all the others. The PSF used in the following experiments is an Airy function, shown in the left panel of figure 7.4, and therefore it can be regarded as the ideal PSF of a given telescope. Moreover, we generated an image of a single star in order to have a measure of a noisy PSF. According to the statistical model 7.4, we multiply the Airy PSF by the height of the star λ = 10^6. Then we add a small constant background b^k = 10. Next, we
Figure 7.4: On the left panel, the Airy PSF used for our experiments. On the right panel, the image of a single star. For better visualization, both images are plotted in square root scale.
perturbed the resulting image with Poisson noise (from photon counting). Its numerical values range from 0 to 345. The picture of the single star, of size 256 × 256 pixels, is shown in the right panel of figure 7.4. In the standard approach such a single star serves simply to obtain a measure of the PSF, and it is normally treated, except for the projection P_A, as if it were the "true" PSF. In the myopic approach, instead, k has a double role: firstly, k plays the role of a measured image, like y, and secondly, P_A(k − b^k) serves as initialization of the algorithm. The images, shown in the right panels of figure 7.5, are obtained by convolving the simulated objects (shown in the left panels) with the Airy PSF. A small uniform background b^y = 10 is added. Next, they are perturbed with Poisson noise (from photon counting). The total flux of the Crab Nebula is about 4 × 10^6. Its numerical values range from 0 to 255. The corresponding image has the same flux but its numerical values range approximately from 9 to 175. The total flux of the star cluster is about 7.5 × 10^5. Its numerical values range approximately from 0 to 5000. As above, the corresponding image has the same flux but its numerical values range approximately from 0 to 54. First, we perform the inversion of the Crab nebula by means of the RL, BD and JM algorithms. To simulate what happens in astronomical observations, we assume that we do not know the ideal PSF, but only its image k over a known background b^k. Hence, we initialize the RL algorithm as described in the introduction, and we consider P_A(k − b^k) as a reliable measure of the PSF. This same measure is the initialization h^(0) for the BD and JM algorithms. The result of the RL deconvolution is shown on the left of figure 7.6, and it corresponds to the image that minimizes the reconstruction error. Its dependence on the number of iterations is shown in the same figure (right side).
In this case, the BD algorithm gives an estimate of the object very similar to that of the JM algorithm, so it is not necessary to show it. This happens because the flux of y is 100 times greater than the flux of k, and hence the correction term 7.11 of the JM algorithm with respect to BD is negligible. This fact will be further discussed in the next section. In this reconstruction, we use the additive form of the algorithm, with µ_x = 0.3, µ_h = 1 and µ_λ = 1.
Figure 7.5: Upper panels: the galaxy NGC5979 (left) and its corresponding image over a background (right). Lower panels: the simulated star cluster (left) and its corresponding image over a background (right).
It is well known that the RL algorithm is quite robust with respect to PSF imperfections, but in this case the imprecise measure of the PSF does not allow the RL method to achieve a significant enhancement in the deconvolution process. The situation is different for the JM and BD methods. In fact, these algorithms, starting from the same rough estimate, reconstruct the PSF simultaneously with the object. This feature allows both the JM and BD methods to provide a better reconstruction. Figure 7.6 shows the image reconstructed with the JM method that corresponds to the smallest reconstruction error. It is evident that the image details of the Crab are more pronounced; in particular, the stars on the right side of the image are now clearly visible.

Figure 7.6: Upper panel: the RL reconstruction of the Crab Nebula (left) and the corresponding graphic of the reconstruction error as a function of the current iteration (right). Lower panel: the JM reconstruction of the Crab Nebula (left) and the corresponding graphic of the reconstruction error as a function of the current iteration (right).

In the same way, we perform the inversion of the star cluster by means of the RL, BD and JM algorithms. The results of the RL and BD methods are shown on the left side of figure 7.7. They correspond to the estimates that minimize the reconstruction error during the iterative process, which is shown in the same figure (right side). For the BD algorithm, the reconstruction is obtained by setting n_x = n_h = 3. In this case, the situation is reversed: the BD algorithm is less efficient than RL. It iteratively modifies the PSF, which very early becomes a discrete delta function. This happens when the BD algorithm performs the reconstruction of sparse objects, and this characteristic of the BD method stops the reconstruction. If one wants to avoid this drawback of the BD algorithm, one can take n_x > n_h, but as n_x/n_h increases, the reconstruction provided by the BD algorithm gets closer to the RL reconstruction. However, these two algorithms do not reach the photometric values reached by the JM algorithm. Indeed, the JM method, starting from the same measure of the PSF, produces a better reconstruction of the object. We have performed the reconstruction by means of the multiplicative JM algorithm (µ_x = µ_h = 1). Figure 7.8 shows the reconstructed image that corresponds to the smallest reconstruction error. It is evident that the stars are well deconvolved, and almost all of them are not spread over a region larger than one pixel. Moreover, we can note that the photometry of this reconstruction, compared with that of the original object, is much better than that of the other reconstructions. This is also confirmed by the minimum value of the reconstruction errors: about 85% for the RL algorithm and about 60% for the JM one. We have shown that the JM algorithm, in the case of the reconstruction of diffuse objects, provides estimates of the object which, from a numerical point of view, are very
Figure 7.7: The BD reconstruction of the Star Cluster (left) and the corresponding behavior of the reconstruction error as a function of the current iteration (right). The RL reconstruction of the Star Cluster (left) and the corresponding behavior of the reconstruction error as a function of the current iteration (right).
similar to the BD reconstruction. As we remarked in the previous section when dealing with the Crab Nebula reconstructions, this depends on the fact that the flux of the diffuse object is much greater than the flux of the measured PSF. The experimental results have confirmed that the importance of the correction term 7.11 in the JM algorithm depends on the ratio between the flux of y and the flux of k. This ratio is widely variable and is in favor of y when y is a diffuse image, and vice-versa when y is, for example, a star cluster. Moreover, we have shown that when we perform a JM deconvolution on a star cluster, with flux comparable to that of the measured PSF, we obtain a much better result. In general, one can obtain this enhancement when the flux of the object is of the same order of magnitude as the flux of the PSF. Finally, we remark that this method can be easily extended to the case where more than one measure of the PSF is available. If k_1, ..., k_p are different measures of the PSF, we can state the following equation system

y_i = P[ (Hx + b^y)_i ]
(k_1)_i = P[ (λ_1 h + b^k_1)_i ]
...
(k_p)_i = P[ (λ_p h + b^k_p)_i ]
Figure 7.8: The JM reconstruction of the Star Cluster (left) and the corresponding graphic of the reconstruction error as a function of the current iteration (right).
and so, if all the random variables are independent of each other, a straightforward application of the SGM to the new ML functional, derived from the previous system, leads to a JM algorithm with p different correction terms of the form 7.11 in the PSF update equation. Moreover, the JM method can also be applied when the measure of the PSF is restricted to an area smaller than that of the object to retrieve. A well known case is when the image has a region of interest smaller than the complete image, and in the remaining part of the image there is a single star. Usually this single star is smaller in size than the region of interest. Under the space-invariance hypothesis for the PSF, one can consider the single star as a measure of the PSF. Naturally, these two remarks can be merged, so the algorithm can be extended to the case of several measures of the PSF with different sizes. An interesting case is when the image contains more than one single star outside the region of interest. In this case the correction term 7.11 of the JM method grows with the number of stars considered, and so does the attainable information.
8 Perspectives
In this chapter we outline possible future research lines, divided into three sections. In the first, we discuss possible applications of the SGP algorithms to the minimization of regularized functionals in the framework of the SGM proposed by Lanteri et al. [47]. The second section is dedicated to a study of the nonlinear magnetic model and its differences with respect to the linearized model used in chapter 5. We also propose a method to minimize the negative logarithm of the ML functional, even though it may not be convex. In the third section we introduce a non-convex functional J, in the case of Gaussian noise, whose minimizer is the regularized solution satisfying the discrepancy principle. Finally, we introduce a variational approach to regularization paths.
8.1 SGP algorithm provided by the SGM scaling in the regularized approach

The convergence of the classical algorithms PLM and ISRA is very slow; GP and SGP provide an acceleration of these algorithms. Moreover, SGP is the most efficient algorithm for reaching the minimum of J. Also in the case of a regularized functional, SGP with the scaling provided by the SGM should have the best performance. We therefore want to apply SGP to regularized functionals as well. The fundamental idea is to use the SGM to obtain the scaling matrix S also in the regularized cases. As we already noted in section 4.4, let J(x) = J_y(x) + µ J_R(x) be the regularized functional. The splitting of the gradient ∇J(x) is given by:

U(x) = U_y(x) + µ U_R(x) ,  V(x) = V_y(x) + µ V_R(x) .

It provides a diagonal scaling matrix for an SGP algorithm,

S(x) = [ x / ( V_y(x) + µ V_R(x) ) ]_d .
This scaling matrix is positive definite (for x > 0) and provides a feasible descent direction. In both the Gaussian and the Poisson case, the application of the SGP deconvolution with the scaling provided by the SGM is still to be done. A first perspective is to check and compare the performance of the algorithms in the regularized framework. We expect to obtain very efficient algorithms for any particular kind of prior.
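As an illustration of the scaling above, take the Gaussian data term J_y(x) = ||Hx − y||²/2 with the ISRA-type splitting (U_y = H^T y, V_y = H^T Hx) and a hypothetical Tikhonov prior J_R(x) = ||x||²/2 (U_R = 0, V_R = x); the scaled step then reduces to a regularized multiplicative update. A 1-D sketch with a circulant H (names are ours, not from the text):

```python
import numpy as np

def conv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def corr(a, b):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def regularized_isra_step(x, h, y, mu):
    """x <- x - S(x) grad J(x) with S(x) = [x / (V_y + mu V_R)]_d,
    which reduces to the multiplicative update x U / V."""
    U = corr(h, y)                        # U_y (U_R = 0 for this prior)
    V = corr(h, conv(h, x)) + mu * x      # V_y + mu V_R
    return x * U / V

rng = np.random.default_rng(4)
n = 16
h = rng.random(n) + 0.1; h /= h.sum()
x = rng.random(n) + 0.5
y = conv(h, x)                           # exact data

x1 = regularized_isra_step(x, h, y, mu=0.0)
assert np.allclose(x1, x)                # unregularized fixed point at exact data
x2 = regularized_isra_step(x, h, y, mu=0.1)
assert np.all(x2 > 0.0)                  # the scaled step preserves positivity
```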
8.2 Non linear magnetic model

In section 5.2 we discussed the linearized magnetic model. The linearized approach requires some further (and unnecessary) assumptions on the model, hence restricting the validity of the results and the adherence to the real world. We present in this section some advances on the nonlinear magnetic model. In most magnetic prospecting surveys, the magnetometers employed measure only the modulus of the total magnetic field |T| and not its three components. The magnetic anomaly inverse problem requires retrieving the unknown magnetization (susceptibility) of the sources from the magnitude of the total magnetic field measured above the sources. The components of the total field depend on the susceptibility by means of affine operators. Moreover, the relation between the square modulus of the total field and the susceptibility is quadratic, and hence the model is nonlinear. We developed a new algorithm based on a nonlinear forward model that is a better approximation of the real case than the standard linearized one. Indeed, the linearized inversion has some significant drawbacks, decreasing the resolution power and creating artifacts in the solution. We assume that the magnetization is parallel to the inducing field, hence there is no remanent magnetization, as described in equation 5.12. Using the same notation as in section 5.2, the relation between the intensity of magnetization of the sources and the field modulus is

t = ||T|| = ||B_0 + B|| = √( (A_1 m + b_1)² + (A_2 m + b_2)² + (A_3 m + b_3)² ) ,   (8.1)

where A_i = Σ_{j=1}^{3} K_{ij} v_j and b_1, b_2, b_3 are the components of the vector B_0. The K_{ij} are defined in equation 5.11 and the v_j are the components of the vector v in equation 5.12. The inverse problem corresponding to equation 8.1 is ill-posed. In magnetic prospection, t is acquired by a magnetometer (typically proton-precession or flux-gate) that produces the so-called read-out noise, which is described by an additive Gaussian noise. The likelihood function derived from the ML approach is given by the probability distribution with mean given by the right-hand side of equation 8.1 and variance σ², that is

L(t, m) = (1/Z) exp( − || t − √( (A_1 m + b_1)² + (A_2 m + b_2)² + (A_3 m + b_3)² ) ||² / (2σ²) ) ,

where Z is a normalization constant. The ML solution minimizes the following functional, obtained by taking the negative logarithm of L:

J(t, m) = || t − √( (A_1 m + b_1)² + (A_2 m + b_2)² + (A_3 m + b_3)² ) ||² .

We can write

J(t, m) = Σ_{k∈S} ( t − √(φ(m)) )²_k ,

where φ(m) = Σ_{j=1}^{3} (A_j m + b_j)². Each component of the function φ(m) is convex. However, we can ensure that the functional is convex only where √(φ(m)) ≤ t (intended componentwise, in the Hadamard sense). The convexity on the remaining part of the domain depends on the numerical values. The gradient of the functional is

∇J(m) = Σ_{i=1}^{3} A_i^T [ ( 1 − t / √(φ(m)) ) (A_i m + b_i) ] .   (8.2)

Minimization is carried out by an iterative scaled gradient algorithm which includes a line search for the step-length parameter. The first preliminary results encouraged us to continue the study of the nonlinear model. When the problem is not under-determined, as in the case of susceptibility mapping, we have also noted that the SGM algorithm given by splitting the gradient 8.2 leads to a semi-convergent algorithm. Moreover, another reason to go in this direction is the scarcity of papers on this theme in the literature.
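The gradient 8.2 (up to the constant factor 2 coming from the square, which is irrelevant for the stationary points) can be checked against finite differences on small hypothetical matrices A_i and vectors b_i:

```python
import numpy as np

rng = np.random.default_rng(5)
n_data, n_src = 6, 4
A = [rng.normal(size=(n_data, n_src)) for _ in range(3)]  # hypothetical affine operators A_i
b = [rng.normal(size=n_data) + 2.0 for _ in range(3)]     # components of B_0, kept away from 0

def phi(m):
    return sum((Ai @ m + bi) ** 2 for Ai, bi in zip(A, b))

def Jfun(t, m):
    return float(np.sum((t - np.sqrt(phi(m))) ** 2))

def grad(t, m):
    """Eq. 8.2, including the overall factor 2 of the squared residual."""
    r = 1.0 - t / np.sqrt(phi(m))
    return 2.0 * sum(Ai.T @ (r * (Ai @ m + bi)) for Ai, bi in zip(A, b))

m = 0.1 * rng.normal(size=n_src)
t = np.sqrt(phi(0.1 * rng.normal(size=n_src)))   # synthetic, strictly positive data
g = grad(t, m)
eps = 1e-6
for j in range(n_src):
    e = np.zeros(n_src); e[j] = eps
    fd = (Jfun(t, m + e) - Jfun(t, m - e)) / (2.0 * eps)
    assert abs(fd - g[j]) < 1e-4 * max(1.0, abs(g[j]))
```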
8.3 A non convex regularized parameter-free functional

In the case of Gaussian noise, the discrepancy principle 4.6 is used, for instance, as a criterion for stopping the iterations of some iterative method, and hence as a regularization criterion. Secondly, it is used for the choice of the "best" MAP estimate among a set of possible MAP estimates corresponding to several values of the regularization parameter. In particular, while the discrepancy principle, as a criterion for stopping the iterations, can be verified at each step without expensive computation, its use to find the regularization parameter µ can only be performed "a posteriori", once one has computed a set of possible MAP estimates corresponding to several values of µ. This section briefly introduces an analytical relation which the regularized MAP estimate has to satisfy when we require the discrepancy principle to hold. Assuming that the functional 3.29 is convex and differentiable, equation 3.30 implies that the condition for x to be a MAP solution is given by the vanishing of the differential
of the functional 3.29. Therefore, under the assumption of Gaussian noise, equations 3.30 and 4.6 hold simultaneously when the following system of n + 1 equations in the n + 1 unknowns (x, μ) is satisfied:

H^T(Hx − y) + μ ∇_x J_R(x) = 0,
||Hx − y||² = nσ².    (8.3)
We now derive the condition on x which is equivalent to the system 8.3.
Proposition 14. Given a problem described by equation 2.10, x* is a MAP estimate which satisfies the discrepancy principle 4.6 if and only if

⟨x*, ∇_x J_R(x*)⟩ H^T(Hx* − y) = ( ⟨Hx* − y, y⟩ + nσ² ) ∇_x J_R(x*).    (8.4)
Proof. Starting from the system 8.3, we proceed as follows. An easy computation shows that the discrepancy principle is equivalent to

⟨H^T(Hx − y), x⟩ = ⟨Hx − y, y⟩ + nσ².

By taking the scalar product of the first equation of the system with the vector x, we obtain an expression of μ as a function of x:

μ = − ( ⟨Hx − y, y⟩ + nσ² ) / ⟨∇_x J_R(x), x⟩.    (8.5)

By substituting this expression for μ into the first equation of the system, the thesis follows.

Hence, we consider the following non convex regularized parameter-free functional:

J_σ(x) = Σ_{j=1}^{n} [ ⟨x, ∇_x J_R(x)⟩ H^T(Hx − y) − ( ⟨Hx − y, y⟩ + nσ² ) ∇_x J_R(x) ]_j².    (8.6)
This functional is nonnegative, and it is zero if and only if equation 8.4 holds true. This means that the global minimizer of 8.6 is the regularized solution satisfying the discrepancy principle. Another way to address the problem of finding the regularization parameter is linked to the concept of regularization path. In this section we give a characterization of the regularization path as the set of points which minimize a suitable functional, defined in the following. This characterization can be useful when it is coupled with any information able to define a constraint for the minimization of the new functional. As before, a simple example of such an additional constraint is the discrepancy principle in the Gaussian case. In this way, the door is open to a regularized algorithm without the drawback of the parameter choice.
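As an illustration of Proposition 14 in the Tikhonov case J_R(x) = ||x||²/2 (so that ∇_x J_R(x) = x and the MAP estimate has the closed form x(μ) = (H^T H + μI)^{-1} H^T y), the following sketch tunes μ by bisection on the discrepancy equation ||Hx(μ) − y||² = nσ², recovers the same value from equation 8.5, and checks that the functional 8.6 vanishes at the resulting estimate. The operator H, the data y and the noise level σ are small synthetic stand-ins, not quantities from the thesis experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
H = np.eye(n) + 0.3 * rng.standard_normal((n, n))   # synthetic forward operator
x_true = rng.random(n)
sigma = 0.1
y = H @ x_true + sigma * rng.standard_normal(n)     # synthetic noisy data

def x_map(mu):
    # Tikhonov MAP estimate: argmin_x ||Hx - y||^2 / 2 + mu ||x||^2 / 2
    return np.linalg.solve(H.T @ H + mu * np.eye(n), H.T @ y)

def residual2(mu):
    r = H @ x_map(mu) - y
    return r @ r

# bisection (in log scale) on the discrepancy equation ||Hx(mu) - y||^2 = n sigma^2;
# the residual is monotone increasing in mu, so a sign change is bracketed
lo, hi = 1e-10, 1e10
for _ in range(200):
    mid = np.sqrt(lo * hi)
    if residual2(mid) < n * sigma**2:
        lo = mid
    else:
        hi = mid
mu_star = np.sqrt(lo * hi)
x_star = x_map(mu_star)

# equation 8.5 recovers the same regularization parameter
r = H @ x_star - y
mu_from_85 = -(r @ y + n * sigma**2) / (x_star @ x_star)

def J_sigma(x):
    # the parameter-free functional 8.6 with grad J_R(x) = x
    res = H @ x - y
    v = (x @ x) * (H.T @ res) - (res @ y + n * sigma**2) * x
    return v @ v
```

At x_star the functional 8.6 is zero up to the bisection tolerance, while it is strictly positive at any other point of the Tikhonov path.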
We start by recalling the definition of the regularization path. The regularization path is defined as the one-parameter family of solutions of problem 3.30, depending on the regularization parameter μ, i.e.

γ : R → R^n,  μ → x(μ),

where

x(μ) = argmin_{x ∈ R^n} ( J_y(x) + μ J_R(x) )    (8.7)

is given by equation 3.30. By supposing the convexity and the differentiability of the functional in equation 8.7, the regularization parameter μ has to satisfy the n equations

μ = − [∇J_y(x)]_j / [∇J_R(x)]_j,  ∀ j = 1, …, n.    (8.8)

This consideration leads to the first functional

J(x, μ) = Σ_{j=1}^{n} ( μ + [∇J_y(x) / ∇J_R(x)]_j )²,

where the quotient of the gradients is intended component-wise.
If we assume that the noise is Gaussian and µ is defined by means of the discrepancy principle as in equation 8.5, we find again the non convex parameter-free functional (equation 8.6).
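Equation 8.8 can be checked numerically along a Tikhonov path, taking J_y(x) = ||Hx − y||²/2 and J_R(x) = ||x||²/2, with a small synthetic H and y as stand-ins: at every point x(μ) of the path the component-wise ratio −[∇J_y(x)]_j/[∇J_R(x)]_j takes the same value, namely μ, for all j.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
H = np.eye(n) + 0.2 * rng.standard_normal((n, n))   # hypothetical forward operator
y = rng.random(n)                                    # hypothetical data

def x_of_mu(mu):
    # point on the Tikhonov regularization path:
    # x(mu) = argmin_x ||Hx - y||^2 / 2 + mu ||x||^2 / 2
    return np.linalg.solve(H.T @ H + mu * np.eye(n), H.T @ y)

worst = 0.0
for mu in [1e-3, 1e-1, 1.0, 10.0]:
    x = x_of_mu(mu)
    grad_data = H.T @ (H @ x - y)   # gradient of J_y at x(mu)
    grad_reg = x                    # gradient of J_R at x(mu)
    ratios = -grad_data / grad_reg  # equation 8.8, component-wise
    worst = max(worst, np.max(np.abs(ratios - mu) / mu))
```

Along the path the n ratios collapse to a single number, which is exactly the statement used above to characterize the path as the zero set of J(x, μ).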
A Appendices
A.1 Approximation of the Gaussian-Poisson minimization algorithm

This appendix derives the asymptotic behaviour of the functions

p̄(s; t) = e^{−s} p(s; t),  q̄(s; t) = e^{−s} q(s; t),    (A.1)

where p(s; t) and q(s; t) are defined in equations 3.12 and 3.13, respectively. Our results hold true for large values of t, with s satisfying the conditions

t − c√t ≤ s ≤ t + c√t,    (A.2)
where c is a given but arbitrary constant (for instance, c = 3). When the functions of equation A.1 are inserted into the reconstruction algorithm of equation 4.15, the value of t is given by the value y_i of the observed image in a given pixel, while the value of s is given by the model (Hx^(k) + b)_i approximating the data in that pixel at iteration k. Therefore condition A.2 implies that the difference between model and data cannot be arbitrarily large.

We first investigate the function p̄(s; t); the result can be easily extended to q̄(s; t). If we introduce the integers

N₁ = [t − c√t] − 1,  N₂ = [t + c√t] + 1

(as usual, [u] denotes the integer part of u), then the values of n corresponding to the maxima of the Gaussian and Poisson distributions are interior to the interval [N₁, N₂]. Therefore, we split the series defining p̄(s; t) into three terms:

p̄(s; t) = p̄₁(s; t) + p̄₂(s; t) + p̄₃(s; t) = ( Σ_{n=0}^{N₁−1} + Σ_{n=N₁}^{N₂} + Σ_{n=N₂+1}^{+∞} ) (e^{−s} s^n / n!) e^{−(n−t)²/(2σ²)}.

Since the maximum value of the Gaussian factor is reached inside the interval [N₁, N₂], the first and third terms can be bounded by the values of the Gaussian in n = N₁ and n = N₂, respectively. We have

p̄₁(s; t) ≤ exp( −(t − N₁)²/(2σ²) ) ≤ α₁ exp( −c²t/(2σ²) ),    (A.3)
p̄₃(s; t) ≤ exp( −(t − N₂)²/(2σ²) ) ≤ α₃ exp( −c²t/(2σ²) ),    (A.4)

where α₁, α₃ are suitable constants; these terms are therefore negligible with respect to the second one, as we will show. Indeed, the second term can be estimated using the Gaussian approximation of the Poisson distribution. More precisely, from the Stirling approximation of the factorial for large values of n,

ln n! = n ln n − n + ln √(2πn) + λ(n),  with λ(n) ≤ (12n)⁻¹,

we can write

ln (e^{−s} s^n / n!) = −s + n ln s − ln n! = −ln √(2πs) − n ln(n/s) + n − s + e₁(n, s),    (A.5)

with

e₁(n, s) = −(1/2) ln(n/s) − λ(n).
Then, for n ∈ [N₁, N₂] and s constrained by the condition of equation A.2, we find that there exist t̄ and α₂ such that, for any t ≥ t̄, we have

|e₁(n, s)| ≤ (1/2)(|n − s|/s) + 1/(12n) ≤ (N₂ − N₁)/(2N₁) + 1/(12N₁) ≤ α₂/√t.    (A.6)
Next, from the second-order Taylor formula for the natural logarithm, we have

ln(n/s) = ln( 1 + (n − s)/s ) = (n − s)/s − (1/2)((n − s)/s)² + e₂(n, s),    (A.7)

and, again, for n, s satisfying the previous conditions, there exist t̄ and ᾱ₄, α₄ such that, for t ≥ t̄, we get

|e₂(n, s)| ≤ ᾱ₄ (|n − s|/s)³ ≤ ᾱ₄ ((N₂ − N₁)/N₁)³ ≤ α₄/t^{3/2}.    (A.8)
Then, if we insert equation A.7 into equation A.5 and take into account the inequalities A.6 and A.8, after some algebra we obtain the estimate

ln (e^{−s} s^n / n!) = −ln √(2πs) − (n − s)²/(2s) + O(1/√t),  n ∈ [N₁, N₂],  |s − t| ≤ c√t.
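This estimate is easy to check numerically. The sketch below compares the exact value of ln(e^{−s} s^n / n!), computed through the log-gamma function, with the Gaussian approximation −ln√(2πs) − (n − s)²/(2s) over the window A.2, for the illustrative choices c = 2 and s = t + √t/2 (any values compatible with A.2 would do), and confirms that the worst-case gap shrinks as t grows.

```python
import math

def max_gap(t, c=2.0):
    """Worst |exact - approx| of ln(e^-s s^n / n!) over the window A.2."""
    s = t + 0.5 * math.sqrt(t)          # a model value s inside the window A.2
    n1 = int(t - c * math.sqrt(t)) - 1
    n2 = int(t + c * math.sqrt(t)) + 1
    gap = 0.0
    for n in range(n1, n2 + 1):
        exact = -s + n * math.log(s) - math.lgamma(n + 1)
        approx = -0.5 * math.log(2 * math.pi * s) - (n - s) ** 2 / (2 * s)
        gap = max(gap, abs(exact - approx))
    return gap
```

For t = 10⁴ the worst-case gap is already a few hundredths, and it decreases roughly like 1/√t, as predicted.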
This equation provides the leading term of an asymptotic expansion of p̄₂(s; t) for large values of t, with s satisfying condition A.2:

p̄₂(s; t) = (1/√(2πs)) Σ_{n=N₁}^{N₂} e^{ −(1/2)[ (n−s)²/s + (n−t)²/σ² ] } ( 1 + O(1/√t) ).

In a similar way we obtain

q̄₂(s; t) = (1/√(2πs)) Σ_{n=N₁}^{N₂} e^{ −(1/2)[ (n−1−s)²/s + (n−t)²/σ² ] } ( 1 + O(1/√t) ).
Thanks to the inequalities A.3 and A.4 (which hold true also for q̄₁(s; t) and q̄₃(s; t), respectively), the leading terms of p̄₂(s; t) and q̄₂(s; t) are also the leading terms of the asymptotic expansions of p̄(s; t) and q̄(s; t). These can be written in a more convenient form by introducing the quantities α, α′, β defined by

α = s (t + σ²)/(s + σ²),  α′ = α + σ²/(s + σ²),  1/β² = 1/s + 1/σ²,
so that the final result is

p̄(s; t) = (1/√(2πs)) e^{ −(s−t)²/(2(s+σ²)) } Σ_{n=N₁}^{N₂} e^{ −(n−α)²/(2β²) } ( 1 + O(1/√t) ),    (A.9)

q̄(s; t) = (1/√(2πs)) e^{ −(s+1−t)²/(2(s+σ²)) } Σ_{n=N₁}^{N₂} e^{ −(n−α′)²/(2β²) } ( 1 + O(1/√t) ).    (A.10)
However, if we observe that

n − α = n − s − (s/(s + σ²))(t − s) = O(√t),
(n − α′)² = ( (n − α) − σ²/(s + σ²) )² = (n − α)² + O(1/√t),

we also have

q̄(s; t) = (1/√(2πs)) e^{ −(s+1−t)²/(2(s+σ²)) } Σ_{n=N₁}^{N₂} e^{ −(n−α)²/(2β²) } ( 1 + O(1/√t) ).    (A.11)
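As a numerical sanity check of the final result A.9, the sketch below compares p̄(s; t), computed by direct summation of its defining series, with the leading term of A.9, for the illustrative choices σ = 3, c = 3 and s = t + √t/2; the relative discrepancy is small and decreases as t grows, in agreement with the O(1/√t) factor.

```python
import math

SIGMA = 3.0   # illustrative read-out noise standard deviation
C = 3.0       # constant c of condition A.2

def p_bar_direct(s, t):
    # direct summation: p_bar(s;t) = sum_n (e^-s s^n / n!) exp(-(n-t)^2/(2 sigma^2))
    total = 0.0
    for n in range(0, int(t + 10 * math.sqrt(t)) + 20):
        log_term = (-s + n * math.log(s) - math.lgamma(n + 1)
                    - (n - t) ** 2 / (2 * SIGMA ** 2))
        total += math.exp(log_term)
    return total

def p_bar_approx(s, t):
    # leading term of equation A.9
    n1 = int(t - C * math.sqrt(t)) - 1
    n2 = int(t + C * math.sqrt(t)) + 1
    alpha = s * (t + SIGMA ** 2) / (s + SIGMA ** 2)
    beta2 = 1.0 / (1.0 / s + 1.0 / SIGMA ** 2)
    gauss_sum = sum(math.exp(-(n - alpha) ** 2 / (2 * beta2))
                    for n in range(n1, n2 + 1))
    return (math.exp(-(s - t) ** 2 / (2 * (s + SIGMA ** 2)))
            / math.sqrt(2 * math.pi * s) * gauss_sum)

def rel_err(t):
    s = t + 0.5 * math.sqrt(t)   # a model value s inside the window A.2
    return abs(p_bar_direct(s, t) / p_bar_approx(s, t) - 1.0)
```

Already at t = 100 the two evaluations agree to a few percent, and the agreement improves for larger t.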
List of Figures

2.1 The Airy pattern and the Adaptive Optics (AO) PSF. . . . 9
2.2 The modulation transfer function (MTF) of the AO PSF. . . . 11
2.3 The geometric representation of the image formation process. . . . 12
2.4 The fundamental drawback of the image reconstruction problem: the inverse solution of the model is wrong! . . . 20
3.1 The geometric representation of a convergent algorithm which is semi-convergent with respect to the "true" solution x^(0). . . . 28
3.2 The reconstruction error as a function of the number of iterations. . . . 31
3.3 The distribution of the modulus of the eigenvalues F ĉ/|ĥ|² by varying the value of c. The more c increases, the more the maximum of the distribution increases and its position is shifted close to zero. Among this family of distributions, by taking c large enough, we can find distributions corresponding to ill-conditioned image formation systems. . . . 33
3.4 On the upper plot, the graphs of the function E_{|ĥ|²} E_w(||x_t − x||²) with ||x_0 − x||² = 50 and c = 1, 2, 4, 8, 16, 32. On the lower plot, the graphs of the same function with ||x_0 − x||² = 10000 and c = 1, 5, 25, 5³, 5⁴, 5⁵. . . . 35
4.1 Upper panels: the object (left) and the corresponding blurred and noisy image (right). Lower panels: the reconstruction provided by the nonnegatively constrained minimum of the LS functional (log scale; left) and the corresponding normalized residual (right). . . . 42
5.1 The PSF used in our numerical experiments (left panel) and the corresponding OTF (right panel). . . . 60
5.2 The satellite (left panel) and the corresponding blurred and noisy image (right panel). . . . 60
5.3 Upper panels: the best reconstruction of the nebula in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals. . . . 65
5.4 Upper panels: the best SGP reconstruction of the satellite in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals. . . . 67
5.5 Upper panels: the best GP reconstruction of the satellite in the case σ² = 1 (left) and in the case σ² = 5 (right). Lower panels: the corresponding normalized residuals. . . . 68
5.6 The relation between the projection of the anomalous magnetic field B(r) on the Earth field B₀, with r = (r₁, r₂, r₃), and the difference between the measured geomagnetic field modulus |T| and the IGRF |B₀|. . . . 70
5.7 The magnetic IRF. . . . 72
5.8 From left to right, the simulated object in the subsoil and the measured magnetic anomaly. . . . 77
5.9 From left to right, the PLM reconstruction and the ISRA reconstruction. . . . 78
5.10 The reconstruction of the magnetic object with Tikhonov regularization: the regularized PLM on the left and the regularized ISRA on the right. . . . 78
5.11 The reconstruction of the magnetic object with TV regularization: the regularized PLM on the left and the regularized ISRA on the right. . . . 78
5.12 The reconstruction of the magnetic object with MGS regularization: the regularized PLM on the left and the regularized ISRA on the right. . . . 79
6.1 Left panel: the PSF used in our simulation (log scale). Right panel: the corresponding MTF (0.25 power scale). . . . 84
6.2 Upper panels: the 128 × 128 central part of the star cluster (left), and the 256 × 256 blurry image of the full cluster (right); the white square indicates the central part corresponding to the left panel. Lower panels: the galaxy NGC6946 (left) and the corresponding blurry image (right). . . . 84
6.3 Left: the reconstruction of the central part of the cluster (top) and the map of the corresponding normalized residual (bottom). Right: the reconstruction of the galaxy (top) and the corresponding residual (bottom). . . . 88
7.1 The schematic representation of a semi-blind deconvolution. . . . 91
7.2 Result of a semi-blind deconvolution in a simulated case. . . . 92
7.3 On the left, the real dataset of Burnum; on the right, the result of semi-blind deconvolution regularized with TV. . . . 93
7.4 On the left panel, the Airy PSF used for our experiments; on the right panel, the image of a single star. For better visualization, both images are plotted in square-root scale. . . . 103
7.5 Upper panels: the galaxy NGC5979 (left) and its corresponding image over a background (right). Lower panels: the simulated star cluster (left) and its corresponding image over a background (right). . . . 104
7.6 Upper panels: the RL reconstruction of the Crab Nebula (left) and the corresponding plot of the reconstruction error as a function of the current iteration (right). Lower panels: the JM reconstruction of the Crab Nebula (left) and the corresponding plot of the reconstruction error as a function of the current iteration (right). . . . 105
7.7 The BD reconstruction of the star cluster (left) and the corresponding behavior of the reconstruction error as a function of the current iteration (right); the RL reconstruction of the star cluster (left) and the corresponding behavior of the reconstruction error as a function of the current iteration (right). . . . 106
7.8 The JM reconstruction of the star cluster (left) and the corresponding plot of the reconstruction error as a function of the current iteration (right). . . . 107
List of Tables

4.1 The functions U_y, V_y for the three noise models. The functions p, q of the third line are defined in equations 3.12 and 3.13, respectively, while h is defined in equation 3.16. . . . 51
5.1 Reconstruction of the nebula NGC5979 in the case σ² = 1. Relative reconstruction error and number of iterations are given for the two stopping rules (minimum and discrepancy). . . . 63
5.2 Reconstruction of the nebula NGC5979 in the case σ² = 5. Relative reconstruction error and number of iterations are given for the two stopping rules. . . . 64
5.3 Reconstruction of the satellite in the case σ² = 1. Relative reconstruction error and number of iterations are given for the two stopping rules. . . . 66
5.4 Reconstruction of the satellite in the case σ² = 5. Relative reconstruction error and number of iterations are given for the two stopping rules. . . . 66
5.5 Behavior of the SGP algorithm applied to test images with different sizes. The same number of iterations is used for a given test object and a given noise level. An ideal PSF is used in these simulations. . . . 69
5.6 Comparison between the archaeological prospection and image formation models. . . . 73
5.7 Optimal values of the regularization parameter μ for regularized PLM and ISRA in the case T = 10⁻¹². . . . 79
6.1 Mean value (MV) and standard deviation (SD) of the C-arrays, in the case of the star cluster, for different numbers of iterations. . . . 85
6.2 Mean value (MV) and standard deviation (SD) of the differences C − C^(1) and C − C^(2), in the case of the star cluster, for different numbers of iterations. . . . 86
6.3 Mean value (MV) and standard deviation (SD) of the C-arrays, in the case of the galaxy, for different numbers of iterations. . . . 86
6.4 Mean value (MV) and standard deviation (SD) of the differences C − C^(1) and C − C^(2), in the case of the galaxy, for different numbers of iterations. . . . 87
Vogel, Computational methods for inverse problems, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002. [77] R. L. White and R. J. Allen, The Restoration of HST Images and Spectra, Proc. of a Workshop held at the Space Telescope Science Institute, Baltimore, 20-21 August 1990, NASA (1990). [78] J. C. Wynn, Archaeological prospection: an introduction to the special issue, Geophysics 51 (1986), no. 3, 533–537. [79] B. Zhou, L. Gao, and Y. H. Dai, Gradient methods with adaptive step-sizes, Computational Optimization 35 (2006), 69–86.