DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2018
Machine learning for blob detection in high-resolution 3D microscopy images MARTIN TER HAAK
KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
EIT Digital Data Science
Date: June 6, 2018
Supervisor: Vladimir Vlassov
Examiner: Anne Håkansson
Electrical Engineering and Computer Science (EECS)
Abstract

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties like intensity or shape. Bio-image analysis is a common application, where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing of ribonucleic acid (RNA), for example, the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells. Traditional methods of blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must seek the optimal parameters for each step, which are often specific to that image and cannot be generalised to other images. Moreover, some of the existing tools are not suitable for the scale of the microscopy images, which are often in very high resolution and 3D. Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can effectively be used to perform blob detection. A blob detector is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs and finally uses a clustering algorithm to split up connected blobs. The detector works out-of-core, meaning it can process images that do not fit in memory, by dividing the images into chunks. The results prove the feasibility of this blob detector and show that it can compete with other popular software for blob detection. But unlike other tools, the proposed blob detector does not require parameter tuning, making it easier to use and more reliable.

Keywords: Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing
Abstract (Swedish)

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties such as intensity or shape. Biological image analysis is a common application where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing of ribonucleic acid (RNA), the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells. Traditional methods for blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must find optimal parameters for each step, which are often specific to that particular image and cannot be generalised to other images. Moreover, some of the existing tools are not suitable for the size of the microscopy images, which are often in very high resolution and 3D. Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can effectively be used to perform blob detection. A blob detection algorithm is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs, and finally uses a clustering algorithm to split up connected blobs. The algorithm works out-of-core, meaning that it can process images that do not fit in memory, by dividing the images into smaller parts. The results show that the algorithm is feasible and that it can compete with other popular software for blob detection. But in contrast to other tools, the proposed algorithm does not require tuning of its parameters, which makes it easier to use and more reliable.

Keywords: Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing
Acknowledgements First, I would like to express my gratitude towards my examiner Assoc. Prof. Anne Håkansson at the KTH Royal Institute of Technology for guiding me from the first project proposal all the way to the final deliverable. She was always open to answering the most troublesome questions or providing critical feedback. Due to her meticulous remarks I was able to reshape and tweak my work in order to achieve the high quality it has now. I would also like to thank my supervisor Jacob Kowalewski at Single Technologies under whom I performed this research. Not only would he provide me with the required resources at any moment, but he would also not hesitate to free up time for discussion. That I was able to finish the project well within the set time is most likely due to his dependable commitment. Moreover, his ideas and suggestions have strongly contributed to the approach applied in this project. Furthermore, I would like to thank Single Technologies for providing me with a very interesting thesis subject and a pleasant working space. I want to thank my co-workers for the nice chats and the friendly ambience around the office. Finally, I would like to thank my university supervisor Assoc. Prof. Vladimir Vlassov who provided me with some highly needed hints so that I could proceed with my research. Martin ter Haak Stockholm, May 2018
Contents

0.1 Acronyms and abbreviations

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
    1.4.1 Benefits, ethics and sustainability
  1.5 Research methodology
  1.6 Delimitations
  1.7 Outline

2 An introduction to in situ RNA sequencing

3 Blob detection
  3.1 Automatic scale selection
  3.2 Algorithms
    3.2.1 Template matching
    3.2.2 Thresholding
    3.2.3 Local extrema
    3.2.4 Differential extrema
    3.2.5 Machine learning
    3.2.6 Super-pixel classification

4 Machine learning
  4.1 Classification
    4.1.1 Naive Bayes
    4.1.2 Logistic regression
    4.1.3 K-Nearest Neighbour
    4.1.4 Decision Tree
    4.1.5 Random Forest
    4.1.6 AdaBoost
    4.1.7 Support Vector Machines
    4.1.8 Neural network
    4.1.9 Validation
  4.2 Clustering
    4.2.1 K-means
    4.2.2 Agglomerative clustering
    4.2.3 MeanShift
    4.2.4 Spectral clustering
    4.2.5 Other clustering algorithms
    4.2.6 Validation
  4.3 Dimensionality reduction
    4.3.1 Principal Component Analysis (PCA)

5 Related work
  5.1 Blob detection
  5.2 Machine learning for biomedical image analysis

6 Methodology
  6.1 Blob detection process
    6.1.1 Feature extraction
    6.1.2 Feature compression
    6.1.3 Pixel classification
    6.1.4 Pixel clustering
    6.1.5 Blob extraction
    6.1.6 Blob filtration
    6.1.7 Chunking
  6.2 Experiments
    6.2.1 A: Feature extraction
    6.2.2 B: Feature compression
    6.2.3 C: Pixel classification
    6.2.4 D: Pixel clustering
    6.2.5 E: Run on whole image
    6.2.6 F: Comparison with state-of-the-art
    6.2.7 Summary
  6.3 Data collection
    6.3.1 Characteristics
    6.3.2 Labelling
  6.4 Experimental design
    6.4.1 Test system
    6.4.2 Software
    6.4.3 Data analysis
    6.4.4 Overall reliability and validity

7 Analysis
  7.1 Results from A: Feature extraction
  7.2 Results from B: Feature compression and C: Pixel classification
  7.3 Results from D: Pixel clustering
  7.4 Results from E: Run on whole image
  7.5 Results from F: Comparison with state-of-the-art

8 Conclusions
  8.1 Discussion
  8.2 Future work

Bibliography

A Experiment F software configurations
  A.1 Crops
  A.2 MFB detector
  A.3 FIJI
  A.4 CellProfiler
  A.5 Ilastik
0.1 Acronyms and abbreviations

Terms related to biology:
  RNA     Ribonucleic acid
  FISH    Fluorescence in situ hybridization
  HCS     High content screening
  DNA     Deoxyribonucleic acid
  FISSEQ  Fluorescent in situ sequencing
  mRNA    Messenger RNA
  HCA     High content analysis

Terms related to image processing:
  2D   Two-dimensional
  3D   Three-dimensional
  LoG  Laplacian of Gaussian
  GGM  Gaussian gradient magnitude
  DoH  Determinant of Hessian
  DoG  Difference of Gaussians

Terms related to machine learning:
  ML    Machine learning
  NN    Neural network
  PCA   Principal component analysis
  SVD   Singular value decomposition
  MI    Mutual information
  SVM   Support vector machine
  RF    Random forest
  DT    Decision tree
  LR    Logistic regression
  KNN   k-nearest neighbour
  NB    Naive Bayes
  ReLU  Rectified linear unit
  RBF   Radial basis function
Chapter 1

Introduction

This thesis investigates how machine learning can be applied to blob detection. What is meant by machine learning and blob detection is described in their respective chapters. This chapter provides an introduction to the research.
1.1 Background

At the interface of computer science and biology lies an interdisciplinary field called bioinformatics. This field focuses on applying techniques from computer science to better understand biological data. One of its areas, biomedical image analysis, aims to analyse images that have been captured for medical purposes. Microscopy imaging is an important tool in the biomedical field for applications like the study of the anatomy of cells and tissues (histology) [1], urine analysis [2] and cancer diagnosis [3]. Fluorescent chemicals are often added to mark interesting features in the images, such as with fluorescence in situ hybridization (FISH). FISH is the binding of fluorescent dyes to specific ribonucleic acid (RNA) sequences in tissue cells [4]. By capturing microscopy images under certain lighting conditions, these sequences light up as groups of local intensity maxima, also called blobs (see Figure 1.1 for an example). The location and order of the detected RNA sequences can be used for gene expression profiling, which allows researchers to determine the types and structure of single cells [5].

As microscopes become faster and support higher resolutions, the scale of the produced images makes manual analysis unfeasible for researchers. Moreover, it has been demonstrated that machine learning methods can outperform human vision at recognising patterns in microscopy images [6]. Therefore several bioinformatics software packages [7-10] have been developed that facilitate the analysis or even make it fully automatic in so-called high-content screening (HCS) [11]. Furthermore, confocal microscopes are increasingly being used to create 3D images of cell tissue. These microscopes, which were to a large extent originally developed at KTH [12], can capture images at different depths.

Machine learning, as a field of computer science, aims to "train" programs to perform specific tasks by supplying them with data. Learning from data is useful when the task is hard to formalise, as is often the case in object detection. For example, explaining to a computer how it can find cells in an image of animal tissue is hard. One way to do this is by providing the computer with a large dataset of cell images. With this data, machine learning algorithms can be applied to deduce a visual definition of a cell. Using this definition, the computer can spot instances of cells in any image. The same reasoning applies to detecting blobs in biomedical images: by supplying a program with a set of examples of blobs, it can learn to detect blobs in images analogously to how it can detect cells.
1.2 Problem

The aim of this thesis is to do blob detection on high-resolution 3D microscopy images. This is a difficult task, firstly because it is often not possible to check the veracity of the found blobs; experts can usually only assess the results by looking at them visually or by checking whether they match prior knowledge. Secondly, the scale of the images poses a challenge both for the blob detection algorithms and for verifying the results.

Figure 1.1: Microscope image of human tissue cells where RNA sequences have been stained with a specific fluorescent dye. The blobs, visible as bright spots, are spatially clustered within cells. A single cell and its clearest blobs have been labelled as an example.

Popular methods for biomedical image analysis rely on a number of simple image processing steps for which the user has to set the right parameters, such as in FIJI [13] and CellProfiler [10]. The main drawback of this approach is that assumptions have to be made in order to tune the parameters for the algorithms. Because these parameters are optimised only for the current image set, they cannot always be generalised to other image sets; simply put, what is a blob in one image may not be a blob in another. Secondly, to deal with noise, popular methods usually apply a number of pre-processing steps, incurring extra time and additional parameters. Moreover, FIJI and CellProfiler were not created with high-content screening in mind, since they can only process images that fully fit in memory, which is not always the case. Also, for a tool that is so widely used, CellProfiler is quite slow and some of its functions only work for 2D images.

To tackle the issue of user-set parameters, machine learning can be applied to train a model that finds blobs without user interaction. In addition, the models can be taught to ignore noise, thereby skipping the pre-processing steps. The algorithms have to deal with the 3D aspect and ideally use that information in their analysis. Furthermore, the algorithms have to operate out-of-core, meaning that they can process images that do not fit in memory. Lastly, efficiency is a major concern, both because of the high resolution of today's microscopy images and because of the extra computations that machine learning algorithms usually require. A requirement is therefore that the analysis of an image does not take longer than the time needed to capture that image.

The research question is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? Note that 'effective' here combines the notions of high quality and low running time, since solutions that excel in only one aspect but lack in the other are useless.
1.3 Purpose

The purpose of this thesis is to apply and test different machine learning techniques for blob detection in high-resolution 3D microscopy images. As a proof of concept, images produced for in situ RNA sequencing are analysed, since such images typically have these characteristics. Since multiple steps are needed to distinguish blobs, machine learning can be applied at different stages in different forms; suitable machine learning techniques are therefore tested at each step. The result is an analysis that compares the tested machine learning techniques and concludes which are best suited for solving the problem.
1.4 Goals

The aim of this project is to aid the development of autonomous bio-image analysis tools such that they require minimal user interaction. As user-guided image processing is replaced by computer vision, the hope is that these tools become both faster and more accurate. While humans are limited by their cognitive capabilities, machines can continuously be enhanced by iterative upgrades. Faster hardware, smarter algorithms and better data will all help to improve the performance of such analytical tools.
Even though blob detection is only one task of current bio-image analysis tools, insights originating from this research can be applied to other common tasks as well, such as edge or corner detection. Machine learning models can be taught to recognise cell membranes, cytoplasms or nuclei in a similar fashion to blob detection. Different training data and alternative features have to be used, but the algorithms will be analogous.
1.4.1 Benefits, ethics and sustainability

With the ongoing research on cell tissue such as the brain and organs, the ability to do large-scale gene expression profiling of single cells has great advantages. The identity and function of every cell can be determined, which allows researchers to accurately map the structure of complex tissues. Having an automated analysis pipeline can be a significant benefit to effective research in this field. Researchers do not wish to continuously adjust the settings by trial and error to find the parameters that give the best results. Therefore an approach is needed that picks the optimal settings for them, so that they can focus on their research.

Letting computers take over human image analysis tasks can lead to great gains in terms of performance. Computers will surely be much faster and work longer, but their accuracy will not necessarily be comparable to that of humans. Human experts can directly profit from their prior knowledge, whereas computer programs have to be specifically tailored for this. This means that the precision of such computer programs depends on the experience of both the original domain expert and the software engineer. Human mistakes can lead to errors in the software, but while a human will usually notice when something has gone wrong, a computer does not care as long as the exception is not caught. When machine learning is employed, this problem becomes even more significant, because the accuracy of the software then hinges on the quality of the training data.

As biomedical images are frequently used in the research, diagnosis or treatment of human health, it is important to consider who should take responsibility when image analysis tools produce incorrect results. Ethics play an important role in deciding whether the producer of the software should be held accountable, or the user of the software. It is easy to shove the blame towards the original creator, but there is also the responsibility of the operating researchers and doctors. This is a difficult predicament, but in my opinion the liability should be investigated on a case-by-case basis. When an incident has occurred, a thorough inspection of the involved events should be performed. The inspection should determine whether the cause was a doctor's mistake, a software error or a hardware fault. Based on this information, a verdict can be made on who should be held accountable.

Regarding the possible medical applications of an automated image analysis tool, it is not hard to imagine the benefits it brings for sustainable healthcare. As humans are surpassed by computer vision in image analysis ability, we can focus on the tasks in which we are still superior, such as interpreting the results and drawing conclusions. The consequence is that we become more efficient at treating health problems. There is clearly a strong relationship with the third Sustainable Development Goal (SDG), "Good Health and Well-being", adopted by the United Nations on 25 September 2015 [14]. The project is not related to environmental sustainability.
1.5 Research methodology

Research can be classified as either quantitative, meaning that a phenomenon is proved by experiments or tested with large data sets (quantity), or qualitative, wherein a phenomenon is studied through probing the terrain or environment (quality) [15]. Since the goal of this thesis is to find the algorithms that perform best on a certain input, quantitative results will be collected. The performance is measured by predetermined metrics, so numbers dictate the conclusions.

The philosophical assumption followed is post-positivism. Even though reality is objectively given through reproducible results, as in positivism [15], different observers can have divergent opinions on what the 'optimal' algorithm for the problem is, which distinguishes post-positivism from positivism [15]. In practice it may also depend on which characteristics of the algorithm are deemed most important; for example, a low-quality but fast solution can be preferred to a high-quality but slow solution in some cases. Realism, the other potential philosophical assumption in this case [15], is not applicable because it assumes that matters do not depend on the person who is thinking about them, whereas it has just been argued that the interpreter may assess the results subjectively.

The research method used is applied research, because the practical problem of blob detection needs to be solved, which is the main characteristic of applied research [15]. Multiple approaches are tested to find the best one with the application of RNA sequencing in mind. Possible competing research methods are fundamental research, also called basic research since it drives new innovations, principles and theories, and descriptive research, which focuses on statistical analysis and on describing the characteristics of a situation as opposed to its causes and effects. However, since the goal of the thesis is to improve the performance of known solutions, it should be characterised as applied rather than basic or descriptive research.

A deductive approach is adopted, because the research question is answered by a generalisation based on large amounts of quantitative data [15]. An abductive approach could also be chosen, but that approach assumes that the data is incomplete [15]. Since more data can be generated if desired, this is not the case in this project.
1.6 Delimitations

The main product of this thesis is the results and conclusions of the analysis, as opposed to the developed software. Since the developed software is not meant to be used in production as-is, it does not have to be highly optimised or robust to bad user input. Nevertheless, its quality must be sufficient for the test results to be credible. In addition, the focus will be on evaluating existing techniques instead of devising custom algorithms and methods, unless necessary. Available tried-and-tested implementations will be deployed to limit the amount of coding and debugging needed. This means that only those algorithms will be tested for which there are trustworthy implementations, such as those found in popular software libraries.
1.7 Outline

The first three chapters introduce the background information needed to understand the context and the experiments. Chapter 2: An introduction to in situ RNA sequencing provides a broad description of an example application of blob detection. Chapter 3: Blob detection describes the current state of the art in algorithmic blob detection with biomedical image analysis in mind. Chapter 4: Machine learning introduces the basic theory of the machine learning concepts and algorithms that are applicable in this thesis. It is followed by Chapter 5: Related work, which discusses the papers and corresponding research relevant to this thesis. Chapter 6: Methodology lays out the strategy for answering the research question through six experiments. Chapter 7: Analysis contains the results of the experiments and discusses their reliability. The thesis ends with Chapter 8: Conclusions, which answers the research question, discusses the implications and suggests some open questions that remain.
Chapter 2

An introduction to in situ RNA sequencing

To do phenotypic profiling (using the set of observable characteristics to create a profile) of single cells, traditionally one would look at the appearance of the cells with morphological methods, i.e. methods based on form and structure [16]. In image-based cell profiling (gaining information on a cell from images), hundreds of morphological features, such as the shape, structure and texture, are measured from a population of cells treated with either chemical or biological perturbagens [16]. A perturbagen is an agent (small molecule, genetic reagent, etc.) that can be used to produce gene expression changes in cell lines [17]. To quantify the effects of a treatment, one can then measure the changes in those morphological features compared to the untreated cells in the control group.

However, instead of looking at the results of gene expression, such as the shape and structure of cells, one could also look more directly at which RNA sequences are being synthesised by transcription. RNA (ribonucleic acid) is a molecule essential in various biological roles in the coding, decoding, regulation and expression of genes. In transcription, messenger RNA (mRNA), the RNA molecule that conveys genetic information from DNA to the ribosomes, is synthesised as a complementary copy of a DNA segment by an enzyme called RNA polymerase, which is responsible for copying a DNA sequence into an RNA sequence [18]. These RNA sequences transport the genetic information from the DNA in the nucleus to the ribosomes, complex molecules that act as factories for protein synthesis in cells, where they specify the amino acid sequence (amino acids being the building blocks of proteins) for the creation of proteins. Protein products like enzymes control the processes in the cell by facilitating the chemical reactions [19]. By knowing which enzymes are being produced, one can tell the type and functions of single cells.

Developments in high-resolution microscopy together with fluorescence in situ hybridization (FISH) allow gene expression profiling for resolving the molecular states of many different cell types [20] without losing spatial information. The FISH procedure starts by binding specific fluorescent chemicals to specific nucleobases in RNA strings [4]. These chemicals are chosen such that they absorb light and emit it at a longer wavelength [21]. When an image is captured at that specific wavelength, the locations of the fluorescent chemicals are revealed, and thus the locations of the tagged nucleobases. The nucleobases show up in the images as local intensity maxima, usually called blobs. By capturing multiple photos with different fluorescent agents, fragments of nucleobases (sometimes called barcodes) can be distinguished that encode the full RNA string [20]. One popular method of fluorescent in situ RNA sequencing is FISSEQ [5].

Automated microscopy systems that can produce large numbers of high-resolution images every hour allow the transcriptomic (based on information relayed through transcription) profiling of thousands of cells [22]. Moreover, confocal microscopes can be used to capture photos of the cells at different depths of the tissue, resulting in 3D images [12]. One of the main challenges from a bioinformatics point of view is to accurately find the blobs corresponding to the different nucleobases and use them to do RNA sequencing.
Chapter 3

Blob detection

Blob detection falls within the field of visual feature detection. This field, which is part of computer vision, focuses on finding image primitives such as corners, edges, curves and other points of interest in digital images [23]. Blob detection is aimed at finding regions in an image that differ from their surroundings with respect to properties like brightness, colour and shape (see Figure 3.1a for more properties). These regions are called blobs (see Figure 3.1b for an example). As there are multiple definitions of blobs depending on the application, there are also many different algorithms for finding them. A different but more exact definition, used by Tony Lindeberg, an influential researcher on multi-scale feature detection, is that a blob is a region with at least one local extremum [24], such as a bright spot in a dark image or a dark spot in a light image. Even though most classical definitions consider blobs in 2D, the definition can be extended to 3D as well. In this thesis, blobs are defined as small (< 50 pixels) round 3D spots in an image that are brighter than their background (i.e. local intensity maxima). Refer back to Figure 1.1 for an example.
3.1 Automatic scale selection

The majority of blob detection methods are based on automatic scale selection as introduced by Lindeberg [27]. Before detection, the image is converted to a scale-space representation by applying a convolutional smoothing kernel over the image. In most cases this is the Gaussian filter, which computes for every pixel a weighted average of the surrounding pixels based on the Gaussian distribution (see Equation 3.2 for the 2D filter), leading to a blurred image.

$$ g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{x^2 + y^2}{2\sigma^2} \right\} $$

Equation 3.2: Two-dimensional Gaussian function. x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ is the standard deviation of the Gaussian distribution.

Figure 3.1: (a) Examples of blob properties, from [25]; (b) blob detection in a field of sunflowers, from [26].

The main purpose of the scale-space representation is to understand the image structure at multiple levels of resolution simultaneously [27]. The scale can be set by changing the parameter σ: a larger scale σ increases the amount of smoothing, which means that more Gaussian noise is ignored and larger objects can be detected [28]. By running the blob detection algorithms on the same image at different scales, blobs of different sizes can be detected. Figure 3.3 shows how variously sized blobs can be found with different scale levels of Gaussian smoothing.

Figure 3.3: Smoothed and thresholded images of an old telephone for scale levels $s^2$ = 0, 2, 16, 32, 128, 1024 (from top-left to bottom-right). From [24].
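As a minimal illustration of this step, the following Python sketch builds a small Gaussian scale-space with SciPy; the input image and the σ values are placeholders, and the same call works unchanged for 3D arrays.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(64, 64)      # stand-in for a real microscopy image
scales = [1.0, 3.5, 10.0]         # illustrative sigma values, as in Figure 3.8

# One smoothed copy of the image per scale; together they form the
# scale-space representation on which detection is run.
scale_space = [gaussian_filter(img, sigma=s) for s in scales]
```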
3.2 Algorithms

For every combination of blob definition and application, different blob detection algorithms can be optimal. In the domain of this thesis, a few algorithms stand out that are either popularly used or potential candidates. These are template matching, thresholding, local extrema algorithms, differential algorithms, algorithms using machine learning, and over-segmentation.
3.2.1 Template matching

Since blobs can be regarded as simple objects in an image, template matching can be applied to find them. This algorithm requires an image of the expected appearance of the object, called a template (Figure 3.4a). The template is moved over the search image (Figure 3.4b) with a stride of 1, and objects are detected where the template matches part of the image [28]. At every position, the sum of absolute differences (SAD) or sum of squared differences (SSD) is stored in a correlation matrix (Figure 3.4c). The highest values (local maxima) in the correlation matrix correspond to a high probability that an object is located there. A threshold can then be used to extract the most significant objects and their locations. To find objects of different shapes and sizes, multiple templates can be designed beforehand.

Figure 3.4: Template matching for finding a coin in an image of a set of coins: (a) template, (b) search image, (c) correlation image. From [30].

Template matching is easy to implement and very fast [29]. However, its main drawback is that it has a hard time finding objects that do not match the precise template. Since the blobs in our case can be of slightly different sizes and are sometimes clumped together with other blobs, this method will not be very effective.
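A hedged sketch of this workflow with scikit-image follows. Note that its `match_template` uses normalised cross-correlation rather than the SAD/SSD measures described above, but the detect-peaks-in-a-correlation-image idea is the same; the threshold value is illustrative.

```python
import numpy as np
from skimage.feature import match_template, peak_local_max

image = np.random.rand(128, 128)   # search image (placeholder)
template = np.random.rand(9, 9)    # expected blob appearance (placeholder)

# Correlation image; pad_input=True keeps it the same size as the input.
corr = match_template(image, template, pad_input=True)

# Local maxima above a threshold mark probable object locations.
peaks = peak_local_max(corr, threshold_abs=0.8)
```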
3.2.2 Thresholding

When blobs are defined as either bright or dark spots in an image (Figure 3.5a), one can simply threshold the pixels to obtain a binary image with regions corresponding to blobs (Figure 3.5b). Many thresholding techniques exist that exploit different information such as shape, clustering, entropy and object attributes; Sezgin and Sankur performed a survey and comparison of 40 selected thresholding methods from various categories [31]. Common processing steps that follow are filling up holes within the blobs that result from noise, and splitting up multiple connected blobs using a watershed algorithm (Figure 3.5c). The next step consists of locating the blobs by looking for connected components: groups of neighbouring blob pixels. Blobs that do not adhere to certain criteria, such as size and shape, can also be filtered out. Finally, the centroids of the blobs are calculated and returned as the blob locations (Figure 3.5d).

Figure 3.5: Common steps in a thresholding algorithm: (a) input image, (b) binary image by thresholding, (c) binary image after watershed, (d) final clustering and count. Created using Fiji [13].

Exactly this approach is used by the popular bio-image analysis tool CellProfiler [10]. This interactive tool lets users create a custom pipeline that takes an image as input and outputs results according to the chosen steps. These steps are simple image processing operations such as background removal, smoothing, enhancement and object detection. It works well when the user has time to tweak the parameters for each image or when images are similar. If not, batch processing of a large number of images can become quite time-consuming.

Watershed

Watershed works by treating an image as a topographic map and letting "water" flow from the peaks of the image downwards. In Figure 3.6, the peaks are marked as red circles. The boundary where the water from two markers meets indicates where the blobs should be split.

Figure 3.6: Starting markers for watershed. First, the shortest distance to the edge of the blobs is computed for each pixel; the darker the pixel, the further it is from the edge. The local minima that then appear are used as markers (visualised as red circles). When multiple markers are close together, all but one are purged. Created using scikit-image [32].
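The pipeline of Figure 3.5 can be sketched with scikit-image roughly as follows; Otsu's method stands in for the many thresholding techniques surveyed by Sezgin and Sankur, and all parameter values are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.feature import peak_local_max
from skimage.segmentation import watershed
from skimage.measure import regionprops

img = np.random.rand(128, 128)                      # placeholder input image
binary = img > threshold_otsu(img)                  # step (b): thresholding

# Step (c): distance transform + watershed to split touching blobs.
# Maxima of the distance to the background act as the markers.
distance = ndi.distance_transform_edt(binary)
coords = peak_local_max(distance, labels=binary, min_distance=3)
markers = np.zeros(img.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(len(coords)) + 1
labels = watershed(-distance, markers, mask=binary)

# Step (d): centroids of the connected components are the blob locations.
centroids = [region.centroid for region in regionprops(labels)]
```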
3.2.3 Local extrema

One can also simply look at the local maxima or minima in intensity to find the bright or dark blobs in the image. At run time, for every 3x3 region (other sizes are possible) the location of the pixel with the maximum or minimum intensity is recorded, usually only when it is above a certain threshold, to ignore noise. These pixels are assumed to be the centres of blobs. A filtering step often follows to remove the extrema that are not centres of blobs. Sometimes a segmentation algorithm like watershed (see section 3.2.2) is used to find which other pixels belong to the blobs. Although this method is simple, problems occur when there are large blobs with multiple local extrema: the algorithm then outputs multiple smaller blobs instead of one large blob.
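A minimal sketch of this detector, assuming a greyscale NumPy image and an illustrative noise threshold:

```python
import numpy as np
from scipy.ndimage import maximum_filter

img = np.random.rand(128, 128)   # placeholder; a 3D array works the same way
noise_threshold = 0.9            # illustrative value

# A pixel is a candidate blob centre if it equals the maximum of its
# 3x3 neighbourhood (3x3x3 for a 3D image) and exceeds the threshold.
is_max = (img == maximum_filter(img, size=3)) & (img > noise_threshold)
centres = np.argwhere(is_max)
```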
3.2.4 Differential extrema

Differential methods can be used instead when local extrema are not sufficient to distinguish blobs due to noise. These methods are based on the derivative of the intensity function with respect to the coordinates and will therefore pinpoint regions where the intensity changes faster than in the rest of the image. Blobs can be mathematically represented by a pair consisting of a saddle point and one extremum point, making them look like peaks in the intensity landscape [33] (see Figure 3.7).

The Laplacian of the Gaussian (LoG) is a popular differential method for blob detection [34]. First it convolves the input image with a Gaussian kernel at a certain scale $t = \sigma^2$ to give a scale-space representation $L(x, y; t)$, where $x$ and $y$ are the pixel coordinates. Next, it applies the Laplacian operator

$$ \nabla^2 L = L_{xx} + L_{yy} \quad (3.1) $$

which gives a strong positive response for dark blobs of a specific size [34]. To capture blobs of different sizes, the Gaussian kernel is usually applied at several scales simultaneously together with the scale-normalised Laplacian operator

$$ \nabla^2_{\mathrm{norm}} L = t \, (L_{xx} + L_{yy}) \quad (3.2) $$

Figure 3.8 shows the result of applying the LoG with different scales to the same image.

Since the Laplacian is expensive to compute, the Difference of Gaussians (DoG) is commonly used instead. This operator can be seen as an approximation of the Laplacian but is faster to compute. As with the LoG method, blobs can be detected in different scale-spaces. It is computed as the difference between two images smoothed with Gaussian kernels of different scales:

$$ \nabla^2_{\mathrm{norm}} L \approx \frac{t}{\Delta t} \left( L(x, y; t + \Delta t) - L(x, y; t) \right) \quad (3.3) $$

The scale-normalised Determinant of the Hessian (DoH) is another popular differential method. It uses the Monge-Ampère operator

$$ \det H_{\mathrm{norm}} L = t^2 \left( L_{xx} L_{yy} - L_{xy}^2 \right) \quad (3.4) $$

where $HL$ denotes the Hessian matrix of the scale-space representation $L$. In a detailed analysis, Lindeberg found that the Hessian operator has better scale selection properties under linear image transformations than the Laplacian operator [34].

Figure 3.7: Intensity function over the x-axis of a sunflower image: (a) sunflower with a line straight through the y-centre, adapted from [33]; (b) intensity of the pixels on the red line in (a), with the local minimum used as the blob centre indicated together with the saddle points, created with Matplotlib [35]. The same idea applies to 2D and 3D as well.

Figure 3.8: Laplacian of Gaussian applied to an image at different scales (original image, σ = 1.0, 3.5, 10.0). Created using Ilastik [36].
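scikit-image ships multi-scale LoG and DoG detectors; the sketch below assumes bright blobs on a dark background, the sigma range and thresholds are illustrative, and recent scikit-image versions accept 3D arrays as well.

```python
import numpy as np
from skimage.feature import blob_dog, blob_log

img = np.random.rand(128, 128)   # placeholder input image

# Each returned row is (row, col, sigma); sigma indicates the scale
# at which the blob was detected.
blobs_log = blob_log(img, min_sigma=1, max_sigma=10, threshold=0.1)
blobs_dog = blob_dog(img, min_sigma=1, max_sigma=10, threshold=0.1)
```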
3.2.5 Machine learning

The problem with the previous algorithms is that they require the user to tune parameters in order to find the desired blobs, and what may be good parameters for one image may not be satisfactory for another. So what if we could teach the program what a good blob is by giving it examples, and then let it find the other blobs according to the learned definition? This is exactly how supervised machine learning can be applied to blob detection. In advance, features are calculated for each pixel that describe the intensity, edges and texture. By preceding the feature extraction with Gaussian smoothing at multiple scales, features are generated for multiple scale-spaces (as explained in section 3.1). Next, the user interactively selects some pixels belonging to a blob and some that do not. With this information, a supervised machine learning algorithm like a random forest (see subsection 4.1.5) or a support vector machine (SVM) (subsection 4.1.7) is trained, which can predict the class of the remaining pixels. The connected components of the blob pixels are then tagged as candidate blobs in the next step.

If these candidate blobs are as desired, their centroids can be returned as the blob positions. But if there are stricter criteria, machine learning can be applied again to distinguish the true blobs from the false ones. First, a set of features is calculated for each candidate blob, such as shape, size or intensity histogram. Then the user must mark a few blobs as true and a few others as false. A machine learning algorithm can then use this information to identify only the correct blobs. Since an arbitrary number of features can be included, this method of finding blobs can be very accurate. The people behind Ilastik thought so as well, because their software does exactly this [36].
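A minimal sketch of the pixel classification step in this Ilastik-style workflow, with the feature set, scales and "user labels" invented purely for illustration:

```python
import numpy as np
from scipy.ndimage import (gaussian_filter, gaussian_gradient_magnitude,
                           gaussian_laplace)
from sklearn.ensemble import RandomForestClassifier

img = np.random.rand(64, 64)                         # placeholder image

# Per-pixel features at multiple Gaussian scales (section 3.1).
feats = []
for s in (1.0, 3.5, 10.0):
    feats += [gaussian_filter(img, s),               # smoothed intensity
              gaussian_laplace(img, s),              # blob-like response
              gaussian_gradient_magnitude(img, s)]   # edge strength
X = np.stack(feats, axis=-1).reshape(-1, len(feats))

# Pretend the user labelled six pixels (1 = blob, 0 = background).
labelled = np.array([0, 1, 500, 501, 1000, 1001])    # flat pixel indices
y = np.array([0, 0, 1, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=100).fit(X[labelled], y)
blob_mask = clf.predict(X).reshape(img.shape)        # a class for every pixel
```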
3.2.6 Super-pixel classification

Super-pixel classification is another approach to blob detection. It starts by segmenting the image into regions of pixels called super-pixels. This is essentially a clustering step that tries to group neighbouring pixels that are similar with respect to specific properties. Algorithms producing such so-called over-segmentations are, among others, Felzenszwalb's image segmentation algorithm [37] and Quickshift [38] (see Figure 3.9).

Figure 3.9: Products of two over-segmentation algorithms, (a) Felzenszwalb and (b) Quickshift, on an image of the astronaut Eileen Collins. From [41].

The next step is classifying these super-pixels as blob or non-blob. A popular approach is to extract SIFT (scale-invariant feature transform) descriptors [39], map them to clusters and create a bag-of-visual-words histogram of the clusters appearing in the super-pixel, as in [40]. The histogram is then classified as blob or non-blob using a supervised machine learning algorithm such as the SVM (see 4.1.7). This requires off-line training of the classifier with labelled super-pixels prior to run time.
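The over-segmentation step can be reproduced with scikit-image as sketched below; the parameter values are illustrative.

```python
from skimage import data
from skimage.segmentation import felzenszwalb, quickshift

img = data.astronaut()   # the RGB image used in Figure 3.9

# Both calls return an integer label per pixel identifying its super-pixel.
segments_fz = felzenszwalb(img, scale=100, sigma=0.5, min_size=50)
segments_qs = quickshift(img, kernel_size=3, max_dist=6, ratio=0.5)
```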
Chapter 4

Machine learning

This chapter treats the machine learning techniques that can be applied to the project's problem. As the focus of this thesis is to evaluate their performance, the techniques are only briefly discussed. These descriptions are not meant to be exhaustive, so readers who require a more thorough explanation are advised to consult more elaborate sources.

Machine learning is a field of computer science that gives computer systems the ability to "learn" (i.e. progressively improve performance on a specific task) from data, without being explicitly programmed [42]. Data is usually structured as a multi-dimensional array of values. Each row corresponds to one instance (e.g. a customer) that we can call a datapoint. The columns are called features and describe characteristics of that instance (e.g. name, birth year, address, phone number). Typical tasks of machine learning are classification, regression, clustering, anomaly detection and structured prediction.

A distinction that is commonly made between machine learning algorithms is whether they are supervised or unsupervised. Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs [43]. The output value is commonly called the label. After training, the learned function can be used to predict the label for new inputs. In classification, the algorithm needs to decide to which discrete class a datapoint belongs; a classic example is classifying e-mails as either spam or non-spam. Regression, on the other hand, aims to predict a continuous target value for a datapoint. Let's say you want to approximate the price of a house from input information such as the floor area, location, build year and the number of bedrooms. You could then look at other houses and build a model that describes the relationship between the house information and the price. With enough data, this model can be used to predict other house prices.

Unsupervised learning algorithms are not provided with labels during training, which means that they have to find patterns on their own. One of the most common types of unsupervised learning is clustering, where an algorithm groups datapoints that are similar with respect to some properties [44]. For a set of music tracks, for example, it can be investigated whether they can be partitioned into categories by considering their metadata, like the year, artist, genre and length. As another unsupervised learning type, dimensionality reduction aims to describe the original data using fewer dimensions [45]. Its main advantages are a better conceptual understanding of the data, decreased storage requirements and improved running time for subsequent algorithms.

In the literature, different terms are used for the same concepts. Note therefore that datapoints, instances, observations and example inputs all mean the same thing, namely the individual data units. The attributes of the data units are sometimes called properties, features or dimensions. The attribute that needs to be predicted can be called the label, decision class, output class, response variable or target output.
4.1 Classification

4.1.1 Naive Bayes

As a baseline classifier, i.e. an algorithm to which other algorithms are compared, the naive Bayesian classifier is often used [46]. This classifier uses the famous Bayes theorem to make predictions:

$$ P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \quad (4.1) $$

where $A$ and $B$ are events and $P(B) \neq 0$. It works by determining the probability of a datapoint belonging to a certain class given prior knowledge of conditions related to the datapoint. For example, to estimate the probability of a person Bert of age 57 having cancer, $P(\text{Bert has cancer} \mid \text{Bert is 57 years old})$, the Bayesian formula can be used with $P(A)$ as the probability of someone having cancer and $P(B)$ as the probability of someone being 57 years old. To do a binary classification of Bert having cancer or not, this conditional probability is calculated and compared to a threshold of 50%: if the probability exceeds 50%, Bert is classified as having cancer. Since the probabilities are usually assumed to be normally distributed, probabilities can be estimated for conditions that have not been seen before. The method extends to multiple conditions (i.e. features) by taking the product of the conditional probabilities of the given conditions.

Unfortunately, the main drawback of this method is that it assumes that within one class all features are statistically independent, hence the name "naive" [46]. On the positive side, research has shown that this is not a very significant problem in practice, especially for highly dimensional data [47]. Furthermore, the algorithm has very convenient properties that make it worth trying in many cases: it offers a range of important services such as learning from very large datasets, incremental learning, anomaly detection, row pruning and feature pruning, all in near-linear time [46]. In addition, it requires a minimal memory footprint and is fast to train.
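A toy computation of the Bayes rule above, with all probabilities invented for the example:

```python
# All probabilities below are made up purely for illustration.
p_cancer = 0.01            # P(A): prior probability of having cancer
p_age = 0.05               # P(B): probability of being 57 years old
p_age_given_cancer = 0.10  # P(B|A)

p_cancer_given_age = p_age_given_cancer * p_cancer / p_age
print(p_cancer_given_age)  # 0.02, far below 0.5, so Bert is classified negative
```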
4.1.2 Logistic regression

The probability of a datapoint belonging to a certain class can be estimated in other ways as well. A common method is logistic regression, which uses regression to fit a line h(x) through the data. Inserting the value h(x) for a datapoint x into the sigmoid function (Figure 4.2)

$$ f(x) = \frac{1}{1 + e^{-x}} $$

returns a number between 0 and 1. This number indicates the probability that the datapoint belongs to the positive class (in the case of binary logistic regression). Multinomial logistic regression may be used in cases where the dependent variable has more than two outcome categories.
4.1.3 K-Nearest Neighbour

A very simple classification algorithm that has seen popular usage in research is k-Nearest Neighbour (kNN). Its ease of understanding and implementation, together with its general applicability, is the reason it was included in the top 10 algorithms in data mining [48]. Instead of building a model from the training data, as most other learning algorithms do, it uses the training data directly for classification; for this reason it is called a non-parametric classifier. For every datapoint it finds the k nearest datapoints in the training dataset. The classes of those nearest neighbours dictate the class of the input datapoint through a majority vote. The distance function used depends on the application, but common choices are the Euclidean and cosine distances [49]. kNN is notorious for being sensitive to noise such as outliers: too small values of k can give noisy datapoints a strong influence on the classification of new datapoints [48]. Because a comparison with every training datapoint has to be made for each input datapoint, performance is a big issue for large datasets as well. For this reason, a number of improvements have been proposed, such as 'condensing' [50] or 'editing' [51] the training dataset so that it becomes smaller while approximately retaining its accuracy.
Figure 4.3: An example of a decision tree for deciding whether to go for a trip when considering the weather. From [52].
4.1.4 Decision Tree

Decision trees are models that map observations about an item to conclusions about its target value using a series of decisions based on the observation's attributes [52]. The decision tree model is a directed acyclic graph in the form of a tree, where each internal node represents a decision and each leaf node represents the predicted class for a given observation (see Figure 4.3). Every time a new observation has to be classified, it starts with a comparison at the root of the tree. There one of its attributes is compared to a certain value, and based on this decision it continues down one of the node's branches. Such a comparison is made at every node until the observation arrives at a leaf node and a final classification is made.

Inducing decision trees from training data is called decision tree learning. The goal is to generate a general model that can be used to classify new observations [52]. There are different algorithms for generating such a model, but they all rely on the main idea that at each node the decision has to be made that best splits the data with respect to the target class. The quality of a split is measured by the information gain or information gain ratio that the decision produces. The information is often defined as the weighted average of the Shannon entropy (4.1) or the Gini impurity (4.2) over the new branches, where $P(x_i)$ is the probability of a possible value from $\{x_1, \ldots, x_n\}$:

$$ H(X) = -\sum_i P(x_i) \cdot \log_2 P(x_i) \quad (4.1) $$

$$ \mathrm{Gini}(X) = 1 - \sum_i P(x_i)^2 \quad (4.2) $$
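The two split criteria can be computed directly from their definitions; a toy illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (4.1) of a class distribution p."""
    p = np.asarray(p)
    p = p[p > 0]                           # avoid log2(0)
    return 0.0 - np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity (4.2) of a class distribution p."""
    return 1.0 - np.sum(np.asarray(p) ** 2)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))   # 1.0 0.5  (maximally impure)
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))   # 0.0 0.0  (pure node)
```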
4.1.5 Random Forest

A common extension of decision trees are ensemble methods like random forests. These are sets of multiple induced decision trees that combine their outputs into a single classification to improve the overall accuracy [52]. Decision trees are known for being very sensitive to irregularities in the training data, which makes them susceptible to over-fitting [52]. A random forest is created by building multiple decision trees, each with a different random sample of features from the training data. This ensemble method is also sometimes called the "random subspace method" or "feature bagging". The motivation for this method is that it prevents classifiers from focusing on only a single feature (or a few) that strongly predicts the response variable. Because the classifiers have to look for more general features, they are less likely to over-fit. Ho performed an analysis of how random subspace projection leads to accuracy gains [53]. Random forests have been successfully applied to pixel classification in the bio-image analysis software Ilastik; the accompanying paper adds that "The ability of the random forest to capture highly non-linear decision boundaries in feature space is a major prerequisite for the application to general sets of use cases." [36]
4.1.6 AdaBoost

AdaBoost is another ensemble method that has shown good results in practice. Similarly to bagging, it combines the predictions of multiple arbitrary classifiers. It was invented by Y. Freund and R. E. Schapire in 1996 [54]. Where bagging weighs all predictors equally, boosting takes a weighted sum of the predictions as the final output. The "Ada" in the name stands for adaptive, because the algorithm tweaks subsequent weak learners so that they focus on instances that are harder to classify. By combining weak learners that are only slightly better than random guessing, the final model provably converges to a strong learner.
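A short AdaBoost sketch with scikit-learn; the dataset is synthetic, and with the default settings the weak learner is a depth-1 decision tree (a "stump").

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each boosting round re-weights the training instances that the
# previous weak learners misclassified.
ada = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(ada.score(X, y))
```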
4.1.7 Support Vector Machines

Support vector machines (SVMs) are a powerful technique for classification, regression and outlier detection [55]. Like decision trees and random forests, they are non-probabilistic. For a binary classification, an SVM seeks the optimal hyperplane separating the two classes involved, such that the distance to the closest representatives of the two classes is maximised. During training, SVM algorithms build a model that splits the training data into two classes with the least error; new datapoints are then classified based on which side of the hyperplane they fall. The decision boundary is linear in regular linear SVMs, but sometimes other shapes such as curves are needed. In those cases, a kernel function can be used to map the data into a different feature space [55], in which it should be easier to find a linear hyperplane that divides the transformed data. Popular kernels are the Gaussian radial basis function (RBF) kernel, the exponential kernel and the polynomial kernel.
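A minimal RBF-kernel SVM sketch with scikit-learn, using a synthetic dataset (concentric circles) that no linear hyperplane in the original space can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# The RBF kernel implicitly maps the points into a feature space in
# which the two circles become linearly separable.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)
svc = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(svc.score(X, y))
```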
4.1.8 Neural network The previously described machine learning algorithms are heavily used in industry and work well on a wide variety of important problems. However, for some problems central in artificial intelligence (AI), such as speech recognition and object detection they have not achieved the required performance. Therefore a new field of machine learning called deep learning has emerged, motivated in part by the failure of traditional algorithms to generalise well on such AI tasks [56]. A significant challenge for more complex data is the curse of dimensionality, which makes machine learning exceedingly more difficult when the number of dimensions is high [56]. In order to cope with this problem traditional machine learning algorithms need prior beliefs to be guided
Figure 4.4: Rectified Linear Unit (ReLU) function, f(x) = max(0, x).

However, these priors hurt the algorithm's ability to generalise to more complex functions. Deep learning relies heavily on the concept of artificial neural networks, which are networks of nodes inspired loosely by the neural networks of which animal brains are composed. These networks consist of connected layers, where each layer is made up of nodes. The output of each node is a linear function of the node's input connections followed by a non-linear activation function such as the sigmoid (Figure 4.2) or the Rectified Linear Unit (Figure 4.4). What makes these neural networks 'deep' are their hidden layers, situated between the input and output layer. These hidden layers enable the network to learn the very complex non-linear functions required in more complicated tasks. Usually the nodes of each layer are connected to all the nodes in the neighbouring layers, which is called densely connected. The most common type of artificial neural network is the feed-forward network, which aims to approximate some function in order to predict the output for any arbitrary input. This is useful for tasks such as classification and regression, but also for more complex tasks such as data compression or image segmentation. These networks are called 'feed-forward' because the data 'flows' from the input layer to the output layer (see Figure 4.5). There are no feedback connections in this type of network, in contrast to recurrent neural networks for example. The method for training a feed-forward neural network (and most other artificial neural networks) is called backpropagation. Backpropagation is used to calculate the gradient of the loss function with respect to the weights, working from the final layer back to the first hidden layer.
Figure 4.5: Example of a feed-forward neural network. Adapted from [59].

This gradient is then needed in gradient descent to update the weights of each layer. There are also more advanced optimisation algorithms such as Adadelta [57] and the Adam optimiser [58].
4.1.9 Validation F1-score To measure the performance of a binary classification algorithm, the f1-score is often used. It is defined as the harmonic mean (4.3) of the precision (4.4) and recall (4.5). Its range runs from 0.0 to 1.0.

f1 = 2 · (precision · recall) / (precision + recall)    (4.3)

precision = |true positive| / (|true positive| + |false positive|)    (4.4)

recall = |true positive| / (|true positive| + |false negative|)    (4.5)
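These definitions translate directly into code; a small sketch with made-up confusion counts:

    def scores(tp, fp, fn):
        """Precision, recall and f1 from confusion counts (4.3-4.5)."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Example: 90 true positives, 20 false positives, 10 false negatives.
    print(scores(tp=90, fp=20, fn=10))  # (0.818..., 0.9, 0.857...)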
4.2 Clustering 4.2.1 K-means K-means clustering is one of the simplest and most popular clustering algorithms [60]. It starts by selecting k random points from the data as centroids (though smarter initialisation methods exist). In the next step it assigns each of the remaining points to the closest centroid. At the end of this iteration the points have been partitioned into k disjoint clusters. Next, for each cluster a new centroid is calculated as the mean of all the attribute values of the points in the cluster. In the next iteration the points are assigned to the new centroids. The algorithm continues iterating until either the centroids stop moving between iterations or another stop criterion is reached. The space requirements for K-means are modest because only the data points and centroids are stored [60]. K-means is also quite fast because its running time is linear with respect to the dataset size. This makes it a powerful multi-purpose clustering algorithm and a good starting point for more advanced clustering algorithms.
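The iteration loop is simple enough to sketch directly in NumPy; this toy version assumes Euclidean distance, random initialisation and that no cluster ever ends up empty:

    import numpy as np

    def kmeans(points, k, n_iter=100, seed=0):
        rng = np.random.RandomState(seed)
        centroids = points[rng.choice(len(points), k, replace=False)]
        for _ in range(n_iter):
            # Assign every point to its closest centroid.
            d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
            labels = d.argmin(axis=1)
            # Recompute each centroid as the mean of its assigned points.
            new = np.array([points[labels == j].mean(axis=0)
                            for j in range(k)])
            if np.allclose(new, centroids):  # centroids stopped moving
                break
            centroids = new
        return labels, centroids

    pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    labels, centroids = kmeans(pts, k=2)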
4.2.2 Agglomerative clustering Agglomerative clustering is an example of a hierarchical clustering method that first derives a hierarchical tree from the data and then infers the main clusters [60]. Agglomerative clustering is sometimes called bottom-up because it starts by putting each point in a separate cluster and then builds larger clusters from the smaller ones until all points are connected. At every step it determines which two clusters are closest together and merges them. There are different linkage criteria for deciding the distance between clusters, such as: the minimal distance between closest members (single linkage), the minimal distance between furthest members (complete linkage), the distance between centroids (centroid linkage) and the minimal sum of squared differences within clusters (Ward linkage). The biggest drawback of this algorithm is its running time: since it needs to compare every pair of clusters in each step, it requires O(n^3) computations [60]. There are however faster implementations that run in O(n^2 log n). Another challenge is the non-triviality
of inferring the flat clusters from the hierarchical tree, since the criteria for cutting the tree can be subjective. Examples of such criteria are a fixed number of clusters or a maximum distance between clusters.
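A brief sketch of the bottom-up procedure with SciPy, including the tree-to-flat-clusters step discussed above; Ward linkage and the distance threshold are arbitrary choices for illustration:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    pts = np.random.rand(30, 2)

    # Build the hierarchical tree bottom-up with Ward linkage.
    tree = linkage(pts, method="ward")

    # Infer flat clusters from the tree, here by a maximum-distance
    # criterion; a fixed number of clusters ("maxclust") also works.
    flat = fcluster(tree, t=1.0, criterion="distance")
    print(flat)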
4.2.3 MeanShift MeanShift was proposed in 2002 as a non-parametric clustering algorithm for feature spaces [61]. The algorithm relies on centroids that it continuously updates to be the mean of the points within a given region. It aims to discover 'blobs' in a smooth density of samples [62]. This property makes the algorithm an attractive candidate for clustering pixels, since blobs have a consistent density and are often quite dense. The algorithm is however not highly scalable, because it requires multiple nearest-neighbour searches during its execution [62]. The only parameter it requires is the bandwidth, which dictates the size of the region to search through. The bandwidth can be set beforehand or be estimated.
4.2.4 Spectral clustering Spectral clustering algorithms use the top eigenvectors of a matrix derived from the distances between points (also called the affinity matrix) [63]. A common approach for this family of algorithms goes as follows. First the affinity matrix is calculated for all the points. Then an eigendecomposition is performed on the normalised Laplacian of this matrix. Next, the k eigenvectors belonging to the top k eigenvalues are selected and concatenated into an n × k matrix. Finally, the points in this lower-dimensional space are assigned to k clusters using a simple clustering algorithm such as k-means. k is usually determined beforehand as the expected number of clusters, but there are approaches that guess k from the eigendecomposition. Spectral clustering is popular because it is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as k-means [64].
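The common approach above can be sketched in a few lines of NumPy; the RBF affinity and the symmetric normalised Laplacian are assumed details that the description leaves open:

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral(points, k, sigma=1.0):
        # 1. Affinity matrix from pairwise squared distances (RBF kernel).
        d2 = ((points[:, None] - points[None]) ** 2).sum(-1)
        A = np.exp(-d2 / (2 * sigma ** 2))
        # 2. Symmetric normalised Laplacian L = I - D^-1/2 A D^-1/2.
        dinv = 1.0 / np.sqrt(A.sum(axis=1))
        L = np.eye(len(points)) - dinv[:, None] * A * dinv[None, :]
        # 3. The k smallest eigenvalues of L correspond to the top k
        #    eigenvalues of the normalised affinity matrix.
        _, vecs = np.linalg.eigh(L)
        U = vecs[:, :k]                  # n x k embedding
        # 4. Cluster the embedded points with k-means.
        return KMeans(n_clusters=k).fit_predict(U)

    pts = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 6])
    print(spectral(pts, k=2))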
4.2.5 Other clustering algorithms Of course, the list above does not cover all algorithms that have been invented for the purpose of clustering; covering them all is impossible due to the overwhelming amount of literature on the subject. Other popular algorithms that have been considered but deemed unsuitable are: affinity propagation [65], since it does not scale well with the number of points n; DBSCAN [66], because all the blobs have roughly the same density, so the algorithm cannot distinguish touching blobs; and clustering using a Gaussian mixture model, since it has too many parameters and does not scale [67].
4.2.6 Validation Silhouette Score Since the ground truth of the cluster to which a point belongs is often either subjective or unknown, it is difficult to evaluate the quality of a clustering algorithm. The Silhouette Coefficient was therefore proposed by P.J. Rousseeuw in 1987 [68], because it can be calculated solely from the clustering results. It is composed of two scores [69]:

a: the mean distance between a data point and all other points in the same cluster

b: the mean distance between a data point and all other points in the next nearest cluster

The Silhouette Coefficient s for a single data point is then defined as:

s = (b − a) / max(a, b)
To determine the Silhouette Score for a dataset, the mean of the Silhouette Coefficients of all data points is calculated (or of a random sample if there are too many). The score is bounded by −1 for incorrect clustering and +1 for highly dense clustering. A score around zero means that clusters are likely overlapping [69]. If the clusters are dense and well separated,
then the score is higher, which corresponds to the conventional definition of a cluster. Furthermore, a large silhouette score corresponds to rounder clusters, which is fortunately a desired property of blobs.
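A minimal sketch of how the score might be computed with scikit-learn on toy data:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    labels = KMeans(n_clusters=2).fit_predict(pts)

    # Mean Silhouette Coefficient over all points: close to +1 for dense,
    # well-separated clusters, around 0 for overlapping ones.
    print(silhouette_score(pts, labels))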
4.3 Dimensionality reduction 4.3.1 Principal Component Analysis (PCA) Principal Component Analysis is likely the most popular multivariate statistical technique [70]. It is widely used as a method for dimensionality reduction. It can be thought of as fitting a k-dimensional ellipsoid to the data such that each axis corresponds to a principal component. The axes are chosen so that they explain the highest amount of variance and are orthogonal to each other. The first principal component is the axis that explains the largest amount of variance. The second principal component lies orthogonal to the first and explains the second highest amount of variance, and so on. Shorter axes do not provide as much information and can therefore be removed. The aim of PCA is to find from the data X these components, which are linear combinations of the original variables. Singular Value Decomposition (SVD) is used to calculate X = P∆Q^T [70]. The matrix P∆ holds the factor scores, in other words the importance of each dimension. Matrix Q holds the coefficients of the linear combinations used to compute the factor scores. By multiplying the original matrix X with Q we can project the data onto a lower-dimensional space. The result is a compressed version of the original matrix, which can be used to speed up subsequent steps. Because of this characteristic, PCA is often used for image compression, where it has proved to be effective [71].
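A short NumPy sketch of the projection described above, on placeholder data:

    import numpy as np

    X = np.random.rand(500, 10)   # placeholder data matrix
    Xc = X - X.mean(axis=0)       # PCA assumes centred data

    # X = P Delta Q^T; the rows of Vt (= Q^T) are the principal axes.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Q = Vt.T

    # Project onto the first k components: the compressed representation.
    k = 3
    X_reduced = Xc @ Q[:, :k]
    print(X_reduced.shape)        # (500, 3)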
Chapter 5 Related work The related work is divided into two subjects: blob detection and machine learning for biomedical image analysis. Each subject is treated separately.
5.1 Blob detection A survey on the usage of blob detection algorithms for biomedical image analysis was conducted in 2016 [72]. The authors gathered and examined 30 relevant papers in which classical blob detection algorithms are utilised, that is, algorithms that do not use any machine learning or artificial intelligence. They found that a majority (20 of 30) of the papers used either the Laplacian of Gaussian (LoG), the Difference of Gaussians (DoG) or the Determinant of Hessian (DoH) method (see Figure 5.1). The authors did not offer an explanation for why these methods are the most popular. Blob detection is not only useful for analysing biomedical images, but also for images in other fields. When fruits are interpreted as blobs, for example, machine vision techniques can be used to count them in trees [40, 73]. Tracking piglets in videos [74] and traffic sign detection for autonomous driving [75] are alternative applications of blob detection. Other types of images may be analysed as well, like infra-red images [76] and ultrasound images for the purpose of detecting breast abnormalities [77].
Figure 5.1: Frequency of blob detection methods used in 30 biomedical image analysis papers. From [72].

Ultimately blob detection can be used for any problem where regions need to be detected that are visually distinct from their surroundings.
5.2 Machine learning for biomedical image analysis With the development of more powerful computing systems, the use of artificial intelligence is becoming ubiquitous. Bio-informatics experts discovered the advantages of computer-assisted image analysis in the 2000s, and a great deal of literature has been written about it already. Before machine learning became popular for computer vision, simpler image processing techniques were used, such as segmentation, thresholding and watershed (see 3.2.2), for example in CellProfiler [10]. CellProfiler allows the user to define a sequence of processing modules for performing analysis on cell images. Another popular software application for image processing is FIJI [13], which is a ”batteries-included” version of the powerful ImageJ 1.x [78] image processing tool. Even though these tools have many features, each step in the analysis requires parameters to be determined by the user. This can be difficult because the results depend highly on these parameters and
it is difficult to ascertain the correct parameters. Also, neither tool can do out-of-core image processing without resorting to custom plugins or scripts. This missing feature makes them unsuitable for analysing images that do not fit in memory. Moreover, 3D is not yet fully supported in CellProfiler, with some crucial functions missing. Gene expression profiling using image processing methods is described in [5, 20, 79]. Transcriptomics, the set of techniques used to study an organism's transcriptome (the sum of all its RNA transcripts), is often used for gene expression profiling. Image analysis methods have been successfully applied to transcriptomics [22, 80]. The authors of [81, 82] provide an extensive explanation of the common steps in an automated bio-image analysis pipeline. Especially in high-throughput experiments, image analysis is used heavily to quantify phenotypes of interest to biologists [16]. Papers such as [16, 83, 84] treat the common case of phenotypic cell profiling specifically. Since mere image processing methods were not always sufficient, machine learning techniques have become more prevalent in biomedical image analysis, often in the context of high-content screening (HCS). By using techniques from image processing, computer vision and machine learning, large amounts of bio-image data can be analysed, which is also frequently called high content analysis (HCA) [82]. Machine learning has also been applied to cell segmentation [85, 86] and nucleus detection [87]. In [88] the authors use a type of neural network, called a convolutional neural network, to detect nuclei. For cell segmentation SVMs are utilised in [89], while deep learning algorithms are compared in [90]. Besides proprietary software, free tools that apply machine learning to image-based cell analysis have been developed, such as CellCognition [8], CellClassifier [7], Advanced Cell Classifier (ACC) [91] and cellXpress [9]. CellCognition is a computational framework for quantitative analysis of high-throughput fluorescence microscopy. It has functions for, among other things, image segmentation, object detection, feature extraction and statistical classification. The sole purpose of CellClassifier is automatic classification of single-cell phenotypes using supervised machine learning; it requires images that have been prepared with CellProfiler. CellCognition and CellClassifier rely on an SVM for phenotypic classification. Advanced Cell Classifier is an improvement over CellClassifier that is more user-friendly, allows for more advanced machine
learning with 16 different classifiers and was made for high-content screens. cellXpress is another fully featured and highly optimised software platform for cellular phenotype profiling. The platform is designed for fast and high-throughput analysis of cellular phenotypes based on microscopy images. Notably, the bio-image analysis tool Ilastik [36] was a major inspiration for this project because its use of active learning, which lets users label a few instances iteratively on-line, proved very effective. It uses a random forest for pixel classification, without indicating whether other algorithms were tried. Therefore, in this project other machine learning candidates will be evaluated as well. The described tools are primarily meant for cell detection and profiling, which is slightly different from blob detection. Besides that, they all work semi-automatically by requiring the user to label a few instances beforehand, while the aim of this project is to do blob detection fully automatically.
Chapter 6 Methodology Machine learning can be applied to blob detection by first classifying pixels as blobs/non-blobs, followed by clustering these pixels into blobs. This approach is also used in this research, as extensively described in section 6.1. Based on this approach, a number of experiments can be devised that help answer the research question. These are discussed in section 6.2. Section 6.3 describes the characteristics of the data, how the data was collected and how it has been labelled. The last section, 6.4, discusses the details of the experiments and how overall reliability and validity will be assured.
6.1 Blob detection process In this project the blob detection process consists of six consecutive steps. The input is a 3D image and the output is a list of blob coordinates. The first step extracts a set of features for each pixel in the image. These features signify intensity, edges and texture. An optional intermediary step compresses the features with PCA. Next, a trained classification model is used to classify the pixels into two classes: blob and non-blob. The resulting binary image is passed into a clustering step that attempts to declump the touching blobs. In the next step the locations and characteristics of all blobs are extracted. Finally, the blobs are filtered based on their characteristics and returned as output. An overview of the process is shown in Figure 6.1.
Figure 6.1: The blob detection process.

Each step will now be discussed more thoroughly.
6.1.1 Feature extraction Besides the single pixel intensities, filters can be applied to the input image to obtain additional features for each pixel. The filters are partly inspired by Ilastik [36]. The intensities of neighbouring pixels are represented by the raw image smoothed with a Gaussian filter. The Laplacian of Gaussian, Difference of Gaussians and Determinant of Hessian (see 3.2.4), and the Gaussian of gradient magnitude are used to detect edges. The texture of regions is distinguished by the eigenvalues of the structure tensor (see 3.2.5) and the eigenvalues of the Hessian of Gaussian [36]. The scale σ of the filter can be specified for each feature. The features can be calculated in 2D by computing them for each z-plane separately; some can also be calculated in 3D by applying a Gaussian filter in the z dimension.
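A hedged sketch of how such filters might be computed with scipy.ndimage on a placeholder z-stack; the 2D variants use a zero sigma along z, and the DoG scale ratio of 1.6 is an illustrative assumption:

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(3, 500, 500)  # placeholder z-stack (z, y, x)
    s = 1.6                            # filter scale sigma

    # 2D features: filter each z-plane independently (sigma 0 along z).
    gaus2d = ndimage.gaussian_filter(img, sigma=(0, s, s))
    log2d = ndimage.gaussian_laplace(img, sigma=(0, s, s))
    ggm2d = ndimage.gaussian_gradient_magnitude(img, sigma=(0, s, s))

    # 3D features: also apply the Gaussian along the z dimension.
    gaus3d = ndimage.gaussian_filter(img, sigma=s)
    dog3d = gaus3d - ndimage.gaussian_filter(img, sigma=1.6 * s)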
6.1.2 Feature compression The idea behind this step is that the number of extracted features may be so high that training and running a classification algorithm takes very long. Also, some features may turn out not to be very relevant. A dimensionality reduction algorithm such as PCA can reduce the number of features without sacrificing too much accuracy. This step takes in the pixel features, transforms them using a fitted PCA model and outputs the resulting pixel features in a lower dimensionality.
6.1.3 Pixel classification A trained classifier model can now be used to classify pixels by their features. The output of the classification is a predicted label for each pixel saying whether it is likely part of a blob or not. The labels for all pixels are then put together again such that we get a binary image with a background of 0's and regions of 1's that denote blobs.
6.1.4 Pixel clustering Since blobs can be clumped together, the aim of this step is to split them up. The method commonly used in image-based cell analysis software is a watershed algorithm (see 3.2.2): the starting markers are the local maxima in the blobs and the ”water” flows until the edges of the blobs. Even though the results of watershed are usually acceptable, other algorithms can be useful for declumping as well. When the x and y coordinates are treated as features of each blob pixel, the pixels can be grouped into clusters by a clustering algorithm with the inverse of the Euclidean distance as similarity measure. Pixels that are close together will form clusters, which correspond to individual blobs. In order to reduce the running time, the clustering is performed for each connected component separately, so that only the pixels in one component are considered at a time. The result of this step is a segmented
image with labelled regions. The background is labelled with 0’s, while each cluster is labelled with a unique id from {1, 2, ...}.
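A simplified 2D sketch of this per-component declumping idea; the thesis works in 3D, and k is fixed to 2 here instead of being derived from the local maxima:

    import numpy as np
    from scipy import ndimage
    from sklearn.cluster import KMeans

    binary = np.zeros((100, 100), dtype=int)  # placeholder classifier output
    binary[10:20, 10:30] = 1                  # say, two touching blobs

    components, n = ndimage.label(binary)     # connected components
    segmented = np.zeros_like(components)
    next_id = 1
    for comp in range(1, n + 1):
        # Cluster only the pixel coordinates of this component.
        coords = np.column_stack(np.nonzero(components == comp))
        labels = KMeans(n_clusters=2).fit_predict(coords)
        for j in range(2):
            ys, xs = coords[labels == j].T
            segmented[ys, xs] = next_id       # unique id per cluster
            next_id += 1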
6.1.5 Blob extraction In this step the segmented image is processed and all the clusters (i.e. blobs) with their characteristics are extracted. For each blob the centroid is calculated as the mean of the x, y and z coordinates of its pixels. The radius is the maximum distance from any of the pixels to the centroid. The output of this step is a list of blobs with their respective characteristics.
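A minimal sketch of the centroid and radius computation on a labelled 3D image, assuming label 0 is the background:

    import numpy as np

    def extract_blobs(segmented):
        blobs = []
        for blob_id in np.unique(segmented)[1:]:   # skip background 0
            coords = np.column_stack(np.nonzero(segmented == blob_id))
            centroid = coords.mean(axis=0)         # mean z, y, x
            radius = np.linalg.norm(coords - centroid, axis=1).max()
            blobs.append({"id": int(blob_id),
                          "centroid": centroid,
                          "radius": float(radius)})
        return blobs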
6.1.6 Blob filtration Using some prior knowledge about the size of blobs, this step filters out the blobs that are either too small or too large. This is needed because noise may be mistaken for blobs in the previous steps. Finally, this step outputs the filtered blobs from the original input image.
6.1.7 Chunking Since an image may not fit in memory completely, it has to be processed out-of-core. The solution in this thesis is to apply the analysis to separate parts of the image by chunking. A chunk is a rectangular cuboid whose depth is the same as the image depth but whose width and height are usually much smaller (typically 500 × 500 pixels). In addition to the image data within its boundaries, every chunk also contains the data of a 10 pixel thick border around the chunk, called the overlap. This is needed for the convolutional filters that otherwise need to guess the values outside of the chunk boundaries. Also, some blobs may lie across chunk boundaries and would otherwise be split in two. The value of 10 pixels was chosen because blobs are very unlikely to be bigger than 20 pixels in diameter, which means that blobs whose centroid lies within a chunk are fully encompassed by the overlap. All the steps are performed on each chunk in sequence. In the end the blobs of each chunk are collected and returned as the final list of blobs.
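A hedged sketch of a chunk iterator with the described overlap; the real implementation details, such as memory-mapped reading and the collection of blobs, are omitted:

    import numpy as np

    def iter_chunks(image, size=500, overlap=10):
        """Yield full-depth cuboids plus a border of `overlap` pixels."""
        depth, height, width = image.shape
        for y in range(0, height, size):
            for x in range(0, width, size):
                y0, x0 = max(y - overlap, 0), max(x - overlap, 0)
                y1 = min(y + size + overlap, height)
                x1 = min(x + size + overlap, width)
                # (y, x) is the chunk origin; the slice adds the overlap.
                yield (y, x), image[:, y0:y1, x0:x1]

    img = np.zeros((3, 1200, 1700), dtype=np.uint8)
    for (y, x), chunk in iter_chunks(img):
        pass  # run the six detection steps on each chunk here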
6.2 Experiments The research question of this thesis is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? The research comprises six experiments. The first four experiments evaluate how machine learning techniques can be applied to the first four steps of the blob detection process: feature extraction, feature compression, pixel classification and pixel clustering. There are no experiments for the last two steps, blob extraction and blob filtering, because there is not enough data to justify using a machine learning approach over a simple heuristic method. The fifth experiment assesses the feasibility of the blob detection process, using the optimal machine learning techniques found, by running it over a whole image. In the sixth experiment the blob detector of this thesis is compared to the state-of-the-art tools that are commonly used for blob detection in biomedical images.
6.2.1 A: Feature extraction Even though an unlimited number of features can be extracted from an image, only a limited set may be useful for a specific application. The features that may be beneficial for differentiating blob pixels from non-blob pixels are shown in Table 6.1. The single pixel value and the Gaussian filter are the most basic features and respond to light/dark regions in general. The motivation for choosing the Laplacian of Gaussian, Difference of Gaussians and Determinant of Hessian features is their popularity in blob detection (see Figure 5.1), which stems from their high response to local extrema (see 3.2.4). The Gaussian of gradient magnitude is suitable for detecting edges in images: after applying a Gaussian filter, it employs a gradient magnitude filter that reveals the gradients of the pixel intensities. The eigenvalues of the structure tensor and the eigenvalues of the Hessian of Gaussian are both also used in Ilastik [36] to reveal texture.
Feature                              Code          2D/3D    Scales σ
Value                                value         N/A      N/A
Gaussian filter                      gaus          2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Laplacian of Gaussian                log           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Gaussian of gradient magnitude       ggm           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Difference of Gaussians              dog           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Determinant of Hessian               doh           2D       0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of structure tensor      stex, stey    2D       0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of Hessian of Gaussian   hogex, hogey  2D       0.7, 1.0, 1.6, 2.5, 4.0

Table 6.1: All the pixel features that are tested. The value filter represents the single pixel intensities. The Determinant of Hessian, eigenvalues of structure tensor and eigenvalues of Hessian of Gaussian are not implemented in 3D. Because the eigenvalues consist of both an x and y component, these features consist of two attributes.

A1: Feature selection To optimise feature extraction, a subset of features from Table 6.1 must be found that is both relevant for classifying a pixel as blob/non-blob and non-redundant. The feature selection process proposed by José Bins and Bruce A. Draper does exactly this [92]. They suggest a three-step approach for selecting a small number of important features from the huge set of features that is often available in the computer vision domain. The feature selection method in this thesis is largely inspired by their work. The first step is filtering out irrelevant features by their Relief score [93] with respect to their predictive power for the label. Even though Relief has been shown to detect relevant features well, in practice it is very time-consuming to compute for large datasets. Therefore, in this thesis the choice was made to use mutual information (MI) instead, which has seen use in feature selection as well [94]. This metric, also sometimes called information gain, is utilised to calculate the gain in information (defined as the decrease in entropy) when instances are split by some condition. This makes it a suitable metric because the features are evaluated for their significance to classification, which essentially is splitting instances by their features. Moreover, the benefit of mutual information
is that it does not make assumptions about the data, unlike other methods such as the Chi-square test, which assumes categorical variables [95], and the Pearson correlation coefficient, which only considers linear correlations [96]. The filtering step ends by removing the features whose mutual information score does not reach some minimum value. Additionally, because some features take longer to calculate than others, the duration of the calculation is also taken into account. Since mutual information does not detect redundancy, it is possible that some of the most relevant features are very similar to each other. Therefore, the second step aims to eliminate redundancy by keeping only the most relevant feature from each group of similar features. Similar features are found by applying k-means. This use of k-means is unusual: most of the time instances are clustered by their features, but here features are clustered by their values for each instance. In the paper the authors suggest a third step that uses the Sequential Floating Forward Selection (SFFS) algorithm [97] to create an optimal subset of features, but this is not needed in our case because the number of features is already sufficiently low after the second step. The field of feature selection in machine learning is very broad and many algorithms have been proposed that aim to find the optimal set of features. However, in order to limit the scope of this thesis, other feature selection methods are not considered. The reason for choosing the above method is that it has been shown to be more effective than standard feature selection algorithms for large datasets with many irrelevant and redundant features [92]. It seeks the most relevant features with the lowest amount of redundancy, which are the desired properties for a feature set. A2: 2D versus 3D comparison In addition, it is interesting to see whether features are more relevant when calculated in 3D than in 2D. This can be done by comparing their mutual information with respect to the label using a two-sided t-test (from scipy.stats.ttest_rel) for the null hypothesis that the mean of the mutual information for the 2D feature is equal to the mean of the mutual information for the 3D
feature, µ(MI(f2D)) = µ(MI(f3D)). For the features where the hypothesis is rejected, it is determined whether the mutual information is higher for the 2D feature or for the 3D feature. Reliability and validity To reduce the effect of chance, 100 random samples of about 26,400 pixels, with their corresponding labels, are selected from the training image. In total about 1% of the pixels in the blob regions are sampled. In each sample about 144 (0.55%) of the pixels are labelled as blob, the rest as non-blob. Then for each sample the desired features are calculated, followed by a mutual information calculation with respect to the label.
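A sketch of this comparison on placeholder data; mutual_info_classif is one possible MI estimator and the feature values are random here, but scipy.stats.ttest_rel is the paired test named above:

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.feature_selection import mutual_info_classif

    mi_2d, mi_3d = [], []
    for seed in range(100):                    # 100 random pixel samples
        rng = np.random.RandomState(seed)
        y = rng.randint(0, 2, 500)             # placeholder labels
        f2d = rng.rand(500, 1)                 # placeholder 2D feature
        f3d = rng.rand(500, 1)                 # placeholder 3D feature
        mi_2d.append(mutual_info_classif(f2d, y)[0])
        mi_3d.append(mutual_info_classif(f3d, y)[0])

    # Paired two-sided t-test for H0: mean MI(2D) == mean MI(3D).
    t, p = ttest_rel(mi_2d, mi_3d)
    print(p < 0.001)                           # alpha fixed in 6.4.4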
6.2.2 B: Feature compression For the sake of reducing classification time, it must be investigated how much the pixel features can be compressed with PCA whilst keeping enough information for accurate classification. This is measured by the performance of the pixel classification after the features have been transformed using a fitted PCA model. The features are compressed to {1, 2, ..., m} components, where m is the number of extracted features.
6.2.3 C: Pixel classification For pixel classification there are a number of suitable algorithms, ranging from simple to more complex. Simpler algorithms have fewer parameters because they make more assumptions about the data. This can make them more useful than complex algorithms when little data is available. Algorithms with more hyper-parameters require more data and effort to train, but can become more accurate because they can detect less obvious patterns in the data. As for candidates, naive Bayes can be used as a baseline classifier to compare the other algorithms against. Logistic regression is a known good performer on binary classification problems. k-Nearest neighbour can work well on some problems that are not too complex. It
is simple to understand and implement, but its major drawback is the long running time for large datasets. Decision trees can be effective too, because they can potentially emulate any decision boundary. However, since they are prone to over-fitting, a random forest of multiple bagged decision trees may be a better alternative. AdaBoost is another ensemble method that, like a random forest, can mitigate the shortcomings of a decision tree, by focusing on the harder-to-classify instances. As a classifier that can achieve a high accuracy even with little data, the support vector machine is a popular additional contestant. Finally, a simple feed-forward neural network is chosen as the last candidate because, due to its high number of parameters, it can potentially achieve a very high accuracy when given enough data. The metric used for measuring classification quality is the f1-score (see 4.1.9). The accuracy metric is not as useful because the blob/non-blob pixels are highly imbalanced, with at most 1% of the pixels being blob pixels. The speed is measured as the prediction time for classifying a set of pixels. The training time of the classifiers is not taken into consideration because all training happens off-line. C1: Hyper-parameter tuning First, the optimal hyper-parameters are sought for each classifier such that they achieve the highest f1-score on the best selected features. Table 6.2 shows the classifiers that will be tested and their hyper-parameter search spaces. The same optimised decision tree is used as base estimator in the random forest and in AdaBoost. Naive Bayes does not have any hyper-parameters that can be optimised. For logistic regression, support vector machine, decision tree and k-nearest neighbour, grid search [56] is used to find the optimal combination of hyper-parameters. For the neural network, a random search [56] over 100 random combinations is performed to approach the best hyper-parameters, followed by a grid search to attain the local maxima. With the exception of the neural network classifier, which is implemented with Keras [98] using a TensorFlow [99] back-end, the classifiers use the implementations in scikit-learn [100].
Table 6.2: Classifiers tested for pixel classification and their search space of hyper-parameters. The remaining hyper-parameters get values according to the default values in scikit-learn 0.19.1, or Keras 2.1.5 for the neural network.

Classifier               Search space of hyper-parameters
Naive Bayes              None
Logistic regression      penalty(1) ∈ {'l1', 'l2'}, C(2) ∈ {0.5, 1.0, 1.5, 2.0, 2.5}
k-Nearest neighbour      n_neighbors(3) ∈ {1, 3, 5, 10}, weights(4) ∈ {'uniform', 'distance'}, p(5) ∈ {1, 2}
Decision tree            criterion(6) ∈ {'gini', 'entropy'}, splitter(7) ∈ {'best', 'random'}, max_depth(8) ∈ {3, 4, ..., 12}, max_features(9) ∈ {1, 2, ..., 10}
Random forest            None - uses 50 optimised decision trees
AdaBoost                 None - uses 50 optimised decision trees
Support vector machine   C(2) ∈ {0.5, 1.0, 1.5, 2.0, 2.5}, kernel(10) ∈ {'linear', 'poly', 'rbf', 'sigmoid'}, gamma(11) ∈ {0.1, 0.4, 0.7, 1.0, 1.3}
Neural network           n_neurons1(12) ∈ {1, 5, 10, 15, 20, 25, 30}, n_neurons2(13) ∈ {1, 5, 10, 15, 20, 25, 30}, dropout(14) ∈ {0.0, 0.1, 0.2, 0.3, 0.5}, lr(15) ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01}, decay(16) ∈ {0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1}

(1) 'l1' or 'l2' regularisation
(2) regularisation term - smaller means stronger regularisation
(3) number of neighbours to be considered when classifying a new point
(4) 'uniform' means that all neighbours are weighted equally, 'distance' means that the weight for each neighbour is the inverse of its distance to the point
(5) p=1 is the Manhattan distance and p=2 is the Euclidean distance [49]
(6) criterion for determining the best split - 'gini' is the Gini impurity, 'entropy' is the information gain
(7) strategy for choosing the split at each node - 'best' chooses the best split and 'random' chooses a random split
(8) maximum depth of the tree
(9) number of features to consider when choosing the best split
(10) kernel used in the algorithm - 'poly' is a polynomial kernel of degree 3 and 'rbf' stands for radial basis function
(11) coefficient used in the 'rbf', 'poly' and 'sigmoid' kernels
(12) number of neurons in the first hidden layer
(13) number of neurons in the second hidden layer
(14) fraction of input units to drop
(15) learning rate
(16) learning rate decay over each update
C2: Pixel classifier comparison on different PCA compressions To test the optimised classifiers, they are first trained on the selected best features, compressed with PCA to different numbers k = m, m−1, m−2, ..., 1 of components. They are then used to predict the labels of a different sample of pixels. Next, the classifiers are compared to each other in terms of f1-score and prediction time. Design of the neural network Compared to other classification algorithms, a neural network allows for many more hyper-parameters. In order to limit the options, the following parameters are fixed. The input layer has a number of neurons equal to the number of best features found in A1. The number of hidden layers is 2, with the reasoning that more layers are better for learning complex functions, but the problem is not complex enough to justify a deeper neural network. The output layer has only a single neuron because the neural network is expected to give a binary output. The layers are all dense, meaning that between every pair of layers all the neurons are connected. An exception is the optional drop-out between the first and second hidden layer, where some neurons may not be connected to the next layer, depending on the dropout parameter. The benefit of drop-out is that it prevents over-fitting: since the network cannot focus on only a few features, it has to find general patterns. The activation functions for the neurons in the hidden layers are ReLUs (see Figure 4.4), which are easy to train because of their linear behaviour, making them in general an excellent choice [56]. The activation function in the output layer is a sigmoid (see Figure 4.2), as is common for binary classification problems with gradient-based learning [56]. In terms of optimisation algorithm there is no consensus on which is best [101]. However, since the Adam optimiser [58] has been shown to be robust and is used frequently in the literature [56], it is our choice as well. With the exception of the learning rate and learning rate decay, all the hyper-parameters of the Adam optimiser have the default values provided in the original paper. The number of training epochs is 150 and the batch size is 10000.
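A sketch of a model matching these design choices in Keras (the library used in this thesis); the layer sizes, drop-out rate and learning-rate values are placeholders drawn from the search space in Table 6.2, not the tuned values:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import Adam

    n_features = 10  # number of best features from A1

    model = Sequential([
        Dense(20, activation="relu", input_dim=n_features),  # hidden layer 1
        Dropout(0.1),                                        # optional drop-out
        Dense(10, activation="relu"),                        # hidden layer 2
        Dense(1, activation="sigmoid"),                      # binary output
    ])
    model.compile(optimizer=Adam(lr=0.001, decay=0.0001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=150, batch_size=10000)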
Reliability and validity For both hyper-parameter optimisation and comparing the classifiers, cross-validation is used to test how well the classification algorithm predicts the labels of an unseen dataset. First, 1% of the pixels in the blob regions of the training image are randomly sampled, together with their labels. For these pixels the best features, as determined by A1, are calculated. Stratified 10-fold cross-validation is then applied: the classifiers are trained on 9 folds and their f1-score is calculated for predicting the labels in the remaining fold. Each of the 10 folds is the test fold exactly once. Stratification makes sure that the ratio of blob pixels is equal in each fold. This is important because the blob ratio must be similar to what is expected in a whole image. The mean of the f1-scores over all 10 folds is taken as the overall f1-score for each classifier.
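A minimal sketch of this evaluation loop with scikit-learn; the data is a random placeholder with an exaggerated blob ratio (5% instead of roughly 0.5%) so that every fold contains positive pixels:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import f1_score
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.rand(1000, 10)                    # placeholder features
    y = (np.random.rand(1000) < 0.05).astype(int)   # placeholder labels

    clf = DecisionTreeClassifier()
    folds = []
    # Stratification keeps the blob ratio equal in every fold.
    for train, test in StratifiedKFold(n_splits=10).split(X, y):
        clf.fit(X[train], y[train])
        folds.append(f1_score(y[test], clf.predict(X[test])))
    print(np.mean(folds))                           # overall f1-score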
6.2.4 D: Pixel clustering For the pixel clustering step, the goal is to find the most suitable algorithm on the basis of its clustering quality and running time. The metric used for measuring clustering quality is the silhouette score (see 4.2.6). In order to compare it with the other clustering algorithms, the watershed algorithm is treated as a clustering algorithm in this experiment. Not every clustering algorithm is suitable for pixel clustering, because some have different definitions of clusters or do not scale well, for instance. The candidates have therefore been chosen for the following reasons. K-means is a popular clustering algorithm with good reason: it is simple and fast. Besides that, it looks for round clusters of similar size, which is beneficial in the case of blobs. Agglomerative clustering algorithms make few assumptions about the data, making them a suitable general-purpose candidate. The MeanShift algorithm was created for finding high-density clusters, which is exactly what blobs are. Spectral clustering is known to be slower than the others but can find clusters of high quality. The clustering algorithms and their hyper-parameters are shown in Table 6.3. Their implementations in scikit-learn [100] are used.
Table 6.3: Algorithms and their parameters used for pixel clustering. The values of the remaining parameters are the default values in scikit-learn 0.19.1.

Clustering algorithm       Parameters
Watershed                  None
K-means                    n_clusters = number of local maxima(1)
Agglomerative (centroid)   t(2) = 5
Agglomerative (ward)       n_clusters = number of local maxima
MeanShift                  bandwidth(3) = 4
Spectral                   n_clusters = number of local maxima

(1) starting centroids are initialised as the locations of the local maxima
(2) maximum distance between clusters - two clusters are merged when the distance between their centroids is smaller than t
(3) coefficient used in the RBF kernel
Reliability and validity To reduce the effect of chance, the blob region of the test image is split into chunks of 500 × 500 × depth and 1% of the chunks are randomly sampled. All the pixels in each chunk are first classified using the best classifier, as determined by C2, with the best hyper-parameters, as determined by C1. The pixels in each chunk are then clustered using every clustering algorithm in Table 6.3. For each clustering result the silhouette score and other blob statistics are calculated. The clustering algorithms are finally evaluated by the mean silhouette score over the sampled chunks.
6.2.5 E: Run on whole image The purpose of this experiment is to determine whether the overall machine learning approach taken in this thesis is suitable for detecting blobs in whole images in practice. The set of optimal features from A1, the optimal PCA compression from B, the optimal classifier from C and the optimal clustering algorithm from D are used in the process. The blob detector with these properties is run over one complete image. Both the blob detection results and the running time are analysed.
6.2.6 F: Comparison with state-of-the-art As a final step in the validation of the blob detector proposed in this thesis, it is compared to the current state-of-the-art tools for blob detection. It is not fair to compare the blobs found by the different tools. Firstly, because all the tools require different parameters, it is impossible to guarantee that the used parameters are optimal. Secondly, there is no ground truth available on the number, locations and sizes of the blobs to check the results against. Therefore the tools are only compared based on their running time. The blob detection approach of this thesis, called the MFB detector, is compared to FIJI [13], CellProfiler [10] and Ilastik [36]. For the configuration of each tool, the reader should consult Appendix A. Each tool is run over 10 random crops of size 500 × 500 × 16 pixels of the test image. The mean of those times is used for comparing the performance of the four tools.
6.2.7 Summary For a summary of the solutions that will be tested in the experiments, one can consult Table 6.4.
6.3 Data collection 6.3.1 Characteristics The data consists of high-resolution 3D microscopy images made with a confocal microscope. The images are saved in TIFF files where the depth layers are stored as a z-series. The description of the TIFF files contains metadata in the OME-TIFF XML format [102]. The characteristics of the images are summarised in Table 6.5. Experiments A: Feature extraction, B: Feature compression and C: Pixel classification require images with labelled pixels. For these steps the same image, with the characteristics in Table 6.6, is used for both training and evaluation. For experiment D: Pixel clustering there is no ground-truth data for the clusters.
Table 6.4: Summary of solutions that will be tested in each experiment.

A - Image features: raw pixel values; Gaussian filter; Laplacian of Gaussian; Gaussian of gradient magnitude; Difference of Gaussians; Determinant of Hessian; eigenvalues of structure tensor; eigenvalues of Hessian of Gaussian
B - Feature compression: PCA transform to m down to 1 components
C - Classification algorithms: naive Bayes; logistic regression; k-nearest neighbour; decision tree; random forest; AdaBoost; support vector machine; neural network
D - Clustering algorithms: watershed; K-means; agglomerative (centroid); agglomerative (ward); MeanShift; spectral
E - Run on whole image: N/A
F - Blob detection tools: MFB detector; FIJI; CellProfiler; Ilastik
Characteristic     Values
Width/height       Any size, but typically around 30000 × 30000
Number of layers   Any number ≥ 1, but typically 3-12
Pixel size x/y     Any value, but typically 0.27 µm
Pixel size z       At least 0.7 µm, but typically 1 µm
Storage size       Typically 0.5-10 GB
Data format        Uncompressed OME-TIFF file with description in the OpenMicroscopy OME-XML format [102]
Color depth        8 bits, but usually less than 256 unique values

Table 6.5: General characteristics of the biomedical images considered in this research.
Since there is now more freedom to pick any image, a different image is used than in the previous experiments. The characteristics of this larger image can be found in Table 6.7. The same image is used for experiments E: Run on whole image and F: Comparison with state-of-the-art.

Characteristic     Values
Width/height       2310 × 115000
Number of layers   3
Pixel size x/y     0.217 µm, 0.219 µm
Pixel size z       1.0 µm
Storage size       762 MB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 75 unique values
Blob coverage      0.528% blob pixels

Table 6.6: Characteristics of the training image. This image has been labelled with the procedure described in 6.3.2.

Characteristic     Values
Width/height       24097 × 14445
Number of layers   16
Pixel size x/y     0.273 µm, 0.272 µm
Pixel size z       1.0 µm
Storage size       5.19 GB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 210 unique values
Capture time       18548 s (5:09 hours)

Table 6.7: Characteristics of the test image. This image is unlabelled.
6.3.2 Labelling As supervised machine learning algorithms, the pixel classifiers must be trained with image data in which the pixels have been labelled as blob or non-blob. Two methods have been considered for generating these labels: humans can label the pixels, or a program can label them using prior
knowledge that is not present during classification. Human labelling is tedious and error-prone, while machines are much faster but can make errors too without noticing. To reduce the drawbacks of these methods, it was decided to combine them. There are two versions of the training image: one with blobs and one in which the fluorescent dyes have been stripped such that no blobs are visible. A computer program can subtract the image without blobs from the image with blobs to reduce the background. The background is not completely removed because the images differ slightly due to noise and misalignment. In the resulting image the blobs are more distinct from their surroundings than in the original image (see also Figure 6.2). Then, to accentuate the local maxima in the images, the Difference of Gaussians (DoG) of the image is calculated. A scale of σ = 1.5 was shown by trial and error to give the best results. Next, a sample of the pixels (0.1%) from the DoG image is randomly chosen and used to fit a K-means clustering algorithm with 2 clusters. Standardisation (see Equation 6.1) is used to improve the speed and accuracy of the convergence. Since pixels that belong to a blob receive a much higher value after a DoG transformation, the clustering algorithm will group the blob pixels in one cluster and the non-blob pixels in another. K-means is chosen because it is general-purpose and reasonably efficient for a clustering algorithm. Its drawback, however, is that it can get stuck in a local minimum because of an unfortunate choice of initial centroids. Therefore the algorithm is re-initialised 50 times with different centroid seeds. The centroids of the best of the 50 runs in terms of inertia, which is defined as the sum of squared distances of samples to their closest cluster centre [103], are used to cluster the remaining pixels. The results of the machine-labelled image are visually checked by a human to make sure that the blobs are correctly labelled. The labelling method utilises two pieces of prior knowledge that are not available during run-time of the classifier. Firstly, the classifier does not have access to an image from which the blobs have been stripped, because it is too costly to strip the fluorescent dyes and then recapture each image. Secondly, the results will not be checked by a human in practice.
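A condensed sketch of this procedure on placeholder images; the DoG scale ratio of 1.6 is an assumption, but the subtraction, standardisation and 50 K-means re-initialisations follow the description above:

    import numpy as np
    from scipy import ndimage
    from sklearn.cluster import KMeans

    a = np.random.rand(3, 500, 500)  # placeholder image with blobs
    b = np.random.rand(3, 500, 500)  # same image, dyes stripped

    # Background subtraction as in Figure 6.2.
    c = a - 2 * ndimage.gaussian_filter(b, sigma=1)

    # Difference of Gaussians at sigma = 1.5 to accentuate local maxima.
    dog = (ndimage.gaussian_filter(c, 1.5)
           - ndimage.gaussian_filter(c, 1.5 * 1.6))

    # Standardise (Equation 6.1) and cluster a 0.1% sample into 2 clusters.
    flat = dog.reshape(-1, 1)
    flat = (flat - flat.mean()) / flat.std()
    rng = np.random.RandomState(0)
    sample = flat[rng.choice(len(flat), len(flat) // 1000, replace=False)]
    km = KMeans(n_clusters=2, n_init=50).fit(sample)  # 50 re-initialisations
    labels = km.predict(flat).reshape(dog.shape)      # label every pixel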
Figure 6.2: (a) Blobs visible, (b) blobs stripped, (c) difference. Image c is produced as c = a − 2 · gaussian_filter(b, σ = 1). The values were chosen empirically.

X′ = (X − µ) / σ    (6.1)

Specification      Value
Operating system   Windows 10 Home (64-bit)
Processor          Intel(R) Core(TM) i7-3630QM CPU @ 2.40 GHz
Memory size        8.00 GB

Table 6.8: Specifications of the test system.
6.4 Experimental design 6.4.1 Test system All the experiments are performed on the same computer with the specifications in Table 6.8.
Package name              Version   Purpose
javabridge [104]          1.0.15    Dependency of python-bioformats
Keras [98]                2.1.5     User-friendly API for neural networks
Matplotlib [35]           2.2.2     Visualisation of analysis results
NumPy [105]               1.14.2    nD arrays, memory maps and helper functions
NetworkX [106]            2.1       Graph colouring
pandas [107]              0.22.0    Data manipulation and analysis
python-bioformats [108]   1.3.2     Reading OME-TIFF image files
PyQt5 [109]               5.10.1    Matplotlib backend
scikit-image [110]        0.12.3    Image processing
scikit-learn [100]        0.19.1    Machine learning
SciPy [111]               1.0.0     Statistics and other helper functions
TensorFlow [99]           1.7.0     Neural network framework
tifffile [112]            0.14.0    Reading OME-TIFF image files
PyYAML [113]              3.12      Reading and writing YAML files

Table 6.9: Python software packages used in this project.
6.4.2 Software All the software is written in Python 3.5 from the Anaconda 4.2.0 distribution (see https://anaconda.org/). The Python packages used are listed in Table 6.9.
6.4.3 Data analysis During the experiments the relevant data is stored in pandas DataFrames, which are 2D tabular data structures. After the experiments have finished, the data is analysed using pandas and statistical functions from SciPy. The results are either displayed as text or plotted with Matplotlib.
6.4.4 Overall reliability and validity To guarantee statistical significance of the experiments, the significance level is fixed at α = 0.001 for each statistical test. For the purpose of
repeatability, the random seed is set to 0 before the creation of every sample. The validity of the optimised blob detector is evaluated by running it over a whole image. To check that the performance is reasonable, it is compared with the performance of state-of-the-art tools.
Chapter 7 Analysis In this chapter the results are listed for each experiment. Furthermore, interesting observations from the results are described in the text.
7.1 Results from A: Feature extraction This section lists the results collected in the tests on feature extraction. One should refer back to Table 6.1 for the features that correspond to the feature abbreviations gaus, log, ggm, etc.
A1: Selected best features In the first step of the feature selection process those features are filtered out that are either irrelevant or take too long to calculate. Figure 7.1 shows the mutual information and calculation time per feature and scale. The figure shows that especially the Hessian of Gaussian eigenvalues do not provide much information at large scales while at the same time being expensive to compute. Based on the information in the figure, the features whose mutual information is smaller than 0.010 or whose calculation time is larger than 0.4 s are filtered out. In the second step the features are clustered so that redundant features can be removed. The results of the clustering are visible in Figure 7.2.
Figure 7.1: Mutual information of the features with respect to the blob/non-blob label, for different scales, on the left axis. On the right axis, the average time to calculate each feature for a 500 × 500 × 3 pixel image chunk.
Figure 7.2: Results of clustering the features with K-means and k=10. The colours denote the cluster assignment. t-SNE [114] was used to convert the points to a 2D space and spread them out.

It is interesting to see that some clusters contain only one feature (e.g. value and 2d_log_4.0) while other clusters are much larger. For each cluster, only the feature with the highest mutual information score is kept. Table 7.1 shows the final selection of the 10 best features as determined by their mutual information. The value of 10 was chosen because it resulted in the highest silhouette score (see 4.2.6).

Name           Mutual info   Time (s)
3d_dog_1.6     0.030         0.085
3d_gaus_0.7    0.026         0.039
2d_stex_1.0    0.024         0.167
3d_log_1.6     0.022         0.074
2d_log_1.6     0.022         0.060
value          0.022         0.012
3d_log_2.5     0.020         0.075
2d_log_2.5     0.018         0.062
2d_log_4.0     0.013         0.073
2d_hogey_0.7   0.011         0.209

Table 7.1: The 10 selected best features with their mutual information, as calculated using the two-step feature selection process discussed in 6.2.1. The shown time is the average time to calculate the feature for a 500 × 500 × 3 pixel image chunk.

A2: Comparison 2D vs 3D Table 7.2 shows that for features with small scales it does not matter whether they are calculated in two or three dimensions. For features with larger scales there is a difference between 2D and 3D,
Table 7.1: The 10 selected best features with their mutual information as calculated using the two-step feature selection process discussed in 6.2.1. The shown time is the average time to calculate the feature for a 500 × 500 × 3 pixels image chunk. but no consistent pattern is visible. That for the Difference of Gaussians (dog) with σ = 1.6 the third dimension provides more information than 2D on the blob/non-blob label was to be expected since during the generation of the training data the 3D version of the DoG was used to predict the label.
7.2 Results from B: Feature compression and C: Pixel classification During the tests, the support vector machine classifier proved too hard to train because its training time scales more than quadratically with the number of samples. Therefore the SVM was trained with a smaller sample of 0.01% of the pixels. But since a large sample is crucial due to the significant class imbalance, the SVM performed badly.
        Scale σ
        0.7         1.0         1.6         2.5         4.0
gaus    = [0.725]   = [0.617]   2d [0.000]  2d [0.000]  ×
log     = [0.938]   = [0.704]   3d [0.000]  3d [0.000]  3d [0.000]
ggm     = [0.795]   = [0.596]   = [0.696]   2d [0.000]  2d [0.000]
dog     = [0.574]   3d [0.000]  3d [0.000]  2d [0.000]  2d [0.000]

Table 7.2: Results of the t-test comparing the means of the mutual information with respect to the label between the 2D and 3D versions of the features. '×' means that one of the samples was not normally distributed, making the t-test invalid. '=' means that the hypothesis can be accepted and there is likely no difference in predictive power between 2D and 3D. '2d' or '3d' means that the hypothesis must be rejected and the respective dimension has more predictive power than the other. The number between brackets is the p-value.
C1: Hyper-parameter optimisation In Table 7.3 one can observe the values of the optimised hyper-parameters for the classifiers used in pixel classification.
C2: Comparison of classifiers with respect to PCA compression Each optimised pixel classifier was run over the same pixel sample, compressed with PCA to varying numbers of components. Figure 7.3 shows that for most classification algorithms more components (i.e. features) lead to a higher f1-score, as expected. The exceptions are the decision tree, random forest and naive Bayes. The disappointing performance of naive Bayes does not come as a surprise, since it is known that its independence assumption can be too limiting for complex problems. What is surprising is that the decision tree and random forest actually profit from stronger PCA compression (i.e. fewer components). This suggests that they are under-fitted when trained on PCA-compressed features. Also AdaBoost, which uses a decision tree as base estimator, loses out compared to k-nearest neighbour, the neural network and logistic regression. Without PCA compression all
the classifiers (with the exception of naive Bayes and the SVM) perform similarly, with an f1-score around 0.89, the neural network being the best performer. The interesting observation that can be made from Figure 7.4 is that some classification algorithms (k-nearest neighbour, support vector machine, naive Bayes) profit from PCA compression in terms of prediction time, while others are slowed down (random forest, neural network, logistic regression, decision tree); for AdaBoost there does not seem to be a difference. This means that the decision whether to apply PCA compression should depend on the chosen classifier. The same figure also shows that k-nearest neighbour scales very badly with the number of components. This can be explained by the added complexity of calculating the distance between points in a higher dimension. Its huge prediction time makes it unsuitable for our problem. In Figure 7.5 one can evaluate the classifiers on both their f1-score and prediction time. The most accurate and fastest classifiers are found in the top left corner: the neural network and decision tree on uncompressed data, and logistic regression on both compressed and uncompressed data. The neural network is the most accurate classifier but is slower to train and run than a decision tree or logistic regression.

Classifier               Best hyper-parameters                                              f1-score
Naive Bayes              N/A                                                                0.159
Logistic regression      penalty='l1', C=2.5                                                0.831
k-Nearest neighbour      n_neighbours=10, weights='distance', p=1                           0.803
Decision tree            criterion='entropy', max_depth=4, max_features=10                  0.894
Random forest            Same as decision tree                                              0.848
AdaBoost                 Same as decision tree                                              0.897
Support vector machine   C=7.5, kernel='rbf', gamma=1.3                                     0.812
Neural network           n_neurons1=26, n_neurons2=10, dropout=0.1, lr=0.08, decay=0.0001   0.903

Table 7.3: Found optimal hyper-parameters of the pixel classification algorithms and their f1-scores. The values of the missing hyper-parameters are the default values in scikit-learn 0.19.1, or in Keras 2.1.5 for the neural network.
Figure 7.3: f1-score of classification algorithms on varying numbers of components for PCA-compression. The dashed line indicates the f1-score of the classifiers on non-compressed data. The SVM classifier is not included due to its lacklustre performance.
Figure 7.4: Prediction time of classification algorithms on varying numbers of components for PCA-compression. The dashed line indicates the prediction time of the classifiers on non-compressed data. Note the logarithmic scale of the time axis.
It is surprising that a decision tree classifier performs better than a random forest classifier. The output of a random forest is basically the mode of the outputs of multiple bagged decision trees. An explanation could be that there is only a single feature, or a few features, on which the decision tree relies for its classification. Since these essential features are ignored in some of the bagged decision trees, the overall accuracy of the random forest is less than that of a single decision tree. The finding that a random forest performs worse than a single decision tree for pixel classification of blobs is bad news for Ilastik, which uses a random forest.
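One way to probe this explanation is to compare how concentrated the feature importances are. The sketch below, on synthetic data with only two informative features, shows a single tree focusing its importance on few features while a forest with random feature sub-setting spreads it out; the data and parameters are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Only 2 of the 10 features carry information, mimicking the situation
# where a tree relies on a few essential features.
X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=2, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
forest = RandomForestClassifier(max_features=3, random_state=0).fit(X, y)
print("tree importances:  ", tree.feature_importances_.round(2))
print("forest importances:", forest.feature_importances_.round(2))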
7.3 Results from D: Pixel clustering

The clustering algorithms were run on a subset of the chunks of the test image after the pixels had been classified. Since the decision tree without PCA compression was shown to be both accurate and fast, it was used for the pixel classification. In Figure 7.6 one can see how well the clustering algorithms perform in terms of silhouette score and running time. Interestingly, agglomerative clustering and k-means perform similarly: both are fast and reach a reasonable silhouette score. MeanShift and spectral clustering are definitely too slow for our purpose. Moreover, Table 7.4 shows that MeanShift creates fewer but larger clusters, and its high standard deviation shows that it is highly unreliable in terms of clustering time. Watershed is not suitable because of its low silhouette score, which is likely the result of it creating irregularly sized clusters. Figure 7.7 provides additional evidence for this claim by showing the output of the different pixel clustering algorithms on the same connected components. Indeed, in that example there is more variation in the size of the clusters produced by watershed. Also, subjectively speaking, agglomerative clustering (centroid) has created the best shaped clusters in the figure. Based on these results, agglomerative clustering with inter-centroid distance as pairing metric seems to be the most appropriate clustering algorithm.
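A minimal sketch of this declumping step on a hypothetical connected component of two touching blobs: the 3D coordinates of the blob pixels are clustered with centroid-linkage agglomerative clustering (here via scipy) and scored with the silhouette metric.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

# Hypothetical connected component: 3D pixel coordinates of two
# touching blobs around two centres.
rng = np.random.RandomState(0)
coords = np.vstack([rng.normal(loc=centre, scale=1.5, size=(80, 3))
                    for centre in ([0, 0, 0], [6, 0, 0])])

# Agglomerative clustering with inter-centroid distance as linkage.
Z = linkage(coords, method="centroid")
labels = fcluster(Z, t=2, criterion="maxclust")
print("silhouette score:", silhouette_score(coords, labels))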
Figure 7.5: Comparison of f1-score and prediction time for the classification algorithms. The number between the square brackets indicates the number of components for PCA compression; missing square brackets mean no compression. The support vector machine is not included due to its lacklustre performance, and k-nearest neighbour is not included since its prediction time is much longer than the rest.
Figure 7.6: Comparison of silhouette score and running time of the clustering algorithms. The time is calculated as the mean clustering time for a 500 × 500 × 16 pixels chunk.
Figure 7.7: The same connected components segmented with watershed and the clustering algorithms: (a) watershed, (b) k-means, (c) agglomerative (centroid), (d) agglomerative (ward), (e) MeanShift, (f) spectral.
Algorithm                 Time (s)             # Blobs   Blob size              Silhouette
Watershed                 1.416 (+/- 1.266)    15306     96.603 (+/- 97.092)    0.284 (+/- 0.171)
K-means                   0.131 (+/- 0.085)    15686     94.232 (+/- 78.200)    0.372 (+/- 0.188)
Agglomerative (centroid)  0.066 (+/- 0.083)    16437     89.953 (+/- 57.253)    0.381 (+/- 0.189)
Agglomerative (ward)      0.110 (+/- 0.108)    15616     94.580 (+/- 75.068)    0.375 (+/- 0.190)
MeanShift                 6.378 (+/- 41.416)   9530      154.930 (+/- 91.375)   0.424 (+/- 0.222)
Spectral                  4.249 (+/- 9.835)    13081     87.701 (+/- 86.807)    0.350 (+/- 0.198)

Table 7.4: Statistics of the pixel clustering algorithms. Time is the mean clustering time per 500 × 500 × 16 pixels chunk; # Blobs is the total number of found blobs; Blob size is the mean blob size in pixels; Silhouette is the mean silhouette score for clustering each chunk. The number in parentheses is the standard deviation.
7.4 Results from E: Run on whole image

The found optimal blob detector uses the 10 features in Figure 7.1, no PCA compression, a decision tree classifier and agglomerative clustering with inter-centroid distance as metric. Running this complete blob detection process on the whole test image took in total 10,696 seconds, which is 2:58 hours. This duration is good considering that capturing the image took 5:09 hours. The share of each step can be found in Figure 7.8. It is unsurprising that the feature extraction step takes the majority of the running time, because 10 filters have to be applied to each image. Feature extraction could be made faster by no longer calculating the heavier filters. In this case the 2d_stex_1.0 and 2d_hogey_0.7 filters have been selected, but since they take more time to calculate than the other filters, they could be discarded to improve speed. Furthermore, a total of 1,556,913 blobs was found, with an average radius of 2.587 pixels and a density of 317.05 · 10⁹ blobs per mm³. In Figure 7.10 one can see an example of the steps in the optimised blob detector; a schematic sketch of the chunked, out-of-core loop behind these results follows below.
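The sketch below shows schematically how such an out-of-core run can be organised: the volume is traversed in 500 × 500 × 16 chunks, so the full image never has to fit in memory (in practice the input could be a np.memmap). For brevity the three learning steps are collapsed into a simple threshold and connected-component labelling; in the real detector they are the 10 filters, the decision tree and the agglomerative clustering described above.

import numpy as np
from skimage.measure import label, regionprops

def detect_blobs(image, chunk=(500, 500, 16), threshold=0.999):
    # Process the volume chunk by chunk and collect blob centroids.
    centroids = []
    for x in range(0, image.shape[0], chunk[0]):
        for y in range(0, image.shape[1], chunk[1]):
            for z in range(0, image.shape[2], chunk[2]):
                block = image[x:x + chunk[0], y:y + chunk[1], z:z + chunk[2]]
                # Stand-in for feature extraction + pixel classification:
                mask = block > threshold
                # Stand-in for the clustering/declumping step:
                for region in regionprops(label(mask)):
                    centroids.append(np.asarray(region.centroid) + (x, y, z))
    return centroids

# Toy usage on a small random volume:
print(len(detect_blobs(np.random.rand(1000, 1000, 32))))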
Figure 7.8: Share of steps in total running time of the blob detection process.
7.5 Results from F: Comparison with state-of-the-art

Figure 7.9 shows how the running time of this thesis' MFB detector compares with the speed of blob detection performed with FIJI, CellProfiler and Ilastik. Even though the MFB detector uses a very similar approach to Ilastik, it is twice as slow. The likely reason is that Ilastik has been better optimised: it uses the C++ VIGRA library [115] and makes use of multiple cores, whereas the MFB detector is implemented in Python and works on a single core only. Moreover, for a tool that solely makes use of simple image processing algorithms, CellProfiler is very slow. CellProfiler performs approximately the same steps as the FIJI macro, but its performance is far worse.
Figure 7.9: Mean running time of blob detection on a 500 × 500 × 16 pixels chunk with different tools.
Figure 7.10: Input, output and intermediary images in the blob detection steps on a chunk from the test image: (a) input image, (b) extracted features (examples), (c) classified pixels, (d) clustered pixels (colours are for visible separation only), (e) extracted blob centroids.
Chapter 8 Conclusions

In this project the goal was to move away from simple user-guided image processing towards fully automated computer vision for the task of blob detection. With the existing tools for biomedical image analysis, the user has to tune the parameters for each step of the blob detection process. Not only is this tedious, it also has to be repeated for every image set, which makes automatic operation of such pipelines impossible. Besides this, some of the current software was not designed for the scale of modern high-resolution 3D microscopy images, which can get into the tens of gigabytes in size. To avoid the manual selection of blob detection parameters, machine learning techniques can be used to learn the definition of blobs from a large amount of image data. The scaling problem can be solved by out-of-core processing methods. Since performance is a major concern in this project, the research question was formulated as: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images?

A blob detection pipeline, inspired by Ilastik [36], was designed that consists of 6 sequential steps. For steps 1 to 4, machine learning techniques were tested, while in steps 5 and 6 a heuristic approach is used instead. For the first step of feature extraction, a set of 10 image features was selected that is characterised by high relevancy and low redundancy. These features are used in the next step of pixel classification to decide for each pixel of the input image whether it is part of a blob or not. For this step, 8 popular classifiers were first optimised for the
problem, and then evaluated by their f1-score and prediction time. The results show that a decision tree and logistic regression are the most suitable because of their high accuracy combined with a low running time. A neural network can achieve a slightly higher f1-score but is slower to classify with. Experiments on feature compression show that PCA compression is not worth it for the top classifiers, because the accuracy is hit harder than the prediction time.

The fourth step of pixel clustering aims to split up touching blobs. Conventional approaches use a watershed algorithm, but the novelty of this thesis is to apply clustering algorithms too. Of the 6 clustering algorithms that were deemed suitable, agglomerative clustering and k-means show the most potential. Not only are they simple and fast, they also create a high-quality clustering as measured by silhouette score.

In the next experiment, the optimised blob detection process was applied to a typical image captured for the purpose of in situ RNA sequencing. The aim was to ascertain that the pipeline would work in practice. The running time was just over 3 hours, which satisfies the requirement that it must be less than the time to capture the image (5 hours). Furthermore, the experiment shows that feature extraction is by far (75%) the most lengthy step. This suggests that feature selection is likely more significant than the choice of classification or clustering algorithm. Moreover, the results of experiment A show large discrepancies between the relevancy of the features and the time to calculate them. This means that by selecting a less computation-heavy set of features, the running time can be greatly decreased.

The final experiment compares the running time of this thesis' blob detector, called the MFB detector, with FIJI, CellProfiler and Ilastik. The running time of the MFB detector was similar to FIJI but slower than Ilastik, probably because Ilastik is more highly optimised.

All in all, the results show that machine learning can indeed be very effective for blob detection in high-resolution 3D microscopy images. The proposed blob detector, using 10 optimal features, a decision tree pixel classifier and an agglomerative clustering algorithm, approaches the state-of-the-art in terms of speed. Since the MFB detector is trained on labelled data, it is presumed to be more accurate than the other blob detectors, which rely on user-set parameters. This is, however, hard to verify, because there is no ground truth available on the blobs. Table 8.1 provides a comparison of the four blob detectors.
                MFB detector         FIJI              CellProfiler       Ilastik
Thresholding    Supervised ML        User parameters   User parameters    Supervised ML
                (labels from data)                                        (labels from user)
Declumping      xy-clustering        Watershed         Watershed          None
                or watershed
Out-of-core     Yes                  No                No                 Yes
3D              Yes                  Yes               Yes, but limited   Yes
Time (s)        16                   14                90                 8

Table 8.1: Comparison of the blob detectors. The time denotes the mean duration to process a 500 × 500 × 16 pixels chunk.
8.1 Discussion

The implication of the demonstrated good performance is that the MFB detector can eventually be integrated in high-content screening pipelines for the analysis of biomedical images. Because the MFB detector does not rely on user parameters, it saves the medical experts time and cognitive effort. Instead of needing to tune the blob detection, they can focus on other stages of the analysis, like diagnosis and interpretation. Furthermore, since the detector was trained on labelled data of blobs, it is supposedly more accurate than approaches that rely on user parameters. As a potential drawback, this does mean that the labelled data has to be correct, because the performance of the blob detection depends strongly on it.

There always comes a time when one should be critical of one's own work. Starting with what is good about the research: the found blob detector combines a machine-learned thresholding algorithm with the novel use of a clustering algorithm for blob declumping. Along the way an optimal set of features was selected, and it was found that there can be slight differences between features calculated in 2D and in 3D. Moreover, PCA compression was found to be unhelpful for the problem. An automatic method of creating labelled image data was devised by using the difference between two images with and without blobs (see the sketch below). The research question has been thoroughly investigated.
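A minimal sketch of that labelling idea, under the assumption that the two images are registered and differ only in the presence of blobs; the threshold value and the toy data are illustrative.

import numpy as np

def make_labels(img_with_blobs, img_without_blobs, threshold=0.2):
    # Pixels much brighter in the blob image than in the blob-free
    # image are labelled as blob pixels (True), the rest as background.
    diff = img_with_blobs.astype(float) - img_without_blobs.astype(float)
    return diff > threshold

# Toy usage: a flat background and the same image with one bright spot.
background = np.random.rand(64, 64) * 0.1
with_blobs = background.copy()
with_blobs[20:24, 20:24] += 0.8
print(make_labels(with_blobs, background).sum())  # 16 blob pixels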
For every step in the blob detection process, those features or algorithms have been tested that are either used in related work or have a similar function. An additional requirement for the algorithms was that they must be easy to implement using available software libraries. By focusing on this low-hanging fruit, it is possible that some other good candidates were missed. It is also possible that the tested candidates have not been perfectly optimised. Since a support vector machine scales more than quadratically with the number of samples, it was arduous to train. The solution was therefore to train on a smaller sample, but this resulted in a low f1-score compared to the other classifiers, even though the SVM is generally known to be a good performer. Moreover, only a very simple version of a feed-forward neural network was evaluated. Perhaps other hyper-parameters, such as the activation function, learning algorithm and network structure, could improve its performance.

For feature selection, the (confirmed) suspicion was that the choice of features is highly important for the overall performance of blob detection. Therefore an approach was consciously chosen that is specifically designed for computer vision. In a structured fashion it selects a subset of features that expresses both high relevancy and low redundancy. Despite this, there may be better ways to pick the features. For example, wrapper methods can be applied to recursively find an optimal subset of features using a performance evaluation at every iteration [94]. Or embedded methods can be tested that integrate directly with the learning algorithm [94]. Feature selection being a whole field of its own, a great number of methods have been published already, and with open problems such as scalability, stability and model selection [94], more papers are added every year.

The training data came from a single image only, because there was no access to more. This may hurt the generality of the learned blob definition. Ideally, data from multiple images would be used to train the pixel classifiers. There was also no data on the ground truth of blobs to check the clustering results against. Moreover, the unsupervised silhouette score was used to measure clustering quality, but this may be suboptimal. Speaking of metrics, those chosen may not provide a good assessment of the image features and algorithms. For feature selection many other metrics exist, to name a few: Fisher score, ReliefF, chi-square and F-score [94]. Also for classification and clustering there are alternative metrics. For classification we have the Receiver Operating Characteristic Area Under Curve (ROC AUC), log loss and f-beta score [116]. For measuring clustering quality there is also the Calinski-Harabasz score [117]. With so much choice, it is difficult to determine which metric is optimal.
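As a small illustration of these alternative metrics, the sketch below computes them with scikit-learn on toy predictions and a toy clustering. Note that the release used in this thesis (0.19.1) exposes the Calinski-Harabasz score under the older name calinski_harabaz_score, while recent versions call it calinski_harabasz_score.

import numpy as np
from sklearn.metrics import (calinski_harabasz_score, fbeta_score,
                             log_loss, roc_auc_score)

# Toy binary classification output: true labels and predicted P(blob).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.3])

print("ROC AUC :", roc_auc_score(y_true, y_prob))
print("log loss:", log_loss(y_true, y_prob))
print("f2-score:", fbeta_score(y_true, y_prob > 0.5, beta=2))

# Toy clustering: 3D points split by a plane.
points = np.random.RandomState(0).rand(30, 3)
labels = (points[:, 0] > 0.5).astype(int)
print("Calinski-Harabasz:", calinski_harabasz_score(points, labels))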
In the final experiment F, the blob detector from this thesis was compared to other popular bio-image analysis tools. The reported performance of the different tools may be inaccurate, because the chosen parameters could be less than optimal; the trial-and-error nature of those tools made it very costly to try every combination of parameters. Therefore the experiment should be regarded as an approximate comparison of the running times of state-of-the-art tools.
8.2 Future work

As mentioned in the delimitations, the blob detector implementation in this project must be viewed as a leap in the right direction and not as a final software product. It works reasonably well, but numerous improvements to speed, accuracy and usability are still possible. Image processing, which is now performed by the Python-based scikit-image, could be accelerated by C-based libraries like OpenCV [118] or VIGRA [115]. Within the Python environment, general speed improvements can be achieved by code optimisations such as Numba's [119] just-in-time compilation. For feature extraction, instead of letting the CPU apply convolutional filters to the input image, GPUs, which are far more efficient at such tasks, could be used.

Parallelisation over multiple cores or multiple CPUs may decrease the running time even more. Since the images are currently processed in chunks anyway, it should not be too hard to distribute them over multiple processors (see the sketch below). A higher division of work leads to more overhead though, so perhaps it would be interesting to compare this to the more efficient method of storing the complete image in a single memory array and processing it on a single system.
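A minimal sketch of this idea with the standard library, where process_chunk is a placeholder for the per-chunk feature extraction, classification and clustering:

from multiprocessing import Pool

import numpy as np

def process_chunk(chunk):
    # Placeholder workload: count bright pixels in the chunk.
    return int((chunk > 0.9).sum())

if __name__ == "__main__":
    image = np.random.rand(2000, 500, 16)
    # Split along one axis into independent 500-wide chunks.
    chunks = [image[x:x + 500] for x in range(0, image.shape[0], 500)]
    with Pool() as pool:
        print(sum(pool.map(process_chunk, chunks)))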
In this thesis only one machine learning approach to blob detection has been tested, but there are others that may be successful as well. Over-segmentation, upon which subsection 3.2.6 shortly touched, may be used to first split the image into super-pixels and then classify each group of pixels as being a blob or not using a machine-learned classifier. In this project there was no access to data enabling this, but perhaps future research can evaluate the over-segmentation approach as well. For instance, white blood cell over-segmentation has been done with support vector machines in [120]. In addition, a convolutional neural network may be used for the segmentation of bio-images, as in [121, 122]. The advantage is that feature extraction is not necessary, because the neural network is able to detect visual patterns of neighbouring pixels using convolutions.

For images where only a small fraction of the total area contains blobs, there may be ways to avoid searching through empty regions. An idea could be to first predict for every region whether it contains blobs, perhaps using machine learning, before running the full blob detector on the region.

In the classification results it could sometimes be noted that noisy artefacts in the images, caused by imperfections of the tissue samples, were classified as blobs. Since this leads to a lower overall precision, one could look into ways of avoiding these false positives.

As an extension to the blob detection, machine learning may be applied to the sequencing of the RNA as well. With enough data of different sequences, a model can be trained to predict the RNA sequences based on the found blobs in multiple input images. In [123] the authors use graph optimisation to find the most probable RNA sequence. How this could instead be achieved with machine learning is left as an open question.

Finally, since the tool was not intended for production as-is, usability has been left out of scope. But for the software tool to be convenient in practice, several additions that aid the user are required. A graphical user interface could be included that shows the available options, provides feedback and displays the end results. Extra checks on user input may be added to improve robustness. Options for different input and output formats could be a good addition as well.
Bibliography

[1] D. Smyth, J. Bowman, and E. Meyerowitz, "Early flower development in Arabidopsis," Plant Cell, vol. 2, no. 8, pp. 755–767, 1990.

[2] Z. Zaman, G. Fogazzi, G. Garigali, M. Croci, G. Bayer, and T. Kránicz, "Urine sediment analysis: Analytical and diagnostic performance of sediMAX® - A new automated microscopy image-based urine sediment analyser," Clinica Chimica Acta, vol. 411, no. 3-4, pp. 147–154, 2010. doi: 10.1016/j.cca.2009.10.018.

[3] H. Krivinkova, J. Pontén, and T. Blöndal, "THE DIAGNOSIS OF CANCER FROM BODY FLUIDS: A Comparison of Cytology, DNA Measurement, Tissue Culture, Scanning and Transmission Microscopy," Acta Pathologica Microbiologica Scandinavica Section A Pathology, vol. 84 A, no. 6, pp. 455–467, 1976. doi: 10.1111/j.1699-0463.1976.tb00143.x.

[4] E. Volpi and J. Bridger, "FISH glossary: An overview of the fluorescence in situ hybridization technique," BioTechniques, vol. 45, no. 4, pp. 385–409, 2008. doi: 10.2144/000112811.

[5] J. Lee, E. Daugharthy, J. Scheiman, et al., "Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues," Nature Protocols, vol. 10, no. 3, pp. 442–458, 2015. doi: 10.1038/nprot.2014.191.

[6] T. W. Nattkemper, T. Twellmann, H. Ritter, and W. Schubert, "Human vs machine: Evaluation of fluorescence micrographs," Computers in Biology and Medicine, vol. 33, no. 1, pp. 31–43, Jan. 2003, issn: 0010-4825.

[7] P. Rämö, R. Sacher, B. Snijder, B. Begemann, and L. Pelkmans, "CellClassifier: Supervised learning of cellular phenotypes," Bioinformatics, vol. 25, no. 22, pp. 3028–3030, 2009. doi: 10.1093/bioinformatics/btp524.

[8] M. Held, M. Schmitz, B. Fischer, et al., "CellCognition: Time-resolved phenotype annotation in high-throughput live cell imaging," Nature Methods, vol. 7, no. 9, pp. 747–754, 2010. doi: 10.1038/nmeth.1486.

[9] D. Laksameethanasan, R. Tan, G.-L. Toh, and L.-H. Loo, "CellXpress: A fast and user-friendly software platform for profiling cellular phenotypes," BMC Bioinformatics, vol. 14, no. SUPPL16, 2013. doi: 10.1186/1471-2105-14-S16-S4.

[10] A. Carpenter, T. Jones, M. Lamprecht, et al., "CellProfiler: Image analysis software for identifying and quantifying cell phenotypes," Genome Biology, vol. 7, no. 10, 2006. doi: 10.1186/gb-2006-7-10-r100.

[11] F. Zanella, J. B. Lorens, and W. Link, "High content screening: Seeing is believing," Trends in Biotechnology, vol. 28, no. 5, pp. 237–245, May 2010, issn: 1879-3096. doi: 10.1016/j.tibtech.2010.02.005.

[12] K. Carlsson, R. Lenz, and N. Åslund, "Three-dimensional microscopy using a confocal laser scanning microscope," Optics Letters, vol. 10, no. 2, pp. 53–55, 1985. doi: 10.1364/OL.10.000053.

[13] J. Schindelin, I. Arganda-Carreras, E. Frise, et al., "Fiji: An open-source platform for biological-image analysis," Nature Methods, vol. 9, no. 7, pp. 676–682, 2012. doi: 10.1038/nmeth.2019.

[14] Health. [Online]. Available: http://www.un.org/sustainabledevelopment/health/ (visited on 03/28/2018).

[15] A. Håkansson, "Portal of Research Methods and Methodologies for Research Projects and Degree Projects," in DIVA, CSREA Press U.S.A, 2013, pp. 67–73. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-136960 (visited on 01/24/2018).

[16] J. Caicedo, S. Cooper, F. Heigwer, et al., "Data-analysis strategies for image-based cell profiling," Nature Methods, vol. 14, no. 9, pp. 849–863, 2017. doi: 10.1038/nmeth.4397.
[17] B. Edris, J. A. Fletcher, R. B. West, M. van de Rijn, and A. H. Beck, Comparative Gene Expression Profiling of Benign and Malignant Lesions Reveals Candidate Therapeutic Compounds for Leiomyosarcoma, Research article, 2012. doi: 10.1155/2012/805614. [Online]. Available: https://www.hindawi.com/journals/sarcoma/2012/805614/ (visited on 02/20/2018).

[18] E. Solomon, L. Berg, and D. W. Martin, Biology, 8th edition. Belmont, CA: Brooks Cole, Jan. 2007, isbn: 978-0-495-31714-2.

[19] K. Hofmann, "Enzyme Bioinformatics," in Enzyme Catalysis in Organic Synthesis, K. Drauz and H. Waldmann, Eds., Wiley-VCH Verlag GmbH, 2002, pp. 139–162, isbn: 978-3-527-61826-2. doi: 10.1002/9783527618262.ch5. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9783527618262.ch5/summary (visited on 02/20/2018).

[20] R. Ke, M. Mignardi, A. Pacureanu, et al., "In situ sequencing for RNA analysis in preserved tissue and cells," Nature Methods, vol. 10, no. 9, pp. 857–860, 2013. doi: 10.1038/nmeth.2563.

[21] D. J. S. Birch, Y. Chen, and O. J. Rolinski, "Fluorescence," in Photonics, D. L. Andrews, Ed., John Wiley & Sons, Inc., 2015, pp. 1–58, isbn: 978-1-119-01180-4. doi: 10.1002/9781119011804.ch1. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9781119011804.ch1/summary (visited on 02/20/2018).

[22] N. Battich, T. Stoeger, and L. Pelkmans, "Image-based transcriptomics in thousands of single human cells at single-molecule resolution," Nature Methods, vol. 10, no. 11, p. 1127, Oct. 2013, issn: 1548-7105. doi: 10.1038/nmeth.2657. [Online]. Available: https://www.nature.com.focus.lib.kth.se/articles/nmeth.2657 (visited on 01/25/2018).

[23] Y. Li, S. Wang, Q. Tian, and X. Ding, "A survey of recent advances in visual feature detection," Neurocomputing, vol. 149, pp. 736–751, Feb. 2015, issn: 0925-2312. doi: 10.1016/j.neucom.2014.08.003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231214010121 (visited on 01/22/2018).

[24] T. Lindeberg, "Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention," International Journal of Computer Vision, vol. 11, no. 3, pp. 283–318, Dec. 1993, issn: 0920-5691, 1573-1405. doi: 10.1007/BF01469346. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/BF01469346 (visited on 01/30/2018).

[25] Blob Detection Using OpenCV (Python, C++) | Learn OpenCV. [Online]. Available: https://www.learnopencv.com/blob-detection-using-opencv-python-c/ (visited on 02/21/2018).

[26] S. Lazebnik, Blob detection, Feb. 2011. [Online]. Available: http://www.cs.unc.edu/~lazebnik/spring11/lec08_blob.pdf (visited on 02/21/2018).

[27] T. Lindeberg and J.-O. Eklundh, "Scale detection and region extraction from a scale-space primal sketch," 1990, pp. 416–426.

[28] A. Kaspers, "Blob detection," Image Science Institute, UMC Utrecht, Tech. Rep., 2011. (visited on 02/21/2018).

[29] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007, isbn: 978-0-495-08252-1.

[30] Template Matching — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/features_detection/plot_template.html (visited on 02/21/2018).

[31] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative performance evaluation," J. Electronic Imaging, vol. 13, pp. 146–168, Jan. 2004.

[32] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., "Scikit-image: Image processing in Python," PeerJ, vol. 2, e453, Jun. 2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Available: https://peerj.com/articles/453 (visited on 02/21/2018).

[33] T. Lindeberg, "Feature Detection with Automatic Scale Selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.
[34] T. Lindeberg, "Scale Selection Properties of Generalized Scale-Space Interest Point Detectors," Journal of Mathematical Imaging and Vision, vol. 46, no. 2, pp. 177–210, Jun. 2013, issn: 0924-9907, 1573-7683. doi: 10.1007/s10851-012-0378-3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10851-012-0378-3 (visited on 03/22/2018).

[35] J. D. Hunter, "Matplotlib: A 2d Graphics Environment," Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007, issn: 1521-9615. doi: 10.1109/MCSE.2007.55. [Online]. Available: http://ieeexplore.ieee.org/document/4160265/ (visited on 02/21/2018).

[36] C. Sommer, C. Straehle, U. Kothe, and F. Hamprecht, "Ilastik: Interactive learning and segmentation toolkit," 2011, pp. 230–233. doi: 10.1109/ISBI.2011.5872394.

[37] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient Graph-Based Image Segmentation," International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, Sep. 2004, issn: 0920-5691, 1573-1405. doi: 10.1023/B:VISI.0000022288.19776.77. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/B:VISI.0000022288.19776.77 (visited on 01/30/2018).

[38] A. Vedaldi and S. Soatto, "Quick Shift and Kernel Methods for Mode Seeking," in Computer Vision – ECCV 2008, ser. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, Oct. 2008, pp. 705–718, isbn: 978-3-540-88692-1, 978-3-540-88693-8. doi: 10.1007/978-3-540-88693-8_52. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-540-88693-8_52 (visited on 01/30/2018).

[39] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94.

[40] W. S. Qureshi, A. Payne, K. B. Walsh, R. Linker, O. Cohen, and M. N. Dailey, "Machine vision for counting fruit on mango tree canopies," Precision Agriculture, vol. 18, no. 2, pp. 224–244, Apr. 2017, issn: 1385-2256, 1573-1618. doi: 10.1007/s11119-016-9458-5. [Online]. Available: https://link.springer.com/article/10.1007/s11119-016-9458-5 (visited on 01/22/2018).

[41] Comparison of segmentation and superpixel algorithms — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/segmentation/plot_segmentations.html (visited on 02/21/2018).

[42] A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, Jul. 1959, issn: 0018-8646. doi: 10.1147/rd.33.0210.

[43] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2010, isbn: 978-0-13-604259-4.

[44] P. Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data: Recent Advances in Clustering, 2006, pp. 25–71. doi: 10.1007/3-540-28349-8_2.

[45] S. T. Roweis and L. K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000, issn: 0036-8075, 1095-9203. doi: 10.1126/science.290.5500.2323. [Online]. Available: http://science.sciencemag.org/content/290/5500/2323 (visited on 02/23/2018).

[46] T. Menzies, "Data Mining," in Recommendation Systems in Software Engineering, Springer, Berlin, Heidelberg, 2014, pp. 39–75, isbn: 978-3-642-45134-8, 978-3-642-45135-5. doi: 10.1007/978-3-642-45135-5_3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-642-45135-5_3 (visited on 02/26/2018).

[47] P. Domingos and M. Pazzani, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Machine Learning, vol. 29, no. 2-3, pp. 103–130, Nov. 1997, issn: 0885-6125, 1573-0565. doi: 10.1023/A:1007413511361. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/A:1007413511361 (visited on 02/26/2018).

[48] X. Wu, V. Kumar, J. R. Quinlan, et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Jan. 2008, issn: 0219-1377, 0219-3116. doi: 10.1007/s10115-007-0114-2. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10115-007-0114-2 (visited on 02/28/2018).
[49] M. M. Deza and E. Deza, Encyclopedia of Distances, 4th ed. Berlin Heidelberg: Springer-Verlag, 2016, isbn: 978-3-662-52843-3. [Online]. Available: //www.springer.com/la/book/9783662528433 (visited on 03/22/2018).

[50] P. Hart, "The condensed nearest neighbor rule (Corresp.)," IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 515–516, May 1968, issn: 0018-9448. doi: 10.1109/TIT.1968.1054155.

[51] D. L. Wilson, "Asymptotic Properties of Nearest Neighbor Rules Using Edited Data," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, no. 3, pp. 408–421, Jul. 1972, issn: 0018-9472. doi: 10.1109/TSMC.1972.4309137.

[52] V. Podgorelec and M. Zorman, "Decision trees," in Computational Complexity: Theory, Techniques, and Applications, vol. 9781461418009, 2012, pp. 827–845. doi: 10.1007/978-1-4614-1800-9_53.

[53] T. K. Ho, "A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors," Pattern Analysis & Applications, vol. 5, no. 2, pp. 102–112, Jun. 2002, issn: 1433-7541. doi: 10.1007/s100440200009. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s100440200009 (visited on 02/28/2018).

[54] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997, issn: 0022-0000. doi: 10.1006/jcss.1997.1504. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S002200009791504X (visited on 02/28/2018).

[55] D. Simovici, "Intelligent Data Analysis Techniques—Machine Learning and Data Mining," in Artificial Intelligent Approaches in Petroleum Geosciences, Springer, Cham, 2015, pp. 1–51, isbn: 978-3-319-16530-1, 978-3-319-16531-8. doi: 10.1007/978-3-319-16531-8_1. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-319-16531-8_1 (visited on 02/27/2018).

[56] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.

[57] M. D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method," arXiv:1212.5701 [cs], Dec. 2012. [Online]. Available: http://arxiv.org/abs/1212.5701 (visited on 04/11/2018).

[58] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980 [cs], Dec. 2014. [Online]. Available: http://arxiv.org/abs/1412.6980 (visited on 04/11/2018).

[59] D. Mysid, A simplified view of an artificial neural network, Nov. 2006. [Online]. Available: https://commons.wikimedia.org/w/index.php?curid=1412126 (visited on 04/11/2018).

[60] S. Firdaus and M. A. Uddin, "A Survey on Clustering Algorithms and Complexity Analysis," International Journal of Computer Science, vol. 12, no. 2, pp. 62–85, Mar. 2015, issn: 1694-0814. [Online]. Available: https://www.ijcsi.org/papers/IJCSI-12-2-62-85.pdf (visited on 03/01/2018).

[61] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002, issn: 0162-8828. doi: 10.1109/34.1000236.

[62] Sklearn.cluster.MeanShift — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift (visited on 03/05/2018).

[63] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems, MIT Press, 2001, pp. 849–856.

[64] U. von Luxburg, A Tutorial on Spectral Clustering, 2007.

[65] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points," Science, vol. 315, no. 5814, pp. 972–976, Feb. 2007, issn: 0036-8075, 1095-9203. doi: 10.1126/science.1136800. [Online]. Available: http://science.sciencemag.org/content/315/5814/972 (visited on 03/02/2018).
[66] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, ser. KDD'96, Portland, Oregon: AAAI Press, 1996, pp. 226–231. [Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507 (visited on 03/02/2018).

[67] 2.3. Clustering — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/clustering.html#birch (visited on 03/02/2018).

[68] P. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, no. C, pp. 53–65, 1987. doi: 10.1016/0377-0427(87)90125-7.

[69] Sklearn.metrics.silhouette_score — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score (visited on 03/05/2018).

[70] A. Hervé and L. Williams, "Principal component analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, Jul. 2010, issn: 1939-5108. doi: 10.1002/wics.101. [Online]. Available: http://onlinelibrary.wiley.com/doi/abs/10.1002/wics.101 (visited on 04/03/2018).

[71] S. Ng, "Principal component analysis to reduce dimension on digital image," vol. 111, 2017, pp. 113–119. doi: 10.1016/j.procs.2017.06.017.

[72] K. T. M. Han and B. Uyyanonvara, "A Survey of Blob Detection Algorithms for Biomedical Images," in 2016 7th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES), Mar. 2016, pp. 57–60. doi: 10.1109/ICTEmSys.2016.7467122.

[73] K. Yamamoto, Y. Yoshioka, and S. Ninomiya, "Detection and counting of intact tomato fruits on tree using image analysis and machine learning methods," 2013, pp. 664–667.

[74] N. J. B. McFarlane and C. P. Schofield, "Segmentation and tracking of piglets in images," Machine Vision and Applications, vol. 8, no. 3, pp. 187–193, May 1995, issn: 0932-8092, 1432-1769. doi: 10.1007/BF01215814. [Online]. Available: https://link.springer.com/article/10.1007/BF01215814 (visited on 01/23/2018).

[75] J. Zavadil, J. Tuma, and V. Santos, "Traffic signs detection using blob analysis and pattern recognition," 2012, pp. 776–779. doi: 10.1109/CarpathianCC.2012.6228752.

[76] L. Minor and J. Sklansky, "The Detection and Segmentation of Blobs in Infrared Images," IEEE Transactions on Systems, Man and Cybernetics, vol. 11, no. 3, pp. 194–201, 1981. doi: 10.1109/TSMC.1981.4308652.

[77] W. Moon, Y.-W. Shen, M. Bae, C.-S. Huang, J.-H. Chen, and R.-F. Chang, "Computer-aided tumor detection based on multi-scale blob detection algorithm in automated breast ultrasound images," IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1191–1200, 2013. doi: 10.1109/TMI.2012.2230403.

[78] C. A. Schneider, W. S. Rasband, and K. W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis, Comments and Opinion, Jun. 2012. doi: 10.1038/nmeth.2089. [Online]. Available: http://www.nature.com/articles/nmeth.2089 (visited on 05/15/2018).

[79] J. Lee, E. Daugharthy, J. Scheiman, et al., "Highly multiplexed subcellular RNA sequencing in situ," Science, vol. 343, no. 6177, pp. 1360–1363, 2014. doi: 10.1126/science.1250212.

[80] T. Stoeger, N. Battich, M. Herrmann, Y. Yakimovich, and L. Pelkmans, "Computer vision for image-based transcriptomics," Methods, vol. 85, pp. 44–53, 2015. doi: 10.1016/j.ymeth.2015.05.016.

[81] O. Z. Kraus and B. J. Frey, "Computer vision for high content screening," Critical Reviews in Biochemistry and Molecular Biology, vol. 51, no. 2, pp. 102–109, Mar. 2016, issn: 1040-9238. doi: 10.3109/10409238.2015.1135868. [Online]. Available: https://doi.org/10.3109/10409238.2015.1135868 (visited on 02/01/2018).
[82] A. Shariff, J. Kangas, L. P. Coelho, S. Quinn, and R. F. Murphy, "Automated image analysis for high-content screening and analysis," Journal of Biomolecular Screening, vol. 15, no. 7, pp. 726–734, Aug. 2010, issn: 1552-454X. doi: 10.1177/1087057110370894.

[83] C. Sommer and D. Gerlich, "Machine learning in cell biology: teaching computers to recognize phenotypes," Journal of Cell Science, vol. 126, no. 24, pp. 5529–5539, 2013. doi: 10.1242/jcs.123604.

[84] B. T. Grys, D. S. Lo, N. Sahin, et al., "Machine learning and computer vision approaches for phenotypic profiling," J Cell Biol, vol. 216, no. 1, pp. 65–71, Jan. 2017, issn: 0021-9525, 1540-8140. doi: 10.1083/jcb.201610026. [Online]. Available: http://jcb.rupress.org/content/216/1/65 (visited on 01/23/2018).

[85] M. Wang, X. Zhou, F. Li, J. Huckins, R. King, and S. Wong, "Novel cell segmentation and online SVM for cell cycle phase identification in automated microscopy," Bioinformatics, vol. 24, no. 1, pp. 94–101, 2008. doi: 10.1093/bioinformatics/btm530.

[86] K. Vermeer, J. van der Schoot, H. Lemij, and J. de Boer, "Automated segmentation by pixel classification of retinal layers in ophthalmic OCT images," Biomedical Optics Express, vol. 2, no. 6, pp. 1743–1756, 2011.

[87] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu, "Methods for nuclei detection, segmentation, and classification in digital histopathology: A review-current status and future potential," IEEE Reviews in Biomedical Engineering, vol. 7, pp. 97–114, 2014. doi: 10.1109/RBME.2013.2295804.

[88] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead, I. A. Cree, and N. M. Rajpoot, "Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, May 2016, issn: 0278-0062, 1558-254X. doi: 10.1109/TMI.2016.2525803. [Online]. Available: http://ieeexplore.ieee.org/document/7399414/ (visited on 05/03/2018).

[89] S. Niu and K. Ren, "Neural cell image segmentation method based on support vector machine," vol. 9675, 2015. doi: 10.1117/12.2205114.

[90] N. Hatipoglu and G. Bilgin, "Cell segmentation in histopathological images with deep learning algorithms by utilizing spatial relationships," Medical and Biological Engineering and Computing, vol. 55, no. 10, pp. 1829–1848, 2017. doi: 10.1007/s11517-017-1630-1.

[91] F. Piccinini, T. Balassa, A. Szkalisity, et al., "Advanced Cell Classifier: User-Friendly Machine-Learning-Based Software for Discovering Phenotypes in High-Content Imaging Data," Cell Systems, vol. 4, no. 6, 651–655.e5, Jun. 2017, issn: 2405-4712. doi: 10.1016/j.cels.2017.05.012.

[92] J. Bins and B. A. Draper, "Feature selection from huge feature sets," in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2, 2001, pp. 159–165. doi: 10.1109/ICCV.2001.937619.

[93] K. Kira and L. A. Rendell, "The Feature Selection Problem: Traditional Methods and a New Algorithm," in Proceedings of the Tenth National Conference on Artificial Intelligence, ser. AAAI'92, San Jose, California: AAAI Press, 1992, pp. 129–134, isbn: 978-0-262-51063-9. [Online]. Available: http://dl.acm.org/citation.cfm?id=1867135.1867155 (visited on 03/27/2018).

[94] J. Li, K. Cheng, S. Wang, et al., "Feature Selection: A Data Perspective," arXiv:1601.07996 [cs], Jan. 2016. [Online]. Available: http://arxiv.org/abs/1601.07996 (visited on 03/27/2018).

[95] K. Yeager, LibGuides: SPSS Tutorials: Chi-Square Test of Independence. [Online]. Available: https://libguides.library.kent.edu/SPSS/ChiSquare (visited on 04/09/2018).

[96] ——, LibGuides: SPSS Tutorials: Pearson Correlation. [Online]. Available: https://libguides.library.kent.edu/SPSS/PearsonCorr (visited on 04/09/2018).

[97] R. Caruana and D. Freitag, "Greedy Attribute Selection," in Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann, 1994, pp. 28–36.

[98] F. Chollet et al., Keras, 2015. [Online]. Available: https://keras.io.
[99] M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015. [Online]. Available: https://www.tensorflow.org/.

[100] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011, issn: 1533-7928. [Online]. Available: http://jmlr.org/papers/v12/pedregosa11a.html (visited on 03/27/2018).

[101] T. Schaul, I. Antonoglou, and D. Silver, "Unit Tests for Stochastic Optimization," arXiv:1312.6055 [cs], Dec. 2013. [Online]. Available: http://arxiv.org/abs/1312.6055 (visited on 04/11/2018).

[102] I. G. Goldberg, C. Allan, J.-M. Burel, et al., "The Open Microscopy Environment (OME) Data Model and XML file: Open tools for informatics and quantitative analysis in biological imaging," Genome Biology, vol. 6, no. 5, R47, 2005, issn: 1474-760X. doi: 10.1186/gb-2005-6-5-r47.

[103] Sklearn.cluster.KMeans — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (visited on 03/13/2018).

[104] L. Kamentsky, Python-javabridge: Python wrapper for the Java Native Interface, May 2018. [Online]. Available: https://github.com/LeeKamentsky/python-javabridge (visited on 05/15/2018).

[105] NumPy — NumPy. [Online]. Available: http://www.numpy.org/ (visited on 03/27/2018).

[106] A. Hagberg, P. Swart, and D. S Chult, "Exploring Network Structure, Dynamics, and Function Using NetworkX," in Proceedings of the 7th Python in Science Conference, Jan. 2008.

[107] W. McKinney, "Data Structures for Statistical Computing in Python," in Proceedings of the 9th Python in Science Conference, S. v. d. Walt and J. Millman, Eds., 2010, pp. 51–56.

[108] Python-bioformats: Read and write life sciences file formats, Apr. 2018. [Online]. Available: https://github.com/CellProfiler/python-bioformats (visited on 05/15/2018).

[109] Riverbank | Software | PyQt | What is PyQt? [Online]. Available: https://www.riverbankcomputing.com/software/pyqt/intro (visited on 05/15/2018).

[110] S. v. d. Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., "Scikit-image: Image processing in Python," PeerJ, vol. 2, e453, Jun. 2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Available: https://peerj.com/articles/453 (visited on 03/27/2018).

[111] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source scientific tools for Python, 2001. [Online]. Available: http://www.scipy.org/.

[112] S. Silvester, Tifffile: Read and write image data from and to TIFF files. [Online]. Available: https://github.com/blink1073/tifffile (visited on 03/27/2018).

[113] Pyyaml: Canonical source repository for PyYAML, May 2018. [Online]. Available: https://github.com/yaml/pyyaml (visited on 05/15/2018).

[114] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2625, 2008.

[115] U. Köthe, Generische Programmierung für die Bildverarbeitung. Hamburg: Books on Demand, Sep. 2000, isbn: 978-3-8311-0239-6.

[116] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008, isbn: 978-0-521-86571-5.

[117] M. Kozak, "'A Dendrite Method for Cluster Analysis' by Caliński and Harabasz: A Classical Work that is Far Too Often Incorrectly Cited," Communications in Statistics - Theory and Methods, vol. 41, no. 12, pp. 2279–2280, Jun. 2012, issn: 0361-0926. doi: 10.1080/03610926.2011.560741. [Online]. Available: https://doi.org/10.1080/03610926.2011.560741 (visited on 05/16/2018).

[118] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.

[119] Numba — Numba. [Online]. Available: https://numba.pydata.org/ (visited on 05/16/2018).

[120] X. Zheng, Y. Wang, and G. Wang, "White blood cell segmentation using expectation-maximization and automatic support vector machine learning," Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, vol. 28, no. 5, pp. 614–619, 2013.

[121] D. Cireşan, A. Giusti, L. Gambardella, and J. Schmidhuber, "Deep neural networks segment neuronal membranes in electron microscopy images," vol. 4, 2012, pp. 2843–2851.

[122] P. Moeskops, M. Viergever, A. Mendrik, V. De, M. Benders, and I. Isgum, "Automatic Segmentation of MR Brain Images with a Convolutional Neural Network," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1252–1261, 2016. doi: 10.1109/TMI.2016.2548501.

[123] G. Partel, G. Milli, and C. Wählby, "Improving Recall of In Situ Sequencing by Self-Learned Features and a Graphical Model," arXiv:1802.08894 [cs, q-bio], Feb. 2018. [Online]. Available: http://arxiv.org/abs/1802.08894 (visited on 05/16/2018).
Appendix A Experiment F software configurations

A.1 Crops
The coordinates of the crops that are processed by the four blob detection programs can be found in Table A.1.
A.2 MFB detector
The proposed blob detector in this thesis follows the blob detection process described in Section 6.1. The top 10 best features from Table 7.1 are extracted for each input image. The features are not compressed. A decision tree classifier is used to classify the pixels based on these features. Then an agglomerative clustering algorithm relying on inter-centroid distance is used to group the blob pixels into blobs. Finally, the blobs smaller than 4 pixels are filtered out and the blob centroids are calculated.
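A minimal sketch of these final two post-processing steps on a toy labelled mask, using skimage; note that the real detector operates on the clustering output rather than on a plain connected-component labelling.

import numpy as np
from skimage.measure import label, regionprops

mask = np.zeros((32, 32, 8), dtype=bool)
mask[2:6, 2:6, 2:4] = True   # a 32-pixel blob, kept
mask[20, 20, 4] = True       # a 1-pixel blob, filtered out

# Keep only blobs of at least 4 pixels and take their centroids.
centroids = [r.centroid for r in regionprops(label(mask)) if r.area >= 4]
print(centroids)  # [(3.5, 3.5, 2.5)]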
x      y
10462  6331
11060  5698
12158  4521
15579  4741
17481  13492
20019  13236
21200  7883
3729   2072
6315   8851
8038   12185

Table A.1: Locations in pixels of the random crops of the test image. All the crops are of size 500 × 500 × 16 pixels.
A.3 FIJI

ImageJ macro, executed with FIJI (ImageJ 1.51s) for each input image via Process -> Batch -> Macro...:

// Gaussian smoothing and 3D top-hat filtering to enhance the blobs.
name = getTitle;
run("Smooth (3D)", "method=Gaussian sigma=1.000 use");
run("3D Fast Filters", "filter=TopHat radius_x_pix=2.0 radius_y_pix=2.0 radius_z_pix=1.0 Nb_cpus=8");
// Threshold (maximum entropy) and fill holes to obtain a blob mask.
run("Make Binary", "method=MaxEntropy background=Default");
run("3D Fill Holes");
// Find local maxima as seeds and split touching blobs by watershed.
run("3D Maxima Finder", "radiusxy=1.50 radiusz=0.5 noise=100");
run("3D Watershed Split", "binary=3D_TopHat seeds=peaks radius=1");
// Count the blobs and save the statistics per input image.
run("3D object counter...", "threshold=100 slice=8 min.=4 max.=4000000 statistics");
filename = name + "_blobs.csv";
saveAs("Results", "D:\\Single\\my-first-blobs\\analysis\\F\\fiji\\" + filename);
A.4 CellProfiler
CellProfiler pipeline, executed with CellProfiler 3.0.0 for each input image: CellProfiler Pipeline: http://www.cellprofiler.org Version:3 DateRevision:300 GitHash: ModuleCount:13 HasImagePlaneDetails:False Images:[module_num:1|svn_version:\'Unknown\'| variable_revision_number:2|show_window:False|notes:\x5B\'To begin creating your project, use the Images module to compile a list of files and/or folders that you want to analyze. You can also specify a set of rules to include only the desired files in your selected folders.\'\x5D|batch_state:array(\x5B\ x5D, dtype=uint8)|enabled:True|wants_pause:False] : Filter images?:Images only Select the rule criteria:and (extension does isimage) ( directory doesnot containregexp "\x5B\\\\\\\\\\\\\\\\/\x5D \\\\\\\\.") Metadata:[module_num:2|svn_version:\'Unknown\'| variable_revision_number:4|show_window:False|notes:\x5B\'The Metadata module optionally allows you to extract information describing your images (i.e, metadata) which will be stored along with your measurements. This information can be contained in the file name and/or location, or in an external file.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled: True|wants_pause:False] Extract metadata?:Yes Metadata data type:Text Metadata types:{} Extraction method count:1 Metadata extraction method:Extract from file/folder names Metadata source:File name Regular expression to extract from file name:(?P.*) Regular expression to extract from folder name:(?P\x5B0 -9\x5D{4}_\x5B0-9\x5D{2}_\x5B0-9\x5D{2})$ Extract metadata from:All images Select the filtering criteria:and (file does contain "")
Metadata file location: Match file and image metadata:\x5B\x5D Use case insensitive matching?:No NamesAndTypes:[module_num:3|svn_version:\'Unknown\'| variable_revision_number:8|show_window:False|notes:\x5B\'The NamesAndTypes module allows you to assign a meaningful name to each image by which other modules will refer to it.\'\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Assign a name to:All images Select the image type:Grayscale image Name to assign these images:input Match metadata:\x5B\x5D Image set matching method:Order Set intensity range from:Image metadata Assignments count:1 Single images count:0 Maximum intensity:255.0 Process as 3D?:Yes Relative pixel spacing in X:1.0 Relative pixel spacing in Y:1.0 Relative pixel spacing in Z:3.7 Select the rule criteria:and (file does contain "") Name to assign these images:DNA Name to assign these objects:Cell Select the image type:Grayscale image Set intensity range from:Image metadata Maximum intensity:255.0 Groups:[module_num:4|svn_version:\'Unknown\'| variable_revision_number:2|show_window:False|notes:\x5B\'The Groups module optionally allows you to split your list of images into image subsets (groups) which will be processed independently of each other. Examples of groupings include screening batches, microtiter plates, time-lapse movies, etc .\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Do you want to group your images?:No grouping metadata count:1 Metadata category:None GaussianFilter:[module_num:5|svn_version:\'Unknown\'| variable_revision_number:1|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False]
Select the input image:input Name the output image:gaussian_filtered Sigma:0.3 EnhanceOrSuppressFeatures:[module_num:6|svn_version:\'Unknown\'| variable_revision_number:6|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the input image:gaussian_filtered Name the output image:enhanced Select the operation:Enhance Feature size:12 Feature type:Speckles Range of hole sizes:1,10 Smoothing scale:2.0 Shear angle:0.0 Decay:0.95 Enhancement method:Tubeness Speed and accuracy:Fast Threshold:[module_num:7|svn_version:\'Unknown\'| variable_revision_number:10|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the input image:enhanced Name the output image:thresholded Threshold strategy:Global Thresholding method:Manual Threshold smoothing scale:0.0 Threshold correction factor:1.0 Lower and upper bounds on threshold:0.0,1.0 Manual threshold:0.10 Select the measurement to threshold with:None Two-class or three-class thresholding?:Two classes Assign pixels in the middle intensity class to the foreground or the background?:Foreground Size of adaptive window:50 Lower outlier fraction:0.05 Upper outlier fraction:0.05 Averaging method:Mean Variance method:Standard deviation # of deviations:2.0 Thresholding method:Otsu RemoveHoles:[module_num:8|svn_version:\'Unknown\'| variable_revision_number:1|show_window:False|notes:\x5B\x5D|
batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the input image:thresholded Name the output image:removed_holes Size:1.0 Watershed:[module_num:9|svn_version:\'Unknown\'| variable_revision_number:1|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the input image:removed_holes Name the output object:watershed Generate from:Distance Markers:None Mask:Leave blank Connectivity:8 Downsample:1 MeasureObjectSizeShape:[module_num:10|svn_version:\'Unknown\'| variable_revision_number:1|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select objects to measure:watershed Calculate the Zernike features?:No FilterObjects:[module_num:11|svn_version:\'Unknown\'| variable_revision_number:8|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the objects to filter:watershed Name the output objects:filtered_objects Select the filtering mode:Measurements Select the filtering method:Limits Select the objects that contain the filtered objects:None Select the location of the rules or classifier file:Elsewhere ...\x7C Rules or classifier file name:rules.txt Class number:1 Measurement count:1 Additional object count:0 Assign overlapping child to:Both parents Select the measurement to filter by:AreaShape_Area Filter using a minimum measurement value?:Yes Minimum value:4 Filter using a maximum measurement value?:Yes Maximum value:1000
MeasureObjectSizeShape:[module_num:12|svn_version:\'Unknown\'| variable_revision_number:1|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select objects to measure:filtered_objects Calculate the Zernike features?:No ExportToSpreadsheet:[module_num:13|svn_version:\'Unknown\'| variable_revision_number:12|show_window:False|notes:\x5B\x5D| batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True| wants_pause:False] Select the column delimiter:Comma (",") Add image metadata columns to your object data file?:Yes Select the measurements to export:No Calculate the per-image mean values for object measurements?: No Calculate the per-image median values for object measurements ?:No Calculate the per-image standard deviation values for object measurements?:No Output file location:Elsewhere...\x7CD\x3A\\\\\\\\Single \\\\\\\\my-first-blobs\\\\\\\\analysis\\\\\\\\F\\\\\\\\ cellprofiler Create a GenePattern GCT file?:No Select source of sample row name:Metadata Select the image to use as the identifier:None Select the metadata to use as the identifier:None Export all measurement types?:No Press button to select measurements:filtered_objects\ x7CAreaShape_Area,filtered_objects\x7CAreaShape_MeanRadius, Image\x7CCount_filtered_objects,Image\ x7CExecutionTime_01Images,Image\x7CExecutionTime_04Groups, Image\x7CExecutionTime_02Metadata,Image\ x7CExecutionTime_11FilterObjects,Image\ x7CExecutionTime_03NamesAndTypes,Image\ x7CExecutionTime_07Threshold,Image\ x7CExecutionTime_08RemoveHoles,Image\ x7CExecutionTime_05GaussianFilter,Image\ x7CExecutionTime_09Watershed,Image\ x7CExecutionTime_06EnhanceOrSuppressFeatures,Image\ x7CExecutionTime_10MeasureObjectSizeShape,Image\ x7CFileName_input,Experiment\x7CModification_Timestamp, Experiment\x7CRun_Timestamp Representation of Nan/Inf:NaN Add a prefix to file names?:No
Filename prefix:MyExpt_ Overwrite existing files without warning?:Yes Data to export:filtered_objects Combine these object measurements with those of the previous object?:No File name:blobs.csv Use the object name for the file name?:No

Figure A.1: List of input images.
A.5 Ilastik

The parameters of the Ilastik Pixel Classification + Object Classification project, executed with Ilastik 1.3.0, can be found in Figures A.1, A.2, A.3, A.4, A.5 and A.6.
Figure A.2: Selected pixel features based on the found best features in Table 7.1.
Figure A.3: Labels in the Training step. Label 1 denotes non-blob pixels and Label 2 denotes blob pixels. Around 50 example pixels were indicated for each label.
Figure A.4: Parameters of the Thresholding step.
Figure A.5: Parameters of the Object Feature Selection step. Only the size feature was selected.
Figure A.6: Labels in the Object Classification step. The labels are irrelevant because object classification is not part of blob detection. However, two labels were needed in order to export the blobs.
TRITA-EECS-EX-2018:125
ISSN 1653-5146