Reconstructing Curvilinear Networks using Path Classifiers and Integer Programming

Engin Türetken, Fethallah Benmansour, Bjoern Andres, Przemysław Głowacki, Hanspeter Pfister, Senior Member, IEEE, and Pascal Fua, Fellow, IEEE

Abstract—We propose a novel approach to automated delineation of curvilinear structures that form complex and potentially loopy networks. By representing the image data as a graph of potential paths, we first show how to weight these paths using discriminatively-trained classifiers that are both robust and generic enough to be applied to very different imaging modalities. We then present an Integer Programming approach to finding the optimal subset of paths, subject to structural and topological constraints that eliminate implausible solutions. Unlike earlier approaches that assume a tree topology for the networks, ours explicitly models the fact that the networks may contain loops, and can reconstruct both cyclic and acyclic ones. We demonstrate the effectiveness of our approach on a variety of challenging datasets including aerial images of road networks and micrographs of neural arbors, and show that it outperforms state-of-the-art techniques.

Index Terms—Curvilinear networks, tubular structures, curvilinear structures, automated reconstruction, integer programming, path classification, minimum arborescence.

• Engin Türetken is with the Computer Vision Laboratory, EPFL, Lausanne, and the Swiss Center for Electronics and Microtechnology (CSEM), Neuchâtel, Switzerland. E-mail: [email protected]
• Przemysław Głowacki and Pascal Fua are with the Computer Vision Laboratory, EPFL, Lausanne, Switzerland. E-mail: [email protected], [email protected]
• Fethallah Benmansour is with Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland. E-mail: [email protected]
• Bjoern Andres is with the School of Engineering and Applied Sciences, Harvard University, Cambridge, US. E-mail: [email protected]
• Hanspeter Pfister is with the Visual Computing Group, Harvard University, Cambridge, US. E-mail: [email protected]
1 INTRODUCTION
Networks of curvilinear structures are pervasive both in nature and man-made systems. They appear at all possible scales, ranging from nanometers in Electron Microscopy image stacks of neurons to petameters in dark-matter arbors binding massive galaxy clusters. Modern acquisition systems can produce vast amounts of such imagery in a short period of time. However, despite the abundance of available data and many years of sustained effort, full automation remains elusive when the image data is noisy and the structures exhibit complex morphology. As a result, practical systems require extensive manual intervention that is both time-consuming and tedious. For example, in the DIADEM challenge to map nerve cells [2], the results of the finalist algorithms, which proved best at tracing axonal and dendritic networks, required substantial time and effort to proofread and correct [3]. Such editing dramatically slows down the analysis process and makes it impossible to exploit the large volumes of imagery being
produced for neuroscience and medical research.

Part of the problem comes from the fact that scoring paths by integrating pixel values along their length, as is done in many automated methods, often fails to adequately penalize short-cuts and makes it difficult to compute commensurate scores for paths of different lengths. An additional difficulty comes from the fact that many interesting networks, such as those formed by the roads and blood vessels depicted in Fig. 1, contain loops, which few existing methods attempt to model explicitly. Furthermore, even among structures that really are trees, such as the neurites of Fig. 1, the imaging resolution is often so low that the branches appear to cross, thus introducing several spurious cycles that can only be recognized once the whole structure has been recovered. In fact, this is widely reported as a major source of error [4], [5], [6], [7], [8], [9].

In this paper, we attempt to overcome both of these limitations simultaneously by formulating the reconstruction problem as one of solving an Integer Program (IP) on a graph of potential tubular paths. We further propose a novel approach to weighting these paths using discriminatively-trained path classifiers. More specifically, we first select evenly spaced voxels that are very likely to belong to the curvilinear structures of interest. We treat them as vertices of a graph and connect those that are within a certain distance of each other by geodesic paths [10], to which we assign informative scores based on an original classification approach. We then look for a subgraph that maximizes a global objective function that combines both image-based and geometry-based terms. Unlike earlier approaches that aim at building trees, we look for a potentially loopy subgraph while penalizing the formation of spurious junctions and early branch terminations. This is done by letting graph vertices be used by several branches, thus allowing cycles
Fig. 1. 2D and 3D datasets used to test our approach. (a) Aerial image of road networks. (b) Maximum intensity projection (MIP) of a confocal image stack of blood vessels, which appear in red. It features cycles created by circulatory anastomoses or by capillaries connecting arteries to veins. (c) Minimum intensity projection of a brightfield stack of neurons. (d) MIP of a confocal stack of olfactory projection axons. (e) MIP of a brainbow [1] stack of dendrites and axons. As most images in this paper, they are best visualized in color.
Fig. 2. Enforcing tree topology results in the creation of spurious junctions. (a) Image with two crossing branches. The dots depict sample points (seeds) on the centerlines found by maximizing a tubularity measure. In 3D, these branches may be disjoint, but the z-resolution is insufficient to see it and only a single sample is found at their intersection, which we color yellow. (b) The samples are connected by geodesic paths to form a graph. (c,d) The final delineation is obtained by finding a subgraph that minimizes a global cost function. In (c), we prevent samples from being used more than once, resulting in an erroneous delineation. In (d), we allow the yellow point to be used twice and penalize the creation of early terminations and spurious junctions, which yields a better result.
as shown in Fig. 2, but introducing a regularization prior and structural constraints to limit the number of branchings and terminations.

Our approach was first introduced in [11] and [12], both of which produce connected reconstructions. We combine them here and propose new connectivity constraints, along with a new optimization strategy to impose them, which result in significant speed-ups. Furthermore, we provide a more thorough validation with new experiments and comparisons to several other methods. We demonstrate the effectiveness of our approach on a wide range of noisy 2D and 3D datasets, including aerial images of road networks and micrographs of the neural and vascular arbors depicted in Fig. 1. We show that it consistently outperforms state-of-the-art techniques.
2 RELATED WORK
Most delineation techniques and curvilinear structure databases use a sequence of cylindrical compartments as a representation of curvilinear structures [13], [14], [15], [16],
[17], [18], [11], [19]. Besides being compact, this geometric representation captures two important properties, namely connectedness and directedness, which are common among many structures such as blood vessels and neurons and are essential in studying their morphology.

Current approaches can be roughly categorized as manual, semi-automated, or automated. Since we focus on automation, we discuss the first two categories in Appendix A and briefly review existing automated techniques below. Most of them require a seed point, the root, to be specified for each connected network in the image. Starting from the roots, they then grow branches that follow local maxima of a tubularity measure that captures the likelihood of a point being on the centerline of a curvilinear structure. Depending on the search mechanism employed, existing algorithms can be subdivided into local and global ones, as discussed below.

2.1 Local Search
Local search algorithms make greedy decisions about which branches to keep in the solution based on local image evidence. They include methods that segment and skeletonize the tubularity image [20], [13], [21], [22], [23], [24], and active contour-based methods, which are initialized from it [25], [26], [5]. Such methods have been shown to be effective and efficient when a very good segmentation can be reliably obtained. In practice, however, this is hard to do. In particular, thinning-based methods often produce disconnected components and artifacts on noisy data, which then require considerable post-processing and analysis to merge into a meaningful network.

Another important class of local approaches involves explicitly delineating the curvilinear networks. It includes greedy tracking methods that start from a set of seed points and incrementally grow branches by evaluating a tubularity measure [27], [28], [29], [30], [6], [31], [9]. High-tubularity paths are then iteratively added to the solution and their end points are treated as the new seeds from which the process can be restarted. These techniques are computationally efficient because the tubularity measure only needs to be evaluated for a small subset of the image volume in the vicinity of the seeds. However, they typically
require separate procedures to detect branching points. Furthermore, due to their greedy nature, imaging artifacts and noise can produce local tracing errors that propagate. This often results in large morphological mistakes, especially when there are large gaps along the filaments.

2.2 Global Search

Global search methods aim to achieve greater robustness by computing the tubularity measure everywhere. Although this is more computationally demanding, it can still be done efficiently in Fourier space or using GPUs [32], [33], [34]. They then optimize a global objective function over a graph of high-tubularity seed points [8] or superpixels [35]. Markov Chain Monte Carlo algorithms, for instance, belong to this class [36], [37], [38]. These methods explore the search space efficiently by first sampling seed points and linking them, and then iteratively adjusting their positions and connections so as to minimize their objective function. However, while they produce smooth tree components in general, the algorithms presented in [36], [37] do not necessarily guarantee spatial connectedness of these components.

By contrast, many other graph-based methods use the specified roots to guarantee connectivity. They sample seed points and connect them by paths that follow local maxima of the tubularity measure. This defines a graph whose vertices are the seeds and whose edges are the paths linking them. The edges of the graph form an overcomplete representation of the underlying curvilinear structures, and the final step is to build a solution by selecting an optimal subset of these candidate edges.

Many existing approaches weight the edges of this graph and pose the problem as a minimum-weight tree problem, which is then solved optimally. Algorithms that find a Minimum Spanning Tree (MST) [39], [14], [7], [18] or a Shortest Path Tree (SPT) [17] belong to this class. Although efficient polynomial-time algorithms exist for both SPT- and MST-based formulations, these approaches suffer from the fact that they must span all the seed points, including some that might be false positives. As a result, they produce spurious branches when seed points that are not part of the tree structure are mistakenly detected, which happens all too often in noisy data. Of course, some of the spurious branches can be eliminated after the fact by pruning the tree. In the results section, we will compare this pruning approach to our IP one and show that the latter usually gives better results.

Our earlier k-Minimum Spanning Tree (k-MST) formulation [8] addressed this issue by posing the problem as one of finding the minimum-cost tree that spans only an a priori unknown subset of k seed points. However, the method relies on a heuristic search algorithm and two distinct objective functions, one for searching and the other for scoring, without guaranteeing the global optimality of the final reconstruction. Furthermore, it requires an explicit optimization over the auxiliary variable k, which is not relevant to the problem. By contrast, the integer programming formulation we advocate in this paper involves the minimization of a single global objective function
that allows us to link legitimate seed points while rejecting spurious ones by finding the optimum solution to within a small user-specified tolerance.

Furthermore, all the spanning tree approaches rely on the local tubularity scores to weight the graph edges. For example, global methods that rely on geodesic distances express this cost as an integral of a function of the tubularity values [40], [7]. Similarly, active contour-based methods typically define their energy terms as such integrals over the paths [18], [33]. In our own experience [8], because they involve averaging, such measures are not particularly effective at rejecting paths that are mostly on the curvilinear structures but also partially on the background. Moreover, because the scores are computed as sums of values along the path, normalizing them so that paths of different lengths can be appropriately compared is non-trivial. By contrast, the path classification approach we introduce in Section 5 returns comparable probabilistic costs for paths of arbitrary length. Furthermore, our path features capture global appearance, while being robust to noise and inhomogeneities.

Last but not least, none of the existing approaches addresses the delineation problem for loopy structures. For the cases where spurious loops seem to be present, for example due to insufficient imaging resolution in 3D stacks or due to projections of the structures on 2D [41], some of the above-mentioned methods [7], [5] attempt to distinguish spurious crossings from legitimate junctions by penalizing sharp turns and high-tortuosity paths in the solutions and by introducing heuristics to locally and greedily resolve the ambiguities. They do not guarantee global optimality and, as a result, can get trapped in local minima, as reported in several of these papers. This is what our approach seeks to address by looking for the global optimum of a well-defined objective function.
3 APPROACH
We first briefly outline our reconstruction algorithm, which goes through the following steps, depicted by Fig. 3:

1) We first compute a tubularity value at each image location $x_i$ and radius value $r_i$, which quantifies the likelihood that there exists a tubular structure of radius $r_i$ at location $x_i$. Given an $N$-D image, this creates an $(N+1)$-D scale-space tubularity volume such as the one shown in Fig. 3(b).
2) We select regularly spaced high-tubularity points as seed points and connect pairs of them that are within a given distance from each other. This results in a directed tubular graph, such as the one of Fig. 3(c), which serves as an overcomplete representation of the underlying curvilinear networks.
3) Having trained a path classifier using such graphs and ground-truth trees, we assign probabilistic weights to pairs of consecutive edges of a given graph at detection time, as depicted by Fig. 3(d).
4) We use these weights and solve an integer program to compute the maximum-likelihood directed subgraph
Fig. 3. Algorithmic steps. (a) Aerial image of a suburban neighborhood. (b) 3-D scale-space tubularity image. (c) Graph obtained by linking the seed points, which are shown in red with the path centerlines overlaid in green. (d) The same graph with probabilities assigned to edges using our path classification approach. Blue and transparent denote low probabilities, red and opaque high ones. Note that only the paths lying on roads appear in red. (e) Reconstruction obtained by our approach.
of this graph to produce a final result such as the one of Fig. 3(e).

These four steps come in roughly the same sequence as those used in most algorithms that build trees from seed points, such as [39], [7], [18], [8], but with three key differences. First, whereas heuristic optimization algorithms, such as MST followed by pruning or the k-MST algorithm of [8], offer no guarantee of optimality, our approach guarantees that the solution is within a small tolerance of the global optimum. Second, our approach to scoring individual paths using a classifier, instead of integrating pixel values as is usually done, gives us more robustness to image noise and provides peaky probability distributions, such as the one shown in Fig. 3(d), which helps ensure that the global optimum is close to the ground truth. Finally, instead of constraining the subgraph to be a tree as in Fig. 2(c), we allow it to contain cycles, as in Fig. 2(d), and penalize spurious junctions and early branch terminations. In the results section, we show that this yields improved results over very diverse datasets.
4 GRAPH CONSTRUCTION

We build graphs such as the one depicted by Fig. 3(c) in four steps. We first compute the Multi-Directional Oriented Flux tubularity measure introduced in [42]. This measure is used to assess whether a voxel lies on the centerline of a filament at a given scale. Unlike the original Oriented Flux approach [33], which relies on a circular model of the cross-sections, this measure allows for the arbitrarily-shaped ones that are prevalent in biological imagery. This is achieved by maximizing the image gradient flux along multiple directions and radii, instead of only two directions with a unique radius.

We then suppress non-maxima responses around filament centerlines using the NMST algorithm [42], which helps remove some artifacts from further consideration. More specifically, the NMST algorithm suppresses voxels that are not local maxima in the plane that is perpendicular to a local orientation estimate and within the circular neighborhood defined by their scale. It then computes the MST
of the tubularity measure and links pairs of disconnected components in the non-maxima suppressed volume with the MST paths. Next, we select maximum-tubularity points in the resulting image and treat them as graph vertices, or seeds, to be linked. They are selected greedily, at a minimal distance of $d$ to every point that was selected previously. Finally, we compute paths linking every pair of seed points within a certain distance $l(d)$ from each other using the minimal path method in the scale-space domain [40]. We take the geodesic tubular path connecting vertices $i$ and $j$ to be

$$p_{ij} = \operatorname*{argmin}_{\gamma} \int_0^1 P(\gamma(s))\, ds, \qquad (1)$$

where $P$ is the negative exponential mapping of the tubularity measure, $s \in [0, 1]$ is the arc-length parameter, and $\gamma$ is a parametrized curve mapping $s$ to a location in $\mathbb{R}^{N+1}$ [10]. As mentioned earlier, the first $N$ dimensions are spatial ones while the last one denotes the scale. The minimization relies on the Runge-Kutta gradient descent algorithm on the geodesic distance, which is computed using the Fast Marching algorithm [43].
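To make the seed selection and path linking concrete, the sketch below implements a simplified 2D version of both steps. It is a minimal illustration under stated assumptions, not our implementation: the function names are ours, the scale dimension is omitted, and Fast Marching followed by Runge-Kutta descent is replaced by a discrete Dijkstra search on a 4-connected grid with local cost $P = \exp(-\text{tubularity})$.

```python
import heapq
import numpy as np

def select_seeds(tubularity, threshold, min_dist):
    """Greedy seed selection: repeatedly take the highest-tubularity pixel
    that is at least min_dist away from all previously chosen seeds."""
    order = np.argsort(tubularity, axis=None)[::-1]
    seeds = []
    for flat in order:
        p = np.array(np.unravel_index(flat, tubularity.shape))
        if tubularity[tuple(p)] < threshold:
            break
        if all(np.linalg.norm(p - s) >= min_dist for s in seeds):
            seeds.append(p)
    return seeds

def geodesic_path(tubularity, src, dst):
    """Discrete stand-in for Eq. (1): Dijkstra shortest path on a grid,
    assuming src and dst are connected pixel-coordinate tuples."""
    P = np.exp(-tubularity)            # negative exponential mapping
    dist = np.full(P.shape, np.inf)
    dist[src] = 0.0
    parent, heap = {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst or d > dist[u]:
            if u == dst:
                break
            continue
        for du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + du[0], u[1] + du[1])
            if 0 <= v[0] < P.shape[0] and 0 <= v[1] < P.shape[1]:
                nd = d + 0.5 * (P[u] + P[v])   # trapezoidal edge cost
                if nd < dist[v]:
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
    path = [dst]                        # backtrack from dst to src
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]
```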
5 PATH CLASSIFICATION
Once the graph has been built, a key component of our algorithm, and one of the two main contributions of this paper, is our approach to assigning weights to its edges, or more specifically to pairs of consecutive edges. As can be seen in Fig. 3(d), it is designed so that only relatively few edge pairs receive high scores, which considerably simplifies the task of the IP solver used in step 4. In fact, we have checked that without this property, the solver often fails to converge. Furthermore, we will show in Section 7 that using these weights in conjunction with other approaches to finding desirable subgraphs also brings about an improvement, albeit a smaller one than using the IP solver.

A standard approach to computing such weights is to integrate a function of the tubularity values along the paths, as in Eq. 1. However, as shown in Fig. 4(a), the resulting estimates are often unreliable because a few very
Fig. 4. Scoring paths vs. integration. (a) Tubular graph of Fig. 3(c) with edge weights computed by integrating tubularity values along the paths instead of using our path classification approach. We use the same color scheme as in Fig. 3(d) to demonstrate how much less informative these weights are. (b) (Top) In one of the microscopy stacks of the DIADEM data, scoring paths by summing tubularity values results in, from left to right, shortcuts, spurious branches, and missing branches. (Bottom) Our classification approach to scoring paths yields the right answer in all three cases.

Fig. 5. Three aspects of our feature extraction process. An extended neighborhood of points around the path centerline C(s) is defined as the envelope of the cross-sectional circles. This neighborhood is divided into R radius intervals (here R = 3). Each such point x contributes a weighted vote to an angular histogram bin according to the angle between the normal ray vector N(x), emanating from the closest centerline point C(s_x), and the image gradient ∇I(x) at that point.
high values along the path might offset low values and, as a result, fail to adequately penalize spurious branches and short-cuts. Furthermore, it is often difficult to find an adequate balance between allowing paths to deviate from a straight line and preventing them from meandering too much, which results in errors such as those depicted by the top row of Fig. 4(b) when using them to find the optimal subgraph in Step 4 of our algorithm.

In this section, we propose a path-classification approach to computing probability estimates that we found to be more reliable. More specifically, given a tubular path computed as discussed in Section 4, we break it down into several segments and compute one feature vector for each based on gradient histograms. We then use an embedding approach [44] to compute fixed-size descriptors from the potentially arbitrary number of feature vectors we obtain. Finally, we feed these descriptors to a classifier and turn its output into a probability estimate. As shown in the bottom row of Fig. 4(b), this approach penalizes paths that mostly follow the tree structure but cross the background. Thus, it discourages shortcuts and spurious branches, which the integration approach along the path fails to do.

In the remainder of this section, we describe our path features, embedding scheme, and training data collection mechanism in more detail.
5.1 Histogram of Gradient Deviation Descriptors
Gradient orientation histograms have been successfully applied to detecting objects in images and recognizing actions in videos [45], [46], [44]. In a typical setup, the image is first divided into a grid of fixed-size blocks, called cells, and then, for each cell, a 1-D histogram of oriented gradients (HOG) is formed from the pixel votes within it. Histograms from neighboring cells are then combined and normalized to form features that are invariant to local contrast changes. Finally, these features are fed to a classifier to detect objects of interest. We adapt this strategy for tubular paths by defining Histogram of Gradient Deviation (HGD) descriptors as follows.
Given a tubular path $\gamma(s)$ such as the one depicted by Fig. 5, with $s$ being the curvilinear abscissa, let $C(s)$ be the centerline and $r(s)$ the corresponding radius mappings. We partition the path into equal-length overlapping segments $\gamma_i(s)$ and, for each, we compute histograms of the angles between the image gradients $\nabla I(x)$ and the normal vectors $\mathbf{N}(x)$ obtained from the points $x \in \gamma_i(s)$ within the tubular segment. To ensure that all the gradient information surrounding the path is captured, we enlarge the segments by a margin $m(s) = \beta\, r(s)$ proportional to the radius values. A fixed-size margin is not preferred, so as to avoid biased interference of background gradients for thin paths.

Furthermore, to obtain a description of the paths' cross-sectional appearance profile, we split the tubular segments into $R$ equally spaced radius intervals, as shown in Fig. 5, and create two histograms for each such interval. While the first histogram captures the strength of the gradient field inside an interval, the second one captures its symmetry about the segment centerline. Given a point $x \in \gamma_i(s)$, let $C(s_x)$ be the closest centerline point (for simplicity, we assume that such a point is unique), and let $\mathbf{N}(x)$ be the normal ray vector from $C(s_x)$ to $x$, as illustrated by Fig. 5. Each such point contributes a vote to a bin of the gradient-strength and gradient-symmetry histograms. The bin is determined by the deviation angle

$$\Psi(x) = \begin{cases} \operatorname{angle}(\nabla I(x), \mathbf{N}(x)), & \text{if } \|x - C(s_x)\| > \varepsilon, \\ \operatorname{angle}(\nabla I(x), \Pi(x)), & \text{otherwise,} \end{cases} \qquad (2)$$

where $\Pi(x)$ is the cross-sectional plane, which we use to compute the deviation angle in the special case when $x$ belongs to the centerline and the normal ray vector is not defined. The votes are weighted by $\sqrt{\|\nabla I(x)\|}$ and $\langle -\nabla I(x), \nabla I(C(s_x) - \mathbf{N}(x)) \rangle$ for the gradient-strength and gradient-symmetry histograms, respectively. We compute the radius interval and the angular bin indices for a point $x$ respectively as $\min(R-1, \lfloor R \|\mathbf{N}(x)\| / (r(s_x) + m(s_x)) \rfloor)$ and $\min(B-1, \lfloor B \Psi(x) / \pi \rfloor)$, where $B$ is the number of angular bins. For each segment, this produces $R$ pairs of histograms, one pair for each radius interval. We interpolate points within each such interval to ensure that enough votes are used to form the histograms, and we normalize each histogram by the number of points that voted for it. This yields a set of histograms for each segment, which we combine into a single HGD descriptor.
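As an illustration of how a single voxel's vote is binned, consider the following sketch. It is a minimal interpretation of Eq. 2 under stated assumptions: it handles only the off-centerline case, approximates angle(·,·) by the arccosine of the normalized dot product, and returns only the gradient-strength vote weight.

```python
import numpy as np

def hgd_bins(x, grad, c_sx, radius, margin, R=3, B=8, eps=1e-6):
    """Vote binning of Eq. (2) for a single off-centerline point x.

    x, grad, c_sx: 3-vectors (point, image gradient, closest centerline
    point C(s_x)); radius, margin: r(s_x) and m(s_x). Returns the radius
    interval, the angular bin, and the gradient-strength vote weight."""
    N = x - c_sx                                   # normal ray vector N(x)
    n_norm = np.linalg.norm(N)
    if n_norm <= eps:
        # On the centerline, Eq. (2) falls back to the cross-sectional
        # plane Pi(x); that case is omitted from this sketch.
        raise NotImplementedError
    cos_dev = np.dot(grad, N) / (np.linalg.norm(grad) * n_norm + eps)
    psi = np.arccos(np.clip(cos_dev, -1.0, 1.0))   # deviation angle Psi(x)
    r_bin = min(R - 1, int(R * n_norm / (radius + margin)))
    a_bin = min(B - 1, int(B * psi / np.pi))
    return r_bin, a_bin, np.sqrt(np.linalg.norm(grad))
```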
Fig. 6. Flowchart of our approach to extracting appearance features from tubular paths of arbitrary length.

5.2 Embedding
The above procedure produces an arbitrary number of HGD descriptors per path. To derive a fixed-size descriptor from them, we first use a Bag-of-Words (BoW) approach to compactly represent their feature space. The words of the BoW model are generated during training by randomly sampling a predefined number of descriptors from the training data. For a given path of arbitrary length, we then compute an embedding of the path's HGD descriptors into the codewords of the model. As illustrated in Fig. 6, computing the embedding amounts to finding the minimum Euclidean distance from the descriptors to each word in the model. The resulting feature vector of distances has the same length as the number of elements in the BoW model.

To account for geometry and to characterize paths that share a common section, such as the one shown in Fig. 7(a), we incorporate into these descriptors four geometry features: the maximum curvature along the centerline curve $C(s)$, its tortuosity, and its normalized lengths along the $z$ and the scale axes defined in Section 4. For a path of length $L$ in world coordinates and distance $d$ between its start and end points, these features are defined respectively as $\max_s \|\mathbf{T}'(s)\|$, $d/L$, $\Delta_z/L$, and $\Delta_r/L$, where $\Delta_z$ and $\Delta_r$ are the path lengths along the image axial dimension $z$ and the scale dimension, and $\mathbf{T}(s)$ is the unit tangent vector depicted in Fig. 5.
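The embedding itself reduces to a nearest-descriptor distance per codeword. The sketch below is a simplified stand-in in plain NumPy (the function name is ours); appending the four geometry features is indicated in a comment.

```python
import numpy as np

def embed_path(hgd_descriptors, codewords, geometry_features):
    """Embed a variable number of HGD descriptors into a fixed-size vector.

    hgd_descriptors: (n_segments, d) array, one HGD descriptor per segment.
    codewords:       (n_words, d) array, the BoW model.
    Returns a vector of length n_words + len(geometry_features)."""
    # Pairwise Euclidean distances, shape (n_segments, n_words).
    diff = hgd_descriptors[:, None, :] - codewords[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # For each codeword, keep the distance to the closest descriptor.
    embedding = dist.min(axis=0)
    # Append the geometry features (max curvature, tortuosity, etc.).
    return np.concatenate([embedding, geometry_features])
```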
5.3 Training
Given a set of training images $\{I_i\}$, the associated ground-truth tracings $\{H_i\}$ annotated manually, and tubular graphs $\{G_i\}$ obtained using the method of Section 4, we sample an equal number of positive and negative paths from each pair $\{H_i, G_i\}$. To train the path classifier, we obtain positive samples by simply sampling the ground-truth delineations $\{H_i\}$ associated with our training images. We obtain negative samples by first randomly selecting candidate paths from a tubular graph $G_i$, and then attempting to find, for each such candidate path $p_g$, a matching path $p_h$ in the corresponding ground truth $H_i$. More specifically, we find the two centerline points in $H_i$ that are closest to the start and end points of $p_g$ in the world coordinates of the image $I_i$, and then extract the shortest directed path $p_h$ connecting these points in $H_i$. Let $l(p)$ denote the centerline length of a path $p$, and $l_{p_2}(p_1)$ the length of $p_1$'s longest segment that lies outside the volume enclosed by $p_2$. We consider $p_g$ a negative example if it satisfies at least one of the following criteria:

1) $\max(l_{p_g}(p_h), l_{p_h}(p_g))$ is larger than a length threshold $t_l$, which is taken to be the minimum image spacing in our experiments.
2) The ratio $\min(l(p_h), l(p_g)) / \max(l(p_h), l(p_g))$ is smaller than a threshold, which we set to 0.75. This is to detect if $p_g$ is meandering too much with respect to $p_h$.
3) The volumetric intersection of $p_h$ and $p_g$ over their union is smaller than a threshold, taken to be 0.5 in our experiments.
We label those negatives that partially overlap with the matching paths as hard-to-classify examples, and those that do not overlap as easy-to-classify ones. In our experiments, hard-to-classify examples constitute 99 percent of all the negative examples. We choose the path lengths randomly from a probability distribution built from the consecutive edge-pair lengths of the graphs $\{G_i\}$ in the training dataset. This is achieved by first labeling the edge pairs as positive or negative using the above procedure and then constructing two separate distributions, one for each class, using the assigned labels.

Finally, at detection time, we run the path classifier on consecutive edge pairs and assign to them the resulting probabilities of belonging to the underlying curvilinear networks. In practice, as we show in Appendix B, we found a Gradient Boosted Decision Tree classifier to be better suited for this learning task than SVM classifiers with linear or RBF kernels, or random forest classifiers with oblique or orthogonal splits. It is therefore the one we use in our implementation.
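As a concrete illustration, such a classifier can be trained with scikit-learn's gradient boosting implementation. This is a sketch under stated assumptions: the feature dimensions, hyperparameters, and synthetic data below are illustrative placeholders, not the settings used in our experiments.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_words = 50                            # BoW codebook size (illustrative)
X = rng.random((1000, n_words + 4))     # embeddings + 4 geometry features
y = rng.integers(0, 2, 1000)            # 1: ground-truth path, 0: negative

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1).fit(X, y)

# At detection time, each consecutive edge pair gets a probability of
# belonging to the network; Section 6 turns it into the likelihood-ratio
# cost c_ijk = -log(p / (1 - p)) used by the integer program.
p = np.clip(clf.predict_proba(X[:5])[:, 1], 1e-6, 1 - 1e-6)
cost = -np.log(p / (1 - p))
```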
6 FORMULATING THE INTEGER PROGRAM
Formally, the graph-building procedure of Section 4 yields a graph $G = (V, E)$, whose vertices $V$ represent the seed points and whose directed edges $E = \{e_{ij} \mid i \in V, j \in V\}$ represent the geodesic tubular paths linking them. It is designed to be an overcomplete representation of the underlying network of tubular structures. In this section, we formulate the search for an optimal subgraph of curvilinear structures as an integer program. In the following, we first introduce our probabilistic model
and consider the simpler acyclic case. We then extend our approach to networks that contain loops.
Fig. 7. Considering geometric relationships between edges helps at junctions. (a) A closeup of the graph built by our algorithm at a branching point. (b) Minimizing a sum of individual path costs yields these overlapping paths. (c) Accounting for edge-pair geometry yields the correct connectivity.
6.1 Standard Formulation

Many earlier approaches to finding optimal subgraphs [39], [7], [18], [8] can be understood as maximizing an a posteriori probability given a tubularity image and, optionally, a set of meta-parameters that encode geometric relations between vertices and edge pairs. For example, in [8], the tree reconstruction problem is addressed by solving

$$\min_{t \in T(G)} \left( \sum_{e_{ij} \in E} c^a_{ij}\, t_{ij} + \sum_{e_{ij}, e_{jk} \in E} c^g_{ijk}\, t_{ij} t_{jk} \right), \qquad (3)$$

where $T(G)$ denotes the set of all trees in $G$ and $t_{ij}$ is a binary variable indicating the presence or absence of $e_{ij}$ in tree $t$. The term $c^a_{ij}$ represents the cost of an edge, which can be either negative or positive and is computed by integrating pixel-wise negative log-likelihood ratio values along the path connecting the vertices, while $c^g_{ijk}$ encodes the geometric compatibility of consecutive edges. As shown in Fig. 7, these geometric terms are key to eliminating edge sequences that backtrack or curve unnaturally.

This approach, however, has three severe shortcomings. First, because the $c^a_{ij}$ and $c^g_{ijk}$ are computed independently, they are not necessarily commensurate or consistent with each other. As a consequence, careful weighting of the two terms is required for optimal performance. Second, it disallows cycles by preventing vertices from being shared by separate branches, as is required for successful reconstruction in cases such as the one of Fig. 2. Finally, optimizing using a heuristic algorithm [47] does not guarantee a global optimum.

Nevertheless, we will show in the results section that the probabilities returned by the path classifier of Section 5 are sufficiently discriminative to produce competitive results using this kind of formulation. Furthermore, even better ones can be obtained by explicitly addressing the issues discussed above. We handle the first one by computing probability estimates, not on single edges, but on edge pairs, so that both appearance and geometry can be accounted for simultaneously. Given such estimates, we solve the remaining two problems by reformulating the reconstruction problem not as the heuristic minimization of an energy such as the one of Eq. 3, but as the solution of an IP that allows loops while penalizing the formation of spurious junctions.
6.2 Probabilistic Model

Let $F = \{e_{ijk} = (e_{ij}, e_{jk})\}$ be the set of pairs of consecutive edges in $G$. By analogy to the binary variable $t_{ij}$ of Eq. 3, let $t_{ijk}$ denote the presence or absence of $e_{ijk}$ in the curvilinear networks and $T_{ijk}$ be the corresponding hidden variable. Let also $t$ and $T$ be the sets of all $t_{ijk}$ and $T_{ijk}$ variables, respectively. Given the graph $G$ and the image evidence $I$, we look for the optimal delineation $t^*$ as the solution of

$$\begin{aligned} t^* &= \operatorname*{argmax}_{t \in P_c} P(T = t \mid I, G) \\ &= \operatorname*{argmax}_{t \in P_c} P(I, G \mid T = t)\, P(T = t) \\ &= \operatorname*{argmin}_{t \in P_c} -\log P(I, G \mid T = t) - \log P(T = t), \end{aligned} \qquad (4)$$
where we used Bayes' rule and assumed a uniform distribution over the image $I$ and the graph $G$. The binary vector $t$ belongs to the set $P_c$ of binary vectors that define feasible solutions. In the following, we present two different ways of defining $P_c$ and the two likelihood terms of Eq. 4.

6.3 Tree Reconstruction

In this section, we take $P_c$ to be the set of all directed trees $T(G)$ of $G$, and use a uniform prior, which amounts to assuming that all tree shapes are equally likely. We therefore drop the prior term $\log P(T = t)$ from Eq. 4. Furthermore, we assume conditional independence of the image evidence along the edge pairs $\{e_{ijk}\}$, given that we know whether or not they belong to the tree structure. We therefore represent the likelihood term $P(I, G \mid T = t)$ as a product of individual edge-pair likelihoods. As shown in Appendix C, this leads to
$$t^* = \operatorname*{argmin}_{t \in T(G)} \sum_{e_{ijk} \in F} -\log\left( \frac{P(T_{ijk} = 1 \mid I_{ijk}, e_{ijk})}{P(T_{ijk} = 0 \mid I_{ijk}, e_{ijk})} \right) t_{ijk} \qquad (5)$$
$$\phantom{t^*} = \operatorname*{argmin}_{t \in T(G)} \sum_{e_{ijk} \in F} c_{ijk}\, t_{ijk}, \qquad (6)$$
where $I_{ijk}$ represents the image data around the tubular path corresponding to $e_{ijk}$. The probability $P(T_{ijk} = 1 \mid I_{ijk}, e_{ijk})$ denotes the likelihood of edge pair $e_{ijk}$ belonging to the tree structure, which we compute based on the global appearance and geometry of the paths, as described in Section 5. The $c_{ijk}$ variables represent the probabilistic likelihood ratios we assign to the edge pairs. As discussed in Section 2, we found this more effective at distinguishing legitimate paths from spurious ones than more standard methods, such as those that integrate a tubularity measure along the path.

In the following, we show that optimizing the objective function of Eq. 6 with respect to the constraints $t \in T(G)$ amounts to solving a minimum arborescence problem [48] with a quadratic cost. By decomposing the indicator variable $t_{ijk}$ introduced above as the product of the two variables $t_{ij}$ and $t_{jk}$, the minimization of Eq. 6 can be reformulated as the quadratic program

$$\operatorname*{argmin}_{t \in T(G)} \sum_{e_{ij}, e_{jk} \in E} c_{ijk}\, t_{ij} t_{jk}. \qquad (7)$$
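As an aside, such products of binary variables can be handled by off-the-shelf linear IP solvers through a standard linearization, which we note here for completeness; it is one common way to obtain an equivalent linear formulation and is not spelled out in this form in the text. Introducing a variable $t_{ijk}$ for each product together with the constraints

$$t_{ijk} \le t_{ij}, \qquad t_{ijk} \le t_{jk}, \qquad t_{ijk} \ge t_{ij} + t_{jk} - 1, \qquad t_{ijk} \in \{0, 1\},$$

forces $t_{ijk} = t_{ij} t_{jk}$ at any feasible point, so the quadratic objective of Eq. 7 becomes linear in the augmented variable set.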
However, not all choices of binary values for the indicator variables give rise to a plausible delineation. The above minimization must therefore be carried out subject to the constraints $t \in T(G)$ to ensure that the solution is a connected tree. We define these constraints by adapting the network flow formulation presented in [48], which provides a compact system with a polynomial number of variables and constraints. Assuming that the root vertex $r \in V$ of the optimal tree is given, we express these constraints as

$$\sum_{j \in V \setminus \{r\}} y^l_{rj} \le 1, \quad \forall l \in V \setminus \{r\}, \qquad (8)$$
$$\sum_{j \in V \setminus \{l\}} y^l_{jl} \le 1, \quad \forall l \in V \setminus \{r\}, \qquad (9)$$
$$\sum_{j \in V \setminus \{i,r\}} y^l_{ij} - \sum_{j \in V \setminus \{i,l\}} y^l_{ji} = 0, \quad \forall l \in V \setminus \{r\},\ \forall i \in V \setminus \{r,l\}, \qquad (10)$$
$$y^l_{ij} \le t_{ij}, \quad \forall e_{ij} \in E,\ l \in V \setminus \{r,i,j\}, \qquad (11)$$
$$y^l_{il} = t_{il}, \quad \forall e_{il} \in E, \qquad (12)$$
$$y^l_{ij} \ge 0, \quad \forall e_{ij} \in E,\ l \in V \setminus \{r,i\}, \qquad (13)$$
$$t_{ij} \in \{0,1\}, \quad \forall e_{ij} \in E, \qquad (14)$$
where the $y^l_{ij}$ are auxiliary continuous variables that encode the flow from the root vertex to all the others. More specifically, $y^l_{ij}$ indicates whether the unique directed path from the root $r$ to vertex $l$ traverses the edge $e_{ij}$. If the optimal tree $t^*$ does not contain $l$, and hence no such path exists, then $y^l_{ij} = 0$. The first two constraints ensure that there can be at most one path in $t^*$ from the root to each vertex in the graph. The third one enforces conservation of flow at intermediate vertices $i \in V$. The remaining constraints guarantee that $t^*$ includes a path from the root to the vertex $l$ passing through the edge $e_{il}$ whenever $t^*$ contains $e_{il}$.

Some of our images contain several disconnected curvilinear structures. To avoid having to process them sequentially in a greedy manner, which may result in some branches being "stolen" by the first structure we reconstruct and therefore in a suboptimal solution, we connect them all. Assuming we are given a set $V_r \subset V$ of root vertices, one for each underlying curvilinear structure of interest, we create a virtual vertex $v$ and connect it to each $r \in V_r$ by zero-cost edge pairs containing all other vertices to which $r$ is connected. To reconstruct multiple disconnected structures at once, we therefore replace $r$ in the constraints of Eqs. 8-13 with $v$ and add the following constraints to the problem:
$$t_{vr} = 1, \qquad \forall e_{vr} \in E : r \in V_r, \qquad (15)$$
which ensure that all the root vertices are part of the solution.
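For illustration, the following sketch (hypothetical data structures, no solver dependency, and not the authors' code) enumerates the single-root constraints of Eqs. 8-12 as coefficient maps that could be handed to any ILP solver; Eqs. 13 and 14 are variable-domain declarations and are only noted in the comments.

```python
def flow_constraints(V, E, r):
    """Yield (coefficients, sense, rhs) triples over variables
    y[(i, j, l)] ~ y^l_ij and t[(i, j)] ~ t_ij, for vertex set V,
    directed edge set E, and root r. Domains y >= 0 (Eq. 13) and
    t binary (Eq. 14) are left to the solver declaration."""
    cons = []
    for l in V:
        if l == r:
            continue
        # Eq. 8: at most one unit of flow toward l leaves the root.
        cons.append(({('y', r, j, l): 1 for j in V
                      if j != r and (r, j) in E}, '<=', 1))
        # Eq. 9: at most one unit of flow toward l enters l.
        cons.append(({('y', j, l, l): 1 for j in V
                      if j != l and (j, l) in E}, '<=', 1))
        # Eq. 10: flow conservation at intermediate vertices i.
        for i in V:
            if i in (r, l):
                continue
            coeff = {}
            for j in V:
                if j != i and (i, j) in E and j != r:
                    coeff[('y', i, j, l)] = 1    # outgoing flow
                if j != i and (j, i) in E and j != l:
                    coeff[('y', j, i, l)] = -1   # incoming flow
            cons.append((coeff, '==', 0))
    for (i, j) in E:
        # Eq. 11: flow may use e_ij only if the edge is active.
        for l in V:
            if l not in (r, i, j):
                cons.append(({('y', i, j, l): 1, ('t', i, j): -1}, '<=', 0))
        # Eq. 12: if e_ij is active, the path to j must traverse it.
        cons.append(({('y', i, j, j): 1, ('t', i, j): -1}, '==', 0))
    return cons

print(len(flow_constraints({0, 1, 2}, {(0, 1), (1, 2)}, 0)))  # 9
```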
6.4 Loopy Reconstruction
As discussed above, allowing branches to cross produces cyclic graphs. However, in some cases, such as when delineating the neural structures of the brightfield and brainbow images of Fig. 1, we know that the underlying structure truly is a tree whose topology we will eventually want to recover. This means that we need to be able to distinguish one branch from the other. One approach would be to first recover the subgraph defined by the active edges and then attempt to assess its topology. However, to consistently enforce geometric constraints on branches, even at junctions, we do both simultaneously by reasoning in terms of whether consecutive pairs of edges belong to the final delineation or not. Fig. 8(a) depicts this approach in a specific case. Similarly, consider the vertices labeled $i$, $j$, $k$, $l$, and $m$ in the graph of Fig. 2(b). In the delineation of Fig. 2(d), the edge pairs $e_{ijk}$ and $e_{mjl}$ are both active and vertex $j$ belongs to both branches.

In practice, we seek to minimize the sum of the two negative log-likelihood terms of Eq. 4 as before, but modify them to penalize the formation of spurious junctions. In the following, we first describe these two components and then introduce the constraints that allow graph vertices to be shared among separate branches.

6.4.1 Image and Geometry Likelihood Term

Assuming conditional independence of the image evidence given the true values of the random variables $T_{ijk}$, the first term of Eq. 4 can be rewritten as
$$-\log P(I, G \mid T = t) = \sum_{e_{ijk} \in F} (c_{ijk} + w_{ijk})\, t_{ijk}, \qquad (16)$$
where $c_{ijk}$ is the likelihood ratio of Eq. 6 and $w_{ijk}$ is a learned compatibility function of the edges $e_{ij}$ and $e_{jk}$. To see this, let us consider two sets $Y$ and $Z$ of auxiliary random variables denoting, respectively, the presence of edges in the optimal solution and the compatibility of consecutive edge pairs. More specifically, let $Y = \{Y_{jk}\}$ be the vector of binary random variables indicating whether the edges $\{e_{jk}\}$ truly belong to the underlying curvilinear structures, and let $y = \{y_{jk}\}$ denote their realizations. As will be discussed in Section 6.4.3, we do not allow edges to have more than one active incoming edge pair in the solution. Hence, we have $y_{jk} = \sum_{e_{ij} \in E} t_{ijk} \le 1$. As a result, for each edge $e_{jk}$ in the solution, there can be at most one parent edge $e_{ij}$ such that $t_{ijk} = 1$. Let also $Z_{jk}$ be the random variable standing for the parent of $e_{jk}$ and $Z$ be the vector of all such variables. That is, $Z_{jk}$ can take values from the set $\{e_{ij} \mid e_{ij} \in E \setminus \{e_{kj}\}\}$. There is a one-to-one deterministic relation between $T$ and $(Y, Z)$, which we write as

$$T_{ijk} = Y_{jk}\, \mathbf{1}(Z_{jk} = e_{ij}), \qquad \forall e_{ijk} \in F, \qquad (17)$$
where $\mathbf{1}(\cdot)$ is an indicator function. We express the first term of Eq. 4 in terms of $Y$ and $Z$ as

$$P(I, G \mid T = t) = P(I, G \mid Y = y, Z = z) \qquad (18)$$
$$= \prod_{e_{jk} \in E} P(I_{jk}, E_{jk} \mid Y_{jk} = y_{jk}, Z_{jk} = z_{jk}) \qquad (19)$$
$$\propto \prod_{e_{jk} \in E} P(Z_{jk} = z_{jk} \mid Y_{jk} = y_{jk}, I_{jk}, E_{jk})\, P(Y_{jk} = y_{jk} \mid I_{jk}, E_{jk}) \qquad (20)$$
$$\propto \prod_{e_{jk} \in E} \left[ P(Z_{jk} = z_{jk} \mid Y_{jk} = 1, I_{jk}, E_{jk})\, P(Y_{jk} = 1 \mid I_{jk}, E_{jk}) \right]^{y_{jk}} \left[ P(Z_{jk} = z_{jk} \mid Y_{jk} = 0, I_{jk}, E_{jk})\, P(Y_{jk} = 0 \mid I_{jk}, E_{jk}) \right]^{1 - y_{jk}} \qquad (21)$$
$$\propto \prod_{e_{jk} \in E} \left[ \prod_{e_{ij} \in E} (p^c_{ijk})^{t_{ijk}}\, P(Y_{jk} = 1 \mid I_{jk}, E_{jk}) \right]^{y_{jk}} \left[ \frac{1}{\deg^*(v_j, v_k)}\, P(Y_{jk} = 0 \mid I_{jk}, E_{jk}) \right]^{1 - y_{jk}} \qquad (22)$$
$$\propto \prod_{e_{jk} \in E} \prod_{e_{ij} \in E} (p^c_{ijk})^{t_{ijk}} \left[ \frac{p^q_{jk}}{1 - p^q_{jk}}\, \deg^*(v_j, v_k) \right]^{\sum_{e_{ij} \in E} t_{ijk}}, \qquad (23)$$
Fig. 8. Reasoning about edge-pairs. (a) A loopy graph with a root vertex $r$, in red. Allowing vertex $c$, in green, to be used by the two different branches, denoted by blue and yellow arrows, produces a loopy solution instead of a tree. However, describing the crossing in terms of the edge pairs $e_{icj}$ and $e_{kcl}$ being active, and $e_{kcj}$ and $e_{icl}$ being inactive, makes it possible to eventually recover the correct tree topology. (b) Handling overlapping edges. (c) Enforcing crossover consistency.

where $E_{jk}$ denotes the edges containing $e_{jk}$ and its incoming edges in the graph, and $I_{jk}$ denotes the image evidence around these edges. Eq. 19 is obtained under the assumption that the image evidence around edge pairs is conditionally independent given that we know whether or not they belong to the underlying curvilinear structures. In Eq. 20, we use Bayes' rule and remove the constant terms $P(I_{jk}, E_{jk})$, assuming a uniform prior $P(Y_{jk} = y_{jk}, Z_{jk} = z_{jk})$. We derive Eqs. 21 and 22 by using the fact that $y_{jk}, t_{ijk} \in \{0, 1\}$ and by substituting $P(Z_{jk} = e_{ij} \mid Y_{jk} = 1, I_{jk}, E_{jk})$ and $P(Z_{jk} = z_{jk} \mid Y_{jk} = 0, I_{jk}, E_{jk})$ with $p^c_{ijk}$ and $\frac{1}{\deg^*(v_j, v_k)}$, respectively. Here, $\deg^*(v_j, v_k)$ denotes the number of in-edges of vertex $v_j$, excluding the edge $e_{kj}$ if it exists, and the $p^c_{ijk}$ are normalized over the incoming edge pairs so that

$$\sum_{e_{ij} \in E} p^c_{ijk} = 1. \qquad (25)$$

Finally, in Eq. 23, we drop the constant terms, express the $y_{jk}$ in terms of the $t_{ijk}$, and substitute $P(Y_{jk} = 1 \mid I_{jk}, E_{jk})$ with $p^q_{jk}$. Taking the negative logarithm of Eq. 23 yields Eq. 16. In our experiments, this probability is expressed as a sigmoid function of a distinctive feature, such as the directional deviation along $e_{ijk}$, which helps recover the right connectivity at crossovers such as the one of Fig. 2.
6.4.2 Prior Term

We take $-\log P(T = t)$, the second likelihood term of Eq. 4, to be a junction prior that penalizes unwarranted bifurcations or terminations, which we write as

$$-\log P(T = t) = \lambda \sum_{e_{mi} \in E} \sum_{e_{ij} \in E} \sum_{e_{jn} \in E} t_{mij}\, t_{ijn} \;+\; \gamma \sum_{e_{ij} \in E} \sum_{e_{jn} \in E} t_{ijn} \;+\; \delta \sum_{e_{ij} \in E} \sum_{e_{jk} \in E} \sum_{e_{jn} \in E} t_{ijk}\, t_{ijn}.$$

$$c_{ijk} > 0, \qquad \forall e_{ijk} \in F : i \in V \setminus R,\ k \in V \setminus R, \qquad (D.2)$$
$$c_{ijk} < 0, \qquad \forall e_{ijk} \in F : i \in V \setminus R,\ k \in V \setminus R. \qquad (D.3)$$
We create a new edge $e_{ik}$ for every merged edge pair $e_{ijk}$, and remove the center vertex $j$ and the edges $e_{ij}$, $e_{jk}$ from the graph. We assign the cost value $c$ of a merged edge pair $e_{ijk}$ to its newly created edge $e_{ik}$. The objective function term for this cost value then becomes $\sum_{m \in V : e_{mi} \in E} c\, t_{mik}$. Note that the first three reduction procedures introduced above do not compromise global optimality. That is, the global optimum of the reduced problem is equivalent to that of the original one. In contrast, the edge-merger procedure we introduce here does not necessarily provide such an optimality guarantee. In practice, however, we did not observe connectivity issues in our solutions, mainly because of the high accuracy of the path classifier (an average TPR of 97% at an FPR of 1%), as demonstrated by Fig. 10(b).
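The following toy sketch illustrates this merger on a hypothetical graph representation (a set of directed edges plus a map of edge-pair costs); it is our own illustration, not the paper's implementation.

```python
def merge_edge_pair(edges, pair_costs, i, j, k):
    """Replace e_ij, e_jk by a single edge e_ik carrying the cost of e_ijk,
    dropping the center vertex j from the pair."""
    c = pair_costs.pop((i, j, k))
    edges.discard((i, j))
    edges.discard((j, k))
    edges.add((i, k))
    # Every pair e_mik formed with the new edge inherits the merged cost,
    # matching the objective term sum over m with e_mi in E of c * t_mik.
    for (m, n) in list(edges):
        if n == i:
            pair_costs[(m, i, k)] = c
    return edges, pair_costs

edges = {(0, 1), (1, 2), (3, 0)}
pair_costs = {(0, 1, 2): -2.0}
print(merge_edge_pair(edges, pair_costs, 0, 1, 2))
# ({(0, 2), (3, 0)}, {(3, 0, 2): -2.0})
```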
APPENDIX E: OPTIMAL PRUNING OF A MINIMUM SPANNING TREE
Let $G = (V = \{v_i\}, E = \{e_{ij}\})$ be the tubular graph of Section 6, with vertices $V$ and directed edges $E$. If we assign to each edge $e_{ij}$ the weight $-\log \frac{p_{ij}}{1 - p_{ij}}$, a directed minimum spanning tree $t_{MST} = (V, E_{MST} = \{e_{ij}\} \subset E)$ will span all the vertices while minimizing the energy of Eq. 32 [22]:

$$\sum_{e_{ij} \in E} -\log \frac{p_{ij}}{1 - p_{ij}}\, t_{ij}. \qquad (E.1)$$
In practice, some of the vertices correspond to false detections that really belong to the background. One way to eliminate them is to prune $t_{MST}$, that is, to remove some edges while preserving the tree topology. To guarantee that the results are as comparable as possible to those produced by the k-MST and IP algorithms, we find the optimal connected subset of edges $E_{PRU} \subset E_{MST}$ that represents the pruned tree:

$$\operatorname*{argmin}_{E_{PRU} \subset E_{MST}} \sum_{e_{ij} \in E_{MST}} -\log \frac{p_{ij}}{1 - p_{ij}}\, t_{ij}. \qquad (E.2)$$
In other words, we find the subtree that minimizes the same energy functional as Eq. 32. To this end, we exploit the fact that the cost of an optimal subtree rooted at $v_i$ can be written as a recursive function on the vertices:

$$w(v_i) = \sum_{v_j \in V :\, e_{ij} \in E_{MST}} \min\!\left( -\log \frac{p_{ij}}{1 - p_{ij}} + w(v_j),\ 0 \right), \qquad (E.3)$$
where the cost at a terminal vertex with zero out-degree is equal to zero. For every edge $e_{ij} \in E_{MST}$ outgoing from vertex $v_i$, the algorithm checks whether it should be kept or removed from the solution. In the first case, we add the edge weight and the cost $w(v_j)$ of the optimal subtree rooted at $v_j$; in the second case, the contribution to the cost is zero. All the $w(v_i)$ values are computed recursively in this way, starting from the root, while labeling the outgoing edges of each vertex as retainable or not. Once the edges are labeled, the final solution is taken to be the largest connected set of retained edges reachable from the root. This process is illustrated by Fig. 2.
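Under the assumption that the tree is stored as an adjacency map from each vertex to its children and edge weights $w_{ij} = -\log \frac{p_{ij}}{1 - p_{ij}}$, the recursion of Eq. E.3 and the subsequent reachability pass can be sketched as follows (illustrative names, not the authors' code).

```python
def prune_mst(children, root):
    """Return the optimal subtree cost w(root) and the retained edges."""
    cost = {}
    keep = set()

    def subtree_cost(v):
        total = 0.0
        for u, w_vu in children.get(v, {}).items():
            contribution = w_vu + subtree_cost(u)
            # Keep the edge only if it lowers the energy: min(. , 0) in Eq. E.3.
            if contribution < 0:
                keep.add((v, u))
                total += contribution
        cost[v] = total  # zero at terminal vertices, as required
        return total

    subtree_cost(root)

    # Final solution: retainable edges still reachable from the root.
    retained, stack = set(), [root]
    while stack:
        v = stack.pop()
        for u in children.get(v, {}):
            if (v, u) in keep:
                retained.add((v, u))
                stack.append(u)
    return cost[root], retained

# Toy tree: a positive-weight edge is still kept when its subtree pays off.
children = {0: {1: 2.0, 2: -3.0}, 1: {3: -5.0}}
print(prune_mst(children, 0))  # (-6.0, {(0, 1), (1, 3), (0, 2)})
```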
Fig. 2: A directed minimum spanning tree rooted at the bottom vertex. The numbers on the edges represent edge weights, that is, the negative log-likelihood ratios, and the numbers on the vertices represent the optimal costs $w(v_i)$ of their subtrees. Edges marked as retainable are represented by solid arrows. The remaining edges appear as dashed arrows. The red edges constitute the optimal subtree. Note that not all retainable edges are in fact retained in the final solution. Also, not all of the pruned edges have positive weights, and not all of the retained ones have negative weights. In this specific example, the weight of the initial minimum spanning tree is -6 and the weight of the optimal subtree is -13.
REFERENCES

[1] J. C. Fiala, "Reconstruct: A Free Editor for Serial Section Microscopy," Journal of Microscopy, vol. 218, pp. 52-61, 2005.
[2] M. Brown, R. Szeliski, and S. Winder, "Multi-Image Matching Using Multi-Scale Oriented Patches," in Conference on Computer Vision and Pattern Recognition, 2005.
[3] D. R. Myatt, T. Hadlington, G. A. Ascoli, and S. J. Nasuto, "Neuromantic - from Semi Manual to Semi Automatic Reconstruction of Neuron Morphology," Frontiers in Neuroinformatics, vol. 6, no. 4, 2012.
[4] Bitplane, "Imaris: 3D and 4D Real-Time Interactive Image Visualization," 2013, http://www.bitplane.com/go/products/imaris/.
[5] E. Meijering, M. Jacob, J.-C. F. Sarria, P. Steiner, H. Hirling, and M. Unser, "Design and Validation of a Tool for Neurite Tracing and Analysis in Fluorescence Microscopy Images," Cytometry Part A, vol. 58A, no. 2, pp. 167-176, April 2004.
[6] Z. Fanti, F. F. De-Miguel, and M. E. Martinez-Perez, "A Method for Semiautomatic Tracing and Morphological Measuring of Neurite Outgrowth from DIC Sequences," in Engineering in Medicine and Biology Society, 2008, pp. 1196-1199.
[7] M. Longair, D. A. Baker, and J. D. Armstrong, "Simple Neurite Tracer: Open Source Software for Reconstruction, Visualization and Analysis of Neuronal Processes," Bioinformatics, vol. 27, no. 17, pp. 2453-2454, 2011.
[8] R. Srinivasan, Q. Li, X. Zhou, J. Lu, J. Lichtman, and S. T. C. Wong, "Reconstruction of the Neuromuscular Junction Connectome," Bioinformatics, vol. 26, no. 12, pp. 64-70, 2010.
[9] H. Peng, Z. Ruan, D. Atasoy, and S. Sternson, "Automatic Reconstruction of 3D Neuron Structures Using a Graph-Augmented Deformable Model," Bioinformatics, vol. 26, no. 12, pp. 38-46, 2010.
[10] H. Peng, F. Long, and G. Myers, "Automatic 3D Neuron Tracing Using All-Path Pruning," Bioinformatics, vol. 27, no. 13, pp. 239-247, 2011.
[11] Y. Wang, A. Narayanaswamy, and B. Roysam, "Novel 4D Open-Curve Active Contour and Curve Completion Approach for Automated Tree Structure Extraction," in Conference on Computer Vision and Pattern Recognition, 2011, pp. 1105-1112.
[12] S. Wearne, A. Rodriguez, D. Ehlenberger, A. Rocher, S. Henderson, and P. Hof, "New Techniques for Imaging, Digitization and Analysis of Three-Dimensional Neural Morphology on Multiple Scales," Neuroscience, vol. 136, no. 3, pp. 661-680, 2005.
[13] P. Chothani, V. Mehta, and A. Stepanyants, "Automated Tracing of Neurites from Light Microscopy Stacks of Images," Neuroinformatics, vol. 9, pp. 263-278, 2011.
[14] H. Cuntz, F. Forstner, A. Borst, and M. Häusser, "One Rule to Grow Them All: A General Theory of Neuronal Branching and Its Practical Application," PLoS Computational Biology, vol. 6, no. 8, p. 1000877, 2010.
[15] H. Xiao and H. Peng, "APP2: Automatic Tracing of 3D Neuron Morphology Based on Hierarchical Pruning of a Gray-Weighted Image Distance-Tree," Bioinformatics, vol. 29, no. 11, pp. 1448-1454, 2013.
[16] Y. Wang, A. Narayanaswamy, C. Tsai, and B. Roysam, "A Broadly Applicable 3D Neuron Tracing Method Based on Open-Curve Snake," Neuroinformatics, vol. 9, no. 2-3, pp. 193-217, 2011.
[17] E. Turetken, F. Benmansour, and P. Fua, "Semi-Automated Reconstruction of Curvilinear Structures in Noisy 2D Images and 3D Image Stacks," EPFL-182839, Tech. Rep., 2013.
[18] D. Wang, R. Lagerstrom, C. Sun, L. Bishof, P. Valotton, and M. Götte, "HCA-Vision: Automated Neurite Outgrowth Analysis," Journal of Biomolecular Screening, vol. 15, no. 9, pp. 1165-1170, 2010.
[19] M. Pool, J. Thiemann, A. Bar-Or, and A. E. Fournier, "NeuriteTracer: A Novel ImageJ Plugin for Automated Quantification of Neurite Outgrowth," Journal of Neuroscience Methods, vol. 168, no. 1, pp. 134-139, 2008.
[20] MBF Bioscience, "Neurolucida: Microscope Systems for Stereology and Neuron Morphology," 2013, http://www.mbfbioscience.com/neurolucida/.
[21] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas, "TrakEM2 Software for Neural Circuit Reconstruction," PLoS One, vol. 7, no. 6, p. 38011, 2012.
[22] Y. Chu and T. Liu, "On Shortest Arborescence of a Directed Graph," Scientia Sinica, vol. 14, no. 10, p. 1396, 1965.