The Fast Tracker Processor for Hadron Collider Triggers

A. Annovi, M. G. Bagliesi, A. Bardi, R. Carosi, M. Dell’Orso, M. D’Onofrio, P. Giannetti, G. Iannaccone, F. Morsani, M. Pietri, and G. Varotto
Abstract—Perspectives for precise and fast track reconstruction in future hadron collider experiments are addressed. We discuss the feasibility of a pipelined, highly parallel processor dedicated to the implementation of a very fast tracking algorithm. The algorithm is based on the use of a large bank of pre-stored combinations of trajectory points, called patterns, for extremely complex tracking systems. The CMS experiment at LHC is used as a benchmark. Tracking data from the events selected by the level-1 trigger are sorted and filtered by the Fast Tracker processor at an input rate of 100 kHz. This data organization allows the level-2 trigger logic to reconstruct full resolution tracks with transverse momentum above a few GeV and search for secondary vertices within typical level-2 times.

Index Terms—Associative memories, data acquisition, field programmable gate arrays (FPGAs), nuclear physics, parallel processing, particle tracking, pattern recognition, pipeline processing, triggering.
I. INTRODUCTION
IN THIS PAPER, we describe the implementation of the Fast Tracker (FTK) [1], a highly parallel processor dedicated to the efficient execution of a fast track-finding algorithm [2], based on the idea of a large bank of precomputed hit patterns [3]. We estimate the size of the hardware necessary to apply this technique to very complex tracking systems. The CMS experiment at LHC [4], [5] is used as a benchmark. The proposed system is an evolution of the silicon vertex tracker (SVT) [6] currently being built for the CDF experiment. The SVT reconstructs tracks in real time with sufficient precision to measure, for instance, quark decay vertices. The pattern recognition algorithm consists of associating fired detector channels, called hits, to track candidates with low spatial resolution, called roads. The tracks are then fitted and their parameters precisely determined. The FTK processor executes the first step, road finding, while the second step, track fitting, can be executed by any kind of level-2 trigger logic fast enough to work in pipeline with the FTK. The location of the FTK with respect to the data acquisition (DAQ) data flow is sketched in Fig. 1.
Fig. 1. The FTK processor has access to the tracker data of level-1 selected events on their way through the DAQ system. The FTK performs substantial data reduction by selecting high-P_T track candidates and organizes hits into standard DAQ buffer memories.
The FTK has access to the whole amount of tracking data at a very high rate (up to 10^5 events/s) in order to perform data reduction for trigger applications. Hits on roads with transverse momentum (P_T) above a threshold of a few GeV/c can be filtered from a huge number of other hits and written into memory banks accessible by the level-2 trigger logic. Of course, pattern recognition need not necessarily be completed by the FTK, since some ambiguities can be resolved at level-2, e.g., by fitting and comparing all residual hit combinations. For the sake of simplicity, in this paper we will refer to the level-2 procedure as the track fitting phase, although it might include completion of the pattern recognition inside the road.

II. THE FTK PROCESSOR ARCHITECTURE

The FTK consists of two main cooperating parts: the Data Organizer (DO) and the pipelined Associative Memory (AM). The Data Organizer [7] interfaces the FTK with the DAQ system. It performs the following tasks (a toy sketch of this bookkeeping follows the list):
1) receives full resolution detector hits and buffers them in an internal structured database;
2) sends low resolution hits, called Super Bins (SB), to the AM. The Super Bins are obtained by logically ORing a number of adjacent detector bins;
3) receives roads back from the AM and fetches, from the internal database, all detector hits contained in the roads;
4) sends each road, with its set of full resolution hits, to a memory accessed by the level-2 trigger logic.
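To make tasks 1)–4) concrete, the following Python sketch models the DO bookkeeping: full-resolution hits are buffered, their Super Bins are forwarded, and the hits inside a returned road are fetched back. The class name, the integer-division bin mapping, and the width BINS_PER_SUPERBIN are illustrative assumptions, not the actual board logic.

```python
from collections import defaultdict

# Illustrative Super-Bin width: how many adjacent detector bins are
# logically ORed into one low-resolution Super Bin (a free parameter here).
BINS_PER_SUPERBIN = 8

class DataOrganizer:
    """Toy model of the DO: buffers full-resolution hits, emits Super Bins
    to the AM, and returns the full-resolution hits contained in a road."""

    def __init__(self):
        # layer -> super bin -> list of full-resolution hit channels
        self.db = defaultdict(lambda: defaultdict(list))

    def receive_hit(self, layer, channel):
        sb = channel // BINS_PER_SUPERBIN   # coarse graining = OR of adjacent bins
        self.db[layer][sb].append(channel)  # task 1: buffer at full resolution
        return (layer, sb)                  # task 2: what is sent to the AM

    def fetch_road(self, road):
        """Tasks 3 and 4: road is a sequence of (layer, super_bin) pairs
        returned by the AM; give back every full-resolution hit inside it."""
        return {(layer, sb): self.db[layer][sb] for layer, sb in road}

do = DataOrganizer()
for layer, channel in [(0, 1013), (1, 1027), (0, 1015)]:
    do.receive_hit(layer, channel)
print(do.fetch_road([(0, 1013 // BINS_PER_SUPERBIN)]))  # hits 1013 and 1015
```

In the real board this database is a structured memory traversed by finite state machines; the dictionary-of-lists above only mirrors the data relationships.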
The pipelined Associative Memory implements the algorithm that finds roads. The AM is a dedicated device in which parallelism is pushed to the maximum level, since each stored hit pattern is provided with the necessary logic to compare itself with the event. The AM pattern bank is limited by the size of the hardware, which mainly consists of low-density content-addressable custom memories. Pattern banks for the tracking detectors of the next-generation hadron colliders are very large. As an example, we estimate the bank size for the barrel of the CMS experiment (see Section IV-B). We propose to use four independent AMs, each one working on a fourth of the barrel. The azimuthal segmentation generates some inefficiency at sector boundaries (see Section IV-D), but simplifies the collection of all tracking data at an event rate of 100 kHz. In fact, the detector area served by a bank is limited by the AM bandwidth. In this respect, the FTK has a bandwidth a factor of 10 larger than that of the SVT at CDF [1]. The FTK processor increases the data flow rate by exploiting the parallel readout of the detector layers. Data are fed into each AM on six parallel buses at a rate of 108 bits (six 18-bit words) every 25 ns, for a total of about 4 Gbit/s. We have estimated the input rates [1] for the silicon detector layers of the CMS barrel. Numbers are calculated for the detector occupancies published in the CMS Tracker TDR [5]. Even in the worst case of a 100-kHz level-1 event rate, the average time available to transfer each hit is approximately 25 ns, corresponding to 40 million hits per second on each fourth of a layer. In order to ensure scalability of the architecture, the whole AM bank associated with each detector sector is a pipeline of AM boards: hit data from the related sector feed the AM boards one after the other. In this way, boards can simply be removed from or added to the bank to tailor its size, with the only drawback of changing the data latency.

III. HARDWARE IMPLEMENTATION

A. The FTK Heart and Brain: DO-Board and AM-Board

Both the Data Organizer and the Associative Memory are composed of a set of 9U VME boards: the DO-boards and the AM-boards. Each DO-board receives hits from one or two detector layers and sends them, with the properly reduced resolution, on a single bus to the AM. The AM-board can receive up to six independent buses, either from six DO-boards or from an upstream AM-board, to perform pattern recognition on up to twelve layers. We implemented the DO- and AM-boards with programmable chips (CPLDs and FPGAs from Xilinx [8]). Each board is synchronized by an internal clock signal and works at up to 40 MHz. Each board input is provided with a synchronous FIFO with independent clocks for read and write operations, to allow asynchronous communication between boards. The realization of both boards has represented a significant technological challenge.
1) The challenge for the AM-board comes from the necessity of packing as many patterns as possible and distributing all detector hits to all patterns. A 9U VME board can support 128 AM chips, whose pattern capacity
strongly depends on technological choices, as described in Section III-B. For details on board construction, see [9].
2) The DO-board is characterized by a large amount of complex logic: data flow in a long pipeline under the control of several cooperating finite state machines, with a lot of auxiliary logic that manipulates data on the fly. We used the XC9500XV family for complex equations and finite state machines, while we chose the VIRTEX family for register-rich logic (e.g., the data pipeline). For details on board construction, see [7].

B. Packing Patterns Inside a 9U VME Board

In order to estimate the hardware size of the AM bank, we have considered two possible implementations: one based on a commercial low-cost FPGA family (Xilinx Spartan, 0.35-µm process) [8] and the other based on standard cell ASICs [10]. For both approaches, we assume a fully modular architecture in which each module stores a single trajectory (road) and contains the logic needed to compare the coordinates of all fired detector channels with those associated with the stored trajectory. All modules in each AM chip receive, from six input buses, the complete configuration of fired detector channels of each event. We consider that a single trajectory consists of an 18-bit word for each of 12 layers. The most significant bit identifies the layer on each bus. Using the 0.35-µm FPGA family mentioned above, 56 patterns can be fit on a chip, and 128 chips with the PQ208 package can be fit on a 9U VME board, for a total of 7168 patterns per board. This is obtained with a careful mapping of the logical functions onto the FPGA, with 95% of the available Configurable Logic Blocks (CLBs) actually used. For details on the chip design and layout, see [9].

Estimates for the year 2005 (the scheduled starting date for the next-generation hadron collider experiments) can be made on the basis of data from the FPGA manufacturer and from the 1998 edition of the International Technology Roadmap for Semiconductors [11]. According to Xilinx, in 2005 the low-end family will be based on a 0.13-µm process. Assuming that CLB area scales with technology, and that again 95% of the CLBs will be used, the pattern density grows by roughly (0.35/0.13)^2 ≈ 7, so a density of about 5 × 10^4 patterns per board will be reached.

As far as the ASIC implementation is concerned, according to the ITRS [11], in 2005 the 0.1-µm process will be available for ASICs. We use as a basis for our estimate the architecture proposed in [10]: 3000 patterns (eight 12-bit layers for each pattern) are served by a single input bus and stored in an area of 1 cm² using the ALCATEL-MIETEC CMOS 0.35-µm technology. We rescale this number for an architecture with six 18-bit input buses for twelve layers. We estimate that a chip in the PQ208 package (which still allows 128 chips to be loaded onto one AM board) can contain, in 100-nm technology, enough patterns to reach a per-board density more than an order of magnitude beyond the FPGA figure.

The FPGA approach, though rather limited in terms of performance and pattern density, will be very useful for prototyping and for testing the complete architecture of the system. Close to the starting date of the actual experiment, one can switch to
an ASIC implementation based on the most recent technology process, thereby increasing the packing density by more than one order of magnitude and reducing the hardware to a reasonably manageable size.

C. Crate Layout

Fig. 2. Sketch of the FTK VME crate.

Fig. 2 shows a possible organization of the FTK inside a VME crate. The main sections are the sets of DO-boards and AM-boards. Up to six DO-boards are expected to transmit hit data to the AM pipeline on six independent buses (SB buses). However, the number of DO-boards in the system can be larger: each detector layer used by the level-2 track fitters must send data to the Data Organizer, for linking to matching roads and formatting, even if these data are unnecessary to the AM. A total of 12 DO-boards can handle up to 24 detector layers. Each DO-board sends the "road packets" to the TRK output bus (see Fig. 1). The road packets contain the road identification and its full resolution hits. The back of the DO-board is dedicated to the connection with the DAQ [7]. Flat cables on the front panel are used for the SB bus to the AM and for the TRK bus to the external memory. The six SB buses are received on the front panel by a simple board (the DO-AM interface) that transfers them onto the dedicated crate backplane for clean propagation to the AM [9]. The TRK buses, one from each DO-board, are received by the Ghost Buster board, which merges the DO outputs into a single stream. Roads from the last board of the AM pipeline are broadcast to all DO-boards on a dedicated backplane bus, using user-defined pins of the VME P2 connector [9].

IV. TRACK-FINDING PERFORMANCE

A. The FTK and Experiment Simulation

The performance of the system has been studied using the CMS central detector [5] (barrel) as a benchmark, but it is important to stress that the design fits into any hadron collider trigger. Taking into account the symmetry of the detector, only an azimuthal sector of 90° for z > 0 is considered. Therefore, all results reported in the following sections are relative to 1/8 of the whole barrel.

A stand-alone simulation program has been used to generate tracks in the CMS detector. It takes into account effects such as multiple scattering, ionization energy loss, nonuniformity of the magnetic field, detector inefficiencies, and resolution smearing. It does not take into account strong interactions of hadrons with the tracker material, whose main effect is the disappearance of the particle. Although this effect influences the performance of the tracking detector [5], it is not expected to introduce relevant changes in the evaluation of the track fitting algorithm. Detector hits are produced from tracks generated in two different ways.
• Low-luminosity sample: Standard Model Higgs events (Htt processes) were simulated with Pythia version 6.125 [12] for a Higgs mass of 120 GeV/c². Random hits were added to the event to take into account the detector noise and two minimum bias events, the average number of events that overlap the hard scattering in the LHC low-luminosity run. Htt events have an average of 1000 tracks in the barrel, of which 120 have P_T above 2 GeV/c, distributed in very energetic jets. This is an example of very crowded events, in which pattern recognition could be particularly difficult.

• High-luminosity sample: 360 energetic tracks per event, with P_T above 5 GeV/c, were uniformly generated in the whole barrel. Random hits were added to take into account detector noise, as well as 30 minimum bias pile-up events, more than the average number of soft collisions per beam crossing in the LHC high-luminosity run. Thirty minimum bias events generate more than 1000 tracks in the CMS tracker [13], all with low P_T. This type of event is more complex than most of those that will occur in the real experiment and can be considered an upper-limit case. It represents a severe test for the online track-finding project.

It is to be stressed that most real events flowing through the system will have a structure much simpler than the two simulated samples [5], with a small number of high-P_T tracks. For these events, the FTK filters out much more than 90% of the hits [13]. The event primary vertex is smeared using a Gaussian distribution whose width is of the order of 1 mm in the transverse plane and of a few cm along the beam line, consistent with the luminosity region described in Section IV-B. Only eight cylindrical layers are used to find tracks. Two different choices of layer sets have been considered. The first set is composed of four silicon layers linked to the four most internal Micro Strip Gas Chamber (MSGC) layers; the second set includes two pixel layers linked to four silicon layers and two internal MSGC layers. Table I shows the distance from the beam line of all used layers. The second choice is to be preferred in practice, since it includes more internal layers, which are more efficient in hadron detection [5]. However, from the point of view of the track fitting performance, the two layer selections gave very similar results. In the following, we report only results for the second configuration. We require six fired layers out of eight to include a hit combination in the track candidate list; allowing for two missing points reduces the effect of detector inefficiencies (a toy sketch of this majority matching follows).
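In the AM hardware, every stored pattern compares itself with the event in parallel; the six-of-eight requirement then maps onto a majority threshold in each pattern's match logic. The following Python sketch (sequential for clarity, with toy super-bin values) illustrates the matching rule; it is a functional model, not the chip design of [9].

```python
MIN_FIRED_LAYERS = 6   # six fired layers out of eight: two missing points allowed

def matching_roads(bank, event_superbins):
    """bank: list of 8-tuples of super-bin ids, one entry per layer.
    event_superbins: set of (layer, super_bin) pairs present in the event."""
    roads = []
    for pattern in bank:                     # in hardware: all patterns at once
        fired = sum((layer, sb) in event_superbins
                    for layer, sb in enumerate(pattern))
        if fired >= MIN_FIRED_LAYERS:
            roads.append(pattern)
    return roads

bank = [(5, 7, 9, 12, 14, 17, 20, 23)]                   # one stored road
event = {(l, sb) for l, sb in enumerate((5, 7, 9, 12, 14, 17, 20, 23))}
event.discard((3, 12))      # one detector inefficiency: the road still fires
print(matching_roads(bank, event))
```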
TABLE I
DISTANCE OF THE USED LAYERS FROM THE BEAM LINE. FOR EACH LAYER WE SHOW IN WHICH OPTION IT IS USED
The simulation program performs two subsequent steps to reconstruct the event, in order to reproduce the behavior of the hardware procedure:
• Road finding: all roads are found by comparing the event with pre-stored patterns, simulating the FTK processor;
• Track fitting: all found roads are processed to find the best track parameter values and to reject the fake ones. A test is performed on every track candidate inside each road to reject the combinatorial background, and the track parameters are calculated. This is achieved using linear approximations for the track constraints and principal component analysis [14]. The precision of the method has been checked under many conditions [15] and found to be comparable with the full offline resolution, with the additional advantage that the calculation is very fast.

B. Pattern Bank Size

In principle, the pattern bank may contain all possible tracks that go through the detector (a 100% efficient bank). In practice, one should also consider effects that make particles deviate from the ideal trajectory, such as detector resolution smearing, multiple scattering, etc. Those effects generate a huge number of extremely improbable patterns, which blow up the bank size. For this reason, we decided to use a bank that is partially inefficient. We generate tracks in the detector and store the new patterns corresponding to the generated tracks, until the bank reaches the desired efficiency. This procedure is slow, but the computation is done only once. It automatically ensures that high-probability patterns are stored and low-probability patterns are left out. A reference bank efficiency has been conventionally fixed at 90% (a short sketch of this procedure is given below).
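A minimal Monte Carlo sketch of this bank-building loop follows; the track generator, the toy geometry, and the sliding-window efficiency estimate are our illustrative assumptions.

```python
import random
from collections import deque

def build_bank(generate_track, target_efficiency=0.90, window=1000):
    """Store the pattern of every generated track until the fraction of the
    last `window` tracks already covered by the bank reaches the target."""
    bank = set()
    recent = deque(maxlen=window)     # 1 if the track was already covered
    covered = 0
    while True:
        pattern = generate_track()    # must return a tuple of super bins
        hit = int(pattern in bank)
        if len(recent) == window:
            covered -= recent[0]      # value about to fall out of the window
        recent.append(hit)
        covered += hit
        bank.add(pattern)
        if len(recent) == window and covered / window >= target_efficiency:
            return bank

# Placeholder generator: uniform random tracks over a tiny toy geometry.
toy_track = lambda: tuple(random.randrange(20) for _ in range(4))
print(len(build_bank(toy_track, target_efficiency=0.5)))
```

Because frequent patterns recur early, the loop stores high-probability patterns first and converges without ever enumerating the improbable ones.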
Fig. 3. Bank efficiency (%) as a function of the bank size for P_T thresholds of 2 GeV/c (triangles), 5 GeV/c (squares), and 10 GeV/c (circles). The SB sizes are 1 mm in the Si detector and 3 mm in the MSGC detector.
The generated track typology also affects the bank size. It is very convenient to restrict the range of the generated track parameters, such as P_T, and the region where they come from (the luminosity region) to the values relevant for the physical processes to be studied. We assume a cylindrical luminosity region, circular in the transverse plane with a radius of 1 mm and 3 cm long in the longitudinal direction. This restriction helps to keep the size of the pattern bank small, but reduces to zero the efficiency for tracks coming from long-lived particles. However, B-meson decay products, whose impact parameters are a few hundred microns, are compatible with such a luminosity region and pattern bank. The track P_T threshold is an important parameter, since it influences the efficiency in collecting interesting events. We would like to keep this threshold reasonably low. Three possible values, 2, 5, and 10 GeV/c, have been used to evaluate the size of the corresponding pattern banks.

The SB size is another parameter to be carefully studied, since it is critical for both the processor performance and the pattern bank size. It should scale roughly with the detector resolution; therefore, in our study we use different values for the silicon (resolution 15 µm) and MSGC (resolution 40 µm) detectors. Three options have been considered for the SB widths:
1) 1 mm in the Si detector and 3 mm in the MSGC;
2) 2 mm in the Si detector and 5 mm in the MSGC;
3) 5 mm in the Si detector and 10 mm in the MSGC.
The segmentation in the z direction is the same for the Si detector and the MSGC, and amounts to 8 cells of 12.5 cm. The simulated detector covers the barrel pseudorapidity region.

The size of the pre-computed pattern bank has been evaluated for the above options of SB size and P_T threshold. Fig. 3 shows the bank efficiency versus the bank size for various P_T thresholds, with the SB sizes of option 1). Fig. 4 shows the 90%-efficient bank size as a function of the SB size for various P_T thresholds. In Section III-B we have shown that all these bank sizes will be achievable. In order to choose the optimal pattern bank, one must also evaluate the residual computing power needed at the level-2 trigger to complete the full resolution tracking.

C. Finding Tracks at Full Resolution

Because of the finite SB size, a road may contain physical hits belonging to different particles. Also, depending on the SB size and on the event hit density, there is a level of combinatorial background consisting of fake roads, i.e., track candidates that will be rejected at full resolution. The number of matching roads is always greater than the actual number of tracks. The excess of roads grows with increasing SB sizes and with decreasing P_T thresholds.
Fig. 4. 90%-efficient bank size as a function of the silicon detector SB size for P_T thresholds of 2 GeV/c (circles), 5 GeV/c (squares), and 10 GeV/c (triangles).
Fig. 5. Left scale (circles): average ratio between the number of matching roads and the number of real tracks as a function of the silicon detector SB size. Right scale (squares): average number of hit combinations per road as a function of the silicon detector SB size. Results for the high-luminosity and low-luminosity samples are shown by white and black symbols, respectively.
Therefore, the complexity of the residual track finding strictly depends on the following quantities:
• roads/track: the average ratio between the number of matching roads and the number of actual tracks per event;
• combinations/road: the average number of residual hit combinations per road.
Fig. 5 shows these critical numbers as functions of the SB size, with the P_T threshold fixed at 2 GeV/c. Results are reported for the high- and low-luminosity samples.
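Each residual combination must then be fitted at full resolution. One way to realize the linearized fit of [14], [15] is sketched below with numpy: a principal component analysis of a training sample yields the directions along which genuine tracks show (almost) no variance, and the squared projections of a candidate onto those directions act as a chi-square-like discriminant against fakes. The one-parameter toy track model and all numbers are our assumptions, standing in for the real geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training sample: "tracks" y_i = a * r_i + noise on eight layers,
# standing in for real full-resolution hit coordinates inside one road.
radii = np.linspace(0.2, 1.1, 8)          # toy layer radii
slopes = rng.uniform(-1.0, 1.0, size=5000)
train = slopes[:, None] * radii[None, :] + rng.normal(0.0, 1e-3, (5000, 8))

# Principal component analysis: eigenvectors with small eigenvalues are
# directions of (almost) zero variance, i.e., the linear track constraints.
mean = train.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(train, rowvar=False))  # ascending
constraints = eigvecs[:, :-1]             # drop the one "physical" direction

def chi2(hits):
    """Sum of squared constraint violations, normalized by their variances."""
    v = (hits - mean) @ constraints
    return float(np.sum(v * v / eigvals[:-1]))

genuine = 0.3 * radii                     # lies on the track model
fake = rng.uniform(-1.0, 1.0, size=8)     # random hit combination
print(chi2(genuine), chi2(fake))          # small versus huge
```

A cut on this quantity rejects the combinatorial background, and the surviving constraints' complement yields the track parameters.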
We observe that the thinnest SB minimizes the differences between the two samples. Presumably, the time required for track fitting grows linearly with roads/track and combinations/road. For instance, an SB width of 1 mm in the silicon detector and a P_T threshold of 2 GeV/c would give on average 2.4 residual combinations per road and 1.5 matching roads per real track, for a total of 3.6 hit combinations to be examined for each real track. In this case, the fitting time has been checked with an SGI R10000 processor (a workstation with a 250-MHz 64-bit CPU). We sequentially fit the residual combinations of hits in the roads, choose the best fit, and calculate the track parameters. In this way, about 10^5 tracks per second can be reconstructed: a single CPU can reconstruct complex events with 100 tracks of P_T above 2 GeV/c, similar to the above-mentioned Htt events, at a rate of 1000 Hz. Moreover, we consider 1) the much smaller number of high-P_T tracks in most real events, 2) the planned use of hundreds of level-2 CPUs, and 3) the evolution of CPU power. In conclusion, the level-2 processors can easily reconstruct the input events worthy of full tracking, which are a fraction of the maximum 100-kHz rate.

After the track fitting stage, the number of tracks selected by the P_T cut is compared to the expected number of tracks per event, in order to evaluate the number of fake tracks. We find a level of fake tracks of the order of 1%.

Assuming a conservative choice of the thinnest roads, to safely handle the worst conditions, the bank size for a quarter of the CMS barrel with a P_T threshold of 2 GeV/c is the one given in Fig. 4. Taking into account the pattern densities estimated in Section III-B for the year 2005 for the ASIC implementation, we conclude that such a bank will fit in 6 slots of a VME crate. In summary, a complete processor for one fourth of the barrel, running at an event rate of 100 kHz, is estimated to fit in a range between 12 and 20 standard 9U VME slots. These include a couple of slots for auxiliary cards, 6 slots for the AM bank, and a number of slots for DO-boards. The number of DO-boards can be as low as 4 (e.g., to serve the 8 detector layers used in the simulation of this paper), up to a maximum of 12. The choice depends on the number of detector layers used and on the possibility of multiplexing some pairs of layers on the same bus. Of course, fine-tuning and tradeoffs can tailor the system to specific experimental requirements.

D. Track Efficiency

The bank efficiency is not the only contributor to the total track efficiency: geometrical efficiency and fit efficiency must also be considered. Geometrical inefficiency is generated by the segmentation of the detector and is due to boundary-crossing tracks, since patterns are required to be entirely contained in a sector. Quantitatively, partitioning the barrel into four azimuthal sections results in a 15% inefficiency at P_T = 2 GeV/c, 6% at 5 GeV/c, and 3% at 10 GeV/c. Of course, azimuthal overlaps or other correctives can be used to reduce or even suppress this source of inefficiency. Fit efficiency depends on the P_T cut applied during the track fitting step; the fit cuts are adjusted so that the fit efficiency is always 90%. Therefore, the total efficiency is 69% at 2 GeV/c, 76% at 5 GeV/c, and 78% at 10 GeV/c.
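As a consistency check (our arithmetic, with the bank and fit efficiencies both at 90%), the totals follow from multiplying the three contributions:

0.90 × 0.90 × (1 − 0.15) ≈ 0.69 at 2 GeV/c,
0.90 × 0.90 × (1 − 0.06) ≈ 0.76 at 5 GeV/c,
0.90 × 0.90 × (1 − 0.03) ≈ 0.79 at 10 GeV/c,

the last agreeing with the quoted 78% within rounding.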
V. CONCLUSION

The FTK processor reduces and organizes tracker data into a size manageable by level-2 trigger processors. Thus, the commercial CPU farms planned for LHC triggers can reconstruct full resolution tracks inside narrow roads for events selected by level-1 and worthy of full tracking. The fully reconstructed tracks can be used in high-rate level-2 trigger algorithms. Hits of track candidates with P_T above a given threshold are filtered out of a huge number of other, uninteresting hits. The FTK makes use of very compact hardware for a complex application. We have shown that full processing of intricate CMS barrel events at a rate of 100 kHz, with reduction into 1-mm-wide roads for P_T greater than 2 GeV/c, would require a few 9U VME crates. Ambitious goals of trigger selection at the future hadron colliders, such as that of singling out b quarks on the basis of the decay-product impact parameters, can well benefit from the FTK architecture.

REFERENCES

[1] A. Bardi, S. Belforte, M. Dell’Orso, S. Galeotti, P. Giannetti, E. Meschi, et al., “A real-time tracker for hadronic collider experiments,” IEEE Trans. Nucl. Sci., vol. 46, pp. 947–952, Aug. 1999.
[2] M. Dell’Orso and L. Ristori, “VLSI structures for track finding,” Nucl. Instr. Meth., vol. A278, no. 2, pp. 436–440, June 1989.
[3] H. Grote, “Pattern recognition in high-energy physics,” Rep. Prog. Phys., vol. 50, pp. 473–500, Apr. 1987.
[4] CMS Collaboration, “The compact muon solenoid technical proposal,” CERN/LHCC 94-38, LHCC/P1, Dec. 1994.
[5] CMS Collaboration, “The tracker project technical design report,” CERN/LHCC 98-6, CMS TDR 5, Mar. 1998.
[6] A. Bardi, S. Belforte, J. Berryhill, A. Cerri, A. G. Clark, R. Culbertson, et al., “SVT: an online silicon vertex tracker for the CDF upgrade,” Nucl. Instr. Meth., vol. A409, pp. 658–661, May 1998.
[7] A. Annovi, M. G. Bagliesi, A. Bardi, R. Carosi, M. Dell’Orso, P. Giannetti, et al., “The data organizer: A high traffic node for tracking detector data,” IEEE Trans. Nucl. Sci., submitted for publication.
[8] Xilinx, The Programmable Logic Company. (2001, Feb.) Xilinx Data Book 2000 [Online]. Available: http://www.xilinx.com/partinfo/databook.htm
[9] A. Annovi, M. G. Bagliesi, A. Bardi, R. Carosi, M. Dell’Orso, P. Giannetti, et al., “A pipeline of associative memory boards for track finding,” IEEE Trans. Nucl. Sci., submitted for publication.
[10] A. Cisternino, I. D’Auria, W. Errico, G. Magazzù, and F. Schifano, “A standard cell based content-addressable memory system for pattern recognition,” in Proc. Fourth Workshop on Electronics for LHC Experiments, Oct. 1998, CERN/LHCC/98-36, pp. 514–516.
[11] Semiconductor Industry Association, “International technology roadmap for semiconductors,” 1998 update.
[12] T. Sjöstrand. (1999, Jul.) Pythia. Lund University, Lund, Sweden. [Online]. Available: http://www.thep.lu.se/~torbjorn/pythia.html
[13] F. Gianotti, “Collider physics: LHC,” ATL-CONF-2000-001, Apr. 2000.
[14] H. Wind, “Principal component analysis and its applications to track finding,” in Formulae and Methods in Experimental Data Evaluation, pp. d1–k16, 1984.
[15] R. Carosi and G. Punzi, “An algorithm for a real-time track fitter,” in Conf. Rec. 1998 IEEE Nuclear Science Symp., Toronto, Canada, Nov. 1998.