2010 3rd International Conference on Biomedical Engineering and Informatics (BMEI 2010)
Hardware Acceleration for Motion Tracking System Used in Image-Guided Surgery

Dang Xiao, Li Wenjun, Ding Hui, Wang Guangzhi
Dept. of Biomedical Engineering, Tsinghua University, Beijing, China

Abstract: This paper presents several hardware acceleration methods for an infrared optical tracking system based on an FPGA (Field Programmable Gate Array). The system is designed for image-guided surgical navigation. Because FPGA and SoPC (System on a Programmable Chip) technology is inherently flexible and easy to implement, both the system architecture and the PCB (Printed Circuit Board) design can be greatly simplified while reliability is increased. With three hardware acceleration methods, namely floating-point custom instructions, Tightly Coupled Memory (TCM), and a multiprocessor architecture, experiments show that 3D reconstruction runs about 18 times faster. This compensates for the low clock frequency and limited floating-point capability of the Altera Nios II soft processor. The speedup, however, is obtained at the cost of additional hardware resources, which must be taken into account. Based on the measured performance, the accelerated system could theoretically locate up to 32 markers in real time at a frame rate of 60 fps (frames per second).

Keywords: hardware acceleration; optical tracking; FPGA; SoPC; surgical navigation
978-1-4244-6498-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

In recent years, Image-Guided Surgical (IGS) Navigation Systems have been widely used in computer-assisted surgery (CAS) because of their outstanding accuracy. This has driven the development of three-dimensional (3D) tracking systems, several of which have become commercially available. Among optical tracking systems, the binocular vision system, which uses two 2D optical sensors, has been broadly studied and supports both passive and active tracking. A large body of literature on stereo optical tracking systems and their applications has been published over the last several years [1-4]. Various hardware architectures based on different electronic devices have also been proposed, and embedded systems play a significant role because of their flexibility and ease of integration. For a usable surgical navigation system, optical position tracking must be performed by a low-level system to reduce the workload on the navigation system, which usually consists of a graphics workstation running the navigation software. Because an optical tracking system involves many image and signal processing tasks, the hardware solution has traditionally been the sole preserve of DSPs. With the introduction of dedicated multipliers into FPGAs, these applications can use FPGAs instead, gaining the advantages of both logic control and parallel computing.

System on a Programmable Chip (SoPC) platforms based on FPGAs are becoming a prevalent solution for implementing embedded computing systems because of their ease of implementation, highly customizable nature, and inherent flexibility. The major FPGA vendors, Altera and Xilinx, provide the in-order soft processors Nios/Nios II and MicroBlaze, respectively. These processors are growing rapidly in popularity and significantly reduce system complexity. Although workloads for FPGA-based and ASIC-based embedded systems tend to be similar, the hardware acceleration capabilities of FPGA platforms for 3D reconstruction and optical tracking, tasks commonly handled by high-end fixed-point or floating-point DSPs, have not yet been fully explored. In this paper, we demonstrate a simple yet effective hardware acceleration technique running on an SoPC platform. We have built a stereo optical tracking system with infrared markers using an Altera FPGA and the Nios II processor. The system must perform the following tasks:
• Low-level hardware driver for the image sensors.
• Marker centroid extraction and matching between the two image sensors.
• 3D reconstruction from 2D image points.
• Data exchange with the host PC.
The bottleneck is the 3D reconstruction, which involves a large number of floating-point calculations. We improved its performance with several hardware acceleration methods, speeding up 3D reconstruction by a factor of 18. The rest of this paper is organized as follows. Section II introduces our optical tracking system and the 3D reconstruction algorithm it implements. Section III describes the hardware acceleration methods and their system architectures. Section IV presents the experimental results, and Section V concludes the paper.

II. TRACKING SYSTEM DESCRIPTION AND DESIGN
A. System Architecture

Our optical tracking system consists of two custom-designed infrared CMOS cameras (up to 60 fps), an FPGA evaluation board for system control, 3D reconstruction computation, and result transmission, and a PC connected to the system via USB for 3D display. Active (emitting) or passive (retroreflective) infrared markers attached to a hypothetical surgical tool can be tracked in real time. Fig. 1 shows the original optical tracking system and the software user interface with three infrared markers.
Figure 1. The tracking system and the user interface on PC

The hardware architecture and block diagram are shown in Fig. 2. Image data from the CMOS sensors are read out by the FPGA, and both image filtering and marker centroid extraction are performed by FPGA HDL modules. The 2D image coordinates of the marker centroids from the pair of cameras are then sent to the Nios II processor, which performs the Epipolar Constraint matching and the 3D positioning calculation. Finally, the 3D coordinates are transmitted to the PC via USB or other peripheral ports.

Figure 2. Hardware architecture and block diagram of the tracking system

This optical tracking system has good positioning accuracy but low computing speed: it reaches only 30 fps, and less as more markers are tracked. The causes are the low clock frequency of the Nios II soft processor running the embedded software (100 MHz, compared with up to 1 GHz for the best DSPs) and the large amount of floating-point computation. However, our study shows that the FPGA platform and the Nios II processor offer many hardware acceleration capabilities that can be exploited to meet the requirements of a navigation tracking system.

B. 3D Reconstruction Methods

The most common calibration method, the DLT (Direct Linear Transformation) method originally reported by Abdel-Aziz and Karara [5], requires a set of accurate, fully calibrated control points. In addition, the Epipolar Constraint used for marker matching needs the intrinsic and extrinsic camera parameters, which can also be calculated with the DLT method. We used another high-precision commercial tracking system (Northern Digital Optotrak) to generate the set of control points and completed the calibration by obtaining the intrinsic and extrinsic parameters, which are then used for the subsequent 3D reconstruction.

1) Epipolar Constraint: Once the intrinsic and extrinsic parameters of our tracking device are available, the Epipolar Constraint can be derived as [6]

$$\mathbf{m}_r^{T} F\, \mathbf{m}_l = 0 \tag{1}$$

where $\mathbf{m}_l$ and $\mathbf{m}_r$ are two corresponding points from the left and right images, respectively, written in homogeneous coordinates $(u, v, 1)^{T}$. The matrix $F$, referred to in the literature as the fundamental matrix, is the mathematical representation of the epipolar geometry.

2) DLT Method: The 11 DLT coefficients (L-coefficients) for 3D-DLT reconstruction can also be derived from the intrinsic and extrinsic parameters. Under the 11-parameter DLT model, the image point $(u, v)$ of a 3D point $(x, y, z)$ satisfies $u = (L_1 x + L_2 y + L_3 z + L_4)/(L_9 x + L_{10} y + L_{11} z + 1)$ and $v = (L_5 x + L_6 y + L_7 z + L_8)/(L_9 x + L_{10} y + L_{11} z + 1)$. Multiplying out the denominators for a pair of corresponding points $\mathbf{m}_l = (u_l, v_l)^{T}$ and $\mathbf{m}_r = (u_r, v_r)^{T}$, with superscripts $l$ and $r$ marking the coefficients of the left and right cameras, gives a linear system in the unknown point $X = (x, y, z)^{T}$:

$$
AX =
\begin{bmatrix}
L_1^l - u_l L_9^l & L_2^l - u_l L_{10}^l & L_3^l - u_l L_{11}^l \\
L_5^l - v_l L_9^l & L_6^l - v_l L_{10}^l & L_7^l - v_l L_{11}^l \\
L_1^r - u_r L_9^r & L_2^r - u_r L_{10}^r & L_3^r - u_r L_{11}^r \\
L_5^r - v_r L_9^r & L_6^r - v_r L_{10}^r & L_7^r - v_r L_{11}^r
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} u_l - L_4^l \\ v_l - L_8^l \\ u_r - L_4^r \\ v_r - L_8^r \end{bmatrix}
= T
\tag{2}
$$

The least squares solution of equation (2) is

$$X = (A^{T}A)^{-1}A^{T}T \tag{3}$$

Singular value decomposition (SVD) can be used to handle the case in which $A^{T}A$ is singular.
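A minimal C sketch of equations (2) and (3) follows. It assumes our own coefficient layout, `L[0..10]` = L1..L11 for one camera, and hypothetical function names; the rows of A and T follow from multiplying out the 11-parameter DLT projection model. Rather than forming an explicit inverse, it solves the 3×3 normal equations by Gaussian elimination with partial pivoting and reports failure when A^T A is numerically singular, the case in which the text suggests SVD. Double precision is used for clarity, although the Nios II floating-point custom instructions are single precision.

```c
#include <math.h>

/* Hypothetical layout: L[0..10] holds the DLT coefficients L1..L11 of one camera. */

/* Projects P = (x, y, z) with the 11-parameter DLT model; handy for generating test data. */
void dlt_project(const double L[11], const double P[3], double *u, double *v)
{
    double w = L[8] * P[0] + L[9] * P[1] + L[10] * P[2] + 1.0;
    *u = (L[0] * P[0] + L[1] * P[1] + L[2] * P[2] + L[3]) / w;
    *v = (L[4] * P[0] + L[5] * P[1] + L[6] * P[2] + L[7]) / w;
}

/* Writes the two rows of A and T contributed by one camera, as in equation (2). */
static void dlt_rows(const double L[11], double u, double v, double A[2][3], double T[2])
{
    A[0][0] = L[0] - u * L[8];
    A[0][1] = L[1] - u * L[9];
    A[0][2] = L[2] - u * L[10];
    A[1][0] = L[4] - v * L[8];
    A[1][1] = L[5] - v * L[9];
    A[1][2] = L[6] - v * L[10];
    T[0] = u - L[3];
    T[1] = v - L[7];
}

/* Least-squares solution X = (A^T A)^-1 A^T T of equation (3), via the normal equations.
 * Returns 0 on success, -1 if A^T A is numerically singular (an SVD solver is then needed). */
int dlt_reconstruct(const double Ll[11], const double Lr[11],
                    double ul, double vl, double ur, double vr, double X[3])
{
    double A[4][3], T[4], M[3][4]; /* M is the augmented matrix [A^T A | A^T T] */

    dlt_rows(Ll, ul, vl, A, T);         /* rows 1-2: left camera  */
    dlt_rows(Lr, ur, vr, A + 2, T + 2); /* rows 3-4: right camera */

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++) {
            double s = 0.0;
            for (int k = 0; k < 4; k++)
                s += A[k][i] * (j < 3 ? A[k][j] : T[k]);
            M[i][j] = s;
        }

    for (int c = 0; c < 3; c++) {
        int p = c; /* choose the row with the largest pivot magnitude */
        for (int r = c + 1; r < 3; r++)
            if (fabs(M[r][c]) > fabs(M[p][c]))
                p = r;
        if (fabs(M[p][c]) < 1e-12)
            return -1;
        for (int j = 0; j < 4; j++) { /* swap rows c and p */
            double t = M[c][j];
            M[c][j] = M[p][j];
            M[p][j] = t;
        }
        for (int r = c + 1; r < 3; r++) { /* eliminate entries below the pivot */
            double f = M[r][c] / M[c][c];
            for (int j = c; j < 4; j++)
                M[r][j] -= f * M[c][j];
        }
    }

    for (int i = 2; i >= 0; i--) { /* back substitution */
        double s = M[i][3];
        for (int j = i + 1; j < 3; j++)
            s -= M[i][j] * X[j];
        X[i] = s / M[i][i];
    }
    return 0;
}
```

Projecting a known point through two cameras with `dlt_project` and feeding the four image coordinates to `dlt_reconstruct` recovers that point up to rounding; passing the same camera twice makes A^T A rank-deficient and returns -1. On the Nios II target, switching the type to `float` would let the compiler map the arithmetic onto the single-precision custom instructions.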
The complexity of this algorithm clearly grows with the number of markers, which can compromise real-time performance. In medical equipment, safety and accuracy are the most important considerations. On one hand, fixed-point calculation runs quickly on embedded systems, but a float-to-fixed conversion may lose precision and reduce the dynamic range of the data, and the software must be designed very carefully to prevent overflow. On the other hand, emulating floating-point operations in software on fixed-point hardware avoids these problems but wastes computation time. An ideal solution is to accelerate the floating-point operations in hardware: this is not supported by common fixed-point processors, but it can easily be integrated into the Nios II processor thanks to the inherent flexibility of the FPGA. Together with Tightly Coupled Memory (TCM) and a multiprocessor technique, this greatly improved the performance of the floating-point 3D reconstruction.
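To make the dependence on marker count concrete, the C sketch below evaluates the algebraic residual of equation (1) for every left/right candidate pair, so the work grows as N_l × N_r. The paper does not state its matching rule or threshold; the greedy smallest-residual assignment, the tolerance `tol`, and the function names are assumptions for illustration only.

```c
#include <math.h>
#include <stddef.h>

/* Algebraic epipolar residual m_r^T F m_l, with pixel points in homogeneous form (u, v, 1). */
double epipolar_residual(const double F[3][3], double ul, double vl, double ur, double vr)
{
    const double ml[3] = {ul, vl, 1.0};
    const double mr[3] = {ur, vr, 1.0};
    double r = 0.0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            r += mr[i] * F[i][j] * ml[j];
    return r;
}

/* Hypothetical greedy matcher: for each left marker, pick the right marker with the smallest
 * |residual| below tol; match[i] is that index, or -1 if none qualifies.
 * Returns the number of residual evaluations, nl * nr, i.e. quadratic in the marker count. */
size_t epipolar_match(const double F[3][3], const double (*left)[2], size_t nl,
                      const double (*right)[2], size_t nr, double tol, int *match)
{
    size_t evals = 0;
    for (size_t i = 0; i < nl; i++) {
        double best = tol;
        match[i] = -1;
        for (size_t j = 0; j < nr; j++) {
            double r = fabs(epipolar_residual(F, left[i][0], left[i][1],
                                              right[j][0], right[j][1]));
            evals++;
            if (r < best) {
                best = r;
                match[i] = (int)j;
            }
        }
    }
    return evals;
}
```

As a usage example, for a rectified stereo pair with a horizontal baseline, F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]] (up to scale) and the residual reduces to v_l - v_r, so three left markers are paired with the right markers lying on the same image rows after nine residual evaluations. A real matcher would also have to resolve conflicts when two left markers claim the same right marker, which this sketch ignores.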
III. SEVERAL HARDWARE ACCELERATING METHODS
The key advantage of soft processors is the flexibility they provide, allowing them to be specialized to an application through configuration or even custom instructions. Once part of the application is implemented in hardware, a significant speedup can generally be achieved. For Altera FPGAs, SoPC support for custom instructions, Nios II multiprocessor systems, and other hardware capabilities made our design easy to implement.

A. Altera's Nios II Processor

Nios II [7] is Altera's second-generation soft processor. It is a general-purpose RISC (Reduced Instruction Set Computer) soft processor with 32-bit instruction words and datapath, an integer-only ALU (Arithmetic Logic Unit), 32 general-purpose registers, and a MIPS-style instruction format, with performance of up to 250 DMIPS. The Nios II processor family includes three cores: Economy (Nios II/e), Standard (Nios II/s), and Fast (Nios II/f). All three cores are single-issue, in-order processors. A Nios II processor system is equivalent to a microcontroller or "computer on a chip" that combines a processor with peripherals and memory on a single chip: it consists of one or more Nios II processor cores, a set of on-chip peripherals, on-chip memory, and interfaces to off-chip memory, all implemented on a single Altera device. Most importantly, users can customize the Nios II processor system until it meets their cost or performance requirements.

B. Floating-Point Custom Instructions

With the Altera Nios II embedded processor, system designers can accelerate time-critical software algorithms by adding custom instructions to the Nios II instruction set. Custom instructions can implement complex processing tasks as single-cycle (combinational) or multicycle (sequential) operations [8]. Fig. 3 shows how custom logic is added to the Nios processor's ALU.
Figure 3. Adding Custom logic to the Nios ALU

The Nios CPU configuration wizard, accessed from SOPC Builder, provides a graphical user interface with which system designers can add custom instructions, including the floating-point instructions, to the Nios processor. The floating-point custom instructions, optionally available on the Nios II processor, implement single-precision floating-point operations and thus accelerate floating-point arithmetic in Nios II C/C++ application programs. The basic set of floating-point custom instructions is available on every Nios II core implementation and includes single-precision floating-point addition, subtraction, and multiplication. Floating-point division is available as an extension to the basic instruction set.

C. Tightly Coupled Memory (TCM)

During 3D reconstruction, some constant parameters, namely the matrix F and the L-coefficients, are accessed frequently, so read and write latency directly affects calculation speed. Although the main off-chip memory devices, such as SSRAM (Synchronous Static Random Access Memory) and SDRAM (Synchronous Dynamic Random Access Memory), can run at sufficiently high frequencies, their access latency is not deterministic. The Nios II architecture provides tightly coupled master ports that guarantee fixed, low-latency access to on-chip memory for performance-critical applications [9-10]. Tightly coupled masters can connect to instruction memory and data memory, providing fixed low-latency access to executable code as well as read/write access to data. In the Nios II core, tightly coupled masters are additional instruction or data master ports, separate from the processor's regular instruction and data master ports. We implemented TCM in our Nios II system by loading the matrix F and the L-coefficients into tightly coupled data memory. These frequently used data can then be accessed by Nios II with very low latency, which improves system performance. Fig. 4 shows the block diagram of the Nios II system with tightly coupled memory.

Figure 4. Nios II System with Tightly Coupled Data Memory
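A minimal sketch of how the calibration constants might be kept in tightly coupled data memory, assuming a GCC-based toolchain: the arrays are placed in a dedicated linker section that, on the target, would be mapped to the TCM region. The section name `.tcm_data` and the helper name are hypothetical; the real name depends on how the memory component is named in SOPC Builder and on the generated linker script. On a desktop ELF toolchain the attribute simply creates an ordinary data section, so the code also runs there.

```c
#include <string.h>

/* Hypothetical section name: on the target it must match the linker-script region of the
 * tightly coupled data memory. GCC's `section` attribute places the object in that section. */
#define IN_TCM __attribute__((section(".tcm_data")))

/* Frequently read constants: fundamental matrix F and L-coefficients of both cameras. */
IN_TCM double tcm_F[3][3];
IN_TCM double tcm_L[2][11];

/* Copies calibration results (e.g. read from flash or SDRAM at start-up) into the TCM
 * copies, so that the per-frame matching and reconstruction code reads only from TCM. */
void load_calibration_to_tcm(const double F[3][3], const double L[2][11])
{
    memcpy(tcm_F, F, sizeof tcm_F);
    memcpy(tcm_L, L, sizeof tcm_L);
}
```

Copying once at start-up keeps the off-chip memory, with its variable latency, out of the per-frame loop; only the fixed-latency TCM copies are read while tracking.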
D. Multiprocessor

As mentioned before, the Nios II processor does not run at a high clock frequency compared with advanced DSPs and ARM processors, but a multiprocessor design, made possible by the inherent flexibility of SoPC, can compensate for the low operating frequency. Moreover, with the rapid development of SoPC technology, more powerful processors can be expected soon. Multiprocessor systems [11-12] offer increased performance, but almost always at the price of significantly increased system complexity. For this reason, multiprocessor systems have historically been limited to special applications such as workstations or high-end PCs, and have typically been too costly for most embedded systems. Using multiple processors to perform different tasks and functions in embedded applications is now gaining popularity, following recent increases in the size of Altera FPGAs. Because the SOPC Builder tool makes it easy to modify and tune the hardware, different system configurations can be designed, built, and evaluated very quickly.

In the 3D reconstruction, the 2D image coordinates from the pair of infrared cameras are first matched using the Epipolar Constraint, and the matched 2D coordinates are then used by the DLT method to compute 3D positions. This can be implemented as a pipeline [13] hardware architecture: one Nios II processor performs the Epipolar Constraint calculation while another performs the 3D reconstruction. A small shared on-chip memory is used for data transfer. To prevent the processors from interfering with each other, a hardware mutex core was added, which allows a processor to claim ownership of a shared resource and protects it from corruption by the actions of the other processor. The pipeline architecture block diagram is shown in Fig. 5, and the multiprocessor hardware components in the SOPC Builder tool are illustrated in Fig. 6.
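The pipeline-plus-mutex arrangement can be illustrated on a desktop machine with POSIX threads (compile with `-pthread`). This is a host-side analogy, not Nios II code: a worker thread stands in for the processor doing epipolar matching, the calling thread for the processor doing 3D reconstruction, a one-slot mailbox for the small shared on-chip memory, and a `pthread_mutex_t` for the hardware mutex core. The stage computations are placeholders, and all names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/* One-slot mailbox standing in for the small shared on-chip memory. */
typedef struct {
    pthread_mutex_t lock; /* stands in for the hardware mutex core */
    int full;             /* 1 while a stage-1 result waits to be consumed */
    int frame;            /* frame index of the stored result */
    double value;         /* placeholder for the matched 2D point pairs */
} mailbox_t;

typedef struct {
    mailbox_t box;
    int nframes;
    double *out; /* stage-2 results, one per frame */
} pipeline_t;

/* Stage 1 ("processor 1"): epipolar matching, then hand the result over. */
static void *stage1_matching(void *arg)
{
    pipeline_t *p = arg;
    for (int f = 0; f < p->nframes; f++) {
        double matched = 2.0 * f; /* placeholder for the matching computation */
        for (;;) {                /* poll until the mailbox is free */
            pthread_mutex_lock(&p->box.lock);
            if (!p->box.full) {
                p->box.frame = f;
                p->box.value = matched;
                p->box.full = 1;
                pthread_mutex_unlock(&p->box.lock);
                break;
            }
            pthread_mutex_unlock(&p->box.lock);
            sched_yield();
        }
    }
    return NULL;
}

/* Stage 2 ("processor 2"): take a matched frame and run the 3D reconstruction. */
static void stage2_reconstruction(pipeline_t *p)
{
    for (int done = 0; done < p->nframes;) {
        pthread_mutex_lock(&p->box.lock);
        if (p->box.full) {
            int f = p->box.frame;
            double v = p->box.value;
            p->box.full = 0; /* free the slot so stage 1 can deliver the next frame */
            pthread_mutex_unlock(&p->box.lock);
            p->out[f] = v + 1.0; /* placeholder for DLT reconstruction, done outside the lock */
            done++;
        } else {
            pthread_mutex_unlock(&p->box.lock);
            sched_yield();
        }
    }
}

/* Runs both stages concurrently over nframes frames; returns 0 on success. */
int run_pipeline(int nframes, double *out)
{
    pipeline_t p = {.nframes = nframes, .out = out};
    pthread_t stage1;

    if (pthread_mutex_init(&p.box.lock, NULL) != 0)
        return -1;
    if (pthread_create(&stage1, NULL, stage1_matching, &p) != 0) {
        pthread_mutex_destroy(&p.box.lock);
        return -1;
    }
    stage2_reconstruction(&p); /* the calling thread acts as the second processor */
    pthread_join(stage1, NULL);
    pthread_mutex_destroy(&p.box.lock);
    return 0;
}
```

While stage 2 reconstructs frame f outside the lock, stage 1 is already matching frame f + 1, which is where the pipeline gains its throughput. A real two-processor Nios II build would instead use the vendor's mutex driver and would also have to account for data caching of the shared memory, which this sketch does not model.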
Figure 5. The Systemic Pipeline Architecture for Positioning Algorithm
Figure 6. Illustration of the Hardware Components with Multiprocessors in SOPC Builder Tool

IV. EXPERIMENTS AND RESULTS

A. Environment

The following Altera FPGA tools were used to implement the hardware and software of the experimental test system:

• Altera Quartus II 8.0 software.
• SOPC Builder for designing the hardware system and synthesizing it into an FPGA configuration.
• Nios II 8.0 Integrated Development Environment (IDE) for software programming and debugging.

The Altera development and education board (DE2-70) was used as the FPGA/SoPC platform for evaluating the hardware acceleration methods. The DE2-70 board resources are listed in Table I.

TABLE I. DE2-70 DEVELOPMENT AND EDUCATION BOARD CHARACTERISTICS

| Component | Characteristics |
|---|---|
| Cyclone II EP2C70F896C6 | 68,416 LEs, 250 M4K, 1.1M RAM bits |
| SDRAM (off-chip) | 32 MB capacity ×2 |
| SSRAM (off-chip) | 2 MB capacity |
| Flash Device | 8 MB capacity |

B. Experimental Results

We implemented a new optical tracking system architecture, based on the original one, with hardware acceleration using the three methods described in Section III, both individually and in combination. The experimental results and resource consumption are summarized below.

1) System Parameters: At the front end, the CMOS sensors run at 60 fps with a frame size of 640×480 pixels. A 50 MHz clock drives the Verilog hardware module that performs image filtering and marker centroid extraction. This processing is nearly independent of the number of markers and depends only on the image resolution and frame rate, so the module keeps pace with the CMOS sensors. Both the Nios II processor and the SDRAM use a 100 MHz clock generated by a PLL from a 50 MHz input clock, with a −3 ns clock delay for the SDRAM.

2) Performance Results: To collect timing information before and after hardware acceleration, a Nios II performance counter was added in both hardware and software. This made it possible to accurately measure the execution time of multiple sections of code. The measured results are shown in Table II.

TABLE II. HARDWARE ACCELERATING RESULTS (a)

| Methods | Clock cycles | Time (s) | Speedup |
|---|---|---|---|
| Pure Software | 2,877,340,210 | 28.7734 | – |
| Floating-Point Custom Instructions | 232,117,095 | 2.32117 | ×12 |
| TCM with CI (b) | 203,074,068 | 2.03074 | ×14 |
| Multiprocessor | 1,393,945,887 | 13.9395 | ×1.7 |
| Combined method (c) | 155,423,701 | 1.55424 | ×18.5 |

a. Calculation performed 1000 times with 3 infrared markers.
b. TCM denotes Tightly Coupled Memory; CI denotes Custom Instructions.
c. In the test system, only data TCM is implemented.
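The Time column follows from the cycle counts at the 100 MHz processor clock (time = cycles / f). The C sketch below reproduces that conversion, derives the average cost of one 3-marker computation (each row covers 1000 runs, footnote a), and estimates how many markers fit into one 60 fps frame period. The linear-scaling assumption (cost proportional to marker count) and the function names are ours, not the paper's; with the combined-method row the estimate comes out at 32 markers, matching the figure quoted in the abstract.

```c
/* Converts a performance-counter reading (clock cycles) to seconds. */
double cycles_to_seconds(unsigned long long cycles, double clock_hz)
{
    return (double)cycles / clock_hz;
}

/* Average duration of one 3D computation in milliseconds, given the total over `runs` runs. */
double ms_per_run(unsigned long long cycles, unsigned runs, double clock_hz)
{
    return 1000.0 * cycles_to_seconds(cycles, clock_hz) / runs;
}

/* Largest marker count whose computation fits into one frame period, ASSUMING the cost
 * grows linearly with the number of markers (an assumption, not a measured result). */
int max_markers_linear(double ms_measured, int markers_measured, double fps)
{
    double frame_ms = 1000.0 / fps;
    double ms_per_marker = ms_measured / markers_measured;
    return (int)(frame_ms / ms_per_marker); /* truncate: a partial marker does not count */
}
```

For the combined method, 155,423,701 cycles give about 1.554 ms per 3-marker computation, or about 0.518 ms per marker, and 16.67 ms / 0.518 ms ≈ 32. For the pure-software row, the same arithmetic gives about 28.8 ms per computation, i.e. at most roughly 35 computations per second, consistent with the 30 fps limit reported for the original system.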
The results above may vary slightly between runs because of the non-deterministic data-access latency of the hardware, and they also depend on the device used. Resource consumption is compared in Table III.

TABLE III. RESOURCE CONSUMPTION

| Resources | Pure SW | Floating-Point CI | TCM | Multiprocessor |
|---|---|---|---|---|
| Logic Elements | 4,158 (6%) | 11,260 (16%) | 11,355 (17%) | 21,879 (32%) |
| Total Registers | 2,712 | 7,083 | 7,019 | 13,389 |
| Total Memory Bits | 65,088 (6%) | 66,360 (6%) | 852,792 (74%) | 788,080 (68%) |
| Embedded Multiplier 9-bit Elements | 4 (1%) | 11 (4%) | 11 (4%) | 22 (7%) |
| Total PLLs | 1 (25%) | 1 (25%) | 1 (25%) | 1 (25%) |

3) Results Discussion: Among the methods we implemented, the floating-point custom instructions improved performance the most, with a speedup of more than 10 times, at the price of 10% more LEs and 7 extra hardware multipliers. Altera reports that the floating-point custom instructions can accelerate floating-point arithmetic by at least 12 times, so this result is reasonable. The TCM method increased speed by a further 16%, but more importantly it guarantees fixed, low-latency access to key data, which is essential for the real-time capability of the system. The multiprocessor method provides a novel pipeline hardware architecture that exploits the flexibility of SoPC to share the workload of the 3D reconstruction, achieving a ×1.7 speedup. Because the Epipolar Constraint and DLT computations contain a large amount of parallel work, more processor cores can be added to each of the two pipeline stages, provided sufficient resources are available, to meet higher performance requirements. Table II reports position tracking with 3 markers, and the amount of calculation depends on the number of markers N. From these test results we infer that, after hardware acceleration, the system can support real-time location of up to 32 markers at 60 fps without dropping frames. This is comparable to the capability of the NDI Polaris system [14], but with a much simpler and lower-cost architecture.

V. CONCLUSION

This paper improves a real-time infrared optical tracking system used in image-guided surgical navigation. Hardware acceleration using floating-point custom instructions, tightly coupled data memory, and the multiprocessor technique significantly speeds up 3D reconstruction. SoPC is gaining popularity in practical applications because of its inherent flexibility and highly customizable nature, which can greatly reduce system complexity and increase reliability, particularly in medical equipment. Performance requirements can therefore be met when hardware acceleration is fully explored. Because the accuracy of the system is affected by multiple factors, further work is needed to address precision.

ACKNOWLEDGMENT

We would like to thank the National 863 High Technology Plan (2006AA02Z4E7) and the National Natural Science Foundation of China (30772195) for their support.

REFERENCES

[1] H. G. Kenngott, J. Neuhaus, B. P. Muller-Stich, I. Wolf, M. Vetter, H.-P. Meinzer, J. Koninger, et al., "Development of a navigation system for minimally invasive esophagectomy," Surg Endosc, 2007.
[2] A. D. Wiles, D. G. Thompson, and D. D. Frantz, "Accuracy assessment and interpretation for optical tracking system," in Proceedings of SPIE, 2004, pp. 421-432.
[3] H. Liao, N. Hata, S. Nakajima, M. Iwahara, I. Sakuma, and T. Dohi, "Surgical navigation by autostereoscopic image overlay of integral videography," IEEE Transactions on Information Technology in Biomedicine, 2004, pp. 114-121.
[4] P. Zhou, Y. Liu, and Y. Wang, "Multiple infrared markers based real-time stereo vision positioning system for surgical navigation," International Instrumentation and Measurement Technology Conference, 2009, pp. 692-696.
[5] Y. I. Abdel-Aziz and H. M. Karara, "Direct linear transformation into object space coordinates in close-range photogrammetry," Proceedings of the ASP Symposium on Close-Range Photogrammetry, 1971, pp. 420-475.
[6] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
[7] Altera Corporation, Nios II Processor Reference Handbook.
[8] Altera Corporation, Using Nios II Floating-Point Custom Instructions Tutorial.
[9] Altera Corporation, Using Tightly Coupled Memory with Nios II Processor.
[10] A. Irwansyah, V. Nambiar, and M. Khalil-Hani, "An AES tightly coupled hardware accelerator in an FPGA-based embedded processor core," International Conference on Computer Engineering and Technology, 2009, pp. 521-525.
[11] Altera Corporation, Creating Multiprocessor Nios II Systems Tutorial.
[12] A. Kulmala, O. Lehtoranta, and T. D. Hämäläinen, "Scalable MPEG-4 encoder on FPGA multiprocessor SOC," EURASIP Journal on Embedded Systems, Hindawi Publishing Corporation, vol. 2006, pp. 1-15.
[13] L. Zou, Z. Fu, Y. Zhao, and J. Yang, "A pipelined architecture for real time correction of nonuniformity in infrared focal-plane-arrays imaging system using multiprocessors," Infrared Physics & Technology, 2010, pp. 7-10.
[14] Northern Digital Inc., "Polaris family of optical tracking systems," http://www.ndigital.com/medical/polarisfamily.php, 2008.