The Acadia vision processor
CAMP2000 – final
The Acadia Vision Processor

Gooitzen van der Wal, Mike Hansen, and Mike Piacentino
Sarnoff Corporation
gvanderwal | mhansen | mpiacentino @sarnoff.com
Abstract

Presented is a new 80 GOPS video-processing chip capable of performing video-rate vision applications. These applications include real-time video stabilization, mosaicking, video fusion, motion-stereo, and video enhancement. The new vision chip, codenamed Acadia, is the result of over 15 years of research and development by Sarnoff in the areas of multi-resolution pyramid-based vision processing and efficient computational architectures. The Acadia chip represents the third generation of ASIC technology developed by Sarnoff, and incorporates the processing functions found in Sarnoff's earlier PYR-1 and PYR-2 pyramid processing chips as well as numerous other functions found in Sarnoff-developed video processing systems, including the PVT200. A demonstration board is being implemented and includes two video decoders, a video encoder, and a PCI interface.
1. Introduction

Real-time video processing and analysis is a computationally very demanding task, for which many dedicated video processing systems and chips have been implemented. Sarnoff has been at the forefront of research and development of real-time algorithms and efficient computational architectures based on multi-resolution image representations, often referred to as image pyramids [1]. In many applications, the processing can be divided into two stages: a pre-processing stage and a post-processing stage. The pre-processing stage involves data-intensive but regular processing functions applied to the full image. These include transformations of data representation (i.e. image pyramids), motion analysis, image stabilization, stereo disparity images, feature enhancements, signal enhancements, noise reduction, image fusion, and many other types of processing functions. We often refer to the pre-processing stage as 'vision front end' (VFE) processing. The post-processing stage is typically more involved with scene interpretation, feature tracking, and system control. Since this is a more irregular type of processing, it is best performed
on general-purpose processors, here referred to as the 'host'. Figure 1 is a visualization of these two stages, with active sensor control as an example of control based on scene interpretation.
Figure 1. Two-stage processing

Sarnoff has now developed a new video-processing chip capable of performing video-rate vision functions for the vision front end. The Acadia video processing chip represents the third generation of ASIC technology developed by Sarnoff for real-time video processing, and incorporates the processing functions found in Sarnoff's earlier PYR-1 and PYR-2 pyramid processing ASICs [2] as well as numerous other functions found in Sarnoff-developed video processing systems, including the VFE100 and VFE200 [3,4,5,6]. The Acadia is designed to support real-time affine motion analysis and 3D recovery from stereo for demanding vision tasks such as accurate video stabilization, mosaicking, and motion-stereo capabilities for vehicle navigation. In addition, the chip supports real-time image fusion of multi-spectral imagery, such as IR and visible camera sources, as well as extended dynamic range and extended focal depth through image fusion.
2. The Acadia Vision Processor

The Acadia vision processor uses a highly modular architecture consisting of multiple processing modules connected to a crosspoint switch, multi-port access to shared memory, flexible digital video interfaces, link ports for expansion, and a control interface. The architecture is based on pipelined pyramid processing systems previously implemented by Sarnoff [3,4,5,6].
To be published in IEEE proceedings of International Workshop on Computer Architecture for Machine Perception, Padua, Italy, September 2000.
Figure 2. Acadia chip diagram

2.1. Pipelined processing modules

The central part of the Acadia is a set of dedicated processing modules connected via a crosspoint switch, so that the modules can be concatenated in any order, or can operate independently in parallel on different size images. Processing modules include a pair of 5x5 image filters, a global correlator, a global motion estimator, a stereo module, a pair of 8-bit lookup tables (LUT), and a four-input ALU. As with previous Sarnoff system implementations, each image data path through the crosspoint switch includes timing information for ease of control in hardware and in software.

Filter. The filters are similar to the filter units in Sarnoff's PYR-1 and PYR-2 chips [2]. They include four 1k line delays with automatic edge repeat and reflect, and an add/subtract at the output to provide simultaneous difference-of-Gaussian filtering or reconstruction of image pyramid representations. The 5x5 filter has 25 programmable coefficients, and supports spread-tap filters.

Global correlator. The global correlator provides nine image correlation values between two images, where the second image is shifted +/- 4 pixels (9 locations) relative to the first image. Each of the nine correlations is summed over the full image extent and stored in an accumulation register. When the image is complete, the results in the nine accumulators can be accessed via the control interface. The two correlation functions supported are SAD (sum of absolute differences) and sum of tri-level multiplies. For the tri-level operation, the value of each pixel is first compared to a threshold and converted to the values [-1, 0, 1]; then the images are multiplied. The two global correlation functions can be formulated, for n = -4 to 4, as:

$$ gcr(n) = \sum_{x,y} \left| A(x,y) - B(x+n,y) \right| \quad\text{or}\quad gcr(n) = \sum_{x,y} tri(A(x,y),thr) \times tri(B(x+n,y),thr) $$

This operation can be repeated many times with different vertical or horizontal shifts, using delay controls in the frame store ports. Either of these functions can be used very effectively to compute global image translations. By performing this on relatively low-resolution levels of the image pyramid, a good initial estimate of motion can be computed for very large image translations.
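As a software reference, the two correlation measures can be sketched as follows. This is a NumPy sketch, not the hardware implementation, and the tri-level quantizer shown is one plausible reading of the thresholding step:

```python
import numpy as np

def tri(img, thr):
    # Tri-level quantization: map each pixel to -1, 0, or +1 against a
    # threshold. How the chip applies the threshold is an assumption here.
    return np.where(img > thr, 1, np.where(img < -thr, -1, 0))

def global_correlation(a, b, thr=None):
    """gcr(n) for n = -4..4: SAD when thr is None, tri-level product otherwise."""
    w = a.shape[1]
    scores = {}
    for n in range(-4, 5):
        lo, hi = max(0, -n), w - max(0, n)   # columns where A and shifted B overlap
        a_part = a[:, lo:hi].astype(np.int64)
        b_part = b[:, lo + n:hi + n].astype(np.int64)
        if thr is None:
            scores[n] = int(np.abs(a_part - b_part).sum())                # minimize
        else:
            scores[n] = int((tri(a_part, thr) * tri(b_part, thr)).sum())  # maximize
    return scores
```

The best translation estimate is the n minimizing the SAD score (or maximizing the tri-level score); repeating this on a low pyramid level covers translations far larger than +/- 4 pixels at full resolution.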
Global motion estimator. The global motion estimator has two programmable filters on one input, computing dx and dy, and forms the sum or difference between the first and a second image input, dt. The dx, dy, and dt streams are then sent to a set of programmable multiply-accumulate (PMAC) functions. The accumulation occurs over the full image extent and the results are accessible in a set of accumulator registers. The PMAC function forms the basic module to compute the elements of the parametric image registration described in detail later in this paper. The dx and dy filters are 3x3 filters that compute various horizontal or vertical gradients, and are also accessible directly on the crosspoint switch.

Stereo. The stereo module computes 32 SAD functions between two input images for each pixel in the image. The sum of each SAD is computed over a local 7x7, 9x7, 11x7, or 13x7 region. The local SAD for each pixel over a 7x7 local region and shift s can be formulated as:

$$ SAD(x,y,s) = \sum_{i=-3}^{3} \sum_{j=-3}^{3} \left| A(x+i, y+j) - B(x+i-s, y+j) \right| $$
This is computed for every pixel for s = 0 to 31. The minimum of the 32 SAD values is detected and interpolated for every pixel in the image. The result is a stereo disparity image that can be used for robust depth estimation at full video data rates. The output is both an 8-bit disparity value and the 16-bit value of the minimum SAD for that location in the image. The stereo operation can be repeated to double the range of the disparity values. In addition, the stereo operation can be repeated for right-to-left disparity calculations and compared to the left-to-right disparity calculations for increased robustness. This left-right checking function is part of the ALU.
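In software, a brute-force equivalent of the stereo module's search can be sketched as follows. This NumPy sketch uses a 7x7 window and zero-padded borders, where the hardware instead interpolates the minimum and masks outputs without full window support:

```python
import numpy as np

def stereo_disparity(left, right, max_disp=32, win=7):
    """For each pixel, pick the shift s in [0, max_disp) with the smallest
    SAD over a win x win window (left image as reference A, right as B)."""
    h, w = left.shape
    half = win // 2
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    best_sad = np.full((h, w), np.iinfo(np.int32).max, dtype=np.int32)
    disparity = np.zeros((h, w), dtype=np.uint8)
    for s in range(max_disp):
        diff = np.abs(left[:, s:] - right[:, :w - s])   # |A(x) - B(x - s)|
        # Box-sum the differences over the local window.
        padded = np.pad(diff, half)
        sad = np.zeros_like(diff)
        for dy in range(win):
            for dx in range(win):
                sad += padded[dy:dy + h, dx:dx + w - s]
        better = sad < best_sad[:, s:]
        best_sad[:, s:][better] = sad[better]
        disparity[:, s:][better] = s
    return disparity, best_sad
```

The Acadia computes the same inner search in hardware, 32 shifts in parallel, at full pixel rate.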
ALU. The ALU can multiply and scale 8- and 16-bit image data. In addition, it can select minimum or maximum values of scaled data, which is essential for image filtering over time and for combining Laplacian images for image fusion. The ALU can also divide two 8-bit or 16-bit images and provide a 16-bit result image at full pixel rate. The divider implements a floating-point operation on two fixed-point images, and provides a scaled 16-bit output image. In addition, the ALU can perform a left-right check for the stereo module disparity maps (noted as WRL in the diagram). A block diagram of the ALU is shown below.

Figure 3. ALU block diagram

Processing rate. All the pipelined processing modules process the image data at the full internal clock rate, with a small overhead for blanking time between each line. At 100 MHz, a 512 x 512 image can be processed by any one of these modules, or a sequence of these modules, in 2.75 ms. A full image pyramid with both Gaussian and Laplacian results, using one of the filter modules, can be computed in about 3.7 ms.

2.2. Memory interface

Another main feature of the Acadia is the multiport video frame store, which connects the 800 Mbyte/sec SDRAM with multiple streaming video ports to and from the crosspoint, the digital video interfaces, and the image warpers. All frame store ports have local data buffers and control, which guarantee consistent and independent streaming video to the processing modules. In addition, each port is capable of image up- and down-sampling for very efficient multi-resolution (pyramid) image processing. Even though the memory interface has a capacity of 800 Mbyte/sec, the required overhead of row addressing and of switching between multiple ports reduces the sustainable data rate. In the Acadia, the buffers in the frame store ports are 2 Kbytes, providing for a sustainable memory access bandwidth of up to 600 Mbyte/sec. Two special features in the Acadia are key in providing reliable access of data to and from the external memory when the processing approaches the maximum data rate of the memory bus. The first is a programmable priority level assigned to every memory port module. For example, time-critical ports, such as those involved with the video I/O, can be given a higher priority than others. The second is the use of the timing channels that are associated with each internal streaming data port. These provide the resource for the memory ports to hold off processing during horizontal blanking time if the buffers are not serviced quickly enough. They also provide a mechanism to keep data streams that require synchronization in lock-step with each other on a line-by-line basis.

2.3. Image warping

There are two image warping units that can efficiently perform affine transformations (rotation, scaling, and zoom) on images in memory [7]. The warper modules include bi-cubic interpolation for very high quality image transforms. Each warper unit contains a 16 Kbyte set-associative data cache for fast image warping and interpolation, making the operations only about 25% slower than other pipeline operations. If the transforms cause significant cache misses, the warping function will slow down significantly. However, in many of these cases the image warping can be performed efficiently by warping the image as a sequence of large image blocks (e.g. 100 x 100). The warper is also designed to warp an interleaved UV image, to support efficient data management of color images.
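The processing-rate figures quoted in section 2.1 can be reproduced by simple cycle counting. In this sketch the 25-cycle per-line blanking overhead is an assumed value, chosen only so that the result matches the quoted 2.75 ms; the paper does not state the actual overhead:

```python
CLOCK_HZ = 100e6   # clock rate used in the text's example
BLANK = 25         # assumed per-line blanking overhead, in cycles (not from the paper)

def frame_time(rows, cols):
    # One pipeline pass: one pixel per clock, plus blanking between lines.
    return rows * (cols + BLANK) / CLOCK_HZ

single_pass = frame_time(512, 512)  # one module pass over a 512 x 512 image
# A full pyramid reruns the filter on each half-resolution Gaussian level,
# so the total approaches 4/3 of the base pass (a geometric series).
pyramid = sum(frame_time(512 >> k, 512 >> k) for k in range(9))
print(f"{single_pass * 1e3:.2f} ms, {pyramid * 1e3:.2f} ms")
```

The 4/3 geometric series is consistent with the "about 3.7 ms" full-pyramid figure; the exact value depends on the real blanking overhead.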
2.4. Masking operations

The Acadia provides masking operations for some of the modules. During a store operation, the FSPort can detect the presence of a mask value and not store that data in memory. This supports writing aligned images directly into a mosaic image stored in memory. The warper can insert mask data for output image pixels that do not have data support in the warp source image. The global correlator and global motion estimator can be programmed to detect the mask and not accumulate the data where there is a mask value. The ALU can detect masked data at either of its inputs and mask the output data accordingly, and it can explicitly mask output data based on conditions in the ALU, such as values that are below a threshold or a
divide-by-zero detection. The stereo module masks all data outputs that do not have full support from the input images, including input pixels that are masked values. Since the implementation of the external memory strongly favors 8-bit data, there is no extra bit available for a separate mask bit for each image. Therefore masked pixels are represented by a specific 8-bit value in the Acadia, such as the value 0, or 255, or -128 for signed image data. This limits the available range of data that can be represented, but in many cases the data used is from standard video in BT656 format, which typically limits data to a range from 16 to 240 and specifically excludes the 0 and 255 data values, because those values are used to identify timing codes in the data stream.
2.5. Digital video interface

There are two digital video input ports and one digital video output port on the Acadia chip that provide YUV (4:2:2) interfaces directly to the SDRAM. The video ports are compatible with BT601 and BT656, but allow for programmable image sizes and asynchronous data rates up to 70 MHz. This provides glue-less interfaces to many common video interface devices, such as video decoders and encoders, and image compression encoders and decoders such as MPEG and DV. The video ports can also be adapted to other digital image formats, sizes, and frame rates. Each digital video port has two dedicated frame store ports into or out of the SDRAM, providing for separation of Y and UV data.

2.6. Link ports

There are four bi-directional link ports included in the Acadia chip. The link ports provide interconnectivity to (a) multiple Acadia chips, (b) custom or reconfigurable processors, or other pipelined processors, or (c) high-speed data transfer interfaces to system busses, such as the PCI bus.

2.7. Control interface

The Acadia requires an external controller to configure each module. The modules are configured for each image operation through a set of control and status registers, and return an interrupt at the completion of the image operation. The external memory is also accessible through the control interface, but is not meant for high-bandwidth access to the video data. The link ports provide a better method to access video data at high bandwidth.

In a typical application, the external controller also provides some high-level processing functions. The Acadia PCI demo board discussed below uses an embedded PowerPC with floating-point capability for this function, and provides PCI burst access to the video data through a link port.

3. Using the Acadia Vision Processor

The Acadia is designed with a high degree of flexibility, so that a large variety of video processing algorithms can be implemented using this chip. An example is provided to illustrate the basic functionality of the chip. Then a few of the possible Acadia applications are described. Note that these could individually be a complete application, but they can also be an essential first step in providing preprocessed video to advanced video processing systems. The video image operations in the Acadia can be described as a set of segmented pipeline operations [8]. Each segmented pipeline operation starts with an image of size n x m in memory, which is sent as a data stream through one or more pipeline processes and stored back into memory. During the memory operations, the data can be up-sampled or down-sampled. In a typical application, the algorithm can be efficiently described and implemented as a set of segmented pipelines that are performed in sequence and/or in parallel. The Acadia video processor can perform multiple segmented pipeline operations in parallel, where each can operate on different size images.

3.1. Combining two image pyramids

A good example of how processing takes place in the Acadia is the computation of the image pyramid. An example of a Gaussian and Laplacian pyramid [1] is shown in figure 4. Both pyramids can be generated simultaneously with a single filter in the Acadia. Since there are two filters available, two pyramids can be computed at the same time, while the results can be combined with the ALU. Figure 5 shows a segmented pipeline operation, where the frame store ports from or to the external memory are indicated with FSP, and the filters are indicated with FLT. Each filter is programmed to compute a Gaussian filter at output 1, and a Laplacian filter at output 2. One image is sent by FSP1 to FLT1, simultaneously with a second image from FSP2 to FLT2. The Gaussian outputs of FLT1 and FLT2 are sent to FSP3 and FSP4. For a pyramid (multiresolution) operation, the Gaussian outputs are sub-sampled by a factor of 2 in both horizontal and vertical direction by
FSP3 and FSP4. The Laplacian outputs of FLT1 and FLT2 are sent to the ALU, which can perform a variety of operations, such as:
- weighted average: k3·(A·k1 + C·k2)
- k3·min/max(A·k1, C·k2)
- if |A·k1| > |C·k2| then k3·A·k1, else k3·C·k2
- k3·A / (A·k1 + C·k2)
The ALU output can be stored either as 8-bit data to FSP5, or as 16-bit data to two frame store ports, FSP5 and FSP6.

Figure 4. Gaussian and Laplacian pyramid

Figure 5. Two pyramid filters and ALU operation in Acadia

To perform this operation on a full image pyramid, the above computations are repeated on the Gaussian images stored by FSP3 and FSP4. Since these images are sub-sampled by 2, the next level operation requires only ¼ the processing time of the current pyramid level. As mentioned before, this operation on a 512 x 512 image requires about 2.75 ms. The total time for a full image pyramid starting with images of 512 x 512 is only about 3.7 ms.

3.2. Parallel segmented pipelines in the Acadia

The pyramid example above is an example of one segmented pipeline operation in the Acadia. Several segmented pipeline operations can occur in parallel. The set of segmented pipelines in figure 6 shows such an example.

Figure 6. Parallel segmented pipelines
First, two video sources are captured and stored in memory as separate Y and UV images, and a processed Y and UV image is sent to a display. Since a standard video source represents a data rate of about 20 Mbytes/sec, these three processes require 60 Mbytes/sec, which is only about 10% of the available memory bandwidth and can easily occur in parallel with other operations. These three segmented pipelines require 16.7 ms for standard field-rate processing. Second, a Y and UV image may be warped at full resolution before being sent to the display output, using a WRP and FSP. Several of these operations may occur in sequence during the 16.7 ms required by the video I/O. Third, an FLT may be used to apply a variety of filter and pyramid operations in sequence. Fourth, a sequence of operations may occur at a lower resolution of the image data, such as global motion estimation. Each of the above operations takes a different amount of processing time, so the processes that take
less time can be repeated many times in parallel with the processes that take longer.
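The per-level pipeline of section 3.1 can be sketched in software as follows. This NumPy sketch uses the standard 1-4-6-4-1 pyramid kernel and assumes the chip's filter-subtract style Laplacian, in which output 2 is the input minus the Gaussian output; both are assumptions about the exact filter configuration:

```python
import numpy as np

W = np.array([1, 4, 6, 4, 1], dtype=float) / 16  # separable 5-tap pyramid kernel

def blur(img):
    # 5x5 separable filter with edge-repeated borders, as in the filter module.
    pad = np.pad(img, 2, mode='edge')
    tmp = sum(w * pad[:, k:k + img.shape[1]] for k, w in enumerate(W))
    return sum(w * tmp[k:k + img.shape[0], :] for k, w in enumerate(W))

def pyramid_level(img):
    """One segmented-pipeline pass: the Gaussian output (sub-sampled by 2 on
    store, as an FSP would do) and the difference-of-Gaussian (Laplacian)."""
    g = blur(img)
    return g[::2, ::2], img - g
```

Repeating pyramid_level on its reduced output builds the full Gaussian and Laplacian pyramids, each level costing a quarter of the previous one, as noted above.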
3.3. Parametric Image Registration

Parametric image registration is a key video processing operation that enables the motion between image pairs to be measured accurately. One effective method of performing this function, described in [9], uses partial spatio-temporal derivatives of the two images to compute the motion between the images, yielding a parametric transformation that registers the images. In [9], the parametric transformation was an affine transformation, although other parametric transformations can easily be derived using similar methods. The transformation to be solved is of the following form:

$$ I(x, y, t) = I(x + du, y + dv, t + 1), \quad\text{with}\quad du = a + bx + cy \quad\text{and}\quad dv = d + ex + fy $$

The solution for the transformation variables a, b, ..., f is given by the direct method shown in equation (1), where the summations are performed over the entire spatial extent of the two images:

$$
\begin{bmatrix}
\sum I_x^2 & \sum xI_x^2 & \sum yI_x^2 & \sum I_xI_y & \sum xI_xI_y & \sum yI_xI_y \\
\sum xI_x^2 & \sum x^2I_x^2 & \sum xyI_x^2 & \sum xI_xI_y & \sum x^2I_xI_y & \sum xyI_xI_y \\
\sum yI_x^2 & \sum xyI_x^2 & \sum y^2I_x^2 & \sum yI_xI_y & \sum xyI_xI_y & \sum y^2I_xI_y \\
\sum I_xI_y & \sum xI_xI_y & \sum yI_xI_y & \sum I_y^2 & \sum xI_y^2 & \sum yI_y^2 \\
\sum xI_xI_y & \sum x^2I_xI_y & \sum xyI_xI_y & \sum xI_y^2 & \sum x^2I_y^2 & \sum xyI_y^2 \\
\sum yI_xI_y & \sum xyI_xI_y & \sum y^2I_xI_y & \sum yI_y^2 & \sum xyI_y^2 & \sum y^2I_y^2
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}
=
\begin{bmatrix}
-\sum I_xI_t \\ -\sum xI_xI_t \\ -\sum yI_xI_t \\ -\sum I_yI_t \\ -\sum xI_yI_t \\ -\sum yI_yI_t
\end{bmatrix}
\quad (1)
$$
Solving the system of equations shown in (1) requires a great deal of resources on general-purpose processors. However, it can be efficiently solved if the coefficients for the matrices shown in (1) can be computed quickly. The desired coefficients in (1) are all functions of I_x, I_y, I_t, x, and y: the partial derivatives of the input image in the horizontal, vertical, and time directions, and the horizontal and vertical position counters. The global motion estimator in the Acadia contains the filters to compute the three image partial derivatives I_x, I_y, and I_t, followed by a set of programmable multiply/accumulate functions (PMACs) for computing the terms in equation (1). A total of 24 unique terms have to be computed, requiring four passes through the PMAC in the Acadia global motion estimator. The accumulator values are read by the Acadia controller and used to solve the equation, providing an affine estimate of the image motion.
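Equation (1) is the standard least-squares normal-equation system for affine optical flow, and can be checked with a small NumPy sketch. Here np.gradient and a frame difference stand in for the dx/dy/dt filters of the global motion estimator:

```python
import numpy as np

def affine_motion(img_a, img_b):
    """Solve equation (1) for (a, b, c, d, e, f), summing over the whole image."""
    ix = np.gradient(img_a, axis=1)         # Ix: horizontal derivative
    iy = np.gradient(img_a, axis=0)         # Iy: vertical derivative
    it = img_b - img_a                      # It: temporal difference (dt)
    h, w = img_a.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    # Each pixel contributes the row [Ix, x*Ix, y*Ix, Iy, x*Iy, y*Iy].
    m = np.stack([ix, x * ix, y * ix, iy, x * iy, y * iy], axis=-1).reshape(-1, 6)
    lhs = m.T @ m            # the 6x6 matrix of sums on the left of (1)
    rhs = -m.T @ it.ravel()  # the right-hand side of (1)
    return np.linalg.solve(lhs, rhs)
```

The matrix m.T @ m is symmetric, which is why only 24 unique sums are needed in total (18 on the left-hand side plus the 6 on the right); on the chip, the host reads these accumulators and performs the small 6x6 solve itself.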
3.4. Image Stabilization

In many current and future real-time video applications, camera-induced motion of the video significantly reduces the capability and robustness of performing analysis on the video stream. For example, detecting moving objects in the scene, or enhancing the video by combining multiple images over time, requires a high degree of image stability [5,6,10]. The Acadia is capable of performing video-rate stabilization on standard video (720 x 480 frames in color) with an accuracy of up to 1/10 of a pixel at full resolution, and can measure image motions as large as 64 pixels per field. The image motion is modeled as an affine transformation, including image translation, scaling, and rotation. This function is achieved by performing a highly optimized parametric image registration implementation several times, at several resolutions of the image, for each field. The image-to-image registration is achieved by first applying image correlation to consecutive images at low resolution to detect large translations, using the global correlation function of the Acadia. The image is then warped with this initial coarse motion estimate, and residual motions are computed through subsequent registrations with the parametric image registration function described above. This warp-and-registration process is repeated multiple times, providing successive, iterative refinement of the registration in coarse-to-fine fashion.
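The coarse-to-fine loop just described can be sketched structurally as follows. This is a sketch only: estimate_motion and warp are hypothetical stand-ins for the global-correlator/PMAC estimation and the warper hardware, and residual parameters are simply accumulated, whereas a real implementation composes the affine transforms:

```python
import numpy as np

def coarse_to_fine(pyr_a, pyr_b, estimate_motion, warp, iters=2):
    """Refine an affine estimate (a, b, c, d, e, f) from the coarsest pyramid
    level to the finest; pyr_a[0] and pyr_b[0] are the full-resolution images."""
    params = np.zeros(6)
    for level_a, level_b in zip(reversed(pyr_a), reversed(pyr_b)):
        params[[0, 3]] *= 2.0        # translations double at each finer level
        for _ in range(iters):       # iterative refinement at this level
            warped = warp(level_a, params)
            params += estimate_motion(warped, level_b)
    return params
```

On the Acadia, each warp and each registration pass is one segmented pipeline operation, so the entire refinement fits in a single field time.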
3.5. Video mosaicking

An image mosaic is a composite of a set of images into a larger, panoramic image. An example of a mosaic is shown in figure 7. Video mosaicking is the same in that it combines consecutive frames of a video sequence into a composite display. This function provides greatly enhanced visualization of a video scene, especially when the images are captured through telephoto lenses. Very low bit-rate compression can also take great advantage of mosaicking, by sending mosaic images instead of the original images, and then sending differences relative to the mosaic. The real-time stabilization function described above is the basis for providing video-rate mosaic composition. The high accuracy of the registration function, combined with registration of the video to the mosaic composite, can provide highly accurate mosaic compositions.
Figure 7. Video Mosaic
3.6. Video Fusion

There are several applications where it is highly desirable to combine two video streams of the same scene. One example is combining IR (infrared) video with standard visible video, providing significantly enhanced visibility of the scene [11]. Another example is combining video from two cameras that point at the same scene but with different focal lengths, providing enhanced depth of focus. A third example is combining video from two cameras that have different aperture settings, providing significantly enhanced dynamic range in the display [12,13].
In all these applications it is essential to preserve the more significant detail from each of the video streams on a pixel-by-pixel basis. Laplacian pyramid fusion provides excellent automatic selection of the more important image detail for every pixel in the image, at multiple image resolutions. By performing this selection in the multi-resolution representation, the reconstructed, fused image provides a very natural-looking scene. The Acadia chip provides all the functionality required to compute Laplacian pyramid fusion at full video rate. This includes enhanced capabilities in the convolution units, as well as a specialized fusion selection function.
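The per-pixel selection corresponds to the magnitude-compare ALU mode listed in section 3.1, and can be sketched as follows. This is a NumPy sketch; handling of the lowest-resolution Gaussian base by averaging is one common choice, not necessarily the chip's:

```python
import numpy as np

def fuse_laplacians(lap_a, lap_b):
    # Keep, at every pixel, the Laplacian coefficient with the larger
    # magnitude - i.e. the more salient detail of the two sources.
    return np.where(np.abs(lap_a) >= np.abs(lap_b), lap_a, lap_b)

def fuse_pyramids(laps_a, laps_b, base_a, base_b):
    """Fuse two Laplacian pyramids level by level; average the Gaussian base."""
    fused = [fuse_laplacians(a, b) for a, b in zip(laps_a, laps_b)]
    return fused, (base_a + base_b) / 2.0
```

Reconstructing an image from the fused pyramid (the inverse of pyramid generation) then yields the fused frame.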
Fig 8. Video fusion of IR and visible images (TV image, IR image, and fused image)
Fig 9. Video fusion for extended depth of field (background focus image, foreground focus image, and fused image)
Fig 10. Video fusion for extended dynamic range (underexposed image, overexposed image, and fused image)
3.7. Stereo analysis

There are many applications where image analysis based on appearance (i.e. image intensity variations, texture, and color) is not sufficient or not robust. In these applications, it may be highly beneficial to compute the image structure of the scene prior to further analysis. This can be achieved by analyzing the scene with two cameras and computing its depth structure. The Acadia has a dedicated function on the chip that provides many parallel SAD (sum of absolute differences) computations to compute image disparity at multiple resolutions of the video. By combining this fast stereo computation function with image warping, the system can be made very sensitive to structure in specific areas of the field of view. For example, by translating one image relative to the other, the stereo analysis can be programmed to be sensitive to objects at a particular distance. Or, by warping the left image to the right image so that the two images are aligned in a planar field that matches the road surface in front of the camera, the stereo analysis becomes very sensitive to small objects on the road, or holes in the road. This has been referred to as "horopter stereo"
and has been effectively demonstrated with the PVT200, operating in real time on vehicles [4]. Such an advanced function could also significantly enhance the robustness of detecting and analyzing moving objects for surveillance applications, which are notorious for their lack of robustness.
3.8. Acadia as a video front end processor

In many applications, the Acadia can operate as a video front end (VFE) processor, as mentioned in the first section. In this mode, the Acadia captures image data from one or more sources, performs complex and data-intensive operations, mostly at the image level, and provides the processed data to a host processor via the video output or via a data link to a host computer for further processing. Examples are pre-filtered and pre-warped images, multi-resolution representations of the video, stabilized video with the computed motion parameters for each frame, image mosaics, or stereo disparity images along with the original images. The Acadia PCI demonstration board described below is an excellent example of an efficient and cost-effective VFE for a standard PC, or for a high-end Sun or SGI processing platform.
Figure 11. A stereo input image and computed elevation map
4. Acadia implementation

The internal processing rate of the Acadia is 108 MHz, yielding a throughput of 108 million pixels per second for each Acadia image-processing element. The total computational power of the Acadia ASIC is on the order of 80 billion operations per second (80 GOPS). The design of the Acadia chip is highly modular, providing for easy upgrades or custom tailoring to specific, high-volume applications. As an example, the stereo module, which is only connected to the crosspoint switch, was added to the total design at the end of our project with minimal impact to the schedule, while more than doubling the total operations per second available on the chip. Duplicating modules on the chip was easy because of the use of timing signals, a crosspoint switch, and standardized control and data stream interfaces to the modules. Verification of duplicate modules was also easy, as explained in the next section. The Acadia uses 12 duplicates of the frame store port, and two duplicates each of the warper, the LUT, and the filter. The Acadia consists of 576k bits of SRAM and 1M gates. The chip is implemented in TSMC 0.25 µm technology with one level of poly and 5 levels of metal. Artisan cell libraries and memory compilers were used for the implementation. Verilog was used for the RTL-level design, after which it was compiled to gate level using the Synopsys design compiler. Cadence services provided the test insertion and netlist routing. The chip is packaged in a 492-pin BGA. First silicon of the Acadia is expected at the time of this conference (September 2000), and system demonstrations are scheduled for the 4th quarter of 2000.
4.1. Acadia design verification

The highly modular design structure, and the timing signals that are associated with each video datapath in and out of the modules, simplified the verification of the design significantly. Because of these factors, most modules could be modeled as a C function that operates on standard images, without regard to the actual pipeline delay through the module or the exact timing of inactive data. The C-model had to represent the exact functionality and the same bit-accuracy as the hardware implementation, but did not have to duplicate the hardware implementation. For many modules the verification was accomplished by converting the data stream going into and out of each module from and to images, after
which it was easy to verify the C-model results against the hardware design at RTL and at gate level. For the external SDRAM we used the model provided by the manufacturer, and connected the model directly to the hardware design. This was not only a tremendous advantage in verifying the SDRAM interface, but also provided for a higher level of design verification. At the chip level, the verification method preloads the SDRAM with the desired image data, then runs a sequence of image operations with the results stored into SDRAM. The simulations are then applied to the C-model of the chip and the hardware design independently, after which the SDRAM contents are compared for verification. The digital video I/O was modeled as a stream of data converted from or to an image. The link ports were connected to each other through an external mux to verify their functionality, without modeling the I/O of the link ports in detail. For the control interface, and to enable easy "programmability" of the verification steps, a model of a simple controller was created. Both the C-model simulations and the hardware simulations were controlled through an instruction file. In addition, this allowed verification of register data in the chip, such as status and accumulator values. This method made it easy to set up a wide variety of verifications of the design, including the programming and simulation of the stabilization function. The chip includes JTAG for boundary scan, BIST functions for all the SRAM blocks, and a set of scan chains for full-scan testing of the manufactured chip.
4.2. Acadia PCI board

The Acadia PCI demonstration board includes two standard video input decoders, one video output encoder, a PowerPC-based controller (MPC8240), 48 Mbytes of SDRAM, 2 Mbytes of Flash, and a PCI interface for high-bandwidth image data transfer between the host and the board. The board can also operate in stand-alone mode for dedicated applications. The PowerPC coordinates all of the real-time functions and is fully dedicated to servicing and scheduling the operations within Acadia. Results of Acadia processing can be received from or sent to a host processor through the PCI host interface. The video decoders convert standard analog RS170, NTSC, or PAL video (composite or S-video) to digital format for the Acadia video inputs. The video encoder converts the digital video output from the Acadia to standard composite and S-video output.
Figure 12. Acadia PCI board

5. Acknowledgements

We had a small but terrific team working on this chip. In addition to the authors of this paper, the team included Fred Brehm, Greg Buchanan, Greg Burns, and Jayan Eledath of Sarnoff Corporation. The Acadia ASIC was funded in large part by DARPA.

6. References

[1] P.J. Burt, “The Pyramid as a Structure for Efficient Computation”, in Multiresolution Image Processing and Analysis, Springer-Verlag, 1984.
[2] G. van der Wal and P. Burt, “A VLSI Pyramid Chip for Multiresolution Image Analysis”, International Journal of Computer Vision, 8:3, 1992, pp. 177-189.
[3] M.R. Piacentino, G.S. van der Wal, and M.W. Hansen, “Reconfigurable Elements for a Video Pipeline Processor”, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM99), Napa Valley, California, April 21-23, 1999, pp. 82-91.
[4] R. Mandelbaum, L. McDowell, L. Bogoni, B. Reich, M. Hansen, “Real-time stereo processing, obstacle detection, and terrain estimation from vehicle-mounted stereo cameras”, Proceedings of the 4th IEEE Workshop on Applications of Computer Vision (WACV’98), Princeton, New Jersey, October 1998.
[5] R. Mandelbaum, M. Hansen, P. Burt, S. Baten, “Vision for Autonomous Mobility: Image Processing on the VFE200”, Proceedings of the IEEE International Symposium on Intelligent Control (ISIC), International Symposium on Computational Intelligence in Robotics and Automation (CIRA), and Intelligent Systems and Semiotics (ISAS), Gaithersburg, Maryland, September 1998.
[6] M.W. Hansen, P. Anandan, K. Dana, G.S. van der Wal, and P.J. Burt, “Real-time scene stabilization and mosaic construction”, Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, December 5-7, 1994, pp. 54-62.
[7] G. Wolberg, “Digital Image Warping”, IEEE Computer Society Press, 1990.
[8] P.J. Burt and G.S. van der Wal, “Iconic Image Analysis with the Pyramid Vision Machine”, Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, Seattle, Washington, October 5-7, 1987, pp. 137-144.
[9] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani, “Hierarchical Model-Based Motion Estimation”, Proceedings of the European Conference on Computer Vision, March 1992.
[10] L. Wixson, J. Eledath, M.W. Hansen, R. Mandelbaum, D. Mishra, “Image Alignment for Precise Camera Fixation and Aim”, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’98), Santa Barbara, California, June 1998.
[11] P.J. Burt, “Pattern selective fusion of IR and visible images using pyramid transforms”, National Symposium on Sensor Fusion, 1992.
[12] L. Bogoni, M.W. Hansen, and P.J. Burt, “Image Enhancement using Pattern-Selective Color Image Fusion”, 10th International Conference on Image Analysis and Processing, Venice, Italy, September 27-29, 1999, pp. 44-49.
[13] L. Bogoni, “Extending Dynamic Range of Monochrome and Color Images through Fusion”, International Conference on Pattern Recognition (ICPR00), Barcelona, Spain, September 2000.

To be published in IEEE proceedings of the International Workshop on Computer Architecture for Machine Perception, Padua, Italy, September 2000.