
Multimedia Video Engine: an Embedded Video Processing System Based on a DSP

V. Gemignani (1), F. Faita (1), M. Giannoni (1), A. Benassi (1), M. Demi (1,2)

(1) CNR - Institute of Clinical Physiology, Pisa, Italy
(2) Esaote SpA, Florence, Italy

1. Abstract

The Multimedia Video Engine is an embedded system for real-time image-processing applications. The apparatus captures a video signal from an analog source, processes the data using a high-performance digital signal processor (TMS320C6415), and then displays the results on a graphics monitor. Since several video imaging applications require user interaction, a graphical user interface (GUI) has been developed, which is operated by a mouse and a keyboard. Although less complex, the GUI is similar to those we commonly use in standard workstations and can contain a number of objects such as buttons, numerical displays, text, graphs, etc. A software environment is also provided to program and customize the platform. Image processing algorithms can be easily integrated into the system as pieces of code written in the C language. As for the GUI and the other resources of the board, they can be customized by using the provided library functions.

2. Introduction

The development of new video processing algorithms is a challenging task. Generally, the first step is to implement and test a procedure on a workstation, where complex mathematical software tools are available. In this phase, the algorithms are tested on sequences of images in off-line mode. Once the algorithms provide satisfactory results, the problem of achieving a real-time implementation is addressed. In most cases, a new hardware platform is used, which is significantly different from a workstation, and remarkable effort is required to accomplish an implementation on the new system. A simplification of this passage, or even the development of the algorithms directly on the real-time platform, would be a great advantage.

Several special-purpose hardware devices can be used to obtain the processing power necessary for video processing applications. When the main aim is the evaluation and testing of new procedures, a software-only solution based on a microprocessor is often preferred, as it can make the real-time implementation work easier. Moreover, it is suitable for the test phase, where the algorithms and their parameters may require adjustments. An example of a VLSI IC designed for software-only real-time video processing is found in [1]. Other papers [2,3] used general-purpose video processors, such as the Texas Instruments' MVP [4].

Figure 1. System overview

In recent years, new DSP devices have been developed which have the processing power needed for video processing. In particular, the Texas Instruments' TMS320C64x is a device based on a VLIW architecture that includes support for packed data processing and special-purpose instructions to accelerate imaging applications. Furthermore, the new DSP can be programmed in a high-level language and can be used with code generation tools, such as Matlab Real-Time Workshop [5]. For these reasons, the TMS320C64x can be efficiently used for the rapid prototyping of algorithms [6]. There are several DSP boards for video applications which are based on the TMS320C64x and can be used for this purpose.

In order to simplify the developing and testing phase, however, these should have some features that are not commonly found in video processing boards. First of all, the system should be easy to use. The implementation of a video processing algorithm should be possible without an in-depth knowledge of the system, so as to allow a rapid switch from the workstation to the new architecture. In addition, advanced user interaction with the system can be of great help in the developing and testing phase. The possibility of monitoring a numerical value in real time, changing the parameters of the algorithms, or switching between different algorithms are all characteristics which can be very useful when a new procedure is evaluated. In other words, the system should have a graphical user interface similar to the one we are used to utilizing on a workstation.

Bearing this in mind, we developed a new video processing board based on the TMS320C64x. The architecture of the system was designed so as to integrate a complex graphical user interface. In particular, the board was equipped with a VGA graphics controller to efficiently manage a graphic display and a USB controller to interface the system with a mouse and a keyboard. Then a GUI was developed which, although less complex, is similar to those commonly used in standard workstations. Lastly, we developed a software framework which manages all the activities of the board and can be used to prototype video processing applications rapidly.

3. Hardware Overview

The architecture of the board consists of six modules, as depicted in figure 2:

Figure 2. Block diagram of the video processing system

(i) DSP
(ii) Memory
(iii) Video-input
(iv) Video-output
(v) Analog I/O
(vi) Peripherals

The DSP is the "heart" of the board. It sets up the other modules, manages the data movements and performs the data processing activity. The device we used is the Texas Instruments' TMS320C6415 [7], a fixed-point digital signal processor which provides remarkable performance in video processing. Its CPU, which has a Very-Long-Instruction-Word (VLIW) architecture, can carry out eight 32-bit instructions/cycle at a 600 MHz clock rate, that is, 4.8 billion instructions/second. Moreover, special instructions can perform two arithmetic operations in parallel with 16-bit operands or four arithmetic operations with 8-bit operands, reaching a total of twenty-eight operations/cycle. The capability of performing multiple operations is of particular relevance in video processing applications, since the points of the image are usually represented by 8-bit or 16-bit data. Besides having remarkable processing power, the DSP provides several resources to efficiently manage the data movement. First of all, three buses (one 32-bit PCI bus, one 64-bit and one 16-bit memory interface) are available to simultaneously move large amounts of data. Then, three multi-channel buffered serial ports exist to easily connect the processor to low-speed devices. Finally, an on-chip enhanced direct memory access controller (EDMA) is able to efficiently move the data without loss of performance of the CPU.

The memory module contains three memory banks:
(i) A 512 Kbyte flash memory, which is used to permanently store the firmware of the DSP.
(ii) A 4 Mbyte synchronous SRAM, which is mainly used for high-speed data storing.
(iii) A 512 Mbyte SDRAM module, which is used for storing large amounts of data.

The video-input module captures the analog video signal, which is selected from two Y/C inputs or up to four CVBS inputs. The analog-to-digital converter is the Philips SAA7114, a programmable video decoder which is able to convert the color of PAL, SECAM and NTSC signals into ITU 601 compatible color component values. The device also contains a horizontal/vertical video scaler which performs zoom operations on the raw images. Once the data are captured, they pass

through the FPGA, which can be used to perform a Y/C to RGB color format conversion. They are then stored in a FIFO where they are accessible to the DSP.

The video-output module, which is very similar to a VGA interface for personal computers, is based on the Asiliant 69000 graphics controller. This device integrates on a single chip 2 Mbytes of SDRAM, a 2D graphics accelerator and other hardware resources commonly available in PCs, such as a pop-up window and two cursors. Moreover, high-quality video playback, which supports both RGB and Y/C video formats and implements double buffering to eliminate video tearing, can be obtained by using the 69000. The video-output module also provides a multi-standard TV output when the VGA output is not used.

The analog I/O module is a multi-channel A/D-D/A converter, which allows the system to capture and play back audio signals. The eight inputs are captured with a resolution of 12 bits, while the four outputs have a 10-bit resolution. The sampling rate is up to 133 kHz.

The peripheral module allows the system to be interfaced with external devices. In particular, the board is equipped with a full-speed USB 1.1 controller which drives two downstream ports and one upstream port. The USB connection allows the board to be interfaced with a wide range of off-the-shelf devices, such as mice, keyboards, hard drives, etc. Finally, a configurable RS232 serial port and a JTAG connection are present.

4. Software Environment

A software platform was implemented by starting from DSP/BIOS [8], a scalable real-time kernel specifically designed for the TI DSP platform. The kernel provides resources for pre-emptive multithreading, synchronization, interrupt handling, I/O services, and memory management. Having done that, we implemented all the elements necessary for managing the peripherals of the board and for creating a GUI. Furthermore, we organized the video data flow so as to realize a software framework for video processing applications.
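
To illustrate how such a framework can sit on top of the kernel, the following is a minimal sketch of a DSP/BIOS-style processing task. The SEM_pend() call and SYS_FOREVER constant follow the DSP/BIOS API documented in [8]; the semaphore, buffer and user-function names are hypothetical and added only for the example.

    #include <std.h>   /* DSP/BIOS base types (Void, ...) */
    #include <sys.h>   /* SYS_FOREVER */
    #include <sem.h>   /* SEM_pend(): counting semaphores */

    /* Hypothetical objects, assumed to be created in the DSP/BIOS
       configuration: a semaphore posted by the video-input driver
       when a complete frame is available, and the frame buffer. */
    extern SEM_Handle newFrameSem;
    extern unsigned char processingBuffer[];

    extern void process_frame(unsigned char *frame);   /* user code */

    /* Processing task: blocks until the capture driver signals a new
       frame, then runs the user's image processing code on it. */
    Void processingTask(Void)
    {
        for (;;) {
            SEM_pend(newFrameSem, SYS_FOREVER);   /* wait for a frame */
            process_frame(processingBuffer);
        }
    }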

Figure 3. The software environment.

Six modules were integrated into the kernel (figure 3):
(i) Device Initialization
(ii) USB Stack
(iii) GUI
(iv) Video Input
(v) A/D-D/A
(vi) Events Framework

The initialization module contains the functions which set up all the devices present on the board. It runs just once at the system start-up, initializing the video decoder, the USB host controller, the FPGA, the graphics processor and the analog I/O subsystem, as well as all the DSP on-chip peripherals (Timers, External Memory Interfaces, PCI, EDMA and Serial Ports).

The USB stack module contains the routines for the low-level communications between the DSP and the USB host controller ISP1161. By means of these routines it is possible to read and write the host controller registers, to set up and send the typical USB protocol packets (control, bulk, interrupt and isochronous packets) and to manage the data exchange between the host controller and the peripherals. At present, only a mouse and a keyboard can be connected to the board. When the peripheral is attached, the USB enumeration protocol begins and, subsequently, the corresponding peripheral driver is scheduled for running. The driver, which runs until the peripheral is detached, monitors the events generated by the peripheral (mouse movements, pressed keys, etc.) and notifies the Events Framework module of these events.

The GUI module contains the routines for the management of graphical objects. The appearance of the GUI is very similar to those we commonly use in standard workstations. As the system is designed for video processing applications, the main component is, of course, a playback window. A number of more general objects, such as buttons, numerical controllers, numerical displays, LEDs, graphs, strings and images, are also available. The objects can be created and customized by the user code, usually at the initialisation of the system. Subsequently, a driver will refresh their graphical representation according to changes in their properties.

To better understand the use of the GUI, the example of a button is reported in detail. To create a button in the GUI, the function create_button() must be called. The function initializes all the properties of the button, which are:

- Name: name of the button;
- Position: position of the button in the GUI layout;
- Size: dimensions of the button;
- Bitmap: specifies the bitmap of the button;
- Type: specifies how the button works (switch, pulse);
- Status: status (pressed/released) of the button;
- Active: specifies if the button responds to a mouse click.
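
As a sketch of what creating such a button might look like in user code: create_button() is the function named above, while the property structure, the constants and the exact signature are our assumptions, since the paper does not give them.

    /* Hypothetical property structure mirroring the list above. */
    typedef struct {
        const char *name;    /* Name                        */
        int x, y;            /* Position in the GUI layout  */
        int width, height;   /* Size                        */
        const void *bitmap;  /* Bitmap                      */
        int type;            /* Type: switch or pulse       */
        int status;          /* Status: pressed/released    */
        int active;          /* Active: reacts to clicks?   */
    } button_props_t;

    /* Framework function named in the text; signature assumed. */
    extern int create_button(const button_props_t *props);

    int sobel_button;   /* handle returned by create_button() */

    /* Create a switch-type "Sobel" button, initially released. */
    void make_sobel_button(void)
    {
        static const button_props_t props = {
            "Sobel",    /* name                          */
            10, 440,    /* position                      */
            80, 30,     /* size                          */
            0,          /* default bitmap (assumption)   */
            0,          /* 0 = switch (assumption)       */
            0,          /* 0 = released (assumption)     */
            1           /* active                        */
        };
        sobel_button = create_button(&props);
    }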

Once the button is created, two functions are available for the software programmer to interact with it:
- change_status_button(), which changes the status of the button;
- check_button_status(), which returns the status of the button.
The driver which refreshes the graphical representation of the GUI will change the appearance of the button whenever the status changes.

The Video Input module contains the routines for the set-up of the video decoder and the driver which manages the acquisition of an image in the input buffer. It is possible to choose the characteristics of the input video signal, such as resolution, color depth, brightness and contrast.

The A/D-D/A module contains the routines for the management of the analog signals. With these routines, it is possible to set up the sampling parameters (number of channels and sampling frequency).

The Event Framework manages the events which are generated by the system and coordinates the interaction among the other modules. Furthermore, in response to some events, the Event Framework runs pieces of user code. For example, this module checks whether the mouse has been clicked and whether the cursor is over a button of the GUI. When these two conditions are true, the event "button pressed" is generated, which runs a piece of user code. This module also manages the video data flow and generates the event "new image in the buffer", which can be used to start an image processing algorithm. A minimal sketch of this mechanism is given below.
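
In the sketch, check_button_status() and the two event names come from the framework described above; the registration function on_event() and the status constant are hypothetical details added for illustration.

    /* Framework function named in the text; signature assumed. */
    extern int  check_button_status(int button);
    /* Hypothetical registration call with the Events Framework. */
    extern void on_event(const char *event, void (*handler)(void));

    extern int sobel_button;   /* created with create_button() above */

    #define BTN_PRESSED 1      /* assumed status encoding */

    /* User code run on the "button pressed" event. */
    static void button_pressed(void)
    {
        if (check_button_status(sobel_button) == BTN_PRESSED) {
            /* e.g. select the Sobel operator for the next frames */
        }
    }

    /* User code run on the "new image in the buffer" event:
       start the image processing algorithm on the new frame. */
    static void new_image(void)
    {
        /* process_frame(processing_buffer);  -- user algorithm */
    }

    void register_user_code(void)
    {
        on_event("button pressed", button_pressed);
        on_event("new image in the buffer", new_image);
    }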

Figure 4. Video data flow.

The video data flow is schematically depicted in figure 4. The method used is based on a double-buffering scheme: one buffer (capture-buffer) is used to acquire the current frame, while a second buffer (processing-buffer) contains the previous frame, which is made available to the user for elaboration. Both of these buffers are placed in the internal memory of the DSP. The data captured by the video decoder are temporarily stored in the FIFO memory, as seen in figure 4. When a line of the image is present in the memory, the data are moved to the capture buffer by a DMA transfer (DMA_1). Once all the lines have been moved to the current frame, that is, when one image is complete, the buffers switch and an event is generated which runs the user code on the new data. The user code processes the data and returns them to the output buffer, which can be allocated anywhere in the memory (on-chip, SBSRAM, SDRAM). The output data are then converted into the video output format and moved to the video buffer, located in the SBSRAM, by a DMA transfer (DMA_2). Finally, the data are moved to the playback buffer of the graphics processor through the PCI bus.

5. Developing a Video Processing Application

The criteria for developing a video processing application on our system are similar to those used with object-oriented languages. The software engineer has to write pieces of code in response to events, the most important of which is the acquisition of a new image in the input buffer. This event starts the main part of the software, where the video processing algorithms are implemented.

As an example, let us consider the design of a simple video processing application which we used to test the system. In this application we implemented two classical edge detectors: the Sobel gradient and the Roberts gradient [9]. The video output can show the original video signal or the video signal processed by either of the two edge detectors. A threshold, whose value is set by the end-user, can also be applied to the results of the edge detection algorithms. The following steps have to be carried out to complete the design of the system:

(i) initialisation of video input/output parameters;
(ii) GUI design;
(iii) development of pieces of code in response to events;
(iv) development of the video processing code.

For step (i), it is sufficient to set up just a few parameters of the image, such as size, color depth and zoom factor, as in the sketch below.
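
The paper does not give the exact set-up API, so the function and parameter names in this sketch are hypothetical; the values match the configuration used in the benchmark of section 6.

    #define FRAME_WIDTH    512
    #define FRAME_HEIGHT   512
    #define BITS_PER_PIXEL   8

    /* Hypothetical initialisation calls for step (i). */
    extern void video_input_setup(int width, int height,
                                  int bits_per_pixel, float zoom);
    extern void video_output_setup(int width, int height,
                                   int bits_per_pixel);

    void init_video(void)
    {
        /* 512x512 pixels, 8 bit/pixel, unity zoom factor. */
        video_input_setup(FRAME_WIDTH, FRAME_HEIGHT, BITS_PER_PIXEL, 1.0f);
        video_output_setup(FRAME_WIDTH, FRAME_HEIGHT, BITS_PER_PIXEL);
    }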

Figure 5. Implementation of edge detectors.

The design of the GUI consists in the initialization of a number of graphical objects. With reference to figure 5, we decided to use six buttons (4 plain buttons and 2 arrow buttons), six labels, one numerical indicator and one playback window.

The third step consists in writing pieces of code in response to the events which are generated by the GUI. For example, when the arrow "up" is pressed, the corresponding piece of code increases the value of the threshold and updates the numerical indicator.

Finally, the part of the code which implements the video processing application is written. This code is run whenever a new image is acquired by the system. Accordingly, we could also consider this portion of the code as a response to an event. However, since this is the main event in a video processing system, we preferred to distinguish it from the others. With reference to the example, the code will check the status of the buttons and will then execute the relative algorithm. A sketch of this processing code is given below.
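
The paper does not list the processing code itself, so the following is a minimal sketch of a thresholded Sobel gradient on an 8-bit grayscale frame; the buffer layout and the inexpensive |gx| + |gy| magnitude approximation are our assumptions.

    #include <stdlib.h>  /* abs() */

    #define W 512  /* frame width  */
    #define H 512  /* frame height */

    /* Thresholded Sobel gradient: reads the processing buffer and
       writes a binary edge map to the output buffer. Border pixels
       are skipped, since the 3x3 kernels need all eight neighbours. */
    void sobel_threshold(const unsigned char *in, unsigned char *out,
                         int threshold)
    {
        for (int y = 1; y < H - 1; y++) {
            for (int x = 1; x < W - 1; x++) {
                const unsigned char *p = in + y * W + x;

                /* Horizontal and vertical Sobel responses. */
                int gx = -p[-W-1] - 2*p[-1] - p[W-1]
                         + p[-W+1] + 2*p[ 1] + p[W+1];
                int gy = -p[-W-1] - 2*p[-W] - p[-W+1]
                         + p[ W-1] + 2*p[ W] + p[ W+1];

                /* |gx| + |gy| as a cheap gradient magnitude estimate. */
                int mag = abs(gx) + abs(gy);

                out[y * W + x] = (mag > threshold) ? 255 : 0;
            }
        }
    }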

6. Results and Conclusions

The development of the DSP board has led to the following key results:
(i) the realization of a programmable and easily customisable embedded board with remarkable processing resources;
(ii) the integration of a flexible GUI on a small-sized embedded board;
(iii) the realization of a hardware/software platform for the rapid prototyping of complex image processing applications.

The example described in section 5 was used to benchmark the system. The system was able to process every pixel of every image of the sequence in real time when the video signal was acquired with a resolution of 512x512 pixels, 8 bit/pixel, 25 frames/sec. In the worst case, the elaboration of a single image was carried out in 16 msec (40 msec were available). Besides processing the images, the DSP has to manage the peripherals, the video data I/O and the GUI. The benchmarks revealed that this overhead of about 9% does not compromise the performance of the system. The two edge detectors were coded in the C language and were easily included in the software framework. While the processing is in progress, it is possible to vary the parameters of the filter, that is, the threshold in the current example, as well as to switch between the two filters. The system proved to be a very efficient tool for evaluating the performance of the two operators in real-time video processing.

Concerning the development of the board and the software framework, we found the TMS320C6415 to be an excellent processor in both processing the data and managing the peripherals we integrated in the system. The only weakness we found was the lack of off-the-shelf software and, in particular, the lack of middleware. We had to write many lines of code to build a framework for a video processing application. For example, we had to develop the driver for the VGA graphics processor for the video output module of the system and then fully implement the GUI, whereas these are software components which are commonly available for other CPUs. We also spent a lot of time developing the USB driver, which is now limited to mice and keyboards because of the lack of an entire stack. In this case, we think that a solution based on a general-purpose processor (GPP) could have benefited from a greater number of software products.

Nowadays, the unavailability of software is being partially solved by new development tools which support DSPs. For example, some of the RTOSs originally created for GPPs now run on some of the more popular DSPs. The most notable example may be the porting of Linux, an OS already largely used in embedded applications. If these new software platforms succeed, DSPs will be a more and more attractive alternative in embedded systems.

7. References

[1] F. Charot, G. Le Fol, P. Lemonnier, C. Wagner, R. Barzic, C. Bouville, "Toward hardware building blocks for software-only real-time video processing: the MOVIE approach", IEEE Trans. on Circuits and Systems for Video Technology, 1999, vol. 9 (6), pp. 882-894.
[2] O. D. Evans, Y. Kim, "Efficient Implementation of Image Warping on a Multimedia Processor", Real-Time Imaging, 1998, vol. 4 (6), pp. 417-428.
[3] K.S. Kim, J.H. Park, R.C. Kim, R.H. Park, S.U. Lee, I.C. Kim, "Real-Time Implementation of the Relative Position Estimation Algorithm Using the Aerial Image Sequence", Real-Time Imaging, 2002, vol. 8 (1), pp. 11-21.
[4] K. Guttag, R.J. Gove, J.R. Van Aken, "A single-chip multiprocessor for multimedia: the MVP", IEEE Computer Graphics and Applications, 1992, vol. 12 (6), pp. 53-64.
[5] The MathWorks, Inc., "Real-Time Workshop User's Guide" (Release 14), June 2004.
[6] K.H. Hong, W.S. Gan, Y.K. Chong, T.F. Cheong, S.H. Tan, "Rapid prototyping of DSP algorithms on VLIW TMS320C6701 DSP", Microprocessors and Microsystems, 2002, vol. 26, pp. 311-324.
[7] Texas Instruments, "TMS320C6000 CPU and Instruction Set Reference Guide" (SPRU189F).
[8] Texas Instruments, "TMS320 DSP/BIOS User's Guide" (SPRU423A).
[9] R. Gonzalez, R. Woods, "Digital Image Processing", Addison Wesley, 1992, pp. 414-428.

This paper was presented at the Texas Instruments EDERS (European DSP Education and Research Symposium) conference held on 16th November 2004 in Birmingham, UK.