}w !"#$%&'()+,-./012345= 1 ) & ( y < ( y s i z e m i n u s o n e ) ) ) m a s t e r s t a t e [ 1 : 0 ] = { 1 ’ b0 , y [ 0 ] } ; i f ( y == ( y s i z e m i n u s o n e ) ) m a s t e r s t a t e = 0 ; end else begin i f ( y == 0 ) m a s t e r s t a t e = 2 ; i f ( y == 1 ) m a s t e r s t a t e = 0 ; i f ( ( y >= 2 ) & ( y < y s i z e ) ) m a s t e r s t a t e [ 1 : 0 ] = { 1 ’ b0 , ˜ y [ 0 ] } ; end end end 29
The variables x, y contain the current position within the video image data, field is the even/odd field indicator, master_state[1:0] indicates which of the actions A, B or C the deinterlacer should perform on the current line, and can_advance is a signal indicating that the remaining core components are ready for the next data item.

The deinterlacer_ram_buffer module is an Altera-specific instantiation of an embedded memory block forming a RAM to store one image line. The address of this embedded RAM block is controlled by the scheduler; the deinterlacer_mem_addr_delay module delays the address signals for line operation B. Operation B means that the deinterlacer must store the incoming line to the RAM buffer and at the same time load the data from the very same buffer. It is therefore necessary that the data can be read out of the buffer before the new image line data are written to it.

The deinterlacer_line_switch module provides the switching between operations A, B and C as requested by the scheduler module. Operation A (master_state = 2) means that the data received from the frame buffer component is stored to the RAM buffer and at the same time routed through the deinterlacer_line_switch to the output FIFO. Operation B (master_state = 1) means that the incoming data is stored to the RAM buffer while the previous line stored in the RAM buffer is read out and sent to the deinterlacer_line_switch, where its pixel data is averaged (interpolated) with the current line data and sent to the output FIFO. Operation C (master_state = 0) does not read the incoming pixel data but instead simply outputs the stored line from the RAM buffer to the output FIFO.

The remaining components of the deinterlacer core are mainly support functions that align the individual data and control signals to compensate for the latency of the respective communicating components.

To relax the requirements on the maximum frequency of the device logic fabric, the deinterlacer core processes two pixels at a time. This doubles the data bus width, but at the same time allows the operating frequency to be halved while maintaining the required bus bandwidth. The deinterlacer core expects the field data in a standard RGB color space with every color component having an 8-bit value range (0..255). The interpolation (vertical averaging) of the neighboring half-field image lines is done by adding the individual red, green and blue components of the pixel colors together (the two pixels in the RAM buffer from the previous image line and the two pixels currently being received and stored to the RAM buffer) and then shifting the result one bit position to the right, thereby calculating the arithmetic average of the two values.
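As an illustration of this averaging step, the following is a minimal Verilog sketch for a single pixel lane; the module and signal names are illustrative assumptions, not identifiers taken from the deinterlacer sources.

module line_average_sketch (
    input  wire [23:0] pix_prev,  // pixel from the previous line (RAM buffer)
    input  wire [23:0] pix_curr,  // pixel of the line currently being received
    output wire [23:0] pix_avg    // interpolated output pixel
);
    // 9-bit sums keep the carry of the 8-bit per-component additions.
    wire [8:0] sum_r = pix_prev[23:16] + pix_curr[23:16];
    wire [8:0] sum_g = pix_prev[15:8]  + pix_curr[15:8];
    wire [8:0] sum_b = pix_prev[7:0]   + pix_curr[7:0];

    // Dropping the lowest bit is the one-position right shift, i.e. the
    // arithmetic average of the two color components.
    assign pix_avg = { sum_r[8:1], sum_g[8:1], sum_b[8:1] };
endmodule

The actual core applies the same operation to two pixels per clock cycle, as described above.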
Figure 8.4: Top-level entity of the deinterlacer component [20]
8.2 Alpha blender
Alpha blending is an image processing algorithm for mixing two images into one, with the option to select the transparency of individual picture elements. In video stream processing, the input images are formed by the active picture data of the individual video frames. The transparency is selected by the alpha channel, which defines a transparency value for each pixel. The transparency range 0.0 to 1.0 can be translated to an integer representation; for example, with 8-bit resolution the range is 0..255. The value 0 means that the first image is fully visible with no visual contribution from the second one, and vice versa. The final pixel value is usually calculated separately for each coordinate in the pixel's color space. For example, with the RGB color space, the calculation can be described by the following equations:

\[
\begin{aligned}
out_R &= layerA_R \cdot layerA_\alpha + layerB_R \cdot (1 - layerA_\alpha) \\
out_G &= layerA_G \cdot layerA_\alpha + layerB_G \cdot (1 - layerA_\alpha) \\
out_B &= layerA_B \cdot layerA_\alpha + layerB_B \cdot (1 - layerA_\alpha)
\end{aligned}
\tag{8.1}
\]
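As a worked instance of equation 8.1, with illustrative values layerA_α = 0.25, layerA_R = 200 and layerB_R = 100:

\[
out_R = 200 \cdot 0.25 + 100 \cdot (1 - 0.25) = 50 + 75 = 125
\]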
The alpha value can either be fixed for the entire image or delivered to the blender core as a separate value for each individual pixel, for example as the unused 8 bits within a 32-bit pixel memory window for 24-bit pixel colors. In this work, the blender core has a fixed alpha channel value for the entire active picture window. Although the per-pixel alpha channel was initially considered, the fixed alpha solution was preferred to provide a simple way of generating the OSD menu. The main reason for this decision was that the PC video feed is used as the source for the OSD menu, and it would be problematic to transmit the alpha channel through the standard 24-bit-per-pixel DVI interface. Using the fixed alpha value, the entire pixel value range of the DVI interface can be used for the pixel color space coordinates, and the OSD generation is achieved by simply displaying an image on the x86 host system graphics output.

This solution also has its drawbacks, most notably the inability to display a non-transparent OSD image on top of the live camera video feed. This was resolved by dedicating a single pixel color from the x86 host system as the transparent color value. When this color is encountered by the blender core, the value of the camera video pixel is assigned to the output, regardless of the alpha value setting. This allows the generation of either a non-transparent or a semi-transparent OSD image on top of the live video feed.

8.2.1 Principle of operation

The core processes two input pixel streams and produces a blended pixel stream on the output. One input stream is a directly connected video feed from the x86 host system, which is used as a reference video signal for the output video feed.
This means that the output video feed has the same parameters (pixel clock, timing, resolution) as the video feed from the x86 host system. Into this video feed, the live video signal from the camera input is mixed, using the preceding frame buffer and deinterlacer components. This allows the system to mix the two streams with no interruptions in the output video timing, since the camera feed is passed through the frame buffer component and can therefore be matched to the reference video signal.

The calculation of the output pixel value is divided into separate calculations for each color component of the pixel. Each calculation of an output color component is further divided into pipelined calculation stages to relax the timing requirements of the design compared to an unpipelined implementation. For the calculation of the output values, the blender core uses equations 8.1 translated into the integer domain; an illustrative sketch of such a calculation is given below.

8.2.2 Implementation

The blender component is implemented as a Verilog HDL entity, instantiated in a higher-level schematic design file in the Quartus design environment. The reference input video signal is fed to the core using the pixel_b_in[23..0] bus together with the reference video timing signals de_in, hsync_in and vsync_in. The core is clocked by the reference video signal clock connected to the core clock input clock_in.
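The following is a minimal sketch of the integer-domain, pipelined blend for the red component, including the transparent color key described in section 8.2; the module name, the key color value, the alpha_value input and the pixel_a_in port are illustrative assumptions, not the actual identifiers of the blender core.

module blend_red_sketch (
    input  wire        clock_in,
    input  wire [23:0] pixel_a_in,   // camera feed (assumed to be layer A)
    input  wire [23:0] pixel_b_in,   // reference feed from the x86 host
    input  wire [7:0]  alpha_value,  // fixed alpha for the whole picture window
    output reg  [7:0]  out_red
);
    localparam [23:0] KEY_COLOR = 24'hFF00FF;  // assumed transparent color

    wire [7:0] a_red = pixel_a_in[23:16];
    wire [7:0] b_red = pixel_b_in[23:16];

    reg [15:0] blend_red;      // stage 1: A*alpha + B*(255 - alpha)
    reg [7:0]  a_red_d;        // camera red, delayed to match stage 1
    reg        transparent_d;  // key-color flag, delayed to match stage 1

    always @(posedge clock_in) begin
        // Stage 1: integer form of equation 8.1; alpha + (255 - alpha)
        // is always 255, so the weighted sum fits in 16 bits.
        blend_red     <= a_red * alpha_value + b_red * (8'd255 - alpha_value);
        a_red_d       <= a_red;
        transparent_d <= (pixel_b_in == KEY_COLOR);
        // Stage 2: take the top byte (dividing by 256 approximates the
        // division by 255); on the key color, pass the camera pixel
        // through regardless of the alpha setting.
        out_red       <= transparent_d ? a_red_d : blend_red[15:8];
    end
endmodule

Splitting the multiply-accumulate and the final selection into two registered stages is what relaxes the timing, at the cost of one additional clock cycle of latency per stage.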
Figure 8.5: Schematic symbol for the blender module [20]
The output video signal is formed by the output pixel_out[23..0] together with the timing control signals de_out, hsync_out and vsync_out. The output video feed uses the same clock as the input reference video feed, i.e. clock_in.

The following is a code walkthrough for a single color component (red). The core starts by registering the input information to reduce the length of the input path and therefore improve the maximum operating frequency of the core.

always @( posedge clock_in ) begin
    pixel_a
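The excerpt of the walkthrough breaks off at this point; as a minimal sketch of what such an input-registering stage looks like in general (the *_reg names and the pixel_a_in port are hypothetical, not the core's actual identifiers):

reg [23:0] pixel_a_reg, pixel_b_reg;
reg        de_reg, hsync_reg, vsync_reg;

always @( posedge clock_in ) begin
    // Latch every input once so that the combinational input path
    // stays short, improving the achievable maximum clock frequency.
    pixel_a_reg <= pixel_a_in;   // camera feed (assumed port name)
    pixel_b_reg <= pixel_b_in;   // reference feed from the x86 host
    de_reg      <= de_in;
    hsync_reg   <= hsync_in;
    vsync_reg   <= vsync_in;
end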