IMAGE PROCESSING AND CAPTURING USING THE PHILIPS TRIMEDIA™ MULTIMEDIA PROCESSOR by Christian Sonntag

Submitted in partial fulfilment for the degree of Master of Science in Information Engineering.

Supervisor: Dr T J Ellis

City University Dept. of Electrical, Electronic and Information Engineering September 2000


MSc. In Information Engineering, Project Report

Christian Sonntag

1 Abstract

IMAGE PROCESSING AND CAPTURING USING THE PHILIPS TRIMEDIA™ MULTIMEDIA PROCESSOR

Christian Sonntag

This project deals with

• the implementation of an image processing library for the Philips TriMedia TM-1000 processor residing on a Philips TriCodec PCI board and of a library running on a PC,
• the evaluation of the image processing performance of the Philips TriMedia TM-1000 processor compared to other processors, namely the Intel Pentium II 400 MHz PC processor and the Texas Instruments TMS320C601 DSP processor,
• image capturing and real time image processing using the image capturing and processing capabilities of the Philips TriMedia TM-1000 processor, and
• the implementation of a graphical user interface running under Windows NT 4.0 which provides image input/output capabilities and accommodates the implemented image processing libraries.

The Philips TriMedia TM-1000 processor has been designed for multimedia applications like MPEG (Moving Picture Experts Group) video (de)compression, which is needed e.g. for DVD (Digital Versatile Disc) video applications. It provides a variety of multimedia-specific units like video and audio input/output facilities, several coprocessors and a powerful DSPCPU (Digital Signal Processor/Central Processing Unit). It was found that the Philips TriMedia TM-1000 processor cannot compete with a modern general purpose PC processor regarding the performance of image processing algorithms. This is mainly because the processor has been designed especially for applications which are parallel in nature (like multimedia applications), whereas the image processing algorithms implemented here are largely serial. Further optimisation of the algorithms could improve this situation.


2 Table Of Contents

1 Abstract
2 Table Of Contents
3 Introduction
4 The Development Environment
4.1 The Philips TriMedia TM-1000 Processor
4.1.1 VLIW (Very Long Instruction Word) Architecture
4.1.2 General-Purpose VLIW Processor Core (DSPCPU)
4.1.3 Coprocessors
4.1.4 Bus And Memory System
4.1.5 Input/Output Units
4.2 The Philips TriCodec Board
4.3 The TriMedia Software Development Environment
4.4 Summary
5 The Image Processing Libraries
5.1 The Structure Of A Device Independent Bitmap
5.2 General Structure Of The Libraries
5.2.1 The DOS Version Of The TriMedia Library
5.2.2 The Windows NT Version Of The TriMedia Library
5.2.3 The PC Library
5.3 Examples Of Algorithm Implementation
5.3.1 The Median Filter
5.3.2 The Marr-Hildreth Edge Detector
5.4 Performance Evaluation
5.5 Summary
6 Image Capturing And Realtime Processing
6.1 Philips TriMedia YUV Formats
6.2 The Image Capturing Software
6.3 Summary
7 The Host Program
8 Conclusions And Outlook
9 Acknowledgements
10 References
11 Appendix
11.1 Image Processing Functions
11.2 Algorithm Execution Times
11.3 Algorithms
11.3.1 The Shellsort Sorting Algorithm
11.3.2 The Wirth Median Algorithm
11.3.3 The Histogram Median Algorithm
11.4 How To Extend The Host Program
11.4.1 Adding Existing Functions/Libraries
11.4.2 Adding New Functions To The TriMedia Library
11.4.3 Adding New Functions To The PC Library
11.4.4 Including A New Library Into The Host Program
11.5 The Attached CD


3 Introduction

Image processing and computer vision are very important and rapidly developing fields which find applications in a large variety of scientific and industrial areas like medicine, the military, astronomy, surveillance or the automobile industry. Applications range from medical imaging through automatic vehicle or missile guidance systems to surveillance tasks like face recognition. Many image processing techniques, such as image enhancement or feature extraction methods, are very time-consuming operations and hence require powerful computers, particularly in real time image processing applications like surveillance or guidance. Over the last years the available computational power has been increasing rapidly with every new generation of modern processors (be it DSPs (Digital Signal Processors), ASICs (Application Specific Integrated Circuits) or general purpose PC processors). This increase of computational power makes it possible to employ even complex image processing methods in real time systems, which is for instance very important for modern surveillance or guidance systems. It is therefore necessary to investigate and exploit the possibilities of every new processor generation in order to implement more and more complex image processing systems. This is the main reason why this project was commenced. The possibilities of the TriMedia processor reg[…]

which provides a large amount of flexibility, the TM-1000 can also be used to implement a variety of other multimedia and image processing algorithms [1,2,3]. The processor can be deployed as the sole CPU in standalone systems, as part of a multiprocessor configuration or as a coprocessor on a plug-in card for a traditional PC CPU (e.g. the Philips TriCodec board, see chapter 4.2). Besides its processor core, the TM-1000 also contains several multimedia input/output units, specialised coprocessors and a high-performance bus and memory system, which enable it to greatly accelerate multimedia applications in particular.

Fig. 4.1: TM-1000 block diagram [2]

4.1.1 VLIW (Very Long Instruction Word) Architecture

The VLIW instruction set architecture [4] uses fixed-length instructions which each comprise several independent operations (one TriMedia VLIW instruction can contain up to five independent operations). This contrasts with the RISC (reduced instruction set computing) architecture, which also uses fixed-length instructions but specifies only one operation per instruction, and with the CISC (complex instruction set computing) architecture, which typically uses variable-length instructions, each encompassing several dependent operations. CISC instruction sets were designed without considering superscalar hardware designs (superscalar means that a processor is able to issue more than one operation per clock cycle to several execution units). The assumed execution model was therefore serial in nature, which makes it necessary for modern superscalar processor implementations to integrate additional logic hardware. This logic, typically called a dispatcher, discovers and exploits instruction-level parallelism using highly sophisticated techniques like branch prediction. This complicates the hardware design and does not guarantee the best possible utilisation of the given resources, since a dispatcher can only take a small window of instructions into account. Although the CISC architecture can be seen as an inferior instruction set, almost all desktop microprocessors (for example the x86 PC processors) use it, since instruction set compatibility must be preserved between processor generations to allow customers to protect their software investments.

The great advantage of the VLIW architecture is that it explicitly specifies several independent operations per instruction and thus expresses parallelism at the software level: the complicated and expensive dispatcher logic becomes dispensable, and the chip area gained can be used to implement additional hardware functions. VLIW processors rely on the compiler to extract as much parallelism as possible. This has the advantage that parallelism is created during compilation and that the routines which find it only have to be implemented once, in the compiler, instead of in every processor. Another benefit of moving the parallelism logic from the hardware to the software level is that it can still be improved after the chips have been manufactured. Moreover, the compiler is able to take the whole program into account in order to achieve maximum parallelism, whereas a hardware dispatcher can only consider a small window of code.
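To make the idea of compile-time packing concrete, the following C sketch greedily groups a straight-line sequence of operations into bundles of up to five slots, the TM-1000's issue width. It is a deliberately simplified model, not the TriMedia scheduler: the `Op` type and the `pack_bundles` function are hypothetical, operations are kept in program order, only register dependences inside a bundle (read-after-write and write-after-write) are checked, and the real constraints on which functional unit each slot may target, operation latencies and decision trees are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_BUNDLE 5  /* TM-1000: up to five operations per VLIW instruction */

/* Hypothetical three-address operation: dst = src1 op src2 (register numbers). */
typedef struct {
    int dst;
    int src1;
    int src2;
} Op;

/* Packs ops[0..n-1] into bundles in program order. An operation joins the
 * current bundle only if a slot is free and it neither reads nor writes a
 * register written by an earlier operation of the same bundle.
 * Stores each operation's bundle index in bundle_of[] and returns the
 * number of bundles. */
int pack_bundles(const Op *ops, int n, int *bundle_of)
{
    int bundles = 0;  /* bundles opened so far */
    int used = 0;     /* slots used in the current bundle */
    int start = 0;    /* index of the first operation in the current bundle */

    assert(n >= 0 && (n == 0 || (ops != NULL && bundle_of != NULL)));

    for (int i = 0; i < n; i++) {
        int conflict = (i == 0) || (used == SLOTS_PER_BUNDLE);
        for (int j = start; j < i && !conflict; j++) {
            int w = ops[j].dst;  /* register written earlier in this bundle */
            if (w == ops[i].src1 || w == ops[i].src2 || w == ops[i].dst)
                conflict = 1;    /* RAW or WAW dependence: must wait */
        }
        if (conflict) {          /* open a new bundle */
            bundles++;
            used = 0;
            start = i;
        }
        bundle_of[i] = bundles - 1;
        used++;
    }
    return bundles;
}
```

For the sequence r10 = r2 + r3, r11 = r4 + r5, r12 = r6 + r7, r13 = r10 + r11, r14 = r8 + r9, the sketch produces two bundles: the first three operations share one, and the dependent fourth operation opens a second one, which the independent fifth operation also joins. A real scheduler could instead move the fifth operation into the first bundle, which is the kind of wider-window reordering the text attributes to the compiler.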

4.1.2 General-Purpose VLIW Processor Core (DSPCPU)

The 32-bit DSP-like processor core of the TriMedia chip [1,2] implements the VLIW instruction set architecture (see chapter 4.1.1) and is designed to run at a clock speed of 100 MHz. Each of its VLIW instructions contains up to five operations, which comprise

• common RISC operations,
• special DSP operations which perform SIMD (single instruction, multiple data) operations and are very important, particularly in multimedia processing,
• special multimedia operations which exploit the TriMedia architecture to dramatically speed up multimedia applications by performing in one operation what would otherwise require numerous traditional operations, and
• a set of 32-bit floating point operations.

Each of these five operations per instruction can target one of the 27 pipelined functional units of the DSPCPU, which include, amongst others, five integer ALUs (Arithmetic Logic Units), two DSP ALUs and two floating point ALUs, as well as specialised units for integer, floating point and DSP multiplication, two shifting units and floating point division/comparison units. The DSPCPU is equipped with 32 kilobytes of instruction cache and 16 kilobytes of data cache; the data cache is dual-ported to allow two simultaneous accesses during one clock cycle.

The TM-1000 DSPCPU comes with 128 32-bit general purpose registers, each of which can be read or written by every operation in every VLIW instruction (except that r0 and r1 are write-protected and always contain the values 0 and 1), and with five special purpose registers such as the program and clock cycle counters and the branch handling registers (Fig. 4.2). All operations the DSPCPU provides (except the load immediate operations) are optionally guarded, i.e. executed conditionally depending on the LSB (Least Significant Bit) of the guarding register.

Register | Size | Details
r0 | 32 bits | Always reads as 0x0; must not be used as destination of operations
r1 | 32 bits | Always reads as 0x1; must not be used as destination of operations
r2-r127 | 32 bits | 126 general-purpose registers
PC | 32 bits | Program counter
PCSW | 32 bits | Program Control & Status Word
DPC | 32 bits | Destination program counter; latches target of taken branch that is interrupted
SPC | 32 bits | Source program counter; latches target of taken branch that is not interrupted
CCCOUNT | 64 bits | Counts clock cycles since reset

Fig. 4.2: DSPCPU registers [2]

The DSPCPU instruction set provides some custom operations which are especially designed to drastically improve the performance of multimedia applications. These operations encompass functions like sums of products, merging, byte averaging and multiplying and motion estimation, as well as some special DSP functions like addition, multiplication or subtraction. If these operations are employed appropriately, multimedia algorithms like MPEG (de)compression or matrix transposition can be greatly accelerated compared to implementations using traditional operations.
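Two of the mechanisms described above, guarded execution and byte-wise SIMD averaging, can be modelled in portable C, for instance as a reference version to check optimised code against. The sketch below is an emulation on an ordinary host, not TriMedia code: it does not use the TriMedia custom operations or their names, the helper names (`guarded`, `threshold_pixel`, `avg4_bytes`) are hypothetical, and the byte average is assumed to round upwards, i.e. (x + y + 1) / 2 per byte; the rounding of the actual custom operation should be checked against the TriMedia documentation [1].

```c
#include <stdint.h>

/* Model of a guarded operation: new_value is committed only if the LSB of
 * the guard register is 1; otherwise the destination keeps old_value. */
static uint32_t guarded(uint32_t guard, uint32_t new_value, uint32_t old_value)
{
    return (guard & 1u) ? new_value : old_value;
}

/* Binary threshold expressed as two complementary guarded assignments
 * instead of a branch, in the style of a compiler's if-conversion. */
uint8_t threshold_pixel(uint8_t p, uint8_t t)
{
    uint32_t g = (p > t) ? 1u : 0u;  /* guard register holds the condition */
    uint32_t out = 0u;
    out = guarded(g, 255u, out);     /* takes effect only if p > t  */
    out = guarded(g ^ 1u, 0u, out);  /* takes effect only if p <= t */
    return (uint8_t)out;
}

/* SIMD-within-a-register average of four unsigned bytes packed into 32-bit
 * words, rounding up: per byte, (x | y) - ((x ^ y) >> 1) == (x + y + 1) / 2.
 * The mask removes the bit shifted in from the neighbouring byte, and no
 * borrow crosses byte boundaries because (x | y) >= (x ^ y) >> 1 per byte. */
uint32_t avg4_bytes(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) >> 1) & 0x7F7F7F7Fu);
}
```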


4.1.3 Coprocessors

The image coprocessor (ICP) of the TM-1000 [1,2] relieves the DSPCPU of several time-consuming tasks and thus enables it to devote its resources to more sophisticated calculations. The tasks the image coprocessor can perform encompass image processing and manipulation operations like filtering, colorspace conversion, alpha blending, chroma keying, bit masking and both horizontal and vertical scaling, as well as simply moving or copying image data from memory to memory or to the PCI bus, where it can be retrieved by a graphics card or another device of a PC. For this purpose the image coprocessor contains several specialised units like a 5-tap filter, a colorspace converter (YUV -> RGB), an alpha blending unit and an output formatter. A bank of FIFO (first in, first out) buffers is used by the image coprocessor to exchange data with the TM-1000 bus and for communication between its internal units. The image coprocessor can only apply one-dimensional scaling and filtering at a time; the image therefore has to be read from and written to memory twice in order to implement, e.g., a two-dimensional filter.

The variable length decoder (VLD) coprocessor is particularly important for MPEG-1 and MPEG-2 decoding. It performs the tedious task of decoding Huffman-encoded (entropy-encoded) video streams such as MPEG.
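The two-pass scheme can be illustrated by a separable 2-D filter built from two 1-D passes with an intermediate image in memory. The following C sketch runs on any host and is not ICP code; the binomial 5-tap kernel {1, 4, 6, 4, 1}/16, the replication of edge pixels at the borders and the function names are assumptions chosen for illustration. Because the intermediate image is stored with 8 bits per pixel, as it would be when written back to memory between passes, the result can differ by rounding from a single-pass 2-D convolution.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAPS 5
/* Assumed example kernel: binomial weights summing to 16. */
static const int kernel[TAPS] = {1, 4, 6, 4, 1};

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One 1-D pass over a w x h 8-bit image: along x if horizontal is non-zero,
 * otherwise along y. Positions outside the image use the nearest edge pixel. */
static void filter_pass_1d(const uint8_t *src, uint8_t *dst, int w, int h, int horizontal)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int acc = 0;
            for (int k = 0; k < TAPS; k++) {
                int d = k - TAPS / 2;
                int sx = horizontal ? clamp_int(x + d, 0, w - 1) : x;
                int sy = horizontal ? y : clamp_int(y + d, 0, h - 1);
                acc += kernel[k] * src[sy * w + sx];
            }
            dst[y * w + x] = (uint8_t)((acc + 8) >> 4);  /* divide by 16, rounded */
        }
    }
}

/* Separable 2-D filter: horizontal pass into a temporary image, then a
 * vertical pass. Returns 0 on success, -1 if allocation fails. */
int filter_2d_separable(const uint8_t *src, uint8_t *dst, int w, int h)
{
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    uint8_t *tmp = malloc((size_t)w * (size_t)h);
    if (tmp == NULL)
        return -1;
    filter_pass_1d(src, tmp, w, h, 1);  /* first read/write of the image  */
    filter_pass_1d(tmp, dst, w, h, 0);  /* second read/write of the image */
    free(tmp);
    return 0;
}
```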

4.1.4 Bus And Memory System

The TM-1000 is capable of addressing 64 MBytes of external SDRAM (Synchronous Dynamic Random Access Memory) or SGRAM (Synchronous Graphics Random Access Memory), an enhanced version of SDRAM [2]. This memory is used to store image or audio data and can be accessed via the high-speed internal bus, or data highway, of the TM-1000. The internal bus also connects all functional blocks of the TM-1000. It consists of separate 32-bit data and address buses. To allow flexible and effective use of the available bandwidth, the bus bandwidth allocation is programmable.

4.1.5 Input/Output Units

The TM-1000 chip comes with a variety of DMA (Direct Memory Access) driven input/output units [2] which handle the capturing and output of video and audio data as well as the communication with a modem/ISDN adapter and the PCI bus of a PC system.

The video input (VI) unit of the TM-1000 is able to capture video data either in a CCIR601/656 compliant format with 8-bit resolution at rates of up to 19 Megapixels/second (for example from a digital or analog camera) or as a raw data stream with 8 or 10-bit resolution at rates of up to 38 Megabytes/second, and to write it to SDRAM. The VI unit expects input data in YUV 4:2:2 format and stores the Y, U and V components separately in appropriate data structures. Devices which are not CCIR601/656 compliant can be connected using a CCIR601/656 compatible video decoder. The VI unit is capable of either full or half resolution video capturing; in half resolution mode the video data is subsampled horizontally by a factor of two. The video output (VO) unit is the complement to the VI unit in that it performs the inverse function: it generates a CCIR601/656 compliant YUV data stream from the separate Y, U and V data structures of an image in SDRAM. All CCIR601/656 compatible devices like digital video recorders or external video encoders can be attached gluelessly.

The highly programmable audio input/output units are capable of reading or outputting 8 or 16-bit stereo sound samples at sampling rates from 1 Hz to 100 kHz with a resolution of 0.07 Hz and support many PC standard memory data formats. They can be connected to serial ADCs (Analog-Digital Converters) and DACs (Digital-Analog Converters).

The I²C interface is usually used to control I²C compatible peripheral devices and provides data transfer rates of up to 400 kbit/s. If the TM-1000 is used in embedded systems, the I²C interface can also be used to read the boot image from an EPROM at startup.

The synchronous serial interface can be used to connect a variety of multimedia devices, like modems, ISDN adapters or video phones. It implements full duplex serialisation/deserialisation. Any connected device must at least support synchronous initialisation and the reception and transmission of data. Since the communication algorithms are implemented in software, the interface is very flexible regarding transmission protocols.
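As an illustration of the separate Y, U and V data structures produced from a 4:2:2 stream, the following C sketch splits one line of interleaved samples into three planes. It assumes the CCIR 656 sample order Cb, Y, Cr, Y (often called UYVY), 8-bit samples and an even line width; the function name is hypothetical, and the sketch says nothing about how the VI unit itself is programmed. In 4:2:2 sampling each pair of horizontally adjacent pixels shares one U and one V sample, so the chroma planes are half as wide as the luma plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits one line of interleaved 4:2:2 samples (U0 Y0 V0 Y1 U1 Y2 V1 Y3 ...)
 * into separate planes: y receives width samples, u and v width/2 each. */
void split_uyvy_line(const uint8_t *in, size_t width,
                     uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(width % 2 == 0);  /* two pixels share one chroma sample pair */
    for (size_t i = 0; i < width / 2; i++) {
        const uint8_t *q = in + 4 * i;  /* one group: U, Y, V, Y */
        u[i]         = q[0];
        y[2 * i]     = q[1];
        v[i]         = q[2];
        y[2 * i + 1] = q[3];
    }
}
```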


The PCI interface of the TM-1000 facilitates its integration into PC environments, in which PCI is the standard high-speed bus, but it can also be used in embedded systems as an interface to peripheral devices which implement functions the TM-1000 lacks. It forms a glueless interface between the bus of the TriMedia processor and the PCI bus of a PC.

4.2 The Philips TriCodec Board

The TriMedia-driven Philips TriCodec PCI plug-and-play card [6] employed in this project has been designed as a video board for home or semiprofessional applications and supports capturing, encoding, decoding, editing and composing video productions on a PC. To this end the card comes with supporting software that turns the PC into a versatile, high-quality multimedia workstation. Video data can be stored on VCR tape, or on CD via a CD-R or CD-RW drive, and a Video-CD authoring facility is also provided. The board offers a variety of capabilities such as MPEG-1 (de)compression, MPEG-2 decompression, real time analogue video/audio capturing and editing, and playback of Video-CD content as well as MPEG movie files from a variety of sources like CD or DVD, either in PAL and NTSC format or on the PC screen.

Fig. 4.3: The TriCodec board [6]

4.3 The TriMedia Software Development Environment

The TriMedia software architecture (TSA) [1] has been designed to provide interoperability, reusability and clarity of components in order to facilitate application development and to allow seamless collaboration between application programmers and software component developers. The TSA is composed of various parts. Its backbone is formed by the compilers, linker, debuggers, instruction schedulers and all related tools. Other parts of the TSA are the basic C library, the application, device and general purpose APIs, and the TriMedia Software Streaming Architecture (TSSA).


The TriMedia SDE (Software Development Environment), which provides all these parts, has basically been modelled as a traditional C based command line environment. Traditionally, assembly language has been used for developing real time DSP and multimedia applications. This is disadvantageous because assembly code is not portable to other platforms, whereas high-level language source code is. The TSA therefore also allows applications to be developed in C or C++. Highly sophisticated compiling and scheduling tools produce extremely optimised code without the need for programming in assembler. Fig. 4.4 shows the TriMedia compilation and execution system. First the C core compiler creates maximally parallelised code from a C or C++ source file and stores it in an intermediate file (extension .t). To achieve maximal parallelism the compiler also inserts decision trees into the intermediate file, which therefore grows in size. Decision trees can only be entered at the beginning but can have multiple exits, and they provide a means of gaining more fine-grained parallelism.

Fig. 4.4: TriMedia compilation system [1]

The intermediate file is then analysed by the instruction scheduler, which uses a machine description file to pack the assembly level instructions from the intermediate file into VLIW instructions readable by the assembler. The machine description file contains information about the functional units and registers of the actual processor which the scheduler can use. The optimisation performed during scheduling is very complex. The scheduler uses techniques like decision tree analysis, jump probabilities or speculative jump execution to create maximally efficient code, and it also has to take processor constraints like operation latencies or functional unit availability into account. During scheduling, statistics about the generated schedules are created, which the developer can analyse to fine-tune the program.

Further optimisation can be achieved by performing the so-called compile-profile-recompile cycle, in which the compiler generates profiling information about the program. Simulating the program with the tmsim machine level simulator produces profiling information that points out critical sections within the program. The compiler can use this information to create a program based on more accurate branch probabilities, which is therefore faster in execution. Other optimisation techniques include loop unrolling (the loop body is replicated several times, so that fewer iterations, and hence fewer branches, are executed) and grafting (jumps to decision trees are replaced by the trees themselves, which increases the program size but usually also increases performance).

The assembler uses the textual assembler code created by the scheduler (extension .s) and the machine description file to generate binary machine instructions and data, and stores the resulting program in an object file (extension .o). Finally, the linker transforms the object file into an executable file and links it to all needed libraries and the runtime support code.

The TriMedia SDE provides three libraries which facilitate application development and enable the developer to interact with the TriMedia chip:

• The device library provides APIs (Application Programming Interfaces) to interact with the various blocks of the TriMedia processor like the video and audio input/output units, the PCI and serial interfaces, the timer unit or the image and VLD coprocessors.
• The application library provides APIs which implement operations like file input/output, MPEG and Motion JPEG encoding/decoding, audio and video rendering, digitizing and transforming, or Dolby Digital AC-3 and Dolby Pro Logic support.
• The general purpose library offers utility APIs for downloading and dynamic linking of programs and for interprocessor communication between x86 processors and the TriMedia under Microsoft Windows 95 and NT.
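Loop unrolling can be illustrated with a simple point operation such as image inversion. The following C sketch shows a rolled loop and a version unrolled by hand by a factor of four, with a remainder loop for lengths that are not a multiple of four. On the TriMedia the compiler performs such transformations itself; this hand-written version, the unroll factor of four and the function names are assumptions for illustration only. The four statements in the unrolled body are independent of each other, which is the kind of operation group a VLIW scheduler can place into the same instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled version: one pixel and one loop branch per iteration. */
void invert_rolled(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(255 - in[i]);
}

/* Unrolled by four: one loop branch per four pixels; the four statements
 * in the body do not depend on each other. */
void invert_unrolled4(const uint8_t *in, uint8_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = (uint8_t)(255 - in[i]);
        out[i + 1] = (uint8_t)(255 - in[i + 1]);
        out[i + 2] = (uint8_t)(255 - in[i + 2]);
        out[i + 3] = (uint8_t)(255 - in[i + 3]);
    }
    for (; i < n; i++)  /* remainder when n is not a multiple of four */
        out[i] = (uint8_t)(255 - in[i]);
}
```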

4.4 Summary

The TriMedia TM-1000 processor has been designed particularly for functions which are highly parallel in nature (like many multimedia algorithms). Philips claims that the DSPCPU can perform over a billion operations per second. But since the TriMedia processor runs at a clock speed of only 100 MHz and can issue at most five operations per clock cycle (each to one of its 27 functional units), it can execute at most 500 million operations per second, assuming maximal parallelism of the program. Philips' claim must therefore refer to the special multimedia operations the chip provides, each of which processes several data items (e.g. four packed bytes) at once. These operations have not been used in the course of this project, so the peak performance here is 500 million operations per second. The actual performance is significantly lower, because the algorithm being executed would have to offer enough parallelism for the processor to issue five operations to its functional units in every cycle. This is very unlikely due to the rather serial nature of the image processing algorithms implemented. Chapter 5.4 contrasts the TriMedia TM-1000 processor, an Intel Pentium II PC processor running at 400 MHz and a new DSP chip from Texas Instruments regarding the performance and execution times of algorithms. The TriMedia processor has the big disadvantage that it is quite old compared to the Pentium II processor and will thus probably be far slower. Although the Pentium II is only a general purpose processor, it should be much faster than the TriMedia processor, since it runs at four times the clock speed and the TriMedia's advantage of parallelism is very small for the many image processing algorithms that are serial in nature. The newest member of the TriMedia family, the TM-1300, would probably be at least a match for the Pentium II processor regarding image processing algorithms.
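The peak figures used above follow directly from the clock rate and the issue width. The second line is an illustration only: it assumes, hypothetically, that every issued operation is a four-way packed-byte operation, which real code never achieves, and the source does not state how Philips' figure was derived.

$$
\begin{aligned}
P_{\text{peak}} &= f_{\text{clk}} \times N_{\text{issue}} = 100\,\text{MHz} \times 5 = 5 \times 10^{8}\ \text{operations/s},\\
P_{\text{packed}} &= 4 \times P_{\text{peak}} = 2 \times 10^{9}\ \text{byte operations/s} > 10^{9},\\
\frac{f_{\text{Pentium II}}}{f_{\text{TM-1000}}} &= \frac{400\,\text{MHz}}{100\,\text{MHz}} = 4.
\end{aligned}
$$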


5 The Image Processing Libraries

During this project two image processing libraries have been implemented, one using the TriMedia processor on the Philips TriCodec PC board and the other running directly on the PC host processor. The TriMedia library exists in two versions, one running under Microsoft DOS and one accessed through the host program, which is described in chapter 7. This chapter is concerned with the general structure of the libraries, algorithm implementation issues and a performance evaluation which compares the two libraries described here with a highly optimised image processing library published by Intel [33] and with a library developed within the scope of a concurrent MSc project [13], which runs on the Texas Instruments TMS320C601 DSP chip.

5.1 The Structure Of A Device Independent Bitmap

The image format used internally by the host program and all image processing libraries is the device independent bitmap (DIB). All images loaded from files are converted into this general format during the loading process. This section therefore describes the format, to help the reader follow the rest of this chapter. The general structure of a DIB is shown in Fig. 5.01.

Fig. 5.01: Structure of a device independent bitmap (file header, info header, palette, image data)
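To make the layout concrete, the following C sketch handles the pixel data of an 8-bit palette DIB using only properties of the format described in this section: rows are stored bottom-up, each row is padded to a multiple of 32 bits, and 8-bit pixel values index a palette of 4-byte entries. The field order blue, green, red, reserved follows the Windows `RGBQUAD` structure; the type and function names are hypothetical and header parsing is omitted. `dib8_unpad` removes the row padding, as the libraries' own image format does, and `map_palette_gray` shows why a palette helps with histogram mapping: an 8-bit gray-level mapping is applied to at most 256 palette entries instead of to every pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Palette entry layout as in the Windows RGBQUAD structure. */
typedef struct {
    uint8_t blue, green, red, reserved;
} PaletteEntry;

/* Bytes per stored row: width * bits_per_pixel rounded up to a multiple of 32 bits. */
size_t dib_row_stride(uint32_t width, uint32_t bits_per_pixel)
{
    return (((size_t)width * bits_per_pixel + 31u) / 32u) * 4u;
}

/* Byte offset of image line y, counted from the top of the image,
 * within the bottom-up DIB pixel data. */
size_t dib_row_offset(size_t stride, uint32_t height, uint32_t y)
{
    assert(y < height);
    return (size_t)(height - 1u - y) * stride;
}

/* Copies 8-bit DIB pixel data into an unpadded width*height buffer,
 * keeping the DIB's bottom-up line order but dropping the row padding. */
void dib8_unpad(const uint8_t *bits, uint32_t width, uint32_t height, uint8_t *out)
{
    size_t stride = dib_row_stride(width, 8u);
    for (uint32_t row = 0; row < height; row++)
        for (uint32_t x = 0; x < width; x++)
            out[(size_t)row * width + x] = bits[(size_t)row * stride + x];
}

/* Applies a gray-level mapping (e.g. from histogram equalisation) by
 * rewriting only the palette; the pixel indices stay unchanged. */
void map_palette_gray(PaletteEntry *palette, size_t entries, const uint8_t map[256])
{
    assert(entries <= 256);
    for (size_t i = 0; i < entries; i++) {
        uint8_t g = map[palette[i].red];  /* grayscale: red == green == blue */
        palette[i].red = palette[i].green = palette[i].blue = g;
    }
}
```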

The bitmap header comprises a bitmap file header, a bitmap information header and a palette (also called color table or lookup table, LUT), if the colordepth of the bitmap is not greater than 8 bits. Then the actual image data follows, defining the

RGB (red-green-blue) value of each pixel in the image or a pointer to a palette entry respectively. The image data is organised as lines of the image, starting with the bottom line of the image and ending with the top line. Hence the first pixel of the image data corresponds to the bottom left pixel of the image and the last pixel corresponds the to the top right pixel in

the image. Furthermore the image data is 16

MSc. In Information Engineering, Project Report

Christian Sonntag

aligned to a 32 bit boundary, i.e. the memory size of every image line is a multiple of 32 bits. This means that if the length of an image line in bits is not a multiple of 32, every line of the image data field is padded up to the next 32 bit boundary. The image format used by the image processing libraries does not use this 32 bit alignment, in order to facilitate image processing operations.

The bitmap file header consists of five entries of 16 or 32 bits length. They define the image type (must be "BM" for a bitmap), the size of the bitmap file in bytes (including the headers and the 32 bit alignment padding), two entries reserved for internal use which must be set to zero, and the offset, in bytes, from the start of the bitmap file header to the image data. The bitmap information header comprises eleven entries of 16 or 32 bits length. These include the size of the bitmap information header in bytes, the width and height of the image in pixels, the colordepth of the image in bits, the type of compression of the image (uncompressed or run length encoded), the size of the image data in bytes (only used for compressed images, may be zero for uncompressed images), and some values which may be set to zero in a DIB, like the horizontal and vertical resolution in pixels per meter and (only for images with a palette) the number of colors used in the image and the number of colors that are important.

The colordepth of an image defines how many bits are used to define each pixel: in an 8 bit image each pixel is defined by one byte of the image data, in a 24 bit image every pixel consumes three bytes of memory and in a 4 bit image each pixel occupies four bits. If the colordepth of an image is 8 bits or smaller, the image contains a palette and each pixel value of the image data addresses one of the palette entries. For a 1 bit image the palette contains two entries (one bit can assume $2^1 = 2$ different values, 0 and 1), the palette of a 4 bit image has 16 entries (a four bit number can assume $2^4 = 16$ different values) and for an 8 bit image the palette comprises 256 entries (an eight bit number can assume $2^8 = 256$ different values). A palette entry is 32 bits in size and consists of one byte each for the red, green and blue proportion of a pixel and a reserved byte which is set to zero (this byte is used for the alpha transparency channel of a 32 bit image). The reason for using a color table for images with a small colordepth is that 24 bits are needed to define a pixel in true color ($2^{24} \approx 16.7$ million colors). But since a pixel value of an image with a colordepth of 8 bits or less can assume at most 256 different values, the RGB values are defined in the palette and the image data values just point to an entry of this color table. Thus, although the number of different colors in the image is still restricted to 256, these colors can be chosen from the full true color range of 16.7 million. An advantage of a palette in terms of image processing is that many algorithms (e.g. histogram mapping algorithms like the logarithm operator or histogram equalisation) do not modify each pixel individually but map each color in the image to a new color. Hence they only need to modify the color table, which, due to the small size of the palette, is much faster than modifying every pixel in the image.

All image processing libraries developed during this project only accept 8 bit grayscale images as input. In the RGB color space, which has been used throughout the project (except for the capturing part, which employs the YUV color space), grayscale values are defined as equal values of the red, green and blue components of a pixel. This means that even a 24 bit image can only contain 256 different grayscale values (some 16 bit bitmap formats reserve all 16 bits of a pixel for gray scales and thus have a range of $2^{16} = 65536$ different grayscales, but this is not defined for 24 bit images in bitmap format). So an 8 bit image is sufficient to achieve the full grayscale resolution over the whole grayscale range of the RGB color space. In order to save memory, the libraries shrink every palette entry of an image to a size of one byte, exploiting the redundancy of the red, green and blue components of a grayscale pixel. One drawback of using palette images for image processing, however, is that if arithmetic functions (addition, subtraction) are to be applied, the palette of the image has to be sorted first, since the result of an addition or subtraction is an actual grayscale value and not a pointer into the palette. Thus a palette entry with the index n must contain the grayscale value n.
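The 32 bit line alignment and the palette handling described above can be illustrated with a short C++ sketch. It is not taken from the project's libraries: the names `PaletteEntry`, `dibRowStride`, `dibToLibraryFormat` and `logOperatorOnPalette` are hypothetical, the palette entry is assumed to follow the Windows RGBQUAD byte order (blue, green, red, reserved), and the logarithm operator is assumed to have the common form $c \log(1 + g)$ with $c$ chosen so that 255 maps to 255.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One palette entry in RGBQUAD byte order: blue, green, red, reserved.
struct PaletteEntry {
    std::uint8_t blue, green, red, reserved;
};

// Bytes per image line in a DIB: the line length in bits is rounded up
// to the next multiple of 32 bits and converted to bytes.
std::size_t dibRowStride(std::size_t width, unsigned bitsPerPixel) {
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// Converts the pixel data of an 8 bit grayscale DIB into an unpadded buffer in
// which every byte is the gray value itself. Looking each index up in the
// palette has the same effect as sorting the palette so that entry n holds
// gray value n. Lines are copied in the order in which they are stored.
std::vector<std::uint8_t> dibToLibraryFormat(const std::vector<std::uint8_t>& dibPixels,
                                             std::size_t width, std::size_t height,
                                             const std::vector<PaletteEntry>& palette) {
    assert(palette.size() == 256);
    const std::size_t stride = dibRowStride(width, 8);
    assert(dibPixels.size() >= stride * height);
    std::vector<std::uint8_t> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint8_t index = dibPixels[y * stride + x];  // padding bytes are skipped
            out[y * width + x] = palette[index].red;  // red == green == blue for gray
        }
    }
    return out;
}

// Logarithm operator applied to the palette only: g' = c * log(1 + g) with
// c = 255 / log(256). Only the 256 entries are touched, never the pixel data.
void logOperatorOnPalette(std::vector<PaletteEntry>& palette) {
    const double c = 255.0 / std::log(256.0);
    for (PaletteEntry& entry : palette) {
        const auto g = static_cast<std::uint8_t>(std::lround(c * std::log(1.0 + entry.red)));
        entry.red = entry.green = entry.blue = g;
    }
}
```

For an 8 bit image that is three pixels wide, `dibRowStride(3, 8)` gives 4, so one padding byte per line is skipped during conversion. The logarithm operator runs 256 times regardless of the image size, which is the speed advantage of palette-based histogram mapping mentioned above.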


5.2 General Structure Of The Libraries

Fig. 5.02: The general structure of the image processing libraries

Both image processing libraries have been implemented following the general structure shown in Fig. 5.02. The host system can be the PC running under Microsoft DOS or Windows NT; the target system can be the TriMedia processor or the PC running under Windows NT. The programming language used to develop the libraries is C. The TriMedia library is compiled with tmcc, the compiler that is part of the TriMedia software development environment (SDE). It implements C in compliance with the ANSI C standard but additionally includes some TriMedia specific extensions which take account of the layout and instruction set of the TriMedia processor [1]. The compiler used for the PC library is the Microsoft Visual C++ compiler, version 6.0.

5.2.1 The DOS Version Of The TriMedia Library

The DOS version of the TriMedia library employs the TriMedia utility tmrun, which reads a boot image file from hard disk, downloads the TriMedia executable from the boot image and the parameters from the DOS command line to the TriCodec board, and starts up the TriMedia processor. A boot image is a file created during compilation of a TriMedia program; it contains an executable which can be run on a TriMedia processor. A boot image file usually has the extension *.out. If a TriMedia program is invoked without any parameters, a help text is displayed which shows all function identifiers together with the corresponding image processing functions and their required parameters.

Fig. 5.03: The DOS version of the TriMedia library (figure labels: Image Files, Parameters, Errorcode, Image Processing Functions)
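The command line handling of such a TriMedia program can be sketched as a table driven dispatcher in C++: without parameters it lists all function identifiers, with an identifier alone it prints the parameters of that function, and otherwise it calls the function and returns its error code. The table layout, the error code values and the `copyImage` stub are assumptions for illustration, not the project's implementation; `argv[0]` is assumed to hold the program name, as in standard C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Error codes returned to the caller (values chosen for illustration).
enum ErrorCode {
    ERR_NONE = 0, ERR_HELP_SHOWN = 1, ERR_USAGE_SHOWN = 2,
    ERR_UNKNOWN_FUNCTION = 3, ERR_BAD_PARAMETERS = 4
};

// A library function receives only the arguments that follow its identifier.
typedef int (*ImageFunction)(int argc, char** argv);

struct FunctionEntry {
    const char* identifier;   // token given after the boot image name
    const char* description;  // shown in the general help text
    const char* parameters;   // shown when the identifier is given alone
    ImageFunction function;
};

// Hypothetical stub standing in for a real image processing function;
// it expects {input image} {output image}.
int copyImage(int argc, char** argv) {
    (void)argv;  // a real function would load argv[0] and write argv[1]
    return argc == 2 ? ERR_NONE : ERR_BAD_PARAMETERS;
}

int dispatch(int argc, char** argv, const FunctionEntry* table, std::size_t count) {
    assert(argc >= 1 && argv != nullptr);
    if (argc == 1) {  // no parameters: list every function identifier
        for (std::size_t i = 0; i < count; ++i)
            std::printf("%-10s %s %s\n", table[i].identifier, table[i].description,
                        table[i].parameters);
        return ERR_HELP_SHOWN;
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(argv[1], table[i].identifier) != 0)
            continue;
        if (argc == 2) {  // identifier only: show the parameters it requires
            std::printf("%s %s\n", table[i].identifier, table[i].parameters);
            return ERR_USAGE_SHOWN;
        }
        return table[i].function(argc - 2, argv + 2);  // pass only the parameters
    }
    std::printf("unknown function identifier: %s\n", argv[1]);
    return ERR_UNKNOWN_FUNCTION;
}
```

In the program's `main` function the call reduces to `return dispatch(argc, argv, table, count);`, so the error code of the image processing function becomes the program's exit status. Keeping identifiers, help texts and function pointers in one table means that adding a function only requires one new table entry.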


Invoking a TriMedia program with a function identifier but without any parameters displays the parameters required by this image processing function. In order to perform an image processing operation, a TriMedia program must be invoked as follows: tmrun {boot image name} {function identifier} {input image(.at(index2).menuID) library = index; } }

    vector *List = GetAlgorithmList(library);
    for (index = 0; index < List->size(); index++) {
        if (List->at(index).menuID == functionID)
            break;
    }
    data = List->at(index);

If the executable file contains more than one function, the sourcecode fragment

    // An executable entry of the form "name::n" names the file and function n
    if ((position = data.executable.Find("::")) != -1) {
        if (data.executable.GetLength() - (position + 2) > 0) {
            // completePath temporarily holds the text after "::"
            completePath = data.executable.Right(data.executable.GetLength() - (position + 2));
            parameter.functionNumber = atoi(completePath);
        } else {
            parameter.functionNumber = -1;  // "::" not followed by a number
        }
        completePath = GetLibList()->at(library).executablePath + data.executable.Left(position);
    } else {
        parameter.functionNumber = -1;  // executable with a single function
        completePath = GetLibList()->at(library).executablePath + data.executable;
    }

can be used to retrieve the function identifier and the complete path to the executable, provided that the algorithm configuration file has been written following the guidelines given in chapter 11.4.1. After the image processing function has been performed, the interface class must take care of creating one or more new images containing the results. Alternatively, a histogram or some self-defined representation of the results can be displayed. The function shown below must be called to create a new image:

    void OnFileNew(HGLOBAL pBMP, const CString algorithmText)

This function resides in the main class of the application, called CHostProgApp (App for Application). It takes as parameters an HGLOBAL handle to the memory block holding the new bitmap and a string describing the image processing algorithm and additional parameters. The bitmap at the given memory location must comply with the official bitmap specification, i.e. it must contain a valid bitmap header and its lines must be padded to the 32 bit alignment (see chapter 5.1). Every interface class must be derived from the class DSPLib of the host program, and the function callFunction must be overridden in the derived interface class, since the base class implementation does nothing. With this knowledge, the steps to integrate a new image processing library into the host program can be derived:

• Derive a new class from the DSPLib base class which implements the callFunction function. Within this function the programmer is free to choose how to interface to the library, although the function should return a boolean value indicating whether the operation was successful.

• Open the resource editor of the Microsoft Visual C++ environment and modify the IDD_LIB_PROPERTIES dialog box. Add the name of the new library to the data field of the dropdown field "Name (appears in menu)" of the properties dialog. This causes the Library Listing dialog of the host program to recognise the new library.

• Modify the function OnAlgorithmCall of the class CHostProgDoc by adding the sourcecode shown below. Now the host program will call the function callFunction of the new class if the user selects a corresponding menu entry. It is important that the name is exactly the same (including case) as the one entered in the previous step, since the host program identifies the library by this name.

      // "New Name" must match the name entered in IDD_LIB_PROPERTIES
      if (GetLibList()->at(library).LibName == "New Name") {
          NewLib newLibrary;
          newLibrary.callFunction(iconNumber, this);
      }

• Implement the callFunction function in order to interface to the new library.

• Write algorithm configuration text files for the library, either by hand or using the Algorithm Listing dialog of the host program, and place the library executables in the folder specified in the Library Listing dialog. The Algorithm Listing dialog should be the first choice, since even a small typing error in a configuration file can cause the algorithm not to work at all.

Now the new image processing library should be accessible by the host program.
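Before OnFileNew can be called, a result produced in the unpadded library format has to be turned back into a bitmap that complies with the specification. The following C++ sketch builds such a bitmap in memory as a packed DIB: a 40 byte information header, a sorted 256 entry gray palette and pixel lines padded to 32 bits. The function name `makeGrayDib` is hypothetical, and the packed DIB layout without a bitmap file header is an assumption (it is the layout Windows commonly uses for DIBs held in global memory); whether OnFileNew also expects the file header is not stated here. Copying the bytes into an HGLOBAL block (GlobalAlloc, GlobalLock) is Windows specific and omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `bytes` bytes of value in little-endian order, as bitmap headers require.
static void putLE(std::vector<std::uint8_t>& buf, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        buf.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Builds an 8 bit grayscale packed DIB from an unpadded width x height gray image.
std::vector<std::uint8_t> makeGrayDib(const std::vector<std::uint8_t>& gray,
                                      std::uint32_t width, std::uint32_t height) {
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    const std::uint32_t stride = ((width * 8 + 31) / 32) * 4;  // 32 bit aligned line
    std::vector<std::uint8_t> dib;
    dib.reserve(40 + 256 * 4 + static_cast<std::size_t>(stride) * height);
    putLE(dib, 40, 4);               // size of the information header
    putLE(dib, width, 4);            // width in pixels
    putLE(dib, height, 4);           // height in pixels (positive: bottom-up lines)
    putLE(dib, 1, 2);                // number of planes, always 1
    putLE(dib, 8, 2);                // colordepth in bits
    putLE(dib, 0, 4);                // compression: 0 = uncompressed (BI_RGB)
    putLE(dib, stride * height, 4);  // size of the image data in bytes
    putLE(dib, 0, 4);                // horizontal resolution (may be zero)
    putLE(dib, 0, 4);                // vertical resolution (may be zero)
    putLE(dib, 256, 4);              // colors used
    putLE(dib, 0, 4);                // important colors (0 = all)
    for (std::uint32_t g = 0; g < 256; ++g) {  // sorted palette: entry n holds gray n
        const auto v = static_cast<std::uint8_t>(g);
        dib.insert(dib.end(), {v, v, v, std::uint8_t{0}});  // blue, green, red, reserved
    }
    for (std::uint32_t y = 0; y < height; ++y) {  // lines in buffer order
        const std::size_t offset = static_cast<std::size_t>(y) * width;
        dib.insert(dib.end(), gray.begin() + offset, gray.begin() + offset + width);
        dib.insert(dib.end(), stride - width, std::uint8_t{0});  // zero padding
    }
    return dib;
}
```

Because a positive height marks a bottom-up bitmap, the first line of the buffer becomes the bottom line of the displayed image; a library that stores its lines top-down would have to copy them in reverse order.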

11.5 The Attached CD

This chapter describes the folder structure of, and the relations between, the numerous files and folders on the CD attached to this paper. The top level of the CD contains the entries Dissertation-ChristianSonntag, HostProg, TriMediaSDE, Utilities and ImLib.zip. The folder Dissertation-ChristianSonntag contains this paper as a Microsoft Word 97 document as well as all images and Microsoft Excel diagrams.

The folder HostProg contains the executable host program including all configuration files and DLLs, and an installation file which has been produced using a freeware installer (residing in the folder Utilities\Exe\Installer on the CD). The installer works like every other Windows install wizard: it asks for a folder where the program shall be installed, creates a link to the program in the start menu, and so on. The folder TriMediaSDE contains the TriMedia Software Development Environment v1.1 final release, which has been used throughout the project to develop TriMedia software. Besides Windows it supports HP-UX and SunOS. The folder Utilities comprises many useful applications like compression software (WinZip, WinACE) or screen capture software (FullShot). It also contains all the software developed in [32] in the form of the folder JTools, as well as the Ghostscript reader. Further contents are useful documents about the TriMedia processor and image processing, many images which have been used throughout the project, and some useful sourcecode, amongst other things an updated version of the TriMedia library libdev.a including the header tmBoard.h, which are needed to access the TriCodec board properly.

The most important item, ImLib, contains all the sourcecode that has been developed during the project. It has been compressed using the ZIP compression method for a practical reason: every file or folder on a CD carries the read-only (write protected) attribute, and a file or folder copied from the CD to hard disk still carries this attribute. The ImLib folder is nested to a very deep level and contains numerous subfolders, and many of the sourcecode and workspace files must allow writing, so it would be very tedious work to remove the read-only attribute from every file in every subfolder. Extracting the ZIP archive to hard disk avoids this problem.

The top level of the ImLib folder contains the subfolders Dos, HostProg, incShare, VisualDos, VisualWin and WinNT. The subfolder Dos contains the DOS version of the TriMedia library, the folder HostProg the host program sourcecode, and incShare contains header files which define the image processing functions of both TriMedia libraries. VisualDos and VisualWin are Visual C++ workspaces which contain links to all sourcecode files (DOS and WinNT) of the TriMedia library, so that Visual C++ can easily be used as a sourcecode editor. Finally, WinNT contains the Windows NT version of the TriMedia library. The Dos and WinNT folders also contain a subfolder bin which holds all executable files.

Only the HostProg subfolder will be explained, since the other folders are quite straightforward to understand. Its subfolders are Arithmetic, BoardData, CodecFile, Convolution Filters, Filter, GrayTransform, HostAlgorithms, Libraries, Res, TIAlgorithms, TMAlgorithms and Transform. The host program has a fairly complex structure since it depends on nine other projects, making a total of ten projects in the workspace. These other nine projects are the four image processing DLLs (folders Arithmetic, Filter, GrayTransform and Transform) and five libraries for image format conversion [19-23], which reside in the folder CodecFile. The host program has been structured to produce intermediate and output files during compilation and linking in the following manner. If the debug compile mode is used, a folder Debug is created within the HostProg folder, otherwise the folder Release is created. For every project that is compiled, a subfolder holding intermediate files like object files is created within this compile folder; it bears the same name as the project. The output files of a compile and link cycle (static libraries, DLLs or executables) are stored in different locations: the host program executable is stored within the compile folder, all DLLs are stored in the folder HostAlgorithms\DLLs and the static image format libraries are stored in Libraries.
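If the sourcecode is copied directly from the CD instead of being extracted from ImLib.zip, the read-only attribute could also be cleared automatically for the whole folder tree. The following sketch uses std::filesystem, which is only available since C++17 and therefore postdates the Visual C++ 6.0 environment used in the project; the function name `makeTreeWritable` is hypothetical. With Microsoft's standard library implementation, adding write permission clears the Windows read-only attribute; on POSIX systems it sets the owner write bit.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Adds owner write permission to every file and folder below root (root itself
// is not changed). Returns the number of entries whose permissions were updated.
std::size_t makeTreeWritable(const fs::path& root) {
    assert(fs::is_directory(root));
    std::size_t updated = 0;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        std::error_code ec;  // report failures per entry instead of throwing
        fs::permissions(entry.path(), fs::perms::owner_write, fs::perm_options::add, ec);
        if (!ec)
            ++updated;
    }
    return updated;
}
```

A single call such as `makeTreeWritable("C:\\ImLib")` (the path is only an example) would then replace the manual work of changing the attribute of every file in every subfolder.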
