An Organic Computing architecture for visual microprocessors based on Marching Pixels

Dietmar Fey, Marcus Komann, Frank Schurz, Andreas Loos
Institute of Computer Science, Friedrich-Schiller-University Jena
D-07737 Jena, Germany
{fey, komann, schurz, loos}@informatik.uni-jena.de

Abstract—The paper presents the architecture and synthesis results for Organic Computing hardware for smart CMOS camera chips. The organic behavior of the chip hardware is based on distributed and emergent functionality, which is exploited to detect objects and their center points in binary images. Future real-time embedded systems used in industrial image processing have to provide reply times in the range of milliseconds. It is impossible to meet such strict requirements for megapixel resolutions with serial processing schemes, in particular if multiple objects have to be detected. Even classical parallel techniques like SIMD or MIMD approaches are not sufficient due to their dependency on more or less central control structures. To achieve more flexibility, unlimited scalability and higher performance, parallel emergent architectures are necessary. We present such an approach, denoted as Marching Pixels, for future digital visual microprocessors. Marching Pixels work similarly to artificial ants. They crawl as hardware agents within a pixel field, e.g. to identify and detect the center points of an arbitrary number of objects given in an image. We present an emergent Marching Pixel algorithm for the processing of arbitrary concave objects and its mapping onto real hardware. Based on synthesis results for FPGAs and ASICs, we discuss the possibilities of digital Organic Computing approaches to visual microprocessors for future smart high-speed camera systems.

I. INTRODUCTION

Currently, the architecture of most commercial smart cameras is based on a CCD or CMOS camera chip for image capturing, an optional FPGA for pre-processing, and a DSP or a high-performance microcontroller for the actual post-processing. Smart camera systems used in future robots, which shall co-operate as so-called robot assistants with human workers, have to achieve hardware and software reply times below 10 ms. Reply time refers to the time spent between two consecutively captured images. Furthermore, quality control tasks in the tool manufacturing industry require detection rates of up to 1000 images per second. The serial architecture of current smart camera systems on the one side and the mentioned tight reply time requirements on the other side do not match for the processing of megapixel resolutions. The pixel-by-pixel access to a CCD or a CMOS camera chip would allow only 10 ns of processing time per pixel.
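The following short calculation, a sketch in Python, illustrates how tight the per-pixel budget of a purely serial scheme becomes. The 1024×1024 resolution is our assumption; the paper only speaks of megapixel resolution.

# Back-of-the-envelope check of the serial per-pixel time budget
# (illustrative only; 1024x1024 pixels and the 10 ms reply time are assumed inputs).
pixels = 1024 * 1024                      # megapixel image
reply_time_s = 10e-3                      # required reply time: 10 ms
budget_per_pixel_s = reply_time_s / pixels
print(f"{budget_per_pixel_s * 1e9:.1f} ns per pixel")   # ~9.5 ns, i.e. roughly 10 ns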

This work is part of the research carried out within the priority program 1183 Organic Computing funded by the German Science Foundation (DFG).

To fulfill these strict timing requirements, technological advancements are necessary which offer parallel access to several pixels: either a row- or column-wise access based on linear arrays of analogue-to-digital converters (ADCs), or a fully pixel-parallel access based on 3D chip stacking or 2D smart pixel circuits. In addition, we also need parallel processing techniques on the algorithmic and architectural side, since parallel access to pixels only makes sense if the pixels are also processed in parallel. The benefits of pixel-level processing, as well as its implementation in analogue hardware for visual processing, were also advocated in [1] and [2]. Classical parallel processing techniques do not offer a long-term perspective since they provide only limited scalability and cause additional overhead. For image pre-processing tasks, a SIMD approach, i.e. carrying out the same operation on each pixel accessed in parallel, would be the obvious choice. However, such a solution is not preferable for integration into a chip. SIMD means that a central control unit broadcasts the operation code to all processing elements (PEs), which are responsible for the pixel processing. This results in the implementation of data highways on the chip from the central unit to the PEs and vice versa, in order to transfer operation codes and to carry out inspection queries. As a result, much chip area is consumed by long lines and is lost for the integration of transistor logic. Furthermore, processing time increases due to query operations between the control unit and the PEs. In a MIMD approach, each PE has its own control unit. However, a typical MIMD solution is based on geometric parallelization, i.e. the image is decomposed into smaller partitions and each partition is processed by one processor. All partitions are processed simultaneously, but each partition is processed serially by its assigned processor. That approach limits scalability because the processing of more pixels takes more time. Furthermore, if objects extend across neighboring partitions, additional communication effort is required to exchange data between neighboring processors. A much better solution is to implement a swarm of virtual agents, equipped with a limited intelligence, directly at the hardware level. Such a swarm collectively fulfills a global task by exploiting emergent and self-organizing effects.

For example, the agents run automatically within the chip hardware to the pixel corresponding to the centroid of an object, simply by leaving information at pixel positions and evaluating that information to control their propagation autonomously. In short, they use emergence at the hardware level to avoid performance-hindering bottleneck effects by pushing back central control structures as much as possible. This is the approach we pursue with our Organic Computing principle of Marching Pixels for visual microprocessors. The rest of the paper is structured as follows. In section 2 we describe the idea of the Marching Pixels approach and related work in this field in more detail. Section 3 describes a Marching Pixels algorithm for the identification of multiple arbitrary concave objects and their center points. Section 4 shows how this scheme can be mapped onto a corresponding parallel chip architecture. Section 5 presents synthesis results and discusses the realization possibilities. Finally, we briefly summarize the work in section 6.

II. MARCHING PIXELS AND RELATED WORK

Marching Pixels (MPs) can be considered as small hardware agents which crawl within a pixel field in order to identify given objects in an image. Collectively they fulfill a global task, e.g. determining the center points of objects so that these points can be passed to the gripper arm of a robot. MPs are born, can merge with other MPs, have their own state, and leave information at a pixel position for other MPs arriving later at the same position. Furthermore, they can die or stop their march at a final destination. The starting point of an MP can be, e.g., a detected edge pixel. Beforehand, binary images are generated as usual by an ADC applied to grayscale images captured by optical detectors. Afterwards the Organic Computing process can start. Figure 1 shows this for a simple example of MPs starting their march from the edges of an axis-parallel rectangle. They run to the middle of the edges, where they merge and turn into the interior of the object; whether a pixel is an object pixel or not is known because this information is stored at each pixel. After a while all MPs coming from opposite directions meet automatically in the center of the rectangle, where they stop their march as the result of an emergent process. The idea of virtual creatures moving within a pixel field or an array of grid points was also addressed in other work. In [4], moving creatures have the task to efficiently visit a defined pixel area. Their functional behavior is modeled with a cellular automaton (CA), and the implementation of the virtual creatures was carried out on an FPGA. A similar approach was pursued in [3] for the implementation of virtual ants used for the solution of optimization problems. Also in this work the functionality of the state machine of a virtual ant was realized on an FPGA. These works differ in one important point from our approach. Both the virtual creatures in [4] and the ants in [3] are implemented in principle as static constructs. They are modeled as a fixed data structure whose states are updated according to a control algorithm implemented in FPGA hardware. The pixel field in which the creatures propagate is realized as a 2D data structure of x- and y-positions, containing information such as the current pheromone concentration at a position, or whether the position contains an obstacle, is free, or was already visited.

In contrast to that, our intention is also to mimic the moving process of our creatures, the MPs, directly in hardware. The states of the MPs are sent as data packets within an array of PEs, where each PE is ideally attached to one pixel element. The difference is that our MPs are really located at the position where they are in the field, rather than only storing that position in a data structure modeling the virtual creature. The advantage of our approach is that the emergent behavior is mapped more realistically and the processing is more parallel and faster. The price for that is a higher hardware effort for memory resources.
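To make the packet view concrete, the following behavioral sketch in Python shows how an MP could be represented as a small record that is handed from one PE to a neighboring PE per step. It is purely illustrative; the field and function names (state, pixel_count, move_right) are our own choice and not taken from the paper's VHDL model.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MarchingPixel:
    state: str                 # e.g. "marching", "waiting", "final"
    pixel_count: int           # number of object pixels passed so far

@dataclass
class ProcessingElement:
    is_object: bool            # static pixel information from edge/object detection
    marker: bool = False       # information an MP leaves behind for later MPs
    mp: Optional[MarchingPixel] = None   # at most one MP resides here at a time

def move_right(grid, x, y):
    """Hand the MP at (x, y) over to its right neighbor, one elementary step."""
    src, dst = grid[y][x], grid[y][x + 1]
    if src.mp is not None and dst.mp is None:     # neighbor must be free
        dst.mp, src.mp = src.mp, None
        if dst.is_object:
            dst.mp.pixel_count += 1               # count the object pixel just entered

grid = [[ProcessingElement(is_object=True) for _ in range(4)]]
grid[0][0].mp = MarchingPixel(state="marching", pixel_count=0)
move_right(grid, 0, 0)
print(grid[0][1].mp)   # MarchingPixel(state='marching', pixel_count=1)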

Figure 1. Simple example of a Marching Pixels procedure

An approach similar to ours concerning the direct implementation of a pixel-to-PE mapping for the processing of binary images is published in [5]. A reaction-diffusion architecture implements a CA to carry out controlled dilation and erosion operations on chip level in order to extract quadrilateral boxes with identical pixel values. Propagating pixels modeled with CAs are used in [6] to solve leader election problems. CAs are also a very helpful model for our approach of propagating MPs. However, we require more arithmetic capabilities for our MPs than was originally intended for the cells of a classical CA, whose smartness is focused more on rule-based pattern substitution logic. In that sense MPs more closely resemble artificial ants. Concerning the pursued application and the aspired hardware realization, MPs are also strongly related to the cellular non-linear network (CNN) model of Chua, Yang and Roska [7], [8], which also provides a concept for massively parallel visual microprocessors. In contrast to CNNs, which are mostly realized in analogue electronics, the implementation of MPs is strictly focused on digital electronics, with the exception of the optical detector interface, of course. Focusing on a digital solution allows us to realize flexible processing based on the movement of virtual state-based creatures with their own memory. The price we have to pay for that flexibility is a higher effort concerning the required chip area per pixel than in CNNs. On the other hand, the realization of CNNs is already very advanced [9].

III. MARCHING PIXELS ALGORITHMS

The idea of MPs was first presented in [10]. Since then, further MP algorithms have been developed for the center point detection of objects by means of emergent processing. In [11] an approach is presented which allows finding the center point of symmetric objects. The idea is that MPs initially move from all identified edge pixels to the interior of an object by propagating along the horizontal and vertical directions (see Figure 1, left). While doing so, they count the number of passed object pixels. Two MPs running towards each other then meet along the thick lines shown in Figure 2. We call these thick lines reduction lines. At each meeting point of the reduction lines, the number of pixels accumulated along the vertical and horizontal propagation directions is stored. In a second step (Figure 2, right), two new MPs start a new march along the reduction lines from the left and top sides. By comparing the stored accumulated pixel numbers, those positions can be found where the difference between the accumulated pixel numbers on the left and right sides, and on the top and bottom sides respectively, is minimal. These positions correspond exactly to the vertical and horizontal coordinates of the object's centroid.
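The comparison step can be illustrated with the following Python sketch of the one-dimensional horizontal case for a single row. It is our own rendering of the reduction idea, not the authors' VHDL implementation, and the function name centroid_column is hypothetical.

# Illustrative 1D version of the reduction-line comparison: an MP from the left
# and an MP from the right each count the object pixels they pass; the column
# where both counts (nearly) agree is the horizontal centroid coordinate.
def centroid_column(row):
    """row: list of 0/1 pixels of one image row (1 = object pixel)."""
    object_cols = [x for x, p in enumerate(row) if p == 1]
    left_count, counts = 0, {}
    for x in object_cols:                  # march of the left-to-right MP
        left_count += 1
        counts[x] = left_count             # value stored at the meeting line
    right_count, best_x, best_diff = 0, None, None
    for x in reversed(object_cols):        # march of the right-to-left MP
        right_count += 1
        diff = abs(counts[x] - right_count)
        if best_diff is None or diff < best_diff:
            best_x, best_diff = x, diff    # minimal difference marks the centroid column
    return best_x

print(centroid_column([0, 1, 1, 1, 1, 1, 0]))   # -> 3, the middle object pixel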

Figure 2. Horizontal and vertical reduction lines

This solution is much more robust and effective than the simple idea shown in Figure 1. However, it has a weakness with concave objects, in which disconnected reduction lines are formed because the MPs cannot find a continuous, unambiguous path (see the split reduction lines generated by the horizontal reduction at the bottom of Figure 2). A solution to that problem is presented in this paper. For the sake of clarity, only the horizontal case is described. At every object pixel position which has no object pixel to its left, a new MP is born. It starts to run to the right along the horizontal direction, counting the number of object pixels it passes, provided no MP has been there before. Furthermore, the neighboring pixels directly above and below are marked (see symbol X in Figure 3). If an MP arrives at the edge of a hole located within an object (see Figure 3), it checks whether the pixel to its right is marked. If this holds, there must be other object pixels above or below the pixel to the right, and therefore the MP must be in front of a hole. The MP then continues its march to the right, but the passed pixels are not counted since they belong to a hole. This step is repeated as long as either marked pixels or object pixels are found ahead. In any case, after a fixed number of steps a vertical straight line is formed at the rightmost side of the object. This number has to be determined in advance by considering the worst-case object with the largest horizontal extension. The straight line starts at the topmost pixel position and ends at the bottommost pixel position of the object. Each pixel on that straight line accumulates the pixel counts of all MPs which counted along their horizontal march in their segment until they met either a hole or the rightmost edge. With the same comparison procedure applied to the reduction line of convex objects, the coordinates of the object's centroid can then be determined. An analogous procedure with MPs moving from top to bottom can be applied simultaneously or subsequently for the vertical direction. During that procedure, the accumulated pixel numbers at the straight line are summed up by the MP to compute the total number of object pixels, which corresponds to the object's area.

Figure 3. Horizontal propagation of MPs in concave objects
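The per-MP decision rule of the horizontal pass just described can be sketched as follows in Python. This is a heavily simplified one-dimensional rendering: in the real scheme the marks are contributed by MPs of the neighboring rows, and the name march_east as well as the list-based inputs are our own assumptions, not the paper's hardware description.

# One eastbound MP of the concave algorithm, in simplified form: it marches to
# the right as long as the pixel ahead is an object pixel or a marked pixel,
# counts only object pixels, and crosses marked non-object pixels (holes)
# without counting them.
def march_east(row_is_object, row_is_marked, start_x):
    """Return (final_x, pixel_count) for an MP born at start_x of one row."""
    x = start_x
    count = 1 if row_is_object[start_x] else 0
    while x + 1 < len(row_is_object) and (row_is_object[x + 1] or row_is_marked[x + 1]):
        x += 1
        if row_is_object[x]:
            count += 1        # regular object pixel: count it
        # marked but non-object pixels belong to a hole and are not counted
    return x, count

# Toy example: object pixels with a one-pixel hole at x = 3.
obj = [True, True, True, False, True, True]
mark = [False, False, False, True, False, False]
print(march_east(obj, mark, 0))   # -> (5, 5): five object pixels counted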

IV. MAPPING THE ORGANIC COMPUTING PRINCIPLE ONTO A PARALLEL CHIP HARDWARE

Instead of calculating the centroid of an object with formulas known from physics, the MPs run automatically to the center point and stop there as the result of emergent behavior. In the sense of a reaction-diffusion architecture, this happens by determining a new output behavior depending on the MP's current state and by evaluating the information in the currently visited pixel field and its directly neighboring pixel fields. For that it is necessary to assign each pixel to a PE. The whole pixel field is mapped onto an array of PEs which are connected bidirectionally with their four orthogonal neighbor PEs (Figure 4). The MPs themselves are implemented as data packets which are sent from a PE to its right or bottom neighboring PE; the direction is determined in each PE according to the state of the PE and its neighbors. Each PE is in one of four different states (see Figure 3: X, filled, straight and angular striped boxes). The MP's data packet contains a data field to store the number of passed pixels. This information is evaluated as described above to determine the object's area, which is stored in the PE where the balance point was found. One so far unanswered question is how the centroid, found in an emergent procedure, is made known to the global world, i.e. outside the PE array. For that purpose there exists a so-called observer processor which continuously queries all PEs of a row by a broadcast signal. In each time step it switches to the next row. Each PE which has accommodated an MP in the final state serially outputs the area of the object via a line running along the column of the array. Because the observer processor knows the number of the scanned row and furthermore detects the number of the column line via which a PE has answered, it knows the centroid coordinates of the object and its area. By a comparison with the area entries of pre-known objects, which is the usual case in industrial image processing tasks, the observer processor can identify the object, knows its centroid and can deliver this information, e.g., to a robot gripper arm.
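The row-wise observer readout can be sketched as follows in Python. This is a behavioral illustration under our own naming (observer_scan, has_final_mp, area); the real observer is a separate processor that reads serial column lines rather than Python lists.

from dataclasses import dataclass

@dataclass
class PE:
    has_final_mp: bool = False   # true if this PE holds an MP in its final state
    area: int = 0                # accumulated object area stored in this PE

# Behavioral sketch of the observer readout: the observer queries one row per
# time step; every PE of that row holding a final MP answers on its column line.
def observer_scan(pe_array):
    results = []
    for y, row in enumerate(pe_array):        # one row per time step
        for x, pe in enumerate(row):          # column lines of that row
            if pe.has_final_mp:
                # row index plus answering column line give the centroid (x, y)
                results.append({"centroid": (x, y), "area": pe.area})
    return results

demo = [[PE(), PE()], [PE(), PE(has_final_mp=True, area=42)]]
print(observer_scan(demo))   # -> [{'centroid': (1, 1), 'area': 42}]

A post-processing step can then match each reported area against the areas of pre-known objects to identify the object, as described above.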

Figure 4. Fine-grain processor array for the processing of Marching Pixels

V. IMPLEMENTATION

A single PE needs an ALU for incrementing the MP's data field, which holds the number of passed pixels. Furthermore, the ALU has to add the pixel counts arriving from different MPs, e.g. if holes in the object had to be crossed (see the middle object in Figure 3). To find the balance points, an additional compare unit is necessary in each PE. Due to the fully parallel processing mode it is uncritical to stay below the required reply time of 10 ms. We therefore favor bit-serial arithmetic to save chip area. This requires shift registers and a counter that defines the start and end time steps of the add and compare operations. The functional behavior of a single PE and of the whole array was specified in VHDL and synthesized for an FPGA and a 0.18 µm CMOS process to determine the hardware effort. The synthesis for a Xilinx Spartan-3 FPGA with 74,800 logic cells showed that a PE field of size 26×26 can be implemented. For a high-end Virtex-4 XC with 200,448 logic cells a field size of 40×40 is realizable. Hence, MP solutions based on current FPGAs are only appropriate for demonstration purposes. The synthesis results for the ASIC yielded that an area of 1.272 mm² is necessary for a 16×16 PE array. If we assume a chip area of 2×2 cm², we can realize a pixel resolution of about 288×288 pixels with state-of-the-art technology. The synthesis results showed furthermore that there is nearly a 1-to-1 ratio between the area required for memory and for logic. This is interesting for a realization of the architecture as a 3D chip stack, which is our final goal. In such a stacked chip the optical detector array would be located on top of the stack. The next chip layer would contain a mixed-signal circuit for an ADC as well as a filter for edge detection. Since the access to the detector elements is fully parallel, each new image must be converted at a rate of 100 Hz to meet the required 10 ms response time. Therefore, multiplexing can be used for the A/D conversion to limit the number of area-intensive A/D converters and filters. As a result we obtain object and edge pixels, which are transferred to the third layer of the chip stack realizing our Organic Computing architecture. The results of that layer, the object areas and the found balance points, can be sent to a fourth chip layer where the observer processor is implemented. If the required distributed memory and the logic of the PEs are additionally separated into two chip layers, the pixel resolution can be further doubled to 566×566. Since further advances in CMOS integration density are to be expected, we think that megapixel resolution will be possible by the year 2010. For the clock of each PE we assumed only 50 MHz in the synthesis. Due to the fully parallel mode, each PE can then carry out half a million elementary steps while still meeting the required 100 Hz frame rate. In the absolute worst case of a rectangle filling the whole image, our MPs have to run two and a half times along the edge of that rectangle. Each MP move costs 10 cycles, i.e. for megapixel resolution only 25,000 elementary steps are necessary. Due to the moderate clock rate of a single PE, the calculated power consumption for the absolute worst case is only 4.7 mW for a 16×16 PE array. This value can be further reduced since it is still possible to decrease the clock according to our worst-case timing analysis.
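The timing margin can be re-checked with the short Python calculation below, which only reproduces the numbers given above; the 1000-pixel edge length for a megapixel image is our reading of the described worst case.

# Re-checking the worst-case cycle budget of one PE (illustrative arithmetic only).
clock_hz = 50e6              # assumed PE clock from the synthesis: 50 MHz
frame_rate_hz = 100          # 10 ms reply time -> 100 frames per second
cycles_per_frame = clock_hz / frame_rate_hz      # 500,000 elementary steps

edge_len = 1000              # edge of a worst-case rectangle in a ~1 Mpixel image
moves = 2.5 * edge_len       # MPs run two and a half times along that edge
cycles_needed = moves * 10   # each MP move costs 10 clock cycles -> 25,000

print(cycles_per_frame, cycles_needed, cycles_needed <= cycles_per_frame)
# -> 500000.0 25000.0 True: a large margin, so the clock could be lowered further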

VI. SUMMARY

The simultaneous detection of multiple given objects or patterns in images requires new solutions oriented on Organic Computing principles. For that purpose we proposed a solution based on emergent computing with so-called Marching Pixels for the detection of the centroids of arbitrarily concave-shaped objects. Our synthesis results for a corresponding chip hardware show that our Organic Computing architecture offers sufficient potential for future industrial high-speed CMOS camera chips.

REFERENCES

[1] A. El Gamal, D. Yang, B. Fowler, "Pixel level processing – why, what, and how?", Invited Paper, SPIE Conf. on Sensors, Cameras, and Applications for Digital Photography, San Jose, California, 1999.
[2] P. Dudek and S.J. Carey, "A general-purpose 128×128 SIMD processor array with integrated image sensor", Electronics Letters, vol. 42, no. 12, pp. 678-679, June 2006.
[3] B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, H. Schmeck, "FPGA implementation of population-based ant colony optimization", Applied Soft Computing, vol. 4, pp. 303-322, 2004.
[4] M. Halbach, R. Hoffmann, L. Both, "Optimal 6-state algorithms for the behavior of several moving creatures", 7th Int. Conf. on Cellular Automata for Research and Industry, ACRI 2006, Springer LNCS 4173, pp. 571-581, 2006.
[5] T. Asai, M. Ikebe, T. Hirose and Y. Amemiya, "A quadrilateral-object composer for binary images with reaction-diffusion cellular automata", Int. Journ. of Parallel, Emergent and Distributed Systems, vol. 20, no. 1, pp. 57-67, March 2005.
[6] C. Nichitiu and E. Remila, "Leader election by d dimensional cellular automata", ICALP'99, Springer LNCS 1644, pp. 565-574, 1999.
[7] L.O. Chua, L. Yang, "Cellular neural networks: theory", IEEE Transactions on Circuits and Systems, vol. 35, pp. 1257-1272, 1988.
[8] L.O. Chua, T. Roska, Cellular Neural Networks and Visual Computing, Cambridge University Press, 2002.
[9] G. Liñán, A. Rodríguez-Vázquez, R. Carmona, F. Jiménez-Garrido, S. Espejo, R. Domínguez-Castro, "A 1000 FPS at 128×128 vision processor with 8-bit digitized I/O", IEEE Journal of Solid-State Circuits, vol. 39, no. 7, pp. 1044-1055, 2004.
[10] D. Fey and D. Schmidt, "Marching Pixels: a new organic computing paradigm for smart sensor processor arrays", Proceedings of the 2nd Conference on Computing Frontiers CF'05, ACM Press, pp. 1-9, 2005.
[11] D. Fey and M. Komann, "Realising emergent image preprocessing tasks in cellular-automaton-alike massively parallel hardware", Int. Journ. of Parallel, Emergent and Distributed Systems, vol. 22, no. 2, pp. 79-89, 2007.
