A distributed dynamic reconfigurable system for ambient intelligence

L. Marcenaro, L. Marchesotti, F. Cella and C.S. Regazzoni
DIBE, Department of Biophysical and Electronic Engineering, University of Genova, Via All’Opera Pia 11, Genova, Italy

Abstract: The concept of Ambient Intelligence has been developed during a series of ISTAG (Information Societies Technology Advisory Group) and other meetings as a guiding vision to give an overall direction to Europe’s Information Societies Technology program. Ambient Intelligence is essentially an elaboration of Mark Weiser’s vision of Ubiquitous but Calm Computing, which stresses the importance of social and human factors as well as developing the base technologies on which aspects of the vision are founded. Although Ambient Intelligence covers a large range of concerns, both human and technical, there are some technologies which might be excluded. They are characterised by Mark Weiser’s statement about ubiquitous computing: “Ubiquitous computing is roughly the opposite of virtual reality. Where virtual reality puts people inside a computer-generated world, ubiquitous computing forces the computer to live out here in the world with people.” Seen in this light, Ambient Intelligence is the limit of a process which introduces technology into people’s lives in such a way that the introduction never feels like a conscious learning curve: no special interface is needed because human experience is already a rich ‘manual’ of ways of interfacing to changing systems and services. Somehow, we need to create technology that leverages this powerful human resource rather than trying to suppress it by requiring humans to participate in inflexible interaction protocols of the sort supported by current call center technology. A distributed system for ambient intelligence needs to solve several problems, the main one being task allocation and resource management.
In particular, a wide-area monitoring system can be subdivided into several elementary logical tasks to be allocated to physical processors, which may be standard PCs but also intelligent sensors or embedded processors. This paper analyses the problem of distributing intelligence in detail and proposes a novel technique for dynamic task allocation and reconfiguration in a distributed system. A formalization of the task allocation problem is proposed, and a system that is able to automatically download and run logical modules on physical processors is described. The results presented demonstrate the validity of this approach.

1 Introduction

Machine vision applications are nowadays moving towards pervasive, multisensor systems [1] with augmented functionalities and abilities. Such systems not only have to process video data, but also have to react to stimuli coming from the real world. In this sense they have to establish customized communication with humans, interacting with them while showing some degree of intelligence. These architectures go under the name of Ambient Intelligence systems. A more formal definition of Ambient Intelligence (AmI) can be found in [2]; it points out that such systems have to integrate technologies that support human interaction and surround users with intelligent sensors and interfaces. The development of such systems raises a variety of issues at both the algorithmic and the architectural level. In particular, the need to collect heterogeneous data from a network of sensors and to decentralize processing makes the multisensor approach and the distribution of the processing load key features of such architectures. At the logical level, the architecture of an AmI system can be described by a hierarchical decomposition of the whole logical process into a set of basic processing tasks, where a task is a well-defined set of operations involved in processing image information (e.g., digital frame acquisition is the processing task that derives a digital signal representation from an analog one).

In [3] the concept of “Dynamic Task Allocation” for distributed computing systems (DCS) is formalized, stating that its purpose is to increase system throughput in a dynamic environment; this is achieved by balancing the use of computing units and by minimizing the communication overhead between them. In [4] a model is proposed to decompose logical surveillance functionalities (e.g., tracking and classification of objects in complex scenes) into a set of modules and to allocate such modules optimally among a set of physical processing units by minimizing a cost functional. The work described in [5] is an example of a working system that provides a complete Distributed Computing Environment infrastructure, characterized by a highly scalable model for handling data and providing services securely and efficiently. At the algorithmic level, past work on load balancing [6] shows that it is an NP-hard problem; typical approaches are based on graph theory, integer programming and heuristic methods. In the first line of research graph-matching algorithms [7] are used, whereas integer-programming techniques are based on the optimization of cost functionals. The heuristic approach returns suboptimal solutions, but it has been shown to be more appropriate and practical for solving allocation problems. This paper analyses the problem of distributing intelligence in detail and proposes a formalization of the task allocation problem in Section 2. A novel technique for dynamic task allocation and reconfiguration in a distributed system is presented in Section 3, and results demonstrating the validity of this approach are given in Section 4.

2 System description

A system for scene understanding can be subdivided into basic logical modules called tasks. Each task can be allocated to a processor of the video-surveillance network, and the allocation strategy should optimize the reactivity of the overall system, i.e. the time interval needed to process a single frame. The processor network is made up of different types of processing units with different roles; at least three types of machines can be considered: clients, processing servers and code servers. Clients can only execute their own processing tasks and, in general, the first task of a processing chain is executed by a client. A processing server (PSE) can instead execute tasks from any connected client according to the allocation rule. Code servers (CSE) are not used for executing scene understanding tasks, but are used within the network for control purposes and as repositories of downloadable code modules. CSEs have to be taken into account by the allocation policy because they can modify the available network bandwidth. A client can be an intelligent sensor, i.e. a sensor with on-board processing capabilities; intelligent sensors are currently available on the market, but they can be adequately simulated by a standard video camera connected to a processor. Without loss of generality, one can suppose that servers execute tasks with a non-preemptive policy: if a task $T_i$ has to process the results of $N$ logically preceding tasks $T_j$, $j = 1,\dots,N$, then $T_i$ does not start until all tasks $T_j$ and the corresponding data transmissions are completed. The processing capabilities of each processor in the surveillance network are assumed to be known; the amount of available memory of each machine is assumed to be fixed, and a PSE is assumed to be able to execute more than one task by sharing its resources equally between running tasks. The numbers of PSEs and CSEs are assumed to be fixed, while clients can be dynamically turned on and off. A processing network is made up of $N_s$ servers $S_i$, $i = 1,\dots,N_s$, and $N_c$ clients $C_i$, $i = 1,\dots,N_c$. The processing capabilities $P_i$, $i \in \{C_1,\dots,C_{N_c},S_1,\dots,S_{N_s}\}$, of the involved machines, in terms of operations per second, can be measured and taken into account to solve the task allocation problem. A particular surveillance functionality $m$ can be expressed through a chain of $N_m$ logical tasks. The system can be represented using graph theory, and two different graphs can be used to specify a particular architecture. The first graph describes the general physical network architecture, showing clients and servers and their interconnections. The second graph specifies the logical architecture corresponding to a certain system functionality; in this graph, each task of a given functionality is represented by a node.
A computational cost $OP_i^m$ is associated with the node $T_i^m$, reflecting its computational complexity. The amount of data that a task $T_i^m$ needs to transmit to its successor $T_k^m$ is denoted by $C^m_{ik}$, with $C^m_{ik} = 0$ if $T_i^m$ and $T_k^m$ are executed on the same machine. In general, $OP_i^m$ and $C^m_{ik}$ vary with the input data and are therefore not known a priori. However, for certain low-level tasks these quantities depend only on image dimensions and color depth, so they can be considered fixed in these cases.
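The non-preemptive precedence rule above (a task starts only when all of its logical predecessors and the corresponding data transmissions have completed) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the task graph, costs, powers and bandwidth are hypothetical values, execution time is taken as OP divided by the power of the assigned machine, transfer time as C divided by the bandwidth, and concurrent tasks on the same machine are not modelled.

```python
from graphlib import TopologicalSorter

# Hypothetical logical graph: computational cost OP of each task (operations).
OP = {"T1": 4e6, "T2": 2e6, "T3": 1e6, "T4": 3e6}
# Hypothetical data C (bytes) that a task sends to each successor.
C = {("T1", "T2"): 1.4e6, ("T1", "T3"): 1.4e6,
     ("T2", "T4"): 5e4, ("T3", "T4"): 5e4}
# Hypothetical allocation and machine powers (operations per second).
ALLOC = {"T1": "C1", "T2": "S1", "T3": "C1", "T4": "S1"}
POWER = {"C1": 2e7, "S1": 4e7}
BANDWIDTH = 1.25e6  # bytes/s between C1 and S1, roughly a 10 Mbit/s link


def finish_times(op, c, alloc, power, bandwidth):
    """Finish time of every task under the non-preemptive precedence rule.

    Concurrent tasks on the same machine are not modelled: each task is
    assumed to get the full power of its machine while it runs.
    """
    preds = {task: [] for task in op}
    for src, dst in c:
        preds[dst].append(src)
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        ready = 0.0
        for p in preds[task]:
            # Transfer time is zero when both tasks share a machine (C = 0).
            same = alloc[p] == alloc[task]
            transfer = 0.0 if same else c[(p, task)] / bandwidth
            ready = max(ready, finish[p] + transfer)
        finish[task] = ready + op[task] / power[alloc[task]]
    return finish


times = finish_times(OP, C, ALLOC, POWER, BANDWIDTH)
```

With these numbers T3 and its small transfer to S1 are done after about 0.29 s, but T4 still waits until T2 finishes at about 1.37 s, because the raw-frame transfer from T1 to T2 dominates.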

Figure 1 Logical architecture for a given functionality that can be divided into 6 logical modules

For each client machine, a matrix $C^m$ can be defined, representing the amount of data that each task has to transmit to its neighbouring tasks. In the case of Figure 1 one can write:

$$C^m = \begin{pmatrix} 0 & C_{12} & 0 & 0 & 0 & 0 \\ C_{21} & 0 & C_{23} & 0 & C_{25} & 0 \\ 0 & C_{32} & 0 & C_{34} & 0 & 0 \\ 0 & 0 & C_{43} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & C_{56} \\ 0 & 0 & C_{63} & 0 & 0 & 0 \end{pmatrix}$$
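As a sketch of how $C^m$ can be used, the Python code below builds the matrix from the arc pattern of Figure 1 as read from the matrix above and sums the entries between tasks placed on different machines, so that co-located pairs count as zero, as stated in Section 2. The data sizes and the allocations are hypothetical; only the arc pattern comes from the figure.

```python
# Arc pattern of Figure 1 (1-based task indices); the data sizes in bytes are hypothetical.
ARCS = {(1, 2): 1.4e6, (2, 1): 1e3, (2, 3): 4.8e5, (3, 2): 1e3, (2, 5): 4.8e5,
        (3, 4): 6e4, (4, 3): 1e3, (5, 6): 2e4, (6, 3): 5e3}


def build_cm(arcs, n_tasks):
    """Dense matrix whose entry [i][k] is the data task i+1 sends to task k+1."""
    cm = [[0.0] * n_tasks for _ in range(n_tasks)]
    for (i, k), size in arcs.items():
        cm[i - 1][k - 1] = size
    return cm


def network_load(cm, machine_of):
    """Total data exchanged between tasks placed on different machines.

    Entries between tasks on the same machine are skipped, i.e. treated as 0.
    """
    n = len(cm)
    return sum(cm[i][k] for i in range(n) for k in range(n)
               if machine_of[i] != machine_of[k])


cm = build_cm(ARCS, 6)
# Client C1 runs T1 only: just the arcs between T1 and T2 cross the network.
load_t1 = network_load(cm, ["C1", "S1", "S1", "S1", "S1", "S1"])
# Client C1 runs T1 and T2: the arcs between T2 and T3/T5 now cross it instead.
load_t2 = network_load(cm, ["C1", "C1", "S1", "S1", "S1", "S1"])
```

Moving the boundary between client and server changes which entries of $C^m$ turn into network traffic, which is why the transmitted amount depends on the module allocation.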

Figure 2 Physical architecture with 3 clients and 2 servers

A separate graph, such as the one in Figure 2, can be constructed to specify the physical architecture of the system. In this case each node represents a processing resource; the quantity $P_j$ specifies the global computational power of the node, which has to be shared by the executing tasks. Graph arcs take into account the available bandwidth $B_{ij}$ between different processors. The structure of the network graph can be represented by a matrix $B$. In the case of Figure 2 (nodes ordered $C_1, C_2, C_3, S_1, S_2$) the matrix is given by:

$$B = \begin{pmatrix} 0 & 0 & 0 & B_{14} & B_{15} \\ 0 & 0 & 0 & B_{24} & B_{25} \\ 0 & 0 & 0 & B_{34} & B_{35} \\ B_{14} & B_{24} & B_{34} & 0 & 0 \\ B_{15} & B_{25} & B_{35} & 0 & 0 \end{pmatrix}$$

In general, one can suppose that the distributed system has to execute several chains of tasks, i.e. several functionalities. In this general case, a PSE can be required to execute several tasks in parallel in order to finish processing the data coming from different clients and then perform some kind of data fusion at a certain level.

3 Optimal task allocation

In order to exploit the network of computational units effectively, a cost functional has been defined. Tasks can be dynamically allocated by minimizing this functional, which essentially measures the total execution time of a given surveillance functionality. Consider clients first: a client $C_j$ has to execute a certain number $x$ of tasks of a given task chain $m$, so the execution time on the $j$-th client can be computed as:

$$\tau^{C_j}(x) = \sum_{i=1}^{x} t^{j}_{T_i^m}.$$

If a server $S_j$ is considered, one can suppose that $K$ different clients are executing tasks on it. In general, $K$ is not fixed over time but changes with the module configuration. The execution time of the tasks remaining in the processing chain of a given client can be expressed as:

$$\tau^{S_j}(N_m - x) = \begin{cases} 0 & \text{if } x = N_m \\ \sum_{i=x+1}^{N_m} t^{S_j}_{T_i^m} & \text{otherwise} \end{cases}$$

The execution time of the $i$-th task of the $m$-th chain on the server $S_j$ is given by:

$$t^{S_j}_{T_i^m} = \frac{OP_i^m \cdot K(t)}{P_{S_j}}.$$

On the other hand, the total computational complexity of the part of a task chain $m$ that remains on a server $S_j$ can be written as:

$$\Omega^m_{N_m - x} = \sum_{i=x+1}^{N_m} OP_i^m.$$
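The client and server terms defined so far can be evaluated with a short Python sketch. It assumes that a client task takes $OP_i^m / P_{C_j}$ seconds (the text above writes this time only as $t$), keeps $K(t)$ constant for one configuration, and uses hypothetical costs and powers; the function names are illustrative only.

```python
def client_time(op, x, p_client):
    """tau^{Cj}(x): the client runs tasks 1..x; op[i - 1] holds OP_i (operations).

    Assumes each client task takes OP_i / P_Cj seconds.
    """
    return sum(op[:x]) / p_client


def server_task_time(op_i, p_server, k_clients):
    """t^{Sj} for one task: OP_i * K(t) / P_Sj, with K(t) frozen at k_clients."""
    return op_i * k_clients / p_server


def server_time(op, x, p_server, k_clients):
    """tau^{Sj}(N_m - x): the server runs tasks x+1..N_m of the chain."""
    if x == len(op):  # whole chain on the client: nothing left for the server
        return 0.0
    return sum(server_task_time(o, p_server, k_clients) for o in op[x:])


def omega(op, x):
    """Omega^m_{N_m - x}: total complexity of the tasks left to the server."""
    return sum(op[x:])


# Hypothetical chain of four tasks (operations); client at 1e7 op/s,
# server at 2e7 op/s shared by K = 2 clients.
op = [3e6, 1e6, 1e6, 2e6]
t_client = client_time(op, 1, 1e7)
t_server = server_time(op, 1, 2e7, 2)
```

Because $K(t)$ multiplies every server task, the server term equals $\Omega^m_{N_m-x} K / P_{S_j}$, so it grows linearly with the number of clients sharing the server.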

If a server $S_j$ has processing power $P_{S_j}$, one can suppose that this power is distributed equally among the processes of the task chains of the connected clients at any given time instant. However, processes have heterogeneous durations, so the computational power reserved for each process changes over time. To simplify the mathematical formulation of the problem, one can equivalently suppose that the available computational power is shared between executing processes in proportion to their computational complexity. In this way the mathematical expression of the problem is greatly simplified while leading to the same result: if $\sum_{l=1}^{N_C} \Omega^l_{N_l - x_l}$ is the total computational complexity assigned to server $S_j$, then

$$P_{S_j}\,\frac{\Omega^m_{N_m - x_m}}{\sum_{l=1}^{N_C} \Omega^l_{N_l - x_l}}$$

is the processing power dedicated to processing data from the $m$-th client. Each chain therefore needs $\Omega^m_{N_m - x_m}$ divided by this share, i.e. the same time for every client, and the execution time of the data-processing tasks remaining from all clients on server $S_j$ is:

$$\tau^{S_j}(x) = \frac{\sum_{m=1}^{N_C} \Omega^m_{N_m - x_m}}{P_{S_j}}.$$

This execution time has to be added to that of the data fusion tasks, which cannot migrate and can be executed only by the server, on the basis of data coming from each client. Suppose there are $N_F$ data fusion tasks $F_h$, $h = 1,\dots,N_F$, with computational complexities $OP_{F_h}$. The corresponding execution time on server $S_j$ is then given by:

$$\tau_F^{S_j} = \frac{\sum_{h=1}^{N_F} OP_{F_h}}{P_{S_j}}.$$
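The equal-completion argument behind $\tau^{S_j}(x)$ can be checked numerically. In the Python sketch below (hypothetical complexities and power; names are illustrative), each client chain receives a share of $P_{S_j}$ proportional to its remaining complexity $\Omega$, every non-empty chain then finishes at $\sum \Omega / P_{S_j}$, and the fusion term $\tau_F^{S_j}$ is added on top.

```python
def proportional_shares(omegas, p_server):
    """Power given to each client chain, proportional to its remaining complexity."""
    total = sum(omegas)
    if total == 0:
        return [0.0] * len(omegas)
    return [p_server * w / total for w in omegas]


def chain_finish_times(omegas, p_server):
    """Time each non-empty chain needs when it runs at its proportional share."""
    shares = proportional_shares(omegas, p_server)
    return [w / s for w, s in zip(omegas, shares) if w > 0]


def server_busy_time(omegas, fusion_ops, p_server):
    """tau^{Sj}(x) + tau_F^{Sj}: remaining client chains plus fusion tasks."""
    # Under proportional sharing all chains end together at sum(omegas) / p_server.
    tau_s = max(chain_finish_times(omegas, p_server), default=0.0)
    tau_f = sum(fusion_ops) / p_server
    return tau_s + tau_f


# Hypothetical remaining complexities Omega for three clients and one fusion task.
omegas = [4e6, 2e6, 6e6]
times = chain_finish_times(omegas, 2e7)   # every entry is (about) 0.6 s
total = server_busy_time(omegas, [1e6], 2e7)
```

Taking the maximum of the per-chain times rather than assuming they are equal keeps the function honest if a different sharing rule is swapped in later.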

To express the total execution time of a set of task chains over a physical surveillance network, transmission times must also be considered. The transmission time from a client $C_i$ to a server $S_j$ can be computed from the available bandwidth $B_{ij}$ and the amount of data that needs to be transmitted. The latter depends on the task allocation: each task $T_i$ can be associated with an output rate $R_m(x)$, i.e. the quantity of data it generates, which depends on the module allocation:

$$\tau^{C_i S_j} = \frac{R_m(x)}{B_{ij}}.$$

The total execution time can be considered as the sum of the previous terms, thus:

$$T(x_m) = \min_j \left( \tau^{C_j}(x_m) \right) + \max_j \left( \tau^{S_j}(N_m - x_m) \right) + \tau_F^{S_j}(x)$$

A solution can be found by minimizing the total execution time, varying each $x_m$ for each functionality over $\{1,\dots,N_m\}$. For a given task chain $m$, $x_m$ specifies how modules are allocated between the available clients and servers. If a single-client, single-server configuration with a single task chain is selected, the problem is greatly simplified. In this case, if task $T_1$ is allocated to client $C_1$ and no data fusion tasks are considered, one can write:

$$T(T_1) = \frac{OP_{T_1}}{P_{C_1}} + \sum_{i=2}^{N_1} \frac{OP_{T_i}}{P_{S_1}} + \frac{R_1(T_1)}{B_{C_1 S_1}}$$

The same quantity can be evaluated for each task allocation configuration $T_k$, $k = 1,\dots,N_m$, and the minimum can be chosen as the optimal module allocation. From a practical point of view, if a more complex system is considered, with a CSE and a PSE but more than one client dynamically turned on and off, the following procedure can be used. First, the code server and the processing server are started, and network and computational complexity monitoring is started. When a client connects to the processing server, the optimal module allocation is computed and the required software modules are downloaded to the client. Network and computational statistics are then continuously updated, and a new module allocation is carried out if needed. If a new client connects, resources may need to be reallocated, because the server has to share its computational power with more clients and, if the clients use the same network resources, the bandwidth between the involved machines can vary.
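For the single-client, single-server case, the expression for $T(T_1)$ extends to any split point $T_k$: the client runs $T_1,\dots,T_k$, the output $R_1(T_k)$ crosses the link, and the server runs the remaining tasks. The Python sketch below evaluates every $k$ exhaustively and keeps the minimum. All numbers are hypothetical, and the `k_clients` factor, which scales the server term as in $t^{S_j} = OP \cdot K(t)/P_{S_j}$, is an assumed extension used to mimic a new client connecting.

```python
def total_time(k, op, out_bytes, p_client, p_server, bandwidth, k_clients=1):
    """T(T_k): client runs T_1..T_k, R_1(T_k) crosses the link, server runs the rest.

    op[i] is OP of task i+1 (operations), out_bytes[k-1] is R_1(T_k) (bytes),
    powers are in operations/s and bandwidth in bytes/s. k_clients scales the
    server term as in t = OP * K(t) / P_Sj (hypothetical extension of T(T_1)).
    """
    client = sum(op[:k]) / p_client
    server = sum(op[k:]) * k_clients / p_server
    transfer = out_bytes[k - 1] / bandwidth
    return client + server + transfer


def best_split(op, out_bytes, p_client, p_server, bandwidth, k_clients=1):
    """Exhaustive search over k = 1..N; returns (k, T(T_k)) with the minimum time."""
    times = {k: total_time(k, op, out_bytes, p_client, p_server, bandwidth, k_clients)
             for k in range(1, len(op) + 1)}
    k_best = min(times, key=times.get)
    return k_best, times[k_best]


# Hypothetical six-task chain: costs (operations) and output sizes (bytes).
OP = [2e6, 3e6, 1e6, 2e6, 4e6, 1e6]
OUT = [1.44e6, 4.8e5, 4.8e5, 4.8e5, 2e4, 1e3]
P_CLIENT, P_SERVER = 2e7, 8e7
LINK_10M, LINK_100M = 1.25e6, 1.25e7  # bytes/s

k_alone, _ = best_split(OP, OUT, P_CLIENT, P_SERVER, LINK_10M)              # one client
k_busy, _ = best_split(OP, OUT, P_CLIENT, P_SERVER, LINK_10M, k_clients=4)  # shared server
k_fast, _ = best_split(OP, OUT, P_CLIENT, P_SERVER, LINK_100M)              # faster link
```

With these made-up numbers the optimum moves from $k = 5$ to $k = 6$ when the server is shared by four clients, and to $k = 1$ when the link is ten times faster. Exhaustive search is affordable here because a single chain has only $N_m$ candidate splits; re-running it whenever clients connect or the bandwidth estimate changes is what lets the allocation follow such shifts.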

4 Results

Results have been obtained by building a distributed surveillance system with three PCs. A client PC, based on an Intel Pentium II 450 MHz, is connected to a standard camera in order to simulate an intelligent camera device. Two AMD Athlon 900 MHz PCs are used as PSE and CSE. The selected system functionality is object tracking and classification, a standard functionality for an automatic surveillance system. Nine different tasks have been considered, as shown in Table 1. The computational complexity of the first four tasks can be estimated from the features of the processed images and can be considered fixed; the computational complexity of tasks 5 to 9 varies with the input data and is estimated by the optimization algorithm. A 10 Mbit/s network is used for the code migration step, while the connection between client and PSE was tested with both 100 Mbit/s and 10 Mbit/s networks. To evaluate the available bandwidth in the surveillance network, a technique based on the difference between the sending and arrival times of data between the PCs was adopted. With this technique there is no need to send a predefined amount of data to estimate the connection bandwidth: it can be evaluated directly from the data extracted from the images. In this way the system is able to take variations in channel capacity into account, reallocating logical modules if necessary. The algorithm can be used successfully only if the system synchronizes the processors' internal clocks during the initialization phase. In practice, a substantial time drift between different processors is observed, so the clocks have to be re-synchronized after a certain period. Figure 3 plots the time difference between the client and server clocks against time: the drift reaches about 80 ms in approximately 15 minutes.
For low-level modules, the amount of data to be transmitted is relatively high (about 1.4 MB for an 800×600 image with 24-bit color depth), so the data transmission time (about 1 s) is long compared with the observed time drift. In this case one re-synchronization per hour is needed.
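The bandwidth estimator is not given in detail above. The Python sketch below assumes its simplest form, bits sent divided by the difference between arrival and sending timestamps, together with a linear clock drift of 80 ms per 15 minutes taken from Figure 3, to show how an uncorrected offset biases the estimate. The object-list size and all names are hypothetical.

```python
FRAME_BYTES = 800 * 600 * 3        # raw 800x600 frame, 24-bit color (1,440,000 bytes)
OBJECT_LIST_BYTES = 100            # hypothetical size of a high-level object list
LINK_BPS = 10e6                    # nominal 10 Mbit/s link
DRIFT_RATE = 0.080 / (15 * 60)     # assumed linear drift: 80 ms every 15 minutes


def estimate_bandwidth(n_bytes, t_send, t_arrival):
    """Bandwidth (bit/s) from sender and receiver timestamps in seconds."""
    return 8 * n_bytes / (t_arrival - t_send)


def relative_error_after(elapsed_s, n_bytes, link_bps=LINK_BPS):
    """Relative error of the estimate elapsed_s seconds after the last clock sync.

    The receiver clock is assumed to run ahead of the sender clock by
    DRIFT_RATE * elapsed_s, which inflates the measured transfer time.
    """
    true_transfer = 8 * n_bytes / link_bps
    offset = DRIFT_RATE * elapsed_s
    measured = estimate_bandwidth(n_bytes, 0.0, true_transfer + offset)
    return abs(measured - link_bps) / link_bps


frame_err = relative_error_after(3600, FRAME_BYTES)       # one hour after sync
list_err = relative_error_after(60, OBJECT_LIST_BYTES)    # one minute after sync
```

Under these assumptions an hour of drift biases the estimate for a raw frame (about 1.15 s of transfer) by roughly a fifth, whereas one minute of drift already dominates the transfer of a small object list, which is why the smaller outputs of higher-level modules make synchronization more critical.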

Task  Description            Module transmission    Output data transmission
                             time (ms, 10 Mbit/s)   time (ms, 10 Mbit/s)
T1    Acquisition            79                     1213
T2    Image difference       7                      376
T3    Threshold              8                      381
T4    Morphology             6                      385
T5    Blob coloring          63                     17
T6    Tracking               91                     10
T7    Classification         21.9                   31
T8    Features extraction    9                      33
T9    Trajectory estimation  10                     -

Table 1 Different tasks considered for experimental results

Figure 3 The time difference between two different processors is plotted over time (time drift in ms vs. time in s)

On the other hand, if a higher-level module is considered, the amount of data to be transmitted is quite low (just object lists and positions), so the transmission time is short and the same clock drift represents a much larger fraction of it. Thus, if a higher-level module is allocated to the smart camera device, more frequent synchronization may be needed. Figures 4(a) and 4(b) plot the processing times of client and server over time for different module allocations. For instance, T1 means that the client performs acquisition only, while the server performs all the other logical tasks. Because of the processing capabilities of the involved machines, the acquisition module takes approximately 350 ms on the client, while the server needs only 150 ms for all the other tasks. Figures 5 and 6 show the total processing time for 10 Mbit/s and 100 Mbit/s networks respectively, each obtained with two clients of different processing power (Pentium II at 233 MHz and 450 MHz). The minimum of the cost functional gives the optimal module allocation in the surveillance network. Allocating the tasks up to T5 or T6 to the client often minimizes the execution time. With the 100 Mbit/s network and a slow client, however, it is preferable to allocate only the first two tasks to the client and transmit the information to the remote server over the fast network.

Figure 4 Execution times for different module allocations (T1 to T5) for (a) client and (b) server (execution time in ms vs. time in s)

5 Conclusions

An analysis of the task allocation and resource optimization problem in a distributed system for video surveillance has been proposed. The problem was treated mathematically by reducing dynamic allocation to a functional optimization problem. A general theoretical solution has been proposed, and numerical results were obtained with a simple single-client, single-server realization of the distributed surveillance system. A method has been proposed to choose the task allocation that minimizes the total processing time of the system and hence its event reaction time.

Figure 5 Total execution times for different allocations (T1 to T8) using a 10 Mbit/s network and clients with different processing power: Client A (Pentium II 233 MHz) and Client B (Pentium II 450 MHz)

Figure 6 Total execution times for different allocations (T1 to T8) using a 100 Mbit/s network and clients with different processing power: Client A (Pentium II 233 MHz) and Client B (Pentium II 450 MHz)

6 References

[1] R. Collins, A. Lipton, H. Fujiyoshi and T. Kanade, “Algorithms for cooperative multisensor surveillance”, Proceedings of the IEEE, Vol. 89, No. 10, October 2001, pp. 1456-1477.
[2] ISTAG, “Scenarios for Ambient Intelligence in 2010”, http://www.cordis.lu/istag.htm
[3] H. Wu, D. Chang and W. J. B. Oldham, “Dynamic Task Allocation Models for Large Distributed Computing Systems”, IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, December 1995, pp. 2151-2161.
[4] L. Marcenaro, F. Oberti, G. L. Foresti and C. S. Regazzoni, “Distributed architectures and logical-task decomposition in multimedia surveillance systems”, Proceedings of the IEEE, Vol. 89, No. 10, October 2001, pp. 1419-1440.
[5] Distributed Computing Environment (DCE), http://www.opengroup.org/dce/
[6] W. W. Chu, L. J. Holloway, M.-T. Lan and K. Efe, “Task Allocation in Distributed Data Processing”, IEEE Computer, November 1980, pp. 57-69.
[7] C. C. Shen and W.-H. Tsai, “A Graph Matching Approach to Optimal Task Assignment in Distributed Computing Systems Using a Minimax Criterion”, IEEE Transactions on Computers, Vol. C-34, No. 3, March 1985, pp. 197-203.