Knowledge and Event-Based System for Video-Surveillance Tasks

Rafael Martínez Tomás and Angel Rivas Casado
Dpto. Inteligencia Artificial, Escuela Técnica Superior de Ingeniería Informática, Universidad Nacional de Educación a Distancia, Juan del Rosal 16, 28040 Madrid, Spain
[email protected], [email protected]

Abstract. This work describes an event–based system supported by knowledge to compose high-level abstraction events from intermediate agent events. The agents are in a level that interprets multi-sensory signals according to the scenario ontology, particularly, from video-sequence identification and monitoring. The target task is surveillance understood in its entirety from the identification of pre-alarm signals to planned action. The work describes the system architecture for surveillance based on this composition knowledge, how the knowledge base is organised, tools for its management and examples of event inference/composition characterising a scene situation.

1 Introduction

Surveillance-associated tasks are increasingly prevalent in different scenarios and services. The spectrum of possible target situations is enormous and of varying complexity, from simply detecting movement in a controlled space to more global surveillance, where scenes are monitored with different cameras and sensors, the suspicious situation is studied and diagnosed, and dynamic planning is carried out coherently, according to how the situation, activities and resources evolve. The surveillance task as a whole therefore has a structure similar to that of a control task:

1. Monitoring critical variables, whose deviations from normality are a sign of malfunction or a warning of possible subsequent malfunctions,
2. Diagnosing a problem, consisting either of the search for the cause of the malfunction or a prediction of failures, and
3. Planning the coordinated action of the different agents that collaborate in solving the problem.

In any case, the fundamental problem is to understand or interpret appropriately what is happening in a surveillance target scenario and somehow bridge the great semantic leap that occurs when passing from the physical level of the sensors, particularly in video-surveillance, from the pixels captured by the camera to the identification of subtle or complex movements and scenographies of different actors.

J. Mira et al. (Eds.): IWINAC 2009, Part I, LNCS 5601, pp. 386–394, 2009. © Springer-Verlag Berlin Heidelberg 2009

For this, a conceptual decomposition into several intermediate description levels with an increasing degree of abstraction is used [1,2,3], which, moreover, enables knowledge to be injected at the appropriate level, in particular information on the environment (physical, behavioural or social, knowledge of the task, etc.). This also implies the possibility of considering different feedback loops from the highest semantic levels to the lower levels to improve the specific tasks in these levels. There are specific references on the use of top-down feedback in high-level vision in the work of our group [4]. The feedback includes a reflection on what is inferred, a search for inconsistencies, greater precision, etc. In particular, following the proposal of [1], in our work we differentiate between pixel level, blob level, object level and activity level [3]. In this article we are particularly interested in this last level, which starts from the description generated at the object level: the identification of objects or classified scene elements and their monitoring in time and space. In computer vision, High-Level Vision [4] is precisely the interpretation of scenes beyond the mere recognition of objects: recognising situations, activities and interactions between the different agents participating in a video sequence. In this line, back in 1983 and 1984 Neumann and Novak [5,6] worked on a system to generate a natural-language description of the activities observed in a traffic video sequence, using case frames based on locomotion verbs organised hierarchically for the representation. Bobick (1997) [7] characterised movement in terms of the consistency of entities and relations detected in a time sequence. In contrast, the concept of activity is understood as a composition of stereotyped movements whose time sequence is characterised by statistical properties (e.g. a hand gesture).
Finally, he defined action as “semantic primitives relating to the context of the motion”. In [8] a hierarchical ontology is structured (events, verbs, episodes, stories, etc.). By contrast, Chleq and Thonnat (1996) [9] only differentiate between primitive and composed events. Thus, generally, the most abstract activities or events are considered as a composition of other, more primitive events inherent to lower semantic levels. This composition is done from spatio-temporal relations ([10], for example) or from common-sense knowledge on hierarchies and concept relations ([6], for example). When human observers interpret the meaning of a scene, they obviously use their knowledge of the world, the behaviour of the things that they know, the laws of physics and the set of intentions that govern agent activity. All this additional knowledge, which does not appear explicitly in the signals generated by the sensors, enables observers to model the scene and use this model to interpret or predict, at least partially, what is happening or may happen in the scene. It is knowledge that must be made explicit and represented for its operationalisation. In this article, which is a continuation of other works by the group [11,12,13], we focus on explicit knowledge to identify activities as event composition at the activity level, using the events from the identification and monitoring processes in a video-surveillance system. The objective is also to develop some tools that facilitate the generation of new high-level interpretation systems and
reuse standardised and recurrent events from different surveillance scenarios and situations. In the following section, an example of event composition is shown to identify an alarming situation and another example to solve a monitoring problem. The prototype will be shown below. It operationalises this knowledge, applies the composition mechanism on an event base, and has tools that facilitate the configuration and incorporation of new knowledge on activities, scenes and scenarios [3].

2 High-Level Event Composition and Knowledge-Base Structuring

Events from the identification and monitoring agents (sensor agents) that meet specific spatio-temporal restrictions make it possible to infer, and thereby trigger, events with a greater semantic level. Figure 1 schematically illustrates an example of an alarming situation. The first row includes simplified images of instants in the scene. The second row contains the simple events that are generated from the segmentation, monitoring and identification of each frame of the sequence. The third row shows the pattern for the composition of the events occurring at each instant. It is a knowledge unit for a composition: a set of events that must meet specific spatio-temporal relations and the consequent events with greater semantics. Thus, following the example, we can interpret the sequence as follows:

1. At instant t, human1 is detected on the scene. The identification and monitoring agent does not recognise that the human is carrying an object, so the spatio-temporal location of the human is only represented with the event “At”.

Fig. 1. Table showing input events, knowledge used and events inferred schematically between successive frames

2. At instant t+1, “object1” is detected near the position of human1. This situation activates a pre-alarm of possible abandonment of an object with the event “Pre-alarm”. Since there are no other humans nearby, it is inferred that “human1” has left the object. An association is created between the object and the human, and the pre-alarm is activated.
3. At instant t+2, “human1” is detected leaving the scene. This event and the active pre-alarm identify a situation of abandoning an object. The event “Alarm” goes off.

We group knowledge units into packages that identify a specific situation. In turn, the packages are organised into composition levels: each package is assigned a composition level, and each composition level sends the composed events that it has generated to the next higher composition level. Packages in different composition levels are interdependent. Thus, if a package in a specific level is added to the knowledge base, all those packages in the lower composition levels that are necessary for its functioning are also added. We pursue the objective of creating a library of packages rich enough to be able to configure a system with ease. Each library of packages has its own corresponding ontology of events. Figure 2 shows the whole composition hierarchy between the different system elements: the knowledge base consists of composition levels, each composition level has packages, and each package has knowledge units.

As shown in the second example, the knowledge base not only includes the precise knowledge for identifying alarming situations, but also knowledge that complements the information received from previous levels. The more expert knowledge there is, the fewer the false alarms. This second example shows that the activity level may recognise actions to enrich or complete the identification information. Consider a human who is walking behind a column; the system knows the position of the column.
We define the event Column as any scene element causing occlusion. Figure 3 has a similar structure to the previous example. In the first row we find the schematised images between instants t and t+2 of the scene, where a human passes behind a column. The second row represents the simple events that reach the activity level and trigger the packages that infer the events (third row) of the next composition level.
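The first example above (abandonment of an object) can be sketched as a pair of knowledge units operating over a stream of simple events. The sketch below is a minimal illustration, not the prototype's actual implementation: the event representation and the event names (`ObjectNear`, `LeavesScene`) are hypothetical placeholders for the events described in Figure 1.

```python
from collections import namedtuple

# Hypothetical event representation: name, arguments and instant.
Event = namedtuple("Event", ["name", "args", "t"])

def compose(simple_events):
    """Sketch of the two knowledge units of the example: an object detected
    near a lone human raises a pre-alarm and associates the object with that
    human; if the human then leaves the scene, the alarm goes off."""
    composed = []
    associations = {}   # object -> human assumed to have left it
    prealarm = set()    # humans with an active pre-alarm
    for e in sorted(simple_events, key=lambda ev: ev.t):
        if e.name == "ObjectNear":          # args: (human, object) close together
            human, obj = e.args
            associations[obj] = human
            prealarm.add(human)
            composed.append(Event("Pre-alarm", (human, obj), e.t))
        elif e.name == "LeavesScene" and e.args[0] in prealarm:
            composed.append(Event("Alarm", e.args, e.t))
    return composed
```

Running it on the event sequence of the example (At at t, ObjectNear at t+1, LeavesScene at t+2) yields the composed events Pre-alarm and then Alarm, mirroring the interpretation given above.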

Fig. 2. Cascade of the knowledge base

Fig. 3. Example of composition levels

The fourth row shows these events inferred in level one; the fifth row, the behaviour of a level-two package; and, finally, the sixth row, the composed events generated from this level. At t+2 a sequence was identified that makes it possible to infer (it is assumed) that the human reappearing is the same person that entered the column space, since no other individuals participate in the scene. This implies correcting the information that comes from the object level and renaming human2 as human1. Therefore, two packages in different levels were used to identify the occlusion situation, thanks to knowledge of the scenario.
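The identity correction of this occlusion example can be sketched as follows. This is an illustrative simplification under the stated assumption that nobody else is in the scene; the event names (`DisappearsAt`, `AppearsAt`) and the renaming map are hypothetical, not the prototype's API.

```python
from collections import namedtuple

Event = namedtuple("Event", ["name", "args", "t"])

def resolve_occlusions(events, occluders):
    """When a human disappears at a known occluder (a Column event) and a
    'new' human later appears there, rename the new track to the hidden
    identity. Returns a map new_id -> original_id."""
    renames = {}
    hidden = {}   # occluder -> identity of the human last hidden behind it
    for e in sorted(events, key=lambda ev: ev.t):
        if e.name == "DisappearsAt" and e.args[1] in occluders:
            hidden[e.args[1]] = e.args[0]
        elif e.name == "AppearsAt" and e.args[1] in hidden:
            old = hidden.pop(e.args[1])
            if e.args[0] != old:
                renames[e.args[0]] = old   # e.g. human2 -> human1
    return renames
```

For the sequence of Figure 3, the sketch maps human2 back to human1, which is exactly the correction fed back to the object level.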

3 Prototype and Development Tools

3.1 System Structure and Global Process

We can identify two stages in the global process. Figure 4 shows the system structure and this differentiation schematically. All the system components are connected via a local area network (LAN). The knowledge base and the identification and monitoring agent interfaces are designed in the first stage.

Fig. 4. System structure

A development environment (KB IDE) facilitates this process from the repository of reusable packages. In the execution stage, the agents send the simple events that they have identified to the Execution and Monitoring System via the corresponding interface. This Execution and Monitoring System, using the knowledge base, infers composed events that it immediately stores in the Database. The user interface selects, organises and compiles the most relevant information. The Execution System sends the inferred events that characterise alarm situations directly to the user interface. The system is multi-platform, since it uses the LAN as the connection medium between the components. Accordingly, for example, the Agents may be running under Linux, the Database on Windows and the user interface on another operating system. This makes the system versatile, unlike the previous version [12]. We have passed from an all-in-one system to a distributed system that minimises execution problems and requirements, since several interconnected machines are used to distribute the global calculation load. Logically, the execution time must be taken into account when the time requirements of an environment are highly demanding. Execution times can be adjusted to the needs of the work environment. The size of the knowledge base should also be considered: if it has a large number of packages, the execution time will have to be increased to attain a time slice large enough to perform all the necessary operations.

3.2 Knowledge Base IDE

This tool is used to create the knowledge base that is subsequently executed in the Execution and Monitoring System. The following tasks can be performed with this application:

– Configuration Parameter Definition: the system has a configuration parameter table that reconfigures the system for different situations, so the knowledge units or packages do not have to be modified. If, for example, the system is working with information from cameras, we can define their resolution as a configuration parameter.
– Event Ontology Definition: a mechanism that helps define the whole event ontology.
– Package Definition: the composition level to which the package belongs and the dependencies on other packages in the lower composition level can be defined with this tool. The environment of the package is thus automatically filtered: only certain events in the previous level can be accessed.
– Knowledge Unit Definition in the packages: with this design tool a composition can be constructed. Each knowledge unit only has access to the events contained in its package.

Figure 5 shows the main window of this tool, initially describing the simple and composed events available to the user. It has tabs that open windows to edit the different system components, such as new packages and knowledge units.
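The dependency rule described in Section 2 (adding a package also adds the lower-level packages it needs) can be illustrated with a small sketch. The `Package` class and registry below are hypothetical stand-ins for the KB IDE's internal representation, shown only to make the recursive inclusion concrete.

```python
# Hypothetical package registry: adding a package to the knowledge base
# recursively pulls in the lower-level packages it depends on.
class Package:
    def __init__(self, name, level, depends_on=()):
        self.name = name
        self.level = level              # composition level of the package
        self.depends_on = tuple(depends_on)

def add_package(pkg, registry, knowledge_base):
    """Add pkg and, recursively, its dependencies; the knowledge base maps
    each composition level to the set of packages it contains."""
    for dep in pkg.depends_on:
        add_package(registry[dep], registry, knowledge_base)
    knowledge_base.setdefault(pkg.level, set()).add(pkg.name)
```

Adding, say, a level-two occlusion package that depends on a level-one tracking package leaves both packages in the knowledge base, each at its own composition level.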

3.3 Execution and Monitoring System

The system activates the knowledge base and stores all the composed events generated in a historical record. We distinguish the following components:

– Net server: it activates the socket for the Agents to connect to the system. The connection is bi-directional in order to request certain information from the Agent.
– List of connected Agents: a list of all the Agents connected to the system, along with their description and related data.
– Event synchronisation system: it is in charge of generating time slices to receive and label events. Once an event has been received, it is labelled with the time associated with its time slice. Thus an asynchronous system is transformed into a synchronous one.
– Composition Motor: an inference engine that evaluates the requirements of the knowledge units and adds the newly inferred events to the event base.
– Statistics and monitoring system: it analyses the number of events that arrive from each Agent, the execution time of the Composition Motor, the synchronisation system buffers and the database connection response times. This information is very useful when calibrating and configuring all the execution-system parameters.
– Database connection: it sends all the events inferred at each instant to the selected database so that the information can later be processed. When analysing these data, possible errors in the knowledge base can be debugged, which is very useful for refining the system.
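The time-slice labelling performed by the event synchronisation system can be sketched as follows. This is a minimal illustration of the idea, assuming fixed-width slices and millisecond arrival stamps; the function name and parameters are hypothetical.

```python
# Sketch of the synchronisation idea: asynchronous events are grouped into
# fixed time slices and labelled with the start time of their slice, which
# turns the asynchronous agent stream into a synchronous one.
def synchronise(arrivals, slice_ms):
    """arrivals: iterable of (arrival_time_ms, event) pairs.
    Returns a map from slice start time to the events labelled with it."""
    slices = {}
    for arrival, event in arrivals:
        slice_start = (arrival // slice_ms) * slice_ms
        slices.setdefault(slice_start, []).append(event)
    return slices
```

With a 10 ms slice, events arriving at 3, 12 and 14 ms are labelled with slices 0 and 10; the Composition Motor can then evaluate its knowledge units once per slice. This also makes concrete the remark above that a large knowledge base may require a longer slice.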

Fig. 5. Main window of the Knowledge Base IDE

4 Conclusions

This work highlights the ideas on which our video-surveillance prototype is based. It is organised into description levels, from the physical level to the activity level. This article focuses in particular on the activity level and its implementation as an event-composition knowledge-based system, from the simplest events, which arrive at the activity level from identification and monitoring, to other events with a greater semantic load. The prototype is fully functional. It has two tools: a Development Interface for the Knowledge Base and an execution and monitoring system. The Development Interface pursues quite an ambitious objective: to facilitate the configuration of patterns for new alarming situations, based on standardised and habitual activities, and to build a repository of knowledge packages for video-surveillance identification and monitoring.

Acknowledgements. The authors are grateful to the CiCYT for financial aid on project TIN-2007-67586-C02-01.


References

1. Nagel, H.: Steps towards a cognitive vision system. AI Magazine 25(2), 31–50 (2004)
2. Neumann, B., Weiss, T.: Navigating through logic-based scene models for high-level scene interpretations. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds.) ICVS 2003. LNCS, vol. 2626, pp. 212–222. Springer, Heidelberg (2003)
3. Martínez-Tomás, R., Rincón-Zamorano, M., Bachiller-Mayoral, M., Mira-Mira, J.: On the correspondence between objects and events for the diagnosis of situations in visual surveillance tasks. Pattern Recognition Letters (2007), doi:10.1016/j.patrec.2007.10.020
4. Carmona, E.J.: On the effect of feedback in multilevel representation spaces. Neurocomputing 72, 916–927 (2009)
5. Neumann, B., Novak, H.: Event models for recognition and natural language description of events in real-world image sequences. In: Proceedings of the Eighth IJCAI, Karlsruhe, pp. 724–726. Morgan Kaufmann, San Mateo (1983)
6. Neumann, B.: Natural language description of time-varying scenes. Bericht no. 105, FBI-HH-B-105/84, Fachbereich Informatik, University of Hamburg (1984)
7. Bobick, A.: Movement, activity, and action: The role of knowledge in the perception of motion. In: Royal Society Workshop on Knowledge-based Vision in Man and Machine, London, pp. 1257–1265 (1997)
8. Nagel, H.: From image sequences towards conceptual descriptions. Image and Vision Computing 6(2), 59–74 (1988)
9. Chleq, N., Thonnat, M.: A rule-based system for characterizing blood cell motion. In: Huang, T.S. (ed.) Image Sequence Processing and Dynamic Scene Analysis. Springer, Heidelberg (1996)
10. Pinhanez, C., Bobick, A.: PNF propagation and the detection of actions described by temporal intervals. In: DARPA Image Understanding Workshop, New Orleans, Louisiana, pp. 227–234 (1997)
11. Mira, J., Tomás, R.M., Rincón, M., Bachiller, M., Caballero, A.F.: Towards a semiautomatic situation diagnosis system in surveillance tasks based. In: Mira, J., Álvarez, J.R. (eds.) IWINAC 2007. LNCS, vol. 4528, pp. 90–98. Springer, Heidelberg (2007)
12. Carmona, E., Cantos, J.M., Mira, J.: A new video segmentation method of moving objects based on blob-level knowledge. Pattern Recognition Letters 29, 272–285 (2008)
13. Rincón, M., Carmona, E.J., Bachiller, M., Folgado, E.: Segmentation of moving objects with information feedback between description levels. In: Mira, J., Álvarez, J.R. (eds.) IWINAC 2007, Part II. LNCS, vol. 4528, pp. 171–181. Springer, Heidelberg (2007)