FREEDIUS: An Open Source Lisp-Based Image Understanding Environment

Christopher Connolly
Lynn Quam

[email protected]
[email protected]

Artificial Intelligence Center
SRI International, Inc.
333 Ravenswood Avenue, Menlo Park, CA 94025
+01 650-859-5022

ABSTRACT
This paper describes FREEDIUS, an open-source image understanding system. FREEDIUS is a Lisp-C hybrid system that exploits CLOS for rapid prototyping, flexibility of presentation in the user interface, and flexibility in persistent object storage. Applications of FREEDIUS include site modeling, video track analysis, and event recognition.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features – abstract data types, polymorphism, control structures.

General Terms
Algorithms, Measurement, Languages.

Keywords
Image Understanding, Tracking, Site Modeling, Event Recognition.

1. INTRODUCTION
FREEDIUS is a successor to a line of Image Understanding (IU) tools developed at SRI International (SRI) over the last three decades. These tools were written in Lisp and originally targeted the Symbolics Lisp Machine as the implementation platform. An early system, ImagCalc, was begun in 1982 and served in part as a workbench for developing IU algorithms. It implemented many fundamental image operators (such as Gaussian smoothing and image pyramids) and a range of Lisp macros that aided in rapid prototyping of image processing algorithms. ImagCalc was based on a simple stack-machine model in which each window contains a stack of images and GUI operations push and pop elements of the stack. This paradigm is still present in FREEDIUS today.

Figure 1: TerrainCalc screenshot circa 1984, showing Yosemite Valley.

Next came TerrainCalc, one of the earliest systems for high-quality anti-aliased texture mapping of imagery onto terrain models. The Yosemite Valley fly-through (Figure 1) was an example of the capability of this system. As DARPA projects required more understanding of the three-dimensional (3-D) nature of the world as seen in the imagery, ImagCalc evolved into the SRI Cartographic Modeling Environment (CME) [1], whose GUI supported the interactive creation and manipulation of 3-D object models overlaid on imagery. The Lisp Machine version of CME was licensed by SRI to a variety of commercial and university groups. During the 1990s, these systems evolved further, and with the demise of Symbolics, Inc., they were ported to Unix platforms (Sun and SGI) using a combination of C and Lucid Common Lisp. This effort was funded by DARPA under the RADIUS (Research and Development for Image Understanding Systems) program; hence, the resulting system was called the RCDE (RADIUS Common Development Environment) [2,3]. The late 1990s saw the emergence of high-performance graphics in OpenGL, combined with portable GUI systems and cross-platform Lisp implementations. These were some of the factors that led to the development of FREEDIUS as a successor to ImagCalc and CME. FREEDIUS is a reimplementation of ImagCalc and CME.

Its primary goals are:
1. To support several Common Lisp implementations:
   a. Allegro CL
   b. CMUCL
2. To support multiple OS and hardware platforms:
   a. Linux/X86
   b. Sun/SPARC
   c. Mac OS X (X86 and PPC)
   d. Windows/X86

The FREEDIUS license is the Mozilla Public License, which allows developers some freedom in distributing products built using FREEDIUS.

2. SYSTEM ARCHITECTURE
FREEDIUS permits the display and manipulation of images, 3-D geometry, and associated data and products. Many low-level operations (matrix manipulation, OpenGL helper functions, and low-level image file I/O, for example) are implemented in a core C/C++ library (libFREEDIUS), while the remaining functionality is implemented in Lisp.

2.1 Foreign Libraries
The FREEDIUS foreign function interface maps foreign primitive type and function declarations into a common API (the QFFI API). The QFFI API is implemented by a set of Lisp macros that map FREEDIUS foreign declarations into the underlying Lisp's native foreign function API. Although QFFI predates UFFI, it is similar in principle, in that it generalizes the implementation-specific foreign interfaces and hides implementation differences from the programmer. QFFI is an important aspect of FREEDIUS, since in practice there are many foreign libraries that can be recruited to perform various functions relevant to FREEDIUS applications (e.g., libgeotiff, libdv, libpng).
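As a rough illustration of this approach (the macro name, its syntax, and the backend wrappers below are assumptions, not the actual QFFI API), a portable declaration macro can select the host Lisp's native FFI machinery at macroexpansion time:

  ;; Sketch only: a portable foreign declaration that is rewritten, at
  ;; macroexpansion time, into whatever the host Lisp's FFI expects.
  (defmacro def-foreign-function (lisp-name c-name args return-type)
    #+allegro `(define-via-allegro-ffi ,lisp-name ,c-name ,args ,return-type)
    #+cmu     `(define-via-cmucl-alien ,lisp-name ,c-name ,args ,return-type)
    #-(or allegro cmu)
    `(warn "No QFFI-style backend for ~A in this Lisp" ',lisp-name))

  ;; The two backend macros would wrap the native facilities (ff:def-foreign-call
  ;; on Allegro CL, the ALIEN facility on CMUCL); they are elided in this sketch.

  ;; Hypothetical declaration for a routine exported by libFREEDIUS:
  (def-foreign-function gauss-smooth "FREEDIUS_gauss_smooth"
    ((image :pointer) (sigma :double))
    :pointer)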


2.2 Tcl/Tk
Tcl/Tk provides the GUI framework for FREEDIUS. While other GUI packages have been adapted for FREEDIUS, Tcl/Tk is available for a wide variety of platforms and provides a native look-and-feel. The interface between FREEDIUS and Tcl/Tk is implemented using foreign functions to create a Tcl interpreter within the Lisp process. Some helper Tcl scripts are required for creating certain widget classes and for dispatching events back to Lisp. Foreign-callable functions are defined within Lisp to handle most Tcl/Tk events, and these are dispatched to implement pop-up menus and context-dependent mouse actions. A separate C library, liblisptk, is used by FREEDIUS to interact with Tcl/Tk.
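To give a feel for the Lisp side of this interaction, the sketch below (all names are hypothetical; this is not the liblisptk or FREEDIUS API) shows how a single foreign-callable entry point might dispatch Tcl/Tk events to registered Lisp handlers:

  ;; Illustrative sketch: Tcl events arrive at one entry point and are routed
  ;; to Lisp handler functions keyed by event name.
  (defvar *tk-event-handlers* (make-hash-table :test #'equal)
    "Maps an event name (a string sent from Tcl) to a Lisp handler function.")

  (defun register-tk-handler (event-name handler)
    (setf (gethash event-name *tk-event-handlers*) handler))

  (defun tk-event-callback (event-name widget &rest args)
    "Entry point invoked (via the FFI) when Tcl dispatches an event back to Lisp."
    (let ((handler (gethash event-name *tk-event-handlers*)))
      (if handler
          (apply handler widget args)
          (warn "Unhandled Tk event ~A on ~A" event-name widget))))

  ;; Example: popping up a context menu on a right click.
  (register-tk-handler "button-3-press"
    (lambda (widget x y)
      (format t "~&Right click on ~A at (~A, ~A); build pop-up menu here.~%"
              widget x y)))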

Figure 2: FREEDIUS system components.

As shown in Figure 2, the Lisp component of FREEDIUS provides central control for all GUI, rendering, and foreign function operations. FREEDIUS has a read-eval-print loop (REPL) that is enhanced for graphical interaction with images and other objects (e.g., mouse selection actions bind "*" to the selected object). Typically, FREEDIUS is used in conjunction with the SLIME system (http://common-lisp.net/project/slime/) to provide a convenient development environment. FREEDIUS also implements an eval-caching facility so that results of compute-intensive operations can be saved and reused at no added computational cost. The Lisp portion of FREEDIUS is written mostly in Common Lisp, with some implementation-dependent extensions to handle certain system calls, OpenGL extensions, and foreign function APIs. These extensions have been implemented for Franz Allegro Lisp and CMU Common Lisp. FREEDIUS has been built and tested on Linux, Windows, Solaris, and Mac OS X (Intel and PPC).
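The eval-caching idea can be sketched as a simple memoization macro (hypothetical names; the real FREEDIUS facility is not shown here): a form's result is stored under a key and reused on subsequent evaluations.

  (defvar *eval-cache* (make-hash-table :test #'equal))

  (defmacro cached-eval (key &body body)
    "Evaluate BODY once per distinct KEY and reuse the stored result afterwards."
    (let ((k (gensym "KEY")) (hit (gensym)) (val (gensym)))
      `(let ((,k ,key))
         (multiple-value-bind (,val ,hit) (gethash ,k *eval-cache*)
           (if ,hit
               ,val
               (setf (gethash ,k *eval-cache*) (progn ,@body)))))))

  ;; Example use: an image pyramid is computed only once per image.
  ;; (compute-gauss-pyramid is a hypothetical expensive operation.)
  ;; (cached-eval (list 'gauss-pyramid image) (compute-gauss-pyramid image))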

2.3 Images
FREEDIUS supports images of various types and sizes. Small images (e.g., video frames) are stored using in-memory arrays. Larger images are tiled and represented as paged images that use the file system for auxiliary storage. This relieves the demand on virtual memory, since FREEDIUS itself handles image tile paging and page-pool management; the tiling mechanism in the libtiff library is usually used to that end. Both categories of image are instantiated as CLOS classes (array-image and paged-image) with slots describing image dimensions and block (tile) sizes. File-mapped images are structured so that blocks (tiles) of each image are loaded into memory as needed from file storage. Pixels can be scalar or vector types: FREEDIUS supports signed and unsigned 8-, 16-, and 32-bit scalars, as well as single- and double-float scalars. Standard color image types include 8-bit RGB, YUV, YUV411, and YUV422. Color images are built on the vector-image class, and most color-image objects are represented as tuples of scalar-image objects. Figure 3 illustrates the class hierarchy of FREEDIUS images. Each image implicitly defines a two-dimensional (2-D) coordinate system that consists of the pixel indices in the image. Since images can be scaled or otherwise arranged into pyramids and windows, it is necessary to relate each image to an underlying coordinate system that represents the fundamental imaging event that gave rise to the image and its derivative products.
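A rough sketch of this class split follows (slot names and the iref generic function are assumptions for illustration; the real FREEDIUS definitions are more elaborate):

  (defclass image ()
    ((x-dim :initarg :x-dim :reader image-x-dim)
     (y-dim :initarg :y-dim :reader image-y-dim)
     (element-type :initarg :element-type :reader image-element-type)))

  (defclass array-image (image)
    ((pixels :initarg :pixels :accessor image-pixels))   ; in-memory 2-D array
    (:documentation "Small images (e.g., video frames) held entirely in memory."))

  (defclass paged-image (image)
    ((block-x-dim :initarg :block-x-dim :reader block-x-dim)
     (block-y-dim :initarg :block-y-dim :reader block-y-dim)
     (page-pool   :initform nil :accessor page-pool))
    (:documentation "Large images tiled on disk; blocks are faulted in on demand."))

  ;; A generic pixel accessor can then hide the storage difference:
  (defgeneric iref (image x y)
    (:documentation "Return the pixel at (x, y), paging in a tile if necessary."))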

Figure 3: CLOS class hierarchy for FREEDIUS images.

This underlying coordinate system is called a 2d-world. Images are related to a 2d-world by the image-to-2d-transform associated with each image, which allows arbitrarily scaled or windowed images to be related back to a common image-plane coordinate frame.
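For example (a minimal sketch, with hypothetical constructors and transforms represented as plain functions), a reduced-resolution pyramid level can be tied back to the full-resolution image plane by a simple scaling image-to-2d-transform, so measurements made on any level agree:

  (defun make-scaling-transform (scale)
    "Return a function mapping pyramid-level pixel coords to 2d-world coords."
    (lambda (u v) (values (* scale u) (* scale v))))

  ;; Level k of a power-of-two pyramid is related to the 2d-world by scale 2^k:
  (defun image-to-2d-transform-for-level (k)
    (make-scaling-transform (expt 2 k)))

  ;; (funcall (image-to-2d-transform-for-level 3) 10 20)  =>  80, 160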

2.4 Objects
FREEDIUS supports a wide range of 2-D and 3-D object types for modeling and mensuration. Crosshair objects allow points of interest to be defined in two or three dimensions, usually for mensuration or sensor calibration. Figure 4 shows the crosshair object hierarchy.


Figure 4: Crosshair (1D object) class hierarchy.

Figure 5: Curve class hierarchy.


Figure 6: Partial 3D object class hierarchy.

Linear features can be described by open or closed curves, ribbons (curves with width), or polygons (areas). These classes are useful for delineating roads, parking lots, bodies of water, and other objects that do not have a significant extent in height. Figure 5 shows part of the FREEDIUS 2-D object class hierarchy.

True 3-D objects are usually represented as a graph of vertices and edges whose topology implicitly defines object faces. These objects are typically parameterized in some way (e.g., by length, width, and height, or by length and angle). All objects have a native coordinate system in which the vertices (3-D or 2-D points) are expressed. Often, these coordinate systems are object-centered, allowing object vertex coordinates to be near 0.0. Part of the FREEDIUS class hierarchy for 3-D objects can be seen in Figure 6.

Objects can be grouped for display or manipulation in object-sets or feature-sets. The former are used mainly for display, GUI interaction, and grouping for processing, while feature-sets are akin to the layers found in traditional GIS systems. Feature-sets have an I/O framework that allows them to be persistent.
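As an illustration of such parameterized objects (class and slot names here are assumptions, not the actual FREEDIUS definitions), a box-like object can generate its eight vertices in an object-centered frame from its size parameters:

  (defclass 3d-object ()
    ((coordinate-system :initarg :coordinate-system :accessor object-coordinate-system)
     (vertices :initform nil :accessor object-vertices)))

  (defclass box-object (3d-object)
    ((x-size :initarg :x-size) (y-size :initarg :y-size) (z-size :initarg :z-size)))

  (defmethod initialize-instance :after ((obj box-object) &key)
    ;; Vertices are centered on the object origin, so coordinates stay near 0.0.
    (with-slots (x-size y-size z-size vertices) obj
      (setf vertices
            (loop for (sx sy sz) in '((-1 -1 -1) (1 -1 -1) (1 1 -1) (-1 1 -1)
                                      (-1 -1  1) (1 -1  1) (1 1  1) (-1 1  1))
                  collect (list (* sx x-size 0.5) (* sy y-size 0.5) (* sz z-size 0.5))))))

  ;; (object-vertices (make-instance 'box-object :x-size 10.0 :y-size 6.0 :z-size 3.0))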

2.5 The Transformation Hierarchy
One goal of FREEDIUS is to describe the series of transformations and projections that map a point in the world (perhaps expressed in geospatial coordinates) into points on an image plane. This allows users to extract object coordinates in a variety of different forms. Objects can be projected onto a sensor plane, or transformed into geospatial coordinates for I/O or further processing. The FREEDIUS transformation hierarchy supports this goal by establishing a network of transformations and projections (and their inverses) that describe the geometric relationships among objects and images. This is accomplished through the coordinate-system object. Objects and images are typically associated with coordinate systems that are related pairwise by transformations. Figure 7 shows the basic hierarchy of coordinate systems, ranging from geospatial 3-D coordinates (left-hand side) to 2-D image coordinates (right-hand side). By chaining transformations, any two coordinate systems in the same hierarchy can be related. Figure 8 shows an example of this kind of chaining.

Figure 7: Coordinate system networks.

FREEDIUS provides a wide range of transformation types. Among them are:
1. 4 × 4 homogeneous matrix transformations.
2. A frame-camera class that adequately models many optical sensors.
3. Rational Polynomial Coefficient (RPC) sensor models.
4. Geospatial transformations, including Lat/Long and Universal Transverse Mercator (UTM).

Any of these can be used to link pairs of coordinate systems so that the correct set of transformations is applied to bring a point from one coordinate frame to another. The CLOS classes corresponding to each type specify information about how points are to be transformed, and how inverse transformations can be constructed.

Figure 8: Coordinate transformation chain, linking Image, 2D-World, LVCS, geocentric, and Lat/Long WGS-84 coordinate systems.

In Figure 8, five types of coordinate frame are shown. The bidirectional arrows indicate transformations between pairs of coordinate systems. The numbered transformations correspond to the following functions and their inverses:
1. geocentric-to-lat-long-transform
2. geocentric-to-lvcs-transform
3. 3d-to-2d-projection
4. image-to-2d-transform

Each of these functions returns a transformation object that makes a transition between a pair of coordinate systems. Each transformation object can be applied to a vertex (3-D point) to create a transformed vertex, and each transformation has an inverse that can be applied in the opposite direction. Geospatial coordinates define a point on the earth's surface; latitude, longitude, and elevation with respect to some datum are one example of a geospatial coordinate system. In practice, most object geometry is expressed in a local Cartesian coordinate system. For site modeling in FREEDIUS, each site defines an LVCS (Local Vertical Coordinate System), a Cartesian coordinate system within which the site geometry is expressed. The origin of the LVCS is assigned a geospatial coordinate in an appropriate coordinate system to relate it to a geographical position. Finally, the 2d-world object represents a plane of projection; in this case, a projection is known from the LVCS to the 2d-world. FREEDIUS allows transformations to be composed and cached, so that the task of projecting points from latitude, longitude, and elevation into image coordinates can be reduced to a single projection by composing transformations such as those seen in Figure 7.
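The sketch below shows the flavor of such composition (each transformation is represented here as a plain function of a point, and inverse-transform and the argument conventions in the commented example are assumptions; the actual FREEDIUS protocol uses transformation objects with cached inverses):

  ;; Compose a chain of transforms into a single function, applied left to right.
  (defun compose-transforms (&rest transforms)
    (lambda (point)
      (reduce (lambda (pt tr) (funcall tr pt)) transforms :initial-value point)))

  ;; Assuming the functions named above each yield a point-to-point function,
  ;; an LVCS vertex could be carried into image pixel coordinates like this:
  ;; (defvar *lvcs-to-image*
  ;;   (compose-transforms (3d-to-2d-projection 2d-world)       ; LVCS -> 2d-world
  ;;                       (inverse-transform
  ;;                        (image-to-2d-transform image))))     ; 2d-world -> image
  ;; (funcall *lvcs-to-image* (list 120.0 45.0 2.0))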

2.6 OpenGL
Image and object rendering in FREEDIUS is achieved using Lisp bindings to OpenGL through the foreign function interface. Most OpenGL calls are defined in Lisp, allowing FREEDIUS to use most of the rendering and selection capability of OpenGL. Image display is handled using texture tile pools that are managed in Lisp. Image display is accomplished by OpenGL texture mapping, while geometry is displayed using the OpenGL transformation and display-list mechanisms. To overlay geometry on image data, the FREEDIUS transformation hierarchy is used to bring all data into one common 2-D coordinate frame that is associated with the image. Images generally have some sort of sensor model associated with them that serves as the projection transformation taking 3-D points into image coordinates. For frame-camera objects, this is decomposed into the exterior orientation of the camera and the internal projection transformation. For non-frame-camera sensors, such as the RPC, vertex projections are cached for display, and local linear approximations can be used if surface normals are required for rendering. In any case, all transformations that are required to project points into the image are linearized and pushed (as appropriate) onto the OpenGL matrix stack to effect display.

OpenGL is also used for object selection. Object vertices, edges, and faces are mouse-sensitive to allow users to modify objects. Tcl callbacks for mouse movement are used to prepare and generate calls to the OpenGL selection API. Candidate selections are sorted and returned to FREEDIUS. Selections are expressed as pairs containing an object point or arc, along with the selected object. The first object on the list is returned by the selected-object function, but all selected objects can be examined by calling selected-objects.

2.7 GUI Context
CLOS allows a great deal of flexibility in deciding how objects can be presented and manipulated in different contexts.

Methods can be conditionalized on object type and view type to allow the same objects to be presented and manipulated in radically different ways. In this section we discuss the interactions of two crucial sources of user-interface context.

2.7.1 Views
FREEDIUS and its predecessors implemented a view class that dictates how objects are to be rendered to the user (Figure 9). The view contains information about the 2d-world, the image to be rendered, and the object-sets that are to be displayed on the view. Prior to FREEDIUS, every view was spatial and displayed one image with associated geometry. FREEDIUS extends this notion by allowing other view types (especially temporal views) to be created. In particular, the timeline-view object allows time-varying objects to be displayed on a common timeline. Object rendering is therefore conditional both on the object and on the type of view in which it is to be rendered. The draw-object generic function allows object drawing to be specialized on both the object type and the view type. A prime example of the use of such methods is the world-line class, which is a subclass of 3d-curve.

World-lines have 3-D extent in time; a timestamp array is added so that each curve vertex has a time associated with it. In spatial views, these objects are rendered as curves, using the draw-object method for 3d-curve. However, the draw-object method for world-lines is specialized for timeline-view. While 3d-curves do not normally display anything in a timeline-view, world-lines are displayed as green bars with mouse-sensitive time samples. One can easily imagine view classes that conditionally display other object parameters in an analogous way.
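A minimal sketch of this double dispatch follows (class definitions are pared down and the drawing bodies are placeholders; the actual FREEDIUS methods call into OpenGL):

  (defgeneric draw-object (object view)
    (:documentation "Render OBJECT appropriately for the class of VIEW."))

  (defclass view () ())
  (defclass spatial-view (view) ())
  (defclass timeline-view (view) ())

  (defclass 3d-curve () ((vertices :initarg :vertices :accessor curve-vertices)))
  (defclass world-line (3d-curve)
    ((timestamps :initarg :timestamps :accessor world-line-timestamps)))

  (defmethod draw-object ((obj 3d-curve) (view spatial-view))
    (format t "~&drawing ~A vertices as a curve~%" (length (curve-vertices obj))))

  (defmethod draw-object ((obj 3d-curve) (view timeline-view))
    ;; Plain 3d-curves have no temporal extent, so nothing is drawn here.
    (values))

  (defmethod draw-object ((obj world-line) (view timeline-view))
    ;; World-lines do have temporal extent: draw a bar spanning their timestamps.
    (let ((ts (world-line-timestamps obj)))
      (format t "~&drawing time bar from ~A to ~A~%"
              (reduce #'min ts) (reduce #'max ts))))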

2.7.2 Object Interaction
The class of a selected object provides context for GUI interaction with that object. FREEDIUS provides a ui-context class for specializing operations depending on the object class and the view in which it has been selected. One role of the ui-context is to provide context for generating the contents of pop-up menus. The object-popup-menu-item-list method takes a FREEDIUS object and a ui-context as its arguments and generates the menu item list to be used in response to a mouse right-click on an object of a given class. Specializations of ui-context can be created to change the user interface behavior under certain circumstances.

Figure 9: Site view in RCDE Electronic Light Table, showing annotated features.

Figure 10: A camera-bag (shaded triangle) seen from two different views; left: ground camera, right: USGS orthoimage.

For example, some FREEDIUS subsystems alter or extend pop-up menu contents using subclasses of ui-context. In other cases, the default user interface uses let bindings to temporarily alter the ui-context for specific applications. Whole objects can be moved in a variety of ways (e.g., in image coordinates, on the terrain, along the line of sight), and vertices can be modified similarly. Object extrusions are also supported to allow building outlines to be drawn and then extruded to the desired height. This is especially useful when modeling sites from a combination of orthorectified and oblique images [3]. User interface context is particularly useful when dealing with multiple object classes and multiple view classes. For example, tracks obtained from video sources are typically implemented as subclasses of world-line. Mouse selection of track vertices in a spatial view can cause the corresponding sample's image chip to be displayed (see Section 3.2), while on a timeline-view, vertex selection triggers highlighting in multiple views.
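The sketch below illustrates this kind of specialization (menu items are schematic and tracking-ui-context is hypothetical; the world-line class is the one sketched in Section 2.7.1, and this is not the actual FREEDIUS menu protocol):

  (defclass ui-context () ())
  (defclass tracking-ui-context (ui-context) ())   ; hypothetical subsystem context

  (defgeneric object-popup-menu-item-list (object context))

  (defmethod object-popup-menu-item-list ((obj world-line) (context ui-context))
    '(("Move vertex" :move-vertex)
      ("Delete"      :delete-object)))

  (defmethod object-popup-menu-item-list ((obj world-line) (context tracking-ui-context))
    ;; A tracking subsystem extends the default menu rather than replacing it.
    (append (call-next-method)
            '(("Show image chip" :show-chip))))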

2.8 FREEDIUS Subsystems
One goal of FREEDIUS is to provide an open-source base system on which to develop derivative systems under various kinds of licenses. FREEDIUS itself is offered by SRI under the Mozilla Public License (http://www.mozilla.org/MPL), which provides a degree of flexibility and a minimum of contagion for developers who wish to build on FREEDIUS. To support modular design, FREEDIUS implements a system tool similar to Symbolics' defsystem. Pure Lisp subsystems are supported, as well as mixed systems that require foreign libraries. Mixed subsystems allow third-party libraries to be incorporated easily into FREEDIUS. Subsystems have been implemented for libdv, libgeotiff, shapelib, libmpeg2, and many others. Several SRI-authored C and C++ systems have been integrated with FREEDIUS in a similar way.
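A hedged sketch of what a mixed-subsystem declaration might look like (the define-system name, its options, and the file names are all assumptions, not the actual FREEDIUS system-definition syntax):

  (defvar *subsystems* (make-hash-table))

  (defmacro define-system (name &rest options &key required-systems foreign-libraries files)
    "Record a subsystem definition; a real tool would also compile, load, and
  link the named files and foreign libraries in dependency order."
    (declare (ignore required-systems foreign-libraries files))
    `(setf (gethash ,name *subsystems*) ',options))

  ;; Hypothetical mixed Lisp/C subsystem wrapping libgeotiff:
  (define-system :geotiff-io
    :required-systems (:freedius-core)
    :foreign-libraries ("libgeotiff" "libtiff")
    :files ("geotiff-ffi" "geotiff-reader" "geotiff-export"))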

3. APPLICATIONS
Prior sections have described the basic infrastructure of FREEDIUS. In this section, we describe some applications that have been built using the base FREEDIUS system. These applications also illustrate the use of third-party libraries to enhance the core capabilities of FREEDIUS.

3.1 Photogrammetry and Site Modeling
As seen in prior sections, FREEDIUS has a rich framework for representing, manipulating, and rendering images, site geometry, and sensor models. One recent application of FREEDIUS required site modeling of the US Naval Academy at Annapolis, MD. For this project, USGS source data was acquired from the USGS Seamless delivery system (http://seamless.usgs.gov/). This system provides USGS standard products in GeoTIFF and ESRI Shapefile formats. FREEDIUS uses subsystems based on libgeotiff (http://remotesensing.org/) and shapelib (http://shapelib.maptools.org/) to ingest USGS DEMs (Digital Elevation Models), DOQs (Digital Orthophoto Quadrangles), and Shapefiles (geometric descriptions of ground features) into a FREEDIUS site model. This is the first step in creating a site model, and it is particularly simple in that the "seed" for this site model comes directly from the USGS web site. USGS GeoTIFF files contain all the information needed to establish a geospatial location for a local Cartesian coordinate system (an LVCS). The USGS data is not sufficient to create detailed models of objects on the ground, however, so we also collected several ground images with consumer digital cameras whose internal parameters can be extracted directly from the images produced by the camera. Ground camera placement and initial orientation were accomplished through the use of a camera-bag object. This object class contains a pointer to the frame-camera that is associated with an image. A camera-bag presents itself in spatial views as an imaging plane with a frustum, as seen in Figure 10, where the camera frustum appears as a shaded triangle.

In this case, the corresponding frame-camera is positioned within the clock tower that is visible in the left-hand image of Figure 10. The projection contained within the camera-bag can be manipulated by adjusting the "stare point" on the ground and the orientation and XYZ position of the camera. It is also possible to vary intrinsic parameters such as focal length, but in the case of the Annapolis site the camera internal parameters are obtained directly from the digital camera itself.

Site modeling proceeds by outlining walkways and roads on the ground, positioning ground cameras (using camera-bag objects combined with USGS orthographic projections), and estimating their orientation. Adjustment of a camera-bag in any view causes the view associated with that camera-bag to refresh immediately, so adjustments to the camera model are immediately apparent to the user.

Figure 11: Conjugate point object.

Once an approximate camera pose has been defined, conjugate points are used to define correspondences between control points (usually 3d-crosshairs placed on the terrain in a USGS DOQ image) and 2d-crosshairs placed in the image whose camera model is to be refined. The conjugate-point class is a subclass of 3d-crosshair that implements this correspondence operation. Figure 11 shows a conjugate point displayed in an uncalibrated image. The 3-D position of the conjugate point is projected as a blue crosshair using the (approximate) projection model for the image, while the corresponding 2-D point is displayed with a dashed line extending to the 2-D crosshair (an X) that marks the true 2-D position of the corresponding image point. By using several of these correspondences, a least-squares error minimization (resection) can be applied to the camera model to obtain an accurate pose. Once reasonable approximations have been obtained for the ground camera models, building outlines can be drawn and extruded in multiple views to populate the site model. Usually, the building outline is created in the USGS DOQ view and then extruded using the ground views to obtain accurate height information. Site modeling results are visible in Figure 9 as overlays on the images.

Figure 12: Video Light Table

Figure 13: Track from ground-based stereo sensor. Appearance information is stored with each observation.

3.2 Video Track Visualization
Video is increasingly becoming a major source of data for computer vision applications. SRI engages in research on tracking from ground and airborne video sources. These trackers emit observations that are timestamped and expressed in either image or ground coordinates. Because different application domains are involved, track formats and metadata can vary from system to system. Figure 12 shows the Video Light Table (VLT), a system that was developed using FREEDIUS. The VLT ingests mover detections obtained from a separate SRI system that is able to stabilize and track objects found in video taken from a UAV (Unmanned Aerial Vehicle). Here, tracks are represented as a sequence of timestamped centroids and bounding boxes that are expressed in video frame coordinates. Bundle adjustment is used to obtain accurate camera models for each frame of the video sequence, which allows track observations to be registered to ground coordinates. In Figure 12, these tracks are shown as dark (red) curves overlaid on a site model that was derived from USGS Seamless data.

The upper left pane shows the USGS orthographic image. In the upper right, the video frame is displayed. The lower left pane contains a timeline view in which ground tracks are displayed as green bars indicating the temporal extent of each track. The lower right pane shows a synthetic view that allows the site to be rendered from arbitrary viewpoints. This display also allows sensor ground footprints and UAV trajectories to be displayed. Selection of a point on the UAV trajectory causes the GUI to display the corresponding video frame and ground track data. In contrast to SRI's ground-based tracking systems, airborne track data contains only centroids and bounding boxes, so selection of points on a VLT track highlights only the position and extent of the object. Another tracker developed at SRI uses multiple ground-based stereo cameras to detect movers. In this case, track observations (implemented by the sperson-track class) are richer and contain more appearance information in each observation. As in the VLT, tracks are displayed as green bars on a timeline view. However, vertex selection for these tracks allows appearance information to be retrieved and displayed, as shown in Figure 13. Here, the image chip associated with each observation is stored and retrieved. Selection of a track point causes these image chips to be overlaid on a previously saved background image, thus reconstituting the video.

The GUI arrangement seen in Figure 13 allows a user to select tracks by name. Since this tracking system uses six sensors, the GUI provides the user with a way to select tracks by name and by sensor. The top control buttons allow the timeline to be played back, providing a DVR-style capability for direct access and retrieval of video and track data. All tracks are ultimately subclasses of world-line and, as a result, must be associated with some coordinate system. Typically, this coordinate system is related to the LVCS of the site. If the LVCS is geolocated, then it is trivial within FREEDIUS to determine geospatial coordinates for all points in a track, whether they are derived from airborne or ground imagery. In Figure 12, latitude and longitude information is shown for a selected vertex within a track. Appearance information places an additional burden on memory requirements, so the sperson-track class is further specialized to create a paged-sperson-track class. This class is analogous to the paged-image class, in that track observations are paged in from a data file (or, in some cases, a database) only when needed. Only those track objects that fall within temporal windows of interest are actually instantiated. This is especially useful for surveillance systems that run 24 hours a day. The environment shown in Figure 12 has six sensors operating continuously. Track datasets from these sensors can contain thousands of tracks, each of which can have anywhere from about 10 to thousands of observations (depending on dwell time in the capture volume of the tracker). Hence, a paged observation scheme is essential to ensure reasonable performance during track playback, retrieval, and processing.
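A sketch of the paging idea follows (slot names, the window granularity, and the load-observations reader are assumptions): observations are faulted in per requested temporal window and cached, rather than held in memory for the life of the track.

  (defclass paged-sperson-track ()
    ((observation-file :initarg :observation-file :reader observation-file)
     (loaded-windows :initform (make-hash-table :test #'equal)
                     :accessor loaded-windows)))

  (defgeneric observations-in-window (track start-time end-time))

  (defmethod observations-in-window ((track paged-sperson-track) start-time end-time)
    "Return the observations between START-TIME and END-TIME, loading them on demand."
    (let ((key (list start-time end-time)))
      (or (gethash key (loaded-windows track))
          (setf (gethash key (loaded-windows track))
                ;; load-observations is a hypothetical reader over the track file.
                (load-observations (observation-file track) start-time end-time)))))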

3.3 Event Recognition
Events arise through the actions of tracked objects in the video scene. Any attempt at semiautomated analysis therefore begins by segmenting movers in the incoming video frames and organizing these observations into spatiotemporal track objects (sequences of spatial samples taken over time). Tracks can be further segmented into time points or periods that define events taken from some primitive vocabulary. For example, a mover might stop for a period of time, and this period can be marked as a STOP event. Each such primitive event can be thought of as a (possibly ephemeral) relationship among scene elements. In FREEDIUS, events can be derived from tracks in three ways: (1) intrinsic track events involve only one track and are characterized by changes in trajectory (e.g., stopping or turning); (2) Type I events involve interactions between a track and some other (usually stationary) object; (3) Type II events occur when two tracks interact in some way (e.g., MEET or FOLLOW). Type II events also include intrinsic track events. Events are implemented in a class hierarchy: the base event class represents a point or interval in time, Type I events are associated with one actor (a track) and one object, and Type II events are associated with one or two actors. Figure 14 shows a partial class hierarchy for event objects in FREEDIUS.

Figure 14: Partial event object class hierarchy.

These event classes are used to populate an object-set or database with a primitive event stream. High-level event recognition can be achieved by using the primitive event and track stream provided by FREEDIUS in combination with an ontology [4] that describes high-level events in terms of primitive events, actors, and objects. FREEDIUS emits its primitive track and event streams into an SQL database that is then analyzed by an approximate pattern-matching system (LAW, the Link Analysis Workbench). This scheme was applied to the convoy recognition problem [5]. Figure 15 shows an example of one high-level convoy event that was detected by the system. The FREEDIUS video viewer panel in this figure has a timeline view in the top pane and a video view in the bottom pane. In timeline views, tracks are displayed as thin green horizontal bars, while events are displayed as colored rectangles, where the color indicates the class of event. In this case, the displayed events are FOLLOW events that together comprise a convoy.

Figure 15: Video panel showing (from top) video playback controls, a timeline view with track and event bars, and a video view with track overlays.

The details of this system are discussed in [5], but the recognition process is robust in the presence of intermittent and noisy event streams. For this example, tree canopy and variations in contrast caused tracks to be detected only intermittently, but the system was still able to correctly label the sequence as a convoy.
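A condensed sketch of the hierarchy in Figure 14 follows (slot names are assumptions; the actual classes carry more metadata):

  (defclass typed-event ()
    ((start-time :initarg :start-time :reader event-start-time)
     (end-time   :initarg :end-time   :reader event-end-time)))

  (defclass type-1-event (typed-event)     ; one actor (a track) and one object
    ((actor  :initarg :actor)
     (object :initarg :object)))

  (defclass type-2-event (typed-event)     ; one or two interacting actors
    ((actors :initarg :actors)))

  (defclass enter-event  (type-1-event) ())
  (defclass exit-event   (type-1-event) ())
  (defclass follow-event (type-2-event) ())
  (defclass meet-event   (type-2-event) ())

  ;; A FOLLOW event between two tracks over a time interval:
  ;; (make-instance 'follow-event :actors (list track-a track-b)
  ;;                :start-time 1200 :end-time 1480)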

4. FUTURE DIRECTIONS
Because it is written in Lisp, FREEDIUS can easily be interfaced with other Lisp systems that perform reasoning over knowledge bases. FREEDIUS has already been used in a prototype system to provide geospatial capabilities to existing knowledge-based systems. Future work includes the use of FREEDIUS as an integration platform for knowledge-based visual recognition, and we continue to explore ways in which ontologies can be used for event and object recognition in still and motion imagery.

We are also investigating the use of special-purpose hardware to improve processing times for image analysis algorithms. GPUs (Graphics Processing Units) offer considerable promise as a way to cheaply achieve orders-of-magnitude improvements in performance, and most of the FREEDIUS core image operations should yield well to the kinds of processing possible within commercially available GPUs.

5. REFERENCES
[1] Hanson, A. J. and Quam, L. Overview of the SRI Cartographic Modeling Environment. Technical Note 515, AI Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025, January 1992.
[2] Heller, A. and Quam, L. The RADIUS Common Development Environment. In RADIUS: Image Understanding for Imagery Intelligence, Oscar Firschein and Thomas M. Strat, eds., Morgan Kaufmann, 1997.
[3] Heller, A. J. and Fua, P. The Site-Model Construction Component of the RADIUS Testbed System. In Proceedings of the Image Understanding Workshop, p. 345, 1996.
[4] Francois, A. R. J., Nevatia, R., Hobbs, J., and Bolles, R. C. VERL: An Ontology Framework for Representing and Annotating Video Events. IEEE MultiMedia, vol. 12, no. 4, pp. 76–86, October 2005.
[5] Burns, J., Connolly, C., Thomere, J., and Wolverton, M. Event Recognition in Airborne Motion Imagery. In Capturing and Using Patterns for Evidence Detection: Papers from the AAAI Fall Symposium, AAAI Press, 2006.
