CameraCast: Flexible Access to Remote Video Sensors

Jiantao Kong, Ivan Ganev, Karsten Schwan
College of Computing, Georgia Institute of Technology
{jiantao,ganev,schwan}@cc.gatech.edu

Patrick Widener
Department of Computer Science, University of New Mexico
[email protected]

ABSTRACT

New applications such as remote surveillance and online environmental or traffic monitoring are making it increasingly important to provide flexible and protected access to remote video sensor devices. Current systems provide such access through application-level code, such as web-based services. As a result, clients must adhere to the user-level APIs those services export, must reach remote video information through the services' fixed service and server topologies, and must accept that the captured and distributed data is manipulated by third-party service code. CameraCast is a simple, easily used system-level solution to remote video access. It provides a logical device API so that an application can operate identically on local and remote video sensor devices, using its own service and server topologies. In addition, the application can take advantage of API enhancements to protect remote video information, using a capability-based model for differential data protection that offers fine-grained control over the information made available to specific codes or machines, thereby limiting their ability to violate privacy or security constraints. Experimental evaluations of CameraCast show that, given sufficient networking resources, the performance of accessing remote video information approximates that of accessing local devices. High performance is also attained when protection restrictions are enforced, due to an efficient kernel-level realization of differential data protection.

Keywords: video sensor, proxy, data protection, remote device access

1. INTRODUCTION

Flexible access to and distribution of remote video information is required by a wide variety of applications, ranging from video surveillance, security monitoring, traffic control, and videoconferencing to autonomous robotics and distributed sensing. Using remote sensing as an example, three elements of video access and distribution in these classes of applications are (1) data capture by some group of sensors, (2) data aggregation or fusion in aggregation servers, and (3) data personalization and display for specific end users. Current application-level realizations of such functionality use fixed communication and interaction protocols: an aggregation server, for example, may provide static images through standard web servers, stream images through protocols like RTSP, or use proprietary communication packages to deliver video data to multiple clients [1,15,16]. There are several issues with application-level implementations of remote access to video information. First, client applications are constrained by the specific interfaces and communication protocols exported by such services. For instance, the static images provided by web servers impose additional overheads on applications that require continuous image updates. Further, when streaming video with RTSP, applications cannot control the video stream and therefore cannot easily adapt streaming to their own network and system conditions [2]. Second, information access and distribution take place through service topologies that are limited in how they can be dynamically adjusted to match currently available system resources. An example studied in our own prior research is mobile clients experiencing service degradation due to a fixed deployment of aggregation servers to machines [3]. Third, when there are data privacy concerns, the risks of exposing video information to application-level code cannot be addressed with encryption, since that code must manipulate the information in the clear. CameraCast is a kernel-level solution to flexibly capturing, distributing, and accessing remote video information:

- CameraCast works for arbitrary application-level codes that require video information, since it simply extends the local APIs already used by such codes to also apply to remote devices, at the level of abstraction offered by common video drivers.

- With CameraCast, applications can 'cast' a video capture device resident on some physical machine to any other machine on which they wish to run, then further cast such remote devices to other machines, etc. In this fashion, applications can construct the system-level service overlays that match their current needs and available resources.

- CameraCast's machine-machine information exchanges can be enhanced, at runtime, with methods for data manipulation and filtering. For generality, CameraCast offers a basic method-based extension model enabling safe kernel-level extensions. In this paper, we use that model to implement methods that protect video information, enforcing fine-grained data protection constraints. An example is a method that crops a video image to eliminate sensitive information not suitable for some remote application. Such data manipulations can be performed whenever address space boundaries are crossed, from kernel to application level or from machine to machine.

CameraCast is realized as a small set of kernel abstractions that implement (1) access to video sensors, (2) information distribution, and (3) access protection. Using these abstractions, an application first installs a logical video capture device on the client side. It then maps this logical device to some actual remote device. Finally, it accesses the remote device with the same API as that used for a local device. The Linux implementation of CameraCast uses a kernel-level proxy on the node to which the actual video device is attached. This module can respond to remote requests by querying the device buffer, or it can independently query the local buffer to continuously 'push' video data to client-side logical devices. Kernel-level overlays are built by having logical devices in client machines serve as proxies for other machines. As a result, a logical device at one client can serve as a proxy for another client, thereby increasing scalability by offloading the machine with the actual device. More generally, in this fashion, applications can construct the overlay topologies they desire for distributing and manipulating captured video information.

Computational extensions applied to CameraCast data movements are implemented with the K-Plugin [7] kernel extension facility. This facility permits arbitrary application-provided functions to be applied to CameraCast data. In this paper, we use K-Plugins to implement a data protection model. The outcomes are CameraCast API extensions that implement a fine-grained protection model. With this model, data protection associated with CameraCast devices uses capability-based controls on the data being received, where each capability includes information about how the image should be processed before being passed to another machine or to an application. The implementations of the protection checks enforced by such capabilities are associated with CameraCast devices when they are first opened. Once a device is opened, all future video information accesses go through the capability check step.

Experimental results shown in this paper demonstrate the viability and utility of CameraCast remote devices. With sufficient network connectivity, CameraCast performance is the same for remote and local devices. Scalability in the number of clients for a single CameraCast device is attained by kernel-level overlays constructed from multiple 'chained' device proxies. Capability-based data protection is shown to be a flexible way of controlling the data accessible to certain machines and end users. Data protection is done by manipulating (e.g., filtering) the image data inside the kernel and only passing the resulting images to applications. This enforces data privacy, since operating system kernels, including the Linux kernel, do not permit applications to access the original kernel-level image data.

The CameraCast design can operate in both standard and virtualized execution environments, the latter demonstrated by an ongoing port of CameraCast to systems virtualized with the Xen hypervisor [4]. Here, device proxies and data protection functions run in separate driver domains that are entirely different from those that run operating systems and applications. As a result, the image data is protected by domain boundaries: even compromised OS kernels cannot break such boundaries to violate the data privacy constraints enforced by differential data protection.
In the next section, we present the CameraCast remote video access model and describe how video information is delivered to end users. Section 3 provides additional detail about how the CameraCast system is implemented. Section 4 illustrates the performance of the different kernel abstractions used to implement CameraCast. We briefly discuss related work in Section 5. Conclusions and future research appear in Section 6.

Figure 1. Logical Video for Linux model

2. DESIGN

The CameraCast design uses standard device interfaces like the Video for Linux (V4L) API. The current V4L implementation, however, requires a video application to reside on the machine to which the capture device is attached. CameraCast exploits the presence of today's ubiquitous low-latency, high-bandwidth network interconnects to extend V4L interfaces to remote machines. The outcome is 'remote' devices accessed and used in the same manner as those available locally, through the standard interfaces used by higher-level video applications that desire video access. Specifically, whenever needed, a client-side application can create a CameraCast remote 'logical' device. Since this device has properties identical to those of the actual capture device, the application can access video information as if it were captured locally. This kernel-level solution differentiates the CameraCast approach from server- or web-based implementations.

The CameraCast design pursues simplicity and standards adherence. It is based on the following principles: (1) a standard API, with simple extensions to optimize or improve remote operation; (2) low-latency access to remotely captured data; (3) scalability to multiple remote users; and (4) a rich extension model to associate computation with data movement, which in this paper is used to control information access by/from remote sites. We elaborate on each of these below.

2.1 Basic Remote Device Access

A CameraCast device is a logical device that 'wraps' some physical device with additional functionality. Using the video sensor as an example, the logical device driver provides the Logical Video for Linux (LV4L) API, a backward-compatible extension of the standard Video for Linux (V4L) interface. Figure 1 illustrates the basic design of the LV4L model as consisting of four parts. The first part is a V4L interface that can be invoked by applications or by other logical devices. The second part is a V4L client that can communicate with another device through the V4L interface, or with a remote logical device through the V4L proxy, which is the third part of the driver design. The fourth part manages the video frames and provides functionality enhancements like data protection.

To wrap a video sensor device on a remote node, we need to set up at least two logical devices. As shown in Figure 1, a logical device (I) is layered on top of the real camera device driver at the remote node. On the client side, another logical device (II) provides a V4L interface to the application. The two logical device drivers are linked through the V4L proxy and V4L client pair. Local calls to the V4L interface are first translated to network messages and then converted back into calls to the V4L interface of the actual device. As shown in Figure 1, LV4L permits multiple V4L interfaces to be chained together to create scalable multi-machine overlays via kernel-resident V4L proxies and clients. These proxies and clients cooperate to respond to a remote machine's or a local application's request for video data. Accessing the video sensor remotely makes it possible for multiple users to access the device at the same time. For exclusive access to a remote device, parameter settings for image format, resolution, etc. specified by a client are propagated to the actual camera device.
For a shared remote device, since each application can set up its own parameters, such settings are maintained by the local LV4L driver, and the actual camera is pre-set to the most commonly used parameters. For example, a shared camera may be set to capture 640x480 images in JPEG format. All video information received from that remote device is then converted to the locally desired format upon receipt: if the remote device provides JPEG images, the local driver first uses a JPEG decoder to convert each image and then releases it to the application in RGB format. Such application requests succeed only for 'supported' (i.e., convertible) formats. A minimal client, sketched below, therefore looks exactly like a local V4L program.
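The following sketch illustrates this point using only standard V4L1 calls (open, VIDIOCGCAP, read), which LV4L preserves. The device path and frame geometry are assumptions for illustration, not values mandated by CameraCast.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev.h>   /* V4L1 API, as extended by LV4L */

int main(void)
{
    /* /dev/video0 is assumed to be the LV4L logical device; to the
       application it is indistinguishable from a local camera. */
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct video_capability cap;
    if (ioctl(fd, VIDIOCGCAP, &cap) < 0) { perror("VIDIOCGCAP"); return 1; }
    printf("device: %s (max %dx%d)\n", cap.name, cap.maxwidth, cap.maxheight);

    /* Read one 320x240 RGB24 frame; per the text above, the LV4L driver
       may transparently fetch a JPEG frame from the remote proxy and
       decode it before handing the raw pixels to the application. */
    unsigned char *frame = malloc(320 * 240 * 3);
    ssize_t n = read(fd, frame, 320 * 240 * 3);
    printf("got %zd bytes\n", n);

    free(frame);
    close(fd);
    return 0;
}
```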

Figure 2. Frame flow in the logical V4L device

2.2 Push-based Remote Access

The basic CameraCast mechanism uses a pull-based method that maps each user request to a direct access to the remote device. Several issues motivate a second, push-based access mechanism. In the pull-based model, a read call acquiring a frame introduces one outgoing message for the request and one incoming message for the response. The capture/sync calls for a single frame involve two outgoing messages and two responses. When there are substantial network delays, this multiplicity of request/response messages reduces the attainable frame rate. For instance, with delays of tens of milliseconds, which are common in today's wide-area networks, and a frame cycle time of 40 milliseconds, a standard rate of 25 frames per second cannot be attained with the request-based approach.

To deal with larger network delays and for scalability, CameraCast implements a simple push-based 'streaming' model, as shown in Figure 2. In this design, the LV4L driver runs a daemon thread which, when activated, continuously queries the wrapped device for captured frames. It moves such frames from the device buffer of the wrapped device to a frame list in the LV4L driver. The V4L proxy then constantly pushes the newest frame in the frame list to all clients that have requested continuous frame streaming. A push channel is set up by the V4L client of the local logical device sending a request to the V4L proxy associated with the physical device. The client driver maintains a few recently received frames, so that application requests can be satisfied directly, without traversing the network. Frame rates are limited by available network bandwidth and by the source's ability to push frames into the channel.

The push model provides additional opportunities for optimization, most notably improved scalability to larger numbers of device clients. Specifically, since the LV4L driver associated with the actual device may become a bottleneck if it must serve a large number of remote clients, a remote proxy can itself become a source for another client. This is shown in Figure 1, where the logical device (II) can volunteer itself as a proxy if it works in push mode. Its V4L client receives frames from the logical device (I) and then either pushes them to the logical device (V) or responds to pull-based requests from that device. In this fashion, the workload on the actual device-side LV4L driver can be offloaded to multiple logical drivers, forming a kernel-level overlay network. Further discussion of how such chains are set up appears in the next section.
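To summarize the pull/push trade-off, a back-of-the-envelope bound (our notation, not taken from the paper's measurements) shows why per-frame round trips dominate the pull model:

\[
f_{\max} \;\le\; \frac{1}{k \cdot RTT}
\]

where k is the number of round trips per frame: k = 1 for read (one request, one response) and k = 2 for capture/sync. With a one-way delay of 20 ms (RTT = 40 ms), capture/sync is therefore capped at 12.5 frames/second, well below the camera's 25 frames/second; the rates measured in Section 4 are lower still, since capture time and message handling add to every cycle. The push model removes these per-frame round trips, leaving bandwidth as the only network-side limit.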
2.3 Data Protection

The basic V4L interface either permits remote device access or disallows it. The extended LV4L driver developed in our research offers richer, finer-grained controls, described next.

2.3.1 Dynamic Differential Data Protection

Dynamic differential data protection (D3P) is a mechanism that deploys application-specific protection functionality at runtime to enforce restrictions on data access and manipulation. It employs a capability-based model for data access control [5,6]. In this model, each application, acting on behalf of a certain user, interacts with a Capability Manager Server (CMS) using standard authorization and authentication procedures. The CMS verifies the user's role and issues a corresponding certificate encoding the user's access rights on a specific protected data type (e.g., the permissible camera image). The certificate specifies the data type, the owner, the list of permitted operations, and a digital signature.

struct Credential {
    CHAR[32]    Credential Issuer
    CHAR[32]    Credential Type Descriptor
    CHAR[64]    Type Dependent Information
    CHAR[16]    Session Identifier
    INT         Index of Default Operation
    Operation[] Operation List
    CHAR[16]    Digital Signature
}*

struct Operation {
    CHAR[32]    Operation Name
    INT         Operation Type
    CHAR[]      Extra Data for Operation
    CHAR[]      Operation Payload { ECode | Binary Code | Service Name/Location }
}*

* Trivial fields are omitted.

Figure 3. Credential structure

As illustrated in Figure 3, the credential issuer and the credential type descriptor jointly specify the type of the protected data object. In addition to the type descriptor, we include some type-dependent information inside the credential. For example, for the type 'camera captured data', such information may include the dimensions and format of the image; this information may be critical to the operation specified in the credential. The other part of the credential is a list of operations that defines how the application may view this data type. An operation may be as simple as 'remove the left corner of the image' or as complex as 'blur out all human faces'. An operation can be defined in three ways: (1) as pre-compiled binary code, (2) as a function written in ECode (a subset of the C language) that can be dynamically compiled at run time, or (3) as a procedure stored in a code repository. In the third case, the repository signs the code with its own signature to guarantee that the code is genuine.

The digital signature is the mechanism used to prevent credential forgery. Since highly secure cryptography is expensive, we perform only a one-time check when the LV4L driver first receives a credential. Subsequent references to the credential by an application do not incur this integrity check, removing the expensive operation from the data delivery path. Along with the digital signature, there is a session identifier that uniquely identifies where and when the certificate may be used. In our remote video access example, the LV4L driver generates a random number when the application tries to open a protected remote video sensor. The application sends this number along with its request for a credential to the CMS, and the credential received from the CMS encodes the number as the session identifier. By checking the session identifier, we guarantee that the credential is limited to the lifetime of the opened device, thus limiting improper propagation and reuse.

2.3.2 D3P Capture Device Driver

A D3P video driver is an LV4L driver layered on top of another V4L interface. When the driver is equipped with D3P features, the application opens the device as if it were a regular V4L device. However, any subsequent read/capture command will fail if the application has not provided a proper certificate in advance. Toward this end, the application queries the driver for a session identifier, acquires a suitable certificate from the CMS for this session, and then installs the certificate in the driver. The LV4L driver verifies the certificate by checking the digital signature and the encoded session identifier. Once the certificate is authenticated, the operation handlers inside it are installed in the K-Plugin Runtime, which provides safety and isolation guarantees for executing user code inside the kernel (described in more detail in the next section).

As shown in Figure 4, the D3P LV4L driver operates on the images captured/acquired by the underlying V4L device or a remote V4L proxy. The driver first decodes the frame into a format suitable for the operation handler. It then copies the frame to the memory region accessible to the K-Plugin Runtime. The handler operates on the image contained in that buffer and generates the desired results into another buffer. Finally, the result image is moved to the application buffer or to the LV4L device buffer.

Figure 4. D3P LV4L device

A D3P logical device can be plugged in anywhere in the LV4L driver chain. As shown in Figure 1, we can construct a chain from device (IV) to (III) to (I), or from (II) to (III) to (I). The D3P driver depends on the integrity of the system kernel to protect data from the application, so only fully trusted systems should be equipped with D3P LV4L logical devices.
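To make the handler model concrete, here is a minimal sketch of an image-cropping operation like the one evaluated in Section 4.4. ECode is described as a subset of C, so the sketch is plain C; the handler signature and the rectangle list are our assumptions, not the actual CameraCast plugin interface, and 'cropping' is realized by blanking denied regions.

```c
/* Illustrative D3P operation handler: copy the source frame and
 * black out every denied rectangle, so the application only ever
 * sees the filtered result (the original stays in kernel memory). */
struct rect { int x, y, w, h; };

void crop_handler(const unsigned char *src, unsigned char *dst,
                  int width, int height,            /* RGB24 frame */
                  const struct rect *deny, int ndeny)
{
    int i, r, x, y;

    for (i = 0; i < width * height * 3; i++)   /* copy source frame */
        dst[i] = src[i];

    for (r = 0; r < ndeny; r++)                /* blank denied areas */
        for (y = deny[r].y; y < deny[r].y + deny[r].h && y < height; y++)
            for (x = deny[r].x; x < deny[r].x + deny[r].w && x < width; x++) {
                unsigned char *p = dst + (y * width + x) * 3;
                p[0] = p[1] = p[2] = 0;
            }
}
```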

3. IMPLEMENTATION

CameraCast has been implemented for Linux platforms. Its API is compatible with the Video for Linux device interface, with some functionality enrichments. The runtime environment for associating manipulations with image movements is provided by the K-Plugin framework; in this paper, we use it to implement a fine-grained data protection model.

3.1 The Extended Video for Linux Interface

We use the existing ioctl interface to provide the enriched remote video capture and data protection functionality. The application queries the LV4L device capability using VIDIOCGCAP. In addition to the regular status of a video capture device, the returned values specify whether the device is statically or dynamically mapped to remote devices, whether it can set up push channels, whether it can work as a V4L proxy, and whether it can provide data protection features.

The logical device can be statically mapped to a remote device at install time. This is useful when the network topology is relatively fixed; in this case, the application need not know that it is talking to a wrapped remote device, and the driver sets up connections to the remote device automatically at open time. If the application is aware of the existence of the logical device, having queried the device capabilities, it can instead dynamically construct the mapping to the remote device. Here, the user opens the device, receiving a virtual file descriptor as a response. Any V4L calls using this file descriptor will fail until a connection to a remote device is established. The application uses VIDIOCONNECT, a new command, to set up such a connection, i.e., a communication channel to a remote V4L proxy. Currently, we use TCP/IP for reliable connections; for loss-tolerant video access models, our future work will also implement UDP-based connections. It is up to the remote V4L proxy to determine whether it can accept a connection request. Based on its resource availability and its current subscribers, it can accept the request, reject it, or reject it while providing a list of alternative equivalent logical devices drawn from its current subscribers. In the last case, the user can choose one from the list and re-issue a VIDIOCONNECT request to it, repeating this process until it finally connects to some remote device. The outcome is an incrementally constructed kernel-level overlay shared by many clients for their remote device accesses.

The application uses VIDIOPUSH to set up a push channel from the remote V4L proxy, explicitly specifying whether it is willing to act as a proxy itself. If so, the remote V4L proxy can offer this node as an alternative logical device to other nodes. The application can use VIDIOUNPUSH to terminate the push channel, or it can simply close the device to shut down the channel implicitly. The current implementation provides only a best-effort push mechanism, without control of frame rates (i.e., flow control), richer methods for admission control, etc. Such QoS-centric support will be addressed in future work. The current prototype handles broken network connections in the simplest way: when it encounters a connection problem to the remote V4L proxy, or experiences a very long delay, it first closes the current connection and then restarts at the VIDIOCONNECT step.

Although the LV4L model provides a way to construct overlay networks to share video capture devices, the model itself does not control overlay topologies.
There is substantial research on constructing and maintaining efficient overlay topologies, including the work in [8,9,17]. The D3P logical device provides two new ioctl commands: VIDIOCGSESSION acquires the session identifier of the currently opened device, and VIDIOCSCERT installs the certificate acquired from the capability manager server. The certificate is digitally signed using an MD5 checksum plus RSA encryption; increasing its strength to SHA-1, for instance, would be straightforward. A client-side sketch of this flow follows.
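The sketch below strings the new ioctls together from the application's point of view. The command names come from the text above; the argument structures, the header, and the CMS helper are illustrative assumptions, since the paper does not specify them.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev.h>
#include "lv4l.h"   /* hypothetical header defining the LV4L extensions */

int open_protected_remote_camera(const char *dev, const char *proxy_host)
{
    int fd = open(dev, O_RDWR);              /* virtual file descriptor */
    if (fd < 0) return -1;

    /* Dynamically map the logical device onto a remote V4L proxy;
       struct lv4l_connect is an assumed argument layout. */
    struct lv4l_connect conn = { 0 };
    strncpy(conn.host, proxy_host, sizeof(conn.host) - 1);
    if (ioctl(fd, VIDIOCONNECT, &conn) < 0)
        return -1;  /* a real client would retry with alternatives here */

    /* D3P handshake: session id -> CMS -> certificate -> install. */
    char session_id[16];
    if (ioctl(fd, VIDIOCGSESSION, session_id) < 0)
        return -1;

    struct lv4l_cert cert;                           /* assumed layout */
    if (fetch_credential_from_cms(session_id, &cert) < 0)  /* hypothetical */
        return -1;
    if (ioctl(fd, VIDIOCSCERT, &cert) < 0)   /* one-time signature check */
        return -1;

    /* Optionally ask for streaming (argument form assumed); from here,
       plain read()s deliver frames filtered by the installed handlers. */
    int willing_to_proxy = 0;
    ioctl(fd, VIDIOPUSH, &willing_to_proxy);
    return fd;
}
```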

We use JPEG as the common format for transferring frames, both to take advantage of the associated reductions in frame size and because most modern video sensors can directly generate hardware-encoded JPEG images. The JPEG decoder used is derived from the spca5xx video device driver. Because the image handlers for D3P typically operate only on raw formats like RGB, and because our current implementation lacks a kernel-level JPEG encoder, we currently perform data protection actions only at the client side.

3.2 K-Plugin Framework

The K-Plugin framework provides an effective, efficient, and easy-to-use extension mechanism. To that end, its design has been guided by the following desirable properties: (1) generality -- avoid targeting specific applications; (2) functionality -- avoid restrictions on plugin code; (3) safety -- isolate the D3P driver from untrusted plugins; and (4) efficiency -- minimize implementation overheads. We attain these properties by combining hardware fault isolation and dynamic code generation with lightweight dynamic linking. Hardware fault isolation protects the core kernel from untrusted plugins and helps avoid costly per-instruction runtime overhead. It provides an engineering solution to the isolation problem without the complexity and overheads inherent in programming-language techniques, proof-carrying code, or software fault isolation. Dynamic code generation serves a two-fold purpose. First, it provides a common language for arbitrary, cross-platform runtime adaptation in heterogeneous environments. Second, it preserves performance by translating source code into native machine code able to run at full speed on bare hardware.

Our isolation scheme exploits features of the Intel x86 architecture's segmentation and protection hardware by placing all plugins into an unused privilege ring. On x86 hardware, the OS kernel runs in ring 0 (highest privilege). We allocate memory to hold all plugins' code, data, and stacks in ring 1, thereby guaranteeing the kernel memory's safety. Service callbacks are invoked through a hardware trap, much like system calls, and run in ring 0, i.e., in the OS kernel. Control and data flows between privilege rings are governed by the host kernel through hardware traps. Plugins have full access to their parameters and to local variables allocated on the plugin stack. They also have full access to a pool of ring-1 memory, effectively acting as a heap. The contents of the heap persist between plugin invocations, so it is also used for static variables. The heap is allocated on a per-runtime basis, which means that all plugins within a runtime share it and can use it for global variables, communication, and cooperation. Additionally, it is possible to provide select plugins with read-only access to parts of the kernel memory if needed. An alternative implementing isolation with the CPU's paging mechanism would have imposed much larger invocation latency overheads associated with page table (address space) switching. For an in-depth description of kernel plugins, the reader is referred to [7].

3.3 API Discussion

Delivering video information efficiently to end users is known to be a complex task, with issues ranging from constructing suitable content delivery networks, to efficient image transcoding, filtering, and archiving, to providing quality of service and enforcing data privacy. CameraCast is not designed as a replacement for complex application-level video servers or services.
Instead, its goal is to provide some of the fundamental support needed by all such services: the ability to (1) capture video information, (2) access such information from local or remote devices, and (3) manipulate video information 'in transit' across machines and/or protection boundaries. Accordingly, CameraCast offers three sets of simple APIs to applications, for (1) setting up connections, (2) installing application-specific handlers on images, and (3) controlling resource usage. Using these APIs, applications can use arbitrary off-the-shelf machines to construct and maintain the video sensor overlays and associated video manipulations they desire. As a demonstration of how an application can use CameraCast to efficiently utilize a remote device, a client-side application may set up a connection to the node with the actual device or, alternatively, to a logical device on some proxy node, thereby explicitly constructing the overlay network it desires. For push channels, the device-side logical driver can query dormant logical devices, asking them to act as proxies, thereby adjusting the server's video delivery network. In both cases, all decision-making regarding overlay network topology is performed at application level; the CameraCast API simply translates user-level decisions into actual connections, e.g., push channels.

An explicit implementation choice made for CameraCast is to use a general extension method, K-Plugins, for safely enriching video movements with computations. This paper uses that method to install image handlers inside the LV4L logical device driver to tailor images for content protection; the outcome is a D3P device driver able to enforce data privacy for video information. Another example is an image-diff service, which could compare the current image frame with the previous one and, if no significant content changes are detected, translate the frame into a pseudo-image containing only a no-change notification (a sketch follows). Such an approach could significantly improve performance in environments with insufficient network resources. The third set of APIs, still under development, sets up QoS parameters for each outgoing connection; this can be used to control the resource usage of each connection by controlling its frame rate.
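A minimal sketch of the change-detection core of such an image-diff handler appears below. The thresholds and the notion of a no-change marker are our illustrative assumptions; the paper only describes the service's intent.

```c
/* Illustrative image-diff check: count bytes whose value moved by more
 * than a per-pixel threshold, and report 'changed' only when enough of
 * them moved. A handler using this test would forward the full frame
 * on change, and a tiny no-change pseudo-frame otherwise. */
static int frame_changed(const unsigned char *prev, const unsigned char *cur,
                         int nbytes, int pixel_thresh, long count_thresh)
{
    long changed = 0;
    int i, d;

    for (i = 0; i < nbytes; i++) {
        d = (int)cur[i] - (int)prev[i];
        if (d < 0)
            d = -d;
        if (d > pixel_thresh)
            changed++;
    }
    return changed > count_thresh;
}
```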

4. EXPERIMENTAL EVALUATION

All experimental evaluations are performed on the Georgia Tech Emulab installation [10], termed Netlab. Experiments use an emulated network with 7 nodes, as shown in Figure 5. All nodes are Pentium IV 2.8GHz machines with 512MB of memory. The testbed is used to experiment with the effects of different network delays. It consists of a 3-node LAN connected to 3 remote machines, with different delays imposed to simulate different network link conditions. Since all nodes are connected by gigabit links, LV4L operation is not limited by network bandwidth. One additional node with an 11Mbps link (emulating wireless connectivity) to the LAN is used to demonstrate that the LV4L approach can scale under poor network conditions.

Figure 5. Test bed

We use a Logitech QuickCam Communicate STX camera. According to its manual, the sensor can produce 640x480-pixel video at up to 30 frames per second, but the available spca5xx driver delivers at most 14 frames per second at that resolution. In 320x240 mode, the frame rate ranges from 13 to 28 frames per second. We set up the camera to generate around 25 frames per second using a static scene.


The purpose of the first experiment is to demonstrate the competitive performance of the basic pull-based LV4L model; we then compare it with an application-level server providing the same functionality. We next experiment with the different LV4L usage models and with the LV4L model extended with methods for data protection. Last, we demonstrate the scalability of LV4L under heavy loads when using the push usage model.

Table 1. Micro benchmark on basic V4L APIs

| Operation | Local Device | LAN Based | 5ms Delay | 10ms Delay | 20ms Delay |
|---|---|---|---|---|---|
| Open (ms) | 1058.35 | 3.36 | 33.68 | 63.56 | 123.43 |
| Query Camera Setting (ms) | < 0.02 | 0.17 | 10.29 | 20.27 | 40.21 |
| Change Camera Setting (ms) | < 0.02 | 0.17 | 10.28 | 20.25 | 40.20 |
| Change Picture Mode (ms) | 38.65 | 34.54 | 44.62 | 54.68 | 74.67 |
| Map Device Buffer (ms) | 0.16 | 0.20 | 0.19 | 0.20 | 0.19 |
| Frame Rate by Read, JPEG (frames/sec) | 24.98 | 25.03 | 24.98 | 24.53 | 12.42 |
| Frame Rate by Read, RGB24 (frames/sec) | 24.98 | 24.98 | 24.98 | 22.91 | 11.98 |
| Frame Rate by Capture/Sync, JPEG (frames/sec) | 24.89 | 24.85 | 24.85 | 12.50 | 8.18 |
| Frame Rate by Capture/Sync, RGB24 (frames/sec) | 24.89 | 24.85 | 24.89 | 12.49 | 7.79 |

4.1 Basic V4L Interface Benchmark

In this part, we measure the basic cost of each V4L API call under different network conditions. As shown in Table 1, the basic costs of most V4L API calls are small; the numbers in the table mainly reflect the round-trip delay of the network links. The time it takes to open a device is lower for our pull-based remote access because the remote V4L proxy opens the actual device in advance, thus reducing device initialization time. Mapping device buffers is a local operation and is therefore unaffected by network delays. The frame rates cited are for capturing frames of 320x240 pixels, as are all frame rate and frame delay results in the remainder of this paper.

Figure 6. Frame rate: application vs. LV4L

Figure 7. Frame delay: application vs. LV4L

Figure 8. Frame rate: different LV4L models

Figure 9. Frame delay: pull model vs. push model

We can see from the table that for small network delays, remote and local devices achieve the same frame rates. When the delay increases, the frame rate for read drops, and the rate for capture/sync drops even faster. This is because of the extra messages needed to acquire each single frame: in addition to the message transferring the frame from the remote V4L proxy, there is one extra message per read call and three extra messages per capture/sync call. The experimental results demonstrate that LV4L is competitive with the standard, local V4L solution for well-networked systems. With larger network delays, however, there are increased delays for camera control and decreased frame rates for read and capture/sync operations. These results (1) demonstrate the viability of the LV4L approach, but (2) also show, not surprisingly, that the basic LV4L approach will not work well in more challenging network environments. Thus, support for alternative models of data distribution is critical; the push-based model evaluated below is one such alternative.

4.2 Kernel vs. Application

For comparison, we implement an application server that provides the device-captured frames to remote clients. The server responds to clients' requests for captured frames, and we simulate both the read and capture/sync models of the V4L interface. The comparison of LV4L with application-level solutions has several interesting elements. As shown in Figure 6, the frame rate is almost identical for capturing JPEG frames, but for RGB frames, the application server is significantly slower than our LV4L model. The major difference is that the application server transfers the raw RGB format directly to the client application, while the LV4L device receives a JPEG frame and then converts it to RGB format locally.

We modify the spca5xx driver to record the time at which a frame is ready in its device buffer, appending the timestamp at the end of the frame. As a result, we can measure the client-side delay experienced before a frame reaches the end user. Figure 7 shows the frame delay for both the application server and the LV4L model. They are the same for the JPEG format, but the application server suffers from larger delays due to the larger sizes of the transferred RGB images.

4.3 Different LV4L Models

The pull-based remote access model only consumes resources when clients issue requests, but it performs poorly with larger network delays. The push-based remote access model provides better performance, but may consume additional network bandwidth. Figure 8 shows the performance of the two models plus a hybrid model labeled 'assisted read' (capture). In the hybrid model, one daemon constantly moves frames from the device buffer of the actual camera, but only sends out frames when clients have requested them. As shown in Figure 8, the push-based model achieves the same frame rate even under the 20ms network delay. Interestingly, the hybrid model performs better than the pull-based model in only one case. After further investigating the camera's device driver, we found that the driver always fills the device buffer, even without outside requests; as a result, for this driver, the hybrid model is not useful. Figure 9 shows the frame delay of the two LV4L models. The push-based model wins since a new frame is pushed to the client side as soon as it is available.

4.4 LV4L with Data Protection

We construct a D3P LV4L driver on top of another push-based LV4L logical device. The underlying logical device accepts incoming frames at a rate of ~25 frames per second, as shown in the previous experiment. The application acquires a certificate from a standalone Capability Manager Server (CMS). The certificate is passed to the D3P driver and verified using our kernel-level RSA implementation. An image-cropping handler is then installed into the K-Plugin Runtime; it can crop out multiple rectangular areas as requested. For each incoming frame, the handler is invoked and produces the images the user is actually allowed to see. Table 2 shows the cost of each of the above steps. The times for acquiring and verifying the certificate are both one-time costs and thus do not affect steady-state performance. It takes up to 1 millisecond to crop the image, which is far below the 40-millisecond frame cycle. Unless the cost of the operation handler approaches the duration of a frame cycle, the D3P driver will not noticeably affect the frame rate, as is evident from the results in the table.

Table 2. D3P LV4L driver performance

| Operation | LAN Based | 5ms Delay | 10ms Delay | 20ms Delay |
|---|---|---|---|---|
| Acquire Certificate | 13 ms* | | | |
| Verify Certificate | 0.04~0.14 ms** | | | |
| Crop Handler Cost | 0.6~1.0 ms*** | | | |
| Frame Rate (frames/sec) | 24.89 | 24.82 | 24.89 | 24.91 |

* Depending on the network delay to the certificate server.
** Depending on the size of the certificate.
*** Depending on the number and sizes of the crop areas.

4.5 LV4L Relay Performance

In the first test, an intermediate node with a logical V4L device volunteers itself as a relay node, and another client accesses the actual camera via this logical device. In Table 3, a connection status of 'X+Y' means an X delay from the actual device to the intermediate node and a Y delay from the intermediate node to the client. In our test cases, the client achieves the full speed of the actual device, and the frame delay is consistent with the direct-connection case in Figure 9.

Table 3. One-hop relay for LV4L

| Connection Status | LAN+LAN | LAN+5ms | LAN+10ms | 5ms+10ms | 5ms+20ms |
|---|---|---|---|---|---|
| Frame Rate (frames/sec) | 25.03 | 24.98 | 24.98 | 24.98 | 25.00 |
| Frame Delay (ms) | 1.10 | 16.88 | 28.19 | 41.18 | 57.73 |


In the next test, the video sensor camera is attached to the node with only an 11Mbps link. We use one logical device driver to set up multiple push channels, simulating multiple clients with multiple logical devices. This simulated driver resides on one of the nodes in the gigabit LAN. As shown in Figure 10, when all push channels are connected to the node with the actual device, the frame rate drops quickly because of the limited network resources. This problem is resolved by adding one intermediate node with a gigabit link: all requests from the simulated driver are then served from this intermediate node, thereby maintaining a high frame rate even with a large number of clients.

Figure 10. Performance for multiple connections

5. RELATED WORK

Many video sensor network systems rely on application servers to deliver video information to end users [1,15,16]; the sensor devices are hidden from end users 'at the very end' of the source side. Our work instead exports the device directly to remote clients, thereby allowing them to fully utilize the features of the devices. Although CameraCast provides a way to construct arbitrary delivery paths based on network connection status, the policies used to build and maintain the resulting overlay networks run at application level. Sample policies are described in [8,9,17]. CameraCast simply provides a way to easily translate such overlay-construction decisions into the connection establishment necessary for building suitable video delivery networks. Our logical device model is similar to remote device access models like USB/IP [11] and Transparent Device Remoting (TDR) [12], but differs in the level of abstraction being offered. USB/IP uses a peripheral bus extension to access a remote USB device, and TDR uses a NetBus. Because these approaches extend the layer that carries communication between the device driver and the hardware, they cannot support concurrent access to the remote device. Our approach shares the device at the device driver level: with full knowledge of the device, our logical device driver can support concurrent accesses via the dedicated device driver of the actual device. While our D3P device driver applies the capability model [13] to protect data privacy, we do not innovate in the domains of authentication and authorization; we refer to [14] and simply adopt a framework sufficient to support our idea.

6. CONCLUSIONS

CameraCast is a mechanism for accessing a remote video sensor device through a logical device. Its device API is backward compatible with the standard Video for Linux API. The implementation model also permits multiple logical devices to be chained to create scalable multi-machine overlays via kernel-resident V4L proxies and clients. Moreover, applications can take advantage of API enhancements to protect remote video information, using a capability-based model for differential data protection that offers fine-grained control over the information made available to specific codes or machines.

Future work will extend CameraCast to use UDP/IP to reduce communication cost and to improve the system by allowing frame dropping in the network. In addition, QoS support will be added. More importantly, by using modern virtualization techniques, CameraCast's data-protection functionality can be placed into a separate driver domain, further isolating it from the application and other parts of the system.

The reviewers of this paper suggested several interesting extensions and/or additional experiments, some of which are addressed in our future work. First, several related efforts are being undertaken by other students in our project, with promising results on the topics of self-virtualizing devices (i.e., devices for which virtualization functionality is included with the device itself) and device remoting (e.g., for USB devices). These efforts, undertaken with the Xen hypervisor, demonstrate that the CameraCast approach can be generalized to arbitrary devices in virtualized systems. Second, extensive work on overlays and overlay management conducted in our group (e.g., the IQ-Paths [18] overlays and the KStreams [19] kernel-level overlays) indicates that extensions of CameraCast to distributed systems are possible. However, for such extensions, reviewers raised interesting questions, some of which are addressed in our current work: (1) the need for stream merging or splitting to avoid redundant data transmission; (2) the need to flexibly deal with different access rights and permissions (which we are currently addressing for file systems in an ongoing effort on high-performance I/O); and (3) the ability to exploit intelligent network processors (partly addressed by our ongoing work on self-virtualizing devices, and studied in detail in our past work with Intel's IXP communication processors). Finally, a more general effort addressing the issues that arise when potentially critical information is transported and manipulated across multiple machines is the 'Trusted Passages' project currently underway at Georgia Tech. Here, we are exploring how multi-core resources on future virtualized platforms can be used to monitor and inspect guest operating systems and their applications, the goal being to continually ensure certain levels of trust across overlays that transport and manipulate critical information.

REFERENCES

1. Wu-chang Feng, E. Kaiser, Wu-chi Feng, and M. L. Baillif. Panoptes: Scalable Low-power Video Sensor Networking Technologies. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 1(2), May 2005.
2. C. Shahabi, R. Zimmermann, K. Fu, and D. Yao. Yima: A Second Generation of Continuous Media Servers. IEEE Computer Magazine, June 2002, pp. 56-64.
3. Y. Chen, K. Schwan, and D. Zhou. Opportunistic Channels: Mobility-aware Event Delivery. In Proceedings of the 6th ACM/IFIP/USENIX International Middleware Conference (Middleware 2005), November 2005.
4. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October 2003.
5. P. Widener, K. Schwan, and F. Bustamante. Differential Data Protection in Dynamic Distributed Applications. In Proceedings of the 2003 Annual Computer Security Applications Conference, Las Vegas, Nevada, December 2003.
6. J. Kong, K. Schwan, and P. Widener. Protected Data Paths: Delivering Sensitive Data via Untrusted Proxies. In Proceedings of the 4th Annual Conference on Privacy, Security and Trust (PST 2006), October 2006.
7. I. Ganev, K. Schwan, and G. Eisenhauer. Kernel Plugins: When A VM Is Too Much. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, May 2004.
8. J. Jannotti, D. K. Gifford, K. L. Johnson, M. F. Kaashoek, and J. O'Toole. Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, October 2000.
9. D. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. Morris. Resilient Overlay Networks. In Proceedings of the ACM Symposium on Operating Systems Principles, October 2001.
10. Georgia Tech Netbed Based on Emulab. www.netlab.cc.gatech.edu
11. T. Hirofuchi, E. Kawai, K. Fujikawa, and H. Sunahara. USB/IP - A Peripheral Bus Extension for Device Sharing over IP Network. In USENIX 2005 Annual Technical Conference, FREENIX Track, April 2005.
12. S. Kumar, S. Agarwala, H. Raj, and K. Schwan. TDR: Transparent Device Remoting in Virtualized Systems.
13. E. Cohen and D. Jefferson. Protection in the Hydra Operating System. In Proceedings of the Fifth ACM Symposium on Operating Systems Principles, November 1975.
14. T. Y. C. Woo and S. S. Lam. Authentication for Distributed Systems. IEEE Computer, 25(1), January 1992.
15. J. Huang, Wu-chang Feng, N. Bulusu, and Wu-chi Feng. Cascades: Scalable, Flexible and Composable Middleware for Multimodal Sensor Networking Applications. In Proceedings of ACM/SPIE Multimedia Computing and Networking (MMCN 2006), San Jose, CA, January 2006.
16. L. Girod, J. Elson, A. Cerpa, T. Stathopoulos, N. Ramanathan, and D. Estrin. EmStar: A Software Environment for Developing and Deploying Wireless Sensor Networks. In Proceedings of the 2004 USENIX Annual Technical Conference, June 2004.
17. L. Subramanian, I. Stoica, H. Balakrishnan, and R. H. Katz. OverQoS: An Overlay Based Architecture for Enhancing Internet QoS. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation, March 2004.
18. Z. Cai, V. Kumar, and K. Schwan. IQ-Paths: Self-regulating Data Streams across Network Overlays. In IEEE Symposium on High-Performance Distributed Computing (HPDC), 2006.
19. J. Kong and K. Schwan. KStreams: Kernel Support for Efficient Data Streaming in Proxy Servers. In Proceedings of NOSSDAV 2005, ACM, June 2005.
