INDIVA: Distributed Streaming Media and Equipment Control Middleware

Lawrence A. Rowe, Wei Tsang Ooi, and Peter Pletcher
Computer Science Division – EECS
University of California
Berkeley, CA 94720-1776
{rowe,weitsang,peterp}@bmrc.berkeley.edu
ABSTRACT

Developing applications to control audio/video equipment and the interface between conventional audio/video signals and Internet streaming media is difficult. Consequently, widespread deployment and use of streaming media in day-to-day activities has been slow to develop. This paper describes a middleware system and application program interface designed to solve this problem. The system, called INDIVA, provides a hierarchical name space for accessing and controlling audio/video equipment, software services for processing media streams, and conference resources. The design and implementation of the system are described, and examples are presented that illustrate how it can be used to implement direct manipulation interfaces for Internet streaming media. This middleware can also be used to implement control and automation systems for Internet webcasting and distributed collaboration systems.
1. INTRODUCTION

Internet webcasting and distributed collaboration are complex distributed applications that require control of conventional audio/video equipment (e.g., cameras, routing switchers, audio mixers, etc.), software processes that operate on media streams (e.g., encoding, decoding, transcoding, forwarding, etc.), and conference resources (e.g., media streams, multicast sessions, and conferences). By media streams or streaming media, we mean a sequence of IP packets that contains continuous media data (e.g., audio, video, animation, etc.). Webcasting and distributed collaboration applications typically use multicast communication protocols because media streams must be delivered to many participants or processes. Accessing and controlling the widely varying and constantly changing audio/video equipment, and managing the resources required to produce a webcast or distributed collaboration conference, complicates the development of automation and control systems. For example, a complex webcast can require 10-20 processes running on several computers [14], and an Access Grid (AG) node requires at least four computers just to control the room and to encode and decode media streams [2].
Webcasting and distributed collaboration applications are not widely used in part because of the cost and complexity of operating the systems. For example, a webcast typically requires 3-5 skilled people if it has camera operators and a complex production (i.e., camera switching and visual effects). An AG conference requires at least one person at each location to operate the equipment. And, an H.323 videoconference requires an operator at each site and more people at a production center (e.g., MCU operators). These applications are too complicated for inexperienced end-users. Moreover, most end-users have neither the time nor the interest to learn how to operate them. The absence of simple end-user interfaces has several causes, including: 1) the need for human decision making in the production process, 2) the proliferation of audio/video equipment with different features and command sets, 3) the complexity of the end-user interfaces to produce or participate in a webcast or conference, and 4) the complexity of the interface between the audio/video signals used by conventional production equipment and Internet streaming media protocols. The solution to some of these problems is automation, that is, replacing people with intelligent software. Several research groups, including our group, have worked on the problem of automating webcast and distributed collaboration production [1, 8, 10, 15]. While it is possible to automate some aspects of one production, complexity increases dramatically when the system is scaled up to manage either a more complex production or many productions from different locations.

The problems encountered by end-users when attempting to use streaming media limit use of the technology in day-to-day activities because simple requests are difficult to implement. For example, we have a satellite dish at Berkeley that end-users want to watch on a desktop using streaming media. The satellite receiver is connected to a routing switcher that can send the audio/video signals to a capture computer that will encode and transmit media streams. A user can watch these streams on his or her desktop by running an application to receive, decode, and play the streams. To simplify this application, we have developed web pages to change channels on the satellite receiver, to control the routing switcher, and to launch the capture applications. Nevertheless, the system is still difficult to learn and requires too much effort to use. The problem is that the user cannot specify what he wants (e.g., "show me CNN"); rather, he must specify how to implement the request. The user must: 1) start capture processes on a computer(s) with available capture devices, 2) send a command to the satellite receiver to change to the desired channel, 3) send a command to the routing switcher to route the satellite audio and video to the selected capture devices, and 4)
launch a local viewer to play the streams. Another example is a request to create a multicast session with three video streams: one from the speaker camera in a particular studio classroom, one from the audience camera in another classroom, and one from an existing video stream already being sent from a remote location. This request can be implemented by issuing commands to various computers, but end-users are unlikely to do it because they do not know the details required to complete the request. An end-user needs a direct manipulation interface that can be accessed from a desktop or portable input device.

Figure 1 shows a typical production environment in a large company or educational institution. This example has a broadcast center and four studio classrooms with a variety of equipment and production facilities. The broadcast center houses shared equipment (e.g., videotape recorders, special-effects processors, etc.) and provides transmission and gateway services (e.g., satellite receivers and transmitters, H.32x videoconference MCUs, IP streaming media computers, etc.). Each studio classroom has a different configuration, and the classrooms use three different transmission technologies to connect to the broadcast center routing switch. Classrooms 1 and 2 are connected by a CATV cable system; classroom 3 is directly connected using a point-to-point transmission technology (e.g., fiber, UTP, etc.); and classroom 4 uses the Internet. Lastly, the production facilities are different in each classroom. For example, classroom 3 has a production switcher so event production can be controlled from a local control booth, whereas the other classrooms need some other control and automation technology.

Figure 1: Production environment example

The challenge facing any organization with this type of diverse environment is to find tools to produce events inexpensively. Software automation and control is the solution, but how do you write software that can operate in this complex heterogeneous environment?

"INfrastructure for DIstributed Video and Audio" (INDIVA) is a middleware system designed to simplify the development of applications that access and control audio/video equipment, media streams, streaming media processing services, and conference resources. In essence, it provides high-level abstractions for accessing equipment and managing the interface between production equipment and streaming media applications. INDIVA uses the Internet conference model defined by the IETF; conferences in this model are sometimes called Mbone conferences. A conference is composed of one or more sessions. A session is composed of media streams of one type, for example audio or video, being delivered from one or more senders to one or more receivers via IP multicast. A media stream sent by one member can use different media formats, and the format can change dynamically. Some members of a session send and receive streams while others might be send-only or receive-only members. Members can join or leave a session at any time. The conference resources managed by INDIVA include multicast sessions, media stream sources and destinations, and conference and session descriptions such as SDP announcements.

The INDIVA middleware was designed with three applications in mind: 1) an interactive end-user shell and direct manipulation desktop interface, 2) an automation system for an Internet webcasting production system, and 3) an automation system for the Access Grid distributed collaboration system. The interactive shell is based on three ideas: 1) supporting direct manipulation of equipment, media streams, multicast sessions, and conferences, 2) providing high-level commands that match user actions, and 3) defining reasonable defaults for the common case. This shell and associated applications will simplify streaming media use by end-users. Webcast and distributed collaboration control and automation applications require the ability to execute commands on remote equipment and software services, to manage conference resources, and to control access to resources. INDIVA provides high-level, easily understood abstractions based on a file system metaphor that will encourage rapid prototyping and experimenting with automation and control applications. Low-level programming details, such as locating cameras in a particular room, controlling the cameras, and accessing their video signals, are performed by the middleware. Consequently, resource descriptions and capabilities are separated from the application source code so changes can be made without requiring source code modifications. This architecture also allows an application to control rooms with various equipment configurations. The control and automation applications will reduce the cost and complexity required to produce events, thereby encouraging more use of this technology.

Many commercial companies and research groups have developed hardware and software to control audio/video equipment. Most equipment control protocols use infrared and serial link interfaces so a remote control device or computer can send commands to the equipment. Typically, vendors supply a proprietary interface to control their equipment. Some companies and consortiums have developed standard control protocols built on these basic link protocols (e.g., Control-S, LANC, etc.), but they have not been universally adopted and supported. And, some groups have implemented software to control equipment for a specific application (e.g., videoconferencing [13], broadcast automation [4], presentation control [6], etc.). These systems are excellent solutions for the particular application, but they are too limited in functionality. An open, extensible middleware system that provides more functionality and can be easily re-used for many applications is needed. Several companies have developed hardware and software systems to implement audio/video control systems (e.g., AMX, Crestron, SmartHome, etc.), but they do not provide high-level interfaces for distributed control applications.
They do provide programming interfaces, but that just complicates application development: first, you have to program the control computer to add a new function, and then you have to implement code to use the new function. And sometimes performance constraints imposed by the control systems (e.g., latency responding to equipment signals) make the equipment unusable. Nevertheless, many environments use these types of control systems, so they can be easily interfaced to the INDIVA environment. Lastly, several groups have defined proposed standards for controlling various types of devices (e.g., HAVi from the major consumer electronics companies, Jini from Sun, and Universal Plug&Play from Microsoft). These technologies share some goals with INDIVA, but none of them provide abstractions to manipulate Internet streaming media, audio/video equipment, and conference resources. Rather than attempting to define an all-encompassing standard, our goal is to provide a foundation for rapid prototyping and experimentation.

Some readers may be asking "How does the broadcast television industry solve this problem?" It uses a standard studio architecture (i.e., a local production control room for each studio), point-to-point interconnection technology, and a star network topology. As a practical matter, this approach is too expensive for most companies and colleges. Eventually, all transmission may use the Internet. Most modern production equipment has an Ethernet connection, but it is only used for control. Using the IP network to transport high quality audio/video signals requires an expensive device at both ends. Currently, IP connection end points cost between $1,000 and $10,000, compared to CATV or UTP connections that cost approximately $200 per end point.

This paper describes the design and implementation of INDIVA. It is organized as follows. Section 2 describes the INDIVA architecture and command abstractions. Section 3 describes a prototype implementation of the system. Section 4 discusses various issues related to the design, implementation, and use of the system, and Section 5 summarizes the paper.
2. INDIVA SYSTEMS ARCHITECTURE

This section describes the INDIVA architecture and the abstractions it provides. Examples are presented to illustrate the use of INDIVA to implement a direct manipulation interface to audio/video equipment, media streams and services, and conference resources. We will call these INDIVA resources. The INDIVA middleware provides abstractions to name, access, and control resources.

Figure 2 shows a model of an INDIVA node. The figure shows the interaction between audio/video equipment, an audio/video routing network, media processing and control computers, and an IP network. The audio/video routing network is a circuit-switched network that connects equipment to the media processing computers. These computers translate back and forth between audio/video signals and media streams. Audio/video equipment might be located in the same room as the media processing computers (e.g., a studio classroom); it might be located in a different room (e.g., a broadcast center); or it might be located at a remote location connected to the routing system through some other transmission technology (e.g., microwave, satellite, CATV, etc.).
Figure 2: INDIVA Environment

Today, most audio/video equipment can be computer-controlled using a serial or infrared connection, shown as a dotted line in Figure 2. A control computer is connected to the IP network so applications can control remote equipment. Sometimes one computer can control the equipment and process media, but sometimes the media processing computer(s) are located in a broadcast center. A studio or classroom might have several pieces of audio/video equipment controlled by one computer, and the audio and video signals might be sent to a broadcast center using an audio/video routing network. Applications need to route audio/video signals between different equipment (e.g., sending camera output to a special-effects processor) or between equipment and computers or vice versa. The audio/video routing network is composed of routing switches connected to control computers, so circuits between input and output ports on a switch can be established remotely, similar to the way audio/video equipment is controlled. Applications also need to send commands to equipment and software services. For example, the application might want to move a pan/tilt camera or change the capture properties (e.g., image size and frame rate for video or sampling rate, sample size, and number of channels for audio) or encoding properties (e.g., format, quality factor, bit rate, etc.) of a service that captures, encodes, and transmits a media stream. INDIVA is designed to make it easy for a user or program to execute these commands.

An INDIVA node represents a domain of equipment and services. For example, on the Berkeley campus there might be a BMRC domain for resources managed by the Berkeley Multimedia Research Center and an ETS domain for resources managed by the campus-wide Educational Technology Services organization. Or, consider NCSA Access in Washington D.C., which has several AG nodes. One INDIVA node might control all resources in the AG nodes at that location, or each AG node might be a distinct INDIVA node. An application can interact with several INDIVA nodes so users and applications can easily control resources at different nodes.

A manager process, called the INDIVA manager, controls resources within a node and responds to requests from applications. An application need not know how equipment is connected to computers or how audio/video signals are routed through the routing network. Nor does the application need to know where the control process for a particular piece of equipment or software service is located. An application sends a command to the INDIVA manager, which executes the command or forwards it to the service process(es) that can execute it. The INDIVA manager is like a network file system server except it supports access to and control of audio/video equipment and streaming media services. (The DEC AudioFile server had a similar software architecture but only supported operations on audio equipment [7].) A library is linked into the application to provide high-level abstractions for interacting with the system.
INDIVA uses a hierarchical name space to specify resources. An application mounts an INDIVA node and accesses resources controlled by that node using a Unix-like file system. The name space is composed of files and directories. INDIVA resources are represented either by a file or a directory. For example, a video capture device typically has several input and output ports (e.g., composite and s-video). A directory represents the capture device, and files within the directory represent the ports. INDIVA resources use an optional file extension to indicate the resource type. Examples of file extensions include routing switch (rs), satellite dish (sat), camera (cam), computer (pc), media stream (rtp), multicast session (ses), Mbone conference (conf), and service process (svp). High-level commands are provided to specify operations on resources (e.g., encode, play, config, info, mv, etc.). A complete list of the commands is given in the appendix.

The following examples illustrate the use of INDIVA. The examples are presented as commands to the INDIVA shell (ish) or as commands invoked by direct manipulation on a desktop; bold typeface indicates output printed in response to the commands, which are entered by the user and shown in regular typeface. The shell and desktop applications translate commands into calls on the INDIVA library and commands to the local computer (e.g., launch a process). The interface to the library is based on Tcl commands and OTcl objects [12], but it is easier to illustrate the functionality of the system by describing the ish interface. ish has built-in commands for the major INDIVA operations. It is embedded into a Tcl wish shell to simplify testing and application development. Applications launched by ish can be written using Open Mash [11] or any programming language that supports IETF streaming media protocols (e.g., SDP, RTP, RTSP, SIP, etc.). The remote procedure call package used by the INDIVA library and manager is XML-RPC, so applications written in different programming languages can use the middleware.
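To make the library interface concrete, here is a minimal Tcl sketch of what application code might look like. The package, class, and method names (Indiva, mount, ls) are illustrative assumptions; the paper defines the ish commands, not the exact library API:

    package require Indiva                  ;# assumed package name

    set node [new Indiva]                   ;# OTcl-style object creation (assumed)
    $node mount /bmrc media0.bmrc.Berkeley.edu:9500
    foreach dev [$node ls /bmrc/devices] {
        puts $dev                           ;# print each device directory name
    }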
An application accesses resources managed by a particular INDIVA node by mounting the name space. The following command mounts the BMRC node:

% mount /bmrc media0.bmrc.Berkeley.edu:9500

The mount command specifies the prefix name for resources at the BMRC node (i.e., /bmrc) and the host and port number used to communicate with the INDIVA manager. Figure 3 shows the hierarchical name space for an INDIVA node.

Figure 3: Hierarchical name space for an INDIVA node

A hierarchical name specifies a particular resource. For example, the name /bmrc/devices/parkervision-306.cam specifies a particular camera that happens to have audio and video output ports connected to the audio/video routing network. In fact, parkervision-306.cam is a directory that contains files for the output ports on the camera. You can list the ports on this camera as follows:

% ls /bmrc/devices/parkervision-306.cam
aleft.out  aright.out  svideo.out  vcomp.out
The aleft.out and aright.out ports specify the left and right audio output channels captured by the built-in wireless microphone the ParkerVision uses for tracking, and the svideo.out and vcomp.out ports specify the video output in either s-video or composite format. You can list all equipment that can be accessed at an INDIVA node as follows:

% ls /bmrc/devices
autopatch-310.rs/      autopatch-326.rs/
echostar-530.sat/      knox-530.rs/
media2.pc/             media3.pc/
parkervision-306.cam/  parkervision-310.cam/
vcc3-405.cam/          vcc4-306.cam/
…
As you can see, devices are represented by directories that contain files that describe the ports on the device. Directories are also used to group related resources. The conferences, devices, and services directories shown in Figure 3 contain files and directories that describe the INDIVA resources of the respective type managed by this node. The rooms directory contains a subdirectory for each room that contains audio/video equipment managed by the node; the users directory contains a subdirectory for each user; and the conferences directory contains a subdirectory for each Mbone conference that is currently being announced in the Mbone announcement channel. The conferences directory is a cache for Mbone conferences similar to the cache described by Swan et al. [16]. The services directory contains a subdirectory for each service launched by the INDIVA node.

The INDIVA name space supports links to simplify resource naming and management. For example, a room directory typically contains links to the resources in the room. The following commands list the devices in a particular room:

% cd /bmrc/rooms/306Soda
% ls
audcam@  audiomixer@  ovhdcam@  spkrcam@  stagecam@  vcr@

These resources represent cameras (i.e., audience, stage, speaker, and overhead), an audio mixer, and a VCR in a particular classroom. In fact, they are links to the directories or files that describe the specific resources. For example, the spkrcam in 306 Soda Hall is the camera parkervision-306.cam mentioned above. A dashed line in Figure 3 represents the link. Another classroom might use a different type of camera in the speaker camera position (e.g., a Canon VCC-4 pan/tilt camera). Using links with the same names that point to different devices is one mechanism to simplify control software. Links are also used in home directories to create shortcuts to resources of interest, as shown in the example below.

The INDIVA manager supports other commands commonly found in file systems to manage the name space (e.g., unmount, mkdir, pwd, rm, mv, cp, etc.). Some of these commands are restricted. For example, the system will not allow a user to delete entries in directories that represent INDIVA resources (e.g., services, devices, etc.).
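For instance, Sue might link the speaker camera in 306 Soda Hall into her home directory. This is a hypothetical invocation of the ln command listed in the appendix; the argument order is assumed to follow Unix ln:

% ln /bmrc/rooms/306Soda/spkrcam /bmrc/users/sue/spkrcam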
INDIVA has commands to query and update equipment and service information and commands to manipulate streaming media resources. For example, commands are provided to:
1. Encode audio and video signals and transmit them in conference sessions.
2. Decode and play a media stream(s) in a session.
3. Record a session or conference into an archive.
4. Play an archived media stream in a session.

Suppose a user wants to watch CNN on a satellite dish. The following command will allocate resources, change the satellite receiver channel, route the signals from the receiver to an available computer and capture device, launch processes to capture, encode, and transmit the audio and video streams, and launch processes to play the streams on the user desktop:

% encode /bmrc/devices/echostar-530.sat -channel "cnn" $stdconf | view

The encode command takes a source device, optional arguments to the source device or encoding services, and a destination conference with audio and video sessions to which the streams will be sent. Every user has a predefined standard conference (i.e., a conference named std.conf in their home directory) that can be used to play streams. This command uses the shell variable $stdconf, which is set by default to the user's standard conference. For example, Sue has a standard conference with the fully qualified name /bmrc/users/sue/std.conf that is stored in $stdconf in her shell. In the example above, default values are used for image capture, video coding, and transmission options (e.g., image size, frame rate, coding format, transmission bit rate, etc.). The encode command returns the names of the media streams it created. We will explain how we name media streams later in this section. The returned names are used as arguments to the view command using the pipe ("|") operation. These commands can also be written using a Tcl-style syntax: the pipeline

% encode $resource | view

is equivalent to

% set x [encode $resource]; view $x

and to

% view [encode $resource]

Figure 4: Single-stream viewer

Figure 5: Conference viewer

The view command launches a local viewer for the conference resource passed to it. The viewer joins the audio and video sessions and plays the streams. Figures 4 and 5 show the viewers launched on the user's desktop by the view command. Figure 4 shows a single-stream viewer while Figure 5 shows a conference viewer with thumbnails for all video media streams. Double-clicking on a thumbnail brings up a single-stream viewer on that stream. The user specifies which viewer to display and which sessions, sources, and streams to play by giving optional parameters to the view command. The following examples illustrate these features:

% view $stdconf
% cd $stdconf
% view audio.ses
% view video.ses/{main,slides}.rtp
% view video.ses audio.ses/speaker.rtp

The first command plays all media streams in all sessions; the second command plays all media streams in the audio session; the third command plays only the main.rtp and slides.rtp media streams in the video session; and the last command plays all video media streams and only the speaker.rtp media stream in the audio session. The default behavior is to play the listed media streams. The -mute option allows the user to specify media streams to be muted, in other words, play all media streams except the ones listed.

The viewer can query the conference description referenced by $stdconf to access the source device and capture process if the user wants to change the satellite channel or capture process parameters. The info command is used to query for information. For example, to look up the name of the equipment that produces a signal transmitted in a multicast session, you issue the command

% info $stdconf/video.ses/x.rtp -equipment-source

The value returned is the fully qualified name of the equipment.
Now, the following command changes the satellite channel:

% config /bmrc/devices/echostar-530.sat -channel "espn"

Channels are modeled as subdirectories with ports for the audio/video signals so that users can query the device to determine the available channels and to create links to commonly viewed channels. The option -channel is provided to simplify requesting audio and video signals for a channel. The info command option -ms-process is provided to look up the media service processes that produce and operate on the media stream. In our example, the only service process is the capture process. The name of this process might be /bmrc/services/[email protected]

The config command can now be used to change the parameters as shown in:

% cd /bmrc/services
% config [email protected] -image cif -bitrate 600kbs

The options available are specific to each resource. You can ask for a list of options using the info command and the options subcommand.
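For example, a hypothetical query for the options supported by the satellite receiver (the paper does not show the output format, so none is reproduced here):

% info options /bmrc/devices/echostar-530.sat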
Direct manipulation commands to change equipment and capture process parameters are also available through the viewers. For example, stream-specific menu items are included in the pulldown menu in the single-stream viewer. Right-clicking on a thumbnail in the multiple-stream viewer displays a popup menu with stream-specific commands, and right-clicking elsewhere in the window displays a popup menu with conference- and session-specific commands. The viewer requests these UI components from the equipment control and media capture processes when it is launched. The requests are sent to the INDIVA manager, which forwards them to the appropriate process(es). The process(es) return a list of commands and UI components for the particular equipment or capture process. The menu also contains items to display statistics about the stream, session, or conference and to record streams locally (i.e., a personal video recorder with pause, position, and play operations). The viewer also has UI controls to mark, copy, and annotate segments of media streams. For example, the single-stream viewer includes mark-in and mark-out operations that allow the user to identify a segment of a video and/or audio stream. This segment can be copied to a media archive, played into another live session, or passed as an argument to a media processing service. This operation is typically found in an editing environment but not in a desktop streaming media player (e.g., QuickTime, RealNetworks, or Windows Media players). The concept we are exploring is direct manipulation of streaming media. It is nearly impossible to experiment with these types of applications today because you need so much software infrastructure just to experiment.

Conference resources can be created, examined, modified, and destroyed by INDIVA commands. The ls command lists sessions in a conference:

% ls $stdconf
audio.ses/ video.ses/

This conference has two sessions. Each session is represented in the INDIVA name space by a directory with the extension ses. If you list a session, you get a list of the source streams in the session as in:

% ls $stdconf/video.ses
[email protected]:1.rtp
[email protected]:2.rtp
…

A media stream source in the session is identified by a generated name that, as in this example, uses the host IP address on which the service process is running and the RTP CNAME. In traditional RTP applications, such as vic and vat, it is assumed that each host transmits at most one media stream into each session. Hence, applications use CNAMEs like "username@hostname" to identify a media stream. (The term source is ambiguous in this context. The RTP standard uses the term to refer to the process that transmits the media stream. INDIVA uses the term to refer to the equipment or process that originated the audio/video signal or media stream. We use the term media stream source to refer to the process that creates the RTP packet stream.) Since INDIVA may send multiple streams from one host into the same session, INDIVA services append a generated number to the CNAME to ensure uniqueness. Remember that users can optionally specify the media stream name in the command that created the service process or use the mv command to rename the resource after it has been created.

The info command displays information about a resource. The following examples query information about a conference and a specific session:

% info $stdconf
Name: Berkeley MIG Seminar
Owner: BMRC Webcast Director
…
% cd $stdconf/
% info video.ses
IP Address: 224.2.3.4/4444
Format: video RTP/AVP 31
…
% info video.ses/[email protected]:1.rtp
Source Host: 128.32.64.215
CNAME: [email protected]:1
…

INDIVA is designed to simplify the manipulation of conference resources, so you can request an SDP description of a conference using the sdp subcommand to info as in:

% info sdp $stdconf
s=Berkeley MIG Seminar
i=Regularly scheduled weekly seminar
m=audio 22054 RTP/AVP 3
c=IN IP4 224.2.231.76/63
…

And, you can create a conference using the command

% mkcon -sdp $sap myconf.conf

This operation creates a conference defined by the SDP specification stored in $sap. The conference is created in the current directory with the name myconf.conf. An optional argument to the mkcon command, named -announce, allows the user to request that the description be announced in the Mbone announcement channel. Recall that all announced conferences also have entries in the /bmrc/conferences directory.
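These commands compose. For example, the following hypothetical ish sequence clones an announced conference by feeding the SDP description returned by info into mkcon; it assumes the info sdp subcommand returns the description as a Tcl string value:

% set sap [info sdp /bmrc/conferences/migsem.conf]
% mkcon -sdp $sap -announce mymig.conf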
Other operations on conference resources include commands to copy a media stream from one session to another and to move a media stream from one session to another. We will illustrate these functions in terms of a desktop operation, but they are important operations in a webcast or distributed collaboration control system [17]. Consider the following example. A user wants to show a visitor what is happening in three classrooms. Suppose webcasts are being produced from each of the rooms. The user might want to create a new video session with the primary video stream from each webcast. The primary video stream is typically switched between the stage, speaker, and audience cameras depending on what is happening in the lecture [8]. The user might specify this operation by executing a copy&paste operation between the stream or thumbnail viewer shown above and another conference or session viewer. Another way to specify this command is to drag the thumbnail from one session window and drop it on another session window. This operation is similar to a file system drag&drop operation except the entities being manipulated are media streams.

The following ish commands illustrate how an application can implement these operations:

% cd /bmrc/conferences
% ls                        # list active lecture webcasts
cs160.conf/ migsem.conf/ syssem.conf/ …
% cd cs160.conf/video.ses
% ls                        # list sources in CS160 webcast
[email protected]:14.rtp
[email protected]:15.rtp
% cp [email protected]:14.rtp $stdconf/video.ses

To execute the cp command the INDIVA manager either: 1) launches a forwarding agent to copy packets sent by the CS160 video stream from RTP source [email protected]:14.rtp to the video session in the user's standard conference, or 2) requests the capture process to send the stream to two sessions. Capture processes can send a media stream to different sessions at the same time. The move command (mv) deletes the stream from one session and adds it to another session. Implementing these operations is somewhat complicated because INDIVA must identify which process is sending the stream and execute a command to send it to another session or change the conference and session to which the stream is being sent. INDIVA and the capture services must manage these resources.

Direct manipulation of Mbone conferences, multicast sessions, and media streams suggests other applications and interfaces. For example, a desktop file browser, similar to a file system explorer, might be constructed for an INDIVA node. Traditional operations on a file browser (e.g., double-click, drag&drop, etc.) can be implemented on INDIVA resources. An RTSP archive, for example, can be represented by a resource named myarchive.rtsp. This directory might have a special icon to denote the archive. Dropping a conference, session, or media stream entity on the icon can initiate a recording of the stream. Double-clicking on the archive icon might show the stored recordings, and double-clicking a recording might launch a player for that stored media. Similarly, the ls command can be used to examine the internal structure of the recording (e.g., sessions, media streams, etc.).

3. IMPLEMENTATION

This section describes the implementation of a prototype INDIVA system. The system is composed of an INDIVA manager process, a collection of service processes, and a client library. The INDIVA manager maintains the hierarchical name space, a directed graph that represents signal paths through the audio/video routing network, a table of allocated paths and resources, and a table of running service processes. A signal path is composed of segments that might be signals in an audio/video routing network or IP packets in a conference session. A signal path is also called a flow. The hierarchical name space is stored in a file system so it will be persistent. The other data structures are stored in main memory. Service processes implement a variety of services including audio/video encoding and decoding and equipment control. Commands sent to the INDIVA manager are implemented by querying and updating the data structures and by sending commands to service processes that can perform the operations.

Figure 6 shows the process architecture of the system. Service processes are distributed over several host computers. An active services system based on the AS/1 framework manages these processes [3]. The AS/1 framework uses a soft-state request/response protocol to locate hosts with the services required to implement a command and to allocate resources. A host manager (hm) process runs on each computer that can execute service processes. The INDIVA manager and all host manager processes communicate using one multicast session. The INDIVA manager sends a multicast request for a specific service to all host managers. Host managers with resources available to satisfy the request send a response message offering to provide the service. The manager chooses one of the responses, typically the first one received. The host managers use multicast damping, that is, they wait a short random amount of time before responding, and they listen to responses by other host managers so that the INDIVA manager is not flooded with responses. The response message includes the information required to communicate with the service process (i.e., host IP address and port number for a socket). A service process might provide service to several clients or it might be limited to one client. Figure 7 lists some of the service processes. Figure 6 shows that audio and video encoding processes are running on host media0 and that the routing switch and satellite receiver control processes are running on host control0. The video tape deck control process is not currently running because it is not needed.

All AS/1 processes send heartbeat messages periodically. A heartbeat message is sent from the INDIVA manager to a service process to indicate that its services are still required, and a message is sent from a service process to the INDIVA manager to indicate that the service process is still alive. A service exits if it does not receive a heartbeat message from the INDIVA manager after a timeout period, and the INDIVA manager re-launches a service if it does not receive a heartbeat message from the service after a timeout period. AS/1 and the request/response protocol allow the hardware and process distribution to be changed without requiring modifications to the INDIVA manager source code. They also simplify launching and managing persistent services and adding and removing hosts and services.
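A minimal sketch of these two AS/1 behaviors, multicast damping on the host manager side and soft-state heartbeats in the INDIVA manager, written as Tcl event-loop code. The helper procedures (send_offer, send_heartbeat, relaunch_service) and all interval values are assumptions for illustration, not details taken from the AS/1 framework:

    # Host manager side: multicast damping. On receiving a request, wait a
    # short random time and respond only if no other host answered first.
    proc on_request {reqId} {
        set delay [expr {int(rand() * 300)}]       ;# assumed 0-300 ms wait
        after $delay [list maybe_respond $reqId]
    }
    proc on_offer_heard {reqId} {
        global answered
        set answered($reqId) 1                     ;# another host responded
    }
    proc maybe_respond {reqId} {
        global answered
        if {![info exists answered($reqId)]} {
            send_offer $reqId                      ;# hypothetical helper
        }
    }

    # INDIVA manager side: soft-state heartbeats. Tell each service it is
    # still needed, and re-launch services that have gone silent.
    set services [dict create]                     ;# uid -> {lastSeen <secs>}
    set heartbeatMs 5000                           ;# assumed intervals
    set timeoutSec 15

    proc send_heartbeats {} {
        global services heartbeatMs
        foreach uid [dict keys $services] {
            send_heartbeat $uid                    ;# hypothetical helper
        }
        after $heartbeatMs send_heartbeats         ;# reschedule
    }
    proc check_liveness {} {
        global services timeoutSec
        set now [clock seconds]
        dict for {uid entry} $services {
            if {$now - [dict get $entry lastSeen] > $timeoutSec} {
                relaunch_service $uid              ;# hypothetical helper
            }
        }
        after [expr {$timeoutSec * 1000}] check_liveness
    }

Because the state is soft, no explicit teardown protocol is needed: a crashed host simply stops refreshing its entries and the system converges on its own.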
Figure 6: INDIVA process architecture

Figure 7: Examples of services

Process  Description
iae      Audio encoder
ive      Video encoder
iad      Audio decoder
ivd      Video decoder
iamxd    AMX control daemon
ipvd     ParkerVision CameraMan daemon
irsd     Routing switch daemon
iscd     Satellite receiver control daemon
ivcccd   Canon VCC camera control daemon
ivtrd    Video tape deck machine control

The remainder of this section describes the data structures used to maintain the hierarchical name space, information about client requests, service processes, and signal paths, and how various INDIVA commands are implemented.
3.1 Name Space Management

As mentioned above, the INDIVA resource name space is stored in a file system. A directory in the file system represents an INDIVA directory (e.g., /bmrc/rooms or /bmrc/devices) or an INDIVA resource (e.g., equipment, session, or conference). A file in the file system represents a port, a live media stream, or an archived stream. Links in the INDIVA name space are represented by links in the file system. Attribute values that describe information about INDIVA resources are stored in files. Attributes about INDIVA resources represented by a directory in the name space (e.g., equipment) are stored in a file, named .info, in the directory for the resource. Attributes about ports are stored in the file that represents the port. Attributes are represented as name/value pairs. For example, the camera /bmrc/devices/parkervision-306.cam has attributes that specify the camera make and model, which in this case are "ParkerVision" and "CameraMan Sys II," respectively. (Although the names of resources encode information about the resource (e.g., model and room number), INDIVA makes no use of this information. We encode the information in the name as a convenience.)

The input/output ports of a routing switch are represented as files, as shown by the following example of the 8x8 Knox routing switch:

% ls /bmrc/devices/knox-530.rs/
aleft01.in   aleft01.out   aleft02.in   aleft02.out   …
aright01.in  aright01.out  aright02.in  aright02.out  …
v01.in       v01.out       v02.in       v02.out       …

The switch has composite video, audio-left, and audio-right planes. Each output port has attributes type, from, action, and portid. Consider the port v05.out, which is a video output port. Its attributes specify the signal type (i.e., composite), the source (i.e., v*.in), the actions defined on the device (i.e., switch), and the port number (i.e., 5). The satellite dish echostar-530.sat has three output ports: vcomp.out, aleft.out, and aright.out. Ports are represented explicitly so a flow can be mapped from a specific output port on one resource to a specific input port on another resource.

Some INDIVA commands (e.g., ls, mkdir, ln, etc.) are implemented by executing file system commands on the name space file system. Commands that create a flow or execute control commands are implemented either by the INDIVA manager or by service processes. The next two subsections describe how they are implemented.

3.2 Resource Management

The INDIVA manager maintains data structures that describe flows allocated to clients and processes created to provide services to clients. The flows represent signal paths through the audio/video network to and from resources. Each flow is allocated in response to a client request. For example, a request to encode a stream must allocate a flow from the source device to the encoding service(s). A request to composite two signals into one signal using the Kaleido [9] must allocate a flow from each signal source (e.g., an output port on a piece of equipment or a media stream source in a conference session) to an input port on the Kaleido.

The data structure that maintains state about allocated flows is called the flowTable. The flowTable has one row for every active request, where an active request is one that has been completed or is currently being implemented by the INDIVA manager. The columns in the table include the following:
1. UID – a unique identifier for the request.
2. client – a socket for communicating with the client.
3. source – the source resource.
4. destination – the destination resource.
5. flow – the signal path from the source to the destination. Each segment of the path is represented by a three-tuple with from and to resources and a unique identifier UID for an optional service that controls the segment.

The service that controls a segment might be a control service for a piece of equipment like a satellite dish or a Kaleido. Or, it might be a media processing service like an audio encoding process that controls the segment from the output port of a capture device to the input port of an audio session in a conference. The unique identifier of the service in each path segment is used to access the serviceTable, which maintains a list of running service processes.
The serviceTable has one row for every process and the following columns:
1. UID – a unique identifier for the service.
2. process – a socket for communicating with the service.

The entries in this table are created when a client request to allocate a signal flow is implemented, as described in the next section. The INDIVA manager sends a heartbeat message, as required by the AS/1 framework, to every service with an entry in a segment of a flow in the flowTable. Service processes send heartbeat messages periodically to refresh the entries in the serviceTable. These running services are listed in the directory /bmrc/services so an application can execute commands to access or control the service.
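To illustrate, one flowTable row and the matching serviceTable entries for the satellite-to-conference example worked through in Section 3.3 might look like the following Tcl dicts. The field layout, service identifiers, and sockets are assumptions; only the column names come from the description above:

    # One flowTable entry (sketch). Each flow segment is a three-tuple
    # {from to serviceUID}; an empty serviceUID means no controlling service.
    set flowEntry [dict create \
        UID         17 \
        client      sock12 \
        source      /bmrc/devices/echostar-530.sat \
        destination /bmrc/users/sue/std.conf/video.ses \
        flow [list \
            {/bmrc/devices/echostar-530.sat/vcomp.out /bmrc/devices/knox-530.rs/v03.in svc-iscd} \
            {/bmrc/devices/knox-530.rs/v04.out /bmrc/devices/media3.pc/bttv.cap/vcomp.in svc-irsd} \
            {/bmrc/devices/media3.pc/bttv.cap /bmrc/users/sue/std.conf/video.ses svc-ive}]]

    # serviceTable (sketch): one row per running service process.
    set serviceTable [dict create \
        svc-iscd sock31 \
        svc-irsd sock32 \
        svc-ive  sock33]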
3.3 Audio/Video Flow Management

The primary function of the INDIVA manager is audio/video flow management. When the manager is executed, it constructs a directed graph that represents the audio/video network and equipment. This graph, called the avgraph, is composed of nodes that represent ports, equipment, or conference sessions. Edges in the avgraph represent a possible segment of a flow. An edge from one node to another node means that an audio/video signal or packets of a media stream can flow from the source node to the destination node. The nodes representing equipment (e.g., capture devices) and conference sessions indicate the possibility of a segment from a port on the device to a media stream in the session or vice versa. The graph is constructed from information stored in the file system that represents the INDIVA name space.

Figure 8: Fragment of avgraph

Figure 8 shows a small fragment of the avgraph for the BMRC node. Equipment, computers, capture devices, conferences, and sessions are shown as rounded boxes. They correspond to directories in the INDIVA name space. Small circles represent nodes in the avgraph. White circles represent ports, and black circles represent an NxM node, meaning that any input port can be connected to any output port. The figure shows video flow segments from three pieces of equipment (i.e., the EchoStar satellite, a Canon VCC-4 camera, and a DV tape deck) to input ports on the Knox router and from output ports on the router to three capture cards in two media processing computers. On the right side of the figure three conferences are shown, each with a video session. The figure shows an edge from a capture device to a conference session, which implies that a flow can contain segments from a port on the device to a media stream in the session. Remember, the Knox router is an 8x8 switch with composite video and stereo audio planes. The figure shows three inputs and three outputs on the video plane only. The complete avgraph has left and right audio segments exiting the source equipment, passing through the two audio planes in the router, and connecting to audio capture boards in the computers. It also has edges from the capture devices to the audio sessions of the conferences.

The avgraph is represented by two mappings:

nodes: name → object
edges: name × name → object

where name is the fully qualified INDIVA name for the resource (e.g., /bmrc/devices/echostar-530.sat) and object is a reference to the implementation object that represents the entity. Using this representation, the INDIVA manager can access the object corresponding to a resource given its name, and it can retrieve the flow segments between two resources. The object that represents an edge contains references to the from and to objects, a state attribute, and an optional object that represents the service that controls the segment. The state attribute indicates whether the segment is available or has been allocated to a flow. In the future, this attribute will be used to reserve segments and to control access by specific users and applications.
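As an illustration, the two mappings might be represented as Tcl dicts as follows; the node objects here are placeholder strings standing in for implementation object references, and the resource names are taken from the example fragment above:

    set nodes [dict create]
    set edges [dict create]

    # Nodes map a fully qualified resource name to its implementation object.
    dict set nodes /bmrc/devices/echostar-530.sat/vcomp.out portObj1
    dict set nodes /bmrc/devices/knox-530.rs/v03.in portObj2

    # Edges map a (from, to) name pair to an object holding the from/to
    # references, a state attribute, and an optional controlling service.
    dict set edges [list /bmrc/devices/echostar-530.sat/vcomp.out \
                         /bmrc/devices/knox-530.rs/v03.in] \
        [dict create from portObj1 to portObj2 state available service {}]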
Now, suppose a client application sent the following command to the INDIVA manager:

% encode /bmrc/devices/echostar-530.sat -channel "CNN" ~sue/std.conf

The basic algorithm to allocate the audio and video flows and connect to the required control and media-processing services is as follows. The algorithm is described here for the video signal; it is executed additional times to allocate the audio flow(s).

First, the shortest paths from the source (i.e., the composite video output of the EchoStar satellite dish) to the destination (i.e., the video session of Sue's standard conference) are found by searching the avgraph. The source and destination nodes in the avgraph are located by using the nodes mapping with the INDIVA resource name. If the resource has more than one port (e.g., the satellite receiver might have both composite and s-video ports), a default port is specified in the equipment attributes.

Second, a path is selected by tracing the segments back to the source starting from the destination. AS/1 is used to select one segment when more than one path exists to the current node. In other words, if two segments enter a node, an AS/1 request message is sent to the host managers to choose the appropriate segment. The AS/1 request/response mechanism is used so that the hosts that provide the services can make resource allocation decisions. Notice that new services can be added dynamically to the system, and they will be considered in the next round of allocations because the AS/1 request/response framework uses multicast communication from the INDIVA manager to the host managers.

Third, the path is traversed from source to destination, executing any services associated with segments in the avgraph. During this traversal the avgraph edges are marked busy to indicate that a flow is currently using them. AS/1 is used to launch a required service if it is not currently running, either because it has never been launched or because a prior invocation has timed out and exited. Recall that the serviceTable maintains information about services that are already running.
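The first step can be sketched as a breadth-first search over the edges mapping shown earlier. This is a minimal illustration; the real algorithm must also handle default ports, edge state, and the AS/1 segment selection described above:

    # Return a shortest (fewest-hop) list of node names from src to dst,
    # or an empty list if no path exists.
    proc find_path {edges src dst} {
        set queue [list [list $src]]
        set visited [dict create $src 1]
        while {[llength $queue] > 0} {
            set queue [lassign $queue path]      ;# pop the first partial path
            set node [lindex $path end]
            if {$node eq $dst} { return $path }  ;# shortest path found
            dict for {key edgeObj} $edges {
                lassign $key from to
                if {$from eq $node && ![dict exists $visited $to]} {
                    dict set visited $to 1
                    lappend queue [concat $path [list $to]]
                }
            }
        }
        return {}                                ;# no path
    }

Because this sketch scans every edge at each expansion, a production implementation would index the edges by their from node.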
Assume the selected signal path is from the satellite receiver through the Knox router to the bttv.cap capture device in media3.pc. The services launched along the way include the control process for the satellite dish (iscd), the control process for the routing switch (irsd), and the video encoding process (ive). In all likelihood, the routing switch control process is already running, but the other processes may or may not exist. As part of launching the services, the INDIVA manager sends commands to switch the satellite channel to CNN, to route input port v03.in to output port v04.out, to select the bttv.cap capture device and vcomp.out port, and to initiate video capture and transmission. After the flow paths are allocated and the services launched, a response is sent to the client with a response code indicating successful execution of the command.

Suppose the flow allocation algorithm reaches a segment that is busy when selecting the signal path. Under some circumstances, a second client will be allowed to use the signal (e.g., two applications are using the same camera). INDIVA must recognize these situations and handle them correctly. However, in some cases the segment cannot be shared. A flow segment can only carry one signal, so the INDIVA manager may not be able to complete a command because one or more segments between the source and destination are already being used to transmit other signals. Moreover, it is possible for the INDIVA manager to deadlock attempting to satisfy commands from two clients at the same time because resources are allocated incrementally and the flow allocation algorithm might allocate segments in different orders. Finally, managing a complex audio/video production environment requires that some activities take precedence over other activities. For example, an application using a camera in a studio classroom must be pre-empted when a lecture webcast is about to begin because the webcast is more important, in some cases, than someone watching the room. Generally speaking, most resources can be shared for reading (e.g., playing a stream) but not for writing or control.

These problems are solved as follows. First, a priority mechanism is used to break deadlocks when two commands are attempting to allocate resources at the same time. If the requesting client has higher priority, the resource is pre-empted and allocated to the requesting client. A notification message is sent to the client of the request that has been de-allocated. The earliest command received aborts the later command if both requests have the same priority. Second, if a command attempts to allocate a resource that has already been allocated to another client, a "resource unavailable" response is returned unless the requesting client has higher priority. Lastly, clients with appropriate authorization can restrict all use of resources. This mechanism is required, for example, when a private meeting is being held and the participants want to limit who can watch and listen to the conference sessions or control the cameras and mixers that produce the audio and video signals.

4. DISCUSSION

This section discusses the current state of the implementation and makes several observations about future development of INDIVA.

Note to reviewers: A prototype implementation of the system exists. The AS/1 services, the INDIVA manager, and a rudimentary ish have been implemented. Commands can be executed through ish that allocate signal flows and produce video streams. The signal flows can go from a camera through several routing switches to a capture computer/device that sends a compressed stream to an Mbone conference. A simple control service is working. The final version of the paper will include an updated description of what is working and a description of the lines of code in the various components implemented.

We will also report on the performance of executing a command. Early measurements using a wall clock suggest that a command takes 0.5-1 second to execute. This performance, while not impressive, may be good enough for the applications we have in mind. However, we expect the system can be tuned to reduce the time dramatically should that be necessary.

5. SUMMARY

The design and implementation of a distributed middleware system for managing and controlling audio/video resources was described. The system was designed to simplify the development of applications to automate and control webcasts and distributed collaborations and to encourage day-to-day use of streaming media technology. A prototype implementation of the middleware exists, and it is being actively developed.

6. ACKNOWLEDGMENTS

National Science Foundation Grant ANI-9907994 supported this research.

7. REFERENCES
[1] AutoAuditorium, http://www.autoauditorium.com/, 2002.
[2] Access Grid, http://www.accessgrid.org/, 2002.
[3] E. Amir, S. McCanne, and R. Katz, "An Active Service Framework and its Application to Real-time Multimedia Transcoding," Proc. of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, Vancouver, BC, Canada, 1998.
[4] S.J. Angelovich, K.B. Kenny, and B.D. Sarachan, "NBC's Genesis Broadcast Automation System," Proc. of the Sixth Annual Tcl/Tk Workshop, San Diego, CA, Sep 1998.
[5] M. Delco, "Production Quality Internet Television," Masters Project Report, EECS Department, U.C. Berkeley, http://bmrc.berkeley.edu/papers/2001-161/, Aug 2001.
[6] T. Hodes and R.H. Katz, "Composable Ad Hoc Location-based Services for Heterogeneous Mobile Clients," ACM Wireless Networks Journal, Vol. 5, No. 5, Oct 1999, pp. 411-427.
[7] T.M. Levergood, et al., "AudioFile: A Network-transparent System for Distributed Audio Applications," Proc. of the USENIX Summer Conference, Jun 1993.
[8] E. Machnicki and L.A. Rowe, "Virtual Director: Automating a Webcast," SPIE/IS&T Multimedia Computing and Networking Conference, Vol. 4673, San Jose, CA, Jan 2002.
[9] Miranda, Kaleido Multi-Image Display System, http://www.miranda.com/en/products/multiimage/Kaleido.htm, 2002.
[10] S. Mukhopadhyay and B. Smith, "Passive Capture and Structuring of Lectures," Proc. of the Seventh ACM International Conference on Multimedia, Orlando, FL, Oct 1999.
[11] Open Mash, http://www.openmash.org/, 2002.
[12] Otcl and TclCL, http://sourceforge.net/projects/otcl-tclcl/, 2002.
[13] M. Perry and D. Agarwal, "Remote Control for Videoconferencing," Proc. of the 11th International Conference of the Information Resources Management Association, Anchorage, AK, May 2000.
[14] L.A. Rowe, "Streaming Media Middleware is more than Streaming Media," International Workshop on Multimedia Middleware, held in conjunction with the Ninth ACM International Conference on Multimedia, Ottawa, Canada, Oct 2001.
[15] Y. Rui, et al., "Building an Intelligent Camera Management System," Proc. of the Ninth ACM International Conference on Multimedia, Ottawa, Canada, Oct 2001.
[16] A. Swan, S. McCanne, and L.A. Rowe, "Layered Transmission and Caching for the Multicast Session Directory Service," Proc. of the Sixth ACM International Conference on Multimedia, Bristol, England, Sep 1998.
[17] T.P. Yu, et al., "dc: A Live Webcast Control System," SPIE/IS&T Multimedia Computing and Networking Conference, Vol. 4312, San Jose, CA, Jan 2001.

8. APPENDIX

The INDIVA manager implements the following commands:

Command  Description
control  Send a command to a resource
config   Change state of a resource
cp       Copy a resource
encode   Encode a media stream from an audio/video signal
find     Search the name space for matching resources
info     Query a resource for information
ln       Create a link
ls       List a directory
mkcon    Make a conference
mkdir    Make a directory
mkses    Make a session
mv       Move a resource
play     Play a media stream
record   Record a media stream, session, or conference
rm       Remove a resource
switch   Change a routing connection

The INDIVA manager also responds to RPC calls from service processes. The INDIVA shell has all of the commands above plus the following commands that are implemented in the shell:

Command  Description
cd       Change directory
mount    Mount an INDIVA node
pwd      Print working directory
unmount  Unmount an INDIVA node
view     Launch a local viewer