ADVANCES IN WIRELESS VOIP
Voice Communications over ZigBee Networks
Chonggang Wang and Kazem Sohraby, University of Arkansas
Rittwik Jana, Lusheng Ji, and Mahmoud Daneshmand, AT&T Labs Research
ABSTRACT This article provides an overview of ZigBee-enabled wireless networks and discusses the feasibility of supporting voice communications over ZigBee networks. We begin by providing an overview of the ZigBee technology followed by an evaluation of voice quality and performance over such an impoverished wireless channel. Two types of voice communications, namely full-duplex voice over IP (VoIP) and half-duplex push-to-talk (PTT), are considered. Voice quality of VoIP is measured using the R-factor [1] (a well-known objective speech quality metric). The quality of PTT, however, is evaluated based on packet-loss rate, delay, and jitter. The simulation results demonstrate that a low-power, low-rate wireless sensor network can support a limited range of voice services.
INTRODUCTION Voice over IP has seen a tremendous surge of innovative services in both business and consumer sectors. Users recently have begun to experience the benefits of intermarrying VoIP with wireless networking, such as mobility and enhanced services, with VoWiFi applications. On the other hand, a new type of wireless network is emerging. Wireless sensor networks are low-cost, low-power, and projected to become ubiquitous. In this realm, we study the feasibility of conducting voice communication over wireless sensor networks, in particular ZigBee [2] networks. ZigBee is the name of a technology that consists of a whole suite of specifications designed specifically for wireless networked sensors and controllers. The physical (PHY) and medium access control (MAC) layers are standardized by the IEEE 802.15 wireless personal area network (WPAN) working group under the designation of 802.15.4 [3]. The higher layers are specified by the ZigBee Alliance [2], which is an industry alliance consisting of a full spectrum of companies, ranging from ZigBee chip providers to solution providers. Compared to other wireless communication technologies, ZigBee is designed specifically for providing wireless networking capability for battery-powered, low-cost, low-capability sensor and controller nodes, typically
IEEE Communications Magazine • January 2008
powered only by an eight-bit microcontroller. It has been estimated that the software size of a ZigBee stack is only 1/10 of a Bluetooth stack. ZigBee also features power-saving techniques so that deep sleeps can be handled efficiently with rapid wake-up and rapid fall-into-sleep features. As a result, ZigBee wireless sensors can last years in the field without requiring battery changes. In this article, after an overview of the protocols and technology, we provide insights into: • Feasibility of support and the number of full duplex VoIP and half-duplex PTT calls over linear ZigBee topologies. • Characterization of the voice quality using the R-factor, end-to-end delay, jitter, and packet loss. We employ a simulation-based model to characterize some of the previous objectives. The remainder of this article is organized as follows: we provide an overview of ZigBee technology, introduce the methodology for voice quality evaluation, and review the existing work on voice over ZigBee. We investigate the feasibility of voice communications over ZigBee characterized by an objective voice quality metric, the R-factor. We discuss some of the results of the voice scenarios in a realistic networking environment. Finally, we conclude the article.
ZIGBEE TECHNOLOGY OVERVIEW The ZigBee technology is designed to provide a simple and low-cost wireless communication and networking solution for low-data-rate and low-power-consumption applications, such as home monitoring and automation, environmental monitoring, industry controls, and emerging low-rate wireless sensor applications. Figure 1 illustrates the architecture of the ZigBee protocol stack. ZigBee classifies devices into two classes: full function devices (FFDs) and reduced function devices (RFDs). FFDs are expected to be the more powerful nodes in a ZigBee network. They participate in functions such as core network formation and data relaying. Also, typically they are expected to be on at all times. RFDs attach to the ZigBee networks via FFDs. They only communicate to their corresponding FFDs in a controlled manner. Hence, RFDs can have very
0163-6804/08/$25.00 © 2008 IEEE
[Figure 1 labels. ZigBee layers: application (APL), containing application objects (APO), the ZigBee device object (ZDO), and application support (APS); security service provider; network (NWK); medium access control (MAC); physical (PHY). OSI reference layers: application, transport, network, data link, physical.]
ZIGBEE/IEEE 802.15.4 PHY
IEEE 802.15.4 [3] PHY supports unlicensed industrial, scientific, and medical (ISM) radio frequency (RF) bands including 868 MHz, 915 MHz, and 2.4 GHz. Although the sub-GHz bands offer a longer communication range (at the same transmission power) and better penetration characteristics than the 2.4-GHz band, they are regional, whereas the 2.4-GHz band is globally available. Moreover, when operating in a sub-GHz band, as specified in the original IEEE 802.15.4-2003 standard, ZigBee supports only very low data rates. Hence, a majority of the first wave ZigBee products use the 2.4-GHz band. ZigBee communication uses direct sequence spread spectrum (DSSS). In the 2.4-GHz band, each 4-bit data block is mapped to a 32-chip pseudo-random sequence transmitted at 2 Mchip/s using offset quadrature phase shift keying (O-QPSK), which yields a ZigBee data rate of 250 kb/s. In total, 16 channels, each 5 MHz wide, are available in the 2.4-GHz band for ZigBee.
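The 2.4-GHz data rate quoted above follows directly from the chip rate and the symbol mapping. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative only and do not come from the standard.

```python
# Arithmetic sketch of the 2.4-GHz PHY data rate described in the text.
# All names here are illustrative; the numbers are the ones quoted above.
CHIP_RATE = 2_000_000     # chips per second (2 Mchip/s)
BITS_PER_SYMBOL = 4       # each 4-bit data block ...
CHIPS_PER_SYMBOL = 32     # ... is spread over a 32-chip sequence


def phy_data_rate(chip_rate=CHIP_RATE,
                  bits_per_symbol=BITS_PER_SYMBOL,
                  chips_per_symbol=CHIPS_PER_SYMBOL):
    """Return the raw PHY data rate in bits per second."""
    symbol_rate = chip_rate / chips_per_symbol   # symbols per second
    return symbol_rate * bits_per_symbol


if __name__ == "__main__":
    print(phy_data_rate())  # 250000.0 bit/s, i.e., 250 kb/s
```

This is the raw rate only; the throughput actually available to an application is lower once framing, acknowledgements, and channel access are accounted for.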
■ Figure 1. Block diagram of ZigBee layers.
■ Figure 2. ZigBee topologies (star, cluster tree, and mesh), showing ZigBee coordinators, ZigBee routers, and ZigBee devices.
simple software. In addition, RFDs often can go to sleep to save battery power and allow their corresponding FFDs to handle communications on their behalf. In a ZigBee network, devices may take on one of three roles:
• End device: an FFD or an RFD that executes applications and does not have a child node.
• Router: an FFD that can relay messages and usually has both child nodes and parent nodes.
• Coordinator: an FFD that controls a ZigBee network and also can be a gateway to the external world, a trust center, and an access authenticator, all in one.
Each ZigBee network has one coordinator but can have multiple routers and end devices. Like end devices, routers and the coordinator may execute their own sensor/controller applications. ZigBee supports multiple multihop network topologies, as illustrated by Fig. 2.
ZIGBEE/IEEE 802.15.4 MAC
In the MAC layer, there are two options for medium access: beacon-based and non-beacon-based. In the non-beacon scenario, there is no time synchronization among the ZigBee devices. Devices access the channel using carrier sense multiple access with collision avoidance (CSMA/CA). Each transmitter is required to assess the channel before transmitting. If the channel is detected as busy, the transmitter backs off and senses the channel again at the end of the backoff window. IEEE 802.15.4 MAC also provides link layer data acknowledgement (ACK) and retransmission functions. The beacon-based access divides airtime into units of fixed length known as superframes, each of which is bounded by two consecutive beacons and consists of up to three periods: a contention access period (CAP), an optional contention free period (CFP), and an inactive period. The CAP and CFP together contain a number (16 by default) of equally sized time slots. A beacon frame must be transmitted in the first slot. The slots in the CFP are called guaranteed time slots (GTS), which can be allocated by the personal area network (PAN) coordinator to support urgent real-time applications. During the CAP, the channel is accessed using slotted CSMA/CA. Typically, the beacon-based access is used only in networks structured as star and cluster tree topologies as shown in Fig. 2, whereas the flat mesh topology networks use non-beacon access. The star and cluster tree topologies are not as flexible and robust as the mesh topology. Also, because it is difficult to add the network structure and time synchronization required by the beacon-based approach atop self-organized and potentially large ZigBee mesh networks, virtually all commercially available ZigBee systems currently support only non-beacon MAC access. Hence, in this article, we focus on studying the feasibility of voice communication over non-beacon ZigBee networks.
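To make the non-beacon channel access concrete, the following Python sketch mimics one unslotted CSMA/CA attempt: wait a random backoff, sense the channel, and widen the backoff window if the channel is busy. The numeric constants are the IEEE 802.15.4 default values as we recall them (they are not stated in this article and should be treated as assumptions), and channel_is_busy is a hypothetical callback standing in for the clear channel assessment.

```python
import random

# Assumed IEEE 802.15.4 defaults (not given in the article).
MAC_MIN_BE = 3              # initial backoff exponent
MAC_MAX_BE = 5              # maximum backoff exponent
MAC_MAX_CSMA_BACKOFFS = 4   # extra attempts allowed after the first one
UNIT_BACKOFF_US = 320       # one backoff period at 2.4 GHz, in microseconds


def unslotted_csma_ca(channel_is_busy, rng=random):
    """Simulate one unslotted CSMA/CA channel access attempt.

    channel_is_busy: hypothetical zero-argument callable that returns True
    when the clear channel assessment finds the channel occupied.
    Returns (success, total_backoff_time_in_microseconds).
    """
    backoff_exponent = MAC_MIN_BE
    waited_us = 0
    for _ in range(MAC_MAX_CSMA_BACKOFFS + 1):
        # Wait a random number of backoff periods in [0, 2^BE - 1].
        periods = rng.randint(0, 2 ** backoff_exponent - 1)
        waited_us += periods * UNIT_BACKOFF_US
        if not channel_is_busy():
            return True, waited_us      # channel idle: transmit now
        # Channel busy: widen the window (up to the cap) and retry.
        backoff_exponent = min(backoff_exponent + 1, MAC_MAX_BE)
    return False, waited_us             # channel access failure


if __name__ == "__main__":
    # Example: a channel that is busy 60% of the time.
    print(unslotted_csma_ca(lambda: random.random() < 0.6))
```

Even on an idle channel the transmitter waits a random backoff before its first transmission, which is one source of the additional delay that matters for voice traffic.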
NETWORK LAYER PROTOCOL ZigBee has multihop store-and-forward capability designed as an integral part of the system. This function is implemented within the network layer. In addition to forwarding data, the ZigBee network layer protocol also deals with issues such as devices joining and leaving the network, configuring new devices, device addressing, and neighbor and route discovery. Multiple service primitives are defined to enable a device to join (or leave) the network through invocation of MAC layer association or disassociation. Each ZigBee device is identified by a unique 64-bit IEEE address. In addition, after joining a ZigBee network, a ZigBee device is assigned a 16-bit short network address for intra-network communications. Admission control and address management of a ZigBee network are performed by its coordinator node. For structured topologies, such as a cluster tree, the short addresses are allocated in a maskable fashion similar to IP addressing. This addressing scheme reflects the hierarchical nature of the topology. Also, in structured ZigBee networks, scheduling of beacon transmissions is used to guarantee that each device is allocated a slice of time for contention-free channel access. As a result, the beacon interval must be much larger than the superframe duration in a network with a large node density. In the beacon-enabled mode, a coordinator broadcasts beacons periodically to synchronize the attached devices. In the non-beacon-enabled mode, a coordinator does not broadcast beacons periodically but may unicast a beacon to a device that is soliciting beacons. With ZigBee routing, an end-to-end path can be established, and data can be transmitted successfully from the source to the destination. There are three routing approaches in ZigBee. The first one is hierarchical routing, which relays data frames through the tree structure that is formed when devices (re-)join the network through association.
This tree topology (for example, the star and the cluster tree in Fig. 2) is rooted at the ZigBee coordinator. The hierarchical routing works as follows: • Data frames climb up the tree from the source device toward the ZigBee coordinator. • If the destination device is between the source device and the ZigBee coordinator, data frames will be received by the destination device; otherwise, data frames will go down the tree from the ZigBee coordinator until arrival at the destination device. This kind of routing does not require a dedicated routing protocol, but the disadvantage is that it usually takes a long path to climb up and go down through the tree. In addition to hierarchical routing, ZigBee provides a request-response-based routing protocol derived from the Ad hoc On-demand Distance Vector (AODV) [4] routing algorithm. There are two major commands in this routing protocol: route request (RREQ) and route reply (RREP). RREQ is broadcast hop-by-hop beginning at the source device and
rebroadcast by intermediate devices until it reaches the destination device, which in turn generates an RREP that traverses the reverse path of the RREQ back to the source device. When the RREP reaches the source device, a route is established. The difference between the ZigBee routing protocol and the original AODV protocol is that ZigBee routing attempts to select the route with the least cost rather than the least number of hops. To do this, ZigBee specifies a field in the RREQ called path cost, defined as the sum of the link costs along the path. Each intermediate device calculates its link cost and updates the path cost field accordingly when the RREQ is rebroadcast. As a result, the destination device, when it receives an RREQ, knows the cost of the route over which the RREQ arrived, and it can choose the route with the least cost and notify the source device by sending the RREP back along the reverse of the least-cost path. Additionally, the path cost field can be utilized by intermediate devices to filter and avoid unnecessary RREQ broadcasts. In ZigBee, the cost of a link l, C(l), is an integer-valued parameter in the range of [0, 7], defined as follows:
$$C(l) = \begin{cases} 7, \\ \min\left(7, \operatorname{round}\left(p_l^{-4}\right)\right), \end{cases} \qquad (1)$$
where $p_l$ is defined as the probability of packet delivery on the link l [2]; that is, a device may either report the constant cost 7 or compute the cost from $p_l$. The third routing approach is source-routed data transmission. The route record command is designed so that the destination device of this command can record a whole route through the network and construct the source route table, which can be utilized later by this device to perform source-routed data transmission.
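As an illustration of why least-cost routing can differ from least-hop routing, the Python sketch below evaluates Eq. (1) per link, sums the link costs into a path cost, and picks the cheapest candidate route. The function names and the example delivery probabilities are hypothetical; treating a non-positive delivery estimate as cost 7 is also our assumption.

```python
def link_cost(p_delivery):
    """Link cost per Eq. (1): min(7, round(p^-4)).

    p_delivery is the estimated packet delivery probability of the link.
    A non-positive estimate is mapped to the constant cost 7 (assumption).
    """
    if p_delivery <= 0:
        return 7
    return min(7, round(p_delivery ** -4))


def path_cost(delivery_probs):
    """Path cost: the sum of the link costs along the path."""
    return sum(link_cost(p) for p in delivery_probs)


def best_route(candidates):
    """Return the name of the candidate route with the least path cost.

    candidates maps a route name to the list of per-link delivery
    probabilities along that route (hypothetical example data).
    """
    return min(candidates, key=lambda name: path_cost(candidates[name]))


if __name__ == "__main__":
    routes = {
        "two_hops_lossy": [0.7, 0.7],            # link cost 4 each
        "three_hops_clean": [0.95, 0.95, 0.95],  # link cost 1 each
    }
    for name, probs in routes.items():
        print(name, path_cost(probs))
    print("selected:", best_route(routes))
```

In this made-up example the three-hop route has a path cost of 3 while the two-hop route costs 8, so the longer but more reliable route would be preferred.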
ZIGBEE SECURITY ZigBee security is offered at several levels. Link-level security, specified by IEEE 802.15.4, provides security services for access control, message integrity, message confidentiality, and replay protection. In addition, ZigBee provides a message integrity check. When security is enabled, IEEE 802.15.4 data is protected using a particular mode of 128-bit AES, that is, AES-CTR, AES-CBC-MAC, or AES-CCM, depending on security requirements. In addition, ZigBee supports network-level security by applying AES encryption using a network-wide key. Application-layer security also can be provided with a pairwise key established between the communicating peers. ZigBee security is generally considered adequate for its applications.
ZIGBEE APPLICATIONS The most common applications targeted by ZigBee technology are medical, residential, and industrial control and monitoring. Examples include lighting controls; automatic meter reading; wireless smoke and carbon monoxide (CO) detectors; heating, ventilation, and air conditioning (HVAC) control; home security; environmental controls; medical sensing and monitoring; universal remote control to a set-top box; and industrial automation. Generally speaking, ZigBee is designed to transmit low-rate data with low energy consumption over short distances. Originally, it was not intended for voice transmission. However, if ZigBee nodes already are deployed in an area for sensing and control applications, they provide a communication infrastructure that may become valuable to tap into for other applications under special circumstances. To this end, we study the feasibility of carrying non-secure VoIP and PTT communications over standard ZigBee networks.
VOICE OVER ZIGBEE In this section we introduce voice quality evaluation methodology and then survey the recent schemes that support streaming voice over ZigBee.
SPEECH QUALITY
Subjective Rating — Traditionally, the telecommunications industry has measured voice performance in terms of customer ratings of speech quality on a five-point scale, where an excellent rating is given a score of 5; good, a 4; fair, a 3; poor, a 2; and bad, a 1 [1]. Different approaches have been used to translate these ratings into an overall single measure from which speech quality can be judged. A popular approach is to calculate the arithmetic mean of scores known as a mean opinion score (MOS). A MOS ranges from a minimum of 1 to a maximum of 5. A MOS of 4.0 or higher is considered toll quality, and a lower limit of 3.0 is suggested for usable telephony.
ITU-T E-model (R-factor): A well-known and widely used approach for evaluating voice quality is the E-model defined in International Telecommunication Union-Telecommunication (ITU-T) Rec. G.107 [5]. The E-model combines individual impairments (loss, delay, echo, codec type, noise, etc.) due to both the signal properties and the network characteristics into a single, overall measure of conversational voice quality called the rating factor (R-factor) [1]. This quantity is defined as follows:
$$R = 100 - I_s - I_d - I_{ef} + A, \qquad (2)$$
where $I_s$ is the signal-to-noise impairment, $I_d$ is the impairment associated with the mouth-to-ear delay of the path, $I_{ef}$ is the equipment impairment factor that captures the effect of information loss due to both the encoding scheme and packet loss, and $A$ is the expectation factor. For a G.729a voice codec with the assumption of random packet loss, the previous equation is simplified to the following expression [1]:
$$\begin{aligned} R &= 94.1 - 0.024d - 0.11(d - 177.3)H(d - 177.3) - 11 - 40\ln(1 + 10e), \\ d &= d_{codec} + d_{jitterbuffer} + d_{network}, \\ e &= e_{network} + (1 - e_{network})\,e_{jitterbuffer}, \end{aligned} \qquad (3)$$
where $H(\cdot)$ is the Heaviside step function, and $d$ is the one-way mouth-to-ear delay that consists of three components: the delay associated with the codec ($d_{codec}$), the delay associated with the dejitter buffer introduced to smooth out the delay variation ($d_{jitterbuffer}$), and the one-way transit delay across the IP transport network ($d_{network}$). The variable $e$ represents the loss probability, including loss in the IP transport network ($e_{network}$) and the loss resulting from the dejitter buffer at the decoder ($e_{jitterbuffer}$). Reference [1] provides a simplified method to calculate the previous components in $d$ and $e$, based on the metrics at the transport level. R and MOS are related as follows [1]:
• $R < 0$ corresponds to MOS = 1;
• $R > 100$ corresponds to MOS = 4.5;
• $0 < R < 100$ corresponds to MOS $= 1 + 0.035R + 7 \times 10^{-6} R(R - 60)(100 - R)$.
A higher R, which results from a smaller $d$ and a smaller $e$, yields a higher MOS. An R-factor between 50 and 60, 60 and 70, 70 and 80, 80 and 90, or 90 and 100 indicates poor, low, medium, high, or best voice quality, respectively.
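The simplified R-factor and its mapping to MOS can be evaluated with a few lines of Python, as sketched below. Two assumptions are worth flagging: the loss term is read as − 11 − 40 ln(1 + 10e), which is our interpretation of the printed form of Eq. (3), and the delay is taken in milliseconds. The function names and the sample inputs are illustrative, not values from the simulations.

```python
import math


def total_loss(e_network, e_jitterbuffer):
    """Combine network loss and dejitter-buffer loss as in Eq. (3)."""
    return e_network + (1.0 - e_network) * e_jitterbuffer


def r_factor(d_ms, e):
    """Simplified G.729a R-factor of Eq. (3).

    d_ms: one-way mouth-to-ear delay in milliseconds (assumed unit).
    e:    overall packet loss probability in [0, 1].
    The loss term is read as -11 - 40*ln(1 + 10e) (our interpretation).
    """
    heaviside = 1.0 if d_ms > 177.3 else 0.0
    return (94.1
            - 0.024 * d_ms
            - 0.11 * (d_ms - 177.3) * heaviside
            - 11.0
            - 40.0 * math.log(1.0 + 10.0 * e))


def mos_from_r(r):
    """Map an R-factor to a mean opinion score using the relation from [1]."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


if __name__ == "__main__":
    # Hypothetical operating point: 150 ms delay, 1% network loss,
    # 0.5% additional loss in the dejitter buffer.
    e = total_loss(0.01, 0.005)
    r = r_factor(150.0, e)
    print(round(r, 1), round(mos_from_r(r), 2))
```

Because the loss term is logarithmic in e, the first few percent of packet loss cost far more R-factor points than the same increase at an already lossy operating point.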
VOICE OVER ZIGBEE The performance of ZigBee networks has been extensively investigated, for example, in [6] and [7]. However, those studies are not specific to voice quality. The authors of [8] reported how they extended their Firefly ZigBee work to provide real-time voice communication capability for underground miners separated by long tunnels. However:
• The Firefly node has an additional AM radio interface for time synchronization, a highly specialized design that is not applicable to general ZigBee devices.
• Firefly communication is based on a nonstandard time division multiple access (TDMA)-based link protocol designed specifically for linear network topology; it is difficult to generalize the approach to a standard ZigBee network of any topology.
Our question, whether voice communication is feasible over commercial off-the-shelf (COTS) ZigBee hardware, where the normal operating mode is non-beacon-enabled, unsynchronized, and unslotted CSMA/CA, therefore remains unanswered by [8]. Several characteristics of ZigBee technology can affect its capability of carrying voice communications. First, ZigBee networks have limited bandwidth, only up to 250 kb/s. This limitation confines the maximum number of supportable voice calls or sessions. Second, channel access contentions in ZigBee networks are resolved by the CSMA/CA protocol, which inevitably introduces additional waiting time for transmissions and leads to smaller effective bandwidth and increased delay; both could degrade the quality of voice communications. In addition, to maintain the advantage of low cost, a ZigBee node usually has a low-gain antenna design, limited computation capability, and limited buffer size, which can affect the voice quality as well.
SIMULATION STUDIES This section describes simulation studies conducted using the network simulator (NS2) [9] and the corresponding simulation results for two types of voice communications: full-duplex VoIP
and half-duplex PTT. Linear network topology with N nodes is used in the experiments, with distance D between neighboring nodes. Voice communications are between the two end nodes only. The transmission range (TXR, which is the maximum distance at which a transmission can be successfully received) is 15 meters, and the carrier sense range (CSR, which is the maximum distance at which a transmission can be detected) is 15 or 30 meters.
PERFORMANCE OF VOIP Each full-duplex VoIP connection is simulated by two constant bit-rate (CBR) flows in opposite directions. The parameters of the CBR flows follow those of the G.729a codec: 20 bytes of data every 20 ms. Adding the RTP, UDP, and IPv4 headers, each VoIP packet becomes 60 bytes long. Using IPv4/UDP/RTP header compression (e.g., IETF RFC 3095 [10]), a 40-byte IPv4/UDP/RTP header can be compressed to only one byte. In our study, we consider both cases: with header compression (W/HC) and without header compression (W/O HC). The distance D is set to eight meters. The buffer size in every node is 50 packets with first-in, first-out (FIFO) queuing and tail-drop discipline. The R-factor is used to measure the quality of VoIP. To calculate the R-factor, a dejitter buffer of six packets is applied. Table 1 presents the resultant R-factor (Eq. 3) for different configurations. From these results, it can be concluded that:
• Between two directly connected nodes, two (or three, if header compression is in use) G.729a VoIP calls of medium voice quality can be supported.
• For two hops, if the hidden terminal problem can be avoided, that is, if CSR is at least twice TXR, one VoIP call can be supported.
• When the number of hops exceeds three, support of G.729a VoIP on ZigBee networks becomes unreliable.
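The packet sizes above translate into an offered load per call that can be compared with the 250 kb/s nominal rate. The Python sketch below does this arithmetic at the IP level only; it deliberately ignores IEEE 802.15.4 MAC/PHY framing, acknowledgements, and CSMA/CA backoff, so it gives an optimistic bound rather than the capacity observed in the simulations. All names are illustrative.

```python
# IP-level offered load of a G.729a VoIP call, using the numbers in the text.
PAYLOAD_BYTES = 20            # G.729a voice data per packet
FRAME_INTERVAL_MS = 20        # one packet every 20 ms
FULL_HEADER_BYTES = 40        # IPv4 + UDP + RTP, uncompressed
COMPRESSED_HEADER_BYTES = 1   # after header compression
NOMINAL_RATE_BPS = 250_000    # nominal ZigBee data rate at 2.4 GHz


def flow_rate_bps(header_bytes):
    """Bit rate of one direction of a call, headers included."""
    packet_bits = (PAYLOAD_BYTES + header_bytes) * 8
    return packet_bits * 1000 / FRAME_INTERVAL_MS


def call_rate_bps(header_bytes):
    """A full-duplex call consists of two opposite CBR flows."""
    return 2 * flow_rate_bps(header_bytes)


if __name__ == "__main__":
    for label, header in (("W/O HC", FULL_HEADER_BYTES),
                          ("W/HC", COMPRESSED_HEADER_BYTES)):
        rate = call_rate_bps(header)
        share = 100 * rate / NOMINAL_RATE_BPS
        print(f"{label}: {rate / 1000:.1f} kb/s per call, "
              f"{share:.1f}% of the nominal rate")
```

Without header compression a single call already offers 48 kb/s at the IP level, which helps explain why only a few calls fit on one hop once per-frame overhead and channel contention are added.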
PERFORMANCE OF PTT PTT is a half-duplex voice communication and therefore is more delay and jitter tolerant than (full-duplex) VoIP. Each PTT session consists of a series of bursts, each corresponding to the time duration within which a single user talks. We use one-way CBR traffic of the same duration as each voice segment to mimic each voice burst. The adaptive multirate (AMR) codec at 5.15 kb/s is assumed, with each 20-ms voice frame digitized into a 13-byte AMR frame. We assume that Namr (= 5) voice frames are aggregated in a single IP voice packet.
• For each PTT session, on average, there are four bursts, and the average duration of each burst is seven seconds.
• Session arrival follows a Poisson process with an average arrival rate, λ.
To observe the maximum number of PTT sessions, we assume that there are Ns active sessions; that is, we set λ = Ns/(4 × 7). The distance between nodes is D = 8 m, and CSR = 2TXR = 30 meters. Two buffer sizes, 50 and 200 packets, are considered, with FIFO queuing and tail-drop assumed.
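The PTT traffic parameters can be turned into concrete numbers with the Python sketch below: the voice payload per aggregated packet, the packet interval, the arrival rate λ for a target number of active sessions, and a sample of Poisson session arrival times. Reading λ = Ns/(4 × 7) as "Ns sessions in progress on average, each with 28 s of talk time" is our interpretation, and the function names and the fixed random seed are illustrative choices.

```python
import random

AMR_FRAME_BYTES = 13      # one 20-ms AMR frame at 5.15 kb/s
FRAME_MS = 20
N_AMR = 5                 # voice frames aggregated per IP packet
BURSTS_PER_SESSION = 4    # average number of bursts per session
MEAN_BURST_S = 7          # average burst duration in seconds


def ptt_packet():
    """Return (voice payload bytes, packet interval in ms) per IP packet."""
    return N_AMR * AMR_FRAME_BYTES, N_AMR * FRAME_MS


def arrival_rate(n_sessions):
    """Session arrival rate (sessions/s): lambda = Ns / (4 * 7)."""
    return n_sessions / (BURSTS_PER_SESSION * MEAN_BURST_S)


def session_arrival_times(n_sessions, horizon_s, seed=0):
    """Sample Poisson session arrival times within [0, horizon_s].

    Inter-arrival times are exponential with rate lambda; n_sessions
    must be positive. The seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    lam = arrival_rate(n_sessions)
    now, times = 0.0, []
    while True:
        now += rng.expovariate(lam)
        if now > horizon_s:
            return times
        times.append(now)


if __name__ == "__main__":
    print(ptt_packet())                    # payload bytes, interval in ms
    print(arrival_rate(2))                 # sessions per second for Ns = 2
    print(session_arrival_times(2, 120.0))
```

Aggregating five AMR frames per packet trades 100 ms of packetization delay for fewer channel accesses, which suits the more delay-tolerant PTT service.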
Two hops
One hop1 VoIP calls
CSR = TXR
W/O HC
W/HC
1
78.9
2
CSR = 2TXR
Three hops2
W/O HC
W/ HC
W/O HC
W/ HC
79.2