Multimedia Distribution using Network Coding on the iPhone Platform

Péter Vingelmann
Budapest University of Technology and Economics, Dept. of Automation and Applied Informatics, Budapest, Hungary

Morten V. Pedersen
Aalborg University, Dept. of Electronic Systems, Aalborg, Denmark

Frank H. P. Fitzek
Aalborg University, Dept. of Electronic Systems, Aalborg, Denmark

Janus Heide
Aalborg University, Dept. of Electronic Systems, Aalborg, Denmark

ABSTRACT

This paper looks into the implementation details of random linear network coding on the Apple iPhone and iPod Touch mobile platforms for multimedia distribution. Previous implementations of network coding on this platform failed to achieve a throughput sufficient to saturate the WLAN interface. Extending previous work, we compare new implementations based on two different Galois fields: GF(2^8) and GF(2). Using the binary Galois field allows us to ensure high throughput and low computational requirements on mobile devices with limited resources. We have implemented this approach and achieved a synthetic encoding/decoding throughput of up to 36/29 MB/s on a third-generation iPod Touch 32GB, which exceeds the results of other researchers by two orders of magnitude.

Categories and Subject Descriptors H.m [Information Systems Applications]: Miscellaneous

General Terms Experimentation

Keywords Network coding, performance evaluation, iPhone platform

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'10, October 25–29, 2010, Firenze, Italy. Copyright 2010 ACM 978-1-60558-933-6/10/10 ...$10.00.

1. INTRODUCTION

Multimedia distribution in wireless P2P networks is expected to become more important in the near future. Efficient data distribution can be realized using network coding. The concept was introduced in information theory by Ahlswede et al. [1] in 2000, and it has received a lot of attention in recent years. Several research works considered the idea from a theoretical point of view [4, 6], while others implemented network coding [5, 7] to prove the feasibility of this novel technique.

The core idea of network coding is to take several original packets and combine them into one coded packet, which has the same length as one original packet. If we combine N packets, then at least N coded packets are necessary to successfully decode the original packets at the receiver. An encoding vector is attached to each coded packet to tell the receiver which packets have been combined. Like any other coding scheme, network coding can be used to deal with erasures. In addition, network coding offers the possibility to recode packets inside the network, i.e. each node can recombine previously received coded packets into a new coded packet.

Using network coding may provide multiple benefits, namely improved throughput, a high degree of robustness, lower complexity, and improved security. Chachulski et al. [2] have shown that random network coding can significantly improve the end-to-end throughput of unicast sessions, provided that multiple paths between the source and the destination are used simultaneously. Network coding can be applied in many mobile communication scenarios such as multicast or meshed networking. A motivating application is data distribution in wireless ad-hoc networks, where the objective is to share pictures, sound, video, etc. among a set of source nodes and a set of receivers or sink nodes.

In this work we consider the implementation of random linear network coding (RLNC) on the iPhone platform, including the second and third generation iPod Touch, and the iPhone 3G.
The iPhone is a state-of-the-art mobile platform, and it has already been widely used for streaming multimedia applications from the Internet. Its ARMv6 core is used in many other contemporary mobile devices, so we can use our results as an estimate of what can be achieved on similar platforms. Note that mobile devices have limited resources such as energy, memory, and computational power, in addition to the general problems in mobile networking such as limited wireless capacity. Special care must be taken if we intend to perform computationally intensive operations on any mobile platform.

Shojania et al. [8] have already implemented random linear network coding on the iPhone, but the measured data rates were no greater than 420 kilobytes per second, and the high additional CPU load for encoding and decoding had a negative impact on battery-constrained devices. The authors only considered network coding over GF(2^8), which involves a relatively small probability of generating linearly dependent packets, but also imposes a significant computational overhead. In [3] the authors proposed basing RLNC on the binary Galois field in order to decrease the computational complexity. This approach can help to significantly increase throughput and decrease the CPU load and energy consumption of NC implementations. Although the probability of generating linearly dependent packets increases if we use GF(2), the computational overhead introduced by these extra packets is far smaller than the cost of the heavy calculations over GF(2^8). Moreover, it has been shown that for a given desired throughput GF(2) allows for a substantially higher generation size than GF(2^8), so it can be more efficient from a network point of view in some cases. Therefore, in this paper we provide two separate implementations based on GF(2) and GF(2^8), and we compare their performance in terms of encoding and decoding throughput.

This work is organized as follows. Section 2 introduces the concept of RLNC. In Section 3 we describe the two network coding implementations. Section 4 presents measurement results obtained with the application. The final conclusion is drawn in Section 5.

2. RANDOM LINEAR NETWORK CODING

Like any other coding scheme, network coding consists of two distinct parts: encoding means creating coded packets from the original packets, and decoding means transforming these coded packets back to the original format. The data to be sent is divided into packets, and a certain number of these packets forms a generation. A generation is a series of packets that are encoded and decoded together. During encoding, we use random coefficients to form linear combinations of data packets. All operations are performed over a Galois field. Let N be the number of packets in a generation, and let P be the size (in bytes) of a single data packet. Each encoded packet contains a header (N bytes in GF(2^8) and N bits in GF(2)) and a payload (P bytes). The header part is often referred to as the encoding vector. At least N linearly independent encoded packets are necessary to retrieve the original data set at the decoder side.
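The definitions above can be written compactly as follows. This is only a sketch: the symbols x_j, c_{j,i}, p_i and S are introduced here for illustration and do not appear in the original text.

```latex
% p_1, ..., p_N : original packets of one generation (P bytes each)
% c_{j,1}, ..., c_{j,N} : encoding vector (header) of the j-th encoded packet
% x_j : payload of the j-th encoded packet, computed symbol by symbol
x_j = \sum_{i=1}^{N} c_{j,i} \, p_i , \qquad c_{j,i} \in GF(q), \quad q \in \{2,\; 2^8\}

% S : size in bytes of one encoded packet (header + payload)
S_{GF(2^8)} = N + P , \qquad S_{GF(2)} = \lceil N/8 \rceil + P
```

For example, with N = 16 and P = 512 an encoded packet would occupy 528 bytes over GF(2^8) and 514 bytes over GF(2), assuming the bit header is packed into whole bytes.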

2.1 Encoding

First, the encoder has to read N original packets in order to generate encoded packets for this generation. The original data (N · P bytes) is stored in a matrix of corresponding dimensions. Random coefficients are also necessary for the encoding process; each encoded packet requires N coefficients. The probability of generating linearly dependent encoded packets depends on the Galois field size and the generation size, as shown in [3]. This means that more than N encoded packets are necessary in some cases. The payload of each encoded packet is calculated by multiplying the header as a vector with the data matrix. The computational complexity of this operation depends heavily on the field size. Note that any number of additional encoded packets can be generated, because the code is rateless.
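To give a feeling for how strongly the field size affects linear dependence, the sketch below evaluates the standard expression for the probability that N uniformly random encoding vectors of length N over GF(q) are linearly independent, i.e. the product of (1 - q^-i) for i = 1..N. This is a textbook result and not necessarily the exact analysis used in [3]; the function name `full_rank_probability` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that n uniformly random encoding vectors of length n over GF(q)
// are linearly independent: prod_{i=1..n} (1 - q^-i).
// Illustrative helper, not part of the described application.
double full_rank_probability(double q, int n) {
    assert(q >= 2.0 && n >= 0);
    double probability = 1.0;
    double q_pow = 1.0;               // holds q^-i inside the loop
    for (int i = 1; i <= n; ++i) {
        q_pow /= q;
        probability *= 1.0 - q_pow;
    }
    return probability;
}
```

For N = 64 this evaluates to roughly 0.29 for GF(2) and roughly 0.996 for GF(2^8). A dependent packet does not cause decoding to fail; it only means that the receiver needs a further encoded packet.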

2.2 Decoding

We use a modified, on-the-fly version of Gauss-Jordan elimination for decoding. It guarantees that the data is always maximally decoded, so the load of decoding is distributed evenly over time. The encoded packets from the same generation are aggregated together, containing both the encoding vector and the payload part. Upon receiving a coded packet, the new data is processed using the previously received packets. The elimination is based on the encoding vector, but the corresponding operations are also performed on the payload part. The decoder stores the received, and partially decoded, data packets in an N × (N + P) decoding matrix.

First, the forward substitution is performed, where we subtract multiples of already received packets from the newly arrived packet. If a packet is linearly independent, then the modified encoding vector will have a non-zero pivot element. If the pivot coefficient is not one, the row has to be normalized by dividing all of its elements by this coefficient. After this step the new row can be inserted into the decoding matrix based on the position of its pivot element. The last step is to propagate this row back to the existing non-zero rows. The algorithm stops when the decoding matrix is in reduced row echelon form, at which point the payload part contains the original decoded data.

3. IMPLEMENTATION

Our application was originally written in C++, and it was designed to be a platform-independent command-line tool for throughput measurements. Unfortunately, all applications on the iPhone platform are required to have a GUI, and using the Objective-C language is mandatory for GUI development. However, the Xcode development environment uses GCC 4.2 as its internal compiler, and it is possible to compile C++ source files in Objective-C++ mode and link with the generated object files. Consequently, the GUI that is written in Objective-C can call regular C++ code, so we can maintain a high degree of platform independence. We use the -O3 and -ftree-vectorize optimization flags of the GCC compiler. The throughput measurements always run in the application's main thread in order to get maximum priority and minimize interference with other threads and processes. Unfortunately, this means that the GUI becomes unresponsive during batch measurements.

Encoding and decoding are not particularly complex operations over GF(2). Encoding is implemented by bitwise XORing several original data packets, selected by the ones in the corresponding encoding vector, which is a random bit vector. Decoding is performed in two steps with a slightly modified Gauss-Jordan algorithm. The forward substitution is done by enumerating the elements of the encoding vector. If an element is one and we have a corresponding row in the decoding matrix, then we XOR the new packet with that row; otherwise this element is the pivot and we can go to the next step. The packet is discarded if no pivot element is found. Backward substitution means subtracting (i.e. XORing) the pivot packet from previously received packets that have a one at the pivot position of the new packet.
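The GF(2) encoding and on-the-fly decoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `CodedPacket`, `encode_gf2` and `Gf2Decoder` are hypothetical, the encoding vector is assumed to be packed into bytes with the least significant bit first, and the forward pass keeps scanning after the pivot is found so that the new row is also cleared at all existing pivot positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// One coded packet: N coefficient bits (packed, LSB first) plus P payload bytes.
struct CodedPacket {
    Bytes coeffs;
    Bytes payload;
};

inline bool get_bit(const Bytes& bits, std::size_t i) {
    return ((bits[i / 8] >> (i % 8)) & 1u) != 0;
}

// dst ^= src for encoding vector and payload (addition and subtraction in GF(2)).
inline void xor_packet(CodedPacket& dst, const CodedPacket& src) {
    for (std::size_t i = 0; i < dst.coeffs.size(); ++i) dst.coeffs[i] ^= src.coeffs[i];
    for (std::size_t i = 0; i < dst.payload.size(); ++i) dst.payload[i] ^= src.payload[i];
}

// Encoding: XOR together the original packets selected by the ones in coeffs.
CodedPacket encode_gf2(const std::vector<Bytes>& data, Bytes coeffs) {
    assert(!data.empty() && coeffs.size() * 8 >= data.size());
    CodedPacket out{std::move(coeffs), Bytes(data[0].size(), 0)};
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (!get_bit(out.coeffs, i)) continue;
        for (std::size_t b = 0; b < out.payload.size(); ++b) out.payload[b] ^= data[i][b];
    }
    return out;
}

// On-the-fly Gauss-Jordan decoder for one generation of n packets.
class Gf2Decoder {
public:
    explicit Gf2Decoder(std::size_t n) : rows_(n), present_(n, false) {}

    // Returns true if the packet was linearly independent (innovative).
    bool add(CodedPacket pkt) {
        const std::size_t n = rows_.size();
        std::size_t pivot = n;
        // Forward substitution: cancel every position that already has a row;
        // the first remaining one becomes the pivot.
        for (std::size_t i = 0; i < n; ++i) {
            if (!get_bit(pkt.coeffs, i)) continue;
            if (present_[i]) xor_packet(pkt, rows_[i]);
            else if (pivot == n) pivot = i;
        }
        if (pivot == n) return false;  // linearly dependent: discard
        // Backward substitution: clear the pivot position in stored rows.
        for (std::size_t i = 0; i < n; ++i)
            if (present_[i] && get_bit(rows_[i].coeffs, pivot)) xor_packet(rows_[i], pkt);
        rows_[pivot] = std::move(pkt);
        present_[pivot] = true;
        ++rank_;
        return true;
    }

    bool complete() const { return rank_ == rows_.size(); }

    // Valid once complete(): payload of original packet i.
    const Bytes& packet(std::size_t i) const { return rows_[i].payload; }

private:
    std::vector<CodedPacket> rows_;   // row i has its pivot at position i
    std::vector<bool> present_;
    std::size_t rank_ = 0;
};
```

A caller would feed every received packet to `add()` and read the original data through `packet(i)` once `complete()` returns true. Since the field is GF(2), the pivot is always one and no normalization step is needed.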

If we consider GF(2^8), the fundamental question is how to realize the Galois field arithmetic. Addition corresponds to the XOR operation, which can be performed natively on the CPU. Multiplication, on the other hand, is more complicated over GF(2^8). The traditional method is to use two look-up tables for the log and exp functions, and calculate the product x · y using the formula exp[log[x] + log[y]]. Such an implementation requires three memory reads and one addition for each multiplication. In [8] the authors suggest using a loop-based approach to compute the product procedurally every single time it is needed.

We use a simpler solution based on a single look-up table. The multiplication table can be pre-calculated and stored in a static byte array. This table occupies only 256 × 256 bytes = 65,536 bytes (64 KiB) in memory, and it may reside in the cache. Multiplication can be performed by looking up a certain element in this array. If we are to multiply x and y, then the element at index x · 256 + y must be retrieved from the multiplication array. In general this requires one binary shift, one addition and one memory read for each multiplication. However, when we add a multiple of a packet to another packet, one of the factors (x) remains the same, so we can select a single row of the table as a base pointer. Consequently, most multiplications will only require one pointer addition and one memory read. Moreover, it is highly likely that the selected row (only 256 bytes) is stored in the cache, so this approach can be more efficient than the traditional log/exp look-ups and the loop-based implementation.

Note that encoding means generating linear combinations of existing packets, which can be performed over GF(2^8) with XOR operations and multiplication as described above. Random coefficients are generated with a simple xorshift random number generator. The forward and backward substitution phases of decoding only involve linear combinations, and they can be performed in a similar way. The normalization phase also involves division over GF(2^8), though it is executed only once for each innovative packet. Division can be realized using a single look-up table, similarly to multiplication.
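The single-table approach, the row base pointer and the division table can be sketched as follows. Several details are assumptions made for illustration: the text does not name the reduction polynomial, so the sketch uses x^8 + x^4 + x^3 + x + 1 (0x11B); the xorshift variant shown is the common 32-bit one with shifts 13, 17 and 5; and all identifiers (`gf256_mul_slow`, `Gf256Tables`, `multiply_add`, `normalize`, `XorShift32`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise shift-and-add product, used only to fill the tables.
// Assumption: reduction polynomial 0x11B (the text does not specify one).
std::uint8_t gf256_mul_slow(std::uint8_t a, std::uint8_t b) {
    std::uint8_t r = 0;
    while (b != 0) {
        if (b & 1u) r ^= a;
        const bool overflow = (a & 0x80u) != 0;
        a = static_cast<std::uint8_t>(a << 1);
        if (overflow) a ^= 0x1Bu;
        b >>= 1;
    }
    return r;
}

// Pre-calculated 256 x 256 byte tables (64 KiB each).
struct Gf256Tables {
    std::vector<std::uint8_t> mul;  // mul[x * 256 + y] = x * y
    std::vector<std::uint8_t> div;  // div[x * 256 + y] = x / y, defined for y != 0
    Gf256Tables() : mul(256 * 256), div(256 * 256, 0) {
        for (unsigned x = 0; x < 256; ++x) {
            for (unsigned y = 0; y < 256; ++y) {
                const std::uint8_t prod = gf256_mul_slow(static_cast<std::uint8_t>(x),
                                                         static_cast<std::uint8_t>(y));
                mul[x * 256 + y] = prod;
                if (y != 0) div[prod * 256u + y] = static_cast<std::uint8_t>(x);
            }
        }
    }
};

const Gf256Tables& tables() {
    static const Gf256Tables instance;  // built once on first use
    return instance;
}

// dst[i] += c * src[i]. The factor c is fixed, so one 256-byte table row is
// selected as a base pointer and each product costs a single memory read.
void multiply_add(std::uint8_t* dst, const std::uint8_t* src, std::uint8_t c, std::size_t len) {
    const std::uint8_t* row = tables().mul.data() + c * 256u;
    for (std::size_t i = 0; i < len; ++i) dst[i] ^= row[src[i]];
}

// Normalization: divide every element of a row by its non-zero pivot coefficient.
void normalize(std::uint8_t* row, std::size_t len, std::uint8_t pivot) {
    assert(pivot != 0);
    const std::vector<std::uint8_t>& div = tables().div;
    for (std::size_t i = 0; i < len; ++i) row[i] = div[row[i] * 256u + pivot];
}

// 32-bit xorshift generator; the state must be seeded with a non-zero value.
struct XorShift32 {
    std::uint32_t state;
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};
```

In such a design, encoding one packet would call `multiply_add` once per original packet with a coefficient taken from the low byte of `next()`, and the decoder would reuse the same routine for forward and backward substitution.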

4. RESULTS

In this section we present results obtained with our network coding benchmark application running on an iPhone 3G, a second-generation iPod Touch 8GB and a third-generation iPod Touch 32GB. Encoding and decoding throughput was measured with different packet sizes (P) and generation sizes (N) using implementations based on GF(2) and GF(2^8). A simple approach would be to measure encoding and decoding times for a single generation, but this would lead to inaccurate results. In order to simulate a more realistic scenario we chose the following setup: during each test iteration the objective is to encode and subsequently decode a 512 KB data buffer. This buffer is divided into several generations based on the current packet size and generation size, and the encoder generates several sets of encoded packets for these generations. These encoded packets are then forwarded to the decoder, which tries to decode all generations to retrieve the entire data set. The aggregate encoding and decoding times are measured during each iteration. A full test consists of 1000 iterations. All scalars in the encoding vectors are generated using a uniform distribution.
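The bookkeeping behind such a measurement can be sketched as follows. This is an assumption-laden illustration, not the benchmark application itself: the helper names are hypothetical, one MB is taken as 2^20 bytes, and `measure` times an arbitrary callable standing in for the encoding or decoding of the whole buffer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Number of generations needed to cover the buffer (the last may be partial).
std::size_t generations_in_buffer(std::size_t buffer_bytes, std::size_t n, std::size_t p) {
    assert(n > 0 && p > 0);
    const std::size_t generation_bytes = n * p;
    return (buffer_bytes + generation_bytes - 1) / generation_bytes;
}

// Throughput in MB/s, with 1 MB = 2^20 bytes (an assumption of this sketch).
double throughput_mb_per_s(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

// Runs `work` (e.g. encoding the whole buffer) `iterations` times and returns
// the aggregate throughput over all iterations.
template <typename Work>
double measure(Work&& work, std::size_t bytes_per_iteration, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return throughput_mb_per_s(bytes_per_iteration * iterations, elapsed.count());
}
```

With a 512 KB buffer, N = 16 and P = 512, each generation covers 8192 bytes, so the buffer is split into 64 generations.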

The average results and standard deviations are presented in the following figures. Note that decoding results are shown inside the wider bars for encoding values.

Figure 1: Encoding/decoding throughput on the iPhone 3G (OS version 3.1.2)

As seen in Figure 1, the coding rates for GF(2) and GF(2^8) on the iPhone 3G are quite different. Encoding in GF(2) can be faster than 16 MB/s for N = 16, P = 512, whereas in GF(2^8) it does not exceed 2 MB/s for any setting. In general, GF(2) is approximately 8 times faster. As expected, the encoding and decoding speed decreases as the generation size increases. Decoding throughput is only slightly lower than encoding throughput in most settings. Theoretically, the number of operations needed for encoding and decoding is the same, so it should be possible to achieve similar encoding and decoding throughputs. We see that decoding throughput is quite close to encoding rates for higher generation sizes in GF(2) and for all settings in GF(2^8). This is an interesting observation, since on desktop computers decoding is usually 15-30% slower than encoding according to measurements in [3, 9]. We also observe that for smaller generation sizes in GF(2) the throughput decreases when the packet size increases. However, this tendency does not hold for other settings. If we compare the GF(2^8) results with the previous implementation by Shojania et al. [8], we observe an average 50% increase in throughput.

Figure 2: Encoding/decoding throughput on the 2ndGen iPod Touch 8GB (OS version 3.1.2)

Figure 2 shows a 25-35% speed-up on the second-generation iPod Touch compared to the iPhone 3G. This observation is in accordance with the ratio of processor frequencies on the two devices. The ARM1176 CPU in the iPhone 3G is underclocked to 412 MHz, and the clock frequency is 533 MHz on the 2ndGen iPod Touch (a 29% increase). In comparison with the implementation in [8], we report an average 46% increase in encoding and decoding rates.

In Figure 3 we can see that the third-generation iPod Touch is almost twice as fast as the second-generation iPod Touch, and it is 150% faster than the iPhone 3G. This device is widely considered to be the fastest in the iPhone family. It is comparable to the iPhone 3GS, which has the same processing core. This is particularly interesting, because the ARM Cortex-A8 core is underclocked to 600 MHz in the third-generation iPod Touch, which is only a 12.5% speed-up compared to the previous generation. However, Apple states that the iPhone 3GS is "up to 2X faster" than the iPhone 3G due to the improved processor performance. For small generation sizes in GF(2) we observe that the packet size has less impact on the encoding throughput compared to the other two devices. The tendency is the same on all three devices: the throughput is approximately a first-order function of the generation size. Note that these network coding benchmarks may not accurately measure the raw processing power of the different devices. The performance of general-purpose applications may vary significantly.

5. CONCLUSION

In this paper we have presented two different implementations of random linear network coding for the iPhone platform. Our implementation over GF(2^8) has surpassed the previous throughput results of Shojania et al. [8] by 50% in encoding and decoding on the iPhone. More importantly, we have shown that GF(2) can be extremely useful if high throughput and low energy consumption are important parameters. Our GF(2)-based implementation is 6-10 times faster than its GF(2^8)-based counterpart for all settings. Note that the current platform-independent solution can be optimized through assembly and SIMD instructions to deliver even higher throughputs. Moreover, the iPhone 3GS and the third-generation iPod Touch are equipped with a PowerVR SGX GPU which supports OpenGL ES 2.0 shaders. This makes it possible to run GPU-based network coding on mobile devices. An OpenGL-based implementation is described in [9], which can be ported to the iPhone platform in the future.

6. ACKNOWLEDGMENTS

This work was partially financed by the CONE project (Grant No. 09-066549/FTP) granted by the Danish Ministry of Science, Technology and Innovation, as well as by the collaboration with NOKIA throughout the ENOC project.

Figure 3: Encoding/decoding throughput on the 3rdGen iPod Touch 32GB (OS version 3.1.3)

7. REFERENCES

[1] R. Ahlswede, N. Cai, S. Y. R. Li, and R. W. Yeung. Network information flow. IEEE Transactions on Information Theory, 46(4):1204–1216, 2000.
[2] S. Chachulski, M. Jennings, S. Katti, and D. Katabi. Trading structure for randomness in wireless opportunistic routing. SIGCOMM Comput. Commun. Rev., 37(4):169–180, 2007.
[3] J. Heide, M. Pedersen, F. Fitzek, and T. Larsen. Network coding for mobile devices - systematic binary random rateless codes. In Workshop on Cooperative Mobile Networks 2009 - ICC09. IEEE, June 2009.
[4] T. Ho, R. Koetter, M. Médard, D. Karger, and M. Effros. The benefits of coding over routing in a randomized setting. In Proceedings of the IEEE International Symposium on Information Theory, ISIT '03, June 29 - July 4, 2003.
[5] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Médard, and J. Crowcroft. XORs in the air: practical wireless network coding. In Proceedings of SIGCOMM '06, pages 243–254. ACM Press, September 11-15, 2006.
[6] M. Médard and R. Koetter. Beyond routing: An algebraic approach to network coding. In INFOCOM, 2002.
[7] J.-S. Park, M. Gerla, D. S. Lun, Y. Yi, and M. Médard. CodeCast: a network-coding-based ad hoc multicast protocol. Wireless Communications, IEEE [see also IEEE Personal Communications], 13(5):76–81, October 2006.
[8] H. Shojania and B. Li. Random network coding on the iPhone: Fact or fiction? In ACM NOSSDAV 2009, June 2009.
[9] P. Vingelmann, P. Zanaty, F. H. P. Fitzek, and H. Charaf. Implementation of random linear network coding on OpenGL-enabled graphics cards. In European Wireless 2009, Aalborg, Denmark, May 2009.
