A NETWORK PROGRAMMING LANGUAGE BASED ON CONCURRENT PROCESSES AND REGULAR EXPRESSIONS

Kenta Hatori
Systems Development Division, Research Institute of Systems Planning, Inc. / ISP
Tokyo, Japan
email: [email protected]

Kei Hiraki
Graduate School of Information Science and Technology, The University of Tokyo
Tokyo, Japan
email: [email protected]

ABSTRACT
We propose a new programming language, Preccs, for networking and for implementing communication protocols. Preccs is based on the ideas of concurrent processes and regular expressions. We designed the language so that communication protocols can be described simply and intuitively. We have also developed a Preccs compiler that generates C code from a Preccs source program. Generating C code with the Preccs compiler makes it possible to develop reliable communication programs in a short period. In order to evaluate the effectiveness of our approach, we have implemented a simple HTTP server and a simple VoIP application in Preccs. Both are implemented in a small amount of code.

KEY WORDS
network programming, concurrent processes, regular expressions, domain specific languages

1 Introduction

Network programming in C with the socket API is tiring work, although it is the common way to implement network protocols and applications such as HTTP servers, FTP clients, and SNMP agents. Generally speaking, a typical network protocol consists mainly of two parts: (1) definitions of the formats of the messages that are transferred to other nodes, and (2) a specification of message sequences. However, it is hard work to parse complicated messages such as an HTTP request in C. Furthermore, when a network application sends or receives a message, it has to handle other events simultaneously, such as timeouts and user cancel events. The socket API is insufficient for programming in such a situation because a socket is merely an abstract representation of a communication endpoint. In this case, we need to use asynchronous I/O, multiplexing I/O, or multi-threading. For example, a snippet of a program that receives input from a socket and from standard input could be written as follows:

    fd_set rfds;
    int ret;

    /* Build the set of descriptors to watch. */
    FD_ZERO(&rfds);
    FD_SET(STDIN_FILENO, &rfds);
    FD_SET(so, &rfds);

    /* Block until standard input or the socket so becomes readable. */
    ret = select(so + 1, &rfds, NULL, NULL, NULL);
    if (FD_ISSET(STDIN_FILENO, &rfds)) {
        printf("Input from STDIN\n");
    }
    if (FD_ISSET(so, &rfds)) {
        printf("Input from SOCK\n");
    }

This example uses the select system call for multiplexing I/O. The select system call can be used with sockets to provide synchronous I/O multiplexing. Although this is quite a simple example, it is unnecessarily complicated because of inessential steps such as the initialization of the descriptor set.

In order to solve these problems, we propose a new language called Preccs that is suitable for network programming and for implementing communication protocols. In Preccs, an implementation of a protocol consists mainly of two parts: (1) definitions of rules for message sequences, in a notation based on concurrent process calculus, and (2) definitions of message formats, in extended regular expressions that make it simple to describe complicated structured messages. We also developed a Preccs compiler that generates C code from a Preccs source program. Generating C code with the Preccs compiler makes it possible to develop reliable communication programs in a short period.

The rest of the paper is organized as follows. Section 2 discusses related work. The Preccs language is briefly described in Section 3. Section 4 explains how to translate Preccs to C code. After showing experimental results in Section 5, we discuss the effectiveness of Preccs in Section 6. Finally, Section 7 concludes and discusses future directions.
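For comparison, the select-style wait discussed in this section can also be sketched in Python, whose standard selectors module wraps the same kind of I/O multiplexing. This is an illustration only and is not part of Preccs: the helper name wait_for_input, the labels, and the use of socket pairs in place of standard input and a network socket are assumptions made for the example. The timeout argument corresponds to the timeout event mentioned above.

```python
import selectors
import socket


def wait_for_input(sources, timeout):
    """Wait until a source is readable; return the ready labels ([] on timeout)."""
    sel = selectors.DefaultSelector()
    try:
        for label, sock in sources.items():
            sel.register(sock, selectors.EVENT_READ, data=label)
        ready = sel.select(timeout)
        return sorted(key.data for key, _events in ready)
    finally:
        sel.close()


if __name__ == "__main__":
    # Two connected socket pairs stand in for stdin and a network socket.
    stdin_tx, stdin_rx = socket.socketpair()
    sock_tx, sock_rx = socket.socketpair()
    sources = {"STDIN": stdin_rx, "SOCK": sock_rx}

    print(wait_for_input(sources, timeout=0.1))  # nothing has been sent yet
    sock_tx.sendall(b"ping")
    print(wait_for_input(sources, timeout=1.0))
    for s in (stdin_tx, stdin_rx, sock_tx, sock_rx):
        s.close()
```

Even in a scripting language, the program still has to register descriptors and dispatch on the result by hand.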

2 Related Work

Prolac[7] is a statically typed, object-oriented programming language designed for writing readable, modular, extensible, and efficient network protocol implementations. [8] reports that the Prolac TCP implementation achieves performance comparable to an unmodified Linux 2.0 TCP. However, it has not been shown whether Prolac is applicable to protocols of other layers or to network programming in general.

Erlang[2, 3] is a declarative language for programming concurrent and distributed systems. It was developed at Ericsson, where it is used to write huge real-time control programs for telephone exchanges and network switches. Erlang supports concurrency and has built-in primitives for asynchronous message passing between processes. Erlang is now used for a number of industrial applications; the largest consists of approximately 250,000 lines of Erlang code. However, Erlang does not provide a mechanism for parsing protocol messages comparable to the regular expression pattern matching in Preccs.

Concurrent ML (CML)[11] is an extension of the functional language Standard ML. CML supports the programming of process communication and synchronization through a higher-order concurrent programming mechanism that allows programmers to define their own communication and synchronization abstractions. Pict[10] is a concurrent programming language built on an explicitly typed π-calculus[9] core language. Although neither CML nor Pict is designed for networking, the design of Preccs is influenced by these languages.

XDuce[5] is a programming language based on regular expression types and pattern matching. Although XDuce is specifically designed for processing XML data, many of its ideas are applicable to Preccs.

3 The Preccs Language

In this section, we briefly describe the features of the Preccs language.

3.1 Processes in Preccs

Processes are the main entities of computation in Preccs, so a Preccs program consists of several processes. A Preccs process is different from a conventional Unix process: it is scheduled by the Preccs runtime system. For the present, we regard a Preccs process as a more abstract object.

3.1.1 Process Definition

Defining a process is straightforward. The following example defines a hello world process that simply writes "Hello, world!" to the standard output.

    proc HelloProc() = stdout!"Hello, world!\n"

In general, a process definition has the following form:

    proc p(x1:T1, ..., xn:Tn) = process-expression

where p is a process name, the xi are the formal parameter names, and the Ti are the formal parameter types. The process expression is a combination of atomic actions that include input, output, and process creation. Figure 1 shows the syntax of process expressions.

    P ::= stop | skip | var x : T | var x = e
        | p(e1, ..., en)
        | P1 ; ... ; Pn
        | G1 -> P1 | ... | Gn -> Pn
        | e @ Σ
        | C{ C-program C}
        | G
    G ::= e1 ! e2
        | e ? x
    Σ ::= e1 -> P1 | ... | en -> Pn
        | x1:R1 -> P1 | ... | xn:Rn -> Pn

Figure 1. Syntax of process expressions in Preccs

Variable Definition: Both var x : T and var x = e define a new variable x. The former initializes x to the default value of type T, and the latter to the value of the expression e.

Process Creation: p(e1, ..., en) creates a new process p, which then runs concurrently. e1, ..., en are evaluated and passed to the process as arguments.

Sequential Execution: P1 ; ... ; Pn executes the processes P1 through Pn sequentially.

Selective Execution: G1 -> P1 | ... | Gn -> Pn waits until one of the guards G1, ..., Gn is enabled, and then executes the corresponding process among P1, ..., Pn. When several guards are enabled at the same time, the choice of which process is executed is nondeterministic.

Guarded Process: e1 ! e2 sends the value e2 on channel e1, and e ? x receives a value on channel e and binds that value to the variable x.

Pattern Matching: e @ Σ is a pattern match process that tries to match the value e against the patterns in Σ. Σ is a list of pairs of a pattern and a process. e1 -> P1 | ... simply tests which of the values e1 through en equals the value of e. x1:R1 -> P1 | ... is called regular expression pattern matching, which is described below.

3.1.2 Inline C code

Preccs supports embedding C code in a process. The C code is enclosed by the special character sequences 'C{' and 'C}'. The following is an example of inline C code.

    proc Hello() =
      stdout!"Hello,world!(from Preccs)\n";
      C{ printf("Hello,world!(from C)\n"); C}

C code can also refer to variables defined by the Preccs process.

    proc PrintInt(n:int) =
      C{ printf("%d", TOCINT($n$)); C}

$n$ refers to the variable n defined in the Preccs code. TOCINT is a macro that converts the Preccs representation of an int into the C representation.

3.1.3 Recursive Process

We use recursion to express iteration, since Preccs does not provide an iterative control structure. The following example is a countdown process defined as a recursive process.

    proc CountDown(n:int) =
      n @ 0 -> stop
        | _ -> CountDown(n-1)

3.2 Communication Channel

Processes in Preccs communicate with each other by sending and receiving messages (values) over communication channels. Message passing is synchronous, which means that both the sender and the receiver must be ready to communicate before either can proceed. Each channel is typed with the sort of value that it can transmit.

3.2.1 Channel Creation

Channels can be created dynamically in processes. var ch:<T> creates a new channel of type T. This is quite similar to a variable definition. The process expression var ch:<T>;P binds ch with scope P. The following example shows how two processes communicate:

    proc Main() = var ch:<string>; Sender(ch); Receiver(ch)
    proc Sender(ch:<string>) = ch!"Hello"
    proc Receiver(ch:<string>) = ch?msg

The sender and receiver processes share the channel ch, which is passed to both as a parameter. The sender process sends the message "Hello" on the channel ch, and the receiver process receives the message from the channel.
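The synchronous behavior of such a channel can be imitated with threads. The following Python sketch is only an analogy for the Sender/Receiver example above and says nothing about how the Preccs runtime is implemented. The class name Channel and its methods are hypothetical, and the sketch assumes a single sender and a single receiver per channel.

```python
import queue
import threading


class Channel:
    """Synchronous channel sketch: send() returns only after a receive()."""

    def __init__(self):
        self._slot = queue.Queue()

    def send(self, value):
        self._slot.put(value)
        self._slot.join()          # block until the receiver calls task_done()

    def receive(self):
        value = self._slot.get()   # block until a sender has put a value
        self._slot.task_done()     # release the blocked sender
        return value


def sender(ch, log):
    ch.send("Hello")
    log.append("sent")             # reached only after the message was received


def main():
    ch = Channel()
    log = []
    t = threading.Thread(target=sender, args=(ch, log))
    t.start()
    msg = ch.receive()
    t.join()
    return msg, log
```

Calling main() returns the received message together with the sender's log. The sender cannot record "sent" until the receiver has taken the value.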

3.2.2 Built-in Channels

Several channels are provided as built-in channels, which are used for special purposes. The stdout channel is used to display text. The stdin channel is used to get input from the console. Both the stdout and stdin channels have the string channel type, which restricts them to carrying string values. A simple example using the stdin and stdout channels is as follows:

    proc Echo() = stdin?msg; stdout!msg; Echo()

Here the Echo process waits to receive input data from the console and binds msg to the data. The process then sends it on the stdout channel, so the input data is displayed on the console.

The timer channel is introduced to realize timeouts. timer!10 blocks a process for 10 seconds. A timeout is realized by combining selective execution with the timer channel. For example, the following process blocks until some input arrives on the stdin channel or 10 seconds elapse, and then executes the corresponding clause.

    stdin?x -> stdout!"input message:"^x
      | timer!10 -> stdout!"timeout.\n"

In the above example, '^' is an operator that concatenates strings.

3.2.3 I/O Channel

An I/O channel is a special channel that is associated with an I/O stream such as a socket or a file descriptor. In fact, the stdout and stdin channels are also I/O channels, associated with the standard output and input streams. With I/O channels, multiplexing I/O is easy to realize. For instance, the snippet shown in Section 1 can be rewritten with I/O channels as follows:

    stdin?msg -> stdout!"Input from STDIN\n"
      | sock?msg -> stdout!"Input from SOCKET\n"

where sock is an I/O channel associated with a certain socket. Preccs provides several library functions for associating a channel with an I/O stream.

3.3 Type System

Preccs is a statically typed language, so the Preccs compiler checks the type safety of every operation at compile time. Figure 2 shows the syntax of type expressions in Preccs. bool, int, and string are base types. <T> is the type of channels that carry values of type T. Array types and record types are similar to those of conventional programming languages. Next, we introduce regular expression types.

    T ::= bool | int | string              Base type
        | <T>                              Channel type
        | { l1:T1; ...; ln:Tn }            Record type
        | T[n]                             Array type
        | {R}                              Regular expression type
    R ::= octet
        | string-literal
        | R1 ; R2
        | R1 | R2
        | R* | R+ | R?
        | { l1:R1; ...; ln:Rn }
        | R[n]
        | R[l]

Figure 2. Syntax of type expressions

    Message:  | op (1 octet) | id (4 octets) | opts[0] | opts[1] | ... | end (0xFF) |
    Option:   | tag | len | data (len octets) |

Figure 3. An example of message format

The form of a type definition is as follows:

    type tname = type-expression

where tname is a type name.

3.3.1 Regular Expression Types

Regular expression types[6] have been proposed as a foundation for statically typed processing of XML. Preccs extends regular expression types so that the format of a message is easy to express. octet represents one octet of data. A string literal is a sequence of characters within double quotes and represents itself. R1;R2 is concatenation and R1|R2 is alternation. R*, R+, and R? repeat R zero or more times, one or more times, and zero or one time, respectively. {l1:R1; ...; ln:Rn} is similar to the concatenation of R1 through Rn, but attaches a label to each component. R[n] repeats R n times. R[l] also repeats R, where the number of repetitions is the value of the field labeled l, interpreted as an integer.
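The length-dependent repetition R[l] is the part that ordinary regular expressions lack. As a rough illustration, the following Python sketch matches a byte string against a repeated record of the form { tag: octet; len: octet; data: octet[len] }, where the len field decides how many octets data consumes. The function name parse_tlv_sequence and the error messages are assumptions made for this example. The sketch describes the intended meaning of the type, not the matcher generated by the Preccs compiler.

```python
def parse_tlv_sequence(buf):
    """Match buf against ({ tag: octet; len: octet; data: octet[len] })*.

    Returns a list of (tag, data) pairs, or raises ValueError if buf
    cannot be split exactly into such records.
    """
    records = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError(f"truncated header at offset {pos}")
        tag, length = buf[pos], buf[pos + 1]
        start, end = pos + 2, pos + 2 + length   # octet[len]: len drives the count
        if end > len(buf):
            raise ValueError(f"truncated data at offset {start}")
        records.append((tag, bytes(buf[start:end])))
        pos = end
    return records
```

For instance, the six octets 01 02 AA BB 07 00 split into a record with tag 1 and two data octets, followed by a record with tag 7 and no data.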

Figure 3 shows an example of a message format that is very common in communication protocols. The corresponding type definition can be written as follows:

    type Message = {{ op : octet;
                      id : octet[4];
                      opts : Option*;
                      end : "FF"h }}
    type Option = {{ tag : octet;
                     len : octet;
                     data : octet[len] }}

The type Option represents a format that is often called TLV (Type-Length-Value or Tag-Length-Value). Note that the TLV format cannot be represented directly by ordinary regular expressions because of its variable-sized field. The extended regular expressions of Preccs make it possible to represent the TLV format directly.

3.3.2 Subtyping

Subtyping is an inclusion relation between types. Subtyping can be introduced naturally on the regular expression types. We write A stdout!"others" where '_' denotes a default pattern that matches any value. In the above example, if the value of the variable msg matches the type HttpGetRequest, a new variable x is bound with the type HttpGetRequest and passed to the process ProcGet as a parameter.

4 Preccs to C Compilation


In this section, we briefly explain how the Preccs compiler translates a Preccs source program into C code. After parsing the source program, the translation proceeds in several phases: (1) translation into a simple π-calculus-like intermediate representation, (2) conversion to continuation-passing style (CPS) λ-calculus, followed by closure conversion, and (3) translation from CPS to C.
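To give a feel for phase (2), the following Python sketch writes the CountDown process of Section 3.1.3 in continuation-passing style and runs two instances under a small round-robin loop. It is a minimal sketch under stated assumptions: Python closures stand in for closure-converted C code, the names count_down, done, and run are hypothetical, and the name and log arguments are added only so that the interleaving can be observed. It is not the output of the Preccs compiler.

```python
from collections import deque


def done():
    """Final continuation: nothing left to run."""
    return None


def count_down(name, n, log, k):
    """CPS form of: proc CountDown(n) = n @ 0 -> stop | _ -> CountDown(n-1)."""
    if n == 0:
        return k                                   # 'stop': continue with k
    log.append((name, n))
    return lambda: count_down(name, n - 1, log, k)  # next step as a closure


def run(processes):
    """Round-robin loop: each step returns the next thunk, or None when finished."""
    ready = deque(processes)
    while ready:
        step = ready.popleft()
        nxt = step()
        if nxt is not None:
            ready.append(nxt)
```

Because every step hands back its own continuation, a run loop like this can interleave many processes on one thread without a separate stack per process.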


4.1 Translation into the Intermediate Representation

First, the Preccs compiler translates a program into a simple π-calculus representation that serves as an intermediate representation. This translation is relatively straightforward because the syntax of the Preccs language is similar in structure to the π-calculus. Note that regular expression pattern matching is translated into a state transition table for an automaton and a call to the runtime routine that executes the pattern matching. Moreover, simple optimizations, such as removing unnecessary communications, are performed on this representation. We adapted the algorithms in [1] for the conversion to CPS. The generation of C code from CPS is based on [14].

4.2 Pattern Match Compilation

As mentioned above, regular expression pattern matching is translated into a state transition table, and a pattern matching automaton is implemented using this table. Note that the automaton has several repetition counters because the regular expressions in Preccs are extended as described in Section 3. In order to realize efficient pattern matching, we adopt a deterministic automaton. Moreover, the Preccs compiler implements an optimization that uses type information to avoid unnecessary character matching.
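An automaton with repetition counters can be sketched by hand for the Message type of Section 3.3.1. The Python code below is a hand-written state machine, not a generated transition table, and all state and function names are assumptions made for this example. To keep the machine deterministic, the sketch also assumes that an option tag is never 0xFF, so that 0xFF in tag position always means the end marker. The paper does not say how Preccs resolves that case.

```python
OP, ID, TAG, LEN, DATA, DONE = range(6)


def match_message(buf):
    """Return True if buf matches: op, id[4], Option*, 0xFF (one byte per step)."""
    state, counter = OP, 0
    for byte in buf:
        if state == OP:
            state, counter = ID, 4          # id : octet[4]
        elif state == ID:
            counter -= 1
            if counter == 0:
                state = TAG
        elif state == TAG:
            # Assumption: 0xFF here is the end marker, never an option tag.
            state = DONE if byte == 0xFF else LEN
        elif state == LEN:
            counter = byte                  # data : octet[len]
            state = DATA if counter > 0 else TAG
        elif state == DATA:
            counter -= 1
            if counter == 0:
                state = TAG
        else:
            return False                    # input continues after the end marker
    return state == DONE
```

The two counters play the role of octet[4] and octet[len]: the first is loaded with a constant, the second with a value read from the input.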

5 Experimental Results

In order to evaluate the effectiveness of Preccs, we have implemented a simple HTTP server and a simple VoIP application in Preccs.

5.1 The Simple HTTP Server

An HTTP server is a typical server application: it receives and parses a request from a client and serves the requested contents. We have implemented a simple HTTP server that supports only the GET method. The server is implemented in fewer than 70 lines of code. It is a concurrent server that handles multiple clients simultaneously.

Figure 4. Comparison of Preccs HTTP server and Apache (y-axis: requests/sec; x-axis: data size in bytes, from 50 to 800)

Table 1. Lines of code by file

    File       LoC   Notes
    voip.prc   243   Preccs program
    voip.c      46   main(), initialization
    g711.c     309   codec
    Total      598

Next, we compared the performance of the server in Preccs with that of the Apache HTTP server. Figure 4 shows the performance measured in HTTP GET requests per second. The x-axis indicates the size of the data. The measurements were taken with http_load (footnote 1). Although this is not a strict comparison, because Apache is a full-featured HTTP server, it is sufficient to show that the performance of the server in Preccs is adequate for practical use.

5.2 The VoIP Application

We have also implemented in Preccs a simple VoIP application that communicates between two PCs. This application uses a small subset of SIP[12] and RTP[13]. In general, VoIP services need to handle several I/O sources simultaneously, such as sockets, sound devices, and user input, so they are complicated to implement in a conventional programming language. In contrast, the synchronous channels and selective execution of Preccs greatly simplify the handling of these I/O sources. Table 1 gives a breakdown of the implementation by file. The entire implementation is less than 600 lines of code, and only about 300 lines excluding the codec.

6 Discussion

As pointed out in the introduction, conventional programming languages and sockets are not sufficient for network programming in terms of productivity. To solve this problem, our approach is to develop a new programming language based on concurrent processes and regular expressions. Other approaches are also possible. One is to use a high-level scripting language such as Perl, Ruby, or Python. However, these languages also burden programmers with the programming style of sockets and select. Some languages support a Remote Procedure Call (RPC) mechanism that allows a program to call code that is not local to it. RPC is intended to act like a procedure call, but to act across the network transparently. It is useful for some types of network programs. However, it is not suitable for implementing network protocols because the data representation is hidden by the RPC stub.

1 http://www.acme.com/software/http_load/

7 Conclusions

In this paper, we have presented a new programming language, Preccs, which simplifies network programming and the implementation of communication protocols. The Preccs language is designed around concurrent processes and regular expressions. We confirmed the effectiveness of our approach through experimental implementations of a simple HTTP server and a simple VoIP application. We plan to apply model checking techniques[4] to the verification of Preccs programs. We consider Preccs suitable for model checking because it is based on concurrent processes. We expect that using Preccs together with model checking will enable the development of highly reliable systems.

Acknowledgments This work is supported by a grant from IPA Exploratory Software Project 2004 and 2005.

References

[1] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, New York, NY, USA, 1992.
[2] Joe Armstrong, Robert Virding, Claes Wikstrom, and Mike Williams. Concurrent Programming in ERLANG. Prentice Hall, second edition, 1996.
[3] Joe L. Armstrong. The development of Erlang. In International Conference on Functional Programming, pages 196–203, 1997.
[4] Edmund M. Clarke, Jr., Orna Grumberg, and Doron A. Peled. Model Checking. The MIT Press, Cambridge, Massachusetts, 1999.
[5] Haruo Hosoya and Benjamin C. Pierce. XDuce: A statically typed XML processing language. ACM Trans. Inter. Tech., 3(2):117–148, 2003.
[6] Haruo Hosoya, Jérôme Vouillon, and Benjamin C. Pierce. Regular expression types for XML. ACM SIGPLAN Notices, 35(9):11–22, 2000.
[7] Eddie Kohler. Prolac: A language for protocol compilation. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1997.
[8] Eddie Kohler, M. Frans Kaashoek, and David R. Montgomery. A readable TCP in the Prolac protocol language. ACM SIGCOMM'99, 29:3–13, 1999.
[9] Robin Milner. Communicating and Mobile Systems: the π-Calculus. Cambridge University Press, 1999.
[10] Benjamin C. Pierce and David N. Turner. Pict: A programming language based on the pi-calculus. CSCI Technical Report 476, Computer Science Department, Indiana University, Indiana, 1997.
[11] John H. Reppy. Concurrent Programming in ML. Cambridge University Press, 1999.
[12] J. Rosenberg, H. Schulzrinne, G. Camarillo, et al. SIP: Session initiation protocol, 2002. (RFC 3261).
[13] H. Schulzrinne, S. Casner, R. Frederick, et al. RTP: A transport protocol for real-time applications, 1996. (RFC 1889).
[14] David Tarditi, Peter Lee, and Anurag Acharya. No assembly required: Compiling Standard ML to C. ACM Letters on Programming Languages and Systems, 1(2):161–177, June 1992.
