A generic Component Framework for High Performance Locally Concurrent Computing based on UML 2.0 Activities

Tim Schattkowsky, C-LAB
Alexander …, University of Paderborn, uni-paderborn.de
Abstract

Software support for hardware threads like the Hyperthreading technology or the upcoming multi-core desktop processors is required even in traditional single processor domains such as home and office systems. Although the modeling of concurrent systems is already quite advanced, current engineering practice usually does not yield highly concurrent applications without extra effort, due to several limitations of most methods for concurrent systems design. Unlike other methods, we consider this a deployment problem where the software components need to be deployed on multiple different execution units depending on the system configuration. To overcome this, we present a component model and design approach based on the execution semantics of UML 2.0 Activities that enables the design and construction of software applications with increased inherent concurrency and scalability for multi-processor platforms. The application of the approach and its benefits are demonstrated in a real-world web server example.
1. Introduction

The design and construction of efficient concurrent software systems is still one of the most challenging tasks for software engineers. We observe that the actual application of concurrency in software systems varies depending on the application domain. For some domains, software systems tend to be highly concurrent. Servers exploiting the inherent concurrency of multiple client connections are a typical example for such a system. Systems that need to meet certain timing requirements (e.g., embedded systems and multimedia systems) also tend to employ concurrency to fulfill those requirements. However, there are many applications where concurrent behavior is only employed when it is enforced by the platform. Many of those applications are desktop applications that implicitly target single processor systems, where users will not recognize any benefit from increased
concurrency in these applications, based on the assumption that a fully loaded processor is not likely to increase its overall performance significantly when the same program behavior is merely scheduled differently. However, increasing engineering and production costs and physical limits make single core processor designs far less attractive, since the actual core logic takes less and less space on the silicon compared to the added caches, which could be shared on multi-core processors. Thus, for the next generation, multi-core processors appear to be a more cost-effective alternative for increasing computing power compared to speeding up single processor cores. All major processor manufacturers, including Intel, AMD, and Motorola, have announced such products for the next year. Furthermore, multiple hardware threads in a single core have arrived even at desktop systems in the form of Intel's Hyperthreading technology, which introduces two logical processors sharing the execution units of a physical processor. However, most contemporary software designs are not able to exploit the additional computing power offered by such systems. While some of those applications like word processors are not likely to profit significantly performance-wise from increased concurrency at all, many applications that are strongly processor bound could do so. Examples for such applications range from computer games over multimedia processing to scientific applications.

In this paper, we introduce a component-based design approach for concurrent systems using UML 2.0 Activity Diagrams that inherently increases system concurrency compared to traditional synchronous component approaches and additionally supports automatic replication of components. Furthermore, we describe crucial aspects of the implementation of an execution framework for components based on our approach. We demonstrate how complex design model elements and scalability features can be mapped to a framework of components using Activity-Diagram-like execution semantics. Finally, we outline the application of the approach based on a web server example and conclude with an overview of ongoing and future work related to the approach.
2. Related Works

In the domain of component systems research, there exist many publications about component frameworks, service composition, and related modeling concepts based on Petri Nets, UML, etc. However, established component technologies like COM and CORBA do not focus on the concurrency aspect, especially not for software systems that are not physically distributed. Petri Nets are the most commonly used design mechanism that actually captures concurrency aspects at the system level. However, Petri Nets are mostly employed in workflow modeling and service composition approaches like [1]. Only a few specific implementations complement these design approaches.

Hamadi and Benatallah formulated an approach for modeling the semantics of component composition in [7] which is mainly focused on Web services. In their work, the control flows of the single components are described using Petri nets. Based on that, they define "Service Nets" and Web services as Petri nets with some additional properties. The main part is about the composition of the components, for which the authors define a "Web service algebra" that contains all basic control flow constructs like sequence, parallelism, alternative choice, iteration, etc., so that finally a number of properties about the resulting overall system can be inferred. Although this paper reveals interesting aspects about the modeling of component composition, it does not show how conclusions about the execution of the resulting system can be derived from the resulting model. Their approach also contains some important restrictions. One example is that component Petri nets may only have one input and one output place, which is a severe limitation for our purposes.

Ehrig et al. describe a generic framework for modeling component systems based on architectures like CORBA or COM. This is useful if the modeling is done entirely using graph- or net-based modeling techniques. The framework is based on a generic notion of transformations of specifications based on rule-based graph transformations and high-level replacement systems. As a result, the semantics of a system can be inferred from that of its components. This is only the case if every aspect of each component can be formalized.

Another approach for component composition can be found in the work of Padberg et al., which is focused on evolvable architectures for use in the domain of business processes. Padberg et al. use restricted nets, a special sort of high-level Petri nets adapted to the description of workflows, as modeling language. Component composition is realized by the application of Petri net based transformations, which allow for consistency checking based on the static composition. The formalism of Padberg
et al. is process-centered, but relies heavily on Petri net semantics, which hinders practical application.

UML 2.0 introduces new token semantics for Activities that are comparable to high-level Petri Nets. These new semantics make UML 2.0 Activities an interesting alternative to Petri Nets for high-level modeling of systems, especially since the new syntax emphasizes well-defined interfaces for the Actions in an Activity. Earlier versions of the UML regarded Activities as a special kind of State Machine, which provides rather different semantics. These old-style activity diagrams are widely discussed in the context of workflow modeling. However, that application is still quite different from what we desire, and it seems that most of these publications focus on the design aspect rather than actual execution. Only a few actually consider implementation aspects. Furthermore, the application of UML Component Diagrams is of limited use here, as they fail to capture concurrency aspects. Kobryn provides in [9] an overview about the merits and pitfalls of modeling component based systems with UML. He reveals significant problems arising with the use of UML component diagrams and some semantic overlaps between different modeling concepts of UML. Yet his analysis is focused exclusively on the aspect of static, structural modeling based on components as defined by the UML. However, this paper presents a different component model that is not affected by the described problems.

Our approach focuses on the dynamic concurrent behavior of a software system. Thus, it complements structure-oriented component approaches like COM, as these can be used to actually implement our approach. DirectShow filter graphs are an example of how such a component model can be implemented using COM. Although these filter graphs are based on components with certain input and output pins, their design and application is severely limited to the domain of assembling certain multimedia systems. We wanted to come up with a generic approach that covers such systems as well as server applications and many other applications.
3. Design Approach

Our design approach uses UML Activity Diagrams to represent a software system as a set of interconnected software components. A UML Activity is a graph composed of interconnected nodes representing various aspects of the system. These nodes include Actions, ObjectNodes, and ControlNodes. An edge between such nodes is an ObjectFlow or a ControlFlow. Note that all references to classes from the UML 2.0 abstract syntax are capitalized as in the UML specification.
Figure 1: Component states

The UML 2.0 introduces Petri-Net-like token semantics for Activities. Tokens are moved between ObjectNodes (e.g., the Pins of Actions) using ObjectFlows; ControlNodes (e.g., a Merge, Fork, or Join) may be used in between, but cannot store tokens. ObjectNodes have queue semantics and default to FIFO behavior. In our approach, these semantics for UML Activities are the foundation for providing inherent concurrency at runtime through the application of classic concurrency patterns like pipelining and replication, without the need to explicitly design or implement this concurrent behavior in the actual application.

In UML 2.0 Activities, a node (e.g., an Action) is considered activated when all its inputs from object and control flows are available. The node then consumes all its inputs and produces a set of outputs. This way, both control and data flow oriented behavior can be modeled. Our approach has a strong focus on data flow modeling, as it can be used to create pipelined software designs that can execute several stages simultaneously and thus have an increased inherent level of concurrency compared to traditional designs that rely on sequential execution.

Our design approach uses a profile based on a significant subset of UML 2.0 Activities, enriched by a set of constraints for defining certain relevant component properties. A component in our approach is a coarse-grained Action in the Activity describing a phase of the software behavior. Object and control flows interconnecting these components are used to coordinate these Actions. To avoid inefficient unlimited buffering, we assume a non-buffering behavior for object flows, resulting in a synchronous transfer of tokens. Most other model elements in UML 2.0 Activities are supported as well, but are less significant to the approach. Many are considered as special Actions implemented using predefined components in the framework described later. Thus, only the specific constraints and extensions compared to UML 2.0 Activities used in our approach will be discussed here.
A component may have different interfaces through which it can be interconnected. An interface is defined by a set of input or output Pins that can be used to interconnect components using data and control flows. A Pin has an identifier and a type. We consider control flows as a special type of data flow moving a "Control"-type token. While at the design level multiple control flows are implicitly joined according to the UML semantics, "Control"-type data flows can be used explicitly to avoid these semantics. Furthermore, each component has a "Complete" output Pin of "Control" type indicating its completion. Tokens appearing on an unconnected Pin are discarded silently.

In our approach, a component can essentially be in five different states (see Figure 1). After construction, the component needs to be deployed on a Host system responsible for the execution of the component. For this paper, we only regard local components. Thus, the Host is the local implementation of the component framework. Once a component is deployed, it could potentially be executed if all its inputs are available. If this is the case, the component will be in the Ready state. Such a component will be executed by the system as soon as possible. For a component without inputs, this means the component is Ready just after its deployment. However, most components will remain deployed but not Ready until the component is activated. A component is considered activated once all mandatory (non-optional) inputs are available. If no such inputs exist, at least one optional input is needed to activate the component. If no inputs exist at all, the component is inherently activated. An activated component is Ready to run. It becomes Running once it is assigned to a thread for execution. After execution, the component is Propagating until all outputs are consumed.

For our components, we extend the semantics of Actions with the idea of concurrent execution of an Action on subsequent input sets. Assuming the Action is reentrant and free of side effects, this does not cause problems as long as the results appear in the original order. While this is quite intuitive, many designs do not require this behavior and could exploit concurrent out-of-order execution. Thus, we introduce a {strict} constraint to explicitly enforce UML-like sequencing of results. If that constraint is not used, it is assumed that the component does not need to present results in the same order. It is important to note that this is a property determined by the actual design, not by the component itself. Semantically, this means that the component may consume several sets of inputs and produce the respective outputs in a different order. This enables us to transparently create concurrent instances of the same component at runtime without the need to sequence their outputs. This is the foundation for creating highly scalable software systems, as pipelining alone only provides a fixed amount of concurrency within the system.
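As an illustration of this activation rule, consider the following minimal Java sketch. It is not taken from the actual framework; the state names, the InputPin interface, and the helper class are assumptions made for this example.

    import java.util.List;

    // The five component states described in the text (naming assumed).
    enum ComponentState { CONSTRUCTED, DEPLOYED, READY, RUNNING, PROPAGATING }

    interface InputPin {
        boolean isOptional();   // corresponds to the {optional} constraint
        boolean hasToken();     // a token is currently waiting on this Pin
    }

    final class ActivationRule {
        // A component is activated once all mandatory inputs are available.
        // If it has no mandatory inputs, one available optional input suffices.
        // If it has no inputs at all, it is inherently activated.
        static boolean isActivated(List<InputPin> inputs) {
            boolean hasMandatory = false;
            boolean optionalAvailable = false;
            for (InputPin pin : inputs) {
                if (pin.isOptional()) {
                    optionalAvailable |= pin.hasToken();
                } else {
                    hasMandatory = true;
                    if (!pin.hasToken()) {
                        return false;   // a mandatory input is still missing
                    }
                }
            }
            return hasMandatory || optionalAvailable || inputs.isEmpty();
        }
    }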
It is usually desirable to control the actual level of scaling for a certain component. The ability to replicate certain components may be limited. One reason may be that these components rely on certain unique resources (e.g., the input port for server connections or a video digitizer device). To handle such cases, we introduce the {single} constraint and a corresponding numeric constraint to limit the absolute number of instances of a component to one or to a given natural number, respectively. Furthermore, even for a component that can be replicated without inherent limits, it may be useful to provide an indication of the maximum number of desired copies per hardware thread to limit the growing overhead induced by the management of the concurrent copies. Thus, we introduce the {scale=} constraint to define such a limit. The maximum number of replicated instances for a component in our approach is thus bounded by both the absolute instance limit and the scale factor multiplied by the number of available hardware threads.

Our approach supports the idea of streaming Pins as defined by UML 2.0. Such Pins may be used to consume multiple inputs or produce multiple outputs. However, as these semantics are quite broad, we introduce an additional constraint that captures a more specific situation. In our approach, Pins may be marked as {optional}. This indicates that no tokens at all may be produced or consumed at the Pin. We apply both {stream} and {optional} to indicate that zero or more tokens may be produced or consumed on a Pin.

The replication of an Action in multiple concurrent instances may cause problems when results overtake each other. While this is often irrelevant or even desirable, it is sometimes important to maintain or restore a certain order of the tokens on a certain Pin (e.g., when sending the responses to HTTP requests in the original order). Thus, we have introduced the {strict} constraint to indicate that the original token order must be restored at a Pin. Furthermore, the sequencing of tokens from concurrent instances for a {stream} output may be crucial. By default, the result streams from concurrent instances are output in order of completion of the respective components. However, this is not always necessary. Thus, we have introduced the {interleaved} constraint for {stream} Pins to indicate that tokens from different instances may be mixed.

Aggregation of components in our approach is achieved by interconnecting component Pins. Composition of components is achieved by using composite Activities. Like an Action, such a composite is defined by a set of Pins. These Pins are directly connected to the according owned components or to an owned Action representing the component-specific behavior. That Action may be interconnected with the owned components as well. Thus, there are no explicit semantics for the composite other than its use for composition.
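The interplay of these constraints can be sketched as follows. This is a hypothetical helper, not the framework's API, and it assumes that the effective bound is the minimum of the absolute instance limit and the {scale=} factor multiplied by the number of hardware threads.

    // Hypothetical sketch: combining the {single}/absolute limit and {scale=}
    // constraints into an upper bound on replicated instances.
    final class ReplicationLimit {
        static int maxInstances(Integer absoluteLimit,   // null if unconstrained; 1 for {single}
                                Integer scalePerThread,  // null if no {scale=} constraint
                                int hardwareThreads) {
            int limit = (absoluteLimit != null) ? absoluteLimit : Integer.MAX_VALUE;
            int scaled = (scalePerThread != null)
                    ? scalePerThread * hardwareThreads
                    : Integer.MAX_VALUE;
            return Math.min(limit, scaled);
        }
    }

For a {single} component, absoluteLimit would be 1; the number of hardware threads could, for example, be obtained from Runtime.getRuntime().availableProcessors().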
Figure 2: Web Server Design Model
The UML profile we use for component systems design has some restrictions on the use of certain model elements. We have not yet considered ExpansionRegions as a mechanism for iterating over collections, because it seems that the introduction of concurrent execution of Actions in our approach can cover the same situations. Additionally, Guards and DecisionNodes are not yet supported, as their semantics for use in a component context are not clear. The use of these model elements in certain combinations seems to collide generally with the token semantics as proposed by the UML 2.0 specification, so this is not a limitation of our approach. Finally, we would like to emphasize that our approach is computationally complete, as loop-completeness can easily be shown by creating a data object (token) for checking the loop condition and rotating this object through an iterating component, as illustrated below.
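The following minimal Java sketch illustrates this argument. The component and pin names are assumptions; the point is merely that the loop condition travels as a token that is routed back to the component's own input until it fails.

    // Hypothetical illustration of loop-completeness: the loop condition travels
    // as a data token that is rotated through an iterating component.
    final class LoopToken {
        int remainingIterations;
        LoopToken(int remainingIterations) { this.remainingIterations = remainingIterations; }
    }

    final class IteratingComponent {
        // Returns the name of the output pin the token is forwarded to. The "loop"
        // output is assumed to be wired back to this component's own input pin.
        static String execute(LoopToken token) {
            if (token.remainingIterations > 0) {
                token.remainingIterations--;   // the loop body would be performed here
                return "loop";                 // rotate the token back into the component
            }
            return "done";                     // loop condition failed; token leaves the loop
        }
    }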
4. Design Example

As an illustrating example, we will consider an HTTP Web server design. The corresponding design model (see Figure 2) shows how an HTTP 1.1 Web server can be designed by properly interconnecting four coarse-grained components. This design emphasizes an HTTP feature called request pipelining that fits nicely to illustrate our approach. HTTP request pipelining allows multiple requests to be sent on the same persistent connection without waiting for the corresponding responses. This significantly improves performance, especially on high-latency connections.
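For readers unfamiliar with request pipelining, the following self-contained Java snippet illustrates the client side of the feature; the host, port, and resource names are assumptions.

    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Illustration of HTTP/1.1 request pipelining: two requests are written
    // back-to-back on the same persistent connection before any response is read.
    public final class PipeliningDemo {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("localhost", 8080)) {   // host and port assumed
                OutputStream out = socket.getOutputStream();
                String requests =
                        "GET /a.html HTTP/1.1\r\nHost: localhost\r\n\r\n" +
                        "GET /b.html HTTP/1.1\r\nHost: localhost\r\n\r\n";
                out.write(requests.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // The server must now deliver the responses in the same order
                // in which the requests were sent.
            }
        }
    }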
The design model for our Web server contains only four components. The first component waits for incoming connections. As these connections arrive on a single port, only one instance of this component can perform reasonable work. Thus, it carries the {single} constraint. Once a connection is established, it is passed to the RequestReader as an HTTPConnection object. The RequestReader will read a new request (excluding the body) from the connection and pass that request as an object, together with the connection, to the RequestHandler component. The RequestHandler will examine the headers and read the body if desired. Once the body has been read, the connection will be passed back to the reader, so it can start reading the next request. The RequestHandler may also decide to close the connection, because it decides not to read the entity body or because an error occurred. If a response is computed, it is sent as a ResponseJob object (aggregating Response and Connection) to the ResponseWriter for delivery. Alternatively, if an error already occurs while reading the request, the RequestReader directly generates a response. It is worth noting that the ResponseWriter is the only {strict} component in this design. The actual order in which concurrent requests are processed does not need to be specified. However, the client expects responses to be sent in the original order. This order will be restored at the ResponseWriter input Pin. The actual knowledge of that order is internal to the processed objects, as we will outline in the next section describing the actual instantiation and execution of such design models.
5. Execution Framework

The design models in our approach are the direct input to an execution framework that instantiates these designs.
However, to keep the execution environment simple, this instantiation actually includes a transformation of the design model into an equivalent runtime model consisting only of directly interconnected components without further control nodes or other model elements. Thus, model elements like Join, Fork, and Merge have to be translated into respective components, which is still straightforward. Competition for tokens has to be resolved by introducing additional buffers. Figure 3 shows a transformed runtime model consisting of executable components for the design model shown in Figure 2. The actual component implementations are located using a registry and instantiated using a factory mechanism. We notice how implicit merges have been resolved into separate components. Furthermore, we note that a sequencing component has been introduced. This component restores the ordering of the Responses and is introduced as a result of the {strict} constraint in the design model. It expects all contained objects to implement an interface which allows the buffer to ask the object whether it is ready to be released. Thus, each such object is only likely to give meaningful information in one such context. In our example, each ResponseJob is asked whether it contains the next response to be sent over its connection. This is a flexible approach, as it allows for partial ordering, which is required in our example where an ordering is only defined for responses on the same connection.
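A minimal sketch of such a release-check interface could look as follows. The interface and member names are assumptions and not the framework's actual API; only the ResponseJob concept is taken from the design model.

    // Hypothetical sketch of the release-check interface used by the sequencing buffer.
    interface Sequenced {
        // The buffer asks the object whether it may be released now,
        // i.e. whether it is the next one in the required (partial) order.
        boolean isReadyToBeReleased();
    }

    // Example from the text: a response may leave the buffer only if it is the
    // next response expected on its connection.
    final class ResponseJob implements Sequenced {
        private final HttpConnection connection;   // assumed helper type
        private final long sequenceNumber;         // position within the connection

        ResponseJob(HttpConnection connection, long sequenceNumber) {
            this.connection = connection;
            this.sequenceNumber = sequenceNumber;
        }

        @Override
        public boolean isReadyToBeReleased() {
            return connection.nextExpectedResponse() == sequenceNumber;
        }
    }

    // Minimal assumed connection type so the sketch is self-contained.
    final class HttpConnection {
        private long nextExpected = 0;
        long nextExpectedResponse() { return nextExpected; }
        void responseSent()         { nextExpected++; }
    }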
Figure 3: Instantiated Runtime Model
Furthermore, each component is marked either with a blocking or a scheduled stereotype. The execution framework distinguishes between components that may block for a longer time and components that make full use of the processor during execution. This is a property of the actual component implementation. Thus, it is not contained in the design model, but known to the execution framework.

Blocking and scheduled components are handled differently to make better use of existing hardware threads. Scheduled components are handled sequentially by a pool of software threads sized depending on the number of hardware threads. This leads to optimal throughput, since these components will not block. However, streaming output Pins may cause a component to block until the output data has been consumed. Thus, this will cause another component to be scheduled for execution until the blocked component can continue its operation. Blocking components are idle while blocking. Thus, the respective threads are provided from a separate thread pool that may grow over time until a reasonable platform-dependent limit has been reached. Unused threads in both thread pools are kept for some time before they are discarded. This helps to avoid thread instantiation delays.
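A minimal sketch of this split, assuming standard java.util.concurrent thread pools instead of the framework's own thread management, could look as follows.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch of the two thread pools described above.
    final class ComponentScheduler {
        // Scheduled components: one software thread per hardware thread,
        // since these components do not block and keep the processor busy.
        private final ExecutorService scheduledPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // Blocking components: a pool that may grow (up to a platform-dependent
        // limit in the real framework) and discards idle threads after a while.
        private final ExecutorService blockingPool = Executors.newCachedThreadPool();

        void execute(Runnable component, boolean blocking) {
            (blocking ? blockingPool : scheduledPool).execute(component);
        }
    }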
Figure 4: RequestReaderReplicator Component - configuration snapshot at runtime
While the instantiated design model is inherently concurrent, we have not yet described how replication is achieved at runtime. In our example, we can notice that during instantiation all non-{single} components are replaced by composite components. These are designed to handle the automatic replication of the components as needed. These composite components are automatically assembled from framework components and include a factory component for manufacturing actual replicated instances of a component type.

Figure 4 shows a snapshot of the composite component for the RequestReader from our example. Components crossed through are already deleted, but included for the orientation of the reader. These are the components that have provided initial tokens for a part of the Activity. We notice that the original component inputs are connected to a latch in the composite component that interconnects these inputs to one replicated instance of the original component type, which is encapsulated into a composite ReplicatedRequestReader component. All components owned by the composite stem from a factory component that is responsible for instantiating new ReplicatedRequestReader components up to the maximum number defined by the constraints in the design model, and for issuing these instances into a queue of components waiting for input. Generic components from the framework are used to connect such a ReplicatedRequestReader to the input Pins, make it fetch the input data, and disconnect it so the next one from the queue may be connected afterwards.
Once the component has consumed its inputs, it is ready to execute and will do so. When the execution of the component has finished, it will enqueue itself into a queue for components that are ready to propagate their results. To do so, the component gets connected to the output Pins and waits until its results are fetched into a result buffer. When all results have been fetched from the result buffer by the interconnected components, the next component can be connected to this buffer. After propagating its outputs, a component will again be added to the waiting queue.

Components from the execution framework are allowed to leave inputs unconsumed. This does not comply with the UML execution semantics, but is only used to directly implement model elements like the Merge node. Thus, it does not harm the model semantics during execution. This is quite important when considering the possibility to apply formal verification techniques that exploit the compliance of the models.

The implementation of the ReplicatedRequestReader component using the techniques already described is shown in Figure 5. Both inputs and outputs are buffered to provide synchronous data passing capabilities. Completion of the actual encapsulated RequestReader component is indicated by passing a self-reference at the "Completed" output Pin. Finally, we notice that the whole design introduces a latency of several execution steps compared to the non-replicated version. However, all these steps are trivial and can be neglected given that the actual scheduling of a component for execution is efficient.
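The pooling and recycling of replicated instances can be approximated by the following sketch. The Replicator class and its use of plain functions as workers are assumptions for illustration; the actual framework wires latches, queues, and buffer components as shown in Figures 4 and 5.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.function.Function;
    import java.util.function.Supplier;

    // Hypothetical sketch of a replication composite: worker instances are created
    // lazily up to a maximum and recycled through a queue of idle instances.
    final class Replicator<I, O> {
        private final Supplier<Function<I, O>> factory;            // creates new instances
        private final BlockingQueue<Function<I, O>> idle = new LinkedBlockingQueue<>();
        private final int maxInstances;                            // from the design constraints
        private int created = 0;

        Replicator(Supplier<Function<I, O>> factory, int maxInstances) {
            this.factory = factory;
            this.maxInstances = maxInstances;
        }

        // Connect the next idle (or newly created) instance to the input token,
        // let it execute, and return the instance to the idle queue afterwards.
        O process(I input) throws InterruptedException {
            Function<I, O> worker;
            synchronized (this) {
                worker = idle.poll();
                if (worker == null && created < maxInstances) {
                    worker = factory.get();
                    created++;
                }
            }
            if (worker == null) {
                worker = idle.take();   // all instances busy: wait for one to become idle
            }
            try {
                return worker.apply(input);
            } finally {
                idle.put(worker);       // the instance becomes idle again
            }
        }
    }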
Figure 5: ReplicatedRequestReader Component created by the framework at runtime
6. Implementation and Evaluation

We have implemented COM and Java versions of our component framework. The implementations read design models in the form of XML documents and instantiate these using native component implementations. The Java framework has been used to implement our example based on an existing web server design that lacked true HTTP 1.1 capabilities. We created two versions of that server. The first version has been implemented in a straightforward manner using a single thread per connection to completely process all requests. For the second version, we moved to components based on our approach. This took about half a day. By comparing the performance of both systems, we found that the component framework directly provided us with a significant performance gain.

The benchmark was conducted on a dual Pentium system with PC-133 SDRAM by requesting the same cached document ten thousand times through a single persistent HTTP 1.1 connection and computing the request rate. The test has been repeated ten times and the results were averaged. No anomalies occurred. The test results are shown in Figure 6.

Figure 6: Web Server Performance Comparison

At first, we notice a performance hit for the single processor configuration. We would actually expect a gain here, because pipelining should help the single processor system a little by overlapping blocking operations with computations. However, since the blocking operations in our example are not likely to block, because they are subject to several levels of buffering from the platform to the network level, these positive effects are hidden by negative system-level effects caused by the additional concurrency and the increased implementation overhead. The single processor was operating at full load during all tests. In contrast, we see a significant performance gain when using two processors. Introducing our component technology enabled the Web server to benefit from the second processor without exploiting concurrent connections. The gains here are mainly achieved through pipelining and out-of-order execution of pipelined requests using replicated components.
7. Conclusions and Future Work

We presented a generic approach to the design and construction of concurrent component-based software systems based on UML 2.0 Activity Diagrams. For this, we have introduced and implemented a framework for the execution of such components. Finally, we discussed a Web server example to demonstrate that the application of our approach is simple and can immediately increase application performance. Future work will include the design and construction of a physically distributed version of the component framework to extend its applicability to such systems. This also aims at providing a basis for advanced management features like error recovery at the component level.
References
[1] van der Aalst, W.M.P.: The Application of Petri Nets to Workflow Management. The Journal of Circuits, Systems and Computers, Vol. 8, 1998.
[2] Dumas, M., ter Hofstede, A.H.M.: UML Activity Diagrams as a Workflow Specification Language. In Proc. of the 4th International Conference on the Unified Modeling Language, Toronto, Canada, October 2001.
[3] Ehrig, H., et al.: A Generic Component Framework for System Modeling. In Proc. FASE 2002, LNCS 2306, Springer, 2002.
[4] Eshuis, R., Wieringa, R.: Verification Support for Workflow Design with UML Activity Graphs. In Proc. of ICSE 2002.
[5] Eshuis, R., Wieringa, R.: A Real-Time Execution Semantics for UML Activity Diagrams. In Proc. of the 4th International Conference on Fundamental Approaches to Software Engineering (FASE 2001), Springer, 2001.
[6] Foster, H., Uchitel, S., Magee, J., Kramer, J.: Model-based Verification of Web Service Compositions. Presented at the Eighteenth IEEE International Conference on Automated Software Engineering, Montreal, Canada, 2003.
[7] Hamadi, R., Benatallah, B.: A Petri Net-based Model for Web Service Composition. In Proc. of the Australasian Database Conference on Database Technologies, 2003.
[8] Hruby, P.: Specification of Workflow Management Systems with UML. In Proc. of the 1998 OOPSLA Workshop on Implementation and Application of Object-Oriented Workflow Management Systems, Vancouver, BC, 1998.
[9] Kobryn, C.: Modeling Components and Frameworks with UML. Communications of the ACM, Vol. 43, No. 10, 2000.
[10] Object Management Group: Unified Modeling Language: Superstructure, 2003.
[11] Padberg, J., Weber, H.: Petri Net Based Components for Evolvable Architectures.
Proceedings of the 12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems (ECBS’05) 0-7695-2308-0/05 $20.00 © 2005 IEEE