SystemC for a Paradigm Shift: Concurrency Modeling

Pallavi Shurpali, Ravi Shankar and Ellie Shuff
Computer Science & Engineering
Florida Atlantic University
Boca Raton, FL-33431
[email protected]

Abstract

Reuse and potential for performance improvement have motivated multi-thread, multi-core applications on current SoCs (System on a Chip). Concurrent programming has thus surfaced as a major challenge in SoC design. Traditional design and verification approaches will not be cost-effective in exposing concurrency failures, which can be costly (significantly increased time to market and field failures). One would have to develop abstract concurrency models and do exhaustive analysis on these models to test for concurrency problems. We discuss in this paper a course on concurrency modeling conducted with graduate and senior undergraduate students. Our first goal was to make students aware of how concurrency issues could impact future complex systems and to develop easy top-down methodologies to overcome these problems. The availability of Java and two high level tools for modeling and analysis facilitated this. Our second goal was to port the methodology to SystemC, an open source language based on C++ and a de facto standard used in the industry for system level design and software-hardware codesign. We have documented the results with several examples.
1. Introduction

A current SoC contains one or more embedded programmable processors, on-chip memory, and special purpose functional blocks [1]. Today’s complex SoC designs make extensive reuse of both hardware and software components that together implement the desired system functionality. The reuse paradigm significantly shortens the development cycle and enhances product functionality, performance, and quality. In the past, better system performance was achieved by increasing the clock speed. Several considerations are conspiring to prevent this from continuing as we reach farther into the deep submicron (DSM) domain. This is eloquently detailed in a recent article [2], which suggests that further performance gains can only be achieved with concurrent programming.

Concurrency is being increasingly used in SoC designs to increase the performance and scalability of various multimedia, networking, and communications products [1]. Concurrency techniques used in hardware design increase performance by using multiple instruction pipelines, reducing idle cycles for shared resources (e.g. pipelines, buses), and distributing the processing load across multiple cores. Multi-core and multi-threading techniques are used to implement concurrency in systems. Software programmers fluent in sequential programming, however, may fail to appreciate the nuances of concurrent programming [2]. This may lead to unexpected and intermittent system failures; see [3], for example. Hardware is naturally concurrent; however, hardware designers using programming languages could benefit from proper use of concurrency constructs.

Most concurrent programming problems can be attributed to a lack of proper synchronization in the access of shared resources (e.g. bus, memory, and various I/O devices) [4]. The problems manifest as data corruption, race conditions, deadlock, stalls, and/or starvation. These problems are often unpredictable and occur intermittently, making them hard to reproduce. Exhaustive verification after system integration may not even uncover such faults in a reasonable period of time. A simple set of rules, coupled with a high level analysis, can allow one to gain the performance advantage while avoiding costly concurrency failures.

The traditional approach of designing hardware and software separately from the initial specifications is not feasible as the complexity of the system increases. The system level design approach needs to be holistic and capable of handling both software and hardware design in a single flow. Trade-offs have to be made across the boundaries of software and hardware, treating the system as a single entity. System level concurrency modeling could benefit from the software concurrency modeling efforts underway.
However, from a software-hardware codesign perspective, these models should also give insight into performance in terms of quality of service, power consumption, resource utilization, etc. This information can then be used to partition the system into software and hardware.

In order to develop high level models, we need languages that allow modeling of both software and hardware together; address the issues of concurrency, synchronization, and communication, essential for multiprocessor/multi-tasking systems; and allow one to explore what-if scenarios [5] at various levels of abstraction with incomplete information. They also need to allow integration of information from multiple heterogeneous sources for future SoC designs. SystemC 2.0 is a C++ based system level modeling and design language [6] that supports all this. Java may also be useful [7]. We focus here on SystemC as it is accepted industry-wide.
2. Methods

In developing this course, our goal was to develop an infrastructure for addressing concurrency issues in software-hardware codesign. We used FSP (Finite State Processes), a modeling notation, and LTSA (Labelled Transition System Analyser), a graphical analysis tool, both developed for modeling and analyzing concurrency in software but equally applicable to our codesign environment [8]. FSP, a textual notation, is used to develop an abstract concurrency model that captures all the relevant synchronization and communication detail. Each process is modeled as an abstract state machine, and multiple processes are combined in FSP by parallel composition (see Fig. 1 for an example). LTSA translates the FSP descriptions into an equivalent graphical description. It is used to test the model exhaustively for concurrency problems. It checks for both desirable and undesirable properties of a system by analyzing all possible sequences of events and actions that can occur in the system (see Fig. 2 for an example).

The book [8] uses Java as the implementation language. We integrated its methodology with SystemC, so that it could be useful for software-hardware codesign. We obtained the SystemC package from the open source OSCI site and ran it in the Visual Studio environment. We list below the basic constructs required for concurrency and the corresponding SystemC constructs: Mutex (sc_mutex); Monitors (Channels and Interfaces); Threads (SC_THREAD and SC_CTHREAD); Passive Object (SC_METHOD); Events (sc_event); Block Thread (wait()); and Unblock Thread (notify()).

SystemC Version 3.0 is expected to include the following missing concurrency features [9]: dynamic spawning of threads; aborting a thread and its children; support for abstract RTOS and scheduler modeling; and specification and checking of timing constraints. A custom model of computation for modeling concurrency can be easily implemented in the new version.
The students were given four assignments, four quizzes and one term project in the course [10]. The assignments were: document a concurrency mishap that occurred in real life; discuss concurrency in any aspect of computer system/software/hardware design; document software concurrency failures in any industry, using federal/industrial databases; and explore the role of concurrency in system task partitioning.
Figure 1 FSP Animation for Producer-Consumer with a monitor (correct version)

Composing
potential DEADLOCK
States Composed: 28 Transitions: 32 in 60ms
Trace to DEADLOCK:
get

Figure 2 LTSA predicts deadlock for Producer-Consumer (for the semaphore-based design)

Assignments were designed to get students interested by showing the real-life relevance of the course. The quizzes focused on the use of FSP and LTSA for simple concurrent systems. For the term project, the students explored various options and used new tools/methods/languages to develop concurrency models and programs. Students fluent in SystemC re-wrote some examples from the book. We will cover those examples here. The tutorials developed include FSP, LTSA, and the SystemC code for both the incorrect and the corrected versions [10].
2.1 Example 1 - Ornamental Garden Model:

This example has a garden with two turnstiles. The number of people allowed in the garden is limited to 40, and each turnstile can admit 20 people. A shared resource, the counter, keeps track of the total number of people entering through both turnstiles. If the concurrent program works correctly, the counter value matches the sum of the totals for the two turnstiles. We have implemented the example in SystemC both with and without mutual exclusion on the shared resource to demonstrate process interference. FSP and LTSA were used to show the existence of this problem.
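The mutually exclusive version of the garden can be sketched as follows. This is a minimal illustration in plain C++ using the standard library rather than the SystemC code from the course tutorials [10], so that it stays self-contained; std::mutex stands in for sc_mutex, and the names Counter, run_turnstile, and simulate_garden are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Shared garden counter. The mutex serializes increments so that the two
// turnstile threads cannot interfere (assumption: std::mutex plays the role
// that sc_mutex would play in a SystemC model).
class Counter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    int value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    int value_ = 0;
};

// One turnstile admits `arrivals` people and returns its own local total.
int run_turnstile(Counter& garden, int arrivals) {
    int local_total = 0;
    for (int i = 0; i < arrivals; ++i) {
        garden.increment();
        ++local_total;
    }
    return local_total;
}

// Runs two turnstiles concurrently and returns the shared counter value.
int simulate_garden(int per_turnstile) {
    Counter garden;
    int east = 0;
    int west = 0;
    std::thread t1([&] { east = run_turnstile(garden, per_turnstile); });
    std::thread t2([&] { west = run_turnstile(garden, per_turnstile); });
    t1.join();
    t2.join();
    // Correctness condition from the example: the shared counter equals the
    // sum of the two turnstile totals.
    assert(garden.value() == east + west);
    return garden.value();
}
```

Calling simulate_garden(20) models the two turnstiles admitting 20 people each. Removing the lock from increment() would reintroduce the interference that the unprotected version is meant to demonstrate.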
2.2 Example 2 – Producer Consumer Model:

This example uses a bounded FIFO buffer. See Figs 1 and 2 for the FSP and LTSA of the model. The producer puts data into the buffer and the consumer gets the data from the buffer, at programmable rates. The buffer is a shared object, and the producer and consumer threads have mutually exclusive access to it. When the buffer is full the producer thread is blocked; similarly, when the buffer is empty the consumer thread is blocked. A blocked thread is reactivated when its blocking condition changes. A monitor based design is safe from deadlock, but runs slowly (Fig 1). We implemented the example with two levels of synchronization (with semaphores), which can potentially run faster but can also deadlock (Fig 2): the consumer thread may lock the buffer before checking whether the buffer has data, so the producer can never add the data the consumer is waiting for. We then developed another SystemC model in which the producer and consumer threads check for the full and empty conditions of the buffer, respectively. The resulting simulations showed correct operation [10].
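The monitor based design described above can be sketched as follows. This is an illustrative plain C++ version, not the SystemC model from the tutorials [10]: std::mutex and std::condition_variable stand in for a SystemC channel with events, and the names BoundedBuffer and transfer_sum are hypothetical. The point it shows is that a blocked thread releases the buffer lock while it waits, which is what prevents the deadlock seen in the nested-lock design.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Monitor-style bounded FIFO: one lock guards the buffer, and each
// operation waits on a condition while the lock is released.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Blocks the producer while the buffer is full.
    void put(int item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(item);
        assert(items_.size() <= capacity_);
        not_empty_.notify_one();
    }

    // Blocks the consumer while the buffer is empty.
    int get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        int item = items_.front();
        items_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::deque<int> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Sends 1..n through a buffer of the given capacity and returns the sum
// of everything the consumer received.
long transfer_sum(int n, std::size_t capacity) {
    BoundedBuffer buffer(capacity);
    long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) buffer.put(i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += buffer.get();
    });
    producer.join();
    consumer.join();
    return sum;
}
```

With a capacity much smaller than n, both threads block repeatedly, yet every item is still delivered exactly once and in order.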
2.3 Example 3 – Dining Philosopher Model:

This example has five philosophers seated around a table with a bowl of spaghetti in the centre. Five forks are placed on the table, one between each pair of neighbouring philosophers. Every philosopher spends a random amount of time eating and thinking. To eat the spaghetti, each philosopher has to use only the forks placed to their immediate right and left. Thus every fork is shared between two philosophers, and the philosophers have mutually exclusive access to a fork. We implemented both the incorrect (deadlocking) and the correct example in SystemC. Deadlock is avoided by breaking one of the four necessary and sufficient conditions for deadlock [8]; in this case, the circular wait is broken by introducing asymmetry in the behavior of the philosophers. Refer to [10].
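One way to introduce the asymmetry is to let even-numbered philosophers pick up their left fork first and odd-numbered philosophers their right fork first. The sketch below illustrates this in plain C++ and is an assumption about the form of the fix, not the SystemC code from the tutorials [10]; the function name dine and the parity rule are chosen for illustration, and the random eating and thinking times are omitted for brevity.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kPhilosophers = 5;

// Each philosopher eats `rounds` times. Returns the total number of meals.
int dine(int rounds) {
    std::array<std::mutex, kPhilosophers> forks;  // one fork between neighbours
    std::atomic<int> meals{0};
    std::vector<std::thread> philosophers;

    for (int id = 0; id < kPhilosophers; ++id) {
        philosophers.emplace_back([&forks, &meals, id, rounds] {
            const int left = id;
            const int right = (id + 1) % kPhilosophers;
            // Asymmetry: even philosophers take the left fork first, odd
            // philosophers take the right fork first, so a cycle in which
            // everyone holds one fork and waits for the next cannot form.
            const int first = (id % 2 == 0) ? left : right;
            const int second = (id % 2 == 0) ? right : left;
            for (int r = 0; r < rounds; ++r) {
                std::lock_guard<std::mutex> hold_first(forks[first]);
                std::lock_guard<std::mutex> hold_second(forks[second]);
                ++meals;  // eating, with both forks held
            }
        });
    }
    for (auto& p : philosophers) p.join();
    assert(meals.load() == kPhilosophers * rounds);
    return meals.load();
}
```

If every philosopher instead took the left fork first, all five could hold one fork each and wait forever for the other, which is the deadlocking version of the example.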
3. Results

The students found, through literature search, many interesting examples of real-life concurrency failures, as with the Mars Pathfinder rover mission [11], automobiles [3], medical instruments [12], cell phones [13], and others. Students were from both software and hardware backgrounds. They were given flexibility to implement their term project using different languages and tools, so long as they could demonstrate the concurrency concepts that they used. Projects included: concurrent programming with SystemC; concurrency modeling of Therac-25 [12]; modeling of pipelining; automated UML to concurrent code generation; and concurrency with a Linux cluster.

4. Discussion

Students were taught FSP and LTSA; hence they were comfortable with them. Most students were, however, unfamiliar with open-ended assignments that asked them to explore real-life systems. The students were also asked to think critically and challenge any concept or example they did not understand or agree with. The grading policy encouraged them to be innovative. Extra points were given to students for active participation. We are at present exploring the use of UML for automatic concurrent code (SystemC and C++) generation, and concurrent verification.

5. Conclusion

This course was offered to both graduate and senior undergraduate students. During the course, insight was gained by making students work on projects and look for real-life concurrency failures. We have developed a quick, easy, and cost-effective method for developing SystemC programs that have correct concurrent behavior.

ACKNOWLEDGMENTS

We gratefully acknowledge Ilogix, Inc., Mirabilis Design Inc, and OPNET for making their tools available free of charge for our course. Thanks also to Professors Magee and Kramer of Imperial College, London for their pioneering work, which makes concurrency modeling a reality.

REFERENCES

[1] Cordan B., Configurable Platform-Based SoC Design Techniques, Part 1, March 12, 2001, www.eedesign.com
[2] Sutter H., Free Lunch is Over: A Fundamental Turn Towards Concurrency in Software, March ‘05, http://www.gotw.ca/publications/concurrency-ddj.htm
[3] Nisley E., But I Never Did That Before, in Dr. Dobb’s Journal, Nov 2005, www.ddj.com/documents/ddj0411r/
[4] Mitchell E., Multicore, multithreaded SoCs pose a challenge, August 25, 2003, http://www.eedesign.com/
[5] Quraishi G., Shankar R., On Simulating IP Market Dynamics in an Academic Environment using SystemC, IEEE Intl Conf. Microelectr. Syst Educ., pp.136, 2003.
[6] The Open SystemC Initiative, www.systemc.org
[7] Habibi A. and Tahar S., A Survey on System-On-a-Chip Design Languages, Proc. of the 3rd IEEE Intl Wkshp on SoC for RT Apps, pp.212-215, 2003.
[8] Magee J. and Kramer J., Concurrency: State Models & Java Programs, http://www-dse.doc.ic.ac.uk/concurrency/
[9] Swan S., An Introduction to System Level Modeling in SystemC 2.0, Jan ‘01, www.systemc.org/projects/sitedocs/document/v201_White_Paper/en/1
[10] SystemC Concurrency Tutorials: http://www.csi.fau.edu/concurrency_modeling/
[11] Barr M., Introduction to Priority Inversion, April 2002, www.embedded.com
[12] courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html
[13] http://www.epinions.com/content_139397402244