
2009 International Conference on Parallel and Distributed Computing, Applications and Technologies

Key Elements Tracing Method for Parallel XML Parsing in Multi-core System

Xiaosong Li, Hao Wang, Taoying Liu, Wei Li
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{lixiaosong, wanghao, liutaoying, liwei}@software.ict.ac.cn

Abstract—Although XML is used intensively in many applications, XML parsing is impractical in many fields because of its poor performance. Parallel XML parsing on multi-core processors is a promising remedy. Previous methods all adopt a data-parallel approach: owing to the semi-structured nature of XML, they must divide the data into well-formed XML chunks and then parse these chunks in parallel. This division process is called preparsing. Because preparsing is serial, it becomes the bottleneck of parallel XML parsing. The related Simultaneous Finite Transducer method (SFTXP) parallelized the preparsing stage by maintaining multiple preparsing results for each equal-sized chunk, enumerating all possible parser states. Although the number of states is finite for any XML document, the overhead of SFTXP is substantial: CPU time to generate the multiple results and memory to store them. In this work, we address parallel XML parsing with the Key Element Parse Tracing (KEPT) method, which parallelizes preparsing and parsing at the element level. It recasts preparsing as a key-element extraction process and schedules the processing of key elements within the KEPT framework; the parsing process is then parallelized as a whole. To demonstrate its effectiveness, we implement it on libxml2 and obtain good scalability on both an 8-core Linux machine and an 8-core, 24-SMT Sun machine running Solaris.

Keywords: XML parsing; parallel; multi-core; key element tracing

I. INTRODUCTION

With the increasing use of the Web, XML has become one of the major data formats for information exchange in many kinds of applications. As XML parsing is both computing and data intensive, it becomes a performance bottleneck for XML-based applications [10]. Many approaches have been proposed to improve the performance of XML parsing: schema-directed parsing, reduction of redundant code transformation, and hardware acceleration for XML validation and XSLT [3][4][8][10][18]. As multi-core systems are widely deployed, parallelism becomes a promising way to improve the performance of XML processing. In parallel XML parsing, the most difficult problem is partitioning the work into parallel tasks, which is hard because of the semi-structured nature of XML and the dependences between the elements of a document. Parallel XML Parsing (PXP) and Simultaneous Finite Transducer XML Parsing (SFTXP) are the two major methods proposed in this area. The first maps the XML text to a skeleton, divides the document into segments based on the structure of the skeleton, and then applies parallel parsing to these segments. Building on PXP, SFTXP parallelized the skeleton-building process (preparsing) by recasting it as the execution of a simultaneous finite transducer.

Although both methods improve the performance of XML parsing through parallelization, some problems remain in real applications. One particularly significant problem is that they cannot meet the performance requirements of server applications such as Web Services containers. In PXP this is because the preparsing stage is serial; SFTXP overcomes this drawback, but the overhead of the state guessing used to parallelize preparsing degrades the overall throughput. In this paper, we address the problem with a Key Element Parse Tracing (KEPT) method, which parallelizes preparsing and parsing at the element level. Because the new method exposes more parallelism and avoids the overhead of speculative execution for parallelizing the preparsing stage, it improves not only the performance of single-document parsing but also the throughput of server applications.

Preparsing is the process of creating a data dependence graph from the XML document. The graph is based on data dependences (parent-child) but also records semantic relationships (sibling). Parsing then processes elements in parallel according to this graph. For a single element, preparsing and parsing must be applied in chronological order; across multiple elements, however, the preparsing of one element can proceed in parallel with the parsing of another when no dependences connect them. These dependences can be summarized as follows:
1. Elements must be preparsed in the order in which they occur in the document.
2. Elements must be parsed in data dependence order: the parent of an element must be processed first.
3. For any single element, preparsing must always be applied before parsing.
KEPT implements parallel XML parsing by tracing these three dependences and scheduling elements that have no outstanding dependences for preparsing and parsing in parallel. The cost of preparsing and of synchronization in parallel processing would be extremely high if every element in the document were traced. To solve this problem, we propose a method that extracts key elements from the document schema, reducing the number of elements to process. To optimize communication in KEPT, we introduce a scheduling algorithm that packs multiple successive key elements into one task for parsing.



The preparsing and parsing of all elements in one task are then assigned to the same core. Because the preparsing and parsing of an element share the same data chunk, this locality improves memory access performance and reduces the communication cost between cores. It is worth noting that, to avoid cache thrashing, the number of elements in one task is restricted by the on-chip cache capacity. On a Sun Fire T1000 platform with 8 cores and a Dell 1900 platform with two Intel Xeon 5310 CPUs (8 cores in total), KEPT shows good speedup on benchmark documents: 6 and 5.2, respectively. It is much better than PXP and at least equal to SFTXP in speedup, while improving the applicability to server applications that require high throughput.

The rest of the paper proceeds as follows. Section II outlines the background on parallel XML parsing. Section III introduces the KEPT method for parallel parsing based on key element tracing. Section IV presents performance results on both a Sun platform with 8 cores and 24 SMT threads and a Linux platform with 8 cores; based on these results, we analyze the factors that influence KEPT's performance in three aspects: computing time, barrier cost, and communication cost. We conclude in Section V with further discussion of this approach and directions for future research.

II. BACKGROUND ON PARALLEL XML PARSING

Pipelining, task parallelism, and data parallelism are three directions for parallel data processing. In task-parallel applications, entirely different computations can be performed on either the same or different sets of data; for XML parsing, DOM node allocation and code transformation are independent tasks that can be parallelized this way. The pipeline approach divides an application into several stages and allocates one or more cores to each stage, with the output of one stage feeding the input of the next. The data-parallel approach first divides the data into chunks and then assigns the chunks to cores for processing. This approach is used in all previous work on parallel XML parsing, even though it is hard to apply: the difficulty is that the data cannot simply be cut into well-formed XML chunks.

PXP processes an XML document in two stages: a quick scan of the document, followed by a partitioning and full-parsing stage. The initial scan, known as preparsing, determines the structure of the elements in the document. With this structural information, the XML document can be partitioned into well-balanced data chunks and then fully parsed in parallel. The full parsing is performed using unmodified libxml2 [5], which provides an API sufficient to parse well-balanced XML fragments. Finally, the parsing results of the parallel threads (the DOM fragments) are merged to form the final DOM tree. Building on PXP, SFTXP parallelizes the preparsing process by partitioning the XML data into equal-sized chunks and assigning them to multiple cores for preparsing. Because the assigned chunks are not well-formed, state guessing is used to address the problem of the preparser's uncertain starting state.

III. METHOD DESCRIPTION AND IMPLEMENTATION

KEPT is a hybrid of the task-parallel and data-parallel approaches. Following the data-parallel approach, it divides the data into elements, processes the elements in parallel, and finally merges the results into a DOM tree sequentially. Since element processing in parallel XML parsing can be divided into a preparsing stage and a parsing stage, the two stages are parallelized within the KEPT framework whenever the parsing dependences are met; this can be seen as a task-parallel approach. For performance optimization, KEPT first selects a subset of XML elements (the key elements) from the XML Schema as the basic granularity for preparsing and segment parsing. These key elements are further packed into tasks for scheduling. Tasks are scheduled onto processors so that the preparsing and parsing of a key element run on the same core, while the processing of different elements proceeds in parallel. As shown in Fig. 1, preparsing and parsing run simultaneously on a multi-core platform. On one hand, as soon as preparsing has divided part of the data into key elements, these elements are packed into a task for scheduling and preparsing continues. On the other hand, the preparsing and parsing of elements in one task are assigned to the same physical processing unit, while the preparsing of subsequent elements is assigned to other cores; a sketch of the resulting worker loop is given below.

Figure 1. KEPT Architecture
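As a rough illustration of this scheduling (our sketch, not the authors' code), an idle worker thread could behave as follows. The task queue, job queue, and helper functions are assumed placeholders, not APIs from the paper or from libxml2.

/* Hypothetical worker loop for KEPT-style scheduling. All names
 * are illustrative assumptions. */
#include <stdbool.h>
#include <stddef.h>

struct task;                       /* packed run of successive key elements */
extern struct task *take_task(void);          /* exclusive task acquisition */
extern bool deps_fulfilled(struct task *t);   /* parent already parsed?     */
extern void parse_task(struct task *t);       /* preparse + parse locally   */
extern void defer_to_job_queue(struct task *t);
extern struct task *take_ready_job(void);     /* revisit deferred nodes     */

static void worker(void)
{
    struct task *t;
    while ((t = take_task()) != NULL) {
        if (deps_fulfilled(t))
            parse_task(t);          /* same core handles the whole task */
        else
            defer_to_job_queue(t);  /* parent not parsed yet: park it   */
        /* After each task, drain any deferred jobs that became ready. */
        struct task *j;
        while ((j = take_ready_job()) != NULL)
            parse_task(j);
    }
}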

A. Preparsing
Preparsing selects the key elements for the document being processed, divides the XML data into chunks, and extracts the data dependences of each key element. It consists of the two stages described below.
1) Select Key Elements
Since the XML Schema defines the element types and the overall structure of XML documents, its top-N-level elements (N is a document-dependent parameter) are extracted as key elements. Some key elements have other key elements nested inside them; such an element cannot be parsed while skipping its inner elements by a low-level XML parsing library, which supports only the parsing of well-formed fragments, so elements of this kind must be converted. We therefore tag them as compound; the others are tagged as atomic. Different parsing methods are then applied in the parsing stage depending on this type.
2) Divide Data and Extract Element Dependences
The document is carved up by key elements. The boundary of an element is determined by how it is to be parsed.


For compound elements, their sub-elements must be parsed separately; the chunk of a compound element therefore contains only its start tag, e.g. "<a>". An atomic element's chunk is consistent with the XML data definition, e.g. from "<a>" to "</a>". The boundaries of key elements are called break points and are recorded for parsing. To parallelize element parsing, the data dependences of each element must also be extracted. The relationships between elements that affect parsing are mainly parent-child and sibling. To find these relationships during preparsing, a stack is used: an element is pushed when its start tag is met and popped at the corresponding end tag. When an element is pushed onto the stack, its parent is the element immediately below it, because reading the XML elements sequentially from the document behaves as a pre-order traversal of the DOM tree; and, obviously, elements sharing the same parent are siblings of each other. As preparsing proceeds, the extracted key elements form a queue; each queue node records the element's offset in the document, the index of its parent, and its composition type. It is worth noting that recognizing a key element by its name alone is not feasible in some cases, since elements sharing the same name in different scopes may stand for completely different data types. KEPT resolves this ambiguity of XML element naming by identifying an element through the sequence of nodes on the preparsing stack.
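A minimal sketch of the data structures this implies; the field names are illustrative assumptions, since the paper does not define them concretely.

/* Hypothetical layout for the preparsing stage: a stack of open
 * elements plus a queue of extracted key elements. */
#include <stddef.h>

enum comp_type { ATOMIC, COMPOUND };

struct key_elem {
    size_t offset;        /* break point: element start in the document */
    long   parent;        /* queue index of the parent, -1 for the root */
    enum comp_type type;  /* chooses the parsing method later           */
};

/* On a start tag: the parent is whatever is on top of the stack,
 * because sequential reading is a pre-order traversal of the DOM tree. */
static long on_start_tag(long *stack, int *top, struct key_elem *queue,
                         long *qlen, size_t offset, enum comp_type type)
{
    long parent = (*top >= 0) ? stack[*top] : -1;
    long idx = (*qlen)++;
    queue[idx] = (struct key_elem){ offset, parent, type };
    stack[++(*top)] = idx;          /* push; popped at the matching end tag */
    return idx;
}

/* On the corresponding end tag the element is simply popped. */
static void on_end_tag(int *top) { --(*top); }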

B. Segment Parsing
1) Data Task Retrieval
Task assignment in KEPT is triggered by a parsing thread as soon as it becomes idle. A parsing task consists of a certain number of successive nodes from the queue described in Section III.A; the nodes in a task determine the amount of XML data to be loaded and parsed by the parsing thread. To keep the data being parsed resident in cache, the task size should be smaller than the size of the on-chip last-level cache; this lets the working sets of preparsing and parsing overlap and optimizes data access performance. We use the following equation to decide the task size Ts:
Ts = C / (1 + μ)
where C is the last-level cache size and μ is an empirical estimate of the ratio of tags to pure data in XML documents (generally set to 3). This ratio varies across XML data [10]; in general, the more elements a given amount of data contains, the higher μ is. Before individual elements are parsed, the data dependences must be checked and satisfied. These dependences can be summarized as follows:
1. Elements must be preparsed in the order in which they occur in the document.
2. Elements must be parsed in data dependence order: the parent of an element must be processed first.
3. For any single element, preparsing must always be applied before parsing.
In short, the parent of the node currently being parsed must already have been parsed in order to obtain the namespace. If the dependences are not fulfilled at the moment, the current node is placed in a global job queue; after finishing its current task, every working thread revisits this queue and parses the nodes that have become ready.
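As a concrete check of the size rule: with a 4MB last-level cache and μ = 3, Ts = 4MB / 4 = 1MB. A hedged sketch of the rule and of the readiness test implied by the three dependences follows; the names are ours, not the paper's.

/* Illustrative sketch of Ts = C / (1 + mu) and of the parse-readiness
 * test. Rules 1 and 3 are enforced by the preparsing order itself. */
#include <stddef.h>

static size_t task_size(size_t llc_bytes, double mu)
{
    /* Keep one task's data within the last-level cache so the working
     * sets of preparsing and parsing overlap. */
    return (size_t)((double)llc_bytes / (1.0 + mu));
}

/* A node is ready to parse once its parent has been parsed (rule 2). */
struct qnode { struct qnode *parent; int parsed; };

static int ready_to_parse(const struct qnode *n)
{
    return n->parent == NULL || n->parent->parsed;
}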

2) Element Parse
The parsing of XML elements is based on a widely used XML parsing library, libxml2. Two characteristics matter when using the libxml2 parsing interface. First, the fragment being parsed must be well-balanced: as long as start and end tags match, both compound and atomic elements can be parsed. Second, the parsing interface needs the parent DOM node to be passed in; the namespace is then retrieved by tracing upward until the root element, which can be an expensive operation, particularly when the tracing path is long. In this way the namespace of the current element is resolved. Based on these two characteristics, we choose the parsing method according to an element's composition type:
a) Atomic elements can be parsed with the normal interface. If several contiguous atomic elements share the same parent, they can be parsed in a single call to reduce the overhead of multiple function calls.
b) For compound elements, an automatic end-tag completion method is used: e.g., for an element opened by "<a>", we complete it with "</a>". The normal interface can then be applied again. If all of a compound element's children are in the same task, it can be parsed as an atomic one to reduce the namespace-tracing operations.
3) Results Store
Results are stored in the queue nodes standing for the key elements; each holds the DOM nodes generated by libxml2. Because libxml2 does not link the generated nodes to outer nodes, the parent of a newly generated DOM node is referenced again, which ensures the namespace can be resolved when parsing its children.
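The paper does not name the exact libxml2 entry point it uses; one plausible fit, since a parent node must be passed in for namespace resolution, is xmlParseInNodeContext. A hedged sketch of end-tag completion for compound elements, with illustrative names and buffer handling:

/* Hypothetical sketch: parse a compound element whose chunk holds only
 * its start tag by completing the end tag, then handing the balanced
 * fragment to libxml2. Truncation and error paths are simplified. */
#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/xmlerror.h>

static xmlNodePtr parse_compound(xmlNodePtr parent, const char *start_tag,
                                 const char *name)
{
    char buf[4096];
    xmlNodePtr list = NULL;

    /* Automatic end-tag completion: "<a>" becomes "<a></a>" so the
     * fragment is well-balanced, as libxml2 requires. */
    snprintf(buf, sizeof buf, "%s</%s>", start_tag, name);

    /* Passing the parent node lets libxml2 resolve namespaces by
     * walking up toward the root. */
    if (xmlParseInNodeContext(parent, buf, (int)strlen(buf), 0, &list)
            != XML_ERR_OK)
        return NULL;
    return list;  /* linked under parent during result merging */
}

For example, parse_compound(parent, "<a>", "a") would yield the node for an empty <a> element, whose children are parsed separately and linked in later.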

C. Result Merging
When all nodes in the queue have been parsed, the corresponding DOM nodes have been created. According to the DOM specification, the DOM nodes must be linked in parent-child and sibling order. Since the queue preserves a pre-order traversal of the DOM tree, we simply connect the DOM nodes as follows: each node in the queue except the first is appended to the tail of the child list of its parent node. After all nodes in the queue have been processed, the root of the DOM tree is the DOM node of the queue's head.
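A minimal sketch of this merging step, assuming a queue-entry layout like the one above; xmlAddChild appends at the tail of the child list, which preserves sibling (document) order.

/* Merge per-element DOM fragments into one tree. Queue order is a
 * pre-order traversal, so parent entries always precede children. */
#include <stddef.h>
#include <libxml/tree.h>

struct qentry {
    xmlNodePtr dom;         /* DOM node produced by libxml2       */
    struct qentry *parent;  /* NULL only for the document root    */
};

static xmlNodePtr merge_results(struct qentry *queue, size_t n)
{
    for (size_t i = 1; i < n; i++)
        xmlAddChild(queue[i].parent->dom, queue[i].dom);
    return queue[0].dom;    /* root of the merged DOM tree */
}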

IV. PERFORMANCE EVALUATION

We compare the performance of KEPT with PXP and libxml2, including speedup tests for single-document parsing. To validate the effectiveness of KEPT, we also analyze the factors that influence its performance.


A. Experimental Settings
The experiments are based on two systems: a Dell 1900 with two Intel Xeon 5310 processors, and a Sun Fire T1000 with an UltraSPARC T1 processor. We refer to these two platforms as Xeon 5310 and T1 in the rest of this paper. We use gcc with the optimization option -O2 to compile the applications and the default optimization options for the libraries used. A modified libxml2 version 2.6.30 [5] is used for basic XML chunk parsing, together with the Hoard memory allocator (libhoard) version 3.6.2 [17]. We use the performance tuning tools VTune on Xeon 5310 and cputrack on T1 to collect micro-architecture data for performance analysis. All speedups are calculated by dividing the sequential parsing time (obtained with the original libxml2) by the parallel parsing time on DBLP XML documents [16].

B. Overall Performance
Fig. 2(A) and Fig. 2(B) show the speedup of PXP and KEPT with a task size of 256KB on the Xeon 5310 and T1 platforms, respectively. From these figures we can conclude that KEPT outperforms PXP for XML parsing on both platforms: the total makespan of KEPT is shorter than that of PXP for nearly all thread counts. The speedup on Xeon 5310 is improved by about 50% with 8 threads, and by 33% on T1 with 12 threads. The speedup drops when the thread count exceeds 8 on Xeon 5310 and 12 on T1, because threads begin to contend for resources significantly at these points; we analyze this in subsection C.2. Experimental results with input documents of different sizes are shown in Fig. 2(C) for PXP and Fig. 2(D) for KEPT. Both KEPT and PXP perform better as the input size increases, mainly because the startup overhead accounts for less of the overall makespan with larger inputs. Both methods show a dropdown when the number of threads exceeds 8. For KEPT, some parsing threads become idle if the total task size required exceeds the input document, and the overall performance is then contributed only by the working threads. If the task size is set fairly, based on load balance, KEPT still scales well with both the document size and the number of available cores.

Figure 2. Performance comparison of PXP and KEPT for XML documents

C. Analysis of the Influence Factors on Performance
In order to quantify the synchronization and communication required, we evaluate the influence factors of KEPT performance based on the Bulk Synchronous Parallel (BSP) model [21]. The BSP model for designing parallel algorithms bridges the hardware architecture and parallel programs by quantifying performance in three aspects, which can be described by the formula below:
Overall performance = computation time + communication cost + barrier cost
KEPT for parallel XML parsing can be mapped onto the same three aspects, and in the following subsections we analyze the influence factors of KEPT accordingly.
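For reference, the per-superstep cost in Valiant's original BSP formulation [21] is usually written as below; mapping KEPT's stages onto it is our reading, not a formula given in the paper.

% Cost of one BSP superstep: the slowest local computation, h messages
% sent at a gap of g time units each, and the barrier latency l.
T_{\mathrm{superstep}} = \max_i w_i + h \cdot g + l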

The runtime parameters that determine the performance of KEPT are the thread number and the task size. Fig. 2(E) and Fig. 2(F) show their impact on the total speedup when parsing 64MB of DBLP data on the two platforms.
1) Computing Time
The computing time is mainly determined by the element dependences in the XML document, the percentage of preparsing in the total time, and the number of processing cores. Element dependence determines the potential degree of parallelism: for a document with a deep structure, the parsing process has to be serial. Although this largely impacts the degree of parallelism, we do not discuss it here because it is not common in real applications and is out of the scope of this paper. The computing time comprises preparsing and parsing. As preparsing is serial, its percentage determines the scalability of KEPT; even though the preparsing process itself is not parallelized, KEPT improves the total speedup by making preparsing and parsing parallelizable at the element level. The thread number stands for the available processing cores: the more threads, the higher the speedup, as long as the thread count does not exceed the core count. KEPT reaches peak performance when the thread number equals the number of cores on the Xeon 5310 platform, as can be seen in Fig. 2(E). The optimal thread number on Xeon 5310 is 8, consistent with the fact that the Intel platform contains two Xeon 5310 CPUs: fewer than 8 threads may not fully utilize the processing power, while more than 8 threads suffer from resource contention. This is not the case on T1: as T1 is an 8-core architecture with 24 hardware SMT threads, KEPT reaches top performance with 12 threads, no longer equal to the number of cores. The parsing speed ratio and the hardware multithreading are the main reasons why 12 threads are optimal there. Therefore, the optimal number of threads is determined by the core count, the parsing speed ratio, and the cache hierarchy of the multi-core CPU.

2) Barrier Cost
The barrier cost includes the cost of load balancing and that of keeping the global state consistent. Load imbalance in KEPT is usually caused by different processing times for tasks of the same size: the task size is estimated from the data size, but the actual time depends on the composition of the elements in the task. The processing time of an element with multiple attributes in its start tag differs from that of an element with multiple child elements, which leads to deviations in the actual processing time. The larger the task, the larger the deviation; but if the allocated task is too small, the synchronization cost between working threads becomes much higher. The synchronization cost in KEPT includes the exclusive acquisition of preparsing tasks and the exclusive access to the global dependent job queue. The impact of task size is shown in Fig. 3(A), which plots the makespan of preparsing and the total time for document parsing. As the task size increases, the preparsing time decreases; the total time follows the same trend while the task size stays below 128KB, then increases again slowly. To reveal the load imbalance in KEPT, the maximum difference of the parsing threads' finishing times (MDT) is recorded, shown in Fig. 3(B): larger task sizes lead to larger imbalance, as explained above. Synchronization cost is measured by the number of voluntary context switches (VCS) on Linux; Fig. 3(B) shows that the number of VCS drops substantially as the task size increases. Together these results show the impact of task size on the barrier cost.


3) Communication Cost
The communication cost in BSP is the time for sending n messages over a communication channel with transfer rate g. In KEPT, this cost is determined mainly by how work is assigned to processors, the thread number, the task size, and the machine architecture. Localizing the operations on elements reduces the number of messages that must travel between processing units and storage units across cores. Localized data access reduces cache replacement and improves the effective transfer rate: because of the enormous speed gap between levels of the storage hierarchy, data transfer rates can improve by orders of magnitude when data is placed near the processing units. To validate the effectiveness of these optimizations, we conducted a micro-architecture analysis of KEPT and libxml2 on the two platforms. Fig. 3(C) and Fig. 3(D) show how the L1 and L2 cache miss rates change with the number of threads. The L2 cache miss rates of KEPT with a task size of 128KB are relatively low on both platforms. Increasing the thread count does not significantly affect the memory performance of KEPT on the Xeon 5310 platform, which we attribute to the small working set of XML parsing and the large 4MB L2 cache shared by the 2 cores in each physical package. The effect is more visible on T1, because all eight cores contend for the 3MB L2 cache; we therefore investigate T1 more carefully. To see how data partitioning affects memory performance, we first measure the L2 cache miss rate on T1 with 1 and 2 threads, to avoid cache contention. The miss rate grows with the task size for both 1 and 2 threads: a small task size gives better locality because the working sets of preparsing and parsing are more likely to overlap. With more threads on T1, however, the cache behavior is harder to analyze, since it is affected by both the number of threads and the task size, and the results with more threads are relatively indistinct compared to those with 1 or 2 threads. A possible reason is that load imbalance decreases contention for the cache and thus reduces the miss rate. Overall, the L2 cache miss rates on both platforms are relatively low, so memory is not a major bottleneck for XML parsing performance; nevertheless, we believe that awareness of the memory hierarchy is necessary for further optimization.

Figure 3. Performance analysis of KEPT with different task sizes (A)(B) and thread counts (C)(D)



V. CONCLUSION AND FUTURE WORK

Because of the semi-structured nature of XML, neither the data-parallel nor the task-parallel approach alone scales well on multi-core platforms. From the data-parallel perspective, an XML parser cannot establish a firm working context at an arbitrary position in a document. This problem has been solved by scanning the XML document and partitioning it based on its structure, but then either the scanning becomes a bottleneck (PXP) or the overhead of parallelizing the scanning is extremely high (SFTXP). To address these problems, we redefine preparsing as a key-element extraction and relationship-extraction process. The preparsing of one part of the elements is scheduled to execute in parallel with the parsing of other parts; at the element level, an element is scheduled for parsing once its dependences are fulfilled (KEPT). The experimental data show that our method exhibits good performance on both an 8-core Intel Linux machine and an 8-core, 24-SMT Sun machine. Even though the scalability of XML parsing is enhanced, the preparsing stage as a whole is still serial. Techniques for quickly locating tags would obviously improve preparsing performance, and the idea of SFTXP could also be applied at the key-element level to parallelize preparsing. Scheduling mechanisms aware of data locality and cache performance are a promising direction for data-intensive applications such as XML parsing, so another line of future work is applying the tracing method to the parallel processing of XML-like data.




ACKNOWLEDGMENT
We acknowledge contributions by the GOS team, especially Hong Liu and Yizhu Tong. This work is supported in part by the Natural Science Foundation of China (Grants No. 60873243, 60603004, and 90612019), the 863 Program (Grants No. 2006AA01A106 and 2006AA04Z158), and the 973 Program (Grant No. 2005CB321807). We also thank the reviewers for their valuable comments.

REFERENCES
[1] W. Lu, K. Chiu, and Y. Pan. A Parallel Approach to XML Parsing. In Grid'06, pages 223-230, Barcelona, Spain, September 2006.
[2] Y. Pan, W. Lu, Y. Zhang, and K. Chiu. A Static Load-Balancing Scheme for Parallel XML Parsing on Multicore CPUs. In CCGrid'07, pages 351-362, Rio de Janeiro, Brazil, May 2007.
[3] M. G. Kostoulas, M. Matsa, N. Mendelsohn, E. Perkins, A. Heifets, and M. Mercaldi. XML Screamer: An Integrated Approach to High Performance XML Parsing, Validation and Deserialization. In WWW'06, pages 93-102, Edinburgh, Scotland, May 2006.
[4] T. Takase, H. Miyashita, T. Suzumura, and M. Tatsubori. An Adaptive, Fast, and Safe XML Parser Based on Byte Sequence Memorization. In WWW'05, pages 692-701, Chiba, Japan, May 2005.
[5] The XML C parser and toolkit of Gnome. http://xmlsoft.org
[6] W3C Document Object Model. www.w3.org/DOM/
[7] M. R. Head and M. Govindaraju. Approaching a Parallelized XML Parser Optimized for Multi-Core Processors. In Workshop on Service-Oriented Computing Performance, in conjunction with HPDC'07, pages 17-22, Monterey, California, USA, June 2007.
[8] W. Zhang and R. van Engelen. A Table-Driven Streaming XML Parsing Methodology for High-Performance Web Services. In ICWS'06, pages 197-204, Chicago, USA, September 2006.
[9] K. Chiu, M. Govindaraju, and R. Bramley. Investigating the Limits of SOAP Performance for Scientific Computing. In HPDC'02, pages 246-254, Edinburgh, Scotland, UK, July 2002.
[10] M. Nicola and J. John. XML Parsing: A Threat to Database Performance. In CIKM'03, pages 175-178, New Orleans, Louisiana, USA, November 2003.
[11] C. Ding and Y. Zhong. Predicting Whole-Program Locality through Reuse Distance Analysis. In PLDI'03, pages 245-257, San Diego, California, USA, June 2003.
[12] P. J. Denning. The Working Set Model for Program Behavior. Communications of the ACM, volume 11, issue 5, pages 323-333, May 1968.
[13] J. J. Ding and A. Waheed. Dual Processor Performance Characterization for XML Application-Oriented Networking. In ICPP'07, Xi'an, China, September 2007.
[14] M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic Parallelism Requires Abstractions. In PLDI'07, pages 211-222, San Diego, California, USA, June 2007.
[15] M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic Parallelism Benefits from Data Partitioning. In ASPLOS'08, Seattle, WA, USA, March 2008.
[16] DBLP XML records. http://dblp.uni-trier.de/xml/
[17] E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In ASPLOS'00, pages 117-128, Cambridge, MA, USA, November 2000.
[18] M. L. Noga, S. Schott, and W. Lowe. Lazy XML Processing. In DocEng'02, McLean, Virginia, USA, November 2002.
[19] P. Apparao, R. Iyer, R. Morin, N. Nayak, and M. Bhat. Architectural Characterization of an XML-centric Commercial Server Workload. In ICPP'04, Montreal, Quebec, Canada, August 2004.
[20] Y. Pan, Y. Zhang, and K. Chiu. Simultaneous Transducers for Data-Parallel XML Parsing. In IPDPS'08, pages 1-12, April 2008.
[21] L. G. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103-111, August 1990.
