Efficient Memory Representation of XML Document Trees
Giorgio Busatto a, Markus Lohrey b, Sebastian Maneth c,1
a Department für Informatik, Universität Oldenburg, Germany [email protected]
b National ICT Australia Ltd. 1 and University of New South Wales, Sydney, Australia [email protected]
Abstract
Implementations that load XML documents and give access to them via, e.g., the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document itself. A considerable amount of this memory is needed to store the tree structure of the XML document. This paper presents a technique for representing the tree structure of an XML document efficiently. The representation exploits the high regularity of XML documents by compressing their tree structure, that is, by detecting and removing repetitions of tree patterns. Formally, context-free tree grammars that generate only a single tree are used for tree compression. The functionality of basic tree operations, such as traversal along edges, is preserved under this compressed representation. Queries (and in particular bulk operations) can therefore be executed directly, without prior decompression. The complexity of certain computational problems, such as validation against XML types or testing equality, is investigated for compressed input trees.
Key words: Tree grammar, compression, in-memory XML representation
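To make the idea of removing repeated tree patterns concrete, the following Python sketch shares identical subtrees, so that each distinct subtree is stored only once. This is the simplest special case of pattern sharing (a DAG, which corresponds to a regular tree grammar) and is not the context-free tree grammar construction of the paper. The tuple encoding of trees and the names `share_subtrees`, `tree_size` and `expand` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a tree is a nested tuple (label, child, child, ...).
Rule = Tuple[str, Tuple[int, ...]]


def share_subtrees(tree) -> Tuple[List[Rule], int]:
    """Return (rules, root_id); rules[i] = (label, ids of the children).

    Identical subtrees receive the same id, so each is stored once.
    Recursive, hence only suitable for trees of moderate depth.
    """
    ids: Dict[Rule, int] = {}
    rules: List[Rule] = []

    def visit(node) -> int:
        label, children = node[0], node[1:]
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            ids[key] = len(rules)
            rules.append(key)
        return ids[key]

    root_id = visit(tree)
    return rules, root_id


def tree_size(rules: List[Rule], root_id: int) -> int:
    """Count the nodes of the represented tree without expanding it.

    Children always get smaller ids than their parent, so a single
    left-to-right pass over the rules is sufficient.
    """
    sizes: List[int] = []
    for _label, kids in rules:
        sizes.append(1 + sum(sizes[k] for k in kids))
    return sizes[root_id]


def expand(rules: List[Rule], i: int):
    """Rebuild the original nested-tuple tree (used here only as a check)."""
    label, kids = rules[i]
    return (label,) + tuple(expand(rules, k) for k in kids)


# Hypothetical document: three structurally identical <book> elements.
book = ("book", ("title",), ("author",))
doc = ("library", book, book, book)

rules, root = share_subtrees(doc)
print(len(rules), tree_size(rules, root))
```

In this example the ten-node tree is described by four rules, and `tree_size` answers a query on the shared form directly, which mirrors, in a much weaker setting, the claim that operations can run without prior decompression.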
1 Introduction
There are many scenarios in which trees are processed by computer programs. Often it is useful to keep a representation of the tree in main memory in order to retain
1 National ICT Australia is funded through the Australian Government’s Backing Australia’s Ability initiative, in part through the Australian Research Council.