XCDL: an XML-Oriented Visual Composition Definition Language

20 downloads 185 Views 906KB Size Report
Visual languages, Colored Petri Nets, Composition, XML data .... are emerging and provided for desktop and online applications. .... It is based on simple drag.
XCDL: an XML-Oriented Visual Composition Definition Language Gilbert Tekli

Richard Chbeir

Jacques Fayolle

Telecom St Etienne, Jean Monnet University St Etienne, France +33 4 77 48 50 34

LE2I Laboratory (UMR-CNRS), Bourgogne University Dijon, France +33 3 80 39 36 55

Telecom St Etienne, Jean Monnet University St Etienne, France +33 4 77 48 50 00

[email protected]

[email protected]

[email protected]

ABSTRACT XML data flow has reached beyond the world of computer science and has spread to other areas such as data communication, e-commerce and instant messaging. Therefore, manipulating this data by non expert programmers is becoming imperative. On one hand, Mashups have emerged a few years ago, providing users with visual tools for web data manipulation but not necessarily XML specific. Mashups have been leaning towards functional composition but no formal languages have yet been defined. On the other hand, visual languages for XML have been emerging since the standardization of XML, and mostly relying on querying XML data for extraction or structure transformations. These languages are mainly based on existing textual XML languages, have limited expressiveness and do not provide non expert programmers with means to manipulate XML data. In this paper, we define a generic visual language called XCDL based on Colored Petri Nets allowing non expert programmers to compose manipulation operations. The language is adapted to XML, providing users with means to compose XML oriented operations. The language core syntax is presented here along with an implemented prototype based on it.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Formal Definitions and Theory– semantics, syntax.

General Terms Design, Standardization, Languages, Theory.

Keywords Visual languages, Colored Petri Nets, Composition, XML data manipulation.

1. INTRODUCTION The widespread of XML today has invaded the world of computers and is present now in most of its fields (i.e., internet, networks, information systems, software and operating systems). Furthermore XML has reached beyond the computer domain and Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. iiWAS2010, 8-10 November, 2010, Paris, France. Copyright 2010 ACM 978-1-4503-0421-4/10/11...$10.00.

is being used to communicate crucial data in different areas such as, e-commerce, data communication, identification, information storage, instant messaging and others. Therefore, due to the extensive use of textual information transmitted in form of XML structured data by expert and non expert programmers simultaneously, it is becoming essential to allow users, programmers and others, to manipulate corresponding XML data holding sensitive textual information based on personal requirements. As an example, consider a cardiologist who shares medical records of his patients with some of his colleagues and wishes to omit personal information concerning his patients (i.e., name, social security number, address, etc.). In this case, data omission is the manipulation required which can be done via data encryption, removal, substitution or others depending on the operations provided by the system and the requirements of the user (cardiologist in this case). In order to address this matter, 2 main approaches have emerged in the literature, Mashups and XML oriented visual languages. Both approaches try to provide expert and non expert programmers with the ability to write/draw data manipulations by means of visual elements. While there has been no common definition for Mashups, existing Mashup tools mainly aim at composing manipulation operators (e.g., RSS filters) for different types of web data (e.g., html, web site content…), but are not specific to XML. Since Mashups have not been formally defined yet as stated in [5], no languages have emerged yet providing visual functional compositions. On the other hand, XML oriented visual languages are already formalized and mainly based on existing XML transformation (e.g., XSLT) or querying languages (e.g., XQuery). They provide visual means for non expert programmers to write manipulation operations specific for XML data. Nonetheless, the expressiveness of existing XML oriented Visual languages is limited to their inability to visually express all the operations existing in the languages (e.g. aggregation functions) which they are based upon. Also the expressiveness is limited to the operations of these languages themselves. Their main goal is data extraction and structure transformation. Aside from their expressiveness limitations, these languages normally require the user to have some knowledge in different areas such as data querying which renders the task more difficult. They are not considered as visual functional composition languages. In this paper, we try to solve these issues by defining a new visual language, XML-Oriented Composition Definition Language (XCDL), which provides a generic and formal definition of visual function-based composition. It is based on Colored-Petri Nets which allows it to express complex compositions with parallelism

and concurrency. In this paper, we present the definitions and specifications of the XCDL language along with a prototype developed to validate our approach. The rest of this paper is organized as follows. The first section motivates our research with a few scenarios. Section 2 discusses related work and available approaches. In Section 3, we present the definition and specifications of the XCDL. Section 4 presents the prototype. And finally, we conclude and state some future works.

words sensitive data in the RSS feeds are to be encrypted by the information provider (the communication department), decrypted by the corresponding readers (employees and partners) and obfuscated for the rest. The feeds should remain RSS standardized. Required techniques: 1-

XML Granular Content Encryption and Signature: 

2. MOTIVATING SCENARIOS To motivate our research and describe some of the issues which can be addressed using visual composition languages, 2 scenarios are described which illustrate different perspectives of XML data manipulation. The following scenarios can neither be solved using existing Mashups nor XML visual languages. Mashups have 2 main drawbacks: (i) they are not XML specific tools and therefore do not offer data or text centric XML related operators, and (ii) they are server tools and cannot manipulate desktop data. As for XML visual languages, they do not really offer manipulation operators, but operate mostly on extracting data and transforming the XML structure. If we consider having a visual composition language generic and adaptable to different data types providing the user with means to create data manipulation operations from functions defined either in system libraries or external services, such a language can be embedded in any Mashup tool, server or client side, and used to create different manipulations based on the functions specified in them which can be specific to a certain data type, in our case XML data. Consider a media company running different departments locally and internationally (Reporting department, Publishing department, Communication department …). Different control scenarios are required either in a single department or between departments. Scenario 1: News gathering and report generation A journalist working in the reporting department is writing an article covering a global event. The journalist wishes to acquire all information being transmitted by different media sources (television channels, radio channels, journals …) in the form of RSS feeds, filter out their content based on the topic (s)he is interested in, and then compare the resulted feeds. Based on the comparison results, a report covering relevant facts of the event will be generated. To achieve this control, several techniques would be required: 1-

XML Filtering: Filter XML data provided by several sources having the same structure (RSS Schema) based on a specific topic.

2-

XML Content Similarity: Compare the filtered XML data for content similarities and retrieve significant data.

3-

Automated XML generation: Generate an XML file reporting the filtered out XML data.

Scenario 2: Sensitive Data obfuscation The Communication department posts information and news concerning its activities in form of RSS feeds over the internet. The company wants to keep sensitive parts of the information exclusive to its employees and partners. But the information needs to be partially available worldwide over the internet. In other



Encrypt and sign part of the data content transmitted in an XML file without altering the structure. (e.g.38SUJujdgxxvES decided to sign the contract on the Wx34zs5sdZD.). Decrypt the encrypted data by the corresponding users.

To summarize, separate techniques (i.e. filtering, content adaptation, encryption, etc.) must be implemented and adapted together in order to provide the previous requirements. These techniques generally require some expertise in different fields in order to implement them separately. In addition, more and more services and libraries defining functions related to such techniques are emerging and provided for desktop and online applications. Thus, allowing their reuse is becoming essential. Providing a visual composition language for XML data would render these tasks simpler and manageable by non expert programmers by composing the required manipulations based on embedded functions related to these techniques.

3. RELATED WORK So far 2 main approaches have emerged aiming at providing non expert programmers with tools allowing them to define their own control or manipulation operators for XML data: Mashups and Visual languages.

3.1 Mashups Mashup is a new application development approach that allows users to aggregate multiple services, each serving its own purpose, to create a service that serves a new purpose. Mashups are built on the idea of reusing and combining existing services. They are mainly designed for data circulating on the web. Their objective is to target non-expert programmers; therefore a graphical interface is generally offered to the user to express most operations. Mashup applications [1][9][12] can include Mashups using maps (i.e. google maps and yahoo map3), multimedia content (i.e. youtube and flicker videos), e-commerce services (i.e. amazon.com and ebay.com) and news feeds (i.e. RSS and ATOM). The latter is the focus of most emerging Mashup tools nowadays. To the best of our knowledge, no tool yet provides information regarding the analysis of the performances. All the tools are supposed to target non-expert users, but a programming knowledge is usually required depending on each tool. None of the Mashup tools provide a formal language for creating Mashups and particularly, functional composition based languages. Several tools have emerged such as Damia [1], Yahoo Pipes [5], Popfly [12], Apatar [5] and MashMaker [9]. Damia and Yahoo Pipes are mainly designed to manipulate Data Feeds such as RSS feeds. Popfly is used to visualize data associated to social networks such as Flicker and Facebook. Popfly is a framework for creating web pages containing dynamic and rich visualizations of structured data retrieved from the web through REST web services. Apatar helps users join and aggregate desktop data such as MySQL, Oracle, PS SQL and others with the web through

REST web services. MashMaker is used for editing, querying and manipulating data from web pages. Its goal is to suggest to the user some enhancements, if available, for the visited web pages. Existing Mashup tools are mainly designed to handle online Web data which can be a disadvantage since by doing this, user‟s data, generally available on desktops cannot be accessed and used. They are not specifically designed for XML data manipulation and therefore do not provide XML specific operations for querying, updating and modifying all types of XML data. Existing tools are going towards composition based on existing functions (i.e., Damia and Yahoo Pipes) which allows them to increase their expressiveness in comparison with the tools following the query by example technique. The latter have limited operations and are considered more complex for non programmers due to the fact that some knowledge is required in querying data.

3.2 Visual Languages for XML

Expressiveness

High

Low

Based on Formal languages

No

Yes

Functional Composition

Yes

No

Composition-based functions

No

No

Extending functions

Dependent on the tool

limited

4. OUR APPROACH Our approach is based on the same spirits of both Mashups and XML visual languages as shown in Figure 1. On one hand, it has a similar architecture to Mashups and defines visual compositions. On the other hand, it presents a formally defined visual functional composition language and separates the inputs and outputs to source and destination structures similar to XML-oriented visual languages.

Since the emerging of the XML standard and its widespread beyond the computer domain, researchers have been trying to provide visual languages allowing the manipulation of XML data. These visual languages are mainly extensions of existing approaches such as XML query languages and transformation languages. Their main contribution is to allow non expert programmers to extract sensitive data from XML document and restructure the output document. Several languages have been developed over the years such as Xing [8], XML-GL [10], XQBE [2] and VXT [4]. On one hand, Xing and XML-GL were developed before XQuery was standardized and took the SQL querying approach by following the 3 main components of a regular query, selecting, filtering and restructuring the data. XQBE was developed after XQuery and is based on it. Its expressiveness is greater than previous approaches whereas it allows the creation of complex queries containing aggregation functions, ordering results and negation expressions. Nonetheless, its expressiveness is limited to data extraction and query reconstruction in XQuery and does not include textual data manipulation operations. VXT, on the other hand, was based on XSLT [14] which is mainly used for XML data restructuring and not textual data manipulation. On the visual aspect, all of these approaches followed the same pattern, dividing their workspace into 2 main sections. The left and right sections, where the first one constitutes the source file with the extraction rules and the second one constitutes the result file with some links between the 2 sections defining the mapping between them. The existing visual languages successfully bridged the gap between the complexities of XML data querying and non expert programmers but were limited only to data extraction, filtering and restructuring. So mainly they provided non expert programmers with the ability to create XML structural transformations along with data extraction and filtering but did not deal with the textual data manipulation issues. Table 1 summarizes the different criterions of the Mashups and XML oriented visual languages. Table 1: Properties of Mashups and XML oriented languages Properties

Mashups

XML Visual languages

XML specific

No

Yes

Manipulate online data

Yes

Yes

Manipulate desktop data

No

Yes

Figure 1: The XA2C approach The approach targets both expert and non-expert programmers. It defines and specifies a formal visual composition language which can be adapted to any composition based Mashup tool or visual functional composition tool. Nevertheless our language is XML oriented and it scopes all XML data. In addition, it is based on CP-Nets which allow us to provide information regarding performance analysis and error handling. Before giving the formal definition of the XCDL (XML-Oriented Visual Composition Definition Language), we give an informal one and a discussion of some of its properties.

4.1 Informal Definition of the XCDL The XCDL is a visual functional composition language based on system defined functions and oriented towards XML. We denote by system defined functions (SD-functions), functions which will be identified in the language environment. These SDfunctions can be provided by DLL/JAR files or services (e.g., Web service). XCDL is divided into 2 main parts: 

The Inputs/Outputs (I/O).



The SD-functions and the composition which constitute the XCDL Core.

The I/O are defined as XML Content Description trees (XCDtrees) which are Ordered labeled trees summarizing the structure of XML documents or XML fragments, or representing a DTD or an XML schema, in forms of tree views as shown in Figure 4. SD-functions are defined each as a CP-Net [7][13] with the inputs and outputs defined as places and represented graphically as circles filled with a single color each defining their types. It is important to note that a function can have one or multiple inputs but only one output. The operation of the function itself is represented in a transition which transforms the inputs to the output. Graphically, it is represented as a rectangle with an image embedded inside it describing the operation. Input and output places are linked to the transition via arcs represented by direct lines. Several sample functions are shown in Figure 2.

   

Expressiveness: allowing users to describe basic operations (e.g. String search) as well as complex ones (e.g. granular encryption). Flexibility: enabling its integration and appliance to different systems (e.g. web applications, desktop applications). Extensibility: allowing it to be extended and enriched with new operators and functionalities. Adaptability: allowing it to be adaptable to different domains (in our research to Textual and XML data).

In order to satisfy simplicity, we defined the language as visual and based on functional composition. It is based on simple drag and drop actions of graphical components in order to compose manipulation operations. To provide Expressiveness, Flexibility and Extensibility, we based the syntax of the XCDL on CP-Nets instead of other algebras or grammars (e.g., Lambda Calculus), based on: 

Figure 2: Several sample functions defined in XCDL The composition is also based on CP-Nets. It is defined by a sequential mapping between the output and an input of SDfunctions. It is represented by a combination of graphical functions which are dragged and dropped, and then linked together with a sequence operator which is represented by a direct line between the output of a function and an input of another having the same color as shown in Figure 3. As a result, on one hand, a composition might be a serial one meaning that all the functions are linked sequentially and to each function one and only one function can be mapped as illustrated in Figure 9.a. In this case, the sequential operator is enough. On the other hand, the composition might contain concurrency, as in, several functions can be mapped to a single one as depicted in Figure 9.b. In this case we introduce an abstract operator, the concurrency operator, in order to indicate the functions are concurrent. The geometric properties of the functions are shown in Figure 10, such that, input places are drawn in a symmetric manner in correspondence with the X-axis considered to be situated in the middle of the transition. The distance between the circles is calculated automatically as described in section 4.4.

The fact that it allows us to define our language as visual in a more simplified manner than other algebras and grammars (e.g., lambda calculus).  CP-Nets allow the expressiveness of both State and state changes simultaneously.  CP-Nets have a very well defined semantics and can describe any type of workflow system, behavioral and syntax wise simultaneously.  Both, the execution and compilation of our language are based on CP-Nets properties.  Dealing with concurrency is straight forward with CP-Nets and does not require any adaptations.  CP-Nets are easily adapted to define Object-Oriented languages due to their ability to deal with different types of data (colors) and the use of global variables.  CP-Nets can be adapted to different contexts, such as time and QoS contexts (e.g., for online services)  CP-nets have several behavioral and dynamic properties such as, boundedness, home state, coverability, persistence, synchronic distance, liveness, fairness and analysis methods such as incidence matrix, reachability graph, and coverability tree which facilitate and enrich the execution and compilation of the language. And for the adaptability, we have separated the composition, from the inputs and outputs which allows us to orient the language towards different data types. In our research, we defined a tree structure representing XML data and rendered the language XML oriented. Now we give the formal definition of the I/O of the XCDL and the syntax of the language core.

4.2 XCD-trees I/O of XCDL Figure 3: Functional Composition in XCDL

4.1.1 XCDL Properties and Discussion To render the language as generic, extensible and user friendly; it should fulfill the following main properties: 

Simplicity: providing an easy to use language practical for all users: novice and expert programmers.

Since XCDL is XML oriented, therefore it aims at manipulating XML structured data, whether they are user based XML documents or fragments, DTD (Document Type Definition) based documents or XSD (XML Schema Definition) based documents. In order to describe XML data structure, we introduce a representation called XCD-tree depicted in Figure 4. It is based on the tree model defined in the standardized W3C DOM model. It views an XML based document as a node-tree where all nodes in the tree have a relationship to each other. In our research, we design the XML Content Description tree (XCD-tree) as an ordered labeled tree allowing us to represent the structure defining the content of XML data. It is imperative to note that the XCD-

tree does not represent the content of XML data itself but its structure. The content of XML Data is defined by the XML elements, attributes, and element/attribute values, which we assume to be textual data in this paper (similarly to most approaches targeting XML data management, e.g., search, indexing, etc., which disregard the various types of values that could occur in XML documents, e.g., Decimal, Integer, Date…, for the sake of simplicity).

empty list of child nodes is a Leaf node and Text nodes are the only Leaf nodes. We denote by RXCD-tree the root node of XCD-tree. If the XML data is a fragment of XML and contains no unique root element, then a virtual root node is inserted v_root. The ELEMENT and ATTRIBUTE typed nodes represent structural data, their labels underlining corresponding element/attribute tag names. As for a TEXT typed node, it represents data content, and is thus assigned a TEXT label in our tree representation model (since we are only interested in the content structure, and not the content values themselves). To illustrate this, we present Figure 4 depicting XCD-trees originating from an XML document (books.xml). In Figure 5, an XCD tree representation of an XML fragment with no single root.

Figure 4: XML document to XCD-tree example The XCD-tree allows us to represent any type of XML, datacentric and text-centric, i.e., DTD (Document Type Definition) and XSD (XML Schema Definition,) and XML files and XML data fragments. The XCD-tree representation of DTDs and XSDs is straightforward since they already give a structural view of XML documents. As for XML files and fragments, we use tree structural summarization techniques with repetition reduction [11] in order to extract the structure of the XCD-tree. It is important to note that our purpose is to have a representation describing the content structure of XML data (ELEMENT, ATTRIBUTE and TEXT nodes). Their grammar constraints, such as max occurrence and min occurrence are out of the scope of this paper. Definition 1-XCD-tree, it is a root node with a set of ordered subtrees,

Figure 5: XML fragment to XCD-tree example In the following section, we define the XCDL core.

4.3 XCDL core As discussed in the previous sections, the XCDL is a visual language defined in 3 levels as shown in Figure 7.

4.3.1 XCDL-Graphical Representation Model (XCDL-GR) The XCDL-GR model defines the graphical components used to represent the language syntax visually to the user. We define the following components, Point, AD (Abstract Drawing), Color, Circle, Line and Rectangle as shown in Figure 6.

XCD-tree= (NX, TX, LX, fX, AX) where:     

NX is the set of nodes in the XCD-tree (i.e., XCD-nodes) TX{ELEMENT, ATTRIBUTE, TEXT} is the set of node types associated to each XCD-node LX is a set of labels associated to each XCD-node fX : NX  LX,TX is the function associating a label and a type to each node AX  NX x NX is the set of arcs associating 2 nodes together

Definition 2-XCD-node Nx is represented by a doublet, XCD-node = where:  type  TX  label  LX A node can have one and only one parent except for the root node which has no parents. Each node has a list of child nodes. Attributes are child nodes of their elements. A node with an

Figure 6: GR Components Point defines a spatial point represented by 2 Cartesian coordinates and is used to define reference points for the rest of the components. AD is an abstract drawing defined by 2 reference points and is considered a super type for Line, Rectangle and Circle which are the 3 main graphical components of our language where Line is a graphical line, Rectangle is a rectangle filled with an image and Circle is a circle filled with an RGB Color defined by an Integer. If we consider D an instance of a drawing type and t one of its tuples, we denote D.t in order to specify the required tuple (e.g., consider r as a Rectangle, R.img retrieves img of Rectangle r). The following section presents the syntax of the XCDL core based on CP-Nets.

 

XCD-Node:Attribute defines the type XML Attribute XCD-Node:Text defines the type XML Element/Attribute Value

We define a marking M, which will be used in the rest of this paper, over a place p allowing us to retrieve the value in p. Definition 5-M is a marking over p where p P and M(p) is the value of the token in p  Figure 7: XCDL overview

4.3.2 Syntax Definition Before we introduce the XCDL core syntax, we define the grammar it is based on. The syntax is based on the grammar defined by CP-Nets‟ algebra (and therefore retains their properties such as, Petri Net firing rule [13] and Incidence matrix [7]). Definition 3-XCGN (standing for XML oriented Composition Grammar Net) represents the grammar of the XCDL which is compliant to CP-Nets. It is defined as XCGN = (, P, T, A, C, G, E, I) where: 

is a set of data types available in the XCDL

The XCDL defines 6 main data types, Char, String, Integer, Double, Boolean, XCD-Node} where Char, String, Integer, Double and Boolean designate the standard types of the same name. XCD-Node defines a super-type designating an XML component (cf. definition 4) P is a finite set of places defining the input and output states of the functions used in the XCDL T is a finite set of transitions representing the behavior of the XCDL functions and operators A  (P x T)  (T x P) is a set of directed arcs associating input places to transitions and vice versa o a A: a.p and a.t denote the place and transition linked by a C:Pis the function associating a color to each place G:TS is the function associating an SD function to a transition where: o S is the set of SD-functions, which are operations performed by functions indentified in the development platform’s libraries (e.g., concat(string,string)) E:AExpr is the function associating an expression expr Expr to an arc such that: o a A: Type(E(a))=C(a.p) I:PValue is the function associating initial values to the I/O places such that: o p P, v Value : [Type(I(p))=C(p) Type(v)

Before defining the syntax of our language, we define an empty CP-Net “” which will be used in the rest of this paper. Definition 6- is an empty CP-Net and is defined as,     

o

  

 

 



XCD-Node:Element defines the type XML Element

 = (, P, T, A, C, G, E, I) where:

P=Ø T=Ø A= Ø Since the CP-net is empty, therefore the functions do not perform any operations.

Then we introduce the composition which is defined mainly by 2 types: (i) a serial composition which is a sequential composition between multiple instances of SD-functions and sequence operators as shown in Figure 9.a. (ii) a concurrent composition which is a composition between multiple instances of SDfunctions and sequence operators to a single instance of a SDfunction as shown in Figure 9.b. The concurrency operator is used in this case to indicate that the SD-functions are concurrent. We define next a SD-function, which represents a function defined in the system‟s library, through a DLL file or web-services, having 1 or multiple input and a single output. Definition 7-SD-function is a system defined function based on CP-Nets, describing an operation based on an indentified function in the system’s library and is defined as, SD-function = (, P, T, A, C, G, E, I) where: 

XCD-Node  {XCD-Node:Element, XCD-Node:Attribute and XCD-Node:Text} where:

=Ø

We define now the syntax of the XCDL core. As mentioned previously, the core of the language is defined by SD-functions, a sequential operator, a concurrency operator and the composition which is realized between different instances of SD-functions, and sequential and concurrent operators. Therefore we introduce next the 3 main components of the XCDL: SD-function, Sequence operator “” and concurrency operator “//”. The latter is an abstract operator denoting that the functions are concurrent.



Definition 4-XCD-Node is a super type designating an XML Component. It has 3 main sub-types as defined in the XCD-tree:

We denote by M0 the set of initial markings of P and M(p) the initial marking of p where M0(p) = I(p)



is the set of colors defining the types of data available in the SD-function o   XCGN. P is a finite set of places defining the input and output states of the SD-function o P = PInPOut and PIn  POut = Ø where PIn = {pIn0, pIn1, …, pInn} and POut = {pOut}. PIn represents





  



the set of input places and POut represents the set of output places. T is a finite set of transitions representing the behavior of the SD-function o T = {t} where t contains the operation to be executed. A  (PIn x {t})({t} x POut) is a set of directed arcs associating input places to transitions and vice versa where PIn x {t} indicates the set of arcs linking the input places to t and {t} x POut linking t to the output places. C:Pis the function associating a color to each place G:{t} S is the function associating an operation to t where Type(G(t)) = C(pOut). The operation can be retrieved with a URL to the DLL file or a web-service. E:AExpr is the function associating an expression expr Expr to a : o Expr is a set of expressions where: ∀𝑒𝑥𝑝𝑟 ∈ 𝐸𝑥𝑝𝑟: 𝑒𝑥𝑝𝑟 𝑀 𝑎. 𝑝 𝑖𝑓 𝑎. 𝑝 ≠ 𝑝Out = 𝐺 𝑎. 𝑡 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 I:PInValue is the function associating initial values to the Input places.

Definition 9-Concurrency is an abstract operator denoted by the symbol “//” which indicates that multiple instances of SDfunctions are concurrent. It is defined as an empty CP-Net. Figure 8.b shows a graphical representation of a sequence. The concurrency operator is an abstract operator and has no graphical representation. In the XCDL Core we separate the composition to a Serial Composition mapping sequentially several instances of SDfunctions and a Concurrent Composition, mapping several instances of SD-functions sequentially to a single instance of SDfunction. Figure 9.a and 9.b illustrate respectively a Serial Composition and a Concurrent Composition.

In Figure 8.a, we give a graphical representation example of a SDfunction.

Figure 8: Graphical representations of the Core Components We define now a Sequence operator “” which is used to map an output place of a SD-function to an input place of another. Definition 8-Sequence is an operator denoted by the symbol “” which maps 2 places together and is defined as  

    



Sequence = (, P, T, A, C, G, E, I) where:  is the set of colors where || = 1 P is set of 2 places defining the input and output states of the Sequence operator o P = PInPOut and PIn  POut = Ø where PIn = {pIn} and POut = {pOut} where pIn represents the input place and pOut represents the output place T = {t} where t contains the sequence operator A = ({pIn} x {t})({t} x {pOut}) = {aIn, aOut} is aIn and aOut are directed arcs associating respectively the input place pIn to transition t and t to the output place pOut C:Pis the function associating a color to each place where C(pIn)=C(pOut) G: is a function over T Where Type(G(t)) = C(pIn)  G(t)=M(pIn) E:AExpr is the function associating an expression expr Expr to a : o Expr is a set of expressions where: ∀𝑒𝑥𝑝𝑟 ∈ 𝐸𝑥𝑝𝑟: 𝑀 𝑎. 𝑝 𝑖𝑓 𝑎. 𝑝 ≠ 𝑝Out 𝑒𝑥𝑝𝑟 = 𝐺 𝑎. 𝑡 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 I:POutValue is the function associating initial values to the output place

Figure 9: Composition in XCDL Definition 10-SC is a Serial Composition, 𝑆𝐶 = 𝑛𝑖=0 𝑆𝐷𝐹ii , linking sequentially n instances of SD-functions using n-1 instances of Sequence operators and is compliant to a CP-Net. It is defined as 𝑺𝑪 =  

     



𝒏 𝒊=𝟎 𝑺𝑫𝑭ii =

(, P, T, A, C, G, E, I) where:

SDFi is a SD-function where: o i  [0,n] and j  [0,n], SDFi ≠ SDFj for i≠j i is a Sequence operator where: o i. SDFi. o i.PIn = SDFi.POut and i.POut  SDFi+1.PIn o n = (Ø, Ø, Ø, Ø, C, G, E, I) in an empty CP-Net 𝛴 = 𝑛𝑖=0 𝑆𝐷𝐹i. 𝛴 𝑃 = 𝑃𝐼𝑛 ∪ 𝑃𝑂𝑢𝑡 𝑤ℎ𝑒𝑟𝑒 𝑃𝐼𝑛 = 𝑛𝑖=0 𝑆𝐷𝐹𝑖 . 𝑃𝐼𝑛 𝑎𝑛𝑑 𝑃𝑂𝑢𝑡 = 𝑛 𝑖=0 𝑆𝐷𝐹𝑖 . 𝑃𝑂𝑢𝑡 𝑇 = 𝑛𝑖=0(𝑆𝐷𝐹𝑖 . 𝑇 ∪ 𝑖 . 𝑇 ) 𝐴 = 𝑛𝑖=0(𝑆𝐷𝐹𝑖 . 𝐴 ∪ 𝑖 . 𝐴 ) C:Pis the function associating a color to each place where C = SD-function.C G: is a function over T where ∀𝑡 ∈ 𝑇, 𝐺 𝑡 = 𝑆𝑦𝑠𝑡𝑒𝑚-𝑑𝑒𝑓𝑖𝑛𝑒𝑑 𝑆𝐷𝐹. 𝐺(𝑡), 𝑡 ∈ 𝑛𝑖=0 𝑆𝐷𝐹𝑖 . 𝑇 𝑆𝑒𝑞𝑢𝑒𝑛𝑐𝑒. 𝐺 𝑡 , 𝑡 ∈ 𝑛𝑖=0 𝑖 . 𝑇 E:AExpr is the function associating an expression where E = SD-function.E



I:PInValue is the function associating initial values to the Input places, I = SD-function.I

Definition 11-CC is a Concurrent Composition, 𝐶𝐶 = 𝑛 𝑖=0(𝑆𝐷𝐹ii𝑆𝐷𝐹n+1) // linking n instances of SD-functions using n instances of Sequence operators concurrently to an instance of SD-function and is compliant to a CP-Net. It is defined as 𝑪𝑪 = 𝒏𝒊=𝟎(𝑺𝑫𝑭ii 𝑺𝑫𝑭n+1)//= (, P, T, A, C, G, E, I) where:        



SDFi and SDFn+1 is a SD-function where: o i  [0,n+1] and j  [0,n+1], SDFi ≠ SDFj for i≠j i is a Sequence operator where: o i. SDFi. o i.PIn = SDFi.POut and i.POut  SDFn+1.PIn 𝛴 = 𝑛+1 𝑖=0 𝑆𝐷𝐹i. 𝛴 𝑃 = 𝑃𝐼𝑛 ∪ 𝑃𝑂𝑢𝑡 𝑤ℎ𝑒𝑟𝑒 𝑃𝐼𝑛 = 𝑛+1 𝑖=0 𝑆𝐷𝐹𝑖 . 𝑃𝐼𝑛 𝑎𝑛𝑑 𝑃𝑂𝑢𝑡 = 𝑛+1 𝑖=0 𝑆𝐷𝐹𝑖 . 𝑃𝑂𝑢𝑡 𝑇 = 𝑛𝑖=0(𝑆𝐷𝐹𝑖 . 𝑇 ∪ 𝑖 . 𝑇 ) 𝑆𝐷𝐹n+1. 𝑇 𝐴 = 𝑛𝑖=0(𝑆𝐷𝐹𝑖 . 𝐴 ∪ 𝑖 . 𝐴 ) 𝑆𝐷𝐹n+1. 𝐴 C:Pis the function associating a color to each place where C = SD-function.C G: is a function over T where ∀𝑡 ∈ 𝑇, 𝐺 𝑡 = 𝑆𝑦𝑠𝑡𝑒𝑚-𝑑𝑒𝑓𝑖𝑛𝑒𝑑 𝑆𝐷𝐹. 𝐺(𝑡), 𝑡 ∈ 𝑛+1 𝑖=0 𝑆𝐷𝐹𝑖 . 𝑇 𝑆𝑒𝑞𝑢𝑒𝑛𝑐𝑒. 𝐺 𝑡 , 𝑡 ∈ 𝑛𝑖=0 𝑖 . 𝑇 E:AExpr is the function associating an expression where E = SD-function.E

Since the XCDL is a visual language and the composition is done via drag and drop. The order used by the user to add his functions and map them together is arbitrary. Nonetheless, this does not affect the resulting composition. We prove that by proving that the composition is associative along with other properties stated below.

4.3.2.1.1 XCDL Algebra Properties Due to the lack of space, we did not include the proofs for the following properties. Consider a, b, c and d instances of SD-functions. We were able to identify the properties in Table 2: Table 2: Algebra Properties Associative property of Sequence Associative property of concurrency Commutative property of concurrency Distributive property of concurrency Sequence Identity property Concurrency Identity property

(ab)c= a(bc) ((a//b)//c)d=(a//(b//c))d (a//b)c=(b//a)c (a//b)c=((ac)//(bc)) a =a a// =a

In the following section, we define the transformation syntax which will allow us to transform the XCDL Core syntax into a graphical representation based on the components defined in the XCDL-GR model.

4.4 Transformation Syntax In order to define the transformation syntax, we introduce formally an abstract syntax which will ease the treatment of the graphical components and the transformation to the XCDL Core syntax. Since the XCDL is based on Colored-Petri Nets, therefore

we identify the following main components: Color, Place, Transition and Arc. Definition 12-AS is the abstract syntax for XCDL and is defined as    

AS = < F, FP, FT, FA > where: F:  C is a function associating an abstract drawing type Color to a type   FP: P  O is a function associating a drawing type Circle to a place p P FT: T  R is a function associating a drawing type Rectangle to a transition t T FA: A  L is a function associating a drawing type Line to an arc a A

The transformation syntax which allows us to transform the XCDL syntax into the XCDL-GR model by the aid of the AS uses 2 main functions as shown in Figure 11.

Figure 10: Transformation Functions Definition 13-T is the transformation syntax and is defined as,

T = where:  

TFS is a transformation function used to translate sequence operators into graphical data TFF is a transformation function used to translate a SDfunction into graphical

The transformation based on TFS and TFF are depicted in Figure 10.a and 10.b respectively. As for the formal definitions of TFS and TFF, they were omitted in this paper due to the lack of space. In this section we defined a generic composition language which allows users to visually create functional compositions. The language syntax was defined in CP-Nets. The components and the composition results are all CP-Nets which allows the composition to express true concurrency and parallelism. We present next a prototype which we developed in order to validate the XCDL core language.

5. PROTOTYPE To validate our language, we developed a prototype based on the XCDL core grammar allowing us to draw XML oriented manipulation operations based on functions existing in the system libraries. The functions defined in the prototype were mainly functions allowing different string manipulations. The prototype was developed in Visual Basic Dot Net (VB.Net) and therefore the string functions used were the ones defined in the VB.Net String Class (e.g., concat, trim, hashcode etc.). The architecture of the prototype is shown in Figure 11. The prototype is composed of 4 main modules: (i) the internal data

model, (ii) the library, (iii) the I/O XCD-trees and (vi) the composition platform.

5.3 I/O XCD-trees In this module we defined an algorithm which generates a summarized structure of an XML document with repetition reduction. It takes an XML file as an input and generates a tree list as an output representing the summarized structure of the document as shown in the XCD-tree sections in Figure 5.

5.4 Composition Platform Figure 11: Prototype Architecture

5.1 Internal Data Model

(a) SD-function schema

The composition platform, shown in Figure 13, provides the user with a graphical platform allowing him to compose his operations from the functions defined by the Library module. This platform is divided to 4 main sections: the XCD-tree:In which shows the input XML structure, the XCD-tree:Out which shows the output structure, the SD-functions which lists the SD-functions defined by the Library and the Composition Workspace which is where the composition is drawn by dragging and dropping SD-functions and sequentially mapping them by consequently selecting an output and then an input. To give an illustration of the use of XCDL, we go back to scenario 1. A journalist wants to filter RSS feeds based on a specific topic (e.g. pollution), check for information redundancy by comparing the similarity of the feeds and generate an ATOM XML based report with the retrieved sensitive data. Figure 13 shows the composed operation defined based on CP-nets. To define the composition operation, first we define the input and output content description structures in forms of XCD-trees. The input and output XCD-trees generated represent respectively a simplified RSS structure and an ATOM structure. Different SDfunctions are shown in the Composition Function‟s section.

(b) Composition schema Figure 12: Relational Schemas Compliant with XCGN The internal data model is based on XML and is compliant to the XCGN. The internal data model is defined in compliance with CP-Nets as defined in Definition 10. It is composed of 2 relational schemas as shown in Figure 12, and is XML based. The first one is the library schema which defines the internal model of the SDfunctions defined in the prototype library based on the syntax definition of SD-functions (cf. Definition 7). The second schema defines the internal model of the composition based on the composition syntax in Definitions 10 and 11.

5.2 Library The library module is a set of graphical forms which allow us to customize our language and define the functions which we want to embed in our library. The customizations are done by choosing which data types we wish to include in our language, what colors we want to give to each type, what are the dimensions we want to give for a transition (ht and wt), the maximum height for the places in a function (h), the images used to describe the functions etc. The functions are indentified via a set of forms allowing us to initiate a function definition, define its transition containing the operation to be executed and define its I/O places.

In this scenario, to create his composition, the user first checks available SD-functions. He starts by selecting the textual content values of the Elements, title and description as shown above by using the Extract Data function which extracts the values from the selected XCD-nodes. Then, he adds a separator character “*” to the extracted value from the Element title using the Concat function. He concatenates the data extracted from the Element description with the title using another instance of the Concat function. At this stage the composition so far merges 2 TEXT nodes together with a character separator „*‟ and discards the rest of the RSS data. The user then sequentially adds a Select function to the merged output which discards all data not containing the specified topic. To compare the feeds, the user inserts a Wait For function and gives it 2 as an argument so that it waits till it receives 2 data values and then transmits them simultaneous as separate outputs. To this point, the user is able to filter his feeds based on a certain topic and extract the sensitive data in textual format. The user adds next a Redundancy removal function. After removing all redundancies, the user directs the output to a Split operator which will split data based on the separating character “*” and maps the data to the TEXT typed XCD-nodes, title and description, in the output XCD-tree as shown in Figure 15. The split operator, along with other operators will be detailed in future works.

Figure 13: Composition Platform

6. CONCLUSION AND FUTURE WORKS In this paper we discussed the issues regarding XML oriented visual languages which scope non expert programmers. We were mainly interested in visual composition languages. In the literature we indentified 2 main approaches, Mashups which provide non expert programmers with means to draw manipulation operations for web data by means of composition mainly, and XML oriented visual languages which are mainly oriented towards XML data extraction and structure transformation. The main issues addressed were that Mashups are not XML specific and did not provide any formal syntax for the composition, and as for XML visual languages, they are limited to data extraction instead of manipulation and mainly based on existing languages and have limited expressiveness. To solve these issues, we presented the core of a generic composition visual language, XCDL, based on existing functions and rendered it XML oriented. The language was based on CP-Nets in order to provide scalability, flexibility, adaptability, expressivity, and performance analysis. The language was implemented in a prototype developed in VB.Net which allows users to customize the language and create composed operations for XML textual values mainly. The main track, in future works, relies on extending the XCDL Core to include composed functions. In other words, a user should be able to define and embed functions based on the composition of existing ones. A second track is to provide decision and loop operators such as if and for which will increase the expressiveness of the language.

7. REFERENCES [1]

D. E. Simmen, M. Altinel, V. Markl, S. Padmanabhan, A. Singh, Damia: data Mashups for intranet applications, In: International Conference on Management of Data, pp. 11711182, 2008

[2]

D. Braga , A. Campi , S. Ceri, XQBE (XQuery By Example): A visual interface to the standard XML query language, In: ACM Transactions on Database Systems (TODS), vol.30, pp. 398-443, 2005

[3]

E.J. Golin, S.P. Reiss, the specification of visual language syntax, In: IEEE Workshop on Visual Languages, pp. 105 – 110, 1989

[4]

E. Pietriga , J. Vion-Dury , V. Quint, VXT: a visual approach to XML transformations, In: Proceedings of the 2001 ACM Symposium on Document engineering, pp. 1-10, 2001

[5]

G. Di Lorenzo, H. Hacid, H. Paik, B. Benatallah, Data integration in Mashups, In: ACM SIGMOD Record, V.38, pp. 59-66, 2009

[6]

J. Beckham, G. Di Fabbrizio, N. Klarlund, Towards SMIL as a Foundation for Multimodal, Multi-media Applications, In: W3C Posi-tion Paper, pp. 1363-1366, 2001

[7]

K. Jensen, An Introduction to the Theoretical Aspects of Coloured Petri Nets, In: Lecture Notes In Computer Science, Vol.803, pp. 230-272, 1993

[8]

M. Erwig, A Visual Language for XML, In: Proceedings of the 2000 IEEE International Symposium on Visual Languages (VL'00), pp. 47, 2000

[9]

R. J. Ennals, M. N. Garofalakis, MashMaker: Mashups for the masses, In: International Conference on Management of Data, pp. 1116-1118, 2007

[10] S. Ceri , S. Comai , E. Damiani , P. Fraternali , S. Paraboschi

, L. Tanca, XML-GL: a graphical language for querying and restructuring XML documents, In: Computer Networks: The International Journal of Computer and Telecommunications Networking, vol.31, pp. 1171-1187, 1999 [11] T. Dalamagas, T. Cheng, K.-J. Winkel, T. Sellis, A

methodology for clustering XML documents by structure, In: Information Systems, pp. 187-228, 2006 [12] T. Loton, Introduction to Microsoft Popfly, No Programming

Required, 2008 [13] T. Murata, Petri Nets: Properties, Analysis and Applications,

In: Proceedings of the IEEE, Vol. 77, No. 4, pp. 541-580, 1989 [14] W3C World Wide Web Consortium, Extensible Stylesheet

Language Transformations -XSLT 1.0, 1999