A Domain-Oriented Approach for GIS Component Selection

Gabriela Gaetan, Viviana Saldaño
Unidad Académica Caleta Olivia, Universidad Nacional de la Patagonia Austral
Caleta Olivia, Santa Cruz, Argentina
e-mail: {ggaetan, vivianas}@uaco.unpa.edu.ar

Agustina Buccella, Alejandra Cechich
GIISCo, Computer Science Department, University of Comahue
Neuquén, Argentina
e-mail: {abuccel, [email protected]}
Abstract—So far, one of the key unsolved aspects of component-based development is identifying and recovering existing components in an effective way. Along these lines, there is broad agreement on the need to organize and catalogue candidate components to speed up search and identification. In particular, domain-oriented catalogues have emerged as promising tools, although they also imply more complex creation procedures. In this context, this paper introduces a proposal for compiling information for Geographic Information Systems (GIS). By using Natural Language Processing (NLP) along with a standardized information scheme, we extract and catalogue information from components on the Web, as well as from the requirements of components' users. We describe the main elements of the process and illustrate them with a motivating example.
Keywords - OTS-Based Development; Geographic Information Systems; Natural Language Processing; Component-Based Development.

I. INTRODUCTION
One of the problems in reaching wide use of component-based development lies in the difficulty of identifying and recovering existing components. Undoubtedly, the component community is moving towards the use of classified components – from faceted classification to ontology-based classification. A main concern focuses on the need for a common description model. However, although several catalogues exist on the Web, such a model is still an open issue – currently, some portals only give an overview of the technical and legal features of the software, while others focus only on a particular type of component [2]. Selecting OTS (Off-The-Shelf) components involves a complex process that relates component developers and application developers. The former are responsible for supplying the information to be used when searching, understanding, and selecting components. As shown in Figure 1, adapted from [18], the component developers' activities constitute the "Publication Process", which consists of (1) classifying the recently created component; (2) documenting the component; and (3) storing the information in a repository. On the other hand, the application developers' activities constitute the "Selection Process", which consists of (1) searching for candidates by matching some quality criteria, including functionality; (2) understanding the information of the candidates; and (3) making decisions about selection and adaptation. One of the problems of both processes is the lack of standard documentation to describe OTS components.
Figure 1. A component selection process. [The figure shows the Component Developer performing the Publication Process (Classify Components, Document Components, Store Descriptions) and the Component User performing the Selection Process (Search Candidates, Understand Candidates, Make a decision), both working over a repository of component descriptions.]
In [20], we identified some key elements to support a standardized framework towards a knowledge-based process for COTS component identification. Firstly, when establishing a component marketplace, one of the specific demands is to provide well-structured information about components, i.e., a well-structured catalogue. This issue leads us to question how information should be structured to be considered useful. It means not only gathering information from third parties but also setting basic elements that categorize COTS components, allowing us to assess quality properties – perhaps by using metrics or some testing mechanism. On the other hand, matching provided and required services requires not only standardizing information from vendors but also standardizing requirements for searching. In analyzing different trends in component classification and matching, we have found several interesting aspects that might constitute a basis for improving COTS component identification. In particular, the use of domain-specific standard information might set a common vocabulary to support both processes – publication and selection. With this aim, in [12][22] we adapted a general component specification framework [21] to build a more suitable scheme for classifying GIS components. In order to normalize the classification categories, we first analyzed the information available on Web catalogues for GIS components, and we tailored the geographic service taxonomy provided by the ISO/IEC 19119 standard [13].

The rest of the paper is structured as follows: in Section II, we discuss some related work. In Section III and Section IV, we offer an overview of our approach. In Section V, we present an example of its application. Section VI introduces some discussion based on our experiences; and in Section VII, we conclude and present further work.

II. RELATED WORK

Among the different proposals, there is a common understanding about the need to define a conceptual framework to classify and describe components from a repository or marketplace. However, there is no similar understanding of the way components should be characterized. The following topics are just a few examples.

A matter of identification. Name is a universally acceptable identifier; however, additional information may be needed to manage components effectively. For example, Vitharana et al. [25] use a synthesis of structured identifiers to classify components. Identifiers are composed of the business component name, vendor name, a contact person, and the version of the component. Additional items required for managing components could be added to this list. Besides, components are classified according to their environment (i.e., programming environment, such as .NET, EJB, and CORBA). On the other hand, the UnSCom framework [21] identifies a component through its composition contract, which contains deployment-time constraints and consists of a set of required and provided interfaces. A classification schema introduces various contract levels for the specification of composition contracts, such as the domain-related level, which describes the functionality that is either required or provided at the interfaces of a component; and so forth. As another example, the XCM approach [23] encapsulates the component name and the description of general information about the component, such as version, package, language, component model, domain, operating system, and publisher. As we can see from these few examples, identification sometimes implies some sort of classification as well – through domain identification, composition contract description, or environment classification. It seems that we still need to reach consensus on which features are considered core – and possibly standard – for identifying components.

A matter of notation. We should firstly agree on the need for a common notation to describe the different features of a classification schema. Notation is introduced explicitly by some proposals. For example, the UnSCom framework [21] introduces a set of textual specification formats, a set of graphical notations based on UML 2.0, and an XML format to exchange component specifications. In particular, the main notation for domain-related information is a normative language, a special format to denote an ontology that is both machine- and human-understandable. To exemplify our point, note that the proposal by Jaccheri and Torchiano [14] describes attributes by using a list of standardized values ("executable", "standard", "service", etc.), and suggests the use of architectural patterns to describe the "architectural level" attribute; however, there is no explicit concern about the kind of notation we may use to specify each attribute. The need for standard documentation is also noted by Kallio and Niemelä, who provide a general template for documenting software components [15]. This work introduces an additional discussion to our analysis: the problem of how components should be documented from the component buyer's and supplier's points of view. From the buyer's perspective, the implications concern the need to establish procedures to define and balance searching goals. Since negotiation of goals is inherently part of an identification process, incorporating classifications might facilitate discussions; i.e., the rationale will be more explicit and better founded.

Our approach collects the main ideas of many of these efforts and goes further by proposing a domain-specific standard schema for geographic information systems' components. Our main concern is how to build a useful description repository to automate selection. Therefore, from the supplier's view, we suggest building a wrapper for the information available on the Web, in such a way that search engines may access a normalized information structure when selecting candidates. From the composer's view, we suggest building another wrapper for components' requirements, in such a way that required services are expressed using the same normalized information structure. In this way, selection becomes a matter of mapping two models, represented by a wrapper for information on the Web and a wrapper for components' requirements.
III. A WRAPPER ON THE WEB: A DOMAIN-SPECIFIC CLASSIFICATION SCHEMA
Our classification scheme for organizing GIS component information [12] comes from the comparison and composition of information published on different Web sites (ComponentSource [4], FreeGIS [9], Freshmeat [11], ESRI [7]) and from the adaptation of the geographic service taxonomy defined in the ISO/IEC 19119 standard [13]. Our scheme, called compoSIG, is composed of entities referring to the main concepts of the GIS component domain. Although we propose a lightweight ontology (less expressive in some ways), we consider that, similarly to [16], it is expressive enough to define the elements (entities, attributes, and relationships) and the semantics needed by our work. Entities are characterized by attributes and by relationships to other entities. In addition, there is a set of axioms (generalization/specialization, disjunction, whole-part, etc.) that are used to improve the recovery process. The main goal of our system is to automatically discover text fragments that characterize components on the Web and to document them as instances of our classification scheme (Table I). Each instance is stored in a Component Description Repository.
TABLE I. A STANDARD-BASED CLASSIFICATION SCHEME FOR GIS COMPONENTS
General and Commercial Information | Name, Version, Web Site, Developer, Organization, E-mail, Phone, Postal Address, Price, Translation, Artifacts, Software Requirements, Hardware Requirements
Classification | Type, Geographic Service, Operating System, Programming Language, Status, License, Standards
Functionality | Implemented Functionality

Figure 2. General view of the system's modules. [The figure shows the Configuration, Classification, Formatting, and Evaluation modules, exchanging the ontology, domain vocabulary, parameters, Web pages, annotated pages, formatted data, and a quality level, and feeding the Component Description Repository.]
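To make the structure of Table I concrete, the sketch below (ours, not part of the compoSIG implementation) shows how one instance of the scheme might be represented; the attribute names follow Table I, and the sample values follow the PROJ component shown later in Figure 3.

    # A minimal sketch (ours) of a scheme instance; attribute names follow
    # Table I, values follow the PROJ component described later in Figure 3.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ComponentDescription:
        # General and commercial information
        name: str
        version: str
        web_site: str
        license: str
        # Classification
        type: str
        geographic_service: str
        programming_language: str
        operating_systems: List[str] = field(default_factory=list)
        # Functionality
        implemented_functionality: List[str] = field(default_factory=list)

    proj = ComponentDescription(
        name="PROJ", version="4.6.0",
        web_site="http://www.remotesensing.org/proj",
        license="MIT", type="software",
        geographic_service="human interaction",
        programming_language="C",
        operating_systems=["Windows", "GNU/Linux", "other Unices"],
        implemented_functionality=["transformation", "projection"],
    )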
The system is based on the compoSIG scheme and applies NLP tools to extract information from Web catalogues of GIS components. To implement the system, we use GATE (General Architecture for Text Engineering) [5], one of the most mature and widely used NLP tools. Tasks related to information extraction (tokenization, semantic labeling, and phrase partitioning) are implemented by using ANNIE (a Nearly-New Information Extraction System), an information extraction system distributed with GATE. Figure 2 shows the main modules of the system, represented by squared boxes, while rounded boxes represent information. The Configuration module allows us to define the GIS component domain terms and their relation to the ontology. This process requires the scheme as input and produces as a result a set of parameters to classify components. The Classification module takes as input Web pages that describe GIS components, along with the parameters defined in the previous process. Then, the Classification module generates annotations written in XML on the original Web page by applying ontology-based annotation techniques. The annotated pages will populate the Component Description Repository, which contains metadata describing structured components. The Formatting module automatically completes the facets of the classification scheme associated with the ontology. Before the formatted information is stored along with the description of the component, the Evaluation module allows a domain expert to verify the validity of the annotations and to determine a quality level based on the completeness of the resulting classification. Let us briefly describe each of these modules.

A. Configuration Module
The information extraction cycle starts by uploading an ontology and defining its relationship to the main terms of the GIS component domain. The main goal of this module is to define the types of labels that will be used later during the annotation process. In our case, the labels match the classification scheme of the compoSIG ontology and contain entities such as "name", "license", "price", "web site", and so on. Although our system is based on the reuse of GATE components, the Configuration module is the exception. This module presents a graphical interface that allows domain experts to manually set the parameters needed to perform the classification process on a text (a GIS component's description). This configuration process might also be carried out by generating text files with other tools; however, this module (as Figure 2 shows) not only generates the parameter files but also checks that all of them are ready to start the annotation-based classification. As previously mentioned, the information extraction tools that we use allow us to automatically annotate text with respect to a classification scheme represented by the ontology. For the appropriate functioning of the tools, one or more files are needed to set several parameters, such as the instances of each concept of the ontology, the relationships among them, and the type of annotation.
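As an illustration only – GATE's actual gazetteer and mapping file formats differ – the following sketch shows the kind of parameter data the Configuration module has to produce: lists of known instance terms grouped under the ontology class they belong to.

    # Hypothetical sketch of the Configuration module's output: one list of
    # instance terms per ontology class. GATE gazetteers use .lst files plus
    # a mapping definition; this is a simplified stand-in.
    import os

    ONTOLOGY_TERMS = {
        "license": ["MIT", "GPL", "LGPL"],
        "operating_system": ["Windows", "GNU/Linux", "Solaris"],
        "programming_language": ["C", "C++", "Java"],
        "geographic_service": ["human interaction",
                               "model/information management"],
    }

    def write_parameter_files(terms, path="parameters"):
        """Persist one term list per ontology class (file layout assumed)."""
        os.makedirs(path, exist_ok=True)
        for entity, values in terms.items():
            with open(os.path.join(path, entity + ".lst"), "w",
                      encoding="utf-8") as f:
                f.write("\n".join(values))

    write_parameter_files(ONTOLOGY_TERMS)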
B. Classification Module
Unlike other proposals, such as the one in [24], which requires suppliers to publish component information by manually filling in a predefined specification, the Classification module automatically classifies information from Web catalogues according to the compoSIG scheme. To classify a new GIS component by annotation, users must import an HTML file containing the component description. Since the procedure extracts the information that fills the classification scheme, associating the process with the ontology is absolutely necessary.
The annotation process is implemented by reusing GATE components: the Tokenizer (splits text into single tokens); the Sentence Splitter (splits a text into sentences); the Part-Of-Speech Tagger (generates the label corresponding to the grammatical category of each word); and the OntoGazetteer, which is in charge of generating a temporary annotation that binds an analyzed chunk of text to a class of the ontology. For instance, in the case of compoSIG, the word 'Proj' in the text will be bound to the 'name' class. For each entity identified in the text, the Classification module searches for the best-fitting class of the ontology. After selecting the class, a specific instance is determined and metadata binding text and ontology are generated. Rules from the grammatical analyzer (JAPE) are applied to determine whether a candidate annotation matches a particular ontology class (or subclass). JAPE also allows us to filter alternative annotations by applying simple disambiguation techniques (in case an entity reference is associated with several possible instances and types). The final result of the annotation process is the original Web page enriched with XML annotations generated automatically by GATE.
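The fragment below gives the flavor of this stage as a much-simplified, hypothetical stand-in for the OntoGazetteer lookup and the JAPE-based filtering; real GATE annotations carry offsets and feature maps, and the disambiguation rules are richer than the longest-match policy used here.

    # Simplified stand-in (ours) for OntoGazetteer + JAPE: bind known terms
    # to ontology classes and keep only the longest match at each position.
    import re

    def annotate(text, ontology_terms):
        candidates = []
        for entity, values in ontology_terms.items():
            for value in values:
                for m in re.finditer(re.escape(value), text):
                    candidates.append((m.start(), m.end(), entity, value))
        # Naive disambiguation: prefer the longest annotation per offset.
        candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))
        kept, last_end = [], -1
        for start, end, entity, value in candidates:
            if start >= last_end:
                kept.append({"entity": entity, "instance": value})
                last_end = end
        return kept

    # Using the ONTOLOGY_TERMS mapping sketched above:
    # annotate("Proj is written in C under the MIT license", ONTOLOGY_TERMS)
    # -> [{'entity': 'programming_language', 'instance': 'C'},
    #     {'entity': 'license', 'instance': 'MIT'}]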
C. Formatting Module
The Formatting module receives the annotated pages and edits the text that will populate the Component Description Repository, because the XML/GATE format contains additional data that must be filtered out to describe the component in terms of the classification scheme. The formatting process uploads an XML/GATE file and shows its information as a "Scheme" or as the "structured XML" required by the Component Description Repository. The Component Description Repository contains information based on the metadata of the analyzed components. We selected XML as the description language to ensure the use of a common syntax, considering that any user may search the repository for component candidates. Tokens related to entities of the ontology and identified in the XML/GATE file are labeled appropriately. They constitute the main input to the new XML file, whose name is formed from the name and version of the component being described (e.g., 'proj4.6.0.xml'). Single tokens are labeled by using XML labels named after entities of the ontology, as shown in Figure 3.

D. Evaluation Module
We should remark that, before storing a component, it is necessary to verify that its information is consistent with the information required in the repository. To do so, GIS domain experts verify annotation validity and review the information. Since the information stored in the repository may not be enough to make architectural decisions, this module should be extended to include information quality assessment. This will require considering other quality features, such as information completeness, that are currently outside the scope of our work.

Figure 3. XML Component Description. [PROJ 4.6.0: web site http://www.remotesensing.org/proj; type software; operating system Windows, GNU/Linux, other Unices; programming language C; license MIT; objects cartographic data, cartesian data; tasks transformation, projection; geoservice human interaction.]
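The XML of Figure 3 is not reproduced legibly in this copy; the reconstruction below is our guess at its shape, with element names inferred from the scheme's entities – only the values are taken from the original figure.

    <component>
      <name>PROJ</name>
      <version>4.6.0</version>
      <webSite>http://www.remotesensing.org/proj</webSite>
      <type>software</type>
      <operatingSystem>Windows, GNU/Linux, other Unices</operatingSystem>
      <programmingLanguage>C</programmingLanguage>
      <license>MIT</license>
      <object>cartographic data, cartesian data</object>
      <task>transformation, projection</task>
      <geoservice>human interaction</geoservice>
    </component>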
IV. A WRAPPER FOR COMPONENTS' REQUIREMENTS: EXTRACTING INFORMATION FROM USE CASES

From the user's side, geographic requirements enter the system as textual use cases written in a controlled language. In particular, we use the template for use cases defined in [3], in which the functionalities of the system are described. In addition, a controlled language [17] is used to restrict the words and sentence structures that may be used. We apply the SVDPI pattern (Subject, Verb, Direct object, Preposition, Indirect object) to build the sentences within use cases; a hypothetical example follows below. These use cases must then be processed. To do so, we first define a taxonomy of GIS services [22] based on the ISO 19119 standard [13]. The taxonomy is used to classify the services required by the users, as well as the Non-Technical Requirements that are relevant to the selection of OTS components. Part of this taxonomy can be seen in Table II. The table shows four columns: layer, service, main verbs, and representative objects. The first two columns are extracted from the taxonomy of the ISO 19119 standard and represent the main geographic services that can be used in a service specification. The last two columns denote a list of keywords in which verbs and objects are defined separately. For instance, at the human-interaction layer, the catalogue viewer service can be identified by the verbs locate, explore, and manage, and by metadata about geographic data or geographic services. Then, in order to find mappings between our taxonomy and the use cases coming from users, we define a methodology based on natural language processing proposals. In particular, the methodology is based on the proposals defined in [6][19], with some modifications allowing the extraction of useful information to classify the use cases according to our taxonomy.
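As a hypothetical illustration (the sentence is the one used in this section's running example; the surrounding template fields are ours), a main-scenario step written in the controlled SVDPI style looks like this:

    Use case: Edit electric network          (title: ours, for illustration)
    Main scenario:
      1. The user selects an electric line.  (Subject - Verb - Direct object)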
TABLE II. PART OF OUR SERVICE TAXONOMY FOR GIS REQUESTS

Layer | Service | Main Verb | Representative Object
Human Interaction | Catalogue Viewer | Locate, Explore, Manage | Geographic data metadata; Geographic service metadata
Human Interaction | Service Editor | Control, Understand, Compose, Invoke, Plan | Service; Chain of services
Human Interaction | Geographic Feature Editor | Interact, Display, Query, Add | Feature; Characteristic; Orientation; Perspective; Transparency; Texture
… | … | … | …
Figure 4 shows the main steps of the methodology. The four steps (A-D) are performed for each action defined in the main scenario of a use case specification.
Figure 4. Steps to extract GIS services from textual use cases. [Starting from the use case template, the steps are: A. Determine the POS; B. Generate the Syntax Tree; C. Generate Event Tokens; D. Replace Event Tokens by Specific Services, storing the results in an XML repository; the cycle repeats while further actions remain.]
Let us briefly illustrate each of these steps.

A. Determining the POS
The inputs of the methodology are textual use cases documented by using the template. The first step, determining the POS (part-of-speech), starts by analyzing each sentence of the main scenario of the use case. This step analyzes each word and specifies its type (verb, noun, etc.) and its role within the sentence in which it is defined.

B. Generating the Parse Tree
The second step creates a parse tree for each sentence of the main scenario of the use case. In each tree, nodes represent phrases and leaves represent the words of each sentence. In order to perform steps A and B, we use the FreeLing tool suite [10], which provides support for the Spanish language. For example, considering that the main scenario of a use case is "The user selects an electric line", the tool creates a parse tree classifying each word of the sentence. Figure 5 shows this tree.

Figure 5. An example of a parse tree.

C. Generating Event Tokens
In the third step, event tokens are created by finding the main verbs and representative objects (Table II) within each sentence. Following the previous example, in the sentence "The user selects an electric line", the main verb is the principal word (root) of the verbal phrase; in this case, the verb "select" is classified as the main verb. The representative object is created from the direct object of the sentence, and it must be subordinate to the main verb. In our example, "electric line" is the representative object. Therefore, the event token will be "select electric line".
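As a sketch of steps A-C, the fragment below applies spaCy (our substitution; the paper's tool chain uses FreeLing because it supports Spanish) to the example sentence, obtaining the POS tags, the parse, and the event token:

    # Sketch of steps A-C using spaCy instead of FreeLing (our substitution).
    # Setup: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def event_token(sentence):
        doc = nlp(sentence)                    # steps A and B: POS + parse tree
        for token in doc:
            if token.pos_ == "VERB":           # main verb: root of the verbal phrase
                for child in token.children:
                    if child.dep_ == "dobj":   # step C: direct object subtree
                        obj = " ".join(t.text for t in child.subtree
                                       if t.dep_ != "det")
                        return token.lemma_, obj
        return None

    print(event_token("The user selects an electric line"))
    # -> ('select', 'electric line')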
D. Replacing Event Tokens by Specific Services
The last step performs the mapping between the event tokens created in the previous step and the services defined in our taxonomy (Table II). To support this step, we created a semi-automatic tool that uses EuroWordNet [8] as a thesaurus for the Spanish language. It is a user-guided tool that assists users in choosing synonym relations to make suitable mappings. In our example, the event token generated in the previous step (select electric line) is compared to the service descriptions (columns 3 and 4 of Table II) of the taxonomy. Once the user selects the description that matches a required service, he/she obtains its standardized name and the category in which it is contained. Using our tool, the user finds that the verb "select" is a synonym of the verb "interact" and that the object "line" is similar to "feature". Thus, the tool proposes the standard service "Geographic Feature Editor" of the "Human Interaction" layer. The results of the mapped services are stored in an XML repository that will be used to find mappings between the user's requirements and the information on OTS components published on the Web. To do so, the results of this step (required services) are structured following an XML-based structure, as shown in Figure 6.
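A rough sketch of the synonym lookup behind this step is shown below, using NLTK's English WordNet as a stand-in for EuroWordNet (the paper's tool works over the Spanish wordnet, and the user still confirms each proposed relation):

    # Synonym lookup sketch with NLTK's English WordNet in place of
    # EuroWordNet; whether two verbs share a synset depends on the
    # thesaurus, so the user-guided confirmation step is still needed.
    # Setup: pip install nltk && python -c "import nltk; nltk.download('wordnet')"
    from nltk.corpus import wordnet as wn

    def verb_synonyms(verb):
        """All lemma names occurring in any verb synset of `verb`."""
        return {lemma.name() for s in wn.synsets(verb, pos=wn.VERB)
                for lemma in s.lemmas()}

    def candidate_services(event_verb, taxonomy):
        """Propose taxonomy rows whose main verb is a synonym of (or equal
        to) the event-token verb; `taxonomy` maps verbs to service names."""
        synonyms = verb_synonyms(event_verb) | {event_verb}
        return {service for verb, service in taxonomy.items()
                if verb in synonyms}

    # Excerpt of Table II as a verb -> service mapping (ours, abbreviated):
    TAXONOMY = {"interact": "Geographic Feature Editor",
                "locate": "Catalogue Viewer"}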
Figure 6. XML requirement representation. [Object "line", verb "select", category "human interaction".]
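As with Figure 3, the XML of Figure 6 is garbled in this copy; a reconstruction with guessed element names (the field names follow Table III) would be:

    <requirement>
      <object>line</object>
      <verb>select</verb>
      <category>human interaction</category>
    </requirement>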
V. MAPPING MODELS
In order to perform the mapping between both processes, Publication and Selection, we are developing a tool that helps us map both models. To do so, we have to solve the problem of sense disambiguation, which comes along with the use of natural language. To face this problem, we are working on an adaptation of the Lesk algorithm [1], which uses the information contained in a dictionary to perform word sense disambiguation. This tool lets us compute the similarity between a pair of words, assigning a relatedness score to each computation. For mapping functional requirements, we apply this tool to the information from the required services and from the components' descriptions stored in the repository. For example, to perform the mapping of functional requirements, the first step is to select those components that belong to the category of the required service, i.e., those whose "geoservice" matches the "category" – in our example, "human interaction". Then, within those selected components, we compute the similarity for each combination of values of the pairs "verb/task" and "object/object" from the Requirement's XML structure and the Component Description's XML structure, as shown in Table III.
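A minimal gloss-overlap sketch of this idea is shown below; the paper relies on the adapted Lesk algorithm of Banerjee and Pedersen [1], which also scores the glosses of related synsets, so the scores of Tables IV and V cannot be reproduced by this simplification.

    # Minimal Lesk-style relatedness sketch over WordNet glosses; a rough
    # stand-in for the adapted Lesk algorithm [1] used by the tool.
    # Setup: pip install nltk && python -c "import nltk; nltk.download('wordnet')"
    from itertools import product
    from nltk.corpus import wordnet as wn

    def relatedness(word_a, word_b):
        """Largest number of shared gloss tokens over all sense pairs."""
        best = 0
        for sa, sb in product(wn.synsets(word_a), wn.synsets(word_b)):
            overlap = set(sa.definition().lower().split()) & \
                      set(sb.definition().lower().split())
            best = max(best, len(overlap))
        return best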
TABLE III. MATCHING XML STRUCTURES

Requirement | Component Description
object | object
verb | task
category | geoservice

As shown in Figure 6, the values to evaluate on the Selection process side are "line" and "select"; on the Publication process side, we analyze the elements stored in the Component Description Repository. In our example there are four components [9]: PROJ, GeoTools, uDig, and NCAR. The GeoTools component is not evaluated because its geoservice value is not "human interaction". In this way, we reduce the number of components that participate in the matching. Then, we calculate the relatedness between each combination of values from both structures, as shown in Table IV and Table V. In Table IV, we can observe all the combinations of verb/task values for each component and the resulting relatedness score for each combination; the last column shows the average relatedness of each component. Table V is similar, but in this case the combinations of object/object values are computed. In addition, as we have an XML record for each step of the use case's main scenario, after finding the matching components for each step, the tool returns as a result those components that satisfy all the use case's steps, ordered by average relatedness score.

TABLE IV. VERB/TASK RELATEDNESS

Comp Name | Verb (Requirement's XML Structure) | Task (Component Descriptions' XML Structure) | Relatedness | Comp Avg Value
uDig | select | view | 4 |
uDig | select | edit | 4 | 4
Proj | select | transformation | 6 |
Proj | select | projection | 15 | 10,5
NCAR | select | draw | 70 |
NCAR | select | display | 19 |
NCAR | select | edit | 4 |
NCAR | select | manipulate | 19 | 28

TABLE V. OBJECT/OBJECT RELATEDNESS

Comp Name | Object (Requirement's XML Structure) | Object (Component Descriptions' XML Structure) | Relatedness | Comp Avg Value
uDig | line | Spatial data | 5 | 5
Proj | line | Cartographic data | 2 |
Proj | line | Cartesian data | 3 | 2,5
NCAR | line | contours | 36 |
NCAR | line | maps | 22 |
NCAR | line | vectors | 28 |
NCAR | line | streamlines | 1 | 21,75
On the other hand, for technical requirements matching, we compare the filters selected in the tool with the ontology entities corresponding to "classification", such as Operating System, Programming Language, Standards, and License. In our example, the technical requirement is "Operating System = Linux", which is satisfied by the three components evaluated. As a search result, we show a list of references (Table VI) that satisfy both the functional and the technical requirements.
TABLE VI. SEARCH RESULT

Comp. Name | Avg. Relatedness
NCAR | 24,87
Proj | 6,5
uDig | 4,5
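Putting the pieces together, a sketch of the whole matching step might look as follows (ours; `relatedness` stands for the adapted Lesk tool, and the data layout mirrors Tables III-VI). With the paper's scores it would rank NCAR (24,87) above Proj (6,5) and uDig (4,5), as in Table VI.

    # End-to-end sketch (ours) of Section V's matching: filter by geoservice
    # category and technical constraints, then rank by the average of the
    # verb/task and object/object relatedness means.
    def match(requirement, components, relatedness):
        ranked = []
        for comp in components:
            if comp["geoservice"] != requirement["category"]:
                continue          # e.g., GeoTools is discarded at this point
            if requirement["operating_system"] not in comp["operating_systems"]:
                continue          # technical filter, e.g., Linux
            verb_avg = mean(relatedness(requirement["verb"], t)
                            for t in comp["tasks"])
            obj_avg = mean(relatedness(requirement["object"], o)
                           for o in comp["objects"])
            ranked.append((comp["name"], (verb_avg + obj_avg) / 2))
        return sorted(ranked, key=lambda r: r[1], reverse=True)

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0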
VI. DISCUSSION
We started to validate our approach considering real Web portals of component catalogues (i.e., ComponentSource [4], FreeGIS [9], etc.) and requirements from a case study in the domain of oil companies. Based on the results of preliminary applications, in which we mapped the models generated by the two wrappers, we identified the following lessons. The use of the ISO 19119 standard in both models enabled a better mapping between the offered and required GIS services. This is a clear advantage that came from a better understanding among all parties. We had to carefully evaluate how much of the information required to assess OTS components was actually available in the catalogues. We analyzed the current gap between the required and the provided information, and the refinement of the taxonomies was guided to reduce that gap, yielding more realistic attributes. After all these efforts, and after providing
guidelines for using the tools, we realized that detecting and selecting candidates was faster – and produced higher stakeholder satisfaction. The use of textual use cases allows us to apply Natural Language Processing techniques, which help extract requirements mirroring the standard. However, use cases differ widely in breadth and scope, and their appropriate selection is not straightforward. We emphasize the use of scenarios appropriate to all roles involved in a system. The architect role is the one most widely considered, but there are also roles for the system composer, the reuse architect, and others, depending on the domain. When analyzing a system, it is important that all roles relevant to that system be considered, since design decisions may be made to accommodate any of them. The process of choosing use cases for analysis forces designers to consider the future uses of, and changes to, the system. It also forces them to consider other notations (such as use case diagrams) that should be properly adapted to fit our approach.
VII. CONCLUSION AND FUTURE WORK
In this paper, we have presented an approach for selecting components based on GIS domain information and specific ISO standards. Our approach is built upon two models – from the supplier's and the composer's points of view – normalized to facilitate searching by using natural language processing tools. From our experience, we found that a domain-oriented selection process might improve the whole process, from a technical to a managerial perspective. In addition, as a semi-automatic process is absolutely necessary to effectively detect suitable candidates, we are improving the supporting tool in such a way that we may smoothly transfer this approach to industry.

ACKNOWLEDGMENTS
This work is partially supported by the UNComa project 04E/072 (Identificación, Evaluación y Uso de Composiciones Software) and the UNPA project 29/B090 (Identificación de Soluciones Off-The-Shelf para Sistemas de Información Geográficos).

REFERENCES
[1] Banerjee S. and Pedersen T. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, (2002).
[2] Cechich A., Réquilé A., Aguirre J., and Luzuriaga J. Trends on COTS Component Identification. 5th International Conference on COTS-Based Software Systems, IEEE Computer Science Press, Orlando, (2006).
[3] Cockburn A. Writing Effective Use Cases. Addison-Wesley, ISBN 0201702258, 1st ed., (2000).
[4] ComponentSource – The Definitive Source of Software Components. Available: http://www.componentsource.com/. [Accessed: May 24, 2010].
[5] Cunningham H., Maynard D., Bontcheva K., and Tablan V. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. 40th Anniversary Meeting of the Association for Computational Linguistics, (2002).
[6] Drazan J. and Mencl V. Improved Processing of Textual Use Cases: Deriving Behavior Specifications. In Proceedings of SOFSEM 2007, LNCS vol. 4362, pp. 856-886, (2007).
[7] ESRI – The GIS Software Leader. Available: http://www.esri.com/. [Accessed: May 24, 2010].
[8] EuroWordNet: Building a Multilingual Database with WordNets for Several European Languages. Available: http://www.illc.uva.nl/EuroWordNet/. [Accessed: May 24, 2010].
[9] FreeGIS.org. Available: http://www.freegis.org/. [Accessed: May 24, 2010].
[10] FreeLing Home Page. Available: http://garraf.epsevg.upc.es/feeling/. [Accessed: May 24, 2010].
[11] Freshmeat.net. Available: http://www.freshmeat.net/. [Accessed: May 24, 2010].
[12] Gaetán G., Cechich A., and Buccella A. Extracción de Información a partir de Catálogos Web de Componentes para SIG. XV Congreso Argentino en Ciencias de la Computación, Jujuy, Argentina, (2009).
[13] ISO/IEC. Geographic Information – Services. Draft International Standard 19119, (2002).
[14] Jaccheri L. and Torchiano M. Classifying COTS Products. In Proceedings of the 7th European Conference on Software Quality, LNCS 2349, pp. 246-255, (2002).
[15] Kallio P. and Niemelä E. Documented Quality of COTS and OCM Components. In Proceedings of the 4th ICSE Workshop on Component-Based Software Engineering, (2001).
[16] Kiryakov A., Popov B., Kirilov A., Manov D., Ognyanoff D., and Goranov M. Semantic Annotation, Indexing, and Retrieval. 2nd International Semantic Web Conference, USA, (2003).
[17] Kulak D. and Guiney E. Use Cases: Requirements in Context. Addison-Wesley Longman Publishing Co., Inc., USA, (2003).
[18] Lucena V. F. Flexible Web-based Management of Components for Industrial Automation. PhD thesis, Stuttgart University, (2002).
[19] Mencl V. Deriving Behavior Specifications from Textual Use Cases. Proceedings of the Workshop on Intelligent Technologies for Software Engineering (WITSE04), (2004).
[20] Réquilé-Romanczuk A., Cechich A., Dourgnon-Hanoune A., and Mielnik J.C. Towards a Knowledge-Based Framework for COTS Component Identification. Second ICSE International Workshop on Models and Processes for the Evaluation of OTS Components, ACM Press, pp. 1-4, (2005).
[21] Overhage S. UnSCom: A Standardized Framework for the Specification of Software Components. In Proceedings of the 5th Annual International Conference on Object-Oriented and Internet-Based Technologies, Concepts, and Applications for a Networked World, LNCS 3263, pp. 169-184, (2004).
[22] Saldaño V., Buccella A., and Cechich A. Una Taxonomía de Servicios Geográficos para facilitar la identificación de componentes. Proceedings of CACIC 2008, Argentina, (2008).
[23] Tansalarak N. and Claypool K. XCM: A Component Ontology. OOPSLA'04 – Ontologies as Software Engineering Artifacts, (2004).
[24] Varadarajan S., Kumar A., Gupta D., and Jalote P. ComponentXchange: An E-Exchange for Software Components. IADIS Conf. WWW/Internet, pp. 62-72, Portugal, (2001).
[25] Vitharana P., Zahedi F., and Jain H. Knowledge-Based Repository Scheme for Storing and Retrieving Business Components: A Theoretical Design and Empirical Analysis. IEEE Transactions on Software Engineering, Vol. 29(7), pp. 649-664, (2003).