Components retrieval systems

Oualid Khayati, Jean-Pierre Giraudin
Laboratoire LSR - IMAG, BP 72, 38402 Saint Martin d'Hères Cedex, France
[email protected]
Abstract. The main problem encountered when reusing components libraries is component retrieval, i.e. finding in the library the components that can be used in the construction of a specific information system. Some professional application development environments have evolved into simple components managers. Although they make it possible to manage components libraries, they do not offer advanced components retrieval tools to help the engineer. In this work, we study retrieval techniques and propose a model for components retrieval in reusable components libraries.
1 Introduction

Component-based approaches to the development of information systems form a fast-growing engineering discipline that deals with the cycle of developing components and developing with components. Software engineers try to carry concepts of component composition over from other engineering disciplines to component software. Since the early nineties, when Microsoft introduced a component-based development environment (the Visual Basic environment and its pluggable components), several component-based approaches and components models have been introduced (COM, DCOM, EJB, CORBA, ...).

The component-based development process is composed of two complementary sub-processes (see fig. 1):
− Applications engineering sub-process: application engineers use components to build applications.
− Components engineering sub-process: component engineers identify, develop and capitalize components that have a high usability and usefulness.

Components libraries are thus used by different actors. The actual number of actors depends on the organization of the development team (see fig. 2). Components library management is the central feature of any component-based development approach. This paper aims to describe the existing components retrieval approaches and to highlight the contexts in which they can really be helpful for applications engineers.
Fig. 1. Components based development process (diagram labels: Applications engineering, Development by reuse, Development for reuse, Components engineering).
Fig. 2. Example of development team organization (diagram labels: Librarian; Applications engineer; Reusable components library; Component identification and qualification group; Development group; Maintenance group; flows: Orders, Components, Extract, Archive, New components, Requests for updating).
2 Components retrieval problem

Components indexing and selection can be seen as information retrieval problems. Let us consider a component as a document to which all information (document) retrieval techniques can be applied. The first works tried to apply plain text classification techniques to components retrieval problems (Boolean model [WAL79, SAL83a], Vectorial model [LUH57, BUC92, ROB94, SIN96], Probabilistic model [MAR60, BOO74, ROB76, CRO79, TUR91], Linguistic model [SAL83, DEF86, NIE90a, NIE90b]). These techniques can be useful with some kinds of components, such as patterns [ALE77, GAM95, COP95], but they are not really helpful for an applications engineer who has to select software components. For example, plain text classification techniques cannot resolve queries such as “select all the components using Java technology which implement the Interface-name interface”. Such queries require an analysis of the component code and external information provided by humans.

Some components retrieval approaches were introduced for these specific needs. These approaches can be classified into three categories based upon the way components are represented: external classification, structural matching and behavioural retrieval.

External classification

The external classification category includes all the approaches that represent the component by an external description. The description can be produced by an automatic or a manual process.

• Faceted representation and vocabulary: faceted classification approaches [PRI87, ZHA00] for components retrieval rely on a collection of facets, or classifications, which represent the types of information that are relevant for identifying reusable components. Each facet has a name and an associated term space, called its vocabulary, which is a collection of terms used to describe aspects of the facet. For example, a facet of a software component may be the programming language used to implement it. This facet can be named langage_implementation and can take one of the following values: java, C, C++, ... The faceted classification technique is powerful and gives good results. Its disadvantage is that it requires manual indexing, which can be expensive.
• Classification using natural language description: the system extracts lexical, syntactic and semantic information from the natural language description [GIR93, GIR94, GIR95]. The interpretation mechanism used to analyse the description does not claim to understand its meaning, but it uses linguistic techniques to derive automatically a component description that can be used to resolve the engineer's queries.
Classification techniques using natural language description are difficult to set up and are specific to a restricted domain of use.

Structural matching

The structural matching category includes all the approaches that represent the component by extracting its structure.

• Signature matching: this approach [ZAR93] primarily relies upon type matching and type transformation. Classes are represented by a multi-set of feature signatures. The signature matching approach assumes that the set of allowable signatures in the system is known. A feature can be:
- simple types (e.g. integers, naturals),
- constructed types (e.g. records of other simple or constructed types),
- user defined types (e.g. domain-specific programmed types),
- type variables (e.g. a type variable α may stand for any other type),
- functions.
The signature matching technique is useful when the applications engineer knows a partial or a complete signature definition of the needed component.
• Specification matching: specification matching [ZAR95, LAB97, HER01] primarily relies upon matching predicates in a logic, given a theory of predicate equivalence. Classes are represented by a collection of predicate pairs, which are the pre- and post-conditions of each class operation. Predicates are expressions written in a logic using terms and formula symbols. Specification languages like Z, B, or OCL can be used to write the components specifications and to specify the query. The engineer's query is a specification of the component that the engineer needs. The components retrieval system uses theorem proving methods to match the query against the specifications stored in the components library. Theorem proving algorithms are generally complex, so applying specification matching techniques to large components libraries can lead to performance problems. The specification matching technique also requires advanced knowledge of specification languages, so systems using it are expert oriented.

Behavioural retrieval

Behavioural-based retrieval approaches [POD92, HAL93, ATK94, ATK95] exploit the executability of software components to classify them. Calling the functions of a component with different arguments yields dynamic responses, which are collected. This collection is called the component behaviour. An ordering on behaviours is then used to classify components and to search through the library of components. The programs used to produce the component behaviour try to call a subset or all of the functions of the component and collect the results. If the program calls functions that do not exist in the component, the component ignores the call.
In the behavioural-based retrieval approach, the engineer's query is a program that calls some functions with specific arguments; this program is plugged into every component of the library to test its behaviour. The components that exhibit the searched behaviour are selected and presented to the engineer.
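The behavioural query described above can be illustrated with a minimal sketch. It is not the method of [POD92] or [ATK95]: the component model (a mapping from function names to callables), the names `behaviour`, `retrieve` and `IGNORED`, and the sample library are all assumptions made for illustration, and the match is a plain equality test rather than an ordering on behaviours.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical model: a component is a mapping from function names to callables.
Component = Dict[str, Callable[..., Any]]
# A probe is (function name, arguments, expected response).
Probe = Tuple[str, Tuple[Any, ...], Any]

# Sentinel recorded when a component does not offer the probed function,
# mirroring the remark that such calls are ignored by the component.
IGNORED = object()


def behaviour(component: Component, probes: List[Probe]) -> List[Any]:
    """Call the probed functions and collect the dynamic responses."""
    responses: List[Any] = []
    for name, args, _expected in probes:
        func = component.get(name)
        if func is None:
            responses.append(IGNORED)
            continue
        try:
            responses.append(func(*args))
        except Exception as exc:  # a failing call is also part of the behaviour
            responses.append(type(exc).__name__)
    return responses


def retrieve(library: Dict[str, Component], probes: List[Probe]) -> List[str]:
    """Return the names of the components whose responses match every probe."""
    expected = [probe[2] for probe in probes]
    return [name for name, component in library.items()
            if behaviour(component, probes) == expected]


# Illustrative library and query (assumed data, not taken from the paper).
library: Dict[str, Component] = {
    "AscendingSorter": {"sort": lambda items: sorted(items)},
    "DescendingSorter": {"sort": lambda items: sorted(items, reverse=True)},
    "Counter": {"count": lambda items: len(items)},
}
query: List[Probe] = [("sort", ([3, 1, 2],), [1, 2, 3]), ("sort", ([],), [])]
selected = retrieve(library, query)
```

In this sketch every component of the library is executed for every query, which is the main cost of the approach; a real system would also need to isolate the components being executed.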
3 Final considerations

After studying these different approaches, we can say that each retrieval approach helps the applications engineer to resolve one type of query. For example, faceted representation and classification using natural language description help the applications engineer to select components using external information provided as human descriptions of the components. This kind of components retrieval model provides tools adapted to inexperienced users. Signature matching and specification matching models allow the engineer to select components by specifying their signature or their specification. They are useful for applications engineers who have specific needs and are able to specify them formally. Behavioural retrieval is useful for selecting components by defining some behavioural constraints. This kind of method is useful for evaluating dynamic properties that depend on environment parameters, for example selecting the components that can sort large collections of objects while minimizing execution time and memory usage.

We think that an integrated retrieval model that applies different indexing and retrieval techniques may be a promising approach. Using a pattern language to document the components can help the applications engineer to understand and use them. The patterns can be used at two levels: to document the component itself and to trace its use history. The components engineer would explain how to use the component and how to adapt it by writing one or more patterns. Every time the applications engineer has to select and use components in an undocumented use context, they would inform the components engineer. If the context is important and can facilitate future use of the component, the components engineer would document it by adding a new pattern to the pattern language (see fig. 3).

Fig. 3. Relations between components and patterns (diagram labels: Component 1, Component 2, ..., Component i; Pattern 1, ..., Pattern k, each with Problem, Context and Solution; Adapted Component 1, ..., Adapted Component n, Adapted Component p; relations: abstraction, realization).
Such a system would increase the productivity of components based approaches and help to improve the knowledge capitalization process in development teams.
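To make the idea of an integrated retrieval model more concrete, the following sketch combines two of the techniques surveyed in section 2: a faceted filter and an exact signature match. It is only an illustration under stated assumptions: the `Entry` record, the functions `facets_match`, `signatures_match` and `retrieve`, and the sample library are hypothetical, signatures are reduced to pairs of type names, and the type transformations used by signature matching are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed encoding of a signature: (argument type names, result type name).
Signature = Tuple[Tuple[str, ...], str]


@dataclass
class Entry:
    """Hypothetical library record: external facets plus extracted signatures."""
    name: str
    facets: Dict[str, str]
    signatures: FrozenSet[Signature]


def facets_match(entry: Entry, wanted: Dict[str, str]) -> bool:
    """External classification: every requested facet must carry the requested term."""
    return all(entry.facets.get(facet) == term for facet, term in wanted.items())


def signatures_match(entry: Entry, wanted: FrozenSet[Signature]) -> bool:
    """Structural matching, exact form only: requested signatures must all be offered."""
    return wanted <= entry.signatures


def retrieve(library: List[Entry],
             facets: Optional[Dict[str, str]] = None,
             signatures: Optional[FrozenSet[Signature]] = None) -> List[str]:
    """Apply both filters; an omitted criterion accepts every component."""
    return [entry.name for entry in library
            if facets_match(entry, facets or {})
            and signatures_match(entry, signatures or frozenset())]


# Illustrative data (assumed, not taken from the paper).
SORT: Signature = (("list",), "list")
library = [
    Entry("JavaSorter", {"langage_implementation": "java"}, frozenset({SORT})),
    Entry("CSorter", {"langage_implementation": "C"}, frozenset({SORT})),
    Entry("JavaLogger", {"langage_implementation": "java"},
          frozenset({(("str",), "None")})),
]
result = retrieve(library,
                  facets={"langage_implementation": "java"},
                  signatures=frozenset({SORT}))
```

Each criterion narrows the candidate set independently, so an engineer who only knows external information can omit the signature part of the query, and conversely.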
References

[ALE77] C. Alexander, S. Ishikawa, M. Silverstein, M. Jacobson, I. Fiksdahl-King, S. Angel. A Pattern Language. New York: Oxford University Press, 1977.
[ATK94] S. Atkinson and R. Duke. A methodology for behavioural retrieval from class libraries. Technical Report 94-28, Software Verification Research Centre, Dept. of Computer Science, Univ. of Queensland, Australia, 1994. http://citeseer.nj.nec.com/atkinson94methodology.html.
[ATK95] S. Atkinson and R. Duke. Behavioural retrieval from class libraries. Australian Computer Science Communications, 17(1), pages 13-20, January 1995.
[BOO74] A. Bookstein and D. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, vol. 25, no. 5, pages 312-318, 1974.
[BUC92] C. Buckley, G. Salton, J. Allan. Automatic Retrieval with Locality Information using SMART. In TREC-1 Proceedings, pages 69-72, 1992.
[COP95] J. Coplien and D. Schmidt. Pattern Languages of Program Design. Addison-Wesley Publishing Company, 1995.
[CRO79] W. B. Croft and D. J. Harper. Using Probabilistic Models of Document Retrieval without Relevance Information. Journal of Documentation, vol. 35, pages 285-295, 1979.
[DEF86] B. Defude. Etude et réalisation d’un système intelligent de recherche d’informations : le prototype IOTA. PhD dissertation, Institut National Polytechnique de Grenoble, July 1986.
[GAM95] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[GIR93] M. R. Girardi and B. Ibrahim. A software reuse system based on natural language specifications. Proceedings of the International Conference on Computing and Information (ICCI’93), Sudbury, Ontario, Canada, pages 507-511, 1993.
[GIR94] M. R. Girardi and B. Ibrahim. A Similarity Measure for Retrieving Software Artifacts. Proceedings of the Sixth International Conference on Software Engineering and Knowledge Engineering (SEKE'94), Jurmala, Latvia, pages 478-485, June 21-23, 1994. http://citeseer.nj.nec.com/girardi94similarity.html.
[GIR95] M. R. Girardi. Classification and Retrieval of Software through their Descriptions in Natural Language. PhD dissertation, No. 2782, University of Geneva, December 1995.
[HAL93] R. J. Hall. Generalized behaviour-based retrieval. In Proceedings of the 15th International Conference on Software Engineering, pages 371-380, 1993.
[HER01] D. Hemer and P. Lindsay. Specification-based Retrieval Strategies for Module Reuse. In D. Grant and L. Sterling, editors, Proceedings 2001 Australian Software Engineering Conference, 27-28 August 2001, Canberra, Australia, pages 235-243, IEEE Computer Society, 2001.
[LAB97] L. L. Jilani, J. Desharnais, M. Frappier, R. Mili, and A. Mili. Retrieving Software Components That Minimize Adaptation Effort. In Proceedings of the 12th Automated Software Engineering Conference, pages 255-262, 1997. http://citeseer.nj.nec.com/jilani97retrieving.html.
[LUH57] H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, vol. 1, no. 4, October 1957.
[MAR60] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. Journal of the Association for Computing Machinery, no. 7, pages 216-244, 1960.
[NIE90a] J. Nie. Un modèle logique général pour les systèmes de recherche d’information : application au prototype RIME. PhD dissertation, Joseph Fourier University, Grenoble, July 1990.
[NIE90b] J. Nie and Y. Chiaramella. A Retrieval Model based on an Extended Modal Logic and its Application to the RIME Experimental Approach. ACM SIGIR 90, Brussels, Belgium, 1990.
[POD92] A. Podgurski and L. Pierce. Behaviour sampling: A technique for automated retrieval of reusable components. In Proceedings of the 14th International Conference on Software Engineering, pages 349-360, 1992.
[PRI87] R. Prieto-Diaz and P. Freeman. Classifying Software for Reusability. IEEE Software, 4(1), pages 6-16, 1987.
[ROB76] S. E. Robertson and K. Sparck-Jones. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, no. 27, pages 129-146, 1976.
[ROB94] S. E. Robertson and S. Walker. Some Simple Effective Approximation of the 2-Poisson Model for Probabilistic Weighted Retrieval. In ACM SIGIR, pages 232-241, 1994.
[SAL83] G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw Hill Book Company, New York, 1983.
[SAL83a] G. Salton, E. A. Fox, and H. Wu. Extended Boolean Information Retrieval. Communications of the ACM, vol. 26, no. 11, pages 1022-1036, December 1983.
[SIN96] A. Singhal, C. Buckley, M. Mitra. Pivoted Document Length Normalization. In ACM SIGIR, pages 21-29, 1996.
[TUR91] H. Turtle and W. B. Croft. Efficient Probabilistic Inference for Text Retrieval. Proceedings of RIAO 3, 1991.
[WAL79] W. G. Waller and D. H. Kraft. A mathematical model for a weighted Boolean retrieval system. Information Processing and Management, 15(5):235-245, 1979.
[ZAR93] A. M. Zaremski and J. M. Wing. Signature matching: A key to reuse. Technical Report CMU-CS-93-151, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 1993.
[ZAR95] A. M. Zaremski and J. M. Wing. Specification Matching of Software Components. In Proceedings of the Third Symposium on the Foundations of Software Engineering (FSE3), pages –17, ACM SIGSOFT, 1995.
[ZHA00] Z. Zhang, L. Svensson, U. Snis, C. Sørensen, H. Fägerlind, T. Lindroth, M. Magnusson, C. Östlund. Enhancing Component Reuse Using Search Techniques. Proceedings of IRIS 23, Laboratorium for Interaction Technology, University of Trollhättan Uddevalla, 2000.