Innovation and Sustainable Economic Competitive Advantage: From Regional Development to Global Growth
A Multidimensional Semantic User Model for COTS Components Search Personalization
Nacim Yanes, Riadi Laboratory, La Manouba, Tunisia, [email protected]
Sihem Ben Sassi, Riadi Laboratory, La Manouba, Tunisia, [email protected]
Henda Ben Ghezala, Riadi Laboratory, La Manouba, Tunisia, hhbg.hhbg@gmail.com
Abstract
In this paper, we propose a personalization approach for constructing and exploiting a multidimensional semantic user model in the context of COTS-based development. This user model is used to improve the performance of a search engine specialized in COTS components. Experimental results show that using our user model improves COTS components search quality by placing the most relevant results at the top of the search results list.
Keywords: COTS component, user model, personalized search, multidimensional representation.
I. Introduction
Nowadays, an alternative paradigm to traditional software development consists in building information systems by selecting and integrating Commercial-Off-The-Shelf (COTS) components, which are offered by vendors seeking to profit from them and are characterized by the unavailability of their source code. Ahuja (2014) states that using COTS components has been recognized as a crucial success factor for the software industry. Several advantages of COTS-based development have been identified for both component providers (COTS components vendors) and component consumers (COTS components integrators), including minimizing overall software development costs, reducing the risks of software development and shortening the time-to-market of software products.
Even though the use of COTS components provides many benefits, Arora and Singh (2014) mention that several challenges, risks and uncertainties are still related to this approach. As a matter of fact, the success of COTS-based development greatly depends on the ability of integrators to identify the most relevant candidate COTS components to be selected and then integrated or used. The Web is a vital part of COTS components identification as it constitutes the virtual place where COTS components are mainly searched for and provided. Current search engines, such as Google, are important for retrieving relevant COTS components from the Web. However, these search engines follow the "one size fits all" model, which does not adapt to individual users: users around the world who input the same keywords at the same time will get exactly the same search results regardless of their past search history. Moreover, users might not choose the words that best identify their needs.
A recent study conducted by Al-Maskari and Sanderson (2011) demonstrated that users with more than 7 years of online searching experience obtained many more relevant documents than users with less experience. Personalized web search is considered a promising solution to these problems, since different search results can be provided depending on the choices and information needs of users, as stated by Fathy et al. (2014). It exploits user information and search context to learn which sense a query refers to. Therefore, we have set the objective of developing a personalized search engine for COTS components. This engine is based on a multidimensional semantic user model that we propose in order to identify users’ preferences and interests and then to exploit these interests effectively in the retrieval system to improve search results. In this paper, we address the problem of constructing and exploiting
the proposed user model in the search process to provide each user with the results that are most relevant to his/her interests. The remainder of this paper is structured as follows: Section 2 provides background information on the overall context of this work. Section 3 describes our approach for building and exploiting the multidimensional semantic user model, including its application to ranking and displaying COTS components search results. The experimental evaluation and results are described in Section 4. In the final section, we present our conclusion and future work.
II. Background
In this section, we present the context of the proposed work by giving background information on the targeted specialized search engine for COTS components. Our proposed search engine relies on a personalized semantic search approach, which is based on the two main processes described below.
1. The indexing process
The indexing process aims at producing a unified representation of the heterogeneous COTS components descriptions provided by COTS vendors/publishers. It consists of two sub-processes:
• The sub-process of COTS component information extraction aims at locating values of interest within COTS components Web pages and mapping those values to our COTS component conceptual schema. This schema is built according to the hierarchy of concepts defined by our ontology for COTS components (henceforth ONTOCOTS) and is linked to a storage structure (an XML file) that holds the extracted data.
• The sub-process of knowledge population aims at automatically populating the ONTOCOTS ontology with instances that represent various concepts and their corresponding relationships.
2. The search and retrieval process
The search and retrieval process is based, on the one hand, on a domain ontology (ODP) that represents and stores knowledge about COTS components application domains and, on the other hand, on our proposed user model, which integrates the user’s domains of interest in the COTS component retrieval process in order to tailor search results to a particular user. The search and retrieval process consists of the following four sub-processes:
• The sub-process of search space reduction aims at better exploiting the ONTOCOTS ontology and therefore at improving retrieval performance. This sub-process exploits, among others, our multidimensional semantic user model to reduce the search space. We explain how the user model is exploited later in this paper.
• The sub-process of query expansion is used to increase the likelihood of a match between the query and the relevant COTS components by adding semantically related terms to the user’s query. We used the lexical ontology WordNet as the source of expansion terms.
• The sub-process of COTS component retrieval takes the expanded query as input and generates a formal SPARQL query. This query is then executed against the ONTOCOTS ontology, which returns a list of instance tuples that satisfy the query. The returned instances are forwarded to the sub-process of results ranking and presentation.
• The sub-process of results ranking and presentation aims at bringing system relevance and user relevance closer. It is based on our multidimensional semantic user model. The exploitation of the user model to re-rank search results is described later in this paper.
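The query expansion and retrieval sub-processes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym table stands in for WordNet lookups, and the namespace http://example.org/ontocots# together with the class and property names (cots:COTSComponent, cots:name, cots:description, cots:applicationDomain) are hypothetical, since the paper does not publish the ONTOCOTS vocabulary. The domains passed to build_sparql play the role of the user's domains of interest used to reduce the search space.

```python
from typing import Dict, List

# Toy synonym table standing in for WordNet lookups (hypothetical data).
SYNONYMS: Dict[str, List[str]] = {
    "chart": ["graph", "diagram"],
    "viewer": ["reader"],
}


def expand_query(terms: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return each term followed by its related terms, without duplicates."""
    expanded: List[str] = []
    for term in terms:
        key = term.lower()
        for candidate in [key] + synonyms.get(key, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_sparql(terms: List[str], domains: List[str]) -> str:
    """Build a SPARQL SELECT query over a hypothetical ONTOCOTS vocabulary.

    Terms and domains are assumed to be non-empty and to contain no double
    quotes or backslashes; real code would have to escape them.
    """
    term_filter = " || ".join(
        f'CONTAINS(LCASE(?description), "{term}")' for term in terms
    )
    domain_values = " ".join(f'"{domain}"' for domain in domains)
    return (
        "PREFIX cots: <http://example.org/ontocots#>\n"
        "SELECT ?component ?name WHERE {\n"
        "  ?component a cots:COTSComponent ;\n"
        "             cots:name ?name ;\n"
        "             cots:description ?description ;\n"
        "             cots:applicationDomain ?domain .\n"
        f"  VALUES ?domain {{ {domain_values} }}\n"
        f"  FILTER ({term_filter})\n"
        "}"
    )
```

Calling build_sparql(expand_query(["Chart", "viewer"], SYNONYMS), ["Graphics"]) yields a query restricted to one application domain and matching any of the expanded terms; the resulting string could then be submitted to whatever SPARQL endpoint hosts the ontology.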
III. The proposed multidimensional user model
As we mentioned above, the goal of COTS components search personalization is to consider the user’s search preferences and interests in the search process to provide each user with the COTS components that are most relevant to his/her interests. Our personalization approach consists of the three steps described below.
1. Representing the user model
Identifying and organizing user model knowledge is a key issue in obtaining a global view of data personalization. As a matter of fact, several attempts have been made in the literature to collect and classify this knowledge, and numerous user profiling approaches address multiple representations of user interests and preferences. Following this representation and categorization effort, we adopted in our work a multidimensional approach to represent and describe the user along different dimensions. We were inspired by the work of Kostadinov (2007), who proposed to organize the user model into dimensions and sub-dimensions described by (attribute, value) pairs. Sub-dimensions can be considered as complex attributes. An example of a possible sub-dimension is the address, which is composed of a street name, street number, postal code, city and country. The generic user meta model proposed by Kostadinov is given in Fig 1. It can be used to create a large variety of user models whose content may vary from one domain to another.
Kostadinov proposes six dimensions through which user model knowledge can be defined. These dimensions refine and extend previous categorization attempts, particularly by allowing preferences to be expressed and by providing operators for their exploitation and evolution. These dimensions serve as a foundation for a generic user model intended for a wide spectrum of applications.
Using these dimensions as our departure point, we propose a user model (noted Mu) composed of 5 dimensions: personal data, domain of interest, delivery data, quality data and security data. Our proposed user model is illustrated in Fig 2.
[Figure: UML class diagram with the classes Profile (id-profile), Dimension (id-dimension, name), SubDimension (id-SubDimension, name), Attribute (id_attribute, name, valueType) and AttributeValue (id_value, value).]
Fig 1. The user meta model.
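The meta model of Fig 1 can be sketched as plain Python data classes. This is an illustration under stated assumptions rather than Kostadinov's or our actual implementation: the containment relations (a profile holds dimensions, a dimension holds sub-dimensions and attributes, an attribute holds values) are read from the class names and the description above, and the helper find_values as well as the sample address values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeValue:
    id_value: int
    value: str


@dataclass
class Attribute:
    id_attribute: int
    name: str
    value_type: str
    values: List[AttributeValue] = field(default_factory=list)


@dataclass
class SubDimension:
    id_sub_dimension: int
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Dimension:
    id_dimension: int
    name: str
    sub_dimensions: List[SubDimension] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Profile:
    id_profile: int
    dimensions: List[Dimension] = field(default_factory=list)


def find_values(profile: Profile, dimension: str, attribute: str) -> List[str]:
    """Collect the values of an attribute, looking in the named dimension
    itself and in all of its sub-dimensions."""
    found: List[str] = []
    for dim in profile.dimensions:
        if dim.name != dimension:
            continue
        candidates = list(dim.attributes)
        for sub in dim.sub_dimensions:
            candidates.extend(sub.attributes)
        for attr in candidates:
            if attr.name == attribute:
                found.extend(v.value for v in attr.values)
    return found


# The address example from the text, modelled as a sub-dimension
# (the concrete values are illustrative only).
address = SubDimension(1, "address", [
    Attribute(1, "city", "string", [AttributeValue(1, "La Manouba")]),
    Attribute(2, "country", "string", [AttributeValue(2, "Tunisia")]),
])
profile = Profile(1, [Dimension(1, "personal data", sub_dimensions=[address])])
```

Treating a sub-dimension as a named group of attributes matches the description of sub-dimensions as complex attributes, and keeps lookups such as find_values(profile, "personal data", "city") independent of the nesting level.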
If we consider U as the set of users, a user u ∈ U will have as model $M_u$:
$$M_u = D_{per} \cup D_{doi} \cup D_{del} \cup D_{qua} \cup D_{sec}$$
where:
$D_{per}$ = personal data dimension
$D_{doi}$ = domain of interest dimension
$D_{del}$ = delivery data dimension
$D_{qua}$ = quality data dimension
$D_{sec}$ = security data dimension
[Figure: UML class diagram with the classes of Fig 1 (Profile, Dimension, SubDimension, Attribute, AttributeValue) plus PersonalDataDimension, DomainOfInterestDimension, DeliveryDimension, QualityDimension and SecurityDimension.]
Fig 2. The proposed multidimensional semantic user model.
• Personal data dimension (Dper): This dimension groups attributes related to the identity of the user. This knowledge can be more or less detailed, depending on the range of applications in which the profile is used, and can be organized into different entities, possibly arranged in a generalization/specialization hierarchy. In our proposal, the personal data dimension groups, as given in Fig 3, attributes including the user name, email, login, password and profession.
[Figure: elements include Personal Data, Identity and IdentityAttribute, with the attributes Password, Last Name, First Name, Email, User Name and Profession.]
Fig 3. The personal data dimension.
• Domain of interest dimension (Ddoi): This dimension groups all preferences related to the general needs of a given user. In our proposal, the domain of interest describes the main COTS components application domains the user is interested in. The domain of interest can be defined in different ways. Researchers have attempted to use ontologies to improve personalized Web search. Our research follows recent ontology-based personalized search approaches in using the Open Directory Project (ODP) ontology as a source of evidence to build a semantic representation of the domain of interest dimension. We exploited the ODP since it is considered the largest and most comprehensive Web directory and is maintained by a global community of volunteer editors, as mentioned by Daoud et al. (2008). We use the first three levels of the ODP to learn the domain of interest dimension as bags of words associated with each category. The domain of interest dimension is illustrated in Fig 4. By using the ODP ontology, our proposed specialized search engine can match evidence gathered from the user feedback (his/her search history) with concepts of the ODP ontology and therefore represent the user’s new domains of interest. Our method for representing the domain of interest dimension runs in three main steps: (1) representing keyword-based user domains of interest derived from the user feedback, (2) mapping these keyword-based domains of interest onto the ODP ontology, and (3) representing the user domains of interest by the depth-three concepts of the resulting set.
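The three steps above can be sketched in Python as follows. This is a simplified illustration and not our actual implementation: the paper does not specify the matching function, so cosine similarity between term-frequency vectors is an assumption, the sketch scores depth-three categories directly instead of mapping onto all three levels first, and ODP_SAMPLE is a tiny hypothetical excerpt of the category bags of words.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical excerpt: ODP category path -> bag of words for that category.
ODP_SAMPLE: Dict[str, List[str]] = {
    "Computers/Software": ["software", "application"],
    "Computers/Software/Graphics": ["chart", "image", "graph", "plot"],
    "Computers/Software/Databases": ["sql", "database", "storage"],
}


def keyword_vector(queries: List[str]) -> Counter:
    """Step 1: term-frequency vector built from the search history."""
    return Counter(word for query in queries for word in query.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_concepts(
    queries: List[str], categories: Dict[str, List[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Steps 2 and 3: score the depth-three categories against the keyword
    vector and keep the best matching ones (ties broken alphabetically)."""
    user = keyword_vector(queries)
    scored = [
        (path, cosine(user, Counter(words)))
        for path, words in categories.items()
        if path.count("/") == 2  # a path with two separators has depth three
    ]
    matched = [(path, score) for path, score in scored if score > 0]
    matched.sort(key=lambda item: (-item[1], item[0]))
    return matched[:top_n]
```

With the search history ["chart component", "graph plot library"], only the Graphics category shares terms with the keyword vector, so it is the single concept retained for the domain of interest dimension.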
[Figure: elements include Domain Of Interest, COTS Component, Concept Vector, Search History, Concept, Semantic Relation, Conceptual Graph and Domain Ontology.]
Fig 4. The domain of interest dimension.
• Delivery data dimension (Ddel): This dimension is composed of two underlying sub-dimensions, as given in Fig 5: customization and classification criteria. The customization sub-dimension describes different modalities related to the user interface (e.g. presentation style, results size). The classification criteria sub-dimension groups attributes related to COTS components, including, on the one hand, functional criteria such as operating system and component model and, on the other hand, quality attributes such as memory utilization and disk utilization.
[Figure: elements include Delivery Data, COTS Component, Customization (Display Specification, Number Of Results Per Page), Classification Criteria, Functional Criteria, Quality Attribute, Operating System, License, Publisher, Component Model, Memory Use, Disc Use, Certification, Standardization and Evolvability.]
Fig 5. The delivery data dimension.
• Quality data dimension (Dqua): This dimension is one of the most important issues in data personalization. Attributes of this dimension define the quality expected by the user. In our proposal, data quality concerns data sources, i.e. the reliability of COTS components publishers. The quality data dimension is illustrated in Fig 6.
[Figure: elements include Quality, Quality Factor, Container Quality Factor, Reliability and COTS Component Vendor/publisher.]
Fig 6. The quality data dimension.
• Security data dimension (Dsec): This dimension describes security rules and constraints that can be applied to the user profile. The security dimension mainly refers to privacy policies. The security data dimension is illustrated in Fig 7.
[Figure: elements include Security, User Profile, Profile Security, Right Of Access and Authorization.]
Fig 7. The security data dimension.
2. Constructing and updating the user model
We propose a hybrid explicit/implicit approach to collect information about users. In the first interaction with the proposed specialized search engine, users are asked to fill in their personal data, preferences and interests. We present some screenshots illustrating how our search engine explicitly collects the users’ data. First, each user has to enter information about his/her identity by completing the two items “personal data” and “professional data” of the form “Personal dimension interface”. Then, the same user continues the explicit collection process by entering his/her domains of interest and preferences for the delivery, quality and security dimensions, as given in Fig 8.
Fig 8. Screenshots of our user model dimensions.
As we mentioned above, the explicit approach to collect users’ preferences and domains of interest is followed only when the proposed search engine is used for the first time. In subsequent interactions, the search history of the user is exploited to collect information implicitly and therefore to update his/her profile, particularly his/her domain of interest dimension. To this end, we propose an algorithm, illustrated in Fig 9, to update the user’s domains of interest. Its principle is as follows: we compute the involvement degree of each domain of interest after the submission of a given number of search queries. The involvement degree of a domain of interest is equal to the number of search queries related to that domain. Domains of interest whose involvement degree is greater than the average of the involvement degrees are used to update the user profile. The user profile evolution is carried out periodically, after the submission of a given number of search queries or when the involvement degree of a domain of interest has reached a given threshold.
Begin
// First triggering
1. Compute the involvement degree of each domain.
2. Compute the average of the involvement degrees.
3. Compare each involvement degree with the average.
4. Update the user profile.
// New triggering after X search queries or when an involvement degree has reached a given threshold
5. Compute the involvement degree of each domain.
6. Repeat steps 2, 3 and 4.
End
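The updating principle can be sketched in Python as follows. This is a minimal illustration under assumptions the paper leaves open: each search query is taken to be related to exactly one domain, selected domains are added to the profile rather than replacing it, and the parameter names batch_size (the X of the algorithm) and threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def involvement_degrees(query_domains: Iterable[str]) -> Counter:
    """Steps 1 and 5: number of search queries related to each domain.

    Assumes each query has already been mapped to exactly one domain.
    """
    return Counter(query_domains)


def domains_above_average(degrees: Counter) -> List[str]:
    """Steps 2 and 3: keep domains whose degree is strictly above the average."""
    if not degrees:
        return []
    average = sum(degrees.values()) / len(degrees)
    return sorted(d for d, degree in degrees.items() if degree > average)


def should_trigger(
    degrees: Counter, queries_since_update: int, batch_size: int, threshold: int
) -> bool:
    """Trigger after batch_size queries or once any degree reaches threshold."""
    return queries_since_update >= batch_size or any(
        degree >= threshold for degree in degrees.values()
    )


def update_profile(current: Set[str], query_domains: List[str]) -> Set[str]:
    """Step 4: add the selected domains to the profile (assumption: domains
    are only added, never removed)."""
    selected = domains_above_average(involvement_degrees(query_domains))
    return current | set(selected)
```

For a log of five queries with three related to "Graphics", one to "Databases" and one to "Networking", the average involvement degree is 5/3, so only "Graphics" is added to the domain of interest dimension.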
Fig 9. Principle of our user model updating algorithm.
3. Exploiting the user model
As we mentioned earlier, our specialized search engine exploits the user model to enhance COTS components search quality in two phases, namely “reduction of the search space” and “results ranking and display”. First, we use the domains of interest specified in the “domain of interest” dimension of our user model to reduce the search space. Second, we propose a ranking algorithm based on our user model, particularly the “delivery data” dimension. We first rank COTS components into groups according to the importance of the “functional criteria” belonging to the “delivery data” dimension, where the importance of each criterion is determined from the weight specified by the user in his/her profile. We then use the “quality attributes” that belong to the “delivery data” dimension to rank the COTS components within each group. Furthermore, we use the “customization” sub-dimension of the “delivery data” dimension to personalize the presentation of search results. As a matter of fact, search results are displayed according to the preferences of each user specified in the “display specification” and “number of results per page” attributes that belong to the “customization” sub-dimension.
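The two-level ranking and the paginated display can be sketched in Python as follows. The paper does not give the ranking formula, so this is only one plausible reading: the functional score is assumed to be the sum of the user's weights for the functional criteria a component satisfies, quality attributes are assumed to be costs (lower is better) compared in the order the user lists them, and all names (Component, functional_score, quality_key, rank, paginate) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    name: str
    functional: Dict[str, str]  # e.g. {"operating_system": "Linux"}
    quality: Dict[str, float]   # e.g. {"memory_use": 120.0}


def functional_score(
    component: Component, preferences: Dict[str, str], weights: Dict[str, float]
) -> float:
    """Sum of the user's weights for the functional criteria that match."""
    return sum(
        weights.get(criterion, 0.0)
        for criterion, wanted in preferences.items()
        if component.functional.get(criterion) == wanted
    )


def quality_key(component: Component, attributes: List[str]) -> Tuple[float, ...]:
    """Quality values in the user's order; missing values sort last."""
    return tuple(component.quality.get(a, float("inf")) for a in attributes)


def rank(
    components: List[Component],
    preferences: Dict[str, str],
    weights: Dict[str, float],
    quality_attributes: List[str],
) -> List[Component]:
    """Order by functional score (descending), then by quality within each
    group of equal score, then by name for a stable result."""
    return sorted(
        components,
        key=lambda c: (
            -functional_score(c, preferences, weights),
            quality_key(c, quality_attributes),
            c.name,
        ),
    )


def paginate(results: List, per_page: int) -> List[List]:
    """Split results according to the 'number of results per page' attribute."""
    return [results[i:i + per_page] for i in range(0, len(results), per_page)]
```

Components with the same functional score form one group, and the quality attributes only decide the order inside that group, which mirrors the two ranking steps described above.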
IV. Experimental evaluation
The goal of our experimental evaluation is to show that personalization with the multidimensional semantic user model leads to significantly higher retrieval performance compared with a basic search. As a baseline, we used the open source framework Lemur. We used two data sets in our experiments: the first one is the COTS components collection that we built ourselves due to the lack of a standard test collection, and the second one is the ODP data set that we created to represent each of the ODP concepts. In addition, we involved more than 20
real users from the academic environment (master students, engineers and teachers). Each of these users submitted 25 search queries (at least one query per application domain) to evaluate the effectiveness of our personalization approach. Our evaluation results show that our specialized search engine considerably improves precision at all cut-off points, particularly at the cut-off point of 15, where the improvement exceeded 20%. Moreover, our findings reveal that our specialized search engine improves on the nDCG obtained by the baseline search. Therefore, this experiment demonstrates that re-ranking the search results based on the user model is effective in presenting the most relevant COTS components to the user.
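For reference, the two measures reported above, precision at a cut-off point and nDCG, can be computed as in the following Python sketch. The paper does not state which DCG formulation was used, so the common form with a logarithmic discount, rel / log2(rank + 1), is an assumption, as is computing the ideal ordering from the judged result list itself. The relevance lists in the usage note are invented for illustration and are not our experimental data.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[float], k: int) -> float:
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering.

    Assumes the ideal ordering can be derived from the judged list itself.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0
```

For example, with binary judgments [0, 1, 0, 1, 1] for a baseline list and [1, 1, 1, 0, 0] for a re-ranked list, precision at cut-off 3 rises from 1/3 to 1, and nDCG at cut-off 5 reaches 1 for the re-ranked list because its relevant items are already in the ideal order.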
V. Conclusion and future work
Personalized web search provides users with results that accurately satisfy the specific goal of their search. In this paper, we proposed a personalization approach based on constructing a multidimensional semantic user model and exploiting it in the context of COTS components search. Our proposed user model consists of five dimensions, namely personal data, domain of interest, delivery data, quality data and security data. The domain of interest dimension is obtained by mapping the user’s search queries onto categories from the ODP ontology. The experimental results show that our proposed specialized search engine outperforms the baseline search and improves retrieval effectiveness in searching COTS components marketed on the Web. This shows the effectiveness of the proposed user profile modeling. Additionally, representing the domain of interest dimension semantically with concepts from the ODP ontology is more accurate and less ambiguous than using users’ keywords.
In the future, we plan to perform a large-scale experiment over a longer period with more users. We can also learn from other implicit information, such as mouse movement and the time interval between two clicks, to update the user model effectively. Furthermore, we plan to examine the effect of semantic relations in the ODP ontology on the re-ranking quality.
References
Ahuja, L. (2014), ‘A technological perspective for evaluating component based technologies,’ International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), IEEE, 2014, 308-313.
Al-Maskari, A. and Sanderson, M. (2011), ‘The effect of user characteristics on search effectiveness in information retrieval,’ Journal of Information Processing & Management, 47(5), 719-729.
Arora, P. and Singh, H. (2014), ‘Identification of Critical Risk Phase in Commercial-off-the-Shelf Software (CBSD) using FMEA Approach,’ Global Journal of Computer Science and Technology: Software & Data Engineering, 14(2).
Daoud, M., Tamine-Lechani, L. and Boughanem, M. (2008), ‘Using a concept-based user context for search personalization,’ Proceedings of the International Conference of Data Mining and Knowledge Engineering (ICDMKE), London, UK, 2008, 57-64.
Fathy, N., Gharib, T. F., Badr, N., Mashat, A. S. and Abraham, A. (2014), ‘A personalized approach for re-ranking search results using user preferences,’ Journal of Universal Computer Science, 20(9), 1232-1258.
Kostadinov, D. (2007), Personnalisation de l'information: une approche de gestion de profils et de reformulation de requêtes, Doctoral dissertation, Université de Versailles-Saint Quentin en Yvelines.
Meyers, B.C. and Obendorf, P. (2001), Managing Software Acquisition, Addison-Wesley.