
Framework Proposal to Evaluate Trustworthiness in an Online Community

Adriana Figueiredo, Olga Nabuco, Tatiana Al-Chueyr, Marcos Rodrigues

Centro de Tecnologia da Informação Renato Archer – CTI, Campinas, Brazil

{adriana.figueiredo; olga.nabuco; tatiana.martins; marcos.rodrigues}@cti.gov.br

Abstract

This work presents a proposal to evaluate members' trustworthiness in online communities. Both members' interactions and the relevance of their contributions to the community are analyzed as trust evidence. To establish the relevance of a contribution, the community's vocabulary is studied and its occurrence in each member's interactions is evaluated. We propose a framework that supports both the trust model and the creation of the community's vocabulary. The software architecture extends current collaborative systems, and the trust model uses information extracted from the artifacts generated by the online community. An experiment in the InVesalius community, available at the Brazilian Public Software Portal, is presented.

1. Introduction

Collaboration has been one of the cornerstones of the Web since its origin. However, it was Web 2.0 and its technologies that really enabled online participation and the collective harnessing of intelligence, as we have witnessed recently [1]. The focus of this work is online communities, where people (geographically and socially distant, speaking different languages, with distinct levels of experience and skills) are connected to build and share things of common interest. Software development, knowledge systems (e.g., Wikipedia) and scientific research are examples of areas where these things are being constructed [2]. We are specifically interested in communities of practice, where members contribute voluntarily and in solidarity. In these communities, members tend to trust those who participate frequently and whose contributions are relevant to the community. We understand that recognition by peers can be used as a reward for this voluntary work. Therefore, our idea is to expand and export this kind of recognition to different application areas, such as innovation models and research production.

Members of an online community are usually not related in the real world, so knowing who can be trusted is normally a problem for newcomers. We propose a trust model in which the interactions and the relevance of the contributions are used as trust factors to calculate the trustworthiness of each member. The relevance of a contribution is detected using the community's previously established vocabulary. In this work we present a software architecture framework that supports both (i) the trust model and (ii) the construction of the vocabulary of an online community. Each community has its own vocabulary, with special words and particular meanings. This vocabulary is not static; it changes constantly.

This paper is organized as follows: section two presents a conceptual model of an online community, including the main elements of the trust model; section three presents the software architecture, which extends current collaborative systems and adds support for trust and vocabulary in the community; section four describes an experiment in a Brazilian Public Software [3] online community. Related work and conclusions are presented in sections five and six, respectively.

2. Trust in online communities: a conceptual model

Collaborative systems, with functionalities that facilitate communication and information sharing, have provided the means necessary to support online communities. However, these systems offer insufficient support for trust in the community, even though trust is an essential component when collaboration happens in the virtual space.

The human notion of trust is a complex concept and has been extensively studied in many domains such as sociology, psychology, organizational theory and political science. In computer science, trust has recently gained attention from researchers as online services and communities become popular among Internet users. Computational trust models use mechanisms drawn from the methods humans use to assess the trustworthiness of other humans. Direct experience and recommendations are the methods traditionally used by those models [4]. The experience that results from a direct or indirect interaction (i.e., a recommendation) is indeed relevant, but some researchers claim that new types of evidence should be investigated [5, 6].

We propose a trust model for the members of a community, using information extracted from the artifacts generated (or consumed) in the community. Fig. 1 illustrates the conceptual model of an online community, its artifacts and trust elements. One set of artifacts is specific to the collaborative work domain; examples are (i) posts in a forum, (ii) a wiki, and (iii) a dialogue in a chat. A different set of artifacts is specific to the community's purpose. For example, in a software development community, artifacts can be defects, source code, etc. A trust algorithm uses trust factors and parameters, specified by the community, to calculate the members' trustworthiness. Trust factors can be quantitative or qualitative. The number of posts in a forum is an example of a quantitative trust factor. The quality of an artifact, a qualitative trust factor, is related to its relevance to the community and is determined using the community's vocabulary.
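As an illustration of a quantitative trust factor, the following minimal Python sketch counts forum posts per member. The `Artifact` record, its field names and the sample data are assumptions made for this example only; they are not the schema used by the community subsystem.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One contribution produced in the community (hypothetical schema)."""
    author: str
    kind: str  # e.g. "forum_post", "wiki_page", "ticket"
    text: str


def count_by_author(artifacts, kind):
    """Quantitative trust factor: number of artifacts of one kind per member."""
    return Counter(a.author for a in artifacts if a.kind == kind)


# Invented sample data, for illustration only.
artifacts = [
    Artifact("ana", "forum_post", "How do I export the surface?"),
    Artifact("ana", "forum_post", "The new threshold worked."),
    Artifact("bruno", "forum_post", "Thanks!"),
    Artifact("bruno", "wiki_page", "Installation guide."),
]

posts_per_member = count_by_author(artifacts, "forum_post")
```

Qualitative factors cannot be obtained by counting alone; they require looking at the text of each artifact against the community's vocabulary.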

Figure 1: UML conceptual model for trust in online communities

3. Proposal of members' trustworthiness framework

We propose a software architecture framework encompassing the current collaborative system plus two additional subsystems, the Knowledge Subsystem and the Trust Subsystem, as illustrated in Fig. 2.

The Knowledge Subsystem is responsible for the vocabulary and ontology of the community. The Analyst represents people with expertise in the community's subject matter. These people, with the help of an ontology engineer, construct the vocabulary and the ontology in a process that is partially manual. The vocabulary contains keywords and other words that are repeatedly used by the community. At runtime, words that the Vocabulary Manager detects with a certain regularity in the artifacts are inserted into the Candidate Base. Later, the Analyst can transfer these words to the community's Vocabulary Base. This vocabulary is used to build the community's ontology and to determine the relevance of an artifact. In this work, the focus is on the vocabulary that is used to determine the relevance of a contribution.

The Trust Subsystem is responsible for gathering trust factors and calculating the trustworthiness of each member of the community. The TrustFactorManager component downloads, from the community subsystem, the artifacts generated since the last access. For each artifact, quantitative and qualitative (relevance) trust factors are extracted and inserted into the TrustFactor Base. The relevance of an artifact's contents is determined by the Vocabulary Manager component and is based on the intersection of the artifact's text and the previously constructed vocabulary. The TrustCalculator is responsible for calculating the trustworthiness of all members daily. Heuristics are used to calculate the trustworthiness values, considering the trust factors and the weights defined by the community.

Figure 2: The proposed software architecture
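The two computations attributed to the Trust Subsystem, relevance by intersection with the vocabulary and a weighted combination of trust factors, can be sketched as follows. This is only one possible reading: the paper does not specify the heuristics, so the ratio used for relevance, the weighted average, the function names and the sample values are all assumptions.

```python
import re


def tokenize(text):
    """Distinct lowercase words of a text (letters only, Unicode-aware)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def relevance(text, vocabulary):
    """Assumed measure: share of the text's distinct words found in the vocabulary."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(words & vocabulary) / len(words)


def trustworthiness(factors, weights):
    """Assumed heuristic: weighted average of factors normalised to [0, 1].

    Factors missing for a member count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * factors.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Invented vocabulary, factors and weights, for illustration only.
vocabulary = {"segmentation", "mesh", "surface"}
post_relevance = relevance("The segmentation mesh looks wrong", vocabulary)
score = trustworthiness(
    {"activity": 0.5, "relevance": post_relevance},
    {"activity": 1.0, "relevance": 3.0},
)
```

Keeping the weights in a plain mapping reflects the idea that each community defines its own parameters; a community that values frequent participation over vocabulary use would only need to change that mapping.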

Our intent is for this trustworthiness value to be publicly available so that it can be used in different applications. For this purpose, we developed the ontology depicted in Figure 3.

Figure 3: An ontology for a community and its trust elements

4. An Experiment in a Community of the Brazilian Public Software Portal

In this section, we report an experiment in progress in the Brazilian Public Software Portal (BPSP) [3]. BPSP is a government initiative to promote a sustainable free-software business model, integrating society into a new model of technological knowledge production. Today BPSP hosts more than 20 cutting-edge public and free software solutions and more than 40,000 registered members from over 80 countries. Anyone can register at the BPSP and participate in as many Public Software Communities as they wish.

Our case study is the community that surrounds InVesalius [9], a free software package for three-dimensional reconstruction. This community has over 2,300 members from more than 55 countries. InVesalius community members can be divided into two groups: developers (mainly programmers) and users (usually surgeons).

First, a group of documents (articles, FAQ, wiki, e-mails, etc.) written by the community was selected. An automated Python script was developed to retrieve these data from the community subsystem, available at BPSP, and save the artifacts' content as text files. These text files were used to create a vocabulary model, which contains keywords and other words that are repeatedly used by the community. In-house software [10] and the Bayesian Network Builder named Kea [11] extracted these words. The original documents are in either Portuguese or English. Figure 4 shows a partial list of the extracted words that were confirmed by the domain specialist as significant to the community. The bagOfWords serves as a reference for new texts: a new text is considered relevant to the community according to how much of it belongs to the vocabulary model.

Table 1 presents some of the quantitative and qualitative trust factors elicited for this community. Posts in the forum, the wiki pages and the survey replies are the most important artifacts generated by the community. The BPSP is developed using the OpenACS Web framework [13], which offers several other tools and related artifacts for online collaboration. Only the artifacts considered most significant were chosen. BPSP also offers developers a Trac system [12] as a bug manager and source-code repository front end. The tickets generated by Trac are considered a domain artifact.
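To make the idea of "words that are repeatedly used by the community" concrete, the sketch below builds a bag of words from a small corpus by document frequency and then checks a new text against it. It is a simple frequency-based stand-in, not the in-house software [10] or Kea [11] used in the experiment; the thresholds `min_documents` and `min_hits`, the function names and the sample texts are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Distinct lowercase words of a text (letters only, Unicode-aware)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def build_bag_of_words(documents, stop_words, min_documents=2):
    """Keep non-stop words that occur in at least `min_documents` documents."""
    document_frequency = Counter()
    for document in documents:
        document_frequency.update(tokenize(document) - stop_words)
    return {w for w, n in document_frequency.items() if n >= min_documents}


def is_relevant(text, bag, min_hits=2):
    """Assumed rule: a text is relevant if it uses at least `min_hits` bag words."""
    return len(tokenize(text) & bag) >= min_hits


# Invented corpus, for illustration only.
documents = [
    "The surface mesh was exported",
    "Export the mesh as a surface file",
    "Thanks for the help",
]
stop_words = {"the", "a", "as", "was", "for"}
bag = build_bag_of_words(documents, stop_words)
```

In the process described above, the words produced automatically are only candidates; the domain specialist still confirms which of them are significant to the community.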

5. Related Work

Much research on trust management for online communities relies on direct or indirect experience [15], [16]. References [6] and [7] investigate different sources of trust evidence, as we do. Longo et al. [6] propose a method to evaluate an entity's trustworthiness using temporal factors, i.e., factors computed by considering only the time distribution of interactions. The authors investigate factors such as degree of activity, presence, regularity and frequency of interactions, and present an experiment on the Wikipedia project. Only quantitative factors are considered in that work.

Figure 4: bagOfWords for InVesalius community

Javanmardi and Lopes [7] propose a trust model to determine the reputation of contributors and the reliability of their contributions in collaborative and open information repositories. The authors relate the reputation of a user and the reliability of his/her contribution to the stability of his/her data in the system. The experiment in that work is also on Wikipedia.

Table 1: InVesalius trust factors

6. Conclusion and Further Work

In this work, we proposed a framework for trust in online communities. Although recommendation could be used, we looked for new types of evidence. We calculate the trustworthiness of a community's members using their interactions and the relevance of their contributions to the community. We have presented the initial results of an experiment carried out in a public software community. Regarding the mix of languages, at the current stage we combined English and Portuguese stop-word lists and observed no loss of keywords, but more observations are needed.

Further work includes the construction of the community's ontology using the vocabulary base. In the Trust Subsystem, we will adopt the SIOC (Semantically-Interlinked Online Communities) ontology and extend it with the trust concepts presented in this work. We are also investigating different mechanisms to determine the relevance of a text. One idea is to categorize text against a base of good and bad contributions [14].
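The combination of English and Portuguese stop-word lists mentioned in this section could look like the following minimal sketch. The two short lists are tiny illustrative samples, not the lists used in the experiment, and the function name is an assumption.

```python
import re

# Tiny sample lists, for illustration only; real lists are much longer.
ENGLISH_STOP_WORDS = frozenset({"the", "and", "of", "to", "is", "a"})
PORTUGUESE_STOP_WORDS = frozenset({"o", "a", "e", "de", "para", "que"})

# A single merged set is applied to every text, whatever its language.
STOP_WORDS = ENGLISH_STOP_WORDS | PORTUGUESE_STOP_WORDS


def content_words(text):
    """Words of a text, in order, with stop words of both languages removed."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


mixed = content_words("Imagens de tomografia and the mesh")
```

A merged list avoids having to detect the language of each artifact first. The risk it carries is the one noted above: a stop word in one language may be a meaningful keyword in the other, which is why further observation is needed.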

Acknowledgements

This paper is based on work funded in part by FINEP/MCT grants in the context of the Brazilian Public Software Framework Project, contract 01.08.0604.00, 29/12/2008.

7. References

[1] T. O'Reilly, What Is Web 2.0 - Design Patterns and Business Models for the Next Generation of Software, available at http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=1, 2005.
[2] E. Kapetanios, Quo Vadis computer science: From Turing to personal computer, personal content and collective intelligence, Data & Knowledge Engineering, vol. 67, Elsevier, pp. 286-292, 2008.
[3] Brazilian Public Software Portal: http://www.softwarepublico.gov.br
[4] J. Sabater, C. Sierra, Review on computational trust and reputation models, IIIA - CSIC, Campus UAB, Bellaterra, Barcelona, Spain, Kluwer Academic Publishers, 2003.
[5] P. Dondio, S. Barrett, S. Weber, J. Seigneur, Extracting Trust from Domain Analysis: a Study Case on the Wikipedia Project, Proceedings of IEEE ATC 06, Autonomic and Trusted Computing Conference, Wuhan, China, LNCS 4158, pp. 362-373, 2006.
[6] L. Longo, P. Dondio, S. Barrett, Temporal factors to evaluate trustworthiness of virtual identities, Third International Conference on Security and Privacy in Communications Networks, IEEE, pp. 11-19, Sept. 2007.
[7] S. Javanmardi, C. V. Lopes, Modeling trust in collaborative information systems, International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 299-302, 2007.
[8] C. Freitas, C. Meffe, The Evaluation of the Brazilian Public Software Portal, Proceedings of the WebSci'09: Society On-Line, Athens, Greece, March 2009.
[9] T.A.C.P. Martins et al., InVesalius: three-dimensional medical reconstruction software, Virtual and rapid manufacturing, Taylor and Francis Group, pp. 135-141, 2008.
[10] M. F. Koyama, O. F. Araújo, F. E. Pereira, K. Drira, Sharing Engineering Information and Knowledge, 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 297-300, 2005.
[11] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, C. G. Nevill-Manning, Kea: Practical Automatic Keyphrase Extraction, Design and Usability of Digital Libraries: Case Studies in the Asia Pacific, Information Science Publishing, pp. 129-152, 2005.
[12] Trac Open Source Project: http://trac.edgewall.org/
[13] The Toolkit for Online Communities: http://www.openacs.org/
[14] W. B. Cavnar, J. M. Trenkle, N-Gram-Based Text Categorization, in Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.
[15] J. Caverlee, L. Liu, S. Webb, SocialTrust: Tamper-Resilient Trust Establishment in Online Communities, Proc. of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 104-114, 2008.
[16] H. Liu, E. Lim, H. W. Lauw, M. T. Le, A. Sun, J. Srivastava, Y. A. Kim, Predicting trusts among users of online communities: an epinions case study, Proc. of the 9th ACM Conference on Electronic Commerce, pp. 310-319, 2008.