Ontologies for the Semantic Web: Can Social Network Analysis Be Used to Develop Them? Henry M. Kim Schulich School of Business, York University
[email protected]
Abstract
According to Tim Berners-Lee, the inventor of the WWW, a semantic Web, in which software agents find the meanings of terms that describe the tasks they perform, is the next progression of the Web. Ontologies, as repositories of these machine-interpretable meanings, are key to his vision. However, ontologies are distributed, and they are not, and will likely never be, centrally organized. Enabling agents to find the right meanings is therefore an important challenge for realizing the semantic Web. As ontologies evolve, they will likely form clusters exhibiting small-world effects, just like web pages. In this paper, questions are raised about bringing findings from social network analysis to bear on the design of these ontologies. One question stems from the argument that ontologies may be used in preference to a competing technology (XML) when mitigating uncertainty is important. A research direction of studying social networks that deal with uncertainty is then posited.
Contact: Prof. Henry M. Kim, Schulich School of Business, York University, 4700 Keele St., Toronto, Ontario, Canada M3J 1P3 Tel: 1-416-736-2100 x77952 Fax: 1-416-736-5687 Email:
[email protected]
Key Words: Ontologies, Semantic Web, Data Modeling, Social Network Analysis
Acknowledgement: I would like to thank Professor Barry Wellman of the University of Toronto for directing me to this conference.
Support: This paper is supported by the Natural Sciences and Engineering Research Council of Canada.
Appears in: Proceedings of CASOS Conference, Pittsburgh, PA, June 2002. Posted with permission at http://www.yorku.ca/hmkim/files/Kim02-CASOS-wcopyright.pdf; © CASOS, 2002
The WWW is inherently a set of standards, such as HTTP (HyperText Transfer Protocol) and HTML (HyperText Markup Language), for transmitting and rendering hypertext, developed to be consistent with Internet standards. With just these standards and a web browser, performing meaningful tasks with hypertext documents is mainly a human endeavor. According to Tim Berners-Lee, the inventor of the Web, this is a limited use of the Web’s possibilities [Berners-Lee et al. 2001]: “computers will find the meaning of semantic data by following hyperlinks to definitions of key terms and rules for reasoning about them logically. The resulting infrastructure will spur the development of automated Web services such as highly functional agents.” Efficient computers, not error-prone humans, will then be able to perform rote tasks.

In this vision, meanings that computers can find and reason about are represented using ontologies. An ontology is a data model that “consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary plus a set of formal axioms that constrain interpretation and well-formed use of these terms” [Campbell and Shapiro 1995]. A sufficiently expressive ontology represents enough knowledge, as computer-encoded instructions and data, to enable automated execution of instructions. Because precise definitions and axioms exist, proper interpretation is possible even by a computer or a decision maker who did not develop those definitions and axioms.

There are numerous ontologies available on the Web: everything from “light-weight” ones [Uschold 1998] that represent taxonomies of terms with little or no meaning formally represented, such as those for Yahoo!™ [Labrou and Finin 1999], to those that represent meanings for a specific community of users, such as VerticalNet™ ontologies [Das et al.
2001], to an ambitious endeavor to represent common-sense meanings of the world [Lenat 1998].

For software agents, these ontologies serve as dictionaries with which the agents can determine the meanings and proper interpretations of terms that describe tasks that need to be performed. Imagine the agent’s dilemma, though: there are numerous dictionaries, each containing numerous definitions, and the term that the agent is asked to define may not be readily found, or its proper meaning may differ from that of a symbolically identical term defined elsewhere. This is analogous to how Yahoo!™’s primary web page seems overwhelming to an uninitiated user, or to a searcher’s frustration at typing ‘jaguar’ into Google™ to investigate felines and being referred to sites about cars and computers. At least human users are intelligent enough to choose appropriate branches on Yahoo!™’s classification tree to “drill down”, or to add other keywords to their Google™ search.

There are efforts from the AI (semantic inter-operability), database (schema integration), and information retrieval (IR) communities to enable computers to mimic such intelligence. In fact, Google™’s search algorithms are a result of IR work. These efforts can be informally characterized as either locally-aware but globally-unaware, or locally-unaware but globally-aware. The first characterization describes those, such as mapping techniques, that take advantage of relationships between terms within some cluster of repositories (ontologies, databases, or documents), but not between clusters; the second describes those, such as co-occurrence techniques, that take advantage of tendencies in how all terms are generally represented. Is it possible to develop an approach to the design and use of repositories, specifically ontologies for software agent use, that is a hybrid: locally-aware and somewhat-globally-aware?
In this paper, research towards this approach is initiated by exploring, and then relating to ontology development, a phenomenon that can be characterized as locally-aware and somewhat-globally-aware: how people find other people.
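The agent's dilemma described above, in which one symbol such as 'jaguar' is defined differently in several dictionaries, can be made concrete with a small sketch. It is not one of the mapping or co-occurrence techniques from the literature; it is only a toy illustration in Python, and the ontology names, their contents, and the function `disambiguate` are all assumptions made for this example.

```python
# Hypothetical toy "ontologies": each maps a term to a set of related terms.
# All names and contents here are illustrative assumptions.
ONTOLOGIES = {
    "zoology": {"jaguar": {"feline", "predator", "rainforest", "habitat"}},
    "automotive": {"jaguar": {"car", "sedan", "engine", "dealer"}},
    "computing": {"jaguar": {"console", "operating", "system", "software"}},
}


def disambiguate(term, context_words, ontologies=ONTOLOGIES):
    """Return (ontology_name, score) pairs for `term`, best match first.

    The score is the number of context words that also appear among the
    related terms the ontology lists for `term`.
    """
    context = {w.lower() for w in context_words}
    scores = []
    for name, entries in ontologies.items():
        related = entries.get(term.lower())
        if related is None:
            continue  # this ontology does not define the term at all
        scores.append((name, len(context & related)))
    # Sort by descending score; ties keep dictionary insertion order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = disambiguate("jaguar", ["rainforest", "predator", "diet"])
print(ranking[0][0])  # the ontology whose sense best fits the context
```

A human searcher adds keywords to narrow a query; here the context words play that role. The sketch assumes the agent already knows which ontologies to consult, which is precisely the part that the rest of this paper argues is hard.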
Relating to Social Network Analysis
The small-world effect has been offered as the explanation for the “six degrees of separation” result of Milgram’s experiment [Watts and Strogatz 1998]. Adamic [2001] has discovered that web pages exhibit this effect. If web pages, whose hyperlinks are used by humans, are organized in small worlds, then one can suppose that ontologies, whose hyperlinks are used by software agents, will be organized in small worlds, if they are not already so organized. Though there are efforts at standardizing upon the use of common terms for all ontologies, such as UpperCyc [Cycorp 2002], most researchers believe that these terms and definitions will not be globally used, but rather that numerous domain-, industry-, value-chain-, and company-specific ontologies will evolve that are incomplete and inconsistent with respect to each other. The emergence of competing and non-complementary XML (eXtensible Markup Language) tag sets is a testament to this; XML is the standardized syntax and format for representing structured data over the Web, and is the de facto language for transporting instances (facts) represented in ontologies.

In fact, it can be argued that prevalent use of XML can preclude widespread adoption of ontologies. Kim [2002] makes the following analogies to argue this point. HTML use is likened to the use of paper and pen for letters. Data can be
encoded on paper by an author, but its interpretation is left solely to the recipient, who brings to bear an understanding of natural language and knowledge about the author. XML use is likened to the use of business forms. The recipient brings to bear knowledge of the format and conventions of the business form in interpreting that form. As long as the author follows the format and conventions, s/he does not need to know the recipient, and vice versa. Accurate interpretation is possible even if the author and recipient do not share a common natural language. Business forms can be processed more efficiently because the recipient can, for instance, sort them by looking at the one field where the indexed data should always be, rather than perusing the whole form. Taking advantage of this, lower-paid data entry clerks could be hired to perform such tasks just as well as more expensive domain experts would. Similarly, data structured in XML, unlike unstructured HTML, can be efficiently processed by computers rather than humans. The proviso is that there must be a very consistent, if informal, understanding of the meanings and proper uses of terms between the authors and recipients of a given XML document.

Ontology use is likened to the use of business forms together with standard operating procedures (SOPs) and the requisite training to properly apply the SOPs. Where interpretation and processing of business forms is less rote and more uncertain, SOPs and training assist clerks’ decision-making. Ontologies are still meant for computer use, but meanings and terms are represented more formally for computer interpretation. Just as SOPs and training are more expensive than the use of business forms alone, so is ontology use more expensive than XML use; and just as an increase in the uncertainty of processing necessitates SOPs and training, so may an increase in the uncertainty of machine interpretation compel ontology use.
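The contrast drawn above between XML use (a business form) and ontology use (a form plus SOPs) can be sketched as follows. This is a minimal illustration under stated assumptions: the purchase-order document, its tag names, and the two "axioms" are hypothetical, and the Python predicates merely stand in for the formal axioms that a real ontology would express in a logic-based language.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order "form"; the tag names are assumptions.
ORDER_XML = """
<order>
  <item>widget</item>
  <quantity>12</quantity>
  <unitPrice>2.50</unitPrice>
  <total>30.00</total>
</order>
"""


def read_form(xml_text):
    """XML-style processing: pull values out of the fields where they belong."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Ontology-style processing adds explicit, machine-checkable constraints,
# playing the role that SOPs play for a clerk handling a business form.
AXIOMS = [
    ("quantity is a positive integer",
     lambda o: int(o["quantity"]) > 0),
    ("total equals quantity times unitPrice",
     lambda o: abs(float(o["total"])
                   - int(o["quantity"]) * float(o["unitPrice"])) < 0.005),
]


def violated_axioms(order):
    """Return the names of all axioms that the order fails to satisfy."""
    return [name for name, holds in AXIOMS if not holds(order)]


order = read_form(ORDER_XML)
print(order["quantity"])       # field lookup needs only the form's conventions
print(violated_axioms(order))  # empty when the order is consistent
```

With XML alone, the recipient must already know informally that `total` is meant to be the product of the other two fields; writing that knowledge down as a checkable constraint is the extra cost, and the extra benefit, that the analogy attributes to ontologies.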
Relating this back to social networks, those networks that exhibit small-world effects but inter-relate to mitigate uncertainty should be more carefully studied. This may yield clues to circumstances in which ontologies will be adopted—XML use is much more prevalent, so a clear advantage of ontology use must be demonstrated. Characterization of these circumstances can then be factored into design and use of ontologies that will be practically used.
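To make the notion of a small world concrete: Watts and Strogatz characterize such networks by high clustering combined with short average path lengths. The sketch below computes both quantities for a toy graph. The six-node link graph and the function names are assumptions made for illustration only; they do not describe any real collection of ontologies.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected link graph among six ontologies (names made up):
# two tightly linked clusters joined by a single "bridge" link C-D.
LINKS = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}


def clustering_coefficient(graph):
    """Mean, over nodes, of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in graph.values():
        pairs = list(combinations(sorted(nbrs), 2))
        if not pairs:
            continue  # nodes with fewer than two neighbours contribute 0
        linked = sum(1 for u, v in pairs if v in graph[u])
        total += linked / len(pairs)
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over ordered pairs of connected nodes."""
    total, count = 0, 0
    for source in graph:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        count += len(dist) - 1  # exclude the source itself
    return total / count


print(round(clustering_coefficient(LINKS), 3))
print(round(average_path_length(LINKS), 3))
```

In this toy graph most of a node's neighbours are also neighbours of each other, yet any node is reachable from any other in at most three hops. Whether actual ontology link structures behave this way is an open empirical question of the kind this paper raises.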
Concluding Remarks
This paper discusses the initiation of a program of study rather than the results of one. It does, however, relate two somewhat disparate fields of study, and endeavors to make a contribution to the advancement of a ubiquitous technology, the WWW. The research questions raised are the following. How can decentralized ontologies be designed for a semantic Web that enables a society of software agents to automatically perform tasks that are currently done manually? What research, methods, and tools from the field of social network analysis can be brought to bear on this design? Can the circumstances in which ontology use will prevail over XML use be characterized? And can social networks that exist to mitigate uncertainty be studied to arrive at this characterization? The semantic Web is a lofty vision, not likely to be realized in full for a long time. However, it may be partially realized, and it is believed that this research will contribute to that.
References
[Adamic 2001] Adamic, Lada A., 2001, "Network Dynamics: The World Wide Web", Ph.D. Thesis, Department of Applied Physics, Stanford University, Stanford, CA.
[Berners-Lee et al. 2001] Berners-Lee, Tim, Hendler, James, and Lassila, Ora, 2001, "The Semantic Web", Scientific American, May.
[Campbell and Shapiro 1995] Campbell, A. E., and Shapiro, S. C., 1995, "Ontological Mediation: An Overview", In: Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, AAAI Press, Menlo Park, CA.
[Cycorp 2002] Cycorp, 2002, "Welcome to the Cyc Public Ontology", Available: http://www.cyc.com/cyc-21/index.html, Accessed: May 30, 2002.
[Das et al. 2001] Das, Aseem, Wu, Wei, and McGuinness, Deborah L., 2001, "Industrial Strength Ontology Management", In: Proceedings of the International Semantic Web Working Symposium, Stanford, CA, July.
[Kim 2002] Kim, Henry M., 2002, "Predicting how the Semantic Web Will Evolve", Communications of the ACM, February.
[Labrou and Finin 1999] Labrou, Y. and Finin, T., 1999, "Yahoo! as an Ontology - Using Yahoo! Categories to Describe Documents", In: Proceedings of the 8th International Conference on Information and Knowledge Management, Kansas City, MO, November, pp. 180-7.
[Lenat 1998] Lenat, Doug, 1998, "From 2001 to 2001: Common Sense and the Mind of HAL", In:, Ed: Stork, David G., MIT Press: Boston, MA, pp. 193-210.
[Uschold 1998] Uschold, Mike, 1998, "Where Are the Killer Apps?", In: Proceedings of ECAI-98 Workshop on Applications of Ontologies and Problem-Solving Methods, Brighton, England, August.
[Watts and Strogatz 1998] Watts, D.J. and Strogatz, S.H., 1998, "Collective Dynamics of 'Small World' Networks", Nature, 393, pp. 440-2.