Semantic Email Addressing: The Semantic Web Killer App? - CiteSeerX

0 downloads 0 Views 144KB Size Report
Abstract. In this document, we explore how semantic technologies can be brought to email addressing. We introduce the notion of semantic email addressing ...
Semantic Email Addressing: The Semantic Web Killer App? Michael Kassoff, Charles Petrie, Lee-Ming Zen and Michael Genesereth Logic Group, Department of Computer Science Stanford University {mkassoff, petrie, leezen, genesereth}@stanford.edu

Abstract In this document, we explore how semantic technologies can be brought to email addressing. We introduce the notion of semantic email addressing (SEA). SEA allows emails to be sent to a semantically specified recipient of group of recipients, which may be dynamically changing over time. We give some applications of SEA and describe our prototype implementation.

Introduction Email addresses, like telephone numbers, are opaque identifiers. They are often hard for a person to remember, and worse still, people’s email addresses and phone numbers change from time to time. Email addresses are a means to an end. One’s goal is usually not to send an email to a particular address, but instead to a particular person. One wants to say hello to our friend Steve, or send a message to the VP of marketing at Microsoft, or to the head caterer for one’s wedding. Ideally, one could send a message to a person just by entering his name, his position, or some other descriptive attributes. If the email address of a person changes, then the email system should send to the new email address, automatically. If the person matching a description differs over time, the email system should always send to the person currently matching that description. More generally, one should be able to send emails to groups of people matching a particular set of attributes: all chairs of departments at Stanford, or all female customers living in Detroit, or all people in our organization who speak both English and French. Today, mailing lists are used to email predefined groups of people. However, as there are infinitely many ways to define a set of people (e.g.“all people in the marketing department whose name starts with the letter ‘M’ ”), one cannot in general rely on such predefined lists. Instead, one must have the ability to address our email arbitrary groups of people. We use the term semantic email addressing (SEA) to refer to emails that are addressed to a semantically defined set of entities. The recipients to a semantically addressed email are computed on the fly based on the semantic definition of the address. SEA has other benefits as well:

No discovery required. Email addresses and mailing lists can be difficult to discover, even for a human. Worse, since email addresses and mailing list names have no semantics, automatic discovery of email addresses and mailing lists by a computer is an extremely difficult task, if not impossible. With SEA, discovery is completely obviated. No maintenance required. Traditional mailing lists require manual labor to maintain. A mailing list administrator must go through the process of creating the list. The list must then be maintained by the administrator himself and the individuals who would like to subscribe to the list, unsubscribe to the list, and change their information with regards to the list. This can be particularly onerous when the user must deal with many lists, for example when his email address changes. Ideally, the user could update his personal information such as email address once, in a single place, and have email addressers automatically adapt. SEA makes this possible. The rest of this paper is organized as follows. We first describe some applications of SEA. Next, we explain how one can send and reply to messages using SEA, and discuss our prototype implementation called ISEA. We then explore a few issues raised by SEA, including security and privacy issues, errors, user adoption, and standardization. Finally, we describe related and future work.

Examples To illustrate semantic email addressing, we consider two examples. The first illustrates SEA in a controlled, corporate environment. The second illustrates the use of SEA using public information gathered from the Internet.

Corporate Example Corporations and other organizations often have databases of information about personnel, projects, customers, and so on. This information can be leveraged to send emails based on properties of the people in the department. As an example, consider a moderate size company with several departments. The company has a centralized database containing the following information about company personnel:

Charles Petrie Axel Polleres Figure 1: Fragment of Charles Petrie’s FOAF (Friend-of-a-Friend) profile

Name Email Address Department Group Position Project Date of Hire Given this information, one may define groups of people by querying the data, for example for “all senior managers in the accounting department.” If the company also has the following information about projects: Project Name Priority Leader Start Date End Date One may then define precise groups of people such as “all developers for current projects in the Database group.” Using semantic email addressing, one may then send an email to this group. Like a traditional mailing list, a recipient of a semantic email address may respond to the sender or to the group itself. Unlike a traditional mailing list, the recipient may respond to a particular subset of the group, or forward the email to some other semantically defined group of people. Using semantic email addressing, it would be easy for a computer to send an email to a specific person or group of persons. For example, the company room reservation system could straightforwardly be programmed to automatically send an email to the building manager each time conference room 101 was reserved, without knowing who the building manager is. As the employee serving as building manager changed over time, the email would always be sent to the correct person, without having to reprogram the room reservation system.

Internet Example Recently, a RDF ontology known as FOAF, or Friend-ofa-Friend, has gained in popularity. This ontology contains

predicates for expressing properties of a person such as their name, email address, group memberships, employer, gender, birthday, interests, projects and acquaintances. As of 2004, over 1.25 million FOAF documents were publicly available on the Internet, and that number has certainly grown since (Ding et al. 2005). By spidering the Semantic Web and collecting the information contained in these FOAF files, one can build up a large collection of data about people and their interests. This information can then be used to email people with a given interest, or who know people who know a particular person, and so on. To illustrate, a fragment of one of the authors’ FOAF file is shown in Figure 1. As you can see, Charles is a Stanford employee interested in BMW R series motorcycles, German, and semantic research, and knows Axel Polleres. Note that using this information and similar information culled from various other FOAF files, one could address an email to all Stanford employees interested in semantic research, or all people interested in R series motorcycles, and so on. Note how his FOAF file is a single place in which Charles can control his personal information - if Charles were to change his email address, then all semantically addressed email based on his FOAF profile would be automatically routed to his new address. If Charles were to leave Stanford, then he would no longer receive emails addressed to Stanford people interested in semantic research. All of this is under Charles’ control, without the need to change his settings with each mailing list individually. It all happens automatically. In a sense, SEA is the opposite of spam. While both SEA and spam may involve unsolicited emails, spam is sent to everyone, while SEA is targeted towards those who have publicly announced they are interested in it. SEA is a marketer’s dream come true. Note also that, unlike mailing lists, there is no discovery required for either the sender or the receiver. It just works. Of course, we don’t have to depend just upon FOAF descriptions. We can also integrate semantic information from various sources, such as RDF files or relational databases. For example, we might also pull bibliographic information from a site such as DBLP, and email everyone who has ever coauthored a paper with Charles and whose email address is

Figure 2: Sending email to members of the group lead by Michael Genesereth interested in logical spreadsheets publicly available.

Using Semantic Email Addressing Now that we have motivated the myriad benefits of SEA, we delve into some practical details. How should it work? We first talk about sending semantically addressed email, then we talk about how one might reply to a semantically addressed email. To allow the user to semantically specify a set the email addresses to which to send to, there are three somewhat separable pieces of functionality required. First, the user must be able to define the group of email addresses of interest. This might involve choosing from some predefined list of definitions; though to attain the full power of semantic email addressing, the user should be able to formulate new definitions. The second piece of functionality is that the definition must somehow be translated to a set of email addresses to which the email must be sent. The final piece of functionality is to facilitate replying in a simple manner. This final piece is optional but sometimes useful. An email client that allows for SEA requires only some small extensions over a traditional email client. In our prototype, the interface for composing a message looks just like a traditional interface, with To, Cc, Bcc, Subject and Body fields. The only difference is an extra button, which allows a semantic email address to be specified wherever a traditional email address can be placed. This functionality is similar to the address book functionality found in modern email clients. Upon pressing the button, a window pops up with an interface for defining a set of recipients. Since a definition must be formulated in a particular on-

tology, either the user either be able to specify a query in the ontology directly, say via a textual editor, or he must use some sort of tool to that facilitates query formulation in that ontology. As only power users can be expected to formulate queries directly, a graphical interface for query creation is useful here. To support the possibility of using various ontologies, the tool should be generic and generate query interfaces on the fly based on the ontology and some display metadata. The interface used by our prototype is shown in Figure 2. The interface is automatically generated based on the schema for a person and some metadata about how to display each field, for example as a text box or a drop down list. Here, an email is being sent to members of the group lead by Michael Genesereth interested in logical spreadsheets. Note that it allows the emailer to define a set of people as opposed to a set of email addresses. As each person is associated with at most one email address, this also unambiguously defines the set of email addresses to send the message to. Note how the interface of ISEA allows embedded queries of arbitrary depth to be formed. For example, one might query for all people at a site that is in a country where French is spoken. This cross-category search is particularly useful when large amounts of supporting information are available. In particular, consider that if the Semantic Web becomes a reality, then one will have the possibility of basing his query on arbitrary information on the Web. Semantic email addressing is a powerful tool, which could easily be used to email large numbers of people. Safeguards must be put in place to make sure that user errors do not result in a mass spamming. It is useful to give the user feedback on his query definition by displaying a list of people

Figure 3: Confirmation page shown after pressing send

to which the email will be sent, in the case that the number of people is sufficiently small that this can be done. Otherwise, it is useful to display the number of people to which the email will be sent, along with a sample of those people. This allows the user to catch errors in his query definition, and give him confidence that he is not accidentally sending a large number of people a message not meant for them. Figure 3 shows a simple confirmation page displayed in ISEA after the send button is pressed after choosing the set of people defined in Figure 2. As mentioned earlier, ISEA creates actual semantic email addresses that are essentially of the form query@domain. For example, in our prototype system, by sending an email to a properly encoded “People in the Logic Group interested in logical spreadsheets”@logic.stanford.edu, an email will be sent to the appropriate people (where the quoted part is replaced by an appropriate computer-interpretable string). This is useful for sending mail from email clients that do not support semantic email addressing, for adding semantic email addresses to one’s address book, etc. However, in an ideal world in which all email clients supported semantic email addressing, these semantic email addresses would be completely hidden from the user, as they are long, unintelligible, and ugly.1 Instead of viewing such ungainly identifiers, we expect users to always view semantic email addresses though some graphical representation like the one shown in Figure 2, or a human-readable textual description such as one written in a controlled subset of English. In our prototype, we also allow a short, human readable nickname to be associated with a semantic email address using the same mechanism that user names are associated with normal email addresses. In Figure 2, the nickname “PrediCalc Team” is given to the aforementioned semantic email address. Upon receiving a semantically addressed email, it is sometimes useful to give the receiver the ability to view both the semantic definition of who the email was sent to and the actual people to whom the email was sent. The former can be accomplished by viewing the semantic email address through an appropriate GUI or natural language display, in 1

In our prototype, the semantic email address of Figure 2 is represented as sea+.28.3F5.20AND.20.28PERSON.2EGROUP.20.3F 5.20.3F6.29.20.28GROUP.2ELEADER.20.3F6.20MICHAEL.2E GENESERETH.29.20.28GROUP.2EINSTANCE.20.3F6.29.20.28 PERSON.2EINTEREST.20.3F5.20PERSONALINTEREST.2E33 64062876.29.20.28PERSON.2EINSTANCE.20.3F5.29.29@logic. stanford.edu. We make no claims as to the optimality of such an encoding, but present it as an illustration of why such addresses should be hidden from users.

the case that the email client supports semantic email addressing, or by clicking a Web page link automatically generated by the system and placed in the footer of the message by the sending agent. The latter is a bit more tricky, for the reason that the set of email addresses defined by a particular query may change over time. In the case that the number of email addresses is small, the actual email addresses to which the email was sent might appear in the To or Cc fields of the email, but if the number of addresses is large, this becomes impractical. In this case, the sending agent must somehow keep track of either the set of addresses sent to by each message or enough of the state of the database at each point in time to reconstruct this information. Doing so in a spaceefficient manner is an open problem. Replying to a semantically addressed email raises similar issues. In the case that the addresses of the recipients are sent in the To or CC fields of the message, one can reply in the usual manner to the recipients of the original message, or they can use semantic email addressing to reply. In the case that the recipient list is not included in the message, semantic email addressing must be used to reply. However, note that the set of people satisfying a condition may change over time, so replying can result in new people receiving the reply and old people not receiving the reply, which may or may not be the desired behavior. Of course, this is nothing new - the subscriber list to a traditional mailing list changes over time as well.

Prototype We have built a prototype SEA module on top of Infomaster (Genesereth et al. 1997) called ISEA (“Infomaster Semantic Email Addresser”). Infomaster is an information integration engine, and can be used to query multiple sources of data on the Internet through a single mediated schema. This is useful for SEA because it allows information to be pulled in from many sources - not just about people, but also useful supporting information about organizations and locations and so forth. The prototype is currently being used in a system used by members of the Digital Enterprise Research Institute in four locations - Stanford, Galway, Innsbruck and Korea. Since the institute is large and distributed, and members are frequently coming and going, it is hard for a member of the institute to keep track of who is where and doing what. Thus, it is a natural application for SEA. The prototype allows people to be emailed based on their site, group affiliations, name, interests, and other attributes (see Figure 2). This information is obtained from private databases and publicly available FOAF files. The ontology used by SEA is set by an administrator and is thus centrally controlled, though it is continually evolving. The basic architecture of our prototype is as follows. There are four principal components: (1) A Web-based user interface that is used to construct semantic email addresses; (2) A database (Infomaster) that is queried to determine the traditional email addresses of people matching a query; (3) A an email system (Postfix), that is used both to send

emails to (traditional) email addresses and to receive semantically addressed emails; and (4) Application code (ISEA) to glue the previous three pieces together. A typical interaction with the system interacts with the components in the following order: (1) The user constructs a semantic email address using our SEA-address creating interface. Optionally, the user can use the interface to query the database to determine the list of recipients who are addressed by the semantic email address he or she has constructed. (2) The user sends an email to this address using the email client of his or her choice. (3) The email is received by the ISEA Postfix application. This triggers an action that calls the ISEA application code, which translates the semantic email address into a database query and queries the database. The result of this database query is a list of (traditional) email addresses. (4) ISEA uses Postfix to forward the email to this list of email addresses, placing the addresses in the blind carbon copy header of the email, and the semantic email address in the to header of the email. 2 (5) The emails are received by the recipients using the email clients of their choice. In the case of a user replying to an email with a semantic email address, step (1) is skipped and the process starts at step (2).

Security and Privacy Intra-Enterprise An interesting possibility is for people outside an enterprise to use SEA to send email to members of the enterprise based on their semantic description, without knowing who they are. For example, I might want to send a message to the lead developer of a particular product at Yahoo!, but I do not know his or her name or email address, and Yahoo! might not want me to know his or her name or email address. However, they still might want to allow me to contact him or her. By exposing a semantic email addressing form to me, they can allow this in a simple way. This leads to considerations of the use of SEA on the open Internet.

SEA on the Open Internet SEA is ideal for targeted email addressing, essentially the opposite of spam. However, SEA is also ripe for abuse it can easily be used to send exceedingly untargeted emails 2 Note that this means that ISEA will receive another copy of the email, which can lead to an infinite loop if we are not careful. This is avoided by adding an email header called X-ISEA which we use to flag whether this message has been seen before or not (i.e. messages with this header have been seen before and should not be processed).

if one desires. Within small to medium sized communities, this sort of behavior is often uncommon since social repercussions occur when a community member violates community rules. Within larger communities, security policies may need to be enacted to limit who is allowed to send to whom. On the public Internet, however, SEA abuse is much harder to control. Of course, this is true for any publicly available email address, whether or not SEA is used, as is true now for [email protected]. On the open Internet, the more interesting issue is that of data ownership and authentication. Preventing any user from changing their profile so that they spoof being part of a restricted group on the open Internet will require new solutions. One can imagine many SEA systems, only some of which implement profile authentication. Users requiring security will want to use a SEA system that authenticates profile properties against at least local permissions and similarly authenticates incoming messages. And on the open Internet, SEA with subscription to public invisible properties becomes indistinguishable from semantic RSS, except for the authentication issue.

Dealing With Errors Keeping Information Up to Date One of the most powerful aspects of a semantic email address is that the set of recipients that it addresses changes over time, as the properties of the people in the database change, and as new people and added to the database and old ones removed. Ideally, the database is always kept up to date, to reflect the changing realities of the world. In practice, information in the database is not always 100% accurate. Inaccurate information can lead to people receiving an email who should not, and people not receiving an email when they should. It is important that there is a way for users of the system to ensure that their information is correct, either by editing it themselves or by having an authorized administrator change the information for them. In ISEA, users have password-protected access to their profiles via a Web interface. All information about a person is therefore controllable by the individual himself. Users are expected to keep their own profiles up to date. As one might expect in a large organization, not all information is always accurate. However, the authors note that in the case of frequently used semantic emailing addresses (for example, ”all people interested in SEA”), the relevant information is more often kept up to date than in the case of information that is has not yet been used for semantic email addressing.

Bounced Emails Bounced and delayed emails are handled by forwarding the error messages back to the sender. This works fine, unless the semantic email address is meant anonymize the recipient or recipients. In this case, the error message should not be forwarded back to the sender. Instead a message should be sent back that masks the identity of the recipients, e.g. ”Your email message has not been sent to the intended recipient. We apologize for the inconvenience.” We note that

these problems and solutions are no different than those for traditional mailing lists.

User Adoption In our prototype implementation with DERI Galway and DERI Innsbruck, as well as even DERI Stanford, we currently have these three sites as one selection property. The total number of people varies, but numbers less than 500 currently. We will soon add DERI Korea. Within each site, there are research groups. We currently have a total of fifteen (14) different groups. Each group has a leader and a set of members. Each person can currently change anything in the set of profiles, much as with a wiki. New groups and people can be added by anyone. Each person has a profile of personal data including interests. Anyone can add a new interest. This openness is by design after consideration by a working design council. The openness will eventually be modified in accordance with the security and privacy considerations above. Even among technical people, adoption has to be carefully managed. A very common remark is that people trust their existing email lists and don’t trust SEA virtual lists. When asked about the specific form their trust takes, people say that they know who is on the existing email lists. In at least the case of DERI Innsbruck, this turns out to be incorrect: only the administrator can see who is on the list. In contrast, anyone can check to see to whom a SEA message will be sent. Similarly, people have remarked that they will not be able to use SEA messages in the same way that they use email from existing email lists. Again, when asked for specifics, they say they store messages from specific email lists in specific folders and then use these emails to send to the email lists later. Of course, this is exactly the same with SEA virtual email lists, and this methodology is facilitated with the ISEA nickname. In general, SEA provides all of the same information and functionality as existing email lists, and more. Whatever worked with static email lists works with SEA virtual email lists, with less effort. Thus it appears that there are initial psychological, rather than technical barriers to SEA adoption, and we see this conservatism being quickly disappearing.

Standardization SEA does not require that all implementations use the same addressing scheme; it can vary from organization to organization or even from server to server within an organization. However, we would expect one or more standards to emerge over time. This would provide the benefit that email clients could provide better built-in support for SEA, including SEA address editing, analysis, and search.

Related Work Several other authors have seen the value in bringing semantics to emails, though all in a somewhat different way. For example, the Information Lens system (Malone et al. 1987) allowed users to send semistructured email messages and to

filter those messages using production rules. An interesting feature of this system was that it allowed users to send to a special mailbox called “Anyone” from which anyone could subsequently choose to receive messages based on production rules. This flips on its head the nature of widely broadcast emails - instead of starting with receiving all emails and whittling them down based on filtering rules, the user instead starts with an empty inbox and pulls in emails of interest based on rules. This is quite similar to the subscription model of RSS, the syndication format popular today. As RSS feeds contain more and more semantic information, the semantic subscription model exemplified by Information Lens may become more commonplace. More recently, MailSMORE (Kalyanpur et al. ) allows the user to annotate the content of an email with RDF triples, and automatically also includes RDF triples generated based on the standard headers of an email such as the To, From, Subject and Body fields. This can be used for semantic filtering and filing of emails. The M ANGROVE system (Mcdowell et al. 2004a; McDowell et al. 2004b; Etzioni et al. 2003) takes this idea further but allowing not just structured email content, but also semantic email processes which allow email clients to be scripted with declarative workflow that can be used to automatically aggregate information obtained from many email responses, automatically resend emails to people who have not responded, or to analyze the semantic content of incoming email messages and respond accordingly. Most relevantly, Microsoft Exchange 2003 allows administrators to create query-based distribution groups, which are essentially mailing lists whose recipients are based on an LDAP query run at the time of the email sending. This alleviates much of the administrative work required to maintain a mailing list. However, because the mailing lists may be created by an administrator only, they do not allow the full power of SEA. In fact, users are completely shielded from the fact that a distribution group is query based: each querybased distribution group is given a name, and thus looks just like a regular mailing list to an outsider.

Future Work In addition to the work suggested by the security and privacy discussion above, there is a wide range of improvements planned for our prototype, ranging from the near-term simple to long-range difficult. We have a parallel effort to develop SEA-specifc email clients to solve the problem of the ugly semantic email address. But our major development work is in the area of user interface to make SEA look more like existing popular email interfaces. Allowing external groups to use SEA, is much more problematic as it means that the issues of distributed permissions and authentication must be addressed. There is also the general semantic problem of linking to arbitrary external semantic sources on unifying the results. Much of this second problem has been in principle solved with Infomaster(Genesereth et al. 1997). However, there remains the ontological engineering problem of constructing proper hierarchies of ontologies so that Charles can use such them to declare that he filters out BMW K-series Motorcycles from all of those

about BMW Motorcycles. (Proper SEA design assumes that superclasses of expressed interest are not of interest to the person profiled, but subclasses are.) An interesting possibility to help control spam is semantic filtering and filing of emails. One could write semantic email rules based on not just the standard email fields, but also the semantic email address. For example, one might classify all email as spam that comes from someone not in a whitelist and without a FOAF that references someone within two degrees of separation and no common interest. This semantic filtering and filing could be extremely effective, especially if combined with semantic email content. How to generate user interfaces that allow these sort of filters to be specified and how to efficiently filter large numbers of emails against a semantic email address are interesting questions.

Conclusion In this paper, we have introduced the notion of semantic email addressing, which provides all of the functionality of static email mailing lists. In addition, we have demonstrated with our prototype implementation some additional functionality possible. The first benefit is simply that users maintain their own profiles, thereby controlling their email streams. It is no longer necessary to subscribe, unsubscribe, and change email addresses in static lists. Nor do list administrators have to establish, maintain, and filter email mailing lists. The targeted nature of a semantically addressed email is powerful and could be used to combat unintentional spam. Furthermore, since it can facilitate the contacting of individuals based on their characteristics, it can be used to preserve the privacy of email addresses and even individual identities. SEA is a simple concept, and one which we hope will soon be incorporated into commercial email systems, but which we are already deploying in set of academic institutes distributed over international sites and consisting of various research groups and people with varying interests. SEA allows each person to be precisely targeted according to their site, group, and interests in ways not feasible with static email mailing lists. Because it not only frees administrators from maintaining static lists but also allows people to have more control over their email based upon local control of their semantic data, SEA could be the “killer app” of the Semantic Web.

References L. Ding, L. Zhou, T. Finin, and A. Joshi. How the semantic web is being used: An analysis of foaf documents. HICSS, 2005. O. Etzioni, A. Halevy, H. Levy, and L. McDowell. Semantic email: Adding lightweight data manipulation capabilities to the email habitat, 2003. M. Genesereth, A. Keller, and O. Duschka. Infomaster: An information integration system. In J. Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13-15, 1997, Tucson, Arizona, USA, pages 539–542. ACM Press, 1997.

A. Kalyanpur, B. Parsia, J. Hendler, and J. Golbeck. Smore - semantic markup, ontology, and rdf editor. http://www.mindswap.org/papers/SMORE.pdf. T. Malone, K. Grant, F. Turbak, S. Brobst, and M. Cohen. Intelligent information-sharing systems. Commun. ACM, 30(5):390–402, 1987. L. Mcdowell, O. Etzioni, and A. Halevy. Semantic email: Theory and applications. Web Semantics: Science, Services and Agents on the World Wide Web, 2(2):153–183, December 2004. L. McDowell, O. Etzioni, A. Halevy, and H. Levy. Semantic email. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 244–254, New York, NY, USA, 2004. ACM Press.