78
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 44, NO. 1, FEBRUARY 2014
Enhancing Memory Recall via an Intelligent Social Contact Management System Bin Guo, Member, IEEE, Daqing Zhang, Member, IEEE, Dingqi Yang, Zhiwen Yu, Senior Member, IEEE, and Xingshe Zhou, Member, IEEE
Abstract—Human memory often fails. People are frequently beset with questions like “Who is that person? I think I met him in Tokyo last year.” Existing memory aid tools cannot well support the recall of names effectively. This paper explores the memory recall enhancement issue from the perspective of memory cue extraction and associative search, and proposes a generic methodology to extract memory cues from heterogeneous, multimodal, physical/virtual data sources. Specifically, we use the contact name recall in the academic community as the target application to showcase our proposed methodology. We further develop an intelligent social contact manager that supports 1) autocollection of rich contact data from a combination of pervasive sensors and Web data sources, and 2) associative search of contacts when human memory fails. The system is validated by testing the performance of contact data collection techniques. An empirical user study on contact memory recall is also conducted, through which several findings about contact memorizing and recall are presented. Classic cognitive psychology theories are used to interpret these findings. Index Terms—Contact recall, memory aid, pervasive computing, social contact management, Web intelligence.
I. INTRODUCTION EMORY lapse is a common problem among people. Many different forms of memory lapses occur in our daily life. For example, we may forget the name of a contact or the title of a document. These memory failures often bring many inconveniences to our work and life. Therefore, finding effective ways to aid human memory becomes a fundamental and challenging problem. Enhancement to memory recall has been investigated in diverse areas for a long time. Before the era of computing, it was done manually. Representative examples are note taking and address-book writing. However, these methods suffer from problems, including possible loss/damage and inefficient search
M
Manuscript received February 1, 2013; revised June 15, 2013, August 6, 2013, October 12, 2013, and November 5, 2013; accepted December 3, 2013. Date of publication December 23, 2013; date of current version January 16, 2014. This work was partially supported by the National Basic Research Program of China (No. 2012CB316400), the EU FP7 SOCIETIES project, the National Natural Science Foundation of China (No. 61373119, 61332005, 61222209, 61103063), and the Basic Research Foundation of NPU (No. JC20110267). This paper was recommended by Associate Editor J. A. Adams. B. Guo, Z. Yu, and X. Zhou are with the Department of Computer Science Northwestern Polytechnical University, Xi’an 710072, China (e-mail:
[email protected];
[email protected];
[email protected]). D. Zhang and D. Yang are with the HANDICOM Lab, Telecommunication Network and Services Department, Institut Mines-TELECOM/TELECOMSudParis, Evry 91000, France (e-mail:
[email protected];
[email protected]). Digital Object Identifier 10.1109/THMS.2013.2294332
support. With the development of information technology, the use of digital tools (e.g., digital contact books, e-calendars) has gained much attention nowadays. However, while enabling reliable storage and efficient query support, these digital tools still face several issues. Among these issues, the reliability of data retrieval is significant. To conduct successful searches on data retrieval tools, users must remember sufficient details about the item they want to retrieve in order to construct a query (e.g., terms in a document, the name of a contact). However, psychological research indicates that people are not good at remembering precise details. Instead, they tend to remember diverse memory cues relevant to the item [1]–[3]. For example, in conducting a document search, people may rely on some contextual memories, such as its size, time when it is created, associated links, and so on. Using memory cues to address the memory lapse problem has been proved to be effective. Several types of cues are used as general ones to address distinct memory problems, such as user experience with the object, its creator, linked items, etc. However, the role that memory cues play in memory recall is rather complex and domain related, which has been of little concern in existing studies. First, in terms of the objective and user groups, many different forms of cues can be used to enhance memory recall. Just taking the contact name slipping as an example, businessmen may remember properties such as position, commercial needs, etc. On the other hand, educational background and research interests become more important information for academic researchers. This is in line with the survey result reported by Elsweiler et al. [4], which indicates a wide range of forgetting behaviors and consequently a need for different types of tools to prevent memory failure. Therefore, depending on the context of memory aid, it may be easier for the user to utilize some types of memories over others. Second, to enhance memory recall using specific memory cues, we should collect such cues in association with the items in a data archive for future retrieval. One principle to be followed here is that we should lighten the burden of users on memory cue collection (i.e., manual effort should be minimized) [5]. However, autocollection of memory cues is nontrivial. It largely depends on the behaviors of the target user group and the availability of data sources. In other words, for different user groups, the information that can be collected is different. For example, for those who are active in social network sites (e.g., Twitter, Facebook), we can extract more information (e.g., friend lists, favorites) about them from the Web. In what follows, we choose the contact name memory problem, and use the academic community to illustrate the above issues.
2168-2291 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
GUO et al.: ENHANCING MEMORY RECALL VIA AN INTELLIGENT SOCIAL CONTACT MANAGEMENT SYSTEM
In today’s modern life, people participate in various social activities and meet numerous people every day. These acquaintances form the social contact network (SCN) of a person. Thus, the ability to manage the SCN and use it to get things done has become an important yet difficult task. For example, Bob is an active researcher. He participates in various academic activities and meets numerous new contacts. To maintain his SCN efficiently, Bob uses a digital contact book for storing the information of his contacts. However, he encounters two issues. First, contact information retrieval can be difficult. Finding the needed information is not difficult using a traditional contact tool if we know the name of the target contact. Just like most people, however, Bob often forgets the names of his contacts. For example, while at the airport, Bob sees someone he has met before. He wants to talk to that person but cannot recall the information, not even the person’s name. The following is what he could only recall: “I met him at a conference in Tokyo in 2009. He told me that he graduated from MIT, with Web mining as his research interest.” As an academic community member, Bob pays particular attention to some academic information of a contact, such as the contact’s educational background and research interest. The contextual information relevant to the meeting event with the contact is also easy to remember. However, traditional contact tools do not work with such memory cues. Second, the cost of manually collecting the aforementioned memory cues is quite high. Thus, it is better to develop a tool that can automatically collect the needed contact information. This paper aims to enhance memory recall using memory cues. We explore the feature-specific nature of memory recall and propose a generic methodology to extract memory cues from heterogeneous, multimodal, physical/virtual data sources. Specifically, we intend to do the following. 1) Convert the memory recall enhancement problem into a feature-specific memory cue collection and management issue. Compared with existing studies, we emphasize the effects of different features to memory recall, such as user groups, the objective, availability of memory cues, and so on. These features are important because they can steer the better design of memory tools for different domains. As a showcase study, the current paper uses contact name slipping (a typical memory lapse problem) as the target problem and chooses people in the academic community as the target user group to illustrate the feature-specific design philosophy on cue-based memory recall. 2) Propose a general-purpose methodology of enhancing memory recall by extracting memory cues from heterogeneous and physical/virtual data sources. The methodology includes the feature-specific memory recall (FMR) view, as a general guideline for designing cue-based memory tools. In view of the distributed, complementing nature of personal information in both physical and Web-based data sources, the mixed-reality sensing strategy (MRSS), as a practical technical solution, is proposed to guide how to collect memory cues using a combination of Web intelligence and pervasive sensing techniques. 3) Design and implement the social contact manager (SCM) to enhance contact management and name recall. As a
79
showcase study, the SCM allows people to retrieve information of their social contacts more efficiently using associative cues. Based on a dedicated user study (with 55 subjects) about contact memory, we identify what cues play a crucial role in contact name recall in the academic community. Based on the observations from the user study and the MRSS strategy, we propose the combined techniques to automatically collect (extracting personal profile from homepages, using mobile phone sensing to obtain contextual cues) and associatively search the contact data for real-time recall purpose. In addition, we made an empirical user trail with 16 subjects on contact memorizing. Some interesting findings are discovered and explained using cognitive science theories. The rest of this paper is organized as follows. The next section reviews studies on memory aid. In Section III, we propose our methodology on memory cue selection, collection, and management. A user study on contact name recall is also presented. The next two sections describe the design and development of the SCM and how it supports people to search by association of their contacts. Section VI reports the performance of the SCM and the findings obtained from a user study on contact memorizing. Section VII concludes our paper. II. RELATED WORK The enhancement of memory recall using memory cues and the development of a social contact management system bring together the following close related research directions. A. Digital Memory Aids Based on the targeted memory problems, digital memory aid systems generally fall into two categories: retrospective aid, which deals with the recall of past events, and prospective aid, which deals with remembering future tasks [4]. Our study mainly focuses on retrospective aid. There are two main application domains of retrospective memory aid: digital object refinding and lifelogging. Digital object refinding focuses on the storing of and searching for collections of heterogeneous digital objects that users generate or encounter during their work process, such as documents, emails, etc. [1], [2], [6]. The tools commonly make use of metadata on the various aspects of an object’s past context, such as when, where, and its relationship with other objects, making refinding information easier. The lifelogging vision extends beyond the mere storage of digital objects used in office environments and moves into the real world to capture our daily activities. Different systems have been developed for digital lifelogging, and they all use portable or wearable devices for data capture. Forget-me-not [7] uses a personal digital assistant to collect its user’s activities in the form of texts (e.g., user location and encounters) throughout the day. SenseCam [8] records the user’s life using a wearable camera that can take photographs periodically. These systems all provide a timeline-based approach for data browsing. However, as the continuous recorded lifelogging data is huge, and people
80
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 44, NO. 1, FEBRUARY 2014
often forget the concrete context in terms of time, a timelinebased method is not enough for efficient data retrieval. There are two ways to improve the lifelogging data retrieval method. One way is to associate memory contents with additional contexts/cues. For instance, iRemember [9], an audiobased memory aid system, associates the recorded user conversations with location, weather, and news browsed to facilitate data retrieval. The other method is to restrict the scenes to be captured. Considering that the complete capture of people’s life can bring much noisy data that can never be retrieved, this method focuses on the capture of selected scenes important to the users. InSense [10] uses pattern recognition techniques to analyze the data from user-worn sensors and automatically collects multimedia clips when detecting the user in an “interesting” situation. Our study can be viewed as a topic-specific (scene-restricted) lifelogging system. Instead of attempting to support the total capture of people’s life, the SCM focuses on capturing selected scenes important to users, i.e., the social contact meeting records. We clearly define what data should be gathered for this previously unexplored problem and find a solution to address it. B. Human-Centric Computing Human-centric computing provides context-aware services according to the extracted information about users from cyberphysical spaces. There are two major sources of human information extraction: Web and pervasive sensors. These two data sources have different attributes and strengths. First, the Web is a major source for extracting static or slowly changing human information, such as user profile [11] and social relationship (from social websites). Second, pervasive sensing enables the detection of human activities, social interactions [12], and so on in the real world. Owing to the diverse features, aggregation of data from the two distinct sources provides unique opportunities to pervasive applications [13]. To date, however, research on the two data sources follows two separate research lines. Moreover, little work has been conducted to explore the combined power of these two data sources. In this paper, we propose and practice the MRSS. The MRSS seeks to provide a practical method to auto-collect human digital traces from heterogeneous sources. As shown in the SCM, by gathering contextual and basic contact information from the real world using portable sensors, as well as by extracting biographical information from the Web, the SCM intends to make a first attempt to illustrate the aggregated effects of pervasive sensing and Web intelligence by solving a compelling human problem: social contact management. C. Social Contact Tools Personal contacts are critical to getting work done. Many systems have been developed to facilitate communication with contacts and recommend contacts in social activities. Contact communication: Modern work increasingly relies on electronic communication environments, typically email (e.g., Gmail Contact, Outlook), instant messaging (e.g., MSN), and social networking sites. ContactMap [14] presents an alternative
representation of email messages by highlighting those from the identified important contacts. A common problem of these systems is that they are designed mainly to facilitate interaction among registered users in the virtual world. The SCM, however, proposes a general and natural way of enabling people to build digital connections with their contacts when they exchange business cards in the real world. Contact recommendation: Numerous systems have been developed to recommend contacts to a user based on the user’s background, social relationship, and context. For instance, social network sites like Facebook can recommend friends to a user based on his/her educational background. In WhozThat [15], users within a public place (e.g., in a bar) can exchange their social network IDs through their mobile phones and find someone of interest. Instead of delivering social-matching services, our system wants to support the management and recollection of already acquired contact information. III. METHODOLOGY AND USER GROUP STUDY This section describes our methodology on cue-based memory recall. We choose contact name slipping to do a showcase study of the method, and present a user study for measuring the importance of memory cues in contact recall. A. Methodology As presented in the introduction, we convert the memory recall enhancement problem into a feature-specific memory cue collection and management issue, and propose a generic method to address this issue. Our way to enhance memory recall consists of the following two main aspects. 1) Feature-Specific Memory Recall View: Memory cues have proven important in memory recall. Instead of seeking to capture everything about human memory, we explore the problem- and group-specific features of human memory. For different user groups and specific memory lapse problems, we think that there are different memory cues that take effective roles. Our view thus advocates the study of the role memory cues play in terms of the feature-specific nature of human memory. We also emphasize the availability of memory cues from a technical perspective. It is easy to understand that not all useful memory cues (e.g., human intention) can be captured using existing techniques. When designing cue-based memory tools, we: 1) prefer to leverage the cues that can be extracted using state-of-the-art techniques; and 2) from a practical purpose, prefer not to use proprietary capturing devices that are not widely used or accepted by users. 2) Mixed-Reality Sensing Strategy: Having explained our view on memory cue selection, the next issue becomes how to gather varied memory cues in human daily life. The strategy we propose here is founded on our “digital-footprint” observation [13]. We propose and practice the MRSS strategy, which aims to explore the aggregated effects of virtual and physical data sources to augment memory recall, leveraging a combination of Web intelligence and pervasive sensing techniques. To demonstrate the FMR view and the MRSS strategy, the following sections of this paper will present a case study on the
GUO et al.: ENHANCING MEMORY RECALL VIA AN INTELLIGENT SOCIAL CONTACT MANAGEMENT SYSTEM
81
problem of contact name recall (as depicted in the introduction). We selected the academic community as the target user group, because compared with other user groups, academic users have more available personal information sources in the current stage. We believe, however, that with the prevalence of advanced techniques, the information of other communities will increasingly be digitalized, and thus the technologies developed can be applied to them as well in the near future. B. User Group Study As described in the FMR view, when designing the cue-based memory recall tool for a specific problem, we should examine what memory cues are considered crucial for it. To this end, we start from an online survey when designing the SCM. The survey aimed at 1) understanding deeply how people manage and recall their social contacts, and 2) providing guidelines for the SCM system design. The survey was conducted on the Google Docs website from March 5 to April 15, 2011. The questionnaire consisted of the following parts. 1) Participant Profile: This part include the participants’ age, profession, gender, and the way they manage their contacts and the frequency of contact name slipping (choosing from the scales of “rarely,” “sometimes,” and “often”). 2) Weighting the Contact Information on the Purpose of Contact Recall: To obtain insight into what can help people recall their contacts, the participants were asked to imagine a past event where they became acquainted with a new person, but for some reason they could not remember the contact’s name when they met again. Participants were then asked to assess which kind(s) of information about the contact they could remember and use to recall the contact in such cases. According to the use case presented in the introduction and previous studies on contact management [14], [15], the following information is provided to be evaluated. They fall into two categories of memory. Semantic memory: This refers to generalized knowledge that does not involve memory of a specific event [16]. In the SCM, it means the properties that a contact has. Six kinds of contact information commonly used in the academic community are provided: educational experience, affiliation, research interest, address, social relationship, and hobby. Autobiographical memory: It refers to previous experience with the object [17]. In the SCM, we emphasize the contexts related to physical encounters with a contact. Four types of contexts are provided, namely, contact meeting time, location, weather (e.g., you met someone on a day with “heavy rain”), and topics discussed (e.g., “transfer learning”). We also assume that the facial feature (e.g., handsome, with a beard) also plays a significant role in contact recall. C. Online Survey Results A total of 55 participants were recruited by e-mail from two universities: Institute Telecom in France and Xidian University in China. The analysis of their answers is presented below. 1) Participant Profile: Among the 55 participants, 21 (38%) were female. The age of the participants ranged from 18 to
Fig. 1.
Survey results for contact recall (average score).
62, and more than 75% of them were below 30. The participants covered a wide range of academic professions, including professors, official staff, researchers, post-docs, and graduate/ undergraduate students. According to the survey, the participants use both traditional means and digital tools for contact management. The most commonly used tools include mobile phones, paper notebooks, Outlook, and collection of name cards. As regards contact nameslipping frequency, around 47 (85%) of the subjects felt that they more or less suffer from this issue. 2) Weighting the Contact Info on the Purpose of Contact Recall: To analyze the importance of the different kinds of contact information in contact recall, we quantified the three options used in the questionnaire. We gave the option “rarely used” a value of 0 point, “maybe used” a value of 1 point, and “surely used” a value of 2 points. Fig. 1 presents the results. As shown in Fig. 1, as basic information for a contact, affiliation and address are considered important by the subjects. As members of the academic society, they are also interested in the group-specific information, such as educational background, research interest, and social relationship with others. In addition, various contextual information on the physical interaction with a contact, including last meeting time, location, and topic, are considered important for contact recall. Human facial feature is considered an important symbol to characterize a contact as well. These survey results are important for designing SCM, as presented in the next section. IV. SOCIAL CONTACT MANAGER SYSTEM: AN OVERVIEW The SCM is the first tool that fully supports automatic contact data collection and associative data retrieval. In what follows, we present the design considerations of our system based on the requirements identified from the online survey. This methodology we presented in Section III-A can be simply represented as two subissues: which data to gather and how to gather. We identify four sets of data to be gathered based on the online survey results, as illustrated in Fig. 2. 1) Basic information: This information includes the person’s name, affiliation, email, city, and so on. 2) Selected bioinformation: For the academic community, biographical data, such as educational background, research interest, and social relationship, are important information to users. We chose eight types of biodata.
82
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 44, NO. 1, FEBRUARY 2014
Fig. 3.
Homepage finding process.
V. DETAILED DESIGN OF SOCIAL CONTACT MANAGER Fig. 2.
Four sets of contact data to be gathered.
3) Contextual information: Various interaction contexts are considered important cues for contact recall. Four types of contexts are defined, which refer to when, where, in what weather the event occurs, and the relevant topic discussed. 4) Impression: To a contact, user impression, such as his/her facial features and hobbies (e.g., a diving enthusiast), is also an important cue for contact recall. After determining which data to gather, the next step is to gather them. To minimize user effort on contact data recording, we leverage the MRSS strategy. We use Web sensors to extract user profile data from the virtual world, and mobile sensors to record user-encountering data in the real world. The combined power of distinct sources can capture a more comprehensive picture about social contacting, manifested as follows. 1) Pervasive sensors can detect basic and contextual info. The business card is a rich source of basic information. We employ off-the-shelf card scanners to capture it. Contextual cues can be derived from sensor-equipped mobile phones. We present the details in Section V-C. 2) Minimizing user effort in bioinformation gathering is a nontrivial problem. With the prevalence of Web techniques, more and more physical objects are digitalized and shared in the Internet. User profile is one such object, which is presented in various social networking websites (e.g., Facebook) and personal homepages. For the academic community, personal homepages are more popular and can provide open-access bioinfo about a researcher. We propose a business card-triggered bioinfo gathering method. Basic information from a business card is used to find the homepage of the contact. Afterward, a conditional random field (CRF)-based method is used to extract the required information from the homepage. We present the details in Sections V-A and -B. 3) As regards impression information, users annotate it in our system because of its highly subjective nature.
Automatic contact data gathering is the fundamental function supported by the SCM. In the following sections, we describe how bioinformation is extracted. We then present the techniques used for the other three datasets. The contact search interface that supports associative search of contacts using memory cues is also described. A. Business Card-Enhanced Homepage Finder We recognize personal homepages as a public repository to extract a researcher’s bioinformation. However, the problem is how to locate the homepage of a person. Search engines such as Google can provide a set of candidates relevant to a person if we use the person’s name as the keyword. However, determining the right homepage using this method is still difficult because of 1) the namesake problem (e.g., many people are called “Bin Li” in China) and because 2) the result set always includes several types of Web pages (e.g., news, digital libraries of papers), which are noisy. In terms of our application field, we propose a business card-enhanced homepage finding method, which is implemented in two steps. 1) Collection of a High-Relevance Set by an Enhanced Query: Namesake is a well-known problem in the Web people search field. Previous research has mainly focused on clustering the result set into k groups when a person’s name is provided [3], [11]. As a Web people search system, the clustering result can be returned to the information seeker for the final decision. Our system faces a different problem. To reduce user intervention, the user decision process is not included. Thus, is there a way to deal with this new situation? Let us first determine what people do when faced with a namesake problem. If a person finds that many people bear the same name on the Web, the person may use additional keywords that can characterize the target user (e.g., his/her affiliation) to filter irrelevant results. Our solution is inspired by this method, which attempts to employ a contact’s basic information (from his/her business card) to deal with the namesake problem. As shown in Fig. 3, aside from person name, four types of contact data obtained from a contact’s business card, namely, email, country, city, and affiliation, are used for the enhanced people search. For each keyword
GUO et al.: ENHANCING MEMORY RECALL VIA AN INTELLIGENT SOCIAL CONTACT MANAGEMENT SYSTEM
TABLE I HEURISTIC RULES FOR HOMEPAGE IDENTIFICATION
pair, we retain the top eight search results returned from Google search API.1 Preprocessing of the four obtained result sets is then performed. 1) Result filtering: The filtering removes some results from popular websites that are not homepages, including social websites, scientific libraries (e.g., DBLP). 2) Result scoring: After filtering, the remaining results from the four result sets are combined into one set. For Web pages that appear in more than one result set, only one copy will be kept. However, because the occurrence time reflects the degree of relevance of a Web page to a person, we assign occurrence score to the Web pages, which is calculated by (1). Along with other heuristics rules presented later, the occurrence score will be used for ranking-based homepage decision. OccurrenceScore = 2 × OccurrenceTime.
(1)
2) Identification of the Homepage: So far, we have obtained a result set highly relevant to the target person. However, how do we distinguish the homepage from other noisy Web pages? A Google search result consists of three types of metadata related to a Web page: title, URL, and snippet. Analysis of the search results shows that homepages share many commons over these metadata. For example, the title of a homepage often involves the full name or its variant of a person. Thus, we define a set of rules for homepage identification, as listed in Table I. A combination of empirical knowledge and experiment-adjustment method is used to determine the score value for each rule. Finally, the Web page with the highest score (the sum of occurrence score and heuristic-rule score) is identified as the contact’s homepage. B. Biographical Data Extraction Using the Conditional Random Field Extracting structured data (e.g., “PhdUniv”) from Web pages is a typical information extraction problem. According to previous studies [18], [19], CRF performs better than the other models (e.g., HMM, SVM) by taking advantage of the dependences among observation tokens. Therefore, we use CRF as the information extraction method. 1) Conditional Random Field-Based Information Extraction: The CRF was presented by Lafferty et al. in [18]. It is 1 http://code.google.com/apis/ajaxsearch/
83
commonly used for labeling sequential data, such as texts. It builds the model based on a conditional probability P (Y |X), where: 1) X is a sequence of observation tokens. For our system, it indicates the words and punctuations from the homepage; 2) Y is a sequence of labels. The set of labels is predefined according to the need of the target problem. For SCM, nine labels are defined, including the eight types of bio-data and a label “Other” to tag all irrelevant tokens. In labeling, the CRF model is used to find the sequence of labels Y ∗ having the highest likelihood Y ∗ = maxY P (Y |X) with the Viterbi algorithm. According to [18], the conditional probability of the label sequence can be represented as n 1 exp λi fi (yt−1 , yt , X, t) (2) P (Y |X) = Z(X) t=1 i where Z(X) = Y exp( i λi nt=1 fi (yt−1 , yt , X, t)) is a normalization factor. Feature function fi (yt−1 , yt , X, t) can be either a transition feature function r(yt−1 , yt ) or a state feature function s(yt , X).λ is t estimated from the training data, where λi is a learned weight of the feature function fi . State feature functions are defined by users, which indicate the dependence between a label in the labeling sequence and any token or token set in the observation sequence. An example of a state feature function is shown in (3). If a current token belongs to the year dictionary, and it is in a sentence that describes a Ph.D. degree, then the token is labeled “PhdYear.” The definition of feature functions makes good use of various local and global features of tokens (e.g., the dictionary feature). We will describe these features in detail later. ⎧ ⎪ ⎨1, if xi ∈ YearDic ∧ Context(xi ) = PhdDegree ∧ yi = PhdYear f (yi , X) = ⎪ ⎩ 0. (3) 2) Information Extraction Workflow: The information extraction workflow consists of three major modules: raw file processing module, model training module, and labeling module. M-1: Raw file processing module. This module accepts raw homepage files and transforms them into standard CRF input files. A standard CRF input file is composed of tokens, token features, and labels. Three units are included here. 1) Preprocessing unit: In preprocessing, we first segment the HTML page into “logical sentences” or paragraphs delimited by structural HTML tags such as
,
and tags for presenting headers, tables, and lists. After paragraph segmentation, we remove all HTML tags using rules represented by regular expressions and transform the raw HTML file into a normal text file. 2) Feature computation unit: The input file from the preprocessing unit is tokenized into words, punctuation marks, and paragraph delimiters. After tokenization, we compute the features of the tokens. 3) CRF input file construction unit. This step parses the XML format file from the feature computation unit and transforms it into a standard CRF input file.
84
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 44, NO. 1, FEBRUARY 2014
M-2: Model training module: After obtaining the training input file, we use the CRF++ toolkit2 to train the CRF model. M-3: Labeling module: It labels the CRF test files using the trained CRF model. 3) Feature Selection: Four types of features are calculated for information extraction. Token features: The token itself, its morphology (e.g., capitalized), type (word, number, or punctuation), and part of speech, are used as token features. Paragraph feature: It reports whether the paragraph is long or short. A paragraph is considered “long” if the number of characters it contains is above the threshold. Dictionary features: A number of dictionaries are used, such as advisor heuristics dictionary (e.g., “advisor”; 4 entries), title dictionary (e.g., “professor”; 25 entries), country dictionary (850 entries), degree dictionary (76 entries), and so on. Context features: They are grouped into three types given as follows. 1) Local context feature represents the dependence between the label of a token and its neighbor tokens. For example, in the sentence “. . . obtained his Ph.D. degree from Tokyo University in 1986. . . ,” the word “from” is selected as a feature indicating that the following words may be a university name. 2) Paragraph-level feature represents a relative long-distance dependency within a paragraph. For example, to examine if “Tokyo Univ.” should be labeled “PhdUniv” and not “BsUniv” in the previous example, the context information determined by “Ph.D. degree” should be explored. 3) Interparagraph context feature: The informal nature of homepage texts usually has one item of bioinfo to be written in several short paragraphs. For example, research interests are usually written in a list style. We call the dependence between a feature and a label atomic dependence. Sometimes, a combination of several features can characterize the dependence between a label and observation tokens better; we call it combined dependence. For example, in (3), two features are used to represent when a token can be labeled “PhdYear.” First, the token belongs to the year dictionary; second, it is within the context of “PhdDegree.” C. Pervasive Sensing A physical sensor is another important source for gathering contact information. In this section, we describe how the other three types of contact data are collected. 1) Basic information: There are two possible ways of capturing basic information from business cards: using a mobile mini card scanner or using a camera-based card scanner application for mobile devices. 2) Contextual information: 1) Time. In addition to the year context, we record three types of “semantic” contexts: season, month, and morning/afternoon/evening. These contexts are recorded when data from the business card are scanned (i.e., the person meets the contact for the first 2 http://crfpp.sourceforge.net
TABLE II DESCRIPTION OF THE RELATIONSHIP BETWEEN PLACE CATEGORIES, AND ASSOCIATED ACOUSTIC EVENTS, AND THE FEATURES SELECTED FOR PLACE CATEGORIZATION
time), or a new event is created by the user (another meeting event relevant to the same contact). 2) Location context. We characterize it from two aspects. One is depicted at the city level, which is obtained from the GPS-enabled mobile phone. The Google Geocoding API3 is used to convert GPS coordinates to city/country name. The other is at the semantic level, characterizing the categories of meeting places (e.g., workplaces, outdoors). It is recognized by processing the sound signals captured by mobile phones, as depicted in detail later. 3) Weather. The Yahoo Weather Web service is used to obtain city weather context. 4) Topic context is highly semantic and subjective, so it should be defined by the user. 3) Impression data: People can sense rich information beyond the capacity of sensors. In our system, the facial feature and hobby information of a contact can be derived from human observation and cognition. Location category context: We use audio clips captured from mobile phones to recognize the categories of places. In terms of our application field, the three most common categories of contact meeting places, namely workplace, leisure place, and outdoor, are identified to be recognized. As shown in Table II, three acoustic events—speech, music, and environmental sounds— are associated with the three place categories. Unlike CSP [20], which attempts to characterize fine-grained place categories using complex and high-computational visual, audio, and optical features, we propose an audio-based, lightweight algorithm to attain coarse-grained place categorization. It consists of the following three modules, as shown in Fig. 4. 1) Preprocessing: An audio clip of a predefined length (e.g., 2–3 s) is read by the mobile phone’s microphone sensor when the business card is scanned. It is then segmented into frames for feature extraction. The sampling rate is 8 kHz, and each frame length is set to 64 ms (to enable lower duty cycle on the phone). To compensate for the high-frequency part that is suppressed during the sound production process, the input signals should be preemphasized. Fast Fourier transform (FFT) is applied to transfer the sound signal from the temporal domain to the frequency domain for spectral feature extraction. 2) Feature Selection: According to the requirements of the SCM, three features are carefully selected, namely, zero crossing rate (ZCR) [21], spectral flux (SF) [22], and Haar [23]. Each 3 http://code.google.com/apis/maps
GUO et al.: ENHANCING MEMORY RECALL VIA AN INTELLIGENT SOCIAL CONTACT MANAGEMENT SYSTEM
Fig. 4.
85
Sound classification-based place categorization, consisting of three components: preprocessing, feature extraction, and the classifier.
selected feature is useful for distinguishing the three acoustic events, as shown in Table II. For instance, compared with music, speech signals often show a higher variation in the ZCR. Haar-like filtering is traditionally used as a feature extraction method in image processing. As reported in [23], it is also effective for environmental sound recognition. To save energy, we avoid using more powerful yet computationally intensive features, such as Mel-frequency Cepstrum coefficients. Finegrained classification (ambient sound classification [23], speech recognition [24]) is possible for further analysis of sound events. For our system, however, coarse place categories are sufficient because they are easy to remember and use for contact search. 3) Classification: Several classifiers have proved effective in sound processing, including Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), the Linde–Buzo–Gray (LBG) algorithm, and decision trees. As investigated in [23] and [25], there are no substantial performance differences among them. Thus, we build a classifier based on the J48 decision tree algorithm (provided by the Weka [26]), which is simple to construct and more efficient when running on resource-constrained devices. User editing is enabled for all gathered data to add/remove information or correct possible errors introduced in the sensing and classification process. D. Contact Search A keyword-based “full-index” method is used for associative contact search. The 20 types of contact metadata are stored in two tables: person and event. The person table stores the basic information, bioinfo and impression data about contacts. The event table stores the contextual information about the meeting events. We call each row in a table a record. In the full-index method, when a keyword like “Tokyo” is inputted, the contacts who graduated from “Tokyo Univ.” and met in Tokyo can all be listed. One person record corresponds to one or more event records, as we can meet a person one or more times. The contact photograph is used to validate the contact during contact search. In the current stage, we assume that the photograph can be taken by the user’s mobile phone when a new contact is met. VI. EVALUATION As the case study to manifest the methodology on memory recall, we mainly test the performance of the SCM. Specifically, we consider the success of the SCM to be determined by three major factors: 1) effectiveness of the homepage finder and the CRF-based bioinfo extraction, 2) the performance of the place
TABLE III EXPERIMENTAL RESULTS OF THE HOMEPAGE FINDER: THE NUMBER AND PERCENTAGE (%) OF THE HOMEPAGES IDENTIFIED UNDER DIFFERENT RULE SETTINGS
recognition algorithm, and 3) usefulness of the contact search tool. A series of experiments were conducted to evaluate them. A. Homepage Finder To test if the homepage finder can correctly find the right homepages of researchers, we randomly selected 200 names from the Arnetminer dataset (http://arnetminer.org/ lab-datasets/ profiling/), which includes information about 898 researchers who have homepages. A combination of multikeyword pairbased search (i.e., occurrence rules) and various heuristic rules was used in the homepage finder. We calculated their contribution in the experiments. Table III shows the experimental results. When the homepage finder was not used, that is, when the person’s name was used as the keyword and the first recommended Web page from Google API was used as the homepage, only 40 results (20%) were correct. However, when the homepage finder was used, the accuracy increased to 89%. Further experiment shows that a multikeyword pair-based search (i.e., occurrence rules, simplified as ‘occur rules’ in Table III) is more effective (the accuracy decreased to 34% when occurrence rules were not used). Among the three types of heuristic rules, URL and title rules (>80%) contributed more than snippet rules (