
SeeNSearch: A Context Directed Search Facilitator for Home Entertainment Devices

Alan Messer, Anugeetha Kunjithapatham, Phuong Nguyen, Priyang Rathod, Mithun Sheshagiri, Doreen Cheng, Simon Gibbs
Samsung Information Systems America Inc.
75 W. Plumeria Dr., San Jose CA 95134
amesser, anugeethak, pnguyen, priyang.rathod, msheshagiri, doreen.c, [email protected]

Abstract

The Internet has become an extremely popular source of entertainment and information. But despite the growing amount of media content, most Web sites today are designed for access via web browsers on the PC, making it difficult for home consumers to access Internet content on their TVs or other devices that lack keyboards. As a result, Internet access is generally restricted to the PC or to cumbersome interfaces on non-PC devices. In this paper, we present unobtrusive and assistive technologies enabling home users to easily find and access Internet content related to the TV program they are watching. Using these technologies, the user is able to access relevant information and video content on the Internet while watching TV.

1. Introduction

With advances in hardware and software technologies, Consumer Electronics (CE) devices are becoming more and more powerful. Growth in network infrastructure and the falling prices of hardware continue to increase the availability of network-capable entertainment devices. Even users who are not computer savvy are starting to configure elaborate home networks consisting of cable set-top boxes, digital television sets, home media servers, digital audio players, personal video recorders, PCs, and so on. Home consumers are also creating, storing and accessing more and more digital content through these devices. However, there is an unfortunate gap between the digital content on the Internet and the networked digital entertainment device: currently most Internet content is organized for access via a web browser, but on home CE devices that lack a keyboard and mouse, using a browser becomes awkward and tedious. Moreover, people typically expect a "lean back" experience when it comes to using their TV. For instance, someone watching a television news program may not be inclined to conduct an Internet search if it requires any effort beyond pushing a couple of buttons. As an example of how TV viewers would ideally like to access the Internet, consider the following scenario:

Trisha is watching a documentary program about "Hurricane Katrina" on her living room TV. She wishes to learn more about some topics addressed in the program, especially about "New Orleans", which has just been mentioned. She presses a button on her TV remote control and finds a host of information related to the program being watched. The graphic on the screen shows two menus. One menu has a list of keywords related to the TV program, and the first keyword "New Orleans" from this menu is highlighted. The other menu shows a list of web links containing information and/or videos related to "New Orleans". Trisha notices that the first link on this menu is a video about the aftermath of Hurricane Katrina. Using the navigation buttons on her remote control she selects this link to start viewing it.

The above scenario illustrates the following essential features: 1) The user does not enter text or queries at any point; interaction is via the navigation buttons on a conventional remote control. 2) The user is able to access the desired related Internet information by pushing a few buttons; there is no need to bring up a search page or enter search terms.

In this ideal scenario, the context of the user (the program being watched) helps focus the search on relevant content. In this paper, we describe a novel system that enables the above-mentioned features using existing Internet and web content. The resulting system, an embedded search assistant, facilitates interactive, "lean back" Internet access for TV viewers. We start by describing our approach, which combines traditional AI techniques with innovations in this space to solve this problem. Next we outline the prototype system and summarize initial user feedback. We then present some related work and conclude by proposing possible improvements and future work.

2. Approach

Searching for information on the Internet typically involves two stages: 1) search query formation, and 2) data search and analysis. A user looking for information has to first form a search query that describes the type of information being sought. Then, she has to identify potential sources of data, use them to obtain information related to her search query, and possibly refine the search query and re-navigate to different sources iteratively until she finds the desired information. The user is essentially forced to perform a highly interactive data analysis task to find the exact information she is looking for. Our approach is to reduce this burden by embedding a context-aware search agent in the TV (or set-top box or other device driving the TV display). The search agent performs two steps. First, it identifies potential search queries that the user might be interested in; we infer these from the content being viewed when the agent is activated. We also identify possible refinements to the search queries and allow the user to easily refine them. Second, the agent automates the data search and analysis process involved in resolving a chosen search query: it extracts, aggregates and correlates the data of interest to the user, without any input from the user, using pre-defined execution plans. In the following sub-sections, we provide more details on how these steps are executed.
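To make the two-step structure concrete, the following is a minimal sketch of such a search agent in Python. The class and method names (ContextSearchAgent, suggest_queries, resolve) are illustrative assumptions, not the names used in the actual system.

```python
# Illustrative sketch only: names and signatures are assumptions,
# not the actual SeeNSearch implementation.

class ContextSearchAgent:
    def __init__(self, keyword_extractor, plan_library, plan_executor):
        self.keyword_extractor = keyword_extractor   # step 1: query identification
        self.plan_library = plan_library             # maps query + context -> execution plan
        self.plan_executor = plan_executor           # step 2: plan-driven query resolution

    def suggest_queries(self, application_state):
        """Step 1: derive candidate search queries from the current
        application state (e.g. channel, program, closed captions)."""
        return self.keyword_extractor.extract(application_state)

    def resolve(self, query, application_state):
        """Step 2: resolve a chosen query by running a pre-defined
        execution plan that extracts, aggregates and correlates data."""
        plan = self.plan_library.select(query, application_state)
        return self.plan_executor.run(plan, query)
```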

2.1. Identifying Potential Queries

Our system identifies potential data of interest to the user through the user's current application state. Current application state refers to the state of the application that the user is using at the time he desires to access relevant Internet content. For example, if the user is watching a TV program, the channel the TV is tuned to and the program being broadcast constitute the application state. Based on the application state, the system first identifies the content used or rendered by the application. Then it obtains metadata and/or other associated data for the content being accessed and, using these, identifies potential search queries. The source of the metadata, and how it is obtained, depends on the application: if the application is "listening to music", the metadata could be obtained from the CD or music track; if the application is "watching TV", then closed captioning (CC) is a rich source of metadata. For the remainder of the paper we focus on TV and CC; however, our approach can be used in other situations.

2.1.1. Closed-Caption Analysis. Closed captions (or subtitles, in Europe) are embedded in analog and digital TV programs before they are broadcast. Techniques vary by geography and delivery method, but in many countries their presence (for some percentage of shows) is mandated for accessibility to those with hearing disabilities. However, extracting any useful information from this text is not straightforward. The captions typically do not contain any case information, precluding any attempt to extract proper nouns based on case. Also, they are often ungrammatical (sometimes because of the spoken nature of the content), poorly punctuated, and may have frequent typos. Because of these limitations, we cannot apply the usual keyword extraction techniques used for text documents to closed-caption text. In addition, the content of closed captions is highly dependent on the type of the program. For example, a 'News' program is usually high on factual content, whereas a 'Sitcom' typically has low factual content. We have developed a keyword extraction technique that uses a part-of-speech tagger and genre-dependent rules to extract significant keywords from the closed captions of a program. Our technique is designed to work in real time on real broadcast signals and can process a steady stream of CC text coming into the system. We describe below the detailed steps involved in our approach to extract keywords (Figure 1 gives an overview of these steps):

1) While a stream of CC text is received, it is broken down into sentences. This is done in order to preserve the grammar of the text.

2) The sentences are then tagged using a part-of-speech (POS) tagger [3]. The tagger analyzes each sentence and determines how each word is used in the sentence. It uses lexical rules and a dictionary to assign an initial tag to each word in a sentence, and then uses contextual rules to update the tag based on the context in which the word occurs. A word not available in the dictionary is tagged as a proper noun. The contextual rules are sensitive to the grammar of the input sentence; ungrammatical or incomplete sentences can result in incorrect tagging of the words in the sentence.

[Figure 1. Keyword extraction process: the CC stream passes through sentence boundary detection; the resulting plain-text sentences are part-of-speech tagged; keyword extraction policies (sets of rules selected via a genre-to-policy mapping, with the genre obtained from the Electronic Program Guide (EPG)) are applied to the tagged text; the output is the extracted keywords.]

For example, given the input caption "john bent ran home", the tagger, based on some lexical rules and a dictionary, initially tags 'john' as a proper noun, 'bent' and 'ran' as verbs in the past tense, and 'home' as a noun. In the next iteration, the tagger looks at the context of each word in the sentence and, based on some contextual rules, determines that the correct tag for 'bent' is in fact proper noun. Consequently, in the output of the POS tagger for this sentence, 'john' and 'bent' are tagged as proper nouns, 'ran' as a past-tense verb, and 'home' as a noun.

3) Significant keywords/phrases (or potential search queries) are then identified from the tagged sentences as follows. We maintain and apply different extraction rules to extract significant phrases from a tagged sentence. The rules are essentially tag patterns. For example, a rule such as (adjective, noun+) indicates that a phrase containing a word tagged as an adjective followed by one or more words tagged as nouns should be retrieved. Another example of a rule is (proper-noun+), which specifies that a phrase with consecutive proper nouns should be retrieved.

Further, we also maintain a mapping from genre to extraction policy. An extraction policy essentially specifies the set of extraction rules to be used for extracting keywords for a particular type (genre) of program. For example, if the program being watched has high factual content, such as 'News', we apply a highly aggressive extraction policy to extract many different types of keywords (e.g. sequences of nouns, compound nouns, proper nouns, etc.). On the other hand, if the program has low factual content (e.g. a 'Sitcom'), we apply a very conservative extraction policy in order to extract keywords selectively (e.g. only proper nouns) and make sure the selected keywords are useful. In summary, we choose the extraction policies to apply to the tagged sentences (obtained from step 2) based on the type (genre) of the program being watched and extract keywords from the sentences according to those policies.

4) In order to suggest the most recent keywords in a program, as the subject of the CC changes over time, we maintain two history windows over the stream of incoming text. The smaller, most recent window spans the last N sentences, and the larger, program-wide window covers the entire TV program, current news story, or current program section. The keywords extracted from the most recent window are ranked higher than others, and the ranked keywords are then ordered and presented to the user. When the program or the news story changes, as indicated either by special characters in the closed captions (such as '>>>' in the US) or by consulting the EPG and the current time, the keywords in the windows for the previous programs are gradually demoted in rank. A sketch of the sentence-splitting, tagging and rule-based extraction steps is given below.
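The following is a minimal, illustrative sketch of steps 1)–3), assuming NLTK's sentence splitter and default POS tagger as stand-ins for the rule-based (Brill-style) tagger [3] used in the actual system. The genre names, tag-pattern rules and function names here are simplified assumptions, not the system's real policies.

```python
# Illustrative sketch of closed-caption keyword extraction (steps 1-3).
# Assumptions: NLTK stands in for the Brill-style tagger [3]; the genre
# policies and tag patterns below are simplified examples, not the real ones.
# First run requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import re
import nltk

# Tag-pattern rules over Penn Treebank tags:
# consecutive proper nouns, and an adjective followed by one or more nouns.
RULES = {
    "proper_nouns": r"(?:NNPS?\s)+",
    "adj_noun": r"JJ\s(?:NNS?\s)+",
}

# Genre-to-policy mapping: aggressive for factual genres, conservative otherwise.
POLICIES = {
    "News": ["proper_nouns", "adj_noun"],
    "Sitcom": ["proper_nouns"],
}

def extract_keywords(cc_text, genre):
    keywords = []
    # Step 1: break the incoming CC text into sentences.
    for sentence in nltk.sent_tokenize(cc_text):
        # Step 2: part-of-speech tag each sentence.
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        words = [w for w, _ in tagged]
        tag_string = " ".join(t for _, t in tagged) + " "
        # Step 3: apply the extraction rules named by the genre's policy.
        for rule in POLICIES.get(genre, ["proper_nouns"]):
            for match in re.finditer(RULES[rule], tag_string):
                start = tag_string[: match.start()].count(" ")
                length = match.group().strip().count(" ") + 1
                keywords.append(" ".join(words[start : start + length]))
    return keywords

# Example on lower-case, unpunctuated-style caption text:
print(extract_keywords("new orleans was hit hard. the levees failed.", "News"))
```

In the real system, the extracted keywords would additionally be scored by recency using the two history windows described in step 4.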

2.1.2. Identifying Possible Query Refinements. In the previous section, we described our approach to identify and suggest potential search queries to the user. Considering the iterative search process that users are accustomed to performing on a PC, we believe users may similarly like to refine the search queries we identify and suggest, to better suit their needs. To assist the user in doing so on a device like a TV, we use a query expansion technique based on search result snippets. A snippet is the short piece of text that is often part of the search results returned by search engines like Google or Yahoo. It typically contains a short summary of a web page (one of the search results), or short pieces of text from the web page that usually include the original query and a few surrounding words. Our technique is based on the premise that the keywords in the snippets are essentially those that represent the web pages found, or those that co-occur with the original query, making them good candidates for query expansion. The steps involved in our query expansion technique are as below (a sketch of these steps is given after the list):

1. Obtain the top K results from a search engine, along with their snippets, for the original query Q. A higher value of K gives better keywords but incurs higher latency, as more results need to be fetched; a lower value of K has the opposite trade-off.

2. Use an N-gram based approach to extract all phrases of up to three words from the snippets.

3. Filter out stop words and index the remaining extracted keywords.

4. Rank the keywords by TF-IDF score and suggest the top-ranked keywords as possible refinements to the original query.

If one of the suggested refinements (say, Q1) is selected, it is appended to the original query to form a new query (Q+Q1). This new query is sent to the search engine and the results are obtained. This process can be repeated iteratively if the user wishes to refine the query further. As is evident, our technique requires minimal input from the user and allows query expansion without the need for a keyboard.
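A minimal sketch of steps 2–4 follows, assuming the snippets for the top-K results have already been fetched from a search engine API (step 1). Scikit-learn's TfidfVectorizer is used here as a convenient stand-in for the paper's n-gram extraction, stop-word filtering and TF-IDF ranking; the function and parameter names are illustrative, not those of the actual system.

```python
# Illustrative sketch of snippet-based query expansion (steps 2-4).
# Assumes `snippets` were already fetched for query Q from a search engine
# API of your choice (step 1); TfidfVectorizer is a stand-in for the paper's
# n-gram extraction, stop-word filtering and TF-IDF ranking.
from sklearn.feature_extraction.text import TfidfVectorizer

def suggest_refinements(query, snippets, top_n=5):
    vectorizer = TfidfVectorizer(ngram_range=(1, 3),    # phrases of up to 3 words
                                 stop_words="english")   # drop stop words
    tfidf = vectorizer.fit_transform(snippets)
    # Aggregate each phrase's TF-IDF score across all snippets.
    scores = tfidf.sum(axis=0).A1
    phrases = vectorizer.get_feature_names_out()
    ranked = sorted(zip(phrases, scores), key=lambda p: p[1], reverse=True)
    # Exclude the original query itself and return the top suggestions.
    return [p for p, _ in ranked if p != query.lower()][:top_n]

snippets = [
    "New Orleans was devastated when Hurricane Katrina made landfall in 2005 ...",
    "Levee failures flooded much of New Orleans after Hurricane Katrina ...",
]
q = "New Orleans"
refinement = suggest_refinements(q, snippets)[0]
expanded_query = q + " " + refinement   # the new query Q+Q1 sent back to the engine
print(expanded_query)
```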

2.2. Resolving Queries using Execution Plans

In this section, we describe our techniques to automate the steps involved in the query resolution process, i.e., extracting, aggregating and correlating the data of interest. By correlation, we mean associations (such as 'similar to') between data. We have designed a simple XML-based representation, which we refer to as 'execution plans', to encapsulate the steps involved in the query resolution process. An execution plan comprises one or more plan-steps, and each plan-step specifies the type of task (data extraction, aggregation or correlation) to be performed. Each plan-step is expressed by a general processing function, which we refer to as a 'RuleLet', that operates on the data in the query resolution process. Sample RuleLets we have designed include GetDataRuleLet (to obtain data from different data sources), MergeDataRuleLet (to merge data obtained from different data sources) and GetContentNotInHomeRuleLet (to identify, from a collection of data, the data that is not already available on the home devices). A plan-step essentially specifies the RuleLet to be executed and the set of input and output parameters required for its execution. The specific fields in a plan-step include the name of the RuleLet to be executed, the input data required for the RuleLet execution, the output-type expected from the execution of the RuleLet, and the scope of the desired output data (if applicable). The scope field is used to specify whether the required data should be available in the home ('Local') or on the 'Internet'. In order to cater to different kinds of search queries, a plan library is maintained. When a user chooses a search query, the system identifies a plan based on the context of the user. Below, we illustrate the use of plans using the search scenario described in the Introduction. In the scenario, the user is watching a broadcast documentary program titled 'Hurricane Katrina'. When the user expresses interest in accessing related Internet content, potential search queries are identified by executing a plan whose plan-steps invoke the GetDataRuleLet on the program's EPG information (EPGInfo) with 'Internet' as the scope.
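As an illustration of what such a plan might look like, here is a hypothetical XML plan embedded in a short Python loader. The element and attribute names (plan, plan-step, ruleLet, input, output-type, scope) and the step inputs are assumptions based on the plan-step fields described above, not the schema actually used by the system.

```python
# Hypothetical execution plan, reconstructed from the plan-step fields
# described in the text (RuleLet name, input, output-type, scope);
# the XML schema and step inputs are assumptions, not the system's actual representation.
import xml.etree.ElementTree as ET

PLAN_XML = """
<plan name="IdentifyPotentialQueries">
  <plan-step ruleLet="GetDataRuleLet" output-type="EPGInfo" scope="Internet"/>
  <plan-step ruleLet="GetDataRuleLet" input="EPGInfo" output-type="Keywords" scope="Local"/>
</plan>
"""

# Minimal loader: walk the plan-steps and dispatch each named RuleLet.
RULELETS = {
    "GetDataRuleLet": lambda step: print("fetching", step.get("output-type"),
                                         "with scope", step.get("scope")),
}

plan = ET.fromstring(PLAN_XML)
for step in plan.findall("plan-step"):
    RULELETS[step.get("ruleLet")](step)
```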