Searching the 'Web of Things'

Benoit Christophe, Vincent Verdot and Vincent Toubiana
Bell Labs Research, Alcatel-Lucent Bell Labs France
Villarceaux Research Center, 91620 Nozay, France
Email: {firstname.lastname}@alcatel-lucent.com

Abstract—With the proliferation of connected devices and the widespread adoption of the Web, the vision of ubiquitous computing has recently been recast as an emerging paradigm called the ‘Web of Things’, in which Web-enabled objects are offered through interconnected smart spaces. While some predict a near future with billions of Web-enabled objects, the success of this vision now depends on efficient processes and tools that enable users or applications to find connected objects matching a set of requirements (and expectations). We present ongoing work that aims to develop a search process dedicated to the ‘Web of Things’ and that relies on three contributions: the creation and use of semantic profiles for connected objects; the establishment of similarities between the semantic profiles of different connected objects in order to gather them into clusters; and the computation of a score that associates a ‘context of search’ with an incoming request and enables the selection of the most appropriate search algorithms, involving either probabilistic or precise reasoning.

Index Terms—Semantic Web, ontology, Web-of-Things, machine learning, search mechanisms

I. INTRODUCTION

Since its introduction by Weiser in 1988, ubiquitous computing has been a fertile ground for research and technology, leading to a number of concrete advances such as mobile computing. The core of the vision, which revolves around smart spaces where users seamlessly consume services and information, has recently been updated through an emerging paradigm called the ‘Web of Things’. This paradigm has led to the implementation of Web-based frameworks that create a service layer to virtualize the connected objects of each smart space and offer them to Web applications or users. Using the Web as the platform that hosts and exposes connected objects brings multiple technological and business benefits, including versatility (the platform is application-agnostic), high availability and widespread deployment, the use of standardized communication protocols, and the ecosystem created by the Web 2.0 paradigm.

While the creation of Web-enabled smart spaces brings connected objects to Web stakeholders, we believe that the next step is to provide these stakeholders with the means to search through such spaces. By search, we mean the processes that allow connected objects to be discovered and filtered for a given use, and to be combined with one another or associated with Web applications. We are developing a distributed framework that complements a previously designed and implemented ‘Web of Things’ platform [3] [4] by adding mechanisms that allow connected objects to be found efficiently. Our main contributions are organized along the following three axes.
• A set of models, based on Semantic Web technologies, that allow connected objects to be described precisely.
• Mechanisms that attempt to achieve cross-understanding between semantic models, in order to discover similarities between connected objects and gather them into clusters.
• A search strategy, defined each time a request is received, that uses a context of search to select the appropriate algorithms, favoring either accuracy or speed in obtaining meaningful search results.

Although frameworks exposing objects on the Web have already been designed, to the best of our knowledge no one has paid sufficient attention to establishing efficient search processes, which are required for a ‘Web of Things’ composed of billions of objects, as forecast by [6]. In parallel, while various search mechanisms have been developed (focusing on better search algorithms and taking into account user preferences, geo-location, etc.), none of them has tried to understand the type of result expected by a requester, namely accurate or fast. We believe that searching the ‘Web of Things’ requires both points to be carefully addressed, and we propose to tackle them along the three aforementioned axes.

Section II motivates the need for efficient search mechanisms through illustrative use cases, while Section III describes related work. Section IV details the semantic models that allow connected objects to be described precisely. Section V describes the methodology we use to interlink the semantic models of different connected objects. Section VI presents the overall search strategy applied upon reception of a request. Section VII presents early prototypes of this ongoing project. Finally, Section VIII provides concluding remarks and outlines future work.

II. MOTIVATING USE CASES

This section presents two use cases that highlight the need for different discovery mechanisms, depending on the requester issuing a query.

A. Accurately composing objects through semantic profiles

Mary comes back home with her little daughter. Her house, connected to the Web, is a smart space containing several connected objects, including a webcam in her living room. While Mary is watching TV, her daughter stands up for the first time. To share this moment with her husband, Mary uses her smartphone to log on to her smart space (represented by an intelligent system located in her house) and accesses a menu displaying the functionalities of her webcam. After getting the webcam stream, she asks the system to retrieve a list of connected objects able to ‘consume’ this video stream. By comparing the semantic description of Mary’s webcam with those of the other connected objects that Mary can access, the system retrieves only suitable objects: her TV, her digital photo frame, and also her husband’s laptop screen and his mobile device. From this list, Mary selects the laptop screen. The system then uses the semantic descriptions of both objects to correctly map the output produced by the webcam (the stream) onto the input expected by the laptop. Once this composition is done, a popup appears on the laptop screen of Mary’s husband, who can choose to watch his daughter.

B. Using semantic similarities to speed up search results

Mary now asks the system for information about the current weather forecast, pollution, etc. in the area of a park she wants to go to with her daughter. The system therefore searches for connected objects returning results that correspond to the different requirements extracted from her request (e.g., pollution, temperature, humidity and wind in the park area). In the smart spaces around the park, a myriad of diverse connected objects can each provide a subset of the relevant information: air pollution sensors, weather stations and wind speed sensors. Unfortunately, none of them can answer Mary’s request perfectly by providing all the expected data.
Fortunately, as the request comes from Mary rather than from an application, the system runs a different search algorithm that looks for similarities between the requirements (expressed as a graph) and the semantic profiles of the different connected objects. This yields a list of connected objects matching the requirements of Mary’s request to varying degrees (expressed as percentages) and allows Mary to receive results in an acceptable time: as soon as a wind sensor is found, it can be proposed to Mary with an associated matching score.

These two use cases show the rationale for performing different types of search according to the requester’s expectations. In the first use case, a system (or an application) searches for an object to compose logically with another one, and therefore requires very accurate results to achieve this operation. In the second use case, a user searches for connected objects that meet a set of requirements; similarities between the requirements and the object descriptions can be used to propose acceptable results quickly.

III. RELATED WORKS

This section gives an overview of works that address the problem of searching for connected objects or sensors.

A. ‘Web of Things’ search engines

In [13] the authors define a ubiquitous knowledge-based system that enables object discovery in smart environments. To define a domain, they use two ontologies: the first contains intensional (general) knowledge of the domain and the second contains extensional (assertional) knowledge specific to the individuals of the domain. Smart environments are composed of micro devices, each of which is considered an individual of a modeled domain and provides a semantically annotated description. The description of a micro device refers to an ontology providing intensional knowledge and contains the semantic annotations applied to the device. Discovery of micro devices involves two steps. Upon reception of a request, a preselection based on the ontology reference is made to select a ‘type’ of resource. Then, matching is performed between the semantic annotations of the selected resources and the on-demand request. This work seems to take an approach similar to ours, as it considers that sensors need to be semantically described. Nevertheless, the search for sensors seems to be tied to a dedicated smart environment, as the authors do not attempt to query other smart environments when searching for relevant objects. Moreover, the authors do not establish similarities across different intensional knowledge repositories. Applying this method in our scope would therefore make both aforementioned use cases impossible to realize: it could neither retrieve connected objects from another smart space nor retrieve similar objects.

In [11] the authors propose a search engine for real-world entities having certain properties. They associate each real-world entity (e.g., a meeting room) with a Web page containing additional structured metadata about the sensors connected to it. Requests are expressed in a simple search language (e.g., ‘room ABC occupancy:empty’) and are composed of a ‘static’ part (e.g., ‘room ABC’) and a ‘dynamic’ part (e.g., ‘occupancy:empty’). When the search engine receives a request, it analyzes the ‘static’ part to filter the sensors providing a given capability (e.g., the sensors of room ABC that provide ‘occupancy’). It then uses predictive models to compute the probability that the filtered sensors return a given value (e.g., the probability that the sensors giving ‘occupancy’ in room ABC return ‘empty’). When sufficient hits are found, the process stops and the results are returned to the user. This work allows searching for real-world entities in a given state, as perceived by the sensors attached to them. Although quite practical, this approach imposes two strong conditions. First, to perform a query, end users have to know the vocabulary used by the sensors (how states are named). Second, an entity must be summed up by the sensors that compose it. In our case, the potential billions of connected objects invalidate the first condition. Moreover, the second condition would rule out the representation we have of a connected object: a rich device providing functionalities, located somewhere, able to move, having an owner and granted users, etc.
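To make the query model of [11] concrete, the following Python sketch splits a query into its static and dynamic parts, filters entities on the static part, and ranks the remaining ones by the probability that the dynamic constraints hold, stopping once enough hits are found. It is only an illustration under stated assumptions: the Entity structure, the fixed per-state probabilities (standing in for the predictive models of [11]), the independence between constraints and the min_prob/max_hits thresholds are all hypothetical, not the actual engine of [11].

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for an entity page of [11]."""
    name: str
    # capability -> {state: predicted probability that the sensors report it};
    # these fixed numbers replace the predictive models of [11]
    predictions: dict


def parse_query(query: str) -> tuple:
    """Split 'room ABC occupancy:empty' into static terms and dynamic constraints."""
    static, dynamic = [], {}
    for token in query.split():
        if ":" in token:
            capability, state = token.split(":", 1)
            dynamic[capability] = state
        else:
            static.append(token)
    return static, dynamic


def search(query: str, entities: list, min_prob: float = 0.5, max_hits: int = 10) -> list:
    static, dynamic = parse_query(query)
    hits = []
    for entity in entities:
        words = entity.name.lower().split()
        # Static part: keep entities whose description matches every static term
        # and whose sensors provide every requested capability.
        if not all(term.lower() in words for term in static):
            continue
        if not all(cap in entity.predictions for cap in dynamic):
            continue
        # Dynamic part: probability that all constraints hold (independence assumed).
        prob = 1.0
        for cap, state in dynamic.items():
            prob *= entity.predictions[cap].get(state, 0.0)
        if prob >= min_prob:
            hits.append((entity.name, prob))
            if len(hits) >= max_hits:  # stop as soon as enough hits are found
                break
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


entities = [
    Entity("room ABC", {"occupancy": {"empty": 0.8, "occupied": 0.2}}),
    Entity("room XYZ", {"occupancy": {"empty": 0.9, "occupied": 0.1}}),
    Entity("room ABC annex", {"temperature": {"warm": 0.7}}),
]
print(search("room ABC occupancy:empty", entities))  # [('room ABC', 0.8)]
```

The example also exhibits the first limitation discussed above: the requester has to know that the capability is named ‘occupancy’ and the state ‘empty’.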

B. Sensor search engines

In MAX [8], Microsearch [15] and Snoogle [16], a sensor node attached to a connected object carries a keyword-based description of that object. Given an ad hoc query consisting of a list of keywords, the system returns a ranked list of the top k entities matching the query. Because they rely on keyword descriptions, these systems cannot return accurate results: for example, they do not allow finding an object that realizes a given capability. Furthermore, the aforementioned use cases could not be realized with this type of description, so these systems do not suit our needs. In Distributed Image Search [18], the authors consider the content generated by camera sensors. A user can submit a query containing an image, and the system returns the cameras that captured similar images. Results are ranked and the most relevant matching sensors/images are returned to the user. Because it is based on the content generated by sensors, this method cannot be applied in our scope: it requires sending every query to every sensor and would therefore not scale to a global ‘Web of Things’. Finally, GSN [1] is a system for the Internet-based interconnection of heterogeneous sensors and sensor networks. In this approach, sensors are identified with metadata (e.g., a sensor can define predicates such as ‘usage’ set to ‘room monitoring’ or ‘geographical’ set to a room number). In addition, sensors syntactically describe the structure of the data streams they consume or produce. As it is keyword-based, this approach cannot be applied in our scope, for the same reasons as those already mentioned in this part.

To sum up, only one of the related works relies on an ontology to describe objects with semantic annotations, while most of them rely on keyword-based descriptions. More importantly, none of them establishes similarities between the descriptions of the searched elements. Finally, none of them attempts to associate a ‘context of search’ with a request; they therefore always run the same search algorithm.

IV. DESCRIPTION MODELS FOR CONNECTED OBJECTS

The scope of potentially Web-enabled objects is so large [6] that a standard providing definitions for every one of them would be impractical. Therefore, using connected objects in Web applications requires a dedicated description language. One way to proceed would have been to reapply service description languages (e.g., WSDL, WADL) to connected objects through the creation of a new set of WS-* specifications [12]. Indeed, these languages, designed to describe functionalities, could have been used to represent what connected objects provide. However, the expressiveness limitations inherent to their design [9] [10] would not (or only insufficiently) have allowed a fine representation of some aspects of connected objects, such as their functional behavior (i.e., their associated finite-state machine), ownership and the underlying access rights, or the geographical range from which the object can be accessed.

A. The benefits of using the Semantic Web

Our approach to describing any connected object relies on a set of models using principles and technologies of the Semantic Web [2], such as OWL (Web Ontology Language). To a certain extent, our approach can be seen as a subset of Linked Data¹ dedicated to connected objects. The resulting description therefore benefits from the following properties.

Web-based: In the Semantic Web, all entities defined in a model are referenced by URIs using the standard Web protocol, HTTP.

Extensibility: This is one of the main concepts of the Semantic Web and OWL. The open-world assumption states that anything not explicitly stated is undetermined. This allows additional models to be integrated to refine connected object descriptions. Under the opposite, closed-world assumption, anything not explicitly stated is assumed not to hold; in this vision, the whole set of connected objects would have to be modeled within a single model, constraining extensibility.

Ability of domain-driven models to be interlinked: Although a model usually describes a particular domain, every entity in the model can reference entities from a different model through arbitrary relations. Linking information from different sources can substantially increase the value of the modeled data, as it makes it possible to connect various data sources and build a decentralized, dynamic and extensible collaborative information space.

Use of standardized languages: Standardized languages enable computers to automatically read and interpret information, so that applications can gather the desired information from different sources in a generic way.

Model expressiveness: The logical foundation of OWL (description logics, a decidable fragment of first-order logic) allows semantic engines to infer logical consequences from a set of asserted facts or axioms. One consequence is the ability of an object to supersede another one. As an illustration, a reasoner would propose a phone as one of the results for the requirement ‘emitting a sound’ simply because a subset of the properties describing the phone is equal to those describing a loudspeaker.
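The ‘supersede’ example can be approximated with plain set inclusion. The minimal Python sketch below assumes that each object is described by a set of asserted (property, value) pairs (hypothetical names, not the vo models): an object answers a requirement when its assertions cover it, and an object can supersede another when its assertions include all of the other’s. An OWL reasoner would instead derive such results by classification over class expressions; the subset test only mimics the outcome.

```python
# Hypothetical descriptions: each object is a set of asserted (property, value) pairs.
DESCRIPTIONS = {
    "loudspeaker": {("emits", "sound"), ("consumes", "audioStream")},
    "phone": {("emits", "sound"), ("consumes", "audioStream"), ("places", "call")},
    "lamp": {("emits", "light")},
}


def can_supersede(candidate: str, target: str, descriptions: dict = DESCRIPTIONS) -> bool:
    """True if the candidate asserts every property asserted for the target."""
    return descriptions[target] <= descriptions[candidate]


def answer(requirement: set, descriptions: dict = DESCRIPTIONS) -> list:
    """Objects whose asserted properties cover the requirement.

    Open-world reading: an object that does not assert a property is simply not
    returned; nothing is concluded about that property being false.
    """
    return sorted(name for name, props in descriptions.items() if requirement <= props)


print(answer({("emits", "sound")}))           # ['loudspeaker', 'phone']
print(can_supersede("phone", "loudspeaker"))  # True
```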

B. Different models for an accurate description

To tackle the various aspects of connected objects and to provide a description usable by applications or end users, our approach defines or reuses different types of ontologies. It creates a domain ontology defining the geo-location concepts associated with a connected object or its smart space. It uses generic ontologies to model the finite-state machine of any connected object and to describe the structures required or generated by this object. It reuses FOAF (Friend Of A Friend vocabulary specification), an ontology describing people and the links between them, to associate an owner with a connected object. It defines an application ontology that describes the overall scope of connected objects and interlinks the aforementioned ontologies. Finally, to deliver a description searchable by humans, it defines a taxonomy gathering a list of capabilities that represent the affordances of a set of commonly used connected objects (TV, heating system, etc.).

The rest of this section details the five models involved in our approach. All of them are prefixed with ‘vo’, for virtual object, recalling the work done in [3]. These models define concepts in the form of OWL classes and OWL properties. To produce the description of a connected object, these models are instantiated with data specific to this object, as summarized in Fig. 1.

The first ontology exposes the finite-state machine associated with a connected object. More precisely, this model, called vo-fsm, defines concepts such as ‘functionality’, ‘input’, ‘output’, ‘event’ or ‘state’ in the form of OWL classes, as well as rules, written in SWRL², such as ‘a functionality is available in some states, reacts to inputs and generates outputs whose values can potentially change the overall state of the object’. Object providers instantiate this model by declaring OWL individuals corresponding to the functionalities, states, etc. of the connected objects they release. By providing these details, this ontology enables applications not only to know which functionality of an object to use but also how to use it.

The second model provides a generic definition of the ‘structures’ required or generated by connected objects and has consequently been named vo-structures. This model is composed of a single RDF class called ‘structure’, defined by an RDF property called ‘hasComponent’ that allows a ‘structure’ to be composed of other ‘structures’. Object providers instantiate this model by declaring OWL classes and OWL properties that respectively specialize ‘structure’ and ‘hasComponent’. For instance, a phone provider can declare a ‘CallInput’ concept specializing ‘structure’ and having the property ‘hasCalleePhoneNumber’ specializing ‘hasComponent’.

¹ Linked Data, http://linkeddata.org/
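The vo-fsm rule quoted above (a functionality is available in some states, reacts to inputs and generates outputs that may change the state) and the composite structures of vo-structures can be mimicked with ordinary classes. The Python sketch below is a hypothetical in-memory analogue, not an OWL/SWRL encoding: the class names, the idle/inCall states and the placeCall/hangUp functionalities are invented for illustration; only ‘CallInput’ and its callee phone number component come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Structure:
    """vo-structures analogue: a structure composed of other structures (hasComponent)."""
    name: str
    components: List["Structure"] = field(default_factory=list)


@dataclass
class Functionality:
    """vo-fsm analogue of a functionality."""
    name: str
    available_in: Set[str]              # states in which it can be invoked
    input: Optional[Structure] = None   # structure it reacts to
    output: Optional[Structure] = None  # structure it generates
    next_state: Optional[str] = None    # state reached afterwards (None: unchanged)


@dataclass
class VirtualObject:
    name: str
    state: str
    functionalities: Dict[str, Functionality]

    def available(self) -> List[str]:
        """Functionalities that can be invoked in the current state."""
        return sorted(f.name for f in self.functionalities.values()
                      if self.state in f.available_in)

    def invoke(self, name: str) -> Optional[Structure]:
        """Invoke a functionality, apply its state change and return its output."""
        f = self.functionalities[name]
        if self.state not in f.available_in:
            raise ValueError(f"{name} is not available in state {self.state!r}")
        if f.next_state is not None:
            self.state = f.next_state
        return f.output


# 'CallInput' comes from the text; its component stands for the value reached
# through 'hasCalleePhoneNumber'. Everything else is made up.
call_input = Structure("CallInput", [Structure("CalleePhoneNumber")])
phone = VirtualObject("phone", "idle", {
    "placeCall": Functionality("placeCall", {"idle"}, input=call_input, next_state="inCall"),
    "hangUp": Functionality("hangUp", {"inCall"}, next_state="idle"),
})
print(phone.available())               # ['placeCall']
phone.invoke("placeCall")
print(phone.state, phone.available())  # inCall ['hangUp']
```

Knowing the current state is what lets an application decide not only which functionality to use but also when it can be used.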
This way, we allow object providers to use their own vocabularies (and their own classifications) to declare the structures used by their objects. The counterpart of this approach is that each connected object has its own representation of the structures it requires or generates. Therefore, additional processes are required to discover similarities between the structures of different connected objects. To cope with this challenge, we use a machine learning approach, detailed in Section V.

The third ontology contains geo-location concepts and defines common types of buildings as well as spatial shapes. Called vo-location, it allows smart spaces to be represented through geometrical forms such as points, segments, lines, circles, etc., and associates them with a ‘type’ (room, meeting room, house, mall, etc.). In addition, it defines properties such as ‘included in’ or ‘nearby’ that allow several places to be located relative to one another. Although this ontology corresponds to our representation of smart spaces, it contains concepts similar to those defined in the GeoNames³ ontology. To benefit from the ‘open data initiative’, vo-location uses the OWL built-in property ‘owl:sameAs’ to reference the similar concepts expressed in GeoNames.

Fig. 1. The different semantic models, interlinked through properties

² Semantic Web Rule Language, http://www.w3.org/Submission/SWRL/
³ GeoNames, http://www.geonames.org

The fourth model is a taxonomy of terms describing how objects are perceived by humans, in terms of the capabilities they offer. Called vo-capability, it gathers a list of commonly used objects, natively connected or not (as mentioned in [4], it is now affordable to add connectivity to any object), and associates them with a set of capabilities. Examples of objects are ‘audio systems’, ‘furniture’ (e.g., a ‘bed’) or ‘heating system’, while examples of capabilities are ‘turning the TV on or off’, ‘answering a call’ or ‘cooking’. Interlinked with vo-fsm, this model allows a ‘functionality’ to be mapped to the realization of some ‘capabilities’, and thus enables goal-oriented search (i.e., a search engine can return objects matching the realization of a goal expressed as a set of capabilities).

Finally, the last model we have developed contains a unique concept called ‘virtual object’, made of properties involving all the other models. The ‘virtual object’ concept is the central element in the description of a connected object, so this model has been named vo-core. In addition to this concept, vo-core defines properties that interlink all the above models:
• vo-fsm with vo-structures, through a property mapping ‘input’, ‘output’ and ‘event’ to ‘structure’;
• vo-fsm with vo-capability, through a mapping between ‘functionality’ and ‘capability’ (the realizes and isMadeOf properties in Fig. 1);
• vo-fsm with FOAF, to define the access rights of a ‘functionality’ (the accesses property in Fig. 1).
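The goal-oriented search enabled by the vo-fsm/vo-capability link, and the capability-based similarity introduced in Section V, can both be sketched over a plain mapping from objects to functionalities to realized capabilities. In this hypothetical Python sketch, the object, functionality and capability names are made up, and a goal is assumed to be satisfied by a single object realizing all of its capabilities.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical 'realizes' links: object -> functionality -> capabilities it realizes.
OBJECTS = {
    "tv-livingroom": {"switchPower": {"turnTVOnOrOff"}, "showStream": {"displayVideo"}},
    "photo-frame": {"showStream": {"displayVideo"}},
    "phone": {"acceptCall": {"answerCall"}, "ring": {"emitSound"}},
}


def capabilities_of(obj: str, objects: dict) -> set:
    """All capabilities realized by any functionality of the object."""
    caps = set()
    for realized in objects[obj].values():
        caps |= realized
    return caps


def goal_search(goal: set, objects: dict) -> list:
    """Goal-oriented search: objects whose capabilities cover the whole goal."""
    return sorted(obj for obj in objects if goal <= capabilities_of(obj, objects))


def similar_functionalities(objects: dict) -> list:
    """Pairs of (object, functionality) that realize a common capability."""
    by_capability = defaultdict(list)
    for obj, functionalities in objects.items():
        for func, caps in functionalities.items():
            for cap in caps:
                by_capability[cap].append((obj, func))
    pairs = set()
    for cap, users in by_capability.items():
        for a, b in combinations(sorted(users), 2):
            pairs.add((a, b, cap))
    return sorted(pairs)


print(goal_search({"displayVideo"}, OBJECTS))  # ['photo-frame', 'tv-livingroom']
```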

The description file associated with a connected object therefore consists of an ontology that imports vo-core (and, indirectly, all the other models) to declare the data specific to this object. It involves different stakeholders: the object provider (instantiating vo-structures and vo-fsm), the smart space administrator (instantiating vo-location) and the object owner (having a FOAF profile and instantiating vo-core).

V. CLUSTERING SIMILAR CONNECTED OBJECTS

While the previous section details the models required to describe connected objects, it also highlights that each stakeholder participating in this task will use their own vocabulary to name object functionalities, object structures or even smart spaces. For instance, the action of ‘forwarding a call’ could use different parameters or different naming conventions depending on which company released the phone. Ideally, this problem would be solved by agreements between all device manufacturers, resulting in a giant ontology defining a common understanding of all structures, functional names and places. This vision is impossible to achieve, as it would require a huge effort from all manufacturers to define this ontology and, once it is done, to migrate their data and structures to comply with it. Instead of requiring a global knowledge graph to be built, we rely on a different approach: a set of ontologies instantiated by each device manufacturer with its own concepts and properties. While this is a reasonable approach for creating connected object descriptions, it raises the problem of finding similarities between connected objects: if they use different vocabularies, how can we discover that two objects are equal or have similar functionalities or structures? If left unsolved, this problem prevents a search engine from offering features such as proposing objects that can be composed (the output of one providing data for the input of another) or that can supersede another one. We address this issue through an additional process that establishes two kinds of similarities:
• similarities between connected objects according to the common set of vo-capability concepts they realize;
• similarities between connected objects according to the structures they use or generate.

The first type of similarity is established thanks to the vo-capability taxonomy: if a capability appears in different functionalities, a similarity between these functionalities is established. The second type of similarity relies on a general ontology matching problem [7]: establishing a mapping between concepts defined in different ontologies. To discover such similarities, our approach applies machine-learning-based algorithms to semi-automatically create semantic mappings between concepts of different vo-structures instantiations. This approach is similar to the one used in [5], which consists, for each concept defined in a given ontology, of finding the most similar concept node in another ontology by computing a similarity measure based on a joint probability distribution. As the authors did in [5], for any two concepts defined in different vo-structures, we calculate a joint probability distribution (JPD) that we use to compute a similarity measure. According to them, “for any two concepts A and B, the joint distribution consists of the four probabilities: $P(A, B)$, $P(A, \bar{B})$, $P(\bar{A}, B)$, and $P(\bar{A}, \bar{B})$, where a term such as $P(A, \bar{B})$ is the probability that an instance in the domain belongs to concept A but not to concept B”. To compute the joint distribution of concepts A and B, the authors in [5] approximate a term such as $P(A, B)$ as “the fraction of individuals belonging to both A and B”. Hence, they classify the individuals of both A and B to obtain those belonging to $A \cap B$.

In our study, we redefine the approximation of a term such as $P(A, B)$ using two different approaches that avoid the use of individuals. Recall that the similarities we seek concern concepts belonging to different vo-structures, and that such models contain no OWL individuals: any concept in vo-structures is defined through the OWL properties that compose it. To obtain such an approximation, we therefore develop several descriptors, each of which produces a textual representation of an OWL class and of its complement (considering an ontology as a universe made of concepts, the complement of a concept consists of all the other concepts). For instance, when analyzed by a descriptor $d$, a concept $GPS$ defined in an ontology $O$ produces $R_d(GPS)$ and $R_d(\overline{GPS})$, with $O = GPS \cup \overline{GPS}$. Once this is performed on each concept of the different vo-structures, we approximate $P(A, B)$ as the probability that A expresses the same meaning as B, based on their respective representations. To compute $P(A, B)$, we use a document classification technique to obtain the probability that $R_d(A)$ expresses the same concepts as $R_d(B)$, and we couple it with topic modeling algorithms to find the fraction of ‘topics’ shared by $R_d(A)$ and $R_d(B)$. In other words, we obtain $P_1(A, B)$ by training a Naive Bayes classifier with $R_d(B)$ and $R_d(\bar{B})$ and by testing $R_d(A)$ on it.
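The following Python sketch illustrates how a descriptor and a Naive Bayes classifier could yield $P_1(A, B)$. It rests on several assumptions not stated in the text: the descriptor $d$ simply splits the camelCase names of a class and of its properties into words, the complement representation concatenates those of all other concepts of the same ontology, and the classifier is a multinomial Naive Bayes with Laplace smoothing and equal class priors whose posterior for class B is taken as $P_1(A, B)$. The two vendor ontologies are invented.

```python
import math
import re
from collections import Counter


def tokens(identifier: str) -> list:
    """Split camelCase identifiers (e.g. 'hasCalleePhoneNumber') into lower-case words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)]


def describe(concept: str, ontology: dict) -> list:
    """Hypothetical descriptor d: R_d(C) = words of the class name and of its properties."""
    words = tokens(concept)
    for prop in ontology[concept]:
        words += tokens(prop)
    return words


def describe_complement(concept: str, ontology: dict) -> list:
    """R_d(not C): concatenated representations of all other concepts of the ontology."""
    words = []
    for other in ontology:
        if other != concept:
            words += describe(other, ontology)
    return words


def p1(rep_a: list, rep_b: list, rep_not_b: list, alpha: float = 1.0) -> float:
    """Multinomial Naive Bayes trained on R_d(B) vs R_d(not B), tested on R_d(A).

    Returns the posterior of class B, assuming equal priors and Laplace smoothing.
    """
    classes = {"B": Counter(rep_b), "notB": Counter(rep_not_b)}
    vocabulary = set(rep_a) | set(rep_b) | set(rep_not_b)
    log_scores = {}
    for label, counts in classes.items():
        total = sum(counts.values()) + alpha * len(vocabulary)
        log_scores[label] = sum(math.log((counts[w] + alpha) / total) for w in rep_a)
    # Turn the two log-likelihoods into a probability (log-sum-exp trick).
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["B"] / sum(weights.values())


# Two made-up vo-structures instantiations from different providers.
vendor1 = {"CallInput": ["hasCalleePhoneNumber"], "Position": ["hasLatitude", "hasLongitude"]}
vendor2 = {"DialRequest": ["hasPhoneNumber", "hasCallee"],
           "Location": ["hasLatitude", "hasLongitude", "hasAltitude"]}

rep_a = describe("CallInput", vendor1)
print(round(p1(rep_a, describe("DialRequest", vendor2),
               describe_complement("DialRequest", vendor2)), 3))  # 0.857
print(round(p1(rep_a, describe("Location", vendor2),
               describe_complement("Location", vendor2)), 3))     # 0.143
```

In this toy example, ‘CallInput’ is far more likely to match ‘DialRequest’ than ‘Location’, even though the two providers name the phone number property differently.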
Then, we use an implementation of Hierarchical Latent Dirichlet Allocation (HLDA) to compute $P_2(A, B)$ by extracting the topics of $R_d(B)$ and $R_d(\bar{B})$ and by performing topic inference on $R_d(A)$. We finally obtain $P(A, B)$ as the average of these two probabilities, $P(A, B) = \frac{1}{2}\left(P_1(A, B) + P_2(A, B)\right)$. We reapply the same principle to obtain $P(A, \bar{B})$, $P(\bar{A}, B)$ and $P(\bar{A}, \bar{B})$. Once the joint probability distribution has been computed, the idea is to use several similarity measures to determine how close two concepts are to each other. So far, we have implemented only one measure, based on the well-known Jaccard coefficient [17], but we are defining others that take into account the specificities of the addressed domain (i.e., connected objects). Fig. 2 provides an overview of the process involved in the discovery of structure similarities. Although obtaining the joint probability distribution may take time, this computation is performed when a connected object joins a smart space. It is therefore decoupled from the processing of incoming requests.
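Assuming the usual way of expressing the Jaccard coefficient through a joint distribution, $\mathrm{Jaccard}(A, B) = \frac{P(A, B)}{P(A, B) + P(A, \bar{B}) + P(\bar{A}, B)}$ (the exact variant used here is not specified in the text), the averaging step and the measure can be sketched in Python as follows. The (P1, P2) values are made up.

```python
from dataclasses import dataclass


@dataclass
class JointDistribution:
    """The four terms of the JPD of two concepts A and B."""
    both: float     # P(A, B)
    a_only: float   # P(A, not B)
    b_only: float   # P(not A, B)
    neither: float  # P(not A, not B)


def average(p1: float, p2: float) -> float:
    """Combine the classifier estimate P1 and the topic-model estimate P2."""
    return (p1 + p2) / 2


def jaccard(jpd: JointDistribution) -> float:
    """P(A and B) / P(A or B) computed from the joint distribution.

    A common rescaling of the four terms cancels out, so independently
    estimated terms need not be normalized to sum to 1 beforehand.
    """
    union = jpd.both + jpd.a_only + jpd.b_only
    return jpd.both / union if union > 0 else 0.0


# Made-up (P1, P2) estimates for each of the four terms.
jpd = JointDistribution(
    both=average(0.8, 0.6),
    a_only=average(0.2, 0.2),
    b_only=average(0.1, 0.3),
    neither=average(0.9, 0.7),
)
print(round(jaccard(jpd), 3))  # 0.636
```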

Fig. 2. Computing joint probability distribution to find similarities in different vo-structures

VI. PREDICTING THE CONTEXT OF SEARCH

As the ‘Web of Things’ becomes more and more popular, some claim that billions of objects will be connected in the near future [6]. This scale naturally raises issues for the related discovery mechanisms. Thus, while the ability to search for connected objects requires semantic descriptions to be created (Section IV) and analyzed (Section V), we believe that it also requires search methods to be adapted according to the context associated with each search request. Indeed, the ‘Web of Things’ allows scenarios in which either users or applications interact with connected objects. These two kinds of requesters each come with their own specificities.
• Applications have high computing power and need accurate results, which they are able to process.
• Humans have limited computing power and memory and need quick results, even approximate ones.
To guarantee satisfactory results, the underlying search mechanisms must be able to process different types of requests efficiently, according to the requester’s context, which defines a trade-off between accuracy and speed. By predicting this context, our approach selects the most appropriate search algorithms and tunes a search so as to provide results matching the requester’s needs in terms of speed and accuracy. To determine a context of search, our approach answers the following two questions.
• Can we determine the type (application or user) of the requester?
• Are the connected objects of a given smart space organized into a small number of clusters?
It then computes a score giving an indication of the type of search expected by the requester. Fig. 3 sums up the overall prediction process, described hereafter.

To predict the context of search, we also use the organizational aspect of a smart space as defined in [3], where each smart space comes with a search engine module, all these engines being federated behind a ‘global’ one. When receiving a request, we use a local context analyzer (LCA) to extract a ‘requester id’ and check whether this requester has a known search profile (i.e., whether the requester has already searched for something). To do so, we use an internal database to retrieve the types of previous searches requested by the requester. It is worth noting that we do not intend to retrieve the whole request/response history associated with the ‘requester id’. Our main and only interest is to query a table of a local database in order to compute a probability for the type of search, based on the historical request types associated with a ‘requester id’. If the ‘requester id’ is unknown, we use a global context-of-search analyzer (GCA) that asks the smart spaces in the geographical neighborhood whether they have a search profile associated with the ‘requester id’. The reason for favoring geographically close smart spaces is the assumption that they may have (or even share) the same connected objects, with a good probability that users of a smart space A are granted rights in a smart space B (e.g., two nearby offices in a company may both have a printer, a coffee machine, phones, webcams, etc.). In addition to searching the history associated with a requester, the LCA uses clusters of connected objects, based on the similarities discovered by the algorithms detailed in Section V, to balance the previously obtained probability with a score indicating how the connected objects are organized in a given smart space (many objects, objects of the same type, etc.). Our approach currently distinguishes four cases.
1) A low number of connected objects in the smart space. We then return a score favoring an accurate search (searching among a small set of objects does not take long).
2) A high number of connected objects. To speed up the search, we return a score favoring a probabilistic search.
3) A low number of clusters, meaning that several connected objects belong to the same 'category' of objects. We then return a score favoring an accurate search, to better discriminate between objects of the same category.
4) A number of clusters close to the number of connected objects. We then return a score favoring a probabilistic search.
Finally, the system computes an overall score from the values obtained above. This score defines our 'context of search' and enables the appropriate algorithms to be selected.

Fig. 3. Predicting context of search

VII. EARLY PROTOTYPES

Although this work is ongoing, some prototypes have already been built to support a user navigating between WoT-enabled smart spaces [3] and searching for connected objects. These prototypes were built on an existing base configuration of approximately 20 connected objects (including phones, webcams, TVs and pressure sensors) dispersed across 4 different smart spaces (all in different locations of our offices). The following tools have been developed:
• A set of Flash interfaces allowing semantic profiles for connected objects to be easily created (see Fig. 4).
• An out-of-the-box tool discovering similarities between concepts of different ontologies. This tool comprises a set of descriptors used to run the different machine learning algorithms previously mentioned.
• A search engine localized in each instrumented smart space, running on a Web server and using Pellet [14] to build processable graphs from the semantic descriptions of connected objects.
• An additional search engine outside any smart space, running on a Web server and acting as a proxy between the different localized search engines.
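The context-of-search prediction of Section VI, which would sit in front of the localized and proxy search engines listed above, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the prototype's code: the SmartSpace class, the thresholds (50 objects, cluster-to-object ratios of 0.2 and 0.8), the 0.5 prior for a requester unknown everywhere, and the equal-weight combination of the two scores are all hypothetical, since the paper does not specify how the overall score is computed. Scores close to 1 favor precise reasoning; scores close to 0 favor a probabilistic search.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PRECISE = "precise"              # accurate reasoning, favored for applications
PROBABILISTIC = "probabilistic"  # fast approximate search, favored for humans


@dataclass
class SmartSpace:
    """Hypothetical model of a smart space and its local search engine."""
    name: str
    n_objects: int
    n_clusters: int
    # requester id -> types of search previously issued (PRECISE or PROBABILISTIC)
    history: Dict[str, List[str]] = field(default_factory=dict)
    # geographically close smart spaces that the GCA may query
    neighbours: List["SmartSpace"] = field(default_factory=list)


def precise_ratio(types: List[str]) -> float:
    """Fraction of past searches that were of the PRECISE type."""
    return sum(t == PRECISE for t in types) / len(types)


def history_score(space: SmartSpace, requester_id: str) -> float:
    """LCA step: probability of a precise search from the local history,
    falling back to neighbouring smart spaces (GCA) for unknown requesters."""
    local = space.history.get(requester_id)
    if local:
        return precise_ratio(local)
    pooled = [t for n in space.neighbours for t in n.history.get(requester_id, [])]
    return precise_ratio(pooled) if pooled else 0.5  # assumed prior: no information


def organization_score(n_objects: int, n_clusters: int, small_space: int = 50,
                       few_ratio: float = 0.2, many_ratio: float = 0.8) -> float:
    """Score in [0, 1] from the four cases; higher favors a precise search."""
    size = 1.0 if n_objects <= small_space else 0.0  # case 1 (few) vs case 2 (many objects)
    ratio = n_clusters / n_objects if n_objects else 0.0
    if ratio <= few_ratio:
        clusters = 1.0   # case 3: few clusters, objects share categories
    elif ratio >= many_ratio:
        clusters = 0.0   # case 4: roughly one cluster per object
    else:
        clusters = 0.5   # in between: no preference (assumption)
    return (size + clusters) / 2


def context_of_search(space: SmartSpace, requester_id: str,
                      weight: float = 0.5) -> Tuple[float, str]:
    """Overall score (weighted mean, an assumption) and the algorithm family it selects."""
    score = (weight * history_score(space, requester_id)
             + (1 - weight) * organization_score(space.n_objects, space.n_clusters))
    return score, (PRECISE if score >= 0.5 else PROBABILISTIC)


if __name__ == "__main__":
    office_b = SmartSpace("office-B", n_objects=8, n_clusters=3,
                          history={"alice": [PRECISE, PRECISE, PROBABILISTIC, PRECISE]})
    office_a = SmartSpace("office-A", n_objects=4000, n_clusters=3500,
                          neighbours=[office_b])
    print(context_of_search(office_a, "alice"))  # (0.375, 'probabilistic')
    print(context_of_search(office_b, "alice"))  # (0.75, 'precise')
```

In this example, "alice" is unknown in office-A, so her history is borrowed from the neighbouring office-B; the large, weakly clustered office-A still tips the overall score toward a probabilistic search. A weighted mean keeps both signals on the same [0, 1] scale, and any other aggregation could replace it without changing the interface.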

Fig. 4. Web-based tools to create vo-* ontologies. Log on to the system with a FOAF profile, add functionalities, add owner, add location(s), etc.

We have run tests that validate the rationale of using semantic profiles to retrieve connected objects matching a set of requirements. These tests were based on the smart spaces instrumented in our offices and covered both the case of a user wanting to compose different objects together and the case of an application requiring some objects in order to be executed. The first test was to retrieve objects able to consume a video stream produced by a webcam connected to the Web (see Fig. 5-a). By processing the semantic profiles of the different connected objects, the system proposed only 3 of them: a TV located in the same smart space as the webcam, a TV located in a different smart space, and a photo frame. All of them were actually able to consume this video stream. The second test involved an application requiring different connected objects in order to be executed (see Fig. 5-b). The application logic was to blink a lamp and to display a message on a screen upon receiving an incoming call. The application therefore required an object able to 'blink', another able to 'display' and a last one able to 'fire an event'. The different requests returned the following results:
• The request "objects able to blink" returns all the lamps of the different smart spaces, as well as a screen configured to blink by being turned on and off several times.
• The request "objects able to display" returns the different screens and photo frames contained in all the smart spaces.
• The request "objects able to fire an event" returns the phones of each smart space, as well as a mailbox and a chair coupled with a pressure sensor.
In every run of these tests, the returned results matched what was expected.

Fig. 5. (a), (b) Connected objects proposed following a search request

Testing our approach also requires a large number of connected objects to be semantically described, as well as a large number of users owning and sharing connected objects. As few instrumented smart spaces exist in our offices, we are implementing additional tools to simulate millions of connected objects dispersed among dozens of smart spaces (most of which are simulated as well). Some of these tools have already been created, while others are still in development. For instance, to obtain a set of FOAF profiles describing people, we used our corporate directory, whose APIs return information about any employee: their hierarchy, peers and subordinates. Based on these data, we used the Jena API (Jena Semantic Web Framework, http://jena.sourceforge.net/) to create one FOAF profile per employee. To simulate smart spaces, we used the Google Geocoding API (http://code.google.com/intl/fr-FR/apis/maps/documentation/geocoding/) with the information retrieved from our corporate directory to build a vo-location ontology containing a set of places. The simulation of connected object descriptions relies on tools that are still in development. For instance, we are developing a program that assigns a random number of functionalities, states and inputs per functionality to a connected object in order to obtain its vo-fsm description. Once completed, all these tools will enable the creation of different instances of vo-core and thus make it possible to run tests validating all the technical contributions of this paper.

VIII. CONCLUSION

With the emergence of Web-enabled objects, the success of the 'Web of Things' will rely on the development of efficient techniques for searching connected objects. We have described an approach based on the creation of semantic profiles for connected objects, the use of machine learning techniques to match the ontological structures of different objects, and the use of a 'context of search' to predict which type of algorithm to run when a request is received. Our approach builds on the Semantic Web and uses different kinds of ontologies to provide an accurate and flexible description of connected objects. It also uses well-founded notions of semantic similarity, expressed in terms of the joint probability distribution of the concepts involved. Our use of machine learning follows ideas similar to those of [5], which makes our approach easily extensible. Finally, we have introduced a 'context of search' and described how to predict it from the type of requester issuing a request and the organization of a smart space. First experiments have validated the rationale of using semantic profiles to retrieve connected objects matching a set of requirements.

Our main line of future research involves refining our semantic profiles by conducting surveys of end-users to better understand how they perceive their objects. It also involves extending our techniques for predicting the 'context of search' by using additional knowledge contributed by requesters belonging to the same 'social network'. Finally, we plan to follow the deployment of smart spaces such as [3] in order to run tests in real situations. In parallel with refining each of these axes, we are finishing the development of our simulation tools to validate our approach.

REFERENCES

[1] K. Aberer, M. Hauswirth, and A. Salehi. Infrastructure for data processing in large-scale interconnected sensor networks. In Proceedings of the 2007 International Conference on Mobile Data Management, pages 198–205, Washington, DC, USA, 2007. IEEE Computer Society.
[2] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. http://www.scientificamerican.com/article.cfm?id=the-semantic-web, 2001 (accessed 20 May 2009).
[3] M. Boussard, B. Christophe, O. Le Berre, and V. Toubiana. Providing user support in Web-of-Things enabled smart spaces. In D. Guinard, V. Trifa, and E. Wilde, editors, WoT, page 11. ACM, 2011.
[4] B. Christophe, M. Boussard, M. Lu, A. Pastor, and V. Toubiana. The Web of Things vision: Things as a service and interaction patterns. Bell Labs Technical Journal, 16(1):55–62, 2011.
[5] A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Ontology matching: A machine learning approach. In Handbook on Ontologies in Information Systems, pages 397–416. Springer, 2003.
[6] Ericsson white paper. More than 50 billion connected devices, 2011.
[7] Y. Kalfoglou and W. M. Schorlemmer. Ontology mapping: The state of the art. In Y. Kalfoglou, W. M. Schorlemmer, A. P. Sheth, S. Staab, and M. Uschold, editors, Semantic Interoperability and Integration, volume 04391 of Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany, 2005.
[8] K.-K. Yap, V. Srinivasan, and M. Motani. MAX: Human-centric search of the physical world. In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05), pages 166–179, 2005.
[9] V. Kolovski, B. Parsia, Y. Katz, and J. Hendler. Representing Web service policies in OWL-DL. In International Semantic Web Conference (ISWC), pages 6–10, 2005.
[10] D. Martin, M. Paolucci, S. McIlraith, M. Burstein, D. McDermott, D. McGuinness, B. Parsia, T. Payne, M. Sabou, M. Solanki, N. Srinivasan, and K. Sycara. Bringing Semantics to Web Services: The OWL-S Approach. In J. Cardoso and A. Sheth, editors, SWSWPC 2004, volume 3387 of LNCS, pages 26–42. Springer, 2004.
[11] B. Ostermaier, K. Römer, F. Mattern, M. Fahrmair, and W. Kellerer. A real-time search engine for the Web of Things. In Proceedings of Internet of Things 2010 International Conference (IoT 2010), Tokyo, Japan, Nov. 2010.
[12] M. P. Papazoglou, P. Traverso, S. Dustdar, and F. Leymann. Service-oriented computing. Communications of the ACM, 46:25–28, 2003.
[13] M. Ruta, T. D. Noia, E. D. Sciascio, F. Scioscia, and E. Tinelli. A ubiquitous knowledge-based system to enable RFID object discovery in smart environments. In Q. Z. Sheng, Z. Maamar, S. Zeadally, and M. Cameron, editors, IWRT, pages 87–100. INSTICC Press, 2008.
[14] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Web Semant., 5:51–53, June 2007.
[15] C. C. Tan, B. Sheng, H. Wang, and Q. Li. Microsearch: A search engine for embedded devices used in pervasive computing. ACM Trans. Embed. Comput. Syst., 9:43:1–43:29, April 2010.
[16] H. Wang, C. C. Tan, and Q. Li. Snoogle: A search engine for pervasive environments. IEEE Trans. Parallel Distrib. Syst., 21:1188–1202, August 2010.
[17] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington, MA, 3rd edition, 2011.
[18] T. Yan, D. Ganesan, and R. Manmatha. Distributed image search in camera sensor networks. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems, SenSys '08, pages 155–168, New York, NY, USA, 2008. ACM.