Retrieving and Materializing Data in Hybrid Mediators

International Journal of Applied Engineering Research ISSN 0973-4562 Volume 11, Number 3 (2016) pp 2128-2134 © Research India Publications. http://www.ripublication.com

Samir Anter, Ahmed Zellou, Ali Idri
Software Project Management Team, Computer Science and Systems Analysis, National Higher School (ENSIAS), Mohammed V University, Rabat, Morocco.

Abstract

With the emergence of the new generation of information and telecommunication technologies, the mass of information produced by individuals and enterprises has grown considerably. To manage this diversity of information, integration systems have been proposed. Among them are the hybrid information integration systems, which materialize part of the data in a local database while integrating the other part virtually. Since the materialized part is organized as views, algorithms are needed to build these views. One of the most interesting is k-schema, which organizes the attributes into a set of views while assigning each attribute to a single view. This choice can prevent the loading of data that are highly requested by users, while loading other data that are rarely requested. In this paper, we propose a new algorithm to this end, in which the same attribute may be assigned to several views. We also propose new functions for calculating the dependencies between attributes.

Keywords: Views Creation, NK-Schema, Hybrid information integration systems, Materialization

Introduction

The constant evolution of networks over the last two decades has led to a popularization of information in both quantitative and qualitative terms. As a consequence, heterogeneous information is now stored in autonomous and distributed sources. According to a study by IBM in 2007, 79% of companies have more than two sources and 25% more than fifteen [1]. Today's information systems are therefore composed of several independently produced sources that are generally autonomous, heterogeneous and distributed [2]. The diversity of these sources is one of the main difficulties encountered by users. Thus, it becomes necessary to build access systems that make the aspects of distribution, autonomy and heterogeneity transparent.

Different approaches have been proposed for this purpose. Among them are the virtual integration systems [3] and the hybrid integration systems [4][5]. The first ('Fig. 1') is defined as "an approach to providing an intermediate tool between users or applications on one hand, and a set of autonomous, heterogeneous, distributed and scalable information sources on the other hand. This tool offers a transparent access service to sources through a unique interface and a single query language" [6].

[Figure 1 shows a mediator with a query-processing component receiving queries and returning results, sitting above one wrapper per source (Source 1, Source 2, Source 3).]

Figure 1: Architecture of the virtual integration system

This approach gives the user the illusion of querying a homogeneous and centralized system, sparing him the task of finding the sources relevant to his queries and of querying them one by one according to their particular specificities. For this purpose, a unified schema, called global or virtual, describing the sources is provided. It serves as a support for query formulation. Queries are subsequently decomposed into sub-queries that are executed on the information sources containing the relevant information. The correspondence between the global schema and the local schemas of the sources is provided, at the mediator level, by mapping relationships.
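To make this mediation flow concrete, the following minimal Python sketch shows how a mediator of this kind might decompose a global-schema query into per-source sub-queries using mapping relationships. The query representation, mapping table and function names are illustrative assumptions, not part of any system described in this paper.

# Minimal sketch of global-to-local query decomposition in a virtual
# mediator. All names and structures here are illustrative assumptions.

# Mapping relationships: each global-schema attribute is mapped to the
# local sources (and local attribute names) that can provide it.
MAPPINGS = {
    "price": [("source1", "room_price"), ("source2", "rate")],
    "stars": [("source1", "category")],
    "make":  [("source3", "car_make")],
}

def decompose(query_attributes):
    """Split a global query (a list of attributes) into one sub-query
    per source, expressed in that source's local attribute names."""
    sub_queries = {}
    for attr in query_attributes:
        for source, local_attr in MAPPINGS.get(attr, []):
            sub_queries.setdefault(source, set()).add(local_attr)
    return sub_queries

# A query on the global schema is routed to the relevant sources only,
# e.g. {'source1': {'room_price', 'category'}, 'source2': {'rate'}}.
print(decompose(["price", "stars"]))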


Concerning the wrappers, each one translates queries expressed in the query language used by the mediator into queries expressed in the query language used by its source. In addition, it transforms the responses expressed in the data model used by the source into the data model used by the mediator. This system has the advantage of providing up-to-date results, because the information is extracted directly from the sources. However, it suffers from certain defects. On the one hand, the response time is rather high, mainly because of the time spent retrieving information from the remote sources in response to user queries. On the other hand, the sources are not always available, in which case the queries asked on them cannot be satisfied. To remedy this, it is possible to load all the data into a database local to the mediator. Even though this choice would significantly reduce the response time of queries, it is inappropriate for several reasons. On the one hand, the size of the integrated data is very large, making its storage impossible. On the other hand, the freshness of the data, which is one of the most important factors in integration systems, is threatened: it is impossible to update this large mass of data each time it changes in the sources. For these and other reasons, the hybrid information integration systems were proposed ('Fig. 2'). These can be defined as "systems where a part of data is queried on demand as in the virtual approach, while another part is extracted, filtered and stored in a local database" [4], or as "systems in which some attributes are materialized and others are virtualized" [7].

[Figure 2 shows a mediator containing a query-processing component and a local database, receiving queries and returning results, above one wrapper per source (Source 1, Source 2, Source 3).]

Figure 2: Architecture of the hybrid information integration system
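The following minimal Python sketch illustrates the query-answering idea of such a hybrid mediator: serve a query from the materialized part when possible, and fall back to the virtual (wrapper) path otherwise. The storage layout and all names are illustrative assumptions, not the paper's implementation.

# Minimal sketch of query answering in a hybrid mediator: try the
# materialized part first, otherwise take the virtual path.

MATERIALIZED_VIEWS = {
    frozenset({"room_type", "stars", "price"}): [
        {"room_type": "single", "stars": "3*", "price": 450},
    ],
}

def query_wrappers(attributes):
    # Placeholder for the virtual path: decompose the query, send the
    # sub-queries to the wrappers and merge their answers.
    raise NotImplementedError("dispatch to remote sources")

def answer(attributes):
    needed = frozenset(attributes)
    for view_schema, rows in MATERIALIZED_VIEWS.items():
        if needed <= view_schema:
            # The query is fully satisfied in the materialized part:
            # no remote source is contacted, so the response is fast.
            return [{a: row[a] for a in needed} for row in rows]
    return query_wrappers(needed)

print(answer(["room_type", "price"]))  # served from the local database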

The choice of the part to materialize in a hybrid information integration system is a crucial task. Indeed, a system where this part is well chosen is a system where a large number of queries are satisfied in the materialized part, which reduces the query response time. Thus, the problem to which we provide a solution in this paper is: how should this part to materialize be created? To do so, we rely on the assumption that, among the data manipulated by the system, a pattern is present in the user queries, i.e. some classes are queried more frequently than others. Thus, it is very useful to extract these classes and then organize them as candidate views for materialization. The remainder of this paper is divided into six sections. The second section presents some related works, and a discussion is presented in the third section. The fourth section presents our approach, followed by an evaluation in the fifth section, before ending with a conclusion and future works.

Related Work

The problem of creating the views to materialize in a hybrid information integration system is a crucial task for the quality of the latter. Among the approaches that have tried to propose a solution to this problem are Ariadne [8,9,10] and PHIS [11,12].

Ariadne

Ariadne is a hybrid information integration system. It supports sources of semi-structured data in a web environment. The approach proposed in this system [8] tries to identify the data to materialize based on the distribution of user queries, the query cost, the cost of keeping the materialized part up to date, and the frequency of these updates. An algorithm called CM (Cluster and Merge) [8,9,10] was proposed to extract the most requested data. It classifies queries to determine the categories of data in which users are interested, before merging these categories in order to make them compact. To do so, the algorithm determines, for each query, the classes and subclasses of interest. For example, in a system dedicated to tourists seeking information about hotels and rental cars, consider the following query:

Select price, year
From car
Where make = "Ford" or make = "Audi"

In this query, the class of interest is the class "car", and the subclasses of interest are "Audi cars" and "Ford cars". CM then extracts, for each class, the groups of attributes requested and their request frequencies (price and year in this case). In a final step, CM merges the groups of attributes of each class, before merging the classes themselves to construct compact ones. The merging is accomplished by the procedure merge-class().
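The paper does not spell out merge-class(), so the following Python sketch only illustrates the cluster-and-merge idea under stated assumptions: each query is reduced to a (class of interest, requested attribute set) pair, and merging unions the attribute groups of a class whenever they overlap.

from collections import defaultdict

# Illustrative query log: each query reduced to its class of interest
# and the set of attributes it requests.
QUERY_LOG = [
    ("car", {"price", "year"}),
    ("car", {"price", "model"}),
    ("hotel", {"stars", "price"}),
    ("hotel", {"stars", "room_type"}),
]

def cluster(log):
    """Cluster step: group the requested attribute sets per class."""
    groups = defaultdict(list)
    for cls, attrs in log:
        groups[cls].append(set(attrs))
    return groups

def merge(groups, overlap=1):
    """Merge step (a simplified stand-in for merge-class): union the
    attribute groups of a class when they share >= `overlap` attributes."""
    merged = []
    for g in groups:
        for m in merged:
            if len(m & g) >= overlap:
                m |= g
                break
        else:
            merged.append(set(g))
    return merged

clusters = cluster(QUERY_LOG)
compact = {cls: merge(gs) for cls, gs in clusters.items()}
print(compact)  # e.g. {'car': [{'price', 'year', 'model'}], 'hotel': [...]}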

This approach has some drawbacks. First, data that rarely figured in user queries may end up in the final classes. Second, the number of clusters obtained can be very large, which makes merging them very difficult, even impossible. Third, the size of the created views can be so large that it exceeds the space dedicated to materialization. To remedy this, the PHIS approach was proposed.

PHIS

Generally, in a mediation system, a global schema representing the domain of use is provided, and it is in terms of this schema that user queries are expressed. By analyzing these queries, PHIS [12] determines, among all the attributes of the global schema, those in which users are interested, i.e. those that appear very often in the queries.


These attributes are then organized into classes (view schemas). For this purpose, an algorithm called k-schema [12,13] was proposed. It relies on the dependency between attributes in order to divide them into compact classes (view schemas): two attributes whose degree of dependency is very high are candidates to appear in a single view. This degree represents the number of times that two attributes have appeared together in the same query; thus, we say that two attributes are dependent if they appear together very often. For each pair of attributes X and Y in the set of attributes of interest, the degree of dependency is defined by:

\varphi(X, Y) = |\{ q \in Q \mid X \in q \text{ and } Y \in q \}|, where Q is the set of user queries    (1)

The degrees of dependency between all attributes of interest are represented in the attribute-attribute dependency matrix, a square matrix M of size N defined by:

M = (m_{ij}) \text{ such that } m_{ij} = \begin{cases} \varphi(A_i, A_j) & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases}    (2)

Based on the dependency between attributes, it also becomes possible to calculate the dependency between an attribute and a class. Thus, for a class C = {X_1, X_2, ..., X_N} and an attribute X, the degree of dependency of the latter to C is defined as the average dependency to the members of C:

\mu(X, C) = \frac{1}{N} \sum_{i=1}^{N} \varphi(X, X_i)    (3)

As the goal of the algorithm proposed in PHIS is the construction of compact views, a function that calculates the degree of within-cluster dependency is also proposed. It is defined for each class C = {X_1, X_2, ..., X_N} as:

d(C) = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varphi(X_i, X_j)    (4)

Based on these functions, the algorithm k-schema is proposed. It is an iterative algorithm that divides the attributes into k clusters (schemas) while maximizing the sum of within-cluster dependencies, expressed as:

t = \sum_{i=1}^{k} d(C_i)    (5)

K-schema has three phases: (i) define the number k of classes and initialize each one with an attribute, choosing attributes that are as independent of each other as possible, in order to optimize the algorithm; (ii) for each attribute, calculate its degree of dependency to the different classes and assign it to the class on which it is most dependent; (iii) stop if the sum t of within-cluster dependencies can no longer increase, otherwise return to step (ii).

The result obtained in this step is a set of compact view schemas, provided the right value of the number of views k was chosen. PHIS calculates the value of k from the average number of attributes appearing in user queries, using the following formula:

k = E(N / w)    (6)

where N is the number of attributes of interest, w the average number of attributes per query, and E(x) the integer part of x.

After constructing the view schemas, it remains to assign values to the attributes of each one, in order to build the candidate views for materialization. To do so, a function called extract-values was implemented; it extracts, for each attribute, its most frequent values [11,12,13]. These values are then assigned to the attributes, constructing instances of each schema. However, the number of instances is often high, and it becomes necessary to remove some of them. To do so, a degree of attribute-attribute-values dependency is associated with each pair of values taken by each pair of attributes [11,12,13]. It is used to define the within-instance dependency, on which the selection of the instances to discard is based. The degree of attribute-attribute-values dependency is calculated for each pair of values taken by a couple of attributes (X, Y) by the following function:

\varphi_{X,Y} : V_X \times V_Y \to \mathbb{N}    (7)

where \varphi_{X,Y}(V_i, V_j) is the frequency with which the attributes X and Y have appeared in the same query with the values V_i and V_j respectively. From the sets V_X and V_Y of values of the attributes X and Y, we construct an N_{V_X} \times N_{V_Y} matrix T = (t_{ij}), where rows represent the values taken by the attribute X and columns the values taken by the attribute Y, and where:

t_{ij} = \varphi_{X,Y}(V_i, V_j)    (8)

In a last step, it remains only to define the degree of within-instance dependency. Thus, for an instance I = {X_1 = V_1, X_2 = V_2, ..., X_N = V_N}, the within-instance dependency is expressed by:

\omega(I) = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varphi_{X_i, X_j}(V_i, V_j)    (9)

The instances that are kept are those whose within-instance dependency is greater than a threshold called instance-dependency.
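As a concrete illustration, here is a minimal Python sketch of the PHIS pipeline described above, following formulas (1) to (6) but simplifying freely; the query log and the naive class seeding are illustrative assumptions, not the published implementation.

from itertools import combinations

# Illustrative query log: each query is the set of attributes it uses.
QUERIES = [
    {"make", "model", "price"},
    {"make", "price"},
    {"room_type", "stars"},
    {"room_type", "stars", "price"},
]
ATTRS = sorted(set().union(*QUERIES))

def phi(x, y):
    """Formula (1): number of queries where x and y appear together."""
    return sum(1 for q in QUERIES if x in q and y in q)

def mu(x, cls):
    """Formula (3): average dependency of attribute x to class cls."""
    return sum(phi(x, a) for a in cls) / len(cls)

def d(cls):
    """Formula (4): within-cluster dependency of a class."""
    n = len(cls)
    if n < 2:
        return 0.0
    return 2 / (n * (n - 1)) * sum(phi(a, b) for a, b in combinations(cls, 2))

def k_schema(attrs, k):
    """Simplified k-schema: seed k classes, then iteratively assign each
    remaining attribute to the class on which it is most dependent,
    stopping when the total t of formula (5) no longer increases."""
    classes = [[a] for a in attrs[:k]]      # naive seeding; PHIS seeds
    rest = attrs[k:]                        # with mutually independent attrs
    t_old = -1.0
    while True:
        for x in rest:
            for c in classes:               # detach x before re-assigning it
                if x in c:
                    c.remove(x)
            best = max(classes, key=lambda c: mu(x, c) if c else 0.0)
            best.append(x)
        t_new = sum(d(c) for c in classes)  # formula (5)
        if t_new <= t_old:                  # phase (iii): no improvement
            return classes
        t_old = t_new

w = sum(len(q) for q in QUERIES) / len(QUERIES)  # avg attributes per query
k = max(1, int(len(ATTRS) / w))                  # formula (6)
print(k_schema(ATTRS, k))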

Discussion

As we can see, k-schema is a classification algorithm inspired by k-means [14,15]. It divides a set of attributes into k clusters. To do so, it relies on the dependencies between attributes, exchanging them between clusters while maximizing the within-cluster dependencies. In other words, it assigns each attribute to a single class, the one on which it is most dependent.


The method of classification adopted in this approach assumes that the boundaries between the clusters are fully defined; the clusters produced are therefore disjoint. However, in many real cases the boundaries between the clusters may overlap. Indeed, an attribute may not belong entirely to a single class, but partially to several.

Consider a system dedicated to tourists, which provides information about hotels and rental cars. For this purpose, it integrates several sources of information. Its global schema (GS) is composed of the following attributes:

Make: car's make.
Model: car's model.
Fuel: car's fuel type.
Air conditioning: (yes/no).
Transmission: (automatic/manual).
Color: car's color.
Name: hotel name.
Stars: hotel star rating.
Room_type: (single, double, suite, duplex, ...).
Price: car's rental price / room's reservation price.

From the queries asked on this global schema, we extracted the attributes of interest, in other words the attributes most demanded by users. In our case, these are: Make, Model, Stars, Room_type and Price. We then built the attribute-attribute dependency matrix presented below.

Table 1: Matrix of attribute-attribute dependency

            make   model   price   room_type   stars
make          -     320     900        0          0
model                -      580        0          0
price                        -        570        900
room_type                              -         375
stars                                              -

Using the algorithm k-schema, we obtained the following two schemas: S1(make, model, price) and S2(room_type, stars). In the remainder of this section, we focus in particular on the second schema. We extracted, from the user queries, the values taken by each of its attributes. Thus, we obtained for the attribute "room_type" the values "single", "double" and "suite", and for the attribute "stars" the values "3*", "4*" and "5*". In a second step, we calculated the dependency for each pair of values taken by the attributes (room_type, stars), and we found that users are interested in single rooms in "3*" hotels, double rooms in "4*" hotels and suites in "5*" hotels. Thus, the PHIS approach recommends materializing all single rooms in "3*" hotels, double rooms in "4*" hotels and suites in "5*" hotels.

By analyzing the user queries, however, we noticed, for example, that users are not interested in all single rooms in "3*" hotels, but only in those with a price between 400 MAD and 800 MAD (MAD: the Moroccan dirham; 1 € ≈ 11 MAD). This constraint cannot be taken into consideration, because the attribute "price" was assigned only to the first schema: in the PHIS approach, an attribute cannot be assigned to multiple schemas. This choice is unfair. Indeed, the degree of dependency of the attribute "price" to the schema (make, model) is (900+580)/2, while to the schema (room_type, stars) it is (570+900)/2. It is true that price is more dependent on the first schema than on the second, but by a negligible difference. This means that users also put constraints on the price in their search for hotels. Thus, not considering it in the choice of the hotel information to materialize can lead to loading data that users are not interested in. In our example, we would load all single rooms in "3*" hotels even when their price is not between 400 MAD and 800 MAD. This choice is a waste of time and storage space, since users do not request this second type of data. From this example, we note that the attribute price should also be assigned to S2. This amounts to adding a new constraint in the selection phase of the part to materialize, and thus a higher accuracy: in our case, we can specify that among the single rooms in "3*" hotels, users are interested only in those with a price between 400 MAD and 800 MAD, which is not possible without assigning the attribute price to S2. In the next section, we propose our approach, which allows assigning an attribute to multiple schemas.

Nk-Schema

The algorithm k-schema divides a set of attributes into compact and disjoint subclasses. In other words, from a set SA = {X_1, X_2, ..., X_N} of N attributes, it constructs k subclasses c_i such that:

c_i \neq \emptyset \ \forall i \in [1,k]; \quad c_i \cap c_j = \emptyset \ \forall i,j / i \neq j; \quad \bigcup_{i=1}^{k} c_i = SA    (10)

As we explained above, an attribute should be allowed to belong to more than one class at a time. For this reason, we propose the algorithm Nk-schema. The latter proceeds in two phases. First, we construct a set of k disjoint classes, as in the PHIS approach. In a second step, we calculate for each attribute a membership relation to the different classes, and assign it to those for which its degree exceeds a threshold. The membership relation considered in our case is the degree of dependency between an attribute and a class. For this purpose, we defined new functions. The first one is:

\varphi : SA \times SA \to \mathbb{N}    (11)

It calculates, for each pair of attributes (X, Y), the number of times they appeared together in a single query.



International Journal of Applied Engineering Research ISSN 0973-4562 Volume 11, Number 3 (2016) pp 2128-2134 © Research India Publications. http://www.ripublication.com

Also, we defined a second function:

\varphi(X) = |\{ q \in Q \mid X \in q \}|    (12)

It represents the number of times the attribute X has appeared in user queries. Based on these two functions, we define a function that represents the degree of dependency between two attributes. Such a function was already defined in the PHIS approach; however, it associates to each pair of attributes the raw number of times they appeared together, which is not quite correct. To explain this, take an example of four attributes X, Y, Z and T. Assume that X appeared 25 times, Y 37 times, Z 7 times and T 6 times. Assume also that X and Y appeared together 10 times (φ(X,Y) = 10) and Z and T 5 times (φ(Z,T) = 5). According to the PHIS approach, the dependency between X and Y is higher than the dependency between Z and T. It is clear that this conclusion does not reflect reality: it is not because φ(X,Y) ≥ φ(Z,T) that X and Y are more dependent than Z and T. Indeed, among the 52 (37+25−10) queries where X or Y appeared, they appeared together only 10 times. On the other hand, even if Z and T appeared together only 5 times, they appeared in total in only 8 queries. This proves that X and Y are in fact less dependent than Z and T. For this reason, we defined a new function φ′, which represents the number of times two attributes appeared together relative to their total number of appearances. It is defined as follows:

\varphi' : SA \times SA \to [0,1], \quad \varphi'(X, Y) = \frac{\varphi(X, Y)}{\varphi(X) + \varphi(Y) - \varphi(X, Y)}    (13)

Similarly, we define a function that associates to an attribute X its degree of dependency to a class C = {X_1, X_2, ..., X_N}. It is defined as follows:

\mu' : SA \times SC \to [0,1], \quad \mu'(X, C) = \begin{cases} \frac{1}{N} \sum_{i=1}^{N} \varphi'(X, X_i) & \text{if } C \neq \emptyset \text{ and } X \notin C \\ \frac{1}{N-1} \sum_{X_i \neq X} \varphi'(X, X_i) & \text{if } C \neq \emptyset \text{ and } X \in C \\ 0 & \text{if } C = \emptyset \end{cases}    (14)

In the remainder of this paper, we also need two additional functions, δ′ and ψ. The first represents the intra-class dependency and the second the inter-class dependency. Thus, for a class C = {X_1, X_2, ..., X_N} we have:

\delta' : SC \to [0,1], \quad \delta'(C) = \begin{cases} \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varphi'(X_i, X_j) & \text{if } N \geq 2 \\ 0 & \text{otherwise} \end{cases}    (15)

Also, for C_1 = {X_1, X_2, ..., X_{N_1}} and C_2 = {Y_1, Y_2, ..., Y_{N_2}}, the inter-class dependency averages φ′ over the pairs of attributes across the two classes:

\psi(C_1, C_2) = \frac{1}{N_1 N_2} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \varphi'(X_i, Y_j)    (16)

As we explained earlier, our goal is to create candidate views for materialization. Thus, it is very important that the dependency between the attributes assigned to the same view be as high as possible and, at the same time, that the dependency between the views be as low as possible. To achieve this objective, we exchange attributes between classes in an iterative manner, while maximizing the value of:

t' = \sum_{i=1}^{k} \delta'(C_i) - \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \psi(C_i, C_j)    (17)

where k is the number of classes, obtained by the formula k = E(N/w), E(x) denoting the integer part of x, N the number of attributes of interest and w the average number of attributes per query [11,12].

The set of classes obtained in this step is disjoint. In other words, from a set SA = {X_1, X_2, ..., X_N} of attributes of interest, we construct a set SC = {C_1, C_2, ..., C_P} of classes such that C_i \cap C_j = \emptyset \ \forall i,j / i \neq j.

In a second step, we calculate for each attribute its dependency to the different classes to which it does not belong. It is then assigned to every class for which its dependency is higher than that class's intra-class dependency. In other words:

\forall C_j \in SC, \forall X_i \in SA / X_i \notin C_j: \text{ if } \mu'(X_i, C_j) \geq \delta'(C_j) \text{ then } C_j \leftarrow C_j \cup \{X_i\}    (18)

This choice is justified by the fact that, since μ′(X_i, C_j) ≥ δ′(C_j), there exist in the class C_j attributes whose dependency to X_i is greater than their dependency to certain attributes already assigned to C_j. Thus, it would be unfair not to assign X_i to this class as well.

Let us return to our example of the system dedicated to providing information about hotels and rental cars. By applying the new attribute-attribute dependency function, we obtain the matrix represented below.
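The following Python sketch illustrates the second phase of Nk-schema as defined by formulas (13) to (18). The query log and the two disjoint seed classes are illustrative assumptions; the first (PHIS-like) phase is assumed to have already produced them.

from itertools import combinations

# Illustrative query log: each query is the set of attributes it uses.
QUERIES = [
    {"make", "model", "price"}, {"make", "price"},
    {"room_type", "stars"},
    {"room_type", "price"}, {"stars", "price"},
    {"room_type", "price"}, {"stars", "price"},
]

def phi(x, y=None):
    """Formulas (11)/(12): co-occurrence count, or single-attribute count."""
    if y is None:
        return sum(1 for q in QUERIES if x in q)
    return sum(1 for q in QUERIES if x in q and y in q)

def phi_prime(x, y):
    """Formula (13): co-occurrences relative to total appearances."""
    denom = phi(x) + phi(y) - phi(x, y)
    return phi(x, y) / denom if denom else 0.0

def mu_prime(x, cls):
    """Formula (14), for an attribute x not belonging to cls."""
    return sum(phi_prime(x, a) for a in cls) / len(cls)

def delta_prime(cls):
    """Formula (15): intra-class dependency."""
    n = len(cls)
    if n < 2:
        return 0.0
    return 2 / (n * (n - 1)) * sum(
        phi_prime(a, b) for a, b in combinations(cls, 2))

def second_phase(classes):
    """Rule (18): also assign an attribute to every class whose
    intra-class dependency it matches or exceeds."""
    attrs = {a for c in classes for a in c}
    for c in classes:
        base, threshold = list(c), delta_prime(c)
        for x in sorted(attrs - set(base)):
            if mu_prime(x, base) >= threshold:
                c.append(x)
    return classes

# Disjoint classes assumed to come out of the first (PHIS-like) phase.
S1, S2 = ["make", "model", "price"], ["room_type", "stars"]
print(second_phase([S1, S2]))

With this toy log, price ends up in both S1 and S2, mirroring the hotels-and-cars example developed in this paper.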



Table 2: Matrix of attribute-attribute dependency

            make   model   price   room_type   stars
make          -     0.18    0.28       0          0
model                -      0.18       0          0
price                        -        0.17       0.27
room_type                              -         0.20
stars                                              -

The first thing we notice is that, even though the attribute price appears together the same number of times (900) with the attribute make as with the attribute stars (cf. Table 1), its dependency to the first is slightly greater than to the second. This is justified by the fact that the attribute make appeared in total (900+320) fewer times than the attribute stars (900+375). Applying our algorithm to these attributes, we obtain the same views S1(make, model, price) and S2(room_type, stars) as with the classical k-schema. However, in our new approach we must not stop here: we need to check whether there are attributes whose degree of dependency to certain classes is greater than the intra-class dependency of those classes. We find that price is the only such attribute: its dependency to S2 is greater than the intra-class dependency of S2 (μ′(price, S2) = 0.22 and δ′(S2) = 0.20). This implies that the attribute price is also assigned to S2. Thus, the obtained views are: S1(make, model, price) and S2(room_type, stars, price). In the next section, we test our algorithm Nk-schema and compare the obtained results with those obtained by k-schema.
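As a quick numeric check of the assignment of price to S2 above, using the values of Table 2 and assuming μ′ averages φ′ over the class members as in formula (14):

# Values from Table 2 (phi' between price and the members of S2).
phi_p_roomtype, phi_p_stars = 0.17, 0.27
phi_roomtype_stars = 0.20

mu_price_S2 = (phi_p_roomtype + phi_p_stars) / 2   # formula (14): 0.22
delta_S2 = phi_roomtype_stars                      # formula (15): 0.20

# Rule (18): price is also assigned to S2 because 0.22 >= 0.20.
assert mu_price_S2 >= delta_S2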

Experimental Results

To test our algorithm, we developed a prototype. In a first step, it randomly generates a global schema. In a second step, it generates (randomly too) a set of queries executed on this schema. From these queries, it calculates for each attribute its frequency of appearance and its degree of dependency with the other attributes, before calling our algorithm to construct the view schemas. In the example below, we generated a global schema GS(X_1, X_2, ..., X_30) with 30 attributes, executed 1000 queries on it, and obtained the result shown in 'Fig. 3'.

Figure 3: Example of obtained results

Note that the attribute X_1 was assigned to three views simultaneously; likewise, the attribute X_29 was assigned to two views. Thereafter, in order to compare our algorithm with the conventional k-schema [12,13], we repeatedly generated a different global schema, on which we executed a set of randomly generated queries. Each time, we ran the algorithms k-schema and Nk-schema in parallel, producing two sets of schemas. In a second step, we extracted the values taken by each of the attributes of the two sets obtained, in order to construct the views to materialize [12,13]. In a final step, we calculated, in both cases, the number of queries satisfied in the materialized part and the size of this part. From these results, we plot the following graphs.
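A minimal Python sketch of such an evaluation loop is given below. The workload generator, the size proxy and the placeholder algorithm are illustrative assumptions; a k-schema or Nk-schema implementation like the earlier sketches would be plugged in as `build_views`.

import random

def generate_queries(n_attrs=30, n_queries=1000, max_len=5, seed=0):
    """Random query workload over a generated global schema GS(X1..Xn)."""
    rng = random.Random(seed)
    attrs = [f"X{i}" for i in range(1, n_attrs + 1)]
    return [set(rng.sample(attrs, rng.randint(2, max_len)))
            for _ in range(n_queries)]

def satisfied(query, views):
    """A query counts as satisfied in the materialized part if some
    materialized view contains all of its attributes."""
    return any(query <= set(v) for v in views)

def evaluate(build_views, runs=5):
    """Generate `runs` workloads, build the views with the supplied
    algorithm, and measure satisfied queries and materialized size."""
    results = []
    for r in range(runs):
        queries = generate_queries(seed=r)
        views = build_views(queries)
        n_ok = sum(satisfied(q, views) for q in queries)
        size = sum(len(v) for v in views)  # attribute count as a size proxy
        results.append((n_ok, size))
    return results

# Example with a trivial placeholder algorithm (one view holding the
# five most frequent attributes); plug in k-schema / Nk-schema instead.
def most_frequent_view(queries):
    freq = {}
    for q in queries:
        for a in q:
            freq[a] = freq.get(a, 0) + 1
    return [sorted(freq, key=freq.get, reverse=True)[:5]]

print(evaluate(most_frequent_view))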

Figure 4: Size of the materialized part

As we can see in 'Fig. 4', our approach brings no gain in terms of the size of the materialization space. This is expected, since Nk-schema assigns certain attributes to several views at the same time, which involves loading additional data. However, all the data loaded in this case are highly requested by users, unlike with the other approach, which can, as explained above, load data that are rarely requested by users instead of data that are.

Figure 5: Queries satisfied in the materialized part

It is true that our approach brings no improvement in terms of materialization space, but the same cannot be said of the quality of the data. As can be observed in 'Fig. 5', the number of queries satisfied in the materialized part is higher than when the classical k-schema is used. This implies that the data loaded by our approach are of higher quality. However, one can ask the question: is the number of satisfied queries higher simply because the size of the materialized part is larger? To answer this question, we plotted the graph shown in 'Fig. 6'.


Figure 6: Queries satisfied in the materialized part relative to the size of this part

This graph shows the evolution of the number of queries satisfied in the materialized part relative to the size of this part. We notice that this ratio is higher when NK-schema is used than when K-schema is used. This is due to the fact that our approach is stricter in the choice of the part to materialize, implying that the materialized data are of high quality.

Conclusion and Outlooks

The creation of candidate views for materialization is a crucial task for the quality of a hybrid information integration system. Indeed, a system where a maximum number of queries is satisfied in the materialized part is a system where the query response time, a very important factor, is reduced. As materialized data is organized as views, we proposed an algorithm that allows their creation. To do so, we select the attributes that appear in user queries; these are then classified to obtain the view schemas to materialize. We proposed an algorithm that we called NK-Schema. Unlike other approaches, it allows the assignment of a single attribute to several views at once. As explained above, this increases the precision of the selection phase, which in turn increases the quality of the data to materialize. We also proposed new functions for calculating the dependencies between attributes; their values are calculated based only on the appearance of the attributes in user queries. As a perspective of this work, we propose to introduce domain ontologies in order to calculate a semantic distance between attributes, which could be used in the construction phase of the view schemas. Also, we relied only on the distribution of user queries to select the attributes that will appear in the views. It would be very useful to exploit user profiles in order to obtain information about user interests, to be taken into account in this phase as well as in the update phase of the materialized part.

References

[1] L. Haas, "Beauty and the beast: The theory and practice of information integration", ICDT, 2007.
[2] A. Elmagarmid, M. Rusinkiewicz and A. Sheth, "Management of Heterogeneous and Autonomous Database Systems", Morgan Kaufmann, San Francisco, 1999.
[3] G. Wiederhold, "Mediators in the architecture of future information systems", IEEE Computer, Vol. 25(3), pp. 38-49, 1992.
[4] J. Widom, "Integrating Heterogeneous Databases: Lazy or Eager?", ACM Computing Surveys, 28A(4), December 1996.
[5] A. Voisard and M. Jürgens, "Geospatial Information Extraction: Querying or Quarrying?", in Goodchild M, Egenhofer M, Fegeas R, Kottman C, eds., Interoperating Geographic Information Systems, 1st ed., Dordrecht: Kluwer Academic, pp. 165-179, 1999.
[6] A. Zellou, "Contribution à la réécriture LAV dans le contexte de WASSIT, vers un Framework d'intégration de ressources", PhD thesis, Rabat, Morocco, April 2008.
[7] R. Hull and G. Zhou, "A Framework for Supporting Data Integration Using the Materialized and Virtual Approaches", SIGMOD '96, Montreal, Canada, 1996.
[8] N. Ashish, C. A. Knoblock and C. Shahabi, "Selectively materializing data in mediators by analyzing user queries", in Fourth IFCIS Conference on Cooperative Information Systems, 1999.
[9] N. Ashish, "Optimizing information mediators by selectively materializing data", PhD dissertation, Department of Computer Science, Faculty of the Graduate School, University of Southern California, 2000.
[10] N. Ashish, "Selectively materializing data in mediators by analyzing user queries", International Journal of Cooperative Information Systems, Vol. 11, Nos. 1 & 2, 2002.
[11] S. Anter, A. Zellou and A. Idri, "Personalization of a hybrid information integration system: Creation of views to materialize based on the distribution of user queries", in ICCS'12, Agadir, Morocco, 2012.
[12] S. Anter, A. Zellou and A. Idri, "K-Schema: A new approach, based on the distribution of user queries, to create views to materialize in a hybrid information integration system", Journal of Theoretical and Applied Information Technology, Vol. 47, No. 1, pp. 158-170, 2013.
[13] S. Anter, A. Zellou and A. Idri, "MATHIS: A New Approach for Creating Views to Materialize in a Hybrid Information Integration System", International Review on Computers and Software (IRECOS), Vol. 8, No. 3, pp. 816-825, 2013.
[14] J. MacQueen, "Some methods for classification and analysis of multivariate observations", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume I, Statistics, edited by Lucien M. Le Cam and Jerzy Neyman, University of California Press, 1967.
[15] J. A. Hartigan and M. A. Wong, "A K-Means Clustering Algorithm", Journal of the Royal Statistical Society, 1979.
[16] W. Hadi, A. Zellou and B. Bounabat, "A Fuzzy Logic Based Method for Selecting Information to Materialize in a Hybrid Information Integration System", International Review on Computers and Software, Vol. 8, No. 2, pp. 489-499, 2013.
