Neural Computing and Applications. DOI 10.1007/s00521-017-3053-1

ORIGINAL ARTICLE

An MLP-based representation of neural tensor networks for the RDF data models

Farhad Abedini1 · Mohammad Bagher Menhaj2 · Mohammad Reza Keyvanpour3

Received: 22 June 2016 / Accepted: 13 June 2017 / © The Natural Computing Applications Forum 2017

Abstract In this paper, a new representation of neural tensor networks is presented. Recently, state-of-the-art neural tensor networks have been introduced to complete RDF knowledge bases. However, representing the mathematical model of these networks is still a challenging problem because of their tensor parameters. To solve this problem, it is proposed that these networks can be represented as a two-layer perceptron network. To complete the network topology, the traditional gradient-based learning rule is then developed. It should be mentioned that the learning rules developed so far for tensor networks are complex in nature due to the complexity of the objective function used. Indeed, this paper aims to show that the tensor network can be viewed and represented as a two-layer feedforward neural network in its traditional form. The simulation results presented in the paper readily verify this claim.

Keywords Knowledge base · Neural tensor network · MLP · Learning rules · RDF

Corresponding author: Mohammad Bagher Menhaj, [email protected]

1 Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

2 Center of Excellence in Control and Robotics, Electrical Engineering Department, Amirkabir University of Technology, 424 Hafez Ave., Tehran, Iran

3 Department of Computer Engineering, Alzahra University, Vanak, Tehran, Iran

1 Introduction

RDF knowledge bases are composed of a set of triples, each of which is called a fact. Every fact includes two entities and a relation between them. The first and second entities are called the subject and the object, respectively, while their relation is called the predicate. For instance, the fact that "Einstein was born in Germany" can be written as the triple ⟨Einstein, born-in, Germany⟩, in which Einstein is the first entity, Germany is the second entity, and born-in is the relationship between them. RDF knowledge bases are diverse and include YAGO [1, 2], DBpedia [3], Freebase [4] and WordNet [5]. They are highly useful for solving problems such as information retrieval, NLP and question answering. Since RDF knowledge bases have been proposed for the realization of the semantic Web, they should be comprehensive. Unfortunately, however, no knowledge base is fully comprehensive, so completion, extension and enrichment are major concerns for current knowledge bases. So far, numerous methods have been presented for the completion of such knowledge bases, including [6–9], and for knowledge base enrichment and extension, such as [10–17]. In knowledge base completion methods, new facts are inferred from the facts already available in the knowledge base, whereas knowledge base extension methods extract new facts from another resource. For example, in [10–12], the resource is structured text. Since extracting facts from unstructured sources can be very demanding, methods such as SOFIE [13] have been proposed which use the facts available in the knowledge base itself to complete and enrich it. The focus of this research, however, is knowledge base completion. In the SOFIE method, in order to complete the YAGO ontology, a method has been proposed that is able to extract new facts from unstructured texts.



In this method, the accuracy of the facts obtained from the unstructured texts is measured against the facts already available in the knowledge base. In [6], on the other hand, a method has been presented for completing the knowledge base by reasoning with a neural tensor network over the facts available in the knowledge base. This method gives better results than other related methods such as [18–23] and is able to add more facts to the knowledge base than methods such as SOFIE. Indeed, this method is so far the state of the art in the area discussed in this paper [6]. Nevertheless, one of the problems of the neural tensor network is that its neuron mathematical model cannot be represented directly because of the presence of the tensor parameter. In addition, because its objective function is not differentiable, it has a complex learning rule. For this reason, it is suggested that a two-layer feedforward neural network [24, 25] be used instead of the neural tensor network. With a multi-layer perceptron network, the network is easily representable and the learning rules of multi-layer networks can be used for it. To evaluate and compare the proposed method with other relevant methods, a standard data set is used on which the related works have been evaluated. The scientific contributions of this paper are therefore as follows:

• Illustration of the correspondence of the neural tensor network with a multi-layer perceptron neural network.
• Representation of the neuron mathematical model of the neural tensor network.
• Usage of the backpropagation learning rule to train the proposed network for improved results.

In the following sections, the neural tensor network is first elaborated thoroughly, owing to the proximity of the proposed method to it, followed by the presentation of the proposed method. In the suggested method, it is shown that the neural tensor network is equivalent to a multi-layer perceptron network. For this purpose, the tensor layer is first expressed with a new model. This model is then used to represent the neural tensor network as a multi-layer perceptron network, followed by the presentation of the learning methods of this network. In the next section, the practical results gained from implementing the proposed method on standard data are presented and compared with the previous method. Finally, the conclusions are drawn in the last section.

2 Neural tensor network

The neural tensor network was introduced by Socher for completing a knowledge base using the facts already available in it and is the state of the art [6]. The concept of a tensor has been used in works such as [26–28], but here this concept is used within a neural network. In this section,


this network model is presented. For each relation R, a fact is described as a triple ⟨e1, R, e2⟩ and modeled by a neural tensor network whose inputs are the two entities e1 and e2. If the entities are in that relationship, the model returns a high score, otherwise a low one. In this way, any fact mentioned implicitly or explicitly in the database can be scored with some certainty. The goal of the neural tensor network is to complete a knowledge base using its available facts, i.e., the model can learn that a triple ⟨e1, R, e2⟩ is true as a fact. To this end, a score is computed for each triple, and a threshold is then used to classify each triple into one of two classes, true or false. For instance, it can be decided whether the relationship ⟨e1, R, e2⟩ = ⟨Einstein, born-in, Germany⟩ is true and with what score of certainty. If e1 and e2 are two d-dimensional entity vectors, the neural tensor network function that predicts the relationship of the two entities can be written as Eq. (1) [6]:

g(e1, R, e2) = U_R^T f( e1^T W_R^[1:k] e2 + V_R [e1; e2] + b_R )    (1)

where f = tanh, W_R^[1:k] ∈ R^(d×d×k), h ∈ R^k with h_i = e1^T W_R^[i] e2 for i = 1, ..., k, V_R ∈ R^(k×2d), U_R ∈ R^k and b_R ∈ R^k. Equation (2) shows a sample neural tensor network model with two slices and three-dimensional entities, used as the running example in the next sections. The basic idea for training the model is that the score of each triple of the training set, T^(i) = ⟨e1^(i), R^(i), e2^(i)⟩ for i = 1, ..., N, should be higher than the score of triples in which e1 or e2 has been replaced by another, randomly chosen entity. Triples made by substituting a random entity e_c are called corrupted triples, T_c^(i) = ⟨e1^(i), R^(i), e_c⟩.

g = [u1 u2] f( [ e1^T W^[1] e2 ; e1^T W^[2] e2 ] + V [e1; e2] + [b1; b2] )    (2)

where e1 = [e11, e12, e13]^T, e2 = [e21, e22, e23]^T,
W^[1] = [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]],
W^[2] = [[w'11, w'12, w'13], [w'21, w'22, w'23], [w'31, w'32, w'33]],
V = [[v11, v12, v13, v14, v15, v16], [v21, v22, v23, v24, v25, v26]].

If the neural tensor network parameters are Ω = {U, E, W, V, b}, the objective function presented by Socher [6] is Eq. (3):
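To make Eq. (1) concrete, the following is a minimal NumPy sketch of the scoring function. It is an illustration written for this paper's notation, not the authors' released code; the dimensions d = 3 and k = 2 match the example of Eq. (2), and all parameter values are random placeholders.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, U):
    """Score a triple <e1, R, e2> with a neural tensor network (Eq. 1).

    e1, e2 : (d,) entity vectors
    W      : (k, d, d) tensor, one d x d slice per hidden unit
    V      : (k, 2d) standard-layer weights
    b      : (k,) bias
    U      : (k,) output weights
    """
    k = W.shape[0]
    bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])   # e1^T W^[i] e2
    standard = V @ np.concatenate([e1, e2]) + b               # V [e1; e2] + b
    return U @ np.tanh(bilinear + standard)

rng = np.random.default_rng(0)
d, k = 3, 2
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(k, d, d))
V = rng.normal(size=(k, 2 * d))
b = rng.normal(size=k)
U = rng.normal(size=k)
print(ntn_score(e1, e2, W, V, b, U))
```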


J(Ω) = Σ_{i=1}^{N} Σ_{c=1}^{C} max(0, 1 - g(T^(i)) + g(T_c^(i))) + λ ||Ω||_2^2    (3)

However, taking derivatives of this objective function directly is not possible, and for this reason an optimization method [29] has been used for its minimization. In this paper, it is therefore suggested that the neural tensor network be matched with a standard neural network; with this proposal, standard learning rules can be used for the training. The suggested method is explained in what follows.
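The sketch below illustrates how the objective of Eq. (3) can be evaluated with corrupted triples. It assumes the ntn_score function from the earlier sketch, a dictionary of entity vectors, and per-relation parameter dictionaries; the sampling scheme and data structures are assumptions made for illustration only.

```python
import numpy as np

def ntn_objective(triples, entities, params, score_fn, C=10, lam=1e-4, rng=None):
    """J(Omega) of Eq. (3): margin of 1 between each true triple and its corruptions."""
    rng = rng or np.random.default_rng(0)
    U, W, V, b = params                      # each is a dict keyed by relation name
    names = list(entities)
    cost = 0.0
    for (e1, R, e2) in triples:
        g_true = score_fn(entities[e1], entities[e2], W[R], V[R], b[R], U[R])
        for _ in range(C):
            ec = names[rng.integers(len(names))]        # random corrupting entity
            g_corr = score_fn(entities[e1], entities[ec], W[R], V[R], b[R], U[R])
            cost += max(0.0, 1.0 - g_true + g_corr)     # hinge term of Eq. (3)
    reg = sum(np.sum(np.asarray(v) ** 2) for group in params for v in group.values())
    return cost + lam * reg                             # add L2 regularization
```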

Fig. 1 Representation of the mathematical model of a neural tensor layer neuron

3 The proposed method

In the proposed method, it is shown that the neural tensor network is equivalent to a multi-layer perceptron network. To this end, the tensor layer is first expressed with a new model. This model is then used to represent the neural tensor network as a multi-layer perceptron network. Next, the learning methods of this network are presented.

3.1 Representation of the tensor layer

The neural tensor network was described in the previous section. Since the tensor layer of this network has two input vectors, each located on one side of the parameter matrix W, the neuron mathematical model of this layer cannot be shown directly. In this section, a method is first introduced for its representation and a new model is then presented. The tensor layer multiplies the two input vectors e1 and e2 by the parameter matrix W. Equation (4) shows this layer; its result is a scalar. For simplicity, the entity vectors are taken to be three-dimensional, which is generalizable to more elements.

Tensor = [e11 e12 e13] [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]] [e21; e22; e23]    (4)

The result of the above multiplication can be expanded as (5):

Tensor = e11 e21 w11 + e11 e22 w12 + e11 e23 w13 + e12 e21 w21 + e12 e22 w22 + e12 e23 w23 + e13 e21 w31 + e13 e22 w32 + e13 e23 w33    (5)

If the desired neural network contains only this layer, it can be made to correspond to a one-layer neural network. For this purpose, the two vectors W and P are first defined as below, and the score g of Eq. (1) is then computed as (6):

W = [w11, w21, w31, w12, w22, w32, w13, w23, w33]
P = [e11 e21, e12 e21, e13 e21, e11 e22, e12 e22, e13 e22, e11 e23, e12 e23, e13 e23]^T
g = f(WP)    (6)
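The flattening in (6) can be checked numerically. The sketch below (illustrative only) verifies that the bilinear form of Eq. (4) equals the dot product of the column-major flattened weight vector W with the matching product vector P.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))

tensor = e1 @ W @ e2                         # Eq. (4): e1^T W e2

W_flat = W.flatten(order="F")                # [w11, w21, w31, w12, w22, w32, ...]
P = np.outer(e1, e2).flatten(order="F")      # [e11*e21, e12*e21, e13*e21, e11*e22, ...]

assert np.isclose(tensor, W_flat @ P)        # the tensor layer is a plain dot product W . P
print(tensor, W_flat @ P)
```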

A notable point about the neural tensor network is that it captures the correlation between the two input vectors e1 and e2, which increases its accuracy in comparison with other models. In this paper, it is suggested that a single vector representing the correlation between these two vectors be used instead of the two input vectors themselves. To this end, the vector P in (6) can be used; in effect, the network input is changed in this way. The mathematical model of the neural network neuron with the new parameters is shown in Fig. 1, in which no bias value has been considered. This representation of the tensor layer can be employed to represent the neural tensor network in the form of a multi-layer perceptron network, as presented in the following sections.

3.2 Representation of the neural tensor network in the form of a multi-layer perceptron network

For simplicity, the parameters V and b of the neural tensor network are not considered at first; they are added later. In the neural tensor network, after the tensor values are obtained for all facts of the training set, one score, called g, is assigned to each of these facts. In order to distinguish the scores of true facts from those of incorrect facts, a threshold value should be obtained. It is then possible to determine whether the value obtained from the network belongs to a correct or an incorrect pattern. The procedure for obtaining the value of g in a neural tensor network with two slices is shown in more detail in Fig. 2 and statement (7):

Tensor = [e11 e12 e13] [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]] [e21; e22; e23]
       + [e11 e12 e13] [[w'11, w'12, w'13], [w'21, w'22, w'23], [w'31, w'32, w'33]] [e21; e22; e23]    (7)


Fig. 2 Representation of the g score in a neural tensor network with two slices

By writing the parameter W in the form of (8) and changing the input vector as in (9), g can be restated as (10) with the parameter U of Eq. (1):

W = [ w11, w21, w31, w12, w22, w32, w13, w23, w33 ;
      w'11, w'21, w'31, w'12, w'22, w'32, w'13, w'23, w'33 ]    (8)

P = [e11 e21, e12 e21, e13 e21, e11 e22, e12 e22, e13 e22, e11 e23, e12 e23, e13 e23]^T    (9)

g = U^T f(WP)    (10)

The obtained statement is now representable and can be shown as the neural mathematical model provided in Fig. 2. In this figure, the value of the parameter U has been set to 1 (U^T = [1, 1]); its values will be made adjustable later. As can be observed, by changing the representation of the inputs and the network parameters, a multi-layer perceptron network has been obtained whose output elements are summed. However, in this figure the parameter U is not adjustable. In order to make the value of U adjustable, it is suggested that an output layer be added to this network for determining correct and incorrect patterns. This multi-layer network is shown in Fig. 3, and the overall representation of the model is provided in Fig. 4. These figures can be generalized to more slices as well, with one neuron considered for every slice. The calculation of the output a in a perceptron network with L layers is given in (11):

a^0 = P
a^(l+1) = f^(l+1)( W^(l+1) a^l + b^(l+1) ),   l = 0, 1, 2, ..., L-1    (11)


Therefore, considering a = g, L = 2 and b = 0, the output g2 of the mentioned network is obtained as (12):

g^0(P, R) = P
g^1(P, R) = f^0( W_R g^0 ) = tanh(W_R P)
g^2(P, R) = f^1( U_R g^1 ) = U_R^T tanh(W_R P)    (12)
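The equivalence claimed by Eqs. (10)–(12) can be demonstrated with a short numerical check. The sketch below (an illustration, not the authors' code) builds the MLP view of a two-slice tensor layer and confirms it returns the same score as the slice-by-slice NTN computation; V and b are omitted, as in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 3, 2
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W_slices = rng.normal(size=(k, d, d))        # one d x d slice per hidden neuron
U = rng.normal(size=k)

# NTN view: g = U^T tanh([e1^T W^[i] e2] for i = 1..k)
g_ntn = U @ np.tanh(np.array([e1 @ W_slices[i] @ e2 for i in range(k)]))

# MLP view: input P (Eq. 9), first-layer weights W (Eq. 8), output weights U (Eq. 10)
P = np.outer(e1, e2).flatten(order="F")
W = np.stack([W_slices[i].flatten(order="F") for i in range(k)])   # k x d^2 matrix
g_mlp = U @ np.tanh(W @ P)

assert np.isclose(g_ntn, g_mlp)
print(g_ntn, g_mlp)
```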

As can be seen, the result is equal to (10); thus, this network is equivalent to the neural tensor network. However, for the network output to lie within the range of -1 to 1, tanh should be used in the output layer instead of the linear function. This variation is shown in Fig. 5. The figures above relate to networks that include only the tensor parameter and U. However, there is another variant of this network that also includes the standard neural network parameters V and b. According to Eq. (1) and the points discussed in this section, that network can be shown as in Fig. 6. In this way, the neural mathematical model equivalent to the neural tensor network has been illustrated. To support this claim, the results obtained from training both networks are compared in the evaluation section. For this purpose, the learning rules of this network should first be described. Since these learning rules are complex, and since the obtained network is an MLP network, it is further demonstrated that, by altering the training data, the BP learning rule can also be used for its training.

Fig. 3 Two-layer perceptron network equivalent to the neural tensor network

Fig. 4 Overall representation of the model

Fig. 5 The model with the tanh output function

3.3 Training of the multi-layer perceptron network using the learning rules of the neural tensor network

Since the available data set is the same as that of the neural tensor network, a training method similar to that of the neural tensor network can be used for the new network. Just as (1) is differentiated with respect to each of the parameters of the neural tensor network at every stage, here (12) should be differentiated with respect to all parameters at every stage, with the cost calculated from the objective function (3). The derivative with respect to the parameter W is given in (13) and the derivative with respect to the parameter U in (14):

∂g(P, R)/∂W = U_R^T tanh'(W_R P) P    (13)

∂g(P, R)/∂U = tanh(W_R P)    (14)

In order to calculate the cost of the objective function, some facts are first chosen from the training data set, and for each of them several entities take the place of the second entity of that fact. In this way, several incorrect triples are obtained for every correct fact. Overall, a set of correct and incorrect triples is obtained as the final training data. The objective function is applied to these data, and the cost of each stage is calculated. Using optimization methods, the network parameters are adjusted by optimizing the cost at every stage. Accordingly, the trained network and the final parameter values are obtained in a way that reduces the objective function cost. Here, the optimization method minFunc [29], previously used for the neural tensor network, has been employed. In the following section, a simpler learning rule is suggested for training the proposed multi-layer perceptron network.

3.4 Training of the multi-layer perceptron network using the BP learning rule

One of the problems of the neural tensor network is the complexity of its training stage. For this reason, a new solution for simpler training of the proposed network is suggested in this section: the network can be trained in a simpler manner by changing the training data. For this purpose, the training data should contain a set of correlation vectors (P) as inputs and the score of each of them as the target output. Each correlation vector P represents one fact of the training set and is obtained through Eq. (9). With this set, one can reproduce the output of the neural tensor network. Given these changes in the training data set, the BP learning rule can now be used. In the next section, the results obtained from this method are compared with those of the previous method.
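As a rough illustration of the training scheme of Sect. 3.4, the sketch below trains the two-layer network with plain backpropagation on pairs of correlation vectors P and target scores in [-1, 1]. The squared-error loss, learning rate and epoch count are assumptions made for the sketch and are not taken from the paper.

```python
import numpy as np

def train_bp(P_data, targets, k=2, lr=0.01, epochs=100, seed=0):
    """P_data: (N, d*d) correlation vectors; targets: (N,) desired scores in [-1, 1]."""
    rng = np.random.default_rng(seed)
    n_in = P_data.shape[1]
    W = 0.1 * rng.normal(size=(k, n_in))          # first (tensor-equivalent) layer
    U = 0.1 * rng.normal(size=k)                  # output layer
    for _ in range(epochs):
        for P, t in zip(P_data, targets):
            h = np.tanh(W @ P)                    # hidden activations, as in Eq. (12)
            g = np.tanh(U @ h)                    # output score squashed to [-1, 1]
            err = g - t
            dU = err * (1 - g ** 2) * h           # gradient through the output tanh
            dW = np.outer(err * (1 - g ** 2) * U * (1 - h ** 2), P)
            U -= lr * dU                          # plain gradient-descent updates
            W -= lr * dW
    return W, U
```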



Fig. 6 Matching the complete neural tensor network with a two-layer perceptron network

4 Experimental results


4.1 Evaluation of the equivalence of both networks

In order to evaluate the equivalence of the neural tensor and multi-layer perceptron networks proposed in this paper, a standard data set has been used on which the previous related studies were evaluated [6]. Table 1 summarizes the statistical properties of this data set. In this data set, WordNet has been employed as an RDF knowledge base: 112,581 standard WordNet triples are used as training samples, containing 38,696 unique entities in 11 different relations. A further 2609 triples (the Dev set) are available for obtaining, for each relation, the threshold value that distinguishes correct from incorrect facts. For training, the inputs are extracted from the training triples of the data set in the form required by each network. By applying the learning rule of the neural tensor network to that network and the modified learning rule to the proposed multi-layer perceptron network, the network parameters are adjusted.

Table 1 Data set properties [6]

Data set   #R   #Ent.    #Train    #Dev   #Test
WordNet    11   38,696   112,581   2609   10,544
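For illustration, the sketch below shows one way the training inputs could be assembled from WordNet-style triples: each triple yields a correlation vector P, and corrupted counterparts are created by replacing the second entity at random. The entity names, the embeddings dictionary and the number of corruptions are assumptions, not details from the paper.

```python
import numpy as np

def build_inputs(triples, embeddings, n_corrupt=1, seed=0):
    """triples: list of (e1, R, e2) strings; embeddings: dict entity -> (d,) vector."""
    rng = np.random.default_rng(seed)
    names = list(embeddings)
    data = []
    for e1, R, e2 in triples:
        P_true = np.outer(embeddings[e1], embeddings[e2]).flatten(order="F")
        data.append((R, P_true, True))                      # correct pattern
        for _ in range(n_corrupt):
            ec = names[rng.integers(len(names))]            # random corrupting entity
            P_bad = np.outer(embeddings[e1], embeddings[ec]).flatten(order="F")
            data.append((R, P_bad, False))                  # corrupted pattern
    return data
```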


Fig. 7 Diagram of the implementation results of the two methods (cost versus iterations for the NTN and the MLP)

For adjustment of the parameters, the optimization stage of the objective function (3) should be iterated until the lowest cost is reached. These stages are iterated 50 times for the two networks, with the results shown in Fig. 7. The diagram makes clear that the results obtained from training the two networks are very similar. In fact, in a single training run of either network, 50 iterations of the objective function optimization are performed, and the cost is calculated at every stage. The results indicate that the costs decrease at a similar rate. For further clarification, each of the networks is trained 30 times; Tables 2 and 3 report the average of the minimum cost and the average number of iterations needed to reach it. The comparison of these two tables is also shown in Fig. 8. As can be observed, the results are almost identical.

Table 2 Average of minimum costs of two-layer perceptron training for 30 runs

Average Min-Cost   1.4501
#Iteration         25.66

Table 3 Average of minimum costs of NTN training for 30 runs

Average Min-Cost   1.4506
#Iteration         26

Table 4 Comparison of the accuracy of the two methods in the test step

                                          NTN     MLP network
Accuracy (with b and V parameters)        86.21   86.2
Accuracy (without b and V parameters)     84.46   84.1

Fig. 10 The accuracy obtained from the test stage for different numbers of iterations

Fig. 8 Comparison of Tables 2 and 3 (average minimum cost of the MLP and the NTN)

Fig. 11 The accuracy of the proposed method in each of the 11 relations

Fig. 9 The results obtained from execution of 100 iterations

To show why the number of iterations of each training stage was set to 50, the results of running 100 iterations are also shown in Fig. 9. As is evident in this diagram, after further iterations the costs grow at an increasing rate, so the desired target, cost reduction, is no longer achieved. The results obtained from the two networks are very close to each other, based on which one can claim that the training stage of the proposed network is equivalent to that of the neural tensor network. After training, the networks are tested using the test triples of the data set. At this stage, both

networks give almost identical results, implying the equivalence of the two networks. The results of this stage, for networks with the parameters V and b and without them, are provided in Table 4. It is thus seen that the neural tensor network is not a new neural network, but rather another representation of existing neural networks. Since in the previous work the neural tensor network was trained with 100 iterations, the results displayed in this table are also the output of 100 iterations. Nevertheless, it is further shown that there is no need for so many iterations. To illustrate a proper number of training iterations, Fig. 10 shows the accuracy obtained in the test stage for a multi-layer perceptron network trained with different numbers of iterations. As can be seen, 60 iterations give an accuracy of 84.6, which is better than the result obtained with 100 iterations.


Table 5 Thresholds of each relation

#Relation   1      2      3      4     5      6      7     8       9       10     11
Threshold   0.37   0.36   0.36   0.2   0.34   0.19   0.2   -0.19   -0.05   0.28   0.24

In order to show how the accuracy criterion is distributed over the relations, and since the proposed network is trained separately for every relation, the accuracy of the proposed method for each of the 11 relations is shown individually in Fig. 11. As previously mentioned, after the training stage the parameters are adjusted, and every new input presented to the network receives a score between -1 and 1. However, the threshold value still has to be specified; it indicates the score above which a triple belongs to the set of correct triples. For this purpose, the Dev data set with 2609 correct and incorrect triples has been used to obtain this threshold value. The results for the individual relations are shown in Table 5. So far, it has been found that the neural tensor and multi-layer perceptron networks with similar learning rules behave in a similar fashion. In the following, it is further shown that, with a change of the data set, another learning rule, namely BP, can be used for the proposed network.
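A per-relation threshold such as those in Table 5 can be chosen with a simple search over the Dev scores. The sketch below is a minimal illustration under assumed data structures (per-relation score and label arrays), not the procedure used in the original experiments.

```python
import numpy as np

def pick_threshold(scores, labels, candidates=np.linspace(-1, 1, 201)):
    """scores: (N,) network outputs for one relation; labels: (N,) booleans (True = correct)."""
    best_t, best_acc = 0.0, -1.0
    for t in candidates:
        acc = np.mean((scores > t) == labels)     # accuracy if t is used as the cut-off
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```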

4.2 The results of training of the new network with the BP learning rule

As mentioned in Sect. 3.2, through Eq. (9) one can represent the two input entity vectors as one vector that expresses the correlation between the two entities. In the new data set, the desired output of each of these correlation vectors should also be available; for this purpose, the scores obtained from the previous method have been used. In fact, the data set of Table 1 has been used, with the only difference that its training data consist of correct and incorrect patterns whose outputs are taken as 1 and -1, respectively, whereas in the new data set every training triple is assigned an equivalent score, i.e., a number between -1 and 1. By changing the training data set in this way, the multi-layer perceptron network can be trained. After the training stage, the test stage can be carried out to obtain the threshold value separating correct from incorrect triples. At this stage, the result obtained for every input is compared against the threshold value. The accuracy of this method in comparison with the previous one is shown in Table 6. Note that these results were obtained on two different data sets. Indeed, the problem of using the BP learning rule here is the lack of a data set suited to it, which is why the output of the previous method has been used. Unfortunately, the major difficulty in employing this method is that obtaining a proper data set for it is hard. Nevertheless, the aim in general was to show that the BP learning rule can also be used for this network. In similar applications, where the score of every input is known, this method may yield suitable results. Many recent works, such as [30–35], have used the neural tensor network idea; this model is also applicable inside deep neural networks such as [36], and in the future, using such methods, the facts of a text could be classified to address the classification of text strings [37].

5 Conclusion

In this paper, it was shown how a neural tensor network can be represented using a multi-layer perceptron network for the RDF data model; in other words, it was shown that these two networks are equivalent. For this purpose, the neural tensor network was first introduced and its problems were investigated. Although this method is the best technology in its own right, two of its major problems are the lack of a representation of its neural mathematical model and the complexity of its training stage. To solve the first problem, a method was proposed for representing this network by changing the form of the inputs and parameters. The obtained network is a multi-layer perceptron network whose output equation is convertible to the output equation of the neural tensor network. Despite some changes in the derivatives of the training stage, similar results were obtained. The results revealed that the two networks are equivalent; in other words, the neural tensor network is not a new network, but another representation of existing networks. To solve the second problem, the BP learning rule was used by altering the data set. The results obtained from this learning rule showed high accuracy, but unfortunately obtaining a suitable data set for this method is difficult. Nevertheless, the suggested method can be used in applications in which the two input vectors are correlated and the score of every input is known; in such cases it may yield suitable results.


Table 6 The accuracy of this method in comparison with the previous one

Learning rule    Accuracy
BP               89.43
Similar to NTN   86.21


Overall, in this paper, the neural tensor network was critically examined. It was further shown that the correlation between the two entity vectors of the network input can be expressed with one correlation vector, based on which the respective parameters can also be employed. This idea can be used in future studies to find new methods for obtaining the correlation between the paired input entities before they enter the network, thereby improving the results. This model is also applicable inside deep neural networks such as [36], and in the future the facts of a text could be classified in this way to address the classification of text strings [37].

Compliance with ethical standards

Conflict of interest The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership or other equity interest; and expert testimony or patent-licensing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

References

1. Suchanek FM, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 697–706
2. Hoffart J, Suchanek FM, Berberich K, Weikum G (2013) YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artif Intell 194:28–61
3. Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. Springer, Berlin, pp 722–735
4. Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, pp 1247–1250
5. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
6. Socher R, Chen D, Manning CD, Ng A (2013) Reasoning with neural tensor networks for knowledge base completion. In: Advances in neural information processing systems, pp 926–934
7. West R, Gabrilovich E, Murphy K, Sun S, Gupta R, Lin D (2014) Knowledge base completion via search-based question answering. In: Proceedings of the 23rd international conference on World Wide Web. ACM, pp 515–526
8. He W, Feng Y, Zou L, Zhao D (2015) Knowledge base completion using matrix factorization. In: Web technologies and applications. Springer, pp 256–267
9. Zhao Y, Gao S, Gallinari P, Guo J (2015) Knowledge base completion by learning pairwise-interaction differentiated embeddings. Data Min Knowl Disc 29(5):1486–1504
10. Abedini F, Mirhashem M (2011) SESR: semantic entity extraction for computing semantic relatedness. In: International conference on advanced computer theory and engineering, 4th (ICACTE 2011). ASME Press
11. Abedini F, Mahmoudi F, Jadidinejad AH (2011) From text to knowledge: semantic entity extraction using YAGO ontology. Int J Mach Learn Comput 1(2):113
12. Abedini F, Mirhashem SM (2012) From text to facts: recognizing ontological facts for a new application. Int J Mach Learn Comput 2(3):183
13. Suchanek FM, Sozio M, Weikum G (2009) SOFIE: a self-organizing framework for information extraction. In: Proceedings of the 18th international conference on World Wide Web. ACM, pp 631–640
14. Bühmann L, Lehmann J (2013) Pattern based knowledge base enrichment. In: The Semantic Web - ISWC 2013. Springer, Berlin, Heidelberg, pp 33–48
15. Hellmann S, Bryl V, Bühmann L, Dojchinovski M, Kontokostas D, Lehmann J, Zamazal O (2014) Knowledge base creation, enrichment and repair. In: Linked open data - creating knowledge out of interlinked data. Springer International Publishing, pp 45–69
16. Khalatbari S, Mirroshandel SA (2015) Automatic construction of domain ontology using Wikipedia and enhancing it by Google Search Engine. Inf Syst Telecommun 3:248–258
17. Bühmann L, Lehmann J (2012) Universal OWL axiom enrichment for large knowledge bases. In: Knowledge engineering and knowledge management. Springer, Berlin, Heidelberg, pp 57–71
18. Bordes A, Weston J, Collobert R, Bengio Y (2011) Learning structured embeddings of knowledge bases. In: Conference on artificial intelligence (No. EPFL-CONF-192344)
19. Jenatton R, Roux NL, Bordes A, Obozinski GR (2012) A latent factor model for highly multi-relational data. In: Advances in neural information processing systems, pp 3167–3175
20. Bordes A, Glorot X, Weston J, Bengio Y (2012) Joint learning of words and meaning representations for open-text semantic parsing. In: International conference on artificial intelligence and statistics, pp 127–135
21. Sutskever I, Tenenbaum JB, Salakhutdinov RR (2009) Modelling relational data using Bayesian clustered tensor factorization. In: Advances in neural information processing systems, pp 1821–1828
22. Turian J, Ratinov L, Bengio Y (2010) Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, pp 384–394
23. Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning. ACM, pp 160–167
24. Hagan MT, Menhaj MB (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993
25. Tavoosi J, Suratgar AA, Menhaj MB (2015) Stability analysis of recurrent type-2 TSK fuzzy systems with nonlinear consequent part. Neural Comput Appl 28(1):47–56
26. Phan AH, Cichocki A (2012) Seeking an appropriate alternative least squares algorithm for nonnegative tensor factorizations. Neural Comput Appl 21(4):623–637
27. Huang S, Chen J, Luo Z (2014) Sparse tensor CCA for color face recognition (Retraction of vol 24, pg 1647, 2014). Neural Comput Appl 25(7–8):2091
28. Ben X, Zhang P, Yan R, Yang M, Ge G (2016) Gait recognition and micro-expression recognition based on maximum margin projection with tensor representation. Neural Comput Appl 27(8):2629–2646
29. Schmidt M (2005) minFunc. http://people.cs.ubc.ca/schmidtm/software/minfunc.html
30. Chang KW, Yih WT, Yang B, Meek C (2014) Typed tensor decomposition of knowledge bases for relation extraction. In: EMNLP, pp 1568–1579
31. Iyyer M, Boyd-Graber JL, Claudino LMB, Socher R, Daumé III H (2014) A neural network for factoid question answering over paragraphs. In: EMNLP, pp 633–644
32. Angeli G, Manning CD (2014) NaturalLI: natural logic inference for common sense reasoning. In: EMNLP, pp 534–545
33. Cheng J, Zhang X, Li P, Zhang S, Ding Z, Wang H (2016) Exploring sentiment parsing of microblogging texts for opinion polling on Chinese public figures. Appl Intell 45(2):429–442
34. Zhang X, Du C, Li P, Li Y (2016) Knowledge graph completion via local semantic contexts. In: Database systems for advanced applications. Springer International Publishing, pp 432–446
35. Shi B, Weninger T (2016) Discriminative predicate path mining for fact checking in knowledge graphs. Knowl Based Syst 104:123–133
36. Ong BT, Sugiura K, Zettsu K (2016) Dynamically pre-trained deep recurrent neural networks using environmental monitoring data for predicting PM2.5. Neural Comput Appl 27(6):1553–1566
37. Ma H, Tseng YC, Chen LI (2015) A CMAC-based scheme for determining membership with classification of text strings. Neural Comput Appl 27(7):1959–1967
