PDP Models Can’t Learn Relational Representations
Guillermo Puebla & Leonidas A. A. Doumas
Department of Psychology, PPLS, University of Edinburgh, Scotland, UK.
Introduction
• The ability of PDP models to capture abstract relational knowledge has been put into question repeatedly (e.g., Hummel & Holyoak, 2003; Marcus, 1998).
• Recently, Rogers and McClelland (2008, 2014) have argued that “gestalt” PDP models (St. John & McClelland, 1990; St. John, 1992) can learn to bind abstract roles to objects and use these representations to make inferences based on the relational structure of the input.
• Take for example the Story Gestalt model (St. John, 1992). If this model is presented with a story about a restaurant where Albert plays the role of the agent ordering food, the model predicts that Albert will eat the food and pay the bill, rather than bringing the bill or receiving the payment.
• Here, we show that even though the model is able to draw complex inferences based on the statistical regularities of its training corpus, it fails to capture the relational structure of input that breaks these statistical regularities.

Story Gestalt Model
• The model processes a story by taking as input a sequence of sentences, one by one, and forming a representation of the story presented so far in the gestalt layer.
• Taking the activation of the gestalt layer at the previous time step and combining it with the input sentence at the current time step allows the model to form a “gestalt” representation of the story “so far”.
• The core architectural feature of the model is the use of a “deep transition” RNN (a minimal sketch follows at the end of this section).
• Each input proposition fills up to 7 possible roles (agent, predicate, patient or theme, recipient or destination, location, manner, attribute), of which only the “predicate” role is mandatory.
• Story example:
  agent = Andrew, predicate = decided-to-go, destination = beach
  agent = Andrew, predicate = drove, patient = Mercedes, destination = beach
  agent = Andrew, predicate = returned, destination = home
• We use Adam for training instead of plain SGD (more efficient).
• What the model can do:
  Inference: if Clement goes to a restaurant, he won’t tip.
  Revision: if Clement ordered expensive wine, though, he will.
  Pronoun resolution: ‘He’ refers to Clement if there is no other male in the story.
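To make the architecture concrete, here is a minimal PyTorch sketch of a deep-transition gestalt network trained with Adam. All layer sizes and names, and the full-proposition decoder (the original model instead answers role probes), are our assumptions for illustration, not St. John’s (1992) exact specification.

```python
import torch
import torch.nn as nn

# Assumed sizes, for illustration only.
ROLES = ["agent", "predicate", "patient", "recipient",
         "location", "manner", "attribute"]            # the 7 possible roles
N_FILLERS = 50                                         # hypothetical filler vocabulary size
PROP_DIM = len(ROLES) * N_FILLERS                      # a proposition: one filler slot per role
GESTALT_DIM = 100                                      # hypothetical gestalt layer size

class StoryGestalt(nn.Module):
    """Deep-transition RNN sketch: the previous gestalt activation is
    combined with the current input proposition through an intermediate
    hidden layer before the gestalt layer itself is updated."""
    def __init__(self):
        super().__init__()
        self.combine = nn.Linear(PROP_DIM + GESTALT_DIM, GESTALT_DIM)  # input + previous gestalt
        self.update = nn.Linear(GESTALT_DIM, GESTALT_DIM)              # deep transition into the gestalt
        self.decode = nn.Linear(GESTALT_DIM, PROP_DIM)                 # read a proposition back out

    def forward(self, story):
        # story: (n_sentences, batch, PROP_DIM), one proposition per time step
        g = torch.zeros(story.shape[1], GESTALT_DIM)   # initial gestalt
        outputs = []
        for x in story:
            h = torch.tanh(self.combine(torch.cat([x, g], dim=-1)))
            g = torch.tanh(self.update(h))             # gestalt of the story "so far"
            outputs.append(self.decode(g))
        return torch.stack(outputs), g

model = StoryGestalt()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam instead of plain SGD
```

Under this encoding, the story example above becomes a sequence of PROP_DIM vectors, each built by placing the filler for every occupied role (e.g., agent = Andrew) into that role’s slot and leaving empty roles at zero.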
Replication of previous results
• Before testing the relational processing capabilities of the model, we replicated St. John’s (1992) original results to ensure that our implementation of the model was correct.
• The Story Gestalt model is able to draw complex inferences based on the statistical regularities of its training corpus. At first glance, it looks like the model is making inferences based on abstract roles.
• However, the model can’t go beyond its training data set (it can only use seen pairings).
Generalization
• Much of the positive evaluation that the Story Gestalt model has gained comes from its generalization capabilities.
• St. John (1992) argued that the model is able to process untrained texts with the help of a highly combinatorial corpus.
• Script example (slot placeholders lost in extraction are shown in angle brackets):
  <person> liked <food>.
  <person> went-to the restaurant-0.
  <person> ordered <food>.
• Script restrictions:
  ‘Andrew’ likes [‘pancakes’, ‘salad’].
  ‘Lois’ likes [‘banana’, ‘peaches’].
• Instantiated story:
  Andrew liked <food>.
  Andrew went-to the restaurant-0.
  Andrew ordered <food>.
• Applying St. John’s procedure involves providing cases in which Andrew likes all kinds of food in places other than the restaurant (1–8 contexts; here applied only to Lois; see the corpus sketch after this section).
• The model fails to capture the relational structure of texts with untrained concepts and texts that break the statistical regularities of the corpus:
• Statistical regularity test and new concepts test:
  [Figures: mean scores for regular vs. violation stories (y-axis 0–1) at 0 and 8 contexts, for the conditions New agents, No agent, Cross-script patient, and New patient.]
• Effect of applying a highly combinatorial corpus:
  [Figure: mean scores for regular vs. violation stories at 0 and 8 contexts.]
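As a sketch of how such a combinatorial corpus and the decoupling manipulation can be generated: the script structure and the likes-restrictions come from the poster, while the extra context names and story counts are hypothetical.

```python
import random

# Script restrictions from the poster; everything else is assumed.
LIKES = {"Andrew": ["pancakes", "salad"], "Lois": ["banana", "peaches"]}
OTHER_PLACES = ["home-0", "park-0", "beach-0", "school-0",
                "cafe-0", "office-0", "gym-0", "cinema-0"]  # hypothetical context names

def restaurant_story(person):
    """Instantiate the restaurant script for one person."""
    food = random.choice(LIKES[person])
    return [(person, "liked", food),
            (person, "went-to", "restaurant-0"),
            (person, "ordered", food)]

def decoupling_stories(person, n_contexts):
    """St. John-style procedure: pair the person with every food in
    n_contexts (0-8) places other than the restaurant."""
    all_foods = [food for foods in LIKES.values() for food in foods]
    return [[(person, "liked", food), (person, "went-to", place)]
            for place in OTHER_PLACES[:n_contexts]
            for food in all_foods]

corpus = [restaurant_story(person) for person in LIKES for _ in range(100)]
corpus += decoupling_stories("Lois", n_contexts=8)  # manipulation applied only to Lois
```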
Discussion
• It has been argued that the Story Gestalt model forms an overall representation of the story presented so far (in the gestalt layer).
• Machine learning research has shown that “deep transition” RNNs are powerful time-series learning machines (Pascanu et al., 2013).
• Other related RNN architectures that don’t have a straightforward interpretation as forming a “gestalt” representation could show the same behavior as the Story Gestalt model (e.g., LSTM, GRU; a GRU-based sketch follows this list).
• Instead of trying to emulate relational processing without using symbols, the real problem is how to implement symbolic operations in a neural-like architecture.
• Future work: we plan to extend these simulations to fully distributed representations of concepts in the input layer of the model.
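A sketch of one such alternative, substituting a GRU cell for the deep-transition core of the earlier sketch; the sizes reuse the assumptions made there, and nothing here comes from the original model.

```python
import torch
import torch.nn as nn

class GestaltGRU(nn.Module):
    """Same story-level recurrence, but the state is updated by a gated
    GRU cell rather than an explicit combine-then-update gestalt step."""
    def __init__(self, prop_dim=350, gestalt_dim=100):
        super().__init__()
        self.cell = nn.GRUCell(prop_dim, gestalt_dim)
        self.decode = nn.Linear(gestalt_dim, prop_dim)

    def forward(self, story):
        # story: (n_sentences, batch, prop_dim)
        g = torch.zeros(story.shape[1], self.cell.hidden_size)
        outputs = []
        for x in story:
            g = self.cell(x, g)            # hidden state plays the role of the gestalt
            outputs.append(self.decode(g))
        return torch.stack(outputs), g
```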
References
• Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.
• Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37, 243–282.
• Pascanu, R., Gulcehre, C., Cho, K., & Bengio, Y. (2013). How to construct deep recurrent neural networks. arXiv:1312.6026.
• Rogers, T. T., & McClelland, J. L. (2008). Précis of Semantic cognition: A parallel distributed processing approach. Behavioral and Brain Sciences, 31(6), 689–714.
• Rogers, T. T., & McClelland, J. L. (2014). Parallel Distributed Processing at 25: Further explorations in the microstructure of cognition. Cognitive Science, 38(6), 1024–1077.
• St. John, M. F. (1992). The story gestalt: A model of knowledge-intensive processes in text comprehension. Cognitive Science, 16, 271–306.
• St. John, M. F., & McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46, 217–257.