PDP Models Can't Learn Relational Representations

Guillermo Puebla & Leonidas A. A. Doumas

Department of Psychology, PPLS, University of Edinburgh, Scotland, UK.

Story Gestalt Model



Taking the activation of the gestalt layer at the previous time step and combining it with the input sentence at the current time step allows the model to form a "gestalt" representation of the story "so far".
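A minimal sketch of this update step is given below. The layer sizes, the sigmoid nonlinearity, and the use of PyTorch are our own illustrative assumptions; the original model's exact layer sizes and training details are not reproduced here.

```python
import torch
import torch.nn as nn

class GestaltStep(nn.Module):
    """One step of the story gestalt update: the previous gestalt and the
    current input sentence pass through an intermediate layer before the
    new gestalt is produced (a "deep transition")."""
    def __init__(self, sent_size, hidden_size, gestalt_size):
        super().__init__()
        self.combine = nn.Linear(sent_size + gestalt_size, hidden_size)
        self.to_gestalt = nn.Linear(hidden_size, gestalt_size)

    def forward(self, sentence, prev_gestalt):
        h = torch.sigmoid(self.combine(torch.cat([sentence, prev_gestalt], dim=-1)))
        return torch.sigmoid(self.to_gestalt(h))

# Process a story sentence by sentence, carrying the gestalt forward.
step = GestaltStep(sent_size=120, hidden_size=100, gestalt_size=100)
story_sentences = [torch.rand(1, 120) for _ in range(3)]   # placeholder propositions
gestalt = torch.zeros(1, 100)
for sentence in story_sentences:
    gestalt = step(sentence, gestalt)
```

The poster's note about using Adam rather than plain SGD would correspond here simply to constructing the optimizer as `torch.optim.Adam(step.parameters())`.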





Story example:
• agent = Andrew, predicate = decided-to-go, destination = beach
• agent = Andrew, predicate = drove, patient = Mercedes, destination = beach
• agent = Andrew, predicate = returned, destination = home

• We use Adam for training instead of plain SGD (more efficient).

What the model can do (see the inference, revision, and pronoun-resolution examples below):

The model fails to capture the role structure of texts with untrained concepts and of texts that break the statistical regularities of the corpus:



Statistical regularity test:

Before testing the relational processing capabilities of the model, we replicated St. John's (1992) original results to ensure that our implementation of the model was correct.


7 possible roles (agent, predicate, patient or theme, recipient or destination, location, manner, attribute), of which only the "predicate" role is mandatory.
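As a concrete, purely illustrative sketch of such an input scheme, a proposition can be encoded as one filler slot per role. The vocabulary, the one-hot coding, and the slot names below are our assumptions, not the original encoding.

```python
import numpy as np

# 'patient' stands in for patient/theme, 'recipient' for recipient/destination.
ROLES = ["agent", "predicate", "patient", "recipient", "location", "manner", "attribute"]
CONCEPTS = ["Andrew", "Lois", "decided-to-go", "drove", "returned",
            "Mercedes", "beach", "home", "<none>"]          # illustrative vocabulary

def encode_proposition(prop):
    """Concatenate one one-hot filler vector per role; unfilled roles get <none>."""
    assert "predicate" in prop                               # only 'predicate' is mandatory
    slots = []
    for role in ROLES:
        filler = prop.get(role, "<none>")
        one_hot = np.zeros(len(CONCEPTS))
        one_hot[CONCEPTS.index(filler)] = 1.0
        slots.append(one_hot)
    return np.concatenate(slots)                             # length = 7 * len(CONCEPTS)

# One sentence from the story example:
p = encode_proposition({"agent": "Andrew", "predicate": "drove",
                        "patient": "Mercedes", "recipient": "beach"})
```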




Replication of previous results

The model processes a story by taking as input a sequence of sentences, one by one, and forming a representation of the story presented so far in the gestalt layer.






However, the model can't go beyond its training data set (it can only use seen pairings).



The real problem is not how to emulate relational processing without using symbols, but how to implement symbolic operations in a neural-like architecture.



Future work:



• Andrew liked ___.
• Andrew went-to the restaurant-0.
• Andrew ordered ___.

• We plan to extend these simulations to fully distributed representations of concepts in the input layer of the model.
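As a rough sketch of that direction (the embedding size and the use of a learned embedding table are our assumptions, not the poster's implementation), concept fillers could be given distributed codes instead of dedicated one-hot units:

```python
import torch
import torch.nn as nn

# Hypothetical distributed input coding: each concept gets a learned dense vector
# rather than a dedicated one-hot unit.
concepts = ["Andrew", "Lois", "pancakes", "salad", "banana", "peaches", "<none>"]
embed = nn.Embedding(num_embeddings=len(concepts), embedding_dim=16)

def concept_vector(name):
    return embed(torch.tensor(concepts.index(name)))   # a 16-d distributed code

filler = concept_vector("Andrew")
```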

[Figure: mean violation vs. mean regular, for 0 contexts and 8 contexts.]

• Inference: if Clement goes to a restaurant, he won't tip.
• Revision: if Clement ordered expensive wine, though, he will.
• Pronoun resolution: 'he' refers to Clement if there is no other male in the story.


Generalization

Much of the positive evaluation that the Story Gestalt model has gained comes from its generalization capabilities.



St. John (1992) argued that the model is able to process untrained texts with the help of a highly combinatorial corpus.



Script example:
• ___ liked ___.
• ___ went-to the restaurant-0.
• ___ ordered ___.



Script restrictions:
• 'Andrew' likes ['pancakes', 'salad'].
• 'Lois' likes ['banana', 'peaches'].



Applying St. John's procedure involves providing cases in which Andrew likes all kinds of food in places other than the restaurant (1-8 here, only for Lois).
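The following is a rough sketch of this kind of corpus manipulation. The script templates, context labels, and story counts are our own illustrative stand-ins, not the corpus actually used in the poster.

```python
import itertools
import random

# Illustrative corpus fragments; the real training corpus and its scripts
# come from St. John (1992) and are only approximated here.
CHARACTERS = {"Andrew": ["pancakes", "salad"], "Lois": ["banana", "peaches"]}
ALL_FOODS = ["pancakes", "salad", "banana", "peaches"]
OTHER_CONTEXTS = ["kitchen-1", "kitchen-2", "picnic-1", "picnic-2",
                  "party-1", "party-2", "cafe-1", "cafe-2"]   # hypothetical labels

def restaurant_story(agent, food):
    return [{"agent": agent, "predicate": "liked", "patient": food},
            {"agent": agent, "predicate": "went-to", "location": "restaurant-0"},
            {"agent": agent, "predicate": "ordered", "patient": food}]

def training_corpus(n_extra_contexts):
    """Restaurant stories respect the script restrictions; the extra contexts
    pair each character with every food, which is what is varied (0 vs. 8)."""
    stories = [restaurant_story(a, f) for a, foods in CHARACTERS.items() for f in foods]
    for context in OTHER_CONTEXTS[:n_extra_contexts]:
        for agent, food in itertools.product(CHARACTERS, ALL_FOODS):
            stories.append([{"agent": agent, "predicate": "liked",
                             "patient": food, "location": context}])
    random.shuffle(stories)
    return stories

corpus_0 = training_corpus(0)   # no combinatorial support
corpus_8 = training_corpus(8)   # highly combinatorial corpus
```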




• It has been argued that the Story Gestalt model forms an overall representation of the story presented so far (in the gestalt layer). Machine learning research has shown that "deep transition" RNNs are powerful time-series learning machines (Pascanu et al., 2013).
• Other related RNN architectures that don't have a straightforward interpretation as forming a "gestalt" representation could show the same behavior as the Story Gestalt model (e.g., LSTM, GRU).
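For concreteness, a GRU cell could be dropped into the same sentence-by-sentence regime sketched earlier, with no layer that is naturally read as a "gestalt". Sizes are illustrative; this is not the architecture used in the poster.

```python
import torch
import torch.nn as nn

# A standard gated recurrent cell carries the story state instead of a gestalt layer.
cell = nn.GRUCell(input_size=120, hidden_size=100)

state = torch.zeros(1, 100)
story_sentences = [torch.rand(1, 120) for _ in range(3)]   # placeholder propositions
for sentence in story_sentences:
    state = cell(sentence, state)                          # same sentence-by-sentence regime
```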



References

Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.

Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37, 243–282.

Pascanu, R., Gulcehre, C., Cho, K., & Bengio, Y. (2013). How to construct deep recurrent neural networks. arXiv:1312.6026.

Rogers, T. T., & McClelland, J. L. (2008). Précis of semantic cognition: A parallel distributed processing approach. Behavioral and Brain Sciences, 31(6), 689.

Rogers, T. T., & McClelland, J. L. (2014). Parallel distributed processing at 25: Further explorations in the microstructure of cognition. Cognitive Science, 38(6), 1024–1077.

St. John, M. F. (1992). The story gestalt: A model of knowledge-intensive processes in text comprehension. Cognitive Science, 16, 271–306.

St. John, M. F., & McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46, 217–257.

New concepts test:

[Figure: mean violation vs. mean regular, 0 contexts and 8 contexts; condition labels include new patient, new agents, no agent, and cross-script patient.]

Introduction

The ability of PDP models to capture abstract relational knowledge has been put into question repeatedly (e.g., Hummel & Holyoak, 2003; Marcus, 1998).

Recently, Rogers and McClelland (2008, 2014) have argued that 'gestalt' PDP models (St. John & McClelland, 1990; St. John, 1992) can learn to bind abstract roles to objects and use these representations to make inferences based on the relational structure of the input.

Take for example the Story Gestalt model (St. John, 1992). If this model is presented with a story about a restaurant where Albert plays the role of the agent ordering food, the model predicts that Albert will eat the food and pay the bill instead of bringing the bill or receiving the payment.

Here, we show that even though the model is able to draw complex inferences based on the statistical regularities of its training corpus, it fails to capture the relational structure of input that breaks these statistical regularities.

The Story Gestalt model is able to draw complex inferences based on the statistical regularities of its training corpus. At first glance, it looks like the model is making inferences based on abstract roles.

Discussion

Effect of applying a highly combinatorial corpus:






The core architectural feature of the model is the use of a "deep transition" RNN.
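As a sketch of what such a transition computes (the symbols and the choice of sigmoid nonlinearity are illustrative assumptions, not taken from the original implementation):

$$h_t = \sigma(W_s s_t + W_g g_{t-1} + b_h), \qquad g_t = \sigma(W_h h_t + b_g)$$

Here $s_t$ is the current input sentence and $g_{t-1}$ is the gestalt from the previous time step; the intermediate hidden layer $h_t$ between $g_{t-1}$ and $g_t$ is what makes the transition "deep".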



