The 13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems (IFAC HMS 2016 in Kyoto)

Simultaneous Estimation of Self-position and Word from Noisy Utterances and Sensory Information Akira Taniguchi *, Tadahiro Taniguchi *, Tetsunari Inamura ** * Ritsumeikan University, Japan (e-mail: [email protected]). ** National Institute of Informatics/The Graduate University for Advanced Studies, Japan.


Outline
1. Background and purpose of our research
2. Research outline
3. Proposed method: Simultaneous estimation of self-positions and words
4. Experiment 1: Learning spatial concepts
5. Experiment 2: Evaluation of word clustering
6. Experiment 3: Self-localization using learned spatial concepts
7. Conclusion


Background of our research
• Lexical acquisition in robots
  – The robot has no word knowledge in advance.
  – The robot learns words from human speech signals.
  – Resolving the identity of words across error-prone speech recognition results is a very difficult problem.
• Self-localization in robots
  – The robot estimates its position using sensor information and an environment map.
  – If the robot uses only a local sensor, sequential global localization is a very difficult problem.


Purpose of our research
• Lexical acquisition: uncertain verbal information obtained from human speech through phoneme/syllable recognition. For example, "The front of TV" may be recognized as /afroqtabutibe/, /zafroqtabutibe/, or /zafrontobutibi/, and other utterances as /waitosherfu/ or /bigbuqkkais/.
• Self-localization: uncertain position information estimated by Monte Carlo Localization (MCL).
Our aim is the mutually effective utilization of these two sources of uncertain information for lexical acquisition related to places.


Research outline (1/4)
The robot is at a place that is a learning target.

Research outline (2/4): Teaching
The user teaches place names by speech, e.g., "a front of TV" and "white shelf". The speech recognition results are ambiguous, e.g., /afurantotibi/ and /waitosherufu/. Multiple teachings are given.

Research outline (3/4): Learning spatial concepts
The robot learns spatial concepts from the ambiguous recognition results, e.g., /waitosherufu/, /afurantobutibi/, and /bigusheufu/, together with the positions where they were uttered.

Research outline (4/4): Modification of self-localization
When the robot asks "Where is this place?" and the user answers "a front of TV" (recognized as /furontabutebi/), the robot can narrow down the hypotheses of its self-position.


Simultaneous estimation of self-positions and words
• Monte Carlo Localization (MCL): a probabilistic self-localization algorithm based on a particle filter.
• Spatial concept: a place name together with a spatial area (position distribution). Example spatial concepts: W1: /sherufu/, W2: /torashuboqs/, W3: /teebou/, W4: /terebi/.
• The proposed method is based on a Bayesian probabilistic approach.
• The robot can learn place names from the user's speech signals without prior word knowledge.
• The robot can reduce its self-position estimation error by using the learned word information.

Simultaneous estimation of self-positions and words: graphical model
• $x_t$: robot's position
• $z_t$: sensor data
• $u_t$: control data
• $O_t$: speech recognition result
• $C_t$: state of the spatial concept
• $W$: place names
• $\mu, \Sigma$: parameters of the position distributions
Learning spatial concepts: parameter estimation by Gibbs sampling.
Self-localization: the posterior distribution is calculated as follows:

$$p(x_{0:t} \mid z_{1:t}, u_{1:t}, O_{1:t}) \propto p(z_t \mid x_t)\, p(O_t \mid x_t)\, p(x_t \mid x_{t-1}, u_t)\, p(x_{0:t-1} \mid z_{1:t-1}, u_{1:t-1}, O_{1:t-1})$$

$$p(O_t \mid x_t) \approx \sum_{C_t} p(O_t \mid W, C_t)\, p(x_t \mid \mu, \Sigma, C_t)\, p(C_t) \approx \sum_{C_t} \exp\!\left(-\beta\, \mathrm{LD}(O_t, W_{C_t})\right) \mathcal{N}(x_t \mid \mu_{C_t}, \Sigma_{C_t})\, \frac{1}{L}$$

where $\beta$ is a parameter controlling the effect of the Levenshtein distance $\mathrm{LD}$, $L$ is the number of spatial concepts, and $\mathcal{N}(\cdot)$ is a Gaussian distribution.
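As a sketch of how the word observation enters the particle weighting, the sum over spatial concepts can be computed directly. This is a minimal illustration under assumed isotropic covariances, not the authors' implementation; `levenshtein`, `gauss2d`, and `word_obs_likelihood` are names introduced here.

```python
import math

def levenshtein(a, b):
    # Standard dynamic-programming edit distance between two syllable strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def gauss2d(x, mu, var):
    # Isotropic 2-D Gaussian density (diagonal covariance assumed for brevity).
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var)

def word_obs_likelihood(O_t, x, W, mus, variances, beta):
    """p(O_t | x_t) ~= sum_c exp(-beta * LD(O_t, W_c)) * N(x_t | mu_c, Sigma_c) * (1/L)."""
    L = len(W)
    return sum(math.exp(-beta * levenshtein(O_t, W[c])) *
               gauss2d(x, mus[c], variances[c]) for c in range(L)) / L
```

A particle whose position matches the concept named in the utterance receives a larger weight than one far from it, which is how the word information corrects localization.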

Learning algorithm for spatial concepts using Gibbs sampling
1. Initialization of $\mu, \Sigma$
2. Sampling of $C_t$ without $W$
3. Sampling of $\mu, \Sigma$
4. Sampling of $W$
5. Sampling of $C_t$ with $W$
6. Multiple iterations of steps 3-5

Step 1: We set the initial position distributions. Each mean vector $\mu$ is drawn uniformly at random from the given range, i.e., the range in which the robot can move on the map. Each covariance matrix $\Sigma$ is set to $\mathrm{diag}(\sigma_{\mathrm{initial}}, \sigma_{\mathrm{initial}})$.
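Step 1 can be sketched as follows; a minimal illustration assuming a rectangular movable range, with the function name `init_position_distributions` introduced here.

```python
import random

def init_position_distributions(L, x_range, y_range, sigma_init):
    """Step 1 (sketch): draw each mean uniformly from the movable range on the
    map and set each covariance matrix to diag(sigma_init, sigma_init)."""
    mus = [(random.uniform(*x_range), random.uniform(*y_range)) for _ in range(L)]
    sigmas = [[[sigma_init, 0.0], [0.0, sigma_init]] for _ in range(L)]
    return mus, sigmas
```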

Step 2: The conditional posterior distribution of $C_t$ is used to sample the state of the spatial concept $C_t$ for each time (data index) $t$. In this step, the robot does not yet estimate the place names $W$, so the sampling uses position information only:

$$C_t \sim p(x_t \mid \mu_{C_t}, \Sigma_{C_t})$$
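Step 2 amounts to categorical sampling in proportion to each concept's Gaussian position likelihood. A minimal sketch, again assuming isotropic covariances; `sample_concept_without_words` is a name introduced here.

```python
import math
import random

def sample_concept_without_words(x_t, mus, variances):
    """Step 2 (sketch): sample C_t in proportion to the Gaussian position
    likelihood of each spatial concept (isotropic covariances assumed)."""
    weights = [math.exp(-((x_t[0] - m[0]) ** 2 + (x_t[1] - m[1]) ** 2) / (2 * v))
               / (2 * math.pi * v)
               for m, v in zip(mus, variances)]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for c, w in enumerate(weights):
        acc += w
        if r <= acc:
            return c
    return len(weights) - 1
```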

Step 3: The conditional posterior distribution of $\mu, \Sigma$ is used to sample the position distribution for each state of the spatial concept $c$. The position distribution is updated from the coordinate data of the places where the utterances were made, by Bayesian inference for a Gaussian distribution with a Gauss-Wishart prior:

$$\mu_c, \Sigma_c^{-1} \sim \mathcal{N}\!\left(\mu_c \mid m_N, (\kappa_N \Sigma_c^{-1})^{-1}\right) \mathcal{W}\!\left(\Sigma_c^{-1} \mid V_N, \nu_N\right)$$

where $m_N, \kappa_N, V_N, \nu_N$ are the hyperparameters of the posterior Gauss-Wishart distribution.
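The Gauss-Wishart posterior sampling of step 3 can be sketched with SciPy. This follows the standard conjugate Gaussian update, with hyperparameter names chosen here and not necessarily matching the paper's notation.

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

def sample_gauss_wishart_posterior(X, m0, kappa0, V0, nu0, rng=None):
    """Step 3 (sketch): sample (mu_c, Sigma_c) from the Gauss-Wishart posterior
    given the positions X assigned to concept c (standard conjugate update)."""
    X = np.asarray(X, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    n = len(X)
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)                 # scatter matrix around the mean
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    d = xbar - m0
    Vn_inv = np.linalg.inv(V0) + S + (kappa0 * n / kappa_n) * np.outer(d, d)
    V_n = np.linalg.inv(Vn_inv)
    Lam = wishart.rvs(df=nu_n, scale=V_n, random_state=rng)   # precision matrix
    Sigma = np.linalg.inv(Lam)
    mu = multivariate_normal.rvs(mean=m_n, cov=Sigma / kappa_n, random_state=rng)
    return mu, Sigma
```

With enough data assigned to a concept, the sampled mean concentrates near the empirical mean of those positions.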

Step 4: The conditional posterior distribution of $W_c$ is used to sample the place name $W_c$ for each state of the spatial concept $c$:

$$W_c \sim \left[\prod_{t \in T_c} p(O_t \mid W_c)\right] p(W_c), \qquad p(W_c) = 1/N$$

where $T_c$ is the set of time indices assigned to concept $c$ and the prior $p(W_c)$ is uniform over the $N$ word candidates.
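Combining this with the Levenshtein-distance likelihood from the self-localization equations, step 4 can be sketched as sampling a candidate string in proportion to the product of per-observation likelihoods (done in log space for stability). `sample_place_name` is a name introduced here; this is an illustration, not the authors' code.

```python
import math
import random

def levenshtein(a, b):
    # Standard dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sample_place_name(observations_c, candidates, beta):
    """Step 4 (sketch): sample W_c among candidate strings with probability
    proportional to prod_t exp(-beta * LD(O_t, w)) under a uniform prior."""
    logw = [-beta * sum(levenshtein(o, w) for o in observations_c)
            for w in candidates]
    m = max(logw)                        # subtract the max for numerical stability
    probs = [math.exp(l - m) for l in logw]
    r = random.uniform(0, sum(probs))
    acc = 0.0
    for w, p in zip(candidates, probs):
        acc += p
        if r <= acc:
            return w
    return candidates[-1]
```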

Step 5: The conditional posterior distribution of $C_t$ is used to sample the state of the spatial concept $C_t$ for each time $t$, now using both position and word information:

$$C_t \sim p(x_t \mid \mu_{C_t}, \Sigma_{C_t})\, p(O_t \mid W_{C_t})$$

Step 6: Multiple iterations are performed for the process described in steps 3 to 5 above.


Experiment 1: Learning spatial concepts
Conditions:
• Four places were selected as learning targets: /terebimae/ (in front of the TV), /gomibako/ (trash bin), /siroitana/ (white shelf), and /tsukue/ (desk).
• The user decides whether the robot has arrived at a learning target.
• The teaching utterances are repeated 40 times.
• The number of spatial concepts: $L = 4$
• The number of iterations: 10
• Speech recognition system: Julius (with a Japanese syllable dictionary)
• Microphone: SHURE PG27-USB
• The environment is a 500 cm × 500 cm space on SIGVerse [Inamura et al. (2010)].

Inamura, T., et al. (2010). Simulator platform that enables social interaction simulation -SIGVerse: SocioIntelliGenesis simulator-. In IEEE/SICE International Symposium on System Integration, 212-217.


Learning result of spatial concepts
The learned spatial concepts and place names were W1: /sinitana/, W2: /gumiwako/, W3: /tsukune/, and W4: /terimae/.


Experiment 2: Evaluation of word clustering
Conditions: We compared the estimated states of the spatial concept $C_t$ for the teaching data against a classification of the same data made by a human (ground truth), e.g., $C_t = 1, \dots, 4$. The white boxes in the figure represent the classification of the teaching data according to the state of the spatial concept.

The estimation accuracy rate (EAR) is calculated as follows:

$$\mathrm{EAR} = \frac{\text{the number of correctly estimated data}}{\text{the number of all teaching data}}$$
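The EAR is a simple matching rate; a minimal sketch follows, assuming cluster labels have already been matched to the ground-truth labels (`estimation_accuracy_rate` is a name introduced here).

```python
def estimation_accuracy_rate(estimated, ground_truth):
    """EAR (sketch): fraction of teaching data whose estimated concept state
    matches the human ground-truth label (labels assumed pre-aligned)."""
    assert len(estimated) == len(ground_truth)
    correct = sum(e == g for e, g in zip(estimated, ground_truth))
    return correct / len(ground_truth)
```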

Evaluation of word clustering
The figures show the estimated results of the state of the spatial concept over 10 trials.
• Clustering with words $O_t$ only: EAR = 0.81
• The proposed method: EAR = 0.96
These results show that clustering that considers both position and word information improves the accuracy of lexical acquisition.


Experiment 3: Self-localization using learned spatial concepts
We evaluated self-localization using the learned spatial concepts, comparing:
• Conventional MCL (without word information)
• The proposed method (with word information)

While the robot performs self-localization, the estimation error at each time step is calculated as follows:

$$e_t = \sqrt{(x_t - \bar{x}_t)^2 + (y_t - \bar{y}_t)^2}, \qquad \bar{x}_t = \sum_i w_t^{(i)} x_t^{(i)}, \quad \bar{y}_t = \sum_i w_t^{(i)} y_t^{(i)}$$

where $(x_t, y_t)$ is the robot's true position and $x_t^{(i)}, y_t^{(i)}, w_t^{(i)}$ are the position and weight of particle $i$.

Two evaluation indicators:
• $e_m$: the average of $e_t$
• $e_r$: the minimum value $\gamma$ such that the interval $[0, \gamma]$ includes at least 95% of the values of $e_t$ over all teaching times
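The error metric and the two indicators can be sketched directly from the definitions above; the function names here are introduced for illustration, and $e_r$ is taken as the empirical 95% coverage bound.

```python
import math

def weighted_mean_position(particles):
    """Weighted particle mean; particles are (x, y, w) tuples with weights summing to 1."""
    xbar = sum(x * w for x, _, w in particles)
    ybar = sum(y * w for _, y, w in particles)
    return xbar, ybar

def estimation_error(true_pos, particles):
    """e_t: Euclidean distance between the true position and the weighted particle mean."""
    xbar, ybar = weighted_mean_position(particles)
    return math.hypot(true_pos[0] - xbar, true_pos[1] - ybar)

def error_indicators(errors):
    """e_m: mean of e_t; e_r: smallest gamma with [0, gamma] covering >= 95% of e_t."""
    e_m = sum(errors) / len(errors)
    s = sorted(errors)
    k = math.ceil(0.95 * len(s))      # index of the 95% coverage bound
    e_r = s[k - 1]
    return e_m, e_r
```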

Experiment 3: Conditions
• Landmark-based MCL: when the robot sees a landmark, it obtains the distance and angle to that landmark.
• View angle of the camera: 45 degrees
• The number of particles: 1000
• The number of landmarks: 4
• The robot moves for 400 steps.
• When the robot arrives at a learning target, the user says the place name of that target (W1: /sinitana/, W2: /gumiwako/, W3: /tsukune/, W4: /terimae/).

Situation of the experiment (video clip)

Results
The estimation errors of the proposed method were smaller than those of conventional MCL for both evaluation indicators $e_m$ and $e_r$. This shows that the learned spatial concepts enable correction of the global self-localization error.


Conclusion
• Learning spatial concepts: the robot could acquire spatial concepts by integrating noisy information from sensors and speech.
• Lexical acquisition related to places: clustering that considers uncertain position information together with uncertain word information improves the accuracy of lexical acquisition.
• Self-localization using spatial concepts: by employing the learned spatial concepts, the robot was able to estimate its self-position with lower error than conventional MCL.

THANK YOU FOR YOUR KIND ATTENTION.

The results of speech recognition using the Japanese syllable dictionary

Teaching word: /shiroitana/ (white shelf)
Recognized words: /shiroitano/, /shinitana/, /shiroitaga/, /shinotaga/, /shinutana/, /chinitano/, /tsutana/, /shinonitana/, /shinotana/, /shiruitana/

Teaching word: /tsukue/ (desk)
Recognized words: /tsukube/, /tsukune/, /tsukuu/, /tsukue/, /tsukume/, /tsukune/, /tsukuke/, /tsutsune/, /tsukune/, /tsukume/

Teaching word: /gomibako/ (trash bin)
Recognized words: /komiwako/, /gubiwako/, /gumiwako/, /gubiwako/, /kumiwako/, /komiwako/, /gumiwako/, /gumiwako/, /gumibako/, /gumibaku/

Teaching word: /terebimae/ (in front of the TV)
Recognized words: /tebimae/, /tegikunae/, /terenae/, /terimae/, /tezurimae/, /terimae/, /teteginae/, /teribinae/, /terimae/, /terenae/

Smoothing
• Monte Carlo Localization: an online algorithm
• Learning spatial concepts: an offline algorithm
(Figure: graphical model of the states $x_0, \dots, x_T$ and observations $z_1, \dots, z_T$.)
In general, the smoothing method can provide a more accurate estimate than online MCL. In the learning phase, the self-positions $x_{1:T}$ are therefore sampled using a Monte Carlo fixed-lag smoother.
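The idea of fixed-lag smoothing can be sketched by resampling whole particle trajectories by their current weights and reading off a state a fixed lag in the past; this is a minimal illustration with names introduced here, not the paper's implementation.

```python
import random

def resample_trajectories(trajectories, weights):
    """One smoother step (sketch): resample whole particle trajectories by their
    current weights, so that reading a state `lag` steps back approximates the
    smoothed distribution p(x_{t-L} | z_{1:t})."""
    n = len(trajectories)
    total = sum(weights)
    picks = []
    for _ in range(n):
        r = random.uniform(0, total)
        acc = 0.0
        for traj, w in zip(trajectories, weights):
            acc += w
            if r <= acc:
                picks.append(list(traj))
                break
    return picks

def smoothed_state(trajectories, lag):
    # Average the state `lag` steps before the current time over trajectories.
    states = [traj[-1 - lag] for traj in trajectories]
    n = len(states)
    return (sum(s[0] for s in states) / n, sum(s[1] for s in states) / n)
```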

Monte Carlo fixed-lag smoothing (particle smoother)
In general, the smoothing method can provide a more accurate estimate than online MCL. (Figure: the robot's true position, the particle trajectories between $x_{t-L}$ and $x_t$, and the smoothed trajectory.)

Speech recognition system: Julius
• Word dictionary: a Japanese syllable dictionary (Japanese syllables only), used to recognize speech signals as syllable sequences.
• Julius adopts the set of 43 Japanese phonemes defined by the Acoustical Society of Japan (ASJ) speech database committee.
• The Julius system uses a word dictionary containing 115 Japanese syllables.
• We assume that the robot recognizes user speech at the unit of syllables, i.e., the robot has no word knowledge in advance.
Julius dictation-kit-v4.2, http://julius.sourceforge.jp/

The clustering method using word data $O_t$ only (without position data $x_t$)
Compared with the full algorithm, the sampling steps become:

$$C_t \sim p(C_t)$$
$$\mu_c, \Sigma_c \sim p(\mu_c, \Sigma_c)$$
$$W_c \sim \left[\prod_{t \in T_c} p(O_t \mid W_c)\right] p(W_c)$$
$$C_t \sim p(O_t \mid W_{C_t})\, p(C_t)$$

i.e., the position distributions are drawn from their priors and $C_t$ depends only on the word information.

Nonparametric Bayesian Spatial Concept Acquisition method (SpCoA)
Example: "Here is Emergent System Lab." → /emajentoshitemurabo/, /taniguchizurabo/; /konekutiNgukorida/ (connecting corridor).
• This model can learn spatial concepts from continuous speech signals.
• This model can learn an appropriate number of spatial concepts depending on the data (using a nonparametric Bayesian approach).
• This model can relate several places to several names (many-to-many correspondences between names and places).
Akira Taniguchi, Tadahiro Taniguchi, and Tetsunari Inamura, "Spatial Concept Acquisition for a Mobile Robot that Integrates Self-Localization and Unsupervised Word Discovery from Spoken Sentences", IEEE Transactions on Cognitive and Developmental Systems, 2016.
