Prediction of Next Contextual Changing Point of Driving Behavior Using Unsupervised Bayesian Double Articulation Analyzer

Shogo Nagasaka1, Tadahiro Taniguchi1, Kentarou Hitomi2, Kazuhito Takenaka2, and Takashi Bando2

Abstract— Future advanced driver assistance systems (ADASs) should observe driving behavior and detect contextual changing points of driving behaviors. In this paper, we propose a novel method for predicting the next contextual changing point of driving behavior on the basis of a Bayesian double articulation analyzer. To develop the method, we extended a previously proposed semiotic predictor using an unsupervised double articulation analyzer that can extract a two-layered hierarchical structure from driving-behavior data. We employ the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) [4] to model the duration time of a segment of driving behavior explicitly, instead of the sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) employed in the previous model [13]. Then, to recover the hierarchical structure of contextual driving behavior as a sequence of chunks, we use the nested Pitman-Yor language model [6], which can extract latent words from sequences of latent letters. On the basis of this extension, we develop a method for calculating the posterior probability distribution of the next contextual changing point by theoretically marginalizing over the possible results of the chunking method and the potentially successive words. To evaluate the proposed method, we applied it to synthetic data and to driving behavior data recorded in a real environment. The results showed that the proposed method can predict the next contextual changing point more accurately, and over a longer term, than the compared methods: linear regression and recurrent neural networks, which were trained through a supervised learning scheme.

I. INTRODUCTION

To assist people's driving behavior, future advanced driver assistance systems (ADASs) should observe a driver's behavior, detect contextual changing points, and predict the driver's future behavior. Contextual changing points are moments when a driver switches his/her driving behavior, e.g., braking before turning right, entering a parking sequence, and leaving an intersection. If ADASs intervene in a driver's behavior to assist his/her navigation at contextual changing points, the system should start preparing for the intervention ahead of time. For this purpose, the time when the driving context changes needs to be predicted. However, long-term prediction of future driving behavior is widely known to be very difficult.

This research was partially supported by a Grant-in-Aid for Young Scientists (B) 2012-2014 (24700233) funded by the Ministry of Education, Culture, Sports, Science and Technology, Japan.

1 S. Nagasaka and T. Taniguchi are with the College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan {s.nagasaka, taniguchi}@em.ci.ritsumei.ac.jp

2 K. Hitomi, K. Takenaka and T. Bando are with Corporate R&D Div.3, DENSO CORPORATION {kentarou hitomi, kazuhito takenaka, takashi bandou}@rd.denso.co.jp

[Figure 1: a timeline illustrating estimation (past) and prediction (future) of contextual changing points; observed data are segmented into a sequence of latent letters (segments) and chunked into a sequence of latent words (chunks), with the remaining duration times of the current segment and chunk marked; the words (e.g. "ABC", "AC", "BA", "AD", "BC") are generated from a language model.]

Fig. 1. Overview of the generative process of time series data that have a double articulation structure [13], and overview of estimation and prediction of remaining duration time from doubly articulated time series data.

Recently, statistical time series modeling techniques, such as the hidden Markov model (HMM), hybrid dynamical system (HDS), and Gaussian mixture model (GMM), have been frequently employed to model driving behavior [5], [9], [8], [16], [1]. However, long-term prediction of the next contextual changing point has never been achieved on the basis of a machine learning approach. To estimate contextual changing points from observed driving behavioral data, Takenaka et al. applied a double articulation analyzer (DAA) to driving behavioral time series data obtained from a controller area network (CAN) of a car [11]. They showed that the extracted doubly articulated changing points corresponded to contextual changing points labeled by human annotators more accurately than those extracted by a one-layer hidden Markov model. They also developed a drive video summarization method on the basis of this technique [10]. Bando et al. showed that the DAA is a suitable segmentation method for topic modeling [2]. Basically, this paper follows these research results and assumes that switching points of two successive chunks estimated by the DAA represent contextual changing points of driving behaviors, e.g., braking before turning right, entering a parking sequence, and leaving an intersection. However, none of these researchers developed a method for predicting
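For illustration only: once per-time-step chunk labels have been estimated (by the DAA or any other segmentation method), reading off contextual changing points reduces to finding the indices where two successive labels differ. A minimal sketch with hypothetical label names:

```python
def changing_points(chunks):
    """Return the indices where the chunk label differs from its predecessor.

    `chunks` is a per-time-step sequence of estimated chunk labels,
    e.g. the output of a DAA-style analyzer.
    """
    return [t for t in range(1, len(chunks))
            if chunks[t] != chunks[t - 1]]

# Hypothetical labels: braking -> turning -> leaving an intersection
labels = ["brake", "brake", "turn", "turn", "turn", "leave"]
print(changing_points(labels))  # -> [2, 5]
```

The paper's contribution is predicting where the *next* such point will fall in unobserved future data, which this retrospective extraction cannot do by itself.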

the next contextual changing point in future unobserved data. On the other hand, Taniguchi et al. developed a semiotic predictor that can predict successive sequences of hidden states of driving behavior by extending the DAA [13]. They showed that their method could predict longer sequences than previous methods. However, their method can only predict successive sequences of hidden states that are separated from the actual timeline. Therefore, it cannot predict the next contextual changing point on the actual timeline.

In this paper, we develop a method for predicting the next contextual changing point by extending Taniguchi et al.'s algorithm [13]. The task and generative process of our model are illustrated in Fig. 1. The top figure represents the current position of a car. The vertical bars on the top timeline represent estimated and predicted contextual changing points of driving behavior. The gray distribution on the future timeline shows a predicted distribution of the next contextual changing point. The middle figure represents observation data. The black and gray lines are the observed and unobserved future parts, respectively. The middle lower blocks of labels are the sequences of the estimated letters and words, corresponding to segments and chunks, respectively. The remaining duration times until the end of the current segment and chunk are denoted by τ^S and τ^C, respectively, and are explained in detail in Section III. The bottom state transition graph represents a language model that is assumed to generate sequences of latent words.

The purpose of this paper is to develop a method that calculates the posterior distribution of the termination time of the current chunk, as shown in Fig. 1. To predict the termination time of a chunk, we model the remaining duration time of a chunk of driving behavior on the basis of the idea of the DAA. To model the duration times of chunks and segments of driving behavior, we introduce the hidden semi-Markov model into the DAA.
On the basis of this extension, we propose a method that can estimate the distribution of the remaining duration time of the current chunk and predict its changing point.

The remainder of this paper is organized as follows. Section II describes the DAA and the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) [4] used for extending the DAA. Section III outlines the proposed method, which calculates the remaining duration time distribution using the n-gram probability of chunks estimated by the DAA. Section IV demonstrates the effectiveness of our method on synthetic and real driving behavior data.

II. DOUBLE ARTICULATION ANALYZER WITH HDP-HSMM

A. Double articulation analyzer and its extension

The DAA assumes that observation data have a double articulation structure [12], which is a well-known structure in semiotics. Figure 1 shows a conceptual figure of double articulation. Our spoken language and some other semiotic time series data have a double articulation structure. In speech recognition, a spoken auditory signal is first segmented into phonemes ("A", "B", "C", etc. in Fig. 1). Second, the phonemes are chunked into words ("ABC", "AC", "BA", etc. in Fig. 1). In most cases, we assume that phonemes have no meanings but words do. In this paper, we call the time series data corresponding to a word and a phoneme a chunk and a segment, respectively. We call the hidden state that lasts during a segment ("A", "B", "C", etc. in Fig. 1) a letter, and a sequence of letters corresponding to a chunk a word. A transition model between the words is represented as a language model.

It has been reported that the DAA can estimate contextual changing points of driving behavior [11]. However, there is a problem with extending the method to predict the next contextual changing point. The sticky HDP-HMM assumes that the self-transition of a hidden state has the Markov property. Therefore, the distribution of the self-transition count of a hidden state is restricted to a geometric duration distribution, which always has its mode at zero. To estimate the remaining duration time of a driving behavior, the transition count needs to be modeled with a proper duration distribution, e.g., the Poisson or Gaussian distribution. To solve this problem, we employ the HDP-HSMM with a Poisson duration distribution, which introduces an explicit duration distribution into the HDP-HMM. It enables the DAA to estimate a state-specific duration distribution for each segment. The sticky HDP-HMM is known to be a special case of the HDP-HSMM. We can therefore naturally obtain an extended DAA by simply replacing the sticky HDP-HMM [3] with the HDP-HSMM [4] in the previously proposed DAA [12], [13]. As a result, the sequential approximate inference procedure is described as follows.

1) Infer the sequence of letters and segments corresponding to the hidden states of the HDP-HSMM by using the blocked Gibbs sampler [4].
2) Infer the word sequence from the sampled sequence of letters by using the blocked Gibbs sampler of the NPYLM [6], after replacing each run of the same letter with a single letter, e.g., "bbbaaaa" with "ba".

B. Hierarchical Dirichlet Process Hidden Semi-Markov Model

The HDP-HSMM, proposed by Johnson et al., is a Bayesian nonparametric approach. Figure 2 shows the graphical model of the HDP-HSMM. The generative model is as follows:

β ∼ GEM(γ),                      (1)
π_i ∼ DP(α, β),                  (2)
(θ_i, ω_i) ∼ H × G,              (3)
z_s ∼ π̄_{z_{s−1}},               (4)
D_s^S ∼ g(ω_{z_s}),              (5)
x_{t_s^1 : t_s^2} = z_s,         (6)
y_{t_s^1 : t_s^2} ∼ f(θ_{x_t}),  (7)
t_s^1 = Σ_{s̄ < s} D_{s̄}^S,       (8)
t_s^2 = t_s^1 + D_s^S − 1,       (9)

where β is the base transition measure, π_i is the transition distribution of state i, π̄ denotes π with the self-transition removed, θ_i and ω_i are the emission and duration parameters of state i, z_s is the latent state of the s-th segment, D_s^S is its duration, and x and y are the latent letter sequence and the observation sequence, respectively.
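The generative process (1)-(9) can be sketched as a forward sampler. The sketch below uses a weak-limit truncation to K states and assumes Gaussian emissions f, a Poisson duration distribution g, and arbitrary illustrative choices for the priors H and G; none of these specific values come from the paper:

```python
import math
import random

random.seed(0)

# Weak-limit approximation with K states; hyperparameters are illustrative.
K, gamma_c, alpha = 5, 2.0, 4.0   # gamma_c: GEM concentration, alpha: DP concentration

def sample_dirichlet(alphas):
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]

def sample_poisson(lam):
    # Knuth's algorithm; adequate for small rates.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# (1) beta ~ GEM(gamma) via truncated stick-breaking
beta, rest = [], 1.0
for _ in range(K):
    v = random.betavariate(1.0, gamma_c)
    beta.append(v * rest)
    rest *= 1.0 - v
beta = [b / sum(beta) for b in beta]

# (2) pi_i ~ DP(alpha, beta); (3) per-state emission mean and duration rate
pi = [sample_dirichlet([alpha * b for b in beta]) for _ in range(K)]
theta = [random.gauss(0.0, 5.0) for _ in range(K)]     # H assumed Gaussian here
omega = [random.uniform(2.0, 8.0) for _ in range(K)]   # G assumed uniform here

def sample(n_segments):
    """Forward-sample (4)-(9): states, explicit durations, observations."""
    z, x, y = random.randrange(K), [], []
    for _ in range(n_segments):
        d = 1 + sample_poisson(omega[z])                       # (5) duration
        x += [z] * d                                           # (6) latent letters
        y += [random.gauss(theta[z], 1.0) for _ in range(d)]   # (7) observations
        p = list(pi[z]); p[z] = 0.0                            # (4) pi-bar
        r = random.random() * sum(p)
        for j in range(K):
            r -= p[j]
            if r <= 0:
                z = j
                break
    return x, y

x, y = sample(4)
print(len(x) == len(y))
```

Note that the self-transition probability is zeroed out before sampling the next state, reflecting π̄ in Eq. (4): explicit durations take the place of the geometric self-transitions of the sticky HDP-HMM.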
