Robotics and Autonomous Systems ELSEVIER

Robotics and Autonomous Systems 25 (1998) 117-126

A method for sonar based recognition of walking people

Angelo M. Sabatini *, Valentina Colla

Advanced Robotics Technology and Systems Lab (ARTS-Lab), Scuola Superiore S. Anna, Via Carducci 40, 56127 Pisa, Italy

Received 23 July 1997; received in revised form 16 March 1998; accepted 16 March 1998

Abstract

In this paper we investigate the problem of recognising a person based on the rhythmic features of human walking. The perceptual task is performed using in-air sonar sensors. A linear array sensing head with limited complexity is developed to achieve the stated goal; the method of sensory information processing implemented in the device is a blend of grid based mapping and wavelet based multiresolution analysis. Grid based mapping allows the detection and tracking of a moving object at the highest signal-to-noise ratios; wavelet based multiresolution analysis allows us to detect the features that are peculiar to walking people in the range measurement sequence extracted from a single sonar sensor of the sensing head. Experimental tests on a number of moving objects with highly time-varying target strengths are carried out; the results prove the feasibility of the approach in terms of recognition rate and acquisition time. The present study contributes to understanding how in-air sonar sensors behave and interact with complex scatterers such as the human body; it also offers promise for novel applications of sonar technologies in the field of advanced robotics, where the close interaction between human users and robotic systems is at stake. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Sonar sensing; Human body detection; Sensory information processing

* Corresponding author. E-mail: [email protected]

0921-8890/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII: S0921-8890(98)00006-2

1. Introduction

In the field of advanced robotics several applications may benefit from the ability of ultrasonic (US) sensors, or sonar sensors, to provide geometrical information concerning the existence, the location and the nature of insonified objects. Unfortunately, the interpretation of range data from sonar sensors is a highly complex problem, owing to a number of factors that mainly concern the adverse properties of the propagating medium - air in the vast majority of applications of interest to robotics researchers. Mostly, the difficulties of scene understanding are alleviated by reducing the complexity of the target environment. In practice, the sensor models used for data interpretation are limited in their scope to describing (a) reflections from smooth, specular surfaces in relatively uncluttered environments [12], and (b) diffractions from diffuse scatterers in moderately cluttered environments [6]. The signal analysis is often limited to the time of arrival of the (first) echo, since other potentially interesting signal features, such as the echo amplitude, turn out to be quite fragile in practice. An interesting approach that aims at modelling and interpreting sensor information not limited to range is reported in [3], where environments populated by both smooth reflectors and rough scatterers are considered.

In this paper we intend to go one step further in the investigation of sonar sensor behaviour in difficult environments, by pursuing the study of a method for sonar based recognition of walking people. Since the high cost of currently available robotic systems is still a barrier to their wide diffusion in practice, research activities that aim at expanding the application domain of inexpensive techniques are welcome. In our view, sonar sensing techniques are among those (inexpensive) techniques whose potential has yet to be fully exploited. In this paper, we propose a novel application for them and present the results of a preliminary investigation.

So far, researchers have used thermal type infrared detectors or CCD cameras to accomplish the task of human body recognition. The infrared detector is used in conjunction with a US sensor in [9]; there, no attempt is made to extract information about the walking person by analysing the sensory data from the US rangefinder. A method for automatic pedestrian detection from monocular image sequences based on the rhythmic features of human walking is devised in [24].

Our approach stems from considering that the identification of a moving object may be accomplished by analysing the features that are peculiar to the motion patterns themselves [23,24]. This notion, well known in the computer vision community, holds irrespective of the sensory modality involved in the perceptual activity. For instance, when bats locate and track prey, their biosonar and neural machinery extract signatures from the received echoes that can betray, e.g., beating insect wings, so as to permit successful hunting even in dense vegetation [20]. We argue that the two-stage bipedal action performed by a person who walks in the direction of a sensor aimed at his/her legs produces a range measurement sequence with a somewhat stereotyped pattern. This pattern should be composed of plateaus, roughly lasting the time of the stance phase of one leg, followed by relatively steep slopes due to the swift swing phase of the contralateral leg.
The temporal frequency and the spatial period that characterise the gait are thus reflected in the staircase aspect of the corresponding range measurement sequence. In the method we propose, the detection of moving objects and their tracking employ robust grid based mapping techniques; a wavelet based multiresolution analysis is then involved in the task of target discrimination. In contrast to classical Fourier spectral analysis, wavelet decomposition techniques are able to analyse a given phenomenon at varying levels of detail - they are a sort of mathematical microscope, with good localisation properties in both the time and the frequency domains [17]. Also, wavelet techniques come equipped with a library of fast algorithms whose computational complexity is comparable to that of Fourier analysis via fast Fourier transform (FFT) algorithms. The wavelet techniques offer a powerful and computationally inexpensive tool that can be used to reveal the staircase aspect of the highly noisy signal corresponding to a walking person among other signals from complex scatterers which happen to be less "regular".

An important point to underline in our approach is that the low complexity and cost of the sensing and computing hardware are specific design requirements for the application envisioned in this paper. The multiaural linear array sensing head that we previously developed for the continuous localisation of a mobile robot follows the guidelines of a "minimalist" design [5,18]. In this work, the device is used for data collection without any hardware modification.

The paper is organised as follows: in Section 2, a brief theoretical overview of the analytical tools we use in the developed method is provided; the detailed description of the algorithms follows in Section 3. The experimental results are given in Section 4. Section 5 provides some concluding remarks and plans for our future work.

2. Theoretical remarks

2.1. A survey of sonar range data processing methods

The different approaches proposed in the robotics literature for spatial reasoning using sonar range data depend on the nature of the application environment. In the case of reflections from smooth, specular surfaces, geometric feature based representations prove to be quite successful for imaging naturally occurring structures that can be assimilated to point or line features [1,10]. Here, the low angular resolution of sonar sensors is not necessarily a difficulty: the use of multiple transducers with overlapping fields of view and the availability of a physically based sensor model allow us to deal efficiently with the correspondence problem (identification), while reducing the angular uncertainty affecting the target location (positioning) [15].

When the environment is crowded with diffuse scatterers, a popular alternative to a geometric feature based representation is given by the so-called grid based approach [7]. Here, a tessellated space representation is built as an array of cells, whose content is the estimated probability that the corresponding spatial patch is occupied or empty; as new measurements become available, the occupancy probabilities are updated using a probabilistic sensor model, so as to reinforce consistent interpretations and to reduce the impact that outlier measurements and artifacts, e.g., multiple reflections, may have on the stability of the representation. Occupancy grids are widely used in collision avoidance applications, even for fast mobile robots in cluttered environments [2]; in a limited number of cases, they are also used for updating the robot estimate of position and orientation, although the estimation accuracy is lower than that achieved by a geometric feature based method [19]. In the present study, we argue that a grid based approach is to be preferred to a geometric feature based approach for object detection and tracking, since the objects of interest are not amenable to geometric modelling.
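To illustrate how a probabilistic update can reinforce consistent readings while damping outliers, the following minimal sketch uses the common log-odds form of the Bayesian occupancy update. The inverse sensor model values (0.7 for a cell reported occupied, 0.3 for a cell reported empty), the uniform prior and all function names are assumptions made for this example; it is not the update rule adopted in Section 3.1, which relies on simple increments and decrements.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def update_cell(log_odds, p_occupied_given_reading):
    """Bayesian update of one cell in log-odds form (uniform prior of 0.5 assumed)."""
    return log_odds + logit(p_occupied_given_reading)

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical inverse sensor model: 0.7 when a reading supports occupancy,
# 0.3 when a reading contradicts it (e.g. an outlier or a multiple reflection).
cell = 0.0  # log-odds of the uniform prior 0.5
for p in (0.7, 0.7, 0.3, 0.7):
    cell = update_cell(cell, p)

occupancy = probability(cell)
```

Under these assumed values, three consistent readings and one contradictory reading leave the cell at an occupancy probability of 49/58 (about 0.84), so a single outlier does not overturn the interpretation.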

2.2. Wavelet based methods for multiresolution analysis

The wavelet analysis consists of expressing a signal $f(t) \in L^2(\mathbb{R})$ as the sum of a coarse approximation plus added detail components that represent the signal at varying resolution levels [4]; $L^2(\mathbb{R})$ is the space of signals with finite energy. The multiresolution approach to the wavelet decomposition, pioneered by Mallat [13] and Meyer [14], will be briefly described in the following. A multiresolution analysis consists of a sequence of closed subspaces $V_m$, $m \in \mathbb{Z}$, in $L^2(\mathbb{R})$ with the following properties:

$$V_m \subset V_{m-1}, \qquad \bigcap_{m \in \mathbb{Z}} V_m = \{0\}, \qquad \overline{\bigcup_{m \in \mathbb{Z}} V_m} = L^2(\mathbb{R}). \tag{1}$$

Moreover, for any signal $f(t) \in L^2(\mathbb{R})$ we have

$$f(t) \in V_m \iff f(2t) \in V_{m-1}, \qquad m \in \mathbb{Z}, \tag{2}$$

and there exists a so-called scaling function $\phi(t) \in V_0$ such that the set

$$\{\phi_{m,n}(t) = 2^{-m/2}\,\phi(2^{-m}t - n) \mid n \in \mathbb{Z}\} \tag{3}$$

is an orthonormal basis for $V_m$. We define $W_m$ as the orthogonal complement of $V_m$ in $V_{m-1}$:

$$V_{m-1} = V_m \oplus W_m, \qquad V_m \perp W_m. \tag{4}$$

Let $P_m f(t)$ and $Q_m f(t)$ be the projections of $f(t)$ onto $V_m$ and $W_m$, respectively; Eq. (4) implies

$$P_{m-1} f(t) = P_m f(t) + Q_m f(t).$$

$P_m f(t)$ is a coarser approximation of $f(t)$ than $P_{m-1} f(t)$, because $Q_m f(t)$ contains details of $P_{m-1} f(t)$ which are lost in going from $V_{m-1}$ to $V_m$. Provided that (1)-(3) are satisfied, a function $\psi(t) \in W_0$, called the wavelet function, exists, such that the set

$$\{\psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m}t - n) \mid n \in \mathbb{Z}\} \tag{5}$$

is an orthonormal basis for $W_m$ and the entire set $\{\psi_{m,n}(t)\}_{n,m \in \mathbb{Z}}$ is an orthonormal basis for $L^2(\mathbb{R})$. Denote the inner product of two functions as $\langle f(t), g(t) \rangle = \int f(t)\,g^*(t)\,dt$, where the superscript $*$ stands for complex conjugation; the so-called wavelet series expansion of the signal $f(t)$ is then defined as

$$f(t) = \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \langle f, \psi_{m,n} \rangle\, \psi_{m,n}(t). \tag{6}$$

The inner products $d_{m,n} = \langle f(t), \psi_{m,n}(t) \rangle$ are referred to as wavelet coefficients. Analogously, we adopt the notation $c_{m,n} = \langle f(t), \phi_{m,n}(t) \rangle$. Suppose that the sequence

$$c_{0,n} = \langle f(t), \phi(t - n) \rangle, \qquad n \in \mathbb{Z},$$

is available; in many cases of practical interest, these coefficients are well approximated by the sampled version of the signal [21]. Due to Eq. (4), $f(t)$ can be decomposed into the sum of its orthogonal projections onto $V_1$ and $W_1$, i.e., $f(t) = f_1(t) + q_1(t)$. After $J$ steps of recursive decomposition applied to the $f_i$ components, the original signal is expressed as the sum of a coarse approximation and $J$ detail components as follows:

$$f(t) = f_J(t) + \sum_{i=1}^{J} q_i(t) = f_J(t) + \sum_{i=1}^{J} \sum_{n \in \mathbb{Z}} d_{i,n}\,\psi_{i,n}(t).$$
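As a worked example of this decomposition, consider a single step ($J = 1$) applied to two samples. The Haar basis, which the paper adopts in Section 3.2, is assumed here purely for illustration, with $\phi(t) = 1$ on $[0, 1)$ and zero elsewhere, and $\psi(t) = 1$ on $[0, 1/2)$, $-1$ on $[1/2, 1)$ and zero elsewhere; the sample values 4 and 2 are arbitrary.

```latex
\begin{align*}
f(t) &= 4\,\phi(t) + 2\,\phi(t-1), \qquad c_{0,0} = 4,\; c_{0,1} = 2, \\
c_{1,0} &= \langle f, \phi_{1,0} \rangle = \tfrac{1}{\sqrt{2}}\,(4 + 2) = 3\sqrt{2}, \\
d_{1,0} &= \langle f, \psi_{1,0} \rangle = \tfrac{1}{\sqrt{2}}\,(4 - 2) = \sqrt{2}, \\
f_1(t) &= c_{1,0}\,\phi_{1,0}(t) = 3 \quad \text{for } 0 \le t < 2, \\
q_1(t) &= d_{1,0}\,\psi_{1,0}(t) =
  \begin{cases} +1, & 0 \le t < 1, \\ -1, & 1 \le t < 2, \end{cases} \\
f_1(t) + q_1(t) &=
  \begin{cases} 4, & 0 \le t < 1, \\ 2, & 1 \le t < 2, \end{cases}
  \;=\; f(t).
\end{align*}
```

The coarse approximation $f_1$ keeps the local mean, while the detail $q_1$ keeps the step between the two samples; their sum restores the original signal exactly.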


The fact that $\phi(t) \in V_0$ and $\psi(t) \in W_0$ implies that both these functions belong to $V_{-1}$, yielding:

$$\phi(t) = \sum_{n \in \mathbb{Z}} h_n\,\phi_{-1,n}(t), \qquad \psi(t) = \sum_{n \in \mathbb{Z}} g_n\,\phi_{-1,n}(t). \tag{7}$$

In Eq. (7), $\{h_n\}$ is the impulse response of a low-pass filter, associated with the scaling function $\phi(t)$, and $\{g_n\}$ is the impulse response of a high-pass filter, associated with the wavelet function $\psi(t)$. Eq. (7) can be extended to all the translates and dilates of the scaling and wavelet functions:

$$\phi_{m,k}(t) = \sum_{n \in \mathbb{Z}} h_{n-2k}\,\phi_{m-1,n}(t), \qquad \psi_{m,k}(t) = \sum_{n \in \mathbb{Z}} g_{n-2k}\,\phi_{m-1,n}(t).$$

Finally, the coefficients $c_{m,k}$ and $d_{m,k}$ can be obtained from $c_{m-1,k}$ by using the relations implied by Mallat's algorithm:

$$c_{m,k} = \sum_{n \in \mathbb{Z}} h^*_{n-2k}\,c_{m-1,n}, \qquad d_{m,k} = \sum_{n \in \mathbb{Z}} g^*_{n-2k}\,c_{m-1,n}. \tag{8}$$

The multirate filtering structure sketched in Fig. 1(a) yields the downsampled-by-two convolution sums that are indicated in Eq. (8). The coefficients $c_{m-1,k}$ are recovered by means of the reconstruction formula of Mallat's algorithm:

$$c_{m-1,k} = \sum_{n \in \mathbb{Z}} \left[ h_{k-2n}\,c_{m,n} + g_{k-2n}\,d_{m,n} \right]. \tag{9}$$

The multirate filtering structure sketched in Fig. 1(b) yields the upsampled-by-two convolution sums that are indicated in Eq. (9). It can easily be shown that the analysis filters are time-reversed versions of $h_n^*$ and $g_n^*$, respectively [21].

Fig. 1. (a) One stage of the multiresolution decomposition; the circular blocks are downsamplers. (b) One stage of the reconstruction process; the circular blocks are upsamplers.
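One stage of Mallat's algorithm can be sketched directly from Eqs. (8) and (9). The sketch below is illustrative only: it assumes real-valued Haar filter taps ($h_0 = h_1 = 1/\sqrt{2}$, $g_0 = -g_1 = 1/\sqrt{2}$), a finite even-length input, and hypothetical function names; it evaluates the sums literally rather than efficiently.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# Assumed example filters: Haar low-pass taps h_n and high-pass taps g_n.
H = {0: 1.0 / SQRT2, 1: 1.0 / SQRT2}
G = {0: 1.0 / SQRT2, 1: -1.0 / SQRT2}

def analysis(c_prev, h=H, g=G):
    """Eq. (8): c_{m,k} = sum_n conj(h_{n-2k}) c_{m-1,n}; likewise d_{m,k} with g."""
    n_out = len(c_prev) // 2
    c = np.zeros(n_out)
    d = np.zeros(n_out)
    for k in range(n_out):
        for n in range(len(c_prev)):
            c[k] += np.conj(h.get(n - 2 * k, 0.0)) * c_prev[n]
            d[k] += np.conj(g.get(n - 2 * k, 0.0)) * c_prev[n]
    return c, d

def synthesis(c, d, h=H, g=G):
    """Eq. (9): c_{m-1,k} = sum_n [h_{k-2n} c_{m,n} + g_{k-2n} d_{m,n}]."""
    c_prev = np.zeros(2 * len(c))
    for k in range(len(c_prev)):
        for n in range(len(c)):
            c_prev[k] += h.get(k - 2 * n, 0.0) * c[n] + g.get(k - 2 * n, 0.0) * d[n]
    return c_prev

# Arbitrary staircase-like samples standing in for c_{0,n}.
c0 = np.array([2.0, 2.0, 2.0, 1.4, 1.4, 1.4, 0.8, 0.8])
c1, d1 = analysis(c0)
restored = synthesis(c1, d1)
```

With these filters, `restored` matches `c0` to rounding error, and the only non-zero entry of `d1` is the one straddling the step from 2.0 to 1.4, which is the localisation property the text relies on.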

3. System description

The system we use for the experiments in this paper is a multiaural linear array composed of three standard Polaroid transducers, with centre-to-centre spacing d = 15 cm; the sensing head is endowed with a rotational degree of freedom provided by a stepper motor under computer control. In the current version of the system, the angular resolution of the sensing head is Δβ = 0.6°. Both the transducer electronics and the motor driver are controlled by a Motorola HC11 microcontroller, connected to a PC486DX2 host computer via a standard RS232 interface (see Fig. 2). Elsewhere, we describe the system implementation in greater detail [5,18].

Fig. 2. The multiaural sensing head.

Note that the electronic board provided by the Polaroid company together with the transducers does not allow them to be used as pure receivers. In our application the transducers are thus fired according to a sequential firing strategy, although we argue that the performance of the system should improve if a truly tri-aural configuration is used [15], i.e., one in which the lateral elements of the array are pure receivers: the reason is that higher firing rates are obtained by tri-aural firing schemes as compared to sequential firing ones. Moreover, in this paper only the first detected echo is processed, although multiple ranging should be possible with slight modifications of the control logic unit [16].

A person who walks near the robot can be revealed by testing for the presence of quasi-periodic components in a range measurement sequence, e.g., the plateaus, provided that: (a) the sonar device has the ability to locate and track the target, both in the radial and in the lateral directions, and (b) the relative motion between the person and the sensor presents a radial component of sufficient strength. Unfortunately, the incoming sensory information is usually corrupted by high noise levels, because of the strong fluctuations of the echo strength while the person is moving. Moreover, the changing shapes of clothes, the varying inclination of the legs during the motion and irregularities in the kind of motion tend to diffuse the energy back to the receiver in an almost unpredictable way; the influence of these effects may be severe, especially when the sensor detection sensitivity is degraded as a consequence of large orientation angles between the reflecting legs and the sensor.

On the other hand, highly irregular range measurement sequences are also expected from several objects, like swaying plants, chairs and tables, whose target strengths against a moving sensor may vary in time in a complex fashion. Our method relies on the assumption that all these objects should be characterised by less stereotyped patterns than walking people. Successful discrimination algorithms thus have to discover faint regularities in highly noisy range measurement sequences.

Fig. 3 shows a functional block diagram of the operations performed by the host computer for detection, tracking and discrimination.

Fig. 3. Block diagram of the sensory data processing performed by the host computer. [Blocks: grid based module (grid based representation, blob tracking), with output to the stepper motor driver; wavelet based processing module (Haar compression), with output to the decision maker.]

3.1. Detection and tracking of moving targets

The ability to perform lateral tracking makes it possible to implement active sensing strategies for signal-to-noise ratio (SNR) enhancement. Such strategies also help to prevent the target from escaping the sensor receptive field. In the absence of appropriate reference geometric models, we prefer a grid based method to the trilateration algorithms that are used by geometric feature based methods for target orientation tracking. The accuracy requirement for lateral tracking does not need to be particularly high in our system; the cell size may therefore be in the order of several centimetres, with consequent savings in computation time. Incidentally, it should be noted that qualitative or semiquantitative information concerning the location of a point source is often used in those sonar devices whose design takes inspiration from a bionic approach [11].

An occupancy grid is a matrix $W$, whose elements $W_{i,j}$ measure the confidence about the presence of some object within the cell $g_{i,j}$, see Fig. 4. The elements of the matrix are updated according to the sensor measurements. In particular, if the $k$th sensor ($k = 0, 1, 2$) performs a range measurement $d_k$, the cell located on the sensor acoustic axis at the distance $d_k$ is given the highest increment, with smaller increments given to its eight neighbouring cells. Conversely, if a cell has a non-zero occupancy value but no sensor detects the presence of an obstacle within the same cell or within any of its eight neighbouring cells, the occupancy value is decreased. The temporal registration of the maps on a frame-by-frame basis is achieved by taking into account the (known) rotational motion of the sensing head.

Fig. 4. The occupancy grid (not to scale along the y-direction). [Labels: blob; sensors S1, S2, S3.]

With an occupancy grid based mapping method, a moving object is perceived as a sort of "blob". An important step of the tracking procedure therefore concerns the discrimination between different blobs. The problem is tackled using an algorithm borrowed from the methods of region-oriented segmentation commonly applied in computer vision. The procedure - denoted as region growing by pixel aggregation [8] - operates on the occupancy grid $W$. The algorithm starts with a set of "seed" cells, and regions are grown from them by appending those neighbouring elements that have similar properties, i.e., non-null occupancy values in our case. The location of a blob $G$ is computed from the location of its centroid, whose coordinates relative to a sensor based reference frame are:

$$x_c = \frac{\sum_{W_{i,j} \in G} x_{i,j}\,W_{i,j}}{\sum_{W_{i,j} \in G} W_{i,j}}, \qquad y_c = \frac{\sum_{W_{i,j} \in G} y_{i,j}\,W_{i,j}}{\sum_{W_{i,j} \in G} W_{i,j}},$$

where $x_{i,j}$ and $y_{i,j}$ are the coordinates of the centre of the cell $g_{i,j}$. In order to distinguish which blobs correspond to moving objects, a track is maintained for the locations of their centroids. Although several blobs may be present in the same map, it should be pointed out that, in the current version, our multiaural sensing head does not perform multiple ranging, so that the closest object has in practice a high chance to "mask" the objects lying behind; this fact greatly reduces the problem of multiple track assignment. The system directs its attention to the nearest blob, provided that the difference between the centroid locations in successive maps relative to an absolute reference frame is greater than a given threshold, DTHR. A simple $\alpha$-tracker is used to smooth the data relative to the orientation of the blob under examination:

$$\theta[i] = \alpha\,\theta[i-1] + (1 - \alpha)\,\theta_m[i],$$

where $\theta[i-1]$ is the orientation angle estimated at the $(i-1)$th time step, $\theta_m[i]$ is the orientation angle measured at the $i$th time step, and $\theta[i]$ is its updated estimate at the $i$th time step. Active sensor aiming is a means to improve the SNR and to reduce the chance that other blobs distract the system before the data analysis for target discrimination is completed.

3.2. Target discrimination

The raw range data $x_i$ provided by the central sensor are pre-filtered using a five-sample-wide median filter, so as to reduce the effect of outlier measurements; the filtered range measurement sequence is then fed to the module performing the wavelet decomposition. If the basis functions $\{\psi_{m,n}(t)\}_{m,n}$ are suitably chosen, only a relatively small subset of the coefficients $d_{m,n}$ should contain almost all the relevant information conveyed by $f(t)$. Interestingly, wavelet decomposition techniques are widely used for data compression; the reason is that, in many cases, a relatively small subset of the obtained coefficients suffices to capture the relevant information about the analysed signal, provided that a suitable choice is made for the wavelet function. To detect the staircase nature of the sequence corresponding to a walking person, we propose to use the Haar wavelet transform, which is characterised by the following scaling and wavelet functions:

$$\phi(t) = \begin{cases} 1 & \text{for } 0 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$
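The pre-filtering and Haar decomposition steps of Section 3.2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' discrimination procedure: the range values are synthetic, the plateaus are deliberately aligned with the dyadic grid, the edge handling of the median filter (repeating the end samples) is an assumption, and the function names are hypothetical. The standard orthonormal Haar pair (pairwise sums and differences scaled by $1/\sqrt{2}$) is used.

```python
import numpy as np

def median5(x):
    """Five-sample-wide median filter; end samples are repeated at the edges (assumption)."""
    padded = np.pad(np.asarray(x, dtype=float), 2, mode="edge")
    return np.array([np.median(padded[i:i + 5]) for i in range(len(x))])

def haar_details(x, levels):
    """Return the Haar detail coefficients d_{m,n} for m = 1..levels (len(x) a power of two)."""
    c = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients d_m
        c = (even + odd) / np.sqrt(2.0)              # coarse coefficients c_m
    return details

# Synthetic staircase: four plateaus of eight samples, range dropping by 0.3 m per step.
staircase = np.repeat([3.0, 2.7, 2.4, 2.1], 8)
ranges = staircase.copy()
ranges[3] = 0.5                      # one spurious short-range echo (outlier)

filtered = median5(ranges)
details = haar_details(filtered, 5)
nonzero = sum(int(np.count_nonzero(np.abs(d) > 1e-9)) for d in details)
```

In this idealised case the median filter removes the outlier while preserving the step edges, and only 3 of the 31 detail coefficients are non-zero, all at the two coarsest levels. Real range sequences are noisier and not aligned with the dyadic grid, so the compaction would be less clean.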

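Returning to the tracking stage of Section 3.1, region growing, the occupancy-weighted centroid and the $\alpha$-tracker can be sketched together. The grid contents, the 5 cm cell size, the value $\alpha = 0.5$, the choice of cell-centre coordinates and all function names are assumptions made for this example; the paper only states that cells are several centimetres wide.

```python
import numpy as np

def grow_region(W, seed):
    """Collect the 8-connected cells with non-null occupancy reachable from the seed cell."""
    rows, cols = W.shape
    region, stack, seen = [], [seed], {seed}
    while stack:
        i, j = stack.pop()
        if W[i, j] <= 0:
            continue  # empty cells stop the growth
        region.append((i, j))
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and (ni, nj) not in seen:
                    seen.add((ni, nj))
                    stack.append((ni, nj))
    return region

def blob_centroid(W, region, cell_size):
    """Occupancy-weighted centroid (x_c, y_c) of the cells in a blob."""
    w = np.array([W[i, j] for i, j in region], dtype=float)
    x = np.array([(j + 0.5) * cell_size for _, j in region])  # cell-centre x
    y = np.array([(i + 0.5) * cell_size for i, _ in region])  # cell-centre y
    return float(np.sum(x * w) / np.sum(w)), float(np.sum(y * w) / np.sum(w))

def alpha_track(measured, alpha, theta0):
    """theta[i] = alpha * theta[i-1] + (1 - alpha) * theta_m[i]."""
    theta, history = theta0, []
    for theta_m in measured:
        theta = alpha * theta + (1.0 - alpha) * theta_m
        history.append(theta)
    return history

W = np.zeros((4, 6))
W[1, 1], W[1, 2] = 3.0, 1.0   # one blob made of two adjacent cells
W[3, 5] = 2.0                 # a second, separate blob

region = grow_region(W, (1, 1))
xc, yc = blob_centroid(W, region, cell_size=0.05)
smoothed = alpha_track([10.0, 10.0, 10.0], alpha=0.5, theta0=0.0)
```

Here the region grown from the seed contains only the two adjacent cells, the centroid is pulled towards the cell with the larger occupancy value, and the smoothed orientation approaches the measured 10 degrees geometrically (5, 7.5, 8.75). A larger $\alpha$ would smooth more strongly at the cost of a slower response.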