Minimizing the Number of Support Points in Perceptual Offline Haptic Compression

Fernanda Brandi, Eckehard Steinbach

Institute for Media Technology, Technische Universität München, Munich, Germany

Abstract: During the recording of a haptic interaction session, both position and force-feedback data have to be saved. This work proposes two novel methods for haptic data compression which keep the reconstruction errors during playback below human perception thresholds. Supported by well-known results from psychophysics, the first proposed non-iterative method determines the smallest number of samples necessary to reconstruct the force-feedback signal within the perception thresholds. The second proposed method compresses the position signal in an open-loop fashion, exploiting the fact that during the replay of haptic interaction sessions the position signal no longer influences the force-feedback. Thus, the only position information perceived by the user is the displayed visual information, and it can therefore be compressed by exploiting the limits of human visual perception. Our experimental results show better haptic signal compression performance than the state of the art for both methods while maintaining good haptic and visual subjective experiences.

Keywords: haptics, perceptual coding, telepresence, teleaction, deadband, psychophysics, multimedia, compression, playback

I. INTRODUCTION

In traditional multimedia communication, mostly visual and auditory information is employed to enable remote interactive sessions between two or more users. Especially in the last decade, the haptic modality has been increasingly studied and applied to enrich interactive communication. This modality permits the use of further human sensations such as the kinaesthetic (e.g. movement, force, torque) and tactile (e.g. touch, pressure, texture, pain) senses. Employing haptics strengthens interaction systems since it comes closer to our natural ability to communicate through all our senses.

Networked haptic interaction systems, such as telepresence and teleaction (TPTA) systems, consist of an operator (OP), the human system interface (HSI), the communication channel and, lastly, the teleoperator (TOP). Fig. 1 shows the OP, which is the human user. The OP remotely controls the TOP, which is typically a robot or an end effector in a virtual environment. The TOP displays the position imposed by the OP and reads the respective force-feedback resulting from its actions in the environment. The HSI device, composed in Fig. 1 of visual and haptic displays, provides a physical interface so that the OP can remotely manipulate the TOP and additionally receive the corresponding visual and haptic feedback. The HSI has sensors and actuators which must read the actions performed by the OP and display the respective force-feedback as accurately as possible. Two control loops are closed locally at the OP and the TOP [3]. Communication is established over the network, which connects the local control loops with a global control loop.

Fig. 1. Schematic overview of a networked visual-haptic telepresence and teleaction system (adapted from [3], [9]). Blocks shown: operator, visual display, haptic display, local control loops, network, teleoperator with sensors and actuators.

The more accurate the interactive session is in terms of both spatial and temporal representation across all modalities, the more transparent the system is. In other words, to achieve a high level of transparency or immersion, the user's overall perception of the remote environment should be as local, immediate and real as possible. To cope with the high temporal resolution of human haptic perception, and to preserve the system's stability and consequently its transparency, both the sampling and display rates must be high [3].

To facilitate the transmission of the multimedia signals, data compression is required for each modality. Efficient compression schemes for audio and video have already been widely studied and employed. Surprisingly, haptic signals still lack compression schemes specifically designed to accommodate the needs of haptic data communication. In real-time applications, due to the high temporal resolution of human haptic perception, it is imperative that the samples are sent on-the-fly during the interaction session. Thus, to avoid additional delay, haptic data cannot be compressed using traditional block-based codecs.

Offline haptic applications have been gaining attention worldwide. Such systems employ haptic recording and playback to enable learning, training, documentation, performance analysis, etc. [9]. Although mass storage devices are increasingly widespread, the compression of haptic signals remains highly relevant. The haptic field has been evolving rapidly, and devices are accommodating a growing number of degrees of freedom, which can produce a considerable amount of data to be processed and stored.

For the recording of haptic interaction sessions, the strict latency constraints previously described can be dropped since all media (video, audio, haptic position and force-feedback) are stored locally for later playback. Therefore, the haptic system no longer closes a loop over the network, which permits the use of traditional compression methods for the recorded haptic signal such as those shown in [11]. However, the approaches mentioned in [11] do not explicitly incorporate the limits of human haptic perception.

Psychophysical studies [1], [5], [12] show that the intensity of a stimulus is perceived relatively in the human brain. The widely known Weber's Law formalizes the relation between the initial stimulus and the perceivable intensity of the following stimulus as

$$\frac{\Delta I}{I} = K \tag{1}$$

where I is the initial stimulus, ΔI is the perceivable variation threshold, the so-called Just Noticeable Difference (JND), and K is the threshold parameter that describes the linear relationship between the JND and the stimulus I. In other words, ΔI is the smallest change that can be perceived by the subject under the stimulus I.

As demonstrated by the state-of-the-art approaches in [6], [8], perceptually motivated compression can greatly reduce the number of transmitted or stored haptic samples. To further enhance data reduction, signal prediction methods can be employed. As long as the predicted sample falls within the JND thresholds, the so-called deadband, the original sample can be discarded and the predicted sample displayed instead. Whenever the deadband is violated, the original sample is sent or saved and used to predict further signal values. These samples are referred to as updates. The haptic signal is then reconstructed using the updates and the predicted samples. A reasonable adjustment of the deadband parameter K leads to high compression ratios while maintaining good subjective performance.

In this paper we are particularly interested in data compression for recorded haptic interaction signals. Although the approach in [8], [9] provides a strong compression ratio and rather good subjective results, the perceptual fidelity of the reconstructed signal according to Weber's Law cannot be fully guaranteed. In [8], [9], the deadband violations are calculated based on the predicted samples; however, these samples are not actually used by the decoder. Instead, the saved updates are interpolated, yielding a new reconstructed signal (line segments connecting the update samples) which was not tested for deadband violation at the encoder. Therefore, the reconstructed signal can potentially violate the perception thresholds, causing a significant deviation from the original signal.
In other words, for the approach in [8], [9] the Weber parameter K has to be selected conservatively to ensure good fidelity to the original signal.

Examples of the traditional deadband-based compression schemes for online [6], [7] and offline [8], [9] applications are depicted in Fig. 2(a) and 2(b), respectively. In Fig. 2(a) the predicted samples are employed to reconstruct the signal until a violation occurs. When a violation is detected, an update (or, equivalently, a support point) is transmitted. In Fig. 2(b) the same prediction as in Fig. 2(a) is performed; however, the reconstructed signal is later determined by interpolating the update samples. It is evident in Fig. 2(b) that deadband violations can occur for such a procedure.

Fig. 2. Examples of how to determine support points and to reconstruct the signal (amplitude A over time t). In all figures the black dots are the original samples, the grey areas are the respective perceptual deadbands, the red dots are the support points and the red line is the reconstructed signal. The illustrated cases are: (a) online deadband approach [6], [7], (b) offline deadband approach [8], [9] (the magenta dashed lines are the unused predictions), (c) proposed offline scheme employing only the original samples and (d) proposed offline scheme employing different test points within the deadband.

Based on these observations, and to further explore the limits of human perception for the compression of haptic signals, two novel compression approaches are proposed in this work. Sections II and III present the proposed compression schemes for force-feedback and position signals, respectively. Section IV describes the experiments evaluating the transparency and the data compression performance of the proposed schemes. This is followed by a discussion of the results and guidelines for future work.
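To make the deadband principle from this section concrete, the following minimal Python sketch encodes a short signal with hold-last-sample prediction and then checks two reconstructions against the Weber thresholds of (1). It is an illustration under stated assumptions rather than the implementation of [6]-[9]: the function names are hypothetical, the encoder deadband is taken around the most recently stored update, and violations are measured against a deadband of half-width K·|I| around each original sample.

```python
def deadband_updates(signal, k):
    """Indices of the updates stored by a hold-last-sample deadband encoder.

    Assumption: the deadband is centred on the most recent update and has
    half-width k * |update| (Weber's Law, Eq. (1)).
    """
    updates = [0]
    last = signal[0]
    for i in range(1, len(signal)):
        if abs(signal[i] - last) > k * abs(last):
            updates.append(i)
            last = signal[i]
    return updates


def reconstruct_hls(signal, updates):
    """Replay the updates by holding the last stored sample."""
    stored = set(updates)
    recon, last = [], signal[0]
    for i, sample in enumerate(signal):
        if i in stored:
            last = sample
        recon.append(last)
    return recon


def reconstruct_linear(signal, updates):
    """Replay the updates by connecting them with line segments."""
    recon = [signal[updates[-1]]] * len(signal)  # hold after the final update
    for a, b in zip(updates, updates[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            recon[i] = signal[a] + t * (signal[b] - signal[a])
    return recon


def violations(signal, recon, k):
    """Indices where the reconstruction leaves the deadband of the original."""
    return [i for i, (s, r) in enumerate(zip(signal, recon))
            if abs(r - s) > k * abs(s)]


# Toy force signal (arbitrary units): flat, then a sudden step.
force = [1.0, 1.0, 1.0, 1.0, 2.0]
K = 0.1
updates = deadband_updates(force, K)
print(updates)
print(violations(force, reconstruct_hls(force, updates), K))
print(violations(force, reconstruct_linear(force, updates), K))
```

On this toy signal the encoder stores samples 0 and 4. The hold-last-sample reconstruction stays inside every deadband, whereas linearly interpolating the same two updates leaves the deadband at samples 1, 2 and 3, which is the effect visible in Fig. 2(b).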

II. COMPRESSION OF FORCE-FEEDBACK DATA

Inspired by the deadband-based data reduction approaches in [6], [8], a novel force-feedback signal compression scheme is proposed. The goal of this approach is to reconstruct the signal in such a way that all samples are enveloped by the deadband thresholds. The basic idea is to determine the smallest number of successive pairs of points that, when connected by linear interpolation, yield a reconstructed signal (a set of line segments) which remains entirely within the perception thresholds, as seen in Fig. 2(c) and 2(d). The perception thresholds can be calculated directly from Weber's Law in (1) and the previously recorded force-feedback signal.

A. Basic Proposed Algorithm

The algorithm determines line segments with a starting point s_n (e.g. the first signal sample) and ending points s_{n+m}, where m = 1, ..., M − n and M is the total number of signal samples. The support points s are always contained within the deadband thresholds. All line segments defined by pairs of points (s_n, s_{n+m}) are tested for deadband violation, and the longest segment that does not violate the perceptual thresholds is chosen to reconstruct that part of the signal. The algorithm then determines the next pair of points, taking the previous s_{n+m} as the starting point of the new segment. Pairs of points are tested in this way until non-overlapping line segments covering the whole signal are determined. This procedure is illustrated in Fig. 3.

Fig. 3. Force (N) over time (ms). In (a) the central blue curve and the surrounding green curves are, respectively, the original signal and its perceptual deadband thresholds. In (b) the magenta circles represent the tested support points and the red lines are the corresponding non-violating line segments. Violating line segments and their respective pairs of points are omitted in this illustration. In (c) the reconstructed signal is depicted as the red dashed lines. The stored support points are shown as unfilled magenta circles.

As previously mentioned, the support points can be any points contained within the perceptual boundaries. Nevertheless, for simplicity they were initially chosen to be at the deadband thresholds or on the signal itself, as observed in Fig. 3(b) and on the left-hand side of Fig. 4(b).

To optimize the algorithm, a maximum number of consecutive violations, V, per starting point can be defined. It is expected that after V successive violations no further point can provide a non-violating line segment. Additionally, the proposed scheme does not test every combination of support points. While testing ending points, only one point (among the three available) at each time instant is tested. When the maximum number of violations V is reached, the algorithm stops testing longer segments, returns to the last non-violating line segment and tries another ending point among the available ones. Both heuristics can save a great amount of computational effort without impairing the reconstruction performance.
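The greedy search described above can be sketched in a few lines of Python. This is a simplified 1-DoF illustration, not the authors' implementation: it uses only the original samples as support points (the case of Fig. 2(c)), assumes a deadband of half-width K·|I| around each sample, and uses hypothetical function names. The default of 100 for max_violations mirrors the value of V reported in Section IV.

```python
def within_deadband(value, sample, k):
    """True if value lies inside the Weber deadband around the original sample."""
    return abs(value - sample) <= k * abs(sample)


def segment_is_valid(signal, k, start, end):
    """True if the line from signal[start] to signal[end] stays inside the
    deadband of every original sample strictly between the two end points."""
    for i in range(start + 1, end):
        t = (i - start) / (end - start)
        value = signal[start] + t * (signal[end] - signal[start])
        if not within_deadband(value, signal[i], k):
            return False
    return True


def greedy_support_points(signal, k, max_violations=100):
    """Indices of support points found by repeatedly taking the longest
    non-violating segment from the current starting point.

    Simplification: support points are restricted to the original samples.
    The search from one starting point stops after max_violations
    consecutive violating candidates (the heuristic V in the text).
    """
    support = [0]
    start = 0
    while start < len(signal) - 1:
        best = start + 1  # neighbouring samples always form a valid segment
        violations = 0
        for end in range(start + 2, len(signal)):
            if segment_is_valid(signal, k, start, end):
                best, violations = end, 0
            else:
                violations += 1
                if violations >= max_violations:
                    break
        support.append(best)
        start = best
    return support


# Toy triangular force signal (arbitrary units).
force = [1.0, 2.0, 3.0, 2.0, 1.0]
print(greedy_support_points(force, k=0.1))
```

For this triangular signal the sketch stores samples 0, 2 and 4, so each linear flank is represented by a single segment. Allowing support points anywhere inside the deadband, as in Fig. 2(d), would enlarge the search space and can only lengthen the segments found.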

To facilitate understanding, the details and figures presented so far treated only one-degree-of-freedom (1-DoF) signals. However, the force-feedback signal is typically composed of three DoFs; thus, the compression scheme can be more efficient if applied directly in three-dimensional space. This extension is shown in the following subsection.

B. Extension to Three Degrees-of-Freedom

In the 1-DoF case, the perceptual thresholds are defined as the upper and lower bounds, namely I + ΔI and I − ΔI, of a given sample I as defined in (1). Analogously, in the multi-DoF case, boundaries are commonly defined using isotropic shapes referred to as deadzones [6]. Thus, the deadzone for 3-DoF can be defined as a spherical volume, as seen in Fig. 4(a).

Fig. 4. On the left-hand side of (a), a single sample in black, its respective deadband thresholds as red marks and the deadband itself as the gray area are depicted (amplitude A over time t). On the right-hand side, the extension to the 3-DoF case (axes Ax, Ay, Az) is illustrated, where the deadzone is defined as the enveloping sphere. In (b) the possible support points (blue) and additionally the original sample (black) are shown.

As previously mentioned, in the 1-DoF case the support points coincide with the deadband thresholds or with the original sample itself, as shown in Fig. 4(b). Similarly, for the 3-DoF case, seven possible support points are defined: six on the sphere surface and one additional point at the center of the sphere (the original sample itself). More or different points in the sphere could be selected; however, after a careful analysis it was decided to employ only seven points, which cover major representative regions of the sphere while keeping the computational complexity low. A parameter w is included in the algorithm to adjust where the six external points are located. The points are placed towards the center (w decreases) or towards the deadband boundaries (w increases). The value interval for w is [0, 1]. An example illustrating 0 < w < 1 can be observed in Fig. 5.

Fig. 5. The external support points (blue) defined by a given w, where 0 < w < 1 (axes Ax, Ay, Az).

III. COMPRESSION OF POSITION DATA

As mentioned in the first section, in offline applications the signal can be read from a previously recorded haptic session and replayed to one or more users. Moreover, during playback no active interaction is demanded from the OP; therefore the user can passively hold the haptic device. The force-feedback is haptically presented to the OP along with the position data, which is shown visually on the monitor display.

The proposed approach aims to compress the position signal, taking into consideration that in offline applications the position data is perceived by the user only through the visual display. Moreover, the closed loop observed in Fig. 1 is no longer applicable since all the data was previously recorded and can be accessed locally. Thus, any compression applied to the position signal does not influence the force-feedback and does not compromise the stability of the system. Consequently, position signals can be compressed according to the limitations of human visual perception rather than haptic limitations as proposed in [8], [9].

Human visual perception is characterized by a considerably lower temporal resolution than the haptic sense. When subsequent images are displayed, the human brain processes them as part of a motion sequence when the images are shown typically with intervals of 30 to 60 milliseconds [4]. These time intervals correspond to image refresh rates of roughly 17 to 33 images per second. Depending on the image refresh rate or the velocity of the motion displayed to the user, the illusion of movement can still be provided, but artifacts such as flicker and judder might be perceived [10]. To minimize such artifacts, higher image refresh rates may be utilized.

Fig. 6. Example of the HLS approach. The blue curve corresponds to the original position signal, the magenta circles indicate the saved support points and the dashed red line depicts the reconstructed signal.

The proposed compression scheme simply reduces the position signal's display rate (1 kHz) to a value usually employed as an image update rate. This reduction can be adjusted until no visual artifacts can be perceived. To maintain the original display rate and avoid synchronization issues during playback, the dropped samples can later be replaced by values predicted using the Hold Last Sample (HLS) approach, as seen in Fig. 6.

IV. EXPERIMENTS AND RESULTS

In our experimental haptic recording and replay system, a three-dimensional virtual environment containing a static toroidal object is employed (see CHAI3D [2] for more information). A haptic session was previously run and both position and force-feedback data were recorded. The operator performed different actions such as tapping on the object and running along the surface of the toroid with different intensities and in several directions. The SensAble PHANTOM Omni haptic interface device is used to interact with the virtual environment.

The playback consists of a 13-second replay, and the sessions were presented to the subjects randomly and blindly. The position playback is delivered visually on the display monitor while the force-feedback is presented directly to the user via the haptic device. During replay the user is asked to hold the haptic display (i.e. the stylus of the PHANTOM device) in a fixed position at the center of the workspace. Fifteen subjects, mostly right-handed males between 20 and 30 years old, underwent these tests.

The experiments were divided into two parts: force-feedback and position data compression. In both parts, the compressed signals were evaluated based on the objective compression ratio and the subjective quality assessment (transparency). For the transparency evaluation, the subjects underwent an initial training session where they could assess differences between non-compressed signals and highly compressed signals. The possible artifacts for both force-feedback and position data compression were explained and demonstrated. The subjects could freely replay the non-compressed session for reference at any time during the experiment. At the end of each displayed session the subjects were asked to grade that session according to Table I.

TABLE I. SUBJECTIVE RATING SCHEME

Description                        Rating
no difference                      100
perceptible, but not disturbing    75
slightly disturbing                50
disturbing                         25
strongly disturbing                0

A. Force-Feedback Data Compression

Fig. 7. Objective results for force-feedback signal compression: percentage of stored samples versus deadband parameter for the offline deadband approach and the proposed method (sphere, w = 0.0, 0.5 and 1.0).

Fig. 8. Subjective results (mean and standard deviation) for force-feedback signal compression: (a) rating versus deadband parameter, (b) rating versus percentage of stored samples, each for the offline deadband approach and the proposed method (sphere, w = 0.0, 0.5 and 1.0). The region above the green line indicates the cases where no disturbance is perceived.

The subjects are instructed to firmly hold the PHANTOM Omni stylus while running the playback. They should observe whether the force-feedback presented at the haptic device corresponds to their expectations according to the visually displayed action. No compression was employed for the position signal.

In this experiment the offline deadband approach presented in [8], [9] was tested along with the proposed force-feedback data compression scheme. For the proposed method, a sphere was employed as the three-dimensional deadzone. The parameter w could be set to 0, 0.5 or 1. The maximum allowable number of subsequent violations V was set to 100. The deadband parameter K could assume any of the following values: 0.01, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.75 and 0.95.

As seen in Fig. 7, the compression ratio is significantly better for all variations of the proposed method. The difference in rate among the proposed algorithms (with distinct w values) is roughly negligible. Furthermore, the subjective results in Fig. 8 show that satisfactory perceptual quality is achieved with the proposed method. To further improve the perceptual experience, stricter deadband parameters could be employed in the proposed method while still keeping the number of stored samples lower than in the traditional offline deadband approach [8], [9], as observed in Fig. 8(b).

B. Position Data Compression

The position data compression experiment consisted of replaying the recorded session for the subjects so that they could visually evaluate the quality of the reconstructed signal. The refresh rate for the position data could be set to R = 10, 15, 20, 25, 30, 35, 40, 50 or 60 samples/second. To avoid loss of synchronization with the force-feedback signal, the overall display rate was kept at 1000 samples per second. The points between support samples were determined using the HLS approach, as shown in Fig. 6. No compression was applied to the force-feedback signal.

As expected, the compression ratio C follows the parameter R linearly according to the relation C = 1 − R/R_o, where R_o represents the original sampling rate. In these experiments R_o = 1000 samples/second. Thus, the compression ratio C is kept between 94% and 99%.
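The relation C = 1 − R/R_o and the HLS reconstruction can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that the refresh rate divides the original rate evenly so that the kept samples are equally spaced; rates such as 15 or 35 samples/second would need a rounding rule that the text does not specify.

```python
def compression_ratio(refresh_rate, original_rate=1000):
    """Fraction of position samples discarded: C = 1 - R / R_o."""
    return 1 - refresh_rate / original_rate


def downsample_hls(position, refresh_rate, original_rate=1000):
    """Keep one sample every original_rate / refresh_rate samples and rebuild
    the full-rate signal by holding the last kept sample (HLS)."""
    if original_rate % refresh_rate != 0:
        # Assumption of this sketch only, not a restriction stated in the paper.
        raise ValueError("refresh rate must divide the original rate")
    step = original_rate // refresh_rate
    support = list(range(0, len(position), step))
    recon = [position[(i // step) * step] for i in range(len(position))]
    return support, recon


# 100 ms of a toy position ramp sampled at 1 kHz, replayed at R = 40 samples/s.
position = [0.01 * i for i in range(100)]
support, recon = downsample_hls(position, refresh_rate=40)
print(support)                       # indices of the stored support samples
print(len(recon) == len(position))   # the display rate is unchanged
for rate in (10, 40, 60):
    print(rate, round(100 * compression_ratio(rate), 1))  # C in percent
```

With R = 40 the sketch keeps every 25th sample, and the two extreme rates of the experiment, R = 10 and R = 60, reproduce the 99% and 94% bounds quoted above.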

Fig. 9. Subjective results (mean and standard deviation) for position signal compression with the proposed method: rating versus refresh rate R (samples/sec). The region above the green line indicates the cases where no disturbance is perceived.

In Fig. 9, it can be observed that for a refresh rate R ≥ 40 samples/second the position signal compression no longer provokes disturbing visual artifacts.

V. CONCLUSION

In this work, two novel and independent approaches for compressing haptic signals in offline applications were proposed. The first approach is a non-iterative method to determine the smallest number of support samples that can reconstruct the force-feedback signal within perception thresholds according to the widely known Weber's Law.

The second approach takes into consideration the open-loop property of haptic recording and playback. This scheme takes advantage of the fact that the position signal is only employed to visually update the system during replay and can therefore be compressed according to the limitations of human visual perception rather than haptic perception (as exploited in the first proposed approach).

Experimental results show a higher compression ratio compared to the state-of-the-art perceptual compression scheme [8], [9] while maintaining comparable transparency. Additionally, the proposed force-feedback compression approach guarantees better accuracy with respect to the deadband boundaries than [8], [9], since it determines the entire reconstructed signal in compliance with the original signal's perception thresholds.

For future work, the authors intend to implement an embedded logarithmic quantized space in order to decrease the entropy of the reconstructed force-feedback signal, which better suits the use of traditional entropy coders. Moreover, an additional step will be included to further minimize the deviation between the reconstructed and original signals. This step will rearrange the support points so that the line segments better approximate the original data.

REFERENCES

[1] G. C. Burdea. Force and Touch Feedback for Virtual Reality. Wiley, New York, 1996.
[2] F. Conti, F. Barbagli, D. Morris, and C. Sewell. CHAI 3D: An open-source library for the rapid development of haptic scenes. In Proc. of the IEEE World Haptics, Pisa, Italy, Mar. 2005.
[3] W. R. Ferrell and T. B. Sheridan. Supervisory control of remote manipulation. IEEE Spectrum, 4(10):81–88, Oct. 1967.
[4] E. B. Goldstein. Sensation and Perception. Wadsworth, 6th edition, 2002.
[5] J. Greenspan and S. Bolanowski. Pain and Touch, chapter The Psychophysics of Tactile Perception and Its Peripheral Physiological Basis. Academic Press Inc., New York, 1996.
[6] P. Hinterseer, S. Hirche, S. Chaudhuri, E. Steinbach, and M. Buss. Perception-based data reduction and transmission of haptic data in telepresence and teleaction systems. IEEE Trans. on Signal Processing, 56(2):588–597, Feb. 2008.
[7] P. Hinterseer, E. Steinbach, S. Hirche, and M. Buss. A novel, psychophysically motivated transmission approach for haptic data streams in telepresence and teleaction systems. In Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, PA, USA, Mar. 2005.
[8] J. Kammerl and E. Steinbach. Deadband-based offline-coding of haptic media. In Proc. of ACM Multimedia, Vancouver, BC, Canada, Oct. 2008.
[9] J. Kammerl and E. Steinbach. High-fidelity recording, compression, and replay of visual-haptic telepresence sessions. In Proc. of the IEEE Int. Conf. on Image Processing (ICIP), Hong Kong, China, Sep. 2010.
[10] A. Punchihewa and D. G. Bailey. Artefacts in image and video systems; classification and mitigation. In Proc. of the Conf. of Image and Vision Computing, Auckland, New Zealand, Dec. 2002.
[11] C. Shahabi, A. Ortega, and M. Kolahdouzan. A comparison of different haptic compression techniques. In Proc. of the IEEE Int. Conf. on Multimedia and Expo (ICME), Aug. 2002.
[12] C. Sherrick and J. Craig. The psychophysics of touch. In Tactual Perception: A Sourcebook. Cambridge University Press, 1982.
