IEEE TRANSACTIONS ON ROBOTICS, VOL. 27, NO. 3, JUNE 2011

Sense of Touch in Robots With Self-Organizing Maps

Magnus Johnsson and Christian Balkenius

Abstract—We review a number of self-organizing robot systems that are able to extract features from haptic sensory information. They are all based on self-organizing maps (SOMs). First, we describe a number of systems based on the three-fingered robot hand, the Lund University Cognitive Science (LUCS) haptic-hand II, which successfully extract the shapes of objects. These systems explore each object with a sequence of grasps while superimposing the information from individual grasps after cross-coding proprioceptive information for different parts of the hand and the registrations of the tactile sensors. The cross-coding is done by employing either the tensor-product operation or a novel self-organizing neural network called the tensor multiple-peak SOM (T-MPSOM). Second, we present a system, based on proprioception, that uses an anthropomorphic robot hand, the LUCS haptic-hand III. This system is able to distinguish objects according to both shape and size. Third, we present systems that are able to extract and combine the texture and hardness properties of explored materials.

Index Terms—Cognitive robotics, manipulators, self-organizing feature maps, tactile sensors, unsupervised learning.

Manuscript received August 1, 2010; revised December 22, 2010 and February 16, 2011; accepted February 16, 2011. Date of publication March 28, 2011; date of current version June 9, 2011. This paper was recommended for publication by Associate Editor W. K. Chung and Editor G. Metta upon evaluation of the reviewers' comments. The authors are with the Lund University Cognitive Science, S-222 22 Lund, Sweden (e-mail: [email protected]; christian.balkenius@lucs.lu.se). Digital Object Identifier 10.1109/TRO.2011.2130090

I. INTRODUCTION

Sense of touch is of utmost importance in the field of robotics, since a well-performing robot must be able to interact with objects in its environment. It is also important because it supports, and sometimes substitutes for, the visual modality during the recognition of objects. Like humans, robots need to perceive properties like shape, size, texture, and hardness and to discriminate between individual objects by the sense of touch. Using passive touch, humans are able to gather information through receptors sensitive to pressure, heat, touch, and pain [1] and perhaps to determine the shape of the explored object as well. However, a more active exploration of the object enables cutaneous information from the skin and proprioceptive information from the joints to be combined, thereby allowing a larger amount of information about shape and size to be collected. It also makes the perception of texture possible. Together, these processes come under the ambit of haptic perception, which involves sensory as well as motor systems.

Two important submodalities in haptic perception are texture and hardness perception. In noninteractive tasks, the estimation of properties like the size and the shape of an external object is often based largely on vision, and haptic perception is only employed when visual information about the object is not reliable. This might happen, for example, under bad lighting conditions or when the object is more or less occluded. Haptic submodalities, like texture and hardness perception, are different in this respect. These submodalities are especially important because they provide information about the outer world that is unavailable through any other perceptual channel.

Modeling haptic perception and implementing haptic perception in robots are two neglected areas of research. Research has mainly focused on grasping and object manipulation [2]–[5], and many models of hand control have focused on the motor aspect rather than on haptic perception [6], [7], although there are some exceptions, e.g., [8]–[21].

This paper reviews a number of self-organizing robot systems that are able to extract features from haptic sensory information. They are all based on self-organizing maps (SOMs) [22]. The purpose is to demonstrate some principles as to how this can be done. Thus, our systems are not optimized, and the purpose has not been to find out which system is best. First, we present some systems based on the three-fingered robot hand, the LUCS haptic-hand II, that successfully extract the shapes of objects [23]. These systems explore each object with a sequence of grasps while superimposing the information from individual grasps after cross-coding proprioceptive information for different parts of the hand and the registrations of the tactile sensors. The cross-coding is done by employing either the tensor-product operation or a novel self-organizing neural network called the tensor multiple-peak SOM (T-MPSOM). Second, we present a system based on proprioception and an anthropomorphic robot hand, the LUCS haptic-hand III [24]. This system is able to map objects according to both shape and size. Third, we describe systems that are able to extract and combine the texture and hardness properties of explored materials [25]. The systems presented in this paper have been implemented using the modeling framework Ikaros [26].

II. SHAPE PERCEPTION

We have implemented systems for the extraction of the shapes of objects. These systems use the LUCS haptic-hand II and explore each object with a sequence of grasps. The gathered information from each grasp is cross-coded by either the tensor-product operation or T-MPSOMs. The coded information from the grasps during an exploration is superimposed into a representation used as input to an SOM, which learns to map objects to different regions.

The LUCS haptic-hand II (see Fig. 1) is an 8-degree-of-freedom (DOF) three-fingered robot hand, equipped with 45 piezoelectric touch sensors, developed at LUCS. Each finger consists of two segments, each of which consists of an RC servo and a bracket together with a sensor plate mounted on the palmar side. The two segments of a finger are articulated against each other, and the three fingers are articulated against a triangular



Fig. 1. (a) The LUCS haptic-hand II while grasping a Rubik's cube. (b) Schematic overview of the LUCS haptic-hand II. The three-fingered robot hand has 8 DOFs. Each finger consists of two segments symmetrically mounted on a triangular plastic plate. The plastic plate is mounted on a wrist, which, in turn, is mounted on a lifting mechanism. Each finger segment is built with an RC servo and a servo bracket. The actuators of the LUCS haptic-hand II are controlled via an SSC-32 (Lynxmotion, Inc.), a controller board that can control up to 32 RC servos. Each finger segment is equipped with a (black) sensor plate containing seven or eight piezoelectric touch sensors.

plastic plate, where they are mounted symmetrically. The sensor plates are equipped with seven or eight pressure-sensitive sensors, depending on whether the plate belongs to a proximal or a distal finger segment. The wrist enables horizontal rotation of the robot hand and is, in turn, mounted on a lifting mechanism.

A. Cross-Coding

The first haptic model described in this paper uses the tensor product (or outer product) to combine, in several steps, the cutaneous and proprioceptive information gathered by the robot hand. The tensor product is an operation between an $n$-dimensional column vector $x = (x_1, \ldots, x_n)^T$ and an $m$-dimensional row vector $y = (y_1, \ldots, y_m)$, resulting in an $n \times m$ matrix $M$, where $M_{ij} = x_i y_j$. The function of the tensor product is to code one signal in terms of another, essentially transferring the signal from one coordinate system to another. By repeating the tensor recoding a number of times for each joint, the signals from the pressure sensors are gradually transformed into a code for the shape of the grasped object. The tensor product thus constitutes the ideal operation on the sensory signals to produce a shape code. However, the size of the tensor code increases substantially with each recoding, which makes it impractical.
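As an illustration, the chained outer-product recoding can be written as a few lines of NumPy. This is only a sketch of the idea, not the Ikaros implementation, and the vector sizes are illustrative assumptions; it also shows how quickly the code grows:

```python
import numpy as np

def tensor_code(x, y):
    """Outer product M[i, j] = x[i] * y[j]: codes signal x in terms of signal y."""
    return np.outer(x, y)

# Illustrative sizes: 10-element one-of-ten wrist-angle and height vectors,
# a 60-element finger-joint vector, and a 45-element touch-sensor vector.
wrist = np.zeros(10); wrist[3] = 1.0
height = np.zeros(10); height[7] = 1.0
joints = np.random.rand(60)     # stand-in for the joint-position code
touch = np.random.rand(45)      # stand-in for the 45 sensor readings

m1 = tensor_code(wrist, height)          # 10 x 10
m2 = tensor_code(m1.flatten(), joints)   # 100 x 60
m3 = tensor_code(m2.flatten(), touch)    # 6000 x 45 -- the size explodes
print(m3.shape)                          # (6000, 45)
```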


To overcome this problem, the second and the third models replace the tensor-product operations with T-MPSOMs, a novel neural-network architecture that combines the computations of the tensor product with the merits of SOMs in order to limit the combinatorial explosion that results from the application of the tensor product. These two models differ in the way the activation of the T-MPSOMs is calculated.

The T-MPSOM is a variant of the SOM with multiple activation peaks that takes two input vectors. Each neuron in the 2-D grid that constitutes the T-MPSOM has two weight vectors, corresponding to the dimensionalities of the two input vectors received in every iteration. To calculate the activity of a neuron, two partial activations are first calculated, one for each input vector. These are obtained by multiplying each element of an input vector with an arbor function [27] and with the corresponding element in the weight vector for that input vector; the arbor function corresponds to the receptive field of the neuron. All these products are then summed to obtain the partial activation. The two partial activations are then combined into the final activation of the neuron. The method employed for this combination depends on the variant of the T-MPSOM: in one of the models, the partial activations were multiplied, whereas in the other, they were summed.

Each neuron updates its weight vectors in each iteration. Every neuron in the network contributes to the weight update of a given neuron, and the degree of contribution depends on its activity and on a Gaussian function of its distance to the weight-updating neuron. The input vectors as well as the weight vectors are normalized in each iteration.

In mathematical terms, the T-MPSOM consists of an $I \times J$ matrix of neurons. In each iteration, every neuron $n_{ij}$ receives the two input vectors $x^a \in \mathbb{R}^K$ and $x^b \in \mathbb{R}^L$, and $n_{ij}$ has two weight vectors $w^a_{ij} \in \mathbb{R}^K$ and $w^b_{ij} \in \mathbb{R}^L$. The activity of neuron $n_{ij}$ is given by

$$y_{ij} = \left( \sum_{k=1}^{K} A^a(i,k)\, w^a_{ijk}\, x^a_k \right) \left( \sum_{l=1}^{L} A^b(j,l)\, w^b_{ijl}\, x^b_l \right)$$

in the variant with multiplied partial activations and by

$$y_{ij} = \sum_{k=1}^{K} A^a(i,k)\, w^a_{ijk}\, x^a_k + \sum_{l=1}^{L} A^b(j,l)\, w^b_{ijl}\, x^b_l$$

in the variant with added partial activations, where

$$A^a(u,v) = e^{-(u-(I/K)v)^2/2\sigma^2} \qquad \text{and} \qquad A^b(u,v) = e^{-(u-(J/L)v)^2/2\sigma^2}.$$
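The activation equations can be sketched as follows in NumPy. This is our own illustrative reading of the equations above, with zero-based indices and a shared receptive-field width σ, not the Ikaros implementation:

```python
import numpy as np

def arbor(u, v, m_size, d_size, sigma=2.0):
    """Arbor function A(u, v) = exp(-(u - (m_size/d_size) v)^2 / (2 sigma^2))."""
    return np.exp(-((u - (m_size / d_size) * v) ** 2) / (2.0 * sigma ** 2))

def tmpsom_activity(xa, xb, wa, wb, sigma=2.0, multiply=True):
    """Activity of an I x J T-MPSOM given inputs xa (K,) and xb (L,).

    wa has shape (I, J, K) and wb has shape (I, J, L); indices are
    zero-based here, unlike the one-based equations in the text.
    """
    I, J, K = wa.shape
    L = wb.shape[2]
    Aa = arbor(np.arange(I)[:, None], np.arange(K)[None, :], I, K, sigma)  # (I, K)
    Ab = arbor(np.arange(J)[:, None], np.arange(L)[None, :], J, L, sigma)  # (J, L)
    # Partial activations: sum_k A^a(i,k) w^a_ijk x^a_k, and the analogous sum over l.
    ya = np.einsum('ik,ijk,k->ij', Aa, wa, xa)
    yb = np.einsum('jl,ijl,l->ij', Ab, wb, xb)
    # Multiplied or added partial activations, depending on the model variant.
    return ya * yb if multiply else ya + yb
```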

The updating of the weight vectors is given by

$$w^a_{ijk}(t+1) = w^a_{ijk}(t) + \alpha(t)\, \bar{N}_{ij}(t) \left[ x^a_k(t) - w^a_{ijk}(t) \right]$$

and

$$w^b_{ijl}(t+1) = w^b_{ijl}(t) + \alpha(t)\, \bar{N}_{ij}(t) \left[ x^b_l(t) - w^b_{ijl}(t) \right]$$

where $0 \leq \alpha(t) \leq 1$ is the learning rate, with $\alpha(t) \to 0$ as $t \to \infty$. The learning in each neuron is controlled by a function $\bar{N}$ describing the neighborhood of each neuron

$$\bar{N}_{ij}(t) = \frac{N_{ij}(t)}{\max_{ij} N_{ij}(t)}, \qquad N_{ij}(t) = \sum_{k} \sum_{l} y_{kl}(t)\, G(\|n_{kl} - n_{ij}\|)$$

where $y_{kl}(t)$ is the activity in $n_{kl}$ at time $t$, $G(\cdot)$ is a Gaussian function, and $\|\cdot\|$ is the Euclidean distance between two neurons.
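A corresponding sketch of the weight update and the activity-weighted Gaussian neighborhood might look as follows; again an illustrative reading under the same assumptions, not the Ikaros implementation:

```python
import numpy as np

def tmpsom_update(wa, wb, xa, xb, y, positions, alpha=0.1, sigma_n=2.0):
    """One T-MPSOM weight-update step for all neurons at once.

    y is the (I, J) activity matrix; positions holds the (I*J, 2) grid
    coordinates in the same row-major order as y.flatten(), e.g.,
    positions = np.array([(i, j) for i in range(I) for j in range(J)]).
    """
    # Neighborhood N_ij = sum_kl y_kl G(||n_kl - n_ij||), normalized by its maximum.
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    G = np.exp(-d ** 2 / (2.0 * sigma_n ** 2))
    N = (G @ y.flatten()).reshape(y.shape)
    N = N / N.max()
    # Move every weight vector toward the current input, scaled by the
    # learning rate and the neighborhood.
    wa = wa + alpha * N[:, :, None] * (xa[None, None, :] - wa)
    wb = wb + alpha * N[:, :, None] * (xb[None, None, :] - wb)
    # Normalize the weight vectors, as prescribed in the model description.
    wa /= np.linalg.norm(wa, axis=2, keepdims=True)
    wb /= np.linalg.norm(wb, axis=2, keepdims=True)
    return wa, wb
```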


Fig. 2. Schematic depiction of the models of haptic-shape perception with the LUCS haptic-hand II.

B. Model Design

The haptic models are inspired by their biological counterparts, especially the second and the third models, which employ the T-MPSOM neural network. The models are all similar, except for the methods employed for coding the tactile information gathered during the haptic exploration, as will be explained below. In addition to the LUCS haptic-hand II, a sensory driver, and a motor driver, the models (see Fig. 2) consist of four common Ikaros modules and three instances of a model-specific module. The common modules are described next.

Grasping Module: It takes care of the individual grasping movements. The tactile information is coded by a vector with 45 elements corresponding to the 45 sensors. To let the robot hand adapt its shape to the grasped object, a grasping movement is executed by moving the proximal finger segments until the total change of the tactile-sensor readings exceeds a threshold or a maximally allowed position has been reached. When the proximal segments stop, the distal segments start to move in the same fashion.

Commander Module: It is responsible for the exploration of the object. This is done by carrying out a sequence of nine different grasps, at two different heights and with five different wrist angles, continuing until all nine exploration grasps have been executed with the current object.

Short-Term-Memory Module: It receives the matrices that code the cutaneous and proprioceptive information for the current iteration and superimposes them, so that the tactile information from the beginning to the end of the exploration of an object is put together into one matrix. The cutaneous and proprioceptive information is either coded by a chain of three tensor-product operations or by a chain of three T-MPSOM neural networks. More formally, at each iteration of an exploration, the short-term-memory (STM) module takes as input an $I \times J$ matrix $M^{in}$ and outputs an $I \times J$ matrix $M^{out}$, where at iteration $t$ in the exploration of an object

$$m^{out}_{ij}(t) = m^{in}_{ij}(0) + m^{in}_{ij}(1) + \cdots + m^{in}_{ij}(t).$$
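A minimal sketch of such a superimposing STM module (the class name and interface are our own, hypothetical choices):

```python
import numpy as np

class ShortTermMemory:
    """Superimposes the coded I x J matrices from all iterations of one exploration."""

    def __init__(self, shape):
        self.m_out = np.zeros(shape)

    def step(self, m_in):
        # m_out(t) = m_in(0) + m_in(1) + ... + m_in(t)
        self.m_out += m_in
        return self.m_out

    def reset(self):
        # Called when the exploration of a new object starts.
        self.m_out[:] = 0.0
```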


SOM Module: When the exploration of an object is completed, the output matrix of the STM module, which represents all the tactile information from the exploration, is used as input to the SOM module, a self-organizing neural network [22] with 225 neurons. If many shapes and individual objects are to be discriminated, a larger number of neurons might be needed. In this module, the final mapping of the explored objects takes place.

In addition to these common modules, the models employ three instances of either a module that implements the tensor-product operation or a module that implements the T-MPSOM neural network.

The tactile information from the touch sensors is coded as a vector of real numbers. The proprioceptive information is coded as three vectors representing servo positions, which are provided by the motor driver. The wrist angle and the vertical position of the robot hand are each coded by a 10-element vector, with one element set to 1 and all the others set to 0. A third vector represents the positions of the finger joints and consists of a series of 10-element sequences, each coding the position of one finger joint in the same one-of-ten fashion.

In the first model, the first instance of the tensor-product operation takes as input the vector representing the wrist angle and the vector representing the vertical position of the robot hand. The output from this instance is used (after rearrangement into a vector) as input to a second instance of the tensor-product module, together with the vector representing the positions of the finger joints. The output from this second instance is, in turn, used as input to a third instance of the tensor-product module, together with a vector representing the current status of the touch sensors on the robot hand. The output from this third instance is superimposed in the STM module, as described above, during the whole exploration of an object. The idea of using this chain of recoding with the tensor-product operation and superimposing is to make the final activity depend on all the joint angles as well as the sensor responses during the exploration of the object. This establishes a code that depends on the 3-D shape of the grasped object during an exploration.

In the second and the third models, the three instances of the tensor-product operation are substituted by three instances of a T-MPSOM module. This yields models that self-organize to a larger extent than the first model, which makes them more biologically plausible. The important benefit of the T-MPSOM compared with the tensor product is that the number of elements in the output from each instance can be downscaled dramatically, which avoids the combinatorial explosion that occurs when several tensor products are combined. The three instances of the T-MPSOM module can be imagined as corresponding to different areas in the human somatosensory cortex.
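To illustrate the one-of-ten coding of the proprioceptive signals described above, here is a small sketch; the normalization range and the number of finger joints are assumptions for illustration:

```python
import numpy as np

def one_hot(position, n_bins=10, lo=0.0, hi=1.0):
    """Code a scalar position as a 10-element vector with a single 1."""
    v = np.zeros(n_bins)
    idx = min(int((position - lo) / (hi - lo) * n_bins), n_bins - 1)
    v[idx] = 1.0
    return v

# Wrist angle and hand height: one 10-element one-of-ten vector each.
wrist_vec = one_hot(0.37)
height_vec = one_hot(0.82)

# Finger joints: one 10-element sequence per joint, concatenated.
joint_positions = [0.1, 0.5, 0.9, 0.2, 0.6, 0.4]   # six joints, illustrative
joints_vec = np.concatenate([one_hot(p) for p in joint_positions])
print(joints_vec.shape)  # (60,)
```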



TABLE I TEST OBJECTS USED WITH THE THREE MODELS OF HAPTIC SHAPE PERCEPTION FOR THE LUCS HAPTIC-HAND II (A IS THE DIAMETER OR HEIGHT, B IS THE HEIGHT OR LENGTH, AND C IS THE WIDTH)

The first instance, which takes as input vector representations of the wrist angle and the vertical position of the robot hand, corresponds to the somatosensory areas in the human brain that receive proprioceptive information and code for the localizations and orientations of the hand. The second instance, which, together with the output from the first, also takes as input a vector representation of the positions of the finger joints, corresponds to the somatosensory areas that receive proprioceptive information and code for the localizations and orientations of the hand and the fingers. The third instance takes as input the output from the second instance, as well as a vector representation of the current status of the touch sensors on the robot hand, and corresponds to the somatosensory areas that integrate information from the proprioceptive submodality with cutaneous information.

The second and the third models are identical apart from the method employed to calculate the activities of the neurons: in the second model, the activity of a neuron is calculated by multiplying the partial activations, whereas the third model adds the partial activations.

C. Shape-Perception Tests and Results

To simplify, the tactile and proprioceptive information from the explorations of the objects presented in Table I was written to files. Five explorations of each object, i.e., a total of 30 explorations, were carried out to create a sample set used for both training and testing. All three models were trained with 1000 randomly chosen explorations from the sample set to organize the SOM module. The latter two models also included a preceding training phase with 5000 iterations (an exploration contains a varying number of iterations) to organize the three instances of the T-MPSOM module. Finally, the trained models were tested with each of the 30 samples.

The first model, which employed a chain of tensor-product operations to code the tactile information, and the second model, which substituted the tensor-product operations with a chain of T-MPSOM neural networks with multiplied partial activations, learned to map the test objects according to shape, i.e., the spheres were mapped in one area of the SOM, the blocks in another area, and the cylinders in still another area [see Fig. 3(a) and (b)]. The first model was also able to distinguish individual test objects [see Fig. 3(a)], and so was the second model, with one exception: a boccia sphere was mistaken for a boule sphere [see Fig. 3(b)].

Fig. 3. (a) Results with the model that uses a chain of three tensor-product operations to code the tactile information. (b) Results with the model that uses a chain of three instances of the variant of the T-MPSOM that multiplies the partial activations. (c) Results with the model that uses a chain of three instances of the variant of the T-MPSOM that adds the partial activations. Each image represents the grid of neurons in the SOM of the model and shows the centers of activity during the testing. The centers of activity due to explorations of objects from each shape category have been encircled. All three models are, more or less, able to map the test objects according to shape, and the first and the second models also identified most of the individual objects.

The grasping tests with the third model, which substituted the tensor-product operations with a chain of T-MPSOM neural networks with added partial activations, turned out slightly worse than those with the previous two models, but the model still performed quite well [see Fig. 3(c)]. It learned to map the objects according to shape with one exception: a boccia sphere, which was confused with the cylinder. Unlike the two other models, this model was not able to discriminate between individual objects within the same shape category.

III. SHAPE AND SIZE PERCEPTION

We have implemented a proprioception-based system that is able to map objects according to both shape and size. This system is based on the LUCS haptic-hand III and gathers proprioceptive information by reading variable resistors internal to the servo control circuits while grasping objects. The proprioceptive information is used as input to an SOM, which becomes organized according to properties like shape and size.

The LUCS haptic-hand III is a five-fingered 12-DOF anthropomorphic robot hand equipped with 11 proprioceptive

sensors (see Fig. 4). The robot hand has a thumb consisting of two phalanges, whereas the other fingers have three phalanges. The thumb can be separately flexed/extended in both the proximal and the distal joints and can be adducted/abducted. The other fingers can be separately flexed/extended in their proximal joints, whereas the middle and the distal joints are flexed/extended together. All this is similar to the human hand. The wrist can also be flexed/extended, like the wrist of a human hand. The phalanges are made of plastic pipe segments, and the force transmission from the actuators, which are located in the forearm, is handled by tendons inside the phalanges, in a way similar to the tendons of a human hand. All fingers, except the thumb, are mounted directly on the palm. The thumb is mounted on a servo, which enables the adduction/abduction. This servo is mounted on the proximal part of the palm, which is similar to the site of the thumb muscles in a human hand. The actuators of the fingers and the wrist are located in the forearm; this is similar to the muscles that actuate the fingers of a human hand as well. The hand is actuated by a total of 12 servos.

TABLE II SIXTEEN OBJECTS USED IN THE EXPERIMENTS WITH THE MODEL BASED ON THE LUCS HAPTIC-HAND III

Fig. 4. The LUCS haptic-hand III while holding a screwdriver in the open position, seen in a front view and in a side view. Some of the actuators in the forearm can also be seen in the side view. The 12-DOF robot hand has five fingers, is of the same size as a human hand, and all its parts have approximately the same proportions as their counterparts in a human hand. Each finger can be separately flexed/extended in the proximal joint, whereas the medial and distal joints are flexed/extended together, as in a real human finger. As in a human hand, the thumb has only a proximal and a distal phalange; these can be separately flexed/extended. In addition, the thumb can be adducted/abducted in a way similar to the human thumb. The wrist is capable of flexion/extension.

A. Model Design

This model (see Fig. 5) consists of the LUCS haptic-hand III, sensory and motor drivers, an SOM, and a commander program that executes the grasping movements. The sensory driver scans the proprioceptive sensors when requested to do so by the commander program, while the motor driver translates high-level motor commands from the commander into commands that are appropriate for the robot hand's servo controller board. When the commander executes a grasp, the robot hand is closed around the object. When the robot hand is fully closed, the sensory driver samples the registrations from the 11 proprioceptive sensors and conveys the information as an 11-element vector to the SOM, which is activated and adapts its weights if the model is in the learning phase. The SOM is a 225-neuron dot-product SOM with plane topology, which uses softmax activation with the softmax exponent equal to 10 [28].

B. Grasping Tests and Results

We have tested this model with ten objects (see Table II, objects a–j). These objects are either cylinder-shaped or block-shaped, with five objects of each shape category. All objects are sufficiently high to be of nonvariable shape in the parts grasped by the robot hand; e.g., a bottle used is grasped on the part of equal diameter below the bottle neck. During the grasping tests, the test objects were placed in a fixed position on a table with the open robot hand around them. If an object was block-shaped, we always placed the widest side (i.e., the wider of A and B given in Table II) against the palmar side. In total, 50 grasps were carried out (five on each object), and the sensory information was written to a file. Then, the SOM was trained and tested with this set of 50 samples.
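The dot-product SOM with softmax activation described above might be sketched as follows. This is our own reading of the setup (softmax-style sharpening in the sense of [28]); the learning-rate and radius values are illustrative, not the ones used in the experiments:

```python
import numpy as np

class DotProductSOM:
    """15 x 15 dot-product SOM with softmax activation (exponent m), plane topology."""

    def __init__(self, side=15, dim=11, m=10, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.random((side * side, dim))
        self.w /= np.linalg.norm(self.w, axis=1, keepdims=True)
        self.pos = np.array([(i, j) for i in range(side) for j in range(side)])
        self.m = m

    def activate(self, x):
        a = self.w @ (x / np.linalg.norm(x))   # dot-product activation
        a = a ** self.m                        # sharpening; assumes nonnegative inputs
        return a / a.sum()

    def train_step(self, x, alpha=0.1, radius=5.0):
        y = self.activate(x)
        winner = self.pos[np.argmax(y)]
        d2 = ((self.pos - winner) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
        self.w += alpha * h[:, None] * (x[None, :] - self.w)
        self.w /= np.linalg.norm(self.w, axis=1, keepdims=True)

# Sketch of a training loop over prerecorded 11-element grasp samples:
# som = DotProductSOM()
# for t in range(2000):
#     som.train_step(samples[np.random.randint(len(samples))])
```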



Fig. 5. Schematic depiction of the single-grasp model. The commander program executes the grasps by sending high-level motor commands to the motor driver, which translates and conveys the information to the servo controller board of the robot hand. When the robot hand is fully closed, the commander program requests a scan of the registrations of the 11 proprioceptive sensors of the robot hand. The sensory information is conveyed as a vector to the SOM.

The training phase lasted for 2000 iterations; then, the weight adaptation was turned off, each sample was input to the SOM again, and the activation was recorded.

We have also tested whether the model is able to generalize its knowledge to new objects, i.e., to objects not included in the training set. To this end, we used six new objects (see Table II, objects 1–6): three cylinder-shaped objects and three block-shaped objects of varying sizes. The fully trained model was fed input from grasps of the new objects under the same conditions as for the objects in the training set. Each object in the new set was grasped once, and the activity in the SOM was recorded.

In Fig. 6(a), the mappings of individual grasps have been grouped. Each group encloses the mappings of grasps of a single test object. One grasp of the olive oil bottle, one grasp of the tube, and one grasp of plastic bottle 2 have been excluded from the grouping, since they are not mapped together with the other grasps of the same object and are also mapped in the wrong shape category (although they are mapped at a proper place when considering size). As can be seen in Fig. 6(a), the model is able to discriminate between individual objects, although not perfectly.

The SOM seems to be organized according to shape, as can be seen in Fig. 6(b). Four groups of objects can be distinguished in the map. The same three grasps as in Fig. 6(a) have been excluded, for the same reason. One of the groups encompasses large block shapes, one small block shapes, one large cylindrical shapes, and one small cylindrical shapes. Thus, the model seems able to discriminate between shapes, and it also groups the shapes according to whether they are bigger or smaller.

The SOM also seems to have become organized so that the mappings of the test objects are ordered clockwise according to size, from smaller to larger. It seems as if the surface of a block-shaped object turned against the palmar side of the hand during grasping has precedence when the SOM organizes according to size, and this surface is also what we consider when we say that the SOM is ordered according to size. That the surface turned toward the palmar side has precedence is also what would be expected, since this information should, in some way, be coded by the proprioceptive information from all the fingers but the thumb, whereas the perpendicular surface (in the case of a block shape) is only coded by the proprioceptive information from the thumb. There is, however, one exception to the size ordering in the SOM, namely, plastic bottle 1, as can be seen in Fig. 6(c). Within a shape category, the test objects are mapped clockwise from smaller to larger according to size without exception.

Fig. 6. Mapping of the test objects. The characters a–j and the numbers 1–6 refer to the objects given in Table II. Each square represents a neuron in the SOM, which consists of 15 × 15 = 225 neurons. The presence of a letter in a square indicates a center of activation in the SOM for the corresponding object. The occurrence of a certain letter in more than one square means that the corresponding object has different centers of activation during different grasps, i.e., all letters of a certain kind represent all occurring centers of activation in the SOM when the system was tested with the corresponding object. (a) Mapping of the individual objects. The mappings of the samples of each training object have been encircled. The mappings of three samples (of the 50 training samples) were considered outliers and excluded when we encircled the areas for each of the ten training objects. (b) Four groups of objects can be distinguished in the map. The same three samples as in (a) were excluded, for a similar reason. One group encompasses large block shapes, one small block shapes, one large cylindrical shapes, and one small cylindrical shapes. (c) The mappings of the test objects are ordered clockwise from small to large according to size, with one exception, i.e., plastic bottle 1. Within a shape category, the test objects are mapped clockwise from small to large according to size without exception. (d) In the generalization experiment, the test objects are mapped so that they can be identified with the most-similar objects in the training set. The encircled areas are the same as those in (a). The test objects are also ordered according to size in the same way as the objects in the training set and are correctly mapped according to shape.



The results are interesting because they reveal that the proprioceptive information carries information about both the shape and the size of the grasped objects and, in addition, information that enables discrimination of individual objects to some extent.

The result of the generalization experiment is depicted in Fig. 6(d). As can be seen, each object is mapped so that it can be identified with the most-similar object in the training set: if the test object is block-shaped, it is mapped in the same area as the most-similar block-shaped object in the training set, and if the test object is cylinder-shaped, it is mapped in the same area as the most-similar cylinder-shaped object in the training set. This also means that all test objects are ordered according to size in the same way as the objects in the training set and are correctly mapped according to shape. Thus, the model's generalization to the tested objects is perfect.

IV. HARDNESS AND TEXTURE PERCEPTION

We have implemented systems that are able to extract and combine the texture and hardness properties of explored materials. These systems employ a microphone-based texture sensor and/or a hardness sensor that measures the displacement of a stick pressed against the material with a constant force. The two sensors discussed here were developed at LUCS.

The texture sensor consists of a capacitor microphone with a tiny metal edge mounted at the end of a movable lever, which, in turn, is mounted on an RC servo. When exploring a material, the lever is turned by the RC servo, which moves the microphone with the attached metal edge along a curved path in the horizontal plane. This makes the metal edge slide over the explored material, which creates vibrations in the metal edge with frequencies that depend on the textural properties of the material. The vibrations are transferred to the microphone, since it is in contact with the metal edge. The signals are then sampled and digitized by a NiDaq 6008 (National Instruments) and conveyed to a computer via a universal-serial-bus (USB) port. The fast Fourier transform (FFT) algorithm is then applied to the input to yield a spectrogram of 2049 component frequencies.

The hardness sensor consists of a stick mounted on an RC servo. During the exploration of a material, the RC servo tries to move to a certain position, which causes a downward movement of the connected stick at a constant pressure. In the control circuit inside the RC servo, there is a variable resistor that provides the control circuit with information about whether the RC servo has reached the required position. In our design, we measure the value of this variable resistor at the end of the exploration of the material and thus get a measure of the end position of the stick, which is proportional to the compression of the explored material. The value of the variable resistor is conveyed to a computer and represented in binary form.
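A spectrogram of 2049 components corresponds to a real FFT over a 4096-sample window (4096/2 + 1 = 2049); a minimal sketch assuming such a window length, which the text does not state explicitly:

```python
import numpy as np

def texture_spectrum(signal):
    """Turn the microphone recording of one texture exploration into a
    2049-component spectrum (magnitude of a 4096-point real FFT)."""
    frame = np.asarray(signal[:4096], dtype=float)   # assumed window length
    return np.abs(np.fft.rfft(frame, n=4096))        # 4096/2 + 1 = 2049 bins

# Illustrative use with a stand-in recording.
recording = np.random.randn(4096)
spectrum = texture_spectrum(recording)
print(spectrum.shape)  # (2049,)
```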

TABLE III EIGHT MATERIALS USED IN THE EXPERIMENTS WITH ALL TEXTURE/HARDNESS SYSTEMS

A. Explorations of Objects

The systems employing the hardness and/or texture sensor have been trained and tested with one or both of two sample sets. One set consists of 40 samples of texture data, and the other consists of 40 samples of hardness data. These sets were constructed by letting the sensors simultaneously explore each of the eight materials described in Table III five times. During the hardness exploration of a material, the tip of the hardness sensor was pressed against the material with a constant force and the displacement was measured. The exploration with the texture sensor was done by letting its lever turn 36° during 1 s. During this movement, the metal edge slid over the material, and the resulting vibrations were recorded by the microphone mounted at the end of the lever. The output from the texture sensor from all these explorations was then written to a file after the application of the FFT. Likewise, the output from the hardness sensor was written to a file, represented as binary numbers. The hardness samples can be considered binary vectors of length 18, whereas the texture samples can be considered vectors of length 2049.

The eight materials have various kinds of texture and can be divided into two groups: four rather soft materials and four rather hard materials. During the explorations, the materials were fixed in the same location under the sensors.

B. Texture Perception

The texture-perception system [see Fig. 7(a)] is monomodal: the raw sensor output from the texture sensor is transformed by the FFT into a spectrogram containing 2049 frequencies, and the spectrogram, represented by a vector, is conveyed to an SOM, which uses softmax activation [28] with the softmax exponent equal to 10. After training, the SOM represents the textural properties of the explored materials.

We have experimented with different parameter settings for the texture SOM, both to obtain a well-working monomodal system and to obtain a system that would serve well as part of a bimodal system. We reached the conclusion that a well-working choice is an SOM with 15 × 15 neurons and a plane topology. A torus topology was also tested but turned out to be less effective than a plane topology. The sort of topology used influences the behavior of the SOM at the borders: with plane topology, the activations from the objects in the training set tend to be close to the borders, which turned out to be good when the texture-perception system was used as part of the combined monomodal/bimodal system described below.



Fig. 7. Schematic depiction of the texture/hardness system architectures. (a) Monomodal system for texture perception. The raw sensor output is transformed by the FFT into a spectrogram containing 2049 frequencies. The spectrogram, represented by a vector, is conveyed to an SOM. (b) Monomodal system for hardness perception. The raw sensor output, represented as a binary vector with 18 elements, is conveyed to an SOM. (c) System with both monomodal and bimodal representations. This system can be seen as a merging and extension of the previous two systems; likewise, the previous two systems can be seen as the monomodal level of this system. The outputs from the texture SOM and the hardness SOM are merged, i.e., a new vector is created by transforming the activations of the texture SOM and the hardness SOM into vectors and putting them one after the other. The merged vector is used as input to a bimodal SOM. Thus, in this system, there are self-organizing representations of texture and hardness as well as a combined representation of both. (d) Bimodal system. This system directly combines the output from the FFT and the binary output from the hardness sensor into a new vector, in the same way as described for the previous system but without the monomodal-representation step. The combined vector is used as input to a bimodal SOM.

We also experimented with different decay rates for the Gaussian neighborhood function. We came to the conclusion that a neighborhood radius of 15 at the start of the training phase, decreasing gradually to approximately 1 after 1000 iterations and staying at 1 during the rest of the training phase, was satisfactory. This system, like all the others, was trained for 2000 iterations before evaluation. We reasoned that the neighborhood should shrink to a small value after about 1000 iterations in order to give the bimodal SOM of the combined system, described below, enough iterations to self-organize. In other words, the idea was that the texture SOM should be rather well organized after 1000 iterations.
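The exact decay law is not specified in the text; assuming an exponential schedule, the radius profile described above could be sketched as:

```python
import numpy as np

def neighborhood_radius(t, r0=15.0, t_decay=1000):
    """Radius schedule: start at 15, shrink to about 1 after 1000 iterations,
    and then stay at 1 (exponential decay is our assumption)."""
    if t >= t_decay:
        return 1.0
    return max(1.0, r0 * (1.0 / r0) ** (t / t_decay))

radii = [neighborhood_radius(t) for t in (0, 500, 1000, 2000)]
print(np.round(radii, 2))  # [15.    3.87  1.    1.  ]
```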

C. Hardness Perception

The hardness-perception system is also monomodal. In this system [see Fig. 7(b)], the raw sensor output from the hardness sensor, which is represented as a binary number with 18 bits, is conveyed to an SOM, which, like the SOM of the texture system, uses softmax activation with the softmax exponent equal to 10. After training, the SOM represents the hardness property of the explored materials.

As in the case of the texture system, we have experimented with different parameter settings for the hardness SOM, and for the same reasons. In this case, we also tested several different sizes of the monomodal SOM, because preliminary experiments indicated that it could be a good idea to use a very small SOM for hardness in the combined system described below. A small hardness SOM seemed to self-organize solely according to the hardness property and was unable to distinguish individual objects; since the texture SOM was better at distinguishing individual objects, we did not want the hardness part to blur this, although we did want it to make the bimodal representation become organized according to hardness as well. We tried SOMs with planar as well as torus topology and with 15 × 15, 10 × 10, 5 × 5, 2 × 2, or 1 × 2 neurons. All variants started with a neighborhood size that covered the whole SOM, and the decay rates of the neighborhood were adjusted so that the neighborhood would shrink to a radius of approximately 1 after about 1000 iterations. As we had expected, the 15 × 15-neuron SOM (with plane topology) was best in this monomodal system, but we also found that, as suspected, all tested sizes but one organized to divide the objects into hard and soft categories. The exception was the SOM with only 1 × 2 neurons, which did not preserve the division into hard and soft materials well.

D. Combined Monomodal and Bimodal System

Fig. 7(c) shows a combined system where the activities of the monomodal SOMs are rearranged into vectors, and a new vector is created by putting the hardness output vector after the texture output vector. The monomodal texture SOM used 15 × 15 neurons with the same parameter settings as in the texture system. In the case of the monomodal hardness SOM, we tried two variations, namely, a 2 × 2-neuron SOM and a 15 × 15-neuron SOM with the settings specified for the hardness system above. Both worked fine, but the variation with the 2 × 2-neuron SOM yielded the best representation in the bimodal SOM. The bimodal SOM had settings similar to those of the monomodal texture SOM, but the decay rate of the neighborhood was set to decrease the neighborhood radius to 1 over 2000 iterations.

E. Bimodal System

In the bimodal system [see Fig. 7(d)], we combined the output from the texture sensor, after transformation into a spectrogram by the FFT, with the raw hardness-sensor output expressed as a binary number, using the same method as in the combined system described above, i.e., by putting the output vector from the hardness sensor after the output vector from the FFT. This means that the system has no monomodal representations. The combined vector was used as input to a bimodal SOM with the same settings as in the combined system above.
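The vector merging used by the combined and bimodal systems is plain concatenation; a small sketch with the 15 × 15 texture SOM and the 2 × 2 hardness SOM of the combined system (the function name is our own):

```python
import numpy as np

def bimodal_input(texture_act, hardness_act):
    """Merge two monomodal SOM activations into one input vector for the
    bimodal SOM: hardness activations are appended after texture activations."""
    return np.concatenate([texture_act.flatten(), hardness_act.flatten()])

# Combined system: 15 x 15 texture SOM + 2 x 2 hardness SOM -> 229 elements.
merged = bimodal_input(np.random.rand(15, 15), np.random.rand(2, 2))
print(merged.shape)  # (229,)
```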



F. Results and Discussion

Fig. 8. Mapping of the materials used in the experiments with the different SOM-based systems. The characters a–h refer to the materials in Table III. Each image in the figure corresponds to an SOM in a fully trained system, and each square represents a neuron in the SOM, which consists of 15 × 15 = 225 neurons. A filled circle indicates that the neuron is the center of activation for one or several explorations. The occurrence of a certain letter at more than one place means that the corresponding material has different centers of activation during different explorations, i.e., all letters of a certain kind represent all occurring centers of activation in the SOM when the system was tested with the corresponding material. (a) Monomodal SOM in the texture system. (b) Monomodal SOM in the hardness system. (c) Bimodal SOM in the combined monomodal and bimodal system. (d) Bimodal SOM in the bimodal system.

The mapping of the materials (a–h in Table III) used in the experiments with the different SOM-based systems is depicted in Fig. 8. Each image in the figure corresponds to an SOM in a fully trained system, and each cell in an image corresponds to a neuron in the corresponding SOM. A filled circle in a cell means that that particular neuron is the center of activation in one or several explorations.

In Fig. 8(a), the mappings of individual texture explorations with the texture system have been encircled. As can be seen, most materials (namely, c, d, e, f, and h) are mapped at separate sites in the SOM. There are some exceptions though, namely, a, b, and g. The texture system is therefore able to discriminate, albeit imperfectly, between individual materials.

The SOM in the hardness system, as depicted in Fig. 8(b), also maps different materials at different sites in the SOM, but not as well as the texture system. The hardness system recognizes b, f, and h perfectly and more or less blurs the others. However, the system perfectly discriminates hard from soft materials.

The combined monomodal and bimodal system [see Fig. 8(c)], which can be seen as a merging and extension of the texture system and the hardness system (with 2 × 2 neurons in the SOM), discriminates hard from soft materials well. In two explorations, the hard/soft category is undetermined, because one exploration of material a and one exploration of material g have the same center of activation. The system also discriminates perfectly between the materials b, d, f, and h.

The bimodal system [see Fig. 8(d)] discriminates perfectly between the materials c, d, e, f, and h, i.e., the same materials as the texture system. Moreover, it also discriminates hard from soft materials, although in seven explorations, the hard/soft category is undetermined, because three explorations of material a and four explorations of material b have the same center of activation.

V. CONCLUSION

We have presented a number of haptic robot systems based on two robot hands, the LUCS haptic-hands II and III, and on sensors for texture and hardness.

Three working models of haptic shape perception were based on the LUCS haptic-hand II. The first of these models employed a chain of tensor-product operations to code the tactile information, while the two other models substituted the tensor-product operations with instances of the novel T-MPSOM neural network. The reason for using a chain of tensor-product operations or T-MPSOMs to code the tactile information, instead of only one, is to capture the relations between signals. The latter two models differed in that they used slightly different variants of the T-MPSOM neural network: in one case, the partial activations of a neuron were multiplied to decide the total activation of the neuron, while in the other case, the partial activations were added to get a more biologically plausible model. The first model (with tensor products) and the second model (with T-MPSOMs with multiplied partial activations) were able to learn to map the test objects according to shape, as well as to identify individual objects. The capacities of these models should be comparable with those of a human being.

One system for haptic size and shape perception was based on our anthropomorphic robot hand, the LUCS haptic-hand III. This system used only proprioceptive information, which resulted in a very well-performing haptic system. The system was also able to map the sizes of the objects in an ordered fashion and to discriminate between objects as long as they were not too similar. A human would have a similar problem if she were unable to detect the material properties of the objects or, expressed differently, if all objects were of exactly the same material and weight. We also successfully tested the system's ability to generalize its learning to six novel objects.

Four systems for object recognition were based on textural and/or hardness input. The texture sensor employed is based on the transmission of vibrations to a microphone when the sensor slides over the surface of the explored material. The hardness sensor is based on the measurement of the displacement of a stick pressed against the material at a constant pressure. The results are encouraging, both for the monomodal systems and for the bimodal systems. The bimodal systems seem to benefit from both submodalities and yield representations that are


better than those in the monomodal systems. This is particularly true because the bimodal representations preserve the discrimination ability of the monomodal texture system and seem to preserve the way that the system groups the objects, while the influence of the hardness input makes the bimodal representation become organized according to hardness as well.

Although the resulting shape coding in the different systems depends on the submodalities used, it may be possible to associate these representations with, for example, visual representations of objects or appropriate motor commands for manipulation. Toward this end, we are currently investigating mechanisms that can be used to automatically form associations between different SOMs [29].

Building on the successful approaches in the systems presented in this paper, we will continue our research in haptic perception in the near future by trying to merge the findings from all these systems. We will try to design a system based on an anthropomorphic robot hand with proprioceptive sensors that uses tensor-product operations or T-MPSOMs to cross-code the angles of different parts of the hand, tactile information, and information about texture and hardness. Although this combined system will be designed by hand, it would also be interesting to look at automatic methods that could combine information from many submodalities, and at the relative merits of different types of coding of the sensory signals in the maps. The cross-coded information will be superimposed over time, as in the systems based on the LUCS haptic-hand II, or perhaps by employing leaky integrators.

REFERENCES

[1] S. Millar, "Network models for haptic perception," Infant Behav. Dev., vol. 28, no. 3, pp. 250–265, 2006.
[2] P. Dario, C. Laschi, A. Menciassi, E. Guglielmelli, M. Carrozza, and S. Micera, "Interfacing neural and artificial systems: From neuroengineering to neurorobotics," in Proc. 1st Int. IEEE EMBS Conf. Neural Eng., 2003, pp. 418–421.
[3] K. DeLaurentis and C. Mavroidis. (2000). Development of a shape memory alloy actuated robotic hand [Online]. Available: http://citeseer.ist.psu.edu/383951.html
[4] C. Rhee, W. Chung, M. Kim, Y. Shim, and H. Lee, "Door opening control using the multi-fingered robotic hand for the indoor service robot," in Proc. IEEE Int. Conf. Robot. Autom., 2004, pp. 4011–4016.
[5] H. Sugiuchi, Y. Hasegawa, S. Watanabe, and M. Nomoto, "A control system for multi-fingered robotic hand with distributed touch sensor," in Proc. 26th Annu. Conf. IEEE IECON, 2000, pp. 434–439.
[6] M. A. Arbib, A. Billard, M. Iacoboni, and E. Oztop, "Synthetic brain imaging: Grasping, mirror neurons and imitation," Neural Netw., vol. 13, pp. 975–999, 2000.
[7] A. H. Fagg and M. A. Arbib, "Modeling parietal–premotor interactions in primate control of grasping," Neural Netw., vol. 11, no. 7/8, pp. 1277–1303, 1998.
[8] P. K. Allen and P. Michelman, "Acquisition and interpretation of 3-D sensor data from touch," IEEE Trans. Robot. Autom., vol. 6, no. 4, pp. 397–404, 1990.
[9] S. Caselli, C. Magnanini, and F. Zanichelli, "Haptic object recognition with a dextrous hand based on volumetric shape representations," in Proc. IEEE Int. Conf. MFI, 1994, pp. 280–287.
[10] P. Dario, C. Laschi, M. Carrozza, E. Guglielmelli, G. Teti, B. Massa, M. Zecca, D. Taddeucci, and F. Leoni, "An integrated approach for the design and development of a grasping and manipulation system in humanoid robotics," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2000, pp. 1–7.
[11] I. Erkmen, A. M. Erkmen, A. E. Tekkaya, and T. Pasinlioglu, "Haptic perception of shape and hollowness of deformable objects using the Anthrobot-III robot hand," Robot. Syst., vol. 16, no. 1, pp. 9–24, 1999.
[12] G. Heidemann and M. Schöpfer, "Dynamic tactile sensing for object identification," in Proc. IEEE Int. Conf. Robot. Autom., 2004, pp. 813–818.
[13] K. Hosoda, Y. Tada, and M. Asada, "Anthropomorphic robotic soft fingertip with randomly distributed receptors," Robot. Auton. Syst., vol. 54, no. 2, pp. 104–109, 2006.
[14] J. Jockusch, J. Walter, and H. Ritter, "A tactile sensor system for a three-fingered robot manipulator," in Proc. IEEE Int. Conf. Robot. Autom., 1997, pp. 3080–3086.
[15] L. Natale and E. Torres-Jara, "A sensitive approach to grasping," in Proc. 6th Int. Workshop Epigenetic Robot., 2006, pp. 87–94.
[16] E. M. Petriu, S. K. S. Yeung, S. R. Das, A. M. Cretu, and H. J. W. Spoelder, "Robotic tactile recognition of pseudorandom encoded objects," IEEE Trans. Instrum. Meas., vol. 53, no. 5, pp. 1425–1432, 2004.
[17] S. A. Stansfield, "A haptic system for a multifingered hand," in Proc. IEEE Int. Conf. Robot. Autom., 1991, pp. 658–664.
[18] C. Taddeucci, C. Laschi, R. Lazzarini, R. Magni, P. Dario, and A. Starita, "An approach to integrated tactile perception," in Proc. IEEE Int. Conf. Robot. Autom., 1997, pp. 3100–3105.
[19] W. W. Mayol-Cuevas, J. Juarez-Guerrero, and S. Munoz-Gutierrez, "A first approach to tactile texture recognition," in Proc. IEEE Int. Conf. Syst., Man, Cybern., 1998, pp. 4246–4250.
[20] J. Edwards, C. Melhuish, J. Lawry, and J. Rossiter, "Feature identification for texture discrimination from tactile sensors," in Proc. TAROS, 2007, pp. 115–121.
[21] M. Campos and R. Bajcsy, "A robotic haptic system architecture," in Proc. IEEE Int. Conf. Robot. Autom., 1991, pp. 338–343.
[22] T. Kohonen, Self-Organization and Associative Memory. New York: Springer-Verlag, 1988.
[23] M. Johnsson and C. Balkenius, "Neural network models of haptic shape perception," Robot. Auton. Syst., vol. 55, pp. 720–727, 2007.
[24] M. Johnsson and C. Balkenius, "Experiments with proprioception in a self-organizing system for haptic perception," in Proc. TAROS, 2007, pp. 239–245.
[25] M. Johnsson and C. Balkenius, "Recognizing texture and hardness by touch," in Proc. IROS, 2008, pp. 482–487.
[26] C. Balkenius, J. Morén, B. Johansson, and M. Johnsson, "Ikaros: Building cognitive models for robots," Adv. Eng. Inf., vol. 24, no. 1, pp. 40–48, Jan. 2010.
[27] P. Dayan, "Competition and arbors in ocular dominance," in Proc. NIPS, 2000, pp. 203–209.
[28] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.
[29] M. Johnsson, C. Balkenius, and G. Hesslow, "Associative self-organizing map," in Proc. Int. Joint Conf. Comput. Intell., 2009, pp. 363–370.

Magnus Johnsson received the B.Sc. degree in computer science in 2003, the M.Sc. degree in computer science in 2004, the Master's degree in cognitive science in 2004, and the Ph.D. degree in cognitive science in 2009, all from the Lund University Cognitive Science (LUCS), Lund, Sweden. He is currently a Researcher with the LUCS. His current research interests include neural networks, the modeling of human haptic perception, and the implementation of haptic perception in robots. He has designed and implemented several robot hands and bioinspired systems for haptic perception.

Christian Balkenius received the Ph.D. degree in cognitive science from the Lund University Cognitive Science (LUCS), Lund, Sweden, in 1995. He is currently a Professor of cognitive science with the LUCS. His current research interests include the cognitive and developmental processes involved in learning and perception both at a neural and at a computational level. His research includes models of classical and instrumental conditioning, learning processes in the control of visual attention, and sensory and motor processes.