Fast Direct and Inverse Model Acquisition by Function Decomposition

Remis Balaniuk¹, Emmanuel Mazer and Pierre Bessiere
LIFIA - University of Grenoble
46, ave. Felix Viallet, 38031 Grenoble, France
(1 - PhD student supported by CNPQ-Brasil; e-mail: [email protected])
Abstract

A computational approach to direct and generalized inverse model acquisition is presented. The approach is based on a proposed method for direct model acquisition from partial information. The method decomposes a hyper-space function into one-variable functions, simplifying the learning problem. The acquired direct model is then implemented in a tree-like structure that can be used in the inverse sense without additional learning effort. Our approach is able to acquire complete models in hyper-spaces requiring only selected data focused on one-dimensional sub-spaces, strongly reducing the data acquisition effort. The approach is particularly interesting for applications in robotics, where the acquisition of direct models frequently takes place in high-dimensional phase spaces. When traditional approximation methods are used, enormous data bases containing the examples to be interpolated are required, and these data bases are costly to obtain.
1 Introduction
Considering learning as the process of acquiring non-linear relations from inputs to outputs has proved very useful in machine learning. Knowledge is seen as a continuous hyper-surface that maps an input multidimensional space into an output multidimensional space [2]. In the same way, learning can be thought of as the reconstruction of this hyper-surface from sparse data points [14]. In a robot, for example, the inputs can be the sensory information and the outputs the action controls [12]. Function approximators (neural networks in particular) are currently very popular in this kind of learning [3]. Sample bases are used to determine continuous non-linear functions that can be generalised to unknown points in the input space. However, in practice, the use of interpolation methods for higher-dimensional spaces is strongly determined by the sample bases. A good sample base must be locally well distributed over the whole input space, and this requires a number of examples exponential in the dimension of the input space. Consequently, learning becomes an exhaustive off-line task of data acquisition. In physical systems, this acquisition may be excessively time consuming when data measurements are costly. Moreover, noise can be introduced when there are critical gaps in the data due to sampling variation. Theoretical analyses of the approximation methods and of the surface's complexity can determine the conditions under which a data base enables a good approximation and a good interpolation [14].

Another intensively explored research field proposes strategies for data acquisition in order to efficiently select data points for use in learning. The problem has been referred to variously as "active learning", "sequential design" or "query-based learning" [5][8]. In the connectionist literature, there is a related approach in which a network is trained by selective attention [1]. Selective attention assumes that the approximation method is allowed to play an active role in either generating new exemplars (query learning) or selecting exemplars from available examples (active exemplar selection). In query learning we are allowed to select the input values where future data will be gathered. In this case, the goal is to minimise the number of exemplars one has to gather, or to corroborate information crucial to the estimate [7][15][4][10][9]. Active exemplar selection is concerned with selecting a concise subset of the available examples for use as training exemplars, that is, with minimising the number of exemplars one has to train upon, given that many are available [13].

In this paper we propose a method to calculate function values using only selected information, as in query learning. The queries to obtain data are driven by varying just one input-space co-ordinate in each query series, with the other co-ordinates remaining static. A query series determines a one-input-variable function, which we call a shape of the original surface in that dimension. Approximators are used to interpolate the examples obtained in the query series, so there is no need for the exhaustive sample bases of the traditional use of approximators. To complete the method, a recurrent transformation using a number of shape functions reproduces the original hyper-surface. The number of shape functions needed to accomplish the hyper-surface reconstruction depends on the dimension of the input space and on the complexity of the original function. In order to implement the complete computational approach, the approximators used were standard backpropagation neural networks, but the hyper-surface reconstruction method is quite general, and any known interpolation method could be used. The approximators and the algebraic operations used by the shape method are then arranged in a tree-like structure. This structure is used to calculate all the function's partial derivatives (the jacobians), with no additional data gathering or learning, enabling a differential model inversion.
2 The hyper-surface reconstruction method
We will name F the set of functions, and we will focus on a general functional form defined by the function set Fn:

$$F_n = \{\, f \in F \;\text{such that}\; f(\theta_1,\theta_2) = \varphi_0(\theta_1)\,\psi_0(\theta_2) + \varphi_1(\theta_1)\,\psi_1(\theta_2) + \dots + \varphi_n(\theta_1)\,\psi_n(\theta_2) \,\} \qquad (1)$$

where φ0, ψ0, φ1, ψ1, ..., φn, ψn are arbitrary one-variable functions. This form, which will be extended to more than two variables later, interests us because it describes a general polynomial form that can be found in almost any computational geometry problem. Moreover, it can be decomposed and reproduced in another, computationally very interesting form. Having defined the functional form, we now define in (2) the functional transformation DPf:

$$DPf(\theta_1,\theta_2) = f(\theta_1,\theta_2) - \frac{f(\theta_1,\theta_2^0)\; f(\theta_1^0,\theta_2)}{f(\theta_1^0,\theta_2^0)} \qquad (2)$$

where θ1^0 and θ2^0 are particular values of the function variables, f(θ1, θ2) is a function determining a hyper-surface, f(θ1, θ2^0) and f(θ1^0, θ2) are the shape functions, and f(θ1^0, θ2^0) is a point on the hyper-surface.

We want to prove that transformation (2), used recursively, can reproduce any function of the form (1). To prove this, let us define a set of sets (3):

$$\Gamma_0 = \{ f \in F \;\text{such that}\; DPf = 0 \}$$
$$\Gamma_1 = \{ f \in F \;\text{such that}\; DPf \in \Gamma_0 \}$$
$$\Gamma_2 = \{ f \in F \;\text{such that}\; DPf \in \Gamma_1 \}$$
$$\dots$$
$$\Gamma_n = \{ f \in F \;\text{such that}\; DPf \in \Gamma_{n-1} \} \qquad (3)$$

Theorem:

$$\Gamma_n = F_n \qquad (4)$$

We can rewrite (2) for the functions g belonging to the set Γ0 as:

$$DPg(\theta_1,\theta_2) = g(\theta_1,\theta_2) - \frac{g(\theta_1,\theta_2^0)\; g(\theta_1^0,\theta_2)}{g(\theta_1^0,\theta_2^0)} = 0$$

so that

$$g(\theta_1,\theta_2) = \frac{g(\theta_1,\theta_2^0)\; g(\theta_1^0,\theta_2)}{g(\theta_1^0,\theta_2^0)}$$

and define:

$$\varphi_0(\theta_1) = \frac{g(\theta_1,\theta_2^0)}{g(\theta_1^0,\theta_2^0)}, \qquad \psi_0(\theta_2) = g(\theta_1^0,\theta_2)$$

so that g(θ1, θ2) = φ0(θ1)·ψ0(θ2).

In the same way, and using recurrence, we can redefine the functions h belonging to the set Γ1 as:

$$DPh(\theta_1,\theta_2) = h(\theta_1,\theta_2) - \frac{h(\theta_1,\theta_2^0)\; h(\theta_1^0,\theta_2)}{h(\theta_1^0,\theta_2^0)}$$

and

$$DPDPh(\theta_1,\theta_2) = DPh(\theta_1,\theta_2) - \frac{DPh(\theta_1,\theta_2^1)\; DPh(\theta_1^1,\theta_2)}{DPh(\theta_1^1,\theta_2^1)} = 0$$

so that

$$h(\theta_1,\theta_2) = \frac{h(\theta_1,\theta_2^0)\; h(\theta_1^0,\theta_2)}{h(\theta_1^0,\theta_2^0)} + \frac{DPh(\theta_1,\theta_2^1)\; DPh(\theta_1^1,\theta_2)}{DPh(\theta_1^1,\theta_2^1)}$$

Defining

$$\varphi_0(\theta_1) = \frac{h(\theta_1,\theta_2^0)}{h(\theta_1^0,\theta_2^0)}, \quad \psi_0(\theta_2) = h(\theta_1^0,\theta_2), \quad \varphi_1(\theta_1) = \frac{DPh(\theta_1,\theta_2^1)}{DPh(\theta_1^1,\theta_2^1)}, \quad \psi_1(\theta_2) = DPh(\theta_1^1,\theta_2),$$

we obtain h(θ1, θ2) = φ0(θ1)·ψ0(θ2) + φ1(θ1)·ψ1(θ2).
Using the recurrence up to the set Γn we find the form (1), proving the theorem (4). Hence, using the DPf transformation recursively, together with the partial functions containing just one input variable, we can reconstruct any function of the form (1).

The method is easily generalised to more input variables. For a function of three input variables, for instance, the DPf transformation takes the form:

$$DPf(\theta_1,\theta_2,\theta_3) = f(\theta_1,\theta_2,\theta_3) - \frac{f(\theta_1,\theta_2,\theta_3^0)\; f(\theta_1,\theta_2^0,\theta_3)\; f(\theta_1^0,\theta_2,\theta_3)}{f(\theta_1^0,\theta_2^0,\theta_3^0)} \qquad (5)$$
Each function in (5) with two input variables and one variable fixed at its reference value, such as f(θ1, θ2, θ3^0), can be solved by the two-variable case of the method, because the third variable acts as a constant. Hence, in the same way, we can reconstruct a function containing an arbitrary number of variables using only one-input functions and the transformation. The DPf transformation was chosen expressly to decompose functions of the special form (1). We are interested in this form because it can be found in a large class of problems, including computational geometry problems that are particularly relevant to the robotics domain. We will illustrate the application of the method with one of these problems: a robot arm kinematics calculation. We are currently exploring variations of the functional transformation in order to extend the class of functional forms that we can decompose.
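To make the two-variable decomposition concrete, the following minimal Python sketch (not part of the original paper) applies the DP transformation recursively while querying the target function only along one-dimensional slices. The helper name `reconstruct_2d`, the reference values and the test function are illustrative assumptions; in the paper the shapes would be sampled by query series and interpolated by neural networks rather than queried directly.

```python
import math

def reconstruct_2d(f, t1_refs, t2_refs, depth):
    """Reconstruct f(t1, t2) from one-dimensional shape functions by applying
    the DP transformation recursively `depth` times (hypothetical helper).
    t1_refs[k], t2_refs[k] are the reference values theta_1^k, theta_2^k used
    at recurrence level k; f is only ever queried with one of its arguments
    fixed at a reference value, i.e. along shapes."""
    def rec(g, level):
        t1_0, t2_0 = t1_refs[level], t2_refs[level]
        corner = g(t1_0, t2_0)                      # a point on the surface
        def term(t1, t2):
            # product of the two shape functions divided by the corner value
            return g(t1, t2_0) * g(t1_0, t2) / corner
        if level + 1 == depth:
            return term                             # DPg assumed to vanish at this level
        def dp(t1, t2):                             # residual DPg of equation (2)
            return g(t1, t2) - term(t1, t2)
        rest = rec(dp, level + 1)                   # decompose the residual recursively
        return lambda t1, t2: term(t1, t2) + rest(t1, t2)
    return rec(f, 0)

# A separable (Gamma_0) test function is recovered exactly with depth 1.
f0 = lambda t1, t2: (1.0 + t1) * math.exp(t2)
model0 = reconstruct_2d(f0, [0.0], [0.0], depth=1)
print(abs(model0(0.4, -0.7) - f0(0.4, -0.7)))       # ~0 up to rounding error
```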
3 The direct kinematics problem

To illustrate the application of the method we propose to solve a simple problem: the direct kinematics calculation of a two-degree-of-freedom robot arm (figure 1).
figure 1 - the robot arm (joint angles τ1 and τ2, end-effector EF(x, y))
The position of the robot arm can be defined analytically by four variables: τ1, τ2 (the angles of the rotational joints) and l1, l2 (the link lengths). The direct kinematics problem consists of finding the configuration kinematic equations relating the joint variables τ1, τ2, l1, l2 to x, y (the Cartesian co-ordinates of the arm extremity). We will show the application of the method analytically, to demonstrate that it is correct, but its real interest is that it can be used purely numerically, through queries on the real application, where we usually do not have the functional forms at hand; these can be very complex and are often unknown. For simplicity, we consider the link lengths equal to unity in this analytic illustration. In fact, numerically we can ignore these two parameters and all other constants used to define the system, because all these constants are contained in the values obtained by the queries. In the model acquisition of physical systems this point can be quite relevant, since we often do not have a complete and trustworthy model definition. In robot arms, for example, the values of the angles may be defined by τ1 = τ1c + τ1i, where τ1c is a commanded angle and τ1i is an offset. This offset is another system constant, and its determination may introduce errors into the model. In our arm kinematics problem, to calculate x, for instance, we can use equation (6) (note that it respects the general form (1)):

$$x = f(\tau_1,\tau_2) \in F_n$$
$$f(\tau_1,\tau_2) = \cos(\tau_1) + \cos(\tau_1)\cos(\tau_2) - \sin(\tau_1)\sin(\tau_2) \qquad (6)$$
We will access f in only one dimension at a time, giving us the shape functions. We will assume that the query series determining the shape functions are taken at τ1^0 = 0, τ1^1 = π/2, τ2^0 = 0 and τ2^1 = π/2. We start the model acquisition process by supposing that the function f belongs to the set Γ0 and proposing a function g:

$$g(\tau_1,\tau_2) \in F_0$$
$$DPg(\tau_1,\tau_2) = g(\tau_1,\tau_2) - \frac{g(0,\tau_2)\; g(\tau_1,0)}{g(0,0)} = 0$$

In the one-variable or no-variable cases g = f, and so:

$$g(\tau_1,\tau_2) = \frac{f(0,\tau_2)\; f(\tau_1,0)}{f(0,0)}$$
$$g(\tau_1,\tau_2) = \cos(\tau_1) + \cos(\tau_1)\cos(\tau_2) \qquad (7)$$

This first decomposition trial shows that the function f does not belong to the set Γ0, since equation (7) does not match equation (6). We therefore try a deeper recurrence level and propose a new function h belonging to Γ1:

$$h(\tau_1,\tau_2) \in F_1$$
$$DPh(\tau_1,\tau_2) = h(\tau_1,\tau_2) - \frac{h(0,\tau_2)\; h(\tau_1,0)}{h(0,0)}$$
$$DPDPh(\tau_1,\tau_2) = DPh(\tau_1,\tau_2) - \frac{DPh(\pi/2,\tau_2)\; DPh(\tau_1,\pi/2)}{DPh(\pi/2,\pi/2)} = 0$$

so that

$$h(\tau_1,\tau_2) = \frac{h(0,\tau_2)\; h(\tau_1,0)}{h(0,0)} + \frac{DPh(\pi/2,\tau_2)\; DPh(\tau_1,\pi/2)}{DPh(\pi/2,\pi/2)}$$
$$h(\tau_1,\tau_2) = \cos(\tau_1) + \cos(\tau_1)\cos(\tau_2) - \sin(\tau_1)\sin(\tau_2)$$

Analytically, we found the original function f(τ1, τ2) of (6) using just the shape functions f(τ1, 0), f(τ1, π/2), f(0, τ2) and f(π/2, τ2). Obviously the transformation works for any choice of τ1^0, τ1^1, τ2^0 and τ2^1, except those for which f(τ1^0, τ2^0) = 0 or DPf(τ1^1, τ2^1) = 0.
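Reusing the hypothetical `reconstruct_2d` sketch given at the end of section 2, the analytic result above can be checked numerically: the depth-1 attempt reproduces equation (7) and misses the sine product, while the depth-2 reconstruction with references 0 and π/2 recovers equation (6).

```python
import math

# x-coordinate of the planar arm, equation (6)
f = lambda t1, t2: math.cos(t1) + math.cos(t1) * math.cos(t2) - math.sin(t1) * math.sin(t2)

# Depth-1 attempt (supposing f belongs to Gamma_0): this is equation (7)
g = reconstruct_2d(f, [0.0], [0.0], depth=1)

# Depth-2 reconstruction (f belongs to Gamma_1), references 0 and pi/2 as in the text
h = reconstruct_2d(f, [0.0, math.pi / 2], [0.0, math.pi / 2], depth=2)

t1, t2 = 0.7, 1.9
print(g(t1, t2) - f(t1, t2))    # non-zero residual, equal to sin(t1)*sin(t2)
print(h(t1, t2) - f(t1, t2))    # ~0: the Gamma_1 decomposition is exact
```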
4 The algorithm for model acquisition from partial information

The interest of having such a method is to acquire (to learn) a model of a system from which we can obtain data points, but not a precise and trustworthy structural description such as a functional form. The proposed method can acquire a model of a system by performing a functional decomposition. The DPf transformation is used recursively, and the depth of this recurrence, determined by the complexity of the surface, determines the number of known data points needed to estimate the output value at a new, unknown input point.

The proposed method was implemented as a recursive algorithm that can find the recurrence depth needed to represent a surface by testing its own predictions. The depth test is made by performing the model calculation twice for a supposed depth. Each calculation uses a different set of shape functions (taken at different surface locations), and the two calculations must give the same result for the supposed depth to be accepted (a rough numerical sketch of this test is given below, after figure 2). When the functional decomposition is done and the shape functions are acquired, the direct model is ready to be used.

The total number of shapes needed to reconstruct a hyper-surface depends on the number of input variables and on the depth of the recurrence. This number is given by m·2^(n−1), where m is the number of input variables and n the depth of the recurrence. One level must be added to the depth of the recurrence to perform the depth test.

To compute the values of the shapes, our implementation obtains a query series with points on the shape and interpolates these points using a backpropagation neural network. The implemented approach was used in simulated learning problems such as the direct and inverse kinematics presented in this paper. The accuracy of the acquired direct model depends strongly on the interpolation precision. Residual errors are propagated and amplified by the recurrent transformations. Hence, to get a good model, a careful approximation of the shape functions must be made. However, the use of the backpropagation algorithm on one-input problems gives reasonable convergence times, and the accuracy problem can be solved with some extra learning epochs. The shape-function learning tasks can also be run as independent processes, reducing the overall learning time.

To visualize the computational direct model obtained with the proposed method, without the recurrence, we present the method's inverted tree (figure 2). This tree has interpolators (Ni) and constants (C) at the leaves, simple operations (+ - * /) at the branches, and an addition at the root. The tree leads from the input values (the interpolator inputs) to one output value (the tree root). The tree of figure 2 was obtained in the decomposition of the direct kinematics example.

figure 2 - the method tree (leaves: interpolators N1-N4 and constants C; branches: + - * / operations)
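As a rough illustration of the depth test (one possible reading of the algorithm above, not the authors' code), a supposed depth can be accepted only when two reconstructions built from shape functions taken at different surface locations agree on test points. The helper `reconstruct_2d` is the hypothetical sketch from section 2, and the reference values here are arbitrary choices.

```python
import math, random

def estimate_depth(f, refs_a, refs_b, max_depth=4, tol=1e-9, trials=20):
    """Accept the smallest depth for which two reconstructions built from
    different reference locations (different shape-function sets) agree.
    The reference lists must provide at least max_depth values each."""
    for depth in range(1, max_depth + 1):
        model_a = reconstruct_2d(f, refs_a[0][:depth], refs_a[1][:depth], depth)
        model_b = reconstruct_2d(f, refs_b[0][:depth], refs_b[1][:depth], depth)
        tests = [(random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
                 for _ in range(trials)]
        if all(abs(model_a(t1, t2) - model_b(t1, t2)) < tol for t1, t2 in tests):
            return depth
    return None

# With the planar-arm function of section 3, depth 2 is detected
# (the reference points must avoid zeros of f and of DPf).
f = lambda t1, t2: math.cos(t1) + math.cos(t1 + t2)   # same as equation (6)
refs_a = ([0.0, math.pi / 2], [0.0, math.pi / 2])
refs_b = ([0.3, 1.0], [0.2, 1.1])
print(estimate_depth(f, refs_a, refs_b, max_depth=2))  # -> 2
```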
5 The generalized inverse model
An inverse model is an internal model that produces an action as a function of the current state and the desired sensation [11]. An output error is the difference between the current output state and the desired output state. We will use this output error to calculate input-variable differences, coding the action needed to move to an input configuration whose output values are the desired ones. Using the method's inverted trees we can calculate the function output values and also approximate numerically all the function's first-order partial derivatives (the jacobians). To calculate the jacobians we start by calculating the partial derivatives of each shape function. Equation (8) shows the approximation of a first-order derivative sf_i' of a shape function sf_i.
$$sf_i'(\theta_j) = \frac{\partial sf_i}{\partial \theta_j} = \frac{sf_i(\theta_j + \Delta\theta_j) - sf_i(\theta_j)}{\Delta\theta_j}, \qquad \frac{\partial sf_i}{\partial \theta_k} = 0 \;\text{for}\; k \ne j \qquad (8)$$
Running through the model tree we find the algebraic operations. Each input of an algebraic operation carries an input value and a list of partial-derivative values, from which the partial derivatives of the operation's output can be computed:
$$f(\theta_0,\dots,\theta_i) = g(\theta_0,\dots,\theta_i)\cdot h(\theta_0,\dots,\theta_i) \;\Rightarrow\; \frac{\partial f}{\partial \theta_j} = \frac{\partial g}{\partial \theta_j}\, h(\theta_0,\dots,\theta_i) + \frac{\partial h}{\partial \theta_j}\, g(\theta_0,\dots,\theta_i)$$

$$f(\theta_0,\dots,\theta_i) = g(\theta_0,\dots,\theta_i) + h(\theta_0,\dots,\theta_i) \;\Rightarrow\; \frac{\partial f}{\partial \theta_j} = \frac{\partial g}{\partial \theta_j} + \frac{\partial h}{\partial \theta_j}$$

$$f(\theta_0,\dots,\theta_i) = g(\theta_0,\dots,\theta_i) - h(\theta_0,\dots,\theta_i) \;\Rightarrow\; \frac{\partial f}{\partial \theta_j} = \frac{\partial g}{\partial \theta_j} - \frac{\partial h}{\partial \theta_j}$$

$$f(\theta_0,\dots,\theta_i) = g(\theta_0,\dots,\theta_i) / C \;\Rightarrow\; \frac{\partial f}{\partial \theta_j} = \frac{\partial g}{\partial \theta_j} / C$$

Note that only divisions by constants occur in the tree.
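The propagation rules above amount to a forward-mode differentiation of the tree. A minimal sketch, with hypothetical names (`Node`, `leaf`) not taken from the paper, could carry a value together with its partial derivatives through each node:

```python
class Node:
    """A tree-node result: a value plus its partial derivatives.
    `grad` maps the index j of an input variable theta_j to d(value)/d(theta_j)."""
    def __init__(self, value, grad=None):
        self.value = value
        self.grad = dict(grad or {})

    def __add__(self, other):
        keys = set(self.grad) | set(other.grad)
        return Node(self.value + other.value,
                    {j: self.grad.get(j, 0.0) + other.grad.get(j, 0.0) for j in keys})

    def __sub__(self, other):
        keys = set(self.grad) | set(other.grad)
        return Node(self.value - other.value,
                    {j: self.grad.get(j, 0.0) - other.grad.get(j, 0.0) for j in keys})

    def __mul__(self, other):
        # product rule: df/dtheta_j = (dg/dtheta_j)*h + (dh/dtheta_j)*g
        keys = set(self.grad) | set(other.grad)
        return Node(self.value * other.value,
                    {j: self.grad.get(j, 0.0) * other.value +
                        other.grad.get(j, 0.0) * self.value for j in keys})

    def div_const(self, c):
        # only divisions by constants occur in the tree
        return Node(self.value / c, {j: d / c for j, d in self.grad.items()})

def leaf(sf, theta_j, j, d=1e-5):
    """Leaf node for an interpolator sf_i(theta_j): the derivative with respect
    to theta_j is the finite difference of equation (8); all others are zero."""
    return Node(sf(theta_j), {j: (sf(theta_j + d) - sf(theta_j)) / d})
```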
The partial derivatives are propagated up to the tree root. The same process is repeated for each tree used to calculate an output value of the model. The differences in the system can then be described by the set of differential equations (9):

$$x_i = f_i(\theta_0,\dots,\theta_j), \qquad \Delta x_i = \sum_j \frac{\partial f_i}{\partial \theta_j}\, \Delta\theta_j \qquad (9)$$
To calculate the differences Δθj we must solve the equation system defined by (9). These differences are a first-order approximation of the real desired action. Using this approximate action the system will converge to the desired position. If the new position is not as close to the goal as desired, the inverse calculation can be repeated.
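A minimal sketch of this first-order inversion loop is given below (hypothetical helpers, not the authors' implementation). For brevity the Jacobian is obtained here by finite differences on the direct model, whereas the paper propagates the shape-function derivatives through the model trees instead.

```python
import numpy as np

def inverse_step(forward, theta, x_desired, eps=1e-4):
    """One differential step: solve the linear system (9) for the parameter change."""
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(forward(theta), dtype=float)
    J = np.empty((x.size, theta.size))
    for j in range(theta.size):                       # numerical Jacobian, column by column
        t = theta.copy()
        t[j] += eps
        J[:, j] = (np.asarray(forward(t), dtype=float) - x) / eps
    dx = np.asarray(x_desired, dtype=float) - x       # output error
    dtheta, *_ = np.linalg.lstsq(J, dx, rcond=None)   # first-order action
    return theta + dtheta

def invert(forward, theta0, x_desired, iters=20, tol=1e-6):
    """Repeat the first-order step until the output error is small enough."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        error = np.asarray(forward(theta), dtype=float) - np.asarray(x_desired, dtype=float)
        if np.linalg.norm(error) < tol:
            break
        theta = inverse_step(forward, theta, x_desired)
    return theta
```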
6 The inverse kinematics problem
The complete approach was applied to the kinematics problem. A simple case like the planar arm (figure 1) needs four shape functions and two trees containing 25 nodes each. The model acquisition is then made by four complete rotational arm movements, each in one degree of freedom. Each rotational movement is segmented to read the end-effector position along the trajectory.

Manipulators having six joints can present different configurations, and in each configuration the function decomposition is done in a different way, with different recurrence depths. We applied the method to a simulated six-joint spherical manipulator with a spherical wrist. In this configuration we have two simpler kinematics problems: the position kinematics problem (to calculate the wrist center) and the orientation kinematics problem (to find the orientation of the wrist). The model computation for this manipulator uses 10 shape functions. The biggest tree in the model contains 111 nodes.

It is difficult to give quantitative evaluations of performance or errors when we use simulations. The function decomposition does not add noise or errors to the computed values; the method only amplifies the errors contained in the interpolated input data. Hence, by choosing the number of data points used to define each shape function, and by using a good, well-tuned interpolator, the method's calculation errors can be made as small as we want. Using a reasonable number of error-free simulated data points it is even possible to use linear interpolation in place of non-linear approximators while maintaining small output errors. The inversion convergence depends on the partial derivatives, and coarse-fitting interpolations do not necessarily mean wrong partial derivatives. It is possible to maintain good inversion convergence even with coarse interpolations.

We are currently more interested in exploring the functional decomposition and inversion, but we know that a complete validation of the method must be done on real systems, with complete and precise quantitative evaluations, in future work. The simulated cases showed that the method can decompose complex functions like the arm kinematics. In these cases we also verified the convergence of the inversion. Sometimes, when the goal is too far from the starting position, the inversion computes action values outside the action ranges. This problem was already reported in [11] and seems intrinsic to differential techniques. It can be avoided by establishing intermediate subgoals between the starting point and the goal position.
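The subgoal strategy mentioned above can be sketched on top of the hypothetical `invert` helper from section 5: the output-space segment from the current position to the goal is split into intermediate targets that are inverted one after the other.

```python
import numpy as np

def invert_with_subgoals(forward, theta0, x_goal, n_sub=5, **kwargs):
    """Drive the inversion through intermediate subgoals so that each
    first-order step stays within a reasonable action range."""
    theta = np.asarray(theta0, dtype=float)
    x_start = np.asarray(forward(theta), dtype=float)
    x_goal = np.asarray(x_goal, dtype=float)
    for k in range(1, n_sub + 1):
        subgoal = x_start + (k / n_sub) * (x_goal - x_start)
        theta = invert(forward, theta, subgoal, **kwargs)
    return theta
```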
7 Conclusion

We presented a computational approach to acquire and use direct and generalized inverse models with minimal data acquisition and memorisation effort. The direct model acquisition method generalises information obtained from partial queries and decomposes a hyper-space function into one-input functions. The method is quite general, and it is a powerful way to reduce the number of examples needed to interpolate a function, enabling the use of approximators, like neural networks, in real applications where data points are costly to obtain. The generalized inverse model is obtained with no additional learning effort.

In robotics, the proposed approach can be particularly interesting, since this kind of system can easily be driven to obtain the desired data. Moreover, the acquisition of large data bases in robotic systems can be a strong constraint on the application of learning methods. Problems determined by functional forms belonging to the special form handled by the learning method can easily be found in robotics. The method was applied only to simulated cases and must be tested on real systems in future work to be completely validated.
References

[1] S. Ahmad and S. Omohundro, A network for extracting the locations of point clusters using selective attention, Tech. Rep. 90-011, Int. Computer Science Institute, University of California, Berkeley.
[2] Walter L. Baker and Jay A. Farrel, An introduction to Connectionist Learning Control Systems, in Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Eds. David A. White and Donald A. Sofge, 1992.
[3] A. G. Barto, Connectionist Learning for Control: An Overview, Tech. Rep. 89-89, University of Massachusetts, Amherst, Mass., 1989.
[4] E. B. Baum, Neural net algorithms that learn in polynomial time from examples and queries, IEEE Trans. Neural Networks, vol. 2, pp. 5-19, 1991.
[5] G. Box and N. Draper, Empirical Model-Building and Response Surfaces, New York: Wiley, 1987.
[6] W.S. Cleveland, S.J. Devlin and E. Grosse, Regression by local fitting: methods, properties and computational algorithms, J. Econometrics, 37, pp. 87-114, 1988.
[7] D. Cohn, A local approach to optimal queries, in Connectionist Models Summer School (CMSS-90), Proc. 1990 Summer School in San Diego, D.S. Touretzky et al., Eds. San Mateo, CA: Morgan Kaufmann.
[8] J.J. Faraway, Sequential design for the nonparametric regression of curves and surfaces, Tech. Rep. 177, Department of Statistics, The University of Michigan, Ann Arbor, 1990.
[9] J.N. Hwang, J.J. Choi, S. Oh, and R.J. Marks II, Query learning based on boundary search and gradient computation of trained multilayer perceptrons, in Proc. IJCNN 1990, San Diego, The Int. Joint Conference on Neural Networks, 1990.
[10] J.N. Hwang, J.J. Choi, S. Oh, and R.J. Marks II, Query based learning applied to partially trained multilayer perceptrons, IEEE Trans. Neural Networks, vol. 2, pp. 131-136, Jan. 1991.
[11] M.I. Jordan and D.E. Rumelhart, Forward models: Supervised learning with a distal teacher, Occasional Paper 40, Cambridge, MA: MIT, Center for Cognitive Sciences, 1991.
[12] Lisa Meeden, Gary McGraw and Douglas Blank, Emergent Control in an Autonomous Vehicle, Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, 1993.
[13] Mark Plutowski and Halbert White, Selecting Concise Training Sets from Clean Data, IEEE Transactions on Neural Networks, vol. 4, no. 2, March 1993.
[14] Tomaso Poggio and Federico Girosi, A Theory of Networks for Approximation and Learning, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, A.I. Memo No. 1140, July 1989.
[15] S. Yakowitz and E. Lugosi, Random search in the presence of noise, with application to machine learning, SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 4, pp. 702-712, 1990.