Learning to Climb: An Application to Binary Classification in High Dimensions
Stelian Coros
Department of Computer Science, University of British Columbia
CPSC 540 Course Project
[email protected]
Abstract
In this paper I describe the implementation of a system used to autonomously drive bipeds in a variable environment that consists of stairs of different heights placed at variable intervals. Out of an existing pool of controllers, the one that is best suited to deal with the current setting should become active. Gaussian Processes Classification (GPC) is used to learn the regions in the biped’s state space within which each controller is expected to perform correctly. I investigated the performance of Expectation Propagation and Laplace’s approximation for GPCs on this learning problem and found that both methods have about the same predictive power.
1 Introduction
In the fields of computer graphics, robotics and biomechanics, a sought-after goal is the creation of humanoid characters that are able to autonomously produce realistic motion when placed in realistic environments. Despite a significant amount of research in this area, the problem of bipedal motion planning that produces the efficiency, robustness and flexibility of human motion over variable terrain (e.g. stepping over obstacles or climbing stairs) remains unsolved [1]. This is a very difficult problem because bipeds are inherently unstable, high-dimensional dynamic systems, and designing control strategies for biped locomotion is non-trivial [2]. Additionally, autonomous characters should exhibit fairly sophisticated behaviour, and they are expected to interact naturally with their environment [2]. An intuitive way to attack this problem is a divide-and-conquer approach, whereby simple specialized controllers are composed into control systems with broader functionality [3].
In this paper I discuss my implementation of a system that guides a bipedal character to autonomously navigate uncertain terrain composed of stairs of variable heights that occur at variable intervals. As in [3], I work with a set of relatively simple controllers (based on the work in [2]). A simple supervisor controller is responsible for choosing a controller that is well suited to handling the current state of the biped, where I extend the biped’s state to also include a parametrization of the terrain it finds itself in. To make this possible, one needs to determine the regions of the character’s (extended) state space within which an individual controller is expected to function properly. This requirement maps naturally onto the binary classification problem for high-dimensional data.
A popular way of solving the classification problem is to use Gaussian Processes Classification (GPC) [4-7]. GPCs are Bayesian kernel classifiers derived from Gaussian processes for regression [6], and they are used to obtain the probability of an input point belonging to a certain class. In general, exact inference in GPC models is intractable [6, 7], and many approximate solutions have been developed. In this paper I investigate the performance of the Laplace approximation [7] and the Expectation Propagation algorithm [5, 6] for Gaussian Processes Classification as applied to the problem of an autonomous stair-climbing biped.
This paper proceeds as follows. Section 2 introduces the background information that is necessary for understanding the implementation of the system. Section 3 briefly reviews Gaussian Processes Classification. Section 4 gives some insight into the process of generating the training data. Section 5 describes the results and Section 6 presents conclusions and suggestions for future work.
2 Background Information
In this section I briefly summarize the control strategy introduced in [2], give details regarding the biped used in this paper and present the control composition framework used in my implementation.
2.1 Simbicon: A framework for biped locomotion
The work in [2] presents a simple yet promising finite state machine (FSM) based control strategy that can be used for biped locomotion. Each state in the FSM describes a desired body pose that is characterized by target angles for each joint. The joint angles in a character are driven towards their targets using PD control. Transitions between states in the FSM occur either after fixed amounts of time, or after foot contacts are established. One of the contributions presented in [2] is the introduction of a balance strategy: the target angles in the FSM are computed using a feedback law of the form $\theta_d = \theta_{d0} + c_d d + c_v v$, where $\theta_d$ is the dynamically computed target angle, $\theta_{d0}$ is the default angle in the FSM, $d$ is the horizontal distance between the stance foot and the center of mass (COM) and $v$ is the velocity of the COM, as shown in Fig 1. The feedback gain parameters $c_d$ and $c_v$ are instrumental in providing balance [2].
For my implementation, I used the code for the locomotion system developed in [2]. The 9 degree-of-freedom biped shown in Fig 1 is used throughout this paper. For details regarding this biped (mass, height, etc.) please see [2]. Three controllers are used for this system. The first represents a walking gait of approximately 1.5 m/s, the second is a slower walking gait of roughly 1 m/s and the third represents a high stepping gait that is able to climb stairs. All three controllers are represented by FSMs with four states (see Fig 2), two of which correspond to foot placements.
2.2 Controller Composition Framework
One advantage of using FSMs in [2] is that switching between controllers is simple: a transition is made between two states of the FSMs of different controllers [2]. In other words, one can consider one large FSM that is composed of all the FSMs of the individual controllers in the system, as exemplified in Fig 2. In addition to the state transitions already available, there are also transitions between the states of different controllers. By choosing one of these additional transitions, the supervisor controller can activate different individual controllers. Of course, not all the possible combinations of such transitions lead to successful motions (i.e. the biped can become unstable and fall over). In this work I use the idea presented in [3]: for each state in each individual controller’s FSM that accepts transitions from other controllers, the regions in the biped’s state space (more information on the biped’s state space is given in Section 4) that lead to a normal gait are learned. It often happens that multiple controllers can handle the biped’s current state, in which case ties are broken in a very simple way: the walking controller has priority over the other two, and the slow walk controller has priority over the high stepping controller. This simple choice is justified by the fact that the biped is expected to walk naturally when no stairs are present, and only invoke the other two controllers when a normal walk would be unsuitable (for instance when the biped is close to a stair).
Figure 1: Meet Bip7, a 9 DOF planar biped. The distance d between the stance foot and the COM and v, the velocity of the COM, make up the sensory input for the feedback control introduced in [2]. The terrain in the near vicinity of the biped is parameterized by δ, the distance to the next stair, and h, the stair’s height.
Figure 2: Abstract controller representations. The even numbered states represent the foot placement states (i.e. in the Walk controller, state 0 becomes active when the right foot touches the ground). The gray lines represent possible transitions between the FSMs of different controllers (only shown here for the first two) that can be used by the supervisor controller.
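The two mechanisms described in this section, the balance feedback law of Section 2.1 and the priority-based tie-breaking of Section 2.2, can be sketched as follows. This is a minimal illustration rather than the code from [2]: the names `BalanceGains`, `target_angle`, `pd_torque` and `choose_controller`, the gains `kp` and `kd`, and the zero target velocity in the PD term are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BalanceGains:
    c_d: float  # gain on d, the horizontal stance-foot-to-COM distance
    c_v: float  # gain on v, the velocity of the COM


def target_angle(theta_d0, d, v, gains):
    """Balance feedback law: theta_d = theta_d0 + c_d * d + c_v * v."""
    return theta_d0 + gains.c_d * d + gains.c_v * v


def pd_torque(theta, theta_dot, theta_target, kp, kd):
    """PD control towards a target angle (target velocity assumed to be zero)."""
    return kp * (theta_target - theta) - kd * theta_dot


# Priority order stated in the text: walk, then slow walk, then high step.
PRIORITY = ("walk", "slow_walk", "high_step")


def choose_controller(predicted_ok):
    """Return the highest-priority controller predicted to handle the state.

    predicted_ok maps a controller name to the classifier's yes/no prediction
    for the current extended state. Returns None if no controller qualifies.
    """
    for name in PRIORITY:
        if predicted_ok.get(name, False):
            return name
    return None
```

For example, `choose_controller({"walk": False, "slow_walk": True, "high_step": True})` picks the slow walk, since it outranks the high stepping gait whenever both are predicted to succeed.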
3 Gaussian Processes Classification
Assuming that we have a data set $D$ made up of multi-dimensional data points $x_i$, each with a label $y_i$ that is either −1 or 1, the binary classification problem is defined as finding the correct label $\tilde{y}$ for a new input point $\tilde{x}$. Gaussian Process Classifiers are a useful tool for this problem and they are derived from Gaussian Processes for regression [6]. Gaussian processes are non-parametric models that place a multivariate Gaussian distribution over the latent function values at the input points. This distribution is described by a mean function, usually zero, and by the covariance between pairs of input points, which is a decreasing function of their distance [6]. The covariance function is characterized by a small number of hyperparameters. GPCs can be viewed as graphical models which have random variables for inputs, latent functions and class labels [6], as shown in Fig 3.
As described in [6], the class probability for the new input point is obtained by integrating over the hyperparameters:
$$p(\tilde{y} \mid \tilde{x}, D) = \int p(\tilde{y} \mid \tilde{x}, D, \Theta)\, p(\Theta \mid D)\, d\Theta,$$
where
$$p(\tilde{y} \mid \tilde{x}, D, \Theta) = \int p(\tilde{y} \mid \tilde{f}, \Theta)\, p(\tilde{f} \mid \Theta, D, \tilde{x})\, d\tilde{f}.$$
Finally, $p(\tilde{f} \mid \Theta, D, \tilde{x})$ is obtained by further integrating over $\mathbf{f}$, the vector of latent functions:
$$p(\tilde{f} \mid \Theta, D, \tilde{x}) = \int p(\mathbf{f}, \tilde{f} \mid \Theta, D, \tilde{x})\, d\mathbf{f}.$$
In general, evaluating these integrals can be costly [6], so many approximation algorithms have been developed [4-7].
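To make the integral over the latent value at the new point concrete, the sketch below evaluates it numerically in one special case. Two assumptions are made here that the text does not state: the likelihood is taken to be the probit function, so that the probability of the label +1 given the latent value is the standard normal CDF of that value, and the distribution of the latent value is taken to be a Gaussian with a given mean and variance (as an approximate inference method would supply). Under these assumptions the integral has a known closed form, which the quadrature should reproduce. All function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, used here as an assumed probit likelihood."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(f, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (f - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive_prob_quadrature(mean, var, n=4001, width=10.0):
    """Trapezoidal estimate of the integral of Phi(f) * N(f | mean, var) df."""
    sd = math.sqrt(var)
    lo = mean - width * sd
    hi = mean + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        f = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * std_normal_cdf(f) * normal_pdf(f, mean, var)
    return total * h


def predictive_prob_closed_form(mean, var):
    """Closed form of the same integral: Phi(mean / sqrt(1 + var))."""
    return std_normal_cdf(mean / math.sqrt(1.0 + var))
```

The closed form shows how uncertainty about the latent value pulls the predicted class probability towards 0.5: for a fixed mean, a larger variance shrinks the argument of the CDF.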
Figure 3: Graphical model for GPCs [6], with n training points. $x_i$ and $y_i$ are observed, $\tilde{x}$ is given and $\tilde{y}$ is to be predicted. $f_i$ and $\tilde{f}$ are latent functions and are jointly Gaussian. $\Theta$ represents the hyperparameters of the covariance function.
3.1 Expectation Propagation for GPC
The Expectation Propagation (EP) algorithm [4-6] presents a way of conducting approximate Bayesian inference. In a typical Bayesian inference problem, the posterior over some parameter $\theta$ is given by the product of the prior and the likelihood. For i.i.d. data, this becomes $p(\theta \mid y_1, \dots, y_n) \propto p(\theta) \prod_i p(y_i \mid \theta)$. This can be approximated by $q(\theta) \propto \tilde{t}_0(\theta) \prod_i \tilde{t}_i(\theta)$, where each term is assumed to be an exponential. EP works by successively choosing values for $\tilde{t}_i(\theta)$ that minimize the Kullback-Leibler divergence:
$$\tilde{t}_i^{\,new} = \arg\min_{\tilde{t}_i} \mathrm{KL}\left( \frac{q(\theta)}{\tilde{t}_i^{\,old}}\, p(y_i \mid \theta) \,\Big\|\, \frac{q(\theta)}{\tilde{t}_i^{\,old}}\, \tilde{t}_i(\theta) \right)$$
When solving the EP problem for GPC, the vector of latent functions $\mathbf{f}$ plays the role of the parameter $\theta$. EP attempts to approximate the posterior $p(\mathbf{f} \mid D) = p(\mathbf{f})\, p(D \mid \mathbf{f}) / p(D)$ as a Gaussian $q(\mathbf{f}) \sim N(m_f, V_f)$. The exact details of the EP algorithm are omitted here due to space constraints, but the reader is referred to [5, 6] for more details. The posterior over the latent functions can be expressed as $q(\mathbf{f}) \sim N(C\mathbf{a}, A)$. Using this information, classification of a new data point $\tilde{x}$, given the value of the hyperparameters, can be done according to $\mathrm{sgn}\left(\sum_i a_i y_i\, c(x_i, \tilde{x})\right)$, where $c$ represents the covariance function.
3.2 Laplace’s Approximation
Laplace’s approximation (LA) [7] is an analytical approach that allows one to approximate $p(\tilde{y} \mid \tilde{x}, D, \Theta)$ by a Gaussian distribution that is centered at the maximum of a function with respect to $\tilde{y}$ and $y$, and that has an inverse covariance matrix given by $-\nabla\nabla \log p(\tilde{y} \mid \tilde{x}, D, \Theta)$. Newton-Raphson iterations over $y$ can be used to find the maximum of the function, and then the approximate distribution of $\tilde{y}$ can be computed [7].
For this project I used Rasmussen’s implementation of the EP and Laplace’s approximation algorithms for GPCs, which is available online [4]. In this implementation, the hyperparameters of the covariance function (for both Laplace’s approximation and EP) are found using an iterative optimization method that uses Polak-Ribière conjugate gradients to compute the search direction, combined with a line search [4].
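For reference, the Newton-Raphson step used to locate the mode can be written out in the latent-function notation of Section 3. This is the standard textbook form of the Laplace approximation for GP classification as presented in [4], and the symbols may differ from those used in [7]. The symbols $K$ (the covariance matrix of the training inputs), $W$ and $\Psi$ are introduced here for the sketch, and $\mathbf{y}$ denotes the vector of training labels.

```latex
\begin{align}
\Psi(\mathbf{f}) &= \log p(\mathbf{y} \mid \mathbf{f})
  - \tfrac{1}{2}\,\mathbf{f}^{\top} K^{-1} \mathbf{f}
  - \tfrac{1}{2}\log |K| - \tfrac{n}{2}\log 2\pi \\
\nabla \Psi(\mathbf{f}) &= \nabla \log p(\mathbf{y} \mid \mathbf{f}) - K^{-1}\mathbf{f} \\
\nabla\nabla \Psi(\mathbf{f}) &= -W - K^{-1},
  \qquad W = -\nabla\nabla \log p(\mathbf{y} \mid \mathbf{f}) \\
\mathbf{f}^{new} &= \mathbf{f} - (\nabla\nabla \Psi)^{-1} \nabla \Psi
  = (K^{-1} + W)^{-1}\left( W\mathbf{f} + \nabla \log p(\mathbf{y} \mid \mathbf{f}) \right) \\
q(\mathbf{f}) &= N\!\left( \hat{\mathbf{f}},\; (K^{-1} + W)^{-1} \right)
\end{align}
```

Here $\hat{\mathbf{f}}$ is the value at which the iteration converges. Because the likelihood factorizes over data points, $W$ is diagonal, which keeps each Newton step cheap relative to the matrix inversion involving $K$.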
4 Training Data
The biped used throughout this project has 9 degrees of freedom (DOF): 7 that characterize the orientation of each link, and 2, x and y, that represent the coordinates of the COM. The state of the biped at any time is characterized by the quantities that describe each DOF as well as their rates of change (i.e. velocities). For the purpose of generating the training data, the x translational coordinate is irrelevant (because only the distance to the step is needed), so it was ignored. Also included in the state is a parametrization of the terrain in the vicinity of the biped. As shown in Fig 1, the vicinity of the biped is characterized by the distance δ to the next step and its height h. With this in mind, the extended state of the biped lives in $\mathbb{R}^{19}$.
In order to generate the training data, the following process was used. First, a plausible set of 40 biped configurations was sampled from the normal execution of each of the individual controllers. Also included in this set were stances obtained by manually switching between controllers, as the biped often needs to take a few steps before regaining a stable gait. For each of these biped stances, I then varied the distance to the step (20 values between 10 cm and 2.5 m) and also its height (10 values between 3 cm and 30 cm). This gave rise to 8000 extended biped states (also referred to as test points). In order to classify each test point as successful or not, I ran a simulation where the initial state of the biped was dictated by the test point. If at any time during the simulation the biped came close to a stable gait (I used the L2 norm to compute the distance to a set of states that occurred during normal execution), the test point was deemed successful. If the biped fell (parts other than the feet collided with the ground), or more than 5 s of simulation time passed, the test point was deemed unsuccessful.
The training data was generated for each FSM state that corresponds to the right foot touch down. To generate predictions for the states corresponding to the left foot touch down, I made use of the fact that all the gaits used in the project are symmetric, and used the prediction obtained on the reversed stance.
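The enumeration of extended states described above can be sketched as follows. Several details are assumptions made for illustration: the 20 distances and 10 heights are taken to be evenly spaced (the text only gives their ranges), each stance is represented by a placeholder vector of 17 numbers (8 DOF values once x is dropped, plus 9 velocities, which is the reading consistent with a 19-dimensional extended state), and the names `linspace`, `extended_states` and `label_outcome` are hypothetical.

```python
import itertools


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def extended_states(stances, distances, heights):
    """Append every (delta, h) terrain pair to every stance vector."""
    return [
        list(stance) + [delta, h]
        for stance, delta, h in itertools.product(stances, distances, heights)
    ]


def label_outcome(reached_stable_gait, fell, elapsed_s, time_limit_s=5.0):
    """Label a simulated test point: +1 for success, -1 for failure.

    A fall or exceeding the time limit counts as a failure, as in the text.
    """
    if fell or elapsed_s > time_limit_s:
        return -1
    return 1 if reached_stable_gait else -1


# Placeholder stances; in the project these were sampled from simulation.
stances = [[0.0] * 17 for _ in range(40)]
distances = linspace(0.10, 2.5, 20)   # metres, assumed evenly spaced
heights = linspace(0.03, 0.30, 10)    # metres, assumed evenly spaced
states = extended_states(stances, distances, heights)
```

With 40 stances, 20 distances and 10 heights, `states` holds 40 × 20 × 10 = 8000 vectors of length 19, matching the counts given in the text.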
5 Results
In order to generate the results, 1000 training points were randomly selected out of the set of 8000 extended states of the biped. The rest were used as test data. Gaussian Processes Classification using both Laplace’s approximation and EP was carried out. A squared exponential covariance function with isotropic distance measure [4] was used. The values of its hyperparameters were computed as briefly outlined in Section 3. Table 1 summarizes the fraction of correct predictions. As can be seen, the quality of the predictions obtained by using EP is only slightly better than when using Laplace’s approximation. However, the training times for EP were 3-4 times higher (about 35 minutes for EP and 9 minutes for LA; these times include the optimization over the hyperparameters of the covariance function).
I imported the training results from Matlab and used them in my C++ implementation of the system in order to make predictions. With this information, the biped was able to climb over stairs as planned. While the system is not very robust (stairs need to be fairly far apart and the biped still trips sometimes), it indicates that such an approach can be used to autonomously drive a bipedal character over uncertain terrain. A video showcasing this result can be found at http://www.cs.ubc.ca/~scoros/cs540.
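The text does not detail how predictions are made from the imported training results. One plausible sketch, assuming the decision rule from Section 3.1 and the isotropic squared exponential covariance, is given below. The names `sq_exp_isotropic`, `predict_label` and `accuracy`, the parameters `length_scale` and `signal_var`, and the idea that the coefficients are exported as a plain list are all assumptions made for this example.

```python
import math


def sq_exp_isotropic(x1, x2, length_scale, signal_var):
    """Isotropic squared exponential covariance between two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def predict_label(x_new, train_x, train_y, coeffs, length_scale, signal_var):
    """Decision rule sgn(sum_i a_i * y_i * c(x_i, x_new)) from Section 3.1."""
    score = sum(
        a * y * sq_exp_isotropic(x, x_new, length_scale, signal_var)
        for a, y, x in zip(coeffs, train_y, train_x)
    )
    return 1 if score >= 0 else -1


def accuracy(predicted, actual):
    """Fraction of labels predicted correctly, as reported in Table 1."""
    correct = sum(1 for p, t in zip(predicted, actual) if p == t)
    return correct / len(actual)
```

Since the rule only needs the training inputs, their labels, the coefficients and two hyperparameters, it is cheap to evaluate at run time once training has been done off-line.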
Table 1: GPC results. Successful prediction rates for the test and training data.
Controller | EP - training | EP - test | LA - training | LA - test
Walk | 0.9640 | 0.9353 | 0.9560 | 0.9321
Slow walk | 1 | 0.9337 | 0.9820 | 0.9277
High Step | 0.826 | 0.7756 | 0.8230 | 0.7731
6 Conclusions and future work
In this paper I presented a framework that can be used to autonomously drive a bipedal character over variable terrain consisting of stairs of different heights placed at variable intervals. Gaussian Processes Classification was used to learn, for each controller, the regions in the biped’s state space that lead to a normal gait. Based on this information, the controller best suited to the current state of the biped and its environment was activated. Two approximations to GPCs were tested for this problem: Expectation Propagation and Laplace’s approximation. I found that the two approximations produce very similar results, with EP producing slightly better results. The training time for LA, however, was found to be 3-4 times lower than the training time for EP. This is not a large concern for this application, since the training was only performed once, off-line.
The list of suggestions for future work is a long one. Increasing the number of controllers in the system should be simple and I expect it would lead to better results. Also, as long as a parametrization exists for other kinds of terrain (for instance terrain with variable slopes, obstacles, etc.), a strategy similar to the one presented here can be applied. Other methods for binary classification can be used as well. For instance, [8] presents a method suitable for non-linear regression for high-dimensional data using an approximate nearest neighbour algorithm. This algorithm is computationally inexpensive, can handle irrelevant or redundant inputs well, and can be used efficiently for online incremental learning [8]. Last but not least, the approach described here looks only one step ahead each time. It would be interesting to reformulate this problem as a reinforcement learning problem and compare the results.
References
[1] Ramamoorthy, S. & Kuipers, B.J. (2007) Qualitative hybrid control of dynamic bipedal walking. Robotics: Science and Systems II, MIT Press.
[2] Yin, K., Loken, K. & van de Panne, M. (2007) SIMBICON: Simple Biped Locomotion Control. To appear in Proceedings of ACM SIGGRAPH.
[3] Faloutsos, P., van de Panne, M. & Terzopoulos, D. (2001) Composable Controllers for Physics-based Character Animation. Proceedings of ACM SIGGRAPH.
[4] Rasmussen, C.E. & Williams, C. (2006) Gaussian Processes for Machine Learning. MIT Press. Software available at http://www.gaussianprocess.org/gpml/.
[5] Minka, T.P. (2001) A family of algorithms for approximate Bayesian inference. Ph.D. thesis, Massachusetts Institute of Technology.
[6] Kim, H. & Ghahramani, Z. (2003) The EM-EP algorithm for Gaussian Process Classification. In Proceedings of the Workshop on Probabilistic Graphical Models for Classification.
[7] Williams, C.K.I. & Barber, D. (1998) Bayesian classification with Gaussian Processes. IEEE Transactions on PAMI 20.
[8] Vijayakumar, S., D’Souza, A. & Schaal, S. (2006) Approximate nearest neighbor regression in very high dimensions. Nearest Neighbor Methods in Learning and Vision, MIT Press.