Sequential View Grasp Detection For Inexpensive Robotic Arms


arXiv:1609.05247v1 [cs.RO] 16 Sep 2016

Marcus Gualtieri and Robert Platt∗

Abstract— In this paper, we consider the idea of improving the performance of grasp detection by viewing an object to be grasped from a series of different perspectives. Grasp detection is an approach to perception for grasping whereby robotic grasp configurations are detected directly from point cloud or RGB sensor data. This paper focuses on the situation where the camera or depth sensor is mounted near the robotic hand. In this context, there are at least two ways in which viewpoint can affect grasp performance. First, a “good” viewpoint might enable the robot to detect more/better grasps because it has a better view of graspable parts of an object. Second, by detecting grasps from arm configurations near the final grasp configuration, it might be possible to reduce the impact of kinematic modelling errors on the last stage of grasp synthesis just prior to contact. Both of these effects are particularly relevant to inexpensive robotic arms. We evaluate them experimentally both in simulation and on-line with a robot. We find that both of the effects mentioned above exist, but that the second one (reducing kinematic modelling errors) seems to have the most impact in practice.

Fig. 1. (a) Illustration of grasp detection. (b) Depth sensor mounted near a robotic hand.

I. INTRODUCTION

Recently, grasp detection has been demonstrated to be a practical approach to perception for robotic grasping [6], [11], [7]. In grasp detection, machine learning methods are used to identify places in an image or point cloud where a grasp would be feasible. For example, Figure 1(a) illustrates grasps detected by our algorithm given a point cloud of a shampoo bottle as input. The yellow grippers denote robotic hand configurations from which a good grasp of the object is expected to be feasible. Since grasp configurations are detected without first estimating the object pose, grasp detection typically works well for grasping novel objects when precise information about object geometry is not known in advance.

Most current approaches to grasp detection operate based on an image or point cloud obtained by viewing the scene from a fixed viewpoint. For example, Levine's work at Google uses input from a single RGB camera [14]. Some of our own prior work creates a point cloud by registering together information from a pair of RGBD cameras [18]. In this paper, we consider the idea of sequential view grasp detection, where a camera or depth sensor is mounted near the robotic hand (Figure 1(b)) and views the target object from a series of perspectives until finally attempting a grasp. There are a couple of reasons why multiple views could be better than one in the context of grasp detection. First, it might be possible to improve grasp detection performance by planning viewpoints relative to a specific object/grasp of interest. Second, it might be possible to reduce the effect of manipulator kinematic model errors on grasp success by viewing the object from an arm configuration close to the final grasp configuration. These questions are particularly important in the context of grasping with inexpensive robotic arms such as the Baxter arm, where it is impossible to accurately measure end effector pose using forward kinematics calculations. Moreover, whereas it is easy to register multiple views together into a single point cloud (or truncated signed distance function, TSDF) given accurate kinematic estimates, this becomes very difficult if one must rely on SLAM techniques, because ICP-like methods do not work well in empty/uncluttered near-field scenes.

This paper explores the possibilities described above by performing experiments off-line using the BigBird object dataset [17] and on-line using our Baxter robot. Although all experiments were performed using variations of our own grasp detection algorithm, we expect the results to generalize to other algorithms similar to ours, such as [11], [7], [4], [5]. We have several interesting findings. First, we find that viewpoint does indeed affect grasp detection performance, although the nature of the effect can depend on object type. Second, we find that there is little benefit to viewing the object from many additional viewpoints once a “good” viewpoint is obtained, unless the different-viewpoint data is registered together. Third, we find that there is a significant advantage to viewing the object at close range from a kinematic configuration close to the expected grasp configuration. Doing so seems to make the overall algorithm less sensitive to manipulator kinematic model errors and can improve grasp success rates by as much as 9%.

∗ College of Computer and Information Science, Northeastern University, Boston, Massachusetts, USA. [email protected]

II. BACKGROUND

A. Viewpoint selection

Optimal viewpoint selection falls into a large body of research called active sensing. We mention only a few works in this area in order to show where the present work fits in. Viewpoints can either be considered independently, as in [20], or sequentially along a trajectory, as in [13], [21]. In this paper we examine the effects of both individual and multiple viewpoints on grasp detection performance. In this work we do not attempt to reconstruct a scene from multiple views, although this can sometimes be a powerful approach. We and others have explored this with significant benefit to grasp detection performance [9], [6], but the primary difficulty is that ICP-based SLAM algorithms do not register multiple views well in near-field, uncluttered scenes.

Many prior works see viewpoint selection as a problem of maximizing the amount of information gained from a prospective viewpoint. This can be expressed in terms of entropy [20], KL divergence [19], or Fisher information [13]. The main decision in these various applications is the probability distribution used for computing the information obtained from the sensing action. A key challenge addressed by these approaches is the ability to correctly predict what probability distribution will be seen from a yet unobserved position given the current information about the scene. The advantage of our approach is that it is very easy to generate a distribution of view quality in the framework of grasp detection, and no additional training/prediction is needed beyond what is provided by existing grasp detection algorithms. Also, this provides a very natural way to incorporate any prior information that may be known about the type of task being completed, since this distribution is somewhat object-specific, as we show later.

For overcoming kinematic errors when the arm is near the end of the task, visual servoing is frequently used. Generally this requires a specific feature of an object or a fiducial on the object [1], which is difficult to obtain in open-world environments. Levine et al. overcome this problem using end-to-end deep learning [14]. The present work attempts to limit these sorts of errors by viewing the scene as close as possible to the planned grasp configuration.

B. Grasp detection

Traditional approaches to perception for robotic grasping are object-based [2]. First, they plan grasps with respect to a known mesh model of an object of interest. Then, they estimate the 6-DOF pose of the object and project the grasp configurations planned relative to the object mesh into the base frame of the robot. While these methods can work well in structured environments where object geometries are known ahead of time, they are less well suited for unstructured real world environments. In contrast, grasp detection methods treat perception for grasping like a typical computer vision problem [16], [12], [15], [5], [3], [7], [10], [18], [6].

Instead of attempting to estimate object pose, grasp detection estimates grasp configurations directly from image or point cloud sensor data. These detected grasp configurations are 6-DOF poses of a robotic hand from which a grasp is expected to be successful. Importantly, there is no notion of “object” here and there is no segmentation as a pre-processing step. A successful grasp configuration is simply a hand configuration from which it is expected that the fingers will establish a grasp when they are closed.
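To make this representation concrete, here is a minimal sketch (hypothetical names and fields, not our actual code) of what a detected grasp amounts to: a 6-DOF hand pose plus a detector score, with no reference to any object identity or segmentation.

```python
# Hypothetical sketch of a grasp-detection output: a 6-DOF hand pose plus a
# detector score.  Note that no object identity or segmentation is involved.
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspHypothesis:
    position: np.ndarray      # (3,) hand position in the robot base frame, meters
    orientation: np.ndarray   # (3, 3) rotation matrix giving the hand's approach/closing axes
    score: float              # classifier confidence that closing the fingers yields a grasp

    def as_transform(self):
        """Return the hand pose as a 4x4 homogeneous transform."""
        T = np.eye(4)
        T[:3, :3] = self.orientation
        T[:3, 3] = self.position
        return T

# Example: an identity-oriented grasp 40 cm in front of the robot base.
g = GraspHypothesis(np.array([0.4, 0.0, 0.1]), np.eye(3), score=0.93)
print(g.as_transform())
```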

Fig. 2. (a) Input point cloud; (b) grasp candidates that denote potential grasp configurations; (c) high scoring grasps.

Most grasp detection methods have the following two steps: grasp candidate generation and grasp scoring. During candidate generation, the algorithm hypothesizes a large number of robotic hand configurations that could potentially be, but are not necessarily, grasps. Then, during grasp scoring, some form of machine learning is used to rank the candidates according to the likelihood that they will turn out to be grasps. There are a variety of ways of implementing these two steps. Figure 2 illustrates ours. Figure 2(b) shows a large set of grasp candidates that denote potential grasp configurations. These grasp candidates are generated by searching for hand configurations that are obstacle free with respect to the point cloud and that contain some portion of the cloud between the two fingers. Figure 2(c) shows a set of grasps that were scored very highly by the machine learning subsystem. In our case, we use a four-layer-deep convolutional neural network to make grasp predictions based on projections of the portion of the point cloud contained between the fingers. More details on this method can be found in [6].
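The geometric test used during candidate generation can be sketched roughly as follows. This is a simplified, hypothetical reconstruction: it assumes a two-finger hand parameterized by its opening width, finger thickness, depth, and height, and a point cloud already transformed into the candidate hand frame (x along the approach direction, y along the closing direction).

```python
import numpy as np

def is_valid_candidate(cloud_hand_frame, opening=0.08, finger_thickness=0.01,
                       hand_depth=0.06, hand_height=0.02, min_points=20):
    """Check a candidate hand pose against a point cloud (hypothetical sketch).

    cloud_hand_frame: (N, 3) points already transformed into the candidate hand
    frame, with x the approach direction and y the finger-closing direction.
    Returns True if the finger volumes are collision free and the closing
    region contains at least `min_points` points.
    """
    x, y, z = cloud_hand_frame[:, 0], cloud_hand_frame[:, 1], cloud_hand_frame[:, 2]
    in_depth = (x > 0.0) & (x < hand_depth)
    in_height = np.abs(z) < hand_height / 2.0

    # Points that would collide with either finger volume.
    in_finger = in_depth & in_height & (np.abs(y) > opening / 2.0) & \
                (np.abs(y) < opening / 2.0 + finger_thickness)
    if np.any(in_finger):
        return False

    # Points inside the closing region between the fingers.
    between_fingers = in_depth & in_height & (np.abs(y) < opening / 2.0)
    return np.count_nonzero(between_fingers) >= min_points
```

Candidates that pass a test of this kind would then be ranked by the learned scoring step (in our case, the convolutional network applied to projections of the points between the fingers).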

III. FUSING SEQUENTIAL GRASP DETECTIONS

A natural question to ask is whether it is possible to improve grasp detection accuracy by incorporating data from multiple viewpoints. Our prior work has shown that it is possible to improve grasp success rates by 9% just by using a SLAM package to create a more complete point cloud of the scene [6]. In that work, we mounted a depth sensor near the wrist and used InfiniTAM [8] to integrate point cloud data obtained by moving the hand in an arc over the scene. Unfortunately, because we typically view the scene at relatively close range (approximately 45 cm), InfiniTAM frequently loses track of the scene relative to the voxel grid while moving the arm. This was a particular problem when viewing a near-empty tabletop, where there is little structure to facilitate correct SLAM matching using ICP. In addition, because we were using the Baxter robot, we sometimes experienced grasp failures due to kinematic modelling errors. Both of these problems are serious challenges when attempting to do grasp detection using inexpensive robotic hardware.

An alternative to using a SLAM package is to run grasp detection directly on each of multiple point clouds and to somehow fuse the results. One simple approach is to create a single set of grasp candidates, score those candidates against each of the multiple point clouds, and then make predictions based on the average score per grasp candidate. We performed this experiment off-line using data from the BigBird dataset [17]. BigBird is a database of 125 objects, where each object is associated with 600 different point clouds taken from different views at roughly a constant radius about the object. In this experiment, we used point clouds taken from the 20 different viewpoints shown in Figure 3(a) around a box of CheezIts (the black object in the upper right of Figure 3(a)). We used the first point cloud (denoted by a red circle in Figure 3(a)) to generate the grasp candidates shown in Figure 3(b). Then, we scored each of the grasp candidates using each of the 20 point clouds. The blue line in Figure 3(c) shows the accuracy of grasp detection as a function of view, which varies between 80% and 100%. The red line shows the detection accuracy obtained by averaging scores from all 20 viewpoints. The accuracy obtained by averaging scores exceeds the accuracy obtained from most of the individual viewpoints. However, there are two things to notice. First, if there is no way of knowing which viewpoint is better than another, then there is value in averaging scores from different viewpoints. Second, the fact that the accuracy obtained from the best viewpoint exceeds the average suggests that it might be better not to average, but instead to obtain a good viewpoint and do grasp detection using that single cloud. We explore this idea in the next section. Although the findings presented above are only for the CheezIts box, other objects produce similar results.

Fig. 3. (a) 20 viewpoints around a box of CheezIts; (b) grasp candidates on the box; (c) grasp detection accuracy for averaged scores (red) versus accuracy for each view (blue).
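The score-averaging fusion described above can be sketched as follows, assuming we already have a matrix of classifier scores (one row per grasp candidate, one column per viewpoint) and ground truth labels for the candidates. The function and variable names are illustrative, not taken from our implementation.

```python
import numpy as np

def per_view_accuracy(scores, labels, threshold=0.5):
    """Accuracy of thresholded predictions for each viewpoint separately.

    scores: (num_candidates, num_views) classifier scores.
    labels: (num_candidates,) ground truth labels (1 = grasp, 0 = not a grasp).
    """
    preds = scores >= threshold                          # (num_candidates, num_views)
    return (preds == (labels[:, None] == 1)).mean(axis=0)

def averaged_accuracy(scores, labels, threshold=0.5):
    """Accuracy when each candidate's score is averaged over all viewpoints."""
    fused = scores.mean(axis=1)
    return ((fused >= threshold) == (labels == 1)).mean()

# Toy example: 200 candidates seen from 20 viewpoints, noisy but informative scores.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = rng.normal(scale=0.3, size=(200, 20)) + 0.35 + 0.3 * labels[:, None]
print(per_view_accuracy(scores, labels))   # one accuracy value per viewpoint
print(averaged_accuracy(scores, labels))   # accuracy after averaging scores
```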
IV. CHOOSING THE BEST VIEW FOR GRASP DETECTION

In the last section, we sampled grasps once and then evaluated the accuracy of making predictions based on average grasp scores. For inexpensive robotic arms, this is a best-case scenario because it requires accurately registering the point cloud in which the grasp candidates were generated with the point cloud obtained from each new perspective. One solution would be to encode this uncertainty as “process noise” in the context of sequential Bayesian filtering. However, since this would reduce the “averaged” accuracy shown in Figure 3 even further, it is probably not worth it. Therefore, in this section, we consider the entire grasp detection process (both candidate generation and scoring) rather than just looking at how to improve the accuracy of scoring.

This work is predicated on the idea that there is an approximate “target grasp” that describes how we would like to grasp a particular object. For example, we envision that the target grasp on a flashlight would be a handle grasp from which the robot would be able to actuate the switch. Given the target grasp, we will develop a method of reasoning about how to view that grasp in order to maximize the chances of detecting a large number of grasps with high precision.

First, we use the BigBird dataset to characterize the number of ground truth positives versus negatives as a function of viewpoint. Importantly, viewpoint is expressed in the reference frame of each individual grasp candidate (in terms of azimuth and elevation relative to the candidate). Figure 4(a-c) shows the result as a function of viewpoint for three different objects. This plot was calculated using kernel density estimation with a spherical Gaussian kernel and a standard deviation of 0.15. Notice that for the two rectangular objects (the CheezIts and the Dove soap), the relative density of positives is maximized in the vicinity of θ = π ± 0.6. Looking at the grasp directly along the approach vector works very poorly, while looking at the grasp from one side or the other helps a lot. The result is different for the cylindrical object (the Pringles), where the relative density of positives is high from most viewpoints, with a peak occurring at θ = π.

The result described above is important because a high proportion of ground truth positives versus negatives in the candidate set could enable us to detect more grasp positives with high precision. We tested this hypothesis using the BigBird dataset.
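The density estimate behind Figure 4 can be sketched as follows. This is an illustrative reconstruction rather than our exact code: it represents viewpoints as unit vectors built from azimuth and elevation in the grasp candidate's frame and assumes the spherical Gaussian kernel acts on the angle between directions, with σ = 0.15.

```python
import numpy as np

def direction(azimuth, elevation):
    """Unit vector for a viewing direction given azimuth/elevation (radians)."""
    return np.array([np.cos(elevation) * np.cos(azimuth),
                     np.cos(elevation) * np.sin(azimuth),
                     np.sin(elevation)])

def relative_positive_density(query_dir, pos_dirs, neg_dirs, sigma=0.15):
    """Kernel-weighted proportion of positives at a query viewing direction.

    pos_dirs / neg_dirs: (N, 3) unit vectors of viewpoints from which grasp
    candidates were labeled positive / negative.  The kernel is a Gaussian on
    the angle between directions (an assumption; the exact distance measure is
    not spelled out in the text).
    """
    def kde(dirs):
        cosines = np.clip(dirs @ query_dir, -1.0, 1.0)
        angles = np.arccos(cosines)
        return np.exp(-0.5 * (angles / sigma) ** 2).sum()

    p, n = kde(pos_dirs), kde(neg_dirs)
    return p / (p + n + 1e-9)

# Toy example: positives clustered near azimuth = pi, negatives everywhere.
rng = np.random.default_rng(1)
pos_az, pos_el = rng.uniform(np.pi - 0.6, np.pi + 0.6, 50), rng.uniform(0.0, 0.8, 50)
neg_az, neg_el = rng.uniform(-np.pi, np.pi, 200), rng.uniform(0.0, 0.8, 200)
pos = np.stack([direction(a, e) for a, e in zip(pos_az, pos_el)])
neg = np.stack([direction(a, e) for a, e in zip(neg_az, neg_el)])
print(relative_positive_density(direction(np.pi, 0.4), pos, neg))  # relatively high
print(relative_positive_density(direction(0.0, 0.4), pos, neg))    # lower
```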

Fig. 4. Density of positives as a function of viewpoint in the reference frame of the grasp candidate (θ = azimuth, φ = elevation). Yellow denotes areas with a high density of ground truth positives; blue denotes areas with dense ground truth negatives. Panels (a) CheezIts, (b) Dove Soap, and (c) Pringles correspond to the curvature-based grasp candidate generation strategy; panels (d) CheezIts, (e) Dove Soap, and (f) Pringles are the same except for the normals-based candidate generation strategy.

First, using a mesh provided by BigBird for a given object, we generate a ground truth grasp positive at random. This represents the “target grasp” that describes how we would like to grasp the object. Our goal will be to detect grasps in the vicinity (within ±2 cm and ±20 degrees) of the target grasp with high confidence. Using the relationships shown in Figure 4(a-c), we assign a score to each of the 600 viewpoints contained in the BigBird dataset for that object. Finally, we select the highest scoring view (the “smart viewpoint”), detect grasps in the corresponding cloud, and evaluate the number of ground truth grasp candidates found. The results are shown in Table I. On average, the grasp candidates generated from the smart viewpoint contain several times more ground truth positives than the grasp candidates generated from random viewpoints. Moreover, these views give us better recall at high precision.

In grasp detection, the cost of a false positive is particularly high because it can cause a grasp failure. As such, we are particularly interested in the characteristics of the detection system at high precision. Our prior work introduced a measure known as recall at 99% precision (equal to the point on the precision-recall curve at 99% precision) [6]. The assumption here is that we want to recall as many grasps as possible while maintaining no more than 1% false positives. It turns out that for grasps in the vicinity of the target grasp, recall at 99% precision is significantly higher for the planned viewpoints relative to random viewpoints (the “Rec @ 99%” column in Table I). Moreover, the raw number of detected true positives is higher at 99% precision as well (the “Pos @ 99%” column in Table I). This suggests a significant benefit to the planned view, both in terms of the chances that the highest scoring grasp is a true positive and in terms of the number of high-precision grasps that are available for the robot to choose from.

Why does viewpoint have such an impact? One can think about grasp candidate generation as a set of heuristics that suggest potential grasp configurations. In this work, we use a candidate generation strategy that searches discretized orientations about the axis of maximum principal curvature of the local object surface (we will call this the “curvature-based strategy”). Our choice to use this curvature axis has an important effect on the shape of the plots shown in Figure 4(a-c). Other grasp candidate generation strategies result in different preferred viewpoint directions.
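A minimal sketch of the viewpoint-scoring step just described, with hypothetical helper names: each available sensor position is expressed in the target grasp's reference frame, converted to azimuth and elevation, scored under the previously estimated density of positives, and the highest-scoring view is selected.

```python
import numpy as np

def view_azimuth_elevation(sensor_position, grasp_pose):
    """Azimuth/elevation of a sensor position in the target grasp's frame.

    grasp_pose: 4x4 homogeneous transform of the target grasp (base frame).
    sensor_position: (3,) sensor position in the base frame.
    """
    d = np.linalg.inv(grasp_pose) @ np.append(sensor_position, 1.0)
    v = d[:3] / np.linalg.norm(d[:3])
    azimuth = np.arctan2(v[1], v[0])
    elevation = np.arcsin(np.clip(v[2], -1.0, 1.0))
    return azimuth, elevation

def select_smart_view(sensor_positions, grasp_pose, density):
    """Pick the sensor position whose direction maximizes the positive density.

    density: callable (azimuth, elevation) -> relative density of ground truth
    positives, e.g. the kernel density estimate described above.
    """
    scores = [density(*view_azimuth_elevation(p, grasp_pose)) for p in sensor_positions]
    return int(np.argmax(scores))

# Toy usage: a density that favors views near azimuth = pi, as observed for boxes.
density = lambda az, el: np.exp(-0.5 * ((np.abs(az) - np.pi) / 0.6) ** 2)
views = np.array([[0.3, 0.0, 0.2], [-0.3, 0.0, 0.2], [0.0, 0.3, 0.2]])
print(select_smart_view(views, np.eye(4), density))  # selects the view near azimuth pi
```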

Object       View Type   Num Pos   Rec @ 99%   Pos @ 99%
CheezIts     Best            194       0.807         177
CheezIts     Random           41       0.220          41
Dove Soap    Best            355       0.809         343
Dove Soap    Random           48       0.132          38
Pringles     Best            456       0.584         300
Pringles     Random          122       0.170         100

TABLE I. Performance of grasp detection from the best view versus a random view. Num Pos denotes the number of ground truth positives in the candidate set; Rec @ 99% denotes recall at 99% precision (see text); Pos @ 99% denotes the number of positives detected at 99% precision.
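The Rec @ 99% column in Table I can be computed from detector scores and ground truth labels as in the following sketch (a generic implementation of the metric, not our evaluation code).

```python
import numpy as np

def recall_at_precision(scores, labels, min_precision=0.99):
    """Maximum recall achievable at or above the given precision.

    scores: (N,) detector scores for the grasp candidates.
    labels: (N,) ground truth labels (1 = true grasp, 0 = not a grasp).
    Sweeps the decision threshold from the highest score downward and returns
    the largest recall whose precision is still >= min_precision.
    """
    order = np.argsort(-scores)
    tp = np.cumsum(labels[order] == 1)
    fp = np.cumsum(labels[order] == 0)
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / max(labels.sum(), 1)
    feasible = precision >= min_precision
    return float(recall[feasible].max()) if feasible.any() else 0.0

# Toy example with synthetic scores.
rng = np.random.default_rng(2)
labels = rng.integers(0, 2, 1000)
scores = labels + rng.normal(0.0, 0.4, 1000)
print(recall_at_precision(scores, labels))
```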

For example, both Herzog et al. [7] and Kappler et al. [10] use a grasp candidate mechanism that searches discretized orientations about the local surface normal instead of a curvature axis (we call this the “normals-based strategy”). While this method also results in a strong relationship between viewpoint and density of ground truth positives, as shown in Figure 4(d-f), notice that the exact nature of the relationship is completely different. Therefore, while smart viewpoint selection is likely to help regardless of the exact candidate generation strategy used, it is important to calculate the correct viewpoint relationships for the chosen candidate strategy. As a baseline, we also evaluated this relationship for a strategy that searches for grasp candidates in uniformly random orientations. That strategy also exhibited a strong relationship between viewpoint and the density of grasp positives. However, since this random strategy typically generates many times fewer grasp candidates overall than either of the other candidate generation strategies, we do not consider it further.

V. SEQUENTIAL VIEW GRASP DETECTION

The experiments described at the end of Section IV suggest that it should be possible to improve grasp performance by viewing grasps from the right viewpoint. In particular, if the system has prior knowledge about approximately how an object should be grasped, then it should be possible to plan views that make it easier to detect the desired grasps. However, this is only one way that viewpoint can help. Another way is to reduce the impact of inaccurate kinematics on grasp success rates. Many inexpensive robotic systems, such as the Baxter research robot, have inaccurate forward kinematics. One way to reduce this impact is to detect grasps relative to an arm configuration near the expected final grasp configuration.

Assuming that the magnitude of kinematic errors grows with c-space distance, detecting grasps from a nearby configuration should help. Based on these two ideas, we propose a three-view sequential grasp detection strategy. We evaluate the impact of this strategy on grasp success rates using the Baxter research robot in our lab.

A. A three-view sequential grasp detection strategy

We formulate the following three-view sequential grasp detection strategy. First, we view the scene from an arbitrary viewpoint in order to establish an approximate “target grasp” (first view). This target grasp expresses how we would like to grasp an object. For example, if the objective is to grasp and actuate a flashlight, then the target grasp would be a handle grasp that is close enough to the switch to enable the robot to actuate it. In the experiments for this paper, the first viewpoint was chosen randomly with the constraints that the sensor be 35 cm away from a fixed point on the table and that the line-of-sight be within 53 degrees of the gravity direction. Grasps were then detected in the point cloud acquired from this viewpoint, and we selected one to be the “target grasp” according to the same procedure as in our prior work [6]. (Essentially, this involves checking for reachability, collisions, grasp width, and heuristics of task-specific interest.)

Second, based on the target grasp, we select a viewpoint from which that grasp would best be viewed (second view). This is accomplished using the previously modeled density of positives versus negatives as illustrated in Figure 4. If the object identity is known, then we use a model such as the one shown in Figure 4 directly. If object identity is unknown, then we use an averaged density model, given whatever prior information is available. In these experiments the densities were generated from box-like objects, since high-scoring views for boxes seem to work well on cylindrical objects as well. In our experiments, we constrain the second view to be 30 cm from the target grasp. Once the view is obtained, we perform a second round of grasp detection and select a grasp within a threshold radius (8 cm in our experiments) of the target grasp. This selected grasp becomes the final target grasp.

Once the final target grasp is obtained, we move the sensor to a pre-grasp arm configuration and obtain a third view. Here, the sensor is as close to the target grasp as possible while still satisfying the minimum-range constraint of the depth sensor (about 24 cm in our experiments). Using the third view, we perform one final round of grasp candidate generation without performing the corresponding grasp scoring step. The grasp candidate with the 6-DOF pose closest to the final target grasp is selected for execution. The fact that we do not score the grasps on this final step is significant: the third view is not intended to get a good label for the grasp, but simply to register the target grasp to the point cloud. Essentially, the third view “shifts” the target grasp so that it is not in collision with the point cloud but still close to the grasp found in the previous step.
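The control flow of the three-view strategy can be summarized by the skeleton below. The robot, detector, and view-density objects and their methods are placeholders for the components described in the text (not a real API), and the numeric constants are the ones reported above.

```python
import numpy as np

# Distances and thresholds reported in the text (meters / radians).
FIRST_VIEW_RANGE = 0.35                 # sensor distance from a fixed point on the table
FIRST_VIEW_MAX_ANGLE = np.deg2rad(53)   # max angle between line of sight and gravity
SECOND_VIEW_RANGE = 0.30                # sensor distance from the target grasp
GRASP_MATCH_RADIUS = 0.08               # radius for matching detected grasps to the target
THIRD_VIEW_MIN_RANGE = 0.24             # minimum range of the depth sensor

def pose_distance(g1, g2):
    """Placeholder 6-DOF pose distance used to pick the closest candidate."""
    return np.linalg.norm(g1.position - g2.position)

def three_view_grasp(robot, detector, view_density):
    """Skeleton of the three-view sequential grasp detection strategy.

    robot, detector, and view_density are placeholder objects; their methods
    stand in for components described in the text and are not a real API.
    """
    # First view: arbitrary viewpoint, establish an approximate target grasp.
    pose1 = robot.sample_first_view(FIRST_VIEW_RANGE, FIRST_VIEW_MAX_ANGLE)
    cloud1 = robot.acquire_cloud(pose1)
    target = detector.detect(cloud1).select_target()   # reachability, width, task heuristics

    # Second view: chosen from the positive-density model, refine the target grasp.
    pose2 = view_density.best_view(target, distance=SECOND_VIEW_RANGE)
    cloud2 = robot.acquire_cloud(pose2)
    grasps2 = detector.detect(cloud2)
    target = grasps2.closest_to(target, radius=GRASP_MATCH_RADIUS)

    # Third view: pre-grasp configuration, candidate generation only (no scoring).
    pose3 = robot.pre_grasp_view(target, min_range=THIRD_VIEW_MIN_RANGE)
    cloud3 = robot.acquire_cloud(pose3)
    candidates3 = detector.sample_candidates(cloud3)    # skip the CNN scoring step
    final = min(candidates3, key=lambda g: pose_distance(g, target))
    robot.execute_grasp(final)
```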

This three-view process is illustrated for a single object on a tabletop in Figure 5. Figure 5(a) shows the first view, where an approximate target grasp is established in the middle of the box. Figure 5(b) shows the second view, planned to maximize the precision of grasp detection. Figure 5(c) shows the third view, which occurs just prior to grasping. Figure 5(d) shows the completed grasp. Figure 6 shows depth images corresponding to the second and third views.

Fig. 6. Depth images used to generate the point clouds for the (a) second and (b) third views shown in Figure 5. The fingers of the robot are visible in the top center, and shadows of the fingers appear at the top sides.

B. Experiments

We evaluated the preceding three-view strategy on our Baxter robot. Our goal was to measure the benefit of the proposed three-view grasp detection strategy relative to various ablations of it. We consider the following four variations:

1) The three-view strategy as described in Section V-A.
2) Same as (1) except that we skip the final (third) view.
3) Same as (1) except that we skip the middle (second) view.
4) Same as (1) except that we skip the final two views and only perform the initial arbitrary view.

Fig. 8. (a) All 25 objects used in the robot experiments. (b) Cluttered pile of 10 objects that the robot must clear.

The experimental protocol followed for each variation is similar to the one proposed in our prior work [6]. First, 10 objects are selected at random from a set of 25 and “poured” into a pile in front of the robot. (See Figure 8(a) for the object set and Figure 8(b) for an example pile of clutter.) Second, the robot proceeds to automatically remove the objects one by one as the experimenter records successes and failures. This continues until either all of the objects have been removed, the same failure occurs on the same object three times in a row, or no grasps are found after 3 attempts. The sensor (Intel RealSense SR300) and gripper hardware used in the experiment are shown in Figure 1(b). Figure 7 shows the robot performing the first half of one round of an experiment.

                Views 1-2-3   Views 1-2   Views 1-3   Views 1
Attempts                131         141         154       153
Failures                 17          31          26        39
Success Rate           0.87        0.78        0.83      0.75

TABLE II. Grasp success rates for the 4 experimental strategies.

(Error Type)    Views 1-2-3   Views 1-2   Views 1-3   Views 1
FK                       10          21           9        28
Grasp                     4          10          14         9
Other                     3           0           3         2

TABLE III. Counts of grasp failures by type for the 4 experimental strategies. “FK” means a forward kinematics error led to the failure, and “Grasp” means a defect in the detected grasp led to the failure.

Grasp success rates for the 4 experimental strategies are shown in Table II. A grasp was counted as a success if the object was lifted from the table and placed into a box on the floor adjacent to the table. Failures were counted whenever the robot attempted a grasp but the object did not make it to the box. The “Attempts” row of the table shows the number of grasp attempts made using each strategy (579 grasp attempts total over all four strategies). The “Failures” row shows the number of failures for each strategy, and the “Success Rate” row is one minus the ratio between failures and attempts. These results suggest that both the second and third views contribute to successful grasps. Notice that the third view seems to be more important than the second view in terms of grasp success rates: the grasp success rate is higher when the second view is removed than when the third view is removed. Since grasp detection for the third view runs quickly (only the sampling step is needed; classification is not), this may be a good strategy for time-sensitive tasks.

The grasp failures in our experiments primarily fell into two categories, as shown in Table III. The “FK” row in Table III denotes the number of grasp failures caused by a misalignment between the planned grasp and the actual object in the real world. The “Grasp” row denotes the number of grasp failures caused by a detection error. (Notice that the column-wise sum of the three rows in Table III is equal to the corresponding total number of failures indicated in Table II.) The failure modes seem to reinforce intuition about what should happen if either view 2 or 3 is skipped. If view 3 is skipped, then we obtain a large number of FK failures, presumably because we are not using the last view to align the detected grasp with the final target grasp. If view 2 is skipped, then we obtain relatively more Grasp failures because the algorithm does not get the best view of the target grasp.

VI. CONCLUSIONS AND FUTURE WORK

This paper describes work that demonstrates that viewpoint can have an important effect on grasp detection. Although all experiments were performed using a grasp detection strategy we developed in our prior work, we expect that the results should extend to similar grasp detection strategies such as [11], [7], [4], [5]. To summarize, we have found that viewpoint can have an important impact on the relative proportion of ground truth positives contained within the set of grasp candidates. This manifests itself in terms of the recall of grasp detection at high rates of precision (an important measure which we seek to optimize). We have also found that viewing the object as close as possible to the target grasp configuration can significantly reduce the chances of a misalignment between the planned grasp configuration and an actual viable grasp. We proposed a sequential three-view strategy for grasp detection that we demonstrated to significantly improve practical grasp success rates relative to a single random view.

In the future we would like to score our views with respect to the information-theoretic measures introduced in prior work to see if there is a common relationship between these approaches. Our approach fits in well with grasp detection and requires no additional expensive computation besides a table look-up to evaluate the scores of viewpoints. More importantly, in the future we would like to look at improving the way grasps are shifted on the final approach, perhaps by making the process continuous until the sensor leaves the minimum viewing range. We hope to improve grasp success rates even further by optimizing the performance of grasping in the last few centimeters.

ACKNOWLEDGEMENTS

This work has been supported in part by the National Science Foundation through IIS-1427081, NASA through NNX16AC48A and NNX13AQ85G, and ONR through N000141410047.

Fig. 5. Simple illustration of the three-view grasp detection strategy: (a) first view; (b) second view; (c) third view; (d) grasp.

Fig. 7. First 5 grasps of a typical experiment trial. All grasps were successful. This was run with all 3 views included.

REFERENCES

[1] François Chaumette and Seth Hutchinson. Visual servo control, Part I: Basic approaches. IEEE Robotics & Automation Magazine, 13(4):82–90, 2006.
[2] Sachin Chitta, E. Gil Jones, Matei Ciocarlie, and Kaijen Hsiao. Perception, planning, and execution for mobile manipulation in unstructured environments. IEEE Robotics and Automation Magazine, Special Issue on Mobile Manipulation, 19(2):58–71, 2012.
[3] Renaud Detry, Carl H. Ek, Marianna Madry, and Danica Kragic. Learning a dictionary of prototypical grasp-predicting parts from grasping experience. In IEEE Int'l Conf. on Robotics and Automation, pages 601–608, 2013.
[4] David Fischinger and Markus Vincze. Empty the basket - a shape based learning approach for grasping piles of unknown objects. In IEEE Int'l Conf. on Intelligent Robots and Systems, pages 2051–2057, 2012.
[5] David Fischinger, Markus Vincze, and Yun Jiang. Learning grasps for unknown objects in cluttered scenes. In IEEE Int'l Conf. on Robotics and Automation, pages 609–616, 2013.
[6] Marcus Gualtieri, Andreas ten Pas, Kate Saenko, and Robert Platt. High precision grasp pose detection in dense clutter. In IEEE Int'l Conf. on Intelligent Robots and Systems, 2016.
[7] Alexander Herzog, Peter Pastor, Mrinal Kalakrishnan, Ludovic Righetti, Tamim Asfour, and Stefan Schaal. Template-based learning of grasp selection. In IEEE Int'l Conf. on Robotics and Automation, pages 2379–2384, 2012.
[8] Olaf Kähler, Victor Adrian Prisacariu, Carl Yuheng Ren, Xin Sun, Philip Torr, and David Murray. Very high frame rate volumetric integration of depth images on mobile devices. In IEEE Int'l Symp. on Mixed and Augmented Reality, volume 22, pages 1241–1250, 2015.
[9] Gregory Kahn, Peter Sujan, Sachin Patil, Shaunak Bopardikar, Julian Ryde, Ken Goldberg, and Pieter Abbeel. Active exploration using trajectory optimization for robotic grasping in the presence of occlusions. In IEEE Int'l Conf. on Robotics and Automation, pages 4783–4790, 2015.
[10] Daniel Kappler, Jeannette Bohg, and Stefan Schaal. Leveraging big data for grasp planning. In IEEE Int'l Conf. on Robotics and Automation, 2015.
[11] Daniel Kappler, Stefan Schaal, and Jeannette Bohg. Optimizing for what matters: The top grasp hypothesis. In IEEE Int'l Conf. on Robotics and Automation, 2016.
[12] Ian Lenz, Honglak Lee, and Ashutosh Saxena. Deep learning for detecting robotic grasps. The Int'l Journal of Robotics Research, 34(4-5):705–724, 2015.
[13] Daniel Levine, Brandon Luders, and Jonathan P. How. Information-rich path planning with general constraints using rapidly-exploring random trees. In AIAA Infotech@Aerospace Conf., 2010.
[14] Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. arXiv preprint arXiv:1603.02199, 2016.
[15] Joseph Redmon and Anelia Angelova. Real-time grasp detection using convolutional neural networks. In IEEE Int'l Conf. on Robotics and Automation, pages 1316–1322, 2015.
[16] Ashutosh Saxena, Justin Driemeyer, and Andrew Y. Ng. Robotic grasping of novel objects using vision. The Int'l Journal of Robotics Research, 27(2):157–173, 2008.
[17] Arjun Singh, James Sha, Karthik S. Narayan, Tudor Achim, and Pieter Abbeel. BigBIRD: A large-scale 3D database of object instances. In IEEE Int'l Conf. on Robotics and Automation, pages 509–516, 2014.
[18] Andreas ten Pas and Robert Platt. Using geometry to detect grasp poses in 3D point clouds. In Int'l Symp. on Robotics Research, 2015.
[19] Herke Van Hoof, Oliver Kroemer, Heni Ben Amor, and Jan Peters. Maximally informative interaction learning for scene exploration. In IEEE Int'l Conf. on Intelligent Robots and Systems, pages 5152–5158, 2012.
[20] Pere-Pau Vázquez, Miquel Feixas, Mateu Sbert, and Wolfgang Heidrich. Viewpoint selection using viewpoint entropy. In Int'l Conf. on Vision Modeling and Visualization, volume 1, pages 273–280, 2001.
[21] Javier Velez, Garrett Hemann, Albert S. Huang, Ingmar Posner, and Nicholas Roy. Planning to perceive: Exploiting mobility for robust object detection. In Int'l Conf. on Automated Planning and Scheduling, 2011.
