2012 12th International Conference on Control, Automation, Robotics & Vision Guangzhou, China, 5-7th December 2012 (ICARCV 2012)
A Depth Sensor to Control Pick-and-Place Robots for Fruit Packaging

Abdul Md Mazid
School of Engineering and Built Environment
Central Queensland University, Rockhampton, Australia
E-mail: [email protected]

Pavel Dzitac
School of Engineering and Built Environment
Central Queensland University, Rockhampton, Australia
E-mail: [email protected]
Abstract— This paper presents a powerful and inexpensive method for object detection and location using a low cost, commercial 3D depth sensor. This method can be used as an affordable alternative to the more expensive commercial object detection and location systems, particularly where a depth inaccuracy of a few millimeters can be tolerated. The prototype control application extracts relevant information, such as the XYZ co-ordinates of an object's location, from the 3D depth map generated by the depth sensor interface software. This information is then used by the robot control application to control the end effector of a pick-and-place robot for handling objects such as fruit. Experiments have proven that the projected area of an object and its location can be successfully extracted from the depth map provided by the 3D depth sensor. This type of depth sensor can be used in robotics research projects for short range environment mapping and navigation, and for detecting and locating objects.

Keywords— depth sensor, depth map, pick-and-place robot, fruit packaging
I. INTRODUCTION
This paper presents the development of a prototype object detection method for robotic pick-and-place applications. The method uses the Asus Xtion Pro Live 3D depth sensor, the OpenNI interface software and the AForge framework's image processing library to locate objects (melons, oranges, tomatoes or pineapples) in a container, and to direct the pick-and-place robot gripper to the XYZ location of each object on the top layer of the container.

This technology can be used to advantage in the packaging industry, particularly for packaging fruit and vegetables such as oranges, pineapples, mangoes and tomatoes, which do not require extremely accurate depth location information. The described object detection and location method could easily be adapted for sorting fruit and vegetables according to size and colour. Sorting according to size, mass, colour and quality is an essential task, particularly in the fruit and vegetable industry. It is envisaged that the depth sensor method can be applied efficiently in the fruit and vegetable packaging industry for sorting and packaging purposes.

The existing work on robotic pick-and-place focuses mostly on the use of vision systems and commercial depth sensing technologies. There are commercial pick-and-place systems available that use object localization technologies to locate and pick objects. However, these systems use custom sensing arrangements that are either very expensive or not customizable by the user. The system proposed in this paper makes use of a low cost, commercial sensor and common application development tools and methods that allow the user, with little effort, to customize the application for tasks such as robot environment mapping and navigation. The prototype object detection and location application was developed using Microsoft Visual C# 2010, and can be obtained from the authors on request.
II. ARCHITECTURAL THEORY AND DEVELOPMENT
As mentioned earlier, the experimental system for detecting and locating individual objects on the upper layer of a container consists of the following:
• A depth sensor,
• Interface software, and
• An image processing library.
Experiments for detecting and locating oranges on a plain surface were performed successfully, and useful information such as each object's projected area and location was obtained.
Assembly and interfacing of this equipment was performed for experimental purposes as described below.

A. Robot-sensor interface design

The depth map point cloud data can be processed using 3D point cloud processing libraries to extract the 3D information of interest to the application being developed.
For the application presented in this paper the robot-sensor interface was designed such that the 3D depth map generated by the depth sensor can be captured and processed to extract the XYZ location co-ordinates for each object of interest in the scene, and provide this information to the controller of the pick-and-place robot. The depth map point cloud data can be used to extract any relevant information about the object for the purpose of robot-control decision making.

The block diagram in Figure 1 shows the configuration of the robot-sensor interface at the hardware level. The interface between the sensor and the computer is provided via USB 2.0 communication. The interface between the computer and the robot controller is provided via RS232 communication. Obviously the interface between the computer and the robot controller can be provided using any other standard communication hardware that is available on both the computer and the robot controller. In this case RS232 communication is used because it is readily available on both and is adequate in terms of the required data throughput.

Figure 1. Layout of the robot-sensor hardware interface

The block diagram in Figure 2 displays the configuration of the robot-sensor interface at the software level. The interface between the sensor firmware and the user software application running on the computer is provided by the OpenNI interface software [1], which uses the sensor driver to capture the raw depth point cloud. Standard Microsoft Visual C# library functions are then used to create the interface between the user application and the robot controller.

Figure 2. Layout of the robot-sensor software interface
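As an illustration of the computer-to-robot-controller link described above, the following sketch sends an object's XYZ co-ordinates over RS232 using the standard .NET SerialPort class. The port name, baud rate and text message format are assumptions made for this sketch; the paper does not specify the robot controller's protocol.

using System.IO.Ports;

//Minimal sketch of the PC-to-robot-controller link over RS232.
//The COM port, baud rate and message format are assumed, not taken from the paper.
public class RobotLink
{
    private readonly SerialPort port;

    public RobotLink(string portName, int baudRate)
    {
        port = new SerialPort(portName, baudRate, Parity.None, 8, StopBits.One);
        port.Open();
    }

    //Sends one pick target as a simple comma-separated line, e.g. "PICK,120,85,860"
    public void SendPickTarget(int xMm, int yMm, int zMm)
    {
        port.WriteLine(string.Format("PICK,{0},{1},{2}", xMm, yMm, zMm));
    }
}

Any framing scheme understood by the robot controller could be substituted for the simple text message shown here.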
Figure 3, extracted from the OpenNI User Guide [1], shows the relationship between the hardware (such as the Xtion Pro Live depth sensor), the OpenNI interface software, the middleware (OpenNI plug-in) and the user application software.

Figure 3. The layers in the OpenNI framework

The robot control application presented in this paper extracts relevant information from the depth map data supplied by the depth sensor via the OpenNI interface software, and uses the extracted information as an input to the robot control logic that determines the actions of the robot.

B. The Asus Xtion Pro Live depth sensor

The Xtion sensor consists of a depth sensor (infrared projector and receiver), an RGB camera and two microphones, as shown in Figure 4. The depth sensor itself consists of a structured infrared light projector that projects a 2D array of infrared dots and a receiver that decodes the dot pattern and generates a raw 3D depth point cloud. It differs from the structured light techniques [2] because it determines depth from a pattern of random dots projected on the object.

Figure 4. The Xtion Pro Live sensor

The raw depth point cloud stream is processed by the OpenNI interface software, which generates a depth map (distance map) in millimeters after applying correction factors to the raw depth point cloud. The specified useable range of the depth sensor is between 800mm and 3500mm; in other words, the object has to be at least 800mm away from the sensor and no further than 3500mm to be reliably detectable. The Xtion sensor transmits video, depth and audio streams. The video and depth streams are transmitted at frame rates of 60fps (320x240 resolution) and 30fps (640x480 resolution).
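As a small illustrative sketch (not taken from the paper), the working range quoted above can be captured as constants and used to reject unusable depth readings before they reach the search logic; the class and member names are hypothetical.

//Hypothetical helper capturing the sensor's specified working range (800mm to 3500mm)
public static class DepthRange
{
    public const int MinMm = 800;   //closest reliably detectable distance
    public const int MaxMm = 3500;  //farthest reliably detectable distance

    //Returns true when a depth reading (in millimeters) falls inside the usable range.
    //A value of zero normally means "no reading" and is rejected as well.
    public static bool IsUsable(int depthMm)
    {
        return depthMm >= MinMm && depthMm <= MaxMm;
    }
}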
The depth stream tests ran successfully at a frame rate of 30fps using a resolution of 640x480.
The Xtion Pro Live sensor is commercially available from Asus with a bundled SDK and is similar to Microsoft’s Kinect sensor used on Xbox 360. Both sensors are based on technology developed by PrimeSense [3].
C. Installation and setup

For Windows environments the software installation package is provided for both 32-bit and 64-bit machines. The latest version of the driver and SDK package is available from the Asus website. The latest version of the OpenNI software is available from the OpenNI website.
The Xtion sensor drivers and the SDK, which contains sample applications, can be installed by following the installation procedure provided by Asus.
III. ROBOT CONTROL APPLICATION DEVELOPMENT
The prototype robot control application was developed using Microsoft Visual C# 2010 [4]. The easiest way to start is to modify one of the sample applications provided with the SDK [5]. The application presented in this paper was created by modifying the SimpleViewer sample application provided with the SDK. The developed application uses the OpenNI interface software to capture the depth map. The depth map is then processed by the robot control application using the AForge image processing library [6] to find the objects that are nearest to the depth sensor and determine their position.
A simple but effective control algorithm has been designed, as shown in the pseudocode below. The pseudocode searches for objects (blobs) in the depth map received from the depth sensor and determines how many objects have been found at a particular search distance from the depth sensor.

Begin
{
    Wait for update from depth node (depth sensor)
    Get the depth metadata (depth map) from depth node
    Apply threshold to depth map to set the search distance
    While (object not found)
    {
        If (max search distance not reached)
        {
            Increment threshold to obtain a new search distance
            Search for objects (blobs) in depth map at new search distance
        }
        If (object found)
        {
            Count all objects found at that search distance
            Determine xy location of all objects
            Determine z location of all objects from the search distance threshold value
            Save object count and xyz location into an array
            Save other relevant object info into the array
        }
    }
    Send object location info to robot
    Send robot to object's location
    Pick up object and move to a new location
}
End

For the purpose of this project the depth map provided by the depth sensor is converted to a 2D image after the vertical Z-distance to the object is obtained. The Z-distance of interest in this project is the Z-distance from the sensor to the top of the object. The Z-distance is found by applying a distance threshold to the depth map data (using the statement "Apply threshold to depth map to set the search distance") and then recording the value of the search distance threshold at which the objects are detected (using the AForge image processing library functions), as in the statement below.

If (object found)
{
    Determine z location of all objects from the search distance threshold value
}

The distance threshold is sequentially increased from the minimum distance of 800mm to a desired maximum distance that is sufficient to detect all objects of interest. When the first object is detected its Z-distance is recorded. From this first Z-distance the search threshold is increased by a further distance of, say, 30mm. The Z-distance of each of the objects found in this 30mm scanned region is recorded into an array. The AForge library is used to find the XY location of each of the detected objects and, at the same time, to provide other information such as the object's projected area and orientation. Once this information is available the robot control application computes the control logic and determines where to direct the end-effector of the pick-and-place robot.
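The thresholding step named in the pseudocode above can be sketched as follows. This is not the authors' implementation: it assumes the depth map has already been copied into an array of millimeter values (ushort, row-major) and uses Bitmap.SetPixel for clarity rather than speed.

using System.Drawing;

public static class DepthThreshold
{
    //Converts a depth map (millimeter values, row-major) into a binary image in which
    //pixels nearer than the current search distance are white and everything else is black
    public static Bitmap Apply(ushort[] depthMm, int width, int height, int searchDistanceMm)
    {
        Bitmap image = new Bitmap(width, height);
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                ushort z = depthMm[y * width + x];
                //A zero value means "no depth reading", so treat it as background
                bool withinRange = z != 0 && z <= searchDistanceMm;
                image.SetPixel(x, y, withinRange ? Color.White : Color.Black);
            }
        }
        return image;
    }
}

The resulting binary bitmap is the kind of image that the AForge blob finding described below operates on.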
A brief description of the written C# program and the control methodology is given below. The following code shows the required references to OpenNI and AForge in the C# application.

using OpenNI;
using AForge;
using AForge.Imaging;
using AForge.Math.Geometry;
The following code shows the required declarations for OpenNI in the C# application.
private Context context;
private ScriptNode scriptNode;
private DepthGenerator depth;

A Context is defined as a workspace where the application builds its OpenNI production graph and holds the information regarding the state of the application. The production graph is a map of all the nodes used in the application. To use OpenNI you must construct and initialize a Context. The ScriptNode allows OpenNI to create nodes and manages all nodes that it created using scripts. The DepthGenerator is a node that generates the depth map from the raw depth point cloud.

The following C# code initializes the Context workspace using an XML file and the ScriptNode. It also initializes the DepthGenerator node.

context = Context.CreateFromXmlFile(SAMPLE_XML_FILE, out scriptNode);
depth = context.FindExistingNode(NodeType.Depth) as DepthGenerator;

The following C# code shows the required declarations for DepthMetaData. The depth metadata contains the actual distance data (depth map) after correction factors were applied to the raw depth data. Next, OpenNI is instructed to wait for an update from the depth node. When the update is received, OpenNI is instructed to get the metadata and store it in the depthMD variable of type DepthMetaData.

//Create a DepthMetaData instance
DepthMetaData depthMD = new DepthMetaData();
//Read next available raw depth data
context.WaitOneUpdateAll(depth);
//Get depth metadata from raw depth data
depthMD = depth.GetMetaData();

The following C# code shows the required declarations for AForge.

private BlobCounter blobCounter = new BlobCounter();
private Blob[] blobs;

The BlobCounter is an AForge function that processes a 2D image, typically formatted as a 24bpp RGB image, finds "blobs" in the image and computes their location, area and other parameters. A blob is an island of pixels in the image. AForge thresholds the image to generate a black and white image, and locates islands of pixels that are separated by a black background. The Blob is an AForge structure that holds the various bits of information about a detected blob, such as its XY position and its area. A size filter can be applied to the BlobCounter to force it to consider blobs of a certain size range and ignore the rest. This functionality is useful and can simplify the object identification task in the user application. The following C# code applies a filter to the instantiated blobCounter and forces it to ignore objects that are not within the filter constraints.

blobCounter.FilterBlobs = true;
blobCounter.MinHeight = 20;
blobCounter.MinWidth = 20;
blobCounter.MaxHeight = 60;
blobCounter.MaxWidth = 60;

Once the metadata (depth map) is available it can be extracted from the depthMD structure and processed as desired. We use the AForge library to find the objects (blobs) and the Z-distance to each object. The first step is to convert the depth information to a bitmap image so it can be processed by AForge. This is done by applying a threshold to the distance data and converting it to a binary image. The converted bitmap image is then passed to AForge for processing and blob finding, as shown in the C# code below.

//process the 2D image
blobCounter.ProcessImage(AForge.Imaging.Image.Clone(image, System.Drawing.Imaging.PixelFormat.Format24bppRgb));

The blob information is then retrieved from AForge as an array of blobs, as shown in the C# code below. In this case the maximum number of detected blobs is determined by the robot control application, which can be designed to find objects within a specified depth search range (but within the sensor working range).

//Get info for each blob found
blobs = blobCounter.GetObjectsInformation();

Each element of the retrieved array of blobs holds the information about each blob in a structure and can be queried to retrieve the required information, as shown in the code below.

//Record the number of blobs found
blobCnt = blobs.Length;
//Extract the area and location info for each blob found
for (int i = 0; i < blobs.Length; i++)
{
    blobArea[i] = blobs[i].Area;
    blobX[i] = blobs[i].CenterOfGravity.X;
    blobY[i] = blobs[i].CenterOfGravity.Y;
}
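For reference, the AForge calls shown above can be gathered into a single helper that takes an already thresholded bitmap, applies the size filter and returns the detected blobs. This is a sketch assembled from the calls above, not the authors' exact code.

using System.Drawing;
using AForge.Imaging;

public static class BlobFinder
{
    //Runs AForge blob detection on a thresholded (binary) bitmap and returns the blobs
    //that pass the size filter used above (20 to 60 pixels in each dimension)
    public static Blob[] Find(Bitmap thresholded)
    {
        BlobCounter blobCounter = new BlobCounter();
        blobCounter.FilterBlobs = true;
        blobCounter.MinWidth = 20;
        blobCounter.MinHeight = 20;
        blobCounter.MaxWidth = 60;
        blobCounter.MaxHeight = 60;

        //Clone to 24bpp RGB, as in the ProcessImage call shown earlier
        blobCounter.ProcessImage(AForge.Imaging.Image.Clone(thresholded,
            System.Drawing.Imaging.PixelFormat.Format24bppRgb));

        return blobCounter.GetObjectsInformation();
    }
}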
The following C# code searches for the first blob (top-most object) by incrementing the blob search distance (z_distance) until the blob is found or the maximum search distance is reached.

//Search for blob
if (blobCnt < 1 && z_distance < 830)
{
    //increment z distance until next blob found
    z_distance += 2;
}
else if (blobCnt > 0)
{
    //when blob found record its Z distance
    blobZ[0] = z_distance;
    z_distance = 0;
}
else
{
    //no blob found if search range exceeded
    no_blobs = true;
    z_distance = 0;
}
The above code can be modified to find all objects that are in the selected search range. This can be done by searching for objects until the specified maximum search distance is reached, and not limiting the search by the number of objects found. When the objects are found, their XYZ location information is sent to the robot controller. The robot is then commanded to go to the given location, pick up the detected object and put it in a new desired location, such as a packaging box or a quality inspection machine.
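The modification described above can be sketched as a loop over the whole search range. The helpers DepthThreshold.Apply and BlobFinder.Find are the hypothetical sketches introduced earlier, and the 2mm step and the duplicate-suppression radius are assumed values, not figures from the paper.

using System.Collections.Generic;
using AForge.Imaging;

public struct DetectedObject
{
    public float XPx, YPx;  //blob centre in image pixels
    public int ZMm;         //search distance at which the blob first appeared
}

public static class FullRangeSearch
{
    //Scans from the minimum usable distance to maxSearchMm and records every blob
    //at the first search distance at which it appears
    public static List<DetectedObject> FindAll(ushort[] depthMm, int width, int height,
                                               int maxSearchMm)
    {
        var found = new List<DetectedObject>();
        for (int z = 800; z <= maxSearchMm; z += 2)   //2mm step, as in the first-blob search
        {
            Blob[] blobs = BlobFinder.Find(DepthThreshold.Apply(depthMm, width, height, z));
            foreach (Blob blob in blobs)
            {
                //Skip blobs already recorded at a nearer search distance (10 pixel radius assumed)
                bool isNew = true;
                foreach (DetectedObject known in found)
                {
                    float dx = blob.CenterOfGravity.X - known.XPx;
                    float dy = blob.CenterOfGravity.Y - known.YPx;
                    if (dx * dx + dy * dy < 100) { isNew = false; break; }
                }
                if (isNew)
                {
                    found.Add(new DetectedObject
                    {
                        XPx = blob.CenterOfGravity.X,
                        YPx = blob.CenterOfGravity.Y,
                        ZMm = z
                    });
                }
            }
        }
        return found;
    }
}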
IV. EXPERIMENT FOR PERFORMANCE TESTING

The blob detection application was tested using a single layer of several oranges as objects on a horizontal plane, as shown in Figure 5. The Xtion sensor was located about 860mm above the horizontal plane. The aim of the test was to determine whether the search algorithm would detect the objects and their XYZ locations reliably.

Figure 5. Experimental setup for object detection

The robot control application detected the objects as shown in Figure 6 (the front row of oranges in Figure 5 is the top row of blobs in Figure 6). Out of fifteen oranges, thirteen were detected as blobs; one was ignored as too small by the filter setting in the application, and one was not detected at all because it was too low (outside the preset detection range). The results on the right of Figure 6 are for the first blob in the array of detected blobs. The Centre of Gravity is the XY position of the detected object. The Z position (not shown in the picture) was calculated separately during the vertical search for each blob. The two small blobs on the two edges of the image are the two legs of the sensor support. When detecting objects in a container, the edges of the depth map can be cropped to ignore the sides of the container during object detection. Filters would also reject most of the artifacts in the image.

Figure 6. The orange test results using the modified OpenNI SimpleViewer

Although this is a simple test, it shows the usefulness and flexibility of this sensor even when used in conjunction with the free open source point cloud and image processing libraries. Most such libraries are available for the C++ programming language.

Once the detected object's data is available it can be used by the control application to send the pick-and-place robot to the object's location. The application can update the object co-ordinates in real time, and therefore the robot would not have to wait at any point for the robot control application to update the location of the objects.

Many of the important specifications, such as the depth map horizontal and vertical resolution, are not stated in the basic sensor documentation, so some research was required. According to the testing performed, at 2m from the sensor the depth map has a resolution of about 3mm along the X and Y axes, and a depth (Z) resolution of about 10mm. This agrees with the sensor resolution information. The resolution of this sensor increases as the distance from the sensor decreases: at 1m from the sensor the XY resolution is about 1.5mm and the Z resolution is about 5mm. The object detection precision at 1m from the sensor is very good for most packaging and material handling applications.
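The resolution figures above suggest a simple way to convert a blob's pixel position and its measured Z distance into approximate millimeter offsets in the sensor frame using a pinhole camera model. The focal length used below (about 570 pixels at 640x480 resolution for this class of sensor) is an assumed value, not a figure from the paper, and would need calibration in practice.

public static class PixelToWorld
{
    //Assumed focal length in pixels at 640x480 for this class of sensor; calibrate in practice
    private const double FocalLengthPx = 570.0;

    //Converts a blob centre (in pixels) and its measured depth (mm) into X/Y offsets (mm)
    //from the optical axis, using a simple pinhole camera model
    public static void ToMillimeters(float xPx, float yPx, int zMm,
                                     int imageWidth, int imageHeight,
                                     out double xMm, out double yMm)
    {
        double cx = imageWidth / 2.0;   //principal point assumed at the image centre
        double cy = imageHeight / 2.0;
        xMm = (xPx - cx) * zMm / FocalLengthPx;
        yMm = (yPx - cy) * zMm / FocalLengthPx;
    }
}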
Although these low cost sensors do not have a very high resolution, they are still extremely useful for locating objects in many applications. Additionally the RGB camera can be used in conjunction with the depth sensor to improve the XY resolution and accuracy if necessary.
A good explanation of how this type of sensor measures depth is given by Nate Lowry [7].
V. CONCLUSION
A flexible and inexpensive object detection and localization method for pick-and-place robots, one that can be developed with little effort, has been presented. Although the Xtion and Kinect depth sensors were intended for gaming applications, they can be used in indoor robotics to provide useful depth information in many applications. Depth sensors provide robots with a flexible and powerful means of locating objects, such as boxes, without the need to hard-code the exact co-ordinates of each box in the robot program.

This technology can be used for sorting and packaging of various fruit and vegetables such as oranges, apples, pineapples and grapefruit. Further improvements can be made to this application to synchronize the Xtion's RGB camera with the depth sensor to provide colour information, and to improve the XY resolution and accuracy. This would be useful in general, as many applications, such as fruit sorting, require or could benefit from an object's colour information. Using these depth sensors and the open source libraries available for 2D image and 3D point cloud processing, a sophisticated detection and localization system for randomly orientated objects in containers can be developed with less effort than would be required if stereo vision cameras were used.

Many sensor technologies have been researched [8] to find a suitable technology, but none was as appropriate for low cost object detection and depth measurement as the technology used by the Xtion and Kinect sensors. The developed control algorithms were based on previous experience with control system and control architecture development and implementation [9-10].

An important aspect of a controller is its performance and robustness. However, when these sensors are used in robotic applications it is important to consider the safety aspects of the application, such that sensor failure will not create a dangerous situation in which people may get hurt or expensive goods may get damaged [9-10].
REFERENCES
[1] OpenNI User Guide. Available: http://openni.org/Documentation/ProgrammerGuide.html
[2] G. Sansoni, M. Trebeschi, and F. Docchio, "State-of-the-Art and Applications of 3D Imaging Sensors in Industry, Cultural Heritage, Medicine, and Criminal Investigation", Sensors, vol. 9, pp. 568-601, 2009.
[3] PrimeSense. Available: http://www.primesense.com/#4
[4] Microsoft MSDN Library, C# Programmer's Reference. Available: http://msdn.microsoft.com/en-us/library/618ayhy6(v=VS.71).aspx
[5] Asus Xtion Pro Live driver & SDK package. Available: http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO_LIVE/#download
[6] AForge.NET Framework. Available: http://www.aforgenet.com/
[7] N. Lowry, "How the Kinect senses depth". Available: http://nongenre.blogspot.com/2010/12/how-kinect-senses-depth.html
[8] P. Dzitac and A. M. Mazid, "Principles of sensor technologies for object recognition and grasping", Proceedings of the 15th International Conference on Mechatronics Technology, Melbourne, 2011.
[9] P. Dzitac and A. M. Mazid, "An advanced control technology for robotic palletising in materials handling for higher productivity", Proceedings of the 15th International Conference on Mechatronics Technology, Melbourne, 2011.
[10] P. Dzitac and A. M. Mazid, "An Efficient Control Configuration Development for a High-speed Robotic Palletising System", Proceedings of the 2008 IEEE International Conference on Robotics, Automation and Mechatronics (RAM 2008), Chengdu, China, 2008, pp. 140-145.