Cloud-Based Image Recognition for Robots

Jaroslav ONDO¹, Daniel LORENCIK¹, Peter SINCAK¹, Hiroaki WAGATSUMA²

¹ Department of Cybernetics and Artificial Intelligence, Technical University of Kosice
{jaroslav.ondo, daniel.lorencik, peter.sincak}@tuke.sk
² Department of Human Intelligence Systems, Kyushu Institute of Technology, Kitakyushu, Japan
[email protected]
Abstract. This paper deals with the Cloud Robotics approach, which is strongly supported by recent advances in Cloud Computing. We present an early implementation of a system for cloud-based object recognition. The primary purpose of the system is to provide object recognition as a service to a wide range of devices. The main benefits of using the cloud as a platform are easy future scalability and, above all, the sharing of already collected knowledge among all devices using the system. The system consists of a feature extraction part and a classification part. SIFT and SURF are used for feature extraction, and MF ArtMap for classification. We present the implementation of both parts in more detail, together with preliminary results. We assume that Cloud Robotics and brain research for robots will, in the near future, merge into a functional system able to share and utilize common knowledge while also allowing personalization.

Keywords: cloud computing, cloud robotics, SIFT, SURF, MF ArtMap, brain-like systems
1 Introduction
Cloud Computing was introduced to the IT domain many years ago. Its impact on Intelligent Robotics came only recently, when the concept of Cloud Robotics entered the field [1], [2]. We believe that Cloud Robotics should include the implementation of Artificial Intelligence on the cloud, and that this technology can bring major changes to core Artificial Intelligence tasks such as pattern recognition, where the representation set for learning changes continuously. The learning approach is therefore expected to be incremental, and brain-like inspirations can play an important role in the resulting system. Crowdsourcing, as well as multi-source information about brain functioning, can improve the resulting accuracy of robotic intelligence.
2 Cloud Based Framework for Cloud Robotics
We have discussed the system proposal in greater detail in [3]. The proposed system is based on the notion of an AI Brick [4] – a well-defined system suited for one task, in this case object recognition. Since the system uses Microsoft Azure as the cloud platform, the inherited cloud capabilities allow for easy scalability in case of increased demand on the service, for easy deployment of new versions (as the main logic is provided as a cloud service) and, most importantly, for knowledge acquisition and sharing from all of the connected clients. An important feature of the system is that it places no special requirements on the devices that use it. The only requirements are the ability to capture images and to send them over an Internet connection to the service.
2.1 Cloud Computing Platform and Technological Aspects
As already mentioned, the system is built on the PaaS (Platform as a Service) model [5] provided by Microsoft Azure. Since our system is intended to be a cloud service, we adopted the modular architecture of Azure cloud services: user interfaces are created as web roles hosted on virtual machines of variable computing power using ASP.NET, and background jobs are created as worker roles hosted on dedicated virtual servers of variable computing power. These interact via the Message Bus and Queues. The image data are kept in blob storage, as are the descriptors extracted from them. To create a truly cloud-based service, we use the No-SQL Azure Tables [6] instead of SQL-like databases for cross-referencing the image data, the extracted descriptors and the classification data. From the high-level architecture proposal in Fig. 1 it can be seen that only the image is sent over the Internet to the cloud service as input data. The required preprocessing and feature extraction are done on the cloud. This approach certainly creates a problem in terms of speed, as uploading an image is a time-consuming operation. However, it is necessary in order to achieve the normalized feature space required for object classification, and it also makes the resulting service more widely available, as we do not require any special software on the device for the communication. In the final stage of the service development, we will implement a REST-like API for use in other scenarios (in line with the AI Brick notion). Such a service can then be utilized in many applications, most notably in cloud robotics. An example is the RoboEarth project [7], which is able to use existing cloud image recognition services such as Google Goggles [8].
Fig. 1. High-level architecture of the proposed system [3]
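To illustrate that no special software is required on the client side, the following minimal sketch shows how any device with a camera and an Internet connection could submit an image to the service over HTTP. The endpoint URL and form field names are hypothetical placeholders, since the final REST-like API is still to be implemented:

```python
# Minimal client sketch: upload an image to the recognition service.
# NOTE: the endpoint URL and form field names are hypothetical placeholders,
# not the deployed API of the service.
import requests

SERVICE_URL = "https://example-recognition.cloudapp.net/api/images"  # hypothetical

def submit_image(path, extractor="both"):
    """POST an image file and the chosen extractor (sift, surf or both)."""
    with open(path, "rb") as f:
        response = requests.post(
            SERVICE_URL,
            files={"image": f},
            data={"extractor": extractor},
        )
    response.raise_for_status()
    return response.json()  # e.g. the image Id, later the classification result

if __name__ == "__main__":
    print(submit_image("snapshot.jpg", extractor="sift"))
```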
2.2 Research Approaches used in Proposal
Image processing is an important part of information acquisition for robots, and its feature space can take many forms. We have chosen spectral features as well as derived descriptors as features for the pattern recognition procedure. We use SIFT (Scale Invariant Feature Transform) [9] and SURF (Speeded-Up Robust Features) [10] for feature extraction, and Membership Function ArtMap [11]–[13] together with a Gaussian classifier for object classification. The main research goal of our work is to adapt these approaches to the cloud environment and to find out which extractor-classifier combination provides the best results. We chose these two classifiers because MF ArtMap represents a model-free classifier, whereas the Gaussian classifier is model-dependent; comparing the two is one of our research goals. We anticipate the challenge of adapting the classifier methods to the cloud environment. The goal of the proposed system is to provide a stable service for all connected devices regardless of their actual number; in other words, the service has to be scalable. In terms of cloud computing this means that the virtual machines forming the underlying infrastructure of the service can be rebooted, shut down or started at any time, so the system itself has to be built in a way that reflects these conditions. We also compare these classification methods to simple matching, which can prove faster under certain conditions (up to a certain number of entries in the table storage). Another anticipated challenge is how to work effectively with the large data sets we expect to amass during the experiments and the eventual publication of the service for public use.
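For illustration, the local-feature extraction step can be sketched with OpenCV. Our implementation is a .NET cloud service, so the Python code below is only an assumption-laden illustration; note also that SURF is a non-free algorithm and requires an opencv-contrib build with the non-free modules enabled:

```python
# Sketch of SIFT/SURF feature extraction with OpenCV (illustrative only;
# the deployed service uses its own .NET implementation).
import cv2

def extract_features(image_path, method="sift"):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if method == "sift":
        detector = cv2.SIFT_create()  # 128-dimensional descriptors
    else:
        # SURF needs opencv-contrib built with OPENCV_ENABLE_NONFREE;
        # it produces 64-dimensional descriptors by default.
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    return keypoints, descriptors

kp, desc = extract_features("object.jpg", method="sift")
print(len(kp), "keypoints extracted")
```

Note that the number of extracted keypoints, and hence descriptors, varies per image; this property becomes important for the classifier design in Section 4.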
As one of the classifiers, and the one used in the proof-of-concept experiment, we considered the ART family of neural networks due to previous experience, more precisely the ArtMap [14], [15] subgroup. These networks can be trained using supervised learning. Finally, the MF (Membership Function) ArtMap [13], [16] neural network was chosen as the classifier. This type of neural network combines the theory of fuzzy sets with ART theory. The consequence of this combination is a structured output consisting of the computed membership values of the input for every found fuzzy cluster of every known class. This way, it is possible to compute how much the input belongs to every class. The input is classified into the class represented by the winner fuzzy cluster, i.e. the cluster whose membership value is maximal in the output vector.
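The winner-take-all output stage can be sketched as follows. The Gaussian-shaped membership function used here is only an illustrative assumption; the exact membership function of MF ArtMap is defined in [13]:

```python
# Sketch of the MF ArtMap output stage: compute the membership of an input
# to every fuzzy cluster of every class and pick the winner cluster.
# The Gaussian-shaped membership is an illustrative assumption; the actual
# membership function of MF ArtMap is defined in [13].
import math

def membership(x, center, width):
    """Membership of input vector x in one fuzzy cluster (assumed Gaussian)."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def classify(x, clusters):
    """clusters: list of (class_label, center, width) tuples.
    Returns the class of the winner fuzzy cluster, i.e. the cluster
    with maximal membership in the output vector, and that membership."""
    memberships = [(membership(x, c, w), label) for label, c, w in clusters]
    best_mu, best_label = max(memberships)
    return best_label, best_mu
```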
3 Cloud-based Image Classification – Software as a Service
We divided the service into two parts: the Cloud-based Feature Extraction (CFE) and the Cloud-based Classification (CCL). In this section we describe the feature extraction part, which we have already implemented as a service; in the following text, we will use the abbreviation CFE.
Fig. 2. CFE architecture overview. On the left (a), architecture version 1; on the right (b), architecture version 2
3.1 CFE Architecture Version 1
Our first architecture design used dedicated roles for extraction and for image preprocessing. Since the image preprocessing is the same for both extractors, it seemed fitting to have it scaled automatically based on the actual load; for the same reason, a separate worker role was created for each extractor. The communication with the user is handled by another web role. The inter-role communication was implemented with Azure Queues, which, compared to Service Bus Queues, have less overhead and are faster. In the queue message, we send the unique identification of the image. The workflow in this architecture was as follows:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both).
2. The image is stored in the blob storage, and the unique Id of the image is put into the queue for image preprocessing.
3. The image preprocessing role accesses the image and rewrites it with a normalized version (scaled down if too big and converted to shades of gray; a sketch of this step is given below). The Id of the preprocessed image is then put into the queues of the selected extractor services.
4. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in the blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
5. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML-formatted document).

The schema of the architecture can be seen on the left side of Fig. 2. This architecture had a drawback in terms of speed, as can be seen in Table 1 and Table 2.
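The normalization in step 3 can be sketched as follows. The concrete size limit is an illustrative placeholder, not the value used by the deployed role:

```python
# Sketch of the image preprocessing role: downscale large images and
# convert them to grayscale. The 1024-pixel limit is a placeholder.
import cv2

MAX_SIDE = 1024  # hypothetical size limit

def normalize_image(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # shades of gray
    h, w = gray.shape
    scale = MAX_SIDE / max(h, w)
    if scale < 1.0:  # scale down only if the image is too big
        gray = cv2.resize(gray, (int(w * scale), int(h * scale)),
                          interpolation=cv2.INTER_AREA)
    return gray
```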
3.2 CFE Architecture Version 2
In the second architecture design, we made changes to speed up the feature extraction process. As can be seen from Table 1 and Table 2, there is a significant amount of time when the service is literally doing nothing; it just waits for the sleep cycle to complete before checking the queue for new messages. Since architecture 1 used three queues (one feeding the other two through the image preprocessing role), we decided to move the image preprocessing into the extraction roles, thereby eliminating the first queue and one worker role. This decision was also supported by the fact that image preprocessing was the least time-consuming operation in the cycle. With the elimination of one worker role, the workflow in architecture 2 changed to:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both).
2. The image is stored in the blob storage, and the unique Id of the image is put into the queue of the selected extractor.
3. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in the blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
4. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML-formatted document).

The schema of the architecture can be seen on the right side of Fig. 2. This architecture was quicker than the first; the measurement results can be seen in Table 3 and Table 4. The speed-up is between 18 and 32%. Currently we are optimizing the code to further speed up the extraction process.
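To make the queue-driven control flow of architecture 2 concrete, the sketch below shows an extractor worker loop using the Azure Storage SDK for Python. Our actual roles are implemented in .NET, so the queue and container names and the connection string are illustrative assumptions; the 2-second sleep cycle matches the setting used in our measurements:

```python
# Sketch of an architecture-2 extractor worker: poll the queue for image Ids,
# download the blob, then preprocess and extract in one role. Queue/container
# names and the connection string are illustrative assumptions; the deployed
# roles are implemented in .NET.
import time
from azure.storage.queue import QueueClient
from azure.storage.blob import BlobServiceClient

CONN = "<storage-connection-string>"
queue = QueueClient.from_connection_string(CONN, "sift-extraction")  # assumed name
blobs = BlobServiceClient.from_connection_string(CONN)

while True:
    for msg in queue.receive_messages():
        image_id = msg.content                     # unique Id sent by the web role
        blob = blobs.get_blob_client("images", image_id)
        data = blob.download_blob().readall()
        # ... preprocess + extract local features, store the descriptors with
        # an extractor prefix, update the Azure Table cross-references ...
        queue.delete_message(msg)                  # done, remove from the queue
    time.sleep(2)  # the 2-second sleep cycle used in our measurements
```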
3.3 Measured Speed Results for the CFE Architectures
For testing, we used 20 images of varying size and complexity: the smallest had a resolution of 0.16 Mpx (megapixels) and the biggest 10.84 Mpx. Five of the images were above Full HD resolution. The batch of images can be considered small, but at this stage we use it only for validating the design and for rough speed tweaking of the service. After deployment, the testing will be more rigorous, with a bigger sample size. We also measured the cloud service running in the local emulator, so that the two environments can be compared. Even in the local emulator, however, we were using the live (unemulated) cloud storage; only the roles were run locally. The infrastructure consisted of Small compute instances for all roles, and the sleep cycle for the worker roles was set to 2 seconds. We will experiment with these settings in later stages of the research. The following tables show the measured times taken by the service. The "Time for user" column shows the time between clicking the upload button and seeing the result on the page. The "Sum of time taken by tasks" column shows the total time actually consumed by the roles to compute the result. The last two columns show the time for extracting the local features and storing them.

Table 1. Measurements of the CFE architecture 1 – speed on the local emulator
          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:02         2.0450                           435.1242               710.4513
Max       0:04:39         21.4839                          8196.7643              12731.7996
Average   0:00:20         5.0472                           1860.4650              2287.6808
Median    0:00:05         3.3764                           896.0543               1229.4251
Table 2. Measurements of the CFE architecture 1 – speed in the cloud environment

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:01         0.9759                           197.9281               354.9850
Max       0:00:15         11.9751                          5967.6276              8374.6194
Average   0:00:04         2.6114                           1007.3908              1690.6716
Median    0:00:03         1.7334                           473.1524               1074.7474
Table 3. Measurements of the CFE architecture 2 – speed on the local emulator

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:00         1.3733                           170.0301               211.0119
Max       0:00:10         12.2926                          3121.1854              5058.4078
Average   0:00:03         3.0056                           632.8390               942.6149
Median    0:00:02         2.3446                           369.7710               586.2928
Table 4. Measurements of the CFE architecture 2 – speed in the cloud environment

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:00         0.7403                           156.2778               169.7089
Max       0:00:11         10.0929                          3578.1217              7811.9474
Average   0:00:03         1.9533                           686.7257               1358.1596
Median    0:00:02         1.3111                           349.4731               788.2077
4 Cloud-based MF ArtMap Classifier
The second part of the proposed system is the classifier (CCL), implemented as Software as a Service. Once the image descriptors are extracted, they are propagated to the classifier, which classifies the object in the picture into one of the known classes, or creates a new class if the object does not fit any of the known ones. As described in Section 2.2, we chose the MF ArtMap neural network [13], [16] from the ArtMap subgroup [14], [15] of ART networks; it combines the theory of fuzzy sets with ART theory and outputs the membership values of the input for every found fuzzy cluster of every known class. We implemented the MF ArtMap classifier as a separate cloud service. This makes the proposed system more modular and allows any classifier to be combined with any image descriptor extractor to reach the best results. The MF ArtMap neural network is implemented as a data structure: all values of the MF ArtMap classifier, as well as the trained classes, the relevant clusters and their settings, are stored in a cloud table in the cloud data store.
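A minimal sketch of how one trained fuzzy cluster could be persisted in an Azure Table is given below; the table name and entity layout are our illustrative assumptions, not the exact schema used by the service:

```python
# Sketch: persist one fuzzy cluster of the MF ArtMap network in an Azure
# Table. Table name and property layout are illustrative assumptions.
import json
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>",
                                           table_name="mfartmapclusters")

def save_cluster(class_label, cluster_id, center, width):
    table.upsert_entity({
        "PartitionKey": class_label,      # one partition per known class
        "RowKey": str(cluster_id),        # cluster Id within the class
        "Center": json.dumps(center),     # serialized cluster parameters
        "Width": width,
    })
```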
Fig. 3. Graphical description of the training problem
During the experiments, we encountered a problem with training on new images (Fig. 3). The extractor service (CFE) extracts a different number of descriptors for every input image; this number depends on factors such as the size of the input image and the number of key points detected in it. At the same time, the MF ArtMap neural network expects an input vector of constant dimension. Therefore, we decided to train the MF ArtMap network sequentially, with every descriptor as a separate input. Once all descriptors of the input image have been propagated through the MF ArtMap neural network, we obtain a vector of the membership values of all input descriptors to all clusters and all classes. At this point we are able to statistically classify the input image into one of the known classes, or to create a new class if no match was found.
Fig. 4. Modification of MF ArtMap topology for sequential input
The described solution required a modification of the MF ArtMap topology, presented in Fig. 4. A layer called the stack of winner fuzzy clusters has been added. As a consequence, the output of the neural network is not just one winning fuzzy cluster determining the input class, but a set of winner fuzzy clusters. After all descriptors have been propagated through the first three layers, the content of the stack is propagated to the output layer, where the winner fuzzy clusters are statistically evaluated and the class of the input set of descriptors is determined.
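A sketch of this added output stage follows: every descriptor is propagated separately, its winner fuzzy cluster is pushed onto the stack, and the image class is then determined by a statistical evaluation of the stack. Plain majority voting and the membership cutoff are illustrative assumptions for that evaluation:

```python
# Sketch of the stack-of-winner-fuzzy-clusters layer: classify each
# descriptor separately, collect the winners, then evaluate statistically.
# Majority voting and the cutoff are illustrative assumptions.
from collections import Counter

def classify_image(descriptors, winner_fn, threshold=0.5):
    """winner_fn(d) -> (class_label, membership) returns the winner fuzzy
    cluster for one descriptor (e.g. classify() from the Section 2.2 sketch)."""
    stack = []  # the stack of winner fuzzy clusters, one entry per descriptor
    for d in descriptors:
        label, mu = winner_fn(d)
        if mu >= threshold:        # assumed cutoff; below it, no cluster matches
            stack.append(label)
    if not stack:
        return None                # no match found: a new class would be created
    winner_class, _ = Counter(stack).most_common(1)[0]
    return winner_class
```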
4.1 Proof of Concept Experiment
In our experiments with MF ArtMap on the cloud, we created the architecture shown in Fig. 5. The Nao robot captures an image and sends it to the control application on a computer. This Windows Forms application relays the image to the cloud service for processing: the features are extracted by the CFE service and passed to the MF ArtMap classifier. The result of the classification is then sent back to the control application, which relays it to the robot. After a successful classification, the robot speaks the resulting class of the object in the captured image.
Fig. 5. High-level architecture of the system used as a proof of concept
The experiments were done on two sets of images. Set 1 consisted of logos and simple objects; set 2 contained images of more complex objects. Both sets were divided 60/40 between the learning and testing phases. For comparison, we used different types of features: SIFT, SURF and spectral RGB features of the image. The results of the experiments are shown in Table 5. The basic intention was to observe the behavior of the CFE on different types of images and how their classification accuracy could be influenced by the number of features identified in those different types of images. The number of clusters and the generalization ability of the MF ArtMap classifier were also observed and taken into consideration. The incrementality of the MF ArtMap classification approach is a major advantage, since additional classes do not require retraining of the neural network; they are simply processed incrementally in the feature space.

Table 5. Proof of concept – results of the classification using two sets of data
                               SET 1                        SET 2
Classification precision       SIFT     SURF     RGB        SIFT     SURF     RGB
Training set                   100.0%   100.0%   90.0%      100.0%   100.0%   91.2%
Testing set                    70.0%    65.0%    70.0%      65.2%    65.2%    56.5%
Representative set             85.0%    82.5%    80.0%      82.6%    82.6%    73.9%
Number of found clusters       2161     3075     798        2165     2895     681
Generalization of Neural Net   0.223    0.491    0.999      0.149    0.423    0.998
The classification results above represent average classification rates, which were previously evaluated in more detail on contingency tables. The number of clusters and the generalization are correlated, since the ideal case is to have few clusters in the classification, although this also depends on the processed data.
5 Cloud-based Robotics with Brain-like Approaches
Our intermediate goal is to use the gained knowledge to implement MF ArtMap as a service. The proof of concept presented in this paper used the basic MF ArtMap structure, which was not modified for the cloud infrastructure. The synaptic weights of the cloud version of MF ArtMap will therefore have to be stored separately. This will allow easy duplication of the trained neural network, or moving the application to a more powerful cloud server if there is demand for it. The scaling will then be done by the platform without human intervention, thereby making the object recognition service robust. MF ArtMap will need to be adapted further for the task of object recognition using feature descriptors, as the number of descriptors varies from object to object. The proof of concept used batch learning, which provided rather unsatisfying results; therefore, we are working on a modification of the MF ArtMap input layer that will allow all the descriptors to be input at once. In the near future we plan to add to this framework some brain-like approaches, mainly from the repository maintained by the PhysioDesigner project [17]. We believe that implementing hybrid approaches that combine selected methods of Artificial or Computational Intelligence with brain-like and more biologically inspired systems can lead to more accurate results in the cloud-based framework for robots. The current testing platform is the NAO humanoid robot, and we expect to extend this activity to the Pepper humanoid platform next year.
6 Conclusion
We have presented results of a cloud-based system for object recognition usable for the humanoid robot NAO. We believe that further work on this approach can be useful for multi-robot platforms, and we expect a hybridization of classical AI approaches with brain-like approaches for the benefit of cloud-based robotic intelligence. We expect problems with the standardization of knowledge databases, including the fact that domain-oriented knowledge will be preferable and easier to implement than universal knowledge. The learning procedure is also expected to be incremental and domain-oriented, and we do not think that a universal learning approach will succeed in the near future.

Acknowledgment: This paper is the result of the project implementation University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF.
References
[1] J. J. Kuffner, "Cloud-enabled robots," in IEEE-RAS International Conference on Humanoid Robotics, 2010.
[2] G. Mohanarajah, D. Hunziker, R. D'Andrea, and M. Waibel, "Rapyuta: A Cloud Robotics Platform," IEEE Trans. Autom. Sci. Eng., pp. 1–13, 2014.
[3] D. Lorenčík, M. Tarhaničová, and P. Sinčák, "Cloud-Based Object Recognition: A System Proposal," in Robot Intelligence Technology and Applications 2, vol. 274, J.-H. Kim, E. T. Matson, H. Myung, P. Xu, and F. Karray, Eds. Cham: Springer International Publishing, 2014, pp. 707–715.
[4] T. Ferraté, "Cloud Robotics – new paradigm is near," Robotica Educativa y Personal, 20-Jan-2013.
[5] P. Mell and T. Grance, "The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology," NIST Spec. Publ., vol. 145, p. 7, 2011.
[6] J. Giardino, J. Haridas, and B. Calder, "How to get most out of Windows Azure Tables." [Online]. Available: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx.
[7] "RoboEarth Project." [Online]. Available: http://www.roboearth.org/. [Accessed: 20-Mar-2014].
[8] "Google Goggles." [Online]. Available: http://www.google.com/mobile/goggles/#text. [Accessed: 20-Mar-2014].
[9] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, pp. 1150–1157, vol. 2.
[10] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in European Conference on Computer Vision, 2006, pp. 404–417.
[11] A. Bodnárová, "The MF-ARTMAP neural network," in Latest Trends in Applied Informatics and Computing, 2012, pp. 264–269.
[12] P. Smolár, "Object Categorization using ART Neural Networks," Technical University of Kosice, 2012.
[13] P. Sinčák, M. Hric, and J. Vaščák, "Membership Function-ARTMAP Neural Networks," TASK Q., vol. 7, no. 1, pp. 43–52, 2003.
[14] G. A. Carpenter, "Default ARTMAP," Boston, 2003.
[15] N. Kopco, P. Sincak, and S. Kaleta, "ARTMAP Neural Networks for Multispectral Image Classification," J. Adv. Comput. Intell., vol. 4, no. 4, pp. 240–245, 2000.
[16] P. Sinčák, M. Hric, and J. Vaščák, "Neural Network Classifiers Based on Membership Function ARTMAP," in Systematic Organisation of Information in Fuzzy Systems, P. Melo-Pinto, H.-N. Teodorescu, and T. Fukuda, Eds. IOS Press, 2003, pp. 321–333.
[17] "PhysioDesigner." [Online]. Available: http://physiodesigner.org/.