Cloud-Based Image Recognition for Robots

Jaroslav ONDO¹, Daniel LORENCIK¹, Peter SINCAK¹, Hiroaki WAGATSUMA²

¹ Department of Cybernetics and Artificial Intelligence, Technical University of Kosice
{jaroslav.ondo, daniel.lorencik, peter.sincak}@tuke.sk
² Department of Human Intelligence Systems, Kyushu Institute of Technology, Kitakyushu, Japan
[email protected]
Abstract. This paper deals with the Cloud Robotics approach, which is strongly supported by recent advances in Cloud Computing. We present an early implementation of a system for cloud-based object recognition. The primary purpose of the system is to provide object recognition as a service to a wide range of devices. The main benefits of using the cloud as a platform are easy future scalability and, above all, the sharing of already collected knowledge among all devices using the system. The system consists of a feature extraction part and a classification part. SIFT and SURF are used for feature extraction, and MF ArtMap for classification. We present the implementation of both parts in more detail, together with preliminary results. We assume that Cloud Robotics and brain research for robots will, in the near future, merge into a functional system able to share and utilize common knowledge while also allowing personalization.

Keywords: cloud computing, cloud robotics, SIFT, SURF, MF ArtMap, brain-like systems
1 Introduction
Cloud Computing was introduced to the IT domain many years ago. Its impact on Intelligent Robotics came only recently, when the concept of Cloud Robotics entered the field [1], [2]. We believe that Cloud Robotics should include the implementation of Artificial Intelligence on the cloud, and that this technology can bring major changes to core Artificial Intelligence tasks such as pattern recognition, where the representation set for learning changes continuously. The learning approach is therefore expected to be incremental, and brain-like inspirations can play an important role in the resulting system. Crowdsourcing, as well as multi-source information about brain functioning, can improve the resulting accuracy of robotic intelligence.
2 Cloud Based Framework for Cloud Robotics
We have discussed the system proposal in greater detail in [3]. The proposed system is based on the notion of an AI Brick [4] – a well-defined system suited for one task, in this case object recognition. Since the system uses Microsoft Azure as the cloud platform, the inherited cloud capabilities allow for easy scalability in case of increased demand on the service, for easy deployment of new versions (as the main logic is provided as a cloud service) and, most importantly, for knowledge acquisition and sharing from all of the connected clients. An important feature of the system is that it places no special requirements on the devices that use it. The only requirements are the ability to capture images and to send them over an Internet connection to the service.
2.1 Cloud Computing Platform and Technological Aspects
As already mentioned, the system is built on the PaaS (Platform as a Service) model [5] provided by Microsoft Azure. Since our system is intended to be a cloud service, we adopted the modular architecture of Azure cloud services: user interfaces are created as web roles hosted on virtual machines of variable computing power using ASP.NET, and background jobs are created as worker roles hosted on dedicated virtual servers of variable computing power. These interact via the Message Bus and Queues. The image data are kept in blob storage, as are the descriptors extracted from them. To create a truly cloud-based service, we use the No-SQL Azure Tables [6] instead of SQL-like databases for cross-referencing the image data, the extracted descriptors and the classification data. From the high-level architecture proposal in Fig. 1 it can be seen that only the image is sent over the Internet to the cloud service as input data. The required preprocessing and feature extraction are done on the cloud. This approach certainly creates a problem in terms of speed, as uploading an image is a time-consuming operation. However, it is necessary in order to achieve the normalized feature space required for object classification, and it also makes the resulting service more widely available, as we do not require any special software on the device for the communication. In the final stage of the service development, we will implement a REST-like API for use in other scenarios (in line with the AI Brick notion). Such a service can then be utilized in many applications, most notably in cloud robotics. An example is the RoboEarth project [7], which is able to use existing cloud image recognition services such as Google Goggles [8].
Fig. 1. High-level architecture of the proposed system [3]
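To illustrate that no special software is required on the client side, the following minimal sketch shows how any device with a camera and an Internet connection could submit an image to the service over HTTP. The endpoint URL and form field names are hypothetical placeholders, since the final REST-like API is still to be implemented:

```python
# Minimal client sketch: upload an image to the recognition service.
# NOTE: the endpoint URL and form field names are hypothetical placeholders,
# not the deployed API of the service.
import requests

SERVICE_URL = "https://example-recognition.cloudapp.net/api/images"  # hypothetical

def submit_image(path, extractor="both"):
    """POST an image file and the chosen extractor (sift, surf or both)."""
    with open(path, "rb") as f:
        response = requests.post(
            SERVICE_URL,
            files={"image": f},
            data={"extractor": extractor},
        )
    response.raise_for_status()
    return response.json()  # e.g. the image Id, later the classification result

if __name__ == "__main__":
    print(submit_image("snapshot.jpg", extractor="sift"))
```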
2.2 Research Approaches used in Proposal
Image processing is an important part of information acquisition for robots, and its feature space can take many forms. We have chosen spectral features as well as derived descriptors as features for the pattern recognition procedure. We use SIFT (Scale Invariant Feature Transform) [9] and SURF (Speeded-Up Robust Features) [10] for feature extraction, and Membership Function ArtMap [11]–[13] together with a Gaussian classifier for object classification. The main research goal of our work is to adapt these approaches to the cloud environment and to find out which extractor-classifier combination provides the best results. We chose these two classifiers because MF ArtMap represents a model-free classifier, whereas the Gaussian classifier is model-dependent; comparing the two is one of our research goals. We anticipate the challenge of adapting the classifier methods to the cloud environment. The goal of the proposed system is to provide a stable service for all connected devices regardless of their actual number; in other words, the service has to be scalable. In terms of cloud computing this means that the virtual machines forming the underlying infrastructure of the service can be rebooted, shut down or started at any time, so the system itself has to be built in a way that reflects these conditions. We also compare these classification methods to simple matching, which can prove faster under certain conditions (up to a certain number of entries in the table storage). Another anticipated challenge is how to work effectively with the large data sets we expect to amass during the experiments and the eventual publication of the service for public use.
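For illustration, the local-feature extraction step can be sketched with OpenCV. Our implementation is a .NET cloud service, so the Python code below is only an assumption-laden illustration; note also that SURF is a non-free algorithm and requires an opencv-contrib build with the non-free modules enabled:

```python
# Sketch of SIFT/SURF feature extraction with OpenCV (illustrative only;
# the deployed service uses its own .NET implementation).
import cv2

def extract_features(image_path, method="sift"):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if method == "sift":
        detector = cv2.SIFT_create()  # 128-dimensional descriptors
    else:
        # SURF needs opencv-contrib built with OPENCV_ENABLE_NONFREE;
        # it produces 64-dimensional descriptors by default.
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    return keypoints, descriptors

kp, desc = extract_features("object.jpg", method="sift")
print(len(kp), "keypoints extracted")
```

Note that the number of extracted keypoints, and hence descriptors, varies per image; this property becomes important for the classifier design in Section 4.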
As one of the classifiers, and the one used in the proof-of-concept experiment, we considered the ART family of neural networks due to previous experience, more precisely the ArtMap [14], [15] subgroup. These networks can be trained using supervised learning. Finally, the MF (Membership Function) ArtMap [13], [16] neural network was chosen as the classifier. This type of neural network combines the theory of fuzzy sets with ART theory. The consequence of this combination is a structured output consisting of the computed membership values of the input for every found fuzzy cluster of every known class. This way, it is possible to compute how much the input belongs to every class. The input is classified into the class represented by the winner fuzzy cluster, i.e. the cluster whose membership value is maximal in the output vector.
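The winner-take-all output stage can be sketched as follows. The Gaussian-shaped membership function used here is only an illustrative assumption; the exact membership function of MF ArtMap is defined in [13]:

```python
# Sketch of the MF ArtMap output stage: compute the membership of an input
# to every fuzzy cluster of every class and pick the winner cluster.
# The Gaussian-shaped membership is an illustrative assumption; the actual
# membership function of MF ArtMap is defined in [13].
import math

def membership(x, center, width):
    """Membership of input vector x in one fuzzy cluster (assumed Gaussian)."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def classify(x, clusters):
    """clusters: list of (class_label, center, width) tuples.
    Returns the class of the winner fuzzy cluster, i.e. the cluster
    with maximal membership in the output vector, and that membership."""
    memberships = [(membership(x, c, w), label) for label, c, w in clusters]
    best_mu, best_label = max(memberships)
    return best_label, best_mu
```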
3 Cloud-based Image Classification – Software as a Service
We divided the service into two parts: the Cloud-based Feature Extraction (CFE) and the Cloud-based Classification (CCL). In this section we describe the feature extraction part, which we have already implemented as a service; in the following text, we will use the abbreviation CFE.
Fig. 2. CFE architecture overview. On the left (a), architecture version 1; on the right (b), architecture version 2
3.1 CFE Architecture Version 1
Our first architecture design used dedicated roles for extraction and for image preprocessing. Since the image preprocessing is the same for both extractors, it seemed fitting to have it scaled automatically based on the actual load; for the same reason, a separate worker role was created for each extractor. The communication with the user is handled by another web role. The inter-role communication was implemented with Azure Queues, which, compared to Service Bus Queues, have less overhead and are faster. In the queue message, we send the unique identification of the image. The workflow in this architecture was as follows:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both).
2. The image is stored in the blob storage, and the unique Id of the image is put into the queue for image preprocessing.
3. The image preprocessing role accesses the image and rewrites it with a normalized version (scaled down if too big and converted to shades of gray; a sketch of this step is given below). The Id of the preprocessed image is then put into the queues of the selected extractor services.
4. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in the blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
5. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML-formatted document).

The schema of the architecture can be seen on the left side of Fig. 2. This architecture had a drawback in terms of speed, as can be seen in Table 1 and Table 2.
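The normalization in step 3 can be sketched as follows. The concrete size limit is an illustrative placeholder, not the value used by the deployed role:

```python
# Sketch of the image preprocessing role: downscale large images and
# convert them to grayscale. The 1024-pixel limit is a placeholder.
import cv2

MAX_SIDE = 1024  # hypothetical size limit

def normalize_image(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # shades of gray
    h, w = gray.shape
    scale = MAX_SIDE / max(h, w)
    if scale < 1.0:  # scale down only if the image is too big
        gray = cv2.resize(gray, (int(w * scale), int(h * scale)),
                          interpolation=cv2.INTER_AREA)
    return gray
```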
3.2 CFE Architecture Version 2
In the second architecture design, we made changes to speed up the feature extraction process. As can be seen from Table 1 and Table 2, there is a significant amount of time when the service is literally doing nothing; it just waits for the sleep cycle to complete before checking the queue for new messages. Since architecture 1 used three queues (one feeding the other two through the image preprocessing role), we decided to move the image preprocessing into the extraction roles, thereby eliminating the first queue and one worker role. This decision was also supported by the fact that image preprocessing was the least time-consuming operation in the cycle. With the elimination of one worker role, the workflow in architecture 2 changed to:

1. The user uploads the image via the web page and chooses the extractor (SIFT, SURF or both).
2. The image is stored in the blob storage, and the unique Id of the image is put into the queue of the selected extractor.
3. The extractor role accesses the image in the storage by its unique Id and extracts local features, which are then stored in the blob storage with an extractor prefix and the image Id. The image and its extracted features are also written to the Azure Table, in which the relations between objects are kept.
4. The web page with the result table is updated and shows the uploaded preprocessed image along with the extracted features (available as an XML-formatted document).

The schema of the architecture can be seen on the right side of Fig. 2. This architecture was quicker than the first; the measurement results can be seen in Table 3 and Table 4. The speed-up is between 18 and 32%. Currently we are optimizing the code to further speed up the extraction process.
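To make the queue-driven control flow of architecture 2 concrete, the sketch below shows an extractor worker loop using the Azure Storage SDK for Python. Our actual roles are implemented in .NET, so the queue and container names and the connection string are illustrative assumptions; the 2-second sleep cycle matches the setting used in our measurements:

```python
# Sketch of an architecture-2 extractor worker: poll the queue for image Ids,
# download the blob, then preprocess and extract in one role. Queue/container
# names and the connection string are illustrative assumptions; the deployed
# roles are implemented in .NET.
import time
from azure.storage.queue import QueueClient
from azure.storage.blob import BlobServiceClient

CONN = "<storage-connection-string>"
queue = QueueClient.from_connection_string(CONN, "sift-extraction")  # assumed name
blobs = BlobServiceClient.from_connection_string(CONN)

while True:
    for msg in queue.receive_messages():
        image_id = msg.content                     # unique Id sent by the web role
        blob = blobs.get_blob_client("images", image_id)
        data = blob.download_blob().readall()
        # ... preprocess + extract local features, store the descriptors with
        # an extractor prefix, update the Azure Table cross-references ...
        queue.delete_message(msg)                  # done, remove from the queue
    time.sleep(2)  # the 2-second sleep cycle used in our measurements
```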
3.3 Measured Speed Results for the CFE Architectures
For testing, we used 20 images of varying size and complexity: the smallest had a resolution of 0.16 Mpx (megapixels) and the biggest 10.84 Mpx. Five of the images were above Full HD resolution. The batch of images can be considered small, but at this stage we use it only for validating the design and for rough speed tweaking of the service. After deployment, the testing will be more rigorous, with a bigger sample size. We also measured the cloud service running in the local emulator, so that the two environments can be compared. Even in the local emulator, however, we were using the live (unemulated) cloud storage; only the roles were run locally. The infrastructure consisted of Small compute instances for all roles, and the sleep cycle for the worker roles was set to 2 seconds. We will experiment with these settings in later stages of the research. The following tables show the measured times taken by the service. The "Time for user" column shows the time between clicking the upload button and seeing the result on the page. The "Sum of time taken by tasks" column shows the total time actually consumed by the roles to compute the result. The last two columns show the time for extracting the local features and storing them.

Table 1. Measurements of the CFE architecture 1 – speed on the local emulator
          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:02         2.0450                           435.1242               710.4513
Max       0:04:39         21.4839                          8196.7643              12731.7996
Average   0:00:20         5.0472                           1860.4650              2287.6808
Median    0:00:05         3.3764                           896.0543               1229.4251
Table 2. Measurements of the CFE architecture 1 – speed in the cloud environment

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:01         0.9759                           197.9281               354.9850
Max       0:00:15         11.9751                          5967.6276              8374.6194
Average   0:00:04         2.6114                           1007.3908              1690.6716
Median    0:00:03         1.7334                           473.1524               1074.7474
Table 3. Measurements of the CFE architecture 2 – speed on the local emulator

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:00         1.3733                           170.0301               211.0119
Max       0:00:10         12.2926                          3121.1854              5058.4078
Average   0:00:03         3.0056                           632.8390               942.6149
Median    0:00:02         2.3446                           369.7710               586.2928
Table 4. Measurements of the CFE architecture 2 – speed in the cloud environment

          Time for user   Sum of time taken by tasks [s]   SIFT extraction [ms]   SURF extraction [ms]
Min       0:00:00         0.7403                           156.2778               169.7089
Max       0:00:11         10.0929                          3578.1217              7811.9474
Average   0:00:03         1.9533                           686.7257               1358.1596
Median    0:00:02         1.3111                           349.4731               788.2077
4 Cloud-based MF ArtMap Classifier
The second part of the proposed system is the classifier (CCL), implemented as Software as a Service. Once the image descriptors are extracted, they are propagated to the classifier, which classifies the object in the picture into one of the known classes, or creates a new class if the object does not fit any of the known ones. As described in Section 2.2, we chose the MF ArtMap neural network [13], [16] from the ArtMap subgroup [14], [15] of ART networks; it combines the theory of fuzzy sets with ART theory and outputs the membership values of the input for every found fuzzy cluster of every known class. We implemented the MF ArtMap classifier as a separate cloud service. This makes the proposed system more modular and allows any classifier to be combined with any image descriptor extractor to reach the best results. The MF ArtMap neural network is implemented as a data structure: all values of the MF ArtMap classifier, as well as the trained classes, the relevant clusters and their settings, are stored in a cloud table in the cloud data store.
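A minimal sketch of how one trained fuzzy cluster could be persisted in an Azure Table is given below; the table name and entity layout are our illustrative assumptions, not the exact schema used by the service:

```python
# Sketch: persist one fuzzy cluster of the MF ArtMap network in an Azure
# Table. Table name and property layout are illustrative assumptions.
import json
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>",
                                           table_name="mfartmapclusters")

def save_cluster(class_label, cluster_id, center, width):
    table.upsert_entity({
        "PartitionKey": class_label,      # one partition per known class
        "RowKey": str(cluster_id),        # cluster Id within the class
        "Center": json.dumps(center),     # serialized cluster parameters
        "Width": width,
    })
```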
Fig. 3. Graphical description of the training problem
During the experiments, we encountered a problem with training on new images (Fig. 3). The extractor service (CFE) extracts a different number of descriptors for every input image; this number depends on factors such as the size of the input image and the number of key points detected in it. At the same time, the MF ArtMap neural network expects an input vector of constant dimension. Therefore, we decided to train the MF ArtMap network sequentially, with every descriptor as a separate input. Once all descriptors of the input image have been propagated through the MF ArtMap neural network, we obtain a vector of the membership values of all input descriptors to all clusters and all classes. At this point we are able to statistically classify the input image into one of the known classes, or to create a new class if no match was found.
Fig. 4. Modification of MF ArtMap topology for sequential input
The described solution required a modification of the MF ArtMap topology, presented in Fig. 4. A layer called the stack of winner fuzzy clusters has been added. As a consequence, the output of the neural network is not just one winning fuzzy cluster determining the input class, but a set of winner fuzzy clusters. After all descriptors have been propagated through the first three layers, the content of the stack is propagated to the output layer, where the winner fuzzy clusters are statistically evaluated and the class of the input set of descriptors is determined.
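A sketch of this added output stage follows: every descriptor is propagated separately, its winner fuzzy cluster is pushed onto the stack, and the image class is then determined by a statistical evaluation of the stack. Plain majority voting and the membership cutoff are illustrative assumptions for that evaluation:

```python
# Sketch of the stack-of-winner-fuzzy-clusters layer: classify each
# descriptor separately, collect the winners, then evaluate statistically.
# Majority voting and the cutoff are illustrative assumptions.
from collections import Counter

def classify_image(descriptors, winner_fn, threshold=0.5):
    """winner_fn(d) -> (class_label, membership) returns the winner fuzzy
    cluster for one descriptor (e.g. classify() from the Section 2.2 sketch)."""
    stack = []  # the stack of winner fuzzy clusters, one entry per descriptor
    for d in descriptors:
        label, mu = winner_fn(d)
        if mu >= threshold:        # assumed cutoff; below it, no cluster matches
            stack.append(label)
    if not stack:
        return None                # no match found: a new class would be created
    winner_class, _ = Counter(stack).most_common(1)[0]
    return winner_class
```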
4.1 Proof of Concept Experiment
In our experiments with MF ArtMap on the cloud, we created the architecture shown in Fig. 5. The Nao robot captures an image and sends it to the control application on a computer. This Windows Forms application relays the image to the cloud service for processing: the features are extracted by the CFE service and passed to the MF ArtMap classifier. The result of the classification is then sent back to the control application, which relays it to the robot. After a successful classification, the robot speaks the resulting class of the object in the captured image.
Fig. 5. High-level architecture of the system used as a proof of concept
The experiments were done on two sets of images. Set 1 consisted of logos and simple objects; set 2 contained images of more complex objects. Both sets were divided 60/40 between the learning and testing phases. For comparison, we used different types of features: SIFT, SURF and spectral RGB features of the image. The results of the experiments are shown in Table 5. The basic intention was to observe the behavior of the CFE on different types of images and how their classification accuracy could be influenced by the number of features identified in those different types of images. The number of clusters and the generalization ability of the MF ArtMap classifier were also observed and taken into consideration. The incrementality of the MF ArtMap classification approach is a major advantage, since additional classes do not require retraining of the neural network; they are simply processed incrementally in the feature space.

Table 5. Proof of concept – results of the classification using two sets of data
                               SET 1                        SET 2
Classification precision       SIFT     SURF     RGB        SIFT     SURF     RGB
Training set                   100.0%   100.0%   90.0%      100.0%   100.0%   91.2%
Testing set                    70.0%    65.0%    70.0%      65.2%    65.2%    56.5%
Representative set             85.0%    82.5%    80.0%      82.6%    82.6%    73.9%
Number of found clusters       2161     3075     798        2165     2895     681
Generalization of Neural Net   0.223    0.491    0.999      0.149    0.423    0.998
The classification results above represent average classification rates, which were previously evaluated in more detail on contingency tables. The number of clusters and the generalization are correlated, since the ideal case is to have few clusters in the classification, although this also depends on the processed data.
5 Cloud-based Robotics with Brain-like Approaches
Our intermediate goal is to use the gained knowledge to implement MF ArtMap as a service. The proof of concept presented in this paper used the basic MF ArtMap structure, which was not modified for the cloud infrastructure. The synaptic weights of the cloud version of MF ArtMap will therefore have to be stored separately. This will allow easy duplication of the trained neural network, or moving the application to a more powerful cloud server if there is demand for it. The scaling will then be done by the platform without human intervention, thereby making the object recognition service robust. MF ArtMap will need to be adapted further for the task of object recognition using feature descriptors, as the number of descriptors varies from object to object. The proof of concept used batch learning, which provided rather unsatisfying results; therefore, we are working on a modification of the MF ArtMap input layer that will allow all the descriptors to be input at once. In the near future we plan to add to this framework some brain-like approaches, mainly from the repository maintained by the PhysioDesigner project [17]. We believe that implementing hybrid approaches that combine selected methods of Artificial or Computational Intelligence with brain-like and more biologically inspired systems can lead to more accurate results in the cloud-based framework for robots. The current testing platform is the NAO humanoid robot, and we expect to extend this activity to the Pepper humanoid platform next year.
6 Conclusion
We have presented results of a cloud-based system for object recognition usable for the humanoid robot NAO. We believe that further work on this approach can be useful for multi-robot platforms, and we expect a hybridization of classical AI approaches with brain-like approaches for the benefit of cloud-based robotic intelligence. We expect problems with the standardization of knowledge databases, including the fact that domain-oriented knowledge will be preferable and easier to implement than universal knowledge. The learning procedure is also expected to be incremental and domain-oriented, and we do not think that a universal learning approach will succeed in the near future.

Acknowledgment: This paper is the result of the project implementation University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF.
References
[1] J. J. Kuffner, "Cloud-enabled robots," in IEEE-RAS International Conference on Humanoid Robotics, 2010.
[2] G. Mohanarajah, D. Hunziker, R. D'Andrea, and M. Waibel, "Rapyuta: A Cloud Robotics Platform," IEEE Trans. Autom. Sci. Eng., pp. 1–13, 2014.
[3] D. Lorenčík, M. Tarhaničová, and P. Sinčák, "Cloud-Based Object Recognition: A System Proposal," in Robot Intelligence Technology and Applications 2, vol. 274, J.-H. Kim, E. T. Matson, H. Myung, P. Xu, and F. Karray, Eds. Cham: Springer International Publishing, 2014, pp. 707–715.
[4] T. Ferraté, "Cloud Robotics – new paradigm is near," Robotica Educativa y Personal, 20-Jan-2013.
[5] P. Mell and T. Grance, "The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology," NIST Spec. Publ., vol. 145, p. 7, 2011.
[6] J. Giardino, J. Haridas, and B. Calder, "How to get most out of Windows Azure Tables." [Online]. Available: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx.
[7] "RoboEarth Project." [Online]. Available: http://www.roboearth.org/. [Accessed: 20-Mar-2014].
[8] "Google Goggles." [Online]. Available: http://www.google.com/mobile/goggles/#text. [Accessed: 20-Mar-2014].
[9] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, pp. 1150–1157, vol. 2.
[10] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in European Conference on Computer Vision, 2006, pp. 404–417.
[11] A. Bodnárová, "The MF-ARTMAP neural network," in Latest Trends in Applied Informatics and Computing, 2012, pp. 264–269.
[12] P. Smolár, "Object Categorization using ART Neural Networks," Technical University of Kosice, 2012.
[13] P. Sinčák, M. Hric, and J. Vaščák, "Membership Function-ARTMAP Neural Networks," TASK Q., vol. 7, no. 1, pp. 43–52, 2003.
[14] G. A. Carpenter, "Default ARTMAP," Boston, 2003.
[15] N. Kopco, P. Sincak, and S. Kaleta, "ARTMAP Neural Networks for Multispectral Image Classification," J. Adv. Comput. Intell., vol. 4, no. 4, pp. 240–245, 2000.
[16] P. Sinčák, M. Hric, and J. Vaščák, "Neural Network Classifiers Based on Membership Function ARTMAP," in Systematic Organisation of Information in Fuzzy Systems, P. Melo-Pinto, H.-N. Teodorescu, and T. Fukuda, Eds. IOS Press, 2003, pp. 321–333.
[17] "PhysioDesigner." [Online]. Available: http://physiodesigner.org/.