
A Secured Distributed OCR System in a Pervasive Environment with Authentication as a Service in the Cloud

Hamdi Hassen*, **
Maher Khemakhem**, ***

*Computer Science Department, College of Science and Arts at Al Ola, Taibah University, KSA
**Mir@cl Lab, FSEGS, University of Sfax, BP 1088, 3018 Sfax, Tunisia, (216) 74 278 777
***Computer Science Department, Faculty of Computing and Technology, University of King Abdulaziz, KSA

Abstract: In this paper we explore the potential for securing a distributed Arabic Optical Character Recognition (OCR) system via cloud computing technology in a pervasive and mobile environment. The goal of the system is to achieve full accuracy, high speed and security when handling large vocabularies and large volumes of documents. This issue is addressed by integrating the recognition process and the security mechanisms with multiprocessing and distributed computing technologies.

Our idea is, on the one hand, to use cloud computing as an infrastructure (IaaS) to deploy our OCR application, and cloud storage as a service to deploy the two training and learning databases. On the other hand, we use Authentication as a Service (AaaS) in the Software as a Service (SaaS) cloud model to secure access to the distributed OCR system (the OCR application and the databases).

Experimental results showed that: (i) cloud computing technology is very promising for enhancing the performance of a distributed Arabic OCR system; (ii) authentication and Identity and Access Management (IAM) systems can be costly and complex, but AaaS brings the cost benefits of SaaS to authentication, and advances in cloud technology make it possible to manage two-factor authentication and to incorporate pervasive devices, tasks that used to be time-consuming; and (iii) transferring these burdens to a cloud provider is an appealing option.

Keywords: Distributed OCR system, Security, Authentication, Cloud computing

I. INTRODUCTION

Since the early nineties, many distributed OCR systems have been proposed. Most of these systems process large amounts of documents and report high recognition rates and high speedup factors. This is mainly due to the flexibility of the existing distributed pattern recognition schemes implemented across different architectural platforms and network environments, giving high capability for large-scale recognition systems. However, they overlooked a very important factor in obtaining an efficient OCR system: security. The growing adoption of distributed platforms such as cloud computing in pervasive and mobile environments for distributed OCR systems has increased the risks associated with Identity and Access Management (IAM) systems and with how they grant access to corporate resources and OCR cloud applications.

To resolve the security problem of our distributed OCR system, we use the Software as a Service (SaaS) cloud computing model, and in particular Security as a Service (SecaaS), for security management, through the application of Authentication as a Service (AaaS) in the cloud.

The paper begins with an introduction to the concepts of distributed OCR systems and pervasive environments. A survey of the security techniques used in existing pervasive and mobile systems, together with the problem statement, is presented in the third section. An overview of our approach to solving this problem is presented in the fourth section. The design of the experiments, the experimental results and discussions are presented in Section V. The conclusions and future work are discussed in Section VI.

II. DISTRIBUTED OCR SYSTEM AND PERVASIVE COMPUTING

A. Distributed OCR system

A distributed Optical Character Recognition (OCR) system is one of the proposed solutions for implementing large-scale pattern recognition systems [1]. A distributed OCR system can be defined as an extension of existing pattern recognition approaches in which the recognition process is delegated across distributed architectures and platforms such as P2P networks [2], data grids [3] and cloud computing technologies [4].

Most approaches to distributed OCR have focused, on the one hand, on integrating and combining complementary approaches, algorithms and techniques that can lead to an acceptable recognition rate and, on the other hand, on the hardware infrastructure able to host such complex and resource-hungry software so that it completes its task in a reasonable time [5]. Consequently, this creates a high dependency on the hardware implementation. Hence, the issues of scalability and accuracy in this context have yet to be solved. This is mainly due to the inflexibility of the existing distributed pattern recognition schemes, which are hard to implement across different architectural platforms and network environments while providing high capability for large-scale recognition deployments [6]. Several distributed OCR systems have been proposed, such as the Australian Newspaper Digitisation Program [7], OCRopus [8] and OCRGrid [9]. Figure 1 presents the OCRGrid platform.

Figure 1. OCRGrid platform

B. Pervasive computing

With new technologies and advanced, people-oriented computing paradigms, ubiquitous, mobile and pervasive computing has become a new and active field of computer science. Pervasive or ubiquitous computing refers to an environment composed of a wide variety of devices and equipment that carry out information processing tasks to serve the needs of human users by connecting to different networks [10]. Devices and equipment in a pervasive environment are not simple PCs as we usually think of them. They are very small, often invisible devices that can exist in many different forms, such as laptops, computers, tablets, terminals and phones, or be embedded in almost any object we can imagine. Representative examples include tools, cars, trains, clothing and various consumer goods, all communicating and interconnected via networks. Figure 2 presents an example of a pervasive system. According to Dan Russell, director of the User Sciences and Experience Group at IBM's Almaden Research Center, by 2015 computing will have become so naturalized within the environment that people will not even realize that they are using computers.

Figure 2. Pervasive system

In recent years, the fast development of embedded technology, WLAN technology and mobile computing technology has put pervasive computing at the forefront of research. Achieving an efficient pervasive computing environment still faces many issues, one of which is security.

III. SECURITY IN EXISTING DISTRIBUTED OCR SYSTEM

A. Distributed OCR system security needs in a pervasive system

Security deals with protection from intentional attacks by third parties. Security objectives can be formulated as confidentiality, integrity, non-repudiation, availability, anonymity and authenticity. In a pervasive environment, a secured distributed OCR system has some additional constraints and needs. Representative needs include decentralization, interoperability and interaction, trust propagation, traceability and non-repudiation, autonomy, transparency and reactivity, flexibility, and privacy protection.

B. Techniques of security in existing distributed OCR systems

The security of a distributed OCR system in a pervasive environment is a key factor for the acceptance of the technologies emerging in these environments. The ubiquity of devices around users must bring them useful and applicable services according to their needs, either reactively (after they have expressed an intention) or proactively (by anticipating needs). We surveyed the security technologies used in the existing distributed OCR systems mentioned in Section II.A.

Passwords continue to be a way for users to authenticate themselves on equipment such as a PC. The Personal Identification Number (PIN) is frequently used for authentication on mobile phones. Biometric techniques are increasingly used; fingerprint (FP) methods in particular are now used by devices and equipment to identify users. For objects without biometric characteristics, Radio Frequency Identification (RFID) can be used in automatic identification processes for authentication.

Integrity is guaranteed by computing a checksum of the data. Manipulation can be discovered by comparing the checksum to a reference value. Checksums are typically generated using hash functions, which transform data of any length into a fixed-length value that is practically unique to the input.

Encryption is applied to guarantee confidentiality. There are two essential types of encryption: symmetric and asymmetric. The first uses the same secret key for encryption and decryption. The second uses two keys: a public key for encryption and a private key for decryption. Non-repudiation is achieved with digital signatures, generally based on asymmetric cryptography (see above).

In contrast to the previous security objectives, cryptographic techniques alone cannot assure anonymity. However, Frank Stajano, Reader in Security and Privacy and a member of the Digital Technology Group and the Security Group, has described a protocol that enables anonymous bidding by different parties at an auction. With his Resurrecting Duckling security policy model, Stajano has developed a scheme for connecting devices to one another without requiring a trusted third party. Since it can work with resource-limited devices, this model is well suited for use in distributed computing [11].

Secure Sockets Layer (SSL) and its successor, Transport Layer Security (TLS), are applied to guarantee confidentiality in communication between devices. The Internet Protocol (IP), the most widely used protocol for communication between digital devices over different network technologies, has been extended with the IPsec security standard, which supports authentication and encryption at the IP packet level. Special resource-conserving versions of the IP stack, such as the uIP stack developed by Adam Dunkels and the Swedish Institute of Computer Science, enable IP to be used in the world of pervasive computing [12].

Concerning routing, the assignment of identifiers is key to availability. Indeed, a well-chosen set of identifiers makes it possible to moderate node access and to control access to resources. It also appears that too much freedom in routing-table entries poses security problems: an adversary can occupy a non-negligible part of the table, even when node identifiers are secured.

Other, newer security techniques are based on identity management [13], such as RADIUS [14], OpenID [15], Liberty Alliance [16] and WS-Security [17], or on privilege management, such as Akenti [18], PERMIS [19], CAS [20] and VOMS [21].
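The checksum-based integrity check described in this subsection can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function, which the paper does not specify; the function names and the sample payload are hypothetical and are not part of the surveyed systems.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, reference: str) -> bool:
    """Recompute the checksum and compare it with the stored reference value."""
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(checksum(data), reference)


# Hypothetical payload standing in for a scanned page sent to the OCR service
page = b"scanned page bytes"
reference = checksum(page)          # stored when the page is first received
tampered = page + b" (modified)"    # any change to the data alters the checksum
```

Here `is_intact(page, reference)` succeeds while `is_intact(tampered, reference)` fails, which is the comparison against a reference value that the text refers to. A plain hash only detects modification; it does not by itself authenticate who produced the data.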

C. Weaknesses of security techniques in existing distributed OCR systems

Based on the state of the art of security techniques in existing distributed OCR systems, we can conclude the following.

On the one hand, the approaches adopted in existing pervasive systems to secure communication and information exchange are mostly based on traditional techniques such as cryptography, digital signatures, access control, routing control and trusted third parties. These techniques limit the exploitation of the inherent properties of distributed systems and pervasive networks (distribution, redundancy, ...) and cope poorly with changes in the environment such as the appearance or removal of devices, breakdowns, the compromise or contamination of machines, and changes of location or equipment.

On the other hand, the security approaches used in pervasive environments do not address security at the heart of the architecture. The evolution of network technologies and the appearance of varied applications in terms of services and resources give rise to new security issues for which existing solutions and mechanisms are inadequate for identification and authentication. In a mobile, distributed and pervasive system, a uniform and centralized security management mechanism is not an option. Hence, it is necessary to give more autonomy to security systems by offering them mechanisms that permit dynamic and flexible collaboration and communication between the devices and equipment of the system.

Pervasive systems are very dynamic in terms of time and location. In such systems, it is therefore desirable to adapt security approaches dynamically, according to the state of the network. This dynamic adaptation makes it possible to distribute control as the network evolves, whatever the number of users and devices in the system.

IV. DISTRIBUTED OCR SYSTEM WITH AUTHENTICATION AS A SERVICE IN THE CLOUD

In order to build a secure and trusted distributed OCR system, the designer must address several security requirements. Recent studies have shown that combining distributed systems with shared-secret mechanisms increases both the confidentiality and the availability of data. Today, distributed systems such as cloud computing technologies are primarily used to deliver services as software, platform and infrastructure. In the cloud computing model, Security as a Service (SecaaS, sometimes also abbreviated SaaS) offers managed security services over the Wide Area Network (WAN). SecaaS is a particular kind of SaaS that is limited to essential security services. Our objective is to study how to use cloud computing technology to secure our OCR system in a pervasive and mobile environment, and hence to allow such systems to adapt dynamically to the evolution of the network (network, machines, ...) and to the needs of users (sharing, file exchange, distributed processing, ...).

We propose to secure our distributed OCR system using the SecaaS model in the cloud. Our study is based on using Authentication as a Service (AaaS) in the cloud for the Identity and Access Management (IAM) system. This application allows any device or entity to authenticate in order to access the pervasive network. Such an entity may be a human user, a piece of equipment or another server. The AaaS application can be configured on a dedicated PC, an access point, a central server or a LAN switch. Figure 3 describes our idea.

A Single Sign-On (SSO) system is the AaaS service used to let users authenticate to the distributed OCR application. First, the SSO process authenticates the end user for all the protected services (the OCR application and the two learning and training databases) to which they have been given rights. Second, SSO dynamically reduces user rights and further prompts when users switch services within a given open session [22].
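The single sign-on behaviour described above (one authentication, then access to every protected service the user has rights to) can be sketched as follows. This is a toy in-memory model under stated assumptions, not the authors' implementation or any provider's API: the class `SsoService`, the service names and the plaintext credential store are all hypothetical and kept deliberately simple.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SsoService:
    """Toy SSO authority: a single login yields a token valid for several services."""
    credentials: Dict[str, str]   # user -> password (plaintext only for brevity)
    rights: Dict[str, Set[str]]   # user -> names of services the user may access
    sessions: Dict[str, str] = field(default_factory=dict)  # token -> user

    def login(self, user: str, password: str) -> Optional[str]:
        """Authenticate once; return a session token, or None on failure."""
        expected = self.credentials.get(user)
        if expected is None:
            return None
        if not secrets.compare_digest(expected.encode(), password.encode()):
            return None
        token = secrets.token_hex(16)
        self.sessions[token] = user
        return token

    def authorize(self, token: str, service: str) -> bool:
        """Check, without prompting again, whether the session may use a service."""
        user = self.sessions.get(token)
        return user is not None and service in self.rights.get(user, set())


# Hypothetical user with rights on the OCR application and one of the databases
sso = SsoService(
    credentials={"alice": "s3cret"},
    rights={"alice": {"ocr-app", "training-db"}},
)
token = sso.login("alice", "s3cret")
```

With this setup, `sso.authorize(token, "ocr-app")` and `sso.authorize(token, "training-db")` both succeed from the same login, while a service outside the user's rights, such as `"learning-db"` here, is refused. A production deployment would delegate these checks to the AaaS provider.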

The Authentication as a Service (AaaS) model is a cloud computing service based on the Security Assertion Markup Language (SAML) standard, which provides protocols for communicating security information about users and for configuring authentication components such as secure tokens. SAML is based on three essential components: assertions, bindings and protocols. Authentication, attribute and authorization are the three kinds of assertion [23]. First, the authentication assertion allows the security system to validate the user's identity. Second, the attribute assertion carries specific information about the user. Third, the authorization assertion defines what the user is authorized to do. The binding component defines how SAML message exchanges are mapped onto transport protocols such as the Simple Object Access Protocol (SOAP). The protocol component defines how SAML requests and receives assertions. Choosing the AaaS provider is a very important step, since many AaaS providers offer the same core authentication functionality but differ in their other security management techniques.
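To make the three kinds of assertion concrete, the sketch below models them as plain Python data rather than as SAML XML. It is only an illustration under stated assumptions: the field names, the five-minute lifetime, the issuer URL and the `is_permitted` helper are hypothetical, and a real deployment would rely on a SAML library and on signed XML assertions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Set


@dataclass
class Assertion:
    """Simplified stand-in for a SAML assertion about one subject."""
    subject: str
    issuer: str
    authn_instant: datetime                                        # authentication: when identity was validated
    attributes: Dict[str, str] = field(default_factory=dict)       # attribute: information about the user
    permitted: Dict[str, Set[str]] = field(default_factory=dict)   # authorization: resource -> allowed actions
    lifetime: timedelta = timedelta(minutes=5)                      # assumed validity window


def is_permitted(a: Assertion, resource: str, action: str,
                 now: datetime, trusted_issuers: Set[str]) -> bool:
    """Accept a request only if the issuer is trusted, the assertion is fresh,
    and the authorization part allows the action on the resource."""
    if a.issuer not in trusted_issuers:
        return False
    if not (a.authn_instant <= now <= a.authn_instant + a.lifetime):
        return False
    return action in a.permitted.get(resource, set())


issued = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
assertion = Assertion(
    subject="alice",
    issuer="https://idp.example.org",      # hypothetical identity provider
    authn_instant=issued,
    attributes={"role": "operator"},
    permitted={"ocr-app": {"submit"}, "training-db": {"read"}},
)
trusted = {"https://idp.example.org"}
```

For instance, `is_permitted(assertion, "ocr-app", "submit", issued + timedelta(minutes=1), trusted)` is accepted, whereas the same request ten minutes after `issued`, or one from an untrusted issuer, is rejected.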

Figure 3. A Secured distributed OCR system with Authentication as a service (AaaS) in the cloud

V. THE EXPERIMENTAL STUDY

A. Experimental environment

Our experimental study is an extension of earlier work published at the 2013 AASRI Conference on Parallel and Distributed Computing and Systems, in which we distributed our OCR application using cloud computing technologies. The extension consists in introducing a security component into the distributed OCR application, using authentication as a service in the cloud. The datasets and experimental environment are summarized below. For the Arabic handwriting recognition application, the normalized IFN/ENIT database [24] is used as the dataset for the training and learning databases. The wavelet transform [25] is used as the feature extraction technique. The hybrid K-NN/SVM approach [26] is applied as the classification technique.

In order to demonstrate the impact of the distributed architecture on the speedup and scalability of the OCR application, we used:

• MapReduce [27] to parallelize the Arabic handwriting OCR, which processes large datasets on different computers (nodes) in a distributed architecture.

• The Hadoop distributed infrastructure [27] to process large-scale data and to share a large workload efficiently across different machines.

• The Cascading framework [27] to develop the data analytics and data management system.

Based on a survey of cloud computing providers [28], we first used Amazon Elastic Compute Cloud (EC2) [27] to implement our distributed OCR application on the three standard Amazon EC2 instance types: the Small instance (1.7 GB of memory, 160 GB of instance storage, 32-bit platform), then the Large instance (7.5 GB of memory, 850 GB of instance storage, 64-bit platform), and lastly the Extra Large instance (15 GB of memory, 1690 GB of instance storage, 64-bit platform). Second, Amazon Simple Storage Service (S3) [27] is used to manage the input and output data. Finally, the Amazon cloud, and especially its SaaS model, is used as a cloud service with Authentication as a Service to demonstrate the security of our distributed OCR system. The application programming interfaces (APIs) provided by Amazon EC2 are used to incorporate the Single Sign-On (SSO) authentication service into our distributed OCR application.

The experimental setup consists of two steps. The first step is the setup and configuration of the experimental environment, and the second is running the application on a real cloud computing platform such as Amazon EC2. In the second step, once our application runs successfully in Hadoop, it is submitted to Amazon EC2, after signing up for Amazon EC2 through the AWS (Amazon Web Services) Management Console, which users access through their corporate identity and authorization system.

B. Results and Discussion

The speedup factor, which is the ratio of the execution time in sequential mode on a single processor to the execution time on multiple processors, is the measure used to analyze our experiments. Results show:

• Cloud computing technology is a framework that speeds up our OCR application: with 100 cores on Large Amazon EC2 instances, the execution time falls to 425 seconds and the speedup factor reaches 68. This result is very interesting, because in this case our proposed OCR system is able to recognize more than 1400 characters per second, whereas our sequential system recognizes only 19 characters per second. Table I describes the results.

TABLE I. DISTRIBUTED OCR SYSTEM PERFORMANCES

                                  Sequential system   Distributed system (Amazon EC2 instances)
                                  1 computer          small      medium     large
Speedup factor                    1.2                 63         65         68
Number of characters per second   19                  1300       1370       1400
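The speedup factor used in this subsection can be written out explicitly. The paper does not report the sequential execution time directly; the value below is only what the reported speedup of 68 and the reported parallel time of 425 seconds would imply, assuming both figures refer to the same workload. The symbols T_1 (single-processor time), T_p (time on p processors) and S(p) are notation introduced here.

```latex
S(p) = \frac{T_1}{T_p},
\qquad
T_1 = S(p)\,T_p = 68 \times 425\ \mathrm{s} = 28\,900\ \mathrm{s} \approx 8\ \mathrm{h}
```

Under that assumption, a recognition job of roughly eight hours on a single processor completes in about seven minutes on the 100-core configuration.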

Improving data privacy and increasing compliance are typically the two security performance objectives when adopting AaaS in our distributed OCR application via cloud technology. Two metrics are applied to evaluate performance in these two areas: the Percentage of Non-Encrypted Traffic (PNET), which measures the share of traffic that is not encrypted, and the Percentage of Managed Nodes (PMN). Results show that:

• The PNET in our distributed OCR application via cloud computing technology is adequate and provides an acceptable level of data security.

• The PMN metric increases, which can help to identify and reduce risk. These results are a relevant consideration for achieving a high level of compliance with standards.

• IAM systems can be complex and costly, but Authentication as a Service in the cloud brings the cost benefits of Software as a Service to the authentication application.
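The two metrics named above can be computed as simple percentages. The paper gives neither their formulas nor measured values, so the definitions below (non-encrypted bytes over total bytes, managed nodes over total nodes) are an assumed reading of the metric names, and the sample numbers are hypothetical.

```python
def percentage(part: float, total: float) -> float:
    """Return part / total as a percentage, rejecting an empty denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def pnet(non_encrypted_bytes: float, total_bytes: float) -> float:
    """Percentage of Non-Encrypted Traffic (assumed definition); lower is better."""
    return percentage(non_encrypted_bytes, total_bytes)


def pmn(managed_nodes: int, total_nodes: int) -> float:
    """Percentage of Managed Nodes (assumed definition); higher is better."""
    return percentage(managed_nodes, total_nodes)


# Hypothetical measurements, for illustration only
example_pnet = pnet(non_encrypted_bytes=50, total_bytes=1000)   # 5.0
example_pmn = pmn(managed_nodes=9, total_nodes=12)              # 75.0
```

Tracking both values over time, rather than as single readings, is what would support the claims about data security and compliance made in the list above.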



The security problem of the distributed OCR system is resolved by the cloud computing architecture or, in an auxiliary manner, by limiting access to the OCR application and databases from pervasive devices using an access authentication technique. PaaS allows pervasive applications to adapt to the environment using distributed components specified by contract. A secured distributed OCR system based on AaaS in the cloud provides a framework that simplifies the migration and composition of applications in order to share data and applications in a secured pervasive environment. In conclusion, and according to the above experiments and analysis, we confirm the growing trend toward hosted, managed security as a service in distributed OCR systems, and why the cloud is quickly becoming the preferred tool for securing Internet access over traditional solutions such as software and appliances.

VI. CONCLUSION AND PERSPECTIVE

Using the cloud computing paradigm to resolve the processing and storage capacity problems on the one hand, and the security issue on the other hand, allowed us to develop our approach in a mobile and pervasive environment. We think that this approach allows us to:

• Answer the general needs of a distributed OCR system in a pervasive system, such as scalability, speedup, security, modularity, dynamicity, evolution and efficiency.

• Exploit the features of the pervasive network.

• Guarantee certain levels of simple communication.

The proposed design approach requires further investigation. In particular, we plan to deploy our OCR application in a multi-cloud architecture and compare its performance with existing ones.

ACKNOWLEDGMENT

The authors wish to thank the reviewers for their fruitful comments. The authors also wish to acknowledge the members of the Miracl Laboratory, Sfax, Tunisia.

REFERENCES

[1] H.C. Choi and S.Y. Oh, "Efficient Human-like Memory Management based on Walsh-based Associative Memory for Real-time Pattern Recognition", IJCNN, 2006, pp. 3657-3663.
[2] F. Cappello, P2P : Developpement et recentes perspectives, 2005.
[3] B. Rajukumar, A Gentle Introduction to Grid Computing and Technologies, CSI Communication, 2005.
[4] G. Kulkarni, R. Sutar, J. Gambhir, "Cloud computing: a study of infrastructure as a service", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 2, Issue 1, Jan-Feb 2012, pp. 117-125.
[5] H.C. Choi and S.Y. Oh, "Efficient human-like memory management based on Walsh-based associative memory for real-time pattern recognition", IJCNN, 2006, pp. 3657-3663.
[6] H. Al-Hertani and J. Ilow, "Pattern recognition based detection and localization in a network of randomly distributed sensor nodes", ISDA '05, Computer Society, Washington, DC, USA, pp. 412-419, 2005.
[7] R. Holley, "How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs", D-Lib Magazine, vol. 15, no. 3/4, October 2009.
[8] T. M. Breuel, "The OCRopus open source OCR system", in Proc. SPIE Document Recognition and Retrieval XV, pages 0F1-0F15, San Jose, CA, USA, Jan. 2008.
[9] H. Goto, "OCRGrid: A Platform for Distributed and Cooperative OCR Systems", The 18th International Conference on Pattern Recognition (ICPR'06).
[10] J. Zhou, J. Riekki and J. Sun, "Pervasive Service Computing toward Accommodating Service Coordination and Collaboration", Proc. Fourth International Conference on Frontier of Computer Science and Technology (FCST '09), pp. 686-691, 2009.
[11] F. Stajano, Security for Ubiquitous Computing, Electrical & Electronics Engineering, Communication Technology - Networks, pages 8-59, 2002.
[12] J. Xingguo, Q. Yulin, Y. Jiancheng, "A Method to Streamline the TCP/IP Protocol Stack at Embedded Systems", 2010 International Conference of Information Science and Management Engineering (ISME), pp. 386-389, 2010.
[13] M. Nadir, Djedid, "A trust-based security mechanism for nomadic users in pervasive systems", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 5, No 1, September 2012.
[14] C. Rigney, A. Rubens, W. Simpson, S. Willens, "RFC 2138: Remote Authentication Dial In User Service (RADIUS)", 1997.
[15] D. Recordon, D. Reed, "OpenID 2.0: a platform for user-centric identity management", The 2nd ACM Workshop on Digital Identity Management, pp. 11-16, ACM Press, Virginia, USA, 2006.
[16] M. Alsaleh, C. Adams, "Enhancing Consumer Privacy in the Liberty Alliance Identity Federation and Web Services Frameworks", Workshop on Privacy Enhancing Technologies, pp. 59-77, Cambridge, UK, 2006.
[17] OASIS: Organization for the Advancement of Structured Information Standards, "Web Services Security Specification", WS-Security, 2004.
[18] M. Thompson, W. Johnston, S. Mudumbai, G. Hoo, K. Jackson, A. Essiari, "Certificate-based access control for widely distributed resources", 8th USENIX Security Symposium (SSYM), p. 17, USENIX Association, Berkeley, USA, 1999.
[19] D.W. Chadwick, A. Otenko, "The PERMIS X.509 role based privilege management infrastructure", Future Generation Computer Systems Journal, Vol. 19, No. 2, pp. 277-289, 2003.
[20] L. Pearlman, V. Welch, I. Foster, C. Kesselman, S. Tuecke, "A Community Authorization Service for Group Collaboration", POLICY '02: 3rd International Workshop on Policies for Distributed Systems and Networks, pp. 50-59, IEEE Computer Society, Washington, USA, 2002.
[21] R. Alfieri, R. Cecchini, V. Ciaschini, L. Dell'Agnello, A. Frohner, A. Gianoli, K. Lörentey, F. Spataro, "VOMS, an Authorization System for Virtual Organizations", European Across Grids Conference, pp. 33-40, Verlag, Spain, 2004.
[22] R. Wang, S. Chen, X. Wang, "Signing Me onto Your Accounts through Facebook and Google: A Traffic-Guided Security Study of Commercially Deployed Single-Sign-On Web Services", pp. 365-379, 2012.
[23] F. Kohlar, J. Schwenk, "Secure Bindings of SAML Assertions to TLS Sessions", pp. 62-69, 2010.
[24] M. Pechwitz, S. S. Maddouri, V. Märgner, N. Ellouze, and H. Amiri, "IFN/ENIT - database of handwritten Arabic words", in Proc. of CIFED 2002, pages 129-136, 2002.
[25] H. Hamdi, M. Khemakhem, "A Comparative study of Arabic handwritten characters invariant feature", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 12, 2011.
[26] C. Zanchettin, B. Bezerra, "A KNN-SVM hybrid model for cursive handwriting recognition", pp. 1-8, 2012.
[27] J. Varia, S. Mathew, Overview of Amazon Web Services, 2013.
[28] R. Prodan and S. Ostermann, "A Survey and Taxonomy of Infrastructure as a Service and Web Hosting Cloud Providers", Proc. Int'l Conf. Grid Computing, pp. 1-10, 2009.
