
WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

Pushpa C N1, Thriveni J1, Venugopal K R1 and L M Patnaik2

1 Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore.
2 Honorary Professor, Indian Institute of Science, Bangalore.
[email protected]

ABSTRACT
Semantic similarity measures play an important role in information retrieval, natural language processing and various tasks on the web such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm [PRA] to compute the semantic similarity between words by combining the page count method and the web snippets method. Four association measures are used to find the semantic similarity between words in the page count method using web search engines. We use a support vector machine (SVM) trained with Sequential Minimal Optimization (SMO) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed approach aims to improve the correlation value, precision, recall, and F-measure compared to existing methods. The proposed algorithm outperforms the existing methods, achieving a correlation value of 0.898 (89.8%).

KEYWORDS
Information Retrieval, Semantic Similarity, Support Vector Machine, Web Mining, Web Search Engine, Web Snippets

Sundarapandian et al. (Eds): ITCS, SIP, CS & IT 09, pp. 01–11, 2013. © CS & IT-CSCP 2013
DOI: 10.5121/csit.2013.3101
Computer Science & Information Technology (CS & IT)

1. INTRODUCTION
Search engines have become the most helpful tool for obtaining useful information from the Internet. Yet the search results returned by even the most popular search engines are often not satisfactory. Users are surprised because they enter the right keywords and the search engine returns pages containing those keywords, but the majority of the results are irrelevant. Developing Web search mechanisms depends on addressing two important questions: (1) how to extract related Web pages of user interest, and (2) given a set of potentially related Web pages, how to rank them according to relevance. To evaluate the effectiveness of a Web search mechanism in finding and ranking results, measures of semantic similarity are needed. In traditional approaches, users provide manual assessments of relevance or semantic similarity, which is difficult and expensive.

The study of semantic similarity between words has been an integral part of information retrieval and natural language processing. Semantic similarity is a concept whereby the terms in a term list are assigned a metric based on the likeness of their meaning. Measuring the semantic similarity between words is an important component in various tasks on the web such as relation extraction, community mining, document clustering and automatic meta-data extraction, and in Web mining applications such as community extraction, relation detection, and entity disambiguation. In information retrieval, one of the main problems is to retrieve a set of documents that is semantically related to a given user query. Efficient estimation of semantic similarity between words is critical for various natural language processing tasks such as Word Sense Disambiguation (WSD), textual entailment and automatic text summarization.

For words in a dictionary, semantic similarity is a solved problem, but on the web it remains a challenging task. For example, “apple” is frequently associated with computers on the Web. However, this sense of “apple” is not listed in most general-purpose thesauri or dictionaries. A user who searches for apple on the Web may be interested in this sense of “apple” and not in “apple” as a fruit. New words are added to the web every day, and existing words acquire multiple meanings, i.e. they become polysemous. Manually maintaining these words is therefore a very difficult task.

We propose a Pattern Retrieval Algorithm to estimate the semantic similarity between words or entities using Web search engines. Due to the vastness of the web, it is impossible to analyze each document separately; Web search engines provide a convenient interface to this vast information. A web search engine gives two important pieces of information about the documents searched: page counts and web snippets. The page count of a query term is an estimate of the number of documents or web pages that contain the given query term. A web snippet is a brief window of text extracted around the query term in the document; it appears below each search result. The page count for two words queried together is widely accepted as a measure of the relatedness between them. For example, the page count of the query apple AND computer in Google is 977,000,000, whereas the same for banana AND computer is only 60,200,000 [as on 20 December 2012].
The more than 16 times larger page count for apple AND computer indicates that apple is more semantically similar to computer than banana is. Page counts have two drawbacks. First, they ignore the position of the two words in the document, so the two words may appear in the same document without being related at all. Second, they include all senses of a polysemous query term; for example, the page count for Dhruv covers both Dhruv as the star of fortune and Dhruv as the name of a helicopter. Processing snippets is also possible for measuring semantic similarity, but it has the drawback of downloading a large number of web pages, which consumes time. Moreover, search engines rank results with a page rank algorithm, so only the top-ranked pages will have properly processed snippets. Hence there is no guarantee that all the information we need is present in the top-ranked snippets.

Motivation: The search results returned by the most popular search engines are not satisfactory. Because of the vast number of documents and the high growth rate of the Web, it is time consuming to analyze each document separately. It is not uncommon for search engines to return many Web page links that have nothing to do with the user’s need. In information retrieval applications such as search engines, the most important use of semantic similarity is to retrieve all the documents that are semantically related to the term queried by the user. Web search engines provide an efficient interface to this vast information, and page counts and snippets are two useful information sources provided by most of them. Even so, accurately measuring the semantic similarity between words is a very challenging task.

Contribution: We propose a Pattern Retrieval Algorithm that computes a supervised semantic similarity measure between words by combining the page count method and the web snippets method.
Four association measures, namely variants of WebDice, WebOverlap, WebJaccard, and WebPMI, are used to find the semantic similarity between words in the page count method using web search engines. The proposed approach aims to improve the correlation value, precision, recall, and F-measure compared to existing methods.

Organization: The remainder of the paper is organized as follows. Section 2 reviews related work on semantic similarity measures between words, Section 3 gives the problem definition, Section 4 gives the architecture of the system, and Section 5 explains the proposed algorithm. The implementation and results are described in Section 6, and conclusions are presented in Section 7.

2. RELATED WORK
Semantic similarity between words has always been a challenging problem in data mining. Nowadays, the World Wide Web (WWW) has become a huge collection of data and documents, with information available for every single user query.

Mehran Sahami et al. [1] propose a method for measuring the similarity between short text snippets that leverages web search results to provide greater context for the short texts. The method captures more of the semantic context of the snippets than simply measuring their term-wise similarity. Hsin-Hsi Chen et al. [2] propose a Web Search with Double Checking (WSDC) model that explores the web as a live corpus; instead of relying on simple web page counts or complex web page collection, the model analyzes snippets. Rudi L. Cilibrasi et al. [3] observe that words and phrases acquire meaning from the way they are used in society and from their relative semantics to other words and phrases. They present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity, which is applicable to all search engines and databases. The authors introduce the notions underpinning the approach (Kolmogorov complexity, information distance, and the compression-based similarity metric) and give a technical description of the Google distribution and the Normalized Google Distance (NGD). Dekang Lin [4] notes that bootstrapping semantics from text is one of the greatest challenges in natural language learning and defines a word similarity measure based on the distributional pattern of words. Jian Pei et al. [5] propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns.
Jiang et al. [6] combine a lexical taxonomy structure with corpus statistical information, so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Philip Resnik [7]-[8] presents a measure of semantic similarity in an is-a taxonomy, based on the notion of information content. Bollegala et al. [9] propose a method that exploits the page counts and text snippets returned by a Web search engine. Ming Li et al. [10] propose a metric based on the non-computable notion of Kolmogorov complexity and call it the similarity metric; it is a general mathematical theory of similarity that uses no background knowledge or features specific to an application area. Ann Gledson et al. [11] describe a simple web-based distributional similarity measure that relies on page counts only, can measure the similarity of entire sets of words in addition to word pairs, and can use any web-service-enabled search engine. T Hughes et al. [12] apply random walk Markov chain theory to measuring lexical semantic relatedness. Dekang Lin [13] presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model. Vincent Schickel-Zuber et al. [14] present a novel approach that allows similarities to be asymmetric while still using only information contained in the structure of the ontology.

This survey of the literature shows that semantic similarity measures play an important role in information retrieval, relation extraction, community mining, document clustering and automatic meta-data extraction. Thus there is a need for a more efficient system to find the semantic similarity between words.

3. PROBLEM DEFINITION
Given two words A and B, we model the problem of measuring the semantic similarity between A and B as one of constructing a function semanticsim(A, B) that returns a value in the range 0 to 1. If A and B are highly similar (e.g. synonyms), we expect the semantic similarity value to be close to 1; otherwise we expect it to be close to 0. We define numerous features that express the similarity between A and B using page counts and snippets retrieved from a web search engine for the two words. Using this feature representation of words, we train a two-class Support Vector Machine (SVM) to classify synonymous and non-synonymous word pairs. Our objectives are:
i) To find the semantic similarity between two words and improve the correlation value.
ii) To improve the Precision, Recall and F-measure metrics of the system.

4. SYSTEM ARCHITECTURE
The outline of the proposed method for finding the semantic similarity using web search engine results is shown in Figure 1.

Figure 1. System Architecture

When a query q is submitted to a search engine, web snippets, which are brief summaries of the search results, are returned to the user. First, we query the word pair in a search engine; for example, we query “cricket” and “sport” in the Google search engine. We get the page count of the word pair along with the page counts of the individual words, i.e. H(cricket), H(sport) and H(cricket AND sport). These page counts are used to compute the co-occurrence measures Web-Jaccard, Web-Overlap, Web-Dice and Web-PMI, and these values are stored for future reference.

We then collect the snippets from the web search engine results. Snippets are collected only for the query X and Y. In the same way, we collect both snippets and page counts for 200 word pairs. Next, we extract patterns from the collected snippets using our proposed algorithm and find the frequency of occurrence of these patterns. The chi-square statistical method is used to identify the good patterns, i.e. the top 200 patterns of interest, from the pattern frequencies. After that we integrate these top 200 patterns with the co-occurrence measures. If a pattern of the word pair exists in the set of good patterns, we record that good pattern with its frequency of occurrence among the patterns of the word pair; otherwise we set the frequency to 0. Hence we get a feature vector with 204 values: the frequencies of the top 200 patterns and the four co-occurrence measures.

We use a support vector machine (SVM) trained with Sequential Minimal Optimization (SMO) to find the optimal combination of page-count-based similarity scores and top-ranking patterns. The SVM is trained to classify synonymous and non-synonymous word pairs. We select synonymous and non-synonymous word pairs and convert the output of the SVM into a posterior probability. We define the semantic similarity between two words as the posterior probability that they belong to the synonymous-words (positive) class.
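The text does not say how the SVM output is converted into a posterior probability. One common choice, assumed here purely for illustration, is a sigmoid fitted to the signed decision value (Platt-style scaling). In the sketch below the function name `posterior_from_margin` and the parameters `a` and `b` are hypothetical; in practice `a` and `b` would be fitted on held-out data rather than fixed.

```python
import math


def posterior_from_margin(margin: float, a: float = -1.0, b: float = 0.0) -> float:
    """Map a signed SVM decision value to a probability in [0, 1].

    Uses the sigmoid 1 / (1 + exp(a * margin + b)). The defaults a=-1, b=0
    are placeholders (an assumption), not values from the paper.
    """
    z = a * margin + b
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    # A pair on the decision boundary gets probability 0.5; a pair far on the
    # positive (synonymous) side approaches 1.
    for m in (-2.0, 0.0, 2.0):
        print(m, round(posterior_from_margin(m), 3))
```

With a negative `a`, larger positive margins map to scores closer to 1, which matches the convention above that the synonymous class is the positive class.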

5. ALGORITHM
The proposed Pattern Retrieval Algorithm, used to measure the semantic similarity between words, is shown in Table 1. Given two words A and B, we query a web search engine using the wildcard query A * * * * * B and download snippets. The * operator matches one word or none in a web page. Therefore, our wildcard query retrieves snippets in which A and B appear within a window of seven words. Because a search engine snippet contains about 20 words on average and includes two fragments of text selected from a document, we assume that the seven-word window is sufficient to cover most relations between two words in snippets. The algorithm in Table 1 shows how to retrieve the patterns and their frequencies.

The pattern retrieval algorithm yields numerous unique patterns, of which 80% occur fewer than 10 times. It is impossible to train a classifier with so many sparse patterns. Because most patterns have a frequency below 10, it is difficult to tell which patterns are significant, so we measure the confidence of each pattern as an indicator of synonymy. We compute the chi-square value to find the confidence of each pattern, using the formula:

$$\chi^2 = \frac{(P + N)\,\bigl(p_v (N - n_v) - n_v (P - p_v)\bigr)^2}{P N\,(p_v + n_v)(P + N - p_v - n_v)} \tag{1}$$

where $P$ and $N$ are the total frequencies of the patterns of synonymous and non-synonymous word pairs, respectively, and $p_v$ and $n_v$ are the frequencies of the pattern $v$ retrieved from snippets of synonymous and non-synonymous word pairs, respectively.
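Equation (1) can be checked numerically with a short sketch. The function names `chi_square` and `top_patterns`, and the convention of returning 0 when the denominator vanishes, are assumptions made for this illustration; the counts in the usage example are invented.

```python
def chi_square(p_v: int, n_v: int, P: int, N: int) -> float:
    """Chi-square confidence of pattern v, following Equation (1).

    p_v, n_v: frequency of v in snippets of synonymous / non-synonymous pairs.
    P, N: total pattern frequency over synonymous / non-synonymous pairs.
    """
    numerator = (P + N) * (p_v * (N - n_v) - n_v * (P - p_v)) ** 2
    denominator = P * N * (p_v + n_v) * (P + N - p_v - n_v)
    if denominator == 0:
        # Degenerate counts (assumption): treat the pattern as uninformative.
        return 0.0
    return numerator / denominator


def top_patterns(counts: dict, P: int, N: int, k: int = 200) -> list:
    """Return the k patterns with the highest chi-square value.

    counts maps each pattern to a (p_v, n_v) tuple.
    """
    scored = sorted(
        counts,
        key=lambda v: chi_square(counts[v][0], counts[v][1], P, N),
        reverse=True,
    )
    return scored[:k]


if __name__ == "__main__":
    # Invented counts: the first pattern is skewed towards synonymous pairs,
    # the second is evenly spread and therefore scores 0.
    demo = {"X also known as Y": (30, 5), "X and Y": (10, 10)}
    print(top_patterns(demo, P=100, N=100, k=1))
```

A pattern that occurs equally often in both classes makes the squared term vanish, so it is ranked below patterns that occur predominantly with one class.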


Table 1. Pattern Retrieval Algorithm [PRA]

Input: Given a set WS of word-pairs
Step 1: Read each snippet S, remove all the non-ASCII characters and store it in the database.
Step 2: for each snippet S do
            if word is same as A then
                Replace A by X
            end if
            if word is same as B then
                Replace B by Y
            end if
        end for
Step 3: for each snippet S do
            if X ∈ S then
                goto Step 4
            end if
            if Y ∈ S then
                goto Step 9
            end if
        end for
Step 4: if Y or Number of words > Max. length L then
            stop the sequence seq.
        end if
Step 5: for each seq do
            Perform stemming operation.
        end for
Step 6: Form the sub-sequences of the sequence such that each sub-sequence contains [X . . . Y . .].
Step 7: for each subseq do
            if subseq is same as existing pattern and unique then
                list_pat = list_pat + subseq
                freq_pat = freq_pat + 1
            end if
        end for
Step 8: if length exceeds L then
            Discard the pattern until you find an X or Y.
        end if
Step 9: if you encounter Y then
            goto Step 4.
        end if
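The steps of Table 1 can be read in more than one way, so the following is only a loose, simplified interpretation rather than the authors' implementation. It assumes single-word A and B, skips the stemming of Step 5, and records one pattern per occurrence of X or Y: the shortest token run that starts at one placeholder and ends at the other within a window of `max_len` words. The name `extract_patterns` and its parameters are hypothetical.

```python
import re
from collections import Counter


def extract_patterns(snippet: str, a: str, b: str, max_len: int = 7) -> Counter:
    """Count X...Y and Y...X patterns in one snippet (simplified sketch)."""
    # Step 1: drop non-ASCII characters, then tokenize into lowercase words.
    text = snippet.encode("ascii", "ignore").decode("ascii").lower()
    tokens = re.findall(r"[a-z0-9]+", text)

    # Step 2: replace the query words by the placeholders X and Y.
    a, b = a.lower(), b.lower()
    tokens = ["X" if t == a else "Y" if t == b else t for t in tokens]

    patterns = Counter()
    for i, tok in enumerate(tokens):
        if tok not in ("X", "Y"):
            continue
        other = "Y" if tok == "X" else "X"
        # Steps 3-9 (simplified): scan forward until the other placeholder is
        # found, the same placeholder repeats, or the window is exhausted.
        for j in range(i + 1, min(i + max_len, len(tokens))):
            if tokens[j] == tok:
                break
            if tokens[j] == other:
                patterns[" ".join(tokens[i : j + 1])] += 1
                break
    return patterns


if __name__ == "__main__":
    print(extract_patterns("Cricket is a popular sport in India", "cricket", "sport"))
```

Summing the returned counters over all snippets of a word pair gives the per-pair pattern frequencies that the chi-square selection operates on.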

6. IMPLEMENTATION AND RESULTS
6.1. Page-count-based Co-occurrence Measures
We compute four popular co-occurrence measures, Jaccard, Overlap (Simpson), Dice, and Pointwise Mutual Information (PMI), to compute semantic similarity using page counts.


The WebJaccard coefficient between words (or multi-word phrases) A and B is defined as:

$$\mathrm{WebJaccard}(A, B) = \begin{cases} 0 & \text{if } H(A \cap B) \le c, \\ \dfrac{H(A \cap B)}{H(A) + H(B) - H(A \cap B)} & \text{otherwise.} \end{cases} \tag{2}$$

WebOverlap, a natural modification of the Overlap (Simpson) coefficient, is defined as:

$$\mathrm{WebOverlap}(A, B) = \begin{cases} 0 & \text{if } H(A \cap B) \le c, \\ \dfrac{H(A \cap B)}{\min\bigl(H(A), H(B)\bigr)} & \text{otherwise.} \end{cases} \tag{3}$$

WebDice is defined as:

$$\mathrm{WebDice}(A, B) = \begin{cases} 0 & \text{if } H(A \cap B) \le c, \\ \dfrac{2\,H(A \cap B)}{H(A) + H(B)} & \text{otherwise.} \end{cases} \tag{4}$$

WebPMI is defined as:

$$\mathrm{WebPMI}(A, B) = \begin{cases} 0 & \text{if } H(A \cap B) \le c, \\ \log_2\left(\dfrac{H(A \cap B)/N}{\bigl(H(A)/N\bigr)\,\bigl(H(B)/N\bigr)}\right) & \text{otherwise.} \end{cases} \tag{5}$$
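Equations (2) to (5) translate directly into code. The sketch below takes the three page counts H(A), H(B) and H(A ∩ B) as plain integers. The text does not give values for the threshold c or for N, so the defaults `c=5` and `n=10**10` are placeholder assumptions, and the function names are chosen for this illustration only.

```python
import math


def web_jaccard(h_a: int, h_b: int, h_ab: int, c: int = 5) -> float:
    """Equation (2): co-occurrence relative to the union of page counts."""
    if h_ab <= c:
        return 0.0
    return h_ab / (h_a + h_b - h_ab)


def web_overlap(h_a: int, h_b: int, h_ab: int, c: int = 5) -> float:
    """Equation (3): co-occurrence relative to the rarer word."""
    if h_ab <= c:
        return 0.0
    return h_ab / min(h_a, h_b)


def web_dice(h_a: int, h_b: int, h_ab: int, c: int = 5) -> float:
    """Equation (4): twice the co-occurrence over the sum of page counts."""
    if h_ab <= c:
        return 0.0
    return 2 * h_ab / (h_a + h_b)


def web_pmi(h_a: int, h_b: int, h_ab: int, n: int = 10**10, c: int = 5) -> float:
    """Equation (5): log2 of the joint probability over the product of marginals."""
    if h_ab <= c:
        return 0.0
    return math.log2((h_ab / n) / ((h_a / n) * (h_b / n)))


if __name__ == "__main__":
    # Invented page counts, for illustration only.
    h_a, h_b, h_ab = 100, 200, 50
    print(web_jaccard(h_a, h_b, h_ab), web_overlap(h_a, h_b, h_ab))
    print(web_dice(h_a, h_b, h_ab), web_pmi(h_a, h_b, h_ab, n=1000))
```

The threshold guard comes first in every function, so a word pair with too few co-occurrences scores 0 on all four measures before any division takes place.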

We have implemented this in the Java programming language, using Eclipse, an extensible open source IDE (Integrated Development Environment) [15]. We query for A AND B, collect 500 snippets for each word pair (A, B), and store them in the database. Using the Pattern Retrieval Algorithm, we retrieve a very large number of patterns and select only the top 200 based on their chi-square values (χ²); these are called good patterns. We then compare each good pattern with the patterns generated by the given word pair. If a pattern extracted for the word pair is one of the good patterns, we store that good pattern with a unique ID, together with the frequency with which the word pair generated it. If a pattern does not match, we store it in the table with a unique ID and a frequency of 0.


To the same table we add the four values of the Web-Jaccard coefficient, Web-Overlap, the Web-Dice coefficient and Web-PMI, which gives a table of 204 rows, each holding a unique ID, a frequency and the word pair ID. We then normalize the frequency values by dividing the value in each tuple by the sum of all the frequency values. The resulting 204-dimensional vector is the feature vector for the given word pair. We convert the feature vectors of all the word pairs into a .CSV (Comma Separated Values) file, which is fed to the SVM classifier built into the Weka software [16]. The classifier gives a similarity score between 0 and 1 for each word pair.
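The construction of the feature vector and the CSV export can be sketched as follows. The text is not explicit about whether the four co-occurrence values take part in the normalization; this sketch assumes that only the pattern frequencies are normalized and the four measures are appended unchanged. The function names, the `f1 … fn` column names and the `class` label column are hypothetical.

```python
import csv
import io


def feature_vector(pair_patterns: dict, good_patterns: list, cooccurrence: list) -> list:
    """Frequencies of the good patterns for one word pair, plus co-occurrence scores.

    pair_patterns maps a pattern to its frequency for this word pair; good
    patterns that the pair never produced get frequency 0.
    """
    freqs = [float(pair_patterns.get(p, 0)) for p in good_patterns]
    total = sum(freqs)
    if total > 0:
        # Normalize by the sum of the frequencies (assumed: patterns only).
        freqs = [f / total for f in freqs]
    return freqs + list(cooccurrence)


def to_csv(vectors: list, labels: list) -> str:
    """Serialize feature vectors and class labels as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    width = len(vectors[0])
    writer.writerow([f"f{i}" for i in range(1, width + 1)] + ["class"])
    for vector, label in zip(vectors, labels):
        writer.writerow(list(vector) + [label])
    return buffer.getvalue()


if __name__ == "__main__":
    # Three good patterns instead of 200, with invented values.
    good = ["X is a Y", "X and Y", "X or Y"]
    vec = feature_vector({"X is a Y": 3, "X and Y": 1}, good, [0.2, 0.5, 0.33, 1.3])
    print(to_csv([vec], ["synonymous"]))
```

With 200 good patterns and four co-occurrence scores the same code yields the 204-dimensional vector described above.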

6.2. Test Data
To test our system, we selected the standard Miller-Charles dataset, which contains 28 word pairs. The proposed algorithm outperforms the existing methods with a correlation value of 0.898 (89.8 percent), as shown in Table 2.

Table 2. Comparison of Correlation value of PRA with existing methods

Method              Correlation Value
Web Jaccard         0.26
Web Dice            0.27
Web Overlap         0.38
Web PMI             0.55
Bollegala Method    0.87
PRA Proposed        0.898
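The text reports correlation values without naming the coefficient. Assuming the usual Pearson correlation between human similarity ratings and the scores produced by a method, the computation can be sketched as below. The function name `pearson` and the numbers in the usage example are invented for illustration and are not taken from the Miller-Charles dataset.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented human ratings (0-4 scale) and method scores (0-1 scale).
    human = [3.9, 3.1, 1.2, 0.4]
    method = [0.95, 0.70, 0.30, 0.05]
    print(round(pearson(human, method), 3))
```

Because the coefficient is invariant to scale and offset, ratings on a 0 to 4 scale can be compared directly with similarity scores between 0 and 1.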

Figure 2 shows the comparison of the correlation value of our PRA with existing methods.

Figure 2: Comparison of correlation value of PRA with existing methods.

The success of a search engine algorithm lies in its ability to retrieve information for a given query. There are two ways in which the returned results can be considered successful: either the results are very accurate, or many results are found that have some connection with the search query. In information retrieval, these are termed precision and recall, respectively [17]. Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. In even simpler terms, high recall means that an algorithm returned most of the relevant results, and high precision means that an algorithm returned more relevant results than irrelevant ones.

Table 2. Precision, Recall and F-measure values for both Synonymous and Non-synonymous classes.

Class             Precision    Recall    F-Measure
Synonymous        0.9          0.947     0.923
Non-synonymous    0.833        0.714     0.769
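The three metrics follow from the counts of true positives, false positives and false negatives for a class. The text does not report these counts, so the ones used below are hypothetical; they were chosen only because they happen to reproduce the tabulated values, and the function name `precision_recall_f` is likewise an assumption.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F-measure (harmonic mean) for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical counts (not from the paper): 18 synonymous pairs classified
    # correctly, 2 non-synonymous pairs labelled synonymous, 1 synonymous pair missed.
    p, r, f = precision_recall_f(tp=18, fp=2, fn=1)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.9 0.947 0.923

    # The same confusion counts seen from the non-synonymous class.
    p, r, f = precision_recall_f(tp=5, fp=1, fn=2)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.833 0.714 0.769
```

The F-measure is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.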

In this paper, the F-measure is computed from the precision and recall evaluation metrics. The results are better than those of the previous algorithm; Table 3 compares the Precision, Recall and F-measure of the proposed algorithm with the previous method.

Table 3. Comparison of Precision, Recall and F-measure values of PRA with previous method

Method       Precision    Recall    F-Measure
Bollegala    0.7958       0.804     0.7897
PRA          0.9          0.947     0.923

7. CONCLUSIONS
Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. We have proposed a Pattern Retrieval Algorithm to extract the numerous semantic relations that exist between two words, and four word co-occurrence measures were computed using page counts. We integrate the patterns and co-occurrence measures to generate a feature vector. These feature vectors are fed to a two-class SVM to classify the data into synonymous and non-synonymous classes. We compute the posterior probability for each word pair, which is the similarity score for that word pair. The proposed algorithm outperforms the existing methods with a correlation value of 0.898 (89.8 percent). The Precision, Recall and F-measure values are improved compared to previous methods.

REFERENCES
[1] Sahami M & Heilman T, (2006) “A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets”, 15th International Conference on World Wide Web, pp. 377-386.
[2] Chen H, Lin M & Wei Y, (2006) “Novel Association Measures using Web Search with Double Checking”, International Committee on Computational Linguistics and the Association for Computational Linguistics, pp. 1009-1016.
[3] Cilibrasi R & Vitanyi P, (2007) “The Google Similarity Distance”, IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 3, pp. 370-383.
[4] Lin D, (1998) “Automatic Retrieval and Clustering of Similar Words”, International Committee on Computational Linguistics and the Association for Computational Linguistics, pp. 768-774.


[5] Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U & Hsu M, (2004) “Mining Sequential Patterns by Pattern-growth: the PrefixSpan Approach”, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, pp. 1424-1440.
[6] Jay J Jiang & David W Conrath, “Semantic Similarity based on Corpus Statistics and Lexical Taxonomy”, International Conference Research on Computational Linguistics.
[7] Resnik P, (1995) “Using Information Content to Evaluate Semantic Similarity in a Taxonomy”, 14th International Joint Conference on Artificial Intelligence, Vol. 1, pp. 24-26.
[8] Resnik P, (1999) “Semantic Similarity in a Taxonomy: An Information based Measure and its Application to problems of Ambiguity in Natural Language”, Journal of Artificial Intelligence Research, Vol. 11, pp. 95-130.
[9] Danushka Bollegala, Yutaka Matsuo & Mitsuru Ishizuka, (2011) “A Web Search Engine-based Approach to Measure Semantic Similarity between Words”, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 7, pp. 977-990.
[10] Ming Li, Xin Chen, Xin Li, Bin Ma & Paul M B Vitanyi, (2004) “The Similarity Metric”, IEEE Transactions on Information Theory, Vol. 50, No. 12, pp. 3250-3264.
[11] Ann Gledson & John Keane, (2008) “Using Web-Search Results to Measure Word-Group Similarity”, 22nd International Conference on Computational Linguistics, pp. 281-28.
[12] Hughes T & Ramage D, (2007) “Lexical Semantic Relatedness with Random Graph Walks”, Conference on Empirical Methods in Natural Language Processing Conference on Computational Natural Language Learning (EMNLP-CoNLL07), pp. 581-589.
[13] Lin D, (1998) “An Information-Theoretic Definition of Similarity”, 15th International Conference on Machine Learning, pp. 296-304.
[14] Schickel-Zuber V & Faltings B, (2007) “OSS: A Semantic Similarity Function Based on Hierarchical Ontologies”, International Joint Conference on Artificial Intelligence, pp. 551-556.
[15] http://onjava.com/onjava/2002/12/11/eclipe.html
[16] www.cs.waikato.ac.nz/ml/weka/
[17] Pushpa C N, Thriveni J, Venugopal K R & L M Patnaik, (2011) “Enhancement of F-measure for Web People Search using Hashing Technique”, International Journal on Information Processing (IJIP), Vol. 5, No. 4, pp. 35-44.

Authors
Pushpa C N has completed a Bachelor of Engineering in Computer Science and Engineering from Bangalore University and a Master of Technology in VLSI Design and Embedded Systems from Visvesvaraya Technological University. She has 13 years of teaching experience. Presently she is working as an Assistant Professor in the Department of Computer Science and Engineering at UVCE, Bangalore, and pursuing her Ph.D. in Semantic Web.

Thriveni J has completed Bachelor of Engineering, Masters of Engineering and Doctoral Degree in Computer Science and Engineering. She has 4 years of industrial experience and 16 years of teaching experience. Currently she is an Associate Professor in the Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore. Her research interests include Networks, Data Mining and Biometrics.


Venugopal K R is currently the Principal, University Visvesvaraya College of Engineering, Bangalore University, Bangalore. He obtained his Bachelor of Engineering from University Visvesvaraya College of Engineering. He received his Masters degree in Computer Science and Automation from Indian Institute of Science Bangalore. He was awarded Ph.D. in Economics from Bangalore University and Ph.D. in Computer Science from Indian Institute of Technology, Madras. He has a distinguished academic career and has degrees in Electronics, Economics, Law, Business Finance, Public Relations, Communications, Industrial Relations, Computer Science and Journalism. He has authored 31 books on Computer Science and Economics, which include Petrodollar and the World Economy, C Aptitude, Mastering C, Microprocessor Programming, Mastering C++ and Digital Circuits and Systems. During his three decades of service at VCE he has over 250 research papers to his credit. His research interests include Computer Networks, Wireless Sensor Networks, Parallel and Distributed Systems, Digital Signal Processing and Data Mining.

L M Patnaik is an Honorary Professor in Indian Institute of Science, Bangalore. During the past 35 years of his service at the Institute he has over 700 research publications in refereed International Journals and refereed International Conference Proceedings. He is a Fellow of all the four leading Science and Engineering Academies in India; Fellow of the IEEE and the Academy of Science for the Developing World. He has received twenty national and international awards; notable among them is the IEEE Technical Achievement Award for his significant contributions to High Performance Computing and Soft Computing. His areas of research interest have been Parallel and Distributed Computing, Mobile Computing, CAD for VLSI circuits, Soft Computing and Computational Neuroscience.

DEVELOPING APPLICATION FOR CLOUD – A PROGRAMMER’S PERSPECTIVE

Rajeev BV1, Vinod Baliga2 and Seshubabu Tolety3

1 Microsoft Competencies, TEC, Siemens Technology Services, Bangalore, India
2 Microsoft Competencies, TEC, Siemens Technology Services, Bangalore, India
[email protected] [email protected]
3 Mobile Computing Team, TEC, Siemens Technology Services, Bangalore, India
[email protected]

ABSTRACT
Developers come across many challenges while developing applications for the cloud or migrating applications to it. This paper discusses the points that developers need to be aware of during the development or migration of an application to the cloud, in terms of parameters such as security, manageability, optimal storage transactions, programmer productivity, debugging and profiling. The paper provides insights into how to overcome these challenges when developing for the cloud or migrating an on-premise application to it, and into the differences in programming when targeting an on-premise data center versus the cloud. The primary focus for cloud in this paper is on Microsoft Windows Azure, Google App Engine and Amazon cloud.

KEYWORDS
Cloud Computing, Cloud Security, Application Scalability on Cloud, Cloud Data Storage, Legacy Applications on Cloud

Sundarapandian et al. (Eds): ITCS, SIP, CS & IT 09, pp. 13–21, 2013. © CS & IT-CSCP 2013
DOI: 10.5121/csit.2013.3102

1. INTRODUCTION
Developing a new web application targeting the cloud, or migrating an existing web application to the cloud, involves certain changes in the programming model. The developer needs to carefully understand the aspects of programming that differ from legacy on-premise application deployment. The following sections discuss some of the important technical aspects that the programmer needs to be aware of while building applications for the cloud.

2. SECURITY
As far as the physical security of a cloud provider’s warehouses and hardware is concerned, most providers boast various security process certifications and third-party attestations. Developers need not be concerned with these, since they are controlled directly by the provider. Confidentiality, Integrity, Identity and Availability are the most important features that every cloud service provider promises. Confidentiality is mainly provided through account-level security techniques such as Identity Management and Access Management. Most cloud providers allow access to cloud accounts through encrypted keys and secure certificates, which make the cloud service accounts inherently secure. Amazon Web Services, for example, provides these features through AWS Identity and Access Management and AWS Multi-Factor Authentication. Each request to the storage account requires authentication via encrypted keys, which ensures that the data cannot be accessed by unintended users. All transactions between an application and the corresponding storage account happen via secure HTTP. If the data transmission has to be further secured using cryptography with an authorized key system, this has to be implemented by the application itself.

In terms of network security, cloud service providers offer significant protection against traditional network security issues. Distributed Denial of Service attacks, Man-in-the-Middle attacks, IP spoofing and port scanning are minimized through various proven techniques employed by the cloud service providers. Microsoft’s Windows Azure platform provides confidentiality through an array of features such as Service Management API authentication and Least Privilege Customer Software, which ensures that every application deployed on the cloud runs with bare minimum privileges by default, reducing the risk of privilege exploitation by a malicious software attack. Also, all communication between Windows Azure internal components is protected with SSL. Access to Windows Azure Storage services is likewise secured by means of access control mechanisms.

In a cloud environment it is never guaranteed that a particular application is the only one running on a particular piece of hardware. Since all applications run in a virtualized environment, chances are that multiple virtual hosts will be running on the same physical hardware.
Even in this scenario, application developers need not worry about applications intruding into each other's data, since all applications are isolated from each other by design. Microsoft provides this isolation via technologies such as the hypervisor, packet filtering and VLAN isolation. AWS provides similar protection with the use of virtualization and firewall solutions.

Microsoft gives users options to encrypt data both in storage and in transit. Permanently stored data can be encrypted using proven techniques provided by the .NET Cryptographic Service Providers, while data in transit can be protected with SSL. Both Amazon Web Services and Microsoft Windows Azure secure their blob storage services at both the container and the blob level. There are also options to log the access to each blob. Similar security options are available for the structured data storage and queue storage services offered by the various providers.

Ultimately most applications need their own security mechanism, so that only authorized users can make use of the services they provide. This is traditionally achieved using techniques such as forms authentication or Windows authentication, and similar techniques can be employed in a cloud environment as well. If the application wants to leverage proven security mechanisms such as Active Directory Services, the cloud services offer application developers various options. Windows Azure applications can use Active Directory Services through Windows Azure Active Directory to enable security features such as single sign-on. With Amazon Web Services, developers would have to come up with workarounds to make use of Active Directory Services. Windows Azure also provides Access Control Services, with which application developers can add identity and access control to their web applications while integrating with standards-based identity providers such as Live ID, Google and Facebook.
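To make the key-based request authentication described in this section more concrete, the following minimal sketch signs a storage request with a shared secret using HMAC-SHA256 and verifies it on the receiving side. The canonical string layout and the names `sign_request` and `verify_request` are assumptions made purely for illustration; each provider documents its own signing scheme, which differs in detail from this sketch.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: bytes, method: str, resource: str, date: str) -> str:
    """Return a base64 HMAC-SHA256 signature over a canonical request string.

    The canonical layout (method, resource, date joined by newlines) is a
    hypothetical simplification, not any provider's real format.
    """
    canonical = "\n".join([method.upper(), resource, date]).encode("utf-8")
    digest = hmac.new(secret_key, canonical, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: bytes, method: str, resource: str,
                   date: str, signature: str) -> bool:
    """Recompute the signature on the service side and compare in constant time."""
    expected = sign_request(secret_key, method, resource, date)
    return hmac.compare_digest(expected, signature)
```

A request whose method, resource or date is altered after signing, or that was signed with a different key, fails verification; this is the property that keeps unintended users from reading the data.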

Business applications often require industry-specific regulatory compliance. AWS is currently PCI DSS 2.0 Level 1 compliant. Microsoft claims to be working on obtaining this compliance for Windows Azure. None of the major cloud service providers is currently HIPAA compliant, although guides are available on using cloud storage data protection features as part of an overall strategy to achieve HIPAA compliance.

High availability is something that every organization strives for, but it is hard to achieve because it requires investing in highly specialized tools, a lot of hardware and specially trained people. In the cloud, achieving high availability can be as simple as changing a configuration setting in the management portal to increase the number of application instances. Data storage is also highly replicated, so that multiple copies of the data are available at any given point in time. For example, SQL Azure provides high availability automatically, which is quite complex to achieve on premise.

3. DATA STORAGE

Traditionally, developers would store large application files on server or network storage, use Microsoft Message Queuing (MSMQ) or another Enterprise Messaging Service (EMS) such as Tibco for queuing, and use Relational Database Management Systems such as SQL Server, Oracle or MySQL for structured and relational data. Most cloud data storage providers offer alternatives to these services. Large files can be stored using Azure Blob Storage or Amazon S3, de-coupled communication between two applications can be achieved using Amazon Simple Queue Service, and relational data can be stored using Amazon RDS or Google Cloud SQL. Table-1 lists the cloud storage services provided by Amazon, Google and Microsoft.

Table-1: Cloud storage services provided by major cloud service providers.

Storage Feature         | Windows Azure        | Amazon Web Services    | Google App Engine
File Storage            | BLOB Storage Service | Simple Storage Service | Blobstore
Queuing Service         | Queue Service        | Simple Queue Service   | Task Queue
Structured Data Storage | Table Storage        | SimpleDB (beta)        | DataStore
Random Read/Write       | Azure Drives         | Elastic Block Store    | -

Developers need to be aware of a few aspects in which consuming a cloud-based storage service differs from traditional data storage mechanisms. Cloud-based storage services are mostly accessed through REST-based APIs, so application developers need to understand how REST works and be familiar with the REST-based APIs supported by their cloud storage provider.

The next important thing a developer should be aware of is the pricing model of cloud storage services. Although this seems more of a business concern, the application developer must be fully aware of the transaction charges (cost of each request to the storage), bandwidth charges (cost of incoming and outgoing data) and storage charges (cost per gigabyte of data stored). Every request made to the cloud storage is metered for transactions and bandwidth and is ultimately charged. Table-2 shows the cost model of the cloud storage services provided by Amazon, Google and Microsoft. Data caching can be used to avoid frequent hits to the data storage account.

One should also keep in mind that cloud storage is not local to the application, so some latency should be expected. There are also limitations imposed on all types of storage, be it blob, table or queue storage, and these limitations vary from vendor to vendor.

If the application has to use standard file system APIs, developers will have to use the special drive storage services offered by the cloud storage provider (Azure XDrive or Amazon Elastic Block Store, EBS). These drive storage services have limitations of their own. In Windows Azure only one application instance can have write access to a particular drive at any given point in time, while other application instances can continue to have read access to the same drive. An application also has to ensure that the drive is mounted before issuing any command to the drive storage.

Table-2: Cloud storage service charges.

Charge | Windows Azure | Amazon Web Services | Google App Engine
Blob Storage Charges | $0.14 per GB stored per month | $0.125 per GB per month for first 1TB | $0.13 per GB per month
Storage Transaction Charges | $0.01 per 10000 transactions | $0.01 per 10000 GET transactions, per 1000 PUT, COPY, POST or LIST transactions | Write: $0.10 per 10000; Read: $0.07 per 100000; Small: $0.01 per 100k operations
Data Transfer Charges | $0.12 per GB per month | Data In: Free; Data Out: First 1 GB / month - Free, Up to 10 TB / month - $0.120 per GB and so on | $0.12 per GB per month
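The interplay of storage, transaction and bandwidth charges can be illustrated with a small estimator. The sketch below is only a rough model: the default rates are taken from the Windows Azure column of Table-2, while the function name `monthly_storage_cost` and the `cache_hit_ratio` parameter (the assumed fraction of requests served from an application-side cache and therefore never billed) are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRates:
    """Per-unit charges in US dollars; defaults follow the Windows Azure column of Table-2."""
    per_gb_stored: float = 0.14            # per GB stored per month
    per_10k_transactions: float = 0.01     # per 10000 storage transactions
    per_gb_transferred: float = 0.12       # per GB of data transfer


def monthly_storage_cost(gb_stored: float, requests: int, gb_transferred: float,
                         cache_hit_ratio: float = 0.0,
                         rates: StorageRates = StorageRates()) -> float:
    """Estimate one month's bill.

    cache_hit_ratio is the assumed fraction of requests answered by an
    application-side cache; those requests incur no transaction or
    bandwidth charges in this simplified model.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    billable = 1.0 - cache_hit_ratio
    storage = gb_stored * rates.per_gb_stored
    transactions = requests * billable / 10_000 * rates.per_10k_transactions
    transfer = gb_transferred * billable * rates.per_gb_transferred
    return storage + transactions + transfer
```

For 100 GB stored, one million requests and 50 GB transferred, the model gives 14 + 1 + 6 = 21 dollars per month; with half of the requests served from a cache the transaction and transfer parts halve, giving 17.5 dollars.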

A design-time decision for the application is whether to use the structured data storage offered by cloud data providers (Azure Table, Amazon SimpleDB or Google BigTable) or a cloud RDBMS service. In terms of cost, RDBMS services on the cloud are considerably more expensive than structured data storage services, and the storage capacity they offer is low compared to their table storage counterparts. However, when it comes to data access (using standard data access APIs), portability (migrating the application and database back to the organization's premises), transactions (cross-table and distributed transactions) and the range of data types supported, RDBMS services clearly have the upper hand over the structured non-relational storage services.

Third-party tools such as CloudBerry Backup are available for backing up data from cloud storage accounts. Developers can also implement their own data backup programs.

4. SCALABILITY

When it comes to scaling an application up or down, most cloud providers offer their own scaling solutions. Microsoft's Windows Azure comes with a feature known as Elastic Scale, which allows an application to be scaled via a minor configuration change without bringing down the running application. Microsoft also provides APIs through which the application can scale up or down programmatically based on application logic.

An Amazon Elastic Compute Cloud instance can likewise be auto-scaled up or down according to the demands of the hosted application, and an application can also be scaled on pre-defined schedules. Dynamic scaling is driven by Amazon CloudWatch metrics. Amazon CloudWatch also has an option whereby an application can use Amazon Simple Notification Service (SNS) to send alerts before initiating and after completing an auto-scale operation. Applications hosted on Google App Engine can utilize the technologies that Google's own applications are built on, such as BigTable and the Google File System (GFS).

Since cloud applications are distributed in nature, user session management has to be implemented in a way that supports distributed environments. Storing sessions in application memory is not an option, so one has to follow state management techniques such as storing encrypted session state in a dedicated state server or in some other persistent storage. This can add some application latency. Developers have to ensure that the application's session objects are serializable so that they can be persisted. Like state management, logging also differs because of the distributed nature of applications running in a cloud environment.
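The requirement that session objects be serializable and stored outside the application's memory can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ExternalSessionStore` is hypothetical, a plain dictionary stands in for the shared state server or persistent storage, and the encryption of the session payload mentioned above is deliberately left out.

```python
import json
import uuid


class ExternalSessionStore:
    """Keeps session state outside the web process so any instance can serve any request.

    A plain dict stands in for the shared backend (state server, cache or
    table storage); a real deployment would replace it with a network client.
    """

    def __init__(self):
        self._backend = {}

    def save(self, session: dict, session_id: str = None) -> str:
        # json.dumps raises TypeError for non-serializable values, which
        # surfaces the "session objects must be serializable" rule early.
        payload = json.dumps(session)
        session_id = session_id or uuid.uuid4().hex
        self._backend[session_id] = payload
        return session_id

    def load(self, session_id: str) -> dict:
        payload = self._backend.get(session_id)
        return json.loads(payload) if payload is not None else {}
```

Because every read and write goes through the store rather than local memory, any application instance can pick up a user's session, at the price of the extra latency noted above.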

5. DIAGNOSTICS

Tracing and diagnostics are an integral part of the lifecycle of any software, but the way they are handled differs in some respects between on-premise and cloud-hosted applications. One of the simplest ways of diagnosing an application hosted in a production environment is a logging mechanism. In an on-premise hosting scenario, the application would log errors, exceptions and information to a text file or a database, using either a custom logging mechanism or third-party logging frameworks such as NLog, JLog or kLogger for .NET, Java and PHP applications respectively.

For out-of-the-box logging, Windows Azure provides a diagnostics infrastructure that uses the .NET tracing mechanisms to log traces of information and errors. This lets application programmers choose what gets logged and gives them the option of transferring these logs to persistent storage (using Azure storage services) on a scheduled basis. Similar diagnostics services are provided by Google App Engine, which internally makes use of JLog.

Third-party logging frameworks can be used for a cloud-hosted application as well, but the way they are configured within the application changes to some extent. For example, to use NLog in an application hosted on Windows Azure, we would have to implement custom NLog targets and integrate them with the Windows Azure Diagnostics infrastructure. Similarly, to use NLog in an application hosted on Amazon Web Services to send log reports via email, we would have to configure NLog to use Amazon Simple Email Service (SES).

Some cloud service providers also offer remote debugging capabilities. For example, Windows Azure provides IntelliTrace (Visual Studio 2010 Ultimate only), and Azure Connect can also be used for remote debugging on the Windows Azure platform. Amazon Web Services ships a toolkit for Eclipse that helps developers with remote debugging, and VMware is working on an upcoming CloudFoundry feature for remotely debugging a Java application.

Almost all major cloud storage service providers give application developers options to enable storage statistics, analytics and metrics. These services store the statistics and logs in a predefined structure, which can be read using third-party software such as AWStats or custom APIs. The logs contain data ranging from the time of storage access to the IP address of the client that made the request. Storage analytics can be used for audit trails as well.
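The idea of collecting log entries locally and transferring them to persistent storage in batches can be sketched with Python's standard `logging` module. This is an analogy to the custom logging targets discussed above, not the NLog or Windows Azure Diagnostics API; the class name `BufferedTransferHandler` and the `upload` callable (which would wrap a call to a storage service) are hypothetical.

```python
import logging


class BufferedTransferHandler(logging.Handler):
    """Buffers formatted log lines and hands them to an uploader in batches."""

    def __init__(self, upload, batch_size: int = 100):
        super().__init__()
        self._upload = upload          # callable taking a list of strings (hypothetical uploader)
        self._batch_size = batch_size
        self._buffer = []

    def emit(self, record: logging.LogRecord) -> None:
        # Called by the logging framework for every accepted record.
        self._buffer.append(self.format(record))
        if len(self._buffer) >= self._batch_size:
            self.transfer()

    def transfer(self) -> None:
        """Push buffered lines to persistent storage and clear the buffer."""
        if self._buffer:
            self._upload(list(self._buffer))
            self._buffer.clear()
```

Attaching such a handler to a logger leaves the application's logging calls unchanged; only the configuration differs between the on-premise and the cloud deployment, which mirrors the point made above about reconfiguring third-party frameworks.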

6. MANAGING RELATIONAL DATA

There may be scenarios in which an application stores data in relational form in a database such as SQL Server, Oracle or MySQL. Along with highly scalable structured data storage such as Windows Azure's Table Storage or Google App Engine's DataStore, cloud service providers also provide cloud-based relational database services.

Windows Azure provides a relational database as a service through SQL Azure. SQL Azure is essentially SQL Server for the cloud environment and supports the majority of the features of SQL Server Enterprise Edition. Connecting to a SQL Azure database and querying it remains largely similar to using a SQL Server database in an enterprise environment. However, SQL Azure does come with some limitations, which are documented in the Microsoft Developer Network (MSDN) Library. Microsoft also provides tools to migrate an existing on-premise SQL Server database to SQL Azure, which helps when moving an on-premise application that uses SQL Server to the cloud. Existing on-premise data can be migrated to SQL Azure using either the migration tool or other options such as SQL Server Integration Services (SSIS). Options are also provided for synchronizing a SQL Azure database with an on-premise database. Although SQL Azure provides a web-based management portal, advanced database management can be performed by connecting to a SQL Azure database from SQL Server Management Studio installed on an on-premise system.

Amazon provides its relational database services through Amazon Relational Database Service (Amazon RDS), which offers a choice of MySQL or Oracle as the database engine. Amazon RDS takes care of patching and updating the database server software and also provides on-demand database instances.
Depending on the cloud database service chosen, one has to keep in mind whether each hit to the database is charged in terms of transactions and bandwidth. If it is, it is up to the application developer to keep database transactions to a minimum by making use of macro queries wherever possible. Table-3 shows the cost associated with various cloud-based relational database services.

Table-3: Cloud based relational database service charges

Service | Pricing
SQL Azure | Pricing is based on the size of database chosen. For example, a database of size between 1GB and 10GB would cost $9.99 for the first GB and $3.996 for each additional GB.
Amazon Relational Database Service (RDS) | Price depends on the type of database chosen (MySQL or Oracle) and the size of the RDS virtual machine.
Google Cloud SQL | Google's database service is not being billed currently.
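One way to read the advice on keeping database hits to a minimum is to replace many small queries with one combined query. The sketch below contrasts the two styles; it assumes that "macro queries" refers to such combined queries, uses Python's built-in `sqlite3` module as a stand-in for a cloud RDBMS, and works on a hypothetical `readings` table with `plant_id` and `output_kw` columns.

```python
import sqlite3


def fetch_readings_one_by_one(conn, plant_ids):
    """N round trips: one query per plant (each would be a billable hit)."""
    rows = []
    for pid in plant_ids:
        rows += conn.execute(
            "SELECT plant_id, output_kw FROM readings WHERE plant_id = ?", (pid,)
        ).fetchall()
    return rows


def fetch_readings_batched(conn, plant_ids):
    """One round trip: a single query with an IN list covering all plants."""
    if not plant_ids:
        return []
    placeholders = ",".join("?" for _ in plant_ids)
    query = ("SELECT plant_id, output_kw FROM readings "
             f"WHERE plant_id IN ({placeholders}) ORDER BY plant_id")
    return conn.execute(query, list(plant_ids)).fetchall()
```

Both functions return the same rows, but the batched version issues a single statement regardless of how many plants are requested, which is what matters when every database hit is metered.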

7. MIGRATING LEGACY APPLICATIONS

Migrating an on-premise application to the cloud to leverage the benefits that cloud service providers offer poses quite a few challenges. We can broadly classify these challenges as follows.

The complexity of migrating an application database depends on the cloud data storage service chosen. If the on-premise application uses an RDBMS such as MySQL or MS SQL Server, the migration can range from a minor configuration change to changes in database code such as stored procedures and triggers, in case the application relies on on-premise database features that do not exist in the cloud RDBMS service. In Windows Azure one has the option of migrating a database installation to the cloud using VM Roles. However, ports that are commonly used by database servers may not be open in a cloud environment, so the migration has to ensure that the database server is configured appropriately. If we instead choose to migrate an existing relational database to one of the structured non-relational data stores, major coding changes in the data access logic are required.

Most applications have some sort of authentication and authorization mechanism built in. Usually an on-premise application carries out authentication and authorization against an application-specific store of user details. If an application makes use of Active Directory Services, Windows Azure and Amazon Web Services both provide options and workarounds to make the existing Active Directory infrastructure work after the application is moved to the cloud. Cloud service providers also support authentication via universal identity providers such as Google, Facebook and Windows Live, which could be another option for authentication in a cloud-based application.

Scenarios in which an application has to access applications or services hosted on-premise or at partner organizations can also be migrated to the cloud. Windows Azure provides Service Bus as part of its AppFabric services, which enables service calls and messages to pass through firewalls and NAT routers.

Deploying an on-premise application onto a cloud environment brings its own challenges, which mainly depend on whether IaaS or PaaS services are used. With IaaS such as Amazon Web Services, the migration of an existing deployment is fairly straightforward and needs minimal effort. With PaaS services such as Windows Azure, the migration challenges depend on the configuration and the dependent applications or libraries that need to be installed before the application is deployed. An application that does not require any external libraries or OS configuration can be migrated very easily. Simple OS configuration, such as setting environment variables and minor registry modifications, can be performed through start-up scripts, which can run in elevated mode in Windows Azure. However, if the configuration steps are too many or cannot be done through start-up scripts, we would have to use Windows Azure VM Roles, uploading a Windows Server 2008 R2 image with all the pre-configuration done. A VM Role works almost the same as the other compute roles in Windows Azure, but updating the OS and applying OS patches become the responsibility of the cloud service user.

8. USE CASE TO MIGRATE ON-PREMISE APPLICATION TO CLOUD

The use case concerns an energy producing plant. Let us assume that multiple such plants are installed in various regions and that all of them need to communicate with a centralized database. The business layer should be able to scale up or down depending on demand and to save data to and fetch data from the database. A web application needs to constantly poll for new data from the plants and display it on the UI. The existing design is shown in Figure-1. In the existing design, the user interface of the application communicates with the business logic over HTTP.

Migrating the application to AWS or Google App Engine would involve a different set of tools and techniques. Although the migration of the UI components remains largely the same, the complexity of the database migration would depend on the RDBMS chosen. The following steps are needed to migrate the same application and database onto Windows Azure:

1. Upload the existing application and Service to the cloud as a Web Role.
2. Get the Service URL and update the Reference in the application.
3. Migrate the on-premise database to SQL Azure using any of the following techniques:
   a) SSIS – SQL Server Integration Services
   b) SQL Wizard – Copy option
   c) Data Sync from the Azure Management Portal

Figure-1: On-Premise design of the plant application

Figure-2 shows the design of the application after it was modified for migration to cloud.

9. CONCLUSION

Developing an application for a cloud environment is not too different from traditional on-premise application development. It is mainly the nuances of cloud computing platforms that developers and architects need to be aware of. In the course of this paper we have discussed several points that shed light on the issues a developer or an architect faces while adapting to the latest advances in cloud computing.

REFERENCES

[1] Charlie Kaufman and Ramanathan Venkatapathy, “Windows Azure Security Overview”.
[2] Jinesh Varia, “Architecting for the Cloud: Best Practices”. [Online]. Available: http://media.amazonwebservices.com/AWS_Cloud_Best_Practices.pdf
[3] J.D. Meier, “Azure Security Notes”. [Online]. Available: http://blogs.msdn.com/cfsfile.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-4803/0572.AzureSecurityNotes.pdf
[4] Jinesh Varia, “Migrating your Existing Applications to the AWS Cloud”. [Online]. Available: http://media.amazonwebservices.com/CloudMigration-main.pdf
[5] “Integrating Applications with the Cloud on the Windows Azure Platform”. [Online]. Available: http://wag.codeplex.com/
[6] David Chappell & Associates, “The Windows Azure Programming Model”. [Online]. Available: http://www.davidchappell.com/writing/white_papers/The_Windows_Azure_Programing_Model_1.0-Chappell.pdf
[7] “Creating HIPAA compliant Medical Data Applications”. [Online]. Available: http://awsmedia.s3.amazonaws.com/AWS_HIPAA_Whitepaper_Final.pdf

HEART RATE VARIABILITY ANALYSIS FOR ABNORMALITY DETECTION USING TIME FREQUENCY DISTRIBUTION – SMOOTHED PSEUDO WIGNER VILLE METHOD

Veena N. Hegde1, Ravishankar Deekshit2, P.S. Satyanarayana3

1 Associate Professor, Instrumentation Tech., BMS College of Engineering, Bangalore. [email protected]
2 Professor and Head, Department of EEE, BMS College of Engineering, Bangalore.
3 Former Professor, Department of ECE, BMS College of Engineering, Bangalore.

ABSTRACT

Heart rate variability (HRV) is derived from the time duration between consecutive heart beats. HRV reflects the heart's ability to adapt to changing circumstances by detecting and quickly responding to unpredictable stimuli to the cardiac system. Depressed HRV is a powerful predictor of mortality and of arrhythmic complications in patients after diseases such as acute myocardial infarction. The degree of variability in the heart rate (HR) provides information about the nervous system's control of the HR and the heart's ability to respond. Spectral analysis of HRV is a frequency-domain approach to assessing the cardiac condition. In this paper one such method for analyzing HRV signals, known as the smoothed pseudo Wigner-Ville distribution (SPWVD), is applied. The sub-band decomposition technique used in SPWVD, based on the Instantaneous Autocorrelation (IACR) of the signal, provides a time-frequency representation for the very low-frequency (VLF), low-frequency (LF) and high-frequency (HF) regions identified in the HRV spectrum. Results suggest that SPWVD analysis provides useful information for the assessment of dynamic changes and patterns of HRV during cardiac abnormalities.

KEYWORDS

HRV, RR Tachogram, Resampling, STFT, WVD, SPWVD

Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09, pp. 23–32, 2013. © CS & IT-CSCP 2013
DOI : 10.5121/csit.2013.3103

1. INTRODUCTION

The Electrocardiogram (ECG) is a periodic signal containing information about the functioning of the heart. The durations and amplitudes of the P, QRS and T waves in an ECG cycle contain useful information about the nature of heart disease. However, it may not always be possible to monitor the subtle details of the functioning of the heart just by observing the ECG. The symptoms of certain diseases may appear at random in time and may not be seen for long enough to give consistent information. In 1965, heart rate was identified as another significant tool in research and clinical studies of cardiology, when distress was found to be preceded by alterations in inter-beat intervals before any appreciable change occurred in the heart rate itself. Later it was found that HRV computed from 24-hour Holter records is more sensitive than simple bedside tests [1]. Therefore HRV signal parameters, extracted and analyzed using computers, are highly useful in diagnostics. Analysis of HRV has also become a popular non-invasive tool for assessing the activities of the autonomic nervous system [2]. These signals are essentially non-stationary and may contain indicators of disease. The indicators may be present at all times or may occur at random in the time scale [3].

The HRV, which is inversely proportional to the time difference between two consecutive R-waves in an ECG time series, is given by

$$\mathrm{HRV} = \left[\, 60/t_1,\; 60/t_2,\; \ldots,\; 60/t_n \,\right] \qquad (1)$$

where $[t_1, t_2, \ldots, t_n]^T$ is a time series composed of the time intervals (in seconds) between consecutive R peaks in an ECG signal, and HRV is expressed in beats per minute.

Various researchers have contributed to the automated analysis of HRV as an alternative to ECG characterization, and have extensively discussed tools for indirectly assessing biological conditions based on HRV parameters under normal and different pathological conditions of the cardiovascular system [4]. The continuous recording of HR shows regular fluctuations that reflect parasympathetic and sympathetic neural control of the sinus node [5]. The HRV has proved to be non-stationary in nature. Hence the analysis of HRV in terms of quantitative parameters has been carried out using time-domain, frequency-domain and non-linear approaches [4]. Though time-domain HRV analysis is popular, when it comes to assessing cardiac conditions based on the HRV it is recommended to carry out spectral analysis of the signal [5].

In the frequency domain, the typical spectral pattern of HRV in normal conditions shows three frequency bands: a very low frequency (VLF) band from 0.00 to 0.03 Hz, a low frequency (LF) band from 0.03 to 0.15 Hz and a high frequency (HF) band in the respiratory range, generally above 0.25 Hz. The value of the LF component is related to sympathetic activation, whereas the area of the HF component provides a quantitative index of the influence of respiration on the ECG signal and may be connected to vagal activity. Thus the LF/HF ratio is an important marker of the sympatho-vagal balance in heart rate variability control.
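As an illustration of how band powers and the LF/HF ratio can be obtained from an HRV power spectrum, the following sketch sums spectral samples inside each band. It is not the method used in this paper: the function names are hypothetical, the spectrum is assumed to be sampled on a uniform frequency grid, the VLF and LF limits follow the values quoted above, and the HF limits of 0.15 to 0.40 Hz are an assumption, since the text only places the HF band in the respiratory range.

```python
# Band edges in Hz; the HF limits are an assumed, commonly used choice.
BANDS_HZ = {"VLF": (0.00, 0.03), "LF": (0.03, 0.15), "HF": (0.15, 0.40)}


def band_powers(freqs, psd, bands=BANDS_HZ):
    """Sum PSD samples per band, scaled by the frequency step (rectangle rule)."""
    df = freqs[1] - freqs[0]               # assumes a uniform frequency grid
    powers = {name: 0.0 for name in bands}
    for f, p in zip(freqs, psd):
        for name, (lo, hi) in bands.items():
            if lo <= f < hi:
                powers[name] += p * df
    return powers


def lf_hf_ratio(freqs, psd, bands=BANDS_HZ):
    """Ratio of LF power to HF power, the sympatho-vagal balance marker."""
    powers = band_powers(freqs, psd, bands)
    if powers["HF"] == 0.0:
        raise ValueError("HF power is zero; ratio undefined")
    return powers["LF"] / powers["HF"]
```

The band edges are passed as a parameter so that a different HF definition can be substituted without changing the code.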

[Figure 1 plot: "Typical HRV frequency bands"; x-axis: Frequency (Hz), 0 to 0.5; y-axis: Power (dB), -110 to 0]

Figure 1. Spectrum for a normal HRV pattern showing typical VLF, LF and HF regions

Figure 1 shows the typical spectrum of an HRV signal indicating normal functioning of the heart. The HRV series obtained from the RR intervals of the ECG is an unusual time series, as both the x and y axes indicate time intervals, one being related to the other. Further, since the variability in HR occurs on a beat-to-beat basis, the time series is inherently unevenly spaced along the horizontal axis, as the number of ECG samples between successive R peaks differs [6].

[Figure 2 plot: "RR-INTERVALS PLOT"; x-axis: Sample No, 0 to 18000; y-axis: RR Duration, 0 to 400]

Figure 2. The unevenly sampled RR Tachogram

Figure 2 shows the RR-interval data, where the total number of ECG samples considered for obtaining the RR intervals is shown on the x-axis. The y-axis represents the distance between consecutive R peaks. Both axes are therefore expressed in numbers of samples. As shown, the horizontal distance between points (time stamps) is different for each adjacent pair, with the difference recorded on the vertical axis. This plot is called the RR Tachogram.

[Figure 3 plot: "ECG R-Peak Detection Plot"; x-axis: 0 to 4.5 x 10^4; y-axis: 0 to 1]

Figure 3. R peak detection in an abnormal ECG

Figure 3 shows R peak detection in an abnormal ECG, which also has an unevenly sampled RR Tachogram. The examples so far show that the RR Tachogram is unevenly sampled, which makes re-sampling necessary: whenever spectral analysis is carried out using a transform-domain approach, the signal is expected to be uniformly sampled. Hence the heart rate or RR interval data must be resampled before frequency-domain analysis is performed. A typical human heart rate of 72-82 bpm corresponds to roughly 1.2 to 1.4 Hz, and a re-sampling rate of 4 Hz is commonly preferred. Since the human heart rate can sometimes reach 3 Hz (180 beats per minute), a rate of at least 6 Hz would then be needed to satisfy the Nyquist criterion; however, if one knows that the RR Tachogram is unlikely to exceed 120 beats per minute (2 Hz), then a re-sampling rate of 4 Hz is sufficient. A choice of 7 Hz for the up-sampling of the RR Tachogram satisfies the Nyquist criterion and gives 2100 points in a 5-minute window. There are different approaches to obtaining a uniformly sampled RR Tachogram; re-sampling methods are discussed in [6].

[Figure 4 plot: "Heart Rate Variability"; x-axis: Time in Secs, 2 to 4.5; y-axis: R R Interval, 40 to 150]

Figure 4. HRV for 4.5 seconds with RR low limit of 0.421 sec and RR high limit of 1.007 sec

Figure 4 shows a typical HRV derived by the re-sampling method for a duration of 4.5 seconds. The HRV is computed using Equ. (1) and re-sampled at 4 Hz using linear interpolation. The European and North American Task Force on standards in HRV [8] suggested that the shortest time period over which HRV metrics should be assessed is 5 minutes. Although the derivation of HRV from the RR Tachogram has been addressed by different researchers, the time-frequency distribution of HRV signals gives additional information about cardiac activity. This paper aims to bring out such details using the SPWVD technique. Section 2 gives the mathematical background of the proposed method, Section 3 gives the simulation results, and the paper is concluded in Section 4.
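The computation of HRV from Equ. (1) followed by re-sampling at 4 Hz with linear interpolation can be sketched as below. The function names are hypothetical and the code is a plain illustration of the two steps named above, not the implementation used to produce Figure 4; RR intervals are assumed to be given in seconds and beat time stamps are taken as their cumulative sum.

```python
def rr_to_instantaneous_hr(rr_seconds):
    """Equation (1): heart rate in beats per minute for each RR interval."""
    return [60.0 / t for t in rr_seconds]


def beat_times(rr_seconds):
    """Time stamp of each beat: cumulative sum of the RR intervals."""
    total, stamps = 0.0, []
    for t in rr_seconds:
        total += t
        stamps.append(total)
    return stamps


def resample_linear(times, values, fs=4.0):
    """Linearly interpolate unevenly spaced (time, value) samples onto a uniform grid."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    step = 1.0 / fs
    n_out = int((times[-1] - times[0]) * fs) + 1
    out_t, out_v = [], []
    j = 0
    for k in range(n_out):
        t = times[0] + k * step
        # Advance to the input interval [times[j], times[j + 1]] containing t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[j] + w * (values[j + 1] - values[j]))
    return out_t, out_v
```

For RR intervals of 1.0, 0.5, 0.5 and 1.0 s, the beats fall at 1.0, 1.5, 2.0 and 3.0 s with rates of 60, 120, 120 and 60 bpm; re-sampling at 4 Hz then yields nine evenly spaced values between 1.0 and 3.0 s.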

2. SPWVD METHOD FOR TIME FREQUENCY ANALYSIS

The Fourier Transform (FT) provides a good description of the frequencies in a waveform, but not of their time of occurrence. The complex exponentials representing the frequency information of the signal stretch out to infinity in time, so the FT analyzes the signal globally, not locally. To overcome this limitation, the Short-Time Fourier Transform (STFT) was introduced for processing non-stationary signals [8]. A window function is applied to a segment of data, effectively isolating that segment from the overall waveform, and the FT is applied to that segment. This, however, necessitates a tradeoff between time localization and frequency resolution: a short window improves time resolution at the cost of frequency resolution, and vice versa. STFTs cannot accurately track changes in a signal's spectrum that occur over the course of a few seconds, which is a significant limitation for many biological signals.

As an alternative approach, the Wigner–Ville distribution (WVD) [9] alleviates this tradeoff. The WVD at any instant is the FT of the instantaneous autocorrelation (IACR) sequence of infinite lag length [10]. Though the WVD gives high resolution in the time-frequency domain, it is not widely used in practical applications because of the interaction between different signal components, which introduces cross-frequency values into the spectrum, called "cross terms". These terms show energy at time-frequency values where none exists. The Wigner function cannot be directly interpreted as a probability distribution function because, in the general case, it is necessarily negative in some regions of phase space. For an indirect probabilistic interpretation, a non-negative phase space function is necessary. The phase space distribution which is produced in simultaneous un-sharp measurements of position and momentum can be represented as a convolution sum of the WVDs of the individual signals plus an additional term, the cross term. The cross terms represent the interaction of two frequencies and their relative phases. They may exist even for values of time at which the signal is zero and may not be negligible. The WVD may take negative values [10].

In practice it is the Pseudo WVD (PWVD) that is computed, which considers the IACR only for a finite number of lags. In the PWVD the IACR is weighted by a common window function to overcome the abrupt truncation effect known as the Gibbs effect. Smoothing functions in time and frequency are used to enhance the readability of the Pseudo Wigner-Ville spectrum by suppressing the cross terms inherent to the bilinear nature of the distribution while largely preserving the resolution. This is called the Smoothed pseudo WVD (SPWVD) [11]. The use of different smoothing kernels results in a class of distributions called Cohen's class. However, the WVD obtained using a common smoothing kernel (other than rectangular) does not satisfy some of the TFR properties [12].

The FT is a reversible transform. For a signal x(t), the FT is:

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (2)$$

Time domain signal is:

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \qquad (3)$$

where x(t) is the time-domain signal of interest. In equation (2), X(ω) gives the strength of each frequency component over the entire interval (−∞, ∞). It does not show when those frequencies occurred. In the STFT, the Fourier Transform is applied to a segment of data that is shorter, often much shorter, than the overall waveform. A window function w(t) is chosen whose length is equal to the length of the segments. The window function used is a Gaussian function of the form:

$$w(t) = e^{-a t^{2}/2} \qquad (4)$$

The basic equation for the STFT in the continuous domain is:

$$X(t, f) = \int_{-\infty}^{\infty} x(\tau)\, w(t - \tau)\, e^{-j 2\pi f \tau}\, d\tau \qquad (5)$$

where w(t − τ) is the window function and τ is the variable that slides the window across the waveform x(t). However, the STFT suffers from the Gibbs phenomenon. A time-frequency energy distribution which is particularly interesting is the WVD, defined as:

$$\psi(t, f) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \qquad (7)$$

where

$$r(t, \tau) = x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) \qquad (8)$$

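is the instantaneous autocorrelation sequence and τ is the time lag. In the frequency domain, with ω = 2πf, the WVD is given by:

$$W_x(t, \omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X\!\left(\omega + \frac{\omega'}{2}\right) X^{*}\!\left(\omega - \frac{\omega'}{2}\right) e^{j\omega' t}\, d\omega' \qquad (9)$$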

The interference terms in WVD can be reduced by smoothing in time and frequency. The result is the smoothed-pseudo Wigner-Ville distribution (SPWVD) which is defined as follows.

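If one chooses a separable kernel function f(ξ, τ) = G(ξ) h(τ), with a Fourier transform of the form

$$F(t, \nu) = \mathrm{FT}[f(\xi, \tau)] = g(t)\, H(\nu) \qquad (10)$$

we obtain the smoothed pseudo Wigner-Ville distribution as:

$$\mathrm{SPWVD}(t, \nu) = \int_{-\infty}^{\infty} h(\tau) \left[ \int_{-\infty}^{\infty} g(s - t)\, x\!\left(s + \frac{\tau}{2}\right) x^{*}\!\left(s - \frac{\tau}{2}\right) ds \right] e^{-j 2\pi \nu \tau}\, d\tau \qquad (11)$$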
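A discrete counterpart of equation (11) can be sketched as below. This is a minimal illustration, not the implementation used for the figures in this paper: the function name `spwvd`, the odd-length symmetric windows, the number of frequency bins and the use of a complex (analytic) input are all assumptions made for the example. Because the discrete lag index m stands for a lag of 2m samples, row k of the result corresponds to a normalised frequency of k / (2 * n_freq) cycles per sample.

```python
import numpy as np

def spwvd(x, g, h, n_freq=64):
    """Discrete smoothed pseudo Wigner-Ville distribution (illustrative sketch).

    x : complex (analytic) signal
    g : odd-length time-smoothing window
    h : odd-length lag (frequency-smoothing) window
    Returns a real array of shape (n_freq, len(x)).
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    lg, lh = len(g) // 2, len(h) // 2
    if 2 * lh >= n_freq:
        raise ValueError("n_freq must exceed the lag-window length")
    tfr = np.zeros((n_freq, n))
    for t in range(n):
        acf = np.zeros(n_freq, dtype=complex)
        for m in range(-lh, lh + 1):
            r = 0.0
            # time smoothing of the instantaneous autocorrelation x[s+m] x*[s-m]
            for p in range(-lg, lg + 1):
                a, b = t + p + m, t + p - m
                if 0 <= a < n and 0 <= b < n:
                    r += g[p + lg] * x[a] * np.conj(x[b])
            acf[m % n_freq] = h[m + lh] * r
        # Fourier transform over the lag variable
        tfr[:, t] = np.fft.fft(acf).real
    return tfr

# Example: a complex exponential at 0.125 cycles/sample (made-up test signal)
n_idx = np.arange(128)
x_demo = np.exp(2j * np.pi * 0.125 * n_idx)
tfr_demo = spwvd(x_demo, g=np.hamming(9), h=np.hamming(17), n_freq=64)
peak_bin = int(np.argmax(tfr_demo[:, 64]))  # bin k maps to k / (2 * 64) cycles/sample
```

A real-valued HRV series would normally be converted to its analytic signal first, and longer windows trade cross-term suppression against resolution.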

3. SIMULATION RESULTS In 1999, researchers at Boston’s Beth Israel Deaconess Medical Center, Boston University, McGill University, and MIT initiated a new resource for the biomedical research community. This resource was set up to support current research and new investigations in the study of complex biomedical signals. PhysioNet is an online forum for the dissemination and exchange of recorded biomedical signals, and it provides facilities for the analysis of data and the evaluation of proposed new algorithms. Data from the resource website, http://www.physionet.org, are used for the SPWVD time-frequency analysis of HRV in this paper. ECG signals from different data sets available in this resource are chosen for the simulation work. The HRV is derived from the RR tachogram and analyzed using the SPWVD technique. The following data files are described as recorded in the MIT database, together with their corresponding analysis. Record 101 has 342 normal beats, 3 APC beats and 2 unclassified beats, with a total of 1860 beats. The normal rhythm rate is 55 to 79. There is clean ECG for a duration of 30 minutes. Figure 5.a shows the SPWVD spectrum and Figure 5.b shows its contour. The continuity of the line along the frequency axis in Figure 5.b shows the HF component of the HRV spectrum. For an abnormal heart rate, the positions of the HF and LF components are shifted, indicating an abnormality of the heart.

Figure 5.a. Normal RR interval spectrogram for record 101 of the MIT/BIH database.

Figure 5.b. Contour plot for record 101, which has the maximum number of normal beats. The continuity of the line at a normalised frequency of around 0.8 indicates the HF component of the HRV spectrum.

Figure 6.a. Plot of the RR interval obtained from the ECG of a patient suffering from congestive heart failure (MIT/BIH RR-interval database).

Figure 6.b. SPWVD spectrum for normal heart rate. Contour plot of the SPWVD of RR-interval data obtained from the ECG of a subject; the HF component is shifted from the middle of the frequency axis. The SPWVD is computed with a window length of 8 samples and a time-smoothing window of 16 samples.

Figure 6.c. Contour plot of the SPWVD spectrum showing the discontinuity in the HF component of the HRV spectrum. The position of the HF component is also shifted and widened along the axis.

Figure 7.a. Power spectrum using SPWVD showing the time-frequency distribution of the time series representing normal HR.

Figure 7.b. Contour plot of the SPWVD of RR-interval data obtained from the ECG of a subject having a normal heart rate. The HF component is in the middle of the frequency axis. The SPWVD is computed with a window length of 8 samples and a time-smoothing window of 16 samples.

Figure 8. RR interval of a diseased subject with severe changes in the distances between consecutive RR intervals.

Figure 9.a. The SPWVD spectrum showing the time-frequency plot of heart rate variability for the sample shown in Figure 8.

Figure 9.b. Contour plot of the HRV obtained for the RR interval time series shown in Figure 8.

Figure 6.b and Figure 6.c show the SPWVD plots for the RR time series of the abnormal heart rate shown in Figure 6.a. The time-frequency plots for a normal heart are shown in Figure 7.a and Figure 7.b. Similarly, Figure 9.a and Figure 9.b show the HRV SPWVD plots for the abnormal RR time series in Figure 8. The time-frequency and contour plots show that changes in the frequency of the components correspond to the physiological changes taking place. In the HRV spectrum, the strength of the second component (HF) relative to that of the first decides the classification as normal or abnormal. From this point of view, sub-band processing algorithms like SPWVD have better performance. However, the frequency transition from one band to the other may not be brought out well, which is not of importance in this case.

4. CONCLUSION The SPWVD, with time and frequency smoothing, provides good time and frequency resolution. This paper shows how a time-frequency distribution can be obtained with the SPWVD for different HRV signals from the MIT database. The corresponding TFDs show that whenever there are deviations from the normal beating of the heart, as in congestive heart failure or in heart beats after myocardial infarction, the regular repetitive patterns of the frequency distributions change. Although no attempt is made here to give statistics of the time-frequency analysis for the various data files used in the experiments, it is possible to observe the extent of the nonstationary behaviour of HRV in the analysis of longer-duration data.

REFERENCES
[1] Norhashimah Mohd Saad, Abdul Rahim Abdullah, and Yin Fen Low, ‘Detection of Heart Blocks in ECG Signals by Spectrum and Time-Frequency Analysis’, 4th Student Conference on Research and Development (SCOReD 2006), Shah Alam, Selangor, Malaysia, 27-28 June 2006.
[2] Schwartz, P.J., and Priori, S.G. (1990): ‘Sympathetic nervous system and cardiac arrhythmias’, In: Zipes, D.P., and Jalife, J. eds. Cardiac Electrophysiology, From Cell to Bedside. Philadelphia: W.B. Saunders, pp. 330–343.
[3] Levy, M.N., and Schwartz, P.J. (1994): ‘Vagal control of the heart: Experimental basis and clinical implications’, Armonk: Futura.
[4] Task Force of the European Society of Cardiology and North American Society of Pacing and Electrophysiology. (1996): ‘Heart Rate Variability: Standards of measurement, physiological interpretation and clinical use’, European Heart Journal, 17, pp. 354–381.
[5] Berger, R.D., Akselrod, S., Gordon, D., and Cohen, R.J. (1986): ‘An efficient algorithm for spectral analysis of heart rate variability’, IEEE Transactions on Biomedical Engineering, 33, pp. 900–904.
[6] Kamath, M.V., and Fallen, E.L. (1995): ‘Correction of the heart rate variability signal for ectopics and missing beats’, In: Malik, M., and Camm, A.J. eds. Heart rate variability, Armonk: Futura, pp. 75–85.
[7] Kobayashi, M., and Musha, T. (1982): ‘1/f fluctuation of heart beat period’, IEEE Transactions on Biomedical Engineering, 29, pp. 456–457.
[8] Boomsma, F.T., and Manintveld. (1999): ‘Cardiovascular control and plasma catecholamines during rest and mental stress: effects of posture’, Clinical Science, 96, pp. 567–576.
[9] Viktor, A., Jurij-Matija, K., Roman, T., and Borut, G. (2003): ‘Breathing rates and heart rate spectrograms regarding body position in normal subjects’, Computers in Biology and Medicine, 33, pp. 259–266.
[10] Rosenstein, M., Collins, J.J., and De Luca, C.J. (1993): ‘A practical method for calculating largest Lyapunov exponents from small data sets’, Physica D, 65, pp. 117–134.
[11] Pincus, S.M. (1991): ‘Approximate entropy as a measure of system complexity’, Proceedings of the National Academy of Sciences, USA, 88, pp. 2297–2301.
[12] Peng, C.K., Havlin, S., Hausdorff, J.M., Mietus, J.E., Stanley, H.E., and Goldberger, A.L. (1996): ‘Fractal mechanisms and heart rate dynamics’, Journal of Electrocardiology, 28 (suppl), pp. 59–64.
[13] Grossman, P., Karemaker, J., and Wieling, W. (1991): ‘Prediction of tonic parasympathetic cardiac control using respiratory sinus arrhythmia: the need for respiratory control’, Psychophysiology, 28, pp. 201–216.

SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION

Sheena C V1, T M Thasleema2 and N K Narayanan3
Department of Information Technology, Kannur University, Kerala, INDIA
[email protected], [email protected] and [email protected]

ABSTRACT This paper reports a word modeling algorithm for Malayalam isolated digit recognition that reduces the search time in the classification process. A recognition experiment is carried out for the 10 Malayalam digits using Mel Frequency Cepstral Coefficient (MFCC) feature parameters and the k-Nearest Neighbor (k-NN) classification algorithm. A word modeling scheme using the Hidden Markov Model (HMM) algorithm is developed. The experimental results show that, in a telephony application, the proposed algorithm reduces the search time of the classification process by 80% for the recognition of the first digit.

KEYWORDS Isolated Digit Recognition, Mel Frequency Cepstral Coefficient, k - Nearest Neighbor, Hidden Markov Model.

1. INTRODUCTION Speech recognition is one of the active research areas in Human Computer Interaction [1]. Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of speakers. It involves capturing and digitizing the sound waves, converting them to basic language units or phonemes, constructing words from phonemes, and contextually analyzing the words to ensure correct spelling for words that sound alike. This paper discusses two stages of Malayalam digit recognition using the Mel Frequency Cepstral Coefficients (MFCC) algorithm. In the first stage a recognition experiment is carried out using the k-NN algorithm, and in the second stage a word modeling algorithm using the Hidden Markov Model (HMM) is proposed for a Malayalam telephony application to achieve faster classification. The basic theory of the HMM was introduced and studied in the late 1960s and early 1970s [2]. According to the literature, however, it is only in the past few decades that the HMM has been applied explicitly to problems in speech processing. These models are very rich in mathematical structure and can thus provide the theoretical basis for a wide range of applications. Here we have developed a word modeling algorithm for Malayalam isolated digits using the HMM.

Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09, pp. 33–38, 2013. © CS & IT-CSCP 2013

DOI : 10.5121/csit.2013.3104

Malayalam is one of the major languages of the Dravidian language family. It is the regional language of the south Indian state of Kerala and of the Lakshadweep islands, and is spoken by about 36 million people [3]. The phonemic structure of Malayalam contains 51 vowel and consonant-vowel sounds, comprising 15 long and short vowels and 36 consonant-vowel sounds. Owing to the lineage of Malayalam from both Sanskrit and Tamil, the Malayalam language structure has the largest number of phonemic utterances among the Indian languages [4]. The Malayalam script includes letters capable of representing all the phonemes of Sanskrit and of all Dravidian languages [5]. In the recognition stage of this work we used ten Malayalam digits, each uttered 20 times by a single speaker; the digits are tabulated in Table 1. In the word modeling part we used isolated digits in Malayalam for use in telephone applications.

The organization of this paper is as follows. Section 2 discusses feature extraction using the MFCC algorithm. Section 3 explains the recognition experiments using the k-NN algorithm and discusses the results. Section 4 gives an overview of the HMM, followed by Malayalam isolated digit word modeling using the probability matrix. Finally, Section 5 concludes the present work and gives directions for future work.

2. FEATURE EXTRACTION USING MFCC COEFFICIENTS Feature extraction reduces the amount of resources required to describe a large set of data accurately. In this paper we discuss one of the basic speech feature extraction techniques, namely the Mel Frequency Cepstral Coefficient (MFCC). The MFCC method uses a bank of filters scaled according to the Mel scale to smooth the spectrum and to perform processing similar to that executed by the human ear [6]. Filters spaced linearly at low frequencies up to 1 kHz and logarithmically at higher frequencies, following the Mel scale, are used to capture the phonetic characteristics of the speech signal. Thus MFCCs are used to represent human speech perception models. MFCCs are computed as in Fig. 1.

Frame blocking is the process of segmenting the speech samples obtained from analog-to-digital (A/D) conversion into small frames with a time length in the range of 20 to 40 milliseconds. In the next step, windowing is applied to each individual frame so as to minimize the signal discontinuities at the beginning and end of the frame. After that, the Fast Fourier Transform (FFT) is applied to convert each frame of N samples from the time domain into the frequency domain. Then each actual frequency f, measured in Hz, is converted to a scale called the ‘Mel’ scale. The Mel frequency is calculated using the formula

$$F_{mel} = 2595 \log_{10}\!\left(1 + \frac{F_{Hz}}{700}\right) \qquad (1)$$

The Mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The log Mel spectrum is converted back into the time domain using the discrete cosine transform (DCT) to obtain the Mel Frequency Cepstral Coefficients (MFCC). The MFCCs are thus derived by applying the procedure described above to each speech frame. A set of MFCC coefficients is extracted by averaging the coefficients of the frames and is used as the feature set in the k-NN recognition algorithm.
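The steps described above (frame blocking, windowing, FFT, Mel filter bank, logarithm, DCT) can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' code: the Hamming window, the 25 ms frame with 10 ms hop, the 20 triangular filters, the 13 retained coefficients and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Equation (1): F_mel = 2595 * log10(1 + F_Hz / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(f_mel):
    """Inverse of equation (1)."""
    return 700.0 * (10.0 ** (np.asarray(f_mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fb[i, k] = (k - lo) / max(mid - lo, 1)   # rising slope
        for k in range(mid, hi):
            fb[i, k] = (hi - k) / max(hi - mid, 1)   # falling slope
    return fb

def mfcc(signal, fs, frame_ms=25.0, hop_ms=10.0, n_filters=20, n_ceps=13):
    """Return one row of cepstral coefficients per frame."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(fs * frame_ms / 1000.0)
    hop = int(fs * hop_ms / 1000.0)
    n_fft = 1 << (frame_len - 1).bit_length()        # next power of two
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, fs)
    # DCT-II basis applied to the log filter-bank energies
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  np.arange(n_filters) + 0.5) / n_filters)
    feats = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        feats[i] = dct @ np.log(fb @ power + 1e-10)  # small offset avoids log(0)
    return feats

# Example on a synthetic 1 s tone sampled at 8 kHz (made-up test input)
fs_demo = 8000
tone = np.sin(2 * np.pi * 440.0 * np.arange(fs_demo) / fs_demo)
frame_feats = mfcc(tone, fs_demo)
utterance_feats = frame_feats.mean(axis=0)  # average over frames, one vector per utterance
```

Averaging over frames gives a single fixed-length vector per utterance, which is one reading of the averaging step described in the text.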

3. SPEECH RECOGNITION USING K-NN ALGORITHM The k-Nearest Neighbor (k-NN) algorithm is a supervised learning method that has been used in many applications in data mining, statistical pattern recognition and other fields [7]. k-NN classifies objects based on the closest training samples in the feature space. An object is classified by a majority vote of its neighbors, where k is always a positive integer. The neighbors are taken from a set of objects for which the correct classification is known [8]. Hand proposed an effective trial-and-error approach for identifying the value of k that yields the highest recognition accuracy [9]. Various pattern recognition studies with high accuracy based on these classification techniques have also been reported [10]. k-NN assumes that the data lie in a feature space. If k = 1, the algorithm is simply called the nearest neighbor algorithm. In the example in Fig. 2, we have three classes and the goal is to find a class label for the unknown example xj. In this case we use the Euclidean distance and a value of k = 5 neighbors. Of the 5 closest neighbors, 4 belong to w1 and 1 belongs to w3, so xj is assigned to w1, the predominant class.
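The majority-vote rule described above can be sketched in a few lines. The training points below are made up to mirror the situation in the text (four of the five nearest neighbours in w1 and one in w3); the function name `knn_classify` is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, query, k=5):
    """Assign `query` to the majority class among its k nearest training samples."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)  # Euclidean
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D feature vectors for three classes
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (1.5, 1.5), (8, 8)]
labels = ["w1", "w1", "w1", "w1", "w2", "w2", "w3", "w3"]
predicted = knn_classify(points, labels, query=(0.5, 0.5), k=5)
```

In the digit recognition setting, each training point would be the averaged MFCC vector of one utterance and each label one of the ten digits.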

In this experiment a database of 20 repetitions of the 10 Malayalam digits is used for training and testing. One hundred samples are taken for training and one hundred samples for testing. An average recognition accuracy of 62% is obtained using the k-NN algorithm for Malayalam digit recognition.

4. WORD MODELLING USING HMM A Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved states [11]. The model is completely defined by the set of parameters A, B and π, where A = {aij}, 1 ≤ i, j ≤ N, is the matrix of transition probabilities; B = {bj(wk)}, 1 ≤ j ≤ N, 1 ≤ k ≤ M, is the matrix of emission probabilities, bj(wk) being the probability of the observation wk being generated from state j; and π is the vector of initial state probabilities. Thus a model with N states and M observations can be defined by λ = (A, B, π) [2]. The present work discusses the modeling algorithm developed for Malayalam isolated digit recognition in a telephony application. We considered 50 mobile numbers of the BSNL service provider. The initial probabilities and transition probabilities calculated from these numbers are given in Table 2, and the corresponding HMM is shown in Fig. 3.

From the tabulated result, the initial probability of every digit except 8 and 9 is 0, while the digit 8 has probability 0.05 and the digit 9 has probability 0.95, since all the BSNL numbers in the database start with the digit 8 or 9. In this work we use this result to reduce the search time in the recognition experiment: in the classification stage, recognition of the first digit needs to consider only the digits 8 and 9, that is, 2 of the 10 candidate classes, which reduces the search time for the first digit by 80%. A similar procedure can be extended to the successive digits, resulting in a good reduction in search time in the recognition/classification experiment.
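The estimation of π and A from a list of numbers, and the resulting pruning of candidate digits, can be sketched as follows. The phone numbers are invented placeholders, not the 50 BSNL numbers of the study, and the helper names `estimate_digit_model` and `candidate_digits` are hypothetical.

```python
import numpy as np

def estimate_digit_model(numbers):
    """Estimate initial (pi) and transition (A) probabilities from digit strings."""
    pi = np.zeros(10)
    counts = np.zeros((10, 10))
    for num in numbers:
        digits = [int(c) for c in num]
        pi[digits[0]] += 1
        for a, b in zip(digits, digits[1:]):
            counts[a, b] += 1
    pi /= pi.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows with no observed transitions are left as all zeros
    A = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return pi, A

def candidate_digits(prob_row):
    """Digits with non-zero probability: only these need to be searched."""
    return [d for d in range(10) if prob_row[d] > 0]

# Invented example numbers (placeholders only)
demo_numbers = ["9447", "9446", "8547", "9400"]
pi_demo, A_demo = estimate_digit_model(demo_numbers)
first_digit_candidates = candidate_digits(pi_demo)
reduction = 1.0 - len(first_digit_candidates) / 10.0  # fraction of classes skipped
```

For a later digit, the row of A belonging to the previously recognised digit plays the role that π plays for the first digit.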

5. CONCLUSIONS This paper presented a word modeling scheme for Malayalam isolated digit recognition using various mobile numbers uttered in Malayalam. The study was carried out in two stages. In the first stage a recognition experiment was carried out using MFCC coefficients and the k-NN algorithm for the 10 Malayalam digits, and a recognition accuracy of 62% was obtained. In the second stage a word modeling algorithm using the HMM was proposed for Malayalam isolated digits. The experimental results show that the proposed algorithm reduces the search time by 80% in the recognition of the first digit in the classification process of a BSNL telephone number recognition system. Modeling the isolated digits of different service providers using the HMM modeling algorithm, and recognizing them using other classification algorithms, are some of our future research directions.

REFERENCES
[1] Rabiner Lawrence and Biing-Hwang Juang (1993) Fundamentals of Speech Recognition, Prentice Hall.
[2] L. R. Rabiner and B. H. Juang, (1986), “An Introduction to Hidden Markov Models”, IEEE ASSP Magazine, pp. 4 – 16.
[3] Ramachandran, H. P (2008) Encyclopedia of Language and Linguistics, Oxford: Pergamon Press.
[4] Aiyar, S (1987). Dravidian theories, p. 286.
[5] Govindaraju, V., & Setlur, S (2009), Advances in Pattern Recognition. Guide to OCR for Indic Scripts: Document Recognition and Retrieval, Berlin: Springer, p. 126.
[6] Ibrahim Patel and Y Srinivas Rao, (2010), “Speech Recognition using HMM with MFCC analysis using frequency spectral decomposition technique”, Signal and Image Processing - An International Journal, Vol. 1(2), pp. 101 – 110.
[7] Zhang. B and Srihari S N, (2004), “Fast k – Nearest Neighbor using Cluster Based Trees”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26(4), pp. 525 – 528.
[8] Pernkopf. F, (2005), “Bayesian Network Classifiers versus selective k–NN Classifier”, Pattern Recognition, Vol. 38, pp. 1 – 10.
[9] Hand D J (1981) Discrimination and Classification, New York, Wiley.
[10] Ray A. K and Chatterjee B, (1984), “Design of a Nearest Neighbor Classifier System for Bengali Character Recognition”, Journal of Inst. Elec. Telecom. Eng, Vol. 30, pp. 226 – 229.
[11] Daniel Jurafsky and James Martin (2004) Speech and Language Processing, Pearson Education.

Authors Sheena C V received her M.Sc. in Computer Science from Kannur University, Kerala, India in 2008. She is currently a Ph.D. student under Prof. Dr. N. K. Narayanan at the Department of Information Technology, Kannur University, Kerala, India. Her research interests include Computer Vision, Digital Image Processing, Digital Speech Processing, Artificial Intelligence and Artificial Neural Networks.

T M Thasleema received her M.Sc. in Computer Science from Kannur University, Kerala, India in 2004. She has to her credit one book chapter and many research publications at national and international levels in the area of speech processing and pattern recognition. She is currently doing her Ph.D. in speech signal processing at the Department of Information Technology, Kannur University, under the supervision of Prof. Dr. N. K. Narayanan.

Dr. N. K. Narayanan is a Senior Professor of Information Technology, Kannur University, Kerala, India. He earned a Ph.D. in speech signal processing from the Department of Electronics, CUSAT, Kerala, India in 1990. He has published more than a hundred research papers in national and international journals in the areas of speech processing, image processing, neural networks, ANC and bioinformatics. He served as Chairman of the School of Information Science & Technology, Kannur University, from 2003 to 2008, and as Principal, Coop Engineering College, Vadakara, Kerala, India during 2009-10. He is currently the Director, UGC IQAC, Kannur University.

A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES

Gullanar M. Hadi1 and Nassir H. Salman2
1 Department of Software Engineering, Salahaddin University, Erbil, Iraq [email protected]
2 Department of Computer Science, Cihan University, Erbil, Iraq [email protected]

ABSTRACT In the first study [1], a combination of K-means, the watershed segmentation method, and the Difference In Strength (DIS) map was used to perform image segmentation and edge detection. We obtained an initial segmentation using the K-means clustering technique. Starting from this, we used two techniques. The first is the watershed technique with new merging procedures based on the mean intensity value, used to segment the image regions and to detect their boundaries. The second is the edge strength technique, used to obtain accurate edge maps of our images without the watershed method. With this technique we solved the problem of the undesirable over-segmentation produced by the watershed algorithm when it is applied directly to raw image data. In addition, the edge maps we obtained have no broken lines over the entire image. In the second study, level set methods are used to implement curve/interface evolution under various forces. In the third study, the main idea is to detect the boundaries of regions (objects) and to isolate and extract individual components from a medical image. This is done using active contours to detect regions in a given image, based on techniques of curve evolution, the Mumford–Shah functional for segmentation, and level sets. We first classify our images into different intensity regions based on a Markov Random Field. We then detect regions whose boundaries are not necessarily defined by gradient by minimizing the energy of the Mumford–Shah functional for segmentation; in the level set formulation, the problem becomes a mean-curvature-flow-like evolution of the active contour, which stops on the desired boundary. The stopping term does not depend on the gradient of the image, as it does in the classical active contour. The initial level set curve can be anywhere in the image, and interior contours are detected automatically. The final image segmentation has one closed boundary per actual region in the image.

KEYWORDS Watershed, Difference in strength map, K-means, Edge detection, Image segmentation, Active contours, Level set method, Markov Random Field

Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09, pp. 39–50, 2013. © CS & IT-CSCP 2013

DOI : 10.5121/csit.2013.3105

1. INTRODUCTION Edges are boundaries between different textures. An edge can also be defined as a discontinuity in image intensity from one pixel to another. The edges of an image are always important characteristics that indicate higher-frequency content. Detecting the edges of an image may help with image segmentation and data compression, and also with matching tasks such as image reconstruction. In image segmentation, we focus on the idea that edges define boundaries and that regions are contained within these edges. Edge detection refers to the process of identifying and locating sharp discontinuities in an image. Many edge detection methods exist, as follows.

Many methods for image segmentation and edge detection combine region-growing and edge detection techniques, for example by applying edge detection techniques to obtain a Difference In Strength (DIS) map and then employing region-growing techniques on the map, as in [1] and [2]. In [3], spatial and intensity information are combined in an image segmentation approach based on multi-resolution edge detection, region selection and intensity threshold methods. In [4], Pappas considered the problem of segmenting images with smooth surfaces and presented a generalization of the K-means clustering algorithm that includes spatial constraints and accounts for local intensity variations in the image. Qixiang Ye et al. [5] proposed finding the main edges while filtering out edges within texture regions; they computed the pixel similarity degree around a pixel, computed a new gradient, and applied a Canny-like operator to detect and locate edges. Caragea [6] detects the difference between pairs of pixels around a pixel and uses the highest value among the differences of the four pairs of pixels that can form a line through the middle pixel. Al-amri et al. [7] presented methods for edge segmentation of satellite images; they used seven techniques (the Sobel operator, Prewitt, Kirsch, Roberts, Laplacian, Canny and Edge Maximization Technique (EMT)) and compared them with one another so as to choose the best technique for edge detection. P. Thakare [8] discussed image segmentation techniques such as edge-based, region-based and integrated techniques, and briefly explained the edge-based techniques and their evaluation.

2. LEVEL SET METHOD In mathematics, a level set of a real-valued function f of n variables is a set of the form

$$\{ (x_1, \ldots, x_n) \mid f(x_1, \ldots, x_n) = c \} \qquad (1)$$

where c is a constant. That is, it is the set where the function takes on a given constant value. When the number of variables is two, this is a level curve (contour line); when it is three (3-D surface or interface evolution), it is a level surface; and for higher values of n the level set is a level hypersurface. Level set methods are therefore used to implement curve/interface evolution under various forces. In this method several modified functions were used [4], [9]:

a) Function evolve2D() is a high-level function that takes an input, evolves it for N iterations and returns the result.
b) Function contour(Z) draws a contour plot of matrix Z, treating the values in Z as heights above a plane. A contour plot shows the level curves of Z for some values V, which are chosen automatically. The contours are normally colored based on the current color map.
c) Function imcontour(I) creates a contour plot of image data, that is, it draws a contour plot of the intensity image I.

We found that 50 iterations work well for some images, and that the evolution type depends on the parameter values used in the call that updates phi [4], [9] (see Figure 6):

    phi = evolve2D(phi,dx,dy,0.5,25,[],[],0,[],0,[],[],1,b);

The processing time also depends on the number of iterations.
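To illustrate what evolving a level set function for a number of iterations means, the sketch below moves the zero level curve of phi along its normal with a constant speed, using a first-order upwind scheme. It is not the evolve2D routine referred to above, whose arguments are not documented in this text; the function name `evolve_normal`, the constant-speed force and all parameter values are assumptions for the example.

```python
import numpy as np

def evolve_normal(phi, speed, dx, dt, n_iter):
    """Evolve phi under phi_t + speed * |grad phi| = 0 (first-order upwind scheme)."""
    phi = np.asarray(phi, dtype=float).copy()
    for _ in range(n_iter):
        p = np.pad(phi, 1, mode="edge")
        dxm = (p[1:-1, 1:-1] - p[1:-1, :-2]) / dx   # backward difference in x
        dxp = (p[1:-1, 2:] - p[1:-1, 1:-1]) / dx    # forward difference in x
        dym = (p[1:-1, 1:-1] - p[:-2, 1:-1]) / dx   # backward difference in y
        dyp = (p[2:, 1:-1] - p[1:-1, 1:-1]) / dx    # forward difference in y
        if speed >= 0:
            grad = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2 +
                           np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
        else:
            grad = np.sqrt(np.minimum(dxm, 0) ** 2 + np.maximum(dxp, 0) ** 2 +
                           np.minimum(dym, 0) ** 2 + np.maximum(dyp, 0) ** 2)
        phi -= dt * speed * grad
    return phi

# Example: a circle of radius 10 (phi < 0 inside) expanding for a total time of 5
yy, xx = np.mgrid[0:64, 0:64]
phi0 = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2) - 10.0
phi1 = evolve_normal(phi0, speed=1.0, dx=1.0, dt=0.25, n_iter=20)
radius_after = np.sqrt((phi1 < 0).sum() / np.pi)  # radius estimated from enclosed area
```

The contour at level zero of phi1 is the evolved curve; an image-dependent speed would replace the constant one in a segmentation setting.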

3. EDGE STRENGTH MERGING PROCESS Two edge-strength gradient thresholds (T1, T2), with T1 less than T2, were used in one subroutine. For example, if we choose T1 = 1 and the edge strength given by equation (2) is less than 1, the two adjacent regions are merged. This is because the watershed algorithm [10] we used is based on the immersion procedure, which treats the image as a topographic surface: intensity values are regarded as altitudes (heights), and merging is decided by comparing the gradient values of the edge points (pixels) between the two regions with those of the regions themselves. If the edge points have low gradient values, the regions are merged and the resulting region becomes larger. The choice of the values of T1 and T2 is therefore very important in our merging process. See the results in Figure 3.

$$\text{Edge strength} = \frac{\sum_{p \in \text{Edge}} \text{Gradient}(p)}{N} \qquad (2)$$

where Gradient(p) is the gradient value, taken from the gradient image step, of each pixel p on the edge between two regions, and N is the number of edge pixels.

3.1 Edge Strength Merging Process results

(a) Original brain image (512x512).

(b) Image segmented into 6 regions by the k-means method.

(c) Segmented image with edges (region map) produced by the watershed algorithm.

(d) An accurate edge map after the watershed and the mean-value merge process.

Figure 1. The results of the K-means method, the watershed algorithm, and the merging techniques. CPU time = 27.835 s (20.936 s for K-means, 6.229 s for watershed and merging, and 0.670 s for the two edge strengths).
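The edge strength of equation (2) and the merge test of Section 3 can be sketched as follows. The definition of an edge pixel (a pixel of one region that is 4-adjacent to the other region), the function names, and the use of the single threshold T1 are assumptions made for this illustration; the role of T2 is not specified in enough detail in the text to be reproduced here.

```python
import numpy as np

def _touches(mask):
    """Pixels that have a 4-neighbour inside `mask`."""
    t = np.zeros_like(mask)
    t[1:, :] |= mask[:-1, :]
    t[:-1, :] |= mask[1:, :]
    t[:, 1:] |= mask[:, :-1]
    t[:, :-1] |= mask[:, 1:]
    return t

def edge_strength(labels, gradient, a, b):
    """Equation (2): mean gradient over the edge pixels between regions a and b."""
    in_a, in_b = labels == a, labels == b
    edge = (in_a & _touches(in_b)) | (in_b & _touches(in_a))
    n = edge.sum()
    return gradient[edge].sum() / n if n else np.inf  # inf: regions are not adjacent

def merge_if_weak(labels, gradient, a, b, t1):
    """Relabel region b as a when the edge strength between them is below t1."""
    if edge_strength(labels, gradient, a, b) < t1:
        labels = np.where(labels == b, a, labels)
    return labels

# Made-up 4x4 example: two regions separated by a vertical edge
demo_labels = np.array([[1, 1, 2, 2]] * 4)
demo_grad = np.array([[0.0, 2.0, 4.0, 0.0]] * 4)
strength = edge_strength(demo_labels, demo_grad, 1, 2)
merged = merge_if_weak(demo_labels, demo_grad, 1, 2, t1=4.0)
```

With a lower threshold such as t1 = 1 the two regions in this example stay separate, which shows how strongly the result depends on the chosen threshold.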

4. DIFFERENCE IN STRENGTH TECHNIQUE RESULTS The DIS of each pixel was calculated using equation (3) [11]. After all the input pixels were processed, the DIS map was obtained. In the DIS map, the larger the DIS value, the more likely the pixel is located on an edge. At this step, a 3x3 window runs pixel by pixel over the input image. When the window runs over the border of the input image, pixels outside the border are given the gray level of the nearest input pixel. The DIS of the center pixel in Figure 2, for example, was calculated as in equation (3) [11].

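$$
\begin{aligned}
\mathrm{DIS} ={} & |Z_1 - Z_3| + |Z_1 - Z_5| + |Z_1 - Z_6| + |Z_1 - Z_7| + |Z_1 - Z_8| \\
& + |Z_2 - Z_4| + |Z_2 - Z_5| + |Z_2 - Z_6| + |Z_2 - Z_7| + |Z_2 - Z_8| \\
& + |Z_3 - Z_4| + |Z_3 - Z_6| + |Z_3 - Z_7| + |Z_3 - Z_8| \\
& + |Z_4 - Z_5| + |Z_4 - Z_7| + |Z_4 - Z_8| + |Z_5 - Z_6| + |Z_5 - Z_7| + |Z_6 - Z_8|
\end{aligned}
\qquad (3)
$$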
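A minimal sketch of the DIS computation is given here under stated assumptions. The pixel pairs are the twenty listed in equation (3), summed as absolute differences. The row-by-row placement of Z1 to Z8 around the center pixel is an assumption, since the window layout of Figure 2 is not reproduced in the text. The function name `dis_map` is hypothetical.

```python
# Pixel pairs (i, j) summed in equation (3); indices refer to Z1..Z8.
DIS_PAIRS = [
    (1, 3), (1, 5), (1, 6), (1, 7), (1, 8),
    (2, 4), (2, 5), (2, 6), (2, 7), (2, 8),
    (3, 4), (3, 6), (3, 7), (3, 8),
    (4, 5), (4, 7), (4, 8),
    (5, 6), (5, 7),
    (6, 8),
]

# Assumed layout of Z1..Z8 around the centre pixel (row offset, column offset).
NEIGHBOUR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def dis_map(image):
    """Return the DIS value of every pixel of a 2-D list of gray levels."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Outside the border, reuse the nearest pixel inside the image.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            z = [pixel(r + dr, c + dc) for dr, dc in NEIGHBOUR_OFFSETS]
            result[r][c] = sum(abs(z[i - 1] - z[j - 1]) for i, j in DIS_PAIRS)
    return result


flat = [[7] * 4 for _ in range(4)]          # constant image: no edges
step = [[0, 0, 10, 10] for _ in range(4)]   # vertical step edge
flat_dis = dis_map(flat)
step_dis = dis_map(step)
```

On the constant image every DIS value is zero, while on the step image the pixels next to the step receive a large DIS value, which matches the behaviour described for smooth regions and edges.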


Figure 2. The DIS detecting windows.

Examples of DIS maps are shown in Figure 3-b. One can expect the values of DIS to be small in the smooth regions obtained by k-means. A greater DIS value indicates that the pixel lies in an area where the gray levels change sharply. With the DIS map one can check the result of image segmentation based on K-means. The DIS map contains all edge information about the input image, even in the smooth regions. Since the DIS of a smooth region is small (weak edge), one can use a threshold T to eliminate false edges and thus obtain larger regions. In this case, the DIS map provides the complete edge (strong and weak) information about the image. By exploiting this information, one can accurately locate the contour of an object.

To find the effect of DIS, we used multithresholding edge detection: first we calculated the DIS for each pixel in the image, then we calculated the mean value of DIS for the whole image. We then thresholded the image at different percentages of the mean DIS. The threshold for discarding weak edges is set to the mean of DIS as in Figure 5 (d through g). The threshold used for connected edges is set to 50% of the mean DIS. Using multiple thresholds is therefore important to eliminate false edges and thus obtain larger regions, as in Figure 3 (d through g). The region map without threshold is shown in Figure 3-c. Comparing Figure 3 (d-g) with Figure 3-c, the concept that an object should have a closed contour helps us to eliminate redundant edge pixels and connect the broken contour by using multiple thresholds based on different values of the mean DIS of the whole image under study. However, if we apply k-means and then DIS with 25% of the mean DIS, we obtain all the edges of our images, as in Figure 4 below, and we do not need to use the watershed technique.
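The multithresholding step can be sketched as follows, assuming that a pixel is kept as an edge when its DIS value is strictly greater than the chosen percentage of the mean DIS. The strict comparison and the function names are assumptions; the percentages are those used in Figure 3.

```python
def mean_dis(dis):
    """Mean DIS value over the whole image."""
    values = [v for row in dis for v in row]
    return sum(values) / len(values)


def threshold_edges(dis, fraction):
    """Binary edge map: 1 where DIS exceeds fraction * mean DIS, else 0."""
    cutoff = fraction * mean_dis(dis)
    return [[1 if v > cutoff else 0 for v in row] for row in dis]


# Toy DIS map with mean 40: one strong edge, two weaker ones, one flat pixel.
dis = [
    [0, 15],
    [45, 100],
]
edge_maps = {f: threshold_edges(dis, f) for f in (0.25, 0.5, 1.0, 1.2)}
```

Raising the percentage removes the weaker responses first, so the number of surviving edge pixels can only decrease as the threshold grows.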


(a) Original image.

(b) DIS map of image (a).

(c) Without threshold.

(d) Threshold 25% of DIS mean.

(e) T 50% of DIS mean.

(f) T 100% of DIS.

(g) T 120% of DIS mean.

Figure 3. DIS map of image and multithresholding of DIS mean edge detection results.


4.1 Other DIS Method Results

The experimental results are shown in Figures 1 and 3 above and Figure 4 below. Medical images such as brain images, which are simple pattern images of size 156 x 156 with 256 gray levels, and other images were used to test our segmentation and edge detection methods. The output images contain all edge and region information about the input image. The region maps are shown in Figure 1-c. As can be seen from the edge map in Figure 1-d, there are no broken lines over the whole image. The output image was displayed as an edge map as in Figure 1-d, Figure 3 (e through g), and Figure 4-c.

(a) Original image.

(b) After K-means process.

(c) Using threshold 25% of mean DIS.

Figure 4. Edge map using K-means process and thresholding 25% of mean DIS.

5. ACTIVE CONTOUR RESULTS BASED ON LEVEL SET

The basic idea in active contour models, or snakes, is to evolve a curve, subject to constraints from a given image, in order to detect objects in that image. For instance, starting with a curve around the object to be detected, the curve moves toward its interior normal and has to stop on the boundary of the object. In the classical snakes and active contour models (see [12], [13], [14], [15]), an edge detector, depending on the gradient of the image u0, is used to stop the evolving curve on the boundary of the desired object. In our process, we used initial segmented images (different intensity regions) based on Markov Random Field (MRF) to superimpose the region boundary and to extract the bounded region (segmented map) in our image, as in Figure 5 (b and c) for example.


(a) Original image.

(b) Segmented image by M-Shah GAC method.

(c) Segmented map of bounded area in (b) (extracted region).

(d) Abdomen image after MRF method.

(e) Step 1: initial curves.

(f) Step 2.

(g) Step 3.

(h) Semi final results.

(i) Brain image after MRF.

(j) Initial curves.

(k) Final results.

Figure 5. Segmentation results by Mumford-Shah Geodesic Active Contours (GAC).

We superimposed the edges of the different regions in the image using the Mumford-Shah method after choosing a few closed curves that represent different intensity areas in our image, for example as shown in Figure 5 (e and j). The accuracy of our results therefore depends on the MRF results: if they are accurate, the region boundaries are in the correct position, as shown in the figures above. With this method we can also choose any region in the image and define its edge. We can then calculate region information such as the area of that region, the region map, and the contour length. Different kinds of images can be used to extract different features (roads, rivers, agricultural areas, etc.), as in remote sensing images.
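Once a region has been extracted as a binary map, its area and contour length can be measured with a short sketch like the following. It assumes the region is stored as a 2-D list of 0 and 1 values and approximates the contour length by the number of region pixels that touch the background or the image border. Both function names and this pixel-count definition are assumptions.

```python
def region_area(mask):
    """Number of pixels in the region (mask entries equal to 1)."""
    return sum(sum(row) for row in mask)


def contour_length(mask):
    """Count region pixels that touch the background or the image border (4-neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    count += 1
                    break
    return count


# 3x3 square region inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
area = region_area(mask)
length = contour_length(mask)
```

For the square above, every region pixel except the central one lies on the contour.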

6. LEVELSET METHODS RESULTS USING MATLAB AS IN FIG(6)

Figure 6. The final results of levelset method to brain image and elapsed time of processing (28.5 sec)


7. CONCLUSION

In the DIS method, the segmentation regions and their boundaries were well defined, and all of the boundaries are accurately located at the true edges, as shown in Figure 1 (c, d), Figure 3-g, and Figure 6-c. If we apply k-means first and then DIS with 25% of the mean DIS, we obtain all the edges of our images, as shown in Figure 4 above, so we need not use the watershed technique. We also concluded that using multiple thresholds is important to eliminate false edges and thus obtain larger regions; that the DIS map contains all edge information about the input image, even in the smooth regions; and that the combination of k-means, the watershed segmentation method, and the DIS map is a good technique for image segmentation and edge detection, where the final segmentation result is one closed boundary per actual region of the image under study. The results are very sensitive to the two edge strength gradient values (T1, T2), with T1 less than T2. An incorrect choice of these values gives incorrect image segmentation and edge detection results, which is a disadvantage. We will therefore develop this work in future by determining the threshold values automatically. Finally, a disadvantage of these techniques is that they depend mainly on the k-means results: if the clustering procedure is not carried out correctly, the results of the other techniques we used are also incorrect. However, in this paper we solved the problem of undesirable oversegmentation produced by the watershed algorithm, and the edge maps we obtained have no broken lines over the entire image. In the level set method, we conclude that the level set technique and the object extraction methods give very accurate and clear results.
We found that 50 iterations are very good for some images, and the evolution type depends upon the parameter values in function Phi [4,8] (see Figure 6): phi = evolve2D(phi,dx,dy,0.5,25,[],[],0,[],0,[],[],1,b); the processing time also depends upon the number of iterations. In the other study, using active contours based on techniques of curve evolution, the Mumford-Shah functional for segmentation, and level sets is a good and accurate method to detect object (region) boundaries and to isolate and extract individual components from our image. It is possible to detect objects whose boundaries are not necessarily defined by gradient by minimizing an energy of the Mumford-Shah functional for segmentation, which can be seen as a particular case of the minimal partition problem in which the stopping term does not depend on the gradient of the image, as it does in the classical active contour, and the initial curve of the level set can be anywhere in the image. This helps us to obtain a final image segmentation with one closed boundary per actual region in the image, where the segmentation problem involves finding the closed curve C that lies along the boundary of the object of interest in the image. It is then easy to calculate the region area and the boundary length. The level set approach allows the evolving front to extract the boundaries of particularly intricate contours. We can also use this method with different kinds of images (e.g., medical images and remote sensing images) to detect object boundaries and to isolate and extract individual components (as segmented maps), as shown for example in Figure 5 (c). Finally, edge detection refers to the process of identifying and locating sharp discontinuities in an image. The discontinuities are abrupt changes in pixel intensity which characterize boundaries of objects in a scene.

References

[1] Salman N. and Liu C. Q. (2003) “Image Segmentation and Edge Detection Based on Watershed Techniques”, International Journal of Computers and Applications, Vol. 25, No. 4, pp. 258-263.

[2] Yu, Yi-Wei, Wang, Jung-Hua (1999) Proc. of the IEEE International Conference on Systems, Man and Cybernetics (SMC), 6: P-798.

[3] Chowdhury, Mahbubul Islam; Robinson, John A. (2000) IEEE Proc. of Canadian Conference on Electrical and Computer Engineering, 1: P-312.

[4] Tang, H., Wu, E. X., et al. (2000) Computerized Medical Imaging and Graphics, Vol. 24, No. 6, P-349.

[5] Qixiang, Ye, Wen, G., Weiqquiang, W. (2003) “A New Texture Insensitive Edge Detection Method”, Institute of Computing Technology, Chinese Academy of Sciences, China, ICICS-PCM 2003, 15-18 Dec 2003, Singapore.

[6] Caragea S. (2008) Founder, Administrator and Chief Editor, IntelliProject, “Difference Edge Detection”, Licensed under IntelliProject open License, Romania, http://www.intelliproject.net

[7] S. S. Al-amri, N. V. Kalyankar and S. D. Khamitkar (2010) “Image Segmentation by using Edge Detection”, International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 03, pp. 804-807.

[8] Puman Thakare (2011) “A study of image segmentation and edge detection techniques”, International Journal of Computer Science and Engineering (IJCSE), Vol. 3, No. 2, Feb 2011.

[9] Baris Sumengen (2005) A Matlab toolbox implementing Level Set Methods, Vision Research Lab at UC Santa Barbara.

[10] Matlab the language of technical computing, version 7.6.0.324 (R2008a).

[11] Vincent L. and Soille P. (1991) “Watershed in Digital Space: An Efficient Algorithm Based on Immersion Simulations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 6, pp. 583-593.

[12] Yu Y. and Wang J. (1999) “Image Segmentation Based on Region Growing and Edge Detection,” in Proceedings of the 6th IEEE International Conference on Systems, Man and Cybernetics, Tokyo, Vol. 6, pp. 798-803.

[13] M. Kass, A. Witkin, and D. Terzopoulos (1988) “Snakes: Active contour models,” Int. J. Comput. Vis., Vol. 1, pp. 321-331.

[14] V. Caselles, F. Catté, T. Coll, and F. Dibos (1993) “A geometric model for active contours in image processing,” Numer. Math., Vol. 66, pp. 1-31.

[15] R. Malladi, J. A. Sethian, and B. C. Vemuri (1993) “A topology independent shape modeling scheme,” in Proc. SPIE Conf. Geometric Methods Computer Vision II, Vol. 2031, San Diego, CA, pp. 246-258.

[16] V. Caselles, R. Kimmel, and G. Sapiro (1997) “On geodesic active contours,” Int. J. Comput. Vis., Vol. 22, No. 1, pp. 61-79.

Author

Gullanar M. Hadi was born in 1964. She received her B.Sc. and M.Sc. degrees from the College of Science, Al-Mustansyriah University, Baghdad, Iraq, in 1985 and 1989 respectively. She received her PhD degree in Opto-Electronics engineering from Shanghai Jiao Tong University, China, in 2004. Her research interests include image processing and nano-materials. From 1985 to 2006 she worked at Al-Mustansyriah University, College of Science, Baghdad, Iraq; from 2006 to 2009 she worked as a lecturer in several Jordanian universities; and from 2009 to the present she has been a lecturer in the Software Engineering Dept., Salahaddin University, Erbil, Iraq.

Corresponding author: SALMAN, N. H. was born in 1960. He received his B.Sc. and M.Sc. degrees from Al-Mustansyriah University, College of Science, in 1983 and 1989 respectively. He received his PhD degree in Image Processing and Pattern Recognition from Shanghai Jiao Tong University, China, in 2002. His research interests include remote sensing, image processing, and image analysis based on image segmentation and edge detection techniques. He is also interested in computer programming languages and Matlab programming. From 1982 to 1998 he worked at the Space Research Center, Remote Sensing Dept., in Baghdad, Iraq, and from 2002 he was an assistant professor in the Dept. of Computer Science, Zarqa Private University, Jordan. He is now head of the Computer Science Dept., Cihan University, Erb

A Much Advanced and Efficient Lane Detection Algorithm for Intelligent Highway Safety

Prof. Sachin Sharma1 and Dr. D. J. Shah2

1 Department of Electronics & Communication, GTU, Ahmedabad, India

2 Department of Electronics & Communication, GTU, Ahmedabad, India

[email protected] [email protected]

ABSTRACT

This paper presents a much advanced and efficient lane detection algorithm. The algorithm is based on Region of Interest (ROI) segmentation. In this algorithm, images are pre-processed by a top-hat transform for de-noising and enhancing contrast. The ROI of a test image is then extracted. The Hough transform is used to detect lines in the ROI. The distance between the Hough origin and each lane-line midpoint is estimated. The lane departure decision is made based on the difference between these distances. Simulations were carried out in Matlab. Experiments show that the proposed algorithm can detect the lane markings accurately and quickly.

Keywords Hough transform, Top-Hat transform, lane detection, lane departure, ROI Segmentation.

1. INTRODUCTION

With the help of available machine vision algorithms, dozens of processors control every performance aspect of today's automobiles, and this number is rising exponentially. In the future, vehicles will tend to be more intelligent and will assist the driver with regard to both comfort and safety. Several facilities are offered under Advanced Driver Assistance Systems (ADAS), such as night vision assistance, lane departure warning system (LDWS), pedestrian detection system (PDS), smart airbags, cruise control, etc. As the reliability and performance of the algorithms have improved significantly with the increasing performance of computers, vision systems have been acknowledged in the automatic control community as a powerful and versatile sensor to measure motion, position, and structure of the environment. If efficient algorithms are developed for such modern vision systems, the performance of the system will certainly improve to a large extent. As the challenges in identifying road lanes increase, robust algorithms must be used to mitigate the problems of poor lane detection, low efficiency, and poor performance under different traffic and environmental conditions. Road lanes are often faded and not visible.

Sundarapandian et al. (Eds) : ITCS, SIP, CS & IT 09, pp. 51–59, 2013. © CS & IT-CSCP 2013

DOI : 10.5121/csit.2013.3106


2. PROBLEM ADDRESSED

As the challenges in identifying road lanes increase, robust algorithms must be used to mitigate the problems of poor lane detection, low efficiency, and poor performance in traffic and different environmental conditions. Road lanes are often faded and not visible. Two-lane, three-lane, and four-lane roads are present in many cities of developed and under-developed countries. Such factors become obstacles in identifying the road lane for Lane Departure Warning Systems (LDWS). In particular, when multiple lanes are present on a road, the detection algorithm may identify all of these lanes because of the viewing angle of the camera inside the car or vehicle. On urban highways, multiple entry and exit points are present with relatively small distances between adjacent entry and exit points. This explains the presence of various lane markings on urban roads. For LDWS, these detected lane edges may lead the detection algorithm towards complexity and inaccuracy. Also, multiple lane boundaries may give false departure warnings. During the processing of lane departure, the time to lane crossing (TLC) parameter may be affected. According to the survey carried out by the National Highway Traffic Safety Administration of the US, 43% of total traffic accident casualties result from abnormal lane switching or departure on the road, which is also the major cause of traffic accidents in the list [1]. In previous studies of the Driver Assistance System (DAS), a much more powerful computing machine and a large memory are required to carry out the calculations of the computer vision and graphic processing algorithms [2], [3]. Some articles contribute studies and methods of lane recognition, such as the stereo vision system [4], [5], which transforms the image coordinate system back to real world coordinates. The method is then applied to identify the lane markings and remove other irrelevant objects in the image. To improve performance, it was proposed to use the curvature method only in the far end of the image and to adopt the straight line pattern in the near end to identify the lane markings, in order to reduce the time required for identification [6]. Many approaches have been applied to lane detection, which can be classified as either feature-based or model-based [7], [8]. Hsiao et al. present a lane departure algorithm based on a spatial and temporal mechanism [9], but this approach suffers under poor illumination. In [10]-[11], an occlusion handling algorithm for lane tracking is presented, but it has the limitation of low computational speed.

3. ABOUT THIS PAPER

In this paper, an effective ROI is considered as the first step of algorithm processing after preprocessing by top-hat transform [2]. The ROI is further segmented to avoid the problem of multiple lanes. Segmenting the ROI has the advantage of dividing multiple lanes present in the ROI. This ROI is further divided into left and right sub-regions. Lane marking using HT is carried out in the segmented regions of an image. Processing an image without segmentation detects many Hough lines, which creates ambiguity in estimating lane departure. Segmenting the ROI reduces the complexity of lane detection. Segmentation helps to identify lanes in an appropriate manner, giving only the desired lane lines required for estimating lane departure information. This methodology increases the speed of operation and reduces ambiguity, and hence reduces the computational time required for lane departure warning. Thus, the driver gets lane departure information instantly and has more warning onset time. It is desirable for LDWS to have more onset time. Onset time is the amount of time the driver has to bring the car back into the lane after it has deviated out of the lane.


The paper is organized as follows. Section 4.1 describes procedure for dynamic threshold value selection. Section 4.2 describes segmentation of ROI. Modified lane departure method is elucidated in section 4.3. Section 5 explains experimental validation. Section 6 concludes the paper.

4. DYNAMIC THRESHOLD VALUE SELECTION

In this paper, a method based on histogram statistics is used to determine a fitting threshold value dynamically.

4.1 Proposed Method

The procedure is to define a neighborhood and move its center from pixel to pixel. At every location, the histogram of the points in the neighborhood is first computed, and the histogram specification transformation function is then obtained. This function is used to map the intensity of the pixel centered in the neighborhood. As shown in Figure 1, the center of the neighborhood region is then moved to an adjacent pixel location and the procedure is repeated. Because only one row or column of the neighborhood changes during the pixel-to-pixel translation of the neighborhood, the histogram obtained at the previous location can be updated with the new data introduced at each motion step. Row translation is shown in Figure 1. A 4×4 neighborhood is taken into consideration. This method has many advantages, as can be seen from the Figure. Figure 2 shows the histogram of the input and output images.

Figure 1. Pixel translation with 4×4 neighborhood
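The incremental histogram update described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the window is addressed by its top-left corner, and the image is a small list of gray levels rather than a camera frame.

```python
def window_histogram(image, top, left, size, levels):
    """Histogram of the size x size neighbourhood whose top-left corner is (top, left)."""
    hist = [0] * levels
    for r in range(top, top + size):
        for c in range(left, left + size):
            hist[image[r][c]] += 1
    return hist


def slide_right(hist, image, top, left, size):
    """Update hist in place when the window moves one pixel to the right.

    Only the column that leaves (left) and the column that enters
    (left + size) differ between the two window positions.
    """
    for r in range(top, top + size):
        hist[image[r][left]] -= 1
        hist[image[r][left + size]] += 1
    return hist


# 4x5 toy image with gray levels 0..3 and a 4x4 neighbourhood.
image = [
    [0, 1, 2, 3, 0],
    [1, 1, 2, 3, 3],
    [0, 0, 2, 2, 1],
    [3, 1, 0, 2, 2],
]
hist = window_histogram(image, 0, 0, 4, levels=4)
hist = slide_right(hist, image, 0, 0, 4)
```

The update touches 2 × 4 pixels instead of recounting all 16, and the saving grows with the neighborhood size.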


Figure 2. Histogram Specification Transformation Function. (a) Input Image (b) Output Image

The output image is obtained when the proposed method of dynamic threshold value selection is applied to an input image. The Rayleigh distribution is used in the histogram specification transformation because it describes the random brightness level and contrast ratio of lane images appropriately. From Figure 2 (b) it is clear that the histogram is equalized and uniformly spaced. This process gives the input image an enhanced contrast level, which makes lane detection easier. Figure 3 shows the flow of the algorithm.
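A hedged sketch of histogram specification towards a Rayleigh shape is shown here. It works on the whole image rather than on a moving neighborhood, uses the standard inverse Rayleigh distribution function, and takes the scale `sigma` as an assumed parameter that the paper does not state. The function name is hypothetical.

```python
import math


def rayleigh_specification(image, levels=256, sigma=0.4):
    """Remap gray levels so that their histogram approximates a Rayleigh shape.

    sigma is the Rayleigh scale on a 0..1 intensity axis (assumed value).
    """
    pixels = [v for row in image for v in row]
    n = len(pixels)

    # Cumulative distribution of the input gray levels.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / n)

    # Inverse Rayleigh CDF: x = sigma * sqrt(-2 * ln(1 - u)).
    lookup = []
    for u in cdf:
        u = min(u, 1.0 - 1.0 / (2 * n))  # keep ln() finite at u = 1
        x = sigma * math.sqrt(-2.0 * math.log(1.0 - u))
        lookup.append(min(levels - 1, round(x * (levels - 1))))

    return [[lookup[v] for v in row] for row in image]


image = [
    [10, 10, 20, 20],
    [20, 30, 30, 40],
]
output = rayleigh_specification(image)
```

The mapping preserves the ordering of gray levels, so darker input pixels stay darker than brighter ones while the dark, low-contrast input range is spread over a wider output range.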

Figure 3. Flow of the algorithm

4.2 ROI Segmentation

The lower area of a lane image, shown dotted in Figure 4, is considered the region of interest (ROI). Road lanes are present in this part of the image. This is the lower region of the view seen by a camera, which can be situated inside a car near the rear view mirror. This ROI is further divided into left and right sub-regions. Lane marking using the Hough Transform (HT) is carried out in the segmented regions of an image.

Figure 4. Region of Interest

Segmentation helps to identify lanes in an appropriate manner, giving only the desired lane lines required for estimating lane departure information. This methodology increases the speed of operation. Also, with reduced ambiguity, the computational time required for lane departure warning is reduced. Thus, the driver gets lane departure information instantly and has more warning onset time.
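The ROI extraction and left/right split can be sketched in a few lines. The fraction of rows kept as the lower region (`roi_fraction`) is an assumption, since the paper only shows the ROI graphically in Figure 4, and the function names are hypothetical.

```python
def extract_roi(image, roi_fraction=0.5):
    """Keep the lowest roi_fraction of the rows (assumed fraction)."""
    rows = len(image)
    start = rows - int(rows * roi_fraction)
    return image[start:]


def split_left_right(roi):
    """Split the ROI at its middle column into left and right sub-regions."""
    mid = len(roi[0]) // 2
    left = [row[:mid] for row in roi]
    right = [row[mid:] for row in roi]
    return left, right


# 4x6 toy image whose pixel value encodes (row, column) as 10 * row + column.
image = [[10 * r + c for c in range(6)] for r in range(4)]
roi = extract_roi(image)
left, right = split_left_right(roi)
```

Each sub-region can then be passed to the line detector separately, so that the left and right lane boundaries are searched for independently.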

4.3 Modified Lane Departure Method

The new proposed methodology for lane departure indication is described in this section. The ROI of an image is extracted and represented as Ri. Edges in the image are detected using the Hough transform. The Hough origin Ho is placed at the coordinate (x/2, 0). The edges of the lanes are extracted. The left edge mid-point and right edge mid-point, ML and MR, are calculated. A line joining each mid-point to the Hough origin is plotted and its length is measured as KL and KR respectively. The horizontal distance between the mid-points is noted as length C, shown below in Figure 5.
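The lengths KL, KR and C can be computed from the two detected lane edges as in the sketch below. It assumes that x in (x/2, 0) is the ROI width, that each edge is given by its two end points, and that distances are Euclidean. The function names are hypothetical.

```python
import math


def midpoint(p, q):
    """Midpoint of the segment between points p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def departure_lengths(left_edge, right_edge, roi_width):
    """Return (C, K_L, K_R) for two lane edges given as ((x1, y1), (x2, y2))."""
    origin = (roi_width / 2.0, 0.0)  # Hough origin H_o at (x/2, 0)
    m_left = midpoint(*left_edge)
    m_right = midpoint(*right_edge)
    k_left = math.hypot(m_left[0] - origin[0], m_left[1] - origin[1])
    k_right = math.hypot(m_right[0] - origin[0], m_right[1] - origin[1])
    c = abs(m_right[0] - m_left[0])  # horizontal distance between mid-points
    return c, k_left, k_right


# Symmetric toy lane in a 200-pixel-wide ROI.
c, k_left, k_right = departure_lengths(
    left_edge=((20, 80), (80, 0)),
    right_edge=((180, 80), (120, 0)),
    roi_width=200,
)
```

For a car centered between two symmetric lane edges the two lengths are equal, and they diverge as the car drifts towards one side.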

Figure 5. New Lane Departure Calculation on ROI (a) Left departure, (b) Right departure

If the value of length C is greater than the initial threshold value Ti, the position of the car is examined for departure. The lengths KL and KR are used to obtain this information. As shown in Figure 5 (a) above, if length KR is less than KL then the car is near the right lane; otherwise, if length KR is greater than KL, the car is near the left lane. Initial thresholds for the minimum lengths are set. If either of the lengths KL, KR falls below its threshold TL, TR, then lane departure on the left side or right side occurs and the necessary warning is given to the driver. The algorithm for the proposed lane departure method is given in the following pseudo code.

On the contrary, if the value of C is less than the initial threshold value Ti, as shown in Figure 6, the car is crossing the lane and is on the central axis of the road.

Figure 6. New Lane Departure Calculation on ROI

As shown in this Figure, the dotted lane marking is identified. Edges are extracted with outer boundaries. The length C is the distance between the edges shown in Figure 6. C is always less than the initial threshold value when the car is in the left or right lane. Also, during left or right departure, C is always greater than the initial threshold. ROI segmentation is taken into account. The uniqueness of the algorithm lies in considering the value of C as shown in Figure 5. Three cases are assumed:

Case I: C is greater than the initial threshold value Ti when left departure occurs. In this case, the value of C is greater than 50. The lengths KL and KR are calculated. The centroid of KL, KR is estimated, which decides the C value. For left departure, the condition KL < KR is satisfied.

Case II: C is greater than the initial threshold value Ti when right departure occurs. In this case, the value of C is greater than 50. For right departure, the condition KR < KL is satisfied.

Case III: C is less than the initial threshold value Ti. In this case the C value is less than 50. The car is crossing the lane.
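The three cases and the minimum-length thresholds can be combined into a small decision function, sketched below under stated assumptions. The initial threshold Ti = 50 comes from the cases above. The default TL = TR = 75 is a hypothetical value, not taken from the paper; it was picked only so that the sketch reproduces the Departure column of Table 1. The function name and the handling of ties are also assumptions.

```python
def classify_departure(c, k_left, k_right, t_i=50, t_left=75, t_right=75):
    """Classify the car position from C, K_L and K_R.

    t_i = 50 is the initial threshold quoted in the three cases.
    t_left and t_right stand for T_L and T_R; the value 75 is hypothetical.
    """
    if c < t_i:
        return "Crossing"  # Case III: car is crossing the lane
    if k_left >= t_left and k_right >= t_right:
        return "In Lane"   # neither length is below its threshold
    if k_left < k_right:
        return "Left"      # Case I
    return "Right"         # Case II (a tie also lands here; the source does not cover it)


# (C, K_L, K_R, departure) rows as listed in Table 1.
table_1 = [
    (100, 72, 58, "Right"),
    (79, 95, 95, "In Lane"),
    (130, 68, 134, "Left"),
    (120, 80, 49, "Right"),
    (150, 80, 80, "In Lane"),
    (110, 70, 138, "Left"),
    (115, 69, 140, "Left"),
]
results = [classify_departure(c, kl, kr) for c, kl, kr, _ in table_1]
```

Checking C first keeps the lane-crossing case separate from the left and right departure tests, which only apply when C exceeds the initial threshold.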

5. EXPERIMENTAL VALIDATION

The proposed lane departure algorithm is simulated in MATLAB. The software runs on an i5 processor at 2.53 GHz. The original image is shown in Figure 7 (a). Lane detection is performed using the Hough transform. The detected lane boundaries are shown in green in Figure 7 (b). It is seen that HT detects the lane boundaries accurately.
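For readers unfamiliar with the Hough transform, the voting step can be sketched as below. This is a generic textbook accumulator in the normal form rho = x cos(theta) + y sin(theta) with 1-degree steps, not the MATLAB routine used in the experiments, and the function names are hypothetical.

```python
import math


def hough_accumulator(edge_points):
    """Vote in (rho, theta) space using rho = x*cos(theta) + y*sin(theta).

    Returns a dict mapping (rho, theta_degrees) to its vote count.
    """
    votes = {}
    for x, y in edge_points:
        for deg in range(180):
            theta = math.radians(deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            key = (rho, deg)
            votes[key] = votes.get(key, 0) + 1
    return votes


def strongest_line(votes):
    """Return the (rho, theta_degrees) cell with the most votes."""
    return max(votes, key=votes.get)


# 100 edge pixels on the vertical line x = 5.
points = [(5, y) for y in range(100)]
votes = hough_accumulator(points)
peak = strongest_line(votes)
```

All pixels of a straight lane marking vote for the same (rho, theta) cell, so the strongest cells of the accumulator correspond to the detected lane boundaries.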

Figure 7. Lane Detection. (a) Original Image, (b) Lane Detection shown in green color

The modified lane departure method is used to generate a warning to the driver. If the car deviates from the lane, the color of the identified lane markings is changed from green to red. A caution or warning is generated and displayed to the driver. Figure 8 (a) shows that the car is departing towards the right side. Figure 8 (b) shows that the car is crossing the middle boundary and is at the center of the road. Figure 8 (c) shows that the car is departing towards the left side.

Figure 8. Lane Departure shown in Red color. (a) Right Side, (b) At Center, (c) Left Side

In case III, the C value is 130. Also, length KR is greater than KL, indicating that the left departure condition has occurred. Thus, accurate predictions are obtained using the proposed algorithm. Table 1 shows that the proposed algorithm gives lane departure information in a fraction of a second, with an average value of 0.053622 second. The second last column shows the time required for each execution of identifying the departure.

Table 1: Lane Departure Parameters of Proposed Algorithm

Image   C     KL    KR    Time (s)    Departure
1       100   72    58    0.051774    Right
2       79    95    95    0.046725    In Lane
3       130   68    134   0.062367    Left
4       120   80    49    0.053745    Right
5       150   80    80    0.053512    In Lane
6       110   70    138   0.054011    Left
7       115   69    140   0.053113    Left

6. CONCLUSION

In this paper, an improved method for a lane departure warning system is presented. The Hough transform is used to detect the lane markings. The lane departure method is improved by the ROI segmentation technique. By measuring the distance between the lanes and using it to decide on left or right departure, the proposed algorithm accurately detects the lanes in a short span of time. The proposed algorithm has an average execution time of 0.053622 second. It has the benefit of low complexity and fast execution. If optimized, the algorithm will run even faster. A lane departure warning system requires the algorithm to execute in a short span of time with good accuracy, so that the driver gets more onset time to bring the car back into the lane. Our paper fulfills these conditions by taking less time to generate a warning. Thus, the proposed algorithm is suitable for real-time LDWS applications.

ACKNOWLEDGEMENTS The authors would like to thank everyone, just everyone!

REFERENCES

[1] National Highway Traffic Safety Administration, http://www.nhtsa.dot.gov/

[2] Long Chen, Qingqyan Li and Qin Zou, “Block-Constraint Line Scanning Method for Lane Detection”, IEEE Intelligent Vehicles Symposium, 2010.

[3] Robert M. Haralick and Linda G. Shapiro, “Computer and Robot Vision,” Vol. 1, Addison Wesley Publishing Company Inc., 1992.

[4] Yue Feng WAN, Francois CABESTAING and Jean-Christophe BURIE, “A new edge detector for Obstacle Detection with a Linear Stereo Vision System”, IEEE Proceedings, 2010, pp. 130-135.

[5] Mathias Perrollaz, Anne Spalanzani and Didier Aubert, “Probabilistic representation of the uncertainty of stereo vision and application to obstacle detection”, 2010 IEEE Intelligent Vehicles Symposium, University of California, San Diego, CA, USA, June 21-24 2010, pp. 313-318.

[6] C. R. Jung and C. R. Kelber, “A robust linear parabolic model for lane following,” Proceedings of XVII Brazilian Symposium on Computer Graphics and Image Processing, Oct. 2004, pp. 72-79.

[7] Joel C. McCall and Mohan M. Trivedi, “Video-based Lane Estimation and Tracking for Driver Assistance: Survey, System, and Evaluation”, IEEE Transactions on Intelligent Transportation Systems, vol. 7, 2006, pp. 20-37, doi: 10.1109/TITS.2006.869595.

[8] Broggi and S. Berte, “Vision-based Road Detection in Automotive Systems: a Real-time Expectation-driven Approach”, Journal of Artificial Intelligence Research, vol. 3, 1995, pp. 325-348.

[9] Pei-Yung Hsiao, Chun-Wei Yeh, Shih-Shinh Huang, and Li-Chen Fu, “A Portable Vision-Based Real-Time Lane Departure Warning System: Day and Night”, IEEE Transaction on Vehicular Technology, vol. 58, No. 4, May 2009.

[10] Bing-Fei Wu, Chuan-Tsai Lin, and Yen-Lin Chen, “Dynamic Calibration and Occlusion Handling Algorithms for Lane Tracking”, IEEE Transaction on Industrial Electronics, vol. 56, No. 5, May 2009.

[11] Nak Yong Ko, Reid Simmons, Koung Kim, “A Lane based obstacle avoidance Method for Mobile Robot Navigation”, KSME International Journal, Vol. 17, No. 11, pp. 1693-1703, 2010.

Authors:

Sachin Sharma, currently pursuing a Ph.D., is Assistant Professor in the Electronics and Communication Department, SVBIT, Gandhinagar (Gujarat). He has more than 5 years of experience in academics, research, and industry. He has published numerous articles related to Image Processing, Digital Signal Processing, and Intelligent Transportation Systems. He is an active member of several professional societies, including ISTE, IEEE and SAE.

Dr. Dharmesh Shah is Principal at LCIT, Bhandu (Gujarat). He is also the Dean, Engineering (Zone II), GTU, Ahmedabad. He has more than 15 years of experience in academics, research, and industry. He has published numerous articles related to VLSI, Digital Signal Processing, and Image Processing. He is an active member of several professional societies, including IETE, ISTE and IEEE.
