Quranic Computation: A Review of Research and Application

Rahmath Safeena
College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
e-mail: [email protected]

Abdullah Kammani
College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
e-mail: [email protected]

Abstract- The Noble Qur’an is the central religious text of Islam. Linguistic or literary research on this text that uses computational technologies can benefit billions of people around the world. This research approach, Qur’anic Computation, is now firmly established in both research and application. This study reviews the evolution of computational work on the Noble Quran from both the research and the application point of view. The review is based on an exploratory study of the research literature and of the documentation of various applications. The study finds that the research and applications that make up Quranic Computation share the common goal of making the Quran easier to understand, but have chosen distinct, complementary methodologies and techniques to achieve it. A snowball technique of collection, classification and categorization of articles and documents from 1997 to 2011 was adopted in the review.

Keywords- Quranic Computing, Computational Linguistics, Quranic Arabic, Review Paper.

I. INTRODUCTION

The Quran is the central religious text of Islam. It is held by Muslims to be a single-authored text, the direct words of God (Allah), conveyed by the angel Gabriel to Prophet Muhammed (peace be upon him). The Noble Quran is one of the best-known books in the world; it was revealed in Arabic, and many people would like to understand more about it [2]. Arabic words are known to have complex morphological structure [1]. Arabic is a highly inflected language, with nouns and verbs taking different morphological forms according to their role in a sentence. Traditional grammar is familiar to native speakers who have studied Arabic formally, and many books have been written about the language of the Quran which explain the text in terms of traditional grammar [3-5]. The intent of this paper is to analyze the research and development work that has been done on the Noble Qur’an.

II. METHODOLOGY

The research method used in this paper is a review of the literature on Quranic computation from 1997 to 2011. Articles were found via a computerized search on the selected topic. This paper surveys the development of Quranic Computation using a literature review and classification of articles, in order to explore how Quranic Computation has evolved and on which directions future work needs to focus. The paper covers journal articles, conference proceedings and dissertations. Based on the scope of 39 articles from academic journals, this paper finds that Quranic Computation has evolved from two distinct, complementary directions of study: computation of a) General Arabic and b) Quranic Arabic. A survey and analysis of these two streams of study reveals a few research gaps and suggests some future directions for the field.

III. LITERATURE REVIEW
One of the techniques used in Quranic computation is computational linguistics. Computational linguistics is an interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective. This modeling is not limited to any particular field of linguistics, but research on Quranic computation is imperative. The reason is that the Qur’an contains many classical words and its writing style is very different from Modern Standard Arabic. It is especially important to preserve the correctness of words in this sacred book of the Muslims [1].

A. Computational Linguistics

Intelligent natural language processing is based on the science called computational linguistics. Computational linguistics can be considered a synonym of automatic processing of natural language, since its main task is the construction of computer programs to process words and texts in natural language [6]. Recent computational linguistics research incorporates statistical techniques as well as knowledge-based techniques. It is an approach to linguistics that employs the methods and techniques of computer science: a formal, rigorous, computationally based investigation of questions that are traditionally addressed by linguistics [7]. Computational linguistics (CL) is at the crossroads of linguistics and computer science. Applied CL focuses on developing practical applications that have some facility with human language. Applications using CL research which are currently available include voice recognition software, web search engines, word processors (spell checkers, grammar checkers) and machine translation systems (automatic language translation). There are many more exciting applications
Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences December 22 – 25, 2013, Madinah, Saudi Arabia
currently under development: multilingual information retrieval, information extraction, colloquial machine translation, and systems that create digests of newspapers, journals and magazines. Computational linguistics applied to the Noble Qur’an has so far taken two distinct, complementary directions: A) General Arabic and B) Quranic Arabic.

B. General Arabic

Arabic is acquiring attention in the natural language processing (NLP) community because of its political importance and the linguistic differences between it and European languages. These linguistic characteristics, especially complex morphology, present interesting challenges for NLP researchers [8]. Recent computational advances have made it possible to annotate the Quran to very high accuracy [9]. One of the main goals of Arabic Natural Language Processing is effective document retrieval. For example, if a query is input through a search engine, the relevant documents must be retrieved on the basis of either the root or the stem of the word. Therefore, the goal of most Arabic morphological analyzers and stemming engines is to extract the root and/or stem of a word.

Bielicky and Smarz [10] describe the building of a valency lexicon of Arabic verbs using a morphologically and syntactically annotated corpus, the Prague Arabic Dependency Treebank. Their work is built on the ‘Functional Generative Description (FGD)’ theory, in which verbs have a valency frame with complements known as functors, which are further divided into actants (Actor, Addressee, Patient, Effect and Origin) and adjuncts (such as Manner, Means and Location). This FGD concept was adapted for Arabic verbs [11].

Al-Qahtani [12] gives an extensive categorization of Modern Standard Arabic verb valence based on Case Grammar (CG). In this matrix, five cases (Agent, Experiencer, Benefactive, Object and Locative) are plotted horizontally and the type of verb (State, Process, Action) vertically. The data was taken from 8327 verbs in a lexicon, and the 200 most frequent verbs were exhaustively sorted into cells of the matrix.

Salem [8] developed a rule-based lexical framework for Arabic language processing using the Role and Reference Grammar linguistic model. A system called UniArab is introduced to support the framework. UniArab utilizes an XML-based implementation of elements of the Role and Reference Grammar theory and its representations for the universal logical structure of Arabic sentences. The UniArab system for Modern Standard Arabic (MSA) takes MSA as input in the native orthography, parses the sentence(s) into a logical meta-representation, and generates a grammatically correct English output with full agreement and morphological resolution.

Attia, Rashwan, Ragheb, Al-Badrashiny, Al-Basoumy, and Abdou [13] designed and implemented an Arabic lexical semantics Language Resource (LR) that enables the retrieval of the possible senses of any given Arabic word at a high coverage. Instead of tying full Arabic words to their possible
senses, their LR flexibly relates Arabic lexical compounds, constrained by morphology and PoS tags, to a predefined limited set of semantic fields across which the standard semantic relations are defined. With the aid of the same large-scale Arabic morphological analyzer and PoS tagger at runtime, the possible senses of virtually any given Arabic word are retrievable.

Habash and Rambow [14] developed MAGEAD, a morphological analyzer and generator for the Arabic language, which decomposes word forms into templatic morphemes and relates morphemes to strings.

Buckwalter [15] designed the Buckwalter morphological analyzer, which uses a concatenative lexicon-driven approach where morphotactics and orthographic rules are built directly into the lexicon. The system has three components: the lexicon, the compatibility tables and the analysis engine. An Arabic word is viewed as a concatenation of three regions: a prefix region, a stem region and a suffix region. The prefix and suffix regions can be null. Prefix and suffix lexicon entries cover all possible concatenations of Arabic prefixes and suffixes, respectively. For every lexicon entry, a morphological compatibility category, an English gloss and occasional part-of-speech (POS) data are specified.

Habash [16] described an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies that are word-aligned with target-language words in a parallel corpus are used to automatically extract syntactic reordering rules.

Belkredim and El Sebai [17] used the derivations of Arabic verbs and their patterns to structure the Arabic language and to link the words' morphology to their semantics. The model developed in their research was based on an ontology using the derivation rules of the Arabic language. The model was evaluated by linguists to validate its applicability.

C. Quranic Arabic

The Qur’an is a classical book and the language is in the traditional Arabic known as i’rab [1]. The Qur'an has the advantage of being a closed corpus in the following senses. First, it demonstrates a frequent repetition of structures, indeed of the same phrases, to the extent of what may be considered formulaic style. Second, the Qur'an is traditionally identified with one person, a specific region and a certain period of time, and its volume is relatively restricted. These two facts justify treatment of the Qur'an as an independent corpus which deserves an independent study of its language in general and its syntax in particular [18]. Understanding the Quran is a grand challenge for society, for western public education, for Muslim-world education, for knowledge representation and reasoning, for knowledge extraction from text, for systems robustness and correctness, and for online collaboration. Understanding the Quran is a major new Grand Challenge for Computer Science and Artificial Intelligence [19].

Dukes, Atwell, and Sharaf [20] are working on the Quranic Arabic Corpus, a resource which provides morphological annotation and syntactic analysis using
dependency grammar. The Quranic Arabic Corpus is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation [21] and syntactic analysis using dependency grammar [22].

TABLE I. STUDIES ON ARABIC COMPUTING

Author | Focus | Method | Result
[8] | Arabic to English translation | RRG theory, XML-based metadata, Java programming and Interlingua design for machine translation | UniArab system
[13] | To retrieve the sense of any given Arabic word | Arabic morphological analyzer and PoS tagger | Arabic lexical semantics Language Resource (LR)
[12] | Categorization of Modern Standard Arabic verb valence based on Case Grammar | Data was taken from a lexicon | Verbs were exhaustively sorted to a cell in the matrix
[10] | Build a valency lexicon of Arabic verbs | Morphologically and syntactically annotated corpus | Prague Arabic Dependency Treebank
[14] | Morphological analyzer and generator for the Arabic language | Morphological analyzer | MAGEAD
[15] | Morphological analysis | Morphological analyzer | Buckwalter morphological analyzer
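The concatenative, lexicon-driven analysis attributed to Buckwalter [15] above (a word as prefix + stem + suffix, filtered by compatibility tables) can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the three tiny lexicons, the category names, the simplified vocalized transliteration, and the choice to check prefix-stem, stem-suffix and prefix-suffix pairs. It is not the actual lexicon or code of the Buckwalter analyzer.

```python
from typing import List, Tuple

# Toy lexicons in a simplified, vocalized Latin transliteration.
# Every entry and category name below is invented for illustration.
PREFIXES = {"": "P0", "wa": "P_wa", "al": "P_al"}
STEMS = {"kitab": ("NOUN", "book"), "katab": ("PERF_VERB", "write")}
SUFFIXES = {"": "S0", "u": "S_nom", "a": "S_3ms"}

# Compatibility tables: category pairs that are allowed to co-occur.
PREFIX_STEM = {("P0", "NOUN"), ("P_wa", "NOUN"), ("P_al", "NOUN"),
               ("P0", "PERF_VERB"), ("P_wa", "PERF_VERB")}
STEM_SUFFIX = {("NOUN", "S0"), ("NOUN", "S_nom"), ("PERF_VERB", "S_3ms")}
PREFIX_SUFFIX = {("P0", "S0"), ("P0", "S_nom"), ("P0", "S_3ms"),
                 ("P_wa", "S0"), ("P_wa", "S_nom"), ("P_wa", "S_3ms"),
                 ("P_al", "S0"), ("P_al", "S_nom")}

Analysis = Tuple[str, str, str, str]  # (prefix, stem, suffix, gloss)


def analyze(word: str) -> List[Analysis]:
    """Return every prefix/stem/suffix split licensed by the toy tables."""
    analyses: List[Analysis] = []
    n = len(word)
    for i in range(n + 1):          # end of the prefix region
        for j in range(i, n + 1):   # end of the stem region
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
                continue
            p_cat = PREFIXES[prefix]
            s_cat, gloss = STEMS[stem]
            x_cat = SUFFIXES[suffix]
            # Keep the split only if all three category pairs are compatible.
            if ((p_cat, s_cat) in PREFIX_STEM
                    and (s_cat, x_cat) in STEM_SUFFIX
                    and (p_cat, x_cat) in PREFIX_SUFFIX):
                analyses.append((prefix, stem, suffix, gloss))
    return analyses


if __name__ == "__main__":
    for w in ("alkitabu", "wakataba", "alkataba"):
        print(w, analyze(w))
```

In this toy setting the definite article is rejected on a verb stem purely through the prefix-stem table, which shows how morphotactic constraints can live in data rather than in procedural rules.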
Al-Yahya, Al-Khalifa, Bahanshal, Al-Odah, and Al-Helwah [23] proposed a computational model for representing Arabic lexicons using ontologies. The model is based on the field theory of semantics from the linguistics domain, and the data which drives the design of the model is obtained from the Noble Quran, the most accurate text that presents the superiority and perfection of the Arabic language.

Dror, Shaharabani, Talmon, and Wintner [18] devised a computational system for morphological analysis and annotation of the Qur'an, and Talmon and Wintner devised morphological tagging of the Qur’an [24], for research and teaching purposes. These systems facilitate a variety of queries on the Quranic text that make reference to the words and their linguistic attributes. The core of the system is a set of finite-state based rules which describe the morpho-phonological and morpho-syntactic phenomena of the Quranic language. The results are stored in an efficient database and are accessed through a graphical user interface which facilitates the presentation of complex queries.
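To make the idea of querying words by their linguistic attributes concrete, the following is a minimal sketch. It does not reproduce the finite-state rules, the database or the interface of [18]; the `Token` fields, the tag names and the simplified transliteration are assumptions chosen for illustration, and the four hand-written annotations for the opening verse are simplified and should not be treated as authoritative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    location: str        # "chapter:verse:word", a hypothetical addressing scheme
    form: str            # simplified Latin transliteration
    pos: str             # coarse part-of-speech tag (illustrative tag set)
    root: Optional[str]  # consonantal root in transliteration, if any


# Hand-written, simplified annotations for illustration only.
CORPUS: List[Token] = [
    Token("1:1:1", "bismi", "N", "smw"),
    Token("1:1:2", "allahi", "PN", "Alh"),
    Token("1:1:3", "alrahmani", "ADJ", "rHm"),
    Token("1:1:4", "alrahimi", "ADJ", "rHm"),
]


def query(corpus: List[Token],
          pos: Optional[str] = None,
          root: Optional[str] = None) -> List[Token]:
    """Return tokens matching every attribute that was supplied."""
    hits = []
    for tok in corpus:
        if pos is not None and tok.pos != pos:
            continue
        if root is not None and tok.root != root:
            continue
        hits.append(tok)
    return hits


if __name__ == "__main__":
    # All words sharing one root, regardless of surface form.
    print([t.location for t in query(CORPUS, root="rHm")])
    # Attribute constraints can be combined.
    print([t.form for t in query(CORPUS, pos="ADJ", root="rHm")])
```

A real system would hold such records in a database and expose the filters through a query form, but the principle of matching on attributes rather than on surface strings is the same.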
Dukes, Atwell, and Habash [25] presented a new approach to linguistic annotation of an Arabic corpus: online supervised collaboration using a multi-stage approach. The stages include automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading.

Dukes [26] presented LOGICON, an end-to-end system using partial parsing, which assigns novel semantic structures to natural language text. A syntactic tagging scheme is proposed which is closely aligned to the corresponding semantics. A syntax-driven approach is used to derive semantic roles through recursion. Given a simple sentence focusing on an event, LOGICON attempts to identify roles for the actor (who did the event), the action (what the event was) and the target (what entity the actor performed the action on).

Sharaf and Atwell [11] are designing a Knowledge Representation (KR) model for the Quran, building on the concept of ‘frame semantics’. They aim to build a FrameNet-like lexicon for the verbs in the Quran. This initial attempt will enable future extension to include predicates other than verbs and to consider other classical Arabic texts as well as Modern Standard Arabic.

Shenassa and Khalvandi [2] designed a system to analyze the quality of a translation, that is, to evaluate different English translations of the Quran using tools and concepts such as POS tagging, natural language processing, computational linguistics and machine learning. For this evaluation, the process type of each verb in the translated text is compared with that of the corresponding verb in the Quranic text. The system uses Halliday grammar, a useful theory for analyzing a formal text, to do this. It is assumed that each verb in the text of the Quran has been tagged manually. On the other hand, to detect the process type of each verb in the translated English texts, a tagger is used to detect and tag each verb based on Halliday grammar.
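The comparison of verb process types between source and translation can be sketched as a simple matching count, with the translation that agrees most often being preferred. The sketch below assumes that verbs have already been tagged and aligned one-to-one between source and translation, which is a simplification; the six labels are the process types of Halliday's grammar, while the function names and the tag sequences are invented for illustration and are not data from [2].

```python
from typing import Dict, List

# The six process types of Halliday's transitivity system.
PROCESS_TYPES = {"material", "mental", "relational",
                 "behavioural", "verbal", "existential"}


def match_count(source: List[str], translation: List[str]) -> int:
    """Count aligned verb positions whose process types agree.

    Assumes both lists are aligned verb-by-verb (a simplifying assumption).
    """
    for tag in source + translation:
        if tag not in PROCESS_TYPES:
            raise ValueError(f"unknown process type: {tag}")
    return sum(1 for s, t in zip(source, translation) if s == t)


def best_translation(source: List[str],
                     candidates: Dict[str, List[str]]) -> str:
    """Return the name of the candidate with the most matching process types."""
    return max(candidates, key=lambda name: match_count(source, candidates[name]))


if __name__ == "__main__":
    # Invented tag sequences, not taken from any real translation.
    source_tags = ["material", "verbal", "mental", "relational"]
    candidates = {
        "translation_A": ["material", "verbal", "material", "relational"],
        "translation_B": ["material", "mental", "material", "existential"],
    }
    for name, tags in candidates.items():
        print(name, match_count(source_tags, tags))
    print("preferred:", best_translation(source_tags, candidates))
```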
The best translation is the one in which the number of matching verb process types between the source and the translated text is maximal.

Dukes and Buckwalter [22] built a dependency treebank of the Quran using traditional Arabic grammar. The Quranic Arabic Dependency Treebank (QADT) uses XML to represent the syntax of verses from the Quran, and a Java object model is provided with the Treebank as an API to query the data. The Treebank also introduces the novel approach of displaying Quranic syntax using dependency graphs, which show how each word in a sentence is related to the others and what role it plays in building up a complete syntactic structure. It differs from other Arabic Treebanks by providing a deep linguistic model of traditional Arabic grammar. This Treebank is a part of the Quranic Arabic Corpus. It also includes information such as the root of each word, a word-by-word interlinear translation into English, and an automatically generated phonetic transcription.

Thabet [27] developed a methodology for understanding the Qur’an on the basis of its lexical semantics. It discovers the thematic structure of the Qur’an (the thematic interrelationships among the suras, or chapters) based on a fundamental idea in data mining and related disciplines: given a collection of texts, the lexical frequency profiles of the individual texts are a good indicator of their
conceptual content, and thus provide a reliable criterion for their classification relative to one another.

Noordin and Othman [28] proposed a system design for retrieving Quranic texts and any knowledge that is derived from or cites al-Quran. They surveyed 125 websites offering access to Quranic texts, examining their structure and linkages. The findings revealed that the websites offer texts and translation, recitation, excerpts of exegesis, and links to other websites consisting of news, events and related topics. No standard structure was implemented by these websites, and the authors therefore proposed a system design which focuses on texts, translation, recitation, exegesis, al-Hadith, topics and themes such as stories of the prophets and places mentioned in al-Quran, and a search feature.

Al-Yahya et al. [29] developed an ontological model based on Semantic Web technologies for representing computational lexicons, using the field theory of semantics and componential analysis, which provides the foundation for a dynamic and collaborative computational lexicon. The ontological structure represents word semantics using the atomic components (features) of words, and uses the W3C standard for representing ontologies, the Web Ontology Language (OWL), for shared and open access to such a resource. In this method they limited the vocabulary to those words which exist in the Noble Quran.

Kotb et al. [30] demonstrate the significance of an XML semantics checker algorithm for checking the semantic consistency of the XML file of the Noble Quran found on the Religion 2.0 website. They checked semantic consistency by attaching semantic information to XML element tag attributes. They automatically checked whether the number of verses in each chapter of the XML Noble Quran, as well as the number of chapters, is semantically correct. The system successfully counted the number of verses and chapters as in the real Noble book.
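The kind of consistency check described for the XML file (does each chapter contain as many verses as it declares, and is the number of chapters correct) can be illustrated with a short sketch. This is not the checker of [30] and does not use its specification language; the element names `chapter` and `verse`, the `verses` attribute and the function name are hypothetical, and the tiny documents are toy inputs rather than the real text.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_counts(xml_text: str, expected_chapters: int) -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none)."""
    root = ET.fromstring(xml_text)
    problems: List[str] = []

    chapters = root.findall("chapter")
    if len(chapters) != expected_chapters:
        problems.append(
            f"expected {expected_chapters} chapters, found {len(chapters)}")

    for chapter in chapters:
        declared = int(chapter.get("verses", "-1"))  # hypothetical attribute
        actual = len(chapter.findall("verse"))
        if declared != actual:
            problems.append(
                f"chapter {chapter.get('index')}: declares {declared} verses, "
                f"contains {actual}")
    return problems


# Toy documents: two short "chapters" with placeholder verses.
CONSISTENT = """
<book>
  <chapter index="1" verses="2"><verse/><verse/></chapter>
  <chapter index="2" verses="1"><verse/></chapter>
</book>
"""

INCONSISTENT = """
<book>
  <chapter index="1" verses="3"><verse/><verse/></chapter>
  <chapter index="2" verses="1"><verse/></chapter>
</book>
"""

if __name__ == "__main__":
    # For the full Quran, expected_chapters would be 114.
    print(check_counts(CONSISTENT, expected_chapters=2))
    print(check_counts(INCONSISTENT, expected_chapters=2))
```

Attaching the expected count to an attribute and comparing it with the counted children is one simple way to express a semantic rule that a schema alone does not enforce.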
Kotb et al. specified the semantic rules associated with XML attributes using SLXS, the Specification Language for XML Semantics.

Moisl [31] proposed a model based on the calculation of a minimum Quran sura (chapter) length threshold using concepts from statistical sampling theory, followed by the selection of suras and lexical variables based on that threshold. He applied the proposed solution to a reanalysis of the Quran and found that the higher the threshold, the larger the number of variables on which clustering can be based, and the smaller the number of suras that can be clustered.

Al-Khalifa et al. [32] present a work-in-progress project for building a computerized framework that exploits the power of Semantic Web technologies and natural language processing, armed with domain ontologies, for recognizing and identifying semantic opposition terms. SemQ is a framework that takes a Quranic verse as input and outputs the list of semantically opposed words in the verse along with their degree of opposition. The framework architecture consists of two major components: the domain ontology (to mimic how the human brain keeps the semantics stored) and the SemQ Tool (the tool works automatically to identify semantically opposite terms and works as a manual identification tool that
relies on a subject matter expert (SME) to populate the ontology with terms and their properties).

Shoaib et al. [33] address the deficiencies of keyword-based searching and the issues related to semantic search in the Noble Quran, and propose a model that is capable of performing semantic search. The model exploits WordNet relationships in a relational database model; that is, it uses a relational schema to hold WordNet. The model was implemented using SQL Server 2005 and VB.Net on Surah Al-Baqarah. The precision of the model's prototype implementation is reported to be far better than that of simple keyword searching.

IV. TECHNIQUES USED IN QURANIC COMPUTING
A. Natural Language Processing (NLP)

Natural language generation (NLG) is used to provide concise English and Arabic summaries of the inflection features stored in the Quranic linguistic database [22]. Natural Language Processing (NLP) technology is a significant component of Semantic Web tools. NLP is a branch of linguistics which uses computer technology to process human language effectively. Its ultimate objective is to understand human language automatically with the support of artificial intelligence technology. It is also called natural language understanding, and is sometimes used to transform information into Semantic Web data. Traditional information retrieval can also be turned into knowledge discovery [34].

B. Treebank and Syntactic Annotation

A Treebank is a linguistic resource which collects together syntactic trees. These are manually annotated analyses of sentences which can be read both by humans and computers, with different Treebanks adopting different theories of syntax. Previous syntactic work includes the three major Arabic Treebanks that have been recently developed: the Penn Arabic Treebank [35], the Prague Arabic Dependency Treebank (PADT) and the Columbia Arabic Treebank (CATiB) [36]. Penn Arabic Treebank annotation consists of two phases: (a) morphological/part-of-speech (POS) tagging, which divides the text into lexical tokens and includes morphological, morphosyntactic and gloss information, and (b) syntactic analysis, referred to as Arabic Treebanking (Arabic TB), which characterizes the constituent structures of word sequences, provides function categories for each non-terminal node, and identifies null elements, coreference, traces, etc. [37]. The Prague Arabic Dependency Treebank consists of multi-level linguistic annotations over Modern Standard Arabic, and provides a variety of unique software implementations designed for general use in Natural Language Processing (NLP) [38].
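As a rough picture of how one sentence in a dependency-style treebank can be made readable by both humans and programs, the sketch below stores each word with the position of its head and a relation label. The `Word` class, the generic labels ("root", "subject", "object") and the transliterated example sentence are assumptions for illustration; they are not the annotation scheme or file format of any of the treebanks named above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    index: int            # position in the sentence, starting at 0
    form: str             # simplified transliteration
    head: Optional[int]   # index of the head word; None for the root
    label: str            # relation to the head (generic, illustrative labels)


# "kataba al-waladu al-darsa" ("the boy wrote the lesson"): the verb heads
# the sentence, and both nouns attach to it.
SENTENCE: List[Word] = [
    Word(0, "kataba", None, "root"),
    Word(1, "al-waladu", 0, "subject"),
    Word(2, "al-darsa", 0, "object"),
]


def dependents(sentence: List[Word], head_index: int) -> List[Word]:
    """Words attached directly to the word at head_index."""
    return [w for w in sentence if w.head == head_index]


def is_tree(sentence: List[Word]) -> bool:
    """True if there is exactly one root and every word reaches it."""
    if sum(1 for w in sentence if w.head is None) != 1:
        return False
    for word in sentence:
        seen = set()
        current = word
        while current.head is not None:
            if current.index in seen or not 0 <= current.head < len(sentence):
                return False  # cycle or dangling head reference
            seen.add(current.index)
            current = sentence[current.head]
    return True


if __name__ == "__main__":
    print(is_tree(SENTENCE))
    for dep in dependents(SENTENCE, 0):
        print(f"{dep.form} --{dep.label}--> {SENTENCE[0].form}")
```

Storing only a head index and a label per word keeps the representation compact, and a well-formedness check such as `is_tree` is the sort of validation an annotation tool can run after every manual edit.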
The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. First, CATiB avoids the annotation of redundant linguistic information that is determinable automatically from syntax and morphological analysis, e.g., nominal case. Secondly, CATiB uses
linguistic representation and terminology inspired by the long tradition of Arabic syntactic studies [36]. Syntactic annotation in the dependency framework involves two types of inter-related decisions: attachment and labeling. The attachment of one word to another indicates that there is a syntactic relationship between the head word and the dependent word (and the subtree it heads). The labels specify the type of the attachment. For example, the relation "subject" may label the attachment of a dependent noun to a head verb, where the noun is the subject of the verb [36].

Pajas and Stepanek [39] presented recent advances in an established Treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface built into the tree editor.

TABLE II. STUDIES ON QURANIC COMPUTING

Author | Focus | Method | Result
[20] | Quranic Arabic corpus | Morphological analysis and syntactic analysis | Quranic treebank
[22] | Quranic corpus | Uses XML to represent the syntax of verses from the Quran; a Java object model is provided with the Treebank as an API to query the data | Quranic Arabic Dependency Treebank (QADT)
[25] | Linguistic annotation of an Arabic corpus | Online supervised collaboration using a multi-stage approach | The different stages include automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading
[18] | Computational system for research and teaching | Morphological analysis and annotation | Database that can be accessed through a GUI which facilitates the presentation of complex queries
[26] | Partial parsing, which assigns novel semantic structures to natural language text | Syntax-driven approach is used to derive semantic roles through recursion | LOGICON
[11] | Build a FrameNet-like lexicon for the verbs in the Quran | Using the concept of ‘frame semantics’ | Knowledge Representation (KR) model for the Quran
[27] | Thematic structure of the Qur’an | Data mining and lexical semantics | Understanding the Qur’an on the basis of its lexical semantics
[33] | Semantic search | SQL Server 2005 and VB.Net | WordNet-based relational model
[23] | Computational model for representing Arabic lexicons using ontologies. Time nouns from the Holy Quran are used to derive the resulting ontological structure | Field theory of semantics from the linguistics domain | Ontological model
[29] |  | Semantic web technologies | Ontological data-driven model

V. DISCUSSIONS

Most recent research in the field of Quranic Computing can be classified as: Information Retrieval, Speech Recognition, Optical Character Recognition, Morphology Analysis, Semantic Checking, Educational Applications [32] and Quranic Corpus [22]. The Noble Quran, due to its unique style and allegorical nature, needs special attention with regard to searching and information retrieval. Legacy keyword searching techniques are incapable of retrieving semantically relevant verses [33].
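The limitation of plain keyword search noted in this discussion, and the idea in [33] of storing word relationships in relational tables, can be illustrated with a small sketch. It uses SQLite from the Python standard library rather than the SQL Server 2005 and VB.Net setup of [33]; the table names, the `related` word pairs and the three English sentences are invented for illustration and are not verse translations or WordNet data.

```python
import sqlite3
from typing import List


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with toy passages and related-word pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE passage (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        CREATE TABLE related (word TEXT NOT NULL, related_word TEXT NOT NULL);
    """)
    conn.executemany("INSERT INTO passage VALUES (?, ?)", [
        (1, "give charity to the poor"),    # invented sentence
        (2, "spend alms on the needy"),     # invented sentence
        (3, "travel through the land"),     # invented sentence
    ])
    conn.executemany("INSERT INTO related VALUES (?, ?)", [
        ("charity", "alms"),
        ("poor", "needy"),
    ])
    return conn


def keyword_search(conn: sqlite3.Connection, word: str) -> List[int]:
    """Ids of passages containing the exact word (whole-word match)."""
    rows = conn.execute(
        "SELECT id FROM passage WHERE ' ' || text || ' ' LIKE ? ORDER BY id",
        (f"% {word} %",),
    )
    return [row[0] for row in rows]


def semantic_search(conn: sqlite3.Connection, word: str) -> List[int]:
    """Keyword search expanded with words related to the query term."""
    terms = [word] + [
        row[0] for row in conn.execute(
            "SELECT related_word FROM related WHERE word = ?", (word,))
    ]
    ids = set()
    for term in terms:
        ids.update(keyword_search(conn, term))
    return sorted(ids)


if __name__ == "__main__":
    conn = build_db()
    print("keyword :", keyword_search(conn, "charity"))
    print("semantic:", semantic_search(conn, "charity"))
```

The expanded search returns the second passage even though it never contains the query word, which is the behaviour keyword matching alone cannot provide.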
A. Research Gaps

Although Arabic is the language of over two hundred million speakers, little has been achieved with regard to computational Arabic resources [23]. There is a need for an objective, impartial computation of the Noble Quran based on the Quranic text and other authentic sources of the Sunnah. Some of the research gaps that need attention are:

1) Most Muslims are unaware of the deeper meanings in the Quran, in spite of learning the sounds of the verses. An authentic Quran Expert System could help them question and understand the teachings of the Quran for themselves.

2) Present-day systems can answer questions from the source text, but many potential questions are more difficult and contentious to answer via text matching. They require a new Knowledge Representation and Reasoning formalism capable of capturing the complex, subtle knowledge encoded in this Classical Arabic text, and of inferencing in new ways which mirror the thousand-year-old traditions of scholarly analysis and interpretation.

B. Future Directions

The Quran stands out as the source of a large collection of analysis and interpretation for Islamic teaching. In future, the application of Artificial Intelligence, knowledge extraction and knowledge representation techniques should make this scripture easier to understand clearly. This effort will lead to computational results that shed new light on traditional interpretations, thus adding to the canon of Islamic wisdom. Muslims believe the Noble Quran to be free of any alterations or variations. Computational effort would reaffirm this belief with sound inference and interpretation of the Quran.
VI. CONCLUSION
Computation of Quranic Arabic is a unique challenge because of the differences in vocabulary and morphology between it and Modern Standard Arabic. Any use of computational technology in this linguistic or literary investigation would help make the Quranic text easier to understand. This review paper is therefore intended as a base for research on novel computational approaches to the Noble Quran.

ACKNOWLEDGMENT

The authors would like to thank all researchers who sent their articles on request for our analysis and study.

REFERENCES

[1] R.J. Raja Yusof, R. Zainuddin, M.S. Baba, and Z. Mohd. Yusoff, “Qur'anic Words Stemming,” The Arabian Journal for Science and Engineering, vol. 35, 2010, pp. 37-49.
[2] M. Shenassa and M. Khalvandi, “Evaluation of Different English Translations of Holy Koran in Scope of Verb Process Type,” ICTTA 2008, Damascus: 2008, pp. 1-4.
[3] H. Ansari, Learning the Language of the Quran, Centre of Religious Studies and Guidance, 1997.
[4] A. Jones, Arabic Through the Quran, Islamic Texts Society, 2005.
[5] J. Rafai, Basic Quranic Arabic Grammar, Ta Ha Publishers, 2004.
[6] I.A. Bolshakov and A. Gelbukh, Computational Linguistics: Models, Resources, Applications, Mexico: Dirección de Publicaciones, 2004.
[7] S. Wintner, “Computational Linguistics,” 2005.
[8] Y. Salem, “A Generic Framework for Arabic to English Machine Translation of Simplex Sentences Using the Role and Reference Grammar Linguistic Model,” Master Thesis, School of Informatics and Engineering at the Institute of Technology Blanchardstown, 2009.
[9] K. Dukes, “Computational Analysis of the Quran through Traditional Arabic Linguistics,” 2011.
[10] V. Bielicky and O. Smarz, “Building the Valency Lexicon of Arabic Verbs,” LREC 2008, Marrakech, Morocco: 2008, pp. 2300-2307.
[11] A. Sharaf and E. Atwell, “Knowledge Representation of the Quran Through Frame Semantics: A Corpus-Based Approach,” Corpus Linguistics-2009, University of Liverpool: 2009, p. 12.
[12] D. Al-Qahtani, Semantic Valence of Arabic Verbs, Librairie du Liban Publishers, 2005.
[13] M. Attia, M. Rashwan, A. Ragheb, M. Al-Badrashiny, H. Al-Basoumy, and S. Abdou, “A Compact Arabic Lexical Semantics Language Resource Based on the Theory of Semantic Fields,” GoTAL 2008, Gothenburg, Sweden: 2008.
[14] N. Habash and O. Rambow, “MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects,” Sydney: Association for Computational Linguistics, 2006, pp. 681-688.
[15] T. Buckwalter, “Buckwalter Arabic Morphological Analyzer Version 1.0,” University of Pennsylvania: 2002.
[16] N. Habash, “Syntactic Preprocessing for Statistical Machine Translation,” MT-Summit, Copenhagen, Denmark: 2007, p. 8.
[17] F. Belkredim and A. El Sebai, “An Ontology Based Formalism for the Arabic Language Using Verbs and their Derivatives,” Communications of the IBIMA, vol. 11, 2009, pp. 44-52.
[18] J. Dror, D. Shaharabani, R. Talmon, and S. Wintner, “Morphological Analysis of the Qur'an,” Literary and Linguistic Computing, vol. 19, 2004, pp. 431-452.
[19] E. Atwell, K. Dukes, A. Sharaf, and N. Habash, Understanding the Quran: A new Grand Challenge for Computer Science and Artificial Intelligence, Edinburgh: 2010.
[20] K. Dukes, E. Atwell, and A. Abdul-Baquee M. Sharaf, “Syntactic Annotation Guidelines for the Quranic Arabic Treebank,” LREC 2010, Malta: 2010, p. 4.
[21] K. Dukes and N. Habash, “Morphological Annotation of Quranic Arabic,” LREC 2010, Valletta, Malta: 2010.
[22] K. Dukes and T. Buckwalter, “A Dependency Treebank of the Quran using Traditional Arabic Grammar,” INFOS 2010, Cairo, Egypt: 2010.
[23] M. Al-Yahya, H. Al-Khalifa, A. Bahanshal, I. Al-Odah, and N. Al-Helwah, “An Ontological Model for Representing Semantic Lexicons: An Application on Time Nouns in the Holy Quran,” The Arabian Journal for Science and Engineering, vol. 35, 2010, pp. 2235.
[24] R. Talmon and S. Wintner, “Morphological Tagging of the Qur’an,” EACL'03 Workshop, Budapest, Hungary: 2003.
[25] K. Dukes, E. Atwell, and N. Habash, “Supervised Collaboration for Syntactic Annotation of Quranic Arabic,” 2010.
[26] K. Dukes, “Logicon: A System for Extracting Semantic Structure Using Partial Parsing,” RANLP-2009, Borovets, Bulgaria: 2009, p. 5.
[27] N. Thabet, “Understanding the Thematic Structure of the Qur’an: An Exploratory Multivariate Approach,” Proceedings of the ACL Student Research Workshop, Michigan: Association for Computational Linguistics, 2005, pp. 7-12.
[28] M. Noordin and R. Othman, “An Information Retrieval System for Quranic Texts: A Proposed System Design,” 2006.
[29] M. Al-Yahya, H. Alkhalifa, A. Bahanshal, I. Alodah, and N. Al-Helwah, “An Ontological Model for Representing Computational Lexicons: A Componential Based Approach,” NLP-KE-2010, Beijing, China: 2010, pp. 1-6.
[30] Y. Kotb, K. Gondow, and T. Katayama, “A Case Study for XML Semantics Checker Model,” Washington, DC: 2003, pp. 4834-4839.
[31] H. Moisl, “Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur'an,” ACM Transactions on Asian Language Information Processing, vol. 8, 2009, pp. 1-19.
[32] H.S. Al-Khalifa, M. Al-Yahya, A. Bahanshal, and I. Al-Odah, “SemQ: A Proposed Framework for Representing Semantic Opposition in the Holy Quran using Semantic Web Technologies,” CTIT-2009, Dubai: 2009, pp. 1-4.
[33] M. Shoaib, M.N. Yasin, K. Hikmat Ullah, M.I. Saeed, and M.S.H. Khiyal, “Relational WordNet model for semantic search in Holy Quran,” Islamabad, Pakistan: 2009, pp. 29-34.
[34] M. Beseiso, A.R. Ahmad, and R. Ismail, “A Survey of Arabic Language Support in Semantic Web,” International Journal of Computer Applications, vol. 9, 2010, pp. 35-40.
[35] A. Bies and M. Maamouri, Penn Arabic Treebank Guidelines, Philadelphia: University of Pennsylvania, 2003.
[36] N. Habash, R. Faraj, and R. Roth, “Syntactic Annotation in the Columbia Arabic Treebank,” MEDAR, Cairo, Egypt: 2009.
[37] M. Maamouri, A. Bies, and S. Kulik, “Creating a Methodology for Large-Scale Correction of Treebank Annotation: The Case of the Arabic Treebank,” MEDAR, Cairo, Egypt: 2009, p. 7.
[38] J. Hajic, O. Smarz, P. Zemanek, J. Snaidauf, and E. Beska, “Prague Arabic Dependency Treebank: Development in Data and Tools,” NEMLAR, Cairo, Egypt: 2004, pp. 110-117.
[39] P. Pajas and J. Stepanek, “Recent Advances in a Feature-rich Framework for Treebank Annotation,” CoLing 2008, Manchester: 2008, pp. 673-680.