An Information Retrieval System for Quranic Texts: A Proposed System Design
Mohamad Fauzan Noordin and Roslina Othman
Faculty of ICT, International Islamic University Malaysia

Abstract
This project proposes a system design for retrieving Quranic texts and any knowledge that is derived from or cites al-Quran. The objectives were to survey websites offering access to Quranic texts with respect to their structure and linkages, and to propose a system design for retrieving Quranic texts. A total of 125 websites offering access to Quranic texts were examined. Findings revealed that the websites offer texts and translations, recitation, excerpts of exegesis, and links to other websites covering news, events, and related topics. These websites did not implement a standard structure. The proposed system design focuses on texts, translation, recitation, exegesis, al-Hadith, their topics and themes (such as the stories of the prophets and the places mentioned in al-Quran), and a search feature.

1. Introduction
Major works, e.g. [1], have been done to facilitate retrieval of the verses in al-Quran for scholastic and/or personal reasons. Such works range from a table of contents listing the chapters, to term indexing, to topics describing events and ordinances. There is a need to examine the current structure of the Web-based al-Quran and to propose a system design for retrieving Quranic texts and the knowledge generated from understanding and consulting them.

1.1. Knowledge derived from al-Quran
Many branches of knowledge are rooted in this source, ranging from morals and ethics, to rulings (legal injunctions and prohibitions), to art and history, and to science and mathematics, among others. Muslim scholars contributed new theories and practices to many disciplines such as mathematics, physics, astronomy, medicine, and philosophy. The Muslim Golden Civilization is considered Muslim Heritage, and knowledge generated during this civilization has been taught at many universities. The best-known references include the medical practices established by Avicenna and the philosophical principles of Averroes. Carly Fiorina [2], CEO of Hewlett-Packard, spoke of Muslim scholars shedding new light on knowledge: "Its architects designed buildings that defied gravity. Its mathematicians created the algebra and algorithms that would enable the building of computers, and the creation of encryption." George Sarton [2] said that the "main task of mankind was accomplished by Muslims. The greatest philosopher, Al-Farabi was a Muslim; the greatest mathematicians Abul Kamil and Ibrahim Ibn Sinan were Muslims; the greatest historian, Al-Tabari was still a Muslim."

Still today, the expansion of knowledge derived from al-Quran continues. However, this expansion is carried out by many institutions, is in most cases 'hidden', and no comprehensive list of it is widely available. At the moment, Muslims have to rely on Index Islamica, which is a good source for scholastic reference but has yet to cover the range of articles on Islamicization studied at academic institutions.

The hierarchical structure of Islamic sources found in many writings is best depicted as in Figure 1 (adapted from [3]). Al-Quran and al-Sunnah (i.e. Hadith) are the two main primary sources. Even though Civilization consists of disciplines, many Muslims consult it as a source of knowledge.
0-7803-9521-2/06/$20.00 ©2006 IEEE.
Figure 1. Hierarchical structure of sources
1.2. Retrieval techniques for Quranic text
Retrieval techniques have been designed for building al-Quran retrieval systems, although most of these systems were not developed specifically for the Web. The systems that are available on the Web use hypertext links, natural language, and topical concepts/facets. Other systems were designed specifically for retrieving terms in Arabic documents
(e.g. the KISS project at Sheffield University [4], AIR at Syracuse University, and QARAB at DePaul University [5]). These systems range from monolingual to multilingual and cross-language, in the form of either ad-hoc retrieval or question answering. Retrieval techniques implemented include stemming, class menus, and topical and keyword (concept/facet) retrieval. Quranic texts are more than ordinary texts in documents: they require a very different approach because they bear on a person's belief and way of life. A structure based on degree of significance has to be adopted. Degree of significance is defined here as the degree of relationship of one verse to another, i.e. one verse could carry greater implications than another. Atiyah [6] proposed term selection based on class menus to capture structural relationships in a hierarchy of broader and narrower terms; however, degree of significance was missing. His proposal applies to texts within al-Quran and does not discuss knowledge derived from, and based on, al-Quran as the primary source. This paper explores and proposes block-level links, stemming, sentence completion, and other common retrieval techniques, such as phrase searching, for Quranic text. It also proposes citation analysis to determine which items are included in the database/storage. None of these methods has been applied to Quranic texts, even though they have been applied in other systems. Block-level links have been applied in Yahoo News [7], and stemming to Arabic documents [8]. Sentence completion was applied in the work of Grabski and Scheffer [9], but for e-mail reply templates. Block-level linking involves organizing texts into separate blocks, each linked to its own list of articles and sources (stored items). The two approaches to stemming are light stemming and heavy stemming.
The former is "the process of stripping off prefixes and suffixes to produce stem of the word", while heavy stemming is "the process of stripping off prefixes, suffixes and infixes to produce the root of the word" [10]. Sentence completion (also referred to as verse completion) is the ability of the system to retrieve texts that complete the search terms: the system treats the terms as the first part of a verse. This feature is very important for memorizing al-Quran. Citation analysis involves studying the list of works cited by a particular document/item.
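The difference between the two stemming approaches can be illustrated with a minimal Python sketch. The affix lists, the set of removable long vowels (ا, و, ي), and the three-letter minimum are assumptions chosen for illustration, loosely modelled on common Arabic light stemmers; they are not the lists used in [10]. Because Quranic text is usually fully vowelled, diacritics are removed first. `heavy_stem` is a toy root extractor, not a full morphological analyser.

```python
import re

# Tashkeel marks (U+064B..U+0652) and the superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Illustrative affix lists (an assumption), ordered longest first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و", "ف")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
LONG_VOWELS = frozenset("اوي")  # letters treated as removable infixes
MIN_LEN = 3  # most Arabic roots are triliteral


def light_stem(word: str) -> str:
    """Strip diacritics, then at most one prefix and one suffix."""
    word = DIACRITICS.sub("", word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_LEN:
            word = word[:-len(suffix)]
            break
    return word


def heavy_stem(word: str) -> str:
    """Toy root extraction: light stem, drop a leading pattern letter 'م'
    if the stem is longer than three letters, then remove inner long
    vowels until three letters remain."""
    stem = light_stem(word)
    if stem.startswith("م") and len(stem) > MIN_LEN:
        stem = stem[1:]
    i = 1  # never remove the first or last letter
    while len(stem) > MIN_LEN and i < len(stem) - 1:
        if stem[i] in LONG_VOWELS:
            stem = stem[:i] + stem[i + 1:]
        else:
            i += 1
    return stem


print(light_stem("الكتاب"), heavy_stem("الكتاب"), heavy_stem("مكتوب"))
```

With these rules, الكتاب light-stems to كتاب, while الكتاب, مكتوب and كاتب all heavy-stem to the root كتب. Words whose pattern adds a consonant, such as الرحمن (root ر ح م), defeat this toy; practical root extractors typically match words against morphological pattern templates instead.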
1.3. Quranic Texts on the Web
This study found 128 websites offering a complete collection of Quranic texts. This initial finding indicates that the Web has been used as an alternative medium for teaching and learning al-Quran. Given this number of websites and the many books and articles discussing the Muslim faith, there is a need to understand the structure linking Quranic texts and the knowledge/topics derived from them as provided on these websites. Do these websites implement a particular structure for arranging and accessing their content? Is the structure based on the concept that al-Quran and al-Hadith (the Prophet's sayings and traditions) are the two primary sources for a Muslim, or on other approaches adopted by scholars in their writings, like Bukhari and Muslim [11]? What topics linked to Quranic texts do these websites provide? Do these websites offer search features, and of what kind? These questions motivate this project.

2. Background of this project
The aim of this project is to propose a system design for retrieving Quranic texts and knowledge derived from al-Quran, planned for implementation on the Web. The design must be based on the status of al-Quran as the primary source, supplemented by al-Hadith as the second primary source; neither may be detached from the other. The objectives of this project are:
i. to identify the structure between the Quranic text and the knowledge/topics displayed as content on the surveyed websites;
ii. to propose a system design for retrieving the texts of al-Quran and the knowledge derived from them on the Web, with al-Quran as the primary source supplemented by al-Hadith.
The expected outcome of this project is a proposed web-based system design, with relevant retrieval techniques, for retrieving Quranic texts and other authorized materials considered as knowledge derived from these texts. The intended use is for scholastic reference as much as for practical purposes. This project also serves as groundwork, including in terms of budget requirements, for a larger-scale project meeting the needs of the Creative Multimedia Industry established in Malaysia, i.e. Islamic edutainment. That project aims to develop a prototype of a multimedia courseware for learning al-Quran with an edutainment approach.

2.1. Methodology
This project examines the structure between the Quranic texts and the knowledge/topics derived from al-Quran as displayed on these websites, and reviews the themes given in authenticated exegeses and other sources closely related to al-Quran, like Stories of the Prophets by Ibn Kathir [12] and Sirah an-Nabawi [13]. Currently, there are four well-known exegeses: Ibn Kathir [14], Sayed Qutb [15], Maududi [16], and Hamka [17].
This project surveyed a total of 125 websites to understand the nature of their structure; the other three websites had display problems during the study. Websites were selected on the criterion that al-Quran is one of their main contents. This project has reviewed the themes outlined in two of the four well-known exegeses, Ibn Kathir and Hamka. Review of the remaining two will be added once the proposed system design has been implemented.
3. Findings and discussions
Findings are discussed in the following sequence: the structure adopted by the surveyed websites in linking Quranic texts and knowledge/topics, and the proposed system design for retrieving Quranic texts and knowledge derived from these texts or al-Quran.

3.1. Structure implemented on the websites
Quranic texts are posted for access at the surveyed websites as:
i. Arabic texts and the translation by at least one scholar, like A. Yusuf Ali or Pickthall, in at least two languages, like English and Indonesian;
ii. Arabic texts and translation with either synchronized or asynchronized recitation by at least one reciter;
iii. Arabic texts, translation and/or recitation, and al-Hadith from Sahih Bukhari and Muslim;
iv. Arabic texts, translation and/or recitation, and excerpts from the exegesis of Ibn Kathir;
v. Arabic texts, translation and/or recitation and/or memorizer, a search feature (phonetic and topical), and links to other websites providing news and issues related to the Muslim world, e.g. IslamiCity (Figures 2-4).

Figure 2. Search feature, English translation, and audio recitation at memorizer, Islamicity.com
Figure 3. Translation in other languages and references at Islamicity.com
Figure 4. Memorizer feature at Islamicity.com

In terms of structure, these websites work on point-and-click, i.e. hypertext links. Their retrieval capabilities are topical, by search terms/phrases in Arabic and other major languages like English, French, and Indonesian, by chapters (i.e. suras), and by Boolean operators. One-third of the websites surveyed in this project offer at least one search feature, and only two of them incorporated Google search. The latest development is retrieval of Quranic texts in Arabic using error- and variant-tolerant transliteration, offered by IslamiCity and known as phonetic search (Figure 5).
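The paper does not document how IslamiCity's phonetic search works, so the sketch below only illustrates one possible approach to error- and variant-tolerant transliteration search. Each token is normalized (lower-casing, dropping apostrophes, mapping the variants ee→i and oo/ou→u, collapsing doubled letters), and each query token is matched against its closest verse token with Python's `difflib.SequenceMatcher`. The normalization rules, the 0.8 threshold, and the sample transliterations of Surah al-Fatihah are assumptions for illustration.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Sample transliterations of Surah al-Fatihah 1:1-1:4 (illustrative only).
FATIHAH_TRANSLIT = {
    "1:1": "Bismillahir Rahmanir Raheem",
    "1:2": "Alhamdu lillahi Rabbil 'aalameen",
    "1:3": "Ar-Rahmanir-Raheem",
    "1:4": "Maaliki Yawmid-Deen",
}

# Hypothetical rules collapsing common spelling variants.
VARIANTS = (("ee", "i"), ("oo", "u"), ("ou", "u"))


def normalize_token(token: str) -> str:
    """Lower-case, drop non-letters (e.g. apostrophes), unify variants,
    and collapse doubled letters so 'Raheem' and 'rahim' agree."""
    token = re.sub(r"[^a-z]", "", token.lower())
    for variant, canonical in VARIANTS:
        token = token.replace(variant, canonical)
    return re.sub(r"(.)\1+", r"\1", token)


def tokenize(text: str) -> list[str]:
    """Split on spaces and hyphens, then normalize each token."""
    tokens = (normalize_token(t) for t in re.split(r"[\s\-]+", text))
    return [t for t in tokens if t]


def phonetic_search(query: str, index: dict[str, str],
                    threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (verse reference, score) pairs at or above threshold, best first."""
    q_tokens = tokenize(query)
    if not q_tokens:
        return []
    hits = []
    for ref, translit in index.items():
        v_tokens = tokenize(translit)
        if not v_tokens:
            continue
        # Average, over query tokens, of the best similarity to any verse token.
        score = sum(
            max(SequenceMatcher(None, q, v).ratio() for v in v_tokens)
            for q in q_tokens
        ) / len(q_tokens)
        if score >= threshold:
            hits.append((ref, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(phonetic_search("bismilah rahman rahim", FATIHAH_TRANSLIT))
```

Here the misspelled query `bismilah rahman rahim` should match only verse 1:1; verse 1:3 matches just two of the three query words and falls below the threshold. Averaging per-token similarity, rather than comparing whole strings, lets a partial query still score well against a longer verse.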
Figure 5. Phonetic search at Islamicity.com

Content-wise, apart from the texts of al-Quran, quite a number of these websites do not include access to al-Hadith, and even where it is provided, the link between al-Quran and al-Hadith is not established. Recitation varies from one chapter, to selected chapters, to all the chapters of al-Quran. Most websites offer Quranic texts as images of the Arabic script, with the required diacritical marks in addition to the main consonants and vowels. Some of these texts are presented in a calligraphic form, like Kufi. Information and/or knowledge linked to these Quranic texts includes:
i. Islamic concepts like Islamic rulings or judgments, refutations, youth, and history
ii. Islam and science
iii. Lectures and sermons
iv. Belief
v. Previous nations or civilizations
vi. Acts of worship
vii. Muslim dress
viii. Morals and manners
ix. Food and drinks
x. Principles of exegesis
xi. Issues related to translation and transliteration
These topics are linked to al-Quran without a definite and consistent structural pattern. None of these websites follows the approaches adopted by scholars like Bukhari and Muslim. This survey revealed that none of these websites, nor others identified from the search, adopted the system design proposed in this project.

3.2. Proposed system design for retrieving Quranic Texts
Figure 6 shows the architecture for retrieving Quranic texts.

Figure 6. The architecture for the proposed system design. Components: Corpus (Quranic Text; Hadith; Exegesis, Asbabun Nuzul, Translation, Recitation and Tajweed; Civilization); Retrieval Capability (1. Block-level link, 2. Stemming, 3. Sentence/Verse Completion, 4. Topical; other common features, e.g. Boolean & Phrase); Visualization and User Interface.

The corpus contains the texts of the whole of al-Quran, Hadith, exegesis, translated texts of al-Quran and Hadith, Asbabun Nuzul, recitation and Tajweed, and Civilization, i.e. any publications that quote the Quranic texts, like
i. Stories of the Prophets [12],
ii. Sirah an-Nabawi (Story of Prophet Muhammad) [13],
iii. Aqidah and Ibadah,
iv. History of Islam [18],
v. Contributions of Muslim Scholars to Art and Sciences,
vi. Atlas of al-Quran [19], and
vii. Stories of Men, Women, and Children around the Prophet.

Al-Quran is the primary source and al-Hadith is its close supplement; thus Muslims treat both texts as their primary sources. The texts of al-Quran will therefore be segmented into blocks and linked to the relevant Hadith and Civilization (Figure 7). The blocks of Quranic texts will also be linked to exegesis, Asbabun Nuzul, translated texts, recitation, and Tajweed. Exegeses explain the texts of al-Quran, and Asbabun Nuzul [20] gives the context and causes of the revelation of these texts. Non-Arabic-speaking Muslims need translations alongside the texts to understand the Arabic more easily, while at the same time they firmly accept that the Arabic texts must be preserved and that al-Quran should be read in its original language, Arabic. The recitation is provided with the symbols/marks of Tajweed to facilitate reading in the correct and best manner.
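The block segmentation and linking described above could be represented with a small index structure. In this minimal Python sketch, which assumes nothing about the authors' eventual schema, each block is a range of consecutive verses within one surah carrying one list of stored-item identifiers per link type named in the text (Hadith, Civilization, exegesis, Asbabun Nuzul, translation, recitation and Tajweed). Every verse maps to its block, so a lookup from any verse reaches the items linked to the whole block. The block boundaries, the `BlockIndex` class, and the item IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Link types named after the resources listed in the text (names hypothetical).
LINK_TYPES = ("hadith", "civilization", "exegesis", "asbabun_nuzul",
              "translation", "recitation_tajweed")


@dataclass
class Block:
    """A group of consecutive verses in one surah, linked as a unit."""
    block_id: str
    surah: int
    first_ayah: int
    last_ayah: int
    links: dict[str, list[str]] = field(
        default_factory=lambda: {t: [] for t in LINK_TYPES})


class BlockIndex:
    """Maps every (surah, ayah) pair to the block that contains it."""

    def __init__(self, blocks: list[Block]) -> None:
        self._by_verse: dict[tuple[int, int], Block] = {}
        for block in blocks:
            for ayah in range(block.first_ayah, block.last_ayah + 1):
                self._by_verse[(block.surah, ayah)] = block

    def block_for(self, surah: int, ayah: int) -> Optional[Block]:
        return self._by_verse.get((surah, ayah))

    def link(self, surah: int, ayah: int, link_type: str, item_id: str) -> None:
        """Attach a stored item to the block containing the given verse."""
        block = self.block_for(surah, ayah)
        if block is None:
            raise KeyError(f"no block covers {surah}:{ayah}")
        if link_type not in block.links:
            raise ValueError(f"unknown link type {link_type!r}")
        if item_id not in block.links[link_type]:
            block.links[link_type].append(item_id)

    def linked_items(self, surah: int, ayah: int, link_type: str) -> list[str]:
        """Return a copy of the items linked to the verse's block."""
        block = self.block_for(surah, ayah)
        return list(block.links.get(link_type, [])) if block else []


# Hypothetical blocks: 1:1 alone, and 1:2-1:4 grouped together.
index = BlockIndex([Block("B1", 1, 1, 1), Block("B2", 1, 2, 4)])
index.link(1, 3, "hadith", "hadith-001")
index.link(1, 2, "civilization", "work-042")
print(index.linked_items(1, 4, "hadith"))
```

Because `link` attaches items to the whole block, an item linked through verse 1:3 is also returned for verse 1:4 in the same block. In the proposed design, the Civilization lists would be filled from the results of citation analysis rather than by hand.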
Figure 7. Graphical presentation of the block-level link (a block of Quranic text linked to the works that cited it, e.g. Stories of the Prophets, Sirah an-Nabawi, Aqidah & Ibadah, History of Islam, etc.)

The themes compiled from the exegeses written by Ibn Kathir [14] and Hamka [17] are used as a guide to determine which group of verses forms a block. For example, Chapter 1 verse 1 is treated as one block, since both exegeses discuss this verse as one theme. Because some verses are superior in terms of command, or are elaborated further by other verses, the block-level link will also lead users to these supporting verses. The block-level link is also to be designed to retrieve similar verses, so the retrieval capability also serves within al-Quran itself. When users click the Civilization link, they will be shown a list of works that cited or consulted verses in that particular block of Quranic texts. These works are determined through citation analysis and verified as authentic by scholars or institutions. To date, citation analysis has been carried out on Islamic Economics [21] and Islamic Management [22]. Publications other than those listed earlier include research activities and projects derived from al-Quran, like Islamic Economics, Islamic Banks, Syaria rulings for financial transactions, Islamic Management, pattern recognition of Quranic texts in calligraphic forms, and Islamic Edutainment. Copyrights from all concerned publishers and authors will be secured before their works are uploaded to the website. This paper also proposes making e-book versions of the publications categorized as Civilization available, to encourage reading each work as a whole. Besides block-level links, retrieval techniques for the Quranic texts will be root word, phrase/words, and verse completion. The root word of the Quranic text is to be determined using heavy stemming. Topical retrieval is based on the scheme outlined by Fathi Osman [23].

[...] for retrieving Quranic texts and their citations in any subject or discipline. Such a design will help Muslims understand the connection of knowledge with the Quranic texts, the degree of significance between the texts in al-Quran, and the works done by Muslim scholars. Muslims could then contribute their innovations and inventions to fill gaps in, or build on, existing human knowledge. Such a system will also help Muslim researchers compile relevant sources. From the system perspective, the design and its retrieval features help users identify the meaning of a particular verse or verses and its proper recitation, as well as the ways in which the verse has been used to guide the creation and generation of human knowledge. The next phase of this project is to develop the proposed design as a prototype using the Python programming language in an open-source environment, beginning with Surah al-Fatihah. Collection development for the storage/database is to be done in parallel. Further research derived from the current proposal would be to work on artificial intelligence for recognizing Quranic texts.
5. References
[1] … 2005, available at: … [accessed February 2005].
[2] H. Siddiqui, "Seeking Knowledge: An Imperative", Education Social Religious - Article Ref IC0602-2913, available at: … [accessed March 2006].
[3] Panel Akademi Pengurusan, Pengurusan dalam Islam: Menghayati Prinsip dan Nilai Qurani, YPEIM, Kuala Lumpur, 2005.
[4] M. Sanderson and A. al-Berair, "Keep it Simple Sheffield - a KISS approach to the Arabic track", available at: … [accessed July 2005].