Alexandria University Faculty of Arts Department of Phonetics and Linguistics
M.A. Thesis
Submitted to the Department of Phonetics and Linguistics
Presented by Marwa Saber Selim Arafat
Supervised by Prof. Dr. Sameh Saad Abou-Almaged Alansary
Professor of Computational Linguistics, Department of Phonetics and Linguistics, Faculty of Arts, Alexandria University
2015
Acknowledgment
Before all, I thank Allah for guiding me through my life, showing me the right path and giving me the strength to pursue my studies in a field so close to my heart. It is with immense gratitude that I acknowledge the support and help of my professor, Dr. Sameh Alansary. I wish to express my deepest thanks and sincere appreciation to him for his help, support and patience in supervising this thesis. His continuous guidance has been invaluable, and I can never cease to learn from his expertise in the field of computational linguistics. I am greatly indebted to this honorable gentleman for his generous efforts in supervising this thesis, as well as for the many good things that I learnt from him, whether academic or personal. Writing the algorithm and code for this thesis would have been much more difficult without his help. I cannot find words to express my gratitude to him, who has deeply affected the formative years of my academic and technical career. I would also like to thank my teachers in the Faculty of Arts, Alexandria University, who taught me linguistics, for everything I have learned from them both inside and outside the classroom. Finally, I hope that this thesis will be a useful addition to the still-scarce practical research in Arabic computational linguistics, as well as a starting point for more advanced achievements in this field.
Dedication
I would like to thank my entire family for their continuous love and support. I would like to thank my mother, my father, my sister and my two brothers, who have always been supportive and encouraging, putting up with my long working hours and mood swings. I could not have done it without them. This thesis is dedicated to my parents, who have supported me throughout my life, and to the soul of my close friend Mona Beshr, who left this world.
Declaration I hereby declare that no portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.
Abstract
The prime purpose of this study is to automatically extract the syntactic arguments of Modern Standard Arabic verbs by designing a parser that analyzes the syntactic structures of Arabic verb phrases based on Chomsky’s X-bar theory, and to exemplify this task through tree diagrams. The study also aims to test to what extent X-bar theory is the most appropriate syntactic theory for revealing the syntactic arguments of a specific predicate. To fulfill these purposes, the research adopts an analytical descriptive approach in which a corpus of 600 sentences, taken from the Arabic Parkinson corpus for 60 Arabic verbs, is analyzed using the Interactive Analyzer tool (IAN). IAN is chosen for modeling X-bar theory owing to its close association with the X-bar approach. By analyzing and describing the core structure of Arabic verb phrases with X-bar theory, the study supports the claim of X-bar theory that all languages share the same underlying syntactic structure. Acquiring verb subcategorization is a fundamental issue in several NLP tasks, such as parsing, generation, and machine translation; in parsing, for instance, the availability of knowledge about subcategorization frames (SCFs) and the complement/adjunct distinction meaningfully increases the accuracy of the results.
Sixty Arabic verbs belonging to different transitivity subcategories have been selected: twenty direct transitive, twenty indirect transitive, and twenty ditransitive verbs. These are the most common verbs according to the frequency of occurrence of each verb in the Arabic dictionary developed at Bibliotheca Alexandrina. The sentences have been chosen according to their length, which ranges from five to 12 words. Fifty sentences were extracted for each verb and then filtered down to 10. The filtration was carried out according to criteria based on a linguistic and systematic basis. The corpus is intended to be representative of contemporary standard written Arabic. It is segmented into sentences and tagged for part of speech (POS), and all the required linguistic attributes are assigned to each word. The training corpus totals 600 sentences. The results of the parser and the automatically extracted subcategorization frames have been evaluated against the results of the manual analysis.
List of Abbreviations
NLP: Natural Language Processing
MP: Minimalist Program
TG: Transformational Grammar
TGG: Transformational-Generative Grammar
PSR: Phrase Structure Rules
PF: Phonological Form
LF: Logical Form
PP: Prepositional Phrase
IP: Inflectional Phrase
JP: Adjectival Phrase
AP: Adverbial Phrase
NP: Noun Phrase
VP: Verb Phrase
SF: Subcategorization Frame
RASP: Robust Accurate Statistical Parsing
GPSG: Generalized Phrase-Structure Grammar
FGD: Functional Generative Description
ATB: Arabic Tree Bank
T-rules: Transformation rules in the UNL framework
D-rules: Disambiguation rules in the UNL framework
N-rules: Normalization rules in the UNL framework
ADJ: Adjective
VER: Verb
NLW: Natural Language Word
FLG: Language Flag
FRE: Frequency of the word occurrence
UNL: Universal Networking Language
Org: Original sentence
S: Sentence
TRA: Transitivity
TST: Transitive
TSTD: Direct monotransitive
TSTI: Indirect monotransitive
TST2: Ditransitive
TST3: Tritransitive
LL: List to List rules
LT: List to Tree rules
TT: Tree to Tree rules
LR Parser: The L means that the parser reads the input text from left to right without backing up; the R means that the parser produces a rightmost derivation in reverse.
Grammar Symbols:
1PP: First person plural
1PS: First person singular
2PP: Second person plural
2PS: Second person singular
3PP: Third person plural
3PS: Third person singular
NB: Intermediate projection N bar
JB: Intermediate projection J bar
PB: Intermediate projection P bar
AB: Intermediate projection A bar
VB: Intermediate projection V bar
Arg0: First argument
Arg1: Second argument
MTW: Multiword expression
ANI: Animacy
DEFN: Definite noun
NANM: Non-animate
ANM: Animate
NUM: Number
SNG: Singular
PLR: Plural
GEN: Gender
MCL: Masculine
FEM: Feminine
PRE: Preposition
PER: Person
ART: Article
Subj: Subject
Comp: Complement
Adjt: Adjunct
Spec: Specifier
has_AJ1: Has adjunct
has_comp: Has complement
SEM: Semantic class
ACC: Accusative case
NOM: Nominative case
GEN: Genitive case
DEF: Definite noun
PROJ: Projected constituent
(): Node
“”: String
^: Not
e: Empty node
{|}: Or
(%): Index of nodes
:: Scope ID
Indicates the beginning of a sentence. Indicates the end of a sentence.
Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Introduction
1.2 Aim of the study
1.3 Research questions
1.4 Significance of the study to the field
1.5 Background of the study
1.6 Originality of the thesis
1.7 Map of the thesis
Chapter 2: Theoretical Background
2.1 The Role of Syntax in Language
2.1.1 The Components of Grammars
2.1.2 The Representations of Syntax
2.1.2.1 More Complex Syntactic Structures in language
2.2 Syntactic Arguments of Verbs
2.2.1 The difference between arguments and adjunct
2.2.2 Obligatory VS Optional arguments
2.2.3 Representing arguments and adjuncts
2.2.4 Argument Structure (AS or A-Structure)
2.2.5 Semantic Intuitions Concerning the Argument-Adjunct Distinction
2.2.5.1 Semantic argument (thematic role)
2.2.5.2 Problem of Semantic Intuition
2.2.5.3 Challenges in NLP
2.3 Different approaches in representing the structure
2.3.1 Generative grammar
2.3.1.1 Transformational Grammar
2.3.1.2 Government and binding theory
2.3.1.2.1 The X-bar theory
2.3.1.2.1.1 Extending X-bar theory to sentences and clauses
2.3.1.2.1.2 Phrase structure for the Arabic languages with VSO word order
2.3.1.2.1.3 Constraints on movement
2.3.1.3 Minimalist program
2.3.2 Categorial grammar
2.3.3 Dependency grammar
2.3.4 Stochastic/probabilistic grammars/network theories
2.3.5 Functionalist grammars
2.4 Subcategorization
2.4.1 Tree Families and Subcategorization Frames
2.4.2 Valency vs. subcategorization
2.4.2.1 Types of valency
2.4.3 The status of subjects
2.4.4 Verb Subcategorization in Linguistic Theory
2.4.4.1 Government-Binding and related approaches
2.4.4.2 Categorial Grammar
2.4.4.3 Lexical-Functional Grammar (LFG)
2.4.4.4 Generalized Phrase-Structure Grammar
2.4.4.5 Head-Driven Phrase-Structure Grammar
2.4.5 On the semantic content of subcategorization frame
2.4.5.1 Relations between Verb Meaning and Clause Structure
2.4.5.2 Semantic Correlates of Subcategorization Frames
2.4.6 Towards Building a Large Syntactic Lexicon
Chapter 3: The State of the Art
3.1 Previous attempts of automatic extraction of subcategorization frames
3.1.1 Brent
3.1.2 Manning
3.1.3 Ushioda et al.
3.1.4 Briscoe & Carroll
3.1.5 Buchholz
3.1.6 O’Donovan et al.
3.1.7 Attia et al.
3.2 Previous attempts on a number of different languages
3.2.1 German
3.2.2 Czech
3.2.3 Bulgarian
3.2.4 Portuguese
3.2.5 Italian
Chapter 4: Data Compilation and Analysis
4.1 Data Collection
4.1.1 Selecting Arabic verbs
4.1.2 Corpus description and Classification
4.2 The Linguistic Analysis of the Data
4.2.1 The Linguistic Framework and Algorithm
4.2.1.1 X-bar theory
4.2.1.2 Detecting the Boundaries of the Phrases
4.2.1.2.1 The linguistic description of the subject in the corpus
4.2.1.2.2 Description of the Adjuncts in the Corpus
4.2.1.2.3 Description of the complements in the Corpus
4.3 Linguistic Description of the Selected Examples from the Corpus
Chapter 5: The Automatic Extraction of the Subcategorization Frame for the Arabic Verbs
5.1 Using IAN tool in the automatic extraction of the subcategorization frames
5.1.1 IAN as a tool for linguistic analysis
5.1.1.1 The Formal Framework
A. Basic Definitions
B. Types of rules
5.2 Building the Dictionary of the Automatic Extraction of SCF
5.2.1 Morphological Information
5.2.1.1 Part of speech feature
5.2.1.2 Gender
5.2.1.3 Number
5.2.1.4 Person
5.2.1.5 Tense
5.2.2 Syntactic Information
5.2.2.1 Transitivity
5.2.2.2 Case
5.2.3 Semantic Information
5.2.3.1 Animacy
5.2.3.2 Semantic classification of the words
5.3 The Development of the Grammar
5.3.1 The Architecture of the Grammar Design
5.3.2 Grammar modules
5.3.2.1 Building the Normalization module
5.3.2.1.1 Preprocessing phase
a) Deleting strings
b) Replacing strings
c) Creating nodes
5.3.2.2 Building the Tokenization module
a) Identifying the POS for tokens
b) Disambiguation in the tokenization stage
5.3.2.3 Building the Morphological analysis module
5.3.2.4 Building the parsing module
5.3.2.4.1 Projection phase
5.3.2.4.2 Intermediate projection rules
5.3.2.4.3 Maximal projection rules
5.3.2.5 Detecting the Verb Subcategorization Frames
5.4 A walk through examples from the corpus
Chapter 6: Results and Discussion
6.1 Results
The subject is a NP consisting of (DET+N) and the object is a NP consisting of (N+NP+PP) and adjunct (PP)
The subject is a NP consisting of (DET+NOUN) and the object is a NP consisting of (N+NP+PP) and has no adjunct
The subject is a NP consisting of (N+JP+JP) and the object is NP (N+JP)
Subject is a NP consisting of (N+NP (N+JP+PP)) and the object is NP (N+JP+JP)
The subject is a NP consisting of N+NP + JP (genitive construction) and the object is a NP consisting of (N+NP)
The subject is a NP consisting NP+NP (apposition construction) and the object is NP
The subject is a DP consisting of DET+NP and the object is consisting of NP (N+JP)
The subject is a NP consisting of DET+N and the object is consisting of PP (P+NP(N+NP))
6.2 Evaluation
Chapter 7: Conclusion
Reference
Appendix (A)
Appendix (B)
Appendix (C)
Appendix (D)
Appendix (E)
Appendix (F)
Appendix (G)
Appendix (H)
Appendix (I)
List of Figures
Figure 1: A series of transformation rules describe the syntax of the sentences.
Figure 2: The hierarchical relationships between the components of a sentence
Figure 3: The structure of the sentence "The boy with red shorts kicked the ball."
Figure 4: The structural representation of the conjunction between two complete sentences.
Figure 5: X-bar schema
Figure 6: The tree represented in dependency grammar
Figure 7: A derivational model assumed in GB which consists of four levels of representation.
Figure 8: Basic X-bar Structure.
Figure 9: The structure of the phrase “قال أن الاقتصاد فقير”
Figure 10: The flat VP structure of Arabic sentences.
Figure 11: The D- and S-structures under the subject adjunction proposal.
Figure 12: The D-structure for both SVO and VSO languages.
Figure 13: The D- and S-structures under the verb movement proposal.
Figure 14: D-structure for passive
Figure 15: S-structure for passive
Figure 16: Different subcategorization frames for the verb buy
Figure 17: Example of functional annotation
Figure 18: A simplified lexical entry for the verb “paint”.
Figure 19: C-structure for Penn Treebank sentence “The inquiry soon focused on judge”.
Figure 20: F-structure for Penn Treebank sentence “The inquiry soon focused on judge”
Figure 21: Lexical Markup Framework Core Model.
Figure 22: Lexical Markup Framework Syntax Extension (N. Loukil, K. Haddar and A. Ben Hamadou, 2008).
Figure 23: Subcategorization frame1
Figure 24: Subcategorization frame2
Figure 25: A non-recursive (finite-state) grammar for detecting certain verbal complements. “?” indicates an optional element. Any verb followed immediately expressions matching , , >)على. Including a subject and an oblique with the preposition >على.
Table 1: Governable and nongovernable grammatical functions in LFG
In LFG, the subcategorization requirements of a particular predicate are expressed by its semantic form: FOCUS_(↑ SUBJ)(↑ OBLon), as shown in figure (20). The subcategorization requirements are expressed entirely by semantic forms, which are enforced at the f-structure level through the completeness and coherence well-formedness conditions on f-structures (Kaplan and Bresnan, 1982). “An f-structure is locally complete iff it contains all the governable grammatical functions that its predicate governs. An f-structure is complete iff it and all its subsidiary f-structures are locally complete. An f-structure is locally coherent iff all the governable grammatical functions that it contains are governed by a local predicate. An f-structure is coherent iff it and all its subsidiary f-structures are locally coherent”. (page 211).
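The completeness and coherence conditions quoted above can be read as a recursive check over nested f-structures. The following Python sketch is only an illustration under stated assumptions: the dictionary encoding of f-structures, the simplified `GOVERNABLE` inventory, the `OBL_on` spelling, and all function names are hypothetical choices made for this example and are not part of the LFG formalism or of the tools used in this thesis.

```python
# Assumed, simplified inventory of governable grammatical functions.
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL_on", "COMP", "XCOMP"}

def governed(fs):
    """Functions governed by the local predicate, e.g. ('focus', ['SUBJ', 'OBL_on'])."""
    pred = fs.get("PRED")
    return set(pred[1]) if pred else set()

def locally_complete(fs):
    # Every function the predicate governs must be present in the f-structure.
    return governed(fs) <= set(fs)

def locally_coherent(fs):
    # Every governable function that is present must be governed by the predicate.
    present = {name for name in fs if name in GOVERNABLE}
    return present <= governed(fs)

def subsidiaries(fs):
    # Nested f-structures are the dictionary-valued attributes.
    return [v for k, v in fs.items() if k != "PRED" and isinstance(v, dict)]

def complete(fs):
    return locally_complete(fs) and all(complete(s) for s in subsidiaries(fs))

def coherent(fs):
    return locally_coherent(fs) and all(coherent(s) for s in subsidiaries(fs))

# Hypothetical encoding of "The inquiry soon focused on judge".
full = {
    "PRED": ("focus", ["SUBJ", "OBL_on"]),
    "SUBJ": {"PRED": ("inquiry", [])},
    "OBL_on": {"PRED": ("judge", [])},
    "ADJUNCT": {"PRED": ("soon", [])},  # nongovernable, so it never violates coherence
}
missing_oblique = {k: v for k, v in full.items() if k != "OBL_on"}
extra_object = dict(full, OBJ={"PRED": ("ball", [])})
```

In this sketch, `full` passes both checks, `missing_oblique` fails completeness because the governed oblique is absent, and `extra_object` fails coherence because it contains an object that the predicate does not govern.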
Figure 19: C-structure for Penn Treebank sentence “The inquiry soon focused on judge”.
Figure 20: F-structure for Penn Treebank sentence “The inquiry soon focused on judge”
2.4.4.4 Generalized Phrase-Structure Grammar
In the 1970s and 1980s, Gerald Gazdar and others developed Generalized Phrase-Structure Grammar (GPSG). GPSG is an attempt to extend traditional phrase structure grammars so that they can handle phenomena that only transformations were supposed to be able to explain. GPSG emphasized the necessity of formalization. Because of its simple monostratal architecture and its formal notation, it was much easier to implement computationally (Chrupala, 2003). GPSG was initiated as an augmented phrase-structure grammar: the phrase-structure rewrite rules are replaced by immediate dominance (ID) rules, which indicate the tree hierarchy of constituents but not their relative order. The ordering is described by linear precedence (LP) statements. This is more economical and flexible than traditional rewrite rules. In GPSG, a category is a set of feature-value pairs; for example, the category traditionally represented as NP corresponds to the following set: {,,}. The BAR feature corresponds to the bar-level concept in X-bar theory, which GPSG adopts. For the category N, the feature-value set would be similar but the BAR feature would be 0. In GPSG, features can have either atomic values or values that are themselves feature-value sets. One such feature is AGR (agreement). {} The above notation indicates agreement with a 3rd person feminine singular NP. One important difference between the X-bar scheme as found in GB and the one used by GPSG is that, in GPSG, S is a projection of V. In GPSG, the subcategorization frames of verbs are implemented by means of the feature SUBCAT, whose value is an integer corresponding to an ID rule describing the structure into which the verb is inserted. This feature is encoded in lexical entries: multiple frames mean multiple entries in the lexicon. As an example, consider the following lexical entry []. The value of the SUBCAT feature of the lexical entry ‘weep’. Metarules are another important device in GPSG. As the name indicates, these are rules that take rules as their input and produce other rules as their output. They extend the basic phrase structure grammar.
Metarules in GPSG are used, for example, to derive rules licensing passive sentences from those that describe active ones. Their use makes it possible to factor out redundancy that would otherwise be present in the grammar, and provides a principled treatment of the regular correspondences between active and passive constructions. Because SUBCAT indexes verbs into immediate dominance rules, heads only subcategorize for their sisters. This in turn means that subjects are not included in the subcategorization of the verb.
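The interplay described above between immediate dominance rules, linear precedence statements, SUBCAT values and metarules can be sketched in a few lines of Python. This is a minimal illustration, not GPSG's actual rule inventory: the rule numbering, the toy lexicon, the LP statements, and the simplified passive metarule (which always adds the by-phrase and treats daughters as a set) are all assumptions made for this example.

```python
from itertools import permutations

# ID rules: SUBCAT value -> (mother, unordered daughters). "H" marks the lexical head.
# The numbering is hypothetical and chosen only for illustration.
ID_RULES = {
    1: ("VP", frozenset({"H"})),              # no complements
    2: ("VP", frozenset({"H", "NP"})),        # one NP complement
    3: ("VP", frozenset({"H", "NP", "PP"})),  # NP and PP complements
}

# LP statements: (a, b) means that a must precede b whenever both are sisters.
LP = {("H", "NP"), ("H", "PP"), ("NP", "PP")}

# Toy lexicon: each verb carries only the SUBCAT integer that indexes an ID rule.
LEXICON = {"weep": 1, "devour": 2, "put": 3}

def linearizations(daughters):
    """Return every ordering of the daughters that respects all LP statements."""
    result = []
    for order in permutations(sorted(daughters)):
        if all(order.index(a) < order.index(b)
               for a, b in LP if a in order and b in order):
            result.append(order)
    return result

def frames(verb):
    """Surface frames for a verb: look up its ID rule via SUBCAT, then apply LP."""
    _mother, daughters = ID_RULES[LEXICON[verb]]
    return [tuple(verb if d == "H" else d for d in order)
            for order in linearizations(daughters)]

def passive_metarule(rule):
    """Map an active rule VP -> W, NP to a passive rule VP[PAS] -> W, PP[by]."""
    mother, daughters = rule
    if mother == "VP" and "NP" in daughters:
        return ("VP[PAS]", (daughters - {"NP"}) | {"PP[by]"})
    return None  # the metarule does not apply to rules without an NP daughter
```

Note that no subject appears in any of the rules: since SUBCAT only points at an ID rule for the verb's sisters, the sketch reflects the point made above that subjects fall outside the verb's subcategorization in GPSG.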
2.4.4.5 Head-Driven Phrase-Structure Grammar
Head-Driven Phrase-Structure Grammar (HPSG) is a theory of grammar that combines insights from a variety of sources, most notably GPSG, CG and GB, and it stresses the importance of precise formal specification. The theory uses typed feature structures to represent integrated linguistic signs. The types are organized in a multiple inheritance hierarchy that helps avoid redundancy (Chrupala, 2003). HPSG is more lexicalist than other theories: most linguistic information is contained in the lexical entries (Chrupala, 2003). Syntax and semantics are not largely independent, as in the approaches described above, but are tightly integrated in the same framework. The semantic component of HPSG is based on situation semantics (Barwise and Perry, 1983). In HPSG, subcategorization information is specified in lexical entries, as set out in Pollard and Sag (1987) and Pollard and Sag (1994). The subject is treated in nearly the same way as other arguments. Verbs have a SUBCAT feature whose value is a list of synsem objects corresponding to the values of the SYNSEM features of the arguments subcategorized for by the head. The order of these objects corresponds to the relative obliqueness of the arguments, with the subject coming first, followed by the direct object, then the indirect object, then PPs and other arguments.
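The obliqueness ordering of the SUBCAT list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real HPSG grammars use typed feature structures rather than tuples, and the labels, the ranking table and the example arguments below are hypothetical.

```python
# Obliqueness ranking as described above: subject first, then direct object,
# then indirect object, then PPs and other arguments (label names are assumed).
OBLIQUENESS = {
    "subject": 0,
    "direct_object": 1,
    "indirect_object": 2,
    "pp": 3,
    "other": 3,
}


def subcat_list(arguments):
    """Order (function, category) pairs from least to most oblique."""
    return sorted(arguments, key=lambda arg: OBLIQUENESS[arg[0]])


# Hypothetical arguments of a verb taking a subject, an object and a PP,
# listed here in arbitrary order.
unordered = [("pp", "PP[loc]"), ("subject", "NP[nom]"), ("direct_object", "NP[acc]")]

print([category for _, category in subcat_list(unordered)])
```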
2.4.5 On the semantic content of subcategorization frame
There is a relation between the meaning of a verb and the syntactic structures in which it appears. The closer two verbs are in semantic structure, the greater the overlap should be in their syntactic structures (Gleitman and Fisher, 1991). Zwicky (1971) points out that fixing the meaning of a verb seems to allow prediction of many of its surface properties; verbs that are related in meaning share aspects of their clausal syntax. However, despite many such promising examples of syntactic/semantic linkages, the predictions from meaning to surface syntactic form appear complex.
2.4.5.1 Relations between Verb Meaning and Clause Structure
An essential feature of language design is that verbs differ in the grammatical constructions in which they can occur. For example, the verb “laugh” does not permit a post-verbal NP (a book), as sentence (40) shows; it occurs in structures like NP V. The verb “put”, by contrast, occurs in structures like NP V NP PP, as in sentence (41). As a result, the two verbs are associated with different complements: “put” accepts the complement NP PP and “laugh” accepts a null complement (Gleitman and Fisher, 1991).
40) *John laughed a book on the table.
41) John put a book on the table.
Comparing (40) and (41) shows that “laugh” is associated with the subcategorization frame NP V, while “put” is associated with the subcategorization frame NP V NP PP.
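The association between verbs and frames discussed above can be pictured as a simple lookup table. The following Python sketch is illustrative only: the two frames come from the discussion of sentences (40) and (41), while the dictionary layout and the function name are assumptions.

```python
# Frames taken from the discussion of sentences (40) and (41);
# the data layout itself is an illustrative assumption.
SCF_LEXICON = {
    "laugh": {("NP", "V")},
    "put": {("NP", "V", "NP", "PP")},
}


def licensed(verb, frame):
    """True if the verb is listed with exactly this sequence of constituents."""
    return tuple(frame) in SCF_LEXICON.get(verb, set())


# (40) *John laughed a book on the table.
print(licensed("laugh", ["NP", "V", "NP", "PP"]))
# (41) John put a book on the table.
print(licensed("put", ["NP", "V", "NP", "PP"]))
```

The first call prints False and the second prints True, mirroring the grammaticality judgments in (40) and (41).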
2.4.5.2 Semantic Correlates of Subcategorization Frames
In the cases exemplified above, the number of NPs required for grammaticality appears to correspond to the number of semantic-relational elements necessary to identify the participants in the event described by the verb. For example, the notion of “putting” requires an entity who does the putting, a thing that is put, and a location into which it is put; each such entity must be realized as an NP in well-formed sentences containing put. Thus sentence (42) is grammatical because it renders all and only these thematic roles: the agent (John) appears as subject, the thing moved (poison) appears as direct object, and the location (cup) appears as indirect object, marked by a locative preposition. Sentence (43), on the other hand, is ruled out because it contains too few NPs to fill the argument positions.
42) John put poison in the cup.
43) *John put in the cup.
The logic of the propositions expressed by these verbs is reflected in the number and positioning of NPs in the clause. Gleitman and Fisher (1991) claim that there are correlations between verb meanings and sentence structure. Correlations exist because, other things being equal, a sentence making reference to a particular type of event or state of affairs will naturally allow the speaker to mention the necessary participants and to differentiate their roles in some systematic way (Dowty, 1986). A number of such distinctions in the subcategorization privileges of verbs have been examined in the linguistic literature and hypothesized to be reflections of semantic distinctions. Gleitman and Fisher (1991) give a few examples of these distinctions:
1. Motion verbs such as put, walk, and give allow or even require prepositional phrases (PPs), as they encode the sources, paths, and goals of objects moving through space (Jackendoff, 1983, 1987; Talmy, 1975).
2. Verbs of spatial perception and of cognition characteristically allow sentential complements (SComps) that describe the events perceived or the propositions cognized. Verbs describing the physical motion of bodies in space, on the other hand, usually do not occur with SComps (compare John saw | believed that Mary was coming with *John put | *gave that Mary was coming) (Vendler, 1972).
3. Verbs describing acts are more natural in progressive and imperative structures than verbs which describe states.
4. Verbs describing symmetrical relationships are natural in plural intransitive structures (John and Mary met) but unnatural in singular intransitive structures (*John met) (Gleitman, 1965).
5. Verbs describing (externally caused) transfer, or change of possessor of an object from place to place (or from person to person), fit naturally and typically into sentences containing three NP arguments (3-NP), while others do not. (Compare John moved his belongings to Texas, John turned Mary into a bat, *Bill went Mary to the party; Jackendoff, 1978; Pinker, 1987.)
2.4.6 Towards Building a Large Syntactic Lexicon
Comprehensive subcategorization lexicons are vital for the development of successful parsing technology (Carroll et al., 1998; Arun and Keller, 2005) and important for various computational linguistic tasks, such as automatic verb classification, selectional preference acquisition and psycholinguistic experiments (Lapata et al., 2001; Schulte and Brew, 2002; McCarthy and Carroll, 2003). One attempt to build such a lexicon for English is the VALEX lexicon. VALEX is a large valency (subcategorization) lexicon for English verbs which is suitable for statistical natural language processing as well as linguistic and psycholinguistic use. It includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. It assumes a classification of 163 SCF types (Briscoe, 2000), a superset of those found in the ANLT and COMLEX Syntax dictionaries. The SCFs abstract over specific lexically-governed particles and prepositions and over specific predicate selectional preferences, but include some derived semi-predictable bounded dependency constructions, such as particle and dative movement. The lexicon provides a lexical entry for each verb and SCF combination. It includes 212,741 entries in total, 33 per verb on average. VALEX differs from other existing valency lexicons in the following ways:
1. It was acquired automatically from five large corpora (both British and American) and the Web. The corpus data (consisting of 15.9M sentences in total) were processed using a recent version (Korhonen, 2002) of the comprehensive subcategorization acquisition system of Briscoe and Carroll (1997).
2. Since the lexicon was acquired automatically, it contains some incorrect SCF entries and inaccurate frequencies. Software is therefore provided with the lexicon which can be used to remove noise from the lexicon, improve the quality of automatically acquired SCF distributions and/or create sub-lexicons suitable for different purposes. Four sub-lexicons are also provided which are more accurate than the basic lexicon and which can be readily employed for tasks that require better accuracy.
3. The lexicon includes statistical information about the frequencies and relative frequencies of SCFs in corpus data. This makes it particularly suitable for statistical NLP use.
For Arabic, there is an attempt to build a syntactic lexicon of Arabic verbs at the Multimedia Information Retrieval and Advanced Computing Laboratory, University of Sfax, Tunisia, based on the Lexical Markup Framework⁶ (LMF) (Loukil, 2008), which describes lexical information in a simple way using general guidelines and enables the sharing of resources.
⁶ LMF (Francopoulo, 2005) provides an extensible architecture that is relevant for modelling both Machine Readable Dictionaries and NLP lexical resources.
Computational lexicons that encode syntactic information are known to be difficult to construct because of the absence of dictionaries or existing lexical databases from which the syntactic knowledge can be extracted. Computational lexicons provide specifications of syntactic behavior such as surface properties, subcategorization frames, argument realizations and morphology-syntax interaction. This kind of information is very useful, especially for grammar parsers (Loukil, 2008).
Figure 21: Lexical Markup Framework Core Model.
The syntactic behavior of the lexical entries is used to capture syntactic redundancy in the lexicon. It is described by a set of permitted syntactic formations grouped in semantically disjoint subsets. A subcategorization frame represents the set of possible syntactic constructions associated with a predicate and actually realized by the combination of several complements or positions. A subcategorization frame can be seen as a valence pattern specifying the order and the nature of the permitted position instances.
Figure 22: Lexical Markup Framework Syntax Extension (N. Loukil, K. Haddar and A. Ben Hamadou, 2008).
Loukil et al. (2008) have discussed the syntactic information that can be embedded in a syntactic lexicon of Arabic verbs, based on the syntactic extension of LMF. A lexical entry may have several frames, each providing several mandatory or optional positions. Each position proposes possible realizations and their morphosyntactic descriptions, given within the syntactic argument component. The lexeme property component describes syntactic features specific to the lexical entry, such as tense and mood. Loukil’s model assumes that an acceptable syntactic formation for a given verb is embedded in a subcategorization frame (SF). An SF consists of an ordered list of the arguments required by the verb and a set of constraints on those arguments, such as information about complement introducers. Loukil et al. (2008) attempted to build a syntactic lexicon for Arabic verbs (the Arabic LMF syntactic lexicon) in three steps. First, they manually specified the subcategorization frames accepted by Arabic verbs. Then they edited those SFs (17 SFs in their lexicon), such as those of figure 4 and figure 5, with the Lexus editor, which checks the compatibility of the proposed structure with LMF. Finally, they entered the Arabic verb lemmas and assigned one or more SFs to each verb. The lexicon contains 2500 verb lemmas with an average of 2.7 SFs per verb.
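The organization described above (a lexical entry with several frames, each frame an ordered list of positions with constraints such as a complement introducer) can be sketched as a small data model. The sketch below is an assumption-laden illustration in Python: the class and field names loosely follow the terms used in the text and are not the LMF standard's own schema, and the two verbs and their frames are hypothetical examples rather than entries from Loukil's lexicon.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Position:
    """One argument position of a frame (field names are illustrative)."""
    function: str                      # e.g. "subject", "object"
    realization: str                   # e.g. "NP", "PP"
    mandatory: bool = True
    introducer: Optional[str] = None   # complement introducer, if any


@dataclass
class SubcategorizationFrame:
    name: str
    positions: List[Position]          # ordered list of positions


@dataclass
class LexicalEntry:
    lemma: str
    frames: List[SubcategorizationFrame] = field(default_factory=list)


def average_frames_per_verb(lexicon: List[LexicalEntry]) -> float:
    """Mean number of frames per lexical entry."""
    return sum(len(entry.frames) for entry in lexicon) / len(lexicon)


# Hypothetical frames and entries, for illustration only.
intransitive = SubcategorizationFrame("SF_intransitive", [Position("subject", "NP")])
transitive = SubcategorizationFrame(
    "SF_transitive", [Position("subject", "NP"), Position("object", "NP")]
)
pp_about = SubcategorizationFrame(
    "SF_pp",
    [Position("subject", "NP"), Position("oblique", "PP", introducer="عن")],
)

lexicon = [
    LexicalEntry("كتب", [transitive, pp_about]),
    LexicalEntry("ضحك", [intransitive]),
]

print(average_frames_per_verb(lexicon))
```

Because frames are shared objects, a frame such as SF_transitive is defined once and assigned to many verbs, which matches the workflow of defining a small inventory of SFs first and attaching them to lemmas afterwards.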
Figure 23: Subcategorization frame 1
Figure 24: Subcategorization frame 2
There has also been an attempt by Bielický and Smrž (2008) to build a valency lexicon⁷ of Arabic verbs using a morphologically and syntactically annotated corpus, the Prague Arabic Dependency Treebank (PADT), which provides refined linguistic annotations whose multi-level description scheme discerns functional morphology, analytical dependency syntax, and tectogrammatical representation of linguistic meaning. Their approach is inspired by the VALLEX lexicon of Czech verbs (Lopatková et al., 2006).
⁷ Valency lexicons can find application in automatic parsing as well as in language generation.
CHAPTER 3 THE STATE OF THE ART
In this chapter, the researcher presents the established methods for automatic subcategorization acquisition. The subcategorization frame is considered to be of great importance to several NLP tasks, such as Information Extraction or parsing. However, compiling resources that include a subcategorization representation is difficult and time-consuming. Predicate subcategorization is a key component of a lexical entry, because most, if not all, recent syntactic theories 'project' syntactic structure from the lexicon (Briscoe & Carroll, 1997). Therefore, a wide-coverage parser utilizing such a lexicalist grammar must have access to an accurate and comprehensive dictionary encoding (at a minimum) the number and category of a predicate's arguments, and ideally also information about control with predicative arguments, semantic selection preferences on arguments, and so forth, to allow the recovery of the correct predicate-argument structure. Moreover, if the parser uses statistical techniques to rank analyses, it is also critical that the dictionary encode the relative frequency of distinct subcategorization classes for each predicate. It has been observed that half of the parse failures on unseen test data were caused by inaccurate subcategorization information in the ANLT dictionary (Briscoe & Carroll, 1993). The close connection between sense and subcategorization, as well as between subject domain and sense, makes it likely that a fully accurate 'static' subcategorization dictionary of a language is unattainable in any case. Moreover, although Schabes (1992) and others have proposed 'lexicalized' probabilistic grammars to improve the accuracy of parse ranking, no wide-coverage parser has yet been constructed incorporating probabilities of different subcategorizations for individual predicates, because of the problems of accurately estimating them. These problems suggest that automatic construction or updating of subcategorization dictionaries from textual corpora is a more promising avenue to pursue. Various experiments show that automatic extraction can be a practical and reliable solution for acquiring this kind of knowledge.
3.1 Previous attempts of automatic extraction of subcategorization frames
In order to represent accurate subcategorization information, a distinction should be made between complements and adjuncts. Complements are taken to be syntactically specified and required by the head, whereas adjuncts (of time, place, purpose, etc.) can only modify a head, according to almost all frameworks, e.g. the Minimalist Program (Chomsky 1995), Lexical-Functional Grammar (Bresnan 2001), Head-Driven Phrase Structure Grammar (Pollard and Sag 1994), Categorial Grammar (Morrill 1994), and Tree-Adjoining Grammar (Joshi and Schabes 1997) (Elghamry, 2004). Constituents have to be either selected (as complements) or not. If they are not, they are freely licensed as adjuncts. Several methods have been suggested for learning subcategorization frames automatically from text corpora (e.g., Brent 1993; Manning 1993; Ushioda et al. 1996; Briscoe & Carroll 1997; Buchholz 1998; O’Donovan et al. 2005; and Attia et al., 2011). No more recent methods have been reported for automatic frame identification. Other research on frame identification in languages other than English is mainly based on one or more of these methods. These methods depend on different techniques for extracting subcategorization frames, but they have two important features in common. Firstly, they are corpus-driven: they focus on the distributional regularities in the input. Secondly, they use a probabilistic model of subcategorization frames.
3.1.1 Brent
Brent (1991) claimed that the first step in finding a subcategorization frame is identifying the verb. Brent used the untagged Brown corpus as input. Identifying verbs in an untagged corpus is very difficult and poses a serious challenge because of the productive noun/verb ambiguity. He used statistical disambiguators, which caused many problems because of their high error rate. The architecture of the Brent system consists of three modules (Brent, 1991):
1. Verb detection: the module that uses the Case Filter (Rouvret and Vergnaud, 1980) to detect the verbs in the input.
2. SF detection: the module that detects subcategorization frames using a simple finite-state grammar for a fragment of English.
3. SF decision: the module that determines whether a verb is associated with a given SF or whether its apparent occurrences in that SF are due to error. This is done using statistical models of the frequency distributions.
Verb detection
The technique Brent developed for identifying verbs is based on the Case Filter. The Case Filter is a proposed rule of grammar which states that every noun phrase must appear either immediately to the left of a tensed verb, immediately to the right of a preposition, or immediately to the right of a main verb. The program judges an open-class word to be a main verb if it is adjacent to a pronoun or proper name. Such a pronoun or proper name could be either the subject or the direct object of the verb. Other noun phrases are not used, because it is too difficult to determine their boundaries accurately. Efficiency and accuracy were the two criteria used for evaluating the performance of the main-verb detection module.
SF detection
The obvious approach to detecting SFs like "V NP to V" and "V to V" is to look for occurrences of just those patterns in the training corpus, but this approach fails to address the attachment problem illustrated by examples (a) and (b):
a. I expected [NP the man who smoked NP] to eat ice-cream.
b. I doubted [NP the man who liked to eat ice-cream NP]
Therefore, Brent opted for another approach, which is to wait for clear cases like "V PRONOUN to V". The advantages can be seen by contrasting (a) and (b) with (c) and (d):
c. I expected him to eat ice-cream.
d. * I doubted him to eat ice-cream.
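The idea of waiting for clear cases such as "V PRONOUN to V" can be sketched as a very small pattern matcher. This Python sketch is not Brent's implementation: the pronoun list, the tokenization and the function name are assumptions made for illustration (the pronoun "her" is left out of the list on purpose, since it is ambiguous with the possessive form).

```python
# Unambiguous object pronouns used as cues (an illustrative choice).
OBJECT_PRONOUNS = {"me", "him", "us", "them"}


def v_pronoun_to_v_cues(tokens):
    """Return (verb, infinitive) pairs for every 'V PRONOUN to V' match."""
    words = [token.lower().strip(".,") for token in tokens]
    hits = []
    for i in range(len(words) - 3):
        if words[i + 1] in OBJECT_PRONOUNS and words[i + 2] == "to":
            hits.append((words[i], words[i + 3]))
    return hits


# Example (c): a clear case, so one cue is found.
print(v_pronoun_to_v_cues("I expected him to eat ice-cream.".split()))
# Example (a): a full NP with a relative clause, so no cue is found.
print(v_pronoun_to_v_cues("I expected the man who smoked to eat ice-cream.".split()))
```

The second sentence yields no match, so the matcher never has to decide where the noun phrase ends. The price is that many genuine occurrences are skipped, which is acceptable when the corpus is large.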
In general, Brent's system (1991, 1993, 1994) determines the syntactic structure that is necessary for frame acquisition using a small finite-state grammar that describes only the fragment of English that is most useful for recognizing SFs. The grammar uses approximate cues in the form of function morphemes (prepositions, determiners, inflection, pronouns, auxiliary verbs and complementizers), as well as proper names and punctuation, as shown in table 2. Brent’s approach is based on the following two principles (Elghamry, 2004):
1. It does not try to parse sentences completely. Instead, it relies on local morphosyntactic cues: (1) the word following a determiner is unlikely to function as a verb; (2) the sequence “that the” indicates the beginning of a clause.
2. It does not try to draw categorical conclusions about a word on the basis of one example or a fixed number of examples. Instead, it attempts to determine the distribution of exceptions to the expected correspondence between cues and syntactic frames. It uses a statistical model to determine whether the occurrence of a verb with cues for a frame is too regular to be explained by randomly distributed exceptions.
Table 2: Lexical categories used in the definition of the cues
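The second principle can be illustrated with a short numerical sketch. The text above says only that a statistical model is used; the sketch below assumes a binomial model of cue errors, which is one common way to formalize the question "could these cue occurrences be explained by error alone?". The error rate, the counts and the threshold are hypothetical values chosen for illustration.

```python
from math import comb


def prob_at_least(m, n, error_rate):
    """Probability that at least m of n verb occurrences show the cue
    if every cue occurrence were an independent error with the given rate."""
    return sum(
        comb(n, k) * error_rate**k * (1 - error_rate) ** (n - k)
        for k in range(m, n + 1)
    )


def accepts_frame(m, n, error_rate, threshold=0.05):
    """Accept the frame when chance errors are an unlikely explanation."""
    return prob_at_least(m, n, error_rate) <= threshold


# Hypothetical counts: a verb seen 20 times, assumed cue error rate of 5%.
print(accepts_frame(5, 20, 0.05))   # cue seen 5 times
print(accepts_frame(1, 20, 0.05))   # cue seen once
```

Under these assumed values, five cue occurrences are accepted as evidence for the frame, whereas a single occurrence is not, since one spurious cue in twenty sentences is quite likely at a 5% error rate.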
The grammar for detecting SFs needs to distinguish three types of complements: direct objects, infinitives, and clauses. The grammars for each of these are presented in figure 25.
Figure 25: A non-recursive (finite-state) grammar for detecting certain verbal complements. “?” indicates an optional element. Any verb followed immediately by expressions matching , ,
"|""ة%z,POD);
Rule (30): (%y,Rertieved,^N)(%z,POD):=(%y,?N)(%z,POD);
The sign (?) in rule (30) is used to retrieve the entries from the dictionary. Moreover, in the morphological phase, the rules are also capable of recognizing and correcting a wrongly written (taa marbota) “ة”, which is frequently written as (haa marbota) “ه”; this is done by rule (31). This error can be detected because, in Arabic, no definite noun ends with (haa marbota) “ه”.
Rule (31): (ART , %y ) (TEMP, "/..+ه/", %x ) ({BLK | STAIL}, %z ) := (%y ) (%x , ""ه">"ة, +Y ) (%z ) ;
5.3.2.4 Building the parsing module
Parsing (syntactic analysis) is the process of analyzing a string of symbols in a natural language according to the rules of a formal grammar. It is often performed as a step towards understanding the exact meaning of a sentence, and it usually emphasizes grammatical divisions such as subject, object and predicate. Within computational linguistics, parsing refers to the formal analysis of a sentence or other string of words into its constituents, resulting in a parse tree that shows their syntactic relations to each other; the term is used for the process of automatically building syntactic analyses of sentences in terms of a given grammar and lexicon. A parser is a component that transforms input data into a data structure (a parse tree). Parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyzer, which identifies tokens in the sequence of input characters. Identifying the structure is the first step towards understanding the meaning of a sentence. Although the development and maintenance of handwritten grammars is a hard task, rule-based parsing has a strong advantage: the parser can easily be modified and adapted to new tasks (Bassam et al., 2014).
In the method developed in this thesis, the parser is preceded by other steps (normalization and tokenization). After normalization and the assignment of the correct POS to each token in the sentence, the syntactic module starts drawing the syntactic trees for sentence structures according to X-bar theory, which is a specific implementation of constituency grammars. It is a method of sentence analysis that divides the sentence into constituents, but it states some very specific rules for doing so: the topmost node (S) is called XP (X-phrase) and is considered to be the maximal projection of a head X. The symbol "X", and therefore "XP", is used because the theory claims that all the different types of phrases (NP, VP, JP, etc.) share the same underlying structure. Projections are always binary, i.e., a node cannot have more than two branches at a time, except in cases of coordination. To keep the tree binary, the head may have intermediate projections before the maximal projection. These intermediate projections are called XB (from X-bar), and again X must be replaced by the specific category of the head (VB is the intermediate projection of V).
5.3.2.4.1 Projection phase
In the projection phase, small constituents are combined to gradually form a bigger tree until the whole sentence is analyzed. The main goal of projection rules is to build the sentence structure. The syntax of the projection rules is stated in figure 86:
:= ;
Figure 86: The syntax of the rules in the projection phase.
Where is a syntactic relation, including a , in case of head-only such as relations (VH, NH, PH, JH, AH, CH, IH, DH), or a and a , in case of binary relation. There are mainly two types of projection rules:
1. Replace, when the number of relations on the left side is the same as on the right side; it is used for collapsing single-branched structures (i.e., parent nodes that have one single child).
2. Merge, when the number of relations on the left side is greater than on the right side; it is used for collapsing double-branched structures (i.e., parent nodes that have two children).
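The distinction between the two rule types rests only on counting the relations on each side of ":=". A minimal Python sketch of that count is given below. It is not part of the grammar formalism itself: the helper names are assumptions, and the sketch only handles relation-style rules of the form REL(...)REL(...):=REL(...); such as rules (32) and (37) in this section.

```python
def count_top_level_relations(side):
    """Count relations that are not nested inside another relation."""
    depth, count = 0, 0
    for ch in side:
        if ch == "(":
            if depth == 0:
                count += 1       # a new outermost relation starts here
            depth += 1
        elif ch == ")":
            depth -= 1
    return count


def classify(rule):
    """Label a projection rule as 'replace' or 'merge' by comparing sides."""
    left, right = rule.rstrip(";").split(":=")
    n_left = count_top_level_relations(left)
    n_right = count_top_level_relations(right)
    if n_left == n_right:
        return "replace"
    if n_left > n_right:
        return "merge"
    return "other"


print(classify("XH(%head):=XB(%head;);"))                                  # rule (32)
print(classify("XB(%head;%xb)XS(%head;%spec):=XP(XB(%head;%xb);%spec);"))  # rule (37)
```

Rule (32) has one relation on each side and is therefore a Replace rule, while rule (37) reduces two relations to one and is a Merge rule.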
5.3.2.4.2 Intermediate projection rules
First intermediate projection (lower XB)
Rule (32): XH(%head):=XB(%head;);
The rule in (32) projects the first intermediate projection (XB) when there is no complement or adjunct (the second argument is empty).
Second intermediate projection (if any)
Rule (33): XB(%head;%xb1)XA(%head;%adjt):=XB(XB(%head;%xb1);%adjt);
Rule (34): XB(%head;%comp1)XC(%head;%comp2):=XB(XB(%head;%comp1);%comp2);
The rule in (33) projects the second intermediate projection (XB) in the case of an adjunct (the tree has a comp and an adjt, or two adjts). The rule in (34) projects the second intermediate projection (XB) in the case of a complement (the tree has two comps).
Third intermediate projection (if any)
Rule (35): XB(XB(%head;%xb1);%xb2)XA(%head;%adjt):=XB(XB(XB(%head;%xb1);%xb2);%adjt);
In rule (35), the tree has one complement and two adjuncts, two complements and one adjunct, or three adjuncts. Many other intermediate projections can be composed in the same way. There is no limit to the number of intermediate projections that can be composed, since it depends on the context and the length of the sentences being analyzed.
5.3.2.4.3 Maximal projection rules
The head with no complement, adjunct or specifier can be projected to its maximal projection by the rule in (36). A tree that has only one intermediate projection and only one specifier can be composed by the rule in (37). A tree that has only one intermediate projection and no specifier can be composed by the rule in (38). A tree that has two intermediate projections and a specifier is combined by the rule in (39). A tree that has two intermediate projections and no specifier is combined by the rule in (40):
Rule (36): XB(%head;):=XP(XB(%head;););
Rule (37): XB(%head;%xb)XS(%head;%spec):=XP(XB(%head;%xb);%spec);
Rule (38): XB(%head;%xb):=XP(XB(%head;%xb););
Rule (39): XB(XB(%head;%xb1);%xb2)XS(%head;%spec):=XP(XB(XB(%head;%xb1);%xb2);%spec);
Rule (40): XB(XB(%head;%xb1);%xb2):=XP(XB(XB(%head;%xb1);%xb2););
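The shapes produced by the intermediate and maximal projection rules can be reproduced with a few lines of Python. The sketch below only builds the bracketed strings that appear on the right-hand sides of rules (36), (37) and (40); it does not implement the rule engine, and the function name and its parameters are assumptions introduced for illustration.

```python
def project(category, head, complements=(), adjuncts=(), specifier=""):
    """Build the bracketed X-bar structure for one head.

    The first intermediate projection takes the first dependent (or nothing),
    every further dependent adds one more binary XB layer, and the maximal
    projection XP adds the (possibly empty) specifier.
    """
    dependents = list(complements) + list(adjuncts)
    first = dependents[0] if dependents else ""
    tree = f"{category}B({head};{first})"
    for dependent in dependents[1:]:
        tree = f"{category}B({tree};{dependent})"
    return f"{category}P({tree};{specifier})"


# Head only, as in rule (36).
print(project("X", "%head"))
# One dependent and a specifier, as in rule (37).
print(project("X", "%head", complements=["%xb"], specifier="%spec"))
# Two dependents and no specifier, as in rule (40).
print(project("X", "%head", complements=["%xb1"], adjuncts=["%xb2"]))
```

Each added dependent wraps the tree built so far in one more XB layer, so every node keeps at most two branches regardless of how many complements and adjuncts the head has.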
In the grammar developed in this thesis, the main step is to detect the boundaries of the phrases in the sentences of the corpus, so the linguistic rules have been formalized and implemented for that purpose. The phrases are combined and projected in a fixed order: first the adverbial phrases, then the prepositional phrases, the adjectival phrases, the noun phrases and finally the verbal phrases.
The projection of the Adverbial Phrase
The adverbial phrase can be projected gradually from the lexical category adverb (A) in sentence (88): the rule in (41) tags the adverb, the rule in (42) projects it to its intermediate projection (AB), and the rule in (43) projects the intermediate projection (AB) to its maximal projection (AP).
88) تخدم المجمع حاليا شبكة سكك حديد تؤمن نقل الفحم الحجري
“Currently, a railway network that transports coal serves the complex.”
Rule (41): (A, ^proj , %x ) := (%x , +AB , +proj ) ;
Rule (42): (AB , %x , ^PROJ) := (AB(%x , +PROJ ; +e , %y ) , +AB,+SEM=%x) ;
Rule (43): (AB , %x , ^PROJ) := (AP(%x , +PROJ ; +e , %y ) , +AP) ;
The rule in (41) tags the adverb “حاليا” ‘currently’ with the tag (AB), which enables the rule in (42) to apply and to project the adverb from the lexical category to the phrasal category. The adverb “حاليا” has no complement (the complement is empty) in sentence (88), so it can be projected to its intermediate projection, the adverb phrase bar (AB). The semantic class of the lexical adverb is assigned to its intermediate projection (AB) through the value (+SEM=%x) stated on the right side of the rule in (42). The intermediate projection (AB) built by (42) has no adjunct or specifier, so it is projected to its maximal projection by the rule in (43).
The projection of the Adjectival Phrase
The adjectival phrase can be projected gradually from the lexical category adjective (J) in sentence (89) by the rule in (44) and projected to its intermediate projection (JB) by the rule in (45):
89) تشجع الخطة الأولى إقامة مشاريع شراكة بين مصانع كندية وأخرى محلية
“The first plan encourages the establishment of partnership projects between Canadian factories and other local ones.”
Rule (44): (J , ^PTP , ^proj , %x ) := (%x , +JB , +proj ) ;
Rule (45): (JB , %x , ^PROJ ) := (JB(%x , +PROJ ; +e , %y ) , +JB , +GEN = %x ,+NUM = %x, +DEF = %x , %01 ) ;
Rule (46): (JB , %x , ^PROJ ,ordinal) := (JP(%x , +PROJ ; +e , %y ) , +JP , +GEN = %x ,+NUM = %x, +DEF = %x , %01,ordinal ) ;
The rule in (44) tags the adjective “أولى” ‘first’ with the tag (JB), which enables the rule in (45) to apply and to project the adjective from its lexical category to the phrasal category. The adjective “أولى” ‘first’ has no complement (the complement is empty) in sentence (89), so it can be projected to its intermediate projection, the adjective phrase bar (JB), and at the same time the semantic class of the adjective is assigned to its intermediate projection (+SEM=%x). Moreover, the gender, number and definiteness of the adjective are also assigned to the projected (JB) by rule (45). The (JB) is then projected to its maximal projection, the adjectival phrase (JP), by rule (46).
The projection of the Prepositional Phrase
In the phrase “عن هذه الواقعة” ‘about this incident’ in sentence (90), the preposition “عن” ‘about’, which is assigned the POS (PRE), has a complement: the noun phrase (NP) “هذه الواقعة” ‘this incident’ (the projection of the noun phrase will be discussed in detail on page 189).
90) كتب طه حسين عن هذه الواقعة في رثائه لأحمد أمين
“Taha Hussein wrote about this incident in his elegy for Ahmed Amin in the year 1945.”
Rule (47): (PRE , ^prel , %p ) (NP , ^sub,%x ) ( %z ) := (PB(%p ; %x) , +PB,+SEM=%p, %01 ) (%z ) ;
Rule (48): (PB , %x , ^pro ) := (PP(%x , +pro ; +e , %y ) , +PP , %01 ) ;
Rule (47) states that when a preposition is followed by a noun phrase, the noun phrase is considered its complement. The preposition and its complement are therefore combined to form the intermediate prepositional phrase (PB), and the semantic class of the preposition is assigned to this intermediate projection (+SEM=%p) by the same rule. It is important to transfer the semantic class of the preposition to its intermediate projection (PB). Because the preposition has no specifier in sentence (90), the (PB) is projected to its maximal projection, the prepositional phrase (PP), by rule (48).
The projection of the Determiner Phrase
The determiner phrase can be projected gradually from the lexical category determiner (D) in sentence (91) by the rule in (49) and then projected to its maximal projection, the determiner phrase (DP), by the rule in (50).
91) تمسك بعض الزعماء العرب بشعارات الاستقلال
“Some Arab leaders stuck to independence slogans.”
Rule (49): (DEM,D , %n ) (NP, %x ) := (DB(%n ; %x), +DB,+SEM = %x,+GEN = %x,+NUM = %x ,+ANI = %x,+COMP);
Rule (50): (DB , %x , ^pro ) := (DP(%x , +pro ; +e , %y ) , +DP,+SEM = %x,+GEN = %x,+NUM = %x ,+ANI = %x , %01 ) ;
In the phrase “بعض الزعماء العرب” ‘some Arab leaders’ in sentence (91), the determiner “بعض” ‘some’, which is assigned the POS (D), has a complement: the noun phrase (NP) “الزعماء العرب” ‘Arab leaders’. Rule (49) states that when a determiner is followed by a noun phrase, the noun phrase is considered its complement. The determiner and its complement are therefore combined to form the intermediate projection, the determiner phrase bar (DB), and at the same time the semantic class, gender, number and animacy of the noun phrase are assigned to the (DB). This inheritance of features by the phrasal category is of great importance for detecting the function of the phrase “بعض الزعماء العرب” ‘some Arab leaders’ in relation to the verb in sentence (91) when the arguments of the verb “تمسك” ‘stick to’ are determined. The intermediate determiner phrase bar (DB) is then combined with an empty specifier and projected to its maximal projection, the determiner phrase (DP), by rule (50).
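The feature inheritance performed by rules (49) and (50) can be pictured with a small Python sketch. It is a hedged illustration only: nodes are plain dictionaries rather than the grammar's own node format, the function name is an assumption, and the feature values given to the noun phrase (other than the feature names SEM, GEN, NUM and ANI, which come from the rules) are hypothetical.

```python
# Features copied from the NP complement to DB and DP, as in rules (49)-(50).
INHERITED = ("SEM", "GEN", "NUM", "ANI")


def project_determiner(determiner, noun_phrase):
    """Combine a determiner with its NP complement and percolate NP features."""
    features = {name: noun_phrase["features"][name] for name in INHERITED}
    db = {
        "label": "DB",
        "head": determiner,
        "comp": noun_phrase,
        "features": features,
    }
    # Empty specifier: DB projects directly to DP with the same features.
    return {"label": "DP", "bar": db, "spec": None, "features": dict(features)}


# Hypothetical feature values for the NP of sentence (91).
np_leaders = {
    "label": "NP",
    "text": "الزعماء العرب",
    "features": {"SEM": "person", "GEN": "MCL", "NUM": "PLR", "ANI": "ANM", "DEF": "DEF"},
}

dp = project_determiner("بعض", np_leaders)
print(dp["label"], dp["features"])
```

Because the DP carries the gender, number and animacy of its noun phrase, a later step that checks agreement between a verb and its candidate subject can inspect the DP node alone without descending into the tree.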
The projection of the Noun Phrase
Determining the boundaries of nominal phrases is known to be a very strenuous task. NP detection and verb identification provide the basic information required for the automatic acquisition of verbal subcategorization frames. The researcher has relied on certain cues in building noun phrases and determining their boundaries. The main cue is that a noun can be combined with a following noun when the first noun is indefinite (nakira), that is, when it lacks the definite article or the idafa.
Handling genitive construction (إضافة)
During processing, each noun in the sentence is assigned the feature (NB).
(92) وزيره المولج.
“His minister who is in charge of”
In phrase (92), the noun “وزير” ‘minister’ is followed by the masculine pronoun “ه” ‘his’ and the adjectival phrase “المولج” ‘in charge of’. Because the noun “وزير” ‘minister’, which is indefinite (nakira), is followed by the pronoun “ه” ‘his’, the two constitute a construct structure (مضاف ومضاف إليه); the word “وزير” ‘minister’ becomes definite by idafa (معرفة بالإضافة) and is assigned the attribute (DEF). Rule (51) moves the masculine pronoun “ه” ‘his’ before the noun “وزير” ‘minister’, since the pronoun is its specifier, and combines the noun with its adjacent adjective phrase “المولج” ‘in charge of’, which agrees with the noun in definiteness, number and gender. The resulting structure therefore consists of an adjective and the noun it describes (صفة وموصوف), which are combined to form the intermediate projection noun phrase bar (NB) by rule (51):
Rule (51): (NB,%n,ANM,MCL,SNG)(POD,%s)(JP,GEN=MCL,NUM=SNG,%x ):=(POD,%s,SFX)(NB(%n;%x,+adjc),+has_AJ1,+NB,+SEM=%n,+GEN=%n,+NUM = %n,+ANI = %n,DEF , %01 ) ;
(93) شجع التقرير الدول النامية على جعل أسواقها المالية أكثر جاذبية.
“The report encouraged the developing countries to make their financial markets more attractive.”
In the phrase “أسواقها المالية” ‘their financial markets’ in sentence (93), the non-animate plural noun “أسواق” ‘markets’ is followed by the feminine pronoun “ها” ‘her’ and the adjectival phrase “المالية” ‘financial’. Because the noun “أسواق” ‘markets’, which is indefinite (nakira), is followed by the pronoun “ها” ‘her’, the two constitute a construct structure (مضاف ومضاف إليه); the word “أسواق” ‘markets’ becomes definite by idafa (معرفة بالإضافة) and is assigned the attribute (DEF).
Rule (52) moves the feminine pronoun “ها” ‘her’ before the noun “أسواق” ‘markets’, since the pronoun is its specifier, and combines the noun with its adjacent adjective phrase “المالية” ‘financial’, which agrees with the noun in definiteness, number and gender (because “أسواق” is a non-animate plural noun, the adjective that describes it must be feminine). The structure therefore consists of an adjective and the noun it describes (صفة وموصوف), which are combined to form the intermediate projection noun phrase bar (NB) by rule (52). The adjectival phrase “المالية” ‘financial’ is the adjunct of the noun “أسواق” ‘markets’, and the resulting (NB) carries the semantic class of its head “أسواق” ‘markets’.
Rule (52): (NB,%n,NANM,PLR)(POD,%s)(JP,GEN=FEM,NUM=SNG,%x):= (POD,%s,SFX)(NB(%n; %x,+adjc ), +has_AJ1,+NB , +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n,DEF , %01 ) ;
(94) تنظم المؤتمر وزارة العدل في دولة الكويت بدعم من مؤسسة الكويت
“The ministry of justice in Kuwait is organizing the conference with support from the Kuwait Foundation”
In the phrase “دعم من مؤسسة الكويت” ‘support from the Kuwait Foundation’ in (94), the word “دعم” is an indefinite noun, so it can be combined with the following PP “من مؤسسة الكويت” ‘from the Kuwait Foundation’ to form the intermediate projection noun phrase bar (NB), which is projected directly to its maximal projection by rule (53) because there is no specifier.
Rule (53): (NB , %n ) (PP , %x ) := (NP(%n ; %x,+adjc ), +has_AJ1,+NP , +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n , %01 ) ;
Handling Noun modified by nominal and adjectival modifiers
(95) تخدم شركات الكهرباء السعودية العشر قرابة ثلاثة ملايين مشترك.
“The ten Saudi electricity companies are serving nearly three million subscribers”
The noun “شركات” ‘companies’ in sentence (95) permits modification by the following noun phrase “الكهرباء” ‘electricity’, forming the intermediate projection noun phrase bar (NB) “شركات الكهرباء” ‘electricity companies’ by rule (54). This (NB) can be combined again with the following adjective phrase “السعودية” ‘Saudi’ through rule (55) to form a bigger intermediate projection, the noun phrase double bar (NB) “شركات الكهرباء السعودية” ‘Saudi electricity companies’, which can in turn be combined with the adjacent JP “العشر” ‘ten’ by the same rule (55) to form a still bigger (NB), “شركات الكهرباء السعودية العشر” ‘the ten Saudi electricity companies’.
Rule (54): (NB,^DEF , %x ) (NP ,^DIGIT, %y ) ( %z) := (NB(%x , -NB ; %y ) , +SEM = %x,+GEN = %x,+NUM = %x,+ANI = %x , +NB , +DEF , %01 ) (%z ) ;
Rule (55): (NB , %n ,has_AJ1) (JP , def , %x ) := (NB(%n ; %x,+adjc ), +has_AJ1,+NB , +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n , %01 ) ;
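The repeated application of rule (55) in sentence (95) amounts to folding each adjacent definite adjective phrase into the growing (NB). The following Python sketch illustrates this under simplifying assumptions: constituents are plain dictionaries, the `def` flag stands for the `def` condition of the rule, and the `has_AJ1` condition is ignored. All names are hypothetical.

```python
def attach_modifiers(head_text, following):
    """Fold adjacent definite JPs into an NB, one at a time.

    `following` is the list of constituents to the right of the noun.
    Returns the NB built so far and the constituents left unconsumed.
    """
    nb = {"label": "NB", "head": head_text, "adjuncts": []}
    rest = list(following)
    # Keep applying the analogue of rule (55) while it matches.
    while rest and rest[0]["label"] == "JP" and rest[0].get("def"):
        nb["adjuncts"].append(rest.pop(0)["text"])
    return nb, rest


# Constituents following "شركات الكهرباء" in sentence (95).
following = [
    {"label": "JP", "def": True, "text": "السعودية"},
    {"label": "JP", "def": True, "text": "العشر"},
    {"label": "NP", "text": "قرابة ثلاثة ملايين مشترك"},
]
nb, rest = attach_modifiers("شركات الكهرباء", following)
```

The loop stops at the first constituent that is not a definite JP, so the following NP is left for the verb-level rules.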
Handling definite noun and its adjectival modifiers (صفة وموصوف)
A different structure is handled by the developed grammar in phrase (96), where the noun is made definite by the definite article “ال” ‘the’ and is followed by a definite adjective that agrees with the noun in number and gender. The adjective is therefore the adjunct of this noun: it is combined with the noun and projected into the intermediate projection noun phrase bar (NB) by rule (56), and this intermediate projection is combined with the specifier “ال” ‘the’ to form the maximal projection noun phrase by rule (57):
(96) العام الماضي.
“Last year”
Rule (56): (NB , %n ) (JP , GEN = %n , def , %x ) := (NB(%n ; %x,+adjc ), +has_AJ1,+NB, +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n , %01 ) ;
Rule (57): ({ART|POD}, %z ) (NB, %x ) ( %y,^NB,^NP,^HUM ) := (NP(%x , NB ; %z ) , +SEM = %x,+GEN = %x,+NUM = %x ,+ANI = %x, +NP , +DEF , %01 ) (%y );
Rule (56) states that the noun “عام” ‘year’ in phrase (96) is combined with the adjacent adjective phrase “الماضي” ‘the last’ to form the intermediate projection noun phrase bar (NB), which rule (57) then combines with the article (ART) to form the maximal projection NP. The syntactic structure is shown in figure 87 below:
Figure 87: The representation of the phrase “العام الماضي”.
(97) أرباحا صافية.
“Net profit”
In phrase (97), the non-animate indefinite plural noun “أرباح” ‘profits’ is combined with the adjacent adjective “صافية” ‘net’ to form the intermediate projection noun phrase bar (NB), which is assigned all the features of the noun, such as the semantic class, gender, number and animacy, as shown in rule (58):
Rule (58): (NB, %n,NANM,PLR,{MCL|FEM}) (JP , GEN = FEM,NUM=SNG, %x ) := (NB(%n ; %x,+adjc ),+has_AJ1, +NB, +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n , %01 ) ;
However, a different rule is needed to parse a structure such as the one in (98), where the noun is masculine singular. The developed rule is shown in (59); in this case the resulting intermediate projection noun phrase bar (NB) “مؤتمر قانوني” ‘legal conference’ is assigned all the features of the noun (MCL, SNG and NANM).
(98) مؤتمر قانوني.
“A legal conference”
Rule (59): (NB , %n,NANM,SNG,MCL) (JP , GEN = MCL,NUM=SNG, %x ) := (NB(%n ; %x,+adjc ),+has_AJ1, +NB, +SEM = %n,+GEN = %n,+NUM = %n,+ANI = %n , %01 ) ;
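The agreement conditions encoded in rules (52), (58) and (59) can be summarized in a short Python sketch. It is a simplification made for illustration: the fallback case (adjective copies the noun's gender and number) generalizes rules (51) and (59) beyond the exact feature combinations those rules list, and the dictionary-based representation is an assumption.

```python
def expected_adjective_agreement(noun):
    """Return the (GEN, NUM) an adjective needs in order to modify `noun`."""
    if noun["ANI"] == "NANM" and noun["NUM"] == "PLR":
        # Non-animate plurals take a feminine singular adjective,
        # as in rules (52) and (58).
        return ("FEM", "SNG")
    # Assumed generalization of rules (51) and (59): plain agreement.
    return (noun["GEN"], noun["NUM"])


def can_attach(noun, adjective):
    """True when the adjective satisfies the agreement condition."""
    return (adjective["GEN"], adjective["NUM"]) == expected_adjective_agreement(noun)


profits = {"GEN": "MCL", "NUM": "PLR", "ANI": "NANM"}      # أرباح, phrase (97)
net = {"GEN": "FEM", "NUM": "SNG"}                          # صافية
conference = {"GEN": "MCL", "NUM": "SNG", "ANI": "NANM"}   # مؤتمر, phrase (98)
legal = {"GEN": "MCL", "NUM": "SNG"}                        # قانوني
```

With these inputs, `can_attach(profits, net)` and `can_attach(conference, legal)` both hold, while a masculine singular adjective is rejected for the non-animate plural noun.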
Handling Apposition
(99) شجع يلتسن هذه الاشاعات.
“Yeltsin encouraged these rumors”
(100) نظم أحمد نظيف رئيس الوزراء مؤتمرا.
“The prime minister Ahmed Nazif has organized a conference.”
The developed grammar recognizes the constituent “يلتسن” ‘Yeltsin’ in sentence (99) as a noun phrase by rule (60), because this word is tagged as a proper noun (PPN). As mentioned in the manual analysis, proper nouns have no specifiers or complements, so they are projected directly to their maximal projections (NPs) by rule (60). The constructed NP “يلتسن” ‘Yeltsin’ does not permit modification by the following determiner phrase “هذه الاشاعات” ‘these rumors’. This is not the case for the construct in sentence (100), where the constructed NP “أحمد نظيف” ‘Ahmed Nazif’ permits modification by the following NP “رئيس الوزراء” ‘prime minister’. Processing sentence (100) therefore requires handling an important phenomenon: apposition. Apposition is a grammatical construction in which two elements, normally noun phrases, are placed side by side, with one element identifying the other. The two elements are said to be in apposition. The developed grammar handles this by identifying the semantic classification of the proper name that constitutes the first NP in sentence (100), which is HUM (human), and then identifying the semantic classification of the following NP “رئيس الوزراء” ‘prime minister’, which is also HUM (human), since the noun “رئيس” ‘prime’ is stored in the dictionary as follows:
[رئيس]{2022984}"110468559"(LEX=N,POS=NOU,MOR=WFO,LST=WRD,GEN=MCL,NUM=SNG,PAR=M619,FRA=Y0,ABN=CCT,ALY=ALI,ANI=ANM,CAR=CTB,SEM=HUM,SFR=K0);
Therefore, when “رئيس” ‘prime’ is combined with “الوزراء” ‘ministers’, the resulting construct has the semantic classification HUM. The developed grammar can thus predict that the phrase “رئيس الوزراء” ‘prime minister’ can stand in place of the NP “أحمد نظيف” ‘Ahmed Nazif’, and the two are combined to form a bigger NP (one constituent), “أحمد نظيف رئيس الوزراء” ‘Ahmed Nazif the prime minister’, by rule (61).
Rule (60): (%x,NB,{PPN|MTW},PROJ):= (NP(%x , -NB) , +SEM = %x,+GEN = %x,+NUM = %x ,+ANI = %x, +NP , +DFE,DEFN, %01);
Rule (61): (NP,DEFN,%x,SEM=HUM ) (NP, %y,SEM=%x) (%z) := (NP(%x; %y) , +SEM = %x,+GEN = %x,+NUM = %x,+ANI = %x , +NP , %01 ) (%z ) ;
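The condition that rule (61) places on apposition (a definite human NP followed by another NP of the same semantic class) can be sketched as follows. This is an illustrative Python analogue with hypothetical names; constituents are plain dictionaries, and the feature values given to the determiner phrase in sentence (99) are assumed.

```python
def merge_apposition(first, second):
    """Analogue of rule (61): merge two adjacent NPs that share SEM=HUM.

    Returns the merged NP, or None when the condition is not met.
    """
    if (
        first["label"] == "NP"
        and second["label"] == "NP"
        and first.get("DEFN")
        and first.get("SEM") == "HUM"
        and second.get("SEM") == first["SEM"]
    ):
        return {"label": "NP", "SEM": first["SEM"], "children": [first, second]}
    return None


nazif = {"label": "NP", "DEFN": True, "SEM": "HUM", "text": "أحمد نظيف"}
prime_minister = {"label": "NP", "SEM": "HUM", "text": "رئيس الوزراء"}
yeltsin = {"label": "NP", "DEFN": True, "SEM": "HUM", "text": "يلتسن"}
# Assumption: the DP in sentence (99) does not carry SEM=HUM.
rumors = {"label": "DP", "SEM": "ABS", "text": "هذه الاشاعات"}

merged = merge_apposition(nazif, prime_minister)
not_merged = merge_apposition(yeltsin, rumors)
```

Under these assumptions the pair from sentence (100) is merged into one NP, while the pair from sentence (99) is left as two separate constituents.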
The projection of the Verb Phrase
After detecting the boundaries of the nominal chunks and of the other constructions that surround the defined predicates (verbs) in the corpus sentences, the grammar must determine the arguments of the verb and decide which of the projected phrases can be an argument of the specified predicate. The behavior of the verb is used as the main cue. Indirect transitive verbs are assigned the feature (TSTI) in the dictionary. If a verb with this feature is followed by a constructed prepositional phrase (PP), as in sentence (101), the (PP) is marked as the complement and identified as the argument of the verb (Arg0) by rule (62), and it is combined with the verb to form the intermediate projection verb phrase (VB) by rule (63). This is not the case for verbs assigned the direct transitive feature (TSTD), such as the verb “شجع” ‘encourage’ in sentence (102); here the prepositional phrase “في البلدين” ‘in the two countries’ is an adjunct of the verb by rule (64):
(101) الذئاب الوحشية تنقض على الحضارة.
“The savage wolves leap upon the civilization.”
(102) اللجنة شجعت في البلدين جهود القطاع الخاص.
“The committee encouraged the private sector efforts in the two countries.”
Rule (62): (NP,MCL,PLR,NANM,%x)(VB,TSTI,FEM,%v)(PP,%z):= (NP,+subj,%x)(VB,TSTI ,%v ) (PP,+ Arg0 , %z);
Rule (62) applies when the projected NP that precedes the TSTI verb is MCL, PLR and NANM and the verb is FEM; in this case the verb agrees with the NP, since a non-animate plural takes feminine singular agreement. Agreement between the verb and the NP is used as a cue for distinguishing the arguments of the predicate from the non-arguments: the NP is marked as the subject of the verb, and the prepositional phrase (PP) that follows the verb is marked as its argument (Arg0).
Rule (63): (VB,TSTI ,%v ) (PP, %comp,Arg0 ):= (VB(%v ; %comp , +comp ) , +verb = %v , +VB , %01 ) ;
Rule (63) states that the projected prepositional phrase (PP) is combined with the verb to form the intermediate projection verb phrase (VB); the (PP) is thus a sister node of the verb. The verb “انقض” ‘leap upon’ is subcategorized as permitting a prepositional phrase as its complement.
Rule (64): (VB,TSTD,has_comp,%v)(PP,%x):=(VB(%v ; %x,+adjunct ),+has_AJ1,+VB,%01) ;
Because the arguments of Arabic verbs do not have fixed positions in the sentence, their automatic detection is a challenge in NLP. Hence, during the analysis of the corpus, many rules were added to capture the different positions in which the arguments of a given verb occur. In sentence (103), the complement is preceded by two constituents. The first is the subject NP “نحن” ‘we’, which is determined from the features assigned to the verb (NUM=PLR and PER=1PP); the second is the noun phrase (NP) “عام 1996” ‘year 1996’, which has the semantic classification time (TIM) and is considered an adjunct to the verb.
(103) حققنا عام 1996 أرباحا كثيرة.
“We gained considerable profits in year 1996”
Rule (65): (V,TSTD,%v,NUM=PLR,PER=1PP)(NP,TIM,%n) (NP,%n2,{FEM|MCL},NANM,{PLR|PLRT}):=(NP("نحن",%S),+subj,+NP,+SEM=%S,+GEN=%S,+NUM=%S,+ANI= %S,+moved )(NP ,TIM, %n )( %v,"نا":"" )(NP(%n2),+NP ,+SEM = %n2,+GEN = %n2,+NUM = %n2 ,+ANI = %n2,+comp,Arg0 )(VB,FEM , %n3 ) ;
Rule (66): (VB,TSTD,%v)(NP,Arg0,%comp)(STAIL):=(VB(%v;%comp , +comp ),+has_comp , +verb = %v , +VB , %01 ) ;
Rule (67): (NP,%x,TIM)(VB,has_comp,%v):= (VB(%v ; %x,+adjunct ) ,+has_AJ1, +VB , %01 ) ; Rule (68): (%x , {NP|DP},subj,moved) (VB , %v ) := (VP(%v ; %x ) , +verb = %v , +VP , %01 ) ;
Rule (65) states that if a verb with the features TSTD, PLR and 1PP, such as “حققنا” ‘we gained’, is followed by two noun phrases, here “عام 1996” ‘year 1996’ and “أرباحا كثيرة” ‘considerable profits’, then a first person plural subject “نحن” ‘we’ is added to the structure, because the form of the verb implies an implicit subject. This pronoun agrees with the verb in gender and number, so it functions as the subject. The first NP “عام 1996” ‘year 1996’ is treated as an adjunct by rule (65) because it carries the semantic feature time (TIM); the word “عام” ‘year’ is stored in the dictionary as follows:
[عام]{1353275}"115203791"(LEX=N,POS=NOU,LST=WRD,GEN=MCL,NUM=SNG,PAR=M558,FRA=Y0,ABN=ABT,ANI=NANM,SEM=TIM);
The verb “حقق” ‘gain’ is assigned the direct transitive feature (TSTD), which is stored in the dictionary. The remaining NP “أرباحا كثيرة” ‘considerable profits’ is the sister node of the verb “حقق” ‘gain’ in sentence (103) and is combined with it to form the intermediate projection verb phrase (VB) “حقق أرباحا كثيرة” ‘gained considerable profits’ by rule (66). The projected VB is combined with the adjunct “عام 1996” ‘year 1996’ to form a larger VB by rule (67). The projected VB is then combined with the subject “نحن” ‘we’, which carries the feature (subj), to form the maximal projection verb phrase (VP) by rule (68).
Finally, the intermediate projection V bar (VB) can be projected when the verb is a sister node to any constituent (PP, NP, DP, non-finite verb clause, and so on), and the projected V bar (VB) can be combined with another phrasal category. This category can be a PP if the verb permits two objects (TST2 verbs); in this case the resulting phrase is a V double bar (VB).
On the other hand, the combined V bar (VB) can be a sister node to a PP that is an adjunct to the verb rather than its second object; this is the case for direct object verbs (TSTD) and indirect object verbs (TSTI). Table 16 shows the kinds of phrases that can be the first complement, the second complement or the adjunct in the VP structure:
Table 16: The structure of the verb phrase
VP position            Category      Structure
Specifier (Subject)    NP            Proper noun; N; DET+N; N+N; N+PRON; N+NP; N+PP; N+JP; N+JP+JP+JP; NP+CONJ+NP; Proper noun+NP
Head                   V
V double bar (VB)      PP, AP, JP
Complement             NP            Proper noun; N; DET+N; N+N; N+PRON; N+NP; N+PP; N+JP; N+JP+JP+JP; NP+CONJ+NP
Complement             PP            P+NP
Complement             CP            أن + NP
Complement             DP            DET+NP

5.3.2.5 Detecting the Verb Subcategorization Frames
Verb subcategorization patterns are then extracted from the sub-analyses of the parsed sentences that begin or end at the boundaries of the specified predicates. Once the sentences have been processed into their constituents, for example [NP V NP], [V NP NP], [V NP PP] or [V NP AP PP NP], the predicate arguments are easy to identify. In this thesis, the syntactic annotation of the selected corpus makes it possible to identify verb arguments in different structures, as in rule (69).
Rule (69): (V , TSTI,SNG,FEM , %v ) (NP , SNG,FEM , ^subj , %n ) (PP , %n2 ) := (+subj , %n ) (+V , +Done , %v ) (%n2,Arg0 ) ;
Rule (69) states that if the structure is [V NP PP], the verb is an indirect object verb (TSTI) and the following NP agrees with the verb in gender and number, then this NP is the subject of the verb. It is therefore moved in front of the verb, giving [NP V PP], so that it occupies the specifier position in the X-bar schema. The verb is now followed directly by a prepositional phrase (PP), and since the verb is TSTI (its complement is an indirect object), the PP with the index %n2 is marked as Arg0 (the first argument of the verb). Rule (70) is responsible for marking the argument of the verb in the structure [V NP AP NP] for sentence (104).
(104) تخدم المجمع حاليا شبكة سكك حديد تؤمن نقل الفحم الحجري
“Currently, the railway network which covers the transfer of coal is serving the convention.”
Rule (70): (V,%v,TSTD,FEM,SNG ) (NP , ^subj,MCL,%n) (AP , %n3 ) (NP , %n2,GEN=%v,NUM=%v)(STAIL) := (+subj , %n2,+moved,GEN=%v) (AP , %n3 )( %v ) (NP , %n,+comp,Arg0 )(STAIL);
Rule (70) states that if the predicate is a TSTD verb followed by an NP that does not agree with it in gender and number (the NP “المجمع” ‘assembly’ is masculine (MCL) and the verb “تخدم” ‘serve’ is feminine), this NP is not the subject of the verb. The NP is followed by the AP “حاليا” ‘currently’ with the index %n3. The AP is in turn followed by the NP “شبكة سكك حديد تؤمن نقل الفحم الحجري” ‘the railway network that secures the transport of coal’ with the index %n2, which agrees in number and gender with the verb “تخدم” ‘serve’, so the rule marks it as the subject. The NP “المجمع” ‘assembly’, with the index %n, is marked as Arg0. The developed grammar has thus identified the NP “المجمع” ‘assembly’ in sentence (104) as the argument (object) of the predicate “تخدم”, although it precedes the subject phrase, which contradicts the configurational information.
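The agreement cue used by rules (69) and (70) can be illustrated with a small Python sketch. It is a deliberately simplified analogue, not the grammar itself: it assumes one subject per clause, treats every non-agreeing NP as Arg0 and leaves non-NP constituents unlabelled. The function name and the dictionary representation are hypothetical.

```python
def assign_roles(verb, constituents):
    """Label post-verbal constituents using gender/number agreement.

    The first NP that agrees with the verb becomes the subject;
    other NPs are marked Arg0; non-NP constituents get no role.
    """
    roles = []
    subject_found = False
    for c in constituents:
        if c["label"] != "NP":
            roles.append((c["text"], None))
        elif (not subject_found
              and c["GEN"] == verb["GEN"]
              and c["NUM"] == verb["NUM"]):
            roles.append((c["text"], "subj"))
            subject_found = True
        else:
            roles.append((c["text"], "Arg0"))
    return roles


# Sentence (104): [V NP AP NP], with the verb feminine singular.
verb = {"TRA": "TSTD", "GEN": "FEM", "NUM": "SNG"}
constituents = [
    {"label": "NP", "GEN": "MCL", "NUM": "SNG", "text": "المجمع"},
    {"label": "AP", "text": "حاليا"},
    {"label": "NP", "GEN": "FEM", "NUM": "SNG", "text": "شبكة سكك حديد"},
]
roles = assign_roles(verb, constituents)
```

As in the discussion of rule (70), the masculine NP that directly follows the verb ends up as Arg0 and the later feminine NP as the subject, regardless of their linear order.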
Once the arguments of the predicates have been identified, the subcategorization frames of each predicate can be detected. Moreover, the developed grammar can identify the subcategorization frames of a verb even when the same verb is used in different structures with two different senses. For example, the verb “اعتمد” has two different trees with two different structures, [V NP NP] and [V NP PP]. In this situation the rules examine the transitivity of the verb “اعتمد”, which is stored in the dictionary with two senses, each assigned a different transitivity type (TSTD and TSTI). The first sense, which is TSTI, is represented as follows:
[اعتمد]{2027964}"202711987"(LEX=V,POS=VER,LST=WRD,GEN=MCL,NUM=SNG,PER=3PS,ATE=PAS,VOI=ACV,TRA=TSTI,PAR=M119,FRA=Y17,SEM=STT);
The second sense, which is TSTD, is represented as follows:
[اعتمد]{405135}"202346895"(LEX=V,POS=VER,LST=WRD,GEN=MCL,NUM=SNG,PER=3PS,ATE=PAS,VOI=ACV,TRA=TSTD,PAR=M118,FRA=Y0,SEM=POV);
If the verb has been tokenized with the wrong sense, for example as TSTI in the structure [V NP NP], rule (71) can backtrack on the verb by examining the structure in which it occurs.
Rule (71): (V,TSTI,%x)(NP,%y)({NP|JP},%z)({^PP|STAIL},%w):=(?TSTD,?[%x])(%y)(%z)(%w);
Rule (71) states that when a verb with the transitivity TSTI occurs in the structure [V NP NP], the rules backtrack on the verb using the feature “?” and select the correct sense with the correct transitivity.
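The effect of rule (71) can be sketched in Python as a structural check followed by a second dictionary lookup. This is only an analogue of the “?” backtracking feature; the function names are hypothetical, and the small `SENSES` table reproduces just the TRA, FRA and SEM values of the two entries for “اعتمد” quoted above.

```python
# The two dictionary senses of "اعتمد" (TRA, FRA and SEM values from the entries above).
SENSES = {
    "اعتمد": [
        {"TRA": "TSTI", "FRA": "Y17", "SEM": "STT"},
        {"TRA": "TSTD", "FRA": "Y0", "SEM": "POV"},
    ]
}


def needs_backtrack(transitivity, following):
    """Mirror the left-hand side of rule (71).

    `following` holds the category labels after the verb, with "STAIL"
    marking the end of the sentence.
    """
    if transitivity != "TSTI" or len(following) < 3:
        return False
    first, second, third = following[:3]
    return first == "NP" and second in ("NP", "JP") and third != "PP"


def select_sense(lemma, current, following):
    """Return the TSTD sense when the structure contradicts TSTI."""
    if needs_backtrack(current["TRA"], following):
        for sense in SENSES.get(lemma, []):
            if sense["TRA"] == "TSTD":
                return sense
    return current


wrong = SENSES["اعتمد"][0]
fixed = select_sense("اعتمد", wrong, ["NP", "NP", "STAIL"])
kept = select_sense("اعتمد", wrong, ["NP", "PP", "STAIL"])
```

In the [V NP NP] case the TSTD sense is selected, whereas in the [V NP PP] case the TSTI sense is left unchanged.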
5.4 A walk through examples from the corpus
In this section the researcher presents the full formal representation of a sentence from the selected corpus that includes the indirect object transitive verb “وافق” ‘agree’.
(105) وافق شارون على الخطة المصرية.
“Sharon agreed to the Egyptian plan”.
To parse sentence (105), “وافق شارون على الخطة المصرية”, the sentence must first be tokenized according to the dictionary. It is tokenized into the following pattern:
[وافق][ ][شارون][ ][على الخط][ة][ ][ال][مصرية]
That is: “وافق”, space, “شارون”, space, “على الخط”, “ة” (تاء مربوطة), space, the definite article, and “مصرية”. Each node is assigned the appropriate tag: “وافق” is VER (verb), the space is BLK (blank), “شارون” is PPN (proper noun), “على الخط” is wrongly tokenized and hence assigned the tag ADJ (adjective), the definite article “ال” is ART (article), and finally the word “مصرية” is ADJ (adjective), as shown in Figure 88, which represents the automatic output of this stage.
Figure 88: The tokenization and tagging stage
The adjective [على الخط] should be retokenized as [على], [ال], [خط]; this is the role of the disambiguation rules. A rule is needed to block the sequence of the adjective and the “تاء مربوطة” when it is preceded by the masculine noun (MCL) “شارون”. Such a rule has been added in (72). The rule states that the adjective is blocked; blocking is expressed in the rule by the symbol (=0).
Rule (72): ({J|V}, %01) (^BLK , ^STAIL, ^ACC) = 0;
So, the output would be as shown in figure 89:
Blocked at segment: [وافق][ ][شارون][ ][على الخط][ة][ ][ال][مصرية]
Position index: 11
Re-tokenizing: "على الخطة"
Position index: 11
Pattern: [وافق][ ][شارون][ ][على][ ][ال][خطة][ ][ال][مصرية]
Current State Pattern: [وافق][ ][شارون][ ][على][ ][ال][خطة][ ][ال][مصرية]
"وافق" - [وافق]{160695}"202594674"(LEMMA=وافق, BF=وافق, LEX=V, POS=VER, LST=WRD, GEN=MCL, NUM=SNG, PER=3PS, ATE=PAS, VOI=ACV, TRA=TSTI, PAR=M242, FRA=Y17, SEM=SOV);
" " - [ ]{-1}""(PUT=BLK);
"شارون" - [شارون]{8506}"115383488"(LEMMA=شارون, BF=شارون, LEX=N, POS=PPN, LST=MTW, GEN=MCL, NUM=SNGT, PAR=M0, FRA=Y0);
" " - [ ]{-1}""(PUT=BLK);
"على" - [على]{2991}""(LEMMA=على, BF=على, LEX=P, POS=PRE, LST=WRD, PAR=M0, FRA=Y0, att=@on, rel=PLC);
" " - [ ]{-1}""(PUT=BLK);
"ال" - [ال]{2418}""(LEMMA=ال, BF=ال, LEX=D, POS=ART, LST=WRD, PAR=M0, FRA=Y0, att=@def);
"خطة" - [خطة]{39202}"105898568"(LEMMA=خطة, BF=خطة, LEX=N, POS=NOU, LST=WRD, GEN=FEM, NUM=SNG, PAR=M1, FRA=Y0, ABN=ABT, ALY=ALI, ANI=NANM, CAR=CTB, SEM=CGN, SFR=K0);
" " - [ ]{-1}""(PUT=BLK);
"ال" - [ال]{2418}""(LEMMA=ال, BF=ال, LEX=D, POS=ART, LST=WRD, PAR=M0, FRA=Y0, att=@def);
"مصرية" - [مصرية]{44872}"302971469"(LEMMA=مصري, BF=مصري, LEX=J, POS=ADJ, LST=WRD, GEN=FEM, NUM=SNG, DEG=PST, PAR=M466, FRA=Y0);
Figure 89: Re-tokenization of the wrongly tokenized adjective
The output of the tokenization stage is as shown in figure 90:
Pattern: [وافق][ ][شارون][ ][على][ ][ال][خطة][ ][ال][مصرية]
Figure 90: The output after the tokenization stage
The stage that follows tokenization is the transformation stage. In this stage, the researcher has built a set of T-rules to transform the natural language input in figure 91 into a parsed tree.
Figure 91: The natural language input before starting the parsing
The blank spaces must be deleted from the input in figure (91) to prepare the sentence for the parsing step. Rule (73) is used for this purpose: it states that if there is a blank space beside a word, the blank is deleted, as in figure (92), where the blank space between the verb “وافق” and “شارون” is suppressed. The rule is applied recursively, so the blank space beside each node is deleted, and the final output after deleting all blank spaces from the input is as in figure 93.
Rule (73): (Word , %y , ^blk ) (BLK , %02 ) := (%y , +blk);
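The behaviour of rule (73) over the whole token list can be sketched in Python. The sketch is an assumption-laden analogue: tokens are dictionaries with a `tag` key, every non-blank token is treated as a `Word`, and the `blk` flag stands for the `+blk` feature that the rule adds.

```python
def delete_blanks(tokens):
    """Drop each blank that follows an unmarked word and mark that word.

    Mirrors rule (73): (Word, ^blk)(BLK) := (Word, +blk).
    """
    out = []
    for tok in tokens:
        prev = out[-1] if out else None
        if (tok["tag"] == "BLK" and prev is not None
                and prev["tag"] != "BLK" and not prev.get("blk")):
            prev["blk"] = True   # the word absorbs the blank
            continue
        out.append(dict(tok))
    return out


# Token pattern of sentence (105) after re-tokenization.
tokens = [
    {"text": "وافق", "tag": "VER"}, {"text": " ", "tag": "BLK"},
    {"text": "شارون", "tag": "PPN"}, {"text": " ", "tag": "BLK"},
    {"text": "على", "tag": "PRE"}, {"text": " ", "tag": "BLK"},
    {"text": "ال", "tag": "ART"}, {"text": "خطة", "tag": "NOU"},
    {"text": " ", "tag": "BLK"},
    {"text": "ال", "tag": "ART"}, {"text": "مصرية", "tag": "ADJ"},
]
cleaned = delete_blanks(tokens)
```

For the eleven tokens of sentence (105), seven word tokens remain, and those that were followed by a blank carry the `blk` mark.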
Figure 92: Deleting the space after the word “وافق”.
["وافق":01]["شارون":03]["على":05]["ال":07]["خطة":08]["ال":10]["مصرية":11]
Figure 93: Deleting the space in the NL input.
In this phase, small constituents or trees are constructed for the small phrases in the sentence (usually noun phrases) and are then combined gradually into a bigger tree until the whole sentence is analyzed. First, the noun “خطة” ‘plan’ is projected to the intermediate constituent (NB), since it is the head of a noun phrase, as in rule (74). Then the adjective “مصرية” ‘Egyptian’ is projected to the intermediate constituent (JB) as in rule (75), and this intermediate constituent is linked to the definite article “ال” to form the maximal projection adjective phrase (JP) as in rule (76). Once the adjective phrase has been projected to its maximal projection, it leaves the list structure and becomes part of the syntactic tree.
Rule (74): (N , Word , ^NB , ^PROJ , %x ) := (%x , +NB , +PROJ ) ;
Rule (75): (J , ^PTP , ^proj , %x ) := (%x , +JB , +proj );
Rule (76): (ART,%y)(JB , %x , ^pro ) := (JP("ال", %y ;%x , +pro) , +JP , +GEN = %x , +DEF = %x , %01 ) ;
The constructed (JP) is linked to the intermediate constituent (NB) “خطة” ‘plan’ to build a bigger (NB) by rule (77), as shown in figure (94). The constructed (NB) “خطة مصرية” ‘Egyptian plan’ is then combined with the specifier “ال” to form the maximal projection (NP) by rule (78), as shown in figure (95).
Rule (77): (NB, %n ) (JP , GEN = %n , def , %adjc ) := (NB(%n ; %adjc ) , +rel = mod , +NB , +GEN = %n , %01 ) ;
Rule (78): (ART, %z ) NB(NB(%x ; %y),+NB, %01 ):= (NP(%x , NB ; %z ), NP, %01 ) (%y ) ;
String View:
#L("وافق":01,"شارون":03)
#L("شارون":03,"على":05.@on)
#L("على":05.@on,"ال":07.@def)
#L("ال":07.@def,:02)
NB:02("خطة":08,:01)
JP:01("مصرية":11,"ال":13)
Figure 94: Building the intermediate constituent “خطة مصرية” (NB)
String View:
#L("وافق":01,"شارون":03)
#L("شارون":03,"على":05.@on)
#L("على":05.@on,:03)
NP:03(:02,"ال":07.@def)
NB:02("خطة":08,:01)
JP:01("مصرية":11,"":13)
Figure 95: Building the noun phrase “الخطة المصرية” (NP)
The (NP) “الخطة المصرية” ‘the Egyptian plan’ is preceded by the preposition “على” ‘on’ in the Arabic input sentence, so it is linked with this preposition to form the intermediate projection (PB) by rule (79). There are no other modifiers for the constructed (PB) “على الخطة المصرية” ‘on the Egyptian plan’, so it is projected to the corresponding maximal projection (PP) by rule (80), as shown in figure (96).
Rule (79): (P , %p ) (NP , %adjc ) := (PB(%p ; %adjc ) , +PB , %01);
Rule (80): (PB , %x , ^pro ) := (PP(%x , +pro ; +e , %y ) , +PP , %01 ) ;
#L("وافق":01,"شارون":03)
#L("شارون":03,:05)
PP:05(:04,"":18)
PB:04("على":05.@on,:03)
NP:03(:02,"ال":07.@def)
NB:02("خطة":08,:01)
JP:01("مصرية":11,"":13)
---------------------LIST
["وافق":01]["شارون":03][:05] {}
Figure 96: Building the prepositional phrase “على الخطة المصرية”.
The remaining nodes (processing units) that are not yet linked to the tree structure are the proper name “شارون” ‘Sharon’ and the verb “وافق” ‘agree’, as shown in figure (96). “شارون” ‘Sharon’ is projected to the maximal projection (NP), as discussed before, because it has no specifier or complement. The (PP) is marked as Arg0 by rule (81) because the verb carries the feature TSTI. The (PP) is linked to the verb “وافق” ‘agree’ to form the intermediate constituent (VB) by rule (82); this intermediate constituent (VB) is then linked to the maximal projection (NP) “شارون” ‘Sharon’ to form the maximal projection (VP) by rule (83).
Rule (81): (V , TSTI , Y1, %v ) (NP , ^subj , %x ) (PP , %y ) := (+subj , %x ) (%v , V , TSTI) (%y , +Arg0 ) ;
Rule (82): (V, TSTI , %v ) (PP , %comp ) := (VB(%v ; %comp , +comp ) , +verb = %v , +VB , %01 ) ;
Rule (83): (%x , NP , subj ) (VB , %v ) := (VP(%v ; %x ) ,%01 );
UW View:
VP:08(:07,:06)
VB:07(وافق:01,:05)
PP:05(:04,"":18)
PB:04(على:05.@on,:03)
NP:03(:02,ال:07.@def)
NB:02(خطة:08,:01)
JP:01(مصرية:11,"ال":13)
NP:06(شارون:03,"":20)
Figure 97: The automatic syntactic representation of sentence in (105)
----------------------
Scope Reference: :05
Current NL string: ""
Original NL string: []
Attributes: PP, SCOPE, Arg0, comp
Parent scope: :07
----------------------
Figure 98: The PP which is the complement of the verb “وافق” ‘agree’
The (PP) “على الخطة المصرية” ‘on the Egyptian plan’ is the complement (Arg0), that is, the argument, of the verb “وافق” ‘agree’, which is an indirect transitive verb (TSTI), as shown in figure (98). The first NP “شارون” ‘Sharon’ is the subject of the verb (V) “وافق” ‘agree’. The final tree for sentence (105) is shown in figure (97). The subcategorization frame of the verb “وافق” ‘agree’ was extracted automatically as: [V, + [ـــــ PP]].
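Once the constituents of the VP carry their role marks, reading off a frame such as the one above is a matter of collecting the categories marked Arg0, Arg1 and so on. The Python sketch below shows one way this could be done; the function name, the pair-based input format and the ASCII placeholder `__` for the verb slot are assumptions made for illustration, not the thesis's extraction procedure.

```python
def subcat_frame(constituents):
    """Build a frame string from (category, role) pairs of a parsed VP.

    Only constituents whose role starts with "Arg" enter the frame;
    subjects and adjuncts are ignored. Arguments are ordered by role.
    """
    args = sorted(
        (role, category)
        for category, role in constituents
        if role is not None and role.startswith("Arg")
    )
    slots = " ".join(category for _, category in args)
    return "[V, + [__ " + slots + "]]" if slots else "[V, + [__]]"


# Sentence (105): subject NP, verb, PP marked Arg0.
frame_105 = subcat_frame([("NP", "subj"), ("PP", "Arg0")])
# A hypothetical ditransitive case with an NP and a PP argument.
frame_ditransitive = subcat_frame([("NP", "subj"), ("PP", "Arg1"), ("NP", "Arg0")])
```

For sentence (105) this yields a frame with a single PP slot, matching the frame reported for “وافق”.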
CHAPTER 6 RESULTS AND DISCUSSION
Chapter 6: Results and Discussion
The two preceding chapters presented the linguistic description of selected sentences for an identified set of verbs from different transitivity subcategories, together with a formalization of the linguistic rules using the IAN tool. The formalized rules were developed to parse the sentences, identify the boundaries of subjects and objects, and decide which constituents are arguments of the specified predicates and which are adjuncts. This chapter aims to demonstrate how effective the developed rules and the adopted linguistic theory (X-bar theory) are, and to evaluate the extracted subcategorization frame classes, which depend on the accuracy of the parser and on its capacity to identify NP boundaries and to distinguish arguments from adjuncts. The rules have exhibited a considerable degree of observational adequacy, as illustrated by the examples presented. The corpus consists of 600 sentences for sixty verbs, with ten sentences selected for each verb. The verbs were selected from different transitivity types: twenty ditransitive verbs, twenty indirect object verbs and twenty direct object verbs. The parser comprises 401 rules: 101 normalization rules, 100 tokenization and disambiguation rules and 200 transformation rules. To test the parser’s efficiency, each of the structures described in the preceding chapters is examined here to assess the performance and accuracy of the parser.
6.1 Results
In this section, the researcher tests the adequacy of the formal analysis made earlier and thereby validates the parser. Testing each structure of the parsed VPs is a valid criterion for judging the parser and its actual performance. A number of considerations need to be taken into account during the evaluation process. The following criteria test the adequacy of the parsed tree, on which the accuracy of the extracted subcategorization frames depends:
• The correct segmentation of the sentences into tokens (tokenization and disambiguation stage).
• The correct detection of the boundaries between phrases.
• Correctly analyzed VPs in terms of constituency grammar: the parser should identify which constituents are arguments of the verb and mark them with the labels Arg0 (the first argument of the verb) and Arg1 (the second argument of the verb). The first argument of the verb should be represented as a sister node of the verb in the intermediate projection verb phrase bar (VB); if there is a second argument, it should be a sister node of that intermediate projection (VB).
161
Examples of the different structures of the subject and the object that were addressed in the two preceding chapters will be tested to evaluate the correct detection of the boundaries:
The subject is an NP consisting of (DET+N) and the object is an NP consisting of (N+NP+PP) and adjunct (PP)
Input
(106) شجعت اللجنة جهود القطاع الخاص في البلدين لتأسيس شركات مشتركة.
‘The committee encouraged the private sector efforts in the two countries to establish joint companies’
The parsed tree is:
Figure 99: The syntactic structure of (106)
Figure (99) shows the syntactic structure of the sentence “شجعت اللجنة جهود القطاع الخاص في البلدين لتأسيس شركات مشتركة”. The sentence is analyzed as a VP with the index “:13”, which consists of the subject “اللجنة” (index “:05”) and the VB with the index “:12”. This VB contains a smaller VB with the index “:11”, which is combined with the adjunct PP “لتأسيس شركات مشتركة” (index “:10”). The smaller VB “:11” consists of the main verb “شجعت” (index “:01”) and the complement NP “جهود القطاع الخاص في البلدين” (index “:0C”). The complement is shown with its detailed features in figure (100).
Figure 100: The detailed description of the scope :0C which is the complement of the main verb in sentence (106).
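The structure described for sentence (106) can be written down as a small tree, from which the frame follows mechanically. The sketch below is only an illustration under stated assumptions: the `Node` class and the functions `build_106`, `classify` and `frame` are hypothetical names introduced here and are not part of the IAN tool or of the rules developed in this thesis. Only the indices and category labels are taken from figure (99).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    index: str                      # scope index as printed in the figure, e.g. ":13"
    label: str                      # category label: "VP", "VB", "V", "NP" or "PP"
    children: List["Node"] = field(default_factory=list)


def build_106() -> Node:
    """Encode the tree described for sentence (106) (hypothetical encoding)."""
    verb = Node(":01", "V")
    complement = Node(":0C", "NP")
    adjunct = Node(":10", "PP")
    subject = Node(":05", "NP")
    inner_vb = Node(":11", "VB", [verb, complement])
    outer_vb = Node(":12", "VB", [inner_vb, adjunct])
    return Node(":13", "VP", [subject, outer_vb])


def classify(node: Node,
             out: Optional[Dict[str, List[Node]]] = None) -> Dict[str, List[Node]]:
    """Sort the dependents of the verb by their position in the X-bar tree."""
    if out is None:
        out = {"spec": [], "comp": [], "adjt": []}
    labels = [child.label for child in node.children]
    for child in node.children:
        if child.label == "VB":
            classify(child, out)        # descend into the verbal projection
        elif child.label == "V":
            continue                    # the head itself is not a dependent
        elif node.label == "VP":
            out["spec"].append(child)   # sister of VB under VP: specifier (subject)
        elif "V" in labels:
            out["comp"].append(child)   # sister of the verb: complement (argument)
        else:
            out["adjt"].append(child)   # sister of a VB inside a VB: adjunct
    return out


def frame(tree: Node) -> str:
    """Build a frame string from the complements only; adjuncts are left out."""
    comps = classify(tree)["comp"]
    return "V, + [" + "+".join(c.label for c in comps) + "]"


print(frame(build_106()))  # prints: V, + [NP]
```

Because the adjunct PP “:10” is a sister of a VB rather than of the verb, it does not enter the frame, which is the argument/adjunct distinction the evaluation criteria rely on.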
The subject is an NP consisting of (DET+NOUN) and the object is an NP consisting of (N+NP+PP) and has no adjunct
Input
(107) سيحقق المشروع زيادة انتاج الشركة من منجم الشيدية الى اكثر من [21]
‘The project would increase the company's production of Eshidiya mine to more than.’
[21] This part of the sentence will be omitted in the normalization module before the parsing.
The parsed tree is:
Figure 101: The syntactic structure of sentence in (107)
Figure (101) shows the syntactic structure of “سيحقق المشروع زيادة انتاج الشركة من منجم الشيدية”. It is analyzed as a VP with the index “:0D”, which consists of the specifier “المشروع” (index “:02”) and the VB with the index “:0C”. The VB contains the main verb “سيحقق” (index “:01”) and the complement NP “زيادة انتاج الشركة من منجم الشيدية” (index “:0B”). The complement is shown with its detailed features in figure (102).
Figure 102: The detailed description of the scope :0B which is the complement of the main verb in sentence (107).
The subject is an NP consisting of (N+JP+JP) and the object is an NP (N+JP)
Input
(108) حققت الموازنات المستقلة الأخري زيادات طفيفة.
‘Other independent budgets have achieved slight increases’
The parsed tree is:
Figure 103: The syntactic structure of sentence in (108)
Figure (103) shows the syntactic structure of “حققت الموازنات المستقلة الأخري زيادات طفيفة”. It is analyzed as a VP with the index “:0A”, which consists of the specifier “الموازنات المستقلة الأخري” (index “:08”) and the VB with the index “:09”. The VB contains the main verb “حققت” (index “:01”) and the complement NP “زيادات طفيفة” (index “:07”). The complement is shown with its detailed features in figure (104).
Figure 104: The detailed description of the scope :07 which is the complement of the main verb in sentence (108)
The subject is an NP consisting of (N+NP (N+JP+PP)) and the object is an NP (N+JP+JP)
Input
(109) ينظم مركز القاهرة الاقليمي للتحكيم التجاري الدولي مؤتمرا قانونيا دوليا
‘The Cairo regional center for international commercial arbitration organizes an international legal conference.’
The parsed tree is:
Figure 105: The syntactic structure of sentence (109)
Figure (105) shows the syntactic structure of “ينظم مركز القاهرة الاقليمي للتحكيم التجاري الدولي مؤتمرا قانونيا دوليا”. It is analyzed as a VP with the index “:13”, which consists of the specifier “مركز القاهرة الاقليمي للتحكيم التجاري الدولي” (index “:11”) and the VB with the index “:12”. The VB contains the main verb “ينظم” (index “:01”) and the complement NP “مؤتمرا قانونيا دوليا” (index “:0B”). The complement is shown with its detailed features in figure (106).
Figure 106: The detailed description of the scope :0B which is the complement of the main verb in sentence (109)
The subject is an NP consisting of N+NP+JP (genitive construction) and the object is an NP consisting of (N+NP)
Input
(110) تخدم شركات الكهرباء السعودية العشر قرابة ثلاثة ملايين مشترك.
‘The ten Saudi electricity companies are serving nearly three million subscribers’
The parsed tree is:
Figure 107: The syntactic structure of sentence (110)
Figure (107) shows the syntactic structure of “تخدم شركات الكهرباء السعودية العشر قرابة ثلاثة ملايين مشترك”. It is analyzed as a VP with the index “:0D”, which consists of the specifier “شركات الكهرباء السعودية العشر” (index “:07”) and the VB with the index “:0C”. The VB contains the main verb “تخدم” (index “:01”) and the complement NP “قرابة ثلاثة ملايين مشترك” (index “:0B”).
The subject is an NP consisting of NP+NP (apposition construction) and the object is an NP
Input
(111) اعتمد الدكتور يوسف بطرس غالي وزير الاقتصاد الميزانيات العمومية والحسابات الختامية.
‘Dr. Youssef Boutros Ghali, Minister of Economic affairs approved the public budgets and final accounts.’
The parsed tree is:
Figure 108: The automatic output of sentence (111)
Figure (108) shows the syntactic structure of “اعتمد الدكتور يوسف بطرس غالي وزير الاقتصاد الميزانيات العمومية والحسابات الختامية”. It is analyzed as a VP with the index “:10”, which consists of the subject “الدكتور يوسف بطرس غالي وزير الاقتصاد” (index “:0B”) and the VB with the index “:0F”. The VB contains the main verb “اعتمد” (index “:38”) and the complement NP “الميزانيات العمومية والحسابات الختامية” (index “:0E”). The complement is shown with its detailed features in figure (109).
Figure 109: The detailed description of the scope :0E which is the complement of the main verb in sentence (111)
The subject is a DP consisting of DET+NP and the object is an NP consisting of (N+JP)
Input
(112) خدم هذا المشروع قطاعات صناعية.
‘This project served industrial sectors.’
The parsed tree is:
Figure 110: The syntactic structure of sentence (112).
Figure (110) shows the syntactic structure of “خدم هذا المشروع قطاعات صناعية”. It is analyzed as a VP with the index “:08”, which consists of the specifier “هذا المشروع” (index “:06”) and the VB with the index “:07”. The VB contains the main verb “خدم” (index “:01”) and the complement NP “قطاعات صناعية” (index “:03”). The complement is shown with its detailed features in figure (111).
Figure 111: The detailed description of the scope :03 which is the complement of the main verb in sentence (112)
The subject is an NP consisting of DET+N and the object is a PP consisting of (P+NP(N+NP))
Input
(113) تنقض الحكومة على نقابة الصحفيين.
‘The government leaps upon the journalists union.’
The parsed tree is:
Figure 112: The syntactic structure of sentence (113)
Figure (112) shows the syntactic structure of “تنقض الحكومة على نقابة الصحفيين”. It is analyzed as a VP with the index “:08”, which consists of the subject “الحكومة” (index “:01”) and the VB with the index “:07”. The VB contains the main verb “تنقض” (index “:04”) and the complement PP “على نقابة الصحفيين” (index “:06”). The complement is shown with its detailed features in figure (113).
Figure 113: The detailed description of the scope :06 which is the complement of the main verb in sentence (113)
6.2 Evaluation
The researcher has presented an algorithm for the automatic extraction of subcategorization frames from a corpus that is automatically parsed according to X-bar theory. In contrast to many other approaches, the developed parser does not predefine the subcategorization frames to be extracted. The researcher has applied the algorithm to 600 Arabic sentences (600 trees) and has extracted the syntactic arguments of each verb in the selected sentences, which represent different syntactic patterns. Verb subcategorization patterns are then extracted from the sub-analyses which begin or end at the boundaries of the specified predicates. The results were evaluated against manually analyzed data, obtained by analyzing a maximum of 10 occurrences of each of the 60 test verbs.

The parser has shown substantial descriptive adequacy through the examples of its output presented above, which include all the VP structural patterns analyzed in the preceding chapters. The way the rules were developed also provides an acceptable degree of flexibility as far as natural language description is concerned: the rules are open to modification, which facilitates adjustments and corrections after testing.

Having tested all the structures of the verb phrases found in the corpus, the parser's efficiency is estimated via the F-measure, a statistical measure used here to quantify the accuracy of the extracted subcategorization frames. It combines two components, precision and recall. Precision is the number of correct results divided by the number of all returned results, whereas recall is the number of correct results divided by the number of results that should have been returned. The following formula describes the way the F-measure is computed:
F-measure = 2 x ((precision x recall) / (precision + recall))
Figure 114: The formula of the F-measure

A result is considered "RETURNED" when the output is a complete tree (i.e., all the words are interlinked); a result is considered "CORRECT" when the automatically parsed tree is equivalent to the expected output from the manual analysis. The number of correct results that should have been returned was 600 sentences. The total number of results returned by the analysis tool was 570, of which 530 were correctly parsed trees. The values below are truncated to two decimal places:
Precision = 530/570 = 0.92
Recall = 530/600 = 0.88
F-measure = 2 x ((0.92 x 0.88) / (0.92 + 0.88)) = 0.89

Most of the errors were due to the following:

(1) Syntactic ambiguity in sentences that involve prepositional phrase attachment (PP-attachment). The developed grammar sometimes attaches the PP to the VP rather than to the NP which precedes the preposition. This leads to a wrong analysis when the content of the sentence actually requires NP attachment. In sentence (114), the prepositional phrase “للاسكان” ‘for housing’ is wrongly attached to the verb and not to the NP “شركة مدينة نصر” ‘Nasr city company’. If, however, there are two prepositional phrases in the sentence, the first one is attached to the preceding NP and the second is attached to the VP, as in sentence (115). There, the prepositional phrase “لسيارات مازدا” ‘for Mazda cars’ is correctly attached to the preceding NP “مبيعات التجزئة” ‘retail sales’ and not to the verb “حقق” ‘achieve’. Hence, the PP “في أوروبا” ‘in Europe’ modifies the verb “حقق” ‘achieve’ (the PP is an adjunct).
(114) حققت «شركة مدينة نصر للاسكان» اعلى قيمة تداول في قطاع الاسكان
‘Nasr city company for housing has achieved the highest turnover in the housing sector’
(115) حققت مبيعات التجزئة لسيارات «مازدا» في أوروبا عام 1996 نحو 190.852
‘Retail sales of «Mazda» cars in Europe reached about 190.852 in 1996.’

(2) Subject-verb disagreement, which has caused ambiguity during parsing and hence failure in the extraction of the subcategorization frames, as in sentence (116).
(116) نظم "مستشفى السلامة" في جدة مؤتمرا دوليا في فندق انتركونتيننتال جدة.
‘Al-Salama hospital in Jeddah organized an international conference at Intercontinental Jeddah Hotel’
The disagreement in number and gender between the verb “نظم” ‘organize’ and the feminine singular subject “مستشفى جدة” ‘Jeddah hospital’ led to a failure in parsing sentence (116).

The developed rules have succeeded in identifying the subcategorization frames of a large percentage of the selected verbs in the corpus. In sentence (117), the grammar has correctly extracted the subcategorization frame of the verb, [V + [ل PP]]. However, it has failed to extract the subcategorization frame of the same verb “استسلم” in sentence (118). The grammar wrongly identifies “للدموع” as the complement of “استسلم”, while in fact it modifies the adjective “المسيلة”.
(117) استسلم زهير بخيت الذي يعول عليه الاماراتيون كثيرا لرقابة عادل حسن.
‘Zuhair Bakhit, on whom the Emiratis rely heavily, surrendered to the marking of Adel Hassan.’
(118) استسلمت الزوجة بعدما استخدمت قوات الأمن القنابل المسيلة للدموع.
‘The wife surrendered after the security forces had used tear gas.’

The total number of automatically extracted frames for the sixty selected verbs is 23, which is the same as the number of manually extracted frames. The list of the extracted pattern set is shown in table 17.
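The PP-attachment behaviour described under error source (1) can be pictured with a toy function. This is a minimal sketch of the behaviour as reported in the text, not the IAN transformation rules themselves: the function name `attach_pps` is hypothetical, the PPs are represented by their English glosses, the input is assumed to be the run of PPs that directly follows one NP, and the treatment of a third or later PP is an assumption, since the text only discusses one and two PPs.

```python
from typing import List, Tuple


def attach_pps(pps: List[str]) -> List[Tuple[str, str]]:
    """Return (PP, attachment site) pairs for the PPs that follow one NP.

    Reported behaviour: a single PP goes to the VP; with two PPs the first
    goes to the preceding NP and the second to the VP. Sending any further
    PP to the VP as well is an assumption made for this sketch.
    """
    if len(pps) >= 2:
        return [(pps[0], "NP")] + [(pp, "VP") for pp in pps[1:]]
    return [(pp, "VP") for pp in pps]


# Sentence (114): one PP after the NP; the expected (manual) site is the NP.
predicted_114 = attach_pps(["for housing"])
gold_114 = [("for housing", "NP")]
print(predicted_114 == gold_114)   # False: the PP is wrongly attached to the VP

# Sentence (115): two PPs; the first modifies the NP, the second the verb.
predicted_115 = attach_pps(["for Mazda cars", "in Europe"])
gold_115 = [("for Mazda cars", "NP"), ("in Europe", "VP")]
print(predicted_115 == gold_115)   # True
```

The comparison against the manual analysis shows why sentence (114) counts as an error while sentence (115) does not: a purely positional decision cannot tell that ‘for housing’ belongs to the company name.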
Table 17: The list of the pattern set
The list of patterns set
V, + [NP]
V, + [NP+NP]
V, + [أن NP]
V, + [NP+إلى PP]
V, + [NP+على PP]
V, + [NP+ل PP]
V, + [NP+بين PP]
V, + [NP+عن PP]
V, + [NP+ب PP]
V, + [PP+PP]
V, + PP [على NP]
V, + PP [ل NP]
V, + PP [حتى NP]
V, + PP [بين NP]
V, + PP [في NP]
V, + PP [حول NP]
V, + PP [من NP]
V, + PP [عن NP]
V, + PP [إلى NP]
V, + PP [ب NP]
V, + [عن PP+NP]
V, + [ل PP+ب PP]
V, + [ل PP+NP]
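The evaluation figures reported in this section can be re-derived from the three raw counts (600 expected results, 570 returned trees, 530 correct trees). The following sketch is only a worked check of the arithmetic; the function name `precision_recall_f` and its parameter names are chosen here for illustration. Working from the raw counts without intermediate truncation gives approximately 0.930, 0.883 and 0.906.

```python
from typing import Tuple


def precision_recall_f(correct: int, returned: int, expected: int) -> Tuple[float, float, float]:
    """Compute precision, recall and the balanced F-measure from raw counts."""
    precision = correct / returned          # correct results / all returned results
    recall = correct / expected             # correct results / results that should be returned
    f_measure = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_measure


p, r, f = precision_recall_f(correct=530, returned=570, expected=600)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# prints: precision=0.930 recall=0.883 F=0.906
```

Keeping the counts rather than the rounded ratios as the input makes it easy to recompute the scores if the corpus or the number of returned trees changes.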
Previous experiments on acquiring verbal subcategorization classes have been reported by Brent (1991, 1993), Manning (1993), Ushioda et al. (1993), Briscoe and Carroll (1997) and Attia et al. (2011). All the reported methods are statistically based rather than rule based. Moreover, the number of extracted subcategorization classes was small. These methods did not adopt any formal definition of frames, and consequently assumed arbitrary and subjective characterizations of these frames. This lack of formalization results in inconsistent and relatively non-standardized frames and, consequently, lexical knowledge (Elghamry, 2004).

In the experiments by Brent (1991, 1993), Manning (1993) and Ushioda et al. (1993), the maximum number of distinct subcategorization classes recognized is sixteen, and only Ushioda et al. (1993) attempted to acquire the relative subcategorization frequency for individual predicates. Brent defines a number of lexical patterns such as closed class items and pronouns. The subcategorization frames extracted by Brent were divided into five subcategorization classes. He does not report comprehensive results, but for one class, sentential complement verbs, he achieves 96% precision and 76% recall at classifying individual tokens of 63 distinct verbs of this class. Brent's system was capable of dealing with only certain structures, namely "V NP to V" and "V to V". Furthermore, the system was successful with only one type of NP, namely pronouns; he decided not to use other noun phrases because it is too difficult to determine their boundaries accurately.

Ushioda et al. (1993) calculate the relative frequency of six subcategorization classes. They report an accuracy rate of 83% (254 errors) for 33 distinct verbs in text and suggest that incorrect noun phrase boundary detection accounts for the majority of errors. Manning (1993) recognizes sixteen distinct complementation patterns for 40 verbs in texts; the recall was 82%. The rankings of Briscoe and Carroll's system (1997) include all classes for each verb, from a total of 160 classes. Attia et al. extracted 240 frame types for 3,295 lemma types, with 7,746 lemma-frame types (for verbs, nouns and adjectives), averaging 2.35 frames per lemma.

The developed method differs from the previous attempts in several respects. The first difference is that it is rule based; it is therefore considered a novelty in the field. Being rule based has spared the parser the errors caused by statistical parsers. Moreover, the numbers of subcategorization frames for Brent, Ushioda et al. and Manning are small in comparison to the number of subcategorization frames presented in this thesis, which is a total of 23 subcategorization frames for 60 verbs. An advantage that this method has over Brent's system is its ability to deal with the different types of NP that are mentioned in table 16.
Chapter 7: Conclusion

In this thesis, the researcher has analyzed the syntactic structures of some Arabic sentences based on X-bar theory. The researcher has considered different structures in Arabic and demonstrated how they were analyzed, including the analysis of SVO, VOS and VSO orders. The researcher has also addressed the problem of analyzing the different nominal phrases, such as a noun with its adjectival modifiers, the genitive construction and phrases that include apposition. Four sets of rules were developed: (1) Normalization rules to normalize the input, (2) Tokenization rules to segment the input into tokens, (3) Disambiguation rules to prevent wrong lexical choices from the dictionary and (4) Transformation rules to parse the structures and identify the arguments of the verbal predicates of the corpus. The implementation of the grammar rules has been presented. The results have shown a high accuracy level. The proposed methodology is flexible and can be extended to include the extraction of the syntactic arguments of the other categories (nouns, adverbs and adjectives). In the future, the research aims to provide lexical profiling for Arabic verbs and the other categories by covering syntactic subcategorization frames, as well as to develop a database that would provide a rich repository of Arabic lexicographic details.

The subcategorization frames of 60 Arabic verbs have been extracted automatically from the corpus of 600 sentences. The results have been compared against a collection of subcategorization frames constructed by the researcher through the manual analysis of the 600 sentences. The comparison has shown that the researcher has achieved a high percentage of precision for the target verbs. The syntactic specifications of the result can be considered a lexical database suitable for achieving an accurate syntactic parser.

It is intended to enhance the system by adding more rules to deal with more sentence structures and to cover other syntactic features. The extraction algorithm does not deal with the passive voice and its effect on subcategorization behavior. The passive has a low frequency in Arabic, because there is a tendency to avoid passive verb forms when active readings are also possible, in order to avoid ambiguity and improve readability (Attia, 2011).

There are two common methodologies for extracting subcategorization frames: the first is to predefine the subcategorization frames of the predicates, and the second is to parse sentences automatically and then extract the subcategorization frames of the predicates from the parsed trees. In this thesis, the second methodology was adopted, since the researcher considered it more promising than introducing a predefinition of subcategorization frames. The researcher assumes that accurate parsing facilitates the automatic extraction of the subcategorization frames of the predicate. However, it is argued that predefining the subcategorization frames of the predicates will certainly help in improving the parsing results. It is a controversial issue. The question is therefore which is more useful: is it the predefinition of subcategorization frames that will lead to a more accurate parser, or is it an accurate parser that will lead to an accurate extraction of the subcategorization frames of the predicates? This question remains open for the researcher.

The results that the researcher has reached are promising, as the researcher has extracted 23 subcategorization frames accurately, which was the result of precise and adequate rules. In addition, the tools used for automatic extraction have assisted in the success of the experiment. Therefore, in future research, the researcher intends to parse a larger corpus in order to extract more subcategorization frames for the rest of the Arabic verbs. Furthermore, the researcher is planning to present a study of the subcategorization frames of the other categories.
References
Adger, D. (2003). Core Syntax: A Minimalist Approach. Oxford: Oxford University Press.
Alan, M. (1993). Topics in the Syntax and Semantics of Coordinate Structures. PhD thesis, University of Maryland.
Alansary, S., Nagi, M., Adly, N. (2010). UNL+3: The Gateway to a Fully Operational UNL System. In Proceedings of 10th International Conference on Language Engineering, Cairo, Egypt.
Alansary, S. (2014). MUHIT: A Multilingual Harmonized Dictionary. The 9th edition of the Language Resources and Evaluation Conference, Reykjavik, Iceland.
Al-Qahtani, D. (2004). Semantic Valence of Arabic Verbs. Librairie du Liban, Beirut.
Anderson, S. (1977). Comments on the paper by Wasow. In P. Culicover & A. Akmajian (Eds.), Formal syntax. New York: Academic Press.
Anderson, S and Sandra, C. (1977). On grammatical relations and clause structure in verb initial languages. Syntax and semantics 8: Grammatical relations, P. Cole and J. Sadock, eds. 1-25. New York: Academic Press, Inc.
Andrew, C. (2006). Syntax: A Generative Introduction, 2nd Edition. Malden, MA: Blackwell.
Arun, A and Keller, F. (2005). Lexicalization in crosslinguistic probabilistic parsing: The case of French. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 306–313, Ann Arbor, Michigan.
Arun, A. (2004). Statistical Parsing of the French Treebank. Master’s thesis, University of Edinburgh.
Attia, M. (2008). Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation. Ph.D. Thesis. The University of Manchester, Manchester, UK.
Attia, M., Pavel Pecina, Lamia Tounsi, Antonio Toral and Josef van Genabith. (2011). Lexical Profiling for Arabic. Dublin City University, Dublin, Ireland. In Proceedings of eLex 2011, pp. 23-33.
Baker, M. (1988). Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.
Barwise, J. and Perry, J. (1983). Situations and Attitudes. MIT Press, Cambridge, Mass.
Bassam, H, Asma, M, Nadim O, Abeer T. (2014). Formal Description of Arabic Syntactic Structure in the Framework of the Government and Binding Theory. Computación y Sistemas, volume 18.
Benmamoun, E. (2000). The feature structure of functional categories: a comparative study of Arabic dialects. Oxford: Oxford University Press.
Bick, E. (2000). The Parsing System PALAVRAS. Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University Press.
Bielick, V., and O. Smrz. (2008). Building the Valency Lexicon of Arabic Verbs. In 6th Conference on Language Resources and Evaluation (LREC). Marrakech, Morocco.
Black, C. (1998). A Step-by-step introduction to Government and Binding theory of syntax: SIL - Mexico Branch and University of North Dakota.
Boeckx, C. (2006). Linguistic Minimalism. Origins, Concepts, Methods and Aims. Oxford University Press.
Boguraev, B. K. and Briscoe, E. J. (1987). Large lexicons for natural language processing utilising the grammar coding system of the Longman Dictionary of Contemporary English. Computational Linguistics, 13(4):219–240.
Brent, M.R. (1991). Automatic extraction of subcategorization frames from untagged texts. In Proceedings of the 29th Annual Meeting of the ACL. 209-214.
Brent, M.R. (1993). From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics 19: 243-262.
Brent, M.R. (1994). Surface cues and robust inference as a basis for the early acquisition of subcategorization frames. Lingua 92: 433-470.
Bresnan, J. (2001). Lexical-functional syntax. Oxford: Blackwell.
Briscoe, T., and Carroll, J. (1993). Generalized probabilistic LR parsing of natural language (corpora) with unification-based methods. Computational Linguistics 19: 25-59.
Briscoe, T., and Carroll, J. (1997). Automatic extraction of subcategorization from corpora. In Proceedings of the 5th ACL Conference on Applied Natural Language Processing, Washington, DC. 356-363.
Briscoe, T. (2000). Dictionary and System Subcategorisation Code Mappings. Unpublished manuscript, University of Cambridge Computer Laboratory.
Briscoe, T. and Carroll, J. (2002). Robust accurate statistical annotation of general text. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1499–1504.
Buchholz, S. (1998). Distinguishing Complements from Adjuncts Using Memory-Based Learning. In B. Keller ed. Proceedings of the ESSLLI-98 Workshop on Automated Acquisition of Syntax and Parsing.
Cahill, A., McCarthy, M., Genabith, J., and Way, A. (2002). Parsing text with a PCFG derived from Penn-II with an automatic F-structure annotation procedure. In Proceedings of the Seventh International Conference on LFG, edited by Miriam Butt and Tracy Holloway King. CSLI Publications, Stanford, CA, pages 76–95.
Carlson, G. N., Tanenhaus, M. K. (1988). Thematic roles and language comprehension. In W. Wilkins (Ed.), Syntax and semantics, Volume 21: Thematic relations. New York: Academic Press.
Carnie, A. (2006). Syntax: A Generative Introduction. Blackwell.
Carroll, J. (1993). Practical unification-based parsing of natural language. Cambridge University Computer Laboratory, TR-224.
Carroll, J. (1994). Relating complexity to practical performance in parsing with wide coverage unification grammars. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, NMSU, Las Cruces, NM. 287-294.
Carroll, J and M. Rooth. (1998). Valence induction with a head-lexicalized PCFG. In 3rd Conference on Empirical Methods in Natural Language Processing, Granada, Spain.
Carroll, J., Briscoe, T, and Sanfilippo, A. (1998). Parser evaluation: A survey and a new proposal. In Proceedings of the International Conference on Language Resources and Evaluation, pages 447–454.
Carter, A. and Gerken, L. A. (1997). Children’s use of grammatical morphemes in online sentence comprehension. In E. Clark (ed.), Proceedings of the 28th Annual Child Language Research Forum. Palo Alto, CA: Stanford University Press.
Chen, S. F. and Goodman, J. (1996). An empirical study of smoothing techniques for language modeling. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 310–318.
Chesley, P and Salmon-Alt, S. (2006). Automatic extraction of subcategorization frames for French. In Proceedings of the LREC ’06.
Choe, H.-S. (1986). An SVO analysis of VSO languages and parameterization: A study of Berber. MIT lexicon project working paper 14 (pp. 121-158). Cambridge, MA: MITWPL.
Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press. ISBN 0-262-53007-4.
Chomsky, N. (1970). Remarks on nominalization. In: R. Jacobs and P. Rosenbaum (eds.) Reading in English Transformational Grammar, 184-221. Waltham: Ginn.
Chomsky, N. (1972). The Port-Royal Grammar of 1660 identified similar principles.
Chomsky, N. (1981/1993). Lectures on Government and Binding: The Pisa Lectures. Mouton de Gruyter.
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Linguistic Inquiry Monograph 6. MIT Press.
Chomsky, N. (1986). Barriers. Linguistic Inquiry Monograph 13. MIT Press.
Chomsky, N. (1993). A minimalist program for linguistic theory. MIT occasional papers in linguistics no. 1. Cambridge, MA: Distributed by MIT Working Papers in Linguistics.
Chomsky, N. (1995). The Minimalist Program. MIT Press.
Chomsky, N. (1995). Bare Phrase Structure. In Evolution and Revolution in Linguistic Theory. Essays in honor of Carlos Otero, eds. Hector Campos and Paula Kempchinsky, 51–109.
Chomsky, N. (2002). Syntactic Structures. p.11.
Chrupala, G. (2003). Acquiring Verb Subcategorization from Spanish Corpora, PhD program “Cognitive Science and Language,” Universitat de Barcelona, Department of General Linguistics.
Chung, S. (1990). VP’s and verb movement in Chamorro. Natural Language and Linguistic Theory 8.4:559-620.
Clark, H. and Clark, E. (1977). Psychology and language: An introduction to psycholinguistics. New York: Harcourt Brace Jovanovich.
Collins, M. (1999). Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania.
Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4).
Cook, W.A. (1979). Case Grammar: Development of the Matrix Model (1970–1978). Georgetown University Press.
Culicover, P. and Jackendoff, R. (2005). Simpler Syntax. Oxford University Press.
Dalrymple, M. (2001). Lexical Functional Grammar. Volume 34 of Syntax and Semantics. New York: Academic Press.
DeNeefe, S. and Knight, K. (2009). Synchronous tree adjoining machine translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Diesing, Molly. (1990). Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory 8:41-79.
Ditters, E. (2003). An AGFL for the Description of Non-coinciding Phrasal Heads. In Proceedings of the Joint International Conference of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), July 28-30, Orlando Florida, USA, VOL. VI, pp. 107-112.
Dligach, D. and Palmer, M. (2008). Word sense disambiguation with automatically retrieved semantic knowledge. International Journal of Semantic Computing (IJSC), 2(3):365–380.
Dowty, D. (1986). On the semantic content of the notion “thematic role.” Paper presented at the University of Massachusetts Conference on Property Theory, Type Theory, and Semantics.
Dowty, D. (1975). The stative in the progressive and other essence/accident contrasts. Linguistic Inquiry, 6, 579-588.
Elghamry, K. (2004). A Generalized Cue-Based Approach to the Automatic Acquisition of Subcategorization Frames. PhD thesis, Indiana University, Bloomington, Indiana.
El-Shishiny, H. (1990). A Formal Description of Arabic Syntax in Definite Clause Grammar. International conference on computational linguistics. In Proceedings of COLING’90, the 13th conference on computational linguistics, Volume 3, pages 345-347.
Fassi-Fehri, A. (1993). Issues in Arabic clauses and words. Dordrecht: Kluwer Academic Publishers.
Fillmore, C. J. (1968). Lexical entries for verbs. Foundations of Language, 4, 373-393.
Fillmore, C. J. (1994). Under the circumstances. In Proceedings of the 20th Annual Meeting of the Berkeley Linguistics Society, Berkeley, California.
Fisher, C., Gleitman, H., and Gleitman, L. R. (1991). On the semantic content of subcategorization frames. Cognitive Psychology, 23(3):331-392.
Francopoulo, G., George, M. (2005). Lexical markup framework (lmf = iso 24-613). Technical report, INRIA Loria.
Gawron, J. (1988). Lexical Representations and the Semantics of Complementation. New York: Garland Publisher.
Gelderen, E. (2002). An introduction to the grammar of English: Syntactic arguments and socio-historical background. Amsterdam: John Benjamins.
Gleitman, L. R. (1965). Coordinating conjunctions in English. Language, 41, 260-293.
Greenberg, J. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (ed.), Universals of Language. Cambridge, Mass: MIT Press. 73-113.
Grimshaw, J. (1990). Argument Structure. Cambridge, Mass./London, England. (Linguistic Inquiry Monographs 18).
Grishman, R., Macleod, C., and Meyers, A. (1994). Comlex syntax: building a computational lexicon. In International Conference on Computational Linguistics, COLING94, pages 268–272.
Haegeman, L. (1991). Introduction to Government and Binding Theory. Blackwell, Oxford UK and Cambridge, USA.
Hajic, J., and Hladká, B. (1998). Tagging inflective languages: Prediction of morphological categories for a rich, structured tagset. In Proceedings of COLING-ACL '98, Université de Montréal, Montreal, pages 483-490.
Hajic, J., Cmejrek, M., Dorr, B., Ding, Y., Eisner, J., Gildea, D., Koo, T., Parton, K., Penn, G., Radev, D., and Rambow, O. (2002). Natural language generation in the context of machine translation. Technical report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore. Hindle, D and Rooth, M. (1993). Structural ambiguity and lexical relations. Computational Linguistics, 19 (1). Hoyt, F. (2013). "Verb Phrase." Encyclopedia of Arabic Language and Linguistics. Hwang, J. (2011). Making Verb Argument Adjunct Distinction in English. Synthesis Exam paper, University of Colorado at Boulder. Ienco, D., Villata, S., Bosco, C. (2008). Automatic extraction of subcategorization frames for Italian. In the Proceedings of the sixth International Conference on Language Resources and Evaluation LREC, Marrakech, Marocco, 28-30. Itai, A. and Wintner, S. (2008). Language resources for Hebrew.Language Resources and Evaluation, 42(1). Jackendoff, R. (1972). Semantic interpretation in Generative Grammar. The MIT Press, Cambridge, Massachusetts. Jackendoff, R. (1978). Grammar as evidence for conceptual structure. In M. Halle, J. Bresnan, & G. Miller (Eds.), Linguistic theory and psychological reality. Cambridge, MA: MIT Press. Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press. Jackendoff, R. (1987). The status of thematic relations in linguistic theory. Linguistic Znquiry, 18(3). Joshi, A. K., and Schabes, Y. (1997). Tree adjoining grammars. In G. Rozenberg and A.Salomaa (eds.) Handbook of formal languages. Berlin: Springer-Verlag, pp. 69-123. Kaplan, R. and Bresnan, J. (1982). Lexical functional grammar: A formal system for grammatical representation. In Joan Bresnan, editor, The Mental Representation of Grammatical Relations. MIT Press, Cambridge, MA, pages 173–281. Katz, S. M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recogniser. 
IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3):400–401. Kimball, J. P. (1973). Seven principles of surface structure parsing in natural language.Cognition 2, 15-47. Kitagawa, Y. (1986). Subjects in Japanese and English. Ph.D. Dissertation. Amherst: University of Massachusetts. Koopman, H. and Sportiche, D. (1991). On the position of subjects. The syntax of verb initial languages, edited by J. McCloskey. 211-258. Lingua, Special Edition. Korhonen, A. (2002). Subcategorization Acquisition. Ph.D. thesis, University of Cambridge. Korhonen, A., Krymolowski, Y., Briscoe, T. (2006). A Large Subcategorization Lexicon for Natural Language Processing Applications. In the Proceedings of the fifth international conference on Language Resources and Evaluation (LREC 2006), Genova, Italy. Kuroda, S. Y. (1988). Whether we agree or not. Lingvisticae Investigationes 12:1-47. Lakoff, G and Ross, J. (1976).Why you can’t do so into the sink. In J. McCawley, editor, Notes from the Linguistic Underground, volume 7 of Syntax and Semantics. Academic press, New York.
188
Lapata, M., Keller, F. and Schulte, S. (2001). Verb frame frequency as a predictor of verb bias. Journal of Psycholinguistic Research, 30(4):419–435. Laplace, P. (1995). Philosophical Essays on Probabilities. Springer-Verlag. Lopatkova, M., Zabokrtsky, Z., and Benesova, V. (2006). Valency Lexicon of Czech Verbs VALLEX 2.0. Technical Report 34. UFAL MFF UK, Charles University in Prague. Loukil, N., Haddar, K. and Benhamadou A. (2008). A syntactic lexicon for Arabic verbs, LREC, 2008. Maamouri, M., Bies, A. (2004). Developing an ArabicTreebank: Methods, guidelines, procedures, and tools. In Workshop on Computational Approaches to Arabic Script based Languages, COLING. Manning, C.D. (1993). Automatic acquisition of a large subcategorization dictionary from corpora. Proceeding of the 31st annual meeting of the association for computational linguistics, Columbus, Ohio, pp.235-242. Marcus, M.P., Santorini, B., and Marcinkiewicz, M.A. (1993). Building a large annotated corpus of English: The PennTreebank.Computational Linguistics, 19. Marinov, S and Hemming, C. (2004). Automatic Extraction of Subcategorization Frames from the Bulgarian Tree Bank. Graduate School of Language Technology, Goteborg, Sweden. McCarthy, D and Carroll, J. (2003). Disambiguating nouns, verbs and adjectives using automatically acquired selectional preferences. Computational Linguistics, 29(4):639–654. McCloskey, J. (1991). Clause structure, ellipsis and proper government in Irish. The syntax of verb-initial languages, edited by James McCloskey, 259-302. Lingua, Special edition. Merlo, P and Ferrer, E. (2006). The notion of argument in prepositional phrase attachment. Computational Linguistics, 32(3):341–378. Messiant, C. (2006). A subcategorization acquisition system for French verbs. In the proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Techonologies, Columbus, Ohio, 55-60. Messiant, C., Poibeau, T., and Korhonen, A. (2008). 
Lexschem: a large subcategorization lexicon for french verbs. In LREC. European Language Resources Association. Miller, G, Beckwith, R., Felbaum, C., Gross, D., and Miller, K. (1990). Introduction to WordNet: An online lexical database. Journal of Lexicography, 3(4):235– 244. Mohammad, M. (1988). On the parallelism between IP and DP. In proceedings of the west coast conference on formal linguistics (WCCFL) VII.ed. By Hagit Borer. 241-254.Standord: CSLI. Molly, D. (1990). Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory 8:41-79. Montemagni, S., Barsotti, F., Battista, M., Calzolari, N., Corazzari, O., Lenci, A., Zampolli, A., Fanciulli, F., Massetani, M., Raffaelli, R., Basili, R., Pazienza, M.T., Saracino, D., Zanzotto, F., Mana, N., Pianesi, F., and Delmonte, R. (2003). Building the Italian SyntacticSemantic Treebank. In Anne Abeill ́e, editor, Building and using Parsed Corpora. Kluwer, Dordrecht. Morrill, G. (1994). Type-logical grammar. Dordrecht: Kluwer. O'Donovan, R., Burke, M., Cahill, A., van Genabith, J. & Way, A. (2005). Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks. Computational Linguistics, 31(3), pp. 329-366. 189
Palmer, M., Bies, A., Babko-Malaya, O., Diab, M., Maamouri M., Mansouri, A. & Zaghouni, W. (2008). A pilot Arabic Propbank. In Proceedings of LREC, Marrakech, Morocco. Panevova, J. (1974). On Verbal Frames in Functional Generative Description, Part I. Prague Bulletin of Mathematical Linguistics, 22:3–40, Panevova, J. (1975).On Verbal Frames in Functional Generative Description, Part II. Prague Bulletin of Mathematical Linguistics, 23:17–52, Panevova, J. (1994). Valency Frames and the Meaning of the Sentence. In The Prague School of Functional and Structural Linguistics, pages 223–243. John Benjamins, Amsterdam– Philadelphia. Pinker, S. (1987). Resolving a learnability paradox in the acquisition of the verb lexicon (Lexicon Project Working Papers No. 17). Cambridge, MA: MIT Center for Cognitive Science. Pollard, C. and Sag, I. (1987). Information-Based Syntax and Semantics, Volume 1: Fundamentals. University of Chicago Press, Stanford. Pollard, C. and Sag, I. (1994). Head-Driven Phrase-Structure Grammar.University of Chicago Press, Chicago.Pustejovsky, J. Quirk, R., Greenbam, S., Leech G., and Svartvik, J. (1985). A Comprehensive Grammar of the English Language. Longman, London and New York. Radford, A. (1988).Transformational Grammar: A First Course. Cambridge University Press. Rehbein, I., van Genabith, J. (2009). Automatic Acquisition of LFG Resources For German As Good As It Gets. In Proceedings of the LFG09 Conference. Cambridge, UK. Rouvret and Vergnaud, J- R. (1980). Specifying Reference to the Subject. Linguistic Inquiry, 11(1). Ryding, K. (2005). A reference grammar of Modern Standard Arabic. Cambridge: Cambridge University Press. Saeed, J. (1997). Semantics. Blackwell, 2nd edition. Sarkar, A and Zeman, D. (2000). Automatic extraction of subcategorization frames for Czech. In Proceedings of the 19th International Conference on Computational Linguistics, pages 691–697, Saarbrucken, Germany. Scarton, C. (2011). 
Construc~ao semiautomatica de um lexico computacional de verbos para o portugu^es do Brasil. In The Proceedings of the eighth Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiaba, Brazil. Schulte, S and. Brew, C. (2002). Inducing german semantic verb classes from purely syntactic subcategorisation information. In Proc. of the 40th Annual Meeting of ACL, Philadephia, USA. Schulte, S. (2002a). A subcategorisation lexicon for German verbs induced from a lexicalised PCFG. In Proceedings of the Third LREC Conference, pages 1351–1357, Las Palmas, Spain. Schulte, S. (2002b). Evaluating verb subcategorisation frames learned by a German statistical grammar against manual definitions in the Duden Dictionary. In Proceedings of the 10th EURALEX International Congress, pages 187–197, Copenhagen. Sells, P. (1985). Lectures on Contemporary Syntactic Theories. Center for the Study of Language and Information, Stanford. 190
Shady, M., and Gerken, L.A. (1999). Grammatical and caregiver cues in early sentence comprehension. Journal of Child Language, 26, 163-175. Sproat, R and Bedrick, S. (2011). CS506/606: Txt Nrmlztn. Steedman, M. (1994). Acquisition of Verb Categories. In L. Gleitman and B. Landau eds. The Acquisition of the Lexicon. Steedman, M., and and Baldridge,J. (2005) Combinatory Categorial Grammar to appear in: R. Borsley and K. Borjars (eds.) Non-Transformational Syntax, Blackwell Stern, N. (1994). The Verb Dictionary. Bar-Ilan University. Surdeanu, M., Harabagiu, S., Williams, J., and Aarseth, P. (2003). Using predicate-argument structures for information extraction. In Proc. of the 41st Annual Meeting of ACL, Sapporo. Talmy, L. (1975). Semantics and syntax of motion. In J. Kimball (Ed.), Syntax and Semantics. New York: Academic Press. Tesniere, L. (1959).Elements de syntaxe structurale. Editions Klincksieck. Ushioda, A., Evans, D. A., Gibson, T., and Waibel, A. (1996). Estimation of verbsubcategorization frame frequencies based on syntactic and multi-dimensional statistical analysis. In H. Bunt and M. Tomita (eds.), Recent Advances in ParsingTechnology. Dordrecht: Kluwer. 241-254. Vendler, Z. (1967). Linguistics in philosophy. Ithaca, NY: Cornell Univ. Press. Vendler, Z. (1972). Res cogitnns. Ithaca, NY: Cornell Univ. Press. Villavicencio, A. (2002). Learning To Distinguish PP Arguments from Adjuncts. International Conference on Computational Natural Language Learning. Wood, M. (1993). Categorial Grammars. Routledge, London and New York. Xue, N. (2006). Annotating the predicate-argument structure of Chinese nominalizations. In Proceedings of LREC’06. Zanette, A., Scarton, C., and Zilio, L. (2012). Automatic extraction of subcategorization frames from corpora: an approach to Portuguese. In Proceedings of the 2012 International Conference on Computational Processing of Portuguese - Demo Session, Coimbra, Portugal. Zwicky, A. (1971). In a manner of speaking. 
Linguistic Inquiry, 11(2), 223-233.
191
The Electronic Sites:
http://mawdoo3.com/%D8%AA%D8%B4%D8%B1%D9%8A%D8%AD_%D8%A7%D9 %84%D8%AC%D9%85%D9%84%D8%A9_%D8%A7%D9%84%D8%B9%D8%B1%D 8%A8%D9%8A%D8%A9 (http://www.unlweb.net/wiki/Voice). (http://www.unlweb.net/wiki/Case). (http://www.unlweb.net/wiki/Tense). (http://www.unlweb.net/wiki/Person). (http://www.unlweb.net/wiki/Number). (http://www.unlweb.net/wiki/Gender).
192
Appendix (A)
The set of subcategorization frames extracted for the direct object verbs (TSTD)

Arabic verb    TSTD Subcategorization Frame
[خدم]    خدم: V, + [NP]
[حقق]    حقق: V, + [NP]
[اعتبر]    اعتبر: V, + [NP]
[علم]    علم: V, + [NP]
         علم: V, + [أن NP]
[نظم]    نظم: V, + [NP]
[واجه]    واجه: V, + [NP]
[كره]    كره: V, + [NP]
[كتب]    كتب: V, + [NP]
[شجع]    شجع: V, + [NP]
[تمنى]    تمنى: V, + [NP]
[تسلق]    تسلق: V, + [NP]
[نظف]    نظف: V, + [NP]
[ضم]    ضم: V, + [NP]
[أغلق]    أغلق: V, + [NP]
[مسح]    مسح: V, + [NP]
[نقل]    نقل: V, + [NP]
[سمع]    سمع: V, + [NP]
         سمع: V, + [أن NP]
[وهب]    وهب: V, + [NP]
[لعن]    لعن: V, + [NP]
[ربح]    ربح: V, + [NP]
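The entries of Appendix (A) all follow the pattern "verb: V, + [complement]". The following is a minimal sketch, assuming that this pattern holds for every entry, of how such an entry could be read into a structured record. The names `Frame` and `parse_frame` are hypothetical and are introduced only for illustration; they are not taken from the implementation described in this thesis.

```python
import re
from dataclasses import dataclass


# Hypothetical container for one entry of the form "verb: V, + [complement]".
@dataclass(frozen=True)
class Frame:
    verb: str        # Arabic verb as listed in the appendix
    category: str    # category of the head, "V" in every appendix entry
    complement: str  # what the verb selects, e.g. "NP" or "Nothing"


# verb, colon, category, comma, plus sign, then the complement specification
_ENTRY = re.compile(
    r"^\s*(?P<verb>[^:]+?)\s*:\s*(?P<cat>\w+)\s*,\s*\+\s*(?P<comp>.+?)\s*$"
)


def parse_frame(entry: str) -> Frame:
    """Parse one appendix entry; raise ValueError if it does not fit the pattern."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"not a frame entry: {entry!r}")
    comp = match.group("comp")
    # Drop one pair of enclosing square brackets, if present.
    if comp.startswith("[") and comp.endswith("]"):
        comp = comp[1:-1].strip()
    return Frame(match.group("verb"), match.group("cat"), comp)
```

Calling `parse_frame` on an entry such as "علم: V, + [أن NP]" yields a record whose complement field holds the complementizer together with the phrase label, so the two frames listed for the same verb remain distinguishable.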
Appendix (B)
The set of subcategorization frames extracted for the indirect object verbs (TSTI)

Arabic verb    TSTI Subcategorization Frame
[فاز]    فاز: V, +PP[ب NP]
         فاز: V, +Nothing
[اعتمد]    اعتمد: V, +PP[على NP]
           اعتمد: V, +NP
[قضى]    قضى: V, +[NP+NP]
         قضى: V, +NP
         قضى: V, +PP[على NP]
         قضى: V, +NP[أن clause]
[امتد]    امتد: V, +PP[إلى NP]
          امتد: V, +PP[ل VP]
          امتد: V, +PP[حتى VP]
[انقض]    انقض: V, +PP[على NP]
[فصل]    فصل: V, +[ـــــNP]
         فصل: V, +PP[بين NP]
         فصل: V, +PP[في NP]
[تمسك]    تمسك: V, +PP[ب NP]
          تمسك: V, +Nothing
[التف]    التف: V, +PP[حول NP]
          التف: V, +PP[ب NP]
          التف: V, +PP[على NP]
[انسحب]    انسحب: V, +PP[من NP]
           انسحب: V, +PP[إلى NP]
[التقى]    التقى: V, +[ـــــNP]
           التقى: V, +PP[ب NP]
[حصل]    حصل: V, +PP[على NP]
         حصل: V, +[ـــــNP]
[تخلص]    تخلص: V, +PP[من NP]
          تخلص: V, +PP[إلى NP]
[ارتبط]    ارتبط: V, +PP[ب NP]
[تخلى]    تخلى: V, +PP[عن NP]
[وضع]    وضع: V, +PP[على NP]
         وضع: V, +[ـــــNP]
[وافق]    وافق: V, +PP[على NP]
[استسلم]    استسلم: V, +PP[ل NP]
            استسلم: V, +Nothing
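Because a verb in Appendix (B) may take several frames, the table can be read as a lexicon that maps each verb to its list of complements. The sketch below illustrates this reading under two assumptions: each entry has the form "verb: V, +complement", and inside a PP frame the preposition is separated from the phrase label by a space. The helper names `build_lexicon` and `prepositions` are hypothetical, and only a few entries are copied from the table as sample data.

```python
import re
from collections import defaultdict

# A few entries copied from Appendix (B); spacing inside the brackets
# is normalized so that the preposition and the phrase label are separated.
ENTRIES = [
    "فاز: V, +PP[ب NP]",
    "فاز: V, +Nothing",
    "التف: V, +PP[حول NP]",
    "التف: V, +PP[ب NP]",
    "التف: V, +PP[على NP]",
]

# Matches a prepositional complement of the form "PP[<preposition> <label>]".
_PP = re.compile(r"PP\[\s*(?P<prep>\S+)\s+(?P<phrase>\w+)\s*\]")


def build_lexicon(entries):
    """Map each verb to the list of complements it is listed with."""
    lexicon = defaultdict(list)
    for entry in entries:
        verb, _, rest = entry.partition(":")
        # Everything after the first "+" is the complement specification.
        complement = rest.split("+", 1)[1].strip()
        lexicon[verb.strip()].append(complement)
    return dict(lexicon)


def prepositions(lexicon, verb):
    """Return the prepositions that head the PP complements of `verb`."""
    found = []
    for complement in lexicon.get(verb, []):
        match = _PP.search(complement)
        if match:
            found.append(match.group("prep"))
    return found
```

With the sample entries, `prepositions(build_lexicon(ENTRIES), "التف")` collects the three prepositions listed for that verb, while a frame such as "Nothing" contributes no preposition. Keeping the complements as a list per verb preserves the order in which the frames appear in the table.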
Appendix (C)
The set of subcategorization frames extracted for the ditransitive verbs (TST2)

Arabic verb    TST2 Subcategorization Frame
[أعاد]    أعاد: V, + [NP+إلى PP]
          أعاد: V, + [NP]
[فرض]    فرض: V, + [NP+على PP]
         فرض: V, + [NP]
[كرس]    كرس: V, + [NP+ل PP]
[ضم]    ضم: V, + [NP+ل PP]
        ضم: V, + NP
[هنأ]    هنأ: V, + [NP+ب PP]
         هنأ: V, + [NP+على PP]
[غمر]    غمر: V, + [NP]
[تبرع]    تبرع: V, + [PP+PP]
          تبرع: V, + [PP]
[تنازل]    تنازل: V, + [PP+PP]
           تنازل: V, + [PP]
           تنازل: V, +Nothing
[أسند]    أسند: V, + [NP+إلى PP]
          أسند: V, + [NP+ل PP]
[مرر]    مرر: V, + [NP]
         مرر: V, + [NP+إلى PP]
         مرر: V, +PP[ب NP]
[وضع]    وضع: V, + [NP]
[نقل]    نقل: V, + [PP+NP]
         نقل: V, + [NP]
[أضاف]    أضاف: V, + [NP+PP]
          أضاف: V, + [NP]
[اتهم]    اتهم: V, + [NP+PP]
          اتهم: V, + [PP+NP]
[ربط]    ربط: V, + [PP+NP]
         ربط: V, + [NP+PP]
[خصص]    خصص: V, + [NP+PP]
[أجبر]    أجبر: V, + [NP+على PP]
[سمح]    سمح: V, +PP[ب NP]
         سمح: V, + [ل PP+ب PP]
         سمح: V, + [ل PP+NP]
[برر]    برر: V, + [NP+PP]
         برر: V, + [NP]
)Appendix (D The selected sentences for the direct object verbs context sentences حقق «البنك االهلي التجاري »العام الماضي ارباحا صافية بلغت 915مليون حققت «شركة مدينة نصر لالسكان »اعلى قيمة تداول في قطاع االسكان حققت شركة «النقل المتحدة »المحدودة مبيعات بلغت 43مليون لاير سيحقق المشروع زيادة انتاج الشركة من منجم الشيدية الى اكثر من حققنا عام 1996أرباحا بلغت 787مليون درهم مقابل 602مليون تحقق بعض التقدم في تقريب وجهتي النظر خالل اجتماع الجزائر ،اال حققت مبيعات التجزئة لسيارات «مازدا »في أوروبا عام 1996نحو 190.852 حققت محفظة االسهم البريطانية بالجنيه االسترليني عائدات وصلت نسبتها الى 11.2 حققت الموازنات المستقلة األخرى زيادات طفيفة السنة الجارية لتبلغ موازنة المجلس حققت الشركة العام الماضي ارباحا صافية بلغت 427مليون لاير (نحو يعتبر هذا المشروع االول من مشاريع مجمع االقمشة الصناعي الذي تخطط اعتبر ان التحديث المطلوب ينبغي ان يشمل طرق االدارة وأساليب العمل اعتبر ان نمو االستهالك العالمي للنفط والغاز ،خصوصا في األسواق النامية تعتبر «كيوتل »من أنجح المؤسسات في قطر. يعتبر قرار المجلس التنفيذي االثنين الماضي دعم من قبل اعضاء المؤسسة اعتبر األحمر ان المصارف االسالمية وسيلة فعالة لزيادة حجم التداول النقدي تعتبر منطقة الساحل الغربي لبورسعيد من اهم مناطق التنمية السياحية ،وتتميز تعتبر الموازنة العامة البرنامج المالي السنوي لتحقيق خطة التنمية االقتصادية واالجتماعية يعتبر المزارعون االميركيون المسألة كلها أمرا بينا ال يحتاج الى تأويل اعتبر المبلغ المذكور دليال على أهمية االستثمار في األسواق الصاعدة ،مشيرا اعتبر المدير العام لمؤسسة االسمنت ان اقبال شركات عربية لالستثمار في اعتبر في تصريحات الى «الحياة »ان الجانب المهم في برنامج التخصيص يحاكم مكفاي بتهمة تفجير مبنى اتحادي في اوكالهوما في نيسان (ابريل) يحاكم رمزي يوسف امام محكمة اتحادية في نيويورك. تحاكم محكمة تل أبيب اآلن رجل االعمال االسرائيلي ناحوم مانبار بتهمة يحاكم أربعة من المتهمين في القضية غيابيا. 
تحاكم تلك القيادات غيابيا بتهمة اعالن االنفصال وإشعال حرب ،1994وعقدت سيحاكم في وقت الحق هذا العام بتهمة تدبير تفجير مركز التجارة تحاكم الحكومات على أفعالها ال على كالمها المعسول ،وهذه فرصة لحكومة يحاكم العادلي وستة من مساعديه في قضية التحريض على قتل المتظاهرين يحاكمها أمام التاريخ لحساب الحقيقة ،وحساب االمة العربية التي تتكون من تحاكم سياسة إيران على هذا األساس ،معتبرة أن مواقف إيران غير علمت «الحياة »ان مشاوارات جادة يجريها البنك االهلي المصري لالتفاق على
context sentences علم من مصدر أوروبي ان اعضاء مكتب المفوض األوروبي المكلف بملف يعلم االستاذ هيكل جيدا ان الدول واالمم تعمل على تحقيق اهدافها علمت ان المنظمات غير الحكومية الفلسطينية تعاني تراجعا في الدعم الذي علم القارئ فيما سبق ان نيتي كانت منصرفة الى السفر الى علم االشارات والتحليل النفسي علمتني أن الحياة مقدسة علم مراسل األهرام ان 3من الموقعين علي البيان وهم مقداد نعلم أنها مؤثرة فى المدى القصير على المواطنين. نعلم كيف تستغل هذا االمر في محاولة لقلب االنظمة وتقليب الشعوب ينظم مركز القاهرة االقليمي للتحكيم التجاري الدولي مؤتمرا قانونيا دوليا حول نظم «مستشفى السالمة »في جدة مؤتمرا دوليا في فندق انتركونتيننتال جدة نظمت الشركة اللبنانية لتطوير واعادة اعمار وسط بيروت (سوليدير )اللقاء الدوري نظم خالل عام 1996اكثر من 3827معرضا ومؤتمرا شكلوا زيادة تنظم المؤتمر وزارة العدل في دولة الكويت بدعم من مؤسسة الكويت ينظم اتحاد غرف التجارة والصناعة والزراعة في البالد العربية ندوة عن نظمت شركة «فيترا انترناشيونال المحدودة »احدى الشركات الموردة الرئيسية للوازم البيع تنظم الندوة وزارة االقتصاد في الجمهورية البولندية بمشاركة وزارتي المال والنقل تنظم مفوضية الشؤون التجارية االسترالية (أوستريد )سلسلة من الندوات في الواليات تنظم الفاليك الشراعية أيضا رحالت قصيرة للنزهة في وسط النيل وقت ستنظم الحملة في مختلف األقاليم وتشمل الدعم بالوسائل السمعية والبصرية ،مثل نظم أحمد نظيف رئيس الوزراء مؤتمرا. 
واجه مشروع بناء المحطة الخامسة اعتراضات كبيرة سببها تخوف السكان الذين واجه المشروع منذ انطالقه قبل اكثر من 20عاما عراقيل كبيرة واجه فرعون في الواليات المتحدة ايضا تهمة شراء حصة في مصرف واجهت اسرائيل مقاومة متماسكة في رفض مخططاتها في االستيطان و«االسرلة »ونزع واجه العرب في تاريخهم تحديات عدة تمثلت في غزوات ،كان بعضها واجهت المحاوالت العراقية السابقة في غالبية الحاالت موقفا دوليا يقوم على واجه الرئيس األلباني صالح بريشا انشقاقا داخل الحزب الحاكم بزعامته ،بعدما واجه نتانياهو ضغوطا كبيرة من واشنطن واليسار االسرائيلي من جهة ومن واجهت الديبلوماسية السودانية لفترة طويلة صعوبات كبيرة في محاوالتها ترطيب العالقات واجهت محكمة البداية في صيرة (عدن )امس المتهم الخامس في قضية واجه السباق التقليدي الذي أقيم في دايتونا ارتفاع درجة الحرارة فياج يكره االسرائيليون تذكيرهم بأن جذور الدولة اليهودية ،التي تحتفل قريبا بالذكرى اكره الديموقراطية كثيرا .انها تكاد تقتلني »كتب لورنس الى الليدي اتوالين يكره الموريتاني عموما المبالغة في تحسين امر أو تقبيحه. تكره التلفزيون والمسرح التجاري والعمل مع المحترفين . يكره االتحاد االوروبي ايضا تلك الودائع التي يعتبرها وسيلة الخفاء دخل
context sentences أكره العمل الذي يجمد حركتي واشعر بأنني مقيدة خلف مكتب ،ولكن أكره األغنياء جدا لجشعهم الذي بال حد ،وسطوتهم في االستهالك وإعالء أمره من أجلهم تجارة الذهب والماس والعقارات والسيارات الفخمة والقصور ،والمالبس أكره باعة السعادة الكاذبة ومشتريها أيضا .كما أكره الحيتان وسمك القرش نكره العنف أساسا في التعامل إال عند الضرورة القصوى ،أي عندما كتب المفوض االوروبي للمنافسة كاريل فان مييرت الى الوزير النغ يقول كتبت صحيفة «اخبار» اليومية ان القرار اتخذه المجلس األعلى لالدارة قبل كتب الوزير قبل يومين من قرار رفع الحظر الى الشركات االميركية كتب السيد تامر وائل يسأل عن امكانات ارتباط بانترنت عن طريق كتبنا في مناسبات عدة ماضية عن «جدران النار »ووسائل حماية االطفال كتبت القصص في زمن الحرب اللبنانية فهي تحمل كثيرا من مآسيها، يكتب ما يرغب الجميع بكتابته . كتب رئيس الدولة االسرائيلية السابق مذكراته باالنكليزية كتب طه حسين عن هذه الواقعة في رثائه الحمد امين عام يفتح «بنك سبأ االسالمي »أبوابه اليوم السبت لتقديم خدماته الى الجمهور فتحت الحكومة امام القطاع الخاص المجاالت التي كانت حكرا على القطاع سيفتح باب االكتتاب في الفترة من 41الى 82تموز (يوليو) فتح هذا الكتاب عيون اسبانيا على األثر التاريخي الذي ال تمتلكه فتح معرض فرانكفورت الدولي ال 57ابوابه امس امام الصحافة في فتحت مصر في العام الماضي عددا كبيرا من المكاتب السياحية الجديدة فتح ابوابه الخميس ويستمر ستة ايام . تفتح في مختلف انحاء الجمهورية دورات كثيرة لقراءة القرآن وتعلم اللغة فتحت فاطمة هواري عينيها لتجد نفسها في احد مستشفيات بيروت هذه فتح لي الحبر أحضانه. يشجع على هذا التفاؤل بروز مصطلحات وتسميات. شجعت اللجنة جهود القطاع الخاص في البلدين لتأسيس شركات مشتركة في شجع التقرير الدول النامية على جعل اسواقها المالية اكثر جاذبية يشجع المغرب منذ فترة فكرة توسيع الشراكة والمبادالت التجارية بين القطاع اللجنة شجعت في البلدين جهود القطاع الخاص. 
شجعت دولة االمارات العربية المتحدة االستثمارات الخاصة في تكرير النفط تشجع الخطة األولى اقامة مشاريع شراكة بين مصانع كندية وأخرى محلية يشجعني الكثير من االخوان واالصدقاء القراء على تكرار الكتابة في االمور شجع يلتسن هذه االشاعات عندما وبخ وزيره المولج بالشؤون االقتصادية علنا شجعت االسعار المرتفعة وسياسات االصالح الحكومية على زيادة مساحات القمح في شجعت تجربة دمياط ثم الكرك صالح الدين على بدء مناوشاته ضد تمنى مجلس إدارة الجمعية في اجتماع عقد أول من أمس أن تمنى أال يفرض على التسهيالت الممنوحة في االقراض للمصارف ضمان من
context sentences تمنى السحيباني ان تولي المصارف المركزية في الدول العربية اهمية خاصة تمنى المعارضون ،على كل حال ،لو ان وثائق اوسلو نصت صراحة نتمنى أن نفكر بروية ومن دون تشنج في البحث عن حلول يتمنى الفنان احمد زكي ان يجسد شخصية الرئيس المصري حسني مبارك تمنى الصبي لو كان بامكانه رؤيتها على الحقيقة. اتمنى للرئيس الفلسطيني العمر كله ،ولكن أزيد ان السيد محمود عباس تمنى النائب جميل شماس كثرة أمثال المعلم حسيب يعملون العمار بلدهم تمنى السفير البريطاني في لبنان ديفيد ماكلينن «ان يوفر السالم الشامل تعمل في اليمن 18شركة اجنبية في مجال التنقيب عن النفط تعمل السعودية على تنفيذ خطة مداها 52سنة حتى 0202تهدف يعمل في المركز المقترح ،وفقا للدراسة ،عدد من اساتذة الجامعات والمهندسين يعمل في القطاع الصناعي ما يزيد على 200الف عامل او تعمل مصر حاليا على تجهيز بعض المناطق السياحة العالجية ،باالضافة الى تعمل المؤسسة حاليا على تنفيذ مشروعين على الساحل الشرقي. تعمل مقاصة لندن مع سوق تبادل عقود النفط الدولية وسوق العقود تعمل وزارة التموين الفلسطينية في مختلف المناطق الفلسطينية لمصادرة لبان «علكة» تعمل «هنت »حاليا في منطقة امتياز في محافظة مأرب. تعمل اسرائيل واألردن ايضا على التوصل الى اتفاق القامة منطقة للتجارة تتسلق النسوة العجائز هذا البعيد على درجات منحوتة في الصخر تسلقت البنات على التو أعلى الرفوف ،قابضات بحرص على برطمانات هشة تسلقت االمريكية الكسندرا ستيفنسون التي بلغت نصف نهائي بطولة ويمبلدون ،مرتبة تتسلق بنجاح القمم األعلى في خمس من قارات العالم السبع. أتسلق للصعود إلى كابينة السائق ،الجانب األول اعتدت عليه فى معظم يتسلقوا اآلثار لتلتقط لهم الصور الفوتوغرافية في هذا الوضع أو أن تسلق بلكونة المطعم وسرق منه مبلغ 900دينار من الكاش باإلضافة يتسلقون على أكتاف ذلك المبدع. تسلق لصوص سقالة في متحف تاريخ الفن في فيينا في ساعة يتسلق البناء السياسي الهرمي بتلقائية نظف الصحون في الحانات. 
تنظف الجلد عميقا وتزيل المواد السمية في الجلد واألكزيما وتبيض البقع تنظف المكتب وتقدم القهوة للضيوف وتستقبل المكالمات الهاتفية ،وتقوم بأعمال السكرتاريا تنظف وزارة التربية ومناهجها وجهازها التعليمي من مخلفات السيادة األيدولوجية المقيتة ينظف بيته من فضائح الرشاوى التي أضرت بسمعة االتحاد عموما وكرة تنظف نفسها من مسؤولية الجريمة اإلرهابية المستمرة في الخليل ينظف ألف صحن في ساعتين فيدخل سجل االرقام القياسية تنظف قلب ابنها وعقله من التعصب والكراهية ومن سبق تصنيف البشر ننظف بناياتنا العريقة ونجمل شوارعنا
context sentences تنظفوا ما أحدثتموه من فوضى". ضم الوفد ممثلين لعدد من الشركات االلمانية المتخصصة في مجاالت االدارة تضم اللجنة التأسيسية لشركة «اعمار العقارية »المساهمة العامة خمسة اعضاء هم تضم قائمة المدعوين الى حفلة الزفاف اليوم 57شخصا بينهم بربارة يضم مجلس االدارة ممثلين عن الجهات المساهمة في البرنامج تضم مجموعة المنتجات الجلدية مايسترشتوك الملف الكالسيكي الطراز سيضم الفندق ايضا مراكز استجمام تشمل حوض سباحة في الهواء الطلق يضم المشروع مسجدا وقسما لالدارة واالستقبال ومركزا تجاريا ومطعما ومركزا للخدمات يضم برنامج تنمية البنية السياحية االساسية للفيلبين مشاريع كبرى ضم الجناح االيطالي في المعرض 41شركة الى جانب البعثة التجارية يضم مركز دبي التجاري العالمي بعد افتتاح اعمال التوسعات سبع صاالت أغلقت اسعار االسهم في سنغافورة على تراجع بسبب تردي اسهم الشركات أغلق باب االكتتاب في الشركة الجديدة في 17آب (اغسطس )الماضي. أغلقت أمس بورصة بانكوك بسبب العطلة في تايالند أغلقت األسبوع الماضي كتب التعزية الرسمية في وفاة األميرة ديانا ،وذلك أغلقت قوات األمن الطرقات المؤدية الى معهد التجارة والمباني واالشغال العامة أغلق الرئيس حسني مبارك الباب امام امكان تقديم مصر دعما للسودان أغلقت اسرائيل الضفة الغربية وقطاع غزة عقب االنفجار أول من أمس أغلقت السلطات العراقية طريق رشيد في قلب بغداد حيث نفذ الهجوم أغلقت قوات االحتالل اإلسرائيلية يوم الخميس18/7/2002م الحرم اإلبراهيمي في مدينة الخليل أغلقت الحكومة المنشأة في كانون األول (ديسمبر )الماضي. مسح رئيس الدولة يديه ووجهه بالنفط ورفع يديه الى السماء ليحمد مسحت أيادي علماء آثار تونسيين طبقات الغبار عن معالم تاريخية عريقة تمسح الزجاج بالزفير. مسح جبهته بيده السليمة ثم نظر تجاهي بنظرة عدائية ،وقال بصوت تمسح الدمع عن عيني واألود امسح الغبار عن جدار القلب مسحت الدولة الصهيونية من ذاكرتها الظروف التي أدت الى بروز حماس تمسح من أذهان الناس حكاية ناد رياضي عريق حط فوقه «الصقر» مسحت األلوان الحمراء من البالطو والوجه. تمسح الفتات وتحضر شمعدانات مجددة بشموع ذات عبق غير مألوف. 
نقلت الوكالة عن مسؤول في الشركة الصينية قوله ان طاقة الكيبل نقلت صحيفة الجمهورية الحكومية امس عن رئيس الشركة فاضل الشهاوي قوله نقلت وزارة الخارجية االسرائيلية عن المراقبين االقتصاديين ان التجارة السنوية مع ينقل بريطانيا من موقع الشريك المتمرد الى موقع الشريك الفاعل تنقل المعلومات من أكاديميين خارج وداخل البلد إلى االطباء الممارسين بشكل فنقلت تلك المدرسة الى أم درمان لندرس فيها سنة واحدة.
context sentences نقلت وكالة «االناضول »التركية عن اذاعة تابعة لحزب بارزاني ان عشرة نقل الحصص الزائدة ،والملكيات الخاصة التابعة للجماعة إلى زعماء القبائل نقل مكاتبهم إلى الشوارع لمراقبة أعمال النظافة فيها نقل التكنولوجيا الهندية لمصر سمعنا أن هيئة تسويق النفط العراقية «سومو »لم تبلغ بعد الزبائن سمعت مقاطع من تلك المعزوفة من أحد كبار مسؤولي صناعة السيارات سمعت من محاضر في معرض اخير للكومبيوتر استضافته لندن أنه لو سمعت نقرة خفيفة على الباب سمعت أيضا مغنيا آخر يقول: سمعنا جميعا ،تحية الطلقة في نومنا. سمعت عن رجل يقول ان زوجته تقود حياة مزدوجة :حياته وحياتها. اسمع اهتزاز األسوار العتيقة بمرور شاحنة. سمعنا االسبوع الماضي أيضا ان مجلس ادارة أبل امضى االربعاء الماضي نسمع دائما من المحللين ان أبل في حاجة ماسة الى طرح سيلعن كل بنود الوثيقة بعد توقيعها. يلعن البعض العام 2010لصعوبته وقسوته وكثرة التحديات التي صحبها معه، لعن هللا الربا وآكله وموكله وكاتبه وشاهده وهم يعلمون، نلعن العطاء بين الشرفاء مقابل نظرة رضا من السياسيين واألثرياء. نلعن الظالم في السر أو نتعود العتمة في العلن؟ نلعن المقاومة والمقاومين ألنهم إرهابيون متطرفون تلعن الحداثة والتقانة وأوباءهما العديدة. نلعن مصر حين تفوز قطر بتنظيم المونديال ،نكاية فى هؤالء الذين لعناه وقطعنا اوصاله تقطيعا. خدم هذا المشروع قطاعات صناعية منها صناعة المواد الغذائية ،المنتجات الصحية تخدم شركات الكهرباء السعودية العشر قرابة ثالثة ماليين مشترك . يخدم اضطرارا في ثكنة مقفرة تخدم المجمع حاليا شبكة سكك حديد تؤمن نقل الفحم الحجري من يخدم مطار ابو ظبي نحو 3ماليين مسافر سنويا .ويشهد خدم هذا البرنامج في الماضي نظام التشغيل االساسي وال يزال االصدار خدم بنيامين نتانياهو العالم العربي من حيث يدري أو ال يدري،
Appendix (E) The selected sentences for the indirect object verbs
context sentences اعتمد الدكتور يوسف بطرس غالي وزير االقتصاد الميزانيات العمومية والحسابات الختامية تعتمد فكرة الحفار علي شفط االتربة الناتجة عن الحفر بالبنطة والحلزون اعتمد الرئيس حسني مبارك الحركة السنوية لرؤساء بعثات مصر الدبلوماسية بالخارج اعتمد مجلس اإلدارة رواتب الجهاز الفني يعتمد السيناريو الثاني ،علي مساعدة المؤتمر الوطني العراقي المعارض في السيطرة نعتمد في حياتنا العصرية كثيرا علي الوجبات الجاهزة والسريعة يعتمد اقتراح هاتش علي انهاء المحاكمة بلغة توضح أن تصويت مجلس صفر أحرزه 1 /جدة 1فاز الزمالك علي أهلي جدة فاز فيلم عمارة يعقوبيان بالجائزة الذهبية بمهرجان زيورخ السينمائي وفازت مصر بالمراكز الثالثة األولي للمسابقة فاز في لقاء الذهاب بالقاهرة فاز يونس بجائزة اليونسكو عن المشروع نفسه جوهر فاز في الماراثون فاز الفنان نور الشريف بجائزة افضل ممثل عن دوره في فيلم فاز الفنان يوسف شعبان بمنصب نقيب الممثلين فاز السويدي ماغنوس الرسون المصنف ثامنا على االسباني فرانشيسكو كالفيت فاز الكتاب بجائزة أحسن مترجم في األردن قضى معظم وقته حر يفكر في فيزياء نيوتن فطور قضى البرت اخر حياة في البحث عن نظرية الحقل قضى آخرون العيد امام شاشة التلفاز لمتابعة آخر التطورات قضى يسوع ثالثة سنوات مع هؤالء التالميذ سأله أحدهم قضى حياته معلما ومتعلما عام )أيار مايو 19قضى أنفاسه األخيرة في قضى ثالثين عاما من حياته تلميذا طالبا للعلم قضى بعملية التعقيم هذه على الطفيليات التي كانت تعيث قضى الخلفاء الراشدون المهديون أن من أغلق بابا أو 485قضى هللا بسقوط دولة آل عباد في سنة تمتد لتساؤالت عند مناقشة دور منظمات المجتمع المدني لدور النقابات العمالية يمتد حق الدفاع عن النفس إلي دول أخري غير الدولة المعتدي امتدت موجة االرتفاعات القياسية لتشمل قيم واحجام التداول وعدد الصفقات المنفذة يمتد هذا الصراع االن إلي منابع النيل وربما كانت هذه المرحلة يمتد الطموح الي ان تحين الفرصة في األمد القريب 204
context sentences امتد الحوار إلي مدي كفاية المفتشين علي شركات الوساطة امتد االحتجاج علي الموقف األمريكي من العراق إلي التنديد بازدواجية المعايير مناطق ومحافظات 7امتد تأثير انخفاض االمطار الي الزراعة الصيفية في !عقبال عندك :امتدت يده إلي التليفون وطلب رقما وبعد لحظات قال امتدت الدردشة حتي عرف البياتي أطراف حديث يوسف إدريس عن المرض امتدت معاصينا إلي حاضرنا فضقنا باآلخر وصادرنا فكره نبيل مهنا األستاذ .يمتد هذا التأثير الي الحيوانات كما يقول د امتد سوء السلوك في البعض في فن المخاطبة التي تسمع فيها يمتد المشروع من الجنوب عن طريق أبوسمبل العوينات وشماال حتي جنوب انقض المجلس علي ذلك وأخذ السيد أحمد المحروقي في تحصيل ذلك !ينقض المغامرون عليها وعليهم فال يترحم عليهم أحد ينقض جنراالت الجيش علي السلطه مرة أخري اذا شعروا أو أن انقض علي الملك فأكل عينيه ولسانه وأذنيه الذئاب الوحشية تنقض على الحضارة وتبيد األعداء انقض شعب رشيد علي جنود وضباط الحملة لحظة وضعهم أمتعتهم علي انقض محمد يوسف بقسوة وعنف علي قدم صبري في كرة مشتركة انقضوا موظفي االصالح الزراعي علي األموال التي آلت إلي مسئوليتهم منذ الشهور األولي لتطبيق الحكومة تنقض علي نقابة الصحفيين إن عصفور الكناريا حين يفاجأ بالغراب ينقض على عشه كان يقدم »المصرف العقاري«قدمت المصارف المتخصصة خدمات كبيرة للمواطنين ،ف تقدم مختلف المحالت المشاركة في المهرجان خصومات وأسعارا ال تصدق على أعداه »دومينيك مونكورتوا«و »هيدي مورافتز«ابتكارا جديدا من المبدعين »شانيل«تقدم ورقة عمل »انفستكورب«قدم السيد نمير قردار الرئيس التنفيذي لبنك صندوق ضمان«ستقدم السلطة الفلسطينية هذه القروض الى صندوق مقترح باسم قدمت الورقة ،في اطار الخطة الزراعية متوسطة المدى قدم وزير التجارة بدولة البحرين السيد علي صالح الصالح في جلسة االستثمارية ومقرها لندن االستشارة الى شركة كورال »كابيتال ترست«قدمت مجموعة مليون دوالر لحساب 85قدم البنك الدولي الى المغرب قرضا قيمته »مناجم الفوسفات«قرضا الى شركة »البنك العربي«قدم تجمع مصرفي بقيادة كل مرة أتمرد .تستسلم الحالة لشكلها الصريح ،أنا الذي حاولت ،لماذا وتابعت الصحيفة .استسلمت الزوجة بعدما استخدمت قوات األمن القنابل المسيلة للدموع استسلم زهير بخيت الذي يعول عليه االماراتيون كثيرا لرقابة عادل حسن استسلم فريق الصومالي بعدها لينجح رعد 
الجوهر باضافة الهدف الرابع الذي استسلم أبي في أسى ،وكريم وسالي استسلما أيضا ،فقال أشرف في استسلمت للصمت كما تعودت أن أستسلم للبالء فى الحجرة المظلمة فتمتم استسلم رمزى صالح لألمر الواقع ،وبدأ يجهز نفسه للرحيل نهاية الموسم، استسلم الزعيم لألسر كما سلم قبلها بالمنفى ،داوم بوبو على
context sentences واحتل قطاع الكهرباء .فصل التقرير حصة كل قطاع من هذا التمويل فصل االحصاء هذه الموجودات بأنها تتكون من موجودات من الذهب قيمتها تفصل وادي حضرموت هضبتا حضرموت الشمالية والجنوبية من الوسط ،وهي مسطحات ما »ملحمة الحرافيش«و »اوالد حارتنا«يفصل ما بين روايتي نجيب محفوظ يفصل الحدود الشمالية للمنطقة االقتصادية الحرة سياج مكهرب يمتد عشرات الكيلومترات 37تفصل بين الفريقين حاليا ثالث نقاط فقط اذ جمع اتلتيكو تفصل نقطة واحدة بين لاير مدريد وبرشلونة ومثلها بين برشلونة وديبورتيفو تفصل المحكمة في الدعوى على وجه السرعة وكلما اقتضى األمر تأجيل تمسك جواد بالمفردات المعمارية للمدينة التي يعرفها ومنها الهالل الذي تنبه تمسك النحات ومساعدون بالسقالة التي تتمايل اكثر من خمسة سنتيمترات في تمسك الشباك في وجه التيار تمسكت الدولة العبرية امس بموقفها الذي قال بار ايالن انه ينطوي لن يحصل اي"تمسك نتانياهو في محادثاته مع روس بموقفه بأنه تمسك مسؤولون امنيون فلسطينيون بأن االعتقاالت التي نفذتها السلطة الفلسطينية التي تمسكت الدول االفريقية بقرار قمة ياوندي الذي رشح الدكتور بطرس غالي تمسك إبراهيم حسن ،المنسق العام للفريق الكروى األول بنادى الزمالك تمسك بعض الزعماء العرب بشعارات االستقالل ،تجد اآلن آذان عربية صماء، التف اللبنانيون والسوريون بسرعة حول شعار االستقالل التام الذي اسقط الخالفات التفت الساق بالساق التف الحبل على البكرة بسبب سرعة اندفاعي صعودا وبقيت معلقا هناك تلتف حولي دروب المدينة التف بعض جماعات اليهود من فقراء الريف حول بركوخبا واشتبكت مع ولم .التفت حشود باكية حول السوق تبحث عن اقارب أو اصدقاء تلتف جولة نتانياهو على مشكلة ديبلوماسية كبرى تمر بها اسرائيل مع التف السكان حوله ،فكان أعظم تجمع شهدته ساحة القرية التف األردنيون حول الدولة نظاما ومؤسسات وبنى ،طوقوها بأهداب العين ،وكانوا انسحب الجيش االسرائيلي من قسم من الجنوب ولكن االحتالل بقي في بعد نزع سالحها إلى بلدة بغالن التي تبعد »طالبان«انسحبت قوات انسحب مسعود مع مقاتليه الى داخل وادي بنشير الذي فجر مدخله .انسحبت القوات االسرائيلية من الشوارع واتخذت مواقع اخرى فوق سطوح المنازل العفو«انسحب النائب عصام فارس من الجلسة النيابية امس احتجاجا على انسحب المحاميان أحمد األبيض وبدر باسنيد عضوا هيئة الدفاع عن المتهمين انسحب االسالميون 
في مجلس النواب األردني من جلسة المجلس مساء امس سيلتقي الوفد وزراء قطاع االعمال العام والمال والتجارة والتموين واالقتصاد والتعاون التقى الجميع للمرة األولى في منطقة المخيم التي تقع في وادي يلتقي رئيس الوزراء المصري الدكتور كمال الجنزوري اليوم اعضاء اللجنة الوزارية التقى كبير مفاوضي كوريا الجنوبية في وقت الحق مع الصحافيين وقال
context sentences سيلتقي الوفد بأعضاء اللجنة التنفيذية لمجلس االعمال السعودي االميركي والتي ستجتمع سيلتقي المسؤول الروماني كال من الدكتور كمال الجنزوري رئيس الوزراء ووزراء التقيت في جامعة حيفا بالدكتور جورج قنازع وهو محقق لعدد من تلتقي هذه الجهود بمشاريع بحثية اخرى منجزة في مجال الفكر العربي تلتقي األمين العام للمجلس األعلى المصري لآلثار الدكتور علي حسن أعضاء يلتقي القارئ بإشارتين فادحتين في بداية عالقته بهذه الرواية الممتعة التي التقيت بهذا الطيار ولم احقق معه ،أرسل الى مديرية االستخبارات الجوية، حصل أحمد حميد الطاير وزير المواصالت االماراتي بالوكالة ،في مؤتمر وزراء يحصل األردن على جزء من النفط العراقي بسعر خاص يقل عن حصل قطاع السكة الحديد عام 1996على استثمارات قدرت ب حصلت الجزائر امس على قرض من صندوق أبو ظبي للتنمية مقداره حصل خطأ من قبل االدارة السابقة في عدد محدود من العلب على شهادة نظام الجودة العالمية األيزو »الرضوان للهندسة والمقاوالت«حصلت شركة التي تنشط في أسواق المال واالستثمار على مقعد في »نومورا«حصلت حصلت الشركة الخليجية الدولية لالستثمار على ترخيص من السلطات الكويتية بالعمل يحصل زبون المصرف المشترك في الخدمة المصرفية االلكترونية على االشتراك مجانا حصل الجمهوريون ايضا على ما طلبوه ،أي على نصوص تشمل خفض تخلص حتى اآلن ثالث جمهوريات في االتحاد السوفياتي السابق من االسلحة تخلصت صحافة موسكو من مطرقة الرقابة التي كان يفرضها الحزب الشيوعي تخلص الدراسة اليابانية الى ضرورة اقامة نظام مماثل لوكالة الطاقة الدولية تخلصا على ما يبدو من هذه الحالة قررت يو ميري أخيرا الف لاير في عامها االول 58تخلصت من الطائرات المستأجرة وربحت باتوا اكثر .تخلص البيروتيون من السذاجة التي ،ربما ،تحتاجها المدن لتبقى تخلص الفريق اليمني من خطته الدفاعية في الشوط الثاني بحثا عن تخلص العازف من الشعور بسيطرة من حوله عليه ،وخلص إلى نفسه تخلص نجار فى حلوان من حياته ،صباح أمس األول ،بأن صنع تخلصت سيدة فى أطفيح بمحافظة حلوان من زوجها عن طريق توصيل .يرتبط مستشارو الرئيس بيل كلينتون بعالقة وثيقة مع مستشاري طوني بلير 800ترتبط ايران بعالقات جيدة مع روسيا التي وقعت اتفاقا قيمته مكتبا وشركة منتشرة في مختلف دول العالم12 ،يرتبط بالشبكة اآلن ارتبطت صورة أذربيجان على الدوام بصناعة النفط والغاز التي تطورت 
أنشطتها يرتبط حاليا توسع نشاط المصارف المغربية بنمو حركة الطلب على االستهالك يرتبط الطلب على الطاقة ارتباطا وثيقا بمستوى النشاط االقتصادي ومعدالت نموه بالجامعات في السعودية ممثلة في كليات الصيدلة في »الدوائية«ترتبط رالشركة ارتبط تدهور سعر الصرف في صورة كاملة بمسألة التضخم وموضوع سعر يرتبط تطوير خدمات سريعة لحجز المركبات وتسهيالت االقامة والترفيه باعتماد البطاقات .ارتبط ذلك القانون بالسياسات االشتراكية التي كانت مطبقة في ذلك الوقت
context sentences وافق مجلس ادارة صندوق التنمية الصناعية السعودي على منح الشركة قرضا في المئة من 50وافقت على زيادة رأس مال الشركة بنسبة وافق المجلس التنفيذي في صندوق النقد الدولي على البرنامج بعد دراسة 001وافق المجتمعون على ان يكون االنطالق برأس مال مقترح مقداره المانية لعالج -وافقت الحكومة االلمانية على انشاء جمعية طبية مصرية وافق األردن على مبدأ استقبال رحالت تقوم بها طائرات تابعة ل .وافق مجلس الوزراء األردني اول من أمس على تعديل اسعار المياه وافقت الجمعية العمومية التأسيسية في اجتماعها على تقرير اللجنة التأسيسية في وافق وزير الصناعة السوري احمد نظام الدين على اقتراح المدير العام وافق شارون على الخطة المصرية تخلى المارك عن المكاسب التي حققها في وقت سابق مقابل العمالت تخلى كابيال عن نزعته الماركسية وارثه اللوممبي ويقول انه يؤمن باقتصار تخلت عن فوازير رمضان لصالح المسلسل التاريخي تخلى الرئيس األلباني صالح بريشا عن عناده وعن ادعائه تخلت النيابة العامة االتحادية عن تهمة تهريب االسلحة بحق الشقيقين مشيرة ليؤسس الحزب الديموقراطي ويشارك 1990تخلى اخيرا عن عضويته في نهاية تخلى الرئيس اليوغوسالفي االتحادي سلوبودان ميلوشيفيتش عن المتشددين من صرب البوسنة، تخلى الوحدة في الشوط الثاني عن طريقته الدفاعية في الشوط األول من قبل» ،السياسة«الذي التزمته »الصمت«يتخلى هيكل في المقال عن تخلت الشرطة عن افتراض جريمة السرقة في تحقيقها في مقتل كود
Appendix (F)
The selected sentences for the ditransitive verbs
context sentences
يفرض دفتر الشروط ان يكون لدى الطرف الراغب بالتقدم بعرض رأس
تفرض روسيا حاليا رقابة مشددة على مشتريات الذهب والمعادن الثمينة الاخرى
فرضت السلطات البلدية منذ اربع سنوات ضريبة اقامة في الفنادق تتراوح
فرض مركز تحكيمي غير عربي
فرضت ادارة الرئيس بيل كلينتون قانونا جديدا عام 1995، حظرت بموجبه
فرضت عقوبات اميركية على شركتين كيماويتين صينيتين بسبب علاقتيهما المزعومة مع
فرضت اليابان فائدة هامشية على القروض الحكومية المقدمة بالين.
فرض مجلس الأمن دفع تعويضات على العراق بموجب الفصل السابع من
فرضت الولايات المتحدة في العام الماضي عقوبات على شركة «شيريت انترناشيونال»
كرس سادة كاستييا انفسهم لصناعة الحرب اليدوية وتربية الخراف. لم يعرفوا
كرس كوزبي مع زوجته وقتا كبيرا لمشاريع تتعلق بالتعليم والعائلة والثقافة.
يكرس الكاتب اكثر قصصه لنقل معاناة الناس في العراق، مع الحصار
يكرس الكاتب أيضا فصلا للحياة النيابية في العهد الاستقلالي فيشرح طريقة
كرس حياته لدراسة القرآن والحديث والنحو. وقد ذكر ابن خلكان مؤلفاته
يكرس التعديل ضوابط العمل الحزبي التي حددها قانون الاحزاب الصادر في
كرس الزعيم الاسلامي البالغ 47 عاما من عمره القسم الاول من
كرس خطابه للحديث عن قمة هلسنكي. يلتسن يؤكد حصوله على
كرس المؤتمر العالمي الذي عقد في طشقند في آب (اغسطس)
كرس حياتك للشعب
ورثت ادارة الرئيس كلنتون ذلك الموقف غير الحاسم، و بدا انها
ورث سليمان داوود وقال يا أيها الناس علمنا منطق الطير وأوتينا
ورثت غابة من الحمام والحب والانتظار
ورثت الحكومة الجديدة تركة ثقيلة من المشاكل الاقتصادية والاجتماعية والسياسية، وقد
ورث ياسين عن أبيه العمل فى نقل المحاصيل على المراكب الشراعية
ورث الرئيس السادات تركة الخوف القابع فى نفس المواطن عن جمال
ورث القرن العشرون الميراث البغيض للمد الاستعماري الإمبريالي في القرن التاسع
ورث أوباما عن الرئيس جورج دبليو. بوش ما يشبه قطعة من
ترث الحكومة الجديدة ارثا ثقيلا جدا على الصعيد الاقتصادي: تضخم وصل
ورثوا من الأسلاف كل كريهة وتستروا خوفا من الأتباع
ضم الوفد ممثلين لعدد من الشركات الالمانية المتخصصة في مجالات الادارة
يضم الوفد ممثلين لسوق الأوراق المالية والهيئة العامة للاستثمار ووزارتي الزراعة
تضم قائمة المدعوين الى حفلة الزفاف اليوم 57 شخصا بينهم بربارة
يضم الوفد عشرة من رجال الأعمال يعملون في قطاعات صناعية عدة
context sentences سيضم الفندق ايضا مراكز استجمام تشمل حوض سباحة في الهواء الطلق يضم المركز أحدث أجهزة الليزر لتأمين عالج األمراض الجلدية وازالة البقع تضم الجبيل الصناعية متنزهات ضخمة صممت على شكل غابات أخاذة من يضم المؤتمر كل شركات المالحة العالمية وشركات وأصحاب السفن والناقالت وهيئة تضم اذربيجان اكاديمية للعلوم وهناك نحو 821الف طالب جامعي وطالب يضم الفندق مقاهي ومطاعم تناسب األذواق والميول المختلفة بعضها داخلي وبعضها هنأ الرئيس حسني مبارك واالمين العام للجامعة العربية الدكتور عصمت عبدالمجيد هنأ المجلس خالل جلسته االسبوعية امس جوسبان بفوزه في االنتخابات وتكليفه هنأ أربكان النواب الذين صوتوا ضد مشروع القانون واعلن انه ينوي هنأ اتحاد الرابطات اللبنانية المسيحية الحكومة على «اعترافها بوجود مشكلة بعدما هنأ مدرب فيورنتينا كالوديو رانييري العبيه على عرضهم امام برشلونة «وقد هنأ مدرب فيورنتينا كالوديو رانييري العبيه على عرضهم امام برشلونة «وقد هنأ خادم الحرمين الشريفين الملك فهد بن عبدالعزيز الفريق البشير بانتخابه هنأ رئىس الجمهورية الياس الهراوي اللبنانيين بوحدتهم الوطنية التي تجلت خالل هنأ لبنان على تسليم الشريدي الى بون .واشنطن غمرت مياه البحر قسما من الميناء وسور المدينة الفينيقية التي كانت غمرت صدرك فوق صدري فارتوى غمر الصدفة في الماء ،غسلها فلمعت اكثر ،جففها وخبأها في ثوبه. 
غمرت موجة من االنباء السلبية الين امس االثنين في حين ظل غمرت السعادة العبي برشلونة بعدما قدموا كرة القدم الممتعة التي اعتادوا غمر الطوفان األرض ..وعمروها اللي بقيوا غمرت مياه عادمة أمس شوارع مدينة معان نتيجة تدفقها من أحد تبرعت السعودية بأكثر من 60مليون دوالر للمفوضية منذ العام ،1979 يتبرع غيتس والصحافيان اللذان ساعداه على الكتابة بإفهامكم ان جهاز الكومبيوتر تبرع خادم الحرمين الشريفين الملك فهد بن عبدالعزيز بمبلغ 01ماليين تبرع األمير الوليد بن طالل رئيس مجلس ادارة البنك المتحد بمبلغ تبرع الهالل األحمر المغربي بسيارتي إسعاف لبلدية "بيت أمر "الفلسطينية جنوب تبرع الشيخ مبارك الصباح بكتل من األخشاب لوضع الجندل عليها لبناء تبرع طاهر بقيمة الجائزة البالغة خمسة آالف دوالر الستحداث صندوق أردني تبرع أحد الحاضرين فأفهمه أنهم يضحكون بسبب قدوم دندوش إليه لكي تبرعت دول منظمة المؤتمر االسالمي بمبلغ 118مليون دوالر للدول االسيوية تبرع المهندس أشرف صبري بمبني مساحته الف متر م ومكون من يتنازل نجم الدين اربكان عن رئاسة الوزارة التركية لنائبته تانسو تشيلر تنازلت الشركة عن حقها في دعواها بعدما تعهدت وردة بتنفيذ العقد تنازلت الحكومة عن الحرب الجهادية -بعد كل انهر الدم وآالف تنازلت الصغيرة مرغمة عن رغبتها في شراء فستان أبيض يشبه فستان تنازل عبدالستار تميم ،والد المطربة اللبنانية الراحلة سوزان تميم ،عن الدعوى
context sentences تنازل منتصر الزيات ،المحامى ،عن البالغ الذى قدمه ضد الكاتب الصحفى تنازل سبعة مرشحين بمحافظة األقصر عن خوض االنتخابات المقبلة بدوائر المحافظة تنازلوا عن قسم من عقاراتهم لمصلحة شق الشوارع ،وبقيت على حالها تنازلت الحكومة عن ايرادات مستحقة لها سواء كان ذلك باالعفاء منها تنازل أحد التجار المعروفين بسمعتهم في السوق ،عن ثالثه ماليين ل.س تنازل الحسن إلى معاوية يكذبهم. أسندت مهمة االستشارة المالية للمشروع إلى مصرف «تشيس مانهاتن كوربوريشن »األميركي، يسند رأسه المثقل إلى الحائط أسند اليه رأسي أسندت حقيبة البيئة الى ممثلة لحزب الخضر هي دومينيك فواني. أسند الى فيكتور راؤول كاستيلو منصب رئيس المحكمة العليا الذي كان أسند االتحاد اآلسيوي مهمة قيادة المباراة لطاقم حكام كويتي مكون من أسندت الدكتورة جيهان رأسها على الحائط البالي القديم لثوان بال إرادية.. أسندت الحكومة االفغانية لوالدها منصبا لكنه كان مراقبا طوال الوقت ،وهكذا أسندت قيادة القوات الفيديرالية في الشيشان الى الجنرال فياتشيسالف تيخوميروف من أسند الكباريتي حقيبة التربية والتعليم لمنذر المصري ،والعدل للنائب عبدالكريم الدغمي، يمرر الجهاز كتلة النيوترون داخل الحقيبة فتبعثرها وتبدد محتوياتها وتمكن المكشاف مرر ليوناردو كرة رائعة الى روبرتو كارلوس الذي عكسها عرضية باتجاه مرر محمد علي كرة عالية واجهها حسن سعيد برأسه وابعدها الحارس مررت بمحل لبيع العصافير الملونة يالصق المتجر الصغير ...حملقت في االقفاص مرر روبرتو كارلوس كرة بينية الى مواطنه سافيو الذي كسر التسلل مررت على محل الخباز الذى أتعامل معه ،فوجئت بالسيدة تنظر لى مررت يدها على النقود وطلبت اسمي وفعلت كما فعلت مع زميلي مررت على رسول هللا صلى هللا عليه و سلم وفي إزاري وضع البروفسور تقديراته انطالقا من نتائج بحوث ميدانية أجراها بالتعاون مع وضعت التجارة الحدودية بين العراق وتركيا األمم المتحدة في موقف حرج وضع بنك التسويات الدولية عام 1995المرحلة الثانية من اللوائح التي وضع المؤسسون هدفا لهم يتلخص في تخفيف المعاناة عن أهل القدس وضعت وزارة التجارة الخارجية في مصر خطة للمشاركة في معارض عدة وضعت الشركة خطة عمل للبدء في تسوية األرض وتجهيز الخدمات األساسية وضع برنامج االمم المتحدة لالنماء تقريرا مفصال عن عمله في مجال وضع هاموند أوال تصميم طاولة قهوة على شكل بيانو 
مصغر ،اي وضع برنامج لعمل المنظمات االهلية العربية للسنوات الخمس المقبلة. حنان الشيخ فنجان القهوة على المائدة في شقتها في لندن نقلت صحيفة الجمهورية الحكومية امس عن رئيس الشركة فاضل الشهاوي قوله نقلت «اوابك »في تقريرها عن مسؤولين سعوديين ان «ارامكو »ستستثمر نحو نقلت وزارة الخارجية االسرائيلية عن المراقبين االقتصاديين ان التجارة السنوية مع
context sentences نقل أمس السفير القطري في الرياض علي بن عبدهللا المحمود رسالة ينقلك من قريته «عين عنوب »في لبنان الى حديقة منزله في قد نقلت هذه المعدات خصيصا من لندن وستزود الصالة 3آالف مقعد. قد نقل المهاجرون النمسويون تقاليد هذه المقاهي الى جميع انحاء العالم. قد نقل كريستوفر موافقة رابين الى القيادة السورية وتلقاها منه في 18 أضاف التقرير ان الشركة عملت على المحافظة على حصتها من السوق أضاف المصرف 14جهازا للصرف اآللي ليصل مجموعها الى 104أجهزة. أضاف كوستيلو في مؤتمر صحافي «أعلن اليوم الغاء سياسة االعمدة الستة». أضاف عبدالكريم ،في لقاء صحافي ،ان هذه المؤشرات ال تدع مجاال يضيف نظام ستابيليتراك مظهرين جديدين الى النظام المتكامل للتحكم بالشاسيه أضاف في بيان تلقته «الحياة »ان المؤتمر الذي تنظمه غرفة تجارة أضاف المتحدث باسم الشركة قائال« :انهت الشركة اآلن 51عشر عاما ربح «قطب الزيتون »الحاكم بزعامة رئيس الوزراء رومانو برودي ،المعركة في ربحت الدكتورة سعاد الصباح القضية التي رفعتها على جريدة «اندبندنت »وقبلت ربحت اسرائيل الحرب وتفوقت في السلم والعلم واالقتصاد. ربحت حماس سياسيا عندما تركت حمامة السالم التاريخية ،شمعون بيريز ربحت الزميلة غادة عبدالحافظ ،فى أقل من ساعتين70 ،جنيها من ربح المعارك دون وجود إستراتيجية واضحة .وقد التصق هذا االنطباع ،ولعقود اتهمت مجموعة طبية فلسطينية السلطة الوطنية بتعطيل مشروع صحي كبير اتهم االشتراكيون تحالف يمين الوسط الحاكم بفرض اجراءات تقشف فظيعة للوفاء اتهم السفير البلجيكي في لوكسمبورغ بودوان دو ريهوف مصارف لوكسمبورغية بغسل يتهم نائب ميسيسيبي السابق بتلقي هدايا مثل اغراض شخصية وبطاقات لحضور اتهمت اسرائيل امس الثلثاء مسؤولين فلسطيينين بان لهم عالقة بتجارة السيارات اتهم المسروجي بعض المتنفذين في نقابة الصيادلة الفلسطينيين بالوقوف وراء الحملة اتهم ضابط الشرطة السابق مارك فرنهام خصوصا بوضع القفاز الملوث بالدم يتهم الغرب الصين بأنها تضمر نوايا شريرة ذات طبيعة توسعية تجاه اتهمت بعثة فرنسية بتشويه اآلثار فأحيلت الى التحقيق اتهم رئيس الوزراء البريطاني جون ميجور الجيش الجمهوري االيرلندي بالوقوف وراء يربط المراقبون حاليا بين احتمال نمو العالقات التجارية وتحسن العالقات الفلسطينية ربط المصدر بين شطب تلك المادة التي دعمها بعض اعضاء المجلس ربط الوزير بين هذا 
المشروع وبين خط انابيب «سوميد »الذي ينقل تربط هذه الشركات ارتفاع أسعار الغاز محليا بارتفاع أسعاره عالميا .وتؤكد ربط الرحالت الدولية بالمناطق النائية ،واعتبرت ان التعاون القائم بين شركة ربط بلغاريا وتركيا بكيبل بصري لالتصاالت الهاتفية يربط ابن خلدون بين زوال ملك الدولة االيوبية بضعف العصبية االيوبية ربط دين االسالم االربريين بشكل غير مباشر بالعرب وثقافتهم على رغم ربطت تطوير الشراكة المتوسطية بمصير السالم في الشرق االوسط .وشددت على
context sentences خصصت الشركة مليوني جنيه استرليني للوحات والتصميم العام ،إال أن التنفيذ خصص البنك الدولي قرضا قيمته 300مليون دوالر للنهوض بالخدمات الصحية خصص في سيناء شرق قناة السويس مشروع للخدمات الزراعية والتنمية الريفية سيخصص قسم من االستثمارات لتحديث الرصيف التجاري لميناء بنزرت وإصالح كاسرة خصصت المجلس جزءا من مداوالته لمناقشة سياسات المصرف وتوجهاته المستقبلية ومنها خصص األوروبيون استثمارات للمركز قدرت بنحو 24مليون دوالر .وأظهرت دراسة خصصت شركة ال .جي الكترونيكس مبلغا يقدر بنحو 6.38بليون لاير خصصت الشركات الراعية للمهرجان الكثير من الجوائز المالية والعينية يتقدمها الذهب خصص المؤلف فصال خاصا لرموزية الدم وفعل التضاد الذي يالزمه .والحال خصصت الكاتبة الفصل الثالث لدرس موقف المعتزلة من النظريات الشيعية في بررت الحكومة حاجتها الى تطوير شبكة الكهرباء باعتماد 22مليون دوالر بررت وزارة االقتصاد والتجارة عدم مشاركتها في االجتماع المقترح لضبط األسعار بررت المصارف خطوتها بارتفاع اسعار الفوائد على الودائع في صورة ملحوظة يبرر نواب رغبتهم في التفاهم المسبق ب «ضيق الوقت »لدرس ملفات برر عزيز في لقاء مباشر مع الري كينغ في شبكة التلفزة تبرر المقاطعة معارضة االنتخابات بأن انتخابات رئاسية وبرلمانية سابقة زورت ،وبرفض خصصت تونس استثمارات قيمتها 62مليون دينار خصصت تونس استثمارات بقيمة 575مليون دوالر لتحديث قطاع التأهيل والتدريب خصصت شركة «الثريا »ثالث فرق حاليا الجراء مفاوضات مع المساهمين المحتملين خصصت 280مليون دوالر للتدريب المهني خالل 5سنوات . يخصصان لمناقشة التقرير السنوي والبحث في زيادة رأس المال . خصصت المادة العاشرة من البروتوكول للسياحة في مناطق الحكم الذاتي الفلسطيني خصصت العراق 609ماليين دوالر الستيراد مواد غذائية بمقتضى المرحلة الثانية سيخصص المؤتمر احدى جلساته الستعراض فرص االستثمار في مصر خصصت جلسات أول من أمس للنظر في مصادر تمويل االستثمارات. 
خصص الفصل الخامس لنقد أبرز الدراسات الجامعية في الموضوع وفيه نقد أجبرت التطورات في باكستان أمس رئيس الدولة على االستقالة أجبر اآلباء اوالدهم على االمتناع عن تناول وجبات الطعام التي تقدم أجبر هذا الهدف مدرب الوحدة على التخلي عن الطريقة الدفاعية التي أجبر انتخاب نتانياهو الملك حسين على تكثيف جهود اليجاد توازن في اجبرت ويلز المانيا في المجموعة األخيرة على خوض السباق حتى أمتاره اجبر لكوريون على المغامرة في الهجوم لكنهم لم يشكلوا خطورة كبيرة أجبر الحادث األخير أكثر من ألف عائلة مسيحية على مغادرة العراق أجبر انفجار فقاعة العقارات دبي على التعامل مع ديونها المتراكمة التي أجبر نجم ميامي دووين وايد على اللعب على رغم اصابته في تجبر العروس في قبيلة -جوبيس -األفريقية على ثقب لسانها ليلة زفافها يسمح امتداد الخيمة الطويل باستيعاب 03ألف زائر دفعة واحدة نظرا
context sentences تسمح القناة التي بنيت بعرض 05مترا بنقل المياه من نهر تسمح التعليمات الجديدة بتحويل اي مبالغ من حسابات المقيمين بالعمالت االجنبية تسمح التعليمات الجديدة للمصارف األردنية بتحويل اثمان البضائع والسلع المستوردة الى يسمح النظام الجديد ،وفقا لمحافظ مؤسسة النقد المصارف السعودية ،بالقيام بالمدفوعات سيسمح القانون الجديد للجهة باعتماد ضرائب كانت من اختصاصات االدارات المركزية سمحت المادة العاشرة للطرفين األردني واالسرائيلي فرض اية موانع او محددات يسمح للعراق اآلن بتصدير ما قيمته ملياري دوالر من النفط الخام سمحت السلطات االسرائيلية ل 22شاحنة أردنية محملة باالسمنت بعبور جسر يسمح القانون المغربي بإشراك القطاع الخاص في مشاريع البنى التحتية في أعاد فوز لوبن الي األذهان كل ذكريات الحرب العالمية أعاد بناء نفسه اقتصاديا وسياسيا ليتخذ المكانة التي يستحقها أعاد المحاولة ،ولكن دون فائدة ..شعر أن أعاد السماعة إلى مكانها .رن جرس أعاد الكرة بالخطأ لتصل لناجي مجرشي الذي سددها بقوة أعاد ألحكام محاكمنا العليا تقليدها القديم باستخدام االلفاظ االجنبية والتي حرمنا أعادت تأكيد موقفها الصريح والقاطع ،ضد كل محاوالت التطبيع مع العدو نعيد دراستها يا سيادة الرئيس أعادني الي الحديث عن اتحاد الكتاب وانتخاباته ،وزوبعة الحديث عن أهمية أعاد نيتانياهو نفس المشكلة مع كلينتون.
Appendix (G)
Normalization rules
; RuleSet:(MX)MA_work_normalization, (MX)MA_work_normalization
("/\d+/",%y)("/[^\s\d]+/",^"ـ",%x):=(%y)(" ")(%x);
("/\d+/",%w)("ـ",%x):=(%w)(%x,"-");
("/(|)ُ| ِ| َ| ّ| ْ||ٍ||ٌ|ـ/",%x):=;
(" ",%x)(" ",%y):=(%y);
({""|""|""}):=;
(SHEAD,%w)("-",%x):=(%w);
(^"/\d+/",%w)("-",%x):=-(%x);
({"+"|"·"|"†"|"•"|""|"”"|"•"|">"|""ة%z,POD);
(%y,Rertieved,^N)(%z,POD):=(%y,?N)(%z,POD);
(V,TSTI,%x)(NP,%y)({NP|JP},%z)({^PP|STAIL},%w):=(?TSTD,?[%x])(%y)(%z)(%w);
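The effect of the first normalization rules of Appendix (G) can be illustrated outside the grammar formalism. The following Python sketch is only an approximation written for illustration and is not part of the thesis grammar. It assumes that the rules (a) turn a tatweel that follows a number into a hyphen, (b) insert a space between a number and a word attached to it, (c) delete short-vowel diacritics and the tatweel, and (d) collapse repeated spaces. The function name `normalize`, the Unicode range chosen for the diacritics, and the restriction of step (b) to letters are assumptions of this sketch.

```python
import re

# Assumed diacritic set: tanwin, short vowels, shadda, sukun (U+064B..U+0652)
# plus the tatweel (U+0640). The thesis rule lists these marks individually.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(text):
    """Approximate the first normalization rules of Appendix (G)."""
    # (a) number followed by tatweel: the tatweel becomes a hyphen
    text = re.sub(r"(\d+)\u0640", r"\1-", text)
    # (b) number directly followed by a letter: insert a space
    #     (restricted to letters here so that decimals such as 6.38 survive)
    text = re.sub(r"(\d)(?=[^\W\d_])", r"\1 ", text)
    # (c) delete diacritics and any remaining tatweel
    text = DIACRITICS.sub("", text)
    # (d) collapse runs of spaces into a single space
    text = re.sub(r" {2,}", " ", text)
    return text


if __name__ == "__main__":
    print(normalize("47\u0639\u0627\u0645\u0627"))
```

Step (a) is applied before step (c) so that a tatweel attached to a number is converted rather than deleted, which mirrors the order in which the two rules appear in the rule set.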
Appendix (I)
Default Dictionary
; User:Marwa Saber (Marwa)
; Date:05.01.2015 at 08:51:32
[ ]{}""(PUT=BLK);
[‾]{}""(PUT=oline);
[_]{}""(PUT=UNDERSCORE) ;
[-]{}""(PUT=HYPHEN);
[–]{}""(PUT=ndash);
[—]{}""(PUT=mdash);
[,]{}""(PUT=COMMA);
[;]{}""(PUT=SEMICOLON);
[:]{}""(PUT=COLON);
[!]{}""(PUT=EMARK);
[¡]{}""(PUT=iexcl);
[?]{}""(PUT=QMARK);
[؟]{}""(PUT=AQMARK);
[‽]{}""(PUT=IBANG);
[⸘]{}""(PUT=IIBANG);
[…]{}""(PUT=hellip);
[.]{}""(PUT=PERIOD);
[·]{}""(PUT=middot);
[']{}""(PUT=APOSTROPHE);
[‘]{}""(PUT=lsquo);
[’]{}""(PUT=rsquo);
[‚]{}""(PUT=sbquo);
[‹]{}""(PUT=lsaquo);
[›]{}""(PUT=rsaquo);
["]{}""(PUT=QUOTE);
[“]{}""(PUT=ldquo);
[”]{}""(PUT=rdquo);
[„]{}""(PUT=bdquo);
[«]{}""(PUT=laquo);
[»]{}""(PUT=raquo);
[(]{}""(PUT=OPARENTHESIS);
[)]{}""(PUT=CPARENTHESIS);
[[]{}""(PUT=OSBRACKET);
[]]{}""(PUT=CSBRACKET);
[{]{}""(PUT=OCBRACE);
[}]{}""(PUT=CCBRACE);
[@]{}""(PUT=AT);
[*]{}""(PUT=ASTERISK);
[/]{}""(PUT=FSLASH);
[\]{}""(PUT=BSLASH);
[&]{}""(PUT=amp);
[#]{}""(PUT=HASH);
[%]{}""(PUT=PERCENTAGE);
[‰]{}""(PUT=permil);
[†]{}""(PUT=dagger);
[‡]{}""(PUT=Dagger);
[`]{}""(PUT=GRAVE);
[´]{}""(PUT=ACUTE);
[^]{}""(PUT=CIRCUMFLEX);
[¯]{}""(PUT=macr);
[¨]{}""(PUT=uml);
[¸]{}""(PUT=cedil);
[¶]{}""(PUT=para);
[°]{}""(PUT=deg);
[←]{}""(PUT=larr);
[→]{}""(PUT=rarr);
[↑]{}""(PUT=uarr);
[↓]{}""(PUT=darr);
[÷]{}""(PUT=divide);
[×]{}""(PUT=times);
[]{}""(PUT=gt);
[|]{}""(PUT=VERTICALBAR) );
[¦]{}""(PUT=brvbar);
[~]{}""(PUT=TILDE) );
[♠]{}""(PUT=spades);
[♣]{}""(PUT=clubs);
[♥]{}""(PUT=hearts);
[♦]{}""(PUT=diams);
[¤]{}""(PUT=curren);
[$]{}""(PUT=DOLLAR);
[£]{}""(PUT=pound);
[¥]{}""(PUT=yen);
[€]{}""(PUT=EURO);
[؛]{}""(PUT=SEMICOLON);
[ـ]{}""(PUT=HYPHEN);
[،]{}""(PUT=COMMA);
[//]{}""(PUT=BSLASH);
[((]{}""(PUT=OPARENTHESIS);
[))]{}""(PUT=CPARENTHESIS);
[/(?i)(http\:\/\/[^ ]+)/]{}""(TEMP,LEX=N,POS=PPN,URL);
[/(?i)(https\:\/\/[^ ]+)/]{}""(TEMP,LEX=N,POS=PPN,URL);
[/(?i)(ftp\:\/\/[^ ]+)/]{}""(TEMP,LEX=N,POS=PPN,URL);
[/(?i)(www\.[^ ]+)/]{}""(TEMP,LEX=N,POS=PPN,URL);
[/(?i)([^ ]+@[^ ]+)/]{}""(TEMP,LEX=N,POS=PPN,EMAIL);
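The last five entries of the default dictionary recognize URLs and e-mail addresses by regular expression and tag them as proper nouns. The behaviour of these patterns can be checked with the short Python sketch below. The regular expressions are taken from the dictionary entries, with only the unnecessary backslash escapes dropped. The function name `classify`, the requirement that the pattern match the whole token, and the plain string labels are assumptions made for this illustration and do not belong to the dictionary format.

```python
import re

# Patterns copied from the dictionary entries; order matters, because a URL
# that happens to contain "@" should still be labelled as a URL.
PATTERNS = [
    (re.compile(r"(?i)(http://[^ ]+)"), "URL"),
    (re.compile(r"(?i)(https://[^ ]+)"), "URL"),
    (re.compile(r"(?i)(ftp://[^ ]+)"), "URL"),
    (re.compile(r"(?i)(www\.[^ ]+)"), "URL"),
    (re.compile(r"(?i)([^ ]+@[^ ]+)"), "EMAIL"),
]


def classify(token):
    """Return 'URL', 'EMAIL', or None for a single space-delimited token."""
    for pattern, label in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return None


if __name__ == "__main__":
    for tok in ["http://example.com", "user@example.org", "WWW.Example.com"]:
        print(tok, classify(tok))
```

Because every pattern relies on `[^ ]+`, a token ends at the first space, so trailing punctuation attached to an address would be included in the match.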
بسم الله الرحمن الرحيم
الحمد لله الذي هدانا لهذا وما كنا لنهتدي لولا أن هدانا الله
صدق الله العظيم