Copyright © 2012 American Scientific Publishers All rights reserved Printed in the United States of America

Journal of Advanced Mathematics and Applications Vol. 1, 275–286, 2012

The Cognitive Complexity of Texts and Its Formal Measurement in Cognitive Linguistics

Yingxu Wang1∗, Robert C. Berwick2, and Xiangfeng Luo3

1 International Institute of Cognitive Informatics and Cognitive Computing (ICIC), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N 1N4, Canada
2 Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 Department of Computer Science, Shanghai University, Shanghai 200072, China

Keywords: Cognitive Informatics, Cognitive Linguistics, Text Comprehension, Text Complexity, Cognitive Complexity, Syntactical Complexity, Semantic Complexity, Knowledge Engineering, Denotational Mathematics, Concept Algebra, Cognitive Computing, Computing With Words.

1. INTRODUCTION Languages are gifted and collective inventions of human beings for thought, expression, communication, as well as knowledge representation and acquisition (Bickerton, 1990; Botha and Knight, 2009; Casti and Karlqvist, 1986; Cavalli-Sforza, 2000; Chomsky, 1982, 2007; Gleason, 1997; Wang, 2009a; Wang and Berwick, 2012; Zadeh, 1999). Linguistics is the science of natural and symbolic languages (Chomsky, 1956; Crystal, 1987; O’Grady and Archibald, 2000; Pattee, 1986; Pullman, 1997; Wardhaugh, 2006). Cognitive linguistics is an emerging discipline that studies the cognitive properties of natural languages and the cognitive models of linguistics in cognitive computing and computational intelligence (Eugene, 1996; Evans and Green, 2006; Langlotz and Andreas, 2006; Taylor, 2002; Wang, 2013; Wang and Berwick, 2012). ∗

Author to whom correspondence should be addressed.


It was conventionally perceived that the properties of natural languages were non-quantifiable and that their mathematical modeling posed challenging problems in cognitive linguistics and computational linguistics. Studies on the formal syntax and semantics of natural languages can be traced back to the works of Alfred Tarski (1944) and Noam Chomsky (1956). Chomsky formalized a general language framework known as the universal grammar (Chomsky, 1956, 1957, 1959, 1962, 1965, 1982, 2007). Eugene et al. proposed the concept of cognitive linguistics, which attempts to explain the cognitive processes of language and knowledge acquisition, storage, production, and comprehension (Eugene, 1996; Wang and Berwick, 2012). Wang and his colleagues introduced the deductive grammar (Wang, 2009a), deductive semantics (Wang, 2010b), and their denotational mathematical means (Wang, 2008a) in cognitive linguistics and cognitive informatics (Wang, 2002, 2003, 2007a, b, 2009b, c, 2010a; Wang and Wang, 2006; Wang et al., 2006, 2009a, b, 2011b).


doi:10.1166/jama.2012.1022


RESEARCH ARTICLE

In order to rigorously measure the cognitive complexity of texts in natural languages, a formal metric is developed that provides a fundamental measure of the properties of syntax and semantics in textual comprehension, processing, and search. The objective and subjective aspects of text comprehension and their complexity are formally modeled. A formal language model is introduced that characterizes the discourse of natural languages. The mathematical models of the cognitive complexity of texts and their properties are rigorously described at the sentence, paragraph, and essay levels from the bottom up. On the basis of the cognitive and mathematical models of cognitive linguistics, the measurement of the cognitive complexity of texts is quantitatively established and tested in a set of case studies. A wide range of applications of the measurement of textual complexity are identified in cognitive linguistics and contemporary web technologies such as cognitive search engines, online document retrieval, natural language processing, cognitive machine learning, computing with words, and cognitive computing.


A concept algebra-based technology (Wang, 2008b, 2010c) has recently been developed for rigorous semantic modeling and analysis as well as machine learning (Tian et al., 2011; Wang et al., 2011a). Then, new paradigms of denotational mathematics (Wang, 2008a), known as inference algebra and semantic algebra, were created for machine reasoning based on language knowledge bases (Wang, 2011a, b, 2013). Semantic analysis and comprehension constitute a deductive cognitive process. In cognitive psychology, comprehension


is described as the construction of an internal representation based on knowledge previously gained in the brain (Matlin, 1998; Polya, 1957). Both Matlin and Polya identified the subjectivity of comprehension, i.e., its dependency on the comprehender's background knowledge: whatever one tries to comprehend, one relies on one's previously acquired knowledge. According to cognitive informatics (Wang, 2010a, 2003, 2007b) and the Object-Attribute-Relation (OAR) model (Wang, 2007c) for internal knowledge representation, the semantics of a

Yingxu Wang is professor of denotational mathematics, cognitive informatics, software science, and brain science. He is President of the International Institute of Cognitive Informatics and Cognitive Computing (ICIC), Director of the Laboratory for Cognitive Informatics and Cognitive Computing, and Director of the Laboratory for Denotational Mathematics and Software Science at the University of Calgary. He is a founding Fellow of ICIC, a Fellow of WIF (UK), a P.Eng. of Canada, and a Senior Member of IEEE and ACM. He received a Ph.D. in Software Engineering from Nottingham Trent University, UK, and a B.Sc. in Electrical Engineering from Shanghai Tiedao University. He has had industrial experience since 1972 and has been a full professor since 1994. He was a visiting professor on sabbatical leave at the Computing Laboratory of Oxford University (1995), the Department of Computer Science at Stanford University (2008), the Berkeley Initiative in Soft Computing (BISC) Lab at the University of California, Berkeley (2008), and CSAIL at MIT (2012). He is the founder and steering committee chair of the annual IEEE International Conference on Cognitive Informatics and Cognitive Computing (ICCI*CC). He is Editor-in-Chief of the Journal of Advanced Mathematics and Applications (JAMA), and Associate Editor of IEEE Trans. on Systems, Man, and Cybernetics (Part A). Dr.
Wang is the initiator of a number of cutting-edge research fields and subject areas such as Denotational Mathematics (i.e., concept algebra, inference algebra, semantic algebra, process algebra, system algebra, granular algebra, and visual semantic algebra); Cognitive Informatics (the theoretical framework of cognitive informatics, neuroinformatics, neurocomputing, the layered reference model of the brain (LRMB), the mathematical model of consciousness, and the cognitive learning engine); Abstract Intelligence (αI; mathematical models of intelligence and theoretical foundations of brain science); Cognitive Computing (such as cognitive computers, cognitive robots, cognitive agents, and the cognitive Internet); Software Science (on unified mathematical models and laws of software, the cognitive complexity of software, automatic code generators, the coordinative work organization theory, and built-in tests (BITs)); and basic studies in Cognitive Linguistics (such as the cognitive linguistic framework, formal semantics of linguistics, the mathematical model of abstract languages, the deductive grammar of English, and the cognitive complexity of text comprehension). He has published over 140 peer-reviewed journal papers, more than 230 peer-reviewed conference papers, and 25 books in denotational mathematics, cognitive informatics, cognitive computing, software science, and computational intelligence. He is the recipient of dozens of international awards for academic leadership, outstanding contributions, research achievement, best papers, and teaching over the last three decades. Robert C. Berwick is professor of computational linguistics. Professor Berwick and his research group investigate computational models of language acquisition, language processing, and language change within the context of machine learning, modern grammatical theory, and mathematical models of dynamical systems.
In the area of machine learning and language, the lab uses the minimum description length (MDL) proposal, updated to incorporate Vapnik's notion of both structural and empirical risk minimization, to induce models from naturalistic parent-child language examples such as the CHILDES corpus. They use this to test explicit hypotheses about the nature and rate of child language development within the context of current linguistic theories, and across multiple languages such as English, Dutch, French, and German. By parameterizing Chomsky's current linguistic theory into a set of approximately two dozen modules, the lab has implemented a Prolog system that can be rapidly switched among several dozen languages simply by substituting a new lexicon or dictionary. This computer model can be used to predict and test current linguistic theories with respect to their psycholinguistic fidelity and their logical adequacy. Further, this same model can be viewed as a formal account of both language change over time and language acquisition.


Xiangfeng Luo is a professor in the School of Computers, Shanghai University, China. Currently, he is a visiting professor at Purdue University. He received his master's and Ph.D. degrees from the Hefei University of Technology in 2000 and 2003, respectively. He was a postdoctoral researcher with the China Knowledge Grid Research Group, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), from 2003 to 2005. His main research interests include Web Wisdom, Cognitive Informatics, and Text Understanding. He has authored or co-authored more than 50 publications, which have appeared in IEEE Trans. on Automation Science and Engineering, IEEE Trans. on Systems, Man, and Cybernetics-Part C, IEEE Trans. on Learning Technology, etc. He has served as Guest Editor of ACM Transactions on Intelligent Systems and Technology. Dr. Luo has also served on the committees of a number of conferences and workshops, including as Program Co-chair of ICWL 2010 (Shanghai), WISM 2012 (Chengdu), and CTUW 2011 (Sydney), and as a PC member of more than 40 conferences and workshops.

sentence in a natural language is considered understood only when: (a) the logical relations of the parts of the sentence are clarified; and (b) all parts of the sentence are reduced to terminal entities, which are either real-world images or primitive abstract concepts.


2. THE DEDUCTIVE SYNTAX AND SEMANTICS FOR TEXT COMPREHENSION

The structural and semantic properties of natural languages are described in this section based on a generic language model as the discourse of cognitive linguistics and computational linguistics. Then, the formal models of the deductive syntax and semantics are established, and the relations and interactions between the syntax and semantics of natural languages are explained.

2.1. The Cognitive Linguistic Model of Natural Languages

A formal model of a general language can be described as an abstract language (Wang and Berwick, 2012), which forms the discourse of cognitive linguistics.

Definition 1. The abstract language, ℒ, is a 5-tuple, i.e.:

$$\mathcal{L} \triangleq (\mathbb{A}, \mathbb{M}, \mathbb{W}, \Psi, \Theta) \quad (1)$$

where
• 𝔸 is the alphabet of the language as a finite ordered set of letters ℓᵢ, 1 ≤ i ≤ n, i.e., 𝔸 ≜ {ℓ₁, ℓ₂, …, ℓᵢ, …, ℓₙ};
• 𝕄, the morphology, is a set of lexical rules between power sets of letters in the given alphabet of ℒ, i.e., 𝕄 ≜ Þ𝔸 × Þ𝔸;


In cognitive informatics, cognitive science, AI, and computational intelligence, comprehension is identified as the ability to understand something, which indicates an intelligent power of abstract thought and reasoning in humans or intelligent systems. It is of particular interest to explore the internal process of comprehension in the brain and to explain its basic mechanisms, because comprehension is one of the fundamental processes of the brain at the advanced cognition layer according to the Layered Reference Model of the Brain (LRMB) (Wang et al., 2006). The studies on semantic analysis and text comprehension in cognitive linguistics lead to the analysis of a fundamental problem known as the cognitive complexity of texts and its relation to the syntactic complexity and the semantic complexity of natural languages. On the basis of the formal models and theories of the cognitive complexity of texts, a wide range of problems in cognitive linguistics may be explained. For instance: What is the formal model of the cognitive complexity of texts? What are the roles of, and interactions between, syntax and semantics in text comprehension? Why may the Internet have significantly reduced the cognitive complexity of texts? What threshold may be used to distinguish between a simple and a complex essay? How can the cognitive complexity of texts be reduced for certain levels of readers? This paper presents a quantitative measurement of the cognitive complexity of texts towards the formal explanation of the aforementioned problems. In the remainder of the paper, both objective and subjective aspects of text cognition and their complexity are formally modeled for cognitive linguistics and cognitive computing. Section 2 introduces a cognitive linguistic model of texts. Section 3 analyzes the basic properties of the cognitive complexity of

texts in natural languages. Section 4 creates a set of mathematical models of cognitive complexity for text comprehension, in which the metrics of text cognitive complexity are rigorously modeled and measured at the sentence, paragraph, and essay levels from the bottom up. Section 5 quantifies the measurement of text cognitive complexity in comprehensive case studies. A wide range of applications of the measurement of text complexity have been identified in online, web-based, and electronic document comprehension and processing for both humans and cognitive machines in cognitive linguistics and cognitive computing.


• 𝕎 is the lexis, or vocabulary, a finite nonempty ordered set of words w over 𝔸. A subset of 𝕎 is known as the primitive words 𝕎⁰, 𝕎⁰ ⊂ 𝕎, which directly represent real-world entities, proper names, meta-behaviors, and abstract concepts that cannot be further deduced onto more primitive concepts or behaviors;
• Ψ is the syntax, a set of relational rules between power sets of words in the given vocabulary of ℒ, i.e., Ψ ≜ Þ𝕎 × Þ𝕎;
• Θ is the semantics, a set of mappings from a power set of syntactic relations onto a power set of terminal syntactic relations Ψ⁰, which can be reduced to relations between the Cartesian products of nonterminal words 𝕎 and terminal words 𝕎⁰, i.e., Θ ≜ ÞΨ → ÞΨ⁰ = (Þ𝕎 × Þ𝕎) → (Þ𝕎⁰ × Þ𝕎⁰), Ψ⁰ ⊆ Ψ.

The formal model of the abstract language ℒ indicates that any natural language can be built on the basis of the five essences known as the alphabet, morphology, lexis (vocabulary), syntax, and semantics. On the basis of ℒ, the properties of natural languages can be rigorously modeled and analyzed in the following subsections.
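The 5-tuple of Definition 1 can be sketched as a data structure. The following minimal Python sketch uses plain sets as stand-ins for the power-set constructs Þ𝔸 and Þ𝕎; the toy alphabet, words, and rules are illustrative assumptions, not examples from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# A minimal sketch of the abstract language L = (A, M, W, Psi, Theta).
@dataclass
class AbstractLanguage:
    alphabet: FrozenSet[str]          # A: finite set of letters
    morphology: Set[Tuple[str, str]]  # M: lexical rules over letter strings
    lexis: FrozenSet[str]             # W: the vocabulary of words
    primitives: FrozenSet[str]        # W0, a subset of W: terminal (primitive) words
    syntax: Set[Tuple[str, str]]      # Psi: relational rules between words

    def is_terminal(self, word: str) -> bool:
        """A word is terminal when it cannot be reduced further (w in W0)."""
        return word in self.primitives

# A toy two-word language:
L = AbstractLanguage(
    alphabet=frozenset("catruns"),
    morphology={("run", "runs")},
    lexis=frozenset({"cat", "runs"}),
    primitives=frozenset({"cat"}),
    syntax={("cat", "runs")},
)
print(L.is_terminal("cat"))   # True
print(L.is_terminal("runs"))  # False
```

A semantic reduction, in this view, repeatedly rewrites nonterminal words until only members of the primitive set remain.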

2.2. The Deductive Syntax of Cognitive Linguistics

A formal syntax is an abstract model of the syntax system in ℒ in which concrete strings of tokens and their grammatical relations are symbolically represented and rigorously analyzed. The syntactic rules that underlie natural languages form the domains of formal linguistics and grammars. One of the most influential linguistic frameworks, the theory of universal grammar (UG), was proposed by Noam Chomsky (Chomsky, 1957, 2007). UG is a system of categories, mechanisms, and constraints shared by natural languages, which is perceived as innate based on neurolinguistic and cognitive linguistic studies (Chomsky, 1982; O'Grady and Archibald, 2000; Taylor, 2002).

Definition 2. The mathematical model of syntax, Ψ(ℒ), in language ℒ is a Cartesian product of the word sets W, i.e.:

$$\Psi(\mathcal{L}) \triangleq W \times W \quad (2)$$

where W is a power set of words with legal strings of the alphabet in the language, W ⊂ Þ𝔸, and Ψ ⊂ ÞW.

2.3. The Deductive Semantics of Cognitive Linguistics

Semantics is a domain of linguistics that studies the interpretation of words and sentences and the analysis of their meanings. In linguistics, semantics deals with how the meaning of a sentence in a language is obtained and comprehended. Studies on semantics explore the mechanisms of language understanding and the nature of meaning, where syntactic structures play an important role in the interpretation of sentences and in the intension and extension of word meanings (Chomsky, 2007; Taylor, 2002; Wang, 2009a, 2010b, c). A formal semantics is a mathematical model for denoting the meanings of symbols, notations, concepts, functions, and behaviors, as well as their relations, which can be reduced onto a set of predefined entities and/or known concepts.

Definition 3. The mathematical model of semantics, Θ(ℒ), in language ℒ is a function that maps the set of syntactic relations Ψ into a set of terminal syntactic relations Ψ⁰ over terminal words W⁰, i.e.:

$$\Theta(\mathcal{L}) \triangleq \Psi \to \Psi^0 = (W \times W) \to (W^0 \times W^0) \quad (3)$$

where W is a power set of words with legal strings of the alphabet in the language, W ⊂ Þ𝔸; and W⁰ is a subset of W as a power set of terminal words, W⁰ ⊂ W ⊂ Þ𝔸.

In cognitive linguistics, the state space of semantics in a natural language can be modeled as follows.

Definition 4. The cognitive model of semantics of natural languages, Θ(S), is a 5-tuple, i.e.:

$$\Theta(S) \triangleq (J, V, O, S, T) \quad (4)$$

where
• J is the subject of the sentence;
• V is a behavior or action;
• O is the object of the sentence;
• S is the space where the action occurs;
• T is the time when the action occurs.

2.4. Relations Between Syntax and Semantics in Cognitive Linguistics

According to Definitions 1 through 4, the relationship between a language and its syntax and semantics can be illustrated as shown in Figure 1 (Wang, 2009a), where the text T is expressed in one dimension (1-D), the syntax in 2-D, and the semantics in 5-D. Figure 1 explains that linguistic analysis is a deductive process that maps the 1-D text into the 5-D semantics via the 2-D syntactic analyses.

[Figure 1. The universal language processing (ULP) model: the 1-D text T is mapped via the 2-D syntax Ψ onto the 5-D semantics Θ = (J, V, O, S, T).]
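Under Definition 4, the semantics of a sentence occupies the 5-D space (J, V, O, S, T). A minimal sketch of that tuple follows; the example sentence and its decomposition are illustrative assumptions, not taken from the paper's case studies.

```python
from typing import NamedTuple, Optional

# A sketch of Definition 4: the semantics of a sentence as the 5-tuple
# Theta(S) = (J, V, O, S, T).
class Semantics(NamedTuple):
    subject: str            # J: the subject of the sentence
    verb: str               # V: a behavior or action
    obj: Optional[str]      # O: the object of the sentence
    space: Optional[str]    # S: where the action occurs
    time: Optional[str]     # T: when the action occurs

# "The student reads a paper in the library at noon."
s = Semantics(subject="the student", verb="reads", obj="a paper",
              space="in the library", time="at noon")
print(s.verb)           # reads
print(len(s._fields))   # 5, i.e., the 5-D semantic space
```

Optional fields reflect that a meta-sentence or simple sentence may leave O, S, or T unspecified.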


Theorem 1. The semantics of a sentence in a natural language is comprehended iff: (a) the syntactic relations of the parts of speech in the sentence are clarified; and (b) the semantics of all elements of the sentence are reduced to terminals W⁰.

The cognitive informatics model of comprehension is a cognitive process at the advanced cognition layer according to the LRMB model of the brain (Wang et al., 2006) and the object-attribute-relation (OAR) model of internal knowledge representation (Wang, 2007c). The cognitive process of comprehension can be conceptually modeled in the following steps:
(a) Search relations from real entities to virtual entities and/or existing objects and attributes.
(b) Build a partial or adequate OAR model of the entity.
(c) Wrap up the sub-OAR model by classifying and connecting it to appropriate clusters of the entire OAR in long-term memory.
(d) Memorize the new OAR model and its connections in long-term memory.

The cognitive process of comprehension is supported by low-level processes such as those of search, memorization, and knowledge representation. A rigorous description of the cognitive process of comprehension can be found in (Wang and Gafurov, 2010).
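The four comprehension steps (a) through (d) above can be sketched schematically. The dictionary-based stand-in for the OAR model below is an illustrative assumption, not the formal OAR model of (Wang, 2007c); all entity names are invented for the example.

```python
# A schematic sketch of the comprehension steps (a)-(d), with long-term
# memory (ltm) modeled as a mapping from objects to their attribute sets.
def comprehend(entity, attributes, ltm):
    """Build a sub-OAR model for `entity` and connect it into long-term memory."""
    # (a) search relations from the entity to existing objects and attributes
    related = [obj for obj, attrs in ltm.items() if attrs & set(attributes)]
    # (b) build a partial or adequate OAR model of the entity
    sub_oar = {"object": entity, "attributes": set(attributes), "relations": related}
    # (c) wrap up: connect the sub-OAR to related clusters of the entire OAR
    for obj in related:
        ltm[obj].add(entity)
    # (d) memorize the new OAR model in long-term memory
    ltm[entity] = sub_oar["attributes"]
    return sub_oar

ltm = {"bird": {"flies", "animal"}, "fish": {"swims", "animal"}}
model = comprehend("penguin", ["swims", "animal"], ltm)
print(sorted(model["relations"]))  # ['bird', 'fish']
```

An unsuccessful internal search (empty `relations`) would correspond to the external searches of Table I in the following section.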

3. BASIC PROPERTIES OF THE COGNITIVE COMPLEXITY OF TEXTS

Based on the cognitive linguistic models of deductive syntax and semantics in the universal linguistic discourse developed in the preceding section, the basic properties of the cognitive complexity of texts can be explained and analyzed. The cognitive complexity of a text and of its comprehension depends on both its structural and denotational characteristics, i.e., the syntactic and semantic properties of the text. According to cognitive informatics, the former can be measured by the structural complexity of the text, while the latter can be measured by the number of semantic reductions carried out during text comprehension.

Lemma 1. The cognitive complexity of texts, C, is proportional to both the syntactic complexity C_Ψ, dependent on the structural properties of the given text, and the semantic complexity C_Θ, in terms of the number of semantic reductions required to search the meaning of each word in the given text, i.e.:

$$C \triangleq f(C_\Psi, C_\Theta) \quad (5)$$

According to Lemma 1, the cognitive complexity of texts can be reduced to the analysis of the semantic complexity of the elements in a given text and the syntactic complexity of composing them into the structure of the sentence and those beyond. Therefore, both the semantic and syntactic complexities of texts are rigorously described in the following before the entire cognitive complexity is formally explained.

Definition 5. The semantic complexity of a sentence, C_Θ, is the sum of the cognitive weights, wᵢ, 1 ≤ i ≤ n, of each of its n words when they are deduced into a known concept or behavior in the given linguistic knowledge base, i.e.:

$$C_\Theta \triangleq \sum_{i=1}^{n} w_i, \; n = \#W = \sum_{i=1}^{\#W} w_i \; [\&] \quad (6)$$

where the unit of the semantic complexity is a single cognitive weight (&) that represents an internal semantic reduction of a terminal word.

Definition 6. The cognitive weight of semantic reduction, w, in semantic analysis is the comprehension cost for searching or mapping a word in the linguistic knowledge base, as shown in Table I.

Table I. The cognitive weight (w) for semantic reduction.

No. | Category   | Scope of search             | Cognitive weight (w)
1   | Internal   | The reader's knowledge base | 1
2   | External   | On-line                     | 2
3   |            | Dictionary                  | 4
4   |            | Library                     | 6
5   |            | Expert                      | 8
6   |            | A comprehensive study       | 10
7   | Exhaustive | Incomprehension             | ∞

According to Table I, the deductive search for the semantic reduction of a word can be divided into two categories known as internal and external search. The former is a search in a reader's internal knowledge base, which results in a unit cost, w = 1, when the search and comprehension of the word are successful. The latter is a search in external knowledge bases such as an on-line system, a dictionary, a library, an expert, or a comprehensive study, with increasing costs for each type of search as given in Table I. A possible negative outcome of semantic analysis is an exhaustive search that results in incomprehension with w = ∞. As modeled by the cognitive weights for text comprehension, a reader or a cognitive system first searches the internal knowledge base in order to minimize the semantic analysis cost. However, if an internal search is unsatisfied, an external search must be conducted at higher cost. The cognitive weights also indicate that the subjective properties of individual readers towards the same text can be quantitatively recognized and represented.

Definition 7. The syntactic complexity of a sentence, C_Ψ, is the cognitive cost of composing the individual semantics of the words in the sentence into a comprehended semantic relation denoted by the syntactic rules of the sentence, which is classified into five categories as follows:

$$C_\Psi \triangleq \begin{cases}
1, & S_1 \triangleq V & \text{// a meta-sentence} \\
2, & S_2 \triangleq J + V & \text{// a simple sentence} \\
3, & S_3 \triangleq J + V + O & \text{// a complete sentence} \\
4, & S_4 \triangleq JP + VP + NP & \text{// a complex sentence with phrases} \\
5, & S_5 \triangleq S(S) & \text{// a complex sentence with clauses}
\end{cases} \quad (7)$$

where the unit of the syntactic complexity is a single composition [P] of the semantics of a single element in a meta-sentence that consists only of a verb or an interjection. The types of sentences described in Definition 7 can be illustrated in Figure 2, where the structural influence on the semantic composition of sentence elements is demonstrated. According to statistics, most natural and practical sentences composed in the generic patterns given in Figure 2 fall into categories 4 or 5, i.e., the complex sentence structures with phrases and clauses, in order to convey precise expressions.

According to Lemma 1 as well as Definitions 5 through 7, a quantitative measure of the cognitive complexity of texts can be derived as follows.

Definition 8. The cognitive complexity of a text, C, is the product of its syntactic complexity C_Ψ [P] and semantic complexity C_Θ [&], i.e.:

$$C \triangleq f(C_\Psi, C_\Theta) = C_\Psi \bullet C_\Theta = C_\Psi \bullet \sum_{i=1}^{\#W} w_i \; [P\&] \quad (8)$$

where the unit of the cognitive complexity is called a composed cognitive weight (P&).

Definition 9. The physical meaning of a unit of composed cognitive weight, P&, denotes a single semantic weight (&) under a meta-composition (P) with the simplest structure of a meta-sentence, i.e.:

$$1\,P\& = 1\,P \bullet 1\,\& \quad (9)$$

On the basis of the generic model of the cognitive complexity of texts as developed in Definitions 8 and 9, the cognitive complexities at different levels of syntactic structure, such as those of the sentence, paragraph, and essay, can be rigorously analyzed from the bottom up in the following section.

4. MATHEMATICAL MODELS OF THE COGNITIVE COMPLEXITY OF TEXTS

The cognitive complexity of text comprehension can be modeled and measured at the sentence, paragraph, and essay levels from the bottom up. Then, the metrics of text cognitive complexity are rigorously described by a set of mathematical models in the following subsections.

4.1. Cognitive Complexity of Texts at the Sentence Level

Definition 10. The comprehension level of a text, k, k ∈ [0, 1], is determined by the extent of an individual reader's ability to understand the general materials (k_g) and professional materials (k_p) in the given text, as classified in Table II, i.e.:

$$k = \frac{1}{2}(k_g + k_p), \quad k \in [0, 1] \quad (10)$$

The typical comprehension levels of readers can be classified into four categories known as insufficient (I = 20%), basic (B = 60%), medium (M = 80%), and advanced (A = 100%). The combinations of these comprehension levels (k) indicate an individual reader's average comprehension ability for a given text.

Definition 11. The cognitive complexity of a sentence, C(S), is the product of the syntactic complexity and the semantic complexity of the sentence S as derived from Eq. (8), i.e.:

$$\begin{aligned}
C(S) &\triangleq C_\Psi \bullet C_\Theta = C_\Psi \bullet \sum_{i=1}^{\#W} w_i \\
     &= C_\Psi \bullet \left[ \sum_{i=1}^{k\#W} w_i + \sum_{j=1}^{(1-k)\#W} w_j \right], \qquad w_i \equiv 1 \; (\text{internal}), \; 2 \le w_j \le \infty \; (\text{external}) \\
     &= C_\Psi \bullet \left[ k\#W + \sum_{j=1}^{(1-k)\#W} w_j \right] \; [P\&]
\end{aligned} \quad (11)$$

where k, k ∈ [0, 1], is the comprehension level of the given text for a reader as determined by Eq. (10), which divides the semantic analyses into k#W internal searches and (1 − k)#W external searches. The model of sentence complexity given in Definition 11 explains how the comprehension capability of a specific reader is quantitatively measured, which can represent individual differences on the same sentence due to varying comprehension abilities and levels of knowledge.

Corollary 1. When the average cognitive weight of external search difficulty is a constant, i.e., w_j = n, 2 ≤ n ≤ ∞, the sentence cognitive complexity reduces to the following:

$$\begin{aligned}
C(S) &= C_\Psi \bullet C_\Theta = C_\Psi \bullet \sum_{i=1}^{\#W} w_i \\
     &= C_\Psi \bullet \left[ k\#W + \sum_{j=1}^{(1-k)\#W} w_j \right] \\
     &= C_\Psi \bullet \left[ k\#W + n(1-k)\#W \right], \qquad w_j = n \\
     &= C_\Psi \bullet \big( n + (1-n)k \big)\#W \; [P\&]
\end{aligned} \quad (12)$$

Corollary 1 indicates that the higher the comprehension level k, the lower the cognitive complexity C of a text, and vice versa.

[Figure 2. The generic syntactic structure in DGE (Wang, 2009a), composed of determiner phrases (DP), adjective phrases (AP), verb phrases (VP), noun phrases (NP), and complement phrases (CP).]

Theorem 2. The order of the cognitive complexity of texts, O(C), is constrained by the following limits:

$$\#W \le O(C) \le \infty \quad (13)$$

where the lower limit represents full comprehension, while the upper limit represents incomprehension after exhausted external searches for semantic reduction.

Proof. Theorem 2 can be directly proven by using Eq. (12), i.e.:

$$\begin{cases}
O_{\min}(C) = \lim\limits_{k \to 1} \big( n + (1-n)k \big)\#W = \#W & \text{// full comprehension} \\
O_{\max}(C) = \lim\limits_{k \to 0} \big( n + (1-n)k \big)\#W = n\#W \to \infty & \text{// incomprehension}
\end{cases} \quad (14)$$

Table II. The comprehension level (k) of a reader's knowledge bases (KBs).

No. | Level of general KB (k_g) | Level of professional KB (k_p) | k = ½(k_g + k_p)
1   | I | I | 0.2
2   | I | B | 0.4
3   | I | M | 0.5
4   | I | A | 0.6
5   | B | I | 0.4
6   | B | B | 0.6
7   | B | M | 0.7
8   | B | A | 0.8
9   | M | I | 0.5
10  | M | B | 0.7
11  | M | M | 0.8
12  | M | A | 0.9
13  | A | I | 0.6
14  | A | B | 0.8
15  | A | M | 0.9
16  | A | A | 1.0

Weights: insufficient I = 20%; basic B = 60%; medium M = 80%; advanced A = 100%.

The cognitive models of text comprehension and its complexity are applicable to both human and machine information processing systems such as intelligent search engines, word and knowledge processors, and cognitive robots.

4.2. Cognitive Complexity of Texts at the Paragraph Level

The semantic relations of sentences, •, are a set of connectors that formally model the operations on phrase and sentence compositions and their joint meaning in structures beyond the sentence.

Definition 12. A paragraph, Π, in language ℒ is a composition of multiple sentences S towards a coherent theme of expression, a behavioral process, and/or the synthesis of a structure, i.e.:

$$\Pi \triangleq S \bullet S \mid S(S) \mid (S,)^+ \mid (S;)^+ \quad (15)$$

where • is a set of primitive structural relations between sentences as summarized in Table III, together with a set of coordinatives and correlatives; and (S,)⁺ and (S;)⁺ denote that sequential sentences separated by "," or ";", respectively, are repeated at least once.

The primitive compositional operations of sentences in a paragraph Π, •, can be classified as the sequential, parallel, and embedded structures, and their combinations known as the hybrid structures.
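The sentence-level measures developed above (Definitions 5 to 11 and Corollary 1) can be sketched as a short program, assuming the Table I weights and a constant external weight n as in Corollary 1. The example words, sentence category, and reader levels are illustrative, not drawn from the paper's case studies.

```python
# Table I cognitive weights for semantic reduction (finite cases only;
# the exhaustive case w = infinity is omitted from this sketch).
WEIGHTS = {"internal": 1, "online": 2, "dictionary": 4,
           "library": 6, "expert": 8, "study": 10}

def semantic_complexity(searches):
    """C_Theta: the sum of cognitive weights w_i over a sentence's words [&]."""
    return sum(WEIGHTS[s] for s in searches)

def sentence_complexity(c_psi, num_words, k, n_ext=2):
    """Eq. (12): C(S) = C_Psi * (n + (1 - n)k) * #W [P&], for comprehension
    level k in [0, 1] and constant external search weight n_ext."""
    return c_psi * (n_ext + (1 - n_ext) * k) * num_words

# Four words found internally plus one dictionary lookup:
print(semantic_complexity(["internal"] * 4 + ["dictionary"]))  # 8

# A complete sentence (category 3, C_Psi = 3) of 6 words:
print(sentence_complexity(3, 6, k=1.0))  # 18.0 (advanced reader: C_Psi * #W)
print(sentence_complexity(3, 6, k=0.6))  # about 25.2 (weaker reader)
```

The two prints of `sentence_complexity` reproduce the limits of Theorem 2: as k approaches 1 the complexity falls to C_Ψ · #W, and as k falls the external term dominates.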

Table III. The compositional structures of sentences in paragraphs.

No. | Relation   | Symbol | Syntactic structure        | Mathematical model              | Description
1   | Sequential | →      | S1 → S2 → … → Sn           | Π_s = S1 → S2 → … → Sn          | Typical conjunctions. Logical sequence: and, then, therefore, thus, further, moreover, furthermore. Time sequence: first, second, …, finally.
2   | Parallel   | ∥      | S1 ∥ S2 ∥ … ∥ Sn           | Π_p = S1 ∥ S2 ∥ … ∥ Sn          | Typical conjunctions. Logical parallel: and, or, however, but, nevertheless. Time parallel: at the same time, alternatively, when, while, during, simultaneously.
3   | Embedded   | ( )    | S1(S2)                     | Π_e = S1(S2)                    | A clause introduced by that, which, if, whether, who, whom, where, because, …
4   | Hybrid     | mixed  | S0(S1 → (S2 ∥ S3))         | Π_h = S0(S1 → (S2 ∥ S3))        | Combination of the above three categories of primitive compositional operations.

The syntactic structures, descriptions, and mathematical models of the primitive sentence structures are summarized in Table III. In addition to the structural relations of sentences given in Table III, there are functional relations between sentences such as those of causation (therefore, thus, as a result of, …), extension (also, in addition to, in particular, especially, …), explanation (for example, for instance, …), and condition (if … then, …). However, from a structural point of view, the functional relations are sequential compositions of sentences.

Definition 13. The cognitive complexity of a paragraph Π with the sequential structure of sentences, C_s(Π), is the sum of the cognitive complexities of all individual sentences, i.e.:

$$C_s(\Pi) \triangleq \sum_{i=1}^{m} C(S_i), \; m = \#\Pi = \sum_{i=1}^{\#\Pi} C(S_i) \; [P\&] \quad (16)$$

Definition 14. The cognitive complexity of a paragraph Π with the parallel structure of sentences, C_p(Π), is the sum of the cognitive complexities of all individual sentences multiplied by the number of sentences, n, in the parallel structure, i.e.:

$$C_p(\Pi) \triangleq n \sum_{i=1}^{n} C(S_i) \; [P\&] \quad (17)$$

where n represents the synchronization cost in parallel semantic analysis.

Definition 15. The cognitive complexity of a paragraph Π with the embedded structure of sentences, C_e(Π), is the sum of the cognitive complexities of all embedded sentences multiplied by that of the outer-layer sentence(s), i.e.:

$$C_e(\Pi) \triangleq C(S_0) \bullet \sum_{i=1}^{n} C_e(S_i) \; [P\&] \quad (18)$$

A paragraph with a hybrid structure formed by a combination of the three primitive structures can be treated as the sum of its separate parts, each decomposed into one of the primitive structures.

Definition 16. The cognitive complexity of a paragraph Π with the hybrid structure of sentences, C_h(Π), is the sum of the cognitive complexities in the three primitive forms of compositional structures, i.e.:

$$C_h(\Pi) = C_s(\Pi) + C_p(\Pi) + C_e(\Pi) = \sum_{i=1}^{n_s} C(S_i) + n_p \sum_{j=1}^{n_p} C(S_j) + C(S_0) \bullet \sum_{i=1}^{n_e} C_e(S_i) \; [P\&] \quad (19)$$

4.3. Cognitive Complexity of Texts at the Essay Level The structure and compositional relations of paragraphs in an essay (.) can be treated as a sequential structure among all paragraphs. Therefore, the measurement of the cognitive complexity of essays can be derived as the sum of all its paragraphs. J. Adv. Math. Appl. 1, 275–286, 2012
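The paragraph-level measures in Eqs. (16) through (19) can be sketched in executable form. The following Python fragment is an illustrative sketch only: the function names and list-based inputs are this sketch's own encoding, not part of the formal model, and each function takes the pre-computed cognitive complexities C(S_i) of the individual sentences as plain numbers.

```python
# Illustrative sketch of Eqs. (16)-(19); names and encodings are ours.

def c_sequential(cs):
    """Eq. (16): sum of the sentence complexities C(S_i), in [P&]."""
    return sum(cs)

def c_parallel(cs):
    """Eq. (17): sum of sentence complexities multiplied by the number
    of parallel sentences n, modeling the synchronization cost."""
    return len(cs) * sum(cs)

def c_embedded(c_outer, cs_inner):
    """Eq. (18): complexity of the outer-layer sentence C(S0)
    multiplied by the sum of the embedded sentence complexities."""
    return c_outer * sum(cs_inner)

def c_hybrid(cs_seq, cs_par, c_outer, cs_inner):
    """Eq. (19): sum of the three primitive compositional forms."""
    return (c_sequential(cs_seq)
            + c_parallel(cs_par)
            + c_embedded(c_outer, cs_inner))

print(c_sequential([10.0, 20.0]))      # 30.0
print(c_parallel([10.0, 20.0]))        # 60.0
print(c_embedded(5.0, [10.0, 20.0]))   # 150.0
```

Note how the parallel and embedded forms amplify the plain sum, reflecting the extra cognitive cost of synchronizing or nesting sentences.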


Definition 17. The cognitive complexity of an essay ε with sequential paragraph structures, C(ε), is the sum of the cognitive complexities of its p paragraphs, i.e.:

    C(ε) ≜ Σ_{i=1}^{p = #(ε)} C(Π_i)   [P&]                                     (20)

Definition 18. The average cognitive complexity per word in an essay or lower structures, r(ε), is the ratio between the cognitive complexity of the text, C(ε), and the total number of words in the text, #W, i.e.:

    r(ε) = C(ε) [P&] / #W [W]                                                   (21)

The measurement framework of the cognitive complexity of texts, systematically developed at the sentence, paragraph, and essay levels, provides a coherent metric for the quantitative complexity analysis of texts in natural languages.
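At the essay level, Eqs. (20) and (21) reduce to a sum and a ratio. The sketch below (function names are ours) reuses the paragraph figures worked out in the paper's Examples 4 and 5: 365 [P&] over 73 words and 254 [P&] over 59 words.

```python
def essay_complexity(paragraph_complexities):
    """Eq. (20): C(eps) as the sum of paragraph complexities, in [P&]."""
    return sum(paragraph_complexities)

def complexity_per_word(c_total, n_words):
    """Eq. (21): average cognitive complexity per word, in [P&/W]."""
    return c_total / n_words

# A two-paragraph essay built from the paper's Examples 4 and 5:
# 365 [P&] / 73 words and 254 [P&] / 59 words.
c = essay_complexity([365.0, 254.0])
print(c)                                    # 619.0
print(round(complexity_per_word(c, 132), 2))  # 4.69
```

The per-word ratio stays well below the empirical threshold of 10 [P&/W] discussed in the next section, so the combined essay would be rated as comprehensible.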

5. APPLICATIONS OF THE TEXT COGNITIVE COMPLEXITY METRICS IN COGNITIVE LINGUISTICS

The measurement framework of the cognitive complexity of texts has been developed in the preceding sections. This section presents a set of case studies on the measurement of text comprehension complexities to illustrate the applications of the theory and methodology for text complexity analysis in English as a typical natural language.

Example 1. Consider the sentence S1 = "Claud, who was groaning and growling, got up and limped off." Assuming both the general and professional comprehensibility of a reader R1 are advanced, i.e., k = 1.0, analyze the cognitive complexity of R1 for the comprehension of S1.

According to Eq. (8), an advanced reader with k = 1 in text comprehension needs only internal searches. Therefore, the cognitive complexity of the given sentence, C(S1 | R1), is:

    C(S1 | R1) = C_syn(S1) • C_sem(S1)
               = C_syn(S1) • Σ_{i=1}^{#W} w_i
               = C_syn(S1) • #W
               = 5 • 11.0 = 55.0   [P&]

Repeating the same experiment for another reader R2 with insufficient comprehensibility, e.g., k = 0.3, and assuming the average cognitive weight for external search is w̄ = 3, i.e., a combined use of a dictionary and on-line knowledge bases for the (1 − k) = 70% external searches, the cognitive complexity for comprehending S1 with regard to R2 is as follows:

    C(S1 | R2) = C_syn(S1) • C_sem(S1)
               = C_syn(S1) • Σ_{i=1}^{#W} w_i
               = C_syn(S1) • (k • #W + Σ_{j=1}^{(1−k)#W} w̄_j)
               = 5 • (0.3 • 11.0 + 0.7 • 11.0 • 3.0)
               = 5 • 26.4 = 132.0   [P&]

The above examples demonstrate that the cognitive complexity of texts provides a quantitative explanation for the subjective complexity of texts in natural languages for different individuals. It is observed that there is an empirical cognitive threshold (θ) for sentence comprehension, where θ = 100 [P&]. Beyond this threshold, the comprehension difficulty of the given text will be significantly high for a specific reader. The cognitive threshold can be refined as the average cognitive complexity per word in the following corollary.

Corollary 2. The average cognitive complexity per unit word in a given text at the levels of sentence, r(S), paragraph, r(Π), and essay, r(ε), is the ratio between the cognitive complexity C and the number of words #W of the text at the corresponding structural level, i.e.:

    r(S) ≜ C(S) [P&] / #W(S) [W]
    r(Π) ≜ C(Π) [P&] / #W(Π) [W]                                                (22)
    r(ε) ≜ C(ε) [P&] / #W(ε) [W]

where the threshold for a complex text is r ≥ r̄ = 10.0 [P&/W].

Example 2. Taking S1 for R2 in Example 1 as an instance, the average cognitive complexity per unit word, r(S1 | R2), can be quantitatively estimated as:

    r(S1 | R2) = C(S1 | R2) / #W(S1) = 132.0 / 11 = 12.0   [P&/W]

The result shows that S1 is a semantically hard sentence for R2 because r(S1 | R2) > r̄ = 10.0 [P&/W]. However, r(S1 | R1) is significantly lower:

    r(S1 | R1) = C(S1 | R1) / #W(S1) = 55.0 / 11 = 5.0   [P&/W]

because reader R1 is much more advanced and understands the semantics of all words in S1, as given in Example 1.
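The computations of Examples 1 and 2 can be condensed into a short sketch. Under the reading of Eq. (8) used above, the semantic term splits the words between internal searches (fraction k, weight 1) and external searches (fraction 1 − k, average weight w̄); all function names below are this sketch's own.

```python
def sentence_complexity(c_syn, n_words, k=1.0, w_ext=1.0):
    """C(S) = C_syn * (k*#W + (1-k)*#W*w_ext), in [P&]."""
    return c_syn * (k * n_words + (1.0 - k) * n_words * w_ext)

def per_word(c, n_words):
    """Average cognitive complexity per word, in [P&/W] (Eq. (22))."""
    return c / n_words

THRESHOLD = 10.0  # empirical threshold for a complex text, in [P&/W]

# Example 1: advanced reader R1 (k = 1.0), C_syn = 5, 11 words.
c_r1 = sentence_complexity(5.0, 11)
print(c_r1, per_word(c_r1, 11))                    # 55.0 5.0

# Example 2: reader R2 (k = 0.3, external-search weight 3.0).
c_r2 = sentence_complexity(5.0, 11, k=0.3, w_ext=3.0)
print(round(c_r2, 2), round(per_word(c_r2, 11), 2))  # 132.0 12.0
print(per_word(c_r2, 11) > THRESHOLD)              # True: hard for R2
```

The same sentence thus lands on opposite sides of the 10 [P&/W] threshold depending on the reader's comprehensibility k.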

Example 3. Given a sentence S2 = "A cognitive computer is an intelligent computer for knowledge processing.", and assuming the distributions of cognitive weights of readers R3 and R4 are as given in Table IV, analyze the cognitive complexities of S2 with regard to readers R3 and R4, respectively.

Applying Eq. (8), the cognitive complexities of S2 with regard to Reader R3 and Reader R4 can be quantitatively measured as follows, respectively:

    C(S2 | R3) = C_syn(S2) • C_sem(S2)
               = C_syn(S2) • Σ_{i=1}^{#W} w_i
               = 4 • (1 + 6 + 3 + 1 + 1 + 4 + 1 + 1 + 2 + 2)
               = 88   [P&]

and

    C(S2 | R4) = C_syn(S2) • C_sem(S2)
               = C_syn(S2) • Σ_{i=1}^{#W} w_i
               = 4 • 10
               = 40   [P&]

It is obvious that the cognitive complexity of sentence S2 is much higher for reader R3 than for R4, because the latter needs only internal searches in semantic reduction.

Example 4. Analyze the cognitive complexity of the following sample paragraph Π1 with complex sentences for readers R5 (k = 1, w̄ = 1) and R6 (k = 0, w̄ = 2.5): "The general principles of dynamics are rules which demonstrate a relationship between the motions of bodies and the forces which produced those motions. Based in large part on the work of his predecessors, Isaac Newton deduced three laws of dynamics which he published in 1687 in his famous Principia. These laws are collectively known as Newton's laws of motion, which provide the basis for understanding the effect that forces have on an object."

This paragraph encompasses three sentences in a sequential structure. All sentences are complex sentences with clauses whose syntactic complexities are 5. According to Eqs. (8) and (22), the cognitive complexity of Π1 with regard to reader R5 is:

    C(Π1 | R5) = Σ_{i=1}^{3} C(S_i)
               = Σ_{i=1}^{3} C_syn(S_i) • Σ_{j=1}^{#W} w_j
               = 5 • (23 + 26 + 24)
               = 365   [P&]

    r(Π1 | R5) = C(Π1 | R5) / #W = 365 / 73 = 5.0   [P&/W]

For another reader R6, whose average cognitive weight for the words in Π1 is assumed to be w̄ = 2.5, the cognitive complexity of the same paragraph, C(Π1 | R6), is as follows:

    C(Π1 | R6) = Σ_{i=1}^{3} C_syn(S_i) • Σ_{j=1}^{#W} w_j
               = 5 • 2.5 • (23 + 26 + 24)
               = 912.5   [P&]

    r(Π1 | R6) = C(Π1 | R6) / #W = 912.5 / 73 = 12.5   [P&/W]

The analysis results indicate that r(Π1 | R6) = 12.5 [P&/W] is significantly higher than r(Π1 | R5) = 5.0 [P&/W] because of the extensive need for external searches in semantic reduction.

Example 5. Analyze the cognitive complexity of the following sample paragraph Π2 for a reader R7 with w̄ = 1: "Many commentators have noted that sentence connectors are an important and useful element in expository and

Table IV. The distribution of cognitive weights of sample readers.

    Word             A   cognitive   computer   is   an   intelligent   computer   for   knowledge   processing
    w_i (Reader R3)  1       6          3        1    1        4           1        1        2           2
    w_i (Reader R4)  1       1          1        1    1        1           1        1        1           1
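When the per-word weights differ by reader, as in Table IV, the semantic term becomes the sum of the individual weights w_i. The sketch below reproduces Example 3's figures (the list encoding and function name are ours; C_syn(S2) = 4 as in the example):

```python
# Per-word cognitive weights from Table IV, encoded as plain lists.
weights_r3 = [1, 6, 3, 1, 1, 4, 1, 1, 2, 2]
weights_r4 = [1] * 10

def sentence_complexity_weighted(c_syn, weights):
    """C(S) = C_syn * sum(w_i), in [P&]."""
    return c_syn * sum(weights)

print(sentence_complexity_weighted(4, weights_r3))  # 88
print(sentence_complexity_weighted(4, weights_r4))  # 40
```

The weight lists make explicit that R3's difficulty is concentrated in the content words ("cognitive", "intelligent", "computer"), while R4 resolves every word internally.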


argumentative writing. Frequency studies of their occurrence in academic English extended at least as far back as Huddleston (1971). Textbooks have for many years regularly included chapters on sentence connectors. Most reference grammars deal only with their grammatical status, classification, meaning, and use."

Note that the paragraph Π2 is a sequential structure of four sentences. According to Eqs. (8) and (22), the cognitive complexity of Π2 with regard to reader R7 is:

    C(Π2 | R7) = Σ_{i=1}^{4} C(S_i)
               = Σ_{i=1}^{4} C_syn(S_i) • Σ_{j=1}^{#W} w_j
               = 5 • 18 + 4 • 17 + 4 • 11 + 4 • 13
               = 254   [P&]

    r(Π2 | R7) = C(Π2 | R7) / #W = 254 / 59 = 4.31   [P&/W]
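For a sequential paragraph whose sentences have different syntactic complexities, the paragraph complexity is the sum of per-sentence products, as in Example 5. A sketch follows; the (C_syn, word-count) pair encoding is ours, and w̄ = 1 so each word contributes unit weight:

```python
# (syntactic complexity, word count) for each sentence of paragraph Pi_2.
sentences = [(5, 18), (4, 17), (4, 11), (4, 13)]

# Paragraph complexity: sum of C_syn(S_i) * #W(S_i), in [P&].
c_para = sum(c_syn * n_words for c_syn, n_words in sentences)
# Total word count of the paragraph.
n_total = sum(n_words for _, n_words in sentences)

print(c_para)                        # 254
print(round(c_para / n_total, 2))    # 4.31
```

The per-word ratio of 4.31 [P&/W] falls well below the 10 [P&/W] threshold, matching the paper's conclusion that Π2 is not a hard text for R7.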

The case studies presented in this section reveal that the cognitive complexity of texts not only provides a precise measurement for text comprehension, but also enables a rigorous explanation of the mechanisms of the cognitive complexity of texts in natural languages. A wide range of applications in cognitive linguistics and computational linguistics can be derived from this basic study on the fundamental properties of natural languages and their rigorous measurement.

6. CONCLUSIONS

An important and conventionally unquantifiable problem, the cognitive complexity of texts in cognitive linguistics, has been rigorously modeled using denotational mathematics. The cognitive foundations of text comprehension and the discourse of formal natural languages have been systematically explored. The mathematical models of the cognitive complexity of texts have been created as the product of the syntactic and semantic complexities of texts at the sentence, paragraph, and essay levels. A set of case studies has been presented to demonstrate the applications of this work in web text processing, search engines, natural language processing, cognitive linguistics, cognitive computing, cognitive learning engines, and computing with words.

This work has revealed that the cognitive complexity of texts is proportional to both the syntactic complexity and the semantic complexity. It has also been found that the threshold of the average cognitive complexity per unit word in a given text is r̄ = 10 [P&/W]; beyond this threshold, a reader will experience significant difficulty in comprehending the text.

Conventionally, the syntax and semantics of languages were treated separately, where syntax was statically perceived as serving only the structural analysis of a sentence. However, this work has revealed that the syntax of a sentence in natural languages plays an important role in both structural and semantic analyses, where syntax guides the synthesis of semantics at the word and phrase levels via composition rules. This finding explains why the cognitive process of text comprehension is a closed cycle of both analysis and synthesis, where: (a) syntactic analysis breaks a sentence into its elements in order to enable the semantic analysis of words; and (b) the composition of the semantics of the words in the sentence, based on the syntactic rules, results in the comprehension of the meaning of the entire sentence. Therefore, the syntax of a language not only provides rules for semantic analyses in the process of semantic reduction from the top down, but also provides rules for semantic synthesis in order to reassemble the composed semantics of the entire sentence from the bottom up.

Acknowledgments: The authors would like to acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for its partial support of this work. We would like to thank the anonymous reviewers for their valuable comments and suggestions.

References

Bickerton, D. (1990). Language and Species (University of Chicago Press, Chicago).
Botha, R. and Knight, C. (eds.) (2009). The Cradle of Language (Oxford University Press, Oxford, UK).
Casti, J. L. and Karlqvist, A. (eds.) (1986). Complexity, Language, and Life: Mathematical Approaches (International Institute for Applied Systems Analysis, Laxenburg, Austria).
Cavalli-Sforza, L. L. (2000). Genes, Peoples, and Languages (Farrar, Straus and Giroux, NY, USA).
Chomsky, N. (1956). Three models for the description of language. I.R.E. Transactions on Information Theory IT-2(3): 113–124.
Chomsky, N. (1957). Syntactic Structures (Mouton, The Hague).
Chomsky, N. (1959). On certain formal properties of grammars. Information and Control 2: 137–167.
Chomsky, N. (1962). Context-Free Grammar and Pushdown Storage, Quarterly Progress Report, MIT Research Laboratory, Vol. 65, pp. 187–194.
Chomsky, N. (1965). Aspects of the Theory of Syntax (MIT Press, Cambridge, MA).
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding (MIT Press, Cambridge, MA).
Chomsky, N. (2007). Approaching UG from below. Interfaces + Recursion = Language? Chomsky's Minimalism and the View from Syntax-Semantics (Mouton, Berlin).
Crystal, D. (1987). The Cambridge Encyclopedia of Language (Cambridge University Press, NY).
Eugene, H. (ed.) (1996). Cognitive Linguistics in the Redwoods: The Expansion of a New Paradigm in Linguistics (Cognitive Linguistics Research, Vol. 6, Berlin).

Evans, V. and Green, M. (2006). Cognitive Linguistics: An Introduction (Edinburgh University Press, Edinburgh).
Gleason, J. B. (1997). The Development of Language (Allyn and Bacon, Boston).
Langlotz, A. (2006). Idiomatic Creativity: A Cognitive-Linguistic Model of Idiom-Representation and Idiom Variation in English (John Benjamins, Amsterdam).
Matlin, M. W. (1998). Cognition, 4th edn. (Harcourt Brace College Pub., NY).
O'Grady, W. and Archibald, J. (2000). Contemporary Linguistic Analysis: An Introduction, 4th edn. (Pearson Education Canada Inc., Toronto).
Pattee, H. H. (1986). Universal principles of measurement and language functions in evolving systems. Complexity, Language, and Life: Mathematical Approaches, J. L. Casti and A. Karlqvist (eds.) (Springer-Verlag, Berlin), pp. 268–281.
Polya, G. (1957). How to Solve It (Doubleday Anchor, Garden City, NY).
Pullman, S. (1997). Computational Linguistics (Cambridge University Press, Cambridge, UK).
Tarski, A. (1944). The semantic conception of truth. Philosophy and Phenomenological Research 4: 13–47.
Taylor, J. R. (2002). Cognitive Grammar (Oxford University Press).
Tian, Y., Wang, Y., Gavrilova, M. L., and Ruhe, G. (2011). A formal knowledge representation system (FKRS) for the intelligent knowledge base of a cognitive learning engine. International Journal of Software Science and Computational Intelligence 3(4): 1–17.
Wang, Y. (2002). Keynote: On cognitive informatics. Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02), Calgary, AB, Canada, IEEE CS Press, August, pp. 34–42.
Wang, Y. (2003). On cognitive informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy 4(2): 151–167.
Wang, Y. (2007a). Software Engineering Foundations: A Software Science Perspective, CRC Series in Software Engineering, Vol. II (Auerbach Publications, USA).
Wang, Y. (2007b). The theoretical framework of cognitive informatics. International Journal of Cognitive Informatics and Natural Intelligence 1(1): 1–27.
Wang, Y. (2007c). The OAR model of neural informatics for internal knowledge representation in the brain. International Journal of Cognitive Informatics and Natural Intelligence 1(3): 64–75.
Wang, Y. (2008a). On contemporary denotational mathematics for computational intelligence. Transactions of Computational Science (Springer), Vol. 2, pp. 6–29.
Wang, Y. (2008b). On concept algebra: A denotational mathematical structure for knowledge and software modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2): 1–19.
Wang, Y. (2009a). A formal syntax of natural languages and the deductive grammar. Fundamenta Informaticae 90(4): 353–368.
Wang, Y. (2009b). On cognitive computing. International Journal of Software Science and Computational Intelligence 1(3): 1–15.
Wang, Y. (2009c). On abstract intelligence: Toward a unified theory of natural, artificial, machinable, and computational intelligence. International Journal of Software Science and Computational Intelligence 1(1): 1–17.
Wang, Y. (2010a). Cognitive robots: A reference model towards intelligent authentication. IEEE Robotics and Automation 17(4): 54–62.
Wang, Y. (2010b). On formal and cognitive semantics for semantic computing. International Journal of Semantic Computing 4(2): 203–237.
Wang, Y. (2010c). On concept algebra for computing with words (CWW). International Journal of Semantic Computing 4(3): 331–356.
Wang, Y. (2011a). Inference algebra (IA): A denotational mathematics for cognitive computing and machine reasoning (I). International Journal of Cognitive Informatics and Natural Intelligence 5(4): 62–83.
Wang, Y. (2011b). On cognitive models of causal inferences and causation networks. International Journal of Software Science and Computational Intelligence 3(1): 50–60.
Wang, Y. (2013). On semantic algebra: A denotational mathematics for natural language comprehension, manipulation, and cognitive computing. International Journal of New Mathematics and Applications 2(3): to appear.
Wang, Y. and Wang, Y. (2006). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C) 36(2): 203–207.
Wang, Y., Wang, Y., Patel, S., and Patel, D. (2006). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C) 36(2): 124–133.
Wang, Y., Kinsner, W., and Zhang, D. (2009a). Contemporary cybernetics and its faces of cognitive informatics and computational intelligence. IEEE Transactions on Systems, Man, and Cybernetics (B) 39(4): 823–833.
Wang, Y., Kinsner, W., Anderson, J. A., Zhang, D., Yao, Y., Sheu, P., Tsai, J., Pedrycz, W., Latombe, J.-C., Zadeh, L. A., Patel, D., and Chan, C. (2009b). A doctrine of cognitive informatics. Fundamenta Informaticae 90(3): 203–228.
Wang, Y. and Gafurov, D. (2010). The cognitive process of comprehension: A formal description. International Journal of Cognitive Informatics and Natural Intelligence 4(3): 44–58.
Wang, Y., Tian, Y., and Hu, K. (2011a). Semantic manipulations and formal ontology for machine learning based on concept algebra. International Journal of Cognitive Informatics and Natural Intelligence 5(3): 1–29.
Wang, Y., Berwick, R. C., Haykin, S., Pedrycz, W., Kinsner, W., Baciu, G., Zhang, D., Bhavsar, V. C., and Gavrilova, M. (2011b). Cognitive informatics and cognitive computing in year 10 and beyond. International Journal of Cognitive Informatics and Natural Intelligence 5(4): 1–2.
Wang, Y. and Berwick, R. C. (2012). Towards a formal framework of cognitive linguistics. International Journal of Advanced Mathematics and Applications 1(2): in press.
Wardhaugh, R. (2006). An Introduction to Sociolinguistics (Wiley-Blackwell, NY).
Zadeh, L. A. (1999). From computing with numbers to computing with words: From manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems I 45(1): 105–119.

Received: 8 July 2012. Accepted: 1 November 2012.
