Knowledge Representation: Predicate Logic ...

13 downloads 0 Views 339KB Size Report
2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]. Knowledge Representation: Predicate Logic. Implementation ...
2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

Knowledge Representation: Predicate Logic Implementation using Sentence-Type for Natural Languages Madhuri A. Tayal

Dr. M M Raghuwansh

Research Scholar,

Principal

G. H. Raisoni College of

Engineering and Research

madhuri [email protected]

[email protected]

_

Abstract---Representing

the

content

of

the

text

Natural

language

processing (NLP)

science,

and linguistics concerned

is

a

field

of

artificial

intelligence,

with

interactions

the

between computers and human languages. It processes the data through lexical analysis, Syntax analysis, Semantic analysis, Discourse processing, Pragmatic analysis.

This

paper

compares

various

knowledge

representation schemes. The algorithm in this paper splits the English sentences into phrases and then represents these in predicate logic by considering the types

of

sentences

(Simple,

Interrogative,

Exclamatory, Passive etc.). The algorithm has been tested on real sentences of English. The algorithm has achieved an accuracy of 75%. This representation would be used in future for Semantic based Text summarization. Keywords:

Natural

Language,

Knowledge

Representation, Predicates.

I. INTRODUCTION

Language is the primary means of communication used by the humans. It is the tool everyone uses to express the greater part of ideas and emotions. It shapes thought, has a structure, and carries meaning. Natural language processing is concerned with the development of computational models of aspects of human language processing. Knowledge Representation is a fundamental area of research in computational linguistics. Knowledge should be represented in computer in such a manner that can be retrieved simply and powerfully. Knowledge representation is used in key areas of computational linguistics such as machine translation, storytelling question-answering, information retrieval and information extraction [13], [14]. The basic difficulty of knowledge representation is the growth of a sufficiently precise notation for representing knowledge. Such notation is referred to as a knowledge representation technique [10], [11], [14].

978-1-4673-4922-2113/$31.00 ©2013 IEEE

G. H. Raisoni College of Engineering, Nagpur [email protected]

Semantic is associated with the meaning of the language. The general idea of semantic interpretation is to take natural sentences and map them onto some representation of meaning. Semantic analysis, checks whether there is a sensible set of instructions in the programming language, any old noun phrase followed by some verb phrase makes a syntactically correct English sentence, a semantically correct one has subject-verb agreement, proper use of gender, and the components go together to express an idea that makes sense. For a program to be semantically valid, all variables, functions, classes, etc. must be properly defined; expressions and variables must be used in ways that respect the type system, access control must be respected, and so forth. Semantic analysis is the front end's penultimate phase and the compiler's last chance to weed out incorrect programs. The actual use of these representations has been addressed by the designers of the representation. Many researchers valued the importance of knowledge representation methods [4], [5], [6], [7], [8], [9]. The main purpose of knowledge representation is the creation of target language representation of a sentence's there is an important role that it plays. It imposes constraints on the representations structure and semantic. This paper describes a representation using predicate logic for English language depending upon the different types of sentences. Section-2 provides the information for different ways of representing knowledge.Section-3 represents algorithm for implementation of First order logic. (FOL).Section-4 describes the experimental results for FOL algorithm.Section-5 describes how this representation is useful for text summarization task. Section-6 proposes conclusion and future work.

is

really an important issue of knowledge representation. computer

HOD(CSE)

Rajiv Gandhi College of

Engineering, Nagpur.

Dr. Latesh Malik

11. DIFFERENT WAYS OF REPRESENTING KNOWLEDGE

The following are well known models, which are used for knowledge representation. A.

Logic

Logic is a formal language for representing facts and properties of a world in a precise, unambiguous way [13].

1264

2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

The key concept to propositional logic is the proposition. A proposition is something that is exclusively either true or false. One often represents propositions by sentences in a human language. First-order logic (FOL) (also known as first-order predicate calculus or FOPC) adds predicates which can represent properties, e.g., mortal (person), or relationships, e.g., likes (Robert, Ice-cream), •



preconditions. The structure of a rule-based system shows at least two modules, the knowledge base (Rule-base) and the inference engine. The Rule-base comprises the domain knowledge in the form of rules. The inference engine represents the reasoning strategy for searching the rules, in a knowledge base, which enables to find a suitable solution of a problem. [15]. The advantages of production rules are: l. Simplicity. 2. Expressiveness and intuitiveness-Rules can be understood by a non-programmer also. 3. Modularity and modifIability-Individual rules can be changed and added. 4. Good for complex problems where humans have expert knowledge. 5. Each rule defines a small and independent piece of knowledge. 6. Rules are usually independently of other rules. Rule based system is not efficient representation scheme for the knowledge representation. It is used only for the representation of rule based knowledge (mathematical problems). In a rule based system, when certain rules are satisfied then the system gives single solution and when some other rules are satisfied then it provides an altered solution. From the representation of a sentence one should be able to retrieve the same input text/sentences and from rule based representation the retrieval of original text/sentence is very difficult task. However Rules are having their own applications like, Medical Diagnosis, Instruction training, Monitoring applications, Prediction based systems for agricultural applications etc.

Existentially quantified variables, e.g., ::3 There exist X such that... Universally quantified variables, e.g., such that...

"r/ for all X

Examples of predicate logic statements: 1. Tommy is a name of a dog: Tommy (dog) performs calculations: 2. Computer performs(Computer, calculations) Predicate logic employs the designs of constant, variable, function, predicate, logical connectives and quantifiers to signify facts [13]. In predicate logic, sentences can be split into words e.g. nouns, pronouns, verbs and adjectives or even phrases. As there are mostly finite numbers of words or phrases in a language, therefore one can easily store words or phrases for representing the knowledge e.g. in the form of text. Using this combination of number of nouns, verbs and phrases Amjad Ali, Mohammad Abid Khan [1] [2] [3] have proposed the knowledge representation method. In comparison to this method for representation of knowledge, the proposed method is based upon the types of sentences. The categorization is done on the basis of the types of the content of that sentence. e.g. Interrogative, Exclamatory, Simple sentence, Passive, facts. etc. The entire procedure is mentioned in the next section. The advantage of predicate logic is that sentence can be retrieved efficiently.

C.

Frames

Natural language understanding requires inference i.e., assumptions about what is typically true of the objects or situations under consideration. Such information can be coded in structures known as frames. A frame is a complex data structure for representing a stereotypical situation such as visual scenes, structure of complex physical objects or students examination marks etc. Frame is a type of schema used in many AI applications including vision and natural language processing. The situations to represent may be Frames are also useful for representing commonsense knowledge. As frames allow nodes to have structures they can be regarded as three-dimensional representations of knowledge. Later 1975, when Minsky initially suggested it, the concept of frames has played a key role in knowledge Representation research [8]. A frame is similar to a record structure and corresponding to the fields and values are slots and slot fillers. Basically it is a group of slots and fillers that defines a stereotypical object. A single frame is not much

B. Rules The production rules are simple but powerful forms of knowledge representation providing the flexibility of combining declarative and procedural representation for using them in a unified form [8], [13]. Examples of production rules: -IF condition THEN action -IF premise THEN conclusion -IF proposition pi and proposition p2 are true THEN proposition p3 is true. A Rule-based system involves an exact match the condition(s)/ premise(s) to predict the conclusion(s). This is very limiting, as real-world situations are often uncertain and do not match exactly with rule

1265

2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

Pee-Wee-Reese belongs to the team Brooklyn Dodgers and has uniform color is blue. And person has the part nose. In this way the semantic relations can be represented through Semantic Network. Retrieving the actual and correct information from semantics networks is a very difficult task. Specifically, if someone wants to store and represent a text in computer in which there is no concept of objects and relations, then semantic network representation system is not appropriate and competent to represent and recover such text.

useful. Frame systems usually have collection of frames connected to each other. Value of an attribute of one frame may be another frame. A frame for a book is given below. Slots

Fillers

Publisher

Thomson

Title

Expert Systems

Author

Giarratano

Edition

Third

Year

1998

Pages

600

Ill.

FOL ALGORITHM

The steps for the FOL algorithm are as follows. Download POS tagger. (Stanford Tagger) [12]. Link POS tagger with Java. Input the sentence. POS tagger (Tags returns (tags)). Creating array of verbs. Assign index to verb. Creating array of subjects. Assign index w.r.t verb. Creating array of objects. Assign index W.r.t verb. Check the type of sentence through end symbols and different determiners like ( .', '7', '!', 'by', CD, 11 (tagged output for a number) etc.). 9. Internally sentences are arranged into (S V 0), Find Verb in the sentence, place it fust. Find subject and Object, place it accordingly in FOL forms [YeS, 0)]. 10. For complex sentences(S V 0 V 0 V 0), fmd main Subject and verb, then for the rest of the sentence Again find subject and verb. 11. If only one main subject found do step 6, else use conjunctions to join sentences. [V (SI, 01) 1\ V (S2, 02)].

1. 2. 3. 4. 5. 6. 7. 8.

Table 1: Frame for book

A frame is not a competent knowledge representation scheme in linguistics. Because if someone wants to represent sentences of a language, it is difficult to fragment the sentences into slots and their Fillers and vice versa is also very difficult. Once sentences are symbolized in frames, then the supporting words of a sentence are not kept and thus due to absence of supporting arguments in representation the retrieval of the same input sentence is very challenging.

'

D. Semantic Net A semantic net (or semantic network) is a network which represents semantic relations among concepts. So it is also called a propositional net .This is often used as a form of knowledge representation. Semantic nets convey meaning. Semantic nets are two dimensional representations of knowledge. Mathematically a semantic net can be defined as a labeled directed graph [13]. Semantic nets consist of nodes, links (edges) and link labels. In the semantic network diagram, nodes appear as circles or ellipses or rectangles to represent objects such as physical objects, concepts or situations. Links appear as arrows to express the relationships between objects, and link labels specify particular relations. Relationships provide the basic structure for organizing knowledge. The objects and relations involved need not be so concrete. Fig l. shows semantic network, where Pee-Wee-Reese is object (member) of Person, which is a subset of Mammal.

The steps for reconstructing the sentences from FOL are. 1. 2.

Reconstruct the sentence using Subject first S, next Verb and then the object. If it is passive sentences then reconstruction uses Object first next Verb and then the Subject.

A. POS Tagger A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word. It assigns a part-of-speech like noun, verb, pronoun, preposition, adverb, and adjective or other lexical class marker to each word in a sentence. This software is a Java implementation of the log-linear part-of-speech taggers. A number of of Taggers are available Stanford Tagger, Apache UIMA Tagger; Eric Brill's simple Rule Based Tagger etc. are some of them. Out of which Stanford tagger has been used. Its basic download contains two trained tagger models for English. The full download contains three trained English tagger models, an Arabic tagger model, a Chinese tagger model, and a German tagger model. Both

Fig 1: Semantic Network [13]

1266

2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

versions include the same source and other required files. The tagger can be retrained on any language, given POS­ annotated training text for the language. The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of Part-of-speech tags). The output is a single best POS tag for each word shown in table-2. Tag g er o/p CD

Meaning

Tagg er o/p

Cardinal

NNPS

NNS

RBS

Adverb,

WDT

Tag ger o/p

Meaning

Proper Noun,

TO

to

Wh-determiner e.g. which,

superlative

and that when it is used as a relative pronoun LS

List Item

RP

Particle

WP

Marker

Meaning

Noun, plural

Adjective, superlative

Wh-pronoun e.g. what, who, whom...

MD

plural

Number CC Coordinating

JJS

Modal

SYM

Symbol

e.g. can,

used for

could, might,

mathematical,

may...

scientific

YBN , past participle

WP$ Possessive wh-

pronoun

symbols

conjunction NN

e.g. and, but,

Noun,

TO

to

singular or

or...

WR

Wh-adverb

B

e.g. how, where

mass OT

Determiner

POT

Predeterminer e. UH g. all, both ... when they precede an

why

Interjection e.g. uh, well,

NNP Proper Noun,

yes, my...

singular

article EX

Existential

POS

there

Possessive

VB Verb, base form

Ending

subsumes

e.g. Nouns

imperatives,

ending in's

infinitives and

Table 2: POS Tagged output and their meanings.

B. Different Kinds afSentences

SUbjunctives FW

Foreign

PRP

Word

IN

Preposition

PRP$

Personal

VBO Verb, past tense

Pronoun

includes the

e.g. I, me, you,

conditional form

he...

of the verb to be

Possessive

VBG Verb, gerund or

or

Pronoun

persent

subordinatin

e.g. my, your,

participle

g

mine, yours...

conjunction JJ

Adjective

RB

Adverb

YBP

Verb, non-3rd

Most words that

person singular

end in -ly as well

present

as degree words like quite, too and very JJR

Adjective, comparative

RBR

Adverb, comparative

VBZ Verb, 3rd person

singular present

Adverbs

According to Wren and Martin the sentence comprises of Subject, Verb and Object. So, each sentence has to have a subject(S), Object (0) and a Verb (V). Some sentences may have adjectives, adverbs and conjunctions. There are also sentences which are interrogative i.e. they ask a question and there are some which are exclamatory. Keeping all these in mind, sentences are categorized in different type. It is important to categorize sentences because the POS tagger treats the sentences as group of words. It does not look at the meaning of the sentence as a whole. The basis for the process of categorization is shown in the table 3. The categorization is as follows: 1. Sentences having exactly one subject, one verb and one object. (SVO) 2. Sentences having exactly one subject, one verb, one object and adjectives also.(SVO with ADJECTIVES) 3. Simple fact statements. (FACTS) 4. Sentences containing more than one noun and verbs. (COMPLEX) 5. Sentences which question. (INTERROGATIVE) 6. Sentences which exclaim. (EXCLAMATORY) 7. Sentences which contain helping verbs. (MORE THAN ONE VERB) 8. Sentences which contain numbers. (NUMBERS). 9. Sentences in passive form. (PASSIVE). Basis of categorizing

1267

Category

2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

Sentence with only one subject, one verb and one object.

Subject, Object

Sentence with only one subject, verb, and adjective followed by a verb.

with SVO adjective(JJ, JJR, JJS)

Verb,

Complex

Facts

Interrogative

Sentences with more than one subject or object and having "and" ... "or" in it.

Complex

Exclamatory

Sentences terminating with a"?".

Interrogative

Sentences That.

starting

with

This,

Sentences terminating with a"!". Sentences which auxiliary verb.

contain

an

More than one verbs

Exclamatory

Passive

More than one verb

Sentences in which the subject follows"by".

Passive

Sentences in which the numbers are given.(Numbers-"MD"(POS tagger returns))

Numbers

Numbers

Array used to store the similar information and Structures keeps any information.

used ( Array , similar information ) 1\ keeps (Structures , information )

What exactly is artificial intelligence?

is(exactly artificial intelligence)

How cold the night is! Computer helps drawing a picture.

is(cold night) helps drawing(Com puter, picture)

All types of calculations are performed by computer.

are performed(co mputer, types calculations)

There are two computers.

are(two computers)

Table 4: Results for Semantic Representation

CONCLUSION

This paper focuses on semantic representation for natural language. It describes the Semantic Representation for English Language. Different knowledge representation methods are available and concluded that First-Order Logic is the simple and best method for Knowledge Representation. So, First-Order Logic has been used. This algorithm computes fust order logic representations for sentences and also reconstructs the sentences from them. The accuracy of the system can be further increased through NLP-Chunker. It makes the word phrases for the sentences. Chunker can help while reconstructing the sentences more accurately. In future using some more tools the performance of the system will be improved.

Table 3: Categorization of English sentences

Simulating human language understanding on the computer is a great challenge. A way to approach it is to represent natural language meanings in logical form. The sentences and their corresponding results are shown in next section. IV.

RESULTS

The sentences and their corresponding representations are shown in the table 4.

semantic

REFERENCES Type of

Example

Output

[1] Amjad Ali, Mohammad Abid Khan, "Selecting Predicate Logic for Knowledge Representation by Comparative Study of Knowledge Representation Schemes" in 2009 International Conference on Emerging Technologies, IEEE 2009, pp 23-28. [2] Amjad AIi, Mohammad Abid Khan, Knowledge Representation of Urdu Text Using Predicate Logic, 2010 6th International Conference on Emerging Technologies (ICET), IEEE, pp. 293-298. [3] Dalia Fadl, Mostafa Aref, Rich Semantic Representation Based Approach for Text Generation, Ibrahim Fathy, The 8th International Conference on INFOrmatics and Systems (lNFOS2012) pp- 20-28

Sentence

SVO SVO with Adjectives Facts

Computer performs calculations. Computer is a programmable machine.

performs (Computer, calculations) is(Computer, programmabl e machine)

This is a computer.

is(computer)

1268

2013 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2013]

[9] S. Sawai, H. Fukushima, M. Sugirnoto, And N. Ukai Knowledge Representation and Machine Translation, COLING 1982, pp. 351-356. [10] P. J. Hayes, Some Problems and Non-Problems in Representation Theory, Proceedings AISB Summer Conference, July 1974, Essex University [11] STANFORD PARSER: http://nlp.stanford.edu/software/lex-parser.shtrnl

[4] Ralph M. Weischedel, Knowledge Representation and Natural Language Processing, Proceedings of the IEEE, Vol. 74, No 7, July 1986, pp 905-920. [5] Mitsuru Ishizuka, Kazuya Tananashi, Hermut Prendinger, A Predicate Logic Version Method for Cost based Hypothetical Reasoning Employing an Efficient Propositional level Mechanism, [6] Aikaterini Mpagouli, Ioannis Hatzilygeroudis, Converting First Order Logic into Natural Language: A First Level Approach, University of Patras, Hellas.; [7] Toshiaki Fuziki, Hidetsugu Nana, Manau Okumura, Automatic acquisition of script knowledge from a text collection, EACL 2003 Budapest, Hungry, pp 91-94. [8] M. Minsky, A Framework for Representing Knowledge, The Psychology of Computers vision, McGraw Hill, 1975.

[12] STANFORD POS TAGGER: http://nIp.stanford.edu/software/tagger.shtmI BOOKS

[13] Rich and Knight, "Artificial Intelligence", TATA Mc Graw Hill Second Edition. [14] Tanveer Siddiqui, U.S. Tiwari, Natural language Processing and Information Retrieval, (BOOK) Oxford University Press. [15] John Durkin, Expert Systems-Design and Development, Prentice Hall.

1269

Suggest Documents