Biomedical Concept Extraction based Information Retrieval Model: application on the MeSH

Mondher SENDI

Mohamed Nazih OMRI

MARS Research Unit, Faculty of Sciences of Monastir, University of Monastir, 5019 Monastir, Tunisia [email protected]

MARS Research Unit, Faculty of Sciences of Monastir, University of Monastir, 5019 Monastir, Tunisia [email protected]

Abstract—This paper proposes a new approximate model for biomedical concept extraction. This model is based on a possibilistic network, statistical computing and semantic proximity. The possibilistic network is used for representing the MeSH structure in order to select the relevant concepts for a biomedical text. Moreover, we propose an enrichment model of the MeSH thesaurus through the identification of the semantic relations between concepts. The results of the extraction model serve to map a query in an information retrieval process. To prove the significance of our model in the information retrieval context, we used a vector model and the OHSUMED collection.

Keywords: Concept Extraction, Semantic Proximity, Possibilistic Network, Information Retrieval.

I. INTRODUCTION

The fast evolution of the quantity of information available on the web has made semantic information retrieval (IR) one of the largest research fields of recent years. Unlike classic IR, semantic IR aims at generating more relevant results by understanding the purpose of the search and the significance of the search terms. It is mainly based on two tasks: Semantic Matching (SM) and Semantic Indexing (SI).

The SM's objective is to calculate the similarity between a query and a document; in other words, SM aims at selecting the documents relevant to a query. An IR model that represents the two entities (query and document) is generally used for modeling this process. Several models have been proposed for SM using various theories: Boolean models [1], vector models [2], probabilistic models [3] and, recently, possibilistic models [16]. Each model relies on its own theory to model the SM.

Extraction of relevant concepts from a text is the main process of SI. An effective method for concept extraction allows us to index documents and queries properly, and to return the relevant information. Automatic concept extraction is the process of identifying the meaning and the relevant expressions of a text (document, paragraph, sentence or query) in order to determine its most significant topics.

The problem that we are concerned with is biomedical concept extraction exploiting the MeSH thesaurus (Medical Subject Headings)¹, a thesaurus recognized in the biomedical field. Our first contribution is a possibilistic representation of the MeSH structure that ensures the selection of the concepts relevant to a query. This method is focused on the definition of concept relevance and on a model to determine its value. The second contribution is a new method to enrich the MeSH thesaurus by identifying semantic relations between concepts. This contribution is the result of an in-depth study of the MeSH structure.

The remainder of this paper is organized as follows. Section 2 recapitulates related work. Section 3 details our method for concept extraction. Section 4 presents our experiments and evaluation results. Finally, we conclude in Section 5 with an exposition of the obtained results.

II. RELATED WORK

The task of extracting biomedical concepts from text is a challenge for researchers in the information extraction field. Different methods and tools have been proposed for solving this problem. They can be divided into four types: linguistic rules (LRs) [4][5][14], learning methods [7][8][21][23][24], statistical methods [9][10][15] and those based on dictionary lookup [11][13].

The methods based on LRs exploit the morpho-syntactic and lexical features of biomedical concepts in order to define a set of rules. Linguists and experts in the biomedical field generally define them. In [5], orthographic and lexical features are exploited to determine a set of LRs for recognizing protein names. In [6], protein names are identified through morphological analysis, a lexical search using a lexical resource, and terminology parsing. These approaches are expensive in computation time and specific to a particular language.

Machine learning models are widely used in biomedical concept extraction. In [7], gene and protein names are identified using a Hidden Markov Model (HMM). This approach exploits orthographic and lexical characteristics to extract a technical terminology from MedLine (Medical Literature Retrieval System) abstracts. A Naive Bayes model is combined with LRs for the extraction of biomedical terms in [8]. Other approaches combine statistical computing and the exploitation of language characteristics to extract concepts [25][26][27]. The C/NC-value method [9] uses a part-of-speech tagger to identify concept candidates; the importance of each of them is determined through two complementary statistical measures: C-value and NC-value. In [10], biomedical concepts are

¹ MeSH presentation and publication: https://www.nlm.nih.gov/mesh/staffpubs.html

978-1-4673-8709-5/15/$31.00 ©2015 IEEE



extracted from a vector model. This method combines the statistical approach, the LR approach and dictionary lookup using the MeSH thesaurus. The approaches based on statistical computing use many parameters and heuristics without giving a meaning to these frequencies; thus, they do not take account of the uncertainty generated by these heuristics.

Methods based on dictionary lookup mainly exploit terminological resources (thesauri, meta-thesauri, ontologies, etc.) for the concept extraction problem. The MetaMap method [11], based on dictionary lookup in the UMLS meta-thesaurus, uses natural language processing (NLP) and computational linguistic techniques to identify the concepts in a biomedical text. It calculates the relevance of each candidate concept through simple functions like centrality, variation and coverage. This method is used by the NLM (National Library of Medicine) for indexing MedLine citations. The ATM method [12] is another method proposed by the NLM for extracting MeSH terms from a query; it is based on exact matching between the MeSH entry terms and the query.

In the next section, we propose an approximate hybrid approach for biomedical concept extraction combining statistical computing, dictionary lookup and possibilistic logic.

III. METHOD

A. Theoretical framework

Our approach finds its theoretical framework in possibilistic logic and, more precisely, in the possibilistic network. It allows representing the elements of the problem, expressing ignorance, and taking the imprecise and the uncertain into account.

Possibility theory is based on possibility distributions associated with a set of variables. A possibility distribution is a mapping from Ω (the universe of discourse) to the interval [0, 1]. It represents a state of knowledge distinguishing what is plausible from what is less plausible. Moreover, possibilistic logic offers two complementary measures to determine the plausibility and certainty of an event:
• The possibility, denoted Π(A), reflects the plausibility degree of an event A.
• The necessity, denoted N(A), reflects the degree of certainty of an event A.

A possibilistic network is a directed graph on a set of binary variables. The edges in this network encode the causal/influence links between these variables. The joint possibility distribution associated with a possibilistic graph is computed either from product-based conditioning or from minimum-based conditioning [17].
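As an illustration of these two measures (our own toy example, not part of the paper's model), a small possibility distribution can be evaluated directly:

```python
# Illustrative sketch (not from the paper): a possibility distribution
# over a small universe of discourse, with the two complementary
# measures Pi (possibility) and N (necessity).

def possibility(pi, event):
    """Pi(A): the maximal possibility degree of the states in A."""
    return max(pi[s] for s in event)

def necessity(pi, event):
    """N(A) = 1 - Pi(not A): the degree of certainty of A."""
    complement = set(pi) - set(event)
    if not complement:  # A covers the whole universe: fully certain
        return 1.0
    return 1.0 - possibility(pi, complement)

# Distribution on Omega = {relevant, irrelevant, unknown}
pi = {"relevant": 1.0, "irrelevant": 0.4, "unknown": 0.2}

print(possibility(pi, {"relevant", "unknown"}))  # 1.0
print(necessity(pi, {"relevant"}))               # 1 - 0.4 = 0.6
```

Note how an event can be fully possible (Π = 1) while only moderately certain (N = 0.6), which is exactly the distinction the model exploits.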

B. MeSH thesaurus

The MeSH thesaurus is a controlled vocabulary produced by the NLM. It allows managing documentary resources in terms of indexing, searching and cataloging biomedical documents. It contains a set of descriptors arrayed hierarchically from most general to most specific, in up to twelve hierarchical levels. It comprises 27,149 descriptors and over 218,000 entry terms. Each descriptor contains one or more concepts, and each concept contains one or more terms. In our approach, we use three different sets to represent the MeSH taxonomy: (a) a set of concepts, (b) a set of terms and (c) a set of words. Sets (a) and (b) exist in MeSH; set (c) is new.

C. Possibilistic algorithms for concept extraction

The concept extraction process consists of selecting, out of a controlled vocabulary, a set of concepts relevant to a text. The terminological resource has a very important role in this process: it simplifies the selection by grouping the biomedical concepts in a single terminology. However, other works [18][19] have shown the existence of concepts irrelevant to the texts.

The proposed possibilistic model is based on the possibility distributions related to the three following sets: concepts, terms (a term being a set of one or more words) and words. The last set is built out of the MeSH concepts. In a second step, we proceed to the enrichment of the extracted concepts, based on the exploitation of the descriptions of the MeSH concepts for identifying the semantic relationships.
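The three sets can be illustrated on a toy MeSH fragment (the two records below are illustrative stand-ins, not real MeSH data):

```python
# Toy illustration of the three sets used to represent MeSH: (a) the
# concepts, (b) their entry terms and (c) the derived word set.

mesh = {
    "Pedigree": ["Pedigree", "Family Tree", "Identity, Genetic"],
    "Hypercalcemia": ["Hypercalcemia"],
}

concepts = set(mesh)                             # set (a): concepts
terms = {t for ts in mesh.values() for t in ts}  # set (b): entry terms
words = {w.strip(",").lower()                    # set (c): words (new)
         for t in terms for w in t.split()}

print(sorted(words))
```

Set (c) is the one the method builds itself: it flattens every entry term into lowercase words, which is what later lets a query word reach a concept.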

1) Concept extraction algorithm

a) General architecture

The concept extraction algorithm aims to determine the similarity between a concept and a query after matching the two objects (concept, query). A query is an expression that consists of a set of words linked by connectors such as "and", "of", "the", etc. In our approach, a query is seen as a set of words Q = {W1, W2, …, Wq}, in which we capture the significant words by removing stopwords (e.g. for the query "treating hypercalcemia of malignancy", Q = {treating, hypercalcemia, malignancy}). Following this idea, we extract all the words that exist in the MeSH entry terms to build a possibilistic layer around the thesaurus. This layer allows matching a query against the MeSH concepts and calculating the relevance of each concept selected for the query. During this process, we extracted 27,858 words for the possibilistic layer from 22,568 MeSH concepts. The possibilistic layer is a set of words PL = {W1, …, Wc}. The MeSH thesaurus itself is considered as a list of concepts MeSH = {C1, C2, …, Cn}.

Figure 1 illustrates the concept extraction process in our approach. A query Q, composed of a set of words, accesses the MeSH structure through PL to select a list of relevant concepts. Each query selects a large number of concepts (e.g. the query "review article on adult respiratory syndrome" selects more than 400 concepts). Therefore, we sort the list L by the degree of relevance computed with a possibilistic model.

To determine the concepts relevant to each query, we have defined a possibilistic model containing three layers. It represents the three sets (concepts, terms and words) and their dependency relationships.

Architecture (Figure 2): the first layer in this model is the concepts layer. It includes the concepts selected by the

2015 15th International Conference on Intelligent Systems Design and Applications (ISDA)


query Q (i.e. those sharing at least one word with Q). The second layer is the terms layer (the MeSH entry terms); each term in this layer belongs to a single concept. These two layers represent the MeSH thesaurus. The third layer is the possibilistic layer that we have built around the thesaurus. Each root node is a binary random variable that can take a value in Dom(Ci) = {ci, ¬ci}: Ci = ci means that Ci is relevant to the query Q, and Ci = ¬ci means that Ci is irrelevant to Q. The same holds for the variables Ti and Wk.
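The candidate-selection step described above (significant query words reaching concepts through the possibilistic layer) can be sketched as follows; the MeSH sample and the stopword list are illustrative assumptions:

```python
# Sketch of candidate selection: the significant words of a query reach
# MeSH concepts through the possibilistic layer (a word -> concepts
# index). Toy data, not real MeSH records.

from collections import defaultdict

STOPWORDS = {"and", "of", "the", "on", "for"}  # tiny illustrative list

mesh_terms = {  # concept -> entry terms (toy sample)
    "Hypercalcemia": ["Hypercalcemia"],
    "Neoplasms": ["Neoplasms", "Malignancy"],
    "Pedigree": ["Pedigree", "Family Tree"],
}

# Possibilistic layer: every word of every entry term points to its concepts.
layer = defaultdict(set)
for concept, entry_terms in mesh_terms.items():
    for term in entry_terms:
        for word in term.lower().replace(",", " ").split():
            layer[word].add(concept)

def candidates(query):
    """Concepts sharing at least one significant word with the query."""
    significant = [w for w in query.lower().split() if w not in STOPWORDS]
    return set().union(*(layer.get(w, set()) for w in significant))

print(candidates("treating hypercalcemia of malignancy"))
# {'Hypercalcemia', 'Neoplasms'}
```

This only produces the unsorted candidate list L; ranking it is the job of the possibility distributions defined next.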

Figure 1. Architecture of the concept extraction process.
Figure 2. Possibilistic model architecture.

Possibility distribution: we use the number of words in each term for computing the possibility distribution (Table I). For a concept that contains a set of terms, the term that contains the fewest words is the most significant: it carries specific information and is the most representative of its parent (e.g. the concept "Pedigree|D010375" contains the three following terms: {Pedigree; Family Tree; Identity, Genetic}; we consider that the term "Pedigree" is the most representative of its concept).

Π(ti|cj) = (1 + NbWords(Tmax) − NbWords(ti)) / NbWords(Tmax)    (1)

where NbWords(t) is the number of words in the term t and Tmax is the term that contains the maximum number of words among the terms of Cj.

TABLE I. TERMS POSSIBILITY DISTRIBUTION

Π(Ti|Cj)    cj     ¬cj
ti          uij    1
¬ti         1      1

where uij is the value given by Eq. (1).

For the variables (Wk), 1 ≤ k ≤ c (Table II), the value of Π(mk|ti) is determined from the number of words in the term ti:

Π(mk|ti) = 1 / NbWords(ti)    (2)

In our context, we calculate the necessity value using the following equation:

N(mk → ti) = icfk × (1 / NbWords(ti))    (3)

where icfk = log(NbConcepts / Card({Cj : mk ∈ Cj})) / log(NbConcepts), the importance of the word mk in the thesaurus, is similar to the idf parameter defined in the IR process, and NbConcepts is the number of concepts in MeSH.

TABLE II. WORD POSSIBILITY DISTRIBUTION

Π(Mk|Ti)    mk               ¬mk
ti          1/NbWords(ti)    1
¬ti         1 − icfk         1

Query evaluation: the query evaluation determines the relevance of each concept from the two measures, necessity and possibility. We use the inference equation proposed in [17], and we add a new parameter λ to take into account the words that are not shared between the two objects (query, concept):

Π(ci|Q) = max_{θT ∈ ΘT} (Π(Q|θT) × Π(θT|ci) × Π(ci)) × λ    (4)

with λ = λ1 = Card(M(Q) ∩ M(Ci)) / Card(M(Q)) for Π(ci|Q), and λ = λ2 = 1 − (Card(M(Q) ∩ M(Ci)) / Card(M(Ci)))² for Π(¬ci|Q),

where ΘT is the set of possible configurations of the term nodes, θT a possible instantiation, M(X) is the set of words of X and Card(x) is the cardinality of the set x. We calculate the relevance of each concept from the degree of necessity using Equation (5):

N(ci|Q) = 1 − Π(¬ci|Q)    (5)

2) Enrichment algorithm

The enrichment of the MeSH thesaurus with semantic relations improves the results of the concept extraction algorithm developed above. It first exploits the descriptions of the concepts that exist in the MeSH thesaurus in order to determine the semantic proximity between the concepts [22]. This allows us to construct, for each concept, a set of semantically related concepts. The process is essentially based on the two following steps.
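The possibility distributions of Eqs. (1) and (2) above can be sketched directly; the term list mirrors the Pedigree example, and the function names are ours:

```python
# Sketch of the possibility distributions of Eqs. (1) and (2): shorter
# terms are more representative of their concept, and each word of a
# term carries 1/NbWords(t).

def pi_term_given_concept(term, concept_terms):
    """Eq. (1): (1 + NbWords(Tmax) - NbWords(ti)) / NbWords(Tmax)."""
    nb = len(term.split())
    nb_max = max(len(t.split()) for t in concept_terms)
    return (1 + nb_max - nb) / nb_max

def pi_word_given_term(term):
    """Eq. (2): Pi(mk | ti) = 1 / NbWords(ti) for any word mk of ti."""
    return 1 / len(term.split())

terms = ["Pedigree", "Family Tree", "Identity, Genetic"]
for t in terms:
    print(t, pi_term_given_concept(t, terms), pi_word_given_term(t))
# The one-word term "Pedigree" gets possibility (1 + 2 - 1) / 2 = 1.0,
# i.e. it is the most representative term of its concept.
```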

Exploitation of the glosses (Step 1): in this step, we exploit the definition of each concept to determine its semantic relatedness with the other MeSH concepts. Indeed, each concept in MeSH has a definition of its meaning and of the scope of its use in the medical field. If a concept C1 shares a number of words with the definition of another concept C2 (i.e. C1 is present in the definition of C2), then it is possible that there is a relationship between C1 and C2. In other words, if the definition (Di) of a concept (Ci) is relevant to another concept (Cj), then Cj has a semantic relationship with Ci.
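The overlap test behind Step 1 can be sketched as a simple word-intersection check; the gloss text below is an illustrative stand-in, not a real MeSH scope note:

```python
# Sketch of Step 1 (gloss exploitation): a concept C1 may be related to
# C2 when words of C1 appear in C2's definition.

def shares_words(concept_name, gloss, min_shared=1):
    """True if the concept's words overlap the gloss by >= min_shared words."""
    concept_words = set(concept_name.lower().replace(",", " ").split())
    gloss_words = set(gloss.lower().replace(".", " ").split())
    return len(concept_words & gloss_words) >= min_shared

gloss_neoplasms = ("new abnormal growth of tissue malignant neoplasms "
                   "show a greater degree of anaplasia")

print(shares_words("Tissue Growth", gloss_neoplasms))  # True
print(shares_words("Pedigree", gloss_neoplasms))       # False
```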

Architecture: the root node represents the definition Di of a concept Ci (Figure 3). At the second level, we find the words layer; it only contains the words shared between the two objects (definition, concept). The third layer includes the term nodes, and the last node in this model is the concept.

Possibility distribution: for the weighting of an arc connecting a word mk and a definition, we use the word frequency (wf) in the definition and the inverse document frequency (idf). The degree of plausibility of a word is then based on the following assumption: a word is possibly representative of a definition if it appears frequently in that definition (Table III). We use the words layer for the matching between documents and concepts.

Figure 3. Possibilistic model architecture

The degree of necessity is based on the following assumption: a word is necessarily representative of a definition if it appears frequently in that definition and rarely in the other definitions.
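A minimal sketch of this wf × idf weighting follows; the normalization by the most frequent word is our own assumption, and the three glosses are toy strings:

```python
# Sketch of the wf x idf weighting for gloss words: necessity is high
# when a word is frequent in one definition and rare elsewhere.

import math

definitions = {
    "C1": "growth of abnormal tissue tissue",
    "C2": "hereditary transmission of characters",
    "C3": "abnormal increase of calcium",
}

def nwf(word, concept):
    """Word frequency in a definition, normalized by the max frequency."""
    ws = definitions[concept].split()
    return ws.count(word) / max(ws.count(w) for w in ws)

def idf(word):
    """idf-like factor over the definitions collection, in [0, 1]."""
    n = sum(1 for g in definitions.values() if word in g.split())
    return math.log(len(definitions) / n) / math.log(len(definitions)) if n else 0.0

def necessity_word_def(word, concept):
    return nwf(word, concept) * idf(word)

print(necessity_word_def("tissue", "C1"))    # frequent here, absent elsewhere
print(necessity_word_def("abnormal", "C1"))  # shared with C3: lower weight
```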

N(mk → di) = φki = nwfki × idfk    (7)

where idfk = log(NbDefinitions / Card({Dj : mk ∈ Dj})) / log(NbDefinitions)    (8)

is the importance of the word mk in the definitions collection, nwfki is the normalized frequency of the word mk in the definition di, and NbDefinitions is the number of definitions in MeSH. Similarly, icfk = log(NbConcepts / Card({Cj : mk ∈ Cj})) / log(NbConcepts) is the importance of the word mk in the concepts collection.

TABLE III. WORD POSSIBILITY DISTRIBUTION

Π(Mk|Di)    mk         ¬mk
di          nwfki      1
¬di         1 − φki    1

For the terms, we determine the possibility value Π(th|mk) from the importance of the word mk in the term th (Table IV):

Π(th|mk) = 1 / NbWords(th)    (9)

TABLE IV. TERMS POSSIBILITY DISTRIBUTION

Π(Th|Mk)    th               ¬th
mk          1/NbWords(th)    1
¬mk         1 − icfk         1

Concept co-occurrence (Step 2): in this step, we filter the results of Step 1 to determine the semantically closest concepts among those returned. For this purpose, we propose a new method of semantic proximity based on concept co-occurrence and, more precisely, on the contingency table proposed in previous works.

TABLE V. CONTINGENCY TABLE OF TWO WORDS

              Mi present    Mi absent
Mj present    a             b
Mj absent     c             d

(a) is the number of times the two words appear together, (b) the number of times the first word appears without the second, (c) the number of times the second word appears without the first and (d) the number of times neither word appears. To determine the semantic relation between two concepts from co-occurrence, we rely on the following hypothesis: "two concepts have a semantic relation if their words appear together in documents". We thus determine the semantic relation between words in order to determine the semantic relation between concepts. The first relation is determined from the contingency table (Table V) and the simple matching coefficient (SMC):

SMC(mi, mj) = (a + d) / (a + b + c + d)    (10)

Let Ck and Cg be two concepts of MeSH, each of them being a set of words: Ck = {Mk1, Mk2, …, Mkp}, Cg = {Mg1, Mg2, …, Mgp}. We use the co-occurrence between all possible pairs (Mki, Mgj) ∈ Ck × Cg, which forms a matrix of size |Ck| × |Cg|. Then, we determine the co-occurrence (Coc) between the two concepts as the sum of the SMCs of all the pairs:

Coc(Ck, Cg) = Σ_{(Mki, Mgj) ∈ Ck × Cg} SMC(Mki, Mgj)
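The Step-2 computation can be sketched over a toy document collection; we assume the standard simple matching coefficient (a + d) / (a + b + c + d), and the documents are illustrative word sets:

```python
# Sketch of Step 2: contingency counts for a word pair, the simple
# matching coefficient, and concept co-occurrence as the sum of the
# SMCs over all word pairs.

docs = [  # each document reduced to its word set (toy data)
    {"calcium", "serum", "malignancy"},
    {"calcium", "bone"},
    {"malignancy", "tumor"},
    {"bone", "tumor"},
]

def smc(w1, w2):
    a = sum(1 for d in docs if w1 in d and w2 in d)          # both present
    b = sum(1 for d in docs if w1 in d and w2 not in d)      # only w1
    c = sum(1 for d in docs if w2 in d and w1 not in d)      # only w2
    n = sum(1 for d in docs if w1 not in d and w2 not in d)  # neither
    return (a + n) / (a + b + c + n)

def coc(c1_words, c2_words):
    """Co-occurrence of two concepts: sum of SMCs over all word pairs."""
    return sum(smc(m1, m2) for m1 in c1_words for m2 in c2_words)

print(smc("calcium", "malignancy"))                        # 0.5
print(coc({"calcium", "serum"}, {"malignancy", "tumor"}))  # 1.5
```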

IV. EVALUATION

A. Test collection

For the assessment of our algorithms, we integrated our concept extraction method in an IR process. We used a vector space model (VSM) [2] and a collection of biomedical documents. In this process, the matching between a document and a query is carried out between the query concepts (returned by our method) and the document concepts (provided by the expert). For our assessment, we use the OHSUMED test collection [20] proposed in the TREC-9 Filtering track (2000). Human experts have annotated each document with a set of MeSH concepts. We used P@5, P@10 and P@50, representing respectively the mean precision values at the top 5, 10 and 50 returned documents, and the recall R, for 63 queries:

P@X = (Σ_{i=1}^{n} Pi@X) / n, where Pi@X = (Σ_{j=1}^{X} relevance(Dj)) / X

R = (Σ_{i=1}^{n} Ri) / n, where Ri = (number of relevant documents returned) / (number of relevant documents)

B. Results and discussion

In order to determine the usefulness of the possibility distributions of our model, we compare the results found for two different distributions. The first (Dist-1) is the possibility distribution of the terms in the first model (Table I and Table II). The second (Dist-2) is a distribution that ignores the knowledge about the terms (all possibility values Π(Ti|Cj) and Π(Mk|Ti) equal to 1). Figure 4 illustrates the variation of the precision P@50 for the two distributions relative to the number of concepts. The syntactic feature we exploited gives +0.041 in the precision P@50, which shows the utility of the length of the terms for the comparison between terms of the same concept. We used the size of the terms first in the first model (Π(mk|ti) = 1 / NbWords(ti) and Π(ti|cj) = (1 + NbWords(Tmax) − NbWords(ti)) / NbWords(Tmax)) and then in the second model (the possibility Π(th|mk)). The exploitation of this morphological characteristic of the MeSH terms for modeling the possibilistic network is thus an interesting idea.

Figure 4. P@50 for the two possibility distributions (Dist-1 and Dist-2) as a function of the number of concepts.

In Figure 5, we present the variation of the precisions (P@5, P@10 and P@50) relative to the number of concepts, over all queries, when only the concepts returned by the extraction (EX) process are used. The three precisions have the same shape: their values decrease as the number of concepts increases. When the number of concepts equals two, we find the three following (maximal) precision values: P@5 = 0.1333, P@10 = 0.1238, P@50 = 0.0936. These values show that most queries can be indexed by at most two concepts returned by the EX process to obtain a maximal precision.

Figure 5. The variation of the precisions (P@5, P@10 and P@50) as a function of the number of concepts.

In order to show the impact of the enrichment (EN) process in the concept extraction, we index the queries using the two methods together (EX + EN). We use only the first two concepts returned by the EX method; for each of them, we use the first concept returned by the EN process, and then we calculate the precisions (P@5, P@10 and P@50). Table VI shows the importance of the EN process in the concept extraction approach (e.g. +0.16 in terms of P@5). This result shows that applying the EN process to each concept returned by EX allows us to find the most significant concepts for each query. In this study, we used the first concept returned by the EN process for indexing the OHSUMED queries.

TABLE VI. COMPARISON WITH ATM

Measure    ATM       EX        EX + EN
P@5        0.1396    0.1333    0.2984
P@10       0.1317    0.1238    0.2460
P@50       0.0961    0.0936    0.1638
Recall     0.4360    0.4520    0.5210

We compared the results found by our method with the results found by the ATM method (see Section 2). This comparison is given in Table VI. We found almost the same values for the ATM method and our extraction process (EX): there is a small variation, in the order of 0.002 for the precisions and approximately 0.02 for the recall. Nevertheless, the use of our two algorithms together (EX + EN) gives better results than ATM (e.g. +0.15 in terms of P@5, +0.11 in terms of P@10).
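The evaluation measures P@X and R used above can be sketched as follows; the two rankings and relevance sets are toy data:

```python
# Sketch of the evaluation measures: mean precision at rank X and mean
# recall over a set of queries, as in the P@X and R definitions used
# for OHSUMED.

def precision_at(ranked, relevant, x):
    """Pi@X: fraction of the top-X returned documents that are relevant."""
    return sum(1 for d in ranked[:x] if d in relevant) / x

def recall(ranked, relevant):
    """Ri: relevant documents returned / total relevant documents."""
    return sum(1 for d in ranked if d in relevant) / len(relevant)

runs = [  # (ranking, relevant set) for each query
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d9"}),
    (["d7", "d8", "d9", "d1", "d2"], {"d8", "d2"}),
]

p_at_5 = sum(precision_at(r, rel, 5) for r, rel in runs) / len(runs)
mean_r = sum(recall(r, rel) for r, rel in runs) / len(runs)
print(p_at_5)  # (2/5 + 2/5) / 2 = 0.4
print(mean_r)  # (2/3 + 2/2) / 2 ~= 0.833
```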



V. CONCLUSION

This paper presents a new method for extracting biomedical concepts using the MeSH thesaurus. The method is structured around two main algorithms. The first is a possibilistic algorithm for concept extraction based on approximate search and a possibilistic model. The second is an algorithm for thesaurus enrichment based on two steps: enrichment with the concept definitions and enrichment with the OHSUMED collection. In this model we have exploited the syntactic features of the MeSH entry terms, the semantic proximity between concepts and statistical computing. A wide experimental study allowed us to show the effect of the different parameters proposed by our method, in particular the possibility distributions of the two possibilistic models.

As future work, we will concentrate on the determination of the number of relevant concepts for each query. We will also exploit the degree of relevance and the degree of co-occurrence in order to determine thresholds for filtering the results returned by the two methods, extraction and enrichment.

VI.

REFERENCES

[1] G. Salton, E. A. Fox and H. Wu, "Extended Boolean information retrieval," Communications of the ACM, vol. 26, pp. 1022-1036, 1983.
[2] G. Salton, A. Wong and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, pp. 613-620, Nov. 1975.
[3] H. Turtle and W. B. Croft, "Inference networks for document retrieval," Second Annual ACM Conference on Hypertext, pp. 213-224, 1989.
[4] F. Fkih, M. N. Omri and I. Toumia, "A Linguistic Model for Terminology Extraction based Conditional Random Fields," CoRR, vol. abs/1210.0252, Nov. 2012.
[5] K. Fukuda, A. Tamura, T. Tsunoda and T. Takagi, "Toward information extraction: identifying protein names from biological papers," Pacific Symposium on Biocomputing, Hawaii, pp. 707-718, 1998.
[6] R. Gaizauskas, G. Demetriou, P. J. Artymiuk and P. Willett, "Protein structures and information extraction from biological texts: the PASTA system," Bioinformatics, 19(1), pp. 135-143, 2003.
[7] N. Collier, C. Nobata and J. Tsujii, "Extracting the names of genes and gene products with a hidden Markov model," Proc. of the 18th Conference on Computational Linguistics, vol. 1, pp. 201-207, 2000.
[8] S. Mukherjea, L. V. Subramaniam, G. Chanda, S. Sankararaman, R. Kothari, V. Batra, D. Bhardwaj and B. Srivastava, "Enhancing a biomedical information extraction system with dictionary mining and context disambiguation," IBM J. Res. Dev., vol. 48, pp. 693-701, 2004.
[9] K. Frantzi, S. Ananiadou and H. Mima, "Automatic recognition of multi-word terms: the C-value/NC-value method," International Journal on Digital Libraries, pp. 115-130, 2000.
[10] P. Ruch, "Automatic assignment of biomedical categories: toward a generic approach," Bioinformatics, vol. 22, pp. 658-664, 2006.
[11] A. R. Aronson, "Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program," Proc. AMIA Symp., pp. 17-21, 2001.
[12] B. Thirion, I. Robu and S. J. Darmoni, "Optimization of the PubMed automatic term mapping," Stud Health Technol Inform, pp. 238-242, 2009.
[13] W. Chebil, L. F. Soualmia, M. N. Omri and S. J. Darmoni, "Biomedical Concepts Extraction Based on Possibilistic Network and Vector Model," Artificial Intelligence in Medicine, vol. 9105, pp. 227-231, 2015.
[14] M. N. Omri, "Système interactif flou d'aide à l'utilisation de dispositifs techniques: SIFADE," PhD thesis, Université Pierre et Marie Curie, Paris, France, 1994.
[15] F. Fkih and M. N. Omri, "Complex Terminology Extraction Model from Unstructured Web Text Based Linguistic and Statistical Knowledge," IJIRR, 2(3), pp. 1-18, 2012.
[16] K. Garrouch, M. N. Omri and A. Kouzana, "A New Information Retrieval Model Based on Possibilistic Bayesian Networks," ICCRK'12, pp. 1-7, 2012.
[17] F. Z. Bessai-Mechmache and Z. Alimazighi, "Possibilistic model for aggregated search in XML documents," Int. J. Intell. Inf. Database Syst., vol. 6, pp. 381-404, September 2012.
[18] M. N. Omri, "Pertinent Knowledge Extraction from a Semantic Network: Application of Fuzzy Sets Theory," International Journal on Artificial Intelligence Tools (IJAIT), 13(3), pp. 705-719, 2004.
[19] O. Tuason, L. Chen, H. Liu, J. A. Blake and C. Friedman, "Biological nomenclatures: a source of lexical knowledge and ambiguity," Pacific Symposium on Biocomputing, pp. 238-249, 2004.
[20] W. R. Hersh, C. Buckley, T. J. Leone and D. H. Hickam, "OHSUMED: an interactive retrieval evaluation and new large test collection for research," SIGIR'94, pp. 192-201, 1994.
[21] M. N. Omri, I. Urdapilleta, J. Barthelemy, B. Bouchon-Meunier and C. A. Tijus, "Semantic scales and fuzzy processing for sensorial evaluation studies," Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'96), pp. 715-719, 1996.
[22] M. N. Omri, "Fuzzy Knowledge Representation: Measure of Similarity between Fuzzy Concepts in Semantic Nets," 5th Tunisia-Japan Symposium on Culture, Science and Technology, pp. 80-85, 2004.
[23] M. N. Omri, C. A. Tijus, S. Poitrenaud and B. Bouchon-Meunier, "Fuzzy sets and semantic nets for on line assistance," Proceedings of the 11th IEEE Conference on Artificial Intelligence for Applications, pp. 374-379, 1995.
[24] F. Naouar, L. Hlaoua and M. N. Omri, "Collaborative Information Retrieval Model Based on Fuzzy Confidence Network," Journal of Intelligent and Fuzzy Systems, 2015.
[25] F. Fkih and M. N. Omri, "IRAFCA: an O(n) Information Retrieval Algorithm based on Formal Concept Analysis," Knowledge and Information Systems, 44(3), 2015.
[26] F. Naouar, L. Hlaoua and M. N. Omri, "Possibilistic Information Retrieval Model based on Relevant Annotations and Expanded Classification," 22nd International Conference on Neural Information Processing (ICONIP 2015), 2015.
[27] W. Chebil, L. F. Soualmia, M. N. Omri and S. J. Darmoni, "Extraction possibiliste de concepts MeSH à partir de documents biomédicaux," Revue d'Intelligence Artificielle (RIA), 28(6), pp. 729-752.

