Semantic Similarity Based Approach for Reducing Arabic Texts Dimensionality

Arafat Awajan
The King Hussein Faculty of Computing Sciences
Princess Sumaya University for Technology
Amman, Jordan
[email protected]
Abstract
An efficient method is introduced to represent large Arabic texts in a comparatively smaller size without losing significant information. The proposed method uses distributional semantics to build a word-context matrix representing the distribution of words across contexts and to transform the text into a vector space model representation based on word semantic similarity. The linguistic features of the Arabic language, together with semantic information extracted from lexical-semantic resources such as Arabic WordNet and named-entity gazetteers, are used to improve the text representation and to create clusters of similar and related words. Distributional similarity measures are used to capture the semantic similarity between words and to group them into these clusters. The conducted experiments show that the proposed method significantly reduces the size of the text representation: by about 27% compared with the stem-based vector space model and by about 50% compared with the traditional bag-of-words model. The results also show that the amount of dimension reduction depends on the size and shape of the analysis windows as well as on the content of the text.
Keywords: Text dimensionality reduction, distributional semantics, word-context matrix, semantic vector space model, Arabic language processing, word similarity
1 Introduction
Texts are high-dimensional objects in which every word may be considered an independent attribute. With the increasing amount of text available in electronic form, reducing text dimensionality has become an important issue in many natural language processing (NLP) tasks, as these tasks perform well on low-dimensional texts and poorly on high-dimensional ones. Techniques that reduce the text dimension while minimizing information loss are therefore essential: they improve time and space efficiency and the quality of the results of many NLP tasks (Martins et al. 2003).

Different techniques have been proposed for reducing text dimensionality, and they are usually incorporated into the model used for text representation. They aim to identify a low-dimensional representation of the original text by eliminating redundant terms and reducing the number of features representing the text, while preserving as much as possible of the original content. The reduction of text dimensionality can be driven by knowledge specified manually by experts, derived automatically from corpus statistics, or computed from linguistic resources. The knowledge accumulated from these different levels of analysis leads to feature selection mechanisms able to reduce the text dimension without losing significant information.

Generally, NLP applications receive textual documents as "bags of words". Two main representations of texts are used in the literature: the feature vector representation and the graph representation. In the feature vector representation, a document is represented by a vector built according to the vector space model (VSM), whose components correspond to the features of the text, principally its terms or words (Salton et al. 1975). The graph representation is an alternative in which the terms are represented by nodes and the relations between them, such as their co-occurrence, are represented as edges (Mihalcea and Tarau 2004, Biemann 2006).
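To make these notions concrete, the following Python fragment is a minimal, illustrative sketch, not the implementation used in this work, of a bag-of-words feature vector and a word-context (co-occurrence) matrix from which distributional similarity between words can be measured with the cosine measure. The toy English sentence, the window size of 2, and the function names are illustrative assumptions; tokenization, stemming, and the Arabic-specific resources (Arabic WordNet, named-entity gazetteers) are omitted.

# Illustrative sketch only: bag-of-words VSM and word-context matrix
# with cosine similarity between word context vectors.
from collections import Counter
import math

def bag_of_words(tokens, vocabulary):
    # Feature-vector (VSM) representation: one dimension per vocabulary term.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def word_context_matrix(tokens, vocabulary, window=2):
    # Co-occurrence counts of each word with the words in its context window.
    index = {w: i for i, w in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in vocabulary]
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                matrix[index[word]][index[tokens[j]]] += 1
    return matrix

def cosine(u, v):
    # Distributional similarity between two word (row) vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

tokens = "the cat sat on the mat while the dog sat on the rug".split()
vocab = sorted(set(tokens))
print(bag_of_words(tokens, vocab))          # document vector in the VSM
wcm = word_context_matrix(tokens, vocab)    # word-context matrix
i, j = vocab.index("cat"), vocab.index("dog")
print(cosine(wcm[i], wcm[j]))               # words in similar contexts score high

Words whose context vectors are highly similar can be merged into a single cluster and represented by one dimension, which is the basic mechanism by which the proposed approach reduces the dimensionality of the text representation.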
Full-text view-only version available at https://link.springer.com/epdf/10.1007/s10772-015-9284-6?author_access_token=lcKT5xXWplfbz5iXkcrfPe4RwlQNchNByi7wbcMAY4e6Q4gkQrT0QH6Dr7ErfigSfSG3PuB34XJaQev2SR8U7RQKRTDEKcGWMKNd0ozeNU8YkFgzxgQWzMeGsrQhtpd_zFK2WgIfA01V7Vi6YoxRQ%3D%3D