Unsupervised Text Extraction from G-Maps - arXiv

Recommend Documents

Unsupervised Relation Extraction from Web Documents - CiteSeerX

to support the expert in the exploration of the search ... the web using the Google search engine. .... Marketing GmbHâ instead of âBerlin Tourismus Mar-.

Unsupervised Structured Data Extraction from ... - Semantic Scholar

Apr 5, 2013 - integrated and reused in very big range of applications, such as price comparison ... business intelligence tools, various mashups and etc.

Unsupervised Features Extraction from Asynchronous ... - Google Sites

Now for many applications, especially those involving motion processing, successive ... 128x128 AER retina data in near

UNSUPERVISED TWO-STAGE KEYWORD EXTRACTION FROM ...

Index Termsâ keyword, topic coherence and term significance measure .... the TCS based on Google search engine, TCSg(ti), using the PLSA model as (7).

Overlay Text Extraction From TV News Broadcast - arXiv

Apr 2, 2016 - data. In Indian TV news broadcast, overlay text has widely varying formats and ... feature in a number of sub-tasks of news video analysis viz.

Intention Extraction From Text Messages

Almost every user of the Internet is using one or more text mes- .... tion, Iext, maps human communication messages (e.g., emails, SMS messages) to a set.

Text Extraction from Web Images

paper, we first give a short review of these methods proposed for text detection and recognition in web images; then a framework to extract from web images is ...

Unsupervised Deep Feature Extraction for Remote Sensing ... - arXiv

Nov 25, 2015 - Deep convolutional networks, deep learning, sparse features learning, feature ..... great majority of these methods have numerous meta-parameters and/or enforce ... Illustration of how EPLS generates the output target matrix.

Unsupervised Relation Extraction Using Dependency

dgP. cpIG pp gccd. â. â. â. â. = where p is a candidate pattern, c â the domain corpus, p' â a pattern ..... In: Proc. of EMNLP 2004, Barcelona, Spain (2004).

Towards Unsupervised Extraction of Verb Paradigms from Large ...

Trang Dang, Adwait Ratnaparkhi, Bill Woods, Lyle. Ungar, and anonymous .... gain ratio (Quinlan, 1990) were developed as metrics for automatic learning of ...

Unsupervised Skeleton Extraction and Motion Capture from Kinect ...

May 30, 2012 - Figure 1: Skeleton extraction from a 3D point cloud sequence. ... choice: why not extract object skeletons and capture motion directly from 3D.

Unsupervised Knowledge Extraction for Taxonomies of Concepts from ...

Abstract. A novel method for unsupervised acquisition of knowledge for taxonomies of concepts from raw Wikipedia text is presented. We assume that the ...

Unsupervised line network extraction from remotely sensed ... - Core

Caroline Lacoste â, Xavier Descombes â, Josiane Zerubia â and Nicolas Baghdadi +. â Ariana, joint research group CNRS/INRIA/UNSA. INRIA, 2004 route ...

Unsupervised Extraction of False Friends from Parallel Bi-Texts Using ...

Cognates, false friends, cross-lingual semantic similarity, Web as a corpus ... Previous work on extracting false friends from text can ..... A program for aligning.

Unsupervised Knowledge Extraction for Taxonomies of Concepts from ...

fier restrictions (e. g. most birds build nests), parts of the instances of ... building [8]. .... Aircraft. TitleConcept be T. T use TitleConcept. T build TitleConcept. Boat.

Automatic Extraction of Hierarchical Relations from Text

Relation extraction from text aims to detect and classify semantic relations .... General Architecture for Text Engineering (GATE) [6] is an infrastructure for.

METADATA EXTRACTION FROM TEXT IN SOCCER DOMAIN

reports for soccer domain. The UEFA ... video is available, the video portions that correspond to the ... provided for querying and searching the match corpus.

Morphological Lexicon Extraction from Raw Text Data

Markus Forsberg, Harald HammarstrÃ¶m and Aarne Ranta. {markus,harald2,aarne}@cs.chalmers.se. Department of Computing Science. Chalmers University of ...

Knowledge Extraction from Diagram and Text for

Knowledge Extraction from Diagram and Text for Media Integration. Yuichi NAKAMURA, Miwa TAKAHASHI, Masayuki ONDA, Yuichi OHTA. Institute of ...

Automated Concept Extraction from Plain Text

Automated Concept Extraction From Plain Text ... structured textual resources such as email archives, ..... The basic idea is to create a 'master' or 'template'.

Text extraction from comic books

Keywords: Haar-like features, comic books, comics text extraction, connected- .... (2004). 8. K. Arai, H. Tolle, âMethod for automatic e-comic scene frame ...

Information extraction from text messages using data

Jan 21, 2018 - world of modern computing, people feel free to share their views and ... Text mining, knowledge discovery, sentiment analysis, opinion mining.

Text Extraction from Document Images Using Fourier

Nov 9, 2014 - Fourier Transform, Document Image Analysis (DIA), DFT,. FFT, Text Extraction, Text Detection, Text Localization, Text. Enhancement. 1.

intelligent text extraction from pdf

Jun 13, 2005 - vision station TV5. ''I saw this number on the screen and it was a 140,'' Aubenassaid. ''It took a whiletorealizeit referredtothenum- ber of days I ...

Unsupervised Text Extraction from G-Maps - arXiv

Download PDF

14 downloads 5523 Views 654KB Size Report

Comment

magazines, articles, books, video snapshots, maps, news papers, manuscripts [32], advertisements, banners, street sign boards, name plates, cards, web pages ...

Unsupervised Text Extraction from G-Maps Chandranath Adak Department of Computer Science and Engineering University of Kalyani West Bengal-741235, India [email protected] Abstract—This paper represents an text extraction method from Google maps, GIS maps/images. Due to an unsupervised approach there is no requirement of any prior knowledge or training set about the textual and non-textual parts. Fuzzy CMeans clustering technique is used for image segmentation and Prewitt method is used to detect the edges. Connected component analysis and gridding technique enhance the correctness of the results. The proposed method reaches 98.5% accuracy level on the basis of experimental data sets. Keywords—clustering; connected component; Fuzzy C-Means clustering; gridding; text extraction

I.

INTRODUCTION

FUZZY C-MEANS CLUSTERING

Clustering [7] is a process to partition a space into some regions on the basis of some similarities and properties. FCM clustering technique is developed and improved by Dunn (1973) and Bezdek (1981) respectively. FCM clustering algorithm is as follows: (i) Randomly choose K points as initial cluster centers {C1, C2,…, CK} from a input space of data set X {x1, x2,…, xn }. (ii) Assign each point to its nearest cluster center, i.e., generate partition matrix U(X). U = [ uki ] ; k=1,2,…K ; i=1,2,…,n

;

(1)

(2)

,

∑

,

1

;1

d(z,x) is the distance function (here we use Euclidean distance function); m(>1) is a fuzzy exponent, called as fuzzifier. (iii) zk is calculated as follows : ∑

Text extraction [1-3] is a technique to take-out the textual portion from a non-textual background, e.g. images of magazines, articles, books, video snapshots, maps, news papers, manuscripts [32], advertisements, banners, street sign boards, name plates, cards, web pages etc. This text extraction problem is challenging in the widespread research areas of computer vision due to its complex backgrounds, i.e. unknown regions: mixture of textual and non-textual graphics. We deal with text extraction from Google maps (G-Maps), which is very essential for geographical information systems (GIS) [4] to know the information about the particular geographic region, written in the textual portions. Here we separate the texts from G-Maps in the unsupervised manner, i.e. no prior knowledge is required. We also use fuzzy c-means (FCM) clustering [5] for segmenting the input G-Maps, and Prewitt [6] method for edge detection. Connected component (CC) analysis and gridding lead the proposed method to be more accurate. II.

uki is the membership degree of ith point to kth cluster.

.

(3)

∑

(iv) Calculate cluster validity index (Jm) : ∑

∑

,

(4)

Clustering result is better for smaller Jm value. III.

PROPOSED METHOD

The proposed method consists of following steps: Step 1: G-Map is taken as input image IRGB (fig.2a). Step 2: IRGB is converted into grayscale image Igray (fig.2b) and noise is removed. Step 3: Igray is segmented [6, 8] with the help of FCM, and we get the segmented mask image Imask (fig.2c). Step 4: Ie , i.e. the edge (fig.2d) of Imask , is generated using Prewitt method [6] . Step 5: Dilation [6, 9] operation is performed on Ie to get the dilated image Id (fig.2e). Step 6: We generate all the connected components (Icc) [6,10] from Id , by the concept of label matrix. The maximal components Imcc (fig.2f) are extracted based on a threshold [6, 11] (T : 0