A Paper-to-HTML Table Converting System

German Research Center for Artificial Intelligence (DFKI). P.O. Box 2080, ... table cells and the analysis of the layout to determine a correct row/column mapping. We start with .... The above table consists of what we call type 1 blocks a directory ...
 

               


[Figure: tree diagram with the node labels Document, Block, and Word.]

[Figure: example text fragment, repeated three times in the extraction: "This is a small it consists of lines - enough".]

[Figure: example table, a directory listing.]

13968  Nov 12  13:38  blocklist.c
 6254  Nov  7  09:57  mainctrl.c
17391  Nov  7  10:02  margin.c
 9613  Nov 12  13:24  ocrread.c
13741  Nov 12  10:34  restruct.c
 3128  Oct 24  10:50  segmenter.c
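The title names HTML as the target format. Once a row/column mapping has been determined for a table such as the directory listing above, emitting it as HTML is the simple part. The following is only a minimal sketch under that assumption: the function name `rows_to_html` and the choice to emit plain `<td>` cells are illustrative and are not taken from the paper.

```python
from html import escape


def rows_to_html(rows):
    """Render rows (each a list of cell strings) as a simple HTML table.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    lines = ["<table>"]
    for row in rows:
        # Escape each cell so characters like '<' or '&' stay literal.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Two rows of the directory listing, already split into cells.
listing = [
    ["13968", "Nov", "12", "13:38", "blocklist.c"],
    ["6254", "Nov", "7", "09:57", "mainctrl.c"],
]
html_table = rows_to_html(listing)
print(html_table)
```

The hard part of the problem, which this sketch takes as given, is deciding which words belong to which row and column in the first place.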


[Figure, panels (a) to (c): a sample document page containing the elements below.]

Sparse directory listing (cells as extracted, column by column):

-rw-r--r--rw-r--r--rw-r--r--rw-r--r--
1 1 1 1 1
kieni kieni kieni kieni
6642 17811 13414 18879 3128
Dec 16 13:21 mainctrl.c 5 11:06 Dec 16 postscript.c Dec 5 16:41 Oct 24 10:50 segmenter.c

The above table consists of what we call type 1 blocks: a directory listing, manually made "sparse".

Segmentation of text blocks and table columns | Evaluation of table rows/cells based on heuristics | Reconstruction of row/column structure

Name-Anschluss-Raum
Kieni 3485 380
Hinkelma 3456 369
Malburg 3585 474
Lutzy 3464 470

The table shown to the left needs to be postprocessed in order to isolate the glued columns.

The following paragraph demonstrates the ability of our system to detect so-called non-Manhattan layout objects.

This is an example which becomes smaller and smaller from line to line. Such a layout is hard to find in today's documents, but it is remarkable to say that my segmentation algorithm is able to recognize the gap.

Pos Nmb Description
1 2
2 4
3 4
2 1
PostScript Ref. Manual PS Quick Reference Guide and Tutorial Pattern Recognition Handbook SPIE Document Recognition IV
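The sample page notes that the Name-Anschluss-Raum table must be postprocessed to isolate its glued columns. How the paper does this is not recoverable from the text here. As a stand-in, the sketch below uses a common whitespace-projection idea: a character position that is blank in every row is treated as a column separator. The function names and the monospaced, space-aligned input rows are assumptions made for illustration.

```python
def column_gaps(lines):
    """Return the character positions that are blank in every line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return [i for i in range(width) if all(p[i] == " " for p in padded)]


def split_at_gaps(lines):
    """Split each line into cells at positions blank across all lines."""
    width = max(len(line) for line in lines)
    gaps = set(column_gaps(lines))
    # Collect (start, end) spans of consecutive non-gap positions.
    spans, start = [], None
    for i in range(width + 1):
        in_gap = i == width or i in gaps
        if not in_gap and start is None:
            start = i
        elif in_gap and start is not None:
            spans.append((start, i))
            start = None
    return [[line.ljust(width)[a:b].strip() for a, b in spans] for line in lines]


# Data rows of the sample table, assumed to be space-aligned.
rows = [
    "Kieni    3485 380",
    "Hinkelma 3456 369",
    "Malburg  3585 474",
    "Lutzy    3464 470",
]
cells = split_at_gaps(rows)
print(cells)
```

The header row is left out on purpose: a glued header such as "Name-Anschluss-Raum" spans the gaps, so including it in the projection would hide the separators that the data rows reveal.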


[Figure: the sample document page in three panels, containing the sparse directory listing, the Segmentation / Evaluation / Reconstruction table, the Name-Anschluss-Raum table, the non-Manhattan paragraph, and the Pos / Nmb / Description table.]


[Figure: diagram with the labels Data Cells, Table Columns, Table Rows, and Table Tiles.]


