Semi-Supervised Feature Learning for Off-Line Writer ... - arXiv

Semi-Supervised Feature Learning for Off-Line Writer Identifications

Shiming Chen1, Yisong Wang1*, Chin-Teng Lin2, Zehong Cao2*

1Guizhou University, Guiyang, Guizhou, China. ([email protected]; [email protected])

2Centre for Artificial Intelligence, Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW, Australia. ([email protected]; [email protected])

*Corresponding authors.

Abstract. Conventional approaches to off-line writer identification rely on supervised learning. In this study, we improve off-line writer identification with a semi-supervised feature learning pipeline that trains on extra unlabeled data and the original labeled data simultaneously. Specifically, we propose a weighted label smoothing regularization (WLSR) method, which assigns a weighted uniform label distribution to the extra unlabeled data. WLSR regularizes the convolutional neural network (CNN) baseline, allowing it to learn more discriminative features that represent the properties of different writing styles. In experiments on the ICDAR2013, CVL and IAM benchmark datasets, semi-supervised feature learning improved on the baseline and achieved better performance than existing writer identification approaches.

Keywords: Semi-supervised Learning, Regularization, CNN, Writer Identifications.

1 Introduction

Like speech, fingerprints, and faces, handwritten text serves as a biometric identifier, in particular for finding an individual writer, or documents by the same writer, in a large data corpus for forensic or security purposes. The topic has also attracted attention in historical document analysis, driven by the mass digitization of historical documents [1], [2], [3]. Manual analysis, however, costs experts a great deal of time and energy, so many researchers have turned their attention to automatic identification. Writer identification aims to find, in a query database, the documents most similar to a given document, i.e., those written by the same writer. The task is challenging: the documents are sorted by distance, and the writer of the document with the highest similarity (e.g., the smallest distance between feature vectors) is then assigned as the author of the query document.

Writer identification comes in two types, on-line and off-line. The latter can be further categorized into allograph-based and textural-based methods. Textural-based methods compute global statistics directly from the handwriting; for example, the angles of stroke directions [4] or the width of the ink trace [5] have been used for writer identification. Allograph-based methods rely on local descriptors computed from small patches (allographs), from which a global document descriptor is subsequently computed using statistics of the local descriptors [6], [7], [8]. The two kinds of methods can also be combined to form a discriminative global feature [9]. In this work, the proposed semi-supervised feature learning pipeline for off-line writer identification is allograph-based.

Although writer identification has achieved significant performance on some benchmark datasets, many challenges remain for real-world application. First, the use of different pens, the physical condition of the writer, distractions such as multitasking and noise, and changes of writing style with age are key factors that degrade writer identification performance. Second, the writers in the training set differ from those in the test set, and every writer contributes only a few handwritten text images in the typically used benchmark datasets. Third, the amount of data in handwriting benchmark datasets is severely insufficient for training a convolutional neural network (CNN), so training a reliable CNN model with limited data is a challenge.
To overcome these challenges, we propose a semi-supervised pipeline that combines feature learning with a deep CNN and weighted label smoothing regularization (WLSR) to learn discriminative representations for off-line writer identification. Specifically, we first preprocess the original data and the extra unlabeled data for training. The original labeled data and the extra unlabeled data are then fed into the ResNet model [10] simultaneously. The WLSR method regularizes the learning process by integrating the unlabeled data, which can reduce the risk of over-fitting and direct the model to learn more competitive features. Finally, the mean of all local features of each test page is taken as the global feature vector for identification. To summarize, this study makes the following contributions:
a. To our knowledge, this is the first study to use a semi-supervised feature learning pipeline that integrates extra images and original images into a ResNet model for writer identification. The WLSR method for semi-supervised learning is proposed to regularize the identification model with the unlabeled data.
b. Our results show that the proposed semi-supervised learning model consistently improves over the ResNet baseline and achieves better performance than existing approaches on benchmark datasets.

2 Related Works

In this section, we review relevant work on CNNs, semi-supervised learning and writer identification.

2.1 Convolutional Neural Networks (CNNs)

CNNs are a well-known deep learning architecture inspired by the natural visual perception mechanism of living creatures. Owing to their powerful ability to learn deep features, they have been widely used and have achieved strong performance in image classification, object recognition, and object detection and tracking [10], [11]. Krizhevsky et al. [11] proposed AlexNet, which achieved a winning top-5 test error rate of 15.3% in ILSVRC-2012, compared with 26.2% for the second-best entry. Since then, VGGNet, GoogLeNet and ResNet [10] have been proposed and achieved better performance in computer vision. He et al. [10] successfully deepened the network with residual units and achieved state-of-the-art performance in ILSVRC 2015. In [12], He et al. further showed experimentally that ResNet can learn more competitive representations than other CNN models. Eldan and Shamir [13] showed that a deeper network learns more discriminative representations but needs more resources to train. As a trade-off, we therefore use the ResNet-50 model in this work.

2.2 Semi-supervised Learning

Semi-supervised learning extends supervised learning by also exploiting unlabeled data, which is especially useful when the labeled dataset is small. Some researchers have proposed semi-supervised learning pipelines that combine unsupervised learning with supervised learning [14]. Other prior works assign an original label or a new label to the unlabeled data. For example, Lee [15] assigned each unlabeled sample a pseudo label, namely the class with the maximum predicted probability. In another study [16], all samples produced by the generator of a generative adversarial network (GAN) are treated as one additional class in the discriminator. In contrast, we assign a weighted uniform label distribution to the extra data according to the original training data (real data), which has the potential to regularize the baseline and improve performance.

2.3 Writer Identification

Recent progress in writer identification has mainly benefited from advances in CNNs. Fiel and Sablatnig [17] applied a series of image preprocessing steps (binarization, text line segmentation, sliding window) and then generated a discriminative feature with CaffeNet for each 56x56 image patch. Xing and Qiao [3] designed and optimized a multi-stream structure for the writer identification task and achieved high identification accuracy on the IAM and HWDB datasets. Christlein et al. [18] proposed an effective way to learn CNN activation features in an unsupervised manner, which achieved state-of-the-art results on the Historical-WI (ICDAR 2017) dataset. Tang and Wu [19] introduced a novel method for off-line writer identification using CNNs and joint Bayesian, and obtained the best performance among existing methods evaluated on the benchmark datasets. Based on these prior studies, we adopt ResNet-50 as the CNN baseline in our study, which is subsequently regularized by the WLSR method.

3 Semi-supervised Learning Pipeline

As shown in Fig. 1, our proposed pipeline consists of three main steps.
A. Preprocessing: the handwriting pages are segmented into text lines with a line segmentation method [20] and then cut into 256x256 patches by a sliding window without overlap.
B. Feature extraction: the ResNet-50 baseline regularized by WLSR serves as the identification model for local feature extraction.
C. Encoding: the mean of all local features of each test page is taken as its global feature vector, which is used for writer identification with the nearest neighbor approach.

Fig. 1. The pipeline of semi-supervised feature learning.

3.1 Preprocessing

First, all handwriting pages are binarized with Otsu's method. Second, the text lines are segmented. Because the CVL [2] and IAM [21] datasets already provide a segmentation into words, these word images are used for training and evaluation after normalization. For the ICDAR2013 Competition on Writer Identification dataset [1], the lines are segmented with the method of Arivazhagan et al. [20], a statistical approach to segmenting text lines. Finally, all text lines are cut into 256x256 patches without overlap using a sliding window. We also remove noise patches (e.g., blank patches) to avoid their negative effects.
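The binarization and patch-cutting steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list image format, and the `min_ink` threshold used to discard near-blank patches are assumptions made for illustration, and line segmentation [20] is left out.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of integer gray levels in 0..255."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # between-class variance (unnormalized)
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Map dark pixels (ink) to 1 and bright pixels (paper) to 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]


def cut_patches(binary, size=256, min_ink=0.01):
    """Cut non-overlapping size x size patches and drop near-blank ones.

    min_ink is an assumed threshold on the fraction of ink pixels.
    """
    height, width = len(binary), len(binary[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patch = [row[left:left + size] for row in binary[top:top + size]]
            ink = sum(map(sum, patch)) / (size * size)
            if ink >= min_ink:
                patches.append(patch)
    return patches
```

Stepping the window by `size` in both directions is what makes the patches non-overlapping; a smaller stride would give the overlapping variant that the paper does not use.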

3.2 Feature Extraction

CNN Baseline. In this study, the ResNet-50 model is used as the baseline because it can learn discriminative representations for the writer identification task without excessive time and computational resources. Following the conventional fine-tuning strategy, we use a model pre-trained on ImageNet. To avoid over-fitting and obtain more abstract features, we add a ReLU layer and replace the original pooling layer with a global average pooling layer before the fully-connected layer. In addition, we modify the last layer to have K neurons to predict K classes, where K is the number of classes in the original training data. The extra data are mixed with the original training data as the input of the CNN; that is, the labeled original training data and the unlabeled extra data are shuffled and trained on simultaneously. After training, we extract the local features of all test patches from the fully-connected layer.

Weighted Label Smoothing Regularization Method. Label smoothing regularization for outliers (LSRO) was first proposed by Zheng et al. [22] for person re-identification. It extends label smoothing regularization (LSR) [23] from the supervised domain to leverage unlabeled data generated by a GAN, setting the virtual label distribution to be uniform over all classes; this regularized the baseline model and achieved better retrieval performance than the baseline. In this work, we propose the weighted label smoothing regularization (WLSR) method to regularize the CNN baseline with extra unlabeled data for off-line writer identification. WLSR sets the virtual label distribution to be weighted uniform over all classes, which regularizes the baseline according to the distribution of the original training data.
For instance, the original training patches contain many frequently repeated words, such as “the” and “a”, so the identification model takes the frequency of occurrence of these words as a discriminative representation, which limits its discriminative ability. If we add such words from the unlabeled extra data to the training set, however, the classifier will make wrong predictions towards the labeled word patches and will thus be penalized. We now introduce WLSR concretely.

WLSR is proposed for use with the cross-entropy loss. Formally, let $k \in \{1, 2, \ldots, K\}$ index the classes of the original training data, and let N be the number of original training samples. The cross-entropy loss is given by Eq. 1:

$$ l = -\sum_{k=1}^{K} \log(p(k))\, q(k), \tag{1} $$

where $p(k) \in [0, 1]$ is the predicted probability that a training sample belongs to class k, obtained from the softmax function that normalizes the output of the previous fully-connected layer, and $q(k)$ is the ground-truth distribution. Let y be the ground-truth class label. A pair $(x_i, y_i)$ is called an original training example, with $i \in \{1, 2, \ldots, N\}$. Because WLSR sets the virtual label distribution to be weighted uniform over all classes, the label distribution $q_{WLSR}(k)$ can be written as:

$$ q_{WLSR}(k) = \begin{cases} Z\, \dfrac{\sum_{n=1}^{N} I(y_n = k)}{N}, & k \neq y, \\[2ex] 1 - Z + Z\, \dfrac{\sum_{n=1}^{N} I(y_n = k)}{N}, & k = y. \end{cases} \tag{2} $$

Eq. 2 is called the weighted label smoothing regularization (WLSR). Combining Eq. 1 and Eq. 2, we can rewrite the cross-entropy loss as:

$$ l_{WLSR} = -(1 - Z) \log(p(y)) - Z \sum_{k=1}^{K} \log(p(k))\, \frac{\sum_{n=1}^{N} I(y_n = k)}{N}. \tag{3} $$
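To make Eq. 2 and Eq. 3 concrete, the following is a minimal per-sample sketch in pure Python. It is an illustration under stated assumptions, not the authors' training code: the function names are hypothetical, classes are indexed from 0, and the loss is computed for a single sample from raw logits. Here `z` plays the role of Z, which the text sets to 1 for extra unlabeled samples and 0 for labeled samples.

```python
import math


def class_frequencies(labels, num_classes):
    """Fraction of labeled training samples per class: sum_n I(y_n = k) / N."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    return [count / len(labels) for count in counts]


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities p(k)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]


def q_wlsr(y, z, freq):
    """Label distribution of Eq. 2; y may be None when z == 1 (unlabeled)."""
    return [z * f + ((1.0 - z) if k == y else 0.0) for k, f in enumerate(freq)]


def wlsr_loss(logits, y, z, freq):
    """Loss of Eq. 3 for one sample; y is not used when z == 1."""
    logp = log_softmax(logits)
    smooth = -sum(lp * f for lp, f in zip(logp, freq))  # frequency-weighted term
    hard = -logp[y] if z < 1.0 else 0.0                 # ordinary cross-entropy term
    return (1.0 - z) * hard + z * smooth
```

Substituting the distribution returned by `q_wlsr` into Eq. 1 yields the same value as `wlsr_loss`, which mirrors the step from Eq. 2 to Eq. 3: the `k = y` case contributes the extra `(1 - Z)` weight on `log p(y)`, and the frequency-weighted sum runs over all classes.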

For the extra data, Z=1. For the original training data, Z=0. The semi-supervised learning therefore has two types of losses, one for real images and one for extra images. Compared to LSR and LSRO, WLSR may regularize the CNN baseline more effectively because it reflects the distribution of the original training data more exactly. When the number of real images is the same for every class, WLSR is equivalent to LSRO.

Encoding and Evaluation. All local descriptors are extracted with the trained model. We need to aggregate them to encode a global feature vector for each test document. First, we reduce the dimensionality of the local descriptors with PCA whitening, which has been shown to lower the identification time and improve the identification performance [8], [18]. Then, we take the mean of all local descriptors of each test page as its global feature vector, which is used for nearest neighbor search with the Manhattan distance.
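The aggregation and retrieval steps can be sketched as follows. This is a simplified illustration in pure Python with hypothetical function names; the PCA whitening step is omitted, and descriptors are assumed to be plain lists of floats of equal length.

```python
def mean_pool(local_features):
    """Average a page's local descriptors into one global feature vector."""
    n = len(local_features)
    return [sum(column) / n for column in zip(*local_features)]


def manhattan(a, b):
    """Manhattan (L1) distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_documents(query, gallery):
    """Return gallery indices sorted by increasing Manhattan distance to query."""
    return sorted(range(len(gallery)), key=lambda i: manhattan(query, gallery[i]))
```

The first index returned by `rank_documents` is the nearest neighbor, whose writer would be assigned to the query; the full ranking is what the retrieval metrics operate on.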

4 Evaluation

4.1 Datasets

Three benchmark datasets were used for evaluation: the ICDAR2013 benchmark dataset [1], the CVL dataset [2] and the IAM dataset [21]. All of them are public and have been used in many recent publications [3], [6-9], [17], [19].

ICDAR2013 [1]. The ICDAR2013 benchmark dataset is divided into a training set with documents from 100 writers and a test set with documents from 250 writers. Every writer contributed four documents: two in Greek and two in English.


CVL [2]. The CVL dataset contains documents from 310 writers. The 27 writers of the training set contributed seven documents each, and the 283 writers of the test set contributed five documents each. Every writer contributed one German document; the other documents are in English.

IAM [21]. The IAM dataset was contributed by approximately 400 different writers and comprises 1066 forms. In total, 82,227 word instances out of a vocabulary of 10,841 words occur in the collection. All documents are written in English.

4.2 Evaluation Metric

The mean average precision (mAP) and the hard TOP-k scores, which are common evaluation metrics in image retrieval tasks, are used for our experimental evaluation. For each query document, a ranked list of all documents in the query library is generated according to similarity. Suppose there are N handwriting documents used as queries; the average precision AP(i) of the i-th ($1 \le i \le N$) query document is then given by Eq. 4.

$$ AP(i) = \frac{\sum_{k=1}^{M} P(k)\, rel(k)}{R} \tag{4} $$
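Eq. 4, the mAP, and the hard TOP-k score can be sketched in a few lines of Python. The function names and the input format are assumptions for illustration: each query is represented by a list whose entry at position k-1 is 1 if the document retrieved at rank k is by the same writer as the query and 0 otherwise.

```python
def average_precision(ranked_relevance, num_relevant):
    """AP of Eq. 4; ranked_relevance[k - 1] is rel(k) for the ranked list."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k) * rel(k), with P(k) = hits / k
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(all_ranked, all_num_relevant):
    """Mean of the per-query average precisions."""
    aps = [average_precision(r, n) for r, n in zip(all_ranked, all_num_relevant)]
    return sum(aps) / len(aps)


def hard_top_k(all_ranked, k):
    """Fraction of queries whose k highest-ranked documents are all relevant."""
    ok = sum(1 for r in all_ranked if len(r) >= k and all(r[:k]))
    return ok / len(all_ranked)
```

For a query with ranked relevance `[1, 0, 1]` and two relevant documents, the precision is 1/1 at rank 1 and 2/3 at rank 3, so the average precision is (1 + 2/3) / 2.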

In Eq. 4, M is the number of documents in the query library and R is the number of documents in the query library that are relevant to the i-th query document. P(k) is the precision at rank k, given by the number of documents from the same writer as the query up to rank k divided by k. rel(k) is an indicator function that is one when the document retrieved at rank k is from the same writer and zero otherwise. The mAP is the mean of the average precision over all query documents. The hard TOP-k score is the percentage of queries for which the k highest-ranked documents are all from the same writer as the query.

4.3 Experiments

We evaluate the proposed method on the ICDAR2013, CVL and IAM benchmark datasets. The implementation details and an analysis of the experimental results follow.

Implementation Detail. We adopt the ResNet-50 model as the baseline. To obtain more abstract features, we replace the original pooling layer with a global average pooling layer and add a ReLU activation layer. Furthermore, the fully-connected layer is modified to have 100 and 27 neurons for ICDAR2013 and CVL, respectively. We add a dropout layer before the last convolutional layer and set the dropout rate to 0.5 for training. The momentum of stochastic gradient descent is set to 0.9. We set the learning rate of the convolutional layers to 0.1 and decay it to 0.01 after 45 epochs.
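The stated hyperparameters can be collected in a small configuration sketch. The dictionary keys and the `learning_rate` helper are hypothetical names, and counting epochs from zero (so that epochs 0 to 44 use the base rate) is an assumption; the paper does not specify these details.

```python
# Values taken from the text; key names are illustrative assumptions.
TRAINING_CONFIG = {
    "momentum": 0.9,
    "dropout_rate": 0.5,
    "base_lr": 0.1,
    "decayed_lr": 0.01,
    "decay_epoch": 45,
}


def learning_rate(epoch, config=TRAINING_CONFIG):
    """Step schedule: base_lr for the first decay_epoch epochs, then decayed_lr."""
    if epoch < config["decay_epoch"]:
        return config["base_lr"]
    return config["decayed_lr"]
```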


To evaluate on ICDAR2013, we take the ICDAR2013 training image patches as the original labeled data and the CVL training image patches as the extra unlabeled data. Because the CVL and IAM datasets already provide a segmentation into words, to evaluate on CVL we directly take the CVL training words as the original labeled data and the IAM words as the extra unlabeled data. The size of the segmented image patches is set to 256x256, while for word images the width or height is set to 256 and the original aspect ratio is kept.

Table 1. The influence of the number of neurons of the fully-connected layer on the CVL test set, evaluated with the hard TOP-k and mAP metrics (%).

           TOP-1   TOP-2   TOP-3   TOP-4   mAP
Fc-512     97.9    97.0    93.6    85.0    96.4
Fc-1024    98.4    97.4    94.9    87.9    97.0
Fc-2048    99.2    97.9    96.0    90.2    97.8
Fc-4096    98.5    97.6    94.7    88.0    97.3

Experimental Result Analysis. First, we evaluate how the number of neurons of the fully-connected layer affects writer identification. The number of neurons is set to 512, 1024, 2048 and 4096, and each setting is assessed on the CVL dataset, as shown in Table 1. The semi-supervised feature learning pipeline achieves the best performance when the fully-connected layer has 2048 neurons, so all of the following experiments use this configuration.

Second, we verify the regularization ability of the WLSR method in the semi-supervised feature learning pipeline. We added the same data to the baseline as labeled data and to the proposed pipeline as unlabeled data, respectively. As shown in Table 2 and Table 3, the data added to the baseline have almost no effect on writer identification, while the semi-supervised learning pipeline uses the same data, unlabeled, to improve the identification rate. This shows that it is the regularization of WLSR, rather than the extra data itself, that improves the performance of the baseline.

Table 2. Comparison of different numbers of extra unlabeled images on the CVL test set, evaluated with the hard TOP-k and mAP metrics (%).

Extra images        TOP-1   TOP-2   TOP-3   TOP-4   mAP
0 (baseline)        98.3    97.0    92.5    87.0    95.7
12000 (baseline)    98.4    97.0    94.0    87.2    96.8
1000 (WLSR)         98.8    97.9    95.0    88.5    97.3
5000 (WLSR)         98.9    97.9    95.4    88.9    97.5
12000 (WLSR)        99.2    97.9    96.0    90.2    97.8
24000 (WLSR)        99.0    97.9    95.2    89.9    97.6


Third, we compare the proposed semi-supervised learning pipeline with the baseline. As shown in Table 2, when we add 12000 extra unlabeled IAM words to the CNN training set, our method improves writer identification performance on the CVL test set. The WLSR method yields improvements of 0.9 percentage points (from 98.3% to 99.2%), 0.9 (from 97.0% to 97.9%), 3.5 (from 92.5% to 96.0%), 3.2 (from 87.0% to 90.2%) and 2.1 (from 95.7% to 97.8%) in hard TOP-1, TOP-2, TOP-3, TOP-4 and mAP, respectively. On ICDAR2013, shown in Table 3, we observe improvements of 1.7, 4.4, 6.0 and 2.1 percentage points in hard TOP-1, TOP-2, TOP-3 and mAP, respectively, when 1000 extra unlabeled CVL patches are added. These results indicate that the proposed semi-supervised feature learning pipeline effectively improves the performance of the baseline.

Finally, we found that the number of extra unlabeled samples strongly affects the regularization ability of WLSR. If too few extra unlabeled samples are incorporated into the pipeline, the regularization of WLSR is insufficient. In contrast, if too many are added, the pipeline tends to converge towards assigning the weighted uniform prediction probabilities to all the training data. An appropriate amount of extra unlabeled data therefore has to be chosen for each dataset to avoid both poor regularization and over-fitting of the pipeline.

Table 3. Comparison of different numbers of extra unlabeled images on the ICDAR2013 test set, evaluated with the hard TOP-k and mAP metrics (%).

Extra images       TOP-1   TOP-2   TOP-3   mAP
0 (baseline)       94.9    74.6    55.1    88.0
1000 (baseline)    95.1    74.3    57.3    88.1
500 (WLSR)         94.8    75.5    56.3    88.1
1000 (WLSR)        96.6    79.0    61.1    90.1
2000 (WLSR)        96.5    78.6    59.6    90.0
5000 (WLSR)        94.9    74.3    56.5    88.0

Table 4. Comparison of performance with other methods on the CVL test set. Hard TOP-k and mAP metrics are listed (%).

Method                  TOP-1   TOP-2   TOP-3   TOP-4   mAP
CS-UMD [2]              97.9    90.0    71.2    48.3    N/A
QUQA A [2]              30.5    5.7     0.5     0.1     N/A
QUQA B [2]              92.9    84.9    71.5    50.6    N/A
TEBESSA-c [2]           97.6    94.3    88.2    73.9    N/A
TSINGHUA [2]            97.7    95.3    94.5    73.0    N/A
Fiel et al. [17]        98.9    97.6    93.3    79.9    N/A
Christlein et al. [7]   98.7    97.7    95.2    87.3    96.1
Nicolaou et al. [6]     99.0    97.7    95.2    86.0    N/A
Christlein et al. [8]   98.8    97.8    95.3    88.8    96.4
Ours                    99.2    97.9    96.0    90.2    97.8

Table 5. Comparison of performance with other methods on the ICDAR2013 test set. Hard TOP-k and mAP metrics are shown (%).

Method                  TOP-1   TOP-2   TOP-3   mAP
CS-UMD-b [1]            95.0    20.2    8.4     N/A
HIT-ICG [1]             94.8    63.2    36.5    N/A
TEBESSA-c [1]           93.4    62.6    36.5    N/A
CVL-IPK [1]             90.9    44.8    24.5    N/A
Fiel et al. [17]        88.5    40.5    15.8    N/A
Christlein et al. [7]   97.1    42.8    23.8    67.1
Nicolaou et al. [6]     97.2    52.9    29.2    N/A
Christlein et al. [8]   98.2    71.2    47.7    81.4
Ours                    96.6    79.0    61.1    90.1

Finally, we compare our method with other published methods evaluated on the CVL and ICDAR2013 datasets, as listed in Table 4 and Table 5. The semi-supervised learning pipeline achieves better results than most of the other published approaches. Because our encoding method loses too much important feature information, it does not match the results of a more recent method [19]. Moreover, our focus was on the regularization ability of the WLSR method for off-line writer identification, not on achieving a state-of-the-art identification result.

5 Conclusion

In this paper, we proposed a semi-supervised feature learning pipeline for off-line writer identification. To the best of our knowledge, this is the first attempt to bring semi-supervised feature learning to the field of writer identification. In particular, the WLSR method was introduced to train on the extra unlabeled data and the original labeled data simultaneously; its regularization ability improved the identification results of the baseline model and achieved better performance than most other published methods on the CVL and ICDAR2013 datasets. In the future, we will replace the encoding method used in our current work with a better one (e.g., Fisher Vector or VLAD). Furthermore, we will use extra unlabeled data generated by generative adversarial networks for semi-supervised training, because such data share a more similar distribution with the original labeled data.

References

1. G. Louloudis, B. Gatos, N. Stamatopoulos, and A. Papandreou. ICDAR 2013 Competition on Writer Identification. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013: 1397-1401.
2. F. Kleber, S. Fiel, M. Diem, and R. Sablatnig. CVL-DataBase: An Off-Line Database for Writer Retrieval, Writer Identification and Word Spotting. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013: 560-564.
3. L. Xing and Y. Qiao. DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification. In: Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition. Shenzhen, China: IEEE, 2017: 584-589.
4. S. He and L. Schomaker. Delta-n Hinge: Rotation-Invariant Features for Writer Identification. In: Proceedings of the 2014 22nd International Conference on Pattern Recognition. Stockholm, Sweden: IEEE, 2014: 2023-2028.
5. A. Brink, J. Smit, M. Bulacu, and L. Schomaker. Writer Identification Using Directional Ink-Trace Width Measurements. Pattern Recognition, 45(1), 2012: 162-171.
6. A. Nicolaou, A. D. Bagdanov, M. Liwicki, and D. Karatzas. Sparse Radial Sampling LBP for Writer Identification. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis, Tunisia: IEEE, 2015: 716-720.
7. V. Christlein, D. Bernecker, F. Hönig, and E. Angelopoulou. Writer Identification and Verification Using GMM Supervectors. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014: 998-1005.
8. V. Christlein, D. Bernecker, and F. Hönig. Writer Identification Using GMM Supervectors and Exemplar-SVMs. Pattern Recognition, 63, 2017: 258-267.
9. X. Wu, Y. Tang, and W. Bu. Offline Text-Independent Writer Identification Based on Scale Invariant Feature Transform. IEEE Transactions on Information Forensics and Security, 9(3), 2014: 526-536.
10. K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, United States: IEEE, 2016: 770-778.
11. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Curran Associates, United States: ACM, 2012: 1097-1105.
12. K. He, X. Zhang, S. Ren, and J. Sun. Identity Mappings in Deep Residual Networks. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, Netherlands: Springer International Publishing, 2016: 630-645.
13. R. Eldan and O. Shamir. The Power of Depth for Feedforward Neural Networks. Computer Science, 2015.
14. R. R. Varior, B. Shuai, J. Lu, D. Xu, and G. Wang. A Siamese Long Short-Term Memory Architecture for Human Re-identification. In: ECCV, 2016.
15. D. H. Lee. Pseudo-label: The Simple and Efficient Semi-supervised Learning Method for Deep Neural Networks. In: ICML Workshop, 2013.
16. A. Odena. Semi-supervised Learning with Generative Adversarial Networks. arXiv:1606.01583, 2016.
17. S. Fiel and R. Sablatnig. Writer Identification and Retrieval Using a Convolutional Neural Network. In: Proceedings of the 16th Computer Analysis of Images and Patterns. Valletta, Malta: Springer International Publishing, 2015: 26-37.
18. V. Christlein, M. Gropp, S. Fiel, and A. Maier. Unsupervised Feature Learning for Writer Identification and Writer Retrieval. arXiv:1705.09369, 2017.
19. Y. Tang and X. Wu. Text-Independent Writer Identification via CNN Features and Joint Bayesian. In: Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition. Shenzhen, China: IEEE, 2017: 566-571.
20. M. Arivazhagan, H. Srinivasan, and S. Srihari. A Statistical Approach to Line Segmentation in Handwritten Documents. Document Recognition and Retrieval XIV, 2007.
21. V. U. Marti and H. Bunke. The IAM-database: An English Sentence Database for Offline Handwriting Recognition. International Journal on Document Analysis & Recognition, 5(1), 2002: 39-46.
22. Z. Zheng, L. Zheng, and Y. Yang. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 3774-3782.
23. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception Architecture for Computer Vision. In: CVPR, 2016.
