
Environmental Data Extraction from Multimedia Resources

Anastasia Moumtzidou1, Victor Epitropou2, Stefanos Vrochidis1, Sascha Voth3, Anastasios Bassoukos2, Kostas Karatzas2, Jürgen Moßgraber3, Ioannis Kompatsiaris1, Ari Karppinen4 and Jaakko Kukkonen4

1 Information Technologies Institute, 6th Klm Charilaou-Thermi Road, Thessaloniki, Greece
{moumtzid, stefanos, ikom}@iti.gr

2 ISAG/AUTH Informatics Systems and Applications Group, Aristotle University of Thessaloniki, Thessaloniki, Greece
[email protected], [email protected], [email protected]

3 Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, Karlsruhe, Germany
[email protected] [email protected]

4 Finnish Meteorological Institute, Helsinki, Finland
{ari.karppinen, jaakko.kukkonen}@fmi.fi

ABSTRACT
Extraction and analysis of environmental information is very important, since it strongly affects everyday life. Nowadays there are already many free services providing environmental information in several formats including multimedia (e.g. map images). Although such presentation formats might be very informative for humans, they complicate the automatic extraction and processing of the underlying data. A characteristic example is the air quality and pollen forecasts, which are usually encoded in image maps, while the initial (numerical) pollutant concentrations remain unavailable. This work proposes a framework for the semi-automatic extraction of such information based on a template configuration tool, on Optical Character Recognition (OCR) techniques and on methodologies for data reconstruction from images. The system is tested with different air quality and pollen forecast heatmaps, demonstrating promising results.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – Indexing methods.

Keywords
Environmental, multimedia, images, heatmaps, OCR, data reconstruction, template, configuration, pollen, air quality.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MAED’12, November 2, 2012, Nara, Japan. Copyright 2012 978-1-4503-1588-3/12/11...$15.00.

1. INTRODUCTION
Environmental conditions are of particular interest to people, since they affect everyday life. Meteorological conditions, air quality and pollen (i.e. weather, chemical weather and biological weather) are strongly related to health issues (e.g. allergies, asthma, cardiovascular diseases) and play an important role in everyday outdoor activities such as sports and commuting. With a view to offering personalized decision support services for people based on environmental information regarding their everyday activities [1], there is a need to extract and combine complementary and competing environmental information from several resources. One of the main steps towards this goal is the extraction of environmental information from multimedia resources.

Environmental observations are automatically performed by specialized instruments hosted in stations established by environmental organizations, and the data collected are usually made available to the public through web portals. In addition to the observations, forecasts are used to predict the levels of pollution in areas of interest. These are usually published online in the form of images, while only a few data providers offer some means of access to their actual (numerical) forecast data. The presentation format adopted is human-oriented and in most cases does not allow automatic (or at least semi-automatic) extraction of information. A characteristic example is the air quality and pollen forecasts, which are usually encoded in image maps (heatmaps) of heterogeneous formats, while the initial (numerical) pollutant concentrations remain unavailable.

In this context, we propose a semi-automatic framework for extracting environmental data from air quality and pollen forecasts that are presented as heatmap images. The framework consists of three main components: an annotation tool for user intervention, an Optical Character Recognition (OCR) and text processing module, and the AirMerge heatmap image processing system [2], [3]. The contribution of this paper is the integration of existing tools for environmental quality forecast data extraction (i.e. AirMerge) with text processing and OCR techniques tailored for heatmap analysis, under a configurable semi-automatic framework for processing air quality and pollen forecast heatmaps, which offers a graphical user interface for template-based customization.

This paper is structured as follows: Section 2 presents the relevant work and Section 3 describes the problem. Section 4 introduces the proposed framework, the annotation tool, the text processing component and the image processing module. The evaluation is presented in Section 5 and, finally, Section 6 concludes the paper.

2. RELATED WORK
The task of map analysis strongly depends on the map type and the information we need to extract. Depending on the application, a straightforward requirement would be to extract meaningful segments (e.g. rivers, forests, etc.), while in the case of heatmaps it is to transform color into numerical data. In general, maps can be distinguished by their scale, colorization, quality, etc. In the case of air quality and pollen forecast maps, two types of information are mainly covered:

1. Geographical information: points and lines describing country frontiers or other well-known points of interest or structures (e.g. sea, land) in a given coordinate system.

2. Feature information: measured or forecasted parameters of any kind (e.g. average temperature), coded via a color scale representing the (measured or forecasted) values. Single values are referenced geographically by a colored point at the corresponding geographical position.

Chemical weather maps often use raster map images to represent forecasted data or spatially interpolated measured data. There are several approaches to extract and digitize this image information automatically. The authors in [4] describe the vectorization of digital image data, in which the geographical information, in the form of lines, is extracted and converted to digitally storable vector data. In [5] the authors use knowledge of the known colorization of USGS maps to automatically segment these maps based on their semantic contents (e.g. roads, rivers). Finally, [6] improves the segmentation quality of text and graphics in color map images in order to enhance the results of subsequent analysis processes (e.g. OCR): black or dark pixels are selected from the color maps and cleaned of possible errors or known unwanted structures (e.g. dashed lines) to obtain cleaner text structures.

Although research has been conducted towards the automatic extraction of information from maps, to the best of our knowledge only the work performed for the AirMerge system addresses the extraction of information from chemical weather maps with image processing. These works [7], [2], [3] describe a method to reconstruct environmental data from chemical weather images. First, the relevant map section is scraped from the chemical weather image. Then, disturbances are removed and a color classification is applied to every single data point (pixel) in order to recover the measured data. With the aid of the known geographical boundaries, given by the coordinate axes and the map projection type, the geographical position of each data point can be retrieved. In case of missing data points, a special interpolation algorithm is used to close these gaps.

This work proposes a framework that integrates AirMerge, extends its application to pollen forecasts and facilitates the extraction of information from heatmaps using OCR and visual annotation techniques.

3. EMPIRICAL STUDY AND PROBLEM STATEMENT
In order to clearly state the problem, we conducted an empirical study on more than 60 environmental websites (dealing with weather, air quality and pollen). Based on this study and on previous works [8], we concluded that a considerable share of environmental content, almost 60%, is encoded in images. Specifically, pollen and air quality forecast information is always illustrated in heatmap images. Overall, it can be said [9] that air quality and pollen information is usually presented in the form of images representing pollutant or pollen concentrations over a geographically bounded region, typically in terms of maximum or average concentration values for the time scale of reference, which is usually the hour or day [7]. These providers present their air quality forecasts almost exclusively in the form of preprocessed images with a color index scale indicating the concentration of pollutants. In addition, they arbitrarily choose the image resolution, the color scale employed for visualizing pollution loadings, the covered region and the geographical map projection. The mode of presentation varies from simple web images to AJAX, Java or Adobe Flash viewers [10].

The heatmaps that contain environmental information are static bitmap images, which represent the coverage data (e.g. concentrations) in terms of a color-coded scale over a geographical map. An example of such a heatmap, obtained from the GEMS project¹ website, is depicted in Figure 1.

Figure 1. Typical example of image of air quality forecasts.

After observing the image, we conclude that besides the geographical information and concentrations, additional information is also provided: the type of environmental “feature” (e.g. ozone, birch pollen), the date/time to which the information refers and a color scale. Therefore, the main parts of information that need to be extracted from the image are:

• Heatmap: map depicting a geographical region with colors representing the environmental aspect value.

• Color scale: range indicating the correspondence between feature value and color.

• Coordinate axis (x, y): indicate the geographical longitude and latitude of every map point for a specific geographic projection.

• Title: contains information such as the type of aspect measured and the time and date of the forecast.

• Additional information: watermarks, wind fields superimposed on concentration maps and any information that can be categorized as “noise” in terms of influencing the information content and representation value of the specific heatmap.

¹ http://gems.ecmwf.int/d/products/raq/forecasts/plot_RIU/

4. FRAMEWORK
The proposed architecture draws upon the requirements that were set in the previous section. The idea is to employ image analysis and processing techniques to map the color variations in the images to specific categories, which can be ranges of values. Optical character recognition techniques are needed to recognize text encoded in image format, such as image titles, dates, environmental information and coordinates. Because there is a large variation of images and many different representations, the algorithms involved need to be optimized and configured. Specifically, the intervention of an administrative user is required in order to annotate and manually segment the different parts of a new image type (data, legend, etc.) that need to be processed by the content extraction algorithms. The system workflow and the involved modules are depicted in Figure 2.

In order to facilitate this configuration through a graphical user interface, we have implemented the “Annotation Tool” (AnT), which is tailored for dealing with heatmaps. The output of this tool is a configuration file that holds the static information of the image. The second module is “Text Processing”, which uses the information of the configuration file to extract data from the corresponding image. More specifically, it retrieves and analyzes the information captured in text format using text processing techniques including OCR. The third module is “Image Processing”, which uses information both from the output of the “Text Processing” module and from the configuration file to process the heatmap found inside the image.

Figure 2. Image content distillation architecture

The input of the framework is a heatmap image and the output is an XML file, in which each geographical coordinate of the initial heatmap image is associated with a value (e.g. air quality index).

4.1 ANNOTATION TOOL
The results of the empirical study indicated that heatmap images share common characteristics, which in most cases do not match exactly (e.g. in their spatial arrangement). Therefore, it is necessary to manually identify the interesting parts of the images in order to automatically extract the important information. This can be achieved by a semi-automatic approach involving an administrative user, who annotates parts of the image. The information provided by this user mainly includes the position and the dimensions of several pre-specified elements of the image, such as the heatmap, the color scale, the x and y axes and the title, which are saved in a template file. To do this, the user has to define regions of interest (ROIs) and/or points of interest (POIs) inside these images.

Within the context of this work, a tool has been developed to make the annotation process easier and more user-friendly. The tool, called Annotation Tool (“AnT”), can load images and predefined templates and lets the user annotate them interactively. It receives an image as input and produces an XML configuration file as output. The basic annotation structure of the images is preloaded based on the image type, in the form of an XML template file that includes all elements necessary to describe the images (e.g. for heatmaps at least the ROI of the map, the ROI of the coordinates, etc.).

The Annotation Tool (Figure 3) provides two views on the annotation data: a tree view and a graphical view. The tree view can be used to traverse all the elements and edit their values by hand. ROIs and POIs are drawn in the graphical view as an overlay on the loaded image, so that their parameters (e.g. size and position of an ROI) can be verified. The displayed ROIs and POIs can be modified with the mouse. For an ROI, the four corner points of the drawn rectangle can be used to resize the element, and clicking in the centre of the rectangle allows the object to be moved. POIs can be moved to another position. Any change of position and size is synchronized between both views.

Figure 3. The user interface of Annotation Tool

4.2 OCR AND TEXT PROCESSING
This module is driven by the configuration file and focuses on retrieving the textual information captured in the image using OCR and text processing.

The first step is the application of OCR to several parts of the input image. We used the Abbyy Fine Reader² OCR software, which is applied separately to the following parts of the initial image: title, color scale, and map x and y axes. In the second step, empirical text processing techniques are applied to the OCR results in order to make corrections by combining different sources of information. Part of the extracted information (e.g. coordinates) is used as input for the Image Processing module. In the following, we describe the two steps by applying them to the typical heatmap image of Figure 1 and present the results.
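To make the role of the configuration file more concrete, the following minimal sketch shows how a template could drive the cropping of the image parts that are passed to OCR. The XML element and attribute names (`template`, `roi`, `name`, `left`, `top`, `right`, `bottom`) and all pixel values are hypothetical, since the actual AnT schema is not specified here; the image is represented as a plain list of pixel rows to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical template: element/attribute names and values are assumptions,
# not the actual AnT configuration schema.
TEMPLATE = """
<template imageType="heatmap">
  <roi name="title" left="0" top="0" right="8" bottom="1"/>
  <roi name="map" left="1" top="1" right="6" bottom="5"/>
  <roi name="colorScale" left="7" top="1" right="8" bottom="5"/>
</template>
"""


@dataclass(frozen=True)
class Roi:
    """Axis-aligned region of interest in pixel coordinates."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def load_rois(xml_text):
    """Parse the template and return a dict mapping ROI name to Roi."""
    root = ET.fromstring(xml_text.strip())
    rois = {}
    for node in root.iter("roi"):
        box = [int(node.get(key)) for key in ("left", "top", "right", "bottom")]
        rois[node.get("name")] = Roi(node.get("name"), *box)
    return rois


def crop(image, roi):
    """Return the sub-image covered by roi (image is a list of pixel rows)."""
    return [row[roi.left:roi.right] for row in image[roi.top:roi.bottom]]


if __name__ == "__main__":
    # Dummy 6x8 "image" whose pixels are (x, y) tuples, for demonstration.
    image = [[(x, y) for x in range(8)] for y in range(6)]
    rois = load_rois(TEMPLATE)
    heatmap = crop(image, rois["map"])
    print(len(heatmap), "rows x", len(heatmap[0]), "columns")
```

Each cropped region (title, color scale, axes) would then be handed to the OCR engine separately, while the map region goes to the image processing module.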

4.2.1 OCR on Title, Color Scale, Map Axis
Based on the empirical study, a considerable part of the meaningful information can be extracted from the text surrounding the image. More specifically, the color scale and the map axes are essential elements that provide information about the values and the geographical area covered, while the title contains information about the environmental aspect measured and the corresponding date/time. The location of these image parts is captured in the configuration template. We apply OCR to the heatmap of Figure 1. Tables 1, 2 and 3 contain the input and output of OCR for the title, the color scale and the y axis (the results for the x axis are omitted, since they are generated in a similar way to those of the y axis). The values in bold indicate the errors produced by OCR. It should be noted that for the color scale and the x and y axes we also retrieved the exact position of the text, in order to relate the text position to geographical coordinates.

² http://www.abbyy.com.gr/

Table 1. Title - Image (top) and OCR output (bottom)

Tuesday 29 November 2011 OOUTC GEMS-RAQ Forecast t4027 VT: Wednesday 30 November 2011 03UTC Model: EURAD-IM Height level: Surface Parameter: Ozone [ \iq m31

Table 2. Color scale – Image (left) and OCR (with position) output (right)

Position (left, top, right, bottom)    Value
0, 17, 51, 39                          360
0, 128, 50, 150                        240
0, 236, 51, 258                        200
3, 345, 51, 366                        180
3, 456, 51, 477                        180
2, 563, 51, 585                        140
3, 672, 51, 693                        120
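Tables 1 and 2 illustrate the kind of OCR errors that the text processing step has to repair. The sketch below illustrates the two corrections described in Section 4.2.2: matching a title word against a ground truth vocabulary by minimum Levenshtein distance, and repairing a scale value using the most common interval. The vocabulary, the helper names and the sample values are assumptions for illustration, and the scale repair is deliberately simplified: it assumes a mostly uniform scale and only fixes an interior value whose two neighbours are exactly two steps apart.

```python
from collections import Counter

# Illustrative ground truth vocabulary (an assumption; the paper uses three
# English ground truth sets for months, days and aspects).
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest_word(word, vocabulary):
    """Return the vocabulary entry with the minimum edit distance to word."""
    return min(vocabulary, key=lambda v: levenshtein(word.lower(), v.lower()))


def correct_scale(values):
    """Repair isolated OCR errors in a scale listed from top to bottom.

    The step is the most common difference between consecutive values.
    An interior value is replaced only if it fits neither neighbour while
    the neighbours themselves are exactly two steps apart.
    """
    diffs = [a - b for a, b in zip(values, values[1:])]
    step = Counter(diffs).most_common(1)[0][0]
    fixed = list(values)
    for i in range(1, len(fixed) - 1):
        fits_prev = fixed[i - 1] - fixed[i] == step
        fits_next = fixed[i] - fixed[i + 1] == step
        if not fits_prev and not fits_next and fixed[i - 1] - fixed[i + 1] == 2 * step:
            fixed[i] = fixed[i - 1] - step
    return fixed


if __name__ == "__main__":
    print(closest_word("Novembcr", MONTHS))
    print(correct_scale([200, 180, 180, 140, 120, 100]))
```

Scales with non-uniform segments or with several adjacent misread values would need additional handling, for example restricting the interval statistics to the uniform part of the scale.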

Table 3. Coordinates of y axis – Image (left) and OCR (with position) output (right)

Position (left, top, right, bottom)    Value
20, 194, 78, 213                       65°N
20, 386, 78, 405                       60°N
21, 578, 77, 596                       55°N
21, 770, 78, 789                       50°N
20, 959, 47, 978                       45
51, 960, 77, 977                       °N
21, 1155, 47, 1172                     40
51, 1155, 77, 1172                     °N
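In Table 3 the labels for 45°N and 40°N are returned as two separate text boxes. One possible way to handle this, shown as a sketch under our own assumptions rather than as the method actually used, is to merge boxes that overlap vertically and then parse each merged label into a signed value in degrees, paired with the vertical centre of its box. The function names and the 50% overlap threshold are hypothetical.

```python
import re

# Label pattern such as "65°N" or "10 W"; the degree sign is optional because
# OCR may drop it.
LABEL = re.compile(r"(\d+(?:\.\d+)?)\s*°?\s*([NSEW])")


def merge_row_tokens(tokens):
    """Merge OCR boxes (left, top, right, bottom, text) that share a text row.

    Two boxes are merged when their vertical overlap exceeds half the height
    of the shorter box (threshold chosen for illustration only).
    """
    merged = []
    for left, top, right, bottom, text in sorted(tokens, key=lambda t: (t[1], t[0])):
        if merged:
            p_left, p_top, p_right, p_bottom, p_text = merged[-1]
            overlap = min(p_bottom, bottom) - max(p_top, top)
            shortest = min(p_bottom - p_top, bottom - top)
            if overlap > 0.5 * shortest:
                # Concatenate in left-to-right order.
                joined = p_text + text if p_left <= left else text + p_text
                merged[-1] = (min(p_left, left), min(p_top, top),
                              max(p_right, right), max(p_bottom, bottom), joined)
                continue
        merged.append((left, top, right, bottom, text))
    return merged


def parse_degrees(text):
    """Return signed degrees (S and W negative), or None if unparseable."""
    match = LABEL.fullmatch(text.strip())
    if match is None:
        return None
    value = float(match.group(1))
    return -value if match.group(2) in "SW" else value


def y_axis_anchors(tokens):
    """Return (vertical pixel centre, latitude) pairs for the y-axis labels."""
    anchors = []
    for left, top, right, bottom, text in merge_row_tokens(tokens):
        degrees = parse_degrees(text)
        if degrees is not None:
            anchors.append(((top + bottom) / 2, degrees))
    return anchors


if __name__ == "__main__":
    table3 = [
        (20, 194, 78, 213, "65°N"), (20, 386, 78, 405, "60°N"),
        (21, 578, 77, 596, "55°N"), (21, 770, 78, 789, "50°N"),
        (20, 959, 47, 978, "45"), (51, 960, 77, 977, "°N"),
        (21, 1155, 47, 1172, "40"), (51, 1155, 77, 1172, "°N"),
    ]
    for centre, latitude in y_axis_anchors(table3):
        print(centre, latitude)
```

The resulting pixel/degree pairs are the kind of anchor points needed to relate text positions to geographical coordinates; x-axis labels would require the analogous grouping along the horizontal direction.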

4.2.2 Text processing on OCR Results
After OCR, we apply text processing to extract, correct and understand the semantic information encoded in the aforementioned parts of the image. Each of these segments is treated in a different way, since the type of semantic information it contains is different.

4.2.2.1 Title
The title (if it exists) usually contains the name of the aspect, the measurement units and the date/time. Moreover, if the data included in the map are forecast data, it contains two dates. The measurement units are usually standard for each measured environmental aspect and therefore we do not attempt to extract them. The date/time is considered the most complex element, given that it is presented in several different formats. In the current implementation, we focus on formats similar to that of the GEMS site. In order to correct possible errors in the textual form of the month, day and aspect, we apply the Levenshtein distance and compare with three English ground truth sets. We then correct the initial OCR result by selecting the word from the ground truth set that has the minimum distance from it. For simplicity, the current version assumes only one date/time in the title. In the specific example, no corrections were required. Thus, the information we obtained from the title is the following: Date/time: 2011-11-29 00:00:00, Aspect: Ozone

4.2.2.2 Color Scale
The color scale shows the mapping between color variations in the map and environmental aspect values. The extraction of information from the color scale is a two-step procedure: the first step corrects the OCR results, while the second correlates values to colors. In order to correct the OCR results, the most common difference among the scale values is calculated and the erroneous values are adapted accordingly. The correlation of values to colors is achieved by using the top-bottom or left-right coordinates of the color scale values, depending on the color scale orientation, and mapping them to the closest color. In the specific example, the most common interval among the values in the scale is 20 and erroneous values are corrected based on that. Then, values are mapped onto coordinates and thus colors. For example, 140-160 is mapped onto the color found at coordinates (719, 224) of the initial image.

4.2.2.3 X and Y Axis
Similar processing techniques are applied to the x and y axes, since they both represent the geographical coordinates of the map. Specifically, at least two points of the map, as well as their position with respect to the map, need to be resolved in order to successfully identify the coordinates of all points. The procedure again includes two steps: a) correction of the errors produced by OCR and b) use of the positions of the recognized elements to derive the coordinates. For the specific example, after correcting the OCR results, we associated the geographical coordinates (-10, 65) and (-5, 60) with the image map pixels (98, 125) and (162, 189) respectively.

4.3 IMAGE PROCESSING
In this section we present the image processing module, which extracts data from different models and coordinate systems. The tool integrated into the system is the AirMerge engine, which performs various tasks concerning the analysis, reverse engineering and reuse of heatmaps such as Chemical Weather forecasts. The AirMerge engine combines elements of screen scraping, image processing and geographical coordinate transformations in order to produce uniform, indexed data using a unified format and geographical projection. The engine is already in use as part of a more complex production environment [9], [3], is available as a REST service via an API, and is used in the proposed tool-chain as the final processing step, as indicated in Figure 2.

4.3.1 Screen Scraping
This step handles the cropping of the original image to a region of interest (the heatmap) and its parsing into a 2D data array directly mapped to the original image’s pixels. It also associates each color with the minimum/maximum value range of the air pollutant concentration levels, which is usually implied by the color scale associated with the original image. It should be noted that the information about where to crop, where each color is located on the legend, etc. is provided by the configuration template of the AnT in the proposed system. In this phase, the mapping of the image raster to a specific geographical grid is also performed, since the images themselves represent a geographical region. The configuration system allows choosing between the most commonly encountered geographical projections (equirectangular, conical, etc.) and choosing keypoints in the image to allow for precise pixel-coordinate mapping. These functions are semi-automated in the autonomous AirMerge system.
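The pixel-coordinate mapping can be illustrated for the simplest case. The sketch below assumes an equirectangular projection, in which longitude and latitude vary linearly with the pixel column and row, and derives the mapping from two keypoints, here the pairs reported in Section 4.2.2.3. The class and method names are hypothetical and do not reflect the AirMerge API; conical and other projections would require a proper projection library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearGeoRef:
    """Linear pixel <-> (lon, lat) mapping; valid only for equirectangular maps."""
    x0: float
    y0: float
    lon0: float
    lat0: float
    lon_per_px: float
    lat_per_px: float

    @classmethod
    def from_keypoints(cls, pixel_a, geo_a, pixel_b, geo_b):
        """Build the mapping from two (x, y) pixels and their (lon, lat)."""
        (xa, ya), (lon_a, lat_a) = pixel_a, geo_a
        (xb, yb), (lon_b, lat_b) = pixel_b, geo_b
        if xa == xb or ya == yb:
            raise ValueError("keypoints must differ in both x and y")
        return cls(xa, ya, lon_a, lat_a,
                   (lon_b - lon_a) / (xb - xa),
                   (lat_b - lat_a) / (yb - ya))

    def pixel_to_geo(self, x, y):
        """Return (lon, lat) for a pixel position."""
        return (self.lon0 + (x - self.x0) * self.lon_per_px,
                self.lat0 + (y - self.y0) * self.lat_per_px)

    def geo_to_pixel(self, lon, lat):
        """Return the (x, y) pixel position for a geographical coordinate."""
        return (self.x0 + (lon - self.lon0) / self.lon_per_px,
                self.y0 + (lat - self.lat0) / self.lat_per_px)


if __name__ == "__main__":
    # Keypoints taken from the example of Section 4.2.2.3.
    georef = LinearGeoRef.from_keypoints((98, 125), (-10, 65), (162, 189), (-5, 60))
    print(georef.pixel_to_geo(130, 157))
    print(georef.geo_to_pixel(-5, 60))
```

With these keypoints one pixel corresponds to 5/64 of a degree in each direction, so small OCR or annotation errors in the keypoints translate directly into a scale error over the whole map.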

4.3.2 Reconstruction of Missing Values and Data Gaps
This step deals with unwanted elements such as legends, text, geomarkings and watermarks, as well as regions that are not part of the forecast area. The image’s pixels are classified into three main categories: valid data (with colors that satisfy the color scale’s classification), invalid data (with colors not present in the color scale), and void regions, which contain colors that are explicitly marked for exclusion and are disregarded in all further presentation and processing. In contrast, regions containing unmarked invalid data are considered regions with correctable errors, or “data gaps”, which can be filled in. This distinction is due to their different appearance patterns: void regions are usually extended and continuous (e.g. sea regions not covered by the forecast, but present on the map), while invalid data regions are usually smaller but more noticeable (e.g. lines, text, watermarks “buried” in valid data regions) and have more noise-like patterns, so it is more appropriate to remove them using gap-filling techniques. These techniques include traditional grid interpolation as well as pattern-based interpolation techniques using neural networks, which are described in detail in [8].
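The three-way classification and a very simple form of gap filling can be sketched as follows. This is an illustration under stated assumptions, not the AirMerge implementation: colors are matched exactly (real images would need a tolerance for anti-aliasing and compression artifacts), each valid color is represented by the midpoint of its value range, and gaps are filled by repeatedly averaging the valid 4-neighbours, which stands in for the grid and neural-network interpolation mentioned above. All names and sample colors are hypothetical.

```python
VOID = "void"   # pixel explicitly excluded (e.g. sea outside the forecast area)
GAP = None      # invalid pixel inside the data area (line, text, watermark)


def classify(image, scale, void_colors):
    """Map each pixel color to a value, GAP or VOID.

    image: list of rows of color tuples; scale: dict color -> (min, max);
    void_colors: set of colors marked for exclusion.
    """
    grid = []
    for row in image:
        out = []
        for color in row:
            if color in void_colors:
                out.append(VOID)
            elif color in scale:
                low, high = scale[color]
                out.append((low + high) / 2)  # midpoint of the value range
            else:
                out.append(GAP)
        grid.append(out)
    return grid


def fill_gaps(grid, max_passes=10):
    """Fill GAP cells with the mean of their valid 4-neighbours, pass by pass."""
    grid = [list(row) for row in grid]
    height = len(grid)
    width = len(grid[0]) if grid else 0
    for _ in range(max_passes):
        updates = {}
        for y in range(height):
            for x in range(width):
                if grid[y][x] is not GAP:
                    continue
                neighbours = [
                    grid[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                    and isinstance(grid[ny][nx], float)
                ]
                if neighbours:
                    updates[(y, x)] = sum(neighbours) / len(neighbours)
        if not updates:
            break
        for (y, x), value in updates.items():
            grid[y][x] = value
    return grid


if __name__ == "__main__":
    BLUE, RED, WHITE, BLACK = (0, 0, 255), (255, 0, 0), (255, 255, 255), (0, 0, 0)
    image = [[BLUE, BLUE, WHITE],
             [BLUE, BLACK, RED],
             [BLUE, RED, RED]]
    scale = {BLUE: (0, 20), RED: (20, 40)}
    for row in fill_gaps(classify(image, scale, {WHITE})):
        print(row)
```

Void pixels are never used as neighbours and are never overwritten, which mirrors the distinction between excluded regions and correctable data gaps.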

5. RESULTS AND EVALUATION
The evaluation of the framework is carried out in two steps with different focus. The first step deals with evaluating the OCR and providing a visual assessment of the output, while the second evaluates the final system result after running a series of tests on several images and comparing with ground truth. We omit the presentation of the final XML output of the system (i.e. the mapping of geographic coordinates to forecast values), since its visual presentation is not that informative. Instead, we present the reconstructed image, which derives from this representation and is more appropriate for visual inspection.

5.1 OCR Performance and Visual Results
The tests in this step focus on the recognition of the x and y axes and evaluate the assignment of pixels to geographical coordinates. Since we were not aware of the initial heatmap values, we can only assess the results of AirMerge by visual comparison of the original image and the one produced by AirMerge. A more detailed assessment and evaluation of the AirMerge system can be found in [7], [2] and [3]. The images tested during the first step of the evaluation were extracted from the following sites of the Finnish Meteorological Institute (FMI): Pollen FMI site³ and SILAM model FMI site⁴.

Pollen FMI website
Figures 4 and 5 depict the original image and the reconstructed image produced by the proposed system after visualizing the XML output. The reconstructed image is almost identical to the original and, in addition, noise (e.g. black lines) has been removed. In Table 4 we report the error introduced by the OCR, named “absolute error”, which is calculated by subtracting the OCR estimation (e.g. 4.98775 in the first line) from the initial degrees range (e.g. 5 in the first line). In both cases the absolute error is very low (around 0.3% of the 5° step) and acceptable.

Figure 4: Original Image

Figure 5: Reconstructed Image

Table 4. OCR error in Pollen website

                  Original degrees    Estimation    Absolute Error
Longitude step    5°                  4.98775       0.01225
Latitude step     5°                  4.98404       0.01596

SILAM model FMI website
In the case of the SILAM site, based on visual assessment, the reconstructed image (Figure 7) is almost identical to the initial one (Figure 6). The absolute geo-coordinate error is very low (around 0.6% of the 5° step) and thus the error introduced by OCR is not significant.

Figure 6: Original Image

Figure 7: Reconstructed Image

Table 5. OCR error in SILAM website

                  Original degrees    Estimation    Absolute Error
Longitude step    5°                  4.97516       0.02484
Latitude step     5°                  4.96523       0.03477

³ http://pollen.fmi.fi

⁴ http://silam.fmi.fi/

5.2 System Evaluation
In this step we focus on evaluating the system output with heatmaps from different providers. The evaluation is realized by comparing the results of the AirMerge system based on manual configuration, which are considered as ground truth, with the results of the proposed system involving AnT, OCR and AirMerge. The tests are performed on a set of 60 images, extracted from the following sites and locations for different times/dates:

• GEMS site, http://gems.ecmwf.int/d/products/raq/

• Pollen FMI site, http://pollen.fmi.fi/pics/Europe_ECMWF_olive.html

• SILAM site, http://silam.fmi.fi/AQ_forecasts/Europe_v4_8/index.html

• Atmospheric and Oceanic Physics Group site, http://www.fisica.unige.it/atmosfera/bolchem/MAPS/

Table 6. Results comparing the proposed system output with the manually configured AirMerge.

                                               Pollen FMI site    SILAM site    GEMS site     Atmospheric and Oceanic Physics Group site
Number of images                               15                 15            15            15
Number of colors                               9                  11            12            12
Latitude Error (in 5°)                         2.42·10^-4         6.38·10^-4    2.85·10^-4    8.54·10^-4
Longitude Error (in 5°)                        0.00174            3.88·10^-4    0.00164       1.39·10^-4
Mean percentage of pixels with correct value   97.424 %           90.957 %      77.23 %       77.3 %
Average error per pixel                        0.1265             0.0365        0.0619        0.0504
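The two per-pixel measures reported in Table 6 can be sketched as follows, comparing a reference grid (manually configured AirMerge) with the grid produced by the system. The percentage of pixels with the correct value follows directly from its description. The exact definition of the average error per pixel is not reproduced here, so the mean relative absolute difference used below is an assumption made only for illustration, and the function names are hypothetical.

```python
def _pairs(reference, estimate):
    """Yield (reference value, estimated value) for every pixel position."""
    for ref_row, est_row in zip(reference, estimate):
        for ref_value, est_value in zip(ref_row, est_row):
            yield ref_value, est_value


def percent_correct(reference, estimate):
    """Percentage of pixels whose estimated value equals the reference value."""
    pairs = list(_pairs(reference, estimate))
    matches = sum(1 for ref_value, est_value in pairs if ref_value == est_value)
    return 100.0 * matches / len(pairs)


def mean_relative_error(reference, estimate):
    """Mean of |reference - estimate| / |reference| over pixels.

    ASSUMPTION: this normalisation is chosen for illustration; pixels with a
    zero reference value are skipped to avoid division by zero.
    """
    errors = [abs(ref_value - est_value) / abs(ref_value)
              for ref_value, est_value in _pairs(reference, estimate)
              if ref_value != 0]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    reference = [[10, 20], [30, 40]]
    estimate = [[10, 20], [30, 50]]
    print(percent_correct(reference, estimate))
    print(mean_relative_error(reference, estimate))
```

A misalignment of the coordinate grid by a fraction of a pixel mostly affects pixels near boundaries between color classes, which is why both measures depend on how strongly the values vary between neighbouring pixels.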

In Table 6 we report the following results for every site: a) the number of images, b) the number of different colors in the color scale, c) the absolute latitude and longitude errors, which indicate the error introduced by the proposed framework for 5° in each axis, d) the average percentage of pixels with correct values (i.e. compared with the values provided by the manually configured AirMerge) and e) the average error introduced in each pixel due to OCR and the resulting misalignment of the coordinates. The error is calculated from the total number of pixels, the value of pixel i using AirMerge with manual configuration and the corresponding value estimated by the system [equation missing].

Based on Table 6, both the latitude and longitude errors are low for all sites (below 0.002 in 5°), and the percentage of pixels with correct values ranges from about 77% to 97%. The error introduced in each pixel value is in general quite low (roughly 4% to 6%), and only in the case of the Pollen FMI site is it higher (around 12%). This is probably due to the fact that the values of neighbouring pixels varied more strongly than on the other sites.

6. CONCLUSIONS
In this paper, we propose a framework for environmental information extraction from air quality and pollen forecast heatmaps, combining image processing, template configuration and textual recognition components. This framework could serve as a basis for supporting environmental systems that provide either air quality information from several providers for direct comparison or orchestration purposes, or decision support [1] on everyday issues (e.g. travel planning). The proposed work overcomes the limitation of not having access to the raw data, since it only considers information that is publicly available on the Internet. Future work includes extensive evaluation with more images in different projections (e.g. conical), recognition of additional elements with OCR, as well as employment of Linked Open Data to enrich the semantics of the extracted information.

7. ACKNOWLEDGMENTS
This work was supported by the FP7 project PESCaDO.

8. REFERENCES
[1] Wanner, L., Rospocher, M., Vrochidis, S., Bosch, H., Bouayad-Agha, N., Bugel, U., Casamayor, G., Ertl, T., Hilbring, D., Karppinen, A., Kompatsiaris, I., Koskentalo, T., Mille, S., Moßgraber, J., Moumtzidou, A., Myllynen, M., Pianta, E., Saggion, H., Serafini, L., Tarvainen, V., and Tonelli, S. 2012. Personalized Environmental Service Configuration and Delivery Orchestration: The PESCaDO Demonstrator. In Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012), Heraclion, Crete, Greece.

[2] Epitropou, V., Karatzas, K. and Bassoukos, A. 2010. A method for the inverse reconstruction of environmental data applicable at the Chemical Weather portal. In Geospatial Crossroads @GI_Forum’10, Proceedings of the GeoInformatics Forum Salzburg, 58-68, Wichmann Verlag, Berlin, ISBN 978-3-87907-496-9.

[3] Epitropou, V., Karatzas, K., Kukkonen, J. and Vira, J. 2012. Evaluation of the accuracy of an inverse image-based reconstruction method for chemical weather data, International Journal of Artificial Intelligence, in press.

[4] Musavi, M.T., Shirvaikar, M.V., Ramanathan, E. and Nekovei, A.R. 1988. Map processing methods: an automated alternative. In Proceedings of the Twentieth Southeastern Symposium on System Theory, 300-303.

[5] Henderson, T.C. and Linton, T. 2009. Raster Map Image Analysis. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition (ICDAR '09). Washington DC, USA, 376-380.

[6] Cao, R. and Tan, C. 2002. Text/graphics separation in maps. In Fourth IAPR Workshop on Graphics Recognition, 2390, 167–177, Springer, Berlin.

[7] Epitropou, V., Karatzas, K.D., Bassoukos, A., Kukkonen, J. and Balk, T. 2011. A new environmental image processing method for chemical weather forecasts in Europe. In Proceedings of the 5th International Symposium on Information Technologies in Environmental Engineering, Poznan, (Golinska, Paulina; Fertsch, Marek; Marx-Gómez, Jorge, eds.), ISBN: 978-3-642-19535-8, Springer Series: Environmental Science and Engineering, 781-791.

[8] Karatzas, K. 2009. Informing the public about atmospheric quality: air pollution and pollen, Allergo Journal, 18, Issue 3/09, 212-217.

[9] Balk, T., Kukkonen, J., Karatzas, K., Bassoukos, A. and Epitropou, V. 2011. A European open access chemical weather forecasting portal, Atmospheric Environment, 45, 6917-6922, doi:10.1016/j.atmosenv.2010.09.058

[10] Kukkonen, J., Klein, T., Karatzas, K., Torseth, K., Fahre Vik, A., San Jose, R., Balk, T. and Sofiev, M. 2009. COST ES0602: Towards a European network on chemical weather forecasting and information systems, Advances in Science and Research Journal, 1, 1–7.