A Framework for the Automatic Identification of Algae (Neomeris vanbosseae M.A.Howe): U3S

Ching Soon Tan, Phooi Yee Lau

Siew-Moi Phang

Tang Jung Low

Universiti Tunku Abdul Rahman, Centre for Computing and Intelligent Systems (CCIS), Kampar, Malaysia. [email protected], [email protected]

Universiti Malaya Institute of Ocean & Earth Sciences (IOES) Kuala Lumpur, Malaysia [email protected]

Universiti Teknologi Petronas Department of Computer and Information Sciences (CIS) Tronoh, Malaysia [email protected]

Abstract—Neomeris vanbosseae M.A.Howe (NVH) is an alga belonging to the Chlorophyta, a very diverse group of algae. When an algae biologist establishes an abundance assessment for algae biodiversity, the taxonomic identification and quantification frequently become a time-intensive procedure that is prone to counting bias, owing to fatigue when large pools of data samples are processed repeatedly. To improve effectiveness, this paper proposes a framework, serving as an assistive tool, to help marine biologists. The framework consists of (1) pre-processing the image, (2) segmenting regions of interest, (3) extracting features (namely four different geometric features), (4) evaluating and combining those features based on the given decision criteria, and (5) quantification. Our methodology achieved satisfactory performance on NVH abundance, providing an encouraging 78.38% detection rate in a comparison between manual counts and our system's automatic counts. The major contribution of this work is the development and deployment of an automatic identification system, named U3S, for biodiversity abundance studies of algae, assisting the marine biologist in identifying algae species and complementing the existing operator-intensive procedures.

Keywords—assistive system; image analysis; Neomeris vanbosseae M.A.Howe; shape analysis; algae abundance studies

I. INTRODUCTION

Algae serve as the main oxygen producer and food source in the aquatic ecosystem. Presently, they are also used as bio-indicators to monitor the condition of freshwater ecosystems [1]. Because of their importance to the aquatic ecosystem, the estimation of algae abundance becomes a key component in investigating susceptibility to change in the aquatic environment. Algae are very diverse, comprising different species with various colors, shapes and sizes. Neomeris vanbosseae M.A.Howe (NVH) [2] is among them, belonging to the Neomeris genus under the Chlorophyta division. The species is typically worm-shaped and slightly bent, with a bright green upper portion covered with numerous fine fibers. It grows on rock or coral in the lower intertidal zone. It is distributed from the North West Cape region, Western Australia, around northern Australia to the southern Great Barrier Reef, Queensland, and the tropical Indo-Pacific [2].

978-1-4799-0059-6/13/$31.00 ©2014 IEEE

Recently, image processing has become increasingly useful in biodiversity studies, particularly as an assistive tool to process huge amounts of data in various scientific works. Often, algae scientists are mainly interested in investigating the abundant growth of algae and in classifying algae taxonomically based on their individual properties. However, this work is time-consuming and slow, especially for densely populated algae growing areas. Another motivation for developing an automated analysis system is the need, in abundance studies, to collect information on algae for monitoring purposes, particularly for monitoring the effects of harmful algae species [3]. This has led many researchers to use image processing techniques to develop automated systems [4], [5] that perform analysis, classification and even quantification of algae. In 2001, Pech-Pacheco et al. [6] proposed an automatic system for diatom localization and identification; their work contributes to the automated analysis of microscopic phytoplanktonic samples. In 2002, Walker et al. [7] proposed a method to analyze images of freshwater microalgae by combining image processing techniques with fluorescence-assisted imaging. In 2011, Manssor et al. [8] proposed a system capable of classifying four types of cyanobacteria by applying an Artificial Neural Network (ANN) framework. Although these works [6], [7], [8] focused on processing microscopic images for microalgae studies, our work revisits their shared objective: to provide an automatic system that assists biological scientists in algae biodiversity studies. The difference is that our case study exploits the habitat of macroalgae in the real aquatic environment, and is not limited to microalgae observation.
A framework for the automatic identification of algae (NVH) is presented in this paper, evaluating a combination of multiple features extracted based on prior knowledge of the characteristics of NVH. The main contribution of this work is an automatic operation for the identification of the algae species, complementing the existing operator-intensive procedure. The rest of this paper is organized as follows. Section II describes the development tool and system overview. The methodology is presented in Section III. Section IV discusses the experimental results and, lastly, Section V concludes the paper with future works.

II. DEVELOPMENT TOOL AND SYSTEM OVERVIEW

An automated identification system, named the UTAR-UM Underwater System (U3S), is developed using the OpenCV library, which is widely used in real-time computer vision applications. The U3S system flow is illustrated in Fig. 1. In Step 1, the system acquires NVH images as input. In Step 2, pre-processing is performed to (1) smooth the image by reducing noise, and (2) convert the image from the RGB color model to the HSV color model. Subsequently, in Step 3, region segmentation is used to extract a set of regions of interest. Step 4 detects edge information from the output of Step 3 and labels each region of interest (RoI). Step 5 then extracts four different geometric features: (1) aspect ratio, (2) convex domains, (3) uniform width, and (4) parallelism between left-right side borders. In Step 6, a decision criterion combines the multiple individual features to classify each RoI: a RoI is kept or removed by estimating its fitness through a series of combined feature evaluations. Lastly, in Step 7, quantification estimates the total number of candidates that 'survive' Step 6, which most probably correspond to NVH.

Fig. 1. U3S System Flow

III. METHODOLOGY

A. Step 1: Image Acquisition

Images containing NVH are acquired as the input of the proposed system. They are obtained from internet resources; their resolutions are therefore often not standardized, though all are in RGB color format.

B. Step 2: Pre-processing

The pre-processing step is used to enhance image quality before segmentation. Firstly, a median filter is used to smooth the image by reducing noise. Then the raw image data are converted from the RGB color space to the HSV color space, as shown in Fig. 2. HSV is selected as the color representation because it better matches human perception, easing the separation of luma from chroma and offering robustness under different lighting conditions.
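The pre-processing step above can be sketched as follows; this is a minimal stand-in (not the U3S code), assuming a 3x3 median window and hue expressed in degrees, with helper names that are purely illustrative.

```python
import colorsys
import numpy as np

def median_filter3(img):
    """3x3 median filter per channel (borders left unchanged) to reduce noise."""
    out = img.copy()
    h, w, c = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = img[y - 1:y + 2, x - 1:x + 2].reshape(-1, c)
            out[y, x] = np.median(window, axis=0)
    return out

def rgb_to_hsv_u8(r, g, b):
    """Convert one 8-bit RGB pixel to HSV: H in degrees, S and V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

img = np.zeros((5, 5, 3))
img[2, 2] = 255.0               # a single bright (noisy) pixel
smoothed = median_filter3(img)  # the isolated spike is suppressed
print(rgb_to_hsv_u8(0, 255, 0))  # pure green sits at hue 120 degrees
```

In practice OpenCV's `cv2.medianBlur` and `cv2.cvtColor(..., cv2.COLOR_BGR2HSV)` would do the same work far faster; the loop version is shown only to make each operation explicit.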


Fig. 2. Conversion from RGB to HSV color space. (a) RGB; (b) HSV

C. Step 3: Region Segmentation

Color information is often considered a visible clue for region segmentation. Based on the appearance of NVH, the green and white color components can be used to highlight its presence. Thus white and green pixels are retained in the image and marked as foreground pixels by scaling them to white during image binarization via a color threshold (see Fig. 3(a)). Note that the foreground pixels make up the regions which possibly correspond to NVH, whereas the background pixels generally correspond to non-interesting pixels. However, color-based segmentation can suffer precision degradation because different illuminations or external factors (e.g. shadow, nutrients and variation in the water) can greatly affect the color appearance of the algae, and the segmentation is very sensitive to the tolerance values assigned to the Hue (H), Saturation (S) and Value (V) variables in the HSV color space. To maintain a coherent segmentation quality, several test images containing NVH were randomly selected as templates. In each template, the majority of the green and white pixels found on each NVH were gathered to create separate green and white histograms (see Fig. 4). The green and white histograms were then used to determine suitable tolerance values for the H and S variables. The V variable is related to brightness, so its range is fixed between 100 and 255, without considering objects under poor lighting, whose color appearance could be distorted toward dark. During the color-based region segmentation, if green pixels are found inside the input image, the segmentation first considers green as the primary color to extract RoIs; otherwise, the segmentation is based on the white color.

After the segmentation, a morphology open with a 3x3 rectangular structuring element is used to delete isolated pixels, followed by a morphology close with a 3x3 rectangular structuring element to fill holes found in the blobs (see Fig. 3(b)).
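A plain-NumPy sketch of the thresholding and morphology steps is given below; the tolerance box passed to `hsv_mask` is illustrative, not the histogram-derived values the paper describes, and OpenCV's `cv2.inRange`/`cv2.morphologyEx` would be the natural production choice.

```python
import numpy as np

def hsv_mask(hsv, lo, hi):
    """Binarize: a pixel is foreground if its (H, S, V) falls inside the tolerance box."""
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1).astype(np.uint8)

def erode3(m):
    """3x3 erosion: a pixel survives only if its whole neighborhood is set."""
    p = np.pad(m, 1)
    out = np.ones_like(m)
    h, w = m.shape
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def dilate3(m):
    """3x3 dilation: a pixel is set if any neighbor is set."""
    p = np.pad(m, 1)
    out = np.zeros_like(m)
    h, w = m.shape
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

mask = np.zeros((7, 7), dtype=np.uint8)
mask[1, 1] = 1            # an isolated noise pixel
mask[3:6, 3:6] = 1        # a solid 3x3 blob
opened = dilate3(erode3(mask))    # morphology open: removes the isolated pixel
closed = erode3(dilate3(opened))  # morphology close: would fill small holes
```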



Fig. 3. (a) Result after the application of the color threshold based on white histogram; (b) Result after the application of morphology operations.


Fig. 5. (a) Application of Canny’s edge detection; (b) Output of edge-based contour extraction

E. Step 5: Feature Extraction

To build a detector that identifies NVH well under different scales and rotations, we refer to the structural elements that make up NVH, using four different geometric features extracted from the boundary shape of each candidate. They are (E1) aspect ratio, (E2) convex domain, (E3) uniform width and (E4) parallelism between left-right side borders.


E1. Aspect ratio: This feature describes the proportional relationship between the length and width of the object. Firstly, the central baselines are determined by separately estimating the longest distance between two pixels along the contour on the horizontal axis and on the vertical axis. Both are then distinguished into the major axis and minor axis based on their lengths. The aspect ratio is defined as AR = L_major / L_minor, where L_major denotes the length of the major axis and L_minor the length of the minor axis. Since NVH has an elongated structure, each candidate region whose aspect ratio reaches at least two is accepted, a threshold set after rigorous experiments.
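The E1 rule can be sketched as below, under the simplifying assumption that the two central baselines are approximated by the horizontal and vertical extents of the contour points (a rotated bounding box would be a natural refinement for tilted candidates).

```python
import numpy as np

def aspect_ratio(contour):
    """Ratio of the major-axis length to the minor-axis length of a contour."""
    pts = np.asarray(contour, dtype=float)
    dx = pts[:, 0].max() - pts[:, 0].min()  # longest horizontal extent
    dy = pts[:, 1].max() - pts[:, 1].min()  # longest vertical extent
    major, minor = max(dx, dy), min(dx, dy)
    return float("inf") if minor == 0 else major / minor

elongated = [(0, 0), (40, 0), (40, 10), (0, 10)]  # worm-like 40x10 region
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(aspect_ratio(elongated) >= 2)  # True: accepted as an NVH candidate
print(aspect_ratio(square) >= 2)     # False: rejected
```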

Fig. 4. The graph of the hue and saturation value determination for (a) white pixels and (b) green pixels.

D. Step 4: Edge-based Contour Extraction

The binary output obtained from Step 3 is processed into an edge map using Canny's edge approach [9] (see Fig. 5(a)). Then the connected component analysis algorithm proposed by Suzuki and Abe [10] is employed to identify continuities between edge pixels, connect continuous edge pixels to form contours, and finally perform labeling (see Fig. 5(b)). Tiny contours which occupy less than 0.03% of the whole image dimension are considered meaningless and often known as noise; thus they are removed from the candidate queue.
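A minimal stand-in for the labeling and size-filter step is sketched below. It substitutes simple 4-connected component labeling for the Suzuki-Abe border following the paper uses (a deliberate simplification), while keeping the 0.03% area threshold stated above.

```python
from collections import deque
import numpy as np

def label_regions(mask, min_frac=0.0003):
    """Label connected foreground regions, discarding tiny ones as noise."""
    h, w = mask.shape
    min_area = min_frac * h * w       # 0.03% of the image dimension
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                seen[y, x] = True
                queue, comp = deque([(y, x)]), []
                while queue:          # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_area:   # keep only regions above the noise threshold
                    regions.append(comp)
    return regions

mask = np.zeros((100, 100), dtype=np.uint8)
mask[10:20, 10:20] = 1  # a real candidate (100 px)
mask[50, 50] = 1        # a 1-px speck: below 0.03% of 10000 px, dropped
print(len(label_regions(mask)))  # 1
```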

E2. Convex domain: Convex domains can be found in the head portion and tail portion of NVH, since these respectively present as a curvature with a corresponding orientation and a sharp peak, as labeled in Fig. 6(b). Two sequences of points, derived from the two sides of the major axis and expressed in (1), are considered; the value K is determined such that each point set keeps collecting points until the length of the sequence, formed by a total of 2K+1 pixels, exceeds a preset length. A sequence centered at a point p_i takes the form

S_i = { p_{i-K}, ..., p_{i-1}, p_i, p_{i+1}, ..., p_{i+K} }    (1)

Then equation (2) is used to determine whether the sequence of points forms a convex domain:

cos θ_i = (a_i · b_i) / (‖a_i‖ ‖b_i‖)    (2)

where θ_i denotes the angle between the vectors a_i = p_{i+K} − p_i and b_i = p_{i−K} − p_i. The K-cosine [11] c_i = cos θ_i carries the curvature information, with −1 ≤ c_i ≤ 1. As c_i approaches 1 (θ_i ≈ 0°), a sharp angle between a_i and b_i is formed at p_i, implying that p_i is a dominant point indicating a sign change in the sequence. Referring to the structure of NVH, the convex domain is defined as a curvature with a sharp peak, not conforming to a straight line, for which c_i approaches −1 (θ_i ≈ 180°). A sequence whose c_i value ranges from −0.5 to 1 is eligible to represent a convex domain. For the E2 evaluation, only candidates in which two convex domains are found, located separately at the head and tail as depicted in Fig. 6(b), are accepted.
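The K-cosine test of equations (1)-(2) can be sketched as follows; the −0.5..1 acceptance range follows the text, while the sample point coordinates are illustrative only.

```python
import math

def k_cosine(points, i, k):
    """cos(theta_i) for the vectors a = p[i+k]-p[i] and b = p[i-k]-p[i]."""
    ax, ay = points[i + k][0] - points[i][0], points[i + k][1] - points[i][1]
    bx, by = points[i - k][0] - points[i][0], points[i - k][1] - points[i][1]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def is_convex_domain(c):
    """Eligible curvature range for a convex domain: -0.5 <= cos <= 1."""
    return -0.5 <= c <= 1.0

straight = [(0, 0), (1, 0), (2, 0)]  # collinear points: cos = -1 (180 degrees)
corner = [(0, 1), (0, 0), (1, 0)]    # right-angle peak: cos = 0
print(is_convex_domain(k_cosine(straight, 1, 1)))  # False
print(is_convex_domain(k_cosine(corner, 1, 1)))    # True
```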


Fig. 7. Illustration of (a) E3 and (b) E4 feature


Fig. 6. Illustration of (a) E1 and (b) E2 feature

E3. Uniform width: NVH presents a uniform shape, with approximately the same width at each pair of middle pixels along the major axis. The region consists of concatenating widths w_1, w_2, ..., w_N, where w_j is the width formed at the j-th pair of middle pixels. This feature is evaluated according to (3) and (4): the width formed at the j-th pair of middle pixels, w_j, is compared with the width formed at the (j+1)-th pair, w_{j+1}. If the width difference Δw_j resulting from the comparison is less than (2/5) w_j, the examination between the j-th and (j+1)-th pairs is treated as temporarily successful, and the examination continues with the width of the next pair of middle pixels, w_{j+2}. Otherwise, the examination quits immediately and the candidate is rejected. If the examination successfully passes through N middle-pixel pairs, where N corresponds to 80% of all the middle-pixel pairs in the candidate region, the candidate is eventually accepted.

Δw_j = | w_{j+1} − w_j |,  1 ≤ j ≤ N−1    (3)

Δw_j < (2/5) w_j,  1 ≤ j ≤ N−1    (4)
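A sketch of the E3 check follows, wiring together the 2/5 width tolerance from (3)-(4) and the 80% pass requirement from the text; the width list is a hypothetical measurement, not output of the paper's pipeline.

```python
def uniform_width(widths, tol_frac=2 / 5, pass_frac=0.8):
    """Return True if consecutive widths stay within tolerance for enough pairs."""
    if len(widths) < 2:
        return False
    ok = 0
    for j in range(len(widths) - 1):
        if abs(widths[j + 1] - widths[j]) < tol_frac * widths[j]:
            ok += 1          # this pair of middle pixels passes (eq. 4)
        else:
            break            # the examination quits on the first failure
    return ok >= pass_frac * (len(widths) - 1)

print(uniform_width([10, 11, 10, 9, 10]))  # True: near-constant width
print(uniform_width([10, 30, 10, 9, 10]))  # False: abrupt jump rejects early
```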

E4. Parallelism between left-right side borders: Given the bending properties of NVH, the left-right side borders are often not expected to conform to a pair of straight lines, but they should remain approximately parallel. This feature extraction begins by making use of the convex-domain information mentioned in E2. As shown in Fig. 7(b), one of the convex points is selected and labeled; its forward adjacent point and backward adjacent point are identified and used to search the next L forward points and next L backward points, joining a pair of lines, where L is assigned to be 5 pixels after rigorous experiments. The orientation angles of both lines are measured using (5) and compared with each other. If the difference between the two line orientation angles is less than 45 degrees, the currently checked lines are considered temporarily parallel, and the check proceeds by updating the forward and backward points to their next adjacent points, forming a new pair of lines whose parallelism is determined in turn. Otherwise the check quits immediately and the E4 feature evaluation is considered failed. For a successful E4 evaluation, the examination iterates until the total length accumulated from the first point to the last checked point, on either side, exceeds a length threshold.

θ = atan2(Δy, Δx) × 180/π    (5)
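The orientation comparison of equation (5) can be sketched as below; segment endpoints are illustrative, and for simplicity this sketch does not handle angle wraparound near ±180 degrees.

```python
import math

def orientation_deg(p, q):
    """Orientation angle of the segment p->q in degrees, per equation (5)."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) * 180.0 / math.pi

def roughly_parallel(seg_a, seg_b, tol_deg=45.0):
    """Two border segments are temporarily parallel if their angles differ by < 45 deg."""
    diff = abs(orientation_deg(*seg_a) - orientation_deg(*seg_b))
    return diff < tol_deg

left = ((0, 0), (10, 1))   # slightly bent left border segment
right = ((0, 5), (10, 6))  # right border segment, nearly the same direction
print(roughly_parallel(left, right))                            # True
print(roughly_parallel(((0, 0), (10, 0)), ((0, 0), (0, 10))))   # False: 90 deg apart
```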

F. Step 6: Combination of the Features

Because a single feature is insufficient to examine the structure of NVH, multiple features are combined to generate a more reliable result. A set of hierarchical decision criteria provides a strong clue in the investigation process to determine candidate fitness during the combination of the features. The precedence rule for the feature combination is indicated in Fig. 8.


Fig. 8. Decision criteria given to classify the candidate regions as NVH or other.

For each candidate region, all the features, i.e. aspect ratio (E1), convex domains (E2), uniform width (E3) and parallelism between left-right side borders (E4), are evaluated according to the priority indicated in Fig. 8. The criteria determining whether a candidate region is treated as "pass" or "fail" for each individual feature evaluation are summarized in Table I. During the combination, the final candidates qualified to represent NVH are those that pass all the feature evaluations; if any individual feature fails, the candidate is removed from the candidate queue.

TABLE I. THE EVALUATION CRITERIA OF THE INDIVIDUAL FEATURES

Item | Pass                                          | Fail
E1   | aspect ratio ≥ 2                              | aspect ratio < 2
E2   | 2 convex domains                              | fewer than 2 convex domains
E3   | Δw_j < (2/5) w_j, 1 ≤ j ≤ N−1                 | Δw_j ≥ (2/5) w_j for some pair
E4   | orientation-angle difference < 45° throughout | orientation-angle difference ≥ 45°

G. Step 7: Quantification of the Abundance

Quantification is performed to estimate the abundance of NVH. All the final candidates that satisfy the conditions in Step 6 are kept in a list. Each candidate in this list is labeled, and the number of NVH counted equals the number of candidates stored in the list, as shown in Fig. 9(e).
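Steps 6 and 7 together amount to a filter-then-count chain, sketched below. The candidate fields are hypothetical summaries of the earlier E1-E4 measurements, and the checks mirror the Table I thresholds in the Fig. 8 precedence order (E1 first).

```python
def classify_and_count(candidates):
    """Step 6: evaluate E1-E4 in precedence order; Step 7: count the survivors."""
    checks = [
        ("E1", lambda c: c["aspect_ratio"] >= 2.0),
        ("E2", lambda c: c["convex_domains"] == 2),
        ("E3", lambda c: c["uniform_width"]),
        ("E4", lambda c: c["parallel_borders"]),
    ]
    survivors = []
    for cand in candidates:
        if all(check(cand) for _, check in checks):  # any single failure removes it
            survivors.append(cand)
    return len(survivors), survivors  # the count is the estimated NVH abundance

candidates = [
    {"aspect_ratio": 3.4, "convex_domains": 2, "uniform_width": True, "parallel_borders": True},
    {"aspect_ratio": 1.3, "convex_domains": 2, "uniform_width": True, "parallel_borders": True},
    {"aspect_ratio": 2.8, "convex_domains": 1, "uniform_width": True, "parallel_borders": True},
]
count, kept = classify_and_count(candidates)
print(count)  # 1: only the first candidate passes all four evaluations
```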


Fig. 9. A series of results. (a) The result after E1 feature evaluation; (b) the result after E2 feature evaluation; (c) the result after E3 feature evaluation; (d) the result after E4 feature evaluation; (e) the final result with the quantification details after combining E1, E2, E3 and E4, where the testing image source is from [2]; (f), (g), (h) sample results using testing images from [12], [13], [14].


IV. EXPERIMENTAL RESULTS AND DISCUSSION

The performance of U3S was evaluated using 7 images containing NVH, obtained from [2], [12], [13], [14], [15]. For comparison purposes, each individual NVH found in each image was accurately registered; the U3S system was then executed on the test dataset. As shown in Table II, the system correctly identified 29 NVH while missing 8, i.e. about a 78.38% detection rate. The main reason behind the false negatives (misses) is that some parts of an individual NVH can be occluded by another (or more than one) NVH.

TABLE II. IDENTIFICATION RESULTS BASED ON THE COMPARISON BETWEEN HUMAN OPERATOR AND U3S SYSTEM COUNTING

Item  | Human Count (HC) | System Count (SC) | Miss (FN) | Wrong (FP) | Detection Rate ((HC−FN−FP)/HC × 100%)
Img1  | 8  | 8  | 0 | 0 | 100
Img2  | 11 | 6  | 5 | 0 | 54.55
Img3  | 3  | 3  | 0 | 0 | 100
Img4  | 3  | 3  | 0 | 0 | 100
Img5  | 1  | 1  | 0 | 0 | 100
Img6  | 1  | 1  | 0 | 0 | 100
Img7  | 10 | 7  | 3 | 0 | 70.00
Total | 37 | 29 | 8 | 0 | 78.38

V. CONCLUSION

A framework for the automatic identification of algae is proposed, incorporating multiple-feature evaluation, with NVH used as the case study in this paper. The main goal of this work is to complement the existing operator-intensive procedure and reduce the time and effort required for image analysis in algae biodiversity studies, particularly in abundance assessment. The identification of NVH is based on the combination of four different geometric features characterized by scale invariance and rotation invariance: aspect ratio, convex domain, uniform width and parallelism between left-right side borders. The experimental results of the proposed system were encouraging, providing a reasonable approximation of the actual quantification, i.e. achieving about a 78.38% detection rate for NVH.

However, the limitation of the proposed system is that the current framework relies heavily on the quality of the region segmentation; therefore poor illumination and occlusion could degrade its overall performance. After further experiments, we found that U3S also works well in detecting and enumerating other Neomeris plants (e.g. Neomeris annulata) (see Fig. 10), since most Neomeris members share the same algal structure. Future work on our framework will regard: 1) extending the capability of the U3S system to recognize a larger variety of algae species with a new unified strategy.

Fig. 10. The proposed method applied in experimental tests of other Neomeris plants which belong to the same genus as NVH. (a) [16]; (b) [17] Neomeris annulata.

ACKNOWLEDGMENT

This work is supported by the UTAR Research Fund, Project No. IPSR/RMC/UTARRF/2013-C2/L03, "A New Framework for the Identification of Biodiversity Abundances for Underwater Species in Malaysian Waters," from the Universiti Tunku Abdul Rahman, Malaysia, and Grant No. RG20613SUS.

REFERENCES

[1] E. G. Bellinger, D. C. Sigee, Freshwater Algae: Identification and Use as Bioindicators. John Wiley and Sons, p. 271, 2010.
[2] AlgaeBase, "Neomeris vanbosseae M.A.Howe," www.algaebase.org. [Online]. Available: http://www.algaebase.org/search/species/detail/?species_id=3738 [Accessed: Nov. 21, 2013].
[3] K. G. Sellner, G. J. Doucette, G. J. Kirkpatrick, "Harmful algal blooms: causes, impacts and detection," Journal of Industrial Microbiology and Biotechnology, vol. 30, no. 7, pp. 383-406, 2003.
[4] P. F. Culverhouse, R. Williams, B. Reguera, V. Herry, S. Gonzalez-Gil, "Do experts make mistakes? A comparison of human and machine identification of dinoflagellates," Marine Ecology Progress Series, vol. 247, pp. 17-25, 2003.
[5] K. V. Embleton, C. E. Gibson, S. I. Heaney, "Automated counting of phytoplankton by pattern recognition: a comparison with a manual counting method," Journal of Plankton Research, vol. 25, no. 6, pp. 669-681, 2003.
[6] J. L. Pech-Pacheco, G. Cristóbal, J. Alvarez-Borrego, L. Cohen, "Automatic system for phytoplanktonic algae identification," Limnetica, vol. 20, no. 1, pp. 143-158, 2001.
[7] R. F. Walker, K. Ishikawa, M. Kumagai, "Fluorescence-assisted image analysis of freshwater microalgae," Journal of Microbiological Methods, vol. 51, no. 2, pp. 149-162, 2002.
[8] H. Manssor, M. Sorayya, S. Aishah, M. A. Mogeeb, "Automatic recognition system for some cyanobacteria using image processing techniques and ANN approach," International Conference on Environment and Computer Science, IPCBEE, vol. 19, pp. 73-78, 2011.
[9] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[10] S. Suzuki, K. Abe, "Topological structural analysis of digitized binary images by border following," Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32-46, 1985.
[11] T.-H. Sun, "K-cosine corner detection," Journal of Computers, vol. 3, no. 7, pp. 16-22, July 2008.
[12] [Online]. Available: http://guamreeflife.com/images/organisms/fullsize/algae_seagrass/algae/green_algae/neomeris_cf_vanbosseae/neomeris_cf_vanbosseae_full1.jpg [Accessed: Nov. 23, 2013].
[13] C. Payri, "CalPhotos," calphotos.berkeley.edu. [Online]. Available: http://calphotos.berkeley.edu/cgi/img_query?seq_num=367685&one=T [Accessed: Nov. 23, 2013].
[14] J.-L. Menou, "CalPhotos," calphotos.berkeley.edu. [Online]. Available: http://calphotos.berkeley.edu/cgi/img_query?seq_num=367116&one=T [Accessed: Nov. 23, 2013].
[15] A. N'Yeurt, "CalPhotos," calphotos.berkeley.edu. [Online]. Available: http://calphotos.berkeley.edu/cgi/img_query?seq_num=362889&one=T [Accessed: Nov. 23, 2013].
[16] [Online]. Available: http://www.wetwebmedia.com/trialalgaeid.htm [Accessed: Nov. 23, 2013].
[17] [Online]. Available: http://www.wetwebmedia.com/Algae%20and%20Plt%20Pix/Green%20Algae/Neomeris/MysteryGrowth.jpg [Accessed: Nov. 23, 2013].