Examination of Bioluminescent Excitation Responses ... - IEEE Xplore

Examination of Bioluminescent Excitation Responses Using Empirical Orthogonal Function Analysis

Jesse W. Davis, Florida Institute of Technology, Dept. of Ocean Engineering
Eric D. Thosteson, Ph.D., P.E., Florida Institute of Technology, Dept. of Ocean Engineering
Lee Frey, Harbor Branch Oceanographic Institution
Edith A. Widder, Ph.D., Harbor Branch Oceanographic Institution

Abstract – Bioluminescent intensities were measured August 30th – September 4th, 2004 in the Gulf of Maine with the HIDEX III Bathyphotometer. Empirical Orthogonal Function (EOF) analysis and Complex Empirical Orthogonal Function (CEOF) analysis were applied to these data sets in order to determine a unique excitation response for each species of bioluminescent organism encountered in the data sets. Using the results of the analysis, a filter was designed for real-time identification of dinoflagellates.

I. INTRODUCTION

Bioluminescent intensities were measured August 30th – September 4th, 2004 in the Gulf of Maine with the HIDEX-BP GEN III (High Intake Defined Excitation – BathyPhotometer). The HIDEX III measures the pattern of light intensity, the excitation response, of bioluminescent organisms in seawater pumped through a 120 mm diameter tube at approximately 18 liters per second. The data include excitation responses from a mixture of bioluminescent organisms such as dinoflagellates, ctenophores, copepods, and euphausiids. The HIDEX-BP GEN III is the newest of the high intake defined excitation bathyphotometers. More information on the HIDEX instruments can be found in [1]. The GEN III instrument employs twenty photomultiplier tubes (PMTs), Fig. 1, to measure the excitation responses of bioluminescent organisms. It is also equipped with sensors to measure salinity, turbidity, chlorophyll, temperature, depth, and flow rate, as well as a spatial plankton analysis technique (SPLAT) system that visually analyzes the organisms encountered by the HIDEX III [2].

Fig. 1 The HIDEX III Zone Array

Using the SPLAT system it is possible to identify different bioluminescent species by the unique temporal and spatial characteristics of their light emissions in response to mechanical stimulation [2, 3]. The SPLAT system is video based and therefore measures with low temporal resolution (30 Hz). The HIDEX III measures only the temporal variation, but with a high temporal resolution (1 kHz). If each species has a unique light intensity signature, then bioluminescent organisms can be identified and quantified with non-imaging instruments such as the HIDEX III.

Empirical orthogonal function (EOF) analysis and complex empirical orthogonal function (CEOF) analysis were applied to the data in an attempt to separate the light signatures of the various organisms. EOF analysis compresses large sets of data into statistical modes with the most variance contained in mode one, most of the remaining variance contained in mode two, and so on [4]. CEOF analysis utilizes the addition of phase to detect propagating features in the data that EOF analysis cannot detect [4]. Using EOF and CEOF analysis, each species’ light signature can be extracted from large sets of mixed data and compressed into statistical modes of variance. It is hypothesized that time series coefficients for each mode can then be used to quantify the abundance of organisms with common excitation responses that pass through the instrument. Using the results from the EOF and CEOF analysis, a filter can be designed for real-time identification of bioluminescent organisms using their unique excitation responses.

II. FEATURE EXTRACTION TECHNIQUES

A. EOF Analysis

EOF analysis, also called Principal Component Analysis, is a classic technique used in statistical data analysis, feature extraction, and data compression [5]. EOF analysis uses second-order statistics to compress a large, multivariate set of data into a smaller set of variables with less redundancy [5]. It accomplishes this by forming the covariance matrix of the data and solving its eigenvalue problem (setting the characteristic determinant equal to zero) for the eigenvalues and eigenvectors, from which the time series coefficients follow. The first eigenvector, or mode, is found so that it contains the most variance in the data set, while the second eigenvector contains the most remaining variance, and so on. Time series coefficients are weights that describe how the magnitude of each eigenvector changes in time. The original data can be replicated at any instant in time by the summation of each eigenvector multiplied by its time series coefficient.

B. CEOF Analysis

CEOF analysis varies only slightly from conventional EOF analysis. Before creating a covariance matrix, CEOF analysis transforms the original time series of data into a complex time series. This gives the analysis an extra piece of information to use, the instantaneous phase. The addition of phase allows the analysis to produce not only the magnitude of the compressed data, but the instantaneous position as well. The analysis can therefore detect both standing and propagating wave features in the data, whereas conventional EOF analysis can only detect standing wave patterns [4]. Besides being able to detect propagating wave patterns, CEOF analysis uses the “extra” phase information to create a more compact description of the data [6].
In other words, it compresses the total variance into a smaller number of modes. In order to perform CEOF analysis, the original set of data must be transformed into a complex set of data using the Hilbert transform. The real part of the complex matrix is the original set of data, while the imaginary part is the Hilbert transform of the real part [6]. The complex data set can be represented as follows:

$$U_j(t) = u_j(t) + i\,\hat{u}_j(t) \tag{1.1}$$

where $u_j(t)$ is the original set of data and $\hat{u}_j(t)$ is the Hilbert transform of the data [6].

After applying this transformation, the same procedure used in conventional EOF analysis is implemented to obtain the complex eigenvectors and time series variables.
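The two procedures in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used for the HIDEX III data: the function names (`eof_analysis`, `analytic_signal`, `ceof_analysis`), the array layout (rows are time samples, columns are PMT channels), the removal of each channel's time mean, and the synthetic propagating wave are all assumptions made here. The Hilbert transform is obtained through the standard FFT construction of the analytic signal.

```python
import numpy as np


def eof_analysis(data):
    """EOF decomposition of a 2-D (n_times, n_channels) real or complex array.

    Returns the eigenvalues (largest first), the eigenvectors (one mode per
    column), and the time series coefficients (n_times, n_modes).
    """
    anomalies = data - data.mean(axis=0)        # remove each channel's time mean
    cov = anomalies.conj().T @ anomalies / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # Hermitian solver, ascending order
    order = np.argsort(eigvals)[::-1]           # reorder so mode 1 has most variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coeffs = anomalies @ eigvecs                # project the data onto each mode
    return eigvals, eigvecs, coeffs


def analytic_signal(data):
    """Return u + i*H[u] along the time axis (axis 0) of a 2-D array."""
    n = data.shape[0]
    weights = np.zeros(n)
    weights[0] = 1.0                            # keep the zero-frequency term
    if n % 2 == 0:
        weights[n // 2] = 1.0                   # keep the Nyquist term
        weights[1:n // 2] = 2.0                 # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    spectrum = np.fft.fft(data, axis=0)
    return np.fft.ifft(spectrum * weights[:, None], axis=0)


def ceof_analysis(data):
    """CEOF analysis: the same EOF procedure applied to the complex series."""
    return eof_analysis(analytic_signal(data))


# Hypothetical test signal: a wave propagating across 20 channels.
n_times, n_pmts = 400, 20
t = np.arange(n_times)[:, None]
x = np.arange(n_pmts)[None, :]
wave = np.cos(2 * np.pi * x / n_pmts - 2 * np.pi * 8 * t / n_times)

eof_vals, eof_modes, eof_coeffs = eof_analysis(wave)
ceof_vals, ceof_modes, ceof_coeffs = ceof_analysis(wave)
print("EOF  mode 1 variance fraction:", eof_vals[0] / eof_vals.sum())
print("CEOF mode 1 variance fraction:", ceof_vals[0] / ceof_vals.sum())
```

The data can be replicated from any subset of modes as `coeffs[:, k] @ modes[:, k].conj().T` plus the removed mean. For this synthetic wave, the real-valued analysis should need two modes of roughly equal variance while the complex analysis should capture the propagation in a single mode, which is the behavior described above.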

III. RESULTS

A. Pattern Recognition

Before implementing the EOF and CEOF analyses on the data sets, many hours were spent observing the excitation responses of the organisms. By observing the raw data, one can distinguish between different types of excitation responses encountered in the data sets. For instance, one excitation response produces a standing wave pattern from PMTs 1-6, another produces a propagating pulse through all 20 PMTs, and yet another flashes at only one PMT and does not produce any other response. Once different excitation responses were identified, specific organisms were linked to each distinct response based on correlation with SPLAT CAM analysis and pumped plankton samples collected during HIDEX GEN III profiles.

i) Standing Light Profile

The simplest organisms to link to a specific excitation response were the dinoflagellates, primarily a mixture of Ceratium fusus, Protoperidinium depressum, and P. sp. Dinoflagellates were found in large concentrations from depths just below the surface to the peak in chlorophyll content. A very distinct excitation response was found in the data sets at these depths and was verified as the dinoflagellates’ light signature by visually checking the SPLAT camera. The predominant light signature at these depths is a simple standing wave pattern that decays exponentially after its peak. The standing wave pattern peaks at PMTs 2 – 3 and decays to the noise level by PMT 6. Fig. 2 shows the dinoflagellates’ excitation response at an instant in time.

Fig. 2 Instantaneous Dinoflagellate Response

ii) Propagation in Light Profile

Other excitation responses like the ones shown in Fig. 3 are abundant in the September 4th, 2004 data set but show variations in response length, width, and number of pulses. Some of the responses propagate through the entire length of the HIDEX III test chamber, while some have a delayed reaction to the stimulation caused by the excitation grid and finish their response before exiting the instrument. The responses also have different widths, or time in each PMT’s field of view, indicating larger and smaller organisms. Although these excitation responses produce similar maximum intensities (usually saturating the PMT), they produce a varying number of pulses, or peak intensities. In Fig. 3, there are two different excitation responses that vary both in width and response time. The first response is from a larger organism that peaks twice in the middle of the test section: once at PMT 8 and again at PMT 11. The other response is from a smaller organism that also peaks twice, but at the beginning of the test section: once at PMT 3 and once at PMT 5.

Fig. 3 Propagating Excitation Responses

Although the September 4th, 2004 data set was known to contain numerous responses from ctenophores (Beroe cucumis) throughout, a well-defined excitation response could not be identified due to the biological randomness of the ctenophores’ flash dynamics. After linking unique excitation responses to different organisms, the results obtained from EOF and CEOF analyses could be analyzed to determine how effectively they separated the bioluminescent organisms’ light signatures.

As mentioned, EOF and CEOF analyses compact the data into eigenvectors and time series variables. The eigenvectors can be described as common patterns in the original data set that relate to the excitation responses of the organisms, while the time series variables are coefficients that describe the temporal variation of each eigenvector. The distinct excitation responses determined above should be compacted into either one eigenvector or the sum of a few eigenvectors by the analyses.

B. EOF Results

EOF analysis was able to separate the dinoflagellate light signature into one eigenvector (mode 1), but was ineffective at separating the light signatures of the other organisms into even a few modes; instead their signatures were expressed as the sum of the remaining modes. In order to verify that the analysis separated the dinoflagellate excitation response into one mode, the data was replicated using only mode 1 and compared to the original data. The results can be seen in Fig. 4 and Fig. 5. Note that both graphs in Fig. 4 are of the same scale. The replicated data and original data are nearly identical during periods of high dinoflagellate concentration. In fact, the correlation between the two instantaneous responses in Fig. 5 is 98.57%, which corresponds to an R-square value of 0.9857. During these periods of high dinoflagellate concentration, the remaining modes have values that are orders of magnitude smaller and contribute nearly nothing to the replication of the data. This verifies that only one mode represents the dinoflagellate excitation response.

Fig. 4 Raw Data vs. Mode 1 Replicated Data

Fig. 5 Instantaneous Response of Raw Data vs. Mode 1 Replicated Data

Although mode 1 is the only mode that represents the dinoflagellates’ excitation response, it also contributes significantly to propagating organisms’ responses in the data set, indicating a linear dependence between the dinoflagellate response and that of the other organisms. The mode 1 eigenvector contributes to the other organisms’ excitation responses when they initially enter the instrument’s test section and when a large flash saturates the last PMT, but not while the organism is propagating through the middle of the test section. As will be seen, the linear dependence between the dinoflagellate excitation response and the other organisms’ responses makes the design of a real-time filter much more difficult.
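The mode 1 verification described in the EOF results can be illustrated on synthetic data. The sketch below is an assumption-laden example, not the original processing: the pattern values, noise level, and function names (`mode1_reconstruction`, `r_squared`) are hypothetical, each channel's time mean is removed before the decomposition, and R-square is taken here as the squared Pearson correlation between a raw instantaneous profile and its mode 1 replica.

```python
import numpy as np


def mode1_reconstruction(data):
    """Replicate a (n_times, n_channels) data set using only EOF mode 1."""
    mean = data.mean(axis=0)
    anomalies = data - mean
    # Rows of vt are the EOF modes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    mode1 = vt[0]
    coeff1 = anomalies @ mode1                  # mode 1 time series coefficient
    return mean + np.outer(coeff1, mode1)


def r_squared(profile_a, profile_b):
    """Squared Pearson correlation between two instantaneous light profiles."""
    r = np.corrcoef(profile_a, profile_b)[0, 1]
    return r * r


# Hypothetical standing-wave signature: peak near PMTs 2-3, noise level by PMT 6.
rng = np.random.default_rng(0)
pattern = np.zeros(20)
pattern[:6] = [0.3, 0.9, 1.0, 0.6, 0.3, 0.1]
amplitude = rng.uniform(0.0, 1.0, size=500)     # flash strength at each instant
data = np.outer(amplitude, pattern) + 0.01 * rng.standard_normal((500, 20))

replicated = mode1_reconstruction(data)
k = int(np.argmax(amplitude))                   # instant with the strongest response
score = r_squared(data[k], replicated[k])
print("R-square at strongest instant:", score)
```

When a single standing pattern dominates, as in this synthetic record, the mode 1 replica should track the raw profile closely and the remaining modes should carry little more than noise.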

C. CEOF Results

CEOF analysis performed better than EOF analysis. The analysis compacted more variance into fewer modes, was able to separate the dinoflagellate light signature into one mode, and also expressed pieces of propagating organisms’ signatures as sums of two or three modes. Although CEOF analysis was able to compress more data into fewer modes, the mode 1 eigenvector still showed a linear dependence between the dinoflagellate response and the propagating organisms’ response. As with the EOF analysis, this dependence is only apparent at the beginning and end of the test section.

Although CEOF analysis could not separate the entire light signature of a propagating organism into one mode, it could express pieces of the signature as sums of two or three modes. Consider the comparison of the excitation response in Fig. 6 and the replicated data in Fig. 7. The sum of modes 1 and 2 expresses the initial stages of this propagation, the sum of modes 4, 5, and 6 expresses the middle stages of the propagation, and the sum of modes 8, 10, and 11, and finally mode 3, express the final stages of the propagation.

Fig. 6 Propagating Response

Fig. 7 Propagating Response Mode Comparisons

IV. APPLICATION

A. Filter Design

Using the results obtained from the feature extraction analyses above, a real-time filter for identifying dinoflagellates was developed.

The preliminary filter design simply correlates the dinoflagellate excitation response obtained from the EOF analysis to each light profile within a record. Using a threshold for detection of 96%, corresponding to an R-square value of 0.96, the filter returns a simple yes or no as to whether the ideal dinoflagellate excitation response is a match to the instantaneous light profile. The preliminary filter does indeed indicate the positions of the dinoflagellates within the record, but it also returns numerous false detections caused by other organisms. Fig. 8 shows the results of the preliminary ‘dinofilter’ applied to a data set taken on August 30th, 2004 with a strong concentration of dinoflagellates from the surface to 32 meters in depth (corresponding to 60 seconds). A ‘1’ represents a dinoflagellate while ‘0’ is none. After 60 seconds worth of data, there should not be any dinoflagellates triggered by the filter, but clearly there are numerous false detections.

Fig. 8 Preliminary ‘Dinofilter’ Results
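A correlation filter of the kind described for the preliminary design might look like the following sketch. It is not the authors' code: the function name `dinofilter`, the template values, and the example record are hypothetical, and two details are assumed here rather than taken from the text, namely that R-square is the squared Pearson correlation across the 20 PMT channels and that negatively correlated profiles are rejected.

```python
import numpy as np


def dinofilter(record, template, threshold=0.96):
    """Return 1 for each light profile (row) matching the template, else 0.

    A profile matches when its squared Pearson correlation with the template
    is at least `threshold` and the correlation itself is positive.
    """
    t = template - template.mean()
    p = record - record.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(p, axis=1) * np.linalg.norm(t)
    # Flat profiles have zero variance; leave their correlation at 0.
    r = np.divide(p @ t, denom, out=np.zeros(len(record)), where=denom > 0)
    return ((r > 0) & (r * r >= threshold)).astype(int)


# Hypothetical ideal dinoflagellate response across 20 PMTs.
template = np.zeros(20)
template[:6] = [0.3, 0.9, 1.0, 0.6, 0.3, 0.1]

# Three example instants: a scaled copy of the template, a dark profile,
# and a pulse passing the middle of the test section.
pulse = np.zeros(20)
pulse[9:12] = [0.5, 1.0, 0.5]
record = np.vstack([5.0 * template, np.zeros(20), pulse])

flags = dinofilter(record, template)
print(flags)
```

Because the correlation ignores overall scale, a filter like this responds to the shape of a profile rather than its brightness, which is consistent with the false detections reported when another organism's signal happens to be centered near PMTs 2 and 3.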

These false detections can be attributed to a linear dependence between the dinoflagellate excitation response and that of the propagating organisms. The false detection problem occurs when the other organisms’ excitation responses center about PMTs 2 and 3. As the other organisms propagate through the instrument, the signals from successive PMTs ramp up and ramp down in sequence. At the point when the organism is ramping up between PMT 2 and PMT 3, the ‘dinofilter’ is triggered, thus causing a false detection. In order to prevent this, a time-delayed, or multistage, filter is needed to weed out the false triggers. This time-delay filter takes advantage of the propagating nature of the other organisms’ excitation responses to distinguish them from the standing wave response of the dinoflagellates. This assumes a dinoflagellate concentration high enough to sustain a standing wave pattern. Four ‘impulse’ filters are used to detect the propagating organisms in the data sets. An algorithm then uses the ‘impulse’ filters to determine if the ‘dinofilter’ was falsely triggered.

B. Filter Results

The new time-delay filter was applied to the same set of data taken on August 30th, 2004 with much better results, Fig. 9. Although the time-delay filter still contains false detections, there are only 14 in the entire data set. With 160,265 samples taken in the data set, this yields a false detection probability of 0.0087 %. If only the time period from 60 seconds (end of dinoflagellates) until 160 seconds (end of the recorded data) is considered, the false detection probability is 0.014 %.
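The two percentages above can be reproduced with simple arithmetic. The short check below assumes uniform sampling over the 160 second record and that all 14 false detections fall after the 60 second mark; the function name is hypothetical.

```python
def false_detection_percent(false_detections, samples):
    """False detections as a percentage of the samples examined."""
    return 100.0 * false_detections / samples


total_samples = 160_265
record_seconds = 160.0

# Whole record: 14 false detections in 160,265 samples.
whole_record = false_detection_percent(14, total_samples)

# Only the 60 s to 160 s window, assuming a uniform sampling rate.
window_samples = total_samples * (record_seconds - 60.0) / record_seconds
window_only = false_detection_percent(14, window_samples)

print(f"whole record: {whole_record:.4f} %")
print(f"60-160 s:     {window_only:.3f} %")
```

Under these assumptions the two values round to 0.0087 % and 0.014 %, matching the figures quoted.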

Fig. 9 Time-Delay Filter Results

In order to verify the effectiveness of the filter, it was applied to another known data set that contains zero dinoflagellates. This data set was collected on September 4th, 2004 at a constant depth of approximately 135 meters. Fig. 10 shows the results of the time-delay filter. The filter is even more effective in this data set. With 190,080 samples taken, only 8 false detections were recorded. This yields a false detection probability of 0.0042 %. The time-delay filter is very effective at eliminating the false detections and providing accurate dinoflagellate locations in the data sets.

Fig. 10 Time-Delay Filter Results

V. CONCLUSIONS

Both EOF and CEOF analysis were able to compress the dinoflagellate excitation response into one mode, but varied in the effectiveness of reproducing the other organisms’ excitation responses. Overall, CEOF analysis performed better than EOF analysis. It was able to separate the dinoflagellate light signature into one mode, compress the total variance of each data set into a smaller number of modes, and piecewise detect the propagating organisms in the data sets, whereas EOF analysis could only detect the standing wave pattern of the dinoflagellates. A time-delay filter was designed that utilized the results from the EOF and CEOF analysis to identify dinoflagellates during real-time data acquisitions. The filter correlates the ideal dinoflagellate excitation response to the instantaneous light profile in a data set while implementing a false detection algorithm to ensure accurate results. The time-delay filter was tested on known sets of data to verify its effectiveness and was accurate to within 1% during the tests.

The ability of the feature extraction analyses to separate the dinoflagellate excitation response and piecewise detect the propagating organisms’ responses suggests that the identification of bioluminescent organisms by excitation response alone is possible with instruments such as the HIDEX III.

Acknowledgments

Funded by ONR N00014-02-1-0949 to E.A. Widder.

REFERENCES

[1] Edith A. Widder, Lee Frey, and Jennifer Bowers, “Improved bioluminescence measurement instrument – A new high-intake defined excitation bathyphotometer developed for the U.S. Navy,” Sea Technology, vol. 46, no. 2, pp. 10-15, February 2005.
[2] Edith A. Widder and Sönke Johnsen, “3D spatial point patterns of bioluminescent plankton: a map of the underwater ‘minefield’,” Journal of Plankton Research, vol. 22, no. 3, pp. 409-420, 2000.
[3] Edith A. Widder, “Bioluminescence and the pelagic visual environment,” Mar. Fresh. Behav. Physiol., vol. 35, no. 1-2, pp. 1-26, 2001.
[4] William J. Emery and Richard E. Thomson, Data Analysis Methods in Physical Oceanography, Pergamon, 1998, pp. 319-343.
[5] Aapo Hyvarinen, Juha Karhunen, and Erkki Oja, Independent Component Analysis, John Wiley and Sons, Inc., 2001, pp. 125-144.
[6] Guoxiong Liang, Thomas E. White, and Richard J. Seymour, “Complex principal component analysis of seasonal variation in nearshore bathymetry,” ICCE ’92, 23rd International Conference on Coastal Engineering, Ch. 172, 2: 2242-2250, 1992.