
PASJ: Publ. Astron. Soc. Japan 63, 1313–1330, 2011 December 25. © 2011 Astronomical Society of Japan.

Fast Automatic Redshift Determination Using Absorption Lines Recognition

Jin-Shu HAN,1,2,3,4 A-Li LUO,1,2 and Yong-Heng ZHAO1,2

1 National Astronomical Observatories, Chinese Academy of Sciences, Beijing, 100012, China jinshu [email protected]
2 Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing, 100012, China
3 Graduate University of Chinese Academy of Sciences, Beijing, 100049, China
4 Department of Computer Science and Technology, Dezhou University, Dezhou, 253020, China

(Received 2011 March 11; accepted 2011 July 27)

Abstract

To precisely measure the redshifts of non-ELGs (ELGs: emission-line galaxies), weaker-ELGs, and galaxies with only one emission line that is clearly visible in the optical band, a fast automatic redshift determination algorithm (FRA) is proposed, which differs from the widely used cross-correlation method. The algorithm is faster because it determines the redshift by extracting and recognizing two pairs of prominent absorption lines in the blue band of a spectrum, G-band 4306 and Hγ 4342 (vacuum wavelength), and Hβ 4863 and Mg 5177, rather than by matching an observational galaxy spectrum against the complete set of galaxy templates. Moreover, the FRA covers a wide range of redshifts, from 0 to 0.65, with relatively high success rates. For spectra with a relatively wide wavelength range and a resolution above R = 1000, the FRA is an efficient automatic algorithm that can be used in spectral redshift surveys with large amounts of data.

Key words: cosmology: distance scale; cosmology: observations; methods: data analysis; techniques: spectroscopic

1. Introduction

There is much information contained in a spectrum, which can be used for understanding our universe. Generally, an observational galaxy spectrum is composed of continuum, spectral lines, and noise. The continuum and absorption lines provide global information on all stellar contents in a galaxy, emission lines strongly indicate star-formation rates, and various kinds of noise are caused by instruments and the environment. According to the spectral features, we can classify galaxies as emission-line galaxies (ELGs) or non-emission-line galaxies (non-ELGs). For both of them, redshifts are important for further studies of galaxy physics and cosmology.

With the development of large-scale spectroscopic surveys, such as the Sloan Digital Sky Survey (SDSS, Adelman-McCarthy et al. 2008), the 2dF Galaxy Redshift Survey (2dFGRS, Colless et al. 2001), and the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), massive amounts of spectral data have been collected. In addition, there are some fainter objects with higher redshifts and lower signal-to-noise ratios, which makes it more difficult to measure redshifts reliably. Hence, for such large-quantity, low-quality spectra, quick and accurate data reduction and analysis methods are being studied.

At present, the widely used automatic methods for redshift measurements can be sorted into two groups: the full-spectrum matching methods and the spectral-lines matching methods. The classical method of the former is cross-correlation analysis (Tonry & Davis 1979), which is mainly applied to non-ELGs, and has been adopted by 2dFGRS (Colless et al. 2001). In this method, let G be a galaxy spectrum whose redshift needs to be determined; all available galaxy template spectra are shifted to all possible redshifts in steps of Δz. Then, the cross-correlation function between G and each template at each possible redshift is computed. The position and width of the largest peak in the cross-correlation function indicate an estimate of the redshift and an “error” on the redshift, respectively.

This method has three shortcomings. Firstly, the database of standard galaxy template spectra must be complete; namely, the templates must span all possible objects in the universe. Secondly, too much time is consumed by the exhaustive search and cross-correlation matching, and the measuring time depends on the value of the redshift and the number of templates. Last but not least, automatic redshift determination is essentially an application of pattern recognition. Cross-correlation analysis is a matching method based on the full original data, although it transforms the original spectral data into frequency space by Fourier transformations, and only the data at intermediate frequencies are used for template matching. Pattern-recognition theory (Bian et al. 2000) shows that the full original data are high-dimensional and voluminous, and cannot clearly reflect the intrinsic features of an object. In fact, a human generally recognizes an object according to its major characteristics, rather than all of its details. Therefore, in order to efficiently implement automatic redshift measurements, prominent features that reflect the intrinsic properties of an object, and are not easily masked by unimportant details, should be extracted from the observation data.

As a superior generalization of cross-correlation, the PCAZ method (Glazebrook & Offer 1998) adopted by SDSS (Adelman-McCarthy et al. 2008) first builds a set of orthogonal templates from a variety of galaxy template spectra


J.-S. Han, A.-L. Luo, and Y.-H. Zhao

by the principal component analysis (PCA). Then, similarly to the cross-correlation method, PCAZ cross-correlates the observational spectrum, G, with the corresponding linear combination of the orthogonal templates at every possible redshift. The PCAZ orthogonal templates can be regarded as the features of the object, but the PCAZ only extracts the features mathematically, and insufficiently considers the physical properties of the spectrum. In fact, the PCAZ method is still an exhaustive search and matching method, and has the same shortcomings as the cross-correlation method.

The spectral-lines matching methods are mainly applied to ELGs and adopted by SDSS and 2dFGRS (Stoughton et al. 2002; Adelman-McCarthy et al. 2008; Colless et al. 2001; Drinkwater et al. 2009). In these methods, the local extreme points in a spectrum are detected as the candidate emission lines. Then, a Gaussian is fitted to every candidate line to determine the line center. Finally, the redshift is determined by spectral-lines identification. There are some improved algorithms, such as a pseudo-triangle technique (Qiu et al. 2002) and a density estimation method (Duan et al. 2005), and tools such as EZ (Garilli et al. 2010) have been developed. In the above-mentioned methods, the measurement results depend on the accuracy of line extraction, and inevitably get worse in the following cases. One is weaker emission-line galaxies (weaker-ELGs), which tend to present more spurious lines, or to miss true lines, especially when the signal-to-noise ratios are lower. The other is that the spectrum has only one clearly visible emission line, or even no line in the observed wavelength range, due to the redshift, missing data caused by bad equipment pixels, or other reasons. Both cases can cause the emission-line identification to fail; therefore, we have to depend on the continuum and absorption lines to determine the redshift.

We support designing different optimum algorithms for different kinds of galaxies according to their characteristics, rather than using only one method to process all kinds of galaxies. The algorithm proposed in this paper derives from this idea, and is suitable for redshift surveys of spectra with a relatively wide wavelength range and with a resolution above R = 1000, such as SDSS or LAMOST. In this paper, galaxy spectra are classified as non-ELGs, weaker-ELGs, or stronger-ELGs. We design the improved algorithm of redshift determination mainly for non-ELGs and weaker-ELGs. As for the last class, which has several prominent emission lines, the spectral-lines matching method works well.

In this paper, in order to overcome the shortcomings of the full-spectrum matching methods, we propose a fast automatic redshift determination algorithm (FRA). The basic ideas of the FRA algorithm are described as follows. According to the spectral physical properties, the algorithm adopts a few prominent absorption lines as the spectral features, rather than using the exhaustive search strategy adopted by the full-spectrum matching methods. In order to measure higher redshifts, the feature lines in the blue band are utilized by the FRA. The first feature is selected from the G-band 4306 and Hγ 4342, and the second is selected from Hβ 4863 and Mg 5177. Then, the redshift is determined by absorption-lines identification. Our experiments show that the algorithm is suitable for

[Vol. 63,

non-ELGs and weaker-ELGs. The high measurement speed of the FRA is independent of the value of the redshift itself. The success rate of measurements is high, and the measuring precision meets the allowable error requirement, ±0.001. The algorithm is robust and insensitive to uncertainties in the flux calibration. The range of redshift measurements is from 0 to 0.65, covering the redshifts of almost all known non-ELGs and weaker-ELGs in SDSS. In addition, for stronger-ELGs with fewer than two emission lines in the optical wavelength band, the algorithm can give a reference redshift.

This paper is organized as follows. In section 2, we describe the characteristics of galaxies that can be well processed by our FRA, and also present the method of distinguishing and rejecting ELGs with several stronger and wider emission lines. The principle of the FRA algorithm is presented in section 3. In section 4, we discuss the solutions of several typical cases, such as non-ELGs with higher redshifts, spectra with missing data, weaker-ELGs, and an example of LAMOST spectra. The validity of our algorithm is verified in section 5. Finally, we summarize this work in section 6.

2. Galaxy Category and Distinguishing Method

2.1. Galaxy Category

In this paper, non-ELGs stand for the E/S0 galaxies, which usually show relatively homogeneous spectra with no, or weaker, [O II] 3727 or [N II] 6583 (Kennicutt 1992). Weaker-ELGs mainly consist of Sa, Sab, and Sb galaxies, which usually show weaker emissions in [O II] 3727, Hα and [N II] 6583, the equivalent widths of which are only a few angstroms.

SDSS spectra are used to describe and evaluate the FRA algorithm in this paper, and figure 1 presents the positions of non-ELGs and weaker-ELGs in the SDSS classification system. SDSS classifies galaxies based on eClass, a single-parameter classifier, ranging from about −0.35 to 0.5 for early- to late-type galaxies (Yip et al. 2004); 2316 galaxy spectra are selected from five random SDSS sky areas: 0360, 0509, 0668, 0750, and 0877. Figure 1 shows the eClass–eW(4000 Å) relation diagram, where eW(4000 Å) is the value of the 4000 Å break ratio, which is calculated by (blue flux)/(red flux), so for most galaxies the value is < 1. Figure 1 shows that the values of eW(4000 Å) increase as an approximate linear function of the eClass values. In statistics, Tresse found that D4000, where D4000 is defined as (red flux)/(blue flux), decreases as the equivalent widths of the emission lines increase (Tresse et al. 1999), namely ELGs have the lower 4000 Å Balmer break. Actually, eW(4000 Å) and D4000 have a reciprocal relationship; hence, Tresse’s conclusion is consistent with that shown in figure 1.

The galaxies well processed by the FRA mainly are distributed in the areas of ABDC and BEFD in figure 1, where the vertical coordinate of line ABE is eW(4000 Å) = 0.8. According to Tresse’s paper, the average D4000 for ELGs is about 31 percent smaller than that for non-ELGs; meanwhile, there is an overlap between the two distributions, and there is no clear separation into distinct populations. The start point of the overlap part is about D4000 = 1.3, or eW(4000 Å) = 1/1.3 ≈ 0.77. Almost all galaxies with D4000 < 1.3 are ELGs, and only 2 percent of the non-ELGs have D4000 < 1.2 [eW(4000 Å)

No. 6]


Fig. 1. Approximate linear statistical relationship between eClass and eW(4000 Å), which can be described by y = 0.7298 * x + 0.6993. Every “*” symbol denotes a galaxy from five random SDSS sky areas, 0360, 0509, 0668, 0750, and 0877.

= 1/1.2 ≈ 0.83]. Hence, it is reasonable to let eW(4000 Å) = 0.8 be the threshold. In the area of ABDC, 1653 galaxy spectra out of 1665 with eClass ≤ 0 have eW(4000 Å) ≤ 0.8, namely 99.3% of non-ELGs present remarkable 4000 Å Balmer breaks. In the area of BEFD, 197 galaxy spectra out of 511 with eClass ∈ (0, 0.25] have eW(4000 Å) ≤ 0.8, which means 38.6% of the spectra have both weaker emission lines and clearer 4000 Å Balmer breaks. As for galaxies in other parts of figure 1, they generally have stronger emission lines.

2.2. To Distinguish Galaxy Category

The FRA algorithm is designed to automatically measure the redshifts of non-ELGs, weaker-ELGs, and galaxies with only one clearly visible emission line in the optical band, instead of ELGs with several stronger emission lines. Obviously, for a galaxy, the FRA must first distinguish its type, and then determine the next handling step. The steps of galaxy category judgment are as follows:

1. Input an observational galaxy spectrum and filter it using a median filter with a narrow window (5–10 Å), so as to filter out the impulse noise, such as sky lines, or weaker or narrow emission lines. We then process the spectrum using a discrete cosine transformation (DCT), and the DCT coefficient array, M, is acquired. The first coefficient in M is the mean of all spectrum data, and the others are low-frequency, medium-frequency and high-frequency coefficients, in that order.

2. Set the first m DCT coefficients (the low-frequency and medium-frequency coefficients) to zero, since the emission lines are high-frequency data. We then process the modified coefficient array using an inverse DCT so as to obtain the reconstructed spectrum, mainly composed of higher frequency data. For example, in the middle panel of figure 2, the first 40 DCT coefficients are set to zero. The amplitudes of the emission lines in the reconstructed spectrum obviously

exceed those of the neighborhoods.

3. Remove the first p maxima and the first q minima of the reconstructed spectrum so as to avoid the influence of too-high or too-low data. Then, compute the mean of the residual data, labeled by e. Finally, we use the “*” symbols to mark the points higher than ke, namely to mark the positions of stronger emission lines. Another example is shown in figure 3, where no “*” mark means no stronger emission line in the spectrum.

4. Count the number of the local peaks among all “*” symbols; namely, calculate how many stronger emission lines exist in the spectrum. If the number is less than two, the galaxy belongs to non-ELGs or weaker-ELGs. We then replace every “*” symbol with the local mean of its neighborhood data, and send the spectrum to the following processing steps of the FRA. Otherwise, the galaxy belongs to stronger-ELGs, and is processed by the spectral-lines matching methods.

To evaluate the performance of the above distinguishing method, we select 356 test spectra with SNg > 3 (SNg: median signal-to-noise ratio in g′) from random SDSS sky area 0360. The number of emission lines in every spectrum detected by the FRA is written into array N; then, the elements of N are sorted in ascending order. According to the array N, the corresponding spectra are plotted one by one in figure 4. In other words, the horizontal coordinate of every spectrum is determined by array N. From left to right, the corresponding spectra have more and more stronger emission lines. The spectra between A and B have no stronger emission line, namely are non-ELGs. The spectra between B and C have only one stronger emission line, the spectra between C and D have two stronger emission lines, and the spectra between D and E denote ELGs with three or more emission lines. To verify the above results given by the FRA, we check the eW(4000 Å) values of the spectra, and the results are the


Fig. 2. FRA method distinguishes and removes stronger-ELGs based on the DCT transformation. Upper panel: an observational galaxy spectrum. Middle panel: reconstructed spectrum mainly composed of high-frequency data. The “*” symbols mark the points being higher than ke, where k = 7. Lower panel: local details of the middle panel.

Fig. 3. FRA method distinguishes non-ELGs or weaker-ELGs based on DCT transformation. Upper panel: an observational galaxy spectrum. Lower panel: the reconstructed spectrum without the “*” symbols.

vertical coordinates of the corresponding spectra in figure 4. In general, the values of eW(4000 Å) increase from left to right. Almost all spectra between A and C, which are classified as non-ELGs or weaker-ELGs by the FRA, have eW(4000 Å) ≤ 0.8, and only two spectra with high eW(4000 Å) are misclassified. Hence, the global success rate of the FRA is higher than 99%. For spectra from random SDSS sky areas 0509, 0668, 0750, and 0877, similar test results are acquired.
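
The four category-judgment steps of subsection 2.2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the synthetic test spectrum, and the default values of m, p, and q are hypothetical (only k = 7 comes from figure 2). One further assumption matters: zeroing the first DCT coefficient removes the mean, so the sketch takes e as the mean absolute value of the trimmed data, whereas the text only says "the mean of the residual data".

```python
import math
import random
from statistics import median


def median_filter(flux, window=5):
    """Running median; the end values are repeated to pad the edges."""
    half = window // 2
    padded = [flux[0]] * half + list(flux) + [flux[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(flux))]


def high_pass_dct(flux, m):
    """Zero the first m DCT-II coefficients and transform back.

    By linearity this equals the signal minus the part rebuilt from those
    first m coefficients, so only m coefficients are ever computed.
    """
    n = len(flux)
    coeff = [sum(f * math.cos(math.pi * (i + 0.5) * k / n) for i, f in enumerate(flux))
             for k in range(m)]
    out = []
    for i, f in enumerate(flux):
        low = coeff[0] / n + (2.0 / n) * sum(
            coeff[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, m))
        out.append(f - low)
    return out


def count_strong_lines(flux, m=20, p=10, q=10, k=7.0, window=5):
    """Count runs of points above k*e in the high-frequency reconstruction."""
    recon = high_pass_dct(median_filter(flux, window), m)
    trimmed = sorted(recon)[q:len(recon) - p]  # drop q minima and p maxima
    # Assumption: e is the mean ABSOLUTE value of the trimmed data.
    e = sum(abs(v) for v in trimmed) / len(trimmed)
    flagged = [v > k * e for v in recon]
    # Each run of consecutive flagged points counts as one emission line.
    return sum(1 for i, hit in enumerate(flagged) if hit and (i == 0 or not flagged[i - 1]))


def synthetic_spectrum(line_centres, n=400, seed=1):
    """Hypothetical test data: smooth continuum, small noise, Gaussian emission lines."""
    rng = random.Random(seed)
    flux = []
    for i in range(n):
        value = 5.0 + i / n + rng.uniform(-0.05, 0.05)
        value += sum(10.0 * math.exp(-0.5 * ((i - c) / 2.5) ** 2) for c in line_centres)
        flux.append(value)
    return flux


for centres in ([130, 270], []):
    n_lines = count_strong_lines(synthetic_spectrum(centres))
    label = "stronger-ELG" if n_lines >= 2 else "non-ELG or weaker-ELG"
    print(centres, n_lines, label)
```

In this toy setting a spectrum with two broad emission lines is labeled a stronger-ELG and would be passed to the spectral-lines matching methods, while a line-free spectrum would go on to the FRA.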

3. Basic Principle of FRA Algorithm

For an observational galaxy spectrum, z (redshift) is an unknown constant,

$$z = \frac{\lambda_1 - \lambda_{10}}{\lambda_{10}} = \frac{\lambda_2 - \lambda_{20}}{\lambda_{20}}. \qquad (1)$$

Here, $\lambda_{10}$ and $\lambda_{20}$ are rest wavelengths, and $\lambda_1$ and $\lambda_2$ are the wavelengths with a certain redshift from the observational spectrum. If $\lambda_1$ and $\lambda_2$ can be accurately detected and recognized, we can determine z by lines identification. Therefore, the kernel of the FRA algorithm lies in the choice of $\lambda_{10}$ and $\lambda_{20}$, and the detection of $\lambda_1$ and $\lambda_2$.

3.1. Features Selection

Since non-ELGs and weaker-ELGs have no, or only one, clearly visible emission line in the optical band, the FRA algorithm utilizes a few prominent absorption lines instead of the


Fig. 4. eW(4000 Å) values of the galaxy spectra in SDSS sky area 0360. Every “*” symbol marks a spectrum.

Fig. 5. Galaxy template spectrum spDR2-023, which is used to determine redshift in SDSS. Features A, B, C, and D are prominent.

common emission lines adopted by the traditional spectral-lines matching methods. Figure 5 presents the features in a spectrum, where line A is Hδ 4103, B is G-band 4306, the neighbor of B is Hγ 4342, line C is Hβ 4863 and line D is Mg 5177. The fluxes in the features are usually significantly distinguished from those of their neighborhood wavelengths, and there are few other lines nearby; hence, it is easy to detect and recognize the features automatically. In addition, the features still stay in the optical band even in the case of a higher redshift; for example, the Mg 5177 line moves to 8542 Å if the redshift is 0.65. The redshift measurement range of the FRA algorithm, [0, 0.65], is suitable for almost all requirements of redshift measurements for non-ELGs and weaker-ELGs, because clear spectra of too faint objects in the far distance could not be easily collected by the current 2- or 4-meter telescopes. Actually, in the whole SDSS galaxy spectra database, there are only 274 galaxies with eClass < 2.5, SNg > 3.0, and z > 0.6. In other words, presently, only a few available non-ELG or weaker-ELG spectra with z > 0.6 have been collected in SDSS.

Other important features are Balmer breaks at 2420, 2640, 2900, and 4000 Å wavelength (Bruzual 1983), where the 4000 Å break locates in the optical band. No matter how its flux varies with the stellar metallicity, temperature and ages, a clear 4000 Å break often exists in non-ELG and weaker-ELG spectra (Tresse et al. 1999). Zaritsky regards the 4000 Å break as the criterion of galaxy classification (Zaritsky et al. 1995). Luo makes use of the 4000 Å break and lines identification to determine the redshift (Luo et al. 2001). Three disadvantages exist in Luo’s algorithm.

Firstly, Luo’s algorithm is unidirectional. The algorithm only estimates redshifts, but is unable to give the confidence levels of the measurements. Therefore, once an incorrectly estimated redshift appears, the algorithm has to output it rather than refuse it or measure the redshift again, which will further affect the subsequent tasks that need an accurate redshift. To solve this problem, the judgment-feedback module is added in the FRA algorithm.

Secondly, Luo’s algorithm never considers mis-recognition of the 4000 Å break caused by greater flux fluctuations at other wavelengths of the spectra, or by other breaks shifting into the optical window, for example, the 2900 Å break in the case of z > 0.24. Hence, Luo’s algorithm is only suitable for galaxies with smaller redshifts. Instead, the FRA algorithm can automatically process the mis-recognition problem, and the redshift range is as large as [0, 0.65].

Thirdly, Luo makes use of dozens of common lines, including the Ca II H&K 3934, 3968, Ca I 4227, Balmer lines, Mg I 5167, 5173, 5184, and Na D 5890, 5896, and so on. So many lines have various flux intensities and are not ranked, which makes lines extraction and identification complex, especially in the case of several weaker lines being contaminated by noise. Instead, the FRA algorithm picks two prominent absorption lines from


Fig. 6. Flowchart of the fast automatic redshift determination using absorption lines recognition (FRA algorithm).

the four features A, B, C, and D in figure 5, and then the redshift is determined through lines identification. In practice, we process the observational spectra from the trailing edge of the 4000 Å break, since the front edge of the 4000 Å break locates at the start point of an optical spectrum where the local signal-to-noise ratio is low and the fluxes fluctuate frequently. Figure 6 presents the flowchart of the FRA algorithm. Compared with Luo’s algorithm, the FRA method greatly improves the whole flowchart and the details in every processing step. According to the flowchart, more details will be described in the following sections.
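
Equation (1) can be illustrated with a short sketch. The rest wavelengths are those quoted in the text; everything else is an assumption made for illustration. In particular, the pairing rule (try the four combinations of one line from each feature group and keep the one whose two redshifts agree best) is not the recognition step of the FRA, and the function and dictionary names are hypothetical.

```python
# Rest wavelengths (in angstroms) quoted in the text for the two feature groups.
FIRST_FEATURE = {"G-band": 4306.0, "Hgamma": 4342.0}
SECOND_FEATURE = {"Hbeta": 4863.0, "Mg": 5177.0}


def redshift(observed, rest):
    """Equation (1) for a single line: z = (lambda - lambda_0) / lambda_0."""
    return (observed - rest) / rest


def identify_pair(lambda1, lambda2):
    """Hypothetical rule: keep the line pair whose two redshifts agree best.

    Returns (mismatch, first_name, second_name, mean_redshift).
    """
    best = None
    for name1, rest1 in FIRST_FEATURE.items():
        for name2, rest2 in SECOND_FEATURE.items():
            z1 = redshift(lambda1, rest1)
            z2 = redshift(lambda2, rest2)
            mismatch = abs(z1 - z2)
            if best is None or mismatch < best[0]:
                best = (mismatch, name1, name2, 0.5 * (z1 + z2))
    return best


# Two absorption minima as they would be observed at z = 0.1.
mismatch, name1, name2, z = identify_pair(4306.0 * 1.1, 5177.0 * 1.1)
print(name1, name2, round(z, 4))
```

Both equalities in equation (1) must hold for the same z, which is why a wrong pairing shows up as a mismatch between the two single-line redshifts.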

3.2. Candidates of 4000 Å Balmer Break

In a spectrum, the 4000 Å break commonly jumps from blue to red, but it is approximately described by a ramp function instead of an ideal step function. Not only does the flux of the jump vary greatly, but its width in wavelength also varies between 10 Å and 50 Å. Let f be the data sequence of the spectral flux, and the differences labeled by Δf are computed,

$$\Delta f_i = f_{i+1} - f_i. \qquad (2)$$

Here, i = 1, 2, 3, ..., and $\Delta\lambda_i = \lambda_{i+1} - \lambda_i = 10$ Å. Typically, at the 4000 Å break, Δf_i > 0 and |Δf_i| is the maximum.
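
Equation (2) and the candidate search of this subsection can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function names, the toy spectrum, and the rule of seeding only from positive differences are hypothetical, and the input is assumed to be already median filtered and sampled every 10 Å.

```python
def flux_differences(flux):
    """Equation (2): delta_f[i] = f[i + 1] - f[i]."""
    return [flux[i + 1] - flux[i] for i in range(len(flux) - 1)]


def break_candidates(wavelength, flux, n=3):
    """Up to n distinct trailing-edge candidates, largest jumps first."""
    diff = flux_differences(flux)
    # Assumption: only positive differences can seed a blue-to-red jump.
    seeds = sorted((i for i, d in enumerate(diff) if d > 0),
                   key=lambda i: diff[i], reverse=True)
    candidates = []
    for seed in seeds:
        i = seed
        # Walk redward until the first negative difference is met.
        while i < len(diff) and diff[i] >= 0:
            i += 1
        if i == len(diff):
            continue  # ran off the red end without finding a trailing edge
        edge = wavelength[i]
        if edge not in candidates:  # keep the candidates unique
            candidates.append(edge)
        if len(candidates) == n:
            break
    return candidates


def toy_flux(w):
    """Hypothetical spectrum: a 40 A wide ramp near 4000 A and a smaller jump at 4300 A."""
    if w <= 4000.0:
        return 1.0
    if w <= 4040.0:
        return 1.0 + 0.03 * (w - 4000.0)
    if w <= 4300.0:
        return 2.2 - 0.001 * (w - 4040.0)
    return 2.14 - 0.001 * (w - 4310.0)


wavelength = [3800.0 + 10.0 * i for i in range(71)]
flux = [toy_flux(w) for w in wavelength]
print(break_candidates(wavelength, flux))
```

The four equal differences along the ramp all lead to the same trailing edge, so removing duplicates leaves room for the smaller jump as the second candidate.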


Fig. 7. Data-processing procedure of the FRA. Upper-left panel: Sp1 is an observational galaxy spectrum. Upper-right panel: Sp2 is the spectrum after median filtering Sp1. The curve of differences, Δf, is at the bottom. In the panel, the actual value of the curve is 4 * Δf, only in order to show the data more clearly. Lower-left panel: The top shows the local continuum of Sp2. The bottom is the continuum-subtracted spectrum where only the negative data are retained. The “*” symbols mark the local minimum points of the two parts, [BkPoint, BkPoint + 750 Å] and [BkPoint + 750 Å, BkPoint + 2000 Å], respectively. Lower-right panel: Find the accurate local minimum points, m1 and m2, in the observational spectrum. The “*” symbols mark m10 and m20 determined based on Sp2, and the black lozenge symbols mark m1 and m2. The automatic measurement results are consistent with the objective properties of the spectrum.

Spectral lines and noise are major factors affecting the correct recognition of the 4000 Å break. According to the full width at half maximum (FWHM) and the peak flux (F), there are four cases: small FWHM and F, small FWHM and large F, large FWHM and small F, large FWHM and F. The first two mean narrow lines or impulse noise with various flux intensities, which can be filtered by a median filter with a wider window. The third means a broad line that presents a slower flux change in a wider wavelength range; hence, the corresponding |Δf_i| is smaller. In most cases, the fourth means a strong and broad line, which mainly appears in ELGs. Since stronger-ELGs have been removed in the preceding module, this case will not be considered in the following sections.

Take figure 7 for example. The 4000 Å candidates are determined as follows. Here, the 4000 Å candidates mean possible wavelengths of the 4000 Å break trailing edge.

1. Use a median filter with a broader window to filter out the impulse noise, sharper lines or glitches; meanwhile, the spectral features discussed in subsection 3.1 are retained. In the upper-right panel of figure 7, Sp2 is the resulting spectrum of median filtering the observational spectrum Sp1. The window size of the filter is 90 Å, which is based on the median filter principle and the spectra properties. The median filter is a nonlinear filter, and does well in restraining impulse signals that are narrower than half of the filter window width. Figure 8 presents the processing results for different types of signals. Obviously, the median filter has no effect on the step and ramp signals; therefore, the 4000 Å break can not be filtered out, due to its approximately ramp-like characteristic. Moreover, figure 8 shows that the median filter can restrain the peaks of a triangular signal, but retain its major profile. For a spectral line that is commonly regarded as Gaussian, or an approximately triangular signal, the smaller its FWHM, the sharper the line, the more similar it is to an impulse signal, and the larger its restrained part. The limit state is when the line is fully filtered out, just as the third type in figure 8. However, such a case does not appear, because according to the Lick Indices (Trager et al. 1998) listed in table 1, the characteristic lines selected by the FRA are broader and clearly visible. [In table 1, Δλ1 = (Red end + Red begin)/2 − (Blue end + Blue begin)/2, and Δλ2 = Red end − Blue begin.] Therefore, it is reasonable to let the common width of the filter window be [∼70 Å, ∼90 Å], which can not only retain the shape of the spectrum’s peaks and the valleys where characteristic lines lie, but can also restrain any glitches caused by noise or weaker spectral lines.

2. Compute Δf_i of the median filtered spectrum.

3. Determine n candidates of the 4000 Å break according to Δf_i. In figure 7, Δf_max (the peak of Δf) is just near to the wavelength of the 4000 Å break. Therefore, the FRA sorts the data sequence Δf, and determines the wavelength of Δf_max. Then, from Δf_max to the red side of the spectra, the FRA checks every Δf_i until Δf_i < 0 is found, and the corresponding wavelength is a candidate for the trailing edge of the 4000 Å break. Note that an error may exist between the true 4000 Å break and the wavelength estimated by the FRA, but it is still well acceptable as long as they are close to each other, because in this paper the trailing edge of the 4000 Å break is only the start point of measuring the redshift.

Two common problems must be considered. Firstly, in about 10–20 percent of the observational spectra, Δf_max does not necessarily correspond to the true 4000 Å break, because of


Fig. 8. Median filter processes different types of signals, where the window width is 5 pixels. Left panel: original signals. The signals are the step, ramp, impulse and triangular signals from top to bottom. Right panel: signals after median filtering.

Table 1. Index of characteristic lines used by the FRA (wavelengths in Å).

Name            Blue begin   Blue end    Red begin   Red end     Δλ1       Δλ2
Lick G4300      4267.625     4283.875    4320.125    4333.375    51.000    65.750
Lick Hb         4828.875     4848.875    4877.625    4892.625    46.250    63.750
Lick Mg2        4896.375     4958.875    5302.375    5367.375    407.250   471.000
4000 Å break    3750.000     3950.000    4050.000    4250.000    300.000   500.000
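
The behaviour described in the caption of figure 8 can be reproduced with a short running-median sketch. The function name, the edge handling (repeating the end values), and the four toy signals are assumptions made for illustration; only the 5-pixel window comes from the caption.

```python
from statistics import median


def median_filter(signal, window=5):
    """Running median with an odd window; end values are repeated as padding."""
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]


# Toy versions of the four signal types of figure 8.
step = [0] * 10 + [1] * 10
ramp = list(range(20))
impulse = [0] * 8 + [5, 5] + [0] * 8          # 2 pixels wide, under half the window
triangle = [0, 0, 0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0]

for name, signal in [("step", step), ("ramp", ramp),
                     ("impulse", impulse), ("triangle", triangle)]:
    print(name, median_filter(signal))
```

The step and the ramp pass through unchanged, the 2-pixel impulse disappears, and only the tip of the triangle is clipped, which is the reason a ramp-like 4000 Å break survives the filtering while narrow lines and glitches do not.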

noise, spectral lines, spectral flux fluctuations, the 2900 Å break shifting into the observational window, and so on. However, |Δf_i| of the 4000 Å break is still among the largest in the whole spectrum, because of the jump characteristic of the 4000 Å break. To automatically determine the true 4000 Å break, the FRA algorithm adopts n candidates for the 4000 Å break trailing edge, and their default priorities are from bigger Δf_i values to smaller ones.

Secondly, there commonly exists a cluster of larger differences in the neighborhood of Δf_max, because the true 4000 Å break, the 2900 Å break, or other interferences commonly present a certain jump width instead of an ideal jump. If the first


Fig. 9. Example of mis-recognition for the 4000 Å break. The meanings of all curves and marks are similar to those of figure 7.

n largest values of Δf are naively selected and the corresponding trailing edges are computed, several identical candidates with higher priority will be derived. Such cases result in redundant computation, wasting time and memory, and can even prevent the true wavelength of the 4000 Å break from being selected as a candidate. Hence, in order to keep the uniqueness of candidates, the FRA algorithm removes repeated ones by comparing candidates. Namely, the final n (n is from 3 to 5) candidates for the 4000 Å break are different from each other. Then, according to the redshift confidence level, the judgment-feedback module described in subsection 3.4 determines whether the measured redshift is output, or another candidate is used to measure the redshift again.

Take figure 9 as an example. Δf_max appears at 4434 Å and the corresponding jump trailing edge is at 4444 Å; hence, the estimated redshift ẑ is 0.1305 if we continue processing the spectra based on the result. However, the released redshift of SDSS, z, is 0.0259 and the error is Δz = |z − ẑ| = 0.1046. Obviously it is too large to meet the demand of the measurement precision. Actually, the true 4000 Å break is near the second maximum of Δf_i, namely 4074 Å shown in the figure; the corresponding jump trailing edge is at 4114 Å. The lower-right panel shows that the FRA can find the true 4000 Å break trailing edge, and the measured redshift ẑ is 0.0265; therefore, Δz = |ẑ − z| = 0.0006, which is less than the allowable error, ±0.001.

3.3. Automatic Redshift Determination

The FRA begins to process spectral data from the trailing edge of the 4000 Å break, labeled by BkPoint. The whole processing procedure is divided into two parts: the detection and the recognition of the absorption lines. The traditional method of detection is a single or multiple Gaussian fitting. Although it is worth discussing whether Gaussian curves fit the observational spectral lines well, this question will not be discussed in this paper. To find the central wavelengths of the characteristic lines that are used to determine the redshift automatically, a simple, but efficient, method is adopted in this paper. Take figure 7 as an example; the steps of automatic redshift determination are performed as follows, where steps 1 to 4 are the detection of the absorption lines and the 5th step is the recognition of the absorption lines.

1. Normalize the spectrum Sp2, and then fit the local continuum using a second-order polynomial. The fitting range is [BkPoint, BkPoint + 2000 Å], which can cover all characteristic lines used by the FRA. In detail, if z = 0.65, the 4000 Å break moves to 6600 Å and Mg 5177 to 8542 Å. The difference between 6600 Å and 8542 Å is 1942 Å. Considering the allowable error margin, it is reasonable to shift 2000 Å from BkPoint.

2. Subtract the local continuum from the spectrum, and then set the data greater than zero to zero, so as to extract the feature absorption lines. The resulting spectrum is labeled by Sp3 in the lower-left panel.

3. Divide Sp3 into two parts, [BkPoint, BkPoint + 750 Å] and [BkPoint + 750 Å, BkPoint + 2000 Å]. Then, the local minimum points of the two parts are searched, respectively, since the central wavelengths of the characteristic lines are usually the local extreme points of the spectrum. The two local minimum points are labeled by m10 and m20 and marked by the “*” symbols in the lower-left panel. In the first part, there exist Hδ 4103, the G-band 4306 and Hγ 4342. In non-ELG spectra, the absorption peak of Hδ 4103 is usually less than that of the G-band 4306. However, in the ELG spectra, with other emission lines becoming stronger, the absorption peak of Hδ 4103 increases, and even becomes greater than that of the G-band 4306. Hence, the local minimum point may be Hδ 4103 instead of the expected G-band 4306. More details are shown in


Fig. 10. Five galaxy template spectra from SDSS. The absorption peaks of Hδ 4103 and G-band 4306 are compared.

figure 10. In order to remove the interference caused by Hδ 4103, the FRA checks whether the wavelength of the local minimum point is less than (BkPoint + 150 Å). If the answer is no, the FRA determines that this extreme point is the G-band 4306, and begins the next processing step. Otherwise, the point is Hδ 4103; starting from the local minimum point and moving to its right and left, respectively, the FRA sets the nonzero neighboring data to zero until the first zero-valued point is met. Then, the FRA searches for the new local minimum point in this new spectrum. This step is repeated until the correct minimum point is found. In the second part, there exist Hβ 4863 and Mg 5177. Figure 10 shows that the prominent absorption line, Mg 5177, exists in both non-ELGs and ELGs, while Hβ 4863 appears as an absorption line in non-ELGs and some ELGs. This step only detects the local minimum point; step 5 will recognize whether it is Hβ 4863 or Mg 5177.

4. According to m1′ and m2′, determine the accurate local minimum points in the observational spectrum Sp1. Since the filtered spectrum Sp2 loses some detailed information of Sp1, m1′ and m2′ found in Sp2 are close to, but not exactly at, the accurate local minimum points in Sp1. In response to this problem, the FRA searches the neighborhood of m1′ and m2′ in Sp1. Firstly, the FRA filters Sp1 using a median filter with a narrow window to remove the impulse noise. Secondly, the FRA takes m1′ as the central wavelength and fits the local continuum in a window with a size of 300 Å. Thirdly, the FRA subtracts the local continuum, and the new local spectrum, labeled Ls1, is derived (lower-right panel). A similar operation is applied to m2′, and Ls2 is derived. Finally, the local minimum points labeled m1 and m2 are found in Ls1 and Ls2, and the corresponding wavelengths are stored in m1 and m2.

5. Determine the redshift by absorption line identification. m1 is probably the wavelength of G-band 4306 or Hγ 4342, and m2 is probably the wavelength of Hβ 4863 or Mg 5177. Following the standard line matching procedure, the FRA recognizes m1 and m2, and then determines the corresponding redshifts z1 and z2. The final estimated redshift is ẑ = (z1 + z2)/2.

3.4. Confidence Level of Redshift

In the above sections, we have described how the FRA automatically detects and recognizes the 4000 Å break and characteristic lines. If the results given by the FRA are consistent with the true properties of the observational spectrum, the values of z1 and z2 are close to each other. Therefore, the confidence level of the redshift, labeled RedshiftCof, should satisfy RedshiftCof ∝ 1/|z1 − z2|. Considering that the allowable error of the galaxy redshift measurement should be less than ±0.001, we define RedshiftCof as

$$\mathrm{RedshiftCof} = 1 - 0.1 \times \frac{|z_1 - z_2|}{0.003}, \qquad |z_1 - z_2| \le 0.027. \tag{3}$$

In addition, we define RedshiftCof = 0 when |z1 − z2| > 0.027. When RedshiftCof ≥ 0.9, the measured redshift is regarded as credible. When RedshiftCof < 0.9, the smaller RedshiftCof is, the less credible the measured redshift is. Therefore, in such a case, the FRA selects a new 4000 Å break candidate, and measures the redshift again. The above processing steps are repeated until RedshiftCof ≥ 0.9. If all 4000 Å break candidates have been tested and no RedshiftCof reaches 0.9, the FRA will select the redshift with the highest RedshiftCof as the reference redshift, and set the rejection flag, so as to inform the spectra-processing system of


Fig. 11. Example of galaxy spectra with higher redshifts and low signal-to-noise ratios, where SNg = 0.361 and z = 0.5307. The meanings of all curves and marks are similar to those of figure 7.

measuring this spectrum using other methods.

Figure 7 shows the whole procedure of the FRA processing of an observational spectrum. For this spectrum, the measured redshift is 0.127106, and the redshift released by SDSS is 0.127015; the difference between the two redshifts is 0.00009, which is within the allowable error range, ±0.001. The measurement time is 0.1250 s. In the same environment [Intel(R) Core(TM)2 CPU E5300 2.6 GHz, 2 GB memory, MATLAB R2007a], the redshift of the same spectrum is measured by the full-spectrum matching method (cross-correlation method), with the galaxy templates E, S0, Sa, and Sb (Kinney et al. 1996) adopted. The measurement error is also within the allowable range; however, the measurement time is 1.18 s, which is about 10 times longer than that of the FRA method. If more templates are used by the full-spectrum matching method, it becomes slower. In contrast, the speed of the FRA is not affected by the number of templates, since the FRA does not adopt a standard galaxy template spectra database or an exhaustive searching strategy. Moreover, the measurement time of the full-spectrum matching method depends on the redshift value, whereas the measurement time of the FRA algorithm does not. Similarly, experiments on more spectra with different redshifts show that the FRA method is faster than the full-spectrum matching method. All experiments in this paper are run in the MATLAB environment. If the FRA algorithm is adopted in a spectra pipeline, it will be written in C/C++ or Python, and the measurement speed will be much higher.
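Since a pipeline version of the FRA would be written in C/C++ or Python, the following is a minimal Python sketch of steps 1 to 3 of subsection 3.3 and of the confidence test of subsection 3.4. It is an illustration under stated assumptions, not the authors' MATLAB code: all function names are hypothetical, the quadratic fit is a plain least-squares solve chosen to keep the example dependency-free, and the `measure` callback stands in for the full per-candidate measurement (steps 1 to 5). Normalization, the Hδ 4103 check, and the refinement of step 4 are omitted.

```python
def fit_quadratic(x, y):
    """Least-squares coefficients (a, b, c) of y ~ a + b*x + c*x**2."""
    # Normal equations: sum_j S[i+j] * coef[j] = T[i], with S[k] = sum(x**k).
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    rows = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for i in range(3):  # Gauss-Jordan elimination on the 3x4 augmented matrix
        pivot = rows[i][i]
        rows[i] = [v / pivot for v in rows[i]]
        for r in range(3):
            if r != i:
                factor = rows[r][i]
                rows[r] = [vr - factor * vi for vr, vi in zip(rows[r], rows[i])]
    return rows[0][3], rows[1][3], rows[2][3]


def extract_absorption(wave, flux, bk_point, span=2000.0):
    """Steps 1-2: subtract a 2nd-order local continuum, keep only absorption."""
    pairs = [(w, f) for w, f in zip(wave, flux) if bk_point <= w <= bk_point + span]
    w = [p[0] for p in pairs]
    f = [p[1] for p in pairs]
    x = [2.0 * (wi - bk_point) / span - 1.0 for wi in w]  # rescale to [-1, 1]
    a, b, c = fit_quadratic(x, f)
    # Data above the fitted continuum are set to zero (the paper's Sp3).
    sp3 = [min(fi - (a + b * xi + c * xi * xi), 0.0) for xi, fi in zip(x, f)]
    return w, sp3


def two_minima(w, sp3, bk_point, split=750.0):
    """Step 3: wavelength of the deepest point in each of the two sub-ranges."""
    blue = [(v, wi) for wi, v in zip(w, sp3) if wi < bk_point + split]
    red = [(v, wi) for wi, v in zip(w, sp3) if wi >= bk_point + split]
    return min(blue)[1], min(red)[1]


def redshift_confidence(z1, z2):
    """Equation (3): 1 at perfect agreement, 0 beyond |z1 - z2| = 0.027."""
    dz = abs(z1 - z2)
    return 0.0 if dz > 0.027 else 1.0 - 0.1 * dz / 0.003


def measure_with_feedback(candidates, measure, threshold=0.9):
    """Try break candidates in order; return (z, confidence, rejected)."""
    best = None
    for bk_point in candidates:
        z1, z2 = measure(bk_point)  # hypothetical callback: BkPoint -> (z1, z2)
        conf = redshift_confidence(z1, z2)
        if conf >= threshold:
            return (z1 + z2) / 2.0, conf, False
        if best is None or conf > best[1]:
            best = ((z1 + z2) / 2.0, conf)
    if best is None:
        raise ValueError("no 4000 A break candidates supplied")
    # No candidate reached the threshold: report the best one and flag it.
    return best[0], best[1], True
```

In this sketch the third return value plays the role of the rejection flag: it is set only when every candidate has been tried and none reaches RedshiftCof ≥ 0.9, in which case the redshift with the highest confidence is returned as the reference value.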

4. Processing Several Typical Types of Spectra

In this section, we illustrate that the FRA method performs well in processing several typical types of galaxy spectra.

4.1. Non-ELGs with Higher Redshifts

A higher redshift is usually accompanied by a lower signal-to-noise ratio, which makes it more difficult to detect and recognize the characteristic lines, and further affects the redshift measurement. The FRA algorithm has a certain robustness, because it selects the 4000 Å break and several prominent absorption lines as the features. In addition, although the median filter method is simple, it is effective for filtering the impulse noise. Finally, the features in the blue band of a spectrum make it convenient to measure higher redshifts. The upper-left panel of figure 11 shows a spectrum with a low signal-to-noise ratio, SNg = 0.361. The redshifts measured by SDSS and our algorithm are 0.5307 and 0.5316, respectively, and the difference between the two estimated redshifts is 0.0009. The measurement time of our algorithm is 0.1406 s. The measurement speed is fast and independent of the redshift value.

4.2. Data Missing Spectra

Data missing spectra are recovered by linear interpolation in SDSS. When more data are missing, the measurement accuracy of the full-spectrum matching methods is affected. In contrast, the FRA is not affected as long as the missing data do not fall in the feature wavelengths utilized by the FRA. Figure 12 shows an observational spectrum in which many data in the red band are missing. The redshifts are 0.2248 and 0.2247, measured by SDSS and the FRA, respectively, and the


Fig. 12. An example of data missing spectra, where SNg = 10.8618 and z = 0.2248. The meanings of all curves and marks are similar to those of figure 7.

Fig. 13. Example of weaker-ELGs, where SNg = 3.107 and z = 0.2960. In the upper-left panel, in order to show the data more clearly, we move up the observational spectrum by 10 flux units. In the bottom of the panel, there is the new spectrum where the emission-line data in the original spectrum have been replaced by the mean of the neighborhood data. The meanings of other curves and marks are similar to those of figure 7.

difference between the two estimated redshifts is 0.0001.

4.3. Weaker-ELGs and Galaxies with Only One Visible Emission Line

The upper-left panel of figure 13 shows a galaxy spectrum with a higher redshift (0.2960) and a lower signal-to-noise ratio (SNg = 3.107). There is only one clearly visible emission line in the optical band, and the weaker emission lines mix with

noise. Directly extracting emission lines would produce more spurious lines and miss true lines, which would affect the accuracy of the redshift measurement. For this kind of spectra, the FRA first determines the position of the stronger emission line based on the DCT and the inverse DCT, then replaces the emission line with the mean of its neighborhood data. The following steps are similar to those for processing non-ELGs. For the spectrum in figure 13, the redshift measured


Fig. 14. Example of LAMOST spectra, where SNg = 3.246 and z = 0.0448. The meanings of all curves and marks are similar to those of figure 7.

by the FRA is 0.2966 and the measurement error is 0.0006.

4.4. Spectra of LAMOST

Presently, in the commissioning observations of LAMOST, both the instruments and the two-dimensional pipeline are adjusted frequently, which causes the poor quality of the one-dimensional spectra. Figure 14 shows a typical spectrum of LAMOST. The spectrum is a little odd, because the continuum and spectral lines are distorted by sky lines, instrument noise, and other noise. In particular, from about 5700 Å to 5900 Å, the connection between the blue and red bands is poor, and the flux calibration is not accurate. It is impossible to measure the redshift of such a spectrum accurately using the full-spectrum matching methods. In contrast, the FRA can measure the redshift well, as shown in figure 14, and the measured redshift is 0.0450. The galaxy has been observed by SDSS (objectID = 587741533846110345), and the released redshift is 0.0448. The difference between the two redshifts is 0.0002. Evidently, the prominent features used by the FRA are not easily affected by noise and the connection gap, and the FRA is insensitive to the uncertainties in the flux calibration. To be on the safe side, we mask [~5690 Å, ~5950 Å], in order to prevent the connection gap from affecting the detection of the 4000 Å break. The FRA performs well as long as the connection gap does not fall in the feature wavelengths utilized by the FRA, which is similar to the case in subsection 4.2 (data missing spectra).

5. Experiments and Discussions

5.1. Test Data

In this paper, the galaxy spectra from the seventh data release of SDSS (DR7) are used to evaluate the performance of the FRA algorithm. The DR7 spectroscopic data include data from 1802 main survey plates of 640 spectra each, and

cover 8200 square degrees. The SDSS spectra are taken with two fiber spectrographs, covering the wavelength range from ~3800 to ~9200 Å with a resolution of R = λ/Δλ varying from ~1850 to ~2200 (Adelman-McCarthy et al. 2008). In this paper, we adopt Δλ = 2.5 Å between two pixels. Note that SDSS spectra data are stored using vacuum wavelengths. Table 2 lists all test data sets from SDSS used in this paper, which cover galaxies with different redshifts and signal-to-noise ratios.

Data set 1 is composed of non-ELGs, or E/S0 galaxies. The selection criteria are as follows: zConf (redshift confidence) ≥ 0.95, EW(Hα) (equivalent width of the Hα line) ≤ 0, Hgh(Hα) (height of the Hα line) ≤ 0, EW([O II]3727) ≤ 4, and Hgh([O II]3727) ≤ 3. According to SNg (median signal-to-noise ratio in the g′ band), data set 1 is divided into three subsets: 1-1, 1-2, and 1-3. We select spectra from the subsets randomly, and the data missing spectra are excluded.
Subset 1-1: 9078 spectra with SNg ∈ [20.0, ∞) and z (redshift) ∈ (0, 0.245].
Subset 1-2: 63638 spectra with SNg ∈ [10.0, 20.0) and z ∈ (0, 0.300], from which 6474 spectra are selected randomly.
Subset 1-3: 97291 spectra with SNg ∈ [3.0, 10.0) and z ∈ (0, 0.600], from which 8399 spectra are selected.

Data set 2 is composed of weaker-ELGs, mainly Sa, Sab, and Sb galaxies. The selection criterion is EW(Hα) ∈ (0, 3 Å]. Data set 2 is also divided into three subsets: 2-1, 2-2, and 2-3. A few of the spectra show stronger emission at other wavelengths, such as [O II]3727; the FRA algorithm can detect these lines and determine the next processing step according to section 2.
Subset 2-1: 1416 spectra with SNg ∈ [20.0, ∞), from which 63 data missing spectra are excluded.
Subset 2-2: 6548 spectra with SNg ∈ [10.0, 20.0), from which 1681 spectra are selected randomly.
Subset 2-3: 11206 spectra with SNg ∈ [3.0, 10.0), from which 2267 spectra are selected randomly.


Table 2. Test data sets from SDSS of this paper.

Test data sets   Total amount in DR7             Amount of data randomly selected to evaluate the algorithm
Data set 1       9078+63638+97291 = 170007       9078+6474+8399 = 23951
Data set 2       1416+6548+11206 = 19170         1353+1681+2267 = 5301
Data set 3       No statistical results          From data set 1: 485+146+360 = 991
                                                 From data set 2: 63+87+96 = 246
Data set 4       45881                           45881

Table 3. Several subsets of the DR7 galaxy spectra with nonzero zWarning.

Subset   Selection criteria of the subset                                    Total amount
M        zWarning ≠ 0 and SNg > 3.0                                          84717
A        A ⊂ M and zWarning = 0x00000100, 0x00000200, or 0x00000210          54441
B        B ⊂ M and eClass < 0.25                                             63681
C        C = A ∩ B                                                           52464
D        D ⊂ C and eW(4000 Å) < 0.8                                          45881
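The criteria of table 3 can be read as a chain of filters on four per-spectrum quantities. The text describes zWarning as a set of bit flags (0x00000100 for inconsistent emission and absorption redshifts, 0x00000200 for inconsistent absorption redshifts, and 0x00000210 as the combination of 0x00000010 and 0x00000200), so a single bit is tested with a bitwise AND, while membership in subset A is an exact match against the three listed values. The Python sketch below is only an illustration: the function and argument names are hypothetical and are not SDSS API names, and only the two bits whose meaning is stated in the text are given names.

```python
# Only the two bits whose meaning is stated in the text are named here.
EMISSION_ABSORPTION_INCONSISTENT = 0x00000100
ABSORPTION_INCONSISTENT = 0x00000200

# zWarning values accepted for subset A of table 3.
SUBSET_A_VALUES = (0x00000100, 0x00000200, 0x00000210)


def has_flag(z_warning, bit):
    """True if the given bit is set in a zWarning value."""
    return (z_warning & bit) != 0


def in_subset_d(z_warning, sng, e_class, ew_4000):
    """Apply the criteria of subsets M, A, B, C, and D from table 3 in turn."""
    in_m = z_warning != 0 and sng > 3.0
    in_a = in_m and z_warning in SUBSET_A_VALUES
    in_b = in_m and e_class < 0.25
    in_c = in_a and in_b          # C is the intersection of A and B
    return in_c and ew_4000 < 0.8
```

For example, `has_flag(0x00000210, ABSORPTION_INCONSISTENT)` is true, whereas a value such as 0x00001210 has the same bit set but is not one of the three values accepted for subset A.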

Table 4. Results of test 1 for data set 1 (non-ELGs).

SNg            Redshift     Total number of spectra   N1     N2     S1       S2
[20.0, ∞)      [0.0,0.1)    8511                      64     72     99.3%    99.2%
[20.0, ∞)      [0.1,0.2)    523                       6      7      98.9%    98.7%
[20.0, ∞)      [0.2,0.3)    44                        1      3      97.7%    93.1%
[10.0,20.0)    [0.0,0.1)    3605                      149    151    96.0%    95.8%
[10.0,20.0)    [0.1,0.2)    2725                      121    125    95.6%    95.4%
[10.0,20.0)    [0.2,0.3)    144                       6      10     95.8%    93.0%
[3.0,10.0)     [0.0,0.1)    896                       166    167    81.5%    81.4%
[3.0,10.0)     [0.1,0.2)    2768                      322    523    88.4%    81.1%
[3.0,10.0)     [0.2,0.3)    2607                      413    507    84.2%    80.6%
[3.0,10.0)     [0.3,0.4)    2117                      551    671    74.0%    69.1%
[3.0,10.0)     [0.4,0.6)    11                        4      11     63.6%    0
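The success rates in table 4 are defined as (total number of galaxies − N1)/(total number of galaxies) for the FRA (S1), and likewise with N2 for the full-spectrum matching method (S2). As a small illustration, the Python sketch below applies that definition to the three redshift bins with SNg ∈ [20.0, ∞) and pools them; the helper name `success_rate` is hypothetical, and the pooled rate is a quantity computed here for illustration rather than a figure quoted in the paper.

```python
def success_rate(total, failed):
    """Fraction of spectra measured successfully: (total - failed) / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - failed) / total


# (total, N1, N2) per redshift bin for SNg in [20.0, inf), copied from table 4.
rows = [(8511, 64, 72), (523, 6, 7), (44, 1, 3)]

total = sum(r[0] for r in rows)
s1 = success_rate(total, sum(r[1] for r in rows))  # FRA
s2 = success_rate(total, sum(r[2] for r in rows))  # full-spectrum matching
print(f"{total} spectra: S1 = {s1:.1%}, S2 = {s2:.1%}")
```

The three bin totals add up to the 9078 spectra of subset 1-1, which is a quick consistency check between tables 2 and 4.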

Data set 3 is composed of the data missing spectra that are excluded from data sets 1 and 2.
Subset 3-1: 485 spectra from data set 1 and 63 spectra from data set 2, with SNg ∈ [20.0, ∞).
Subset 3-2: 146 spectra from data set 1 and 87 spectra from data set 2, with SNg ∈ [10.0, 20.0).
Subset 3-3: 360 spectra from data set 1 and 96 spectra from data set 2, with SNg ∈ [3.0, 10.0).

Data set 4 is composed of non-ELGs and weaker-ELGs whose zWarning flags are set to 0x00000100, 0x00000200, or 0x00000210. In DR7, if everything is OK, the zWarning flag is 0x00000000. Otherwise, if an abnormal status appears, the zWarning flag is set (Stoughton et al. 2002; Adelman-McCarthy et al. 2008). There are mainly three kinds of zWarning values that are related to the redshift. zWarning = 0x00000100 flags that the emission and absorption redshifts are inconsistent. The corresponding spectra usually present weaker emission lines, which are difficult to detect and recognize. zWarning = 0x00000200 flags that the absorption redshifts are inconsistent. The third is a combination value. For example, 0x00000210 denotes that both the 0x00000010 and 0x00000200 problems exist in the galaxy spectrum. Also, 0x00001210 denotes that the spectra have low

confidence, absorption redshifts that are inconsistent, and target misclassification problems. The common zWarning values are 0x00000100, 0x00000200, and 0x00000210; more details are listed in table 3. There are 54441 such spectra in subset A, accounting for 64.3 percent of the total of 84717 spectra in subset M and 5.8 percent of the total of 929555 spectra in DR7. Also, there are 45881 spectra in subset D, accounting for 84.3 percent of subset A. Subset D is data set 4.

5.2. Experiment Results and Discussions

The requirement of the measurement accuracy is |ẑ − z| ≤ 0.001, where ẑ is the redshift measured by the FRA, and z is the redshift released by SDSS, which is assumed to be the optimal estimate of the true redshift.

Test 1: In table 4 and figures 15 and 17, we illustrate the results obtained using data set 1 (non-ELGs). In table 4 and the following tables, the success rate of the FRA method, labeled S1, is defined as (total number of galaxies − N1)/(total number of galaxies), where N1 is the number of spectra measured by the FRA unsuccessfully. Similarly, N2 is the number of spectra measured by the full-spectrum matching method


Table 5. Results of test 2 for data set 2 (weaker-ELGs).

SNg             Redshift     Total number of spectra   N1     N2     S1       S2
[20.0, ∞)       [0.0,0.1)    1296                      25     27     98.1%    97.9%
[20.0, ∞)       [0.1,0.2)    56                        2      2      96.4%    96.4%
[20.0, ∞)       [0.2,0.3)    1                         0      1      —        —
[10.0, 20.0)    [0.0,0.1)    1001                      46     46     95.4%    95.4%
[10.0, 20.0)    [0.1,0.2)    674                       30     32     95.5%    95.3%
[10.0, 20.0)    [0.2,0.3)    6                         1      2      —        —
[3.0, 10.0)     [0.0,0.1)    386                       100    103    74.1%    73.3%
[3.0, 10.0)     [0.1,0.2)    1538                      243    368    84.2%    76.1%
[3.0, 10.0)     [0.2,0.3)    310                       69     81     77.7%    73.8%
[3.0, 10.0)     [0.3,0.4)    33                        13     17     60.6%    48.5%
[3.0, 10.0)     [0.4,0.6)    none                      —      —      —        —

Table 6. Results of test 3 for data set 3 (data missing spectra).

SNg             Redshift     Total number of spectra   N1     N2     S1       S2
[20.0, ∞)       [0.0,0.1)    507                       42     106    91.7%    79.1%
[20.0, ∞)       [0.1,0.2)    37                        8      8      78.4%    78.4%
[20.0, ∞)       [0.2,0.3)    4                         1      3      —        —
[10.0, 20.0)    [0.0,0.1)    52                        1      12     98.1%    76.9%
[10.0, 20.0)    [0.1,0.2)    171                       6      45     96.5%    73.7%
[10.0, 20.0)    [0.2,0.3)    10                        3      4      70.0%    60.0%
[3.0, 10.0)     [0.0,0.1)    23                        4      6      82.6%    73.9%
[3.0, 10.0)     [0.1,0.2)    185                       35     56     81.1%    69.7%
[3.0, 10.0)     [0.2,0.3)    133                       32     44     75.9%    66.9%
[3.0, 10.0)     [0.3,0.4)    114                       43     61     62.3%    46.5%
[3.0, 10.0)     [0.4,0.6)    1                         0      1      —        —

Table 7. Results of test 4 for data set 4.

SNg             Redshift     Total number of spectra   N1      N2      S1       S2
[20.0, ∞)       [0.0,0.1)    4007                      102     137     97.5%    96.6%
[20.0, ∞)       [0.1,0.2)    403                       10      21      97.5%    94.8%
[20.0, ∞)       [0.2,0.3)    none                      —       —       —        —
[10.0, 20.0)    [0.0,0.1)    14867                     1027    1300    93.1%    91.3%
[10.0, 20.0)    [0.1,0.2)    13854                     606     1233    95.6%    91.1%
[10.0, 20.0)    [0.2,0.3)    59                        6       10      89.8%    83.1%
[3.0, 10.0)     [0.0,0.1)    2579                      1196    1155    53.6%    55.2%
[3.0, 10.0)     [0.1,0.2)    8114                      1210    2657    85.1%    67.3%
[3.0, 10.0)     [0.2,0.3)    1831                      286     602     84.4%    67.1%
[3.0, 10.0)     [0.3,0.4)    164                       64      92      61.0%    43.9%
[3.0, 10.0)     [0.4,0.6)    3                         3       3       —        —

unsuccessfully, and S2 is the corresponding success rate.

Test 2: Results obtained using data set 2 (weaker-ELGs) are illustrated in table 5 and figures 15 and 17.

Test 3: Results obtained using data set 3 (data missing spectra) are illustrated in table 6 and figures 16 and 17.

Test 4: Results obtained using data set 4 are illustrated in table 7 and figures 16 and 17.

From the experiment results, we can draw the following conclusions.

1. The FRA method does well in determining the redshifts

of non-ELGs, weaker-ELGs, and galaxies with only one visible emission line in the optical band, especially for spectra with higher signal-to-noise ratios. Figures 15 and 16 show that the redshift measurement errors for most spectra are within [−0.001, 0.001], which meets the requirement for the measurement accuracy. Tables 4, 5, and 7 and figure 17 show the overall redshift measurement success rate of the FRA. In the upper-left panel of figure 17, for example, for non-ELGs, the measurement success rate of the FRA is close to 1 and decreases only slowly as long as SNg > 11. Below SNg = 11, the success


Fig. 15. Distributions of the redshift measurement errors for non-ELGs and weaker-ELGs with different signal-to-noise ratios. In order to clearly show the results between −0.003 and 0.003, we let the range of the horizontal coordinate (measurement error) be [−0.003, 0.003]. If a measurement error is less than −0.003 or higher than 0.003, we let it be −0.003 or 0.003, respectively, which is why local peaks appear at the coordinates −0.003 and 0.003. The profile of every histogram is approximately fitted with a Gaussian function, and the coefficients μ and σ are marked in every panel.

Fig. 16. Distributions of the redshift measurement errors for data missing spectra and data set 4 with different signal-to-noise ratios. The meanings of coefficients μ and σ are similar to those of figure 15.

rate begins to decrease rapidly, but is still larger than 90% if SNg > 8. Similar results for weaker-ELGs and data set 4 are shown in the other panels of figure 17.

The experimental results of the full-spectrum matching method (cross-correlation method) are listed in tables 4, 5,

6, and 7. The galaxy templates E, S0, Sa, and Sb (Kinney et al. 1996) are adopted as the templates needed by the full-spectrum matching method. For spectra with high signal-to-noise ratios in data sets 1, 2, and 4, in general, the full-spectrum matching method does as well as the FRA method concerning


Fig. 17. Success rate of the FRA for the spectra with different signal-to-noise ratios in data sets 1 to 4, according to tables 4 to 7 and more detailed experimental results. The unit of SNg is 2.

the success rate. However, for spectra distorted by noise or higher redshifts, for example, when SNg ∈ [3.0, 10.0) and z > 0.1, the success rates of the FRA method are larger than those of the full-spectrum matching method. Because noise causes more fluctuations of the spectral flux, and because at higher redshift more spectral data from the ultraviolet band shift into the optical band, it is not easy for the full-spectrum matching method to match the observed spectrum with the correct template; sometimes the largest peak is not prominent, or multiple peaks appear in the cross-correlation function. In contrast, the distortions caused by noise or higher redshift have less effect on the FRA method, because the prominent features used by the FRA remain in the optical band at high redshift, and are not easily affected by the fluctuations of unimportant details. In addition, the average measurement speed of the FRA method per spectrum is about 10 times faster than that of the full-spectrum matching method.

2. The FRA algorithm has a certain robustness, which is embodied in two aspects: denoising and the ability to process data missing spectra. Based on the physical properties of spectra, the FRA method uses several prominent lines as features. The lines are easy to extract and recognize, and are not easily masked by a low level of noise; therefore, the FRA algorithm has a certain denoising ability and a high success rate of the redshift measurement, as discussed in conclusion 1. However, as noise increases, more glitches appear near the feature lines, which can affect the detection of the central wavelength. When noise is high enough to distort more spectral data, or even to make the spectra unrecoverable, a higher measurement error of the redshift results. That is why more spectra with higher measurement errors appear in SNg ∈ [3.0, 10.0), as shown in figures 15 and 16. For such spectra seriously distorted

by noise, we have to output rejections according to the flag RedshiftCof. Moreover, we find that in tables 4, 5, and 7, the success rate of the FRA in z ∈ [0.1, 0.3) is slightly higher than that in z < 0.1 under the condition SNg ∈ [3.0, 10.0), which is different from the cases of SNg ∈ [20, ∞) and SNg ∈ [10.0, 20.0). The main reason is that noise masks the features in the g′ band when z < 0.1, and the features shift into the r′ band with a higher signal-to-noise ratio when z ∈ [0.1, 0.3). According to the above discussion, it is necessary to search for improved denoising methods, which is our next research goal. Note that noise can only be reduced, but never eliminated.

For data missing spectra, on the whole, the FRA does better than the full-spectrum matching method, because the FRA is not affected as long as the missing data do not fall in the feature wavelengths utilized by the FRA. However, the lower-left panel of figure 17 shows that the success rate fluctuates relatively strongly, because the missing data may appear at any wavelength, including the feature wavelengths. Another reason for unsuccessful measurements is that some steep slopes of the linear interpolation may affect the detection and recognition of the 4000 Å break or the feature lines, since SDSS recovers the data missing spectra using linear interpolation.

6. Summary

1. We propose a fast automatic galaxy redshift determination method, named FRA, in this paper. The method is mainly applied to non-ELGs, weaker-ELGs, and galaxies with only one visible emission line in the optical band.

2. The FRA method first distinguishes the galaxy category and removes stronger-ELGs using the DCT and the inverse DCT, then determines n candidates for the trailing


Table 8. Comparison between the FRA and the full-spectrum matching method.

Index   FRA of this paper                                                 Full-spectrum matching method
I1      non-ELGs, weaker-ELGs, ELGs with one prominent emission line      non-ELGs and give reference redshift for ELGs
I2      no                                                                yes
I3      speed is fast and independent of the redshift values              speed is slow and dependent on the redshift values
I4      [0, 0.65]                                                         ordinary
I5      higher                                                            higher
I6      yes                                                               no
I7      higher                                                            ordinary
I8      higher                                                            higher

edge of the 4000 Å break as the start points of the whole processing procedure. The method does not adopt a standard galaxy template spectra database or an exhaustive searching strategy; therefore, its measurement speed is high and independent of the redshift value. Specifically, the FRA determines redshifts by extracting and recognizing two pairs of absorption lines that are generally present in all kinds of galaxies, G-band 4306 and Hγ 4342, and Hβ 4863 and Mg 5177, instead of the emission lines used by other methods. The measurement range of the redshift is wide, from 0 to 0.65, because the absorption lines used as features by the FRA are located in the blue band of the spectra.

3. The FRA method adopts the judgment-feedback processing module, which decides, according to the confidence level parameter, whether to output the measured redshift, measure the redshift again, or set the rejection flag.

4. The FRA method can be used with many current fiber spectroscopic telescopes such as LAMOST or SDSS. However, because the FRA requires a relatively large wavelength range and a spectral resolution above R = 1000, it cannot be applied to redshift surveys that do not meet these conditions. In addition, the FRA algorithm is not suitable for galaxies without a clearly visible 4000 Å break. However,

commonly, such types of galaxies have several prominent emission lines, and can therefore be removed by the galaxy category judgment module of the FRA.

The overall performance indices of the FRA are listed in table 8, where I1 is the application scope; I2, whether the galaxy template spectra are needed; I3, the measurement speed; I4, the range of the measured redshift; I5, the success rate under the condition SNg > 5; I6, whether the physical properties of the spectra are considered; I7, the ability to process data missing spectra with SNg > 5; and I8, the denoising performance.

In the future, our research work will mainly focus on denoising techniques and on finding spectral features in the ultraviolet band, so as to measure higher redshifts.

We thank our referee for valuable comments and suggestions, which helped us to improve this work. We thank Fang Zuo for doing a part of the experiments on the full-spectrum matching method, and thank James Wicker and Wei Du for kindly correcting the text. We acknowledge grant support from the Natural Science Foundation of China (NSFC) under No. 10973021.

References

Adelman-McCarthy, J. K., et al. 2008, ApJS, 175, 297
Bian, Z. Q., & Zhang, X. G. 2000, Pattern Recognition (Beijing: Tsinghua University Press)
Bruzual, A. G., et al. 1983, ApJ, 273, 105
Colless, M., et al. 2001, MNRAS, 328, 1039
Drinkwater, M., & Blake, C. 2009, Anglo-Australian Observatory Newsletter, 115, 3
Duan, F.-Q., Wu, F.-C., Luo, A.-L., & Zhao, Y.-H. 2005, Chin. Spectrosc. Spectral Anal., 25, 1895
Garilli, B., Fumana, M., Franzetti, P., Paioro, L., Scodeggio, M., Le Fèvre, O., Paltani, S., & Scaramella, R. 2010, PASP, 122, 827
Glazebrook, K., Offer, A. R., & Deeley, K. 1998, ApJ, 492, 98
Kennicutt, R. C., Jr. 1992, ApJS, 79, 255
Kinney, A. L., Calzetti, D., Bohlin, R. C., McQuade, K., Storchi-Bergmann, T., & Schmitt, H. R. 1996, ApJ, 467, 38
Luo, A.-L., & Zhao, Y.-H. 2001, Chin. J. Astron. Astrophys., 1, 563
Qiu, B., Hu, Z. Y., & Zhao, Y. H. 2002, Chin. Spectrosc. Spectral Anal., 22, 695
Stoughton, C., et al. 2002, AJ, 123, 485
Tonry, J., & Davis, M. 1979, AJ, 84, 1511
Trager, S. C., Worthey, G., Faber, S. M., Burstein, D., & González, J. J. 1998, ApJS, 116, 1
Tresse, L., Maddox, S., Loveday, J., & Singleton, C. 1999, MNRAS, 310, 262
Yip, C. W., et al. 2004, AJ, 128, 585
Zaritsky, D., Zabludoff, A. I., & Willick, J. A. 1995, AJ, 110, 1602